Joseph Huber [Thu, 2 Jun 2022 18:25:49 +0000 (14:25 -0400)]
[llvm-objdump] Add support for dumping embedded offloading data
In Clang/LLVM we are moving towards a new binary format to store many
embedded object files to create a fatbinary. This patch adds support for
dumping these embedded images in the `llvm-objdump` tool. This will
allow users to query information about what is stored inside the binary.
This has very similar functionality to the `cuobjdump` tool for those familiar
with the Nvidia utilities. The proposed use is as follows:
```
$ clang input.c -fopenmp --offload-arch=sm_70 --offload-arch=sm_52 -c
$ llvm-objdump -O input.o
input.o: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind cubin
arch sm_52
triple nvptx64-nvidia-cuda
producer openmp
OFFLOADING IMAGE [1]:
kind cubin
arch sm_70
triple nvptx64-nvidia-cuda
producer openmp
```
This will be expanded further once we start embedding more information
into these offloading images. Right now we are planning on adding
flags and entries for debug level, optimization, LTO usage, target
features, among others.
This patch only supports printing these sections, later we will want to
support dumping files the user may be interested in via another flag. I
am unsure if this should go here in `llvm-objdump` or `llvm-objcopy`.
Reviewed By: MaskRay, tra, jhenderson, JonChesterfield
Differential Revision: https://reviews.llvm.org/D126904
Joseph Huber [Tue, 14 Jun 2022 19:04:39 +0000 (15:04 -0400)]
[ObjectYAML] Add offloading binary implementations for obj2yaml and yaml2obj
This patch adds the necessary code for inspecting or creating offloading
binaries using the standing `obj2yaml` and `yaml2obj` features in LLVM.
Depends on D127774
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D127776
Jennifer Yu [Tue, 14 Jun 2022 17:11:10 +0000 (10:11 -0700)]
Generate the capture for the field when the field is used in openmp
region with implicit default inside the member function.
This is to fix assert when field is referenced in OpenMP region with
default (first|private) clause inside member function.
The problem of assert is that the capture is not generated for the field.
This patch is to generate capture when the field is used with implicit
default, use it in the code, and save the capture off to make sure it is
considered from that point and add first/private clauses.
1> Add new field ImplicitDefaultFirstprivateFDs in SharingMapTy, used to
store generated capture fields info.
2> In function isOpenMPCaptureDecl: the capture is generated and saved
in ImplicitDefaultFirstprivateFDs.
3> Add new helper functions:
getImplicitFDCapExprDecl
isImplicitDefaultFirstprivateFD
addImplicitDefaultFirstprivateFD
4> Add additional argument in hasDSA to check default attribute for
default(first|private).
5> The isImplicitDefaultFirstprivateFD is used in VisitDeclRefExpr to
build the implicit clause.
6> Add new parameter "Context" for buildCaptureDecl, because the parent
context is needed when capturing a field.
7> Change isOpenMPPrivateDecl to stop propagating the capture from
the enclosing region for private variables.
8> In ActOnOpenMPFirstprivate/ActOnOpenMPPrivate, using captured info
to generate first|private clause.
9> Add new function isOpenMPRebuildMemberExpr: used to determine if a field
needs to be rebuilt during template instantiation.
Differential Revision: https://reviews.llvm.org/D127803
LLVM GN Syncbot [Fri, 1 Jul 2022 23:35:58 +0000 (23:35 +0000)]
[gn build] Port
94c7b89fe5b0
Konstantin Varlamov [Fri, 1 Jul 2022 23:34:08 +0000 (16:34 -0700)]
[libc++][ranges] Implement `ranges::stable_sort`.
Differential Revision: https://reviews.llvm.org/D127834
Vitaly Buka [Fri, 1 Jul 2022 23:22:04 +0000 (16:22 -0700)]
[sanitizer] Update dn_expand interceptor for glibc 2.34
Symbol changed with
640bbdf71c6f10ac26252ac67a22902e26657bd8
Nuno Lopes [Fri, 1 Jul 2022 22:53:41 +0000 (23:53 +0100)]
Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
This reverts commits
47e6f98f84ac3 and
3e701bcd2a6aee2
Nuno Lopes [Fri, 1 Jul 2022 22:43:48 +0000 (23:43 +0100)]
attempt to fix aarch64 build bot
Nuno Lopes [Fri, 1 Jul 2022 22:31:31 +0000 (23:31 +0100)]
[LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
Maksim Panchenko [Fri, 1 Jul 2022 01:46:47 +0000 (18:46 -0700)]
[BOLT] Fix instrumentation problem with floating point
If BOLT instrumentation runtime uses XMM registers, it can interfere
with the user program causing crashes and unexpected behavior. This
happens as the instrumentation code preserves general purpose registers
only.
Build BOLT instrumentation runtime with "-mno-sse".
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128960
Fangrui Song [Fri, 1 Jul 2022 21:35:36 +0000 (14:35 -0700)]
[llvm-lto2] Remove unneeded cl::init(false). NFC
Argyrios Kyrtzidis [Thu, 30 Jun 2022 21:04:14 +0000 (14:04 -0700)]
[Lex] Introduce `PPCallbacks::LexedFileChanged()` preprocessor callback
This is a preprocessor callback focused on the lexed file changing, without conflating effects of line number directives and other pragmas.
A client that only cares about what files the lexer processes, like dependency generation, can use this more straightforward
callback instead of `PPCallbacks::FileChanged()`. Clients that want the pragma directive effects as well can keep using `FileChanged()`.
A use case where `PPCallbacks::LexedFileChanged()` is particularly simpler to use than `FileChanged()` is in a situation
where a client wants to keep track of lexed file changes that include changes from/to the predefines buffer, where it becomes
unnecessarily complicated trying to use `FileChanged()` while filtering out the pragma directives effects callbacks.
Also take the opportunity to provide information about the prior `FileID` the `Lexer` moved from, even when entering a new file.
Differential Revision: https://reviews.llvm.org/D128947
Arthur Eubanks [Fri, 1 Jul 2022 20:47:19 +0000 (13:47 -0700)]
[bazel] Fix invalid characters
Arthur Eubanks [Fri, 1 Jul 2022 20:36:47 +0000 (13:36 -0700)]
[bazel] Port
43dc3190, adding rules to generate dxil intrinsics
Sanjay Patel [Fri, 1 Jul 2022 20:24:34 +0000 (16:24 -0400)]
[InstCombine] restrict select of bit-tests to constant shift amounts
This transform is responsible for a long-standing miscompile
as discussed in issue #47012 (was bugzilla #47668).
There was a proposal to correct it in D88432, but that was
abandoned and there hasn't been any recent activity to fix
it AFAICT.
The original patch D45108 started with a constant-shift-only
restriction and only expanded during review, so I don't think
there's much risk of perf regression on the motivating code.
Sanjay Patel [Fri, 1 Jul 2022 20:18:11 +0000 (16:18 -0400)]
[InstCombine] avoid 'tmp' usage in test files; NFC
The update script ( utils/update_test_checks.py ) warns against this.
wren romano [Fri, 1 Jul 2022 19:41:01 +0000 (12:41 -0700)]
[mlir][sparse] Reducing computational complexity
This is a followup to D128847. The `AffineMap::getPermutedPosition` method performs a linear scan of the map, thus the previous implementation had asymptotic complexity of `O(|topSort| * |m|)`. This change reduces that to `O(|topSort| + |m|)`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129011
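The complexity reduction described in the commit above can be illustrated with a minimal sketch. It assumes the permutation is stored as a plain vector (`perm[i]` is the dimension found at position `i`); the function names are hypothetical and this is not the MLIR code. Scanning the permutation once per query costs `O(|m|)` each time, whereas building the inverse once costs `O(|m|)` in total and makes every later lookup `O(1)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-query lookup: a linear scan of the
// permutation, so each call costs O(|m|).
std::size_t permutedPositionLinear(const std::vector<std::size_t> &perm,
                                   std::size_t dim) {
  for (std::size_t i = 0; i < perm.size(); ++i)
    if (perm[i] == dim)
      return i;
  assert(false && "dim is not part of the permutation");
  return perm.size();
}

// Build the inverse permutation once in O(|m|); afterwards the position
// of any dimension is a constant-time array read.
std::vector<std::size_t> invertPermutation(const std::vector<std::size_t> &perm) {
  std::vector<std::size_t> inverse(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    assert(perm[i] < perm.size() && "input is not a permutation");
    inverse[perm[i]] = i;
  }
  return inverse;
}
```

Walking a sorted order of length `|topSort|` and reading `inverse[dim]` for each element then costs `O(|topSort| + |m|)` overall instead of `O(|topSort| * |m|)`.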
Valentin Clement [Fri, 1 Jul 2022 19:47:53 +0000 (21:47 +0200)]
[flang][NFC] Add embox test with character
This test is added to check for multidimensional descriptor of array
substring/derived type component array.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128990
Co-authored-by: Jean Perier <jperier@nvidia.com>
Alexey Bataev [Fri, 1 Jul 2022 19:42:34 +0000 (12:42 -0700)]
[SLP][NFC]Rework the test for logical and freeze, need some extra nodes,
NFC.
Haojian Wu [Fri, 1 Jul 2022 19:23:29 +0000 (21:23 +0200)]
Eric Kunze [Fri, 1 Jul 2022 18:36:37 +0000 (11:36 -0700)]
[mlir][tosa] Update TOSA transpose_conv2d to match spec
The TOSA Specification doesn't have a dilation attribute for transpose_conv2d,
and the padding array is of size 4. (top,bottom,left,right).
This change updates the dialect to match the specification, and updates the lit
tests to match the dialect changes.
Differential Revision: https://reviews.llvm.org/D127332
Alexey Bataev [Fri, 1 Jul 2022 18:53:11 +0000 (11:53 -0700)]
[SLP][NFC]Add a test for logical and operands, requiring extra
freeze, NFC.
Fangrui Song [Fri, 1 Jul 2022 18:42:47 +0000 (11:42 -0700)]
[UpdateTestChecks][test] Remove stray ; before/after non-RUN-non-CHECK comments
Arthur Eubanks [Fri, 1 Jul 2022 18:38:32 +0000 (11:38 -0700)]
[gn build] Manually port
43dc3190
rdzhabarov [Fri, 1 Jul 2022 18:31:20 +0000 (18:31 +0000)]
[mlir] Fix usages of `run-reproducer`.
There is no need to specify `run-reproducer` explicitly anymore.
Differential Revision: https://reviews.llvm.org/D129010
Erich Keane [Fri, 1 Jul 2022 18:20:16 +0000 (11:20 -0700)]
Revert "Re-apply "Deferred Concept Instantiation Implementation"""
This reverts commit
befa8cf087dbb8159a4d9dc8fa4d6748d6d5049a.
Apparently this breaks some libc++ builds with an apparent assertion,
so I'm looking into that.
Craig Topper [Fri, 1 Jul 2022 17:22:43 +0000 (10:22 -0700)]
[RISCV] Considering existing offset in the alignment when folding ADDIs into load/store.
getPointerAlignment and ConstantPoolSDNode::getAlign only consider
the alignment of the object. If we already have a non-zero offset
into the object, that may have reduced the alignment.
Since the base pointer will become an LUI with the old offset, we
need to be sure the new offset fits in the alignment of the address
that will be used to create the LUI immediate.
I'm not sure it is possible to have a non-zero offset in the
GlobalAddressSDNode or ConstantPoolSDNode at this point today so this
may only be a theoretical bug.
Differential Revision: https://reviews.llvm.org/D129006
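As a rough illustration of why an existing offset matters in the RISCV change above: the alignment that can be guaranteed for `base + offset` is limited by the lowest set bit of the offset. The helper below is a hypothetical sketch of that arithmetic, not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Alignment guaranteed for (base + offset) when base is aligned to `align`.
// `align` must be a power of two. Hypothetical helper for illustration only.
std::uint64_t alignAtOffset(std::uint64_t align, std::uint64_t offset) {
  assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");
  if (offset == 0)
    return align;
  // Isolate the lowest set bit of the offset: the largest power of two
  // that divides it.
  std::uint64_t offsetAlign = offset & (~offset + 1);
  return offsetAlign < align ? offsetAlign : align;
}
```

For a 16-byte-aligned object, an existing offset of 4 leaves only 4-byte alignment, so a new offset that fits within 16 bytes of the object may not fit within the alignment of the address actually used.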
Haojian Wu [Fri, 1 Jul 2022 18:16:06 +0000 (20:16 +0200)]
[pseudo] Fix an out-of-bound issue in getReduceRules.
Fangrui Song [Fri, 1 Jul 2022 18:15:04 +0000 (11:15 -0700)]
[MC][RISCV] Suppress R_RISCV_{ADD,SUB}32 in .apple_names .apple_types after D127549
This fixes test/DebugInfo/Generic/accel-table-hash-collisions.ll and
cross-cu-inlining.ll when the default triple is riscv. llvm-dwarfdump
--apple-names does not resolve R_RISCV_{ADD,SUB}32 in .apple_names .apple_types
and having ADD/SUB will cause decoding failure `Atom[0]: Error extracting the
value`.
Rong Xu [Fri, 1 Jul 2022 16:56:47 +0000 (09:56 -0700)]
Remove redundant code. [NFC]
isAssumeLikeIntrinsic() is a superset of isLifetimeStartOrEnd().
Peiming Liu [Fri, 1 Jul 2022 03:10:33 +0000 (03:10 +0000)]
[mlir][sparse] add more unittest cases to sparse dialect merger
Reviewed By: aartbik, wrengr
Differential Revision: https://reviews.llvm.org/D128058
Xiang Li [Thu, 16 Jun 2022 17:49:21 +0000 (10:49 -0700)]
[DirectX] add thread/group id DXIL operations.
Add DXIL operation for thread/group id operations.
ID Name Description
93 ThreadId reads the thread ID
94 GroupId reads the group ID (SV_GroupID)
95 ThreadIdInGroup reads the thread ID within the group (SV_GroupThreadID)
96 FlattenedThreadIdInGroup provides a flattened index for a given thread within a given group (SV_GroupIndex)
Also add LLVM intrinsics which map to these DXIL operations.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D127990
Aaron Ballman [Fri, 1 Jul 2022 17:47:08 +0000 (13:47 -0400)]
Test a few more C99 DRs
This updates the status for another 8 DRs.
Petr Hosek [Fri, 1 Jul 2022 17:24:37 +0000 (17:24 +0000)]
[compiler-rt] Update Fuchsia sanitizer sched_yield
Fuchsia has split overloaded nanosleep(0) for yielding to its own
dedicated syscall, so valid zero deadlines would just return.
Patch By: gevalentino
Differential Revision: https://reviews.llvm.org/D128748
Quentin Colombet [Fri, 1 Jul 2022 16:27:30 +0000 (09:27 -0700)]
[GISel] Don't fold convergent instruction across CFG
Before merging two instructions together, GISel does some sanity checks
that the folding is legal. However that check was missing that the
source of the pattern may be convergent. When the destination location
is in a different basic block, the folding is invalid.
Differential Revision: https://reviews.llvm.org/D128539
Petr Hosek [Sat, 28 May 2022 05:55:38 +0000 (05:55 +0000)]
[CMake][Fuchsia] Use libunwind as the default unwinder
Fuchsia already uses libunwind, but it does so implicitly via libc++.
This change makes the unwinder choice explicit.
Differential Revision: https://reviews.llvm.org/D127887
LLVM GN Syncbot [Fri, 1 Jul 2022 17:14:07 +0000 (17:14 +0000)]
[gn build] Port
554aea52d79e
Martin Sebor [Fri, 1 Jul 2022 17:07:41 +0000 (11:07 -0600)]
[InstCombine] Add tests in anticipation of D128939 (NFC)
Precommit tests exercising the future folding of memchr and strchr calls
in equality expressions with the first function argument.
Martin Sebor [Fri, 1 Jul 2022 16:09:42 +0000 (10:09 -0600)]
[InstCombine] Transform strrchr to memrchr for constant strings
Add an emitter for the memrchr common extension and simplify the strrchr
call handler to use it. This enables transforming calls with the empty
string to the test C ? S : 0.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128954
Alexey Lapshin [Tue, 28 Jun 2022 16:52:12 +0000 (19:52 +0300)]
[reland][Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef.
This review is extracted from D96035.
This patch adds the possibility to keep not only DwarfStringPoolEntry, but also
pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry.
String map keeps string data and corresponding DwarfStringPoolEntry
info. Not all string map entries may be included into the result,
and then not all string entries should have DwarfStringPoolEntry
info. Currently StringMap keeps DwarfStringPoolEntry for all entries.
It leads to extra memory usage. This patch allows keeping
DwarfStringPoolEntry info only for entries which really need it.
[reland] : make msan happy.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D126883
Pengxuan Zheng [Tue, 21 Jun 2022 01:44:32 +0000 (18:44 -0700)]
[LLD][COFF] Ignore /kernel flag
There exists some description of the flag from Microsoft, but not sure if
there's more to it. We ignore the flag for now until we find out more about it.
https://docs.microsoft.com/en-us/cpp/build/reference/kernel-create-kernel-mode-binary?view=msvc-170
Reviewed By: thieta, hans
Differential Revision: https://reviews.llvm.org/D128238
Arjun P [Fri, 1 Jul 2022 16:53:13 +0000 (17:53 +0100)]
[MLIR][Presburger] support symbolicLexMin for IntegerRelation
This also changes the space of the returned lexmin for IntegerPolyhedrons;
the symbols in the poly now correspond to symbols in the result rather than dims.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128933
Arjun P [Fri, 1 Jul 2022 16:46:50 +0000 (17:46 +0100)]
[MLIR][Presburger] Simplex: refactor (symbolic)lex to support specifying multiple varKinds as symbols
This is also required to support lexmin for relations.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128931
Kirill Okhotnikov [Fri, 1 Jul 2022 14:37:27 +0000 (16:37 +0200)]
[libc][math] Improved ExhaustiveTest performance.
The previous implementation split the value range evenly across threads.
Because the tested functions perform very differently over different ranges,
CPU utilization was poor. The current implementation splits the test range
into small pieces, and each thread takes the next piece when it finishes the
previous one. Therefore the CPU load stays constant during testing.
Differential Revision: https://reviews.llvm.org/D128995
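The scheduling idea in the ExhaustiveTest commit above can be sketched as a shared dispenser that hands out small half-open chunks on demand. The class name and interface are assumptions made for illustration; this is not the libc test harness code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hands out consecutive chunks [lo, hi) of the range [begin, end).
// Hypothetical sketch: assumes end plus (number of threads * chunk) does
// not overflow 64 bits, which holds for 32-bit exhaustive ranges.
class ChunkDispenser {
public:
  ChunkDispenser(std::uint64_t begin, std::uint64_t end, std::uint64_t chunk)
      : next_(begin), end_(end), chunk_(chunk) {
    assert(chunk > 0 && begin <= end);
  }

  // Atomically claims the next chunk. Returns false once the range is
  // exhausted; otherwise stores the claimed bounds in lo and hi.
  bool claim(std::uint64_t &lo, std::uint64_t &hi) {
    std::uint64_t start = next_.fetch_add(chunk_, std::memory_order_relaxed);
    if (start >= end_)
      return false;
    lo = start;
    hi = (end_ - start < chunk_) ? end_ : start + chunk_;
    return true;
  }

private:
  std::atomic<std::uint64_t> next_;
  const std::uint64_t end_;
  const std::uint64_t chunk_;
};
```

Each worker thread would loop on `claim()` and test the values in `[lo, hi)` until it returns false, so a thread that lands on a cheap sub-range simply comes back for more work sooner.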
Fangrui Song [Fri, 1 Jul 2022 16:08:42 +0000 (09:08 -0700)]
[llvm-objdump] -r: print non-SHF_ALLOC relocations for non-ET_REL files
ET_EXEC and ET_DYN files may contain non-SHF_ALLOC relocation sections
(e.g. ld --emit-relocs). Match GNU objdump by dumping them.
* Remove Object/dynamic-reloc.test. Replace it with a -r RUN line in dynamic-relocs.test
* Update relocations-in-nonreloc.test to set sh_link/sh_info. GNU
objdump seems to ignore a SHT_REL/SHT_RELA section not linking to SHT_SYMTAB.
The test did not test what it intended to test.
Fix https://github.com/llvm/llvm-project/issues/41246
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D128959
Fazlay Rabbi [Fri, 1 Jul 2022 00:08:17 +0000 (17:08 -0700)]
[OpenMP] Initial parsing and semantic support for 'parallel masked taskloop simd' construct
This patch gives basic parsing and semantic support for
"parallel masked taskloop simd" construct introduced in
OpenMP 5.1 (section 2.16.10)
Differential Revision: https://reviews.llvm.org/D128946
Jun Zhang [Fri, 1 Jul 2022 15:55:55 +0000 (23:55 +0800)]
Revert "[NFC] Add a missing test for clang-repl"
This reverts commit
2750985a5ccb97f4630c3443e75d78ed435d2bd0.
This has made the Windows buildbot unhappy :(
Jun Zhang [Fri, 1 Jul 2022 15:26:08 +0000 (23:26 +0800)]
[NFC] Add a missing test for clang-repl
This adds a missing test for
0ecbedc0986bd4b7b90a60a5f31d32337160d4c4
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D128991
lorenzo chelini [Fri, 1 Jul 2022 09:44:39 +0000 (11:44 +0200)]
[MLIR][Linalg] Update filename to reflect implementation (NFC)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D128978
Sander de Smalen [Fri, 1 Jul 2022 14:29:07 +0000 (14:29 +0000)]
[AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types is the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.
This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.
Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.
Reviewed By: paulwalker-arm, efriedma
Differential Revision: https://reviews.llvm.org/D128665
Nikita Popov [Thu, 16 Jun 2022 08:22:11 +0000 (10:22 +0200)]
[AST] Don't assert instruction reads/writes memory (PR51333)
This function is well-defined for an instruction that doesn't access
memory (and thus trivially doesn't alias anything in the AST), so
drop the assert. We can end up with a readnone call here if we
originally created a MemoryDef for an indirect call, which was
later replaced with a direct readnone call.
Fixes https://github.com/llvm/llvm-project/issues/51333.
Differential Revision: https://reviews.llvm.org/D127947
Sam McCall [Fri, 1 Jul 2022 14:44:36 +0000 (16:44 +0200)]
[pseudo] temporary fix for missing generated header after
fe66aebd755191fac6
Better fix to be added by Haojian later!
Andrew Ng [Thu, 23 Jun 2022 14:16:48 +0000 (15:16 +0100)]
[Build][NFC] Fixes for building on Windows with libc++
Differential Revision: https://reviews.llvm.org/D128514
Nikita Popov [Fri, 1 Jul 2022 14:28:56 +0000 (16:28 +0200)]
[SCEV] Remove unnecessary pointer handling in BuildConstantFromSCEV (NFCI)
Nowadays, we do not allow pointers in multiplies, and adds can only
have a single pointer, which is also guaranteed to be last by
complexity sorting. As such, we can somewhat simplify the treatment
of pointer types.
Nikita Popov [Fri, 1 Jul 2022 14:11:39 +0000 (16:11 +0200)]
[LoopDeletion] Fix deletion with unusual predecessor terminator (PR56266)
LoopSimplify only requires that the loop predecessor has a single
successor and is safe to hoist into -- it doesn't necessarily have
to be an unconditional BranchInst.
Adjust LoopDeletion to assert conditions closer to what it actually
needs for correctness, namely a single successor and a
side-effect-free terminator (as the terminator is getting dropped).
Fixes https://github.com/llvm/llvm-project/issues/56266.
David Goldman [Wed, 29 Jun 2022 14:04:21 +0000 (10:04 -0400)]
[clangd][ObjC] Fix ObjC method definition completion
D124637 improved filtering of method expressions, but not method
definitions. With this change, clangd will now filter ObjC method
definition completions based on their entire selector instead of
only the first selector fragment.
Differential Revision: https://reviews.llvm.org/D128821
Erich Keane [Thu, 30 Jun 2022 19:03:42 +0000 (12:03 -0700)]
Re-apply "Deferred Concept Instantiation Implementation""
This reverts commit
d4d47e574ecae562ab32f8ac7fa3f4d424bb6574.
This fixes the lldb crash that was observed by ensuring that our
friend-'template contains reference to' TreeTransform properly handles a
TemplateDecl.
Shilei Tian [Fri, 1 Jul 2022 13:50:29 +0000 (09:50 -0400)]
[NFC][OpenMP][CUDA] Remove unnecessary default label
Nikita Popov [Fri, 1 Jul 2022 13:43:27 +0000 (15:43 +0200)]
[ConstantRange] Fix sdiv() with one bit values (PR56333)
Signed one bit values can only be -1 or 0, not positive. The code
was interpreting the 1 as -1 and intersecting with a full range
rather than an empty one.
Fixes https://github.com/llvm/llvm-project/issues/56333.
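The observation behind the ConstantRange fix above (a signed one-bit value is either -1 or 0) follows from ordinary two's-complement sign extension. A minimal sketch with a hypothetical helper, unrelated to the actual ConstantRange implementation:

```cpp
#include <cassert>
#include <cstdint>

// Interprets the low `bits` bits of `raw` as a two's-complement signed value.
// Hypothetical helper for illustration only.
std::int64_t signExtend(std::uint64_t raw, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  if (bits == 64)
    return static_cast<std::int64_t>(raw);
  std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  std::uint64_t signBit = std::uint64_t{1} << (bits - 1);
  raw &= mask;
  // Flipping the sign bit and subtracting it back maps the upper half of
  // the unsigned range onto the negative numbers.
  return static_cast<std::int64_t>(raw ^ signBit) -
         static_cast<std::int64_t>(signBit);
}
```

With `bits == 1` the only bit is the sign bit, so the bit pattern `1` reads as -1 and there is no positive value at all.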
Matt Devereau [Tue, 7 Jun 2022 11:19:23 +0000 (11:19 +0000)]
[SVE][AArch64] Refine hasSVEArgsOrReturn
As described in aapcs64 (https://github.com/ARM-software/abi-aa/blob/2022Q1/aapcs64/aapcs64.rst#scalable-vector-registers)
AAVPCS is used only when registers z0-z7 take an SVE argument. This fixes the case where floats occupy the lower bits
of registers z0-z7 but SVE arguments in registers greater than z7 cause a function to use AAVPCS where it should use AAPCS.
Moving SVE function deduction from AArch64RegisterInfo::hasSVEArgsOrReturn to AArch64TargetLowering::LowerFormalArguments
where physical register lowering is more accurate fixes this.
Differential Revision: https://reviews.llvm.org/D127209
Mirko Brkusanin [Fri, 1 Jul 2022 10:50:58 +0000 (12:50 +0200)]
[AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP
Differential Revision: https://reviews.llvm.org/D128980
Ben Dunbobbin [Thu, 30 Jun 2022 22:01:30 +0000 (23:01 +0100)]
[LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO
The CGProfilePass needs to be run during FullLTO compilation at link
time to emit the .llvm.call-graph-profile section to the compiled LTO
object file. Currently, it is being run only during the initial
LTO-prelink compilation stage (to produce the bitcode files to be
consumed by the linker) and so the section is not produced.
ThinLTO is not affected because:
- For ThinLTO-prelink compilation the CGProfilePass pass is not run
because ThinLTO-prelink passes are added via
buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink
passes are both added via buildPerModuleDefaultPipeline which uses
the LTOPreLink parameter to customize its behavior for the
FullLTO-prelink pass differences.
- ThinLTO backend compilation phase adds the CGProfilePass (see:
buildModuleOptimizationPipeline).
Adjust when the pass is run so that the .llvm.call-graph-profile
section is produced correctly for FullLTO.
Fixes #56185 (https://github.com/llvm/llvm-project/issues/56185)
Nikita Popov [Fri, 1 Jul 2022 12:54:10 +0000 (14:54 +0200)]
[IRBuilder] Move CreateNeg() to fold API
Remove the CreateNeg() method from IRBuilderFolder and base it on
CreateSub(0, V) instead, which will call FoldNoWrapBinaryOp().
May not be NFC if InstSimplifyFolder is used.
Nikita Popov [Fri, 1 Jul 2022 12:47:56 +0000 (14:47 +0200)]
[IRBuilder] Move CreateNot() to fold API
Drop the IRBuilderFolder method entirely and base this on
CreateXor(V, -1) instead, so this will now go through FoldBinOp.
May not be NFC if the InstSimplifyBuilder is used.
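The two IRBuilder changes above rely on simple integer identities: negation is subtraction from zero, and bitwise NOT is XOR with all ones. The sketch below checks them on plain 32-bit unsigned integers (so wraparound is well defined); the function names are hypothetical and no LLVM API is used.

```cpp
#include <cassert>
#include <cstdint>

// NOT expressed as XOR with an all-ones constant (the -1 of the same width).
std::uint32_t notViaXor(std::uint32_t v) { return v ^ 0xFFFFFFFFu; }

// Negation expressed as subtraction from zero, modulo 2^32.
std::uint32_t negViaSub(std::uint32_t v) { return 0u - v; }
```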
Florian Hahn [Fri, 1 Jul 2022 12:48:38 +0000 (13:48 +0100)]
[LV] Don't optimize exit cond during epilogue vectorization.
At the moment, the same VPlan can be used for code generation of both the
main vector and epilogue vector loop. This can lead to wrong results, if
the plan is optimized based on the VF of the main vector loop and then
re-used for the epilogue loop.
One example where this is problematic is if the scalar loops need to
execute at least one iteration, e.g. due to interleave groups.
To prevent mis-compiles in the short-term, disable optimizing exit
conditions for VPlans when using epilogue vectorization. The proper fix
is to avoid re-using the same plan for both loops, which will require
support for cloning plans first.
Fixes #56319.
Pavel Labath [Fri, 1 Jul 2022 12:32:50 +0000 (14:32 +0200)]
[lldb/test] Don't use preexec_fn for launching inferiors
As the documentation states, using this is not safe in multithreaded
programs, and I have traced it to a rare deadlock in some of the tests.
The reason this was introduced was to be able to attach to a program
from the very first instruction, where our usual mechanism of
synchronization -- waiting for a file to appear -- does not work.
However, this is only needed for a single test
(TestGdbRemoteAttachWait) so instead of doing this everywhere, I create
a bespoke solution for that single test. The solution basically
consists of outsourcing the preexec_fn code to a separate (and
single-threaded) shim process, which enables attaching and then executes
the real program.
This pattern could be generalized in case we needed to use it for other
tests, but I suspect that we will not be having many tests like this.
This effectively reverts commit
a997a1d7fbe229433fb458bb0035b32424ecf3bd.
Nikita Popov [Fri, 1 Jul 2022 12:27:38 +0000 (14:27 +0200)]
[SimplifyLibCalls] Use inbounds GEP
When converting strchr(p, '\0') to p + strlen(p) we know that
strlen() must return an offset that is inbounds of the allocated
object (otherwise it would be UB), so we can use an inbounds GEP.
An equivalent argument can be made for the other cases.
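The equivalence used by this transform can be checked at the source level. This is a small sketch using only the standard C string functions; the wrapper names are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Searching for the terminator returns a pointer to the end of the string...
const char *endViaStrchr(const char *p) {
  assert(p != nullptr);
  return std::strchr(p, '\0');
}

// ...which is the same address as the start plus the string length. The
// result points at the terminating NUL, which is still inside the object.
const char *endViaStrlen(const char *p) {
  assert(p != nullptr);
  return p + std::strlen(p);
}
```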
Sanjay Patel [Fri, 1 Jul 2022 12:21:55 +0000 (08:21 -0400)]
[InstCombine] add code comment for icmp transform; NFC
This was accidentally left out of
cc88445a9106
Aaron Ballman [Fri, 1 Jul 2022 12:11:46 +0000 (08:11 -0400)]
Add some more expected warnings to this C99 DR test
This should address the issue found by:
https://lab.llvm.org/buildbot/#/builders/171/builds/16835
Aaron Ballman [Fri, 1 Jul 2022 11:48:07 +0000 (07:48 -0400)]
Ensure that the generic associations aren't redundant
This should hopefully address the test failure found in:
https://lab.llvm.org/buildbot/#/builders/171/builds/16833
Matt Devereau [Thu, 23 Jun 2022 14:58:56 +0000 (14:58 +0000)]
[AArch64][SVE] Create AArch64ISD node for DUPQLANE128
Create an AArch64ISD node instead of emitting machine node DUP_ZZI_Q.
This allows a simpler DAG combine for work previously attempted
in https://reviews.llvm.org/D128503
Differential Revision: https://reviews.llvm.org/D128902
Aaron Ballman [Fri, 1 Jul 2022 11:33:37 +0000 (07:33 -0400)]
Fix this C99 DR to be more robust
This should fix the following test issue on ARM:
https://lab.llvm.org/buildbot/#/builders/171/builds/16815
Florian Hahn [Fri, 1 Jul 2022 11:03:24 +0000 (12:03 +0100)]
[VPlan] Move addMetadata to VPTransformState (NFC).
The moved helpers are only used for codegen. It will allow moving the
remaining ::execute implementations out of LoopVectorize.cpp.
Depends on D127966.
Depends on D127965.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D127968
Guillaume Chatelet [Fri, 1 Jul 2022 10:48:57 +0000 (10:48 +0000)]
Revert "[reland] algorithm_test.cpp"
This reverts commit
1514acb20f404fa3fe0e20f068b1caf763396176.
Guillaume Chatelet [Thu, 30 Jun 2022 14:43:52 +0000 (14:43 +0000)]
[reland] algorithm_test.cpp
Removing `-ffreestanding` for the tests should allow us to use `<iostream>`
Differential Revision: https://reviews.llvm.org/D128916
Kazushi (Jam) Marukawa [Fri, 1 Jul 2022 10:24:33 +0000 (19:24 +0900)]
[VE][NFC] Correct comment
Florian Hahn [Fri, 1 Jul 2022 10:12:00 +0000 (11:12 +0100)]
[LV] Update test for #56319 to use interleave group.
The original test was over-reduced. It requires an interleave group, so
the last vector iteration of the epilogue vector loop doesn't execute.
Valentin Clement [Fri, 1 Jul 2022 10:04:19 +0000 (12:04 +0200)]
[flang] File omp_lib.f90 is not a standard intrinsic module
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128976
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Daniel Bertalan [Fri, 1 Jul 2022 09:47:34 +0000 (11:47 +0200)]
[lld-macho] Fix left shift of negative value UB
I introduced this mistake in
573c7e6b3c79c7ce80a2221e000fab7dd20c0bb4.
Fixes the failure on this UBSan bot:
https://lab.llvm.org/buildbot/#/builders/5/builds/25537
Dmitry Preobrazhensky [Fri, 1 Jul 2022 09:34:59 +0000 (12:34 +0300)]
[AMDGPU][GFX908][DOC][NFC] Update assembler syntax description
Summary of changes:
- Remove dst for global_atomic_add_f32, global_atomic_pk_add_f16.
- Make vdata input-only for buffer_atomic_add_f32, buffer_atomic_pk_add_f16.
- Other minor improvements.
Simon Pilgrim [Fri, 1 Jul 2022 09:36:01 +0000 (10:36 +0100)]
Revert rG057db2002bb3: [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.
Reverted due to reports from @alexfh about it causing an infinite loop (repro still pending).
Dmitry Preobrazhensky [Mon, 27 Jun 2022 16:30:44 +0000 (19:30 +0300)]
[AMDGPU][GFX940][DOC][NFC] Update assembler syntax description
Summary of changes:
- Update SMEM syntax (see https://reviews.llvm.org/D127314).
- Minor improvements.
Florian Hahn [Fri, 1 Jul 2022 09:09:23 +0000 (10:09 +0100)]
[LV] Add test case for #56319.
Test case for PR56319.
Nico Weber [Fri, 1 Jul 2022 08:31:35 +0000 (04:31 -0400)]
[gn build] (manually) port
fe66aebd7551 (PseudoCLI)
Nico Weber [Wed, 25 May 2022 12:39:29 +0000 (08:39 -0400)]
[gn build] (manually) port
cd2292ef824 (PseudoCXX)
This target will be used in the next commit.
Christian Kandeler [Fri, 1 Jul 2022 08:43:23 +0000 (04:43 -0400)]
[clangd] Also mark output arguments of array subscript expressions
... with the "usedAsMutableReference" semantic token modifier.
It's quite unusual to declare the index parameter of a subscript
operator as a non-const reference type, but arguably that makes it even
more helpful to be aware of it when working with such code.
Reviewed By: nridge
Differential Revision: https://reviews.llvm.org/D128892
Serge Pavlov [Fri, 1 Jul 2022 08:32:56 +0000 (15:32 +0700)]
Revert "[FPEnv] Allow CompoundStmt to keep FP options"
On some buildbots test `ast-print-fp-pragmas.c` fails, need to investigate it.
This reverts commit
0401fd12d4aa0553347fe34d666fb236d8719173.
This reverts commit
b822efc7404bf09ccfdc1ab7657475026966c3b2.
Valentin Clement [Fri, 1 Jul 2022 08:36:45 +0000 (10:36 +0200)]
[flang] Fix for broken/degenerate forall case
Fix for broken/degenerate forall case where there is no assignment to an
array under the explicit iteration space. While this is a multiple
assignment, semantics only raises a warning.
The fix is to add a test that the explicit space has any sort of array
to be updated, and if not then the do_loop nest will not require a
terminator to forward array values to the next iteration.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128973
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Mikhail Goncharov [Fri, 1 Jul 2022 08:33:48 +0000 (10:33 +0200)]
[fix/build] bazel rule for ParallelCombiningOpInterface
Muhammad Omair Javaid [Fri, 1 Jul 2022 08:21:27 +0000 (12:21 +0400)]
[LLDB] Xfail TestStepNoDebug.py AArch64/Windows
LLDB fails to step in/out/over code with missing debug information.
This is only reproducible on AArch64/Windows. I have reported an issue
upstream at llvm.org/pr56292
This patch XFAILs TestStepNoDebug.py for AArch64/Windows.
Chen Zheng [Wed, 29 Jun 2022 09:07:23 +0000 (05:07 -0400)]
[SCEV] pre-commit test case for D127835, NFC
Serge Pavlov [Fri, 1 Jul 2022 08:17:04 +0000 (15:17 +0700)]
Fix warning on unhandled enumeration value
Valentin Clement [Fri, 1 Jul 2022 08:16:09 +0000 (10:16 +0200)]
[flang] Add correct number of args for wait
Add source coordinates to BeginWait and BeginWaitAll calls
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128970
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Chen Zheng [Mon, 27 Jun 2022 13:56:20 +0000 (09:56 -0400)]
[InstructionSimplify] handle denormal input for fcmp
Handle denormal constant input for fcmp instructions based on the
denormal handling mode.
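A minimal sketch of the idea, assuming a simplified model: the enum and
function names below are hypothetical and are not LLVM's API. A denormal
constant is flushed to zero according to the mode before the comparison
is folded, so the folded result can differ between modes.

```cpp
#include <limits>

// Hypothetical mode names for this sketch; they mirror the idea of a
// denormal handling mode but are not LLVM's API.
enum class DenormalInput { IEEE, PreserveSign, PositiveZero };

// Returns the value a comparison would observe for a constant input.
constexpr float flushInput(float X, DenormalInput Mode) {
  constexpr float MinNormal = std::numeric_limits<float>::min();
  // A denormal is non-zero and smaller in magnitude than the smallest
  // normal value. NaN fails the range checks and is left untouched.
  const bool IsDenormal = X != 0.0f && X > -MinNormal && X < MinNormal;
  if (!IsDenormal || Mode == DenormalInput::IEEE)
    return X;
  // Flush to zero, keeping the sign only in PreserveSign mode.
  return (Mode == DenormalInput::PreserveSign && X < 0.0f) ? -0.0f : 0.0f;
}

// Folds "X != 0.0" for a constant X under the given mode.
constexpr bool foldIsNonZero(float X, DenormalInput Mode) {
  return flushInput(X, Mode) != 0.0f;
}

static_assert(foldIsNonZero(1.0e-40f, DenormalInput::IEEE), "");
static_assert(!foldIsNonZero(1.0e-40f, DenormalInput::PreserveSign), "");
```

Here `1.0e-40f` is a denormal single-precision value: it compares unequal
to zero under IEEE handling, but equal to zero once inputs are flushed.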
Reviewed By: spatel, dcandler
Differential Revision: https://reviews.llvm.org/D128647
Daniel Bertalan [Thu, 30 Jun 2022 09:01:18 +0000 (11:01 +0200)]
[lld-macho] Handle LOH_ARM64_ADRP_LDR linker optimization hints
This linker optimization hint transforms a pair of adrp+ldr (immediate)
instructions into an ldr (literal) load from a PC-relative address if
it is 4-byte aligned and within +/- 1 MiB, as ldr can encode a signed
19-bit offset that gets multiplied by 4.
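The range check implied by this description can be sketched as follows.
The function name is hypothetical and this is not the helper added by the
patch; it only works through the arithmetic of a signed 19-bit immediate
scaled by 4.

```cpp
#include <cstdint>

// Returns true if a PC-relative byte offset can be encoded by an AArch64
// "ldr (literal)": a signed 19-bit immediate that is multiplied by 4.
// Hypothetical helper for illustration only.
constexpr bool fitsLdrLiteral(std::int64_t Offset) {
  constexpr std::int64_t MinImm = -(std::int64_t{1} << 18);    // -262144
  constexpr std::int64_t MaxImm = (std::int64_t{1} << 18) - 1; //  262143
  // The low two bits cannot be encoded, so the offset must be 4-byte aligned.
  if (Offset % 4 != 0)
    return false;
  const std::int64_t Imm = Offset / 4;
  return Imm >= MinImm && Imm <= MaxImm;
}

static_assert(fitsLdrLiteral(-1048576), "-1 MiB is the lowest offset");
static_assert(fitsLdrLiteral(1048572), "1 MiB - 4 is the highest offset");
static_assert(!fitsLdrLiteral(1048576), "+1 MiB itself does not fit");
```

Note that the encodable range is asymmetric: it spans -1 MiB up to
1 MiB - 4 bytes, which is what "+/- 1 MiB" abbreviates.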
In the wild, only a small number of these hints are applicable because
not many loads end up close enough to the data segment. However, the
added helper functions will be useful in implementing the rest of the
LOH types.
Differential Revision: https://reviews.llvm.org/D128942
Serge Pavlov [Mon, 28 Sep 2020 07:32:06 +0000 (14:32 +0700)]
[FPEnv] Allow CompoundStmt to keep FP options
The AST does not have special nodes for pragmas. Instead, a pragma modifies
some state variables of Sema, which in turn results in modified
attributes of AST nodes. This technique applies to floating point
operations as well. Every AST node that can depend on FP options keeps
the current set of them.
This technique works well for options like exception behavior or fast
math options. They represent instructions to the compiler on how to modify
code generation for the affected nodes. However, the treatment of FP control
modes has problems with this technique. Modifying an FP control mode
(like rounding direction) usually requires operations on hardware, like
writing to control registers. It must be done prior to the first
operation that depends on the control mode. In particular, such
operations are required for the implementation of `pragma STDC FENV_ROUND`:
the compiler should set up the necessary rounding direction at the
beginning of the compound statement where the pragma occurs. As there is no
representation for pragmas in the AST, code generation becomes a
complicated task in this case.
To solve this issue, FP options are kept inside CompoundStmt. Unlike FP
options in expressions, these do not affect any operation on FP values,
but only inform the codegen about the FP options that act in the body of
the statement. As all pragmas that modify the FP environment may occur only
at the start of a compound statement or at global level, such a solution
works for all relevant pragmas. The options are kept as a difference
from the options in the enclosing compound statement or the default options,
which helps codegen set only the changed control modes.
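The difference-based storage described above can be sketched with a toy
model. All types and functions below are hypothetical simplifications and
are not Clang's classes; they only show why storing a delta lets code
generation touch just the control modes that actually change.

```cpp
// Hypothetical, simplified model; these types are not Clang's classes.
enum class Rounding { ToNearest, Upward, Downward, TowardZero };

// Full set of FP control modes in effect at some point.
struct FPControl {
  Rounding Round = Rounding::ToNearest;
  bool StrictExceptions = false;
};

// What a compound statement stores: only the modes that differ from the
// enclosing statement (or from the defaults at the outermost level).
struct FPControlDelta {
  bool HasRound = false;
  Rounding Round = Rounding::ToNearest;
  bool HasStrict = false;
  bool StrictExceptions = false;
};

// Computes the delta between the enclosing options and the inner ones.
constexpr FPControlDelta diff(FPControl Outer, FPControl Inner) {
  FPControlDelta D{};
  if (Inner.Round != Outer.Round) {
    D.HasRound = true;
    D.Round = Inner.Round;
  }
  if (Inner.StrictExceptions != Outer.StrictExceptions) {
    D.HasStrict = true;
    D.StrictExceptions = Inner.StrictExceptions;
  }
  return D;
}

// Number of control modes codegen would have to set on block entry.
constexpr int modesToSet(FPControlDelta D) {
  return (D.HasRound ? 1 : 0) + (D.HasStrict ? 1 : 0);
}

// A block that only changes the rounding direction needs one mode update.
static_assert(modesToSet(diff(FPControl{}, FPControl{Rounding::Upward, false})) == 1, "");
```

In this model, a block whose options equal those of the enclosing block
has an empty delta, so no control registers need to be written on entry.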
Differential Revision: https://reviews.llvm.org/D123952
Nikita Popov [Wed, 29 Jun 2022 08:48:40 +0000 (10:48 +0200)]
[ConstExpr] Don't create insertvalue expressions
In preparation for the removal in D128719, this stops creating
insertvalue constant expressions (well, unless they are directly
used in LLVM IR).
Differential Revision: https://reviews.llvm.org/D128792
Nicolas Vasilache [Thu, 30 Jun 2022 10:37:21 +0000 (03:37 -0700)]
[mlir][SCF] Add a ParallelCombiningOpInterface to decouple scf::PerformConcurrently from its contained operations
This allows purging references to scf.ForeachThreadOp and scf.PerformConcurrentlyOp from
ParallelInsertSliceOp.
This will allow moving the op closer to tensor::InsertSliceOp, with which it should share much more
code.
In the future, the decoupling will also allow extending the type of ops that can be used in the
parallel combinator as well as semantics related to multiple concurrent inserts to the same
result.
Differential Revision: https://reviews.llvm.org/D128857
Nicolas Vasilache [Wed, 29 Jun 2022 08:59:33 +0000 (01:59 -0700)]
[mlir][vector] Untangle TransferWriteDistribution and avoid crashing in the 0-D case.
This revision avoids a crash in the 0-D case of distributing vector.transfer ops out of
vector.warp_execute_on_lane_0.
Due to the code complexity and lack of documentation, the implementation had to be untangled
before it became clear that the simple fix was to fail in the 0-D case.
The rewrite is still very useful to understand this code better.
Differential Revision: https://reviews.llvm.org/D128793
Nikita Popov [Tue, 21 Jun 2022 08:34:41 +0000 (10:34 +0200)]
[SCCP] Only handle unknown lattice values in resolvedUndefsIn()
This is a minor refinement of resolvedUndefsIn(), mostly for clarity.
If the value of an instruction is undef, then that's already a legal
final result -- we can safely rauw such an instruction with undef.
We only need to mark unknown values as overdefined, as that's the
result we get for an instruction that has not been processed because
it has an undef operand.
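A minimal sketch of the rule, assuming a heavily simplified lattice (the
enum and function below are hypothetical; SCCP's real lattice has more
states): only unknown values are forced to overdefined, and every other
state, including undef, is left as it is.

```cpp
// Hypothetical, simplified lattice; SCCP's real lattice has more states.
enum class Lattice { Unknown, Undef, Constant, Overdefined };

// Resolves the state of an instruction after solving has finished.
// Unknown means the instruction was never processed (for example because
// it has an undef operand), so it is conservatively marked overdefined.
// Undef is already a legal final result and is kept.
constexpr Lattice resolveUnprocessed(Lattice V) {
  return V == Lattice::Unknown ? Lattice::Overdefined : V;
}

static_assert(resolveUnprocessed(Lattice::Unknown) == Lattice::Overdefined, "");
static_assert(resolveUnprocessed(Lattice::Undef) == Lattice::Undef, "");
```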
Differential Revision: https://reviews.llvm.org/D128251