Matthias Springer [Wed, 15 Dec 2021 09:43:24 +0000 (18:43 +0900)]
[mlir][linalg][bufferize] Reimplementation of TiledLoopOp bufferization
Instead of modifying the existing linalg.tiled_loop op, create a new op with memref input/outputs and delete the old op.
Differential Revision: https://reviews.llvm.org/D115493
Nikita Popov [Wed, 15 Dec 2021 08:38:48 +0000 (09:38 +0100)]
[CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.
For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
Matthias Springer [Wed, 15 Dec 2021 09:32:13 +0000 (18:32 +0900)]
[mlir][linalg][bufferize] Reimplementation of scf.if bufferization
Instead of modifying the existing scf.if op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the op. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
Differential Revision: https://reviews.llvm.org/D115491
Javier Setoain [Tue, 12 Oct 2021 13:26:01 +0000 (14:26 +0100)]
[mlir][RFC] Add scalable dimensions to VectorType
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.
Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVM IR.
Scalable dimensions are indicated by enclosing them in square brackets:
vector<[4]xf32>
Is a scalable vector of 4 single-precision floating-point elements.
More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:
vector<2x[4x4]xf32>
Is a vector with 2 scalable 4x4 vectors of single-precision floating-point
elements.
The scale of the scalable dimensions can be obtained with the Vector
operation:
%vs = vector.vscale
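As a rough illustration, assuming 1-D scalable MLIR vectors follow LLVM IR's `<vscale x 4 x float>` semantics, the bracketed size is a minimum element count that is multiplied at runtime by `vscale`. On Arm SVE, `vscale` is the hardware vector length in bits divided by 128. Both helper names below are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: on Arm SVE, vscale = hardware vector length (bits) / 128.
std::uint64_t sveVscale(std::uint64_t vectorLengthBits) {
  return vectorLengthBits / 128;
}

// Hypothetical helper: runtime element count of a 1-D scalable vector such as
// vector<[4]xf32>, whose bracketed size is a minimum scaled by vscale.
std::uint64_t scalableElementCount(std::uint64_t minElements,
                                   std::uint64_t vscale) {
  return minElements * vscale;
}
```

Under these assumptions, `vector<[4]xf32>` holds 4 floats on a 128-bit SVE implementation and 16 on a 512-bit one. Multi-dimensional scalable types such as `vector<2x[4x4]xf32>` are not modeled here.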
This change is being discussed in the discourse RFC:
https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484
Differential Revision: https://reviews.llvm.org/D111819
Matthias Springer [Wed, 15 Dec 2021 09:26:27 +0000 (18:26 +0900)]
[mlir][linalg][bufferize] Reimplementation of scf.for bufferization
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.
Differential Revision: https://reviews.llvm.org/D114926
Fangrui Song [Wed, 15 Dec 2021 09:27:08 +0000 (01:27 -0800)]
[ELF] --gc-sections: Change startwith(".jcr") to exact match
GNU ld's internal linker script keeps `.jcr`, but not other sections
starting with `.jcr`.
Julian Gross [Wed, 8 Dec 2021 10:29:47 +0000 (11:29 +0100)]
[mlir] Added documentation for bufferization to memref conversion pass.
Added documentation to clarify the purpose of the bufferization-to-memref pass
and added some remarks.
Differential Revision: https://reviews.llvm.org/D115326
Fangrui Song [Wed, 15 Dec 2021 09:16:25 +0000 (01:16 -0800)]
[ELF] --gc-sections: Change startwith(".init") (and ".fini") to exact match
GNU ld's internal linker script keeps `.init`, but not other sections starting
with `.init`. .fini is similar.
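The difference between the old prefix check and the new exact match (here and in the `.jcr` change above) can be sketched as follows. `keptByPrefix` and `keptExactly` are hypothetical helpers, not lld functions, and lld's real liveness logic also retains sections for other reasons:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Old behaviour (sketch): any section whose name starts with one of these
// names was treated as a GC root.
bool keptByPrefix(const std::string &name) {
  for (const char *root : {".init", ".fini", ".jcr"})
    if (name.rfind(root, 0) == 0) // rfind at position 0 means "starts with"
      return true;
  return false;
}

// New behaviour (sketch): only the exact names are roots, matching GNU ld's
// internal linker script.
bool keptExactly(const std::string &name) {
  return name == ".init" || name == ".fini" || name == ".jcr";
}
```

In this sketch, a section named `.init.foo` stops being a root and can be collected unless something else references it.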
Fangrui Song [Wed, 15 Dec 2021 08:37:10 +0000 (00:37 -0800)]
[ELF] Change objectFiles to ELFFileBase *
This can sometimes avoid `cast<ObjFile<...>>`.
I intentionally leave postScanRelocations alone until it stabilizes.
Nikita Popov [Wed, 15 Dec 2021 08:27:49 +0000 (09:27 +0100)]
[CodeGen] Avoid some pointer element type accesses
Fangrui Song [Wed, 15 Dec 2021 08:18:58 +0000 (00:18 -0800)]
[ELF] Adjust getOutputSectionName prefix order
Sorting the prefixes by decreasing frequency can improve performance.
.gcc_except_table is relatively frequent, so move it ahead.
.ctors and .dtors mostly disappear and should be the last.
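Why the order matters for speed: the mapping is a linear scan that returns on the first matching prefix, so frequent prefixes placed early end the loop sooner. The table below is illustrative (an assumption, not lld's exact list or order), and the function is a hypothetical sketch rather than lld's `getOutputSectionName`:

```cpp
#include <cassert>
#include <string>

// Illustrative prefix table. Besides frequency, order matters for correctness:
// ".data.rel.ro" must be tried before ".data", and ".bss.rel.ro" before ".bss".
static const char *const kPrefixes[] = {
    ".data.rel.ro", ".data",       ".rodata",
    ".bss.rel.ro",  ".bss",        ".gcc_except_table",
    ".init_array",  ".fini_array", ".ctors",
    ".dtors"};

// Returns the first prefix P such that name == P or name starts with "P.".
std::string outputSectionName(const std::string &name) {
  for (const char *p : kPrefixes) {
    const std::string prefix(p);
    if (name == prefix)
      return prefix;
    if (name.size() > prefix.size() &&
        name.compare(0, prefix.size(), prefix) == 0 &&
        name[prefix.size()] == '.')
      return prefix;
  }
  return name; // no prefix matched: keep the input section name
}
```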
Nikita Popov [Tue, 14 Dec 2021 13:37:23 +0000 (14:37 +0100)]
[CodeGen] Store ElementType in Address
Explicitly track the pointer element type in Address, rather than
deriving it from the pointer type, which will no longer be possible
with opaque pointers. This just adds the basic facility; for now,
everything still goes through the deprecated constructors.
I had to adjust one place in the LValue implementation to satisfy
the new assertions: Global registers are represented as a
MetadataAsValue, which does not have a pointer type. We should
avoid using Address in this case.
This implements a part of D103465.
Differential Revision: https://reviews.llvm.org/D115725
Fangrui Song [Wed, 15 Dec 2021 07:43:00 +0000 (23:43 -0800)]
[ELF] Slightly speed up getOutputSectionName. NFC
Nikolas Klauser [Wed, 15 Dec 2021 00:32:30 +0000 (01:32 +0100)]
[libc++][NFC] Use _LIBCPP_DEBUG_ASSERT in <string>
Use `_LIBCPP_DEBUG_ASSERT` instead of using `_LIBCPP_ASSERT` and guarding it with `LIBCPP_DEBUG_LEVEL == 2`.
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D115765
Esme-Yi [Wed, 15 Dec 2021 07:38:12 +0000 (07:38 +0000)]
[DebugInfo][DWARF] emit DW_AT_accessibility attribute for class/struct/union types.
Summary:
This patch emits the DW_AT_accessibility attribute for
class/struct/union types in the LLVM part.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D115606
gysit [Wed, 15 Dec 2021 07:10:32 +0000 (07:10 +0000)]
[mlir][linalg] Remove RangeOp and RangeType.
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115727
Matthias Springer [Wed, 15 Dec 2021 07:12:17 +0000 (16:12 +0900)]
[adt] Fix compiler warning in test
Differential Revision: https://reviews.llvm.org/D115589
Fangrui Song [Wed, 15 Dec 2021 06:41:52 +0000 (22:41 -0800)]
[ELF] Remove dead code from SymbolTable::find
Logan Chien [Sat, 16 Oct 2021 00:43:25 +0000 (17:43 -0700)]
Print the sign of negative infinity
Differential Revision: https://reviews.llvm.org/D111917
Craig Topper [Wed, 15 Dec 2021 05:44:23 +0000 (21:44 -0800)]
[RISCV] Add more curly braces to constexpr array initialization to hopefully appease gcc 5.
Build bot failure found after D115668.
Fangrui Song [Wed, 15 Dec 2021 05:11:45 +0000 (21:11 -0800)]
[ELF] Use SmallVector for SharedFile and simplify parseVerdefs
SHT_GNU_verdef is typically small, so it's unnecessary to reserve the vector.
While here, fix a hypothetical issue when SHT_GNU_verdef has non-increasing
version indexes, which do not occur in GNU ld, gold, or ld.lld output.
My x86-64 lld executable is 256 bytes smaller.
Fangrui Song [Wed, 15 Dec 2021 04:55:32 +0000 (20:55 -0800)]
[ELF] Make InputFile smaller
sizeof(ObjFile<ELF64LE>) is decreased from 344 to 272 on an ELF64 system.
In a large link with 30000 ObjFiles, this may save 2+ MiB.
Change std::vector members to SmallVector, and std::string members to
SmallString<0> (these members typically don't benefit from small string optimization).
On Linux x86-64 the lld executable is ~6k smaller.
Nico Weber [Wed, 15 Dec 2021 04:11:42 +0000 (23:11 -0500)]
[gn build] (manually) port b45ad7363c30 (LLVM_WITH_Z3)
Stephen Hines [Wed, 15 Dec 2021 01:20:06 +0000 (17:20 -0800)]
[compiler-rt][AArch64] Add a workaround for Exynos 9810
big.LITTLE heterogeneous architectures, as described by ARM [1],
require that the instruction set architecture of the big and little
cores be compatible. However, the Samsung Exynos 9810 is known to
have different ISAs in its cores.
According to [2], some cores are ARMv8.2 and others are ARMv8.0.
Since LSE is for ARMv8.1 and later, it should be disabled
for this broken CPU.
[1] https://developer.arm.com/documentation/den0024/a/big-LITTLE-Technology
[2] https://github.com/golang/go/issues/28431
Patch by: Byoungchan Lee (byoungchan.lee@gmx.com)
Reviewed By: srhines
Differential Revision: https://reviews.llvm.org/D114523
Lang Hames [Fri, 10 Dec 2021 20:53:59 +0000 (07:53 +1100)]
[llvm-jitlink] Update handling of library options.
Adds -L<search-path> and -l<library> options that are analogous to ld's
versions.
Each instance of -L<search-path> or -l<library> will apply to the most recent
-jd option on the command line (-jd <name> creates a JITDylib with the given
name). Library names will match against JITDylibs first, then llvm-jitlink will
look through the search paths for files named <search-path>/lib<library>.dylib
or <search-path>/lib<library>.a.
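The lookup order described above can be modeled with the following hypothetical sketch. The set of existing files stands in for the file system, the per-search-path order (`.dylib` before `.a`) is an assumption, and none of these names are llvm-jitlink APIs:

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Result of resolving one -l<library> option (hypothetical type).
struct LibResolution {
  enum Kind { JITDylib, File } kind;
  std::string target; // JITDylib name or path on disk
};

std::optional<LibResolution>
resolveLibrary(const std::string &lib, const std::set<std::string> &jitDylibs,
               const std::vector<std::string> &searchPaths,
               const std::set<std::string> &existingFiles) {
  // 1. JITDylibs created with -jd are matched first.
  if (jitDylibs.count(lib))
    return LibResolution{LibResolution::JITDylib, lib};
  // 2. Otherwise look for lib<library>.dylib / lib<library>.a in each -L path.
  for (const std::string &dir : searchPaths)
    for (const char *ext : {".dylib", ".a"}) {
      std::string candidate = dir + "/lib" + lib + ext;
      if (existingFiles.count(candidate))
        return LibResolution{LibResolution::File, candidate};
    }
  return std::nullopt; // unresolved
}
```

The sketch ignores which `-jd` an option is attached to; it only shows the JITDylib-first, then-search-paths order.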
The default "main" JITDylib will link against all JITDylibs created by -jd
options, and all JITDylibs will link against the process symbols (unless
-no-process-symbols is specified).
The -dlopen option is renamed -preload, and will load dylibs into the JITDylib
for the ORC runtime only.
The effect of these changes is to make it easier to describe a non-trivial
program layout to llvm-jitlink for testing purposes. E.g. the following
invocation describes a program consisting of three JITDylibs: "main" (created
implicitly) containing main.o; "Foo" containing foo1.o and foo2.o and linking
against library "bar" (not a JITDylib, so it must be a .dylib or .a on disk)
and "Baz" (which is a JITDylib); and "Baz" containing baz.o.
llvm-jitlink \
main.o \
-jd Foo foo1.o foo2.o -L${HOME}/lib -lbar -lBaz \
-jd Baz baz.o
Nico Weber [Tue, 14 Dec 2021 20:10:41 +0000 (15:10 -0500)]
[clang] Use usual lit pattern for CLANG_DEFAULT_PIE_ON_LINUX and LLVM_WITH_Z3
See D28294 for context.
Differential Revision: https://reviews.llvm.org/D115751
Kirill Stoimenov [Fri, 10 Dec 2021 21:44:14 +0000 (21:44 +0000)]
[ASan] Added NO_EXEC_STACK_DIRECTIVE to assembly callback file.
This directive is already present in our other assembly files. It should fix decorate_proc_maps.cpp failures caused by shadow memory being allocated as executable.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D115552
Wenlei He [Wed, 15 Dec 2021 01:10:47 +0000 (17:10 -0800)]
[llvm-profgen] Turn on preinliner by default
The preinliner has been tuned on large server workloads and it's now ready to be turned on by default. This change also updates the thresholds based on tuning.
Differential Revision: https://reviews.llvm.org/D115770
Sindhu Chittireddy [Wed, 15 Dec 2021 01:40:33 +0000 (17:40 -0800)]
Avoid setting tbaa on the store of return type of call to inline assembler.
In 32-bit mode, attaching TBAA metadata to the store following the call
to inline assembler describes the wrong type, because it creates a
fake lvalue (i.e., whatever the inline assembler happens to leave in
EAX:EDX). Even if inline assembler somehow describes the correct type,
setting TBAA information on the return type of a call to inline assembler is
likely incorrect, since TBAA rules need not apply to inline assembler.
Differential Revision: https://reviews.llvm.org/D115320
LLVM GN Syncbot [Wed, 15 Dec 2021 01:15:06 +0000 (01:15 +0000)]
[gn build] Port 4299d8d0ce42
Sam McCall [Wed, 15 Dec 2021 01:13:26 +0000 (02:13 +0100)]
[clangd] Cleanup unneeded use of shared_ptr. NFC
Lang Hames [Wed, 8 Dec 2021 09:25:53 +0000 (20:25 +1100)]
[ORC] Add MaterializationUnit::Interface parameter to ObjectLayer::add.
Also moves object interface building functions out of Mangling.h and into the
new ObjectFileInterfaces.h header, and updates the llvm-jitlink tool to use
custom object interfaces rather than a custom link layer.
ObjectLayer::add overloads are added to match the old signatures (which
do not take a MaterializationUnit::Interface). These overloads use the
standard getObjectFileInterface function to build an interface.
Passing a MaterializationUnit::Interface explicitly makes it easier to alter
the effective interface of the object file being added, e.g. by changing symbol
visibility/linkage, or renaming symbols (in both cases the changes will need to
be mirrored by a JITLink pass at link time to update the LinkGraph to match the
explicit interface). Altering interfaces in this way can be useful when lazily
compiling (e.g. for renaming function bodies) or emulating linker options (e.g.
demoting all symbols to hidden visibility to emulate -load_hidden).
LLVM GN Syncbot [Wed, 15 Dec 2021 00:46:46 +0000 (00:46 +0000)]
[gn build] Port 3f630cff65fc
wlei [Tue, 14 Dec 2021 04:33:33 +0000 (20:33 -0800)]
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part)
Previously there was an issue with artificial LBRs whose source is a return. Recall that "an internal code(A) can return to external address, then from the external address call a new internal code(B), making an artificial branch that looks like a return from A to B can confuse the unwinder". We simply ignored the LBRs after such an artificial LBR, which could miss some samples. This change fixes this by correctly unwinding them instead of ignoring them.
Some typical scenarios covered by this change:
1) Multiple sequential callbacks happen from an external address, e.g.
```
[ext, call, foo] [foo, return, ext] [ext, call, bar]
```
The unwinder should avoid having foo return from bar; a wrong call stack would look like [foo, bar].
2) The call stack before and after an external call should be unwound correctly.
```
{call stack1} {call stack2}
[foo, call, ext] [ext, call, bar] [bar, return, ext] [ext, return, foo ]
```
Call stack 1 should be the same as call stack 2, and neither should be truncated.
3) The call stack should be truncated after a call into external code, since we can't inline external code.
```
[foo, call, ext] [ext, call, bar] [bar, call, baz] [baz, return, bar ] [bar, return, ext]
```
The call stack of code in baz should not include foo.
### Implementation:
We use an artificial frame to fix #2 and #3: when we get a return artificial LBR, we push an extra artificial frame onto the stack. When we pop a frame, we check whether the parent is an artificial frame that should also be popped (fix #2). As a result, call/return artificial LBRs are handled just like regular LBRs, which keeps the call stack intact.
While recording contexts on the trie, the artificial frame is used as a tag indicating that the call stack should be truncated (fix #3).
To differentiate #1 from #2, we use `getCallAddrFromFrameAddr`. Normally the target of a return is the instruction following a call instruction, and `getCallAddrFromFrameAddr` returns the address of that call instruction. Otherwise, `getCallAddrFromFrameAddr` returns 0, which is case #1.
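The check that distinguishes case #1 from case #2 can be sketched as below. The instruction table (address to "is a call" flag) is a hypothetical stand-in for llvm-profgen's disassembly data, and `callAddrFromFrameAddr` does not reproduce the real function's signature:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical disassembly table: instruction address -> "is a call".
using InstTable = std::map<std::uint64_t, bool>;

// Returns the address of the call instruction immediately preceding
// frameAddr, or 0 if frameAddr does not follow a call (case #1 above).
std::uint64_t callAddrFromFrameAddr(const InstTable &insts,
                                    std::uint64_t frameAddr) {
  auto it = insts.find(frameAddr);
  if (it == insts.end() || it == insts.begin())
    return 0; // unknown address, or no preceding instruction
  --it;       // the instruction right before the return target
  return it->second ? it->first : 0;
}
```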
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115550
wlei [Sun, 12 Dec 2021 09:42:53 +0000 (01:42 -0800)]
[llvm-profgen] Fix to use getUntrackedCallsites outside the loop
The unwinder was hoisted out in https://reviews.llvm.org/D115550, so fix the usage of getUntrackedCallsites.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115760
wlei [Wed, 15 Dec 2021 00:28:36 +0000 (16:28 -0800)]
[CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part)
A sample can land directly in external code; in that case, both the top stack frame and the latest LBR target are external addresses. For example:
```
ffffffff
0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0
ffffffff
40067e
0xffffffff/0xffffffff/P/-/-/0 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0
```
Previously, the entire sample was ignored. However, the remaining part of the sample can still contain internal LBRs, and the ranges between them are valid, so ignoring the whole sample loses valid LBRs. These LBRs will be unwound based on an empty (context-less) call stack.
This change fixes that: instead of ignoring the entire sample, we only ignore the leading external addresses.
Note that the first outgoing LBR is useful since there is a valid range between its source and the next LBR's target.
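A minimal sketch of trimming the leading external entries. It assumes the LBR stack is ordered most recent first (as in the perf output above) and that an entry is dropped while its source is the external address `0xffffffff`. The types and this exact criterion are assumptions, not llvm-profgen's code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LBREntry {
  std::uint64_t source;
  std::uint64_t target;
};

constexpr std::uint64_t kExternalAddr = 0xffffffff;

// Drops leading (most recent) entries that start in external code. The first
// entry whose source is internal is kept: the range between its source and
// the next entry's target is still valid.
std::vector<LBREntry> dropLeadingExternal(const std::vector<LBREntry> &lbrs) {
  std::size_t i = 0;
  while (i < lbrs.size() && lbrs[i].source == kExternalAddr)
    ++i;
  return std::vector<LBREntry>(lbrs.begin() + static_cast<std::ptrdiff_t>(i),
                               lbrs.end());
}
```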
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115538
wlei [Mon, 13 Dec 2021 22:35:38 +0000 (14:35 -0800)]
[llvm-profgen] Skip disassembling for PLT section
Skip disassembling the .plt section so that code in .plt is treated as external code.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115699
Hongtao Yu [Mon, 13 Dec 2021 17:25:14 +0000 (09:25 -0800)]
[CSSPGO] Warn instead of error out for modules that are not probed.
Modules that are not compiled with pseudo probes enabled can still be compiled with a sample profile input, such as in LTO post-link where other modules are probed. Since the profile is unrelated to the current modules, we should warn instead of failing the compilation.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115642
Fangrui Song [Wed, 15 Dec 2021 00:36:44 +0000 (16:36 -0800)]
[gn build] (manually) port 9c7fbc3f9b05b3249468ef6aeacaf71841b5cfe3 (LLDB_ENABLE_FBSDVMCORE)
Fangrui Song [Wed, 15 Dec 2021 00:28:41 +0000 (16:28 -0800)]
Reland D114783/D115603 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)
The idea is to make scanRelocations mark which actions are needed (GOT/PLT/etc.)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), though parallelism may not be an appealing choice
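The two-phase idea can be sketched with a heavily simplified, hypothetical model (none of these types are lld's). Phase 1 only records per-symbol needs; phase 2 sees the complete picture, creates each GOT/PLT entry once, and can tell when a symbol needs both, which is the kind of information `.plt.got` support requires:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class RelKind { Got, Plt, Absolute };
struct Reloc { std::string sym; RelKind kind; };
struct NeedFlags { bool needsGot = false; bool needsPlt = false; };
struct Synthetic { std::vector<std::string> got, plt; };

// Phase 1: record which synthetic entries each symbol needs; create nothing.
std::map<std::string, NeedFlags> scanRelocations(const std::vector<Reloc> &relocs) {
  std::map<std::string, NeedFlags> flags;
  for (const Reloc &r : relocs) {
    NeedFlags &f = flags[r.sym];
    if (r.kind == RelKind::Got) f.needsGot = true;
    if (r.kind == RelKind::Plt) f.needsPlt = true;
  }
  return flags;
}

// Phase 2: allocate each entry exactly once per symbol (deduplicated).
Synthetic postScanRelocations(const std::map<std::string, NeedFlags> &flags) {
  Synthetic out;
  for (const auto &[name, f] : flags) {
    if (f.needsGot) out.got.push_back(name);
    if (f.needsPlt) out.plt.push_back(name);
  }
  return out;
}
```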
Since this patch moves a large chunk of code out of ELFT templates, my x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
Fangrui Song [Wed, 15 Dec 2021 00:25:50 +0000 (16:25 -0800)]
[ELF][test] Test unreferenced non-preemptible ifunc
Add missing coverage exposed by D114783.
There should be no associated IRELATIVE, otherwise (a) glibc ld.so may
crash (b) it wastes space (c) unused IPLT causes confusion.
David Blaikie [Wed, 15 Dec 2021 00:03:34 +0000 (16:03 -0800)]
DebugInfo: Fix test to match comment
This produced a few verifier warnings that came up while I was
investigating something else here. Change the assembly to match the
comment so it's warning free. Doesn't seem necessary to change the
CHECKs for the test since it's just a bug in the test, not in the code
under test.
David Blaikie [Tue, 14 Dec 2021 23:19:54 +0000 (15:19 -0800)]
DebugInfo: Sink string form validation down from verifier to form parsing
Avoid duplicating the string decoding - improve the error messages down
in form parsing (& produce an Expected<const char*> instead of
Optional<const char*> to communicate the extra error details)
Lang Hames [Tue, 14 Dec 2021 07:30:24 +0000 (18:30 +1100)]
[ORC] Add early-out to OL_applyQueryPhase1.
If all symbols in a lookup match before we reach the end of the search order
then bail out of the search-order loop early.
This should reduce unnecessary contention on the session lock and improve
readability of the debug logs.
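The early-out pattern can be sketched in isolation. This is a hypothetical model in which each JITDylib is reduced to the set of names it defines; it does not reproduce ORC's `OL_applyQueryPhase1`:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a JITDylib: the set of symbol names it defines.
using DylibSymbols = std::set<std::string>;

// Walks the search order, removing matched names from `remaining`; returns
// how many dylibs were actually searched.
std::size_t lookupWithEarlyOut(const std::vector<DylibSymbols> &searchOrder,
                               std::set<std::string> remaining) {
  std::size_t searched = 0;
  for (const DylibSymbols &jd : searchOrder) {
    if (remaining.empty())
      break; // early out: everything matched, skip the rest of the order
    ++searched;
    for (auto it = remaining.begin(); it != remaining.end();) {
      if (jd.count(*it))
        it = remaining.erase(it);
      else
        ++it;
    }
  }
  return searched;
}
```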
David Blaikie [Tue, 14 Dec 2021 19:45:23 +0000 (11:45 -0800)]
DebugInfo: Migrate callers from getAsCString to dwarf::toString
This makes a bunch of these call sites independent of a follow-up change
I'm making to have getAsCString return Expected<const char*> for more
descriptive error messages so that the failures there can be
communicated up to DWARFVerifier (or other callers who want to provide
more verbose diagnostics) so DWARFVerifier doesn't have to re-implement
the string lookup logic and error checking.
Hongtao Yu [Tue, 14 Dec 2021 18:03:05 +0000 (10:03 -0800)]
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing and indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost of figuring out which profiles are relevant is noticeably high when there are many contexts, since the sample reader needs to scan all context strings anyway. In contrast, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore, when compiling a set of functions, unrelated contexts never need to be scanned.
In this change we explore using the nested profile format for CSSPGO. This is expected to work based on the assumption that with a preinliner-computed profile, all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to the corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler is less likely to optimize on derived contexts that are not precomputed.
A CS-nested profile looks exactly the same as a regular nested profile, except that each nested profile can come with attributes. With pseudo probes, a nested profile, as shown below, can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum:
138828622701
!Attributes: 2
!CFGChecksum:
281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unify sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
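As a hedged illustration of the general idea only (not the converter in this change), flat context keys can be folded into a tree in which each context becomes a child of its caller's node. Callsite locations, checksums, attributes, and the cutting-off of non-inlined contexts are omitted, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct ProfileNode;
using Children = std::map<std::string, std::unique_ptr<ProfileNode>>;

// Hypothetical nested-profile node: samples for this context plus inlinees.
struct ProfileNode {
  std::uint64_t samples = 0;
  Children callees;
};

// A flat CS entry: full context (outermost caller first) and its samples.
struct FlatEntry {
  std::vector<std::string> context; // e.g. {"main", "_Z5funcAi"}
  std::uint64_t samples;
};

// Folds flat entries into a tree keyed by the frames of each context.
Children toNested(const std::vector<FlatEntry> &flat) {
  Children roots;
  for (const FlatEntry &e : flat) {
    Children *level = &roots;
    ProfileNode *node = nullptr;
    for (const std::string &frame : e.context) {
      std::unique_ptr<ProfileNode> &slot = (*level)[frame];
      if (!slot)
        slot = std::make_unique<ProfileNode>();
      node = slot.get();
      level = &node->callees; // descend into the callee's children
    }
    if (node)
      node->samples += e.samples; // merge duplicate contexts
  }
  return roots;
}
```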
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keeping on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
Fangrui Song [Tue, 14 Dec 2021 22:33:50 +0000 (14:33 -0800)]
Revert D114783 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
May cause a failure for non-preemptible `bcmp` in a glibc -static link.
Stephan T. Lavavej [Tue, 14 Dec 2021 00:08:18 +0000 (16:08 -0800)]
[NFC] Fix typos in release notes.
Reviewed By: ldionne, Mordante, MaskRay
Differential Revision: https://reviews.llvm.org/D115685
Konstantin Varlamov [Tue, 14 Dec 2021 22:11:37 +0000 (14:11 -0800)]
[libc++][ranges] Implement ranges::uninitialized_default_construct{,_n}.
Defined in [`specialized.algorithms`](wg21.link/specialized.algorithms).
Also:
- refactor the existing non-range implementation so that most of it
can be shared between the range-based and non-range-based algorithms;
- remove an existing test for the non-range version of
`uninitialized_default_construct{,_n}` that likely triggered undefined
behavior (it read the values of built-ins after default-initializing
them, essentially reading uninitialized memory).
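To illustrate why reading the results was a problem: default-initialization runs a class type's default constructor but leaves built-in types such as `int` with indeterminate values. The sketch below uses the C++17 non-range `std::uninitialized_default_construct` on a hypothetical `Counter` type whose values are well defined; doing the same with `int` and then reading the elements would be undefined behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical type with a default member initializer, so default
// construction produces a well-defined value.
struct Counter {
  int value = 42;
};

// Default-constructs n Counters in raw storage and sums their values.
int sumDefaultConstructed(std::size_t n) {
  std::allocator<Counter> alloc;
  Counter *first = alloc.allocate(n); // raw storage, no objects yet
  std::uninitialized_default_construct(first, first + n);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += first[i].value;
  std::destroy(first, first + n); // end the objects' lifetimes
  alloc.deallocate(first, n);
  return sum;
}
```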
Reviewed By: #libc, Quuxplusone, ldionne
Differential Revision: https://reviews.llvm.org/D115315
Jon Chesterfield [Tue, 14 Dec 2021 21:59:24 +0000 (21:59 +0000)]
[amdgpu] Drop lowering of LDS used by global variables
Approximately revert D103431.
LDS variables are allocated at kernel launch and deallocated at kernel exit.
The address is therefore kernel execution dependent. Global variables are
initialized by values written to .data, which can't be done for an LDS variable
as there is no kernel running, or by a global constructor. Initializing the
global to the address of some LDS allocated by a global constructor is possible
but indistinguishable from undef.
Assigning the address of an LDS variable to a global should be a sema error. It
isn't for OpenMP; I haven't checked other languages. Failing that, it could be set
to undef, perhaps in this pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D115413
Louis Dionne [Tue, 3 Nov 2020 19:46:57 +0000 (14:46 -0500)]
[libc++] Allow detecting whether the executor supports Bash
A few tests in the test suite require support for Bash. For example,
tests that run a program and send data through stdin to it require some
way of piping the data in, and we use a Bash script for that.
However, some executors (e.g. an embedded systems simulator) do not
support Bash, so these tests will fail. This commit adds a Lit feature
that tries to detect whether Bash is available through conventional
means, and disables the tests that require it otherwise.
Differential Revision: https://reviews.llvm.org/D114612
Thomas Raoux [Tue, 14 Dec 2021 20:52:36 +0000 (12:52 -0800)]
[mlir][linalg] Break up linalg vectorization pre-condition
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
Michał Górny [Wed, 1 Dec 2021 22:04:59 +0000 (23:04 +0100)]
[lldb] Introduce a FreeBSDKernel plugin for vmcores
Introduce a FreeBSDKernel plugin that provides the ability to read
FreeBSD kernel core dumps. The plugin utilizes libfbsdvmcore to provide
support for both "full memory dump" and minidump formats across a variety
of architectures supported by FreeBSD. It provides the ability to read
kernel memory, as well as the crashed thread status with registers
on arm64, i386 and x86_64.
Differential Revision: https://reviews.llvm.org/D114911
Stella Stamenova [Tue, 14 Dec 2021 21:06:07 +0000 (13:06 -0800)]
[lldb] Update the PDB tests to pass with the VS2019 toolset
The pdb lldb tests do not work correctly with both the VS2019 and VS2017 toolsets at the moment. This change updates several of the tests to work with both toolsets. Unfortunately, this makes the tests suboptimal for both toolsets, but we can update them to be better for VS2019 once we officially drop VS2017. This change is meant to bridge the gap until the update happens, so that the buildbots can work with either toolset.
Differential Revision: https://reviews.llvm.org/D115482
Sanjay Patel [Tue, 14 Dec 2021 20:18:10 +0000 (15:18 -0500)]
[InstCombine] fold mask-with-signbit-splat to icmp+select
~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y
https://alive2.llvm.org/ce/z/JKlQ9x
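The fold can also be sanity-checked exhaustively for `i8` with a small sketch. It assumes arithmetic right shift on signed values, which C++20 guarantees and mainstream compilers already provided earlier. This is an independent check, not how Alive2 verifies the transform:

```cpp
#include <cassert>
#include <cstdint>

// ~(X s>> 7) & Y for i8. X s>> 7 splats the sign bit: 0 if X >= 0, -1 if X < 0.
std::int8_t maskWithNotSignSplat(std::int8_t x, std::int8_t y) {
  std::int8_t splat = static_cast<std::int8_t>(x >> 7);
  return static_cast<std::int8_t>(~splat & y);
}

// (X s< 0) ? 0 : Y
std::int8_t selectForm(std::int8_t x, std::int8_t y) {
  return x < 0 ? std::int8_t{0} : y;
}

// Compares both forms over all 256 x 256 pairs of i8 values.
bool foldHoldsForAllI8() {
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y) {
      auto sx = static_cast<std::int8_t>(x);
      auto sy = static_cast<std::int8_t>(y);
      if (maskWithNotSignSplat(sx, sy) != selectForm(sx, sy))
        return false;
    }
  return true;
}
```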
This is similar to D111410 / 727e642e970d028049d,
but it includes a 'not' of the signbit and so it
saves an instruction in the basic pattern.
DAGCombiner or target-specific folds can expand
this back into bit-hacks.
The diffs in the logical-select tests are not true
regressions - running early-cse and another round
of instcombine is expected in a normal opt pipeline,
and that reduces back to a minimal form as shown
in the duplicated PhaseOrdering test.
I have no understanding of the SystemZ diffs, so
I made the minimal edits suggested by FileCheck to
make that test pass again. That whole test file is
wrong though. It is running the entire optimizer (-O2)
to check IR, and then topping that by even running
codegen and checking asm. It needs to be split up.
Fixes #52631
Lei Zhang [Tue, 14 Dec 2021 20:52:13 +0000 (15:52 -0500)]
[mlir][spirv] Support size-1 vector/tensor constant during conversion
Reviewed By: ThomasRaoux, mravishankar
Differential Revision: https://reviews.llvm.org/D115518
Mehrnoosh Heidarpour [Tue, 14 Dec 2021 20:32:05 +0000 (15:32 -0500)]
[InstSimplify] Add tests for logic AND; NFC
Mingming Liu [Fri, 10 Dec 2021 05:39:55 +0000 (05:39 +0000)]
[LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
Luís Ferreira [Tue, 14 Dec 2021 20:11:58 +0000 (20:11 +0000)]
[lldb][NFC] Fix documentation for EncodingDataType
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D113605
Krzysztof Drewniak [Tue, 14 Dec 2021 18:47:09 +0000 (18:47 +0000)]
[MLIR][GPU] Make max flat work group size for ROCDL kernels configurable
While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely TensorFlow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D115741
Jonas Devlieghere [Tue, 14 Dec 2021 20:04:28 +0000 (12:04 -0800)]
[lldb] Check if language is supported before creating a REPL instance
Currently, we'll try to instantiate a ClangREPL for every known
language. The plugin manager already knows what languages it supports,
so rely on that to only instantiate a REPL when we know the requested
language is supported.
rdar://86439474
Differential revision: https://reviews.llvm.org/D115698
Michael Benfield [Tue, 14 Sep 2021 18:36:58 +0000 (18:36 +0000)]
[clang] diagnose_as_builtin attribute for Fortify diagnosing like builtins.
Differential Revision: https://reviews.llvm.org/D112024
Sanjay Patel [Tue, 14 Dec 2021 19:15:14 +0000 (14:15 -0500)]
[PhaseOrdering] add tests for vector select; NFC
The 1st test corresponds to a minimally optimized (mem2reg)
version of the example in:
issue #52631
The 2nd test copies an existing instcombine test with the
same pattern. If we canonicalize differently, we can miss
reducing to minimal form in a single invocation of
-instcombine, but that should not escape the normal opt
pipeline.
Luís Ferreira [Tue, 14 Dec 2021 19:31:09 +0000 (19:31 +0000)]
[lldb][NFC] Format lldb/include/lldb/Symbol/Type.h
Reviewed By: teemperor, JDevlieghere, dblaikie
Differential Revision: https://reviews.llvm.org/D113604
Muiez Ahmed [Tue, 14 Dec 2021 19:22:11 +0000 (14:22 -0500)]
Revert "[z/OS] Implement prologue and epilogue generation for z/OS target."
This reverts commit ffad4d777b227f91be04020e2cd86ab38e969e39 because it introduced buildbot failures.
LLVM GN Syncbot [Tue, 14 Dec 2021 19:12:20 +0000 (19:12 +0000)]
[gn build] Port 4e94cba5b4e4
Nico Weber [Tue, 14 Dec 2021 19:07:19 +0000 (14:07 -0500)]
[gn build] Reformat all build files
Ran:
git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format
Nico Weber [Tue, 14 Dec 2021 19:04:43 +0000 (14:04 -0500)]
[gn build] (manually) port f0ca8d2461a7f3c8 (debuginfod-find)
Alexander Belyaev [Tue, 14 Dec 2021 18:58:40 +0000 (19:58 +0100)]
[mlir] Add a missing pattern to bufferize tensor.rank.
Differential Revision: https://reviews.llvm.org/D115745
Simon Pilgrim [Tue, 14 Dec 2021 18:55:59 +0000 (18:55 +0000)]
[X86] Adjust some IceLake integer shuffle schedule classes (PR48110)
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.
This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5.
This is based on Agner Fog's instruction tables and uops.info, as well as the PR48110 report.
Differential Revision: https://reviews.llvm.org/D115547
Craig Topper [Tue, 14 Dec 2021 18:35:38 +0000 (10:35 -0800)]
[RISCV] Add isel support for scalar STRICT_FADD/FSUB/FMUL/FDIV/FSQRT.
Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64.
The RISC-V instructions don't match the behavior of fmin/fmax libcalls
with respect to SNaN.
Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the
RISC-V backend.
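To make the SNaN mismatch concrete, here is a small Python model of the two semantics. It is a sketch, not LLVM code, and it rests on assumptions: the fmin/fmax libcalls are assumed to follow IEEE 754-2008 minNum (a signaling NaN operand yields a quiet NaN), RISC-V fmin.s/fmin.d are modeled per the F/D specification (a signaling NaN is treated like a quiet NaN, so the other operand is returned), and SNAN is a hypothetical sentinel because Python floats do not expose the quiet/signaling distinction. Exception flags and signed-zero ordering are not modeled.

```python
import math

# Hypothetical sentinel (assumption): Python floats cannot portably carry an sNaN.
SNAN = "sNaN"
QNAN = float("nan")


def is_nan(x):
    return x is SNAN or (isinstance(x, float) and math.isnan(x))


def riscv_fmin(a, b):
    """RISC-V fmin.s/fmin.d: an sNaN is treated like a qNaN (the invalid
    flag would be raised, which this model does not track)."""
    if is_nan(a) and is_nan(b):
        return QNAN  # canonical NaN
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return min(a, b)  # signed-zero ordering ignored for brevity


def libcall_fmin(a, b):
    """Assumed libcall behavior (IEEE 754-2008 minNum): any sNaN operand
    produces a quiet NaN."""
    if a is SNAN or b is SNAN:
        return QNAN
    return riscv_fmin(a, b)  # without sNaNs the two agree in this model


print(riscv_fmin(1.0, SNAN))    # 1.0
print(libcall_fmin(1.0, SNAN))  # nan
```

Because the two disagree whenever one operand is an sNaN, selecting the native instruction for STRICT_FMINNUM/FMAXNUM would change observable results, which is why the libcall is kept.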
Reviewed By: asb, arcbbb
Differential Revision: https://reviews.llvm.org/D115680
Nico Weber [Tue, 14 Dec 2021 18:48:41 +0000 (13:48 -0500)]
[gn build] (manually) port 1042de9058 to lit.site.cfg.in too
Kazu Hirata [Tue, 14 Dec 2021 18:46:57 +0000 (10:46 -0800)]
[AArch64] Revise a warning fix
This patch revises the warning fix done in
a93b1792f1c8f7e2e7c931993110dc48f7ddba01. Specifically, it rolls the
MRI.getType call into the assert, thereby avoiding the named variable.
Fangrui Song [Tue, 14 Dec 2021 18:31:06 +0000 (10:31 -0800)]
[ELF] -Map: Print symbols which need canonical PLT entry/copy relocation just once
If a copy-related symbol (say `copy`) is referenced in two .o
files, this change removes a duplicated line from the -Map output:
```
202470 202470 1 1 .bss.rel.ro
202470 202470 1 1 <internal>:(.bss.rel.ro)
202470 202470 1 1 copy
removed 202470 202470 1 1 copy
```
Differential Revision: https://reviews.llvm.org/D115697
Mehdi Amini [Tue, 14 Dec 2021 18:23:43 +0000 (18:23 +0000)]
Revert "Only define LLVM_EXTERNAL_VISIBILITY when building libLLVM dylib"
This reverts commit
71e97ad35b2abcc89cc8ff471a3eb404120cf208.
The MLIR tests using the dylib are broken.
https://lab.llvm.org/buildbot/#/builders/61/builds/18785
Henry Linjamäki [Tue, 14 Dec 2021 18:08:57 +0000 (10:08 -0800)]
[HIPSPV][2/4] Add HIPSPV tool chain
This patch adds a new tool chain, HIPSPVToolChain, for emitting HIP
device code as SPIR-V binary. For now, the SPIR-V binary is emitted using
an external tool, SPIRV-LLVM-Translator. We intend to switch from the
translator to the llc tool once the SPIR-V backend lands in LLVM
and proves to work well on HIP implementations that consume SPIR-V.
Before the SPIR-V emission, the tool chain loads an optional external
pass plugin, either automatically from a HIP installation or from a
path pointed to by --hipspv-pass-plugin, and runs passes that are meant
to expand/lower HIP features that do not have a direct counterpart in
SPIR-V (e.g. dynamic shared memory).
Code emission for SPIR-V will be enabled and HIPSPVToolChain tests
will be added in the follow-up patch, part 3.
Other changes: a new option, `-nohipwrapperinc`, is added to exclude the HIP
include wrappers, because they cause compile errors when compiling HIP
sources for the host side with the HIPCL and HIPLZ implementations.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D110618
Michael Spencer [Tue, 14 Dec 2021 18:18:31 +0000 (11:18 -0700)]
[Clang][ScanDeps] Use the virtual path for module maps
Make clang-scan-deps use the virtual path for module maps instead of the on-disk
path. This is needed so that modulemap-relative lookups are done correctly in
the actual module builds. The file dependencies still use the on-disk path, as
that's what matters for build invalidation.
Differential Revision: https://reviews.llvm.org/D114206
Fangrui Song [Tue, 14 Dec 2021 18:20:51 +0000 (10:20 -0800)]
[gn] Use CLANG_DEFAULT_PIE_ON_LINUX=
Craig Topper [Tue, 14 Dec 2021 17:54:24 +0000 (09:54 -0800)]
[RISCV] Use AdjustInstrPostInstrSelection to insert a FRM dependency for scalar FP instructions with dynamic rounding mode.
In order to support constrained FP intrinsics we need to model the FRM
dependency. Whether or not an instruction uses FRM is based on a 3-bit
field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.
This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.
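The check described above can be sketched in Python. This is an illustrative model, not the RISC-V backend code: MachineInstr and adjust_post_isel are hypothetical stand-ins, and the rm encodings are those of the RISC-V F extension, where 0b111 (DYN) selects the dynamic rounding mode held in the frm CSR and 0b101/0b110 are reserved.

```python
from dataclasses import dataclass, field

# rm field encodings from the RISC-V F extension (0b101 and 0b110 are reserved).
RM_ENCODINGS = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN",
                0b011: "RUP", 0b100: "RMM", 0b111: "DYN"}
RM_DYN = 0b111


@dataclass
class MachineInstr:  # hypothetical stand-in for an LLVM MachineInstr
    opcode: str
    rm: int
    implicit_uses: list = field(default_factory=list)


def adjust_post_isel(mi):
    """Hypothetical helper: add an implicit FRM use only when the rm
    immediate selects the dynamic rounding mode."""
    if mi.rm not in RM_ENCODINGS:
        raise ValueError(f"reserved rounding mode {mi.rm:#05b}")
    if mi.rm == RM_DYN and "FRM" not in mi.implicit_uses:
        mi.implicit_uses.append("FRM")
    return mi


print(adjust_post_isel(MachineInstr("FADD_S", RM_DYN)).implicit_uses)  # ['FRM']
print(adjust_post_isel(MachineInstr("FADD_S", 0b001)).implicit_uses)   # []
```

Instructions with a static rounding mode (RNE, RTZ, and so on) get no FRM dependency, so they stay free to be scheduled across writes to frm.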
Other ideas:
We could be overly conservative and just pretend all instructions with an
frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.
Reviewed By: asb, frasercrmck, arcbbb
Differential Revision: https://reviews.llvm.org/D115555
Fangrui Song [Tue, 14 Dec 2021 18:08:59 +0000 (10:08 -0800)]
[Driver] Add CLANG_DEFAULT_PIE_ON_LINUX to emulate GCC --enable-default-pie
In 2015-05, GCC added the configure option `--enable-default-pie`. When enabled,
* in the absence of -fno-pic/-fpie/-fpic (and their upper-case variants), -fPIE is the default.
* in the absence of -no-pie/-pie/-shared/-static/-static-pie, -pie is the default.
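The two defaults above can be modeled with a small Python function. This is a sketch of the rules as stated, not the Clang driver code; effective_flags is a hypothetical name, and the flag sets follow the lists above, with -fno-pie/-fno-PIE added on the assumption that they also count as an explicit choice.

```python
# Flags that express an explicit PIC/PIE code-generation choice.
PIC_FLAGS = {"-fno-pic", "-fno-PIC", "-fpic", "-fPIC", "-fpie", "-fPIE",
             "-fno-pie", "-fno-PIE"}  # -fno-pie/-fno-PIE: assumption
# Flags that express an explicit link-mode choice.
LINK_FLAGS = {"-no-pie", "-pie", "-shared", "-static", "-static-pie"}


def effective_flags(args, default_pie=True):
    """Hypothetical model: append -fPIE and/or -pie when default PIE is
    enabled and the user did not choose a mode explicitly."""
    out = list(args)
    if default_pie:
        if not PIC_FLAGS.intersection(args):
            out.append("-fPIE")
        if not LINK_FLAGS.intersection(args):
            out.append("-pie")
    return out


print(effective_flags(["-O2"]))               # ['-O2', '-fPIE', '-pie']
print(effective_flags(["-fPIC", "-shared"]))  # ['-fPIC', '-shared']
```

The two checks are independent, so for example `-static` alone still gets `-fPIE` for code generation but no `-pie` at link time.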
This has been adopted by all(?) major distros.
I think default PIE is the majority in the Linux world, but
--disable-default-pie users are not that uncommon because GCC upstream hasn't
switched the default yet (https://gcc.gnu.org/PR103398).
This patch adds CLANG_DEFAULT_PIE_ON_LINUX, which allows distros to use default PIE.
The option is justified because its adoption among Linux distros (to make the
Clang default match GCC) can be very high, and default PIE is likely a future
new default, at which point we will remove CLANG_DEFAULT_PIE_ON_LINUX.
The lit feature `default-pie-on-linux` can be handy to exclude default PIE sensitive tests.
Reviewed By: foutrelis, sylvestre.ledru, thesamesam
Differential Revision: https://reviews.llvm.org/D113372
Noah Shutty [Tue, 14 Dec 2021 17:29:17 +0000 (17:29 +0000)]
[llvm] [Debuginfo] Add llvm-debuginfod-find tool and end-to-end-tests.
This implements the `llvm-debuginfod-find` tool, which wraps the Debuginfod library (D112758) to query debuginfod servers for artifacts according to the specification (https://www.mankier.com/8/debuginfod#Webapi).
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D112759
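As a rough illustration of what such a client does, the following Python sketch builds the candidate URLs for an artifact from the DEBUGINFOD_URLS environment variable (a space-separated server list) and the /buildid/BUILDID/{debuginfo,executable,source/PATH} endpoints of the debuginfod web API. It is not the llvm-debuginfod-find implementation: artifact_urls is a hypothetical helper, and it only computes URLs without performing any HTTP requests.

```python
import os
from urllib.parse import quote


def artifact_urls(build_id, kind, source_path=None, env=None):
    """Hypothetical helper: the URL to try on each configured server."""
    env = os.environ if env is None else env
    servers = env.get("DEBUGINFOD_URLS", "").split()
    if kind in ("debuginfo", "executable"):
        suffix = f"buildid/{build_id}/{kind}"
    elif kind == "source":
        if not source_path or not source_path.startswith("/"):
            raise ValueError("source queries need an absolute path")
        # Percent-escape reserved characters but keep the '/' separators.
        suffix = f"buildid/{build_id}/source{quote(source_path)}"
    else:
        raise ValueError(f"unknown artifact kind: {kind}")
    return [server.rstrip("/") + "/" + suffix for server in servers]


env = {"DEBUGINFOD_URLS": "https://debuginfod.example.com/ http://localhost:8002"}
print(artifact_urls("fedcba98", "debuginfo", env=env))
```

A real client would then try each URL in turn and cache the first successful download.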
Ben Langmuir [Fri, 10 Dec 2021 19:49:04 +0000 (11:49 -0800)]
Only define LLVM_EXTERNAL_VISIBILITY when building libLLVM dylib
When building LLVM static libraries, we should not make symbols more
visible than CMAKE_CXX_VISIBILITY_PRESET, since the goal may be to have
a purely hidden llvm embedded in another library. Instead, we only
define LLVM_EXTERNAL_VISIBILITY for the dynamic library build (when
LLVM_BUILD_LLVM_DYLIB=YES).
Differential Revision: https://reviews.llvm.org/D113610
Craig Topper [Tue, 14 Dec 2021 17:32:58 +0000 (09:32 -0800)]
[RISCV] Add mayRaiseFPException to RISCV scalar FP instructions.
FRM dependency will be added in a future patch.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D115540
Fangrui Song [Tue, 14 Dec 2021 17:52:43 +0000 (09:52 -0800)]
[ELF] Remove needsPltAddr in favor of needsCopy
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
Craig Topper [Tue, 14 Dec 2021 17:17:50 +0000 (09:17 -0800)]
[RISCV] Add a table for extension implications.
This is a proof of concept for a suggestion I proposed in D108694.
Reviewed By: eopXD
Differential Revision: https://reviews.llvm.org/D115668
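A table of extension implications is naturally applied as a transitive closure. The Python sketch below shows the idea. IMPLIED_EXTS and expand_extensions are hypothetical, and the entries are illustrative examples drawn from the RISC-V ISA manual (D depends on F, Zfh includes Zfhmin, Zfhmin depends on F), not the exact contents of the table added by this patch.

```python
# Illustrative implication table (assumption): extension -> extensions it implies.
IMPLIED_EXTS = {
    "d": ["f"],
    "zfh": ["zfhmin"],
    "zfhmin": ["f"],
}


def expand_extensions(exts):
    """Return the closure of exts under IMPLIED_EXTS, sorted by name."""
    result = set()
    worklist = list(exts)
    while worklist:
        ext = worklist.pop()
        if ext in result:
            continue
        result.add(ext)
        worklist.extend(IMPLIED_EXTS.get(ext, []))
    return sorted(result)


print(expand_extensions(["zfh"]))     # ['f', 'zfh', 'zfhmin']
print(expand_extensions(["m", "d"]))  # ['d', 'f', 'm']
```

Keeping the implications in one table means adding a new extension only requires a new entry, rather than another special case in the parsing code.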
Ellis Hoag [Tue, 14 Dec 2021 06:03:23 +0000 (22:03 -0800)]
[DebugInfo][dsymutil] Keep locations for function-local globals
In debug info, we expect variables to have location info if they are used and we don't want location info for functions that are not used. However, if an unused function is inlined, we could have the scenario where a function is not in the final binary but its static variable is in the final binary. Ensure that variables in the final binary have location debug info even if their scope was inlined.
Also add `--implicit-check-not` to a test for clarity.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D115565
Kirill Stoimenov [Tue, 14 Dec 2021 17:17:57 +0000 (17:17 +0000)]
[gn build] Reland 5082c330138: (semimanually) port ebc31d2.
Michał Górny [Tue, 14 Dec 2021 17:17:32 +0000 (18:17 +0100)]
Revert "[lldb] Introduce a FreeBSDKernel plugin for vmcores"
This reverts commit
aedb328a4dc9cb48ee3cf3198281649ea2c4f532.
I have failed to make the new tests conditional to the presence
of libfbsdvmcore.
Craig Topper [Tue, 14 Dec 2021 16:58:16 +0000 (08:58 -0800)]
[RISCV] Convert (splat_vector (load)) to vlse with 0 stride.
We already do this for splat nodes that carry a VL, but not for
splats that use VLMAX.
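The equivalence behind this combine can be checked with a tiny Python model: a strided load with stride 0 reads the same address for every element, which is exactly a splat of a scalar load. The model is a sketch under simplifying assumptions: memory is a list indexed by element rather than by byte, masking is ignored, vl stands in for VLMAX, and vlse and splat_of_load are hypothetical helpers.

```python
def vlse(memory, base, stride, vl):
    """Strided load model: element i is read from memory[base + i * stride]."""
    return [memory[base + i * stride] for i in range(vl)]


def splat_of_load(memory, addr, vl):
    """splat_vector(load addr) model: one scalar load replicated vl times."""
    return [memory[addr]] * vl


mem = [10, 20, 30, 40]
print(vlse(mem, 2, 0, 4))        # [30, 30, 30, 30]
print(splat_of_load(mem, 2, 4))  # [30, 30, 30, 30]
```

Using the zero-stride load avoids moving the scalar through a general-purpose or FP register before splatting it.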
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D115483
Michał Górny [Wed, 1 Dec 2021 22:04:59 +0000 (23:04 +0100)]
[lldb] Introduce a FreeBSDKernel plugin for vmcores
Introduce a FreeBSDKernel plugin that provides the ability to read
FreeBSD kernel core dumps. The plugin utilizes libfbsdvmcore to provide
support for both "full memory dump" and minidump formats across a variety
of architectures supported by FreeBSD. It provides the ability to read
kernel memory, as well as the crashed thread status with registers
on arm64, i386 and x86_64.
Differential Revision: https://reviews.llvm.org/D114911
Alexander Batashev [Tue, 14 Dec 2021 17:01:52 +0000 (20:01 +0300)]
Disable issue labeler in LLVM forks
LLVM forks may use GitHub Actions just like the upstream project,
but they do not necessarily follow the same development processes.
Disable automatic issue labeling for forks, so that it does not
interfere with downstream repo automation.
Reviewed By: tstellar
Differential Revision: https://reviews.llvm.org/D115708
Zaara Syeda [Tue, 14 Dec 2021 16:45:43 +0000 (16:45 +0000)]
[LoopUnroll] Disable loop unroll when user explicitly asks for unroll-and-jam
If a loop isn't forced to be unrolled, we want to avoid unrolling it when there
is an explicit unroll-and-jam pragma. This is to prevent automatic unrolling
from interfering with the user requested transformation.
Differential Revision: https://reviews.llvm.org/D114886
Philip Reames [Tue, 14 Dec 2021 16:09:00 +0000 (08:09 -0800)]
Add FMF to hasPoisonGeneratingFlags/dropPoisonGeneratingFlags
These flags are documented as generating poison values for particular input values. As such, we should handle them consistently with how we handle nsw/nuw/exact/inbounds.
Differential Revision: https://reviews.llvm.org/D115460
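For the fast-math flags, the LangRef wording is that with nnan (or ninf), an argument or result that is NaN (or infinity) produces poison instead. The Python sketch below models that for fadd to show why dropping the flags removes the poison; POISON and fadd are hypothetical stand-ins, not LLVM APIs.

```python
import math

POISON = object()  # hypothetical sentinel for an LLVM poison value


def fadd(a, b, nnan=False, ninf=False):
    """Hypothetical model of `fadd [nnan] [ninf] a, b` with the poison rules."""
    if a is POISON or b is POISON:
        return POISON
    result = a + b
    operands_and_result = (a, b, result)
    if nnan and any(math.isnan(v) for v in operands_and_result):
        return POISON
    if ninf and any(math.isinf(v) for v in operands_and_result):
        return POISON
    return result


nan, inf = float("nan"), float("inf")
print(fadd(nan, 1.0, nnan=True) is POISON)  # True: flag present
print(fadd(nan, 1.0))                       # nan: flag dropped
```

This is the same shape as nsw on an overflowing add: a transform that makes such inputs reachable must drop the flag, which is what dropPoisonGeneratingFlags is for.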
Jing Bao [Tue, 14 Dec 2021 16:42:38 +0000 (08:42 -0800)]
[WebAssembly] Custom optimization for truncate
When possible, optimize TRUNCATE to generate Wasm SIMD narrow
instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u) rather than
generating lots of extract_lane and replace_lane instructions.
Closes #50350.
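The narrow instructions saturate rather than truncate, so a direct replacement would be wrong for out-of-range lanes. One way to make them usable, and an assumption about how this lowering works, is to mask each lane to its low bits first, after which saturation is a no-op. The Python sketch below checks that for i16x8.narrow_i32x4_u; the helper names are hypothetical, input lanes are signed i32 values, and results are u16 bit patterns.

```python
def narrow_i32x4_u(a, b):
    """i16x8.narrow_i32x4_u model: saturate each signed i32 lane to
    [0, 0xFFFF]; lanes of a come first, then lanes of b."""
    return [min(max(v, 0), 0xFFFF) for v in a + b]


def trunc_i32x8_to_i16x8(lanes):
    """Reference semantics of `trunc <8 x i32> to <8 x i16>` (low 16 bits)."""
    return [v & 0xFFFF for v in lanes]


def trunc_via_narrow(a, b):
    """Assumed lowering: mask to the low 16 bits (now in [0, 0xFFFF]), then narrow."""
    return narrow_i32x4_u([v & 0xFFFF for v in a], [v & 0xFFFF for v in b])


a = [-1, 70000, 5, -70000]
b = [0x12345678, 65535, 65536, -32768]
print(trunc_via_narrow(a, b) == trunc_i32x8_to_i16x8(a + b))  # True
print(narrow_i32x4_u(a, b)[0])  # 0: saturation alone clamps -1
```

Two vector ops per 128-bit half (an AND and a narrow) replace eight extract_lane/replace_lane pairs.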
Mark de Wever [Mon, 13 Dec 2021 17:30:41 +0000 (18:30 +0100)]
[libc++] Remove C++ version guards in the dylib.
The library is always built using C++20 so these guards are not needed.
Reviewed By: #libc, Quuxplusone, ldionne
Differential Revision: https://reviews.llvm.org/D115644
Aart Bik [Mon, 13 Dec 2021 20:51:34 +0000 (12:51 -0800)]
[mlir][sparse] fixed typos
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115667
Aart Bik [Tue, 14 Dec 2021 04:41:42 +0000 (20:41 -0800)]
[mlir][sparse] speed up sparse tensor file I/O by more than 2x
Data point, using the 3-dim tensor nell-2.tns:
MLIR:
  READ FILE INTO COO:   24424.369294 ms -> improves to -> 9638.501044 ms
  SORT COO BEFORE PACK:   762.834831 ms
  PACK COO TO TENSOR:    1243.376245 ms
TACO:
  b file read: 13270.9 ms
  b pack:       7137.74 ms
  b size: (12092 x 9184 x 28818), 925300328 bytes
https://github.com/llvm/llvm-project/issues/52679
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115696
Sanjay Patel [Tue, 14 Dec 2021 15:46:25 +0000 (10:46 -0500)]
[InstCombine] prevent infinite looping from opposing cmp and select transforms (PR52684)
As noted in the code comment, we might want to simply give up on this select
transform completely (given how many exceptions there are already and the
risk of future conflicts), but for now, carve out one more bailout to
avoid an infinite loop.
Fixes #52684:
https://github.com/llvm/llvm-project/issues/52684
Sanjay Patel [Mon, 13 Dec 2021 20:41:32 +0000 (15:41 -0500)]
[InstCombine] regenerate test checks; NFC
Sanjay Patel [Mon, 13 Dec 2021 20:18:10 +0000 (15:18 -0500)]
[InstCombine] convert static function to internal class function; NFC
The transform can require an optional shuffle instruction to be sound,
so we need to use Builder to create all values and then replace the
original instruction with whatever that final value is.