Guillaume Chatelet [Mon, 10 May 2021 07:53:48 +0000 (07:53 +0000)]
[libc] Allow target architecture customization
This patch provides a way to specify the default target cpu optimizations to use when compiling llvm-libc.
This ensures we don't rely on current compiler's default and allows compiling and cross compiling for a particular target.
Differential Revision: https://reviews.llvm.org/D101991
Pushpinder Singh [Fri, 7 May 2021 08:15:49 +0000 (08:15 +0000)]
[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails
This patch prevents runtime tests running on systems without amdgpu.
Reviewed By: protze.joachim, tianshilei1992
Differential Revision: https://reviews.llvm.org/D102054
Pushpinder Singh [Fri, 7 May 2021 11:56:46 +0000 (11:56 +0000)]
[amdgpu-arch] Guard hsa.h with __has_include
This patch is supposed to fix the issue of hsa.h not being found.
The issue was reported in D99949.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D102067
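A minimal sketch of the kind of guard this commit describes, assuming a compiler that provides `__has_include`. The `HAVE_HSA_HEADER` macro and the `hsaHeaderAvailable` helper are hypothetical names used for illustration only; they are not taken from the patch.

```cpp
// Hypothetical sketch: include hsa.h only when the preprocessor can find it.
// The outer check keeps compilers without __has_include from choking on it.
#if defined(__has_include)
#  if __has_include("hsa.h")
#    include "hsa.h"
#    define HAVE_HSA_HEADER 1
#  endif
#endif

#ifndef HAVE_HSA_HEADER
#  define HAVE_HSA_HEADER 0
#endif

// Reports at compile time whether the header was found (hypothetical helper).
constexpr bool hsaHeaderAvailable() { return HAVE_HSA_HEADER != 0; }
```

Code that needs the HSA API can then branch on `HAVE_HSA_HEADER` and fall back to a stub when the header is missing.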
Fraser Cormack [Fri, 7 May 2021 10:20:21 +0000 (11:20 +0100)]
[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.
We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102063
Adrian Kuegel [Mon, 10 May 2021 05:48:45 +0000 (07:48 +0200)]
[mlir] Fix compile error.
Inside a templated function, other class members need to be called with
this->.
Otherwise we get: explicit qualification required to use member
'setDebugName' from dependent base class.
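The rule behind this error can be sketched as follows. Only the member name `setDebugName` comes from the error message above; `PatternBase` and `MyPattern` are hypothetical stand-ins, not the MLIR classes involved.

```cpp
#include <string>

// Hypothetical base class template with the member named in the diagnostic.
template <typename T>
struct PatternBase {
  std::string debugName;
  void setDebugName(const std::string &name) { debugName = name; }
};

// The base class depends on T, so unqualified lookup does not search it.
template <typename T>
struct MyPattern : PatternBase<T> {
  void init() {
    // setDebugName("my-pattern");    // rejected: member of a dependent base
    this->setDebugName("my-pattern"); // OK: lookup is deferred to instantiation
  }
};
```

Writing `PatternBase<T>::setDebugName(...)` would also work; `this->` is usually the shorter fix.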
Jun Ma [Fri, 30 Apr 2021 02:30:37 +0000 (10:30 +0800)]
[AArch64][SVE] Remove index_vector node.
Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector and use step_vector for codegen directly.
Differential Revision: https://reviews.llvm.org/D101593
Lang Hames [Sun, 9 May 2021 18:20:54 +0000 (11:20 -0700)]
[ORC] Use the new dispatchTask API to run query callbacks.
Dispatching query callbacks, rather than running them on the current thread,
will allow them to be distributed across multiple threads.
Lang Hames [Sun, 9 May 2021 00:45:42 +0000 (17:45 -0700)]
[ORC] Generalize materialization dispatch to task dispatch.
Generalizing this API allows work to be distributed more evenly. In particular,
query callbacks can now be dispatched (rather than running immediately on the
thread that satisfied the query). This avoids the pathological case where an
operation on one thread satisfies many queries simultaneously, causing large
amounts of work to be run on that thread while other threads potentially sit
idle.
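The idea can be illustrated with a small, single-threaded sketch. This is not the ORC API: `TaskQueue`, `dispatch`, `runOne`, and `satisfyQueries` are hypothetical names, and a real dispatcher would hand tasks to worker threads.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical task queue standing in for a dispatcher.
class TaskQueue {
public:
  // Queue the task instead of running it on the caller's thread.
  void dispatch(std::function<void()> task) { tasks.push_back(std::move(task)); }

  // Run one pending task, as a worker would. Returns false when idle.
  bool runOne() {
    if (tasks.empty())
      return false;
    std::function<void()> task = std::move(tasks.front());
    tasks.pop_front();
    task();
    return true;
  }

  std::size_t pending() const { return tasks.size(); }

private:
  std::deque<std::function<void()>> tasks;
};

// Satisfying many queries at once only enqueues their callbacks, so the
// satisfying thread does not have to run all of them itself.
inline void satisfyQueries(TaskQueue &queue,
                           std::vector<std::function<void()>> callbacks) {
  for (auto &cb : callbacks)
    queue.dispatch(std::move(cb));
}
```

With several workers calling `runOne`, the queued callbacks would be spread across threads rather than piling up on the one that satisfied the queries.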
Teresa Johnson [Wed, 28 Apr 2021 22:20:04 +0000 (15:20 -0700)]
[SimplifyCFG] Ignore ephemeral values when counting insts for threading
Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).
Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimization decisions before FDO
matching, leading to some mismatches.
Differential Revision: https://reviews.llvm.org/D101494
Yuanfang Chen [Mon, 10 May 2021 02:04:07 +0000 (19:04 -0700)]
[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.
Zakk Chen [Wed, 5 May 2021 07:53:41 +0000 (15:53 +0800)]
[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter.
RISCVAsmPrinter already has MCSubtargetInfo.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D101889
Chia-hung Duan [Fri, 16 Apr 2021 05:34:10 +0000 (13:34 +0800)]
Support NativeCodeCall binding in rewrite pattern.
This makes it possible to bind the result of a native function while
rewriting a pattern. When matching a pattern, values can be returned by
passing a parameter as a return-value placeholder. In addition, this adds
semantics for '$_self' in NativeCodeCall while matching: it refers to the
operation that defines a given operand.
Differential Revision: https://reviews.llvm.org/D100746
Jez Ng [Mon, 10 May 2021 01:11:29 +0000 (21:11 -0400)]
[lld-macho] Add llvm-otool as a test dependency
This unbreaks my local build, which is configured to build only parts of
LLVM.
Nico Weber [Sun, 9 May 2021 22:35:16 +0000 (18:35 -0400)]
[lld/mac] Fix alignment on subsections
On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.
Fixes PR50274. (The same bug could have happened with -order_file
previously.)
Differential Revision: https://reviews.llvm.org/D102139
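A rough sketch of the arithmetic involved, assuming a subsection's alignment is the largest power of two that divides both the parent section's alignment and the subsection's offset. The helper `subsectionAlignment` is a hypothetical name and is not taken from the patch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper. Assumes sectionAlign is a power of two.
inline uint64_t subsectionAlignment(uint64_t sectionAlign, uint64_t offset) {
  if (offset == 0)
    return sectionAlign; // the first subsection inherits the full alignment
  // Isolate the lowest set bit: the largest power of two dividing offset.
  uint64_t lowestSetBit = offset & (~offset + 1);
  return std::min(sectionAlign, lowestSetBit);
}
```

Under this assumption, a subsection at offset 32 in a 16-byte-aligned section stays 16-byte aligned, while one at offset 24 is only 8-byte aligned.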
Jez Ng [Mon, 10 May 2021 00:05:45 +0000 (20:05 -0400)]
[lld-macho] Don't reference entry symbol for non-executables
This would cause us to pull in symbols (and code) that should
be unused.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102137
Tomasz Miąsko [Sun, 9 May 2021 20:38:13 +0000 (13:38 -0700)]
[Demangle][Rust] Print special namespaces
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101821
Roman Lebedev [Sun, 9 May 2021 20:45:44 +0000 (23:45 +0300)]
[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction
As measured by exegesis, and confirmed by ref docs.
Roman Lebedev [Sun, 9 May 2021 20:28:08 +0000 (23:28 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking
Roman Lebedev [Sun, 9 May 2021 20:14:17 +0000 (23:14 +0300)]
[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction
As confirmed by exegesis measurements, and ref docs.
It does actually execute.
While there, bump latency for MULX32rr, that seems to match measurements.
Roman Lebedev [Sun, 9 May 2021 20:14:12 +0000 (23:14 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking
Roman Lebedev [Sun, 9 May 2021 19:43:30 +0000 (22:43 +0300)]
[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms
As measured by exegesis and confirmed in reference docs.
Roman Lebedev [Sun, 9 May 2021 19:27:16 +0000 (22:27 +0300)]
[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests
David Green [Sun, 9 May 2021 20:57:55 +0000 (21:57 +0100)]
[ARM] Fix postinc of vst1xN
These nodes are not handled correctly by CombineBaseUpdate. For the
moment, similar to 5f1cad4d296a20025f0b, mark them as unsupported.
Nikita Popov [Sat, 1 May 2021 14:59:06 +0000 (16:59 +0200)]
[SCEV] Handle and/or in applyLoopGuards()
applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.
Differential Revision: https://reviews.llvm.org/D101692
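The recursion rule can be sketched on a toy condition tree. None of this is SCEV code: `Cond`, `leaf`, `binary`, and `collectFacts` are hypothetical names. The point is only that an `and` decomposes on the true edge and an `or` decomposes on the false edge.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy condition tree (hypothetical, for illustration only).
struct Cond {
  enum Kind { Leaf, And, Or } kind;
  std::string name;               // used by Leaf
  std::shared_ptr<Cond> lhs, rhs; // used by And / Or
};
using CondPtr = std::shared_ptr<Cond>;

inline CondPtr leaf(std::string n) {
  return std::make_shared<Cond>(Cond{Cond::Leaf, std::move(n), nullptr, nullptr});
}
inline CondPtr binary(Cond::Kind k, CondPtr l, CondPtr r) {
  return std::make_shared<Cond>(Cond{k, "", std::move(l), std::move(r)});
}

// Collect the leaf conditions whose value is known once the guard
// condition `c` is known to evaluate to `value` on the taken edge.
inline void collectFacts(const CondPtr &c, bool value,
                         std::vector<std::pair<std::string, bool>> &facts) {
  if (c->kind == Cond::Leaf) {
    facts.emplace_back(c->name, value);
    return;
  }
  // (a and b) == true implies both are true;
  // (a or b) == false implies both are false.
  bool decomposes = (c->kind == Cond::And && value) ||
                    (c->kind == Cond::Or && !value);
  if (!decomposes)
    return; // nothing is known about the individual operands
  collectFacts(c->lhs, value, facts);
  collectFacts(c->rhs, value, facts);
}
```

For a guard `a and (b and c)` taken on its true edge, the sketch yields three facts; on the false edge it yields none, since any one operand could be the false one.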
Nikita Popov [Sun, 9 May 2021 19:21:54 +0000 (21:21 +0200)]
[SCEV] Add additional loop guard and/or tests (NFC)
Add tests for and/and, and/or, or/or, or/and combinations.
Roman Lebedev [Sun, 9 May 2021 17:37:30 +0000 (20:37 +0300)]
[NFC][X86] Znver3: drop obsolete fixme
Roman Lebedev [Sun, 9 May 2021 14:32:37 +0000 (17:32 +0300)]
[X86] AMD Zen 3: XCHG is a zero-cycle instruction
As measured by exegesis and confirmed by reference docs.
LemonBoy [Sun, 9 May 2021 16:51:05 +0000 (18:51 +0200)]
[SelectionDAG] Regenerate test checks (NFC)
Nikita Popov [Sun, 9 May 2021 16:20:37 +0000 (18:20 +0200)]
[SROA] Regenerate test checks (NFC)
Mark de Wever [Sun, 9 May 2021 15:55:50 +0000 (17:55 +0200)]
[libc++][doc] Update the Format library status.
- Move LWG-3218 to the chrono section.
- Mark several parts 'In progress'.
Greg McGary [Sat, 8 May 2021 18:42:15 +0000 (11:42 -0700)]
[lld-macho][NFC] Purge stale test-output trees prior to split-file
Enforce standard practice
Differential Revision: https://reviews.llvm.org/D102112
Roman Lebedev [Sat, 8 May 2021 21:57:59 +0000 (00:57 +0300)]
[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on steroids" idiom
Roman Lebedev [Sat, 8 May 2021 17:42:14 +0000 (20:42 +0300)]
[NFCI][X86] Mark Znver3 scheduling model as complete
To the best of my knowledge, all instructions are modelled,
and have reasonable values to them; flipping the switch
doesn't cause any diff for MCA tests, so either we're good,
or we have test coverage gaps.
I'm not really sure why no other X86 sched model is marked as complete.
Roman Lebedev [Sat, 8 May 2021 17:39:26 +0000 (20:39 +0300)]
[NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes
Fangrui Song [Sat, 8 May 2021 20:41:36 +0000 (13:41 -0700)]
[test] Fix tools/gold/X86/new-pm.ll after D101797
Krzysztof Parzyszek [Fri, 7 May 2021 17:51:10 +0000 (12:51 -0500)]
[Hexagon] Propagate metadata in Hexagon Vector Combine
Andrea Di Biagio [Sat, 8 May 2021 18:41:56 +0000 (19:41 +0100)]
[llvm-mca][View] Update the Register File statistics.
Correctly track the number of moves eliminated in the
Register File statistics.
Greg McGary [Sat, 8 May 2021 01:05:47 +0000 (18:05 -0700)]
[lld-macho] Explicitly undefine literal exported symbols
Symbols explicitly exported via command-line options `--exported_symbol SYM` and `--exported_symbols_list FILE` must be defined. Before this fix, lazy symbols defined in archives would be left to languish. We now force them to be included in the linked output.
Differential Revision: https://reviews.llvm.org/D102100
Andrea Di Biagio [Sat, 8 May 2021 16:58:46 +0000 (17:58 +0100)]
[MCA][RegisterFile] Refactor the move elimination logic to address PR50258.
This patch lifts the restriction on the number of read/write registers for a
move elimination candidate. With this patch, move elimination candidates with
exactly two reads and two writes are treated like register swap operations for
the purpose of move elimination.
This patch currently doesn't affect any upstream model. However, it should help
unblock the progress on PR50258.
Nico Weber [Sat, 8 May 2021 17:03:17 +0000 (13:03 -0400)]
[lld/mac] Copy some of the commit message of d5a70db193 into a comment
Louis Dionne [Sat, 8 May 2021 16:15:30 +0000 (12:15 -0400)]
[libc++] NFC: Refactor Lit annotations
Annotations for c++03 mode are useless, since we only run these tests
in C++11 and C++14.
Florian Hahn [Sun, 11 Apr 2021 10:41:48 +0000 (11:41 +0100)]
[VPlan] Add test for sink scalars and merging using VPlan.
Add a couple of tests with scalars that can be sunk to their predicated
users.
This pre-commits tests for D100258.
Simon Pilgrim [Sat, 8 May 2021 15:22:46 +0000 (16:22 +0100)]
[GlobalISel] Ensure MachineIRBuilder::getDebugLoc() returns a const reference. NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
Simon Pilgrim [Sat, 8 May 2021 15:19:18 +0000 (16:19 +0100)]
[X86] combineHorizOpWithShuffle - generalize HOP(SHUFFLE(X),SHUFFLE(Y)) -> SHUFFLE(HOP(X,Y)) fold.
For 128-bit types, generalize the fold to recognise duplicate operands in either shuffle.
Louis Dionne [Fri, 7 May 2021 14:15:36 +0000 (10:15 -0400)]
[libc++] Move handling of the target triple to the DSL
This fixes a long standing issue where the triple is not always set
consistently in all configurations. This change also moves the
back-deployment Lit features to using the proper target triple
instead of using something ad-hoc.
This will be necessary for using from scratch Lit configuration files
in both normal testing and back-deployment testing.
Differential Revision: https://reviews.llvm.org/D102012
Vinayaka Bandishti [Sat, 8 May 2021 14:42:23 +0000 (20:12 +0530)]
[MLIR] Add memref dialect dependency for affine fusion pass
For `AffineLoopFusion` pass, add `memref` dialect as a dependent
dialect. Since the fusion pass can create `memref::AllocOp`s, the
dialect must be registered in its dependent dialects.
The missing dependency was not discovered until now because the op
creation mentioned above happens only when the input already has
`memref::AllocOp`s in it, and all dialects in the input are
automatically added to the context.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D102104
Uday Bondhugula [Sat, 8 May 2021 13:15:14 +0000 (18:45 +0530)]
[MLIR][NFC] Remove unused MLIRContext declaration
Remove unused MLIRContext declaration. NFC.
Differential Revision: https://reviews.llvm.org/D102103
Roman Lebedev [Sat, 8 May 2021 12:42:11 +0000 (15:42 +0300)]
Revert "[LICM] Hoist loads with invariant.group metadata"
This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with a BUS error; it is compiled into testb,
and the address it is testing is nonsensical.
This reverts commit 4c89bcadf6cae8320a1925eb9cbeb8c8c1f5f58b.
Saurabh Jha [Sat, 8 May 2021 12:24:05 +0000 (13:24 +0100)]
Test commit to check commit access
Roman Lebedev [Sat, 8 May 2021 12:15:41 +0000 (15:15 +0300)]
[X86] Improve costmodel for scalar byte swaps
Currently we model i16 bswap as very high cost (`10`),
which doesn't seem right, with all other being at `1`.
Regardless of `MOVBE`, i16 reg-reg bswap is lowered into
(an extending move plus) rot-by-8:
https://godbolt.org/z/8jrq7fMTj
I think it should at worst have throughput of `1`:
Since i32/i64 already have cost of `1`,
`MOVBE` doesn't improve their costs any further.
BUT, `MOVBE` must have at least a single memory operand,
with other being a register. Which means, if we have
a bswap of load, iff load has a single use,
we'll fold bswap into load.
Likewise, if we have store of a bswap, iff bswap
has a single use, we'll fold bswap into store.
So i think we should treat such a bswap as free,
unless of course we know that for the particular CPU
they are performing badly.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D101924
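The claim that a 16-bit byte swap is just a rotate by 8 can be checked with a short sketch. The helper names `bswap16` and `rotl16` are illustrative and do not come from the patch.

```cpp
#include <cstdint>

// Swap the two bytes of a 16-bit value.
inline uint16_t bswap16(uint16_t v) {
  return static_cast<uint16_t>(((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8));
}

// Rotate a 16-bit value left by n bits (n is taken modulo 16).
inline uint16_t rotl16(uint16_t v, unsigned n) {
  n &= 15;
  if (n == 0)
    return v;
  return static_cast<uint16_t>((v << n) | (v >> (16 - n)));
}
```

For every 16-bit input, `bswap16(v)` and `rotl16(v, 8)` produce the same value, which is why a single rotate is enough for the register-to-register case.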
Louis Dionne [Fri, 7 May 2021 17:14:57 +0000 (13:14 -0400)]
[libc++] Use Xcode's CMake if it's present
This resolves issues when the CMake in use on the host is too old to
configure libc++ properly, but Xcode has a sufficiently recent version.
It is technically possible for the reverse issue to happen, where the
Xcode version would be too old and the user-installed version would be
better, however in the context of our build bots, we use AppleClang on
Apple platforms, and the CMake shipped with Xcode should work with the
AppleClang shipped alongside that Xcode.
Differential Revision: https://reviews.llvm.org/D102083
Qiu Chaofan [Sat, 8 May 2021 10:13:05 +0000 (18:13 +0800)]
[VectorCombine] Simplify to scalar store if only one element updated
This patch simplifies load-insertelt-store pattern into
getelementptr-store.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D98240
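A C++ analogy for the transformation, as a sketch only: the patch operates on LLVM IR, and `Vec4`, `updateViaVector`, and `updateViaScalar` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Before: load the whole vector, replace one lane, store the whole vector.
inline void updateViaVector(Vec4 *p, std::size_t idx, int value) {
  Vec4 tmp = *p;    // load
  tmp[idx] = value; // insertelement
  *p = tmp;         // store
}

// After: compute the element's address and store only that scalar.
inline void updateViaScalar(Vec4 *p, std::size_t idx, int value) {
  int *elt = p->data() + idx; // getelementptr
  *elt = value;               // scalar store
}
```

Both functions leave memory in the same state for an in-range index; the second touches only the element that changed.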
Butygin [Sat, 10 Apr 2021 16:38:11 +0000 (19:38 +0300)]
[mlir] Debug print pattern before and after matchAndRewrite call
Motivation: we have passes with a lot of rewrites, and when one of them segfaults or asserts, it is very hard to find exactly which pattern failed without debug info.
Differential Revision: https://reviews.llvm.org/D101443
Xiang1 Zhang [Sat, 8 May 2021 05:46:51 +0000 (13:46 +0800)]
[X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
Arthur Eubanks [Sat, 8 May 2021 06:18:44 +0000 (23:18 -0700)]
Fix build after 34a8a437b
Xiang1 Zhang [Sat, 8 May 2021 05:43:32 +0000 (13:43 +0800)]
Revert "[X86] Support AMX fast register allocation"
This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.
Xiang1 Zhang [Fri, 7 May 2021 02:46:52 +0000 (10:46 +0800)]
[X86] Support AMX fast register allocation
Michael Liao [Sat, 8 May 2021 05:09:15 +0000 (01:09 -0400)]
Replace a remaining CRLF with LF. NFC.
Arthur Eubanks [Mon, 3 May 2021 23:09:56 +0000 (16:09 -0700)]
[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
RamNalamothu [Sat, 8 May 2021 04:45:49 +0000 (10:15 +0530)]
[DebugInfo] UnwindTable::create() should not add empty rows to CFI unwind table
UnwindTable::parseRows() may return successfully if the CFIProgram has either
no CFI instructions or only DW_CFA_nop instructions and the UnwindRow return
argument will be empty. But currently, the callers are not checking for this case
which is leading to incorrect dumps in the unwind tables in such cases i.e.
CFA=unspecified
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D101892
River Riddle [Sat, 8 May 2021 02:30:25 +0000 (19:30 -0700)]
[mlir] Refactor the representation of function-like argument/result attributes.
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in-practice. The design requires building a string every time we lookup the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.
This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.
This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
Differential Revision: https://reviews.llvm.org/D102035
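The trade-off between the two representations can be sketched with standard containers. This is not MLIR code: `AttrDict`, `SparseArgAttrs`, and `DenseArgAttrs` are hypothetical stand-ins for the dictionary and array attributes described above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using AttrDict = std::map<std::string, std::string>;

// Old scheme: one entry per argument, keyed by a generated name like "arg0".
struct SparseArgAttrs {
  std::map<std::string, AttrDict> entries;
  const AttrDict *lookup(std::size_t index) const {
    // A key string has to be built on every lookup.
    auto it = entries.find("arg" + std::to_string(index));
    return it == entries.end() ? nullptr : &it->second;
  }
};

// New scheme: one array holding a (possibly empty) dictionary per argument.
struct DenseArgAttrs {
  std::vector<AttrDict> perArg;
  const AttrDict *lookup(std::size_t index) const {
    // Direct indexing, at the cost of storing empty dictionaries.
    return index < perArg.size() ? &perArg[index] : nullptr;
  }
};
```

The dense form pays for an entry even when an argument has no attributes, which mirrors the memory downside noted in the commit message.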
Arthur Eubanks [Sat, 8 May 2021 01:11:21 +0000 (18:11 -0700)]
[lit] Bump up the Windows process cap from 32 to 60
At 61 or over, I see messages like
File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 64
60 seems to work for me.
If this causes issues for anybody else, feel free to revert.
River Riddle [Sat, 8 May 2021 00:55:52 +0000 (17:55 -0700)]
[mlir] Add hover support to mlir-lsp-server
This provides information when the user hovers over a part of the source .mlir file. This revision adds the following hover behavior:
* Operation:
- Shows the generic form.
* Operation Result:
- Shows the parent operation name, result number(s), and type(s).
* Block:
- Shows the parent operation name, block number, predecessors, and successors.
* Block Argument:
- Shows the parent operation name, parent block, argument number, and type.
Differential Revision: https://reviews.llvm.org/D101113
Arthur Eubanks [Sat, 8 May 2021 01:00:11 +0000 (18:00 -0700)]
Revert "lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162"
This reverts commit d319005a3746a7661c8c9a3302266b6ff7cf61be.
Causing messages like:
File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 74
Arthur Eubanks [Sat, 8 May 2021 00:54:32 +0000 (17:54 -0700)]
[gn build] Manually port 5b158093e
thomasraoux [Sat, 8 May 2021 00:10:35 +0000 (17:10 -0700)]
[mlir][vector] Fix warning
Previous change caused another warning in some build configuration:
"default label in switch which covers all enumeration values"
Amara Emerson [Fri, 7 May 2021 00:14:04 +0000 (17:14 -0700)]
[AArch64][GlobalISel] Create a new minimal combiner pass just for -O0.
We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).
This also removes the CSE option since we don't need it for -O0.
Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35% give or take.
Differential Revision: https://reviews.llvm.org/D102038
Amara Emerson [Wed, 5 May 2021 18:37:00 +0000 (11:37 -0700)]
[GlobalISel] Don't form zero/sign extending loads for atomics.
For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.
Differential Revision: https://reviews.llvm.org/D101932
Weston Carvalho [Thu, 29 Apr 2021 21:30:47 +0000 (14:30 -0700)]
Make `hasTypeLoc` matcher support more node types.
Differential Revision: https://reviews.llvm.org/D101572
Weston Carvalho [Fri, 7 May 2021 23:32:57 +0000 (00:32 +0100)]
NFC: Move TypeList implementation up the file
This will make it possible for more code to use it.
Arthur Eubanks [Fri, 7 May 2021 21:32:40 +0000 (14:32 -0700)]
[NewPM] Move analysis invalidation/clearing logging to instrumentation
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D102093
Jessica Paquette [Tue, 20 Apr 2021 23:07:54 +0000 (16:07 -0700)]
[AArch64][GlobalISel] Legalize narrow type G_CTPOPs
Using `clampScalar` here because we ought to mark s128 as custom eventually.
(Right now, it will just fall back.)
With this legalization, we get the same code as SDAG:
https://godbolt.org/z/TneoPKrKG
Differential Revision: https://reviews.llvm.org/D100908
Adrian Prantl [Fri, 7 May 2021 21:44:45 +0000 (14:44 -0700)]
Fix the module-enabled build by removing a redundant type definition.
Petr Hosek [Fri, 7 May 2021 06:05:39 +0000 (23:05 -0700)]
[BareMetal] Ensure that sysroot always comes after library paths
This addresses an issue introduced in D91559. We would invoke the
compiler with -Lpath/to/lib --sysroot=path/to/sysroot where both
locations contain libraries with the same name, but we expect linker
to pick up the library in path/to/lib since that version is more
specialized. This was the case before D91559 where the sysroot path
would be ignored, but after that change linker would now pick up the
library from the sysroot which resulted in unexpected behavior.
The sysroot path should always come after any user provided library
paths, followed by compiler runtime paths. We want for libraries in user
provided library paths to always take precedence over sysroot libraries.
This matches the behavior of other toolchains used with other targets.
Differential Revision: https://reviews.llvm.org/D102049
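The intended precedence can be sketched with an in-memory model of the search. Everything here is hypothetical (`FakeFs`, `searchOrder`, `resolve`); it only illustrates that the first directory in the list containing the library wins, so user paths must precede the sysroot.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory file system: directory -> library names it contains.
using FakeFs = std::map<std::string, std::set<std::string>>;

// User-provided -L paths first, then the sysroot, then compiler runtime paths.
inline std::vector<std::string>
searchOrder(const std::vector<std::string> &userPaths,
            const std::string &sysrootLibDir,
            const std::vector<std::string> &runtimePaths) {
  std::vector<std::string> order(userPaths);
  order.push_back(sysrootLibDir);
  order.insert(order.end(), runtimePaths.begin(), runtimePaths.end());
  return order;
}

// Return the first match in search order, or an empty string if none.
inline std::string resolve(const FakeFs &fs,
                           const std::vector<std::string> &order,
                           const std::string &lib) {
  for (const std::string &dir : order) {
    auto it = fs.find(dir);
    if (it != fs.end() && it->second.count(lib))
      return dir + "/" + lib;
  }
  return "";
}
```

With the same library name present in both `path/to/lib` and the sysroot, this ordering resolves to the copy in `path/to/lib`.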
Nico Weber [Thu, 6 May 2021 18:47:57 +0000 (14:47 -0400)]
[lld/mac] Write every weak symbol only once in the output
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function to the output. With this
patch, it only writes one copy.
Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:
    N           Min           Max        Median           Avg        Stddev
x  10     3.9957051     4.3496981     4.1411121      4.156837    0.10092097
+  10      3.908154      4.169318     3.9712729     3.9846753   0.075773012
Difference at 95.0% confidence
-0.172162 +/- 0.083847
-4.14165% +/- 2.01709%
(Student's t, pooled s = 0.0892373)
Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)
Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
(that is, catch block unwind information) and Personality Routines
associated with weak functions still not stripped. This is wasteful,
but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
correctness and not just for size)
- This nopes out on InputSections that are referenced from more than
one symbol (eg from .alt_entry) for now
Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
found it a bit more explicit)
- exports
Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).
This patch is useful in itself, but it's also likely a useful foundation
for dead_strip.
I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.
Differential Revision: https://reviews.llvm.org/D102076
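The merging scheme described in this commit can be sketched as follows. Only the `canOmitFromOutput` flag name comes from the message; the `InputSection`, `SymbolTable`, and `sectionsToWrite` definitions are simplified, hypothetical stand-ins, and the sketch leaves out the check for sections referenced from other symbols.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for an input section.
struct InputSection {
  std::string file;
  bool canOmitFromOutput;
};

// Simplified symbol table: the first weak definition of a name wins.
struct SymbolTable {
  std::map<std::string, InputSection *> symbols;

  void addWeak(const std::string &name, InputSection *isec) {
    bool inserted = symbols.emplace(name, isec).second;
    if (!inserted)
      isec->canOmitFromOutput = true; // duplicate copy, not in the table
  }
};

// The writer skips every section that was marked as omittable.
inline std::vector<InputSection *>
sectionsToWrite(const std::vector<InputSection *> &all) {
  std::vector<InputSection *> kept;
  for (InputSection *isec : all)
    if (!isec->canOmitFromOutput)
      kept.push_back(isec);
  return kept;
}
```

An inline function defined in two object files then contributes one section to the output instead of two.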
thomasraoux [Fri, 7 May 2021 20:57:34 +0000 (13:57 -0700)]
[mlir] Missed clang-format
thomasraoux [Fri, 7 May 2021 20:44:33 +0000 (13:44 -0700)]
[mlir][vector] Extend pattern to trim lead unit dimension to Splat Op
Differential Revision: https://reviews.llvm.org/D102091
Petr Hosek [Fri, 7 May 2021 20:38:04 +0000 (13:38 -0700)]
Revert "[BareMetal] Ensure that sysroot always comes after library paths"
This reverts commit 6b00b34b8a05896f79b18a1963811299b83d5b21.
Florian Hahn [Fri, 7 May 2021 20:30:54 +0000 (21:30 +0100)]
[LV] Remove reference of PHI from comment, they are not recorded (NFC).
The comment incorrectly states that the PHI is recorded. That's not
accurate, only the recipe for the incoming value is recorded.
Suggested post-commit for 4ba8720f8844.
Andrea Di Biagio [Fri, 7 May 2021 19:20:03 +0000 (20:20 +0100)]
[MCA][RegisterFile] Fix register class check for move elimination (PR50265)
The register file should always check if the destination register is from a
register class that allows move elimination.
Before this change, the check on the register class was only performed in a few
very specific cases. However, it should have always been performed.
This patch fixes the issue.
Note that none of the upstream scheduling models is currently affected by this
bug, so there is no test for it. The issue was found by Roman while working on
the znver3 model. I was able to reproduce the issue locally by tweaking the
btver2 model. I then verified that this patch fixes the issue.
Olivier Goffart [Fri, 7 May 2021 20:23:53 +0000 (13:23 -0700)]
[SEH] Fix regression with SEH in noexcept functions
Commit 5baea0560160a693b19022c5d0ba637b6b46b2d8 set the CurCodeDecl
because it was needed to pass the assert in CodeGenFunction::EmitLValueForLambdaField.
But this was not the right thing to do, as CodeGenFunction::FinishFunction passes it to EmitEndEHSpec,
which causes corruption of the EHStack.
Revert the part of the commit that changes the CurCodeDecl, and instead
adjust the assert to check for a null CurCodeDecl.
Differential Revision: https://reviews.llvm.org/D102027
Florian Hahn [Fri, 7 May 2021 20:05:58 +0000 (21:05 +0100)]
[LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.
Discussed post-commit for ccebf7a1096a.
Florian Hahn [Fri, 7 May 2021 19:21:36 +0000 (20:21 +0100)]
[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.
Suggested post-commit for ccebf7a1096a.
peter klausler [Thu, 6 May 2021 20:50:12 +0000 (13:50 -0700)]
[flang] Implement NORM2 in the runtime
Implement the reduction transformational intrinsic function NORM2 in
the runtime, using infrastructure already in place for MAXVAL & al.
Differential Revision: https://reviews.llvm.org/D102024
Petr Hosek [Fri, 7 May 2021 06:05:39 +0000 (23:05 -0700)]
[BareMetal] Ensure that sysroot always comes after library paths
This addresses an issue introduced in D91559. We would invoke the
compiler with -Lpath/to/lib --sysroot=path/to/sysroot where both
locations contain libraries with the same name, but we expect linker
to pick up the library in path/to/lib since that version is more
specialized. This was the case before D91559 where the sysroot path
would be ignored, but after that change linker would now pick up the
library from the sysroot which resulted in unexpected behavior.
The sysroot path should always come after any user provided library
paths, followed by compiler runtime paths. We want for libraries in user
provided library paths to always take precedence over sysroot libraries.
This matches the behavior of other toolchains used with other targets.
Differential Revision: https://reviews.llvm.org/D102049
Hsiangkai Wang [Fri, 7 May 2021 07:18:11 +0000 (15:18 +0800)]
[RISCV] Consider scalar types for required extensions.
We have vector operations on double vector and float scalar. For
example, vfwadd.wf is such an instruction.
vfloat64m1_t vfwadd_wf(vfloat64m1_t op0, float op1, size_t op2);
We should specify F and D extensions for it.
Differential Revision: https://reviews.llvm.org/D102051
Vyacheslav Zakharin [Fri, 7 May 2021 19:42:04 +0000 (12:42 -0700)]
An attempt to abandon omptarget out-of-tree builds.
I want to start using LLVM component libraries in libomptarget
to stop duplicating implementations already available in LLVM
(e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM
in all libomptarget builds one has to provide fallback implementation
for each used LLVM feature.
This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget.
I understand that I may need to revert this,
if this affects downstream projects in a bad way.
Differential Revision: https://reviews.llvm.org/D101509
Alexander Belyaev [Fri, 7 May 2021 19:20:55 +0000 (21:20 +0200)]
[mlir] Add a pattern to bufferize std.index_cast.
Differential Revision: https://reviews.llvm.org/D102088
Alexander Belyaev [Fri, 7 May 2021 19:21:54 +0000 (21:21 +0200)]
[mlir] Add a pattern to bufferize linalg.tensor_reshape.
Differential Revision: https://reviews.llvm.org/D102089
Emilio Cota [Fri, 7 May 2021 19:23:01 +0000 (19:23 +0000)]
[mlir][docs] remove stale statement about index type in vectors
b614ada0e8 ("[mlir] add support for index type in vectors.") removed
this limitation.
Differential Revision: https://reviews.llvm.org/D102081
Arthur Eubanks [Fri, 7 May 2021 19:05:16 +0000 (12:05 -0700)]
Revert "[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST"
This reverts commit 0791f968fee259e5c34523167bd58179b8b081c2.
Causing crashes: https://crbug.com/1206764
Florian Hahn [Fri, 7 May 2021 18:39:05 +0000 (19:39 +0100)]
[SCEV] Be more careful when traversing phis in isImpliedViaMerge.
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
IncV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works for irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if it will catch
all cases and I would appreciate a thorough second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
Thomas Lively [Fri, 7 May 2021 18:50:19 +0000 (11:50 -0700)]
[WebAssembly] Use functions instead of macros for const SIMD intrinsics
To improve hygiene, consistency, and usability, it would be good to replace all
the macro intrinsics in wasm_simd128.h with functions. The reason for using
macros in the first place was to enforce the use of constants for some arguments
using `_Static_assert` with `__builtin_constant_p`. This commit switches to
using functions and uses the `__diagnose_if__` attribute rather than
`_Static_assert` to enforce constantness.
The remaining macro intrinsics cannot be made into functions until the builtin
functions they are implemented with can be replaced with normal code patterns
because the builtin functions themselves require that their arguments are
constants.
This commit also fixes a bug with the const_splat intrinsics in which the f32x4
and f64x2 variants were incorrectly producing integer vectors.
Differential Revision: https://reviews.llvm.org/D102018
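A minimal sketch of enforcing constant arguments on a function with Clang's `diagnose_if` attribute. The `REQUIRE_CONSTANT` macro and the `scaleBy` function are hypothetical names for illustration and are not copied from wasm_simd128.h. On compilers without the attribute, the macro expands to nothing and no check is made.

```cpp
// Hypothetical macro: emit a compile error when `c` is not a constant.
#if defined(__has_attribute)
#  if __has_attribute(diagnose_if)
#    define REQUIRE_CONSTANT(c)                                               \
       __attribute__((__diagnose_if__(!__builtin_constant_p(c),               \
                                      #c " must be constant", "error")))
#  endif
#endif
#ifndef REQUIRE_CONSTANT
#  define REQUIRE_CONSTANT(c) // no check available on this compiler
#endif

// A real function rather than a macro: arguments are evaluated once and
// type-checked, while `factor` is still required to be a constant.
static inline int scaleBy(int x, int factor) REQUIRE_CONSTANT(factor) {
  return x * factor;
}
```

With Clang, calling `scaleBy(x, 4)` compiles, while passing a runtime variable as `factor` is expected to be rejected at the call site.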
Fangrui Song [Fri, 7 May 2021 18:42:16 +0000 (11:42 -0700)]
[unittest] Fix -Wunused-variable after D94717
Krzysztof Parzyszek [Fri, 7 May 2021 17:52:20 +0000 (12:52 -0500)]
Allow empty value list in propagateMetadata(Inst, ArrayOf...)
This will allow writing
propagateMetadata(Inst, collectInterestingValues(...))
without concern about empty lists. In case of an empty list,
Inst is returned without any changes.
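The empty-list behavior can be sketched on a toy model. This is not the LLVM implementation: `Inst` and `propagateMetadataSketch` are hypothetical, and the sketch simply keeps the metadata kinds shared by all values, as an assumed stand-in for the real merging rules.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Toy instruction carrying a list of metadata kind names (hypothetical).
struct Inst {
  std::vector<std::string> metadata;
};

// Sets inst's metadata to the kinds present on every value in the list.
// With an empty list, inst is returned untouched.
inline Inst *propagateMetadataSketch(Inst *inst,
                                     const std::vector<const Inst *> &values) {
  if (values.empty())
    return inst; // nothing to propagate
  std::vector<std::string> common = values.front()->metadata;
  for (const Inst *v : values) {
    std::vector<std::string> kept;
    for (const std::string &kind : common)
      if (std::find(v->metadata.begin(), v->metadata.end(), kind) !=
          v->metadata.end())
        kept.push_back(kind);
    common = std::move(kept);
  }
  inst->metadata = std::move(common);
  return inst;
}
```

Because the empty case is a no-op, a caller can pass the result of a collection helper straight through without checking its size first.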
Fangrui Song [Fri, 7 May 2021 18:15:43 +0000 (11:15 -0700)]
Internalize some cl::opt global variables or move them under namespace llvm
Louis Dionne [Fri, 7 May 2021 17:57:07 +0000 (13:57 -0400)]
[libc++][ci] Run longer CI jobs first
Jobs that test with a more recent standard version run more tests, so
they take longer. We'll decrease the average latency by running them
first instead of last.
Saleem Abdulrasool [Fri, 7 May 2021 17:18:28 +0000 (10:18 -0700)]
lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162
Revert the 32-process cap on Windows. When testing with Swift, we found
that there was a time reduction for testing with the higher load. This
should hopefully not matter much in practice. In the case that the
original problem with python remains with a high subprocess count, we
can easily revert this change.
Roman Lebedev [Fri, 7 May 2021 17:05:30 +0000 (20:05 +0300)]
[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile
Roman Lebedev [Fri, 7 May 2021 16:36:37 +0000 (19:36 +0300)]
[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move
It measures as such, and the reference docs agree.
I can't easily add a MCA test, because there's no mnemonic for it,
it can only be disassembled or created as a MCInst.