Wang, Pengfei [Mon, 27 Sep 2021 00:44:44 +0000 (08:44 +0800)]
[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110336
Lang Hames [Mon, 27 Sep 2021 00:56:47 +0000 (17:56 -0700)]
[ORC] Fix SimpleRemoteEPC data races.
Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup
until the client has been configured. This avoids races on client members if the
first message arrives while the client is being configured.
Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.
Amara Emerson [Mon, 27 Sep 2021 00:24:58 +0000 (17:24 -0700)]
[GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.
Mehdi Amini [Sun, 26 Sep 2021 22:04:41 +0000 (22:04 +0000)]
Fix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC)
Mehdi Amini [Sun, 26 Sep 2021 22:01:19 +0000 (22:01 +0000)]
Fix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC)
Lang Hames [Sun, 26 Sep 2021 21:14:41 +0000 (14:14 -0700)]
[MCJIT] This test shouldn't require an unwind table.
This should fix the failures on the Fuchsia bot that started in
https://lab.llvm.org/buildbot/#/builders/98/builds/6401.
Michał Górny [Sat, 25 Sep 2021 10:31:25 +0000 (12:31 +0200)]
[lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()
Replace the uses of StringConvert combined with hand-rolled array
splitting with llvm::StringRef.split() and llvm::to_integer().
Differential Revision: https://reviews.llvm.org/D110472
Nikita Popov [Sun, 26 Sep 2021 19:21:13 +0000 (21:21 +0200)]
[BasicAA] Don't check whether GEP is sized (NFC)
GEPs are required to have sized source element type, so we can
just assert that here.
Simon Pilgrim [Sun, 26 Sep 2021 18:27:38 +0000 (19:27 +0100)]
[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
Simon Pilgrim [Sun, 26 Sep 2021 17:42:51 +0000 (18:42 +0100)]
[X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
For 128-bit vectors, we can remove a PACK of an EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
Lang Hames [Sun, 26 Sep 2021 18:02:37 +0000 (11:02 -0700)]
[ORC] Remove OrcRemoteTargetClient and OrcRemoteTargetServer.
Now that the lli and lli-child-target tools have been updated to use
SimpleRemoteEPC (6498b0e991b) the OrcRemoteTarget* APIs are no longer needed.
Once the LLJITWithRemoteDebugging example has been migrated to SimpleRemoteEPC
we will remove OrcRPCExecutorProcessControl, and the ORC RPC system itself.
Lang Hames [Sun, 26 Sep 2021 18:15:42 +0000 (11:15 -0700)]
[ORC] Export process symbols in lli-child-target.
We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
LLVM GN Syncbot [Sun, 26 Sep 2021 17:25:08 +0000 (17:25 +0000)]
[gn build] Port 6498b0e991ba
Lang Hames [Sun, 26 Sep 2021 00:53:21 +0000 (10:53 +1000)]
Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47a938ef35cbd7b61a1e5fa74e68c9ed) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801bf2c3006482cf3cbd3170b3b4ea2e1b) which were
reverted in 99951a56842d8e4cd0706cd17a04f77b5d0f6dd0 due to bot failures.
The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277912afc201da721fa659ecef7ab7fba) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b7e7fea5fc8afbed75c34c4a43287ba).
Simon Pilgrim [Sun, 26 Sep 2021 17:08:17 +0000 (18:08 +0100)]
[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. it's zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements.
There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....
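As a rough illustration of why the fold above is sound, here is a small Python model of PMADDWD on lists of signed 16-bit lanes. It is only a sketch: wrap_i32, pmaddwd and interleave_low are hypothetical helper names, and the shuffle shown is one possible choice, not necessarily the mask the backend emits.

```python
def wrap_i32(v):
    """Wrap a Python int to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def pmaddwd(a, b):
    """Model of PMADDWD: multiply signed i16 lanes, add each adjacent pair into one i32 lane."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [wrap_i32(a[i] * b[i] + a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]

def interleave_low(p, q):
    """Hypothetical shuffle: keep the lower element of each pair of p, then of q."""
    out = []
    for i in range(0, len(p), 2):
        out += [p[i], q[i]]
    return out

# v8i16 inputs producing v4i32 results; the upper element of each pair
# of X and Z is zero, so those products never contribute.
X = [3, 0, -7, 0, 100, 0, -32768, 0]
Y = [5, 9, 2, -4, -1, 8, -32768, 1]
Z = [11, 0, 6, 0, -50, 0, -32768, 0]
W = [-2, 7, 7, 3, 4, -9, -32768, 5]

# ADD(VPMADDWD(X,Y), VPMADDWD(Z,W))
added = [wrap_i32(l + r) for l, r in zip(pmaddwd(X, Y), pmaddwd(Z, W))]
# VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
merged = pmaddwd(interleave_low(X, Z), interleave_low(Y, W))
```

Both forms give the same lanes because the zero upper elements leave a free slot in every pair that the second multiply can occupy.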
Lang Hames [Sun, 26 Sep 2021 16:58:46 +0000 (09:58 -0700)]
[ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).
Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to
enable debug-logging.
Kazu Hirata [Sun, 26 Sep 2021 16:26:56 +0000 (09:26 -0700)]
[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)
Note that RISCVMnemonicSpellCheck is defined in
RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes.
Identified with readability-redundant-declaration.
Roman Lebedev [Sun, 26 Sep 2021 16:06:16 +0000 (19:06 +0300)]
[X86][Costmodel] Load/store i16 VF=2 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`;
for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`;
for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103144
Nikita Popov [Sun, 26 Sep 2021 16:01:26 +0000 (18:01 +0200)]
[DSE] Don't check getUnderlyingObject() return value (NFC)
getUnderlyingObject() never returns null. It will simply return
something that is not the "root" underlying object.
Also drop a stale comment.
Nikita Popov [Sun, 26 Sep 2021 15:52:20 +0000 (17:52 +0200)]
[DSE] Make DSEState non-copyable (NFC)
As it contains a self-reference, the default copy/move ctors
would not be safe.
Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.
This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
Jon Chesterfield [Sun, 26 Sep 2021 14:34:18 +0000 (15:34 +0100)]
[libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109511
Sanjay Patel [Sun, 26 Sep 2021 13:47:01 +0000 (09:47 -0400)]
[InstCombine] move add after min/max intrinsic
This is another regression noted with the proposal to canonicalize
to the min/max intrinsics in D98152.
Here are Alive2 attempts to show correctness without specifying
exact constants:
https://alive2.llvm.org/ce/z/bvfCwh (smax)
https://alive2.llvm.org/ce/z/of7eqy (smin)
https://alive2.llvm.org/ce/z/2Xtxoh (umax)
https://alive2.llvm.org/ce/z/Rm4Ad8 (umin)
(if you comment out the assume and/or no-wrap, you should see failures)
The different output for the umin test is due to a fold added with
c4fc2cb5b2d98125 :
// umin(x, 1) == zext(x != 0)
We probably want to adjust that, so it applies more generally
(umax --> sext or patterns where we can fold to select-of-constants).
Some folds that were ok when starting with cmp+select may increase
instruction count for the equivalent intrinsic, so we have to decide
if it's worth altering a min/max.
Differential Revision: https://reviews.llvm.org/D110038
Simon Pilgrim [Sun, 26 Sep 2021 12:43:46 +0000 (13:43 +0100)]
[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)
Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports
Michael Kruse [Sat, 25 Sep 2021 06:13:11 +0000 (01:13 -0500)]
[Polly] Support for InlineAsm.
Inline assembly was not handled at all and treated like a llvm::Value.
In particular, it tried to create a pointer to it, which is not allowed.
Fix by handling like a llvm::Constant such that it is just reused when
required, instead of trying to marshall it in memory.
Fixes llvm.org/PR51960
Michael Kruse [Sat, 25 Sep 2021 06:03:36 +0000 (01:03 -0500)]
[Polly] Use VirtualUse to determine references.
VirtualUse ensures consistency over different sources of values in
Polly. In particular, this enables its use with instructions moved between
statements. Before the patch, the code wrongly assumed that the BB's
instructions are also the ScopStmt's instructions. References are
determined for OpenMP outlining and GPGPU kernel extraction.
GPGPU CodeGen had some problems. For one, it generated GPU kernel
parameters for constants. Second, it emitted GPU-side invariant loads
which have already been loaded by the host. This has been partially
fixed, it still generates a store for the invariant load result, but
using the value that the host has already written.
WARNING: I did not test the generated PollyACC code on an actual GPU.
The improved consistency will be made use of in the next patch.
Michael Kruse [Sun, 26 Sep 2021 08:06:19 +0000 (03:06 -0500)]
[Polly] Remove isConstCall.
The function was intended to catch OpenMP functions such as
get_thread_id(). If matched, the call would be considered synthesizable.
There were a few problems with this:
* get_thread_id() is not 'const' in the sense of how the gcc manual
defines it: "do not examine any values except their arguments".
get_thread_id() reads OpenCL runtime library global state.
What was intended was probably 'speculable'.
* isConstCall was implemented using mayReadOrWriteMemory(). 'const' is
stricter than that, mayReadOrWriteMemory is e.g. true for malloc(),
since it may only read/write addresses that are considered
inaccessible from the application. However, malloc is certainly not
speculable.
* Values that are isConstCall were not handled consistently throughout
Polly. In particular, it was not considered for referenced values
(OpenMP outlining and PollyACC).
Fix by removing special handling for isConstCall entirely.
Alexandre Rames [Sun, 26 Sep 2021 01:02:55 +0000 (18:02 -0700)]
[ADT] Add trailing comma on TYPED_TEST_SUITE
This avoids a -pedantic warning:
warning: ISO C++11 requires at least one argument for the "..." in a variadic macro
See also https://github.com/google/googletest/issues/2271
Reviewed By: arames, bkramer
Differential Revision: https://reviews.llvm.org/D110283
Mehdi Amini [Sat, 25 Sep 2021 17:24:55 +0000 (17:24 +0000)]
MLIR can't support -Bsymbolic link option, fail at CMake time with a helpful message instead of broken runtime
Differential Revision: https://reviews.llvm.org/D110483
Lang Hames [Sun, 26 Sep 2021 00:17:21 +0000 (10:17 +1000)]
[ORC] Wait for handleDisconnect to complete in SimpleRemoteEPC::disconnect.
Disconnect should block until handleDisconnect completes, otherwise we might
destroy the SimpleRemoteEPC instance while it's still in use.
Thanks to Dave Blaikie for helping me track this down.
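The ordering requirement can be sketched in Python with a threading.Event. This is an assumption-laden analogy, not the SimpleRemoteEPC code: the Endpoint class and its method names are hypothetical, and the real implementation is C++.

```python
import threading

class Endpoint:
    """Hypothetical stand-in for an object torn down after disconnect()."""

    def __init__(self):
        self._disconnect_done = threading.Event()
        self.events = []

    def _handle_disconnect(self):
        # Runs on another thread and still touches members of this object.
        self.events.append("handle_disconnect finished")
        self._disconnect_done.set()

    def disconnect(self):
        worker = threading.Thread(target=self._handle_disconnect)
        worker.start()
        # Block until the handler has completed, so the caller cannot
        # destroy this object while the handler is still using it.
        self._disconnect_done.wait()
        self.events.append("disconnect returned")
        worker.join()

ep = Endpoint()
ep.disconnect()
```

Without the wait, disconnect() could return first and the caller might free the object under the handler.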
Lang Hames [Sat, 25 Sep 2021 23:40:15 +0000 (09:40 +1000)]
[ORC] Fix uninitialized variable.
Spotted by Dave Blaikie. Thanks Dave!
Fangrui Song [Sat, 25 Sep 2021 22:47:27 +0000 (15:47 -0700)]
[ELF] Remove unneeded binding parameter from addOptionalRegular. NFC
__rela_iplt_start uses spurious STB_WEAK, but it doesn't matter because STV_HIDDEN overrides the binding.
Fangrui Song [Sat, 25 Sep 2021 22:16:44 +0000 (15:16 -0700)]
[ELF] Replace noneRel = R_*_NONE with static constexpr. NFC
All architectures define R_*_NONE to 0.
Fangrui Song [Sat, 25 Sep 2021 22:06:09 +0000 (15:06 -0700)]
[ELF] Default gotBaseSymInGotPlt to false (NFC for most architectures)
Most architectures use .got instead of .got.plt, so switching the default can
minimize customization.
This fixes an issue for SPARC V9 which uses .got .
AVR, AMDGPU, and MSP430 don't seem to use _GLOBAL_OFFSET_TABLE_.
Nikita Popov [Thu, 23 Sep 2021 19:23:17 +0000 (21:23 +0200)]
[AA] Move earliest escape tracking from DSE to AA
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.
This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.
Differential Revision: https://reviews.llvm.org/D110368
Nikita Popov [Sat, 25 Sep 2021 20:20:19 +0000 (22:20 +0200)]
[DSE] Make capture check more precise
It is sufficient that the object has not been captured before the
load that produces the pointer we're loading. A capture after that
can not affect the already loaded pointer.
This is a small part of D110368 applied separately.
Nikita Popov [Sat, 25 Sep 2021 20:01:28 +0000 (22:01 +0200)]
[BasicAA] Don't consider Argument as escape source (NFCI)
The case of an Argument and an identified function local is already
handled earlier, because we don't care about captures in that case.
As such, we don't need to additionally consider the combination of
an Argument with a non-escaping identified function local.
This ensures that isEscapeSource() only returns true for
instructions, which is necessary for D110368.
Lang Hames [Sat, 25 Sep 2021 18:56:30 +0000 (11:56 -0700)]
[ORC-RT] ExecutorAddrDiff ergonomic improvements; contains and overlaps methods
Renames StartAddress and EndAddress members to Start and End.
Adds contains and overlap methods.
Adds a constructor from an address and size.
These changes are counterparts to LLVM commits ef391df2b6332, c0d889995e708, and
37f1b7a3f35fd.
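For readers unfamiliar with the shape of these helpers, a minimal Python sketch of an address range with contains and overlaps follows. It assumes a half-open [start, end) range; AddrRange and from_addr_and_size are hypothetical names for illustration, not the ORC runtime API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AddrRange:
    start: int
    end: int  # assumed to be one past the last address (half-open)

    @classmethod
    def from_addr_and_size(cls, addr, size):
        """Build a range from a base address and a size in bytes."""
        return cls(addr, addr + size)

    def contains(self, addr):
        return self.start <= addr < self.end

    def overlaps(self, other):
        # Two half-open ranges are disjoint iff one ends before the other starts.
        return not (other.end <= self.start or self.end <= other.start)

text = AddrRange.from_addr_and_size(0x1000, 0x100)
data = AddrRange(0x1100, 0x1200)
```

With half-open ranges, adjacent ranges such as text and data above share a boundary value but do not overlap.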
Fangrui Song [Sat, 25 Sep 2021 18:57:54 +0000 (11:57 -0700)]
[ELF][test] Improve test coverage
Lang Hames [Sat, 25 Sep 2021 18:17:55 +0000 (11:17 -0700)]
Revert "[ORC] Introduce EPCGenericRTDyldMemoryManager."
This reverts commit bef55a2b47a938ef35cbd7b61a1e5fa74e68c9ed while I investigate
failures on some bots. Also reverts "[lli] Add ChildTarget dependence on
OrcTargetProcess library." (7a219d801bf2c3006482cf3cbd3170b3b4ea2e1b) which was
a follow-up to bef55a2b47a.
Lang Hames [Sat, 25 Sep 2021 17:50:59 +0000 (10:50 -0700)]
[lli] Add ChildTarget dependence on OrcTargetProcess library.
ChildTarget depends on OrcTargetProcess after bef55a2b47a.
Lang Hames [Sat, 25 Sep 2021 00:04:29 +0000 (17:04 -0700)]
[ORC] Introduce EPCGenericRTDyldMemoryManager.
EPCGenericRTDyldMemoryManager is an EPC-based implementation of the
RuntimeDyld::MemoryManager interface. It enables remote-JITing via EPC (backed
by a SimpleExecutorMemoryManager instance on the executor side) for RuntimeDyld
clients.
The lli and lli-child-target tools are updated to use SimpleRemoteEPC and
SimpleRemoteEPCServer (rather than OrcRemoteTargetClient/Server), and
EPCGenericRTDyldMemoryManager for MCJIT tests.
By enabling remote-JITing for MCJIT and RuntimeDyld-based ORC clients,
EPCGenericRTDyldMemoryManager allows us to deprecate older remote-JITing
support, including OrcTargetClient/Server, OrcRPCExecutorProcessControl, and the
Orc RPC system itself. These will be removed in future patches.
Simon Pilgrim [Sat, 25 Sep 2021 17:35:39 +0000 (18:35 +0100)]
[DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.
mydeveloperday [Sat, 25 Sep 2021 16:34:34 +0000 (17:34 +0100)]
[clang-format] Left/Right alignment fixer can cause false positive replacements when they don't actually change anything
Earlier during the development of {D69764} I felt it was no longer necessary to
ensure we were not trying to change code which didn't need to change
and we felt this could be removed, however I'd like to bring this back for now
as I am seeing some false positives in terms of the "replacements"
What I see is the generation of a replacement which is a "No Op" on the original
code, I think this comes about because of the merging of replacements:
```
static const a;
->
const static a;
->
static const a;
```
The replacements don't really merge, in such a way as to identify when we have gone
back to the original
Also remove the Penalty as I'm not using it (and it became marked as set and not used,
I'd rather get rid of it if it means nothing)
I think we need to do this step for now, as many people use the --output-replacements-xml
to identify that the file "needs a clang-format"
The same can be seen with the -n or --dry-run option as this uses the replacements
to drive the error/warning output.
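The idea of discarding "No Op" replacements can be sketched in Python as follows. This is an illustrative model only: the Replacement tuple and drop_noop_replacements are hypothetical names, not clang's tooling::Replacement API.

```python
from typing import NamedTuple

class Replacement(NamedTuple):
    offset: int   # start of the replaced span in the source
    length: int   # number of characters replaced
    text: str     # new text for that span

def drop_noop_replacements(source, replacements):
    """Keep only replacements whose text differs from what is already there."""
    return [
        r for r in replacements
        if source[r.offset:r.offset + r.length] != r.text
    ]

source = "static const a;"
replacements = [
    Replacement(0, 12, "static const"),  # rewrites the span to itself
    Replacement(13, 1, "b"),             # a real change
]
kept = drop_noop_replacements(source, replacements)
```

Only the second replacement survives, so a tool that counts replacements would no longer report the file as needing a change because of the first one.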
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D110392
Simon Pilgrim [Sat, 25 Sep 2021 15:28:48 +0000 (16:28 +0100)]
[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD
Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD if it's zero-extended, but also if it's a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).
Simon Pilgrim [Sat, 25 Sep 2021 14:50:06 +0000 (15:50 +0100)]
[X86][SSE] combineMulToPMADDWD - mask off upper bits of sign-extended vXi32 constants
If we are multiplying by a sign-extended vXi32 constant, then we can mask off the upper 16 bits to allow folding to PMADDWD and make use of its implicit sign-extension from i16
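A small Python model of a single i32 lane may help show why the masking matters. It is a sketch under the assumption that the other operand is itself sign-extended from i16; sext16, wrap_i32 and pmaddwd_lane are hypothetical helper names.

```python
def sext16(v):
    """Interpret the low 16 bits of v as a signed i16."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def wrap_i32(v):
    """Wrap a Python int to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def pmaddwd_lane(a32, b32):
    """One i32 lane of PMADDWD: each 32-bit operand is two signed i16 halves."""
    lo = sext16(a32) * sext16(b32)
    hi = sext16(a32 >> 16) * sext16(b32 >> 16)
    return wrap_i32(lo + hi)

x, c = -3, -5                      # x sign-extended from i16, c a sign-extended constant
x_bits = x & 0xFFFFFFFF            # 0xFFFFFFFD
c_bits = c & 0xFFFFFFFF            # 0xFFFFFFFB

unmasked = pmaddwd_lane(x_bits, c_bits)          # upper halves contribute (-1) * (-1)
masked = pmaddwd_lane(x_bits, c_bits & 0xFFFF)   # upper half of the constant is zero
```

With the upper 16 bits of the constant cleared, the high product vanishes and the low half is sign-extended by the instruction itself, so the lane equals x * c.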
Simon Pilgrim [Sat, 25 Sep 2021 13:49:10 +0000 (14:49 +0100)]
[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on sub-128 bit vectors
Kazu Hirata [Sat, 25 Sep 2021 14:41:10 +0000 (07:41 -0700)]
[Mips] Remove redundant declarations (NFC)
Note that identical declarations immediately precede what's being
removed in this patch.
Identified with readability-redundant-declaration.
Simon Pilgrim [Sat, 25 Sep 2021 13:35:31 +0000 (14:35 +0100)]
[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on pre-SSE41 targets
We already do this on SSE41 targets where we have sext/zext instructions, now that combineShiftToPMULH handles SSE2 targets, we can enable this here as well.
Simon Pilgrim [Sat, 25 Sep 2021 13:31:14 +0000 (14:31 +0100)]
[X86] X86FastISel::fastMaterializeConstant - break if-else chain to fix llvm-else-after-return warning. NFCI
All previous if-else cases return
Simon Pilgrim [Fri, 24 Sep 2021 19:25:06 +0000 (20:25 +0100)]
[X86] combineShiftToPMULH - relax ISA requirement from SSE41 to SSE2
With improved shuffle combines (in particular canonicalizeShuffleWithBinOps), we can now usefully perform this on any SSE2+ target.
We should be able to remove this entirely and just use DAGCombiner's combineShiftToMULH if we can someday get it to support illegal (pre-widened) types.
Michał Górny [Fri, 24 Sep 2021 21:36:49 +0000 (23:36 +0200)]
[lldb] Convert misc. StringConvert uses
Replace misc. StringConvert uses with llvm::to_integer()
and llvm::to_float(), except for cases where further refactoring is
planned. The purpose of this change is to eliminate the StringConvert
API that is duplicate to LLVM, and less correct in behavior at the same
time.
Differential Revision: https://reviews.llvm.org/D110447
Valentin Clement [Sat, 25 Sep 2021 12:10:02 +0000 (14:10 +0200)]
[fir] Add desc to fir.array_load op and update operand name
This patch is part of the upstreaming effort from fir-dev branch.
Add a description for the fir.array_load operation and rename lenParams to typeparams.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D110393
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Simon Pilgrim [Sat, 25 Sep 2021 11:57:33 +0000 (12:57 +0100)]
[InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold.
We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078
Markus Böck [Sat, 25 Sep 2021 11:13:11 +0000 (13:13 +0200)]
[CMake] Consistently use the LibXml2::LibXml2 target instead of LIBXML2_LIBRARIES
Linking against the LibXml2::LibXml2 target has the advantage of not only importing the library, but also adding the include path as well as any definitions the library requires. For example, in a static build of libxml2, a define is set on Windows to remove any DLL imports and exports.
LLVM already makes use of the target, but c-index-test and lldb were still linking against the library only.
The workaround for Mac OS-X that I removed seems to have also been made redundant since https://reviews.llvm.org/D84563 I believe
Differential Revision: https://reviews.llvm.org/D109975
Simon Pilgrim [Sat, 25 Sep 2021 10:58:06 +0000 (11:58 +0100)]
[IR] DIBuilder::createEnumerator - pass APSInt by const reference
Avoid unnecessary copy by value.
Simon Pilgrim [Sat, 25 Sep 2021 10:56:35 +0000 (11:56 +0100)]
[DAG] combineShiftToMULH - move getValueType() inside assert. NFCI.
Avoids an unnecessary (void).
Kunwar Shaanjeet Singh Grover [Sat, 25 Sep 2021 10:31:17 +0000 (16:01 +0530)]
[MLIR] Add functionality to remove redundant local variables
This patch adds functionality to FlatAffineConstraints to remove local
variables using equalities. This helps in keeping output representation of
FlatAffineConstraints smaller.
This patch is part of a series of patches aimed at generalizing affine
dependence analysis.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110056
David Green [Sat, 25 Sep 2021 10:32:25 +0000 (11:32 +0100)]
[ARM] Fix Arm block placement creating branches after jump tables.
Given:
- A jump table
- Which jumps to the next block
- The next block ends in a WLS
- Where the WLS conditionally jumps to a block earlier in the program.
The Arm block placement pass would attempt to move the block containing
the WLS earlier, as the WLS instruction can only branch forward. In
doing so it would add a branch from the jumptable block to the WLS
block, thinking it previously fell-through.
This in itself would be fine, if a little inefficient, but the constant
island pass expects all instructions after a jump-table branch to have
been removed by analyzeBranch. So it gets confused and can assign the
same labels to multiple jump table blocks.
I've changed the condition to the same as used in analyzeBranch.
Dmitry Vyukov [Sat, 25 Sep 2021 09:56:53 +0000 (11:56 +0200)]
tsan: uninline RacyStacks::operator==
It's only used during race reporting.
There is no point in polluting the main header file with it.
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D110470
Simon Pilgrim [Sat, 25 Sep 2021 09:50:54 +0000 (10:50 +0100)]
[TTI] getUserCost - Ensure a vector insert/extract index is in unsigned 32-bit range
Otherwise fallback to the generic 'unknown index' path
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29050
Jim Lin [Sat, 25 Sep 2021 03:25:02 +0000 (11:25 +0800)]
[RISCV] Fix incorrect operand type of inst alias for InstR4
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110381
Matheus Izvekov [Fri, 24 Sep 2021 20:18:54 +0000 (22:18 +0200)]
[clang] set templates as invalid when any of the parameters are invalid
See PR51872 for the original repro.
This fixes a crash when converting a templated constructor into a deduction
guide, in case any of the template parameters were invalid.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D110460
Amara Emerson [Sat, 25 Sep 2021 01:25:23 +0000 (18:25 -0700)]
[AArch64][AMDGPU] Re-generate some tests with CHECK-NEXT to prepare for a patch.
Petr Hosek [Sat, 25 Sep 2021 00:56:00 +0000 (17:56 -0700)]
[CMake] Pass through CMAKE_READELF to subbuilds
This matches handling of other CMake variables.
Differential Revision: https://reviews.llvm.org/D110463
Jason Molenda [Sat, 25 Sep 2021 00:11:54 +0000 (17:11 -0700)]
Add pragma to make it easier to find "image list" impl
I couldn't find it; make this easier for next time.
David Blaikie [Mon, 20 Sep 2021 04:03:20 +0000 (21:03 -0700)]
DebugInfo: Use the signedness of the underlying enum when encoding enum non-type-template-parameters
This improves the accuracy of the debug info and improves round tripping
through -gsimple-template-names.
River Riddle [Fri, 24 Sep 2021 23:45:25 +0000 (23:45 +0000)]
[mlir:ElementsAttr] Avoid crash on empty contiguous ranges
We currently, incorrectly, assume that a range always has at least
one element when building a contiguous range. This commit adds
a proper empty check to avoid crashing.
Differential Revision: https://reviews.llvm.org/D110457
modimo [Fri, 24 Sep 2021 23:42:30 +0000 (16:42 -0700)]
[llvm-profdata] Extend support of --topn to sample profiles
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110449
Nico Weber [Fri, 24 Sep 2021 22:48:08 +0000 (18:48 -0400)]
[llvm] Remove LLVM_CHECK_ENABLED_PROJECTS again
This reverts commit 55f0b337087136554122f942fea951a357bc4a49 and
follow-up reverts commit e9ea03c62ccc1ed4e3ed4f20e37640cfdd76cbcf.
LLVM_EXTERNAL_PROJECTS is sufficient, see https://reviews.llvm.org/D110016
Nico Weber [Fri, 24 Sep 2021 22:43:51 +0000 (18:43 -0400)]
Revert "[Driver] Correctly handle static C++ standard library"
This reverts commit 03142c5f67788bcc1573f76732d0fccd75c6b965.
Breaks check-asan if system ld doesn't support --push-state, even
if lld was built and is used according to lit's output.
See comments on https://reviews.llvm.org/D110128
Konrad Kleine [Fri, 24 Sep 2021 22:29:13 +0000 (00:29 +0200)]
[llvm] Improve export.sh with help and snapshot
This change adds the ability to create source tarballs for unreleased or untagged code by providing the `--git-ref <GIT_REF>` flag to the `llvm/utils/release/export.sh` script. This is useful for creating daily snapshot tarballs that can easily be consumed by packagers who want to build a daily snapshot.
The default behavior of `export.sh` hasn't changed.
You may also provide a `--template` argument to say how the artifacts
are supposed to be named (as suggested by @hans).
The `-help` output of `export.sh` was changed quite significantly to look like this:
```
Export the Git sources and build tarballs from them.
Usage: export.sh [-release|--release <major>.<minor>.<patch>]
[-rc|--rc <num>]
[-final|--final]
[-git-ref|--git-ref <git-ref>]
[-template|--template <template>]
Flags:
-release | --release <major>.<minor>.<patch> The version number of the release
-rc | --rc <num> The release candidate number
-final | --final When provided, this option will disable the rc flag
-git-ref | --git-ref <git-ref> (optional) Use <git-ref> to determine the release and don't export the test-suite files
-template | --template <template> (optional) Possible placeholders: $PROJECT $YYYYMMDD $GIT_REF $RELEASE $RC.
Defaults to '${PROJECT}-${RELEASE}${RC}.src.tar.xz'.
The following list shows the filenames (with <placeholders>) for the artifacts
that are being generated (given that you don't touch --template).
* llvm-<RELEASE><RC>.src.tar.xz
* clang-<RELEASE><RC>.src.tar.xz
* compiler-rt-<RELEASE><RC>.src.tar.xz
* libcxx-<RELEASE><RC>.src.tar.xz
* libcxxabi-<RELEASE><RC>.src.tar.xz
* libclc-<RELEASE><RC>.src.tar.xz
* clang-tools-extra-<RELEASE><RC>.src.tar.xz
* polly-<RELEASE><RC>.src.tar.xz
* lldb-<RELEASE><RC>.src.tar.xz
* lld-<RELEASE><RC>.src.tar.xz
* openmp-<RELEASE><RC>.src.tar.xz
* libunwind-<RELEASE><RC>.src.tar.xz
* flang-<RELEASE><RC>.src.tar.xz
Additional files being generated:
* llvm-project-<RELEASE><RC>.src.tar.xz (the complete LLVM source project)
* test-suite-<RELEASE><RC>.src.tar.xz (only when not using --git-ref)
To ease the creation of snapshot builds, we also provide these files
* llvm-release-<YYYYMMDD>.txt (contains the <RELEASE> as a text)
* llvm-rc-<YYYYMMDD>.txt (contains the rc version passed to the invocation of export.sh)
* llvm-git-revision-<YYYYMMDD>.txt (contains the current git revision sha1)
Example values for the placeholders:
* <RELEASE> -> 13.0.0
* <YYYYMMDD> -> 20210414
* <RC> -> rc4 (will be empty when using --git-ref)
In order to generate snapshots of the upstream main branch you could do this for example:
export.sh --git-ref upstream/main --template '${PROJECT}-${YYYYMMDD}.src.tar.xz'
```
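To show how the placeholders in a `--template` value expand, here is a small Python sketch using string.Template, which happens to share the `${NAME}` syntax. It only illustrates the substitution; export.sh is a shell script and does not work this way internally, and artifact_name is a hypothetical helper.

```python
from string import Template

def artifact_name(template, **values):
    """Expand ${PLACEHOLDER} names in a template string."""
    return Template(template).substitute(values)

# The documented default template, with the example placeholder values.
release_name = artifact_name(
    "${PROJECT}-${RELEASE}${RC}.src.tar.xz",
    PROJECT="llvm", RELEASE="13.0.0", RC="rc4",
)

# A snapshot-style template keyed on the date instead of the release.
snapshot_name = artifact_name(
    "${PROJECT}-${YYYYMMDD}.src.tar.xz",
    PROJECT="clang", YYYYMMDD="20210414",
)
```

The first call yields the `llvm-<RELEASE><RC>.src.tar.xz` form listed in the help text; the second matches the snapshot invocation shown at the end of it.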
Reviewed By: tstellar
Differential Revision: https://reviews.llvm.org/D101446
Wei Mi [Fri, 24 Sep 2021 22:35:07 +0000 (15:35 -0700)]
Add "REQUIRES: zlib" in forward-compatible.test since it handles compressed file.
Wei Mi [Fri, 24 Sep 2021 22:20:16 +0000 (15:20 -0700)]
Fixed a bug in https://reviews.llvm.org/rG8eb617d719bdc6a4ed7773925d2421b9bbdd4b7a.
For compressed profile when reading an unknown section, the data reader pointer
adjustment was incorrect. This patch fixed that.
Craig Topper [Fri, 24 Sep 2021 21:29:55 +0000 (14:29 -0700)]
[RISCV] Add another isel optimization for (and (shl X, c2), c1).
Where c1 is a shifted mask with 32-c2 leading zeros and c3 trailing
zeros and c3>c2. We can select it as (slli (srliw X, c3-c2), c3).
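The equivalence can be checked with a small Python model of the RV64 operations. This is a sketch for illustration: shifted_mask, original and selected are hypothetical helper names, and srliw/slli model the instruction semantics on 64-bit values.

```python
MASK64 = (1 << 64) - 1

def shifted_mask(c2, c3):
    """64-bit mask with 32-c2 leading zeros and c3 trailing zeros."""
    return ((1 << (32 + c2)) - 1) & ~((1 << c3) - 1)

def srliw(x, shamt):
    """SRLIW: logical right shift of the low 32 bits, result sign-extended to 64 bits."""
    r = (x & 0xFFFFFFFF) >> shamt
    return r | 0xFFFFFFFF00000000 if r & 0x80000000 else r

def slli(x, shamt):
    """SLLI: logical left shift, truncated to 64 bits."""
    return (x << shamt) & MASK64

def original(x, c2, c3):
    """(and (shl X, c2), c1)"""
    return ((x << c2) & MASK64) & shifted_mask(c2, c3)

def selected(x, c2, c3):
    """(slli (srliw X, c3-c2), c3)"""
    return slli(srliw(x, c3 - c2), c3)

example = (original(MASK64, 4, 8), selected(MASK64, 4, 8))
```

The mask keeps bits c3 through 31+c2 of the shifted value, which are bits c3-c2 through 31 of X; srliw extracts exactly those bits and slli moves them back into place.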
Jonas Devlieghere [Fri, 24 Sep 2021 22:06:39 +0000 (15:06 -0700)]
[dsymutil] Update union-fwd-decl.test for Windows
Remove path separators from CHECK-lines in union-fwd-decl.test
Jonas Devlieghere [Fri, 24 Sep 2021 21:59:58 +0000 (14:59 -0700)]
[lldb] Copy the system debugserver in LLDB.framework
When using the system debugserver for testing, copy the binary in the
LLDB.framework Resource directory instead of the build's bin directory.
rdar://82998263
Lang Hames [Fri, 24 Sep 2021 20:51:36 +0000 (13:51 -0700)]
[ORC] Allow construction of an ExecutorAddrRange from an addr and a size.
David Blaikie [Mon, 20 Sep 2021 04:04:03 +0000 (21:04 -0700)]
WIP: Verify -gsimple-template-names=mangled values
Clang will encode names that should be able to be simplified as
"_STNname|<template, args>" (eg: "_STNt1|<int>") - this verification
mode will detect these names, decode them, create the original name
("t1<int>") and the simple name ("t1") - letting the simple name run
through the usual rebuilding logic - then compare the two sources of the
full name - the rebuilt and the _STN encoding.
This helps ensure that -gsimple-template-names is lossless.
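The decoding step described above can be sketched in a few lines of Python. This is only a model of the string handling based on the "_STNt1|<int>" example; decode_stn is a hypothetical name, and the real verification also rebuilds the name from the debug info before comparing.

```python
def decode_stn(name):
    """Split an "_STNname|<args>" string into (simple name, full name).

    Returns None if the name does not use the _STN encoding.
    """
    prefix = "_STN"
    if not name.startswith(prefix):
        return None
    simple, sep, args = name[len(prefix):].partition("|")
    if not sep:
        return None
    return simple, simple + args

decoded = decode_stn("_STNt1|<int>")
```

The simple name would then go through the usual rebuilding logic, and the rebuilt full name is compared with the second element of the pair.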
Jonas Devlieghere [Fri, 24 Sep 2021 20:23:29 +0000 (13:23 -0700)]
[dsymutil] Track incompleteness across unions
When determining the incompleteness of a DIE based on its children, make
sure we propagate it across union types. See test case for an example.
Without this patch we never emit the definition of Container_ivars.
Differential revision: https://reviews.llvm.org/D110443
wlei [Sun, 19 Sep 2021 23:17:37 +0000 (16:17 -0700)]
[llvm-profgen] Unify output format of different unsymbolized profiles
Differential Revision: https://reviews.llvm.org/D110080
wlei [Fri, 24 Sep 2021 05:53:12 +0000 (22:53 -0700)]
[AutoFDO][llvm-profgen] Report zero count for unexecuted part of function code
In order to be consistent with the compiler, which interprets a zero count as unexecuted (cold), this change reports a zero-value count for the unexecuted part of function code. The implementation leverages the range counter and initializes all the executed function ranges with a zero value. After all ranges are merged and converted into disjoint ranges, the remaining zero counts indicate the unexecuted (cold) part of the function.
This change also extends the current `findDisjointRanges` method which now can support adding zero-value range.
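As a rough illustration of the idea (not the actual `findDisjointRanges` implementation), the sketch below seeds a function's whole range with a zero count, then splits all ranges into disjoint pieces and sums the counts that cover each piece. The function name, the inclusive `(start, end, count)` tuples, and the sample numbers are assumptions made for this example.

```python
def find_disjoint_ranges(ranges):
    """Split overlapping inclusive (start, end, count) ranges into disjoint
    pieces, summing the counts of all input ranges covering each piece."""
    # Every start, and every position just past an end, is a cut point.
    cuts = sorted({s for s, _, _ in ranges} | {e + 1 for _, e, _ in ranges})
    out = []
    for lo, nxt in zip(cuts, cuts[1:]):
        hi = nxt - 1
        # A piece between two adjacent cuts is either fully covered by an
        # input range or disjoint from it.
        covering = [c for s, e, c in ranges if s <= lo and hi <= e]
        if covering:
            out.append((lo, hi, sum(covering)))
    return out


# Hypothetical function spanning [0, 99], seeded with a zero count,
# plus two overlapping sampled ranges.
samples = [(0, 99, 0), (10, 19, 5), (15, 29, 3)]
disjoint = find_disjoint_ranges(samples)
```

Here the pieces `[0, 9]` and `[30, 99]` keep a count of zero, which is how the unexecuted parts would surface in the output. The quadratic scan keeps the sketch short; a real implementation would likely sweep the boundaries once.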
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D109713
Lei Zhang [Fri, 24 Sep 2021 20:57:46 +0000 (16:57 -0400)]
[mlir][tosa] Do not fold transpose with quantized types
For such cases, the type of the constant DenseElementsAttr is
different from the transpose op return type.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110446
wlei [Thu, 23 Sep 2021 03:00:24 +0000 (20:00 -0700)]
[AutoFDO][llvm-profgen] Profile generation for LBR(non-CS) sample
This patch introduces non-CS AutoFDO profile generation into LLVM. The resulting profile is meant to be consumed by the compiler via `-fprofile-sample-use=[profile]`.
After range and branch counters are extracted from the LBR sample, we go through each address for symbolization, create FunctionSamples, and populate its fields such as TotalSamples, BodySamples and HeadSamples. For inlined code, since we need to map back to the original code, we always add body samples to the leaf frame's function sample.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109551
Diego Caballero [Wed, 22 Sep 2021 17:11:45 +0000 (17:11 +0000)]
[mlir] Create a generic reduction detection utility
This patch introduces a generic reduction detection utility that works
across different dialects. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP has been replaced with this new generic utility.
The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations is
disabled until we can properly test it.
Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D110303
wlei [Fri, 24 Sep 2021 06:43:08 +0000 (23:43 -0700)]
[llvm-profgen] Ignore invalid perf line in LBR record
Similar to https://reviews.llvm.org/D109637, an entire line of the perfscript can be an invalid message.
```
warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks!
warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload!
```
This only happens for LBR-only perfscript; hybrid perfscript already checks for " 0x" to make sure a line is an LBR perf line.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110424
Stanislav Mekhanoshin [Thu, 23 Sep 2021 23:03:48 +0000 (16:03 -0700)]
[AMDGPU] Limit promote alloca max size in functions
Non-entry functions have 32 caller saved VGPRs available. If we
promote alloca to consume more registers we will have to spill
CSRs. There is no reason to eliminate scratch access to get
another scratch access instead.
Differential Revision: https://reviews.llvm.org/D110372
LLVM GN Syncbot [Fri, 24 Sep 2021 20:30:26 +0000 (20:30 +0000)]
[gn build] Port c0d889995e70
LLVM GN Syncbot [Fri, 24 Sep 2021 20:30:25 +0000 (20:30 +0000)]
[gn build] Port a9ae2436fc0d
Lang Hames [Fri, 24 Sep 2021 18:35:03 +0000 (11:35 -0700)]
[ORC] Add 'contains' and 'overlaps' operations to ExecutorAddrRange.
Also includes unit tests for not-yet tested operations like comparison and
to/from pointer conversion.
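For readers unfamiliar with these range operations, here is a minimal Python sketch of construction from an address and a size plus `contains` and `overlaps`. It is not the ORC C++ API: the class name `AddrRange`, the `from_size` helper, and the half-open `[start, end)` convention are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddrRange:
    start: int
    end: int  # one past the last address (half-open, assumed)

    @classmethod
    def from_size(cls, addr, size):
        """Build a range from a start address and a size in bytes."""
        return cls(addr, addr + size)

    def contains(self, addr):
        """True if addr lies inside the range."""
        return self.start <= addr < self.end

    def overlaps(self, other):
        """True if the two ranges share at least one address."""
        return self.start < other.end and other.start < self.end


r = AddrRange.from_size(0x1000, 0x10)
```

Under the half-open assumption, `r.contains(0x1010)` is false, and two ranges that merely touch end-to-start do not overlap.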
Anirudh Prasad [Fri, 24 Sep 2021 20:25:27 +0000 (16:25 -0400)]
[SystemZ][z/OS] Introduce the GOFFMCAsmInfo Interface for z/OS
- This patch adds in the GOFFMCAsmInfo interfaces for the z/OS target.
- This patch decouples the previously existing SystemZMCAsmInfo interface for the ELF target and the z/OS target.
- This patch also removes a small test in SystemZAsmLexerTest.cpp. The test is set up for s390x-ibm-linux (the SystemZ ELF triple) but checks a function that is overridden only for the z/OS target. We can't simply switch the test to a z/OS triple because missing support still prevents the test from running successfully (an assert in AsmParser.cpp due to missing GOFFAsmParser support).
Reviewed By: uweigand, abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D110077
Nikita Popov [Fri, 24 Sep 2021 18:42:47 +0000 (20:42 +0200)]
[IR] Handle large element size when calculating GEP indices
This is a fix for the issue reported at
https://reviews.llvm.org/D110043#3019942:
The ElementSize is a uint64_t and as such may be larger than the
index space, or be negative in the index space. This is UB, but
shouldn't cause assertion failures.
We address this by detecting whether the size is too large and
using a zero index in that case (which is always conservatively
correct).
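The guard can be pictured with a small Python sketch. It is an illustration only, not the code from the patch: the helper names are hypothetical, and "too large" is assumed here to mean that the unsigned size does not fit as a non-negative signed value of the index width.

```python
def element_size_as_index(element_size, index_bits):
    """Return element_size if it fits as a non-negative signed index of
    index_bits bits, else None (assumed meaning of "too large")."""
    max_signed = (1 << (index_bits - 1)) - 1
    return element_size if element_size <= max_signed else None


def split_offset(offset, element_size, index_bits):
    """Return (index, remaining_offset) for one level of indexing."""
    size = element_size_as_index(element_size, index_bits)
    # The zero-size guard is an addition of this sketch to avoid dividing
    # by zero; it is not mentioned in the commit message.
    if size is None or size == 0:
        return 0, offset  # zero index: the whole offset is left untouched
    index = offset // size
    return index, offset - index * size
```

A zero index is conservative because it consumes none of the offset, so nothing is lost for later steps.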
Differential Revision: https://reviews.llvm.org/D110437
River Riddle [Fri, 24 Sep 2021 19:56:01 +0000 (19:56 +0000)]
[mlir:OpAsm] Factor out the common bits of (Op/Dialect)Asm(Parser/Printer)
This has a few benefits:
* It allows for defining parser/printer code blocks that
can be shared between operations and attributes/types.
* It removes the weird duplication of generic parser/printer hooks,
which means that newly added hooks only require touching one class.
Differential Revision: https://reviews.llvm.org/D110375
Valentin Clement [Fri, 24 Sep 2021 20:10:12 +0000 (22:10 +0200)]
[flang][fir] Add support to mangle/deconstruct namelist group name
Add support for creating a unique name for a namelist group and for
deconstructing it.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D110331
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Michael Kruse [Fri, 24 Sep 2021 19:23:55 +0000 (14:23 -0500)]
[Polly] Fix wrong redirect in test case.
Sanjay Patel [Fri, 24 Sep 2021 17:27:02 +0000 (13:27 -0400)]
[InstCombine] fold lshr(trunc(lshr X, C1)) C2
Only the multi-use cases are changing here because there's
another fold that catches the simpler patterns.
But that other fold is the source of infinite loops when we
try to add D110170, so removing that is planned as a follow-up.
Attempt to show the general proof in Alive2:
https://alive2.llvm.org/ce/z/Ns1uS2
Note that the overshift fold-to-zero tests are not
currently handled by instsimplify. If they were, we
could assert that the shift amount sum is less than
the source bitwidth.
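The message does not spell out the replacement pattern, so the following is only one identity consistent with the subject line, checked by brute force on small bit widths in Python. The combined shift `C1 + C2` and the mask that clears the top `C2` bits of the narrow result are assumptions of this sketch; the Alive2 link above is the authoritative proof.

```python
def lhs(x, c1, c2, dst_bits):
    """lshr(trunc(lshr x, c1), c2), with the trunc narrowing to dst_bits."""
    dst_mask = (1 << dst_bits) - 1
    return ((x >> c1) & dst_mask) >> c2


def rhs(x, c1, c2, dst_bits):
    """trunc(lshr x, c1 + c2), masked to clear the top c2 result bits."""
    dst_mask = (1 << dst_bits) - 1
    return ((x >> (c1 + c2)) & dst_mask) & (dst_mask >> c2)


def check(src_bits, dst_bits):
    """Exhaustively compare both forms for every in-range shift pair."""
    return all(
        lhs(x, c1, c2, dst_bits) == rhs(x, c1, c2, dst_bits)
        for x in range(1 << src_bits)
        for c1 in range(src_bits)
        for c2 in range(dst_bits)
        if c1 + c2 < src_bits  # shift amount sum below the source bitwidth
    )
```

The mask matters when the first shift does not already discard all of the truncated-away bits: for an 8-bit `x = 0xFF` with `c1 = 0`, `c2 = 1` and a 4-bit result, the unmasked form would give `0xF` instead of `0x7`.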
Sanjay Patel [Fri, 24 Sep 2021 16:04:46 +0000 (12:04 -0400)]
[InstCombine] match variable names and code comments; NFC
Teresa Johnson [Fri, 24 Sep 2021 19:41:03 +0000 (12:41 -0700)]
Fix bot failure by adding needed dependence
Fix bot failure from 96cb97c4533a0a02c2d62ffb1121cd275aa43dd5, e.g.:
https://lab.llvm.org/buildbot/#/builders/61/builds/15203
llvm-lto now needs to link in IPO.
River Riddle [Fri, 24 Sep 2021 19:32:23 +0000 (19:32 +0000)]
[mlir:MemRef] Move DmaStartOp/DmaWaitOp to ODS
These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.
Differential Revision: https://reviews.llvm.org/D110440
Teresa Johnson [Fri, 24 Sep 2021 00:22:54 +0000 (17:22 -0700)]
[ThinLTO] Update combined index for SamplePGO indirect calls to locals
In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).
In order to correctly handle the call edges added to the combined index
for these indirect calls, during importing and bitcode writing we
consult a map of original to full GUID to identify the actual callee.
However, for a large application this was consuming a lot of compile
time as we need to do this repeatedly (especially during importing where
we may traverse call edges multiple times).
To fix this, implement a suggestion in one of the FIXME comments, and
actually modify the call edges during a single traversal after the index
is built to perform the fixups once. I combined this fixup with the dead
code analysis performed on the index in order to avoid adding an
additional walk of the index. The dead code analysis is the first
analysis performed on the index.
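The difference between repeated lookups and a one-time fixup can be sketched in a few lines of Python. Everything here is hypothetical: the dictionary-based index, the name `fixup_call_edges`, and the assumption that each original GUID maps to at most one full GUID are simplifications for illustration, not the combined index data structures.

```python
def fixup_call_edges(call_edges, original_to_full):
    """Rewrite callee GUIDs once, so later traversals need no map lookup.

    call_edges: caller GUID -> list of callee GUIDs (toy index).
    original_to_full: GUID without the source path -> uniquified GUID.
    """
    fixed = {}
    for caller, callees in call_edges.items():
        # Callees not present in the map are already full GUIDs.
        fixed[caller] = [original_to_full.get(c, c) for c in callees]
    return fixed


# Toy values: 111 is an "original" GUID recorded by the profile,
# 999 is the uniquified GUID of the local it actually refers to.
edges = {1: [111, 2], 2: [111]}
fixed_edges = fixup_call_edges(edges, {111: 999})
```

After this single pass, importing and bitcode writing can walk `fixed_edges` any number of times without consulting the map again.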
This reduced the time required for a large thin link with SamplePGO by
about 20%.
No new test added, but I confirmed that there are existing tests that
will fail when no fixup is performed.
Differential Revision: https://reviews.llvm.org/D110374
Lei Zhang [Fri, 24 Sep 2021 19:21:11 +0000 (15:21 -0400)]
[mlir][tosa] Add some transpose folders
* If the input is a constant splat value, we just
need to reshape it.
* If the input is a general constant with one user,
we can also constant fold it, without bloating
the IR.
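The splat case can be illustrated with a short Python sketch: when every element is the same, transposing only permutes the shape. The function name and the list-based shape representation are assumptions of this example, and it follows the convention that output dimension `i` takes the size of input dimension `perms[i]`.

```python
def transpose_splat(shape, splat_value, perms):
    """Transpose a constant whose elements all equal splat_value.

    Only the shape changes; the splat value is reused as-is.
    """
    if sorted(perms) != list(range(len(shape))):
        raise ValueError("perms must be a permutation of the dimensions")
    new_shape = [shape[p] for p in perms]
    return new_shape, splat_value


new_shape, value = transpose_splat([2, 3, 4], 7, [2, 0, 1])
```

No element data is moved, which is why this case reduces to a reshape.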
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110439