Sumesh Udayakumaran [Sat, 25 Sep 2021 22:46:03 +0000 (01:46 +0300)]
[mlir] Mode for explicitly controlling the fusion kind
New mode option that allows for either running the default fusion kind that happens today or doing either of producer-consumer or sibling fusion. This will also be helpful to minimize the compile-time of the fusion tests.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D110102
Quinn Pham [Thu, 16 Sep 2021 19:00:01 +0000 (14:00 -0500)]
[PowerPC] Fix td pattern for P10 VSLDBI and VSRDBI
This patch fixes the pattern for the P10 instructions Vector Shift Left
Double by Bit Immediate VN-form and Vector Shift Right Double by Bit
Immediate VN-form. The third argument should be a target constant (`timm`)
instead of an `i32` because an immediate is expected.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D109920
Yaxun (Sam) Liu [Thu, 23 Sep 2021 03:45:27 +0000 (23:45 -0400)]
[HIP] Fix linking of asanrt.bc
HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which
changes the linkage of functions to be internal once they are linked in. This
works for common bitcode libraries since these functions are not intended
to be exposed for external callers.
However, the functions in the sanitizer bitcode library are intended to be
called by instructions generated by the sanitizer pass. If their linkage is
changed to internal, their parameters may be altered by optimizations before
the sanitizer pass, which renders them unusable by the sanitizer pass.
To fix this issue, HIP toolchain links the sanitizer bitcode library with
-mlink-bitcode-file, which does not change the linkage.
A struct BitCodeLibraryInfo is introduced in ToolChain as a generic
approach to pass the bitcode library information between ToolChain and Tool.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D110304
William S. Moses [Mon, 27 Sep 2021 16:55:24 +0000 (12:55 -0400)]
[MLIR][LLVM] Add error if using incorrect attribute type for specifying LLVM linkage
Address post-commit review in https://reviews.llvm.org/D108524 to add appropriate diagnostics.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110566
peter klausler [Wed, 15 Sep 2021 15:28:48 +0000 (08:28 -0700)]
[flang] Enforce constraint: defined ass't in WHERE must be elemental
A defined assignment subroutine invoked in the context of a WHERE
statement or construct must necessarily be elemental (C1032).
Differential Revision: https://reviews.llvm.org/D109932
Craig Topper [Mon, 27 Sep 2021 16:45:30 +0000 (09:45 -0700)]
[RISCV] Fold store of vmv.x.s to a vse with VL=1.
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.
We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D109482
Fangrui Song [Mon, 27 Sep 2021 16:50:41 +0000 (09:50 -0700)]
[ELF] Support symbol names with space in linker script expressions
Fix PR51961
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D110490
Kazu Hirata [Mon, 27 Sep 2021 16:49:32 +0000 (09:49 -0700)]
[InstCombine] Fix an "unused variable" warning
Bixia Zheng [Sat, 25 Sep 2021 06:19:07 +0000 (23:19 -0700)]
Implement the conversion from sparse constant to sparse tensors.
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.
Add tests.
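The three steps described above can be sketched in plain Python. This is only an illustration of the data flow, not the MLIR lowering: the function names are hypothetical, and CSR is assumed as the destination format purely for the example.

```python
def split_sparse_constant(entries):
    """Split [(index_tuple, value), ...] into parallel index and value lists."""
    indices = [list(idx) for idx, _ in entries]
    values = [val for _, val in entries]
    return indices, values


def fill_coo(indices, values):
    """Loop over the two split tensors and fill a coordinate-format container."""
    coo = []
    for idx, val in zip(indices, values):
        coo.append((tuple(idx), val))
    return coo


def coo_to_csr(coo, num_rows):
    """Convert 2-D COO entries to CSR: (row pointers, column indices, values).

    CSR is an assumed destination format chosen for this sketch.
    """
    ordered = sorted(coo)
    row_ptr = [0] * (num_rows + 1)
    for (row, _), _ in ordered:
        row_ptr[row + 1] += 1
    for row in range(num_rows):
        row_ptr[row + 1] += row_ptr[row]
    cols = [col for (_, col), _ in ordered]
    vals = [val for _, val in ordered]
    return row_ptr, cols, vals
```

Calling `split_sparse_constant`, then `fill_coo`, then `coo_to_csr` mirrors the order in which the commit message lists the steps.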
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110373
@vladaindjic [Mon, 27 Sep 2021 16:44:46 +0000 (19:44 +0300)]
[OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp
This minor refactoring introduces the TASK_TIED constant inside
kmp_gsupport.cpp as a replacement for the literal value 1.
The mentioned constant is now used in both kmp_tasking.cpp and
kmp_gsupport.cpp files.
Differential Revision: https://reviews.llvm.org/D110441
Craig Topper [Mon, 27 Sep 2021 16:37:04 +0000 (09:37 -0700)]
[RISCV] Improve support for forming widening multiplies when one input is a scalar splat.
If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.
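As a rough illustration of the "sufficient sign/zero bits" condition, the check can be read as asking whether the splatted scalar is representable in the narrow element type. The sketch below assumes that reading; the helper names are hypothetical and it does not model the actual SelectionDAG code.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_unsigned(value, bits):
    """True if value is representable as an unsigned integer of the given width."""
    return 0 <= value < (1 << bits)


def can_use_widening_mul(splat_value, narrow_bits, sign_extended):
    """Assumed criterion: the scalar must fit the narrow type of the other operand.

    sign_extended says whether the other operand is a sign extend (True)
    or a zero extend (False).
    """
    if sign_extended:
        return fits_signed(splat_value, narrow_bits)
    return fits_unsigned(splat_value, narrow_bits)
```

For example, a splat of 200 would qualify next to a zero-extended i8 operand but not next to a sign-extended one.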
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110028
Daniil Fukalov [Mon, 27 Sep 2021 16:23:47 +0000 (19:23 +0300)]
[NFC][AMDGPU] Update cost model tests:
1. Convert to generated tests.
2. Added code-size case in few places.
Sanjay Patel [Mon, 27 Sep 2021 16:06:40 +0000 (12:06 -0400)]
[InstCombine] move shl-only folds out from under commonShiftTransforms(); NFCI
This is no-functional-change-intended, but it hopefully makes things
slightly clearer and more efficient to have transforms that require
'shl' be called only from visitShl(). Further cleanup is possible.
Pavel Labath [Mon, 27 Sep 2021 15:57:22 +0000 (17:57 +0200)]
[lldb] A different fix for Domain Socket tests
We need to drop NULs from the end of the string.
Kazu Hirata [Mon, 27 Sep 2021 15:58:27 +0000 (08:58 -0700)]
[Lanai] Remove redundant declaration getTheLanaiTarget (NFC)
Note that getTheLanaiTarget is declared in
TargetInfo/LanaiTargetInfo.h, which LanaiDisassembler.cpp includes.
Identified with readability-redundant-declaration.
Kirill Bobyrev [Mon, 27 Sep 2021 15:50:50 +0000 (17:50 +0200)]
[clangd] Refactor IncludeStructure: use File (unsigned) for most computations
Preparation for D108194.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D110386
Joseph Huber [Fri, 24 Sep 2021 15:53:31 +0000 (11:53 -0400)]
[OpenMP] Add new worksharing definitions into device RTL
This patch defines the newly added `__kmpc_distribute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.
Depends on D110429
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110430
Joseph Huber [Fri, 24 Sep 2021 15:02:36 +0000 (11:02 -0400)]
[OpenMP] Introduce a new worksharing RTL function for distribute
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110429
Raphael Isemann [Mon, 27 Sep 2021 13:28:02 +0000 (15:28 +0200)]
[lldb] Fix SocketTest.DomainGetConnectURI on macOS by stripping more zeroes from getpeername result
Apparently macOS is padding the name result with several padding zeroes at
the end. Just strip them all to pretend it's a C-string.
Thanks to Pavel for suggesting this fix.
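The idea of stripping every trailing zero byte, rather than just one, can be shown in a few lines. This is a minimal Python sketch with a hypothetical helper name; the real fix lives in the lldb C++ socket code.

```python
def strip_trailing_nuls(raw: bytes) -> str:
    """Drop all trailing NUL bytes so the result behaves like a C string."""
    return raw.rstrip(b"\x00").decode()
```

A name padded with several zero bytes and an unpadded name both come out identical, which is the property the test relies on.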
Nico Weber [Mon, 27 Sep 2021 15:31:45 +0000 (11:31 -0400)]
[llvm/OptTable] Add named param comment for GroupedShortOption
Jake Egan [Mon, 27 Sep 2021 15:29:53 +0000 (11:29 -0400)]
Fix tests defaulting to incorrect triples on AIX
The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave.
This patch constrains the tests by specifying -mtriple instead of -march.
Reviewed By: daltenty, jsji, MaskRay
Differential Revision: https://reviews.llvm.org/D110186
Nico Weber [Mon, 27 Sep 2021 15:24:51 +0000 (11:24 -0400)]
[llvm/OptTable] Drop "The" prefix on fields
Nico Weber [Mon, 27 Sep 2021 15:19:04 +0000 (11:19 -0400)]
[llvm] Convert OptTable::ParseOneArg() to std::unique_ptr<>
Nico Weber [Mon, 27 Sep 2021 15:10:13 +0000 (11:10 -0400)]
[llvm] Convert OptTable::parseOneArgGrouped() to std::unique_ptr<>
Nico Weber [Mon, 27 Sep 2021 15:04:07 +0000 (11:04 -0400)]
[llvm] Convert Option::accept(), acceptInternal() to std::unique_ptr<>
These functions transfer ownership to the caller. Make this clear in the
type system.
No behavior change.
Sanjay Patel [Mon, 27 Sep 2021 13:27:28 +0000 (09:27 -0400)]
[InstCombine] generalize fold for (trunc (X u>> C1)) u>> C
This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c734 was a previous patch in this direction.
The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
is limited to 'shl' only now. The formatting change to IsLeftShift shows
that we could move several transforms into visitShl() directly for
efficiency because they are not common shift transforms.
2. The tests are regenerated to show new instruction names to prove that
we are getting (almost) identical logic results.
3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
we now use a narrow 'and' instruction. Previously, we relied on another
transform to do that, but it is limited to legal types. That seems to
be a legacy constraint from when IR analysis and codegen were less robust.
https://alive2.llvm.org/ce/z/JxyGA4
declare void @llvm.assume(i1)
define i8 @src(i32 %x, i32 %c0, i8 %c1) {
; The sum of the shifts must not overflow the source width.
%z1 = zext i8 %c1 to i32
%sum = add i32 %c0, %z1
%ov = icmp ult i32 %sum, 32
call void @llvm.assume(i1 %ov)
%sh1 = lshr i32 %x, %c0
%tr = trunc i32 %sh1 to i8
%sh2 = lshr i8 %tr, %c1
ret i8 %sh2
}
define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
%z1 = zext i8 %c1 to i32
%sum = add i32 %c0, %z1
%maskc = lshr i8 -1, %c1
%s = lshr i32 %x, %sum
%t = trunc i32 %s to i8
%a = and i8 %t, %maskc
ret i8 %a
}
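For readers without access to Alive2, the equivalence of `@src` and `@tgt` can be spot-checked with a small Python model of the two functions. This is a sketch under stated assumptions: unsigned Python integers stand in for i32/i8 values, and the shift amounts are restricted to `c1 < 8` and `c0 + c1 < 32` so that neither function would produce poison. Random sampling is not a proof.

```python
import random

MASK32 = 0xFFFFFFFF
MASK8 = 0xFF


def src(x, c0, c1):
    """(trunc (x u>> c0)) u>> c1, modelled on unsigned integers."""
    sh1 = (x & MASK32) >> c0
    tr = sh1 & MASK8          # trunc i32 -> i8
    return tr >> c1


def tgt(x, c0, c1):
    """trunc (x u>> (c0 + c1)), masked with (i8 -1 u>> c1)."""
    maskc = MASK8 >> c1
    s = (x & MASK32) >> (c0 + c1)
    return (s & MASK8) & maskc


def find_counterexample(samples=20000, seed=0):
    """Return a differing (x, c0, c1) triple, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(samples):
        c1 = rng.randrange(8)          # shifting an i8 requires c1 < 8
        c0 = rng.randrange(32 - c1)    # keeps c0 + c1 < 32, as in the assume
        x = rng.getrandbits(32)
        if src(x, c0, c1) != tgt(x, c0, c1):
            return (x, c0, c1)
    return None
```

With `x = 0x1234`, `c0 = 4`, `c1 = 2`, both functions return 8.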
Sanjay Patel [Sun, 26 Sep 2021 16:02:46 +0000 (12:02 -0400)]
[InstCombine] match variable names and code comments; NFC
Similar to: 29c09c7
Planned follow-up is to add a transform here to allow removing
a common shift fold that is conflicting with D110170.
Amy Kwan [Mon, 27 Sep 2021 13:51:25 +0000 (08:51 -0500)]
Explicitly specify -fintegrated-as to clang/test/Driver/compilation_database.c test case.
It appears that this test assumes that the toolchain utilizes the integrated
assembler by default, since the expected output in the CHECKs is
compilation_database.o.
However, this test fails on AIX as AIX does not utilize the integrated assembler.
On AIX, the output instead is of the form /tmp/compilation_database-*.s.
Thus, this patch explicitly adds the -fintegrated-as option to match the
assumption that the integrated assembler is used by default.
Differential Revision: https://reviews.llvm.org/D110431
Eugene Zhulenev [Mon, 27 Sep 2021 14:06:54 +0000 (07:06 -0700)]
[mlir] AsyncRuntime: use int64_t for ref counting operations
Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D110550
Dmitry Vyukov [Mon, 27 Sep 2021 12:57:18 +0000 (14:57 +0200)]
tsan: fix trace tests on darwin
The trace tests crashed on darwin because of some thread
initialization issues (thread initialization is somewhat
different on darwin).
Instead of starting real threads, create a new ThreadState
in the main thread. This makes the tests more unit-testy
and hopefully won't crash on darwin (there is almost no
platform-specific code involved now).
This will also help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.
Depends on D110539.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110546
Dmitry Vyukov [Mon, 27 Sep 2021 12:07:28 +0000 (14:07 +0200)]
tsan: add a test for stack init race
Depends on D110538.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110539
Dmitry Vyukov [Mon, 27 Sep 2021 11:43:33 +0000 (13:43 +0200)]
tsan: fix and test detection of TLS races
Currently detection of races with TLS/stack initialization
is broken because we imitate the write before thread initialization,
so it's modelled with a wrong thread/epoch.
Fix that and add a test.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110538
Sebastian Neubauer [Fri, 16 Jul 2021 11:15:49 +0000 (13:15 +0200)]
[AMDGPU] Ignore KILLs when forming clauses
KILL instructions are sometimes present and prevent hard
clauses from being formed.
Fix this by ignoring all meta instructions in clauses.
Differential Revision: https://reviews.llvm.org/D106042
Nico Weber [Fri, 24 Sep 2021 23:42:09 +0000 (19:42 -0400)]
[clang] Put original flags on 'Driver args:' crash report line
We used to put the canonical spelling of flags after alias processing
on that line. For clang-cl in particular, that meant that we put flags
on that line that the clang-cl driver doesn't even accept, and the
"Driver args:" line wasn't usable.
Differential Revision: https://reviews.llvm.org/D110458
Dmitry Vyukov [Mon, 27 Sep 2021 11:30:32 +0000 (13:30 +0200)]
tsan: de-hardcode MemCount const
Use MemCount instead of hard-coded value 7.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110532
Michał Górny [Sat, 18 Sep 2021 15:31:14 +0000 (17:31 +0200)]
[lldb] [DynamicRegisterInfo] Add a convenience method to add suppl. registers
Add a convenience method to add supplementary registers that takes care
of adding invalidate_regs to all (potentially) overlapping registers.
Differential Revision: https://reviews.llvm.org/D110023
Sjoerd Meijer [Mon, 27 Sep 2021 07:39:53 +0000 (08:39 +0100)]
[FuncSpec] Don't specialise (or crash) on poison or constexpr values
Function specialization was crashing on poison values and constexpr values.
The problem is that these values are not added to the solver, so it crashes
when a lookup is performed for these values. This fixes that by not
specialising on these values. For poison that is obvious, but for constexpr
this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but
specialising on constexpr values wasn't done very intentionally, and needs some
more work and tests if we want to support it.
As a follow-up, we need to look at whether the solver should exit more gracefully and
return a "don't know", or whether it should really support these constexprs.
This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600).
Differential Revision: https://reviews.llvm.org/D110529
David Green [Mon, 27 Sep 2021 13:43:26 +0000 (14:43 +0100)]
[AArch64] Fix neon-reverseshuffle test extension. NFC
Apparently I gave an ll file a .patch extension. Oops.
Aaron Ballman [Mon, 27 Sep 2021 13:39:45 +0000 (09:39 -0400)]
Removing a default constructor argument; NFC
The argument is always used with its default value, so remove the
argument entirely.
Sjoerd Meijer [Wed, 22 Sep 2021 12:06:23 +0000 (13:06 +0100)]
[LoopFlatten] Precommit new test widen-iv2.ll for D110234.
gbreynoo [Mon, 27 Sep 2021 13:28:31 +0000 (14:28 +0100)]
[llvm-dwarfdump][docs] Add missing options to the help output and the command guide
This change is to add some missing details to the help text and command
guide:
- Added a note to the command guide that --debug-macro also dumps
.debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame
are aliases, and in cases where both sections are present one command
outputs both.
- Changed the wording in the help output for --ignore-case and --regex to
more closely match the command guide.
Jun Ma [Mon, 27 Sep 2021 12:39:05 +0000 (20:39 +0800)]
Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""
This reverts commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614.
LLVM GN Syncbot [Mon, 27 Sep 2021 12:33:13 +0000 (12:33 +0000)]
[gn build] Port 9da2fa277e81
Michał Górny [Sat, 25 Sep 2021 10:47:06 +0000 (12:47 +0200)]
[lldb] Move StringConvert inside debugserver
The StringConvert API is no longer used anywhere but in debugserver.
Since debugserver does not use LLVM API, we cannot replace it with
llvm::to_integer() and llvm::to_float() there. Let's just move
the sources into debugserver.
Differential Revision: https://reviews.llvm.org/D110478
Pushpinder Singh [Fri, 24 Sep 2021 07:01:01 +0000 (07:01 +0000)]
[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Keeping all the checks in one place for future simplification.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D110513
Michał Górny [Fri, 24 Sep 2021 11:25:27 +0000 (13:25 +0200)]
[lldb] [Host] Refactor XML converting getters
Refactor the XML converting attribute and text getters to use LLVM API.
While at it, remove some redundant error and missing XML support
handling, as the called base functions do that anyway. Add tests
for these methods.
Note that this patch changes the getter behavior to be IMHO more
correct. In particular:
- negative and overflowing integers are now reported as failures to
convert, rather than being wrapped over or capped
- digits followed by text are now reported as failures to convert
to double, rather than their numeric part being converted
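The stricter conversion rules listed above can be illustrated with a small Python sketch. The function names and the 64-bit default are assumptions made for the example; this is not the lldb implementation, only a model of "fail instead of wrapping, capping, or partially converting".

```python
def parse_unsigned(text, bits=64):
    """Return the integer value, or None when the text does not convert cleanly."""
    # Only plain ASCII digits are accepted, so "-1" and "12abc" both fail.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    # Overflow is reported as a failure rather than being capped or wrapped.
    if value >= (1 << bits):
        return None
    return value


def parse_double(text):
    """Return the float value, or None when any part of the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None
```

Under this model `"42"` converts, while `"-1"`, `"18446744073709551616"` and `"3.5abc"` are all reported as failures.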
Differential Revision: https://reviews.llvm.org/D110410
Michael Kruse [Mon, 27 Sep 2021 12:11:41 +0000 (07:11 -0500)]
[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL.
Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.
Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110251
Tobias Gysi [Mon, 27 Sep 2021 10:07:44 +0000 (10:07 +0000)]
[mlir][linalg] Make fusion on tensor rewriter friendly (NFC).
Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110169
Emre Kultursay [Mon, 27 Sep 2021 10:56:53 +0000 (12:56 +0200)]
Fix rendezvous for rebase_exec=true case
When rebase_exec=true in DidAttach(), all modules are loaded
before the rendezvous breakpoint is set, which means the
LoadInterpreterModule() method is not called and m_interpreter_module
is not initialized.
This causes the very first rendezvous breakpoint hit with
m_initial_modules_added=false to accidentally unload the
module_sp that corresponds to the dynamic loader.
This bug (introduced in D92187) was causing the rendezvous
mechanism to not work in Android 28. The mechanism works
fine on older/newer versions of Android.
Test: Verified rendezvous on Android 28 and 29
Test: Added dlopen test
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D109797
Roman Lebedev [Mon, 27 Sep 2021 11:15:58 +0000 (14:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=2 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/q6GbK89br - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `18`.
For store we have:
https://godbolt.org/z/Yzfoo5TnW - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
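The chosen costs in this series of cost-model commits all coincide with the largest reported `Block RThroughput` across the listed CPU families. The sketch below encodes that observation as a rule; treating it as "take the worst case and round up" is an assumption read off the numbers, not something the commit message states, and the helper name is hypothetical.

```python
import math


def pick_cost(rthroughput_by_family):
    """Assumed rule: use the worst (largest) reciprocal throughput, rounded up."""
    return math.ceil(max(rthroughput_by_family.values()))
```

For the VF=32 load case, `pick_cost({"intel": 18.0, "ryzen": 7.0})` gives 18, matching the cost picked above.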
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110507
Roman Lebedev [Mon, 27 Sep 2021 11:15:49 +0000 (14:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=2 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.5`
So pick cost of `9`.
For store we have:
https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110506
Roman Lebedev [Mon, 27 Sep 2021 11:15:37 +0000 (14:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=2 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/e5YE99a4P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/3vM4KsE1n - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `3`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110505
Roman Lebedev [Mon, 27 Sep 2021 11:15:25 +0000 (14:15 +0300)]
[X86][Costmodel] Load/store i16 Stride=2 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1j3nf3dro - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/4n1zvP37j - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110504
Dmitry Vyukov [Sat, 25 Sep 2021 11:13:11 +0000 (13:13 +0200)]
tsan: align ThreadState to cache line
There are 2 reasons to do this:
1. We place hot data in the first cache line of ThreadState,
this assumed that it's cache-line-aligned but we never actually
enforced it (or it was lost at some point).
2. The new vector clock uses vector instructions and requires
data alignment. Later the new vector clock will be embedded in
ThreadState, then ensuring vector clock alignment will be
impossible w/o ThreadState alignment.
Depends on D110519.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110520
Dmitry Vyukov [Sat, 25 Sep 2021 10:57:29 +0000 (12:57 +0200)]
tsan: move shadow stack into ThreadState
Currently the shadow stack is located in the trace memory mapping.
The new tsan runtime will remove the trace memory mapping.
Move the shadow stack into ThreadState as a preparation step.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110519
Fraser Cormack [Wed, 1 Sep 2021 16:02:35 +0000 (17:02 +0100)]
[DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.
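To see why a reduction with no active element can be replaced by its start value, consider a simple Python model of a vector-predicated add reduction. The model and its function names are illustrative assumptions: a lane counts as active only if its index is below the explicit vector length (evl) and its mask bit is set.

```python
def vp_reduce_add(start, vec, mask, evl):
    """Add every active lane onto start; lane i is active if i < evl and mask[i]."""
    acc = start
    for i, (value, enabled) in enumerate(zip(vec, mask)):
        if i < evl and enabled:
            acc += value
    return acc


def no_lane_active(mask, evl):
    """The two foldable cases: zero vector length or an all-false mask."""
    return evl == 0 or not any(mask)
```

Whenever `no_lane_active` holds, the loop never adds anything, so the result is just `start`.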
This is tested rather trivially at the IR level, where it's possible
that we want to teach instcombine to perform this optimization.
However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.
It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting) and reduce the width of an
operation and insert it into a larger undef vector:
```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```
This is somewhat analogous to similar vector narrow/widening
optimizations, but it's unclear at this point whether that's beneficial
to do this for VP ops for any/all targets.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D109148
David Green [Mon, 27 Sep 2021 10:21:21 +0000 (11:21 +0100)]
[ARM] Improve detection of fallthrough when aligning blocks
We align non-fallthrough branches under Cortex-M at O3 to lead to fewer
instruction fetches. This improves that for the block after a LE or
LETP. These blocks will still have terminating branches until the
LowOverheadLoops pass is run (as they are not handled by analyzeBranch,
the branch is not removed until later), so canFallThrough will return
false. These extra branches will eventually be removed, leaving a
fallthrough, so treat them as such and don't add unnecessary alignments.
Differential Revision: https://reviews.llvm.org/D107810
Nicolas Vasilache [Fri, 24 Sep 2021 15:44:33 +0000 (15:44 +0000)]
[mlir] Factor out constraint set creation from hoist padding.
This revision adds a
```
FlatAffineValueConstraints(ValueRange ivs, ValueRange lbs, ValueRange ubs)
```
method and uses it in hoist padding.
Differential Revision: https://reviews.llvm.org/D110427
Daniel Kiss [Mon, 27 Sep 2021 10:01:35 +0000 (12:01 +0200)]
[libunwind] Support cfi_undefined and cfi_register for float registers.
During a backtrace the `.cfi_undefined` for a float register causes an assert in libunwind.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D110144
Max Kazantsev [Mon, 27 Sep 2021 09:55:31 +0000 (16:55 +0700)]
[Test] Regenerate test checks with autogen script
Nicolas Vasilache [Fri, 24 Sep 2021 13:42:51 +0000 (13:42 +0000)]
[mlir][Linalg] Refactor padding hoisting - NFC
This revision extracts padding hoisting into a new file and cleans it up in anticipation of future improvements and extensions.
Differential Revision: https://reviews.llvm.org/D110414
Simon Pilgrim [Mon, 27 Sep 2021 09:39:28 +0000 (10:39 +0100)]
[X86] combineVectorHADDSUB - remove the broken HOP(x,x) merging code (PR51974)
The intention of this code turns out to be superfluous as we can handle this with shuffle combining, and it has a critical flaw in that it doesn't check for dependencies.
Fixes PR51974
Florian Hahn [Tue, 7 Sep 2021 11:12:23 +0000 (13:12 +0200)]
[LV] Add tests where rt checks may make vectorization unprofitable.
Add a few additional tests which require a large number of runtime
checks for D109368.
Pushpinder Singh [Mon, 27 Sep 2021 09:30:08 +0000 (09:30 +0000)]
[libomptarget][nfc][amdgpu] Reorder function to clarify review diff
Matthias Springer [Mon, 27 Sep 2021 08:13:11 +0000 (17:13 +0900)]
[mlir][vector] Fix bug in vector-transfer-full-partial-split
When splitting with linalg.copy, cannot write into the destination alloc directly. Instead, write into a subview of the alloc.
Differential Revision: https://reviews.llvm.org/D110512
Ben Shi [Sat, 25 Sep 2021 03:22:11 +0000 (03:22 +0000)]
[AArch64][test] Add more tests of add/sub with immediate
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D110474
David Spickett [Mon, 27 Sep 2021 08:44:09 +0000 (08:44 +0000)]
[llvm] Disable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Arm Linux
Due to the way detecting the hard float ABI is currently
handled, clang fails to find the per target dir.
I am working to fix this but in the meantime disable it by
default on Arm Linux.
Fraser Cormack [Fri, 24 Sep 2021 15:08:37 +0000 (16:08 +0100)]
[RISCV] Create the correct mask type when lowering EXTRACT_VECTOR_ELT
This particular case was creating a `VMSET_VL` using the old
fixed-length type in order to pass a mask to other custom nodes
operating on the scalable container type. This kind of thing wasn't
caught for us; I only noticed when experimenting with odd-length
vectors, where it was trying to generate an invalid `v3i1` MVT.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110420
Krasimir Georgiev [Mon, 27 Sep 2021 08:43:56 +0000 (10:43 +0200)]
Jon Chesterfield [Mon, 27 Sep 2021 08:38:07 +0000 (09:38 +0100)]
[libomptarget][amdgpu] Replace dead exit call with returning error
Michał Górny [Sun, 26 Sep 2021 12:07:21 +0000 (14:07 +0200)]
[llvm] [ADT] Add a range/iterator-based Split()
Add a llvm::Split() implementation that can be used via range-for loop,
e.g.:
for (StringRef x : llvm::Split("foo,bar,baz", ','))
...
The implementation uses an additional SplittingIterator class that
uses StringRef::split() internally.
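The shape of a lazy, iterator-based split can be sketched with a Python generator. This is an analogy only: the function below is hypothetical, it does not reproduce SplittingIterator, and its handling of edge cases such as empty input reflects this sketch rather than any documented llvm::Split() behaviour.

```python
def split(text, sep):
    """Yield the pieces of text between occurrences of sep, one at a time."""
    if not sep:
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        pos = text.find(sep, start)
        if pos == -1:
            # No further separator: the remainder is the last piece.
            yield text[start:]
            return
        yield text[start:pos]
        start = pos + len(sep)
```

Because pieces are produced on demand, a loop such as `for piece in split("foo,bar,baz", ","):` never materialises the full list.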
Differential Revision: https://reviews.llvm.org/D110496
Balazs Benics [Mon, 27 Sep 2021 08:17:12 +0000 (10:17 +0200)]
[clang][AST] Add support for ShuffleVectorExpr to ASTImporter
Addresses https://bugs.llvm.org/show_bug.cgi?id=51902
Reviewed By: shafik, martong
Differential Revision: https://reviews.llvm.org/D110052
Max Kazantsev [Mon, 27 Sep 2021 08:00:56 +0000 (15:00 +0700)]
[Test] Add test showing that SCEV cannot properly infer ranges of cycled phis
Krasimir Georgiev [Mon, 27 Sep 2021 07:35:58 +0000 (09:35 +0200)]
[lldb] silence -Wsometimes-uninitialized warnings
No functional changes intended.
Silence warnings from
https://github.com/llvm/llvm-project/commit/3a6ba3675177cb5e47dee325f300aced4cd864ed.
Vignesh Balu [Mon, 27 Sep 2021 06:19:09 +0000 (11:49 +0530)]
[OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100182
This patch implements the OMPD API as specified in the standard doc.
Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100183
serge-sans-paille [Mon, 27 Sep 2021 06:23:38 +0000 (08:23 +0200)]
Make analyze-cc path discovery sensitive to symlinks
Fix https://bugs.llvm.org/show_bug.cgi?id=51897
Differential Revision: https://reviews.llvm.org/D110521
Freddy Ye [Mon, 27 Sep 2021 05:06:09 +0000 (13:06 +0800)]
[X86][ISel] Lowering FROUND(f16) and FROUNDEVEN(f16)
When AVX512FP16 is enabled, FROUND(f16) cannot be handled by
TypeLegalize, and no libcall in libm is ready for fround(f16) now.
FROUNDEVEN(f16) has a related instruction in AVX512FP16.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110312
Max Kazantsev [Mon, 27 Sep 2021 05:12:51 +0000 (12:12 +0700)]
[Test] Add some simple tests where IndVars cannot remove a check in loop
Previously I've added tests that require context for inference, but it
seems that SCEV can't prove the same facts even when the context isn't required.
Michael Kruse [Mon, 27 Sep 2021 01:10:26 +0000 (20:10 -0500)]
[Polly] Reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.
This patch rejects regions entered by an indirectbr/callbr to not fail
later at code generation.
This fixes llvm.org/PR51964
Lang Hames [Mon, 27 Sep 2021 01:26:10 +0000 (18:26 -0700)]
[ORC] Add missing lock to CompileOnDemandLayer::getPerDylibResources.
The getPerDylibResources method may be called concurrently from multiple
threads, so we need to protect access to the underlying map.
Possible fix for https://llvm.org/PR51064
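The pattern of guarding a lazily populated map with a lock can be sketched as follows. The class and method names are hypothetical stand-ins, not the ORC API; the point is only that lookup and insertion happen under the same lock.

```python
import threading


class PerDylibResources:
    """Toy model of a per-dylib resource map that is safe to query concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resources = {}

    def get(self, dylib_name, factory):
        # Hold the lock for both the lookup and the insertion, so two threads
        # cannot both decide the entry is missing and create it twice.
        with self._lock:
            if dylib_name not in self._resources:
                self._resources[dylib_name] = factory()
            return self._resources[dylib_name]
```

With the lock in place, concurrent callers asking for the same dylib all receive the one entry that the first caller created.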
Wang, Pengfei [Mon, 27 Sep 2021 00:44:44 +0000 (08:44 +0800)]
[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110336
Lang Hames [Mon, 27 Sep 2021 00:56:47 +0000 (17:56 -0700)]
[ORC] Fix SimpleRemoteEPC data races.
Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup
until the client has been configured. This avoids races on client members if the
first message arrives while the client is being configured.
Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.
Amara Emerson [Mon, 27 Sep 2021 00:24:58 +0000 (17:24 -0700)]
[GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour.
Mehdi Amini [Sun, 26 Sep 2021 22:04:41 +0000 (22:04 +0000)]
Fix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC)
Mehdi Amini [Sun, 26 Sep 2021 22:01:19 +0000 (22:01 +0000)]
Fix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC)
Lang Hames [Sun, 26 Sep 2021 21:14:41 +0000 (14:14 -0700)]
[MCJIT] This test shouldn't require an unwind table.
This should fix the failures on the Fuchsia bot that started in
https://lab.llvm.org/buildbot/#/builders/98/builds/6401.
Michał Górny [Sat, 25 Sep 2021 10:31:25 +0000 (12:31 +0200)]
[lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()
Replace the uses of StringConvert combined with hand-rolled array
splitting with llvm::StringRef.split() and llvm::to_integer().
Differential Revision: https://reviews.llvm.org/D110472
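The split-then-parse approach can be illustrated without LLVM headers. The sketch below is an assumption-laden stand-in: it uses `std::string_view` and `std::from_chars` in place of llvm::StringRef.split() and llvm::to_integer(), and the helper names `splitFields` and `parseUnsigned` are hypothetical, not taken from the patch.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Splits Input on Sep, keeping empty fields (hypothetical helper).
std::vector<std::string_view> splitFields(std::string_view Input, char Sep) {
  std::vector<std::string_view> Fields;
  while (true) {
    std::size_t Pos = Input.find(Sep);
    Fields.push_back(Input.substr(0, Pos));
    if (Pos == std::string_view::npos)
      break;
    Input.remove_prefix(Pos + 1);
  }
  return Fields;
}

// Parses all of Text as an unsigned integer in the given base.
// Returns std::nullopt on empty input, trailing characters, or overflow.
std::optional<std::uint32_t> parseUnsigned(std::string_view Text, int Base) {
  std::uint32_t Value = 0;
  const char *End = Text.data() + Text.size();
  auto [Ptr, Ec] = std::from_chars(Text.data(), End, Value, Base);
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  return Value;
}
```

Reporting failure explicitly, rather than silently returning a fallback value, is what lets a caller tell a malformed field apart from a legitimate zero.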
Nikita Popov [Sun, 26 Sep 2021 19:21:13 +0000 (21:21 +0200)]
[BasicAA] Don't check whether GEP is sized (NFC)
GEPs are required to have sized source element type, so we can
just assert that here.
Simon Pilgrim [Sun, 26 Sep 2021 18:27:38 +0000 (19:27 +0100)]
[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
Simon Pilgrim [Sun, 26 Sep 2021 17:42:51 +0000 (18:42 +0100)]
[X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
For 128-bit vectors, we can remove a PACK of an EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
Lang Hames [Sun, 26 Sep 2021 18:02:37 +0000 (11:02 -0700)]
[ORC] Remove OrcRemoteTargetClient and OrcRemoteTargetServer.
Now that the lli and lli-child-target tools have been updated to use
SimpleRemoteEPC (6498b0e991b) the OrcRemoteTarget* APIs are no longer needed.
Once the LLJITWithRemoteDebugging example has been migrated to SimpleRemoteEPC
we will remove OrcRPCExecutorProcessControl, and the ORC RPC system itself.
Lang Hames [Sun, 26 Sep 2021 18:15:42 +0000 (11:15 -0700)]
[ORC] Export process symbols in lli-child-target.
We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
LLVM GN Syncbot [Sun, 26 Sep 2021 17:25:08 +0000 (17:25 +0000)]
[gn build] Port 6498b0e991ba
Lang Hames [Sun, 26 Sep 2021 00:53:21 +0000 (10:53 +1000)]
Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47a938ef35cbd7b61a1e5fa74e68c9ed) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801bf2c3006482cf3cbd3170b3b4ea2e1b) which were
reverted in 99951a56842d8e4cd0706cd17a04f77b5d0f6dd0 due to bot failures.
The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277912afc201da721fa659ecef7ab7fba) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b7e7fea5fc8afbed75c34c4a43287ba).
Simon Pilgrim [Sun, 26 Sep 2021 17:08:17 +0000 (18:08 +0100)]
[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. it's zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements.
There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.
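The arithmetic behind the fold can be checked with a scalar model. The sketch below is illustrative only and is not the DAG combine: it assumes the upper element of each i16 pair in X and Z is zero, and the shuffle shown (`interleaveLow`, a hypothetical name) is one shuffle that satisfies the identity, not necessarily the mask the patch emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8i16 = std::array<std::int16_t, 8>;
using V4i32 = std::array<std::int32_t, 4>;

// Scalar model of PMADDWD: multiply adjacent i16 pairs and sum each pair
// into one i32 lane. The single wrap-around case (all four inputs of a
// lane equal to -32768) is not modelled here.
V4i32 pmaddwd(const V8i16 &A, const V8i16 &B) {
  V4i32 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = std::int32_t(A[2 * I]) * B[2 * I] +
           std::int32_t(A[2 * I + 1]) * B[2 * I + 1];
  return R;
}

// Takes the low element of each pair from A and B:
// {A0, B0, A2, B2, A4, B4, A6, B6}.
V8i16 interleaveLow(const V8i16 &A, const V8i16 &B) {
  V8i16 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[2 * I] = A[2 * I];
    R[2 * I + 1] = B[2 * I];
  }
  return R;
}

// Lane-wise i32 addition.
V4i32 add(const V4i32 &A, const V4i32 &B) {
  V4i32 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] + B[I];
  return R;
}
```

With the upper elements of X and Z zero, lane I of the sum is X[2I]*Y[2I] + Z[2I]*W[2I], which is exactly what a single PMADDWD computes on the interleaved operands.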
Lang Hames [Sun, 26 Sep 2021 16:58:46 +0000 (09:58 -0700)]
[ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).
Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to
enable debug-logging.
Kazu Hirata [Sun, 26 Sep 2021 16:26:56 +0000 (09:26 -0700)]
[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)
Note that RISCVMnemonicSpellCheck is defined in
RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes.
Identified with readability-redundant-declaration.
Roman Lebedev [Sun, 26 Sep 2021 16:06:16 +0000 (19:06 +0300)]
[X86][Costmodel] Load/store i16 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`;
for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`;
for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103144
Nikita Popov [Sun, 26 Sep 2021 16:01:26 +0000 (18:01 +0200)]
[DSE] Don't check getUnderlyingObject() return value (NFC)
getUnderlyingObject() never returns null. It will simply return
something that is not the "root" underlying object.
Also drop a stale comment.
Nikita Popov [Sun, 26 Sep 2021 15:52:20 +0000 (17:52 +0200)]
[DSE] Make DSEState non-copyable (NFC)
As it contains a self-reference, the default copy/move ctors
would not be safe.
Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.
This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
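Why a self-referencing object must not be copied can be shown with a small stand-alone sketch. `State`, `Self`, and `selfIsConsistent` are hypothetical names and this is not the DSEState code; it only assumes a class that stores a reference to itself and does its setup in the constructor.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class that keeps a reference to itself. A defaulted copy
// would bind the new object's Self to the *source* object, so copying is
// deleted. Deleting the copy operations also suppresses the implicit moves.
class State {
public:
  // All setup happens in the constructor, so no factory function has to
  // return a State by value.
  explicit State(int Limit) : Limit(Limit), Self(*this) {}

  State(const State &) = delete;
  State &operator=(const State &) = delete;

  int limit() const { return Limit; }
  bool selfIsConsistent() const { return &Self == this; }

private:
  int Limit;
  const State &Self;
};

static_assert(!std::is_copy_constructible<State>::value,
              "State must not be copyable");
static_assert(!std::is_move_constructible<State>::value,
              "State must not be movable");
```

Constructing the object in place, as in `State S(4);`, keeps the self-reference valid regardless of whether the compiler applies NRVO.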