River Riddle [Thu, 1 Oct 2020 00:23:11 +0000 (17:23 -0700)]
[mlir] Split Dialect::addOperations into two functions
The current implementation uses a fold expression to add all of the operations at once. This is really nice, but apparently the lifetime of each of the AbstractOperation instances is for the entire expression which may lead to a stack overflow for large numbers of operations. This splits the method in two to allow for the lifetime of the AbstractOperation to be properly scoped.
peter klausler [Wed, 30 Sep 2020 22:04:43 +0000 (15:04 -0700)]
[flang] Fix Gw.d format output
The estimation of the decimal exponent needs to allow for all
'd' of the requested significant digits.
Also accept a plus sign on a "+kP" scaling factor in a format.
Differential revision: https://reviews.llvm.org/D88618
Geoffrey Martin-Noble [Thu, 1 Oct 2020 00:47:25 +0000 (17:47 -0700)]
Remove `Ops` suffix from dialect library names
Dialects include more than just ops, so this suffix is outdated. Follows
discussion in
https://llvm.discourse.group/t/rfc-canonical-file-paths-to-dialects/621
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D88530
Sam Clegg [Fri, 31 Jul 2020 00:44:32 +0000 (17:44 -0700)]
[lld][WebAssembly] Allow exporting of mutable globals
In particular allow explict exporting of `__stack_pointer` but
exclud this from `--export-all` to avoid requiring the mutable
globals feature whenenve `--export-all` is used.
This uncovered a bug in populateTargetFeatures regarding checking
if the mutable-globals feature is allowed.
See: https://github.com/WebAssembly/binaryen/issues/2934
Differential Revision: https://reviews.llvm.org/D88506
Amara Emerson [Thu, 1 Oct 2020 00:35:53 +0000 (17:35 -0700)]
Try to fix build. May have used a C++ feature too new/not supported on all platforms.
Arthur Eubanks [Wed, 30 Sep 2020 16:22:18 +0000 (09:22 -0700)]
[WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.
This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D88588
Amara Emerson [Thu, 1 Oct 2020 00:20:57 +0000 (17:20 -0700)]
[AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.
Also use this opportunity start to clean up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately since the legalizer rule builders require
std::initializer_list objects as parameters we can't programmatically generate the
type lists.
peter klausler [Wed, 30 Sep 2020 19:53:00 +0000 (12:53 -0700)]
[flang] Allow record advancement in external formatted sequential READ
The '/' control edit descriptor causes a runtime crash for an
external formatted sequential READ because the AdvanceRecord()
member function for external units implemented only the tasks
to finish reading the current record. Split those out into
a new FinishReadingRecord() member function, call that instead
from EndIoStatement(), and change AdvanceRecord() to both
finish reading the current record and to begin reading the next
one.
Differential revision: https://reviews.llvm.org/D88607
Jonas Devlieghere [Thu, 1 Oct 2020 00:01:27 +0000 (17:01 -0700)]
[lldb] Make TestGuiBasicDebug more lenient
Matt's change to the register allocator in
89baeaef2fa9 changed where we
end up after the `finish`. Before we'd end up on line 4.
* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (int) $0 = 1
frame #0: 0x0000000100003f7d a.out`main(argc=1, argv=0x00007ffeefbff630) at main.c:4:3
1 extern int func();
2
3 int main(int argc, char **argv) {
-> 4 func(); // Break here
5 func(); // Second
6 return 0;
7 }
Now, we end up on line 5.
* thread #1, queue = 'com.apple.main-thread', stop reason = step out
Return value: (int) $0 = 1
frame #0: 0x0000000100003f8d a.out`main(argc=1, argv=0x00007ffeefbff630) at main.c:5:3
2
3 int main(int argc, char **argv) {
4 func(); // Break here
-> 5 func(); // Second
6 return 0;
7 }
Given that this is not expected stable to be stable I've made the test a
bit more lenient to accept both scenarios.
Jessica Paquette [Wed, 30 Sep 2020 01:23:02 +0000 (18:23 -0700)]
[AArch64][GlobalISel] NFC: Refactor G_FCMP selection code
Refactor this so it's similar to the existing integer comparison code.
Also add some missing 64-bit testcases to select-fcmp.mir.
Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.
Differential Revision: https://reviews.llvm.org/D88614
Ranjeet Singh [Wed, 30 Sep 2020 23:30:36 +0000 (00:30 +0100)]
[ARM] Add missing target for Arm neon test case.
This is a follow-up from https://reviews.llvm.org/D61717. Where Richard
described the issue with compiling arm_neon.h under
-flax-vector-conversions=none. It looks like the example reproducer does
actually work but what was missing was a test entry for that target.
Differential Revision: https://reviews.llvm.org/D88546
Joachim Protze [Wed, 30 Sep 2020 23:01:09 +0000 (01:01 +0200)]
[OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS
Currently, the parser used to tokenize the TSAN_OPTIONS in libomp uses
only spaces as separators, even though TSAN in compiler-rt supports
other separators like ':' or ','.
CTest uses ':' to separate sanitizer options by default.
The documentation for other sanitizers mentions ':' as separator,
but TSAN only lists spaces, which is probably where this mismatch originated.
Patch provided by upsj
Differential Revision: https://reviews.llvm.org/D87144
Craig Topper [Wed, 30 Sep 2020 20:55:01 +0000 (13:55 -0700)]
Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579)
Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes to behave correctly in the case that the size of the significand is a multiple of the width of the integerParts making up the significand.
The patch to IEEEFloat::isSignificandAllOnes fixes bug 34579, and the patch to IEEE:Float:isSignificandAllZeros fixes the unit test "APFloatTest.x87Next" I added here. I have included both in this diff since the changes are very similar.
Patch by Andrew Briand
Ahsan Saghir [Tue, 29 Sep 2020 14:40:38 +0000 (09:40 -0500)]
[PowerPC] Add outer product instructions for MMA
This patch adds outer product instructions for MMA, including related infrastructure, and their tests.
Depends on D84968.
Reviewed By: #powerpc, bsaleil, amyk
Differential Revision: https://reviews.llvm.org/D88043
Akira Hatanaka [Wed, 30 Sep 2020 23:05:17 +0000 (16:05 -0700)]
Handle unknown OSes in DarwinTargetInfo::getExnObjectAlignment
rdar://problem/
69727650
Joachim Protze [Wed, 30 Sep 2020 22:53:41 +0000 (00:53 +0200)]
[OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches
This patch updates the expected results for the GOMP interface patches: D87267, D87269, and D87271.
The taskwait-depend test is changed to really use taskwait-depend and copied to an task_if0-depend test.
To pass the tests, the handling of the return address was fixed.
Differential Revision: https://reviews.llvm.org/D87680
Joachim Protze [Wed, 30 Sep 2020 08:40:51 +0000 (10:40 +0200)]
[OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant
OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices.
Since this change is also 5.0 compliant, no versioning is needed.
Differential Revision: https://reviews.llvm.org/D88149
peter klausler [Wed, 30 Sep 2020 20:34:23 +0000 (13:34 -0700)]
[flang] Semantic analysis for FINAL subroutines
Represent FINAL subroutines in the symbol table entries of
derived types. Enforce constraints. Update tests that have
inadvertent violations or modified messages. Added a test.
The specific procedure distinguishability checking code for generics
was used to enforce distinguishability of FINAL procedures.
(Also cleaned up some confusion and redundancy noticed in the
type compatibility infrastructure while digging into that area.)
Differential revision: https://reviews.llvm.org/D88613
Reid Kleckner [Wed, 30 Sep 2020 21:55:51 +0000 (14:55 -0700)]
Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success
values.
This reverts commit
8d250ac3cd48d0f17f9314685a85e77895c05351.
Relands commit
49b3459930655d879b2dc190ff8fe11c38a8be5f..
Original commit message:
-----------------------------------------
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
Stanislav Mekhanoshin [Wed, 30 Sep 2020 22:01:33 +0000 (15:01 -0700)]
[AMDGPU] Reorganize VOP3P encoding
This changes width of encoding and opcode fields to match the
documentation.
Differential Revision: https://reviews.llvm.org/D88619
Vitaly Buka [Wed, 30 Sep 2020 10:02:41 +0000 (03:02 -0700)]
[Msan] Add ptsname, ptsname_r interceptors
Reviewed By: eugenis, MaskRay
Differential Revision: https://reviews.llvm.org/D88547
MaheshRavishankar [Wed, 30 Sep 2020 21:55:59 +0000 (14:55 -0700)]
[mlir][Linalg] Add pattern to tile and fuse Linalg operations on buffers.
The pattern is structured similar to other patterns like
LinalgTilingPattern. The fusion patterns takes options that allows you
to fuse with producers of multiple operands at once.
- The pattern fuses only at the level that is known to be legal, i.e
if a reduction loop in the consumer is tiled, then fusion should
happen "before" this loop. Some refactoring of the fusion code is
needed to fuse only where it is legal.
- Since the fusion on buffers uses the LinalgDependenceGraph that is
not mutable in place the fusion pattern keeps the original
operations in the IR, but are tagged with a marker that can be later
used to find the original operations.
This change also fixes an issue with tiling and
distribution/interchange where if the tile size of a loop were 0 it
wasnt account for in these.
Differential Revision: https://reviews.llvm.org/D88435
Reid Kleckner [Wed, 30 Sep 2020 21:55:32 +0000 (14:55 -0700)]
Revert "[PDB] Merge types in parallel when using ghashing"
This reverts commit
49b3459930655d879b2dc190ff8fe11c38a8be5f.
Reid Kleckner [Thu, 14 May 2020 21:02:36 +0000 (14:02 -0700)]
[PDB] Merge types in parallel when using ghashing
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
Sam McCall [Wed, 30 Sep 2020 21:19:08 +0000 (23:19 +0200)]
[clangd] Remove dead variable. NFC
peter klausler [Wed, 30 Sep 2020 19:43:21 +0000 (12:43 -0700)]
[flang] Fix descriptor-based array data item I/O for list-directed CHARACTER & LOGICAL
These types have to distinguish list-directed I/O from formatted I/O,
and the subscript incrementation call was in the formatted branch
of the if() rather than after the if().
Differential revision: https://reviews.llvm.org/D88606
Hubert Tong [Wed, 30 Sep 2020 20:58:48 +0000 (16:58 -0400)]
[NFC] Fix spacing in clang/test/Driver/aix-ld.c
Fix one line with mismatch in indentation after
afc277b0ed0d.
Rainer Orth [Wed, 30 Sep 2020 20:58:07 +0000 (22:58 +0200)]
[asan][test] XFAIL Posix/no_asan_gen_globals.c on Solaris
`Posix/no_asan_gen_globals.c` currently `FAIL`s on Solaris:
$ nm no_asan_gen_globals.c.tmp.exe | grep ___asan_gen_
0809696a r .L___asan_gen_.1
0809a4cd r .L___asan_gen_.2
080908e2 r .L___asan_gen_.4
0809a4cd r .L___asan_gen_.5
0809a529 r .L___asan_gen_.7
0809a4cd r .L___asan_gen_.8
As detailed in Bug 47607, there are two factors here:
- `clang` plays games by emitting some local labels into the symbol
table. When instead one uses `-fno-integrated-as` to have `gas` create
the object files, they don't land in the objects in the first place.
- Unlike GNU `ld`, the Solaris `ld` doesn't support support
`-X`/`--discard-locals` but instead relies on the assembler to follow its
specification and not emit local labels.
Therefore this patch `XFAIL`s the test on Solaris.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D88218
Craig Topper [Wed, 30 Sep 2020 19:36:28 +0000 (12:36 -0700)]
[X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare.
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed for cmovne for unsigned.
Fixes more case for PR47049
Arthur Eubanks [Wed, 30 Sep 2020 20:23:21 +0000 (13:23 -0700)]
[NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D88138
Thomas Raoux [Wed, 30 Sep 2020 19:58:24 +0000 (12:58 -0700)]
[mlir][vector] First step of vector distribution transformation
This is the first of several steps to support distributing large vectors. This
adds instructions extract_map and insert_map that allow us to do incremental
lowering. Right now the transformation only apply to simple pointwise operation
with a vector size matching the multiplicity of the IDs used to distribute the
vector.
This can be used to distribute large vectors to loops or SPMD.
Differential Revision: https://reviews.llvm.org/D88341
Christian Sigg [Tue, 29 Sep 2020 13:19:54 +0000 (15:19 +0200)]
Add GDB prettyprinters for a few more MLIR types.
Reviewed By: dblaikie, jpienaar
Differential Revision: https://reviews.llvm.org/D87159
Joseph Huber [Wed, 30 Sep 2020 19:11:51 +0000 (15:11 -0400)]
Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def"
Failing tests on Arm due to the tests automatically populating
incomatible pointer width architectures. Reverting until the tests are
updated. Failing tests:
OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
This reverts commit
90eaedda9b8ef46e2c0c1b8bce33e98a3adbb68c.
Sanjay Patel [Wed, 30 Sep 2020 19:09:21 +0000 (15:09 -0400)]
[CodeGen] improve coverage for float (32-bit) type of NAN; NFC
Goes with D88238
Joseph Huber [Wed, 30 Sep 2020 19:02:16 +0000 (15:02 -0400)]
Revert "[OpenMP] Add Error Handling for Conflicting Pointer Sizes for Target Offload"
Failing tests on Arm due to the tests automatically populating
incomatible pointer width architectures. Reverting until the tests are
updated. Failing tests:
OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
This reverts commit
9d2378b59150f6f1cb5c9cf42ea06b0bb57029a1.
Louis Dionne [Wed, 30 Sep 2020 18:58:17 +0000 (14:58 -0400)]
[libc++] Make sure we don't attempt to run check-cxx-abilist when libc++ doesn't define new/delete
That would make the test fail spuriously because we don't generate
an ABI list for that configuration.
Arthur Eubanks [Wed, 30 Sep 2020 18:49:58 +0000 (11:49 -0700)]
[test][NewPM][SampleProfile] Fix more tests under NPM
These all have separate legacy and new PM RUN lines.
Jim Ingham [Wed, 30 Sep 2020 18:46:59 +0000 (11:46 -0700)]
Fix crash in SBStructuredData::GetDescription() when there's no StructuredDataPlugin.
Also, use the StructuredData::Dump method to print the StructuredData if there
is no plugin, rather than just returning an error.
Differential Revision: https://reviews.llvm.org/D88266
Jordan Rupprecht [Wed, 30 Sep 2020 18:24:10 +0000 (11:24 -0700)]
[lldb-vscode] Allow an empty 'breakpoints' field to clear breakpoints.
Per the DAP spec for SetBreakpoints [1], the way to clear breakpoints is: `To clear all breakpoint for a source, specify an empty array.`
However, leaving the breakpoints field unset is also a well formed request (note the `breakpoints?:` in the `SetBreakpointsArguments` definition). If it's unset, we have a couple choices:
1. Crash (current behavior)
2. Clear breakpoints
3. Return an error response that the breakpoints field is missing.
I propose we do (2) instead of (1), and treat an unset breakpoints field the same as an empty breakpoints field.
[1] https://microsoft.github.io/debug-adapter-protocol/specification#Requests_SetBreakpoints
Reviewed By: wallace, labath
Differential Revision: https://reviews.llvm.org/D88513
Eugene Zhulenev [Tue, 29 Sep 2020 20:55:33 +0000 (13:55 -0700)]
[MLIR] Add async.value type to Async dialect
Return values from async regions as !async.value<...>.
Reviewed By: mehdi_amini, csigg
Differential Revision: https://reviews.llvm.org/D88510
Jordan Rupprecht [Wed, 30 Sep 2020 17:47:48 +0000 (10:47 -0700)]
[lldb/ipv6] Support running lldb tests in an ipv6-only environment.
When running in an ipv6-only environment where `AF_INET` sockets are not available, many lldb tests (mostly gdb remote tests) fail because things like `127.0.0.1` don't work there.
Use `localhost` instead of `127.0.0.1` whenever possible, or include a fallback of creating `AF_INET6` sockets when `AF_INET` fails.
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D87333
Rahman Lavaee [Wed, 30 Sep 2020 17:37:00 +0000 (10:37 -0700)]
Exception support for basic block sections
This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a **nop** operation as the first non-CFI instruction in the exception section.
**Background on Exception Handling in C++ ABI**
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf
Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.
The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:
| CallSite | --> Position of the call site (relative to the function entry)
| CallSite length | --> Length of the call site.
| Landing Pad | --> Position of the landing pad (relative to the landing pad fragment’s begin label)
| Action record offset | --> Position of the first action record
The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).
The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.
The action record offset is an index into the action table which includes information about which exception types are caught.
**C++ Exceptions with Basic Block Sections**
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new **CallSiteRange** structure which specifies the range of call-sites which correspond to every section:
`struct CallSiteRange {
// Symbol marking the beginning of the precedure fragment.
MCSymbol *FragmentBeginLabel = nullptr;
// Symbol marking the end of the procedure fragment.
MCSymbol *FragmentEndLabel = nullptr;
// LSDA symbol for this call-site range.
MCSymbol *ExceptionLabel = nullptr;
// Index of the first call-site entry in the call-site table which
// belongs to this range.
size_t CallSiteBeginIdx = 0;
// Index just after the last call-site entry in the call-site table which
// belongs to this range.
size_t CallSiteEndIdx = 0;
// Whether this is the call-site range containing all the landing pads.
bool IsLPRange = false;
};`
With N basic-block-sections, the call-site table is partitioned into N call-site ranges.
Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.
Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange).
| @LPStart | ---> Landing pad fragment ( LSDA1 points here)
| CallSite Table Length | ---> Used to find the action table.
| CallSites |
| … |
| … |
| @LPStart | ---> Landing pad fragment ( LSDA2 points here)
| CallSite Table Length |
| CallSites |
| … |
| … |
…
…
| Action Table |
| Types Table |
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73739
David Tenty [Tue, 29 Sep 2020 15:30:28 +0000 (11:30 -0400)]
[AIX][Clang][Driver] Link libm in c++ mode
since that is the normal behaviour of other compilers on the platform.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D88500
Joseph Huber [Mon, 28 Sep 2020 14:23:14 +0000 (10:23 -0400)]
[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
invalidates specifying target architectures with conflicting pointer
sizes.
Reviewers: jdoerfert
Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl
Tags: #OpenMP #Clang #LLVM
Differential Revision: https://reviews.llvm.org/D88430
Joseph Huber [Wed, 30 Sep 2020 17:10:32 +0000 (13:10 -0400)]
[OpenMP] Add Error Handling for Conflicting Pointer Sizes for Target Offload
Summary:
This patch adds an error to Clang that detects if OpenMP offloading is used
between two architectures with incompatible pointer sizes. This ensures that
the data mapping can be done correctly and solves an issue in code generation
generating the wrong size pointer.
Reviewer: jdoerfert
Subscribers:
Tags: #OpenMP #Clang
Differential Revision:
Richard Smith [Wed, 30 Sep 2020 17:48:00 +0000 (10:48 -0700)]
Fix interaction of `constinit` and `weak`.
We previously took a shortcut and said that weak variables never have
constant initializers (because those initializers are never correct to
use outside the variable). We now say that weak variables can have
constant initializers, but are never usable in constant expressions.
Alexandre Rames [Wed, 30 Sep 2020 17:11:14 +0000 (18:11 +0100)]
[Sema] Support Comma operator for fp16 vectors.
The current half vector was enforcing an assert expecting
"(LHS is half vector) == (RHS is half vector)"
for comma.
Reviewed By: ahatanak, fhahn
Differential Revision: https://reviews.llvm.org/D88265
Sanjay Patel [Wed, 30 Sep 2020 17:18:42 +0000 (13:18 -0400)]
[CodeGen] add test for NAN creation; NFC
This goes with the APFloat change proposed in
D88238.
This is copied from the MIPS-specific test in
builtin-nan-legacy.c to verify that the normal
behavior is correct on other targets without the
complication of an inverted quiet bit.
Congzhe Cao [Wed, 30 Sep 2020 17:03:14 +0000 (13:03 -0400)]
[AArch64] Avoid pairing loads when the base reg is modified
When pairing loads, we should check if in between the two loads the
base register has been modified. If that is the case then avoid pairing
them because the second load actually loads from a different address.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D86956
Rainer Orth [Wed, 30 Sep 2020 16:56:52 +0000 (18:56 +0200)]
[asan][test] Several Posix/unpoison-alternate-stack.cpp fixes
`Posix/unpoison-alternate-stack.cpp` currently `FAIL`s on Solaris/i386.
Some of the problems are generic:
- `clang` warns compiling the testcase:
compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:83:7: warning: nested designators are a C99 extension [-Wc99-designator]
.sa_sigaction = signalHandler,
^~~~~~~~~~~~~
compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:84:7: warning: ISO C++ requires field designators to be specified in declaration order; field '_funcptr' will be initialized after field 'sa_flags' [-Wreorder-init-list]
.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and some more instances. This can all easily be avoided by initializing
each field separately.
- The test `SEGV`s in `__asan_memcpy`. The default Solaris/i386 stack size
is only 4 kB, while `__asan_memcpy` tries to allocate either 5436
(32-bit) or 10688 bytes (64-bit) on the stack. This patch avoids this by
requiring at least 16 kB stack size.
- Even without `-fsanitize=address` I get an assertion failure:
Assertion failed: !isOnSignalStack(), file compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp, line 117
The fundamental problem with this testcase is that `longjmp` from a
signal handler is highly unportable; XPG7 strongly warns against it and
it is thus unspecified which stack is used when `longjmp`ing from a
signal handler running on an alternative stack.
So I'm `XFAIL`ing this testcase on Solaris.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D88501
Arthur Eubanks [Wed, 30 Sep 2020 16:42:49 +0000 (09:42 -0700)]
[test][SampleProfile][NewPM] Fix some tests under NPM
Peter Collingbourne [Fri, 25 Sep 2020 00:01:24 +0000 (17:01 -0700)]
scudo: Make it thread-safe to set some runtime configuration flags.
Move some of the flags previously in Options, as well as the
UseMemoryTagging flag previously in the primary allocator, into an
atomic variable so that it can be updated while other threads are
running. Relaxed accesses are used because we only have the requirement
that the other threads see the new value eventually.
The code is set up so that the variable is generally loaded once per
allocation function call with the exception of some rarely used code
such as error handlers. The flag bits can generally stay in a register
during the execution of the allocation function which means that they
can be branched on with minimal overhead (e.g. TBZ on aarch64).
Differential Revision: https://reviews.llvm.org/D88523
Vy Nguyen [Tue, 4 Aug 2020 22:34:22 +0000 (18:34 -0400)]
[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.
Differential Revision: https://reviews.llvm.org/D85254
Valentin Clement [Wed, 30 Sep 2020 16:23:06 +0000 (12:23 -0400)]
[mlir][openacc] Remove -allow-unregistred-dialect from ops and invalid tests
Switch to a dummy op in the test dialect so we can remove the -allow-unregistred-dialect
on ops.mlir and invalid.mlir. Change after comment on D88272.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D88587
Arthur Eubanks [Sat, 26 Sep 2020 07:18:42 +0000 (00:18 -0700)]
[ObjCARCAA][NewPM] Add already ported objc-arc-aa to PassRegistry.def
Also add missing AnalysisKey definition.
Kazushi (Jam) Marukawa [Mon, 21 Sep 2020 08:15:26 +0000 (17:15 +0900)]
[VE] Support TargetBlockAddress
Change to handle TargetBlockAddress and add a regression test for it.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D88576
Simon Moll [Wed, 30 Sep 2020 15:10:44 +0000 (17:10 +0200)]
[DA][SDA] SyncDependenceAnalysis re-write
This patch achieves two things:
1. It breaks up the `join_blocks` interface between the SDA to the DA to
return two separate sets for divergent loops exits and divergent,
disjoint path joins.
2. It updates the SDA algorithm to run in O(n) time and improves the
precision on divergent loop exits.
This fixes `https://bugs.llvm.org/show_bug.cgi?id=46372` (by virtue of
the improved `join_blocks` interface) and revealed an imprecise expected
result in the `Analysis/DivergenceAnalysis/AMDGPU/hidden_loopdiverge.ll`
test.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D84413
Mircea Trofin [Tue, 29 Sep 2020 16:09:25 +0000 (09:09 -0700)]
[NFC][regalloc] Make VirtRegAuxInfo part of allocator state
All the state of VRAI is allocator-wide, so we can avoid creating it
every time we need it. In addition, the normalization function is
allocator-specific. In a next change, we can simplify that design in
favor of just having it as a virtual member.
Differential Revision: https://reviews.llvm.org/D88499
Simon Pilgrim [Wed, 30 Sep 2020 15:08:52 +0000 (16:08 +0100)]
[InstCombine] Add tests for 'partial' bswap patterns
As mentioned on PR47191, if we're bswap'ing some bytes and the zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Zarko Todorovski [Wed, 30 Sep 2020 15:03:03 +0000 (11:03 -0400)]
[PPC] Do not emit extswsli in 32BIT mode when using -mcpu=pwr9
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.
This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D87046
Benjamin Kramer [Wed, 30 Sep 2020 15:01:14 +0000 (17:01 +0200)]
[PowerPC] Avoid unused variable warning in Release builds
PPCFrameLowering.cpp:632:8: warning: unused variable 'isAIXABI' [-Wunused-variable]
Simon Pilgrim [Wed, 30 Sep 2020 14:51:54 +0000 (15:51 +0100)]
[InstCombine] Fix bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector support
Use getScalarSizeInBits not getPrimitiveSizeInBits to determine the shift value at the element level.
Simon Pilgrim [Wed, 30 Sep 2020 14:42:53 +0000 (15:42 +0100)]
[InstCombine] Add bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector tests
Add tests showing failure to correctly fold vector bswap(trunc(bswap(x))) intrinsic patterns
Mahesh Ravishankar [Tue, 29 Sep 2020 23:14:49 +0000 (16:14 -0700)]
[mlir][Linalg] Generalize the logic to compute reassociation maps
while folding tensor_reshape op.
While folding reshapes that introduce unit extent dims, the logic to
compute the reassociation maps can be generalized to handle some
corner cases, for example, when the folded shape still has unit-extent
dims but corresponds to folded unit extent dims of the expanded shape.
Differential Revision: https://reviews.llvm.org/D88521
Xiangling Liao [Wed, 30 Sep 2020 14:35:00 +0000 (10:35 -0400)]
[FE] Use preferred alignment instead of ABI alignment for complete object when applicable
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field in the `RecordLayout` to store the AIX
special alignment values in “PreferredAlignment” as the community suggested.
However, that patch alone is not enough. There are places in the Clang where `PreferredAlignment`
should have been used instead of ABI-specified alignment. This patch is aimed at fixing those
spots.
Differential Revision: https://reviews.llvm.org/D86790
Simon Pilgrim [Wed, 30 Sep 2020 14:32:57 +0000 (15:32 +0100)]
[InstCombine] Remove %tmp variable names from bswap-fold tests
Appease update_test_checks script that was complaining about potential %TMP clashes
Matt Arsenault [Sat, 26 Sep 2020 14:14:14 +0000 (10:14 -0400)]
GlobalISel: Assert if MoreElements uses a non-vector type
Matt Arsenault [Tue, 22 Sep 2020 15:46:41 +0000 (11:46 -0400)]
LiveDebugValues: Fix typos and indentation
Matt Arsenault [Mon, 28 Sep 2020 17:42:17 +0000 (13:42 -0400)]
RegAllocFast: Add extra DBG_VALUE for live out spills
This allows LiveDebugValues to insert the proper DBG_VALUEs in live
out blocks if a spill is inserted before the use of a
register. Previously, this would see the register use as the last
DBG_VALUE, even though the stack slot should be treated as the live
out value.
This avoids an lldb test regression when D52010 is re-applied.
Matt Arsenault [Tue, 22 Sep 2020 12:55:54 +0000 (08:55 -0400)]
Reapply "RegAllocFast: Rewrite and improve"
This reverts commit
73a6a164b84a8195defbb8f5eeb6faecfc478ad4.
Xiangling Liao [Wed, 30 Sep 2020 13:52:41 +0000 (09:52 -0400)]
[NFC][FE] Replace TypeSize with StorageUnitSize
On some targets like AIX, last bitfield size is not always equal to last
bitfield type size. Some bitfield like bool will have the same alignment
as [unsigned]. So we'd like to use a more general term `StorageUnit` to
replace type in this field.
Differential Revision: https://reviews.llvm.org/D88260
Rainer Orth [Wed, 30 Sep 2020 14:30:18 +0000 (16:30 +0200)]
[sanitizers] Fix internal__exit on Solaris
`TestCases/log-path_test.cpp` currently `FAIL`s on Solaris:
$ env ASAN_OPTIONS=log_path=`for((i=0;i<10000;i++)); do echo -n $i; done` ./log-path_test.cpp.tmp
==5031==ERROR: Path is too long:
01234567...
Segmentation Fault (core dumped)
The `SEGV` happens here:
Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00000000 in ?? ()
(gdb) where
#0 0x00000000 in ?? ()
#1 0x080a1e63 in __interceptor__exit (status=1)
at /vol/gcc/src/llvm/llvm/local/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:3808
#2 0x08135ea8 in __sanitizer::internal__exit (exitcode=1)
at /vol/gcc/src/llvm/llvm/local/projects/compiler-rt/lib/sanitizer_common/sanitizer_solaris.cc:139
when `__interceptor__exit` tries to call `__interception::real__exit` which
is `NULL` at this point because the interceptors haven't been initialized yet.
Ultimately, the problem lies elsewhere, however: `internal__exit` in
`sanitizer_solaris.cpp` calls `_exit` itself since there doesn't exit a
non-intercepted version in `libc`. Using the `syscall` interface instead
isn't usually an option on Solaris because that interface isn't stable.
However, in the case of `SYS_exit` it can be used nonetheless: `SYS_exit`
has remained unchanged since at least Solaris 2.5.1 in 1996, and this is
what this patch does.
Tested on `amd64-pc-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D88404
Benjamin Kramer [Wed, 30 Sep 2020 11:26:25 +0000 (13:26 +0200)]
Move AffineMapAttr into BaseOps.td
AffineMapAttr is already part of base, it's just impossible to refer to
it from ODS without pulling in the definition from Affine dialect.
Differential Revision: https://reviews.llvm.org/D88555
Gabriel Hjort Åkerlund [Wed, 30 Sep 2020 13:24:41 +0000 (15:24 +0200)]
[GlobalISel] Fix incorrect setting of ValNo when splitting
Before, for each original argument i, ValNo was set to i + PartIdx, but
ValNo is intended to reflect the index of the value before splitting.
Hence, ValNo should always be set to i and not consider the PartIdx.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86511
Sean Fertile [Wed, 30 Sep 2020 13:56:55 +0000 (09:56 -0400)]
[PowerPC] Remove support for VRSAVE save/restore/update.
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
Sam McCall [Wed, 30 Sep 2020 13:45:13 +0000 (15:45 +0200)]
[clangd] Fix invalid UTF8 when extracting doc comments.
Differential Revision: https://reviews.llvm.org/D88567
Simon Pilgrim [Wed, 30 Sep 2020 13:54:04 +0000 (14:54 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - merge the regular/trunc+zext paths. NFCI.
There doesn't seem to be any good reason for having a separate path for when we bswap/bitreverse at a smaller size than the destination size - so merge these to make the instruction generation a lot clearer.
Simon Pilgrim [Wed, 30 Sep 2020 13:39:23 +0000 (14:39 +0100)]
[InstCombine] Remove %tmp variable names from bswap tests
Appease update_test_checks script that was complaining about potential %TMP clashes
Simon Pilgrim [Wed, 30 Sep 2020 13:36:12 +0000 (14:36 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - remove unnecessary cast. NFCI.
Michał Górny [Wed, 30 Sep 2020 12:53:05 +0000 (14:53 +0200)]
[lldb] [Process/NetBSD] Fix operating on ftag register
Florian Hahn [Tue, 22 Sep 2020 14:10:06 +0000 (15:10 +0100)]
[VPlan] Change recipes to inherit from VPUser instead of a member var.
Now that VPUser is not inheriting from VPValue, we can take the next
step and turn the recipes that already manage their operands via VPUser
into VPUsers directly. This is another small step towards traversing
def-use chains in VPlan.
This is NFC with respect to the generated code, but makes the interface
more powerful.
Ed Maste [Wed, 30 Sep 2020 12:27:09 +0000 (08:27 -0400)]
[lldb] Fix FreeBSD Arm Process Plugin build
Add a missing include and some definitions in
769533216666.
Patch by: Brooks Davis
Reviewed by: labath
Differential Revision: https://reviews.llvm.org/D88453
Simon Pilgrim [Wed, 30 Sep 2020 13:19:00 +0000 (14:19 +0100)]
[InstCombine] Add PR47191 bswap tests
Simon Pilgrim [Wed, 30 Sep 2020 13:11:43 +0000 (14:11 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - cleanup bswap/bitreverse detection loop. NFCI.
Early out if both pattern matches have failed (or we don't want them). Fix case of bit index iterator (and avoid Wshadow issue).
Sam Parker [Wed, 30 Sep 2020 11:25:06 +0000 (12:25 +0100)]
[RDA] isSafeToDefRegAt: Look at global uses
We weren't looking at global uses of a value, so we could happily
overwrite the register incorrectly.
Differential Revision: https://reviews.llvm.org/D88554
Simon Pilgrim [Wed, 30 Sep 2020 12:39:18 +0000 (13:39 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - use ArrayRef::back() helper. NFCI.
Post-commit feedback on D88316
Simon Pilgrim [Wed, 30 Sep 2020 11:28:18 +0000 (12:28 +0100)]
InstCombine] collectBitParts - cleanup variable names. NFCI.
Fix a number of WShadow warnings (I was used as the instruction and index......) and fix cases to match style.
Also, replaced the Bit APInt mask check in AND instructions with a direct APInt[] bit check.
Florian Hahn [Wed, 30 Sep 2020 11:39:30 +0000 (12:39 +0100)]
[SCEV] Verify that all mapped SCEV AddRecs refer to valid loops.
This check helps to guard against cases where expressions referring to
invalidated/deleted loops are not properly invalidated.
The additional check is motivated by the reproducer shared for
8fdac7cb7abb
and I think in general make sense as a sanity check.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D88166
Jakub Lichman [Wed, 30 Sep 2020 07:13:59 +0000 (07:13 +0000)]
[mlir][Linalg] Tile sizes for Conv ops vectorization added as pass arguments
Current setup for conv op vectorization does not enable user to specify tile
sizes as well as dimensions for vectorization. In this commit we change that by
adding tile sizes as pass arguments. Every dimension with corresponding tile
size > 1 is automatically vectorized.
Differential Revision: https://reviews.llvm.org/D88533
Sam Parker [Wed, 30 Sep 2020 11:14:39 +0000 (12:14 +0100)]
[NFC][ARM] Add more LowOverheadLoop tests.
Jakub Lichman [Wed, 30 Sep 2020 07:42:43 +0000 (07:42 +0000)]
[mlir] Added support for rank reducing subviews
This commit adds support for subviews which enable to reduce resulting rank
by dropping static dimensions of size 1.
Differential Revision: https://reviews.llvm.org/D88534
Simon Pilgrim [Wed, 30 Sep 2020 11:07:19 +0000 (12:07 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - recognise zext(bswap(trunc(x))) patterns (PR39793)
PR39793 demonstrated an issue where we fail to recognize 'partial' bswap patterns of the lower bytes of an integer source.
In fact, most of this is already in place collectBitParts suitably tags zero bits, so we just need to correctly handle this case by finding the zero'd upper bits and reducing the bswap pattern just to the active demanded bits.
Differential Revision: https://reviews.llvm.org/D88316
Simon Pilgrim [Wed, 30 Sep 2020 10:13:54 +0000 (11:13 +0100)]
[InstCombine] recognizeBSwapOrBitReverseIdiom - assert for correct bit providence indices. NFCI.
As suggested by @spatel on D88316
LLVM GN Syncbot [Wed, 30 Sep 2020 10:09:34 +0000 (10:09 +0000)]
[gn build] Port
413577a8790
Xiang1 Zhang [Wed, 30 Sep 2020 10:01:15 +0000 (18:01 +0800)]
[X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
George Mitenkov [Wed, 30 Sep 2020 09:05:17 +0000 (12:05 +0300)]
[MLIR][SPIRV] Support different function control in (de)serialization
Added support for different function control
in serialization and deserialization.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D88280
Jonas Paulsson [Thu, 24 Sep 2020 15:24:29 +0000 (17:24 +0200)]
[SystemZ] Support bare nop instructions
Add support of "nop" and "nopr" (without operands) to assembler.
Review: Ulrich Weigand
Jay Foad [Mon, 28 Sep 2020 08:30:58 +0000 (09:30 +0100)]
[SplitKit] Cope with no live subranges in defFromParent
Following on from D87757 "[SplitKit] Only copy live lanes", it is
possible to split a live range at a point when none of its subranges
are live. This patch handles that case by inserting an implicit def
of the superreg.
Patch by Quentin Colombet!
Differential Revision: https://reviews.llvm.org/D88397
Mirko Brkusanin [Tue, 29 Sep 2020 14:35:42 +0000 (16:35 +0200)]
[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.
Differential Revision: https://reviews.llvm.org/D88315
Kadir Cetinkaya [Tue, 29 Sep 2020 18:15:03 +0000 (20:15 +0200)]
[clangd][remote] Make sure relative paths are absolute with respect to posix style
Relative paths received from the server are always in posix style. So
we need to ensure they are relative using that style, and not the native one.
Differential Revision: https://reviews.llvm.org/D88507