Kirill Stoimenov [Thu, 19 Aug 2021 17:58:29 +0000 (17:58 +0000)]
[asan] Implemented flag to emit intrinsics to optimize ASan callbacks.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D108377
Kirill Stoimenov [Thu, 26 Aug 2021 00:12:53 +0000 (00:12 +0000)]
[asan] Fixed a runtime crash.
Looks like the NoRegister has some effect on the final code that is generated. My guess is that some optimization kicks in at the end?
When I use -S to dump the assembly I get the correct version with 'shrq $3, %r8':
movq %r9, %r8
shrq $3, %r8
movsbl 2147450880(%r8), %r8d
But when I disassemble the final binary, I get RAX instead of R8:
mov %r9,%r8
shr $0x3,%rax
movsbl 0x7fff8000(%r8),%r8d
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D108745
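To illustrate why the wrong register in the listing above is fatal, here is a minimal sketch of the shadow-address computation the three instructions perform. The shift of 3 and the offset 2147450880 (0x7fff8000) are taken from the listings; the function names and the sample address are hypothetical and not part of the patch.

```python
SHADOW_SCALE = 3             # the `shrq $3` in the -S output
SHADOW_OFFSET = 0x7FFF8000   # 2147450880, the displacement of the movsbl

def shadow_address(addr: int) -> int:
    """Expected sequence: copy the address, shift it, add the offset."""
    r8 = addr                    # movq %r9, %r8
    r8 >>= SHADOW_SCALE          # shrq $3, %r8
    return r8 + SHADOW_OFFSET    # movsbl 2147450880(%r8), %r8d

def shadow_address_wrong_register(addr: int, rax: int = 0) -> int:
    """Disassembled sequence: the shift lands in RAX, so R8 is never shifted."""
    r8 = addr                    # mov %r9,%r8
    rax >>= SHADOW_SCALE         # shr $0x3,%rax  (wrong register)
    return r8 + SHADOW_OFFSET    # movsbl 0x7fff8000(%r8),%r8d

addr = 0x602000000010            # hypothetical application address
print(hex(shadow_address(addr)))
print(hex(shadow_address_wrong_register(addr)))
```

With the shift applied to the wrong register, the load reads from an address far outside the shadow region, which is consistent with a crash at run time.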
Rob Suderman [Thu, 26 Aug 2021 18:20:58 +0000 (11:20 -0700)]
[mlir][tosa] Tosa reverse to linalg supporting dynamic shapes
Needed to switch to extract to support tosa.reverse using dynamic shapes.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108744
Alexey Bataev [Tue, 3 Aug 2021 20:20:32 +0000 (13:20 -0700)]
[SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.
The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place: it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered, and their user too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most of the other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra instruction scheduling
anymore), and we may vectorize more code for large basic blocks again
because of the saved scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
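The first (top-to-bottom) step described above can be sketched roughly as follows. This is only an illustration of "count the most used order, then rotate in place"; the `Node` class, the tuple encoding of an order (an empty tuple meaning identity), and the function names are assumptions made for the example and do not reflect the actual SLPVectorizer data structures.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Node:
    scalars: list
    order: tuple = ()   # () means the node is already in its preferred order

def rotate_to_most_used_order(nodes):
    """Rotate all nodes of one vectorization factor to the most used order."""
    if not nodes:
        return ()
    best, _ = Counter(n.order for n in nodes).most_common(1)[0]
    if not best:                      # the most used order is empty: keep as is
        return best
    inverse = [0] * len(best)
    for new_lane, old_lane in enumerate(best):
        inverse[old_lane] = new_lane
    identity = tuple(range(len(best)))
    for n in nodes:
        wanted = n.order or identity
        # Permute the scalars in place; no node is deleted or rebuilt.
        n.scalars[:] = [n.scalars[i] for i in best]
        # Express what the node still wants relative to the new layout.
        remaining = tuple(inverse[i] for i in wanted)
        n.order = () if remaining == identity else remaining
    return best

nodes = [
    Node(["a", "b", "c", "d"], (1, 0, 3, 2)),
    Node(["e", "f", "g", "h"], (1, 0, 3, 2)),
    Node(["i", "j", "k", "l"]),
]
print(rotate_to_most_used_order(nodes))
print(nodes)
```

Nodes that already wanted the winning order end up with an empty order, while the minority node keeps a residual permutation that a later shuffle would have to honor.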
Nikita Popov [Thu, 26 Aug 2021 19:12:11 +0000 (21:12 +0200)]
[MergeICmps] Add test for call before first load (NFC)
If a clobbering call happens before all loads, that shouldn't
block the transform.
Arthur Eubanks [Thu, 26 Aug 2021 19:05:56 +0000 (12:05 -0700)]
[test] Update precommit tests for D108734
Vitaly Buka [Thu, 26 Aug 2021 19:02:45 +0000 (12:02 -0700)]
[sanitizer] Add basic qsort test
Jon Chesterfield [Thu, 26 Aug 2021 17:56:01 +0000 (18:56 +0100)]
[libomptarget][amdgpu][nfc] Rename variables, delete dead code
Andrea Di Biagio [Thu, 26 Aug 2021 18:53:17 +0000 (19:53 +0100)]
Revert "[MCA][NFC] Remove redundant calls to std::move."
This reverts commit 9cc0023fb863194be526f0bf19bd21e36236c5f6
due to buildbot failures.
Siva Chandra Reddy [Thu, 26 Aug 2021 05:21:54 +0000 (05:21 +0000)]
[libc][NFC] Move the mutex implementation into a utility class.
This allows other parts of the libc to use the mutex types without
actually pulling in public function implementations.
Along the way, a few cleanups have been done, like using a uniform type to
refer to the Linux futex word.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D108749
Andrea Di Biagio [Thu, 26 Aug 2021 18:43:18 +0000 (19:43 +0100)]
[MCA][NFC] Remove redundant calls to std::move.
This fixes some redundant move in return statement [-Wredundant-move] gcc 9.3.0
warnings.
This also fixes a minor Coverity issue reported against class MCAOperand about
the lack of proper initialization for field Index.
No functional change intended.
Jessica Paquette [Thu, 26 Aug 2021 18:04:17 +0000 (11:04 -0700)]
[AArch64][GlobalISel] Optimize G_BUILD_VECTOR of undef + 1 elt -> SUBREG_TO_REG
This pattern
```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```
Can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.
This is preferable to emitting `mov` instructions for every index.
This gives minor code size improvements on the test suite at -Os.
Differential Revision: https://reviews.llvm.org/D108773
RamNalamothu [Thu, 26 Aug 2021 18:24:15 +0000 (23:54 +0530)]
[docs, AMDGPU] Fix typo in dwarf register number mapping
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D108557
Yaron Keren [Sat, 21 Aug 2021 17:59:45 +0000 (20:59 +0300)]
[docs] Update Getting Started with Visual Studio guide
Update this document for 2021.
Reviewed By: aaron.ballman, kuhnel, amccarth
Differential Revision: https://reviews.llvm.org/D108513
Rob Suderman [Thu, 26 Aug 2021 18:06:12 +0000 (11:06 -0700)]
[mlir][tosa] Elementwise operation dynamic shape support
Added dynamic shape support for elementwise operations. This assumes equal
sizes (broadcasting 1-length dynamic is problematic).
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D108730
Louis Dionne [Thu, 26 Aug 2021 18:18:03 +0000 (14:18 -0400)]
[libc++][NFC] Sort headers alphabetically
Shafik Yaghmour [Thu, 26 Aug 2021 18:11:00 +0000 (11:11 -0700)]
[LLDB] Add type to the output for FieldDecl when logging in ClangASTSource::layoutRecordType
I was debugging a problem and noticed that it would have been helpful to have
the type of each FieldDecl when looking at the output from
ClangASTSource::layoutRecordType.
Differential Revision: https://reviews.llvm.org/D108257
Andrea Di Biagio [Thu, 26 Aug 2021 17:57:59 +0000 (18:57 +0100)]
[MCA][RegisterFile] Consistently update the PRF in the presence of multiple writes to the same register.
My last change to the RegisterFile (PR51495) has introduced a bug in the logic
that allocates physical registers in the PRF.
In some cases, this bug could have triggered a nasty unsigned wrap in the number
of allocated registers, thus resulting in mca being stuck forever in a loop of
PRF availability checks.
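As a rough illustration of how an unsigned wrap in the allocation counter can stall availability checks forever, consider the sketch below. The class, its methods, and the double-free scenario are hypothetical; they only model a 32-bit unsigned counter and are not the RegisterFile code from llvm-mca.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned counter

class PhysRegFileSketch:
    def __init__(self, total):
        self.total = total
        self.allocated = 0

    def allocate(self, n):
        self.allocated = (self.allocated + n) & MASK32

    def free(self, n):
        # Freeing more than was allocated wraps around instead of going negative.
        self.allocated = (self.allocated - n) & MASK32

    def is_available(self, n):
        return self.allocated + n <= self.total

prf = PhysRegFileSketch(total=4)
prf.allocate(1)   # one physical register backs the destination
prf.free(1)       # first write to the register retires
prf.free(1)       # assumed bug: a second write to the same register frees again
print(prf.allocated, prf.is_available(1))
```

Once the counter has wrapped, every later availability check fails, so a simulator that keeps retrying the check never makes progress.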
LLVM GN Syncbot [Thu, 26 Aug 2021 18:08:07 +0000 (18:08 +0000)]
[gn build] Port ee44dd8062a2
Louis Dionne [Wed, 11 Aug 2021 21:36:35 +0000 (17:36 -0400)]
[libc++] Implement the underlying mechanism for range adaptors
This patch implements the underlying mechanism for range adaptors. It
does so based on http://wg21.link/p2387, even though that paper hasn't
been adopted yet. In the future, if p2387 is adopted, it would suffice
to rename `__bind_back` to `std::bind_back` and `__range_adaptor_closure`
to `std::range_adaptor_closure` to implement that paper by the spec.
Differential Revision: https://reviews.llvm.org/D107098
Michael Jones [Wed, 25 Aug 2021 21:11:31 +0000 (21:11 +0000)]
[libc] add inttypes header
Add inttypes.h to llvm libc. As its first functions strtoimax and
strtoumax are included.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D108736
Alexey Bataev [Thu, 26 Aug 2021 17:35:31 +0000 (10:35 -0700)]
[SLP][NFC]Add a test for correct shuffles order after reordering.
Yonghong Song [Thu, 26 Aug 2021 16:44:14 +0000 (09:44 -0700)]
[DebugInfo] convert btf_tag attrs to DI annotations for DIGlobalVariable
Generate btf_tag annotations for DIGlobalVariable. The annotations
are represented as a DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106619
Walter Erquinigo [Thu, 26 Aug 2021 17:35:29 +0000 (10:35 -0700)]
[NFC] Removing deprecated intel-features test folder
This folder has no valid tests anymore
Sanjay Patel [Thu, 26 Aug 2021 16:33:42 +0000 (12:33 -0400)]
[GlobalOpt] add tests for constant expressions that can trap; NFC
https://llvm.org/PR47578
Walter Erquinigo [Thu, 26 Aug 2021 17:34:04 +0000 (10:34 -0700)]
[NFC] Remove deprecated Intel PT test
Jon Chesterfield [Thu, 26 Aug 2021 16:36:04 +0000 (17:36 +0100)]
[libomptarget][amdgpu][nfc] Rename source files
Louis Dionne [Wed, 25 Aug 2021 16:27:20 +0000 (12:27 -0400)]
[libc++] Fix incorrect bypassing of <wctype.h>
Differential Revision: https://reviews.llvm.org/D108709
Vitaly Buka [Thu, 26 Aug 2021 17:21:20 +0000 (10:21 -0700)]
[NFC][sanitizer] Swap qsort_r and qsort code
To simplify future review.
Louis Dionne [Thu, 26 Aug 2021 17:21:29 +0000 (13:21 -0400)]
[libc++] XFAIL align.pass.cpp for PowerPC LE
This patch XFAILs the `align.pass.cpp` for PowerPC (LE).
It appears that this test will fail on Power for the `LLIArr2` and `Padding` structs within the test,
as the `assert` for `alignof(AtomicImpl) >= sizeof(AtomicImpl)` will be false. In this case, these structs
presumably should not be lock-free, so we currently XFAIL this for now.
The failure was discovered after D97913 was committed. It looks like `alignof(AtomicImpl) < sizeof(AtomicImpl)`,
even prior to this commit, but this test began running on Power after D97913, whereas we were
not running `align.pass.cpp` before.
This patch addresses https://bugs.llvm.org/show_bug.cgi?id=51548 by temporarily XFAILing the test
in order to investigate it further.
Differential Revision: https://reviews.llvm.org/D108668
Craig Topper [Thu, 26 Aug 2021 16:33:53 +0000 (09:33 -0700)]
[RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.
This can help remove some sext.w. There are some regressions on
some bswap tests, but I have an idea how to fix that for a follow up.
A new PACKW pattern is added to handle the new sext_inreg placement.
Differential Revision: https://reviews.llvm.org/D108663
Fangrui Song [Thu, 26 Aug 2021 17:13:16 +0000 (10:13 -0700)]
[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Linux
This makes the default build closer to a -DLLVM_ENABLE_RUNTIMES=all build.
The layout is arguably superior because different libraries of target triples
are in different directories, similar to GCC/Debian multiarch.
When LLVM_DEFAULT_TARGET_TRIPLE is x86_64-unknown-linux-gnu,
`lib/clang/14.0.0/lib/libclang_rt.asan-x86_64.a`
becomes
`lib/clang/14.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.asan.a`.
Clang has been detecting both paths since 2018 (D50547).
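The two layouts can be compared with a small sketch. The helper name and its parameters are hypothetical; the path shapes are copied from the example above, and the architecture suffix is assumed to be the first component of the target triple.

```python
def runtime_lib_path(version, triple, name, per_target_dir):
    """Return the runtime archive path under the old or the per-target layout."""
    arch = triple.split("-")[0]   # assumption: arch is the first triple component
    if per_target_dir:
        return f"lib/clang/{version}/lib/{triple}/libclang_rt.{name}.a"
    return f"lib/clang/{version}/lib/libclang_rt.{name}-{arch}.a"

triple = "x86_64-unknown-linux-gnu"
print(runtime_lib_path("14.0.0", triple, "asan", per_target_dir=False))
print(runtime_lib_path("14.0.0", triple, "asan", per_target_dir=True))
```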
---
Note: Darwin needs to be disabled. The hierarchy needs to be sorted out.
The current -DLLVM_DEFAULT_TARGET_TRIPLE=off state is like:
```
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_ios.a
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_iossim.a
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_osx.a
```
Windows needs to be disabled: https://reviews.llvm.org/D107799?id=368557#2963311
Differential Revision: https://reviews.llvm.org/D107799
Yonghong Song [Mon, 19 Jul 2021 16:33:55 +0000 (09:33 -0700)]
[DebugInfo] generate btf_tag annotations for DIGlobalVariable
Generate btf_tag annotations for DIGlobalVariable.
A field "annotations" is introduced to DIGlobalVariable, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
distinct !DIGlobalVariable(..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Differential Revision: https://reviews.llvm.org/D106619
RamNalamothu [Thu, 26 Aug 2021 16:54:06 +0000 (22:24 +0530)]
[DWARFLinker] Prefix debug section names with '.' in the comments. NFC.
In DWARFLinker.h, some comments prefix the debug section names
with '.' while others do not.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D108519
Aaron Ballman [Thu, 26 Aug 2021 16:53:52 +0000 (12:53 -0400)]
Typo fix; NFC
Aaron Ballman [Thu, 26 Aug 2021 16:39:12 +0000 (12:39 -0400)]
Adding an assertion back.
This assert was removed in 98339f14a0420cdfbe4215d8d1bc0a01165e0495,
but during post-commit review, it was pointed out that the assert was
valid.
Andrew Litteken [Thu, 26 Aug 2021 15:24:34 +0000 (08:24 -0700)]
[CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the
inputs and outputs after extraction has occurred. This patch gives the
ability to pass in empty SetVectors to be filled with the inputs and
outputs if they need to be analyzed.
Added Tests:
- InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106991
Luís Marques [Thu, 26 Aug 2021 16:43:06 +0000 (17:43 +0100)]
[Clang][RISCV] Implement getConstraintRegister for RISC-V
The getConstraintRegister method is used by semantic checking of inline
assembly statements in order to diagnose conflicts between clobber list
and input/output lists. By overriding getConstraintRegister we get those
diagnostics and we match RISC-V GCC's behavior. The implementation is
trivial due to the lack of single-register RISC-V-specific constraints.
Differential Revision: https://reviews.llvm.org/D108624
Stanislav Mekhanoshin [Wed, 25 Aug 2021 21:02:38 +0000 (14:02 -0700)]
[AMDGPU] Invert partial vgpr to agpr spill lane order
On targets requiring VGPR alignment we may end up spilling an
unaligned register if an odd number of leading lanes was partially
spilled. The remainder will then start with an odd register.
This problem is solved by inverting the order of the lanes to
be spilled so that we start from the end.
Differential Revision: https://reviews.llvm.org/D108732
Craig Topper [Wed, 25 Aug 2021 22:28:06 +0000 (15:28 -0700)]
[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.
This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.
I think we might be able to do something similar for bswap. I'll
look at it next.
Differential Revision: https://reviews.llvm.org/D108738
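The idea of masking before the shl and after the srl can be sketched for a 32-bit value as below. This is an illustrative model, not the SelectionDAG expansion itself; the function names are hypothetical, and the 16-bit and 8-bit steps are folded into the same loop for brevity.

```python
# (shift, mask) pairs: each step swaps adjacent groups of `shift` bits.
STEPS = [
    (16, 0x0000FFFF),
    (8,  0x00FF00FF),
    (4,  0x0F0F0F0F),
    (2,  0x33333333),
    (1,  0x55555555),
]

def bitreverse32_two_masks(x):
    """Mask, then shift: needs both `mask` and `mask << shift` per step."""
    for shift, mask in STEPS:
        x = ((x & (mask << shift)) >> shift) | ((x & mask) << shift)
    return x

def bitreverse32_one_mask(x):
    """Mask before shl and after srl: the same constant serves both halves."""
    for shift, mask in STEPS:
        x = ((x >> shift) & mask) | ((x & mask) << shift)
    return x

def bitreverse32_reference(x):
    return int(f"{x:032b}"[::-1], 2)

for value in (0x00000001, 0x12345678, 0xDEADBEEF):
    assert bitreverse32_one_mask(value) == bitreverse32_reference(value)
    print(hex(value), hex(bitreverse32_one_mask(value)))
```

Both variants compute the same result, but the second one materializes one constant per step instead of two.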
LLVM GN Syncbot [Thu, 26 Aug 2021 16:28:53 +0000 (16:28 +0000)]
[gn build] Port 1076082a0d97
Jon Chesterfield [Thu, 26 Aug 2021 16:28:17 +0000 (17:28 +0100)]
[libomptarget][amdgpu] Macro for accessing GPU variables from plugin
Lets the amdgpu plugin write to omptarget_device_environment
to enable debugging. Intend to use in the near future to record the
wavesize that a given deviceRTL was compiled with for running on hardware
that supports 32 or 64.
Patch sets all the attributes that are useful. Notably .data means the variable
is set by writing to host memory before copying to the GPU instead of launching
a kernel to update the image. Can simplify the plugin slightly to drop the
code for patching after load if this is used consistently.
NFC on nvptx, cuda plugin seems to work fine without any annotations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108698
Alexandre Rames [Tue, 10 Aug 2021 17:25:21 +0000 (10:25 -0700)]
[Support]: Introduce the `HashBuilder` interface.
The `HashBuilder` interface allows conveniently building hashes of various data
types, without relying on the underlying hasher type to know about hashed data
types.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106910
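The separation described above (the builder knows about data types, the hasher only sees bytes) can be sketched as follows. This is not the LLVM `HashBuilder` API; the class name, the `add` method, and the length-prefix encoding are assumptions chosen for the example, with Python's `hashlib` objects standing in for the underlying hasher.

```python
import hashlib
import struct

class HashBuilderSketch:
    """Feeds typed values to any hasher exposing update() and hexdigest()."""

    def __init__(self, hasher):
        self._hasher = hasher

    def add(self, value):
        if isinstance(value, bool):
            self._hasher.update(b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            self._hasher.update(struct.pack("<q", value))   # fixed-width integer
        elif isinstance(value, str):
            data = value.encode("utf-8")
            self.add(len(data))          # length prefix avoids ambiguous splits
            self._hasher.update(data)
        elif isinstance(value, (list, tuple)):
            self.add(len(value))
            for item in value:
                self.add(item)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
        return self

    def hexdigest(self):
        return self._hasher.hexdigest()

# The same builder code works with different underlying hashers.
for make_hasher in (hashlib.md5, hashlib.sha256):
    digest = HashBuilderSketch(make_hasher()).add("key").add([1, 2, 3]).hexdigest()
    print(make_hasher().name, digest)
```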
Alexey Bataev [Thu, 26 Aug 2021 16:11:22 +0000 (09:11 -0700)]
Revert "[SLP]Improve graph reordering."
This reverts commit a28234e37af877b2b4a23c2091c27fa18c155f9a to
investigate a compiler crash caused by the commit.
Balazs Benics [Thu, 26 Aug 2021 16:15:10 +0000 (18:15 +0200)]
[analyzer] Extend the documentation of MallocOverflow
Previously by following the documentation it was not immediately clear
what the capabilities of this checker are.
In this patch, I add some clarification on when the checker issues a
report and what its limitations are.
I'm also advertising suppressing such reports by adding an assertion, as
demonstrated by the test3().
I'm highlighting that this checker might produce an extensive amount of
findings, but it might be still useful for code audits.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D107756
Kazu Hirata [Thu, 26 Aug 2021 16:02:26 +0000 (09:02 -0700)]
[IR] Remove addPseudoProbeAttribute (NFC)
The last use was removed on Jun 17, 2021 in commit
bd52495518808bdbf24f4d8e9e20774d6d2e3333.
Simon Wallis [Thu, 26 Aug 2021 15:56:26 +0000 (16:56 +0100)]
[AArch64] provide strictfp attributes in test file
A post-commit review comment on https://reviews.llvm.org/D107452 pointed out that
https://llvm.org/docs/LangRef.html
says:
"In a function that uses the constrained intrinsics the strictfp attribute is required on all function calls."
Although there are several files across several test directories which don't follow this guidance, it is straightforward to provide this attribute.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D107567
Yonghong Song [Thu, 26 Aug 2021 06:13:38 +0000 (23:13 -0700)]
[DebugInfo] convert btf_tag attrs to DI annotations for DISubprograms
Generate btf_tag annotations for DISubprograms. The annotations
are represented as a DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106618
Roman Lebedev [Thu, 26 Aug 2021 15:40:00 +0000 (18:40 +0300)]
[X86][Codegen] PR51615: don't replace wide volatile load with narrow broadcast-from-memory
Even though https://bugs.llvm.org/show_bug.cgi?id=51615
appears to be introduced by D105390, the fix lies here.
We can not replace a wide volatile load with a broadcast-from-memory,
because that would narrow the load, which isn't legal for volatiles.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D108757
Florian Hahn [Thu, 26 Aug 2021 12:56:58 +0000 (13:56 +0100)]
[ConstraintElimination] Rewrite tests to reduce verification complexity.
This patch reduces the bitwidth of the types certain tests operate on and gets
rid of a number of @use(i1) calls and xor's the conditions together
instead, which eliminates all timeouts when verifying the tests.
See https://github.com/AliveToolkit/alive2/issues/744 for more details.
Anna Thomas [Wed, 25 Aug 2021 20:19:18 +0000 (16:19 -0400)]
[LoopPredication] Preserve MemorySSA
Since LICM has now unconditionally moved to the MemorySSA-based form, all
passes that run in the same LPM as LICM need to preserve MemorySSA (i.e. our
downstream pipeline).
Added loop-mssa to all tests and perform -verify-memoryssa within
LoopPredication itself.
Differential Revision: https://reviews.llvm.org/D108724
Arthur O'Dwyer [Wed, 25 Aug 2021 23:40:03 +0000 (19:40 -0400)]
[libc++] Revert a use of `static_cast` for `_VSTD::forward`. NFCI.
As requested in D107584.
Differential Revision: https://reviews.llvm.org/D108743
Yonghong Song [Mon, 19 Jul 2021 15:33:01 +0000 (08:33 -0700)]
[DebugInfo] generate btf_tag annotations for DISubprogram types
Generate btf_tag annotations for DISubprogram types.
A field "annotations" is introduced to DISubprogram, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
distinct !DISubprogram(..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Differential Revision: https://reviews.llvm.org/D106618
Andrew Wei [Thu, 26 Aug 2021 14:52:42 +0000 (22:52 +0800)]
[CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has already been modified. For example,
the matcher sets [A, B] and [C, A] have a cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match a new phi node
to another new phi node, we should ignore new phi nodes when mapping a new phi node to an old one.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D108635
Jacob Bramley [Thu, 5 Aug 2021 12:36:05 +0000 (13:36 +0100)]
[AArch64] Lower fpto*i.sat intrinsics for NEON.
Following on from D102353, extend the fpto*i.sat intrinsics to use NEON
fcvt* instructions.
Differential Revision: https://reviews.llvm.org/D108460
Joe Loser [Thu, 26 Aug 2021 14:34:35 +0000 (10:34 -0400)]
[libc++][NFC] Fix typo in test/support/test_range.h
Fix typo in `#error` filepath.
Differential Revision: https://reviews.llvm.org/D108764
Kent Ross [Thu, 26 Aug 2021 14:23:54 +0000 (10:23 -0400)]
[libc++][doc] Cleanup, normalize, and update projects status docs
Mark the now-done [cmp.result] in spaceship projects as complete;
normalize some status markers for papers and projects; fix alignment
and line breaks in spaceship projects; add links to the standard.
Differential Revision: https://reviews.llvm.org/D108502
Alexey Bataev [Tue, 3 Aug 2021 20:20:32 +0000 (13:20 -0700)]
[SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.
The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in place: it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with a larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered, and their user too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most of the other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra instruction scheduling
anymore), and we may vectorize more code for large basic blocks again
because of the saved scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
Kent Ross [Thu, 26 Aug 2021 14:08:59 +0000 (10:08 -0400)]
[libc++][doc] Repair files with CRLF line endings.
These are the only files in libc++ that have CRLF line endings instead of LF.
Differential Revision: https://reviews.llvm.org/D108748
Simon Pilgrim [Thu, 26 Aug 2021 12:17:27 +0000 (13:17 +0100)]
[X86] getShape - don't dereference dyn_cast<>
dyn_cast can return nullptr, use cast<> to assert we have the correct type.
Simon Pilgrim [Thu, 26 Aug 2021 12:15:29 +0000 (13:15 +0100)]
Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
Balazs Benics [Thu, 26 Aug 2021 13:29:32 +0000 (15:29 +0200)]
Revert "[analyzer] Extend the documentation of MallocOverflow"
This reverts commit 6097a41924584b613153237d8e66e9660001ce7d.
Balazs Benics [Thu, 26 Aug 2021 12:31:09 +0000 (14:31 +0200)]
[analyzer] Extend the documentation of MallocOverflow
Previously by following the documentation it was not immediately clear
what the capabilities of this checker are.
In this patch, I add some clarification on when the checker issues a
report and what its limitations are.
I'm also advertising suppressing such reports by adding an assertion, as
demonstrated by the test3().
I'm highlighting that this checker might produce an extensive amount of
findings, but it might be still useful for code audits.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D107756
Andrew Wei [Thu, 26 Aug 2021 13:01:59 +0000 (21:01 +0800)]
[LoopDataPrefetch] Add missed LoopSimplify dependence for prefetch pass
SCEVExpander::expandCodeFor may expand add recurrences for loop with a preheader,
so we should make LoopDataPrefetch dependent on LoopSimplify.
This patch tries to fix https://bugs.llvm.org/show_bug.cgi?id=43784
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D108448
Jessica Clarke [Thu, 26 Aug 2021 11:48:29 +0000 (12:48 +0100)]
[AMDGPU] Remove dead and broken ComplexPatterns
SelectADDRParam was discovered as being dead 5 years ago and removed in
7b4ef068c6f5 but the unused ComplexPattern definition was left behind.
SelectADDRDWord has never existed as far as I can tell, even back when
AMDGPU was R600-only and called that.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108758
Jessica Clarke [Thu, 26 Aug 2021 11:48:14 +0000 (12:48 +0100)]
[SelectionDAG] Remove unused SDTConvertOp
This was used by CONVERT_RNDSAT, which was removed in def496c04b0d, so
the profile is now unused.
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D108508
Andrea Di Biagio [Wed, 25 Aug 2021 20:34:35 +0000 (21:34 +0100)]
[X86][MCA] Address the latest issues with MULX reported in PR51495.
It turns out that SchedWrite WriteIMulH was always assigned to the low half of
the result of a MULX (rather than to the high half).
To avoid confusion, this patch swaps the two MULX writes in the tablegen
definition of MULX32/64. That way, write names better describe what they
actually refer to; this also avoids further complications if in future we decide
to reuse the same MulH writes to also model other scalar integer multiply
instructions. I also had to swap the latency values for the two MULX writes to
make sure that the change is effectively an NFC. In fact, none of the existing
x86 tests were affected by this small refactoring.
This patch also fixes a bug in MCA: a wrong latency value was propagated for
instructions that perform multiple writes to a same register. This last issue
was found by Roman while testing MULX on targets that define a different latency
for the Low/High part of the result.
Differential Revision: https://reviews.llvm.org/D108727
Alex Richardson [Thu, 26 Aug 2021 10:11:56 +0000 (11:11 +0100)]
[sanitizer] Fix build on FreeBSD RISC-V
We have to avoid calling renameat2 and clone on FreeBSD.
Additionally, the mcontext structure has different members.
Reviewed By: jrtc27, luismarques
Differential Revision: https://reviews.llvm.org/D103886
Sindhu Chittireddy [Thu, 26 Aug 2021 10:58:56 +0000 (06:58 -0400)]
Assert pointer cannot be null; NFC
Klocwork static code analysis exposed this concern:
Pointer 'SubExpr' returned from call to getSubExpr() function which may
return NULL from 'cast_or_null<Expr>(Operand)', which will be
dereferenced in the statement following it
Add an assert on SubExpr to make it clear this pointer cannot be null.
Matthew Devereau [Thu, 26 Aug 2021 10:08:03 +0000 (11:08 +0100)]
[AArch64][SVE] Teach cost model masked gathers/scatters are cheap
Tell the cost model to use the scalable calculation for non-NEON fixed vectors.
This results in a cheaper cost for fixed-length SVE masked gathers/scatters,
allowing the vectorizer to emit them more frequently.
Benjamin Kramer [Thu, 26 Aug 2021 10:11:02 +0000 (12:11 +0200)]
[X86] Don't write to the source directory in test
Roman Lebedev [Thu, 26 Aug 2021 08:51:28 +0000 (11:51 +0300)]
The maximal representable alignment in LLVM IR is 1GiB, not 512MiB
In LLVM IR, `AlignmentBitfieldElementT` is 5 bits wide.
But that means that the maximal alignment exponent is `(1<<5)-2`,
which is `30`, not `29`. And indeed, an alignment of `1073741824`
round-trips IR serialization-deserialization.
While this doesn't seem all that important, this doubles
the maximal supported alignment from 512MiB to 1GiB,
and there's actually one noticeable use-case for that;
On X86, the huge pages can have sizes of 2MiB and 1GiB (!).
So while this doesn't add support for truly huge alignments,
which i think we can easily-ish do if wanted, i think this adds
zero-cost support for a not-trivially-dismissable case.
I don't believe we need any upgrade infrastructure,
and since we don't explicitly record the IR version,
we don't need to bump one either.
As @craig.topper speculates in D108661#2963519,
this might be an artificial limit imposed by the original implementation
of the `getAlignment()` functions.
Differential Revision: https://reviews.llvm.org/D108661
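The arithmetic behind the `(1<<5)-2` limit can be checked with a short sketch. It assumes, for illustration only, that the 5-bit field stores `log2(alignment) + 1` and reserves `0` for "no alignment specified"; the function names are hypothetical.

```python
ALIGN_BITS = 5                              # width of the bitfield
MAX_ALIGN_EXPONENT = (1 << ALIGN_BITS) - 2  # largest encodable exponent
MAX_ALIGNMENT = 1 << MAX_ALIGN_EXPONENT

def encode_alignment(align):
    """Encode None as 0 and a power-of-two alignment as log2(align) + 1."""
    if align is None:
        return 0
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")
    code = align.bit_length()               # equals log2(align) + 1
    if code >= (1 << ALIGN_BITS):
        raise ValueError("alignment does not fit in the bitfield")
    return code

def decode_alignment(code):
    return None if code == 0 else 1 << (code - 1)

print(MAX_ALIGN_EXPONENT, MAX_ALIGNMENT, MAX_ALIGNMENT // 2**20, "MiB")
print(decode_alignment(encode_alignment(MAX_ALIGNMENT)) == MAX_ALIGNMENT)
```

Under that assumed encoding, the largest field value, 31, decodes to an exponent of 30, which is 1GiB.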
Benjamin Kramer [Thu, 26 Aug 2021 09:37:07 +0000 (11:37 +0200)]
[libunwind] Don't include cet.h/immintrin.h unconditionally
These may not exist when CET isn't available.
Alex Richardson [Thu, 26 Aug 2021 08:51:23 +0000 (09:51 +0100)]
Make Value::MaxAlignment(Exponent) constexpr
This avoids references to the variables being generated when using e.g. max().
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D95050
Alex Richardson [Thu, 26 Aug 2021 08:50:05 +0000 (09:50 +0100)]
Fix __attribute__((annotate("")) with non-zero globals AS
The existing code attempting to bitcast from a value in the default globals AS
to i8 addrspace(0)* was triggering an assertion failure in our downstream fork.
I found this while compiling poppler for CHERI-RISC-V (we use AS200 for all
globals). The test case uses AMDGPU since that is one of the in-tree targets
with a non-zero default globals address space.
The new test previously triggered a "Invalid constantexpr bitcast!" assertion
and now correctly generates code with addrspace(1) pointers.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D105972
Alex Richardson [Thu, 26 Aug 2021 08:47:53 +0000 (09:47 +0100)]
Fix LLVM_ENABLE_THREADS check from 26a92d5852b2c6bf77efd26f6c0194c913f40285
We should be using #if instead of #ifdef here since LLVM_ENABLE_THREADS
is set using #cmakedefine01 so is always defined.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D108110
Florian Hahn [Thu, 26 Aug 2021 09:08:00 +0000 (10:08 +0100)]
[ConstraintElimination] Initial support for using info from assumes.
This patch adds initial support to use facts from @llvm.assume calls. It
intentionally does not handle all possible cases to keep things simple
initially.
For now, the condition from an assume is made available on entry to the
containing block, if the assume is guaranteed to execute. Otherwise it
is only made available in the successor blocks.
Florian Hahn [Thu, 26 Aug 2021 09:07:46 +0000 (10:07 +0100)]
[ConstraintElimination] Add more assume tests.
David Green [Thu, 26 Aug 2021 08:43:44 +0000 (09:43 +0100)]
[AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
David Green [Thu, 26 Aug 2021 08:13:30 +0000 (09:13 +0100)]
[AArch64] Add a Cortex-A55 NEON scheduler test case.
Jay Foad [Thu, 26 Aug 2021 08:27:01 +0000 (09:27 +0100)]
[MachineScheduler] Fix tracing
Consistently print a newline before "RegionInstrs:".
LLVM GN Syncbot [Thu, 26 Aug 2021 08:14:37 +0000 (08:14 +0000)]
[gn build] Port 21b25a1fb32e
gejin [Thu, 26 Aug 2021 08:20:38 +0000 (16:20 +0800)]
[libunwind] Support stack unwind in CET environment
Control-flow Enforcement Technology (CET), published by Intel,
introduces a shadow stack feature that aims to ensure a return from
a function is directed to where the function was called.
In a CET-enabled system, each function call pushes the return
address onto both the normal stack and the shadow stack. When the function
returns, the address stored in the shadow stack is popped and
compared with the return address; the program fails if the two
addresses don't match.
In exception handling, the control flow may skip some stack frames,
and we must adjust the shadow stack to avoid violating the CET restriction.
To achieve this, we count the number of stack frames skipped
and adjust the shadow stack by this number before jumping to the landing pad.
Reviewed By: hjl.tools, compnerd, MaskRay
Differential Revision: https://reviews.llvm.org/D105968
Signed-off-by: gejin <ge.jin@intel.com>
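The counting scheme described above can be illustrated with a toy model. This is only a sketch: real CET keeps the shadow stack in hardware-protected memory, and `ToyShadowStack`, `unwindThenReturn`, and the addresses used are all hypothetical names and values, not libunwind code.
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a shadow stack; a std::vector stands in for protected memory.
struct ToyShadowStack {
  std::vector<std::uintptr_t> ReturnAddrs;

  void onCall(std::uintptr_t RetAddr) { ReturnAddrs.push_back(RetAddr); }

  // Pops the shadow entry and reports whether it matches the return address
  // taken from the normal stack.
  bool onReturn(std::uintptr_t RetAddr) {
    assert(!ReturnAddrs.empty());
    std::uintptr_t Expected = ReturnAddrs.back();
    ReturnAddrs.pop_back();
    return Expected == RetAddr;
  }

  // Unwinding past N frames: drop their entries before jumping to the
  // landing pad so the next return still matches.
  void skipFrames(std::size_t N) {
    assert(N <= ReturnAddrs.size());
    ReturnAddrs.resize(ReturnAddrs.size() - N);
  }
};

// Three nested calls; an exception unwinds past the inner two frames and
// lands in the outer one, which then returns normally.
inline bool unwindThenReturn(bool AdjustShadowStack) {
  ToyShadowStack S;
  S.onCall(0x1000); // outer frame, contains the landing pad
  S.onCall(0x2000); // skipped frame
  S.onCall(0x3000); // frame that throws, also skipped
  if (AdjustShadowStack)
    S.skipFrames(2);
  return S.onReturn(0x1000);
}
```
Without the adjustment, the outer frame's return pops a stale entry from a skipped frame and the comparison fails.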
Jean Perier [Thu, 26 Aug 2021 07:44:24 +0000 (09:44 +0200)]
[flang] Take result length into account in ApplyElementwise folding
ApplyElementwise on character operations always created a result
ArrayConstructor with the length of the left operand. This is not
correct for concatenation and SetLength operations.
Compute and thread the length to the spot creating the ArrayConstructor
so that the length is correct for those character operations.
Differential Revision: https://reviews.llvm.org/D108711
LLVM GN Syncbot [Thu, 26 Aug 2021 07:29:05 +0000 (07:29 +0000)]
[gn build] Port 3373e845398b
Gabor Bencze [Wed, 25 Aug 2021 18:22:15 +0000 (20:22 +0200)]
[clang-tidy] Add bugprone-suspicious-memory-comparison check
The check warns on suspicious calls to `memcmp`.
It currently checks for comparing types that do not have
unique object representations or are non-standard-layout.
Based on
https://wiki.sei.cmu.edu/confluence/display/c/EXP42-C.+Do+not+compare+padding+data
https://wiki.sei.cmu.edu/confluence/display/c/FLP37-C.+Do+not+use+object+representations+to+compare+floating-point+values
and part of
https://wiki.sei.cmu.edu/confluence/display/cplusplus/OOP57-CPP.+Prefer+special+member+functions+and+overloaded+operators+to+C+Standard+Library+functions
Add alias `cert-exp42-c` and `cert-flp37-c`.
Some tests are currently failing at head; the check depends on D89649.
Originally started in D71973
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D89651
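A sketch of the kind of comparison the check is concerned with, assuming C++17 and a common ABI where `int` is more strictly aligned than `char`. The `Padded` and `Packed` structs and both helpers are hypothetical examples, not taken from the check's tests.
```cpp
#include <cstring>
#include <type_traits>

// On common ABIs there are padding bytes between Tag and Value, and their
// contents are unspecified, so comparing raw bytes is unreliable.
struct Padded {
  char Tag;
  int Value;
};

// Two ints back to back: no padding on common ABIs.
struct Packed {
  int A;
  int B;
};

// Safe alternative for padded types: compare member by member.
constexpr bool equalMemberwise(const Padded &L, const Padded &R) {
  return L.Tag == R.Tag && L.Value == R.Value;
}

// Byte-wise comparison, allowed only for types whose value is fully
// determined by their object representation.
template <typename T> bool equalBytes(const T &L, const T &R) {
  static_assert(std::has_unique_object_representations_v<T>,
                "memcmp would also compare padding or non-unique bits");
  return std::memcmp(&L, &R, sizeof(T)) == 0;
}
```
Calling `equalBytes` with two `Padded` objects is rejected at compile time on such ABIs, which mirrors the intent of the warning.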
Gabor Bencze [Wed, 25 Aug 2021 18:22:15 +0000 (20:22 +0200)]
Fix __has_unique_object_representations with no_unique_address
Fix incorrect behavior of `__has_unique_object_representations`
when using the no_unique_address attribute.
Based on the bug report: https://bugs.llvm.org/show_bug.cgi?id=47722
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D89649
Esme-Yi [Thu, 26 Aug 2021 07:17:06 +0000 (07:17 +0000)]
[llvm-readobj][XCOFF] Add support for `--needed-libs` option.
Summary: This patch adds support for the llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found in the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the test since yaml2obj
does not yet support writing the Loader Section and the
import file table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106643
Lin Sun [Thu, 26 Aug 2021 06:50:17 +0000 (23:50 -0700)]
[Driver][Linux] Fix regression when -DLIBCXX_LIBDIR_SUFFIX=64
This patch allows an installed (`ninja install-clang`) Clang to find
`../lib64/libc++.so`
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108286
Jan Svoboda [Wed, 25 Aug 2021 16:45:46 +0000 (18:45 +0200)]
[clang][deps] Reset non-modular language and preprocessor options
There are a number of language and preprocessor options that are reset in the `CompilerInvocation` that describes the build of an implicit module. This patch uses the logic for explicit modules as well.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D108710
Aart Bik [Thu, 26 Aug 2021 04:01:12 +0000 (21:01 -0700)]
[mlir][sparse] add asCOO() functionality to sparse tensor object
This prepares for general sparse-to-sparse conversions. The code that
needs to be generated using this new feature is now simply:
(1) coo = sparse_tensor_1->asCOO(); // source format1
(2) sparse_tensor_2 = newSparseTensor(coo); // destination format2
By using COO as an intermediate, we can do *all* conversions without
having to implement the full O(N^2) conversion matrix. Note that we
can always improve particular conversions individually if a faster
solution is required.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108681
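The idea of using COO as a hub can be sketched with two toy formats. All types below (`Element`, `COO`, `Dense`, `CSR`) are hypothetical and are not the MLIR sparse tensor runtime API; the point is only that each format needs one converter to COO and one from COO, rather than one per pair of formats.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate-list intermediate: one entry per non-zero.
struct Element {
  std::size_t Row, Col;
  double Val;
};
struct COO {
  std::size_t Rows, Cols;
  std::vector<Element> Elems;
};

// Source format: dense row-major storage.
struct Dense {
  std::size_t Rows, Cols;
  std::vector<double> Data;

  COO asCOO() const {
    COO C{Rows, Cols, {}};
    for (std::size_t R = 0; R < Rows; ++R)
      for (std::size_t Cl = 0; Cl < Cols; ++Cl)
        if (double V = Data[R * Cols + Cl]; V != 0.0)
          C.Elems.push_back({R, Cl, V});
    return C;
  }
};

// Destination format: compressed sparse row.
struct CSR {
  std::size_t Rows, Cols;
  std::vector<std::size_t> RowPtr, ColIdx;
  std::vector<double> Vals;

  // Assumes the COO elements are sorted by row, as Dense::asCOO produces.
  static CSR fromCOO(const COO &C) {
    CSR M{C.Rows, C.Cols, std::vector<std::size_t>(C.Rows + 1, 0), {}, {}};
    for (const Element &E : C.Elems) {
      assert(E.Row < C.Rows && E.Col < C.Cols);
      ++M.RowPtr[E.Row + 1]; // count entries per row
      M.ColIdx.push_back(E.Col);
      M.Vals.push_back(E.Val);
    }
    for (std::size_t R = 0; R < C.Rows; ++R)
      M.RowPtr[R + 1] += M.RowPtr[R]; // prefix sum turns counts into offsets
    return M;
  }
};
```
A conversion then reads `CSR::fromCOO(dense.asCOO())`, matching steps (1) and (2) in the commit message.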
Wenlei He [Tue, 24 Aug 2021 16:55:18 +0000 (09:55 -0700)]
[CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen
This change adds a switch that lets the sample loader use the global pre-inliner's decisions instead of its own. The pre-inliner in llvm-profgen makes inline decisions globally, based on the whole-program profile and using function byte size as a cost proxy.
Since the pre-inliner also adjusts/merges context profiles based on its inline decisions, honoring those decisions in the sample loader leads to better post-inline profile quality, especially for ThinLTO, where cross-module profile merging isn't possible without the pre-inliner.
A minor fix in the profile reader is also included. When the pre-inliner is used, we now also turn off the default merging and trimming logic unless it is explicitly requested.
Differential Revision: https://reviews.llvm.org/D108677
Fangrui Song [Wed, 25 Aug 2021 23:59:06 +0000 (16:59 -0700)]
[LLVMgold.so][test] Make comdat-nodeduplicate.ll work with binutils<2.27
Sam Clegg [Wed, 25 Aug 2021 22:13:46 +0000 (18:13 -0400)]
[clang][Emscripten] Define __unix family of macros
This will allow us to remove these from the downstream driver:
https://github.com/emscripten-core/emscripten/blob/57270ce8150a5107e591b4e9ec7cbeff0ba7c905/emcc.py#L860-L863
Differential Revision: https://reviews.llvm.org/D108735
Arthur Eubanks [Wed, 25 Aug 2021 23:13:40 +0000 (16:13 -0700)]
[gn build] Unbreak non-clang host builds
eecd5d0a broke non-clang host builds.
Some crt code is not always built with the just-built clang.
0da172b checked whether the compiler is clang rather than asserting that
it is.
Alexey Bataev [Wed, 25 Aug 2021 22:54:23 +0000 (15:54 -0700)]
[SLP][NFC]Add a test for non-optimal PHIs vectorization, NFC.
Heejin Ahn [Wed, 25 Aug 2021 10:53:22 +0000 (03:53 -0700)]
[WebAssembly] Use entry block only for initializations in EmSjLj
Emscripten SjLj transformation is done in four steps. This will be
mostly the same for the soon-to-be-added Wasm SjLj; steps 1, 3, and 4
will be shared and there will be a separate way of doing step 2.
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA
We initialize `setjmpTable` and `setjmpTableSize` in the entry BB. But
if the entry BB contains a `setjmp` call, some `setjmp` handling
transformation will also happen in the entry BB, such as calling
`saveSetjmp`.
This is fine for Emscripten SjLj but not for Wasm SjLj, because in Wasm
SjLj we will add a dispatch BB that contains a `switch` right after the
entry BB, from which we jump to one of post-`setjmp` BBs. And this
dispatch BB should precede all `setjmp` calls.
Emscripten SjLj (current):
```
entry:
%setjmpTable = ...
%setjmpTableSize = ...
...
call @saveSetjmp(...)
```
Wasm SjLj (follow-up):
```
entry:
%setjmpTable = ...
%setjmpTableSize = ...
setjmp.dispatch:
...
; Jump to the right post-setjmp BB, if we are returning from a
; longjmp. If this is the first setjmp call, go to %entry.split.
switch i32 %no, label %entry.split [
i32 1, label %post.setjmp1
i32 2, label %post.setjmp2
...
i32 N, label %post.setjmpN
]
entry.split:
...
call @saveSetjmp(...)
```
So in Wasm SjLj we split the entry BB so that the entry block only holds
`setjmpTable` and `setjmpTableSize` initialization, and insert a
`setjmp.dispatch` BB. (This part is not in this CL. This will be a
follow-up.) But note that Emscripten SjLj and Wasm SjLj share all
steps except for step 2. If we split the entry BB only for Wasm
SjLj, there will be one more `if`-`else` and the code will be more
complicated.
So this CL splits the entry BB in Emscripten SjLj and puts only
initialization stuff there as follows:
Emscripten SjLj (this CL):
```
entry:
%setjmpTable = ...
%setjmpTableSize = ...
br %entry.split
entry.split:
...
call @saveSetjmp(...)
```
This is just done to share code with Wasm SjLj. It adds an unnecessary
branch but this will be removed in later optimization passes anyway.
This is in effect NFC, meaning the program behavior will not change, but
existing ll test files have changed because the entry block was split.
The reason I upload this in a separate CL is to make the Wasm SjLj diff
tidier, because this changes many existing Emscripten SjLj tests, which
can be confusing for the follow-up Wasm SjLj CL.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108729
Heejin Ahn [Wed, 25 Aug 2021 03:40:21 +0000 (20:40 -0700)]
[WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC)
Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share
many steps:
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA
1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only
step 2 is different. This CL extracts the current Emscripten SjLj's
longjmp callsites handling into a function. The reason to make this a
separate CL is, without this, the diff tool cannot compare things well
in the presence of moved code and added code in the followup Wasm SjLj
CL, and it ends up mixing them together, making the diff unreadable.
Also fixes some typos and variable names. So far we've been calling the
buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in
the man page for those functions is `env`, so updated them to be
consistent.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108728
Dimitry Andric [Wed, 25 Aug 2021 21:08:13 +0000 (23:08 +0200)]
[libc++][NFC] Remove duplicate ranges entry in CMakeLists.txt.
The second entry got added accidentally as part of 5a3309f825769.
Reviewed By: cjdb
Differential Revision: https://reviews.llvm.org/D108726
Reid Kleckner [Wed, 25 Aug 2021 18:34:00 +0000 (11:34 -0700)]
Effectively revert 33c3d8a916c / D33782
The reverted change would treat the token `or` in system headers as an
identifier, and elsewhere as an operator. As reported in
llvm.org/pr42427, many users classify their third party library headers
as "system" headers to suppress warnings. There's no clean way to
separate Windows SDK headers from user headers.
Clang is still able to parse old Windows SDK headers if C++ operator
names are disabled. Traditionally this was controlled by
`-fno-operator-names`, but is now also enabled with `/permissive` since
D103773. This change will prevent `clang-cl` from parsing <query.h> from
the Windows SDK out of the box, but there are multiple ways to work
around that:
- Pass `/clang:-fno-operator-names`
- Pass `/permissive`
- Pass `-DQUERY_H_RESTRICTION_PERMISSIVE`
In all of these modes, the operator names will consistently be available
or not available, instead of depending on whether the code is in a
system header.
I added a release note for this, since it may break straightforward
users of the Windows SDK.
Fixes PR42427
Differential Revision: https://reviews.llvm.org/D108720
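For context, a minimal sketch of what "operator names" means here. In standard C++, `or`, `and`, and `not` are alternative tokens for `||`, `&&`, and `!`, so they cannot be used as identifiers unless operator names are disabled. The `eitherSet` helper is a hypothetical example and does not reproduce anything from the Windows SDK headers.
```cpp
// With operator names enabled (the standard C++ default), `or` is the same
// token as `||`. With operator names disabled, this function would no
// longer compile, but a declaration that uses `or` as an identifier would.
constexpr bool eitherSet(bool A, bool B) { return A or B; }

// `not` and `and` behave the same way as `!` and `&&`.
constexpr bool onlyFirst(bool A, bool B) { return A and not B; }
```
After this commit, code like this behaves the same whether or not it lives in a header classified as a system header.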