Fangrui Song [Thu, 15 Sep 2022 05:14:36 +0000 (22:14 -0700)]
[IRBuilder] Fix -Wunused-variable in non-assertion build. NFC
Dhruva Chakrabarti [Thu, 15 Sep 2022 03:08:46 +0000 (03:08 +0000)]
Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 7539e9cf811e590d9f12ae39673ca789e26386b4.
Vitaly Buka [Mon, 12 Sep 2022 04:49:18 +0000 (21:49 -0700)]
[nfc][msan] getShadowOriginPtr on <N x ptr>
Some vector instructions can benefit from calling getShadowOriginPtr
with Addr as <N x ptr>.
Differential Revision: https://reviews.llvm.org/D133681
Vitaly Buka [Mon, 12 Sep 2022 04:30:07 +0000 (21:30 -0700)]
[IRBuilder] Add CreateMaskedExpandLoad and CreateMaskedCompressStore
Vitaly Buka [Sun, 11 Sep 2022 19:55:57 +0000 (12:55 -0700)]
[NFC][msan] Rename variables to match definition
Vitaly Buka [Sun, 11 Sep 2022 00:22:32 +0000 (17:22 -0700)]
[NFC][msan] Convert some code to early returns
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D133673
Vitaly Buka [Sun, 11 Sep 2022 00:13:12 +0000 (17:13 -0700)]
[NFC][msan] Simplify llvm.masked.load origin code
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D133652
Vitaly Buka [Thu, 15 Sep 2022 01:18:45 +0000 (18:18 -0700)]
[msan] Resolve FIXME from D133880
No test changes are needed: we convertToBool
unconditionally, but only before the OR.
Vitaly Buka [Thu, 15 Sep 2022 01:42:13 +0000 (18:42 -0700)]
[test][msan] Use implicit-check-not
Sheng [Thu, 15 Sep 2022 01:22:15 +0000 (09:22 +0800)]
[M68k] Fix the crash of fast register allocator
`MOVEM` is used to spill registers, which causes problems with 1-byte data because it only supports word (2-byte) and long (4-byte) sizes.
Use the normal `move` instruction to spill 1-byte data instead.
Fixes #57660
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D133636
Jeff Niu [Wed, 14 Sep 2022 00:35:38 +0000 (17:35 -0700)]
[mlir] Allow `Attribute::print` to elide the type
This patch adds a flag to `Attribute::print` that prints the attribute
without its type.
Fixes #57689
Reviewed By: rriddle, lattner
Differential Revision: https://reviews.llvm.org/D133822
Jeff Niu [Wed, 14 Sep 2022 20:43:45 +0000 (13:43 -0700)]
[mlir][ods] Add cppClassName to ConfinedType
So ODS can generate `OneTypedResult` when a ConfinedType is used as a
result type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D133893
Giorgis Georgakoudis [Thu, 15 Sep 2022 00:09:54 +0000 (00:09 +0000)]
[OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
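Here is a minimal C++ sketch of the idea. All names, including `forkCallModel`, are hypothetical and are not the OpenMP runtime's API. The captured variables are packed into one struct, so the outlined function and the fork call keep a fixed signature however many variables are captured, and no per-argument pointer casts are needed.

```cpp
#include <cassert>

// Aggregate the compiler would emit for a region capturing `n` and `sum`.
struct CapturedVars {
  int *n;
  long *sum;
};

// Outlined region body: its signature stays fixed regardless of how many
// variables are captured; they are all reached through the aggregate.
static void outlinedRegion(int threadId, void *args) {
  auto *cap = static_cast<CapturedVars *>(args);
  if (threadId == 0) // single writer keeps the sketch race-free
    *cap->sum += *cap->n;
}

// Hypothetical stand-in for the runtime's fork call: no varargs and no cap on
// the number of captures, just a function pointer and one payload pointer.
static void forkCallModel(int numThreads, void (*fn)(int, void *), void *args) {
  for (int t = 0; t < numThreads; ++t)
    fn(t, args); // sequential here; a real runtime runs the team in parallel
}

long runRegion(int n) {
  long sum = 0;
  CapturedVars cap{&n, &sum};
  forkCallModel(4, outlinedRegion, &cap);
  return sum;
}
```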
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
Stanislav Mekhanoshin [Wed, 14 Sep 2022 22:42:30 +0000 (15:42 -0700)]
Fix crash while printing MMO target flags
MachineMemOperand::print can dereference a NULL pointer if TII
is not passed from printMemOperand. This does not happen while
dumping the DAG/MIR from llc but crashes the debugger if a dump()
method is called from gdb.
Differential Revision: https://reviews.llvm.org/D133903
Craig Topper [Wed, 14 Sep 2022 22:53:18 +0000 (15:53 -0700)]
[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI
This code was written as if it lived in the MC layer instead of
the CodeGen layer. We get the MCInstrDesc directly from MachineInstr.
And we can use RISCVSubtarget::is64Bit instead of going to the
Triple.
Differential Revision: https://reviews.llvm.org/D133905
Sam Clegg [Thu, 1 Sep 2022 07:59:54 +0000 (00:59 -0700)]
[MC] Fix typo in getSectionAddressSize comment. NFC
The comment was referring to a now non-existent function that was removed
in 93e3cf0ebd9c95a8df42fff0aa38fc022422b4d4.
Differential Revision: https://reviews.llvm.org/D133098
Craig Topper [Wed, 14 Sep 2022 21:52:17 +0000 (14:52 -0700)]
[IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.
IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointers.
Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133898
Craig Topper [Wed, 14 Sep 2022 21:52:05 +0000 (14:52 -0700)]
[GVN][VP] Add test case for incorrect removal of a vp.gather. NFC
Pre-commit for D133898
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133899
Jez Ng [Tue, 13 Sep 2022 15:15:15 +0000 (11:15 -0400)]
[lld-macho] Have ICF dedup explicitly-defined selrefs
This is what ld64 does (though it doesn't use ICF to do this; instead it
always dedups selrefs by default).
We'll want to dedup implicitly-defined selrefs as well, but I will leave
that for future work.
Additionally, I'm not *super* happy with the current LLD implementation
because I think it is rather janky and inefficient. But at least it
moves us toward the goal of closing the size gap with ld64. I've
described ideas for cleaning up our implementation here:
https://github.com/llvm/llvm-project/issues/57714
Differential Revision: https://reviews.llvm.org/D133780
Jez Ng [Wed, 14 Sep 2022 21:46:41 +0000 (17:46 -0400)]
[lld-macho][nfc] Clean up ICF code
Split these changes out from https://reviews.llvm.org/D133780.
Vitaly Buka [Wed, 14 Sep 2022 03:17:55 +0000 (20:17 -0700)]
[msan] Change logic of ClInstrumentationWithCallThreshold
According to logs, ClInstrumentationWithCallThreshold is a workaround
for a slow backend with a large number of basic blocks.
However, I can't reproduce that, but I do see a significant slowdown
after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold
the compiler is able to eliminate many of the branches.
So maybe we should drop ClInstrumentationWithCallThreshold completely.
For now I just change the logic to ignore constant shadow so it will
not trigger callback fallback too early.
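The following is a rough model of the new counting rule. It assumes each check can be summarized by whether its shadow is a compile-time constant, and it reads the threshold as "more than N checks". `ShadowCheck` and `shouldUseCallbacks` are hypothetical and are not MSan's code. Only checks with non-constant shadow count toward the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one check the instrumentation would emit.
struct ShadowCheck {
  bool shadowIsConstant; // shadow already folded to a compile-time constant
};

// Fall back to out-of-line callbacks only if the number of checks whose
// shadow is *not* constant exceeds the threshold. Constant-shadow checks are
// ignored because later optimizations can usually fold their branches.
bool shouldUseCallbacks(const std::vector<ShadowCheck> &checks,
                        std::size_t threshold) {
  std::size_t nonConstant = 0;
  for (const ShadowCheck &check : checks)
    if (!check.shadowIsConstant)
      ++nonConstant;
  return nonConstant > threshold;
}
```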
Reviewed By: kstoimenov
Differential Revision: https://reviews.llvm.org/D133880
Craig Topper [Wed, 14 Sep 2022 21:41:19 +0000 (14:41 -0700)]
[RISCV] Update error message to not call 'RV32' and 'RV64' an extension.
I used RV32 so I didn't have to write RV32I and RV32E. Ideally
these builtins will be wrapped in a header someday so long term I don't
expect users to see these errors.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D133444
Jim Ingham [Wed, 14 Sep 2022 19:45:26 +0000 (12:45 -0700)]
Revert "Revert "Be more careful to maintain quoting information when parsing commands.""
This reverts commit ac05bc0524c66c74278b26742896a4c634c034cf.
I had incorrectly removed one set of checks in the option handling in
Options::ParseAlias because I couldn't see what it was for. It was a
bit obscure, but it handled the case where you pass "-something=other --"
as the input_line, which caused the built-in "run" alias not to return
the right value for IsDashDashCommand, causing TestHelp.py to fail.
Philip Reames [Wed, 14 Sep 2022 21:40:29 +0000 (14:40 -0700)]
[RISCV] Verify SEW/VecPolicy immediate values
Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see 0a14551.
Differential Revision: https://reviews.llvm.org/D133869
Philip Reames [Wed, 14 Sep 2022 21:39:09 +0000 (14:39 -0700)]
[RISCV] Fix a silent miscompile in copyPhysReg
Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively.
Oddly, this appears to be silent in practice. There's no test change despite the verifier changes proving that we definitely hit this in existing tests.
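The sketch below is a toy model of the operand mix-up. Operands are plain strings, and it assumes a hypothetical layout of `[..., AVL, SEW]` with an optional trailing `VecPolicy`. It is not the real `copyPhysReg` code and only illustrates the indexing issue described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened operand list of an RVV pseudo:
//   [..., AVL, SEW]            without a policy operand
//   [..., AVL, SEW, VecPolicy] with a policy operand
using Operands = std::vector<std::string>;

// Buggy version: blindly treats the last two operands as (AVL, SEW).
std::pair<std::string, std::string> avlAndSewBuggy(const Operands &ops) {
  return {ops[ops.size() - 2], ops[ops.size() - 1]};
}

// Fixed version: skip the trailing policy operand when it is present.
std::pair<std::string, std::string> avlAndSew(const Operands &ops,
                                              bool hasVecPolicy) {
  std::size_t end = ops.size() - (hasVecPolicy ? 1 : 0);
  return {ops[end - 2], ops[end - 1]};
}
```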
Differential Revision: https://reviews.llvm.org/D133868
Kazu Hirata [Wed, 14 Sep 2022 21:35:19 +0000 (14:35 -0700)]
[mlir] Fix warnings
This patch fixes three warnings of the form:
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1436:5: error:
default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
Lei Zhang [Wed, 14 Sep 2022 21:23:27 +0000 (17:23 -0400)]
[mlir][vector] Check minor identity map in FoldExtractSliceIntoTransferRead
vector.transfer_read ops with a minor identity indexing map are rank
reducing, with implicit leading unit dimensions. This should be
a natural extension to support in addition to full identity indexing
maps.
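Here is a standalone sketch of the minor-identity check. It assumes the map is given as the list of input dimensions its results select. MLIR has its own helper for this, and the functions below are only illustrative.

```cpp
#include <cassert>
#include <vector>

// Results are given as the input dimension each one selects, e.g.
// (d0, d1, d2) -> (d1, d2) is {1, 2} with numDims = 3.
// Minor identity: the results are exactly the trailing input dims, in order.
bool isMinorIdentity(unsigned numDims, const std::vector<unsigned> &results) {
  if (results.size() > numDims)
    return false;
  unsigned offset = numDims - static_cast<unsigned>(results.size());
  for (unsigned i = 0; i < results.size(); ++i)
    if (results[i] != offset + i)
      return false;
  return true;
}

// The leading dims that are not selected become implicit unit dims.
unsigned numDroppedLeadingDims(unsigned numDims,
                               const std::vector<unsigned> &results) {
  assert(isMinorIdentity(numDims, results));
  return numDims - static_cast<unsigned>(results.size());
}
```

A full identity map is the special case with no dropped dimensions, which is why supporting minor identities generalizes the existing fold.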
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D133883
Siva Chandra Reddy [Wed, 14 Sep 2022 19:23:40 +0000 (19:23 +0000)]
[libc] Add POSIX functions pread and pwrite.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D133888
Arthur Eubanks [Tue, 13 Sep 2022 21:01:15 +0000 (14:01 -0700)]
[OptBisect] Add flag to print IR when opt-bisect kicks in
-opt-bisect-print-ir-path=foo will dump the IR to foo when opt-bisect-limit starts skipping passes.
Currently we don't print the IR if the opt-bisect-limit is higher than the total number of times opt-bisect is called.
This makes getting the IR right before a bad transform easier.
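The sketch below is a simplified model of how such a hook could behave. It assumes a single counter shared by all passes and a callback that receives the current IR as text. `OptBisectModel` is hypothetical and is not LLVM's `OptBisect` class.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Each pass invocation gets an increasing number; invocations above the
// limit are skipped. The first time a pass is skipped, the current IR (the IR
// right before the first suppressed transform) is dumped exactly once.
class OptBisectModel {
public:
  OptBisectModel(int limit, std::function<void(const std::string &)> dumpIR)
      : limit(limit), dumpIR(std::move(dumpIR)) {}

  // Returns true if the pass should run.
  bool shouldRunPass(const std::string &currentIR) {
    int current = ++lastBisectNum;
    bool run = current <= limit;
    if (!run && !dumped) {
      dumpIR(currentIR);
      dumped = true;
    }
    return run;
  }

private:
  int limit;
  std::function<void(const std::string &)> dumpIR;
  int lastBisectNum = 0;
  bool dumped = false;
};
```

If the limit is never reached, nothing is dumped, which matches the behaviour described above.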
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D133809
Leonard Grey [Wed, 7 Sep 2022 17:21:09 +0000 (13:21 -0400)]
[lsan][Darwin] Scan libdispatch and Foundation memory regions
libdispatch uses its own heap (_dispatch_main_heap) for some allocations, including the dispatch_continuation_t that holds a dispatch source's event handler.
Objective-C block trampolines (creating methods at runtime with a block as the implementation) use the VM_MEMORY_FOUNDATION region (see https://github.com/apple-oss-distributions/objc4/blob/8701d5672d3fd3cd817aeb84db1077aafe1a1604/runtime/objc-block-trampolines.mm#L371).
This change scans both regions to fix false positives. See tests for details; unfortunately I was unable to reduce the trampoline example with imp_implementationWithBlock on a new class, so I'm resorting to something close to the bug as seen in the wild.
Differential Revision: https://reviews.llvm.org/D129385
Vitaly Buka [Thu, 8 Sep 2022 22:39:15 +0000 (15:39 -0700)]
[pipelines] Require GlobalsAA after sanitizers
Restore GlobalsAA if sanitizers inserted at early optimize callback.
The analysis can be useful for the following FunctionPassManager.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D133537
Vitaly Buka [Wed, 14 Sep 2022 20:28:46 +0000 (13:28 -0700)]
[NFC][CodeGen] Remove empty line
Jeff Niu [Wed, 14 Sep 2022 15:36:51 +0000 (08:36 -0700)]
[mlir][LLVMIR] Add lifetime start and end marker intrinsics
This patch adds the `llvm.intr.lifetime.start` and `llvm.intr.lifetime.end`
intrinsics which are used to indicate to LLVM the lifetimes of allocated
memory.
These ops have the requirement that the first argument (the size) be an
"immediate argument". I added an OpTrait to check this, but it is
possible that an approach like GEPArg would work too.
Reviewed By: rriddle, dcaballe
Differential Revision: https://reviews.llvm.org/D133867
Stanley Winata [Wed, 14 Sep 2022 20:08:21 +0000 (13:08 -0700)]
[mlir][linalg] fix switch case for conv-vec to have brackets
The Windows build requires brackets on switch cases that initialize
variables.
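For reference, the braced form looks like this (the `ConvLayout` enum is made up for the example). The braces give each case its own scope, so a local initialized in one case is not still in scope, uninitialized, at a later label.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the example.
enum class ConvLayout { Nwc, Ncw };

std::string describe(ConvLayout layout) {
  switch (layout) {
  case ConvLayout::Nwc: {
    // Braces scope `name` to this case. Without them, `name` would still be
    // in scope at the next label, and jumping there would skip its
    // initialization, which C++ does not allow.
    std::string name = "nwc";
    return name + " (channels last)";
  }
  case ConvLayout::Ncw: {
    std::string name = "ncw";
    return name + " (channels first)";
  }
  }
  return "unknown"; // not reached for valid enum values
}
```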
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D133889
Akira Hatanaka [Tue, 13 Sep 2022 15:47:43 +0000 (08:47 -0700)]
[compiler-rt][builtins] Enable more warnings in add_security_warnings
Enable -Wsizeof-array-div and -Wsizeof-pointer-div.
Also, replace -Wmemset-transposed-args with -Wsuspicious-memaccess. The
latter automatically enables the former and a few other warnings.
Differential Revision: https://reviews.llvm.org/D133783
John Ericson [Wed, 14 Sep 2022 04:12:04 +0000 (00:12 -0400)]
[CMake] Avoid `LLVM_BINARY_DIR` when other more specific variables are better-suited, part 2
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX}\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
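The same rewrite can be illustrated with `std::regex` in C++. This is not the script used for the commit. ECMAScript regex has no `\>`, so `\b` stands in for the word boundary on the `bin` pattern. That boundary is what keeps a path such as `${LLVM_BINARY_DIR}/binaries` untouched.

```cpp
#include <cassert>
#include <regex>
#include <string>

std::string rewriteCMake(const std::string &text) {
  // ${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX} -> ${LLVM_LIBRARY_DIR}
  static const std::regex libDir(
      R"(\$\{LLVM_BINARY_DIR\}/lib\$\{LLVM_LIBDIR_SUFFIX\})");
  // ${LLVM_BINARY_DIR}/bin, as a whole word -> ${LLVM_TOOLS_BINARY_DIR}
  static const std::regex binDir(R"(\$\{LLVM_BINARY_DIR\}/bin\b)");
  // "$$" in a replacement format emits a literal '$'.
  std::string out = std::regex_replace(text, libDir, "$${LLVM_LIBRARY_DIR}");
  return std::regex_replace(out, binDir, "$${LLVM_TOOLS_BINARY_DIR}");
}
```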
The only manual modifications were reverting changes in
- `runtimes/CMakeLists.txt`
because these were "entry points" where we wanted to tread carefully not to introduce a "loop" which would end with an undefined variable being expanded to nothing.
There are some `${LLVM_BINARY_DIR}/lib` without the `${LLVM_LIBDIR_SUFFIX}`, but these refer to the lib subdirectory of the source (`llvm/lib`). That `lib` is automatically appended to make the local `CMAKE_CURRENT_BINARY_DIR` value by `add_subdirectory`; since the directory name in the source tree is fixed without any suffix, the corresponding `CMAKE_CURRENT_BINARY_DIR` will also be. We therefore do not replace it but leave it as-is.
This picks up where D133828 left off, getting the occurrences with*out* `CMAKE_CFG_INTDIR`. But this is difficult to do correctly and so not done in the (retroactively) previous diff.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D132316
Roland Froese [Wed, 14 Sep 2022 19:35:37 +0000 (15:35 -0400)]
[DAGCombiner] More load-store forwarding for big-endian
Get some load-store forwarding cases for big-endian where a larger store covers
a smaller load, and the offset would be 0 and handled on little-endian but on
big-endian the offset is adjusted to be non-zero. The idea is just to shift the
data to make it look like the offset 0 case.
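The shift can be sketched on plain integers. This models only the arithmetic, not the DAGCombiner code. Consider a `loadBytes`-wide load at byte `offset` inside a `storeBytes`-wide store. On little-endian the stored value is shifted right by `offset` bytes. On big-endian it is shifted by `storeBytes - loadBytes - offset` bytes, which is why offset 0 needs a non-zero shift there.

```cpp
#include <cassert>
#include <cstdint>

// Value observed by the narrower load, given the integer that was stored.
uint64_t forwardedLoadValue(uint64_t stored, unsigned storeBytes,
                            unsigned loadBytes, unsigned offset,
                            bool bigEndian) {
  assert(loadBytes >= 1 && loadBytes + offset <= storeBytes &&
         storeBytes <= 8);
  // Little-endian: byte 0 is least significant. Big-endian: byte 0 is most
  // significant, so the loaded bytes sit higher up in the integer.
  unsigned shiftBytes = bigEndian ? storeBytes - loadBytes - offset : offset;
  uint64_t shifted = stored >> (8 * shiftBytes);
  uint64_t mask =
      loadBytes == 8 ? ~uint64_t{0} : (uint64_t{1} << (8 * loadBytes)) - 1;
  return shifted & mask;
}
```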
Differential Revision: https://reviews.llvm.org/D130115
Fangrui Song [Wed, 14 Sep 2022 19:30:34 +0000 (12:30 -0700)]
[llvm-objdump] Change printSymbolVersionDependency to use ELFFile API
When .gnu.version_r is empty (allowed by readelf but warned by objdump),
llvm-objdump -p may decode the next section as .gnu.version_r and may crash due
to out-of-bounds C string reference. ELFFile<ELFT>::getVersionDependencies
handles 0-entry .gnu.version_r gracefully. Just use it.
Fix https://github.com/llvm/llvm-project/issues/57707
Differential Revision: https://reviews.llvm.org/D133751
Fangrui Song [Wed, 14 Sep 2022 19:27:30 +0000 (12:27 -0700)]
[llvm-objdump][test] Add verneed-invalid.test
Nico Weber [Thu, 1 Sep 2022 14:01:54 +0000 (10:01 -0400)]
lld: Include name of output file in "failed to write output" diag
Differential Revision: https://reviews.llvm.org/D133110
Michael Buch [Wed, 14 Sep 2022 16:31:25 +0000 (12:31 -0400)]
[lldb][tests] Move C++ gmodules tests into new gmodules/ subdirectory
This is in preparation for adding more gmodules
tests.
Differential Revision: https://reviews.llvm.org/D133876
Tue Ly [Wed, 14 Sep 2022 15:37:29 +0000 (11:37 -0400)]
[libc][math] Improve exp2f performance.
Reduce the number of subintervals that need lookup table and optimize
the evaluation steps.
Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where
`-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then
approximated by a degree 6 polynomial.
Experiment with Sollya showed that by using a degree 6 polynomial, we
can approximate `2^lo` for a bigger range with reasonable errors:
```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```
So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
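Below is a compact C++ sketch of the reduction described above. It is not the LLVM libc implementation and works under stated assumptions:
* The `2^mid` table is filled with `std::exp2` at startup instead of being stored as bits.
* `2^lo` uses a degree-5 Taylor polynomial rather than the Sollya minimax coefficients.
* NaN, Inf and overflow handling are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// 2^(i/16) for i = 0..15 (computed here; an optimized version stores bits).
static const std::array<double, 16> kExp2Mid = [] {
  std::array<double, 16> t{};
  for (int i = 0; i < 16; ++i)
    t[i] = std::exp2(i / 16.0);
  return t;
}();

// x = hi + i/16 + lo with |lo| <= 1/32. Assumes -150 <= x <= 128.
float exp2fSketch(float xf) {
  assert(xf >= -150.0f && xf <= 128.0f);
  double x = xf;
  int k = static_cast<int>(std::nearbyint(x * 16.0)); // k = round(16x)
  int hi = k >> 4; // arithmetic shift: floor(k / 16)
  int i = k & 15;  // k mod 16, so k == 16 * hi + i
  double lo = x - k / 16.0;

  // 2^lo = e^(lo * ln 2), Horner form of 1 + t + t^2/2 + ... + t^5/120.
  double t = lo * 0.6931471805599453;
  double p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6 +
                                              t * (1.0 / 24 + t / 120))));

  // Multiply 2^mid by 2^hi by adding hi to its exponent field (bits 52..62).
  uint64_t bits;
  std::memcpy(&bits, &kExp2Mid[i], sizeof bits);
  bits += static_cast<uint64_t>(static_cast<int64_t>(hi)) << 52;
  double scaled;
  std::memcpy(&scaled, &bits, sizeof scaled);
  return static_cast<float>(scaled * p);
}
```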
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 9.534
System LIBC reciprocal throughput : 6.229
BEFORE:
LIBC reciprocal throughput : 21.405
LIBC reciprocal throughput : 15.241 (with `-msse4.2` flag)
LIBC reciprocal throughput : 11.111 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 18.617
LIBC reciprocal throughput : 12.852 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.253 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.869
System LIBC latency : 30.580
BEFORE
LIBC latency : 64.888
LIBC latency : 61.027 (with `-msse4.2` flag)
LIBC latency : 48.778 (with `-mfma` flag)
AFTER
LIBC latency : 48.803
LIBC latency : 45.047 (with `-msse4.2` flag)
LIBC latency : 37.487 (with `-mfma` flag)
```
Reviewed By: sivachandra, orex
Differential Revision: https://reviews.llvm.org/D133870
Fangrui Song [Wed, 14 Sep 2022 18:24:00 +0000 (11:24 -0700)]
[CMake] Enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on *BSD
Similar to D107799 but for *BSD (DragonFlyBSD, FreeBSD, NetBSD, OpenBSD, etc).
This Linux default has been in main and release/15.x for a while.
`CMAKE_SYSTEM_PROCESSOR MATCHES "^arm"` is excluded for now.
Link: https://discourse.llvm.org/t/rfc-time-to-drop-legacy-runtime-paths/64628
Reviewed By: dim
Differential Revision: https://reviews.llvm.org/D110126
Stanley Winata [Wed, 14 Sep 2022 18:07:46 +0000 (11:07 -0700)]
[mlir][linalg] Vectorization for conv_1d_ncw_fcw
Most computer vision torch models use nchw/ncw convolution. In a previous patch we added a decomposition of conv2dNchw into conv1dNcw. To enhance performance on torch models, we add this vectorization pattern for conv1dNcw, which consequently also improves the performance of conv2dNchw.
On IREE + Intel Xeon 8360 + Resnet50, we were able to get a ~7x speedup (~880ms to 126ms).
Reviewed By: nicolasvasilache, hanchung
Differential Revision: https://reviews.llvm.org/D133675
Piotr Sobczak [Wed, 14 Sep 2022 11:19:16 +0000 (13:19 +0200)]
[AMDGPU] Check for num elts in SelectVOP3PMods
The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.
Differential Revision: https://reviews.llvm.org/D133852
Julius [Wed, 14 Sep 2022 17:34:16 +0000 (19:34 +0200)]
[Clang]: Diagnose deprecated copy operations also in MSVC compatibility mode
When running in MSVC compatibility mode, previously no deprecated copy
operation warnings (enabled by -Wdeprecated-copy) were raised. This
restriction was already in place when the deprecated copy warning was
first introduced.
This patch removes said restriction so that deprecated copy warnings, if
enabled, are also raised in MSVC compatibility mode. The reasoning here
being that these warnings are still useful when running in MSVC
compatibility mode and also have to be semi-explicitly enabled in the
first place (using -Wdeprecated-copy, -Wdeprecated or -Wextra).
Differential Revision: https://reviews.llvm.org/D133354
Florian Hahn [Wed, 14 Sep 2022 17:43:53 +0000 (18:43 +0100)]
[ConstraintElimination] Track if variables are positive in constraint.
Keep track of whether variables are known to be positive during constraint
decomposition, aggregate the information when building the constraint
object and encode the extra information as constraints to be used during
reasoning.
Thomas Raoux [Wed, 14 Sep 2022 02:10:38 +0000 (02:10 +0000)]
[mlir][vector] Clean up and generalize lowering of warp_execute to scf
Simplify the lowering of warp_execute_on_lane0 of scf.if by making the
logic more generic. Also remove the assumption that the innermost
dimension is the distributed dimension.
Differential Revision: https://reviews.llvm.org/D133826
Matt Arsenault [Wed, 14 Sep 2022 13:10:25 +0000 (09:10 -0400)]
llvm-reduce: Do not insert replacement IMPLICIT_DEFs for dead defs
Also skip dead defs when looking for a previous vreg with the same
class. This helps avoid some mid-reduction verifier errors when
LiveIntervals computation starts introducing dead flags everywhere.
Matt Arsenault [Wed, 14 Sep 2022 12:42:10 +0000 (08:42 -0400)]
llvm-reduce: Restrict test to only test relevant reductions
Avoids breaking this test in a future change.
Joseph Huber [Wed, 31 Aug 2022 20:55:14 +0000 (15:55 -0500)]
[Libomptarget] Change device free routines to accept the allocation kind
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.
The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`; these do not take a kind as `omp_alloc` does. The
standard dictates the following:
"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."
This suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.
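The sketch below is a hypothetical C++ model of the new shape of the interface. `AllocKind`, `DeviceAllocator` and the `*Model` helpers are illustrative names, not Libomptarget's. The caller hands the kind back when freeing, and the kind-less `omp_target_*` pair always uses the default device kind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

enum class AllocKind { Device, Host, Shared };

struct DeviceAllocator {
  // Counters stand in for the per-kind release paths of a real plugin.
  int freedDevice = 0, freedHost = 0, freedShared = 0;

  void *allocate(std::size_t size, AllocKind kind) {
    (void)kind; // a real plugin would call a different API per kind
    return std::malloc(size);
  }

  // The kind is passed back by the caller, so no per-pointer lookup table is
  // needed to pick the matching release routine.
  void deallocate(void *ptr, AllocKind kind) {
    switch (kind) {
    case AllocKind::Device: ++freedDevice; break;
    case AllocKind::Host: ++freedHost; break;
    case AllocKind::Shared: ++freedShared; break;
    }
    std::free(ptr);
  }
};

// Kind-less entry points modelled on omp_target_alloc/omp_target_free.
void *targetAllocModel(DeviceAllocator &a, std::size_t size) {
  return a.allocate(size, AllocKind::Device);
}
void targetFreeModel(DeviceAllocator &a, void *ptr) {
  a.deallocate(ptr, AllocKind::Device);
}
```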
Reviewed By: JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D133053
Nico Weber [Wed, 14 Sep 2022 16:43:24 +0000 (12:43 -0400)]
Revert "[clang] fix generation of .debug_aranges with LTO"
This reverts commit 6bf6730ac55e064edf46915ebba02e9c716f48e8.
Breaks tests if LLD isn't being built, see comments on
https://reviews.llvm.org/D133092
revunov.denis@huawei.com [Wed, 14 Sep 2022 16:29:48 +0000 (16:29 +0000)]
[BOLT] Preserve original LSDA type encoding
In non-pie binaries BOLT unconditionally converted type encoding
from indirect to absptr, which broke std exceptions since pointers
to their typeinfo were only assigned at runtime in .data section.
In this patch we preserve original encoding so that indirect
remains indirect and can be resolved at runtime, and absolute remains absolute.
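Here is a small model of why the encoding matters. It assumes the type-table entry has already been decoded to an address. `resolveTypeInfo` and the map-based memory are hypothetical, while the `DW_EH_PE_*` values are the standard ones. With `DW_EH_PE_indirect`, the decoded value points at a slot that is filled in at run time. Rewriting the encoding to `absptr` therefore makes the unwinder treat the slot address itself as the typeinfo.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Standard DWARF EH pointer-encoding values.
constexpr uint8_t DW_EH_PE_absptr = 0x00;
constexpr uint8_t DW_EH_PE_sdata4 = 0x0b;
constexpr uint8_t DW_EH_PE_pcrel = 0x10;
constexpr uint8_t DW_EH_PE_indirect = 0x80;

// Toy memory: address -> pointer value stored there.
using Memory = std::map<uintptr_t, uintptr_t>;

// With DW_EH_PE_indirect the decoded value is the address of a slot that
// holds the real typeinfo pointer, so one more load is required.
uintptr_t resolveTypeInfo(uintptr_t decoded, uint8_t encoding,
                          const Memory &mem) {
  if (encoding & DW_EH_PE_indirect)
    return mem.at(decoded);
  return decoded;
}
```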
Reviewed By: rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D132484
Ashay Rane [Tue, 13 Sep 2022 13:20:26 +0000 (08:20 -0500)]
[clang] fix linker executable path in test
A previous patch (https://reviews.llvm.org/D132810) introduced a test
that fails on systems where the linker executable (`ld`) has a `.exe`
extension. This patch updates the regex in the test so that lit can
look for both `ld` as well as `ld.exe`.
Reviewed By: stella.stamenova
Differential Revision: https://reviews.llvm.org/D133773
Stella Stamenova [Wed, 14 Sep 2022 16:30:49 +0000 (09:30 -0700)]
Revert "[lldb][DWARF5] Enable macro evaluation"
This reverts commit a0fb69d17b4d7501a85554010727837340e7b52f.
This broke the windows lldb bot: https://lab.llvm.org/buildbot/#/builders/83/builds/23666
Nico Weber [Wed, 14 Sep 2022 16:17:41 +0000 (12:17 -0400)]
Revert "[test][clang] run test for lld emitting dwarf-aranages only if lld is presented"
This reverts commit 44075cc34a9b373714b594964001ce283598eac1.
Broke check-clang, see comments on https://reviews.llvm.org/D133841
Eman Copty [Mon, 12 Sep 2022 23:54:16 +0000 (23:54 +0000)]
[mlir] Add accessor methods for I[2|4|16] types to Builder.
Adds the accessor methods for I[2|4|16] types to the Builder.
Differential Revision: https://reviews.llvm.org/D133793
Peiming Liu [Mon, 12 Sep 2022 23:57:53 +0000 (23:57 +0000)]
[mlir][sparse] Make sparse compiler more admissible.
Previously, the iteration graph was computed without priority. This patch adds a heuristic when computing the iteration graph: the topological sort starts with Reduction iterators, which makes them (likely) appear as late in the sorted array as possible.
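One way to realize this heuristic is with a DFS-based topological sort that emits reverse post-order. The `IterationGraph` class below is hypothetical and is not the sparse compiler's code. Visiting reduction iterators first makes them finish first, which places them late in the final order when no edge forces otherwise.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Edge u -> v means loop u must enclose loop v. Assumed to be acyclic.
class IterationGraph {
public:
  explicit IterationGraph(unsigned numLoops) : succs(numLoops) {}
  void addEdge(unsigned from, unsigned to) { succs[from].push_back(to); }

  // Outermost-first order via reverse post-order DFS, trying DFS roots
  // reduction-first so that reduction loops tend to end up innermost.
  std::vector<unsigned> sort(const std::vector<bool> &isReduction) const {
    std::vector<bool> visited(succs.size(), false);
    std::vector<unsigned> postOrder;
    for (bool wantReduction : {true, false})
      for (unsigned v = 0; v < succs.size(); ++v)
        if (isReduction[v] == wantReduction && !visited[v])
          visit(v, visited, postOrder);
    return {postOrder.rbegin(), postOrder.rend()};
  }

private:
  void visit(unsigned v, std::vector<bool> &visited,
             std::vector<unsigned> &postOrder) const {
    visited[v] = true;
    for (unsigned s : succs[v])
      if (!visited[s])
        visit(s, visited, postOrder);
    postOrder.push_back(v);
  }

  std::vector<std::vector<unsigned>> succs;
};
```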
The current sparse compiler also failed to compile the newly added case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D133738
Nicolas Vasilache [Wed, 14 Sep 2022 15:51:30 +0000 (08:51 -0700)]
Revert "[mlir][scf][Transform] Refactor transform.fuse_into_containing_op so it is iterative and supports output fusion."
This reverts commit 54a5f606281d05203dca1d81d135e691b10bc513, which is a WIP that was pushed by mistake.
Nicolas Vasilache [Tue, 13 Sep 2022 06:01:25 +0000 (23:01 -0700)]
[mlir][scf][Transform] Refactor transform.fuse_into_containing_op so it is iterative and supports output fusion.
This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on
producers one use at a time.
Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we
tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand.
If one cannot find such a unique destination operand the transform fails.
Nicolas Vasilache [Wed, 14 Sep 2022 12:26:01 +0000 (05:26 -0700)]
[mlir][Linalg] Add return type filter to the transform dialect
This allows matching ops by additionally providing an idiomatic spec for a unique return type.
Differential Revision: https://reviews.llvm.org/D133862
Alexey Bataev [Wed, 14 Sep 2022 15:33:27 +0000 (08:33 -0700)]
[SLP][NFC] Extract getLastInstructionInBundle function for better dependence checking, NFC.
Part of D110978
Jeff Niu [Tue, 13 Sep 2022 18:52:29 +0000 (11:52 -0700)]
[MLIR][math] Use approximate matches for folded ops
LibM implementations differ, so the folders can have different results
on different platforms. For instance, the `cos` folder was failing on M1
mac. I chose to match the constant floats to 2(.5) significant digits.
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D133797
Groverkss [Wed, 14 Sep 2022 15:19:47 +0000 (16:19 +0100)]
[MLIR][Presburger] Add hermite normal form computation to Matrix
This patch adds hermite normal form computation to Matrix. Part of this algorithm
lived in LinearTransform, being used for computing column echelon form. This
patch moves the implementation to Matrix::hermiteNormalForm and generalises it
to compute the hermite normal form.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D133510
Mats Petersson [Wed, 14 Sep 2022 14:10:48 +0000 (15:10 +0100)]
[flang][driver]Fix broken PowerPC tests
Tests don't work on PPC since the `return` instruction isn't called `ret` there (apparently).
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D133859
Zain Jaffal [Wed, 14 Sep 2022 15:29:39 +0000 (16:29 +0100)]
[InstCombine] Optimize multiplication where both operands are negated
Handle the case where both operands are negated in matrix multiplication
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133695
Haojian Wu [Wed, 14 Sep 2022 15:19:17 +0000 (17:19 +0200)]
Remove some unused static functions in CGOpenMPRuntimeGPU.cpp, NFC
David Spickett [Tue, 13 Sep 2022 10:33:28 +0000 (10:33 +0000)]
[LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used
SLH will fall back to a different technique if X16 is being used,
so there is no need to warn for inline asm use. Only prevent other codegen
from using it.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D133766
Joseph Huber [Wed, 14 Sep 2022 15:14:17 +0000 (10:14 -0500)]
[OpenMP] Remove unused function after removing simplified interface
Summary:
A previous patch removed the user of this function but did not remove
the function causing unused function warnings. Remove it.
John Ericson [Wed, 14 Sep 2022 03:28:33 +0000 (23:28 -0400)]
[CMake] Avoid `LLVM_BINARY_DIR` when other more specific variables are better-suited, part 1
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/lib(${LLVM_LIBDIR_SUFFIX})?\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
The only manual modifications were reverting changes in
- `compiler-rt/cmake/Modules/CompilerRTUtils.cmake`
because these were "entry points" where we wanted to tread carefully not to introduce a "loop" which would end with an undefined variable being expanded to nothing.
There are many more occurrences without `CMAKE_CFG_INTDIR`, but those are left for D132316 as they have proved somewhat tricky to fix.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D133828
Arjun P [Wed, 14 Sep 2022 14:05:54 +0000 (15:05 +0100)]
[MLIR][Presburger] use arbitrary-precision arithmetic with MPInt instead of int64_t
Only the main Presburger library under the Presburger directory has been switched to use arbitrary precision. Users have been changed to just cast returned values back to int64_t or to use newly added convenience functions that perform the same cast internally.
The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc, before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision.
Affine/canonicalize.mlir and SCF/canonicalize.mlir show no noticeable difference, staying at 2.02s and about 2.35s respectively.
Also, for Affine and SCF tests as a whole (no copy-pasting), the runtime remains about 0.09s on average before and after.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D129510
Balazs Benics [Wed, 14 Sep 2022 14:45:44 +0000 (16:45 +0200)]
[analyzer] Initialize ShouldEmitErrorsOnInvalidConfigValue analyzer option
Downstream users who don't use the clang cc1 frontend for
command-line argument parsing won't benefit from the default
initialization of the AnalyzerOptions entries provided by the marshalling.
More about this later.
Those analyzer option fields, as they are bitfields, cannot be default
initialized at the declaration (prior to C++20); hence they are initialized
in the constructor.
The only problem is that `ShouldEmitErrorsOnInvalidConfigValue` was
forgotten.
In this patch I'm proposing to initialize that field with the rest.
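A reduced illustration of the pattern follows. The struct and the `SomeOtherOption` field are hypothetical, and `false` as the initial value is an assumption.

```cpp
#include <cassert>

struct AnalyzerOptionsModel {
  // Before C++20 these cannot be written as `unsigned X : 1 = 0;`.
  unsigned SomeOtherOption : 1;
  unsigned ShouldEmitErrorsOnInvalidConfigValue : 1;

  // Every bit-field therefore has to be listed here; one that is left out
  // holds an indeterminate value after construction.
  AnalyzerOptionsModel()
      : SomeOtherOption(false), ShouldEmitErrorsOnInvalidConfigValue(false) {}
};
```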
Note that this value is read by
`CheckerRegistry.cpp:insertAndValidate()`.
The analyzer options are initialized by the marshalling at
`CompilerInvocation.cpp:GenerateAnalyzerArgs()` by the expansion of the
`ANALYZER_OPTION_WITH_MARSHALLING` xmacro to the appropriate default
value regardless of the constructor initializer list, which I'm touching.
Since this only affects users using CSA as a library, I believe we cannot
test this without serious effort.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D133851
Joseph Huber [Mon, 12 Sep 2022 21:21:33 +0000 (16:21 -0500)]
[OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU
Previously, we linked in the ROCm device libraries which provide math
and other utility functions late. This is not strictly correct as this
library contains several flags that are only set per-TU, such as fast
math or denormalization. This patch changes this to pass the bitcode
libraries per-TU using the same method we use for the CUDA libraries.
This has the advantage that we correctly propagate attributes making
this implementation more correct. Additionally, many annoying unused
functions were not being fully removed during LTO. This lead to
erroneous warning messages and remarks on unused functions.
I am not sure if not finding these libraries should be a hard error. Let
me know if it should be demoted to a warning saying that some device
utilities will not work without them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133726
Joseph Huber [Tue, 13 Sep 2022 19:38:00 +0000 (14:38 -0500)]
[OpenMP] Remove simplified device runtime handling
The old device runtime had a "simplified" version that prevented many of
the runtime features from being initialized. The old device runtime was
deleted in LLVM 14 and is no longer in use. Selectively deactivating
features is now done using specific flags rather than the old technique.
This patch simply removes the extra logic required for handling the old
simple runtime scheme.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133802
Nikita Popov [Thu, 28 Jul 2022 13:50:39 +0000 (15:50 +0200)]
[AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI)
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.
To give two examples of why this is useful:
* D117095 highlights a weakness of ModRef modelling in the presence
of operand bundles. For a memcpy call with deopt operand bundle,
we want to say that it can read any memory, but only write argument
memory. This would allow such calls to be treated like any other call.
However, we currently can't express this and have to say that it
can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
where threadid Refs outside pre-split coroutines can be ignored
(like other accesses to constant memory). The current representation
does not allow modelling this precisely.
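To make the difference concrete, here is a hypothetical C++ sketch. The names `ModRefInfo`, `Loc`, and `PerLocModRef` are illustrative, and packing 2 bits per location is an assumption, not necessarily the layout the patch uses. Storing a ModRef value per location lets the memcpy-with-deopt case say "read anything, write only argument memory", which collapses to plain ModRef once all locations are unioned into a single answer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model; not LLVM's FunctionModRefBehavior API.
enum class ModRefInfo : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };
enum class Loc : unsigned { ArgMem = 0, InaccessibleMem = 1, Other = 2 };
constexpr unsigned NumLocs = 3;

class PerLocModRef {
  uint32_t Bits = 0; // 2 bits of ModRefInfo per location

  static unsigned shift(Loc L) { return 2 * static_cast<unsigned>(L); }

public:
  // Same ModRefInfo for every location.
  static PerLocModRef all(ModRefInfo MR) {
    PerLocModRef R;
    for (unsigned I = 0; I < NumLocs; ++I)
      R.set(static_cast<Loc>(I), MR);
    return R;
  }

  void set(Loc L, ModRefInfo MR) {
    Bits &= ~(3u << shift(L));
    Bits |= static_cast<uint32_t>(MR) << shift(L);
  }

  ModRefInfo get(Loc L) const {
    return static_cast<ModRefInfo>((Bits >> shift(L)) & 3u);
  }

  // Union over all locations: the single coarse answer that one
  // per-function ModRefInfo would give.
  ModRefInfo any() const {
    uint32_t U = 0;
    for (unsigned I = 0; I < NumLocs; ++I)
      U |= static_cast<uint32_t>(get(static_cast<Loc>(I)));
    return static_cast<ModRefInfo>(U);
  }
};

// memcpy with a deopt bundle: may read any memory, but only writes argmem.
PerLocModRef memcpyWithDeopt() {
  PerLocModRef R = PerLocModRef::all(ModRefInfo::Ref);
  R.set(Loc::ArgMem, ModRefInfo::ModRef);
  return R;
}
```

With the coarse `any()` view, the same call reports ModRef everywhere, which is exactly the loss of precision described above.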
The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.
Differential Revision: https://reviews.llvm.org/D130896
Florian Hahn [Wed, 14 Sep 2022 14:31:25 +0000 (15:31 +0100)]
[ConstraintElimination] Clear new indices directly in getConstraint(NFC)
Instead of checking if any of the new indices has a non-zero coefficient
before using the constraint, do this directly when constructing the
constraint.
Christian Sigg [Wed, 14 Sep 2022 10:58:08 +0000 (12:58 +0200)]
[MLIR] Fix toy lit substitutions
The tools are called e.g. `toyc-ch1`, not `toy-ch1`.
Add missing toyc-ch6/7.
It turns out that the other substitutions are unnecessary only by circumstance rather than by design:
The lit test exec root is set to build/mlir/test, which is where all the test tools are placed by CMake and we wouldn't need to substitute them at all.
We shouldn't rely on this assumption though, because it will make things harder for standalone tests and other build systems.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D133842
Jordan Rupprecht [Wed, 14 Sep 2022 13:52:47 +0000 (06:52 -0700)]
Fix heap-use-after-free when clearing DIEs in fission compile units.
D131437 caused heap-use-after-free failures when testing TestCreateAfterAttach.py in asan mode, and "regular" crashes outside of asan.
This appears to be due to a mismatch in a couple places where we choose to clear the DIEs. When we clear the DIE of a skeleton unit, we unconditionally clear the DIE of the DWO unit if it exists. However, `~ScopedExtractDIEs()` only looks at the skeleton unit when deciding to clear. If we decide to clear the skeleton unit because it is now unused, we end up clearing the DWO unit that _is_ used. This change adds a guard by checking `m_cancel_scopes` to prevent clearing the DWO unit.
This is 100% reproducible by running TestCreateAfterAttach.py in asan mode, although it only seems to reproduce in our internal build, so no test case is added here. If someone has suggestions on how to write one, I can add it.
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D133790
Zain Jaffal [Wed, 14 Sep 2022 13:49:55 +0000 (14:49 +0100)]
[AArch64] Disable nontemporal load for Big Endian
The current code for generating nontemporal loads outputs the wrong assembly for big-endian targets.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133789
Sanjay Patel [Wed, 14 Sep 2022 13:21:21 +0000 (09:21 -0400)]
[InstCombine] try multi-use demanded bits folds for 'add'
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh
This mimics transforms that we already do on the single-use path.
Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.
This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright
Differential Revision: https://reviews.llvm.org/D133788
Alexey Bataev [Wed, 14 Sep 2022 13:08:01 +0000 (06:08 -0700)]
[SLP] Move getInsertIndex function, NFC.
Part of D110978.
Mats Petersson [Tue, 13 Sep 2022 17:04:01 +0000 (18:04 +0100)]
[flang][driver] Fix broken flang-new mlir test
The test was added as a .mlir file, but this extension is not
listed in lit.cfg.py, so the test was never run. When run, the
file produced an error, as a semicolon does not start a comment in MLIR.
This adds the extension and fixes the comments by using C++-style
comments.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D133792
Zain Jaffal [Wed, 14 Sep 2022 12:51:26 +0000 (13:51 +0100)]
[AArch64] Add nontemporal load tests for big endian.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133765
Nikita Popov [Wed, 14 Sep 2022 11:17:28 +0000 (13:17 +0200)]
[AA] Remove unnecessary intersections from getModRefBehavior() (NFC)
Intersection with other providers is performed by AAResults. Doing
this here is both pointless and confusing.
Florian Hahn [Wed, 14 Sep 2022 11:00:31 +0000 (12:00 +0100)]
[ConstraintElimination] Further de-compose operands of add operations.
This simply extends the existing logic to look through adds and combine
the components as done in other places already.
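As an illustration of looking through adds, here is a hypothetical C++ sketch over a toy expression tree. `Expr`, `decompose`, and `Decomposition` are made up for this example and are not the pass's data structures. Nested additions are flattened into a constant offset plus per-variable coefficients. The sketch ignores overflow, which a real implementation has to account for before treating the sum as a linear constraint.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (hypothetical, not LLVM IR).
struct Expr {
  enum Kind { Const, Var, Add } K = Const;
  int64_t Value = 0;         // used when K == Const
  std::string Name;          // used when K == Var
  std::shared_ptr<Expr> LHS; // used when K == Add
  std::shared_ptr<Expr> RHS; // used when K == Add
};
using ExprP = std::shared_ptr<Expr>;

ExprP cst(int64_t V) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Const;
  E->Value = V;
  return E;
}
ExprP var(const std::string &N) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Var;
  E->Name = N;
  return E;
}
ExprP add(ExprP A, ExprP B) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Add;
  E->LHS = std::move(A);
  E->RHS = std::move(B);
  return E;
}

// Represents Offset + sum(Coeffs[v] * v). Overflow is deliberately ignored.
struct Decomposition {
  int64_t Offset = 0;
  std::map<std::string, int64_t> Coeffs;
};

// Recursively look through Add nodes and merge the components of both operands.
void decompose(const ExprP &E, Decomposition &D) {
  switch (E->K) {
  case Expr::Const:
    D.Offset += E->Value;
    break;
  case Expr::Var:
    D.Coeffs[E->Name] += 1;
    break;
  case Expr::Add:
    decompose(E->LHS, D);
    decompose(E->RHS, D);
    break;
  }
}
```

For example, `(x + 3) + (y + (x + 4))` decomposes into offset 7 with coefficients x: 2 and y: 1.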
Simon Pilgrim [Wed, 14 Sep 2022 10:46:26 +0000 (11:46 +0100)]
[CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization
Corrects the shift-by-constant costs to better account for them being converted to multiplies for lowering, which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).
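For background, a small C++ sketch of the identity behind that canonicalization; the function names are hypothetical. For unsigned 32-bit lanes, shifting left by a constant amount below 32 gives the same result as multiplying by the corresponding power of two, modulo 2^32. SSE4.1 offers a v4i32 multiply (PMULLD) but no per-lane variable shift (AVX2 adds VPSLLVD), which is one reason a non-uniform constant shift may end up lowered as a multiply.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;

// Per-lane shift left by constant amounts (each amount must be < 32).
V4i32 shlByConst(const V4i32 &X, const V4i32 &Amt) {
  V4i32 R{};
  for (std::size_t I = 0; I < R.size(); ++I) {
    assert(Amt[I] < 32 && "shift amount out of range");
    R[I] = X[I] << Amt[I];
  }
  return R;
}

// The same operation as a per-lane multiply by 2^Amt; unsigned arithmetic
// wraps modulo 2^32, so the results match bit for bit.
V4i32 shlAsMul(const V4i32 &X, const V4i32 &Amt) {
  V4i32 R{};
  for (std::size_t I = 0; I < R.size(); ++I) {
    assert(Amt[I] < 32 && "shift amount out of range");
    R[I] = X[I] * (uint32_t{1} << Amt[I]);
  }
  return R;
}
```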
Simon Pilgrim [Wed, 14 Sep 2022 10:18:23 +0000 (11:18 +0100)]
[CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts
Fixes regression from
a931dbfbd30754cf39897037a223eee60ae9e855
Pavel Labath [Wed, 14 Sep 2022 09:35:16 +0000 (11:35 +0200)]
[lldb] Enable (un-xfail) some dwarf tests for arm
These are passing now that the relocation assertion has been removed in
D132954.
Relocations still remain unimplemented though, so it's possible this may
start to fail due to unrelated changes. If that happens very often, we
may just need to disable (skip) the test instead.
Florian Hahn [Wed, 14 Sep 2022 09:04:07 +0000 (10:04 +0100)]
[ConstraintElimination] Add tests where info from zext can be used.
Pavel Kosov [Wed, 14 Sep 2022 08:30:27 +0000 (11:30 +0300)]
[lldb][DWARF5] Enable macro evaluation
Patch enables handling of DWARFv5 DW_MACRO_define_strx and DW_MACRO_undef_strx
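A rough C++ sketch of decoding these two entries, assuming the encoding in the DWARF 5 specification: DW_MACRO_define_strx (0x0b) and DW_MACRO_undef_strx (0x0c) are each followed by a ULEB128 line number and a ULEB128 index into the unit's string offsets table (.debug_str_offsets). The helper and struct names are hypothetical, and resolving the index to the actual macro string is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcode values from the DWARF 5 specification.
constexpr uint8_t DW_MACRO_define_strx = 0x0b;
constexpr uint8_t DW_MACRO_undef_strx = 0x0c;

// Decode one unsigned LEB128 value at Pos, advancing Pos.
// Returns false if the input ends before the value is complete.
bool readULEB128(const std::vector<uint8_t> &Buf, std::size_t &Pos,
                 uint64_t &Out) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  while (Pos < Buf.size()) {
    uint8_t Byte = Buf[Pos++];
    if (Shift < 64)
      Result |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0) {
      Out = Result;
      return true;
    }
    Shift += 7;
  }
  return false;
}

// Hypothetical decoded form of one *_strx entry.
struct MacroStrxEntry {
  bool IsDefine;            // define_strx vs. undef_strx
  uint64_t Line;            // source line of the directive
  uint64_t StrOffsetsIndex; // index into .debug_str_offsets (not resolved here)
};

// Parse a DW_MACRO_define_strx / DW_MACRO_undef_strx entry at Pos.
// Pos is only advanced if the whole entry was decoded.
bool parseStrxEntry(const std::vector<uint8_t> &Buf, std::size_t &Pos,
                    MacroStrxEntry &Out) {
  if (Pos >= Buf.size())
    return false;
  uint8_t Op = Buf[Pos];
  if (Op != DW_MACRO_define_strx && Op != DW_MACRO_undef_strx)
    return false;
  std::size_t P = Pos + 1;
  uint64_t Line = 0, Index = 0;
  if (!readULEB128(Buf, P, Line) || !readULEB128(Buf, P, Index))
    return false;
  Out = MacroStrxEntry{Op == DW_MACRO_define_strx, Line, Index};
  Pos = P;
  return true;
}
```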
~~~
OS Laboratory. Huawei RRI. Saint-Petersburg
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D130062
Marco Elver [Wed, 14 Sep 2022 08:30:25 +0000 (10:30 +0200)]
[MIR] Support printing and parsing pcsections
Adds support for printing and parsing PC sections metadata in MIR.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D133785
Florian Hahn [Wed, 14 Sep 2022 08:27:17 +0000 (09:27 +0100)]
[ConstraintElimination] Add tests for chained adds.
Add test coverage for reasoning about chains of adds.
Azat Khuzhin [Wed, 14 Sep 2022 08:09:01 +0000 (10:09 +0200)]
[test][clang] run test for lld emitting dwarf-aranges only if lld is present
Fixes: https://reviews.llvm.org/D133092
CI: https://lab.llvm.org/buildbot/#/builders/109/builds/46592
Reviewed By: hokein
Differential Revision: https://reviews.llvm.org/D133841
Siva Chandra [Wed, 14 Sep 2022 08:03:07 +0000 (01:03 -0700)]
[libc][Obvious] Fix typo in the alternate path of the POSIX "access" function.
Siva Chandra Reddy [Tue, 13 Sep 2022 20:09:20 +0000 (20:09 +0000)]
[libc] Add implementation of POSIX function "access".
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D133814
Timm Bäder [Wed, 14 Sep 2022 04:21:38 +0000 (06:21 +0200)]
[clang][Interp] Remove struct from a testcase
This should fix the leak sanitizer breakage introduced by
https://reviews.llvm.org/D132997, e.g.
https://lab.llvm.org/buildbot/#/builders/5/builds/27410
Chuanqi Xu [Wed, 14 Sep 2022 06:55:13 +0000 (14:55 +0800)]
[C++20] [Coroutines] Prefer sized deallocation in promise_type
Currently, the compiler fails to select the sized deallocation function
in promise_type when multiple deallocation function overloads are
declared there.
According to [dcl.fct.def.coroutine]p12:
> If both a usual deallocation function with only a pointer parameter
> and a usual deallocation function with both a pointer parameter and a
> size parameter are found, then the selected deallocation function
> shall be the one with two parameters.
So when there are multiple deallocation functions, the compiler should
choose the sized one instead of the unsized one. The patch fixes this.
Jean Perier [Wed, 14 Sep 2022 06:54:00 +0000 (08:54 +0200)]
[flang] Make a descriptor copy for fir.load fir.ref<fir.box>
`fir.box` and `fir.ref<fir.box>` are both lowered to LLVM as a
descriptor in memory. This is because fir.box of polymorphic and assumed-rank
entities cannot be known at compile time, so fir.box cannot be
lowered to a struct value.
fir.load of fir.ref<fir.box> was previously lowered to a no-op,
propagating the operand descriptor storage as a result.
This is wrong because the operand descriptor storage may later be
modified, and these changes should not be visible in the loaded fir.box
that is an immutable SSA value.
Modify fir.load codegen for fir.box to make a copy into a new storage to
ensure the fir.box is immutable.
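A C++ analogy of the aliasing problem; the `Descriptor` struct and both load helpers are hypothetical and far simpler than a real Fortran descriptor. Handing back the original storage lets a later store leak into the "loaded" value, whereas copying into new storage, as the fixed codegen does, keeps the loaded value unaffected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, drastically simplified descriptor (base address + one extent).
struct Descriptor {
  void *BaseAddr;
  int64_t Extent;
};

// Old behaviour, by analogy: the "loaded" value is just the original storage.
const Descriptor *loadWithoutCopy(const Descriptor *Storage) { return Storage; }

// New behaviour, by analogy: copy the descriptor into new storage.
Descriptor loadWithCopy(const Descriptor *Storage) { return *Storage; }
```

If the original storage is later modified, the aliased result observes the change while the copied one does not.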
Differential Revision: https://reviews.llvm.org/D133779
Jon Chesterfield [Wed, 14 Sep 2022 06:55:44 +0000 (07:55 +0100)]
[amdgpu] Expand all ConstantExpr users of LDS variables in instructions
The bug noted in D112717 can be sidestepped with this change.
Expanding all ConstantExprs involving LDS up front makes the variable specialisation simpler. ConstantExprs that don't access LDS are excluded to avoid disturbing codegen elsewhere.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D133422
Martin Storsjö [Tue, 13 Sep 2022 08:17:58 +0000 (11:17 +0300)]
[Support] Access threadIndex via a wrapper function
On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread local variable. On
Windows, it's a non-inline function within Parallel.cpp, which allows
the thread_local variable to be made static.
Windows native TLS doesn't support direct access to thread local
variables in a different DLL, and GCC/binutils on Windows occasionally
have problems with non-static thread local variables too.
This fixes mingw dylib builds with native TLS after
e6aebff67426fa0f9779a0c19d6188a043bf15e7.
At the same time, move the whole thread local variable within
#if LLVM_ENABLE_THREADS
to fix builds without threading support.
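A minimal C++ sketch of the Windows-style wrapper pattern. The names `sketch::getThreadIndex`, `sketch::setThreadIndex`, and the macro `SKETCH_ENABLE_THREADS` are hypothetical stand-ins, not LLVM's actual declarations. The thread_local variable has internal linkage and is only reachable through out-of-line accessors, so no other module references it directly, and the whole variable sits behind a threading guard.

```cpp
#include <cassert>

// Hypothetical stand-in for LLVM_ENABLE_THREADS.
#ifndef SKETCH_ENABLE_THREADS
#define SKETCH_ENABLE_THREADS 1
#endif

namespace sketch {
#if SKETCH_ENABLE_THREADS
namespace {
// Internal linkage: code outside this translation unit cannot name the
// variable, so it never needs cross-DLL access to thread local storage.
thread_local unsigned threadIndex = ~0u;
} // namespace

// Out-of-line accessors, as in the Windows case described above.
unsigned getThreadIndex() { return threadIndex; }
void setThreadIndex(unsigned Index) { threadIndex = Index; }
#else
// In this sketch, builds without threading support always report index 0.
unsigned getThreadIndex() { return 0; }
void setThreadIndex(unsigned) {}
#endif
} // namespace sketch
```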
Differential Revision: https://reviews.llvm.org/D133759