Michele Scandale [Fri, 11 Nov 2022 20:06:29 +0000 (12:06 -0800)]
Fix `unsafe-fp-math` attribute emission.
The conditions under which Clang emits the `unsafe-fp-math` function
attribute were modified as part of
`84a9ec2ff1ee97fd7e8ed988f5e7b197aab84a7`.
In the backend code generators, `"unsafe-fp-math"="true"` enables
floating-point contraction for the whole function.
The intent of the change in
`84a9ec2ff1ee97fd7e8ed988f5e7b197aab84a7`
was to prevent backend code generators from performing contractions when
that is not expected.
However, the change is inaccurate and incomplete because it allows
`unsafe-fp-math` to be set even when only in-statement contraction is
allowed.
Consider the following example:
```
float foo(float a, float b, float c) {
float tmp = a * b;
return tmp + c;
}
```
and compile it with the command line
```
clang -fno-math-errno -funsafe-math-optimizations -ffp-contract=on \
-O2 -mavx512f -S -o -
```
The resulting assembly has a `vfmadd213ss` instruction which corresponds
to a fused multiply-add. From the user perspective there shouldn't be
any contraction because the multiplication and the addition are not in
the same statement.
The optimized IR is:
```
define float @test(float noundef %a, float noundef %b, float noundef %c) #0 {
%mul = fmul reassoc nsz arcp afn float %b, %a
%add = fadd reassoc nsz arcp afn float %mul, %c
ret float %add
}
attributes #0 = {
[...]
"no-signed-zeros-fp-math"="true"
"no-trapping-math"="true"
[...]
"unsafe-fp-math"="true"
}
```
The `"unsafe-fp-math"="true"` function attribute allows the backend code
generator to perform `(fadd (fmul a, b), c) -> (fmadd a, b, c)`.
In the current IR representation there is no way to determine the
statement boundaries from the original source code.
Because of this, for in-statement-only contraction the generated IR
doesn't have instructions with the `contract` fast-math flag; instead,
`llvm.fmuladd` is used to represent contraction opportunities
that occur within a single statement.
Therefore `"unsafe-fp-math"="true"` can only be emitted when contraction
across statements is allowed.
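Why an unexpected contraction matters can be illustrated outside the compiler. The following is a minimal sketch, not part of the patch: the helper names are hypothetical, and it assumes a target that evaluates `float` arithmetic in `float` precision. It compares a multiply and add with two roundings against an explicit fused multiply-add with a single rounding.

```cpp
#include <cmath>
#include <limits>

// Multiply, round to float, then add. The volatile store forces the product
// to be rounded before the addition, so the two operations cannot be fused.
float mulThenAdd(float a, float b, float c) {
  volatile float tmp = a * b;
  return tmp + c;
}

// Multiply and add with a single rounding (explicit fused multiply-add).
float fusedMulAdd(float a, float b, float c) {
  return std::fma(a, b, c);
}

struct ContractionDemo {
  float Unfused;
  float Fused;
};

// Inputs chosen so that the low bits of the exact product are lost when the
// product is rounded to float, but survive inside the fused operation.
ContractionDemo contractionDemo() {
  const float eps = std::numeric_limits<float>::epsilon(); // 2^-23
  const float a = 1.0f + eps;            // a * a = 1 + 2^-22 + 2^-46 exactly
  const float c = -(1.0f + 2.0f * eps);  // cancels the rounded product
  return {mulThenAdd(a, a, c), fusedMulAdd(a, a, c)};
}
```

With these inputs the unfused form yields zero while the fused form keeps the residual `2^-46`, which is the kind of difference a user who requested `-ffp-contract=on` does not expect across statement boundaries.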
Moreover, the change in
`84a9ec2ff1ee97fd7e8ed988f5e7b197aab84a7` doesn't
take into account that the floating point math function attributes can
be refined during IR code generation of a function to handle the cases
where the floating point math options are modified within a compound
statement via pragmas (see `CGFPOptionsRAII`).
For consistency, `unsafe-fp-math` needs to be disabled if the contraction
mode for any scope/operation is not `fast`.
Similarly, for consistency, the initialization of `UnsafeFPMath` in
`TargetOptions` for the backend code generation should take the
contraction mode into account as well.
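The rule described above can be sketched as a small predicate. This is an illustrative model only: the enum, struct, and function names are hypothetical and are not Clang's data structures, and the set of four fast-math flags is an assumption taken from the flags visible in the IR example (`reassoc nsz arcp afn`).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the contraction modes (-ffp-contract=off/on/fast).
enum class ContractMode { Off, On, Fast };

// Hypothetical per-scope floating-point options.
struct FPOptions {
  bool AllowReassoc;
  bool NoSignedZeros;
  bool AllowReciprocal;
  bool ApproxFunc;
  ContractMode Contract;
};

// A scope qualifies only if contraction across statements is allowed
// (mode "fast") in addition to the other flags (assumed set, see lead-in).
bool scopeAllowsUnsafeFPMath(const FPOptions &O) {
  return O.AllowReassoc && O.NoSignedZeros && O.AllowReciprocal &&
         O.ApproxFunc && O.Contract == ContractMode::Fast;
}

// The function-level attribute must hold for every scope in the function,
// so a single scope with a non-fast contraction mode disables it.
bool shouldEmitUnsafeFPMath(const std::vector<FPOptions> &Scopes) {
  return std::all_of(Scopes.begin(), Scopes.end(), scopeAllowsUnsafeFPMath);
}
```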
Reviewed By: zahiraam
Differential Revision: https://reviews.llvm.org/D136786
Chuanqi Xu [Tue, 15 Nov 2022 03:50:51 +0000 (11:50 +0800)]
[NFC] [C++20] [Modules] Remove unused Global Module Fragment variables/arguments
Craig Topper [Tue, 15 Nov 2022 03:37:04 +0000 (19:37 -0800)]
[RISCV] Expand i32 abs to negw+max at isel.
This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.
This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
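As a rough scalar model of the negw+max expansion (a sketch with a hypothetical function name, not the isel code), a 32-bit abs can be written as the maximum of the value and its wrapping negation:

```cpp
#include <algorithm>
#include <cstdint>

// Models abs on i32 as max(x, negw(x)).
std::int32_t absViaNegMax(std::int32_t x) {
  // negw: 32-bit two's-complement negation. It is computed in unsigned
  // arithmetic so that negating INT32_MIN wraps instead of overflowing.
  // The conversion back to int32_t is modular on common compilers
  // (and guaranteed to be so since C++20).
  const std::int32_t neg =
      static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(x));
  return std::max(x, neg);
}
```

For INT32_MIN the negation wraps back to INT32_MIN, so the result is INT32_MIN, matching the usual wrapping behavior of an integer abs.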
Jonas Devlieghere [Fri, 11 Nov 2022 19:50:45 +0000 (11:50 -0800)]
[dsymutil] Fix assertion in the Reproducer/FileCollector when TMPDIR is empty
Fix an assertion in dsymutil coming from the Reproducer/FileCollector.
When TMPDIR is empty, the root becomes a relative path, triggering an
assertion when adding a relative path to the VFS mapping. This patch
fixes the issue by resolving the relative path and also moves the
assertion up to make it easier to diagnose these issues in the future.
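The idea of resolving the root before it is used can be sketched with the standard library. This is only an illustration: `resolveRoot` is a hypothetical helper, and the actual patch works with LLVM's own path utilities rather than `std::filesystem`.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: turn a possibly empty or relative root into an
// absolute path before it is added to any mapping that requires one.
fs::path resolveRoot(const fs::path &root) {
  // Some standard library implementations reject an empty path in
  // fs::absolute, so treat it as "the current directory" explicitly.
  if (root.empty())
    return fs::current_path();
  return fs::absolute(root);
}
```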
rdar://102170986
Differential revision: https://reviews.llvm.org/D137959
Chen Zheng [Tue, 8 Nov 2022 06:39:09 +0000 (01:39 -0500)]
[PowerPC] make expensive mflr be away from its user in the function prologue
mflr is fairly expensive on Power versions earlier than 10, so we should
schedule the store of the mflr's def away from the mflr.
In epilogue, the expensive mtlr has no user for its def, so it doesn't
matter that the load and the mtlr are back-to-back.
Reviewed By: RolandF
Differential Revision: https://reviews.llvm.org/D137423
River Riddle [Tue, 15 Nov 2022 00:55:33 +0000 (16:55 -0800)]
[mlir][AttrTypeReplacer] Make attribute dictionary replacement optional
This provides an optimization opportunity for clients that don't want/need
to recurse attribute dictionaries.
Xiaodong Liu [Tue, 15 Nov 2022 01:55:03 +0000 (09:55 +0800)]
[LoongArch] Handle register spill in BranchRelaxation pass
When the target of an unconditional branch is out of range, an indirect
branch is used instead. If no register can be scavenged for the
indirect branch, a register needs to be spilled to the stack.
Reviewed By: SixWeining, wangleiat
Differential Revision: https://reviews.llvm.org/D137821
Jakub Kuderski [Tue, 15 Nov 2022 01:54:14 +0000 (20:54 -0500)]
[mlir][arith][spirv] Clean up arith-to-spirv. NFC.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D137978
Jakub Kuderski [Tue, 15 Nov 2022 01:51:58 +0000 (20:51 -0500)]
[mlir][arith] Add `arith.shrsi` support to WIE
This includes LIT tests over the generated ops and runtime tests.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D137965
TatWai Chong [Tue, 15 Nov 2022 01:29:45 +0000 (17:29 -0800)]
[mlir][tosa] Create a profile validation pass for TOSA dialect
Add a separate validation pass to check whether TOSA operations match
the specification for the given requirements. Perform profile type
checking as the initial feature in the pass.
This is an optional pass that can be enabled via the command line, e.g.
$mlir-opt --tosa-validate="profile=bi" for validating against the
base inference profile.
Description:
TOSA defines a variety of operator behaviors and requirements in the
specification. It would be helpful to have a separate validation pass
that checks TOSA operation inputs against the TOSA specification for
given criteria, and that also reduces the burden of dialect validation
during compilation.
TOSA supports three profiles of which two are for inference purposes.
The main inference profile supports both integer and floating-point
data types, but the base inference profile only supports integers.
In this initial PR, validate the operations against a given TOSA
profile, so that validation fails if a floating-point tensor is
present when the base inference profile is selected. Other checks
will be added to the pass later if needed, e.g. validation of control
flow operators and custom operators.
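The profile check described above can be modeled in a few lines. This is a hedged sketch with hypothetical names; it does not use the MLIR API, and it models only the two inference profiles that the description characterizes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the two inference profiles described above.
enum class Profile { BaseInference, MainInference };

// Hypothetical classification of a tensor's element type.
enum class ElementKind { Integer, FloatingPoint };

// The base inference profile supports only integers; the main inference
// profile supports both integer and floating-point data types.
bool isElementKindAllowed(Profile P, ElementKind K) {
  return P == Profile::MainInference || K == ElementKind::Integer;
}

// Validation fails as soon as one tensor violates the selected profile.
bool validateAgainstProfile(Profile P, const std::vector<ElementKind> &Tensors) {
  return std::all_of(Tensors.begin(), Tensors.end(), [P](ElementKind K) {
    return isElementKindAllowed(P, K);
  });
}
```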
The pass is expected to be able to run at any point of the TOSA dialect
conversion/transformation pipeline, and not to depend on a particular
pass having run before it. It can therefore be used to validate the
initial tosa operations just converted from other dialects, the
intermediate form, or the final tosa operations output.
Change-Id: Ib58349c873c783056e89d2ab3b3312b8d2c61863
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D137279
Roman Lebedev [Tue, 15 Nov 2022 00:29:58 +0000 (03:29 +0300)]
[NFC][Clang] Autogenerate checklines in a test being affected by a patch
Aart Bik [Mon, 14 Nov 2022 21:51:23 +0000 (13:51 -0800)]
[mlir][sparse] avoid nop rewriting on runtime lib path in pipeline
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D137981
Dmitry Vassiliev [Tue, 15 Nov 2022 00:30:00 +0000 (04:30 +0400)]
[NVPTX] Emit pragma nounroll for llvm.loop.unroll.count=1
Emit pragma nounroll for llvm.loop.unroll.count=1 (#pragma unroll 1).
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D137991
Peiming Liu [Tue, 15 Nov 2022 00:02:43 +0000 (00:02 +0000)]
[mlir][sparse] fix memory leak sparse2sparse reshape
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137994
Stella Stamenova [Tue, 15 Nov 2022 00:18:04 +0000 (16:18 -0800)]
Revert "[mlir][sparse] Macros to clean up StridedMemRefType in the SparseTensorRuntime" and "[mlir][sparse] move SparseTensorReader functions into the _mlir_ciface_ section"
This reverts commits 6c22dad and 92bc3fb.
These broke the windows mlir buildbot.
Matt Arsenault [Mon, 14 Nov 2022 20:42:08 +0000 (12:42 -0800)]
GlobalISel: Add debug print for applied rule in generated combiner
Fangrui Song [Mon, 14 Nov 2022 23:51:03 +0000 (15:51 -0800)]
Revert "[opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm"
This reverts commit bf8381a8bce28fc69857645cc7e84a72317e693e.
There is a layering violation: LLVMAnalysis depends on LLVMCore, so
LLVMCore should not include LLVMAnalysis header
llvm/Analysis/ModuleSummaryAnalysis.h
Alexander Shaposhnikov [Mon, 14 Nov 2022 23:15:19 +0000 (23:15 +0000)]
[opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm
Enable using -module-summary with -S
(similarly to what currently can be achieved with opt <input> -o - | llvm-dis).
This is a recommit of ef9e62469.
Test plan: ninja check-all
Differential revision: https://reviews.llvm.org/D137768
Peiming Liu [Mon, 14 Nov 2022 22:28:12 +0000 (22:28 +0000)]
[mlir][sparse] fix memory leak in test cases
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137985
wren romano [Mon, 14 Nov 2022 22:43:03 +0000 (14:43 -0800)]
[mlir][sparse] Fix warning on GCC
Reviewed By: aartbik, Peiming
Differential Revision: https://reviews.llvm.org/D137987
Philip Reames [Mon, 14 Nov 2022 22:21:51 +0000 (14:21 -0800)]
[RISCV] Add codegen coverage for select idioms which might benefit from XVentanaCondOps
Joseph Huber [Mon, 14 Nov 2022 20:58:19 +0000 (14:58 -0600)]
[libc] Forward LLVM_LIBC options when using a runtimes build
The `LLVM_ENABLE_RUNTIMES` mode is commonly used to build runtimes that
depend on an up-to-date version of clang. Currently, `libc` uses some
internal variables that are not forwarded when building in this mode.
This patch forwards the relevant arguments beginning with `LLVM_LIBC` to
the build when built this way.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D137977
bixia1 [Mon, 14 Nov 2022 18:05:19 +0000 (10:05 -0800)]
[mlir][sparse] Make three tests run with the codegen path.
Reviewed By: aartbik, Peiming
Differential Revision: https://reviews.llvm.org/D137964
wren romano [Wed, 9 Nov 2022 21:38:11 +0000 (13:38 -0800)]
[mlir][sparse] move SparseTensorReader functions into the _mlir_ciface_ section
Depends On D137735
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137737
wren romano [Wed, 9 Nov 2022 21:35:45 +0000 (13:35 -0800)]
[mlir][sparse] Macros to clean up StridedMemRefType in the SparseTensorRuntime
In particular, this silences warnings from [-Wsign-compare].
Depends On D137681
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137735
wren romano [Wed, 9 Nov 2022 21:33:01 +0000 (13:33 -0800)]
[mlir][sparse] Making way for SparseTensorRuntime to support non-permutations
Systematically updates the SparseTensorRuntime to properly distinguish tensor-dimensions from storage-levels (and their associated ranks, shapes, sizes, indices, etc). With a few exceptions which are noted in the code, this ensures the runtime has all the **semantic** changes necessary to support non-permutations.
(Whereas **operationally**, since we're still using `std::vector<uint64_t>` to represent the mappings, there's no way to pass in any interesting non-permutations. Changing the representation to `std::function` will be done in a separate differential.)
Depends On D137680
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137681
Craig Topper [Mon, 14 Nov 2022 21:34:24 +0000 (13:34 -0800)]
[RISCV] Add PseudoCCMOVGPR to RISCVSExtWRemoval.
This instruction is a conditional move. It propagates sign bits
from its inputs.
Alexander Shaposhnikov [Mon, 14 Nov 2022 21:31:30 +0000 (21:31 +0000)]
Revert "[opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm"
This reverts commit ef9e624694c0f125c53f7d0d3472fd486bada57d
for further investigation offline.
It appears to break the buildbot llvm-clang-x86_64-sie-ubuntu-fast.
Alexander Shaposhnikov [Mon, 14 Nov 2022 21:10:24 +0000 (21:10 +0000)]
[opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm
Enable using -module-summary with -S
(similarly to what currently can be achieved with opt <input> -o - | llvm-dis).
Test plan: ninja check-all
Differential revision: https://reviews.llvm.org/D137768
Jonas Devlieghere [Mon, 14 Nov 2022 21:03:29 +0000 (13:03 -0800)]
Revert "[dsymutil] Fix assertion in the Reproducer/FileCollector when TMPDIR is empty"
This reverts commit 68efb4772c0d0e60cbfb09ea619b58d80c31ff0f because the
test fails on some of the buildbots.
Xiang Li [Fri, 11 Nov 2022 08:00:11 +0000 (00:00 -0800)]
[DirectX backend] Fix build and test error caused by out of sync with upstream change.
Fix build and test error caused by
https://github.com/llvm/llvm-project/commit/a2620e00ffa232a406de3a1d8634beeda86956fd
and
https://github.com/llvm/llvm-project/commit/304f1d59ca41872c094def3aee0a8689df6aa398
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D137815
Jonas Devlieghere [Fri, 11 Nov 2022 19:50:45 +0000 (11:50 -0800)]
[dsymutil] Fix assertion in the Reproducer/FileCollector when TMPDIR is empty
Fix an assertion in dsymutil coming from the Reproducer/FileCollector.
When TMPDIR is empty, the root becomes a relative path, triggering an
assertion when adding a relative path to the VFS mapping. This patch
fixes the issue by resolving the relative path and also moves the
assertion up to make it easier to diagnose these issues in the future.
rdar://102170986
Differential revision: https://reviews.llvm.org/D137959
Andreas Hollandt [Mon, 14 Nov 2022 20:27:12 +0000 (12:27 -0800)]
[cmake] Fix _GNU_SOURCE being added unconditionally
Reviewed By: tstellar
Differential Revision: https://reviews.llvm.org/D137917
Nico Weber [Mon, 14 Nov 2022 19:23:56 +0000 (14:23 -0500)]
[COFF, Mach-O] Include -mllvm options in thinlto cache key
Like D134013, but for COFF and Mach-O.
Also expand the ELF test a bit. I at first didn't realize that `getValue()` for
`-mllvm -foo=bar` would return `-foo=bar` instead of just `bar`, and so
I wrote the test to check if we indeed get this wrong. We don't, but
having the test for it seems nice, so I'm including it.
Differential Revision: https://reviews.llvm.org/D137971
Jakub Kuderski [Mon, 14 Nov 2022 20:07:18 +0000 (15:07 -0500)]
[mlir][arith][spirv] Handle i1 sign extension in arith-to-spirv
Also fix some surrounding nits.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D137974
Mehdi Amini [Mon, 14 Nov 2022 06:24:54 +0000 (06:24 +0000)]
Apply clang-tidy fixes for readability-identifier-naming in AlgebraicSimplification.cpp (NFC)
Mehdi Amini [Mon, 14 Nov 2022 06:10:39 +0000 (06:10 +0000)]
Apply clang-tidy fixes for readability-simplify-boolean-expr in GPUDialect.cpp (NFC)
Rob Suderman [Mon, 14 Nov 2022 19:38:16 +0000 (11:38 -0800)]
[mlir][tosa] Remove zero-fill of tosa.concat outputs when lowering to linalg.
Since all output elements are known to be overridden by construction the fill is not required. This change makes the tosa lowering consistent with the MHLO and Torch lowerings of concat which do not do the fill.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D137967
Guozhi Wei [Mon, 14 Nov 2022 19:34:59 +0000 (19:34 +0000)]
[MachineCSE] Allow CSE for instructions with ignorable operands
Ignorable operands don't affect an instruction's behavior, so we can safely do
CSE on the instruction.
This is split from D130919. It has a big impact on some AMDGPU test cases.
For example in atomic_optimizations_raw_buffer.ll, when trying to check if the
following instruction can be CSEed
%37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
Function isCallerPreservedOrConstPhysReg is called on operand "implicit $exec",
this function is implemented as
- return TRI.isCallerPreservedPhysReg(Reg, MF) ||
+ return TRI.isCallerPreservedPhysReg(Reg, MF) || TII.isIgnorableUse(MO) ||
(MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg));
Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on this
operand, so isCallerPreservedOrConstPhysReg also returns false, which causes
LLVM to fail to CSE this instruction.
With this patch TII.isIgnorableUse returns true for the operand $exec, so
isCallerPreservedOrConstPhysReg also returns true, which causes this instruction
to be CSEed with the previous instruction
%14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
So the results differ from this point on. AMDGPU's implementation of
isIgnorableUse is
bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const {
// Any implicit use of exec by VALU is not a real register read.
return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() &&
isVALU(*MO.getParent()) && !resultDependsOnExec(*MO.getParent());
}
Since the operand $exec is not a real register read, my understanding is that
it's reasonable to do CSE on such instructions.
Because more instructions are CSEed, fewer instructions are generated for
these tests.
Differential Revision: https://reviews.llvm.org/D137222
Matt Arsenault [Fri, 28 Oct 2022 23:10:41 +0000 (16:10 -0700)]
clang/AMDGPU: Use Support's wrapper around getenv
This does some extra stuff for Windows, so might as well
use it just in case.
Quentin Colombet [Thu, 20 Oct 2022 22:18:58 +0000 (22:18 +0000)]
[mlir][MemRef] Change the anchor point of a reshapeLikeOp pattern
Essentially, this patch changes the anchor point of the
`extract_strided_metadata(reshapeLikeOp)` pattern from
`extract_strided_metadata` to `reshapeLikeOp`.
In detail, this means that instead of replacing:
```
base, offset, sizes, strides =
extract_strided_metadata(reshapeLikeOp(src))
```
With
```
base, offset = extract_strided_metadata(src)
sizes = <some math>
strides = <some math>
```
We replace only the reshapeLikeOp part and connect it back with a
reinterpret_cast:
```
val = reshapeLikeOp(src)
```
=>
```
base, offset, ... = extract_strided_metadata(src)
sizes = <some math>
strides = <some math>
val = reinterpret_cast base, offset, sizes, strides
```
Differential Revision: https://reviews.llvm.org/D136386
Quentin Colombet [Wed, 12 Oct 2022 21:23:27 +0000 (21:23 +0000)]
[mlir][MemRef] Change the anchor point of a subview pattern
Essentially, this patch changes the anchor point of the
`extract_strided_metadata(subview)` pattern from
`extract_strided_metadata` to `subview`.
In detail, this means that instead of replacing:
```
base, offset, sizes, strides = extract_strided_metadata(subview(src))
```
With
```
base, ... = extract_strided_metadata(src)
offset = <some math>
sizes = subSizes
strides = <some math>
```
We replace only the subview part and connect it back with a
reinterpret_cast:
```
val = subview(src)
```
=>
```
base, ... = extract_strided_metadata(src)
offset = <some math>
sizes = subSizes
strides = <some math>
val = reinterpret_cast base, offset, sizes, strides
```
Differential Revision: https://reviews.llvm.org/D135839
Quentin Colombet [Wed, 12 Oct 2022 21:18:53 +0000 (21:18 +0000)]
[mlir][MemRef] Simplify extract_strided_metadata(reinterpret_cast)
This patch adds a pattern to simplify
```
base, offset, sizes, strides =
extract_strided_metadata(
reinterpret_cast(src, srcOffset, srcSizes, srcStrides))
```
Into
```
base, baseOffset, ... = extract_strided_metadata(src)
offset = srcOffset
sizes = srcSizes
strides = srcStrides
```
Note: `reinterpret_cast` operations with unranked sources are not simplified
since they cannot feed extract_strided_metadata operations.
Differential Revision: https://reviews.llvm.org/D135837
Nico Weber [Mon, 14 Nov 2022 18:30:55 +0000 (13:30 -0500)]
[lto] Update function name in comment after 5f312ad45
Craig Topper [Mon, 14 Nov 2022 18:28:29 +0000 (10:28 -0800)]
[RISCV] Add scalar FP compares to isSignExtendingOpW in RISCVSExtWRemoval.
Joe Nash [Mon, 14 Nov 2022 15:15:27 +0000 (10:15 -0500)]
[AMDGPU][MC][NFC] Rename VOP3 VOPC test files
D136149 and D136148 renamed the MC test files for VOP3 promoted from VOP1 and
VOP2 in a consistent way. Do the same for VOP3 coming from VOPC.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D137950
Craig Topper [Mon, 14 Nov 2022 17:59:03 +0000 (09:59 -0800)]
[RISCV] Move FixableDef handling out of isSignExtendingOpW.
We have two layers of opcode checks. The first is in
isSignExtendingOpW. If that returns false, a second switch is used
for looking through nodes by adding them to the worklist.
Move the FixableDef handling to the second switch. This simplifies
the interface of isSignExtendingOpW and makes that function more
accurate to its name.
Yashwant Singh [Mon, 14 Nov 2022 17:57:08 +0000 (23:27 +0530)]
[GlobalIsel][AMDGPU] Changing legalize rule for G_{UADDO|UADDE|USUBO|USUBE|SADDE|SSUBE}
Generic add and sub with carry are now legalized in a way that explicitly calculates the carry/borrow output. Previously,
%6:_(s64), %7:_(s1) = G_UADDO %0, %1
becomes,
%13:_(s32), %14:_(s1) = G_UADDO %2, %4
%15:_(s32), %16:_(s1) = G_UADDE %3, %5, %14
%6:_(s64) = G_MERGE_VALUES %13(s32), %15(s32)
%7:_(s1) = G_ICMP intpred(ult), %6(s64), %1
Here the G_MERGE and G_ICMP instructions are redundant because they only recalculate the carry output. (The case for sub with borrow is similar.)
This change fixes this.
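The point about the redundant compare can be illustrated with a scalar model (a sketch with hypothetical names, not the legalizer code): when a 64-bit add is split into 32-bit halves, the carry out of the high half is already the 64-bit carry, so nothing needs to be recomputed from the merged result.

```cpp
#include <cstdint>

struct AddResult32 {
  std::uint32_t Sum;
  bool Carry;
};

struct AddResult64 {
  std::uint64_t Sum;
  bool Carry;
};

// Models a 32-bit add with carry-in and carry-out (G_UADDE on s32;
// G_UADDO is the same operation with a carry-in of false).
AddResult32 addWithCarry32(std::uint32_t A, std::uint32_t B, bool CarryIn) {
  const std::uint64_t Wide =
      static_cast<std::uint64_t>(A) + B + (CarryIn ? 1u : 0u);
  return {static_cast<std::uint32_t>(Wide), (Wide >> 32) != 0};
}

// 64-bit add built from two 32-bit halves. The carry-out is taken directly
// from the high-half addition instead of being recomputed with a compare.
AddResult64 addWithOverflow64(std::uint64_t A, std::uint64_t B) {
  const AddResult32 Lo = addWithCarry32(static_cast<std::uint32_t>(A),
                                        static_cast<std::uint32_t>(B), false);
  const AddResult32 Hi =
      addWithCarry32(static_cast<std::uint32_t>(A >> 32),
                     static_cast<std::uint32_t>(B >> 32), Lo.Carry);
  return {(static_cast<std::uint64_t>(Hi.Sum) << 32) | Lo.Sum, Hi.Carry};
}
```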
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D137932
Thomas Raoux [Sun, 13 Nov 2022 18:52:03 +0000 (18:52 +0000)]
[mlir][linalg] Add reduction tiling using scf.foreachthread
This adds a transformation to tile reduction operations to partial
reduction using scf.foreachthread. This uses
PartialReductionOpInterface to create a merge operation of the partial
tiles.
Differential Revision: https://reviews.llvm.org/D137912
Benjamin Kramer [Mon, 14 Nov 2022 18:02:56 +0000 (19:02 +0100)]
[bazel] Add another missing dependency after D137833
While there, run buildifier.
Quentin Colombet [Wed, 12 Oct 2022 00:29:39 +0000 (00:29 +0000)]
[mlir][MemRef] Make reinterpret_cast(extract_strided_metadata) more robust
Prior to this patch the canonicalization pattern that turns
`reinterpret_cast(extract_strided_metadata)` into cast was only applied
when all the input operands of the `reinterpret_cast` are exactly all the
output results of the `extract_strided_metadata`.
This missed simplification opportunities when the operands held the same
constant values but came from different actual values.
E.g., prior to this patch, a pattern of the form:
```
%base, %offset = extract_strided_metadata %source : memref<i16>
reinterpret_cast %base to offset:[0]
```
Wouldn't have been simplified into a simple cast, because %offset is not
directly the same value object as 0.
This patch teaches this pattern how to check if the constant values
match what the results of the `extract_strided_metadata` operation would
have held.
Differential Revision: https://reviews.llvm.org/D135736
Chenguang Wang [Mon, 14 Nov 2022 17:57:52 +0000 (18:57 +0100)]
[bazel] Fix Bufferization dialect build
D137833 added a new .td file and updated existing files to use it.
It broke the bazel build.
Differential Revision: https://reviews.llvm.org/D137961
Jason Molenda [Mon, 14 Nov 2022 17:50:58 +0000 (09:50 -0800)]
Change last-ditch magic address in IRMemoryMap::FindSpace
When we cannot allocate memory in the inferior process, the IR
interpreter's IRMemoryMap::FindSpace will create an lldb local
buffer and assign it an address range in the inferior address
space. When the interpreter sees an address in that range, it
will read/write from the local buffer instead of the target. If
this magic address overlaps with actual data in the target, the
target cannot be accessed through expressions.
Instead of using a high memory address that is validly addressable,
this patch uses an address that cannot be accessed on 64-bit systems
that don't actually use all 64 bits of the virtual address.
Differential Revision: https://reviews.llvm.org/D137682
rdar://96248287
Yabin Cui [Mon, 14 Nov 2022 17:48:44 +0000 (17:48 +0000)]
[Support] Use thread safe version of getpwuid and getpwnam.
The OpenGroup specification doesn't require getpwuid and getpwnam
to be thread-safe, and musl libc has an implementation that is not
thread-safe. When building clang with musl, this can make
clang-scan-deps crash.
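For reference, the reentrant POSIX variant takes caller-provided storage instead of returning a pointer to shared static data. The following is a minimal sketch of that usage; the wrapper name `homeDirectoryFor` is hypothetical and this is not the code from the patch.

```cpp
#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

// Look up a user's home directory with getpwuid_r, which writes into
// buffers owned by the caller and is therefore safe to call concurrently.
std::optional<std::string> homeDirectoryFor(uid_t Uid) {
  // sysconf may return -1 when there is no fixed limit; fall back to a guess.
  const long Hint = sysconf(_SC_GETPW_R_SIZE_MAX);
  std::vector<char> Buffer(Hint > 0 ? static_cast<std::size_t>(Hint) : 16384);

  struct passwd Entry;
  struct passwd *Result = nullptr;
  int RC;
  // ERANGE means the buffer was too small: grow it and retry.
  while ((RC = getpwuid_r(Uid, &Entry, Buffer.data(), Buffer.size(),
                          &Result)) == ERANGE)
    Buffer.resize(Buffer.size() * 2);

  // A null Result with RC == 0 means "no matching entry".
  if (RC != 0 || Result == nullptr || Entry.pw_dir == nullptr)
    return std::nullopt;
  return std::string(Entry.pw_dir);
}
```

The string is copied out before the buffer goes out of scope, since the fields of the `passwd` entry point into that buffer.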
Reviewed By: pirama
Differential Revision: https://reviews.llvm.org/D137864
Arthur Eubanks [Sun, 13 Nov 2022 22:54:26 +0000 (14:54 -0800)]
[LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.
Part of removing the legacy pass manager for optimization pipeline.
Reland with UseNewGVN usage in clang removed.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137915
Craig Topper [Mon, 14 Nov 2022 17:17:47 +0000 (09:17 -0800)]
[RISCV] Remove old test case. NFC
This seemed to be testing a pattern for an RV64 Zbp instruction, but
on RV32. On RV32, it's just swizzling registers so isn't very
interesting.
Craig Topper [Mon, 14 Nov 2022 17:15:31 +0000 (09:15 -0800)]
[RISCV] Improve use of PACK instruction on rv64.
Handle the case where the lower bits come from a zero extending
load or other operation with known zero bits.
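A scalar model may help show why known zero bits matter here. This is a sketch with hypothetical function names; it models the RV64 `pack` semantics (low 32 bits of the first source in the low half, low 32 bits of the second source in the high half) and the shift-and-or expression it can replace.

```cpp
#include <cstdint>

// Models RV64 pack rd, rs1, rs2: rd = (rs2[31:0] << 32) | rs1[31:0].
std::uint64_t pack(std::uint64_t Rs1, std::uint64_t Rs2) {
  return (Rs2 << 32) | (Rs1 & 0xFFFFFFFFull);
}

// The shift-and-or form. It is equivalent to pack(Lo, Hi) only when the
// upper 32 bits of Lo are already zero, e.g. after a zero-extending load.
std::uint64_t shiftOr(std::uint64_t Hi, std::uint64_t Lo) {
  return (Hi << 32) | Lo;
}
```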
Arthur Eubanks [Mon, 14 Nov 2022 17:33:38 +0000 (09:33 -0800)]
Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes"
This reverts commit
7ec05fec7115a910b2e172de794adc462388c25e.
Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008
Arthur Eubanks [Sun, 13 Nov 2022 22:54:26 +0000 (14:54 -0800)]
[LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.
Part of removing the legacy pass manager for optimization pipeline.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137915
Nicolas Vasilache [Sun, 13 Nov 2022 13:28:32 +0000 (05:28 -0800)]
[mlir][Transform]Significantly cleanup scf.foreach_thread and GPU transform permutation handling
Previously, the need for a dense permutation leaked into the thread_dim_mapping specification.
This revision allows using a sparse specification of the thread_dim_mapping; the proper completion / sorting is applied automatically.
In the process, the semantics of scf.foreach_thread are tightened to require a matching number of thread dimensions and mappings.
The relevant negative test is added.
Differential Revision: https://reviews.llvm.org/D137906
Akash Banerjee [Wed, 9 Nov 2022 15:54:21 +0000 (15:54 +0000)]
Migrate getOrCreateInternalVariable from Clang to OMPIRBuilder.
This patch removes getOrCreateInternalVariable from Clang OMP CodeGen and replaces its uses with OMPBuilder::getOrCreateInternalVariable. It also refactors OMPBuilder::getOrCreateInternalVariable to change the type of the name from Twine to StringRef.
Differential Revision: https://reviews.llvm.org/D137720
Lorenzo Chelini [Fri, 11 Nov 2022 12:35:16 +0000 (13:35 +0100)]
[MLIR][Transform] Expose map layout option in `OneShotBufferizeOp`
Expose `function-boundary-type-conversion` in `OneShotBufferizeOp`. To
reuse options between passes and transform operations, create a
`BufferizationEnums.td`.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D137833
Sanjay Patel [Mon, 14 Nov 2022 16:54:47 +0000 (11:54 -0500)]
[InstSimplify] restrict logic fold with partial undef vector
https://alive2.llvm.org/ce/z/4ncsnX
Fixes #58977
Sanjay Patel [Sun, 13 Nov 2022 17:38:12 +0000 (12:38 -0500)]
[SystemZ] improve test for showing store merge miscompile; NFC
See issue #58883 for details.
Philip Reames [Mon, 14 Nov 2022 16:29:55 +0000 (08:29 -0800)]
[RISCV] Implement assembler support for XVentanaCondOps
This change provides an implementation of the XVentanaCondOps vendor extension. This extension is defined in version 1.0.0 of the VTx-family custom instructions specification (https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf) by Ventana Micro Systems.
In addition to the technical contribution, this change is intended to be a test case for our vendor extension policy.
Once this lands, I plan to use this extension to prototype selection lowering to conditional moves. There's an RVI proposal in flight, and the expectation is that lowering to these and the new RVI instructions is likely to be substantially similar.
Differential Revision: https://reviews.llvm.org/D137350
bixia1 [Wed, 9 Nov 2022 17:07:06 +0000 (09:07 -0800)]
[mlir][sparse] Add rewriting rules for sparse_tensor.sort_coo.
Refactor the rewriting of sparse_tensor.sort to support the implementation of
sparse_tensor.sort_coo.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137522
Sylvain Audi [Wed, 9 Nov 2022 15:01:55 +0000 (10:01 -0500)]
[PDB] Don't include input files in the 'cmd' entry of S_ENVBLOCK
MSVC records the command line arguments in S_ENVBLOCK, skipping the input file arguments.
This patch adds this filtering on lld-link side.
Differential Revision: https://reviews.llvm.org/D137723
Simon Pilgrim [Mon, 14 Nov 2022 16:13:16 +0000 (16:13 +0000)]
[MCA][X86] Ensure the avx512 vnni tests use the upper xmm/ymm registers
Ensure we're testing the avx512vl vnni instructions and not the avx vnni instructions
Simon Pilgrim [Mon, 14 Nov 2022 15:57:13 +0000 (15:57 +0000)]
[MCA][X86] Add test coverage for VBMI2 instructions
Chris Bieneman [Mon, 14 Nov 2022 16:28:36 +0000 (10:28 -0600)]
[NFC] Fixing spelling in code comment
bixia1 [Fri, 11 Nov 2022 22:24:26 +0000 (14:24 -0800)]
[mlir][sparse][NFC] Add comments to tests that are run for with and without runtime libraries.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137869
Ivan Kosarev [Mon, 14 Nov 2022 16:10:23 +0000 (16:10 +0000)]
[AMDGPU][AsmParser] Forbid TFE modifiers for MBUF stores.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D137832
Nicholas Guy [Mon, 14 Nov 2022 15:55:44 +0000 (15:55 +0000)]
[NFC] Removal of complex deinterleaving test case complex_mul_v8f64
This test is not particularly useful for testing complex deinterleaving,
especially due to f64 muls not being supported in mve. The test is
being removed as it's hitting an unrelated pre-existing condition
regarding register spilling.
Jay Foad [Mon, 14 Nov 2022 15:27:59 +0000 (15:27 +0000)]
[AMDGPU] More use of DivergentBinFrag and friends. NFC.
Nikita Popov [Tue, 18 Oct 2022 10:11:04 +0000 (12:11 +0200)]
[AA] Move MayBeCrossIteration into AAQI (NFC)
Move the MayBeCrossIteration flag from BasicAA into AAQI. This is
in preparation for exposing it to users of the AA API.
Ivan Kosarev [Mon, 14 Nov 2022 12:37:26 +0000 (12:37 +0000)]
[AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D137783
Mindong Chen [Mon, 14 Nov 2022 15:18:47 +0000 (23:18 +0800)]
[docs][OpaquePtr] Fix hyperlinks
Jay Foad [Mon, 14 Nov 2022 15:14:55 +0000 (15:14 +0000)]
[AMDGPU] Define and use UniformTernaryFrag. NFC.
Simon Pilgrim [Mon, 14 Nov 2022 10:58:20 +0000 (10:58 +0000)]
[X86] Remove unnecessary overrides for CBW/CWDE/CDQE/CMC instructions
All of these match the default WriteALU schedule
Caroline Concatto [Thu, 3 Nov 2022 12:18:20 +0000 (12:18 +0000)]
[AArch64] Add all SME2.1 instructions Assembly/Disassembly
This patch adds a new feature flag:
sme-f16f16 to represent FEAT_SME-F16F16
This patch adds the following instructions:
SME2.1 stand alone instructions:
MOVAZ (array to vector, four registers): Move and zero four ZA single-vector groups to vector registers.
(array to vector, two registers): Move and zero two ZA single-vector groups to vector registers.
(tile to vector, four registers): Move and zero four ZA tile slices to vector registers.
(tile to vector, single): Move and zero ZA tile slice to vector register.
(tile to vector, two registers): Move and zero two ZA tile slices to vector registers.
LUTI2 (Strided four registers): Lookup table read with 2-bit indexes.
(Strided two registers): Lookup table read with 2-bit indexes.
LUTI4 (Strided four registers): Lookup table read with 4-bit indexes.
(Strided two registers): Lookup table read with 4-bit indexes.
ZERO (double-vector): Zero ZA double-vector groups.
(quad-vector): Zero ZA quad-vector groups.
(single-vector): Zero ZA single-vector groups.
SME2p1 and SME-F16F16:
All instructions are half precision elements:
FADD: Floating-point add multi-vector to ZA array vector accumulators.
FSUB: Floating-point subtract multi-vector from ZA array vector accumulators.
FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
(multiple and single vector): Multi-vector floating-point fused multiply-add by vector.
(multiple vectors): Multi-vector floating-point fused multiply-add.
FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.
(multiple and single vector): Multi-vector floating-point fused multiply-subtract by vector.
(multiple vectors): Multi-vector floating-point fused multiply-subtract.
FCVT (widening): Multi-vector floating-point convert from half-precision to single-precision (in-order).
FCVTL: Multi-vector floating-point convert from half-precision to deinterleaved single-precision.
FMOPA (non-widening): Floating-point outer product and accumulate.
FMOPS (non-widening): Floating-point outer product and subtract.
SME2p1 and B16B16:
BFADD: BFloat16 floating-point add multi-vector to ZA array vector accumulators.
BFSUB: BFloat16 floating-point subtract multi-vector from ZA array vector accumulators.
BFCLAMP: Multi-vector BFloat16 floating-point clamp to minimum/maximum number.
BFMLA (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-add by indexed element.
(multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-add by vector.
(multiple vectors): Multi-vector BFloat16 floating-point fused multiply-add.
BFMLS (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-subtract by indexed element.
(multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-subtract by vector.
(multiple vectors): Multi-vector BFloat16 floating-point fused multiply-subtract.
BFMAX (multiple and single vector): Multi-vector BFloat16 floating-point maximum by vector.
(multiple vectors): Multi-vector BFloat16 floating-point maximum.
BFMAXNM (multiple and single vector): Multi-vector BFloat16 floating-point maximum number by vector.
(multiple vectors): Multi-vector BFloat16 floating-point maximum number.
BFMIN (multiple and single vector): Multi-vector BFloat16 floating-point minimum by vector.
(multiple vectors): Multi-vector BFloat16 floating-point minimum.
BFMINNM (multiple and single vector): Multi-vector BFloat16 floating-point minimum number by vector.
(multiple vectors): Multi-vector BFloat16 floating-point minimum number.
BFMOPA (non-widening): BFloat16 floating-point outer product and accumulate.
BFMOPS (non-widening): BFloat16 floating-point outer product and subtract.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D137571
Nikita Popov [Mon, 14 Nov 2022 14:46:00 +0000 (15:46 +0100)]
[AST] Remove legacy AliasSetPrinter pass
A NewPM version of this pass exists, drop the legacy version of
this testing-only pass.
Sjoerd Meijer [Fri, 11 Nov 2022 12:56:42 +0000 (18:26 +0530)]
[AArch64] Add match patterns for the reassociated forms of FNMUL
Differential Revision: https://reviews.llvm.org/D137925
Nikita Popov [Mon, 14 Nov 2022 14:28:09 +0000 (15:28 +0100)]
[LoopVersioningLICM] Clarify scope of AST (NFC)
Make it clearer that the AST is only temporarily used during the
legality check, and does not have to survive into the transformation
phase.
Joseph Huber [Mon, 14 Nov 2022 14:11:33 +0000 (08:11 -0600)]
[OpenMP] Fix installation to old resource dir
Summary:
The changes in D125860 renamed the old resource directory to only use
the major version. This was not updated for the OpenMP project, causing
OpenMP resources to still be installed in the old `major.minor.rev`
folder. This led to problems including the header files.
fixes #58966
Luca Di Sera [Mon, 14 Nov 2022 14:17:22 +0000 (15:17 +0100)]
Add clang_CXXMethod_isMoveAssignmentOperator to libclang
The new method is a wrapper around `CXXMethodDecl::isMoveAssignmentOperator` and
can be used to recognize move-assignment operators in libclang.
An export for the function, together with its documentation, was added to
"clang/include/clang-c/Index.h" with an implementation provided in
"clang/tools/libclang/CIndex.cpp". The implementation was based on
similar `clang_CXXMethod.*` implementations, following the same
structure but calling `CXXMethodDecl::isMoveAssignmentOperator` for its
main logic.
The new symbol was further added to "clang/tools/libclang/libclang.map"
to be exported, under the LLVM16 tag.
"clang/tools/c-index-test/c-index-test.c" was modified to print a
specific tag, "(move-assignment operator)", for cursors that are
recognized by `clang_CXXMethod_isMoveAssignmentOperator`.
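As an illustration of the kind of declaration this tag is meant for, here is a
minimal sketch in plain C++. The `Buffer` type and its counters are
hypothetical and are not taken from the patch or its tests; the sketch only
contrasts a copy-assignment operator with a move-assignment operator.

```cpp
#include <utility>

// Hypothetical example type, not part of the patch.
struct Buffer {
  int CopyAssigned = 0;
  int MoveAssigned = 0;

  // Copy-assignment operator: takes a const lvalue reference.
  Buffer &operator=(const Buffer &) {
    ++CopyAssigned;
    return *this;
  }

  // Move-assignment operator: takes an rvalue reference to the same class.
  // This is the form the "(move-assignment operator)" tag is about.
  Buffer &operator=(Buffer &&) noexcept {
    ++MoveAssigned;
    return *this;
  }
};
```

Assigning from an lvalue (`A = B`) selects the first overload, while
`A = std::move(B)` selects the second.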
A new regression test file,
"clang/test/Index/move-assignment-operator.cpp", was added to check
that the new function recognizes the correct constructs.
The "clang/test/Index/get-cursor.cpp" regression test file was updated
as it was affected by the new "(move-assignment operator)" tag.
A binding for the new function was added to libclang's Python
bindings, in "clang/bindings/python/clang/cindex.py", as a new
method on `Cursor`, `is_move_assignment_operator_method`.
An accompanying test was added to
`clang/bindings/python/tests/cindex/test_cursor.py`, testing the new
function with the same methodology as the corresponding libclang test.
The current release note, `clang/docs/ReleaseNotes.rst`, was modified to
report the new addition under the "libclang" section.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D137246
Nikita Popov [Mon, 14 Nov 2022 14:16:38 +0000 (15:16 +0100)]
[LoopVersioningLICM] Remove unnecessary reset code (NFC)
The LoopVersioningLICM object is only ever used for a single loop,
but there was various unnecessary code for handling the case where
it is reused across loops. Drop that code, and pass the loop to the
constructor.
LLVM GN Syncbot [Mon, 14 Nov 2022 14:05:19 +0000 (14:05 +0000)]
[gn build] Port d52e2839f3b1
Nicholas Guy [Mon, 14 Nov 2022 13:59:59 +0000 (13:59 +0000)]
[ARM][CodeGen] Add support for complex deinterleaving
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.
Differential Revision: https://reviews.llvm.org/D114174
revunov.denis@huawei.com [Mon, 14 Nov 2022 13:25:20 +0000 (13:25 +0000)]
[BOLT][NFC] Fix possible use-after-free
If the NewName twine holds a reference to the old name, that reference
is invalidated after Section.Name = NewName.str();,
so NewName.str() cannot be used after the assignment.
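The hazard and one way around it can be sketched without LLVM headers. In the
sketch below, `LazyConcat` is a hypothetical stand-in for a lazily evaluated
concatenation (it is not `llvm::Twine`), and `Section` and `renameSection` are
illustrative names rather than the BOLT code. The idea is to materialize the
new name once, while the referenced storage is still alive, and to reuse that
string afterwards.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for a lazy concatenation that only stores views
// of its operands and builds the string on demand.
struct LazyConcat {
  std::string_view Prefix;
  std::string_view Suffix;
  std::string str() const { return std::string(Prefix) + std::string(Suffix); }
};

// Hypothetical section type with an owned name.
struct Section {
  std::string Name;
};

// Renames the section and returns the new name.
std::string renameSection(Section &S, const LazyConcat &NewName) {
  // Materialize first: NewName may hold a view into S.Name.
  std::string Materialized = NewName.str();
  // Overwriting S.Name can invalidate any view into its old contents.
  S.Name = Materialized;
  // Reuse the materialized string; do not call NewName.str() again here.
  return Materialized;
}
```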
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D137616
Valentin Clement [Mon, 14 Nov 2022 13:27:44 +0000 (14:27 +0100)]
[flang][NFC] Fix typo in fir.box_typecode op description
Dmitry Preobrazhensky [Mon, 14 Nov 2022 13:20:20 +0000 (16:20 +0300)]
[AMDGPU][MC][GFX11] Improve diagnostic messages for invalid VOPD syntax
Differential Revision: https://reviews.llvm.org/D137842
Nicolas Vasilache [Wed, 9 Nov 2022 11:56:26 +0000 (03:56 -0800)]
[mlir][Transform] Add support for dynamically unpacking tile_sizes / num_threads in tile_to_foreach_thread
This commit adds automatic unpacking of Values of type pdl::OperationType to the underlying single-result OpResult.
This allows mixing single-value, attribute and multi-value pdl::Operation tile sizes and num threads to TileToForeachThreadOp.
Differential Revision: https://reviews.llvm.org/D137896
Ying Yi [Mon, 10 Oct 2022 12:26:56 +0000 (13:26 +0100)]
[ThinLTO] a ThinLTO warning is added if cache_size_bytes or cache_size_files is too small for the current link job. The warning recommends the user to consider adjusting --thinlto-cache-policy.
One specific case in ThinLTO cache pruning is a build so large that the cache cannot hold all of its intermediate object files. During such a build, a file is cached and then evicted later in that same build, which significantly decreases the effectiveness of the cache. With this warning, the user can identify the required cache size or file count and improve ThinLTO link speed.
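The kind of check described here might look roughly like the sketch below.
This is not the code from the patch: `CachePolicy`, `checkCacheLimits`, and
the warning wording are hypothetical, and treating a limit of 0 as "no limit"
is an assumption made for the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical policy type; a limit of 0 means "no limit" (assumption).
struct CachePolicy {
  std::uint64_t MaxSizeBytes;
  std::uint64_t MaxSizeFiles;
};

// Returns one warning per limit that the current link job alone exceeds.
// FileSizes holds the size in bytes of each object file the job caches.
std::vector<std::string>
checkCacheLimits(const CachePolicy &Policy,
                 const std::vector<std::uint64_t> &FileSizes) {
  std::uint64_t TotalBytes = 0;
  for (std::uint64_t Size : FileSizes)
    TotalBytes += Size;

  std::vector<std::string> Warnings;
  if (Policy.MaxSizeBytes != 0 && TotalBytes > Policy.MaxSizeBytes)
    Warnings.push_back("cache_size_bytes is too small for this link job; " +
                       std::to_string(TotalBytes) + " bytes are needed");
  if (Policy.MaxSizeFiles != 0 && FileSizes.size() > Policy.MaxSizeFiles)
    Warnings.push_back("cache_size_files is too small for this link job; " +
                       std::to_string(FileSizes.size()) + " files are needed");
  return Warnings;
}
```

With limits of 100 bytes and 2 files, a job that caches files of 60, 60, and
10 bytes exceeds both limits, so the sketch reports two warnings.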
Differential Revision: https://reviews.llvm.org/D135590
Jay Foad [Mon, 14 Nov 2022 11:36:24 +0000 (11:36 +0000)]
[AMDGPU] Simplify SelectPat and remove comment obsoleted by D133593
Thomas Symalla [Mon, 14 Nov 2022 11:55:05 +0000 (12:55 +0100)]
[InstCombine][NFC] Add extractelement tests
HanSheng Zhang [Mon, 14 Nov 2022 11:45:23 +0000 (12:45 +0100)]
[reg2mem] Skip non-sized Instructions (PR58890)
We can only convert sized values into alloca/load/store, so skip
instructions returning other types.
Fixes https://github.com/llvm/llvm-project/issues/58890.
Differential Revision: https://reviews.llvm.org/D137700
Christian Sigg [Mon, 14 Nov 2022 11:21:59 +0000 (12:21 +0100)]
[mlir][bazel] NFC: change MLIR_GPU_TO_CUBIN_PASS_ENABLE from `defines` to `local_defines`.
Joshua Cao [Mon, 14 Nov 2022 03:24:15 +0000 (22:24 -0500)]
Do not write a comma when varargs is the only argument
Fixes https://github.com/llvm/llvm-project/issues/56544
AsmWriter always writes ", ..." when a tail call has a varargs argument. This patch only writes the ", " when there is an argument before the varargs argument.
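The separator rule can be sketched as a small standalone function. The helper
below is hypothetical (`formatParams` is not an AsmWriter function) and only
mirrors the behaviour described above: the ", " before "..." is written only
when at least one argument precedes the varargs marker.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins parameter type names and appends "..." for a
// varargs function, adding ", " only when a parameter precedes it.
std::string formatParams(const std::vector<std::string> &Params,
                         bool IsVarArg) {
  std::string Out;
  for (std::size_t I = 0; I < Params.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += Params[I];
  }
  if (IsVarArg) {
    if (!Params.empty())
      Out += ", ";
    Out += "...";
  }
  return Out;
}
```

For an empty parameter list with varargs the result is "...", and for a
single `i32` parameter with varargs it is "i32, ...".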
I did not write a dedicated test for this change, but I modified an existing test that will catch a regression.
Reviewed By: avogelsgesang
Differential Revision: https://reviews.llvm.org/D137893
Signed-off-by: Adrian Vogelsgesang <avogelsgesang@salesforce.com>
Jean Perier [Mon, 14 Nov 2022 10:19:21 +0000 (11:19 +0100)]
[flang] Add hlfir.declare codegen
hlfir.declare codegen generates a fir.declare, and may generate a
fir.embox/fir.rebox/fir.emboxchar if the base value does not convey
all the variable bounds and length parameter information.
Leave OPTIONAL as a TODO to keep this patch simple. It will require
making the embox/rebox optional to preserve the optionality aspects.
Differential Revision: https://reviews.llvm.org/D137789
LLVM GN Syncbot [Mon, 14 Nov 2022 10:12:18 +0000 (10:12 +0000)]
[gn build] Port dd46a08008f7