Nikita Popov [Wed, 14 Jul 2021 19:09:06 +0000 (21:09 +0200)]
[Attributes] Use single method to fetch type from AttributeSet (NFC)
While it is nice to have separate methods in the public AttributeSet
API, we can fetch the type from the internal AttributeSetNode
using a generic API for all type attribute kinds.
Roman Lebedev [Wed, 14 Jul 2021 18:54:04 +0000 (21:54 +0300)]
[NFC][PhaseOrdering] Add test for the lack of CSE after SimplifyCFG (PR51092)
David Green [Wed, 14 Jul 2021 19:06:49 +0000 (20:06 +0100)]
[ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y)
For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to
VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we
have an add of an existing VMLALVA, this patch pushes the add up above
the VMLALVA so that it may potentially be simplified further, for
example being folded into another VMLALV.
Differential Revision: https://reviews.llvm.org/D105686
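The rewrite above is safe because the accumulator operand is a plain addend. A minimal Python sketch of that arithmetic, assuming VMLALVA(A, X, Y) computes A plus the sum of the lane products modulo 2^64 (the helper names and the list-based lanes are illustrative only, not LLVM APIs):

```python
MASK64 = (1 << 64) - 1  # model i64 wraparound


def vmlalva(acc, xs, ys):
    """Toy model: accumulate the lane-wise products of xs and ys into acc."""
    return (acc + sum(x * y for x, y in zip(xs, ys))) & MASK64


def add_after(a, b, xs, ys):
    """add(VMLALVA(A, X, Y), B): the form before the rewrite."""
    return (vmlalva(a, xs, ys) + b) & MASK64


def add_before(a, b, xs, ys):
    """VMLALVA(add(A, B), X, Y): the form after the rewrite."""
    return vmlalva((a + b) & MASK64, xs, ys)
```

Both helpers return the same value for any inputs, which is what lets the add move above the VMLALVA.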
Vitaly Buka [Wed, 14 Jul 2021 01:11:57 +0000 (18:11 -0700)]
[scudo] Don't enable MTE for small alignment
Differential Revision: https://reviews.llvm.org/D105954
Mehdi Amini [Wed, 14 Jul 2021 19:01:34 +0000 (19:01 +0000)]
Remove uses of deprecated target AllPassesAndDialectsNoRegistration in Bazel (NFC)
It was an alias for a long time.
Nikita Popov [Wed, 14 Jul 2021 18:58:52 +0000 (20:58 +0200)]
[Verifier] Improve incompatible attribute type check
A couple of attributes had explicit checks for incompatibility
with pointer types. However, this is already handled generically
by the typeIncompatible() check. We can drop these after adding
SwiftError to typeIncompatible().
However, the previous implementation of the check prints out all
attributes that are incompatible with a given type, even though
those attributes aren't actually used. This has the annoying
result that the error message changes every time a new attribute
is added to the list. Improve this by explicitly finding which
attribute isn't compatible and printing just that.
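The improved reporting can be sketched as follows. This is a toy model in Python, assuming attributes are plain strings; the function names and the message text are hypothetical and are not the Verifier's actual code:

```python
def first_incompatible(used_attrs, incompatible_for_type):
    """Return the first attribute that is both used and incompatible, or None."""
    for attr in used_attrs:
        if attr in incompatible_for_type:
            return attr
    return None


def diagnose(used_attrs, incompatible_for_type):
    """Build a message naming only the offending attribute (hypothetical text)."""
    attr = first_incompatible(used_attrs, incompatible_for_type)
    if attr is None:
        return None
    return f"Attribute '{attr}' applied to incompatible type!"
```

Because only the attribute that is actually used appears in the message, adding a new entry to the incompatible set does not change existing diagnostics.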
Saleem Abdulrasool [Wed, 14 Jul 2021 18:42:24 +0000 (11:42 -0700)]
Demangle: correct swift_async demangling for Microsoft scheme
The emission was corrected for the swift_async calling convention but
the demangling support was not. This repairs the demangling support as
well.
Eli Friedman [Mon, 12 Jul 2021 22:11:01 +0000 (15:11 -0700)]
[SelectionDAG] Add an overload of getStepVector that assumes step 1.
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).
Differential Revision: https://reviews.llvm.org/D105850
Thomas Lively [Wed, 14 Jul 2021 18:31:53 +0000 (11:31 -0700)]
[WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950
Louis Dionne [Wed, 14 Jul 2021 18:25:13 +0000 (14:25 -0400)]
[runtimes] Inherit the TARGET_TRIPLE that may be set by LLVM
Thomas Lively [Wed, 14 Jul 2021 18:17:08 +0000 (11:17 -0700)]
[WebAssembly] Remove datalayout strings from llc tests
The data layout strings do not have any effect on llc tests and will become
misleadingly out of date as we continue to update the canonical data layout, so
remove them from the tests.
Differential Revision: https://reviews.llvm.org/D105842
Fangrui Song [Wed, 14 Jul 2021 17:18:30 +0000 (10:18 -0700)]
[ELF] --fortran-common: prefer STB_WEAK to COMMON
The ELF specification says "The link editor honors the common definition and
ignores the weak ones." GNU ld and our Symbol::compare follow this, but the
--fortran-common code (D86142) made a mistake on the precedence.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51082
Reviewed By: peter.smith, sfertile
Differential Revision: https://reviews.llvm.org/D105945
David Green [Wed, 14 Jul 2021 17:11:32 +0000 (18:11 +0100)]
[ARM] Lower v16i8 -> i64 VMLA reductions.
MVE does not have a VMLALV instruction that can perform v16i8 -> i64
reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That
means that the pattern to create them will be split up by type
legalization, creating a lot of instructions.
This extends the patterns for matching i64 reductions a little to handle
the v16i8->i64 case. We need to turn them into a pair of v8i16->i64
VMLALVs that each perform half of the reduction and are summed together
(so the latter is a VMLALVA). The order of the lanes does not matter for
the reduction so we generate an MVEEXT for the extension, that will
either be folded into an extending load or can be optimized to a
VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be
improved in a later patch.
Differential Revision: https://reviews.llvm.org/D105680
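A rough Python sketch of the split described above, assuming the 16 lanes are divided into even and odd halves (one possible lane order, chosen here only because the order does not matter for the sum; the helper names are illustrative, not LLVM APIs):

```python
def vmlalv(xs, ys):
    """Toy model of an 8-lane multiply-accumulate reduction."""
    return sum(x * y for x, y in zip(xs, ys))


def vmlalva(acc, xs, ys):
    """Same reduction, added onto an existing accumulator."""
    return acc + vmlalv(xs, ys)


def reduce_v16i8(xs, ys):
    """Reduce 16 lanes as two 8-lane halves: a VMLALV followed by a VMLALVA."""
    assert len(xs) == len(ys) == 16
    even = vmlalv(xs[0::2], ys[0::2])          # first half of the reduction
    return vmlalva(even, xs[1::2], ys[1::2])   # second half, summed in
```

Any other partition of the lanes into two groups of eight would give the same total.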
Sanjay Patel [Wed, 14 Jul 2021 15:57:36 +0000 (11:57 -0400)]
[InstCombine] reorder icmp with offset folds for better results
This set of folds was added recently with:
c7b658aeb526
0c400e895306
40b752d28d95
...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.
This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
Sanjay Patel [Wed, 14 Jul 2021 15:35:23 +0000 (11:35 -0400)]
[InstCombine] add tests for icmp with constant offset and no-wrap flags; NFC
Sander de Smalen [Wed, 14 Jul 2021 15:45:07 +0000 (16:45 +0100)]
[LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:
t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
dst[i] = sinf(src[i]);
^
t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
dst[i] = sinf(src[i]);
^
t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
dst[i] = sinf(src[i]);
^
Reviewed By: fhahn, kmclaughlin
Differential Revision: https://reviews.llvm.org/D105806
Matt Arsenault [Thu, 10 Jun 2021 13:28:20 +0000 (09:28 -0400)]
GlobalISel: Handle lowering non-power-of-2 extloads
Sander de Smalen [Wed, 14 Jul 2021 08:43:30 +0000 (09:43 +0100)]
[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types needs some more work.
Reviewed By: paulwalker-arm, dmgreen
Differential Revision: https://reviews.llvm.org/D103882
Aaron Ballman [Wed, 14 Jul 2021 15:40:37 +0000 (11:40 -0400)]
Combine two diagnostics into one and correct grammar
The anonymous and non-anonymous bit-field diagnostics are easily
combined into one diagnostic. However, the diagnostic was missing a
"the" that is present in the almost-identically worded
warn_bitfield_width_exceeds_type_width diagnostic, hence the changes to
test cases.
Jay Foad [Tue, 13 Jul 2021 13:30:54 +0000 (14:30 +0100)]
[AMDGPU] Check llc-pipeline.ll with -match-full-lines -strict-whitespace
This prevents breaking the indentation that shows the structure of the
pass managers.
Differential Revision: https://reviews.llvm.org/D105891
Alexey Bataev [Mon, 12 Jul 2021 17:44:36 +0000 (10:44 -0700)]
[SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.
Differential Revision: https://reviews.llvm.org/D105827
Louis Dionne [Wed, 14 Jul 2021 14:49:28 +0000 (10:49 -0400)]
[runtimes] NFCI: Drop intermediate CMake variable TARGET_TRIPLE
We might as well use the various XXX_TARGET_TRIPLE variables directly.
Yitzhak Mandelbaum [Fri, 2 Jul 2021 18:53:10 +0000 (18:53 +0000)]
[Lexer] Fix bug in `makeFileCharRange` called on split tokens.
When the end loc of the specified range is a split token, `makeFileCharRange`
does not process it correctly. This patch adds proper support for split tokens.
Differential Revision: https://reviews.llvm.org/D105365
Peixin Qiao [Wed, 14 Jul 2021 13:42:26 +0000 (09:42 -0400)]
[flang][OpenMP] Fix semantic check of test case in taskloop simd construct
The following semantic check is removed in OpenMP Version 5.0:
```
Taskloop simd construct restrictions: No reduction clause can be specified.
```
Also fix several typos.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D105874
Jinsong Ji [Wed, 14 Jul 2021 13:37:50 +0000 (13:37 +0000)]
[AIX] Enable dollar sign as PC in inlineasm
$ is used as the PC in PowerPC inline asm. ELF already uses it;
enable it for AIX XCOFF as well.
Reviewed By: #powerpc, amyk, nemanjai
Differential Revision: https://reviews.llvm.org/D105956
Matthias Springer [Wed, 14 Jul 2021 13:14:05 +0000 (22:14 +0900)]
[mlir][linalg] Fix typo in ExtractSliceOfPadTensorSwapPattern
Differential Revision: https://reviews.llvm.org/D105607
oToToT [Wed, 14 Jul 2021 13:14:12 +0000 (21:14 +0800)]
[docs] Update CMake cross compiling guide link
The CMake community Wiki has been moved to the [[ https://gitlab.kitware.com/cmake/community/wikis/home | Kitware GitLab Instance ]].
Also, the original anchor for the `Information how to set up various cross compiling toolchains` section might not work as expected. The original content is now collapsed, so the browser won't navigate to the right section directly.
Hence, I think it might be better to provide the section name instead of `this section` with link to help readers find the right section by themselves.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D104996
Tim Northover [Wed, 14 Jul 2021 13:11:20 +0000 (14:11 +0100)]
ARM: reuse existing libcall global variable if possible.
If we try to create a new GlobalVariable on each iteration, the Module will
detect the name collision and "helpfully" rename later iterations by appending
".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try
to call it.
So instead check whether there's already a global with the right name in the
module and use that if so.
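The renaming problem and the fix can be illustrated with a toy module. This is a simplified Python sketch, assuming a dictionary of globals that appends ".N" on a name collision; `ToyModule` and its methods are hypothetical and are not LLVM's Module API:

```python
class ToyModule:
    """Toy symbol table that renames on collision, like the behavior described."""

    def __init__(self):
        self.globals = {}

    def create_global(self, name):
        """Always create a new global; a clashing name gets a '.N' suffix."""
        unique = name
        suffix = 0
        while unique in self.globals:
            suffix += 1
            unique = f"{name}.{suffix}"
        self.globals[unique] = object()
        return unique

    def get_or_create_global(self, name):
        """Reuse an existing global with this name, creating it only if absent."""
        if name in self.globals:
            return name
        return self.create_global(name)
```

Calling `create_global` twice yields a second, renamed symbol that nothing defines, whereas `get_or_create_global` keeps returning the original name.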
Sanjay Patel [Wed, 14 Jul 2021 13:02:31 +0000 (09:02 -0400)]
[SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312
To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah
We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.
Differential Revision: https://reviews.llvm.org/D105730
Gabor Marton [Wed, 9 Jun 2021 15:03:47 +0000 (17:03 +0200)]
[Analyzer][solver] Add dump methods for (dis)equality classes.
This proved to be very useful during debugging.
Differential Revision: https://reviews.llvm.org/D103967
Alexander Shaposhnikov [Wed, 14 Jul 2021 11:33:09 +0000 (04:33 -0700)]
[lld][MachO] Code cleanup
Make use of ArgList::getLastArgValue. NFC.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D105452
Djordje Todorovic [Mon, 28 Jun 2021 12:15:31 +0000 (05:15 -0700)]
[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.
After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of the existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before LiveDebugValues)
a variable's assignment, but this is clobbered for function
parameters during SelectionDAG, since it generates new DBG_VALUEs
after COPY instructions even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, followed by a COPY and then a
DBG_VALUE $virt_reg, and virtregrewrite then rewrote $virt_reg
into $regX, we'd end up with a redundant DBG_VALUE.
This breaks the definition of DBG_VALUE, and some analysis passes
might be built on top of that premise, so this patch tries to fix
the MIR with respect to that.
This first patch performs a backward scan that detects a sequence of
consecutive DBG_VALUEs and removes all DBG_VALUEs describing one
variable except the last one:
For example:
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
...
in this case, we can remove (1).
Combined with the forward scan that will be introduced in the next patch
(from this stack), the statistics show that RemoveRedundantDebugValues
removes 15032 instructions when using gdb-7.11 as a testbed.
Differential Revision: https://reviews.llvm.org/D105279
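The backward scan can be sketched in a few lines of Python. This is a simplified model, assuming the input is a single run of consecutive DBG_VALUEs given as (register, variable) pairs; the function name is hypothetical and the real pass works on MIR instructions:

```python
def remove_redundant_backward(dbg_values):
    """Keep only the last DBG_VALUE for each variable in one consecutive run."""
    seen = set()
    kept = []
    # Walk from the end so the first occurrence we meet is the last in program order.
    for reg, var in reversed(dbg_values):
        if var in seen:
            continue  # an earlier DBG_VALUE for the same variable is redundant
        seen.add(var)
        kept.append((reg, var))
    kept.reverse()
    return kept
```

For the three-instruction example in the message, this drops (1) and keeps (2) and (3).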
Simon Pilgrim [Wed, 14 Jul 2021 11:20:47 +0000 (12:20 +0100)]
[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':
define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
%gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
%gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
%sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
%sel = select i1 %a2, i64 %a1, i64 %a3
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
ret <4 x i32>* %gep
}
This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:
define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
%sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
%sel = select i1 %a2, i64 0, i64 %a1
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
ret <4 x i32>* %gep
}
Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests
Differential Revision: https://reviews.llvm.org/D105901
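The address arithmetic behind the 'fallthrough' fold can be checked with a small Python model. It assumes pointers are plain integers and a <4 x i32> element is 16 bytes, and it ignores `inbounds` and poison semantics; the helper names are illustrative only:

```python
ELEM_SIZE = 16  # bytes in one <4 x i32> element


def gep(base, idx):
    """Toy getelementptr: base address plus idx elements."""
    return base + idx * ELEM_SIZE


def select(cond, a, b):
    """Toy select: a if cond is true, otherwise b."""
    return a if cond else b


def select_of_gep(base, idx, cond):
    """select C, Ptr, (gep Ptr, Idx): the form before the fold."""
    return select(cond, base, gep(base, idx))


def gep_of_select(base, idx, cond):
    """gep Ptr, (select C, 0, Idx): the form after the fold."""
    return gep(base, select(cond, 0, idx))
```

Both forms produce the same address for either value of the condition, since a gep with index 0 is the base pointer.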
Chuanqi Xu [Wed, 14 Jul 2021 11:12:57 +0000 (19:12 +0800)]
[NFC] [Coroutines] Remove unused CoroFree
Bruce Mitchener [Wed, 14 Jul 2021 10:59:08 +0000 (10:59 +0000)]
[lldb][docs] Remove mention of subversion. NFC.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D103744
Simon Pilgrim [Wed, 14 Jul 2021 11:03:16 +0000 (12:03 +0100)]
[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.
We know that "CVTTPS2SI" returns 0x80000000 for out of range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend using the following logic:
small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))
Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency 2, low throughput instruction so this logic is applied there too (in particular for AVX2 also). It furthermore gets rid of one high latency floating point comparison in the previous lowering.
@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).
Original Patch by @TomHender (Tom Hender)
Differential Revision: https://reviews.llvm.org/D89697
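The formula above can be tried out on a single lane with a small Python model. This is only a sketch: it assumes the truncating conversion returns 0x80000000 for any out-of-range or NaN input, uses Python doubles rather than true single-precision values, and the function names are made up for illustration:

```python
import math

INDEFINITE = 0x80000000  # result for out-of-range inputs, per the message


def cvttps2si(x):
    """Toy one-lane model: truncate to int32, returned as a 32-bit pattern."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE
    t = math.trunc(x)
    if t < -(2 ** 31) or t > 2 ** 31 - 1:
        return INDEFINITE
    return t & 0xFFFFFFFF


def sra31(v):
    """Arithmetic right shift by 31 of a 32-bit pattern: all ones or zero."""
    return 0xFFFFFFFF if v & 0x80000000 else 0


def fp_to_ui(x):
    """small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))."""
    small = cvttps2si(x)
    big = cvttps2si(x - 2.0 ** 31)
    return small | (big & sra31(small))
```

For inputs below 2^31 the sign bit of `small` is clear, so the mask is zero and `small` is the answer; for larger inputs `small` is 0x80000000 and the OR adds back the 2^31 that was subtracted.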
LLVM GN Syncbot [Wed, 14 Jul 2021 10:49:08 +0000 (10:49 +0000)]
[gn build] Port
c08dabb0f476
Simon Pilgrim [Wed, 14 Jul 2021 10:48:22 +0000 (11:48 +0100)]
Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)"
Missed some BPF test changes that need addressing
Nico Weber [Wed, 14 Jul 2021 10:43:23 +0000 (06:43 -0400)]
[gn build] (manually) merge
462d4de35b0c
Stefan Pintilie [Wed, 14 Jul 2021 02:15:30 +0000 (21:15 -0500)]
[NFC][PowerPC] Added test to check register allocation for ACC registers
ACC registers are a combination of 4 consecutive vector registers and therefore
sometimes require special treatment for register allocation. This patch only
adds a test.
Stephen Tozer [Tue, 13 Jul 2021 12:31:11 +0000 (13:31 +0100)]
[DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.
Differential Revision: https://reviews.llvm.org/D105831
Simon Pilgrim [Tue, 13 Jul 2021 18:06:13 +0000 (19:06 +0100)]
[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':
define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
%gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
%gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
%sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
%sel = select i1 %a2, i64 %a1, i64 %a3
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
ret <4 x i32>* %gep
}
This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:
define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
%sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
%sel = select i1 %a2, i64 0, i64 %a1
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
ret <4 x i32>* %gep
}
Differential Revision: https://reviews.llvm.org/D105901
Butygin [Tue, 6 Jul 2021 16:11:16 +0000 (19:11 +0300)]
[mlir][SCF] populateSCFStructuralTypeConversionsAndLegality WhileOp support
Differential Revision: https://reviews.llvm.org/D105923
Fraser Cormack [Tue, 13 Jul 2021 16:08:05 +0000 (17:08 +0100)]
[RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.
There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D105902
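The difference between the two candidate neutral elements is easy to reproduce with IEEE-754 doubles. A small Python sketch (the helper names are illustrative; the reduction is a plain sequential sum, not the RISC-V lowering):

```python
import math
from functools import reduce


def fadd_reduce(values, start):
    """Sequential floating-point add reduction seeded with a start value."""
    return reduce(lambda acc, v: acc + v, values, start)


def is_negative_zero(x):
    """True only for -0.0, which compares equal to 0.0 but has the sign bit set."""
    return x == 0.0 and math.copysign(1.0, x) < 0
```

Reducing a vector of all -0.0 lanes with a +0.0 seed gives +0.0, which loses the sign; seeding with -0.0 preserves it, and for any other input the seed has no effect on the result.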
Sebastian Neubauer [Wed, 14 Jul 2021 08:03:54 +0000 (10:03 +0200)]
[AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
Sebastian Neubauer [Tue, 13 Jul 2021 14:37:15 +0000 (16:37 +0200)]
[AMDGPU] Precommit flat-scratch-init.ll test
Cullen Rhodes [Wed, 14 Jul 2021 08:01:19 +0000 (08:01 +0000)]
[AArch64][SME] Add matrix register definitions and parsing support
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.
SME instructions consist of three types of matrix operands:
* Tiles: a ZA tile is a square, two-dimensional sub-array of elements
within the ZA array. These tiles make up the larger accumulator array
and the granularity varies based on the element size, i.e.
- ZAQ0..ZAQ15 (smallest tile granule)
- ZAD0..ZAD7
- ZAS0..ZAS3
- ZAH0..ZAH1
or ZAB0 (largest tile granule, single tile)
* Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
to tell how the vector at [reg+offset] is laid out in the tile,
horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
to vectors in registers ZAH1 and ZAQ15, respectively.
* Accumulator matrix: this is the entire accumulator array ZA.
This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.
The ADDHA and ADDVA instructions, which operate on tiles, are also added
in this patch to make some use of the code added; later patches will
make use of the other operands introduced here.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Co-authored by: Sander de Smalen (@sdesmalen)
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105570
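The tile naming listed above follows a simple pattern that can be written down in Python. This is a sketch under one assumption read off that list: the number of tiles for a given element size equals the element width in bytes (B=1, H=2, S=4, D=8, Q=16). The table and function names are illustrative only:

```python
# Element-size suffix -> element width in bytes (assumed, see lead-in).
ELEMENT_BYTES = {"B": 1, "H": 2, "S": 4, "D": 8, "Q": 16}


def tile_names(suffix):
    """List the ZA tile register names for one element-size suffix."""
    count = ELEMENT_BYTES[suffix]  # one tile per byte of element width
    return [f"ZA{suffix}{i}" for i in range(count)]
```

For example, `tile_names("H")` gives ZAH0 and ZAH1, and `tile_names("B")` gives the single tile ZAB0.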
Sam McCall [Fri, 9 Jul 2021 07:40:18 +0000 (09:40 +0200)]
[clangd] Add CMake option to (not) link in clang-tidy checks
This reduces the size of the dependency graph and makes incremental
development a little more pleasant (less rebuilding).
This introduces a bit of complexity/fragility as some tests verify
clang-tidy behavior. I attempted to isolate these and build/run as much
of the tests as possible in both configs to prevent rot.
Expectation is that (some) developers will use this locally, but
buildbots etc will keep testing clang-tidy.
Fixes https://github.com/clangd/clangd/issues/233
Differential Revision: https://reviews.llvm.org/D105679
Ruiling Song [Thu, 8 Jul 2021 01:42:06 +0000 (09:42 +0800)]
[AMDGPU] Don't handle export done when unifying exit nodes
This patch aims to revert the changes introduced by D70781, D71192 and D76364.
D70781 was introduced to fix a hardware hang where we did not insert
exp-null-done for a kill inside an infinite loop. At that time we had not added
exp-null-done for kill early termination, but I believe that we now
always add the exp-null-done for the early termination case in SILateBranchLowering.
D71192 was introduced to handle the only_kill case, which is also
handled by the kill early termination work.
D76364 was used to fix a regression caused by D71192, where we cleared the done
bit of the export in the existing program and did not let the normal return
block branch to the new unified return block.
With this change, we just trust that frontends have set up exp-done correctly,
which is true for all existing frontends. The backend only inserts
exp-null-done for the kill cases, which is handled in SILateBranchLowering.cpp.
Reviewed by: critson
Differential Revision: https://reviews.llvm.org/D105610
Ruiling Song [Thu, 8 Jul 2021 03:09:33 +0000 (11:09 +0800)]
[NFC][AMDGPU] autogenerate kill-infinite-loop.ll checks
This would help us to track the assembly changes to these tests.
Reviewed by: foad
Differential Revision: https://reviews.llvm.org/D105609
Ruiling Song [Thu, 17 Jun 2021 22:40:44 +0000 (06:40 +0800)]
[RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we resolve lane/subregister conflicts by visiting
instructions sequentially in the current block to see whether there is any
use of the tainted lanes. To save compile time, we do not check
successor blocks any further. This sounds reasonable without subregister liveness.
But since we have added subregister liveness tracking capability to the
register coalescer, we can easily determine whether we have a subregister
liveness conflict by checking subranges. This would help coalesce more
COPYs for targets that enable subregister liveness tracking.
Reviewed by: arsenm, qcolombet
Differential Revision: https://reviews.llvm.org/D104509
Kito Cheng [Tue, 29 Jun 2021 07:23:55 +0000 (15:23 +0800)]
[RISCV] Pass -u to linker correctly.
`-u` is a linker option used to pretend a symbol is undefined;
it is commonly used to force archive member extraction.
This option should be passed to `ld`, and many other toolchains in Clang,
such as `tools::gnutools`, already pass it.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D105091
Martin Storsjö [Tue, 13 Jul 2021 12:39:54 +0000 (12:39 +0000)]
[libcxx] [test] Clarify weak_ptr_ret on Windows, remove a LIBCXX-WINDOWS-FIXME
On Windows, structs with a destructor are always returned indirectly;
add this to the list of known exceptions in the test where the class
isn't returned in registers as expected.
Differential Revision: https://reviews.llvm.org/D105906
Dmitry Vyukov [Mon, 12 Jul 2021 19:06:28 +0000 (12:06 -0700)]
sanitizer_common: add simpler ThreadRegistry ctor
Currently ThreadRegistry is overcomplicated because of tsan:
it needs tid quarantine and reuse counters. Other sanitizers
don't need that. It also seems that no other sanitizer now
needs a max number of threads. Asan used to need the 2^24 limit,
but it does not seem to be needed now. Other sanitizers blindly
copy-pasted that without reason. Lsan also uses quarantine,
but I don't see why that might be needed.
Add a ThreadRegistry ctor that does not require any sizes
and use it in all sanitizers except for tsan.
This is in preparation for the new tsan runtime, which won't need
any of these parameters either.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D105713
Yuichi Yoshida [Wed, 14 Jul 2021 05:47:31 +0000 (05:47 +0000)]
Reformulate OrcJIT tutorial doc to make it more clear.
Fixed a minor writing error. The text was hard to understand.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D105899
Zakk Chen [Wed, 14 Jul 2021 03:32:55 +0000 (20:32 -0700)]
[RISCV] Support overloading for RVV miscellaneous functions.
Based on this update to the intrinsic doc
https://github.com/riscv/rvv-intrinsic-doc/pull/103
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D105611
Vitaly Buka [Wed, 14 Jul 2021 04:29:14 +0000 (21:29 -0700)]
[sanitizer] Fix type error in python 3
Vitaly Buka [Wed, 14 Jul 2021 03:45:04 +0000 (20:45 -0700)]
[sanitizer] Upgrade android scripts to python 3
David Green [Wed, 14 Jul 2021 03:40:47 +0000 (04:40 +0100)]
Revert "[clang] Refactor AST printing tests to share more infrastructure"
This reverts commit
20176bc7dd3f431db4c3d59b51a9f53d52190c82 as some
versions of GCC do not seem to handle the new code very well. They
complain about:
/tmp/ccqUQZyw.s: Assembler messages:
/tmp/ccqUQZyw.s:1151: Error: symbol `_ZNSt14_Function_base13_Base_managerIN5clangUlPKNS1_4StmtEE2_EE10_M_managerERSt9_Any_dataRKS7_St18_Manager_operation' is already defined
/tmp/ccqUQZyw.s:11963: Error: symbol `_ZNSt17_Function_handlerIFbPKN5clang4StmtEENS0_UlS3_E2_EE9_M_invokeERKSt9_Any_dataOS3_' is already defined
This seems like it is some GCC issue, but multiple buildbots (and my
local machine) are all failing because of it.
Vitaly Buka [Wed, 14 Jul 2021 03:38:45 +0000 (20:38 -0700)]
[sanitizer] Convert script to python 3
Michael Kruse [Wed, 14 Jul 2021 03:33:56 +0000 (22:33 -0500)]
[Polly] Fix typo. NFC.
Thanks to Mugerwa Martin for reporting.
Jinsong Ji [Wed, 14 Jul 2021 03:29:06 +0000 (03:29 +0000)]
[AIX] Update testcase to use aix triple
Now that we have implemented the basic MCAsmParser, we can use the triple
directly.
Hongtao Yu [Wed, 14 Jul 2021 02:49:50 +0000 (19:49 -0700)]
[CSSPGO][llvm-profgen] Fix a missing initialization
Fixing a missing initialization that was accidentally caused by https://reviews.llvm.org/D103178 .
Hongtao Yu [Wed, 14 Jul 2021 02:48:58 +0000 (19:48 -0700)]
Revert "[CSSPGO][llvm-profgen] Fix a missing initialization"
This reverts commit
fef5f4456abcb1ea052206db6c232468d70b07f2.
Hongtao Yu [Wed, 14 Jul 2021 02:45:48 +0000 (19:45 -0700)]
[CSSPGO][llvm-profgen] Fix a missing initialization
Fixing a missing initialization that was accidentally caused by https://reviews.llvm.org/D103178 .
Shilei Tian [Wed, 14 Jul 2021 02:28:26 +0000 (22:28 -0400)]
[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.
In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.
In the future we will support more functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105787
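The folding decision for one call site can be sketched in Python. This is a simplified model, assuming the set of reaching kernel entries and each kernel's execution mode are already known; `fold_is_spmd` and its arguments are hypothetical names, and the fixpoint iteration of the real analysis is left out:

```python
def fold_is_spmd(reaching_kernels, kernel_is_spmd):
    """Return the constant a call would fold to, or None if it cannot be folded.

    reaching_kernels: kernel entries that can reach the calling function.
    kernel_is_spmd:   mapping from kernel entry to True (SPMD) or False (generic).
    """
    modes = {kernel_is_spmd[k] for k in reaching_kernels}
    if len(modes) == 1:
        return modes.pop()  # every reaching kernel agrees on the mode
    return None             # mixed modes, or no known reaching kernel
```

A call is folded only when all kernels that can reach it run in the same mode; a single disagreeing kernel keeps the runtime call in place.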
Philip Reames [Tue, 13 Jul 2021 20:30:44 +0000 (13:30 -0700)]
[SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into
the path where we can actually get a zero stride flowing through. Some
fairly simple proofs handle the cases which show up in practice. The
only test changes are the cases where we really do need a non-zero
divider to produce the right result.
Recommitting with isLoopInvariant() check.
Differential Revision: https://reviews.llvm.org/D105921
Richard Smith [Wed, 14 Jul 2021 01:28:45 +0000 (18:28 -0700)]
Fix test trying to write a spurious output file into the source
directory.
This causes test failures if the source directory is read-only.
Hongtao Yu [Wed, 14 Jul 2021 01:30:16 +0000 (18:30 -0700)]
[NFC][CSSPGO] Rename the name of an enum value.
Hongtao Yu [Wed, 30 Jun 2021 23:52:37 +0000 (16:52 -0700)]
[CSSPGO] Do not import pseudo probe desc in thinLTO
Previously we relied on pseudo probe descriptors to look up precomputed GUID during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUID for functions can be computed in place from dwarf names. This eliminates the need to import pseudo probe descs in thinlto, since those descs should be emitted by the original modules.
This significantly reduces thinlto memory footprint in some extreme cases where the number of imported modules for a single module is massive.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D105248
Hongtao Yu [Mon, 12 Jul 2021 16:47:05 +0000 (09:47 -0700)]
[CSSPGO][llvm-profgen] Allow multiple executable load segments.
The linker or post-link optimizer can create an ELF image with multiple executable segments, each of which will be loaded separately at run time. This breaks the assumption of llvm-profgen, which currently supports only one base load address. As a result, the subsequent mmap events are treated as an overwrite of the first mmap event, which in turn screws up address mapping. Since it is non-trivial to support multiple separate load addresses, and given that on x64 those segments will always be loaded at consecutive addresses (though via separate mmap syscalls), I'm adding error-checking logic to bail out if that's violated and keep using a single load address, which is the address of the first executable segment.
Also changing the disassembly output from printing section offset to printing the virtual address instead, which matches the behavior of objdump.
Differential Revision: https://reviews.llvm.org/D103178
Vitaly Buka [Wed, 14 Jul 2021 00:26:07 +0000 (17:26 -0700)]
[NFC][sanitizer] Simplify MapPackedCounterArrayBuffer
Jon Roelofs [Wed, 14 Jul 2021 01:05:30 +0000 (18:05 -0700)]
[AArch64] rm unused subreg's
Jon Roelofs [Wed, 14 Jul 2021 00:08:45 +0000 (17:08 -0700)]
[AArch64] Fix AArch64::dsub's size
Arthur Eubanks [Wed, 14 Jul 2021 00:51:44 +0000 (17:51 -0700)]
Revert "[SCEV] Handle zero stride correctly in howManyLessThans"
This reverts commit 4df591b5c960affd1612e330d0c9cd3076c18053.
Causes crashes, see comments on D105921.
Vitaly Buka [Wed, 14 Jul 2021 00:42:59 +0000 (17:42 -0700)]
Revert "[NFC][sanitizer] Simplify MapPackedCounterArrayBuffer"
Does not compile.
This reverts commit 8725b382b0a5ea375252d966bafbace62a21e93b.
Jessica Paquette [Tue, 13 Jul 2021 22:21:58 +0000 (15:21 -0700)]
[AArch64][GlobalISel] Mark v2s64 -> v2p0 G_INTTOPTR as legal
Allow
```
%x:_<2 x p0> = G_INTTOPTR %y:_<2 x s64>
```
This shows up when building clang for AArch64 with GlobalISel.
Also show that we can select it.
This should match SDAG's behaviour: https://godbolt.org/z/33oqYoaYv
Differential Revision: https://reviews.llvm.org/D105944
Vitaly Buka [Wed, 14 Jul 2021 00:26:07 +0000 (17:26 -0700)]
[NFC][sanitizer] Simplify MapPackedCounterArrayBuffer
Vitaly Buka [Tue, 13 Jul 2021 23:51:18 +0000 (16:51 -0700)]
[NFC][sanitizer] Move MemoryMapper template parameter
Matt Arsenault [Tue, 13 Jul 2021 23:33:38 +0000 (19:33 -0400)]
AMDGPU: Try to fix test failure with EXPENSIVE_CHECKS
The machine verifier is enabled by default for EXPENSIVE_CHECKS, so
the pass runs of it would pollute the output here.
Dmitry Vyukov [Tue, 13 Jul 2021 22:34:58 +0000 (15:34 -0700)]
sanitizer_common: optimize memory drain
Currently we allocate MemoryMapper per size class.
MemoryMapper mmap's and munmap's internal buffer.
This results in 50 mmap/munmap calls under the global
allocator mutex. Reuse MemoryMapper and the buffer
for all size classes. This radically reduces the number of
mmap/munmap calls. Smaller size classes tend to have
more objects allocated, so it's highly likely that
the buffer allocated for the first size class will
be enough for all subsequent size classes.
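The effect of reusing the buffer can be illustrated with a small model. This is only a sketch: `ReusableBuffer` and both drain functions are hypothetical stand-ins, and the counter simulates mmap calls rather than performing them.

```python
class ReusableBuffer:
    """Hypothetical mapper that owns one scratch buffer."""

    def __init__(self):
        self.capacity = 0
        self.map_calls = 0  # number of simulated mmap calls

    def acquire(self, size):
        # Map a new buffer only when the current one is too small.
        if size > self.capacity:
            self.capacity = size
            self.map_calls += 1
        return self.capacity


def drain_per_class(buffer_sizes):
    """Old scheme: a fresh mapper, and so a fresh mapping, per size class."""
    calls = 0
    for size in buffer_sizes:
        mapper = ReusableBuffer()
        mapper.acquire(size)
        calls += mapper.map_calls
    return calls


def drain_shared(buffer_sizes):
    """New scheme: one mapper shared by all size classes."""
    mapper = ReusableBuffer()
    for size in buffer_sizes:
        mapper.acquire(size)
    return mapper.map_calls


# Smaller size classes need larger buffers, so sizes shrink as classes grow.
sizes = [4096, 2048, 1024, 512]
per_class_calls = drain_per_class(sizes)
shared_calls = drain_shared(sizes)
```

With buffer sizes that shrink from the first size class onward, the shared mapper maps once while the per-class scheme maps once per class.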
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D105778
Arthur Eubanks [Tue, 13 Jul 2021 19:50:34 +0000 (12:50 -0700)]
[NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch
To help with debugging non-trivial unswitching issues.
Don't care about the legacy pass, nobody is using it.
If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't
default to the empty constructor for the pass params. We should still
let the parser take care of it in case the parser has its own defaults.
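A sketch of why an empty parameter string should still go through the parser. The option names, the `no-` prefix convention, and the default values here are hypothetical and are not taken from the patch.

```python
def parse_unswitch_options(params: str) -> dict:
    """Parse a ';'-separated parameter string into an options dict."""
    # The parser owns the defaults, so an empty string still yields them.
    options = {"nontrivial": False, "trivial": True}
    for token in filter(None, params.split(";")):
        enable = not token.startswith("no-")
        name = token if enable else token[len("no-"):]
        if name not in options:
            raise ValueError(f"unknown parameter: {token}")
        options[name] = enable
    return options


defaults = parse_unswitch_options("")
debugging = parse_unswitch_options("no-trivial;nontrivial")
```

Routing the empty case through the same function keeps the defaults in one place, instead of duplicating them in a separate default constructor.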
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D105933
Vitaly Buka [Tue, 13 Jul 2021 22:58:55 +0000 (15:58 -0700)]
[NFC][sanitizer] Don't store region_base_ in MemoryMapper
Part of D105778
Matt Arsenault [Wed, 26 Sep 2018 23:36:28 +0000 (09:36 +1000)]
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
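The callback scheme can be sketched as two filtered runs over the same set of virtual registers. The names below (`allocate`, the register class strings) are illustrative assumptions, not the actual regalloc interfaces.

```python
from typing import Callable, Dict, List


def allocate(virt_regs: Dict[str, str],
             should_allocate_class: Callable[[str], bool]) -> List[str]:
    """Return the virtual registers handled by this run.

    `virt_regs` maps a virtual register name to its register class. Registers
    whose class the callback rejects are left over for a later run.
    """
    return [reg for reg, reg_class in virt_regs.items()
            if should_allocate_class(reg_class)]


virt_regs = {"%0": "SGPR", "%1": "VGPR", "%2": "SGPR"}

# First run: SGPRs only, so the number of VGPRs needed for spills is known.
first_run = allocate(virt_regs, lambda rc: rc == "SGPR")

# Second run: the leftover virtual registers, VGPRs only.
remaining = {r: c for r, c in virt_regs.items() if r not in first_run}
second_run = allocate(remaining, lambda rc: rc == "VGPR")
```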
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
Eli Friedman [Tue, 13 Jul 2021 21:48:47 +0000 (14:48 -0700)]
[ScalarEvolution] Make isKnownNonZero handle more cases.
Using an unsigned range instead of signed ranges is a bit more precise.
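A small example of why the unsigned range can be more precise. This is a sketch over 8-bit values with hypothetical helper names; it does not reproduce the ScalarEvolution implementation.

```python
def to_signed(value: int, bits: int = 8) -> int:
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def signed_hull(umin: int, umax: int, bits: int = 8):
    """Smallest signed interval covering the unsigned range [umin, umax]."""
    values = [to_signed(v, bits) for v in range(umin, umax + 1)]
    return min(values), max(values)


def known_nonzero_signed(smin: int, smax: int) -> bool:
    # Provable only if the range is strictly positive or strictly negative.
    return smin > 0 or smax < 0


def known_nonzero_unsigned(umin: int) -> bool:
    # Zero is the smallest unsigned value, so a nonzero minimum excludes it.
    return umin != 0


# An 8-bit value known to lie in the unsigned range [1, 200].
smin, smax = signed_hull(1, 200)
```

The unsigned range [1, 200] wraps past the sign boundary, so its signed hull spans zero and the signed test fails, while the unsigned minimum of 1 proves the value is non-zero.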
Differential Revision: https://reviews.llvm.org/D105941
Vitaly Buka [Tue, 13 Jul 2021 21:54:24 +0000 (14:54 -0700)]
[NFC][sanitizer] Extract DrainHalfMax
Part of D105778
Vitaly Buka [Tue, 13 Jul 2021 22:31:54 +0000 (15:31 -0700)]
[NFC][sanitizer] Rename some MemoryMapper members
Part of D105778
Geoffrey Martin-Noble [Tue, 13 Jul 2021 19:57:31 +0000 (12:57 -0700)]
[NFC][MLIR][std] Clean up ArithmeticCastOps
The documentation on these was out of sync with the implementation. Also
the declaration of inputs was repeated when it is already part of the
ArithmeticCastOp definition.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D105934
Victor Huang [Tue, 13 Jul 2021 19:57:08 +0000 (14:57 -0500)]
[PowerPC] Add PowerPC compare and multiply related builtins and intrinsics for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and intrinsics for compare
and multiply related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D102875
MaheshRavishankar [Tue, 13 Jul 2021 21:51:20 +0000 (14:51 -0700)]
[mlir][Tensor] Implement `reifyReturnTypeShapesPerResultDim` for `tensor.insert_slice`.
Differential Revision: https://reviews.llvm.org/D105852
Aart Bik [Tue, 13 Jul 2021 19:13:39 +0000 (12:13 -0700)]
[mlir][sparse] add support for std unary operations
Adds zero-preserving unary operators from std. Also adds xor.
Performs minor refactoring to remove the "zero" node, and pushes
the irregular logic for negi (not supported in std) into one place.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105928
Adam Paszke [Tue, 13 Jul 2021 21:35:50 +0000 (14:35 -0700)]
Add more types to the LLVM dialect C API
This includes:
- void type
- array types
- function types
- literal (unnamed) struct types
Reviewed By: jpienaar, ftynse
Differential Revision: https://reviews.llvm.org/D105908
Derek Schuff [Tue, 13 Jul 2021 21:31:19 +0000 (14:31 -0700)]
[WebAssembly] Run varargs codegen test with non-emscripten triple
This is a followup from D105749 to cover both triples in the case
where they differ.
Alexander Yermolovich [Tue, 13 Jul 2021 19:11:53 +0000 (12:11 -0700)]
[LLD] Adding support for RELA for CG Profile.
This is a follow-up to https://reviews.llvm.org/D104080 and https://github.com/llvm/llvm-project/commit/ca3bdb57fa1ac98b711a735de048c12b5fdd8086#diff-e64a48fabe31db213a631fdc5f2acb51bdddf3f16a8fb2928784f4c579229585. The implementation of the call graph profile was changed from a black-box section to a relocation-based approach. This was done to be compatible with post-processing tools like strip/objcopy and their LLVM equivalents. When they are invoked on an object file before the final linking step, the new approach preserves the correctness of the symbol indices.
Unlike the LLVM tools, the GNU binutils tools change the REL section to a RELA section, for example when strip -S is run on the ELF object files as an intermediate step before linking. To preserve compatibility, this patch extends the implementation in LLD and ELFDumper to support both REL and RELA sections for the call graph profile.
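The difference between the two section kinds can be sketched by reading symbol indices from raw relocation entries. This is only an illustration, assuming 64-bit little-endian ELF; the function name is hypothetical and this is not the LLD or ELFDumper code.

```python
import struct

REL_FMT = "<QQ"    # Elf64_Rel: r_offset, r_info
RELA_FMT = "<QQq"  # Elf64_Rela: r_offset, r_info, r_addend


def symbol_indices(data: bytes, is_rela: bool):
    """Extract the symbol index from each relocation entry in `data`."""
    fmt = RELA_FMT if is_rela else REL_FMT
    entry_size = struct.calcsize(fmt)
    indices = []
    for offset in range(0, len(data), entry_size):
        fields = struct.unpack_from(fmt, data, offset)
        r_info = fields[1]
        # For 64-bit ELF the symbol index is the upper 32 bits of r_info.
        indices.append(r_info >> 32)
    return indices


# The same two relocations, encoded as REL and as RELA with a zero addend.
rel_data = struct.pack(REL_FMT, 0, 5 << 32) + struct.pack(REL_FMT, 8, 7 << 32)
rela_data = (struct.pack(RELA_FMT, 0, 5 << 32, 0)
             + struct.pack(RELA_FMT, 8, 7 << 32, 0))
```

Only the entry size differs between the two encodings, so a reader that selects the entry format from the section type recovers the same symbol indices from either one.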
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D105217
Hedin Garca [Tue, 13 Jul 2021 17:19:58 +0000 (17:19 +0000)]
[libc] Capture floating point encoding and arrange it sequentially in memory
Redefined FPBits.h and LongDoubleBitsX86 so that the implementation works on both
Windows and Linux while maintaining a packed memory layout for the floating point
encoding, so that its size in memory matches that of the floating point data type.
This change was necessary because the previous __attribute__((packed)) specification in the struct did not work
on Windows as it did on Linux, and consequently the static_asserts in FPBits.h were failing.
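As an illustration of a floating point encoding laid out sequentially, the sketch below splits an IEEE 754 single-precision value into its fields. The function name is hypothetical and this is not the FPBits.h implementation; it only shows that the fields together fill exactly the size of the data type.

```python
import struct

SIGN_BITS, EXPONENT_BITS, MANTISSA_BITS = 1, 8, 23


def float32_fields(value: float):
    """Return (sign, biased exponent, mantissa) of a float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    sign = bits >> 31
    exponent = (bits >> MANTISSA_BITS) & 0xFF
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    return sign, exponent, mantissa


# The three fields occupy exactly as many bits as the float itself.
total_bits = SIGN_BITS + EXPONENT_BITS + MANTISSA_BITS
float_bits = struct.calcsize("<f") * 8
```

The equality of `total_bits` and `float_bits` corresponds to the kind of size check that the static_asserts mentioned above enforce.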
Reviewed By: aeubanks, sivachandra
Differential Revision: https://reviews.llvm.org/D105561
Caitlyn Cano [Thu, 8 Jul 2021 17:44:10 +0000 (17:44 +0000)]
[libc] Don't pass -fpie/-ffreestanding on Windows
The current compile options function hardcodes the -fpie and
-ffreestanding flags, which don't exist on Windows. This patch sets the
compilation flags conditionally based on the OS specifics.
Reviewed By: sivachandra, aeubanks
Differential Revision: https://reviews.llvm.org/D105643
Vitaly Buka [Tue, 13 Jul 2021 20:37:29 +0000 (13:37 -0700)]
[sanitizer] Few more NFC changes from D105778
Philip Reames [Tue, 13 Jul 2021 20:30:44 +0000 (13:30 -0700)]
[SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result.
Differential Revision: https://reviews.llvm.org/D105921
Martin Storsjö [Tue, 13 Jul 2021 11:24:51 +0000 (14:24 +0300)]
[libcxx] [docs] Acknowledge that the library is known to work in some configs outside of what's tested in CI
Differential Revision: https://reviews.llvm.org/D105888
Vitaly Buka [Tue, 13 Jul 2021 20:16:46 +0000 (13:16 -0700)]
[NFC][sanitizer] Move MemoryMapper out of SizeClassAllocator64
Part of D105778