Andrea Di Biagio [Fri, 7 May 2021 19:20:03 +0000 (20:20 +0100)]
[MCA][RegisterFile] Fix register class check for move elimination (PR50265)
The register file should always check if the destination register is from a
register class that allows move elimination.
Before this change, the check on the register class was only performed in a few
very specific cases. However, it should have always been performed.
This patch fixes the issue.
Note that none of the upstream scheduling models is currently affected by this
bug, so there is no test for it. The issue was found by Roman while working on
the znver3 model. I was able to reproduce the issue locally by tweaking the
btver2 model. I then verified that this patch fixes the issue.
Olivier Goffart [Fri, 7 May 2021 20:23:53 +0000 (13:23 -0700)]
[SEH] Fix regression with SEH in noexcept functions
Commit 5baea0560160a693b19022c5d0ba637b6b46b2d8 set the CurCodeDecl
because it was needed to pass the assert in CodeGenFunction::EmitLValueForLambdaField.
But this was not right to do, as CodeGenFunction::FinishFunction passes it to EmitEndEHSpec
and causes corruption of the EHStack.
Revert the part of the commit that changes the CurCodeDecl, and instead
adjust the assert to check for a null CurCodeDecl.
Differential Revision: https://reviews.llvm.org/D102027
Florian Hahn [Fri, 7 May 2021 20:05:58 +0000 (21:05 +0100)]
[LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.
Discussed post-commit for ccebf7a1096a.
Florian Hahn [Fri, 7 May 2021 19:21:36 +0000 (20:21 +0100)]
[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.
Suggested post-commit for ccebf7a1096a.
peter klausler [Thu, 6 May 2021 20:50:12 +0000 (13:50 -0700)]
[flang] Implement NORM2 in the runtime
Implement the reduction transformational intrinsic function NORM2 in
the runtime, using infrastructure already in place for MAXVAL & al.
Differential Revision: https://reviews.llvm.org/D102024
Petr Hosek [Fri, 7 May 2021 06:05:39 +0000 (23:05 -0700)]
[BareMetal] Ensure that sysroot always comes after library paths
This addresses an issue introduced in D91559. We would invoke the
compiler with -Lpath/to/lib --sysroot=path/to/sysroot where both
locations contain libraries with the same name, but we expect the linker
to pick up the library in path/to/lib since that version is more
specialized. This was the case before D91559, where the sysroot path
would be ignored, but after that change the linker picks up the
library from the sysroot, which results in unexpected behavior.
The sysroot path should always come after any user-provided library
paths, followed by compiler runtime paths. We want libraries in
user-provided library paths to always take precedence over sysroot libraries.
This matches the behavior of other toolchains used with other targets.
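
The ordering described above can be sketched as a first-match search over an
ordered list of directories. This is only an illustration, not the driver's
code: build_search_path, resolve_library, the directory names and the
libfoo.a file name are all hypothetical.

```python
def build_search_path(user_paths, sysroot_lib, runtime_paths):
    """Order library directories: user -L paths, then sysroot, then compiler runtime."""
    return list(user_paths) + [sysroot_lib] + list(runtime_paths)


def resolve_library(name, search_path, contents):
    """Return the first directory in search_path that contains `name`, or None.

    `contents` is a hypothetical stand-in for the file system: it maps a
    directory name to the set of library file names found there.
    """
    for directory in search_path:
        if name in contents.get(directory, ()):
            return directory
    return None
```

With a library of the same name in both path/to/lib and the sysroot, the
first match is the copy in path/to/lib, which is the precedence the commit asks for.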
Differential Revision: https://reviews.llvm.org/D102049
Hsiangkai Wang [Fri, 7 May 2021 07:18:11 +0000 (15:18 +0800)]
[RISCV] Consider scalar types for required extensions.
We have vector operations on double vector and float scalar. For
example, vfwadd.wf is such an instruction.
vfloat64m1_t vfwadd_wf(vfloat64m1_t op0, float op1, size_t op2);
We should specify F and D extensions for it.
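
A rough sketch of the idea, assuming the required extensions are collected
from every operand type rather than from the vector type alone. The
FP_EXTENSION table and required_extensions are hypothetical names used for
illustration; they are not part of the actual intrinsic generator.

```python
# Hypothetical mapping from a floating-point element type to the RISC-V
# extension that provides it (F: single precision, D: double precision).
FP_EXTENSION = {"float": "F", "double": "D"}


def required_extensions(operand_element_types):
    """Collect the extensions needed by all operands, vector and scalar alike."""
    return {
        FP_EXTENSION[element_type]
        for element_type in operand_element_types
        if element_type in FP_EXTENSION
    }
```

For vfwadd_wf the operand element types would be double (from
vfloat64m1_t), float and size_t, so the sketch yields both F and D, whereas
looking at the vector type alone would yield only D.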
Differential Revision: https://reviews.llvm.org/D102051
Vyacheslav Zakharin [Fri, 7 May 2021 19:42:04 +0000 (12:42 -0700)]
An attempt to abandon omptarget out-of-tree builds.
I want to start using LLVM component libraries in libomptarget
to stop duplicating implementations already available in LLVM
(e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM
in all libomptarget builds one has to provide fallback implementation
for each used LLVM feature.
This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget.
I understand that I may need to revert this,
if this affects downstream projects in a bad way.
Differential Revision: https://reviews.llvm.org/D101509
Alexander Belyaev [Fri, 7 May 2021 19:20:55 +0000 (21:20 +0200)]
[mlir] Add a pattern to bufferize std.index_cast.
Differential Revision: https://reviews.llvm.org/D102088
Alexander Belyaev [Fri, 7 May 2021 19:21:54 +0000 (21:21 +0200)]
[mlir] Add a pattern to bufferize linalg.tensor_reshape.
Differential Revision: https://reviews.llvm.org/D102089
Emilio Cota [Fri, 7 May 2021 19:23:01 +0000 (19:23 +0000)]
[mlir][docs] remove stale statement about index type in vectors
b614ada0e8 ("[mlir] add support for index type in vectors.") removed
this limitation.
Differential Revision: https://reviews.llvm.org/D102081
Arthur Eubanks [Fri, 7 May 2021 19:05:16 +0000 (12:05 -0700)]
Revert "[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST"
This reverts commit 0791f968fee259e5c34523167bd58179b8b081c2.
Causing crashes: https://crbug.com/1206764
Florian Hahn [Fri, 7 May 2021 18:39:05 +0000 (19:39 +0100)]
[SCEV] Be more careful when traversing phis in isImpliedViaMerge.
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
IncV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works for irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if it will catch
all cases and I would appreciate a thorough second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
Thomas Lively [Fri, 7 May 2021 18:50:19 +0000 (11:50 -0700)]
[WebAssembly] Use functions instead of macros for const SIMD intrinsics
To improve hygiene, consistency, and usability, it would be good to replace all
the macro intrinsics in wasm_simd128.h with functions. The reason for using
macros in the first place was to enforce the use of constants for some arguments
using `_Static_assert` with `__builtin_constant_p`. This commit switches to
using functions and uses the `__diagnose_if__` attribute rather than
`_Static_assert` to enforce constantness.
The remaining macro intrinsics cannot be made into functions until the builtin
functions they are implemented with can be replaced with normal code patterns
because the builtin functions themselves require that their arguments are
constants.
This commit also fixes a bug with the const_splat intrinsics in which the f32x4
and f64x2 variants were incorrectly producing integer vectors.
Differential Revision: https://reviews.llvm.org/D102018
Fangrui Song [Fri, 7 May 2021 18:42:16 +0000 (11:42 -0700)]
[unittest] Fix -Wunused-variable after D94717
Krzysztof Parzyszek [Fri, 7 May 2021 17:52:20 +0000 (12:52 -0500)]
Allow empty value list in propagateMetadata(Inst, ArrayOf...)
This will allow writing
propagateMetadata(Inst, collectInterestingValues(...))
without concern about empty lists. In case of an empty list,
Inst is returned without any changes.
Fangrui Song [Fri, 7 May 2021 18:15:43 +0000 (11:15 -0700)]
Internalize some cl::opt global variables or move them under namespace llvm
Louis Dionne [Fri, 7 May 2021 17:57:07 +0000 (13:57 -0400)]
[libc++][ci] Run longer CI jobs first
Jobs that test with a more recent standard version run more tests, so
they take longer. We'll decrease the average latency by running them
first instead of last.
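
One way to see the effect is to model the pipeline as a fixed number of CI
agents that each pick up the next queued job as soon as they are free. The
sketch below measures the total wall-clock time of the pipeline, which is a
simplification of the latency mentioned above; makespan and the job
durations are hypothetical and not taken from the real CI configuration.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time to finish all jobs when each free worker takes the next job in order."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)
```

With two agents and job durations 2, 2, 2, 2 and 8, queuing the long job
last gives a total of 12, while queuing it first gives 8.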
Saleem Abdulrasool [Fri, 7 May 2021 17:18:28 +0000 (10:18 -0700)]
lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162
Revert the 32-process cap on Windows. When testing with Swift, we found
that there was a time reduction for testing with the higher load. This
should hopefully not matter much in practice. In the case that the
original problem with python remains with a high subprocess count, we
can easily revert this change.
Roman Lebedev [Fri, 7 May 2021 17:05:30 +0000 (20:05 +0300)]
[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile
Roman Lebedev [Fri, 7 May 2021 16:36:37 +0000 (19:36 +0300)]
[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move
It measures as such, and the reference docs agree.
I can't easily add a MCA test, because there's no mnemonic for it,
it can only be disassembled or created as a MCInst.
Fangrui Song [Fri, 7 May 2021 16:44:26 +0000 (09:44 -0700)]
[AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 & 46788a21f9152be3950e57dc526454655682bdd4
With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.
For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.
Note: the 'S' inline assembly constraint refers to an absolute symbolic address
or a label reference (D46745).
Differential Revision: https://reviews.llvm.org/D101872
Matt Morehouse [Fri, 7 May 2021 16:11:45 +0000 (09:11 -0700)]
[libFuzzer] Fix stack-overflow-with-asan.test.
Fix function return type and remove check for SUMMARY, since it doesn't
seem to be output on Windows.
Whitney Tsang [Fri, 7 May 2021 15:36:55 +0000 (15:36 +0000)]
[LoopNest] Consider loop nest with inner loop guard using outer loop
induction variable to be perfect
This patch allows more conditional branches to be considered as loop
guards, and so more loop nests can be considered perfect.
Reviewed By: bmahjour, sidbav
Differential Revision: https://reviews.llvm.org/D94717
Simon Pilgrim [Fri, 7 May 2021 15:39:11 +0000 (16:39 +0100)]
[X86] combineXor - limit fold to non-opaque constants (PR50254)
Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.
Roman Lebedev [Fri, 7 May 2021 15:22:01 +0000 (18:22 +0300)]
[X86] AMD Zen 3: _REV variants of zero-cycle moves are also zero-cycle (PR50261)
Sometimes the disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue:
the _REV variants weren't being modelled as optimizable moves.
Roman Lebedev [Fri, 7 May 2021 15:09:18 +0000 (18:09 +0300)]
[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move
Sebastian Poeplau [Fri, 7 May 2021 15:00:33 +0000 (08:00 -0700)]
[libFuzzer] Fix stack overflow detection
Address sanitizer can detect stack exhaustion via its SEGV handler, which is
executed on a separate stack using the sigaltstack mechanism. When libFuzzer is
used with address sanitizer, it installs its own signal handlers which defer to
those put in place by the sanitizer before performing additional actions. In the
particular case of a stack overflow, the current setup fails because libFuzzer
doesn't preserve the flag for executing the signal handler on a separate stack:
when we run out of stack space, the operating system can't run the SEGV handler,
so address sanitizer never reports the issue. See the included test for an
example.
This commit fixes the issue by making libFuzzer preserve the SA_ONSTACK flag
when installing its signal handlers; the dedicated signal-handler stack set up
by the sanitizer runtime appears to be large enough to support the additional
frames from the fuzzer.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D101824
thomasraoux [Thu, 6 May 2021 23:37:47 +0000 (16:37 -0700)]
[mlir][vector] add pattern to cast away leading unit dim for elementwise op
Differential Revision: https://reviews.llvm.org/D102034
thomasraoux [Thu, 6 May 2021 23:41:43 +0000 (16:41 -0700)]
[mlir][spirv] add support for lowering of extract_slice to scalar type
Differential Revision: https://reviews.llvm.org/D102041
Joseph Tremoulet [Fri, 7 May 2021 14:48:18 +0000 (07:48 -0700)]
BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.
Reviewed By: nikic, nlopes, aqjune
Differential Revision: https://reviews.llvm.org/D101541
Sanjay Patel [Fri, 7 May 2021 14:43:47 +0000 (10:43 -0400)]
[AArch64] add test for missed vectorization; NFC
This is a reduction of the example in:
https://llvm.org/PR50256
Joseph Huber [Thu, 6 May 2021 16:42:55 +0000 (12:42 -0400)]
[libomptarget] Add support for target memory allocators to cuda RTL
Summary:
The allocator interface added in D97883 allows the RTL to allocate shared and
host-pinned memory from the cuda plugin. This patch adds support for these to
the runtime.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D102000
Tobias Gysi [Fri, 7 May 2021 14:17:06 +0000 (14:17 +0000)]
[mlir][linalg] Remove redundant indexOp builder.
Remove the builder signature taking a signed dimension identifier.
Reviewed By: ergawy
Differential Revision: https://reviews.llvm.org/D102055
Tres Popp [Tue, 20 Apr 2021 08:36:48 +0000 (10:36 +0200)]
[mlir] Rename BufferAliasAnalysis to BufferViewFlowAnalysis
This is to make clearer the difference between this and
an AliasAnalysis.
For example, given a sequence of subviews that create values
A -> B -> C -> D:
BufferViewFlowAnalysis::resolve(B) => {B, C, D}
AliasAnalysis::resolve(B) => {A, B, C, D}
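
The difference can be sketched with a small graph model: the view-flow
result only follows edges forward, from a value to the views derived from
it, while the alias result treats the same edges as undirected. This is an
illustration only; view_flow_resolve, alias_resolve and the dictionary
representation are hypothetical and do not mirror the MLIR classes.

```python
from collections import deque


def reachable(start, edges):
    """Breadth-first search returning every node reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in edges.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def view_flow_resolve(value, derived):
    """Values that are (transitively) views created from `value`."""
    return reachable(value, derived)


def alias_resolve(value, derived):
    """Values that may share storage with `value`, in either direction."""
    undirected = {}
    for source, views in derived.items():
        for view in views:
            undirected.setdefault(source, set()).add(view)
            undirected.setdefault(view, set()).add(source)
    return reachable(value, undirected)
```

For the chain A -> B -> C -> D, view_flow_resolve("B", ...) gives
{B, C, D} and alias_resolve("B", ...) gives {A, B, C, D}.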
Differential Revision: https://reviews.llvm.org/D100838
Ahsan Saghir [Tue, 4 May 2021 13:57:27 +0000 (08:57 -0500)]
[PowerPC] Provide MMA builtins for compatibility
Vector pair intrinsics and builtins were renamed in
https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_.
However, some projects used the _mma_ version, so this patch adds
these intrinsics to provide compatibility.
Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159
Reviewed By: nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D100482
Roman Lebedev [Fri, 7 May 2021 13:45:17 +0000 (16:45 +0300)]
[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests
Drop it just enough so it still produces the right IPC.
Roman Lebedev [Fri, 7 May 2021 13:41:46 +0000 (16:41 +0300)]
[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6
They are resolved at the register rename stage without
using any execution units.
Roman Lebedev [Fri, 7 May 2021 13:28:01 +0000 (16:28 +0300)]
[X86] AMD Zen 3: AVX YMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Roman Lebedev [Fri, 7 May 2021 13:23:38 +0000 (16:23 +0300)]
[X86] AMD Zen 3: AVX XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Roman Lebedev [Fri, 7 May 2021 13:15:43 +0000 (16:15 +0300)]
[X86] AMD Zen 3: SSE XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.
Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
Roman Lebedev [Fri, 7 May 2021 13:15:35 +0000 (16:15 +0300)]
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves
Roman Lebedev [Fri, 7 May 2021 13:15:17 +0000 (16:15 +0300)]
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves
Roman Lebedev [Fri, 7 May 2021 13:04:23 +0000 (16:04 +0300)]
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves
Roman Lebedev [Fri, 7 May 2021 12:11:14 +0000 (15:11 +0300)]
[X86] AMD Zen 3: throughput for renameable GPR moves is 6
They are resolved at the register rename stage without
using any execution units.
Roman Lebedev [Fri, 7 May 2021 12:43:32 +0000 (15:43 +0300)]
[NFC][X86] AMD Zen 3: move sched classes for renameable moves together
Roman Lebedev [Fri, 7 May 2021 12:11:01 +0000 (15:11 +0300)]
[NFC][X86][MCA] Increase iteration count in reg move elimination tests
So the IPC actually stabilizes at 6.
Arthur O'Dwyer [Thu, 6 May 2021 16:18:09 +0000 (12:18 -0400)]
[libc++] [test] Test that unordered_*::swap/move/assign does not invalidate iterators.
And remove the dedicated debug-iterator tests; we want to test this in all modes.
We have a CI step for testing the whole test suite with `--debug_level=1` now.
Part of https://reviews.llvm.org/D102003
Arthur O'Dwyer [Thu, 6 May 2021 15:53:10 +0000 (11:53 -0400)]
[libc++] [test] Simplify arithmetic in list.special/swap.pass.cpp. NFCI.
Part of https://reviews.llvm.org/D102003
Arthur O'Dwyer [Thu, 6 May 2021 15:42:25 +0000 (11:42 -0400)]
[libc++] [test] Test that list::swap/move/move-assign does not invalidate iterators.
And remove the dedicated debug-iterator test; we want to test this in all modes.
We have a CI step for testing the whole test suite with `--debug_level=1` now.
Part of https://reviews.llvm.org/D102003
Stephen Tozer [Fri, 7 May 2021 12:53:09 +0000 (13:53 +0100)]
Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands"
Reapply b623df3c, which was reverted while reverting a different patch
with a breaking change. There are no underlying issues with this patch,
so no changes have been made to the original patch.
This reverts commit b11e4c990771541e440861f017afea7b4ba162f4.
Simon Pilgrim [Fri, 7 May 2021 13:48:10 +0000 (14:48 +0100)]
[CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
Simon Pilgrim [Fri, 7 May 2021 12:43:10 +0000 (13:43 +0100)]
[DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
Benjamin Kramer [Fri, 7 May 2021 13:15:52 +0000 (15:15 +0200)]
Retire TargetRegisterInfo::getSpillAlignment
getSpillAlign does the same thing.
Sebastian Neubauer [Mon, 3 May 2021 08:14:12 +0000 (10:14 +0200)]
[AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.
This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
David Stuttard [Mon, 24 Feb 2020 21:19:15 +0000 (21:19 +0000)]
AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
Stephen Tozer [Fri, 7 May 2021 12:36:31 +0000 (13:36 +0100)]
Fix: [DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch is a fix for revision ce0c1f3c, which caused test failures on
bots without x86 as a registered target. This patch moves the test added
in the prior patch to the x86 folder, so that it only runs on bots with
the correct target available.
Malhar Jajoo [Thu, 6 May 2021 23:29:06 +0000 (00:29 +0100)]
[ARM] Transforming memset to Tail predicated Loop
This patch converts llvm.memset intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).
The llvm.memset is converted to a TP loop for both
constant and non-constant input sizes (of llvm.memset).
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D100435
Anastasia Stulova [Fri, 7 May 2021 11:15:51 +0000 (12:15 +0100)]
[OpenCL] Fix optional image types.
This change allows the use of identifiers for image types
from `cl_khr_gl_msaa_sharing` freely in the kernel code if
the extension is not supported since they are not in the
list of the reserved identifiers.
This change also removes the need for a pragma to use the types
from the extensions, since the spec does not require one.
Differential Revision: https://reviews.llvm.org/D100983
Joachim Meyer [Thu, 6 May 2021 20:26:19 +0000 (22:26 +0200)]
[NFC] Correctly assert the indents for printEnumValHelpStr.
Only verify that there's no negative indent.
Noted by @chapuni in https://reviews.llvm.org/D93494.
Reviewed By: chapuni
Differential Revision: https://reviews.llvm.org/D102021
Stephen Tozer [Thu, 29 Apr 2021 15:36:05 +0000 (16:36 +0100)]
[DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch fixes a crash in the compiler that occurs when certain
invalidated SDDbgValues are emitted. The cause of this was that we would
attempt to check the liveness of the debug value's operands, which
triggers an assert if any of those operands are invalid. This patch
changes this check such that it only occurs if the SDDbgValue is valid;
if not, the check is irrelevant anyway, so can be safely ignored.
Differential Revision: https://reviews.llvm.org/D101540
Simon Pilgrim [Fri, 7 May 2021 12:12:16 +0000 (13:12 +0100)]
[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.
Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.
I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).
NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.
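
As a rough illustration of the idea, a double-width left shift on two
32-bit halves can be built from one funnel shift, one ordinary shift and a
select on the "amount >= width" bit. This is a sketch of the general
technique under that assumption, not the code added to TargetLowering; fshl
here follows the semantics of the llvm.fshl intrinsic, and shl_parts is a
hypothetical helper name.

```python
def fshl(a, b, amount, width=32):
    """Funnel shift left: top `width` bits of (a:b) shifted left by amount mod width."""
    mask = (1 << width) - 1
    amount %= width
    concatenated = (a << width) | b
    return ((concatenated << amount) >> width) & mask


def shl_parts(lo, hi, amount, width=32):
    """Shift the double-width value (hi:lo) left by amount in [0, 2*width); return (lo, hi)."""
    mask = (1 << width) - 1
    hi_small = fshl(hi, lo, amount, width)            # correct when amount < width
    lo_small = (lo << (amount & (width - 1))) & mask  # lo shifted by amount mod width
    if amount & width:                                # amount >= width: lo moves into hi
        return 0, lo_small
    return lo_small, hi_small
```

The right-shift variants follow the same pattern with a funnel shift right
and a different fill value for the vacated half.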
Differential Revision: https://reviews.llvm.org/D101987
David Stuttard [Fri, 7 May 2021 11:49:17 +0000 (12:49 +0100)]
Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1adf36bfddb5fb66b442bba8999fa733b.
Simon Pilgrim [Fri, 7 May 2021 11:31:05 +0000 (12:31 +0100)]
[SLP] Regenerate tests to reduce diff in D98714. NFCI.
Simon Pilgrim [Thu, 6 May 2021 17:57:19 +0000 (18:57 +0100)]
[X86] Ensure we pass DebugLoc by const reference where possible. NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef
Ole Strohm [Fri, 7 May 2021 11:30:31 +0000 (12:30 +0100)]
[NFC] (test commit) Changed example invocation of C++ for OpenCL
David Stuttard [Mon, 24 Feb 2020 21:19:15 +0000 (21:19 +0000)]
AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Differential Revision: https://reviews.llvm.org/D101830
Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
Roman Lebedev [Fri, 7 May 2021 10:43:46 +0000 (13:43 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminatible MMX moves
In Zen 3, MMX moves are *not* eliminated;
I've verified this with llvm-exegesis.
Roman Lebedev [Fri, 7 May 2021 10:02:14 +0000 (13:02 +0300)]
[X86] AMD Zen 3: 32/64 -bit GPR register moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.
Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
Roman Lebedev [Fri, 7 May 2021 10:02:07 +0000 (13:02 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests with eliminatible GPR moves
Stephen Tozer [Thu, 29 Apr 2021 15:04:24 +0000 (16:04 +0100)]
[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST
This patch modifies updateDbgUsersToReg to properly handle
DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices
(i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and
updating the register for all matching operands.
Differential Revision: https://reviews.llvm.org/D101523
gbreynoo [Fri, 7 May 2021 10:21:51 +0000 (11:21 +0100)]
[llvm-dwarfdump] Help option output should be consistent with the command guide
The dwarfdump command guide shows the short options used as aliases but
these are not found in the help text unless --show-hidden is used.
Investigating other tools some follow this pattern, others like
llvm-objdump show aliases with --help. This change fixes the help output
to be consistent with the command guide. This includes updating alias
descriptions in the help output to use "--".
As part of this change I updated cmdline.test, including some options
that were missing testing.
Differential Revision: https://reviews.llvm.org/D101646
Guillaume Chatelet [Fri, 7 May 2021 10:22:41 +0000 (10:22 +0000)]
[llvm][NFC] Remove remaining deprecated alignment functions from CodeGen
Differential Revision: https://reviews.llvm.org/D102058
Guillaume Chatelet [Fri, 7 May 2021 09:12:56 +0000 (09:12 +0000)]
[llvm][NFC] Remove deprecated TargetFrameLowering and InstrTypes alignment functions
Differential Revision: https://reviews.llvm.org/D102056
LemonBoy [Fri, 7 May 2021 10:09:38 +0000 (12:09 +0200)]
[AsmParser][ARM] Make .thumb_func imply .thumb
GNU as documentation states that a `.thumb_func` directive implies `.thumb`; teach the asm parser to switch mode whenever it's encountered. On the other hand, the labeled form, exclusive to Apple's toolchain, doesn't switch mode at all.
Reviewed By: nickdesaulniers, peter.smith
Differential Revision: https://reviews.llvm.org/D101975
LLVM GN Syncbot [Fri, 7 May 2021 09:15:50 +0000 (09:15 +0000)]
[gn build] Port 98e5ede60499
Sebastian Neubauer [Fri, 30 Apr 2021 19:31:55 +0000 (21:31 +0200)]
[AMDGPU] Serialize MFInfo::ScavengeFI
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.
ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.
Differential Revision: https://reviews.llvm.org/D101367
Diana Picus [Thu, 6 May 2021 09:26:57 +0000 (09:26 +0000)]
[flang] Remove redundant reallocation
The MaxMinHelper used to implement MIN and MAX for character types would
reallocate the accumulator whenever the number of characters in it was
different from that in the other input. This is unnecessary if the
accumulator is already larger than the other input. This patch fixes the
issue and adds a unit test to make sure we don't reallocate if we don't
need to.
Differential Revision: https://reviews.llvm.org/D101984
Diana Picus [Tue, 4 May 2021 18:57:54 +0000 (18:57 +0000)]
[flang] Add tests for MIN for character arrays. NFC
We used to test only scalar character types. This commit adds tests for
arrays with a few simple shapes.
Differential Revision: https://reviews.llvm.org/D101983
Caroline Concatto [Thu, 22 Apr 2021 07:24:40 +0000 (08:24 +0100)]
[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.
Differential Revision: https://reviews.llvm.org/D101260
James Henderson [Fri, 7 May 2021 08:20:50 +0000 (09:20 +0100)]
[lit][test] Attempt fix when paths include symlink
Example of failure:
https://lab.llvm.org/staging/#/builders/126/builds/345/steps/5/logs/FAIL__lit___use-tool-search-env_py
Martin Storsjö [Thu, 6 May 2021 07:18:41 +0000 (10:18 +0300)]
[libcxx] Fix a case of -Wundef warnings. NFC.
Differential Revision: https://reviews.llvm.org/D101978
Peilin Guo [Fri, 7 May 2021 08:05:50 +0000 (16:05 +0800)]
[LazyValueInfo] Insert an Overdefined placeholder to prevent infinite recursion
getValueFromCondition() uses a Visited set to record intermediate values.
However, it works in postorder: it computes the value first and updates the
Visited set later. Thus it will be trapped in infinite recursion if there
exists IR with a use that is not dominated by its def, as in this example:
%tmp3 = or i1 undef, %tmp4
%tmp4 = or i1 undef, %tmp3
To prevent this, we can insert an Overdefined placeholder into the set
before computing the actual value.
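
The placeholder technique can be sketched on a toy lattice. The code below
is an illustration under simplifying assumptions, not the LazyValueInfo
implementation: value_of, merge_or and the tuple encoding of instructions
are hypothetical, and the only lattice values are True, False and
OVERDEFINED.

```python
OVERDEFINED = "overdefined"


def merge_or(lhs, rhs):
    """Result of `or` on the toy lattice."""
    if lhs is True or rhs is True:
        return True
    if lhs is False and rhs is False:
        return False
    return OVERDEFINED


def value_of(name, defs, visited):
    """Evaluate `name`, where defs maps a name to ("or", lhs, rhs)."""
    if isinstance(name, bool):
        return name                  # constant operand
    if name in visited:
        return visited[name]         # finished value or in-progress placeholder
    if name not in defs:
        return OVERDEFINED           # unknown operand such as undef
    visited[name] = OVERDEFINED      # placeholder inserted before recursing
    _, lhs, rhs = defs[name]
    result = merge_or(value_of(lhs, defs, visited), value_of(rhs, defs, visited))
    visited[name] = result           # replace the placeholder with the real value
    return result
```

Without the placeholder line, evaluating %tmp3 would recurse into %tmp4 and
back into %tmp3 indefinitely; with it, the second visit of %tmp3 returns the
placeholder and the recursion ends.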
Reviewed by: nikic
Differential Revision: https://reviews.llvm.org/D101273
Chen Zheng [Fri, 7 May 2021 07:00:11 +0000 (07:00 +0000)]
[Debug-Info][NFC] add a wrapper for Die.addValue
Add a new wrapper function addAttribute() for Die.addValue() function,
so we can do some attributes control in one single interface.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101125
Amara Emerson [Fri, 7 May 2021 07:00:47 +0000 (00:00 -0700)]
[GlobalISel] Micro-optimize the conditional branch optimization.
Convert a check into an assert and pass an MI instead of recomputing in the
apply function.
KareemErgawy-TomTom [Fri, 7 May 2021 06:59:35 +0000 (08:59 +0200)]
[MLIR][SPIRV] Properly (de-)serialize BranchConditionalOp.
Implements proper (de-)serialization logic for BranchConditionalOp when
such ops have true/false target operands.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D101602
Chen Zheng [Fri, 7 May 2021 06:19:29 +0000 (06:19 +0000)]
[XCOFF] handle string constants generation for AIX
This follows https://www.ibm.com/docs/en/aix/7.2?topic=constants-string
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D101280
Tobias Gysi [Fri, 7 May 2021 05:59:05 +0000 (05:59 +0000)]
[mlir][linalg] Add IndexedGenericOp to GenericOp canonicalization.
Replace all `linalg.indexed_generic` ops by `linalg.generic` ops that access the iteration indices using the `linalg.index` op.
Differential Revision: https://reviews.llvm.org/D101612
Qiu Chaofan [Fri, 7 May 2021 03:04:47 +0000 (11:04 +0800)]
[PowerPC] Remove extra swap for extract+vperm on LE
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D101605
Yonghong Song [Thu, 6 May 2021 23:31:30 +0000 (16:31 -0700)]
BPF: fix FIELD_EXISTS relocation with array subscripts
Lorenz Bauer reported an issue on the bpf mailing list ([1]) where,
for a FIELD_EXISTS relocation, if the object is an array subscript,
the patched immediate is the object offset from the base address
instead of 1.
Currently in BPF AbstractMemberAccess pass, the final offset
from the base address is the patched offset except FIELD_EXISTS
which is 1 unconditionally. In this particular case, the last
data structure access is not a field (struct/union offset)
so it didn't hit the place to set patched immediate to be 1.
This patch fixed the issue by checking the relocation type.
If the type is FIELD_EXISTS, just set to 1.
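
In other words, the decision depends on the relocation kind alone and not on
whether the last access was a struct/union field. A minimal sketch, assuming
string tags for the relocation kinds; patched_immediate and the constants
are hypothetical names, not the identifiers used in the pass.

```python
FIELD_EXISTS = "FIELD_EXISTS"
FIELD_BYTE_OFFSET = "FIELD_BYTE_OFFSET"


def patched_immediate(reloc_kind, offset_from_base):
    """Immediate recorded for a relocation, chosen by relocation kind only."""
    if reloc_kind == FIELD_EXISTS:
        return 1                 # existence checks always patch in 1
    return offset_from_base      # other kinds keep the computed offset
```

For an array subscript at offset 8, the FIELD_EXISTS case yields 1 where the
buggy behavior described above would have produced 8.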
Tested by modifying some bpf selftests, libbpf is okay with
such types with FIELD_EXISTS relocation.
[1] https://lore.kernel.org/bpf/CACAyw99n-cMEtVst7aK-3BfHb99GMEChmRLCvhrjsRpHhPrtvA@mail.gmail.com/
Differential Revision: https://reviews.llvm.org/D102036
Coelacanthus [Thu, 6 May 2021 10:36:52 +0000 (18:36 +0800)]
[TableGen] Use range-based for loops (NFC)
Use range-based for loops in TableGen.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D101994
qixingxue [Thu, 6 May 2021 07:33:56 +0000 (15:33 +0800)]
[IR] Fix typo in comment of Intrinsics.td (NFC)
Bruno Cardoso Lopes [Fri, 7 May 2021 04:04:23 +0000 (21:04 -0700)]
[CGAtomic] Lift strong requirement for remaining compare_exchange combinations
Follow up on 431e3138a and complete the other possible combinations.
Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
MaheshRavishankar [Fri, 7 May 2021 00:17:29 +0000 (17:17 -0700)]
[mlir][Linalg] Allow folding to rank-zero tensor when using rank-reducing subtensors.
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes an OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
Jianzhou Zhao [Fri, 30 Apr 2021 17:18:05 +0000 (17:18 +0000)]
[dfsan] Rename and fix an internal test issue for mmap+calloc
The linker suggests using -Wl,-z,notext.
Replacing assert with exit also fixed this.
After renaming, interceptor.c would be used to test interceptors in general by D101204.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D101649
Cyndy Ishida [Thu, 6 May 2021 23:18:55 +0000 (16:18 -0700)]
[llvm][TextAPI] add mapping from OS string to Platform
* add utility for matching target triple OS value strings to PlatformKind
This was reviewed offline by ributzka, steven_wu
Stanislav Mekhanoshin [Thu, 6 May 2021 20:29:48 +0000 (13:29 -0700)]
[AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32
Differential Revision: https://reviews.llvm.org/D102022
Rob Suderman [Thu, 6 May 2021 22:55:58 +0000 (15:55 -0700)]
[mlir][tosa] Added div op, variadic concat. Removed placeholder. Spec v0.22 alignment.
Nearly complete alignment to spec v0.22
- Adds Div op
- Concat inputs now variadic
- Removes Placeholder op
Note: TF side PR https://github.com/tensorflow/tensorflow/pull/48921 deletes Concat legalizations to avoid breaking TensorFlow CI. This must be merged only after the TF PR has merged.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101958
Jon Chesterfield [Thu, 6 May 2021 22:52:18 +0000 (23:52 +0100)]
[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one
D101976 would require a second barrier instance. This NFC to amdgpu makes it
simpler to add one (an extra global, one more line in init). Also renames the
current barrier to L0.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102016
Amy Zhuang [Thu, 6 May 2021 22:08:34 +0000 (15:08 -0700)]
[mlir] Update dstNode after DenseMap insertion in loop fusion pass.
Reviewed By: vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D101794