Alan Zhao [Wed, 11 May 2022 22:05:55 +0000 (15:05 -0700)]
Explicitly add -target for Windows builds in file_test_windows.c
It turns out that the llvm buildbots run the test with
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-scei-ps4, which would cause this
test to fail as the test assumed that the default target is Windows. To
fix this, we explicitly set -target for the Windows testcases.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D125425
Yuanfang Chen [Wed, 11 May 2022 21:42:03 +0000 (14:42 -0700)]
[Driver][test] run one test in darwin-dsymutil.c for Darwin only
Alan Zhao [Wed, 11 May 2022 20:54:09 +0000 (22:54 +0200)]
[clang] Add the flag -ffile-reproducible
When Clang generates the path prefix (i.e. the path of the directory
where the file is) when generating __FILE__, __builtin_FILE(), and
std::source_location, Clang uses the platform-specific path separator
character of the build environment where Clang _itself_ is built. This
leads to inconsistencies in Chrome builds where Clang running on
non-Windows environments uses the forward slash (/) path separator
while Clang running on Windows builds uses the backslash (\) path
separator. To fix this, we add a flag -ffile-reproducible (and its
inverse, -fno-file-reproducible) to have Clang use the target's
platform-specific file separator character.
Additionally, the existing flags -fmacro-prefix-map and
-ffile-prefix-map now both imply -ffile-reproducible. This can be
overridden by setting -fno-file-reproducible.
[0]: https://crbug.com/1310767
Differential revision: https://reviews.llvm.org/D122766
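A minimal sketch of the idea behind -ffile-reproducible, assuming a simplified model in which every separator in the path is rewritten to the target's separator. The function name `reproducible_file_path` and the `target_is_windows` parameter are hypothetical and are not Clang's implementation:

```python
def reproducible_file_path(path: str, target_is_windows: bool) -> str:
    """Rewrite path separators to the one used by the *target* platform.

    Hypothetical helper: the separator depends on the target only, never on
    the host that runs the compiler, so the result is host-independent.
    """
    sep = "\\" if target_is_windows else "/"
    # Normalize both separator styles to the target's separator.
    return path.replace("\\", sep).replace("/", sep)


if __name__ == "__main__":
    mixed = "src\\dir/file.c"
    print(reproducible_file_path(mixed, target_is_windows=False))
    print(reproducible_file_path(mixed, target_is_windows=True))
```

The point of the model is that the same input yields the same string regardless of where the compiler itself was built.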
Mike Rice [Wed, 11 May 2022 18:26:07 +0000 (11:26 -0700)]
[OpenMP] Fix mangling for linear parameters with negative stride
The 'n' character is used in place of '-' in the mangled name.
Differential Revision: https://reviews.llvm.org/D125406
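The sign handling described above can be sketched as follows. Only the substitution of 'n' for '-' comes from the commit message; the helper name `mangle_stride` is hypothetical and the rest of the mangled name is deliberately left out:

```python
def mangle_stride(stride: int) -> str:
    """Encode a stride for a mangled name, using 'n' in place of '-'.

    Hypothetical helper covering only the sign handling.
    """
    if stride < 0:
        return "n" + str(-stride)
    return str(stride)


if __name__ == "__main__":
    for s in (4, -4):
        print(s, "->", mangle_stride(s))
```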
Xiang Li [Wed, 11 May 2022 20:38:13 +0000 (13:38 -0700)]
Revert "[HLSL] add -D option for dxc mode."
This reverts commit
4dae38ebfba0d8583e52c3ded8f62f5f9fa77fda.
Differential Revision: https://reviews.llvm.org/D125414
Joseph Huber [Wed, 11 May 2022 20:53:36 +0000 (16:53 -0400)]
[LinkerWrapper][Fix] Fix bad alignment from extracted archive members
Summary:
We use embedded binaries to extract offloading device code from the host
fatbinary. This uses a binary format whose necessary alignment is
eight bytes. The alignment is included within the ELF section type so
the data extracted from the ELF should always be aligned at that amount.
However, if this file was extracted from a static archive, it was being
sent as an offset in the archive file which did not have the same
alignment guarantees as the ELF file. This was causing errors in the
UB-sanitizer build as it would occasionally try to access a misaligned
address. To fix this, I simply copy the memory directly to a new buffer
which is guaranteed to have worst-case alignment of 16 in the case that
it's not properly aligned.
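A rough model of the decision described above, assuming the member is identified by its byte offset in the archive. Python gives no control over buffer alignment, so this only shows when a copy would be made; `extract_member` and `is_aligned` are hypothetical names:

```python
REQUIRED_ALIGNMENT = 8  # the message states the format needs 8-byte alignment


def is_aligned(offset: int, alignment: int = REQUIRED_ALIGNMENT) -> bool:
    """True if the offset is a multiple of the required alignment."""
    return offset % alignment == 0


def extract_member(archive: bytes, offset: int, size: int):
    """Return (data, copied).

    If the member starts at an aligned offset, hand back a view into the
    archive; otherwise copy the bytes into a fresh buffer (standing in for
    the newly allocated, suitably aligned buffer in the commit).
    """
    view = memoryview(archive)[offset:offset + size]
    if is_aligned(offset):
        return view, False
    return bytes(view), True


if __name__ == "__main__":
    archive = bytes(range(32))
    for off in (8, 3):
        data, copied = extract_member(archive, off, 4)
        print(off, bytes(data), "copied" if copied else "in place")
```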
Austin Kerbow [Fri, 25 Mar 2022 00:46:15 +0000 (17:46 -0700)]
[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
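The two mask values described above can be modelled as a small predicate. This is a sketch of the documented semantics only, not the scheduler's code; `may_cross_sched_barrier` and its parameters are hypothetical:

```python
def may_cross_sched_barrier(mask: int, is_memory: bool,
                            has_side_effects: bool) -> bool:
    """Decide whether an instruction may be scheduled across the barrier.

    mask 0: nothing may cross.
    mask 1: only non-memory, non-side-effecting instructions may cross.
    """
    if mask == 0:
        return False
    if mask == 1:
        return not is_memory and not has_side_effects
    raise ValueError("only masks 0 and 1 are described in the initial patch")


if __name__ == "__main__":
    print(may_cross_sched_barrier(0, False, False))
    print(may_cross_sched_barrier(1, False, False))
    print(may_cross_sched_barrier(1, True, False))
```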
Florian Hahn [Wed, 11 May 2022 20:20:42 +0000 (21:20 +0100)]
[ConstraintElimination] Add extra tests for different overflows.
Additional tests for D125264, inspired by @spatel.
Philip Reames [Wed, 11 May 2022 20:16:31 +0000 (13:16 -0700)]
[riscv] Add a bunch of tests exploring switch lowering
Specifically, how we handle zext vs sext around truncates.
Craig Topper [Wed, 11 May 2022 19:49:01 +0000 (12:49 -0700)]
[RISCV] Enable subregister liveness tracking for RVV.
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125108
Craig Topper [Wed, 11 May 2022 19:16:37 +0000 (12:16 -0700)]
[RISCV] Fold addiw from (add X, (addiw (lui C1, C2))) into load/store address
This is a followup to D124231.
We can fold the ADDIW in this pattern if we can prove that LUI+ADDI
would have produced the same result as LUI+ADDIW.
This pattern occurs because constant materialization prefers LUI+ADDIW
for all simm32 immediates. Only immediates in the range
0x7ffff800-0x7fffffff require an ADDIW. Other simm32 immediates
work with LUI+ADDI.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124693
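The range claim above can be checked with a small model of RV64 LUI, ADDI and ADDIW. This is an arithmetic sketch, not the LLVM code; the helper names are hypothetical, and the hi/lo split is the usual one where the low 12 bits are sign-extended:

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_hi_lo(c: int):
    """Split a simm32 constant into a 20-bit LUI part and a signed 12-bit part."""
    lo12 = sext(c, 12)
    hi20 = ((c - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def lui(hi20: int) -> int:
    """RV64 LUI: the 32-bit value hi20 << 12, sign-extended to 64 bits."""
    return sext(hi20 << 12, 32)


def lui_addi(c: int) -> int:
    """LUI followed by a full 64-bit ADDI."""
    hi20, lo12 = split_hi_lo(c)
    return sext(lui(hi20) + lo12, 64)


def lui_addiw(c: int) -> int:
    """LUI followed by ADDIW, which wraps to 32 bits and sign-extends."""
    hi20, lo12 = split_hi_lo(c)
    return sext(lui(hi20) + lo12, 32)


def needs_addiw(c: int) -> bool:
    """True when LUI+ADDI would not reproduce what LUI+ADDIW produces."""
    return lui_addi(c) != lui_addiw(c)


if __name__ == "__main__":
    for c in (0x7FFFF7FF, 0x7FFFF800, 0x7FFFFFFF, -0x80000000):
        print(hex(c), needs_addiw(c))
```

For constants in 0x7ffff800-0x7fffffff the LUI part is 0x80000, which sign-extends to a negative 64-bit value, so only the 32-bit wraparound of ADDIW recovers the intended positive constant.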
Florian Hahn [Wed, 11 May 2022 19:46:48 +0000 (20:46 +0100)]
[GVN] Add test case for memdep invalidation bug.
Test case for #30999.
Chris Lattner [Wed, 11 May 2022 07:51:53 +0000 (08:51 +0100)]
[AsmParser] Adopt emitWrongTokenError more, improving QoI
This is a full audit of emitError calls. I took the opportunity
to remove extraneous parens and fix a couple cases where we'd
generate multiple diagnostics for the same error.
Differential Revision: https://reviews.llvm.org/D125355
Nikolas Klauser [Sun, 8 May 2022 14:40:04 +0000 (16:40 +0200)]
[libc++] Remove __invalidate_all_iterators and replace the uses with std::__debug_db_invalidate_all
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D125188
Nikolas Klauser [Sat, 7 May 2022 20:20:23 +0000 (22:20 +0200)]
[libc++] Add a few more debug wrapper functions
Reviewed By: ldionne, #libc, jloser
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D125176
Craig Topper [Wed, 11 May 2022 18:52:07 +0000 (11:52 -0700)]
[CodeGenPrepare] Use const reference to avoid unnecessary APInt copy. NFC
Spotted while looking at Matthias' patches.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124985
Philip Reames [Wed, 11 May 2022 18:41:59 +0000 (11:41 -0700)]
[test, riscv] Add test illustrating missing handling for fallthrough blocks in 541c9ba
River Riddle [Sat, 7 May 2022 01:24:17 +0000 (18:24 -0700)]
[TableGen] Refactor TableGenParseFile to no longer use a callback
Now that TableGen no longer relies on global Record state, we can allow
for the client to own the RecordKeeper and SourceMgr. Given that TableGen
internally still relies on the global llvm::SrcMgr, this method unfortunately
still isn't thread-safe.
Differential Revision: https://reviews.llvm.org/D125277
River Riddle [Sat, 7 May 2022 01:05:54 +0000 (18:05 -0700)]
[TableGen] Remove the use of global Record state
This commit removes TableGen's reliance on managed static global record state
by moving the RecordContext into the RecordKeeper. The RecordKeeper is now
treated similarly to a (LLVM|MLIR|etc)Context object and is passed to static
construction functions. This is an important step forward in removing TableGen's
reliance on global state, and in a followup will allow for users that parse tablegen
to parse multiple tablegen files without worrying about Record lifetime.
Differential Revision: https://reviews.llvm.org/D125276
Qiongsi Wu [Wed, 11 May 2022 17:20:41 +0000 (13:20 -0400)]
[clang][ppc] Creating Separate Install Target for PPC htm Headers
This patch splits out the htm intrinsic headers from the PPC headers list.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D125386
Craig Topper [Wed, 11 May 2022 18:20:15 +0000 (11:20 -0700)]
[RISCV] Add caching to the gather/scatter to strided load/store conversion.
If we have multiple gather/scatter instructions using the
same strided address we would scalarize it multiple times. I guess
a later pass cleans this up, but I don't know if that's guaranteed.
This patch adds a cache to remember the scalarization we already
created for a previous gather/scatter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125326
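The caching idea can be sketched generically as memoization keyed by the address. This is not the pass's code; `convert_all` and the `scalarize` callback are hypothetical stand-ins:

```python
def convert_all(addresses, scalarize):
    """Scalarize each distinct address once and reuse the cached result.

    `addresses` lists the address used by each gather/scatter in order;
    `scalarize` is a hypothetical callback doing the expensive conversion.
    """
    cache = {}
    results = []
    for addr in addresses:
        if addr not in cache:
            cache[addr] = scalarize(addr)  # first use: do the work
        results.append(cache[addr])        # later uses: reuse it
    return results


if __name__ == "__main__":
    seen = []

    def fake_scalarize(addr):
        seen.append(addr)
        return ("strided", addr)

    print(convert_all(["p", "q", "p"], fake_scalarize))
    print("scalarized:", seen)
```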
Yaxun (Sam) Liu [Wed, 11 May 2022 15:41:46 +0000 (11:41 -0400)]
[clang] Fix KEYALL
Update KEYALL to cover KEYCUDA. Introduce KEYMAX and
a generic way to update KEYALL.
Reviewed by: Dan Liew
Differential Revision: https://reviews.llvm.org/D125396
Xiang Li [Tue, 10 May 2022 21:22:29 +0000 (14:22 -0700)]
[HLSL] add -D option for dxc mode.
Create dxc_D as an alias to option D, which defines <macro> to <value> (or 1 if <value> omitted).
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D125338
Craig Topper [Wed, 11 May 2022 18:14:56 +0000 (11:14 -0700)]
[RISCV] Move implementation of getVLOpNum and getSEWOpNum from RISCVInsertVSETVLI to RISCVBaseInfo.h. NFC
We should consolidate the operand counting and ordering into
RISCVBaseInfo.h and stop spreading it around.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D125344
Craig Topper [Wed, 11 May 2022 17:58:10 +0000 (10:58 -0700)]
[RISCV] Override TargetLowering::shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd.
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and
X might be a constant.
The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.
RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can use ANDI instead of putting a 1 in a register for SLL.
This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.
I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124639
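The equivalences behind this hook can be checked exhaustively on a small bit width. The sketch below assumes fixed-width unsigned values with logical shifts; the 6-bit `WIDTH` and all function names are hypothetical choices made to keep the search small:

```python
WIDTH = 6                      # small width so the exhaustive check is fast
MASK = (1 << WIDTH) - 1


def original_lshr(x, c, y):
    """(X & (C l>> Y)) != 0"""
    return (x & (c >> y)) != 0


def hoisted_shl(x, c, y):
    """((X << Y) & C) != 0, with the shift truncated to WIDTH bits."""
    return (((x << y) & MASK) & c) != 0


def original_shl(x, c, y):
    """(X & (C << Y)) != 0, with the shift truncated to WIDTH bits."""
    return (x & ((c << y) & MASK)) != 0


def hoisted_lshr(x, c, y):
    """((X l>> Y) & C) != 0; with C == 1 this is the bit-extract form."""
    return ((x >> y) & c) != 0


def forms_agree() -> bool:
    """Compare both rewrites against their originals for every input."""
    for x in range(1 << WIDTH):
        for c in range(1 << WIDTH):
            for y in range(WIDTH):
                if original_lshr(x, c, y) != hoisted_shl(x, c, y):
                    return False
                if original_shl(x, c, y) != hoisted_lshr(x, c, y):
                    return False
    return True


if __name__ == "__main__":
    print("forms agree:", forms_agree())
```

With C fixed to 1, `original_shl` is the ((1 << Y) & X) bit test and `hoisted_lshr` is the (X >> Y) & 1 extract that the message says is preferable on RISCV.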
Sanjay Patel [Wed, 11 May 2022 17:57:33 +0000 (13:57 -0400)]
[InstCombine] freeze operand in sdiv expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
This is similar to a recent change for urem:
d428f09b2c9d
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
Sanjay Patel [Wed, 11 May 2022 17:48:13 +0000 (13:48 -0400)]
[InstCombine] update auto-generated CHECK lines in test file; NFC
These are all cosmetic (value naming) diffs that would distract from
real changes in this file.
Craig Topper [Wed, 11 May 2022 17:48:12 +0000 (10:48 -0700)]
[RISCV] Add a DAG combine to pre-promote (i32 (and (srl X, Y), 1)) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124109
Philip Reames [Wed, 11 May 2022 17:35:29 +0000 (10:35 -0700)]
[riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitions
This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possibly even skip emitting it entirely.
This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from.
More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.
Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ.
Differential Revision: https://reviews.llvm.org/D125392
Arthur Eubanks [Wed, 11 May 2022 16:16:16 +0000 (09:16 -0700)]
[gn build] Use llvm-ar when clang_base_path is specified
Only applies to Linux for now.
This prevents warnings with use_thinlto like
bfd plugin: LLVM gold plugin has failed to create LTO module: Not an int attribute (Producer: 'LLVM15.0.0git' Reader: 'LLVM 13.0.1')
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D125399
Peter Klausler [Mon, 9 May 2022 16:37:35 +0000 (09:37 -0700)]
[flang] Fix check for assumed-size arguments to SHAPE() & al.
The predicate that is used to detect an invalid assumed-size argument
to the intrinsic functions SHAPE, SIZE, & LBOUND gives false results
for arguments whose shapes are not calculatable at compilation time.
Replace with an explicit test for an assumed-size array dummy argument
symbol.
Differential Revision: https://reviews.llvm.org/D125342
Philip Reames [Wed, 11 May 2022 17:12:53 +0000 (10:12 -0700)]
[riscv] Add tests for vsetvli reuse across iterations of a loop
These variations are chosen to exercise both FRE and PRE cases involving loops which don't change state in the iteration and can thus perform vsetvli in the preheader of the loop only. At the moment, these are essentially all TODOs.
David Tenty [Mon, 2 May 2022 21:06:04 +0000 (17:06 -0400)]
[clang][AIX] Don't ignore XCOFF visibility by default
D87451 added -mignore-xcoff-visibility for AIX targets and made it the default (which mimicked the behaviour of the XL 16.1 compiler on AIX).
However, ignoring hidden visibility has unwanted side effects and some libraries depend on visibility to hide non-ABI facing entities from user headers and
reserve the right to change these implementation details based on this (https://libcxx.llvm.org/DesignDocs/VisibilityMacros.html). This forces us to use
internal linkage fallbacks for these cases on AIX and creates an unwanted divergence in implementations on the platform.
For these reasons, it's preferable to not add -mignore-xcoff-visibility by default, which is what this patch does.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D125141
Mircea Trofin [Wed, 11 May 2022 17:06:26 +0000 (10:06 -0700)]
[mlgo] Fix test
Updated reference file for dev-mode-logging.ll and expected output.
Peter Klausler [Sat, 7 May 2022 01:39:23 +0000 (18:39 -0700)]
[flang] Fold complex component references
Complex component references (z%RE, z%IM) of complex named constants
should be evaluated at compilation time.
Differential Revision: https://reviews.llvm.org/D125341
Sanjay Patel [Wed, 11 May 2022 16:09:47 +0000 (12:09 -0400)]
[InstCombine] freeze operand in urem expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Amir Ayupov [Wed, 11 May 2022 16:34:10 +0000 (09:34 -0700)]
[BOLT][NFC] Add MCPlus::primeOperands iterator_range
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D125397
Joseph Huber [Wed, 11 May 2022 16:25:06 +0000 (12:25 -0400)]
[OpenMP] Add a check for alignment in the offload packager
Summary:
These sections need to be aligned correctly to be extracted later, add
a check to indicate if they aren't.
Vibhuti Sawant [Wed, 11 May 2022 16:16:53 +0000 (09:16 -0700)]
[Bazel] Add support for s390x build target
While executing the test suite for Tensorflow(v2.8.0), we encountered multiple TC failures with the below error
```
'z14' is not a recognized processor for this target
```
This patch adds the s390x target to the build target list. It fixes TC failures in multiple modules of Tensorflow on s390x arch. It is also tested to have no effect on x86 machines.
Reviewed By: GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D125096
Aaron Ballman [Wed, 11 May 2022 16:09:21 +0000 (12:09 -0400)]
Fix the Clang sphinx build
This should address:
https://lab.llvm.org/buildbot/#/builders/92/builds/26609
Matthias Braun [Wed, 11 May 2022 15:41:09 +0000 (08:41 -0700)]
Fix endless loop in optimizePhiConst with integer constant switch condition
Avoid endless loop in degenerate case with an integer constant as switch
condition as reported in https://reviews.llvm.org/D124552
Alban Bridonneau [Wed, 11 May 2022 15:36:24 +0000 (15:36 +0000)]
[NFC] Change comment number in aarch64 isel
python3kgae [Sat, 7 May 2022 07:32:17 +0000 (00:32 -0700)]
[DirectX backend] Add pass to emit dxil metadata.
A new pass DxilEmitMetadata is added to translate information saved in llvm ir into metadata to match DXIL spec.
Only generate DXIL validator version in this PR.
In llvm ir, validator version is saved in ModuleFlag with "dx.valver" as Key.
!llvm.module.flags = !{!0, !1}
!1 = !{i32 6, !"dx.valver", !2}
!2 = !{i32 1, i32 1}
DXIL validator version has major and minor versions that are specified as named metadata:
!dx.valver = !{!2}
!2 = !{i32 1, i32 7}
Reviewed By: kuhar, beanz
Differential Revision: https://reviews.llvm.org/D125158
Joe Nash [Thu, 21 Apr 2022 14:15:27 +0000 (10:15 -0400)]
[AMDGPU] gfx11 Decode wider instructions. NFC
Refactor to pass a templatized size parameter to the decoder to allow wider than
64bit decodes in a later patch.
Contributors:
Jay Foad <jay.foad@amd.com>
Depends on D125261
Patch 5/N for upstreaming of AMDGPU gfx11 architecture.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D125316
Florian Hahn [Wed, 11 May 2022 15:10:25 +0000 (16:10 +0100)]
[ConstraintElimination] Add test where ssub result is not used.
Extra tests for D125264.
Joe Nash [Thu, 14 Apr 2022 13:32:59 +0000 (09:32 -0400)]
[AMDGPU] gfx11 subtarget features & early tests
Tablegen definitions for subtarget features and cpp predicate functions to
access the features.
New Sub-TargetProcessors and common latencies.
Simple changes to MIR codegen tests which pass on gfx11 because they have the
same output as previous subtargets or operate on pseudo instructions which
are reused from previous subtargets.
Contributors:
Jay Foad <jay.foad@amd.com>
Petar Avramovic <Petar.Avramovic@amd.com>
Patch 4/N for upstreaming of AMDGPU gfx11 architecture
Depends on D124538
Reviewed By: Petar.Avramovic, foad
Differential Revision: https://reviews.llvm.org/D125261
Shao-Ce SUN [Tue, 10 May 2022 05:28:46 +0000 (13:28 +0800)]
[RISCV] Remove some TODOs in tests
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D125289
Nikita Popov [Tue, 10 May 2022 15:10:37 +0000 (17:10 +0200)]
[InstCombine] Freeze other uses of frozen value
If there is a freeze %x, we currently replace all other uses of %x
with freeze %x -- as long as they are dominated by the freeze
instruction. This patch extends this behavior to cases where we
did not originally dominate the use by moving the freeze
instruction directly after the definition of the frozen value.
The motivation can be seen in test @combine_and_after_freezing_uses:
Canonicalizing everything to freeze %x allows folds that are based
on value identity (i.e. same operand occurring in two places) to
trigger. This also covers the case from D125248.
Differential Revision: https://reviews.llvm.org/D125321
Philip Reames [Wed, 11 May 2022 14:21:31 +0000 (07:21 -0700)]
[riscv] Prefer to use previous VL for scalar move instructions
This patch is an alternative to a piece of D125270. Its direct motivation is to fix a wrong code bug (described below), but somewhat unexpectedly, it also results in a significant code quality improvement for idiomatic fixed length vector patterns.
The existing transform is simply wrong in its current location. We are correct about the fact that the scalar move itself can use the previous vsetvli, but we lose track of the fact that later instructions might depend on the state change represented. That is, the actual value of VL in the register is different than the abstract state thinks it is. Not simply due to precision of modeling, but e.g. the VL register could contain 3 when the abstract state says it is 1. This is annoyingly hard to demonstrate in practice due to differences in policy flags on the intrinsics, but this is at least a latent wrong code bug.
The code quality benefit comes from the fact we don't need to tie this to explicit vsetvli instructions at all. We can propagate the abstract state, and reduce a) the number of transitions, or b) the cost of those transitions. It turns out we have a bunch of cases - in tests at least - where fixed length AVLs are known non-zero, and we can leave VL unchanged while changing VTYPE.
Differential Revision: https://reviews.llvm.org/D125337
Matthias Springer [Wed, 11 May 2022 11:55:58 +0000 (13:55 +0200)]
[mlir][bufferize][NFC] Move helper functions to BufferizationOptions
Move helper functions for creating allocs/deallocs/memcpys to BufferizationOptions.
Differential Revision: https://reviews.llvm.org/D125375
Louis Dionne [Wed, 11 May 2022 14:16:29 +0000 (10:16 -0400)]
[runtimes] Print the testing configuration in use in libunwind and libc++abi
We do it for libc++, and it's rather useful for debugging e.g. CI.
Nico Weber [Wed, 11 May 2022 14:14:00 +0000 (15:14 +0100)]
[gn build] (manually) port
26eb04268f4c (clang-offload-packager)
Joseph Huber [Fri, 6 May 2022 17:56:42 +0000 (13:56 -0400)]
[Clang] Introduce clang-offload-packager tool to bundle device files
In order to do offloading compilation we need to embed files into the
host and create fatbinaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensible since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125165
Matt Devereau [Tue, 29 Mar 2022 17:52:48 +0000 (17:52 +0000)]
[AArch64][SVE] Add aarch64_sve_pcs attribute to Clang
Enable function attribute aarch64_sve_pcs at the C level, which corresponds to
aarch64_sve_vector_pcs at the LLVM IR level.
This requirement was created by this addition to the ARM C Language Extension:
https://github.com/ARM-software/acle/pull/194
Differential Revision: https://reviews.llvm.org/D124998
Alexey Bataev [Tue, 14 Dec 2021 18:02:06 +0000 (10:02 -0800)]
[SLP]Further improvement of the cost model for scalars used in buildvectors.
Further improvement of the cost model for the scalars used in
buildvectors sequences. The main functionality is outlined into
a separate function.
The cost is calculated in the following way:
1. If the Base vector is not an undef vector, resize the very first mask to
have a common VF and perform the action for 2 input vectors (including the
non-undef Base). Other shuffle masks are combined with the result of the
first stage and processed as a shuffle of 2 elements.
2. If the Base is an undef vector and there is only 1 shuffle mask, perform the
action only for 1 vector with the given mask, if it is not the identity
mask.
3. If > 2 masks are used, perform a series of shuffle actions for 2 vectors,
combining the masks properly between the steps.
The original implementation misses the very first analysis for the Base
vector, so the cost might be too optimistic in some cases. But it improves
the cost for the insertelements which are part of the current SLP graph.
Part of D107966.
Differential Revision: https://reviews.llvm.org/D115750
Sanjay Patel [Wed, 11 May 2022 12:47:05 +0000 (08:47 -0400)]
[InstCombine] improve auto-generated test checks by matching function signature; NFC
Without this, miscompiles go undetected here as shown in D125352.
Whisperity [Wed, 11 May 2022 12:15:26 +0000 (14:15 +0200)]
[ASTMatchers][NFC] Fix name of matcher in docs and add a missing test
Sergey Semushin [Wed, 11 May 2022 11:38:35 +0000 (13:38 +0200)]
[clang-format] fix nested angle brackets parse inside concept definition
Due to how parseBracedList always stopped on the first closing angle
bracket and was used in parsing angle bracketed expression inside concept
definition, nested brackets inside concepts were parsed incorrectly.
nextToken() call before calling parseBracedList is required because
we were processing opening angle bracket inside parseBracedList second
time leading to incorrect logic after my fix.
Fixes https://github.com/llvm/llvm-project/issues/54943
Fixes https://github.com/llvm/llvm-project/issues/54837
Reviewed By: HazardyKnusperkeks, curdeius
Differential Revision: https://reviews.llvm.org/D123896
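The difference between stopping at the first closing angle bracket and matching nested brackets can be illustrated with a generic depth counter. This is not clang-format's parser; `find_closing_angle` and the token list are hypothetical:

```python
def find_closing_angle(tokens, open_index):
    """Return the index of the '>' matching the '<' at open_index, or -1.

    Tracks nesting depth, so an inner '<' ... '>' pair is skipped instead
    of being mistaken for the end of the outer expression.
    """
    depth = 0
    for i in range(open_index, len(tokens)):
        if tokens[i] == "<":
            depth += 1
        elif tokens[i] == ">":
            depth -= 1
            if depth == 0:
                return i
    return -1


if __name__ == "__main__":
    tokens = ["C", "<", "A", "<", "T", ">", ">", ";"]
    print("first '>' at", tokens.index(">"))
    print("matching '>' at", find_closing_angle(tokens, 1))
```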
Fraser Cormack [Wed, 11 May 2022 11:41:25 +0000 (12:41 +0100)]
[RISCV][NFC] Rename variable to appease code style
Fraser Cormack [Wed, 11 May 2022 11:32:24 +0000 (12:32 +0100)]
[RISCV][NFC] Move variable down closer to its first use
Joseph Huber [Wed, 13 Apr 2022 15:48:07 +0000 (11:48 -0400)]
[CUDA] Add wrapper code generation for registering CUDA images
This patch adds the necessary code generation to create the wrapper code
that registers all the globals in CUDA. We create the necessary
functions and iterate through the list of
`__start_cuda_offloading_entries` to find which globals must be
registered. This is very similar to the code generation done currently
in Clang for non-rdc builds, but here we are registering a fully linked
fatbinary and finding the globals via the above sections.
With this we should be able to fully support basic RDC / LTO building of CUDA
code.
It's also worth noting that this does not include the necessary PTX to JIT the
image, so to use this support the offloading architecture must match the
system's architecture.
Depends on D123810
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123812
Joseph Huber [Tue, 12 Apr 2022 15:21:36 +0000 (11:21 -0400)]
[Cuda] Add initial support for wrapping CUDA images in the new driver.
This patch adds the initial support for wrapping CUDA images. This
requires changing some of the logic for how we bundle images. We now
need to copy the image for all kinds that are active for the
architecture. Then we need to run a separate wrapping job if the Kind is
Cuda. For cuda wrapping we need to use the `fatbinary` program from the
CUDA SDK to bundle all the binaries together. This is then passed to a
new function to perform the actual module code generation that will be
implemented in a later patch.
Depends on D120273 D123471
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123810
Joseph Huber [Tue, 15 Mar 2022 20:43:37 +0000 (16:43 -0400)]
[CUDA] Create offloading entries when using the new driver
The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:
```
struct __tgt_offload_entry {
  void *addr;       // Pointer to the offload entry info
                    // (function or global).
  char *name;       // Name of the function or global.
  size_t size;      // Size of the entry info (0 if it is a function).
  int32_t flags;
  int32_t reserved;
};
```
Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
module's ID. Later all the module IDs will be used to register all of the
globals at once. Rather than mimic this, offloading entries allow us to
mimic the way OpenMP registers globals. That is, we create a simple
global struct for each device global to be registered. These are placed
at a special section `cuda_offloading_entries`. Because this section is
a valid C-identifier, the linker will provide a `__start` and `__stop`
pointer that we can use to iterate and register all globals at runtime.
The registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
these use the following values.
Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3
Depends on D120272
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123471
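A sketch of how a runtime might walk the entries between `__start` and `__stop` and pick a registration kind from the flags listed above. The field names mirror the struct in the message; the `OffloadEntry` class, `classify`, and `register_all` are hypothetical. Because Kernel and Variable share flag 0, this model tells them apart by `size == 0`, which is an assumption based on the struct comment rather than something the message states:

```python
from dataclasses import dataclass


@dataclass
class OffloadEntry:
    """Python stand-in for struct __tgt_offload_entry."""
    addr: int
    name: str
    size: int       # 0 for a function, per the struct comment
    flags: int
    reserved: int = 0


# Flag values from the commit message; Kernel and Variable are both 0.
FLAG_KINDS = {1: "managed", 2: "surface", 3: "texture"}


def classify(entry: OffloadEntry) -> str:
    """Choose which registration routine an entry would need."""
    if entry.flags == 0:
        # Assumption: a zero size marks a kernel, anything else a variable.
        return "kernel" if entry.size == 0 else "variable"
    return FLAG_KINDS[entry.flags]


def register_all(entries):
    """Iterate the section contents in order and classify every entry."""
    return [(e.name, classify(e)) for e in entries]


if __name__ == "__main__":
    section = [
        OffloadEntry(addr=0x1000, name="kern", size=0, flags=0),
        OffloadEntry(addr=0x2000, name="var", size=4, flags=0),
        OffloadEntry(addr=0x3000, name="tex", size=8, flags=3),
    ]
    for name, kind in register_all(section):
        print(name, kind)
```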
Aaron Ballman [Wed, 11 May 2022 10:52:21 +0000 (06:52 -0400)]
Fix test; we now expect a pedantic warning
This fixes:
https://lab.llvm.org/buildbot/#/builders/109/builds/38337
Ken Matsui [Wed, 11 May 2022 10:38:35 +0000 (06:38 -0400)]
Add extension diagnostic for linemarker directives
This adds the -Wgnu-line-marker diagnostic flag, grouped under -Wgnu,
to warn about use of the GNU linemarker preprocessor extension.
Fixes #55067
Differential Revision: https://reviews.llvm.org/D124534
Amir Ayupov [Wed, 11 May 2022 10:37:09 +0000 (03:37 -0700)]
[BOLT][TEST] Remove -gdwarf-4 override from %cflags
As BOLT support for monolithic and split DWARF5 is added, remove DWARF version
override for BOLT tests.
Reviewed By: ayermolo
Differential Revision: https://reviews.llvm.org/D125366
Florian Hahn [Wed, 11 May 2022 10:24:56 +0000 (11:24 +0100)]
[VPlan] VPInterleaveRecipe only uses first lane if op not stored.
With opaque pointers, both the stored value and the address can be the
same. Only consider the recipe using the first lane only *if* the
address is not stored.
Fixes #55375.
Florian Hahn [Wed, 11 May 2022 10:24:52 +0000 (11:24 +0100)]
[LV] Add opaque pointer test for #55375.
Amir Ayupov [Wed, 11 May 2022 10:18:12 +0000 (03:18 -0700)]
[BOLT] Add icp-inline option
Add an option to only peel ICP targets that can be subsequently inlined.
Yet there's no guarantee that they will be inlined.
The mode is independent from the heuristic used to choose ICP targets: by exec
count, mispredictions, or memory profile.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D124900
Nikita Popov [Wed, 11 May 2022 10:20:36 +0000 (12:20 +0200)]
[IndVarSimplify] Regenerate test checks (NFC)
CHIANG, YU-HSUN (Tommy Chiang, oToToT) [Mon, 9 May 2022 19:10:58 +0000 (03:10 +0800)]
[docs][pp-trace] Remove FileNotFound callback
The `FileNotFound` preprocessor callback was removed in D119708.
We should also remove it from the documentation.
Reviewed by: jansvoboda11
Differential Revision: https://reviews.llvm.org/D125258
Nikita Popov [Wed, 11 May 2022 10:11:11 +0000 (12:11 +0200)]
[SCEVExpander] Deduplicate min/max expansion code (NFC)
Nikita Popov [Wed, 11 May 2022 09:51:18 +0000 (11:51 +0200)]
[InstCombine] Add additional freeze tests (NFC)
David Green [Wed, 11 May 2022 09:47:44 +0000 (10:47 +0100)]
[TypePromotion] Fix sext vs zext in promoted constant
As pointed out in #55342, given non-canonical IR with multiple
constants, we check the second operand in isSafeWrap, but can promote
both with sext. Fix that as suggested by @craig.topper by ensuring we
only extend the second constant if multiple are present.
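To illustrate why the choice of extension matters, here is a minimal Python sketch of zero- versus sign-extension of a narrow constant. It is not the TypePromotion logic; the helper names `zext` and `sext` are hypothetical and only model the arithmetic.

```python
def zext(value, from_bits, to_bits):
    # Zero-extension keeps the low from_bits and fills the upper bits with zeros.
    low = value & ((1 << from_bits) - 1)
    return low & ((1 << to_bits) - 1)


def sext(value, from_bits, to_bits):
    # Sign-extension copies the top bit of the narrow value into the upper bits.
    low = value & ((1 << from_bits) - 1)
    sign_bit = 1 << (from_bits - 1)
    signed = (low ^ sign_bit) - sign_bit
    return signed & ((1 << to_bits) - 1)


# The i8 constant 0xFF promoted to i32 gives two different results.
as_zext = zext(0xFF, 8, 32)
as_sext = sext(0xFF, 8, 32)
```

The two results only agree when the sign bit of the narrow constant is clear, which is why extending the wrong constant with sext can change the computed value.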
Fixes #55342
Differential Revision: https://reviews.llvm.org/D125294
Fraser Cormack [Tue, 10 May 2022 09:49:00 +0000 (10:49 +0100)]
[SelectionDAG][VP] Rename VP sext/zext/trunc ISD opcodes
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125298
Chris Lattner [Wed, 11 May 2022 07:24:50 +0000 (08:24 +0100)]
[AsmParser] Improve error recovery again.
Change the parsing logic to use StringRef instead of lower level
char* logic. Also, if emitting a diagnostic on the first token
in the file, we make sure to use that position instead of the
very start of the file.
Differential Revision: https://reviews.llvm.org/D125353
David Green [Wed, 11 May 2022 07:18:58 +0000 (08:18 +0100)]
[TypePromotion] Format Type Promotion. NFC
This clang-formats the TypePromotion code, with the only meaningful
change being the removal of a verifyFunction call inside a LLVM_DEBUG,
and the printing of the entire function which can be better handled
via -print-after-all.
Jonas Hahnfeld [Thu, 7 Apr 2022 14:19:11 +0000 (16:19 +0200)]
[ORC] Fix sorting of constructors by priority
The code was incorrectly sorting by the function address.
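As a minimal illustration of the difference (not the ORC code itself; the `Ctor` record and the addresses are hypothetical), sorting by priority rather than by address might look like this:

```python
from typing import NamedTuple


class Ctor(NamedTuple):
    priority: int  # lower priority values run first
    address: int   # hypothetical function address


def sort_by_priority(ctors):
    # sorted() is stable, so constructors with equal priority keep their order.
    return sorted(ctors, key=lambda c: c.priority)


ctors = [Ctor(65535, 0x1000), Ctor(101, 0x3000), Ctor(200, 0x2000)]
ordered = sort_by_priority(ctors)
by_address = sorted(ctors, key=lambda c: c.address)
```

In this example the two orders are exactly reversed, which shows how sorting by address can run constructors in the wrong sequence.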
Differential Revision: https://reviews.llvm.org/D123311
Xiang Li [Mon, 2 May 2022 20:59:37 +0000 (13:59 -0700)]
[DirectX backend] Add pass to lower llvm intrinsic into dxil op function.
A new pass, DXILOpLowering, was added.
It scans all llvm intrinsics and creates a dxil op function for each intrinsic that maps to one.
It then translates call instructions on the intrinsic into calls on the dxil op function.
The dxil op function takes an extra i32 argument for the dxil opcode at the beginning of its args,
so setCalledFunction cannot be used to update the call instruction on the intrinsic.
This commit only supports sin to start the work.
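The argument rewrite can be sketched in a few lines of Python. This is a toy model, not the pass: the table `DXIL_OPS`, the function name `dx.op.unary.f32`, and the opcode value 13 are placeholders chosen for illustration.

```python
# Hypothetical mapping from intrinsic name to (dxil op function, opcode).
DXIL_OPS = {"llvm.sin.f32": ("dx.op.unary.f32", 13)}


def lower_call(callee, args):
    """Return the (callee, args) pair of the lowered call.

    Calls with no dxil op mapping are returned unchanged.
    """
    if callee not in DXIL_OPS:
        return callee, list(args)
    op_fn, opcode = DXIL_OPS[callee]
    # The opcode becomes a new leading argument, so the argument list
    # changes and a new call has to be built instead of retargeting the old one.
    return op_fn, [opcode, *args]


lowered = lower_call("llvm.sin.f32", ["%x"])
```

Because the lowered call has one more argument than the original, swapping only the callee would leave a call whose arguments no longer match the function signature.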
Reviewed By: kuhar, beanz
Differential Revision: https://reviews.llvm.org/D124805
Yeting Kuo [Sun, 8 May 2022 13:10:06 +0000 (21:10 +0800)]
[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.
The patch makes PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.
It's useful to get the vtypes of locations of PseudoReadVL without finding the
corresponding VLEFF/VLSEGFF.
It could simplify optimizations in RISCVInsertVSETVLI like D123581.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125199
jacquesguan [Mon, 18 Apr 2022 06:32:32 +0000 (06:32 +0000)]
[RISCV] Add rvv codegen support for vp.fpext.
This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext shares most of its code, so a common lowering function handles all four.
This patch also changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version ops RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123975
Peter Steinfeld [Mon, 9 May 2022 21:12:41 +0000 (14:12 -0700)]
[flang] Change "bad kind" messages in the runtime to "not yet implemented"
Similar to change D125046.
If a programmer is able to compile and link a program that contains types that
are not yet supported by the runtime, it must be because they're not yet
implemented.
This change will make it easier to find unimplemented code in tests.
Differential Revision: https://reviews.llvm.org/D125267
Mingming Liu [Wed, 11 May 2022 02:56:14 +0000 (19:56 -0700)]
[X86] Fix 80 column violation in X86InstrInfo.cpp. NFC
Differential Revision: https://reviews.llvm.org/D125345
Mingming Liu [Wed, 11 May 2022 02:46:15 +0000 (19:46 -0700)]
Revert "[NFC] Run clang-format on llvm/lib/Target/X86/X86InstroInfo.cpp"
This reverts commit 8bef5476de3ec7388ad0c72b26dcc82ac7fd970a.
Need to revert, update commit message and reapply.
Alexander Shaposhnikov [Wed, 11 May 2022 01:07:54 +0000 (01:07 +0000)]
[Transform][Utils][NFC] Clean up CtorUtils.cpp
Xiang1 Zhang [Sat, 7 May 2022 07:22:15 +0000 (15:22 +0800)]
[CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI
Reviewed By: PengfeiWang
Differential Revision: https://reviews.llvm.org/D125159
Mingming Liu [Tue, 10 May 2022 23:01:02 +0000 (16:01 -0700)]
[NFC] Run clang-format on llvm/lib/Target/X86/X86InstroInfo.cpp
Differential Revision: https://reviews.llvm.org/D125345
Ting Wang [Wed, 11 May 2022 00:47:51 +0000 (20:47 -0400)]
[PowerPC] Fix PPCISD::STBRX selection issue on A2
Enable FeatureISA2_06 on Power A2 target
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D125203
Eduard Zingerman [Wed, 11 May 2022 00:41:41 +0000 (17:41 -0700)]
[BPF] Add a test for making FI_ri as isPseudo
Commit 8a63326150ee ("[BPF] Mark FI_ri as isPseudo to avoid
assertion during disassembly") added isPseudo to the FI_ri insn
in the BPFInstrInfo.td file. This patch adds the missing test file.
Differential Revision: https://reviews.llvm.org/D125185
Eduard Zingerman [Wed, 11 May 2022 00:04:58 +0000 (17:04 -0700)]
[BPF] Mark FI_ri as isPseudo to avoid assertion during disassembly
When a specific sequence of bytes is present in the file during
disassembly the disassembler fails with the following assertion:
...
0: 18 20 00 00 00 00 00 00 lea
... Assertion `idx < size()' failed.
...
llvm::SmallVectorTemplateCommon<...>::operator[](...) ...
llvm::MCInst::getOperand(unsigned int) ...
llvm::BPFInstPrinter::printOperand(...) ...
llvm::BPFInstPrinter::printInstruction() ...
llvm::BPFInstPrinter::printInst(...) ...
...
The byte sequence causing the error is (little endian):
18 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The issue can be reproduced using the program below:
test.ir:
@G = constant
[16 x i8]
[i8 u0x18, i8 u0x20, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00,
i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00, i8 u0x00],
section "foo", align 8
Compiled and disassembled as follows:
cat test.ir | llc -march=bpfel -filetype=obj -o - \
| llvm-objdump --arch=bpfel --section=foo -d -
This byte sequence corresponds to the FI_ri instruction, declared in
BPFInstrInfo.td as follows:
def FI_ri
: TYPE_LD_ST<BPF_IMM.Value, BPF_DW.Value,
(outs GPR:$dst),
(ins MEMri:$addr),
"lea\t$dst, $addr",
[(set i64:$dst, FIri:$addr)]> {
// This is a tentative instruction, and will be replaced
// with MOV_rr and ADD_ri in PEI phase
let Inst{51-48} = 0;
let Inst{55-52} = 2;
let Inst{47-32} = 0;
let Inst{31-0} = 0;
let BPFClass = BPF_LD;
}
Notes:
- First byte (opcode) is formed as follows:
  - BPF_IMM.Value is 0x00
  - BPF_DW.Value is 0x18
  - BPF_LD is 0x00
- Second byte (registers) is formed as follows:
  - let Inst{55-52} = 2;
  - let Inst{51-48} = 0;
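The notes above can be checked with a short Python sketch. It is only a simplified model of the encoding: the constants are the values quoted in the notes, and the function name `fi_ri_prefix` is hypothetical.

```python
BPF_IMM = 0x00  # mode field, value quoted in the notes
BPF_DW = 0x18   # size field, value quoted in the notes
BPF_LD = 0x00   # instruction class, value quoted in the notes


def fi_ri_prefix():
    # First byte: the opcode is the OR of the three fields.
    opcode = BPF_IMM | BPF_DW | BPF_LD
    # Second byte: Inst{55-52} = 2 is the high nibble, Inst{51-48} = 0 the low one.
    regs = (2 << 4) | 0
    # Inst{47-32} and Inst{31-0} are all zero, which gives six more zero bytes.
    return bytes([opcode, regs]) + bytes(6)


encoded = fi_ri_prefix()
print(encoded.hex(" "))
```

The printed bytes match the first eight bytes of the sequence that triggers the assertion.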
The FI_ri instruction is always replaced by a MOV_rr/ADD_ri instruction
pair in the BPFRegisterInfo::eliminateFrameIndex method. Thus, this
instruction should be invisible to the disassembler. This patch achieves
this by adding the "isPseudo" flag to this instruction.
The bug was found by disassembling one of the BPF tests from the Linux
kernel (llvm-objdump -D tools/testing/selftests/bpf/bpf_iter_sockmap.o)
Differential Revision: https://reviews.llvm.org/D125185
Florian Mayer [Wed, 11 May 2022 00:00:57 +0000 (17:00 -0700)]
[HWASan symbolize] Write error to stderr.
Florian Mayer [Tue, 10 May 2022 23:32:12 +0000 (16:32 -0700)]
[HWASan] deflake hwasan_symbolize test more.
Don't fail on a corrupted ELF file when indexing. This happens because
files in the directory are changed by concurrent tests.
Peter Klausler [Tue, 10 May 2022 20:42:08 +0000 (13:42 -0700)]
[flang] Allow local variables and function result inquiries in specification expressions
Inquiries into the bounds, size, and length of local variables (and function results)
are acceptable specification expressions. A recent change allowed them for dummy
arguments that are not OPTIONAL or INTENT(OUT), but didn't address other object
entities.
Differential Revision: https://reviews.llvm.org/D125343
Nick Desaulniers [Tue, 10 May 2022 23:21:17 +0000 (16:21 -0700)]
[BuildLibCalls] infer inreg param attrs from NumRegisterParameters
We're having a hard time booting the ARCH=i386 Linux kernel with clang
after removing -ffreestanding because instcombine was dropping inreg
from callers during libcall simplification, but not the callees defined
in different translation units. This led the callers and callees to have
wildly different calling conventions, which (predictably) blew up at
runtime.
Infer the inreg param attrs on function declarations from the module
metadata "NumRegisterParameters." This allows us to boot the ARCH=i386
Linux kernel (w/ -ffreestanding removed).
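The idea can be sketched with a toy Python model. This is not the BuildLibCalls code: it assumes that every integer or pointer parameter consumes exactly one register, and the function name `infer_inreg` and the string type tags are hypothetical.

```python
def infer_inreg(param_kinds, num_register_parameters):
    """Return one bool per parameter: True if it would be marked inreg.

    param_kinds is a list of simple type tags such as "int", "ptr" or "float".
    Assumption: each "int" or "ptr" parameter consumes exactly one register.
    """
    remaining = num_register_parameters
    marks = []
    for kind in param_kinds:
        takes_register = kind in ("int", "ptr") and remaining > 0
        if takes_register:
            remaining -= 1
        marks.append(takes_register)
    return marks


# A memcpy-like declaration (ptr, ptr, int) with three register parameters.
memcpy_marks = infer_inreg(["ptr", "ptr", "int"], 3)
```

Deriving the marks from a single module-level count means a declaration created during libcall simplification gets the same attributes as a callee compiled in another translation unit with the same setting.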
Fixes: https://github.com/llvm/llvm-project/issues/53645
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D125285
Wende Tan [Tue, 10 May 2022 22:44:46 +0000 (15:44 -0700)]
[Bitcode] Include indirect users of BlockAddresses in bitcode
The original fix (commit 23ec5782c3cc) of
https://github.com/llvm/llvm-project/issues/52787 only adds `Function`s
that have `Instruction`s that directly use `BlockAddress`es into the
bitcode (`FUNC_CODE_BLOCKADDR_USERS`).
However, in either @rickyz's original reproducing code:
```
void f(long);
__attribute__((noinline)) static void fun(long x) {
  f(x + 1);
}
void repro(void) {
  fun(({
    label:
      (long)&&label;
  }));
}
```
```
...
define dso_local void @repro() #0 {
entry:
  br label %label
label: ; preds = %entry
  tail call fastcc void @fun()
  ret void
}
define internal fastcc void @fun() unnamed_addr #1 {
entry:
  tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3
  ret void
}
...
```
or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g.,
`i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s
(e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and
then be used by `Instruction`s. This case is not handled by the original
fix.
This patch adds *indirect* users of `BlockAddress`es, i.e., the
`Instruction`s using some `Constant`s which further use the
`BlockAddress`es, into the bitcode as well, by doing depth-first
searches.
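The search can be sketched as a toy Python model. It is not the bitcode writer's code: the use graph is a plain dictionary, the value names are made-up strings, and any value without an owning function is treated as a constant.

```python
def functions_using(blockaddress, users, owner):
    """Collect functions whose instructions use blockaddress, directly or not.

    users maps a value to the list of values that use it directly.
    owner maps an instruction to the name of its parent function.
    Values that are not keys of owner are treated as constants.
    """
    found = []
    seen = set()
    stack = [blockaddress]  # explicit stack, so the walk is depth-first
    while stack:
        value = stack.pop()
        if value in seen:
            continue
        seen.add(value)
        for user in users.get(value, []):
            if user in owner:
                # An instruction: record its function and stop descending.
                if owner[user] not in found:
                    found.append(owner[user])
            else:
                # A constant: keep walking through its own users.
                stack.append(user)
    return found


# Shape of the reproducer: blockaddress -> ptrtoint -> add -> call in @fun.
users = {
    "blockaddress(@repro, %label)": ["ptrtoint"],
    "ptrtoint": ["add"],
    "add": ["call @f"],
}
owner = {"call @f": "fun"}
result = functions_using("blockaddress(@repro, %label)", users, owner)
```

A search that only looked at direct users would stop at the `ptrtoint` constant and never reach the call in `fun`.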
Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken")
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D124878
Mingming Liu [Tue, 10 May 2022 21:23:40 +0000 (14:23 -0700)]
[Peephole-opt][X86] Enhance peephole opt to see through SUBREG_TO_REG
(following AND) and eliminate redundant TEST instruction.
Differential Revision: https://reviews.llvm.org/D124118
Chia-hung Duan [Tue, 10 May 2022 22:48:46 +0000 (22:48 +0000)]
[mlir] Print some message for op-printing verification
Before dumping, instead of silently switching to the generic form after a
verification failure, print some debug logs to help identify why an op
may be printed in a different way.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D125136
Thomas Raoux [Mon, 9 May 2022 17:18:21 +0000 (17:18 +0000)]
[mlir][gpu] Move async copy ops to NVGPU and add caching hints
Move async copy operations to NVGPU as they only exist on the NV target and
are designed to match PTX semantics. This also allows us to add more
fine-grained caching hint attributes to the op.
Add a hint to bypass L1 and hook it up to the NVVM op.
Differential Revision: https://reviews.llvm.org/D125244
Vasileios Porpodas [Thu, 5 May 2022 22:03:31 +0000 (15:03 -0700)]
[SLP] Make reordering aware of external vectorizable scalar stores.
The current reordering scheme only checks the ordering of in-tree operands.
There are some cases, however, where we need to adjust the ordering based on
the ordering of a future SLP-tree whose instructions are not part of the
current tree, but are external users.
This patch is a simple implementation of this. We keep track of scalar stores
that are users of TreeEntries and if they look profitable to vectorize, then
we keep track of their ordering. During the reordering step we take this new
index order into account. This can remove some shuffles in cases like in the
lit test.
Differential Revision: https://reviews.llvm.org/D125111
Philip Reames [Tue, 10 May 2022 22:06:26 +0000 (15:06 -0700)]
[riscv] Consolidate logic for SEW/VL operand offset calculations [nfc]