Nikita Popov [Mon, 11 Apr 2022 14:45:27 +0000 (16:45 +0200)]
[InstCombine] Add strlen of gep test without inbounds (NFC)
Louis Dionne [Thu, 3 Feb 2022 19:45:22 +0000 (14:45 -0500)]
[libc++] Implement P1007R3: std::assume_aligned
This supersedes and incorporates content from both D108906 and D54966,
and also some original content.
Co-Authored-by: Marshall Clow <mclow.lists@gmail.com>
Co-Authored-by: Gonzalo Brito Gadeschi
Differential Revision: https://reviews.llvm.org/D118938
Florian Hahn [Mon, 11 Apr 2022 14:45:18 +0000 (16:45 +0200)]
[LICM] Only create load in pre-header when promoting load.
When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.
The dead load can introduce UB, if the function is marked as
writeonly.
Fixes #51248.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123473
Louis Dionne [Mon, 11 Apr 2022 14:38:26 +0000 (10:38 -0400)]
[libc++] Make .version.pass.cpp tests be compile-only tests
We don't really need to run them.
Groverkss [Mon, 11 Apr 2022 14:31:27 +0000 (20:01 +0530)]
[MLIR][Presburger] Make PWMAFunction inheritance from space private
This patch makes inheritance from PresburgerSpace for PWMAFunction private.
The reasoning for this patch is to prevent implicit conversion to
PresburgerSpace from PWMAFunction and to not expose all functions exposed by
PresburgerSpace in PWMAFunction.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D123076
gysit [Mon, 11 Apr 2022 14:23:53 +0000 (14:23 +0000)]
[mlir][tensor] Add pattern to fold ExtractSliceOp, PadOp chains.
The pattern folds chains of tensor::ExtractSliceOp, tensor::PadOp pairs if they pad different dimensions. Repeated tiling and padding of the tiled dimensions may introduce such chains. This canonicalization pattern folds these chains to a single tensor::ExtractSliceOp, tensor::PadOp pair that pads all dimensions at once, which simplifies vectorization and bufferization.
Example:
```mlir
%0 = tensor.extract_slice %input[16, 0] [%sz0, 64] [1, 1]
: tensor<64x64xf32> to tensor<?x64xf32>
%1 = tensor.pad %0 low[0, 0] high[%pw0, 0] { ...
} : tensor<?x64xf32> to tensor<8x64xf32>
%2 = tensor.extract_slice %1[0, 4] [8, %sz1] [1, 1]
: tensor<8x64xf32> to tensor<8x?xf32>
%res = tensor.pad %2 nofold low[0, 0] high[0, %pw1] { ...
} : tensor<8x?xf32> to tensor<8x4xf32>
```
folds into:
```mlir
%0 = tensor.extract_slice %input[16, 4] [%sz0, %sz1] [1, 1]
: tensor<64x64xf32> to tensor<?x?xf32>
%res = tensor.pad %0 nofold low[0, 0] high[%pw0, %pw1] { ...
} : tensor<?x?xf32> to tensor<8x4xf32>
```
Reviewed By: nicolasvasilache, hanchung
Differential Revision: https://reviews.llvm.org/D122722
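The fold above can be sanity-checked on a small model. The following is a minimal sketch, assuming both pads produce the same constant value and only high padding is used; `extract_slice`, `pad_high`, `chained`, `folded` and `PAD` are illustrative names, not MLIR APIs.

```python
# Nested-list model of tensor.extract_slice and tensor.pad (high padding only).
# Assumption: both pads yield the same constant PAD value.
PAD = 0.0

def extract_slice(t, off, size):
    """Take size[0] x size[1] elements of t starting at off (unit strides)."""
    return [row[off[1]:off[1] + size[1]] for row in t[off[0]:off[0] + size[0]]]

def pad_high(t, high):
    """Append high[0] rows and high[1] columns of PAD (t needs >= 1 row)."""
    width = len(t[0])
    rows = [row + [PAD] * high[1] for row in t]
    rows += [[PAD] * (width + high[1]) for _ in range(high[0])]
    return rows

def chained(inp, sz0, sz1):
    """The original chain: slice and pad dim 0, then slice and pad dim 1."""
    pw0, pw1 = 8 - sz0, 4 - sz1
    t = extract_slice(inp, (16, 0), (sz0, 64))
    t = pad_high(t, (pw0, 0))               # 8 x 64
    t = extract_slice(t, (0, 4), (8, sz1))  # 8 x sz1
    return pad_high(t, (0, pw1))            # 8 x 4

def folded(inp, sz0, sz1):
    """The folded form: one slice and one pad covering both dimensions."""
    pw0, pw1 = 8 - sz0, 4 - sz1
    t = extract_slice(inp, (16, 4), (sz0, sz1))
    return pad_high(t, (pw0, pw1))          # 8 x 4

# A 64x64 input with distinct values, matching the tensor<64x64xf32> example.
inp = [[float(64 * r + c) for c in range(64)] for r in range(64)]
```

Both functions should agree for every dynamic size that fits the static result shape, e.g. `chained(inp, 5, 3) == folded(inp, 5, 3)`.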
Hans Wennborg [Fri, 8 Apr 2022 13:54:09 +0000 (15:54 +0200)]
[dllexport] odr-use constexpr default args for constructor closures
InstantiateDefaultCtorDefaultArgs() is supposed to mark default
constructor args as odr-used, since those args will be used when
emitting the constructor closure.
However, constexpr vars were not getting odr-used since
DoMarkVarDeclReferenced() defers them in MaybeODRUseExprs, and the code
was calling CleanupVarDeclMarking() which discarded those uses instead
of processing them.
(This came up in Chromium, crbug.com/1312086)
Differential revision: https://reviews.llvm.org/D123405
Ulrich Weigand [Mon, 11 Apr 2022 14:18:09 +0000 (16:18 +0200)]
[compiler-rt][SystemZ] Skip fuzzer/coverage.test
This test is currently marked as XFAIL on s390x, but it is randomly
passing, causing build bot issues. Setting as UNSUPPORTED for now.
Nikita Popov [Mon, 11 Apr 2022 12:36:37 +0000 (14:36 +0200)]
[Clang] Avoid legacy PM in some tests (NFC)
Either remove legacy PM run lines or change them to use new PM.
Nikolas Klauser [Sat, 9 Apr 2022 14:19:45 +0000 (16:19 +0200)]
[libc++] Remove the usage of __init in operator+
`operator+` currently calls `__init`. This patch removes the usage of implementation details.
Reviewed By: ldionne, Mordante, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D123058
David Spickett [Wed, 6 Apr 2022 14:29:51 +0000 (14:29 +0000)]
[llvm][AArch64] Generate getExtensionFeatures from the list of extensions
This takes the AARCH64_ARCH_EXT_NAME in AArch64TargetParser.def and uses
it to generate all the "if bit is set add this feature name" code.
Which gives us a bunch that we were missing. I've updated testing
to include those and reordered them to match the order in the .def.
The final part of the test will catch any missing extensions if
we somehow manage to not generate an if block for them.
This has changed the order of cc1's "-target-feature" output so I've
updated some tests in clang to reflect that.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D123296
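The table-driven idea above can be sketched as follows. This is only an illustration: the bit values are made up, the three entries stand in for the full list in AArch64TargetParser.def, and `get_extension_features` here is a Python stand-in rather than the LLVM function.

```python
# Hypothetical (bit, feature name) table. The bit positions are placeholders;
# the real entries come from AARCH64_ARCH_EXT_NAME in AArch64TargetParser.def.
EXTENSIONS = [
    (1 << 0, "+crc"),
    (1 << 1, "+crypto"),
    (1 << 2, "+fp-armv8"),
]

def get_extension_features(bits):
    """Emit one feature name per set bit, in table order."""
    return [name for flag, name in EXTENSIONS if bits & flag]
```

Because the loop walks the table, adding a row is enough to cover a new extension, and the output order follows the table order.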
LLVM GN Syncbot [Mon, 11 Apr 2022 12:47:08 +0000 (12:47 +0000)]
[gn build] Port b4ad28da196d
Guoxiong Li [Mon, 11 Apr 2022 12:31:16 +0000 (08:31 -0400)]
[Clang] Override method ModuleImportRead in MultiplexASTDeserializationListener
Fixes https://llvm.org/PR54521
Differential Revision: https://reviews.llvm.org/D123452
Momchil Velikov [Mon, 11 Apr 2022 11:08:26 +0000 (12:08 +0100)]
[CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by the linear (non-CFG
aware) nature of the unwind tables.
Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.
This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is
complete and self-contained, i.e. CSR restore instructions (and the
corresponding CFI instructions) are not split across two or more
blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, and
has not executed any epilogue
- "does not have a call frame", if the function has not executed the
prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO
traversal.
From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions,
which come in two flavours:
- CFI instructions, which reset the unwind table state to the
initial one. This is done by a target specific hook and is
expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one
created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
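The scheme described above can be sketched on a toy CFG. This is a simplified model, not the pass itself: it assumes every block is reachable, all predecessors of a block agree on the frame state, and no block is both prologue and epilogue. The function names and the two fixup labels are illustrative.

```python
def reverse_post_order(succs, entry):
    """Blocks in reverse post-order of a depth-first walk from entry."""
    seen, post = set(), []
    def visit(b):
        seen.add(b)
        for s in succs[b]:
            if s not in seen:
                visit(s)
        post.append(b)
    visit(entry)
    return post[::-1]

def frame_states(succs, entry, prologue, epilogues):
    """One RPO pass: 'has a call frame' at the start and end of each block."""
    has_in, has_out = {entry: False}, {}
    for b in reverse_post_order(succs, entry):
        if b == prologue:
            has_out[b] = True
        elif b in epilogues:
            has_out[b] = False
        else:
            has_out[b] = has_in[b]
        for s in succs[b]:
            has_in.setdefault(s, has_out[b])
    return has_in, has_out

def cfi_fixups(layout, has_in, has_out):
    """Compare each block's entry state with the end of its layout predecessor."""
    fixups = {}
    for prev, cur in zip(layout, layout[1:]):
        if has_in[cur] != has_out[prev]:
            # With a frame: .cfi_restore_state + .cfi_remember_state.
            # Without one: the target hook that resets to the initial state.
            fixups[cur] = "restore_prologue_state" if has_in[cur] else "reset_to_initial"
    return fixups

# Example 1: an early return laid out before the main body.
succs1 = {"entry": ["check"], "check": ["early_ret", "work"],
          "early_ret": [], "work": ["ret"], "ret": []}
in1, out1 = frame_states(succs1, "entry", "entry", {"early_ret", "ret"})
fix1 = cfi_fixups(["entry", "check", "early_ret", "work", "ret"], in1, out1)

# Example 2: shrink-wrapped prologue with a frameless fast path.
succs2 = {"entry": ["slow", "fast"], "slow": ["slow_ret"], "fast": [], "slow_ret": []}
in2, out2 = frame_states(succs2, "entry", "slow", {"slow_ret"})
fix2 = cfi_fixups(["entry", "slow", "fast", "slow_ret"], in2, out2)
```

In the first example only `work` needs compensation, because it follows an epilogue block in layout order while still executing with a frame. In the second, `fast` follows the prologue block in layout but runs without a frame, so it needs the reset-to-initial sequence.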
Christian Sigg [Mon, 11 Apr 2022 09:29:32 +0000 (11:29 +0200)]
Remove deprecated `parseSourceFile/String()` overloads.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123490
Marius Brehler [Mon, 11 Apr 2022 11:45:44 +0000 (11:45 +0000)]
[mlir][emitc][nfc] Replace !emitc.opaque pointers
Replaces `!emitc.opaque` types used to express pointers with
`!emitc.ptr` types.
Sanjay Patel [Mon, 11 Apr 2022 11:09:47 +0000 (07:09 -0400)]
[SDAG] try to reduce compare of funnel shift equal 0
fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0
fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0
This is similar to an existing setcc-of-rotate fold, but the
matching requires more checks for the more general funnel op:
https://alive2.llvm.org/ce/z/Ab2jDd
We are effectively decomposing the funnel shift into logical
shifts, reassociating, and removing a shift.
This should get us the final improvements for x86-64 that were
originally shown in D111530
( https://github.com/llvm/llvm-project/issues/49541 );
x86-32 still shows some SHLD/SHRD, so the pattern is not
matching there yet.
Differential Revision: https://reviews.llvm.org/D122919
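Both folds above can be checked exhaustively at a small bit width. A minimal sketch, assuming an 8-bit type and a constant shift amount with 0 < C < BW; the helper names are illustrative and this is a brute-force check, not the DAG combine.

```python
BW = 8
MASK = (1 << BW) - 1

def fshl(a, b, c):
    """Funnel shift left: top BW bits of the pair (a:b) shifted left by c."""
    return ((a << c) | (b >> (BW - c))) & MASK

def fold_holds(x, y, c):
    """Compare each funnel-shift-vs-zero test with its folded form."""
    lhs1 = fshl(x | y, x, c) == 0
    rhs1 = (((y << c) & MASK) | x) == 0
    lhs2 = fshl(x, x | y, c) == 0
    rhs2 = ((y >> (BW - c)) | x) == 0
    return lhs1 == rhs1 and lhs2 == rhs2

# Every x, y and every shift amount strictly between 0 and BW.
FSHL_FOLD_OK = all(fold_holds(x, y, c)
                   for x in range(1 << BW)
                   for y in range(1 << BW)
                   for c in range(1, BW))
```

The check works because the shifted copies of X together cover all of X's bits, so the funnel shift is zero only if X itself is zero and the surviving part of Y is zero.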
Florian Hahn [Mon, 11 Apr 2022 11:27:38 +0000 (13:27 +0200)]
[LICM] Add additional test for load hoisting, simplify existing one.
Tim Northover [Mon, 11 Apr 2022 11:25:58 +0000 (12:25 +0100)]
Revert "AArch64: take compact unwind frame size from last CFI instruction."
It was on ToT when I pushed and committed unintentionally.
Tim Northover [Mon, 11 Apr 2022 11:23:05 +0000 (12:23 +0100)]
AArch64: add nvcast patterns for v1f64
Tim Northover [Mon, 7 Mar 2022 15:12:57 +0000 (15:12 +0000)]
AArch64: take compact unwind frame size from last CFI instruction.
Asynchronous exception support for the prologue means that there can be
multiple .cfi_def_cfa_offset instructions in a single function, which tripped
up an assertion in the compact unwind generator.
In reality the compact unwind format is far too restrictive to represent
asynchronous frames so if we ever wanted that on Darwin we'd fall back to DWARF
(possibly keeping compact unwind around for synchronous users). So the compact
format should continue to represent the synchronous situation, and the
assertion can be removed.
Tim Northover [Thu, 3 Mar 2022 12:38:05 +0000 (12:38 +0000)]
Tail calls: look through AssertZExt to find register copy.
arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and
this is modelled in the IR by inserting an AssertZExt after the CopyFromReg.
The function that checks whether the registers that must be preserved actually
are preserved wasn't expecting this, so it banned perfectly legitimate tail calls.
Nikita Popov [Mon, 11 Apr 2022 11:14:49 +0000 (13:14 +0200)]
[Clang] Add -no-opaque-pointers to native powerpc test (NFC)
Does not run on x86, so I missed this before. The test currently
has typed pointer check lines.
Simon Pilgrim [Mon, 11 Apr 2022 10:32:45 +0000 (11:32 +0100)]
[InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280)
As discussed on Issue #37628, we can flip a min/max node if we're subtracting from the sum of the node's operands
Alive2: https://alive2.llvm.org/ce/z/W_KXfy
Differential Revision: https://reviews.llvm.org/D123399
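The identity behind this fold holds even when the add wraps. A minimal brute-force sketch for signed 8-bit values (the helper names are illustrative, and only the signed min/max case is modelled):

```python
BITS = 8

def wrap(v):
    """Wrap v to a signed BITS-bit two's complement integer."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >> (BITS - 1) else v

def flipped_ok(x, y):
    """(x + y) - min(x, y) == max(x, y), and the same with min/max swapped."""
    s = wrap(x + y)
    return (wrap(s - min(x, y)) == max(x, y) and
            wrap(s - max(x, y)) == min(x, y))

SIGNED = range(-(1 << (BITS - 1)), 1 << (BITS - 1))
MINMAX_FOLD_OK = all(flipped_ok(x, y) for x in SIGNED for y in SIGNED)
```

Since min(x, y) and max(x, y) are just x and y in some order, subtracting one from the sum leaves the other, modulo 2^BITS.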
Iain Sandoe [Sat, 19 Mar 2022 19:48:38 +0000 (19:48 +0000)]
[C++20][Modules] Add testcases from section 10.2 dependent on header units.
This adds in testcases reflecting the remaining example in section 10.2
of the C++20 standard.
Differential Revision: https://reviews.llvm.org/D122124
gysit [Mon, 11 Apr 2022 10:19:40 +0000 (10:19 +0000)]
[mlir][vector] Swap ExtractSliceOp(TransferWriteOp).
Rewrite tensor::ExtractSliceOp(vector::TransferWriteOp) to vector::TransferWriteOp(tensor::ExtractSliceOp) if the full slice is overwritten and inserted into another tensor. After this rewrite, the operations bufferize in-place since all of them work on the same %iter_arg slice.
For example:
```mlir
%0 = vector.transfer_write %vec, %init_tensor[%c0, %c0]
: vector<8x16xf32>, tensor<8x16xf32>
%1 = tensor.extract_slice %0[0, 0] [%sz0, %sz1] [1, 1]
: tensor<8x16xf32> to tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>
```
folds to
```mlir
%0 = tensor.extract_slice %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<27x37xf32> to tensor<?x?xf32>
%1 = vector.transfer_write %vec, %0[%c0, %c0]
: vector<8x16xf32>, tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>
```
Reviewed By: nicolasvasilache, hanchung
Differential Revision: https://reviews.llvm.org/D123190
Sven van Haastregt [Mon, 11 Apr 2022 10:27:51 +0000 (11:27 +0100)]
[OpenCL] Add device enqueue guards for DSE builtins
Align guards of these builtins with opencl-c.h.
Simon Pilgrim [Mon, 11 Apr 2022 10:20:04 +0000 (11:20 +0100)]
[X86] Account for high uop/resource usage in BSF/BSR instructions
znver1/2 models were incorrectly modelling these as single uop instructions, instead of the microcoded nightmares they really are.
Now matches AMD SoG, Agner and instlatx64 numbers.
Fixes #54811
Nikita Popov [Mon, 11 Apr 2022 10:06:39 +0000 (12:06 +0200)]
[CGCall] Check store type in findDominatingStoreToReturnValue()
We need to make sure that the stored type matches the return type.
gysit [Mon, 11 Apr 2022 09:59:35 +0000 (09:59 +0000)]
[mlir][vector] Update transfer read/write doc (NFC).
Clarify the in_bounds attribute is specified for the vector dimensions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D123188
Jean Perier [Mon, 11 Apr 2022 09:56:44 +0000 (02:56 -0700)]
[flang] D123388 fix - remove unused variable from test
Haojian Wu [Mon, 11 Apr 2022 09:51:28 +0000 (11:51 +0200)]
[AST] Remove a duplicated getDecl method in TemplateName, NFC.
There is a TemplateName::getTemplateDecl which does the same work.
Mats Petersson [Fri, 8 Apr 2022 18:04:47 +0000 (19:04 +0100)]
[flang][runtime] Prefer process time over thread time in CPU_TIME
Most Fortran compilers appear to return the process time
for calls to CPU_TIME, where the flang implementation
prior to this change was returning the time used by the
current thread. This would cause incorrect time being
reported when for example OpenMP is used to share work
across multiple CPUs.
This patch changes the order of the selection of "what
time to return" so that if there is a process time to
report, that is the reported value, and only if that is
not available is the thread time considered instead.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D123416
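The selection order described above, as a Python analogue. This is a sketch of the idea only, not the flang runtime code; `cpu_time` is an illustrative name, and the clocks are the ones in Python's standard `time` module.

```python
import time

def cpu_time():
    """Prefer process-wide CPU time; fall back to the calling thread's time."""
    for clock in ("process_time", "thread_time"):
        fn = getattr(time, clock, None)  # a clock may be missing on a platform
        if fn is not None:
            return fn()
    # Fortran's CPU_TIME assigns a negative value when no clock is available.
    return -1.0
```

With work spread across several threads, the process clock accounts for all of them, whereas the thread clock would only count the caller.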
Simon Pilgrim [Mon, 11 Apr 2022 09:42:32 +0000 (10:42 +0100)]
Revert rG88ff6f70c45f2767576c64dde28cbfe7a90916ca "[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains"
Reverting while I investigate reports of internal test regressions/failures
Nikita Popov [Fri, 8 Apr 2022 09:40:02 +0000 (11:40 +0200)]
[ThinLTOCodeGenerator] Remove support for legacy PM
All users of NewPM=false for the (legacy) ThinLTOCodeGenerator
have been removed, so we can remove this functionality entirely.
Kiran Chandramohan [Mon, 11 Apr 2022 09:05:00 +0000 (09:05 +0000)]
[Flang][OpenMP] Add implementation of privatisation
Privatisation creates local copies of variables in the OpenMP region.
Two functions `createHostAssociateVarClone` and `copyHostAssociateVar`
are added to create a clone of the variable for basic privatisation and to
copy the contents for first-privatisation.
Note: Tests for more data-types will be added when the fir.do_loop is
upstreamed.
This is part of the upstreaming effort from the fir-dev branch in [1].
[1] https://github.com/flang-compiler/f18-llvm-project
Reviewed By: peixin, NimishMishra
Differential Revision: https://reviews.llvm.org/D122595
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Peter Klausler <pklausler@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Sourabh Singh Tomar <SourabhSingh.Tomar@amd.com>
Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>
Co-authored-by: Peixin-Qiao <qiaopeixin@huawei.com>
Nikita Popov [Thu, 7 Apr 2022 09:59:38 +0000 (11:59 +0200)]
[Clang] Enable opaque pointers by default
Enable opaque pointers by default in clang, which can be disabled
either via cc1 option -no-opaque-pointers or cmake flag
-DCLANG_ENABLE_OPAQUE_POINTERS=OFF.
See https://llvm.org/docs/OpaquePointers.html for context.
Differential Revision: https://reviews.llvm.org/D123300
Nikita Popov [Mon, 11 Apr 2022 09:13:05 +0000 (11:13 +0200)]
[Clang] Add -no-opaque-pointers to recently added test (NFC)
Iain Sandoe [Mon, 11 Apr 2022 07:56:50 +0000 (08:56 +0100)]
[C++20][Modules] Remove an empty statement [NFC].
This addresses a post commit review comment by removing an unused and empty
'else' (replaced with a comment).
Simon Pilgrim [Mon, 11 Apr 2022 08:54:42 +0000 (09:54 +0100)]
[X86] Add shuffle combine tests where we fail to fold a mask into a or(pshufb,pshufb) chain
This doesn't hit if we use pshufb intrinsics directly due to a change in lowering order
Nikita Popov [Fri, 8 Apr 2022 08:54:20 +0000 (10:54 +0200)]
[llvm-lto] Remove support for legacy pass manager
This removes support for the legacy pass manager in llvm-lto and
llvm-lto2. In this case I've dropped the use-new-pm option entirely,
as I don't think this is considered part of the public interface.
This also makes -debug-pass-manager work with llvm-lto, because
that was needed to migrate some tests to NewPM.
Differential Revision: https://reviews.llvm.org/D123376
jeanPerier [Mon, 11 Apr 2022 07:33:39 +0000 (09:33 +0200)]
[flang] Lower optionals in GET_COMMAND_ARGUMENT and GET_ENVIRONMENT_VARIABLE
Handle dynamic optional argument in GET_COMMAND_ARGUMENT and GET_ENVIRONMENT_VARIABLE
(previously compiled but caused segfaults). The previous code
handled static presence/absence aspects, but not when an absent dummy optional was
passed to one of the optional intrinsic arguments.
Simplify the runtime call lowering to simply lower the runtime call without
dealing with optionality there. This keeps the optional handling logic in
IntrinsicCall.cpp.
Note that the new code will generate some extra "if (not null addr )/then/else"
when the actual arguments are always there at runtime. That makes the implementation
a lot simpler/safer, and I think it is OK for now (I do not expect these runtime
functions to be called in hot loop nests).
Differential Revision: https://reviews.llvm.org/D123388
Jean Perier [Mon, 11 Apr 2022 07:32:03 +0000 (09:32 +0200)]
[flang] add a static assert in CheckUnitNumberInRangeImpl
Add a check that CheckUnitNumberInRangeImpl is not needlessly instantiated.
Differential Revision: https://reviews.llvm.org/D123285
Alexander Shaposhnikov [Mon, 11 Apr 2022 05:36:28 +0000 (05:36 +0000)]
[AArch64][NFC] Update comment in AArch64.td
Alexander Shaposhnikov [Mon, 11 Apr 2022 05:27:11 +0000 (05:27 +0000)]
[AArch64] Split fuse-literals feature
This diff splits fuse-literals feature and enables fuse-adrp-add by default,
in particular, it adjusts instruction scheduling to place ADRP+ADD pairs together.
This also enables the linker to apply the relaxations described in
https://github.com/ARM-software/abi-aa/commit/d2ca58c54b8e955cfef25c71822f837ae0439d73.
Differential revision: https://reviews.llvm.org/D120104
Test plan: make check-all
Patryk Wychowaniec [Mon, 11 Apr 2022 02:21:45 +0000 (02:21 +0000)]
[AVR] Merge AVRRelaxMemOperations into AVRExpandPseudoInsts
This commit contains a refactoring that merges AVRRelaxMemOperations
into AVRExpandPseudoInsts, so that we have a single place in code that
expands the STDWPtrQRr opcode.
Seizing the day, I've also fixed a couple of potential bugs with our
previous implementation (e.g. when the destination register was killed,
the previous implementation would try to .addDef() that killed
register, crashing LLVM in the process - that's fixed now, as proved by
the test).
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D122533
LiaoChunyu [Thu, 31 Mar 2022 06:16:24 +0000 (14:16 +0800)]
[RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic
The llvm.experimental.stepvector intrinsic for scalable vectors
crashes due to an invalid cost when the code is run through the loop unroller.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D122782
Yaxun (Sam) Liu [Fri, 8 Apr 2022 02:57:56 +0000 (22:57 -0400)]
[CUDA][HIP] Externalize kernels in anonymous name space
Kernels in anonymous namespaces need to have unique names
to avoid duplicate symbols.
Fixes: https://github.com/llvm/llvm-project/issues/54560
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D123353
Nico Weber [Mon, 11 Apr 2022 01:24:21 +0000 (21:24 -0400)]
[llvm-objcopy] Update comments with capitalization change from 6b575395d47b8
No behavior change.
Sheng [Mon, 11 Apr 2022 01:21:15 +0000 (01:21 +0000)]
Fix a misuse of `cast`
`cast` will assert instead of returning null pointer.
Florian Hahn [Sun, 10 Apr 2022 21:50:23 +0000 (23:50 +0200)]
[LAA] Add test with simpler load of pointer select.
Add a simpler test for D114487/D108699.
Florian Hahn [Sun, 10 Apr 2022 20:36:03 +0000 (22:36 +0200)]
[LICM] Add test for PR51248.
Test for #51248. LICM introduces an unused load in a writeonly function.
Florian Hahn [Sun, 10 Apr 2022 20:25:05 +0000 (22:25 +0200)]
[LICM] Trim unneeded functions from test, add promote-able load.
Clean up the test a bit. Also add a promote-able load, to make sure LICM
always has to hoist the load.
eop Chen [Sun, 10 Apr 2022 15:31:25 +0000 (08:31 -0700)]
[X86] Remove dead code from test case
Obvious NFC, no need to review.
Differential Revision: https://reviews.llvm.org/D123465
Nikolas Klauser [Sun, 10 Apr 2022 09:55:06 +0000 (11:55 +0200)]
[libc++] Rename the template arguments of the algorithm result types
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D123463
Simon Pilgrim [Sun, 10 Apr 2022 12:04:53 +0000 (13:04 +0100)]
[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains
Alex Fan [Thu, 24 Mar 2022 01:46:36 +0000 (09:46 +0800)]
[ORC] add lazy jit support for riscv64
This adds resolver, indirection and trampoline stubs for riscv64,
allowing lazy compilation to work.
It assumes hard float extension exists. I don't know the proper way to detect it as Triple doesn't provide the interface to check riscv +f +d abi.
I am also not sure if orclazy tests should be enabled because lli needs an additional -codemodel=melany for tests to pass.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D122543
Simon Pilgrim [Sun, 10 Apr 2022 10:03:08 +0000 (11:03 +0100)]
[X86] combineExtractSubvector - fold extract_subvector(insert_subvector(V,X,C1),C1)
extract_subvector(insert_subvector(V,X,C1),C1) -> insert_subvector(extract_subvector(V,C1),X,0)
More aggressively attempt to reduce the width of an extract_subvector source - we currently only do this if we're inserting into a zero vector (i.e. canonicalizing to the AVX implicit zero upper elts pattern).
But if we're extracting from the same point as the inner insert_subvector then the fold is still relatively trivial - we can probably do even better if we can ensure the subvector isn't badly split.
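The equivalence can be illustrated with lists standing in for vectors. A minimal sketch, assuming the inserted subvector is no wider than the extracted one and the extract stays in bounds; the helper names mirror the DAG node names but are plain Python.

```python
def insert_subvector(v, x, idx):
    """Overwrite len(x) elements of v starting at idx."""
    return v[:idx] + x + v[idx + len(x):]

def extract_subvector(v, idx, width):
    """Take width elements of v starting at idx."""
    return v[idx:idx + width]

def before(v, x, c1, width):
    """extract_subvector(insert_subvector(V, X, C1), C1)"""
    return extract_subvector(insert_subvector(v, x, c1), c1, width)

def after(v, x, c1, width):
    """insert_subvector(extract_subvector(V, C1), X, 0)"""
    return insert_subvector(extract_subvector(v, c1, width), x, 0)
```

For a 16-element `v`, a 4-element `x`, `c1 = 8` and `width = 8`, both forms give `x` followed by `v[12:16]`, but the second only ever touches the narrower vector.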
Fangrui Song [Sun, 10 Apr 2022 08:44:53 +0000 (01:44 -0700)]
[Driver] Prepend - to option name in err_drv_unsupported_option_argument diagnostic
Florian Hahn [Sun, 10 Apr 2022 08:26:20 +0000 (10:26 +0200)]
[VPlan] Place VPExpandSCEVRecipe in pre-header.
After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be
placed there. This ensures SCEV expansion happens before modifying the
CFG during VPlan execution, when CFG is incomplete.
Depends on D121624.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D122095
Fangrui Song [Sun, 10 Apr 2022 08:21:31 +0000 (01:21 -0700)]
[Driver] Simplify OPT_fcolor_diagnostics claim
Mostly NFC, but the diagnostic is changed to the more appropriate
err_drv_invalid_argument_to_option.
Fangrui Song [Sun, 10 Apr 2022 08:07:44 +0000 (01:07 -0700)]
[Driver] Simplify -f[no-]diagnostics-color handling. NFC
Make them aliases for -f[no-]color-diagnostics.
Fangrui Song [Sun, 10 Apr 2022 07:31:25 +0000 (00:31 -0700)]
[Frontend] Simplify -finline* handling. NFC
Fangrui Song [Sun, 10 Apr 2022 07:15:12 +0000 (00:15 -0700)]
[Driver] Fix -f[no-]inline to override -f[no-]inline-functions/-finline-hint-functions
Fix two cases to match GCC:
* -fno-inline -finline => (no cc1 option)
* -fno-inline -finline-functions => -fno-inline
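One model that is consistent with the two cases above is sketched here. It is an assumption made for illustration, not a transcription of the driver code: the last of -finline/-fno-inline is taken first, and the finer-grained options only matter when -fno-inline is not in effect. `cc1_inline_flag` is a hypothetical helper.

```python
DETAIL_FLAGS = ("-finline-functions", "-finline-hint-functions",
                "-fno-inline-functions")

def cc1_inline_flag(args):
    """Return the inline flag forwarded to cc1 under this model, or None."""
    last_basic = None    # last of -finline / -fno-inline
    last_detail = None   # last of the finer-grained inline options
    for a in args:
        if a in ("-finline", "-fno-inline"):
            last_basic = a
        elif a in DETAIL_FLAGS:
            last_detail = a
    if last_basic == "-fno-inline":
        return "-fno-inline"   # overrides any -finline-functions style option
    return last_detail
```

Under this model `-fno-inline -finline` forwards nothing and `-fno-inline -finline-functions` forwards `-fno-inline`, matching the two listed cases.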
Luo, Yuanke [Sun, 10 Apr 2022 06:32:20 +0000 (14:32 +0800)]
[X86][AMX] Fix infinite loop of getShape.
When walking the user chain to get the shape of a phi node, if there is
another phi node in the chain, we should walk to the user of that phi node
instead of the original phi node.
Craig Topper [Sun, 10 Apr 2022 03:01:32 +0000 (20:01 -0700)]
[RISCV] Remove unnecessary cast to i8* when converting gather/scatter to strided load/store.
Not sure why I thought this necessary at the time.
Alexander Shaposhnikov [Sun, 10 Apr 2022 02:09:11 +0000 (02:09 +0000)]
[ObjCopy][NFC] Refactor handling of linkedit_data_command in MachOWriter
Alexander Shaposhnikov [Sun, 10 Apr 2022 01:29:24 +0000 (01:29 +0000)]
[ObjCopy][NFC] Add missing const in MachOLayoutBuilder.h
Alexander Shaposhnikov [Sun, 10 Apr 2022 01:20:45 +0000 (01:20 +0000)]
[ObjCopy][NFC] Refactor handling of linkedit_data_command
Aaron Ballman [Sat, 9 Apr 2022 21:23:32 +0000 (17:23 -0400)]
Giving a lot more functions prototypes; NFC
This should address https://lab.llvm.org/buildbot/#/builders/37/builds/12315
and speculatively fix other similar diagnostics.
Bill Wendling [Sat, 9 Apr 2022 20:24:59 +0000 (13:24 -0700)]
[randstruct] NFC change to use static
LLVM GN Syncbot [Sat, 9 Apr 2022 20:16:19 +0000 (20:16 +0000)]
[gn build] Port 7aa8c38a9e19
Connor Kuehl [Sat, 9 Apr 2022 06:36:51 +0000 (23:36 -0700)]
[randstruct] Add randomize structure layout support
The Randstruct feature is a compile-time hardening technique that
randomizes the field layout for designated structures of a code base.
Admittedly, this is mostly useful for closed-source releases of code,
since the randomization seed would need to be available for public and
open source applications.
Why implement it? This patch set enhances Clang’s feature parity with
that of GCC which already has the Randstruct feature. It's used by the
Linux kernel in certain structures to help thwart attacks that depend on
structure layouts in memory.
This patch set is a from-scratch reimplementation of the Randstruct
feature that was originally ported to GCC. The patches for the GCC
implementation can be found here:
https://www.openwall.com/lists/kernel-hardening/2017/04/06/14
Link: https://lists.llvm.org/pipermail/cfe-dev/2019-March/061607.html
Co-authored-by: Cole Nixon <nixontcole@gmail.com>
Co-authored-by: Connor Kuehl <cipkuehl@gmail.com>
Co-authored-by: James Foster <jafosterja@gmail.com>
Co-authored-by: Jeff Takahashi <jeffrey.takahashi@gmail.com>
Co-authored-by: Jordan Cantrell <jordan.cantrell@mail.com>
Co-authored-by: Nikk Forbus <nicholas.forbus@gmail.com>
Co-authored-by: Tim Pugh <nwtpugh@gmail.com>
Co-authored-by: Bill Wendling <isanbard@gmail.com>
Signed-off-by: Bill Wendling <isanbard@gmail.com>
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D121556
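The role of the seed can be shown with a plain seeded shuffle. This is only an illustration of seed-determined layout, not Clang's randomization algorithm; `randomize_layout` and the field names are made up.

```python
import random

def randomize_layout(fields, seed):
    """Return the field names in an order determined entirely by seed."""
    shuffled = list(fields)                  # leave the declared order untouched
    random.Random(seed).shuffle(shuffled)    # private generator, reproducible
    return shuffled

FIELDS = ["uid", "gid", "flags", "handler", "next"]
```

Every build that uses the same seed sees the same layout, which is why the seed has to be kept stable across translation units, and why a published seed gives an attacker the layout back.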
Florian Hahn [Sat, 9 Apr 2022 19:32:58 +0000 (21:32 +0200)]
[IRBuilder] Remove commented out include.
Looks like this was left over during some include optimizations. Remove
it.
Simon Pilgrim [Sat, 9 Apr 2022 16:53:21 +0000 (17:53 +0100)]
[X86] Remove cfi noise from splat-for-size.ll tests
Fangrui Song [Sat, 9 Apr 2022 16:46:39 +0000 (09:46 -0700)]
Add some prototypes to fix -Wstrict-prototypes. NFC
PeixinQiao [Sat, 9 Apr 2022 15:57:50 +0000 (23:57 +0800)]
[flang] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build
Fix the warning from D122483.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D123455
Craig Topper [Sat, 9 Apr 2022 15:32:42 +0000 (08:32 -0700)]
[RISCV] Only try LUI+SH*ADD+ADDI for int materialization if LUI+ADDI+SH*ADD failed.
There's an assert in LUI+SH*ADD+ADDI materialization that makes sure the
lower 12 bits aren't zero since that case should have been handled as
LUI+ADDI+SH*ADD. But nothing prevented the LUI+SH*ADD+ADDI checks from
running after the earlier code handled it.
The sequence would be the same length or longer so it wouldn't replace
the earlier sequence, but the assert happened before that was checked.
The vector holding the sequence also wasn't reset before the second
check so that guaranteed the sequence would never be found to be
shorter.
This patch fixes this by only trying the second expansion when the
earlier fails.
Fixes PR54812.
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D123406
Simon Pilgrim [Sat, 9 Apr 2022 15:47:53 +0000 (16:47 +0100)]
[X86] Add original test coverage for Issue #54819
Simon Pilgrim [Sat, 9 Apr 2022 15:05:46 +0000 (16:05 +0100)]
[X86] Fold concat(pshufb(x,y),pshufb(z,w)) -> pshufb(concat(x,z),concat(y,w))
owenca [Sat, 9 Apr 2022 14:55:38 +0000 (07:55 -0700)]
[clang-format] Add execute permission to dump_format_help.py
Aaron Ballman [Sat, 9 Apr 2022 14:51:06 +0000 (10:51 -0400)]
Add some prototypes to these functions; NFC
This is expected to fix the issues in this build bot:
https://lab.llvm.org/buildbot/#/builders/37/builds/12312
LLVM GN Syncbot [Sat, 9 Apr 2022 14:04:27 +0000 (14:04 +0000)]
[gn build] Port a96443eddedc
Nikolas Klauser [Sat, 9 Apr 2022 07:41:19 +0000 (09:41 +0200)]
[libc++] Implement P0401R6 (allocate_at_least)
Reviewed By: ldionne, var-const, #libc
Spies: mgorny, libcxx-commits, arichardson
Differential Revision: https://reviews.llvm.org/D122877
Simon Pilgrim [Sat, 9 Apr 2022 13:09:21 +0000 (14:09 +0100)]
[X86] lowerV64I8Shuffle - attempt to fold to SHUFFLE(ALIGNR(X,Y)) and OR(PSHUFB(X),PSHUFB(Y))
Aaron Ballman [Sat, 9 Apr 2022 12:35:15 +0000 (08:35 -0400)]
Add some prototypes to these checks; NFC
This should address a build bot failure:
https://lab.llvm.org/buildbot/#/builders/18/builds/4495
Florian Hahn [Sat, 9 Apr 2022 12:19:47 +0000 (14:19 +0200)]
[VPlan] Model pre-header explicitly.
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.
Depends on D121623.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121624
Simon Pilgrim [Sat, 9 Apr 2022 11:52:56 +0000 (12:52 +0100)]
[X86][SSE] combineSelect - more aggressively create zero elements in the or(pshufb(x), pshufb(y)) fold
When we fold vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)), ensure we convert all undef elements to zero elements - this should help us expose more known zero elements for deeper chains of these cases.
Noticed while triaging Issue #54819
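The base fold relies on a pshufb control byte with the top bit set producing zero. A minimal sketch on 16-byte lists, assuming the select condition is a known constant; `fold_to_or` is an illustrative name and the undef handling from this commit is not modelled.

```python
ZERO = 0x80  # control byte with the top bit set: the output byte is 0

def pshufb(src, mask):
    """Byte shuffle: zero if the top bit is set, else src[low 4 bits]."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def vselect(cond, a, b):
    """Per-lane select: a where cond is true, else b."""
    return [ai if c else bi for c, ai, bi in zip(cond, a, b)]

def fold_to_or(cond, x, mx, y, my):
    """or(pshufb(x, mx'), pshufb(y, my')) with the unselected lanes zeroed."""
    mx2 = [m if c else ZERO for c, m in zip(cond, mx)]
    my2 = [ZERO if c else m for c, m in zip(cond, my)]
    return [p | q for p, q in zip(pshufb(x, mx2), pshufb(y, my2))]
```

Each lane of the OR has exactly one non-zeroed source, so it reproduces the select. Forcing more mask bytes to the zero encoding gives later combines more known-zero lanes to work with.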
Jonas Hahnfeld [Fri, 8 Apr 2022 08:30:30 +0000 (10:30 +0200)]
[CUDA/HIP] Remove argument from module ctor/dtor signatures
In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used in
the generated code and if it was, the first argument should be an
integer, not a pointer. For destructors registered via atexit, the
function should never take an argument.
Differential Revision: https://reviews.llvm.org/D123370
Simon Pilgrim [Sat, 9 Apr 2022 10:33:15 +0000 (11:33 +0100)]
[X86] Add v64i8 shuffle test coverage
Legalized shuffle masks based on the test cases from Issue #54819
Simon Pilgrim [Sat, 9 Apr 2022 09:59:18 +0000 (10:59 +0100)]
[X86] Reduce some superfluous diffs between znver1/znver2 models. NFC
znver2 is mainly a search+replace of the znver1 model, but for no reason some lines have been moved around - try to keep these in sync (no actual changes in the models).
Simon Pilgrim [Sat, 9 Apr 2022 09:33:03 +0000 (10:33 +0100)]
[LoopVectorize] Regenerate first-order-recurrence.ll
Simon Pilgrim [Sat, 9 Apr 2022 08:26:58 +0000 (09:26 +0100)]
[AArch64] validateTargetOperandClass - early out from MCK_MPR case. NFCI.
If it didn't match a za register, there's nothing we can do.
Fixes static analyzer uninitialized variable warning.
Kai Luo [Sat, 9 Apr 2022 08:36:57 +0000 (16:36 +0800)]
[PowerPC] Generate tests for 16-byte atomic load/store. NFC.
Vitaly Buka [Sat, 9 Apr 2022 07:57:16 +0000 (00:57 -0700)]
[sanitizer] Disable new test on Android to fix a bot
LLVM GN Syncbot [Sat, 9 Apr 2022 07:40:37 +0000 (07:40 +0000)]
[gn build] Port 889302292bf6
Mark de Wever [Sun, 26 Sep 2021 14:48:25 +0000 (16:48 +0200)]
[libc++][format][4/6] Improve formatted_size.
Use a specialized "buffer" to count the number of insertions instead of
using a `string` as storage type.
Depends on D110497.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D110498
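The counting idea can be sketched as an output sink that records how many characters it was given without storing them. This is a Python analogue with made-up names, not the libc++ class.

```python
class CountingBuffer:
    """Output sink that counts insertions instead of storing characters."""
    def __init__(self):
        self.count = 0

    def push_back(self, ch):
        self.count += 1   # the character itself is dropped

def formatted_size(pieces):
    """Number of characters the formatted pieces would occupy."""
    buf = CountingBuffer()
    for piece in pieces:
        for ch in piece:
            buf.push_back(ch)
    return buf.count
```

For example, `formatted_size(["x=", "42"])` is 4, and no intermediate string is ever built.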
Mark de Wever [Sun, 26 Sep 2021 13:47:42 +0000 (15:47 +0200)]
[libc++][format][3/6] Adds a __container_buffer.
Instead of writing every character directly into the container by using
a `back_insert_iterator` the data is buffered in an `array`. This buffer
is then inserted to the container by calling its `insert` member function.
Since there's no guarantee every container's `insert` behaves properly
containers need to opt-in to this behaviour. The appropriate standard
containers opt-in to this behaviour.
This change improves the performance of the format functions that use a
`back_insert_iterator`.
Depends on D110495
Reviewed By: ldionne, vitaut, #libc
Differential Revision: https://reviews.llvm.org/D110497
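The buffering scheme above can be sketched as follows. This is a Python analogue under simplifying assumptions: `ContainerBuffer`, its tiny `capacity` and the `flushes` counter are illustrative, and `list.extend` stands in for the container's bulk `insert`.

```python
class ContainerBuffer:
    """Collect characters in a fixed-size chunk and flush them in bulk."""
    def __init__(self, container, capacity=4):
        self.container = container
        self.capacity = capacity
        self.chunk = []
        self.flushes = 0

    def push_back(self, ch):
        self.chunk.append(ch)
        if len(self.chunk) == self.capacity:
            self.flush()

    def flush(self):
        if self.chunk:
            self.container.extend(self.chunk)  # one bulk insert per chunk
            self.chunk.clear()
            self.flushes += 1

out = []
buf = ContainerBuffer(out, capacity=4)
for ch in "hello, world":
    buf.push_back(ch)
buf.flush()  # push out whatever is left in the chunk
```

Twelve characters with a capacity of four reach the container in three bulk operations rather than twelve single-character appends.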
Fangrui Song [Sat, 9 Apr 2022 06:40:18 +0000 (23:40 -0700)]
Reland "[Driver] Default CLANG_DEFAULT_PIE_ON_LINUX to ON"
(With C++ exceptions, `clang++ --target=mips64{,el}-linux-gnu -fpie -pie
-fuse-ld=lld` has link errors (lld does not implement some strange R_MIPS_64
.eh_frame handling in GNU ld). However, sanitizer-x86_64-linux-qemu used this to
build ScudoUnitTests. Pinned ScudoUnitTests to -no-pie.)
Default the option introduced in D113372 to ON to match all(?) major Linux
distros. This matches GCC and improves consistency with Android and linux-musl
which always default to PIE.
Note: CLANG_DEFAULT_PIE_ON_LINUX may be removed in the future.
Differential Revision: https://reviews.llvm.org/D120305
Fangrui Song [Sat, 9 Apr 2022 06:30:07 +0000 (23:30 -0700)]
[scudo][test] Link with -no-pie to be agnostic of CLANG_DEFAULT_PIE_ON_LINUX
This keeps the test behavior unchanged when CLANG_DEFAULT_PIE_ON_LINUX switches
to ON by default.
Note: current clang --target=mips64el-linux-gnu -fpie -pie -fuse-ld=lld
does not link with C++ exceptions, using -pie would lead to
```
ld.lld: error: cannot preempt symbol: DW.ref.__gxx_personality_v0
...
ld.lld: error: relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
...
```
when linking `ScudoUnitTests`: https://lab.llvm.org/buildbot/#/builders/169/builds/7311/steps/18/logs/stdio