Joseph Huber [Fri, 7 Jul 2023 18:18:41 +0000 (13:18 -0500)]
[Libomptarget] Refine logic for determining if we support RPC
Summary:
Add a requirement for the GPU libc to only be on if it's enabled
explicitly. Fix the logic around the pythonification of the variable.
Alex Zinenko [Fri, 7 Jul 2023 14:35:04 +0000 (14:35 +0000)]
[mlir] add a simple gpu barrier elimination mechanism
GPU code generation, and specifically the shared memory copy insertion,
may introduce spurious barriers guarding read-after-read dependencies or
read-after-write on non-aliasing data, which degrades performance due to
unnecessary synchronization. Add a pattern and transform op that removes
such barriers by analyzing memory effects that the barrier actually
guards that are not also guarded by other barriers. The code is adapted
from the Polygeist incubator project.
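The idea can be sketched on a toy model. This is not the MLIR pattern itself: the program representation, `conflicts`, and `eliminate_barriers` are hypothetical names, the code is straight-line only, and buffers with different names are assumed not to alias.

```python
# Toy model, not the MLIR implementation. A program is a list of steps;
# each step is either the string "barrier" or a (kind, buffer) pair with
# kind in {"read", "write"}. Distinct buffer names are assumed not to alias.

def conflicts(a, b):
    """Two accesses conflict if they touch the same buffer and one writes."""
    return a[1] == b[1] and "write" in (a[0], b[0])

def eliminate_barriers(steps):
    steps = list(steps)
    i = 0
    while i < len(steps):
        if steps[i] != "barrier":
            i += 1
            continue
        # Accesses since the previous remaining barrier ...
        before = []
        for s in reversed(steps[:i]):
            if s == "barrier":
                break
            before.append(s)
        # ... and accesses up to the next barrier. Anything further away
        # is already guarded by those neighbouring barriers.
        after = []
        for s in steps[i + 1:]:
            if s == "barrier":
                break
            after.append(s)
        if any(conflicts(a, b) for a in before for b in after):
            i += 1        # guards a real dependency: keep it
        else:
            del steps[i]  # only read-after-read or non-aliasing data: drop it
    return steps

program = [
    ("write", "A"), "barrier",   # write-then-read on A: needed
    ("read", "A"), "barrier",    # read-after-read on A: spurious
    ("read", "A"), "barrier",    # read A, then write B: non-aliasing
    ("write", "B"), ("read", "A"),
]
print(eliminate_barriers(program))
```

Only the first barrier survives in this example, because it is the only one separating a write from a later read of the same buffer.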
Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
Reviewed By: nicolasvasilache, wsmoses
Differential Revision: https://reviews.llvm.org/D154720
Kirill Stoimenov [Fri, 7 Jul 2023 18:45:26 +0000 (18:45 +0000)]
[ASAN] Disable mmap test on Windows.
Nemanja Ivanovic [Fri, 7 Jul 2023 16:03:28 +0000 (12:03 -0400)]
Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
Rafael Ubal Tena [Fri, 7 Jul 2023 17:52:36 +0000 (10:52 -0700)]
New features and bug fix in MLIR test generation tool
- Option `--variable_names <names>` allows the user to pass names for FileCheck
regexps representing variables. Variable names are separated by commas, and
empty names can be used to generate specific variable names automatically.
For example, `--variable_names arg_0,arg_1,,,result` will produce regexp names
`ARG_0`, `ARG_1`, `VAR_0`, `VAR_1`, `RESULT`, `VAR_2`, `VAR_3`, ...
- Option '--attribute_names <names>' can be used to generate global regexp names
to represent attributes. Useful for affine maps. Same behavior as
'--variable_names'.
- Bug fixed for scope detection of SSA variables in ops with nested regions that
return SSA values (e.g., 'linalg.generic'). Originally, returned SSA values were
inserted in the nested scope.
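The naming rule described for `--variable_names` can be sketched as follows. This is an illustration of the behavior stated above, not the code in generate-test-checks.py; `regexp_names` is a hypothetical helper.

```python
def regexp_names(user_names, count):
    """Return `count` FileCheck variable names.

    Non-empty user-supplied names are upper-cased; empty entries, and any
    variables past the end of the user list, get automatic VAR_<n> names.
    """
    names = []
    auto = 0  # counter for automatically generated names
    for i in range(count):
        name = user_names[i] if i < len(user_names) else ""
        if name:
            names.append(name.upper())
        else:
            names.append(f"VAR_{auto}")
            auto += 1
    return names

print(regexp_names("arg_0,arg_1,,,result".split(","), 7))
# ['ARG_0', 'ARG_1', 'VAR_0', 'VAR_1', 'RESULT', 'VAR_2', 'VAR_3']
```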
This version of the tool has been used to generate unit tests for the following
patch: https://reviews.llvm.org/D153291
For example, the main body of the test named 'test_select_2d_one_dynamic' was
generated using the following command:
```
$ mlir-opt -pass-pipeline='builtin.module(func.func(tosa-to-linalg))' test_select_2d_one_dynamic.tosa.mlir | generate-test-checks.py --attribute_names map0,map1,map2 --variable_names arg0,arg1,arg2,const1,arg0_dim1,arg1_dim1,,arg2_dim1,max_dim1,,,arg0_broadcast,,,,,,,arg1_broadcast,,,,,,,arg2_broadcast,,,,,,result
```
Reviewed By: eric-k256
Differential Revision: https://reviews.llvm.org/D154458
LLVM GN Syncbot [Fri, 7 Jul 2023 17:56:47 +0000 (17:56 +0000)]
[gn build] Port
b4a62b1fa546
Joseph Huber [Fri, 7 Jul 2023 17:49:09 +0000 (12:49 -0500)]
[Libomptarget] Fix test logic for optionally adding the libcgpu.a
Summary:
This was not operating as expected and was causing the build to fail on
non-configured systems.
Christudasan Devadasan [Wed, 17 May 2023 05:46:58 +0000 (11:16 +0530)]
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions: SGPR spilling can fail
when there aren't enough VGPRs, and we are then forced
to spill the leftover into memory during PEI. The custom
spill handling during PEI has many edge cases and breaks
the compiler from time to time.
This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.
Differential Revision: https://reviews.llvm.org/D124196
Joseph Huber [Sat, 1 Jul 2023 03:08:42 +0000 (22:08 -0500)]
[Libomptarget] Begin implementing support for RPC services
This patch adds the initial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
initialize its interface. We then run a basic server to check the RPC
buffer.
This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify the number of ports. This is currently
capped in the implementation but will be adjusted soon.
Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchronous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154312
Jay Foad [Thu, 8 Jun 2023 08:50:47 +0000 (09:50 +0100)]
[PEI][Mips] Switch to backwards frame index elimination
This adds support for running PEI::replaceFrameIndicesBackward with no
RegisterScavenger, and basic support for eliminating call frame pseudo
instructions.
Differential Revision: https://reviews.llvm.org/D154347
Jay Foad [Tue, 6 Jun 2023 14:14:53 +0000 (15:14 +0100)]
[PEI] Simplify iterator handling in replaceFrameIndicesBackward. NFCI.
Differential Revision: https://reviews.llvm.org/D154346
Christudasan Devadasan [Fri, 7 Jul 2023 17:28:06 +0000 (22:58 +0530)]
[AMDGPU] Enable whole wave register copy
So far, we haven't exposed the allocation of whole-wave
registers to regalloc. We hand-picked them for various
whole wave mode operations. With a future patch, we
want the allocator to efficiently allocate them rather
than using the custom pre-allocation pass.
Any liverange split of virtual registers involved in
whole-wave operations requires the resulting COPY
introduced with the split to be performed for all
lanes. It isn't implemented in the compiler yet.
This patch would identify all such copies and
manipulate the exec mask around them to enable all
lanes without affecting the value of exec mask
elsewhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D143762
Chia-hung Duan [Fri, 7 Jul 2023 17:27:46 +0000 (17:27 +0000)]
[scudo] Allow pushing single block to the freelist of BatchClass
This CL removes the restriction that pushing blocks into BatchClassId
can only be done when freelist is not empty. Without this constraint,
BatchClassId is also available for gathering blocks into groups.
Reviewed By: cferris
Differential Revision: https://reviews.llvm.org/D153492
Christudasan Devadasan [Fri, 7 Jul 2023 17:21:32 +0000 (22:51 +0530)]
[AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.
This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually cause the whole
wave register spills during allocation.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D143759
Alex Langford [Wed, 5 Jul 2023 21:56:25 +0000 (14:56 -0700)]
[lldb][NFCI] Remove use of Stream * from TypeSystem
We always assume these streams are valid, might as well take references
instead of raw pointers.
Differential Revision: https://reviews.llvm.org/D154549
Yashwant Singh [Fri, 7 Jul 2023 16:59:30 +0000 (22:29 +0530)]
[CodeGen] Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes (threads) that are active. This is mostly sufficient;
however, we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.
Allowing targets to use their own copy instructions (if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.
- Introduce getTargetCopyOpcode() virtual function and use it to generate copy in live range
splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.
Reviewed By: arsenm, cdevadas, kparzysz
Differential Revision: https://reviews.llvm.org/D150388
Tamir Duberstein [Fri, 7 Jul 2023 16:50:52 +0000 (16:50 +0000)]
[LLVM-C] Use unwrapDI in LLVMDITypeGetName
This function otherwise crashes. This behavior has been incorrect since
its introduction in
260b581498bed0b847e7e086852e0082d266711d (D46627).
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D154630
Sami Tolvanen [Thu, 6 Jul 2023 23:42:01 +0000 (23:42 +0000)]
[Clang] Emit KCFI type hashes for member functions
With `-fsanitize=kcfi`, Clang currently won't emit type hashes for
C++ member functions, which leads to check failures if they are
indirectly called. As there's no reason to exclude member functions
in CodeGenModule::setKCFIType, emit type hashes also for them to fix
member function pointer calls with KCFI, and add a test to confirm
that types are emitted correctly.
Joseph Huber [Fri, 7 Jul 2023 11:43:08 +0000 (06:43 -0500)]
[libc] Enable aliasing on AMDGPU targets
AMDGPU supports aliases now, so we can drop this case and leave it only
for the NVPTX target. Unfortunately it's unlikely that NVPTX will be
able to support this in the future due to their PTX language being very
limited.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154704
Tamir Duberstein [Fri, 7 Jul 2023 16:28:22 +0000 (16:28 +0000)]
Appease clang-tidy
Address clang-tidy warnings, correct usage text.
Differential Revision: https://reviews.llvm.org/D154661
Alex Brachet [Fri, 7 Jul 2023 16:24:51 +0000 (16:24 +0000)]
revert filter_iterator_impl::operator++ changes
Ingo Müller [Fri, 7 Jul 2023 16:06:00 +0000 (16:06 +0000)]
[mlir][doc][transform] Fix link to documentation in tutorial. (NFC)
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D154724
Craig Topper [Fri, 7 Jul 2023 15:48:58 +0000 (08:48 -0700)]
[RISCV] Add a guard condition to orc_b/brev8 handling in ReplaceNodeResults.
The orc_b and brev8 intrinsics are type overloaded, but only
i32 and XLen are supported types. The type legalization code in
ReplaceNodeResults only handles the i32 case on RV64. Add some
checks so we will fail type legalization for other types.
Spenser Bauman [Fri, 7 Jul 2023 15:50:24 +0000 (15:50 +0000)]
[mlir] Implement one-to-n structural conversion for ForOp
Add the missing one-to-n structural type conversion pattern for the
scf.for operation.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D154299
Kirill Stoimenov [Thu, 6 Jul 2023 21:17:48 +0000 (21:17 +0000)]
[ASAN] Add mmap and munmap interceptor in ASAN
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D154659
Ashay Rane [Fri, 7 Jul 2023 14:54:40 +0000 (09:54 -0500)]
[mlir] use shared pointer to prevent vector resizes from destroying ops
The `MapVector` type stores key-value pairs in a vector, which, when
resized, copies the entries and destroys the old ones. This causes the
underlying operations to be deleted, subsequently causing segfaults.
This patch makes the `mappings` map type refer to a shared pointer
instead, so that resizing the vector doesn't call the operations'
destructors.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D154511
Craig Topper [Fri, 7 Jul 2023 15:15:16 +0000 (08:15 -0700)]
[RISCV] Remove pseudos for vwcvt.f.x(u) with rounding mode.
vwcvt.f.x doesn't use rounding mode. The integer value fits in
the mantissa of a 2x larger FP type so no rounding is required.
I've removed the Uses = [FRM] that is also not needed.
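The exactness claim can be checked with a small worked example, assuming IEEE binary32 (24-bit significand) and binary64 (53-bit significand) as the wider types; `roundtrip_f32` is a hypothetical helper.

```python
import struct

def roundtrip_f32(value):
    """Convert an integer to IEEE binary32 and read it back as a float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

# Widening 16-bit int -> binary32: every signed or unsigned 16-bit value
# fits in the 24-bit significand, so no rounding ever happens.
assert all(roundtrip_f32(v) == v for v in range(-2**15, 2**16))

# Widening 32-bit int -> binary64 (Python floats): the extremes are exact too.
for v in (-2**31, 2**31 - 1, 2**32 - 1):
    assert int(float(v)) == v

# Contrast: a same-width conversion (32-bit int -> binary32) can round,
# which is why that form does depend on the rounding mode.
assert roundtrip_f32(2**24 + 1) == 2**24
```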
I deleted the isel patterns. Alternatively, we could keep them and
drop the rounding mode immediate. The patterns are currently untested
so I chose to delete them. If they become needed in the future, we
can decide then if we should have the patterns or teach the node
creation to use the non-RM form for widening.
This reverts part of D142102.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D154653
Alex Brachet [Fri, 7 Jul 2023 15:10:56 +0000 (15:10 +0000)]
[ADT] fix postfix filter_iterator_impl::operator++
Aaron Ballman [Fri, 7 Jul 2023 15:08:17 +0000 (11:08 -0400)]
Fix the Clang sphinx bot
This adds AMDGPUSupport to the index page to address the issue found by:
https://lab.llvm.org/buildbot/#/builders/92/builds/46936
Alex Brachet [Fri, 7 Jul 2023 14:46:44 +0000 (14:46 +0000)]
Reland: "[ADT] fix filter_iterator_impl::operator++"
Qiu Chaofan [Fri, 7 Jul 2023 14:42:46 +0000 (22:42 +0800)]
[PowerPC] Update InputOps of Power10 SchedModel
The count of input operands affects pipeline forwarding in the scheduling model.
Previous Power10 model definition arranges some instructions into
incorrect groups, by counting the wrong number of input operands.
This patch updates the model, setting the input operands count correctly
by excluding irrelevant immediate operands and counting memory operands of
load instructions correctly.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D153842
Alex Zinenko [Wed, 5 Jul 2023 10:18:24 +0000 (10:18 +0000)]
[mlir] generalize matchers to support batch matmul
Mostly the same logic applies, with a different rank.
Additionally expose the logic to identify contraction dimensions and
contraction-like bodies as independent transform ops. This allows us to
recognize "generic" operations and not only the named ones.
Rework the contraction body matching logic to no longer rely on
contraction operations being uniquely named.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D154498
Corentin Jabot [Fri, 7 Jul 2023 14:41:17 +0000 (16:41 +0200)]
[Clang][NFC] Fix the title of P2361 in the release notes
Luke Lau [Fri, 7 Jul 2023 14:36:17 +0000 (15:36 +0100)]
[RISCV] Update loop vectorizer interleaved access test output
02bb33c3ce7a83d47244ae16c8b4c625aba187a2 changed it so it no longer unrolls the
loop.
cor3ntin [Fri, 7 Jul 2023 14:36:47 +0000 (16:36 +0200)]
[Clang][NFC] Mark that P2280 was approved as a DR
Yaxun (Sam) Liu [Thu, 29 Jun 2023 19:09:14 +0000 (15:09 -0400)]
[amdgpu] start documenting amdgpu support by clang
Reviewed by: Matt Arsenault, Johannes Doerfert, Jacob Lambert
Differential Revision: https://reviews.llvm.org/D154133
Luke Lau [Wed, 5 Jul 2023 18:14:58 +0000 (19:14 +0100)]
[RISCV] Check for alignment when lowering interleaved/deinterleaved loads/stores
As noted by @reames, we should be checking that the memory access is aligned to
the element size (or the unaligned vector memory access feature is enabled)
before lowering vlseg/vsseg intrinsics via the interleaved access pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D154536
Luke Lau [Wed, 5 Jul 2023 17:37:28 +0000 (18:37 +0100)]
[RISCV] Add tests for unaligned segmented loads and stores
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D154535
spupyrev [Tue, 13 Jun 2023 17:08:00 +0000 (10:08 -0700)]
A new code layout algorithm for function reordering [1/3]
We are bringing a new algorithm for function layout (reordering) based on the
call graph (extracted from profile data). The algorithm is an improvement on
top of a known heuristic, C^3. It tries to co-locate hot functions and functions
that are frequently executed together in the resulting ordering. Unlike C^3, it
explores a larger search space and has an objective closely tied to the
performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
This diff modifies the existing data structures to facilitate the implementation
(down the stack). This is a no-op change.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D152833
Nikita Popov [Fri, 7 Jul 2023 13:17:02 +0000 (15:17 +0200)]
[InstCombine] Handle unreachable edge when branching to loop
The successor is unreachable if either this is the only edge, or
this is an edge into a loop, in which case other predecessors
don't matter. This is exactly what the edge dominance check does.
Alex Brachet [Fri, 7 Jul 2023 14:03:58 +0000 (14:03 +0000)]
Revert "[ADT] fix filter_iterator_impl::operator++"
This reverts commit
5a67fa2b17c4db64ef00fbe672a4b59d26039828.
Matt Arsenault [Thu, 1 Jun 2023 23:04:47 +0000 (19:04 -0400)]
SimpleLoopUnswitch: Restore uniform unswitch test
This was supposed to document the new PM limitation but
was deleted in
fb4113ef0c8b2c5e5e2817e9ca14fb57a6d252be
Switch to generated checks since that's more reliable than XFAIL, and
just preserve the preferred results as comments.
Alex Brachet [Fri, 7 Jul 2023 13:50:12 +0000 (13:50 +0000)]
[ADT] fix filter_iterator_impl::operator++
Matt Arsenault [Fri, 2 Jun 2023 10:59:25 +0000 (06:59 -0400)]
TTI: Pass function to hasBranchDivergence in a few passes
https://reviews.llvm.org/D152033
Denis Revunov [Tue, 20 Jun 2023 15:00:36 +0000 (15:00 +0000)]
Reland "[BOLT][Instrumentation] Put Allocator itself in shared memory by default"
The issue was caused by the absence of placement new definition. It
worked for clang and thus passed Phabricator checks, but broke when
compiled with GCC on buildbot.
Full problem description: https://reviews.llvm.org/D153771#4468239
Original patch description:
In absence of instrumentation-file-append-pid option,
global allocator uses shared pages for allocation. However, since it is a
global variable, it gets COW'd after fork if instrumentation-sleep-time
is used, or if a process forks by itself. This means it hands the same
pages to every process which causes hash table corruption. Thus, if we
want shared pages, we need to put the allocator itself in a shared page,
which we do in this commit in __bolt_instr_setup.
I also added a couple of assertions to sanity-check the hash table.
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D153771
Matthias Springer [Fri, 7 Jul 2023 13:14:31 +0000 (15:14 +0200)]
[mlir][transform] Add VerifyOp
This transform op runs the verifier on the targeted payload ops. It is for debugging only.
Differential Revision: https://reviews.llvm.org/D154711
Matt Arsenault [Sun, 20 Nov 2022 16:43:08 +0000 (08:43 -0800)]
HIP: Directly call round builtins
Matt Arsenault [Tue, 22 Nov 2022 17:28:41 +0000 (12:28 -0500)]
HIP: Directly call ceil builtins
Matt Arsenault [Wed, 28 Jun 2023 17:49:50 +0000 (13:49 -0400)]
AMDGPU: Remove attempt at simplifying the format string in printf lowering
This avoids computing the dominator tree by removing the
simplifyInstruction use.
This was applying simplification with some kind of questionable
load-store forwarding and looking for the global. This had to have
been an ancient hack copied from previous backends. In the OpenCL
case, this is always emitted as the required direct global reference
anyway.
Nikita Popov [Fri, 7 Jul 2023 13:19:21 +0000 (15:19 +0200)]
[InstCombine] Add test for unreachable loop (NFC)
We only catch this case on the second InstCombine iteration.
Joachim Jenke [Fri, 7 Jul 2023 12:26:21 +0000 (14:26 +0200)]
[OpenMP][OMPT] Change OMPT kind for OpenMP test lock functions
The OpenMP specification mentions that omp_test_lock and
omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock
and ompt_mutex_test_nest_lock for their kind respectively. Previously,
the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This
could cause issues in applications relying on the kind to correctly
determine lock states. This commit changes the kind to the expected
ones.
Also update callback.h and OMPT tests to reflect this change.
Patch prepared by Thyre
Differential Review: https://reviews.llvm.org/D153028
Differential Review: https://reviews.llvm.org/D153031
Differential Review: https://reviews.llvm.org/D153032
Felipe de Azevedo Piovezan [Wed, 5 Jul 2023 15:10:00 +0000 (11:10 -0400)]
[lldb][NFC] Factor out code from SymbolFileDWARF::ParseVariableDIE
This function does a _lot_ of different things:
1. Parses a DIE,
2. Builds an ExpressionList
3. Figures out lifetime of variable
4. Remaps addresses for debug maps
5. Handles external variables
6. Figures out scope of variables
A lot of this functionality is coded in a complex nest of conditions, variables
that are declared and then initialized much later, variables that are updated in
multiple code paths. All of this makes the code really hard to follow.
This commit attempts to improve the state of things by factoring out (3), adding
documentation on how the expression list is built, and by reducing the scope of
variables.
Differential Revision: https://reviews.llvm.org/D154513
Nikita Popov [Fri, 7 Jul 2023 12:34:49 +0000 (14:34 +0200)]
[LoopVectorize] Regenerate test checks (NFC)
Aaron Ballman [Fri, 7 Jul 2023 12:38:35 +0000 (08:38 -0400)]
Remove rdar links; NFC
This removes links to rdar, which is an internal bug tracker that the
community doesn't have visibility into.
See further discussion at:
https://discourse.llvm.org/t/code-review-reminder-about-links-in-code-commit-messages/71847
Renato Golin [Fri, 7 Jul 2023 11:04:30 +0000 (12:04 +0100)]
[MLIR][Linalg] Add max named op to linalg
I've been trying to come up with a simple and clean implementation for
ReLU. TOSA uses `clamp` which is probably the goal, but that means
table-gen to make it efficient (attributes, only lower `min` or `max`).
For now, `max` is a reasonable named op despite ReLU, so we can start
using it for tiling and fusion, and upon success, we create a more
complete op `clamp` that doesn't need a whole tensor filled with zeroes
or ones to implement the different activation functions.
As with other named ops, we start "requiring" type casts and broadcasts,
and zero filled constant tensors to a more complex pattern-matcher, and
can slowly simplify with attributes or structured matchers (ex. PDL) in
the future.
Differential Revision: https://reviews.llvm.org/D154703
Sam McCall [Fri, 7 Jul 2023 12:34:57 +0000 (14:34 +0200)]
[dataflow] Remove [[deprecated]] from deprecated functions
This fixes -Werror -Wdeprecated builds
See D153469
Matt Arsenault [Tue, 4 Jul 2023 16:50:41 +0000 (12:50 -0400)]
InstCombine: Fold ldexp(ldexp(x, a), b) -> ldexp(x, a + b)
The problem here is overflow or underflow which would have occurred in
the inner operation, which the exponent offsetting avoids. We can do
this if we know the two exponents are in the same direction, or
the reassoc flag allows unsafe reassociation.
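A small numeric illustration of why the direction of the exponents matters, using Python's `math.ldexp` on doubles rather than LLVM IR:

```python
import math

x = 1.0

# Same direction: both steps scale up, so folding gives the same value.
assert math.ldexp(math.ldexp(x, 10), 20) == math.ldexp(x, 10 + 20)

# Opposite directions: the inner step underflows to zero and the value is
# lost, while the folded form never leaves the representable range.
unfolded = math.ldexp(math.ldexp(x, -2000), 2000)
folded = math.ldexp(x, -2000 + 2000)
print(unfolded, folded)  # 0.0 1.0
```

The folded form produces a different result in the second case, so without a flag permitting reassociation the fold has to be restricted.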
Matt Arsenault [Tue, 4 Jul 2023 16:55:25 +0000 (12:55 -0400)]
InstCombine: Add baseline tests for ldexp reassociation combine
Matt Arsenault [Tue, 25 Apr 2023 23:24:29 +0000 (19:24 -0400)]
InstSimplify: Update another cannotBeOrderedLessThanZero use
Pass all the optional arguments to enable assumes.
iambrj [Fri, 7 Jul 2023 12:07:47 +0000 (17:37 +0530)]
[MLIR][Presburger] Implement composition for PresburgerRelation
This patch implements range and domain composition for PresburgerRelations.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D154444
LLVM GN Syncbot [Fri, 7 Jul 2023 12:07:16 +0000 (12:07 +0000)]
[gn build] Port
fc9821a877d4
LLVM GN Syncbot [Fri, 7 Jul 2023 12:07:15 +0000 (12:07 +0000)]
[gn build] Port
c0221e006d47
Nico Weber [Fri, 7 Jul 2023 12:01:41 +0000 (08:01 -0400)]
Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter"
This reverts commit
460a2244430fae192298a5fd9fa2a269e540e8c1.
It breaks building on macOS, and it was landed with a review URL
pointing to some Facebook-internal service.
Also reverts a bunch of follow-ups:
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit
f9d6f48c8bf5acaac07502403c41cf0b0d89c8d2.
Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches"
This reverts commit
88e95c1e4bb6e2ad3bfd185b96341ad5c09eff6b.
Revert "[BOLT][DWARF] Output DWO files as they are being processed"
This reverts commit
46ca2e3fcd419b1246357ed3b9cd36630f16e64d.
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit
cfe4a4b04f219a9dbb4e3fc01883437b6ff0e702.
Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter"
This reverts commit
2701a661daa393ad5901ac88d420d7aa931eda0d.
Joachim Jenke [Fri, 7 Jul 2023 11:59:56 +0000 (13:59 +0200)]
[OpenMP][OMPT] Rename callback master to masked in ompt-multiplex.h
OpenMP 5.1 replaced callback ompt_callback_master_t by
ompt_callback_masked_t. In order to stick to the standard,
the implementation is updated accordingly.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112798
Joachim Jenke [Fri, 7 Jul 2023 11:58:38 +0000 (13:58 +0200)]
[OpenMP][OMPT] Add two missing nullpointer checks in ompt-multiplex.h
In the functions ompt_multiplex_get_own_ompt_data
and ompt_multiplex_get_client_ompt_data in addition to
data being NULL, also the void pointer field "ptr" of
"data" could be NULL, leading to a subsequent
segfault.
This patch adds the corresponding checks.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112806
yronglin [Fri, 7 Jul 2023 11:55:04 +0000 (19:55 +0800)]
[AST] Stop evaluating the constant expression if the condition expression of a switch statement contains errors
This fixes issue: https://github.com/llvm/llvm-project/issues/63453
```
constexpr int foo(unsigned char c) {
switch (f) {
case 0:
return 7;
default:
break;
}
return 0;
}
static_assert(foo('d'));
```
Reviewed By: aaron.ballman, erichkeane, hokein
Differential Revision: https://reviews.llvm.org/D153296
Joachim Jenke [Fri, 7 Jul 2023 11:52:55 +0000 (13:52 +0200)]
[OpenMP][Tools] Add omp_all_memory support for Archer
The semantics of depend(out:omp_all_memory) are quite similar to taskwait in
that it separates all tasks (with dependency) created before an
all_memory-task from all tasks (with dependency) created after an
all_memory-task.
Only a single such task can execute at a time. Similar to taskwait, we
have a CV (AllMemory[1]) in the generating task to express the dependency
sink semantic of an all_memory-task. In addition, AllMemory[0] describes the
dependency source semantic of an all_memory-task. All tasks with dependency
create an HB-arc towards the sink and terminate an HB-arc from the source.
Since we expect that not many applications will use such dependency, the
support for handling the synchronization semantic is off by default and
can be turned on using ARCHER_OPTION="all_memory=1". The most costly part
is the precautionary posting of an HB-arc towards the sink, which represents
a potentially contentious write from all concurrently executing sibling tasks.
A warning is printed at runtime, when the option is off while such dependency
is observed. In most cases the lazy activation will still lead to false alerts.
Differential Revision: https://reviews.llvm.org/D111895
Balazs Benics [Fri, 7 Jul 2023 11:48:18 +0000 (13:48 +0200)]
[analyzer][NFC] Simplify CStringChecker strong types
Joachim Jenke [Fri, 7 Jul 2023 11:19:09 +0000 (13:19 +0200)]
[OpenMP] Add OMPT support for omp_all_memory task dependence
omp_all_memory currently has no representation in OMPT.
Adding new dependency flags as suggested by omp-lang issue #3007.
Differential Revision: https://reviews.llvm.org/D111788
Matt Arsenault [Tue, 25 Apr 2023 15:52:09 +0000 (11:52 -0400)]
ValueTracking: Update another cannotBeOrderedLessThanZero use
Matt Arsenault [Tue, 25 Apr 2023 15:50:42 +0000 (11:50 -0400)]
ValueTracking: Update a use of cannotBeOrderedLessThanZero
Makes assumes work.
Lucas Prates [Wed, 14 Jun 2023 09:38:46 +0000 (10:38 +0100)]
[Clang][AArch64] Implement ACLE feature macro for FEAT_LRCPC3
This implements the new value for the `__ARM_FEATURE_RCPC` feature
macro, which was introduced to the ACLE to indicate the availability of
FEAT_LRCPC3.
More details can be found on:
https://github.com/ARM-software/acle/blob/main/main/acle.md#rcpc
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D153130
Lucas Prates [Tue, 13 Jun 2023 13:48:35 +0000 (14:48 +0100)]
[AArch64][RCPC3] Instruction selection for LDAP1/STL1 instructions
This implements the DAG patterns to enable instruction selection for the
LDAP1 and STL1 instructions from FEAT_LRCPC3. The instructions should
match the following combinations:
* Acquiring atomic load + vector insert element for LDAP1.
* Vector extract element + releasing atomic store for STL1.
Patterns have also been added to cope with the DAG structure found when
dealing with 1-lane sub-vectors.
Reviewed By: tmatheson, efriedma
Differential Revision: https://reviews.llvm.org/D153129
Lucas Prates [Fri, 9 Jun 2023 14:20:46 +0000 (15:20 +0100)]
[AArch64][RCPC3] Add Neon intrinsics for LDAP1 and STL1
This adds new intrinsics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane/st1(q)_lane` ones, but capturing
the difference in the atomic release/acquire memory model.
The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.
Based on a patch by Sam Elliott.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D153128
Corentin Jabot [Sat, 10 Jul 2021 13:52:54 +0000 (15:52 +0200)]
Implement P2361 Unevaluated string literals
This patch proposes to handle in a uniform fashion
the parsing of strings that are never evaluated,
in asm statement, static assert, attributes, extern,
etc.
Unevaluated strings are UTF-8 internally and so currently
behave as narrow strings, but these things will diverge with
D93031.
The big question both for this patch and the P2361 paper
is whether we risk breaking code by disallowing
encoding prefixes in this context.
I hope this patch may allow us to gather some data on that.
Future work:
Improve the rendering of unicode characters, line break
and so forth in static-assert messages
Reviewed By: aaron.ballman, shafik
Differential Revision: https://reviews.llvm.org/D105759
Balazs Benics [Fri, 7 Jul 2023 11:24:33 +0000 (13:24 +0200)]
[analyzer] Remove deprecated analyzer-config options
The `consider-single-element-arrays-as-flexible-array-members` analyzer
option was deprecated in clang-16, and now removed from clang-17 as
promised in
https://releases.llvm.org/16.0.0/tools/clang/docs/ReleaseNotes.html#static-analyzer
This shouldn't change observable behavior.
Differential Revision: https://reviews.llvm.org/D154481
David Sherwood [Tue, 21 Feb 2023 11:46:21 +0000 (11:46 +0000)]
[LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine
In the LTO pipeline we run InstCombine after LICM, which is
different to what we normally do without LTO. This has the
effect of undoing all the great work done by LICM to reduce
the cost of the loop when it hoists the fdiv out and replaces
it with fmul. When InstCombine runs after LICM it puts the
fdiv straight back which, on AArch64 at least, is darn
expensive. You can observe this problem in the SPEC2017
benchmark parest if you build with "-Ofast -flto" and the
loop-vectoriser uses an unroll factor of 1, which is what
often happens when tail-folding is enabled.
This is also a problem for scalar loops, or indeed any loop
where there is only one use of the preheader fdiv result in
the loop.
See InstCombinerImpl::visitFMul for the code that sinks the fdiv.
I've attempted to fix this by adding another LICM pass for Full
LTO after InstCombine. The alternative is to stop InstCombine
from sinking the fdiv into loops. See D87479 for a previous
discussion on this issue.
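As a rough source-level picture of the two loop shapes discussed above (not the IR the passes actually see), the sketch below shows the fdiv-in-loop form and the hoisted reciprocal form. The function names are hypothetical, and the rewrite is only legal under fast-math-style flags because the two forms can round differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shape produced when the fdiv is sunk back into the loop:
// one division per iteration.
void scale_div(std::vector<double> &x, double d) {
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] = x[i] / d;
}

// Shape LICM aims for: one division in the preheader,
// one multiplication per iteration.
void scale_mul(std::vector<double> &x, double d) {
  const double r = 1.0 / d;  // hoisted reciprocal
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] = x[i] * r;
}
```

The results agree only up to rounding, so any comparison between the two should use a tolerance.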
Differential Revision: https://reviews.llvm.org/D143631
David Spickett [Fri, 7 Jul 2023 10:44:00 +0000 (10:44 +0000)]
[mlir] Mark test-interpreter unsupported on Windows on Arm
This seems to fail every time there is some change in MLIR,
but not always.
For example: https://lab.llvm.org/buildbot/#/builders/65/builds/10415
Guillaume Chatelet [Wed, 5 Jul 2023 11:07:21 +0000 (11:07 +0000)]
[libc] Adding a version of memcpy w/ software prefetching
For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available.
This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile-time option that generates a version of memcpy with software prefetching. While not fully restoring the original performance, it mitigates the impact to an acceptable level.
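To make the idea concrete, here is a minimal sketch of a copy loop that issues software prefetches ahead of the data it is copying. It is not the libc implementation: `copy_with_prefetch`, the 64-byte line size, and the prefetch distance are all assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not the libc memcpy.
void *copy_with_prefetch(void *dst, const void *src, std::size_t n) {
  constexpr std::size_t kLine = 64;          // assumed cache-line size
  constexpr std::size_t kAhead = 4 * kLine;  // assumed prefetch distance
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  std::size_t i = 0;
  for (; i + kLine <= n; i += kLine) {
#if defined(__GNUC__) || defined(__clang__)
    // Hint that a line further ahead will be read soon; stay inside the buffer.
    if (i + kAhead < n)
      __builtin_prefetch(s + i + kAhead, /*rw=*/0, /*locality=*/3);
#endif
    std::memcpy(d + i, s + i, kLine);  // copy one line-sized chunk
  }
  std::memcpy(d + i, s + i, n - i);  // copy the remaining tail bytes
  return dst;
}
```

The prefetch is only a hint, so the copy stays correct whether or not the hardware acts on it; the distance would need tuning on a real machine.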
Reviewed By: rtenneti
Differential Revision: https://reviews.llvm.org/D154494
Florian Hahn [Fri, 7 Jul 2023 10:06:30 +0000 (11:06 +0100)]
[LV] Do not add load to group if it moves across conflicting store.
This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.
Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.
To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.
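The hazard can be shown with a scalar sketch: moving a load across a store to the same location changes the value it observes. This is only an illustration of why such a group is invalid, not the vectorizer's logic, and both function names are hypothetical.

```cpp
#include <cassert>

// Program order: load a[0], store a[1], load a[1].
int in_order(int *a) {
  int x = a[0];
  a[1] = x + 1;  // conflicting store: writes the slot the next load reads
  int y = a[1];  // sees the freshly stored value
  return x + y;
}

// What grouping both loads at the first load's position would amount to.
int grouped_too_early(int *a) {
  int x = a[0];
  int y = a[1];  // moved above the store: sees the stale value
  a[1] = x + 1;
  return x + y;
}
```

Starting from the same array contents, the two functions return different sums, which is the kind of miscompile the patch guards against.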
Fixes https://github.com/llvm/llvm-project/issues/63602
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D154309
Sam McCall [Wed, 5 Jul 2023 09:35:06 +0000 (11:35 +0200)]
Reland "[dataflow] Add dedicated representation of boolean formulas"
This reverts commit
7a72ce98224be76d9328e65eee472381f7c8e7fe.
Test problems were due to unspecified order of function arg evaluation.
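For readers unfamiliar with this class of problem, the sketch below shows how argument evaluation order can leak into results and how sequencing the calls removes the dependency. All names are hypothetical and unrelated to the dataflow code itself.

```cpp
#include <cassert>
#include <utility>

static int counter = 0;
int next_id() { return counter++; }

std::pair<int, int> make_ids(int a, int b) { return {a, b}; }

// Order-dependent: the compiler may evaluate either argument first, so
//   make_ids(next_id(), next_id())
// can yield {0, 1} with one compiler and {1, 0} with another.

// Deterministic: sequence the calls explicitly before passing them on.
std::pair<int, int> two_ids() {
  int first = next_id();
  int second = next_id();
  return make_ids(first, second);
}
```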
Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)"
This properly frees the Value hierarchy from managing boolean formulas.
We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.
We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.
For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.
The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.
The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.
Differential Revision: https://reviews.llvm.org/D153469
Yeting Kuo [Fri, 7 Jul 2023 09:00:27 +0000 (17:00 +0800)]
[RISCV] Add riscv_vsoxei_mask/riscv_vsuxei_mask to getTgtMemIntrinsic.
This constructs a proper memory operand for riscv_vsoxei_mask and riscv_vsuxei_mask.
I think they were missed in D147119.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D154694
David Spickett [Fri, 7 Jul 2023 09:46:07 +0000 (09:46 +0000)]
[compiler-rt][xray] Disable fdr-single-thread test on Arm
For unknown reasons this causes a bus error.
See:
https://lab.llvm.org/buildbot/#/builders/178/builds/5157
Renato Golin [Thu, 6 Jul 2023 13:56:44 +0000 (14:56 +0100)]
[MLIR][Linalg] Add unary named ops to linalg
Following binary arithmetic in previous commits, this patch adds unary
maths ops to linalg.
It also fixes a few of the previous tests, and makes the binary ops call
BinaryFn.<op> directly instead of relying on Python to recognise the
operation.
Differential Revision: https://reviews.llvm.org/D154618
Tom Eccles [Wed, 5 Jul 2023 16:09:08 +0000 (16:09 +0000)]
[flang][hlfir] allow associate where the expr is also used by shape_of
This fixes the majority of cases where we hit the "hlfir.associate of
hlfir.expr with more than one use" TODO. In particular, this allows cam4
to be built.
hlfir.shape_of is just a way to delay reading shape information until
after intrinsics have been lowered to FIR runtime calls. It gets the
shape information from reading existing SSA values (e.g. fetching the
shape used when hlfir.declare'ing the variable).
Therefore hlfir.shape_of doesn't affect decisions about when to
deallocate the buffer.
Differential Revision: https://reviews.llvm.org/D154521
Ingo Müller [Fri, 7 Jul 2023 05:49:01 +0000 (05:49 +0000)]
[mlir] Add InsertionGuards to OneToNPatternRewriter.
This fixes bad behavior of that class that surfaced in
https://reviews.llvm.org/D154299, where calling applySignatureConversion
left the insertion point different from before the call, which broke a
subsequent call to replaceOp. This patch introduces a fix in both
functions, each of which is enough to fix the specific problem in the
aforementioned diff: (1) applySignatureConversion now resets the
insertion point with a guard for the whole function and (2) replaceOp sets
the insertion point to the op that should be replaced (and resets it
with a guard).
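The guard pattern described above can be sketched with a toy builder. This is modeled loosely on the idea behind MLIR's insertion guards, but `Builder`, the integer insertion point, and `convertSignature` are hypothetical stand-ins, not MLIR APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for a rewriter; the int is a placeholder position.
struct Builder {
  int insertionPoint = 0;
};

// RAII guard: remembers the insertion point on construction and
// restores it on destruction, even on early return.
class InsertionGuard {
public:
  explicit InsertionGuard(Builder &b) : builder(b), saved(b.insertionPoint) {}
  ~InsertionGuard() { builder.insertionPoint = saved; }
  InsertionGuard(const InsertionGuard &) = delete;
  InsertionGuard &operator=(const InsertionGuard &) = delete;

private:
  Builder &builder;
  int saved;
};

// Moves the insertion point while working, but leaves it unchanged
// from the caller's point of view.
void convertSignature(Builder &b) {
  InsertionGuard guard(b);
  b.insertionPoint = 42;  // do work somewhere else
}
```

With the guard covering the whole function, a later call that relies on the previous insertion point behaves as the caller expects.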
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D154684
Ingo Müller [Fri, 7 Jul 2023 06:03:50 +0000 (06:03 +0000)]
[mlir] Avoid unnecessary copies in SCF's OneToNTypeConversions. (NFC)
In two places, a ResultRange was copied into a SmallVector just to be
passed as a ValueRange argument. With this patch, the ResultRanges are
passed directly, avoiding a copy.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D154685
Michael Platings [Fri, 7 Jul 2023 08:37:34 +0000 (09:37 +0100)]
[ARM][Driver] Change float-abi warning
Previously the warning stated "flag ignored" which is only partially
true - the invalid flag would prevent -feature +soft-float-abi from
being emitted which resulted in user-visible behaviour like
__ARM_PCS_VFP being defined.
Rather than attempt to coerce invalid flags into valid behaviour, don't
describe the expected behaviour.
Ideally the warning would be an error, as it is in GCC. However there
are tests in llvm-project that trigger the warning. Therefore one has to
assume that making the warning an error would break other code that
already exists in the wild.
Also apply test improvements suggested by @MaskRay on D150902.
Reviewed By: simon_tatham
Differential Revision: https://reviews.llvm.org/D154578
Nikita Popov [Thu, 29 Jun 2023 08:28:28 +0000 (10:28 +0200)]
[InstCombine] Preserve inbounds when folding select of GEP
The select base, (gep base, offset) to gep base, select (0, offset)
fold used to drop inbounds, because the gep base, 0 this introduces
might not be inbounds. After the semantics change in D154051, such
a GEP is always considered inbounds, which allows us to preserve
the flag here.
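In pointer-arithmetic terms, the fold rewrites a select between two addresses into one address computation with a selected offset. The sketch below is a source-level analogy with hypothetical function names; it does not model the inbounds flag itself.

```cpp
#include <cassert>
#include <cstddef>

// Before the fold: select between the base and an offset pointer.
const int *select_of_gep(bool c, const int *base, std::ptrdiff_t off) {
  return c ? base : base + off;
}

// After the fold: a single address computation with a selected offset.
// The c == true arm corresponds to the zero-offset GEP the fold introduces.
const int *gep_of_select(bool c, const int *base, std::ptrdiff_t off) {
  return base + (c ? 0 : off);
}
```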
As the PhaseOrdering test demonstrates, this can result in major
optimization improvements in some cases.
Differential Revision: https://reviews.llvm.org/D154055
Haojian Wu [Fri, 7 Jul 2023 07:01:27 +0000 (09:01 +0200)]
Kito Cheng [Fri, 7 Jul 2023 06:24:30 +0000 (14:24 +0800)]
[compiler-rt][RISCV] Fix __fe_getround and __fe_raise_inexact for Zfinx
The Zfinx extension also provides a floating point environment like the F extension, so
enable that for `__fe_getround` and `__fe_raise_inexact` too.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D154570
WuXinlong [Fri, 7 Jul 2023 06:01:22 +0000 (14:01 +0800)]
[RISCV] Add a pass to combine `cm.pop` and `ret` insts
`RISCVPushPopOptimizer.cpp` combines `cm.pop` and `ret` to generate `cm.popretz` or `cm.popret`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D150416
Freddy Ye [Fri, 7 Jul 2023 05:46:46 +0000 (13:46 +0800)]
[X86] Support some Intel CPUs for cpu_specific/dispatch feature
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D154493
Jim Lin [Fri, 7 Jul 2023 05:02:39 +0000 (13:02 +0800)]
[RISCV] Rename prefix `fixed-vector` to `fixed-vectors` to match other testcases. NFC.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154679
Johannes Doerfert [Fri, 30 Jun 2023 17:51:05 +0000 (10:51 -0700)]
[Attributor] Check all NoFPClass attributes found in the IR
Johannes Doerfert [Mon, 3 Jul 2023 21:25:14 +0000 (14:25 -0700)]
[Attributor][NFC] Add missing comments
Johannes Doerfert [Sun, 2 Jul 2023 22:52:06 +0000 (15:52 -0700)]
[Attributor][NFCI] Use AA::hasAssumedIRAttr for NoSync
Serguei Katkov [Thu, 6 Jul 2023 12:31:12 +0000 (19:31 +0700)]
Register new assumption in a cache
When a new assumption is created, it should be registered in the assumption
cache, or the cache should be invalidated.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154601
Craig Topper [Fri, 7 Jul 2023 03:36:55 +0000 (20:36 -0700)]
[RISCV] Don't sink i1 vectors in shouldSinkOperands.
These can't create .vx instructions so there's no reason to sink them.
Jie Fu [Fri, 7 Jul 2023 03:32:36 +0000 (11:32 +0800)]
[RISCV] Remove unused private field 'RVPushable' in RISCVMachineFunctionInfo.h
/data/llvm-project/llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h:78:8: error: private field 'RVPushable' is not used [-Werror,-Wunused-private-field]
bool RVPushable = false;
^
1 error generated.