Ethan Luis McDonough [Wed, 28 Jun 2023 20:02:37 +0000 (15:02 -0500)]
[flang][openmp] Fortran offloading test
Flang currently supports offloading for AMD GPUs. This patch establishes a test structure for Fortran offloading tests in libomptarget.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D148778
Florian Hahn [Wed, 28 Jun 2023 20:10:43 +0000 (21:10 +0100)]
[ConstraintElim] Add additional induction phi tests with end argument.
Extra tests for D152730 with different GEP step sizes and the end pointer
being an argument.
David Green [Wed, 28 Jun 2023 20:02:29 +0000 (21:02 +0100)]
[SLP] Use vector types for cmp alt instructions costs
Similar to the other code that costs main/alt instructions, the cmp should be
using the VecTy for the costs, not the ScalarTy.
One of the tests looks like it gets worse just because it is not simplified to
0.
Differential Revision: https://reviews.llvm.org/D153507
Serge Pavlov [Wed, 28 Jun 2023 19:04:31 +0000 (02:04 +0700)]
Revert "[Clang] Reset FP options before function instantiations"
This reverts commit 98390ccb80569e8fbb20e6c996b4b8cff87fbec6.
It caused issue #63542.
Matt Arsenault [Tue, 6 Jun 2023 22:06:38 +0000 (18:06 -0400)]
HIP: Use frexp builtins in math headers
Matt Arsenault [Wed, 28 Jun 2023 19:04:08 +0000 (15:04 -0400)]
LangRef: Fix sphinx build error
root [Fri, 23 Jun 2023 17:59:22 +0000 (10:59 -0700)]
adding bf16 support to NVPTX
Currently, bf16 support has been added piecemeal to the PTX codegen. This patch aims to complete the set of instructions and code paths required to support the bf16 data type.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D144911
Co-authored-by: Artem Belevich <tra@google.com>
Matt Arsenault [Tue, 2 May 2023 13:07:47 +0000 (09:07 -0400)]
clang: Use new frexp intrinsic for builtins and add f16 version
Matt Arsenault [Thu, 27 Apr 2023 01:57:10 +0000 (21:57 -0400)]
IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
Caroline Tice [Tue, 27 Jun 2023 07:18:33 +0000 (00:18 -0700)]
[LLDB] Fix buffer overflow problem in DWARFExpression::Evaluate.
In two calls to ReadMemory in DWARFExpression.cpp, the buffer size
passed to ReadMemory is not actually the size of the buffer (I suspect
a copy/paste error where the variable name was not properly
updated). This caused a buffer overflow bug, which we found through
Address Sanitizer. This patch fixes the problem by passing the
correct buffer size to the calls to ReadMemory (and to the
DataExtractor).
Differential Revision: https://reviews.llvm.org/D153840
Nikolas Klauser [Wed, 28 Jun 2023 18:22:11 +0000 (11:22 -0700)]
[libc++] Add missing _LIBCPP_HIDE_FROM_ABI in uninitialized_buffer.h
Tue Ly [Sat, 24 Jun 2023 04:08:31 +0000 (00:08 -0400)]
[libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.
For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.
For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```
For `x < 0`, we can use the same formula as above, since the odd part is factored out.
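A minimal sketch of the evaluation scheme above (not the patch's code), assuming Taylor-series coefficients for the sub-interval nearest zero in place of the patch's per-interval coefficient tables, which are not reproduced here. The names `erff_poly_sketch` and `kCoeffs` are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coefficients c0..c7: the Taylor expansion of erf about 0,
// erf(x) = 2/sqrt(pi) * sum_n (-1)^n x^(2n+1) / (n! (2n+1)).
// The real patch uses a separate coefficient set per sub-interval.
static const double kCoeffs[8] = {
    1.0,         -1.0 / 3.0,    1.0 / 10.0,   -1.0 / 42.0,
    1.0 / 216.0, -1.0 / 1320.0, 1.0 / 9360.0, -1.0 / 75600.0};

// Evaluates x * (c0 + c1*x^2 + ... + c7*x^14) with Horner's scheme in x^2.
// Because x is factored out, the same code handles negative inputs.
inline float erff_poly_sketch(float x) {
  assert(std::fabs(x) <= 0.125f && "sketch only covers |x| <= 1/8");
  const double two_over_sqrt_pi = 1.1283791670955126;
  const double xd = x;
  const double x2 = xd * xd;
  double p = kCoeffs[7];
  for (int i = 6; i >= 0; --i)
    p = p * x2 + kCoeffs[i];
  return static_cast<float>(two_over_sqrt_pi * xd * p);
}
```

The polynomial is evaluated in double precision and rounded once at the end; whether the patch does the same is not stated in this message.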
Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:
Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;
-- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```
and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with -march=native (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;
-- LIBC latency -- with -mavx2 -mfma (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency -- with -msse4.2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D153683
Daniel Thornburgh [Fri, 23 Jun 2023 22:24:48 +0000 (15:24 -0700)]
[Symbolizer] Ignore unknown additional symbolizer markup fields
The symbolizer markup syntax is structured such that fields require only
previous fields for their interpretation; this was originally intended
to make adding new fields a natural extension mechanism for existing
elements. This codifies this into the spec and makes the behavior of the
llvm-symbolizer match. Extra fields are now warned about, but ignored,
rather than ignoring the whole element.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D153821
Jon Roelofs [Mon, 26 Jun 2023 17:31:57 +0000 (10:31 -0700)]
[MachineInst] Bump NumOperands back up to 24bits
In https://reviews.llvm.org/D149445, it was lowered from 32 to 16bits, which
broke an internal project of ours. The relevant code being compiled is a fairly
large nested switch that results in a PHI node with 65k+ operands, which can't
easily be turned into a table for perf reasons.
This change unifies `NumOperands`, `Flags`, and `AsmPrinterFlags` into a packed
7-byte struct, which `CapOperands` can follow as the 8th byte, rounding it up
to a nice alignment before the `Info` field.
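A rough sketch of the size argument, assuming a single 64-bit word stands in for the packed fields. The real change packs fields inside `MachineInstr`; `PackedHeaderSketch` and its accessors are hypothetical, and the 32 bits reserved for `Flags` and `AsmPrinterFlags` are not modelled:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bits 0..23 hold the operand count, bits 24..55 are
// left for the two flag fields, and bits 56..63 hold the operand capacity
// (the "8th byte").
struct PackedHeaderSketch {
  std::uint64_t Bits = 0;

  static constexpr std::uint64_t Mask24 = (1u << 24) - 1;

  void setNumOperands(std::uint32_t N) {
    assert(N <= Mask24 && "operand count must fit in 24 bits");
    Bits = (Bits & ~Mask24) | N;
  }
  std::uint32_t getNumOperands() const {
    return static_cast<std::uint32_t>(Bits & Mask24);
  }

  void setCapOperands(std::uint8_t Cap) {
    Bits = (Bits & ~(std::uint64_t{0xFF} << 56)) | (std::uint64_t{Cap} << 56);
  }
  std::uint8_t getCapOperands() const {
    return static_cast<std::uint8_t>(Bits >> 56);
  }
};

static_assert(sizeof(PackedHeaderSketch) == 8, "header occupies 8 bytes");
```

A 24-bit count holds up to 16,777,215 operands, so a PHI with 65k+ operands fits, whereas a 16-bit count tops out at 65,535.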
rdar://111217742&109362033
Differential revision: https://reviews.llvm.org/D153791
Matt Arsenault [Thu, 8 Jun 2023 16:42:59 +0000 (12:42 -0400)]
AMDGPU: Move AMDGPUAttributor run earlier
Move it up with other module passes. It's a higher level optimization
that should probably be done before hacking up the IR for codegen. It
should really be done earlier than this. We could possibly move this
with other IPO passes, but we'd have to stop inferring the lack of
lds.kernel.id calls and have the LDS module pass mark functions which
don't need the ID.
The one test change is because that pass is relying on the backend run
of SROA (which we ideally wouldn't have).
Philip Reames [Wed, 28 Jun 2023 16:38:40 +0000 (09:38 -0700)]
[docs][RISCV] Remove duplicate entries for zvfbfmin and zvfbfwma
Snehasish Kumar [Tue, 27 Jun 2023 18:26:57 +0000 (18:26 +0000)]
[instrprof] Add an overload to accept raw_string_ostream.
Add an overload for InstrProfWriter::write so that users can emit the
buffer to a string. Also use this new overload for existing unit test
usecases.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D153904
David Green [Wed, 28 Jun 2023 16:16:34 +0000 (17:16 +0100)]
[SLP][AArch64] Extend extracts-from-scalarizable-vector.ll test for cmp cost testing. NFC
See D153507. The existing test is over-simplified; as written, it should have
been simplified prior to SLP vectorization. I have left it as-is to ensure the
crash it was protecting against doesn't arise again. A new test with valid
inputs is also added to show the incorrect costs of alt cmp vectorization.
Fraser Cormack [Thu, 22 Jun 2023 15:49:39 +0000 (16:49 +0100)]
[InstSimplify] Fix a scalable-vector crash
D143505 fixed/simplified folding of operations with SNaN operands. In
doing so it introduced a crash when handling scalable vector types,
wherein the scalable-vector ConstantVector was cast to a ConstantFP.
Since we know by that point in the code that if we've found a NaN, we're
dealing with a scalable-vector splat (as there are no other kinds of
scalable-vector constant for which that holds), we can grab the splatted
value and re-use the existing code, which will automatically splat the
new NaN back to a scalable vector for us.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153566
Valentin Clement [Wed, 28 Jun 2023 16:08:22 +0000 (09:08 -0700)]
[flang][openacc] Resolve symbol in device, host and self clause
Some symbols were not resolved in the device, host and self clause
resulting in an `Internal: no symbol found` error.
This patch adds symbol resolution for these clauses.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153919
Valentin Clement [Wed, 28 Jun 2023 16:07:04 +0000 (09:07 -0700)]
[flang][openacc] Relax clause rule on routine directive
Some compilers treat `acc routine` without a parallelism clause as
if seq is present. Relax the parser rule to allow acc routine
without clause. The default clause will be handled in lowering.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153896
Paul Robinson [Wed, 28 Jun 2023 15:27:25 +0000 (08:27 -0700)]
[doc] Fix link typo
LLVM GN Syncbot [Wed, 28 Jun 2023 15:19:41 +0000 (15:19 +0000)]
[gn build] Port 1bfdc534aaae
Yusra Syeda [Wed, 28 Jun 2023 15:18:12 +0000 (11:18 -0400)]
Revert "[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:"
This reverts commit 9df0f66af5462e23216eae31aedbd4d2f459cc3d.
Jeffrey Byrnes [Tue, 27 Jun 2023 15:40:15 +0000 (08:40 -0700)]
[AMDGPU] NFC: Add schedule-relaxed-occupancy to relax occupancy targets for wave-limited/membound kernels
Default scheduling behavior for these types of kernels is to chase high occupancy goals with scheduling heuristics, but allow occupancy drops if we are unable to reach the target.
This (experimental, off-by-default) feature relaxes the occupancy target from the beginning, which enables the scheduler to produce better ILP schedules.
Differential Revision: https://reviews.llvm.org/D153925
Change-Id: I112833214e2db869704591f4df3c4574d0fcbb1b
Shilei Tian [Wed, 28 Jun 2023 15:06:36 +0000 (11:06 -0400)]
[NFC][Doc] Update feature support doc `clang/docs/OpenMPSupport.rst` to correct
the color of finished task
Craig Topper [Wed, 28 Jun 2023 14:57:47 +0000 (07:57 -0700)]
[LegalizeTypes] Combine PromoteIntRes_VECTOR_DEINTERLEAVE and PromoteIntRes_VECTOR_INTERLEAVE. NFC
The functions are identical except for the opcode of the node.
We can have a single function and use N->getOpcode().
Reviewed By: luke, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D153929
Paul Robinson [Tue, 27 Jun 2023 15:19:40 +0000 (08:19 -0700)]
[doc] Give better info about forks
Differential Revision: https://reviews.llvm.org/D153884
LLVM GN Syncbot [Wed, 28 Jun 2023 14:14:23 +0000 (14:14 +0000)]
[gn build] Port 9df0f66af546
Yusra Syeda [Wed, 28 Jun 2023 14:13:10 +0000 (10:13 -0400)]
[SystemZ][z/OS] This patch adds support for the ADA (associated data area), doing the following:
- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
LLVM GN Syncbot [Wed, 28 Jun 2023 14:08:35 +0000 (14:08 +0000)]
[gn build] Port 8e71d14972b4
Felipe de Azevedo Piovezan [Mon, 26 Jun 2023 14:14:25 +0000 (10:14 -0400)]
[lldb] Use LLVM's implementation of AppleTables for apple_objc
This concludes the migration of accelerator tables from LLDB code to LLVM code.
Differential Revision: https://reviews.llvm.org/D153868
David Green [Wed, 28 Jun 2023 14:02:38 +0000 (15:02 +0100)]
[ARM][AArch64] !cast<Instruction>("XYZ") -> XYZ. NFC
Guillaume Chatelet [Wed, 28 Jun 2023 11:31:16 +0000 (11:31 +0000)]
[libc][NFC] Separate avx/no-avx x86 memcpy implementations
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D153958
Joseph Huber [Tue, 27 Jun 2023 19:17:20 +0000 (14:17 -0500)]
[AMDGPU] Always pass `-mcpu` to the `lld` linker
Currently, AMDGPU more or less only supports linking with LTO. If the
user does not pass either `-flto` or `-Wl,-plugin-opt=mcpu=` manually,
linking will fail because the architectures aren't compatible. This
patch simply passes `-mcpu` by default if it was specified. Should be a
no-op if it's not actually used.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D153909
Jie Fu [Wed, 28 Jun 2023 13:46:08 +0000 (21:46 +0800)]
[flang] Build broken due to no member named 'getNumScalableDims' in 'mlir::VectorType' after D153412 (NFC)
/data/llvm-project/flang/lib/Optimizer/Dialect/FIROps.cpp:971:46: error: no member named 'getNumScalableDims' in 'mlir::VectorType'
if (mlir::dyn_cast<mlir::VectorType>(ty).getNumScalableDims() == 0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
1 error generated.
Youngsuk Kim [Wed, 28 Jun 2023 13:21:01 +0000 (09:21 -0400)]
[llvm] Replace uses of Type::getPointerTo (NFC)
Partial progress towards removing in-tree uses of `Type::getPointerTo`,
before we can deprecate the API.
If the API is used solely to support an unnecessary bitcast, get rid of
the bitcast as well.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153933
Felipe de Azevedo Piovezan [Tue, 20 Jun 2023 16:21:38 +0000 (12:21 -0400)]
[lldb] Use LLVM's implementation of AppleTables for apple_debug_types
This commit is replacing really old LLDB code, and we've found some odd
behavior while doing this replacement. While the changes here are largely NFC,
there are some subtle changes that fix such odd behavior.
The most curious example of this is the method `FindCompleteObjCClassName`,
which has a flag `must_be_implementation`. This flag was _only_ being respected
for accelerator tables containing the atom `type_flags`, which seems
counter-intuitive. The implementation for DWARF 5 tables does not do that and
neither does the code introduced in this patch.
There were other weird cases, for example, we found boolean logic that was
always true in a code path: look for an `if !has_qualified_name...` deleted
line; that condition was true by simple if/else analysis.
Differential Revision: https://reviews.llvm.org/D153867
Serge Pavlov [Wed, 28 Jun 2023 13:11:15 +0000 (20:11 +0700)]
[Clang] Reset FP options before function instantiations
Previously, function template instantiations occurred with the FP options
that were in effect at the end of the translation unit. This was a problem
for late template parsing, as these FP options were used as attributes of
AST nodes and could result in a crash. To fix it, FP options are now set to
their state at the point of template definition.
Differential Revision: https://reviews.llvm.org/D143241
Alexey Bataev [Tue, 27 Jun 2023 19:48:08 +0000 (12:48 -0700)]
[SLP]Fix PR63141: compareCmp is not strict weak ordering.
Added some extra checks to the compareCmp function for the case where
IsCompatibility is false, so that it meets the strict weak ordering
requirements and can be used correctly in sort functions.
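For context, a generic sketch of what the strict weak ordering requirement means for a sort comparator. This is not the SLP `compareCmp` logic; `isStrictWeakOrderSketch` and `checkedSortSketch` are hypothetical helpers that brute-force the axioms over a small input:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Checks irreflexivity, transitivity, and transitivity of incomparability
// for Less over every combination of elements in Items (O(n^3)).
template <typename T, typename Cmp>
bool isStrictWeakOrderSketch(const std::vector<T> &Items, Cmp Less) {
  auto Equiv = [&](const T &A, const T &B) {
    return !Less(A, B) && !Less(B, A);
  };
  for (const T &A : Items) {
    if (Less(A, A))
      return false; // irreflexivity violated
    for (const T &B : Items)
      for (const T &C : Items) {
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false; // transitivity violated
        if (Equiv(A, B) && Equiv(B, C) && !Equiv(A, C))
          return false; // incomparability is not transitive
      }
  }
  return true;
}

// Sorts only after asserting that the comparator is usable with std::sort.
template <typename T, typename Cmp>
void checkedSortSketch(std::vector<T> &Items, Cmp Less) {
  assert(isStrictWeakOrderSketch(Items, Less) &&
         "comparator is not a strict weak ordering");
  std::sort(Items.begin(), Items.end(), Less);
}
```

A comparator such as `A <= B` fails the irreflexivity check, which is the kind of defect that makes `std::sort` behavior undefined.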
Kevin P. Neal [Wed, 28 Jun 2023 12:52:23 +0000 (08:52 -0400)]
[TableGen] Stabilize sort in GET_SUBTARGETINFO_MACRO block
Add missed change requested in D153371.
Andrzej Warzynski [Wed, 21 Jun 2023 12:27:13 +0000 (13:27 +0100)]
[mlir][VectorType] Remove `numScalableDims` from the vector type
This is a follow-up of https://reviews.llvm.org/D153372 in which
`numScalableDims` (single integer) was effectively replaced with
`isScalableDim` bitmask.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D153412
Nikita Popov [Wed, 28 Jun 2023 12:48:42 +0000 (14:48 +0200)]
[AArch64] Make tests more robust (NFC)
David Truby [Mon, 26 Jun 2023 14:03:17 +0000 (15:03 +0100)]
[flang] Add COMDAT to global variables where needed
On platforms which support COMDAT sections we should use them when
linkonce or linkonce_odr linkage is requested. This is required on
Windows (PE/COFF) and provides better behaviour than weak symbols on
ELF-based platforms.
This patch also reverts string literals to use linkonce instead of
internal linkage now that comdats are supported.
Differential Revision: https://reviews.llvm.org/D153768
Jingu Kang [Tue, 27 Jun 2023 08:33:13 +0000 (09:33 +0100)]
[AArch64] Remove vector shift intrinsic with shift amount zero
Differential Revision: https://reviews.llvm.org/D153847
Nikita Popov [Wed, 28 Jun 2023 12:31:14 +0000 (14:31 +0200)]
[SimplifyCFG] Make some tests more robust (NFC)
Felipe de Azevedo Piovezan [Fri, 16 Jun 2023 19:07:59 +0000 (15:07 -0400)]
[lldb] Use LLVM's implementation of AppleTables for apple_{names,namespaces}
All the new code should match the behavior of the old exactly.
Of note, the custom queries used to be implemented inside `HashedNameToDIE.cpp`
(which is the LLDB implementation of the tables). However, when porting to LLVM,
we believe they don't belong inside the LLVM table implementation:
1. They don't require any knowledge about the table itself
2. They are not relevant for other users of these classes.
3. They use LLDB data structures.
As such, we implement these custom queries inside AppleDWARFIndex.cpp.
Types and Objective-C tables are done separately, as they have slightly
different functionality that requires rewriting more code.
Differential Revision: https://reviews.llvm.org/D153866
John Brawn [Thu, 1 Jun 2023 16:04:39 +0000 (17:04 +0100)]
[ARM] Generate out-of-line jump tables for XO without 32-bit branch
When we only have a 16-bit pc-relative branch instruction we generate
a table of addresses for a jump table. Currently this is placed inline,
but this won't work with execute-only memory. In this case generate
the jump table out-of-line.
Differential Revision: https://reviews.llvm.org/D153774
Kevin P. Neal [Wed, 28 Jun 2023 12:26:12 +0000 (08:26 -0400)]
[TableGen] Stabilize sort in GET_SUBTARGETINFO_MACRO block
The sort of the elements in the GET_SUBTARGETINFO_MACRO block is done on
the "Name" field of each record. This field is not guaranteed to be unique,
is not guaranteed to even have a value at all, and is not used in the
output anyway. Change to sort on the "FieldName" field which should be
unique.
Problem spotted when lib/Target/PowerPC/PPCGenSubtargetInfo.inc changed
unexpectedly.
Differential Revision: https://reviews.llvm.org/D153371
Nikita Popov [Wed, 28 Jun 2023 12:21:06 +0000 (14:21 +0200)]
[SimplifyCFG] Add additional tests with assume (NFC)
Florian Hahn [Wed, 28 Jun 2023 12:19:39 +0000 (13:19 +0100)]
[ConstraintElim] Try to use first cmp to prove second cmp for ANDs.
This patch extends the existing logic to handle cases where we have
branch conditions of the form (AND icmp, icmp) where the first icmp
implies the second. This can improve results in some cases, e.g. if
SimplifyCFG folded conditions from multiple branches to an AND.
The implementation handles this by adding a new type of check
(AndImpliedCheck), which are queued before conditional facts for the same
block.
When encountering AndImpliedChecks during solving, the first condition
is optimistically added to the constraint system, then we check if the
second icmp can be simplified, and finally the newly added entries are
removed.
The reason for doing things this way is to avoid clashes with signed
<-> unsigned condition transfer, which require us to re-order facts to
increase effectiveness.
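The add/check/remove sequence can be illustrated with a toy model. This is a sketch only: the real pass uses a linear constraint system, while `ToySystemSketch` below is a hypothetical stand-in that records just unsigned upper bounds of the form `X u< Bound`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy constraint system: each fact is an exclusive upper bound "X u< Bound".
struct ToySystemSketch {
  std::vector<unsigned> UpperBounds;

  // True if some recorded fact proves "X u< Bound".
  bool implies(unsigned Bound) const {
    for (unsigned Known : UpperBounds)
      if (Known <= Bound)
        return true;
    return false;
  }
};

// Mirrors the three steps: add the first compare optimistically, query the
// second compare, then remove the temporary entry again.
inline bool secondImpliedByFirst(ToySystemSketch &CS, unsigned FirstBound,
                                 unsigned SecondBound) {
  const std::size_t SizeBefore = CS.UpperBounds.size();
  CS.UpperBounds.push_back(FirstBound);
  const bool Implied = CS.implies(SecondBound);
  CS.UpperBounds.pop_back();
  assert(CS.UpperBounds.size() == SizeBefore && "temporary fact must go");
  (void)SizeBefore; // silence unused-variable warnings when NDEBUG is set
  return Implied;
}
```

For a condition shaped like `X u< 5 && X u< 10`, the second compare is implied by the first; with the operands swapped it is not.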
Reviewed By: nikic, antoniofrighetto
Differential Revision: https://reviews.llvm.org/D151799
Tue Ly [Wed, 28 Jun 2023 12:13:05 +0000 (08:13 -0400)]
[libc] Fix missing dependency and linking option for sqrtf exhaustive test.
Haojian Wu [Wed, 28 Jun 2023 12:04:22 +0000 (14:04 +0200)]
[clangd] Fix some typos, NFC
Florian Hahn [Wed, 28 Jun 2023 12:02:11 +0000 (13:02 +0100)]
[ConstraintElim] Move condition check logic to helper function (NFC).
This allows easier re-use of the checking logic. Split off from D151799.
Tue Ly [Sat, 24 Jun 2023 04:03:06 +0000 (00:03 -0400)]
[libc][math] Clean up exhaustive tests implementations.
Clean up exhaustive tests. Let check functions return number of failures instead of passed/failed.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D153682
Alexey Bataev [Wed, 28 Jun 2023 11:31:40 +0000 (04:31 -0700)]
Revert "[SLP]Fix PR63141: compareCmp is not strict weak ordering."
This reverts commit f3ebd88064d7f1c36a8272b3e5f7d53501c3f53b to pacify
windows-based buildbots.
Matt Arsenault [Wed, 28 Jun 2023 11:32:20 +0000 (07:32 -0400)]
OpenMP: Revert accidental cmake change to make amdgpu-arch errors fatal
I still think this should be done but should be done separately.
Matt Arsenault [Mon, 26 Jun 2023 16:43:35 +0000 (12:43 -0400)]
ValueTracking: Handle !absolute_symbol in computeKnownBits
Use a unit test since I don't see any existing uses that try to make use of
the high bits of a pointer.
This will also assert if the metadata type doesn't match the pointer
width, but I consider that a defect in the verifier and shouldn't be
handled.
AMDGPU allocates LDS globals by assigning !absolute_symbol with the
final fixed address. Knowing that the high bits are 0 may help with
addressing mode matching.
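A small sketch of the idea, assuming a 32-bit address and a non-wrapping range `[Lo, UpperExclusive)`. `knownLeadingZerosSketch` is a hypothetical helper, not the ValueTracking implementation:

```cpp
#include <cassert>
#include <cstdint>

// Number of high bits guaranteed to be zero for any 32-bit value V with
// V < UpperExclusive.
inline unsigned knownLeadingZerosSketch(std::uint32_t UpperExclusive) {
  assert(UpperExclusive != 0 && "empty range");
  const std::uint32_t MaxValue = UpperExclusive - 1;
  unsigned Zeros = 0;
  // Walk from the most significant bit down until a set bit is found.
  for (std::uint32_t Bit = 0x80000000u; Bit != 0 && (MaxValue & Bit) == 0;
       Bit >>= 1)
    ++Zeros;
  return Zeros;
}
```

For example, a symbol known to lie below `0x10000` has its top 16 bits known to be zero.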
Francesco Petrogalli [Wed, 28 Jun 2023 10:54:33 +0000 (12:54 +0200)]
[MISched] Fix bug(s) in bottom-up scheduling.
BUG 1 - choosing the right cycle when booking a resource.
---------------------------------------------------------
Bottom-up scheduling should take into account the current cycle at
the scheduling boundary when determining at what cycle a resource can be
issued. Suppose the schedule boundary is at cycle `C`, and that we
want to check at what cycle a 3-cycle resource can be instantiated.
We have two cases: A, in which the last seen resource cycle LSRC in
which the resource is known to be used is 3 or more
cycles away from the current cycle `C` (`C - LSRC >= 3`), and B, in which
the LSRC is less than 3 cycles away from `C` (`C - LSRC < 3`). Note
that, in bottom-up scheduling, LSRC is always smaller than or equal to the
current cycle `C`.
The two cases can be schematized as follows:
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | | | | | LSRC | -> Case A
| | | | LSRC | | | -> Case B
// Before allocating the resource
LSRC(A) = C - 4
LSRC(B) = C - 2
```
In case A, the scheduler sees cycles `C`, `C-1` and `C-2` being
available for booking the 3-cycle resource. Therefore the LSRC can be
updated to be `C`, and the resource can be scheduled from cycle `C`
(the `X` in the table):
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | X | X | X | | | -> Case A
// After allocating the resource
LSRC(A) = C
```
In case B, the 3-cycle resource usage would clash with the LSRC if
allocated starting from cycle C:
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| | X | X | X | | | -> clash at cycle C - 2
| | | | LSRC | | | -> Case B
```
Therefore, the cycle in which the resource can be scheduled needs to
be greater than `C`. For the example, the resource is booked
in cycle `C + 1`.
```
... | C + 1 | C | C - 1 | C - 2 | C - 3 | C - 4 | ...
| X | X | X | | | |
// After allocating the resource
LSRC(B) = C + 1
```
The behavior we need to correctly support cases A and B is obtained by
computing the next value of the LSRC as the maximum between:
1. the current cycle `C`;
2. and the previous LSRC plus the number of cycles CYCLES the resource will need.
In formula:
```
LSRC(next) = max(C, LSRC(previous) + CYCLES)
```
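As a quick check of the formula against cases A and B, here is a minimal sketch. `nextLSRCSketch` is a hypothetical free function, not the signature of `getNextResourceCycle`:

```cpp
#include <algorithm>
#include <cassert>

// Next "last seen resource cycle" for bottom-up booking:
// LSRC(next) = max(C, LSRC(previous) + CYCLES).
inline unsigned nextLSRCSketch(unsigned CurrCycle, unsigned PrevLSRC,
                               unsigned Cycles) {
  assert(PrevLSRC <= CurrCycle &&
         "bottom-up: LSRC never exceeds the current cycle");
  return std::max(CurrCycle, PrevLSRC + Cycles);
}
```

With `C = 10` and a 3-cycle resource, case A (`LSRC = C - 4 = 6`) yields `10`, i.e. `C`, and case B (`LSRC = C - 2 = 8`) yields `11`, i.e. `C + 1`, matching the tables above.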
BUG 2 - booking the resource for the correct number of cycles.
--------------------------------------------------------------
When storing the next LSRC, the function `getNextResourceCycle` was
being invoked with the number of cycles a resource was using set to 0.
The invocation of `getNextResourceCycle` now uses the value of
`Cycles` instead of 0.
Effects on code generation
--------------------------
This fix has an effect only on AArch64, for the Cortex-A55
scheduling model (`-mcpu=cortex-a55`).
The changes in the MIR tests caused by this patch show that the value
now reported by `getNextResourceCycle` is correct.
Other cortex-a55 tests have been touched by this change, where some
instructions have been swapped. The final generated code is equivalent
in terms of the total number of cycles. The test
`llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir`
shows in detail the correctness of the bottom-up scheduling, and the
effect on codegen is visible in the test
`llvm/test/CodeGen/AArch64/aarch64-smull.ll`.
Reviewed By: andreadb, dmgreen
Differential Revision: https://reviews.llvm.org/D153117
Matt Arsenault [Wed, 24 May 2023 15:22:20 +0000 (16:22 +0100)]
AMDGPU: Special case uniformity info for single lane workgroups
Constructors/destructors and OpenMP make use of single lane groups
in some cases.
Matt Arsenault [Sat, 24 Jun 2023 20:04:42 +0000 (16:04 -0400)]
ValueTracking: Handle ptrmask in computeKnownBits
Matt Arsenault [Mon, 26 Jun 2023 13:38:58 +0000 (09:38 -0400)]
InstCombine: Add baseline tests for some ptrmask handling
Martin Braenne [Wed, 28 Jun 2023 09:16:09 +0000 (09:16 +0000)]
[clang][dataflow] Don't crash if copy constructor arg doesn't have a storage location.
I accidentally used `cast` instead of `cast_or_null`.
Reviewed By: sammccall, xazax.hun
Differential Revision: https://reviews.llvm.org/D153956
Matt Arsenault [Tue, 13 Jun 2023 19:51:44 +0000 (15:51 -0400)]
OpenMP: Fix nothrow new/delete for amdgpu
I tried #pragma omp begin declare variant device_type(nohost) but it
didn't work and I'm not really sure how it's supposed to work.
Matt Arsenault [Tue, 13 Jun 2023 19:14:09 +0000 (15:14 -0400)]
OpenMP: Add missing test coverage for nothrow new/delete
Missing test from fd3437a4f791cb0520e19b87953141fc68543377
Matt Arsenault [Fri, 23 Jun 2023 13:51:08 +0000 (09:51 -0400)]
OpenMP/cmake: Use TARGET instead of looking for amdgpu-arch
Not sure if the standalone build case is supposed to be a supported
path. Should probably rely on find_package and imported targets
anyway.
Serge Pavlov [Wed, 28 Jun 2023 10:51:39 +0000 (17:51 +0700)]
[symbolizer] Exit early if input file is absent
If the binary file specified as input with option --obj or -e is absent,
llvm-addr2line now exits immediately. This patch extends this behavior to
llvm-symbolizer. Previously llvm-symbolizer waited for addresses from the
input stream or command line in this case.
Differential Revision: https://reviews.llvm.org/D153219
Sven van Haastregt [Wed, 28 Jun 2023 10:33:27 +0000 (11:33 +0100)]
[CodeGenPrepare] Implement releaseMemory
Release BlockFrequencyInfo and BranchProbabilityInfo results and other
per function information immediately afterwards, instead of holding
onto the memory until the next `CodeGenPrepare::runOnFunction` call.
Differential Revision: https://reviews.llvm.org/D152552
Co-authored-by: Erik Hogeman <erik.hogeman@arm.com>
Nicolas Vasilache [Tue, 27 Jun 2023 19:27:47 +0000 (19:27 +0000)]
[mlir][Linalg] Refactor isaContractionOpInterface and surrounding utils
This is almost NFC except for the fact that:
- when multiple candidates are available we now return them in sorted order vs undetermined order previously
- the type of the transform return is relaxed and a test is added for the case where the transform does not apply
Differential Revision: https://reviews.llvm.org/D153941
Nikita Popov [Wed, 28 Jun 2023 10:09:02 +0000 (12:09 +0200)]
[SimplifyCFG] Regenerate test checks (NFC)
Alexey Lapshin [Mon, 26 Jun 2023 21:31:12 +0000 (23:31 +0200)]
[DWARFv5][DWARFLinker] avoid stripping template names for .debug_names.
DWARFLinker puts three names for subprograms into the .apple_names and
.debug_names: short name, linkage name, name without template parameters.
DW_TAG_subprogram
DW_AT_linkage_name "_Z3fooIcEvv"
DW_AT_name "foo<char>"
short name: "foo<char>"
linkage name: "_Z3fooIcEvv"
name without template parameters: "foo"
DWARFv5 does not require stripping template parameters for subprogram name.
Currently, llvm-dwarfdump --verify reports an error if a name stored in the
accelerator table does not match the DIE name (a name with stripped template
parameters does not match the original DIE name).
This patch does not store name without template parameters into the .debug_names table.
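To make the third name form concrete, here is a simplified sketch of template-parameter stripping. `stripTemplateParamsSketch` is a hypothetical helper, not DWARFLinker's implementation, and it does not treat operator names specially or parse nested brackets:

```cpp
#include <cassert>
#include <string>

// Drops everything from the first '<' onward when the name ends in '>'.
// Names without a trailing '>' or without any '<' are returned unchanged.
inline std::string stripTemplateParamsSketch(const std::string &Name) {
  if (Name.empty() || Name.back() != '>')
    return Name;
  const std::string::size_type Open = Name.find('<');
  if (Open == std::string::npos || Open == 0)
    return Name;
  std::string Base = Name.substr(0, Open);
  assert(!Base.empty() && "Open > 0 guarantees a non-empty base name");
  return Base;
}
```

Applied to the DIE above, `"foo<char>"` becomes `"foo"`, which is the entry this patch no longer emits into `.debug_names`.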
Differential Revision: https://reviews.llvm.org/D153869
Ties Stuij [Wed, 28 Jun 2023 09:06:41 +0000 (10:06 +0100)]
[ARM] allow long-call codegen for armv6-M eXecute Only (XO)
Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this
was only implemented for ~armv7+, effectively if MOVW/MOVT is
available. Regarding long calls, we remove the check for MOVW/MOVT when
generating code for XO, which already was redundant as in the subtarget
initialization we already check if XO is valid for the target. And targets that
generate valid XO code should be able to handle the (wrapper globaladdress)
node.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D153782
Jeremy Morse [Wed, 28 Jun 2023 09:35:57 +0000 (10:35 +0100)]
Reapply "[DebugInfo][InstrRef] Instrument x86 CMOV conversion to preserve variable values"
X86's CMOV conversion transforms CMOV instructions into control flow between
blocks, meaning the value is computed by a PHI rather than a "real" machine
instruction. In instruction-referencing mode, we need to transfer the
instruction label between the old CMOV and the new PHI instruction to mark
where the variable value is computed.
There's an extra complication in that memory operands can be unfolded from the
CMOV and sunk into the new blocks -- the test checks both scenarios where the
instruction number has to hop between instructions.
This omission was exposed by Dexter testing.
Reviewed By: Orlando
Differential Revision: https://reviews.llvm.org/D145565
OCHyams [Wed, 28 Jun 2023 09:33:02 +0000 (10:33 +0100)]
Reset NoPHI MachineFunction property in X86CmovConversion
In order to placate the machine-verifier, X86CmovConversion needs to reset the
NoPHI property when it inserts a PHI.
Fixes buildbot failure: https://lab.llvm.org/buildbot/#/builders/16/builds/50453
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D153950
Sam McCall [Tue, 27 Jun 2023 18:59:18 +0000 (20:59 +0200)]
[dataflow] Use consistent, symmetrical, non-mutating erased signature for join()
Mutating join() isn't used and so appears to be an anti-optimization.
Having Lattice vs Environment inconsistent is awkward, particularly when trying
to minimize copies while joining.
This patch eliminates the difference, but doesn't actually change the signature
of join on concrete lattice types (as that's a breaking change).
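As an illustration of the symmetric, non-mutating shape described here, consider the sketch below. `NameSetLatticeSketch` is a hypothetical lattice, not a type from the dataflow framework, and the framework's actual type-erased signature is not shown in this message:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical lattice whose elements are sets of names ordered by inclusion.
struct NameSetLatticeSketch {
  std::set<std::string> Names;

  // Symmetric, non-mutating join: neither input is modified, and
  // join(A, B) yields the same element as join(B, A).
  static NameSetLatticeSketch join(const NameSetLatticeSketch &A,
                                   const NameSetLatticeSketch &B) {
    NameSetLatticeSketch Result = A;
    Result.Names.insert(B.Names.begin(), B.Names.end());
    assert(Result.Names.size() >= A.Names.size() &&
           Result.Names.size() >= B.Names.size());
    return Result;
  }
};
```

A mutating variant would instead update `A` in place and report whether it changed; the non-mutating form leaves both operands intact at the cost of one copy.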
Differential Revision: https://reviews.llvm.org/D153908
Sam McCall [Tue, 27 Jun 2023 23:13:52 +0000 (01:13 +0200)]
[unittest] teach gTest to print entries of DenseMap as pairs
When an assertion like the following fails:
EXPECT_THAT(map, ElementsAre(Pair("p", "nullable")));
Error message before:
Actual: { 40-byte object <E8-A5 9C-7F 25-37 00-00 58-7E 51-51 D0-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 00-DA C7-7F 25-37 00-00> }
After:
Actual: { ("p", "nonnull") }
It is not ideal that we need to refer directly to DenseMapPair inside the
internal namespace, but I believe the practical maintenance risk is low.
This change is covered by DenseMap's unittests, as we've covered SmallString etc
in the past.
Differential Revision: https://reviews.llvm.org/D153930
Igor Kirillov [Wed, 21 Jun 2023 17:03:50 +0000 (17:03 +0000)]
[LV] Precommit masked interleaved access tests
Precommit for D152258.
Differential Revision: https://reviews.llvm.org/D153443
pvanhout [Wed, 28 Jun 2023 09:13:51 +0000 (11:13 +0200)]
Revert "[AMDGPU] Use SSAUpdater in PromoteAlloca"
This reverts commit
091bfa76db64fbe96d0e53d99b2068cc05f6aa16.
Martin Braenne [Wed, 28 Jun 2023 08:05:02 +0000 (08:05 +0000)]
[clang][dataflow] Output debug info if `getChild()` doesn't find field.
Depends On D153409
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D153851
Martin Braenne [Wed, 28 Jun 2023 08:04:33 +0000 (08:04 +0000)]
[clang][dataflow] Add a test that we can access fields of anonymous records.
Reviewed By: sammccall, ymandel, gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D153409
Kadir Cetinkaya [Thu, 22 Jun 2023 14:15:26 +0000 (16:15 +0200)]
[clangd] Always allow diagnostics from stale preambles
We've been running this internally for months now, without any
stability or correctness concerns. It yields a ~40% speedup in incremental
diagnostics latency (as the preamble can get invalidated through code completion
etc.).
Differential Revision: https://reviews.llvm.org/D153882
Florian Hahn [Wed, 28 Jun 2023 08:40:53 +0000 (09:40 +0100)]
[ConstraintElim] Move FactOrCheck and State definitions to top. (NFC)
This will enable follow-up refactoring to use the State directly in the
constraint system, reducing the need to pass lots of arguments around.
Leonard Grey [Wed, 28 Jun 2023 08:28:57 +0000 (10:28 +0200)]
[lsan] Be more conservative in SuspendedThreadsListMac::GetRegistersAndSP
Currently, we only return REGISTERS_UNAVAILABLE_FATAL if we receive
KERN_INVALID_ARGUMENT from thread_status. In reality, there are other
possible return values (MACH_SEND_INVALID_DEST for example) that make it
dangerous to read memory. This can be demonstrated by running
create_thread_leak.cpp in standalone mode where it will appear to hang
due to an EXC_BAD_ACCESS while scanning the stack.
This change reverses the current logic to treat MIG_ARRAY_TOO_LARGE as
non-fatal, and all other errors as fatal.
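A minimal sketch of the reversed decision, in Python for illustration only. The status names are the ones mentioned above; the helper name classify_thread_state_result and the two non-fatal result names are assumptions, not the sanitizer's actual code.

```python
# Toy model of the allow-list logic described above (hypothetical helper).
# Statuses are plain strings here; real code would compare kern_return_t values.

def classify_thread_state_result(status: str) -> str:
    """Map a thread-state query status to a register-availability result."""
    if status == "KERN_SUCCESS":
        return "REGISTERS_AVAILABLE"
    if status == "MIG_ARRAY_TOO_LARGE":
        # The only error treated as non-fatal after this change.
        return "REGISTERS_UNAVAILABLE"
    # Everything else, including KERN_INVALID_ARGUMENT and
    # MACH_SEND_INVALID_DEST, is treated as unsafe for reading memory.
    return "REGISTERS_UNAVAILABLE_FATAL"
```

The point of the design is that unknown or newly observed errors default to the fatal path, rather than having to be enumerated one by one.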
Differential revision: https://reviews.llvm.org/D153072
Kohei Yamaguchi [Wed, 28 Jun 2023 08:34:14 +0000 (08:34 +0000)]
[mlir][doc] Fix broken docs
- Fix include paths for Transform Dialect Tutorial
- Add math dialect's pass into Pass.md
- Remove an include path of Quant dialect from Pass.md
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D153944
Jacob Bramley [Tue, 13 Jun 2023 15:21:15 +0000 (16:21 +0100)]
Update module flags documentation for Min.
This updates the documentation to match the implementation. Warning and
Min interact in the same way as Warning and Max.
Differential Revision: https://reviews.llvm.org/D153012
Nikita Popov [Wed, 28 Jun 2023 08:08:39 +0000 (10:08 +0200)]
[Attributor] Update UTC version in test (NFC)
In order to also check return attributes.
Nikita Popov [Wed, 14 Jun 2023 10:23:19 +0000 (12:23 +0200)]
[InstCombine] Fold binop of shifts with related amounts
Fold
binop(shift(ShiftedC1, ShAmt), shift(ShiftedC2, add(ShAmt, AddC)))
->
shift(binop(ShiftedC1, shift(ShiftedC2, AddC)), ShAmt)
where both shifts are the same and AddC is a valid shift amount.
Proofs: https://alive2.llvm.org/ce/z/PhVVeg
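As an informal sanity check alongside the proofs, the identity can be brute-forced on a small bit width. This sketch assumes bitwise binops (and/or/xor) and shl/lshr on 6-bit unsigned values, and only tries shift amounts where ShAmt + AddC stays below the width; the set of operations the patch actually handles may differ.

```python
import operator
from itertools import product

WIDTH = 6                      # small width so exhaustive search stays cheap
MASK = (1 << WIDTH) - 1


def shl(value, amount):
    """Left shift truncated to WIDTH bits."""
    return (value << amount) & MASK


def lshr(value, amount):
    """Logical right shift of an unsigned WIDTH-bit value."""
    return value >> amount


def count_counterexamples():
    """Count inputs where the folded form differs from the original form."""
    failures = 0
    binops = (operator.and_, operator.or_, operator.xor)
    for binop, shift in product(binops, (shl, lshr)):
        for sh_amt in range(WIDTH):
            # Keep sh_amt + add_c a valid shift amount (< WIDTH).
            for add_c in range(WIDTH - sh_amt):
                for c1, c2 in product(range(MASK + 1), repeat=2):
                    before = binop(shift(c1, sh_amt),
                                   shift(c2, sh_amt + add_c))
                    after = shift(binop(c1, shift(c2, add_c)), sh_amt)
                    failures += before != after
    return failures
```

Calling count_counterexamples() enumerates every combination for the chosen width; a result of zero means no mismatch was found under these assumptions.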
Differential Revision: https://reviews.llvm.org/D152927
Nikita Popov [Tue, 27 Jun 2023 13:58:56 +0000 (15:58 +0200)]
[Attributor] Convert test to opaque pointers (NFC)
This converts the arg-count-mismatch.ll test to opaque pointers.
The bitcasts of the called functions are now implicit and this
affects behavior: We now infer memory attributes. This should be
fine, as function-level attributes are not affected by signature
mismatches.
Differential Revision: https://reviews.llvm.org/D153406
Fangrui Song [Wed, 28 Jun 2023 07:30:52 +0000 (00:30 -0700)]
Revert D153927 "Resubmit with fix: [NFC] Refactor MBB hotness/coldness into templated PSI functions."
This reverts commit
4d8cf2ae6804e0d3f2b668dbec0f5c1983358328.
There is a library layering violation. LLVMAnalysis cannot depend on LLVMCodeGen.
```
llvm/include/llvm/Analysis/ProfileSummaryInfo.h:19:10: fatal error: 'llvm/CodeGen/MachineFunction.h' file not found
19 | #include "llvm/CodeGen/MachineFunction.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Kai Sasaki [Wed, 28 Jun 2023 05:55:42 +0000 (14:55 +0900)]
[mlir][complex] Canonicalize complex.mul with 1 and 0
We can fold the complex.mul if the right value is obviously 1 or 0.
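A rough sketch of the idea in Python, assuming the right operand is a known constant and the left operand is an opaque value. The function name fold_complex_mul and the string-based operand are hypothetical; this is not the MLIR folder itself, and it ignores NaN/infinity concerns for the zero case.

```python
def fold_complex_mul(lhs, rhs_const):
    """Return a folded result, or None when no fold applies.

    lhs is an opaque operand (any object); rhs_const is a Python complex
    standing in for a constant right-hand operand.
    """
    if rhs_const == complex(1.0, 0.0):
        return lhs                    # x * (1 + 0i) -> x
    if rhs_const == complex(0.0, 0.0):
        return complex(0.0, 0.0)      # x * (0 + 0i) -> 0 (assumes finite x)
    return None                       # leave the multiplication in place
```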
Differential Revision: https://reviews.llvm.org/D153606
Fangrui Song [Wed, 28 Jun 2023 07:23:38 +0000 (00:23 -0700)]
[llvm-exegesis] Adjust GLIBC_INITS_RSEQ condition
Commit
9f80831f3627e800709e2434bbbd5bb179b1576e introduced `#include <sys/rseq.h>`,
but RSEQ_SIG is only defined by some glibc ports (aarch64,arm,mips,powerpc,s390,x86),
causing other hosts (e.g., riscv64, loongarch64) to fail to build.
Reviewed By: aidengrossman, xen0n
Differential Revision: https://reviews.llvm.org/D153938
Hau Hsu [Wed, 28 Jun 2023 06:59:09 +0000 (14:59 +0800)]
Summary: [lldb] Fix libncurses, libpanel library link order
libpanel depends on libcurses, so when linking static libraries, libpanel
should be placed prior to libcurses.
This patch resolves errors like:
```
.../x86_64-centos6-linux-gnu/bin/ld:
.../lib/libpanelw.a(p_show.o):
in function `show_panel':
p_show.c:(.text+0x39): undefined reference to `_nc_panelhook_sp'
.../x86_64-centos6-linux-gnu/bin/ld:
.../lib/libpanelw.a(p_show.o):
in function `update_panels_sp':
p_update.c:(.text+0x1f): undefined reference to `_nc_panelhook_sp'
collect2: error: ld returned 1 exit status
```
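Why the order matters can be shown with a toy model of a single-pass static linker, sketched here in Python. It assumes the classic behavior where archives are scanned left to right and a member is pulled in only if it defines a symbol that is undefined at that moment. The symbols show_panel and _nc_panelhook_sp come from the error above; the curses member name lib_hook.o and the helper name unresolved_after_link are made up for illustration.

```python
def unresolved_after_link(needed, archives):
    """Return the symbols still undefined after scanning archives in order.

    needed:   symbols referenced by the program's own object files.
    archives: list of dicts mapping member name -> (defines, needs) sets.
    """
    undefined = set(needed)
    defined = set()
    for members in archives:
        pulled = set()
        progress = True
        while progress:  # rescan this archive until nothing new is pulled
            progress = False
            for name, (defines, needs) in members.items():
                if name in pulled or not (defines & undefined):
                    continue
                pulled.add(name)
                defined |= defines
                undefined = (undefined | needs) - defined
                progress = True
    return undefined


# p_show.o defines show_panel and references _nc_panelhook_sp.
LIBPANEL = {"p_show.o": ({"show_panel"}, {"_nc_panelhook_sp"})}
# Hypothetical curses member that provides _nc_panelhook_sp.
LIBCURSES = {"lib_hook.o": ({"_nc_panelhook_sp"}, set())}

panel_first = unresolved_after_link({"show_panel"}, [LIBPANEL, LIBCURSES])
curses_first = unresolved_after_link({"show_panel"}, [LIBCURSES, LIBPANEL])
```

With libpanel first, panel_first is empty. With libcurses first, the curses archive is scanned before anything needs _nc_panelhook_sp, so curses_first still contains that symbol, matching the undefined reference reported above.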
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D153844
Nicolas Vasilache [Tue, 27 Jun 2023 09:02:31 +0000 (09:02 +0000)]
Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions""
This reverts commit
6506692fe619ef8a1f7c6ea829d9a9eceb31622d.
Differential Revision: https://reviews.llvm.org/D153845
Jean Perier [Wed, 28 Jun 2023 06:32:21 +0000 (08:32 +0200)]
[flang][hlfir] Do not reuse hlfir.expr mask when saving RHS.
In WHERE and masked FORALL assignment, both the mask and the
RHS may need to be saved in some temporary storage before evaluating
the assignment.
The code was trying to "optimize" that case when evaluating the RHS
by not fetching the mask temporary that was just created, but in simple
cases of WHERE construct where the evaluated mask is an hlfir.expr,
this caused the hlfir.expr to be both used in an hlfir.associate and
later in an hlfir.apply to create the fir.if to mask the RHS evaluation.
This double usage prevents codegen from inlining the hlfir.expr at the
hlfir.apply, and from "moving" the hlfir.expr storage into the temp
during hlfir.associate bufferization. So this is pessimizing the code:
this would lead to creating two mask array temporary storages.
This was caught by the unexpectedly high number of "not yet implemented:
hlfir.associate of hlfir.expr with more than one use" that were firing.
Use the mask temporary instead (the hlfir.associate result) when possible.
Some temporaries (the "inlined stack") do not support fetching and pushing
in the same run (a single counter is used to keep track of the fetching
and pushing position). Add a canBeFetchedAfterPush() for safety,
but this limitation is anyway not relevant for hlfir.expr since the
inlined stack is only used to save "trivial" scalars.
Also update the temporary storage name to only indicate "forall" if
the top level construct is a FORALL. This is not a very precise name,
but it should at least give a correct context to indicate in the IR
why some temporary array storage was created.
Differential Revision: https://reviews.llvm.org/D153880
Jean Perier [Wed, 28 Jun 2023 06:27:16 +0000 (08:27 +0200)]
[flang] do not merge block after lowering
Lowering relies on dead code generation / unreachable block deletion
to delete some code that is potentially invalid.
However, calling mlir::simplifyRegion also merges blocks, which may
promote SSA values to block arguments. Not all FIR types are intended
to be block arguments.
The added test shows an example where block merging led to
fir.shape<> being block arguments (and a failure later in codegen).
Reviewed By: tblah, clementval, vdonaldson
Differential Revision: https://reviews.llvm.org/D153858
pvanhout [Tue, 13 Jun 2023 07:49:38 +0000 (09:49 +0200)]
[AMDGPU] Use SSAUpdater in PromoteAlloca
This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly.
Note PromoteAlloca is still reliant on SROA running first to
canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types because those should be simplified by SROA before reaching the pass.
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D152706
Freddy Ye [Wed, 28 Jun 2023 05:06:05 +0000 (13:06 +0800)]
Pre-commit test for D151696.
Meanwhile this patch added missing tests for supported CPU names
of cpu_dispatch/specific attribute and added more CHECKs for
resolver function.
Reviewed By: pengfei, skan
Differential Revision: https://reviews.llvm.org/D152989
Petr Hosek [Wed, 28 Jun 2023 05:42:06 +0000 (05:42 +0000)]
[Fuchsia] Enable libcxx filesystem on Windows for stage 1 build
This is a follow up to D153931 but for the first stage.
Chuanqi Xu [Wed, 28 Jun 2023 05:41:36 +0000 (13:41 +0800)]
[NFC] [C++20] [Modules] Add a test for merging lambda types
Close https://github.com/llvm/llvm-project/issues/57222.
This should be fixed with the series of bc73ef0. Add the test case for
C++20 Named modules.
Han Shen [Tue, 27 Jun 2023 23:37:25 +0000 (16:37 -0700)]
Resubmit with fix: [NFC] Refactor MBB hotness/coldness into templated PSI functions.
In D152399, we calculate BPI->BFI in MachineFunctionSplit pass just to
use PSI->isFunctionHotInCallGraph, which is expensive. Instead, we can
implement this directly with MBFI.
A reviewer mentioned in a comment that machine_size_opts already has
isFunctionColdInCallGraph, isFunctionHotInCallGraphNthPercentile, etc
implemented. These can be refactored and reused across MFS and machine
size opts.
This CL does this - it refactors out those internal static functions
into PSI as templated functions, so they can be accessed easily.
Differential Revision: https://reviews.llvm.org/D153927