Chenle Yu [Mon, 15 May 2023 14:56:48 +0000 (09:56 -0500)]
[OpenMP] Implement task record and replay mechanism
This patch implements the "task record and replay" mechanism. The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop".
The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task.
Tasks encapsulated between a record start and a record end are saved: the runtime stores their dependencies and structures, referred to as a task dependency graph (TDG), in order to replay them in subsequent executions. In a TDG replay, execution starts by scheduling all root tasks (tasks that have no input dependencies); no hash table is needed to track the dependencies, and tasks do not need to be created again.
At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1.
An integer uniquely identifies each TDG. Currently, this identifier needs to be incremented manually in the source code. Depending on how this feature is eventually used in the library, the caller function may have to do this instead; the caller function also needs to implement a mechanism to skip the associated region according to the return value of __kmpc_start_record_task.
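A minimal sketch of the caller-side pattern described above, assuming a toy runtime. `ToyRuntime`, `ToyTdg`, `start_record`, `submit`, `end_record` and `run_region` are hypothetical names, not the libomp entry points or their signatures, and the sketch ignores dependencies: it replays tasks in recorded order instead of scheduling root tasks first.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded task graph; dependencies are omitted.
struct ToyTdg {
  std::vector<std::function<void()>> tasks; // recorded task bodies
};

// Hypothetical runtime state: recorded TDGs keyed by an integer identifier.
struct ToyRuntime {
  std::unordered_map<int, ToyTdg> recorded;
  ToyTdg pending;        // TDG currently being recorded
  int recording_id = -1; // -1 means "not recording"

  // Returns 1 if the caller must run (and record) the region,
  // 0 if a recorded TDG was replayed and the region must be skipped.
  int start_record(int tdg_id) {
    auto it = recorded.find(tdg_id);
    if (it != recorded.end()) {
      for (auto &task : it->second.tasks)
        task(); // replay without creating tasks again
      return 0;
    }
    recording_id = tdg_id;
    pending = ToyTdg{};
    return 1;
  }

  // Runs a task and, while recording, stores it for later replays.
  void submit(std::function<void()> task) {
    task();
    if (recording_id != -1)
      pending.tasks.push_back(std::move(task));
  }

  void end_record() {
    recorded[recording_id] = std::move(pending);
    pending = ToyTdg{};
    recording_id = -1;
  }
};

// Caller-side pattern: the region is skipped when start_record returns 0.
inline void run_region(ToyRuntime &rt, int tdg_id, int &counter) {
  if (rt.start_record(tdg_id)) {
    rt.submit([&counter] { counter += 1; });
    rt.submit([&counter] { counter += 10; });
    rt.end_record();
  }
}
```

Calling `run_region` twice with the same identifier records on the first call and replays on the second, so the region body is entered only once.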
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D146642
Nikita Popov [Mon, 15 May 2023 13:41:16 +0000 (15:41 +0200)]
[KnownBitsTest] Align with ConstantRange test infrastructure (NFC)
Align the way we perform exhaustive tests for KnownBits with what
we do for ConstantRange. Test each case separately by specifying
a function on KnownBits and one on APInts. Additionally, specify
a callback that determines which cases are supposed to be optimal,
rather than only correct. Unlike the ConstantRange case there is
a well-defined, unique notion of optimality for KnownBits.
If a failure occurs, print out the inputs, computed result and
exact result. Adjust the printing function to produce the output
in a format that is meaningful for KnownBits, i.e. print the
actual known bits, using ? to signify unknowns and ! to signify
conflicts.
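As an illustration of that output format, here is a small standalone sketch, assuming an 8-bit value described by two masks. `ToyKnownBits` and `formatKnownBits` are hypothetical names and do not reproduce the actual test helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical 8-bit stand-in for a known-bits value.
struct ToyKnownBits {
  std::uint8_t zero; // bits known to be 0
  std::uint8_t one;  // bits known to be 1
};

// Prints the bits most-significant first:
// '0' / '1' for known bits, '?' for unknown, '!' for a conflict (both set).
inline std::string formatKnownBits(const ToyKnownBits &kb) {
  std::string out;
  for (int bit = 7; bit >= 0; --bit) {
    const bool isZero = ((kb.zero >> bit) & 1u) != 0;
    const bool isOne = ((kb.one >> bit) & 1u) != 0;
    if (isZero && isOne)
      out += '!';
    else if (isZero)
      out += '0';
    else if (isOne)
      out += '1';
    else
      out += '?';
  }
  return out;
}
```

With `zero = 0xC0` and `one = 0x43`, bit 6 is claimed by both masks, so the sketch prints it as a conflict.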
Erich Keane [Mon, 15 May 2023 14:09:07 +0000 (07:09 -0700)]
Update __cplusplus for C++23, add C++23 diag group alias.
This came up during the C++26 flag discussion, so split this out into a
separate patch.
Sameer Sahasrabuddhe [Mon, 15 May 2023 11:36:06 +0000 (17:06 +0530)]
[LLVM][Uniformity] Propagate temporal divergence explicitly
At a cycle C with divergent exits, UA was using a naive traversal of the exiting
edges to locate blocks that may use values defined inside C. But this traversal
fails when it encounters a cycle. This is now replaced with a much simpler
propagation that iterates over every instruction in C and checks any uses that
are outside C. But such an iteration can be expensive when C is very large; the
original strategy may need to be reconsidered if there is a regression in
compilation times.
Also fixed lit tests that should have originally caught the missed propagation
of temporal divergence.
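The replacement propagation can be pictured with a toy IR, assuming each instruction records its block and the indices of its users. `ToyInst` and `usesOutsideCycle` are hypothetical names; the real analysis works on LLVM IR and marks the located uses as divergent, which this sketch leaves out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction: the block it lives in and its users (by index).
struct ToyInst {
  int block;
  std::vector<int> users;
};

// Iterates over every instruction defined inside the cycle and collects the
// users that live outside it. Cost grows with the size of the cycle.
inline std::set<int> usesOutsideCycle(const std::vector<ToyInst> &insts,
                                      const std::set<int> &cycleBlocks) {
  std::set<int> outside;
  for (const ToyInst &def : insts) {
    if (cycleBlocks.count(def.block) == 0)
      continue; // not defined inside the cycle
    for (int user : def.users)
      if (cycleBlocks.count(insts[user].block) == 0)
        outside.insert(user);
  }
  return outside;
}
```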
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D149646
Manupa Karunaratne [Mon, 15 May 2023 14:41:49 +0000 (14:41 +0000)]
[MLIR][ROCDL] add gpu to rocdl erf support
This commit adds a lowering to a library function
call to support erf in ROCDL.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150355
Alex Zinenko [Mon, 15 May 2023 12:28:21 +0000 (12:28 +0000)]
[mlir] allow repeated payload in structured.fuse_into_containing
Structured fusion proceeds by iteratively finding the next suitable
producer to be fused into the loop. Therefore, it shouldn't matter if
the same producer is listed multiple times (e.g., it is used as multiple
operands). Adjust the implementation of the transform op to support this
case.
Also fix the checking code in the interpreter to actually respect the
TransformOpInterface indication that repeated payload is allowed; it
seems to have been accidentally dropped in one of the refactorings.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150561
Kyle Huey [Mon, 15 May 2023 14:08:18 +0000 (15:08 +0100)]
[X86] Use the CFA as the DWARF frame base for better variable locations around calls.
Prior to this patch, for the DWARF frame base LLVM uses the frame pointer
register if available, otherwise the stack pointer register. If the stack
pointer register is being used and a call or other code modifies the stack
pointer during the body of the function this results in the locations being
wrong and the debugger displaying the wrong values for variables.
By using DW_OP_call_frame_cfa in these situations the emitted location for
the variable will automatically handle changes in the stack pointer.
The CFA needs to be adjusted for the offset between the frame pointer/stack
pointer to allow the variable locations themselves to remain unchanged by
this patch.
Reviewed By: #debug-info, scott.linder, jryans
Differential Revision: https://reviews.llvm.org/D143463
Florian Hahn [Mon, 15 May 2023 14:05:10 +0000 (15:05 +0100)]
[AArch64] Add test case where widening mull could be used.
Extra test using mull for D150482.
khei4 [Mon, 15 May 2023 13:33:15 +0000 (22:33 +0900)]
[ConstantFold] use StoreSize for VectorType folding
Differential Revision: https://reviews.llvm.org/D150515
Reviewed By: nikic
Nikolas Klauser [Mon, 15 May 2023 13:56:40 +0000 (06:56 -0700)]
Revert "[libc++][PSTL] Implement std::transform"
This reverts commit
cbd9e5454741ebe6b39521fe1a8ed4eed5c2c801.
The wrong patch was landed.
Francesco Petrogalli [Mon, 15 May 2023 13:26:32 +0000 (15:26 +0200)]
[unittests][llvm-exegesis] Remove build warnings [NFCI]
Remove the warning caused by a missing field initializer.
The field is `StartAtCycle` of `struct MCWriteProcResEntry`.
It has been set to the default `StartAtCycle = 0`.
Differential Revision: https://reviews.llvm.org/D150569
Nikolas Klauser [Fri, 5 May 2023 16:16:05 +0000 (09:16 -0700)]
[libc++][PSTL] Implement std::transform
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D149615
Matthias Springer [Mon, 15 May 2023 13:39:35 +0000 (15:39 +0200)]
[mlir][scf][bufferize] Fix bug in WhileOp analysis verification
Block arguments and yielded values are not equivalent if there are not enough block arguments. This fixes #59442.
Differential Revision: https://reviews.llvm.org/D145575
Matthias Springer [Mon, 15 May 2023 13:34:11 +0000 (15:34 +0200)]
[mlir][bufferization] Add option to dump alias sets
This is useful for debugging.
Differential Revision: https://reviews.llvm.org/D143314
Jan Kuhle [Mon, 15 May 2023 13:33:33 +0000 (15:33 +0200)]
clang-format: [JS] terminate import sorting on `export type X = Y`
Contributed by @jankuehle!
https://reviews.llvm.org/D150116 introduced a bug. `export type X = Y` was considered an export declaration and took part in import sorting. This is not correct. With this change `export type X = Y` properly terminates import sorting.
Reviewed By: krasimir
Differential Revision: https://reviews.llvm.org/D150563
Matthias Springer [Mon, 15 May 2023 13:26:13 +0000 (15:26 +0200)]
[mlir][bufferization] Improve findValueInReverseUseDefChain signature
Instead of passing traversal options as a long list of arguments, store them in a TraversalConfig object and pass that object.
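A rough sketch of the config-object pattern, assuming a toy reverse chain in which every value has at most one producer. The option names `includeLeaves` and `maxDepth`, and the function `findInReverseChain`, are hypothetical and are not the fields or signature used in MLIR.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical options bundle; the field names are illustrative only.
struct ToyTraversalConfig {
  bool includeLeaves = true; // report the chain's end if nothing matched
  int maxDepth = 8;          // stop after this many producer hops
};

// Walks from `start` through producer[] links (-1 means "no producer") and
// returns the first value for which `matches` holds.
inline std::vector<int>
findInReverseChain(int start, const std::vector<int> &producer,
                   const std::function<bool(int)> &matches,
                   const ToyTraversalConfig &config = ToyTraversalConfig()) {
  std::vector<int> found;
  int value = start;
  for (int depth = 0; value != -1 && depth <= config.maxDepth; ++depth) {
    if (matches(value)) {
      found.push_back(value);
      return found;
    }
    const int next = producer[value];
    if (next == -1 && config.includeLeaves)
      found.push_back(value); // reached a leaf without a match
    value = next;
  }
  return found;
}
```

New options can then be added to the struct without touching every call site, which is the main appeal over a long positional argument list.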
Differential Revision: https://reviews.llvm.org/D143927
Jay Foad [Mon, 15 May 2023 13:15:46 +0000 (14:15 +0100)]
[AMDGPU] Simplify liveins in some MIR tests
We can use the following 16-VGPR tuple directly instead of splitting it
into smaller parts:
$vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247_vgpr248_vgpr249_vgpr250_vgpr251_vgpr252_vgpr253_vgpr254_vgpr255
Manna, Soumi [Mon, 15 May 2023 12:58:52 +0000 (05:58 -0700)]
Fix build error caused by https://reviews.llvm.org/D149718
The patch (https://reviews.llvm.org/D149718) broke the buildbot:
../../clang/include/clang/Sema/ParsedAttr.h:705:18: error: explicitly defaulted move assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted]
AttributePool &operator=(AttributePool &&pool) = default;
^
../../clang/include/clang/Sema/ParsedAttr.h:674:21: note: move assignment operator of 'AttributePool' is implicitly deleted because field 'Factory' is of reference type 'clang::AttributeFactory &'
AttributeFactory &Factory;
^
1 error generated.
This patch fixes the build error.
Nikita Popov [Fri, 28 Apr 2023 13:29:49 +0000 (15:29 +0200)]
[Pipelines] Don't skip GlobalDCE in ThinLTO pre-link
GlobalDCE will only remove functions with available externally
linkage if they are unreferenced. As such, I don't believe there
is any problem with running this pass as part of the ThinLTO pre-link
pipeline. It will only remove functions that are completely dead in
that module, and I don't think there is any benefit to keeping them
around for the post-link phase.
There is no compile-time impact from the additional pass.
This is a followup to one of the side discussions in D146776.
Differential Revision: https://reviews.llvm.org/D149446
Piotr Sobczak [Mon, 15 May 2023 12:13:51 +0000 (14:13 +0200)]
[ValueTracking] Fix computeKnownFPClass with canonicalize
Update code that assumes llvm.canonicalize only handles scalars,
by adding a call to getScalarType().
This is fine, as the intrinsic is trivially vectorizable.
Introduced in D147870, and uncovered by D148065.
Differential Revision: https://reviews.llvm.org/D150556
Matthias Springer [Mon, 15 May 2023 12:39:50 +0000 (14:39 +0200)]
[mlir][IR][tests] Fix incorrect API usage in RewritePatterns
Incorrect API usage was detected by D144552.
Differential Revision: https://reviews.llvm.org/D145167
Matthias Springer [Mon, 15 May 2023 12:31:26 +0000 (14:31 +0200)]
[mlir][bufferization] Fix unknown ops in BufferViewFlowAnalysis
If an op is unknown to the analysis, it must be treated conservatively: assume that every operand aliases with every result.
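The conservative rule can be sketched as follows, assuming values are plain integer ids. `ToyOp`, `AliasPairs` and `addConservativeAliases` are hypothetical names, not the BufferViewFlowAnalysis API.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy op: value ids for its operands and results.
struct ToyOp {
  bool known; // true if the analysis has a precise rule for this op
  std::vector<int> operands;
  std::vector<int> results;
};

using AliasPairs = std::set<std::pair<int, int>>;

// For an unknown op, assume every operand may alias every result.
inline void addConservativeAliases(const ToyOp &op, AliasPairs &aliases) {
  if (op.known)
    return; // precise handling would happen elsewhere
  for (int operand : op.operands)
    for (int result : op.results)
      aliases.insert(std::make_pair(operand, result));
}
```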
Differential Revision: https://reviews.llvm.org/D150546
Haojian Wu [Wed, 3 May 2023 10:18:17 +0000 (12:18 +0200)]
[clangd] Fix fixAll not shown when there is only one unused-include and missing-include diagnostics.
Discovered during the review in https://reviews.llvm.org/D149437#inline-1444851.
Differential Revision: https://reviews.llvm.org/D149822
Alejandro Álvarez Ayllón [Mon, 15 May 2023 11:39:58 +0000 (07:39 -0400)]
[clang][parser] Fix namespace dropping after malformed declarations
* After a malformed top-level declaration
* After a malformed templated class method declaration
In both cases, when there is a malformed declaration, any following
namespace is dropped from the AST. This can trigger a cascade of
confusing diagnostics that may hide the original error. An example:
```
// Start #include "SomeFile.h"
template <class T>
void Foo<T>::Bar(void* aRawPtr) {
(void)(aRawPtr);
}
// End #include "SomeFile.h"
int main() {}
```
We get the original error, plus 19 others from the standard library.
With this patch, we only get the original error.
clangd can also benefit from this patch, as namespaces following the
malformed declaration are now preserved, e.g.
```
MACRO_FROM_MISSING_INCLUDE("X")
namespace my_namespace {
//...
}
```
Before this patch, my_namespace is not visible for auto-completion.
Differential Revision: https://reviews.llvm.org/D150258
Joseph Huber [Mon, 15 May 2023 21:22:27 +0000 (16:22 -0500)]
[libc] Cache ownership of the shared buffer in the port
This patch adds another variable to cache cases where we know that we
own the buffer. This allows us to skip the atomic load on the inbox
because we already know its state. This is legal immediately after
opening a port, or when sending immediately after a receive. This
caching nets a significant (~17%) speedup for the basic open, send,
receive combination.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150516
Joseph Huber [Mon, 24 Apr 2023 19:35:38 +0000 (14:35 -0500)]
[libc] Make the bump pointer explicitly return null on buffer overrun
We use a simple bump ptr in the `libc` tests. If we run out of data we
can currently return other static memory and have weird failure cases.
We should fail more explicitly here by returning a null pointer instead.
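A minimal sketch of such a bump allocator, assuming a fixed 64-byte buffer and no alignment handling. `ToyBumpAllocator` is a hypothetical class, not the one used in the `libc` tests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size bump allocator.
class ToyBumpAllocator {
public:
  static constexpr std::size_t kSize = 64;

  // Hands out the next `bytes` of the buffer, or nullptr if the request
  // would run past the end instead of returning unrelated memory.
  void *allocate(std::size_t bytes) {
    if (bytes > kSize - offset_)
      return nullptr; // explicit failure on overrun
    void *ptr = buffer_ + offset_;
    offset_ += bytes;
    return ptr;
  }

private:
  alignas(16) unsigned char buffer_[kSize];
  std::size_t offset_ = 0;
};
```

The comparison is written as `bytes > kSize - offset_` so that it cannot overflow, since `offset_` never exceeds `kSize`.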
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150529
Martin Braenne [Mon, 15 May 2023 09:23:44 +0000 (09:23 +0000)]
[clang][dataflow] Don't analyze templated declarations.
Attempting to analyze templated code doesn't have a good cost-benefit ratio. We
have so far done a best-effort attempt at this, but maintaining this support has
an ongoing high maintenance cost because the AST for templates can violate a lot
of the invariants that otherwise hold for the AST of concrete code. As just one
example, in concrete code the operand of a UnaryOperator '*' is always a prvalue
(https://godbolt.org/z/s3e5xxMd1), but in templates this isn't true
(https://godbolt.org/z/6W9xxGvoM).
Further rationale for not analyzing templates:
* The semantics of a template itself are weakly defined; semantics can depend
strongly on the concrete template arguments. Analyzing the template itself (as
opposed to an instantiation) therefore has limited value.
* Analyzing templates requires a lot of special-case code that isn't necessary
for concrete code because dependent types are hard to deal with and the AST
violates invariants that otherwise hold for concrete code (see above).
* There's precedent in that neither Clang Static Analyzer nor the flow-sensitive
warnings in Clang (such as uninitialized variables) support analyzing
templates.
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D150352
Florian Hahn [Mon, 15 May 2023 10:49:16 +0000 (11:49 +0100)]
[VPlan] Use VPRecipeWithIRFlags for VPReplicateRecipe, retire poison map
Update VPReplicateRecipe to use VPRecipeWithIRFlags for IR flag
handling. Retire separate MayGeneratePoisonRecipes map.
Depends on D149082.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150027
Jay Foad [Mon, 15 May 2023 10:37:56 +0000 (11:37 +0100)]
[RegScavenger] Simplify forward(MachineBasicBlock::iterator). NFC.
Ivan Chikish [Mon, 15 May 2023 10:25:32 +0000 (11:25 +0100)]
[X86] LowerRotate: prefer unpack-based algorithm
Split out from, and improving on, https://reviews.llvm.org/D146357.
When running tests for LowerShift, I discovered some poor codegen in rotate and funnel shift tests. This patch attempts to address some of them.
Using unpack for splitting and using double-bitwidth shifts may improve performance according to https://uica.uops.info tests:
* No cross-lane shuffles
* No dirtying double-width registers
* Massive improvement for AVX2 rotates in some cases (var_funnnel_v8i16, var_funnnel_v16i16), because unpack is currently only used for vXi8 vectors.
Differential Revision: https://reviews.llvm.org/D149071
Jacob Crawley [Fri, 12 May 2023 13:59:37 +0000 (13:59 +0000)]
[flang][hlfir] lower hlfir.any into fir runtime call
Depends on: D150272
Differential Revision: https://reviews.llvm.org/D150451
Jacob Crawley [Wed, 10 May 2023 14:23:48 +0000 (14:23 +0000)]
[flang] lower any intrinsic to hlfir.any operation
Carries out the lowering of the any intrinsic into HLFIR
Depends on: D149964
Differential Revision: https://reviews.llvm.org/D150272
Jacob Crawley [Fri, 5 May 2023 14:51:15 +0000 (14:51 +0000)]
[flang] add hlfir.any intrinsic
Adds a HLFIR operation for the ANY intrinsic according to the
design set out in flang/docs/HighLevel.md
Differential Revision: https://reviews.llvm.org/D149964
Peter Smith [Thu, 11 May 2023 11:00:19 +0000 (12:00 +0100)]
[LLD][ELF] Add missing program header parsing to OVERLAY
In D72756 the change to add INPUT_SECTION_FLAGS inadvertently
removed the line to parse the program header assignment information for
OutputSections within an OVERLAY.
This change adds back the missing line and adds a test for it.
Differential Revision: https://reviews.llvm.org/D150445
Tobias Hieta [Mon, 15 May 2023 08:58:20 +0000 (10:58 +0200)]
[docs] Add Python coding standard to documentation
As discussed on the forums:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style/
Reviewed By: jhenderson, JDevlieghere
Differential Revision: https://reviews.llvm.org/D143852
Francesco Petrogalli [Fri, 12 May 2023 15:45:07 +0000 (17:45 +0200)]
[TableGen][SubtargetEmitter] Add the StartAtCycles field in the WriteRes class.
Conditions that need to be met:
1. count(StartAtCycles) == count(ReservedCycles);
2. For each i: StartAtCycles[i] < ReservedCycles[i];
3. For each i: StartAtCycles[i] >= 0;
4. If left unspecified, the elements are set to 0.
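The four conditions can be expressed as a small check, assuming both lists are plain integer vectors and that every reserved cycle count is positive. `checkStartAtCycles` and its error strings are hypothetical and do not mirror the TableGen emitter's diagnostics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string if the lists are valid, otherwise a short
// description of the first violated condition.
// Assumption: every entry of reservedCycles is positive.
inline std::string checkStartAtCycles(const std::vector<int> &reservedCycles,
                                      std::vector<int> &startAtCycles) {
  // Condition 4: an unspecified list defaults to all zeros.
  if (startAtCycles.empty())
    startAtCycles.assign(reservedCycles.size(), 0);
  // Condition 1: both lists have the same number of elements.
  if (startAtCycles.size() != reservedCycles.size())
    return "size mismatch";
  for (std::size_t i = 0; i < startAtCycles.size(); ++i) {
    // Condition 3: start cycles are non-negative.
    if (startAtCycles[i] < 0)
      return "negative start";
    // Condition 2: each start cycle is strictly below the reserved cycle.
    if (startAtCycles[i] >= reservedCycles[i])
      return "start not before end";
  }
  return "";
}
```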
Differential Revision: https://reviews.llvm.org/D150310
Matthias Springer [Mon, 15 May 2023 07:24:01 +0000 (09:24 +0200)]
[mlir][transform] Use TrackingListener-aware iterator for getPayloadOps
Instead of returning an `ArrayRef<Operation *>`, return an iterator that skips ops that were erased/replaced while iterating over the payload ops.
This fixes an issue in conjunction with TrackingListener, where a tracked op was erased during the iteration. Elements may not be removed from an array while iterating over it; this invalidates the iterator.
When ops are erased/removed via `replacePayloadOp`, they are not immediately removed from the mappings data structure. Instead, they are set to `nullptr`. `nullptr`s are not enumerated by `getPayloadOps`. At the end of each transformation, `nullptr`s are removed from the mapping data structure.
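The tombstone scheme can be sketched with a plain vector of pointers. `ToyOp`, `forEachLivePayload`, `erasePayload` and `compactPayload` are hypothetical names; the real mapping and iterator types in the transform dialect differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyOp {
  int id;
};

// Visits live entries by index, skipping entries that were nulled out,
// including ones nulled out by `visit` during the iteration.
template <typename Fn>
void forEachLivePayload(std::vector<ToyOp *> &payload, Fn visit) {
  for (std::size_t i = 0; i < payload.size(); ++i)
    if (payload[i] != nullptr)
      visit(*payload[i]);
}

// "Erases" an op by replacing it with nullptr; the vector keeps its size,
// so an ongoing index-based iteration stays valid.
inline void erasePayload(std::vector<ToyOp *> &payload, ToyOp *op) {
  std::replace(payload.begin(), payload.end(), op,
               static_cast<ToyOp *>(nullptr));
}

// Run once the transformation is finished: drop all tombstones.
inline void compactPayload(std::vector<ToyOp *> &payload) {
  payload.erase(std::remove(payload.begin(), payload.end(),
                            static_cast<ToyOp *>(nullptr)),
                payload.end());
}
```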
Differential Revision: https://reviews.llvm.org/D149847
Guillaume Chatelet [Fri, 12 May 2023 09:10:06 +0000 (09:10 +0000)]
[libc] Add optimized memset for RISCV
This patch adds two versions of `memset` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
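One common way to avoid unaligned stores, shown here only as a generic sketch and not as the code in this patch, is to write single bytes until the pointer is word-aligned, then full aligned words, then the remaining tail bytes. `toyAlignedMemset` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

inline void *toyAlignedMemset(void *dst, int value, std::size_t count) {
  unsigned char *p = static_cast<unsigned char *>(dst);
  const unsigned char byte = static_cast<unsigned char>(value);

  // Head: byte stores until p is aligned to the word size.
  while (count > 0 &&
         reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uintptr_t) != 0) {
    *p++ = byte;
    --count;
  }

  // Body: replicate the byte across one word and store whole words.
  std::uintptr_t word = 0;
  for (std::size_t i = 0; i < sizeof(word); ++i)
    word = (word << 8) | byte;
  while (count >= sizeof(word)) {
    // p is aligned here; memcpy keeps the store free of aliasing issues.
    std::memcpy(p, &word, sizeof(word));
    p += sizeof(word);
    count -= sizeof(word);
  }

  // Tail: remaining bytes that do not fill a word.
  while (count > 0) {
    *p++ = byte;
    --count;
  }
  return dst;
}
```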
Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memset on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memset/0/0 506 ns 252 ns 2883584 bytes_per_cycle=0.238312/s bytes_per_second=340.908M/s items_per_second=3.96043M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 296 ns 189 ns 2900992 bytes_per_cycle=0.234589/s bytes_per_second=335.583M/s items_per_second=5.29382M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 2110 ns 1049 ns 678912 bytes_per_cycle=0.24687/s bytes_per_second=353.151M/s items_per_second=953.527k/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 397 ns 254 ns 3055616 bytes_per_cycle=0.238479/s bytes_per_second=341.147M/s items_per_second=3.93224M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 1119 ns 621 ns 1079296 bytes_per_cycle=0.244925/s bytes_per_second=350.368M/s items_per_second=1.61047M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 605 ns 349 ns 1644544 bytes_per_cycle=0.241364/s bytes_per_second=345.274M/s items_per_second=2.8614M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 472 ns 271 ns 2310144 bytes_per_cycle=0.238615/s bytes_per_second=341.341M/s items_per_second=3.68799M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 262 ns 143 ns 3956736 bytes_per_cycle=0.225812/s bytes_per_second=323.026M/s items_per_second=7.0087M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 454 ns 261 ns 2940928 bytes_per_cycle=0.238883/s bytes_per_second=341.725M/s items_per_second=3.82716M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 8768 ns 5998 ns 115712 bytes_per_cycle=0.249196/s bytes_per_second=356.478M/s items_per_second=166.724k/s __llvm_libc::memset,uniform 384 to 4096
```
After
```
BM_Memset/0/0 117 ns 69.5 ns 9761792 bytes_per_cycle=0.935152/s bytes_per_second=1.30639G/s items_per_second=14.3834M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 97.8 ns 58.5 ns 13002752 bytes_per_cycle=0.892814/s bytes_per_second=1.24725G/s items_per_second=17.0848M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 326 ns 163 ns 5156864 bytes_per_cycle=1.54408/s bytes_per_second=2.15706G/s items_per_second=6.1192M/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 132 ns 65.4 ns 11455488 bytes_per_cycle=0.876411/s bytes_per_second=1.22433G/s items_per_second=15.2803M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 222 ns 120 ns 6405120 bytes_per_cycle=1.44398/s bytes_per_second=2.01722G/s items_per_second=8.30758M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 119 ns 79.2 ns 8930304 bytes_per_cycle=1.13327/s bytes_per_second=1.58317G/s items_per_second=12.6189M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 123 ns 64.0 ns 11609088 bytes_per_cycle=1.008/s bytes_per_second=1.40817G/s items_per_second=15.6365M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 85.9 ns 52.1 ns 12423168 bytes_per_cycle=0.641164/s bytes_per_second=917.192M/s items_per_second=19.1937M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 114 ns 67.1 ns 10347520 bytes_per_cycle=0.911968/s bytes_per_second=1.274G/s items_per_second=14.9015M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 1326 ns 785 ns 907264 bytes_per_cycle=1.89716/s bytes_per_second=2.6503G/s items_per_second=1.27348M/s __llvm_libc::memset,uniform 384 to 4096
```
Again not as good as current glibc but it's a first step in the right direction.
```
BM_Memset/0/0 108 ns 53.6 ns 12894208 bytes_per_cycle=1.02858/s bytes_per_second=1.4369G/s items_per_second=18.668M/s glibc::memset,memset Google A
BM_Memset/1/0 84.6 ns 47.6 ns 14284800 bytes_per_cycle=1.00197/s bytes_per_second=1.39974G/s items_per_second=21.0256M/s glibc::memset,memset Google B
BM_Memset/2/0 160 ns 85.8 ns 8927232 bytes_per_cycle=3.30805/s bytes_per_second=4.62129G/s items_per_second=11.6596M/s glibc::memset,memset Google D
BM_Memset/3/0 78.9 ns 53.6 ns 13326336 bytes_per_cycle=1.14058/s bytes_per_second=1.59338G/s items_per_second=18.674M/s glibc::memset,memset Google L
BM_Memset/4/0 99.2 ns 60.8 ns 11460608 bytes_per_cycle=2.54751/s bytes_per_second=3.55884G/s items_per_second=16.4587M/s glibc::memset,memset Google M
BM_Memset/5/0 93.0 ns 56.1 ns 12219392 bytes_per_cycle=1.73379/s bytes_per_second=2.42207G/s items_per_second=17.8157M/s glibc::memset,memset Google Q
BM_Memset/6/0 89.4 ns 47.2 ns 14692352 bytes_per_cycle=1.34846/s bytes_per_second=1.88377G/s items_per_second=21.1795M/s glibc::memset,memset Google S
BM_Memset/7/0 84.0 ns 50.0 ns 14468096 bytes_per_cycle=0.911198/s bytes_per_second=1.27293G/s items_per_second=19.994M/s glibc::memset,memset Google U
BM_Memset/8/0 93.4 ns 52.8 ns 13063168 bytes_per_cycle=1.06642/s bytes_per_second=1.48977G/s items_per_second=18.9524M/s glibc::memset,memset Google W
BM_Memset/9/0 438 ns 241 ns 2853888 bytes_per_cycle=6.1185/s bytes_per_second=8.54744G/s items_per_second=4.15064M/s glibc::memset,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150433
Guillaume Chatelet [Fri, 12 May 2023 15:50:44 +0000 (15:50 +0000)]
[NFC] Refactor GlobalVariable Ctor
Reuse logic from other ctor and remove code duplication.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D150453
Christian Ulmann [Mon, 15 May 2023 07:04:59 +0000 (07:04 +0000)]
[IR] Drop const in DILocation::getMergedLocation
This commit removes constness from DILocation::getMergedLocation and
fixes all its users accordingly.
Having constness on the parameters forced the return type to be const
as well, which in turn forced the use of `const_cast` when the location
needed to be used in metadata nodes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D149942
pvanhout [Wed, 10 May 2023 12:59:18 +0000 (14:59 +0200)]
[AMDGPU] Improve PHI-breaking heuristics in CGP
D147786 made the transform more conservative by adding heuristics,
which was a good idea. However, the transform got a bit
too conservative at times.
This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them.
For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed.
This patch is an attempt at reaching a middle ground that makes
the pass a bit more permissive. It continues in the same spirit as
D147786 but makes the following changes:
- PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had very negative effect in one app I tested)
- `shufflevector` is now considered interesting, given that it satisfies a few trivial checks.
Reviewed By: arsenm, #amdgpu, jmmartinez
Differential Revision: https://reviews.llvm.org/D150266
Diana Picus [Wed, 10 May 2023 09:52:00 +0000 (11:52 +0200)]
[AMDGPU][MC] Don't accept attr > 32 for param_load
The docs say the interpolation attribute should be between 0..32 [1][2],
but we currently accept values all the way up to 63.
This patch makes the ASMParser error out for values > 32. It does not
touch codegen though because we're currently not checking anything at
all for codegen (llvm.amdgcn.lds.param.load will happily accept even 128
as an attr, although that won't fit in the encoding).
[1] https://llvm.org/docs/AMDGPU/gfx8_attr.html#amdgpu-synid-gfx8-attr
[2] https://llvm.org/docs/AMDGPU/gfx11_attr.html#amdgpu-synid-gfx11-attr
Differential Revision: https://reviews.llvm.org/D150261
Fangrui Song [Mon, 15 May 2023 06:09:31 +0000 (23:09 -0700)]
[Driver][test] Add -fintegrated-as after D150282
D150282 does not add support for derived trace file names with
-fno-integrated-as, e.g. `clang -c -fno-integrated-as a.c -o e/a.o`.
Add -fintegrated-as to fix AIX.
Jake Egan [Mon, 15 May 2023 05:39:46 +0000 (01:39 -0400)]
[AIX][tests] XFAIL -ftime-trace test for now
This test is failing due to D150282. XFAIL this test for now while it's being investigated to get the AIX bot green.
Craig Topper [Mon, 15 May 2023 05:35:47 +0000 (22:35 -0700)]
[RISCV] Add RISCVISD nodes for VWFMADD_VL.
Use it to replace isel patterns with a DAG combine of FP_EXTEND_VL+VFMADD_VL.
This makes it similar to how other widening operations are handled.
I plan to use this to make it easier to form tail undisturbed vfwmacc.
Craig Topper [Mon, 15 May 2023 05:35:39 +0000 (22:35 -0700)]
[RISCV] Add test cases for forming vfwmacc when widening from f16 to f64. NFC
Martin Braenne [Fri, 12 May 2023 11:59:21 +0000 (11:59 +0000)]
[clang][dataflow] Eliminate `SkipPast::ReferenceThenPointer`.
As a replacement, we provide the accessors `getImplicitObjectLocation()` and
`getBaseObjectLocation()`, which are higher-level constructs that cover the use
cases in which `SkipPast::ReferenceThenPointer` was typically used.
Unfortunately, it isn't possible to use these accessors in
UncheckedOptionalAccessModel.cpp; I've added a FIXME to the code explaining the
details. I initially attempted to resolve the issue as part of this patch, but
it turned out to be non-trivial to fix. Instead, I have therefore added a
lower-level replacement for `SkipPast::ReferenceThenPointer` that is used only
within this file.
The wider context of this change is that `SkipPast` will be going away entirely.
See also the RFC at https://discourse.llvm.org/t/70086.
Reviewed By: ymandel, gribozavr2
Differential Revision: https://reviews.llvm.org/D149838
Xi Ruoyao [Sun, 14 May 2023 02:42:45 +0000 (03:42 +0100)]
[cmake] Disable GCC lifetime DSE
LLVM data structures like llvm::User and llvm::MDNode rely on
the value of object storage persisting beyond the lifetime of the
object (#24952). This is not standard compliant and causes a runtime
crash if LLVM is built with GCC and LTO enabled (#57740). Until
these issues are fixed, we need to disable dead store eliminations
based on object lifetime.
The previous test issues are fixed by
626849c71e85d546a004cc91866beab610222194.
Bug: https://github.com/llvm/llvm-project/issues/24952
Bug: https://github.com/llvm/llvm-project/issues/57740
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
Reviewed By: MaskRay, thesamesam, nikic
Differential Revision: https://reviews.llvm.org/D150505
Jonas Devlieghere [Mon, 15 May 2023 03:20:03 +0000 (20:20 -0700)]
[lldb] Cleanup OptionValue header and implementation (NFC)
Group related functions together and remove inconsistencies between them
in the implementation.
Chuanqi Xu [Mon, 15 May 2023 03:05:47 +0000 (11:05 +0800)]
Revert "[Serialization] Don't try to complete the redeclaration chain in"
Close https://github.com/llvm/llvm-project/issues/62705
This reverts commit
cf47e9fe86aa65b74b0476a5ad4d036dd7463bfb. This
introduces a breaking change in
https://github.com/llvm/llvm-project/issues/62705. Revert this one to
fix it quickly.
Jonas Devlieghere [Mon, 15 May 2023 02:58:16 +0000 (19:58 -0700)]
[lldb] Complete OptionValue cleanup (NFC)
Make the `Get.*Value` and `Set.*Value` functions private and migrate the
last remaining call sites to the new overloaded/templated functions.
Manna, Soumi [Mon, 15 May 2023 03:07:19 +0000 (20:07 -0700)]
[NFC][CLANG] Fix Static Code Analysis Concerns
Reported by Static Analyzer Tool, Coverity:
Bad bit shift operation
The operation may have an undefined behavior or yield an unexpected result.
In <unnamed>::SVEEmitter::encodeFlag(unsigned long long, llvm::StringRef): A bit shift operation has a shift amount which is too large or has a negative value.
// Returns the SVETypeFlags for a given value and mask.
uint64_t encodeFlag(uint64_t V, StringRef MaskName) const {
auto It = FlagTypes.find(MaskName);
//Condition It != llvm::StringMap<unsigned long long, llvm::MallocAllocator>::const_iterator const(this->FlagTypes.end()), taking true branch.
if (It != FlagTypes.end()) {
uint64_t Mask = It->getValue();
//return_constant: Function call llvm::countr_zero(Mask) may return 64.
//assignment: Assigning: Shift = llvm::countr_zero(Mask). The value of Shift is now 64.
unsigned Shift = llvm::countr_zero(Mask);
//Bad bit shift operation (BAD_SHIFT)
//large_shift: In expression V << Shift, left shifting by more than 63 bits has undefined behavior. The shift amount, Shift, is 64.
return (V << Shift) & Mask;
}
llvm_unreachable("Unsupported flag");
}
Asserting Mask != 0 will not suffice to silence Coverity. While Coverity can specifically observe that countr_zero might return 64 (because TrailingZerosCounter<T, 8>::count() has a return 64 statement), it seems like Coverity cannot determine that the function can't return 65 or higher. What Coverity is reporting is that the shift might overflow,
so that is what should be guarded.
assert(Shift < 64 && "Mask value produced an invalid shift value");
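A standalone sketch of the guarded shift, assuming a hand-written trailing-zero count in place of `llvm::countr_zero` and a plain mask argument in place of the `FlagTypes` lookup. `toyCountrZero` and `toyEncodeFlag` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Counts trailing zero bits; returns 64 for a zero input, which is the
// case that makes an unguarded shift undefined.
inline unsigned toyCountrZero(std::uint64_t v) {
  if (v == 0)
    return 64;
  unsigned n = 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Places `value` into the field selected by `mask`.
inline std::uint64_t toyEncodeFlag(std::uint64_t value, std::uint64_t mask) {
  const unsigned shift = toyCountrZero(mask);
  // Guard the shift amount itself rather than the mask.
  assert(shift < 64 && "Mask value produced an invalid shift value");
  return (value << shift) & mask;
}
```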
Reviewed By: tahonermann, sdesmalen, erichkeane
Differential Revision: https://reviews.llvm.org/D150140
Manna, Soumi [Mon, 15 May 2023 02:49:22 +0000 (19:49 -0700)]
[NFC][Clang] Fix Coverity issues of copy without assign
This patch adds missing copy/move assignment operator to the class which has user-defined copy/move constructor.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D149718
David Green [Sun, 14 May 2023 22:28:11 +0000 (23:28 +0100)]
[AArch64] Update FP16 vector cmp costs
Without FP16, a v4f16 comparison will be converted to v4f32 and back.
v8f16 comparisons currently get scalarized. Update the costs of v4f16 to match.
Fangrui Song [Sun, 14 May 2023 21:12:16 +0000 (14:12 -0700)]
[clang-tidy][test] Add trailing -- to suppress compile_commands.json read
This fixes some build bots if we reland D150505: specifically, when GCC is
used to build LLVM, `-fno-lifetime-dse` ends up in compile_commands.json,
and clang-tidy picks up an option that Clang does not recognize.
Florian Hahn [Sun, 14 May 2023 21:07:35 +0000 (22:07 +0100)]
[Matrix] Remove redundant transpose with dot product lowering.
Extend dot-product handling to skip transposes of the first operand. As
this is a vector, the conversion between column and row vector via the
transpose isn't needed.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D148428
LLVM GN Syncbot [Sun, 14 May 2023 19:26:19 +0000 (19:26 +0000)]
[gn build] Port b7932803dede
Douglas Yung [Sun, 14 May 2023 19:22:11 +0000 (12:22 -0700)]
Revert "[LV] Add test case for #51677."
This reverts commit 77df976a1219c0c6fd102358c15e71747aab4443.
Test is failing on many build bots including:
https://lab.llvm.org/buildbot/#/builders/247/builds/4488
https://lab.llvm.org/buildbot/#/builders/139/builds/40608
https://lab.llvm.org/buildbot/#/builders/216/builds/21169
https://lab.llvm.org/buildbot/#/builders/65/builds/9673
https://lab.llvm.org/buildbot/#/builders/119/builds/13302
https://lab.llvm.org/buildbot/#/builders/121/builds/30459
https://lab.llvm.org/buildbot/#/builders/230/builds/12967
https://lab.llvm.org/buildbot/#/builders/57/builds/26781
https://lab.llvm.org/buildbot/#/builders/214/builds/7458
https://lab.llvm.org/buildbot/#/builders/93/builds/14892
https://lab.llvm.org/buildbot/#/builders/231/builds/11764
Fangrui Song [Sun, 14 May 2023 18:59:02 +0000 (11:59 -0700)]
[MC] Remove redundant classof definitions for MCTargetDesc's derived classes
Fangrui Song [Sun, 14 May 2023 18:37:36 +0000 (11:37 -0700)]
[MC][X86] Fix != result for two register operands
Fixes: 05b589101e7dadce267881e5b0832882f95a9908 (D47545)
Mark de Wever [Thu, 20 Apr 2023 19:03:40 +0000 (21:03 +0200)]
[libc++] Moves unwrap_reference to type_traits.
This was discovered while working on modules.
Reviewed By: #libc, philnik
Differential Revision: https://reviews.llvm.org/D149351
Sergei Barannikov [Sun, 14 May 2023 17:59:13 +0000 (20:59 +0300)]
[clang] Convert a few tests to opaque pointers
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D150520
Ricky Zhou [Sun, 14 May 2023 15:53:04 +0000 (16:53 +0100)]
[LV] Add test case for #51677.
Luo, Yuanke [Sun, 14 May 2023 08:15:32 +0000 (16:15 +0800)]
[X86] Fix the bug of pr62625
We should not call tryOptimizeLEAtoMOV() in eliminateFrameIndex() when
the base register is a virtual register, because tryOptimizeLEAtoMOV
would assume the base register be physical register. Although we can
also optimize LEA to MOV with virtual register, I'd like to leave the
optimization in another patch.
Differential Revision: https://reviews.llvm.org/D150521
Phoebe Wang [Sun, 14 May 2023 12:53:30 +0000 (20:53 +0800)]
[Coverity] Fix unchecked return value, NFC
Serguei Katkov [Wed, 10 May 2023 06:19:38 +0000 (13:19 +0700)]
[X86] Improve handling on zero constant for fminimum/fmaximum lowering
If we know that zero constant operand is already in the right place we do not need
to re-order anything.
Reviewed By: e-kud
Differential Revision: https://reviews.llvm.org/D150249
Uday Bondhugula [Sun, 14 May 2023 12:01:51 +0000 (17:31 +0530)]
[MLIR] NFC. Add missing const on affine analysis utils methods
NFC. Add missing const on affine analysis utils ComputationSliceState
methods.
Differential Revision: https://reviews.llvm.org/D150523
Uday Bondhugula [Sun, 14 May 2023 11:52:14 +0000 (17:22 +0530)]
[MLIR] NFC. Make affine analysis utils method const correct
Make isSliceValid const correct. NFC.
Phoebe Wang [Sun, 14 May 2023 09:14:49 +0000 (17:14 +0800)]
[Coverity] Fix unchecked return value, NFC
Vitaly Buka [Sun, 14 May 2023 08:22:49 +0000 (01:22 -0700)]
[test][sanitizer] Disable create_thread_loop on Android
Joshua Cao [Sun, 7 May 2023 05:13:16 +0000 (22:13 -0700)]
[IntervalTree] Initialize find_iterator::Point
There was initially a msan report for use-of-uninitialized value due to
a bug in https://reviews.llvm.org/D138526. find_iterator::Point is
uninitialized for the default constructor of find_iterator, which is
used by IntervalTree::end. This change is not required, but it's good
practice to make sure all class members are initialized.
Differential Revision: https://reviews.llvm.org/D149698
Sam James [Sun, 14 May 2023 06:37:43 +0000 (07:37 +0100)]
Revert "[cmake] Disable GCC lifetime DSE"
This reverts commit ce990b542617e5b52f69707b103a2424bec5e53b.
This breaks some build bots - specifically when using GCC to build LLVM and
then -fno-lifetime-dse ends up passed to Clang in some tests like at
https://lab.llvm.org/buildbot/#/builders/139/builds/40594.
Bug: https://github.com/llvm/llvm-project/issues/24952
Bug: https://github.com/llvm/llvm-project/issues/57740
Differential Revision: https://reviews.llvm.org/D150505
Craig Topper [Sun, 14 May 2023 06:33:00 +0000 (23:33 -0700)]
[LegalizeVectorOps][AArch64][RISCV][X86] Use OpVT for ISD::SETCC in LegalizeVectorOps.
Previously, LegalizeVectorOps used the result VT while LegalizeDAG
used the operand VT. This patch makes them both use the operand VT.
This also makes it consistent with how the default cost model works.
I've hacked the AArch64 cost model to maintain old behavior for some
f16 vectors.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149572
Ian Anderson [Sat, 13 May 2023 01:36:57 +0000 (18:36 -0700)]
[libc++][PSTL] Make the PSTL submodules only have one header
Module map generation for the private detail headers is easier done if each private header is by itself in a submodule. Move the __algorithm/pstl_backends into their own submodules.
Reviewed By: philnik, #libc
Differential Revision: https://reviews.llvm.org/D150503
Craig Topper [Sun, 14 May 2023 06:01:32 +0000 (23:01 -0700)]
[M68k] Update divide-by-constant.ll after D150333.
Xi Ruoyao [Sun, 14 May 2023 02:42:45 +0000 (03:42 +0100)]
[cmake] Disable GCC lifetime DSE
LLVM data structures like llvm::User and llvm::MDNode rely on
the value of object storage persisting beyond the lifetime of the
object (#24952). This is not standard compliant and causes a runtime
crash if LLVM is built with GCC and LTO enabled (#57740). Until
these issues are fixed, we need to disable dead store eliminations
based on object lifetime.
Bug: https://github.com/llvm/llvm-project/issues/24952
Bug: https://github.com/llvm/llvm-project/issues/57740
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
(This was originally committed as 94f7c961c78d8fdbc05898cfbbf88094de45c1ad but
I reverted it in b974991f4c4457a2104b648d9797a0ed438ecc9 to fix authorship.)
Reviewed By: MaskRay, thesamesam, nikic
Differential Revision: https://reviews.llvm.org/D150505
Signed-off-by: Sam James <sam@gentoo.org>
Sam James [Sun, 14 May 2023 02:42:27 +0000 (03:42 +0100)]
Revert "[cmake] Disable GCC lifetime DSE" (to fix authorship)
This reverts commit 94f7c961c78d8fdbc05898cfbbf88094de45c1ad.
Differential Revision: https://reviews.llvm.org/D150505
Fangrui Song [Sun, 14 May 2023 01:47:29 +0000 (18:47 -0700)]
MCSymbol: Split FragmentAndHasName to Fragment and HasName
The bit fields have plenty of spare bits. Just reserve one for HasName so that we
can access Fragment without bitwise operations. Fragment is commonly accessed.
This change makes my x86-64 release build 5KiB smaller.
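The difference between the two layouts can be sketched as follows. This is an illustration only: `Fragment`, `PackedSymbol`, `SplitSymbol`, and the `Kind` field are hypothetical names, not the actual MCSymbol members.

```cpp
#include <cassert>
#include <cstdint>

struct Fragment {
  int Id; // alignment of at least 2 keeps the low pointer bit free
};

// Before (sketch): the flag shares storage with the pointer, so every
// Fragment access has to mask out the low bit.
class PackedSymbol {
  uintptr_t FragmentAndHasName = 0;

public:
  void setFragment(Fragment *F) {
    uintptr_t Bits = reinterpret_cast<uintptr_t>(F);
    assert((Bits & 1) == 0 && "pointer must leave the low bit free");
    FragmentAndHasName = Bits | (FragmentAndHasName & 1);
  }
  Fragment *getFragment() const {
    return reinterpret_cast<Fragment *>(FragmentAndHasName & ~uintptr_t(1));
  }
  void setHasName(bool Value) {
    FragmentAndHasName = (FragmentAndHasName & ~uintptr_t(1)) | uintptr_t(Value);
  }
  bool hasName() const { return (FragmentAndHasName & 1) != 0; }
};

// After (sketch): the flag takes one spare bit of an existing bit field,
// and the pointer is read and written directly.
class SplitSymbol {
  Fragment *Frag = nullptr;
  unsigned Kind : 3;    // hypothetical pre-existing bit field
  unsigned HasName : 1; // one spare bit reserved for the flag

public:
  SplitSymbol() : Kind(0), HasName(0) {}
  void setFragment(Fragment *F) { Frag = F; }
  Fragment *getFragment() const { return Frag; }
  void setHasName(bool Value) { HasName = Value; }
  bool hasName() const { return HasName != 0; }
};
```

Both classes expose the same interface; only the split version reads the pointer without bitwise operations.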
Nikolas Klauser [Fri, 12 May 2023 03:24:30 +0000 (20:24 -0700)]
[libc++][NFC] Use _LIBCPP_STD_VER instead of __cpp_lib_atomic_is_always_lock_free
Reviewed By: #libc, ldionne, Mordante
Spies: Mordante, libcxx-commits
Differential Revision: https://reviews.llvm.org/D150421
Thurston Dang [Fri, 12 May 2023 23:27:53 +0000 (23:27 +0000)]
ASan: fix potential use-after-free in backtrace interceptor
Various ASan interceptors may corrupt memory if passed a
pointer to freed memory (https://github.com/google/sanitizers/issues/321).
This patch fixes the issue for the backtrace interceptor,
by calling REAL(backtrace) with a known-good scratch buffer,
and performing an addressability check on the user-provided
buffer prior to writing to it.
Differential Revision: https://reviews.llvm.org/D150496
Aiden Grossman [Sat, 13 May 2023 22:43:39 +0000 (15:43 -0700)]
[Docs] Minor Fixups in Advanced Builds Documentation
This patch changes two instances of an ampersand to a written-out "and"
for more consistency with the rest of the file and for brevity. In addition,
the last `cmake --build` reference is removed, again for consistency
with the rest of the file which shows the ninja invocations. This cmake
invocation also passed in the `--parallel` flag which doesn't make sense
with ninja using all threads by default.
This was changed in the previous patch to touch this line
(https://reviews.llvm.org/D88990), but if we want to change this, it
should be done across the entire file.
Noah Goldstein [Sat, 13 May 2023 17:58:55 +0000 (12:58 -0500)]
[SelectionDAG] Use `computeKnownBits` if `Op` is not recognized by `isKnownNeverZero`
The current logic is pretty limited unless the `Op` is a
constant. This at least covers more obvious cases.
Reviewed By: craig.topper, foad
Differential Revision: https://reviews.llvm.org/D149196
Noah Goldstein [Tue, 25 Apr 2023 17:53:33 +0000 (12:53 -0500)]
[SelectionDAG] Limit max recursion in `isKnownNeverZero` and `isKnownToBeAPowerOfTwo`
Both of these functions recursively call themselves so it makes sense
to limit that upper bound.
Differential Revision: https://reviews.llvm.org/D149195
Noah Goldstein [Sat, 13 May 2023 17:58:16 +0000 (12:58 -0500)]
[InstCombine] Add simplifications for div/rem with `i1` operands; PR62607
This is generally handled already in early CSE.
If a specialized pipeline is used, however, it's possible for an `i1`
operand with a known-zero denominator to slip through. Generally the
known-zero denominator is caught and poison is returned, but if it is
indirect enough (known zero through a phi node) we can miss this case
in `InstructionSimplify` and then miss handling `i1`. This is because
`i1` is currently handled with the following check:
`if(Known.countMinLeadingZeros() == Known.getBitWidth() - 1)`
which only works on the assumption we don't know the denominator to be
zero. If we know the denominator to be zero, this check fails:
https://github.com/llvm/llvm-project/issues/62607
This patch simply adds an explicit `if(Known.isZero) return poison;`
which fixes the issue.
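Why the existing check fails for a known-zero `i1` denominator can be sketched with a simplified model. `MiniKnownBits` below is a hypothetical stand-in for `llvm::KnownBits` that tracks only the known-zero and known-one masks; it is not the real class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for llvm::KnownBits.
struct MiniKnownBits {
  unsigned BitWidth; // 1..64
  uint64_t Zero;     // bits known to be 0
  uint64_t One;      // bits known to be 1

  // Number of leading (most significant) bits known to be zero.
  unsigned countMinLeadingZeros() const {
    assert(BitWidth >= 1 && BitWidth <= 64);
    unsigned Count = 0;
    for (unsigned Bit = BitWidth; Bit-- > 0;) {
      if (((Zero >> Bit) & 1) == 0)
        break;
      ++Count;
    }
    return Count;
  }

  // True when every bit is known to be zero.
  bool isZero() const {
    uint64_t All = BitWidth == 64 ? ~0ULL : ((1ULL << BitWidth) - 1);
    return (Zero & All) == All;
  }
};

// The check quoted in the commit message.
bool oldI1Check(const MiniKnownBits &Known) {
  return Known.countMinLeadingZeros() == Known.BitWidth - 1;
}
```

For an `i1` with nothing known, `countMinLeadingZeros()` is 0 and the check passes. Once the single bit is known to be zero, it returns 1, the check fails, and only an explicit `isZero()` test catches the case.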
Alive2 Link for tests:
https://alive2.llvm.org/ce/z/VTw54n
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D150142
Noah Goldstein [Sat, 13 May 2023 17:58:03 +0000 (12:58 -0500)]
[X86] Invert transforming `(x * (Pow2_Ceil(C1) - (1 << C0))) & C1` -> `(-x << C0) & C1`
We can detect the case under the following circumstances:
Take `(Pow2_Ceil(C1) - (1 << C0))` as `C2`.
1) `C2` is NOT a power of 2.
2) `C2 + LeastSignificantBit(C2)` is a nonzero power of 2.
3) `C2 u>= C1`
The motivation is the middle end transforms:
`(-x << C0) & C1`
to
`(x * (Pow2_Ceil(C1) - (1 << C0))) & C1`
As it saves IR instructions. On X86 the two instructions, `sub` and
`shl`, are better than the `mul` so we want to undo the transform.
This comes up when shifting a bit-mask by a byte-misalignment i.e:
`y << ((-(uintptr)x * 8) & 63)`
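The equivalence of the two forms can be checked by brute force for sample constants. The sketch below is illustrative only: the helper names are hypothetical, `Pow2_Ceil` is assumed to mean the smallest power of two greater than or equal to its argument, and `C0 = 3`, `C1 = 56` are arbitrary example values that satisfy the three conditions listed above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two >= Value (assumed meaning of
// Pow2_Ceil). Value must be nonzero and at most 2^31.
uint32_t pow2Ceil(uint32_t Value) {
  assert(Value != 0 && Value <= (1u << 31));
  uint32_t Result = 1;
  while (Result < Value)
    Result <<= 1;
  return Result;
}

bool isPowerOfTwo(uint32_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}

// The three conditions from the commit message, applied to C2 and C1.
bool matchesConditions(uint32_t C2, uint32_t C1) {
  uint32_t LowBit = C2 & (0u - C2); // least significant set bit of C2
  return !isPowerOfTwo(C2) && isPowerOfTwo(C2 + LowBit) && C2 >= C1;
}

// Compare the multiply form and the negate-and-shift form for 16-bit inputs.
bool formsAgree(uint32_t C0, uint32_t C1) {
  assert(C0 < 32);
  uint32_t C2 = pow2Ceil(C1) - (1u << C0);
  for (uint32_t X = 0; X < (1u << 16); ++X) {
    uint32_t MulForm = (X * C2) & C1;
    uint32_t ShlForm = ((0u - X) << C0) & C1;
    if (MulForm != ShlForm)
      return false;
  }
  return true;
}
```

With `C0 = 3` and `C1 = 56`, `C2` is `64 - 8 = 56`, so `x * C2` equals `64 * x - 8 * x`; the `64 * x` term has no bits inside the mask, which leaves the same masked bits as `-x << 3`.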
Alive2 Proofs (including all cases with undefs in the vector):
https://alive2.llvm.org/ce/z/f-65b6
Reviewed By: RKSimon, pengfei
Differential Revision: https://reviews.llvm.org/D150294
Noah Goldstein [Wed, 10 May 2023 19:49:23 +0000 (14:49 -0500)]
[X86] Add tests for inverting `(x * (Pow2_Ceil(C1) - (1 << C0))) & C1` -> `(-x << C0) & C1`; NFC
Differential Revision: https://reviews.llvm.org/D150293
Sam James [Sat, 13 May 2023 19:34:05 +0000 (20:34 +0100)]
[cmake] Disable GCC lifetime DSE
LLVM data structures like llvm::User and llvm::MDNode rely on
the value of object storage persisting beyond the lifetime of the
object (#24952). This is not standard compliant and causes a runtime
crash if LLVM is built with GCC and LTO enabled (#57740). Until
these issues are fixed, we need to disable dead store eliminations
based on object lifetime.
Bug: https://github.com/llvm/llvm-project/issues/24952
Bug: https://github.com/llvm/llvm-project/issues/57740
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
Reviewed By: MaskRay, thesamesam, nikic
Differential Revision: https://reviews.llvm.org/D150505
Amir Ayupov [Sat, 13 May 2023 17:34:50 +0000 (10:34 -0700)]
[Clang][CMake] Use perf-training for Clang-BOLT
Leverage perf-training flow for BOLT profile collection, enabling reproducible
BOLT optimization. Remove the use of bootstrapped build for profile collection.
Test Plan:
- Regular (single-stage) build
```
$ cmake ... -C .../clang/cmake/caches/BOLT.cmake
$ ninja clang-bolt
...
[21/24] Instrumenting clang binary with BOLT
[21/24] Generating BOLT profile for Clang
[23/24] Merging BOLT fdata
Profile from 2 files merged.
[24/24] Optimizing Clang with BOLT
...
1291202496 : executed instructions (-1.1%)
27005133 : taken branches (-71.5%)
...
```
- Two stage build (ThinLTO+InstPGO)
```
$ cmake ... -C .../clang/cmake/caches/BOLT.cmake -C .../clang/cmake/caches/BOLT-PGO.cmake
$ ninja clang-bolt
$ ninja stage2-clang-bolt
...
[2756/2759] Instrumenting clang binary with BOLT
[2756/2759] Generating BOLT profile for Clang
[2758/2759] Merging BOLT fdata
[2759/2759] Optimizing Clang with BOLT
...
BOLT-INFO: 7092 out of 184104 functions in the binary (3.9%) have non-empty execution profile
756531927 : executed instructions (-0.5%)
15399400 : taken branches (-40.3%)
...
```
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D143553
Nico Weber [Sat, 13 May 2023 15:16:23 +0000 (17:16 +0200)]
[gn] port 88c1242ed7e1 (begone, LLVMExegesisARMTests)
Florian Hahn [Sat, 13 May 2023 11:28:10 +0000 (12:28 +0100)]
[LV] Move selecting vectorization factor logic to LVP (NFC).
Split off from D143938. This moves the planning logic to select the
vectorization factor to LoopVectorizationPlanner as a step towards only
computing costs for individual VFs in LoopVectorizationCostModel and do
planning in LVP.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150197
Florian Hahn [Sat, 13 May 2023 11:27:53 +0000 (12:27 +0100)]
[VPlan] Change LoopVectorizationPlanner::TTI to be const reference (NFC)
Uday Bondhugula [Sat, 13 May 2023 10:06:12 +0000 (15:36 +0530)]
[MLIR] NFC. Pass affine copy options by const ref
NFC. Pass affine copy options by const ref.
Differential Revision: https://reviews.llvm.org/D150507
Mark de Wever [Sat, 13 May 2023 09:42:25 +0000 (11:42 +0200)]
Reland "[CMake] Bumps minimum version to 3.20.0."
The owner of the last two failing buildbots updated CMake.
This reverts commit e8e8707b4aa6e4cc04c0cffb2de01d2de71165fc.
Job Noorman [Sat, 13 May 2023 09:36:46 +0000 (11:36 +0200)]
[llvm-jitlink] Pass object features when creating MCSubtargetInfo
The reason for this patch is to allow the MCDisassembler used in tests
to disassemble instructions that are only available when a specific
feature is enabled.
For example, on RISC-V it's currently not possible to use
decode_operand() on a compressed instruction. This patch fixes this.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D149523
Mark de Wever [Sat, 6 May 2023 15:04:26 +0000 (17:04 +0200)]
[NFC][libc++][format] Tests formatter requirements.
As done in D149543, this validates the other formatter specializations.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D150041
Florian Hahn [Sat, 13 May 2023 09:17:09 +0000 (10:17 +0100)]
[LV] Move getVScaleForTuning out of LoopVectorizationCostModel (NFC).
Split off refactoring from D150197 to reduce diff.
Aiden Grossman [Sat, 13 May 2023 09:05:22 +0000 (09:05 +0000)]
[Docs][llvm-exegesis] Specify supported platforms and architectures
Currently, there is no documentation on what platforms and architectures
llvm-exegesis is supported on. This patch adds in user-facing
documentation in the CommandGuide about what architectures are supported
as well as developer-facing documentation detailing the technical
reasons why certain platforms are supported and others aren't.
This is a follow-up after discussion in
https://discourse.llvm.org/t/clarification-on-platform-support-for-llvm-exegesis/70206.
Reviewed By: kpdev42
Differential Revision: https://reviews.llvm.org/D149378
Aiden Grossman [Sat, 13 May 2023 08:56:42 +0000 (08:56 +0000)]
[llvm-exegesis] Remove Assembler Tests
The Assembler tests have been disabled for years in tree and at this
point don't test anything other than common MC infrastructure that is
already tested in other parts of the tree. This patch removes them due
to the mentioned reasons.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D149819
Aiden Grossman [Sat, 13 May 2023 08:49:52 +0000 (08:49 +0000)]
[Clang][Docs] Fix man page build
This patch fixes the man page build. It currently doesn't work as
SOURCE_DIR isn't set correctly (just undefined) within the
add_sphinx_target function. This patch also moves around the creation of
targets for autogenerated rst files so that both the man page and html
build can depend upon them as before only the html build depended on
them.
Fixes #62540
Reviewed By: tstellar
Differential Revision: https://reviews.llvm.org/D149809
Florian Hahn [Sat, 13 May 2023 08:40:16 +0000 (09:40 +0100)]
[Matrix] Add shape verification.
At the moment, lower-matrix-intrinsics accepts mis-matches between
shapes for operations. See shape-verification.ll for an example where
@llvm.matrix.column.major.load specifies 6x1 and then the use
(@llvm.matrix.multiply) specifies the operand to have 1x6.
This patch adds verification for shapes to check if shapes match.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D147438