Florian Hahn [Tue, 6 Dec 2022 17:11:54 +0000 (17:11 +0000)]
[ConstraintElim] Add additional GEP tests with signed predicates.
Arthur Eubanks [Mon, 5 Dec 2022 21:15:02 +0000 (13:15 -0800)]
[llvm-c][test] Remove typed pointer support from llvm-c-test echo
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139364
Peiming Liu [Tue, 6 Dec 2022 01:25:38 +0000 (01:25 +0000)]
Revert "Revert "[mlir][sparse] Refactoring: abstract sparse tensor memory scheme into a SparseTensorDescriptor class.""
This reverts commit 10033a179f0c73f28f051ac70b058a0c61882e3a. It also fixes Windows warnings and GCC errors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D139384
Kazu Hirata [Tue, 6 Dec 2022 17:10:25 +0000 (09:10 -0800)]
[ADT, Support] Move operator<< to raw_ostream.h (NFC)
Without this patch, operator<< for Optional<T> and std::optional<T>
are in Optional.h. This means that a C++ source file must include
Optional.h even if it just needs to stream std::optional<T> and has
nothing to do with Optional<T>, which is counter-intuitive.
This patch moves the operator<< to raw_ostream.h.
As a bonus, we get to resolve a circular dependency. Optional.h no
longer needs to forward-declare raw_ostream. That is, raw_ostream.h
depends on Optional.h, not vice versa.
As a preparation for this patch, I've checked in
77609717410372e8c43aca49a268511378f58297 to forward-declare
raw_ostream in those header files that were relying on the forward
declaration of raw_ostream in Optional.h.
Differential Revision: https://reviews.llvm.org/D139290
Jay Foad [Tue, 6 Dec 2022 16:34:10 +0000 (16:34 +0000)]
[AMDGPU] Add MC tests for s_endpgm's optional immediate operand
Differential Revision: https://reviews.llvm.org/D139438
Nico Weber [Tue, 6 Dec 2022 17:01:36 +0000 (12:01 -0500)]
Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
bixia1 [Mon, 5 Dec 2022 23:57:30 +0000 (15:57 -0800)]
[mlir][crunner] Add support for random number generation.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D139374
Shilei Tian [Tue, 6 Dec 2022 16:36:07 +0000 (11:36 -0500)]
[OpenMP] Use `add_llvm_library` to build the target `PluginInterface` in `plugins-nextgen`
This patch uses `add_llvm_library` to build the target `PluginInterface` since it can handle LLVM dependencies much better. One temporary drawback is that the LLVM CMake macro currently doesn't support object libraries very well (there was an attempt a couple of years ago, but it was later reverted: https://github.com/llvm/llvm-project/commit/29e57229497711a3a294f437b59afa6ddc36a3d8). After the switch, `CXX_VISIBILITY_PRESET` cannot be set correctly, which can cause a runtime error in which a function call from one plugin goes to another. As a consequence, `PluginInterface` is built as a static library for now. I have asked about this in the CMake community (https://discourse.cmake.org/t/set-target-properties-doesnt-work-properly/7016). Once that issue is solved, I'll switch it back to an object library. Using a static library is not necessarily too bad, especially since `BUILDTREE_ONLY` is already set so that `PluginInterface.a` will not be installed.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D139371
Sanjay Patel [Tue, 6 Dec 2022 16:17:52 +0000 (11:17 -0500)]
[SDAG] try to convert bit set/clear to signbit test when trunc is free
(X & Pow2MaskC) == 0 --> (trunc X) >= 0
(X & Pow2MaskC) != 0 --> (trunc X) < 0
This was noted as a regression in the post-commit feedback for D112634
(where we canonicalized IR differently).
For x86, this saves a few instruction bytes. AArch64 seems neutral.
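As a sanity check on the two folds above, here is a minimal Python sketch (not LLVM code) that brute-forces the equivalence. It assumes Pow2MaskC is exactly the sign bit of the narrower type that X is truncated to; the helper names are made up for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def bit_is_clear(x, trunc_bits):
    """(X & Pow2MaskC) == 0, with the mask at the sign bit of the narrow type."""
    return (x & (1 << (trunc_bits - 1))) == 0


def trunc_is_non_negative(x, trunc_bits):
    """(trunc X) >= 0, where trunc keeps only the low trunc_bits bits."""
    return to_signed(x, trunc_bits) >= 0


# Exhaustively compare both forms for every 16-bit X truncated to 8 bits.
assert all(
    bit_is_clear(x, 8) == trunc_is_non_negative(x, 8) for x in range(1 << 16)
)
```

The `!= 0` form follows by negating both sides.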
Differential Revision: https://reviews.llvm.org/D139363
Sanjay Patel [Mon, 5 Dec 2022 19:15:58 +0000 (14:15 -0500)]
[AArch64][RISCV][x86] add tests for masked val equality with 0; NFC
Mark de Wever [Mon, 5 Dec 2022 17:23:51 +0000 (18:23 +0100)]
[libc++][CI] Upgrades clang-format version used.
The version used is now determined by Buildkite instead of using the
hard-coded version in the Docker image.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D139341
Jon Chesterfield [Tue, 6 Dec 2022 16:10:42 +0000 (16:10 +0000)]
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
Roman Lebedev [Tue, 6 Dec 2022 16:26:31 +0000 (19:26 +0300)]
[exegesis] ParallelSnippetGenerator: SingleStaticRegPerOperand if 2+ use regs
For instrs with tied operands, that strategy will not produce anything
different from `SingleStaticReg` unless there are at least two registers.
Dmitry Makogon [Tue, 6 Dec 2022 16:16:48 +0000 (23:16 +0700)]
[Test] Add test exposing crash in SimplifyCFG when hoisting llvm.deoptimize
Mircea Trofin [Mon, 5 Dec 2022 21:41:27 +0000 (13:41 -0800)]
[mlgo] Dependency-free training mode logger
This is the next step in dropping the dependency on protobuf.
The simple logger produces an output consisting of lines of json
strings. Tensor values - which should constitute the bulk of the data -
are serialized as raw byte buffers. This allows for light-weight reading
of the values.
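To make the idea concrete, here is a small Python sketch of a "JSON line plus raw bytes" record. The field names (`name`, `type`, `count`), the little-endian float encoding, and the trailing newline are assumptions for illustration only; the actual layout is whatever the patch defines.

```python
import io
import json
import struct


def write_record(stream, name, values):
    """Write one JSON header line followed by the raw float32 payload."""
    header = {"name": name, "type": "float", "count": len(values)}
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(struct.pack(f"<{len(values)}f", *values))
    stream.write(b"\n")


def read_record(stream):
    """Read a record back without parsing the tensor values as text."""
    header = json.loads(stream.readline())
    payload = stream.read(4 * header["count"])  # 4 bytes per float32
    stream.read(1)  # consume the trailing newline
    return header["name"], list(struct.unpack(f"<{header['count']}f", payload))


buffer = io.BytesIO()
write_record(buffer, "reward", [1.0, 2.5, -3.0])
buffer.seek(0)
decoded = read_record(buffer)
```

The reader only needs a JSON parser for the small header; the bulk of the data is copied out as bytes.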
The next step is to switch the training logic to the new logging format,
following which the protobuf-based logger will be dropped, together with
the training dependency on protobuf.
Subsequent changes will also stop buffering and stream instead; the
buffering model is just a convenient point-in-time choice.
Differential Revision: https://reviews.llvm.org/D139370
Nikita Popov [Tue, 6 Dec 2022 15:56:19 +0000 (16:56 +0100)]
[BasicAA] Guard against empty successors list (PR59360)
Succs can be empty here if a phi predecessor is unreachable.
Fixes https://github.com/llvm/llvm-project/issues/59360
Sander de Smalen [Tue, 6 Dec 2022 12:45:00 +0000 (12:45 +0000)]
[AArch64][SVE2p1] Make use of REVD instruction.
Reversing double-words within a quad-word is possible using the REVD instruction
when SVE2p1 is enabled.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D139119
Nikita Popov [Tue, 6 Dec 2022 15:35:24 +0000 (16:35 +0100)]
[ConstantRange] Fix nsw nowrap region for 1 bit integers (PR59301)
The special case for V=1 was incorrect for one bit types, where
1 is also -1. Remove it, and use getNonEmpty() to handle the full
range case instead.
Adjust the exhaustive nowrap tests to test both 5 bit and 1 bit
types.
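The one bit corner case can be reproduced with a brute-force model. This Python sketch does not use the ConstantRange API; it simply enumerates, for a given bit width, which signed X satisfy "X + V does not overflow" and shows that the bit pattern 1 means -1 in a one bit type. Function names are hypothetical.

```python
def to_signed(pattern, bits):
    """Interpret a bit pattern as a two's-complement value of the given width."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def add_nsw_region(v_pattern, bits):
    """All signed X of this width for which X + V stays representable."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    v = to_signed(v_pattern, bits)
    return [x for x in range(lo, hi + 1) if lo <= x + v <= hi]


# In a 5 bit type, V=1 really is +1: only the maximum value is excluded.
five_bit = add_nsw_region(1, 5)
# In a 1 bit type, the pattern 1 is -1: only X=0 is safe.
one_bit = add_nsw_region(1, 1)
```

A special case written for "V is +1" therefore gives the wrong region when the type is one bit wide.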
Fixes https://github.com/llvm/llvm-project/issues/59301.
chenglin.bi [Tue, 6 Dec 2022 15:28:04 +0000 (23:28 +0800)]
[AArch64] Transform shift+and to shift+shift to select more shifted register
and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2
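One instance of this rewrite can be checked numerically. The Python sketch below covers only the srl case on 32 bit values and assumes the mask clears the low c2 bits, so that c1 = c + c2; the exact preconditions used by the patch may differ, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def and_of_srl(x, c, c2):
    """and (srl x, c), mask  where mask clears the low c2 bits."""
    mask = MASK32 & ~((1 << c2) - 1)
    return ((x & MASK32) >> c) & mask


def shl_of_srl(x, c, c2):
    """shl (srl x, c1), c2  with c1 = c + c2."""
    return (((x & MASK32) >> (c + c2)) << c2) & MASK32


samples = [0, 1, 0xDEADBEEF, 0x80000000, 0x7FFFFFFF, 0x12345678]
assert all(
    and_of_srl(x, c, c2) == shl_of_srl(x, c, c2)
    for x in samples
    for c in range(0, 16)
    for c2 in range(1, 16)
)
```

The second form ends in a plain shl, which is what lets a following instruction absorb it as a shifted register operand.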
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D138904
Florian Hahn [Tue, 6 Dec 2022 15:27:57 +0000 (15:27 +0000)]
[ConstraintElim] Add tests for GEPs with signed predicates.
Nico Weber [Tue, 6 Dec 2022 15:22:47 +0000 (10:22 -0500)]
Revert "[Driver][test] Fix test by creating empty archive instead of empty file"
This reverts commit 6b992bcce0c5a86f57c83dd8d0ac9e63bcfc5521.
Test fails on macOS where ar doesn't want to create empty archives,
see https://reviews.llvm.org/D137275#3974489
Nico Weber [Tue, 6 Dec 2022 15:22:35 +0000 (10:22 -0500)]
Revert "[clang] Tweak test to tolerate clang being called "clang" instead of "clang-15""
This reverts commit af95441ba7f3ca61c925409e58ee7e5486d84033.
Necessary to revert 6b992bcce0c5a86f57c83dd8d0ac9e63bcfc5521.
Nico Weber [Tue, 6 Dec 2022 15:17:35 +0000 (10:17 -0500)]
Unbreak check-all on macOS after dbe8c2c316c40
`${X86_64}` expands to `x86_64;x86_64h` on macOS, so
get_test_cc_for_arch(${X86_64} METADATA_TEST_TARGET_CC METADATA_TEST_TARGET_CFLAGS)
calls the macro get_test_cc_for_arch() with the four arguments
`x86_64`, `x86_64h`, `METADATA_TEST_TARGET_CC`, and `METADATA_TEST_TARGET_CFLAGS`.
This writes the compiler into a variable called x86_64h, the cflags into a
variable called METADATA_TEST_TARGET_CC, and silently ignores the fourth
parameter.
As a fix, just pass `x86_64` instead of `${X86_64}`. Hopefully
that won't break anything on other platforms.
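The argument shift described above can be modelled in a few lines of Python. This is only a toy model of how an unquoted CMake list expands into separate arguments; the parameter names `arch`, `cc_out`, and `cflags_out` are hypothetical stand-ins for the macro's three positional parameters.

```python
def expand_unquoted(*arguments):
    """Split each unquoted argument on ';', as CMake does for list values."""
    expanded = []
    for argument in arguments:
        expanded.extend(part for part in argument.split(";") if part)
    return expanded


def bind_macro_parameters(arguments, names=("arch", "cc_out", "cflags_out")):
    """Bind positional arguments to parameter names; extras are ignored."""
    return dict(zip(names, arguments)), arguments[len(names):]


X86_64 = "x86_64;x86_64h"  # value of ${X86_64} on macOS, per the message
broken = bind_macro_parameters(
    expand_unquoted(X86_64, "METADATA_TEST_TARGET_CC", "METADATA_TEST_TARGET_CFLAGS")
)
fixed = bind_macro_parameters(
    expand_unquoted("x86_64", "METADATA_TEST_TARGET_CC", "METADATA_TEST_TARGET_CFLAGS")
)
```

In `broken`, the output variable for the compiler is bound to `x86_64h` and the last argument is dropped; in `fixed`, every argument lands in its intended slot.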
Roman Lebedev [Tue, 6 Dec 2022 14:55:35 +0000 (17:55 +0300)]
[llvm-exegesis] parallel snippet generator: avoid Read-After-Write pitfall for instrs w/ tied variables
As it is being discussed in https://github.com/llvm/llvm-project/issues/59325,
at least for the instructions with tied variables,
when trying to parallelize the instructions,
register selection is rather bad, and may either
use a register which we have used for def,
or vice versa.
That introduces serialization, and leads to
overly pessimistic inverse throughput measurement.
The new implementation avoids that.
New result:
```
$ ninja llvm-exegesis && ./bin/llvm-exegesis --mode=inverse_throughput --opcode-name=VFMADD132PDr --max-configs-per-opcode=9182
ninja: no work to do.
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4af034.o
---
mode: inverse_throughput
key:
instructions:
- 'VFMADD132PDr XMM3 XMM3 XMM4 XMM8'
- 'VFMADD132PDr XMM5 XMM5 XMM14 XMM7'
- 'VFMADD132PDr XMM10 XMM10 XMM11 XMM15'
- 'VFMADD132PDr XMM13 XMM13 XMM15 XMM15'
- 'VFMADD132PDr XMM12 XMM12 XMM11 XMM1'
- 'VFMADD132PDr XMM0 XMM0 XMM6 XMM9'
- 'VFMADD132PDr XMM2 XMM2 XMM15 XMM11'
config: ''
register_initial_values:
- 'XMM3=0x0'
- 'XMM4=0x0'
- 'XMM8=0x0'
- 'MXCSR=0x0'
- 'XMM5=0x0'
- 'XMM14=0x0'
- 'XMM7=0x0'
- 'XMM10=0x0'
- 'XMM11=0x0'
- 'XMM15=0x0'
- 'XMM13=0x0'
- 'XMM12=0x0'
- 'XMM1=0x0'
- 'XMM0=0x0'
- 'XMM6=0x0'
- 'XMM9=0x0'
- 'XMM2=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.6403, per_snippet_value: 4.4821 }
error: ''
info: instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, randomizing registers for uses
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C410C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C3
...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-f05c2f.o
---
mode: inverse_throughput
key:
instructions:
- 'VFMADD132PDr XMM15 XMM15 XMM11 XMM2'
- 'VFMADD132PDr XMM5 XMM5 XMM11 XMM2'
- 'VFMADD132PDr XMM14 XMM14 XMM11 XMM2'
- 'VFMADD132PDr XMM4 XMM4 XMM11 XMM2'
- 'VFMADD132PDr XMM8 XMM8 XMM11 XMM2'
- 'VFMADD132PDr XMM3 XMM3 XMM11 XMM2'
- 'VFMADD132PDr XMM10 XMM10 XMM11 XMM2'
- 'VFMADD132PDr XMM7 XMM7 XMM11 XMM2'
- 'VFMADD132PDr XMM13 XMM13 XMM11 XMM2'
- 'VFMADD132PDr XMM9 XMM9 XMM11 XMM2'
- 'VFMADD132PDr XMM1 XMM1 XMM11 XMM2'
- 'VFMADD132PDr XMM6 XMM6 XMM11 XMM2'
- 'VFMADD132PDr XMM0 XMM0 XMM11 XMM2'
- 'VFMADD132PDr XMM12 XMM12 XMM11 XMM2'
config: ''
register_initial_values:
- 'XMM15=0x0'
- 'XMM11=0x0'
- 'XMM2=0x0'
- 'MXCSR=0x0'
- 'XMM5=0x0'
- 'XMM14=0x0'
- 'XMM4=0x0'
- 'XMM8=0x0'
- 'XMM3=0x0'
- 'XMM10=0x0'
- 'XMM7=0x0'
- 'XMM13=0x0'
- 'XMM9=0x0'
- 'XMM1=0x0'
- 'XMM6=0x0'
- 'XMM0=0x0'
- 'XMM12=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.5312, per_snippet_value: 7.4368 }
error: ''
info: instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, one unique register for each use position
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C410C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E
2A198F2C4E2A198C2C462A198E2C3
...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c32060.o
---
mode: inverse_throughput
key:
instructions:
- 'VFMADD132PDr XMM10 XMM10 XMM6 XMM6'
- 'VFMADD132PDr XMM8 XMM8 XMM6 XMM6'
- 'VFMADD132PDr XMM12 XMM12 XMM6 XMM6'
- 'VFMADD132PDr XMM9 XMM9 XMM6 XMM6'
- 'VFMADD132PDr XMM7 XMM7 XMM6 XMM6'
- 'VFMADD132PDr XMM1 XMM1 XMM6 XMM6'
- 'VFMADD132PDr XMM0 XMM0 XMM6 XMM6'
- 'VFMADD132PDr XMM5 XMM5 XMM6 XMM6'
- 'VFMADD132PDr XMM11 XMM11 XMM6 XMM6'
- 'VFMADD132PDr XMM2 XMM2 XMM6 XMM6'
- 'VFMADD132PDr XMM15 XMM15 XMM6 XMM6'
- 'VFMADD132PDr XMM3 XMM3 XMM6 XMM6'
- 'VFMADD132PDr XMM14 XMM14 XMM6 XMM6'
- 'VFMADD132PDr XMM4 XMM4 XMM6 XMM6'
- 'VFMADD132PDr XMM13 XMM13 XMM6 XMM6'
config: ''
register_initial_values:
- 'XMM10=0x0'
- 'XMM6=0x0'
- 'MXCSR=0x0'
- 'XMM8=0x0'
- 'XMM12=0x0'
- 'XMM9=0x0'
- 'XMM7=0x0'
- 'XMM1=0x0'
- 'XMM0=0x0'
- 'XMM5=0x0'
- 'XMM11=0x0'
- 'XMM2=0x0'
- 'XMM15=0x0'
- 'XMM3=0x0'
- 'XMM14=0x0'
- 'XMM4=0x0'
- 'XMM13=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.5311, per_snippet_value: 7.9665 }
error: ''
info: instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, reusing the same register for all uses
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C410C462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC46
2C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC3
...
```
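The property the new generator aims for can be checked on the reported snippets with a short script. This Python sketch assumes each instruction line reads as opcode, def register, tied use (the same register as the def), and then the remaining use registers; the helper name is made up.

```python
def has_cross_instruction_raw(snippet):
    """True if any free use register is also written by some instruction."""
    defs, uses = set(), set()
    for line in snippet:
        _opcode, def_reg, _tied_use, *free_uses = line.split()
        defs.add(def_reg)
        uses.update(free_uses)
    return bool(defs & uses)


# Instructions copied from the first result above.
first_snippet = [
    "VFMADD132PDr XMM3 XMM3 XMM4 XMM8",
    "VFMADD132PDr XMM5 XMM5 XMM14 XMM7",
    "VFMADD132PDr XMM10 XMM10 XMM11 XMM15",
    "VFMADD132PDr XMM13 XMM13 XMM15 XMM15",
    "VFMADD132PDr XMM12 XMM12 XMM11 XMM1",
    "VFMADD132PDr XMM0 XMM0 XMM6 XMM9",
    "VFMADD132PDr XMM2 XMM2 XMM15 XMM11",
]

# A hypothetical bad selection: XMM2 is read by one instruction and written by another.
serialized = [
    "VFMADD132PDr XMM1 XMM1 XMM2 XMM3",
    "VFMADD132PDr XMM2 XMM2 XMM4 XMM5",
]
```

Disjoint def and use sets mean no instruction waits on another's result, which is what an inverse throughput measurement needs.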
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D139283
Paul Robinson [Mon, 5 Dec 2022 22:05:09 +0000 (14:05 -0800)]
[ARM/Darwin] Convert tests to check 'target='
Part of the project to eliminate special handling for triples in lit
expressions.
Jay Foad [Tue, 6 Dec 2022 14:50:02 +0000 (14:50 +0000)]
[AMDGPU] Remove FIXME that was addressed by D99413
Michael Maitland [Thu, 3 Nov 2022 17:44:10 +0000 (10:44 -0700)]
[RISCV][Codegen] Account for LMUL in Vector Mask instructions
It is likely that subtargets act differently for vector fixed-point arithmetic instructions based on the LMUL.
This patch creates separate SchedRead, SchedWrite, WriteRes, ReadAdvance for each relevant LMUL.
Differential Revision: https://reviews.llvm.org/D137427
Jonathan Peyton [Tue, 6 Dec 2022 14:31:31 +0000 (08:31 -0600)]
[OpenMP][libomp] Cleanup version script and exported symbols
This patch fixes issues seen once https://reviews.llvm.org/D135402 is applied.
The exports_so.txt file attempts to export functions which may not exist
depending on which features are enabled/disabled in the OpenMP
runtime library. There are not many of these so exporting dummy
symbols is feasible.
* Export dummy __kmp_reset_stats() function when stats is disabled.
* Export dummy debugging data when USE_DEBUGGER is disabled
* Export dummy __kmp_itt_[fini|init]_ittlib() functions
when ITT Notify is disabled
* Export dummy __kmp_reap_monitor() function when KMP_USE_MONITOR
is disabled
* Remove __kmp_launch_monitor and __kmp_launch_worker from being exported.
They have been static symbols since library inception.
Fixes: https://github.com/llvm/llvm-project/issues/58858
Differential Revision: https://reviews.llvm.org/D138049
Emmmer [Mon, 5 Dec 2022 14:55:15 +0000 (22:55 +0800)]
[LLDB][RISCV] Add RV32FC instruction support for EmulateInstructionRISCV
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D139390
Kadir Cetinkaya [Thu, 17 Nov 2022 12:38:43 +0000 (13:38 +0100)]
[include-cleaner] Make use of locateSymbol in WalkUsed and HTMLReport
Depends on D135953
Differential Revision: https://reviews.llvm.org/D138200
Valentin Clement [Tue, 6 Dec 2022 14:02:14 +0000 (15:02 +0100)]
[flang] Set correct box type when emboxing temporary in array value copy pass
In some cases, the base type is already a fir.box type. This patch updates
the code to set the result box type to fir.box<T>, where T is the type
of the allocated temp.
This was reported in https://github.com/llvm/llvm-project/issues/59342
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D139401
Matt Arsenault [Sun, 4 Dec 2022 00:32:41 +0000 (19:32 -0500)]
ValueTracking: Teach CannotBeOrderedLessThanZero about copysign
LLVM GN Syncbot [Tue, 6 Dec 2022 13:50:56 +0000 (13:50 +0000)]
[gn build] Port d09d834bb980
LLVM GN Syncbot [Tue, 6 Dec 2022 13:50:55 +0000 (13:50 +0000)]
[gn build] Port bc0617795f8b
Nico Weber [Tue, 6 Dec 2022 13:50:37 +0000 (08:50 -0500)]
[clang] Tweak test to tolerate clang being called "clang" instead of "clang-15"
Nico Weber [Tue, 6 Dec 2022 13:43:48 +0000 (08:43 -0500)]
Revert "Store OptTable::Info::Name as a StringRef"
This reverts commit 8ae18303f97d5dcfaecc90b4d87effb2011ed82e.
Breaks building lldb, see https://reviews.llvm.org/D139274#3974171
Nico Weber [Tue, 6 Dec 2022 13:43:30 +0000 (08:43 -0500)]
Revert "Quick fix to unbreak tblgen past 8ae18303f97d"
This reverts commit e50a60d7349de151bd2b06d85a79201ebc372d8a.
Prerequisite for reverting 8ae18303f97d5.
Nico Weber [Tue, 6 Dec 2022 13:40:14 +0000 (08:40 -0500)]
Revert "[BOLT] Fix blocks layout reverse iterators"
This reverts commit 7bb0cbfc322826e7bd41a74d5d71da138603b31f.
Doesn't build at least on macOS, see https://reviews.llvm.org/D139335#3974169
Benjamin Kramer [Tue, 6 Dec 2022 13:34:18 +0000 (14:34 +0100)]
Quick fix to unbreak tblgen past 8ae18303f97d
llvm/tools/llvm-objdump/llvm-objdump.cpp:128:38: error: constexpr variable 'ObjdumpInfoTable' must be initialized by a constant expression
static constexpr opt::OptTable::Info ObjdumpInfoTable[] = {
^ ~
ObjdumpOpts.inc:30:45: note: non-constexpr function 'substr' cannot be used in a constant expression
OPTION(prefix_0, llvm::StringRef("<input>").substr(0), INPUT, Input, INVALID, INVALID, nullptr, 0, 0, nullptr, nullptr, nullptr)
^
Nikolas Klauser [Thu, 17 Nov 2022 20:34:29 +0000 (21:34 +0100)]
[libc++] Fix memory leaks when throwing inside std::vector constructors
Fixes #58392
Reviewed By: ldionne, #libc
Spies: alexfh, hans, joanahalili, dblaikie, libcxx-commits
Differential Revision: https://reviews.llvm.org/D138601
Matt Arsenault [Sat, 3 Dec 2022 17:27:34 +0000 (12:27 -0500)]
InstCombine: Fold fabs (copysign x, y) -> fabs x
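The fold holds because copysign only changes the sign bit and fabs discards it. A quick Python check over a few non-NaN samples (an illustration, not part of the patch) looks like this:

```python
import math


def fabs_of_copysign(x, y):
    """fabs (copysign x, y): take x's magnitude with y's sign, then drop the sign."""
    return abs(math.copysign(x, y))


samples = [0.0, -0.0, 1.5, -2.25, float("inf"), float("-inf")]
assert all(fabs_of_copysign(x, y) == abs(x) for x in samples for y in samples)
```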
Matt Arsenault [Sat, 3 Dec 2022 16:11:56 +0000 (11:11 -0500)]
InstCombine: Add baseline tests for copysign with fneg/fabs
serge-sans-paille [Sun, 4 Dec 2022 08:33:14 +0000 (09:33 +0100)]
Store OptTable::Info::Name as a StringRef
This avoids implicit conversion to StringRef at several points, which in
turn avoids redundant calls to strlen.
As a side effect, this greatly simplifies the implementation of
StrCmpOptionNameIgnoreCase.
It also gives a consistent, modest speedup in compilation
time.
https://llvm-compile-time-tracker.com/compare.php?from=5f5b942823474e98e43a27d515a87ce140396c53&to=60e13b778119fc32d50dc38ff1a564a87146e9c6&stat=instructions:u
Differential Revision: https://reviews.llvm.org/D139274
Matt Arsenault [Mon, 5 Dec 2022 18:51:01 +0000 (13:51 -0500)]
LangRef: Clarify semantics of lround/llround and lrint/llrint
Haojian Wu [Tue, 6 Dec 2022 12:56:16 +0000 (13:56 +0100)]
Fix a -Wunused-variable warning in release build, NFC
Jean Perier [Tue, 6 Dec 2022 12:53:08 +0000 (13:53 +0100)]
[flang] Allow conversion from hlfir.expr to fir::ExtendedValue
For now at least, the plan is to keep hlfir.expr usage limited as
sub-expression operand, assignment rhs, and a few other contexts (
e.g. Associate statements). The rest of lowering (statements lowering
in the bridge) will still expect to get and manipulate characters and
arrays in memory. That means that hlfir.expr must be converted to
variable in converter.genExprAddr/converter.genExprBox.
This is done using an hlfir.associate, and generating the related
hlfir.end_associate in the statement context.
hlfir::getFirBase is updated to avoid bringing in the HLFIR
fir.boxchar/fir.box into FIR when the entity was created with
hlfir::AssociateOp.
Differential Revision: https://reviews.llvm.org/D139328
bipmis [Tue, 6 Dec 2022 12:49:05 +0000 (12:49 +0000)]
Add tests which can be matched to umull
Manuel Brito [Tue, 6 Dec 2022 12:39:38 +0000 (12:39 +0000)]
[Clang][CodeGen] Use poison instead of undef for extra argument in __builtin_amdgcn_mov_dpp [NFC]
Differential Revision: https://reviews.llvm.org/D138755
Adrian Kuegel [Tue, 6 Dec 2022 12:29:47 +0000 (13:29 +0100)]
[mlir][SparseTensor] Apply ClangTidyLegacy finding (NFC).
Converting integer literal to bool, use bool literal instead.
Archibald Elliott [Wed, 30 Nov 2022 19:33:54 +0000 (19:33 +0000)]
[AArch64] Implement __arm_rsr128/__arm_wsr128
This only contains the SelectionDAG implementation. GlobalISel to
follow.
The broad approach is:
- Introduce new builtins for 128-bit wide instructions.
- Lower these to @llvm.read_register.i128/@llvm.write_register.i128
- Introduce target-specific ISD nodes which have legal operands (two
i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to
match the instructions they are for. These are a little complex as
they need to match the "shape" of what they're replacing or the
legaliser complains.
- Select these using the existing tryReadRegister/tryWriteRegister to
share the MDString parsing code, and introduce additional code to
ensure these are selected into the right MRRS/MSRR instructions. What
makes this hard is ensuring that the two i64s end up in an XSeqPair
register pair, because SelectionDAG doesn't care that much about
register classes if it can avoid doing so.
The main change to existing code is the reorganisation of
tryReadRegister and tryWriteRegister to try to keep the string parsing
code separate from the instruction creating code.
This also includes the changes to clang to define and use the ACLE
feature macro named `__ARM_FEATURE_SYSREG128`.
Contributors:
Sam Elliott
Lucas Prates
Differential Revision: https://reviews.llvm.org/D139086
Ties Stuij [Tue, 6 Dec 2022 10:56:43 +0000 (10:56 +0000)]
[AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support
Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding
UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available.
See specs for corresponding immediate and register versions in:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138813
Nikita Popov [Wed, 26 Oct 2022 09:55:46 +0000 (11:55 +0200)]
[IR] Don't assume readnone/readonly intrinsics are willreturn
This removes our "temporary" hack to assume that readnone/readonly
intrinsics are also willreturn. An explicit willreturn annotation,
usually via default intrinsic attributes, is now required.
Differential Revision: https://reviews.llvm.org/D137630
Ties Stuij [Tue, 6 Dec 2022 10:44:05 +0000 (10:44 +0000)]
[AArch64] lower abs intrinsic to new ABS instruction in SelDag
When feature CSSC is available, the SelectionDag abs intrinsic should map to the
new scalar ABS instruction.
Additionally, the SIMDTwoScalarD tablegen defm includes a pattern match for
scalar i64, which we don't want to use when CSSC is enabled.
spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ABS--Absolute-value-
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138812
David Spickett [Mon, 5 Dec 2022 10:10:42 +0000 (10:10 +0000)]
[LLVM][ARM] Correct llvm feature for vfpv3d16 host feature
d16 was removed in https://reviews.llvm.org/D60691.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D139304
David Spickett [Tue, 6 Dec 2022 10:30:38 +0000 (10:30 +0000)]
[lld-macho] Fix map file test on 32 bit hosts
The test added in https://reviews.llvm.org/D137368 has been failing
on our 32 bit arm bots:
https://lab.llvm.org/buildbot/#/builders/178/builds/3460
You get this for the strings:
<<dead>> 0x883255000000003 [ 10] literal string: Hello, it's me
Instead of the expected:
<<dead>> 0x0000000F [ 3] literal string: Hello, it's me
This is because unlike symbols whose size is a uint64_t, strings
use a StringRef whose size is size_t. size_t changes size between
32 and 64 bit platforms.
This fixes the test by using %z to print the size of the strings,
which works on both 32 and 64 bit hosts.
Ties Stuij [Tue, 6 Dec 2022 10:42:06 +0000 (10:42 +0000)]
[AArch64] SelectionDag codegen for gpr CTZ instruction
When feature CSSC is available we should use instruction CTZ in SelectionDag
where applicable:
- CTTZ intrinsics are lowered to using the gpr CTZ instruction
- BITREVERSE -> CTLZ instruction pattern gets replaced by CTZ
spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros-
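The second bullet relies on the identity cttz(x) == ctlz(bitreverse(x)). A minimal Python sketch of that identity for 32 bit values follows; the helper names are illustrative and the functions model the generic operations, not the AArch64 instructions themselves.

```python
def bitreverse32(x):
    """Reverse the order of the 32 bits of x."""
    return int(f"{x & 0xFFFFFFFF:032b}"[::-1], 2)


def ctlz32(x):
    """Count leading zeros; defined as 32 for x == 0."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def cttz32(x):
    """Count trailing zeros; defined as 32 for x == 0."""
    x &= 0xFFFFFFFF
    return 32 if x == 0 else (x & -x).bit_length() - 1


samples = [0, 1, 8, 0x80000000, 0xDEADBEEF, 0x00F00000]
assert all(cttz32(x) == ctlz32(bitreverse32(x)) for x in samples)
```

Since both forms agree on every input, a single trailing zero count can replace the two operation sequence.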
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138811
Kristina Bessonova [Fri, 18 Nov 2022 17:56:51 +0000 (19:56 +0200)]
[llvm-objdump] Avoid using mapping symbols as branch target labels
The main motivation for this change is to avoid ambiguity: mapping symbol
names may not be unique across a binary, so they do not uniquely identify
a target address. Mapping symbols used as branch target labels therefore
make llvm-objdump output less readable.
Another point is that mapping symbols sometimes appear in
non-allocatable sections, such as debug info sections, which makes objdump
output even more confusing.
For example, a small AArch64 executable may contain plenty of `$d[.*]`
symbols and none of them would be useful as a label for resolving
a branch or a memory operand target address:
```
0000000000000254 l .note.ABI-tag
0000000000000000 $d
00000000000008d4 l .eh_frame
0000000000000000 $d
0000000000000868 l .rodata
0000000000000000 $d
0000000000011028 l .data
0000000000000000 $d
0000000000010db8 l .fini_array
0000000000000000 $d
0000000000010db0 l .init_array
0000000000000000 $d
00000000000008e8 l .eh_frame
0000000000000000 $d
0000000000011034 l .bss
0000000000000000 $d
```
Note that GNU objdump doesn't use mapping symbols as branch target
labels for all targets that support such symbols (ARM, AArch64, CSKY).
Differential Revision: https://reviews.llvm.org/D139131
Javier Setoain [Mon, 7 Nov 2022 21:27:36 +0000 (21:27 +0000)]
[mlir] Add hoisting of transfer ops in affine loops
The only way to do this with the current hoisting strategy is by
lowering Affine to Scf first, but that prevents further passes on
Affine.
Differential Revision: https://reviews.llvm.org/D137600
Max Kazantsev [Tue, 6 Dec 2022 09:35:45 +0000 (16:35 +0700)]
[SCEVExpander] Support cost evaluation of several SCEVs with same budget
This is a follow-up to the discussion in D138412. Sometimes we want to evaluate
the cost of expanding several SCEVs together against the same budget. For example,
if one of them is slightly above the cheap limit and the second one is free, we
still want to expand. Checking each of them against the "cheap" limit separately
is more pessimistic.
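The difference between the two checks can be sketched with plain integer costs. This is only a model: the function names are hypothetical, the costs are assumed to be precomputed, and how the caller chooses the shared budget (here, twice the per-expression limit) is an assumption rather than something the patch states:

```cpp
#include <numeric>
#include <vector>

// Per-expression check: every cost must fit the limit on its own.
bool allCheapIndividually(const std::vector<int> &Costs, int Limit) {
  for (int C : Costs)
    if (C > Limit)
      return false;
  return true;
}

// Shared-budget check: only the total has to fit, so a free expression
// can compensate for one that is slightly above the individual limit.
bool fitsSharedBudget(const std::vector<int> &Costs, int SharedBudget) {
  const long Total = std::accumulate(Costs.begin(), Costs.end(), 0L);
  return Total <= SharedBudget;
}
```

With costs `{5, 0}` and a per-expression limit of 4, the individual check rejects the expansion, while a shared budget of 8 accepts it.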
Differential Revision: https://reviews.llvm.org/D138475
Reviewed By: lebedev.ri
HanSheng Zhang [Tue, 6 Dec 2022 09:59:02 +0000 (10:59 +0100)]
[Verifier]Remove API declaration that has never been implemented
Close https://github.com/llvm/llvm-project/issues/59244
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D138889
HanSheng Zhang [Tue, 6 Dec 2022 09:55:17 +0000 (10:55 +0100)]
[CMake]Allow user specified CPack Options
This should allow downstream vendors to install multiple LLVM distributions in parallel.
Should we also patch the default values to allow multiple upstream LLVM distributions?
Reviewed By: thieta
Differential Revision: https://reviews.llvm.org/D138632
Juan Manuel MARTINEZ CAAMAÑO [Tue, 6 Dec 2022 09:22:52 +0000 (04:22 -0500)]
[NFC] Remove const from return value of function
Sergey Kachkov [Wed, 19 Oct 2022 15:12:33 +0000 (18:12 +0300)]
[RISCV] Generate .cfi_def_cfa_expression for RVV stack adjustment
The canonical frame address after RVV stack adjustment is sp + StackSize +
RVVStackSize * vlenb. Since vlenb is unknown at compile time (although it
is constant for a particular hardware implementation), emit
.cfi_def_cfa_expression so that libunwind can read the VLENB CSR at
run time and obtain the correct frame address.
Fixes https://github.com/llvm/llvm-project/issues/58356 (but additional
run-time support for reading the CSR may be required)
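The arithmetic that an unwinder ends up performing can be written out as a small sketch. The function and parameter names are hypothetical, and `RVVStackSize` is taken to be expressed in units of vlenb, as in the formula above:

```cpp
#include <cstdint>

// CFA = sp + StackSize + RVVStackSize * vlenb.
// VLenB stands for the value read from the VLENB CSR at run time.
std::uint64_t computeCFA(std::uint64_t SP, std::uint64_t StackSize,
                         std::uint64_t RVVStackSize, std::uint64_t VLenB) {
  return SP + StackSize + RVVStackSize * VLenB;
}
```

For instance, with `SP = 0x1000`, a 32-byte fixed frame, an RVV area of 2 * vlenb and `VLenB = 16`, the result is `0x1040`. The same frame on hardware with a different vlenb yields a different address, which is why a fixed offset cannot be emitted.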
Differential Revision: https://reviews.llvm.org/D136263
Vlad Serebrennikov [Tue, 6 Dec 2022 09:42:07 +0000 (12:42 +0300)]
[clang] Add test for CWG600
P1787: //CWG600 is resolved by explaining that accessibility affects naming a member in the sense of the ODR.//
Wording: see changes to [class.access] p1 and p4.
Additional references: [[ http://eel.is/c++draft/basic.def.odr#8.sentence-2 | basic.def.odr/8 ]]: //A function is odr-used if it is named by a potentially-evaluated expression or conversion.//
Reviewed By: #clang-language-wg, aaron.ballman
Differential Revision: https://reviews.llvm.org/D139173
Corentin Jabot [Fri, 2 Dec 2022 18:35:13 +0000 (19:35 +0100)]
[Clang] make_cxx_dr_status download the issue list automatically
if none is provided
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D139212
Vlad Serebrennikov [Tue, 6 Dec 2022 09:38:38 +0000 (12:38 +0300)]
[clang] Mark CWG554 as N/A
P1787: //CWG554 is resolved by using the word “scope” instead of “declarative region”, consistent with its very common use in phrases like “namespace scope”.//
Reviewed By: #clang-language-wg, cor3ntin, aaron.ballman, shafik
Differential Revision: https://reviews.llvm.org/D139172
David Spickett [Mon, 5 Dec 2022 11:53:06 +0000 (11:53 +0000)]
[LLVM][Release] Prevent empty runtime name in release script
Unlike projects, runtimes don't have a default set of names.
This means you get a leading space at the start, which gets converted
to a ';' giving ";<runtime name>;<runtime name>".
CMake then errors because the "" before the first ';' is treated
as a runtime name and of course it's not a valid name.
Fix this by removing the leading spaces from runtimes before we
insert the ';'.
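The release script itself is a shell script; the following C++ sketch only models the string handling described above. The function name `toCMakeList` is hypothetical, and names are assumed to be separated by single spaces:

```cpp
#include <string>

// Drop leading spaces first, then turn the remaining separators into ';'.
// Without the first step, " a b" would become ";a;b" with an empty entry.
std::string toCMakeList(const std::string &Names) {
  const auto First = Names.find_first_not_of(' ');
  if (First == std::string::npos)
    return "";
  std::string Result = Names.substr(First);
  for (char &C : Result)
    if (C == ' ')
      C = ';';
  return Result;
}
```

Called with `" libcxx libunwind"`, this returns `"libcxx;libunwind"` rather than a list that starts with an empty name.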
Reviewed By: ldionne
Differential Revision: https://reviews.llvm.org/D139306
chenglin.bi [Tue, 6 Dec 2022 09:35:08 +0000 (17:35 +0800)]
[Instcombine] Add baseline tests for logic-and/logic-or factorization; NFC
Vlad Serebrennikov [Tue, 6 Dec 2022 09:33:56 +0000 (12:33 +0300)]
[clang] Add test for CWG405
P1787: //CWG405 is resolved by stating that argument-dependent lookup (sometimes) occurs after an ordinary unqualified lookup (making statements like “finding a variable prevents argument-dependent lookup” formally correct).//
Wording: see changes to [basic.lookup.argdep] p1 and p3
This issue seems a duplicate of CWG218, even though it is not officially recognized. A part of a test for CWG218 is reused here, adding cross-references.
Reviewed By: #clang-language-wg, aaron.ballman
Differential Revision: https://reviews.llvm.org/D139095
Tobias Hieta [Tue, 22 Nov 2022 09:10:36 +0000 (10:10 +0100)]
[CodeView] Add support for local S_CONSTANT records
CodeView can only represent variables as register or memory
locations, but LLVM very often transforms simple values into
constants. Consider this program:
int f () { int i = 123; return i; }
LLVM will transform `i` into a constant value and leave behind
only an llvm.dbg.value. This can't be represented as an S_LOCAL
record in CodeView, but it can be represented as an S_CONSTANT
record.
This patch checks whether the location of a debug value is null;
if so, it inserts an S_CONSTANT record instead of an S_LOCAL
record with the "OptimizedAway" flag.
In lld we then output the S_CONSTANT in the right scope.
Previously these records were always inserted in the global
stream; now we check the scope before inserting them.
This has been shown to improve debugging for our developers
internally.
Fixes llvm/llvm-project#55958
Reviewed By: aganea
Differential Revision: https://reviews.llvm.org/D138995
Vlad Serebrennikov [Tue, 6 Dec 2022 08:56:01 +0000 (11:56 +0300)]
[clang] Add test for CWG952
P1787: // [[ https://wg21.link/cwg952 | CWG952 ]] is resolved by refining the definition of “naming class” per Richard’s suggestion in [[ https://lists.isocpp.org/core/2020/09/9963.php | “CWG1621 and [class.static]/2” ]].//
Wording:
- [class.static]/2 removed;
- [class.access.base]/5 rephrased.
Current behavior is as follows: unqualified names undergo //unqualified name lookup// [1], which performs an //unqualified search// in the immediate scope [2]. This scope is the scope that the definition of //naming class// [3] refers to. `A::I` is not //accessible// when named in classes `C` and `D` per [3]. In particular, the last item regarding base classes ([class.access.base]/5.4) is not applicable, because class `A` is not //accessible// in either class `C` or `D` per [4].
References:
1. [[ https://eel.is/c++draft/basic.lookup#unqual-4.sentence-2 | basic.lookup.unqual/4 ]]
2. [[ https://eel.is/c++draft/basic.lookup#unqual-3 | basic.lookup.unqual/3 ]]
3. [[ https://eel.is/c++draft/class.access#base-5.sentence-4 | class.access.base/5 ]]
4. [[ https://eel.is/c++draft/class.access#base-4 | class.access.base/4 ]]
Reviewed By: #clang-language-wg, erichkeane, aaron.ballman
Differential Revision: https://reviews.llvm.org/D139326
Nikita Popov [Tue, 6 Dec 2022 09:14:00 +0000 (10:14 +0100)]
[MemCpyOpt] Use BatchAA when processing one instruction (NFCI)
While we can't use a single BatchAA instance for the entire
MemCpyOpt run without further justification, we can use BatchAA
while performing the queries related to a single instruction
(these will first perform some AA-based checks, and then modify
the IR only afterwards).
Gedare Bloom [Tue, 6 Dec 2022 08:58:30 +0000 (00:58 -0800)]
[clang-format] Avoid breaking )( with BlockIndent
The BracketAlignmentStyle BAS_BlockIndent was forcing breaks before a
closing right parenthesis, yielding strange-looking results for code
structures that have a left parenthesis immediately following a right
parenthesis, ")(", as seen with indirect function calls via function
pointers and with type casting.
Fixes 57250.
Fixes 58496.
Differential Revision: https://reviews.llvm.org/D137762
Sergey Kachkov [Mon, 21 Nov 2022 08:48:26 +0000 (11:48 +0300)]
[libunwind][RISCV] Support reading of VLENB CSR register
Support reading the VLENB (vector byte length) control register, which may be
required for correct unwinding when RVV objects are on the stack.
Differential Revision: https://reviews.llvm.org/D136264
Nikita Popov [Tue, 6 Dec 2022 08:43:42 +0000 (09:43 +0100)]
[DSE] Reuse BatchAA for MSSA clobber queries
This is not NFC because the DSE BatchAA is more powerful than the
default one due to EarliestEscape CaptureInfo, so this might
improve results in some cases.
Jean Perier [Tue, 6 Dec 2022 08:33:48 +0000 (09:33 +0100)]
[flang] do not generate padding/truncation code when character lengths are equal
When generating character assignment operations, the generic code
generates some code to handle truncation and padding when the lengths
differ at runtime. A bypass already exists when the lengths are
compile-time constants and match, but it was not used for the trivial
case where the RHS and LHS lengths are the same SSA value. In that case,
even though the length is not known at compile time, it is known to be
the same on both sides.
This will simplify the code creating character temporaries from a
variable in HLFIR that will use this assignment code.
Note that this probably has little impact on performance (llvm may be clever enough
to later catch that for us). But it makes the generated IR a lot more readable at
little cost.
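A run-time model of the two paths may help; it is written in C++ purely for illustration and is not the IR that flang emits. Both function names are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Generic path: copy what fits (truncation), then blank-pad the remainder.
void assignCharacter(char *Lhs, std::size_t LhsLen, const char *Rhs,
                     std::size_t RhsLen) {
  const std::size_t CopyLen = std::min(LhsLen, RhsLen);
  std::memmove(Lhs, Rhs, CopyLen);
  if (LhsLen > CopyLen)
    std::memset(Lhs + CopyLen, ' ', LhsLen - CopyLen);
}

// Bypass: when both sides share one length value, neither truncation nor
// padding can occur, so a plain copy is enough.
void assignCharacterSameLength(char *Lhs, const char *Rhs, std::size_t Len) {
  std::memmove(Lhs, Rhs, Len);
}
```

The second function is what remains of the first once `LhsLen` and `RhsLen` are known to be the same value, even if that value is only available at run time.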
Differential Revision: https://reviews.llvm.org/D139330
Valery Pykhtin [Mon, 21 Nov 2022 16:15:32 +0000 (17:15 +0100)]
[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;
/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;
/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```
While working on RP tracking issues I noticed that getMinNumVGPRs returns incorrect
values: the problem is the large VGPR granule size on GFX10+ architectures. Some
occupancies aren't reachable because they require the same number of VGPR granules as others.
For example, an occupancy of 19 waves on gfx1010 requires the same number of granules as 20 waves,
so the resulting occupancy would be 20.
SGPRs have the same issue and even have an inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs.
It will be addressed in the next patch.
Legend:
# MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
# (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
# R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.
Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits
gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)
gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)
gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 33 (O12) R 32 (O16)
14 33 (O12) R 32 (O16)
13 33 (O12) R 32 (O16)
12 33 (O12) 40 (O12)
11 41 (O10) R 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)
gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 49 (O12) R 48 (O16)
14 49 (O12) R 48 (O16)
13 49 (O12) R 48 (O16)
12 49 (O12) 60 (O12)
11 61 (O10) R 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 256 (O2) R 256 (O2)
gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 65 (O12) R 64 (O16)
14 65 (O12) R 64 (O16)
13 65 (O12) R 64 (O16)
12 65 (O12) 80 (O12)
11 81 (O10) R 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)
gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 97 (O12) R 96 (O16)
14 97 (O12) R 96 (O16)
13 97 (O12) R 96 (O16)
12 97 (O12) 120 (O12)
11 121 (O10) R 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 256 (O5) R 256 (O5)
3 256 (O5) R 256 (O5)
2 256 (O5) R 256 (O5)
1 256 (O5) R 256 (O5)
gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 25 (O18) R 24 (O20)
18 25 (O18) 28 (O18)
17 29 (O16) R 28 (O18)
16 29 (O16) 32 (O16)
15 33 (O14) R 32 (O16)
14 33 (O14) 36 (O14)
13 37 (O12) R 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)
gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 49 (O18) R 48 (O20)
18 49 (O18) 56 (O18)
17 57 (O16) R 56 (O18)
16 57 (O16) 64 (O16)
15 65 (O14) R 64 (O16)
14 65 (O14) 72 (O14)
13 73 (O12) R 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)
```
After the fix:
```
gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)
gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)
gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 0 (O16) 32 (O16)
14 0 (O16) 32 (O16)
13 0 (O16) 32 (O16)
12 33 (O12) 40 (O12)
11 33 (O12) 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)
gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 0 (O16) 48 (O16)
14 0 (O16) 48 (O16)
13 0 (O16) 48 (O16)
12 49 (O12) 60 (O12)
11 49 (O12) 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 253 (O2) 256 (O2)
gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 0 (O16) 64 (O16)
14 0 (O16) 64 (O16)
13 0 (O16) 64 (O16)
12 65 (O12) 80 (O12)
11 65 (O12) 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 193 (O4) 256 (O4)
2 193 (O4) 256 (O4)
1 193 (O4) 256 (O4)
gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 0 (O16) 96 (O16)
14 0 (O16) 96 (O16)
13 0 (O16) 96 (O16)
12 97 (O12) 120 (O12)
11 97 (O12) 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 241 (O5) 256 (O5)
3 241 (O5) 256 (O5)
2 241 (O5) 256 (O5)
1 241 (O5) 256 (O5)
gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 0 (O20) 24 (O20)
18 25 (O18) 28 (O18)
17 25 (O18) 28 (O18)
16 29 (O16) 32 (O16)
15 29 (O16) 32 (O16)
14 33 (O14) 36 (O14)
13 33 (O14) 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)
gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 0 (O20) 48 (O20)
18 49 (O18) 56 (O18)
17 49 (O18) 56 (O18)
16 57 (O16) 64 (O16)
15 57 (O16) 64 (O16)
14 65 (O14) 72 (O14)
13 65 (O14) 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 201 (O4) 256 (O4)
2 201 (O4) 256 (O4)
1 201 (O4) 256 (O4)
```
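The "after the fix" rows for the gfx1030 wave64 group can be reproduced with a small standalone model. This is a sketch, not the GCNSubtarget code: the function names are hypothetical, and the constants (512 VGPRs in total, a granule of 8, 256 addressable VGPRs, at most 16 waves) are assumptions inferred from the table values.

```cpp
#include <algorithm>
#include <cassert>

// Assumed parameters, inferred from the gfx1030 wave64 rows of the table.
constexpr unsigned TotalVGPRs = 512;
constexpr unsigned Granule = 8;
constexpr unsigned AddressableVGPRs = 256;
constexpr unsigned MaxWavesPerEU = 16;

constexpr unsigned alignDown(unsigned V, unsigned A) { return V / A * A; }
constexpr unsigned alignUp(unsigned V, unsigned A) { return (V + A - 1) / A * A; }

// Largest VGPR count that still allows WavesPerEU waves.
unsigned maxNumVGPRs(unsigned WavesPerEU) {
  assert(WavesPerEU != 0 && WavesPerEU <= MaxWavesPerEU);
  return std::min(alignDown(TotalVGPRs / WavesPerEU, Granule), AddressableVGPRs);
}

// Occupancy actually obtained with a given VGPR count.
unsigned occupancyWithNumVGPRs(unsigned VGPRs) {
  if (VGPRs == 0)
    return MaxWavesPerEU;
  return std::min(TotalVGPRs / alignUp(VGPRs, Granule), MaxWavesPerEU);
}

// Smallest VGPR count that belongs to the occupancy really reached for
// WavesPerEU: one more than the maximum of the next higher occupancy.
unsigned minNumVGPRs(unsigned WavesPerEU) {
  const unsigned Reached = occupancyWithNumVGPRs(maxNumVGPRs(WavesPerEU));
  if (Reached >= MaxWavesPerEU)
    return 0;
  return maxNumVGPRs(Reached + 1) + 1;
}
```

In this model an occupancy of 11 is unreachable: `maxNumVGPRs(11)` is 40, which already yields 12 waves, so `minNumVGPRs(11)` falls back to 33, the same lower bound as for occupancy 12.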
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D138443
Kazu Hirata [Tue, 6 Dec 2022 08:03:44 +0000 (00:03 -0800)]
[mlir] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Vladislav Khmelevsky [Mon, 5 Dec 2022 16:40:32 +0000 (20:40 +0400)]
[BOLT] Fix blocks layout reverse iterators
Use container's reverse iterators
Differential Revision: https://reviews.llvm.org/D139335
Kazu Hirata [Tue, 6 Dec 2022 07:55:23 +0000 (23:55 -0800)]
[clang-tools-extra] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Kazu Hirata [Tue, 6 Dec 2022 07:50:04 +0000 (23:50 -0800)]
[llvm] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Diego Caballero [Tue, 6 Dec 2022 07:30:28 +0000 (07:30 +0000)]
[mlir] Add `replaceAllUsesExcept` to rewriter
This patch adds `replaceAllUsesExcept` to the rewriter class.
The implementation is copied from Value, with an added call to
`updateRootInPlace` to notify the listeners about the
corresponding IR changes.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D139382
Kazu Hirata [Tue, 6 Dec 2022 07:32:18 +0000 (23:32 -0800)]
[lldb] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Nikita Popov [Mon, 5 Dec 2022 15:10:33 +0000 (16:10 +0100)]
[MemorySSA] Use BatchAA for clobber walker
While MemorySSA use optimization was already using BatchAA, the
publicly exposed MSSA walkers were using plain AAResults. This is
not great, because it is expected that clobber walking will make
repeated AA queries.
This patch makes the clobber API accept a BatchAAResults instance.
The plain APIs are kept as wrappers and will create a BatchAAResults
instance for the duration of the query. In the future, the explicit
BatchAAResults arguments will be used to share AA results across
queries, not just within one query.
Differential Revision: https://reviews.llvm.org/D136164
dbakunevich [Tue, 6 Dec 2022 07:24:57 +0000 (14:24 +0700)]
Added connection to the library with name "re".
Fixed a bug where the "re" library was used in
this Python file without being imported.
Differential Revision: https://reviews.llvm.org/D137926
Fangrui Song [Tue, 6 Dec 2022 07:21:02 +0000 (07:21 +0000)]
[TableGen] llvm::Optional => std::optional
Kazu Hirata [Tue, 6 Dec 2022 07:18:15 +0000 (23:18 -0800)]
[lldb] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
jacquesguan [Sun, 9 Oct 2022 07:28:42 +0000 (15:28 +0800)]
[RISCV][NFC] Add test coverage for insertelement/extractelement of widen vector type.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135534
Mark Lacey [Wed, 19 Oct 2022 20:20:56 +0000 (13:20 -0700)]
[PartialInlining] Enable recursive partial inlining.
It seems unnecessarily limiting to disallow recursive partial
inlining, and there are clearly cases where it can benefit
code by avoiding a function call and potentially enabling
other transformations like dead argument elimination
in cases where an argument is only used prior to the early-out
test at the top of the function.
The pass already properly rewrites the recursive calls
within the body of the freshly cloned function, so the only
change here is removing the bail-out when recursion is
detected.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D136383
Vitaly Buka [Mon, 5 Dec 2022 02:59:01 +0000 (18:59 -0800)]
[msan][CodeGen] Set noundef for C return value
Msan needs noundef consistency between interface and implementation. If
we call C++ from C, we can have noundef on the C++ side and no noundef on
the C caller side. The noundef implementation will not set TLS for the
return value, but the caller without noundef will expect it. This leads
to false reports in msan.
A workaround could be to set TLS to zero even for noundef return values.
However, if we always do that, binary size increases by about 10%.
If we do that selectively, we need to handle "address is taken"
functions, any non-local functions, and probably all functions which have
musttail callers, which is still a lot.
The existing implementation of HasStrictReturn refers to the C standard as
the reason for not enforcing noundef. I believe it applies only to the case
when the return statement is omitted. Testing on the Google codebase, I
never saw such cases. However, I've seen tens of cases where C code returns
actual uninitialized variables, but we ignore them because of the "omitted
return" case.
So this patch will:
1. fix false-positives with TLS mismatch.
2. detect bugs returning uninitialized variables for C as well.
3. report "omitted return" cases stricter than C, which is already a
warning and very likely a bug in the code anyway.
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D139296
Kazu Hirata [Tue, 6 Dec 2022 06:56:24 +0000 (22:56 -0800)]
[Support] Include optional instead of None.h
SMLoc uses std::nullopt_t, so it should include optional rather than
None.h.
Kazu Hirata [Tue, 6 Dec 2022 06:43:53 +0000 (22:43 -0800)]
[lldb] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Ramkumar Ramachandra [Thu, 1 Dec 2022 10:48:24 +0000 (11:48 +0100)]
mlir/tosa: move tosa.pad from Linalg to Tensor conversion
Since tosa.pad is lowered strictly to arith and tensor ops, move
ConvertPad from TosaToLinalg to TosaToTensor, benefitting non-Linalg
Tosa targets. TensorToLinalg exists, and is trivial, so nothing is lost.
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D139091
Kazu Hirata [Tue, 6 Dec 2022 06:37:22 +0000 (22:37 -0800)]
[clang-tools-extra] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Kazu Hirata [Tue, 6 Dec 2022 05:49:31 +0000 (21:49 -0800)]
[clang-tools-extra] Use std::nullopt instead of llvm::None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Kazu Hirata [Tue, 6 Dec 2022 04:54:05 +0000 (20:54 -0800)]
[lldb] Use std::nullopt instead of None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Kazu Hirata [Tue, 6 Dec 2022 04:39:10 +0000 (20:39 -0800)]
Remove "using llvm::None;" in *.cpp
These .cpp files do not use llvm::None anymore.
Since these are not header files, we can remove them pretty safely
without deprecating them first.
Jeff Niu [Tue, 6 Dec 2022 04:00:38 +0000 (20:00 -0800)]
[mlir] UnsignedWhenEquivalent ignore dead code
The pass was not checking for uninitialized states due to dead code.
This patch also makes LLVMFuncOp correctly return a null body when it is
external.
Fixes #58807
Depends on D139388
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139389
Jeff Niu [Tue, 6 Dec 2022 03:34:14 +0000 (19:34 -0800)]
[mlir][llvm] Mark LLVMReturnOp as ReturnLike
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139388
jacquesguan [Wed, 21 Sep 2022 07:51:18 +0000 (15:51 +0800)]
[RISCV] Fold vector binary operation into select with identity constant.
This patch implements shouldFoldSelectWithIdentityConstant for RISCV. It tries to generate a vmerge after the binary instruction so that the two can later be folded into a masked instruction.
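The equivalence behind this kind of fold can be checked element-wise on small arrays. The sketch below uses addition, whose identity constant is 0; the function names and the fixed vector width of 4 are assumptions for illustration and do not correspond to any LLVM API:

```cpp
#include <array>
#include <cstddef>

using Vec = std::array<unsigned, 4>;
using Mask = std::array<bool, 4>;

// Before: the select feeds the add, with the identity 0 in inactive lanes.
Vec addOfSelect(const Vec &X, const Vec &Y, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = X[I] + (M[I] ? Y[I] : 0u);
  return R;
}

// After: the add comes first and the select (a vmerge) picks between the
// sum and the untouched operand, which has the shape of a masked add.
Vec selectOfAdd(const Vec &X, const Vec &Y, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = M[I] ? X[I] + Y[I] : X[I];
  return R;
}
```

Both functions return the same vector for every input, since adding the identity leaves `X[I]` unchanged in the inactive lanes.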
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131551