platform/upstream/llvm.git
17 months ago[mlir][python] Bump min pybind11 version to 2.9.0
Rahul Kayaith [Tue, 23 May 2023 17:40:00 +0000 (13:40 -0400)]
[mlir][python] Bump min pybind11 version to 2.9.0

2.9.0 was released on December 28, 2021, and some following changes
require at least this version.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D150247

17 months agoAdd a reminder to update docs when updating default; NFC
Aaron Ballman [Tue, 23 May 2023 17:38:35 +0000 (13:38 -0400)]
Add a reminder to update docs when updating default; NFC

17 months ago[llvm][ADT] Fix invalid `reference` type of depth-first, breadth-first and post order...
Markus Böck [Tue, 23 May 2023 17:11:43 +0000 (19:11 +0200)]
[llvm][ADT] Fix invalid `reference` type of depth-first, breadth-first and post order iterators

C++'s iterator concept requires operator* to return the same type as is specified by the iterator's reference type. This functionality is especially important for older generic code that did not yet make use of auto.
An example from within LLVM is iterator_adaptor_base which uses the reference type of the iterator it is wrapping as its return type for operator* (this class is used as base for a lot of other functionality like filter iterators and so on).
Using any of the graph traversal iterators listed above with it would previously fail to compile due to reference being non-const while operator* returned a const reference.

This patch fixes that by correctly specifying reference and using it as the return type of operator* explicitly to prevent further issues in the future.
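
A minimal sketch of the requirement, not taken from the patch: `ConstIntIterator` and `derefVia` are hypothetical names, and the iterator walks a plain const array rather than a graph. It shows `reference` declared as exactly the type `operator*` returns, so that generic code spelling its return type through `reference` compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>

// Hypothetical iterator over a const int array; not an LLVM class.
class ConstIntIterator {
  const int *Ptr;

public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  // Must name the type operator* returns. Declaring `int &` here while
  // operator* returns `const int &` is the kind of mismatch described above.
  using reference = const int &;

  explicit ConstIntIterator(const int *P) : Ptr(P) {}
  reference operator*() const { return *Ptr; }
  ConstIntIterator &operator++() {
    ++Ptr;
    return *this;
  }
  bool operator==(const ConstIntIterator &RHS) const { return Ptr == RHS.Ptr; }
  bool operator!=(const ConstIntIterator &RHS) const { return Ptr != RHS.Ptr; }
};

// Pre-`auto` style generic code: the return type is spelled via `reference`.
template <typename It>
typename std::iterator_traits<It>::reference derefVia(It I) {
  return *I;
}

static_assert(std::is_same<decltype(*std::declval<ConstIntIterator>()),
                           ConstIntIterator::reference>::value,
              "operator* must return the declared reference type");
```

If `reference` were `int &`, `derefVia` would try to bind a non-const reference to a const lvalue and fail to compile.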

Differential Revision: https://reviews.llvm.org/D151198

17 months ago[libcxx][tests] Introduce 32-bit feature and use it for stringstream gcount test
Azat Khuzhin [Tue, 23 May 2023 17:28:30 +0000 (19:28 +0200)]
[libcxx][tests] Introduce 32-bit feature and use it for stringstream gcount test

This will avoid hardcoding all unsupported targets, since even after one
more follow up fix [1], there is one more failure.

  [1]: https://reviews.llvm.org/D150886

Plus, if you want to run it locally on some target that CI does not
cover, it could also fail with a false positive, which is not good.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D151046

17 months ago[lldb][NFCI] Merge implementations of ObjectFileMachO::GetMinimumOSVersion and Object...
Alex Langford [Sat, 20 May 2023 00:51:08 +0000 (17:51 -0700)]
[lldb][NFCI] Merge implementations of ObjectFileMachO::GetMinimumOSVersion and ObjectFileMachO::GetSDKVersion

These functions do the exact same thing (even if they look slightly
different). I yanked the common implementation, cleaned it up, and
shoved it into its own function.

Differential Revision: https://reviews.llvm.org/D151120

17 months ago[mlir][Vector] Add 0-d vector support to `vector.shape_cast`
Diego Caballero [Mon, 22 May 2023 22:59:46 +0000 (22:59 +0000)]
[mlir][Vector] Add 0-d vector support to `vector.shape_cast`

This patch adds support to shape cast a vector<1x1x1...1xElementType> to
a vector<ElementType> and the other way around.

Differential Revision: https://reviews.llvm.org/D151169

17 months ago[libc][obvious] Correctly hoist mask out of the loop
Joseph Huber [Tue, 23 May 2023 17:19:56 +0000 (12:19 -0500)]
[libc][obvious] Correctly hoist mask out of the loop

Summary:
This was accidentally dropped from a previous patch following a rebase.
Fix it to where it's consistent.

Differential Revision: https://reviews.llvm.org/D151232

17 months agoCorrect stale documentation for default MSVC version numbers
Aaron Ballman [Tue, 23 May 2023 17:11:19 +0000 (13:11 -0400)]
Correct stale documentation for default MSVC version numbers

We documented -fmsc-version as defaulting to 1300 and
-fms-compatibility-version as defaulting to 1800, neither of which
were accurate. We currently default to 1920.

See MSVCToolChain::computeMSVCVersion() for details.

17 months ago[hwasan] Move RunFreeHooks call
Jin Xin Ng [Mon, 22 May 2023 19:20:55 +0000 (19:20 +0000)]
[hwasan] Move RunFreeHooks call

Ensures a subsequent call (via an external caller) to
__sanitizer_get_allocated_size via hooks will return a valid size.

This allows a faster version of __sanitizer_get_allocated_size
to be implemented, which can skip checks.

Test to ensure RunFreeHooks' call order will come with
__sanitizer_get_allocated_size_fast

Differential Revision: https://reviews.llvm.org/D151151

17 months ago[libc++][doc] Updates the tasks to do for a release.
Mark de Wever [Sun, 7 May 2023 17:50:41 +0000 (19:50 +0200)]
[libc++][doc] Updates the tasks to do for a release.

This is a followup of the review comments in D144499.

Reviewed By: ldionne, philnik, #libc

Differential Revision: https://reviews.llvm.org/D150585

17 months ago[NFC][libc++][format] Uses stringstream::view.
Mark de Wever [Wed, 17 May 2023 15:38:13 +0000 (17:38 +0200)]
[NFC][libc++][format] Uses stringstream::view.

This member has been added in D148641 so it can be used in the formatter
to avoid creating a "temporary" string.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D150791

17 months ago[libc++][modules] Adds std module cppm files.
Mark de Wever [Tue, 28 Feb 2023 19:29:26 +0000 (20:29 +0100)]
[libc++][modules] Adds std module cppm files.

This adds the cppm files of D144994. These files by themselves will do
nothing. The goal is to reduce the size of D144994 and make it easier
to review the real changes of the patch.

Implements parts of
- P2465R3 Standard Library Modules std and std.compat

Reviewed By: ldionne, ChuanqiXu, aaronmondal, #libc

Differential Revision: https://reviews.llvm.org/D151030

17 months ago[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external...
Fangrui Song [Tue, 23 May 2023 16:49:57 +0000 (09:49 -0700)]
[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data

There are two motivations.

`-fno-pic -fstack-protector -mstack-protector-guard=global` created
`__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD.
This patch allows referencing the symbol indirectly with
-fno-direct-access-external-data.

Some Linux kernel folks want
`-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard`
created `__stack_chk_guard` to be referenced directly, avoiding
R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker).
https://github.com/llvm/llvm-project/issues/60116
Why they need this isn't so clear to me.

---

Add module flag "direct-access-external-data" and set the dso_local property of
the stack protector symbol. The module flag can benefit other LLVMCodeGen
synthesized symbols that are not represented in LLVM IR.

Nowadays, with `-fno-pic` being uncommon, ideally we should set
"direct-access-external-data" when it is true. However, doing so would require
~90 clang/test tests to be updated, which is too much.

As a compromise, we set "direct-access-external-data" only when it's different
from the implied default value.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D150841

17 months ago[libc++] Updates C++2b to C++23.
Mark de Wever [Wed, 17 May 2023 15:54:53 +0000 (17:54 +0200)]
[libc++] Updates C++2b to C++23.

During the ISO C++ Committee meeting plenary session the C++23 Standard
was voted technically complete.

This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.

Note that since we use clang-tidy 16, a small work-around is needed. Clang
knows -std=c++23 but clang-tidy does not, so for now force the lit compiler
flag to use -std=c++2b instead of -std=c++23.

Reviewed By: #libc, philnik, jloser, ldionne

Differential Revision: https://reviews.llvm.org/D150795

17 months ago[lsan] Invoke hooks on realloc
Jin Xin Ng [Mon, 22 May 2023 21:13:46 +0000 (21:13 +0000)]
[lsan] Invoke hooks on realloc

Previously lsan would not invoke hooks on reallocations.
An accompanying regression test is included in sanitizer_common.

This change also moves hook calls to a location where subsequent
calls (via an external caller) to __sanitizer_get_allocated_size
via hooks will return a valid size.

This allows a faster version of __sanitizer_get_allocated_size
to be implemented, which can skip checks.

Test to ensure RunFreeHooks' call order will come with
__sanitizer_get_allocated_size_fast

Differential Revision: https://reviews.llvm.org/D151175

17 months ago[flang] Fixed managing copy-in/copy-out temps.
Slava Zakharin [Tue, 23 May 2023 16:10:26 +0000 (09:10 -0700)]
[flang] Fixed managing copy-in/copy-out temps.

There are several observations regarding the copy-in/copy-out:
  * Actual argument associated with INTENT(OUT) dummy argument that
    requires finalization (7.5.6.3 p. 7) may be read by the finalization
    function, so a copy-in is required.
  * A temporary created for the copy-in/copy-out must be destroyed
    without finalization after the call (or after the corresponding copy-out),
    otherwise, memory leaks may occur.
  * The copy-out assignment must not perform finalization for the LHS.
  * The copy-out assignment from the temporary to the actual argument
    may or may not need to initialize the LHS.

This change-set introduces new runtime methods: CopyOutAssign and
DestroyWithoutFinalization. They are called by the compiler generated
code to match the behavior described above.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D151135

17 months ago[MLIR][python bindings] use pybind C++ APIs for throwing python errors.
max [Mon, 22 May 2023 22:30:12 +0000 (17:30 -0500)]
[MLIR][python bindings] use pybind C++ APIs for throwing python errors.

Differential Revision: https://reviews.llvm.org/D151167

17 months ago[AArch64][FMV] Prevent target attribute using for multiversioning.
Pavel Iliin [Thu, 18 May 2023 11:02:04 +0000 (12:02 +0100)]
[AArch64][FMV] Prevent target attribute using for multiversioning.

On AArch64, the target_version/target_clones attributes should be used
for function multiversioning. This patch fixes a defect that allowed the
target attribute to cause multiversioning.

Differential Revision: https://reviews.llvm.org/D150867

17 months ago[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X...
Craig Topper [Tue, 23 May 2023 16:19:37 +0000 (09:19 -0700)]
[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X, -1) and smax(X, 0) to ExpandIntRes_MINMAX.

We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.

This is similar to some of the special cases we have for expanding
setcc operands.

Differential Revision: https://reviews.llvm.org/D151182

17 months ago[OpenMP][NFC] clang-format the OpenMP device runtime
Joseph Huber [Tue, 23 May 2023 16:09:16 +0000 (11:09 -0500)]
[OpenMP][NFC] clang-format the OpenMP device runtime

These files aren't fully formatted. I'm guessing this was a holdover
from when `clang-format` was totally broken for OpenMP offloading.
Format the files to be more consistent.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D151226

17 months ago[libc] More efficiently send bytes via `send_n` and `recv_n`
Joseph Huber [Fri, 19 May 2023 16:17:42 +0000 (11:17 -0500)]
[libc] More efficiently send bytes via `send_n` and `recv_n`

Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to receive.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.
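
A rough sketch of the idea, under stated assumptions: the packet is taken to be 64 bytes with an 8-byte size header (inferred from the "56 bytes" figure above), and `Packet` and `packFirst` are hypothetical names rather than the libc RPC interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed layout: a 64-byte packet whose first 8 bytes hold the total size.
struct Packet {
  std::uint64_t Size;
  unsigned char Payload[56];
};
static_assert(sizeof(Packet) == 64, "assumed packet size");

// Fills the first packet with the size plus up to 56 bytes of data and
// returns how many bytes still have to be streamed in later packets.
inline std::uint64_t packFirst(Packet &P, const void *Data,
                               std::uint64_t Size) {
  P.Size = Size;
  std::uint64_t Inline = std::min<std::uint64_t>(Size, sizeof(P.Payload));
  std::memset(P.Payload, 0, sizeof(P.Payload));
  std::memcpy(P.Payload, Data, Inline);
  return Size - Inline;
}
```

With this layout, any value of 56 bytes or fewer fits in the first packet, so no follow-up send is needed.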

Depends on D150992

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D151041

17 months ago[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes
Joseph Huber [Fri, 19 May 2023 19:58:32 +0000 (14:58 -0500)]
[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes

We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.

This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
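
The effect of the ballot can be illustrated with a CPU-side simulation. This is only a sketch: `tripsWithBallot` is a hypothetical helper, lanes are modelled as vector entries (at most 64), and no GPU intrinsics are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns how many loop trips each simulated lane executes when the loop
// condition is a ballot ("does any lane still have bytes left?").
inline std::vector<int> tripsWithBallot(const std::vector<std::uint64_t> &Sizes,
                                        std::uint64_t Chunk) {
  std::vector<std::uint64_t> Sent(Sizes.size(), 0);
  std::vector<int> Trips(Sizes.size(), 0);
  for (;;) {
    // Ballot: one bit per lane that still has data to send.
    std::uint64_t Mask = 0;
    for (std::size_t L = 0; L < Sizes.size(); ++L)
      if (Sent[L] < Sizes[L])
        Mask |= std::uint64_t(1) << L;
    if (Mask == 0)
      break; // Every lane leaves the loop on the same trip.
    for (std::size_t L = 0; L < Sizes.size(); ++L) {
      ++Trips[L]; // All lanes take the trip, even those already finished.
      if ((Mask >> L) & 1)
        Sent[L] += Chunk;
    }
  }
  return Trips;
}
```

Lanes with sizes 10, 100 and 0 and a chunk of 64 each execute two trips, so no lane reaches the next `recv` ahead of the others.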

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150992

17 months ago[libc++] Remove tests from ranges.pass.cpp which violate semantic requirements
Nikolas Klauser [Tue, 23 May 2023 15:59:13 +0000 (08:59 -0700)]
[libc++] Remove tests from ranges.pass.cpp which violate semantic requirements

This also removes some tests which we have grouped together into ranges_robust_against_*.pass.cpp tests.

Specifically, checking that
- `ranges::dangling` is returned is done in `libcxx/test/std/algorithms/ranges_robust_against_dangling.pass.cpp`
- `std::invoke` is used is done in `libcxx/test/std/algorithms/ranges_robust_against_omitting_invoke.pass.cpp`.
- implicit conversion to bool works is done in `libcxx/test/std/algorithms/ranges_robust_against_nonbool_predicates.pass.cpp`

Checking the comparison order is invalid because the `operator==` isn't symmetric.
Checking what the exact type of `operator==` is, is invalid because comparing the same object has to yield the same results if the objects are not modified.

Reviewed By: ldionne, #libc

Spies: EricWF, libcxx-commits

Differential Revision: https://reviews.llvm.org/D150588

17 months ago[libc++][NFC] Move basic_ios extern instantiations into <ios>
Nikolas Klauser [Tue, 23 May 2023 15:58:14 +0000 (08:58 -0700)]
[libc++][NFC] Move basic_ios extern instantiations into <ios>

`basic_ios` is defined in `<ios>`, so it seems weird that we declare the explicit instantiation for it in `<streambuf>`, which is technically unrelated.

Reviewed By: #libc, EricWF, ldionne

Spies: ldionne, EricWF, libcxx-commits

Differential Revision: https://reviews.llvm.org/D150912

17 months ago[HIP] Allow std::malloc in device function
Yaxun (Sam) Liu [Thu, 11 May 2023 16:57:21 +0000 (12:57 -0400)]
[HIP] Allow std::malloc in device function

D106463 caused a regression that prevents std::malloc from being
called in a device function, which is allowed with nvcc.

Basically, the standard C++ header introduces malloc into the
std namespace by using ::malloc. The device ::malloc
function needs to be declared before using ::malloc
introduces it into the std namespace.

Revert D106463 and add a test.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D150965

17 months ago[libc++][NFC] Fix whitespace problems in the files added to ignore_format.txt in...
Nikolas Klauser [Tue, 23 May 2023 15:40:47 +0000 (08:40 -0700)]
[libc++][NFC] Fix whitespace problems in the files added to ignore_format.txt in D151115

Reviewed By: ldionne, #libc, Mordante

Spies: arichardson, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151119

17 months ago[lldb][NFCI] Use llvm's libDebugInfo for DebugRanges
Felipe de Azevedo Piovezan [Thu, 11 May 2023 13:01:12 +0000 (09:01 -0400)]
[lldb][NFCI] Use llvm's libDebugInfo for DebugRanges

In an effort to unify the different dwarf parsers available in the codebase,
this commit removes LLDB's custom parsing for the `.debug_ranges` DWARF section,
instead calling into LLVM's parser.

Subsequent work should look into unifying `llvm::DWARFDebugRangeList` (whose
entries are pairs of (start, end) addresses) with `lldb::DWARFRangeList` (whose
entries are pairs of (start, length)). The lists themselves are also different
data structures, but functionally equivalent.
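
The two entry shapes carry the same information, as the following sketch suggests. `StartEnd`, `StartLength` and `toStartLength` are hypothetical stand-ins, not the LLVM or LLDB types, and the ranges are assumed to be half-open.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the two entry shapes described above.
struct StartEnd {
  std::uint64_t Start;
  std::uint64_t End; // Assumed half-open: [Start, End)
};
struct StartLength {
  std::uint64_t Start;
  std::uint64_t Length;
};

// Converts (start, end) entries into (start, length) entries.
inline std::vector<StartLength>
toStartLength(const std::vector<StartEnd> &In) {
  std::vector<StartLength> Out;
  Out.reserve(In.size());
  for (const StartEnd &R : In) {
    assert(R.Start <= R.End && "malformed range");
    Out.push_back({R.Start, R.End - R.Start});
  }
  return Out;
}
```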

Depends on D150363

Differential Revision: https://reviews.llvm.org/D150366

17 months ago[libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Tue Ly [Sun, 21 May 2023 05:27:38 +0000 (01:27 -0400)]
[libc][math] Implement double precision log1p correctly rounded to all rounding modes.

Implement double precision log1p function correctly rounded to all
rounding modes.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
  - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D151049

17 months ago[InstCombine] Add droppable users back to worklist (NFCI)
Nikita Popov [Tue, 23 May 2023 14:59:02 +0000 (16:59 +0200)]
[InstCombine] Add droppable users back to worklist (NFCI)

When sinking and users are dropped, add the using instructions
to the worklist, as they can likely be removed as well.

This should be NFC apart from worklist order effects.

17 months ago[flang][NFC] Move Array constructor inlined temp management into a utility
Jean Perier [Tue, 23 May 2023 15:00:15 +0000 (17:00 +0200)]
[flang][NFC] Move Array constructor inlined temp management into a utility

This patch moves the counter and storage management part of the array
constructor inlined temporary strategy into its own utility so that it
can be reused for the simple cases of temporary creations inside WHERE
and FORALL.

It actually fixes a bug where the counter's first value used for addressing
was "2", leading to reads/writes past the allocated storage. It seems
I ran the tests end-to-end without the HLFIR flag when previously testing
this. So this may clear some segfaults.

Differential Revision: https://reviews.llvm.org/D151106

17 months ago[flang] use greedy mlir driver for stack arrays pass
Tom Eccles [Wed, 17 May 2023 16:07:41 +0000 (16:07 +0000)]
[flang] use greedy mlir driver for stack arrays pass

In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.

In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.

This change was suggested by @jeanPerier

Differential Revision: https://reviews.llvm.org/D150853

17 months ago[libc][math] Implement double precision log2 function correctly rounded to all roundi...
Tue Ly [Thu, 11 May 2023 15:10:02 +0000 (11:10 -0400)]
[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.

Implement double precision log2 function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
177.632

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332

-- LIBC latency -- with FMA
459.751

-- LIBC latency -- without FMA
463.850
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150374

17 months ago[Hexagon] Fix safety check in moving instructions in HVC::AlignVectors
Krzysztof Parzyszek [Tue, 23 May 2023 12:46:15 +0000 (05:46 -0700)]
[Hexagon] Fix safety check in moving instructions in HVC::AlignVectors

A prior commit accidentally affected a safety check allowing aliased memory
instructions to be moved across one another.

17 months ago[NFC][CLANG] Fix static code analyzer concerns
Manna, Soumi [Tue, 23 May 2023 14:36:15 +0000 (07:36 -0700)]
[NFC][CLANG] Fix static code analyzer concerns

Reported by Static Code Analyzer Tool, Coverity:

Dereference null return value

Inside "ExprConstant.cpp" file, in <unnamed>::RecordExprEvaluator::VisitCXXStdInitializerListExpr(clang::CXXStdInitializerListExpr const *): Return value of function which returns null is dereferenced without checking.

  bool RecordExprEvaluator::VisitCXXStdInitializerListExpr(
   const CXXStdInitializerListExpr *E) {
       // returned_null: getAsConstantArrayType returns nullptr (checked 81 out of 93 times).
       //var_assigned: Assigning: ArrayType = nullptr return value from getAsConstantArrayType.
    const ConstantArrayType *ArrayType =
       Info.Ctx.getAsConstantArrayType(E->getSubExpr()->getType());
    LValue Array;
    //Condition !EvaluateLValue(E->getSubExpr(), Array, this->Info, false), taking false branch.
    if (!EvaluateLValue(E->getSubExpr(), Array, Info))
     return false;

    // Get a pointer to the first element of the array.

    //Dereference null return value (NULL_RETURNS)
    //dereference: Dereferencing a pointer that might be nullptr ArrayType when calling addArray.
    Array.addArray(Info, E, ArrayType);

This patch adds an assert for unexpected type for array initializer.
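
The pattern can be sketched outside of Clang as follows. `ArrayInfo`, `lookupArray` and `arraySize` are hypothetical names standing in for a lookup that may return null and a caller that expects it not to.

```cpp
#include <cassert>

// Hypothetical stand-in for a type lookup that may return nullptr.
struct ArrayInfo {
  unsigned Size;
};

inline const ArrayInfo *lookupArray(bool IsArray) {
  static const ArrayInfo Info{4};
  return IsArray ? &Info : nullptr;
}

inline unsigned arraySize(bool IsArray) {
  const ArrayInfo *AT = lookupArray(IsArray);
  // Document the invariant instead of dereferencing blindly.
  assert(AT && "expected a constant array type");
  return AT->Size;
}
```

The assert does not change release-build behavior; it records the invariant and catches violations in assertion-enabled builds.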

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151040

17 months ago[include-cleaner] Treat references to nested types implicit
Kadir Cetinkaya [Fri, 5 May 2023 10:39:09 +0000 (12:39 +0200)]
[include-cleaner] Treat references to nested types implicit

Differential Revision: https://reviews.llvm.org/D149948

17 months ago[InstCombine] Fix worklist management in select value equiv fold (NFCI)
Nikita Popov [Tue, 23 May 2023 14:36:11 +0000 (16:36 +0200)]
[InstCombine] Fix worklist management in select value equiv fold (NFCI)

Requeue the modified instruction.

This should be NFC apart from worklist order effects.

17 months ago[libc][math] Implement double precision log function correctly rounded to all roundin...
Tue Ly [Mon, 8 May 2023 18:03:52 +0000 (14:03 -0400)]
[libc][math] Implement double precision log function correctly rounded to all rounding modes.

Implement double precision log function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
598.306

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925

-- LIBC latency -- with FMA
455.632

-- LIBC latency -- without FMA
488.564
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150131

17 months ago[NFC][Clang] Fix Coverity bug with dereference null return value in clang::CodeGen...
Manna, Soumi [Tue, 23 May 2023 14:22:40 +0000 (07:22 -0700)]
[NFC][Clang] Fix Coverity bug with dereference null return value in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr()

Reported by Coverity:

Inside "CGExpr.cpp" file, in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr(clang::OMPArraySectionExpr const *, bool): Return value of function which returns null is dereferenced without checking.

    } else {
   //returned_null: getAsConstantArrayType returns nullptr (checked 83 out of 95 times).
   // var_assigned: Assigning: CAT = nullptr return value from getAsConstantArrayType.
      auto *CAT = C.getAsConstantArrayType(ArrayTy);
   //identity_transfer: Member function call CAT->getSize() returns an offset off CAT (this).

     // Dereference null return value (NULL_RETURNS)
     //dereference: Dereferencing a pointer that might be nullptr CAT->getSize() when calling APInt.
     ConstLength = CAT->getSize();
    }

This patch adds an assert to resolve the bug.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151137

17 months ago[InstCombine] Regenerate test checks (NFC)
Nikita Popov [Tue, 23 May 2023 14:24:41 +0000 (16:24 +0200)]
[InstCombine] Regenerate test checks (NFC)

17 months ago[InstCombine] Fix worklist management in replaceGEPIdxWithZero() fold (NFCI)
Nikita Popov [Tue, 23 May 2023 14:20:41 +0000 (16:20 +0200)]
[InstCombine] Fix worklist management in replaceGEPIdxWithZero() fold (NFCI)

Make sure the old load/store operand is queued for DCE.

This should be NFC apart from worklist order effects.

17 months ago[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc
Joseph Huber [Tue, 23 May 2023 14:16:30 +0000 (09:16 -0500)]
[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc

The AMDGPU backend has a built-in pass to lower constructors. We do this
manually in the `start.cpp` implementation so we can disable this to
keep the binaries smaller.

Differential Revision: https://reviews.llvm.org/D151213

17 months ago[OpenMP] Insert missing variable update inside loop
Jonathan Peyton [Mon, 22 May 2023 19:08:51 +0000 (14:08 -0500)]
[OpenMP] Insert missing variable update inside loop

The while loop within the task priority code was missing a necessary
variable update, which could lead to hangs if two threads collided when
both attempted to execute the compare_and_exchange.
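
A sketch of this class of bug, not the OpenMP runtime code: `casNoUpdate` and `incrementIfBelow` are hypothetical, and the helper deliberately takes the expected value by copy so that the caller has to re-read the variable after a failed exchange.

```cpp
#include <atomic>
#include <cassert>

// CAS helper that does not report the observed value back to the caller
// (Expected is passed by value), so the caller must reload after a failure.
inline bool casNoUpdate(std::atomic<int> &V, int Expected, int Desired) {
  return V.compare_exchange_strong(Expected, Desired);
}

// Increments V while it is below Limit and returns the resulting value.
inline int incrementIfBelow(std::atomic<int> &V, int Limit) {
  int Old = V.load();
  while (Old < Limit) {
    if (casNoUpdate(V, Old, Old + 1))
      return Old + 1;
    // Without this reload, a thread that lost the race would retry with a
    // stale Old forever.
    Old = V.load();
  }
  return Old;
}
```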

Fixes: https://github.com/llvm/llvm-project/issues/62867
Differential Revision: https://reviews.llvm.org/D151138

17 months ago[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Tue Ly [Sat, 6 May 2023 02:08:42 +0000 (22:08 -0400)]
[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.

Make log10 correctly rounded for non-FMA targets and improve its
performance.

Implemented fast pass and accurate pass:

**Fast Pass**:

  - Range reduction step 0: Extract exponent and mantissa
```
  x = 2^(e_x) * m_x
```
  - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
   -2^-8 <= v = r * m_x - 1 < 2^-7
  where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
  and k = trunc( (m_x - 1) * 2^7 )
```
  - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
 > P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
  - Combine the results:
```
  log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
  - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`.  Return the result if Ziv's test passed.
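
The range reduction steps 0 and 1 above can be sketched as follows. This is only an illustration under stated assumptions, not the LLVM libc implementation: `r` is computed directly instead of being read from the 128-entry table, `std::log1p` stands in for the Sollya polynomial `v + v^2 * P(v)`, no Ziv's test is performed, and the names `Reduced`, `reduceArgument` and `log10ViaReduction` are made up for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Result of range reduction steps 0 and 1 (field names follow the text above).
struct Reduced {
  int e_x;    // exponent: x = 2^e_x * m_x
  double m_x; // mantissa in [1, 2)
  int k;      // lookup index, k = trunc((m_x - 1) * 2^7)
  double r;   // r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7))
  double v;   // reduced argument, v = r * m_x - 1
};

Reduced reduceArgument(double x) {
  assert(std::isfinite(x) && x > 0.0 && "sketch handles positive finite x");
  int e = 0;
  double m = std::frexp(x, &e); // x = m * 2^e with m in [0.5, 1)
  Reduced R;
  R.e_x = e - 1;   // renormalize so that the mantissa lies in [1, 2)
  R.m_x = 2.0 * m;
  R.k = static_cast<int>(std::trunc((R.m_x - 1.0) * 128.0));
  R.r = std::ceil(256.0 * (1.0 - 1.0 / 256.0) / (1.0 + R.k / 128.0)) / 256.0;
  R.v = R.r * R.m_x - 1.0; // plain double arithmetic; may round slightly
  return R;
}

// Recombine: log10(x) ~ (e_x * log(2) - log(r) + log(1 + v)) * log10(e).
double log10ViaReduction(double x) {
  const Reduced R = reduceArgument(x);
  const double Ln2 = std::log(2.0);
  const double Log10E = 1.0 / std::log(10.0);
  return (R.e_x * Ln2 - std::log(R.r) + std::log1p(R.v)) * Log10E;
}
```

For example, `reduceArgument(8.0)` gives `e_x = 3`, `m_x = 1`, `k = 0`, `r = 255/256` and `v = -2^-8`, which is the lower end of the stated range for `v`.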

**Accurate Pass**:

  - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
  - Perform 3 more range reduction steps:
    - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
   v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
  where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
  and k = trunc( v * 2^14 + 0.5 ).
```
    - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
   v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
  where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
  and k = trunc( v * 2^21 + 0.5 ).
```
    - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
   v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
  where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
  and k = trunc( v * 2^28 + 0.5 ).
```
  - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
  > P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
  - Combine the results:
```
  log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v)
```
  - The combined results are computed using floating points of 128-bit precision.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
640.688

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354

-- LIBC latency -- with FMA
495.593

-- LIBC latency -- without FMA
504.143
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150014

17 months ago[NFC][CLANG] Fix static code analyzer concerns with dereference null return value
Manna, Soumi [Tue, 23 May 2023 14:07:09 +0000 (07:07 -0700)]
[NFC][CLANG] Fix static code analyzer concerns with dereference null return value

Reported by Static Code Analyzer Tool, Coverity:

Inside "SemaExprMember.cpp" file, in clang::Sema::BuildMemberReferenceExpr(clang::Expr *, clang::QualType, clang::SourceLocation, bool, clang::CXXScopeSpec &, clang::SourceLocation, clang::NamedDecl *, clang::DeclarationNameInfo const &, clang::TemplateArgumentListInfo const *, clang::Scope const *, clang::Sema::ActOnMemberAccessExtraArgs *): Return value of function which returns null is dereferenced without checking

  //Condition !Base, taking true branch.
  if (!Base) {
    TypoExpr *TE = nullptr;
    QualType RecordTy = BaseType;

     //Condition IsArrow, taking true branch.
     if (IsArrow) RecordTy = RecordTy->castAs<PointerType>()->getPointeeType();
     //returned_null: getAs returns nullptr (checked 279 out of 294 times).
     //Condition TemplateArgs != NULL, taking true branch.

     //Dereference null return value (NULL_RETURNS)
     //dereference: Dereferencing a pointer that might be nullptr RecordTy->getAs() when calling LookupMemberExprInRecord.
     if (LookupMemberExprInRecord(
           *this, R, nullptr, RecordTy->getAs<RecordType>(), OpLoc, IsArrow,
           SS, TemplateArgs != nullptr, TemplateKWLoc, TE))
        return ExprError();
     if (TE)
       return TE;

This patch uses castAs instead of getAs which will assert if the type doesn't match.
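
The difference between the two accessors can be illustrated with a toy model. The classes below are simplified stand-ins invented for this sketch and are not clang's `Type` hierarchy; only the behavior (a nullable result from `getAs` versus an assertion in `castAs`) is taken from the description above.

```cpp
#include <cassert>

// Toy stand-ins for a type hierarchy with a kind tag.
struct Type {
  enum Kind { RecordKind, PointerKind };
  Kind TheKind;
  explicit Type(Kind K) : TheKind(K) {}
};

struct RecordType : Type {
  RecordType() : Type(RecordKind) {}
  static bool classof(const Type *T) { return T->TheKind == RecordKind; }
};

struct PointerType : Type {
  PointerType() : Type(PointerKind) {}
  static bool classof(const Type *T) { return T->TheKind == PointerKind; }
};

// getAs: returns nullptr on a mismatch, so every caller must check.
template <typename T> const T *getAs(const Type *Ty) {
  return T::classof(Ty) ? static_cast<const T *>(Ty) : nullptr;
}

// castAs: the caller claims the type matches; a mismatch trips the assert
// instead of silently producing a null pointer that is dereferenced later.
template <typename T> const T *castAs(const Type *Ty) {
  assert(T::classof(Ty) && "castAs<T> used on a value of a different type");
  return static_cast<const T *>(Ty);
}
```

With this model, `getAs<RecordType>` on a `PointerType` object yields `nullptr`, while `castAs<RecordType>` on the same object would abort in an assertions-enabled build.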

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151130

17 months ago[Driver] Try to fix linux-ld.c test with DEFAULT_LINKER set (NFC)
Nikita Popov [Tue, 23 May 2023 14:04:24 +0000 (16:04 +0200)]
[Driver] Try to fix linux-ld.c test with DEFAULT_LINKER set (NFC)

The test fails on the clang-ppc64le-rhel build bot, which has
DEFAULT_LINKER set and an ld.lld binary in the LLVM build directory.

17 months ago[AMDGPU] Add an option to disable manual ctor / dtor lowering
Joseph Huber [Mon, 15 May 2023 12:59:53 +0000 (07:59 -0500)]
[AMDGPU] Add an option to disable manual ctor / dtor lowering

Currently AMDGPU offers extra ctor / dtor lowering by emitting a kernel
that can be called. It's possible to handle ctors and dtors using the
standard method as shown in D149340's commit message, in which case we
don't need these extra kernels as they won't be called. This patch simply
adds a way to conditionally turn off this handling if we do not want to
get extra kernels in the output.

Unrelated, but we could convert this handling to an ODR function that simply
calls the code in D149340 constructed via LLVM-IR. That would handle priority
correctly and would then be correct if not run in LTO mode.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D150565

17 months ago[ubsan][test] Remove --check-prefix=UNIQUE for x86_64-apple from e215996a2932ed7c472f...
Fangrui Song [Tue, 23 May 2023 13:59:01 +0000 (06:59 -0700)]
[ubsan][test] Remove --check-prefix=UNIQUE for x86_64-apple from e215996a2932ed7c472f4e94dc4345b30fd0c373

After switching to use a type hash instead of possibly-non-unique typeinfo
objects, we no longer have unique/non-unique distinction.

17 months ago[InstCombine] Remove dead extractelements (NFCI)
Nikita Popov [Tue, 23 May 2023 13:39:53 +0000 (15:39 +0200)]
[InstCombine] Remove dead extractelements (NFCI)

Directly remove these dead extractelement instructions, rather than
leaving them for the next InstCombine iteration to clean up.

Should be mostly NFC, apart from worklist order differences.

17 months ago[mlir][bufferization] Fix bug in findValueInReverseUseDefChain
Matthias Springer [Tue, 23 May 2023 13:22:20 +0000 (15:22 +0200)]
[mlir][bufferization] Fix bug in findValueInReverseUseDefChain

This bug was recently introduced in D143927 and manifests as a dominance violation.

Differential Revision: https://reviews.llvm.org/D151077

17 months agoSilence switch statement contains 'default' but no 'case' labels warning; NFC
Aaron Ballman [Tue, 23 May 2023 13:28:05 +0000 (09:28 -0400)]
Silence switch statement contains 'default' but no 'case' labels warning; NFC

These are showing up in MSVC builds.

17 months ago[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Dinar Temirbulatov [Tue, 23 May 2023 13:24:01 +0000 (13:24 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve

Fixing last commit by adding actual change to AArch64TargetTransformInfo.cpp

Differential Revision: https://reviews.llvm.org/D150336

17 months agoAdd StringRef::consumeInteger(APInt)
Thomas Preud'homme [Tue, 16 May 2023 09:24:57 +0000 (09:24 +0000)]
Add StringRef::consumeInteger(APInt)

This will be required to add arbitrary-precision support to
FileCheck's numeric variables and expressions. Note: as with
getAsInteger(), this does not support negative values. If there is
interest in that, it can be added in a separate patch.
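
A minimal sketch of the "consume" semantics, under loud assumptions: it uses `std::string_view` and a 64-bit result rather than `StringRef` and `APInt`, it handles radix 10 only, and `consumeUnsigned` is a hypothetical name. It follows the convention of returning `true` on failure and leaving the input untouched in that case:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical simplified analogue: parse leading decimal digits of Str,
// drop them from the front of Str, and store the value in Result.
// Returns true on failure, in which case Str and Result are unchanged.
bool consumeUnsigned(std::string_view &Str, std::uint64_t &Result) {
  std::size_t I = 0;
  std::uint64_t Value = 0;
  while (I < Str.size() && Str[I] >= '0' && Str[I] <= '9') {
    const std::uint64_t Digit = static_cast<std::uint64_t>(Str[I] - '0');
    // Overflow check; an arbitrary-precision result type would not need it.
    if (Value > (UINT64_MAX - Digit) / 10)
      return true;
    Value = Value * 10 + Digit;
    ++I;
  }
  if (I == 0)
    return true; // no digits; this also rejects a leading '-'
  Str.remove_prefix(I);
  Result = Value;
  return false;
}
```

Parsing `"123abc"` this way yields 123 and leaves `"abc"` in the view, while `"-5"` is rejected without modifying the view.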

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D150878

17 months ago[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Dinar Temirbulatov [Tue, 23 May 2023 12:58:19 +0000 (12:58 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve

We noticed some runtime performance improvements by disabling maximising
bandwidth for streaming compatible sve.

Differential Revision: https://reviews.llvm.org/D150336

17 months agoTurn unreachable error into assert
Thomas Preud'homme [Tue, 16 May 2023 09:22:01 +0000 (09:22 +0000)]
Turn unreachable error into assert

Function valueFromStringRepr() throws an error on missing 0x prefix when
parsing a number string into a value. However, getWildcardRegex() already
ensures that only text with the 0x prefix will match and be parsed,
making that error-throwing code dead code. This commit turns the code
into an assert and removes the unit tests exercising it
accordingly.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D150797

17 months agosilence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064
Krasimir Georgiev [Tue, 23 May 2023 12:46:25 +0000 (12:46 +0000)]
silence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064

17 months ago[AArch64][FMV] Fix name mangling.
Pavel Iliin [Wed, 17 May 2023 17:14:01 +0000 (18:14 +0100)]
[AArch64][FMV] Fix name mangling.

Put features into function version name in increasing priority order.

Differential Revision: https://reviews.llvm.org/D150800

17 months ago[KnownBits] Return zero instead of unknown for always poison shifts
Nikita Popov [Mon, 15 May 2023 15:56:02 +0000 (17:56 +0200)]
[KnownBits] Return zero instead of unknown for always poison shifts

For always poison shifts, any KnownBits return value is valid.
Currently we return unknown, but returning zero is generally more
profitable. We had some code in ValueTracking that tried to do this,
but was actually dead code.
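
The idea can be sketched with a toy 8-bit model. This is not LLVM's `KnownBits` class; `KnownBits8` and `knownBitsForShl` are names made up for the illustration, and only a constant shift amount is modeled:

```cpp
#include <cassert>
#include <cstdint>

// Toy model: bit i set in Zero (One) means bit i of the value is known 0 (1).
struct KnownBits8 {
  std::uint8_t Zero;
  std::uint8_t One;
};

// Known bits of (LHS << ShiftAmt) for an 8-bit value.
KnownBits8 knownBitsForShl(KnownBits8 LHS, unsigned ShiftAmt) {
  assert((LHS.Zero & LHS.One) == 0 && "a bit cannot be known 0 and known 1");
  // Shifting by >= the bit width is always poison, so any answer is valid.
  // Claim "all bits known zero" rather than "nothing known".
  if (ShiftAmt >= 8)
    return {0xFF, 0x00};
  const unsigned LowMask = (1u << ShiftAmt) - 1u; // shifted-in bits are zero
  KnownBits8 Result;
  Result.Zero = static_cast<std::uint8_t>((LHS.Zero << ShiftAmt) | LowMask);
  Result.One = static_cast<std::uint8_t>(LHS.One << ShiftAmt);
  return Result;
}
```

Returning a fully known zero gives later folds a constant to work with, whereas "unknown" gives them nothing.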

Differential Revision: https://reviews.llvm.org/D150648

17 months ago[clangd] Store paths as requested in PreambleStatCache
Kadir Cetinkaya [Tue, 23 May 2023 07:47:57 +0000 (09:47 +0200)]
[clangd] Store paths as requested in PreambleStatCache

Underlying FS can store different file names inside the stat response
(e.g. symlinks resolved, absolute paths, dots removed). But we store path names
as requested inside the preamble,
https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1635.

This improves cache hit rates from ~30% to 90% in a build system that uses
symlinks.

Differential Revision: https://reviews.llvm.org/D151185

17 months agoRevert "[clang] Add tests for CWG issues 977, 1482, 2516"
Vlad Serebrennikov [Tue, 23 May 2023 12:29:14 +0000 (15:29 +0300)]
Revert "[clang] Add tests for CWG issues 977, 1482, 2516"

This reverts commit 85452b5f9b5aba5bdf0259b7f0d7400362f95535.

17 months ago[PostOrderIterator] Use SmallVector for RPOT blocks (NFC)
Nikita Popov [Tue, 23 May 2023 12:17:50 +0000 (14:17 +0200)]
[PostOrderIterator] Use SmallVector for RPOT blocks (NFC)

17 months agoReland "[flang] Handle array constants of any rank"
Leandro Lupori [Tue, 16 May 2023 13:06:13 +0000 (13:06 +0000)]
Reland "[flang] Handle array constants of any rank"

Fixes gfortran test-suite regression.

Differential Revision: https://reviews.llvm.org/D150686

17 months agoReapply [PostOrderIterator] Store end iterator (NFC)
Nikita Popov [Mon, 22 May 2023 13:04:18 +0000 (15:04 +0200)]
Reapply [PostOrderIterator] Store end iterator (NFC)

Replace structured bindings with std::get, as they apparently
break the modules build.

-----

Store the end iterator on the VisitStack, instead of recomputing
it every time, as doing so is not free.
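
A sketch of the technique on a plain adjacency-list graph, assuming nothing about the real `po_iterator` internals beyond what is stated above: each stack entry carries the node, the next child to visit, and the end iterator, and the entries are read with `std::get` rather than structured bindings. `Graph` and `postOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency lists

// Iterative post-order traversal of the nodes reachable from Root.
std::vector<int> postOrder(const Graph &G, int Root) {
  assert(Root >= 0 && static_cast<std::size_t>(Root) < G.size());
  using ChildIt = std::vector<int>::const_iterator;
  // Entry: (node, next child to visit, end of children). The end iterator
  // is stored once at push time instead of being recomputed on every step.
  std::vector<std::tuple<int, ChildIt, ChildIt>> VisitStack;
  std::vector<bool> Visited(G.size(), false);
  std::vector<int> Order;

  Visited[Root] = true;
  VisitStack.emplace_back(Root, G[Root].begin(), G[Root].end());
  while (!VisitStack.empty()) {
    auto &Top = VisitStack.back();
    ChildIt &It = std::get<1>(Top);
    const ChildIt End = std::get<2>(Top);
    if (It != End) {
      const int Child = *It++; // advance the stored iterator first
      if (!Visited[Child]) {
        Visited[Child] = true;
        // May reallocate; Top and It are not used after this point.
        VisitStack.emplace_back(Child, G[Child].begin(), G[Child].end());
      }
    } else {
      Order.push_back(std::get<0>(Top)); // all children done: emit node
      VisitStack.pop_back();
    }
  }
  return Order;
}
```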

17 months agoAArch64: emit synchronous unwind for Darwin arm64_32 platforms too.
Tim Northover [Tue, 23 May 2023 12:14:21 +0000 (13:14 +0100)]
AArch64: emit synchronous unwind for Darwin arm64_32 platforms too.

Since we're checking the triple directly, arm64_32 shows up differently and was
still getting an attempt at asynchronous unwind that added lots more
`__eh_frame` entries instead of the compact format.

17 months ago[clang][dataflow] Use `Strict` accessors in comma operator and no-op cast.
Martin Braenne [Wed, 17 May 2023 13:27:35 +0000 (13:27 +0000)]
[clang][dataflow] Use `Strict` accessors in comma operator and no-op cast.

This patch is part of the ongoing migration to strict handling of value
categories (see https://discourse.llvm.org/t/70086 for details).

Depends On D150775

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D150776

17 months ago[Driver] Fix test for use of ld from devtoolset (NFC)
Nikita Popov [Tue, 23 May 2023 09:45:24 +0000 (11:45 +0200)]
[Driver] Fix test for use of ld from devtoolset (NFC)

The test added in c5fe10f365247c3dd9416b7ec8bad73a60b5946e contains
some typos in the check lines, due to which it never actually
verified what was intended.

Fix the test by adding the required input tree and adjusting the
check lines appropriately.

Differential Revision: https://reviews.llvm.org/D151195

17 months ago[gn build] Port 5111286f06e1
LLVM GN Syncbot [Tue, 23 May 2023 11:39:12 +0000 (11:39 +0000)]
[gn build] Port 5111286f06e1

17 months ago[lli] Export the MinGW chkstk function from the lli executable
Martin Storsjö [Sat, 13 May 2023 23:04:22 +0000 (23:04 +0000)]
[lli] Export the MinGW chkstk function from the lli executable

This allows all ExecutionEngine tests to pass in MinGW build configurations.

Differential Revision: https://reviews.llvm.org/D150555

17 months agoReland "Reland [clang-repl] Introduce Value to capture expression results"
Jun Zhang [Tue, 23 May 2023 10:09:04 +0000 (18:09 +0800)]
Reland "Reland [clang-repl] Introduce Value to capture expression results"

This reverts commit 094ab4781262b6cb49d57b0ecdf84b047c879295.

Reland with changing `ParseAndExecute` to `Parse` in
`Interpreter::create`. This avoids creating a JIT instance every time, even
if we don't really need one.

This should fix failures like https://lab.llvm.org/buildbot/#/builders/38/builds/11955

The original reverted patch also caused the GN bot to fail on M1. (https://lab.llvm.org/buildbot/#/builders/38/builds/11955)
However, we can't reproduce it so let's reland it and see what happens.
See discussions here: https://reviews.llvm.org/rGd71a4e02277a64a9dece591cdf2b34f15c3b19a0

17 months ago[Coverity] Constant variable guards dead code.
Luo, Yuanke [Tue, 23 May 2023 11:16:47 +0000 (19:16 +0800)]
[Coverity] Constant variable guards dead code.

17 months agoRevert "[Sema] `setInvalidDecl` for error deduction declaration"
Tom Weaver [Tue, 23 May 2023 10:44:51 +0000 (11:44 +0100)]
Revert "[Sema] `setInvalidDecl` for error deduction declaration"

This reverts commit eb5902ffc97163338bab95d2fd84a953ee76e96f.

Caused buildbot failures on:
  https://lab.llvm.org/buildbot/#/builders/139/builds/41248
  https://lab.llvm.org/buildbot/#/builders/216/builds/21637

17 months agoFix MSVC "ignoring return value of function declared with 'nodiscard' attribute"...
Simon Pilgrim [Tue, 23 May 2023 10:40:33 +0000 (11:40 +0100)]
Fix MSVC "ignoring return value of function declared with 'nodiscard' attribute" warning. NFC.

17 months ago[llvm][github] Allow github links in /cherry-pick actions
Timm Bäder [Tue, 23 May 2023 08:23:10 +0000 (10:23 +0200)]
[llvm][github] Allow github links in /cherry-pick actions

Differential Revision: https://reviews.llvm.org/D151191

17 months ago[Mips] Avoid RegScavenger::forward in Mips16InstrInfo
Jay Foad [Mon, 15 May 2023 11:01:23 +0000 (12:01 +0100)]
[Mips] Avoid RegScavenger::forward in Mips16InstrInfo

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150557

17 months ago[gn build] Port 0b91de5ea32d
LLVM GN Syncbot [Tue, 23 May 2023 10:01:28 +0000 (10:01 +0000)]
[gn build] Port 0b91de5ea32d

17 months ago[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcas...
Simon Pilgrim [Fri, 19 May 2023 21:45:14 +0000 (22:45 +0100)]
[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds

This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if they can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed because of type-width mismatches or multiuse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526

17 months ago[clang] Add tests for CWG issues 977, 1482, 2516
Vlad Serebrennikov [Tue, 23 May 2023 09:50:09 +0000 (12:50 +0300)]
[clang] Add tests for CWG issues 977, 1482, 2516

CWG977 focuses on the point of /completeness/ of enums. Wording provided in CWG1482.
CWG1482 and CWG2516 focus on locus (point) of /declaration/. Wording provided in CWG2516.

Reviewed By: #clang-language-wg, shafik

Differential Revision: https://reviews.llvm.org/D151042

17 months ago[clang] Add test for CWG2213
Vlad Serebrennikov [Tue, 23 May 2023 09:43:47 +0000 (12:43 +0300)]
[clang] Add test for CWG2213

[[https://wg21.link/p1787 | P1787]]: CWG2213 is resolved by allowing an elaborated-type-specifier to contain a simple-template-id without friend.
Wording: see changes to [dcl.type.elab]/1.

The gist of the issue is that forward declaration of partial class template specialization was disallowed.

Reviewed By: #clang-language-wg, shafik

Differential Revision: https://reviews.llvm.org/D151032

17 months ago[PowerPC] Avoid RegScavenger::forward in PPCFrameLowering
Jay Foad [Mon, 15 May 2023 11:01:36 +0000 (12:01 +0100)]
[PowerPC] Avoid RegScavenger::forward in PPCFrameLowering

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150558

17 months ago[libc] Display unit test runtime for hosted environments
Guillaume Chatelet [Tue, 23 May 2023 09:14:28 +0000 (09:14 +0000)]
[libc] Display unit test runtime for hosted environments

With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots.

Top offender can be tracked with a bit of scripting (spoiler alert, mem function sweep tests are in the top ones)
```
ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' '
```

Unfortunately this doesn't work for hermetic tests since `clock` is unavailable.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151097

17 months ago[RISCV] Make zfbfmin imply the F extension
Alex Bradbury [Tue, 23 May 2023 09:09:22 +0000 (10:09 +0100)]
[RISCV] Make zfbfmin imply the F extension

Our current approach is that if one extension requires another, we make
LLVM treat it as implied. My initial zfbfmin patch failed to do this for
the F extension (documented as a requirement of zfbfmin). This patch
fixes that.

Differential Revision: https://reviews.llvm.org/D151096

17 months ago[SimplifyCFG] add nsw on SwitchToLookupTable index calculation on MinCaseVal subtraction
khei4 [Thu, 18 May 2023 03:49:10 +0000 (12:49 +0900)]
[SimplifyCFG] add nsw on SwitchToLookupTable index calculation on MinCaseVal subtraction
Differential Revision: https://reviews.llvm.org/D146903
Reviewed By: nikic

17 months ago[PowerPC] Simplify fp-to-int store optimization
Qiu Chaofan [Tue, 23 May 2023 08:40:54 +0000 (16:40 +0800)]
[PowerPC] Simplify fp-to-int store optimization

On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.

This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141473

17 months ago[Clang] Support more stdio builtins
Qiu Chaofan [Tue, 23 May 2023 08:22:32 +0000 (16:22 +0800)]
[Clang] Support more stdio builtins

Add more builtins for stdio functions as in GCC, along with their
mutations under IEEE float128 ABI.

Reviewed By: tuliom

Differential Revision: https://reviews.llvm.org/D150087

17 months ago[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC
Joachim Jenke [Tue, 23 May 2023 08:26:13 +0000 (10:26 +0200)]
[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC

This patch properly marks the support level for libomp tests when testing with
GCC.

Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatible with GCC.

Tests pass now with GCC 10, 11, 12

17 months ago[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC
Joachim Jenke [Tue, 23 May 2023 08:22:50 +0000 (10:22 +0200)]
[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC

Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.

OMPT Tests pass now with GCC 10, 11, 12

17 months ago[OpenMP][Tests][NFC] Fix libarcher tests for GCC
Joachim Jenke [Tue, 23 May 2023 07:39:27 +0000 (09:39 +0200)]
[OpenMP][Tests][NFC] Fix libarcher tests for GCC

TSan in GCC filters duplicates less aggressively. With 8 threads we can
expect reports for up to 7 pairs of data race in some tests.

Tests pass now with GCC 10, 11, 12

17 months ago[AArch64] Avoid RegScavenger::forward in AArch64SpeculationHardening
Jay Foad [Mon, 15 May 2023 12:09:05 +0000 (13:09 +0100)]
[AArch64] Avoid RegScavenger::forward in AArch64SpeculationHardening

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150560

17 months ago[AArch64] Add implicit uses to speculative hardening MIR test
Jay Foad [Mon, 15 May 2023 12:05:14 +0000 (13:05 +0100)]
[AArch64] Add implicit uses to speculative hardening MIR test

A couple of tests were setting liveins to add fake live registers, but
that only works if you track liveness forwards. Add some implicit uses
too, so that it also works if you track liveness backwards.

Differential Revision: https://reviews.llvm.org/D150559

17 months agoAMDGPU/GlobalISel: Update test
Matt Arsenault [Tue, 23 May 2023 08:13:36 +0000 (09:13 +0100)]
AMDGPU/GlobalISel: Update test

17 months agoRevert "[clang][ExprConstant] fix __builtin_object_size for flexible array members"
Krasimir Georgiev [Tue, 23 May 2023 07:59:38 +0000 (07:59 +0000)]
Revert "[clang][ExprConstant] fix __builtin_object_size for flexible array members"

This reverts commit 57c5c1ab2a188b7962c9de5ac0f95e3c7441940a.

Causes an assertion failure: https://reviews.llvm.org/D150892#4363080

17 months agoReapply "InstSimplify: Pass AssumptionCache to isKnownNeverInfinity"
Matt Arsenault [Mon, 22 May 2023 09:44:32 +0000 (10:44 +0100)]
Reapply "InstSimplify: Pass AssumptionCache to isKnownNeverInfinity"

This reverts commit 481191b0a8318e55ce467e983d78d2141e827db1.

17 months agoReapply "SimplifyLibCalls: Pass AssumptionCache to isKnownNeverInfinity"
Matt Arsenault [Fri, 19 May 2023 09:25:38 +0000 (10:25 +0100)]
Reapply "SimplifyLibCalls: Pass AssumptionCache to isKnownNeverInfinity"

This reverts commit b357f379c81811409348dd0e0273a248b055bb7a.

17 months agoReapply "ValueTracking: Delete body of isKnownNeverInfinity"
Matt Arsenault [Fri, 19 May 2023 08:15:22 +0000 (09:15 +0100)]
Reapply "ValueTracking: Delete body of isKnownNeverInfinity"

This reverts commit d1dc3e13a791fe1b99a341406b5dafec64750cb1.

200bdd9e869e2982f54923b05e54c117fd33f5d9 should have fixed
the reported regression.

17 months agoReapply "InstSimplify: Use isKnownNeverInfOrNaN"
Matt Arsenault [Mon, 22 May 2023 09:42:58 +0000 (10:42 +0100)]
Reapply "InstSimplify: Use isKnownNeverInfOrNaN"

This reverts commit f55224735ed39af16bccd7ff67b734fd758db6fc.

17 months agoAMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8
Matt Arsenault [Sun, 7 May 2023 10:13:28 +0000 (11:13 +0100)]
AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8

If we have legal f16 instructions but no f16 med3, we can save
one instruction by expanding out the min/max sequence compared
to casting to f32 and casting back.

17 months ago[flang][hlfir] Hoist forall bounds computation when possible
Jean Perier [Tue, 23 May 2023 07:17:36 +0000 (09:17 +0200)]
[flang][hlfir] Hoist forall bounds computation when possible

When inner forall bound computations do not depend on previous
forall indices, they can be hoisted.
This is possible because:
 - bound computations are required to be pure (so evaluating them only
   once is possible).
 - If the bound computation depends on a value previously assigned, the
   forall scheduling analysis creates a different run for it: the
   assignment impacting the bounds value is not part of the current loop
   nest.

The reason this optimization is done at that point and not as part of
generic loop hoisting optimization is that having all the loop
bound computations hoisted will allow allocating simple temporary
storage. The number of iterations can be pre-computed and used as the
extent for the temporary.

Differential Revision: https://reviews.llvm.org/D151110

17 months ago[Sema] `setInvalidDecl` for error deduction declaration
Congcong Cai [Tue, 23 May 2023 07:07:03 +0000 (09:07 +0200)]
[Sema] `setInvalidDecl` for error deduction declaration

Fixed: https://github.com/llvm/llvm-project/issues/62408
`setInvalidDecl` for invalid `CXXDeductionGuideDecl` to
avoid crashes during semantic analysis.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D149516

17 months ago[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
pvanhout [Mon, 15 May 2023 09:23:09 +0000 (11:23 +0200)]
[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits

This is basically a partial revert of https://reviews.llvm.org/D145586 ( fd1d60873fdc )

D145586 was originally introduced to help with SWDEV-363662, and it did, but
it also caused a 25% drop in performance in
some MIOpen benchmarks where, it seems,
functions are inlined more conservatively.

This patch restores the pre-D145586 behavior
for PromoteAlloca: functions with a non-entry CC
have a 32 VGPRs threshold, but only if the function
is not marked with "alwaysinline".

A good amount of AMDGPU code makes use of
the AMDGPUAlwaysInline pass anyway, so in our
backend "alwaysinline" seems very common.

This change does not affect SWDEV-363662 (the motivating issue for introducing D145586).

Fixes SWDEV-399519

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D150551

17 months ago[mlir] Apply ClangTidy performance finding (NFC)
Adrian Kuegel [Tue, 23 May 2023 06:52:53 +0000 (08:52 +0200)]
[mlir] Apply ClangTidy performance finding (NFC)

17 months ago[NFC] [C++20] [Modules] Add a test
Chuanqi Xu [Tue, 23 May 2023 06:31:05 +0000 (14:31 +0800)]
[NFC] [C++20] [Modules] Add a test

Add a test from https://github.com/llvm/llvm-project/issues/59999. It is
always good to have more tests.