Rahul Kayaith [Tue, 23 May 2023 17:40:00 +0000 (13:40 -0400)]
[mlir][python] Bump min pybind11 version to 2.9.0
2.9.0 was released on December 28, 2021, and some following changes
require at least this version.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D150247
Aaron Ballman [Tue, 23 May 2023 17:38:35 +0000 (13:38 -0400)]
Add a reminder to update docs when updating default; NFC
Markus Böck [Tue, 23 May 2023 17:11:43 +0000 (19:11 +0200)]
[llvm][ADT] Fix invalid `reference` type of depth-first, breadth-first and post order iterators
C++'s iterator concept requires operator* to return the same type as is specified by the iterator's reference type. This functionality is especially important for older generic code that did not yet make use of auto.
An example from within LLVM is iterator_adaptor_base which uses the reference type of the iterator it is wrapping as its return type for operator* (this class is used as base for a lot of other functionality like filter iterators and so on).
Using any of the graph traversal iterators listed above with it would previously fail to compile due to reference being non-const while operator* returned a const reference.
This patch fixes that by correctly specifying reference and using it as the return type of operator* explicitly to prevent further issues in the future.
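A minimal sketch of the requirement described above, not the code from the patch. `NodeIterator`, `ForwardingAdaptor`, and `firstNode` are hypothetical names used only for illustration; the adaptor plays the role of a pre-`auto` wrapper that takes its return type from the wrapped iterator's `reference`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical traversal iterator over node ids; not an LLVM class.
class NodeIterator {
  const std::vector<int> *Nodes;
  std::size_t Index;

public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  // Must match what operator* returns. Declaring `int &` here while
  // operator* returns `const int &` is the kind of mismatch described above.
  using reference = const int &;

  NodeIterator(const std::vector<int> &N, std::size_t I)
      : Nodes(&N), Index(I) {}

  reference operator*() const { return (*Nodes)[Index]; }
  NodeIterator &operator++() {
    ++Index;
    return *this;
  }
  bool operator==(const NodeIterator &O) const { return Index == O.Index; }
  bool operator!=(const NodeIterator &O) const { return Index != O.Index; }
};

// Pre-`auto` style wrapper that trusts the wrapped iterator's `reference`.
template <typename It> struct ForwardingAdaptor {
  It Wrapped;
  typename std::iterator_traits<It>::reference operator*() const {
    return *Wrapped;
  }
};

static_assert(std::is_same<decltype(*std::declval<NodeIterator>()),
                           NodeIterator::reference>::value,
              "operator* must return the declared reference type");

// Reads the first node through the adaptor.
int firstNode(const std::vector<int> &Nodes) {
  assert(!Nodes.empty() && "need at least one node");
  ForwardingAdaptor<NodeIterator> A{NodeIterator(Nodes, 0)};
  return *A;
}
```

If `reference` were declared as `int &`, the adaptor's `return *Wrapped;` would try to bind a const reference to a non-const one and fail to compile.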
Differential Revision: https://reviews.llvm.org/D151198
Azat Khuzhin [Tue, 23 May 2023 17:28:30 +0000 (19:28 +0200)]
[libcxx][tests] Introduce 32-bit feature and use it for stringstream gcount test
This will avoid hardcoding all unsupported targets, since even after one
more follow up fix [1], there is one more failure.
[1]: https://reviews.llvm.org/D150886
Plus, if you want to run it locally on some target that CI does not
cover, it could also false-positively fail, which is not good.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D151046
Alex Langford [Sat, 20 May 2023 00:51:08 +0000 (17:51 -0700)]
[lldb][NFCI] Merge implementations of ObjectFileMachO::GetMinimumOSVersion and ObjectFileMachO::GetSDKVersion
These functions do the exact same thing (even if they look slightly
different). I yanked the common implementation, cleaned it up, and
shoved it into its own function.
Differential Revision: https://reviews.llvm.org/D151120
Diego Caballero [Mon, 22 May 2023 22:59:46 +0000 (22:59 +0000)]
[mlir][Vector] Add 0-d vector support to `vector.shape_cast`
This patch adds support to shape cast a vector<1x1x1...1xElementType> to
a vector<ElementType> and the other way around.
Differential Revision: https://reviews.llvm.org/D151169
Joseph Huber [Tue, 23 May 2023 17:19:56 +0000 (12:19 -0500)]
[libc][obvious] Correctly hoist mask out of the loop
Summary:
This was accidentally dropped from a previous patch following a rebase.
Fix it to where it's consistent.
Differential Revision: https://reviews.llvm.org/D151232
Aaron Ballman [Tue, 23 May 2023 17:11:19 +0000 (13:11 -0400)]
Correct stale documentation for default MSVC version numbers
We documented -fmsc-version as defaulting to 1300 and
-fms-compatibility-version as defaulting to 1800, neither of which
were accurate. We currently default to 1920.
See MSVCToolChain::computeMSVCVersion() for details.
Jin Xin Ng [Mon, 22 May 2023 19:20:55 +0000 (19:20 +0000)]
[hwasan] Move RunFreeHooks call
Ensures a subsequent call (via an external caller) to
__sanitizer_get_allocated_size via hooks will return a valid size.
This allows a faster version of __sanitizer_get_allocated_size
to be implemented, which can skip checks.
Test to ensure RunFreeHooks' call order will come with
__sanitizer_get_allocated_size_fast
Differential Revision: https://reviews.llvm.org/D151151
Mark de Wever [Sun, 7 May 2023 17:50:41 +0000 (19:50 +0200)]
[libc++][doc] Updates the tasks to do for a release.
This is a followup of the review comments in D144499.
Reviewed By: ldionne, philnik, #libc
Differential Revision: https://reviews.llvm.org/D150585
Mark de Wever [Wed, 17 May 2023 15:38:13 +0000 (17:38 +0200)]
[NFC][libc++][format] Uses stringstream::view.
This member has been added in D148641 so it can be used in the formatter
to avoid creating a "temporary" string.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D150791
Mark de Wever [Tue, 28 Feb 2023 19:29:26 +0000 (20:29 +0100)]
[libc++][modules] Adds std module cppm files.
This adds the cppm files of D144994. These files by themselves will do
nothing. The goal is to reduce the size of D144994 and make it easier
to review the real changes of the patch.
Implements parts of
- P2465R3 Standard Library Modules std and std.compat
Reviewed By: ldionne, ChuanqiXu, aaronmondal, #libc
Differential Revision: https://reviews.llvm.org/D151030
Fangrui Song [Tue, 23 May 2023 16:49:57 +0000 (09:49 -0700)]
[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data
There are two motivations.
`-fno-pic -fstack-protector -mstack-protector-guard=global` created
`__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD.
This patch allows referencing the symbol indirectly with
-fno-direct-access-external-data.
Some Linux kernel folks want
`-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard`
created `__stack_chk_guard` to be referenced directly, avoiding
R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker).
https://github.com/llvm/llvm-project/issues/60116
Why they need this isn't so clear to me.
---
Add module flag "direct-access-external-data" and set the dso_local property of
the stack protector symbol. The module flag can benefit other LLVMCodeGen
synthesized symbols that are not represented in LLVM IR.
Nowadays, with `-fno-pic` being uncommon, ideally we should set
"direct-access-external-data" when it is true. However, doing so would require
~90 clang/test tests to be updated, which is too much.
As a compromise, we set "direct-access-external-data" only when it's different
from the implied default value.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D150841
Mark de Wever [Wed, 17 May 2023 15:54:53 +0000 (17:54 +0200)]
[libc++] Updates C++2b to C++23.
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technically complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Note that since we use clang-tidy 16 a small work-around is needed. Clang
knows -std=c++23 but clang-tidy does not, so for now force the lit compiler
flag to use -std=c++2b instead of -std=c++23.
Reviewed By: #libc, philnik, jloser, ldionne
Differential Revision: https://reviews.llvm.org/D150795
Jin Xin Ng [Mon, 22 May 2023 21:13:46 +0000 (21:13 +0000)]
[lsan] Invoke hooks on realloc
Previously lsan would not invoke hooks on reallocations.
An accompanying regression test is included in sanitizer_common.
This change also moves hook calls to a location where subsequent
calls (via an external caller) to __sanitizer_get_allocated_size
via hooks will return a valid size.
This allows a faster version of __sanitizer_get_allocated_size
to be implemented, which can skip checks.
Test to ensure RunFreeHooks' call order will come with
__sanitizer_get_allocated_size_fast
Differential Revision: https://reviews.llvm.org/D151175
Slava Zakharin [Tue, 23 May 2023 16:10:26 +0000 (09:10 -0700)]
[flang] Fixed managing copy-in/copy-out temps.
There are several observations regarding the copy-in/copy-out:
* Actual argument associated with INTENT(OUT) dummy argument that
requires finalization (7.5.6.3 p. 7) may be read by the finalization
function, so a copy-in is required.
* A temporary created for the copy-in/copy-out must be destroyed
without finalization after the call (or after the corresponding copy-out),
otherwise, memory leaks may occur.
* The copy-out assignment must not perform finalization for the LHS.
* The copy-out assignment from the temporary to the actual argument
may or may not need to initialize the LHS.
This change-set introduces new runtime methods: CopyOutAssign and
DestroyWithoutFinalization. They are called by the compiler generated
code to match the behavior described above.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D151135
max [Mon, 22 May 2023 22:30:12 +0000 (17:30 -0500)]
[MLIR][python bindings] use pybind C++ APIs for throwing python errors.
Differential Revision: https://reviews.llvm.org/D151167
Pavel Iliin [Thu, 18 May 2023 11:02:04 +0000 (12:02 +0100)]
[AArch64][FMV] Prevent target attribute using for multiversioning.
On AArch64, the target_version/target_clones attributes should be used
for function multiversioning. This patch fixes a defect that allowed the
target attribute to trigger multiversioning.
Differential Revision: https://reviews.llvm.org/D150867
Craig Topper [Tue, 23 May 2023 16:19:37 +0000 (09:19 -0700)]
[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X, -1) and smax(X, 0) to ExpandIntRes_MINMAX.
We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.
This is similar to some of the special cases we have for expanding
setcc operands.
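As a rough illustration of why these two cases admit a simpler low half, here is a sketch on a 64-bit value split into 32-bit halves. It is not the code from the patch; `Halves`, `split`, `join`, `smaxZero`, `sminMinusOne`, and `checkExpansion` are hypothetical helpers. The observation is that the sign of the high half alone decides the result: for smax(X, 0) the low half is 0 when X is negative and the original low half otherwise, and for smin(X, -1) it is the original low half when X is negative and all ones otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A 64-bit integer as it looks after type expansion into two 32-bit halves.
struct Halves {
  std::int32_t Hi;
  std::uint32_t Lo;
};

Halves split(std::int64_t X) {
  std::uint64_t U = static_cast<std::uint64_t>(X);
  return {static_cast<std::int32_t>(static_cast<std::uint32_t>(U >> 32)),
          static_cast<std::uint32_t>(U)};
}

std::int64_t join(Halves H) {
  std::uint64_t U =
      (static_cast<std::uint64_t>(static_cast<std::uint32_t>(H.Hi)) << 32) |
      H.Lo;
  return static_cast<std::int64_t>(U);
}

// smax(X, 0): the sign of Hi decides everything.
Halves smaxZero(Halves X) {
  return {std::max<std::int32_t>(X.Hi, 0), X.Hi < 0 ? 0u : X.Lo};
}

// smin(X, -1): negative inputs pass through, the rest become all ones.
Halves sminMinusOne(Halves X) {
  return {std::min<std::int32_t>(X.Hi, -1), X.Hi < 0 ? X.Lo : 0xFFFFFFFFu};
}

// Compares the half-wise expansion against the full-width operation.
void checkExpansion(std::int64_t X) {
  assert(join(smaxZero(split(X))) == std::max<std::int64_t>(X, 0));
  assert(join(sminMinusOne(split(X))) == std::min<std::int64_t>(X, -1));
}
```

The low half needs only a select on `Hi < 0`, with no comparison of the low halves themselves.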
Differential Revision: https://reviews.llvm.org/D151182
Joseph Huber [Tue, 23 May 2023 16:09:16 +0000 (11:09 -0500)]
[OpenMP][NFC] clang-format the OpenMP device runtime
These files aren't fully formatted. I'm guessing this was a holdover
from when `clang-format` was totally broken for OpenMP offloading.
Format the files to be more consistent.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D151226
Joseph Huber [Fri, 19 May 2023 16:17:42 +0000 (11:17 -0500)]
[libc] More efficiently send bytes via `send_n` and `recv_n`
Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to receive.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.
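The packing idea can be sketched on the host as follows. This is an illustration only: the 64-byte packet and 8-byte size header are assumptions inferred from the "56 bytes" figure above, and `Packet`, `sendN`, and `recvN` are hypothetical stand-ins, not the libc RPC interface.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t PacketBytes = 64;                    // assumed size
constexpr std::size_t HeaderBytes = sizeof(std::uint64_t); // size field
constexpr std::size_t FirstPayload = PacketBytes - HeaderBytes; // 56 bytes

using Packet = std::array<unsigned char, PacketBytes>;

// Packs the size plus the first 56 data bytes into packet 0, then streams
// the remainder in full 64-byte packets.
std::vector<Packet> sendN(const std::string &Data) {
  std::vector<Packet> Packets(1, Packet{});
  std::uint64_t Size = Data.size();
  std::memcpy(Packets[0].data(), &Size, HeaderBytes);
  std::size_t Sent = std::min(Data.size(), FirstPayload);
  std::memcpy(Packets[0].data() + HeaderBytes, Data.data(), Sent);
  while (Sent < Data.size()) {
    Packet P{};
    std::size_t Chunk = std::min(Data.size() - Sent, PacketBytes);
    std::memcpy(P.data(), Data.data() + Sent, Chunk);
    Packets.push_back(P);
    Sent += Chunk;
  }
  return Packets;
}

// Mirror image of sendN: reads the size, then the payload.
std::string recvN(const std::vector<Packet> &Packets) {
  assert(!Packets.empty() && "the first packet always carries the size");
  std::uint64_t Size = 0;
  std::memcpy(&Size, Packets[0].data(), HeaderBytes);
  std::string Data(static_cast<std::size_t>(Size), '\0');
  std::size_t Got = std::min(Data.size(), FirstPayload);
  std::memcpy(&Data[0], Packets[0].data() + HeaderBytes, Got);
  for (std::size_t I = 1; Got < Data.size(); ++I) {
    std::size_t Chunk = std::min(Data.size() - Got, PacketBytes);
    std::memcpy(&Data[Got], Packets[I].data(), Chunk);
    Got += Chunk;
  }
  return Data;
}
```

Under these assumptions any message of at most 56 bytes fits in a single packet, which is the case the paragraph above calls out for small function arguments.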
Depends on D150992
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D151041
Joseph Huber [Fri, 19 May 2023 19:58:32 +0000 (14:58 -0500)]
[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes
We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.
This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
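The loop structure can be imitated on the host to show the idea. This is a simulation under stated assumptions, not device code: `ballot` and `roundsUntilAllDone` are hypothetical helpers, lanes are modelled as vector entries, and at most 64 lanes are supported.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side stand-in for a GPU ballot: bit I is set if lane I's predicate
// holds.
std::uint64_t ballot(const std::vector<bool> &Pred) {
  std::uint64_t Mask = 0;
  for (std::size_t I = 0; I < Pred.size(); ++I)
    if (Pred[I])
      Mask |= std::uint64_t(1) << I;
  return Mask;
}

// Every lane keeps taking part in the loop until no lane has data left, so
// lanes with short (or empty) buffers cannot run ahead of the others.
// Returns the number of lock-step rounds executed.
std::size_t roundsUntilAllDone(const std::vector<std::size_t> &Sizes,
                               std::size_t Chunk) {
  assert(Sizes.size() <= 64 && Chunk > 0);
  std::vector<std::size_t> Sent(Sizes.size(), 0);
  std::size_t Rounds = 0;
  for (;;) {
    std::vector<bool> More(Sizes.size());
    for (std::size_t L = 0; L < Sizes.size(); ++L)
      More[L] = Sent[L] < Sizes[L];
    if (ballot(More) == 0)
      break; // all lanes agree that nothing is left
    for (std::size_t L = 0; L < Sizes.size(); ++L)
      if (More[L]) // finished lanes still iterate but transfer nothing
        Sent[L] = std::min(Sizes[L], Sent[L] + Chunk);
    ++Rounds;
  }
  return Rounds;
}
```

The exit condition depends on the combined vote rather than on each lane's own size, which is what keeps divergent lanes in step.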
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150992
Nikolas Klauser [Tue, 23 May 2023 15:59:13 +0000 (08:59 -0700)]
[libc++] Remove tests from ranges.pass.cpp which violate semantic requirements
This also removes some tests which we have grouped together into robust_from_*.pass.cpp tests.
Specifically, checking that
- `ranges::dangling` is returned is done in `libcxx/test/std/algorithms/ranges_robust_against_dangling.pass.cpp`
- `std::invoke` is used is done in `libcxx/test/std/algorithms/ranges_robust_against_omitting_invoke.pass.cpp`.
- implicit conversion to bool works is done in `libcxx/test/std/algorithms/ranges_robust_against_nonbool_predicates.pass.cpp`
Checking the comparison order is invalid because the `operator==` isn't symmetric.
Checking what the exact type of `operator==` is, is invalid because comparing the same object has to yield the same results if the objects are not modified.
Reviewed By: ldionne, #libc
Spies: EricWF, libcxx-commits
Differential Revision: https://reviews.llvm.org/D150588
Nikolas Klauser [Tue, 23 May 2023 15:58:14 +0000 (08:58 -0700)]
[libc++][NFC] Move basic_ios extern instantiations into <ios>
`basic_ios` is defined in `<ios>`, so it seems weird that we declare the explicit instantiation for it in `<streambuf>`, which is technically unrelated.
Reviewed By: #libc, EricWF, ldionne
Spies: ldionne, EricWF, libcxx-commits
Differential Revision: https://reviews.llvm.org/D150912
Yaxun (Sam) Liu [Thu, 11 May 2023 16:57:21 +0000 (12:57 -0400)]
[HIP] Allow std::malloc in device function
D106463 caused a regression that prevents std::malloc from being
called in a device function, which is allowed with nvcc.
Basically, the standard C++ header introduces malloc into the
std namespace with a using-declaration for ::malloc. The device
::malloc function needs to be declared before ::malloc
is introduced into the std namespace.
Revert D106463 and add a test.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D150965
Nikolas Klauser [Tue, 23 May 2023 15:40:47 +0000 (08:40 -0700)]
[libc++][NFC] Fix whitespace problems in the files added to ignore_format.txt in D151115
Reviewed By: ldionne, #libc, Mordante
Spies: arichardson, Mordante, libcxx-commits
Differential Revision: https://reviews.llvm.org/D151119
Felipe de Azevedo Piovezan [Thu, 11 May 2023 13:01:12 +0000 (09:01 -0400)]
[lldb][NFCI] Use llvm's libDebugInfo for DebugRanges
In an effort to unify the different dwarf parsers available in the codebase,
this commit removes LLDB's custom parsing for the `.debug_ranges` DWARF section,
instead calling into LLVM's parser.
Subsequent work should look into unifying `llvm::DWARFDebugRangeList` (whose
entries are pairs of (start, end) addresses) with `lldb::DWARFRangeList` (whose
entries are pairs of (start, length)). The lists themselves are also different
data structures, but functionally equivalent.
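The two entry layouts carry the same information, as the following sketch shows. `StartEndEntry`, `StartLengthEntry`, and the two conversion functions are hypothetical types for illustration, not the LLVM or LLDB classes named above.

```cpp
#include <cassert>
#include <cstdint>

// Entry stored as a half-open address interval [Start, End).
struct StartEndEntry {
  std::uint64_t Start;
  std::uint64_t End;
};

// Entry stored as a start address plus a byte length.
struct StartLengthEntry {
  std::uint64_t Start;
  std::uint64_t Length;
};

StartLengthEntry toStartLength(StartEndEntry E) {
  assert(E.End >= E.Start && "malformed range");
  return {E.Start, E.End - E.Start};
}

StartEndEntry toStartEnd(StartLengthEntry E) {
  return {E.Start, E.Start + E.Length};
}
```

Converting in either direction and back yields the original entry, so unifying the two is a question of picking one representation rather than of losing data.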
Depends on D150363
Differential Revision: https://reviews.llvm.org/D150366
Tue Ly [Sun, 21 May 2023 05:27:38 +0000 (01:27 -0400)]
[libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Implement double precision log1p function correctly rounded to all
rounding modes.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
- Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
760.444
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880
-- LIBC latency -- with FMA
711.837
-- LIBC latency -- without FMA
764.317
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D151049
Nikita Popov [Tue, 23 May 2023 14:59:02 +0000 (16:59 +0200)]
[InstCombine] Add droppable users back to worklist (NFCI)
When sinking and users are dropped, add the using instructions
to the worklist, as they can likely be removed as well.
This should be NFC apart from worklist order effects.
Jean Perier [Tue, 23 May 2023 15:00:15 +0000 (17:00 +0200)]
[flang][NFC] Move Array constructor inlined temp management into a utility
This patch moves the counter and storage management part of the array
constructor inlined temporary strategy into its own utility so that it
can be reused for the simple cases of temporary creations inside WHERE
and FORALL.
It actually fixes a bug where the counter first value used for addressing
was "2" leading to read/write after the allocated storage... It seems
I ran the tests end-to-end without the HLFIR flag when previously testing
this. So this may clear some segfaults.
Differential Revision: https://reviews.llvm.org/D151106
Tom Eccles [Wed, 17 May 2023 16:07:41 +0000 (16:07 +0000)]
[flang] use greedy mlir driver for stack arrays pass
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.
In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.
This change was suggested by @jeanPerier
Differential Revision: https://reviews.llvm.org/D150853
Tue Ly [Thu, 11 May 2023 15:10:02 +0000 (11:10 -0400)]
[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.
See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
177.632
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332
-- LIBC latency -- with FMA
459.751
-- LIBC latency -- without FMA
463.850
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150374
Krzysztof Parzyszek [Tue, 23 May 2023 12:46:15 +0000 (05:46 -0700)]
[Hexagon] Fix safety check in moving instructions in HVC::AlignVectors
A prior commit accidentally affected a safety check allowing aliased memory
instructions to be moved across one another.
Manna, Soumi [Tue, 23 May 2023 14:36:15 +0000 (07:36 -0700)]
[NFC][CLANG] Fix static code analyzer concerns
Reported by Static Code Analyzer Tool, Coverity:
Dereference null return value
Inside "ExprConstant.cpp" file, in <unnamed>::RecordExprEvaluator::VisitCXXStdInitializerListExpr(clang::CXXStdInitializerListExpr const *): Return value of function which returns null is dereferenced without checking.
bool RecordExprEvaluator::VisitCXXStdInitializerListExpr(
const CXXStdInitializerListExpr *E) {
// returned_null: getAsConstantArrayType returns nullptr (checked 81 out of 93 times).
//var_assigned: Assigning: ArrayType = nullptr return value from getAsConstantArrayType.
const ConstantArrayType *ArrayType =
Info.Ctx.getAsConstantArrayType(E->getSubExpr()->getType());
LValue Array;
//Condition !EvaluateLValue(E->getSubExpr(), Array, this->Info, false), taking false branch.
if (!EvaluateLValue(E->getSubExpr(), Array, Info))
return false;
// Get a pointer to the first element of the array.
//Dereference null return value (NULL_RETURNS)
//dereference: Dereferencing a pointer that might be nullptr ArrayType when calling addArray.
Array.addArray(Info, E, ArrayType);
This patch adds an assert for unexpected type for array initializer.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D151040
Kadir Cetinkaya [Fri, 5 May 2023 10:39:09 +0000 (12:39 +0200)]
[include-cleaner] Treat references to nested types implicit
Differential Revision: https://reviews.llvm.org/D149948
Nikita Popov [Tue, 23 May 2023 14:36:11 +0000 (16:36 +0200)]
[InstCombine] Fix worklist management in select value equiv fold (NFCI)
Requeue the modified instruction.
This should be NFC apart from worklist order effects.
Tue Ly [Mon, 8 May 2023 18:03:52 +0000 (14:03 -0400)]
[libc][math] Implement double precision log function correctly rounded to all rounding modes.
Implement double precision log function correctly rounded to all
rounding modes.
See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
598.306
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925
-- LIBC latency -- with FMA
455.632
-- LIBC latency -- without FMA
488.564
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150131
Manna, Soumi [Tue, 23 May 2023 14:22:40 +0000 (07:22 -0700)]
[NFC][Clang] Fix Coverity bug with dereference null return value in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr()
Reported by Coverity:
Inside "CGExpr.cpp" file, in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr(clang::OMPArraySectionExpr const *, bool): Return value of function which returns null is dereferenced without checking.
} else {
//returned_null: getAsConstantArrayType returns nullptr (checked 83 out of 95 times).
// var_assigned: Assigning: CAT = nullptr return value from getAsConstantArrayType.
auto *CAT = C.getAsConstantArrayType(ArrayTy);
//identity_transfer: Member function call CAT->getSize() returns an offset off CAT (this).
// Dereference null return value (NULL_RETURNS)
//dereference: Dereferencing a pointer that might be nullptr CAT->getSize() when calling APInt.
ConstLength = CAT->getSize();
}
This patch adds an assert to resolve the bug.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D151137
Nikita Popov [Tue, 23 May 2023 14:24:41 +0000 (16:24 +0200)]
[InstCombine] Regenerate test checks (NFC)
Nikita Popov [Tue, 23 May 2023 14:20:41 +0000 (16:20 +0200)]
[InstCombine] Fix worklist management in replaceGEPIdxWithZero() fold (NFCI)
Make sure the old load/store operand is queued for DCE.
This should be NFC apart from worklist order effects.
Joseph Huber [Tue, 23 May 2023 14:16:30 +0000 (09:16 -0500)]
[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc
The AMDGPU backend has a built-in pass to lower constructors. We do this
manually in the `start.cpp` implementation so we can disable this to
keep the binaries smaller.
Differential Revision: https://reviews.llvm.org/D151213
Jonathan Peyton [Mon, 22 May 2023 19:08:51 +0000 (14:08 -0500)]
[OpenMP] Insert missing variable update inside loop
The while loop within the task priority code was missing a necessary
variable update, which could lead to hangs if two threads collided when
both attempted to execute the compare_and_exchange.
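The general shape of this kind of bug can be sketched as below. This is a generic illustration, not the OpenMP runtime code: `compareAndStore` and `raiseTo` are hypothetical, and the compare-and-store helper is assumed to take the expected value by copy, so the caller must refresh its local after a failed attempt.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for a compare-and-store primitive that takes the expected value
// by copy, so the caller's local is NOT refreshed on failure.
bool compareAndStore(std::atomic<int> &Target, int Expected, int Desired) {
  return Target.compare_exchange_strong(Expected, Desired);
}

// Raises Target to at least Candidate and returns the last value observed
// before any store made by this call.
int raiseTo(std::atomic<int> &Target, int Candidate) {
  int Observed = Target.load();
  while (Observed < Candidate &&
         !compareAndStore(Target, Observed, Candidate)) {
    // Without this reload, a lost race leaves Observed stale and every
    // later attempt compares against a value that is no longer there.
    Observed = Target.load();
  }
  return Observed;
}
```

When two threads race, the loser's compare fails; re-reading the shared value inside the loop is what lets it either retry with fresh data or stop.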
Fixes: https://github.com/llvm/llvm-project/issues/62867
Differential Revision: https://reviews.llvm.org/D151138
Tue Ly [Sat, 6 May 2023 02:08:42 +0000 (22:08 -0400)]
[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Make log10 correctly rounded for non-FMA targets and improve its
performance.
Implemented fast pass and accurate pass:
**Fast Pass**:
- Range reduction step 0: Extract exponent and mantissa
```
x = 2^(e_x) * m_x
```
- Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
-2^-8 <= v = r * m_x - 1 < 2^-7
where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
and k = trunc( (m_x - 1) * 2^7 )
```
- Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
> P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
- Combine the results:
```
log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
- Perform additive Ziv's test with errors bounded by `P_ERR * v^2`. Return the result if Ziv's test passed.
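Range reduction steps 0 and 1 of the fast pass can be tried out with the sketch below. It is an illustration under assumptions, not the libc implementation: `Reduced` and `reduceStep01` are hypothetical names, `r` is computed directly from the formula above instead of being read from a lookup table, and the input is assumed to be positive and finite.

```cpp
#include <cassert>
#include <cmath>

struct Reduced {
  int Ex;    // exponent e_x
  double Mx; // mantissa m_x in [1, 2)
  int K;     // table index k in [0, 127]
  double R;  // multiplier r, a multiple of 2^-8
  double V;  // reduced argument v = r * m_x - 1
};

Reduced reduceStep01(double X) {
  assert(X > 0.0 && std::isfinite(X));
  int E = 0;
  double F = std::frexp(X, &E); // X = F * 2^E with F in [0.5, 1)
  Reduced Out;
  Out.Ex = E - 1;
  Out.Mx = 2.0 * F;                                  // X = 2^Ex * Mx
  Out.K = static_cast<int>((Out.Mx - 1.0) * 128.0);  // trunc((m_x - 1) * 2^7)
  // r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7))
  Out.R = std::ceil(256.0 * (1.0 - 1.0 / 256.0) / (1.0 + Out.K / 128.0)) /
          256.0;
  Out.V = Out.R * Out.Mx - 1.0;
  return Out;
}
```

For example, `reduceStep01(8.0)` gives `e_x = 3`, `m_x = 1`, `k = 0`, `r = 255/256`, and `v = -2^-8`, which sits exactly on the lower end of the stated interval.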
**Accurate Pass**:
- Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
- Perform 3 more range reduction steps:
- Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
and k = trunc( v * 2^14 + 0.5 ).
```
- Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
and k = trunc( v * 2^21 + 0.5 ).
```
- Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
and k = trunc( v * 2^28 + 0.5 ).
```
- Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
> P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
- Combine the results:
```
log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v)
```
- The combined results are computed using floating points of 128-bit precision.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
640.688
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354
-- LIBC latency -- with FMA
495.593
-- LIBC latency -- without FMA
504.143
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150014
Manna, Soumi [Tue, 23 May 2023 14:07:09 +0000 (07:07 -0700)]
[NFC][CLANG] Fix static code analyzer concerns with dereference null return value
Reported by Static Code Analyzer Tool, Coverity:
Inside "SemaExprMember.cpp" file, in clang::Sema::BuildMemberReferenceExpr(clang::Expr *, clang::QualType, clang::SourceLocation, bool, clang::CXXScopeSpec &, clang::SourceLocation, clang::NamedDecl *, clang::DeclarationNameInfo const &, clang::TemplateArgumentListInfo const *, clang::Scope const *, clang::Sema::ActOnMemberAccessExtraArgs *): Return value of function which returns null is dereferenced without checking
//Condition !Base, taking true branch.
if (!Base) {
TypoExpr *TE = nullptr;
QualType RecordTy = BaseType;
//Condition IsArrow, taking true branch.
if (IsArrow) RecordTy = RecordTy->castAs<PointerType>()->getPointeeType();
//returned_null: getAs returns nullptr (checked 279 out of 294 times).
//Condition TemplateArgs != NULL, taking true branch.
//Dereference null return value (NULL_RETURNS)
//dereference: Dereferencing a pointer that might be nullptr RecordTy->getAs() when calling LookupMemberExprInRecord.
if (LookupMemberExprInRecord(
*this, R, nullptr, RecordTy->getAs<RecordType>(), OpLoc, IsArrow,
SS, TemplateArgs != nullptr, TemplateKWLoc, TE))
return ExprError();
if (TE)
return TE;
This patch uses castAs instead of getAs which will assert if the type doesn't match.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D151130
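The difference between a checked query and an asserting cast, as relied on by the patch above, can be sketched in isolation. This is a minimal standalone illustration, not clang's implementation: `Type`, `RecordType`, `PointerType` and `fieldCount` here are hypothetical stand-ins, and `dynamic_cast` is used in place of clang's own type machinery.

```cpp
#include <cassert>

// Hypothetical stand-ins for a type hierarchy; these are not clang's classes.
struct Type {
  virtual ~Type() = default;

  // Checked query: returns nullptr when the dynamic type does not match,
  // so every caller has to test the result before dereferencing it.
  template <typename T> const T *getAs() const {
    return dynamic_cast<const T *>(this);
  }

  // Asserting cast: the caller states that the type is known to match.
  // A mismatch trips the assert instead of silently yielding nullptr.
  template <typename T> const T *castAs() const {
    const T *Result = dynamic_cast<const T *>(this);
    assert(Result && "castAs<T>() called on an incompatible type");
    return Result;
  }
};

struct RecordType : Type {
  int NumFields = 0;
};

struct PointerType : Type {};

// The caller knows Ty is a record, so castAs documents and enforces that.
int fieldCount(const Type &Ty) {
  return Ty.castAs<RecordType>()->NumFields;
}
```

With this shape, a static analyzer no longer sees an unchecked possibly-null pointer at the call site, and a violated assumption fails loudly in assert-enabled builds.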
Nikita Popov [Tue, 23 May 2023 14:04:24 +0000 (16:04 +0200)]
[Driver] Try to fix linux-ld.c test with DEFAULT_LINKER set (NFC)
The test fails on the clang-ppc64le-rhel build bot, which has
DEFAULT_LINKER set and an ld.lld binary in the LLVM build directory.
Joseph Huber [Mon, 15 May 2023 12:59:53 +0000 (07:59 -0500)]
[AMDGPU] Add an option to disable manual ctor / dtor lowering
Currently AMDGPU offers extra ctor / dtor lowering by emitting a kernel
that can be called. It's possible to handle ctors and dtors using the
standard method as shown in D149340's commit message. In which case we
don't need these extra kernels as they won't be called. This patch simply
adds a way to conditionally turn off this handling if we do not want to
get extra kernels in the output.
Unrelated, but we could convert this handling to an ODR function that simply
calls the code in D149340 constructed via LLVM-IR. That would handle priority
correctly and would then be correct if not run in LTO mode.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D150565
Fangrui Song [Tue, 23 May 2023 13:59:01 +0000 (06:59 -0700)]
[ubsan][test] Remove --check-prefix=UNIQUE for x86_64-apple from
e215996a2932ed7c472f4e94dc4345b30fd0c373
After switching to use a type hash instead of possibly-non-unique typeinfo
objects, we no longer have a unique/non-unique distinction.
Nikita Popov [Tue, 23 May 2023 13:39:53 +0000 (15:39 +0200)]
[InstCombine] Remove dead extractelements (NFCI)
Directly remove these dead extractelement instructions, rather than
leaving them for the next InstCombine iteration to clean up.
Should be mostly NFC, apart from worklist order differences.
Matthias Springer [Tue, 23 May 2023 13:22:20 +0000 (15:22 +0200)]
[mlir][bufferization] Fix bug in findValueInReverseUseDefChain
This bug was recently introduced in D143927 and manifests as a dominance violation.
Differential Revision: https://reviews.llvm.org/D151077
Aaron Ballman [Tue, 23 May 2023 13:28:05 +0000 (09:28 -0400)]
Silence switch statement contains 'default' but no 'case' labels warning; NFC
These are showing up in MSVC builds.
Dinar Temirbulatov [Tue, 23 May 2023 13:24:01 +0000 (13:24 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Fixing last commit by adding actual change to AArch64TargetTransformInfo.cpp
Differential Revision: https://reviews.llvm.org/D150336
Thomas Preud'homme [Tue, 16 May 2023 09:24:57 +0000 (09:24 +0000)]
Add StringRef::consumeInteger(APInt)
This will be required to add arbitrary precision support to
FileCheck's numeric variables and expressions. Note: as per
getAsInteger(), this does not support negative values. If there is
interest in that, it can be added in a separate patch.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D150878
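The "consume" semantics described above (parse a leading unsigned integer, then drop it from the front of the string) can be illustrated without LLVM. This is only a sketch under assumptions: `consumeUnsigned` is a hypothetical helper over `std::string_view` and `uint64_t`, it returns true on failure in the style of the LLVM string helpers, and its overflow branch shows the fixed-width limit that an arbitrary-precision result type avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical standalone analogue of a consumeInteger-style helper.
// Returns true on failure and leaves Str untouched in that case.
bool consumeUnsigned(std::string_view &Str, uint64_t &Result) {
  uint64_t Value = 0;
  std::size_t Pos = 0;
  while (Pos < Str.size() && Str[Pos] >= '0' && Str[Pos] <= '9') {
    uint64_t Digit = static_cast<uint64_t>(Str[Pos] - '0');
    // Value * 10 + Digit would overflow 64 bits; a fixed-width result
    // has to give up here, an arbitrary-precision one would not.
    if (Value > (UINT64_MAX - Digit) / 10)
      return true;
    Value = Value * 10 + Digit;
    ++Pos;
  }
  // No digits at all (this includes a leading '-'): nothing to consume.
  if (Pos == 0)
    return true;
  Result = Value;
  Str.remove_prefix(Pos); // drop the parsed digits from the front
  return false;
}
```

For example, consuming from `"123abc"` yields 123 and leaves `"abc"`, while `"-5"` is rejected and left unchanged, matching the no-negative-values note above.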
Dinar Temirbulatov [Tue, 23 May 2023 12:58:19 +0000 (12:58 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
We noticed some runtime performance improvements by disabling maximising
bandwidth for streaming compatible sve.
Differential Revision: https://reviews.llvm.org/D150336
Thomas Preud'homme [Tue, 16 May 2023 09:22:01 +0000 (09:22 +0000)]
Turn unreachable error into assert
Function valueFromStringRepr() throws an error on missing 0x prefix when
parsing a number string into a value. However, getWildcardRegex() already
ensures that only text with the 0x prefix will match and be parsed,
making that error throwing code dead code. This commit turns the code
into an assert and removes the unit tests exercising that error
accordingly.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D150797
Krasimir Georgiev [Tue, 23 May 2023 12:46:25 +0000 (12:46 +0000)]
silence an unused variable warning after
8064caf83fb166b709bfe0e7641c5181341cb064
Pavel Iliin [Wed, 17 May 2023 17:14:01 +0000 (18:14 +0100)]
[AArch64][FMV] Fix name mangling.
Put features into function version name in increasing priority order.
Differential Revision: https://reviews.llvm.org/D150800
Nikita Popov [Mon, 15 May 2023 15:56:02 +0000 (17:56 +0200)]
[KnownBits] Return zero instead of unknown for always poison shifts
For always poison shifts, any KnownBits return value is valid.
Currently we return unknown, but returning zero is generally more
profitable. We had some code in ValueTracking that tried to do this,
but was actually dead code.
Differential Revision: https://reviews.llvm.org/D150648
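The idea above can be sketched on a toy known-bits representation. This is an illustration under assumptions, not LLVM's `KnownBits` code: `Known32` and `knownShl` are hypothetical names, the width is fixed at 32 bits, and only a constant shift amount is handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for a known-bits record.
struct Known32 {
  uint32_t Zero = 0; // bits known to be 0
  uint32_t One = 0;  // bits known to be 1
};

// Known bits of (LHS << Amt) for a constant shift amount.
Known32 knownShl(Known32 LHS, unsigned Amt) {
  Known32 Result;
  if (Amt >= 32) {
    // The shift is always poison, so any answer is valid. Claiming
    // "every bit is zero" gives later folds more to work with than
    // returning "nothing known".
    Result.Zero = ~uint32_t{0};
    return Result;
  }
  // The low Amt bits are shifted in as zeros.
  uint32_t LowMask = (uint32_t{1} << Amt) - 1;
  Result.Zero = (LHS.Zero << Amt) | LowMask;
  Result.One = LHS.One << Amt;
  return Result;
}
```

The in-range branch is the ordinary transfer function; only the out-of-range branch reflects the choice discussed in the commit message.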
Kadir Cetinkaya [Tue, 23 May 2023 07:47:57 +0000 (09:47 +0200)]
[clangd] Store paths as requested in PreambleStatCache
Underlying FS can store different file names inside the stat response
(e.g. symlinks resolved, absolute paths, dots removed). But we store path names
as requested inside the preamble,
https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1635.
This improves cache hit rates from ~30% to 90% in a build system that uses
symlinks.
Differential Revision: https://reviews.llvm.org/D151185
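Why the cache key matters can be shown with a small standalone sketch. It assumes a plain map-backed cache; `StatResult` and `StatCache` are hypothetical types for illustration and do not mirror clangd's `PreambleStatCache` interface.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stat result. Name is whatever the file system reports back,
// which may differ from the requested path (symlinks resolved, dots removed).
struct StatResult {
  std::string Name;
  long Size = 0;
};

class StatCache {
  std::map<std::string, StatResult> Entries;

public:
  // Key the entry by the path the caller asked for, not by S.Name, so a
  // later lookup with the same requested path hits.
  void update(const std::string &RequestedPath, StatResult S) {
    Entries[RequestedPath] = std::move(S);
  }

  std::optional<StatResult> lookup(const std::string &RequestedPath) const {
    auto It = Entries.find(RequestedPath);
    if (It == Entries.end())
      return std::nullopt;
    return It->second;
  }
};
```

If the entry were keyed by the resolved name instead, a request through a symlinked path would miss even though the file had already been stat'ed.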
Vlad Serebrennikov [Tue, 23 May 2023 12:29:14 +0000 (15:29 +0300)]
Revert "[clang] Add tests for CWG issues 977, 1482, 2516"
This reverts commit
85452b5f9b5aba5bdf0259b7f0d7400362f95535.
Nikita Popov [Tue, 23 May 2023 12:17:50 +0000 (14:17 +0200)]
[PostOrderIterator] Use SmallVector for RPOT blocks (NFC)
Leandro Lupori [Tue, 16 May 2023 13:06:13 +0000 (13:06 +0000)]
Reland "[flang] Handle array constants of any rank"
Fixes gfortran test-suite regression.
Differential Revision: https://reviews.llvm.org/D150686
Nikita Popov [Mon, 22 May 2023 13:04:18 +0000 (15:04 +0200)]
Reapply [PostOrderIterator] Store end iterator (NFC)
Replace structured bindings with std::get, as they apparently
break the modules build.
-----
Store the end iterator on the VisitStack, instead of recomputing
it every time, as doing so is not free.
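A minimal sketch of the technique, assuming a simple adjacency-list graph rather than LLVM's `GraphTraits`: each stack entry carries the node, the next child to visit, and the cached end iterator, and the fields are read with `std::get` instead of structured bindings. `Graph`, `ChildIt` and `postOrder` are hypothetical names.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency list (assumed shape)
using ChildIt = std::vector<int>::const_iterator;

// Iterative post-order traversal from Root.
std::vector<int> postOrder(const Graph &G, int Root) {
  std::vector<int> Order;
  std::vector<bool> Visited(G.size(), false);
  // Entry layout: node, next child to visit, cached end iterator.
  std::vector<std::tuple<int, ChildIt, ChildIt>> VisitStack;

  Visited[Root] = true;
  VisitStack.emplace_back(Root, G[Root].begin(), G[Root].end());

  while (!VisitStack.empty()) {
    auto &Entry = VisitStack.back();
    ChildIt &Cur = std::get<1>(Entry);
    const ChildIt End = std::get<2>(Entry); // stored once, not recomputed
    if (Cur == End) {
      // All children handled: emit the node and pop it.
      Order.push_back(std::get<0>(Entry));
      VisitStack.pop_back();
      continue;
    }
    int Child = *Cur++;
    if (!Visited[Child]) {
      Visited[Child] = true;
      // Entry and Cur may dangle after this push; they are not used again.
      VisitStack.emplace_back(Child, G[Child].begin(), G[Child].end());
    }
  }
  return Order;
}
```

For a vector the end iterator is cheap either way; the saving matters for graph types where computing the end of the child range involves real work.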
Tim Northover [Tue, 23 May 2023 12:14:21 +0000 (13:14 +0100)]
AArch64: emit synchronous unwind for Darwin arm64_32 platforms too.
Since we're checking the triple directly, arm64_32 shows up differently and was
still getting an attempt at asynchronous unwind that added lots more
`__eh_frame` entries instead of the compact format.
Martin Braenne [Wed, 17 May 2023 13:27:35 +0000 (13:27 +0000)]
[clang][dataflow] Use `Strict` accessors in comma operator and no-op cast.
This patch is part of the ongoing migration to strict handling of value
categories (see https://discourse.llvm.org/t/70086 for details).
Depends On D150775
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D150776
Nikita Popov [Tue, 23 May 2023 09:45:24 +0000 (11:45 +0200)]
[Driver] Fix test for use of ld from devtoolset (NFC)
The test added in
c5fe10f365247c3dd9416b7ec8bad73a60b5946e contains
some typos in the check lines, due to which it never actually
verified what was intended.
Fix the test by adding the required input tree and adjusting the
check lines appropriately.
Differential Revision: https://reviews.llvm.org/D151195
LLVM GN Syncbot [Tue, 23 May 2023 11:39:12 +0000 (11:39 +0000)]
[gn build] Port
5111286f06e1
Martin Storsjö [Sat, 13 May 2023 23:04:22 +0000 (23:04 +0000)]
[lli] Export the MinGW chkstk function from the lli executable
This allows all ExecutionEngine tests to pass in MinGW build configurations.
Differential Revision: https://reviews.llvm.org/D150555
Jun Zhang [Tue, 23 May 2023 10:09:04 +0000 (18:09 +0800)]
Reland "Reland [clang-repl] Introduce Value to capture expression results"
This reverts commit
094ab4781262b6cb49d57b0ecdf84b047c879295.
Reland with changing `ParseAndExecute` to `Parse` in
`Interpreter::create`. This avoids creating a JIT instance every time even
if we don't really need one.
This should fix failures like https://lab.llvm.org/buildbot/#/builders/38/builds/11955
The original reverted patch also caused GN bot failures on M1. (https://lab.llvm.org/buildbot/#/builders/38/builds/11955)
However, we can't reproduce it so let's reland it and see what happens.
See discussions here: https://reviews.llvm.org/rGd71a4e02277a64a9dece591cdf2b34f15c3b19a0
Luo, Yuanke [Tue, 23 May 2023 11:16:47 +0000 (19:16 +0800)]
[Coverity] Constant variable guards dead code.
Tom Weaver [Tue, 23 May 2023 10:44:51 +0000 (11:44 +0100)]
Revert "[Sema] `setInvalidDecl` for error deduction declaration"
This reverts commit
eb5902ffc97163338bab95d2fd84a953ee76e96f.
Caused buildbot failures on:
https://lab.llvm.org/buildbot/#/builders/139/builds/41248
https://lab.llvm.org/buildbot/#/builders/216/builds/21637
Simon Pilgrim [Tue, 23 May 2023 10:40:33 +0000 (11:40 +0100)]
Fix MSVC "ignoring return value of function declared with 'nodiscard' attribute" warning. NFC.
Timm Bäder [Tue, 23 May 2023 08:23:10 +0000 (10:23 +0200)]
[llvm][github] Allow github links in /cherry-pick actions
Differential Revision: https://reviews.llvm.org/D151191
Jay Foad [Mon, 15 May 2023 11:01:23 +0000 (12:01 +0100)]
[Mips] Avoid RegScavenger::forward in Mips16InstrInfo
RegScavenger::backward is preferred because it does not rely on accurate
kill flags.
Differential Revision: https://reviews.llvm.org/D150557
LLVM GN Syncbot [Tue, 23 May 2023 10:01:28 +0000 (10:01 +0000)]
[gn build] Port
0b91de5ea32d
Simon Pilgrim [Fri, 19 May 2023 21:45:14 +0000 (22:45 +0100)]
[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine if they can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed due to type-width mismatches or multiuse limitations which have been removed in later passes.
As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.
This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.
The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.
Differential Revision: https://reviews.llvm.org/D150526
Vlad Serebrennikov [Tue, 23 May 2023 09:50:09 +0000 (12:50 +0300)]
[clang] Add tests for CWG issues 977, 1482, 2516
CWG977 focuses on the point of /completeness/ of enums. Wording provided in CWG1482.
CWG1482 and CWG2516 focus on locus (point) of /declaration/. Wording provided in CWG2516.
Reviewed By: #clang-language-wg, shafik
Differential Revision: https://reviews.llvm.org/D151042
Vlad Serebrennikov [Tue, 23 May 2023 09:43:47 +0000 (12:43 +0300)]
[clang] Add test for CWG2213
[[https://wg21.link/p1787 | P1787]]: CWG2213 is resolved by allowing an elaborated-type-specifier to contain a simple-template-id without friend.
Wording: see changes to [dcl.type.elab]/1.
The gist of the issue is that forward declaration of partial class template specialization was disallowed.
Reviewed By: #clang-language-wg, shafik
Differential Revision: https://reviews.llvm.org/D151032
Jay Foad [Mon, 15 May 2023 11:01:36 +0000 (12:01 +0100)]
[PowerPC] Avoid RegScavenger::forward in PPCFrameLowering
RegScavenger::backward is preferred because it does not rely on accurate
kill flags.
Differential Revision: https://reviews.llvm.org/D150558
Guillaume Chatelet [Tue, 23 May 2023 09:14:28 +0000 (09:14 +0000)]
[libc] Display unit test runtime for hosted environments
With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots.
Top offenders can be tracked with a bit of scripting (spoiler alert, mem function sweep tests are among the top ones)
```
ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' '
```
Unfortunately this doesn't work for hermetic tests since `clock` is unavailable.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D151097
Alex Bradbury [Tue, 23 May 2023 09:09:22 +0000 (10:09 +0100)]
[RISCV] Make zfbfmin imply the F extension
Our current approach is that if one extension requires another, we make
LLVM treat it as implied. My initial zfbfmin patch failed to do this for
the F extension (documented as a requirement of zfbfmin). This patch
fixes that.
Differential Revision: https://reviews.llvm.org/D151096
khei4 [Thu, 18 May 2023 03:49:10 +0000 (12:49 +0900)]
[SimplifyCFG] add nsw on SwitchToLookupTable index calculation on MinCaseVal subtraction
Differential Revision: https://reviews.llvm.org/D146903
Reviewed By: nikic
Qiu Chaofan [Tue, 23 May 2023 08:40:54 +0000 (16:40 +0800)]
[PowerPC] Simplify fp-to-int store optimization
On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.
This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D141473
Qiu Chaofan [Tue, 23 May 2023 08:22:32 +0000 (16:22 +0800)]
[Clang] Support more stdio builtins
Add more builtins for stdio functions as in GCC, along with their
mutations under IEEE float128 ABI.
Reviewed By: tuliom
Differential Revision: https://reviews.llvm.org/D150087
Joachim Jenke [Tue, 23 May 2023 08:26:13 +0000 (10:26 +0200)]
[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC
This patch properly marks the support level for libomp tests when testing with
GCC.
Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatible with GCC.
Tests pass now with GCC 10, 11, 12
Joachim Jenke [Tue, 23 May 2023 08:22:50 +0000 (10:22 +0200)]
[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC
Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.
OMPT Tests pass now with GCC 10, 11, 12
Joachim Jenke [Tue, 23 May 2023 07:39:27 +0000 (09:39 +0200)]
[OpenMP][Tests][NFC] Fix libarcher tests for GCC
TSan in GCC filters duplicates less aggressively. With 8 threads we can
expect reports for up to 7 pairs of data races in some tests.
Tests pass now with GCC 10, 11, 12
Jay Foad [Mon, 15 May 2023 12:09:05 +0000 (13:09 +0100)]
[AArch64] Avoid RegScavenger::forward in AArch64SpeculationHardening
RegScavenger::backward is preferred because it does not rely on accurate
kill flags.
Differential Revision: https://reviews.llvm.org/D150560
Jay Foad [Mon, 15 May 2023 12:05:14 +0000 (13:05 +0100)]
[AArch64] Add implicit uses to speculative hardening MIR test
A couple of tests were setting liveins to add fake live registers, but
that only works if you track liveness forwards. Add some implicit uses
too, so that it also works if you track liveness backwards.
Differential Revision: https://reviews.llvm.org/D150559
Matt Arsenault [Tue, 23 May 2023 08:13:36 +0000 (09:13 +0100)]
AMDGPU/GlobalISel: Update test
Krasimir Georgiev [Tue, 23 May 2023 07:59:38 +0000 (07:59 +0000)]
Revert "[clang][ExprConstant] fix __builtin_object_size for flexible array members"
This reverts commit
57c5c1ab2a188b7962c9de5ac0f95e3c7441940a.
Causes an assertion failure: https://reviews.llvm.org/D150892#4363080
Matt Arsenault [Mon, 22 May 2023 09:44:32 +0000 (10:44 +0100)]
Reapply "InstSimplify: Pass AssumptionCache to isKnownNeverInfinity"
This reverts commit
481191b0a8318e55ce467e983d78d2141e827db1.
Matt Arsenault [Fri, 19 May 2023 09:25:38 +0000 (10:25 +0100)]
Reapply "SimplifyLibCalls: Pass AssumptionCache to isKnownNeverInfinity"
This reverts commit
b357f379c81811409348dd0e0273a248b055bb7a.
Matt Arsenault [Fri, 19 May 2023 08:15:22 +0000 (09:15 +0100)]
Reapply "ValueTracking: Delete body of isKnownNeverInfinity"
This reverts commit
d1dc3e13a791fe1b99a341406b5dafec64750cb1.
200bdd9e869e2982f54923b05e54c117fd33f5d9 should have fixed
the reported regression.
Matt Arsenault [Mon, 22 May 2023 09:42:58 +0000 (10:42 +0100)]
Reapply "InstSimplify: Use isKnownNeverInfOrNaN"
This reverts commit
f55224735ed39af16bccd7ff67b734fd758db6fc.
Matt Arsenault [Sun, 7 May 2023 10:13:28 +0000 (11:13 +0100)]
AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8
If we have legal f16 instructions but no f16 med3, we can save
one instruction by expanding out the min/max sequence compared
to casting to f32 and casting back.
Jean Perier [Tue, 23 May 2023 07:17:36 +0000 (09:17 +0200)]
[flang][hlfir] Hoist forall bounds computation when possible
When inner forall bound computations do not depend on previous
forall indices, they can be hoisted.
This is possible because:
- bound computations are required to be pure (so evaluating them only
once is possible).
- If the bound computation depends on a value previously assigned, the
forall scheduling analysis creates a different run for it: the
assignment impacting the bounds value is not part of the current loop
nest.
The reason this optimization is done at that point and not as part of
generic loop hoisting optimization is that having all the loop
bound computations hoisted will allow allocating simple temporary
storage. The number of iterations can be pre-computed and used as the
extent for the temporary.
Differential Revision: https://reviews.llvm.org/D151110
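The benefit of hoisting can be sketched outside Fortran. The C++ below is only an analogy under assumptions: `innerUpperBound` stands in for a pure bound expression that does not depend on the outer index, and `fillTemporary` is a hypothetical function showing how the iteration count can be computed up front and used to size a flat temporary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pure bound expression: depends on N only, never on the
// outer loop index, so it can be evaluated once before the loop nest.
int innerUpperBound(int N) { return N / 2; }

// Rough analogue of:
//   forall (i = 1:OuterN, j = 1:innerUpperBound(N)) tmp(i, j) = 10*i + j
std::vector<int> fillTemporary(int OuterN, int N) {
  // Hoisted bounds, clamped so an empty range gives zero iterations.
  const int Outer = std::max(0, OuterN);
  const int Inner = std::max(0, innerUpperBound(N));

  // With both bounds known up front, the iteration count is a simple
  // product and can serve as the extent of the temporary.
  const std::size_t Count =
      static_cast<std::size_t>(Outer) * static_cast<std::size_t>(Inner);
  std::vector<int> Temp(Count);

  std::size_t Pos = 0;
  for (int I = 1; I <= Outer; ++I)
    for (int J = 1; J <= Inner; ++J)
      Temp[Pos++] = 10 * I + J;
  return Temp;
}
```

If the inner bound had to be re-evaluated per outer iteration, the total count would not be known before the nest runs, and the temporary could not be sized this simply.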
Congcong Cai [Tue, 23 May 2023 07:07:03 +0000 (09:07 +0200)]
[Sema] `setInvalidDecl` for error deduction declaration
Fixed: https://github.com/llvm/llvm-project/issues/62408
`setInvalidDecl` for invalid `CXXDeductionGuideDecl` to
avoid crashes during semantic analysis.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149516
pvanhout [Mon, 15 May 2023 09:23:09 +0000 (11:23 +0200)]
[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
This is basically a partial revert of https://reviews.llvm.org/D145586 (
fd1d60873fdc )
D145586 was originally introduced to help with SWDEV-363662, and it did, but
it also caused a 25% drop in performance in
some MIOpen benchmarks where, it seems,
functions are inlined more conservatively.
This patch restores the pre-D145586 behavior
for PromoteAlloca: functions with a non-entry CC
have a 32 VGPRs threshold, but only if the function
is not marked with "alwaysinline".
A good amount of AMDGPU code makes use of
the AMDGPUAlwaysInline pass anyway, so in our
backend "alwaysinline" seems very common.
This change does not affect SWDEV-363662 (the motivating issue for introducing D145586).
Fixes SWDEV-399519
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D150551
Adrian Kuegel [Tue, 23 May 2023 06:52:53 +0000 (08:52 +0200)]
[mlir] Apply ClangTidy performance finding (NFC)
Chuanqi Xu [Tue, 23 May 2023 06:31:05 +0000 (14:31 +0800)]
[NFC] [C++20] [Modules] Add a test
Add a test from https://github.com/llvm/llvm-project/issues/59999. It is
always good to have more tests.