platform/upstream/llvm.git
13 months ago[AArch64][FMV] Prevent target attribute using for multiversioning.
Pavel Iliin [Thu, 18 May 2023 11:02:04 +0000 (12:02 +0100)]
[AArch64][FMV] Prevent target attribute using for multiversioning.

On AArch64, function multiversioning should be requested with the
target_version/target_clones attributes. This patch fixes a defect that
allowed the target attribute to trigger multiversioning.

Differential Revision: https://reviews.llvm.org/D150867

13 months ago[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X...
Craig Topper [Tue, 23 May 2023 16:19:37 +0000 (09:19 -0700)]
[LegalizeTypes][ARM][AArch64][RISCV][VE][WebAssembly] Add special case for smin(X, -1) and smax(X, 0) to ExpandIntRes_MINMAX.

We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.

This is similar to some of the special cases we have for expanding
setcc operands.
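
As a rough illustration of why these two cases admit a simpler Lo, the sketch below splits a 64-bit value into 32-bit halves and computes each result half directly. The names (`Halves`, `split`, `join`, `smax_zero`, `smin_minus_one`) are hypothetical, and the expressions are one plausible formulation, not necessarily what the patch emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A 64-bit signed value split into the two 32-bit halves that a 32-bit
// target would operate on. All names here are illustrative only.
struct Halves {
  uint32_t lo; // low half, treated as raw bits
  int32_t hi;  // high half, carries the sign
};

Halves split(int64_t x) {
  const uint64_t bits = static_cast<uint64_t>(x);
  return {static_cast<uint32_t>(bits),
          static_cast<int32_t>(static_cast<uint32_t>(bits >> 32))};
}

int64_t join(Halves h) {
  const uint64_t bits =
      (static_cast<uint64_t>(static_cast<uint32_t>(h.hi)) << 32) | h.lo;
  return static_cast<int64_t>(bits);
}

// smax(X, 0): a negative X clamps to 0, so Lo is 0 exactly when Hi is
// negative and is the original Lo otherwise. Hi is smax(Hi, 0).
Halves smax_zero(Halves x) {
  const Halves result{x.hi < 0 ? 0u : x.lo, std::max<int32_t>(x.hi, 0)};
  assert(result.hi >= 0);
  return result;
}

// smin(X, -1): a non-negative X clamps to -1, so Lo is all ones exactly
// when Hi is non-negative and is the original Lo otherwise. Hi is smin(Hi, -1).
Halves smin_minus_one(Halves x) {
  const Halves result{x.hi < 0 ? x.lo : 0xFFFFFFFFu,
                      std::min<int32_t>(x.hi, -1)};
  assert(result.hi < 0);
  return result;
}
```

In both functions Lo depends only on the sign of Hi, so no comparison of the low halves is needed.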

Differential Revision: https://reviews.llvm.org/D151182

13 months ago[OpenMP][NFC] clang-format the OpenMP device runtime
Joseph Huber [Tue, 23 May 2023 16:09:16 +0000 (11:09 -0500)]
[OpenMP][NFC] clang-format the OpenMP device runtime

These files aren't fully formatted. I'm guessing this was a holdover
from when `clang-format` was totally broken for OpenMP offloading.
Format the files to be more consistent.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D151226

13 months ago[libc] More efficiently send bytes via `send_n` and `recv_n`
Joseph Huber [Fri, 19 May 2023 16:17:42 +0000 (11:17 -0500)]
[libc] More efficiently send bytes via `send_n` and `recv_n`

Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to receive.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.
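
To make the saving concrete, here is a small counting sketch. Only the 56-byte figure comes from the message above. The 64-byte packet, the 8-byte size field, and the assumption that follow-up packets carry a full 64 bytes are guesses made for illustration, as are all the names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: a 64-byte packet whose first packet spends 8 bytes on the
// size, leaving 56 bytes free. Only the 56 is stated in the commit message.
constexpr uint64_t kPacketBytes = 64;
constexpr uint64_t kSizeBytes = 8;
constexpr uint64_t kInlineBytes = kPacketBytes - kSizeBytes;
static_assert(kInlineBytes == 56, "matches the 56 bytes named in the text");

uint64_t ceil_div(uint64_t a, uint64_t b) {
  assert(b != 0);
  return (a + b - 1) / b;
}

// Old scheme: the first packet carries only the size, the data follows.
uint64_t packets_size_only(uint64_t n) {
  return 1 + ceil_div(n, kPacketBytes);
}

// New scheme: the first packet carries the size plus the first 56 bytes,
// and further packets are sent only if data remains.
uint64_t packets_with_inline_data(uint64_t n) {
  if (n <= kInlineBytes)
    return 1;
  return 1 + ceil_div(n - kInlineBytes, kPacketBytes);
}
```

Under these assumptions a 16-byte argument block drops from two packets to one, which is the small-value case the message describes.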

Depends on D150992

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D151041

13 months ago[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes
Joseph Huber [Fri, 19 May 2023 19:58:32 +0000 (14:58 -0500)]
[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes

We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.

This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
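
The ballot-driven loop can be mimicked on the host to show the control flow. This is a plain C++ simulation with a hypothetical four-lane "wavefront" and made-up names, not the RPC code itself: every lane keeps going around the loop until the ballot over all lanes is empty, and lanes that have finished simply do no work.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // hypothetical tiny wavefront

// Host-side stand-in for a hardware ballot: bit i is set when lane i's
// predicate holds.
uint32_t ballot(const std::array<bool, kLanes>& predicate) {
  uint32_t mask = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (predicate[lane])
      mask |= (1u << lane);
  return mask;
}

struct LoopResult {
  std::size_t iterations;                    // trips taken by the whole group
  std::array<std::size_t, kLanes> work_done; // useful trips per lane
};

// Each lane wants trips[lane] iterations. The loop exits only when no lane
// has work left, so all lanes leave the loop together.
LoopResult run_lockstep(const std::array<std::size_t, kLanes>& trips) {
  LoopResult result{0, {}};
  for (std::size_t i = 0;; ++i) {
    std::array<bool, kLanes> active{};
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      active[lane] = i < trips[lane];
    if (ballot(active) == 0)
      break; // every lane is done
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      if (active[lane])
        ++result.work_done[lane];
    ++result.iterations;
  }
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    assert(result.work_done[lane] == trips[lane]);
  return result;
}
```

With trip counts of 1, 3, 0 and 2, the group takes three iterations, the maximum over the lanes, while each lane still performs only its own share of work.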

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150992

13 months ago[libc++] Remove tests from ranges.pass.cpp which violate semantic requirements
Nikolas Klauser [Tue, 23 May 2023 15:59:13 +0000 (08:59 -0700)]
[libc++] Remove tests from ranges.pass.cpp which violate semantic requirements

This also removes some tests which we have grouped together into ranges_robust_against_*.pass.cpp tests.

Specifically, checking that
- `ranges::dangling` is returned is done in `libcxx/test/std/algorithms/ranges_robust_against_dangling.pass.cpp`
- `std::invoke` is used is done in `libcxx/test/std/algorithms/ranges_robust_against_omitting_invoke.pass.cpp`.
- implicit conversion to bool works is done in `libcxx/test/std/algorithms/ranges_robust_against_nonbool_predicates.pass.cpp`

Checking the comparison order is invalid because the `operator==` isn't symmetric.
Checking what the exact type of `operator==` is, is invalid because comparing the same object has to yield the same results if the objects are not modified.

Reviewed By: ldionne, #libc

Spies: EricWF, libcxx-commits

Differential Revision: https://reviews.llvm.org/D150588

13 months ago[libc++][NFC] Move basic_ios extern instantiations into <ios>
Nikolas Klauser [Tue, 23 May 2023 15:58:14 +0000 (08:58 -0700)]
[libc++][NFC] Move basic_ios extern instantiations into <ios>

`basic_ios` is defined in `<ios>`, so it seems weird that we declare the explicit instantiation for it in `<streambuf>`, which is technically unrelated.

Reviewed By: #libc, EricWF, ldionne

Spies: ldionne, EricWF, libcxx-commits

Differential Revision: https://reviews.llvm.org/D150912

13 months ago[HIP] Allow std::malloc in device function
Yaxun (Sam) Liu [Thu, 11 May 2023 16:57:21 +0000 (12:57 -0400)]
[HIP] Allow std::malloc in device function

D106463 caused a regression that prevents std::malloc from being
called in a device function, which nvcc allows.

The standard C++ header introduces malloc into the std namespace
with a using-declaration for ::malloc. The device ::malloc
function therefore needs to be declared before ::malloc
is introduced into the std namespace.

Revert D106463 and add a test.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D150965

13 months ago[libc++][NFC] Fix whitespace problems in the files added to ignore_format.txt in...
Nikolas Klauser [Tue, 23 May 2023 15:40:47 +0000 (08:40 -0700)]
[libc++][NFC] Fix whitespace problems in the files added to ignore_format.txt in D151115

Reviewed By: ldionne, #libc, Mordante

Spies: arichardson, Mordante, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151119

13 months ago[lldb][NFCI] Use llvm's libDebugInfo for DebugRanges
Felipe de Azevedo Piovezan [Thu, 11 May 2023 13:01:12 +0000 (09:01 -0400)]
[lldb][NFCI] Use llvm's libDebugInfo for DebugRanges

In an effort to unify the different DWARF parsers available in the codebase,
this commit removes LLDB's custom parsing for the `.debug_ranges` DWARF section,
instead calling into LLVM's parser.

Subsequent work should look into unifying `llvm::DWARFDebugRangeList` (whose
entries are pairs of (start, end) addresses) with `lldb::DWARFRangeList` (whose
entries are pairs of (start, length)). The lists themselves are also different
data structures, but functionally equivalent.
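
The two entry shapes carry the same information, which a conversion sketch makes explicit. The structs and function names below are hypothetical stand-ins rather than the LLVM or LLDB types, and treating `end` as exclusive is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an entry stored as (start, end).
// `end` is assumed to be one past the last covered address.
struct StartEndRange {
  uint64_t start;
  uint64_t end;
};

// Hypothetical stand-in for an entry stored as (start, length).
struct StartLengthRange {
  uint64_t start;
  uint64_t length;
};

StartLengthRange to_start_length(StartEndRange r) {
  assert(r.end >= r.start); // a well-formed range never ends before it starts
  return {r.start, r.end - r.start};
}

StartEndRange to_start_end(StartLengthRange r) {
  return {r.start, r.start + r.length};
}
```

Converting in one direction and back yields the original entry, which is the sense in which the two representations are functionally equivalent.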

Depends on D150363

Differential Revision: https://reviews.llvm.org/D150366

13 months ago[libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Tue Ly [Sun, 21 May 2023 05:27:38 +0000 (01:27 -0400)]
[libc][math] Implement double precision log1p correctly rounded to all rounding modes.

Implement double precision log1p function correctly rounded to all
rounding modes.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
  - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D151049

13 months ago[InstCombine] Add droppable users back to worklist (NFCI)
Nikita Popov [Tue, 23 May 2023 14:59:02 +0000 (16:59 +0200)]
[InstCombine] Add droppable users back to worklist (NFCI)

When droppable users are dropped during sinking, add the using
instructions to the worklist, as they can likely be removed as well.

This should be NFC apart from worklist order effects.

13 months ago[flang][NFC] Move Array constructor inlined temp management into a utility
Jean Perier [Tue, 23 May 2023 15:00:15 +0000 (17:00 +0200)]
[flang][NFC] Move Array constructor inlined temp management into a utility

This patch moves the counter and storage management part of the array
constructor inlined temporary strategy into its own utility so that it
can be reused for the simple cases of temporary creations inside WHERE
and FORALL.

It actually fixes a bug where the first counter value used for addressing
was "2", leading to reads/writes past the end of the allocated storage. It
seems I ran the end-to-end tests without the HLFIR flag when previously
testing this, so this may clear some segfaults.

Differential Revision: https://reviews.llvm.org/D151106

13 months ago[flang] use greedy mlir driver for stack arrays pass
Tom Eccles [Wed, 17 May 2023 16:07:41 +0000 (16:07 +0000)]
[flang] use greedy mlir driver for stack arrays pass

In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.

In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.

This change was suggested by @jeanPerier

Differential Revision: https://reviews.llvm.org/D150853

13 months ago[libc][math] Implement double precision log2 function correctly rounded to all roundi...
Tue Ly [Thu, 11 May 2023 15:10:02 +0000 (11:10 -0400)]
[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.

Implement double precision log2 function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
177.632

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332

-- LIBC latency -- with FMA
459.751

-- LIBC latency -- without FMA
463.850
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150374

13 months ago[Hexagon] Fix safety check in moving instructions in HVC::AlignVectors
Krzysztof Parzyszek [Tue, 23 May 2023 12:46:15 +0000 (05:46 -0700)]
[Hexagon] Fix safety check in moving instructions in HVC::AlignVectors

A prior commit accidentally changed a safety check, which allowed aliased
memory instructions to be moved across one another.

13 months ago[NFC][CLANG] Fix static code analyzer concerns
Manna, Soumi [Tue, 23 May 2023 14:36:15 +0000 (07:36 -0700)]
[NFC][CLANG] Fix static code analyzer concerns

Reported by Static Code Analyzer Tool, Coverity:

Dereference null return value

Inside "ExprConstant.cpp" file, in <unnamed>::RecordExprEvaluator::VisitCXXStdInitializerListExpr(clang::CXXStdInitializerListExpr const *): Return value of function which returns null is dereferenced without checking.

  bool RecordExprEvaluator::VisitCXXStdInitializerListExpr(
   const CXXStdInitializerListExpr *E) {
       // returned_null: getAsConstantArrayType returns nullptr (checked 81 out of 93 times).
       //var_assigned: Assigning: ArrayType = nullptr return value from getAsConstantArrayType.
    const ConstantArrayType *ArrayType =
       Info.Ctx.getAsConstantArrayType(E->getSubExpr()->getType());
    LValue Array;
    //Condition !EvaluateLValue(E->getSubExpr(), Array, this->Info, false), taking false branch.
    if (!EvaluateLValue(E->getSubExpr(), Array, Info))
     return false;

    // Get a pointer to the first element of the array.

    //Dereference null return value (NULL_RETURNS)
    //dereference: Dereferencing a pointer that might be nullptr ArrayType when calling addArray.
    Array.addArray(Info, E, ArrayType);

This patch adds an assert that catches an unexpected type for the array initializer.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151040

13 months ago[include-cleaner] Treat references to nested types implicit
Kadir Cetinkaya [Fri, 5 May 2023 10:39:09 +0000 (12:39 +0200)]
[include-cleaner] Treat references to nested types implicit

Differential Revision: https://reviews.llvm.org/D149948

13 months ago[InstCombine] Fix worklist management in select value equiv fold (NFCI)
Nikita Popov [Tue, 23 May 2023 14:36:11 +0000 (16:36 +0200)]
[InstCombine] Fix worklist management in select value equiv fold (NFCI)

Requeue the modified instruction.

This should be NFC apart from worklist order effects.

13 months ago[libc][math] Implement double precision log function correctly rounded to all roundin...
Tue Ly [Mon, 8 May 2023 18:03:52 +0000 (14:03 -0400)]
[libc][math] Implement double precision log function correctly rounded to all rounding modes.

Implement double precision log function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
598.306

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925

-- LIBC latency -- with FMA
455.632

-- LIBC latency -- without FMA
488.564
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150131

13 months ago[NFC][Clang] Fix Coverity bug with dereference null return value in clang::CodeGen...
Manna, Soumi [Tue, 23 May 2023 14:22:40 +0000 (07:22 -0700)]
[NFC][Clang] Fix Coverity bug with dereference null return value in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr()

Reported by Coverity:

Inside  "CGExpr.cpp" file, in clang::CodeGen::CodeGenFunction::EmitOMPArraySectionExpr(clang::OMPArraySectionExpr const *, bool): Return value of function which returns null is dereferenced without checking.

    } else {
   //returned_null: getAsConstantArrayType returns nullptr (checked 83 out of 95 times).
   // var_assigned: Assigning: CAT = nullptr return value from getAsConstantArrayType.
      auto *CAT = C.getAsConstantArrayType(ArrayTy);
   //identity_transfer: Member function call CAT->getSize() returns an offset off CAT (this).

     // Dereference null return value (NULL_RETURNS)
     //dereference: Dereferencing a pointer that might be nullptr CAT->getSize() when calling APInt.
     ConstLength = CAT->getSize();
    }

This patch adds an assert to resolve the bug.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151137

13 months ago[InstCombine] Regenerate test checks (NFC)
Nikita Popov [Tue, 23 May 2023 14:24:41 +0000 (16:24 +0200)]
[InstCombine] Regenerate test checks (NFC)

13 months ago[InstCombine] Fix worklist management in replaceGEPIdxWithZero() fold (NFCI)
Nikita Popov [Tue, 23 May 2023 14:20:41 +0000 (16:20 +0200)]
[InstCombine] Fix worklist management in replaceGEPIdxWithZero() fold (NFCI)

Make sure the old load/store operand is queued for DCE.

This should be NFC apart from worklist order effects.

13 months ago[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc
Joseph Huber [Tue, 23 May 2023 14:16:30 +0000 (09:16 -0500)]
[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc

The AMDGPU backend has a built-in pass to lower constructors. We do this
manually in the `start.cpp` implementation so we can disable this to
keep the binaries smaller.

Differential Revision: https://reviews.llvm.org/D151213

13 months ago[OpenMP] Insert missing variable update inside loop
Jonathan Peyton [Mon, 22 May 2023 19:08:51 +0000 (14:08 -0500)]
[OpenMP] Insert missing variable update inside loop

A while loop in the task priority code was missing a necessary variable
update, which could lead to hangs if two threads collided when both
attempted to execute the compare_and_exchange.
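
The general shape of this kind of bug can be sketched with `std::atomic`. This is not the OpenMP runtime code. The helper below is a hypothetical compare-and-exchange that takes the expected value by value, so the caller's copy is not refreshed on failure and has to be reloaded inside the loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: `expected` is passed by value, so when the exchange
// fails the caller still holds the stale value it passed in.
bool compare_and_exchange(std::atomic<int>& target, int expected, int desired) {
  return target.compare_exchange_strong(expected, desired);
}

// Adds delta to the counter with a retry loop and returns the new value.
int add(std::atomic<int>& counter, int delta) {
  assert(delta != 0); // a zero delta would make the update a no-op
  int old_value = counter.load();
  while (!compare_and_exchange(counter, old_value, old_value + delta)) {
    // This reload is the kind of update that must not be left out: without
    // it, a thread that lost the race would retry with the same stale value
    // forever.
    old_value = counter.load();
  }
  return old_value + delta;
}
```

When another thread changes the counter between the load and the exchange, the reload lets the next attempt succeed instead of spinning.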

Fixes: https://github.com/llvm/llvm-project/issues/62867
Differential Revision: https://reviews.llvm.org/D151138

13 months ago[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Tue Ly [Sat, 6 May 2023 02:08:42 +0000 (22:08 -0400)]
[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.

Make log10 correctly rounded for non-FMA targets and improve its
performance.

Implemented fast pass and accurate pass:

**Fast Pass**:

  - Range reduction step 0: Extract exponent and mantissa
```
  x = 2^(e_x) * m_x
```
  - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
   -2^-8 <= v = r * m_x - 1 < 2^-7
  where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
  and k = trunc( (m_x - 1) * 2^7 )
```
  - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
 > P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
  - Combine the results:
```
  log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
  - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`.  Return the result if Ziv's test passed.
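
Range reduction steps 0 and 1 of the fast pass can be checked numerically with a short sketch. It computes `r` from its formula on the fly, whereas the text describes a 128-entry lookup table. The struct and function names are hypothetical, and the input is assumed to be positive and finite.

```cpp
#include <cassert>
#include <cmath>

// Outputs of range reduction steps 0 and 1, named after the symbols in the
// text. A real implementation would read r from a lookup table.
struct Reduced {
  int e_x;    // exponent, so that x = 2^e_x * m_x
  double m_x; // mantissa in [1, 2)
  int k;      // table index in [0, 127]
  double r;   // reduction constant
  double v;   // reduced argument, r * m_x - 1
};

Reduced reduce(double x) {
  assert(x > 0.0 && std::isfinite(x));
  int e = 0;
  // std::frexp yields a mantissa in [0.5, 1); rescale it to [1, 2).
  const double m = std::frexp(x, &e) * 2.0;
  e -= 1;
  // k = trunc((m_x - 1) * 2^7)
  const int k = static_cast<int>((m - 1.0) * 128.0);
  // r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7))
  const double r =
      std::ceil(256.0 * (1.0 - 1.0 / 256.0) / (1.0 + k / 128.0)) / 256.0;
  const double v = r * m - 1.0;
  return {e, m, k, r, v};
}
```

For `x = 10` this gives `e_x = 3`, `m_x = 1.25`, `k = 32`, and `v = -2^-8`, the lower end of the stated interval `-2^-8 <= v < 2^-7`.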

**Accurate Pass**:

  - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
  - Perform 3 more range reduction steps:
    - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
   v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
  where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
  and k = trunc( v * 2^14 + 0.5 ).
```
    - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
   v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
  where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
  and k = trunc( v * 2^21 + 0.5 ).
```
    - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
   v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
  where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
  and k = trunc( v * 2^28 + 0.5 ).
```
  - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
  > P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
  - Combine the results:
```
  log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v4 * P(v4)
```
  - The combined results are computed using floating points of 128-bit precision.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
640.688

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354

-- LIBC latency -- with FMA
495.593

-- LIBC latency -- without FMA
504.143
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150014

13 months ago[NFC][CLANG] Fix static code analyzer concerns with dereference null return value
Manna, Soumi [Tue, 23 May 2023 14:07:09 +0000 (07:07 -0700)]
[NFC][CLANG] Fix static code analyzer concerns with dereference null return value

Reported by Static Code Analyzer Tool, Coverity:

Inside "SemaExprMember.cpp" file, in clang::Sema::BuildMemberReferenceExpr(clang::Expr *, clang::QualType, clang::SourceLocation, bool, clang::CXXScopeSpec &, clang::SourceLocation, clang::NamedDecl *, clang::DeclarationNameInfo const &, clang::TemplateArgumentListInfo const *, clang::Scope const *, clang::Sema::ActOnMemberAccessExtraArgs *): Return value of function which returns null is dereferenced without checking

  //Condition !Base, taking true branch.
  if (!Base) {
    TypoExpr *TE = nullptr;
    QualType RecordTy = BaseType;

     //Condition IsArrow, taking true branch.
     if (IsArrow) RecordTy = RecordTy->castAs<PointerType>()->getPointeeType();
     //returned_null: getAs returns nullptr (checked 279 out of 294 times).
     //Condition TemplateArgs != NULL, taking true branch.

     //Dereference null return value (NULL_RETURNS)
     //dereference: Dereferencing a pointer that might be nullptr RecordTy->getAs() when calling LookupMemberExprInRecord.
     if (LookupMemberExprInRecord(
           *this, R, nullptr, RecordTy->getAs<RecordType>(), OpLoc, IsArrow,
           SS, TemplateArgs != nullptr, TemplateKWLoc, TE))
        return ExprError();
     if (TE)
       return TE;

This patch uses castAs instead of getAs which will assert if the type doesn't match.
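
The difference between the two accessors can be shown with a toy type hierarchy. This is a simplified stand-in built on `dynamic_cast`, not clang's implementation. Only the contrast matters: one accessor returns null on a mismatch, the other asserts.

```cpp
#include <cassert>

// Toy stand-in for a type hierarchy. The member names mirror the ones in the
// message, but the bodies are illustrative only.
struct Type {
  virtual ~Type() = default;

  // Returns nullptr when this object is not a T, so callers must check.
  template <typename T> const T* getAs() const {
    return dynamic_cast<const T*>(this);
  }

  // Asserts that this object is a T, so a mismatch is caught at the call
  // site instead of surfacing later as a null dereference.
  template <typename T> const T* castAs() const {
    const T* result = dynamic_cast<const T*>(this);
    assert(result && "castAs<T>() called on a type that is not a T");
    return result;
  }
};

struct RecordType : Type {
  int num_fields = 0;
};

struct PointerType : Type {};
```

Passing the result of `getAs` straight into another call hides the null case, while `castAs` documents and enforces that the type is expected to match.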

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151130

13 months ago[Driver] Try to fix linux-ld.c test with DEFAULT_LINKER set (NFC)
Nikita Popov [Tue, 23 May 2023 14:04:24 +0000 (16:04 +0200)]
[Driver] Try to fix linux-ld.c test with DEFAULT_LINKER set (NFC)

The test fails on the clang-ppc64le-rhel build bot, which has
DEFAULT_LINKER set and an ld.lld binary in the LLVM build directory.

13 months ago[AMDGPU] Add an option to disable manual ctor / dtor lowering
Joseph Huber [Mon, 15 May 2023 12:59:53 +0000 (07:59 -0500)]
[AMDGPU] Add an option to disable manual ctor / dtor lowering

Currently AMDGPU offers extra ctor / dtor lowering by emitting a kernel
that can be called. It's possible to handle ctors and dtors using the
standard method as shown in D149340's commit message, in which case we
don't need these extra kernels as they won't be called. This patch simply
adds a way to conditionally turn off this handling if we do not want to
get extra kernels in the output.

Unrelated, but we could convert this handling to an ODR function that simply
calls the code in D149340 constructed via LLVM-IR. That would handle priority
correctly and would then be correct if not run in LTO mode.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D150565

13 months ago[ubsan][test] Remove --check-prefix=UNIQUE for x86_64-apple from e215996a2932ed7c472f...
Fangrui Song [Tue, 23 May 2023 13:59:01 +0000 (06:59 -0700)]
[ubsan][test] Remove --check-prefix=UNIQUE for x86_64-apple from e215996a2932ed7c472f4e94dc4345b30fd0c373

After switching to a type hash instead of possibly-non-unique typeinfo
objects, we no longer have a unique/non-unique distinction.

13 months ago[InstCombine] Remove dead extractelements (NFCI)
Nikita Popov [Tue, 23 May 2023 13:39:53 +0000 (15:39 +0200)]
[InstCombine] Remove dead extractelements (NFCI)

Directly remove these dead extractelement instructions, rather than
leaving them for the next InstCombine iteration to clean up.

Should be mostly NFC, apart from worklist order differences.

13 months ago[mlir][bufferization] Fix bug in findValueInReverseUseDefChain
Matthias Springer [Tue, 23 May 2023 13:22:20 +0000 (15:22 +0200)]
[mlir][bufferization] Fix bug in findValueInReverseUseDefChain

This bug was recently introduced in D143927 and manifests as a dominance violation.

Differential Revision: https://reviews.llvm.org/D151077

13 months agoSilence switch statement contains 'default' but no 'case' labels warning; NFC
Aaron Ballman [Tue, 23 May 2023 13:28:05 +0000 (09:28 -0400)]
Silence switch statement contains 'default' but no 'case' labels warning; NFC

These are showing up in MSVC builds.

13 months ago[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Dinar Temirbulatov [Tue, 23 May 2023 13:24:01 +0000 (13:24 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve

Fix the previous commit by adding the actual change to AArch64TargetTransformInfo.cpp.

Differential Revision: https://reviews.llvm.org/D150336

13 months agoAdd StringRef::consumeInteger(APInt)
Thomas Preud'homme [Tue, 16 May 2023 09:24:57 +0000 (09:24 +0000)]
Add StringRef::consumeInteger(APInt)

This will be required to add arbitrary-precision support to
FileCheck's numeric variables and expressions. Note: as with
getAsInteger(), this does not support negative values. If there is
interest in that, it can be added in a separate patch.
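
The "consume" behaviour can be illustrated without LLVM. The sketch below works on `std::string_view` and a fixed-width `uint64_t`, so it is a stand-in and not the new `APInt` overload. The function name is hypothetical, and it returns true on failure, as LLVM's string-parsing helpers do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// Parses leading digits of `str` in the given radix. On success the parsed
// characters are removed from `str`, `result` is set, and false is returned.
// On failure `str` and `result` are left untouched and true is returned.
bool consume_unsigned(std::string_view& str, unsigned radix, uint64_t& result) {
  assert(radix >= 2 && radix <= 36);
  const uint64_t max = std::numeric_limits<uint64_t>::max();
  uint64_t value = 0;
  std::size_t used = 0;
  for (; used < str.size(); ++used) {
    const char c = str[used];
    unsigned digit = 0;
    if (c >= '0' && c <= '9')
      digit = static_cast<unsigned>(c - '0');
    else if (c >= 'a' && c <= 'z')
      digit = static_cast<unsigned>(c - 'a') + 10;
    else if (c >= 'A' && c <= 'Z')
      digit = static_cast<unsigned>(c - 'A') + 10;
    else
      break;
    if (digit >= radix)
      break; // first character that is not a digit in this radix
    if (value > (max - digit) / radix)
      return true; // would overflow the fixed-width stand-in type
    value = value * radix + digit;
  }
  if (used == 0)
    return true; // no digits, which includes a leading '-'
  result = value;
  str.remove_prefix(used);
  return false;
}
```

A leading minus sign is rejected because no digits are consumed, in line with the note above that negative values are not supported. An arbitrary-precision result type would not need the overflow check.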

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D150878

13 months ago[AArch64][LV] Disable maximising bandwidth for streaming compatible sve
Dinar Temirbulatov [Tue, 23 May 2023 12:58:19 +0000 (12:58 +0000)]
[AArch64][LV] Disable maximising bandwidth for streaming compatible sve

We noticed some runtime performance improvements by disabling maximising
bandwidth for streaming compatible sve.

Differential Revision: https://reviews.llvm.org/D150336

13 months agoTurn unreachable error into assert
Thomas Preud'homme [Tue, 16 May 2023 09:22:01 +0000 (09:22 +0000)]
Turn unreachable error into assert

Function valueFromStringRepr() throws an error on missing 0x prefix when
parsing a number string into a value. However, getWildcardRegex() already
ensures that only text with the 0x prefix will match and be parsed,
making that error-throwing code dead code. This commit turns the code
into an assert and removes the unit tests exercising that code
accordingly.
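
As a rough illustration of the pattern (not the FileCheck code itself), a
parser whose callers already guarantee the prefix can state that guarantee
as an assertion instead of a recoverable error. The function name
`parseHexLiteral` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical parser: callers are assumed to have matched Text against a
// pattern that requires the 0x prefix, so a missing prefix is a programming
// error rather than a user-facing one.
std::uint64_t parseHexLiteral(const std::string &Text) {
  assert(Text.size() > 2 && Text.compare(0, 2, "0x") == 0 &&
         "caller must only pass text that matched the 0x pattern");
  return std::stoull(Text.substr(2), nullptr, 16);
}
```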

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D150797

13 months agosilence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064
Krasimir Georgiev [Tue, 23 May 2023 12:46:25 +0000 (12:46 +0000)]
silence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064

13 months ago[AArch64][FMV] Fix name mangling.
Pavel Iliin [Wed, 17 May 2023 17:14:01 +0000 (18:14 +0100)]
[AArch64][FMV] Fix name mangling.

Put features into function version name in increasing priority order.

Differential Revision: https://reviews.llvm.org/D150800

13 months ago[KnownBits] Return zero instead of unknown for always poison shifts
Nikita Popov [Mon, 15 May 2023 15:56:02 +0000 (17:56 +0200)]
[KnownBits] Return zero instead of unknown for always poison shifts

For always poison shifts, any KnownBits return value is valid.
Currently we return unknown, but returning zero is generally more
profitable. We had some code in ValueTracking that tried to do this,
but it was actually dead code.
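
To make the idea concrete, here is a toy 8-bit model of a known-bits
computation for a left shift by a constant. The type `KnownBits8` and the
helper `knownBitsOfShl` are hypothetical and much simpler than LLVM's
KnownBits. The sketch only shows where the "always poison" case returns
all-zero instead of unknown.

```cpp
#include <cstdint>

// Toy known-bits value for 8-bit integers: a set bit in Zero means the bit
// is known to be 0, a set bit in One means it is known to be 1.
struct KnownBits8 {
  std::uint8_t Zero = 0;
  std::uint8_t One = 0;
  bool isUnknown() const { return Zero == 0 && One == 0; }
  bool isZero() const { return Zero == 0xFF; }
};

// Hypothetical helper: known bits of (LHS << ShAmt) for a constant ShAmt.
KnownBits8 knownBitsOfShl(KnownBits8 LHS, unsigned ShAmt) {
  KnownBits8 Result;
  if (ShAmt >= 8) {
    // The shift is always poison, so any answer is valid. Claim that every
    // bit is zero, which lets later folds make progress.
    Result.Zero = 0xFF;
    return Result;
  }
  // Known bits move left; the vacated low bits are known to be zero.
  Result.Zero =
      static_cast<std::uint8_t>((LHS.Zero << ShAmt) | ((1u << ShAmt) - 1));
  Result.One = static_cast<std::uint8_t>(LHS.One << ShAmt);
  return Result;
}
```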

Differential Revision: https://reviews.llvm.org/D150648

13 months ago[clangd] Store paths as requested in PreambleStatCache
Kadir Cetinkaya [Tue, 23 May 2023 07:47:57 +0000 (09:47 +0200)]
[clangd] Store paths as requested in PreambleStatCache

The underlying FS can store different file names inside the stat response
(e.g. symlinks resolved, absolute paths, dots removed). But we store path names
as requested inside the preamble,
https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1635.

This improves cache hit rates from ~30% to 90% in a build system that uses
symlinks.
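
A small sketch of the keying choice, under the assumption that the cache is
a map from path to stat result. The `StatResult` and `StatCache` types are
hypothetical simplifications, not clangd's classes.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

struct StatResult {
  std::string Name; // Name reported by the file system (may be resolved).
  long Size;
};

class StatCache {
  std::map<std::string, StatResult> Entries;

public:
  // Key by the path the caller asked for, not by S.Name, because later
  // lookups will use the requested spelling again.
  void update(const std::string &RequestedPath, StatResult S) {
    Entries[RequestedPath] = std::move(S);
  }

  std::optional<StatResult> lookup(const std::string &RequestedPath) const {
    auto It = Entries.find(RequestedPath);
    if (It == Entries.end())
      return std::nullopt;
    return It->second;
  }
};
```

With a symlinked tree, a lookup by the requested spelling hits, while a
cache keyed by the resolved name would have missed.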

Differential Revision: https://reviews.llvm.org/D151185

13 months agoRevert "[clang] Add tests for CWG issues 977, 1482, 2516"
Vlad Serebrennikov [Tue, 23 May 2023 12:29:14 +0000 (15:29 +0300)]
Revert "[clang] Add tests for CWG issues 977, 1482, 2516"

This reverts commit 85452b5f9b5aba5bdf0259b7f0d7400362f95535.

13 months ago[PostOrderIterator] Use SmallVector for RPOT blocks (NFC)
Nikita Popov [Tue, 23 May 2023 12:17:50 +0000 (14:17 +0200)]
[PostOrderIterator] Use SmallVector for RPOT blocks (NFC)

13 months agoReland "[flang] Handle array constants of any rank"
Leandro Lupori [Tue, 16 May 2023 13:06:13 +0000 (13:06 +0000)]
Reland "[flang] Handle array constants of any rank"

Fixes gfortran test-suite regression.

Differential Revision: https://reviews.llvm.org/D150686

13 months agoReapply [PostOrderIterator] Store end iterator (NFC)
Nikita Popov [Mon, 22 May 2023 13:04:18 +0000 (15:04 +0200)]
Reapply [PostOrderIterator] Store end iterator (NFC)

Replace structured bindings with std::get, as they apparently
break the modules build.

-----

Store the end iterator on the VisitStack, instead of recomputing
it every time, as doing so is not free.
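
A sketch of the technique on a plain adjacency-list graph, assuming node
ids are indices into a vector. The names `Graph` and `postOrder` are
illustrative and this is not the PostOrderIterator implementation. Each
stack entry carries the end iterator next to the current one, and fields
are read with std::get rather than structured bindings.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

using Graph = std::vector<std::vector<int>>;
using ChildIt = std::vector<int>::const_iterator;

// Iterative post-order walk starting at Entry.
std::vector<int> postOrder(const Graph &G, int Entry) {
  std::vector<int> Order;
  std::vector<bool> Visited(G.size(), false);
  // Each entry is (node, next child to visit, end of children). Storing the
  // end iterator avoids recomputing it on every loop iteration.
  std::vector<std::tuple<int, ChildIt, ChildIt>> VisitStack;

  Visited[static_cast<std::size_t>(Entry)] = true;
  VisitStack.emplace_back(Entry, G[Entry].begin(), G[Entry].end());
  while (!VisitStack.empty()) {
    ChildIt &It = std::get<1>(VisitStack.back());
    ChildIt End = std::get<2>(VisitStack.back());
    if (It == End) {
      // All children done: emit the node and pop it.
      Order.push_back(std::get<0>(VisitStack.back()));
      VisitStack.pop_back();
      continue;
    }
    int Child = *It++; // Advance before the stack may reallocate.
    if (!Visited[static_cast<std::size_t>(Child)]) {
      Visited[static_cast<std::size_t>(Child)] = true;
      VisitStack.emplace_back(Child, G[Child].begin(), G[Child].end());
    }
  }
  return Order;
}
```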

13 months agoAArch64: emit synchronous unwind for Darwin arm64_32 platforms too.
Tim Northover [Tue, 23 May 2023 12:14:21 +0000 (13:14 +0100)]
AArch64: emit synchronous unwind for Darwin arm64_32 platforms too.

Since we're checking the triple directly, arm64_32 shows up differently and was
still getting an attempt at asynchronous unwind that added lots more
`__eh_frame` entries instead of the compact format.

13 months ago[clang][dataflow] Use `Strict` accessors in comma operator and no-op cast.
Martin Braenne [Wed, 17 May 2023 13:27:35 +0000 (13:27 +0000)]
[clang][dataflow] Use `Strict` accessors in comma operator and no-op cast.

This patch is part of the ongoing migration to strict handling of value
categories (see https://discourse.llvm.org/t/70086 for details).

Depends On D150775

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D150776

13 months ago[Driver] Fix test for use of ld from devtoolset (NFC)
Nikita Popov [Tue, 23 May 2023 09:45:24 +0000 (11:45 +0200)]
[Driver] Fix test for use of ld from devtoolset (NFC)

The test added in c5fe10f365247c3dd9416b7ec8bad73a60b5946e contains
some typos in the check lines, due to which it never actually
verified what was intended.

Fix the test by adding the required input tree and adjusting the
check lines appropriately.

Differential Revision: https://reviews.llvm.org/D151195

13 months ago[gn build] Port 5111286f06e1
LLVM GN Syncbot [Tue, 23 May 2023 11:39:12 +0000 (11:39 +0000)]
[gn build] Port 5111286f06e1

13 months ago[lli] Export the MinGW chkstk function from the lli executable
Martin Storsjö [Sat, 13 May 2023 23:04:22 +0000 (23:04 +0000)]
[lli] Export the MinGW chkstk function from the lli executable

This allows all ExecutionEngine tests to pass in MinGW build configurations.

Differential Revision: https://reviews.llvm.org/D150555

13 months agoReland "Reland [clang-repl] Introduce Value to capture expression results"
Jun Zhang [Tue, 23 May 2023 10:09:04 +0000 (18:09 +0800)]
Reland "Reland [clang-repl] Introduce Value to capture expression results"

This reverts commit 094ab4781262b6cb49d57b0ecdf84b047c879295.

Reland with `ParseAndExecute` changed to `Parse` in
`Interpreter::create`. This avoids creating a JIT instance every time, even
when we don't really need one.

This should fix failures like https://lab.llvm.org/buildbot/#/builders/38/builds/11955

The original reverted patch also caused GN bot failures on M1. (https://lab.llvm.org/buildbot/#/builders/38/builds/11955)
However, we can't reproduce them, so let's reland it and see what happens.
See discussions here: https://reviews.llvm.org/rGd71a4e02277a64a9dece591cdf2b34f15c3b19a0

13 months ago[Coverity] Constant variable guards dead code.
Luo, Yuanke [Tue, 23 May 2023 11:16:47 +0000 (19:16 +0800)]
[Coverity] Constant variable guards dead code.

13 months agoRevert "[Sema] `setInvalidDecl` for error deduction declaration"
Tom Weaver [Tue, 23 May 2023 10:44:51 +0000 (11:44 +0100)]
Revert "[Sema] `setInvalidDecl` for error deduction declaration"

This reverts commit eb5902ffc97163338bab95d2fd84a953ee76e96f.

Caused buildbot failures on:
  https://lab.llvm.org/buildbot/#/builders/139/builds/41248
  https://lab.llvm.org/buildbot/#/builders/216/builds/21637

13 months agoFix MSVC "ignoring return value of function declared with 'nodiscard' attribute"...
Simon Pilgrim [Tue, 23 May 2023 10:40:33 +0000 (11:40 +0100)]
Fix MSVC "ignoring return value of function declared with 'nodiscard' attribute" warning. NFC.

13 months ago[llvm][github] Allow github links in /cherry-pick actions
Timm Bäder [Tue, 23 May 2023 08:23:10 +0000 (10:23 +0200)]
[llvm][github] Allow github links in /cherry-pick actions

Differential Revision: https://reviews.llvm.org/D151191

13 months ago[Mips] Avoid RegScavenger::forward in Mips16InstrInfo
Jay Foad [Mon, 15 May 2023 11:01:23 +0000 (12:01 +0100)]
[Mips] Avoid RegScavenger::forward in Mips16InstrInfo

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150557

13 months ago[gn build] Port 0b91de5ea32d
LLVM GN Syncbot [Tue, 23 May 2023 10:01:28 +0000 (10:01 +0000)]
[gn build] Port 0b91de5ea32d

13 months ago[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcas...
Simon Pilgrim [Fri, 19 May 2023 21:45:14 +0000 (22:45 +0100)]
[X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds

This patch analyzes AVX512 instructions for full vector width folded loads from the constant pool and attempts to determine whether each can be replaced with a smaller broadcast folded variant. Typically the broadcast opportunities were missed because of type-width mismatches or multiuse limitations which have been removed in later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526

13 months ago[clang] Add tests for CWG issues 977, 1482, 2516
Vlad Serebrennikov [Tue, 23 May 2023 09:50:09 +0000 (12:50 +0300)]
[clang] Add tests for CWG issues 977, 1482, 2516

CWG977 focuses on the point of /completeness/ of enums. Wording provided in CWG1482.
CWG1482 and CWG2516 focus on the locus (point) of /declaration/. Wording provided in CWG2516.

Reviewed By: #clang-language-wg, shafik

Differential Revision: https://reviews.llvm.org/D151042

13 months ago[clang] Add test for CWG2213
Vlad Serebrennikov [Tue, 23 May 2023 09:43:47 +0000 (12:43 +0300)]
[clang] Add test for CWG2213

[[https://wg21.link/p1787 | P1787]]: CWG2213 is resolved by allowing an elaborated-type-specifier to contain a simple-template-id without friend.
Wording: see changes to [dcl.type.elab]/1.

The gist of the issue is that forward declaration of partial class template specialization was disallowed.

Reviewed By: #clang-language-wg, shafik

Differential Revision: https://reviews.llvm.org/D151032

13 months ago[PowerPC] Avoid RegScavenger::forward in PPCFrameLowering
Jay Foad [Mon, 15 May 2023 11:01:36 +0000 (12:01 +0100)]
[PowerPC] Avoid RegScavenger::forward in PPCFrameLowering

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150558

13 months ago[libc] Display unit test runtime for hosted environments
Guillaume Chatelet [Tue, 23 May 2023 09:14:28 +0000 (09:14 +0000)]
[libc] Display unit test runtime for hosted environments

With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots.

Top offenders can be tracked with a bit of scripting (spoiler alert, mem function sweep tests are in the top ones)
```
ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' '
```

Unfortunately this doesn't work for hermetic tests since `clock` is unavailable.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151097

13 months ago[RISCV] Make zfbfmin imply the F extension
Alex Bradbury [Tue, 23 May 2023 09:09:22 +0000 (10:09 +0100)]
[RISCV] Make zfbfmin imply the F extension

Our current approach is that if one extension requires another, we make
LLVM treat it as implied. My initial zfbfmin patch failed to do this for
the F extension (documented as a requirement of zfbfmin). This patch
fixes that.
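
The "requires means implies" approach can be sketched as a transitive
closure over an implication table. The table below holds only the one
entry named in this message. The names `ImpliedExts` and `expandImplied`
are hypothetical and do not reflect how LLVM stores this information.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Illustrative table only: a real target lists many more implications.
const std::map<std::string, std::vector<std::string>> ImpliedExts = {
    {"zfbfmin", {"f"}},
};

// Expands a set of requested extensions with everything they imply,
// following implications transitively.
std::set<std::string> expandImplied(std::set<std::string> Exts) {
  std::vector<std::string> Worklist(Exts.begin(), Exts.end());
  while (!Worklist.empty()) {
    std::string Ext = Worklist.back();
    Worklist.pop_back();
    auto It = ImpliedExts.find(Ext);
    if (It == ImpliedExts.end())
      continue;
    for (const std::string &Implied : It->second)
      if (Exts.insert(Implied).second)
        Worklist.push_back(Implied); // Newly added, so expand it too.
  }
  return Exts;
}
```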

Differential Revision: https://reviews.llvm.org/D151096

13 months ago[SimplifyCFG] add nsw on SwitchToLookupTable index calculation on MinCaseVal subtraction
khei4 [Thu, 18 May 2023 03:49:10 +0000 (12:49 +0900)]
[SimplifyCFG] add nsw on SwitchToLookupTable index calculation on MinCaseVal subtraction

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D146903

13 months ago[PowerPC] Simplify fp-to-int store optimization
Qiu Chaofan [Tue, 23 May 2023 08:40:54 +0000 (16:40 +0800)]
[PowerPC] Simplify fp-to-int store optimization

On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.

This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141473

13 months ago[Clang] Support more stdio builtins
Qiu Chaofan [Tue, 23 May 2023 08:22:32 +0000 (16:22 +0800)]
[Clang] Support more stdio builtins

Add more builtins for stdio functions as in GCC, along with their
mutations under IEEE float128 ABI.

Reviewed By: tuliom

Differential Revision: https://reviews.llvm.org/D150087

13 months ago[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC
Joachim Jenke [Tue, 23 May 2023 08:26:13 +0000 (10:26 +0200)]
[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC

This patch properly marks the support level for libomp tests when testing with
GCC.

Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatible with GCC.

Tests pass now with GCC 10, 11, 12

13 months ago[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC
Joachim Jenke [Tue, 23 May 2023 08:22:50 +0000 (10:22 +0200)]
[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC

Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.

OMPT Tests pass now with GCC 10, 11, 12

13 months ago[OpenMP][Tests][NFC] Fix libarcher tests for GCC
Joachim Jenke [Tue, 23 May 2023 07:39:27 +0000 (09:39 +0200)]
[OpenMP][Tests][NFC] Fix libarcher tests for GCC

TSan in GCC filters duplicates less aggressively. With 8 threads we can
expect reports for up to 7 pairs of data races in some tests.

Tests pass now with GCC 10, 11, 12

13 months ago[AArch64] Avoid RegScavenger::forward in AArch64SpeculationHardening
Jay Foad [Mon, 15 May 2023 12:09:05 +0000 (13:09 +0100)]
[AArch64] Avoid RegScavenger::forward in AArch64SpeculationHardening

RegScavenger::backward is preferred because it does not rely on accurate
kill flags.

Differential Revision: https://reviews.llvm.org/D150560

13 months ago[AArch64] Add implicit uses to speculative hardening MIR test
Jay Foad [Mon, 15 May 2023 12:05:14 +0000 (13:05 +0100)]
[AArch64] Add implicit uses to speculative hardening MIR test

A couple of tests were setting liveins to add fake live registers, but
that only works if you track liveness forwards. Add some implicit uses
too, so that it also works if you track liveness backwards.

Differential Revision: https://reviews.llvm.org/D150559

13 months agoAMDGPU/GlobalISel: Update test
Matt Arsenault [Tue, 23 May 2023 08:13:36 +0000 (09:13 +0100)]
AMDGPU/GlobalISel: Update test

13 months agoRevert "[clang][ExprConstant] fix __builtin_object_size for flexible array members"
Krasimir Georgiev [Tue, 23 May 2023 07:59:38 +0000 (07:59 +0000)]
Revert "[clang][ExprConstant] fix __builtin_object_size for flexible array members"

This reverts commit 57c5c1ab2a188b7962c9de5ac0f95e3c7441940a.

Causes an assertion failure: https://reviews.llvm.org/D150892#4363080

13 months agoReapply "InstSimplify: Pass AssumptionCache to isKnownNeverInfinity"
Matt Arsenault [Mon, 22 May 2023 09:44:32 +0000 (10:44 +0100)]
Reapply "InstSimplify: Pass AssumptionCache to isKnownNeverInfinity"

This reverts commit 481191b0a8318e55ce467e983d78d2141e827db1.

13 months agoReapply "SimplifyLibCalls: Pass AssumptionCache to isKnownNeverInfinity"
Matt Arsenault [Fri, 19 May 2023 09:25:38 +0000 (10:25 +0100)]
Reapply "SimplifyLibCalls: Pass AssumptionCache to isKnownNeverInfinity"

This reverts commit b357f379c81811409348dd0e0273a248b055bb7a.

13 months agoReapply "ValueTracking: Delete body of isKnownNeverInfinity"
Matt Arsenault [Fri, 19 May 2023 08:15:22 +0000 (09:15 +0100)]
Reapply "ValueTracking: Delete body of isKnownNeverInfinity"

This reverts commit d1dc3e13a791fe1b99a341406b5dafec64750cb1.

200bdd9e869e2982f54923b05e54c117fd33f5d9 should have fixed
the reported regression.

13 months agoReapply "InstSimplify: Use isKnownNeverInfOrNaN"
Matt Arsenault [Mon, 22 May 2023 09:42:58 +0000 (10:42 +0100)]
Reapply "InstSimplify: Use isKnownNeverInfOrNaN"

This reverts commit f55224735ed39af16bccd7ff67b734fd758db6fc.

13 months agoAMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8
Matt Arsenault [Sun, 7 May 2023 10:13:28 +0000 (11:13 +0100)]
AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8

If we have legal f16 instructions but no f16 med3, we can save
one instruction by expanding out the min/max sequence compared
to casting to f32 and casting back.
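
For reference, one standard way to write a median of three using only
min and max is shown below, with `float` standing in for f16. This is a
sketch of the arithmetic identity, not the instruction sequence the
backend emits, and it does not address NaN handling.

```cpp
#include <cmath>

// Median of three values using only min/max operations.
float med3(float A, float B, float C) {
  float Lo = std::fmin(A, B);
  float Hi = std::fmax(A, B);
  // Clamp C into [Lo, Hi]: the result is the middle value of the three.
  return std::fmax(Lo, std::fmin(Hi, C));
}
```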

13 months ago[flang][hlfir] Hoist forall bounds computation when possible
Jean Perier [Tue, 23 May 2023 07:17:36 +0000 (09:17 +0200)]
[flang][hlfir] Hoist forall bounds computation when possible

When inner forall bound computations do not depend on previous
forall indices, they can be hoisted.
This is possible because:
 - bound computations are required to be pure (so evaluating them only
   once is possible).
 - If the bound computation depends on a value previously assigned, the
   forall scheduling analysis has created a different run for it: the
   assignment impacting the bounds value is not part of the current loop
   nest.

The reason this optimization is done at this point, and not as part of
a generic loop hoisting optimization, is that having all the loop
bound computations hoisted will allow allocating simple temporary
storage. The number of iterations can be pre-computed and used as the
extent for the temporary.
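
The effect can be sketched in C++ with a two-level loop nest whose inner
bounds are pure and independent of the outer index. The function
`fillTemporary` and its parameters are hypothetical. The point is only
that the bounds are evaluated once and the trip count sizes the temporary
up front.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Collects I * J over the nest into a temporary. InnerLoFn and InnerHiFn
// stand in for pure bound expressions that do not depend on I.
std::vector<long> fillTemporary(long OuterLo, long OuterHi,
                                const std::function<long()> &InnerLoFn,
                                const std::function<long()> &InnerHiFn) {
  // Hoisted: evaluate the inner bounds once, before entering the nest.
  const long InnerLo = InnerLoFn();
  const long InnerHi = InnerHiFn();

  auto tripCount = [](long Lo, long Hi) {
    return Hi >= Lo ? static_cast<std::size_t>(Hi - Lo + 1) : std::size_t{0};
  };

  // With all bounds known, the extent of the temporary is known as well.
  std::vector<long> Temp;
  Temp.reserve(tripCount(OuterLo, OuterHi) * tripCount(InnerLo, InnerHi));

  for (long I = OuterLo; I <= OuterHi; ++I)
    for (long J = InnerLo; J <= InnerHi; ++J)
      Temp.push_back(I * J);
  return Temp;
}
```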

Differential Revision: https://reviews.llvm.org/D151110

13 months ago[Sema] `setInvalidDecl` for error deduction declaration
Congcong Cai [Tue, 23 May 2023 07:07:03 +0000 (09:07 +0200)]
[Sema] `setInvalidDecl` for error deduction declaration

Fixed: https://github.com/llvm/llvm-project/issues/62408
`setInvalidDecl` for invalid `CXXDeductionGuideDecl` to
avoid crashes during semantic analysis.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D149516

13 months ago[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
pvanhout [Mon, 15 May 2023 09:23:09 +0000 (11:23 +0200)]
[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits

This is basically a partial revert of https://reviews.llvm.org/D145586 ( fd1d60873fdc )

D145586 was originally introduced to help with SWDEV-363662, and it did, but
it also caused a 25% drop in performance in
some MIOpen benchmarks where, it seems,
functions are inlined more conservatively.

This patch restores the pre-D145586 behavior
for PromoteAlloca: functions with a non-entry CC
have a 32 VGPRs threshold, but only if the function
is not marked with "alwaysinline".
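
Read as a rule, the restored behavior looks roughly like the sketch
below. The `FunctionInfo` struct, the `vgprBudget` function and the
`DefaultBudget` parameter are hypothetical. Only the threshold of 32 and
the two conditions come from this message.

```cpp
// Hypothetical summary of the properties the rule looks at.
struct FunctionInfo {
  bool IsEntryCC;    // Function uses an entry calling convention.
  bool AlwaysInline; // Function is marked "alwaysinline".
};

constexpr unsigned NonEntryVGPRThreshold = 32;

// DefaultBudget is whatever limit would apply otherwise (an assumption of
// this sketch; the message does not say how it is derived).
unsigned vgprBudget(const FunctionInfo &F, unsigned DefaultBudget) {
  if (!F.IsEntryCC && !F.AlwaysInline)
    return NonEntryVGPRThreshold;
  return DefaultBudget;
}
```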

A good amount of AMDGPU code makes use of
the AMDGPUAlwaysInline pass anyway, so in our
backend "alwaysinline" seems very common.

This change does not affect SWDEV-363662 (the motivating issue for introducing D145586).

Fixes SWDEV-399519

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D150551

13 months ago[mlir] Apply ClangTidy performance finding (NFC)
Adrian Kuegel [Tue, 23 May 2023 06:52:53 +0000 (08:52 +0200)]
[mlir] Apply ClangTidy performance finding (NFC)

13 months ago[NFC] [C++20] [Modules] Add a test
Chuanqi Xu [Tue, 23 May 2023 06:31:05 +0000 (14:31 +0800)]
[NFC] [C++20] [Modules] Add a test

Add a test from https://github.com/llvm/llvm-project/issues/59999. It is
always good to have more tests.

13 months ago[RISCV][NFC] Simplify code.
Jianjian GUAN [Tue, 23 May 2023 03:44:08 +0000 (11:44 +0800)]
[RISCV][NFC] Simplify code.

Reduce scope of if-else statements.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151178

13 months ago[NFC] Add clang python reformat SHA to .git-blame-ignore-revs
Tobias Hieta [Tue, 23 May 2023 06:31:02 +0000 (08:31 +0200)]
[NFC] Add clang python reformat SHA to .git-blame-ignore-revs

13 months ago[NFC][Py Reformat] Reformat python files in clang and clang-tools-extra
Tobias Hieta [Wed, 17 May 2023 08:56:49 +0000 (10:56 +0200)]
[NFC][Py Reformat] Reformat python files in clang and clang-tools-extra

This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D150761

13 months ago[libc] Fix typos in documentation
Kazu Hirata [Tue, 23 May 2023 06:27:59 +0000 (23:27 -0700)]
[libc] Fix typos in documentation

13 months ago[libc] Fix typos in documentation
Kazu Hirata [Tue, 23 May 2023 06:25:16 +0000 (23:25 -0700)]
[libc] Fix typos in documentation

13 months ago[C++20] [Modules] Don't ignore -fmodule-file when we compile pcm files
Chuanqi Xu [Tue, 23 May 2023 05:26:16 +0000 (13:26 +0800)]
[C++20] [Modules] Don't ignore -fmodule-file when we compile pcm files

Close https://github.com/llvm/llvm-project/issues/62843.

Previously, when compiling .pcm files into .o files, the
`-fmodule-file=<module-name>=<module-path>` option was ignored. This
conflicts with our consensus in
https://github.com/llvm/llvm-project/issues/62707.

13 months ago[BPF] Remove unused declaration probeJmpExt
Kazu Hirata [Tue, 23 May 2023 06:19:58 +0000 (23:19 -0700)]
[BPF] Remove unused declaration probeJmpExt

The declaration was added without a corresponding function definition
by:

  commit dc1dbf6ef320175acbdc1206da4b0a176b304449
  Author: Yonghong Song <yhs@fb.com>
  Date:   Wed Aug 23 04:25:57 2017 +0000

13 months ago[lldb] Fix racing issue when loading inlined symbols from crash report
Med Ismail Bennani [Tue, 23 May 2023 05:13:23 +0000 (22:13 -0700)]
[lldb] Fix racing issue when loading inlined symbols from crash report

Following abba5de72466, some tests started failing on green-dragon:

https://green.lab.llvm.org/green/job/lldb-cmake/55460/console

Looking at the backtrace, there seems to be a racing issue when deleting
the temporary directory containing all the JSON object files:

```
Traceback (most recent call last):
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/lib/python3.10/site-packages/lldb/macosx/crashlog.py", line 1115, in __call__
    SymbolicateCrashLogs(debugger, shlex.split(command), result)
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/lib/python3.10/site-packages/lldb/macosx/crashlog.py", line 1457, in SymbolicateCrashLogs
    SymbolicateCrashLog(crash_log, options)
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/lib/python3.10/site-packages/lldb/macosx/crashlog.py", line 1158, in SymbolicateCrashLog
    with tempfile.TemporaryDirectory() as obj_dir:
  File "/usr/local/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tempfile.py", line 869, in __exit__
    self.cleanup()
  File "/usr/local/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/usr/local/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/usr/local/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 731, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/local/opt/python@3.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 729, in rmtree
    os.rmdir(path)
OSError: [Errno 66] Directory not empty: '/var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/tmp6qfifxk7'
```

This patch should fix that issue since it won't delete the object file
directory until we're sure that the tasks adding the modules have completed.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

13 months ago[ThinLTO] Make the cache key independent of the module identifier paths
Argyrios Kyrtzidis [Mon, 22 May 2023 22:16:20 +0000 (15:16 -0700)]
[ThinLTO] Make the cache key independent of the module identifier paths

Otherwise there are cache misses just from changing the name of a path, even though the input modules did not change.
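
The principle can be sketched as a key function that looks only at what a
module contains, never at where it lives. `ModuleInput` and `cacheKey` are
hypothetical, and `std::hash` is a stand-in for whatever hashing the real
cache uses.

```cpp
#include <cstddef>
#include <functional>
#include <string>

struct ModuleInput {
  std::string Path;     // Where the module was found on disk.
  std::string Contents; // The module's bytes.
};

// The key depends only on the contents, so renaming or moving the file
// does not invalidate an existing cache entry.
std::size_t cacheKey(const ModuleInput &M) {
  return std::hash<std::string>{}(M.Contents);
}
```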

rdar://109672225

Differential Revision: https://reviews.llvm.org/D151165

13 months ago[test] precommit tests for D141188
Zhongyunde [Tue, 23 May 2023 03:30:26 +0000 (11:30 +0800)]
[test] precommit tests for D141188

13 months agoThis is a retry of https://reviews.llvm.org/D114583, which was backed
Galen Elias [Tue, 23 May 2023 03:11:17 +0000 (20:11 -0700)]
This is a retry of https://reviews.llvm.org/D114583, which was backed
out for regressions.

Clang Format is detecting a nested scope followed by another open brace
as a braced initializer list due to incorrectly thinking it's matching a
braced initializer at the end of a constructor initializer list which is
followed by the body open brace.

Unfortunately, UnwrappedLineParser isn't doing a very detailed parse, so
it's not super straightforward to distinguish these cases given the
current structure of calculateBraceTypes. My current hypothesis is that
these can be disambiguated by looking at the token preceding the
l_brace, as initializer list parameters will be preceded by an
identifier, but a scope block generally will not (barring the MACRO
wildcard).

To this end, I am adding tracking of the previous token to the LBraceStack
to help scope this particular case.

TokenAnnotatorTests cherry picked from https://reviews.llvm.org/D150452.

Fixes #33891.
Fixes #52911.

Differential Revision: https://reviews.llvm.org/D150403

13 months ago[libc] Add an option to make `libc` only build the `libc-hdrgen` tool
Joseph Huber [Mon, 22 May 2023 21:00:41 +0000 (16:00 -0500)]
[libc] Add an option to make `libc` only build the `libc-hdrgen` tool

The `libc-hdrgen` tool is required for cross-builds, however some cases
can cause issues when configuring this build. This patch adds an
override option `LIBC_HDRGEN_ONLY` to allow us to retain the old
(incorrect) behaviour where `libc` would not build with any other
runtimes enabled.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151155

13 months ago[NFC] [C++20] [Modules] Refactor Sema::isModuleUnitOfCurrentTU into
Chuanqi Xu [Wed, 10 May 2023 03:54:04 +0000 (11:54 +0800)]
[NFC] [C++20] [Modules] Refactor Sema::isModuleUnitOfCurrentTU into
Decl::isInAnotherModuleUnit

Refactor `Sema::isModuleUnitOfCurrentTU` to `Decl::isInAnotherModuleUnit`
to make the code a little simpler. Note that although this patch
introduces a FIXME, this is an existing issue and this patch just tries
to describe it explicitly.

13 months ago[RISCV] Add more cost model tests for fixed vector casts. NFC
Craig Topper [Tue, 23 May 2023 02:51:42 +0000 (19:51 -0700)]
[RISCV] Add more cost model tests for fixed vector casts. NFC

This covers a full mix of legal and illegal types. I've reduced
the fixed vector length from 128 to 256.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D151127

13 months agoRevert "[CodeGen] Fix incorrect usage of MCPhysReg for diff list elements"
Sergei Barannikov [Tue, 23 May 2023 02:11:38 +0000 (05:11 +0300)]
Revert "[CodeGen] Fix incorrect usage of MCPhysReg for diff list elements"

This reverts commit fa2827f0796c08e36b0b157fc526dd59cd6368e3.

Causes build bot failures:
https://lab.llvm.org/buildbot/#/builders/38/builds/12037

13 months ago[mlir][sparse] (NFC) Reordering extraClassDeclaration for STEA
wren romano [Mon, 22 May 2023 23:07:48 +0000 (16:07 -0700)]
[mlir][sparse] (NFC) Reordering extraClassDeclaration for STEA

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151171

13 months ago[CodeGen] Fix incorrect usage of MCPhysReg for diff list elements
Sergei Barannikov [Sat, 20 May 2023 18:30:02 +0000 (21:30 +0300)]
[CodeGen] Fix incorrect usage of MCPhysReg for diff list elements

The lists contain differences between register numbers, not the register
numbers themselves. Since a difference can also be negative, this also
changes its type to signed.

Changing the type to signed exposed a "bug". For AMDGPU, which has many
registers, the first element of a sequence could be as big as ~45k.
The value does not fit into int16_t, but fits into uint16_t. The bug
didn't show up because of unsigned wrapping and truncation of the Val
field in the advance() method.

To fix the issue, I changed the way regunit difflists are encoded. The
4-bit 'scale' field of MCRegisterDesc::RegUnit was replaced by the 12-bit
number of the first regunit, and the first element of each of the lists
was removed. The higher 20 bits of the RegUnit field contain the initial
offset into the DiffLists array.
AMDGPU has 1'409 regunits (2^12 = 4'096), and the biggest offset is
80'041 (2^20 = 1'048'576). That is, there is enough room.
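
A sketch of the packing described above, assuming the first regunit sits
in the low 12 bits and the offset in the high 20 bits of a 32-bit field.
The helpers `packRegUnits` and `decodeRegUnits` are hypothetical, and the
zero-terminated list format is an assumption of this sketch that the
message does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned FirstUnitBits = 12;
constexpr std::uint32_t FirstUnitMask = (1u << FirstUnitBits) - 1;

// Packs the first regunit (low 12 bits) and the diff list offset (high 20
// bits) into one 32-bit field.
std::uint32_t packRegUnits(std::uint32_t FirstUnit, std::uint32_t Offset) {
  assert(FirstUnit <= FirstUnitMask && "first regunit must fit in 12 bits");
  assert(Offset < (1u << 20) && "offset must fit in 20 bits");
  return (Offset << FirstUnitBits) | FirstUnit;
}

// Walks the signed differences starting at the packed offset and rebuilds
// the regunit numbers. Assumption: a zero difference ends each list.
std::vector<unsigned> decodeRegUnits(std::uint32_t Packed,
                                     const std::vector<std::int16_t> &Diffs) {
  unsigned Unit = Packed & FirstUnitMask;
  std::size_t Pos = Packed >> FirstUnitBits;
  std::vector<unsigned> Units{Unit};
  while (Pos < Diffs.size() && Diffs[Pos] != 0) {
    // Differences are signed, so a later regunit may be smaller.
    Unit = static_cast<unsigned>(static_cast<int>(Unit) + Diffs[Pos]);
    Units.push_back(Unit);
    ++Pos;
  }
  return Units;
}
```

Because the first regunit is stored directly, the large initial value no
longer has to fit in a signed 16-bit difference.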

Changing the encoding method also resulted in a smaller array size; the
numbers are below (I omitted targets with fewer than 100 elements).

```
Target   |   Old |   New | Change
AMDGPU   | 80052 | 78741 |  -1,6%
RISCV    |  6498 |  6297 |  -3,1%
ARM      |  4181 |  3966 |  -5,1%
AArch64  |  2770 |  2592 |  -6,4%
PPC      |  1578 |  1441 |  -8,7%
Hexagon  |   994 |   740 | -25,6%
R600     |   508 |   398 | -21,7%
VE       |   471 |   459 |  -2,5%
Sparc    |   381 |   363 |  -4,7%
X86      |   326 |   208 | -36,2%
Mips     |   253 |   200 | -20,9%
SystemZ  |   186 |   162 | -12,9%
```

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D151036