platform/upstream/llvm.git
2 years ago[X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A).
Liu, Chen3 [Tue, 28 Sep 2021 01:36:34 +0000 (09:36 +0800)]
[X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A).

This bug was introduced by D109953. The operand order of generated FMA
is wrong.
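
As an illustration only (this is not the X86 combine itself), the intended rewrite and the effect of a wrong operand order can be sketched with `std::fma`. The function names are hypothetical, and the mis-ordered variant is just one possible wrong order; the commit does not say which order was generated.

```cpp
#include <cassert>
#include <cmath>

// FADD(A, FMA(B, C, 0)): a multiply-add with a zero addend, then a separate add.
double unfused(double a, double b, double c) {
  return a + std::fma(b, c, 0.0);
}

// FMA(B, C, A): the combined form. The addend A must be the third operand.
double combined(double a, double b, double c) {
  return std::fma(b, c, a);
}

// A hypothetical mis-ordered combine that computes A * B + C instead.
double misordered(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Small self-check with exactly representable values.
inline void checkOperandOrder() {
  assert(unfused(1.0, 2.0, 3.0) == combined(1.0, 2.0, 3.0));
  assert(misordered(1.0, 2.0, 3.0) != combined(1.0, 2.0, 3.0));
}
```

With A=1, B=2, C=3 the first two forms give 7, while the mis-ordered form gives 5.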

Differential Revision: https://reviews.llvm.org/D110606

2 years ago[ORC] Fix the LLJITWithRemoteDebugging example.
Lang Hames [Tue, 28 Sep 2021 03:04:39 +0000 (20:04 -0700)]
[ORC] Fix the LLJITWithRemoteDebugging example.

This was broken by the switch from JITTargetAddress to ExecutorAddr in
21a06254a3a.

2 years ago[ISel] Legalized arithmetic.fence.f128 for 32-bits target
Xiang1 Zhang [Sat, 25 Sep 2021 02:41:37 +0000 (10:41 +0800)]
[ISel] Legalized arithmetic.fence.f128 for 32-bits target

Reviewed By: Craig Topper, Wang Pengfei

Differential Revision: https://reviews.llvm.org/D110467

2 years ago[LoopPred Test] Fix lld-x86_64-win BB failure
Anna Thomas [Tue, 28 Sep 2021 01:27:01 +0000 (21:27 -0400)]
[LoopPred Test] Fix lld-x86_64-win BB failure

Need a more general CHECK line for the testcase in 5df9112 to correctly
handle the lld-x86_64-win buildbot.

2 years agoRevert "tsan: fix trace tests on darwin"
Ahsan Saghir [Tue, 28 Sep 2021 01:17:17 +0000 (20:17 -0500)]
Revert "tsan: fix trace tests on darwin"

This reverts commit 94ea36649ecc854d290c6797e6adb91bdfac756d.

Reverting due to errors on buildbots.

2 years agoReland "[LoopPredication] Add testcase showing BPI computation. NFC"
Anna Thomas [Tue, 28 Sep 2021 00:51:04 +0000 (20:51 -0400)]
Reland "[LoopPredication] Add testcase showing BPI computation. NFC"

This relands commit 16a62d4f.
Relanded after fixing CHECK-LINES for opt pipeline output to be more
general (based on failures seen in buildbot).

2 years agoclang-format
Lang Hames [Tue, 28 Sep 2021 01:00:23 +0000 (18:00 -0700)]
clang-format

2 years ago[llvm-jitlink] Add more information about allocation failures.
Lang Hames [Tue, 28 Sep 2021 00:59:15 +0000 (17:59 -0700)]
[llvm-jitlink] Add more information about allocation failures.

Slab allocator failures will now report requested size and remaining capacity.

2 years ago[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins
Ahsan Saghir [Mon, 13 Sep 2021 01:19:41 +0000 (20:19 -0500)]
[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins

This patch adds the following built-ins:

__builtin_vsx_build_pair
__builtin_mma_build_acc

Reviewed By: #powerpc, nemanjai, lei

Differential Revision: https://reviews.llvm.org/D107647

2 years ago[ORC] Switch from JITTargetAddress to ExecutorAddr for EPC-call APIs.
Lang Hames [Mon, 27 Sep 2021 23:47:24 +0000 (16:47 -0700)]
[ORC] Switch from JITTargetAddress to ExecutorAddr for EPC-call APIs.

Part of the ongoing move to ExecutorAddr.

2 years ago[Polly] Reject regions entered by an indirectbr/callbr.
Michael Kruse [Mon, 27 Sep 2021 01:10:26 +0000 (20:10 -0500)]
[Polly] Reject regions entered by an indirectbr/callbr.

SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.

This patch rejects regions entered by an indirectbr/callbr so that code
generation does not fail later.

This fixes llvm.org/PR51964

Recommit with "REQUIRES: asserts" in test that uses statistics.

2 years ago[libc++][NFC] s/enable_if<...>::type/enable_if_t<...> in span
Joe Loser [Mon, 27 Sep 2021 23:18:46 +0000 (19:18 -0400)]
[libc++][NFC] s/enable_if<...>::type/enable_if_t<...> in span

There is some use of `enable_if<...>::type` when the rest of the file
uses `enable_if_t`. So, use `enable_if_t` consistently throughout.
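
For readers unfamiliar with the two spellings, a minimal sketch of their equivalence follows. The function names are hypothetical and are not taken from `<span>`.

```cpp
#include <cassert>
#include <type_traits>

// Older spelling: typename std::enable_if<...>::type.
template <class T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T twiceVerbose(T value) {
  return value + value;
}

// Equivalent C++14 alias: std::enable_if_t<...>.
template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
T twiceTerse(T value) {
  return value + value;
}

// Both spellings name the same type.
static_assert(std::is_same<typename std::enable_if<true, int>::type,
                           std::enable_if_t<true, int>>::value,
              "enable_if_t is an alias for enable_if<...>::type");

// Small self-check: both templates accept integral arguments.
inline void checkTwice() {
  assert(twiceVerbose(21) == 42);
  assert(twiceTerse(21) == 42);
}
```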

2 years agoRevert "[Polly] Reject reject regions entered by an indirectbr/callbr."
Haowei Wu [Mon, 27 Sep 2021 23:05:33 +0000 (16:05 -0700)]
Revert "[Polly] Reject reject regions entered by an indirectbr/callbr."

This reverts commit 91f46bb77e6d56955c3b96e9e844ae6a251c41e9 which
causes test failures when assertions are off.

2 years ago[ORC] Hold shared_ptr<SymbolStringPool> in errors containing SymbolStringPtrs.
Lang Hames [Mon, 27 Sep 2021 22:25:30 +0000 (15:25 -0700)]
[ORC] Hold shared_ptr<SymbolStringPool> in errors containing SymbolStringPtrs.

This allows these error values to remain valid, even if they tear down the JIT
itself.
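
A rough sketch of the lifetime idea, using hypothetical stand-in types rather than the real ORC classes: an error that refers to pooled strings also holds a `shared_ptr` to the pool, so the strings stay valid after the owner is gone.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for a pool that owns interned strings.
struct StringPool {
  std::set<std::string> strings;
  const std::string *intern(const std::string &s) {
    // std::set is node-based, so the address stays stable.
    return &*strings.insert(s).first;
  }
};

// Hypothetical error value: it points into the pool and also shares
// ownership of it.
struct SymbolNotFound {
  std::shared_ptr<StringPool> pool; // keeps the storage alive
  const std::string *name;          // points into *pool
};

// Hypothetical owner of the pool, standing in for the JIT.
struct Session {
  std::shared_ptr<StringPool> pool = std::make_shared<StringPool>();
  SymbolNotFound missing(const std::string &symbol) {
    return SymbolNotFound{pool, pool->intern(symbol)};
  }
};

// Creates an error, destroys the session, then reads the error's name.
inline std::string nameAfterTeardown() {
  auto session = std::make_unique<Session>();
  SymbolNotFound err = session->missing("foo");
  session.reset(); // tear down the owner
  assert(err.pool.use_count() == 1); // the error is now the only owner
  return *err.name;
}
```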

2 years ago[CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied
Congzhe Cao [Mon, 27 Sep 2021 22:30:20 +0000 (18:30 -0400)]
[CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied

With improved analysis in determining CFG equivalence that does
not require strict dominance and post-dominance conditions, we
now relax isSafeToMoveBefore() such that an instruction I can
be moved before InsertPoint even if they do not strictly dominate
each other, as long as they follow the same control flow path.

For example, we can move Instruction 0 before Instruction 1,
and vice versa.

```
if (cond1)
   // Instruction 0: %add = add i32 1, 2
if (cond1)
   // Instruction 1: %add2 = add i32 2, 1
```

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D110456

2 years agoRevert "tsan: add a test for stack init race"
Kevin Athey [Mon, 27 Sep 2021 21:48:44 +0000 (14:48 -0700)]
Revert "tsan: add a test for stack init race"

This reverts commit b72176b9bc06146d12e495167977effe050dc326.

Broke bot: https://lab.llvm.org/buildbot/#/builders/70/builds/12193

2 years ago[gn build] Port 6cfb4d46bae1
LLVM GN Syncbot [Mon, 27 Sep 2021 21:56:39 +0000 (21:56 +0000)]
[gn build] Port 6cfb4d46bae1

2 years ago[llvm-readobj] Support dumping of MSP430 ELF attributes
Jozef Lawrynowicz [Mon, 27 Sep 2021 21:55:32 +0000 (00:55 +0300)]
[llvm-readobj] Support dumping of MSP430 ELF attributes

The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.

Differential Revision: https://reviews.llvm.org/D107969

2 years ago[libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal
Jon Chesterfield [Mon, 27 Sep 2021 21:21:07 +0000 (22:21 +0100)]
[libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal

2 years ago[libomptarget][amdgpu] Report zero devices if plugin construction fails, instead...
Jon Chesterfield [Mon, 27 Sep 2021 20:48:29 +0000 (21:48 +0100)]
[libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv

2 years agoRevert "[LoopPredication] Add testcase showing BPI computation. NFC"
Anna Thomas [Mon, 27 Sep 2021 21:08:28 +0000 (17:08 -0400)]
Revert "[LoopPredication] Add testcase showing BPI computation. NFC"

This reverts commit 16a62d4f3dca189b0e0565c7ebcd83ddfcc67629.

Needs some update to check lines to fix bb failure.

2 years ago[libc++] Do not enable P1951 before C++23, since it's a breaking change
Louis Dionne [Thu, 23 Sep 2021 16:47:24 +0000 (12:47 -0400)]
[libc++] Do not enable P1951 before C++23, since it's a breaking change

In reaction to the issues raised by Richard in https://llvm.org/D109066,
this commit does not apply P1951 as a DR in previous standard modes,
since it breaks valid code.

I do believe it should be applied as a DR, however ideally we'd get some
sort of statement from the Committee to this effect (and all implementations
would behave consistently). In the meantime, only implement P1951 starting
with C++23 -- we can always come back and apply it as a DR if that's what
the Committee says.

Differential Revision: https://reviews.llvm.org/D110347

2 years ago[LoopPredication] Add testcase showing BPI computation. NFC
Anna Thomas [Mon, 27 Sep 2021 20:52:09 +0000 (16:52 -0400)]
[LoopPredication] Add testcase showing BPI computation. NFC

Precommit testcase for D110438. Since we do not preserve BPI in the loop
pass manager, we are forced to compute BPI every time loop predication is
invoked.
The referenced patch changes that behaviour by preserving lossy BPI for
loop passes.

2 years ago[X86] Add slow/fast pmulld test coverage to vector-mul.ll
Simon Pilgrim [Mon, 27 Sep 2021 20:42:08 +0000 (21:42 +0100)]
[X86] Add slow/fast pmulld test coverage to vector-mul.ll

2 years ago[gwp-asan] Initialize AllocatorVersionMagic at runtime
Kostya Kortchinsky [Mon, 27 Sep 2021 19:31:59 +0000 (12:31 -0700)]
[gwp-asan] Initialize AllocatorVersionMagic at runtime

GWP-ASan's `AllocatorState` was recently extended with an
`AllocatorVersionMagic` structure required so that GWP-ASan bug reports
can be understood by tools at different versions.

On Fuchsia, this is included in the `scudo::Allocator` structure, and
by having non-zero initializers, this effectively moved the static
allocator structure from the `.bss` segment to the `.data` segment, thus
increasing (significantly) the size of the libc.

This CL proposes to initialize the structure with its magic numbers at
runtime, allowing for the allocator to go back into the `.bss` segment.
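
The general idea can be sketched as follows. All names and the magic value are hypothetical and are not the GWP-ASan definitions. Whether a given object actually lands in `.bss` depends on the toolchain, so this only shows the zero-initialized-then-filled-at-runtime pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical magic value; the real constants live in GWP-ASan.
constexpr std::uint32_t kMagic = 0x41534747;

struct VersionMagic {
  std::uint32_t magic; // deliberately no default member initializer
  std::uint16_t version;
};

// Hypothetical allocator state. With no non-zero initializers, a static
// instance is zero-initialized and is a candidate for .bss.
struct AllocatorState {
  VersionMagic versionMagic;
  unsigned char storage[4096];
};

static AllocatorState gState; // zero-initialized static storage

// Fill in the magic numbers at runtime instead of in a static initializer.
inline void initState(AllocatorState &state) {
  assert(state.versionMagic.magic == 0 && "state already initialized");
  state.versionMagic.magic = kMagic;
  state.versionMagic.version = 1;
}
```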

I will work on adding a test on the Scudo side to ensure that this type
of change gets detected early on. Additional work is also needed to
reduce the footprint of the (large) memory-tagging related structures
that are currently part of the allocator.

Differential Revision: https://reviews.llvm.org/D110575

2 years ago[NFC][X86] Add 'gather' optsize/minsize test coverage
Roman Lebedev [Mon, 27 Sep 2021 20:47:23 +0000 (23:47 +0300)]
[NFC][X86] Add 'gather' optsize/minsize test coverage

2 years ago[NFC] [PSI] explain encoding of PercentileCutoff.
Florian Mayer [Tue, 14 Sep 2021 15:54:18 +0000 (16:54 +0100)]
[NFC] [PSI] explain encoding of PercentileCutoff.

Reviewed By: mtrofin, davidxl

Differential Revision: https://reviews.llvm.org/D109764

2 years ago[Driver] Remove confusing *-linux-android detection with non-android --target=
Fangrui Song [Mon, 27 Sep 2021 20:28:40 +0000 (13:28 -0700)]
[Driver] Remove confusing *-linux-android detection with non-android --target=

These values allow, for example, `--target=aarch64` and
`--target=aarch64-linux-gnu` to detect `aarch64-linux-android`. This is
confusing. Users should specify `--target=aarch64-linux-android` to get Android GCC
installation.

Reverts D53463.

Reviewed By: nickdesaulniers, danalbert

Differential Revision: https://reviews.llvm.org/D110379

2 years ago[NFC][X86] Add test showing that legal `GATHER`'s are expanded on Znver3
Roman Lebedev [Mon, 27 Sep 2021 19:40:25 +0000 (22:40 +0300)]
[NFC][X86] Add test showing that legal `GATHER`'s are expanded on Znver3

2 years ago[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
modimo [Mon, 27 Sep 2021 19:24:28 +0000 (12:24 -0700)]
[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation

Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=0`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)
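
As a simplified sketch of how the three summary attributes above could drive bottom-up propagation of noUnwind: the data structure and function below are hypothetical, assume an acyclic call graph visited callees-first, and are not the ThinLTO implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-function summary; field names follow the list above.
struct FunctionSummary {
  bool mayThrow = false;            // has a non-call instruction that may throw
  bool hasUnknownCall = false;      // has calls missing from the summary graph
  std::vector<std::size_t> callees; // indices of summarized callees
  bool noUnwind = false;            // result of propagation
};

// Assumes `order` lists function indices bottom-up (callees before callers)
// and that the call graph is acyclic. Recursion (SCCs) is not handled here.
inline void propagateNoUnwind(std::vector<FunctionSummary> &functions,
                              const std::vector<std::size_t> &order) {
  for (std::size_t index : order) {
    assert(index < functions.size());
    FunctionSummary &fn = functions[index];
    bool result = !fn.mayThrow && !fn.hasUnknownCall;
    for (std::size_t callee : fn.callees)
      result = result && functions[callee].noUnwind;
    fn.noUnwind = result;
  }
}
```

A caller is marked noUnwind only if it has no throwing instruction, no unknown call, and every summarized callee is already noUnwind.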

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D36850

2 years ago[mlir][linalg] Finer-grained padding control.
Tobias Gysi [Mon, 27 Sep 2021 19:20:56 +0000 (19:20 +0000)]
[mlir][linalg] Finer-grained padding control.

Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110572

2 years ago[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs
Roman Lebedev [Mon, 27 Sep 2021 19:18:41 +0000 (22:18 +0300)]
[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.

For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.

For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =64.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110548

2 years ago[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs
Roman Lebedev [Mon, 27 Sep 2021 19:18:40 +0000 (22:18 +0300)]
[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)

For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110543

2 years ago[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs
Roman Lebedev [Mon, 27 Sep 2021 19:18:36 +0000 (22:18 +0300)]
[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.

For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110541

2 years ago[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs
Roman Lebedev [Mon, 27 Sep 2021 19:18:32 +0000 (22:18 +0300)]
[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.

For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110537

2 years ago[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs
Roman Lebedev [Mon, 27 Sep 2021 19:18:27 +0000 (22:18 +0300)]
[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110536

2 years agoFixing docs build
Chris Bieneman [Mon, 27 Sep 2021 19:16:28 +0000 (14:16 -0500)]
Fixing docs build

I always forget that new line...

2 years agoImplement #pragma clang final extension
Chris Bieneman [Mon, 27 Sep 2021 16:59:55 +0000 (11:59 -0500)]
Implement #pragma clang final extension

This patch adds a new preprocessor extension ``#pragma clang final``
which enables warning on undefinition and re-definition of macros.

The intent of this warning is to extend beyond ``-Wmacro-redefined`` to
warn against any and all alterations to macros that are marked `final`.

This warning is part of the ``-Wpedantic-macros`` diagnostics group.
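
A small usage sketch, assuming a Clang version that ships this extension. The macro name is hypothetical, and the pragma is guarded so that other compilers skip it.

```cpp
#include <cassert>

#define BUFFER_SIZE 64

#ifdef __clang__
// Mark the macro as final. Clang versions without this extension are
// expected to treat the pragma as unknown.
#pragma clang final(BUFFER_SIZE)
#endif

// With the extension active, either of the following lines would draw a
// -Wpedantic-macros warning:
//   #undef BUFFER_SIZE
//   #define BUFFER_SIZE 128

inline int bufferSize() {
  return BUFFER_SIZE;
}
```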

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D108567

2 years ago[InstCombine] reduce code for shl-of-sub transform; NFC
Sanjay Patel [Mon, 27 Sep 2021 18:54:40 +0000 (14:54 -0400)]
[InstCombine] reduce code for shl-of-sub transform; NFC

2 years ago[InstCombine] add tests for shl-of-sub; NFC
Sanjay Patel [Mon, 27 Sep 2021 18:32:42 +0000 (14:32 -0400)]
[InstCombine] add tests for shl-of-sub; NFC

2 years ago[mlir][sparse] sampled matrix multiplication fusion test
Aart Bik [Fri, 24 Sep 2021 20:36:52 +0000 (13:36 -0700)]
[mlir][sparse] sampled matrix multiplication fusion test

This integration test runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!

NOTE: relies on pending tensor.init fix!

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110444

2 years agoRevert "[clangd] Refactor IncludeStructure: use File (unsigned) for most computations"
Nico Weber [Mon, 27 Sep 2021 18:38:18 +0000 (14:38 -0400)]
Revert "[clangd] Refactor IncludeStructure: use File (unsigned) for most computations"

This reverts commit 0b1eff1bc5d004b1964bb9b1667e3efc034f3f62.
Breaks check-clangd on Windows, see comments on
https://reviews.llvm.org/D110386

2 years agoRevert "[openmp] Add addrspacecast to getOrCreateIdent"
Jon Chesterfield [Mon, 27 Sep 2021 18:27:00 +0000 (19:27 +0100)]
Revert "[openmp] Add addrspacecast to getOrCreateIdent"

This reverts commit 1a761e5b7b50dc08e0ff7f7aea65e1da29c5cd80.
Failed CI, albeit with a different failure mode to BZ51982

2 years ago[openmp] Add addrspacecast to getOrCreateIdent
Jon Chesterfield [Mon, 27 Sep 2021 18:23:11 +0000 (19:23 +0100)]
[openmp] Add addrspacecast to getOrCreateIdent

Fixes 51982. Minor refactor to remove `return x = y` construct.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556

2 years ago[mlir][sparse] preserve zero-initialization for materializing buffers
Aart Bik [Fri, 24 Sep 2021 20:15:17 +0000 (13:15 -0700)]
[mlir][sparse] preserve zero-initialization for materializing buffers

This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110442

2 years agoAdd a missing include to appease the build bots
Aaron Ballman [Mon, 27 Sep 2021 18:18:58 +0000 (14:18 -0400)]
Add a missing include to appease the build bots

2 years ago[InstCombine] add use check to shl transform
Sanjay Patel [Mon, 27 Sep 2021 17:49:13 +0000 (13:49 -0400)]
[InstCombine] add use check to shl transform

This bug was introduced with the refactoring in:
9075edc89bc9
...but there were no tests to detect it.

2 years ago[InstCombine] add tests for opposing shifts separated by trunc; NFC
Sanjay Patel [Mon, 27 Sep 2021 17:42:55 +0000 (13:42 -0400)]
[InstCombine] add tests for opposing shifts separated by trunc; NFC

2 years agoBad SLPVectorization shufflevector replacement, resulting in write to wrong memory...
Jameson Nash [Thu, 22 Jul 2021 23:42:23 +0000 (19:42 -0400)]
Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location

We see that it might otherwise do:

  %10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
  %11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
  %27 = extractelement <2 x i64*> %11, i32 0
  %28 = bitcast i64* %27 to <2 x i64>*
  store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2

Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.

Differential Revision: https://reviews.llvm.org/D106613

2 years agoFix bug in readability-uppercase-literal-suffix
Carlos Galvez [Mon, 27 Sep 2021 18:02:53 +0000 (14:02 -0400)]
Fix bug in readability-uppercase-literal-suffix

Fixes https://bugs.llvm.org/show_bug.cgi?id=51790. The check triggers
incorrectly with non-type template parameters.

A bisect determined that the bug was introduced here:
https://github.com/llvm/llvm-project/commit/ea2225a10be986d226e041d20d36dff17e78daed

Unfortunately that patch can no longer be reverted on top of the main
branch, so add a fix instead. Add a unit test to avoid regression in
the future.

2 years ago[flang] Catch branching into FORALL/WHERE constructs
peter klausler [Thu, 16 Sep 2021 17:03:45 +0000 (10:03 -0700)]
[flang] Catch branching into FORALL/WHERE constructs

Enforce constraints C1034 & C1038, which disallow the use
of otherwise valid statements as branch targets when they
appear in FORALL &/or WHERE constructs.  (And make the
diagnostic message somewhat more user-friendly.)

Differential Revision: https://reviews.llvm.org/D109936

2 years ago[AMDGPU] Change ASAN init/fini kernels linkage to external.
Praveen Velliengiri [Mon, 27 Sep 2021 17:49:49 +0000 (11:49 -0600)]
[AMDGPU] Change ASAN init/fini kernels linkage to external.

The HSA runtime fails to find the symbols for the Init and Fini kernels
because they are marked with internal linkage. Change the linkage to
external to fix those errors.

Differential Revision: https://reviews.llvm.org/D110054

2 years ago[mlir] Mode for explicitly controlling the fusion kind
Sumesh Udayakumaran [Sat, 25 Sep 2021 22:46:03 +0000 (01:46 +0300)]
[mlir] Mode for explicitly controlling the fusion kind

New mode option that allows for either running the default fusion kind that happens today or doing either of producer-consumer or sibling fusion. This will also be helpful to minimize the compile-time of the fusion tests.

Reviewed By: bondhugula, dcaballe

Differential Revision: https://reviews.llvm.org/D110102

2 years ago[PowerPC] Fix td pattern for P10 VSLDBI and VSRDBI
Quinn Pham [Thu, 16 Sep 2021 19:00:01 +0000 (14:00 -0500)]
[PowerPC] Fix td pattern for P10 VSLDBI and VSRDBI

This patch fixes the pattern for the P10 instructions Vector Shift Left
Double by Bit Immediate VN-form and Vector Shift Right Double by Bit
Immediate VN-form. The third argument should be a target constant (`timm`)
instead of an `i32` because an immediate is expected.

Reviewed By: lei

Differential Revision: https://reviews.llvm.org/D109920

2 years ago[HIP] Fix linking of asanrt.bc
Yaxun (Sam) Liu [Thu, 23 Sep 2021 03:45:27 +0000 (23:45 -0400)]
[HIP] Fix linking of asanrt.bc

HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which
changes the linkage of functions to be internal once they are linked in. This
works for common bitcode libraries since these functions are not intended
to be exposed for external callers.

However, the functions in the sanitizer bitcode library are intended to be
called by instructions generated by the sanitizer pass. If their linkage is
changed to internal, their parameters may be altered by optimizations before
the sanitizer pass, which renders them unusable by the sanitizer pass.

To fix this issue, HIP toolchain links the sanitizer bitcode library with
-mlink-bitcode-file, which does not change the linkage.

A struct BitCodeLibraryInfo is introduced in ToolChain as a generic
approach to pass the bitcode library information between ToolChain and Tool.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D110304

2 years ago[MLIR][LLVM] Add error if using incorrect attribute type for specifying LLVM linkage
William S. Moses [Mon, 27 Sep 2021 16:55:24 +0000 (12:55 -0400)]
[MLIR][LLVM] Add error if using incorrect attribute type for specifying LLVM linkage

Address post-commit review in https://reviews.llvm.org/D108524 to add appropriate diagnostics.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110566

2 years ago[flang] Enforce constraint: defined ass't in WHERE must be elemental
peter klausler [Wed, 15 Sep 2021 15:28:48 +0000 (08:28 -0700)]
[flang] Enforce constraint: defined ass't in WHERE must be elemental

A defined assignment subroutine invoked in the context of a WHERE
statement or construct must necessarily be elemental (C1032).

Differential Revision: https://reviews.llvm.org/D109932

2 years ago[RISCV] Fold store of vmv.x.s to a vse with VL=1.
Craig Topper [Mon, 27 Sep 2021 16:45:30 +0000 (09:45 -0700)]
[RISCV] Fold store of vmv.x.s to a vse with VL=1.

This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.

We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109482

2 years ago[ELF] Support symbol names with space in linker script expressions
Fangrui Song [Mon, 27 Sep 2021 16:50:41 +0000 (09:50 -0700)]
[ELF] Support symbol names with space in linker script expressions

Fix PR51961

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D110490

2 years ago[InstCombine] Fix an "unused variable" warning
Kazu Hirata [Mon, 27 Sep 2021 16:49:32 +0000 (09:49 -0700)]
[InstCombine] Fix an "unused variable" warning

2 years agoImplement the conversion from sparse constant to sparse tensors.
Bixia Zheng [Sat, 25 Sep 2021 06:19:07 +0000 (23:19 -0700)]
Implement the conversion from sparse constant to sparse tensors.

The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.

Add tests.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D110373

2 years ago[OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp
@vladaindjic [Mon, 27 Sep 2021 16:44:46 +0000 (19:44 +0300)]
[OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp

This minor refactoring introduces the TASK_TIED constant inside
kmp_gsupport.cpp as a replacement for the literal value 1.
The constant is now used in both kmp_tasking.cpp and
kmp_gsupport.cpp.

Differential Revision: https://reviews.llvm.org/D110441

2 years ago[RISCV] Improve support for forming widening multiplies when one input is a scalar...
Craig Topper [Mon, 27 Sep 2021 16:37:04 +0000 (09:37 -0700)]
[RISCV] Improve support for forming widening multiplies when one input is a scalar splat.

If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.
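
The arithmetic behind this can be sketched in scalar C++ for the signed 8-bit to 16-bit case. The helper names are hypothetical and this models only the value computation, not the RISC-V lowering.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if a 16-bit scalar is representable as a sign-extended 8-bit value,
// i.e. it has enough sign bits to be narrowed without loss.
inline bool fitsInInt8(std::int16_t scalar) {
  return scalar >= INT8_MIN && scalar <= INT8_MAX;
}

// Reference form: sign-extend each element to 16 bits, then multiply by the
// splatted 16-bit scalar.
inline std::vector<std::int16_t>
mulExtended(const std::vector<std::int8_t> &v, std::int16_t scalar) {
  std::vector<std::int16_t> out;
  for (std::int8_t x : v)
    out.push_back(static_cast<std::int16_t>(std::int16_t{x} * scalar));
  return out;
}

// Widening form: multiply two 8-bit inputs to produce a 16-bit result.
inline std::vector<std::int16_t>
mulWidening(const std::vector<std::int8_t> &v, std::int16_t scalar) {
  assert(fitsInInt8(scalar) && "scalar needs sufficient sign bits");
  const std::int8_t narrow = static_cast<std::int8_t>(scalar);
  std::vector<std::int16_t> out;
  for (std::int8_t x : v)
    out.push_back(static_cast<std::int16_t>(x * narrow));
  return out;
}
```

When the scalar fits in 8 bits, both forms produce the same 16-bit products, which is why the widening multiply is a valid replacement.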

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110028

2 years ago[NFC][AMDGPU] Update cost model tests:
Daniil Fukalov [Mon, 27 Sep 2021 16:23:47 +0000 (19:23 +0300)]
[NFC][AMDGPU] Update cost model tests:

1. Convert to generated tests.
2. Added code-size case in few places.

2 years ago[InstCombine] move shl-only folds out from under commonShiftTransforms(); NFCI
Sanjay Patel [Mon, 27 Sep 2021 16:06:40 +0000 (12:06 -0400)]
[InstCombine] move shl-only folds out from under commonShiftTransforms(); NFCI

This is no-functional-change-intended, but it hopefully makes things
slightly clearer and more efficient to have transforms that require
'shl' be called only from visitShl(). Further cleanup is possible.

2 years ago[lldb] A different fix for Domain Socket tests
Pavel Labath [Mon, 27 Sep 2021 15:57:22 +0000 (17:57 +0200)]
[lldb] A different fix for Domain Socket tests

We need to drop NULs from the end of the string.

2 years ago[Lanai] Remove redundant declaration getTheLanaiTarget (NFC)
Kazu Hirata [Mon, 27 Sep 2021 15:58:27 +0000 (08:58 -0700)]
[Lanai] Remove redundant declaration getTheLanaiTarget (NFC)

Note that getTheLanaiTarget is declared in
TargetInfo/LanaiTargetInfo.h, which LanaiDisassembler.cpp includes.

Identified with readability-redundant-declaration.

2 years ago[clangd] Refactor IncludeStructure: use File (unsigned) for most computations
Kirill Bobyrev [Mon, 27 Sep 2021 15:50:50 +0000 (17:50 +0200)]
[clangd] Refactor IncludeStructure: use File (unsigned) for most computations

Preparation for D108194.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110386

2 years ago[OpenMP] Add new worksharing definitions into device RTL
Joseph Huber [Fri, 24 Sep 2021 15:53:31 +0000 (11:53 -0400)]
[OpenMP] Add new worksharing definitions into device RTL

This patch defines the newly added `__kmpc_distribute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.

Depends on D110429

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110430

2 years ago[OpenMP] Introduce a new worksharing RTL function for distribute
Joseph Huber [Fri, 24 Sep 2021 15:02:36 +0000 (11:02 -0400)]
[OpenMP] Introduce a new worksharing RTL function for distribute

This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110429

2 years ago[lldb] Fix SocketTest.DomainGetConnectURI on macOS by stripping more zeroes from...
Raphael Isemann [Mon, 27 Sep 2021 13:28:02 +0000 (15:28 +0200)]
[lldb] Fix SocketTest.DomainGetConnectURI on macOS by stripping more zeroes from getpeername result

Apparently macOS pads the name result with several zeroes at
the end. Just strip them all to pretend it's a C-string.
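
A minimal sketch of the stripping step, using a hypothetical helper on `std::string` rather than the actual lldb code:

```cpp
#include <cassert>
#include <string>

// Drops every trailing NUL byte so the result matches what a C-string
// reader would see.
inline std::string stripTrailingNuls(std::string name) {
  while (!name.empty() && name.back() == '\0')
    name.pop_back();
  return name;
}

// Small self-check: a 9-character path followed by three NUL bytes.
inline void checkStripTrailingNuls() {
  const std::string padded("/tmp/sock\0\0\0", 12);
  assert(padded.size() == 12);
  assert(stripTrailingNuls(padded) == "/tmp/sock");
}
```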

Thanks to Pavel for suggesting this fix.

2 years ago[llvm/OptTable] Add named param comment for GroupedShortOption
Nico Weber [Mon, 27 Sep 2021 15:31:45 +0000 (11:31 -0400)]
[llvm/OptTable] Add named param comment for GroupedShortOption

2 years agoFix tests defaulting to incorrect triples on AIX
Jake Egan [Mon, 27 Sep 2021 15:29:53 +0000 (11:29 -0400)]
Fix tests defaulting to incorrect triples on AIX

The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave.

This patch constrains the tests by specifying -mtriple instead of -march.

Reviewed By: daltenty, jsji, MaskRay

Differential Revision: https://reviews.llvm.org/D110186

2 years ago[llvm/OptTable] Drop "The" prefix on fields
Nico Weber [Mon, 27 Sep 2021 15:24:51 +0000 (11:24 -0400)]
[llvm/OptTable] Drop "The" prefix on fields

2 years ago[llvm] Convert OptTable::ParseOneArg() to std::unique_ptr<>
Nico Weber [Mon, 27 Sep 2021 15:19:04 +0000 (11:19 -0400)]
[llvm] Convert OptTable::ParseOneArg() to std::unique_ptr<>

2 years ago[llvm] Convert OptTable::parseOneArgGrouped() to std::unique_ptr<>
Nico Weber [Mon, 27 Sep 2021 15:10:13 +0000 (11:10 -0400)]
[llvm] Convert OptTable::parseOneArgGrouped() to std::unique_ptr<>

2 years ago[llvm] Convert Option::accept(), acceptInternal() to std::unique_ptr<>
Nico Weber [Mon, 27 Sep 2021 15:04:07 +0000 (11:04 -0400)]
[llvm] Convert Option::accept(), acceptInternal() to std::unique_ptr<>

These functions transfer ownership to the caller. Make this clear in the
type system.

No behavior change.
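
As an illustration of expressing the ownership transfer in the type system, here is a minimal sketch. `ParsedArg` and `acceptArg` are hypothetical stand-ins and do not reproduce the real `llvm::opt` signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed argument; not the real llvm::opt::Arg.
struct ParsedArg {
  std::string Spelling;
};

// With a raw pointer return, ownership is only a convention:
//   ParsedArg *acceptArg(const std::string &S); // caller must delete
// Returning std::unique_ptr states in the signature that the caller owns
// the result.
std::unique_ptr<ParsedArg> acceptArg(const std::string &Spelling) {
  if (Spelling.empty() || Spelling[0] != '-')
    return nullptr; // not an option, so there is nothing to hand over
  auto Result = std::make_unique<ParsedArg>();
  Result->Spelling = Spelling;
  return Result;
}
```

A caller that wants to pass the object on has to write `std::move`, so each ownership transfer is visible at the call site.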

2 years ago[InstCombine] generalize fold for (trunc (X u>> C1)) u>> C
Sanjay Patel [Mon, 27 Sep 2021 13:27:28 +0000 (09:27 -0400)]
[InstCombine] generalize fold for (trunc (X u>> C1)) u>> C

This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c734 was a previous patch in this direction.

The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
   is limited to 'shl' only now. The formatting change to IsLeftShift shows
   that we could move several transforms into visitShl() directly for
   efficiency because they are not common shift transforms.

2. The tests are regenerated to show new instruction names to prove that
   we are getting (almost) identical logic results.

3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
   we now use a narrow 'and' instruction. Previously, we relied on another
   transform to do that, but it is limited to legal types. That seems to
   be a legacy constraint from when IR analysis and codegen were less robust.

https://alive2.llvm.org/ce/z/JxyGA4

  declare void @llvm.assume(i1)

  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)

    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }

  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1

    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
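
The same equivalence can be spot-checked outside Alive2 with ordinary unsigned arithmetic. The sketch below mirrors `@src` and `@tgt` for the i32 to i8 case only. The function names are hypothetical, and the sketch assumes `C1 < 8` (a larger i8 shift is poison in IR) in addition to the `C0 + C1 < 32` assumption from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Source pattern: (trunc (X u>> C0)) u>> C1, truncating i32 to i8.
std::uint8_t srcPattern(std::uint32_t X, unsigned C0, unsigned C1) {
  assert(C1 < 8 && C0 + C1 < 32 && "shift amounts out of range");
  std::uint8_t Tr = static_cast<std::uint8_t>(X >> C0);
  return static_cast<std::uint8_t>(Tr >> C1);
}

// Target pattern: (trunc (X u>> (C0 + C1))) & (-1 u>> C1).
std::uint8_t tgtPattern(std::uint32_t X, unsigned C0, unsigned C1) {
  assert(C1 < 8 && C0 + C1 < 32 && "shift amounts out of range");
  std::uint8_t Mask = static_cast<std::uint8_t>(0xFFu >> C1);
  std::uint8_t T = static_cast<std::uint8_t>(X >> (C0 + C1));
  return static_cast<std::uint8_t>(T & Mask);
}

// Compares both forms over every valid shift pair for one input value.
bool foldHoldsFor(std::uint32_t X) {
  for (unsigned C1 = 0; C1 < 8; ++C1)
    for (unsigned C0 = 0; C0 + C1 < 32; ++C0)
      if (srcPattern(X, C0, C1) != tgtPattern(X, C0, C1))
        return false;
  return true;
}
```

Both forms select bits `C0 + C1` through `C0 + 7` of `X`. The narrow `and` clears the high bits that the second shift in the source pattern would have shifted in as zeroes.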

2 years ago[InstCombine] match variable names and code comments; NFC
Sanjay Patel [Sun, 26 Sep 2021 16:02:46 +0000 (12:02 -0400)]
[InstCombine] match variable names and code comments; NFC

Similar to:
29c09c7

Planned follow-up is to add a transform here to allow removing
a common shift fold that is conflicting with D110170.

2 years agoExplicitly specify -fintegrated-as to clang/test/Driver/compilation_database.c test...
Amy Kwan [Mon, 27 Sep 2021 13:51:25 +0000 (08:51 -0500)]
Explicitly specify -fintegrated-as to clang/test/Driver/compilation_database.c test case.

It appears that this test assumes that the toolchain utilizes the integrated
assembler by default, since the expected output in the CHECKs is
compilation_database.o.

However, this test fails on AIX as AIX does not utilize the integrated assembler.
On AIX, the output instead is of the form /tmp/compilation_database-*.s.
Thus, this patch explicitly adds the -fintegrated-as option to match the
assumption that the integrated assembler is used by default.

Differential Revision: https://reviews.llvm.org/D110431

2 years ago[mlir] AsyncRuntime: use int64_t for ref counting operations
Eugene Zhulenev [Mon, 27 Sep 2021 14:06:54 +0000 (07:06 -0700)]
[mlir] AsyncRuntime: use int64_t for ref counting operations

Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D110550

2 years agotsan: fix trace tests on darwin
Dmitry Vyukov [Mon, 27 Sep 2021 12:57:18 +0000 (14:57 +0200)]
tsan: fix trace tests on darwin

The trace tests crashed on darwin because of some thread
initialization issues (thread initialization is somewhat
different on darwin).
Instead of starting real threads, create a new ThreadState
in the main thread. This makes the tests more unit-testy
and hopefully won't crash on darwin (there is almost no
platform-specific code involved now).
This will also help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.

Depends on D110539.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110546

2 years agotsan: add a test for stack init race
Dmitry Vyukov [Mon, 27 Sep 2021 12:07:28 +0000 (14:07 +0200)]
tsan: add a test for stack init race

Depends on D110538.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110539

2 years agotsan: fix and test detection of TLS races
Dmitry Vyukov [Mon, 27 Sep 2021 11:43:33 +0000 (13:43 +0200)]
tsan: fix and test detection of TLS races

Currently detection of races with TLS/stack initialization
is broken because we imitate the write before thread initialization,
so it's modelled with a wrong thread/epoch.
Fix that and add a test.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110538

2 years ago[AMDGPU] Ignore KILLs when forming clauses
Sebastian Neubauer [Fri, 16 Jul 2021 11:15:49 +0000 (13:15 +0200)]
[AMDGPU] Ignore KILLs when forming clauses

KILL instructions are sometimes present and prevent hard
clauses from being formed.

Fix this by ignoring all meta instructions in clauses.

Differential Revision: https://reviews.llvm.org/D106042

2 years ago[clang] Put original flags on 'Driver args:' crash report line
Nico Weber [Fri, 24 Sep 2021 23:42:09 +0000 (19:42 -0400)]
[clang] Put original flags on 'Driver args:' crash report line

We used to put the canonical spelling of flags after alias processing
on that line. For clang-cl in particular, that meant that we put flags
on that line that the clang-cl driver doesn't even accept, and the
"Driver args:" line wasn't usable.

Differential Revision: https://reviews.llvm.org/D110458

2 years agotsan: de-hardcode MemCount const
Dmitry Vyukov [Mon, 27 Sep 2021 11:30:32 +0000 (13:30 +0200)]
tsan: de-hardcode MemCount const

Use MemCount instead of hard-coded value 7.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110532

2 years ago[lldb] [DynamicRegisterInfo] Add a convenience method to add suppl. registers
Michał Górny [Sat, 18 Sep 2021 15:31:14 +0000 (17:31 +0200)]
[lldb] [DynamicRegisterInfo] Add a convenience method to add suppl. registers

Add a convenience method to add supplementary registers that takes care
of adding invalidate_regs to all (potentially) overlapping registers.

Differential Revision: https://reviews.llvm.org/D110023

2 years ago[FuncSpec] Don't specialise (or crash) on poison or constexpr values
Sjoerd Meijer [Mon, 27 Sep 2021 07:39:53 +0000 (08:39 +0100)]
[FuncSpec] Don't specialise (or crash) on poison or constexpr values

Function specialization was crashing on poison values and constexpr values.
The problem is that these values are not added to the solver, so it crashes
when a lookup is performed for these values. This fixes that by not
specialising on these values. For poison that is obvious, but for constexpr
this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but
specialising on constexpr values wasn't done very intentionally, and needs some
more work and tests if we want to support this.

As a follow-up, we need to check whether the solver should exit more gracefully
and return a "don't know", or whether it should really support these constexprs.

This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600).

Differential Revision: https://reviews.llvm.org/D110529

2 years ago[AArch64] Fix neon-reverseshuffle test extension. NFC
David Green [Mon, 27 Sep 2021 13:43:26 +0000 (14:43 +0100)]
[AArch64] Fix neon-reverseshuffle test extension. NFC

Apparently I gave an .ll file a .patch extension. Oops.

2 years agoRemoving a default constructor argument; NFC
Aaron Ballman [Mon, 27 Sep 2021 13:39:45 +0000 (09:39 -0400)]
Removing a default constructor argument; NFC

The argument is always used with its default value, so remove the
argument entirely.

2 years ago[LoopFlatten] Precommit new test widen-iv2.ll for D110234.
Sjoerd Meijer [Wed, 22 Sep 2021 12:06:23 +0000 (13:06 +0100)]
[LoopFlatten] Precommit new test widen-iv2.ll for D110234.

2 years ago[llvm-dwarfdump][docs] Add missing options to the help output and the command guide
gbreynoo [Mon, 27 Sep 2021 13:28:31 +0000 (14:28 +0100)]
[llvm-dwarfdump][docs] Add missing options to the help output and the command guide

This change is to add some missing details to the help text and command
guide:

- Added a note to the command guide that --debug-macro also dumps
  .debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame
  are aliases, and in cases where both sections are present one command
  outputs both.
- Changed the wording in the help output for --ignore-case and --regex to
  more closely match the command guide.

2 years agoRevert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover...
Jun Ma [Mon, 27 Sep 2021 12:39:05 +0000 (20:39 +0800)]
Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""

This reverts commit 8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614.

2 years ago[gn build] Port 9da2fa277e81
LLVM GN Syncbot [Mon, 27 Sep 2021 12:33:13 +0000 (12:33 +0000)]
[gn build] Port 9da2fa277e81

2 years ago[lldb] Move StringConvert inside debugserver
Michał Górny [Sat, 25 Sep 2021 10:47:06 +0000 (12:47 +0200)]
[lldb] Move StringConvert inside debugserver

The StringConvert API is no longer used anywhere but in debugserver.
Since debugserver does not use LLVM API, we cannot replace it with
llvm::to_integer() and llvm::to_float() there.  Let's just move
the sources into debugserver.

Differential Revision: https://reviews.llvm.org/D110478

2 years ago[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Pushpinder Singh [Fri, 24 Sep 2021 07:01:01 +0000 (07:01 +0000)]
[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool

Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513

2 years ago[lldb] [Host] Refactor XML converting getters
Michał Górny [Fri, 24 Sep 2021 11:25:27 +0000 (13:25 +0200)]
[lldb] [Host] Refactor XML converting getters

Refactor the XML converting attribute and text getters to use LLVM API.
While at it, remove some redundant error and missing XML support
handling, as the called base functions do that anyway.  Add tests
for these methods.

Note that this patch changes the getter behavior to be IMHO more
correct.  In particular:

- negative and overflowing integers are now reported as failures to
  convert, rather than being wrapped over or capped

- digits followed by text are now reported as failures to convert
  to double, rather than their numeric part being converted
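
The stricter behavior listed above can be sketched as follows. This is an illustration only: the patch itself uses LLVM API, while this sketch swaps in `std::from_chars` and `std::strtod` from the C++17 standard library, and the helper names `parseU32` and `parseDouble` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: accept only a complete, in-range unsigned decimal.
std::optional<std::uint32_t> parseU32(const std::string &Text) {
  std::uint32_t Value = 0;
  const char *Begin = Text.data();
  const char *End = Begin + Text.size();
  auto [Ptr, Ec] = std::from_chars(Begin, End, Value, 10);
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt; // empty, negative, overflowing, or trailing text
  return Value;
}

// Hypothetical helper: accept only text that is a number in its entirety.
std::optional<double> parseDouble(const std::string &Text) {
  if (Text.empty())
    return std::nullopt;
  errno = 0;
  char *End = nullptr;
  double Value = std::strtod(Text.c_str(), &End);
  if (errno == ERANGE || End != Text.c_str() + Text.size())
    return std::nullopt; // out of range, or digits followed by text
  return Value;
}
```

Returning an empty `std::optional` lets the caller tell a failed conversion apart from a legitimate zero, which wrapped or capped results do not allow.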

Differential Revision: https://reviews.llvm.org/D110410

2 years ago[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL.
Michael Kruse [Mon, 27 Sep 2021 12:11:41 +0000 (07:11 -0500)]
[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL.

Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.

Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110251

2 years ago[mlir][linalg] Make fusion on tensor rewriter friendly (NFC).
Tobias Gysi [Mon, 27 Sep 2021 10:07:44 +0000 (10:07 +0000)]
[mlir][linalg] Make fusion on tensor rewriter friendly (NFC).

Let the calling pass or pattern replace the uses of the original root operation. Internally, the tileAndFuse still replaces uses and updates operands but only of newly created operations.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110169