platform/upstream/llvm.git
Benjamin Kramer [Fri, 14 May 2021 15:10:50 +0000 (17:10 +0200)]
Bump googletest to 1.8.1

We've accumulated a scary amount of local patches to this directory. I
tried to merge them all, but if your favorite change is missing please
reapply it manually (and send it upstream).

Bradley Smith [Fri, 30 Apr 2021 15:11:09 +0000 (16:11 +0100)]
[AArch64][SVE] Add unpredicated vector BIC ISD node

Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).
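
For reference, BIC computes `a AND NOT b` on each lane. When the second operand is a constant, that is the same as an AND with the inverted constant, which is why the AND (immediate) alias applies. The Python sketch below models only the per-lane arithmetic. It assumes 64-bit elements and ignores both other element sizes and the limits on which immediates SVE can encode. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1  # model every lane as a 64-bit element


def bic(a: int, b: int) -> int:
    """Bitwise clear: a AND NOT b, truncated to one 64-bit lane."""
    return a & ~b & MASK64


def and_imm(a: int, imm: int) -> int:
    """AND with an immediate, truncated to one 64-bit lane."""
    return a & imm & MASK64


def bic_vector(xs, ys):
    """Unpredicated vector BIC: apply bic() to every lane, no governing predicate."""
    return [bic(x, y) for x, y in zip(xs, ys)]


# BIC with a constant equals AND with the inverted constant.
imm = 0x00FF
lane = 0x1234_5678_9ABC_DEF0
assert bic(lane, imm) == and_imm(lane, ~imm & MASK64)
```

This equivalence holds for every lane value. In the model it is an identity of the bit operations, not a property of the particular constant chosen.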

Differential Revision: https://reviews.llvm.org/D101831

Philip Reames [Fri, 14 May 2021 14:52:34 +0000 (07:52 -0700)]
[rs4gc] Strip memory related attributes consistently

I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters.

Why do we need to strip these? Two answers:

    Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation.
    We can infer other attributes from these. For instance, readnone implies both nofree and nosync, neither of which holds after physical rewriting.

Note: This exposed a latent issue which was fixed a couple weeks back in 01801d5274.
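
The key word in the title is "consistently": any attribute that is stripped from call sites should also be stripped from declarations and parameters. The minimal Python model below illustrates that idea. `readnone`, `nofree` and `nosync` come from the message above. `readonly`, `writeonly` and `argmemonly` are assumed extra members added for illustration. The data layout and function names are hypothetical and are not the pass's real code.

```python
# Memory-related attributes to strip. readnone, nofree and nosync are named in
# the commit message; readonly, writeonly and argmemonly are assumed here.
MEMORY_ATTRS = frozenset(
    {"readnone", "readonly", "writeonly", "argmemonly", "nofree", "nosync"}
)


def strip_memory_attrs(attrs):
    """Return a copy of `attrs` without any memory-related attribute."""
    return {a for a in attrs if a not in MEMORY_ATTRS}


def strip_function(fn):
    """Apply the same stripping to the declaration, every parameter and
    every call site, so no place keeps an attribute the others lost."""
    return {
        "decl": strip_memory_attrs(fn["decl"]),
        "params": [strip_memory_attrs(p) for p in fn["params"]],
        "calls": [strip_memory_attrs(c) for c in fn["calls"]],
    }
```

Routing all three locations through one helper means that adding an attribute to the set affects them all at once.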

Differential Revision: https://reviews.llvm.org/D99802

Kadir Cetinkaya [Thu, 13 May 2021 16:51:56 +0000 (18:51 +0200)]
[clangd] Always default to raw pch format

Clang would emit a fatal error when it encounters an unregistered PCH
format. This change ensures clangd will always use raw format no matter what
user specifies.

As side effects:

- Serializing an AST in an unknown format might throw off build
systems. I suppose this would only be an issue when the build system and clangd are
racing for the same PCM modules; hopefully this is rare, and both clangd and
the build system should recover on the next run.

- Whenever clang reads a serialized AST, it seems to check the file
signature and emit non-fatal errors, so this should be fine as well.

The only other valid module format in clang is `obj`, but it is part of codegen;
I don't think it is worth the dependency. Hence I chose not to register it, at
least for now.

Differential Revision: https://reviews.llvm.org/D102418

David Spickett [Fri, 14 May 2021 14:30:51 +0000 (14:30 +0000)]
[utils] Don't print username in arcanist clang format message

I didn't realise this message was also posted to the phabricator review.

Just say "the user's local path". "local" is the important part,
the username is not important.

David Green [Fri, 14 May 2021 14:08:14 +0000 (15:08 +0100)]
[ARM] Expand predecessor search to multiple blocks when reverting WhileLoopStarts

We were previously only searching a single preheader for call
instructions when reverting WhileLoopStarts to DoLoopStarts. This
extends that to multiple blocks, which can come up when, for example, a
loop is expanded from a memcpy. It also widens the set of instructions checked
from just calls to include other LoopStarts, to catch other low
overhead loops in the preheader.
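
A simplified model of the widened search: start at the loop preheader and walk backwards through predecessor blocks until reaching the block that holds the WhileLoopStart. If any block along the way contains a call or another loop start, the WhileLoopStart is reverted. The block and instruction names are hypothetical. The real pass works on MachineBasicBlocks and may restrict which predecessors it follows, so this only illustrates the shape of the search.

```python
from collections import deque


def must_revert(preheader, wls_block, preds, insts):
    """Return True if a call or another loop start sits between the
    WhileLoopStart block and the loop preheader.

    preds: block name -> list of predecessor block names
    insts: block name -> list of instruction kinds in that block
    """
    seen = set()
    work = deque([preheader])
    while work:
        bb = work.popleft()
        if bb in seen:
            continue
        seen.add(bb)
        if bb == wls_block:
            continue  # stop at the block that starts the loop; don't scan past it
        if any(kind in ("call", "loop_start") for kind in insts.get(bb, [])):
            return True
        work.extend(preds.get(bb, []))
    return False
```

The old behaviour corresponds to scanning only `preheader` itself. A call in an intermediate block, such as one produced by memcpy expansion, would then be missed.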

Differential Revision: https://reviews.llvm.org/D102269

David Green [Thu, 13 May 2021 08:20:33 +0000 (09:20 +0100)]
[ARM] Define CPSR on MEMCPY pseudos

These pseudos are converted post-isel into t2WhileLoopStart and
t2LoopEnd/LoopDec instructions, which themselves are defined to clobber
CPSR. Doing the same with the MEMCPY nodes will make sure they are
scheduled correctly to not end up with incorrect uses.

Hsiangkai Wang [Thu, 13 May 2021 02:45:00 +0000 (10:45 +0800)]
[RISCV] Add the DebugLoc parameter to getVLENFactoredAmount().

The MachineBasicBlock::iterator changes continuously while
generating the frame handling instructions. We should use the DebugLoc
from the caller, instead of getting it from the changing iterator.

If the prologue instructions are located in a basic block with no other
instructions after them, the iterator will be
updated to the boundary of the basic block and it is invalid to use the
iterator to access DebugLoc. This patch also fixes the crash when
accessing DebugLoc using the iterator.
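
The hazard can be shown with a Python list standing in for the basic block and an index standing in for the iterator. `emit_frame_insts` is a hypothetical helper, not the RISC-V backend's code. It takes the debug location from its caller, as the patch does, instead of reading it from the instruction at the insertion point. That instruction does not exist once the index reaches the end of the block.

```python
def emit_frame_insts(block, pos, count, dl):
    """Insert `count` frame instructions into `block` starting at index `pos`.

    Every new instruction is tagged with the caller-supplied debug location
    `dl`. Reading the location from block[pos] instead would raise IndexError
    whenever `pos` sits at the end of the block.
    """
    for n in range(count):
        block.insert(pos, ("frame_setup", n, dl))
        pos += 1  # the "iterator" moves as instructions are emitted
    return pos
```

With an empty block, `pos` starts at the end, so there is no instruction to borrow a location from at all.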

Differential Revision: https://reviews.llvm.org/D102386

David Candler [Fri, 14 May 2021 12:45:05 +0000 (13:45 +0100)]
[ARM][AArch64] Correct __ARM_FEATURE_CRYPTO macro and crypto feature

This patch contains a couple of minor corrections to my previous
crypto patch:

Since both AArch32 and AArch64 are now correctly setting the aes and
sha2 features individually, it is not necessary to continue to check
the crypto feature when defining feature macros.

In the AArch32 driver, the feature vector is only modified when the
crypto feature is actually in the vector. If crypto is not present,
there is no need to split it and explicitly define crypto/sha2/aes.
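
The driver behaviour in the last paragraph amounts to a guard before the split. In this Python sketch the feature vector is a list of strings. The function name and the order of the appended features are assumptions, and the real driver also handles cases (such as negated features) that are not modelled here.

```python
def split_crypto(features):
    """Hypothetical model: only expand "+crypto" when it is actually present."""
    if "+crypto" not in features:
        return list(features)  # nothing to split; leave the vector untouched
    rest = [f for f in features if f != "+crypto"]
    # Explicitly define crypto and the individual sha2/aes features.
    return rest + ["+crypto", "+sha2", "+aes"]
```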

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D102406

Dmitry Preobrazhensky [Fri, 14 May 2021 13:11:36 +0000 (16:11 +0300)]
[AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.

Summary of changes:
- added description of GFX90A;
- minor bugfixing and improvements.

Nemanja Ivanovic [Fri, 14 May 2021 13:00:44 +0000 (08:00 -0500)]
[PowerPC] Add vec_vupkhpx and vec_vupklpx for XL compatibility

These are old names for these functions that XL still supports.

Sanjay Patel [Fri, 14 May 2021 12:27:36 +0000 (08:27 -0400)]
[SDAG] reduce code duplication for extend_vec_inreg combines; NFC

These are identical so far, and I was looking at adding a fold
for a pattern with scalar_to_vector which would also end up duplicated.

Nathan Sidwell [Fri, 14 May 2021 11:25:36 +0000 (04:25 -0700)]
[clang][NFC] remove unused return value

In working on p0388 (ary[N] -> ary[] conversion), I discovered neither
use of UnwrapSimilarArrayTypes used the return value. So let's nuke
it.

Differential Revision: https://reviews.llvm.org/D102480

Djordje Todorovic [Fri, 14 May 2021 09:06:46 +0000 (11:06 +0200)]
[Transforms][Debugify] Fix "Missing line" false alarm on PHI nodes

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=49959

The "Missing line" false alarm was introduced in D75242.

Patch by Yilong Guo <yilong.guo@intel.com>

Differential Revision: https://reviews.llvm.org/D100446

Jay Foad [Fri, 14 May 2021 11:03:55 +0000 (12:03 +0100)]
[TableGen] Remove unneeded forward defs. NFC.

Roman Lebedev [Fri, 14 May 2021 11:05:16 +0000 (14:05 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 11:04:29 +0000 (14:04 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 11:03:31 +0000 (14:03 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM ANDNPD is a 1-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 11:02:23 +0000 (14:02 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPD tests

Roman Lebedev [Fri, 14 May 2021 11:02:18 +0000 (14:02 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPD tests

Roman Lebedev [Fri, 14 May 2021 11:02:02 +0000 (14:02 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPD tests

Roman Lebedev [Fri, 14 May 2021 10:40:45 +0000 (13:40 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 10:39:37 +0000 (13:39 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 10:37:22 +0000 (13:37 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM ANDNPS is a 1-cycle(!) dep-breaking zero-idiom

Same as SSE XMM XORPS/XORPD, it is not zero-cycle, even though it breaks the deps.
As confirmed by the exegesis measurements, and ref docs.
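
The reason a same-register ANDN or XOR breaks dependencies can be checked at the bit level. With both operands naming one register, the result is zero whatever that register held, so the core does not have to wait for the register's previous producer. Whether the idiom also executes in zero cycles is a microarchitectural property that only measurements like the ones cited here can establish. The Python model below covers only the bitwise semantics on a 128-bit XMM value.

```python
import random

MASK128 = (1 << 128) - 1  # one XMM register


def xorps(dst: int, src: int) -> int:
    """Bitwise XOR of two 128-bit values."""
    return (dst ^ src) & MASK128


def andnps(dst: int, src: int) -> int:
    """ANDNPS: (NOT dst) AND src."""
    return (~dst & src) & MASK128


# Same-register forms produce zero no matter what the register held,
# so the result never depends on the previous value of the register.
for _ in range(1000):
    x = random.getrandbits(128)
    assert xorps(x, x) == 0
    assert andnps(x, x) == 0
```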

Roman Lebedev [Fri, 14 May 2021 10:33:42 +0000 (13:33 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPS tests

Roman Lebedev [Fri, 14 May 2021 10:33:35 +0000 (13:33 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPS tests

Roman Lebedev [Fri, 14 May 2021 10:33:24 +0000 (13:33 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPS tests

Sander de Smalen [Fri, 14 May 2021 09:52:44 +0000 (10:52 +0100)]
[LoopVectorizationLegality] NFC: Mark some interfaces as 'const'

This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired
and getSymbolicStrides as 'const'.

Heejin Ahn [Wed, 12 May 2021 05:46:52 +0000 (22:46 -0700)]
[WebAssembly] Omit DBG_VALUE after terminator

When a stackified variable has an associated `DBG_VALUE` instruction,
DebugFixup pass adds a `DBG_VALUE` instruction after the stackified
value's last use to clear the variable's debug range info. But when the
last use instruction is a terminator, it can cause a verification
failure (when run with `-verify-machineinstrs`) because there are no
instructions allowed after a terminator.

For example:
```
%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
BR_IF 0, %myvar, ...
DBG_VALUE $noreg, $noreg, !"myvar", ...
```
In this test, `%myvar` is stackified, so the first `DBG_VALUE`
instruction's first operand has been changed to `wasm-operand-stack` to
denote it, and an additional `DBG_VALUE` instruction is added after its
last use, `BR_IF`, to signal that variable `myvar` is not in the operand
stack anymore. But because the `DBG_VALUE` instruction is added after
the `BR_IF`, a terminator, it fails MachineVerifier.

`DBG_VALUE` instructions are used in `DbgEntityHistoryCalculator` to
compute value ranges to emit DWARF info, and it turns out the
`DbgEntityHistoryCalculator` terminates ranges at the end of a BB, so we
don't need to emit `DBG_VALUE` after a terminator.
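
The resulting rule can be condensed into a small Python model of one basic block. `add_range_end` and the `(text, is_terminator)` tuple layout are hypothetical; the real DebugFixup pass operates on MachineInstrs.

```python
def add_range_end(insts, last_use, var):
    """Return `insts` with a DBG_VALUE ending `var`'s range after its last use.

    insts: list of (text, is_terminator) pairs for one basic block.
    No DBG_VALUE is added when the last use is a terminator: nothing may
    follow a terminator, and ranges end at the end of the block anyway.
    """
    _, is_terminator = insts[last_use]
    if is_terminator:
        return list(insts)
    end_marker = (f'DBG_VALUE $noreg, $noreg, !"{var}"', False)
    return insts[: last_use + 1] + [end_marker] + insts[last_use + 1 :]
```

For the `BR_IF` example above, the block comes back unchanged. A non-terminator last use still gets its range-ending `DBG_VALUE`.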

Fixes https://bugs.llvm.org/show_bug.cgi?id=50175.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D102309

Heejin Ahn [Thu, 6 May 2021 01:19:21 +0000 (18:19 -0700)]
[WebAssembly] Support Emscripten EH/SjLj in Wasm64

In wasm64, the signatures of some library functions and global variables
defined in Emscripten change:
- `emscripten_longjmp`: `(i32, i32) -> ()` -> `(i64, i32) -> ()`
  This changes because the first argument is the address of a memory
  buffer. This in turn causes more changes below.
- `setThrew`: `(i32, i32) -> ()` -> `(i64, i32) -> ()`
  `emscripten_longjmp` calls `setThrew` with the i64 buffer argument as
  the first parameter.
- `__THREW__` (global var): `i32` to `i64`
  `setThrew`'s first argument is set to this `__THREW__` variable, so it
  should change to i64 as well.
- `testSetjmp`: `(i32, i32*, i32) -> (i32)` -> `(i64, i32*, i32) -> (i32)`
  In the code transformation done in this pass, the value of `__THREW__`
  is passed as the first parameter of `testSetjmp`.

This patch creates some helper functions to easily get types that differ
between wasm32 and wasm64, and uses them to change
various function signatures and code transformations. Also updates the
tests with WASM32/WASM64 check lines.
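
The signature changes listed above all follow from one choice: whether an address is i32 or i64. The Python sketch below shows that parameterization. The helper names are hypothetical, and the real pass builds LLVM types rather than strings.

```python
def addr_int_type(is_wasm64: bool) -> str:
    """Integer type that can hold a linear-memory address."""
    return "i64" if is_wasm64 else "i32"


def emscripten_signatures(is_wasm64: bool) -> dict:
    """Signatures from the commit message, parameterized on the target."""
    p = addr_int_type(is_wasm64)
    return {
        "emscripten_longjmp": f"({p}, i32) -> ()",
        "setThrew": f"({p}, i32) -> ()",
        "__THREW__": p,  # global variable, not a function
        "testSetjmp": f"({p}, i32*, i32) -> (i32)",
    }
```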

(Untested) Emscripten side patch: https://github.com/emscripten-core/emscripten/pull/14108

Reviewed By: aardappel

Differential Revision: https://reviews.llvm.org/D101985

Tim Northover [Wed, 20 Jan 2021 10:14:03 +0000 (10:14 +0000)]
IR+AArch64: add a "swiftasync" argument attribute.

This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.
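
A stack walker following this scheme only needs the top nibble of the saved FP. The Python sketch below classifies a saved FP value and strips the tag. The function names are hypothetical. The sketch assumes that user-space frame addresses have bits 63:60 clear, as the commit says CodeGen assumes, and it does not model the arm64e signing of the context slot.

```python
def classify_frame_record(stored_fp: int) -> str:
    """Classify a frame record from bits 63:60 of the saved FP."""
    tag = (stored_fp >> 60) & 0xF
    if tag == 0b0000:
        return "normal [FP, LR]"
    if tag == 0b0001:
        return "extended [X22, FP, LR]"
    if tag == 0b1111:
        return "kernel, non-extended"
    return "reserved"


def untagged_fp(stored_fp: int) -> int:
    """Clear bits 63:60 to recover the frame address.

    Only meaningful for user-space records (tag 0b0000 or 0b0001)."""
    return stored_fp & ((1 << 60) - 1)
```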

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).

Simon Pilgrim [Fri, 14 May 2021 10:42:41 +0000 (11:42 +0100)]
[Local] collectBitParts - for bswap-only matches, limit shift amounts to whole bytes to reduce compile time.

Simon Pilgrim [Fri, 14 May 2021 10:35:02 +0000 (11:35 +0100)]
[Local] collectBitParts - reduce maximum recursion depth.

As noticed on D90170, the recursion depth for matching up to an i128 bitwidth was too high.

@lebedev.ri mentioned that we can probably do better by limiting the number of collected Values instead of just depth, but I'll look at that later.

Florian Hahn [Wed, 12 May 2021 11:17:13 +0000 (12:17 +0100)]
[VectorCombine] Add tests with assumes involving variable index.

Add test cases with variable indices together with assumes guaranteeing
that the indices are valid.

Anton Afanasyev [Fri, 14 May 2021 10:06:11 +0000 (13:06 +0300)]
[SLP] Fix spill cost computation for insertelement tree node

This is a bugfix follow-up to D98714.

Simon Pilgrim [Thu, 13 May 2021 17:23:23 +0000 (18:23 +0100)]
[X86] Try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI.

Max Kazantsev [Fri, 14 May 2021 09:57:05 +0000 (16:57 +0700)]
[Test] Add test on missing opportunity in Loop Deletion

We can break the backedge in some cases when we can evaluate some of the
values and conditions on the 1st iteration.

Tim Northover [Wed, 21 Apr 2021 11:12:28 +0000 (12:12 +0100)]
AArch64: support i128 cmpxchg in GlobalISel.

There are three essentially different cases to handle:

  * -O1, no LSE. The IR is expanded to ldxp/stxp and we need patterns to select
    them.
  * -O0, no LSE. We get G_ATOMIC_CMPXCHG, and need to produce CMP_SWAP_N
    pseudos. The registers are all 64-bit so this is easy.
  * LSE. We get G_ATOMIC_CMPXCHG and need to produce a CASP instruction with
    XSeqPair registers.

The last case is by far the hardest, and adds 128-bit GPR support as a
byproduct.
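
Whichever of the three lowerings is used, the observable result must match the reference semantics of a 128-bit compare-and-swap on a pair of 64-bit halves. The Python model below is sequential rather than atomic, and its names are illustrative only. It says nothing about how ldxp/stxp, CMP_SWAP_N or CASP achieve atomicity.

```python
MASK64 = (1 << 64) - 1


def split128(v: int):
    """Split a 128-bit value into (lo, hi) 64-bit halves."""
    return v & MASK64, (v >> 64) & MASK64


def join128(lo: int, hi: int) -> int:
    """Rebuild a 128-bit value from its (lo, hi) halves."""
    return (hi << 64) | lo


def cmpxchg128(mem, addr, expected, desired):
    """Reference (non-atomic) i128 cmpxchg on a pair of 64-bit halves.

    Both halves must match for the store to happen. Returns (old, success)."""
    old_lo, old_hi = mem[addr]
    success = (old_lo, old_hi) == split128(expected)
    if success:
        mem[addr] = split128(desired)
    return join128(old_lo, old_hi), success
```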

Sander de Smalen [Fri, 14 May 2021 08:00:47 +0000 (09:00 +0100)]
NFCI: Remove VF argument from isScalarWithPredication

As discussed in D102437, the VF argument to isScalarWithPredication
seems redundant, so this is intended to be a non-functional change. It
seems wrong to query the widening decision at this point. Removing the
operand and code to get the widening decision causes no unit/regression
tests to fail. I've also found no issues running the LLVM test-suite.

This subsequently removes the VF argument from isPredicatedInst as well,
since it is no longer required.

Jay Foad [Fri, 20 Dec 2019 15:13:57 +0000 (15:13 +0000)]
[AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions

A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34
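
The point of the change is that offsets are only comparable when both accesses share a base. The minimal Python model below assumes each access can be reduced to a base register, an immediate offset and a width. The real check inspects MachineOperands, and `MemAccess` is hypothetical.

```python
from typing import NamedTuple


class MemAccess(NamedTuple):
    base: str    # base register, e.g. "s32" (sp) or "s34" (fp)
    offset: int  # immediate byte offset
    width: int   # access size in bytes


def offsets_do_not_overlap(a: MemAccess, b: MemAccess) -> bool:
    """Return True only if the two accesses are provably disjoint.

    sp+offset and fp+offset may alias, so different bases prove nothing."""
    if a.base != b.base:
        return False
    lo, hi = (a, b) if a.offset <= b.offset else (b, a)
    return lo.offset + lo.width <= hi.offset
```

In the ISA above, the store is relative to `s32` and the load to `s34`. The model therefore refuses to call them disjoint, even though the offsets 4 and 0 with 4-byte widths would not overlap on a shared base.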

Differential Revision: https://reviews.llvm.org/D73957

Alexandros Lamprineas [Fri, 14 May 2021 08:37:44 +0000 (09:37 +0100)]
[llvm-mc][AArch64] HINT instruction disassembled as BTI

The Arm Architecture Reference Manual says that the SystemHintOp_BTI
opcode is preferred when CRm:op2 matches 0100:xx0, but llvm-mc
currently accepts 0100:xxx, which isn't right.
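
HINT encodes CRm:op2 as a 7-bit immediate, `(CRm << 3) | op2`. The corrected pattern 0100:xx0 therefore selects only the even immediates 32 to 38, while the old 0100:xxx pattern also caught the odd ones. A minimal Python check follows; the function names are hypothetical.

```python
def is_bti_hint(crm: int, op2: int) -> bool:
    """Match CRm:op2 == 0100:xx0, the encodings where BTI is preferred."""
    return crm == 0b0100 and (op2 & 0b001) == 0


def hint_imm(crm: int, op2: int) -> int:
    """The HINT immediate formed from CRm (4 bits) and op2 (3 bits)."""
    return (crm << 3) | op2
```

Those four immediates are the BTI, BTI c, BTI j and BTI jc forms. An odd value such as HINT #33 should disassemble as a plain HINT.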

Differential Revision: https://reviews.llvm.org/D102415

Martin Storsjö [Thu, 11 Mar 2021 09:03:46 +0000 (11:03 +0200)]
[libcxx] [test] Change the generic_string_alloc test to test conversions to all char types

On Windows, the native path char type is wchar_t; therefore, this test
didn't actually do the conversion that the test was supposed to exercise.

The charset conversions on Windows do cause extra allocations outside of
the provided allocator though, so that bit of the test has to be waived
now that the test actually does something. (Other tests have similar
TEST_NOT_WIN32() for allocation checks for charset conversions.)

Also fix a typo, and amend the path.native.obs/string_alloc test to
test char8_t, too.

Differential Revision: https://reviews.llvm.org/D102360

Roman Lebedev [Fri, 14 May 2021 08:27:04 +0000 (11:27 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 08:26:12 +0000 (11:26 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 08:24:22 +0000 (11:24 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom

Same as with its float friend, unlike their AVX versions.
As confirmed by exegesis, and ref docs.

Roman Lebedev [Fri, 14 May 2021 08:20:15 +0000 (11:20 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests

Roman Lebedev [Fri, 14 May 2021 08:20:10 +0000 (11:20 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests

Roman Lebedev [Fri, 14 May 2021 08:19:59 +0000 (11:19 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests

Roman Lebedev [Fri, 14 May 2021 08:11:45 +0000 (11:11 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis, and ref docs.

Roman Lebedev [Fri, 14 May 2021 08:10:22 +0000 (11:10 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests

Roman Lebedev [Fri, 14 May 2021 08:06:39 +0000 (11:06 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom

Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle,
this one is certainly a zero-cycle instruction, in addition to both of them
being dependency breaking.

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 08:02:03 +0000 (11:02 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests

Pooja Yadav [Fri, 14 May 2021 08:39:03 +0000 (14:09 +0530)]
[docs] Added llvm/cmake section

Added information about the llvm/cmake directory.

Reviewed By: xgupta, jroelofs

Differential Revision: https://reviews.llvm.org/D101925

David Stuttard [Fri, 7 May 2021 10:43:29 +0000 (11:43 +0100)]
[AMDGPU] Fix codegen of image intrinsics for g16 and a16

For gfx10 gradient (g16) and address (a16) can be independent. Previous
implementation assumed that a16 implied g16.

There are some other changes that fix the verification (as well as asm/disasm)
that are required for the included test to pass - the XFAIL will be removed in
those changes.

This also includes required fixes for GlobalISel

Differential Revision: https://reviews.llvm.org/D102066

Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09

David Stuttard [Fri, 30 Apr 2021 10:37:41 +0000 (11:37 +0100)]
[AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling

A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb

David Spickett [Tue, 11 May 2021 14:33:53 +0000 (14:33 +0000)]
[llvm][AsmPrinter] Restore source location to register clobber warning

Since 5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 this particular warning
hasn't had the location of the source file containing the inline
assembly.

Fix this by reporting via LLVMContext, which means that we no longer
have the "instantiated into assembly here" lines but they were going to
point to the start of the inline asm string anyway.

This message is already tested via IR in llvm. However we won't have
the required location info there so I've added a C file test in clang
to cover it.
(though strictly, this is testing llvm code)

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102244

Alexey Bader [Fri, 14 May 2021 05:18:49 +0000 (08:18 +0300)]
New tag for ittapi - fix an error related to cross-compiling ITTAPI in LLVM with mingw

A fix was implemented in the ittapi repo to solve an error when cross-compiling ITTAPI in LLVM with mingw.
The problem occurred in the cross-compilation environment for Julia's dependencies.
The corresponding issue item in ittapi repo: https://github.com/intel/ittapi/issues/19
A new tag was created in ittapi repo for that fix.

This patch contains changes to update the ittapi tag in LLVM.

Reviewed By: bader

Differential Revision: https://reviews.llvm.org/D102471

dfukalov [Fri, 9 Apr 2021 10:37:13 +0000 (13:37 +0300)]
[GVN] Clobber partially aliased loads.

Use offsets stored in `AliasResult` implemented in D98718.

Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543

David Green [Fri, 14 May 2021 08:16:51 +0000 (09:16 +0100)]
[DSE] Move isOverwrite into DSEState. NFC

This moves the isOverwrite function into the DSEState so that it can
share the analyses and members from the state.

A few extra loop tests were also added to test stores in and around
multi block loops for D100464.

Lang Hames [Fri, 14 May 2021 05:47:35 +0000 (22:47 -0700)]
[ORC] Add JITLink dependence for ObjectLinkingLayerTest.

This aims to fix the failure at
https://lab.llvm.org/buildbot/#/builders/61/builds/9590.

LLVM GN Syncbot [Fri, 14 May 2021 04:56:03 +0000 (04:56 +0000)]
[gn build] Port 0fda4c4745b8

Lang Hames [Fri, 14 May 2021 04:35:34 +0000 (21:35 -0700)]
[ORC] Add support for adding LinkGraphs directly to ObjectLinkingLayer.

This is separate from (but builds on) the support added in ec6b71df70a for
emitting LinkGraphs in the context of an active materialization. This commit
makes LinkGraphs a first-class data structure with features equivalent to
object files within ObjectLinkingLayer.

Lang Hames [Fri, 14 May 2021 01:59:26 +0000 (18:59 -0700)]
[JITLink] Fix missing 'static' keyword in unit test.

Fangrui Song [Fri, 14 May 2021 04:26:31 +0000 (21:26 -0700)]
[sanitizer] Simplify __sanitizer::BufferedStackTrace::UnwindImpl implementations

Intended to be NFC. D102046 relies on the refactoring for stack boundaries.

Carl Ritson [Fri, 14 May 2021 03:29:54 +0000 (12:29 +0900)]
[AMDGPU] Do not clause NSA instructions

To ensure correct behaviour NSA instructions should not be claused.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D102211

Reid Kleckner [Fri, 14 May 2021 03:26:50 +0000 (20:26 -0700)]
Use enum comparison instead of generated switch/case, NFC

Clang's coverage data for auto-generated switch cases is really, really
large. Before this change, when I enable code coverage, SemaDeclAttr.obj
is 4.0GB. Naturally, this fails to link.

Replacing the RISCV builtin id check with a comparison reduces object
file size from 4.0GB to 330MB. Replacing the AArch64 SVE range check
reduces the size again down to 17MB, which is reasonable.

I think the RISCV switch is larger in coverage data because it uses more
levels of macro expansion, while the SVE intrinsics only use one. In any
case, please try to avoid switches with 1000+ cases, they usually don't
optimize well.
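
The replacement relies on one target's builtin IDs forming a contiguous block, so a membership test becomes two comparisons instead of one case per ID. The Python illustration below uses made-up ID values; the constants are hypothetical, and Clang's real IDs are generated. A set with one entry per ID stands in for the generated switch.

```python
# Hypothetical, contiguous ID range for one target's builtins.
FIRST_TARGET_BUILTIN = 100
LAST_TARGET_BUILTIN = 1099  # 1000 builtins

# Stand-in for a generated switch: one entry ("case") per builtin ID.
_CASES = frozenset(range(FIRST_TARGET_BUILTIN, LAST_TARGET_BUILTIN + 1))


def is_target_builtin_by_cases(builtin_id: int) -> bool:
    return builtin_id in _CASES


def is_target_builtin(builtin_id: int) -> bool:
    # Two comparisons, valid only because the IDs are contiguous.
    return FIRST_TARGET_BUILTIN <= builtin_id <= LAST_TARGET_BUILTIN
```

Both predicates agree on every ID, but the comparison form has no per-ID code at all. In C++, that leaves coverage instrumentation nothing per-case to record.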

Reid Kleckner [Fri, 14 May 2021 02:32:49 +0000 (19:32 -0700)]
[COFF] Remove a truncation assertion from setRVA

LLD already produces a nice error message when sections exceed 4GB, and
this setRVA assertion causes LLD to crash instead of diagnosing the
error properly.

No test because we don't want slow tests that create 4GB files.

Matthias Springer [Fri, 14 May 2021 01:45:13 +0000 (10:45 +0900)]
[mlir] VectorToSCF cleanup

Group functions/structs in namespaces for better code readability.

Depends On D102123

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D102124

Rahul Joshi [Fri, 14 May 2021 01:33:02 +0000 (18:33 -0700)]
[MLIR] Fix build failures due to unused variables in non-debug builds.

Differential Revision: https://reviews.llvm.org/D102458

Lang Hames [Fri, 14 May 2021 01:22:33 +0000 (18:22 -0700)]
[ORC] Remove the OrcExecutionTest class. It is no longer used.

Lang Hames [Fri, 14 May 2021 01:11:33 +0000 (18:11 -0700)]
[ORC] Remove unused RTDyldObjectLinkingLayerExecutionTest class from unit test.

Lang Hames [Fri, 14 May 2021 00:32:36 +0000 (17:32 -0700)]
[ORC] Remove some stale unit test utils.

This code was used to test ORCv1, which has been removed. It is not useful for
testing ORCv2.

Matthias Springer [Fri, 14 May 2021 00:56:28 +0000 (09:56 +0900)]
[mlir] VectorToSCF target rank is a pass option

Make "target rank" a pass option of VectorToSCF.

Depends On D102101

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D102123

Chen Zheng [Tue, 11 May 2021 01:31:27 +0000 (21:31 -0400)]
[Debug-Info] change Tag type to dwarf::Tag for createAndAddDIE; NFC

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102207

Peter Collingbourne [Thu, 13 May 2021 20:55:16 +0000 (13:55 -0700)]
scudo: Fix MTE error reporting for zero-sized allocations.

With zero-sized allocations we don't actually end up storing the
address tag to the memory tag space, so store it in the first byte of
the chunk instead so that we can find it later in getInlineErrorInfo().

Differential Revision: https://reviews.llvm.org/D102442

Peter Collingbourne [Wed, 12 May 2021 23:49:19 +0000 (16:49 -0700)]
scudo: Check for UAF in ring buffer before OOB in more distant blocks.

It's more likely that we have a UAF than an OOB in blocks that are
more than 1 block away from the fault address, so the UAF should
appear first in the error report.

Differential Revision: https://reviews.llvm.org/D102379

3 years ago[test] Fix new-pm-lto-defaults.ll to work on all platforms
Arthur Eubanks [Fri, 14 May 2021 01:12:55 +0000 (18:12 -0700)]
[test] Fix new-pm-lto-defaults.ll to work on all platforms

https://lab.llvm.org/buildbot/#/builders/119/builds/3775/steps/8/logs/FAIL__LLVM__new-pm-lto-defaults_ll

Followup to D102345.

3 years ago[sanitizer] Use size_t on g_tls_size to fix build on x32
H.J. Lu [Fri, 14 May 2021 01:07:11 +0000 (18:07 -0700)]
[sanitizer] Use size_t on g_tls_size to fix build on x32

On x32, size_t is unsigned int rather than unsigned long int, so the build fails with:

../../../../../src-master/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp: In function ‘void __sanitizer::InitTlsSize()’:
../../../../../src-master/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:209:55: error: invalid conversion from ‘__sanitizer::uptr*’ {aka ‘long unsigned int*’} to ‘size_t*’ {aka ‘unsigned int*’} [-fpermissive]
  209 |   ((void (*)(size_t *, size_t *))get_tls_static_info)(&g_tls_size, &tls_align);
      |                                                       ^~~~~~~~~~~
      |                                                       |
      |                                                       __sanitizer::uptr* {aka long unsigned int*}

Fix it by declaring g_tls_size as size_t. This addresses:

https://bugs.llvm.org/show_bug.cgi?id=50332
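A minimal C++ sketch of the shape of the fix, assuming a stand-in callee. The names fake_get_tls_static_info() and InitTlsSizeSketch() and the 4096/64 values are hypothetical. Only the casted signature void (*)(size_t *, size_t *) comes from the diagnostic above. The point is that the out-variable must have exactly the callee's parameter type. On x32, unsigned long and unsigned int are both 32 bits but are still distinct types, so a uptr* does not convert to a size_t*.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the routine reached through get_tls_static_info:
// it reports the static TLS size and alignment via two size_t out-parameters.
static void fake_get_tls_static_info(std::size_t *size, std::size_t *align) {
  *size = 4096;
  *align = 64;
}

// After the fix, g_tls_size has the callee's parameter type. With the old
// __sanitizer::uptr (unsigned long) declaration, &g_tls_size would not
// convert to size_t* on x32, where size_t is unsigned int.
static std::size_t g_tls_size;

void InitTlsSizeSketch() {
  std::size_t tls_align = 0;
  // The original code casts get_tls_static_info to this signature before
  // calling it; here we simply bind the stand-in to the same type.
  using GetTlsStaticInfoFn = void (*)(std::size_t *, std::size_t *);
  GetTlsStaticInfoFn get_tls_static_info = &fake_get_tls_static_info;
  get_tls_static_info(&g_tls_size, &tls_align);
  (void)tls_align;
}
```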

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D102446

3 years ago[Debug-Info] make DIE attributes generation under strict DWARF control
Chen Zheng [Mon, 10 May 2021 02:57:36 +0000 (22:57 -0400)]
[Debug-Info] make DIE attributes generation under strict DWARF control

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101024

3 years ago[AArch64][GlobalISel] Fix a crash during unsuccessful G_CTPOP <2 x s64> legalization.
Amara Emerson [Thu, 13 May 2021 18:42:17 +0000 (11:42 -0700)]
[AArch64][GlobalISel] Fix a crash during unsuccessful G_CTPOP <2 x s64> legalization.

The legalization rule for scalar-same-as doesn't handle vectors. Until we
implement custom legalization for this, at least fall back properly.

3 years ago[mlir][openacc][NFC] add anonymous namespace around LegalizeDataOpForLLVMTranslation...
Valentin Clement [Fri, 14 May 2021 00:27:37 +0000 (20:27 -0400)]
[mlir][openacc][NFC] add anonymous namespace around LegalizeDataOpForLLVMTranslation class

Add missing anonymous namespace around LegalizeDataOpForLLVMTranslation class.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D102380

3 years ago[gn] Don't pass -fprofile-instr-generate to linker on Windows
Reid Kleckner [Thu, 13 May 2021 23:03:01 +0000 (16:03 -0700)]
[gn] Don't pass -fprofile-instr-generate to linker on Windows

Avoids a warning from the linker. The user still has to put the resource
directory on the linker search path, and I can't find a clean way to do
that automatically in gn.

3 years agoAMDGPU/GlobalISel: Don't hardcode stack alignment in assert message
Matt Arsenault [Thu, 13 May 2021 23:00:13 +0000 (19:00 -0400)]
AMDGPU/GlobalISel: Don't hardcode stack alignment in assert message

3 years agoAMDGPU/GlobalISel: Implement tail calls
Matt Arsenault [Tue, 12 Jan 2021 22:56:57 +0000 (17:56 -0500)]
AMDGPU/GlobalISel: Implement tail calls

Or at least the sibling call cases which the DAG already handles.

3 years ago[mlir][Linalg] Add support for vector.transfer ops to comprehensive bufferization...
Nicolas Vasilache [Thu, 13 May 2021 20:57:57 +0000 (20:57 +0000)]
[mlir][Linalg] Add support for vector.transfer ops to comprehensive bufferization (2/n).

Differential Revision: https://reviews.llvm.org/D102395

3 years ago[mlir][Linalg] Add ComprehensiveBufferize for functions (step 1/n)
Nicolas Vasilache [Thu, 13 May 2021 20:42:24 +0000 (20:42 +0000)]
[mlir][Linalg] Add ComprehensiveBufferize for functions (step 1/n)

This is the first step towards upstreaming comprehensive bufferization, following the
Discourse post: https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373/6.

This first commit introduces a basic pass for bufferizing within function boundaries,
assuming that the inplaceable function boundaries have been marked as such.

Differential Revision: https://reviews.llvm.org/D101693

3 years agoWiden `name` stencil to support `TypeLoc` nodes.
Weston Carvalho [Mon, 10 May 2021 17:50:55 +0000 (10:50 -0700)]
Widen `name` stencil to support `TypeLoc` nodes.

Differential Revision: https://reviews.llvm.org/D102185

3 years ago[IR] Introduce the opaque pointer type
Arthur Eubanks [Sun, 2 May 2021 02:04:42 +0000 (19:04 -0700)]
[IR] Introduce the opaque pointer type

The opaque pointer type is essentially just a normal pointer type with a
null pointee type.

This also adds support for the opaque pointer type to the bitcode
reader/writer, as well as to textual IR.

To avoid confusion with existing pointer types, we disallow creating a
pointer to an opaque pointer.
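A toy C++ model of the two rules described above: an opaque pointer is a pointer type whose pointee is null, and creating a typed pointer to an opaque pointer is rejected. ToyType, ToyIntType, ToyPointerType, makePointerTo() and makeOpaquePointer() are hypothetical names for this sketch. They are not LLVM's Type/PointerType API, and the sketch makes no claim about how LLVM enforces the restriction.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Toy type model (hypothetical, not LLVM's classes).
struct ToyType {
  virtual ~ToyType() = default;
};

struct ToyIntType : ToyType {};

struct ToyPointerType : ToyType {
  const ToyType *pointee;  // nullptr means "opaque pointer"
  explicit ToyPointerType(const ToyType *p) : pointee(p) {}
  bool isOpaque() const { return pointee == nullptr; }
};

// An opaque pointer is just a pointer type with a null pointee.
std::unique_ptr<ToyPointerType> makeOpaquePointer() {
  return std::make_unique<ToyPointerType>(nullptr);
}

// Typed pointers may point at anything except an opaque pointer.
std::unique_ptr<ToyPointerType> makePointerTo(const ToyType *pointee) {
  const auto *asPtr = dynamic_cast<const ToyPointerType *>(pointee);
  if (asPtr != nullptr && asPtr->isOpaque())
    throw std::invalid_argument("pointer to opaque pointer is not allowed");
  return std::make_unique<ToyPointerType>(pointee);
}
```

In this model, a pointer to an integer type or to another typed pointer is fine, while makePointerTo(makeOpaquePointer().get()) throws.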

Opaque pointer types should not be widely used at this point since many
parts of LLVM still do not support them. The next steps are to add some
very simple use cases of opaque pointers to make sure they work, then
start pretending that all pointers are opaque pointers and see what
breaks.

https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html

Reviewed By: dblaikie, dexonsmith, pcc

Differential Revision: https://reviews.llvm.org/D101704

3 years ago[Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.
Michael Kruse [Thu, 13 May 2021 22:12:23 +0000 (17:12 -0500)]
[Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.

The Pascal architecture supports the page migration engine required for
unified_shared_memory, as indicated by NVIDIA:
 * https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
 * https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/
 * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements

The limitation was introduced in D54493 which justified the cut-off by
the requirement for unified addressing. However, Unified Virtual
Addressing (UVA) is already available from sm20 (Fermi) onward, so it
covers Kepler and Maxwell as well:
 * https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#basics-of-uva-cuda-memory-management

Unified shared memory might even be possible with these, but with
migration of entire allocations on kernel startup.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101595

3 years agoDon't run MachineVerifier on sjlj-unwind-inline-asm test because of known issue ...
cynecx [Thu, 13 May 2021 20:45:53 +0000 (21:45 +0100)]
Don't run MachineVerifier on sjlj-unwind-inline-asm test because of known issue (PR39439)

Fixes buildbot failure (https://lab.llvm.org/buildbot/#/builders/16/builds/10825).

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D102433

3 years ago[docs] Add page on opaque pointer types
Arthur Eubanks [Wed, 12 May 2021 00:02:12 +0000 (17:02 -0700)]
[docs] Add page on opaque pointer types

Reviewed By: dblaikie, dexonsmith

Differential Revision: https://reviews.llvm.org/D102292

3 years ago[clang-repl] Temporarily disable the execute.cpp test on ppc64.
Lang Hames [Thu, 13 May 2021 21:33:33 +0000 (14:33 -0700)]
[clang-repl] Temporarily disable the execute.cpp test on ppc64.

This test is failing on some builders (see [1]) with the following error:

error: Added modules have incompatible data layouts:
  e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs
  E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)

The JIT layout is correct, but some IR module added to the JIT is using a
little-endian layout instead.
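To make the mismatch concrete: in LLVM data layout strings, an "E" component means big-endian and "e" means little-endian, so the two layouts above disagree on byte order. The sketch below uses a hypothetical helper, isLittleEndianLayout(), which is not an LLVM API. Real code would query DataLayout::isLittleEndian() instead. The helper splits the string on '-' and looks for that component. It assumes little-endian when no endianness component is present, which is our understanding of LLVM's default.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: scans '-'-separated components of a data layout
// string for an endianness marker ("e" = little, "E" = big).
bool isLittleEndianLayout(const std::string &layout) {
  std::istringstream in(layout);
  std::string component;
  bool little = true;  // assumed default when no marker is present
  while (std::getline(in, component, '-')) {
    if (component == "E")
      little = false;
    else if (component == "e")
      little = true;
  }
  return little;
}
```

Components such as "m:e" (a mangling mode) are not confused with the endianness marker, because only an exact "e" or "E" component is matched.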

This commit disables the test on ppc64 until we can investigate further and
fix the bug.

[1] https://lab.llvm.org/staging/#/builders/126/builds/371

3 years ago[CaptureTracking] Use isIdentifiedFunctionLocal() (NFC)
Nikita Popov [Thu, 13 May 2021 21:03:46 +0000 (23:03 +0200)]
[CaptureTracking] Use isIdentifiedFunctionLocal() (NFC)

These conditions together exactly match isIdentifiedFunctionLocal(),
and this is also what we logically want to check for here.

3 years ago[AA] Use isIdentifiedFunctionLocal() (NFC)
Nikita Popov [Thu, 13 May 2021 20:59:19 +0000 (22:59 +0200)]
[AA] Use isIdentifiedFunctionLocal() (NFC)

This condition is equivalent to isIdentifiedFunctionLocal(),
and this is also what we semantically want to check here.

3 years agoRevert "[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again"
Roman Lebedev [Thu, 13 May 2021 20:55:51 +0000 (23:55 +0300)]
Revert "[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again"

As reported in post-commit feedback, this has issues with e.g. <16 x i1>:
https://llvm.godbolt.org/z/jxPvdGEW4

This reverts commit c02476f3158f2908ef0a6f628210b5380bd33695.

3 years agoRevert "[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()"
Roman Lebedev [Thu, 13 May 2021 20:55:28 +0000 (23:55 +0300)]
Revert "[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()"

Depends on a commit that is about to be reverted.

This reverts commit 69ed93a4355123a45c1d7216aea7cd53d07a361b.

3 years ago[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom
Roman Lebedev [Thu, 13 May 2021 20:54:11 +0000 (23:54 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom

While both AMD's Software Optimization Guide (SOG) and Agner insist that
it is zero-cycle, I cannot confirm that claim. While it clearly breaks the
dependency, I cannot come up with a snippet or measurement approach
that ends up with an IPC greater than 4, which, to me, means that it actually
consumes an execution resource of an FP unit for a cycle.

3 years ago[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test
Roman Lebedev [Thu, 13 May 2021 20:48:49 +0000 (23:48 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test

3 years ago[mlir][tosa] Add lowering to tosa.abs for integer cases
Rob Suderman [Mon, 3 May 2021 20:56:00 +0000 (13:56 -0700)]
[mlir][tosa] Add lowering to tosa.abs for integer cases

The integer case requires decomposing into simple LLVM operations.

Differential Revision: https://reviews.llvm.org/D101809

3 years ago[libc] Enable fmaf and fma on x86_64.
Siva Chandra Reddy [Thu, 13 May 2021 20:48:45 +0000 (20:48 +0000)]
[libc] Enable fmaf and fma on x86_64.

They require clang-11 or above to build, and hence had to be disabled
because the bots did not have clang-11 or higher. The bots have now been
upgraded, so we can enable these functions.