platform/upstream/llvm.git
Nicolas Vasilache [Fri, 14 May 2021 16:35:40 +0000 (16:35 +0000)]
[mlir][Linalg] Add support for subtensor_insert comprehensive bufferization (3/n)

Differential revision: https://reviews.llvm.org/D102417

Mitch Phillips [Fri, 14 May 2021 21:12:21 +0000 (14:12 -0700)]
Revert "[X86] Try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI."

This reverts commit 5ed56a821c0622869739a3ae752eea97a1ee1f48.

Reason: Broke the MSan buildbots. See Phabricator for more info
(https://reviews.llvm.org/rG5ed56a821c0622869739a3ae752eea97a1ee1f48).

Arthur Eubanks [Fri, 14 May 2021 21:26:37 +0000 (14:26 -0700)]
[NFC] Directly get GV type

Michael Kruse [Fri, 14 May 2021 21:22:38 +0000 (16:22 -0500)]
[Polly] Run polly-update-format. NFC.

Thanks to Leonard Chan for reporting.

wlei [Thu, 6 May 2021 00:17:05 +0000 (17:17 -0700)]
[CSSPGO] Fix return value of getProbeWeight

Multiple return types are not currently supported, so as a workaround we use an error_code to represent:

1) A dangling probe.
2) An ignored weight for a non-probe instruction.

While merging the instruction weights for the whole BB, these error codes are filtered out. However, if all instructions of the BB return an error_code, the outer logic marks the BB as one whose weight must be inferred by the inference algorithm. This is different from a zero value, which is treated as a cold block.

Fix one place: if the FunctionSamples cannot be found in the profile data, which indicates that the BB is cold, return zero instead.
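
The distinction between "no weight" and "zero weight" can be modeled in a small standalone C++ sketch. It assumes, as a simplification, that a block's weight is the maximum of its instructions' weights, and it uses std::optional in place of ErrorOr<uint64_t>/error_code; mergeBlockWeight and probeWeight are hypothetical names, not the SampleProfileLoader API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative stand-in for ErrorOr<uint64_t>: std::nullopt plays the role of
// the error_code (dangling probe, or ignored non-probe instruction).
using InstWeight = std::optional<std::uint64_t>;

// Hypothetical helper: merge per-instruction weights into a block weight.
// std::nullopt means "no instruction had a weight", so the block is left to
// the inference algorithm; an engaged 0 marks the block as cold.
std::optional<std::uint64_t> mergeBlockWeight(const std::vector<InstWeight> &insts) {
  std::optional<std::uint64_t> result;
  for (const InstWeight &w : insts) {
    if (!w)
      continue; // Filter out the "error" entries.
    result = std::max(result.value_or(0), *w);
  }
  return result;
}

// Hypothetical lookup: a missing FunctionSamples entry means the code is cold,
// so report a real zero weight rather than "no weight".
InstWeight probeWeight(bool hasFunctionSamples, std::uint64_t count) {
  if (!hasFunctionSamples)
    return 0;
  return count;
}
```

Returning zero rather than "no weight" is what keeps such a block out of the inference step.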

Also refine the comments.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D102007

Nikita Popov [Sat, 25 Jul 2020 16:12:40 +0000 (18:12 +0200)]
[MemDep] Use BatchAA in more places (NFCI)

Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.

Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
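
As a rough model of why one shared instance helps, the sketch below memoizes alias results keyed by pointer pair, so a non-local query that visits several blocks computes each pair once instead of once per block. BatchAliasCache and its query callback are hypothetical stand-ins, not LLVM's BatchAAResults interface; as the message notes, caching is only sound because the IR is not modified during the query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

enum class AliasResult { NoAlias, MayAlias, MustAlias };

// Hypothetical memoizing wrapper illustrating the idea behind BatchAA:
// results are cached for the lifetime of the object, which is only sound
// while the IR is not modified.
class BatchAliasCache {
public:
  using Query = std::function<AliasResult(const void *, const void *)>;

  explicit BatchAliasCache(Query query) : query_(std::move(query)) {}

  AliasResult alias(const void *a, const void *b) {
    auto ka = reinterpret_cast<std::uintptr_t>(a);
    auto kb = reinterpret_cast<std::uintptr_t>(b);
    // alias(a, b) == alias(b, a), so normalize the key order.
    auto key = ka < kb ? std::make_pair(ka, kb) : std::make_pair(kb, ka);
    auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second;
    ++misses_;
    AliasResult result = query_(a, b);
    cache_.emplace(key, result);
    return result;
  }

  std::size_t misses() const { return misses_; }

private:
  Query query_;
  std::map<std::pair<std::uintptr_t, std::uintptr_t>, AliasResult> cache_;
  std::size_t misses_ = 0;
};
```

Creating one such cache per block would repeat the miss in every block, which is the duplication the change removes.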

Neumann Hon [Fri, 14 May 2021 20:15:08 +0000 (16:15 -0400)]
[SystemZ] [z/OS] Add SystemZCallingConventionRegisters class

This patch adds the abstract class SystemZCallingConventionRegisters
which is a SystemZ-specific class detailing special registers used
by calling conventions on the target. SystemZELFRegisters and
SystemZXPLINK64Registers implement this class for ELF and XPLINK64
respectively.
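
The design is a plain abstract interface with one implementation per ABI. A minimal C++ sketch follows; the method names are illustrative, and the register numbers (ELF: return address in r14, stack pointer in r15; XPLINK64: return address in r7, stack pointer in r4) are assumptions taken from the two ABIs rather than from this patch.

```cpp
#include <cassert>
#include <memory>

// Sketch of an abstract per-ABI register description. Values are GPR
// indices and are assumptions; method names are illustrative only.
class CallingConventionRegisters {
public:
  virtual ~CallingConventionRegisters() = default;
  virtual int getReturnFunctionAddressRegister() const = 0;
  virtual int getStackPointerRegister() const = 0;
};

class ELFRegisters final : public CallingConventionRegisters {
public:
  int getReturnFunctionAddressRegister() const override { return 14; }
  int getStackPointerRegister() const override { return 15; }
};

class XPLINK64Registers final : public CallingConventionRegisters {
public:
  int getReturnFunctionAddressRegister() const override { return 7; }
  int getStackPointerRegister() const override { return 4; }
};

// Callers query the interface and never branch on the ABI themselves.
std::unique_ptr<CallingConventionRegisters> makeRegisters(bool isXPLINK64) {
  if (isXPLINK64)
    return std::make_unique<XPLINK64Registers>();
  return std::make_unique<ELFRegisters>();
}
```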

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D102370

Mateusz Mikuła [Fri, 14 May 2021 20:46:49 +0000 (23:46 +0300)]
[LLD][MinGW] Ignore --no-undefined flag

AFAIK this is the default behaviour when this flag is not passed.

Differential Revision: https://reviews.llvm.org/D102516

Mateusz Mikuła [Fri, 14 May 2021 20:45:07 +0000 (23:45 +0300)]
[MinGW] Always enable -mbig-obj for LLVM build unless using Clang

It's easy to hit the 2**16 section limit with i686 GNU toolchains these days.
Clang does it automagically, so it's not needed there, and the option
causes warnings about being unused when linking.

Differential Revision: https://reviews.llvm.org/D102419

Mitch Phillips [Fri, 14 May 2021 20:45:29 +0000 (13:45 -0700)]
[Scudo] Delete unused flag 'rss_limit_mb'.

EOM.

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D102529

Florian Hahn [Fri, 14 May 2021 20:19:29 +0000 (21:19 +0100)]
[LV] Add another more complex first-order recurrence sinking test.

Florian Hahn [Fri, 14 May 2021 20:00:13 +0000 (21:00 +0100)]
[Clang,Driver] Add -fveclib=Darwin_libsystem_m support.

Support for Darwin's libsystem_m's vector functions has been added to
LLVM in 93a9a8a8d90f.

This patch adds support for -fveclib=Darwin_libsystem_m to Clang.

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D102489

Nikita Popov [Thu, 13 May 2021 19:45:46 +0000 (21:45 +0200)]
[AA] Support callCapturesBefore() on BatchAA (NFCI)

This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.

Alexey Bataev [Fri, 14 May 2021 19:44:06 +0000 (12:44 -0700)]
[SLP][NFC]Add a test for non-consecutive inserts, NFC.

Fangrui Song [Fri, 14 May 2021 19:41:34 +0000 (12:41 -0700)]
[sanitizer] Commit a missing change in BufferedStackTrace::Unwind

Fangrui Song [Fri, 14 May 2021 19:27:33 +0000 (12:27 -0700)]
[sanitizer] Fall back to fast unwinder

Programs compiled with `-fno-exceptions -fno-asynchronous-unwind-tables` don't
produce .eh_frame on Linux and other ELF platforms, so the slow unwinder cannot
print stack traces. Just fall back to the fast unwinder: this allows
-fno-asynchronous-unwind-tables without requiring the sanitizer option
`fast_unwind_on_fatal=1`.
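
The fallback can be pictured with the standalone C++ sketch below. It models only the idea: a real fast unwinder reads frame records off the stack and validates each pointer, while here the chain is an ordinary linked list. FrameRecord, unwindWithFallback and the kMinUsefulFrames threshold are hypothetical, and the exact condition under which the sanitizer runtime falls back may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a frame-pointer chain: each frame record holds the
// caller's frame pointer followed by a return address.
struct FrameRecord {
  const FrameRecord *prev;
  std::uintptr_t return_address;
};

// "Fast" unwinding: walk the frame-pointer chain. It needs no .eh_frame,
// only code built with frame pointers.
std::vector<std::uintptr_t> fastUnwind(const FrameRecord *fp, std::size_t max_depth) {
  std::vector<std::uintptr_t> pcs;
  while (fp != nullptr && pcs.size() < max_depth) {
    pcs.push_back(fp->return_address);
    fp = fp->prev;
  }
  return pcs;
}

// Hypothetical threshold: fewer frames than this from the slow
// (unwind-table-based) unwinder suggests the tables were missing.
constexpr std::size_t kMinUsefulFrames = 2;

std::vector<std::uintptr_t> unwindWithFallback(
    const std::vector<std::uintptr_t> &slow_result, const FrameRecord *fp,
    std::size_t max_depth) {
  if (slow_result.size() >= kMinUsefulFrames)
    return slow_result;
  return fastUnwind(fp, max_depth);
}
```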

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D102046

Benjamin Kramer [Fri, 14 May 2021 19:13:18 +0000 (21:13 +0200)]
GTEST_HAS_TR1_TUPLE is gone, stop defining it.

Benjamin Kramer [Fri, 14 May 2021 19:12:59 +0000 (21:12 +0200)]
[ProfData] Address a unit test FIXME

Kevin Athey [Thu, 13 May 2021 22:07:21 +0000 (15:07 -0700)]
Remove (unneeded) '-asan-use-after-return' from hoist-argument-init-insts.ll.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D102448

Benjamin Kramer [Fri, 14 May 2021 18:39:48 +0000 (20:39 +0200)]
[flang] s/TYPED_TEST_CASE/TYPED_TEST_SUITE/ as the former is deprecated

Benjamin Kramer [Fri, 14 May 2021 18:37:03 +0000 (20:37 +0200)]
Add another -Wdeprecated-copy hack for gtest

Tim Northover [Fri, 14 May 2021 18:21:54 +0000 (19:21 +0100)]
SwiftAsync: remove duplicate instance in array. NFC.

Philip Reames [Fri, 14 May 2021 17:57:59 +0000 (10:57 -0700)]
Discount invariant instructions in full unrolling

This patch updates the cost model for full unrolling to discount the cost of a loop-invariant expression on all but one iteration. The reasoning here is that such an expression (as determined by SCEV) will be CSEd or DSEd once the loop is unrolled. Note that SCEV's reasoning will find things which could be invariant, not simply those outside the loop.
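
The cost model change amounts to the arithmetic below, shown as a hypothetical C++ sketch (InstCost and unrolledCost are illustrative names; the real model works on target costs and SCEV): variant instructions are paid once per iteration, invariant ones only once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-instruction record for the sketch.
struct InstCost {
  unsigned cost;
  bool loop_invariant; // e.g. as proven by SCEV
};

// Estimated size of the fully unrolled loop: variant instructions are paid
// on every iteration, invariant ones only once (the copies are assumed to
// be CSEd away after unrolling).
std::uint64_t unrolledCost(const std::vector<InstCost> &body, std::uint64_t trip_count) {
  assert(trip_count > 0);
  std::uint64_t total = 0;
  for (const InstCost &inst : body)
    total += inst.loop_invariant ? inst.cost : inst.cost * trip_count;
  return total;
}
```

For a trip count of 8, a body with a cost-1 variant instruction and a cost-4 invariant one is estimated at 8 + 4 = 12 instead of 8 * 5 = 40.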

Differential Revision: https://reviews.llvm.org/D102506

River Riddle [Fri, 14 May 2021 17:58:29 +0000 (10:58 -0700)]
[mlir] Add missing dependence to TestDialect from TestTransforms

This was accidentally dropped in D102456

Michael Kruse [Thu, 13 May 2021 22:09:55 +0000 (17:09 -0500)]
[Polly] Add support for -polly-position=early with the NPM.

This required support for the canonicalization passes, including
porting RewriteByReferenceParams to the NPM.

For some reason, the legacy pass pipeline with -polly-position=early did
not run the CodePreparation pass. This was fixed as well.

Sanjay Patel [Fri, 14 May 2021 17:41:14 +0000 (13:41 -0400)]
[InstCombine] drop poison flags when simplifying 'shl' based on demanded bits

As with other transforms in demanded bits, we must be careful not to
wrongly propagate nsw/nuw if we are reducing values leading up to the shift.

This bug was introduced with 1b24f35f843c and leads to the miscompile
shown in:
https://llvm.org/PR50341
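
An illustrative case of the hazard (not necessarily the exact PR50341 reproducer): given `%a = and i8 %x, 127` followed by `shl nuw i8 %a, 1`, bit 7 of %a is shifted out, so demanded bits can drop the `and`. The value bits of the result are unchanged, but the nuw promise is not: for %x = 0x80 the rewritten shift would be poison. The C++ helpers below (shlIsNUW and shl8 are hypothetical names) check this on concrete 8-bit values.

```cpp
#include <cassert>
#include <cstdint>

// `shl nuw i8 %x, k` promises that no set bit is shifted out; if the promise
// is broken, the result is poison. This checks the promise for concrete values.
bool shlIsNUW(std::uint8_t x, unsigned k) {
  assert(k < 8);
  auto shifted = static_cast<std::uint8_t>(x << k);
  return static_cast<std::uint8_t>(shifted >> k) == x;
}

// The i8 value computed by `shl i8 %x, k`, ignoring poison flags.
std::uint8_t shl8(std::uint8_t x, unsigned k) {
  assert(k < 8);
  return static_cast<std::uint8_t>(x << k);
}
```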

Sanjay Patel [Fri, 14 May 2021 16:45:40 +0000 (12:45 -0400)]
[InstCombine] add test for shl demanded bits miscompile; NFC

PR50341

Stanislav Mekhanoshin [Mon, 12 Apr 2021 21:40:17 +0000 (14:40 -0700)]
[AMDGPU] Add support for architected flat scratch

Add support for the readonly flat Scratch register initialized
by the SPI.

Differential Revision: https://reviews.llvm.org/D102432

Nico Weber [Fri, 14 May 2021 17:50:35 +0000 (13:50 -0400)]
[gn build] (manually) merge b7d1ab75cf47

No check-hwasan-lam target yet, though.

Tomasz Miąsko [Fri, 14 May 2021 00:00:00 +0000 (00:00 +0000)]
[Demangle][Rust] Parse integer constants

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102179

Philip Reames [Fri, 14 May 2021 17:41:25 +0000 (10:41 -0700)]
Do actual DCE in LoopUnroll (try 2)

Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
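
A standalone sketch of the recursive deletion, using a hypothetical toy instruction representation (Inst, recursivelyDeleteDead) rather than LLVM's IR: when an instruction is erased, its operands lose a use and are revisited, so a whole dead chain disappears instead of only its last link.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: each instruction lists the instructions it uses
// and tracks how many live users it has.
struct Inst {
  std::vector<std::size_t> operands;
  unsigned num_uses = 0;
  bool has_side_effects = false;
  bool erased = false;
};

bool isTriviallyDead(const Inst &i) {
  return !i.erased && i.num_uses == 0 && !i.has_side_effects;
}

// Erase `root` if it is dead, then revisit its operands, which may have
// just become dead themselves. Returns the number of erased instructions.
std::size_t recursivelyDeleteDead(std::vector<Inst> &insts, std::size_t root) {
  std::size_t erased = 0;
  std::vector<std::size_t> worklist{root};
  while (!worklist.empty()) {
    std::size_t idx = worklist.back();
    worklist.pop_back();
    Inst &inst = insts[idx];
    if (!isTriviallyDead(inst))
      continue;
    inst.erased = true;
    ++erased;
    for (std::size_t op : inst.operands) {
      assert(insts[op].num_uses > 0);
      --insts[op].num_uses;
      worklist.push_back(op);
    }
  }
  return erased;
}
```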

Differential Revision: https://reviews.llvm.org/D102511

Mitch Phillips [Fri, 14 May 2021 17:28:09 +0000 (10:28 -0700)]
[GWP-ASan] Migrate lit tests from old Scudo -> Standalone.

This removes one of the last dependencies on old Scudo, and should allow
us to delete the old Scudo soon.

Reviewed By: vitalybuka, cryptoad

Differential Revision: https://reviews.llvm.org/D102349

Ian Bearman [Fri, 14 May 2021 17:40:15 +0000 (10:40 -0700)]
Allow same memory space for SRC and DST of dma_start operations

This change allows the SRC and DST of dma_start operations to be located in the
same memory space. This applies to both the Affine dialect and Memref dialect
versions of these Ops. The documentation has been updated to reflect this by
explicitly stating that overlapping memory locations are not supported (undefined
behavior).

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D102274

Fangrui Song [Fri, 14 May 2021 17:38:40 +0000 (10:38 -0700)]
[test] Improve x86-64-plt.s

Benjamin Kramer [Fri, 14 May 2021 17:37:46 +0000 (19:37 +0200)]
[clangd] Make unit test compatible with gtest 1.10.0

River Riddle [Fri, 14 May 2021 17:27:57 +0000 (10:27 -0700)]
[mlir][NFC] Move passes in test/lib/Transforms/ to a directory that mirrors what they test

test/lib/Transforms/ has bitrotted and become something of a dumping ground for testing pretty much any part of the project. This revision cleans this up, and moves the files within to a directory that reflects what is actually being tested.

Differential Revision: https://reviews.llvm.org/D102456

Benjamin Kramer [Fri, 14 May 2021 17:25:01 +0000 (19:25 +0200)]
Document updated googletest + modifications

Matt Arsenault [Tue, 11 May 2021 22:10:47 +0000 (18:10 -0400)]
AMDGPU: Fix assert when rewriting saddr d16 loads

moveOperands does not handle moving tied operands since it would
generally have to fix up the tied operand references. Avoid the assert
by untying and retying after the modification. These in-place
modifications really aren't manageable.

Roman Lebedev [Fri, 14 May 2021 17:12:26 +0000 (20:12 +0300)]
[NFC][X86][MCA] Add pseudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps

While the btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, that does not appear to be the case on Znver3:
there it measures as not being recognized as a dep-breaking zero-idiom,
let alone a zero-cycle one.

Roman Lebedev [Fri, 14 May 2021 16:57:21 +0000 (19:57 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom

As measured by exegesis, and confirmed by ref docs.
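
Why the same-register forms in this series (PXOR, PANDN, PSUB, PSUBS, PSUBUS, PCMPGT) are zero idioms follows from their per-lane semantics: with both operands equal, every lane of the result is zero regardless of the input, so the core can drop the dependency on the register's old value. The scalar C++ model below is an illustration only, using 8-bit lanes (wider lanes behave the same); it says nothing about whether a given core eliminates the instruction, which is what the measurements above establish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-lane models of packed-integer instructions on 8-bit lanes.
std::uint8_t pxorLane(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a ^ b);
}
// PANDN dst, src computes (NOT dst) AND src.
std::uint8_t pandnLane(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(~a & b);
}
// PSUBB: wrapping subtraction.
std::uint8_t psubbLane(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a - b);
}
// PSUBSB: signed saturating subtraction.
std::int8_t psubsbLane(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(std::clamp(int{a} - int{b}, -128, 127));
}
// PSUBUSB: unsigned saturating subtraction.
std::uint8_t psubusbLane(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a > b ? a - b : 0);
}
// PCMPGTB: signed greater-than, all-ones when true.
std::uint8_t pcmpgtbLane(std::int8_t a, std::int8_t b) {
  return a > b ? std::uint8_t{0xFF} : std::uint8_t{0};
}

// True if op(x, x) is zero for every 8-bit input, i.e. the result cannot
// depend on the register's previous contents.
template <typename T, typename Op>
bool isZeroIdiom(Op op) {
  for (int v = 0; v < 256; ++v) {
    T x = static_cast<T>(v);
    if (op(x, x) != 0)
      return false;
  }
  return true;
}
```

By contrast, a same-register PAND or POR simply returns x, which is why those forms are not zero idioms.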

Roman Lebedev [Fri, 14 May 2021 16:56:16 +0000 (19:56 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom

As measured by exegesis, and confirmed by ref docs.

Roman Lebedev [Fri, 14 May 2021 16:54:33 +0000 (19:54 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom

As measured by exegesis, and confirmed by ref docs.

Roman Lebedev [Fri, 14 May 2021 15:48:14 +0000 (18:48 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 15:48:08 +0000 (18:48 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 15:47:55 +0000 (18:47 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 15:33:22 +0000 (18:33 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.

Roman Lebedev [Fri, 14 May 2021 15:32:34 +0000 (18:32 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.

Roman Lebedev [Fri, 14 May 2021 15:31:31 +0000 (18:31 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.

Roman Lebedev [Fri, 14 May 2021 15:22:34 +0000 (18:22 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 15:22:25 +0000 (18:22 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 15:22:18 +0000 (18:22 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 15:17:33 +0000 (18:17 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.

Roman Lebedev [Fri, 14 May 2021 15:16:09 +0000 (18:16 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.

Roman Lebedev [Fri, 14 May 2021 15:06:18 +0000 (18:06 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom

Not really mentioned in ref docs, but measures as such.

Roman Lebedev [Fri, 14 May 2021 14:44:52 +0000 (17:44 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 14:44:45 +0000 (17:44 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 14:44:35 +0000 (17:44 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests

Roman Lebedev [Fri, 14 May 2021 14:26:26 +0000 (17:26 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by the exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 14:22:19 +0000 (17:22 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by the exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 14:16:50 +0000 (17:16 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom

As confirmed by the exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 14:08:45 +0000 (17:08 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 14:08:38 +0000 (17:08 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 14:08:24 +0000 (17:08 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests

Roman Lebedev [Fri, 14 May 2021 13:20:25 +0000 (16:20 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 13:19:31 +0000 (16:19 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 13:18:23 +0000 (16:18 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom

As confirmed by the exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 13:16:49 +0000 (16:16 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests

Roman Lebedev [Fri, 14 May 2021 13:16:43 +0000 (16:16 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests

Roman Lebedev [Fri, 14 May 2021 13:16:23 +0000 (16:16 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests

Roman Lebedev [Fri, 14 May 2021 12:48:22 +0000 (15:48 +0300)]
[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 12:47:02 +0000 (15:47 +0300)]
[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom

As confirmed by exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 12:44:32 +0000 (15:44 +0300)]
[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom

As confirmed by the exegesis measurements, and ref docs.

Roman Lebedev [Fri, 14 May 2021 12:37:52 +0000 (15:37 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests

Roman Lebedev [Fri, 14 May 2021 12:37:37 +0000 (15:37 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests

Roman Lebedev [Fri, 14 May 2021 12:37:00 +0000 (15:37 +0300)]
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests

Benjamin Kramer [Fri, 14 May 2021 17:15:20 +0000 (19:15 +0200)]
Bump googletest to 1.10.0

Philip Reames [Fri, 14 May 2021 17:15:30 +0000 (10:15 -0700)]
Revert "Do actual DCE in LoopUnroll"

This reverts commit 9d1a61e695eb01298e26c76867d65592f1e1968c.

I'd missed some review feedback, and had missed updating an aarch64 test.  Reverting while I fix both.

Philip Reames [Fri, 14 May 2021 17:03:45 +0000 (10:03 -0700)]
Do actual DCE in LoopUnroll

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511

Matt Morehouse [Fri, 14 May 2021 16:02:49 +0000 (09:02 -0700)]
[HWASan] Add aliasing flag and enable HWASan to use it.

-fsanitize-hwaddress-experimental-aliasing is intended to distinguish
aliasing mode from LAM mode on x86_64.  check-hwasan is configured
to use aliasing mode while check-hwasan-lam is configured to use LAM
mode.

The current patch doesn't actually do anything differently in the two
modes.  A subsequent patch will actually build the separate runtimes
and use them in each mode.

Currently LAM mode tests must be run in an emulator that
has LAM support.  To ensure LAM mode isn't broken by future patches, I
will next set up a QEMU buildbot to run the HWASan tests in LAM.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102288

Anastasia Stulova [Fri, 14 May 2021 14:41:56 +0000 (15:41 +0100)]
[OpenCL] Simplify use of C11 atomic types.

Remove requirements on extension pragma in atomic types
because it has not respected the spec wrt disabling types
and hasn't been useful either. With this change,
developers can use atomic types from the extensions if they
are supported, without enabling the pragma, just like the builtin
functions.

This patch does not break backward compatibility since the
extension pragma is still supported and it makes the behavior of
the compiler less strict by accepting code without needless and
inconsistent pragma statements.

Differential Revision: https://reviews.llvm.org/D100976

Fangrui Song [Fri, 14 May 2021 16:40:32 +0000 (09:40 -0700)]
[ELF] Add -Bno-symbolic

This option will be available in GNU ld 2.37 (https://sourceware.org/bugzilla/show_bug.cgi?id=27834).
This option can cancel previously specified -Bsymbolic and
-Bsymbolic-functions.  This is useful for excluding some links when the
default uses -Bsymbolic-functions.
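
A hypothetical sketch of the option semantics, assuming last-one-wins handling of -Bsymbolic, -Bsymbolic-functions and -Bno-symbolic (parseSymbolicMode and bindsLocally are illustrative names, not lld internals). It also encodes that -Bsymbolic-functions only affects STT_FUNC symbols.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class SymbolicMode { None, Functions, All };

// Hypothetical option handling: among -Bsymbolic, -Bsymbolic-functions and
// -Bno-symbolic, the last one on the command line wins.
SymbolicMode parseSymbolicMode(const std::vector<std::string> &args) {
  SymbolicMode mode = SymbolicMode::None;
  for (const std::string &arg : args) {
    if (arg == "-Bsymbolic")
      mode = SymbolicMode::All;
    else if (arg == "-Bsymbolic-functions")
      mode = SymbolicMode::Functions;
    else if (arg == "-Bno-symbolic")
      mode = SymbolicMode::None;
  }
  return mode;
}

// Whether a default-visibility definition in a shared object binds locally.
// -Bsymbolic-functions only applies to STT_FUNC symbols.
bool bindsLocally(SymbolicMode mode, bool isFunction) {
  switch (mode) {
  case SymbolicMode::All:
    return true;
  case SymbolicMode::Functions:
    return isFunction;
  case SymbolicMode::None:
    return false;
  }
  return false;
}
```

Under this model, appending -Bno-symbolic to a command line whose default already contains -Bsymbolic-functions restores default symbol binding for that one link.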

Reviewed By: jhenderson, peter.smith

Differential Revision: https://reviews.llvm.org/D102383

Fangrui Song [Fri, 14 May 2021 16:33:43 +0000 (09:33 -0700)]
[ELF][test] Improve -Bsymbolic & -Bsymbolic-functions test

Previously there was no test checking that -Bsymbolic-functions only applies to STT_FUNC symbols.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D102461

Philip Reames [Fri, 14 May 2021 16:33:10 +0000 (09:33 -0700)]
Autogen a test for ease of update

Florian Hahn [Fri, 14 May 2021 15:33:41 +0000 (16:33 +0100)]
[LV] Add a few more complex first-order recurrence tests.

Bradley Smith [Fri, 7 May 2021 13:19:23 +0000 (14:19 +0100)]
[AArch64][SVE] Combine cntp intrinsics with add/sub to produce incp/decp

Depends on D101062

Differential Revision: https://reviews.llvm.org/D102077

Benoit Jacob [Fri, 14 May 2021 15:56:55 +0000 (21:26 +0530)]
Fix some typos.

Fix some typos

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D102503

Simon Pilgrim [Fri, 14 May 2021 14:32:18 +0000 (15:32 +0100)]
[X86][SSE] Pull out combineToHorizontalAddSub helper from inside (F)ADD/SUB combines. NFCI.

The intention is to be able to run this from additional locations (such as shuffle combining) in the future.

Mark de Wever [Thu, 13 May 2021 16:23:27 +0000 (18:23 +0200)]
[libc++] Improve make_string test support.

Adds MAKE_CSTRING and makes the operators of `MultiStringType` `constexpr`.

The code is copied from D96664 so it can be used in D80895.

Differential Revision: https://reviews.llvm.org/D102414

Benjamin Kramer [Fri, 14 May 2021 15:10:50 +0000 (17:10 +0200)]
Bump googletest to 1.8.1

We've accumulated a scary amount of local patches to this directory. I
tried to merge them all, but if your favorite change is missing please
reapply it manually (and send it upstream).

Bradley Smith [Fri, 30 Apr 2021 15:11:09 +0000 (16:11 +0100)]
[AArch64][SVE] Add unpredicated vector BIC ISD node

Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).

Differential Revision: https://reviews.llvm.org/D101831

Philip Reames [Fri, 14 May 2021 14:52:34 +0000 (07:52 -0700)]
[rs4gc] Strip memory related attributes consistently

I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters.

Why do we need to strip these? Two answers:

1) Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation.
2) We can infer other attributes from these. For instance, readnone can imply both nofree and nosync, neither of which holds after physical rewriting.

Note: This exposed a latent issue which was fixed a couple weeks back in 01801d5274.

Differential Revision: https://reviews.llvm.org/D99802

Kadir Cetinkaya [Thu, 13 May 2021 16:51:56 +0000 (18:51 +0200)]
[clangd] Always default to raw pch format

Clang would emit a fatal error when it encounters an unregistered PCH
format. This change ensures clangd will always use the raw format no matter what
the user specifies.

As side effects:

- Serializing an AST in an unknown format might throw off build
systems. I suppose this would only be an issue when the build system and clangd are
racing for the same PCM modules; hopefully this should be rare, and both clangd and
the build system should recover on the next run.

- Whenever clang reads a serialized AST, it seems to check the file
signature and emit non-fatal errors, so this should be fine again.

The only other valid module format in clang is `obj`, but it is part of codegen,
and I don't think it is worth the dependency. Hence I chose not to register it, at
least yet.

Differential Revision: https://reviews.llvm.org/D102418

David Spickett [Fri, 14 May 2021 14:30:51 +0000 (14:30 +0000)]
[utils] Don't print username in arcanist clang format message

I didn't realise this message was also posted to the phabricator review.

Just say "the user's local path". "local" is the important part,
the username is not important.

David Green [Fri, 14 May 2021 14:08:14 +0000 (15:08 +0100)]
[ARM] Expand predecessor search to multiple blocks when reverting WhileLoopStarts

We were previously only searching a single preheader for call
instructions when reverting WhileLoopStarts to DoLoopStarts. This
extends that to multiple blocks, which can come up when, for example, a
loop is expanded from a memcpy. It also expands the set of instructions
searched from just calls to also include other LoopStarts, to catch other
low overhead loops in the preheader.

Differential Revision: https://reviews.llvm.org/D102269

David Green [Thu, 13 May 2021 08:20:33 +0000 (09:20 +0100)]
[ARM] Define CPSR on MEMCPY pseudos

These pseudos are converted post-isel into t2WhileLoopStart and
t2LoopEnd/LoopDec instructions, which themselves are defined to clobber
CPSR. Doing the same with the MEMCPY nodes will make sure they are
scheduled correctly to not end up with incorrect uses.

Hsiangkai Wang [Thu, 13 May 2021 02:45:00 +0000 (10:45 +0800)]
[RISCV] Add the DebugLoc parameter to getVLENFactoredAmount().

The MachineBasicBlock::iterator is continuously changing while the
frame handling instructions are generated. We should use the DebugLoc
from the caller, instead of getting it from the changing iterator.

If the prologue instructions are located in a basic block with no other
instructions after them, the iterator will be updated to the end of the
basic block, and it is invalid to use that iterator to access the DebugLoc.
This patch also fixes the crash when accessing the DebugLoc through the
iterator.

Differential Revision: https://reviews.llvm.org/D102386

David Candler [Fri, 14 May 2021 12:45:05 +0000 (13:45 +0100)]
[ARM][AArch64] Correct __ARM_FEATURE_CRYPTO macro and crypto feature

This patch contains a couple of minor corrections to my previous
crypto patch:

Since both AArch32 and AArch64 are now correctly setting the aes and
sha2 features individually, it is not necessary to continue to check
the crypto feature when defining feature macros.

In the AArch32 driver, the feature vector is only modified when the
crypto feature is actually in the vector. If crypto is not present,
there is no need to split it and explicitly define crypto/sha2/aes.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D102406

Dmitry Preobrazhensky [Fri, 14 May 2021 13:11:36 +0000 (16:11 +0300)]
[AMDGPU][MC][NFC][DOC] Updated AMD GPU assembler syntax description.

Summary of changes:
- added description of GFX90A;
- minor bugfixing and improvements.

Nemanja Ivanovic [Fri, 14 May 2021 13:00:44 +0000 (08:00 -0500)]
[PowerPC] Add vec_vupkhpx and vec_vupklpx for XL compatibility

These are old names for these functions that XL still supports.

Sanjay Patel [Fri, 14 May 2021 12:27:36 +0000 (08:27 -0400)]
[SDAG] reduce code duplication for extend_vec_inreg combines; NFC

These are identical so far, and I was looking at adding a fold
for a pattern with scalar_to_vector which would also end up duplicated.