platform/upstream/llvm.git
3 years ago[mlir][sparse] complete migration to sparse tensor type
Aart Bik [Mon, 10 May 2021 17:34:21 +0000 (10:34 -0700)]
[mlir][sparse] complete migration to sparse tensor type

A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".

1. replaces linalg annotations + flags with proper sparse tensor types
2. adds rigorous verification on sparse tensor type and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types

NOTE: next CL will remove *all* obsoleted sparse code in Linalg

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D102095

3 years ago[lld-macho] Fix order file arch filtering
Jez Ng [Mon, 10 May 2021 19:45:20 +0000 (15:45 -0400)]
[lld-macho] Fix order file arch filtering

We had a hardcoded check and a stale TODO, written back when we only had
support for one architecture.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102154

3 years ago[lld-macho] Treat undefined symbols uniformly
Jez Ng [Mon, 10 May 2021 19:45:18 +0000 (15:45 -0400)]
[lld-macho] Treat undefined symbols uniformly

In particular, we should apply the `-undefined` behavior to all
such symbols, including those that are specified via the command line
(i.e. `-e`, `-u`, and `-exported_symbol`). ld64 supports this too.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102143

3 years ago[lld-macho][nfc] Clean up tests
Jez Ng [Mon, 10 May 2021 02:09:17 +0000 (22:09 -0400)]
[lld-macho][nfc] Clean up tests

* Remove unnecessary `rm -rf %t`s
* Have lc-linker-option.ll use the right comment marker

3 years ago[PowerPC] Spilling to registers does not require frame index scavenging
Stefan Pintilie [Mon, 10 May 2021 18:10:11 +0000 (13:10 -0500)]
[PowerPC] Spilling to registers does not require frame index scavenging

If spills are to registers instead of to the stack then a copy will be used
and frame index scavenging is not required.

This patch adds debug info to frame index scavenging and makes sure that
spilling to registers does not cause frame index scavenging.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101360

3 years ago[TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Arthur Eubanks [Tue, 4 May 2021 01:00:50 +0000 (18:00 -0700)]
[TargetLowering] Only inspect attributes in the arguments for ArgListEntry

Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806

3 years ago[mlir][linalg] Restrict distribution to parallel dims
Lei Zhang [Mon, 10 May 2021 19:17:14 +0000 (15:17 -0400)]
[mlir][linalg] Restrict distribution to parallel dims

According to the API contract, LinalgLoopDistributionOptions
expects to work on parallel iterators. When getting processor
information, only loop ranges for parallel dimensions should
be fed in. But right now after generating scf.for loop nests,
we feed in *all* loops, including the ones materialized for
reduction iterators. This can cause unexpected distribution
of reduction dimensions. This commit fixes it.
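
A rough illustration of the fix, written in Python rather than the actual C++: only the loop ranges whose iterator is parallel are handed to the distribution logic. The function name, the string iterator types, and the `(lower, upper, step)` tuples are assumptions made for this sketch, not the Linalg API.

```python
def parallel_loop_ranges(iterator_types, loop_ranges):
    """Keep only the ranges that belong to parallel iterators.

    iterator_types: one string per loop, e.g. "parallel" or "reduction"
                    (hypothetical encoding for this sketch).
    loop_ranges:    one (lower, upper, step) tuple per loop.
    """
    if len(iterator_types) != len(loop_ranges):
        raise ValueError("expected one iterator type per loop range")
    return [
        loop_range
        for kind, loop_range in zip(iterator_types, loop_ranges)
        if kind == "parallel"
    ]


# A matmul-like loop nest: two parallel dims and one reduction dim.
kinds = ["parallel", "parallel", "reduction"]
ranges = [(0, 128, 1), (0, 64, 1), (0, 256, 1)]

# Only the first two ranges are eligible for distribution.
print(parallel_loop_ranges(kinds, ranges))
```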

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D102079

3 years ago[libc] Revert "Simplifies multi implementations and benchmarks".
Siva Chandra Reddy [Mon, 10 May 2021 19:20:27 +0000 (19:20 +0000)]
[libc] Revert "Simplifies multi implementations and benchmarks".

This reverts commit 541f107871bc9c020925a6e5342542a47c902d12 as the bots
are failing with unknown architecture "x86-64-v*". Will let the original
author decide on the right course of action to correct the problem and
reland.

3 years ago[scudo] [GWP-ASan] Add GWP-ASan variant of scudo benchmarks.
Mitch Phillips [Mon, 10 May 2021 18:59:45 +0000 (11:59 -0700)]
[scudo] [GWP-ASan] Add GWP-ASan variant of scudo benchmarks.

GWP-ASan is the "production" variant as compiled by compiler-rt, and it's useful to be able to benchmark changes in GWP-ASan or Scudo's GWP-ASan hooks across versions. GWP-ASan is sampled, and sampled allocations are much slower, but given the number of allocations that happen under test here, we actually get a reasonable representation of GWP-ASan's negligible performance impact between runs.

Reviewed By: cryptoad

Differential Revision: https://reviews.llvm.org/D101865

3 years ago[RISCV] Validate the SEW and LMUL operands to __builtin_rvv_vsetvli(max)
Craig Topper [Mon, 10 May 2021 18:30:45 +0000 (11:30 -0700)]
[RISCV] Validate the SEW and LMUL operands to __builtin_rvv_vsetvli(max)

These are required to be constants; this patch makes sure they
are in the accepted range of values.

These are usually created by wrappers in the riscv_vector.h header
which should always be correct. This patch protects against a user
using the builtin directly.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D102086

3 years ago[GlobalISel][IRTranslator] Fix bit-test lowering dropping phi edges.
Amara Emerson [Mon, 10 May 2021 18:12:14 +0000 (11:12 -0700)]
[GlobalISel][IRTranslator] Fix bit-test lowering dropping phi edges.

For contiguous ranges we drop the last bit-test case but in doing so we skip
adding the new MBB PHI edges to the list of replacement PHI edges, and as a
result we incorrectly omit them in the G_PHI in finishPendingPhis().

Was found when bootstrapping clang with -O3 and GlobalISel enabled on Apple Silicon.

3 years ago[PassManager] add helper function to hold set of vector passes (2nd try)
Sanjay Patel [Mon, 10 May 2021 17:55:42 +0000 (13:55 -0400)]
[PassManager] add helper function to hold set of vector passes (2nd try)

This is a better no-functional-change-intended patch than the 1st attempt.
As noted in D102002, there were at least 2 diffs that went
unchecked in pass manager regression tests: different pass
parameters (SimplifyCFG) and an extension point/callback.
Those should be lifted from the original code blocks correctly
now.

3 years ago[mlir][Python] Re-export cext sparse_tensor module to the public namespace.
Stella Laurenzo [Mon, 10 May 2021 17:42:24 +0000 (17:42 +0000)]
[mlir][Python] Re-export cext sparse_tensor module to the public namespace.

* This was left out of the previous commit accidentally.

Differential Revision: https://reviews.llvm.org/D102183

3 years ago[X86] AMD Zen 3: sub-32-bit CMP also break dependencies
Roman Lebedev [Mon, 10 May 2021 17:52:30 +0000 (20:52 +0300)]
[X86] AMD Zen 3: sub-32-bit CMP also break dependencies

They measure as having the same effect as 32-bit CMP.

3 years ago[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking
Roman Lebedev [Mon, 10 May 2021 17:48:41 +0000 (20:48 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking

3 years ago[X86][AVX] Add example of failure to remove a 256-bit permute(hadd(hadd(),hadd()...
Simon Pilgrim [Mon, 10 May 2021 17:43:02 +0000 (18:43 +0100)]
[X86][AVX] Add example of failure to remove a 256-bit permute(hadd(hadd(),hadd())) shuffle by reordering the packed operands.

3 years ago[X86][SSE] canonicalizeShuffleMaskWithHorizOp - add TODO for better 256/512-bit shuff...
Simon Pilgrim [Mon, 10 May 2021 17:41:05 +0000 (18:41 +0100)]
[X86][SSE] canonicalizeShuffleMaskWithHorizOp - add TODO for better 256/512-bit shuffle+hop folding support. NFC.

3 years ago[lld-macho] Improve an external weak def test
Fangrui Song [Mon, 10 May 2021 17:35:44 +0000 (10:35 -0700)]
[lld-macho] Improve an external weak def test

The rebase table entry is untested.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D102150

3 years ago[Dependence Analysis] Enable delinearization of fixed sized arrays
Andy Kaylor [Mon, 10 May 2021 17:01:43 +0000 (10:01 -0700)]
[Dependence Analysis] Enable delinearization of fixed sized arrays

Patch by Artem Radzikhovskyy!

Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension.
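
To see why the bounds checks matter, here is a toy Python model with concrete integers (the real analysis reasons about symbolic ranges, and the helper names below are made up for this sketch). If a subscript exceeds its dimension, two different subscript tuples map to the same linear offset, so the per-dimension subscripts can no longer be compared independently.

```python
def linearize(indices, dims):
    """Row-major offset of `indices` in an array with extents `dims`."""
    offset = 0
    for idx, dim in zip(indices, dims):
        offset = offset * dim + idx
    return offset


def indices_in_bounds(indices, dims):
    """True if every index is non-negative and smaller than its dimension."""
    return all(0 <= idx < dim for idx, dim in zip(indices, dims))


dims = (10, 20)  # models `int A[10][20]`

# (1, 25) overflows the inner dimension and aliases the in-bounds (2, 5).
assert linearize((1, 25), dims) == linearize((2, 5), dims) == 45
assert not indices_in_bounds((1, 25), dims)
assert indices_in_bounds((2, 5), dims)
```

Only when the check passes for every access is it sound to treat `A[i][j]` as a genuinely two-dimensional subscript.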

Changes Summary:

- Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests
- profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice
- loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks
- SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it.
- SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag

Differential Revision: https://reviews.llvm.org/D101486

3 years ago[mlir][Python] Upstream the PybindAdaptors.h helpers and use it to implement sparse_t...
Stella Laurenzo [Mon, 10 May 2021 01:09:09 +0000 (18:09 -0700)]
[mlir][Python] Upstream the PybindAdaptors.h helpers and use it to implement sparse_tensor.encoding.

* The PybindAdaptors.h file has been evolving across different sub-projects (npcomp, circt) and has been successfully used for out of tree python API interop/extensions and defining custom types.
* Since sparse_tensor.encoding is the first in-tree custom attribute we are supporting, it seemed like the right time to upstream this header and use it to define the attribute in a way that we can support for both in-tree and out-of-tree use (prior, I had not wanted to upstream dead code which was not used in-tree).
* Adapted the circt version of `mlir_type_subclass`, also providing an `mlir_attribute_subclass`. As we get a bit of mileage on this, I would like to transition the builtin types/attributes to this mechanism and delete the old in-tree only `PyConcreteType` and `PyConcreteAttribute` template helpers (which cannot work reliably out of tree as they depend on internals).
* Added support for defaulting the MlirContext if none is passed so that we can support the same idioms as in-tree versions.

There is quite a bit going on here and I can split it up if needed, but would prefer to keep the first use and the header together so sending out in one patch.

Differential Revision: https://reviews.llvm.org/D102144

3 years ago[lld][WebAssembly] Disallow exporting of TLS symbols
Sam Clegg [Fri, 7 May 2021 03:29:05 +0000 (20:29 -0700)]
[lld][WebAssembly] Disallow exporting of TLS symbols

Cross module TLS is currently not supported by our ABI.  This
change makes explicitly exporting a TLS symbol into an error
and prevents implicit exporting (via --export-all).
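
A minimal Python sketch of the policy described above, not the lld implementation: an explicitly requested TLS export is reported as an error, while an export-all style request silently skips TLS symbols. The function name, the `name -> is_tls` dictionary, and the error text are all hypothetical.

```python
def select_exports(symbols, explicit_exports, export_all):
    """Return (exported_names, errors) for a symbol table.

    symbols:          dict mapping symbol name -> True if the symbol is TLS.
    explicit_exports: set of names the user asked to export by name.
    export_all:       True if every eligible symbol should be exported.
    """
    exported, errors = [], []
    for name, is_tls in symbols.items():
        if name in explicit_exports:
            if is_tls:
                # Explicit request for a TLS symbol: hard error.
                errors.append("TLS symbol cannot be exported: " + name)
            else:
                exported.append(name)
        elif export_all and not is_tls:
            # Implicit export: TLS symbols are skipped without an error.
            exported.append(name)
    return exported, errors


symbols = {"counter": False, "tls_slot": True, "helper": False}
print(select_exports(symbols, {"tls_slot"}, export_all=False))
print(select_exports(symbols, set(), export_all=True))
```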

See https://github.com/emscripten-core/emscripten/issues/14120

Differential Revision: https://reviews.llvm.org/D102044

3 years ago[cmake] Enable -Wmisleading-indentation
Dave Lee [Fri, 7 May 2021 21:16:47 +0000 (14:16 -0700)]
[cmake] Enable -Wmisleading-indentation

Enable `-Wmisleading-indentation` to balance with the LLVM style of optional parentheses.

Differential Revision: https://reviews.llvm.org/D102092

3 years ago[mlir][CAPI] Add CAPI bindings for the sparse_tensor dialect.
Stella Laurenzo [Sun, 9 May 2021 23:14:05 +0000 (16:14 -0700)]
[mlir][CAPI] Add CAPI bindings for the sparse_tensor dialect.

* Adds dialect registration, hand coded 'encoding' attribute and test.
* An MLIR CAPI tablegen backend for attributes does not exist, and this is a relatively complicated case. I opted to hand code it in a canonical way for now, which will provide a reasonable blueprint for building out the tablegen version in the future.
* Also added a (local) CMake function for declaring new CAPI tests, since it was getting repetitive/buggy.

Differential Revision: https://reviews.llvm.org/D102141

3 years ago[X86][SSE] Add examples of failures to remove a permute(pack(pack(),pack())) shuffle...
Simon Pilgrim [Mon, 10 May 2021 16:50:26 +0000 (17:50 +0100)]
[X86][SSE] Add examples of failures to remove a permute(pack(pack(),pack())) shuffle by reordering the packed operands.

3 years ago[RISCV] Correct VL for fixed length masked scatter.
Craig Topper [Mon, 10 May 2021 16:34:32 +0000 (09:34 -0700)]
[RISCV] Correct VL for fixed length masked scatter.

We were incorrectly calling getVectorNumElements on a scalable
vector type. This shouldn't be allowed. This gives a warning on
EVT, but not MVT.

3 years ago[Demangle][Rust] Parse basic types
Tomasz Miąsko [Mon, 10 May 2021 15:58:20 +0000 (08:58 -0700)]
[Demangle][Rust] Parse basic types

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102142

3 years ago[clang] Support -fpic -fno-semantic-interposition for AArch64
Fangrui Song [Mon, 10 May 2021 16:43:33 +0000 (09:43 -0700)]
[clang] Support -fpic -fno-semantic-interposition for AArch64

-fno-semantic-interposition (only effective with -fpic) can optimize default
visibility external linkage (non-ifunc-non-COMDAT) variable access and function
calls to avoid GOT/PLT, by using local aliases, e.g.
```
int var;
__attribute__((optnone)) int fun(int x) { return x * x; }
int test() { return fun(var); }
```

-fpic (var and fun are dso_preemptable)
```
test:                                   // @test
        adrp    x8, :got:var
        ldr     x8, [x8, :got_lo12:var]
        ldr     w0, [x8]
// fun is preemptible by default in ld -shared mode. ld will create a PLT.
        b       fun
```

vs -fpic -fno-semantic-interposition (var and fun are dso_local)
```
test:                                   // @test
.Ltest$local:
        adrp    x8, .Lvar$local
        ldr     w0, [x8, :lo12:.Lvar$local]
// The assembler either resolves .Lfun$local at assembly time, or produces a
// relocation referencing a non-preemptible section symbol (which can avoid PLT).
        b       .Lfun$local
```

Note: Clang's default -fpic is more aggressive than GCC -fpic: interprocedural
optimizations (including inlining) are available but local aliases are not used.
-fpic -fsemantic-interposition can disable interprocedural optimizations.

Depends on D101872

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D101873

3 years ago[ORC] Update SpeculativeJIT example for dispatchTask changes in 5344c88dcb2.
Lang Hames [Mon, 10 May 2021 16:20:59 +0000 (09:20 -0700)]
[ORC] Update SpeculativeJIT example for dispatchTask changes in 5344c88dcb2.

3 years ago[llvm-nm] Help option output should be consistent with the command guide
gbreynoo [Mon, 10 May 2021 16:25:41 +0000 (17:25 +0100)]
[llvm-nm] Help option output should be consistent with the command guide

The nm command guide shows the short options used as aliases but these
are not found in the help text unless --show-hidden is used, other tools
show aliases with --help. This change fixes the help output to be
consistent with the command guide.

Differential Revision: https://reviews.llvm.org/D102072

3 years ago[llvm-symbolizer] Update Command Guide
gbreynoo [Mon, 10 May 2021 16:19:05 +0000 (17:19 +0100)]
[llvm-symbolizer] Update Command Guide

The option --use-symbol-table is now a noop and does not appear in the
help text, however it still appears in the command guide. This change
removes it from the command guide and updates the description of
--output-style.

Differential Revision: https://reviews.llvm.org/D102078

3 years ago[X86][SSE] Add tests for missing shuffle(pack(x,y),pack(z,w)) -> permute(pack())...
Simon Pilgrim [Mon, 10 May 2021 16:17:21 +0000 (17:17 +0100)]
[X86][SSE] Add tests for missing shuffle(pack(x,y),pack(z,w)) -> permute(pack()) folds.

3 years ago[X86][SSE] Merge equal X32/X64 check prefixes. NFCI.
Simon Pilgrim [Mon, 10 May 2021 16:04:44 +0000 (17:04 +0100)]
[X86][SSE] Merge equal X32/X64 check prefixes. NFCI.

3 years ago[llvm-objdump][MachO] Print a newline before lazy bind/bind/weak/exports trie
Fangrui Song [Mon, 10 May 2021 16:16:18 +0000 (09:16 -0700)]
[llvm-objdump][MachO] Print a newline before lazy bind/bind/weak/exports trie

This adds a separator between two pieces of information.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D102114

3 years ago[libc++][NFC] Remove _VSTD:: when not needed.
Mark de Wever [Sun, 9 May 2021 16:22:52 +0000 (18:22 +0200)]
[libc++][NFC] Remove _VSTD:: when not needed.

Reviewed By: #libc, Quuxplusone

Differential Revision: https://reviews.llvm.org/D102133

3 years ago[X86] Fix position-independent TType encoding
Harald van Dijk [Mon, 10 May 2021 16:04:33 +0000 (17:04 +0100)]
[X86] Fix position-independent TType encoding

The logic for x86_64 position-independent TType encodings was backwards,
using 8 bytes where 4 were wanted and 4 where 8 were wanted. For regular
x86_64, this was mostly harmless, exception tables are allowed to use
8-byte encodings even when it is not needed. For the large code model,
and for X32, however, the generated exception tables were wrong. For the
large code model, we cannot assume that the address will fit in 4 bytes.
For X32, we cannot use 64-bit relocations.
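
The constraints above can be summarised as a small decision function. This is a simplified Python sketch based only on the three cases named in this message; the function and parameter names are hypothetical, and the real code selects DWARF pointer encodings rather than returning a byte count.

```python
def ttype_pointer_size(is_large_code_model, is_x32):
    """Bytes used per position-independent TType entry in this toy model."""
    if is_x32:
        # X32 cannot use 64-bit relocations, so 8 bytes is never an option.
        return 4
    if is_large_code_model:
        # The address is not guaranteed to fit in 4 bytes.
        return 8
    # Regular x86_64: 4 bytes suffice (8 would be allowed but is not needed).
    return 4


assert ttype_pointer_size(is_large_code_model=False, is_x32=False) == 4
assert ttype_pointer_size(is_large_code_model=True, is_x32=False) == 8
assert ttype_pointer_size(is_large_code_model=False, is_x32=True) == 4
```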

Fixes PR50148.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102132

3 years ago[NFC] Synchronize reserved identifier code between macro and variables / symbols
serge-sans-paille [Mon, 10 May 2021 12:54:25 +0000 (14:54 +0200)]
[NFC] Synchronize reserved identifier code between macro and variables / symbols

Differential Revision: https://reviews.llvm.org/D102164

3 years ago[clang][AArch32] Correctly align HA arguments when passed on the stack
Momchil Velikov [Mon, 10 May 2021 13:53:21 +0000 (14:53 +0100)]
[clang][AArch32] Correctly align HA arguments when passed on the stack

Analogously to https://reviews.llvm.org/D98794 this patch uses the
`alignstack` attribute to fix incorrect passing of homogeneous
aggregate (HA) arguments on AArch32. The EABI/AAPCS was recently
updated to clarify how VFP co-processor candidates are aligned:
https://github.com/ARM-software/abi-aa/commit/4488e34998514dc7af5507236f279f6881eede62

Differential Revision: https://reviews.llvm.org/D100853

3 years agoRevert "[PassManager] add helper function to hold set of vector passes"
Sanjay Patel [Mon, 10 May 2021 14:54:42 +0000 (10:54 -0400)]
Revert "[PassManager] add helper function to hold set of vector passes"

This reverts commit fefcb1f878c2dad435af604955661ca02a5302de.
It was supposed to be NFC, but as noted in the post-commit
comments in D102002, that was not true: SimplifyCFG uses
different parameters and there's a difference in an
extension point / callback.

3 years ago[libomptarget] Add support for target allocators to dynamic cuda RTL
Jon Chesterfield [Mon, 10 May 2021 14:27:49 +0000 (15:27 +0100)]
[libomptarget] Add support for target allocators to dynamic cuda RTL

Follow on to D102000 which introduced new calls into libcuda. This patch adds
the corresponding entry points to dynamic_cuda, fixing the build for systems
that do not have the cuda toolkit installed.

Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102169

3 years ago[PowerPC] Enable safe for 32bit vins* P10 instructions
Zarko Todorovski [Mon, 10 May 2021 12:06:28 +0000 (08:06 -0400)]
[PowerPC] Enable safe for 32bit vins* P10 instructions

Correctly emit `vins` instructions that are safe in 32bit mode.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101383

3 years ago[SLP]Do not count perfect diamond matches for gathers several times.
Alexey Bataev [Thu, 6 May 2021 20:44:03 +0000 (13:44 -0700)]
[SLP]Do not count perfect diamond matches for gathers several times.

Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023

3 years ago[libc++][AIX] Define _LIBCPP_ELAST
jasonliu [Mon, 10 May 2021 13:45:36 +0000 (13:45 +0000)]
[libc++][AIX] Define _LIBCPP_ELAST

The aim is to define _LIBCPP_ELAST for AIX since strerror/strerror_r
can't handle out-of-range errno values.

Differential Revision: https://reviews.llvm.org/D100986

3 years ago[AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Bradley Smith [Thu, 6 May 2021 11:19:38 +0000 (12:19 +0100)]
[AArch64][SVE] Improve SVE codegen for fixed length BITCAST

Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair, as such, when this is done to bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990

3 years ago[OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.
Alexey Bataev [Wed, 5 May 2021 15:01:58 +0000 (08:01 -0700)]
[OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

Follow the more general patch for now, do not try to SPMDize the kernel
if the variable is used and local.

Differential Revision: https://reviews.llvm.org/D101911

3 years ago[TableGen] Remove redundant `Error:` in msg (NFC)
qixingxue [Mon, 10 May 2021 06:32:21 +0000 (14:32 +0800)]
[TableGen] Remove redundant `Error:` in msg (NFC)

Since calling `PrintFatalError` will automatically add an `error: `
prefix to the message printed, there is no need for an extra
`ERROR:` prefix in the argument passed.

Differential Revision: https://reviews.llvm.org/D102151
Reviewed By: Paul-C-Anagnostopoulos

3 years agoX86FlagsCopyLowering.cpp - try to pass DebugLoc by const-ref to avoid costly Tracking...
Simon Pilgrim [Mon, 10 May 2021 13:00:19 +0000 (14:00 +0100)]
X86FlagsCopyLowering.cpp - try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI.

3 years agoX86LoadValueInjectionLoadHardening.cpp - use const-reference in for-range loops to...
Simon Pilgrim [Mon, 10 May 2021 12:58:05 +0000 (13:58 +0100)]
X86LoadValueInjectionLoadHardening.cpp - use const-reference in for-range loops to avoid unnecessary copies. NFCI.

3 years ago[Constant] Allow ConstantAggregateZero a scalable element count
Fraser Cormack [Fri, 7 May 2021 16:41:27 +0000 (17:41 +0100)]
[Constant] Allow ConstantAggregateZero a scalable element count

A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.

This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.

Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102082

3 years ago[clangd] Fix data type of WorkDoneProgressReport::percentage
Christian Kandeler [Mon, 10 May 2021 12:56:55 +0000 (14:56 +0200)]
[clangd] Fix data type of WorkDoneProgressReport::percentage

According to the specification, this should be an unsigned integer.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D101616

3 years ago[NFC][llvm-dwarfdump] Code clean up for inlined var loc stats
Djordje Todorovic [Mon, 10 May 2021 12:08:33 +0000 (05:08 -0700)]
[NFC][llvm-dwarfdump] Code clean up for inlined var loc stats

This is preparation for https://reviews.llvm.org/D101025.
D101025 will start calculating var locstats for concrete fns
that refer to an abstract origin as well.

3 years agoclang: Fix tests after 7f78e409d028 if clang is not called clang-13
Nico Weber [Mon, 10 May 2021 12:47:28 +0000 (08:47 -0400)]
clang: Fix tests after 7f78e409d028 if clang is not called clang-13

We might release a new version at some point after all.
In fact, use the same pattern the other CHECK lines in this test
use, for consistency.

3 years ago[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics
Bradley Smith [Fri, 30 Apr 2021 15:21:50 +0000 (16:21 +0100)]
[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics

When using predicated intrinsics, if the predicate used is all lanes active,
use an unpredicated form of the instruction, additionally this allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828

3 years ago[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics
Bradley Smith [Fri, 30 Apr 2021 15:17:37 +0000 (16:17 +0100)]
[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics

When using predicated arithmetic intrinsics, if the predicate used is all
lanes active, use an unpredicated form of the instruction, additionally
this allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D101062

3 years ago[GlobalISel] Fix wrong invocation of `getParamStackAlign` (NFC)
Momchil Velikov [Mon, 10 May 2021 10:19:13 +0000 (11:19 +0100)]
[GlobalISel] Fix wrong invocation of `getParamStackAlign` (NFC)

The function template `CallLowering::setArgFlags` is invoked both
for arguments and return values. In the latter case, it calls
`getParamStackAlign` with argument index `~0u`. Nothing wrong
happens now, as the argument is safely incremented back to 0
inside `getParamStackAlign` (the type is `unsigned`), but in
principle it's fragile and may become incorrect.
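
The "incremented back to 0" behaviour is ordinary unsigned wraparound. The Python sketch below emulates it, assuming a 32-bit `unsigned`; the helper name is made up for illustration and nothing here is LLVM code.

```python
UINT_BITS = 32                      # assumption: `unsigned` is 32 bits wide
UINT_MAX = (1 << UINT_BITS) - 1     # bit pattern of ~0u


def wrapping_add(value, delta):
    """Add modulo 2**32, the way C++ unsigned arithmetic does."""
    return (value + delta) & UINT_MAX


return_value_index = UINT_MAX       # the `~0u` passed for return values

# Adding 1 wraps around to 0, which is why nothing goes wrong today.
assert wrapping_add(return_value_index, 1) == 0

# With any other adjustment the result is a huge, meaningless index.
assert wrapping_add(return_value_index, 0) == 4294967295
```

Relying on that wraparound works only as long as the callee keeps adding exactly one, which is the fragility this commit removes.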

Differential Revision: https://reviews.llvm.org/D102004

3 years ago[AArch64][SVE] Fix isel failure for FP-extending loads
Sander de Smalen [Mon, 10 May 2021 10:27:38 +0000 (11:27 +0100)]
[AArch64][SVE] Fix isel failure for FP-extending loads

DAGCombiner tries to combine a (fpext (load)) to (fround (extload))
but SVE has no FP-extending loads. By marking these as expand,
the combine no longer happens.

This also fixes a similar issue for fptrunc, where the source type
is not a legal type.

Reviewed By: bsmith, kmclaughlin

Differential Revision: https://reviews.llvm.org/D102053

3 years agoHexagonVectorCombine.cpp - don't negate a bool value. NFCI.
Simon Pilgrim [Mon, 10 May 2021 09:49:08 +0000 (10:49 +0100)]
HexagonVectorCombine.cpp - don't negate a bool value. NFCI.

Silences MSVC warning.

3 years ago[clang][PreProcessor] Cutoff parsing after hitting completion point
Kadir Cetinkaya [Fri, 7 May 2021 13:22:29 +0000 (15:22 +0200)]
[clang][PreProcessor] Cutoff parsing after hitting completion point

This fixes a crash caused by Lexers being invalidated at code
completion points in
https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPLexerChange.cpp#L520.

Differential Revision: https://reviews.llvm.org/D102069

3 years ago[OpenMP][MLIR]Add support for guided, auto and runtime scheduling
Mats Petersson [Mon, 10 May 2021 08:54:41 +0000 (08:54 +0000)]
[OpenMP][MLIR]Add support for guided, auto and runtime scheduling

When using parallel loop construct, the OpenMP specification allows for
guided, auto and runtime as scheduling variants (as well as static and
dynamic which are already supported).

This adds the translation from MLIR to LLVM-IR for these scheduling
variants.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101435

3 years agoFixed bug in buffer deallocation pass using unranked memref types.
Julian Gross [Mon, 3 May 2021 14:59:59 +0000 (16:59 +0200)]
Fixed bug in buffer deallocation pass using unranked memref types.

In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operations
do not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.

This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385

Differential Revision: https://reviews.llvm.org/D101760

3 years ago[compiler-rt] Handle None value when polling addr2line pipe
David Spickett [Wed, 5 May 2021 10:49:35 +0000 (11:49 +0100)]
[compiler-rt] Handle None value when polling addr2line pipe

According to:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.poll

poll can return None if the process hasn't terminated.

I'm not quite sure how addr2line could end up closing the pipe without
terminating but we did see this happen on one of our bots:
```
<...>scripts/asan_symbolize.py",
line 211, in symbolize
    logging.debug("addr2line exited early (broken pipe), returncode=%d"
% self.pipe.poll())
TypeError: %d format: a number is required, not NoneType
```

Handle None by printing a message that we couldn't get the return
code.
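
A minimal sketch of the kind of handling described, not the exact change made to asan_symbolize.py: check the result of `poll()` before formatting it with `%d`. The helper name and message wording are assumptions.

```python
import subprocess
import sys


def describe_returncode(proc):
    """Return a log-friendly description of a process's exit status.

    `proc` is anything with a Popen-style poll() method, which returns
    None while the child has not terminated.
    """
    code = proc.poll()
    if code is None:
        return "returncode=unknown (process has not terminated)"
    return "returncode=%d" % code


if __name__ == "__main__":
    # A child that has already exited: poll() yields its integer status.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print("addr2line exited early (broken pipe), " + describe_returncode(child))
```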

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D101891

3 years ago[MLIR][Shape] Concretize broadcast result type if possible
Frederik Gossen [Mon, 10 May 2021 08:22:23 +0000 (10:22 +0200)]
[MLIR][Shape] Concretize broadcast result type if possible

As a canonicalization, infer the resulting shape rank if possible.

Differential Revision: https://reviews.llvm.org/D102068

3 years ago[libc] Simplifies multi implementations and benchmarks
Guillaume Chatelet [Mon, 10 May 2021 08:23:30 +0000 (08:23 +0000)]
[libc] Simplifies multi implementations and benchmarks

This is a follow up on D101524 which:
 - simplifies cpu features detection and usage,
 - flattens target dependent optimizations so it's obvious which implementations are generated,
 - provides an implementation targeting the host (march/mtune=native) for the mem* functions,
 - makes sure all implementations are unittested (provided the host can run them),
 - makes sure all implementations are benchmarkable (provided the host can run them).

Differential Revision: https://reviews.llvm.org/D101895

3 years agoAMDGPU/GlobalISel: Use destination register bank in applyMappingLoad
Petar Avramovic [Mon, 10 May 2021 08:16:09 +0000 (10:16 +0200)]
AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad

Large loads on targets that do not useFlatForGlobal have to be split
in regbankselect. This did not happen when the destination had a vgpr
bank and the address had an sgpr bank.
Instead of checking whether the address bank is sgpr, check the bank of the destination.

Differential Revision: https://reviews.llvm.org/D101992

3 years agoAMDGPU/GlobalISel: Add regbankselect test for vgpr(dest) sgpr(address) load
Petar Avramovic [Fri, 7 May 2021 11:12:28 +0000 (13:12 +0200)]
AMDGPU/GlobalISel: Add regbankselect test for vgpr(dest) sgpr(address) load

Pre-commit for D101992.

3 years ago[mlir] OpenMP-to-LLVM: properly set outer alloca insertion point
Alex Zinenko [Mon, 10 May 2021 08:02:18 +0000 (10:02 +0200)]
[mlir] OpenMP-to-LLVM: properly set outer alloca insertion point

Previously, the OpenMP to LLVM IR conversion was setting the alloca insertion
point to the same position as the main computation when converting OpenMP
`parallel` operations. This is problematic if, for example, the `parallel`
operation is placed inside a loop and would keep allocating on stack on each
iteration leading to stack overflow.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D101307

3 years ago[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S
Pushpinder Singh [Fri, 7 May 2021 11:37:07 +0000 (11:37 +0000)]
[AMDGPU][OpenMP] Emit textual IR for -emit-llvm -S

Previously clang would print a binary blob into the bundled file
for amdgcn. With this patch, it will instead print textual IR as
expected.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102065

3 years ago[libc] Allow target architecture customization
Guillaume Chatelet [Mon, 10 May 2021 07:53:48 +0000 (07:53 +0000)]
[libc] Allow target architecture customization

This patch provides a way to specify the default target cpu optimizations to use when compiling llvm-libc.
This ensures we don't rely on the current compiler's defaults and allows compiling and cross compiling for a particular target.

Differential Revision: https://reviews.llvm.org/D101991

3 years ago[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails
Pushpinder Singh [Fri, 7 May 2021 08:15:49 +0000 (08:15 +0000)]
[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails

This patch prevents runtime tests running on systems without amdgpu.

Reviewed By: protze.joachim, tianshilei1992

Differential Revision: https://reviews.llvm.org/D102054

3 years ago[amdgpu-arch] Guard hsa.h with __has_include
Pushpinder Singh [Fri, 7 May 2021 11:56:46 +0000 (11:56 +0000)]
[amdgpu-arch] Guard hsa.h with __has_include

This patch is supposed to fix the issue of hsa.h not being found.
The issue was reported in D99949.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102067

3 years ago[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion
Fraser Cormack [Fri, 7 May 2021 10:20:21 +0000 (11:20 +0100)]
[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion

This patch extends VectorLegalizer::ExpandSELECT to permit expansion
also for scalable vector types. The only real change is conditionally
checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the
vector type.

We can use this to fix "cannot select" errors for scalable vector
selects on the RISCV target. Note that in future patches RISCV will
possibly custom-lower vector SELECTs to VSELECTs for branchless codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102063
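
One common way to expand a scalar-condition select over vectors is to splat the condition into an all-ones or all-zeros mask and combine the operands bitwise. The sketch below models that idea on Python lists of small unsigned integers; the function name and the 8-bit lane width are assumptions, and it does not model the BUILD_VECTOR versus SPLAT_VECTOR legality check.

```python
def expand_select(cond, true_vals, false_vals, bits=8):
    """Select between two vectors using only AND/OR/NOT on each lane."""
    lane_mask = (1 << bits) - 1
    mask = lane_mask if cond else 0  # the condition splatted into every lane
    inverted = ~mask & lane_mask     # bitwise NOT, clamped to the lane width
    return [(t & mask) | (f & inverted) for t, f in zip(true_vals, false_vals)]


picked_true = expand_select(True, [1, 2, 3], [7, 8, 9])
picked_false = expand_select(False, [1, 2, 3], [7, 8, 9])
```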

3 years ago[mlir] Fix compile error.
Adrian Kuegel [Mon, 10 May 2021 05:48:45 +0000 (07:48 +0200)]
[mlir] Fix compile error.

Inside a template, members inherited from a dependent base class need
to be called with this->.
Otherwise we get: explicit qualification required to use member
'setDebugName' from dependent base class.

3 years ago[AArch64][SVE] Remove index_vector node.
Jun Ma [Fri, 30 Apr 2021 02:30:37 +0000 (10:30 +0800)]
[AArch64][SVE] Remove index_vector node.

Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector and use step_vector for codegen directly.

Differential Revision: https://reviews.llvm.org/D101593

3 years ago[ORC] Use the new dispatchTask API to run query callbacks.
Lang Hames [Sun, 9 May 2021 18:20:54 +0000 (11:20 -0700)]
[ORC] Use the new dispatchTask API to run query callbacks.

Dispatching query callbacks, rather than running them on the current thread,
will allow them to be distributed across multiple threads.

3 years ago[ORC] Generalize materialization dispatch to task dispatch.
Lang Hames [Sun, 9 May 2021 00:45:42 +0000 (17:45 -0700)]
[ORC] Generalize materialization dispatch to task dispatch.

Generalizing this API allows work to be distributed more evenly. In particular,
query callbacks can now be dispatched (rather than running immediately on the
thread that satisfied the query). This avoids the pathological case where an
operation on one thread satisfies many queries simultaneously, causing large
amounts of work to be run on that thread while other threads potentially sit
idle.
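
The difference between running callbacks inline and dispatching them as tasks can be sketched as follows. This is only an analogy built on Python's `concurrent.futures`, not the ORC API; `run_callbacks` and `record_thread` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_callbacks(callbacks, dispatch):
    """Hand each ready callback to `dispatch` instead of calling it directly."""
    return [dispatch(callback) for callback in callbacks]


def record_thread():
    """Stand-in for a query callback: report which thread ran it."""
    return threading.get_ident()


callbacks = [record_thread] * 8

# Inline "dispatch": every callback runs on the thread that satisfied the query.
inline_threads = run_callbacks(callbacks, lambda callback: callback())

# Task dispatch: callbacks become tasks that worker threads pick up.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = run_callbacks(callbacks, pool.submit)
    pooled_threads = [future.result() for future in futures]
```

With the inline dispatcher, all eight callbacks run on the calling thread. With the pool, none of them do, so the calling thread is free to continue.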

3 years ago[SimplifyCFG] Ignore ephemeral values when counting insts for threading
Teresa Johnson [Wed, 28 Apr 2021 22:20:04 +0000 (15:20 -0700)]
[SimplifyCFG] Ignore ephemeral values when counting insts for threading

Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimization decisions before FDO
matching, leading to some mismatches.

Differential Revision: https://reviews.llvm.org/D101494
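
To make the counting rule concrete, here is a simplified model, assuming a block is a list of `(name, opcode, operands)` tuples in program order. A value is treated as ephemeral if it is an assume or if all of its users are ephemeral. Both the representation and the function name are hypothetical, and the side-effect checks a real implementation would need are omitted.

```python
def count_for_threading(block):
    """Count instructions in a block, skipping phis and ephemeral values."""
    users = {name: [] for name, _, _ in block}
    for name, _, operands in block:
        for operand in operands:
            if operand in users:
                users[operand].append(name)

    ephemeral = {name for name, opcode, _ in block if opcode == "assume"}
    # Walk backwards so that users are classified before their operands.
    for name, _, _ in reversed(block):
        if name in ephemeral:
            continue
        if users[name] and all(user in ephemeral for user in users[name]):
            ephemeral.add(name)

    return sum(
        1 for name, opcode, _ in block
        if opcode != "phi" and name not in ephemeral
    )


example_block = [
    ("p", "phi", []),
    ("t", "type_test", ["p"]),  # only feeds the assume below
    ("a", "assume", ["t"]),
    ("x", "add", ["p", "p"]),
    ("r", "ret", ["x"]),
]
size = count_for_threading(example_block)
```

In this example only `x` and `r` count toward the size; the phi, the type test, and the assume are ignored.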

3 years ago[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.
Yuanfang Chen [Mon, 10 May 2021 02:04:07 +0000 (19:04 -0700)]
[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.

3 years ago[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter.
Zakk Chen [Wed, 5 May 2021 07:53:41 +0000 (15:53 +0800)]
[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter.

RISCVAsmPrinter already has MCSubtargetInfo.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101889

3 years agoSupport NativeCodeCall binding in rewrite pattern.
Chia-hung Duan [Fri, 16 Apr 2021 05:34:10 +0000 (13:34 +0800)]
Support NativeCodeCall binding in rewrite pattern.

Allow binding the result of a native function call while rewriting a
pattern. When matching a pattern, values can be returned by passing
parameters as return-value placeholders. Also add the semantics of
'$_self' in NativeCodeCall while matching: it refers to the operation
that defines a given operand.

Differential Revision: https://reviews.llvm.org/D100746

3 years ago[lld-macho] Add llvm-otool as a test dependency
Jez Ng [Mon, 10 May 2021 01:11:29 +0000 (21:11 -0400)]
[lld-macho] Add llvm-otool as a test dependency

This unbreaks my local build, which is configured to build only parts of
LLVM.

3 years ago[lld/mac] Fix alignment on subsections
Nico Weber [Sun, 9 May 2021 22:35:16 +0000 (18:35 -0400)]
[lld/mac] Fix alignment on subsections

On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.

Fixes PR50274. (The same bug could have happened with -order_file
previously.)

Differential Revision: https://reviews.llvm.org/D102139
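
The expected behaviour can be illustrated with a small calculation, assuming a subsection's alignment is the smaller of the section alignment and the largest power of two dividing its offset within the section. This is an illustration of the property, not lld's code, and `subsection_alignment` is a hypothetical name.

```python
def subsection_alignment(section_align, offset):
    """Alignment a subsection at `offset` can claim inside its section."""
    if offset == 0:
        return section_align
    offset_align = offset & -offset  # largest power of two dividing offset
    return min(section_align, offset_align)


at_16_byte_boundary = subsection_alignment(16, 32)
at_8_byte_boundary = subsection_alignment(16, 8)
```

A subsection starting at offset 32 of a 16-byte-aligned section keeps 16-byte alignment, while one starting at offset 8 can only be assumed 8-byte aligned.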

3 years ago[lld-macho] Don't reference entry symbol for non-executables
Jez Ng [Mon, 10 May 2021 00:05:45 +0000 (20:05 -0400)]
[lld-macho] Don't reference entry symbol for non-executables

This would cause us to pull in symbols (and code) that should
be unused.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102137

3 years ago[Demangle][Rust] Print special namespaces
Tomasz Miąsko [Sun, 9 May 2021 20:38:13 +0000 (13:38 -0700)]
[Demangle][Rust] Print special namespaces

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101821

3 years ago[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction
Roman Lebedev [Sun, 9 May 2021 20:45:44 +0000 (23:45 +0300)]
[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction

As measured by exegesis, and confirmed by ref docs.

3 years ago[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking
Roman Lebedev [Sun, 9 May 2021 20:28:08 +0000 (23:28 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking

3 years ago[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction
Roman Lebedev [Sun, 9 May 2021 20:14:17 +0000 (23:14 +0300)]
[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction

As confirmed by exegesis measurements, and ref docs.
It does actually execute.

While there, bump the latency for MULX32rr, which seems to match measurements.

3 years ago[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking
Roman Lebedev [Sun, 9 May 2021 20:14:12 +0000 (23:14 +0300)]
[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking

3 years ago[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms
Roman Lebedev [Sun, 9 May 2021 19:43:30 +0000 (22:43 +0300)]
[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms

As measured by exegesis and confirmed in reference docs.

3 years ago[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests
Roman Lebedev [Sun, 9 May 2021 19:27:16 +0000 (22:27 +0300)]
[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests

3 years ago[ARM] Fix postinc of vst1xN
David Green [Sun, 9 May 2021 20:57:55 +0000 (21:57 +0100)]
[ARM] Fix postinc of vst1xN

These nodes are not handled correctly by CombineBaseUpdate. For the
moment, similar to 5f1cad4d296a20025f0b mark them as unsupported.

3 years ago[SCEV] Handle and/or in applyLoopGuards()
Nikita Popov [Sat, 1 May 2021 14:59:06 +0000 (16:59 +0200)]
[SCEV] Handle and/or in applyLoopGuards()

applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.

Differential Revision: https://reviews.llvm.org/D101692
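
The recursion can be sketched on a toy condition tree, assuming conditions are either strings (atomic comparisons) or `(op, lhs, rhs)` tuples with `op` being "and" or "or". On the true edge of an `and`, both operands are known true; on the false edge of an `or`, both are known false. The representation and the name `collect_known` are hypothetical.

```python
def collect_known(cond, taken_true):
    """Return (atom, value) facts that hold on the given edge of a branch."""
    if isinstance(cond, tuple):
        op, lhs, rhs = cond
        if op == "and" and taken_true:
            return collect_known(lhs, True) + collect_known(rhs, True)
        if op == "or" and not taken_true:
            return collect_known(lhs, False) + collect_known(rhs, False)
        return []  # nothing is known about the individual operands
    return [(cond, taken_true)]


and_facts = collect_known(("and", "x > 0", ("and", "y < n", "z != 0")), True)
or_facts = collect_known(("or", "a", "b"), False)
no_facts = collect_known(("or", "a", "b"), True)
```

The true edge of an `or` (or the false edge of an `and`) yields no facts, because only one operand needs to hold and it is not known which.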

3 years ago[SCEV] Add additional loop guard and/or tests (NFC)
Nikita Popov [Sun, 9 May 2021 19:21:54 +0000 (21:21 +0200)]
[SCEV] Add additional loop guard and/or tests (NFC)

Add tests for and/and, and/or, or/or, or/and combinations.

3 years ago[NFC][X86] Znver3: drop obsolete fixme
Roman Lebedev [Sun, 9 May 2021 17:37:30 +0000 (20:37 +0300)]
[NFC][X86] Znver3: drop obsolete fixme

3 years ago[X86] AMD Zen 3: XCHG is a zero-cycle instruction
Roman Lebedev [Sun, 9 May 2021 14:32:37 +0000 (17:32 +0300)]
[X86] AMD Zen 3: XCHG is a zero-cycle instruction

As measured by exegesis and confirmed by reference docs.

3 years ago[SelectionDAG] Regenerate test checks (NFC)
LemonBoy [Sun, 9 May 2021 16:51:05 +0000 (18:51 +0200)]
[SelectionDAG] Regenerate test checks (NFC)

3 years ago[SROA] Regenerate test checks (NFC)
Nikita Popov [Sun, 9 May 2021 16:20:37 +0000 (18:20 +0200)]
[SROA] Regenerate test checks (NFC)

3 years ago[libc++][doc] Update the Format library status.
Mark de Wever [Sun, 9 May 2021 15:55:50 +0000 (17:55 +0200)]
[libc++][doc] Update the Format library status.

- Move LWG-3218 to the chrono section.
- Mark several parts as 'In progress'.

3 years ago[lld-macho][NFC] Purge stale test-output trees prior to split-file
Greg McGary [Sat, 8 May 2021 18:42:15 +0000 (11:42 -0700)]
[lld-macho][NFC] Purge stale test-output trees prior to split-file

Enforce standard practice

Differential Revision: https://reviews.llvm.org/D102112

3 years ago[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on stero...
Roman Lebedev [Sat, 8 May 2021 21:57:59 +0000 (00:57 +0300)]
[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on steroids" idiom

3 years ago[NFCI][X86] Mark Znver3 scheduling model as complete
Roman Lebedev [Sat, 8 May 2021 17:42:14 +0000 (20:42 +0300)]
[NFCI][X86] Mark Znver3 scheduling model as complete

To the best of my knowledge, all instructions are modelled
and have reasonable values assigned to them; flipping the switch
doesn't cause any diff for MCA tests, so either we're good,
or we have test coverage gaps.

I'm not really sure why no other X86 sched model is marked as complete.

3 years ago[NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes
Roman Lebedev [Sat, 8 May 2021 17:39:26 +0000 (20:39 +0300)]
[NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes