platform/upstream/llvm.git
15 months ago[RuntimeDyld][ELF] Fix off-by-1 issues in R_AARCH64_ABS{16,32} overflow checks
Fangrui Song [Wed, 5 Apr 2023 13:52:54 +0000 (06:52 -0700)]
[RuntimeDyld][ELF] Fix off-by-1 issues in R_AARCH64_ABS{16,32} overflow checks
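
For context, a minimal sketch of the kind of range check involved. It assumes the bounds given in the AArch64 ELF ABI, where an ABS{16,32} field may hold either a signed or an unsigned value, so the legal range is -2^(N-1) <= X < 2^N. The helper names are hypothetical and this is not the RuntimeDyld code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the RuntimeDyld implementation.
// Assumed rule: an N-bit ABS field accepts -2^(N-1) <= Value < 2^N.
bool fitsAbs(int64_t Value, unsigned Bits) {
  assert(Bits > 0 && Bits < 63 && "shift must stay within int64_t");
  const int64_t Lo = -(INT64_C(1) << (Bits - 1));
  const int64_t Hi = INT64_C(1) << Bits; // exclusive upper bound
  return Value >= Lo && Value < Hi;
}

bool fitsAbs32(int64_t Value) { return fitsAbs(Value, 32); }
bool fitsAbs16(int64_t Value) { return fitsAbs(Value, 16); }
```

An off-by-1 in such a check typically shows up exactly at the boundaries, for example by rejecting the largest unsigned value or accepting 2^N.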

15 months ago[AArch64] Add cost-model tests for fshr.
Florian Hahn [Wed, 5 Apr 2023 13:49:12 +0000 (14:49 +0100)]
[AArch64] Add cost-model tests for fshr.

15 months ago[RISCV][NFC] Use RISCVSubtarget method for predicate in RISCVFeatures.td when available
Alex Bradbury [Wed, 5 Apr 2023 13:46:44 +0000 (14:46 +0100)]
[RISCV][NFC] Use RISCVSubtarget method for predicate in RISCVFeatures.td when available

As RISCVSubtarget defines hasStdExtZfhOrZfhmin() and hasStdExtCOrZca(),
just use these for the matching Predicate definitions rather than
repeating the logic.

15 months ago[mlir] Fix a use after free when loading dependent dialects
Benjamin Kramer [Wed, 5 Apr 2023 13:35:02 +0000 (15:35 +0200)]
[mlir] Fix a use after free when loading dependent dialects

The way dependent dialects are implemented is by recursively calling
loadDialect in the constructor. This means we have to reload from the
dialect table because the constructor might have rehashed that table.

The steps for loading a dialect are
  1. Insert a nullptr into loadedDialects. This indicates the dialect is
     loading
  2. Call ctor(). This recursively loads dependent dialects
  3. Insert the new dialect into the table.

We had a conflict between steps 2 and 3 here. You have to be extremely
unlucky though as rehashing is rare and operator[] does no generation
checking on DenseMap. Changing that to an iterator would've uncovered
this issue immediately.
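
The three steps above can be sketched in plain C++. This is only an illustration under stated assumptions: `Context`, `Dialect`, and `load` are hypothetical stand-ins, and a `std::vector` plays the role of the DenseMap because growing it can move every slot, much like a rehash.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not MLIR's MLIRContext or Dialect classes.
struct Dialect {
  std::string Name;
};

class Context {
  // Flat table: growing it may move every slot, much like a rehash.
  std::vector<std::pair<std::string, std::unique_ptr<Dialect>>> Table;

  // Returns the slot index for Name, or -1 if absent.
  int find(const std::string &Name) const {
    for (std::size_t I = 0; I < Table.size(); ++I)
      if (Table[I].first == Name)
        return static_cast<int>(I);
    return -1;
  }

public:
  Dialect *load(const std::string &Name,
                const std::vector<std::string> &Deps) {
    int Existing = find(Name);
    if (Existing >= 0)
      return Table[Existing].second.get(); // loaded, or nullptr while loading

    // Step 1: insert a null placeholder to mark "loading".
    Table.emplace_back(Name, nullptr);

    // Step 2: the "constructor" loads dependencies, which may grow the table
    // and invalidate any reference taken to the placeholder slot.
    for (const std::string &Dep : Deps)
      load(Dep, {});
    auto Loaded = std::make_unique<Dialect>(Dialect{Name});

    // Step 3: look the slot up again instead of reusing a stale reference.
    int Slot = find(Name);
    assert(Slot >= 0 && "placeholder must still be present");
    Table[Slot].second = std::move(Loaded);
    return Table[Slot].second.get();
  }

  std::size_t size() const { return Table.size(); }
};
```

The design point is the second lookup in step 3: holding a reference to the slot across step 2 would be the use after free described above.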

15 months ago[mlir] update Bazel build for 1ef51e0452a473f404edc635412685fce6f61004
Alex Zinenko [Wed, 5 Apr 2023 13:42:02 +0000 (15:42 +0200)]
[mlir] update Bazel build for 1ef51e0452a473f404edc635412685fce6f61004

15 months agoRevert "[dsymutil][NFC] Move ARM specific test into the ARM directory."
Alexey Lapshin [Wed, 5 Apr 2023 13:40:02 +0000 (15:40 +0200)]
Revert "[dsymutil][NFC] Move ARM specific test into the ARM directory."

This reverts commit adeb1fa7a34d097825f71dfdfe5c62a242353bb9.

15 months ago[VPlan] Replace check for replicate regions with assert (NFCI).
Florian Hahn [Wed, 5 Apr 2023 13:29:24 +0000 (14:29 +0100)]
[VPlan] Replace check for replicate regions with assert (NFCI).

After recent changes, replication regions only get introduced later, so
there's no need to check for them.

15 months ago[X86][mem-fold] Remove definition of NotMemoryFoldable and move code into a def file...
Shengchen Kan [Wed, 5 Apr 2023 13:23:12 +0000 (21:23 +0800)]
[X86][mem-fold] Remove definition of NotMemoryFoldable and move code into a def file, NFCI

The goal is to centralize the logic of the memory fold.

15 months ago[InstCombine] Convert tests to opaque pointers (NFC)
Nikita Popov [Wed, 5 Apr 2023 13:17:15 +0000 (15:17 +0200)]
[InstCombine] Convert tests to opaque pointers (NFC)

The two debuginfo tests go away because the relevant transforms
no longer occur in this form, e.g. the "cast of alloca" transform
just doesn't exist with opaque pointers.

15 months ago[InstCombine] Regenerate test checks (NFC)
Nikita Popov [Wed, 5 Apr 2023 13:14:03 +0000 (15:14 +0200)]
[InstCombine] Regenerate test checks (NFC)

15 months ago[InstCombine] Name instructions in test (NFC)
Nikita Popov [Wed, 5 Apr 2023 13:07:26 +0000 (15:07 +0200)]
[InstCombine] Name instructions in test (NFC)

15 months ago[InstCombine] Regenerate test checks (NFC)
Nikita Popov [Wed, 5 Apr 2023 13:05:16 +0000 (15:05 +0200)]
[InstCombine] Regenerate test checks (NFC)

15 months ago[InstCombine] Use CreateGEP() API (NFC)
Nikita Popov [Wed, 5 Apr 2023 13:02:02 +0000 (15:02 +0200)]
[InstCombine] Use CreateGEP() API (NFC)

Use the IRBuilder API that accepts inbounds as a boolean parameter,
rather than using a ternary.

15 months ago[mlir][Analysis] Introduce LoopInfo in mlir
Christian Ulmann [Wed, 5 Apr 2023 12:55:22 +0000 (12:55 +0000)]
[mlir][Analysis] Introduce LoopInfo in mlir

This commit introduces an instantiation of LLVM's LoopInfo for CFGs in
MLIR. To test the LoopInfo, a test pass is added that checks the analysis
results for a set of CFGs.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147323

15 months ago[X86] combinePredicateReduction - reuse LowerVectorAllEqual for all_of/any_of(vXi1...
Simon Pilgrim [Wed, 5 Apr 2023 12:42:02 +0000 (13:42 +0100)]
[X86] combinePredicateReduction - reuse LowerVectorAllEqual for all_of/any_of(vXi1 eq/ne) reductions

15 months ago[ArgPromotion] Require noundef to copy poison-generating metadata
Nikita Popov [Wed, 5 Apr 2023 12:33:11 +0000 (14:33 +0200)]
[ArgPromotion] Require noundef to copy poison-generating metadata

For poison-generating (rather than IUB) metadata, only copy it
from the dominating must-exec load if it is combined with !noundef.
This could be further extended by additionally intersecting the
metadata from all loads, which does not require !noundef.

15 months ago[X86] LowerVectorAllEqual - split ALLOF(CMPEQ(X,Y)) -> AND(CMPEQ(X[0],Y[0]),CMPEQ...
Simon Pilgrim [Wed, 5 Apr 2023 11:28:26 +0000 (12:28 +0100)]
[X86] LowerVectorAllEqual - split ALLOF(CMPEQ(X,Y)) -> AND(CMPEQ(X[0],Y[0]),CMPEQ(X[1],Y[1]),....) on MOVMSK codegen

Fix a minor regression on pre-PTEST targets: since these are always 128 bits, we're better off reducing the comparison results (assuming we're not comparing against 0/-1).

15 months ago[GlobalISel] Improve stack slot tracking in dbg.values
Felipe de Azevedo Piovezan [Tue, 4 Apr 2023 13:35:23 +0000 (09:35 -0400)]
[GlobalISel] Improve stack slot tracking in dbg.values

For IR like:

```
%alloca = alloca ...
dbg.value(%alloca, !myvar, OP_deref(<other_ops>))
```

GlobalISel lowers it to MIR:

```
%some_reg = G_FRAME_INDEX <stack_slot>
DBG_VALUE %some_reg, !myvar, OP_deref(<other_ops>)
```

In other words, if the value of `!myvar` can be obtained by
dereferencing an alloca, in MIR we say that the _location_ of a variable
is obtained by dereferencing register %some_reg (plus some
`<other_ops>`).

We can instead remove the use of `%some_reg`: the location of `!myvar`
_is_ `<stack_slot>` (plus some `<other_ops>`). This patch implements
this transformation, which improves debug information handling in O0, as
these registers hardly ever survive register allocation.

A note about testing: similar to what was done in D76934
(f24e2e9eebde4b7a1d), this patch exposed a bug in the Builder class when
using `-debug`, where we tried to print an incomplete instruction. The
changes in `MachineIRBuilder.cpp` address that.

Differential Revision: https://reviews.llvm.org/D147536

15 months ago[X86] Convert tests to opaque pointers (NFC)
Nikita Popov [Wed, 5 Apr 2023 12:09:00 +0000 (14:09 +0200)]
[X86] Convert tests to opaque pointers (NFC)

15 months ago[X86] Name instructions in test (NFC)
Nikita Popov [Wed, 5 Apr 2023 12:07:52 +0000 (14:07 +0200)]
[X86] Name instructions in test (NFC)

15 months ago[X86] Convert some tests to opaque pointers (NFC)
Nikita Popov [Wed, 5 Apr 2023 11:02:19 +0000 (13:02 +0200)]
[X86] Convert some tests to opaque pointers (NFC)

15 months ago[lldb] Remove unused private field 'm_orig_rax_info' in RegisterContextLinux_x86_64...
Jie Fu [Wed, 5 Apr 2023 11:54:23 +0000 (19:54 +0800)]
[lldb] Remove unused private field 'm_orig_rax_info' in RegisterContextLinux_x86_64.h (NFC)

/data/llvm-project/lldb/source/Plugins/Process/Utility/RegisterContextLinux_x86_64.h:32:30: error: private field 'm_orig_rax_info' is not used [-Werror,-Wunused-private-field]
  lldb_private::RegisterInfo m_orig_rax_info;
                             ^
1 error generated.

15 months ago[lld-macho][nfc] std::find_if -> llvm::find_if
Jez Ng [Wed, 5 Apr 2023 05:52:14 +0000 (01:52 -0400)]
[lld-macho][nfc] std::find_if -> llvm::find_if

15 months ago[lld-macho][nfc] Clean up a bunch of clang-tidy issues
Jez Ng [Wed, 5 Apr 2023 05:48:34 +0000 (01:48 -0400)]
[lld-macho][nfc] Clean up a bunch of clang-tidy issues

15 months ago[MLIR] Clarify (test-scf-)parallel-loop-collapsing
Tres Popp [Tue, 4 Apr 2023 09:36:30 +0000 (11:36 +0200)]
[MLIR] Clarify (test-scf-)parallel-loop-collapsing

1. parallel-loop-collapsing is renamed to test-scf-parallel-loop-collapsing.
2. The pass adds various checks to provide error messages instead of
   hitting assert failures.
3. Testing is added to verify these error messages

This is roughly an NFC. The name changes, but all checked behavior
previously would have resulted in an assertion failure. Almost no new
support is added, so this pass is still limited in scope to testing the
transform behaves correctly with input arguments that perfectly match
the ParallelLoop's iterator arg set. The one new piece of functionality
is that invalid operations will now be skipped with an error message
instead of producing an assertion failure, so the pass can be used with
expected failures for pieces of the IR not cared about with a specific
RUN command.

Differential Revision: https://reviews.llvm.org/D147514

15 months ago[lldb] Detach the child process when stepping over a fork
Pavel Labath [Thu, 12 Jan 2023 13:04:57 +0000 (14:04 +0100)]
[lldb] Detach the child process when stepping over a fork

Step over thread plans were claiming to explain the fork stop reasons,
which prevented the default fork logic (detaching from the child
process) from kicking in. This patch changes that.

Differential Revision: https://reviews.llvm.org/D141605

15 months ago[lldb] Drop RegisterInfoInterface::GetDynamicRegisterInfo
Pavel Labath [Tue, 28 Mar 2023 12:50:16 +0000 (14:50 +0200)]
[lldb] Drop RegisterInfoInterface::GetDynamicRegisterInfo

"Dynamic register info" is a very overloaded term, and this particular
instance of it was only used for passing the information about the
"orig_[re]ax" pseudo-register on x86 through some generic code. Since
both sides of the code are x86-specific, I have replaced this with a
more direct route.

Differential Revision: https://reviews.llvm.org/D147045

15 months ago[AArch64][GlobalISel] Add support for some across-vector NEON intrinsics
Vladislav Dzhidzhoev [Tue, 28 Mar 2023 13:57:35 +0000 (15:57 +0200)]
[AArch64][GlobalISel] Add support for some across-vector NEON intrinsics

Support uaddv, saddv, umaxv, smaxv, uminv, sminv, fmaxv, fminv,
fmaxnmv, fminnmv intrinsics in GlobalISel.

GlobalISelEmitter couldn't import SelectionDAG patterns containing nodes
with 8-bit result type, since they had untyped values. Therefore,
register type for FPR8 is set to i8 to eliminate untyped nodes in these
patterns.

Differential Revision: https://reviews.llvm.org/D146531

15 months ago[NFC][InstCombine] Add tests that show bogus combine of SVE intrinsics when using...
Paul Walker [Tue, 4 Apr 2023 12:49:44 +0000 (12:49 +0000)]
[NFC][InstCombine] Add tests that show bogus combine of SVE intrinsics when using strictfp.

15 months ago[ARM] Fold fadd of vcmul into vcmla
David Green [Wed, 5 Apr 2023 10:52:05 +0000 (11:52 +0100)]
[ARM] Fold fadd of vcmul into vcmla

This adds an extra tablegen combine for folding fadd(a, vcmul(b, c)) into
vcmla(a, b, c), so long as the fadd is allowed to contract.

Differential Revision: https://reviews.llvm.org/D147201

15 months agoUpdate mentions of reduction intrinsics; NFC
Sven van Haastregt [Wed, 5 Apr 2023 10:49:41 +0000 (11:49 +0100)]
Update mentions of reduction intrinsics; NFC

The intrinsics have been out of experimental since 322d0afd875d
("[llvm][mlir] Promote the experimental reduction intrinsics to be
first class intrinsics.", 2020-10-07); update some places that still
referred to them as experimental.

15 months ago[clang-tidy] Fix init-list handling in readability-implicit-bool-conversion
Piotr Zegar [Wed, 5 Apr 2023 09:44:36 +0000 (09:44 +0000)]
[clang-tidy] Fix init-list handling in readability-implicit-bool-conversion

Adds support for explicit casts using initListExpr,
for example: int{boolValue} constructions.
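
A small sketch of the two forms involved, assuming the check's default options; the function names are made up for illustration.

```cpp
#include <cassert>

// Implicit bool -> int conversion: the kind of code the check reports
// (assumption: default check options).
int implicitConversion(bool BoolValue) {
  int Result = BoolValue;
  return Result;
}

// Explicit cast spelled with an initializer list: the form this patch
// teaches the check to treat as an intentional conversion.
int explicitInitListCast(bool BoolValue) {
  int Result = int{BoolValue};
  assert(Result == 0 || Result == 1);
  return Result;
}
```

Both functions return the same values; only the spelling of the conversion differs.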

Fixes: #47000

Reviewed By: ccotter

Differential Revision: https://reviews.llvm.org/D147551

15 months ago[dsymutil][NFC] Move ARM specific test into the ARM directory.
Alexey Lapshin [Wed, 5 Apr 2023 10:01:10 +0000 (12:01 +0200)]
[dsymutil][NFC] Move ARM specific test into the ARM directory.

This patch moves fat-header.test -> ARM/fat-header.test

15 months ago[Assignment Tracking][SROA] Handle createFragmentExpression failure
OCHyams [Wed, 5 Apr 2023 10:00:26 +0000 (11:00 +0100)]
[Assignment Tracking][SROA] Handle createFragmentExpression failure

createFragmentExpression will fail if it determines that the expression cannot
be split over fragments. Handle this case in SROA. Similarly to D147312 this
should be a rare occurrence as the `dbg.assign` will usually reference the
`Value` being stored without modifying it with a `DIExpression`.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147431

15 months ago[LV] Use available masked vector function variants when required
Graham Hunter [Fri, 16 Sep 2022 14:23:18 +0000 (15:23 +0100)]
[LV] Use available masked vector function variants when required

LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization. Now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251

15 months ago[AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming...
Dinar Temirbulatov [Wed, 5 Apr 2023 10:10:55 +0000 (10:10 +0000)]
[AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming-compatible-sve flag.

The compiler hits an infinite loop in DAGCombine. In
force-streaming-compatible-sve mode we have custom lowering for 128-bit vector
splats, and later in DAGCombiner::SimplifyVCastOp() we scalarized the SPLAT
because we have custom lowering for SME. Later still, we restored the SPLAT
operation via performMulCombine().

15 months ago[Support] Improve Windows performance of buffered raw_ostream
Andrew Ng [Fri, 31 Mar 2023 16:50:22 +0000 (17:50 +0100)]
[Support] Improve Windows performance of buffered raw_ostream

The "preferred" buffer size for raw_ostream is set to BUFSIZ which on
Windows is only 512. This results in more calls to write and this
overhead can have a significant negative impact on performance,
especially when Anti-Virus is also involved.

Therefore increase the "preferred" buffer size to 16KB for Windows.
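
To see why the buffer size matters, here is a simplified model that counts how often a full buffer has to be flushed. `FlushCounter` and `countFlushes` are hypothetical; the real raw_ostream has more behavior than this sketch models.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of a buffered stream that counts flushes;
// not llvm::raw_ostream.
class FlushCounter {
  std::size_t Capacity;
  std::size_t Used = 0;
  std::size_t Flushes = 0;

public:
  explicit FlushCounter(std::size_t Capacity) : Capacity(Capacity) {
    assert(Capacity > 0 && "buffer must hold at least one byte");
  }

  // Append Size bytes, flushing each time the buffer fills up.
  void write(std::size_t Size) {
    while (Size > 0) {
      std::size_t Take = std::min(Size, Capacity - Used);
      Used += Take;
      Size -= Take;
      if (Used == Capacity) {
        ++Flushes; // stands in for one call to the underlying write
        Used = 0;
      }
    }
  }

  std::size_t flushes() const { return Flushes; }
};

// Count the flushes needed to emit Total bytes in Piece-sized writes.
// A final partial buffer is not counted.
std::size_t countFlushes(std::size_t BufferSize, std::size_t Total,
                         std::size_t Piece) {
  assert(Piece > 0 && "each write must make progress");
  FlushCounter Counter(BufferSize);
  for (std::size_t Done = 0; Done < Total; Done += Piece)
    Counter.write(std::min(Piece, Total - Done));
  return Counter.flushes();
}
```

In this model, 1 MiB written in 64-byte pieces flushes 2048 times with a 512-byte buffer and 64 times with a 16 KiB buffer.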

One example of where this helps is the LLD --Map option which dumps out
the symbol map for a link. In a link of UE4, this change has been seen
to improve the performance of the symbol map writing by more than a
factor of 6.

Differential Revision: https://reviews.llvm.org/D147340

15 months ago[X86][mem-fold][NFC] Refine code
Shengchen Kan [Wed, 5 Apr 2023 09:42:12 +0000 (17:42 +0800)]
[X86][mem-fold][NFC] Refine code

1. Use `unsigned` for `KeyOp` and `DstOp` b/c `Opcode` is of type `unsigned`.
2. Align the comparator used in X86FoldTablesEmitter.cpp with the one in
   CodeGenTarget::ComputeInstrsByEnum.

15 months ago[gn build] Port 628f11f78d33
LLVM GN Syncbot [Wed, 5 Apr 2023 09:49:30 +0000 (09:49 +0000)]
[gn build] Port 628f11f78d33

15 months ago[DWARFLinkerParallel] Add StringTable class.
Alexey Lapshin [Tue, 4 Apr 2023 12:23:34 +0000 (14:23 +0200)]
[DWARFLinkerParallel] Add StringTable class.

This patch adds StringTable class which is used to prepare
strings for emission into the .debug_str table. Specifically,
this class translates strings if necessary, keeps them in order,
assigns index and offset.
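
A rough sketch of the index and offset bookkeeping such a table needs. `SimpleStringTable` is hypothetical and is not the class added by this patch; deduplication and NUL-terminated layout are assumptions, and string translation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch; not the DWARFLinkerParallel StringTable class.
struct StringEntry {
  std::size_t Index;    // position in insertion order
  std::uint64_t Offset; // byte offset in the emitted table
};

class SimpleStringTable {
  std::vector<std::string> Strings; // kept in insertion order
  std::unordered_map<std::string, StringEntry> Entries;
  std::uint64_t NextOffset = 0;

public:
  // Return the existing entry for Str, or assign the next index and offset.
  StringEntry add(const std::string &Str) {
    auto It = Entries.find(Str);
    if (It != Entries.end())
      return It->second;
    StringEntry Entry{Strings.size(), NextOffset};
    Strings.push_back(Str);
    Entries.emplace(Str, Entry);
    NextOffset += Str.size() + 1; // +1 for the terminating NUL
    return Entry;
  }

  // Concatenate all strings, each followed by its NUL terminator.
  std::string emit() const {
    std::string Bytes;
    for (const std::string &Str : Strings) {
      Bytes += Str;
      Bytes.push_back('\0');
    }
    assert(Bytes.size() == NextOffset && "offsets must match emitted bytes");
    return Bytes;
  }
};
```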

Differential Revision: https://reviews.llvm.org/D147529

15 months ago[AMDGPU] Add machine verifier to a test
Jay Foad [Wed, 5 Apr 2023 09:36:32 +0000 (10:36 +0100)]
[AMDGPU] Add machine verifier to a test

15 months ago[ARM] Combine fadd into fcmla
David Green [Wed, 5 Apr 2023 09:31:19 +0000 (10:31 +0100)]
[ARM] Combine fadd into fcmla

This is the MVE equivalent of https://reviews.llvm.org/D146407. It adds a
target combine for fadd(a, vcmla(b, c, d)) -> vcmla(fadd(a, b), c, d), pushing
the fadd into the operands of the fcmla, which can help simplify away some
additions.

Differential Revision: https://reviews.llvm.org/D147200

15 months ago[clang] Fix crash when handling nested immediate invocations
Mariya Podchishchaeva [Wed, 5 Apr 2023 09:01:42 +0000 (05:01 -0400)]
[clang] Fix crash when handling nested immediate invocations

Before this patch it was expected that if there were several immediate
invocations, they all belonged to the same expression evaluation context.
During parsing of a non-local variable initializer a new evaluation context is
pushed, so code like this
```
namespace scope {
struct channel {
    consteval channel(const char* name) noexcept { }
};
consteval const char* make_channel_name(const char* name) { return name;}

channel rsx_log(make_channel_name("rsx_log"));
}
```
produced a nested immediate invocation whose subexpressions are attached
to different expression evaluation contexts. The constructor call
belongs to TU context and `make_channel_name` call to context of
variable initializer.

This patch removes this assumption and adds tracking of previously
failed immediate invocations, so when handling an immediate invocation
it is possible to check whether its subexpressions, possibly from another
evaluation context, contain errors, and to avoid producing duplicate
diagnostics.

Fixes https://github.com/llvm/llvm-project/issues/58207

Reviewed By: aaron.ballman, shafik

Differential Revision: https://reviews.llvm.org/D146234

15 months ago[LICM] Don't require optimized uses
Nikita Popov [Fri, 24 Mar 2023 15:35:02 +0000 (16:35 +0100)]
[LICM] Don't require optimized uses

LICM currently requests optimized use MSSA form. This is wasteful,
because LICM doesn't actually care about most uses, only those of
invariant pointers in loops. Everything else doesn't need to be
optimized.

LICM already uses the clobber walker in most places. This patch
adjusts one place that was using getDefiningAccess() to use it as
well, so we no longer have a dependence on pre-optimized uses.

This change is not NFC in that the fallback on the defining access
when there are too many clobber calls may now fall back to an
unoptimized use. In practice, I've not seen any problems with this
though. If desired, we could also increase licm-mssa-optimization-cap
to a higher value (increasing this from 100 to 200 has no impact on
average compile-time -- but also doesn't appear to have any impact
on LICM quality either).

This makes for a 0.9% geomean compile-time improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D147437

15 months ago[AArch64][DAGCombiner]: combine <2xi64> add/sub.
Hassnaa Hamdi [Thu, 30 Mar 2023 14:41:58 +0000 (14:41 +0000)]
[AArch64][DAGCombiner]: combine <2xi64> add/sub.

64-bit vector mul is not supported in NEON,
so we use SVE's mul.
To improve performance, we can go one step further
and use SVE's add/sub, so that we can use SVE's mla/mls.
This works on these patterns:
  add v1, (mul v2, v3)
  sub v1, (mul v2, v3)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D147236

15 months ago[WebAssembly] Fix selection of global calls
Heejin Ahn [Wed, 29 Mar 2023 19:40:30 +0000 (12:40 -0700)]
[WebAssembly] Fix selection of global calls

When selecting calls, currently we unconditionally remove `Wrapper`s of
the call target. But we are supposed to do that only when the target is
a function, an external symbol (= library function), or an alias of a
function. Otherwise we end up directly calling globals that are not
functions.

Fixes https://github.com/llvm/llvm-project/issues/60003.

Reviewed By: tlively, HerrCai0907

Differential Revision: https://reviews.llvm.org/D147397

15 months ago[WebAssembly] Move call_indirect_alloca to call.ll
Heejin Ahn [Sun, 2 Apr 2023 03:08:42 +0000 (20:08 -0700)]
[WebAssembly] Move call_indirect_alloca to call.ll

Not sure about the distinction between `call.ll` and `call-indirect.ll`,
because `call.ll` also seems to contain many `call_indirect` tests. Also,
before D147033 `call-indirect.ll` only contained a single test, which is
also tested with `obj2yaml`, so I guess that file was created for
testing object file functionality as well.

We can probably merge these two someday. But anyway, this moves
`call_indirect_alloca` I added in D147033 to `call.ll`, given that that
file contains more `call_indirect` tests and I'm planning to add more
`call_indirect` tests in a followup CL.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D147396

15 months ago[Assignment Tracking] Ignore zero-sized fragments
OCHyams [Wed, 5 Apr 2023 08:28:15 +0000 (09:28 +0100)]
[Assignment Tracking] Ignore zero-sized fragments

Such dbg.assigns will occur if you write zero-sized memcpys (see
https://reviews.llvm.org/D146987#4240016).

Handle this in AssignmentTrackingAnalysis (back end) rather than
AssignmentTrackingPass (declare-to-assign) in case it is possible to reproduce
this as a result of optimisations.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147435

15 months ago[PowerPC] Precommit test case for issue 61882. NFC.
Kai Luo [Wed, 5 Apr 2023 08:19:18 +0000 (16:19 +0800)]
[PowerPC] Precommit test case for issue 61882. NFC.

15 months agoRevert "[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()...
David Spickett [Wed, 5 Apr 2023 08:05:29 +0000 (08:05 +0000)]
Revert "[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`"

This reverts commit 87b5807d3802b932c06d83c4287014872aa2caab.

The test case is failing on Windows https://lab.llvm.org/buildbot/#/builders/65/builds/8950.

15 months ago[flang][hlfir] Support TYPE(*) actual argument in intrinsic procedures
Jean Perier [Wed, 5 Apr 2023 08:04:29 +0000 (10:04 +0200)]
[flang][hlfir] Support TYPE(*) actual argument in intrinsic procedures

Similar to https://reviews.llvm.org/D147487.
TYPE(*) evaluate::ActualArgument wraps a symbol instead of an
expression. This requires special handling, which is limited because
C710 restricts the intrinsics in which TYPE(*) may appear as arguments
(there is for instance no need to deal with dynamic presence aspects).

Differential Revision: https://reviews.llvm.org/D147513

15 months ago[clang-tidy] Deprecate cert-dcl21-cpp
Carlos Galvez [Tue, 4 Apr 2023 19:45:24 +0000 (19:45 +0000)]
[clang-tidy] Deprecate cert-dcl21-cpp

It is no longer part of the CERT standard. Looking at the
CERT webpage, we can see it has been moved to the Void
section:
https://wiki.sei.cmu.edu/confluence/display/cplusplus/5+The+Void

Differential Revision: https://reviews.llvm.org/D147563

15 months ago[AArch64][SVE][CodeGen] Generate fused mul+add/sub ops with one of add/sub operands...
sgokhale [Wed, 5 Apr 2023 05:41:36 +0000 (11:11 +0530)]
[AArch64][SVE][CodeGen] Generate fused mul+add/sub ops with one of add/sub operands as splat

Currently, whether to generate mul+(add/sub immediate) or mov+mla/mad/msb/mls ops
is decided by whether the add/sub instruction can synthesize the immediate directly.

If the add/sub can synthesize the immediate directly, then fused ops won't get generated. This
patch tries to address this by giving the fused ops a makeshift higher priority.

Specifically, patch aims at transformation similar to below:
add ( mul, splat_vector(C))
->
      MOV C
      MAD

Differential Revision: https://reviews.llvm.org/D142656

15 months ago[InstSimplify] Pre-land test for fp min/max optimization.
Serguei Katkov [Wed, 5 Apr 2023 05:08:58 +0000 (12:08 +0700)]
[InstSimplify] Pre-land test for fp min/max optimization.

15 months ago[dsymutil] Prevent interleaved errors and warnings
Jonas Devlieghere [Wed, 5 Apr 2023 04:49:55 +0000 (21:49 -0700)]
[dsymutil] Prevent interleaved errors and warnings

Use a mutex to protect the printing of errors and warnings and prevent
interleaving. There are two sources of parallelism in dsymutil that
could result in interleaved output: errors from different architectures
being processed in parallel and errors from the analyze and clone steps
which execute in lockstep. This patch addresses both by using a unique
mutex across all error reporting.
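
A minimal sketch of the idea in standard C++. `Reporter` and `countIntactLines` are hypothetical and only mirror the approach described here: one mutex shared by every reporting path, held while a whole line is written.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reporter; not dsymutil's DwarfLinkerForBinary.
class Reporter {
  std::mutex Mutex; // single mutex shared by all reporting paths
  std::ostringstream Out;

  void report(const char *Kind, const std::string &Message) {
    std::lock_guard<std::mutex> Guard(Mutex);
    // The whole line is written under the lock, so lines never interleave.
    Out << Kind << ": " << Message << '\n';
  }

public:
  void reportWarning(const std::string &Message) { report("warning", Message); }
  void reportError(const std::string &Message) { report("error", Message); }

  std::string str() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Out.str();
  }
};

// Emit PerThread messages from each of NumThreads threads and count the
// lines that start with an expected prefix.
int countIntactLines(int NumThreads, int PerThread) {
  assert(NumThreads > 0 && PerThread > 0);
  Reporter R;
  std::vector<std::thread> Threads;
  for (int T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&R, T, PerThread] {
      for (int I = 0; I < PerThread; ++I) {
        if (T % 2 == 0)
          R.reportWarning("arch " + std::to_string(T));
        else
          R.reportError("arch " + std::to_string(T));
      }
    });
  for (std::thread &Th : Threads)
    Th.join();

  int Intact = 0;
  std::istringstream Lines(R.str());
  for (std::string Line; std::getline(Lines, Line);)
    if (Line.rfind("warning: arch ", 0) == 0 ||
        Line.rfind("error: arch ", 0) == 0)
      ++Intact;
  return Intact;
}
```

Using one mutex for both warnings and errors, rather than one per kind, is what keeps a warning from landing in the middle of an error line.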

15 months ago[dsymutil] Unify reporting of warnings and errors
Jonas Devlieghere [Wed, 5 Apr 2023 04:48:58 +0000 (21:48 -0700)]
[dsymutil] Unify reporting of warnings and errors

Make all error reporting in DwarfLinkerForBinary go through the
`reportWarning` and `reportError` wrappers.

15 months ago[dsymutil] Make copySwiftInterfaces a member of DwarfLinkerForBinary (NFC)
Jonas Devlieghere [Wed, 5 Apr 2023 04:28:57 +0000 (21:28 -0700)]
[dsymutil] Make copySwiftInterfaces a member of DwarfLinkerForBinary (NFC)

Make copySwiftInterfaces a member of DwarfLinkerForBinary instead of a
static function.

15 months ago[mlir][tosa] Add InferTensorType interface to tosa reduce operations
Aviad Cohen [Sun, 2 Apr 2023 09:12:15 +0000 (12:12 +0300)]
[mlir][tosa] Add InferTensorType interface to tosa reduce operations

When this interface is used, a call to inferReturnTypeComponents()
is generated on creation and verification of the op.

Reviewed By: jpienaar, eric-k256

Differential Revision: https://reviews.llvm.org/D147407

15 months ago[ORC] Return bootstrap map values via reference argument.
Lang Hames [Wed, 5 Apr 2023 01:07:44 +0000 (18:07 -0700)]
[ORC] Return bootstrap map values via reference argument.

This simplifies checking of the result (it's just an Error, rather than an
optional<Expected<T>>), and allows T to be deduced rather than requiring that
it be specified.
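
The calling convention can be illustrated with a small standalone sketch. Everything here is hypothetical: `BootstrapMap` and `getBootstrapValue` are not the ORC API, and a plain `bool` stands in for llvm::Error.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper; not the ORC API. A bool stands in for llvm::Error.
using BootstrapMap = std::map<std::string, std::string>;

// Writes the parsed value to Out. T is deduced from the argument, so the
// caller never has to spell it out.
template <typename T>
bool getBootstrapValue(const BootstrapMap &Map, const std::string &Key,
                       T &Out) {
  assert(!Key.empty() && "lookup key must not be empty");
  auto It = Map.find(Key);
  if (It == Map.end())
    return false;
  std::istringstream In(It->second);
  T Parsed;
  if (!(In >> Parsed))
    return false;
  Out = Parsed; // only overwritten on success
  return true;
}
```

A caller then writes `int PageSize = 0; if (getBootstrapValue(Map, "page-size", PageSize)) ...` with a single status to check and no explicit template argument.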

15 months ago[gn build] Port 443825c517c8
Nico Weber [Wed, 5 Apr 2023 01:25:45 +0000 (21:25 -0400)]
[gn build] Port 443825c517c8

15 months ago[gn] port 4dc3bcf0124a
Nico Weber [Wed, 5 Apr 2023 01:23:29 +0000 (21:23 -0400)]
[gn] port 4dc3bcf0124a

15 months agoFix bazel overlay after "[mlir] Introduce IRDL dialect"
Tomás Longeri [Wed, 5 Apr 2023 01:07:25 +0000 (01:07 +0000)]
Fix bazel overlay after "[mlir] Introduce IRDL dialect"

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D147583

15 months agoFix a few clang-tidy warnings (container empty checks, function decl/def param naming)
David Blaikie [Wed, 5 Apr 2023 01:06:16 +0000 (01:06 +0000)]
Fix a few clang-tidy warnings (container empty checks, function decl/def param naming)

15 months ago[libc] Forward CUDA options to the runtimes invocation of `libc`
Joseph Huber [Tue, 4 Apr 2023 23:32:00 +0000 (18:32 -0500)]
[libc] Forward CUDA options to the runtimes invocation of `libc`

Some configurations may require `-DCUDAToolkit_ROOT` to find CUDA
properly. This is currently not forwarded to the CMake invocation. This
patch adds a prefix so it will be visible when the runtimes build is
started.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D147582

15 months ago[libc] Ensure that the required clang tools are up-to-date for libc GPU
Joseph Huber [Tue, 4 Apr 2023 23:20:13 +0000 (18:20 -0500)]
[libc] Ensure that the required clang tools are up-to-date for libc GPU

The `clang-offload-packager`, `nvptx-arch`, and `amdgpu-arch` tools are
required for building the GPU target of `libc`. This patch ensures that
we build these tools when directly building `libc` via `ninja libc` or similar.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D147581

15 months ago[nvptx-arch] Dynamically load `libcuda.so.1` directly instead
Joseph Huber [Tue, 4 Apr 2023 23:10:51 +0000 (18:10 -0500)]
[nvptx-arch] Dynamically load `libcuda.so.1` directly instead

This patch loads the CUDA driver library directly via its real
`DT_SONAME`. This prevents the filesystem from needing to reload it in
cases when it's already loaded.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D147579

15 months ago[FS-AFDO] Assign discriminators to pseudo probes
Hongtao Yu [Fri, 31 Mar 2023 00:36:51 +0000 (17:36 -0700)]
[FS-AFDO] Assign discriminators to pseudo probes

This is the first change for FS-AFDO integration with CSSPGO. There are more patches coming.

With pseudo probes, we do not assign FS discriminators to any other instructions since we will be using only probes for profile correlation.

Call instructions are also excluded since their DWARF discriminators are used for other purposes, i.e., storing probe IDs. Since they are not getting an FS discriminator, they will also be excluded from MIR profile loading. The corresponding changes will be in the subsequent patches.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147286

15 months ago[M68k] Add `TRAP`, `TRAPV`, `BKPT`, `ILLEGAL` instructions
Ian Douglas Scott [Tue, 4 Apr 2023 23:32:14 +0000 (16:32 -0700)]
[M68k] Add `TRAP`, `TRAPV`, `BKPT`, `ILLEGAL` instructions

This makes it possible to use TRAP to make Linux system calls using
inline assembly for instance.

Differential Revision: https://reviews.llvm.org/D147102

15 months agoasan_memory_profile: Fix for deadlock in memory profiler code.
Sanjeet Karan Singh [Tue, 4 Apr 2023 22:17:29 +0000 (15:17 -0700)]
asan_memory_profile: Fix for deadlock in memory profiler code.

Calling symbolization directly from stopTheWorld was causing a deadlock.
On libc-dependent systems, symbolization uses dl_iterate_phdr, which acquires
a dl write lock. It could deadlock if the lock is already held by one of the
suspended threads.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D146990

15 months agoSimplify test.
Adrian Prantl [Tue, 4 Apr 2023 22:15:15 +0000 (15:15 -0700)]
Simplify test.

This test doesn't actually depend on being able to launch the process.
This may or may not explain why this test behaves oddly on some of our bots.

15 months agoSimplify test script
Adrian Prantl [Tue, 4 Apr 2023 22:08:38 +0000 (15:08 -0700)]
Simplify test script

15 months ago[clang][lit] Make LIT aware of env CLANG_CRASH_DIAGNOSTICS_DIR.
Francesco Petrogalli [Tue, 4 Apr 2023 22:07:20 +0000 (00:07 +0200)]
[clang][lit] Make LIT aware of env CLANG_CRASH_DIAGNOSTICS_DIR.

This is useful for retrieving crash reports of LIT runs when the
temporary folder is not accessible.

Reviewed By: michaelplatings

Differential Revision: https://reviews.llvm.org/D147209

15 months ago[BPF] Fix assembly parsing errors for atomic_fetch_* instructions
Eduard Zingerman [Mon, 3 Apr 2023 02:05:23 +0000 (05:05 +0300)]
[BPF] Fix assembly parsing errors for atomic_fetch_* instructions

Fixes BPF assembler parsing errors for the following instructions:
- atomic_fetch_add
- atomic_fetch_and
- atomic_fetch_xor
- atomic_fetch_or
- cmpxchg32_32
- cmpxchg_64
- xchg32_32
- xchg_64

Also add a test to verify that all instructions could be assembled and disassembled.

Differential Revision: https://reviews.llvm.org/D147421

15 months ago[SPIR-V] Remove switch G_ICMP+G_BRCOND+G_BR before ISel
Michal Paszkowski [Sun, 26 Mar 2023 18:07:11 +0000 (20:07 +0200)]
[SPIR-V] Remove switch G_ICMP+G_BRCOND+G_BR before ISel

IRTranslator lowers switches to [G_SUB] + G_ICMP + G_BRCOND + G_BR
sequences. Since values and destination MBBs are included in the
spv_switch intrinsics, the sequences are not needed for ISel.

Before this commit, the information decoded by these sequences was
added to spv_switch intrinsics in SPIRVPreLegalizer and the sequences
were kept until SPIRVModuleAnalysis where they were marked skipped for
emission.

After this commit, the [G_SUB] + G_ICMP + G_BRCOND + G_BR sequences
and MBBs containing only these MIs are erased in SPIRVPreLegalizer.

Differential Revision: https://reviews.llvm.org/D146923

15 months ago[MC][WebAssembly] Fix type checking for bulk memory instructions
Sam Clegg [Tue, 4 Apr 2023 15:37:25 +0000 (08:37 -0700)]
[MC][WebAssembly] Fix type checking for bulk memory instructions

This code currently assumes that all bulk memory operations occur on
memory 0, whose type is determined by the wasm32 vs wasm64 target
triple.  Further improvements would be needed to support multi-memory.

Differential Revision: https://reviews.llvm.org/D147540

15 months ago[msan] Fix handling of ParamTLS overflow.
Evgenii Stepanov [Fri, 24 Mar 2023 23:56:44 +0000 (16:56 -0700)]
[msan] Fix handling of ParamTLS overflow.

Ironically, MSan copies uninitialized data off the stack into
VAArgTLSCopy in the callee-side handling of va_start. Clamp the copy
size to the actual length of the buffer, and zero-initialize the
remainder.
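
The clamp-and-zero pattern described above can be sketched roughly as
follows. This is a minimal illustration, not MSan's actual
instrumentation: `clampedCopy` and its parameter names are hypothetical,
and the sketch simply assumes the copy must never read or write past
either buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not MSan's real code): copy at most `dst_size` bytes
// from `src` into `dst`, never reading more than `src_size` bytes, then zero
// the part of `dst` that the copy did not fill.
// Returns the number of bytes actually copied.
std::size_t clampedCopy(unsigned char *dst, std::size_t dst_size,
                        const unsigned char *src, std::size_t src_size) {
  assert(dst != nullptr && src != nullptr);
  // Clamp the copy size to the smaller of the two buffers.
  const std::size_t copy_size = std::min(src_size, dst_size);
  std::memcpy(dst, src, copy_size);
  // Zero-initialize the remainder so no stale bytes are left behind.
  std::memset(dst + copy_size, 0, dst_size - copy_size);
  return copy_size;
}
```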

Differential Revision: https://reviews.llvm.org/D146858

15 months ago[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`
ziqingluo-90 [Tue, 4 Apr 2023 19:19:12 +0000 (12:19 -0700)]
[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`

For an expression of the form `&DRE[any]` under an Unspecified
Pointer Context (UPC), we generate a fix-it for it with respect to a
strategy. In case the strategy is `std::span` (it is the only supported
one for now), the fix-it replaces the expression with
`&DRE.data()[any]`.

A UPC includes at least the contexts where
- the expression is being cast to an integer; and
- the expression is an argument of a call to a function that is not marked unsafe.
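
As a rough illustration of the rewrite, the sketch below shows the same
address computation before and after such a fix-it in a
cast-to-integer context. `Span` is a hypothetical minimal stand-in for
`std::span` so that the example does not require C++20, and the function
names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal stand-in for std::span (only what the sketch needs).
template <typename T> struct Span {
  T *ptr;
  std::size_t len;
  T *data() const { return ptr; }
  std::size_t size() const { return len; }
};

// Before the fix-it: `p` is a raw pointer and `&p[i]` is cast to an integer.
std::uintptr_t addressBefore(int *p, std::size_t i) {
  return reinterpret_cast<std::uintptr_t>(&p[i]);
}

// After the fix-it: `p` has become a span, and `&p[i]` is rewritten to
// `&p.data()[i]`, which yields the same raw address.
std::uintptr_t addressAfter(Span<int> p, std::size_t i) {
  assert(i <= p.size()); // caller is assumed to stay within the span
  return reinterpret_cast<std::uintptr_t>(&p.data()[i]);
}
```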

Reviewed by: NoQ, malavikasamak, t-rasmud, jkorous

Differential revision: https://reviews.llvm.org/D143128

15 months ago[lldb] Fix build on older FreeBSD
Brooks Davis [Tue, 4 Apr 2023 20:21:45 +0000 (21:21 +0100)]
[lldb] Fix build on older FreeBSD

Commit 392d9eb03af5a1adac66a86939351b22b3e73495 added a dependency on
FPE_FLTIDO which was only defined in FreeBSD main on May 19, 2022 and
is not in all supported releases. Just define it if it's missing as we
could use a debugger compiled on an older system to debug a newer one.

Reviewed by: DavidSpickett, emaste, dim

Differential Revision: https://reviews.llvm.org/D147300

15 months ago[LV] Add uses of recurrences in exit blocks in some tests.
Florian Hahn [Tue, 4 Apr 2023 20:19:29 +0000 (21:19 +0100)]
[LV] Add uses of recurrences in exit blocks in some tests.

This preserves the spirit of the tests even if a follow-up change only
generates exit values for recurrences if they are actually used.

15 months ago[AIX][PGO] Add malloc error handling and deallocation to FindBinaryId function
Wael Yehia [Tue, 4 Apr 2023 19:28:21 +0000 (19:28 +0000)]
[AIX][PGO] Add malloc error handling and deallocation to FindBinaryId function

This is a follow up on D146976.

Reviewed By: stephenpeckham

Differential Revision: https://reviews.llvm.org/D147559

15 months ago[RISCV] Add tests for extract/insert subvector costs and extract lowering
Philip Reames [Tue, 4 Apr 2023 19:45:22 +0000 (12:45 -0700)]
[RISCV] Add tests for extract/insert subvector costs and extract lowering

15 months ago[RISCV][TTI] Cost SK_Transpose as a generic two element shuffle
Philip Reames [Mon, 3 Apr 2023 23:27:38 +0000 (16:27 -0700)]
[RISCV][TTI] Cost SK_Transpose as a generic two element shuffle

This matches the actual lowering.  The previous costing was "as if" it had been fully scalarized.

15 months ago[Verifier] Verify sizes of matrix.multiply operands and specified shape.
Florian Hahn [Tue, 4 Apr 2023 19:51:30 +0000 (20:51 +0100)]
[Verifier] Verify sizes of matrix.multiply operands and specified shape.

Extend the verifier to check if the size of the matrix operands of
matrix.multiply match the sizes specified by the numeric arguments.
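
The kind of check described here can be sketched as below. This is not
the verifier's code: `matrixMultiplyShapesMatch` is a hypothetical
helper, and it assumes the usual reading of the three numeric arguments
(rows of the left operand, the shared inner dimension, columns of the
right operand). Checking the result size as well is an assumption of
this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the verifier check. The *_elems parameters are
// the element counts of the flattened vector operands and result.
constexpr bool matrixMultiplyShapesMatch(std::uint64_t lhs_elems,
                                         std::uint64_t rhs_elems,
                                         std::uint64_t result_elems,
                                         std::uint64_t outer_rows,
                                         std::uint64_t inner,
                                         std::uint64_t outer_cols) {
  return lhs_elems == outer_rows * inner &&   // LHS is outer_rows x inner
         rhs_elems == inner * outer_cols &&   // RHS is inner x outer_cols
         result_elems == outer_rows * outer_cols;
}

// Usage: a 2x3 matrix times a 3x4 matrix gives a 2x4 result.
inline void matrixMultiplyExample() {
  assert(matrixMultiplyShapesMatch(6, 12, 8, 2, 3, 4));
  assert(!matrixMultiplyShapesMatch(6, 12, 8, 2, 3, 5));
}
```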

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147466

15 months ago[DebugInfo][flang] Fix linking modules with similar DIStringType
Vladimir Radosavljevic [Tue, 4 Apr 2023 19:34:56 +0000 (12:34 -0700)]
[DebugInfo][flang] Fix linking modules with similar DIStringType

This issue is caused by incomplete implementation of isKeyOf for DIStringType.

Differential Revision: https://reviews.llvm.org/D147140

15 months ago[HWASAN] Fix test which was failing with tag mismatch due to missing no_sanitize...
Kirill Stoimenov [Tue, 4 Apr 2023 19:39:57 +0000 (19:39 +0000)]
[HWASAN] Fix test which was failing with tag mismatch due to missing no_sanitize statement

15 months ago[flang] HLFIR to FIR lowering for complex parts
Ethan Luis McDonough [Tue, 4 Apr 2023 18:54:14 +0000 (13:54 -0500)]
[flang] HLFIR to FIR lowering for complex parts

This revision implements HLFIR to FIR lowering for complex parts.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D146487

15 months ago[mlir] Introduce IRDL dialect
Mathieu Fehr [Tue, 4 Apr 2023 19:16:45 +0000 (20:16 +0100)]
[mlir] Introduce IRDL dialect

This patch introduces the IRDL dialect, which allows users to represent
dynamic dialect definitions as an MLIR program.

The IRDL dialect defines operations, attributes, and types, using
attribute constraints. For example:

```
module {
  irdl.dialect @cmath {
    irdl.type @complex {
      %0 = irdl.is f32
      %1 = irdl.is f64
      %2 = irdl.any_of(%0, %1)
      irdl.parameters(%2)
    }

    irdl.operation @norm {
      %0 = irdl.any
      %1 = irdl.parametric @complex<%0>
      irdl.operands(%1)
      irdl.results(%0)
    }
  }
}
```

This program will define a new `cmath.complex` type, which expects a single
parameter, which is either an `f32` or an `f64`. It also defines a
`cmath.norm` operation, which expects a single `cmath.complex` type as operand,
and returns a value of the underlying type. Note that, as in PDL (which heavily
inspired IRDL), both uses of `%0` are expected to be of the same attribute.

IRDL handles attributes and types with the same operations, and does this by always
wrapping types in a `TypeAttr`. This is to simplify the language.

Depends on D144690

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D144692

15 months ago[scudo] Make the placeholder type specifier be consistent with C/C++
Chia-hung Duan [Tue, 4 Apr 2023 04:13:58 +0000 (04:13 +0000)]
[scudo] Make the placeholder type specifier be consistent with C/C++

This avoids `-Wformat` complaints about placeholder type specifier
mismatches on `lld`/`llu` (used for `s64`/`u64`), which have a slightly
different interpretation in string_utils.cpp.

Also enable the Timer build, which was disabled because of the `-Wformat`
complaints.

Differential Revision: https://reviews.llvm.org/D147496

15 months ago[APFloat] Refactor common code for APFloat<->APInt conversion
David Majnemer [Wed, 29 Mar 2023 21:44:11 +0000 (21:44 +0000)]
[APFloat] Refactor common code for APFloat<->APInt conversion

All the IEEE formats are quite similar, so we can merge their code
effectively by writing it parametrically via the fltSemantics object.

We can metaprogram the implementation such that this parametricity is
zero-cost.

15 months ago[amdgpu] Implement dynamic LDS accesses from non-kernel functions
Jon Chesterfield [Tue, 4 Apr 2023 19:06:33 +0000 (20:06 +0100)]
[amdgpu] Implement dynamic LDS accesses from non-kernel functions

The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233

15 months ago[MergedLoadStoreMotion] Merge stores with conflicting value types
Jeff Byrnes [Tue, 28 Mar 2023 23:47:58 +0000 (16:47 -0700)]
[MergedLoadStoreMotion] Merge stores with conflicting value types

Since memory does not have an intrinsic type, we do not need to require value type matching on stores in order to sink them. To facilitate that, this patch finds stores which are sinkable, but have conflicting types, and bitcasts the ValueOperand so they are easily sinkable into a PHINode. Rather than doing fancy analysis to optimally insert the bitcast, we always insert right before the relevant store in the diamond branch. The assumption is that later passes (e.g. GVN, SimplifyCFG) will clean up bitcasts as needed.
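
A C++ analogy of the transformation, offered only as a sketch: the two
branches of a diamond store a `float` and an `int32_t` to the same
address, and the rewritten form bitcasts the `float` inside its branch
so that a single store can be sunk to the join point. The function names
are hypothetical, and `std::memcpy` stands in for the IR bitcast and
store.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes a 32-bit float");

// Before: each branch performs its own store with its own value type.
void storeBefore(bool cond, float f, std::int32_t i, unsigned char *mem) {
  if (cond)
    std::memcpy(mem, &f, sizeof f);
  else
    std::memcpy(mem, &i, sizeof i);
}

// After: the float is bitcast to i32 in its branch (the "bitcast right
// before the store"), the values meet in one variable (the PHI), and a
// single store is performed after the branches rejoin.
void storeAfter(bool cond, float f, std::int32_t i, unsigned char *mem) {
  std::int32_t merged;
  if (cond)
    std::memcpy(&merged, &f, sizeof merged); // bitcast float -> i32
  else
    merged = i;
  std::memcpy(mem, &merged, sizeof merged); // the one sunk store
}

// Usage: both forms leave identical bytes in memory.
inline void storeExample() {
  unsigned char a[4], b[4];
  storeBefore(true, 1.5f, 7, a);
  storeAfter(true, 1.5f, 7, b);
  assert(std::memcmp(a, b, sizeof a) == 0);
}
```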

Differential Revision: https://reviews.llvm.org/D147348

15 months ago[lld][WebAssembly] Fix stub library parsing with windows line endings
Sam Clegg [Tue, 4 Apr 2023 17:24:40 +0000 (10:24 -0700)]
[lld][WebAssembly] Fix stub library parsing with windows line endings

Also, fix checking of first line in ::parse.  We can't use the
::getLines helper here since that already does comment stripping
internally.
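
One way to make line splitting tolerant of Windows line endings is
sketched below. This is not lld's parser: `splitLines` is a hypothetical
helper that splits on `\n`, drops a single trailing `\r` from each line,
and deliberately does no comment stripping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not lld's code): split `text` into lines so that
// CRLF and LF inputs produce the same result. Comments are left untouched.
std::vector<std::string> splitLines(const std::string &text) {
  std::vector<std::string> lines;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = text.find('\n', start);
    if (end == std::string::npos)
      end = text.size();
    std::string line = text.substr(start, end - start);
    // Drop the '\r' left over from a Windows "\r\n" line ending.
    if (!line.empty() && line.back() == '\r')
      line.pop_back();
    lines.push_back(line);
    start = end + 1;
  }
  return lines;
}

// Usage: the same content with either line ending yields the same lines.
inline void splitLinesExample() {
  assert(splitLines("first\r\nsecond\r\n") == splitLines("first\nsecond\n"));
}
```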

Differential Revision: https://reviews.llvm.org/D147548

15 months ago[dsymutil] Disallow --reproducer=Use
Keith Smiley [Tue, 4 Apr 2023 04:53:33 +0000 (21:53 -0700)]
[dsymutil] Disallow --reproducer=Use

This should be implied by --use-reproducer instead as a path is required
for this mode

Differential Revision: https://reviews.llvm.org/D147499

15 months agoRevert "AMDGPU: Created a subclass for the return address operand in the tail call...
Changpeng Fang [Tue, 4 Apr 2023 18:44:52 +0000 (11:44 -0700)]
Revert "AMDGPU: Created a subclass for the return address operand in the tail call return instruction"

This reverts commit 461a559bc9bd755436ba8f12f8b74757e03f9b9f.

15 months agoDo not move "auto-init" instructions if they're volatile
serge-sans-paille [Tue, 4 Apr 2023 18:38:25 +0000 (20:38 +0200)]
Do not move "auto-init" instructions if they're volatile

This is overly conservative, but at least it's safe.

This is a follow-up to https://reviews.llvm.org/D137707

15 months ago[lld] Support separate native object file path in --thinlto-prefix-replace
Ivan Tadeu Ferreira Antunes Filho [Tue, 4 Apr 2023 16:57:53 +0000 (09:57 -0700)]
[lld] Support separate native object file path in --thinlto-prefix-replace

Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.

This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.
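
The format and its fallback can be sketched as follows. This is an
illustration only, not lld's option handling: `PrefixReplace`,
`parsePrefixReplace` and `replacePrefix` are hypothetical names, and
rejecting anything other than two or three `;`-separated parts is an
assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the parsed "origpath;outpath[;objpath]" value.
struct PrefixReplace {
  std::string orig; // prefix of the thin link input files
  std::string out;  // prefix for index and imports outputs (number 1)
  std::string obj;  // prefix for response-file object paths (number 2)
};

// Parse the option value; returns false unless it has 2 or 3 parts.
bool parsePrefixReplace(const std::string &value, PrefixReplace &result) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  for (;;) {
    std::size_t semi = value.find(';', start);
    if (semi == std::string::npos) {
      parts.push_back(value.substr(start));
      break;
    }
    parts.push_back(value.substr(start, semi - start));
    start = semi + 1;
  }
  if (parts.size() < 2 || parts.size() > 3)
    return false;
  result.orig = parts[0];
  result.out = parts[1];
  // Old behavior: without objpath, outpath is used for object paths too.
  result.obj = parts.size() == 3 ? parts[2] : parts[1];
  return true;
}

// Replace a leading `old_prefix` of `path` with `new_prefix`, if present.
std::string replacePrefix(const std::string &path,
                          const std::string &old_prefix,
                          const std::string &new_prefix) {
  if (path.compare(0, old_prefix.size(), old_prefix) != 0)
    return path;
  return new_prefix + path.substr(old_prefix.size());
}
```

With the value `in/;out/;obj/`, an input `in/a.o` would map to `out/a.o`
for the index outputs and to `obj/a.o` in the response file.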

Reviewed By: tejohnson, MaskRay

Differential Revision: https://reviews.llvm.org/D144596

15 months ago[AArch64][GlobalISel][RegBankSelect] Guess the bank for loads using the MMO.
Amara Emerson [Tue, 4 Apr 2023 18:00:50 +0000 (11:00 -0700)]
[AArch64][GlobalISel][RegBankSelect] Guess the bank for loads using the MMO.

We have had this patch downstream for a long time. We need to find the users
of the IR load to guess the bank, since with opaque pointers we lost the type
information.

15 months ago[fuzzer][test] Avoid big-file-copy.test on memory constrained devices
Roy Sundahl [Tue, 4 Apr 2023 05:10:26 +0000 (22:10 -0700)]
[fuzzer][test] Avoid big-file-copy.test on memory constrained devices

The test "big-file-copy.test" introduced in D146189 and constrained to darwin by
D147094, is by this differential further constrained to only those devices with
sufficient resources. Also correct the test to read the environment variable
"result" from the same shell in which it was stored (which may differ on devices).

Reviewed By: thetruestblue

Differential Revision: https://reviews.llvm.org/D147502

15 months agoAMDGPU: Created a subclass for the return address operand in the tail call return...
Changpeng Fang [Tue, 4 Apr 2023 17:56:58 +0000 (10:56 -0700)]
AMDGPU: Created a subclass for the return address operand in the tail call return instruction

Summary:
  This is to avoid using the callee saved registers for the return address of the tail call return instruction.

Reviewers:
  arsenm, cdevadas

Differential Revision:
  https://reviews.llvm.org/D147096

15 months agoBazel: Try alwayslink
David Blaikie [Tue, 4 Apr 2023 17:45:27 +0000 (17:45 +0000)]
Bazel: Try alwayslink