Xu Mingjie [Tue, 21 Sep 2021 19:56:50 +0000 (12:56 -0700)]
[LTO] Emit DebugLoc for dead function in optimization remarks
Currently, the dead-function information obtained from optimization remarks does not contain a debug location, but knowing where these dead functions are located could be useful for debugging or for detecting dead code.
This is because `LTO::addRegularLTO()` uses `BitcodeModule::getLazyModule()` to read the bitcode module: when we pass Function F to `ore::NV()`, F is not yet materialized, so `F->getSubprogram()` returns nullptr and the optimization remarks contain no debug location for dead functions.
This patch calls `F->materialize()` before passing Function F to `ore::NV()`, so debug location information is emitted for dead functions in optimization remarks.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D109737
Joseph Huber [Tue, 21 Sep 2021 19:32:41 +0000 (15:32 -0400)]
[OpenMP] Add thread ID function into new RTL
The new device runtime library currently lacks the
`kmpc_get_hardware_thread_id_in_block` function which is currently used
when doing the SPMDization optimization. This call would be introduced
through the optimization and then cause a linking error because it was
not present. This patch adds support for this runtime call.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110195
Arthur Eubanks [Tue, 21 Sep 2021 21:09:17 +0000 (14:09 -0700)]
[clang] Make -Rpass imply -Rpass=.*
Previously with -Rpass (and friends) we'd have remarks "enabled", but
without an actual regex.
As seen in the test change to line numbers, this can give us better
diagnostics by properly enabling NeedLocTracking with -Rpass.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D110201
Craig Topper [Tue, 21 Sep 2021 21:33:44 +0000 (14:33 -0700)]
Revert "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering."
This reverts commit 7550f146ff75667d6e1828d64438dcc23b77f036.
I botched the bug number.
Craig Topper [Tue, 21 Sep 2021 20:54:55 +0000 (13:54 -0700)]
[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering.
When we rewrite the setcc we replace the old setcc output register
with the new CondReg. But since CondReg can be shared by other
replacements, we don't know if the kill flags for the old register
are valid for CondReg. So be conservative and remove them.
The test case has a SETCCr and a SETCCm on the same condition so
they end up sharing the same CondReg. The SETCCr had one use with
a kill flag. This kill flag isn't valid after the replacement because
CondReg needs a live range extending to the later SETCCm replacement.
Fixes PR51908.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110046
Albion Fung [Tue, 21 Sep 2021 20:46:30 +0000 (15:46 -0500)]
[PowerPC] Support for vector bool int128 on vector comparison builtins
This patch implements support for the type vector bool int128
for arguments on vector comparison builtins listed below,
which would otherwise crash due to ambiguity.
The following builtins are added:
vec_all_eq (vector bool __int128, vector bool __int128)
vec_all_ne (vector bool __int128, vector bool __int128)
vec_any_eq (vector bool __int128, vector bool __int128)
vec_any_ne (vector bool __int128, vector bool __int128)
vec_cmpne (vector bool __int128, vector bool __int128)
vec_cmpeq (vector bool __int128, vector bool __int128)
Differential revision: https://reviews.llvm.org/D110084
Sanjay Patel [Tue, 21 Sep 2021 20:52:32 +0000 (16:52 -0400)]
[CodeGen] regenerate test checks; NFC
This broke with 2f6b07316f56 because it wrongly runs the entire LLVM optimizer.
George Burgess IV [Tue, 21 Sep 2021 20:09:08 +0000 (13:09 -0700)]
MemoryBuiltins: update comment; NFC
This comment references behavior that was removed in
ccae43a247b0791f78ea89b9cb7e59fa70f5000d, which is a commit from 5 years
ago. It seems safe to assume that that behavior won't be coming back
soon. If it does, we can re-add this part of the comment :)
Giorgis Georgakoudis [Tue, 21 Sep 2021 20:20:39 +0000 (13:20 -0700)]
Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 1d66649adf28d48ae1731516d87fb899426e3349.
Revert to fix AMD GPU issue.
Arthur O'Dwyer [Fri, 17 Sep 2021 03:14:57 +0000 (23:14 -0400)]
[libc++] counting_semaphore should not be default-constructible.
Neither the current C++2b draft, nor any revision of [p1135],
nor libstdc++, claims that `counting_semaphore` should be
default-constructible. I think this was just a copy-paste issue
somehow.
Also, `explicit` was missing from the constructor.
Also, `constexpr` remains missing; but that's probably more of a
technical limitation, since apparently there are some platforms
where we don't (can't??) use the atomic implementation and
have to rely on pthreads, which obviously isn't constexpr.
Differential Revision: https://reviews.llvm.org/D110042
Sanjay Patel [Tue, 21 Sep 2021 19:33:51 +0000 (15:33 -0400)]
[InstCombine] fold cast of right-shift if high bits are not demanded
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C
Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
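As a rough illustration (not a replacement for the Alive2 proof), the C++ sketch below brute-forces one small instance, an i16 truncated to i8: when only the low (8 - C) bits of the result are demanded, shifting after the truncate yields the same bits. The helper names and the fixed widths are assumptions made for the example; `Mask` stands in for the demanded bits.

```cpp
#include <cassert>
#include <cstdint>

// Low (8 - C) bits of the i8 result: these bits come only from the low byte of X.
uint8_t lowBitsMask(unsigned C) {
  assert(C < 8 && "shift amount must be less than the narrow width");
  return static_cast<uint8_t>((1u << (8 - C)) - 1);
}

// Exhaustively compares Mask & trunc(lshr X, C) with Mask & lshr(trunc X, C)
// for every 16-bit X, truncating to 8 bits.
bool narrowedShiftMatches(unsigned C, uint8_t Mask) {
  for (uint32_t X = 0; X <= 0xFFFF; ++X) {
    // trunc (lshr i16 X, C) to i8
    uint8_t TruncOfShift = static_cast<uint8_t>(X >> C);
    // lshr (trunc i16 X to i8), C
    uint8_t ShiftOfTrunc = static_cast<uint8_t>(static_cast<uint8_t>(X) >> C);
    if ((TruncOfShift & Mask) != (ShiftOfTrunc & Mask))
      return false;
  }
  return true;
}
```

Demanding any bit at or above position (8 - C) breaks the equivalence, because that bit of the wide shift comes from the high byte of X, which the narrow shift has already discarded.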
Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF
Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb
Differential Revision: https://reviews.llvm.org/D110170
Antonio Frighetto [Tue, 21 Sep 2021 19:54:40 +0000 (21:54 +0200)]
[IR] Look through bitcast in hasFnAttribute()
A logic incompleteness may lead MemorySSA to be too conservative
in its results. Specifically, when dealing with a call of kind
`call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where
the function `test` is declared with readonly attribute, the
bitcast is not looked through, obscuring function attributes. Hence,
some methods of CallBase (e.g., doesNotReadMemory) could provide
suboptimal results.
Differential Revision: https://reviews.llvm.org/D109888
Usman Nadeem [Tue, 21 Sep 2021 19:45:15 +0000 (12:45 -0700)]
[OpenMP][OMPD] Fix compile error when OMPD is not supported
Differential Revision: https://reviews.llvm.org/D110120
Change-Id: I9d39dacfab5b7fbab37ee4b4d960d51e0892b24d
Nikita Popov [Tue, 21 Sep 2021 19:42:58 +0000 (21:42 +0200)]
[MergeICmps] Remove unused NumMerged variable
Matheus Izvekov [Tue, 14 Sep 2021 23:46:30 +0000 (01:46 +0200)]
[clang] don't mark as Elidable CXXConstruct expressions used in NRVO
See PR51862.
The consumers of the Elidable flag in CXXConstructExpr assume that
an elidable construction just goes through a single copy/move construction,
so that the source object is immediately passed as an argument and is the same
type as the parameter itself.
With the implementation of P2266 and after some adjustments to the
implementation of P1825, we started (correctly, as per standard)
allowing more cases where the copy initialization goes through
user-defined conversions.
With this patch we stop using this flag in NRVO contexts, to preserve code
that relies on that assumption.
This causes no known functional changes; we just stop firing some asserts
in a couple of the included test cases.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D109800
Dan F-M [Tue, 21 Sep 2021 19:19:07 +0000 (12:19 -0700)]
[Bazel] Add support for targeting macOS arm64
In attempting to build JAX on Apple Silicon, we discovered an issue with
the Bazel configuration in llvm-project-overlay. This patch fixes the
logic, at least when building JAX. More context is included on the
following GitHub issue: https://github.com/google/jax/issues/5501
Differential Revision: https://reviews.llvm.org/D109839
Nikita Popov [Sat, 18 Sep 2021 15:10:29 +0000 (17:10 +0200)]
[MergeICmps] Don't reorder unmerged comparisons
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:
* We may end up moving the original first block into the middle of
the chain, in which case the "extra work" instructions will also
be in the middle of the chain, resulting in invalid IR
(reported in https://reviews.llvm.org/D108782#3005583).
* Reordering branches is generally not legal, because it may
introduce branch on poison, which is UB (PR51845). The merging
done by MergeICmps is legal as long as we assume that memcmp()
works on frozen memory, but the reordering of unmerged comparisons
is definitely incorrect (without inserting freeze instructions),
so we should avoid it.
There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.
I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.
Differential Revision: https://reviews.llvm.org/D110024
David Blaikie [Tue, 21 Sep 2021 18:38:33 +0000 (11:38 -0700)]
nullptr printing - update for a change to clang type printing that now uses "std::nullptr_t"
Christian Sigg [Tue, 21 Sep 2021 11:37:06 +0000 (13:37 +0200)]
Support value-typed references in iterator facade's operator->()
Add a PointerProxy similar to the existing iterator_facade_base::ReferenceProxy and return it from the arrow operator. This prevents iterator facades whose reference type is not a true reference from taking the address of a temporary.
Forward the reference type of the mapped_iterator to the iterator adaptor, which in turn forwards it to the iterator facade. This fixes mlir::op_iterator::operator->(), which previously took the address of a temporary.
Make some polishing changes to op_iterator and op_filter_iterator.
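A minimal sketch of the proxy idea, assuming a simplified iterator rather than LLVM's actual iterator_facade_base: `PointerProxy` here stores the value by copy, so `operator->()` returns a pointer into an object that lives until the end of the full expression. `SquaredPointIterator` and `Point` are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Arrow-operator proxy: owns a copy of the value, so the pointer it hands
// out stays valid until the end of the full expression.
template <typename T> class PointerProxy {
  T Value;

public:
  explicit PointerProxy(T V) : Value(std::move(V)) {}
  T *operator->() { return &Value; }
};

struct Point {
  int X, Y;
};

// An iterator whose "reference" type is a value, like a mapped iterator
// that computes each element on the fly.
class SquaredPointIterator {
  int I;

public:
  using value_type = Point;
  using reference = Point; // not a true reference

  explicit SquaredPointIterator(int Start) : I(Start) {}
  reference operator*() const { return Point{I, I * I}; }

  // Returning &**this would take the address of a temporary; the proxy
  // keeps the computed value alive instead.
  PointerProxy<Point> operator->() const { return PointerProxy<Point>(**this); }

  SquaredPointIterator &operator++() {
    ++I;
    return *this;
  }
  bool operator!=(const SquaredPointIterator &O) const { return I != O.I; }
};
```

`It->Y` works through the language's operator-> drill-down: the iterator returns a proxy, and the proxy's own `operator->()` yields the `Point *`.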
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D109490
David Blaikie [Tue, 21 Sep 2021 18:32:17 +0000 (11:32 -0700)]
DebugInfo: Rebuild decltype(nullptr) as 'std::nullptr_t'
Now that Clang's been changed to render nullptr types/template
parameters as 'std::nullptr_t', do the same thing down here.
(Clang commit: 131e8786640a49daf533b7ead4d3b5b82e0aea2a)
Michael Liao [Tue, 21 Sep 2021 18:09:06 +0000 (14:09 -0400)]
[IR] Re-group AAMDNodes relevant interfaces. NFC.
David Blaikie [Sun, 19 Sep 2021 21:32:45 +0000 (14:32 -0700)]
Print nullptr_t namespace qualified within std::
This improves diagnostic (& important to me, DWARF) accuracy - otherwise
there could be ambiguities between "std::nullptr_t" and some user-defined
type that's /actually/ "nullptr_t" defined in the global namespace.
Differential Revision: https://reviews.llvm.org/D110044
alex-t [Thu, 16 Sep 2021 17:20:44 +0000 (20:20 +0300)]
[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
Normally, given that the divergence analysis (DA) results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* and divergent ones to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we must preserve correctness when passing an i1 from VALU to SALU context and vice versa.
VALU operations only process the active lanes of the VGPR and ignore inactive ones.
Active lanes correspond to the bits set to 1 in the EXEC mask register.
SALU represents i1 as just one bit, but VALU represents it as 64 bits: 0/1 and 0/(0xffffffffffffffff & EXEC), respectively.
SALU uses the one-bit condition flag SCC, whereas VALU uses VCC, which is a pair of 32-bit SGPRs.
To expose SCC to the VALU context, we need to convert the one-bit boolean value to the appropriate 64-bit value.
To return to the SALU context, we need to do the opposite.
To correctly convert a 64-bit VALU boolean to either 0 or 1, we need to filter out the bits corresponding to the inactive lanes.
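The conversions described above can be modelled on the host as plain 64-bit integer operations. This is a hypothetical sketch of the semantics for a wave64 target, not the actual instruction selection; the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a wave64 boolean. SALU holds an i1 in the
// one-bit SCC flag; VALU holds it as a 64-bit lane mask (one bit per lane),
// and EXEC marks which lanes are active.

// SCC -> VALU: true becomes 0xffffffffffffffff & EXEC (all active lanes set),
// false becomes 0.
uint64_t sccToLaneMask(bool SCC, uint64_t Exec) {
  return SCC ? (~UINT64_C(0) & Exec) : 0;
}

// VALU -> SCC: bits for inactive lanes may hold stale values, so AND with
// EXEC first; the result is 1 if any active lane is set.
bool laneMaskToSCC(uint64_t LaneMask, uint64_t Exec) {
  return (LaneMask & Exec) != 0;
}
```

Without the `& Exec` in `laneMaskToSCC`, a mask whose only set bits belong to inactive lanes would wrongly read as true.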
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D109900
Owen Anderson [Fri, 17 Sep 2021 18:28:58 +0000 (18:28 +0000)]
Teach InstCombine to eliminate malloc-realloc-free triplets.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D109988
Brendon Cahoon [Sun, 19 Sep 2021 19:27:43 +0000 (14:27 -0500)]
[AMDGPU] Correctly merge alias.scope and noalias metadata for memops
When adding alias.scope and noalias metadata to a memcpy call,
the alias.scope and noalias metadata from the operands are merged.
The rule for merging alias.scope is to take the intersection of
the domains and the union of the scopes within those domains.
The rule for merging noalias is to take the intersection.
The bug is that AMDGPULowerModuleLDS was using concatenation for
both alias.scope and noalias. For example, when f1 and f2 are added
to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)).
Then, concatenation creates noalias metadata for the memcpy that
includes both {f1, f2}. That means that the memcpy is assumed
not to alias a prior load of f2, which enables the optimizer to
remove a load of f2 that occurs after mempcy.
The function MDNode::getmostGenericAliasScope defines the semantics
for alias.scope. There is a function, combineMetadata in Local.cpp,
that uses intersect for noalias.
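A toy C++ model of the two merge rules, assuming metadata is represented as plain strings (the real MDNode representation differs). It reproduces the memcpy(f2, f1, ...) example: the merged noalias set is empty rather than {f1, f2}.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model: a scope list maps each domain to the set of scopes in it.
using ScopeList = std::map<std::string, std::set<std::string>>;

// alias.scope merge: keep only domains present in both lists, and within
// each kept domain take the union of the scopes.
ScopeList mergeAliasScope(const ScopeList &A, const ScopeList &B) {
  ScopeList Result;
  for (const auto &[Domain, Scopes] : A) {
    auto It = B.find(Domain);
    if (It == B.end())
      continue;
    std::set<std::string> Union = Scopes;
    Union.insert(It->second.begin(), It->second.end());
    Result[Domain] = Union;
  }
  return Result;
}

// noalias merge: plain intersection of the scope sets.
std::set<std::string> mergeNoAlias(const std::set<std::string> &A,
                                   const std::set<std::string> &B) {
  std::set<std::string> Result;
  for (const auto &S : A)
    if (B.count(S))
      Result.insert(S);
  return Result;
}
```

For the example, the load from f1 carries noalias {f2} and the store to f2 carries noalias {f1}; their intersection is empty, so the memcpy is no longer claimed to be disjoint from f2.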
Differential Revision: https://reviews.llvm.org/D110049
Craig Topper [Tue, 21 Sep 2021 17:39:08 +0000 (10:39 -0700)]
[RISCV] Make some arrays of constants 'static const'. NFC
This helps the compiler generate better code.
Danila Malyutin [Thu, 26 Aug 2021 18:00:13 +0000 (21:00 +0300)]
[LSR] Make sure that Factor fits into Base type
Fixes PR42770
Differential Revision: https://reviews.llvm.org/D108772
Giorgis Georgakoudis [Tue, 21 Sep 2021 00:12:14 +0000 (17:12 -0700)]
[OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
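A hypothetical sketch of the calling-convention idea: the outlined function receives one pointer to a struct of captures instead of a variadic argument list. `forkCallSketch`, `Captures` and `outlinedRegion` are invented stand-ins and do not match the real libomp fork_call signature; the "threads" here run sequentially.

```cpp
#include <cassert>

// Hypothetical outlined-region signature: one opaque pointer to a struct of
// captures instead of a variadic list of pointer-cast arguments.
using OutlinedFn = void (*)(int ThreadId, void *Captures);

// Stand-in for a runtime fork call (not the libomp API): invokes the
// outlined function once per "thread", sequentially.
void forkCallSketch(int NumThreads, OutlinedFn Fn, void *Captures) {
  for (int T = 0; T < NumThreads; ++T)
    Fn(T, Captures);
}

// Captures for a parallel region that uses Sum and N.
struct Captures {
  int *Sum;
  const int *N;
};

void outlinedRegion(int ThreadId, void *Raw) {
  auto *C = static_cast<Captures *>(Raw);
  // Typed fields: no per-argument casting, and no argument-count limit.
  *C->Sum += ThreadId * *C->N;
}
```

Because the runtime only forwards a single pointer, adding a capture changes the struct layout rather than the arity of the fork call.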
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
Amy Kwan [Tue, 21 Sep 2021 15:49:33 +0000 (10:49 -0500)]
[PowerPC] Add prefix load pattern for fpext to v2f64
This patch adds a prefixed load pattern for fpext of v2f32 to v2f64, where
the value is at an offset that fits into a 34-bit signed immediate.
A reduced test case exercising the pattern is also added; the pattern is
checked in the big-endian CHECKs of the new test.
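For reference, "fits into a 34-bit signed immediate" means the offset lies in [-2^33, 2^33 - 1]. A small standalone sketch of that range check (LLVM itself provides `llvm::isInt<34>()` in llvm/Support/MathExtras.h for this kind of test; the function name below is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// True if Offset is representable as a 34-bit signed immediate,
// i.e. Offset is in [-2^33, 2^33 - 1].
bool fitsInSigned34(int64_t Offset) {
  const int64_t Min = -(INT64_C(1) << 33);
  const int64_t Max = (INT64_C(1) << 33) - 1;
  return Offset >= Min && Offset <= Max;
}
```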
Differential Revision: https://reviews.llvm.org/D109887
Ayal Zaks [Sun, 29 Aug 2021 14:10:25 +0000 (17:10 +0300)]
[LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.
Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.
Differential Revision: https://reviews.llvm.org/D108891
Craig Topper [Tue, 21 Sep 2021 16:54:17 +0000 (09:54 -0700)]
[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110106
Nico Weber [Tue, 21 Sep 2021 17:02:52 +0000 (13:02 -0400)]
[lldb/win] Default to native PDB reader when LLVM_ENABLE_DIA_SDK=NO
Trying to use the DIA SDK reader only to fail with "DIA SDK wasn't enabled"
isn't very useful. The native PDB reader is missing some stuff, but it's still
better than nothing.
Reduces number of lldb-check-shell test failures with LLVM_ENABLE_DIA_SDK=NO
from 27 to 15.
Differential Revision: https://reviews.llvm.org/D110172
LLVM GN Syncbot [Tue, 21 Sep 2021 16:34:07 +0000 (16:34 +0000)]
[gn build] Port a04a6ce7726b
Mark de Wever [Mon, 14 Dec 2020 16:39:15 +0000 (17:39 +0100)]
[libc++][format] Adds parser std-format-spec.
This implements the generic std.format.spec framework for all types.
The Unicode support will be added in a separate patch.
Implements parts of:
- P0645 Text Formatting
Completes:
- LWG-3242 std::format: missing rules for arg-id in width and precision
- P1892 Extended locale-specific presentation specifiers for std::format
Reviewed By: #libc, ldionne, vitaut
Differential Revision: https://reviews.llvm.org/D103368
cchen [Tue, 21 Sep 2021 16:26:50 +0000 (11:26 -0500)]
[OpenMP][NFC] Add declare variant and metadirective to support page
Aaron Ballman [Tue, 21 Sep 2021 16:25:13 +0000 (12:25 -0400)]
Revert "Diagnose -Wunused-value based on CFG reachability"
This reverts commit 63e0d038fc20c894a3d541effa1bc2b1fdea37b9.
It causes test failures:
http://lab.llvm.org:8011/#/builders/119/builds/5612
https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8835548361443044001/+/u/clang/test/stdout
Dávid Bolvanský [Tue, 21 Sep 2021 16:14:04 +0000 (18:14 +0200)]
[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z)
We already have the pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we were missing the same transformation for powi (integer power).
Requires reassoc.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D109954
Quinn Pham [Fri, 17 Sep 2021 19:23:46 +0000 (14:23 -0500)]
[PowerPC] Fix signature of lxvp and stxvp builtins
This patch changes the signature of the load and store vector pair
builtins to match their documentation. The type of the `signed long long`
argument is changed to `signed long`. This patch also changes existing testcases
to match the signature change.
Reviewed By: lei, Conanap
Differential Revision: https://reviews.llvm.org/D109996
Kazu Hirata [Tue, 21 Sep 2021 16:12:29 +0000 (09:12 -0700)]
[CodeGen] Remove redundant declaration getFileType (NFC)
Sanjay Patel [Tue, 21 Sep 2021 15:26:30 +0000 (11:26 -0400)]
[InstCombine] move/add tests for trunc-of-lshr; NFC
Planning to reframe a proposed transform in terms of
demanded bits as suggested in D110170.
The new tests end with an 'or'.
Kostya Serebryany [Tue, 21 Sep 2021 00:58:50 +0000 (17:58 -0700)]
[sanitizer coverage] write the pc-table at the process exit
The current code writes the pc-table at the process startup,
which may happen before the common_flags() are initialized.
Move writing to the process end.
This is consistent with how we write the counters and avoids the problem with the uninitialized flags.
Add prints if verbosity>=1.
Reviewed By: kostik
Differential Revision: https://reviews.llvm.org/D110119
Florian Hahn [Tue, 21 Sep 2021 15:54:47 +0000 (16:54 +0100)]
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.
The use in VectorCombine is updated to pass through the DT to enable
additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
Michael Liao [Thu, 13 May 2021 13:36:48 +0000 (09:36 -0400)]
[SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D102821
Aleksandr Bezzubikov [Tue, 21 Sep 2021 14:41:53 +0000 (10:41 -0400)]
[GlobalISel] Support ConstantAsMetadata in IRTranslator
When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics), MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from properly translating such calls.
Tobias Gysi [Tue, 21 Sep 2021 15:09:32 +0000 (15:09 +0000)]
[mlir][linalg] Simplify slice dim computation for fusion on tensors (NFC).
Compute the tiled producer slice dimensions directly starting from the consumer not using the producer at all.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110147
Tobias Gysi [Tue, 21 Sep 2021 14:53:33 +0000 (14:53 +0000)]
[mlir][linalg] Add isPermutation helper (NFC).
Add a helper method to check if an index vector contains a permutation of its indices. Additionally, refactor applyPermutationToVector to take int64_t.
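A standalone sketch of the two helpers on `std::vector<int64_t>`; the names, container types, and the `Result[i] = Input[Permutation[i]]` convention are assumptions for illustration, not the MLIR signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if Indices contains each value in [0, Indices.size()) exactly once.
bool isPermutationSketch(const std::vector<int64_t> &Indices) {
  std::vector<bool> Seen(Indices.size(), false);
  for (int64_t I : Indices) {
    if (I < 0 || I >= static_cast<int64_t>(Indices.size()) || Seen[I])
      return false; // out of range or duplicated
    Seen[I] = true;
  }
  return true;
}

// Result[i] = Input[Permutation[i]].
template <typename T>
std::vector<T> applyPermutationSketch(const std::vector<T> &Input,
                                      const std::vector<int64_t> &Permutation) {
  assert(Input.size() == Permutation.size() && isPermutationSketch(Permutation));
  std::vector<T> Result;
  Result.reserve(Input.size());
  for (int64_t P : Permutation)
    Result.push_back(Input[P]);
  return Result;
}
```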
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110135
Dmitry Preobrazhensky [Tue, 21 Sep 2021 15:04:34 +0000 (18:04 +0300)]
[AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap
Differential Revision: https://reviews.llvm.org/D109616
Petar Avramovic [Tue, 21 Sep 2021 13:51:32 +0000 (15:51 +0200)]
AMDGPU/GlobalISel: Restore run line erased in D109154 by mistake
Andy Wingo [Wed, 4 Aug 2021 08:38:24 +0000 (10:38 +0200)]
[clang][NFC] Fix needless double-parenthesisation
Strip a layer of parentheses in TreeTransform::RebuildQualifiedType.
Differential Revision: https://reviews.llvm.org/D108359
David Green [Tue, 21 Sep 2021 14:37:00 +0000 (15:37 +0100)]
[AArch64] Regenerate test lines in and-mask-removal.ll
Nicolas Vasilache [Mon, 20 Sep 2021 14:04:07 +0000 (14:04 +0000)]
[mlir][Linalg] Revisit heuristic ordering of tensor.insert_slice in comprehensive bufferize.
It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.
This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.
The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted.
Differential Revision: https://reviews.llvm.org/D110072
Nicolas Vasilache [Mon, 20 Sep 2021 14:03:55 +0000 (14:03 +0000)]
[mlir][Linalg] Revisit RAW dependence interference in comprehensive bufferize.
Previously, comprehensive bufferize would consider all aliasing reads and writes to
the result buffer and matching operand. This resulted in spurious dependences
being considered and resulted in too many unnecessary copies.
Instead, this revision revisits the gathering of read and write alias sets.
This results in fewer allocs and copies.
An exhaustive test case is added that considers all possible permutations of
`matmul(extract_slice(fill), extract_slice(fill), ...)`.
Tobias Gysi [Tue, 21 Sep 2021 13:51:07 +0000 (13:51 +0000)]
[mlir][linalg] Assert tile loop nest invariants in fusion.
Assert the tile loop nest invariants are satisfied instead of failing silently.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110137
Chris Bieneman [Thu, 16 Sep 2021 01:42:12 +0000 (20:42 -0500)]
[NFC] `goto fail` has failed us in the past...
This patch replaces reliance on `goto failure` pattern with
`llvm::scope_exit`.
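A minimal scope guard in the spirit of `llvm::make_scope_exit` (from llvm/ADT/ScopeExit.h), written from scratch so it compiles without LLVM; `ScopeExitSketch` and `parseTwoFields` are hypothetical. The cleanup is registered once, runs on every early return, and is cancelled with `release()` on success, replacing a shared `failure:` label.

```cpp
#include <cassert>
#include <utility>

// Runs the stored callable on destruction unless release() was called.
template <typename Callable> class ScopeExitSketch {
  Callable Fn;
  bool Engaged = true;

public:
  explicit ScopeExitSketch(Callable F) : Fn(std::move(F)) {}
  ScopeExitSketch(const ScopeExitSketch &) = delete;
  ScopeExitSketch &operator=(const ScopeExitSketch &) = delete;
  ~ScopeExitSketch() {
    if (Engaged)
      Fn();
  }
  void release() { Engaged = false; }
};

// Each early return triggers the cleanup; the success path disarms it.
bool parseTwoFields(int A, int B, int &Cleanups) {
  ScopeExitSketch Cleanup([&] { ++Cleanups; }); // undo partial work on failure
  if (A < 0)
    return false;
  if (B < 0)
    return false;
  Cleanup.release();
  return true;
}
```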
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D109865
Ben Shi [Sun, 19 Sep 2021 09:42:20 +0000 (09:42 +0000)]
[RISCV] Optimize (add (mul x, c0), c1)
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI x, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12, while c1 is not.
Optimize (add (mul x, c0), c1) -> (MUL (ADDI x, c1/c0), c0),
if c1%c0 is zero, and c1/c0 is simm12 while c1 is not.
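The rewrite relies on the identity c1 == (c1/c0)*c0 + c1%c0, which holds for truncating integer division, so x*c0 + c1 == (x + c1/c0)*c0 + c1%c0. The C++ sketch below checks the simm12 conditions and the identity; it assumes signed truncating division and is not the actual RISC-V lowering code. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if V fits in a 12-bit signed immediate, as used by ADDI.
bool isSimm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Conditions for the first rewrite: c1 itself is not simm12,
// but both c1/c0 and c1%c0 are.
bool rewriteApplies(int64_t C0, int64_t C1) {
  return C0 != 0 && !isSimm12(C1) && isSimm12(C1 / C0) && isSimm12(C1 % C0);
}

// (add (mul x, c0), c1)
int64_t original(int64_t X, int64_t C0, int64_t C1) { return X * C0 + C1; }

// (ADDI (MUL (ADDI x, c1/c0), c0), c1%c0); the final ADDI is a no-op when
// c1%c0 is zero, which is the second form.
int64_t rewritten(int64_t X, int64_t C0, int64_t C1) {
  return (X + C1 / C0) * C0 + C1 % C0;
}
```

For example, with c0 = 100 and c1 = 4097, c1 needs more than 12 bits, but c1/c0 = 40 and c1%c0 = 97 both fit in an ADDI immediate.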
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108607
Justas Janickas [Wed, 15 Sep 2021 10:53:43 +0000 (11:53 +0100)]
[OpenCL] Defines helper function for OpenCL default address space
The helper function `getDefaultOpenCLPointeeAddrSpace()` is introduced in
the `ASTContext` class. It returns the default OpenCL address space
depending on the language version and enabled features. If the generic
address space is supported, the helper function returns
`LangAS::opencl_generic`. Otherwise, `LangAS::opencl_private`
is returned. Existing code is refactored to use the helper in several
suitable places.
Differential Revision: https://reviews.llvm.org/D109874
Anna Thomas [Mon, 20 Sep 2021 20:37:38 +0000 (16:37 -0400)]
[InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.
Also added an API for retrieving a unique undroppable user.
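To illustrate the distinction between uses and users: in `%y = add i32 %x, %x`, `%x` has two uses but only one user. Below is a toy sketch of a unique-user query with invented types; the API added by the patch additionally ignores droppable uses, which this sketch does not model.

```cpp
#include <cassert>
#include <vector>

// Toy model: each use records the id (non-negative) of the instruction using it.
struct Use {
  int UserId;
};

// Returns the id of the unique user, or -1 if there are no uses or the
// uses belong to different users. Two uses in the same user still count
// as a single user.
int getUniqueUserSketch(const std::vector<Use> &Uses) {
  int Unique = -1;
  for (const Use &U : Uses) {
    if (Unique == -1)
      Unique = U.UserId;
    else if (Unique != U.UserId)
      return -1;
  }
  return Unique;
}
```

An "exactly one use" check would reject the `add %x, %x` case, while a unique-user check accepts it.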
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109700
Saiyedul Islam [Mon, 20 Sep 2021 17:25:41 +0000 (22:55 +0530)]
[clang-offload-bundler][docs][NFC] Add archive unbundling documentation
Add documentation of unbundling of heterogeneous device archives to
create device specific archives, as introduced by D93525. Also, add
documentation for supported text file formats.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D110083
OGINO Masanori [Tue, 21 Sep 2021 15:24:48 +0000 (17:24 +0200)]
[NFC] Update the list of subprojects in docs.
The updated list is based on the output of
cmake -G Ninja -S llvm -B build -DLLVM_ENABLE_PROJECTS='foo'.
Differential Revision: https://reviews.llvm.org/D110124
Sanjay Patel [Mon, 20 Sep 2021 19:36:36 +0000 (15:36 -0400)]
[InstCombine] add tests for mask-shift with trunc; NFC
Dmitry Preobrazhensky [Tue, 21 Sep 2021 13:21:44 +0000 (16:21 +0300)]
[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics
Differential Revision: https://reviews.llvm.org/D109614
hyeongyu kim [Tue, 21 Sep 2021 12:48:04 +0000 (21:48 +0900)]
[IR] Add the constructor of ShuffleVector for one-input-vector.
One of the two inputs of the Shufflevector is often a placeholder.
Previously, there were cases where the placeholder was undef, and there were cases where it was poison.
I added these constructors to create a placeholder consistently.
Switching existing code over to the newly added constructors will be done in a separate patch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110146
Nico Weber [Tue, 21 Sep 2021 13:01:37 +0000 (09:01 -0400)]
[llvm] Pass LLVM_CHECK_ENABLED_PROJECTS through in cross builds
Jonas Paulsson [Thu, 9 Sep 2021 15:26:50 +0000 (17:26 +0200)]
[SystemZ] Emit EXRL target instructions before text section is ended.
SystemZ adds the EXRL target instructions at the end of each file. This must
be done before debug info emission since that may end the text section, and
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109513
Florian Hahn [Tue, 21 Sep 2021 11:51:07 +0000 (12:51 +0100)]
[VectorCombine] Add tests which require DT to use info from assumes.
Nicholas Guy [Wed, 8 Sep 2021 15:19:17 +0000 (16:19 +0100)]
[AArch64] Improve schedule modelling on the Cortex-A55
Enables the FuseAddress feature in the Cortex-A55 scheduling model
Differential Revision: https://reviews.llvm.org/D109323
Simon Pilgrim [Tue, 21 Sep 2021 11:23:52 +0000 (12:23 +0100)]
[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824)
If getAggregateElement() returns null for any element, bail out early, as otherwise we will assert when creating a new constant vector.
Fixes PR51824 and OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
Simon Pilgrim [Tue, 21 Sep 2021 09:19:12 +0000 (10:19 +0100)]
[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
Simon Pilgrim [Tue, 21 Sep 2021 09:14:02 +0000 (10:14 +0100)]
RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
Simon Pilgrim [Tue, 21 Sep 2021 09:12:56 +0000 (10:12 +0100)]
[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
Dmitry Vyukov [Tue, 21 Sep 2021 08:20:24 +0000 (10:20 +0200)]
tsan: simplify thread context setting
Currently we set thr->tctx after OnStarted callback
taking thread registry mutex again and searching for the context.
But OnStarted already runs under the thread registry mutex
and has access to the context, so set it in the OnStarted.
This makes code simpler and faster.
Depends on D110132.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110133
Dmitry Vyukov [Tue, 21 Sep 2021 08:08:56 +0000 (10:08 +0200)]
tsan: rearrange thread state callbacks (NFC)
Thread state functions are split into 2 parts:
tsan entry function (e.g. ThreadStart) and thread registry
state change callback (e.g. OnStart). Currently these
pairs of functions are located far from each other and
in reverse order. This makes it hard to read and follow the logic.
Reorder the code so that OnFoo directly follows ThreadFoo.
No other code changes.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110132
Dmitry Vyukov [Tue, 21 Sep 2021 07:58:14 +0000 (09:58 +0200)]
tsan: fix debug format strings
Some of the DPrintf's currently produce -Wformat warnings if enabled.
Fix these format strings.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110131
Jay Foad [Mon, 20 Sep 2021 14:44:32 +0000 (15:44 +0100)]
[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN
FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac
if there are no source modifiers, just like we do for other mad/mac and
fma/fmac cases.
Differential Revision: https://reviews.llvm.org/D110074
Jay Foad [Mon, 20 Sep 2021 13:20:28 +0000 (14:20 +0100)]
[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used
v_fmac with source modifiers forces VOP3 encoding, but it is strictly
better to use the VOP3-only v_fma instead, because $dst and $src2 are
not tied so it gives the register allocator more freedom and avoids a
copy in some cases.
This is the same strategy we already use for v_mad vs v_mac and
v_fma_legacy vs v_fmac_legacy.
Differential Revision: https://reviews.llvm.org/D110070
David Green [Tue, 21 Sep 2021 10:44:41 +0000 (11:44 +0100)]
[AArch64] Regenerate test lines in sve-implicit-zero-filling.ll
Max Kazantsev [Tue, 21 Sep 2021 10:10:43 +0000 (17:10 +0700)]
[SCEV] Use isAvailableAtLoopEntry in the asserts
This is what is supposed to be there.
Petar Avramovic [Tue, 21 Sep 2021 09:54:12 +0000 (11:54 +0200)]
GlobalISel/Utils: Refactor constant splat match functions
Add a generic helper function that matches a constant splat. It has an option to
match a constant splat with undef (some elements can be undef, but not all).
Add util function and matcher for G_FCONSTANT splat.
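Here is a hypothetical model of the matching rule, in which each vector element is either a known constant or undef (`std::nullopt`). It is not the GlobalISel API, and `matchConstantSplat` is an invented name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Each element is either a known constant or undef (std::nullopt).
using Elements = std::vector<std::optional<int64_t>>;

// Returns the splat value if all defined elements are the same constant.
// With AllowUndef, undef elements are skipped, but at least one element
// must be defined: an all-undef vector is not treated as a splat.
std::optional<int64_t> matchConstantSplat(const Elements &Elts, bool AllowUndef) {
  std::optional<int64_t> Splat;
  for (const auto &E : Elts) {
    if (!E) {
      if (!AllowUndef)
        return std::nullopt;
      continue;
    }
    if (Splat && *Splat != *E)
      return std::nullopt;
    Splat = E;
  }
  return Splat; // nullopt if every element was undef
}
```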
Differential Revision: https://reviews.llvm.org/D104410
Max Kazantsev [Tue, 21 Sep 2021 10:05:08 +0000 (17:05 +0700)]
[SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond
The logic in howManyLessThans is fishy. It first checks invariance of
RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which
is, strictly saying, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Before we can figure out what's going on, adding asserts
that all involved values that may possibly to to isLoopEntryGuardedByCond
are available at loop entry.
If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.
David Stenberg [Tue, 21 Sep 2021 07:44:51 +0000 (09:44 +0200)]
[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().
With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():
  [...]
  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i
  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable
  cont4.i:
    unreachable
  [...]
with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.
This is solved by using a value handle worklist. I am unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist containing just the interesting branch instructions,
but I thought that it was perhaps a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.
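A standalone analogy of why a handle-based worklist avoids the use-after-free
(a sketch, not the patch's code): std::weak_ptr stands in for an LLVM value
handle, and the Inst type and the "br"/"phi" names are hypothetical. Rewriting
the branch deletes the PHI node, and the later visit sees an expired handle
instead of freed memory.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Inst {
  std::string Name;
};

// Processes a worklist of weak handles. If an object is deleted while the
// worklist is being processed, its handle expires instead of dangling.
// Returns the names of the instructions that were still alive when visited.
std::vector<std::string>
processWorklist(std::vector<std::shared_ptr<Inst>> &Owned,
                const std::vector<std::weak_ptr<Inst>> &Worklist) {
  std::vector<std::string> Visited;
  for (const std::weak_ptr<Inst> &Handle : Worklist) {
    std::shared_ptr<Inst> I = Handle.lock();
    if (!I)
      continue; // deleted by an earlier rewrite: skip it safely
    Visited.push_back(I->Name);
    // Simulate rewriting the branch, which deletes the dependent PHI node.
    if (I->Name == "br")
      for (std::shared_ptr<Inst> &O : Owned)
        if (O && O->Name == "phi")
          O.reset();
  }
  return Visited;
}
```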
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D109221
Justas Janickas [Thu, 9 Sep 2021 08:32:39 +0000 (09:32 +0100)]
[OpenCL] Test case for C++ for OpenCL 2021 in OpenCL C header test
A RUN line representing C++ for OpenCL 2021 is added to the test. This
should have been done as part of the earlier commit
fb321c2ea274 but
was missed during rebasing.
Differential Revision: https://reviews.llvm.org/D109492
Uday Bondhugula [Tue, 21 Sep 2021 08:53:22 +0000 (14:23 +0530)]
[MLIR] NFC. gpu.launch op argument const folder cleanup
NFC updates to gpu.launch op argument const folder.
Differential Revision: https://reviews.llvm.org/D110136
Andrzej Warzynski [Thu, 16 Sep 2021 07:34:55 +0000 (07:34 +0000)]
[flang][docs] Document plugin limitations
This was extracted from the discussion on
https://reviews.llvm.org/D108283.
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Differential Revision: https://reviews.llvm.org/D109871
Sylvestre Ledru [Tue, 21 Sep 2021 08:44:08 +0000 (10:44 +0200)]
Add CMAKE_BUILD_TYPE to the list of BOOTSTRAP_DEFAULT_PASSTHROUGH variables
When building clang in stage2 with -DCMAKE_BUILD_TYPE=RelWithDebInfo set,
the developer can expect the stage2 clang to be built in the same mode,
especially since performance is much worse in debug mode.
(Principle of least astonishment)
Differential Revision: https://reviews.llvm.org/D53014
Cullen Rhodes [Tue, 21 Sep 2021 07:40:36 +0000 (07:40 +0000)]
[PowerPC] NFC: Remove unused tblgen template args
Identified in D109359.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D109715
Morten Borup Petersen [Mon, 20 Sep 2021 18:02:00 +0000 (19:02 +0100)]
[MLIR][SCF] Add for-to-while loop transformation pass
This pass transforms SCF.ForOp operations into SCF.WhileOp operations. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable increment plus the loop body are placed in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the increment of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
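The control structure of the rewrite can be sketched in plain C++ (a model of
the idea, not MLIR code; the function names are hypothetical). The while form
carries (iv, acc) as its loop-carried state: a "before" step computes the
condition, and an "after" step runs the body and yields the incremented IV
together with the updated iter_arg.

```c++
#include <cassert>
#include <utility>

// For-loop form: iv runs from Lb to Ub (exclusive) in steps of Step, with
// one loop-carried value (an "iter_arg") that accumulates iv.
int sumFor(int Lb, int Ub, int Step, int Init) {
  int Acc = Init;
  for (int Iv = Lb; Iv < Ub; Iv += Step)
    Acc += Iv;
  return Acc;
}

// While-loop form of the same computation; the carried state is (iv, acc).
int sumWhile(int Lb, int Ub, int Step, int Init) {
  std::pair<int, int> State{Lb, Init};
  while (true) {
    // "before" region: compute the loop condition from the carried values.
    bool Cond = State.first < Ub;
    if (!Cond)
      break;
    // "after" region: loop body plus the IV increment, yielding new state.
    int Iv = State.first;
    int Acc = State.second + Iv;
    State = {Iv + Step, Acc};
  }
  return State.second;
}
```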
Differential Revision: https://reviews.llvm.org/D108454
Pavel Labath [Tue, 21 Sep 2021 07:55:56 +0000 (09:55 +0200)]
[lldb] Speculative fix to TestGuiExpandThreadsTree
This test relies on being able to unwind from an arbitrary place inside
libc. While I am not sure this is the cause of the observed flakiness,
it is known that we are not able to unwind correctly from some places in
(linux) libc.
This patch adds additional synchronization to ensure that the inferior
is in the main function (instead of pthread guts) when lldb tries to
unwind it. At the very least, it should make the test runs more
predictable/repeatable.
Kunwar Shaanjeet Singh Grover [Tue, 21 Sep 2021 07:29:49 +0000 (12:59 +0530)]
[MLIR] Add mergeLocalIds and mergeSymbolIds
This patch adds mergeLocalIds and mergeSymbolIds as public functions
for FlatAffineConstraints and FlatAffineValueConstraints respectively.
mergeLocalIds is also required to support divisions in intersection,
subtraction, equality checks, and complement for PresburgerSet.
This patch is part of a series of patches aimed at generalizing affine
dependence analysis.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110045
Nathan Ridge [Mon, 20 Sep 2021 06:29:02 +0000 (02:29 -0400)]
[clangd] Deduplicate inlay hints
Duplicates can sometimes appear due to, e.g., explicit template
instantiations.
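One straightforward way to deduplicate is to sort the hints and drop exact
duplicates. This sketch assumes hints compare field by field; the InlayHint
struct here is hypothetical and much simpler than clangd's real hint type.

```c++
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified inlay hint: a source position plus a label.
struct InlayHint {
  unsigned Line;
  unsigned Column;
  std::string Label;

  bool operator<(const InlayHint &O) const {
    return std::tie(Line, Column, Label) < std::tie(O.Line, O.Column, O.Label);
  }
  bool operator==(const InlayHint &O) const {
    return std::tie(Line, Column, Label) == std::tie(O.Line, O.Column, O.Label);
  }
};

// Sorts hints by position and removes exact duplicates, e.g. the same hint
// produced once per explicit template instantiation.
void deduplicateHints(std::vector<InlayHint> &Hints) {
  std::sort(Hints.begin(), Hints.end());
  Hints.erase(std::unique(Hints.begin(), Hints.end()), Hints.end());
}
```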
Differential Revision: https://reviews.llvm.org/D110051
Amara Emerson [Tue, 14 Sep 2021 08:56:38 +0000 (01:56 -0700)]
[GlobalISel][Legalizer] Use ArtifactValueFinder first for unmerge combines before trying others.
This is motivated by a pathological compile-time issue during unmerge combining.
We should be able to use the AVF to do simplification. However, AMDGPU
has a lot of codegen changes which I'm not sure how to evaluate.
Differential Revision: https://reviews.llvm.org/D109748
Evgeniy Brevnov [Wed, 28 Jul 2021 10:13:45 +0000 (17:13 +0700)]
[DSE][NFC] Rename Later->Killing, Earlier->Dead
The first (and biggest) change is to use "Killing/Dead" instead of "Later/Earlier" as the base for names in DSE. For example, [Maybe]DeadLoc is a location killed by the KillingI instruction. I believe such names are more descriptive and easier to understand than the current ones.
Second, there were inconsistencies in naming, where different names were used for the same thing. This patch fixes those too.
Third, the parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, and isOverwrite are reordered to make them consistent with each other. This greatly reduces the potential for mistakes.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D106947
Amara Emerson [Tue, 14 Sep 2021 09:39:57 +0000 (02:39 -0700)]
[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts.
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.
To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.
In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.
Differential Revision: https://reviews.llvm.org/D109750
Max Kazantsev [Tue, 21 Sep 2021 04:08:29 +0000 (11:08 +0700)]
[SCEV] Generalize implication when signedness of FoundPred doesn't matter
The implication logic for two values that are both negative or both
non-negative says that it doesn't matter whether their predicate is signed or
unsigned, but it only flips unsigned into signed for further inference. This
patch adds support for flipping a signed predicate into unsigned as well.
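The fact being used can be checked directly: when two 32bit values have the
same sign, a signed and an unsigned comparison agree, so the predicate can be
flipped in either direction. This is a standalone illustration with
hypothetical helper names, checked over a few sample values rather than
exhaustively; it is not SCEV code.

```c++
#include <cassert>
#include <cstdint>

// True when both values are negative or both are non-negative.
bool sameSign(int32_t A, int32_t B) { return (A < 0) == (B < 0); }

bool signedLess(int32_t A, int32_t B) { return A < B; }

bool unsignedLess(int32_t A, int32_t B) {
  return static_cast<uint32_t>(A) < static_cast<uint32_t>(B);
}

// Checks, over a small set of sample values, that signed and unsigned "<"
// agree whenever the operands have the same sign.
bool predicatesAgreeWhenSameSign() {
  const int32_t Samples[] = {INT32_MIN, -1000, -1, 0, 1, 1000, INT32_MAX};
  for (int32_t A : Samples)
    for (int32_t B : Samples)
      if (sameSign(A, B) && signedLess(A, B) != unsignedLess(A, B))
        return false;
  return true;
}
```

With mixed signs the two predicates can disagree: -1 < 0 as signed values,
but as unsigned values -1 becomes 0xFFFFFFFF.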
Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
Yonghong Song [Sat, 31 Jul 2021 18:34:29 +0000 (11:34 -0700)]
BPF: make 32bit register spill with 64bit alignment
In llvm, for non-alu32 mode, the stack alignment is 64bit, so there is
only one 64bit spill per 64bit slot. For alu32 mode, the stack alignment
is 32bit, so it is possible to have two 32bit spills per
64bit slot.
Currently, the bpf kernel verifier does not preserve register states
for 32bit spills. That is, a 32bit register may hold a constant
value or a bounded range before the spill. After reload from the
stack, that information is lost, and sometimes this may cause
verifier failure. For 64bit register spills, the verifier
does try to preserve the register state for reloading.
The current verifier can be modestly changed to handle one
32bit spill per 64bit stack slot with state-preserving reload.
Handling two 32bit spills per 64bit stack slot will require
substantial changes.
This patch changes the stack alignment for alu32 to 64bit.
This way, for any 64bit slot in alu32 mode, only one
32bit or 64bit register value can be saved. Together
with the previously mentioned verifier enhancement, 32bit
spills can be handled with state-preserving reloads.
Note that llvm stack slot coalescing
seems to do only adjacent packing, which may leave some holes
in the stack. For example,
  stack slot 8 <== 8 bytes
  stack slot 4 <== 8 bytes with 4 byte hole
  stack slot 8 <== 8 bytes
  stack slot 4 <== 4 bytes
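The hole in the example can be reproduced with simple offset arithmetic. This
is a simplified model under the assumption that slots are placed one after
another, each starting at an aligned offset; it is not LLVM's frame lowering,
and the function names are hypothetical.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds Offset up to the next multiple of Align (a power of two).
uint64_t alignUp(uint64_t Offset, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Places slots adjacently, starting each one at an Align-aligned offset.
// Returns the start offset of every slot and stores the number of bytes
// used in TotalSize.
std::vector<uint64_t> layoutSlots(const std::vector<uint64_t> &Sizes,
                                  uint64_t Align, uint64_t &TotalSize) {
  std::vector<uint64_t> Offsets;
  uint64_t Cur = 0;
  for (uint64_t Size : Sizes) {
    Cur = alignUp(Cur, Align);
    Offsets.push_back(Cur);
    Cur += Size;
  }
  TotalSize = Cur;
  return Offsets;
}
```

With Sizes {8, 4, 8, 4} and 8-byte alignment, the slots start at 0, 8, 16 and
24, leaving a 4-byte hole after the second slot (28 bytes in total); with
4-byte alignment they would start at 0, 8, 12 and 20 (24 bytes).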
Differential Revision: https://reviews.llvm.org/D109073
Chris Lattner [Tue, 21 Sep 2021 01:27:40 +0000 (18:27 -0700)]
[OpAsmParser] Add a parseCommaSeparatedList helper and beef up Delimiter.
Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself. This provides a standard interface for doing this that
is less error prone and needs less boilerplate.
While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).
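The shape of such a helper can be sketched with a toy character-level parser.
Everything here (MiniParser, the Delimiter values, the callback signature) is
hypothetical and is not the OpAsmParser API, whose exact signature is not given
above. The point is that the caller supplies only the per-element parsing, and
the helper handles delimiters, commas and the empty-list case.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical delimiter kinds, loosely modeled on the ones named above.
enum class Delimiter { None, Paren, Square, LessGreater, Braces };

struct MiniParser {
  std::string Text;
  std::size_t Pos = 0;

  void skipSpaces() {
    while (Pos < Text.size() &&
           std::isspace(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
  }
  // Consumes C if it is the next non-space character.
  bool consume(char C) {
    skipSpaces();
    if (Pos < Text.size() && Text[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }
  // Parses a non-negative decimal integer.
  bool parseInt(int &Out) {
    skipSpaces();
    std::size_t Start = Pos;
    while (Pos < Text.size() &&
           std::isdigit(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
    if (Pos == Start)
      return false;
    Out = std::stoi(Text.substr(Start, Pos - Start));
    return true;
  }
  // Parses "elt, elt, ...", optionally wrapped in delimiters, calling
  // ParseElt once per element. An empty list is accepted only when
  // delimiters are present, e.g. "()" or "<>".
  bool parseCommaSeparatedList(Delimiter D,
                               const std::function<bool()> &ParseElt) {
    char Open = 0, Close = 0;
    switch (D) {
    case Delimiter::None: break;
    case Delimiter::Paren: Open = '('; Close = ')'; break;
    case Delimiter::Square: Open = '['; Close = ']'; break;
    case Delimiter::LessGreater: Open = '<'; Close = '>'; break;
    case Delimiter::Braces: Open = '{'; Close = '}'; break;
    }
    if (Open) {
      if (!consume(Open))
        return false;
      if (consume(Close))
        return true; // empty list
    }
    do {
      if (!ParseElt())
        return false;
    } while (consume(','));
    return !Open || consume(Close);
  }
};
```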
Differential Revision: https://reviews.llvm.org/D110122
Max Kazantsev [Tue, 21 Sep 2021 03:22:40 +0000 (10:22 +0700)]
[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block
When following some cases of a switch instruction is guaranteed to lead to
UB, we can safely break those edges and redirect those cases into a newly
created unreachable block. As a result, the CFG becomes simpler and we can
remove some of the Phi inputs, which makes further analyses easier.
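A toy model of the redirection step, assuming a switch is represented as a map
from case value to successor block name. The real transform works on LLVM IR
and itself decides which successors are guaranteed to reach UB; here that
decision is simply an input set, and all names are hypothetical.

```c++
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified switch: case value -> successor block name.
using SwitchCases = std::map<int, std::string>;

// Redirects every case whose successor is known to lead to UB into a single
// new unreachable block. Returns the number of redirected cases.
int redirectUBCases(SwitchCases &Cases, const std::set<std::string> &UBBlocks,
                    const std::string &UnreachableBlock) {
  int Redirected = 0;
  for (auto &Entry : Cases)
    if (UBBlocks.count(Entry.second)) {
      Entry.second = UnreachableBlock;
      ++Redirected;
    }
  return Redirected;
}
```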
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
Michael Kruse [Tue, 21 Sep 2021 02:32:57 +0000 (21:32 -0500)]
[Polly] Don't generate inter-iteration noalias metadata.
This metadata was intended to mark all accesses within an iteration as pairwise non-aliasing, in this case because every memory location of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.
The implementation had multiple issues:
* The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.
* The at-most-once assumption does not hold for the outer loops of the BLIS matrix multiplication, where the metadata was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop that does not appear in its index iterates.
* Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on the fly whenever another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out on those SCEVs they should not be aliasing with.
* Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues, especially after unrolling, where there would probably be at least one SCEV per instruction and unroll instance.
* The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it as noalias as well.
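The point about the matrix multiplication can be checked with a naive i-j-k
loop nest (a sketch, not the BLIS-style tiled code Polly generates; the names
are hypothetical). Each element of C is touched K times, each element of A N
times and each element of B M times, i.e. as often as the loop missing from
its subscript iterates, so the "at most once" premise does not hold.

```c++
#include <cassert>
#include <vector>

// Per-element access counts for C, A and B, stored row-major.
struct AccessCounts {
  std::vector<int> C, A, B;
};

// Counts accesses made by C[i][j] += A[i][k] * B[k][j] over an M x N x K
// loop nest. The read-modify-write of C[i][j] is counted once per iteration.
AccessCounts countMatmulAccesses(int M, int N, int K) {
  AccessCounts Cnt{std::vector<int>(M * N), std::vector<int>(M * K),
                   std::vector<int>(K * N)};
  for (int I = 0; I < M; ++I)
    for (int J = 0; J < N; ++J)
      for (int P = 0; P < K; ++P) {
        ++Cnt.C[I * N + J];
        ++Cnt.A[I * K + P];
        ++Cnt.B[P * N + J];
      }
  return Cnt;
}
```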
A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is what scev-aa does as well.
This effectively reverts D30606 and D35761.
Max Kazantsev [Tue, 21 Sep 2021 02:29:33 +0000 (09:29 +0700)]
[NFC] Rename Context->CtxI in SCEV for uniformity reasons
Kazu Hirata [Tue, 21 Sep 2021 02:30:02 +0000 (19:30 -0700)]
[llvm] Use make_early_inc_range (NFC)
River Riddle [Tue, 21 Sep 2021 01:40:45 +0000 (01:40 +0000)]
[mlir] Refactor ElementsAttr into an AttrInterface
This revision refactors ElementsAttr into an Attribute Interface.
This enables a common interface with which to interact with
element attributes, without needing to modify the builtin
dialect. It also removes a majority (if not all?) of the need for
the current OpaqueElementsAttr, which was originally intended as
a way to opaquely represent data that was not representable by
the other builtin constructs.
The new ElementsAttr interface not only allows for users to
natively represent their data in the way that best suits them,
it also allows for efficient opaque access and iteration of the
underlying data. Attributes using the ElementsAttr interface
can directly expose support for interacting with the held
elements using any C++ data type they claim to support. For
example, DenseIntOrFpElementsAttr supports iteration using
various native C++ integer/float data types, as well as
APInt/APFloat, and more. ElementsAttr instances that refer to
DenseIntOrFpElementsAttr can use all of these data types for
iteration:
```c++
DenseIntOrFpElementsAttr intElementsAttr = ...;
ElementsAttr attr = intElementsAttr;
for (uint64_t value : attr.getValues<uint64_t>())
  ...;
for (APInt value : attr.getValues<APInt>())
  ...;
for (IntegerAttr value : attr.getValues<IntegerAttr>())
  ...;
```
ElementsAttr also supports failable range/iterator access,
allowing for selective code paths depending on data type
support:
```c++
ElementsAttr attr = ...;
if (auto range = attr.tryGetValues<uint64_t>()) {
  for (uint64_t value : *range)
    ...;
}
```
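The failable, type-dependent access pattern can be illustrated without MLIR.
This is a hypothetical stand-in, not the ElementsAttr interface: it uses
std::optional for the failable result (the actual MLIR return type is not
shown above) and supports any integral element type while rejecting others.

```c++
#include <cassert>
#include <cstdint>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for an attribute holding dense integer elements.
class DenseIntElements {
  std::vector<int64_t> Storage;

public:
  explicit DenseIntElements(std::vector<int64_t> Values)
      : Storage(std::move(Values)) {}

  // Returns the elements converted to T if T is supported (any integral
  // type here), or std::nullopt if this attribute cannot provide T.
  template <typename T> std::optional<std::vector<T>> tryGetValues() const {
    if constexpr (std::is_integral_v<T>) {
      return std::vector<T>(Storage.begin(), Storage.end());
    } else {
      return std::nullopt; // element type not supported
    }
  }
};
```

Callers can then branch on support, e.g. take a fast path when
tryGetValues<uint64_t>() succeeds and fall back otherwise.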
Differential Revision: https://reviews.llvm.org/D109190