platform/upstream/llvm.git
2 years agoMemoryBuiltins: start using properties of functions
Augie Fackler [Wed, 30 Mar 2022 18:14:53 +0000 (14:14 -0400)]
MemoryBuiltins: start using properties of functions

Prior to this change, we relied on the hard-coded list for all of the
information provided by MemoryBuiltins. With this change, we're able to
start relying on properties of functions described in attributes, which
opens the door to out-of-tree compilers being able to describe their
allocator functions to LLVM's optimizer logic without having to register
their implementation details with LLVM.

Differential Revision: https://reviews.llvm.org/D123090

2 years ago[PatternMatch][InstCombine] match a vector with constant expression element(s) as...
Sanjay Patel [Thu, 21 Jul 2022 19:07:06 +0000 (15:07 -0400)]
[PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression

The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.

Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.

Differential Revision: https://reviews.llvm.org/D130286

2 years ago[PatternMatch] add tests for constant expression matcher; NFC
Sanjay Patel [Thu, 21 Jul 2022 15:52:23 +0000 (11:52 -0400)]
[PatternMatch] add tests for constant expression matcher; NFC

2 years ago[LoopAccessAnalysis] Simplify D119047
Arthur Eubanks [Wed, 9 Feb 2022 22:18:14 +0000 (14:18 -0800)]
[LoopAccessAnalysis] Simplify D119047

No need to add checks for every type per pointer that we couldn't create
a check for the first time around, just the types that weren't
successful.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119376

2 years ago[RISCV][LV] Split coverage of uniform load with outside use
Philip Reames [Thu, 21 Jul 2022 19:02:17 +0000 (12:02 -0700)]
[RISCV][LV] Split coverage of uniform load with outside use

Turns out this has a large effect on tail folding, so split out a single test to cover that case and remove it from the others.

2 years ago[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore
John Ericson [Sat, 11 Jun 2022 06:11:59 +0000 (06:11 +0000)]
[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore

First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.

Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.

@beanz added it in d0e1c2a550ef348aae036d0fe78cab6f038c420c to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.

Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.

To avoid the problems I face, and restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.

For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117977

2 years ago[mlir] Flip dialects to _Prefixed
Jacques Pienaar [Thu, 21 Jul 2022 19:03:07 +0000 (12:03 -0700)]
[mlir] Flip dialects to _Prefixed

At least two weeks have passed since these were flipped to _Both. Made some additional
NFC changes in .td files that were not converted earlier.

2 years ago[NFC] Fix compiler warning in MarkupFilter
Daniel Thornburgh [Thu, 21 Jul 2022 19:00:15 +0000 (12:00 -0700)]
[NFC] Fix compiler warning in MarkupFilter

2 years ago[flang] Run algebraic simplification optimization pass.
Slava Zakharin [Thu, 14 Jul 2022 23:50:41 +0000 (16:50 -0700)]
[flang] Run algebraic simplification optimization pass.

Try 2 to merge 4fbd1d6c872e8228f23a6e13914222af40ca6461.

Flang algebraic simplification pass will run algebraic simplification
rewrite patterns for Math/Complex/etc. dialects. It is enabled
under opt-for-speed optimization levels (i.e. for O1/O2/O3; Os/Oz will not
enable it).

With this change the FIR/MLIR optimization pipeline becomes affected
by the -O* optimization level switches. Until now these switches
only affected the middle-end and back-end.

Differential Revision: https://reviews.llvm.org/D130035

2 years agoAdding a new variant of DepthwiseConv2D
George Petterson [Thu, 21 Jul 2022 18:36:47 +0000 (14:36 -0400)]
Adding a new variant of DepthwiseConv2D

This is the same as the existing multiplier-1 variant of DepthwiseConv2D, but in PyTorch dim order.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D128575

2 years ago[Symbolizer] Implement contextual symbolizer markup elements.
Daniel Thornburgh [Fri, 10 Jun 2022 23:11:36 +0000 (16:11 -0700)]
[Symbolizer] Implement contextual symbolizer markup elements.

This change implements the contextual symbolizer markup elements: reset,
module, and mmap. These provide information about the runtime context of
the binary necessary to resolve addresses to symbolic values.

Summary information is printed to the output about this context.
Multiple mmap elements for the same module line are coalesced together.
The standard requires that such elements occur on their own lines to
allow for this; accordingly, anything after a contextual element on a
line is silently discarded.

Implementing this cleanly requires that the filter drive the parser;
this allows skipped sections to avoid being parsed. This also makes the
filter quite a bit easier to use, at the cost of some unused
flexibility.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D129519

2 years ago[llvm-cov] Improve error message by printing the object file name that produces error
Zequan Wu [Thu, 21 Jul 2022 17:18:16 +0000 (10:18 -0700)]
[llvm-cov] Improve error message by printing the object file name that produces error

If an error occurs while constructing coverage info for one of the object files, llvm-cov prints the name of that object file, so that users know which one caused the error.

Differential Revision: https://reviews.llvm.org/D130196

2 years ago[SemaCXX] Set promotion type for enum if its type is promotable to integer type even...
Zequan Wu [Wed, 20 Jul 2022 23:08:25 +0000 (16:08 -0700)]
[SemaCXX] Set promotion type for enum if its type is promotable to integer type even if it has no definition.

EnumDecl's promotion type is set either to the parsed type or calculated type
after completing its definition. When it's bool type and has no definition,
its promotion type is bool which is not allowed by clang.

Fixes #56560.

Differential Revision: https://reviews.llvm.org/D130210

2 years ago[RISCV][LV] Add tail folding coverage of uniform load store cases
Philip Reames [Thu, 21 Jul 2022 18:07:40 +0000 (11:07 -0700)]
[RISCV][LV] Add tail folding coverage of uniform load store cases

2 years ago[RISCV][LV] Add a test for uniform store of a loop varying value
Philip Reames [Thu, 21 Jul 2022 18:04:23 +0000 (11:04 -0700)]
[RISCV][LV] Add a test for uniform store of a loop varying value

2 years ago[NFC] Empty commit to test commit access
Anubhab Ghosh [Thu, 21 Jul 2022 17:58:10 +0000 (23:28 +0530)]
[NFC] Empty commit to test commit access

2 years ago[lld-macho] Fix LOH parsing segfault
Jez Ng [Thu, 21 Jul 2022 17:58:15 +0000 (13:58 -0400)]
[lld-macho] Fix LOH parsing segfault

`advanceSubsection()` didn't account for the possibility that a section
could have no subsections.

Reviewed By: #lld-macho, thakis, BertalanD

Differential Revision: https://reviews.llvm.org/D130288

2 years agoFix type in documentation
Javed Absar [Thu, 21 Jul 2022 14:47:27 +0000 (15:47 +0100)]
Fix type in documentation

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D130274

2 years ago[RISCV][LV] Split out and expand tests for uniform loads and stores
Philip Reames [Thu, 21 Jul 2022 17:38:25 +0000 (10:38 -0700)]
[RISCV][LV] Split out and expand tests for uniform loads and stores

2 years ago[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive
Johannes Doerfert [Tue, 12 Jul 2022 02:42:16 +0000 (21:42 -0500)]
[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

A copy-paste error caused UB in the definition of the unsigned long long
versions of the shfl intrinsics. Reported and diagnosed by @trws.

Differential Revision: https://reviews.llvm.org/D129536

2 years ago[OpenMP] Introduce more fine-grained control over the thread state use
Johannes Doerfert [Mon, 18 Jul 2022 20:44:02 +0000 (15:44 -0500)]
[OpenMP] Introduce more fine-grained control over the thread state use

We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we introduce a new
state flag (`state::HasThreadState`) and explicit control for the
`state::ValueRAII` helpers, including a dedicated "assert equal".

Differential Revision: https://reviews.llvm.org/D130113

2 years ago[OpenMP] Use Undef instead of null as pointer for inactive lanes
Johannes Doerfert [Wed, 13 Jul 2022 16:01:54 +0000 (11:01 -0500)]
[OpenMP] Use Undef instead of null as pointer for inactive lanes

Our conditional writes in the runtime look like this:
```
  if (active)
    *ptr = value;
```
In the RAII we need to assign `ptr` which comes from a lookup call.
If a thread that is not the main thread calls lookup with the intention
to write the pointer, we'll create a new thread state. As such, we need
to avoid calling lookup for inactive threads. We used to use `nullptr`
as their `ptr` value but that can cause pessimistic reasoning. We now
use `undef` instead.

Differential Revision: https://reviews.llvm.org/D130114

2 years ago[OpenMP] Expose the state in the header to allow non-lto optimizations
Johannes Doerfert [Tue, 19 Jul 2022 19:22:23 +0000 (14:22 -0500)]
[OpenMP] Expose the state in the header to allow non-lto optimizations

We used to inline the `lookup` calls such that the runtime had "known"
access offsets when it was shipped. With the new static library build it
doesn't as the lookup is an indirection we cannot look through. This
should help us optimize the code better until we can do LTO for the
runtime again.

Differential Revision: https://reviews.llvm.org/D130111

2 years ago[Libomptarget] Add checks for CUDA subarchitecture using new info
Joseph Huber [Fri, 10 Jun 2022 13:37:21 +0000 (09:37 -0400)]
[Libomptarget] Add checks for CUDA subarchitecture using new info

This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.

Depends on D127432

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D127505

2 years ago[Libomptarget] Add support for offloading binaries in libomptarget
Joseph Huber [Fri, 10 Jun 2022 19:16:15 +0000 (15:16 -0400)]
[Libomptarget] Add support for offloading binaries in libomptarget

The previous patch changed the linker wrapper to embed the offloading
binary format inside the target image instead. This will allow us to
more generically bundle metadata with these images, such as requires
clauses or the target architecture it was compiled for.

I wasn't sure how to handle this best, so I introduced a new type that
replaces the old `__tgt_device_image` struct that we can expand inside
the runtime library. I made the new `__tgt_device_binary` struct pretty
much the same for now. In the future we could change this struct to
pretty much be the `OffloadBinary` class.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127432

2 years ago[LinkerWrapper] Embed OffloadBinaries for OpenMP offloading images
Joseph Huber [Wed, 8 Jun 2022 13:38:41 +0000 (09:38 -0400)]
[LinkerWrapper] Embed OffloadBinaries for OpenMP offloading images

The OpenMP offloading runtime currently uses an array of linked
offloading images. One downside to this is that we cannot know the
architecture or triple associated with the given image. In this patch,
instead of embedding the image itself, we embed an offloading binary
instead. This binary is simply a binary format that wraps around the
original linked image to provide some additional metadata. This will
allow us to support offloading to multiple architectures, or performing
future JIT compilation inside of the runtime, more clearly.
Additionally, these can be placed at a special section such that the
supported architectures can be identified using objdump with the support
from D126904. This needs to be stored in a new section name
`.llvm.offloading.images` because the `.llvm.offloading` section
implicitly uses the `SHF_EXCLUDE` flag and will always be stripped.

This patch does not contain the necessary code to parse these in
libomptarget.

Depends on D127246

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D127304

2 years ago[llvm-lib] Ignore /VERBOSE flag
Pengxuan Zheng [Wed, 20 Jul 2022 21:08:25 +0000 (14:08 -0700)]
[llvm-lib] Ignore /VERBOSE flag

Ignore the flag for now, but we can start using it for verbose output if needed.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D130202

2 years ago[mlir][spirv] Rename spv.GLSL ops to spv.GL. NFC.
Jakub Kuderski [Thu, 21 Jul 2022 17:02:45 +0000 (13:02 -0400)]
[mlir][spirv] Rename spv.GLSL ops to spv.GL. NFC.

This is to improve consistency within the SPIR-V dialect and make these ops a bit shorter.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D130280

2 years ago[libc++] Reorganize release notes
Louis Dionne [Thu, 21 Jul 2022 16:48:14 +0000 (12:48 -0400)]
[libc++] Reorganize release notes

In particular, create sections for deprecations and removals, and also
for announcing upcoming deprecations/removals.

2 years ago[clang] Add -fdiagnostics-format=sarif option for future SARIF output
Abraham Corea Diaz [Thu, 21 Jul 2022 16:50:43 +0000 (16:50 +0000)]
[clang] Add -fdiagnostics-format=sarif option for future SARIF output

Adds `sarif` option to the existing `-fdiagnostics-format` flag
for intended future work with SARIF diagnostics. Currently issues a warning
against the use of diagnostics in SARIF mode, then defaults to clang style for
diagnostics.

Reviewed By: cjdb, denik, aaron.ballman

Differential Revision: https://reviews.llvm.org/D129886

2 years ago[libc++][NFC] Fix weird unicode character in release notes
Louis Dionne [Thu, 21 Jul 2022 14:55:06 +0000 (10:55 -0400)]
[libc++][NFC] Fix weird unicode character in release notes

2 years ago[flang] Lower F08 merge_bits intrinsic.
Tarun Prabhu [Thu, 21 Jul 2022 16:39:54 +0000 (10:39 -0600)]
[flang] Lower F08 merge_bits intrinsic.

Lower F08 merge_bits intrinsic.

Differential Revision: https://reviews.llvm.org/D129779

2 years ago[mlir][linalg] Add tile_size option to `structured.tile_to_foreach_thread_op`
Christopher Bate [Thu, 21 Jul 2022 15:26:46 +0000 (09:26 -0600)]
[mlir][linalg] Add tile_size option to `structured.tile_to_foreach_thread_op`

This change modifies `structured.tile_to_foreach_thread_op` so that
it accepts either `tile_sizes` or `num_threads` parameters. If
`tile_sizes` are specified, then the number of threads required is
derived from the tile sizes rather than the other way around. In both cases,
more aggressive folding of loop parameters is enabled during the
transformation, allowing for the potential elimination of `affine.min`
and `affine.max` operations in the static shape case when calculating
the final adjusted tile size.
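
As a rough illustration of the relationship described above (a sketch only,
not the MLIR implementation; the helper names are hypothetical), the thread
count can be derived from the tile sizes with a ceiling division, and the
last tile along a dimension is clamped with a `min`, which is the role
`affine.min` plays in the generated IR:

```python
def num_threads_from_tile_sizes(shape, tile_sizes):
    """Hypothetical helper: threads needed per dimension = ceil(dim / tile)."""
    return [-(-dim // tile) for dim, tile in zip(shape, tile_sizes)]


def adjusted_tile_size(dim, tile, thread_index):
    """Hypothetical helper: size of the tile handled by `thread_index`.

    The last tile may be partial, hence the min. When `dim` is a static
    multiple of `tile`, the min always returns `tile` and can be folded away.
    """
    return min(tile, dim - thread_index * tile)


threads = num_threads_from_tile_sizes([100, 64], [32, 16])
last_tile = adjusted_tile_size(100, 32, threads[0] - 1)
```

With a static shape that the tile size divides evenly (64 and 16 here), every
tile has the full size, which is the case where the clamp can be eliminated.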

Differential Revision: https://reviews.llvm.org/D130139

2 years ago[flang][NFC] Test folding of F08 merge_bits intrinsic.
Tarun Prabhu [Thu, 21 Jul 2022 16:26:56 +0000 (10:26 -0600)]
[flang][NFC] Test folding of F08 merge_bits intrinsic.

Test compile-time folding of the F2008 merge_bits intrinsic.

Differential Revision: https://reviews.llvm.org/D129780

2 years ago[lldb][NFCI] Refactor regex filtering logic in CommandObjectTypeFormatterList
Jorge Gorbe Moya [Thu, 21 Jul 2022 16:19:19 +0000 (09:19 -0700)]
[lldb][NFCI] Refactor regex filtering logic in CommandObjectTypeFormatterList

Extract a bit of copy/pasted regex filtering logic into a separate
function and simplify it a little bit.

Differential Revision: https://reviews.llvm.org/D130219

2 years ago[AArch64] Add target hook for preferPredicateOverEpilogue
David Sherwood [Tue, 12 Jul 2022 11:03:39 +0000 (12:03 +0100)]
[AArch64] Add target hook for preferPredicateOverEpilogue

This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.
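
The conditions listed above can be modelled roughly as follows. This is a
simplified sketch, not the AArch64 hook itself: the function and parameter
names are hypothetical, and since the list above is non-exhaustive, any
option value not named there is treated as "do not prefer predication".

```python
def prefer_predicate_over_epilogue(sve_enabled, sve_tail_folding,
                                   has_reductions, has_recurrences):
    """Hypothetical model of the decision described in this patch."""
    if not sve_enabled:
        return False
    if sve_tail_folding == "all":
        return True
    if sve_tail_folding == "all+noreductions":
        return not has_reductions
    if sve_tail_folding == "all+norecurrences":
        return not has_recurrences
    # "disabled" (the current default) and any value not modelled here.
    return False
```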

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560

2 years ago[clangd] Refactor forwarding call detection logic
Kadir Cetinkaya [Thu, 21 Jul 2022 10:41:00 +0000 (12:41 +0200)]
[clangd] Refactor forwarding call detection logic

Differential Revision: https://reviews.llvm.org/D130261

2 years ago[clangd] Mention whether compile flags were inferred in check mode
Kadir Cetinkaya [Thu, 21 Jul 2022 08:20:00 +0000 (10:20 +0200)]
[clangd] Mention whether compile flags were inferred in check mode

That way when looking at logs it's clear whether diagnostics are a
result of compile flags mismatch.

Differential Revision: https://reviews.llvm.org/D130228

2 years ago[libc++][format] Adhere to clang-tidy style.
Mark de Wever [Thu, 21 Jul 2022 15:31:45 +0000 (17:31 +0200)]
[libc++][format] Adhere to clang-tidy style.

D126971 broke the CI due to recent changes in the clang-tidy settings.
This fixes them.

2 years ago[AMDGPU] NFC. Auto-generate test for vcclo
Joe Nash [Wed, 20 Jul 2022 19:42:57 +0000 (15:42 -0400)]
[AMDGPU] NFC. Auto-generate test for vcclo

2 years ago[lldb] [gdb-remote] Fix process ID after following forked child
Michał Górny [Mon, 18 Jul 2022 15:54:31 +0000 (17:54 +0200)]
[lldb] [gdb-remote] Fix process ID after following forked child

Update the process ID after handling fork/vfork to ensure that
the process plugin reports the correct PID immediately.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D130037

2 years ago[Libomptarget] Build the device library even if the sm list is empty
Joseph Huber [Thu, 21 Jul 2022 13:34:58 +0000 (09:34 -0400)]
[Libomptarget] Build the device library even if the sm list is empty

We previously had some logic that stopped us from building the device runtime if
there were no NVPTX architectures provided. This is incorrect because we could
have AMDGPU libraries. Even if the lists are empty we should be able to attempt
to build these and get dummy output. This will make it much easier for our
tooling which expects certain libraries. If the user wishes to disable the
library entirely they should use `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF`.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130266

2 years ago[X86] Remove cfi directives and duplicated check in tests. NFC
Phoebe Wang [Thu, 21 Jul 2022 14:53:23 +0000 (22:53 +0800)]
[X86] Remove cfi directives and duplicated check in tests. NFC

2 years ago[lldb/test] Fix flakiness in TestNonStop.test_stdio
Pavel Labath [Thu, 21 Jul 2022 14:51:54 +0000 (16:51 +0200)]
[lldb/test] Fix flakiness in TestNonStop.test_stdio

The test was assuming that the output will come in two messages. The
truth is that it will come in **at least** two messages.

2 years ago[mlir][python] Fix issues with block argument slices
Alex Zinenko [Thu, 21 Jul 2022 14:00:37 +0000 (14:00 +0000)]
[mlir][python] Fix issues with block argument slices

The type extraction helper function for block argument and op result
list objects was ignoring the slice entirely. So was the slice addition.
Both are caused by a misleading naming convention to implement slices
via CRTP. Make the convention more explicit and hide the helper
functions so users have a harder time calling them directly.

Closes #56540.

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D130271

2 years ago[NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests
David Sherwood [Tue, 12 Jul 2022 12:38:45 +0000 (13:38 +0100)]
[NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests

This patch is in preparation for enabling vectorisation with tail-folding
by default for SVE targets. Once we do that many existing tests will
break that depend upon having normal unpredicated vector loops. For
all such tests I have added the flag:

  -prefer-predicate-over-epilogue=scalar-epilogue

Differential Revision: https://reviews.llvm.org/D129137

2 years ago[LAA] Precommit add/sub tests for forked pointers
Graham Hunter [Thu, 21 Jul 2022 13:24:58 +0000 (14:24 +0100)]
[LAA] Precommit add/sub tests for forked pointers

Adds new tests for add and sub instructions before reaching a select.

Also adds tests using different bit widths for memory, including
non-power-of-two integers.

2 years ago[mlir][Linalg] Add a Transform dialect NavigationOp op to match a list of ops or...
Nicolas Vasilache [Thu, 21 Jul 2022 13:44:43 +0000 (06:44 -0700)]
[mlir][Linalg] Add a Transform dialect NavigationOp op to match a list of ops or an interface.

This operation is a NavigationOp that simplifies the writing of transform IR.
Since there is no way of referring to an interface by name, the current implementation uses
an EnumAttr and depends on the interfaces it supports.
In the future, it would be worthwhile to remove this dependence and generalize.

Differential Revision: https://reviews.llvm.org/D130267

2 years ago[AMDGPU][MC][NFC] Refine SMEM load definitions.
Ivan Kosarev [Thu, 21 Jul 2022 13:56:25 +0000 (14:56 +0100)]
[AMDGPU][MC][NFC] Refine SMEM load definitions.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D130009

2 years ago[AMDGPU][NFC] Validate G_MERGE_VALUES as we match zero-extended 32-bit scalars.
Ivan Kosarev [Thu, 21 Jul 2022 13:25:09 +0000 (14:25 +0100)]
[AMDGPU][NFC] Validate G_MERGE_VALUES as we match zero-extended 32-bit scalars.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D130001

2 years ago[lld-macho] Fix assertion when two symbols at same addr have unwind info
Jez Ng [Thu, 21 Jul 2022 13:44:01 +0000 (09:44 -0400)]
[lld-macho] Fix assertion when two symbols at same addr have unwind info

If there are multiple symbols at the same address, our unwind info
implementation assumes that we always register unwind entries to a
single canonical symbol.

This assumption was violated by the `registerEhFrame` code.

Fixes #56570.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D130208

2 years agoRevert "Rewording the "static_assert" to static assertion"
Erich Keane [Thu, 21 Jul 2022 13:39:22 +0000 (06:39 -0700)]
Revert "Rewording the "static_assert" to static assertion"

Looks like we again are going to have problems with libcxx tests that
are overly specific in their dependency on clang's diagnostics.

This reverts commit 6542cb55a3eb115b1c3592514590a19987ffc498.

2 years ago[lld-macho][NFC] Remove redundant StringRef construction
Daniel Bertalan [Thu, 21 Jul 2022 09:26:09 +0000 (11:26 +0200)]
[lld-macho][NFC] Remove redundant StringRef construction

It's only used in one branch, so we were unnecessarily calculating the
length of many symbol names.

Tiny speedup when linking chromium_framework on my M1 Mac mini:

x before.txt
+ after.txt

    N           Min           Max        Median           Avg        Stddev
x  10     3.9917109        4.0418     4.0318099     4.0203902   0.021459873
+  10      3.944725      4.053988     3.9708955     3.9825602   0.037257609
Difference at 95.0% confidence
-0.03783 +/- 0.0285663
-0.940953% +/- 0.710536%
(Student's t, pooled s = 0.0304028)
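
For readers who want to check the arithmetic, the summary lines can be
recomputed from the table along these lines. This is a sketch under stated
assumptions: it uses the standard equal-sample-size pooled-variance formula,
and the critical value `T_CRIT_18` is taken from a t-table rather than from
the tool's output.

```python
import math

# Summary statistics copied from the table above (N = 10 runs each).
n = 10
avg_before, sd_before = 4.0203902, 0.021459873
avg_after, sd_after = 3.9825602, 0.037257609

# Assumption: two-sided 95% critical value of Student's t with
# 2 * n - 2 = 18 degrees of freedom, from a standard t-table.
T_CRIT_18 = 2.100922


def pooled_comparison(n, avg_a, sd_a, avg_b, sd_b, t_crit):
    """Return (difference of means, CI half-width, pooled std deviation)."""
    pooled_s = math.sqrt(((n - 1) * sd_a**2 + (n - 1) * sd_b**2) / (2 * n - 2))
    diff = avg_b - avg_a
    half_width = t_crit * pooled_s * math.sqrt(2 / n)
    return diff, half_width, pooled_s


diff, half_width, pooled_s = pooled_comparison(
    n, avg_before, sd_before, avg_after, sd_after, T_CRIT_18)
percent = 100 * diff / avg_before
print(f"{diff:.5f} +/- {half_width:.5f} ({percent:.3f}%), pooled s = {pooled_s:.5f}")
```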

Differential Revision: https://reviews.llvm.org/D130234

2 years agoRewording the "static_assert" to static assertion
Muhammad Usman Shahid [Thu, 21 Jul 2022 13:32:54 +0000 (06:32 -0700)]
Rewording the "static_assert" to static assertion

This patch rewords the diagnostic printed when a static assertion
fails. Failing a _Static_assert in C
should not report that static_assert failed. It’d probably be better to
reword the diagnostic to be more like GCC and say “static assertion”
failed in both C and C++.

Consider a C file containing:

_Static_assert(0, "oh no!");

In clang the output is like:

<source>:1:1: error: static_assert failed: oh no!
_Static_assert(0, "oh no!");
^              ~
1 error generated.
Compiler returned: 1

Here the "static_assert" wording is not ideal; it would be better to
reword it to the more generic "static assertion failed", as GCC
prints:

<source>:1:1: error: static assertion failed: "oh no!"
    1 | _Static_assert(0, "oh no!");
      | ^~~~~~~~~~~~~~
Compiler returned: 1

The above can also be seen here. This patch is about rewording
the static_assert to static assertion.

Differential Revision: https://reviews.llvm.org/D129048

2 years ago[Binary] Hard-code the alignment of the offloading binary
Joseph Huber [Thu, 21 Jul 2022 13:25:43 +0000 (09:25 -0400)]
[Binary] Hard-code the alignment of the offloading binary

Summary:
We previously used `alignof` to get the necessary alignment of the
binary header. However this was different on 32-bit platforms and caused
a few tests to fail because of it. This patch just changes this to be a
hard-coded constant of 8.

2 years ago[AMDGPU] Pre-sink IR input for some tests
Jay Foad [Wed, 20 Jul 2022 12:45:27 +0000 (13:45 +0100)]
[AMDGPU] Pre-sink IR input for some tests

Edit the IR input for some codegen tests to simulate what the IR code
sinking pass would do to it. This makes the tests immune to the presence
or absence of the code sinking pass in the codegen pass pipeline, which
does not belong there.

Differential Revision: https://reviews.llvm.org/D130169

2 years ago[LLDB][ClangExpression] Fix initialization of static enum alias members
Michael Buch [Thu, 21 Jul 2022 00:04:03 +0000 (01:04 +0100)]
[LLDB][ClangExpression] Fix initialization of static enum alias members

`IntegerLiteral::Create` operates on integer types. For that reason
when we parse DWARF into an AST, when we encounter a constant
initialized enum member variable, we try to determine the underlying
integer type before creating the `IntegerLiteral`. However, we
currently don't desugar the type and for enum typedefs
`dyn_cast<EnumType>` fails. In debug builds this triggers following
assert:

```
Assertion failed: (type->isIntegerType() && "Illegal type in IntegerLiteral"), function IntegerLiteral, file Expr.cpp, line 892
```

This patch turns the `dyn_cast<EnumType>` into a `getAs<EnumType>`
which `dyn_cast`s the canonical type, allowing us to get to the
underlying integer type.

**Testing**

* API test
* Manual repro is fixed

Differential Revision: https://reviews.llvm.org/D130213

2 years ago[LLDB][DataFormatter] Add support for std::__map_const_iterator
Michael Buch [Sun, 17 Jul 2022 11:16:39 +0000 (12:16 +0100)]
[LLDB][DataFormatter] Add support for std::__map_const_iterator

This patch adds support for formatting `std::map::const_iterator`.
It's just a matter of adding `const_` to the existing regex.
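
To illustrate the idea (the pattern below is a hypothetical stand-in, not
the regex LLDB actually registers), making the `const_` part optional lets
one expression cover both iterator type names:

```python
import re

# Hypothetical pattern for illustration only; LLDB's real regex may differ.
MAP_ITERATOR_RE = re.compile(r"^std::__[A-Za-z0-9]+::__map_(const_)?iterator<.+>$")


def is_map_iterator(type_name):
    """Return True if `type_name` looks like a libc++ map iterator type."""
    return MAP_ITERATOR_RE.match(type_name) is not None
```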

**Testing**

* Added test case to existing API tests

Differential Revision: https://reviews.llvm.org/D129962

2 years agoAMDGPU: Refine user-sgpr-init16-bug
Matt Arsenault [Thu, 21 Jul 2022 00:30:12 +0000 (20:30 -0400)]
AMDGPU: Refine user-sgpr-init16-bug

It only applies to gfx1100 and gfx1102, and for wave32.

2 years ago[MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Nikita Popov [Thu, 21 Jul 2022 12:54:16 +0000 (14:54 +0200)]
[MemoryBuiltins] Add getReallocatedOperand() function (NFC)

Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.

2 years ago[MemoryBuiltins] Remove isFreeCall() function (NFC)
Nikita Popov [Thu, 21 Jul 2022 12:42:08 +0000 (14:42 +0200)]
[MemoryBuiltins] Remove isFreeCall() function (NFC)

Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)

2 years ago[InstCombine] Use getFreedOperand() (NFC)
Nikita Popov [Thu, 21 Jul 2022 12:33:55 +0000 (14:33 +0200)]
[InstCombine] Use getFreedOperand() (NFC)

Use getFreedOperand() instead of isFreeCall() to remove the
implicit assumption that any pointer operand to a free function
is the operand being freed. This won't actually matter until we
handle allockind(free).

2 years ago[Attributor] Use getFreedOperand() (NFC)
Nikita Popov [Thu, 21 Jul 2022 12:25:54 +0000 (14:25 +0200)]
[Attributor] Use getFreedOperand() (NFC)

Track which operand is actually freed, to avoid the implicit
assumption that it is the first call argument.

2 years ago[AMDGPU] Combine s_or_saveexec, s_xor instructions.
Thomas Symalla [Wed, 29 Jun 2022 17:58:08 +0000 (19:58 +0200)]
[AMDGPU] Combine s_or_saveexec, s_xor instructions.

This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
This patch also cleans up the SIOptimizeExecMasking pass a bit.
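
The equivalence behind this merge can be checked with plain bit arithmetic.
The sketch below is an illustration, not the pass itself; it assumes that
`s_or_saveexec` saves the old exec mask and ORs its source into exec, and
that `s_andn2_saveexec` saves the old exec mask and sets exec to the source
AND NOT the old exec. The 64-bit mask width and function names are
assumptions for the example.

```python
MASK64 = (1 << 64) - 1  # assumption: model a 64-bit exec mask


def or_saveexec_then_xor(exec_mask, s_i):
    """Model the two-instruction sequence; returns (new_exec, s_o)."""
    s_o = exec_mask                         # s_or_saveexec: save old exec
    exec_mask = (s_i | exec_mask) & MASK64  # ... and OR the source into exec
    exec_mask = exec_mask ^ s_o             # s_xor exec, exec, s_o
    return exec_mask, s_o


def andn2_saveexec(exec_mask, s_i):
    """Model the single merged instruction; returns (new_exec, s_o)."""
    s_o = exec_mask                         # save old exec
    exec_mask = s_i & ~exec_mask & MASK64   # source AND NOT old exec
    return exec_mask, s_o


samples = [(0x0, 0x0), (0xF0F0, 0xFF00), (MASK64, 0x1234), (0x1, MASK64)]
results_match = all(
    or_saveexec_then_xor(e, s) == andn2_saveexec(e, s) for e, s in samples
)
```

The identity at work is `(a | b) ^ b == a & ~b`: XOR-ing the old exec mask
back out of the OR result leaves only the bits that came from `s_i` alone.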

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D129073

2 years ago[pseudo] Fix an invalid assertion on recoveryBrackets.
Haojian Wu [Thu, 21 Jul 2022 11:57:33 +0000 (13:57 +0200)]
[pseudo] Fix an invalid assertion on recoveryBrackets.

The `Begin` is not the index of the left bracket, `Begin-1` is,
otherwise the assertion will be triggered on case `Foo().call();`.

2 years agoRevert "[Flang] Generate documentation for compiler flags"
Andrzej Warzynski [Thu, 21 Jul 2022 11:54:49 +0000 (11:54 +0000)]
Revert "[Flang] Generate documentation for compiler flags"

This reverts commit 396e944d82f3e212746cd241e4caba445523aff6.

Failing bot: https://lab.llvm.org/buildbot/#/builders/89/builds/30096

2 years ago[Flang] Generate documentation for compiler flags
Dylan Fleming [Thu, 21 Jul 2022 11:11:48 +0000 (11:11 +0000)]
[Flang] Generate documentation for compiler flags

This patch aims to create a webpage to document
Flang's command line options on https://flang.llvm.org/docs/
in a similar way to Clang's
https://clang.llvm.org/docs/ClangCommandLineReference.html

This is done by using clang_tablegen to generate an .rst
file from Options.td (which is currently shared with Clang).
For this to work, ClangOptionDocEmitter.cpp was updated
to allow specific Flang flags to be included,
rather than bulk excluding Clang flags.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D129864

2 years ago[Reland][DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.
Alexey Lapshin [Tue, 19 Jul 2022 15:11:07 +0000 (18:11 +0300)]
[Reland][DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.

DWARF files may contain overlapping address ranges. For example, this can happen
if two copies of a function have identical instruction sequences and end up being
shared. That looks incorrect from the point of view of the DWARF spec. The current
implementation of DWARFLinker does not combine overlapping address ranges. It would
be good to handle such ranges in some useful way, so this patch allows DWARFLinker
to combine overlapping ranges into a single one.

Depends on D86539

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D123469

2 years ago[AArch64][SVE] Add DAG-Combine to push bitcasts from floating point loads after DUPLA...
Matt Devereau [Mon, 18 Jul 2022 14:39:35 +0000 (14:39 +0000)]
[AArch64][SVE] Add DAG-Combine to push bitcasts from floating point loads after DUPLANE128

This patch lowers
  duplane128(insert_subvector(undef, bitcast(op(128bitsubvec)), 0), 0)
to
  bitcast(duplane128(insert_subvector(undef, op(128bitsubvec), 0), 0)).

This enables floating-point loads to match patterns added in
https://reviews.llvm.org/D130010

Differential Revision: https://reviews.llvm.org/D130013

2 years ago[AArch64][SVE] Add ISel pattern to lower DUPLANE128 to LD1RQD
Matt Devereau [Tue, 12 Jul 2022 15:54:53 +0000 (15:54 +0000)]
[AArch64][SVE] Add ISel pattern to lower DUPLANE128 to LD1RQD

Following on from https://reviews.llvm.org/D128902, lower DUPLANE128 to LD1RQD
for integer load types from instruction selection.

Differential Revision: https://reviews.llvm.org/D130010

2 years ago[AArch64] Add i128 parity test
Simon Pilgrim [Thu, 21 Jul 2022 10:46:27 +0000 (11:46 +0100)]
[AArch64] Add i128 parity test

AArch64 has custom i128 ctpop handling, so match this in the parity tests

Added as part of triaging Issue #56531

2 years ago[AMDGPU][GlobalISel] Fix subtarget checks for combining to v_med3_i16
Jay Foad [Thu, 21 Jul 2022 10:12:14 +0000 (11:12 +0100)]
[AMDGPU][GlobalISel] Fix subtarget checks for combining to v_med3_i16

Differential Revision: https://reviews.llvm.org/D130243

2 years agoRevert "[DebugInfo][llvm-dwarfutil] Combine overlapped address ranges."
Alexey Lapshin [Thu, 21 Jul 2022 10:39:50 +0000 (13:39 +0300)]
Revert "[DebugInfo][llvm-dwarfutil] Combine overlapped address ranges."

This reverts commit d2a4d6bf9c52861f3d418bf7bb7d05f6e74dfead.

2 years ago[MemoryBuiltins] Add getFreedOperand() function (NFCI)
Nikita Popov [Thu, 21 Jul 2022 10:10:36 +0000 (12:10 +0200)]
[MemoryBuiltins] Add getFreedOperand() function (NFCI)

We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
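
The difference between the two query styles can be sketched with a toy model. Everything below is a hypothetical stand-in (`ToyCall`, `toyIsFreeCall`, `toyGetFreedOperand`), not LLVM's actual types or signatures; it only illustrates why returning the freed argument is more informative than returning a yes/no answer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a call instruction; not an LLVM type.
struct ToyCall {
  std::string callee;
  std::vector<std::string> args;
  // Index of the argument that gets freed, if the callee is free-like.
  std::optional<unsigned> freedArgIndex;
};

// Old-style query: only reports whether *some* argument is freed.
// Callers then have to guess which one (typically the first).
inline bool toyIsFreeCall(const ToyCall &call) {
  return call.freedArgIndex.has_value();
}

// New-style query: returns the freed argument itself, or nullptr when the
// call does not free anything.
inline const std::string *toyGetFreedOperand(const ToyCall &call) {
  if (!call.freedArgIndex || *call.freedArgIndex >= call.args.size())
    return nullptr;
  return &call.args[*call.freedArgIndex];
}
```

With a made-up deallocator such as `pool_release(pool, ptr)` that frees its second argument, a caller that assumed "first argument" would pick `pool`, while the second query returns `ptr`.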

2 years ago[DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.
Alexey Lapshin [Tue, 19 Jul 2022 15:11:07 +0000 (18:11 +0300)]
[DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.

DWARF files may contain overlapping address ranges. For example, this can happen
if two copies of a function have identical instruction sequences and end up being
shared. That looks incorrect from the point of view of the DWARF spec. The current
implementation of DWARFLinker does not combine overlapping address ranges. It would
be good to handle such ranges in some useful way, so this patch allows DWARFLinker
to combine overlapping ranges into a single one.

Depends on D86539

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D123469

2 years agotsan: remove unnecessary brackets
Dmitry Vyukov [Thu, 21 Jul 2022 09:44:48 +0000 (11:44 +0200)]
tsan: remove unnecessary brackets

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D130236

2 years agoReapply [InstCombine] Don't check for alloc fn before fetching alloc size
Nikita Popov [Thu, 21 Jul 2022 08:05:48 +0000 (10:05 +0200)]
Reapply [InstCombine] Don't check for alloc fn before fetching alloc size

Reapply the patch with getObjectSize() replaced by getAllocSize().
The former will also look through calls that return their argument,
and we'll end up placing dereferenceable attributes on intrinsics
like llvm.launder.invariant.group. While this isn't wrong, it also
doesn't seem to be particularly useful. For now, use getAllocSize()
instead, which sticks closer to the original behavior of this code.

-----

This code is just interested in the allocsize, not any other
allocator properties.

2 years ago[MemoryBuiltins] Default to trivial mapper in getAllocSize() (NFC)
Nikita Popov [Thu, 21 Jul 2022 09:40:35 +0000 (11:40 +0200)]
[MemoryBuiltins] Default to trivial mapper in getAllocSize() (NFC)

Default getAllocSize() to use the trivial mapper. Also switch
from using std::function to function_ref.

Furthermore, update the doc comment to point out a subtle difference
between getAllocSize() and getObjectSize(): The latter may also
return something for calls that return their argument (via "returned"
attribute or special intrinsics like invariant groups).

2 years agorecommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
jacquesguan [Thu, 7 Jul 2022 08:48:55 +0000 (16:48 +0800)]
recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."

With fixes for AArch64 and Hexagon test cases.

2 years ago[MemoryBuiltins] Don't query TLI for non-pointer functions (NFC)
Nikita Popov [Thu, 21 Jul 2022 08:46:50 +0000 (10:46 +0200)]
[MemoryBuiltins] Don't query TLI for non-pointer functions (NFC)

Fetching allocation data for calls is a rather hot operation, and
TLI lookups are slow. We can greatly reduce the number of calls
for which TLI is queried by checking that they return a pointer
value first, as this is a requirement for allocation functions
anyway.
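
The idea of guarding a slow lookup with a cheap structural check can be sketched as follows. This is a toy model, not the MemoryBuiltins code: `ToyFunction`, `ToyLibraryInfo`, and `toyIsAllocation` are hypothetical names, and the lookup counter exists only to make the saving observable.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a function declaration.
struct ToyFunction {
  std::string name;
  bool returnsPointer;
};

// Hypothetical stand-in for a library-info table whose lookups are
// comparatively expensive. The counter records how often it is queried.
class ToyLibraryInfo {
  std::unordered_set<std::string> allocators{"malloc", "calloc"};

public:
  mutable unsigned lookups = 0;

  bool isKnownAllocator(const std::string &name) const {
    ++lookups;
    return allocators.count(name) != 0;
  }
};

// Cheap check first: an allocation function must return a pointer, so
// anything else is rejected without touching the table.
inline bool toyIsAllocation(const ToyFunction &fn, const ToyLibraryInfo &tli) {
  if (!fn.returnsPointer)
    return false;
  return tli.isKnownAllocator(fn.name);
}
```

In this sketch a query for a function that returns a non-pointer value leaves `lookups` unchanged, which is the effect the commit describes.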

2 years ago[C++20] [Modules] Avoid infinite loop when iterating default args
Chuanqi Xu [Thu, 21 Jul 2022 09:19:11 +0000 (17:19 +0800)]
[C++20] [Modules] Avoid infinite loop when iterating default args

Currently, clang may enter an infinite loop in a very tricky case when it
iterates over the default args. This patch tries to fix this by adding a
`fixed` check.

2 years ago[flang][nfc] Add missing `REQUIRES: asserts` in tests
Andrzej Warzynski [Wed, 20 Jul 2022 17:00:50 +0000 (17:00 +0000)]
[flang][nfc] Add missing `REQUIRES: asserts` in tests

Tests that use `--mlir-pass-statistics-display=` from MLIR require the
following condition to hold (extracted from LLVM's Statistics.h):
```
  #define LLVM_ENABLE_STATS 1
```
This is normally enforced with `REQUIRES: asserts`. This patch updates
relevant Flang tests accordingly.

For "Release" builds (with assertions disabled), the affected tests
fail without this change.

Differential Revision: https://reviews.llvm.org/D130185

2 years ago[mlir][memref] Missing type conversion in memref.reshape llvm lowering
Ivan Butygin [Sun, 17 Jul 2022 16:47:22 +0000 (18:47 +0200)]
[mlir][memref] Missing type conversion in memref.reshape llvm lowering

The shape can be a memref of index type, so the memref::LoadOp result needs to be converted to an LLVM type.

Differential Revision: https://reviews.llvm.org/D129965

2 years agoRevert "[InstCombine] Don't check for alloc fn before fetching object size"
Nikita Popov [Thu, 21 Jul 2022 08:59:12 +0000 (10:59 +0200)]
Revert "[InstCombine] Don't check for alloc fn before fetching object size"

This reverts commit c72c22c04df992c95c5912d0075e5263c88f9fec.

This affected an Analysis test that I missed. Reverting for now.

2 years ago[InstCombine] Don't check for alloc fn before fetching object size
Nikita Popov [Thu, 21 Jul 2022 08:05:48 +0000 (10:05 +0200)]
[InstCombine] Don't check for alloc fn before fetching object size

This code is just interested in the allocsize, not any other
allocator properties.

2 years ago[PowerPC] Support x86 compatible intrinsics on AIX
Qiu Chaofan [Thu, 21 Jul 2022 08:33:41 +0000 (16:33 +0800)]
[PowerPC] Support x86 compatible intrinsics on AIX

These headers used to be guarded only on PowerPC64 Linux or FreeBSD, but
they can also be enabled for AIX OS target since it's big-endian ready.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D129461

2 years agoenable P10 vector builtins test on AIX 64 bit; NFC
Chen Zheng [Thu, 21 Jul 2022 08:21:12 +0000 (04:21 -0400)]
enable P10 vector builtins test on AIX 64 bit; NFC

Verify that P10 vector builtins with type `vector signed __int128`
and `vector unsigned __int128` work well on AIX 64 bit.

2 years agore-land [C++20][Modules] Update handling of implicit inlines [P1779R3]
Iain Sandoe [Sun, 3 Jul 2022 13:27:10 +0000 (14:27 +0100)]
re-land [C++20][Modules] Update handling of implicit inlines [P1779R3]

re-land fixes an unwanted interaction with module-map modules, seen in
Greendragon testing.

This provides updates to
[class.mfct]:
Pre C++20 [class.mfct]p2:
  A member function may be defined (8.4) in its class definition, in
  which case it is an inline member function (7.1.2)
Post C++20 [class.mfct]p1:
  If a member function is attached to the global module and is defined
  in its class definition, it is inline.

and
[class.friend]:
Pre-C++20 [class.friend]p5
  A function can be defined in a friend declaration of a
  class . . . . Such a function is implicitly inline.
Post C++20 [class.friend]p7
  Such a function is implicitly an inline function if it is attached
  to the global module.

We add the output of implicit-inline to the TextNodeDumper, and amend
a couple of existing tests to account for this, plus add tests for the
cases covered above.

Differential Revision: https://reviews.llvm.org/D129045

2 years ago[MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducersUsingSCFForOp`
lorenzo chelini [Thu, 21 Jul 2022 07:58:53 +0000 (09:58 +0200)]
[MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducersUsingSCFForOp`

Replace iterators of the outermost loop with region arguments of the innermost
one. This change prevents later `bufferization` passes from inserting allocations
within the body of the innermost loop.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D130083

2 years ago[pseudo] Make sure we rebuild pseudo_gen tool.
Haojian Wu [Thu, 21 Jul 2022 08:02:29 +0000 (10:02 +0200)]
[pseudo] Make sure we rebuild pseudo_gen tool.

2 years ago[lld-macho] Optimize rebase opcode generation
Daniel Bertalan [Wed, 20 Jul 2022 08:09:58 +0000 (10:09 +0200)]
[lld-macho] Optimize rebase opcode generation

This commit reduces the size of the emitted rebase sections by
generating the REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB and
REBASE_OPCODE_DO_REBASE_ULEB_TIMES_SKIPPING_ULEB opcodes.

With this change, chromium_framework's rebase section shrinks to 197 kB,
about 40% smaller than the previous 320 kB. That is 6 kB smaller than
what ld64 produces for the same input.
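
To give an intuition for why repeat-with-skip opcodes shrink the output, here is a simplified sketch that groups sorted rebase addresses into runs with a constant stride, so that each run could be described by one (count, stride) pair instead of one entry per address. It is not lld's algorithm and does not emit real opcodes; `ToyRun` and `toyGroupRuns` are hypothetical names, and the greedy grouping is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of addresses: start, start + stride, ..., count entries in total.
struct ToyRun {
  uint64_t start;
  uint64_t count;
  uint64_t stride; // 0 when the run holds a single address
};

// Greedily groups ascending addresses into constant-stride runs.
inline std::vector<ToyRun> toyGroupRuns(const std::vector<uint64_t> &addrs) {
  std::vector<ToyRun> runs;
  std::size_t i = 0;
  while (i < addrs.size()) {
    ToyRun run{addrs[i], 1, 0};
    if (i + 1 < addrs.size()) {
      // The distance to the next address fixes the stride for this run.
      run.stride = addrs[i + 1] - addrs[i];
      std::size_t j = i + 1;
      while (j < addrs.size() && addrs[j] - addrs[j - 1] == run.stride) {
        ++run.count;
        ++j;
      }
      i = j;
    } else {
      ++i;
    }
    runs.push_back(run);
  }
  return runs;
}
```

For example, the six addresses 0, 8, 16, 24, 100, 108 collapse into two runs in this sketch: four entries with stride 8 starting at 0, and two entries with stride 8 starting at 100.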

Performance figures from my M1 Mac mini:

x before
+ after

    N           Min           Max        Median           Avg        Stddev
x  10     4.2269349     4.3300061     4.2689675     4.2690016   0.031151669
+  10      4.219331     4.2914009     4.2398136     4.2448277   0.023817308
No difference proven at 95.0% confidence

Differential Revision: https://reviews.llvm.org/D130180

2 years ago[CSKY] Fix the testcase error due to the verifyInstructionPredicates
Zi Xuan Wu (Zeson) [Thu, 21 Jul 2022 07:48:42 +0000 (15:48 +0800)]
[CSKY] Fix the testcase error due to the verifyInstructionPredicates

- Test cases for archs that only have 16-bit instructions, such as ck801/ck802,
need to be compiled with -mattr=+btst16.
- Fix the GPR copy instruction with MOV16 for 16-bit-only archs.

2 years agoenable P10 vector builtins test on AIX 64 bit; NFC
Chen Zheng [Thu, 21 Jul 2022 07:48:31 +0000 (03:48 -0400)]
enable P10 vector builtins test on AIX 64 bit; NFC

Verify that P10 vector builtins with type `vector signed __int128`
and `vector unsigned __int128` work well on AIX 64 bit.

2 years agoRevert "[RFC][MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducer...
lorenzo chelini [Thu, 21 Jul 2022 07:40:30 +0000 (09:40 +0200)]
Revert "[RFC][MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducersUsingSCFForOp`"

This reverts commit 9e6585030533e901a8c24dcb05b38d3f0d10331f.

2 years ago[MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Nikita Popov [Thu, 21 Jul 2022 07:37:29 +0000 (09:37 +0200)]
[MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc

Allow directly checking whether a given call is a removable
allocation, instead of first checking whether it is an
allocation.

2 years ago[sanitizer_common] Support Solaris < 11.4 in GetStaticTlsBoundary
Rainer Orth [Thu, 21 Jul 2022 07:18:10 +0000 (09:18 +0200)]
[sanitizer_common] Support Solaris < 11.4 in GetStaticTlsBoundary

This patch, on top of D120048 <https://reviews.llvm.org/D120048>, supports
GetTls on Solaris 11.3 and Illumos that lack `dlpi_tls_modid`.  It's the
same method originally used in D91605 <https://reviews.llvm.org/D91605>,
but integrated into `GetStaticTlsBoundary`.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D120059

2 years ago[SelectionDAG] Fix fptoi.sat scalable vector lowering
David Green [Thu, 21 Jul 2022 07:00:22 +0000 (08:00 +0100)]
[SelectionDAG] Fix fptoi.sat scalable vector lowering

Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vectors, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028

2 years ago[RFC][MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducersUsingSC...
lorenzo chelini [Tue, 19 Jul 2022 13:25:03 +0000 (15:25 +0200)]
[RFC][MLIR][SCF] Enable better bufferization for `TileConsumerAndFuseProducersUsingSCFForOp`

Replace iterators of the outermost loop with region arguments of the innermost
one. This change prevents later `bufferization` passes from inserting allocations
within the body of the innermost loop.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D130083

2 years ago[X86] Add test case for shuffle
Luo, Yuanke [Thu, 21 Jul 2022 06:41:34 +0000 (14:41 +0800)]
[X86] Add test case for shuffle

2 years ago[LoopCacheAnalysis] Fix a type mismatch problem in cost calculation
Congzhe Cao [Thu, 21 Jul 2022 05:35:58 +0000 (01:35 -0400)]
[LoopCacheAnalysis] Fix a type mismatch problem in cost calculation

There is a problem in loop cache analysis that the types of SCEV variables
`Coeff` and `ElemSize` in function `isConsecutive()` may not match. The
mismatch would cause SCEV failures when `Coeff` is multiplied with `ElemSize`.

The fix in this patch is to extend both `Coeff` and `ElemSize` to
whichever of their two types is wider. As a clean-up, the duplicate calculation
of `Stride` in `computeRefCost()` is then removed.
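
The "extend both operands to the wider type before multiplying" step can be sketched with a toy expression type that carries an explicit bit width. This is not the SCEV API: `ToyExpr`, `toyExtendTo`, `toyMulSameWidth`, and `toyMulWidened` are hypothetical names, and the extension here only records the new width rather than modelling sign or zero extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy expression: a value tagged with the bit width of its type.
struct ToyExpr {
  unsigned bits;
  int64_t value;
};

// Widens an expression to the requested width (never narrows).
inline ToyExpr toyExtendTo(ToyExpr e, unsigned bits) {
  assert(bits >= e.bits && "this sketch only widens");
  return {bits, e.value};
}

// Multiplication that, like the failure described above, insists on
// operands of identical width.
inline ToyExpr toyMulSameWidth(ToyExpr a, ToyExpr b) {
  assert(a.bits == b.bits && "operand types must match");
  return {a.bits, a.value * b.value};
}

// The fix in miniature: pick the wider width, extend both, then multiply.
inline ToyExpr toyMulWidened(ToyExpr coeff, ToyExpr elemSize) {
  unsigned wide = std::max(coeff.bits, elemSize.bits);
  return toyMulSameWidth(toyExtendTo(coeff, wide), toyExtendTo(elemSize, wide));
}
```

Calling `toyMulSameWidth` directly on a 32-bit and a 64-bit operand would trip the assertion, whereas `toyMulWidened` accepts any mix of widths.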

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D128877