review.tizen.org Git - platform/upstream/llvm.git/log

[PowerPC] Improve code gen for vector add

Improve codegen for vector modulo additions.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D154447

[mlir][gpu] Add dump-ptx option

When targeting NVIDIA GPUs, seeing the generated PTX is important. Currently, we don't have a simple way to do it.

This work adds a dump-ptx option to the gpu-to-cubin pass. One can use it like `gpu-to-cubin{chip=sm_90 features=+ptx80 dump-ptx}`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155166

[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms

This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.

The kernels of interest have these specific phases:
NT: 1, 2a, 2c
NN: 1, 2a, 2b
TT: 1, 2b, 2c
TN: 1, 2b

The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.

Differential Revision: https://reviews.llvm.org/D149773

Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96

[flang][runtime] Support in-tree device build of Flang runtime.

I changed the set of files that are built for experimental CUDA/OMP
builds, i.e. the files with enabled device support are built
as such and the rest of the files are built just for the host target.
With this change we can build a Flang runtime library that is fully functional
on the host target, so in-tree targets like check-flang become operational.

Reviewed By: klausler, PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D155029

[AMDGPU][AsmParser][NFC] Translate parsed MIMG instructions to MCInsts automatically.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D155061

[AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions.

The patch adds support for 'noa16' operands in non-A16 variants of
the instructions, fixes validation of A16 operands and eliminates the
custom conversion to MCInst.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155057

[RISCV] Add initial SDNode patterns for unary zvbb instructions

This patch adds pseudos and SDNode patterns for vbrev.v, vrev8.v, vclz.v,
vctz.v and vcpop.v.
I've only added them for integer element types so far since we're lacking tests
for floats.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155216

[RISCV] Correct resource cycles for vzext/vsext in SiFive7 scheduler.

The instructions produce DLEN bits per cycle. The vsetvli LMUL for these
instructions is the output EMUL. The input EMUL is scaled down by
the vector factor suffix on the instruction name.

So for LMUL=1 there are 2*DLEN bits of result produced over 2 cycles.
This makes SiFive7GetCyclesDefault the correct resource cycles.
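
As a rough illustration of the arithmetic above, the sketch below divides the number of result bits by DLEN. The helper name `resultCycles` and the concrete values VLEN=512 and DLEN=256 are assumptions, chosen so that VLEN = 2*DLEN, which is what the LMUL=1 statement implies. This is not the SiFive7GetCyclesDefault implementation.

```cpp
#include <cassert>

// Hypothetical helper: cycles needed to produce OutputLMul * VLenBits result
// bits when DLenBits bits are produced per cycle (rounded up).
constexpr unsigned resultCycles(unsigned OutputLMul, unsigned VLenBits,
                                unsigned DLenBits) {
  return (OutputLMul * VLenBits + DLenBits - 1) / DLenBits;
}

// Assumed example values with VLEN = 2 * DLEN.
constexpr unsigned AssumedVLen = 512;
constexpr unsigned AssumedDLen = 256;

static_assert(resultCycles(1, AssumedVLen, AssumedDLen) == 2,
              "LMUL=1 yields 2*DLEN bits, i.e. 2 cycles");
```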

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D155010

[AMDGPU][MC] Pre-commit tests for the noa16 intersect_ray instructions fix, D155057.

The added instructions are incorrectly encoded as a16 ones despite the
'noa16' modifiers.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D155059

[DebugInfo] Force users of DWARFDebugAbbrev to call parse before iterating

In an attempt to make it easier to catch errors when parsing the
debug_abbrev section, we should force users to call `parse` before
calling `begin`. In a follow-up change, I will change the return type of
`parse` from `void` to `Error`.
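
A minimal sketch of the "parse before iterating" contract, assuming a simplified table of integer codes. `AbbrevTableSketch`, its members and `sumCodes` are hypothetical names for illustration and are not the real DWARFDebugAbbrev interface.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table that must be parsed before iteration;
// this is not the real DWARFDebugAbbrev interface.
class AbbrevTableSketch {
  std::vector<int> RawCodes;    // unparsed input
  std::vector<int> ParsedCodes; // filled in by parse()
  bool Parsed = false;

public:
  explicit AbbrevTableSketch(std::vector<int> Raw) : RawCodes(std::move(Raw)) {}

  // Parses everything up front; calling it again is a no-op.
  void parse() {
    if (Parsed)
      return;
    ParsedCodes = RawCodes;
    Parsed = true;
  }

  bool isParsed() const { return Parsed; }

  // Iteration is only valid after parse(); the assert catches misuse in
  // builds with assertions enabled.
  std::vector<int>::const_iterator begin() const {
    assert(Parsed && "must call parse() before begin()");
    return ParsedCodes.begin();
  }
  std::vector<int>::const_iterator end() const { return ParsedCodes.end(); }
};

// Usage: parse first, then iterate.
inline int sumCodes(AbbrevTableSketch &Table) {
  Table.parse();
  int Sum = 0;
  for (int Code : Table)
    Sum += Code;
  return Sum;
}
```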

I also explored using the fallible_iterator pattern instead of forcing
users to parse everything up front. I think it would be a useful and
interesting pattern to implement, but it would require more extensive
changes to both DWARFDebugAbbrev and its users. Because my top priority
is improving the safety around parsing debug_abbrev, I'm opting to
preserve existing behavior until I or somebody else has time to refactor
to be able to implement a fallible_iterator.

Differential Revision: https://reviews.llvm.org/D154655

[lldb] Support Compact C Type Format (CTF)

Add support for the Compact C Type Format (CTF) in LLDB. The format
describes the layout and sizes of C types. It is most commonly consumed
by dtrace.

We generate CTF for the XNU kernel and want to be able to use this in
LLDB to debug kernels for which we don't have dSYMs (anymore). CTF is a
much more limited debug format than DWARF which allows it to be an order
of magnitude smaller: a 1GB dSYM can be converted to a handful of
megabytes of CTF. For XNU, the goal is not to replace DWARF, but rather
to have CTF serve as a "better than nothing" debug info format when
DWARF is not available.

It's worth noting that the LLVM toolchain does not support emitting CTF.
XNU uses ctfconvert to generate CTF from DWARF which is used for
testing.

Differential revision: https://reviews.llvm.org/D154862

[lldb] Support Compact C Type Format (CTF) section

Teach LLDB about the ctf (Compact C Type Format) section.

Differential revision: https://reviews.llvm.org/D154668

[RISCV] Common remaining operand logic in performCombineVMergeAndVOps [nfc]

We can share the code for both the unmasked and masked cases, and add a missing consistency assert in the process.

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct. This is the last major piece.

[lldb] Move CommandOverrideCallbackWithResult to lldb_private namespace

This has an `lldb_private` type in its parameter, so it should be in
`lldb-private-types.h`

Differential Revision: https://reviews.llvm.org/D155129

[flang][openacc] Add support for complex mul reduction

Add support to lower reduction with the multiply operator and
complex type.

Depends on D155007

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D155014

Switch to strncpy to silence GCC stringop overflow warnings.

Thanks to Simon Pilgrim for letting me know about these in
https://reviews.llvm.org/rG9d701c8a8d65.

[BOLT] Attach ORC info to instructions in CFG

Propagate Linux Kernel ORC information read from the file to the whole
function CFG once the graph has been built. We have a choice to either
attach ORC state annotation to every instruction, or to the first
instruction in the basic block to conserve processing memory. I chose to
attach to every instruction under --print-orc option which is currently
on by default.

Depends on D155153, D154815

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D155156

[BOLT][NFC] Add post-CFG processing to MetadataRewriter interface

Add MetadataRewriter::postCFGInitializer().

Reviewed By: jobnoorman

Differential Revision: https://reviews.llvm.org/D155153

[RISCV] Reason explicitly about mask and rounding mode in performCombineVMergeAndVOps [nfc]

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.

The code structure here is overly verbose. I'm landing this staging change with the code structure exactly matching the non-masked case to make the following cleanup that commons this all obviously correct.

[BOLT] Add reading support for Linux ORC sections

Read ORC (oops rewind capability) info used for unwinding the stack by
Linux Kernel. The info is stored in .orc_unwind and .orc_unwind_ip
sections. There is also a related .orc_lookup section that is being
populated by the kernel during runtime. Contents of the sections are
sorted for quicker lookup by a post-link objtool.

Unless we modify stack access instructions, we don't have to change ORC
info attributed to instructions in the binary. However, we need to
update instruction addresses and sort both sections based on the new
layout.

For pretty printing, we add "--print-orc" option that prints ORC info
next to instructions in code dumps.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D154815

Remove Clang :: CodeGenCXX/unified-cfi-lto.cpp due to buildbot failures

This test has been failing on sanitizer-x86_64-linux-bootstrap-asan
since it was committed. Removing this test while I work on reproducing
this.

Example: https://lab.llvm.org/buildbot/#/builders/168/builds/14579

[libc++] Fix filesystem tests on platforms that don't have IO

This patch moves a few tests that were still using std::fprintf to
using TEST_REQUIRE instead, which provides a single point to tweak
for platforms that don't implement fprintf. As a fly-by fix, it also
avoids including `time_utils.h` in filesystem_clock.cpp when it is
not required, since that header makes some pretty large assumptions
about the platform it is on.

Differential Revision: https://reviews.llvm.org/D155019

[BOLT][DWARF] Fix adding DW_AT_GNU_ranges_base

There are cases in DWARF4 when Skeleton CU has ranges, but dwo CU doesn't.
The bug was introduced in the new DWARFRewriter, where the DWARF4 path would fall
through to the DWARF5 case.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D155033

[lldb] Forward declare SBPlatform and SBTypeMember in SBDefines

Differential Revision: https://reviews.llvm.org/D155137

[BOLT][DWARF][NFC] Fix false positive error

The DWO Unit DIE doesn't have low_pc/high_pc, so we were printing this error
for valid cases.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D155032

Don't assert on a non-pointer value being used for a "p" inline asm constraint.

GCC and existing codebases allow the use of integral values to be used
with this constraint. A recent change D133914 in this area started causing asserts.
Removing the assert is enough as the rest of the code works fine.

rdar://109675485

Differential Revision: https://reviews.llvm.org/D155023

[RISCV] Update test after the addition for rounding mode to vfadd intrinsic. NFC

The greediness of the operand matching regular expressions made
the test pass even though an operand is missing.

[mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLt

Also makes some minor consistency edits in the cuSparseLt wrapper lib.

Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D155139

[BOLT][DWARF][NFC] Set initial offset of DIE

Setting initial offset of DIE to input DIE. This is to make "printf" debugging
easier.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D155031

[Driver] Remove unneeded useRelaxRelocations overrides

ENABLE_X86_RELAX_RELOCATIONS has defaulted to on
(c41a18cf61790fc898dcda1055c3efbf442c14c0) for nearly 3 years.
As a clean-up, remove overrides from some early adopters.

Change OHOS to use true as agreed by the patch author D145227.

[RISCV] Common post-mask operand construction in performCombineVMergeAndVOps [nfc]

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.

This particular change involves a slightly ugly bit of code to match the glue to the mask. I'm staging it this way as I ran into a bit of weirdness when commoning mask operands, and wanted to isolate the complexity.

[Demangle] use std::string_view::data rather than &*std::string_view::begin

To fix expensive check builds that were failing when using MSVC's
std::string_view::iterator::operator*, I added a few expressions like
&*std::string_view::begin. @nico pointed out that this is literally the
same thing and more clearly expressed as std::string_view::data.
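
A minimal sketch of the equivalence for a non-empty view. The helper names `ptrViaBegin`, `ptrViaData` and `samePointer` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helpers (not from the patch) contrasting the two spellings.

// Takes the address of the element the begin iterator refers to.
inline const char *ptrViaBegin(std::string_view SV) { return &*SV.begin(); }

// Asks the view for its underlying pointer directly.
inline const char *ptrViaData(std::string_view SV) { return SV.data(); }

// For a non-empty view both spellings yield the same address. The empty
// case is excluded here so that no iterator is dereferenced at end().
inline bool samePointer(std::string_view SV) {
  return !SV.empty() && ptrViaBegin(SV) == ptrViaData(SV);
}
```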

Link: https://github.com/llvm/llvm-project/issues/63740
Reviewed By: #libc_abi, ldionne, philnik, MaskRay

Differential Revision: https://reviews.llvm.org/D154876

[Flang][OpenMP][Lower] Program level implicit SAVE variable handling for declare target

This is an attempt at mimicking the method in which
threadprivate handles the following type of variables:

program main
integer :: i
!$omp declare target to(i)
end

Which essentially generates a GlobalOp for the variable (which
would normally only be an alloca) when it's instantiated. The
main difference is there is no operation generated within the
function, instead the declare target attribute is appended
later within handleDeclareTarget.

Reviewers: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D152037

[RISCV] Tail common repeated code in performCombineVMergeAndVOps [nfc]

Very minor change, just making sure each step is obvious and easy to follow.

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.

[flang] Fix OMPEarlyOutlining erasing declare target functions

The early outlining pass was erasing target functions that need to be
kept. It should only erase functions that contain target ops.

[RISCV] Factor out a duplicate bit of repeated code in performCombineVMergeAndVOps [nfc]

We have the SEW operand access repeating in all paths, common it up to make the code easier to read.

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.

[flang][hlfir] Fixed character allocatable in structure constructor.

The problem appeared as a segfault for case like this:
```
type t
character(11), allocatable :: c
end type
character(12), allocatable :: x
type(t) y
y = t(x)
```

The frontend represents `y = t(x)` as `y=t(c=%SET_LENGTH(x,11_8))`.
When 'x' is unallocated the hlfir.set_length lowering results in
segfault. It could probably be handled in hlfir.set_length lowering
by using NULL base for the hlfir.declare depending on the allocation
status of 'x', but I am not sure if !hlfir.expr, in general, is supposed
to represent an expression created from unallocated allocatable.
I believe in Fortran that would mean referencing an unallocated
allocatable, which is not allowed.

I decided to special case `SET_LENGTH` in structure constructor,
so that we use its 'x' operand as the RHS for the assign operation
implying the isAllocatable check for cases when 'x' is allocatable.
This requires setting keep_lhs_length_if_realloc flag for the assign
operation. Note that when the component being initialized has
deferred length the frontend does not produce `SET_LENGTH`.

Differential Revision: https://reviews.llvm.org/D155151

[Driver] Recognize powerpc-unknown-eabi as a bare-metal toolchain

This seems to match https://gcc.gnu.org/install/specific.html#powerpc-x-eabi

It seems that anything with OS `none` (although that doesn’t seem to be distinguished from `unknown`) or with environment `eabi` should be treated as bare-metal.
Since this seems to have been handled on a case-by-case basis in the past ([arm](https://reviews.llvm.org/D33259), [riscv](https://reviews.llvm.org/D91442), [aarch64](https://reviews.llvm.org/D111134)), what I am proposing here is to add another case to the list to also handle `powerpc[64][le]-unknown-unknown-eabi` using the `BareMetal` toolchain, following the example of the existing cases. (We don’t care about powerpc64 and powerpc[64]le, but it seemed appropriate to lump them in.)

At Indel, we have been building bare-metal embedded applications that run on custom PowerPC and ARM systems with Clang and LLD for a couple of years now, using target triples `powerpc-indel-eabi`, `powerpc-indel-eabi750`, `arm-indel-eabi`, `aarch64-indel-eabi` (which I just learned from D153430 is wrong and should be `aarch64-indel-elf` instead, but that’s a different matter). This has worked fine for ARM, but for PowerPC we have been unable to call the linker (LLD) through the Clang driver, because it would insist on calling GCC as the linker, even when told `-fuse-ld=lld`. That does not work for us, there is no GCC around. Instead we had to call `ld.lld` directly, introducing some special cases in our build system to translate between linker-via-driver and linker-called-directly command line arguments. I have now dug into why that is, and found that the difference between ARM and PowerPC is that `arm-indel-eabi` hits a special case that causes the Clang driver to instantiate a `BareMetal` toolchain that is able to call LLD and works the way we need, whereas `powerpc-indel-eabi` lands in the default case of a `Generic_ELF` (subclass of `Generic_GCC`) toolchain which expects GCC.

Reviewed By: MaskRay, michaelplatings, #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D154357

[X86] Fold ANDNP(X,NOT(Y)) -> NOT(OR(X,Y))

Removing the x86-specific node helps further folding and improves commutativity
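
The fold is an instance of De Morgan's law. The sketch below checks it on 32-bit scalars, assuming ANDNP(A, B) computes (NOT A) AND B, as the x86 and-not instructions do. The function names are hypothetical and only model the bitwise semantics, not the SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the x86 and-not node: (NOT A) AND B.
constexpr std::uint32_t andnp(std::uint32_t A, std::uint32_t B) {
  return ~A & B;
}

// Left-hand side of the fold: ANDNP(X, NOT(Y)).
constexpr std::uint32_t beforeFold(std::uint32_t X, std::uint32_t Y) {
  return andnp(X, ~Y);
}

// Right-hand side of the fold: NOT(OR(X, Y)).
constexpr std::uint32_t afterFold(std::uint32_t X, std::uint32_t Y) {
  return ~(X | Y);
}

static_assert(beforeFold(0x0F0F0F0Fu, 0x00FF00FFu) ==
                  afterFold(0x0F0F0F0Fu, 0x00FF00FFu),
              "both forms agree");
```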

[flang][openacc] Add semantic check for reduction operator and types

Check the combination of reduction operator and types. This is
currently not checking common block and composite types.

Depends on D155105

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D155106

[flang][NFC] Remove duplicate of getDesignatorNameIfDataRef function

Remove duplicate of the getDesignatorNameIfDataRef() function.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D155105

[RISCV] Simplify glue handling logic in performCombineVMergeAndVOps [nfc]

This is a subset of Luke's D155063. I'm splitting pieces and landing them in the process of convincing myself all the individual transforms are in fact correct.

In this case, we're simplifying based on the assumption that all of our vmerge operands have mask operands. This is a fundamental property of a vmerge.

[Flang] -funderscoring bug fix

A bug in the -funderscoring / -fno-underscoring options (from https://reviews.llvm.org/D140795) prevented the driver option from controlling the underscoring behaviour; the behaviour could only be controlled by the pass option. The driver test case did not catch the bug and also needed to be updated.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D155042

[TableGen][CodeEmitterGen] Fix SubOpAliases MIOperandNo mixup

SubOpAliases maps a sub-operand name to the respective operand's index
and the sub-operand number within this operand. The operand index is
used for the Operands array.

Currently MIOperandNo is used as the operand index, which is not
correct. For example, if there are 2 operands with 3 sub-operands each:

(ins (bdladdr12onlylen4 $B1, $D1, $L1):$BDL1,
(bdladdr12onlylen4 $B2, $D2, $L2):$BDL2)

then B2's operand index will be 3, but the correct value is 1.
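
To make the two numberings concrete, the sketch below walks a list of operands and records, for each one, its index in the Operands array and the flat number of its first sub-operand. `OperandPosition` and `layoutOperands` are hypothetical names for illustration, not TableGen code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the two numberings for one top-level operand.
struct OperandPosition {
  unsigned OperandIdx;  // index into the Operands array
  unsigned MIOperandNo; // flat number of the operand's first sub-operand
};

// NumSubOps[i] is the number of sub-operands of operand i, e.g. {3, 3} for
// $BDL1 and $BDL2 in the example above.
inline std::vector<OperandPosition>
layoutOperands(const std::vector<unsigned> &NumSubOps) {
  std::vector<OperandPosition> Result;
  unsigned FlatNo = 0;
  for (unsigned Idx = 0; Idx < NumSubOps.size(); ++Idx) {
    Result.push_back({Idx, FlatNo});
    FlatNo += NumSubOps[Idx]; // skip over this operand's sub-operands
  }
  return Result;
}
```

For the example, the second operand (which contains $B2) gets MIOperandNo 3 but operand index 1, which is why indexing the Operands array with MIOperandNo picks the wrong entry.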

Reviewed By: jyknight

Differential Revision: https://reviews.llvm.org/D155158

[AMDGPU] Use V_FMA_MIX* more often

Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0".

Differential Revision: https://reviews.llvm.org/D153544
Reviewers: arsenm, foad

Add test case back but with !REQUIRES: amdgpu-registered-target.

[mlir][Arith] Make previous load-bearing assert into a real error

When I landed the EmulateUnsupportedFloats pass, I'd negligently included
an assert that needed to run for the pass to be correct. Previous
emergency fix commits removed the assert. This commit re-adds the
"can't happen" testing as an emitOpError() and aborting the rewrite,
thus allowing it to function in no-assertions builds.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D155088

[libc++] Remove BuildKite bridging files that are not needed anymore

Differential Revision: https://reviews.llvm.org/D155120

[mlir][Linalg] Fold/erase self-copy linalg.copy on buffers

Differential Revision: https://reviews.llvm.org/D155203

sanitizer_common: initialize sanitizer runtimes lazily from signal interceptors

Currently if a program calls sigaction very early (before non-lazy sanitizer
initialization, in particular if .preinit_array initialization is not enabled),
then sigaction will wrongly fail since the interceptor is not initialized yet.

In all other interceptors we do lazy runtime initialization for this reason,
but we don't do it in the signal interceptors.
Do lazy runtime initialization in signal interceptors as well.
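
The pattern can be sketched as follows, assuming a runtime whose initialization is idempotent. Every name here (`ensureRuntimeInitialized`, `interceptedSigaction`, the counters) is hypothetical and does not reflect the actual sanitizer_common code.

```cpp
#include <cassert>

namespace sketch {
// Hypothetical runtime state; the real sanitizer runtimes track this
// differently.
static bool RuntimeInitialized = false;
static int InitCalls = 0;

// Initializes the runtime on first use; later calls do nothing.
static void ensureRuntimeInitialized() {
  if (RuntimeInitialized)
    return;
  RuntimeInitialized = true;
  ++InitCalls;
}

// Hypothetical signal interceptor: it initializes the runtime lazily before
// doing any work, so an early call no longer fails for lack of setup.
static int interceptedSigaction(int Signum) {
  ensureRuntimeInitialized();
  return Signum > 0 ? 0 : -1; // placeholder for forwarding to the real call
}
} // namespace sketch
```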

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D155188

Precommit for more usage of V_FMA/MAD_MIX*

Make fdiv.f16.ll autogenerated.

Remove amdgpu target to fix aarch64 buildbot failure.

Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"

This time without the extra `->dump()`

A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.

The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.

Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155050

[mlir][spirv] Lower memref.reinterpret_cast

For kernel SPIR-V, we are lowering memref to bare pointers, so reinterpret can be lowered to pointer, adjusted by offset value.

Differential Revision: https://reviews.llvm.org/D155011

Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64"

This reverts commit cfa2d0a3aa0beb5422107dc9943cb0eae6d93896.

[Driver] Warn about -mios-version-min instead of erroring out when
targeting MachO embedded architectures

Sometimes users pass this option when targeting embedded architectures
like armv7m on non-darwin platforms.

Emit a warning instead of erroring out, which restores the behavior
prior to 34d7acd444b88342fc93fca202608c1e16fa5946.

[BTF] Fix BTFParserTest.cpp for unaligned access after D149058

Test bot reported an issue with unit tests for D149058 in [1]:

  [==========] Running 1 test from 1 test suite.
  [----------] Global test environment set-up.
  [----------] 1 test from BTFParserTest
  [ RUN      ] BTFParserTest.simpleCorrectInput
  /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/DebugInfo/BTF/BTFParserTest.cpp:141:33:
  runtime error: upcast of misaligned address 0x7facce60411f for type 'llvm::SmallString<0>', which requires 8 byte alignment
  0x7facce60411f: note: pointer points here
   64 00 00 00 37  41 60 ce ac 7f 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00
               ^
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
  /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/DebugInfo/BTF/BTFParserTest.cpp:141:33

The issue is caused by attribute "packed" used for too many things:

  #pragma pack(push, 1)
  struct MockData1 {
    struct B {
      ...
    } BTF;
    struct E {
      ...
    } Ext;

    int BTFSectionLen = sizeof(BTF);
    int ExtSectionLen = sizeof(Ext);

    SmallString<0> Storage;
    std::unique_ptr<ObjectFile> Obj;

  };
  #pragma pack(pop)

Access to unaligned pointers in `Storage`/`Obj` causes unaligned
access errors.

To fix this, the #pragma directives are pushed inwards to apply only to
the `B` and `E` definitions.
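
A minimal sketch of that layout, assuming placeholder fields and using std::string and std::unique_ptr<int> in place of the test's SmallString<0> and ObjectFile members. `MockDataSketch` and `storageIsAligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for the test's mock data; field names and member
// types are assumptions made for illustration.
struct MockDataSketch {
#pragma pack(push, 1)
  // Only the raw section images are packed, so their bytes are contiguous.
  struct B {
    char Tag;
    int Value;
  } BTF;
  struct E {
    char Tag;
    int Value;
  } Ext;
#pragma pack(pop)

  int BTFSectionLen = sizeof(BTF);
  int ExtSectionLen = sizeof(Ext);

  // Declared outside the packed region, so these keep natural alignment.
  std::string Storage;
  std::unique_ptr<int> Obj;
};

static_assert(sizeof(MockDataSketch::B) == sizeof(char) + sizeof(int),
              "B has no padding");
static_assert(alignof(MockDataSketch::B) == 1, "B is byte-aligned");
static_assert(alignof(MockDataSketch) >= alignof(std::string),
              "the outer struct is not packed");

// Checks that the Storage member sits at a suitably aligned address.
inline bool storageIsAligned(const MockDataSketch &M) {
  auto Addr = reinterpret_cast<std::uintptr_t>(&M.Storage);
  return Addr % alignof(std::string) == 0;
}
```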

[1] https://lab.llvm.org/buildbot/#/builders/5/builds/35040

Differential Revision: https://reviews.llvm.org/D155176

Check for denormal flushing when selecting V_FMA/MAD_MIX*

Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI."

This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.

[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64

A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.

The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.

Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155050

[GlobalISel] Fix the error transformation of BRCOND to BCC

Fix https://github.com/llvm/llvm-project/issues/62309

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D150527

[Flang][OpenMP][MLIR] Add early outlining pass for omp.target operations to flang

This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.

The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.

Reviewed By: kiranchandramohan, domada

Differential Revision: https://reviews.llvm.org/D154879

Revert rGf269877dc30777354be8a512e871aba1b1f9fd7a "[X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements"

This appears to be causing some infinite loops (and is particularly bad when D152928 is applied).

[mlir][nvvm] Add populate function (nfc)

This work adds a populate function for the nvvm to llvm conversion pattern.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D155189

[X86] Don't elide argument copies for scalarized vectors (PR63475)

When eliding argument copies, the memory layout between a plain
store of the type and the layout of the argument lowering on the
stack must match. For multi-part argument lowerings, this is not
necessarily the case.

The code already tried to prevent this optimization for "scalarized
and extended" vectors, but the check for "extends" was incomplete.
While a scalarized vector of i32s stores i32 values on the stack,
these are stored in 8 byte stack slots (on x86_64), so effectively
have padding.
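
The mismatch can be illustrated by comparing element offsets in the two layouts. The helpers below are a hypothetical model; the 4-byte element and 8-byte slot sizes come from the i32 / x86_64 example above.

```cpp
#include <cassert>

// Byte offset of element Index when elements are stored back to back, as in
// a plain store of the vector type.
constexpr unsigned packedOffset(unsigned Index, unsigned ElemBytes) {
  return Index * ElemBytes;
}

// Byte offset of element Index when each scalarized element occupies its own
// stack slot.
constexpr unsigned slotOffset(unsigned Index, unsigned SlotBytes) {
  return Index * SlotBytes;
}

// The copy can only be elided if every element sits at the same offset in
// both layouts.
inline bool layoutsMatch(unsigned NumElts, unsigned ElemBytes,
                         unsigned SlotBytes) {
  for (unsigned I = 0; I < NumElts; ++I)
    if (packedOffset(I, ElemBytes) != slotOffset(I, SlotBytes))
      return false;
  return true;
}
```

With four i32 elements in 8-byte slots, element 1 is at offset 4 in the plain store but at offset 8 on the stack, so the layouts differ.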

Rather than trying to add more special cases to handle this (which
is not straightforward), I'm going in the other direction and
exclude scalarized vectors from this optimization entirely. This
seems like a rare case that is not worth the hassle -- the complete
lack of test coverage is not reassuring either.

Fixes https://github.com/llvm/llvm-project/issues/63475.

Differential Revision: https://reviews.llvm.org/D154078

[X86] Remove out of range extract in test (NFC)

As pointed out in https://reviews.llvm.org/D154078#inline-1500915.

[gn] port d1367ca46ee4

[gn build] Port 2b2e7f6e5727

[gn] port c8e055d485ea (BTF)

[lldb][NFC] Factor out code linking OSO addr of uninitialized GVs

This code was introduced in 2fc93eabf7e132abd51d0ea0ad599beb3fa44334.
In order to improve readability of ParseVariableDIE, we move this code into a
helper function. The issue this code attempted to address was fixed between
Clangs 9 and 11; as such, if we ever want to delete this code, it is a lot
easier to do so after the refactor.

Differential Revision: https://reviews.llvm.org/D155100

[CodeGenCXX] Add test for forward declare as array elem (NFC)

To guard against the miscompile that D153142 would have introduced.

[clang][Interp][NFC] Move a declaration into an if statement

[clang][Interp][NFC] Use std::byte for byte code.

[clang][Interp][NFC] Trim Source.h includes

[X86] Prevent infinite loop in SelectionDAG when lowering negations

In certain cases, lowering negations can cause an infinite loop in SelectionDAG on X86.

The following snippet shows that behaviour:
https://godbolt.org/z/5hP45T4hY

What happens is that ADD(XOR(..., -1), 1) is detected as the two's complement and transformed into SUB(0, ...).
However, immediates cannot be encoded as the LHS of a SUB on X86.
Therefore it is transformed back into an ADD/XOR pair, which is then again transformed into a SUB and so on.

In that specific case, I still think it is valid to display this as a SUB(0, ...), because it should eventually be lowered as a NEG,
which seems better than an ADD/XOR pair.

Adding an exception to the X86 specific handling for SUBs with 0 LHS operand fixes this infinite loop.
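
The identity behind the two forms can be checked on 32-bit unsigned values. This is only a scalar model with hypothetical helper names, not the SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// Two's complement negation written as ADD(XOR(x, -1), 1).
constexpr std::uint32_t negViaAddXor(std::uint32_t X) {
  return (X ^ 0xFFFFFFFFu) + 1u;
}

// The same value written as SUB(0, x).
constexpr std::uint32_t negViaSub(std::uint32_t X) { return 0u - X; }

static_assert(negViaAddXor(0u) == negViaSub(0u), "zero negates to zero");
static_assert(negViaAddXor(1u) == 0xFFFFFFFFu, "-1 is all ones");
static_assert(negViaAddXor(0x80000000u) == negViaSub(0x80000000u),
              "the forms agree at the sign boundary");
```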

Differential Revision: https://reviews.llvm.org/D154575

[clangd][NFC] Remove dead code

refactor/tweaks/ExtractVariable.cpp:
Condition (!C++ && !ExprType) is never true because if ExprType was null
we would early-exit earlier.

tool/ClangdMain.cpp:
StaticIdx variable is not initialized before check, so checking it
doesn't make sense.

Found by static analyzer tool.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D155164

[AArch64] Extra tablegen patterns for smaller extracted addl/addw/subl/subw

During lowering, especially of smaller vector types, we can end up with
`add(extract_subvector(zext(x)), extract_subvector(zext(y)))`, which can
be turned into `extract_subvector(add(zext(y), zext(x)))`, which can use
the addl AArch64 instruction. This adds some tablegen patterns for it,
along with addw where only one operand is an extract/extend and subl/subw.
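
The rewrite relies on the element-wise add commuting with subvector extraction. A small model using std::vector, with hypothetical helper names and 8-bit to 16-bit extension as an assumed example type pair:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec8 = std::vector<std::uint8_t>;
using Vec16 = std::vector<std::uint16_t>;

// Zero-extends every element from 8 to 16 bits.
inline Vec16 zext(const Vec8 &V) { return Vec16(V.begin(), V.end()); }

// Returns Len elements of V starting at Start.
inline Vec16 extractSubvector(const Vec16 &V, std::size_t Start,
                              std::size_t Len) {
  auto First = V.begin() + static_cast<std::ptrdiff_t>(Start);
  return Vec16(First, First + static_cast<std::ptrdiff_t>(Len));
}

// Element-wise 16-bit add of two equally sized vectors.
inline Vec16 add(const Vec16 &A, const Vec16 &B) {
  Vec16 R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<std::uint16_t>(A[I] + B[I]);
  return R;
}

// True if adding the extracted halves equals extracting from the full add.
inline bool formsAgree(const Vec8 &X, const Vec8 &Y, std::size_t Start,
                       std::size_t Len) {
  Vec16 Narrow = add(extractSubvector(zext(X), Start, Len),
                     extractSubvector(zext(Y), Start, Len));
  Vec16 Wide = extractSubvector(add(zext(X), zext(Y)), Start, Len);
  return Narrow == Wide;
}
```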

Differential Revision: https://reviews.llvm.org/D153632

[RISCV] Don't fold vmerge into ops if fp exception can be raised

We are already checking for fp exceptions if VL changes, but I believe we
should also be checking for them if the mask changes, since that also
affects the set of active elements. From the spec:
> A vector floating-point exception at any active floating-point element sets
> the standard FP exception flags in the fflags register. Inactive elements do
> not set FP exception flags.

Note that we don't change the mask if IsMasked is true, i.e. True is masked
already, since in that case we keep the existing mask.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154980

[RISCV] Add test for vmerge combine that should be prevented

The fadd in these test cases is constrained and may set fflags differently
depending on the active elements (the nofpexcept flag isn't set on the node).
Therefore to preserve semantics we shouldn't change its mask.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154979

[amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables

These aren't implemented. They could be, at moderate implementation
complexity. Raising an error is better than silently miscompiling.

Patching now because the patch at D155125 is a step towards using this metadata
more extensively as part of the lowering path and that will interact badly with
input variables with this annotation.

Lowering user defined variables at specific addresses would drop this error,
put them at the requested position in the frame during this pass, and then
use the same codegen that will be used for the kernel specific struct shortly.

Reviewed By: jmmartinez

Differential Revision: https://reviews.llvm.org/D155132

[libc][NFC] Split memcpy implementations per platform

This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155099

[X86] Remove combineVectorTruncation and delay general vector trunc to lowering

Stop folding vector truncations to PACKSS/PACKUS patterns prematurely - another step towards Issue #63710. We still prematurely fold to PACKSS/PACKUS if there are sufficient signbits, that will be addressed in a later patch when we remove combineVectorSignBitsTruncation.

This required ReplaceNodeResults to extend handling of sub-128-bit results to SSSE3 (or later) cases, which has allowed us to improve vXi32->vXi16 truncations to use PSHUFB.

I also tweaked LowerTruncateVecPack to recognise widened truncation source operands so the upper elements remain UNDEF (otherwise truncateVectorWithPACK* will constant fold them to allzeros/allones values).

[LSR] Don't consider users of constant outside loop

In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.

This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.

The same reasoning also applies to other types of constants, such
as global variable references.
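
For readers unfamiliar with the pattern, here is a small source-level C++ sketch of the global-variable case: a base address that is a constant, used both inside a loop and after it. The names `g_buf` and `fillAndPeek` are hypothetical, and the sketch makes no claim about the code any particular target generates.

```cpp
#include <cstddef>

// Hypothetical global buffer; its address is a link-time constant.
int g_buf[64];

// Writes the first n elements (capped at 64), then reads one element
// after the loop has finished.
int fillAndPeek(std::size_t n) {
  for (std::size_t i = 0; i < n && i < 64; ++i)
    g_buf[i] = static_cast<int>(i); // in-loop use: g_buf + i
  return g_buf[0];                  // use of g_buf outside the loop
}
```

The read after the loop is the extra "use" discussed above. Because the address of `g_buf` is a constant that can be rematerialized, that use does not by itself justify keeping the base in a register across the loop.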

Differential Revision: https://reviews.llvm.org/D155073

[libc][NFC] Split bcmp implementations per platform

This is a follow-up to D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155076

[LSR] Add test variant with global variables (NFC)

A variant of the test using globals instead of inttoptr expressions
for D155073.

[Clang][RISCV] Align RVV intrinsic builtin names with the C intrinsics

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155102

[X86] canonicalizeShuffleMaskWithHorizOp - fold 256-bit permute(hop(x,y)) -> hop(extract(x),extract(x)) iff we only demand the lower elements

Attempt to recognise when we can narrow a 256-bit hop to a lower 128-bit hop by extracting the requested subvectors (and then widening back)

[X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements

Help expose undef elements for further shuffle combines

Noticed while trying to improve truncation packss/packus patterns for sub-128-bit results.

[DWARFv5][DWARFLinker] Add stripped template names into .debug_names.

The D153869 patch stopped storing stripped template names in the
.debug_names accelerator table. This patch restores the original
behavior, as lldb relies on presenting stripped names. Changes for
llvm-dwarfdump will be done in a separate patch.

Differential Revision: https://reviews.llvm.org/D155070

Revert "[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas"

This reverts commit 96ae0851c26237378fa1280b0a9ad713e1b72bdb.

[IR] Remove LLVMPointerTo intrinsic type (NFC)

With opaque pointers, this is equivalent to llvm_ptr_ty. However,
this particular type was actually completely unused.

[lld][COFF] Add -print-search-paths for debugging.

While working on adding more implicit search paths to the
lld COFF driver, it was helpful to have a way to print all
the search paths, both for debugging and for testing without
having to create very complicated test cases.

This is a simple arg that just prints the search paths and exits.

Related to the efforts in #63827

Differential Revision: https://reviews.llvm.org/D155047

[Format][Tooling] Fix HeaderIncludes::insert not respecting the main-file header.

Differential Revision: https://reviews.llvm.org/D154963

[RISCV] Adjust formatting under RISCVInstrInfoVPseudos.td (NFC)

CC: craig.topper

[analyzer] Fix crash in MoveChecker when it tries to report duplicate issue

The 'MoveChecker' was missing a check that the error node was
successfully generated (i.e. that a non-null value was returned). A null
node is returned if a duplicate of the report is emitted.

This patch also contains an NFC change: 'reportBug' is renamed to
'tryReportBug' to better indicate the conditional behavior of the function.
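
The shape of the fix can be sketched in C++ as follows. This is a simplified model under stated assumptions, not the Clang Static Analyzer API: `ErrorNode`, `Reporter`, `generateErrorNode`, and the string-based duplicate detection are hypothetical stand-ins; only the name `tryReportBug` comes from the patch description.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analyzer's types; not the real Clang API.
struct ErrorNode {
  std::string Message;
};

class Reporter {
  std::set<std::string> Seen;
  ErrorNode Storage;

public:
  // Returns nullptr when an identical report was already generated.
  const ErrorNode *generateErrorNode(const std::string &Msg) {
    if (!Seen.insert(Msg).second)
      return nullptr;
    Storage = ErrorNode{Msg};
    return &Storage;
  }
};

// Returns true only if a report was actually emitted. The null check is
// the part that was missing before the fix.
bool tryReportBug(Reporter &R, const std::string &Msg,
                  std::vector<std::string> &Emitted) {
  const ErrorNode *N = R.generateErrorNode(Msg);
  if (!N)
    return false; // duplicate: nothing to report, and N must not be used
  Emitted.push_back(N->Message);
  return true;
}
```

Calling `tryReportBug` twice with the same message emits one report and returns `false` the second time, instead of dereferencing a null node.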

Author: Arseniy Zaostrovnykh <arseniy.zaostrovnykh@sonarsource.com>

Reviewed By: xazax.hun

Differential Revision: https://reviews.llvm.org/D155084

[RISCV] Remove unused private field 'HasFRMRoundModeOp' in RVVIntrinsic (NFC)

/data/llvm-project/clang/include/clang/Support/RISCVVIntrinsicUtils.h:390:8: error: private field 'HasFRMRoundModeOp' is not used [-Werror,-Wunused-private-field]
  bool HasFRMRoundModeOp;
       ^
1 error generated.

Revert "[Support] Move StringExtras.h include from Error.h to Error.cpp"

This reverts commit 5be8c89ed6d959d57ae21c168dc4215ce0d05553.

[IR] Partially remove pointer element types from intrinsic signatures (NFC)

As typed pointers are no longer supported, we should no longer
specify element types in intrinsic signatures.

The only meaningful pointer types are now:

    llvm_ptr_ty -> ptr
    llvm_anyptr_ty -> ptr addrspace(any)
    LLVMQualPointerType<N> -> ptr addrspace(N)

This is only "partially" because we also have a bunch of special
IIT descriptors like LLVMPointerTo, LLVMPointerToElt and
LLVMAnyPointerToElt, which I'll leave for a later revision.

Differential Revision: https://reviews.llvm.org/D155086

[8/8][RISCV] Add rounding mode control variant for vfredosum, vfredusum, vfwredosum, vfwredusum

Depends on D154635

For the cover letter of the patch-set, please see D154628.

This is the 8th patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154636

[7/8][RISCV] Add rounding mode control variant for conversion intrinsics between floating-point and integer

Depends on D154634

For the cover letter of the patch-set, please see D154628.

This is the 7th patch of the patch-set. This patch includes changes to
vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f,
vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x, vfncvt_f_xu, and vfncvt_f_f.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154635