author Roman Lebedev <lebedev.ri@gmail.com>
Fri, 9 Dec 2022 13:25:09 +0000 (16:25 +0300)
committer Roman Lebedev <lebedev.ri@gmail.com>
Fri, 16 Dec 2022 16:27:38 +0000 (19:27 +0300)
commit cfd594f8bb5e779c81171e7c1e61ae8436efabd3
tree 424e0db4f89d2aabb3aec803a951249cd3d2bce3
parent 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)

* This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f,
* which was reverted in 25f01d593ce296078f57e872778b77d074ae5888,
  because it exposed a miscompile in the PPC backend, which was resolved
  in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5.
* which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
* which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df,
  because the cut-off on the number of vector elements was not low enough,
  and it triggered both SDAG SDNode operand number assertions,
  and caused compile-time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.

FIXME: should this respect TTI reg width * num vec regs?
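To make the FIXME concrete, here is a minimal sketch of what such a budget check could look like. It is not the code in SROA.cpp: `fitsByteVectorBudget`, `regWidthInBits` and `numVecRegs` are hypothetical names, and the two register parameters merely stand in for whatever a target would report.

```cpp
#include <cstdint>

// Hypothetical helper, not part of SROA: decides whether an alloca of
// `allocaSizeInBytes` bytes is small enough to be viewed as a vector of bytes.
// `regWidthInBits` and `numVecRegs` are assumed inputs describing the target.
constexpr bool fitsByteVectorBudget(std::uint64_t allocaSizeInBytes,
                                    std::uint64_t regWidthInBits,
                                    std::uint64_t numVecRegs) {
  // One byte per vector element, so the element count equals the byte size.
  const std::uint64_t budgetInBytes = (regWidthInBits / 8) * numVecRegs;
  return allocaSizeInBytes != 0 && allocaSizeInBytes <= budgetInBytes;
}
```

A check of this shape would scale the cut-off with the target instead of hard-coding one conservative number; whether that is the right trade-off is exactly what the FIXME leaves open.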

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.
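As a rough source-level analogy (a sketch only; the function names are hypothetical and this is not what InstCombine emits), the two forms of an 8-byte copy look like this in C++. The pointer-provenance caveat described above exists at the IR level and is not visible in this code.

```cpp
#include <cstdint>
#include <cstring>

// The memory-intrinsic form: copy 8 opaque bytes.
inline void copyViaMemcpy(void *dst, const void *src) {
  std::memcpy(dst, src, 8);
}

// The same copy routed through a 64-bit integer temporary, mirroring an
// integer-typed load followed by an integer-typed store.
inline void copyViaInteger(void *dst, const void *src) {
  std::uint64_t tmp;
  std::memcpy(&tmp, src, sizeof tmp); // plays the role of the integer load
  std::memcpy(dst, &tmp, sizeof tmp); // plays the role of the integer store
}
```

For plain data both functions leave the same bytes in the destination; the concern is only what the integer-typed view implies when the bytes happen to hold a pointer.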

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
21 files changed:
clang/test/CodeGenOpenCL/amdgpu-nullptr.cl
llvm/lib/Transforms/Scalar/SROA.cpp
llvm/test/CodeGen/AMDGPU/v1024.ll
llvm/test/DebugInfo/Generic/assignment-tracking/sroa/memcpy.ll
llvm/test/DebugInfo/Generic/assignment-tracking/sroa/memmove-to-from-same-alloca.ll
llvm/test/DebugInfo/Generic/assignment-tracking/sroa/store.ll
llvm/test/DebugInfo/Generic/assignment-tracking/sroa/user-memcpy.ll
llvm/test/DebugInfo/Generic/assignment-tracking/sroa/vec-2.ll
llvm/test/DebugInfo/X86/sroasplit-1.ll
llvm/test/DebugInfo/X86/sroasplit-4.ll
llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll
llvm/test/Transforms/SROA/address-spaces.ll
llvm/test/Transforms/SROA/alignment.ll
llvm/test/Transforms/SROA/alloca-address-space.ll
llvm/test/Transforms/SROA/basictest.ll
llvm/test/Transforms/SROA/pointer-offset-size.ll
llvm/test/Transforms/SROA/scalable-vectors.ll
llvm/test/Transforms/SROA/slice-width.ll
llvm/test/Transforms/SROA/tbaa-struct.ll
llvm/test/Transforms/SROA/tbaa-struct2.ll
llvm/test/Transforms/SROA/vector-promotion.ll