[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e...
authorjeff <Jeffrey.Byrnes@amd.com>
Wed, 21 Sep 2022 18:09:30 +0000 (18:09 +0000)
committerJeffrey Byrnes <Jeffrey.Byrnes@amd.com>
Mon, 3 Oct 2022 19:58:29 +0000 (12:58 -0700)
commitf4e6149d8217176f71591b277e9cd08be5f732c1
treec8b7b1d897bebe6ca36592487db2e9b6ef2df539
parentc1bfa414284f887f3f0b5709ebbd281dc3ccc34f
[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)

If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead.

Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945
53 files changed:
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
llvm/lib/Target/AMDGPU/SIInstructions.td
llvm/lib/Target/AMDGPU/SOPInstructions.td
llvm/test/CodeGen/AMDGPU/GlobalISel/fpow.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.atomic.dim.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.sample.cd.g16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.sample.g16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
llvm/test/CodeGen/AMDGPU/add.v2i16.ll
llvm/test/CodeGen/AMDGPU/build-vector-packed-partial-undef.ll
llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
llvm/test/CodeGen/AMDGPU/combine-vload-extract.ll
llvm/test/CodeGen/AMDGPU/divergence-driven-buildvector.ll
llvm/test/CodeGen/AMDGPU/extract-subvector-16bit.ll
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll
llvm/test/CodeGen/AMDGPU/fmax_legacy.f16.ll
llvm/test/CodeGen/AMDGPU/fmin_legacy.f16.ll
llvm/test/CodeGen/AMDGPU/fshr.ll
llvm/test/CodeGen/AMDGPU/idot4s.ll
llvm/test/CodeGen/AMDGPU/idot4u.ll
llvm/test/CodeGen/AMDGPU/idot8s.ll
llvm/test/CodeGen/AMDGPU/idot8u.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.cd.a16.dim.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.cd.g16.encode.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.cd.g16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.encode.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.ll
llvm/test/CodeGen/AMDGPU/load-hi16.ll
llvm/test/CodeGen/AMDGPU/load-lo16.ll
llvm/test/CodeGen/AMDGPU/pack.v2f16.ll
llvm/test/CodeGen/AMDGPU/pack.v2i16.ll
llvm/test/CodeGen/AMDGPU/partial-shift-shrink.ll
llvm/test/CodeGen/AMDGPU/strict_fadd.f16.ll
llvm/test/CodeGen/AMDGPU/strict_fma.f16.ll
llvm/test/CodeGen/AMDGPU/strict_fmul.f16.ll
llvm/test/CodeGen/AMDGPU/strict_fsub.f16.ll
llvm/test/CodeGen/AMDGPU/sub.v2i16.ll
llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll