[AArch64] Use NEON's tbl1 for 16xi8 and 8xi8 build vector with mask.
author Lawrence Benson <github@lawben.com>
Wed, 29 Mar 2023 14:26:28 +0000 (15:26 +0100)
committer David Green <david.green@arm.com>
Wed, 29 Mar 2023 14:26:28 +0000 (15:26 +0100)
commit 267d6d665cf2379ebfcc65fa385a35529c83a7d0
tree d3320b24f6ca908456594bab63dfd1423810dac5
parent 0b57d47bfab9d12d749d96627716eebdd4a9d636

When Clang's __builtin_shufflevector is used with a 16xi8 or 8xi8 source and a
runtime mask on an AArch64 target, LLVM currently generates 16 or 8
extract+and+insert operations. This patch replaces that per-lane sequence with
NEON's tbl1 instruction (plus a vector AND).
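
To illustrate why the two sequences are interchangeable, here is a minimal
scalar sketch of the 16xi8 case. It is a model, not the code in
AArch64ISelLowering.cpp. The names Vec16, shuffle_per_lane, tbl1_model and
shuffle_and_tbl1 are hypothetical, and the sketch assumes the AND wraps each
mask byte to the range 0..15. tbl1 with one table register yields 0 for any
index outside 0..15, which is why the AND has to stay in front of it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16xi8 vector register.
using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of the old lowering: for every lane, extract the mask byte,
// AND it with 0x0F (assumed wrap to 0..15), and insert the selected source byte.
Vec16 shuffle_per_lane(const Vec16& src, const Vec16& mask) {
    Vec16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = src[mask[i] & 0x0F];
    }
    return out;
}

// Scalar model of tbl1 with a single 16-byte table register:
// indices outside 0..15 produce 0 in that lane.
Vec16 tbl1_model(const Vec16& table, const Vec16& idx) {
    Vec16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
    }
    return out;
}

// Scalar model of the new lowering: one vector AND on the mask, then tbl1.
Vec16 shuffle_and_tbl1(const Vec16& src, const Vec16& mask) {
    Vec16 wrapped{};
    for (std::size_t i = 0; i < wrapped.size(); ++i) {
        wrapped[i] = mask[i] & 0x0F;
    }
    return tbl1_model(src, wrapped);
}
```

Under these assumptions both functions return the same vector for every
possible mask, while calling tbl1_model directly on an unmasked index such as
0xFF would zero that lane instead of selecting byte 15.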

Issue: https://github.com/llvm/llvm-project/issues/60515

Differential Revision: https://reviews.llvm.org/D146212
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/test/CodeGen/AArch64/neon-shuffle-vector-tbl.ll [new file with mode: 0644]