[AArch64] Lower 3 and 4 sources buildvectors to TBL
author David Green <david.green@arm.com>
Sat, 26 Mar 2022 21:10:43 +0000 (21:10 +0000)
committer David Green <david.green@arm.com>
Sat, 26 Mar 2022 21:10:43 +0000 (21:10 +0000)
commit 693d3b7e76367a1dd31b594ba72bdda5391dfef3
tree 39f5cef5502e006ed5d0bac52304e95071de1115
parent b548f5847235118878c15caa8df1b89e75fc965b

The default expansion for buildvectors is to extract each element and
insert it into a new vector, which involves a lot of copying to and from
GPRs. TBL3 and TBL4 can be relatively slow instructions, and their mask
needs to be loaded from a constant pool, but they should always be
better than all the moves to/from GPRs.
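
For readers unfamiliar with TBL, the following is a minimal sketch of the
byte-table lookup that the instruction performs. It is a scalar model for
illustration only, not the lowering code in AArch64ISelLowering.cpp. The
names Vec16, tblLookup and maskByte are hypothetical and do not appear in
LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 128-bit vector register viewed as 16 bytes (illustrative type).
using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of TBL with 1 to 4 table registers. Each mask byte indexes
// into the concatenation of the table registers; an index past the end of
// the table produces 0 in that result byte.
Vec16 tblLookup(const std::vector<Vec16> &tables, const Vec16 &mask) {
  assert(!tables.empty() && tables.size() <= 4);
  Vec16 result{}; // zero-initialised, so out-of-range lanes stay 0
  for (std::size_t i = 0; i < 16; ++i) {
    std::size_t idx = mask[i];
    if (idx < tables.size() * 16)
      result[i] = tables[idx / 16][idx % 16];
  }
  return result;
}

// Hypothetical helper: the mask byte that selects byte `lane` of source
// register number `src` (0 to 3) in the concatenated table.
constexpr std::uint8_t maskByte(unsigned src, unsigned lane) {
  return static_cast<std::uint8_t>(src * 16 + lane);
}
```

Under this model, a buildvector whose elements are all extracted from up
to four source vectors corresponds to one constant mask plus one lookup,
instead of one extract and one insert per element.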

Differential Revision: https://reviews.llvm.org/D121137
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
llvm/test/CodeGen/AArch64/neon-extracttruncate.ll
llvm/test/CodeGen/AArch64/shuffle-tbl34.ll
llvm/test/CodeGen/AArch64/tbl-loops.ll