author: Roman Lebedev <lebedev.ri@gmail.com>
Sun, 14 Nov 2021 16:53:01 +0000 (19:53 +0300)
committer: Roman Lebedev <lebedev.ri@gmail.com>
Sun, 14 Nov 2021 17:01:38 +0000 (20:01 +0300)
commit: 4dd2f0446cf5de76c16a2663049a874f5c90ac92
tree: 7725b41980d3e72a0b7be2daa5bb33c3de3830af
parent: e876698a5dc48697a077ae51455fb3520ed7410d
[X86][Costmodel] `getReplicationShuffleCost()`: promote 16 bit-wide elements to 32 bit when no AVX512BW

The basic idea is simple: if we don't have a native shuffle for this element type,
then we must have a native shuffle for a wider element type,
so promote, replicate, demote.
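To illustrate why promote, replicate, demote is semantically safe, here is a minimal scalar sketch, assuming zero-extension for the promotion step and plain truncation for the demotion step. It models only the data movement, not the instructions or costs that `getReplicationShuffleCost()` actually computes; `replicate` and `replicateViaPromotion` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics of a replication shuffle: each source element is
// repeated ReplicationFactor times, e.g. {a, b} with factor 3 -> {a, a, a, b, b, b}.
template <typename T>
std::vector<T> replicate(const std::vector<T> &Src, unsigned ReplicationFactor) {
  std::vector<T> Out;
  Out.reserve(Src.size() * ReplicationFactor);
  for (const T &Elt : Src)
    for (unsigned I = 0; I != ReplicationFactor; ++I)
      Out.push_back(Elt);
  return Out;
}

// Hypothetical model of the lowering when there is no native i16 shuffle
// (no AVX512BW): widen i16 -> i32, replicate as i32 (assuming a native dword
// permute is available), then truncate back to i16.
std::vector<uint16_t> replicateViaPromotion(const std::vector<uint16_t> &Src,
                                            unsigned ReplicationFactor) {
  // Promote: zero-extend every element to 32 bits.
  std::vector<uint32_t> Wide(Src.begin(), Src.end());
  // Replicate in the wider element type.
  std::vector<uint32_t> WideRep = replicate(Wide, ReplicationFactor);
  // Demote: truncation keeps the low 16 bits, which hold the original value.
  std::vector<uint16_t> Out;
  Out.reserve(WideRep.size());
  for (uint32_t W : WideRep)
    Out.push_back(static_cast<uint16_t>(W));
  assert(Out.size() == Src.size() * ReplicationFactor);
  return Out;
}
```

For any input, `replicateViaPromotion(Src, F)` yields the same elements as `replicate(Src, F)`, which is why the cost of the i16 replication can reasonably be modeled as extend + i32 replication + truncate.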

I believe asking `getCastInstrCost(Instruction::Trunc, ...)` is semantically correct.
Case in point: `trunc <32 x i32> to <32 x i8>`, i.e. 2 * ZMM, will naively result in
2 * XMM, which then have to be packed into 1 * YMM,
so the cost should include that packing,
not just the truncations.
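The register arithmetic behind that example can be checked with a small sketch. It assumes a simplified model in which every truncation and every merge counts as one operation, each source ZMM truncates into its own narrower register, and each merge (for example an XMM-into-YMM insert) combines two pieces. `numRegs` and `truncOpCount` are hypothetical helpers, not LLVM's cost tables.

```cpp
#include <cassert>

constexpr unsigned XMMBits = 128, YMMBits = 256, ZMMBits = 512;

// Hypothetical helper: number of RegBits-wide registers needed to hold a
// <NumElts x EltBits> vector (ceiling division), ignoring real legalization.
constexpr unsigned numRegs(unsigned NumElts, unsigned EltBits, unsigned RegBits) {
  return (NumElts * EltBits + RegBits - 1) / RegBits;
}

// Source <32 x i32> = 1024 bits -> 2 ZMM.
static_assert(numRegs(32, 32, ZMMBits) == 2, "source needs 2 ZMM");
// Each ZMM holds <16 x i32>; truncated it becomes <16 x i8> = 128 bits -> 1 XMM.
static_assert(numRegs(16, 8, XMMBits) == 1, "each half fits in 1 XMM");
// Result <32 x i8> = 256 bits -> 1 YMM, so the two XMM pieces must be packed.
static_assert(numRegs(32, 8, YMMBits) == 1, "result fits in 1 YMM");

// Simplified op count: one truncation per source ZMM, plus
// (pieces - destination registers) two-input merges to pack the pieces.
constexpr unsigned truncOpCount(unsigned NumElts, unsigned SrcBits,
                                unsigned DstBits) {
  unsigned Truncs = numRegs(NumElts, SrcBits, ZMMBits);
  unsigned DstRegs = numRegs(NumElts, DstBits, ZMMBits);
  unsigned Merges = Truncs > DstRegs ? Truncs - DstRegs : 0;
  return Truncs + Merges;
}

// trunc <32 x i32> to <32 x i8>: 2 truncations + 1 pack.
static_assert(truncOpCount(32, 32, 8) == 3, "2 truncs + 1 pack");
```

Under this model, counting only the truncations would report 2 operations instead of 3, which is the undercount the cast-cost query is meant to avoid.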

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113609
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/shuffle-replication-i16.ll