[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs
author Roman Lebedev <lebedev.ri@gmail.com>
Sat, 2 Oct 2021 10:39:15 +0000 (13:39 +0300)
committer Roman Lebedev <lebedev.ri@gmail.com>
Sat, 2 Oct 2021 10:39:15 +0000 (13:39 +0300)
commit d1460c88a6d8739920f86383ff7d17be3dc517f6
tree a4e98c9a6b3b2e7949d3e674c1cdfebd96951207
parent f1df2d8eaf188eec2971b12e57c821a0db5f3a36

While we already model this tuple, the modeled costs diverge from reality, so fix them.
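
For context, the tuple in question is (element type i8, Stride=3, VF=8): an interleaved load reads 3 * 8 = 24 contiguous bytes and splits them into 3 vectors of 8 x i8, and an interleaved store does the inverse. The following is only a scalar sketch of that access pattern; the names `deinterleave` and `interleave` are hypothetical and are not taken from LLVM or from this patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t Stride = 3; // interleave factor (members per group)
constexpr std::size_t VF = 8;     // vectorization factor (groups per vector)

using Lane = std::array<std::uint8_t, VF>;
using Wide = std::array<std::uint8_t, Stride * VF>;

// Interleaved load: split 24 contiguous bytes into 3 x 8 bytes.
// Member M of group I lives at In[I * Stride + M].
std::array<Lane, Stride> deinterleave(const Wide &In) {
  std::array<Lane, Stride> Out{};
  for (std::size_t I = 0; I < VF; ++I)
    for (std::size_t M = 0; M < Stride; ++M)
      Out[M][I] = In[I * Stride + M];
  return Out;
}

// Interleaved store: the inverse, merge 3 x 8 bytes into 24 contiguous bytes.
Wide interleave(const std::array<Lane, Stride> &In) {
  Wide Out{};
  for (std::size_t I = 0; I < VF; ++I)
    for (std::size_t M = 0; M < Stride; ++M)
      Out[I * Stride + M] = In[M][I];
  return Out;
}
```

The cost being modeled is that of the vector shuffles that implement this rearrangement, not of the scalar loops shown here.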

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jeocxj55 - for Intel CPUs, `Block RThroughput: =6.0`; for Ryzen CPUs, `Block RThroughput: <=3.0`
So pick the worst-case cost of `6`.

For store we have:
https://godbolt.org/z/fr7xfa3K5 - for Intel CPUs, `Block RThroughput: =6.0`; for Ryzen CPUs, `Block RThroughput: <=2.0`
So pick the worst-case cost of `6`.
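
The selection above amounts to taking the largest `Block RThroughput` across the relevant sched models. A minimal sketch of that step follows; `ModelRThroughput`, `pickCost`, `loadBounds` and `storeBounds` are hypothetical names, not LLVM APIs, and rounding up to an integer is an assumption. The Ryzen entries are only the upper bounds quoted above (`<=3.0` and `<=2.0`), not per-model measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record: one llvm-mca "Block RThroughput" value per sched model.
struct ModelRThroughput {
  const char *Model;
  double RThroughput;
};

// Pick the worst (largest) reciprocal throughput across all models and
// round it up to an integer cost. Rounding up is an assumption of this sketch.
unsigned pickCost(const std::vector<ModelRThroughput> &Values) {
  assert(!Values.empty() && "need at least one sched model");
  double Worst = 0.0;
  for (const ModelRThroughput &V : Values)
    Worst = std::max(Worst, V.RThroughput);
  return static_cast<unsigned>(std::ceil(Worst));
}

// Load numbers from the message: Intel models = 6.0, Ryzen models <= 3.0.
// The znver entries are upper bounds, not exact values.
std::vector<ModelRThroughput> loadBounds() {
  return {{"haswell", 6.0}, {"broadwell", 6.0}, {"skylake", 6.0},
          {"znver1", 3.0},  {"znver2", 3.0},    {"znver3", 3.0}};
}

// Store numbers from the message: Intel models = 6.0, Ryzen models <= 2.0.
std::vector<ModelRThroughput> storeBounds() {
  return {{"haswell", 6.0}, {"broadwell", 6.0}, {"skylake", 6.0},
          {"znver1", 2.0},  {"znver2", 2.0},    {"znver3", 2.0}};
}
```

With these inputs, `pickCost(loadBounds())` and `pickCost(storeBounds())` both yield `6`, matching the costs chosen above. Because the Intel models dominate, using upper bounds for the Ryzen models does not change the result.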

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110960
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll