[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets
author Simon Pilgrim <llvm-dev@redking.me.uk>
Sun, 23 May 2021 21:50:45 +0000 (22:50 +0100)
committer Simon Pilgrim <llvm-dev@redking.me.uk>
Mon, 24 May 2021 08:48:32 +0000 (09:48 +0100)
commit 243e58868176102484c3ff1a338342633ede7361
tree 0b2f3dd12abf9717052159e83a80a3ffa3d5bff6
parent 486110eb413446dfa835d880bfd1c0d6bbe9f120

By llvm-mca analysis, Haswell/Broadwell have the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly, SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ sequence (used without AVX512DQ) at a cost of 6.
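
As a rough illustration of why the PMULUDQ sequence costs more than a single PMULLQ, the sketch below models one 64-bit lane in scalar C++. It assumes the usual decomposition of a 64x64->64 multiply into three 32x32->64 products plus shifts and adds. The helper names (`pmuludq_lane`, `mul64_via_pmuludq`) are hypothetical and this is not the LLVM lowering code itself.

```cpp
#include <cstdint>

// Scalar model of one 64-bit lane of PMULUDQ: multiply the low 32 bits of
// each operand and keep the full 64-bit product.
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// Hypothetical helper: 64-bit multiply (mod 2^64) built from three
// PMULUDQ-style products, two adds and the shifts that align the halves.
static uint64_t mul64_via_pmuludq(uint64_t a, uint64_t b) {
  uint64_t lo_lo = pmuludq_lane(a, b);        // a.lo * b.lo
  uint64_t hi_lo = pmuludq_lane(a >> 32, b);  // a.hi * b.lo
  uint64_t lo_hi = pmuludq_lane(a, b >> 32);  // a.lo * b.hi
  // The a.hi * b.hi term would only affect bits 64 and above, so it is dropped.
  return lo_lo + ((hi_lo + lo_hi) << 32);
}
```

Each vector lane needs this whole chain of multiplies, shifts and adds, whereas with AVX512DQ a single PMULLQ does the same job, which is consistent with the gap between the two SkylakeServer costs quoted above.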
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/arith-fix.ll
llvm/test/Analysis/CostModel/X86/arith-overflow.ll
llvm/test/Analysis/CostModel/X86/arith.ll
llvm/test/Analysis/CostModel/X86/reduce-mul.ll
llvm/test/Analysis/CostModel/X86/rem.ll