[CostModel][X86] Improve masked load/store AVX1/AVX2 costs
authorSimon Pilgrim <llvm-dev@redking.me.uk>
Sun, 2 Jun 2019 20:37:02 +0000 (20:37 +0000)
committerSimon Pilgrim <llvm-dev@redking.me.uk>
Sun, 2 Jun 2019 20:37:02 +0000 (20:37 +0000)
commit8a32ca381d1ecbb04b456a109bcb4f4ab846380b
tree743f7a6dd94aefe90a6c08de5ff7fb11c95d52cf
parenta7bc31ebc6deddc99bcfc4ca15fc6d2573544f1c
[CostModel][X86] Improve masked load/store AVX1/AVX2 costs

A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

llvm-svn: 362338
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-widen.ll
llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll