[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX
author Craig Topper <craig.topper@intel.com>
Wed, 29 Apr 2020 20:00:04 +0000 (13:00 -0700)
committer Craig Topper <craig.topper@intel.com>
Wed, 29 Apr 2020 20:21:44 +0000 (13:21 -0700)
commit cff66865322e9e990808eb5a7ed7cdacefb699d7
tree 9c9e7a266b1cd66b0229743a003425e429ea0fb9
parent 87324ac33e96526ab3fe7854079f82b1af522934

We generate much better code for these truncates than we used to, and we use the same instruction sequence for both AVX1 and AVX2.

For v4i64->v4i32 we generate:
vextractf128    xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136   # xmm0 = xmm0[0,2],xmm1[0,2]
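
To see why these two instructions amount to a truncate, the sequence can be modelled on plain Python lists. This is only an illustrative sketch of the data movement, not anything taken from LLVM: the helper names (`to_i32_lanes`, `shufps_128`, `trunc_v4i64_to_v4i32`) are hypothetical, and each register is assumed to be viewed as little-endian 32-bit lanes, so every i64 element occupies a (low, high) pair.

```python
def to_i32_lanes(values_i64):
    """View a list of 64-bit integers as little-endian 32-bit lanes."""
    lanes = []
    for v in values_i64:
        v &= 0xFFFFFFFFFFFFFFFF
        lanes.append(v & 0xFFFFFFFF)  # low half comes first
        lanes.append(v >> 32)         # then the high half
    return lanes


def shufps_128(a, b, imm):
    """Model one 128-bit lane of shufps: two picks from a, then two from b."""
    return [
        a[imm & 3],
        a[(imm >> 2) & 3],
        b[(imm >> 4) & 3],
        b[(imm >> 6) & 3],
    ]


def trunc_v4i64_to_v4i32(values_i64):
    ymm0 = to_i32_lanes(values_i64)     # 8 x 32-bit lanes
    xmm1 = ymm0[4:8]                    # vextractf128 xmm1, ymm0, 1
    xmm0 = ymm0[0:4]                    # low 128 bits of ymm0
    return shufps_128(xmm0, xmm1, 136)  # vshufps xmm0, xmm0, xmm1, 136
```

The immediate 136 is 0b10001000, so each selector pair picks elements 0 and 2, which are exactly the low 32-bit halves of the two i64 values held in each 128-bit half.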

And for v8i64->v8i32 we generate:
vperm2f128      ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128     ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136   # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]
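
The three-instruction sequence can be checked the same way. The sketch below is a hedged scalar model, assuming the eight i64 elements arrive in two registers (elements 0..3 in ymm0, elements 4..7 in ymm1) and that each register is viewed as eight little-endian 32-bit lanes. All function names are hypothetical helpers written for this illustration; they only mimic the data movement of the instructions named in the comments.

```python
def to_i32_lanes(values_i64):
    """View a list of 64-bit integers as little-endian 32-bit lanes."""
    lanes = []
    for v in values_i64:
        v &= 0xFFFFFFFFFFFFFFFF
        lanes.extend([v & 0xFFFFFFFF, v >> 32])
    return lanes


def shufps_128(a, b, imm):
    """Model one 128-bit lane of shufps: two picks from a, then two from b."""
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]


def vshufps_256(a, b, imm):
    """The 256-bit form applies the same selection to each 128-bit lane."""
    return shufps_128(a[0:4], b[0:4], imm) + shufps_128(a[4:8], b[4:8], imm)


def vperm2f128(a, b, imm):
    """Pick each 128-bit half of the result from a.lo, a.hi, b.lo or b.hi."""
    halves = [a[0:4], a[4:8], b[0:4], b[4:8]]
    lo = [0] * 4 if imm & 0x08 else halves[imm & 3]
    hi = [0] * 4 if imm & 0x80 else halves[(imm >> 4) & 3]
    return lo + hi


def vinsertf128(a, x, idx):
    """Replace the low (idx = 0) or high (idx = 1) 128-bit half of a with x."""
    return a[0:4] + x if idx & 1 else x + a[4:8]


def trunc_v8i64_to_v8i32(values_i64):
    ymm0 = to_i32_lanes(values_i64[0:4])
    ymm1 = to_i32_lanes(values_i64[4:8])
    ymm2 = vperm2f128(ymm0, ymm1, 49)        # high halves of ymm0 and ymm1
    ymm0 = vinsertf128(ymm0, ymm1[0:4], 1)   # low halves of ymm0 and ymm1
    return vshufps_256(ymm0, ymm2, 136)      # keep the low 32 bits of each i64
```

After the first two instructions, each 128-bit lane of ymm0 and ymm2 holds the i64 pairs that belong in the corresponding lane of the result, so a single in-lane `vshufps` is enough to finish the truncate without a cross-lane fixup.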

Differential Revision: https://reviews.llvm.org/D79109
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/arith-fix.ll
llvm/test/Analysis/CostModel/X86/arith-overflow.ll
llvm/test/Analysis/CostModel/X86/cast.ll
llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
llvm/test/Analysis/CostModel/X86/trunc.ll