review.tizen.org Git - platform/upstream/llvm.git/commit

author	Craig Topper <craig.topper@intel.com>
	Mon, 27 Apr 2020 18:53:41 +0000 (11:53 -0700)
committer	Craig Topper <craig.topper@intel.com>
	Mon, 27 Apr 2020 19:00:24 +0000 (12:00 -0700)
commit	bdbbed115f87fd2700bf10249c6a63625f59a809
tree	d9484e03796087112ac84ea0d6f90594c9b9d2a9
parent	4b9bef7e6cae4212ab7325ab3165ce01be4344bc

[X86][CostModel] Update costs for vector truncate with avx512f/avx512bw.

All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5, so raise their costs to 2, except where an earlier,
faster sequence such as pshufb is available for 128-bit input
vectors.

Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we
can extend to v16i32 and then truncate, and a cost of 2 for
avx512bw with and without avx512vl, where we can use vpmovwb with
either a ymm or zmm input. Both of these beat masking, splitting,
and using packuswb, which is our avx/avx2 codegen.
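
The tiered lookup described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the code in
X86TargetTransformInfo.cpp: the TruncCostEntry struct, the two tables,
and the lookup and truncCost helpers are hypothetical names, type names
are plain strings rather than LLVM value types, and the only entries
are the costs stated in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, simplified stand-in for a conversion cost table entry.
struct TruncCostEntry {
  std::string_view Src; // source vector type
  std::string_view Dst; // destination vector type
  int Cost;             // modeled cost of the truncate
};

// Entries taken from the commit message only; the real tables are larger.
constexpr TruncCostEntry AVX512BWTable[] = {
    {"v16i16", "v16i8", 2}, // vpmovwb with a ymm or zmm input
};

constexpr TruncCostEntry AVX512FTable[] = {
    {"v16i16", "v16i8", 3}, // extend to v16i32, then truncate
    {"v16i32", "v16i8", 2}, // avx512 truncate: 2 uops on port 5
};

// Linear search of one table; returns nothing if no entry matches.
template <std::size_t N>
std::optional<int> lookup(const TruncCostEntry (&Table)[N],
                          std::string_view Src, std::string_view Dst) {
  for (const TruncCostEntry &E : Table)
    if (E.Src == Src && E.Dst == Dst)
      return E.Cost;
  return std::nullopt;
}

// Consult the most specific feature table first, then fall back.
// An empty result stands for "use the older avx/avx2 costs", which
// this sketch does not model.
std::optional<int> truncCost(bool HasAVX512F, bool HasAVX512BW,
                             std::string_view Src, std::string_view Dst) {
  if (HasAVX512BW)
    if (std::optional<int> C = lookup(AVX512BWTable, Src, Dst))
      return C;
  if (HasAVX512F)
    if (std::optional<int> C = lookup(AVX512FTable, Src, Dst))
      return C;
  return std::nullopt;
}
```

Checking the avx512bw table before the avx512f table lets the cheaper
vpmovwb entry win for v16i16->v16i8 whenever avx512bw is available.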

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/arith-fix.ll
llvm/test/Analysis/CostModel/X86/arith-overflow.ll
llvm/test/Analysis/CostModel/X86/cast.ll
llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
llvm/test/Analysis/CostModel/X86/trunc.ll