review.tizen.org Git - platform/upstream/llvm.git/commit

author	Simon Pilgrim <llvm-dev@redking.me.uk>
	Wed, 14 Jul 2021 11:03:16 +0000 (12:03 +0100)
committer	Simon Pilgrim <llvm-dev@redking.me.uk>
	Wed, 14 Jul 2021 11:03:49 +0000 (12:03 +0100)
commit	ee71c1bbccb19ed7a30b9aaf112a2c6ac2987193
tree	fc0ecd76ef7ad02de6fb98419d46c8bf64edae01
parent	90e7f5d25902fe7d7a8eac1b6050f6a3f8c0919e

[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantics of the CVTTPS2SI instruction.

We know that "CVTTPS2SI" returns 0x80000000 for out-of-range inputs (and for FP_TO_UINT, the result for negative float values is undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend instructions, using the following logic:

small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))
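
The logic above can be modelled in software to see why the OR/AND/shift combination works. The following is only an illustrative sketch in Python, not the lowering code from this patch: the helper names (to_f32, cvtt_f32_to_i32, sra31) are hypothetical, and cvtt_f32_to_i32 assumes the behaviour stated above, i.e. a truncating convert that yields 0x80000000 for NaN and out-of-range inputs.

```python
import struct

MASK32 = 0xFFFFFFFF
INDEFINITE = 0x80000000  # value assumed for out-of-range inputs, per the commit message


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cvtt_f32_to_i32(x: float) -> int:
    """Hypothetical model of a truncating f32 -> signed i32 convert.

    Returns the 32-bit result pattern; NaN and out-of-range inputs
    give 0x80000000.
    """
    if x != x or x >= 2.0**31 or x < -(2.0**31):
        return INDEFINITE
    return int(x) & MASK32  # int() truncates toward zero


def sra31(bits: int) -> int:
    """Arithmetic right shift by 31 of a 32-bit pattern: all ones if the sign bit is set, else 0."""
    return MASK32 if bits & 0x80000000 else 0


def fp_to_ui(x: float) -> int:
    """small | (cvtt(x - 2^31) & (small >>s 31)), as in the logic above."""
    x = to_f32(x)
    small = cvtt_f32_to_i32(x)
    # For x >= 2^31 the subtraction is exact in f32; for smaller x the
    # result is discarded because the mask below is zero.
    big = cvtt_f32_to_i32(to_f32(x - 2.0**31))
    return small | (big & sra31(small))
```

For inputs below 2^31, small is non-negative, the mask is zero, and the result is just small. For inputs in [2^31, 2^32), small is 0x80000000, the mask is all ones, and OR-ing in the conversion of x - 2^31 fills in the low 31 bits, so for example fp_to_ui(3000000000.0) evaluates to 3000000000.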

Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency-2, low-throughput instruction, so this logic is applied there too (in particular for AVX2). It also removes one high-latency floating-point comparison from the previous lowering.

@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).

Original Patch by @TomHender (Tom Hender)

Differential Revision: https://reviews.llvm.org/D89697

llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/fptoui.ll
llvm/test/CodeGen/X86/concat-cast.ll
llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
llvm/test/CodeGen/X86/ftrunc.ll
llvm/test/CodeGen/X86/half.ll
llvm/test/CodeGen/X86/scalar-fp-to-i32.ll
llvm/test/CodeGen/X86/scalar-fp-to-i64.ll
llvm/test/CodeGen/X86/vec_cast3.ll
llvm/test/CodeGen/X86/vec_fp_to_int.ll
llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll