[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
author    Simon Pilgrim <llvm-dev@redking.me.uk>
          Fri, 13 Mar 2020 18:42:43 +0000 (18:42 +0000)
committer Simon Pilgrim <llvm-dev@redking.me.uk>
          Fri, 13 Mar 2020 18:43:04 +0000 (18:43 +0000)
commit    05c0d3491822b3a74f49be2fe8c8273e436ab7ec
tree      05fe2a6d9c28c3b2c91d9f330dcf2a2b7520a82a
parent    a213ece30bdb8b604ea0933edbd9c2ca77b9631f
[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)

If we're extracting element 0 of a v16i8 vector, we're better off using MOVD than PEXTRB, unless we're storing the value or we require PEXTRB's implicit zero extension.
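
For illustration, a minimal LLVM IR sketch of the affected pattern (function names are hypothetical; the assembly in the comments assumes an x86-64 SSE4.1 target and is only indicative):

  ; Element 0 of a v16i8 used as a scalar: lowered via MOVD plus an
  ; implicit truncation to %al, instead of PEXTRB.
  define i8 @extract0(<16 x i8> %x) {
    ; before: pextrb $0, %xmm0, %eax
    ; after:  movd   %xmm0, %eax
    %e = extractelement <16 x i8> %x, i32 0
    ret i8 %e
  }

  ; Exception: a direct store keeps PEXTRB, which has a
  ; memory-destination form.
  define void @extract0_store(<16 x i8> %x, i8* %p) {
    ; pextrb $0, %xmm0, (%rdi)
    %e = extractelement <16 x i8> %x, i32 0
    store i8 %e, i8* %p
    ret void
  }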

The biggest perf diff is on SLM (Silvermont) targets, where MOVD (uops=1, lat=3, tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.
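
A sketch of the analogous v8i16 case that already lowers this way (again with a hypothetical function name):

  ; Element 0 of a v8i16: MOVD is already preferred over PEXTRW here.
  define i16 @extract0_w(<8 x i16> %x) {
    ; movd %xmm0, %eax   (rather than pextrw $0, %xmm0, %eax)
    %e = extractelement <8 x i16> %x, i32 0
    ret i16 %e
  }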

Differential Revision: https://reviews.llvm.org/D76138
25 files changed:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avg.ll
llvm/test/CodeGen/X86/avx512-vec3-crash.ll
llvm/test/CodeGen/X86/bitcast-vector-bool.ll
llvm/test/CodeGen/X86/buildvec-insertvec.ll
llvm/test/CodeGen/X86/extract-concat.ll
llvm/test/CodeGen/X86/horizontal-reduce-smax.ll
llvm/test/CodeGen/X86/horizontal-reduce-smin.ll
llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
llvm/test/CodeGen/X86/scalar_widen_div.ll
llvm/test/CodeGen/X86/var-permute-128.ll
llvm/test/CodeGen/X86/var-permute-512.ll
llvm/test/CodeGen/X86/vector-bitreverse.ll
llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
llvm/test/CodeGen/X86/vector-reduce-add.ll
llvm/test/CodeGen/X86/vector-reduce-and.ll
llvm/test/CodeGen/X86/vector-reduce-mul.ll
llvm/test/CodeGen/X86/vector-reduce-or.ll
llvm/test/CodeGen/X86/vector-reduce-smax.ll
llvm/test/CodeGen/X86/vector-reduce-smin.ll
llvm/test/CodeGen/X86/vector-reduce-umax.ll
llvm/test/CodeGen/X86/vector-reduce-umin.ll
llvm/test/CodeGen/X86/vector-reduce-xor.ll
llvm/test/CodeGen/X86/widen_bitops-0.ll