[X86][SSE] Improve lowering of vXi64 multiplies
author Simon Pilgrim <llvm-dev@redking.me.uk>
Wed, 21 Dec 2016 20:00:10 +0000 (20:00 +0000)
committer Simon Pilgrim <llvm-dev@redking.me.uk>
Wed, 21 Dec 2016 20:00:10 +0000 (20:00 +0000)
commit 081abbb164cceea0ff5b70d1557f2cf31198f5b9
tree 9b712808d6be653cbd6b052d85fa932a9714d3c7
parent b0761a0c1ba8ec77d3704d2450d481bc25e60a9d

As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.
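
The two sequences agree because a left shift distributes over addition modulo 2^64, so the two cross products can be summed before a single shift. A minimal scalar sketch of one 64-bit lane illustrates this. It is not the code in X86ISelLowering.cpp, and the helper names (pmuludq_lane, mul_lane_old, mul_lane_new) are hypothetical; pmuludq is modelled as an unsigned multiply of the low 32 bits of each operand, and psrlqi/psllqi as plain 64-bit shifts.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of pmuludq: multiply the
// low 32 bits of each operand and keep the full 64-bit product.
static inline uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// Previous lowering: each cross product is shifted left separately.
static inline uint64_t mul_lane_old(uint64_t a, uint64_t b) {
  uint64_t AloBlo = pmuludq_lane(a, b);
  uint64_t AloBhi = pmuludq_lane(a, b >> 32);
  uint64_t AhiBlo = pmuludq_lane(a >> 32, b);
  return AloBlo + (AloBhi << 32) + (AhiBlo << 32);
}

// New lowering: the cross products are added first, then shifted once.
static inline uint64_t mul_lane_new(uint64_t a, uint64_t b) {
  uint64_t AloBlo = pmuludq_lane(a, b);
  uint64_t AloBhi = pmuludq_lane(a, b >> 32);
  uint64_t AhiBlo = pmuludq_lane(a >> 32, b);
  return AloBlo + ((AloBhi + AhiBlo) << 32);
}
```

Both helpers return the low 64 bits of a * b. The Ahi * Bhi term is never computed because it only contributes to bits 64 and above.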

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/arith.ll
llvm/test/CodeGen/X86/avx-arith.ll
llvm/test/CodeGen/X86/avx512-arith.ll
llvm/test/CodeGen/X86/combine-mul.ll
llvm/test/CodeGen/X86/pmul.ll
llvm/test/CodeGen/X86/shrink_vmul.ll
llvm/test/CodeGen/X86/vector-trunc-math.ll