Use rsqrt (X86) to speed up reciprocal square root calcs
author Sanjay Patel <spatel@rotateright.com>
Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)
commit 957efc23bb87d341a1b478d87a48bb888c2d4068
tree 48ae584987b7970cb90899c03590938f4d622799
parent 5e3a421bfcb891fc7821daa501e30c113fb1bf16

This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and enable this only for AMD btver2,
where performance improves significantly: for example, by 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two-constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate().
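
As a rough illustration of the refinement step (a sketch only, not the
DAGCombiner code from this patch): for F(X) = 1/X^2 - A, one Newton-Raphson
iteration can be written with the two constants -0.5 and -3.0 as
X' = (X * -0.5) * (A * X * X + -3.0), which is algebraically the same as
X * (1.5 - 0.5 * A * X * X). The function name refineRsqrtTwoConst and its
parameters are hypothetical, and the initial estimate is passed in by the
caller rather than produced by an rsqrt instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of LLVM: refine an estimate of 1/sqrt(a)
// with Newton-Raphson steps written in the two-constant form
//   x' = (x * -0.5) * (a * x * x + -3.0)
// The estimate stands in for what a hardware rsqrt instruction would return.
float refineRsqrtTwoConst(float a, float estimate, int iterations) {
  assert(a > 0.0f && "this sketch only handles positive inputs");
  const float minusHalf = -0.5f;
  const float minusThree = -3.0f;
  float x = estimate;
  for (int i = 0; i < iterations; ++i) {
    // Each step roughly squares the relative error of the estimate.
    x = (x * minusHalf) * (a * x * x + minusThree);
  }
  return x;
}
```

For instance, starting from an estimate of 0.49 for a = 4.0, two steps land
within about 1e-6 of the exact value 0.5. Because the error shrinks
quadratically, a hardware estimate that is already accurate to several bits
needs very few steps.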

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
llvm/include/llvm/Target/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
llvm/lib/Target/PowerPC/PPCISelLowering.h
llvm/lib/Target/X86/X86.td
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86ISelLowering.h
llvm/lib/Target/X86/X86Subtarget.cpp
llvm/lib/Target/X86/X86Subtarget.h
llvm/test/CodeGen/X86/sqrt-fastmath.ll