[x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores
author Sanjay Patel <spatel@rotateright.com>
Fri, 4 Feb 2022 18:13:01 +0000 (13:13 -0500)
committer Sanjay Patel <spatel@rotateright.com>
Fri, 4 Feb 2022 18:59:20 +0000 (13:59 -0500)
commit fff3e1dbaa9ee2d91dc15b39defa88346f03a4c2
tree 6da80baa09d1a1b26baf6ced817024bebba5d1cd
parent dbed14d215fed740e0e26784e7b8b00b68f5e680

As discussed in D118534, all of the recent AMD CPUs have
relatively fast (<14 cycle latency) "sqrtss" and "sqrtps"
instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on

So we should set this tuning flag so that plain "sqrt(X)" uses
the hardware instruction instead of being expanded (as opposed
to reciprocal-sqrt - there is other test coverage for that
pattern). The expansion is both slower and less accurate than
the hardware instruction.

Differential Revision: https://reviews.llvm.org/D119001
llvm/lib/Target/X86/X86.td
llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll