AVX512FP16: Optimize _Float16 reciprocal for div and sqrt
author Hongyu Wang <hongyu.wang@intel.com>
Mon, 25 Oct 2021 09:00:46 +0000 (17:00 +0800)
committer Hongyu Wang <hongyu.wang@intel.com>
Thu, 28 Oct 2021 01:51:00 +0000 (09:51 +0800)
commit 5720c450fab749664b32dbcd14d0a66f8ba57e5f
tree e8a0e1d2079122f76df5dc6142558b3f7b3d5c49
parent 04a2cf3fd65c21e9099edea462c22446fa72d398

For the _Float16 type, add insns and expanders to optimize x / y to
x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
Since half-precision floats show only a minor precision difference
between div and mul * rcp, there is no need for a Newton-Raphson
refinement step.
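
As a rough illustration of the source patterns involved, the sketch below
shows a plain _Float16 division next to a hand-written model of the
x * rcp (y) form. The names half_t, plain_div, div_via_rcp and div_array
are hypothetical and do not come from the commit or its tests. The float
fallback is an assumption added only so the sketch builds on compilers
without _Float16. The reciprocal is modeled here as 1 / y, whereas the
hardware instruction returns an approximation.

```c
/* Use _Float16 where the compiler provides it; otherwise fall back to
   float so the sketch still builds.  The fallback is an assumption made
   for portability and is not part of the commit.  */
#ifdef __FLT16_MAX__
typedef _Float16 half_t;
#else
typedef float half_t;
#endif

/* Source pattern the commit targets: a plain half-precision division.  */
half_t
plain_div (half_t x, half_t y)
{
  return x / y;
}

/* Hand-written model of the rewritten form x * rcp (y).  The reciprocal
   is modeled by 1 / y; the real instruction is only approximate.  */
half_t
div_via_rcp (half_t x, half_t y)
{
  half_t r = (half_t) 1 / y;
  return x * r;
}

/* Loop form of the same division, a candidate for the vector expander.
   Hypothetical helper, written only for illustration.  */
void
div_array (half_t *dst, const half_t *a, const half_t *b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] / b[i];
}
```

Compiling plain_div with AVX512FP16 enabled (-mavx512fp16) and fast-math
style options such as -ffast-math, one would expect a reciprocal
instruction followed by a multiply in place of a divide. The exact flags
are an assumption here, since the commit message does not list them.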

gcc/ChangeLog:

* config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable
  HFmode rsqrt without TARGET_SSE_MATH.
  (ix86_optab_supported_p): Refactor rint, adjust floor, ceil,
  btrunc condition to be restricted by -ftrapping-math, adjust
  use_rsqrt_p function call.
* config/i386/i386.md (rcphf2): New define_insn.
  (rsqrthf2): Likewise.
* config/i386/sse.md (div<mode>3): Change VF2H to VF2.
  (div<mode>3): New expander for HF mode.
  (rsqrt<mode>2): Likewise.
  (*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass.
  (*avx512fp16_vmrsqrtv8hf2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-recip-1.c: New test.
* gcc.target/i386/avx512fp16-recip-2.c: Ditto.
* gcc.target/i386/pr102464.c: Add -fno-trapping-math.
gcc/config/i386/i386.c
gcc/config/i386/i386.md
gcc/config/i386/sse.md
gcc/testsuite/gcc.target/i386/avx512fp16-recip-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/avx512fp16-recip-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr102464.c