rs6000: Support SSE4.1 "round" intrinsics
author Paul A. Clarke <pc@us.ibm.com>
Mon, 12 Jul 2021 14:38:22 +0000 (09:38 -0500)
committer Paul A. Clarke <pc@us.ibm.com>
Thu, 13 Jan 2022 15:24:06 +0000 (09:24 -0600)
commit 5fce2e036f6ec2ab8bfdbf042e1d7fcc6c569a9a
tree 6868733761c1797513f406170fd30f06badcf59b
parent f45a2232bc8d6b88f52859cac502611395f3caf5

Suppress exceptions (when specified) by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking whether the
new value would be the same), other than using lighter-weight instructions
when possible.  Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (The need for these barriers has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
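
To make the swap concrete, the sketch below lists the two encodings
side by side.  The enum and function names are hypothetical, and the
numeric values are assumptions taken from the x86 header and the Power
ISA FPSCR RN field encodings; this commit message does not state them.
Code that passes the symbolic _MM_FROUND_* names is unaffected, while
code that hard-codes the x86 numbers is not portable.

```c
/* Assumed x86 values for the low two bits of the rounding argument.  */
enum x86_fround
{
  X86_TO_NEAREST_INT = 0x00,
  X86_TO_NEG_INF     = 0x01,
  X86_TO_POS_INF     = 0x02,
  X86_TO_ZERO        = 0x03
};

/* Assumed Power values, chosen to match the FPSCR RN field.  */
enum power_fround
{
  POWER_TO_NEAREST_INT = 0x00,
  POWER_TO_ZERO        = 0x01,
  POWER_TO_POS_INF     = 0x02,
  POWER_TO_NEG_INF     = 0x03
};

/* Hypothetical translation for code that stored raw x86 numbers.  */
static int
x86_to_power_fround (int x86_mode)
{
  switch (x86_mode & 0x03)
    {
    case X86_TO_NEAREST_INT:
      return POWER_TO_NEAREST_INT;
    case X86_TO_NEG_INF:
      return POWER_TO_NEG_INF;
    case X86_TO_POS_INF:
      return POWER_TO_POS_INF;
    default:
      return POWER_TO_ZERO;
    }
}
```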

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests plus tests at the positive and negative boundaries
of the floating-point representation.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.

Exceptions are not explicitly tested.

2022-01-13  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
(_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss, _mm_floor_pd,
_mm_floor_ps, _mm_floor_sd, _mm_floor_ss): Convert from function to
macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
gcc/config/rs6000/smmintrin.h
gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c [new file with mode: 0644]
gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c [new file with mode: 0644]