Continuing the fixes that stop the ceil and floor functions from
raising the "inexact" exception, this patch fixes the x86_64 SSE4.1
versions. The roundss / roundsd instructions take an immediate operand
that selects the rounding mode and whether "inexact" is raised; setting
bit 3 (value 8) of that operand suppresses "inexact", which is what
this patch does.
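
As a rough illustration of the encoding described above, the following
C sketch builds the immediate from a rounding mode and the
"suppress inexact" bit. The names round_mode, SUPPRESS_INEXACT,
round_imm and check_patch_immediates are hypothetical and do not exist
in glibc; the sketch assumes bit 2 of the immediate is left clear, so
the rounding mode comes from bits 1:0 rather than from MXCSR.

```c
/* Illustrative sketch only: none of these names are glibc identifiers.  */
#include <assert.h>

/* Bits 1:0 of the roundss / roundsd immediate select the rounding mode
   (assuming bit 2 is clear, so MXCSR is not consulted).  */
enum round_mode
{
  ROUND_NEAREST = 0,      /* To nearest, ties to even.  */
  ROUND_DOWN = 1,         /* Toward minus infinity: floor.  */
  ROUND_UP = 2,           /* Toward plus infinity: ceil.  */
  ROUND_TOWARD_ZERO = 3   /* Toward zero: trunc.  */
};

/* Bit 3 of the immediate suppresses the "inexact" exception.  */
#define SUPPRESS_INEXACT (1u << 3)

/* Compose the immediate operand for the given rounding mode.  */
unsigned int
round_imm (enum round_mode mode, int suppress_inexact)
{
  unsigned int imm = (unsigned int) mode;
  if (suppress_inexact)
    imm |= SUPPRESS_INEXACT;
  return imm;
}

/* Check the immediates used before and after this patch, plus the
   value the remark on trunc / truncf refers to.  */
void
check_patch_immediates (void)
{
  assert (round_imm (ROUND_UP, 0) == 2);            /* Old ceil.  */
  assert (round_imm (ROUND_UP, 1) == 10);           /* New ceil.  */
  assert (round_imm (ROUND_DOWN, 0) == 1);          /* Old floor.  */
  assert (round_imm (ROUND_DOWN, 1) == 9);          /* New floor.  */
  assert (round_imm (ROUND_TOWARD_ZERO, 1) == 11);  /* trunc.  */
}
```

Calling check_patch_immediates () from a test program should pass
silently; it only confirms the arithmetic behind the constants, not the
runtime behaviour of the instructions.
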

Remark: there is no SSE4.1 version of trunc / truncf (which would use
this instruction with immediate operand 11). I'd expect one to make
sense, but it should of course be benchmarked against the existing C
code. I'll file a bug in Bugzilla for the lack of such a version.

Tested for x86_64.

[BZ #15479]
* sysdeps/x86_64/fpu/multiarch/s_ceil.S (__ceil_sse41): Set bit 3
of immediate operand to rounding instruction.
* sysdeps/x86_64/fpu/multiarch/s_ceilf.S (__ceilf_sse41):
Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.S (__floor_sse41):
Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf_sse41):
Likewise.
+2016-05-24 Joseph Myers <joseph@codesourcery.com>
+
+ [BZ #15479]
+ * sysdeps/x86_64/fpu/multiarch/s_ceil.S (__ceil_sse41): Set bit 3
+ of immediate operand to rounding instruction.
+ * sysdeps/x86_64/fpu/multiarch/s_ceilf.S (__ceilf_sse41):
+ Likewise.
+ * sysdeps/x86_64/fpu/multiarch/s_floor.S (__floor_sse41):
+ Likewise.
+ * sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf_sse41):
+ Likewise.
+
2016-05-24 Paul E. Murphy <murphyp@linux.vnet.ibm.com>
* math/libm-test.inc (MIN_EXP): Directly define as
ENTRY(__ceil_sse41)
- roundsd $2, %xmm0, %xmm0
+ roundsd $10, %xmm0, %xmm0
ret
END(__ceil_sse41)
ENTRY(__ceilf_sse41)
- roundss $2, %xmm0, %xmm0
+ roundss $10, %xmm0, %xmm0
ret
END(__ceilf_sse41)
ENTRY(__floor_sse41)
- roundsd $1, %xmm0, %xmm0
+ roundsd $9, %xmm0, %xmm0
ret
END(__floor_sse41)
ENTRY(__floorf_sse41)
- roundss $1, %xmm0, %xmm0
+ roundss $9, %xmm0, %xmm0
ret
END(__floorf_sse41)