arm: Replace arm_builtin_vectorized_function [PR106253]
This patch extends the fix for PR106253 to AArch32. As with AArch64,
we were using ACLE intrinsics to vectorise scalar built-ins, even
though the two sometimes have different ECF_* flags. (That in turn
is because the ACLE intrinsics should follow the instruction semantics
as closely as possible, whereas the scalar built-ins follow language
specs.)
The patch also removes the copysignf built-in, which only existed
for this purpose and wasn't a “real” arm_neon.h built-in.
Doing this also has the side-effect of enabling vectorisation of
rint and roundeven. Logically that should be a separate patch,
but making it one would have meant adding a new int iterator
for the original set of instructions and then removing it again
when including new functions.
I've restricted the bswap tests to little-endian because we end
up with excessive spilling on big-endian. E.g.:
sub sp, sp, #8
vstr d1, [sp]
vldr d16, [sp]
vrev16.8 d16, d16
vstr d16, [sp]
vldr d0, [sp]
add sp, sp, #8
@ sp needed
bx lr
Similarly, the copysign tests require little-endian because on
big-endian we unnecessarily load the constant from the constant pool:
vldr.32 s15, .L3
vdup.32 d0, d7[1]
vbsl d0, d2, d1
bx lr
.L3:
.word -
2147483648
gcc/
PR target/106253
* config/arm/arm-builtins.cc (arm_builtin_vectorized_function):
Delete.
* config/arm/arm-protos.h (arm_builtin_vectorized_function): Delete.
* config/arm/arm.cc (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION):
Delete.
* config/arm/arm_neon_builtins.def (copysignf): Delete.
* config/arm/iterators.md (nvrint_pattern): New attribute.
* config/arm/neon.md (<NEON_VRINT:nvrint_pattern><VCVTF:mode>2):
New pattern.
(l<NEON_VCVT:nvrint_pattern><su_optab><VCVTF:mode><v_cmp_result>2):
Likewise.
(neon_copysignf<mode>): Rename to...
(copysign<mode>3): ...this.
gcc/testsuite/
PR target/106253
* gcc.target/arm/vect_unary_1.c: New test.
* gcc.target/arm/vect_binary_1.c: Likewise.