aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics
author Jonathan Wright <jonathan.wright@arm.com>
Fri, 23 Jul 2021 12:41:39 +0000 (13:41 +0100)
committer Jonathan Wright <jonathan.wright@arm.com>
Fri, 23 Jul 2021 14:38:00 +0000 (15:38 +0100)
commit 50752b751fff56e7e2c74024bae659d5e9dea50f
tree 8bc4b01e0715f1cce9e1ab2abbb8ffe2266f514d
parent ccf6e2c21be84a478bcef4cced49879879a1104c
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x2
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation: previously, superfluous move
instructions were emitted for every register extraction/set in this
additional structure.
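
The difference between the two copying patterns can be sketched in
portable C. This is only an illustrative model, not the code from
arm_neon.h: the types vec128, vec128x2 and opaque_x2 and the helpers
set_slot, pack_per_vector, pack_memcpy and packings_match are
hypothetical stand-ins (for a Neon vector, a two-vector structure such
as int32x4x2_t, and the opaque __builtin_aarch64_simd_oi type), and
plain memcpy stands in for __builtin_memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in arm_neon.h.  */
typedef struct { int32_t lane[4]; } vec128;      /* models int32x4_t */
typedef struct { vec128 val[2]; } vec128x2;      /* models int32x4x2_t */
typedef struct { unsigned char bytes[2 * sizeof (vec128)]; } opaque_x2;
                                 /* models __builtin_aarch64_simd_oi */

_Static_assert (sizeof (opaque_x2) == sizeof (vec128x2),
                "the model assumes both layouts have the same size");

/* Models a per-register "set" step: returns a copy of O with vector V
   written into slot IDX.  */
static opaque_x2
set_slot (opaque_x2 o, vec128 v, int idx)
{
  assert (idx == 0 || idx == 1);
  memcpy (o.bytes + idx * sizeof (vec128), &v, sizeof (v));
  return o;
}

/* Old pattern: build the opaque structure one vector at a time.  */
static opaque_x2
pack_per_vector (vec128x2 val)
{
  opaque_x2 o = { { 0 } };
  o = set_slot (o, val.val[0], 0);
  o = set_slot (o, val.val[1], 1);
  return o;
}

/* New pattern: copy the whole vector structure in a single step.  */
static opaque_x2
pack_memcpy (vec128x2 val)
{
  opaque_x2 o;
  memcpy (&o, &val, sizeof (val));
  return o;
}

/* Returns 1 when both patterns produce byte-identical results.  */
int
packings_match (void)
{
  vec128x2 val = { { { { 1, 2, 3, 4 } }, { { 5, 6, 7, 8 } } } };
  opaque_x2 a = pack_per_vector (val);
  opaque_x2 b = pack_memcpy (val);
  return memcmp (&a, &b, sizeof (a)) == 0;
}
```

Both helpers produce the same bytes; the single copy just has fewer
intermediate values. The model cannot show the superfluous move
instructions themselves, since those were specific to the AArch64
register extraction/set operations described above.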

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.

gcc/ChangeLog:

2021-07-23  Jonathan Wright  <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst1_s64_x2): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_oi one vector at a time.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
gcc/config/aarch64/arm_neon.h
gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c