aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics
authorJonathan Wright <jonathan.wright@arm.com>
Tue, 20 Jul 2021 09:28:34 +0000 (10:28 +0100)
committerJonathan Wright <jonathan.wright@arm.com>
Fri, 23 Jul 2021 11:15:20 +0000 (12:15 +0100)
commite8de7edde6c5c3cc60f15c78422b85b4ccdc08bf
tree926acb26919bd15868b75646d73cbeeef2f373c8
parent4848e283ccaed451ddcc38edcb9f5ce9e9f2d7eb
aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst4[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.

gcc/ChangeLog:

2021-07-20  Jonathan Wright  <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_xi one vector
at a time.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.
gcc/config/aarch64/arm_neon.h
gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c