aarch64: Reimplement vget_low* intrinsics
author Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Fri, 5 Feb 2021 08:14:07 +0000 (08:14 +0000)
committer Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Fri, 5 Feb 2021 08:14:07 +0000 (08:14 +0000)
commit b6e7a7498732b83df61443c211b8d69454ad0b22
tree 59812c9e666a19680aef716fccaff066146bff71
parent 072f20c555907cce38a424da47b6c1baa8330169

We can do better on the vget_low* intrinsics.
Currently they reinterpret their argument as a V2DI vector, extract the low "lane",
and reinterpret that back into the shorter vector.
This is functionally correct and generates a sequence of subregs and a vec_select that, on its own,
eventually gets optimised away.
However, it is bad when we want to use the result in other SIMD operations:
the subreg-vec_select-subreg combo then blocks many combine patterns.

This patch reimplements them to emit a proper low vec_select from the start.
It generates much cleaner RTL and allows for more aggressive combinations, particularly
with the patterns that Jonathan has been pushing lately.
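
As a rough illustration of the two approaches, here is a portable C model. It is
only a sketch of the semantics, not the arm_neon.h or aarch64-simd.md code from this
patch: the model_* types and functions are hypothetical stand-ins, and the model
works on memory order, so it says nothing about RTL or big-endian lane numbering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a 128-bit and a 64-bit vector of int16 lanes. */
typedef struct { int16_t lane[8]; } model_int16x8_t;
typedef struct { int16_t lane[4]; } model_int16x4_t;

/* Model of the old scheme: view the input as two 64-bit lanes (V2DI),
   take lane 0, then view those 64 bits as the shorter vector again.  */
model_int16x4_t
model_get_low_via_reinterpret (model_int16x8_t v)
{
  uint64_t as_v2di[2];
  uint64_t low;
  model_int16x4_t out;

  /* The byte copies below assume the structs have no padding.  */
  assert (sizeof (v) == sizeof (as_v2di));
  assert (sizeof (out) == sizeof (low));

  memcpy (as_v2di, &v, sizeof (as_v2di)); /* reinterpret as V2DI */
  low = as_v2di[0];                       /* extract the low "lane" */
  memcpy (&out, &low, sizeof (out));      /* reinterpret back */
  return out;
}

/* Model of the new scheme: select the low half of the lanes directly,
   with no detour through a different element type.  */
model_int16x4_t
model_get_low_direct (model_int16x8_t v)
{
  model_int16x4_t out;
  for (int i = 0; i < 4; i++)
    out.lane[i] = v.lane[i];
  return out;
}
```

Both functions return the same four lanes; the difference the patch cares about is
the shape of the RTL the compiler sees, which this model does not capture.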

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (get_low): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_get_low<mode>): Define.
* config/aarch64/arm_neon.h (__GET_LOW): Delete.
(vget_low_f16): Reimplement using new builtin.
(vget_low_f32): Likewise.
(vget_low_f64): Likewise.
(vget_low_p8): Likewise.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_s8): Likewise.
(vget_low_s16): Likewise.
(vget_low_s32): Likewise.
(vget_low_s64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.
gcc/config/aarch64/aarch64-simd-builtins.def
gcc/config/aarch64/aarch64-simd.md
gcc/config/aarch64/arm_neon.h