arm: Auto-vectorization for MVE: add pack/unpack patterns
author Christophe Lyon <christophe.lyon@linaro.org>
Thu, 3 Jun 2021 14:35:50 +0000 (14:35 +0000)
committer Christophe Lyon <christophe.lyon@linaro.org>
Mon, 14 Jun 2021 16:39:21 +0000 (16:39 +0000)
commit 046a3beb1673bf4a61c131373b6a5e84158e92bf
tree 492b33eefb5e745a6009ad8d51496c39e9a6eccf
parent 12d13cf50fe68c898ee65d71d1ba9cdb3ea07996

This patch adds vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>,
vec_pack_trunc_<mode> patterns for MVE.
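
As a rough model of what these standard names compute, following the GCC internals description of the vec_unpack<su>_hi/lo and vec_pack_trunc optabs, the C sketch below works on plain arrays. Eight int16_t lanes are widened into two halves of four int32_t lanes, and two such halves are truncated and concatenated again. It assumes little-endian lane numbering (the "lo" half holds elements 0-3). The function names are hypothetical and correspond neither to GCC internals nor to any intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of vec_unpacks_lo for an 8 x int16_t vector:
   sign-extend elements 0..3 into 4 x int32_t lanes (little-endian
   lane numbering assumed).  */
static void unpack_s16_lo(int32_t out[4], const int16_t in[8])
{
    for (int i = 0; i < 4; i++)
        out[i] = in[i];              /* implicit sign extension */
}

/* Hypothetical model of vec_unpacks_hi: same for elements 4..7.  */
static void unpack_s16_hi(int32_t out[4], const int16_t in[8])
{
    for (int i = 0; i < 4; i++)
        out[i] = in[i + 4];
}

/* Hypothetical model of vec_pack_trunc: truncate every 32-bit lane of
   both operands to 16 bits and concatenate them, first operand first.
   GCC converts out-of-range values to int16_t modulo 2^16.  */
static void pack_trunc_s32(int16_t out[8], const int32_t lo[4],
                           const int32_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (int16_t)lo[i];
        out[i + 4] = (int16_t)hi[i];
    }
}
```

Unpacking both halves and packing them again returns the original vector, which is the property the vectorizer relies on when it performs 16-bit loop arithmetic in 32-bit lanes and narrows the result afterwards.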

It does so by moving the unpack patterns from neon.md to
vec-common.md and extending them to support MVE. The pack expander is
derived from the Neon one (which in turn is renamed to
neon_quad_vec_pack_trunc_<mode>).

The patch introduces mve_vec_unpack<US>_lo_<mode> and
mve_vec_unpack<US>_hi_<mode> which are similar to their Neon
counterparts, except for the assembly syntax.

The patch introduces mve_vec_pack_trunc_lo_<mode> to avoid the need for a
zero-initialized temporary, which would be needed if the
vec_pack_trunc_<mode> expander called @mve_vmovn[bt]q_<supf><mode>
instead.
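
The temporary comes from the semantics of the narrowing moves. VMOVNB writes only the even-numbered (bottom) 16-bit lanes of its destination and leaves the odd lanes unchanged. An intrinsic-style pattern that models it therefore has to take the previous destination value as an input. The C sketch below models the two Arm MVE narrowing moves on arrays (function names are hypothetical). It shows that once VMOVNT has filled the odd lanes, the initial contents of the destination no longer matter, which is presumably why a dedicated pattern for the first step can drop that input.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MVE "VMOVNB.I32 Qd, Qm": truncate the four 32-bit lanes of
   qm and write them to the even (bottom) 16-bit lanes of qd.  The odd
   lanes of qd keep their previous values, so qd is an input as well
   as an output.  */
static void vmovnb_i32(int16_t qd[8], const int32_t qm[4])
{
    for (int i = 0; i < 4; i++)
        qd[2 * i] = (int16_t)qm[i];
}

/* Model of MVE "VMOVNT.I32 Qd, Qm": same, but writes the odd (top)
   16-bit lanes and preserves the even ones.  */
static void vmovnt_i32(int16_t qd[8], const int32_t qm[4])
{
    for (int i = 0; i < 4; i++)
        qd[2 * i + 1] = (int16_t)qm[i];
}
```

Running VMOVNB then VMOVNT on two destinations that start with different contents leaves both with identical final values, so zero-initializing the destination before the pair is wasted work.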

With this patch, we can now vectorize the 16- and 8-bit versions of
vclz and vshl, although the generated code could still be improved.
For test_clz_s16, we now generate
        vldrh.16        q3, [r1]
        vmovlb.s16      q2, q3
        vmovlt.s16      q3, q3
        vclz.i32        q2, q2
        vclz.i32        q3, q3
        vmovnb.i32      q1, q2
        vmovnt.i32      q1, q3
        vstrh.16        q1, [r0]
which could be improved to
        vldrh.16        q3, [r1]
        vclz.i16        q1, q3
        vstrh.16        q1, [r0]
if we could avoid the need for unpack/pack steps.
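
VMOVLB/VMOVLT split the input by even and odd lanes, and VMOVNB/VMOVNT write the results back to even and odd lanes. As a result, the GCC sequence quoted above keeps every element in its original position even though neither 32-bit half is contiguous in memory. The C sketch below models the listed instructions lane by lane for eight int16_t elements. It is an illustrative model of the instruction sequence, not the source of test_clz_s16, and the helper names are hypothetical.

```c
#include <stdint.h>

/* One lane of VCLZ.I32: count leading zero bits, 32 for an input of 0.  */
static int32_t clz32(uint32_t x)
{
    int32_t n = 0;
    for (uint32_t bit = UINT32_C(0x80000000); bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Lane-level model of:
     vmovlb.s16 / vmovlt.s16   split into even/odd lanes, sign-extended
     vclz.i32                  on both 32-bit halves
     vmovnb.i32 / vmovnt.i32   narrow back into one 8 x 16-bit vector  */
static void clz_s16_via_unpack(int16_t dst[8], const int16_t src[8])
{
    int32_t even[4], odd[4];

    for (int i = 0; i < 4; i++) {
        even[i] = src[2 * i];                 /* vmovlb.s16 */
        odd[i] = src[2 * i + 1];              /* vmovlt.s16 */
    }
    for (int i = 0; i < 4; i++) {
        even[i] = clz32((uint32_t)even[i]);   /* vclz.i32 */
        odd[i] = clz32((uint32_t)odd[i]);     /* vclz.i32 */
    }
    for (int i = 0; i < 4; i++) {
        dst[2 * i] = (int16_t)even[i];        /* vmovnb.i32 */
        dst[2 * i + 1] = (int16_t)odd[i];     /* vmovnt.i32 */
    }
}
```

Element i of the result is the 32-bit leading-zero count of the sign-extended element i of the input, in the same lane it came from.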

For reference, clang-12 generates:
        vldrh.s32       q0, [r1]
        vldrh.s32       q1, [r1, #8]
        vclz.i32        q0, q0
        vstrh.32        q0, [r0]
        vclz.i32        q0, q1
        vstrh.32        q0, [r0, #8]
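
Clang avoids explicit unpack/pack instructions by folding the widening into the loads and the narrowing into the stores. VLDRH.S32 loads four halfwords and sign-extends each into a 32-bit lane, and VSTRH.32 stores the low halfword of each 32-bit lane. Four halfwords occupy 8 bytes, which is why the second load and store use the #8 offset. The sketch below models that sequence on arrays (helper names are hypothetical). It computes the same per-element values as the GCC sequence quoted above, using contiguous halves instead of even/odd lanes.

```c
#include <stdint.h>

/* One lane of VCLZ.I32: count leading zero bits, 32 for an input of 0.  */
static int32_t clz32(uint32_t x)
{
    int32_t n = 0;
    for (uint32_t bit = UINT32_C(0x80000000); bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Lane-level model of the clang-12 sequence: two iterations, one per
   8-byte chunk (offsets #0 and #8).  */
static void clz_s16_widening(int16_t dst[8], const int16_t src[8])
{
    for (int chunk = 0; chunk < 2; chunk++) {
        int32_t q[4];

        for (int i = 0; i < 4; i++)
            q[i] = src[4 * chunk + i];          /* vldrh.s32: load + sign-extend */
        for (int i = 0; i < 4; i++)
            q[i] = clz32((uint32_t)q[i]);       /* vclz.i32 */
        for (int i = 0; i < 4; i++)
            dst[4 * chunk + i] = (int16_t)q[i]; /* vstrh.32: store low halfword */
    }
}
```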

2021-06-11  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
	* config/arm/mve.md (mve_vec_unpack<US>_lo_<mode>): New pattern.
	(mve_vec_unpack<US>_hi_<mode>): New pattern.
	(@mve_vec_pack_trunc_lo_<mode>): New pattern.
	(mve_vmovntq_<supf><mode>): Prefix with '@'.
	* config/arm/neon.md (vec_unpack<US>_hi_<mode>): Move to
	vec-common.md.
	(vec_unpack<US>_lo_<mode>): Likewise.
	(vec_pack_trunc_<mode>): Rename to
	neon_quad_vec_pack_trunc_<mode>.
	* config/arm/vec-common.md (vec_unpack<US>_hi_<mode>): New
	pattern.
	(vec_unpack<US>_lo_<mode>): New.
	(vec_pack_trunc_<mode>): New.

gcc/testsuite/
	* gcc.target/arm/simd/mve-vclz.c: Update expected results.
	* gcc.target/arm/simd/mve-vshl.c: Likewise.
	* gcc.target/arm/simd/mve-vec-pack.c: New test.
	* gcc.target/arm/simd/mve-vec-unpack.c: New test.
gcc/config/arm/mve.md
gcc/config/arm/neon.md
gcc/config/arm/vec-common.md
gcc/testsuite/gcc.target/arm/simd/mve-vclz.c
gcc/testsuite/gcc.target/arm/simd/mve-vec-pack.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/simd/mve-vec-unpack.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/simd/mve-vshl.c