AArch64: Add combine patterns for narrowing shift of half top bits (shuffle)
author Tamar Christina <tamar.christina@arm.com>
Wed, 20 Oct 2021 16:07:54 +0000 (17:07 +0100)
committer Tamar Christina <tamar.christina@arm.com>
Wed, 20 Oct 2021 16:07:54 +0000 (17:07 +0100)
commit 41812e5e35e231c500468aa1ca779f7c703dc1a3
tree c65aabe34a6df64555849d14d6d7af50b11fd55c
parent e33aef11e145996fc550eca07e899f0c756d3802

When doing a narrowing right shift by half the width of the original type, we
are essentially moving the top half of each element down into the narrower
result.

If we have a hi/lo pair, we can use a single shuffle (uzp2) instead of two
shifts followed by two narrowing moves (xtn/xtn2).
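To illustrate why one shuffle is enough, the C sketch below models a 128-bit
register in scalar code. The vreg type and all helper names are hypothetical
and are not part of GCC. In the register, lane i of the .4s view spans bits
32*i to 32*i+31. Lane j of the .8h view spans bits 16*j to 16*j+15, so the
top half of .4s lane i is .8h lane 2*i+1. Shift-by-16-and-narrow and uzp2
therefore select the same eight halfwords.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 128-bit AArch64 vector register.
   .4s lane i holds bits [32*i, 32*i+31]; .8h lane j holds bits
   [16*j, 16*j+15], so .8h lane 2*i+1 is the top half of .4s lane i.  */
typedef struct { uint32_t s[4]; } vreg;

/* Read lane j (0..7) of the .8h view of V.  */
static uint16_t lane_8h (const vreg *v, int j)
{
  uint32_t word = v->s[j / 2];
  return (uint16_t) ((j % 2) ? (word >> 16) : (word & 0xffffu));
}

/* Old sequence: shift both inputs right by 16, then narrow (xtn/xtn2).
   A logical shift is used here; an arithmetic shift would give the same
   16-bit result, since only bits 16..31 survive the narrowing.  */
static void shift_narrow (const vreg *lo, const vreg *hi, uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = (uint16_t) (lo->s[i] >> 16);
      out[i + 4] = (uint16_t) (hi->s[i] >> 16);
    }
}

/* New sequence: uzp2 v.8h, lo.8h, hi.8h keeps the odd-numbered
   halfword lanes of the concatenation lo:hi.  */
static void uzp2_8h (const vreg *lo, const vreg *hi, uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = lane_8h (lo, 2 * i + 1);
      out[i + 4] = lane_8h (hi, 2 * i + 1);
    }
}

/* Assert that both sequences produce the same eight halfwords.  */
static void check_topbits (const vreg *lo, const vreg *hi)
{
  uint16_t a[8], b[8];
  shift_narrow (lo, hi, a);
  uzp2_8h (lo, hi, b);
  for (int i = 0; i < 8; i++)
    assert (a[i] == b[i]);
}
```

The model deliberately ignores how GCC numbers lanes internally for
big-endian targets; it only describes the architectural register layout
that uzp2 operates on.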

For example:

typedef short int16_t;
typedef unsigned short uint16_t;

void foo (uint16_t * restrict a, int16_t * restrict d, int n)
{
    for( int i = 0; i < n; i++ )
      d[i] = (a[i] * a[i]) >> 16;
}

now generates:

.L4:
        ldr     q0, [x0, x3]
        umull   v1.4s, v0.4h, v0.4h
        umull2  v0.4s, v0.8h, v0.8h
        uzp2    v0.8h, v1.8h, v0.8h
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4

instead of:

.L4:
        ldr     q0, [x0, x3]
        umull   v1.4s, v0.4h, v0.4h
        umull2  v0.4s, v0.8h, v0.8h
        sshr    v1.4s, v1.4s, 16
        sshr    v0.4s, v0.4s, 16
        xtn     v1.4h, v1.4s
        xtn2    v1.8h, v0.4s
        str     q1, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4
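For comparison, a hand-written NEON intrinsics version of the new loop might
look like the sketch below. It is not taken from the patch. The function name
square_top, the uint16_t output type (chosen to avoid an implementation-defined
conversion to int16_t) and the scalar fallback for non-AArch64 hosts are all
assumptions of this example, which targets little-endian AArch64.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: d[i] = (a[i] * a[i]) >> 16 using an unsigned
   32-bit product.  */
void square_top (const uint16_t *restrict a, uint16_t *restrict d, int n)
{
  int i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
  for (; i + 8 <= n; i += 8)
    {
      uint16x8_t v = vld1q_u16 (a + i);                       /* ldr    */
      uint32x4_t lo = vmull_u16 (vget_low_u16 (v),
                                 vget_low_u16 (v));           /* umull  */
      uint32x4_t hi = vmull_high_u16 (v, v);                  /* umull2 */
      /* uzp2: keep the odd (top) halfword of each 32-bit product.  */
      uint16x8_t top = vuzp2q_u16 (vreinterpretq_u16_u32 (lo),
                                   vreinterpretq_u16_u32 (hi));
      vst1q_u16 (d + i, top);                                 /* str    */
    }
#endif
  /* Scalar tail, and the whole computation on non-AArch64 hosts.  */
  for (; i < n; i++)
    d[i] = (uint16_t) (((uint32_t) a[i] * a[i]) >> 16);
}
```

Usage: square_top (a, d, n) fills d[0..n-1]; for example an input of 0xFFFF
yields 0xFFFE, since 0xFFFF * 0xFFFF = 0xFFFE0001.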

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_<srn_op>topbits_shuffle<mode>_le): New.
(*aarch64_topbits_shuffle<mode>_le): New.
(*aarch64_<srn_op>topbits_shuffle<mode>_be): New.
(*aarch64_topbits_shuffle<mode>_be): New.
* config/aarch64/predicates.md
(aarch64_simd_shift_imm_vec_exact_top): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrn-combine-10.c: New test.
* gcc.target/aarch64/shrn-combine-5.c: New test.
* gcc.target/aarch64/shrn-combine-6.c: New test.
* gcc.target/aarch64/shrn-combine-7.c: New test.
* gcc.target/aarch64/shrn-combine-8.c: New test.
* gcc.target/aarch64/shrn-combine-9.c: New test.
gcc/config/aarch64/aarch64-simd.md
gcc/config/aarch64/predicates.md
gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/shrn-combine-8.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/shrn-combine-9.c [new file with mode: 0644]