middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.
author     Tamar Christina <tamar.christina@arm.com>
           Mon, 14 Nov 2022 15:42:42 +0000 (15:42 +0000)
committer  Tamar Christina <tamar.christina@arm.com>
           Mon, 14 Nov 2022 17:40:56 +0000 (17:40 +0000)
commit     b2bb611d90d01f64a2456c29de2a2ca1211ac134
tree       beaed686bf35b867edc42d73a51ef5f0044ccb7f
parent     2b85d759dae79c930abe8118e1102ecb673b74aa
middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.

For IEEE 754 floating point formats we can replace a sequence of alternating
+/- with an fneg of a wider type followed by an fadd.  This eliminates the need
for using a permutation.  This patch adds a match.pd rule to recognize and
perform this rewriting.

For

void f (float *restrict a, float *restrict b, float *res, int n)
{
   for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}

we generate:

.L3:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fneg    v1.2d, v1.2d
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

now instead of:

.L3:
        ldr     q1, [x0, x3]
        ldr     q2, [x1, x3]
        fadd    v0.4s, v1.4s, v2.4s
        fsub    v1.4s, v1.4s, v2.4s
        tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3
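
To see why the fneg on the wider mode gives the alternating +/- above: for
IEEE 754 formats negation is just a sign-bit flip, so fneg on a 64-bit lane
only touches its top bit, which on a little-endian target is the sign bit of
the odd 32-bit float lane; an ordinary lane-wise fadd then yields a+b in the
even lanes and a-b in the odd lanes.  A minimal scalar sketch of the idea
(purely illustrative, not part of the patch):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
  float a[2] = { 1.0f, 2.0f };
  float b[2] = { 3.0f, 4.0f };

  /* Reinterpret the float pair in b as one 64-bit lane and flip its top
     bit, which is what "fneg v1.2d" does to each 64-bit lane.  On a
     little-endian target that bit is the sign bit of b[1] only.  */
  uint64_t bits;
  memcpy (&bits, b, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (b, &bits, sizeof bits);

  /* A plain element-wise add now computes a[0]+b[0] and a[1]-b[1].  */
  printf ("%f %f\n", a[0] + b[0], a[1] + b[1]);  /* prints 4.000000 -2.000000 */
  return 0;
}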

Thanks to George Steed for the idea.

gcc/ChangeLog:

* generic-match-head.cc: Include langhooks.h.
* gimple-match-head.cc: Likewise.
* match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/addsub_1.c: New test.
* gcc.target/aarch64/sve/addsub_1.c: New test.
gcc/generic-match-head.cc
gcc/gimple-match-head.cc
gcc/match.pd
gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c [new file with mode: 0644]