sve: combine inverted masks into NOTs
author    Tamar Christina <tamar.christina@arm.com>
          Tue, 12 Oct 2021 10:34:06 +0000 (11:34 +0100)
committer Tamar Christina <tamar.christina@arm.com>
          Tue, 12 Oct 2021 10:35:45 +0000 (11:35 +0100)
commit    e36206c9940d224637083f2e91bd4c70f4b7dd20
tree      e790a32dfadb6a0215c07c30975ad779512291f4
parent    a1a7d094307080c3d994209457f732005f59fa6a

The following example

void f10(double * restrict z, double * restrict w, double * restrict x,
         double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
    }
}
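To make the two masks explicit, here is a scalar sketch of the same loop. It is only an illustration (the name f10_masked and the locals active and inv are ours, and this is not what the vectorizer emits): each element takes the x[i] + w[i] path under the mask w[i] > 0 and the y[i] - w[i] path under its inverse. The inverse is !(w[i] > 0), which is also true for NaN inputs, so it is not the same as the compare w[i] <= 0.

```c
/* Scalar sketch of f10 with the mask and its inverse spelled out.
   Hypothetical helper for illustration; not GCC's vectorized code.  */
void f10_masked(double *restrict z, double *restrict w, double *restrict x,
                double *restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        int active = w[i] > 0;   /* mask: lanes that compute x[i] + w[i] */
        int inv = !active;       /* inverted mask: also true for NaN w[i] */
        if (active)
            z[i] = x[i] + w[i];
        if (inv)
            z[i] = y[i] - w[i];
    }
}
```

Calling it with w = {1.0, -2.0, 0.0, 3.0} sends the second and third elements down the inverted path.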

currently generates:

        ld1d    z1.d, p1/z, [x1, x5, lsl 3]
        fcmgt   p2.d, p1/z, z1.d, #0.0
        fcmgt   p0.d, p3/z, z1.d, #0.0
        ld1d    z2.d, p2/z, [x2, x5, lsl 3]
        bic     p0.b, p3/z, p1.b, p0.b
        ld1d    z0.d, p0/z, [x3, x5, lsl 3]

Here a BIC is generated between p1 and p0 where a NOT would be better, since a
NOT does not require the use of p3 and it opens the pattern up to being CSEd.
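A small bitmask model shows why the NOT is enough. It is a sketch under assumptions, not GCC code: each bit of a uint64_t stands for one predicate lane, zeroing predication is modelled as an AND with the governing predicate, and p3 is assumed to be all-true (for example produced by a PTRUE), which the listing does not show. The helper names (fcmgt, bic, pnot, inverted_before, inverted_after) are hypothetical.

```c
#include <stdint.h>

/* Model only: bit i of a pred is predicate lane i.  Zeroing predication
   (pg/z) clears inactive lanes, so "governed by pg" means "pg & result".  */
typedef uint64_t pred;

/* fcmgt pd.d, pg/z, zn.d, #0.0: gt holds the lanes where zn > 0.  */
static pred fcmgt(pred pg, pred gt) { return pg & gt; }

/* bic pd.b, pg/z, pn.b, pm.b: pn AND NOT pm, under pg.  */
static pred bic(pred pg, pred pn, pred pm) { return pg & pn & ~pm; }

/* not pd.b, pg/z, pn.b: NOT pn, under pg.  */
static pred pnot(pred pg, pred pn) { return pg & ~pn; }

/* Before the patch: a second compare governed by p3, then a BIC.  */
static pred inverted_before(pred loop, pred p3, pred gt)
{
    pred cmp3 = fcmgt(p3, gt);       /* fcmgt p0.d, p3/z, z1.d, #0.0 */
    return bic(p3, loop, cmp3);      /* bic   p0.b, p3/z, p1.b, p0.b */
}

/* After the patch: reuse the single compare and NOT it under loop.  */
static pred inverted_after(pred loop, pred gt)
{
    pred cmp = fcmgt(loop, gt);      /* fcmgt p2.d, p0/z, z1.d, #0.0 */
    return pnot(loop, cmp);          /* not   p1.b, p0/z, p2.b */
}
```

With p3 all-true, both functions reduce to loop & ~gt. If p3 were narrower than the loop predicate the two results would differ in the lanes outside p3, so the equivalence shown here relies on that assumption.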

After this patch, using a 2 -> 2 split, we generate:

        ld1d    z1.d, p0/z, [x1, x5, lsl 3]
        fcmgt   p2.d, p0/z, z1.d, #0.0
        not     p1.b, p0/z, p2.b

The additional scratch register is needed so that the two operations can be
CSEd.  If both statements wrote to the same register, CSE would not be able to
reuse the compare result if other statements in between use that register.
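The effect of the scratch register can be illustrated with a toy local CSE over straight-line code. This is a hypothetical model (the insn and expr types, count_cse and the 'c'/'n' opcodes are made up for the example, and GCC's RTL CSE is far more involved): a value op(src) stays available until its holding register, or its input, is overwritten. When the compare and the NOT share a destination, the NOT clobbers the compare result and a later identical compare cannot reuse it; with a separate scratch register it can.

```c
#define MAX_EXPRS 16

/* Toy straight-line instruction: dst = op(src).  Registers are small ints.
   'c' stands for a compare and 'n' for a NOT (hypothetical encoding).  */
typedef struct { int dst; char op; int src; } insn;

/* One available expression: op(src) currently lives in register reg.  */
typedef struct { char op; int src; int reg; int valid; } expr;

/* Count how many instructions recompute a value that is still available
   in some register, i.e. could be replaced by a copy.  */
int count_cse(const insn *code, int n)
{
    expr table[MAX_EXPRS];
    int nexprs = 0, hits = 0;

    for (int i = 0; i < n; i++) {
        const insn *in = &code[i];

        /* Is op(src) already available in a live register?  */
        for (int j = 0; j < nexprs; j++)
            if (table[j].valid && table[j].op == in->op
                && table[j].src == in->src) {
                hits++;
                break;
            }

        /* Writing dst kills every expression held in dst or computed
           from dst.  */
        for (int j = 0; j < nexprs; j++)
            if (table[j].reg == in->dst || table[j].src == in->dst)
                table[j].valid = 0;

        /* Record the new value unless it was computed from the register
           it just overwrote, or the table is full.  */
        if (in->src != in->dst && nexprs < MAX_EXPRS)
            table[nexprs++] = (expr){ in->op, in->src, in->dst, 1 };
    }
    return hits;
}
```

For instance, r0 = cmp(r5); r0 = not(r0); r1 = cmp(r5) finds no reuse, while r2 = cmp(r5); r0 = not(r2); r1 = cmp(r5) lets the last compare reuse r2.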

A second pattern is needed to capture the NOR case, because combine matches the
longest sequence first.  Without this pattern we end up de-optimizing NOR and
instead emit two NOTs.  I did not find a better way to do this.
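The reason a NOR gets caught is De Morgan's law: ~(a | b) == ~a & ~b, so a governed NOR can also be read as two governed NOTs combined with an AND. The sketch below checks that identity with a small bitmask model (one bit per predicate lane, zeroing predication modelled as an AND with pg). The names are ours, and nor_as_two_nots is only one plausible shape of the de-optimized sequence, not necessarily the exact code GCC emitted.

```c
#include <stdint.h>

typedef uint64_t pred;   /* model only: one bit per predicate lane */

/* nor pd.b, pg/z, pn.b, pm.b: NOT (pn OR pm), under pg.  */
static pred pnor(pred pg, pred pn, pred pm) { return pg & ~(pn | pm); }

/* not pd.b, pg/z, pn.b: NOT pn, under pg.  */
static pred pnot(pred pg, pred pn) { return pg & ~pn; }

/* and pd.b, pg/z, pn.b, pm.b: pn AND pm, under pg.  */
static pred pand(pred pg, pred pn, pred pm) { return pg & pn & pm; }

/* The same lanes computed as two NOTs plus an AND: three predicate
   operations where a single NOR suffices.  */
static pred nor_as_two_nots(pred pg, pred pn, pred pm)
{
    return pand(pg, pnot(pg, pn), pnot(pg, pm));
}
```

Both forms agree on every input, which is why a pattern that rewrites NOT-of-compare can also fire inside a NOR unless the NOR has its own pattern.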

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (*fcm<cmp_op><mode>_bic_combine,
	*fcm<cmp_op><mode>_nor_combine, *fcmuo<mode>_bic_combine,
	*fcmuo<mode>_nor_combine): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pred-not-gen-1.c: New test.
	* gcc.target/aarch64/sve/pred-not-gen-2.c: New test.
	* gcc.target/aarch64/sve/pred-not-gen-3.c: New test.
	* gcc.target/aarch64/sve/pred-not-gen-4.c: New test.