vect/aarch64: Fix various sve/cond*.c failures
author Richard Sandiford <richard.sandiford@arm.com>
Fri, 27 Jan 2023 17:03:50 +0000 (17:03 +0000)
committer Richard Sandiford <richard.sandiford@arm.com>
Fri, 27 Jan 2023 17:03:50 +0000 (17:03 +0000)
commit 553f8003ba5ecfdf0574a171692843ef838226b4
tree 5a7d5e630b51bbdf9949787a2e3638b22bf8dfbd
parent 32d1c3dd1d63399cca20567fb35d1ff90e30b655

Quite a few gcc.target/aarch64/sve/cond*.c tests started failing
after g:68e0063397ba820e71adc220b2da0581dce29ffa, but it turns out
that the tests were only passing by cheating before that patch.

The tests involve comparing the cost of N wide compares, a pack
sequence, and a narrow COND_EXPR with the cost of a single COND_EXPR
on fewer elements.  The costs for the former included all operations,
but the costs for the latter didn't model the comparison embedded in
the COND_EXPR.  That patch made us include the comparison on both sides,
making it apples-to-apples, but that's enough to tip the balance in
favour of using the wider types.

I think the new choice does reflect the current SVE cost model
correctly.  (Whether and how the model should be tweaked is a
different question.)  This patch therefore changes the tuning
vector length to one that makes the choice more obvious.

That in turn needs a tweak to compare_inside_loop_cost.
The function compares body_cost1/vf1 with body_cost2/vf2,
but for fully-masked loops, it limits vf to the actual number
of iterations.  This is so that (say) an expensive 16-element
vector body doesn't win over a cheaper 8-element vector body
when there are only 7 elements to process.
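
The comparison described above can be sketched roughly as follows.
This is a simplified illustration, not the code in tree-vectorizer.cc:
the Candidate struct, the function name and the integer costs are all
hypothetical, and the VFs are treated as plain constants.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for one candidate vector loop body.
struct Candidate {
  uint64_t body_cost;  // cost of one vector iteration
  uint64_t vf;         // elements handled per vector iteration
};

// Return true if A is cheaper per element than B when a fully-masked
// loop runs for at most NITERS scalar iterations.  Each VF is capped
// at NITERS so that a wide body gets no credit for lanes it never uses.
bool cheaper_per_element(Candidate a, Candidate b, uint64_t niters) {
  uint64_t vf_a = std::min(a.vf, niters);
  uint64_t vf_b = std::min(b.vf, niters);
  // a.body_cost / vf_a < b.body_cost / vf_b, cross-multiplied to
  // avoid integer division.
  return a.body_cost * vf_b < b.body_cost * vf_a;
}
```

With made-up costs of 20 for a 16-element body and 12 for an 8-element
body, the wider body wins when there are plenty of iterations, but the
narrower one wins once both VFs are capped at 7.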

However, the limit was applied using known_le, regardless of
the tuning target.  For a heuristic like this, it seems better
to use the likely minimum (which is a concept that was only
added after this code went in).
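
To illustrate the difference, here is a toy model of a length-agnostic
VF of the form base + step * x, where x >= 0 is only known at run time.
It is a sketch under stated assumptions rather than GCC's poly_int
machinery: PolyVF, tuned_x and the helper names are hypothetical, and
the likely minimum is modelled simply as the VF evaluated at the x
implied by the tuning target.

```cpp
#include <cstdint>

// Toy length-agnostic VF: base + step * x for some run-time x >= 0.
struct PolyVF {
  uint64_t base;
  uint64_t step;
};

// "Known" test: true only if niters <= vf for every possible x.
// Since step * x >= 0, the worst case is x == 0.
bool known_niters_le_vf(uint64_t niters, PolyVF vf) {
  return niters <= vf.base;
}

// Assumed model of the likely minimum: evaluate the VF at the x
// that the tuning target implies (tuned_x is a hypothetical input).
uint64_t likely_min_vf(PolyVF vf, uint64_t tuned_x) {
  return vf.base + vf.step * tuned_x;
}

// Heuristic test: compare against the likely minimum instead.
bool likely_niters_le_vf(uint64_t niters, PolyVF vf, uint64_t tuned_x) {
  return niters <= likely_min_vf(vf, tuned_x);
}
```

For a VF of 4 + 4x and 7 iterations, the known test never applies the
limit because x could be 0, whereas tuning for x == 1 (a VF of 8) does
apply it.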

g:68e0063397ba820e71adc220b2da0581dce29ffa also fixed
vcond_4_costly.c.

gcc/
* tree-vectorizer.cc (vector_costs::compare_inside_loop_cost):
Use the likely minimum VF when bounding the denominators to
the estimated number of iterations.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_asrd_1.c: Tune for a 256-bit
vector length.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_6.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
* gcc.target/aarch64/sve/vcond_4_costly.c: Remove XFAILs.
gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
gcc/tree-vectorizer.cc