Vectorization of first-order recurrences
authorRichard Biener <rguenther@suse.de>
Thu, 6 Oct 2022 11:56:09 +0000 (13:56 +0200)
committerRichard Biener <rguenther@suse.de>
Mon, 17 Oct 2022 10:36:35 +0000 (12:36 +0200)
commit46a8e017d048ec3271bbb898942e3b166c4e8ff3
treee6b1a35d4364b763d6ca815781a84d61dd46dff2
parentacdb24166d13d87c374e578d2ad5d58249171930
Vectorization of first-order recurrences

The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences.  That solves two TSVC missed optimization PRs.

There's a new scalar cycle def kind, vect_first_order_recurrence
and it's handling of the backedge value vectorization is complicated
by the fact that the vectorized value isn't the PHI but instead
a (series of) permute(s) shifting in the recurring value from the
previous iteration.  I've implemented this by creating both the
single vectorized PHI and the series of permutes when vectorizing
the scalar PHI but leave the backedge values in both unassigned.
The backedge values are (for the testcases) computed by a load
which is also the place after which the permutes are inserted.
That placement also restricts the cases we can handle (without
resorting to code motion).

I added both costing and SLP handling though SLP handling is
restricted to the case where a single vectorized PHI is enough.

Missing is epilogue handling - while prologue peeling would
be handled transparently by adjusting iv_phi_p the epilogue
case doesn't work with just inserting a scalar LC PHI since
that a) keeps the scalar load live and b) that loads is the
wrong one, it has to be the last, much like when we'd vectorize
the LC PHI as live operation.  Unfortunately LIVE
compute/analysis happens too early before we decide on
peeling.  When using fully masked loop vectorization the
vect-recurr-6.c works as expected though.

I have tested this on x86_64 for now, but since epilogue
handling is missing there's probably no practical cases.
My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
just fine but I didn't feel like running SPEC within SDE nor
is the WHILE_ULT patch complete enough.

PR tree-optimization/99409
PR tree-optimization/99394
* tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add.
(stmt_vec_info_type::recurr_info_type): Likewise.
(vectorizable_recurr): New function.
* tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New
function.
(vect_analyze_scalar_cycles_1): Look for first order
recurrences.
(vect_analyze_loop_operations): Handle them.
(vect_transform_loop): Likewise.
(vectorizable_recurr): New function.
(maybe_set_vectorized_backedge_value): Handle the backedge value
setting in the first order recurrence PHI and the permutes.
* tree-vect-stmts.cc (vect_analyze_stmt): Handle first order
recurrences.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Likewise.
(vect_is_simple_use): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise.
(vect_build_slp_tree_2): Likewise.
(vect_schedule_scc): Handle the backedge value setting in the
first order recurrence PHI and the permutes.

* gcc.dg/vect/vect-recurr-1.c: New testcase.
* gcc.dg/vect/vect-recurr-2.c: Likewise.
* gcc.dg/vect/vect-recurr-3.c: Likewise.
* gcc.dg/vect/vect-recurr-4.c: Likewise.
* gcc.dg/vect/vect-recurr-5.c: Likewise.
* gcc.dg/vect/vect-recurr-6.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise.

Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
13 files changed:
gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s252.c
gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s254.c
gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s291.c
gcc/testsuite/gcc.dg/vect/vect-recurr-1.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-recurr-2.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-recurr-3.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-recurr-4.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-recurr-5.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-recurr-6.c [new file with mode: 0644]
gcc/tree-vect-loop.cc
gcc/tree-vect-slp.cc
gcc/tree-vect-stmts.cc
gcc/tree-vectorizer.h