SLP vectorize across PHI nodes
authorRichard Biener <rguenther@suse.de>
Fri, 16 Oct 2020 07:43:22 +0000 (09:43 +0200)
committerRichard Biener <rguenther@suse.de>
Tue, 27 Oct 2020 12:17:09 +0000 (13:17 +0100)
commit4a369d199bf2f34e037030b18d0da923e8a24997
tree5eac5d0e6f1a4a9d6ca63d54f5daf8fc1e2c35ca
parent5af1e827bbb624eb28f80d2c5e0da46185af3708
SLP vectorize across PHI nodes

This makes SLP discovery detect backedges by seeding the bst_map with
the node to be analyzed so it can be picked up from recursive calls.
This removes the need to discover backedges in a separate walk.

This enables SLP build to handle PHI nodes in full, continuing
the SLP build to non-backedges.  For loop vectorization this
enables outer loop vectorization of nested SLP cycles and for
BB vectorization this enables vectorization of PHIs at CFG merges.

It also turns code generation into a SCC discovery walk to handle
irreducible regions and nodes only reachable via backedges where
we now also fill in vectorized backedge defs.

This requires sanitizing the SLP tree for SLP reduction chains even
more, manually filling the backedge SLP def.

This also exposes the fact that CFG copying (and edge splitting
until I fixed that) ends up with different edge order in the
copy which doesn't play well with the desired 1:1 mapping of
SLP PHI node children and edges for epilogue vectorization.
I've tried to fixup CFG copying here but this really looks
like a dead (or expensive) end there so I've done fixup in
slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
we can run into.

There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
not sure it's possible to eliminate them all this stage1 so the
patch has quite some checks for this case all over the place.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
and SPEC CPU 2006 successfully built and tested.

2020-10-27  Richard Biener  <rguenther@suse.de>

* gimple.h (gimple_expr_type): For PHIs return the type
of the result.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Make sure edge order into copied loop headers line up with the
originals.
* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
loops with SLP.
(vectorizable_phi): New function.
(vectorizable_live_operation): For BB vectorization compute insert
location here.
* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
SLP_TREE_CHILDREN entries.
(vect_create_new_slp_node): Add overloads with pre-existing node
argument.
(vect_print_slp_graph): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_gather_slp_loads): Likewise.
(vect_optimize_slp): Likewise.
(vect_slp_analyze_node_operations): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_get_and_check_slp_defs): Handle PHIs.
(vect_build_slp_tree_1): Handle PHIs.
(vect_build_slp_tree_2): Continue SLP build, following PHI
arguments.  Fix memory leak.
(vect_build_slp_tree): Put stub node into the hash-map so
we can discover cycles directly.
(vect_build_slp_instance): Set the backedge SLP def for
reduction chains.
(vect_analyze_slp_backedges): Remove.
(vect_analyze_slp): Do not call it.
(vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
(vect_slp_analyze_node_operations): Handle stray failed
backedge defs by failing.
(vect_slp_build_vertices): Adjust leaf condition.
(vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
hash-set to handle cycles.
(vect_slp_analyze_operations): Adjust.
(vect_bb_partition_graph_r): Likewise.
(vect_slp_function): Adjust split condition to allow CFG
merges.
(vect_schedule_slp_instance): Rename to ...
(vect_schedule_slp_node): ... this.  Move DFS walk to ...
(vect_schedule_scc): ... this new function.
(vect_schedule_slp): Call it.  Remove ad-hoc vectorized
backedge fill code.
* tree-vect-stmts.c (vect_analyze_stmt): Call
vectorizable_phi.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Handle vect_backedge_def.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
set loop header PHIs to vect_unknown_def_type for loop
vectorization.
* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
(enum stmt_vec_info_type): Add phi_info_type.
(vectorizable_phi): Declare.

* gcc.dg/vect/bb-slp-54.c: New test.
* gcc.dg/vect/bb-slp-55.c: Likewise.
* gcc.dg/vect/bb-slp-56.c: Likewise.
* gcc.dg/vect/bb-slp-57.c: Likewise.
* gcc.dg/vect/bb-slp-58.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* gcc.dg/vect/bb-slp-60.c: Likewise.
* gcc.dg/vect/bb-slp-61.c: Likewise.
* gcc.dg/vect/bb-slp-62.c: Likewise.
* gcc.dg/vect/bb-slp-63.c: Likewise.
* gcc.dg/vect/bb-slp-64.c: Likewise.
* gcc.dg/vect/bb-slp-65.c: Likewise.
* gcc.dg/vect/bb-slp-66.c: Likewise.
* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
* g++.dg/vect/simd-11.cc: Likewise.
24 files changed:
gcc/gimple.h
gcc/testsuite/g++.dg/vect/simd-11.cc [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-54.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-55.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-56.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-57.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-58.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-59.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-60.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-61.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-62.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-63.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-64.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-65.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/bb-slp-66.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/vect/vect-outer-slp-1.c [new file with mode: 0644]
gcc/testsuite/gfortran.dg/vect/O3-bb-slp-1.f [new file with mode: 0644]
gcc/testsuite/gfortran.dg/vect/O3-bb-slp-2.f [new file with mode: 0644]
gcc/tree-vect-loop-manip.c
gcc/tree-vect-loop.c
gcc/tree-vect-slp.c
gcc/tree-vect-stmts.c
gcc/tree-vectorizer.c
gcc/tree-vectorizer.h