From: Richard Sandiford
Date: Wed, 23 Sep 2020 11:29:40 +0000 (+0100)
Subject: vect: Fix epilogue loop handling of partial vectors
X-Git-Tag: upstream/12.2.0~13561
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=4452a7660b224ff310d246bc7f8c612669c8cd98;p=platform%2Fupstream%2Fgcc.git

vect: Fix epilogue loop handling of partial vectors

This patch fixes the fallout that Kewen reported on Power after the
recent change to avoid unnecessary use of partial vectors.

As Kewen said, the problem is that vect_analyze_loop_2 doesn't know
how many epilogue iterations there will be, and so it cannot make a
final decision about whether the number of iterations forces an
epilogue loop to use partial vectors.

This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling.  Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.

The patch takes a similar approach for the decision about whether
to use partial vectors.  As the comments in the patch say, the idea
is that vect_analyze_loop_2 should make peeling and partial-vector
decisions based on the assumption that the loop_vinfo will be used
as the main loop, while vect_do_peeling should make them in the
knowledge that the loop_vinfo will be used as an epilogue loop.
This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.

I hope the patch makes the (mostly preexisting) structure a bit
more obvious.  It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.

Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.

Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.

gcc/
	* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function.
	* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
	Reject using vector epilogue loops for single iterations.  Install
	the constant number of epilogue loop iterations in the associated
	loop_vinfo.  Rely on vect_determine_partial_vectors_and_peeling
	to do the main part of the test.
	(vect_do_peeling): Use vect_update_epilogue_niters to handle
	epilogue loops with a known number of iterations.  Skip recomputing
	the number of iterations later in that case.  Otherwise, use
	vect_determine_partial_vectors_and_peeling to decide whether the
	epilogue loop needs to use partial vectors or peeling.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
	default can_use_partial_vectors_p to false if partial-vector-usage=0.
	(determine_peel_for_niter): Remove in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function,
	split out from...
	(vect_analyze_loop_2): ...here.  Reflect the vect_verify_full_masking
	and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
	rather than USING_PARTIAL_VECTORS_P.

gcc/testsuite/
	* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
	single-iteration epilogues of the 64-bit loops to be vectorized.
	* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
---

diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
index ebb2f45..d248f09 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
@@ -10,6 +10,6 @@
 
 /* { dg-final { scan-assembler-times {\mlxvx?\M} 20 } } */
 /* { dg-final { scan-assembler-times {\mstxvx?\M} 10 } } */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 14 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
index 9d40328..a27ee34 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
@@ -8,4 +8,4 @@
 
 #include "p9-vec-length-7.h"
 
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
index 6b54a29..961df0d 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
@@ -8,5 +8,5 @@
 
 #include "p9-vec-length-8.h"
 
-/* { dg-final { scan-assembler-times {\mlxvl\M} 30 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 47cfa6f..7cf00e6 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
     rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
 }
 
+/* EPILOGUE_VINFO is an epilogue loop that we now know would need to
+   iterate exactly CONST_NITERS times.  Make a final decision about
+   whether the epilogue loop should be used, returning true if so.  */
+
+static bool
+vect_update_epilogue_niters (loop_vec_info epilogue_vinfo,
+			     unsigned HOST_WIDE_INT const_niters)
+{
+  /* Avoid wrap-around when computing const_niters - 1.  Also reject
+     using an epilogue loop for a single scalar iteration, even if
+     we could in principle implement that using partial vectors.  */
+  unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
+  if (const_niters <= gap_niters + 1)
+    return false;
+
+  /* Install the number of iterations.  */
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo));
+  tree niters_tree = build_int_cst (niters_type, const_niters);
+  tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1);
+
+  LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree;
+  LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree;
+
+  /* Decide what to do if the number of epilogue iterations is not
+     a multiple of the epilogue loop's vectorization factor.  */
+  return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true);
+}
+
 /* Function vect_do_peeling.
 
    Input:
@@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   int estimated_vf;
   int prolog_peeling = 0;
   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
+  bool vect_epilogues_updated_niters = false;
   /* We currently do not support prolog peeling if the target alignment
      is not known at compile time.  'vect_gen_prolog_loop_niters' depends
      on the target alignment being constant.  */
@@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (vect_epilogues
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && prolog_peeling >= 0
-      && known_eq (vf, lowest_vf)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
+      && known_eq (vf, lowest_vf))
     {
      unsigned HOST_WIDE_INT eiters
	= (LOOP_VINFO_INT_NITERS (loop_vinfo)
@@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
      eiters
	= eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
 
-      unsigned int ratio;
-      unsigned int epilogue_gaps
-	= LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
-      while (!(constant_multiple_p
-	       (GET_MODE_SIZE (loop_vinfo->vector_mode),
-		GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio)
-	       && eiters >= lowest_vf / ratio + epilogue_gaps))
+      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
	{
	  delete epilogue_vinfo;
	  epilogue_vinfo = NULL;
@@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
	    }
	  epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
	  loop_vinfo->epilogue_vinfos.ordered_remove (0);
-	  epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
	}
+      vect_epilogues_updated_niters = true;
     }
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
@@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
	 skip_e edge.  */
      if (skip_vector)
	{
-	  gcc_assert (update_e != NULL && skip_e != NULL);
+	  gcc_assert (update_e != NULL
+		      && skip_e != NULL
+		      && !vect_epilogues_updated_niters);
	  gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)),
					   update_e->dest);
	  tree new_ssa = make_ssa_name (TREE_TYPE (niters));
@@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
	  niters = PHI_RESULT (new_phi);
	}
 
-      /* Subtract the number of iterations performed by the vectorized loop
-	 from the number of total iterations.  */
-      tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
-					  before_loop_niters,
-					  niters);
-
-      LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
-      LOOP_VINFO_NITERSM1 (epilogue_vinfo)
-	= fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
-		       epilogue_niters,
-		       build_one_cst (TREE_TYPE (epilogue_niters)));
-
      /* Set ADVANCE to the number of iterations performed by the previous
	 loop and its prologue.  */
      *advance = niters;
 
-      /* Redo the peeling for niter analysis as the NITERs and alignment
-	 may have been updated to take the main loop into account.  */
-      determine_peel_for_niter (epilogue_vinfo);
+      if (!vect_epilogues_updated_niters)
+	{
+	  /* Subtract the number of iterations performed by the vectorized loop
+	     from the number of total iterations.  */
+	  tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
+					      before_loop_niters,
+					      niters);
+
+	  LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
+	  LOOP_VINFO_NITERSM1 (epilogue_vinfo)
+	    = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
+			   epilogue_niters,
+			   build_one_cst (TREE_TYPE (epilogue_niters)));
+
+	  /* Decide what to do if the number of epilogue iterations is not
+	     a multiple of the epilogue loop's vectorization factor.
+	     We should have rejected the loop during the analysis phase
+	     if this fails.  */
+	  if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
+							   true))
+	    gcc_unreachable ();
+	}
     }
 
   adjust_vec.release ();
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 46d126c..f1d6bdd 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -814,7 +814,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     vec_outside_cost (0),
     vec_inside_cost (0),
     vectorizable (false),
-    can_use_partial_vectors_p (true),
+    can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
     epil_using_partial_vectors_p (false),
     peeling_for_gaps (false),
@@ -2003,22 +2003,123 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
     }
 }
 
+/* Determine if operating on full vectors for LOOP_VINFO might leave
+   some scalar iterations still to do.  If so, decide how we should
+   handle those scalar iterations.  The possibilities are:
 
-/* Decides whether we need to create an epilogue loop to handle
-   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  */
+   (1) Make LOOP_VINFO operate on partial vectors instead of full vectors.
+       In this case:
 
-void
-determine_peel_for_niter (loop_vec_info loop_vinfo)
+	 LOOP_VINFO_USING_PARTIAL_VECTORS_P == true
+	 LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
+	 LOOP_VINFO_PEELING_FOR_NITER == false
+
+   (2) Make LOOP_VINFO operate on full vectors and use an epilogue loop
+       to handle the remaining scalar iterations.  In this case:
+
+	 LOOP_VINFO_USING_PARTIAL_VECTORS_P == false
+	 LOOP_VINFO_PEELING_FOR_NITER == true
+
+       There are two choices:
+
+       (2a) Consider vectorizing the epilogue loop at the same VF as the
+	    main loop, but using partial vectors instead of full vectors.
+	    In this case:
+
+	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == true
+
+       (2b) Consider vectorizing the epilogue loop at lower VFs only.
+	    In this case:
+
+	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
+
+   When FOR_EPILOGUE_P is true, make this determination based on the
+   assumption that LOOP_VINFO is an epilogue loop, otherwise make it
+   based on the assumption that LOOP_VINFO is the main loop.  The caller
+   has made sure that the number of iterations is set appropriately for
+   this value of FOR_EPILOGUE_P.  */
+
+opt_result
+vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
+					    bool for_epilogue_p)
 {
-  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
+  /* Determine whether there would be any scalar iterations left over.  */
+  bool need_peeling_or_partial_vectors_p
+    = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
 
-  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-    /* The main loop handles all iterations.  */
-    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
-  else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo))
-    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
-}
+  /* Decide whether to vectorize the loop with partial vectors.  */
+  LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
+  LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && need_peeling_or_partial_vectors_p)
+    {
+      /* For partial-vector-usage=1, try to push the handling of partial
+	 vectors to the epilogue, with the main loop continuing to operate
+	 on full vectors.
+
+	 ??? We could then end up failing to use partial vectors if we
+	 decide to peel iterations into a prologue, and if the main loop
+	 then ends up processing fewer than VF iterations.  */
+      if (param_vect_partial_vector_usage == 1
+	  && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+	  && !vect_known_niters_smaller_than_vf (loop_vinfo))
+	LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
+      else
+	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
+    }
+
+  if (dump_enabled_p ())
+    {
+      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "operating on partial vectors%s.\n",
+			 for_epilogue_p ? " for epilogue loop" : "");
+      else
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "operating only on full vectors%s.\n",
+			 for_epilogue_p ? " for epilogue loop" : "");
+    }
+
+  if (for_epilogue_p)
+    {
+      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      gcc_assert (orig_loop_vinfo);
+      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+	gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+			      LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)));
+    }
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+    {
+      /* Check that the loop processes at least one full vector.  */
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      tree scalar_niters = LOOP_VINFO_NITERS (loop_vinfo);
+      if (known_lt (wi::to_widest (scalar_niters), vf))
+	return opt_result::failure_at (vect_location,
+				       "loop does not have enough iterations"
+				       " to support vectorization.\n");
+
+      /* If we need to peel an extra epilogue iteration to handle data
+	 accesses with gaps, check that there are enough scalar iterations
+	 available.
+
+	 The check above is redundant with this one when peeling for gaps,
+	 but the distinction is useful for diagnostics.  */
+      tree scalar_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+	  && known_lt (wi::to_widest (scalar_nitersm1), vf))
+	return opt_result::failure_at (vect_location,
+				       "loop does not have enough iterations"
+				       " to support peeling for gaps.\n");
+    }
+
+  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+    = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+       && need_peeling_or_partial_vectors_p);
+
+  return opt_result::success ();
+}
 
 /* Function vect_analyze_loop_2.
 
@@ -2272,72 +2373,32 @@ start_over:
      LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
     }
 
-  /* Decide whether to vectorize a loop with partial vectors for
-     this vectorization factor.  */
-  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-    {
-      /* Don't use partial vectors if we don't need to peel the loop.  */
-      if (param_vect_partial_vector_usage == 0
-	  || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
-	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-      else if (vect_verify_full_masking (loop_vinfo)
-	       || vect_verify_loop_lens (loop_vinfo))
-	{
-	  /* The epilogue and other known niters less than VF
-	    cases can still use vector access with length fully.  */
-	  if (param_vect_partial_vector_usage == 1
-	      && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-	      && !vect_known_niters_smaller_than_vf (loop_vinfo))
-	    {
-	      LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-	    }
-	  else
-	    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-	}
-      else
-	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-    }
-  else
-    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-
-  if (dump_enabled_p ())
-    {
-      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "operating on partial vectors.\n");
-      else
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "operating only on full vectors.\n");
-    }
-
-  /* If epilog loop is required because of data accesses with gaps,
-     one additional iteration needs to be peeled.  Check if there is
-     enough iterations for vectorization.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-    {
-      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
-
-      if (known_lt (wi::to_widest (scalar_niters), vf))
-	return opt_result::failure_at (vect_location,
-				       "loop has no enough iterations to"
-				       " support peeling for gaps.\n");
-    }
+  /* If we still have the option of using partial vectors,
+     check whether we can generate the necessary loop controls.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && !vect_verify_full_masking (loop_vinfo)
+      && !vect_verify_loop_lens (loop_vinfo))
+    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
 
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
      to be able to handle fewer than VF scalars, or needs to have a lower VF
     than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
				   "Vectorization factor too high for"
				   " epilogue loop.\n");
 
+  /* Decide whether this loop_vinfo should use partial vectors or peeling,
+     assuming that the loop will be used as a main loop.  We will redo
+     this analysis later if we instead decide to use the loop as an
+     epilogue loop.  */
+  ok = vect_determine_partial_vectors_and_peeling (loop_vinfo, false);
+  if (!ok)
+    return ok;
+
   /* Check the costings of the loop make vectorizing worthwhile.  */
   res = vect_analyze_loop_costing (loop_vinfo);
   if (res < 0)
@@ -2350,7 +2411,6 @@ start_over:
    return opt_result::failure_at (vect_location,
				   "Loop costings not worthwhile.\n");
 
-  determine_peel_for_niter (loop_vinfo);
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9dffc55..b7fa6bc 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
 extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
 bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
 /* Used in tree-vect-loop-manip.c */
-extern void determine_peel_for_niter (loop_vec_info);
+extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info,
+							      bool);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
 extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
				  enum tree_code);
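For readers tracing the new decision logic, here is a minimal standalone sketch of the choice that vect_determine_partial_vectors_and_peeling makes between cases (1), (2a) and (2b) from the comment added above.  This is not GCC code: the names decide and struct decision and all parameters are invented for illustration, and the niters sanity checks, asserts and dump output of the real function are omitted.

/* Hypothetical standalone model of the decision made by
   vect_determine_partial_vectors_and_peeling; names and types here
   are invented for illustration and are not GCC internals.  */

#include <stdbool.h>
#include <stdio.h>

struct decision
{
  bool using_partial_vectors_p;      /* case (1): loop uses partial vectors */
  bool epil_using_partial_vectors_p; /* case (2a): epilogue will use them */
  bool peeling_for_niter;            /* cases (2a)/(2b): epilogue loop needed */
};

static struct decision
decide (bool can_use_partial_vectors_p,
	bool need_peeling_or_partial_vectors_p,
	int partial_vector_usage, bool for_epilogue_p,
	bool niters_known_smaller_than_vf)
{
  struct decision d = { false, false, false };

  if (can_use_partial_vectors_p && need_peeling_or_partial_vectors_p)
    {
      /* usage == 1 pushes partial vectors to the epilogue of a main loop
	 whose iteration count is not already known to be below VF;
	 otherwise the loop itself operates on partial vectors.  */
      if (partial_vector_usage == 1
	  && !for_epilogue_p
	  && !niters_known_smaller_than_vf)
	d.epil_using_partial_vectors_p = true;
      else
	d.using_partial_vectors_p = true;
    }

  /* An epilogue loop is needed iff scalar iterations would be left over
     and the loop does not absorb them with partial vectors.  */
  d.peeling_for_niter = (!d.using_partial_vectors_p
			 && need_peeling_or_partial_vectors_p);
  return d;
}

int
main (void)
{
  /* Example: a main loop with leftover iterations, partial vectors
     available, and --param vect-partial-vector-usage=1: the main loop
     stays on full vectors and defers partial vectors to the epilogue.  */
  struct decision d = decide (true, true, 1, false, false);
  printf ("partial=%d epil-partial=%d peel=%d\n",
	  d.using_partial_vectors_p, d.epil_using_partial_vectors_p,
	  d.peeling_for_niter);
  return 0;
}

A setting of 2 for partial_vector_usage falls into the else branch above, so the main loop itself would use partial vectors whenever they are available and needed, which corresponds to case (1).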