From: Richard Sandiford Date: Mon, 15 Feb 2021 15:05:22 +0000 (+0000) Subject: rtl-ssa: Reduce the amount of temporary memory needed [PR98863] X-Git-Tag: upstream/12.2.0~9708 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=abe07a74bb7a2692eff2af151ca54e749ed5eba6;p=platform%2Fupstream%2Fgcc.git rtl-ssa: Reduce the amount of temporary memory needed [PR98863] The rtl-ssa code uses an on-the-side IL and needs to build that IL for each block and RTL insn. I'd originally not used the classical dominance frontier method for placing phis on the basis that it seemed like more work in this context: we're having to visit everything in an RPO walk anyway, so for non-backedge cases we can tell immediately whether a phi node is needed. We then speculatively created phis for registers that are live across backedges and simplified them later. This avoided having to walk most of the IL twice (once to build the initial IL, and once to link uses to phis). However, as shown in PR98863, this leads to excessive temporary memory in extreme cases, since we had to record the value of every live register on exit from every block. In that PR, there were many registers that were live (but unused) across a large region of code. This patch does use the classical approach to placing phis, but tries to use the existing DF defs information to avoid two walks of the IL. We still use the previous approach for memory, since there is no up-front information to indicate whether a block defines memory or not. However, since memory is just treated as a single unified thing (like for gimple vops), memory doesn't suffer from the same scalability problems as registers. With this change, fwprop no longer seems to be a memory-hog outlier in the PR: the maximum RSS is similar with and without fwprop. The PR also shows the problems inherent in using bitmap operations involving the live-in and live-out sets, which in the testcase are very large. I've therefore tried to reduce those operations to the bare minimum. The patch also includes other compile-time optimisations motivated by the PR; see the changelog for details. I tried adding: for (int i = 0; i < 200; ++i) { crtl->ssa = new rtl_ssa::function_info (cfun); delete crtl->ssa; } to fwprop.c to stress the code. fwprop then took 35% of the compile time for the problematic partition in the PR (measured on a release build). fwprop takes less than .5% of the compile time when running normally. The command: git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \ patch -p1 -R still gives a working compiler that uses the old fwprop.c. The compile time with that version is very similar. For a more reasonable testcase like optabs.ii at -O, I saw a 6.7% compile time regression with the loop above added (i.e. creating the info 201 times per pass instead of once per pass). That goes down to 4.8% with -O -g. I can't measure a significant difference with a normal compiler (no 200-iteration loop). So I think that (as expected) the patch does make things a bit slower in the normal case. But like Richi says, peak memory usage is harder for users to work around than slighter slower compile times. gcc/ PR rtl-optimization/98863 * rtl-ssa/functions.h (function_info::bb_live_out_info): Delete. (function_info::build_info): Turn into a declaration, moving the definition to internals.h. (function_info::bb_walker): Declare. (function_info::create_reg_use): Likewise. (function_info::calculate_potential_phi_regs): Take a build_info parameter. (function_info::place_phis, function_info::create_ebbs): Declare. (function_info::calculate_ebb_live_in_for_debug): Likewise. (function_info::populate_backedge_phis): Delete. (function_info::start_block, function_info::end_block): Declare. (function_info::populate_phi_inputs): Delete. (function_info::m_potential_phi_regs): Move information to build_info. * rtl-ssa/internals.h: New file. (function_info::bb_phi_info): New class. (function_info::build_info): Moved from functions.h. Add a constructor and destructor. (function_info::build_info::ebb_use): Delete. (function_info::build_info::ebb_def): Likewise. (function_info::build_info::bb_live_out): Likewise. (function_info::build_info::tmp_ebb_live_in_for_debug): New variable. (function_info::build_info::potential_phi_regs): Likewise. (function_info::build_info::potential_phi_regs_for_debug): Likewise. (function_info::build_info::ebb_def_regs): Likewise. (function_info::build_info::bb_phis): Likewise. (function_info::build_info::bb_mem_live_out): Likewise. (function_info::build_info::bb_to_rpo): Likewise. (function_info::build_info::def_stack): Likewise. (function_info::build_info::old_def_stack_limit): Likewise. * rtl-ssa/internals.inl (function_info::build_info::record_reg_def): Remove the regno argument. Push the previous definition onto the definition stack where necessary. * rtl-ssa/accesses.cc: Include internals.h. * rtl-ssa/changes.cc: Likewise. * rtl-ssa/blocks.cc: Likewise. (function_info::build_info::build_info): Define. (function_info::build_info::~build_info): Likewise. (function_info::bb_walker): New class. (function_info::bb_walker::bb_walker): Define. (function_info::add_live_out_use): Convert a logarithmic-complexity test into a linear one. Allow the same definition to be passed multiple times. (function_info::calculate_potential_phi_regs): Moved from functions.cc. Take a build_info parameter and store the information there instead. (function_info::place_phis): New function. (function_info::add_entry_block_defs): Update call to record_reg_def. (function_info::calculate_ebb_live_in_for_debug): New function. (function_info::add_phi_nodes): Use bb_phis to decide which registers need phi nodes and initialize ebb_def_regs accordingly. Do not add degenerate phis here. (function_info::add_artificial_accesses): Use create_reg_use. Assert that all definitions are listed in the DF LR sets. Update call to record_reg_def. (function_info::record_block_live_out): Record live-out register values in the phis of successor blocks. Use the live-out set when processing the last block in an EBB, instead of always using the live-in sets of successor blocks. AND the live sets with the set of registers that have been defined in the EBB, rather than with all potential phi registers. Cope correctly with branches back to the start of the current EBB. (function_info::start_block): New function. (function_info::end_block): Likewise. (function_info::populate_phi_inputs): Likewise. (function_info::create_ebbs): Likewise. (function_info::process_all_blocks): Rewrite into a multi-phase process. * rtl-ssa/functions.cc: Include internals.h. (function_info::calculate_potential_phi_regs): Move to blocks.cc. (function_info::init_function_data): Remove caller. * rtl-ssa/insns.cc: Include internals.h (function_info::create_reg_use): New function. Lazily any degenerate phis needed by the linear RPO view. (function_info::record_use): Use create_reg_use. When processing debug uses, use potential_phi_regs and test it before checking whether the register is live on entry to the current EBB. Lazily calculate ebb_live_in_for_debug. (function_info::record_call_clobbers): Update call to record_reg_def. (function_info::record_def): Likewise. --- diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc index 5023d55..de3a29e 100644 --- a/gcc/rtl-ssa/accesses.cc +++ b/gcc/rtl-ssa/accesses.cc @@ -26,6 +26,7 @@ #include "rtl.h" #include "df.h" #include "rtl-ssa.h" +#include "rtl-ssa/internals.h" #include "rtl-ssa/internals.inl" using namespace rtl_ssa; diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc index 2255a87..3a57e95 100644 --- a/gcc/rtl-ssa/blocks.cc +++ b/gcc/rtl-ssa/blocks.cc @@ -26,13 +26,122 @@ #include "rtl.h" #include "df.h" #include "rtl-ssa.h" +#include "rtl-ssa/internals.h" #include "rtl-ssa/internals.inl" #include "cfganal.h" #include "cfgrtl.h" #include "predict.h" +#include "domwalk.h" using namespace rtl_ssa; +// Prepare to build information for a function in which all register numbers +// are less than NUM_REGS and all basic block indices are less than +// NUM_BB_INDICES +function_info::build_info::build_info (unsigned int num_regs, + unsigned int num_bb_indices) + : current_bb (nullptr), + current_ebb (nullptr), + last_access (num_regs + 1), + ebb_live_in_for_debug (nullptr), + potential_phi_regs (num_regs), + bb_phis (num_bb_indices), + bb_mem_live_out (num_bb_indices), + bb_to_rpo (num_bb_indices) +{ + last_access.safe_grow_cleared (num_regs + 1); + + bitmap_clear (potential_phi_regs); + + // These arrays shouldn't need to be initialized, since we'll always + // write to an entry before reading from it. But poison the contents + // when checking, just to make sure we don't accidentally use an + // uninitialized value. + bb_phis.quick_grow (num_bb_indices); + bb_mem_live_out.quick_grow (num_bb_indices); + bb_to_rpo.quick_grow (num_bb_indices); + if (flag_checking) + { + // Can't do this for bb_phis because it has a constructor. + memset (bb_mem_live_out.address (), 0xaf, + num_bb_indices * sizeof (bb_mem_live_out[0])); + memset (bb_to_rpo.address (), 0xaf, + num_bb_indices * sizeof (bb_to_rpo[0])); + } + + // Start off with an empty set of phi nodes for each block. + for (bb_phi_info &info : bb_phis) + bitmap_initialize (&info.regs, &bitmap_default_obstack); +} + +function_info::build_info::~build_info () +{ + for (bb_phi_info &info : bb_phis) + bitmap_release (&info.regs); +} + +// A dom_walker for populating the basic blocks. +class function_info::bb_walker : public dom_walker +{ +public: + bb_walker (function_info *, build_info &); + virtual edge before_dom_children (basic_block); + virtual void after_dom_children (basic_block); + +private: + // Information about the function we're building. + function_info *m_function; + build_info &m_bi; + + // We should treat the exit block as being the last child of this one. + // See the comment in the constructor for more information. + basic_block m_exit_block_dominator; +}; + +// Prepare to walk the blocks in FUNCTION using BI. +function_info::bb_walker::bb_walker (function_info *function, build_info &bi) + : dom_walker (CDI_DOMINATORS, ALL_BLOCKS, bi.bb_to_rpo.address ()), + m_function (function), + m_bi (bi), + m_exit_block_dominator (nullptr) +{ + // ??? There is no dominance information associated with the exit block, + // so work out its immediate dominator using predecessor blocks. We then + // walk the exit block just before popping its immediate dominator. + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (m_function->m_fn)->preds) + if (m_exit_block_dominator) + m_exit_block_dominator + = nearest_common_dominator (CDI_DOMINATORS, + m_exit_block_dominator, e->src); + else + m_exit_block_dominator = e->src; + + // If the exit block is unreachable, process it last. + if (!m_exit_block_dominator) + m_exit_block_dominator = ENTRY_BLOCK_PTR_FOR_FN (m_function->m_fn); +} + +edge +function_info::bb_walker::before_dom_children (basic_block bb) +{ + m_function->start_block (m_bi, m_function->bb (bb)); + return nullptr; +} + +void +function_info::bb_walker::after_dom_children (basic_block bb) +{ + // See the comment in the constructor for details. + if (bb == m_exit_block_dominator) + { + before_dom_children (EXIT_BLOCK_PTR_FOR_FN (m_function->m_fn)); + after_dom_children (EXIT_BLOCK_PTR_FOR_FN (m_function->m_fn)); + } + m_function->end_block (m_bi, m_function->bb (bb)); +} + // See the comment above the declaration. void bb_info::print_identifier (pretty_printer *pp) const @@ -217,15 +326,15 @@ function_info::add_live_out_use (bb_info *bb, set_info *def) // If the end of the block already has an artificial use, that use // acts to make DEF live at the appropriate point. - unsigned int regno = def->regno (); - if (find_access (bb->end_insn ()->uses (), regno)) + use_info *use = def->last_nondebug_insn_use (); + if (use && use->insn () == bb->end_insn ()) return; // Currently there is no need to maintain a backward link from the end // instruction to the list of live-out uses. Such a list would be // expensive to update if it was represented using the usual insn_info // access arrays. - use_info *use = allocate (bb->end_insn (), def->resource (), def); + use = allocate (bb->end_insn (), def->resource (), def); use->set_is_live_out_use (true); add_use (use); } @@ -474,6 +583,124 @@ function_info::append_bb (bb_info *bb) m_last_bb = bb; } +// Calculate BI.potential_phi_regs and BI.potential_phi_regs_for_debug. +void +function_info::calculate_potential_phi_regs (build_info &bi) +{ + auto *lr_info = DF_LR_BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (m_fn)); + bool is_debug = MAY_HAVE_DEBUG_INSNS; + for (unsigned int regno = 0; regno < m_num_regs; ++regno) + if (regno >= DF_REG_SIZE (DF) + // Exclude registers that have a single definition that dominates + // all uses. If the definition does not dominate all uses, + // the register will be exposed upwards to the entry block but + // will not be defined by the entry block. + || DF_REG_DEF_COUNT (regno) > 1 + || (!bitmap_bit_p (&lr_info->def, regno) + && bitmap_bit_p (&lr_info->out, regno))) + { + bitmap_set_bit (bi.potential_phi_regs, regno); + if (is_debug) + bitmap_set_bit (bi.potential_phi_regs_for_debug, regno); + } +} + +// Called while building SSA form using BI. Decide where phi nodes +// should be placed for each register and initialize BI.bb_phis accordingly. +void +function_info::place_phis (build_info &bi) +{ + unsigned int num_bb_indices = last_basic_block_for_fn (m_fn); + + // Calculate dominance frontiers. + auto_vec frontiers; + frontiers.safe_grow (num_bb_indices); + for (unsigned int i = 0; i < num_bb_indices; ++i) + bitmap_initialize (&frontiers[i], &bitmap_default_obstack); + compute_dominance_frontiers (frontiers.address ()); + + // In extreme cases, the number of live-in registers can be much + // greater than the number of phi nodes needed in a block (see PR98863). + // Try to reduce the number of operations involving live-in sets by using + // PENDING as a staging area: registers in PENDING need phi nodes if + // they are live on entry to the corresponding block, but do not need + // phi nodes otherwise. + auto_vec unfiltered; + unfiltered.safe_grow (num_bb_indices); + for (unsigned int i = 0; i < num_bb_indices; ++i) + bitmap_initialize (&unfiltered[i], &bitmap_default_obstack); + + // If block B1 defines R and if B2 is in the dominance frontier of B1, + // queue a possible phi node for R in B2. + auto_bitmap worklist; + for (unsigned int b1 = 0; b1 < num_bb_indices; ++b1) + { + // Only access DF information for blocks that are known to exist. + if (bitmap_empty_p (&frontiers[b1])) + continue; + + bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def; + bitmap_iterator bmi; + unsigned int b2; + EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi) + if (bitmap_ior_into (&unfiltered[b2], b1_def) + && !bitmap_empty_p (&frontiers[b2])) + // Propagate the (potential) new phi node definitions in B2. + bitmap_set_bit (worklist, b2); + } + + while (!bitmap_empty_p (worklist)) + { + unsigned int b1 = bitmap_first_set_bit (worklist); + bitmap_clear_bit (worklist, b1); + + // Restrict the phi nodes to registers that are live on entry to + // the block. + bitmap b1_in = DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b1)); + bitmap b1_phis = &bi.bb_phis[b1].regs; + if (!bitmap_ior_and_into (b1_phis, &unfiltered[b1], b1_in)) + continue; + + // If block B1 has a phi node for R and if B2 is in the dominance + // frontier of B1, queue a possible phi node for R in B2. + bitmap_iterator bmi; + unsigned int b2; + EXECUTE_IF_SET_IN_BITMAP (&frontiers[b1], 0, b2, bmi) + if (bitmap_ior_into (&unfiltered[b2], b1_phis) + && !bitmap_empty_p (&frontiers[b2])) + bitmap_set_bit (worklist, b2); + } + + basic_block cfg_bb; + FOR_ALL_BB_FN (cfg_bb, m_fn) + { + // Calculate the set of phi nodes for blocks that don't have any + // dominance frontiers. We only need to do this once per block. + unsigned int i = cfg_bb->index; + bb_phi_info &phis = bi.bb_phis[i]; + if (bitmap_empty_p (&frontiers[i])) + bitmap_and (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb)); + + // Create an array that contains all phi inputs for this block. + // See the comment above the member variables for more information. + phis.num_phis = bitmap_count_bits (&phis.regs); + phis.num_preds = EDGE_COUNT (cfg_bb->preds); + unsigned int num_inputs = phis.num_phis * phis.num_preds; + if (num_inputs != 0) + { + phis.inputs = XOBNEWVEC (&m_temp_obstack, set_info *, num_inputs); + memset (phis.inputs, 0, num_inputs * sizeof (phis.inputs[0])); + } + } + + // Free the temporary bitmaps. + for (unsigned int i = 0; i < num_bb_indices; ++i) + { + bitmap_release (&frontiers[i]); + bitmap_release (&unfiltered[i]); + } +} + // Called while building SSA form using BI, with BI.current_bb being // the entry block. // @@ -508,7 +735,7 @@ function_info::add_entry_block_defs (build_info &bi) auto *set = allocate (insn, full_register (regno)); append_def (set); m_temp_defs.safe_push (set); - bi.record_reg_def (regno, set); + bi.record_reg_def (set); } // Create a definition that reflects the state of memory on entry to @@ -521,191 +748,86 @@ function_info::add_entry_block_defs (build_info &bi) finish_insn_accesses (insn); } +// Lazily calculate the value of BI.ebb_live_in_for_debug for BI.current_ebb. +void +function_info::calculate_ebb_live_in_for_debug (build_info &bi) +{ + gcc_checking_assert (bitmap_empty_p (bi.tmp_ebb_live_in_for_debug)); + bi.ebb_live_in_for_debug = bi.tmp_ebb_live_in_for_debug; + bitmap_and (bi.ebb_live_in_for_debug, bi.potential_phi_regs_for_debug, + DF_LR_IN (bi.current_ebb->first_bb ()->cfg_bb ())); + bitmap_tree_view (bi.ebb_live_in_for_debug); +} + // Called while building SSA form using BI. Create phi nodes for the -// current EBB, leaving backedge inputs to be filled in later. Set -// bi.last_access to the values that are live on entry to the EBB, -// regardless of whether or not they are phi nodes. +// current EBB. void function_info::add_phi_nodes (build_info &bi) { ebb_info *ebb = bi.current_ebb; basic_block cfg_bb = ebb->first_bb ()->cfg_bb (); - auto *lr_info = DF_LR_BB_INFO (cfg_bb); - // Get a local cache of the predecessor blocks' live out values. - unsigned int num_preds = EDGE_COUNT (cfg_bb->preds); - auto_vec pred_live_outs (num_preds); - bool has_backedge = false; - bool has_eh_edge = false; - edge e; - edge_iterator ei; - FOR_EACH_EDGE (e, ei, cfg_bb->preds) + // Create the register phis for this EBB. + bb_phi_info &phis = bi.bb_phis[cfg_bb->index]; + unsigned int num_preds = phis.num_preds; + unsigned int regno; + bitmap_iterator in_bi; + EXECUTE_IF_SET_IN_BITMAP (&phis.regs, 0, regno, in_bi) { - bb_info *pred_bb = this->bb (e->src); - const bb_live_out_info *live_out = &bi.bb_live_out[e->src->index]; - - // In LR (but not LIVE), the registers live on entry to a block must - // normally be a subset of the registers live on exit from any - // given predecessor block. The exceptions are EH edges, which - // implicitly clobber all registers in eh_edge_abi.full_reg_clobbers (). - // Thus if a register is upwards exposed in an EH handler, it won't - // be propagated across the EH edge. - // - // Excluding that special case, all registers live on entry to - // EBB are also live on exit from PRED_BB and were (or will be) - // considered when creating LIVE_OUT. - gcc_checking_assert ((e->flags & EDGE_EH) - || !bitmap_intersect_compl_p (&lr_info->in, - DF_LR_OUT (e->src))); - if (!pred_bb || !pred_bb->head_insn ()) - { - has_backedge = true; - live_out = nullptr; - } - has_eh_edge |= (e->flags & EDGE_EH); - pred_live_outs.quick_push (live_out); - } - - // PRED_REG_INDICES[I] tracks the index into PRED_LIVE_OUTS[I]->reg_values - // of the first unused entry. - auto_vec pred_reg_indices (num_preds); - pred_reg_indices.quick_grow_cleared (num_preds); - - // Use this array to build up the list of inputs to each phi. - m_temp_defs.safe_grow (num_preds); + gcc_checking_assert (bitmap_bit_p (bi.potential_phi_regs, regno)); - // Return true if the current phi is degenerate, i.e. if all its inputs - // are the same. - auto is_degenerate_phi = [&]() - { - if (has_backedge) - return false; + // Create an array of phi inputs, to be filled in later. + auto *inputs = XOBNEWVEC (&m_obstack, access_info *, num_preds); + memset (inputs, 0, sizeof (access_info *) * num_preds); - for (unsigned int i = 1; i < num_preds; ++i) - if (m_temp_defs[i] != m_temp_defs[0]) - return false; + // Later code works out the correct mode of the phi. Use BLKmode + // as a placeholder for now. + phi_info *phi = create_phi (ebb, { E_BLKmode, regno }, + inputs, num_preds); + bi.record_reg_def (phi); + } - return true; - }; + bitmap_copy (bi.ebb_def_regs, &phis.regs); - // Finish calculating the live-in value for RESOURCE. Decide how to - // represent the value of RESOURCE on entry to EBB and return its definition. - auto finish_phi = [&](resource_info resource) -> set_info * + // Collect the live-in memory definitions and record whether they're + // all the same. + m_temp_defs.reserve (num_preds); + set_info *mem_value = nullptr; + bool mem_phi_is_degenerate = true; + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, cfg_bb->preds) { - access_info **inputs; - unsigned int num_inputs; - if (is_degenerate_phi ()) + bb_info *pred_bb = this->bb (e->src); + if (pred_bb && pred_bb->head_insn ()) { - auto *input = safe_as_a (m_temp_defs[0]); - if (!input) - // The live-in value is completely uninitialized. - return nullptr; - - unsigned int regno = input->regno (); - if (input->is_reg () && !bitmap_bit_p (bi.ebb_use, regno)) - // The live-in value comes from a single source and there - // are no uses of it within the EBB itself. We therefore - // don't need a phi node. - return input; - - // The live-in value comes from a single source and might be - // used by the EBB itself. Create a degenerate phi for it. - inputs = m_temp_defs.begin (); - num_inputs = 1; + mem_value = bi.bb_mem_live_out[pred_bb->index ()]; + m_temp_defs.quick_push (mem_value); + if (mem_value != m_temp_defs[0]) + mem_phi_is_degenerate = false; } else { - obstack_grow (&m_obstack, m_temp_defs.address (), - num_preds * sizeof (access_info *)); - inputs = static_cast (obstack_finish (&m_obstack)); - num_inputs = num_preds; + m_temp_defs.quick_push (nullptr); + mem_phi_is_degenerate = false; } - return create_phi (ebb, resource, inputs, num_inputs); - }; - - if (bi.ebb_live_in_for_debug) - bitmap_clear (bi.ebb_live_in_for_debug); + } - // Get the definition of each live input register, excluding registers - // that are known to have a single definition that dominates all uses. - unsigned int regno; - bitmap_iterator in_bi; - EXECUTE_IF_AND_IN_BITMAP (&lr_info->in, m_potential_phi_regs, - 0, regno, in_bi) + // Create a phi for memory, on the assumption that something in the + // EBB will need it. + if (mem_phi_is_degenerate) { - for (unsigned int pred_i = 0; pred_i < num_preds; ++pred_i) - { - set_info *input = nullptr; - if (const bb_live_out_info *pred_live_out = pred_live_outs[pred_i]) - { - // Skip over registers that aren't live on entry to this block. - unsigned int reg_i = pred_reg_indices[pred_i]; - while (reg_i < pred_live_out->num_reg_values - && pred_live_out->reg_values[reg_i]->regno () < regno) - reg_i += 1; - - // As we asserted above, REGNO is live out from the predecessor - // block, at least by the LR reckoning. But there are three - // cases: - // - // (1) The live-out value is well-defined (the normal case), - // with the definition coming either from the block itself - // or from a predecessor block. In this case reg_values - // has a set_info entry for the register. - // - // (2) The live-out value was not modified by the predecessor - // EBB and did not have a defined value on input to that - // EBB either. In this case reg_values has no entry for - // the register. - // - // (3) The live-out value was modified by the predecessor EBB, - // but the final modification was a clobber rather than - // a set. In this case reg_values again has no entry for - // the register. - // - // The phi input for (2) and (3) is undefined, which we - // represent as a null set_info. - if (reg_i < pred_live_out->num_reg_values) - { - set_info *set = pred_live_out->reg_values[reg_i]; - if (set->regno () == regno) - { - input = set; - reg_i += 1; - } - } - - // Fully call-clobbered values do not survive across EH edges. - // In particular, if a call that normally sets a result register - // throws an exception, the set of the result register should - // not be treated as live on entry to the EH handler. - if (has_eh_edge - && HARD_REGISTER_NUM_P (regno) - && eh_edge_abi.clobbers_full_reg_p (regno) - && (EDGE_PRED (cfg_bb, pred_i)->flags & EDGE_EH)) - input = nullptr; - - pred_reg_indices[pred_i] = reg_i; - } - m_temp_defs[pred_i] = input; - } - // Later code works out the correct mode of the phi. Use BLKmode - // as a placeholder for now. - bi.record_reg_def (regno, finish_phi ({ E_BLKmode, regno })); - if (bi.ebb_live_in_for_debug) - bitmap_set_bit (bi.ebb_live_in_for_debug, regno); + access_info *input[] = { mem_value }; + mem_value = create_phi (ebb, memory, input, 1); } - - // Repeat the process above for memory. - for (unsigned int pred_i = 0; pred_i < num_preds; ++pred_i) + else { - set_info *input = nullptr; - if (const bb_live_out_info *pred_live_out = pred_live_outs[pred_i]) - input = pred_live_out->mem_value; - m_temp_defs[pred_i] = input; + obstack_grow (&m_obstack, m_temp_defs.address (), + num_preds * sizeof (access_info *)); + auto *inputs = static_cast (obstack_finish (&m_obstack)); + mem_value = create_phi (ebb, memory, inputs, num_preds); } - bi.record_mem_def (finish_phi (memory)); - + bi.record_mem_def (mem_value); m_temp_defs.truncate (0); } @@ -744,16 +866,12 @@ function_info::add_artificial_accesses (build_info &bi, df_ref_flags flags) { unsigned int regno = DF_REF_REGNO (ref); machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref)); - resource_info resource { mode, regno }; // A definition must be available. gcc_checking_assert (bitmap_bit_p (&lr_info->in, regno) || (flags != DF_REF_AT_TOP && bitmap_bit_p (&lr_info->def, regno))); - set_info *def = bi.current_reg_value (regno); - auto *use = allocate (insn, resource, def); - add_use (use); - m_temp_uses.safe_push (use); + m_temp_uses.safe_push (create_reg_use (bi, insn, { mode, regno })); } // Track the return value of memory by adding an artificial use of @@ -772,6 +890,9 @@ function_info::add_artificial_accesses (build_info &bi, df_ref_flags flags) machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref)); resource_info resource { mode, regno }; + // We rely on the def set being correct. + gcc_checking_assert (bitmap_bit_p (&lr_info->def, regno)); + // If the value isn't used later in the block and isn't live // on exit, we could instead represent the definition as a // clobber_info. However, that case should be relatively @@ -779,7 +900,7 @@ function_info::add_artificial_accesses (build_info &bi, df_ref_flags flags) set_info *def = allocate (insn, resource); append_def (def); m_temp_defs.safe_push (def); - bi.record_reg_def (regno, def); + bi.record_reg_def (def); } // Model the effect of a memory clobber on an incoming edge by adding @@ -808,103 +929,215 @@ function_info::add_block_contents (build_info &bi) add_insn_to_block (bi, insn); } -// Called while building SSA form using BI. Use BI.bb_live_out to record -// the values that are live out from BI.current_bb. +// Called while building SSA form using BI. Record live-out register values +// in the phi inputs of successor blocks and create live-out uses where +// appropriate. Record the live-out memory value in BI.bb_mem_live_out. void function_info::record_block_live_out (build_info &bi) { bb_info *bb = bi.current_bb; ebb_info *ebb = bi.current_ebb; basic_block cfg_bb = bb->cfg_bb (); - bb_live_out_info *live_out = &bi.bb_live_out[bb->index ()]; - auto *lr_info = DF_LR_BB_INFO (bb->cfg_bb ()); - // Calculate which subset of m_potential_phi_regs is live out from EBB - // at the end of BB. - auto_bitmap live_out_from_ebb; + // Record the live-out register values in the phi inputs of + // successor blocks. edge e; edge_iterator ei; FOR_EACH_EDGE (e, ei, cfg_bb->succs) { - bb_info *dest_bb = this->bb (e->dest); - if (!dest_bb || dest_bb->ebb () != ebb) - bitmap_ior_and_into (live_out_from_ebb, DF_LR_IN (e->dest), - m_potential_phi_regs); + bb_phi_info &phis = bi.bb_phis[e->dest->index]; + unsigned int input_i = e->dest_idx * phis.num_phis; + unsigned int regno; + bitmap_iterator out_bi; + EXECUTE_IF_SET_IN_BITMAP (&phis.regs, 0, regno, out_bi) + { + phis.inputs[input_i] + = live_out_value (bb, bi.current_reg_value (regno)); + input_i += 1; + } } - // Record the live-out register values. - unsigned int regno; - bitmap_iterator out_bi; - EXECUTE_IF_AND_IN_BITMAP (&lr_info->out, m_potential_phi_regs, - 0, regno, out_bi) - if (set_info *value = live_out_value (bb, bi.current_reg_value (regno))) + // Add the set of registers that were defined in this BB to the set + // of potentially-live registers defined in the EBB. + bitmap_ior_into (bi.ebb_def_regs, &DF_LR_BB_INFO (cfg_bb)->def); + + // Iterate through the registers in LIVE_OUT and see whether we need + // to add a live-out use for them. + auto record_live_out_regs = [&](bitmap live_out) + { + unsigned int regno; + bitmap_iterator out_bi; + EXECUTE_IF_AND_IN_BITMAP (bi.ebb_def_regs, live_out, 0, regno, out_bi) + { + set_info *value = live_out_value (bb, bi.current_reg_value (regno)); + if (value && value->ebb () == ebb) + add_live_out_use (bb, value); + } + }; + + if (bb == ebb->last_bb ()) + // All live-out registers might need live-out uses. + record_live_out_regs (DF_LR_OUT (cfg_bb)); + else + // Registers might need live-out uses if they are live on entry + // to a successor block in a different EBB. + FOR_EACH_EDGE (e, ei, cfg_bb->succs) { - if (value->ebb () == ebb && bitmap_bit_p (live_out_from_ebb, regno)) - add_live_out_use (bb, value); - obstack_ptr_grow (&m_temp_obstack, value); + bb_info *dest_bb = this->bb (e->dest); + if (dest_bb->ebb () != ebb || dest_bb == ebb->first_bb ()) + record_live_out_regs (DF_LR_IN (e->dest)); } - live_out->num_reg_values = (obstack_object_size (&m_temp_obstack) - / sizeof (set_info *)); - auto *data = obstack_finish (&m_temp_obstack); - live_out->reg_values = static_cast (data); + // Record the live-out memory value. + bi.bb_mem_live_out[cfg_bb->index] + = live_out_value (bb, bi.current_mem_value ()); +} + +// Add BB and its contents to the SSA information. +void +function_info::start_block (build_info &bi, bb_info *bb) +{ + ebb_info *ebb = bb->ebb (); + + // We (need to) add all blocks from one EBB before moving on to the next. + bi.current_bb = bb; + if (bb == ebb->first_bb ()) + bi.current_ebb = ebb; + else + gcc_assert (bi.current_ebb == ebb); - live_out->mem_value = live_out_value (bb, bi.current_mem_value ()); + // Record the start of this block's definitions in the definitions stack. + bi.old_def_stack_limit.safe_push (bi.def_stack.length ()); + + // Add the block itself. + append_bb (bb); + + // If the block starts an EBB, create the phi insn. This insn should exist + // for all EBBs, even if they don't (yet) need phis. + if (bb == ebb->first_bb ()) + ebb->set_phi_insn (append_artificial_insn (bb)); + + if (bb->index () == ENTRY_BLOCK) + { + add_entry_block_defs (bi); + record_block_live_out (bi); + return; + } + + if (EDGE_COUNT (bb->cfg_bb ()->preds) == 0) + { + // Leave unreachable blocks empty, since there is no useful + // liveness information for them, and anything they do will + // be wasted work. In a cleaned-up cfg, the only unreachable + // block we should see is the exit block of a noreturn function. + bb->set_head_insn (append_artificial_insn (bb)); + bb->set_end_insn (append_artificial_insn (bb)); + return; + } + + // If the block starts an EBB, create the phi nodes. + if (bb == ebb->first_bb ()) + add_phi_nodes (bi); + + // Process the contents of the block. + add_artificial_accesses (bi, DF_REF_AT_TOP); + if (bb->index () != EXIT_BLOCK) + add_block_contents (bi); + add_artificial_accesses (bi, df_ref_flags ()); + record_block_live_out (bi); + + // If we needed to calculate a live-in set for debug purposes, + // reset it to null at the end of the EBB. Convert the underlying + // bitmap to an empty list view, ready for the next calculation. + if (bi.ebb_live_in_for_debug && bb == ebb->last_bb ()) + { + bitmap_clear (bi.tmp_ebb_live_in_for_debug); + bitmap_list_view (bi.tmp_ebb_live_in_for_debug); + bi.ebb_live_in_for_debug = nullptr; + } } -// Called while building SSA form using BI. Check if BI.current_bb has -// any outgoing backedges. If so, use the up-to-date contents of -// BI.bb_live_out to populate the associated inputs of any phi nodes. +// Finish adding BB and the blocks that it dominates to the SSA information. void -function_info::populate_backedge_phis (build_info &bi) +function_info::end_block (build_info &bi, bb_info *bb) { - bb_info *bb = bi.current_bb; - basic_block cfg_bb = bb->cfg_bb (); - const bb_live_out_info *live_out = &bi.bb_live_out[bb->index ()]; + // Restore the register last_access information to the state it was + // in before we started processing BB. + unsigned int old_limit = bi.old_def_stack_limit.pop (); + while (bi.def_stack.length () > old_limit) + { + // We pushed a definition in BB if it was the first dominating + // definition (and so the previous entry was null). In other + // cases we pushed the previous dominating definition. + def_info *def = bi.def_stack.pop (); + unsigned int regno = def->regno (); + if (def->bb () == bb) + def = nullptr; + bi.last_access[regno + 1] = def; + } +} - edge e; - edge_iterator ei; - FOR_EACH_EDGE (e, ei, cfg_bb->succs) +// Finish setting up the phi nodes for each block, now that we've added +// the contents of all blocks. +void +function_info::populate_phi_inputs (build_info &bi) +{ + auto_vec sorted_phis; + for (ebb_info *ebb : ebbs ()) { - // Check if this edge counts as a backedge in the current traversal. - bb_info *succ_bb = this->bb (e->dest); - if (!succ_bb || !succ_bb->head_insn ()) + if (!ebb->first_phi ()) continue; - // Although the phis do not keep a defined order long-term, they are - // still in reverse regno order at this point. We can therefore use - // a merge operation on the phis and the live-out values. - unsigned int input_i = e->dest_idx; - int reg_i = live_out->num_reg_values - 1; - for (phi_info *phi : succ_bb->ebb ()->phis ()) + // Get a sorted array of EBB's phi nodes. + basic_block cfg_bb = ebb->first_bb ()->cfg_bb (); + bb_phi_info &phis = bi.bb_phis[cfg_bb->index]; + sorted_phis.truncate (0); + for (phi_info *phi : ebb->phis ()) + sorted_phis.safe_push (phi); + std::sort (sorted_phis.address (), + sorted_phis.address () + sorted_phis.length (), + compare_access_infos); + + // Set the inputs of the non-degenerate register phis. All inputs + // for one edge come before all inputs for the next edge. + set_info **inputs = phis.inputs; + unsigned int phi_i = 0; + bitmap_iterator bmi; + unsigned int regno; + EXECUTE_IF_SET_IN_BITMAP (&phis.regs, 0, regno, bmi) { - set_info *input = nullptr; - if (phi->is_mem ()) - input = live_out->mem_value; - else + // Skip intervening degenerate phis. + while (sorted_phis[phi_i]->regno () < regno) + phi_i += 1; + phi_info *phi = sorted_phis[phi_i]; + gcc_assert (phi->regno () == regno); + for (unsigned int input_i = 0; input_i < phis.num_preds; ++input_i) + if (set_info *input = inputs[input_i * phis.num_phis]) + { + use_info *use = phi->input_use (input_i); + gcc_assert (!use->def ()); + use->set_def (input); + add_use (use); + } + phi_i += 1; + inputs += 1; + } + + // Fill in the backedge inputs to any memory phi. + phi_info *mem_phi = sorted_phis.last (); + if (mem_phi->is_mem () && !mem_phi->is_degenerate ()) + { + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, cfg_bb->preds) { - // Skip over any intervening live-out values. - unsigned int regno = phi->regno (); - while (reg_i >= 0) + use_info *use = mem_phi->input_use (e->dest_idx); + if (!use->def ()) { - set_info *reg_value = live_out->reg_values[reg_i]; - if (reg_value->regno () < regno) - break; - reg_i -= 1; - if (reg_value->regno () == regno) - { - input = reg_value; - break; - } + use->set_def (bi.bb_mem_live_out[e->src->index]); + add_use (use); } } - if (input) - { - use_info *use = phi->input_use (input_i); - gcc_assert (!use->def ()); - use->set_def (input); - add_use (use); - } } } } @@ -960,14 +1193,12 @@ choose_next_block_in_ebb (basic_block bb) return best_edge ? best_edge->dest : nullptr; } -// Partition the function's blocks into EBBs and build SSA form for all -// EBBs in the function. +// Partition the function into extended basic blocks. Create the +// associated ebb_infos and bb_infos, but don't add the bb_infos +// to the function list yet. void -function_info::process_all_blocks () +function_info::create_ebbs (build_info &bi) { - auto temps = temp_watermark (); - unsigned int num_bb_indices = last_basic_block_for_fn (m_fn); - // Compute the starting reverse postorder. We tweak this later to try // to get better EBB assignments. auto *postorder = new int[n_basic_blocks_for_fn (m_fn)]; @@ -975,119 +1206,63 @@ function_info::process_all_blocks () = pre_and_rev_post_order_compute (nullptr, postorder, true); gcc_assert (int (postorder_num) <= n_basic_blocks_for_fn (m_fn)); - // Construct the working state for this function and its subroutines. - build_info bi; - bi.last_access = XOBNEWVEC (&m_temp_obstack, access_info *, m_num_regs + 1); - memset (bi.last_access, 0, (m_num_regs + 1) * sizeof (set_info *)); - - // The bb_live_out array shouldn't need to be initialized, since we'll - // always write to an entry before reading from it. But poison the - // contents when checking, just to make sure we don't accidentally use - // an uninitialized value. - bi.bb_live_out = XOBNEWVEC (&m_temp_obstack, bb_live_out_info, - num_bb_indices); - if (flag_checking) - memset (bi.bb_live_out, 0xaf, - num_bb_indices * sizeof (bb_live_out_info)); - - // Only pay the overhead of recording a separate live-in bitmap if - // there are debug instructions that might need it. - auto_bitmap ebb_live_in; - if (MAY_HAVE_DEBUG_INSNS) - { - bi.ebb_live_in_for_debug = ebb_live_in; - // The bitmap is tested using individual bit operations, so optimize - // for that case. - bitmap_tree_view (ebb_live_in); - } - else - bi.ebb_live_in_for_debug = nullptr; - // Iterate over the blocks in reverse postorder. In cases where // multiple possible orders exist, prefer orders that chain blocks // together into EBBs. If multiple possible EBBs exist, try to pick // the ones that are most likely to be profitable. - auto_vec ebb; - auto_bitmap ebb_use_tmp; - auto_bitmap ebb_def_tmp; + auto_vec bbs; + unsigned int next_bb_index = 0; for (unsigned int i = 0; i < postorder_num; ++i) if (!m_bbs[postorder[i]]) { - // Choose and create the blocks that should form the next EBB, - // and calculate the set of registers that the EBB uses and defines - // Only do actual bitmap operations if the EBB contains multiple - // blocks. + // Choose and create the blocks that should form the next EBB. basic_block cfg_bb = BASIC_BLOCK_FOR_FN (m_fn, postorder[i]); - bi.ebb_use = &DF_LR_BB_INFO (cfg_bb)->use; - bi.ebb_def = &DF_LR_BB_INFO (cfg_bb)->def; - ebb.safe_push (create_bb_info (cfg_bb)); - cfg_bb = choose_next_block_in_ebb (cfg_bb); - if (cfg_bb) + do { - // An EBB with two blocks. - bitmap_ior (ebb_use_tmp, bi.ebb_use, &DF_LR_BB_INFO (cfg_bb)->use); - bitmap_ior (ebb_def_tmp, bi.ebb_def, &DF_LR_BB_INFO (cfg_bb)->def); - bi.ebb_use = ebb_use_tmp; - bi.ebb_def = ebb_def_tmp; - ebb.safe_push (create_bb_info (cfg_bb)); + // Record the chosen block order in a new RPO. + bi.bb_to_rpo[cfg_bb->index] = next_bb_index++; + bbs.safe_push (create_bb_info (cfg_bb)); cfg_bb = choose_next_block_in_ebb (cfg_bb); - while (cfg_bb) - { - // An EBB with three or more blocks. - bitmap_ior_into (bi.ebb_use, &DF_LR_BB_INFO (cfg_bb)->use); - bitmap_ior_into (bi.ebb_def, &DF_LR_BB_INFO (cfg_bb)->def); - ebb.safe_push (create_bb_info (cfg_bb)); - cfg_bb = choose_next_block_in_ebb (cfg_bb); - } } + while (cfg_bb); // Create the EBB itself. - bi.current_ebb = allocate (ebb[0], ebb.last ()); - for (bb_info *bb : ebb) - { - bb->set_ebb (bi.current_ebb); - append_bb (bb); - } - - // Populate the contents of the EBB. - bi.current_ebb->set_phi_insn (append_artificial_insn (ebb[0])); - if (ebb[0]->index () == ENTRY_BLOCK) - { - gcc_assert (ebb.length () == 1); - bi.current_bb = ebb[0]; - add_entry_block_defs (bi); - record_block_live_out (bi); - } - else if (EDGE_COUNT (ebb[0]->cfg_bb ()->preds) == 0) - // Leave unreachable blocks empty, since there is no useful - // liveness information for them, and anything they do will - // be wasted work. In a cleaned-up cfg, the only unreachable - // block we should see is the exit block of a noreturn function. - for (bb_info *bb : ebb) - { - bb->set_head_insn (append_artificial_insn (bb)); - bb->set_end_insn (append_artificial_insn (bb)); - } - else - { - add_phi_nodes (bi); - for (bb_info *bb : ebb) - { - bi.current_bb = bb; - add_artificial_accesses (bi, DF_REF_AT_TOP); - if (bb->index () != EXIT_BLOCK) - add_block_contents (bi); - add_artificial_accesses (bi, df_ref_flags ()); - record_block_live_out (bi); - populate_backedge_phis (bi); - } - } - ebb.truncate (0); + auto *ebb = allocate (bbs[0], bbs.last ()); + for (bb_info *bb : bbs) + bb->set_ebb (ebb); + bbs.truncate (0); } delete[] postorder; } +// Partition the function's blocks into EBBs and build SSA form for all +// EBBs in the function. +void +function_info::process_all_blocks () +{ + auto temps = temp_watermark (); + unsigned int num_bb_indices = last_basic_block_for_fn (m_fn); + + build_info bi (m_num_regs, num_bb_indices); + + calculate_potential_phi_regs (bi); + create_ebbs (bi); + place_phis (bi); + bb_walker (this, bi).walk (ENTRY_BLOCK_PTR_FOR_FN (m_fn)); + populate_phi_inputs (bi); + + if (flag_checking) + { + // The definition stack should be empty and all register definitions + // should be back in their original undefined state. + gcc_assert (bi.def_stack.is_empty () + && bi.old_def_stack_limit.is_empty ()); + for (unsigned int regno = 0; regno < m_num_regs; ++regno) + gcc_assert (!bi.last_access[regno + 1]); + } +} + // Print a description of CALL_CLOBBERS to PP. void rtl_ssa::pp_ebb_call_clobbers (pretty_printer *pp, diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc index 2a3cc2b..bfab3d9 100644 --- a/gcc/rtl-ssa/changes.cc +++ b/gcc/rtl-ssa/changes.cc @@ -26,6 +26,7 @@ #include "rtl.h" #include "df.h" #include "rtl-ssa.h" +#include "rtl-ssa/internals.h" #include "rtl-ssa/internals.inl" #include "target.h" #include "predict.h" diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc index 80821cd..257bc2b 100644 --- a/gcc/rtl-ssa/functions.cc +++ b/gcc/rtl-ssa/functions.cc @@ -26,6 +26,7 @@ #include "rtl.h" #include "df.h" #include "rtl-ssa.h" +#include "rtl-ssa/internals.h" #include "rtl-ssa/internals.inl" using namespace rtl_ssa; @@ -74,23 +75,6 @@ function_info::print (pretty_printer *pp) const } } -// Calculate m_potential_phi_regs. -void -function_info::calculate_potential_phi_regs () -{ - auto *lr_info = DF_LR_BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (m_fn)); - for (unsigned int regno = 0; regno < m_num_regs; ++regno) - if (regno >= DF_REG_SIZE (DF) - // Exclude registers that have a single definition that dominates - // all uses. If the definition does not dominate all uses, - // the register will be exposed upwards to the entry block but - // will not be defined by the entry block. - || DF_REG_DEF_COUNT (regno) > 1 - || (!bitmap_bit_p (&lr_info->def, regno) - && bitmap_bit_p (&lr_info->out, regno))) - bitmap_set_bit (m_potential_phi_regs, regno); -} - // Initialize all member variables in preparation for (re)building // SSA form from scratch. void @@ -107,8 +91,6 @@ function_info::init_function_data () m_last_insn = nullptr; m_last_nondebug_insn = nullptr; m_free_phis = nullptr; - - calculate_potential_phi_regs (); } // The initial phase of the phi simplification process. The cumulative diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h index f64bd3f..58755d7 100644 --- a/gcc/rtl-ssa/functions.h +++ b/gcc/rtl-ssa/functions.h @@ -176,81 +176,9 @@ public: void print (pretty_printer *pp) const; private: - // Information about the values that are live on exit from a basic block. - // This class is only used when constructing the SSA form, it isn't - // designed for being kept up-to-date. - class bb_live_out_info - { - public: - // REG_VALUES contains all the registers that live out from the block, - // in order of increasing register number. There are NUM_REG_VALUES - // in total. Registers do not appear here if their values are known - // to be completely undefined; in that sense, the information is - // closer to DF_LIVE than to DF_LR. - unsigned int num_reg_values; - set_info **reg_values; - - // The memory value that is live on exit from the block. - set_info *mem_value; - }; - - // Information used while constructing the SSA form and discarded - // afterwards. - class build_info - { - public: - set_info *current_reg_value (unsigned int) const; - set_info *current_mem_value () const; - - void record_reg_def (unsigned int, def_info *); - void record_mem_def (def_info *); - - // The block that we're currently processing. - bb_info *current_bb; - - // The EBB that contains CURRENT_BB. - ebb_info *current_ebb; - - // Except for the local exception noted below: - // - // - If register R has been defined in the current EBB, LAST_ACCESS[R + 1] - // is the last definition of R in the EBB. - // - // - If register R is currently live but has not yet been defined - // in the EBB, LAST_ACCESS[R + 1] is the current value of R, - // or null if the register's value is completely undefined. - // - // - The contents are not meaningful for other registers. - // - // Similarly: - // - // - If the current EBB has defined memory, LAST_ACCESS[0] is the last - // definition of memory in the EBB. - // - // - Otherwise LAST_ACCESS[0] is the value of memory that is live on - // - entry to the EBB. - // - // The exception is that while building instructions, LAST_ACCESS[I] - // can temporarily be the use of regno I - 1 by that instruction. - access_info **last_access; - - // A bitmap of registers that are live on entry to this EBB, with a tree - // view for quick lookup. Only used if MAY_HAVE_DEBUG_INSNS. - bitmap ebb_live_in_for_debug; - - // A conservative superset of the registers that are used by - // instructions in CURRENT_EBB. That is, all used registers - // are in the set, but some unused registers might be too. - bitmap ebb_use; - - // A similarly conservative superset of the registers that are defined - // by instructions in CURRENT_EBB. - bitmap ebb_def; - - // BB_LIVE_OUT[BI] gives the live-out values for the basic block - // with index BI. - bb_live_out_info *bb_live_out; - }; + class bb_phi_info; + class build_info; + class bb_walker; // Return an RAII object that owns all objects allocated by // allocate_temp during its lifetime. @@ -307,6 +235,7 @@ private: void start_insn_accesses (); void finish_insn_accesses (insn_info *); + use_info *create_reg_use (build_info &, insn_info *, resource_info); void record_use (build_info &, insn_info *, rtx_obj_reference); void record_call_clobbers (build_info &, insn_info *, rtx_call_insn *); void record_def (build_info &, insn_info *, rtx_obj_reference); @@ -327,7 +256,6 @@ private: bb_info *create_bb_info (basic_block); void append_bb (bb_info *); - void calculate_potential_phi_regs (); insn_info *add_placeholder_after (insn_info *); void possibly_queue_changes (insn_change &); @@ -335,12 +263,18 @@ private: void apply_changes_to_insn (insn_change &); void init_function_data (); + void calculate_potential_phi_regs (build_info &); + void place_phis (build_info &); + void create_ebbs (build_info &); void add_entry_block_defs (build_info &); + void calculate_ebb_live_in_for_debug (build_info &); void add_phi_nodes (build_info &); void add_artificial_accesses (build_info &, df_ref_flags); void add_block_contents (build_info &); void record_block_live_out (build_info &); - void populate_backedge_phis (build_info &); + void start_block (build_info &, bb_info *); + void end_block (build_info &, bb_info *); + void populate_phi_inputs (build_info &); void process_all_blocks (); void simplify_phi_setup (phi_info *, set_info **, bitmap); @@ -400,13 +334,6 @@ private: auto_vec m_temp_defs; auto_vec m_temp_uses; - // The set of registers that might need to have phis associated with them. - // Registers outside this set are known to have a single definition that - // dominates all uses. - // - // Before RA, about 5% of registers are typically in the set. - auto_bitmap m_potential_phi_regs; - // A list of phis that are no longer in use. Their uids are still unique // and so can be recycled. phi_info *m_free_phis; diff --git a/gcc/rtl-ssa/insns.cc b/gcc/rtl-ssa/insns.cc index 88fff06..5cc3962 100644 --- a/gcc/rtl-ssa/insns.cc +++ b/gcc/rtl-ssa/insns.cc @@ -26,6 +26,7 @@ #include "rtl.h" #include "df.h" #include "rtl-ssa.h" +#include "rtl-ssa/internals.h" #include "rtl-ssa/internals.inl" #include "predict.h" #include "print-rtl.h" @@ -406,6 +407,33 @@ function_info::finish_insn_accesses (insn_info *insn) insn->set_accesses (static_cast (addr), num_defs, num_uses); } +// Called while building SSA form using BI. Create and return a use of +// register RESOURCE in INSN. Create a degenerate phi where necessary. +use_info * +function_info::create_reg_use (build_info &bi, insn_info *insn, + resource_info resource) +{ + set_info *value = bi.current_reg_value (resource.regno); + if (value && value->ebb () != bi.current_ebb) + { + if (insn->is_debug_insn ()) + value = look_through_degenerate_phi (value); + else if (bitmap_bit_p (bi.potential_phi_regs, resource.regno)) + { + // VALUE is defined by a previous EBB and RESOURCE has multiple + // definitions. Create a degenerate phi in the current EBB + // so that all definitions and uses follow a linear RPO view; + // see rtl.texi for details. + access_info *inputs[] = { look_through_degenerate_phi (value) }; + value = create_phi (bi.current_ebb, value->resource (), inputs, 1); + bi.record_reg_def (value); + } + } + auto *use = allocate (insn, resource, value); + add_use (use); + return use; +} + // Called while building SSA form using BI. Record that INSN contains // read reference REF. If this requires new entries to be added to // INSN->uses (), add those entries to the list we're building in @@ -450,28 +478,25 @@ function_info::record_use (build_info &bi, insn_info *insn, if (value->ebb () == bi.current_ebb) return true; + // Check if VALUE is the function's only definition of REGNO. + // (We already know that it dominates the use.) + if (!bitmap_bit_p (bi.potential_phi_regs, regno)) + return true; + // If the register is live on entry to the EBB but not used // within it, VALUE is the correct live-in value. + if (!bi.ebb_live_in_for_debug) + calculate_ebb_live_in_for_debug (bi); if (bitmap_bit_p (bi.ebb_live_in_for_debug, regno)) return true; - // Check if VALUE is the function's only definition of REGNO - // and if it dominates the use. - if (regno != MEM_REGNO - && regno < DF_REG_SIZE (DF) - && DF_REG_DEF_COUNT (regno) == 1 - && dominated_by_p (CDI_DOMINATORS, insn->bb ()->cfg_bb (), - value->bb ()->cfg_bb ())) - return true; - // Punt for other cases. return false; }; if (insn->is_debug_insn () && !value_is_valid ()) value = nullptr; - use = allocate (insn, resource_info { mode, regno }, value); - add_use (use); + use = create_reg_use (bi, insn, { mode, regno }); m_temp_uses.safe_push (use); bi.last_access[ref.regno + 1] = use; use->record_reference (ref, true); @@ -547,7 +572,7 @@ function_info::record_call_clobbers (build_info &bi, insn_info *insn, def->m_is_call_clobber = true; append_def (def); m_temp_defs.safe_push (def); - bi.last_access[regno + 1] = def; + bi.record_reg_def (def); } } } @@ -599,7 +624,7 @@ function_info::record_def (build_info &bi, insn_info *insn, def->record_reference (ref, true); append_def (def); m_temp_defs.safe_push (def); - bi.last_access[ref.regno + 1] = def; + bi.record_reg_def (def); } // Called while building SSA form using BI. Add an insn_info for RTL diff --git a/gcc/rtl-ssa/internals.h b/gcc/rtl-ssa/internals.h new file mode 100644 index 0000000..65c44ec --- /dev/null +++ b/gcc/rtl-ssa/internals.h @@ -0,0 +1,140 @@ +// Definition of private classes for RTL SSA -*- C++ -*- +// Copyright (C) 2020-2021 Free Software Foundation, Inc. +// +// This file is part of GCC. +// +// GCC is free software; you can redistribute it and/or modify it under +// the terms of the GNU General Public License as published by the Free +// Software Foundation; either version 3, or (at your option) any later +// version. +// +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY +// WARRANTY; without even the implied warranty of MERCHANTABILITY or +// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +// for more details. +// +// You should have received a copy of the GNU General Public License +// along with GCC; see the file COPYING3. If not see +// . + +namespace rtl_ssa { + +// Information about a basic block's phi nodes. This class is only used when +// constructing the SSA form, it isn't meant to be kept up-to-date. +class function_info::bb_phi_info +{ +public: + // The set of registers that need phi nodes. + bitmap_head regs; + + // The number of registers in REGS. + unsigned int num_phis; + + // The number of inputs to each phi node. Caching the information here + // is at best a minor optimisation, but it fills a 32-bit hole that would + // otherwise exist on 64-bit hosts. + unsigned int num_preds; + + // An array of all the phi inputs for this block. It lists all inputs + // from the first incoming edge followed by all inputs for the next + // incoming edge, and so on. The inputs for a given edge are sorted + // by increasing register number. + set_info **inputs; +}; + +// Information used while constructing the SSA form and discarded +// afterwards. +class function_info::build_info +{ +public: + build_info (unsigned int, unsigned int); + ~build_info (); + + set_info *current_reg_value (unsigned int) const; + set_info *current_mem_value () const; + + void record_reg_def (def_info *); + void record_mem_def (def_info *); + + // The block that we're currently processing. + bb_info *current_bb; + + // The EBB that contains CURRENT_BB. + ebb_info *current_ebb; + + // Except for the local exception noted below: + // + // - If register R has been defined in the current EBB, LAST_ACCESS[R + 1] + // is the last definition of R in the EBB. + // + // - Otherwise, if the current EBB is dominated by a definition of R, + // LAST_ACCESS[R + 1] is the nearest dominating definition. + // + // - Otherwise, LAST_ACCESS[R + 1] is null. + // + // Similarly: + // + // - If the current EBB has defined memory, LAST_ACCESS[0] is the last + // definition of memory in the EBB. + // + // - Otherwise LAST_ACCESS[0] is the value of memory that is live on + // - entry to the EBB. + // + // The exception is that while building instructions, LAST_ACCESS[I] + // can temporarily be the use of regno I - 1 by that instruction. + auto_vec last_access; + + // A bitmap used to hold EBB_LIVE_IN_FOR_DEBUG. + auto_bitmap tmp_ebb_live_in_for_debug; + + // If nonnull, a bitmap of registers that are live on entry to this EBB, + // with a tree view for quick lookup. This bitmap is calculated lazily + // and is only used if MAY_HAVE_DEBUG_INSNS. + bitmap ebb_live_in_for_debug; + + // The set of registers that might need to have phis associated with them. + // Registers outside this set are known to have a single definition that + // dominates all uses. + // + // Before RA, about 5% of registers are typically in the set. + auto_sbitmap potential_phi_regs; + + // A sparse bitmap representation of POTENTIAL_PHI_REGS. Only used if + // MAY_HAVE_DEBUG_INSNS. + auto_bitmap potential_phi_regs_for_debug; + + // The set of registers that have been defined so far in the current EBB. + auto_bitmap ebb_def_regs; + + // BB_PHIS[B] describes the phis for basic block B. + auto_vec bb_phis; + + // BB_MEM_LIVE_OUT[B] is the memory value that is live on exit from + // basic block B. + auto_vec bb_mem_live_out; + + // BB_TO_RPO[B] gives the position of block B in a reverse postorder + // of the CFG. The RPO is a tweaked version of the one normally + // returned by pre_and_rev_post_order_compute, with all blocks in + // an EBB having consecutive positions. + auto_vec bb_to_rpo; + + // This stack is divided into sections, with one section for the + // current basic block and one section for each dominating block. + // Each element is a register definition. + // + // If the section for block B contains a definition D of a register R, + // then one of two things is true: + // + // - D occurs in B and no definition of R dominates B. + // - D dominates B and is the nearest dominating definition of R. + // + // The two cases are distinguished by the value of D->bb (). + auto_vec def_stack; + + // The top of this stack records the start of the current block's + // section in DEF_STACK. + auto_vec old_def_stack_limit; +}; + +} diff --git a/gcc/rtl-ssa/internals.inl b/gcc/rtl-ssa/internals.inl index 3d3ed9f..325717d 100644 --- a/gcc/rtl-ssa/internals.inl +++ b/gcc/rtl-ssa/internals.inl @@ -574,10 +574,20 @@ inline ebb_info::ebb_info (bb_info *first_bb, bb_info *last_bb) { } -// Set the contents of last_access for register REGNO to DEF. -inline void -function_info::build_info::record_reg_def (unsigned int regno, def_info *def) -{ +// Record register definition DEF in last_access, pushing a definition +// to def_stack where appropriate. +inline void +function_info::build_info::record_reg_def (def_info *def) +{ + unsigned int regno = def->regno (); + auto *prev_dominating_def = safe_as_a (last_access[regno + 1]); + if (!prev_dominating_def) + // Indicate that DEF is the first dominating definition of REGNO. + def_stack.safe_push (def); + else if (prev_dominating_def->bb () != def->bb ()) + // Record that PREV_DOMINATING_DEF was the dominating definition + // of REGNO on entry to the current block. + def_stack.safe_push (prev_dominating_def); last_access[regno + 1] = def; }