1 Linear Scan Register Allocation: Design and Implementation Notes
8 [Preconditions](#preconditions)
10 [Post-Conditions](#post-conditions)
12 [LSRA Phases](#lsra-phases)
14 [Key Data Structures](#key-data-structures)
16 [Dumps and Debugging Support](#dumps-and-debugging-support)
18 [LSRA Stress Modes](#lsra-stress-modes)
20 [Assertions & Validation](#assertions-validation)
22 [Future Extensions and Enhancements](#future-extensions-and-enhancements)
24 [References](#references)
29 This document provides additional detail on the linear scan register
30 allocator (LSRA) in RyuJIT. It is expected that the reader has already
31 read the [RyuJIT Overview document](https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md).
33 Register allocation is performed using a linear scan register allocation
34 scheme, implemented by the `LinearScan` class.
36 - Physical registers are represented by the `RegRecord` class.
38 - Values (`lclVar` references and non-lclVar `GenTree` nodes that
39 produce a value) are represented by the `Interval` class.
41 - Each `Interval` (rather poorly named, since it really is
42 multiple "intervals") consists of one or more live ranges, each
43 of which has a list of `RefPosition`s (termed "use positions"
44 in much of the literature) that identify nodes at which the
45 value is referenced, as well as the register requirements for
48 - References to both physical (`RegRecord`) and virtual
49 (`Interval`) registers are represented by the
50 `RefPosition` class. It contains information about the location,
51 type and other requirements of the reference.
53 There are four main phases to LSRA:
57 - The order of allocation of the `BasicBlock`s is determined:
59 - This attempts to ensure that at least one predecessor of a
60 block is allocated before it.
62 - When not optimizing, the layout order of the blocks is used.
64 - Note that the order doesn't affect correctness, as the
65 location of `lclVar`s across block boundaries is fixed up
66 as necessary by the resolution phase. When not optimizing
67 `lclVar`s are not enregistered, so there is no benefit to
68 using a different order.
70 - An `Interval` is built for each register-candidate `lclVar`.
72 - A `RegRecord` is built for each physical register.
74 - Constructing `RefPosition`s
76 - The `RefPosition`s are built in a linear traversal of all the
79 - Iterating over the `BasicBlock`s in the order determined
80 above, a `RefTypeBB` `RefPosition` is created at each
81 new `BasicBlock`. This is the signal to the register
82 allocator to re-establish variable locations at the
85 - Iterating over the nodes within the block in execution
88 - Two locations (`LsraLocation`, which is an `unsigned
89 int`) are assigned to each node.
91 - The first is the virtual location at which its
94 - The second is the virtual location at which its
95 target register(s) are defined.
97 - This allows an instruction to use the same register
98 as both a source and target (where the source is not
99 marked `delayRegFree`.
101 - For each node, `RefPosition`s are built to reflect the uses,
102 definitions and kills of any registers involved in the
103 evaluation of the node.
105 - The allocation phase iterates over the `RefPosition` list:
107 - At the beginning of each new block the register assignment of
108 each lclVar `Interval` is updated, either based on incoming
109 argument locations, or based on the location at the end of the
110 selected predecessor block.
112 - Iteration proceeds according to ascending order of `RefPosition`s.
113 This means that the register assignment is performed in linear
116 - The status of an `Interval` is updated as a `RefPosition` is
119 - Splitting or spilling an `Interval` doesn't involve creating a new
120 one. Instead, the `RefPosition` simply gets a new assignment, and
121 is either marked for reload/copy or its location is simply
122 updated in the incoming map.
124 - The resolution phase has two parts:
126 - The `RefPosition`s are walked again to write back the register
127 assignments to the `GenTree` nodes.
129 - It is at this point that the register assignments for the
130 lclVars is finalized at `BasicBlock` boundaries.
132 - Then the block boundaries are handled:
134 - For fork edges (the source block has multiple targets, but
135 each target has only that one source), any required
136 resolution is placed at the target.
138 - For join edges (a single target block has multiple sources,
139 but each source has only that one target), any required
140 resolution is placed at the source.
142 - Critical edges require more complicated handling, and may
143 require splitting of the edge for placement of resolution.
145 - It may be that it would be more efficient to simply
146 spill those lclVars that require the split edge.
151 ### Lowered IR Form (LIR)
153 The RyuJIT backend components (register allocation and code generation)
154 are somewhat unusual in that the IR nodes do not map directly to target
155 instructions. Instead, the `Lowering` phase of the JIT first performs
156 IL transformations to ensure that the operations match the semantic
157 level of the target such that all registers are explicitly represented
160 - This may not result in a 1:1 correlation between nodes and target
161 instructions, as some nodes (e.g. constants) may not require a
162 register, and other nodes (e.g. those representing addressing modes)
163 may become part of the parent instruction at code generation time.
165 It is the job of the `Lowering` phase to transform the IR such that:
167 - The nodes are in `LIR` form (i.e. all expression trees have been
168 linearized, and the execution order of the nodes within a BasicBlock
169 is specified by the `gtNext` and `gtPrev` links).
171 - Any node that will be evaluated as part of its parent (consumer)
172 node have been identified. These are known as "contained" nodes, and
173 are identified by setting the `GTF_CONTAINED` flag in the
174 `gtFlags` field of the node.
176 - Similarly, nodes that represent values that could be referenced from
177 memory by their consumer (i.e. via an addressing mode) are marked as
178 "reg-optional" (`LIR::Flags::RegOptional`)
180 - Issues [\#7752](https://github.com/dotnet/coreclr/issues/7752) and [\#7753](https://github.com/dotnet/coreclr/issues/7753) track the proposal to support "folding"
181 of operations using a tree temp, i.e. supporting the possibility
182 of a def being reg-optional, as well as its use, so that it need
183 never occupy a register.
185 - Issue [\#6361](https://github.com/dotnet/coreclr/issues/6361) tracks the problem that `Lowering` currently has
186 to select a single operand to be reg-optional, even if either
187 operand could be. This requires some additional state because
188 LSRA can't easily navigate from one use to the other to
189 communicate whether the first operand has been assigned a
192 - All unused values (nodes that produce a result that is not consumed)
193 are identified (`gtLIRFlags` has the `LIR::Flags::UnusedValue` bit set)
195 - Since tree temps (the values produced by nodes and consumed by
196 their parent) are expected to be single-def, single-use (SDSU),
197 normally the live range can be determined to end at the use. If
198 there is no use, the register allocator doesn't know where the
201 - Code can be generated without any context from the parent (consumer)
204 ### Register Requirements
206 There are three types of value for which registers are allocated:
208 - Local variables, `lclVar`s, which appear in the IR as
209 `GT_LCL_VAR` for uses or `GT_STORE_LCL_VAR` for
212 - Values produced by any other IR node. These are known as "tree
213 temps", and never have more than a single use.
215 - "Internal registers" that are required for the evaluation of a node.
217 - These are an exception to the explicit representation of
220 The register requirements are determined by the `TreeNodeInfoInit()`
221 method, which is called for each node just prior to building its
224 The register lifetimes must obey the following lifetime model:
226 - First, any internal registers are defined.
228 - Next, any source registers are used (and are then freed if they are
229 last use and are not identified as `delayRegFree`).
231 - Next, the internal registers are used (and are then freed).
233 - Next, the destination register(s) are defined
235 - Finally, any `delayRegFree` source registesr are freed.
237 There are several things to note about this order:
239 - The internal registers will never overlap any use, but they may
240 overlap a destination register.
242 - Internal registers are never live beyond the node.
244 - The `delayRegFree` annotation is used for instructions that are
245 only available in a Read-Modify-Write form. That is, the destination
246 register is one of the sources. In this case, we must not use the
247 same register for the non-RMW operand as for the destination.
252 After LSRA, the graph has the following properties:
254 - The `gtRegNum` of each tree node contains the allocated register,
255 if any. Nodes that produce multiple registers are similarly
256 assigned, via extended register number fields. If the node does not
257 produce a value directly (i.e. it is either of void type, or it is
258 evaluated as part of its parent) its gtRegNum is set to REG_NA.
260 - In most cases, this register must satisfy the constraints
261 specified by the `NodeInfo`.
263 - In some cases, this is difficult:
265 - If a lclVar node currently lives in some register, it may
266 not be desirable to move it (i.e. its current location may
267 be desirable for future uses, e.g. if it's a callee save
268 register, but needs to be in a specific arg register for a
271 - In other cases there may be conflicts on the restrictions
272 placed by the defining node and the node which consumes it
274 - If such a node is constrained to a single fixed register (e.g.
275 an arg register, or a return from a call), then LSRA is free to
276 annotate the node with a different register. The code generator
277 must issue the appropriate move.
279 - However, if such a node is constrained to a set of registers,
280 and its current location does not satisfy that requirement, LSRA
281 must insert a `GT_COPY` node between the node and its parent.
282 The gtRegNum on the `GT_COPY` node must satisfy the register
283 requirement of the parent.
285 - GenTree::gtRsvdRegs has a set of registers used for internal temps.
286 These must satisfy the constraints given by the `NodeInfo`.
288 - A tree node is marked `GTF_SPILL` if the tree node must be spilled by
289 the code generator after it has been evaluated.
291 - A tree node is marked `GTF_SPILLED` if it is a lclVar that must be
292 reloaded prior to use.
294 - The register (gtRegNum) on the node indicates the register to
295 which it must be reloaded.
297 - For lclVar nodes, since the uses and defs are distinct tree
298 nodes, it is always possible to annotate the node with the
299 register to which the variable must be reloaded.
301 - For other nodes, since they represent both the def and use, if
302 the value must be reloaded to a different register, LSRA must
303 insert a `GT_RELOAD` node to specify the register to which it
306 - Local variable table (`LclVarDsc`):
308 - `LclVarDsc::lvRegister` is set to true if a local variable has the
309 same register assignment for its entire lifetime.
311 - `LclVarDsc::lvRegNum` is initialized to its initial register
314 - For incoming parameters, this is the register to which
315 `genFnPrologCalleeRegArgs()` will move it.
317 - Codegen will set `lvRegNum` to its current value as it processes
318 the trees, since a variable can be assigned different registers
324 This section describes the phases of the `LinearScan` allocator (as
325 well as supporting components) in more depth.
327 ### Liveness and Candidate Identification
329 - `Compiler::lvaMarkLocalVars`
331 - This determines which variables are tracked for the purposes of
332 dataflow analysis. Only those variables will be candidates for
335 - It is unclear whether it would be beneficial, but if we
336 could keep track of the variables that are only used within
337 a block (presumably true of many introduced temps), we may
338 find that we could continue to limit the number of variables
339 whose liveness is tracked across blocks, keeping an expanded
340 set only for transient liveness. [Issue \#11339](https://github.com/dotnet/coreclr/issues/11339)
342 - `Compiler::fgLocalVarLiveness`
344 - This computes the `BasicBlock::bbLiveIn` and
345 `BasicBlock::bbLiveOut` sets that are used by LSRA.
347 - It does this for an lclVar marked `lvTracked`, even if it is
348 not a register candidate. The dataflow information is also used
349 for GC info, but it may be the case that some of these should
350 not be marked during the pre-LSRA dataflow analysis.
352 - `LinearScan::identifyCandidates`
354 - This mostly duplicates what is done in
355 `Compiler::lvaMarkLocalVars(). There are differences in what
356 constituted a register candidate vs. what is tracked, but we
357 should probably handle them in ` Compiler::lvaMarkLocalVars()`
358 when it is called after `Lowering`.
360 - It sets the ` lvLRACandidate` flag on lclVars that are going
361 to be register candidates.
365 The determination of block ordering is done during the building of
366 `RefPosition`s. However, it is a logically distinct component. Its
367 objective is to identify a sequence of `BasicBlock`s for allocation
368 that satisfies the following properties:
370 - Each block comes after at least one of its predecessors, ideally the
371 one on the edge with the greatest weight.
373 - We use block weight, since edge weight is not tracked in the
376 The order of the `BasicBlock`s is captured in the `blockSequence` member of `LinearScan`.
378 Other implementations of linear scan register allocation aim to ensure
379 that a block is immediately preceded by a predecessor block. This is not
380 as big a consideration in our implementation because we reset the
381 variable locations to those at the end of the most frequent successor.
383 It begins with the first block, and then adds successor blocks to the
384 ready list, in sorted order, where the block weight is the sorting
387 After a block is added to the sequence list,
388 `findPredBlockForLiveIn()` is called to determine which predecessor to
389 use to set the register location of live-in lclVars, which may not be the
390 same as the previous block in the `blockSequence`. Its `bbNum` is captured in
393 The block ordering pass also identifies whether and where there are
394 critical edges. This also captured in the `LsraBlockInfo` and is used by the resolution phase.
396 ### Register Requirements
398 Like block ordering, this is done during the building of
399 `RefPosition`s, but is a logically separate component.
401 - Determine register requirements (Lowering::TreeNodeInfoInit)
403 - This method is responsible for initializing the `NodeInfo` for
406 - This is currently maintained in the `gtLsraInfo` for each
409 - The nodes are placed into a map, that maps from a tree node
410 to the list of tree nodes that are the actual operands.
412 - This complication is due to contained nodes, where, e.g.
413 a `GT_IND` node may be contained, and have a
414 `GT_LEA` child whose base and addr nodes are the
417 - Work is underway to eliminate the `gtLsraInfo`, and to
418 place the `NodeInfo` of the actual operands directly
419 into the map, eliminating the need for two level (map of
420 lists) data structure [Issue \#7225](https://github.com/dotnet/coreclr/issues/7225)
422 - Subsequent work will eliminate the `NodeInfo`
423 altogether, and instead build`RefPosition`s directly
424 from the `TreeNodeInfoInit` methods [Issue \#7257](https://github.com/dotnet/coreclr/issues/7257)
426 - The `NodeInfo` includes:
428 - The number of register sources and destinations.
430 - Register restrictions (candidates) for the register(s) defined by the node, if any.
431 These restrictions are specified separately as those imposed by this, the defining node
432 (`dstCandidates`) and those imposed by the consuming or parent node (`srcCandidates`).
434 - At the time the consuming node is being handled by `TreeNodeInfoInit()`, the `Interval`
435 for the defining node (if it is a "tree temp") has already been created, as has the
436 defining (`RefTypeDef`) `RefPosition`.
437 Any conflicts between the `dstCandidates` on the sources, and the `srcCandidates` at
438 the consuming node will be handled when the consuming `RefPosition` (`RefTypeUse`) is
441 - The number (internalCount) of registers required, and their
442 register restrictions (candidates). These are neither inputs
443 nor outputs of the node, but used in the sequence of code
444 generated for the tree.
446 - At one point, we had planned to eliminate internal
447 registers, as they do not precisely represent register
448 lifetimes. However, it is felt that the complexity and
449 code expansion from eliminating them would outweigh any
452 - Register allocation (doLinearScan)
454 - Iterate over the linear ordering of nodes within each
457 - Build `Interval`s with annotations
459 - Ref positions (called use positions by [[1]](#[1]) at each
460 def or use, and for internal registers.
462 - Add cross-interval preferences
464 - This area has room for improvement [Issue #11463](https://github.com/dotnet/coreclr/issues/11463)
466 - Determine whether to prefer callee-save registers. [Issue
467 \#7664](https://github.com/dotnet/coreclr/issues/7664) tracks the need to improve the heuristics for
468 when to prefer callee-save.
470 - Add pseudo-uses to the `Interval` if live at loop backedge.
472 - If the previous block is not a predecessor of the
473 current block, add DummyDefs for variables that are
474 live-in to the current block, but not live-out of the
477 - This is generally not required, as the block will
478 normally have a predecessor block that has already
481 - Future improvements:
483 - Identify single-def variables (including arguments) --
484 these are candidates for store-at-def spilling. [Issue
485 \#11344](https://github.com/dotnet/coreclr/issues/11344).
487 - Identify candidates for recomputation -- values which
488 are idempotent and can be more easily recomputed than
489 spilled & reloaded [Issue \#6131](https://github.com/dotnet/coreclr/issues/6131)
491 - Add appropriate ref positions for fixed registers (argument
492 registers, volatile/caller-save, write barriers, dedicated
495 - Iterate over `RefPosition`s in forward order, performing linear scan
496 allocation. The algorithm below is a modified version of that
497 specified in [[3]](#[3]), and doesn't include optimizations such as early
498 exit of the loops when the remaining `Interval`s will be out of range.
499 It is a high-level abstraction of the algorithm as implemented:
501 LinearScanAllocation(List<RefPosition> refPositions)
504 List<Interval> active = {};
506 Location currentLoc = Start;
507 foreach refPosition in RefPositions
509 if (refPosition.Location > currentLoc)
512 currentLoc = refPosition.Location;
514 if (refPosition is BasicBlockBoundary) ProcessBlockBoundaryAlloc()
515 Interval currentInterval = refPosition.Interval;
516 if (currentInterval->hasValidAssignment())
518 refPosition->setReg(currentInterval->physReg);
520 // find a register for currentInterval
521 // AllocateBlockedReg will take into account whether it must have
522 // a register in this position, and may otherwise choose not to
523 // allocate if existing intervals have higher priority.
524 // A split interval will be added to the unhandled list
525 // A split interval will be added to the unhandled list
526 if (!TryAllocateFreeReg(refPosition))
527 AllocateBusyReg(refPosition);
531 - hasValidAssignment() is not currently an actual method. If the
532 `Interval` currently has an assigned register, and it meets the
533 requirements of `refPosition` it is used. Otherwise, a new register
536 - Currently, parameters may not be allocated a register if their
537 weighted reference count is less than `BB_UNITY_WEIGHT`, however
538 plenty of room remains for improving the allocation of
539 parameters [Issue \#11356](https://github.com/dotnet/coreclr/issues/11356)
541 - `TryAllocateFreeReg()` iterates over the registers, attempting to find
542 the best free register (if any) to allocate:
544 - It uses a set of scoring criteria to evaluate the "goodness" of
545 the register that includes:
547 - Whether it covers the lifetime of the `Interval` and/or the
548 `Interval` to which it is preferenced, if any
550 - Whether it is in the register preference set for the
553 - Whether it is not only available but currently unassigned
554 (i.e. this register is NOT currently assigned to an `Interval`
555 which is not currently live)
557 - It always uses the same order for iterating over the registers.
558 The jit32 register allocator used a different ordering for tree
559 temps than for lclVars. It's unclear if this matters for LSRA,
560 but [Issue \#11357](https://github.com/dotnet/coreclr/issues/11357) tracks this question.
562 - It currently fully evaluates the "goodness" of each register.
563 [Issue \#7301](https://github.com/dotnet/coreclr/issues/7301) tracks the possibility of short-circuiting this evaluation.
565 - `AllocateBusyReg()` iterates over all the registers trying to find the
566 best register to spill (it must only be called if `tryAllocateFreeReg()`
567 was unable to find one):
569 - It takes into account the following:
571 - The distance to the next use of the `Interval` being
574 - The relative weight of the `Interval`s.
576 - Whether the `RefPosition` being allocated, or the one
577 potentially being spilled, is reg-optional
579 - It will always spill an `Interval` either at its most recent
580 use, or at the entry to the current block.
582 - Issues [\#7609](https://github.com/dotnet/coreclr/issues/7609) and [\#7665](https://github.com/dotnet/coreclr/issues/7665) track improvement of spill
585 - It is quite possible that combining `TryAllocateFreeReg()` and
586 `AllocateBusyReg()` might be more effective [Issue \#15408](https://github.com/dotnet/coreclr/issues/15408).
590 - Perform resolution at block boundaries, adding moves as needed
591 (the algorithm for resolution can be found in [[2]](#[2]), though note
592 that there is a typo: in the last 'if' statement, it should be
593 "if b != loc(pred(b))" instead of "if b = loc(pred(b))"):
595 - First, any variables that are in registers on the incoming
596 edge, but need to be on the stack, are stored, freeing those
599 - Next, any register-to-register moves are done, using a
602 - If there are circular dependencies, swap is used when
603 available, and a temp register is used otherwise.
605 - Finally, any stack to register moves are done.
607 - Resolution of exception edges
609 - This is currently done by ensuring that any variable that's
610 live in to an exception region is maintained on stack.
612 - Issue \#6001 raises the performance issue due to this
615 - An initial approach, for which some initial
616 implementation has been done, is to support the notion
617 of "write-thru" variables; for these, all definitions
618 would write to memory, but uses could use a register
621 - The value would be reloaded at exception boundaries.
623 - Support for "write-thru" variables could be extended to
624 single-def variables; if they are spilled at their
625 single definition, they need never be spilled again
630 - Code generation (genGenerateCode)
632 - Code generation utilizes the registers specified on the
633 instructions, as gtRegNum.
635 - At block boundaries, the code generator calls the register
636 allocator to update the locations of variables
637 (`recordVarLocationsAtStartOfBB()`)
639 - Spills and reloads are generated as annotated in `gtFlags`
640 (`GTF_SPILL`, `GTF_SPILLED`, respectively).
642 - GC tracking is performed during codegen.
649 This is the same bbLiveIn set as used elsewhere in the JIT but is
650 recomputed prior to register allocation because `Lowering` creates new
655 This is a temporary single-instance data structure used during the
656 construction of `Interval`s.
660 This is a common base class for `Interval`, which represent virtual
661 registers, and `RegRecord`s, which represent the physical registers. It
664 - An ordered list of ref positions -- each point in the linear order
665 at which it is defined or used, with a flag indicating whether it
666 must be in a register.
670 A structure that represents a variable, optimizer temp, or local temp,
671 that is a candidate for enregistering. Its `RefPosition`s reflect
672 actual uses (`RefTypeUse`) or defs (`RefTypeDef`) in the code
673 stream, entry "definitions" (`RefTypeParamDef` and
674 `RefTypeZeroInit`) or convey boundary information
675 (`RefTypeDummyDef`, `RefTypeExpUse`). It also contains:
677 - Register preference information:
679 - Set of preferred registers
681 - An `Interval` to which it is related by assignment or copy.
683 - The assigned register and its `RegRecord`
685 - During allocation, this may change, and reflects the current
686 assignment, if any, at the most recent `RefPosition`
688 - If the `Interval` is spilled or becomes inactive, it will retain
689 the `RegRecord`, but its `assignedReg` field will be set to
694 `RegRecord`s for physical registers share a common base class with
695 `Interval`s, but represent actual registers instead of virtual
696 registers. They have a list of `RefPosition`s for code locations at
697 which a fixed physical register is required, and during allocation they
698 maintain a pointer to the current `Interval` to which the physical
699 register is assigned.
701 `RegRecord` `RefPosition`s are `RefTypeKill` or `RefTypeFixedReg`. The former is
702 inserted at a point where a register is killed, e.g. by a call. The
703 latter are inserted at a point where a specific register must be used or
704 defined. These are generally due to calls or instructions with implicit
709 A `RefPosition` represents a single use, def or update of an
710 enregisterable variable or temporary. It contains
712 - The `Interval` or `RegRecord` to which this reference belongs.
714 - The next `RefPosition` (in code order) for the `Interval` or `RegRecord`.
715 This is used when determining which register to spill.
717 - Register assignment
719 - Prior to allocation, this represents the set of valid register
722 - After allocation, this is a single element register mask
723 including only the allocated register (or two registers in the
724 ARM implementation for longs).
726 - The type of `RefPosition`:
728 - `RefTypeDef` is a pure definition of an `Interval` .
730 - `RefTypeUse` is a pure use of an `Interval` .
732 - `RefTypeKill` is a location at which a physical register is
733 killed. These only exist on `RegRecord`s, not on `Interval`s
735 - Note that this type is probably not needed -- see especially
736 notes about physical registers in "future" section.
738 - `RefTypeBB` is really just a marker in the list of `RefPosition`s,
739 where the register allocator needs to record the register
740 locations at block boundaries. It is not associated with an
741 `Interval` or `RegRecord`.
743 - `RefTypeFixedReg` is a `RefPosition` on a `RegRecord` that marks a
744 position at which it is required by some `Interval`. This allows
745 the register allocator to determine the range in which a
748 - `RefTypeExpUse` is an `Interval` `RefPosition` inserted just prior to
749 a loop backedge, that allows the register allocator to identify
750 the actual end of the live range
752 - `RefTypeParamDef` is an `Interval` `RefPosition` that represents the
753 incoming "definition" of a parameter (register or stack)
755 - `RefTypeDummyDef` is inserted at a block into which a variable is
756 live, but was not live out of the previous block (in traversal
759 - `RefTypeZeroInit` is an `Interval` `RefPosition` that represents the
760 position at entry at which a variable will be initialized to
765 The tree nodes are updated during the writeback traversal with the
766 register that has been assigned to it (if any). If the associated
767 `RefPosition` is marked `spillAfter`, the `GTF_SPILL` flag is set. If it is
768 marked "reload" then the `GTF_SPILLED` flag is set.
770 The `GT_LCL_VAR` nodes are updated with the current "active" register
771 for the variable. For `Interval`s that represent incoming parameters, this
772 must be set to the initial register assignment for the parameter (this
773 may differ from the incoming register for register parameters; the
774 prolog generation code will move it as needed).
776 In the event that a tree-node (non-lclVar) is spilled, and the original
777 assigned register is not available at the time it is reloaded, a
778 GT_RELOAD node is inserted in the graph, and is annotated with the
779 register to which it should be reloaded just prior to use.
783 This mapping records the location of each live variable at the entry and
784 exit of each block. It is recorded during the writeback phase
785 (resolveRegisters) and is used to add resolution moves, as well as to
786 record the variable locations at the beginning of each block during code
789 This map incurs non-trivial JIT-time overhead. Issue \#11396 addresses
790 the question of whether there is an alternative that would have better
793 Dumps and Debugging Support
794 ---------------------------
796 The following dumps and debugging modes are provided:
798 - Prior to register allocation
800 - Liveness (use, def, in, out) per block
801 - A tuple-like list of nodes in execution order
803 - During `Interval` & `RefPosition` creation - `buildIntervals()`
805 - For each incoming arg: its type and incoming location
806 - For each instruction:
807 - The current contents of the `OperandToLocationInfoMap`. This corresponds to all the nodes that have defined values
808 that have not yet been consumed.
809 - An abbreviated dump of the GenTree node.
810 - The `TreeNodeInfo` generated for it.
811 - A dump of each ref position as it is created.
813 - After buildIntervals()
814 - A dump of all the `Interval`s and their `RefPosition`s
815 - A tuple-like list of nodes with their `RefPosition`s
818 - A table of registers and their contents as allocation
822 - A dump of `RefPosition`s, sequentially, and grouped for Var
826 - A list of the candidates for resolution at each split, join or
827 critical edge (i.e. the vars that are live) and where they
828 reside (stack or reg)
829 - Any actual moves that are required
832 - A table of registers and their contents, including spills made
833 after each `RefPosition` was allocated.
834 - A tuple-like list of nodes with their assigned registers
839 The implementation uses the COMPLUS_JitStressRegs environment variable.
840 The following are the stress modes associated with this variable. For
841 the most part they can be combined, though in some cases the values are
844 - Limit the number of registers available for allocation (at most one
845 of these can be specified):
847 - Limit to callee-regs (0x1)
849 - Limit to caller-save regs (0x2)
851 - Limit to a reduced set of both callee and caller-save regs (0x3)
853 - Note that some instructions require a large number of registers,
854 and may require special handling (this is captured in the
855 DEBUG-only field `RefPosition::minRegCandidateCount`, which is
856 set during `buildRefPositionsForNode()`)
858 - Modify register selection heuristics
860 - Reverse register "scoring" criteria (0x4)
861 - Prefer caller-save regs when the `Interval` spans a call (0x8)
863 - Spill the nearest-use instead of the farthest (subject to
864 the correctness condition that registers can't be reused
865 within the same node) (0x10)
867 - Modify the order in which the basic blocks are traversed:
868 - Layout order (0x20)
869 - Pred-first order (0x40) -- the default
870 - "Random" order (0x60) -- not yet implemented
871 - For repeatability, the order should be pseudo-random, e.g.
872 using a given skip and mod-divisor where the mod-divisor is
873 larger than the maximum number of basic blocks.
875 - Extend lifetimes to the entire method (0x80)
876 - This is done under MinOpts, but MinOpts doesn't actually put an
877 lclVars in registers.
879 - Modify the starting location (register/stack) for each variable at
880 block boundaries (the default is to use the weightiest predecessor
881 block that has already been allocated):
882 - Use the location from the previous block in layout order (0x100)
883 - Rotate the variable locations (0x200)
885 Assertions & Validation
886 -----------------------
888 There are many assertions in `LinearScan`. The following are the most
889 effective at identifying issues (i.e. they frequently show up in bugs):
891 - The node information isn't showing the number of consumed registers
893 - `assert(((consume == 0) && (produce == 0)) || (ComputeAvailableSrcCount(tree) == consume));`
894 - This usually means that the `TreeNodeInfoInit` method for this
895 node is not correctly setting its `srcCount` (which is what `consume` has been set to).
896 - No register is found, even with spilling.
897 - `assert(farthestRefPhysRegRecord != nullptr);`
898 - This means that the register requirements were over-constrained, either due to an error, or
899 possibly because it is a stress mode, and the instruction hasn't correctly specified its
900 minimum number of registers.
902 At the end of write-back (`resolveRegisters()`), `verifyFinalAllocation()` runs. It doesn't do a lot of validation, but it
903 prints the final allocation (including final spill placement), so is useful for tracking down correctness issues.
905 Future Extensions and Enhancements
906 ----------------------------------
908 The potential enhancements to the JIT, some of which are referenced in this document, can generally be found by [searching for LSRA in open issues](https://github.com/dotnet/coreclr/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+LSRA+in%3Atitle). The ones that are focused on JIT throughput are labeled `JITThroughput`.
910 The following haven't yet been opened as github issues:
912 ### Leveraging SSA form
914 Making SSA form available to LSRA would:
916 - Allow splitting of `Interval`s where SSA names are not related by a phi node.
918 - Ease potential future support for "General SSA" (where sources and dests of phi nodes may overlap), as existing support for resolution at block boundaries could be easily extended to introduce minimal copies for translating out of General SSA form.
920 ### Spanning trees for physical registers
922 LLVM has replaced their linear scan register allocator with something it
923 calls "Greedy Register Allocation". This uses a priority queue for the
924 order of allocation (sorted by decreasing spill cost), and a B+ tree to
925 represent each physical register. I think that using the B+ trees for
926 physical registers would be an improvement over the current PhysRegs,
927 and we may want to experiment with changing the allocation order as
928 well. It would not be necessary to significantly modify the process of
929 creating `Interval`s, nor the resolution phase.
934 1. <a name="[1]"/> Boissinot, B. et
935 al "Fast liveness checking for ssa-form programs," CGO 2008, pp.
937 http://portal.acm.org/citation.cfm?id=1356058.1356064&coll=ACM&dl=ACM&CFID=105967773&CFTOKEN=80545349
939 2. <a name="[2]"/> Boissinot, B. et al, "Revisiting
940 Out-of-SSA Translation for Correctness, Code Quality and
941 Efficiency," CGO 2009, pp. 114-125.
942 <http://portal.acm.org/citation.cfm?id=1545006.1545063&coll=ACM&dl=ACM&CFID=105967773&CFTOKEN=80545349>
945 3. <a name="[3]"/>Wimmer, C. and Mössenböck, D. "Optimized
946 Interval Splitting in a Linear Scan Register Allocator," ACM VEE
948 <http://portal.acm.org/citation.cfm?id=1064998&dl=ACM&coll=ACM&CFID=105967773&CFTOKEN=80545349>
950 4. <a name="[4]"/> Wimmer, C. and Franz, M. "Linear Scan
951 Register Allocation on SSA Form," ACM CGO 2010, pp. 170-179.
952 <http://portal.acm.org/citation.cfm?id=1772979&dl=ACM&coll=ACM&CFID=105967773&CFTOKEN=80545349>
954 5. <a name="[5]"/> Traub, O. et al "Quality and Speed in Linear-scan Register
955 Allocation," SIGPLAN '98, pp. 142-151.
956 <http://portal.acm.org/citation.cfm?id=277650.277714&coll=ACM&dl=ACM&CFID=105967773&CFTOKEN=80545349>
958 6. <a name="[6]"/> Olesen, J. "Greedy Register Allocation in LLVM 3.0," LLVM Project Blog, Sept. 2011.
959 <http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html>
960 (Last retrieved Feb. 2012)