1 // Licensed to the .NET Foundation under one or more agreements.
2 // The .NET Foundation licenses this file to you under the MIT license.
3 // See the LICENSE file in the project root for more information.
6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
7 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
9 Linear Scan Register Allocation
14 - All register requirements are expressed in the code stream, either as destination
15 registers of tree nodes, or as internal registers. These requirements are
16 expressed in the TreeNodeInfo (gtLsraInfo) on each node, which includes:
17 - The number of register sources and destinations.
18 - The register restrictions (candidates) of the target register, both from itself,
19 as producer of the value (dstCandidates), and from its consuming node (srcCandidates).
20 Note that the srcCandidates field of TreeNodeInfo refers to the destination register
21 (not any of its sources).
22 - The number (internalCount) of registers required, and their register restrictions (internalCandidates).
23 These are neither inputs nor outputs of the node, but used in the sequence of code generated for the tree.
24 "Internal registers" are registers used during the code sequence generated for the node.
25 The register lifetimes must obey the following lifetime model:
26 - First, any internal registers are defined.
- Next, any source registers are used (and are then freed if they are last use and are not identified as delayRegFree).
29 - Next, the internal registers are used (and are then freed).
30 - Next, any registers in the kill set for the instruction are killed.
31 - Next, the destination register(s) are defined (multiple destination registers are only supported on ARM)
32 - Finally, any "delayRegFree" source registers are freed.
33 There are several things to note about this order:
34 - The internal registers will never overlap any use, but they may overlap a destination register.
35 - Internal registers are never live beyond the node.
36 - The "delayRegFree" annotation is used for instructions that are only available in a Read-Modify-Write form.
37 That is, the destination register is one of the sources. In this case, we must not use the same register for
38 the non-RMW operand as for the destination.
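As a hypothetical illustration of the RMW case: on xarch,
    t1 = GT_SUB(t0, v01)
is encoded as a read-modify-write "sub targetReg, op2Reg", so the register holding v01 must not also be
chosen as the destination register. The use of v01 is therefore marked "delayRegFree", keeping its
register busy until after the destination register has been defined.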
40 Overview (doLinearScan):
41 - Walk all blocks, building intervals and RefPositions (buildIntervals)
42 - Traverse the RefPositions, marking last uses (setLastUses)
43 - Note that this is necessary because the execution order doesn't accurately reflect use order.
44 There is a "TODO-Throughput" to eliminate this.
45 - Allocate registers (allocateRegisters)
46 - Annotate nodes with register assignments (resolveRegisters)
47 - Add move nodes as needed to resolve conflicting register
48 assignments across non-adjacent edges. (resolveEdges, called from resolveRegisters)
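As a hypothetical illustration of the last step: if a lclVar is in one register at the end of a block
but is expected in a different register at the start of a non-adjacent successor, a move node is
inserted during resolveEdges to reconcile the two assignments.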
53 - GenTree::gtRegNum (and gtRegPair for ARM) is annotated with the register
54 assignment for a node. If the node does not require a register, it is
55 annotated as such (for single registers, gtRegNum = REG_NA; for register
56 pair type, gtRegPair = REG_PAIR_NONE). For a variable definition or interior
tree node (an "implicit" definition), this is the register in which to put the result.
For an expression use, this is the place to find the value that has previously been computed.
60 - In most cases, this register must satisfy the constraints specified by the TreeNodeInfo.
61 - In some cases, this is difficult:
62 - If a lclVar node currently lives in some register, it may not be desirable to move it
63 (i.e. its current location may be desirable for future uses, e.g. if it's a callee save register,
64 but needs to be in a specific arg register for a call).
- In other cases there may be conflicts between the restrictions placed by the defining node and the node which consumes it.
67 - If such a node is constrained to a single fixed register (e.g. an arg register, or a return from a call),
then LSRA is free to annotate the node with a different register. The code generator must issue the appropriate move.
70 - However, if such a node is constrained to a set of registers, and its current location does not satisfy that
71 requirement, LSRA must insert a GT_COPY node between the node and its parent. The gtRegNum on the GT_COPY node
72 must satisfy the register requirement of the parent.
73 - GenTree::gtRsvdRegs has a set of registers used for internal temps.
- A tree node is marked GTF_SPILL if the tree node must be spilled by the code generator after it has been evaluated.
76 - LSRA currently does not set GTF_SPILLED on such nodes, because it caused problems in the old code generator.
77 In the new backend perhaps this should change (see also the note below under CodeGen).
78 - A tree node is marked GTF_SPILLED if it is a lclVar that must be reloaded prior to use.
79 - The register (gtRegNum) on the node indicates the register to which it must be reloaded.
80 - For lclVar nodes, since the uses and defs are distinct tree nodes, it is always possible to annotate the node
81 with the register to which the variable must be reloaded.
82 - For other nodes, since they represent both the def and use, if the value must be reloaded to a different
83 register, LSRA must insert a GT_RELOAD node in order to specify the register to which it should be reloaded.
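As a hypothetical illustration of the GT_COPY and GT_RELOAD cases above: if a lclVar node for V03
currently lives in a register that is not among the candidates required by its consuming node, LSRA
inserts
    GT_COPY(GT_LCL_VAR V03)
between the lclVar and its parent; the gtRegNum on the GT_COPY satisfies the parent's requirement while
V03 stays where it is. For a non-lclVar node whose value must be reloaded into a different register than
the one on its defining node, LSRA instead inserts a GT_RELOAD whose gtRegNum names the register to
reload into.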
85 Local variable table (LclVarDsc):
86 - LclVarDsc::lvRegister is set to true if a local variable has the
87 same register assignment for its entire lifetime.
88 - LclVarDsc::lvRegNum / lvOtherReg: these are initialized to their
89 first value at the end of LSRA (it looks like lvOtherReg isn't?
90 This is probably a bug (ARM)). Codegen will set them to their current value
as it processes the trees, since a variable can (now) be assigned different
registers over its lifetime.
94 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
95 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
103 #ifndef LEGACY_BACKEND // This file is ONLY used for the RyuJIT backend that uses the linear scan register allocator
108 const char* LinearScan::resolveTypeName[] = {"Split", "Join", "Critical", "SharedCritical"};
111 /*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
112 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
114 XX Small Helper functions XX
117 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
118 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
121 //--------------------------------------------------------------
122 // lsraAssignRegToTree: Assign the given reg to tree node.
125 // tree - Gentree node
126 // reg - register to be assigned
127 // regIdx - register idx, if tree is a multi-reg call node.
128 // regIdx will be zero for single-reg result producing tree nodes.
void lsraAssignRegToTree(GenTreePtr tree, regNumber reg, unsigned regIdx)
{
    if (regIdx == 0)
    {
        tree->gtRegNum = reg;
    }
    else
    {
        assert(tree->IsMultiRegCall());
        GenTreeCall* call = tree->AsCall();
        call->SetRegNumByIdx(reg, regIdx);
    }
}
147 //-------------------------------------------------------------
148 // getWeight: Returns the weight of the RefPosition.
151 // refPos - ref position
154 // Weight of ref position.
155 unsigned LinearScan::getWeight(RefPosition* refPos)
158 GenTreePtr treeNode = refPos->treeNode;
160 if (treeNode != nullptr)
162 if (isCandidateLocalRef(treeNode))
// Tracked locals: use weighted ref cnt as the weight of the ref position.
166 GenTreeLclVarCommon* lclCommon = treeNode->AsLclVarCommon();
167 LclVarDsc* varDsc = &(compiler->lvaTable[lclCommon->gtLclNum]);
168 weight = varDsc->lvRefCntWtd;
172 // Non-candidate local ref or non-lcl tree node.
173 // These are considered to have two references in the basic block:
174 // a def and a use and hence weighted ref count is 2 times
175 // the basic block weight in which they appear.
176 weight = 2 * this->blockInfo[refPos->bbNum].weight;
181 // Non-tree node ref positions. These will have a single
182 // reference in the basic block and hence their weighted
// refcount is equal to the block weight in which they appear.
185 weight = this->blockInfo[refPos->bbNum].weight;
191 // allRegs represents a set of registers that can
// be used to allocate the specified type at any point
// in time (more of a 'bank' of registers).
194 regMaskTP LinearScan::allRegs(RegisterType rt)
198 return availableFloatRegs;
200 else if (rt == TYP_DOUBLE)
202 return availableDoubleRegs;
204 // TODO-Cleanup: Add an RBM_ALLSIMD
206 else if (varTypeIsSIMD(rt))
208 return availableDoubleRegs;
209 #endif // FEATURE_SIMD
213 return availableIntRegs;
217 //--------------------------------------------------------------------------
// allMultiRegCallNodeRegs: Returns the set of registers that can be used
// to allocate a multi-reg call node.
222 // call - Multi-reg call node
225 // Mask representing the set of available registers for multi-reg call
229 // Multi-reg call node available regs = Bitwise-OR(allregs(GetReturnRegType(i)))
230 // for all i=0..RetRegCount-1.
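//
// For example (a hypothetical illustration): for a two-register struct return whose first return
// register type is TYP_INT and whose second is TYP_DOUBLE, this yields
// allRegs(TYP_INT) | allRegs(TYP_DOUBLE), i.e. the union of the integer and floating-point banks
// rather than just the two ABI return registers.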
231 regMaskTP LinearScan::allMultiRegCallNodeRegs(GenTreeCall* call)
233 assert(call->HasMultiRegRetVal());
235 ReturnTypeDesc* retTypeDesc = call->GetReturnTypeDesc();
236 regMaskTP resultMask = allRegs(retTypeDesc->GetReturnRegType(0));
238 unsigned count = retTypeDesc->GetReturnRegCount();
239 for (unsigned i = 1; i < count; ++i)
241 resultMask |= allRegs(retTypeDesc->GetReturnRegType(i));
247 //--------------------------------------------------------------------------
// allRegs: returns the set of registers that can accommodate the type of the given node.
252 // tree - GenTree node
255 // Mask representing the set of available registers for given tree
257 // Note: In case of multi-reg call node, the full set of registers must be
258 // determined by looking at types of individual return register types.
259 // In this case, the registers may include registers from different register
260 // sets and will not be limited to the actual ABI return registers.
261 regMaskTP LinearScan::allRegs(GenTree* tree)
263 regMaskTP resultMask;
265 // In case of multi-reg calls, allRegs is defined as
266 // Bitwise-Or(allRegs(GetReturnRegType(i)) for i=0..ReturnRegCount-1
267 if (tree->IsMultiRegCall())
269 resultMask = allMultiRegCallNodeRegs(tree->AsCall());
273 resultMask = allRegs(tree->TypeGet());
279 regMaskTP LinearScan::allSIMDRegs()
281 return availableFloatRegs;
284 //------------------------------------------------------------------------
285 // internalFloatRegCandidates: Return the set of registers that are appropriate
286 // for use as internal float registers.
289 // The set of registers (as a regMaskTP).
292 // compFloatingPointUsed is only required to be set if it is possible that we
293 // will use floating point callee-save registers.
294 // It is unlikely, if an internal register is the only use of floating point,
295 // that it will select a callee-save register. But to be safe, we restrict
296 // the set of candidates if compFloatingPointUsed is not already set.
298 regMaskTP LinearScan::internalFloatRegCandidates()
300 if (compiler->compFloatingPointUsed)
302 return allRegs(TYP_FLOAT);
306 return RBM_FLT_CALLEE_TRASH;
310 /*****************************************************************************
312 *****************************************************************************/
template <class T>
RegisterType regType(T type)
317 if (varTypeIsSIMD(type))
319 return FloatRegisterType;
321 #endif // FEATURE_SIMD
322 return varTypeIsFloating(TypeGet(type)) ? FloatRegisterType : IntRegisterType;
325 bool useFloatReg(var_types type)
327 return (regType(type) == FloatRegisterType);
330 bool registerTypesEquivalent(RegisterType a, RegisterType b)
332 return varTypeIsIntegralOrI(a) == varTypeIsIntegralOrI(b);
335 bool isSingleRegister(regMaskTP regMask)
337 return (regMask != RBM_NONE && genMaxOneBit(regMask));
340 /*****************************************************************************
341 * Inline functions for RegRecord
342 *****************************************************************************/
344 bool RegRecord::isFree()
346 return ((assignedInterval == nullptr || !assignedInterval->isActive) && !isBusyUntilNextKill);
349 /*****************************************************************************
350 * Inline functions for LinearScan
351 *****************************************************************************/
352 RegRecord* LinearScan::getRegisterRecord(regNumber regNum)
354 return &physRegs[regNum];
358 //------------------------------------------------------------------------
359 // stressLimitRegs: Given a set of registers, expressed as a register mask, reduce
360 // them based on the current stress options.
363 // mask - The current mask of register candidates for a node
366 // A possibly-modified mask, based on the value of COMPlus_JitStressRegs.
369 // This is the method used to implement the stress options that limit
370 // the set of registers considered for allocation.
372 regMaskTP LinearScan::stressLimitRegs(RefPosition* refPosition, regMaskTP mask)
374 if (getStressLimitRegs() != LSRA_LIMIT_NONE)
376 switch (getStressLimitRegs())
378 case LSRA_LIMIT_CALLEE:
379 if (!compiler->opts.compDbgEnC && (mask & RBM_CALLEE_SAVED) != RBM_NONE)
381 mask &= RBM_CALLEE_SAVED;
384 case LSRA_LIMIT_CALLER:
385 if ((mask & RBM_CALLEE_TRASH) != RBM_NONE)
387 regMaskTP newMask = mask & RBM_CALLEE_TRASH;
// On x86 we need to ensure that there are a minimum of
// 2 registers in the mask because we could have the
// following case:
393 // t0 = GT_SUB(v02, v01)
394 // v01 = GT_DIV(v02, t0)
396 // Say v02 was allocated edx and v01 was allocated ecx.
397 // Candidates of Def position of GT_SUB = { ecx, ebx, esi, edi }
398 // Candidates & RBM_CALLEE_TRASH = { ecx }
399 // But ecx cannot be allocated to Def position of GT_SUB
400 // since v01 is marked as delayRegFree. Because targetReg of
401 // non-commutative opers like GT_SUB cannot be the same as
402 // op2's reg on xarch.
404 // On x86 alone this needs to be ensured because GT_DIV
405 // kills two callee trash registers (eax and edx) and op2
406 // of GT_SUB could take ecx leaving no registers for
407 // allocation. On targets like amd64 this is not an issue
408 // because there are more callee trash registers leaving
409 // aside { eax, edx, ecx }
410 if (genCountBits(newMask) >= 2)
414 #else // !_TARGET_X86_
416 #endif // !_TARGET_X86_
419 case LSRA_LIMIT_SMALL_SET:
420 if ((mask & LsraLimitSmallIntSet) != RBM_NONE)
422 mask &= LsraLimitSmallIntSet;
424 else if ((mask & LsraLimitSmallFPSet) != RBM_NONE)
426 mask &= LsraLimitSmallFPSet;
432 if (refPosition != nullptr && refPosition->isFixedRegRef)
434 mask |= refPosition->registerAssignment;
441 // TODO-Cleanup: Consider adding an overload that takes a varDsc, and can appropriately
442 // set such fields as isStructField
444 Interval* LinearScan::newInterval(RegisterType theRegisterType)
446 intervals.emplace_back(theRegisterType, allRegs(theRegisterType));
447 Interval* newInt = &intervals.back();
450 newInt->intervalIndex = static_cast<unsigned>(intervals.size() - 1);
453 DBEXEC(VERBOSE, newInt->dump());
457 RefPosition* LinearScan::newRefPositionRaw(LsraLocation nodeLocation, GenTree* treeNode, RefType refType)
459 refPositions.emplace_back(curBBNum, nodeLocation, treeNode, refType);
460 RefPosition* newRP = &refPositions.back();
462 newRP->rpNum = static_cast<unsigned>(refPositions.size() - 1);
467 //------------------------------------------------------------------------
468 // resolveConflictingDefAndUse: Resolve the situation where we have conflicting def and use
469 // register requirements on a single-def, single-use interval.
472 // defRefPosition - The interval definition
473 // useRefPosition - The (sole) interval use
479 // The two RefPositions are for the same interval, which is a tree-temp.
482 // We require some special handling for the case where the use is a "delayRegFree" case of a fixedReg.
483 // In that case, if we change the registerAssignment on the useRefPosition, we will lose the fact that,
484 // even if we assign a different register (and rely on codegen to do the copy), that fixedReg also needs
// to remain busy until the Def register has been allocated. In that case, we don't allow Case 1 or Case 4
// below.
487 // Here are the cases we consider (in this order):
// 1. If the defRefPosition specifies a single register, and there are no conflicting
489 // FixedReg uses of it between the def and use, we use that register, and the code generator
490 // will insert the copy. Note that it cannot be in use because there is a FixedRegRef for the def.
491 // 2. If the useRefPosition specifies a single register, and it is not in use, and there are no
492 // conflicting FixedReg uses of it between the def and use, we use that register, and the code generator
493 // will insert the copy.
494 // 3. If the defRefPosition specifies a single register (but there are conflicts, as determined
495 // in 1.), and there are no conflicts with the useRefPosition register (if it's a single register),
// we set the register requirements on the defRefPosition to the use registers, and the
497 // code generator will insert a copy on the def. We can't rely on the code generator to put a copy
498 // on the use if it has multiple possible candidates, as it won't know which one has been allocated.
499 // 4. If the useRefPosition specifies a single register, and there are no conflicts with the register
500 // on the defRefPosition, we leave the register requirements on the defRefPosition as-is, and set
501 // the useRefPosition to the def registers, for similar reasons to case #3.
502 // 5. If both the defRefPosition and the useRefPosition specify single registers, but both have conflicts,
// we set the candidates on defRefPosition to be all regs of the appropriate type, and since they are
504 // single registers, codegen can insert the copy.
505 // 6. Finally, if the RefPositions specify disjoint subsets of the registers (or the use is fixed but
506 // has a conflict), we must insert a copy. The copy will be inserted before the use if the
507 // use is not fixed (in the fixed case, the code generator will insert the use).
509 // TODO-CQ: We get bad register allocation in case #3 in the situation where no register is
510 // available for the lifetime. We end up allocating a register that must be spilled, and it probably
511 // won't be the register that is actually defined by the target instruction. So, we have to copy it
512 // and THEN spill it. In this case, we should be using the def requirement. But we need to change
513 // the interface to this method a bit to make that work (e.g. returning a candidate set to use, but
514 // leaving the registerAssignment as-is on the def, so that if we find that we need to spill anyway
// we can use the fixed-reg on the def.)
518 void LinearScan::resolveConflictingDefAndUse(Interval* interval, RefPosition* defRefPosition)
520 assert(!interval->isLocalVar);
522 RefPosition* useRefPosition = defRefPosition->nextRefPosition;
523 regMaskTP defRegAssignment = defRefPosition->registerAssignment;
524 regMaskTP useRegAssignment = useRefPosition->registerAssignment;
525 RegRecord* defRegRecord = nullptr;
526 RegRecord* useRegRecord = nullptr;
527 regNumber defReg = REG_NA;
528 regNumber useReg = REG_NA;
529 bool defRegConflict = false;
530 bool useRegConflict = false;
532 // If the useRefPosition is a "delayRegFree", we can't change the registerAssignment
533 // on it, or we will fail to ensure that the fixedReg is busy at the time the target
534 // (of the node that uses this interval) is allocated.
535 bool canChangeUseAssignment = !useRefPosition->isFixedRegRef || !useRefPosition->delayRegFree;
537 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CONFLICT));
538 if (!canChangeUseAssignment)
540 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_FIXED_DELAY_USE));
542 if (defRefPosition->isFixedRegRef)
544 defReg = defRefPosition->assignedReg();
545 defRegRecord = getRegisterRecord(defReg);
546 if (canChangeUseAssignment)
548 RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
549 assert(currFixedRegRefPosition != nullptr &&
550 currFixedRegRefPosition->nodeLocation == defRefPosition->nodeLocation);
552 if (currFixedRegRefPosition->nextRefPosition == nullptr ||
553 currFixedRegRefPosition->nextRefPosition->nodeLocation > useRefPosition->getRefEndLocation())
555 // This is case #1. Use the defRegAssignment
556 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE1));
557 useRefPosition->registerAssignment = defRegAssignment;
562 defRegConflict = true;
566 if (useRefPosition->isFixedRegRef)
568 useReg = useRefPosition->assignedReg();
569 useRegRecord = getRegisterRecord(useReg);
570 RefPosition* currFixedRegRefPosition = useRegRecord->recentRefPosition;
572 // We know that useRefPosition is a fixed use, so the nextRefPosition must not be null.
573 RefPosition* nextFixedRegRefPosition = useRegRecord->getNextRefPosition();
574 assert(nextFixedRegRefPosition != nullptr &&
575 nextFixedRegRefPosition->nodeLocation <= useRefPosition->nodeLocation);
577 // First, check to see if there are any conflicting FixedReg references between the def and use.
578 if (nextFixedRegRefPosition->nodeLocation == useRefPosition->nodeLocation)
580 // OK, no conflicting FixedReg references.
581 // Now, check to see whether it is currently in use.
582 if (useRegRecord->assignedInterval != nullptr)
584 RefPosition* possiblyConflictingRef = useRegRecord->assignedInterval->recentRefPosition;
585 LsraLocation possiblyConflictingRefLocation = possiblyConflictingRef->getRefEndLocation();
586 if (possiblyConflictingRefLocation >= defRefPosition->nodeLocation)
588 useRegConflict = true;
593 // This is case #2. Use the useRegAssignment
594 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE2));
595 defRefPosition->registerAssignment = useRegAssignment;
601 useRegConflict = true;
604 if (defRegRecord != nullptr && !useRegConflict)
607 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE3));
608 defRefPosition->registerAssignment = useRegAssignment;
611 if (useRegRecord != nullptr && !defRegConflict && canChangeUseAssignment)
614 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE4));
615 useRefPosition->registerAssignment = defRegAssignment;
618 if (defRegRecord != nullptr && useRegRecord != nullptr)
621 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE5));
622 RegisterType regType = interval->registerType;
623 assert((getRegisterType(interval, defRefPosition) == regType) &&
624 (getRegisterType(interval, useRefPosition) == regType));
625 regMaskTP candidates = allRegs(regType);
626 defRefPosition->registerAssignment = candidates;
629 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE6));
633 //------------------------------------------------------------------------
634 // conflictingFixedRegReference: Determine whether the current RegRecord has a
635 // fixed register use that conflicts with 'refPosition'
638 // refPosition - The RefPosition of interest
// Returns true iff the given RefPosition is NOT a fixed use of this register, and either:
// - there is a RefPosition on this RegRecord at the nodeLocation of the given RefPosition, or
644 // - the given RefPosition has a delayRegFree, and there is a RefPosition on this RegRecord at
645 // the nodeLocation just past the given RefPosition.
// 'refPosition' is non-null.
650 bool RegRecord::conflictingFixedRegReference(RefPosition* refPosition)
652 // Is this a fixed reference of this register? If so, there is no conflict.
653 if (refPosition->isFixedRefOfRegMask(genRegMask(regNum)))
657 // Otherwise, check for conflicts.
658 // There is a conflict if:
659 // 1. There is a recent RefPosition on this RegRecord that is at this location,
660 // except in the case where it is a special "putarg" that is associated with this interval, OR
661 // 2. There is an upcoming RefPosition at this location, or at the next location
662 // if refPosition is a delayed use (i.e. must be kept live through the next/def location).
664 LsraLocation refLocation = refPosition->nodeLocation;
665 if (recentRefPosition != nullptr && recentRefPosition->refType != RefTypeKill &&
666 recentRefPosition->nodeLocation == refLocation &&
667 (!isBusyUntilNextKill || assignedInterval != refPosition->getInterval()))
671 LsraLocation nextPhysRefLocation = getNextRefLocation();
672 if (nextPhysRefLocation == refLocation || (refPosition->delayRegFree && nextPhysRefLocation == (refLocation + 1)))
679 void LinearScan::applyCalleeSaveHeuristics(RefPosition* rp)
681 #ifdef _TARGET_AMD64_
682 if (compiler->opts.compDbgEnC)
684 // We only use RSI and RDI for EnC code, so we don't want to favor callee-save regs.
687 #endif // _TARGET_AMD64_
689 Interval* theInterval = rp->getInterval();
691 regMaskTP calleeSaveMask = calleeSaveRegs(getRegisterType(theInterval, rp));
692 if (doReverseCallerCallee())
694 regMaskTP newAssignment = rp->registerAssignment;
695 newAssignment &= calleeSaveMask;
696 if (newAssignment != RBM_NONE)
698 rp->registerAssignment = newAssignment;
704 // Set preferences so that this register set will be preferred for earlier refs
705 theInterval->updateRegisterPreferences(rp->registerAssignment);
709 void LinearScan::associateRefPosWithInterval(RefPosition* rp)
711 Referenceable* theReferent = rp->referent;
713 if (theReferent != nullptr)
715 // All RefPositions except the dummy ones at the beginning of blocks
717 if (rp->isIntervalRef())
719 Interval* theInterval = rp->getInterval();
721 applyCalleeSaveHeuristics(rp);
723 // Ensure that we have consistent def/use on SDSU temps.
724 // However, in the case of a non-commutative rmw def, we must avoid over-constraining
725 // the def, so don't propagate a single-register restriction from the consumer to the producer
727 if (RefTypeIsUse(rp->refType) && !theInterval->isLocalVar)
729 RefPosition* prevRefPosition = theInterval->recentRefPosition;
730 assert(prevRefPosition != nullptr && theInterval->firstRefPosition == prevRefPosition);
731 regMaskTP prevAssignment = prevRefPosition->registerAssignment;
732 regMaskTP newAssignment = (prevAssignment & rp->registerAssignment);
733 if (newAssignment != RBM_NONE)
735 if (!theInterval->hasNonCommutativeRMWDef || !isSingleRegister(newAssignment))
737 prevRefPosition->registerAssignment = newAssignment;
742 theInterval->hasConflictingDefUse = true;
747 RefPosition* prevRP = theReferent->recentRefPosition;
748 if (prevRP != nullptr)
750 prevRP->nextRefPosition = rp;
754 theReferent->firstRefPosition = rp;
756 theReferent->recentRefPosition = rp;
757 theReferent->lastRefPosition = rp;
761 assert((rp->refType == RefTypeBB) || (rp->refType == RefTypeKillGCRefs));
765 //---------------------------------------------------------------------------
766 // newRefPosition: allocate and initialize a new RefPosition.
769 // reg - reg number that identifies RegRecord to be associated
770 // with this RefPosition
771 // theLocation - LSRA location of RefPosition
772 // theRefType - RefPosition type
773 // theTreeNode - GenTree node for which this RefPosition is created
774 // mask - Set of valid registers for this RefPosition
775 // multiRegIdx - register position if this RefPosition corresponds to a
776 // multi-reg call node.
781 RefPosition* LinearScan::newRefPosition(
782 regNumber reg, LsraLocation theLocation, RefType theRefType, GenTree* theTreeNode, regMaskTP mask)
784 RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);
786 newRP->setReg(getRegisterRecord(reg));
787 newRP->registerAssignment = mask;
789 newRP->setMultiRegIdx(0);
790 newRP->setAllocateIfProfitable(0);
792 associateRefPosWithInterval(newRP);
794 DBEXEC(VERBOSE, newRP->dump());
798 //---------------------------------------------------------------------------
799 // newRefPosition: allocate and initialize a new RefPosition.
802 // theInterval - interval to which RefPosition is associated with.
803 // theLocation - LSRA location of RefPosition
804 // theRefType - RefPosition type
805 // theTreeNode - GenTree node for which this RefPosition is created
806 // mask - Set of valid registers for this RefPosition
807 // multiRegIdx - register position if this RefPosition corresponds to a
808 // multi-reg call node.
813 RefPosition* LinearScan::newRefPosition(Interval* theInterval,
814 LsraLocation theLocation,
816 GenTree* theTreeNode,
818 unsigned multiRegIdx /* = 0 */)
821 if (theInterval != nullptr && regType(theInterval->registerType) == FloatRegisterType)
// If we are using floating point registers, we must make sure this flag was set
// previously in the compiler, since it determines whether LSRA will take FP reg
// killsets into consideration.
826 assert(compiler->compFloatingPointUsed || ((mask & RBM_FLT_CALLEE_SAVED) == 0));
830 // If this reference is constrained to a single register (and it's not a dummy
831 // or Kill reftype already), add a RefTypeFixedReg at this location so that its
832 // availability can be more accurately determined
834 bool isFixedRegister = isSingleRegister(mask);
835 bool insertFixedRef = false;
838 // Insert a RefTypeFixedReg for any normal def or use (not ParamDef or BB)
839 if (theRefType == RefTypeUse || theRefType == RefTypeDef)
841 insertFixedRef = true;
847 regNumber physicalReg = genRegNumFromMask(mask);
848 RefPosition* pos = newRefPosition(physicalReg, theLocation, RefTypeFixedReg, nullptr, mask);
849 assert(theInterval != nullptr);
850 assert((allRegs(theInterval->registerType) & mask) != 0);
853 RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);
855 newRP->setInterval(theInterval);
858 newRP->isFixedRegRef = isFixedRegister;
860 #ifndef _TARGET_AMD64_
// We don't need this for AMD64 because the PInvoke method epilog code is explicit
862 // at register allocation time.
863 if (theInterval != nullptr && theInterval->isLocalVar && compiler->info.compCallUnmanaged &&
864 theInterval->varNum == compiler->genReturnLocal)
866 mask &= ~(RBM_PINVOKE_TCB | RBM_PINVOKE_FRAME);
867 noway_assert(mask != RBM_NONE);
869 #endif // !_TARGET_AMD64_
870 newRP->registerAssignment = mask;
872 newRP->setMultiRegIdx(multiRegIdx);
873 newRP->setAllocateIfProfitable(0);
875 associateRefPosWithInterval(newRP);
877 DBEXEC(VERBOSE, newRP->dump());
881 /*****************************************************************************
882 * Inline functions for Interval
883 *****************************************************************************/
884 RefPosition* Referenceable::getNextRefPosition()
886 if (recentRefPosition == nullptr)
888 return firstRefPosition;
892 return recentRefPosition->nextRefPosition;
896 LsraLocation Referenceable::getNextRefLocation()
898 RefPosition* nextRefPosition = getNextRefPosition();
899 if (nextRefPosition == nullptr)
905 return nextRefPosition->nodeLocation;
909 // Iterate through all the registers of the given type
910 class RegisterIterator
912 friend class Registers;
915 RegisterIterator(RegisterType type) : regType(type)
917 if (useFloatReg(regType))
919 currentRegNum = REG_FP_FIRST;
923 currentRegNum = REG_INT_FIRST;
928 static RegisterIterator Begin(RegisterType regType)
930 return RegisterIterator(regType);
932 static RegisterIterator End(RegisterType regType)
934 RegisterIterator endIter = RegisterIterator(regType);
// This assumes only integer and floating point register types;
// if we target a processor with additional register types,
// this would have to change.
938 if (useFloatReg(regType))
940 // This just happens to work for both double & float
941 endIter.currentRegNum = REG_NEXT(REG_FP_LAST);
945 endIter.currentRegNum = REG_NEXT(REG_INT_LAST);
951 void operator++(int dummy) // int dummy is c++ for "this is postfix ++"
953 currentRegNum = REG_NEXT(currentRegNum);
955 if (regType == TYP_DOUBLE)
956 currentRegNum = REG_NEXT(currentRegNum);
959 void operator++() // prefix operator++
961 currentRegNum = REG_NEXT(currentRegNum);
963 if (regType == TYP_DOUBLE)
964 currentRegNum = REG_NEXT(currentRegNum);
967 regNumber operator*()
969 return currentRegNum;
971 bool operator!=(const RegisterIterator& other)
973 return other.currentRegNum != currentRegNum;
977 regNumber currentRegNum;
978 RegisterType regType;
984 friend class RegisterIterator;
986 Registers(RegisterType t)
990 RegisterIterator begin()
992 return RegisterIterator::Begin(type);
994 RegisterIterator end()
996 return RegisterIterator::End(type);
1001 void LinearScan::dumpVarToRegMap(VarToRegMap map)
1003 bool anyPrinted = false;
1004 for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
1006 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1007 if (map[varIndex] != REG_STK)
1009 printf("V%02u=%s ", varNum, getRegName(map[varIndex]));
1020 void LinearScan::dumpInVarToRegMap(BasicBlock* block)
1022 printf("Var=Reg beg of BB%02u: ", block->bbNum);
1023 VarToRegMap map = getInVarToRegMap(block->bbNum);
1024 dumpVarToRegMap(map);
1027 void LinearScan::dumpOutVarToRegMap(BasicBlock* block)
1029 printf("Var=Reg end of BB%02u: ", block->bbNum);
1030 VarToRegMap map = getOutVarToRegMap(block->bbNum);
1031 dumpVarToRegMap(map);
1036 LinearScanInterface* getLinearScanAllocator(Compiler* comp)
1038 return new (comp, CMK_LSRA) LinearScan(comp);
1041 //------------------------------------------------------------------------
1048 // The constructor takes care of initializing the data structures that are used
1049 // during Lowering, including (in DEBUG) getting the stress environment variables,
1050 // as they may affect the block ordering.
1052 LinearScan::LinearScan(Compiler* theCompiler)
1053 : compiler(theCompiler)
1054 #if MEASURE_MEM_ALLOC
1055 , lsraIAllocator(nullptr)
1056 #endif // MEASURE_MEM_ALLOC
1057 , intervals(LinearScanMemoryAllocatorInterval(theCompiler))
1058 , refPositions(LinearScanMemoryAllocatorRefPosition(theCompiler))
1061 maxNodeLocation = 0;
1062 activeRefPosition = nullptr;
1064 // Get the value of the environment variable that controls stress for register allocation
1065 lsraStressMask = JitConfig.JitStressRegs();
1068 if (lsraStressMask != 0)
1070 // The code in this #if can be used to debug JitStressRegs issues according to
1071 // method hash. To use, simply set environment variables JitStressRegsHashLo and JitStressRegsHashHi
1072 unsigned methHash = compiler->info.compMethodHash();
1073 char* lostr = getenv("JitStressRegsHashLo");
1074 unsigned methHashLo = 0;
1076 if (lostr != nullptr)
1078 sscanf_s(lostr, "%x", &methHashLo);
1081 char* histr = getenv("JitStressRegsHashHi");
1082 unsigned methHashHi = UINT32_MAX;
1083 if (histr != nullptr)
1085 sscanf_s(histr, "%x", &methHashHi);
1088 if (methHash < methHashLo || methHash > methHashHi)
1092 else if (dump == true)
1094 printf("JitStressRegs = %x for method %s, hash = 0x%x.\n",
1095 lsraStressMask, compiler->info.compFullName, compiler->info.compMethodHash());
1096 printf(""); // in our logic this causes a flush
1102 dumpTerse = (JitConfig.JitDumpTerseLsra() != 0);
1105 availableIntRegs = (RBM_ALLINT & ~compiler->codeGen->regSet.rsMaskResvd);
1107 availableIntRegs &= ~RBM_FPBASE;
1108 #endif // ETW_EBP_FRAMED
1109 availableFloatRegs = RBM_ALLFLOAT;
1110 availableDoubleRegs = RBM_ALLDOUBLE;
1112 #ifdef _TARGET_AMD64_
1113 if (compiler->opts.compDbgEnC)
1115 // On x64 when the EnC option is set, we always save exactly RBP, RSI and RDI.
1116 // RBP is not available to the register allocator, so RSI and RDI are the only
1117 // callee-save registers available.
1118 availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_RSI | RBM_RDI;
1119 availableFloatRegs &= ~RBM_CALLEE_SAVED;
1120 availableDoubleRegs &= ~RBM_CALLEE_SAVED;
1122 #endif // _TARGET_AMD64_
1123 compiler->rpFrameType = FT_NOT_SET;
1124 compiler->rpMustCreateEBPCalled = false;
1126 compiler->codeGen->intRegState.rsIsFloat = false;
1127 compiler->codeGen->floatRegState.rsIsFloat = true;
1129 // Block sequencing (the order in which we schedule).
1130 // Note that we don't initialize the bbVisitedSet until we do the first traversal
1131 // (currently during Lowering's second phase, where it sets the TreeNodeInfo).
1132 // This is so that any blocks that are added during the first phase of Lowering
1133 // are accounted for (and we don't have BasicBlockEpoch issues).
1134 blockSequencingDone = false;
1135 blockSequence = nullptr;
1136 blockSequenceWorkList = nullptr;
1140 // Information about each block, including predecessor blocks used for variable locations at block entry.
1141 blockInfo = nullptr;
1143 // Populate the register mask table.
1144 // The first two masks in the table are allint/allfloat
1145 // The next N are the masks for each single register.
1146 // After that are the dynamically added ones.
1147 regMaskTable = new (compiler, CMK_LSRA) regMaskTP[numMasks];
1148 regMaskTable[ALLINT_IDX] = allRegs(TYP_INT);
1149 regMaskTable[ALLFLOAT_IDX] = allRegs(TYP_DOUBLE);
1152 for (reg = REG_FIRST; reg < REG_COUNT; reg = REG_NEXT(reg))
1154 regMaskTable[FIRST_SINGLE_REG_IDX + reg - REG_FIRST] = (reg == REG_STK) ? RBM_NONE : genRegMask(reg);
1156 nextFreeMask = FIRST_SINGLE_REG_IDX + REG_COUNT;
1157 noway_assert(nextFreeMask <= numMasks);
1160 // Return the reg mask corresponding to the given index.
1161 regMaskTP LinearScan::GetRegMaskForIndex(RegMaskIndex index)
1163 assert(index < numMasks);
1164 assert(index < nextFreeMask);
1165 return regMaskTable[index];
1168 // Given a reg mask, return the index it corresponds to. If it is not a 'well known' reg mask,
// add it at the end. This method has linear behavior in the worst case, but that is fairly rare.
1170 // Most methods never use any but the well-known masks, and when they do use more
1171 // it is only one or two more.
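//
// For example (a hypothetical illustration): a single-register mask such as genRegMask(REG_ECX) maps
// directly to genRegNumFromMask(mask) + FIRST_SINGLE_REG_IDX, while a compound mask such as
// (RBM_ECX | RBM_EDX) is searched for among the dynamically added entries and, if not already present,
// is appended at nextFreeMask.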
1172 LinearScan::RegMaskIndex LinearScan::GetIndexForRegMask(regMaskTP mask)
1174 RegMaskIndex result;
1175 if (isSingleRegister(mask))
1177 result = genRegNumFromMask(mask) + FIRST_SINGLE_REG_IDX;
1179 else if (mask == allRegs(TYP_INT))
1181 result = ALLINT_IDX;
1183 else if (mask == allRegs(TYP_DOUBLE))
1185 result = ALLFLOAT_IDX;
1189 for (int i = FIRST_SINGLE_REG_IDX + REG_COUNT; i < nextFreeMask; i++)
1191 if (regMaskTable[i] == mask)
1197 // We only allocate a fixed number of masks. Since we don't reallocate, we will throw a
1198 // noway_assert if we exceed this limit.
1199 noway_assert(nextFreeMask < numMasks);
1201 regMaskTable[nextFreeMask] = mask;
1202 result = nextFreeMask;
1205 assert(mask == regMaskTable[result]);
1209 // We've decided that we can't use a register during register allocation (probably FPBASE),
1210 // but we've already added it to the register masks. Go through the masks and remove it.
1211 void LinearScan::RemoveRegisterFromMasks(regNumber reg)
1213 JITDUMP("Removing register %s from LSRA register masks\n", getRegName(reg));
1215 regMaskTP mask = ~genRegMask(reg);
1216 for (int i = 0; i < nextFreeMask; i++)
1218 regMaskTable[i] &= mask;
1221 JITDUMP("After removing register:\n");
1222 DBEXEC(VERBOSE, dspRegisterMaskTable());
1226 void LinearScan::dspRegisterMaskTable()
1228 printf("LSRA register masks. Total allocated: %d, total used: %d\n", numMasks, nextFreeMask);
1229 for (int i = 0; i < nextFreeMask; i++)
1232 dspRegMask(regMaskTable[i]);
1238 //------------------------------------------------------------------------
1239 // getNextCandidateFromWorkList: Get the next candidate for block sequencing
1245 // The next block to be placed in the sequence.
1248 // This method currently always returns the next block in the list, and relies on having
1249 // blocks added to the list only when they are "ready", and on the
1250 // addToBlockSequenceWorkList() method to insert them in the proper order.
1251 // However, a block may be in the list and already selected, if it was subsequently
// encountered as both a flow and layout successor of the most recently selected
// block.
1255 BasicBlock* LinearScan::getNextCandidateFromWorkList()
1257 BasicBlockList* nextWorkList = nullptr;
1258 for (BasicBlockList* workList = blockSequenceWorkList; workList != nullptr; workList = nextWorkList)
1260 nextWorkList = workList->next;
1261 BasicBlock* candBlock = workList->block;
1262 removeFromBlockSequenceWorkList(workList, nullptr);
1263 if (!isBlockVisited(candBlock))
1271 //------------------------------------------------------------------------
// setBlockSequence: Determine the block order for register allocation.
1281 // On return, the blockSequence array contains the blocks, in the order in which they
1282 // will be allocated.
// This method clears the bbVisitedSet on LinearScan, and when it returns the set
// contains all the bbNums for the blocks.
// This requires a traversal of the BasicBlocks, and could potentially be
// combined with the first traversal (currently the one in Lowering that sets the
// TreeNodeInfo).
1289 void LinearScan::setBlockSequence()
1291 // Reset the "visited" flag on each block.
1292 compiler->EnsureBasicBlockEpoch();
1293 bbVisitedSet = BlockSetOps::MakeEmpty(compiler);
1294 BlockSet BLOCKSET_INIT_NOCOPY(readySet, BlockSetOps::MakeEmpty(compiler));
1295 assert(blockSequence == nullptr && bbSeqCount == 0);
1296 blockSequence = new (compiler, CMK_LSRA) BasicBlock*[compiler->fgBBcount];
1297 bbNumMaxBeforeResolution = compiler->fgBBNumMax;
1298 blockInfo = new (compiler, CMK_LSRA) LsraBlockInfo[bbNumMaxBeforeResolution + 1];
1300 assert(blockSequenceWorkList == nullptr);
1302 bool addedInternalBlocks = false;
1303 verifiedAllBBs = false;
1304 BasicBlock* nextBlock;
1305 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = nextBlock)
1307 blockSequence[bbSeqCount] = block;
1308 markBlockVisited(block);
1310 nextBlock = nullptr;
1312 // Initialize the blockInfo.
1313 // predBBNum will be set later. 0 is never used as a bbNum.
1314 blockInfo[block->bbNum].predBBNum = 0;
1315 // We check for critical edges below, but initialize to false.
1316 blockInfo[block->bbNum].hasCriticalInEdge = false;
1317 blockInfo[block->bbNum].hasCriticalOutEdge = false;
1318 blockInfo[block->bbNum].weight = block->bbWeight;
1320 if (block->GetUniquePred(compiler) == nullptr)
1322 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
1324 BasicBlock* predBlock = pred->flBlock;
1325 if (predBlock->NumSucc(compiler) > 1)
1327 blockInfo[block->bbNum].hasCriticalInEdge = true;
1330 else if (predBlock->bbJumpKind == BBJ_SWITCH)
1332 assert(!"Switch with single successor");
1337 // Determine which block to schedule next.
1339 // First, update the NORMAL successors of the current block, adding them to the worklist
1340 // according to the desired order. We will handle the EH successors below.
1341 bool checkForCriticalOutEdge = (block->NumSucc(compiler) > 1);
1342 if (!checkForCriticalOutEdge && block->bbJumpKind == BBJ_SWITCH)
1344 assert(!"Switch with single successor");
1347 for (unsigned succIndex = 0; succIndex < block->NumSucc(compiler); succIndex++)
1349 BasicBlock* succ = block->GetSucc(succIndex, compiler);
1350 if (checkForCriticalOutEdge && succ->GetUniquePred(compiler) == nullptr)
1352 blockInfo[block->bbNum].hasCriticalOutEdge = true;
1353 // We can stop checking now.
1354 checkForCriticalOutEdge = false;
1357 if (isTraversalLayoutOrder() || isBlockVisited(succ))
1362 // We've now seen a predecessor, so add it to the work list and the "readySet".
1363 // It will be inserted in the worklist according to the specified traversal order
1364 // (i.e. pred-first or random, since layout order is handled above).
1365 if (!BlockSetOps::IsMember(compiler, readySet, succ->bbNum))
1367 addToBlockSequenceWorkList(readySet, succ);
1368 BlockSetOps::AddElemD(compiler, readySet, succ->bbNum);
1372 // For layout order, simply use bbNext
1373 if (isTraversalLayoutOrder())
1375 nextBlock = block->bbNext;
1379 while (nextBlock == nullptr)
1381 nextBlock = getNextCandidateFromWorkList();
1383 // TODO-Throughput: We would like to bypass this traversal if we know we've handled all
1384 // the blocks - but fgBBcount does not appear to be updated when blocks are removed.
1385 if (nextBlock == nullptr /* && bbSeqCount != compiler->fgBBcount*/ && !verifiedAllBBs)
// If we don't encounter all blocks by traversing the regular successor links, do a full
1388 // traversal of all the blocks, and add them in layout order.
1389 // This may include:
1390 // - internal-only blocks (in the fgAddCodeList) which may not be in the flow graph
1391 // (these are not even in the bbNext links).
1392 // - blocks that have become unreachable due to optimizations, but that are strongly
1393 // connected (these are not removed)
1396 for (Compiler::AddCodeDsc* desc = compiler->fgAddCodeList; desc != nullptr; desc = desc->acdNext)
1398 if (!isBlockVisited(block))
1400 addToBlockSequenceWorkList(readySet, block);
1401 BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
1405 for (BasicBlock* block = compiler->fgFirstBB; block; block = block->bbNext)
1407 if (!isBlockVisited(block))
1409 addToBlockSequenceWorkList(readySet, block);
1410 BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
1413 verifiedAllBBs = true;
1421 blockSequencingDone = true;
1424 // Make sure that we've visited all the blocks.
1425 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
1427 assert(isBlockVisited(block));
1430 JITDUMP("LSRA Block Sequence: ");
1432 for (BasicBlock *block = startBlockSequence(); block != nullptr; ++i, block = moveToNextBlock())
1434 JITDUMP("BB%02u", block->bbNum);
1436 if (block->isMaxBBWeight())
1442 JITDUMP("(%6s) ", refCntWtd2str(block->getBBWeight(compiler)));
1454 //------------------------------------------------------------------------
1455 // compareBlocksForSequencing: Compare two basic blocks for sequencing order.
1458 // block1 - the first block for comparison
1459 // block2 - the second block for comparison
1460 // useBlockWeights - whether to use block weights for comparison
1463 // -1 if block1 is preferred.
1464 // 0 if the blocks are equivalent.
1465 // 1 if block2 is preferred.
1468 // See addToBlockSequenceWorkList.
1469 int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights)
1471 if (useBlockWeights)
1473 unsigned weight1 = block1->getBBWeight(compiler);
1474 unsigned weight2 = block2->getBBWeight(compiler);
1476 if (weight1 > weight2)
1480 else if (weight1 < weight2)
1486 // If weights are the same prefer LOWER bbnum
1487 if (block1->bbNum < block2->bbNum)
1491 else if (block1->bbNum == block2->bbNum)
1501 //------------------------------------------------------------------------
1502 // addToBlockSequenceWorkList: Add a BasicBlock to the work list for sequencing.
1505 // sequencedBlockSet - the set of blocks that are already sequenced
1506 // block - the new block to be added
1512 // The first block in the list will be the next one to be sequenced, as soon
1513 // as we encounter a block whose successors have all been sequenced, in pred-first
1514 // order, or the very next block if we are traversing in random order (once implemented).
1515 // This method uses a comparison method to determine the order in which to place
1516 // the blocks in the list. This method queries whether all predecessors of the
1517 // block are sequenced at the time it is added to the list and if so uses block weights
1518 // for inserting the block. A block is never inserted ahead of its predecessors.
1519 // A block at the time of insertion may not have all its predecessors sequenced, in
1520 // which case it will be sequenced based on its block number. Once a block is inserted,
// its priority/order will not be changed later, even once its remaining predecessors are
// sequenced. This means that the work list may not be sorted entirely based on
1523 // block weights alone.
1525 // Note also that, when random traversal order is implemented, this method
1526 // should insert the blocks into the list in random order, so that we can always
1527 // simply select the first block in the list.
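//
// For example (a hypothetical illustration): if BB02 (weight 4) and BB03 (weight 1) both become ready
// and all of their predecessors are already sequenced, BB02 is inserted ahead of BB03 because it has the
// greater weight. If instead BB03 still had an unsequenced predecessor when it was added, it would be
// ordered by bbNum, and its position would not be revisited when that predecessor was later sequenced.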
1528 void LinearScan::addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block)
1530 // The block that is being added is not already sequenced
1531 assert(!BlockSetOps::IsMember(compiler, sequencedBlockSet, block->bbNum));
1533 // Get predSet of block
1534 BlockSet BLOCKSET_INIT_NOCOPY(predSet, BlockSetOps::MakeEmpty(compiler));
1536 for (pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
1538 BlockSetOps::AddElemD(compiler, predSet, pred->flBlock->bbNum);
1541 // If either a rarely run block or all its preds are already sequenced, use block's weight to sequence
1542 bool useBlockWeight = block->isRunRarely() || BlockSetOps::IsSubset(compiler, sequencedBlockSet, predSet);
1544 BasicBlockList* prevNode = nullptr;
1545 BasicBlockList* nextNode = blockSequenceWorkList;
1547 while (nextNode != nullptr)
1551 if (nextNode->block->isRunRarely())
1553 // If the block that is yet to be sequenced is a rarely run block, always use block weights for sequencing
1554 seqResult = compareBlocksForSequencing(nextNode->block, block, true);
1556 else if (BlockSetOps::IsMember(compiler, predSet, nextNode->block->bbNum))
1558 // always prefer unsequenced pred blocks
1563 seqResult = compareBlocksForSequencing(nextNode->block, block, useBlockWeight);
1571 prevNode = nextNode;
1572 nextNode = nextNode->next;
1575 BasicBlockList* newListNode = new (compiler, CMK_LSRA) BasicBlockList(block, nextNode);
1576 if (prevNode == nullptr)
1578 blockSequenceWorkList = newListNode;
1582 prevNode->next = newListNode;
1586 void LinearScan::removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode)
1588 if (listNode == blockSequenceWorkList)
1590 assert(prevNode == nullptr);
1591 blockSequenceWorkList = listNode->next;
1595 assert(prevNode != nullptr && prevNode->next == listNode);
1596 prevNode->next = listNode->next;
1598 // TODO-Cleanup: consider merging Compiler::BlockListNode and BasicBlockList
1599 // compiler->FreeBlockListNode(listNode);
1602 // Initialize the block order for allocation (called each time a new traversal begins).
1603 BasicBlock* LinearScan::startBlockSequence()
1605 if (!blockSequencingDone)
1609 BasicBlock* curBB = compiler->fgFirstBB;
1611 curBBNum = curBB->bbNum;
1612 clearVisitedBlocks();
1613 assert(blockSequence[0] == compiler->fgFirstBB);
1614 markBlockVisited(curBB);
1618 //------------------------------------------------------------------------
1619 // moveToNextBlock: Move to the next block in order for allocation or resolution.
1628 // This method is used when the next block is actually going to be handled.
1629 // It changes curBBNum.
1631 BasicBlock* LinearScan::moveToNextBlock()
1633 BasicBlock* nextBlock = getNextBlock();
1635 if (nextBlock != nullptr)
1637 curBBNum = nextBlock->bbNum;
1642 //------------------------------------------------------------------------
1643 // getNextBlock: Get the next block in order for allocation or resolution.
1652 // This method does not actually change the current block - it is used simply
1653 // to determine which block will be next.
1655 BasicBlock* LinearScan::getNextBlock()
1657 assert(blockSequencingDone);
1658 unsigned int nextBBSeqNum = curBBSeqNum + 1;
1659 if (nextBBSeqNum < bbSeqCount)
1661 return blockSequence[nextBBSeqNum];
1666 //------------------------------------------------------------------------
1667 // doLinearScan: The main method for register allocation.
1676 // Lowering must have set the NodeInfo (gtLsraInfo) on each node to communicate
1677 // the register requirements.
1679 void LinearScan::doLinearScan()
1684 printf("*************** In doLinearScan\n");
1685 printf("Trees before linear scan register allocator (LSRA)\n");
1686 compiler->fgDispBasicBlocks(true);
1690 splitBBNumToTargetBBNumMap = nullptr;
1692 // This is complicated by the fact that physical registers have refs associated
1693 // with locations where they are killed (e.g. calls), but we don't want to
1694 // count these as being touched.
1696 compiler->codeGen->regSet.rsClearRegsModified();
1698 // Figure out if we're going to use an RSP frame or an RBP frame. We need to do this
1699 // before building the intervals and ref positions, because those objects will embed
1700 // RBP in various register masks (like preferences) if RBP is allowed to be allocated.
1705 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_REFPOS));
1706 compiler->EndPhase(PHASE_LINEAR_SCAN_BUILD);
1708 DBEXEC(VERBOSE, lsraDumpIntervals("after buildIntervals"));
1710 BlockSetOps::ClearD(compiler, bbVisitedSet);
1712 allocateRegisters();
1713 compiler->EndPhase(PHASE_LINEAR_SCAN_ALLOC);
1715 compiler->EndPhase(PHASE_LINEAR_SCAN_RESOLVE);
1717 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_POST));
1719 compiler->compLSRADone = true;
1722 //------------------------------------------------------------------------
1723 // recordVarLocationsAtStartOfBB: Update live-in LclVarDscs with the appropriate
1724 // register location at the start of a block, during codegen.
1727 // bb - the block for which code is about to be generated.
1733 // CodeGen will take care of updating the reg masks and the current var liveness,
1734 // after calling this method.
1735 // This is because we need to kill off the dead registers before setting the newly live ones.
1737 void LinearScan::recordVarLocationsAtStartOfBB(BasicBlock* bb)
1739 JITDUMP("Recording Var Locations at start of BB%02u\n", bb->bbNum);
1740 VarToRegMap map = getInVarToRegMap(bb->bbNum);
1743 VARSET_ITER_INIT(compiler, iter, bb->bbLiveIn, varIndex);
1744 while (iter.NextElem(compiler, &varIndex))
1746 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1747 LclVarDsc* varDsc = &(compiler->lvaTable[varNum]);
1748 regNumber regNum = getVarReg(map, varNum);
1750 regNumber oldRegNum = varDsc->lvRegNum;
1751 regNumber newRegNum = regNum;
1753 if (oldRegNum != newRegNum)
1755 JITDUMP(" V%02u(%s->%s)", varNum, compiler->compRegVarName(oldRegNum),
1756 compiler->compRegVarName(newRegNum));
1757 varDsc->lvRegNum = newRegNum;
1760 else if (newRegNum != REG_STK)
1762 JITDUMP(" V%02u(%s)", varNum, compiler->compRegVarName(newRegNum));
1769 JITDUMP(" <none>\n");
1775 void Interval::setLocalNumber(unsigned lclNum, LinearScan* linScan)
1777 linScan->localVarIntervals[lclNum] = this;
1779 assert(linScan->getIntervalForLocalVar(lclNum) == this);
1780 this->isLocalVar = true;
1781 this->varNum = lclNum;
// Identify the candidates which we are not going to enregister, due to
// being used in EH in a way we don't want to deal with.
// This logic is cloned from fgInterBlockLocalVarLiveness.
1787 void LinearScan::identifyCandidatesExceptionDataflow()
1789 VARSET_TP VARSET_INIT_NOCOPY(exceptVars, VarSetOps::MakeEmpty(compiler));
1790 VARSET_TP VARSET_INIT_NOCOPY(filterVars, VarSetOps::MakeEmpty(compiler));
1791 VARSET_TP VARSET_INIT_NOCOPY(finallyVars, VarSetOps::MakeEmpty(compiler));
1794 foreach_block(compiler, block)
1796 if (block->bbCatchTyp != BBCT_NONE)
1798 // live on entry to handler
1799 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1802 if (block->bbJumpKind == BBJ_EHFILTERRET)
1804 // live on exit from filter
1805 VarSetOps::UnionD(compiler, filterVars, block->bbLiveOut);
1807 else if (block->bbJumpKind == BBJ_EHFINALLYRET)
1809 // live on exit from finally
1810 VarSetOps::UnionD(compiler, finallyVars, block->bbLiveOut);
1812 #if FEATURE_EH_FUNCLETS
1813 // Funclets are called and returned from, as such we can only count on the frame
// pointer being restored, and thus everything live in or live out must be on the
// frame.
1816 if (block->bbFlags & BBF_FUNCLET_BEG)
1818 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1820 if ((block->bbJumpKind == BBJ_EHFINALLYRET) || (block->bbJumpKind == BBJ_EHFILTERRET) ||
1821 (block->bbJumpKind == BBJ_EHCATCHRET))
1823 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveOut);
1825 #endif // FEATURE_EH_FUNCLETS
1828 // slam them all together (there was really no need to use more than 2 bitvectors here)
1829 VarSetOps::UnionD(compiler, exceptVars, filterVars);
1830 VarSetOps::UnionD(compiler, exceptVars, finallyVars);
1832 /* Mark all pointer variables live on exit from a 'finally'
1833 block as either volatile for non-GC ref types or as
1834 'explicitly initialized' (volatile and must-init) for GC-ref types */
1836 VARSET_ITER_INIT(compiler, iter, exceptVars, varIndex);
1837 while (iter.NextElem(compiler, &varIndex))
1839 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1840 LclVarDsc* varDsc = compiler->lvaTable + varNum;
1842 compiler->lvaSetVarDoNotEnregister(varNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
1844 if (varTypeIsGC(varDsc))
1846 if (VarSetOps::IsMember(compiler, finallyVars, varIndex) && !varDsc->lvIsParam)
1848 varDsc->lvMustInit = true;
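// Illustrative sketch (not part of the JIT): the EH dataflow above reduces to a few bitset
// unions. A 64-bit mask stands in for VARSET_TP, and the block fields are trimmed to the ones
// consulted above; all names in this sketch are hypothetical.

#include <cstdint>

struct EHBlockModel
{
    bool     isHandlerEntry; // bbCatchTyp != BBCT_NONE
    bool     isFilterRet;    // bbJumpKind == BBJ_EHFILTERRET
    bool     isFinallyRet;   // bbJumpKind == BBJ_EHFINALLYRET
    uint64_t liveIn;
    uint64_t liveOut;
};

// Returns the set of tracked variables that must not be enregistered because they are
// live into a handler or live out of a filter/finally.
uint64_t computeEHDoNotEnregisterModel(const EHBlockModel* blocks, int count)
{
    uint64_t exceptVars = 0, filterVars = 0, finallyVars = 0;
    for (int i = 0; i < count; i++)
    {
        if (blocks[i].isHandlerEntry) exceptVars  |= blocks[i].liveIn;
        if (blocks[i].isFilterRet)    filterVars  |= blocks[i].liveOut;
        if (blocks[i].isFinallyRet)   finallyVars |= blocks[i].liveOut;
    }
    // "slam them all together", as above
    return exceptVars | filterVars | finallyVars;
}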
1854 bool LinearScan::isRegCandidate(LclVarDsc* varDsc)
1856 // Check to see if opt settings permit register variables
1857 if ((compiler->opts.compFlags & CLFLG_REGVAR) == 0)
1862 // If we have JMP, reg args must be put on the stack
1864 if (compiler->compJmpOpUsed && varDsc->lvIsRegArg)
1869 if (!varDsc->lvTracked)
1874 // Don't allocate registers for dependently promoted struct fields
1875 if (compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc))
1882 // Identify locals & compiler temps that are register candidates
1883 // TODO-Cleanup: This was cloned from Compiler::lvaSortByRefCount() in lclvars.cpp in order
1884 // to avoid perturbation, but should be merged.
1886 void LinearScan::identifyCandidates()
1888 if (compiler->lvaCount == 0)
1893 if (compiler->compHndBBtabCount > 0)
1895 identifyCandidatesExceptionDataflow();
1898 // initialize mapping from local to interval
1899 localVarIntervals = new (compiler, CMK_LSRA) Interval*[compiler->lvaCount];
1904 // While we build intervals for the candidate lclVars, we will determine the floating point
1905 // lclVars, if any, to consider for callee-save register preferencing.
1906 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
1907 // and those that meet the second.
1908 // The first threshold is used for methods that are heuristically deemed either to have light
1909 // fp usage, or other factors that encourage conservative use of callee-save registers, such
1910 // as multiple exits (where there might be an early exit that would be excessively penalized by
1911 // lots of prolog/epilog saves & restores).
1912 // The second threshold is used where there are factors deemed to make it more likely that
1913 // fp callee-save registers will be needed, such as loops or many fp vars.
1914 // We keep two sets of vars, since we collect some of the information to determine which set to
1915 // use as we iterate over the vars.
1916 // When we are generating AVX code on non-Unix (FEATURE_PARTIAL_SIMD_CALLEE_SAVE), we maintain an
1917 // additional set of LargeVectorType vars, and there is a separate threshold defined for those.
1918 // It is assumed that if we encounter these, we should consider this a "high use" scenario,
1919 // so we don't maintain two sets of these vars.
1920 // This is defined as thresholdLargeVectorRefCntWtd, as we are likely to use the same mechanism
1921 // for vectors on Arm64, though the actual value may differ.
1923 VarSetOps::AssignNoCopy(compiler, fpCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
1924 VARSET_TP VARSET_INIT_NOCOPY(fpMaybeCandidateVars, VarSetOps::MakeEmpty(compiler));
1925 unsigned int floatVarCount = 0;
1926 unsigned int thresholdFPRefCntWtd = 4 * BB_UNITY_WEIGHT;
1927 unsigned int maybeFPRefCntWtd = 2 * BB_UNITY_WEIGHT;
1928 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
1929 VarSetOps::AssignNoCopy(compiler, largeVectorVars, VarSetOps::MakeEmpty(compiler));
1930 VarSetOps::AssignNoCopy(compiler, largeVectorCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
1931 unsigned int largeVectorVarCount = 0;
1932 unsigned int thresholdLargeVectorRefCntWtd = 4 * BB_UNITY_WEIGHT;
1933 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
1935 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
1937 // Assign intervals to all the variables - this makes it easier to map
1939 var_types intervalType = (var_types)varDsc->lvType;
1940 Interval* newInt = newInterval(intervalType);
1942 newInt->setLocalNumber(lclNum, this);
1943 if (varDsc->lvIsStructField)
1945 newInt->isStructField = true;
1948 // Initialize all variables to REG_STK
1949 varDsc->lvRegNum = REG_STK;
1950 #ifndef _TARGET_64BIT_
1951 varDsc->lvOtherReg = REG_STK;
1952 #endif // !_TARGET_64BIT_
1954 #if !defined(_TARGET_64BIT_)
1955 if (intervalType == TYP_LONG)
1957 // Long variables should not be register candidates.
1958 // Lowering will have split any candidate lclVars into lo/hi vars.
1959 varDsc->lvLRACandidate = 0;
1962 #endif // !defined(_TARGET_64BIT_)
1964 /* Track all locals that can be enregistered */
1966 varDsc->lvLRACandidate = 1;
1968 if (!isRegCandidate(varDsc))
1970 varDsc->lvLRACandidate = 0;
1974 // Start with lvRegister as false - set it true only if the variable gets
1975 // the same register assignment throughout
1976 varDsc->lvRegister = false;
1978 /* If the ref count is zero */
1979 if (varDsc->lvRefCnt == 0)
1981 /* Zero ref count, make this untracked */
1982 varDsc->lvRefCntWtd = 0;
1983 varDsc->lvLRACandidate = 0;
1986 // Variables that are address-exposed are never enregistered, or tracked.
1987 // A struct may be promoted, and a struct that fits in a register may be fully enregistered.
1988 // Pinned variables may not be tracked (a condition of the GCInfo representation)
1989 // or enregistered, on x86 -- it is believed that we can enregister pinned (more properly, "pinning")
1990 // references when using the general GC encoding.
1992 if (varDsc->lvAddrExposed || !varTypeIsEnregisterableStruct(varDsc))
1994 varDsc->lvLRACandidate = 0;
1996 Compiler::DoNotEnregisterReason dner = Compiler::DNER_AddrExposed;
1997 if (!varDsc->lvAddrExposed)
1999 dner = Compiler::DNER_IsStruct;
2002 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(dner));
2004 else if (varDsc->lvPinned)
2006 varDsc->lvTracked = 0;
2007 #ifdef JIT32_GCENCODER
2008 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_PinningRef));
2009 #endif // JIT32_GCENCODER
2012 // If we are not optimizing and we have exception handlers,
2013 // mark all args and locals as volatile, so that they
2014 // won't ever get enregistered.
2016 if (compiler->opts.MinOpts() && compiler->compHndBBtabCount > 0)
2018 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
2019 varDsc->lvLRACandidate = 0;
2023 if (varDsc->lvDoNotEnregister)
2025 varDsc->lvLRACandidate = 0;
2029 var_types type = genActualType(varDsc->TypeGet());
2033 #if CPU_HAS_FP_SUPPORT
2036 if (compiler->opts.compDbgCode)
2038 varDsc->lvLRACandidate = 0;
2041 #endif // CPU_HAS_FP_SUPPORT
2053 if (varDsc->lvPromoted)
2055 varDsc->lvLRACandidate = 0;
2058 // TODO-1stClassStructs: Move TYP_SIMD8 up with the other SIMD types, after handling the param issue
2059 // (passing & returning as TYP_LONG).
2061 #endif // FEATURE_SIMD
2065 varDsc->lvLRACandidate = 0;
2071 noway_assert(!"lvType not set correctly");
2072 varDsc->lvType = TYP_INT;
2077 varDsc->lvLRACandidate = 0;
2080 // we will set this later when we have determined liveness
2081 if (varDsc->lvLRACandidate)
2083 varDsc->lvMustInit = false;
2086 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref count,
2087 // and those that meet the second (see the definitions of thresholdFPRefCntWtd and maybeFPRefCntWtd above).
2089 CLANG_FORMAT_COMMENT_ANCHOR;
2091 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2092 // Additionally, when we are generating AVX on non-UNIX amd64, we keep a separate set of the LargeVectorType vars.
2094 if (varDsc->lvType == LargeVectorType)
2096 largeVectorVarCount++;
2097 VarSetOps::AddElemD(compiler, largeVectorVars, varDsc->lvVarIndex);
2098 unsigned refCntWtd = varDsc->lvRefCntWtd;
2099 if (refCntWtd >= thresholdLargeVectorRefCntWtd)
2101 VarSetOps::AddElemD(compiler, largeVectorCalleeSaveCandidateVars, varDsc->lvVarIndex);
2105 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2106 if (regType(newInt->registerType) == FloatRegisterType)
2109 unsigned refCntWtd = varDsc->lvRefCntWtd;
2110 if (varDsc->lvIsRegArg)
2112 // Don't count the initial reference for register params. In those cases,
2113 // using a callee-save causes an extra copy.
2114 refCntWtd -= BB_UNITY_WEIGHT;
2116 if (refCntWtd >= thresholdFPRefCntWtd)
2118 VarSetOps::AddElemD(compiler, fpCalleeSaveCandidateVars, varDsc->lvVarIndex);
2120 else if (refCntWtd >= maybeFPRefCntWtd)
2122 VarSetOps::AddElemD(compiler, fpMaybeCandidateVars, varDsc->lvVarIndex);
2127 // The factors we consider to determine which set of fp vars to use as candidates for callee-save
2128 // registers currently include the number of fp vars, whether there are loops, and whether there are
2129 // multiple exits. These have been selected somewhat empirically, but there is probably room for improvement.
2131 CLANG_FORMAT_COMMENT_ANCHOR;
2136 printf("\nFP callee save candidate vars: ");
2137 if (!VarSetOps::IsEmpty(compiler, fpCalleeSaveCandidateVars))
2139 dumpConvertedVarSet(compiler, fpCalleeSaveCandidateVars);
2149 JITDUMP("floatVarCount = %d; hasLoops = %d, singleExit = %d\n", floatVarCount, compiler->fgHasLoops,
2150 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr));
2152 // Determine whether to use the 2nd, more aggressive, threshold for fp callee saves.
2153 if (floatVarCount > 6 && compiler->fgHasLoops &&
2154 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr))
2159 printf("Adding additional fp callee save candidates: \n");
2160 if (!VarSetOps::IsEmpty(compiler, fpMaybeCandidateVars))
2162 dumpConvertedVarSet(compiler, fpMaybeCandidateVars);
2171 VarSetOps::UnionD(compiler, fpCalleeSaveCandidateVars, fpMaybeCandidateVars);
2178 // Frame layout is only pre-computed for ARM
2179 printf("\nlvaTable after IdentifyCandidates\n");
2180 compiler->lvaTableDump();
2183 #endif // _TARGET_ARM_
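// Illustrative sketch (not part of the JIT): the two-threshold callee-save heuristic described
// in the comments above, using bitmask var sets and a hypothetical unity weight. All names in
// this sketch are hypothetical; the "maybe" set is folded in under the same floatVarCount /
// hasLoops / singleExit test used above.

#include <cstdint>

const unsigned UNITY_WEIGHT_MODEL = 100; // stand-in for BB_UNITY_WEIGHT

struct FPVarModel
{
    unsigned varIndex;
    unsigned weightedRefCnt;
    bool     isRegArg;
};

uint64_t pickFPCalleeSaveCandidatesModel(const FPVarModel* vars, int count, bool hasLoops, bool singleExit)
{
    uint64_t definite = 0, maybe = 0;
    for (int i = 0; i < count; i++)
    {
        unsigned wtd = vars[i].weightedRefCnt;
        if (vars[i].isRegArg && (wtd >= UNITY_WEIGHT_MODEL))
        {
            wtd -= UNITY_WEIGHT_MODEL; // don't count the initial reference of a register parameter
        }
        if (wtd >= 4 * UNITY_WEIGHT_MODEL)      definite |= (1ull << vars[i].varIndex);
        else if (wtd >= 2 * UNITY_WEIGHT_MODEL) maybe    |= (1ull << vars[i].varIndex);
    }
    // With many fp vars, loops, and a single exit, use the more aggressive threshold as well.
    if ((count > 6) && hasLoops && singleExit)
    {
        definite |= maybe;
    }
    return definite;
}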
2186 // TODO-Throughput: This mapping can surely be more efficiently done
2187 void LinearScan::initVarRegMaps()
2189 assert(compiler->lvaTrackedFixed); // We should have already set this to prevent us from adding any new tracked variables.
2192 // The compiler memory allocator requires that the allocation be an
2193 // even multiple of int-sized objects
2194 unsigned int varCount = compiler->lvaTrackedCount;
2195 regMapCount = (unsigned int)roundUp(varCount, sizeof(int));
2197 // Not sure why blocks aren't numbered from zero, but they don't appear to be.
2198 // So, if we want to index by bbNum we have to know the maximum value.
2199 unsigned int bbCount = compiler->fgBBNumMax + 1;
2201 inVarToRegMaps = new (compiler, CMK_LSRA) regNumber*[bbCount];
2202 outVarToRegMaps = new (compiler, CMK_LSRA) regNumber*[bbCount];
2206 // This VarToRegMap is used during the resolution of critical edges.
2207 sharedCriticalVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2209 for (unsigned int i = 0; i < bbCount; i++)
2211 regNumber* inVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2212 regNumber* outVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2214 for (unsigned int j = 0; j < regMapCount; j++)
2216 inVarToRegMap[j] = REG_STK;
2217 outVarToRegMap[j] = REG_STK;
2219 inVarToRegMaps[i] = inVarToRegMap;
2220 outVarToRegMaps[i] = outVarToRegMap;
2225 sharedCriticalVarToRegMap = nullptr;
2226 for (unsigned int i = 0; i < bbCount; i++)
2228 inVarToRegMaps[i] = nullptr;
2229 outVarToRegMaps[i] = nullptr;
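// Illustrative sketch (not part of the JIT): per-block in/out maps as above, sized to the
// tracked-variable count rounded up to a multiple of sizeof(int) and initialized to "stack".
// The JIT's pooled allocator is replaced by std::vector; names here are hypothetical.

#include <vector>

typedef signed char RegNumModel;          // stand-in for regNumber
const RegNumModel REG_STK_SENTINEL = -1;  // stand-in for REG_STK

struct BlockRegMapsModel
{
    std::vector<std::vector<RegNumModel>> inMaps;  // indexed by block number
    std::vector<std::vector<RegNumModel>> outMaps;

    void init(unsigned maxBlockNum, unsigned trackedVarCount)
    {
        // Round the per-map element count up to an even multiple of int-sized objects,
        // as the allocator comment above requires.
        unsigned mapCount = ((trackedVarCount + sizeof(int) - 1) / sizeof(int)) * sizeof(int);
        inMaps.assign(maxBlockNum + 1, std::vector<RegNumModel>(mapCount, REG_STK_SENTINEL));
        outMaps.assign(maxBlockNum + 1, std::vector<RegNumModel>(mapCount, REG_STK_SENTINEL));
    }
};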
2234 void LinearScan::setInVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2236 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2237 inVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = reg;
2240 void LinearScan::setOutVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2242 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2243 outVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = reg;
2246 LinearScan::SplitEdgeInfo LinearScan::getSplitEdgeInfo(unsigned int bbNum)
2248 SplitEdgeInfo splitEdgeInfo;
2249 assert(bbNum <= compiler->fgBBNumMax);
2250 assert(bbNum > bbNumMaxBeforeResolution);
2251 assert(splitBBNumToTargetBBNumMap != nullptr);
2252 splitBBNumToTargetBBNumMap->Lookup(bbNum, &splitEdgeInfo);
2253 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
2254 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
2255 return splitEdgeInfo;
2258 VarToRegMap LinearScan::getInVarToRegMap(unsigned int bbNum)
2260 assert(bbNum <= compiler->fgBBNumMax);
2261 // For the blocks inserted to split critical edges, the inVarToRegMap is
2262 // equal to the outVarToRegMap at the "from" block.
2263 if (bbNum > bbNumMaxBeforeResolution)
2265 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2266 unsigned fromBBNum = splitEdgeInfo.fromBBNum;
2269 assert(splitEdgeInfo.toBBNum != 0);
2270 return inVarToRegMaps[splitEdgeInfo.toBBNum];
2274 return outVarToRegMaps[fromBBNum];
2278 return inVarToRegMaps[bbNum];
2281 VarToRegMap LinearScan::getOutVarToRegMap(unsigned int bbNum)
2283 assert(bbNum <= compiler->fgBBNumMax);
2284 // For the blocks inserted to split critical edges, the outVarToRegMap is
2285 // equal to the inVarToRegMap at the target.
2286 if (bbNum > bbNumMaxBeforeResolution)
2288 // If this is an empty block, its in and out maps are both the same.
2289 // We identify this case by setting fromBBNum or toBBNum to 0, and using only the other.
2290 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2291 unsigned toBBNum = splitEdgeInfo.toBBNum;
2294 assert(splitEdgeInfo.fromBBNum != 0);
2295 return outVarToRegMaps[splitEdgeInfo.fromBBNum];
2299 return inVarToRegMaps[toBBNum];
2302 return outVarToRegMaps[bbNum];
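// Illustrative sketch (not part of the JIT): how the "in" map of a block inserted during edge
// resolution is derived from its neighbors, as described above. A zero fromBBNum/toBBNum marks
// the side that isn't recorded; all names here are hypothetical.

struct SplitInfoModel
{
    unsigned fromBBNum; // 0 if not applicable
    unsigned toBBNum;   // 0 if not applicable
};

struct ResolutionMapsModel
{
    const int** inMaps;
    const int** outMaps;
    unsigned    maxOriginalBlockNum; // stand-in for bbNumMaxBeforeResolution

    // For a resolution block, the "in" map is the predecessor's "out" map (or the successor's
    // "in" map when the block sits at the top of the target, i.e. fromBBNum == 0).
    const int* getInMap(unsigned bbNum, const SplitInfoModel& split) const
    {
        if (bbNum <= maxOriginalBlockNum)
        {
            return inMaps[bbNum];
        }
        if (split.fromBBNum == 0)
        {
            return inMaps[split.toBBNum];
        }
        return outMaps[split.fromBBNum];
    }
};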
2305 regNumber LinearScan::getVarReg(VarToRegMap bbVarToRegMap, unsigned int varNum)
2307 assert(compiler->lvaTable[varNum].lvTracked);
2308 return bbVarToRegMap[compiler->lvaTable[varNum].lvVarIndex];
2311 // Initialize the incoming VarToRegMap to the given map values (generally a predecessor of
2313 VarToRegMap LinearScan::setInVarToRegMap(unsigned int bbNum, VarToRegMap srcVarToRegMap)
2315 VarToRegMap inVarToRegMap = inVarToRegMaps[bbNum];
2316 memcpy(inVarToRegMap, srcVarToRegMap, (regMapCount * sizeof(regNumber)));
2317 return inVarToRegMap;
2320 // find the last node in the tree in execution order
2321 // TODO-Throughput: this is inefficient!
2322 GenTree* lastNodeInTree(GenTree* tree)
2324 // There is no gtprev on the top level tree node so
2325 // apparently the way to walk a tree backwards is to walk
2326 // it forward, find the last node, and walk back from there.
2328 GenTree* last = nullptr;
2329 if (tree->OperGet() == GT_STMT)
2331 GenTree* statement = tree;
2333 foreach_treenode_execution_order(tree, statement)
2344 tree = tree->gtNext;
2350 // Given a local variable reference node, return the RefType (def or use) to use for it.
2351 RefType refTypeForLocalRefNode(GenTree* node)
2353 assert(node->IsLocal());
2355 // We don't support updates
2356 assert((node->gtFlags & GTF_VAR_USEASG) == 0);
2358 if (node->gtFlags & GTF_VAR_DEF)
2368 // This function sets RefPosition last uses by walking the RefPositions, instead of walking the
2369 // tree nodes in execution order (as was done in a previous version).
2370 // This is because the execution order isn't strictly correct, specifically for
2371 // references to local variables that occur in arg lists.
2373 // TODO-Throughput: This function should eventually be eliminated, as we should be able to rely on last uses
2374 // being set by dataflow analysis. It is necessary to do it this way only because the execution
2375 // order wasn't strictly correct.
2377 void LinearScan::setLastUses(BasicBlock* block)
2382 JITDUMP("\n\nCALCULATING LAST USES for block %u, liveout=", block->bbNum);
2383 dumpConvertedVarSet(compiler, block->bbLiveOut);
2384 JITDUMP("\n==============================\n");
2388 unsigned keepAliveVarNum = BAD_VAR_NUM;
2389 if (compiler->lvaKeepAliveAndReportThis())
2391 keepAliveVarNum = compiler->info.compThisArg;
2392 assert(compiler->info.compIsStatic == false);
2395 // find which uses are lastUses
2397 // Work backwards starting with live out.
2398 // 'temp' is updated to include any exposed use (including those in this
2399 // block that we've already seen). When we encounter a use, if it's
2400 // not in that set, then it's a last use.
2402 VARSET_TP VARSET_INIT(compiler, temp, block->bbLiveOut);
2404 auto currentRefPosition = refPositions.rbegin();
2406 while (currentRefPosition->refType != RefTypeBB)
2408 // We should never see ParamDefs or ZeroInits within a basic block.
2409 assert(currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit);
2410 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isLocalVar)
2412 unsigned varNum = currentRefPosition->getInterval()->varNum;
2413 unsigned varIndex = currentRefPosition->getInterval()->getVarIndex(compiler);
2414 // We should always have a tree node for a localVar, except for the "special" RefPositions.
2415 GenTreePtr tree = currentRefPosition->treeNode;
2416 assert(tree != nullptr || currentRefPosition->refType == RefTypeExpUse ||
2417 currentRefPosition->refType == RefTypeDummyDef);
2418 if (!VarSetOps::IsMember(compiler, temp, varIndex) && varNum != keepAliveVarNum)
2420 // There was no exposed use, so this is a
2421 // "last use" (and we mark it thus even if it's a def)
2423 if (tree != nullptr)
2425 tree->gtFlags |= GTF_VAR_DEATH;
2427 LsraLocation loc = currentRefPosition->nodeLocation;
2429 if (getLsraExtendLifeTimes())
2431 JITDUMP("last use of V%02u @%u (not marked as last use for LSRA due to extendLifetimes stress "
2433 compiler->lvaTrackedToVarNum[varIndex], loc);
2438 JITDUMP("last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2439 currentRefPosition->lastUse = true;
2441 VarSetOps::AddElemD(compiler, temp, varIndex);
2445 currentRefPosition->lastUse = false;
2446 if (tree != nullptr)
2448 tree->gtFlags &= ~GTF_VAR_DEATH;
2452 if (currentRefPosition->refType == RefTypeDef || currentRefPosition->refType == RefTypeDummyDef)
2454 VarSetOps::RemoveElemD(compiler, temp, varIndex);
2457 assert(currentRefPosition != refPositions.rend());
2458 ++currentRefPosition;
2462 VARSET_TP VARSET_INIT(compiler, temp2, block->bbLiveIn);
2463 VarSetOps::DiffD(compiler, temp2, temp);
2464 VarSetOps::DiffD(compiler, temp, block->bbLiveIn);
2465 bool foundDiff = false;
2468 VARSET_ITER_INIT(compiler, iter, temp, varIndex);
2469 while (iter.NextElem(compiler, &varIndex))
2471 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2472 if (compiler->lvaTable[varNum].lvLRACandidate)
2474 JITDUMP("BB%02u: V%02u is computed live, but not in LiveIn set.\n", block->bbNum, varNum);
2481 VARSET_ITER_INIT(compiler, iter, temp2, varIndex);
2482 while (iter.NextElem(compiler, &varIndex))
2484 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2485 if (compiler->lvaTable[varNum].lvLRACandidate)
2487 JITDUMP("BB%02u: V%02u is in LiveIn set, but not computed live.\n", block->bbNum, varNum);
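// Illustrative sketch (not part of the JIT): the backward last-use computation described above,
// over a flat list of references. 'exposed' plays the role of 'temp': it holds the variables
// with an exposed use at or after the current point, seeded from live-out. All names here are
// hypothetical.

#include <cstdint>
#include <vector>

struct RefPosModel
{
    unsigned varIndex;
    bool     isDef;
    bool     isLastUse; // output
};

void computeLastUsesModel(std::vector<RefPosModel>& refs, uint64_t liveOut)
{
    uint64_t exposed = liveOut;
    for (int i = (int)refs.size() - 1; i >= 0; i--)
    {
        uint64_t bit = 1ull << refs[i].varIndex;
        // A reference with no exposed use after it is a last use (even if it is a def).
        refs[i].isLastUse = ((exposed & bit) == 0);
        exposed |= bit;
        if (refs[i].isDef)
        {
            exposed &= ~bit; // a def removes the exposure for earlier references
        }
    }
}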
2497 void LinearScan::addRefsForPhysRegMask(regMaskTP mask, LsraLocation currentLoc, RefType refType, bool isLastUse)
2499 for (regNumber reg = REG_FIRST; mask; reg = REG_NEXT(reg), mask >>= 1)
2503 // This assumes that these are all "special" RefTypes that
2504 // don't need to be recorded on the tree (hence treeNode is nullptr)
2505 RefPosition* pos = newRefPosition(reg, currentLoc, refType, nullptr,
2506 genRegMask(reg)); // This MUST occupy the physical register (obviously)
2510 pos->lastUse = true;
2516 //------------------------------------------------------------------------
2517 // getKillSetForNode: Return the registers killed by the given tree node.
2520 // compiler - the compiler context to use
2521 // tree - the tree for which the kill set is needed.
2523 // Return Value: a register mask of the registers killed
2525 regMaskTP LinearScan::getKillSetForNode(GenTree* tree)
2527 regMaskTP killMask = RBM_NONE;
2528 switch (tree->OperGet())
2530 #ifdef _TARGET_XARCH_
2532 // We use the 128-bit multiply when performing an overflow checking unsigned multiply
2534 if (((tree->gtFlags & GTF_UNSIGNED) != 0) && tree->gtOverflowEx())
2536 // Both RAX and RDX are killed by the operation
2537 killMask = RBM_RAX | RBM_RDX;
2542 #if defined(_TARGET_X86_) && !defined(LEGACY_BACKEND)
2545 killMask = RBM_RAX | RBM_RDX;
2552 if (!varTypeIsFloating(tree->TypeGet()))
2554 // RDX needs to be killed early, because it must not be used as a source register
2555 // (unlike most cases, where the kill happens AFTER the uses). So for this kill,
2556 // we add the RefPosition at the tree loc (where the uses are located) instead of the
2557 // usual kill location which is the same as the defs at tree loc+1.
2558 // Note that we don't have to add interference for the live vars, because that
2559 // will be done below, and is not sensitive to the precise location.
2560 LsraLocation currentLoc = tree->gtLsraInfo.loc;
2561 assert(currentLoc != 0);
2562 addRefsForPhysRegMask(RBM_RDX, currentLoc, RefTypeKill, true);
2563 // Both RAX and RDX are killed by the operation
2564 killMask = RBM_RAX | RBM_RDX;
2567 #endif // _TARGET_XARCH_
2570 if (tree->OperIsCopyBlkOp())
2572 assert(tree->AsObj()->gtGcPtrCount != 0);
2573 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_ASSIGN_BYREF);
2579 case GT_STORE_DYN_BLK:
2581 GenTreeBlk* blkNode = tree->AsBlk();
2582 bool isCopyBlk = varTypeIsStruct(blkNode->Data());
2583 switch (blkNode->gtBlkOpKind)
2585 case GenTreeBlk::BlkOpKindHelper:
2588 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMCPY);
2592 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMSET);
2596 #ifdef _TARGET_XARCH_
2597 case GenTreeBlk::BlkOpKindRepInstr:
2600 // rep movs kills RCX, RDI and RSI
2601 killMask = RBM_RCX | RBM_RDI | RBM_RSI;
2605 // rep stos kills RCX and RDI.
2606 // (Note that the Data() node, if not constant, will be assigned to
2607 // RCX, but it's fine that this kills it, as the value is not available
2608 // after this node in any case.)
2609 killMask = RBM_RDI | RBM_RCX;
2613 case GenTreeBlk::BlkOpKindRepInstr:
2615 case GenTreeBlk::BlkOpKindUnroll:
2616 case GenTreeBlk::BlkOpKindInvalid:
2617 // for these 'gtBlkOpKind' kinds, we leave 'killMask' = RBM_NONE
2628 if (tree->gtLsraInfo.isHelperCallWithKills)
2630 killMask = RBM_CALLEE_TRASH;
2634 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_STOP_FOR_GC);
2638 if (compiler->compFloatingPointUsed)
2640 if (tree->TypeGet() == TYP_DOUBLE)
2642 needDoubleTmpForFPCall = true;
2644 else if (tree->TypeGet() == TYP_FLOAT)
2646 needFloatTmpForFPCall = true;
2649 if (tree->IsHelperCall())
2651 GenTreeCall* call = tree->AsCall();
2652 CorInfoHelpFunc helpFunc = compiler->eeGetHelperNum(call->gtCallMethHnd);
2653 killMask = compiler->compHelperCallKillSet(helpFunc);
2656 #endif // _TARGET_X86_
2658 // if there is no FP used, we can ignore the FP kills
2659 if (compiler->compFloatingPointUsed)
2661 killMask = RBM_CALLEE_TRASH;
2665 killMask = RBM_INT_CALLEE_TRASH;
2670 if (compiler->codeGen->gcInfo.gcIsWriteBarrierAsgNode(tree))
2672 killMask = RBM_CALLEE_TRASH_NOGC;
2673 #if !NOGC_WRITE_BARRIERS && (defined(_TARGET_ARM_) || defined(_TARGET_AMD64_))
2674 killMask |= (RBM_ARG_0 | RBM_ARG_1);
2675 #endif // !NOGC_WRITE_BARRIERS && (defined(_TARGET_ARM_) || defined(_TARGET_AMD64_))
2679 #if defined(PROFILING_SUPPORTED) && defined(_TARGET_AMD64_)
2680 // If this method requires a profiler ELT hook, then mark these nodes as killing
2681 // callee trash registers (excluding RAX and XMM0). The reason for this is that the
2682 // profiler callback would trash these registers. See vm\amd64\asmhelpers.asm for more details.
2685 if (compiler->compIsProfilerHookNeeded())
2687 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_LEAVE);
2692 if (compiler->compIsProfilerHookNeeded())
2694 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_TAILCALL);
2698 #endif // PROFILING_SUPPORTED && _TARGET_AMD64_
2701 // for all other 'tree->OperGet()' kinds, leave 'killMask' = RBM_NONE
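// Illustrative sketch (not part of the JIT): the kill-set query above is essentially a switch
// from node kind to a register mask. The masks and node kinds below are hypothetical stand-ins
// for the RBM_* constants and GenTree opers used above.

#include <cstdint>

enum NodeKindModel { NK_MulOverflowUnsigned, NK_CopyBlkHelper, NK_Call, NK_Other };

const uint64_t MASK_RAX_MODEL          = 1ull << 0;
const uint64_t MASK_RDX_MODEL          = 1ull << 3;
const uint64_t MASK_BYREF_HELPER_MODEL = (1ull << 6) | (1ull << 7); // e.g. RSI | RDI
const uint64_t MASK_CALLEE_TRASH_MODEL = 0x0000FFFF;                // arbitrary stand-in

uint64_t getKillSetForNodeModel(NodeKindModel kind)
{
    switch (kind)
    {
        case NK_MulOverflowUnsigned:
            return MASK_RAX_MODEL | MASK_RDX_MODEL; // the widening multiply trashes both halves
        case NK_CopyBlkHelper:
            return MASK_BYREF_HELPER_MODEL;         // helper-based GC copy kills the helper's regs
        case NK_Call:
            return MASK_CALLEE_TRASH_MODEL;         // calls kill all caller-saved registers
        default:
            return 0;                               // most nodes kill nothing
    }
}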
2707 //------------------------------------------------------------------------
2708 // buildKillPositionsForNode:
2709 // Given some tree node add refpositions for all the registers this node kills
2712 // tree - the tree for which kill positions should be generated
2713 // currentLoc - the location at which the kills should be added
2716 // true - kills were inserted
2717 // false - no kills were inserted
2720 // The return value is needed because if we have any kills, we need to make sure that
2721 // all defs are located AFTER the kills. On the other hand, if there aren't kills,
2722 // the multiple defs for a regPair are in different locations.
2723 // If we generate any kills, we will mark all currentLiveVars as being preferenced
2724 // to avoid the killed registers. This is somewhat conservative.
2726 bool LinearScan::buildKillPositionsForNode(GenTree* tree, LsraLocation currentLoc)
2728 regMaskTP killMask = getKillSetForNode(tree);
2729 bool isCallKill = ((killMask == RBM_INT_CALLEE_TRASH) || (killMask == RBM_CALLEE_TRASH));
2730 if (killMask != RBM_NONE)
2732 // The killMask identifies a set of registers that will be used during codegen.
2733 // Mark these as modified here, so when we do final frame layout, we'll know about
2734 // all these registers. This is especially important if killMask contains
2735 // callee-saved registers, which affect the frame size since we need to save/restore them.
2736 // In the case where we have a copyBlk with GC pointers, we may need to call the
2737 // CORINFO_HELP_ASSIGN_BYREF helper, which kills callee-saved RSI and RDI. If
2738 // LSRA doesn't assign RSI/RDI, they wouldn't get marked as modified until codegen,
2739 // which is too late.
2740 compiler->codeGen->regSet.rsSetRegsModified(killMask DEBUGARG(dumpTerse));
2742 addRefsForPhysRegMask(killMask, currentLoc, RefTypeKill, true);
2744 // TODO-CQ: It appears to be valuable for both fp and int registers to avoid killing the callee
2745 // save regs on infrequently executed paths. However, it results in a large number of asmDiffs,
2746 // many of which appear to be regressions (because there is more spill on the infrequently executed path),
2747 // but are not really, because the frequent path becomes smaller. Validating these diffs will need
2748 // to be done before making this change.
2749 // if (!blockSequence[curBBSeqNum]->isRunRarely())
2752 VARSET_ITER_INIT(compiler, iter, currentLiveVars, varIndex);
2753 while (iter.NextElem(compiler, &varIndex))
2755 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2756 LclVarDsc* varDsc = compiler->lvaTable + varNum;
2757 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2758 if (varDsc->lvType == LargeVectorType)
2760 if (!VarSetOps::IsMember(compiler, largeVectorCalleeSaveCandidateVars, varIndex))
2766 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2767 if (varTypeIsFloating(varDsc) &&
2768 !VarSetOps::IsMember(compiler, fpCalleeSaveCandidateVars, varIndex))
2772 Interval* interval = getIntervalForLocalVar(varNum);
2775 interval->preferCalleeSave = true;
2777 regMaskTP newPreferences = allRegs(interval->registerType) & (~killMask);
2779 if (newPreferences != RBM_NONE)
2781 interval->updateRegisterPreferences(newPreferences);
2785 // If there are no callee-saved registers, the call could kill all the registers.
2786 // This is a valid state, so in that case assert should not trigger. The RA will spill in order to
2787 // free a register later.
2788 assert(compiler->opts.compDbgEnC || (calleeSaveRegs(varDsc->lvType)) == RBM_NONE);
2793 if (tree->IsCall() && (tree->gtFlags & GTF_CALL_UNMANAGED) != 0)
2795 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeKillGCRefs, tree,
2796 (allRegs(TYP_REF) & ~RBM_ARG_REGS));
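// Illustrative sketch (not part of the JIT): the preference update performed above when a node
// has kills. FP vars that did not make the callee-save candidate set are skipped; the remaining
// live candidates are marked as preferring a callee-saved register and have their preferences
// narrowed away from the killed registers. All names here are hypothetical, and the real
// updateRegisterPreferences guards against emptying the preference set.

#include <cstdint>
#include <vector>

struct KillPrefIntervalModel
{
    uint64_t registerPreferences;
    bool     preferCalleeSave;
    bool     isFloat;
    bool     isFpCalleeSaveCandidate;
};

void applyKillToLiveIntervalsModel(std::vector<KillPrefIntervalModel>& liveIntervals, uint64_t killMask,
                                   uint64_t allRegsMask)
{
    for (KillPrefIntervalModel& interval : liveIntervals)
    {
        // FP vars that are not callee-save candidates are not steered at all.
        if (interval.isFloat && !interval.isFpCalleeSaveCandidate)
        {
            continue;
        }
        interval.preferCalleeSave = true;
        uint64_t newPreferences = allRegsMask & ~killMask;
        if (newPreferences != 0)
        {
            uint64_t narrowed = interval.registerPreferences & newPreferences;
            interval.registerPreferences = (narrowed != 0) ? narrowed : newPreferences;
        }
    }
}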
2804 RefPosition* LinearScan::defineNewInternalTemp(GenTree* tree,
2805 RegisterType regType,
2806 LsraLocation currentLoc,
2809 Interval* current = newInterval(regType);
2810 current->isInternal = true;
2811 return newRefPosition(current, currentLoc, RefTypeDef, tree, regMask);
2814 int LinearScan::buildInternalRegisterDefsForNode(GenTree* tree,
2815 LsraLocation currentLoc,
2816 RefPosition* temps[]) // populates
2819 int internalIntCount = tree->gtLsraInfo.internalIntCount;
2820 regMaskTP internalCands = tree->gtLsraInfo.getInternalCandidates(this);
2822 // If the number of internal integer registers required is the same as the number of candidate integer registers in
2823 // the candidate set, then they must be handled as fixed registers.
2824 // (E.g. for the integer registers that floating point arguments must be copied into for a varargs call.)
2825 bool fixedRegs = false;
2826 regMaskTP internalIntCandidates = (internalCands & allRegs(TYP_INT));
2827 if (((int)genCountBits(internalIntCandidates)) == internalIntCount)
2832 for (count = 0; count < internalIntCount; count++)
2834 regMaskTP internalIntCands = (internalCands & allRegs(TYP_INT));
2837 internalIntCands = genFindLowestBit(internalIntCands);
2838 internalCands &= ~internalIntCands;
2840 temps[count] = defineNewInternalTemp(tree, IntRegisterType, currentLoc, internalIntCands);
2843 int internalFloatCount = tree->gtLsraInfo.internalFloatCount;
2844 for (int i = 0; i < internalFloatCount; i++)
2846 regMaskTP internalFPCands = (internalCands & internalFloatRegCandidates());
2847 temps[count++] = defineNewInternalTemp(tree, FloatRegisterType, currentLoc, internalFPCands);
2850 noway_assert(count < MaxInternalRegisters);
2851 assert(count == (internalIntCount + internalFloatCount));
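// Illustrative sketch (not part of the JIT): the "fixed registers" case described above. When
// exactly as many candidate registers are available as internal registers requested, each def
// peels off the lowest candidate bit so every temp gets its own fixed register; otherwise every
// def shares the full candidate mask. All names here are hypothetical.

#include <cstdint>
#include <vector>

inline uint64_t lowestBitModel(uint64_t mask) { return mask & (~mask + 1); }
inline int      popCountModel(uint64_t mask)  { int n = 0; while (mask != 0) { mask &= mask - 1; n++; } return n; }

std::vector<uint64_t> buildInternalDefMasksModel(uint64_t candidates, int internalCount)
{
    std::vector<uint64_t> defMasks;
    bool fixedRegs = (popCountModel(candidates) == internalCount);
    for (int i = 0; i < internalCount; i++)
    {
        uint64_t thisDef = candidates;
        if (fixedRegs)
        {
            thisDef = lowestBitModel(candidates); // claim one specific register...
            candidates &= ~thisDef;               // ...and remove it from the remaining pool
        }
        defMasks.push_back(thisDef);
    }
    return defMasks;
}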
2855 void LinearScan::buildInternalRegisterUsesForNode(GenTree* tree,
2856 LsraLocation currentLoc,
2857 RefPosition* defs[],
2860 assert(total < MaxInternalRegisters);
2862 // defs[] has been populated by buildInternalRegisterDefsForNode
2863 // now just add uses to the defs previously added.
2864 for (int i = 0; i < total; i++)
2866 RefPosition* prevRefPosition = defs[i];
2867 assert(prevRefPosition != nullptr);
2868 regMaskTP mask = prevRefPosition->registerAssignment;
2869 if (prevRefPosition->isPhysRegRef)
2871 newRefPosition(defs[i]->getReg()->regNum, currentLoc, RefTypeUse, tree, mask);
2875 RefPosition* newest = newRefPosition(defs[i]->getInterval(), currentLoc, RefTypeUse, tree, mask);
2876 newest->lastUse = true;
2881 regMaskTP LinearScan::getUseCandidates(GenTree* useNode)
2883 TreeNodeInfo info = useNode->gtLsraInfo;
2884 return info.getSrcCandidates(this);
2887 regMaskTP LinearScan::getDefCandidates(GenTree* tree)
2889 TreeNodeInfo info = tree->gtLsraInfo;
2890 return info.getDstCandidates(this);
2893 RegisterType LinearScan::getDefType(GenTree* tree)
2895 return tree->TypeGet();
2898 regMaskTP fixedCandidateMask(var_types type, regMaskTP candidates)
2900 if (genMaxOneBit(candidates))
2907 //------------------------------------------------------------------------
2908 // LocationInfoListNode: used to store a single `LocationInfo` value for a
2909 // node during `buildIntervals`.
2911 // This is the node type for `LocationInfoList` below.
2913 class LocationInfoListNode final : public LocationInfo
2915 friend class LocationInfoList;
2916 friend class LocationInfoListNodePool;
2918 LocationInfoListNode* m_next; // The next node in the list
2921 LocationInfoListNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0) : LocationInfo(l, i, t, regIdx)
2925 //------------------------------------------------------------------------
2926 // LocationInfoListNode::Next: Returns the next node in the list.
2927 LocationInfoListNode* Next() const
2933 //------------------------------------------------------------------------
2934 // LocationInfoList: used to store a list of `LocationInfo` values for a
2935 // node during `buildIntervals`.
2937 // Given an IR node that either directly defines N registers or that is a
2938 // contained node with uses that define a total of N registers, that node
2939 // will map to N `LocationInfo` values. These values are stored as a
2940 // linked list of `LocationInfoListNode` values.
2942 class LocationInfoList final
2944 friend class LocationInfoListNodePool;
2946 LocationInfoListNode* m_head; // The head of the list
2947 LocationInfoListNode* m_tail; // The tail of the list
2950 LocationInfoList() : m_head(nullptr), m_tail(nullptr)
2954 LocationInfoList(LocationInfoListNode* node) : m_head(node), m_tail(node)
2956 assert(m_head->m_next == nullptr);
2959 //------------------------------------------------------------------------
2960 // LocationInfoList::IsEmpty: Returns true if the list is empty.
2962 bool IsEmpty() const
2964 return m_head == nullptr;
2967 //------------------------------------------------------------------------
2968 // LocationInfoList::Begin: Returns the first node in the list.
2970 LocationInfoListNode* Begin() const
2975 //------------------------------------------------------------------------
2976 // LocationInfoList::End: Returns the position after the last node in the
2977 // list. The returned value is suitable for use as
2978 // a sentinel for iteration.
2980 LocationInfoListNode* End() const
2985 //------------------------------------------------------------------------
2986 // LocationInfoList::Append: Appends a node to the list.
2989 // node - The node to append. Must not be part of an existing list.
2991 void Append(LocationInfoListNode* node)
2993 assert(node->m_next == nullptr);
2995 if (m_tail == nullptr)
2997 assert(m_head == nullptr);
3002 m_tail->m_next = node;
3008 //------------------------------------------------------------------------
3009 // LocationInfoList::Append: Appends another list to this list.
3012 // other - The list to append.
3014 void Append(LocationInfoList other)
3016 if (m_tail == nullptr)
3018 assert(m_head == nullptr);
3019 m_head = other.m_head;
3023 m_tail->m_next = other.m_head;
3026 m_tail = other.m_tail;
3030 //------------------------------------------------------------------------
3031 // LocationInfoListNodePool: manages a pool of `LocationInfoListNode`
3032 // values to decrease overall memory usage
3033 // during `buildIntervals`.
3035 // `buildIntervals` involves creating a list of location info values per
3036 // node that either directly produces a set of registers or that is a
3037 // contained node with register-producing sources. However, these lists
3038 // are short-lived: they are destroyed once the use of the corresponding
3039 // node is processed. As such, there is typically only a small number of
3040 // `LocationInfoListNode` values in use at any given time. Pooling these
3041 // values avoids otherwise frequent allocations.
3042 class LocationInfoListNodePool final
3044 LocationInfoListNode* m_freeList;
3045 Compiler* m_compiler;
3048 //------------------------------------------------------------------------
3049 // LocationInfoListNodePool::LocationInfoListNodePool:
3050 // Creates a pool of `LocationInfoListNode` values.
3053 // compiler - The compiler context.
3054 // preallocate - The number of nodes to preallocate.
3056 LocationInfoListNodePool(Compiler* compiler, unsigned preallocate = 0) : m_compiler(compiler)
3058 if (preallocate > 0)
3060 size_t preallocateSize = sizeof(LocationInfoListNode) * preallocate;
3061 auto* preallocatedNodes = reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize));
3063 LocationInfoListNode* head = preallocatedNodes;
3064 head->m_next = nullptr;
3066 for (unsigned i = 1; i < preallocate; i++)
3068 LocationInfoListNode* node = &preallocatedNodes[i];
3069 node->m_next = head;
3077 //------------------------------------------------------------------------
3078 // LocationInfoListNodePool::GetNode: Fetches an unused node from the
3082 // l - The `LsraLocation` for the `LocationInfo` value.
3083 // i - The interval for the `LocationInfo` value.
3084 // t - The IR node for the `LocationInfo` value
3085 // regIdx - The register index for the `LocationInfo` value.
3088 // A pooled or newly-allocated `LocationInfoListNode`, depending on the
3089 // contents of the pool.
3090 LocationInfoListNode* GetNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0)
3092 LocationInfoListNode* head = m_freeList;
3093 if (head == nullptr)
3095 head = reinterpret_cast<LocationInfoListNode*>(m_compiler->compGetMem(sizeof(LocationInfoListNode)));
3099 m_freeList = head->m_next;
3105 head->multiRegIdx = regIdx;
3106 head->m_next = nullptr;
3111 //------------------------------------------------------------------------
3112 // LocationInfoListNodePool::ReturnNodes: Returns a list of nodes to the
3116 // list - The list to return.
3118 void ReturnNodes(LocationInfoList& list)
3120 assert(list.m_head != nullptr);
3121 assert(list.m_tail != nullptr);
3123 LocationInfoListNode* head = m_freeList;
3124 list.m_tail->m_next = head;
3125 m_freeList = list.m_head;
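// Illustrative sketch (not part of the JIT): the free-list pattern used by the pool above,
// reduced to a tiny standalone template. Nodes are recycled rather than freed, so steady-state
// allocation is bounded by the peak number of nodes in use. All names here are hypothetical.

template <typename T>
class FreeListPoolModel
{
public:
    struct Node
    {
        T     value;
        Node* next;
    };

    Node* getNode(const T& value)
    {
        Node* node = m_freeList;
        if (node == nullptr)
        {
            node = new Node(); // pool is empty: allocate a fresh node
        }
        else
        {
            m_freeList = node->next; // otherwise reuse the head of the free list
        }
        node->value = value;
        node->next  = nullptr;
        return node;
    }

    // Splice a whole chain (head..tail) back onto the free list in O(1), as ReturnNodes does above.
    void returnChain(Node* head, Node* tail)
    {
        tail->next = m_freeList;
        m_freeList = head;
    }

private:
    Node* m_freeList = nullptr;
};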
3129 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3131 LinearScan::buildUpperVectorSaveRefPositions(GenTree* tree, LsraLocation currentLoc)
3133 VARSET_TP VARSET_INIT_NOCOPY(liveLargeVectors, VarSetOps::MakeEmpty(compiler));
3134 regMaskTP fpCalleeKillSet = RBM_NONE;
3135 if (!VarSetOps::IsEmpty(compiler, largeVectorVars))
3137 // We actually need to find any calls that kill the upper-half of the callee-save vector registers.
3138 // But we will use as a proxy any node that kills floating point registers.
3139 // (Note that some calls are masquerading as other nodes at this point so we can't just check for calls.)
3140 fpCalleeKillSet = getKillSetForNode(tree);
3141 if ((fpCalleeKillSet & RBM_FLT_CALLEE_TRASH) != RBM_NONE)
3143 VarSetOps::AssignNoCopy(compiler, liveLargeVectors,
3144 VarSetOps::Intersection(compiler, currentLiveVars, largeVectorVars));
3145 VARSET_ITER_INIT(compiler, iter, liveLargeVectors, varIndex);
3146 while (iter.NextElem(compiler, &varIndex))
3148 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3149 Interval* varInterval = getIntervalForLocalVar(varNum);
3150 Interval* tempInterval = newInterval(LargeVectorType);
3151 tempInterval->isInternal = true;
3153 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveDef, tree, RBM_FLT_CALLEE_SAVED);
3154 // We are going to save the existing relatedInterval of varInterval on tempInterval, so that we can set
3155 // the tempInterval as the relatedInterval of varInterval, so that we can build the corresponding
3156 // RefTypeUpperVectorSaveUse RefPosition. We will then restore the relatedInterval onto varInterval,
3157 // and set varInterval as the relatedInterval of tempInterval.
3158 tempInterval->relatedInterval = varInterval->relatedInterval;
3159 varInterval->relatedInterval = tempInterval;
3163 return liveLargeVectors;
3166 void LinearScan::buildUpperVectorRestoreRefPositions(GenTree* tree,
3167 LsraLocation currentLoc,
3168 VARSET_VALARG_TP liveLargeVectors)
3170 if (!VarSetOps::IsEmpty(compiler, liveLargeVectors))
3172 VARSET_ITER_INIT(compiler, iter, liveLargeVectors, varIndex);
3173 while (iter.NextElem(compiler, &varIndex))
3175 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3176 Interval* varInterval = getIntervalForLocalVar(varNum);
3177 Interval* tempInterval = varInterval->relatedInterval;
3178 assert(tempInterval->isInternal == true);
3180 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveUse, tree, RBM_FLT_CALLEE_SAVED);
3181 // Restore the relatedInterval onto varInterval, and set varInterval as the relatedInterval
3183 varInterval->relatedInterval = tempInterval->relatedInterval;
3184 tempInterval->relatedInterval = varInterval;
3188 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
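// Illustrative sketch (not part of the JIT): the relatedInterval bookkeeping described in the
// comments above. The save step stashes the variable's existing related interval on the temp
// and links the variable to the temp; the restore step puts the original link back and points
// the temp at the variable. Names here are hypothetical.

struct RelatedIntervalModel
{
    RelatedIntervalModel* related = nullptr;
};

void linkForUpperVectorSaveModel(RelatedIntervalModel& varInterval, RelatedIntervalModel& tempInterval)
{
    tempInterval.related = varInterval.related; // stash the original relation on the temp
    varInterval.related  = &tempInterval;       // so the matching restore can find the temp
}

void unlinkAfterUpperVectorRestoreModel(RelatedIntervalModel& varInterval, RelatedIntervalModel& tempInterval)
{
    varInterval.related  = tempInterval.related; // restore the original relation
    tempInterval.related = &varInterval;         // and relate the temp back to the variable
}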
3191 //------------------------------------------------------------------------
3192 // ComputeOperandDstCount: computes the number of registers defined by a
3195 // For most nodes, this is simple:
3196 // - Nodes that do not produce values (e.g. stores and other void-typed
3197 // nodes) and nodes that immediately use the registers they define
3198 // produce no registers
3199 // - Nodes that are marked as defining N registers define N registers.
3201 // For contained nodes, however, things are more complicated: for purposes
3202 // of bookkeeping, a contained node is treated as producing the transitive
3203 // closure of the registers produced by its sources.
3206 // operand - The operand for which to compute a register count.
3209 // The number of registers defined by `operand`.
3211 static int ComputeOperandDstCount(GenTree* operand)
3213 TreeNodeInfo& operandInfo = operand->gtLsraInfo;
3215 if (operandInfo.isLocalDefUse)
3217 // Operands that define an unused value do not produce any registers.
3220 else if (operandInfo.dstCount != 0)
3222 // Operands that have a specified number of destination registers consume all of their operands
3223 // and therefore produce exactly that number of registers.
3224 return operandInfo.dstCount;
3226 else if (operandInfo.srcCount != 0)
3228 // If an operand has no destination registers but does have source registers, it must be a store
3230 assert(operand->OperIsStore() || operand->OperIsBlkOp() || operand->OperIsPutArgStk() ||
3231 operand->OperIsCompare());
3234 else if (!operand->OperIsAggregate() && (operand->OperIsStore() || operand->TypeGet() == TYP_VOID))
3236 // Stores and void-typed operands may be encountered when processing call nodes, which contain
3237 // pointers to argument setup stores.
3242 // If an aggregate or non-void-typed operand is not an unused value and does not have source registers,
3243 // that argument is contained within its parent and produces `sum(operand_dst_count)` registers.
3245 for (GenTree* op : operand->Operands())
3247 dstCount += ComputeOperandDstCount(op);
3254 //------------------------------------------------------------------------
3255 // ComputeAvailableSrcCount: computes the number of registers available as
3256 // sources for a node.
3258 // This is simply the sum of the number of registers produced by each
3259 // operand to the node.
3262 // node - The node for which to compute a source count.
3265 // The number of registers available as sources for `node`.
3267 static int ComputeAvailableSrcCount(GenTree* node)
3270 for (GenTree* operand : node->Operands())
3272 numSources += ComputeOperandDstCount(operand);
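// Illustrative sketch (not part of the JIT): the recursive register-count rule described in the
// comments above, over a toy node type. A contained node reports the sum of its operands'
// counts; everything else reports its own dstCount. All names here are hypothetical.

#include <vector>

struct DstCountNodeModel
{
    bool                            isUnusedValue; // corresponds to isLocalDefUse
    bool                            isContained;   // produces no registers of its own
    int                             dstCount;
    std::vector<DstCountNodeModel*> operands;
};

int computeDstCountModel(const DstCountNodeModel* node)
{
    if (node->isUnusedValue)
    {
        return 0; // an unused value contributes nothing to its user
    }
    if (!node->isContained)
    {
        return node->dstCount;
    }
    int sum = 0;
    for (const DstCountNodeModel* op : node->operands)
    {
        sum += computeDstCountModel(op); // a contained node forwards its operands' registers
    }
    return sum;
}

int computeAvailableSrcCountModel(const DstCountNodeModel* node)
{
    int sum = 0;
    for (const DstCountNodeModel* op : node->operands)
    {
        sum += computeDstCountModel(op);
    }
    return sum;
}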
3279 void LinearScan::buildRefPositionsForNode(GenTree* tree,
3281 LocationInfoListNodePool& listNodePool,
3282 HashTableBase<GenTree*, LocationInfoList>& operandToLocationInfoMap,
3283 LsraLocation currentLoc)
3286 assert(!isRegPairType(tree->TypeGet()));
3287 #endif // _TARGET_ARM_
3289 // The LIR traversal doesn't visit non-aggregate GT_LIST or GT_ARGPLACE nodes
3290 assert(tree->OperGet() != GT_ARGPLACE);
3291 assert((tree->OperGet() != GT_LIST) || tree->AsArgList()->IsAggregate());
3293 // These nodes are eliminated by the Rationalizer.
3294 if (tree->OperGet() == GT_CLS_VAR)
3296 JITDUMP("Unexpected node %s in LSRA.\n", GenTree::NodeName(tree->OperGet()));
3297 assert(!"Unexpected node in LSRA.");
3300 // The set of internal temporary registers used by this node are stored in the
3301 // gtRsvdRegs register mask. Clear it out.
3302 tree->gtRsvdRegs = RBM_NONE;
3307 JITDUMP("at start of tree, map contains: { ");
3309 for (auto kvp : operandToLocationInfoMap)
3311 GenTree* node = kvp.Key();
3312 LocationInfoList defList = kvp.Value();
3314 JITDUMP("%sN%03u. %s -> (", first ? "" : "; ", node->gtSeqNum, GenTree::NodeName(node->OperGet()));
3315 for (LocationInfoListNode *def = defList.Begin(), *end = defList.End(); def != end; def = def->Next())
3317 JITDUMP("%s%d.N%03u", def == defList.Begin() ? "" : ", ", def->loc, def->treeNode->gtSeqNum);
3327 TreeNodeInfo info = tree->gtLsraInfo;
3328 assert(info.IsValid(this));
3329 int consume = info.srcCount;
3330 int produce = info.dstCount;
3332 assert(((consume == 0) && (produce == 0)) || (ComputeAvailableSrcCount(tree) == consume));
3334 if (isCandidateLocalRef(tree) && !tree->OperIsLocalStore())
3336 assert(consume == 0);
3338 // We handle tracked variables differently from non-tracked ones. If it is tracked,
3339 // we simply add a use or def of the tracked variable. Otherwise, for a use we need
3340 // to actually add the appropriate references for loading or storing the variable.
3342 // It won't actually get used or defined until the appropriate ancestor tree node
3343 // is processed, unless this is marked "isLocalDefUse" because it is a stack-based argument to a call
3346 Interval* interval = getIntervalForLocalVar(tree->gtLclVarCommon.gtLclNum);
3347 regMaskTP candidates = getUseCandidates(tree);
3348 regMaskTP fixedAssignment = fixedCandidateMask(tree->TypeGet(), candidates);
3350 // We have only approximate last-use information at this point. This is because the
3351 // execution order doesn't actually reflect the true order in which the localVars
3352 // are referenced - but the order of the RefPositions will, so we recompute it after
3353 // RefPositions are built.
3354 // Use the old value for setting currentLiveVars - note that we do this with the
3355 // not-quite-correct setting of lastUse. However, this is OK because
3356 // 1) this is only for preferencing, which doesn't require strict correctness, and
3357 // 2) the cases where these out-of-order uses occur should not overlap a kill.
3358 // TODO-Throughput: clean this up once we have the execution order correct. At that point
3359 // we can update currentLiveVars at the same place that we create the RefPosition.
3360 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
3362 VarSetOps::RemoveElemD(compiler, currentLiveVars,
3363 compiler->lvaTable[tree->gtLclVarCommon.gtLclNum].lvVarIndex);
3366 JITDUMP("t%u (i:%u)\n", currentLoc, interval->intervalIndex);
3368 if (!info.isLocalDefUse)
3372 LocationInfoList list(listNodePool.GetNode(currentLoc, interval, tree));
3373 bool added = operandToLocationInfoMap.AddOrUpdate(tree, list);
3376 tree->gtLsraInfo.definesAnyRegisters = true;
3383 JITDUMP(" Not added to map\n");
3384 regMaskTP candidates = getUseCandidates(tree);
3386 if (fixedAssignment != RBM_NONE)
3388 candidates = fixedAssignment;
3390 RefPosition* pos = newRefPosition(interval, currentLoc, RefTypeUse, tree, candidates);
3391 pos->isLocalDefUse = true;
3392 bool isLastUse = ((tree->gtFlags & GTF_VAR_DEATH) != 0);
3393 pos->lastUse = isLastUse;
3394 pos->setAllocateIfProfitable(tree->IsRegOptional());
3395 DBEXEC(VERBOSE, pos->dump());
3403 lsraDispNode(tree, LSRA_DUMP_REFPOS, (produce != 0));
3405 JITDUMP(" consume=%d produce=%d\n", consume, produce);
3409 // Handle the case of local variable assignment
3410 Interval* varDefInterval = nullptr;
3411 RefType defRefType = RefTypeDef;
3413 GenTree* defNode = tree;
3415 // noAdd means the node creates a def, but for purposes of map
3416 // management we do not add it, because the data is not flowing up the
3417 // tree but over (as in ASG nodes)
3419 bool noAdd = info.isLocalDefUse;
3420 RefPosition* prevPos = nullptr;
3422 bool isSpecialPutArg = false;
3424 assert(!tree->OperIsAssignment());
3425 if (tree->OperIsLocalStore())
3427 if (isCandidateLocalRef(tree))
3429 // We always push the tracked lclVar intervals
3430 varDefInterval = getIntervalForLocalVar(tree->gtLclVarCommon.gtLclNum);
3431 defRefType = refTypeForLocalRefNode(tree);
3439 assert(consume <= MAX_RET_REG_COUNT);
3442 // Get the location info for the register defined by the first operand.
3443 LocationInfoList operandDefs;
3444 bool found = operandToLocationInfoMap.TryGetValue(*(tree->OperandsBegin()), &operandDefs);
3447 // Since we only expect to consume one register, we should only have a single register to consume.
3449 assert(operandDefs.Begin()->Next() == operandDefs.End());
3451 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3453 Interval* srcInterval = operandInfo.interval;
3454 if (srcInterval->relatedInterval == nullptr)
3456 // Preference the source to the dest, unless this is a non-last-use localVar.
3457 // Note that the last-use info is not correct, but it is a better approximation than preferencing
3458 // the source to the dest, if the source's lifetime extends beyond the dest.
3459 if (!srcInterval->isLocalVar || (operandInfo.treeNode->gtFlags & GTF_VAR_DEATH) != 0)
3461 srcInterval->assignRelatedInterval(varDefInterval);
3464 else if (!srcInterval->isLocalVar)
3466 // Preference the source to dest, if src is not a local var.
3467 srcInterval->assignRelatedInterval(varDefInterval);
3470 // We can have a case where the source of the store has a different register type,
3471 // e.g. when the store is of a return value temp, and op1 is a Vector2
3472 // (TYP_SIMD8). We will need to set the
3473 // src candidates accordingly on op1 so that LSRA will generate a copy.
3474 // We could do this during Lowering, but at that point we don't know whether
3475 // this lclVar will be a register candidate, and if not, we would prefer to leave
3477 if (regType(tree->gtGetOp1()->TypeGet()) != regType(tree->TypeGet()))
3479 tree->gtGetOp1()->gtLsraInfo.setSrcCandidates(this, allRegs(tree->TypeGet()));
3483 if ((tree->gtFlags & GTF_VAR_DEATH) == 0)
3485 VarSetOps::AddElemD(compiler, currentLiveVars,
3486 compiler->lvaTable[tree->gtLclVarCommon.gtLclNum].lvVarIndex);
3490 else if (noAdd && produce == 0)
3492 // This is the case for dead nodes that occur after
3493 // tree rationalization
3494 // TODO-Cleanup: Identify and remove these dead nodes prior to register allocation.
3495 if (tree->IsMultiRegCall())
3497 // In case of multi-reg call node, produce = number of return registers
3498 produce = tree->AsCall()->GetReturnTypeDesc()->GetReturnRegCount();
3511 if (varDefInterval != nullptr)
3513 printf("t%u (i:%u) = op ", currentLoc, varDefInterval->intervalIndex);
3517 for (int i = 0; i < produce; i++)
3519 printf("t%u ", currentLoc);
3532 Interval* prefSrcInterval = nullptr;
3534 // If this is a binary operator that will be encoded with 2 operand fields
3535 // (i.e. the target is read-modify-write), preference the dst to op1.
3537 bool hasDelayFreeSrc = tree->gtLsraInfo.hasDelayFreeSrc;
3538 if (tree->OperGet() == GT_PUTARG_REG && isCandidateLocalRef(tree->gtGetOp1()) &&
3539 (tree->gtGetOp1()->gtFlags & GTF_VAR_DEATH) == 0)
3541 // This is the case for a "pass-through" copy of a lclVar. In the case where it is a non-last-use,
3542 // we don't want the def of the copy to kill the lclVar register, if it is assigned the same register
3543 // (which is actually what we hope will happen).
3544 JITDUMP("Setting putarg_reg as a pass-through of a non-last use lclVar\n");
3546 // Get the register information for the first operand of the node.
3547 LocationInfoList operandDefs;
3548 bool found = operandToLocationInfoMap.TryGetValue(*(tree->OperandsBegin()), &operandDefs);
3551 // Preference the destination to the interval of the first register defined by the first operand.
3552 Interval* srcInterval = operandDefs.Begin()->interval;
3553 assert(srcInterval->isLocalVar);
3554 prefSrcInterval = srcInterval;
3555 isSpecialPutArg = true;
3558 RefPosition* internalRefs[MaxInternalRegisters];
3560 // make intervals for all the 'internal' register requirements for this node
3561 // where internal means additional registers required temporarily
3562 int internalCount = buildInternalRegisterDefsForNode(tree, currentLoc, internalRefs);
3564 // pop all ref'd tree temps
3565 GenTreeOperandIterator iterator = tree->OperandsBegin();
3567 // `operandDefs` holds the list of `LocationInfo` values for the registers defined by the current
3568 // operand. `operandDefsIterator` points to the current `LocationInfo` value in `operandDefs`.
3569 LocationInfoList operandDefs;
3570 LocationInfoListNode* operandDefsIterator = operandDefs.End();
3571 for (int useIndex = 0; useIndex < consume; useIndex++)
3573 // If we've consumed all of the registers defined by the current operand, advance to the next
3574 // operand that defines any registers.
3575 if (operandDefsIterator == operandDefs.End())
3577 // Skip operands that do not define any registers, whether directly or indirectly.
3581 assert(iterator != tree->OperandsEnd());
3582 operand = *iterator;
3585 } while (!operand->gtLsraInfo.definesAnyRegisters);
3587 // If we have already processed a previous operand, return its `LocationInfo` list to the
3591 assert(!operandDefs.IsEmpty());
3592 listNodePool.ReturnNodes(operandDefs);
3595 // Remove the list of registers defined by the current operand from the map. Note that this
3596 // is only correct because tree nodes are singly-used: if this property ever changes (e.g.
3597 // if tree nodes are eventually allowed to be multiply-used), then the removal is only
3598 // correct at the last use.
3599 bool removed = operandToLocationInfoMap.TryRemove(operand, &operandDefs);
3602 // Move the operand def iterator to the `LocationInfo` for the first register defined by the
3604 operandDefsIterator = operandDefs.Begin();
3605 assert(operandDefsIterator != operandDefs.End());
3608 LocationInfo& locInfo = *static_cast<LocationInfo*>(operandDefsIterator);
3609 operandDefsIterator = operandDefsIterator->Next();
3611 JITDUMP("t%u ", locInfo.loc);
3613 // For interstitial tree temps, a use is always a last use and ends the interval;
3614 // this is set by default in newRefPosition.
3615 GenTree* useNode = locInfo.treeNode;
3616 assert(useNode != nullptr);
3617 var_types type = useNode->TypeGet();
3618 regMaskTP candidates = getUseCandidates(useNode);
3619 Interval* i = locInfo.interval;
3620 unsigned multiRegIdx = locInfo.multiRegIdx;
3623 // In case of multi-reg call store to a local, there won't be any mismatch of
3624 // use candidates with the type of the tree node.
3625 if (tree->OperIsLocalStore() && varDefInterval == nullptr && !useNode->IsMultiRegCall())
3627 // This is a non-candidate store. If this is a SIMD type, the use candidates
3628 // may not match the type of the tree node. If that is the case, change the
3629 // type of the tree node to match, so that we do the right kind of store.
3630 if ((candidates & allRegs(tree->gtType)) == RBM_NONE)
3632 noway_assert((candidates & allRegs(useNode->gtType)) != RBM_NONE);
3633 // Currently, the only case where this should happen is for a TYP_LONG
3634 // source and a TYP_SIMD8 target.
3635 assert((useNode->gtType == TYP_LONG && tree->gtType == TYP_SIMD8) ||
3636 (useNode->gtType == TYP_SIMD8 && tree->gtType == TYP_LONG));
3637 tree->gtType = useNode->gtType;
3640 #endif // FEATURE_SIMD
3642 bool delayRegFree = (hasDelayFreeSrc && useNode->gtLsraInfo.isDelayFree);
3643 if (useNode->gtLsraInfo.isTgtPref)
3645 prefSrcInterval = i;
3648 bool regOptionalAtUse = useNode->IsRegOptional();
3649 bool isLastUse = true;
3650 if (isCandidateLocalRef(useNode))
3652 isLastUse = ((useNode->gtFlags & GTF_VAR_DEATH) != 0);
3656 // For non-localVar uses we record nothing,
3657 // as nothing needs to be written back to the tree.
3661 regMaskTP fixedAssignment = fixedCandidateMask(type, candidates);
3662 if (fixedAssignment != RBM_NONE)
3664 candidates = fixedAssignment;
3668 if ((candidates & allRegs(i->registerType)) == 0)
3670 // This should only occur where we've got a type mismatch due to SIMD
3671 // pointer-size types that are passed & returned as longs.
3672 i->hasConflictingDefUse = true;
3673 if (fixedAssignment != RBM_NONE)
3675 // Explicitly insert a FixedRefPosition and fake the candidates, because otherwise newRefPosition
3676 // will complain about the types not matching.
3677 regNumber physicalReg = genRegNumFromMask(fixedAssignment);
3678 RefPosition* pos = newRefPosition(physicalReg, currentLoc, RefTypeFixedReg, nullptr, fixedAssignment);
3680 pos = newRefPosition(i, currentLoc, RefTypeUse, useNode, allRegs(i->registerType), multiRegIdx);
3681 pos->registerAssignment = candidates;
3685 pos = newRefPosition(i, currentLoc, RefTypeUse, useNode, candidates, multiRegIdx);
3689 hasDelayFreeSrc = true;
3690 pos->delayRegFree = true;
3695 pos->lastUse = true;
3698 if (regOptionalAtUse)
3700 pos->setAllocateIfProfitable(1);
3705 if (!operandDefs.IsEmpty())
3707 listNodePool.ReturnNodes(operandDefs);
3710 buildInternalRegisterUsesForNode(tree, currentLoc, internalRefs, internalCount);
3712 RegisterType registerType = getDefType(tree);
3713 regMaskTP candidates = getDefCandidates(tree);
3714 regMaskTP useCandidates = getUseCandidates(tree);
3719 printf("Def candidates ");
3720 dumpRegMask(candidates);
3721 printf(", Use candidates ");
3722 dumpRegMask(useCandidates);
3727 #if defined(_TARGET_AMD64_)
3728 // Multi-reg call node is the only node that could produce multi-reg value
3729 assert(produce <= 1 || (tree->IsMultiRegCall() && produce == MAX_RET_REG_COUNT));
3730 #elif defined(_TARGET_ARM_)
3731 assert(!varTypeIsMultiReg(tree->TypeGet()));
3732 #endif // _TARGET_xxx_
3734 // Add kill positions before adding def positions
3735 buildKillPositionsForNode(tree, currentLoc + 1);
3737 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3738 VARSET_TP VARSET_INIT_NOCOPY(liveLargeVectors, VarSetOps::UninitVal());
3739 if (RBM_FLT_CALLEE_SAVED != RBM_NONE)
3741 // Build RefPositions for saving any live large vectors.
3742 // This must be done after the kills, so that we know which large vectors are still live.
3743 VarSetOps::AssignNoCopy(compiler, liveLargeVectors, buildUpperVectorSaveRefPositions(tree, currentLoc));
3745 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3747 ReturnTypeDesc* retTypeDesc = nullptr;
3748 bool isMultiRegCall = tree->IsMultiRegCall();
3751 retTypeDesc = tree->AsCall()->GetReturnTypeDesc();
3752 assert((int)genCountBits(candidates) == produce);
3753 assert(candidates == retTypeDesc->GetABIReturnRegs());
3757 LocationInfoList locationInfoList;
3758 LsraLocation defLocation = currentLoc + 1;
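// Illustrative example (the ABI detail is an assumption, not taken from this code):
// for a call returning a two-eightbyte struct in RAX/RDX under the SysV AMD64 ABI,
// produce == 2 and the loop below creates one def RefPosition per return register,
// with currCandidates == {RAX} for i == 0 and {RDX} for i == 1.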
3759 for (int i = 0; i < produce; i++)
3761 regMaskTP currCandidates = candidates;
3762 Interval* interval = varDefInterval;
3764 // In the case of a multi-reg call node, registerType is given by
3765 // the type of the i-th return register.
3768 registerType = retTypeDesc->GetReturnRegType((unsigned)i);
3769 currCandidates = genRegMask(retTypeDesc->GetABIReturnReg(i));
3770 useCandidates = allRegs(registerType);
3773 if (interval == nullptr)
3775 // Make a new interval
3776 interval = newInterval(registerType);
3777 if (hasDelayFreeSrc)
3779 interval->hasNonCommutativeRMWDef = true;
3781 else if (tree->OperIsConst())
3783 assert(!tree->IsReuseRegVal());
3784 interval->isConstant = true;
3787 if ((currCandidates & useCandidates) != RBM_NONE)
3789 interval->updateRegisterPreferences(currCandidates & useCandidates);
3792 if (isSpecialPutArg)
3794 interval->isSpecialPutArg = true;
3799 assert(registerTypesEquivalent(interval->registerType, registerType));
3802 if (prefSrcInterval != nullptr)
3804 interval->assignRelatedIntervalIfUnassigned(prefSrcInterval);
3807 // for assignments, we want to create a refposition for the def
3811 locationInfoList.Append(listNodePool.GetNode(defLocation, interval, tree, (unsigned)i));
3814 RefPosition* pos = newRefPosition(interval, defLocation, defRefType, defNode, currCandidates, (unsigned)i);
3815 if (info.isLocalDefUse)
3817 pos->isLocalDefUse = true;
3818 pos->lastUse = true;
3820 DBEXEC(VERBOSE, pos->dump());
3821 interval->updateRegisterPreferences(currCandidates);
3822 interval->updateRegisterPreferences(useCandidates);
3825 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3826 buildUpperVectorRestoreRefPositions(tree, currentLoc, liveLargeVectors);
3827 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3829 bool isContainedNode =
3830 !noAdd && consume == 0 && produce == 0 && (tree->OperIsAggregate() || (tree->TypeGet() != TYP_VOID && !tree->OperIsStore()));
3831 if (isContainedNode)
3833 // Contained nodes map to the concatenated lists of their operands.
3834 for (GenTree* op : tree->Operands())
3836 if (!op->gtLsraInfo.definesAnyRegisters)
3838 assert(ComputeOperandDstCount(op) == 0);
3842 LocationInfoList operandList;
3843 bool removed = operandToLocationInfoMap.TryRemove(op, &operandList);
3846 locationInfoList.Append(operandList);
3850 if (!locationInfoList.IsEmpty())
3852 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
3854 tree->gtLsraInfo.definesAnyRegisters = true;
3858 // make an interval for each physical register
3859 void LinearScan::buildPhysRegRecords()
3861 RegisterType regType = IntRegisterType;
3862 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
3864 RegRecord* curr = &physRegs[reg];
3869 BasicBlock* getNonEmptyBlock(BasicBlock* block)
3871 while (block != nullptr && block->bbTreeList == nullptr)
3873 BasicBlock* nextBlock = block->bbNext;
3874 // Note that here we use the version of NumSucc that does not take a compiler.
3875 // That way this doesn't have to take a compiler, or be an instance method, e.g. of LinearScan.
3876 // If we have an empty block, it must have jump type BBJ_NONE or BBJ_ALWAYS, in which
3877 // case we don't need the version that takes a compiler.
3878 assert(block->NumSucc() == 1 && ((block->bbJumpKind == BBJ_ALWAYS) || (block->bbJumpKind == BBJ_NONE)));
3879 // sometimes the first block is empty and ends with an uncond branch
3880 // assert( block->GetSucc(0) == nextBlock);
3883 assert(block != nullptr && block->bbTreeList != nullptr);
3887 void LinearScan::insertZeroInitRefPositions()
3889 // insert defs for this, then a block boundary
3891 VARSET_ITER_INIT(compiler, iter, compiler->fgFirstBB->bbLiveIn, varIndex);
3892 while (iter.NextElem(compiler, &varIndex))
3894 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3895 LclVarDsc* varDsc = compiler->lvaTable + varNum;
3896 if (!varDsc->lvIsParam && isCandidateVar(varDsc) &&
3897 (compiler->info.compInitMem || varTypeIsGC(varDsc->TypeGet())))
3899 GenTree* firstNode = getNonEmptyBlock(compiler->fgFirstBB)->firstNode();
3900 JITDUMP("V%02u was live in\n", varNum);
3901 Interval* interval = getIntervalForLocalVar(varNum);
3903 newRefPosition(interval, MinLocation, RefTypeZeroInit, firstNode, allRegs(interval->registerType));
3904 varDsc->lvMustInit = true;
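// Illustrative example (a sketch): a GC-ref local that is live-in to the first block
// and is not a parameter gets a RefTypeZeroInit RefPosition at MinLocation here, and
// lvMustInit ensures the prolog zero-initializes it; that RefPosition then acts as
// the definition against which later uses are resolved.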
3909 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
3910 // -----------------------------------------------------------------------
3911 // Sets the register state for an argument of type STRUCT for System V systems.
3912 // See Compiler::raUpdateRegStateForArg(RegState *regState, LclVarDsc *argDsc) in regalloc.cpp
3913 // for how state for argument is updated for unix non-structs and Windows AMD64 structs.
3914 void LinearScan::unixAmd64UpdateRegStateForArg(LclVarDsc* argDsc)
3916 assert(varTypeIsStruct(argDsc));
3917 RegState* intRegState = &compiler->codeGen->intRegState;
3918 RegState* floatRegState = &compiler->codeGen->floatRegState;
3920 if ((argDsc->lvArgReg != REG_STK) && (argDsc->lvArgReg != REG_NA))
3922 if (genRegMask(argDsc->lvArgReg) & (RBM_ALLFLOAT))
3924 assert(genRegMask(argDsc->lvArgReg) & (RBM_FLTARG_REGS));
3925 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
3929 assert(genRegMask(argDsc->lvArgReg) & (RBM_ARG_REGS));
3930 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
3934 if ((argDsc->lvOtherArgReg != REG_STK) && (argDsc->lvOtherArgReg != REG_NA))
3936 if (genRegMask(argDsc->lvOtherArgReg) & (RBM_ALLFLOAT))
3938 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_FLTARG_REGS));
3939 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
3943 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_ARG_REGS));
3944 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
3949 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
3951 //------------------------------------------------------------------------
3952 // updateRegStateForArg: Updates rsCalleeRegArgMaskLiveIn for the appropriate
3953 // regState (either compiler->intRegState or compiler->floatRegState),
3954 // with the lvArgReg on "argDsc"
3957 //    argDsc - the argument for which the state is to be updated.
3959 // Return Value: None
3962 //    The argument is live on entry to the function
3963 //    (or is untracked and therefore assumed live)
3966 // This relies on a method in regAlloc.cpp that is shared between LSRA
3967 // and regAlloc. It is further abstracted here because regState is updated
3968 // separately for tracked and untracked variables in LSRA.
3970 void LinearScan::updateRegStateForArg(LclVarDsc* argDsc)
3972 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
3973 // For System V AMD64 calls the argDsc can have 2 registers (for structs.)
3974 // Handle them here.
3975 if (varTypeIsStruct(argDsc))
3977 unixAmd64UpdateRegStateForArg(argDsc);
3980 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
3982 RegState* intRegState = &compiler->codeGen->intRegState;
3983 RegState* floatRegState = &compiler->codeGen->floatRegState;
3984 // In the case of AMD64 we'll still use the floating point registers
3985 // to model the register usage for argument on vararg calls, so
3986 // we will ignore the varargs condition to determine whether we use
3987 // XMM registers or not for setting up the call.
3988 bool isFloat = (isFloatRegType(argDsc->lvType)
3989 #ifndef _TARGET_AMD64_
3990 && !compiler->info.compIsVarArgs
3994 if (argDsc->lvIsHfaRegArg())
4001 JITDUMP("Float arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4002 compiler->raUpdateRegStateForArg(floatRegState, argDsc);
4006 JITDUMP("Int arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4007 #if FEATURE_MULTIREG_ARGS
4008 if (argDsc->lvOtherArgReg != REG_NA)
4010 JITDUMP("(second half) in reg %s\n", getRegName(argDsc->lvOtherArgReg));
4012 #endif // FEATURE_MULTIREG_ARGS
4013 compiler->raUpdateRegStateForArg(intRegState, argDsc);
4018 //------------------------------------------------------------------------
4019 // findPredBlockForLiveIn: Determine which block should be used for the register locations of the live-in variables.
4022 //    block                  - The block for which we're selecting a predecessor.
4023 //    prevBlock              - The previous block in allocation order.
4024 //    pPredBlockIsAllocated  - A debug-only argument that indicates whether any of the predecessors have been seen
4025 //                             in allocation order.
4028 //    The selected predecessor.
4031 //    In DEBUG, the caller initializes *pPredBlockIsAllocated to false, and it will be set to true if the block
4032 //    returned is in fact a predecessor.
4035 //    This will select a predecessor based on the heuristics obtained by getLsraBlockBoundaryLocations(), which can be:
4037 //    LSRA_BLOCK_BOUNDARY_PRED   - Use the register locations of a predecessor block (default)
4038 //    LSRA_BLOCK_BOUNDARY_LAYOUT - Use the register locations of the previous block in layout order.
4039 //                                 This is the only case where this actually returns a different block.
4040 //    LSRA_BLOCK_BOUNDARY_ROTATE - Rotate the register locations from a predecessor.
4041 //                                 For this case, the block returned is the same as for LSRA_BLOCK_BOUNDARY_PRED, but
4042 //                                 the register locations will be "rotated" to stress the resolution and allocation
4045 BasicBlock* LinearScan::findPredBlockForLiveIn(BasicBlock* block,
4046 BasicBlock* prevBlock DEBUGARG(bool* pPredBlockIsAllocated))
4048 BasicBlock* predBlock = nullptr;
4050 assert(*pPredBlockIsAllocated == false);
4051 if (getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_LAYOUT)
4053 if (prevBlock != nullptr)
4055 predBlock = prevBlock;
4060 if (block != compiler->fgFirstBB)
4062 predBlock = block->GetUniquePred(compiler);
4063 if (predBlock != nullptr)
4065 if (isBlockVisited(predBlock))
4067 if (predBlock->bbJumpKind == BBJ_COND)
4069 // Special handling to improve matching on backedges.
4070 BasicBlock* otherBlock = (block == predBlock->bbNext) ? predBlock->bbJumpDest : predBlock->bbNext;
4071 noway_assert(otherBlock != nullptr);
4072 if (isBlockVisited(otherBlock))
4074 // This is the case when we have a conditional branch where one target has already
4075 // been visited. It would be best to use the same incoming regs as that block,
4076 // so that we have less likelihood of having to move registers.
4077 // For example, in determining the block to use for the starting register locations for
4078 // "block" in the following example, we'd like to use the same predecessor for "block"
4079 // as for "otherBlock", so that both successors of predBlock have the same locations, reducing
4080 // the likelihood of needing a split block on a backedge:
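// An illustrative sketch of the shape described above (the original diagram is not
// reproduced here; this is an approximation):
//
//                predBlock (BBJ_COND)
//                /                \
//               v                  v
//            block             otherBlock (already visited)
//
// In this case we pick, for "block", the same predecessor that "otherBlock" used for
// its incoming locations, so both successors of predBlock start from the same map.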
4091 for (flowList* pred = otherBlock->bbPreds; pred != nullptr; pred = pred->flNext)
4093 BasicBlock* otherPred = pred->flBlock;
4094 if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum)
4096 predBlock = otherPred;
4105 predBlock = nullptr;
4110 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
4112 BasicBlock* candidatePredBlock = pred->flBlock;
4113 if (isBlockVisited(candidatePredBlock))
4115 if (predBlock == nullptr || predBlock->bbWeight < candidatePredBlock->bbWeight)
4117 predBlock = candidatePredBlock;
4118 INDEBUG(*pPredBlockIsAllocated = true;)
4123 if (predBlock == nullptr)
4125 predBlock = prevBlock;
4126 assert(predBlock != nullptr);
4127 JITDUMP("\n\nNo allocated predecessor; ");
4133 void LinearScan::buildIntervals()
4137 // start numbering at 1; 0 is the entry
4138 LsraLocation currentLoc = 1;
4140 JITDUMP("\nbuildIntervals ========\n");
4142 // Now build (empty) records for all of the physical registers
4143 buildPhysRegRecords();
4148 printf("\n-----------------\n");
4149 printf("LIVENESS:\n");
4150 printf("-----------------\n");
4151 foreach_block(compiler, block)
4153 printf("BB%02u use def in out\n", block->bbNum);
4154 dumpConvertedVarSet(compiler, block->bbVarUse);
4156 dumpConvertedVarSet(compiler, block->bbVarDef);
4158 dumpConvertedVarSet(compiler, block->bbLiveIn);
4160 dumpConvertedVarSet(compiler, block->bbLiveOut);
4166 identifyCandidates();
4168 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_PRE));
4171 JITDUMP("\nbuildIntervals second part ========\n");
4174 // Next, create ParamDef RefPositions for all the tracked parameters,
4175 // in order of their varIndex
4178 unsigned int lclNum;
4180 RegState* intRegState = &compiler->codeGen->intRegState;
4181 RegState* floatRegState = &compiler->codeGen->floatRegState;
4182 intRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4183 floatRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4185 for (unsigned int varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
4187 lclNum = compiler->lvaTrackedToVarNum[varIndex];
4188 argDsc = &(compiler->lvaTable[lclNum]);
4190 if (!argDsc->lvIsParam)
4195 // Only reserve a register if the argument is actually used.
4196 // Is it dead on entry? If compJmpOpUsed is true, then the arguments
4197 // have to be kept alive, so we have to consider it as live on entry.
4198 // Use lvRefCnt instead of checking bbLiveIn because if it's volatile we
4199 // won't have done dataflow on it, but it needs to be marked as live-in so
4200 // it will get saved in the prolog.
4201 if (!compiler->compJmpOpUsed && argDsc->lvRefCnt == 0 && !compiler->opts.compDbgCode)
4206 if (argDsc->lvIsRegArg)
4208 updateRegStateForArg(argDsc);
4211 if (isCandidateVar(argDsc))
4213 Interval* interval = getIntervalForLocalVar(lclNum);
4214 regMaskTP mask = allRegs(TypeGet(argDsc));
4215 if (argDsc->lvIsRegArg)
4217 // Set this interval as currently assigned to that register
4218 regNumber inArgReg = argDsc->lvArgReg;
4219 assert(inArgReg < REG_COUNT);
4220 mask = genRegMask(inArgReg);
4221 assignPhysReg(inArgReg, interval);
4223 RefPosition* pos = newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, mask);
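// Illustrative example (the specific register is an assumption about the Windows x64
// ABI): an enregistered int parameter passed in RCX gets a RefTypeParamDef at
// MinLocation with mask == {RCX}, and the assignPhysReg call above records the
// interval as currently residing in RCX so the first use need not reload it.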
4225 else if (varTypeIsStruct(argDsc->lvType))
4227 for (unsigned fieldVarNum = argDsc->lvFieldLclStart;
4228 fieldVarNum < argDsc->lvFieldLclStart + argDsc->lvFieldCnt; ++fieldVarNum)
4230 LclVarDsc* fieldVarDsc = &(compiler->lvaTable[fieldVarNum]);
4231 if (fieldVarDsc->lvLRACandidate)
4233 Interval* interval = getIntervalForLocalVar(fieldVarNum);
4235 newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, allRegs(TypeGet(fieldVarDsc)));
4241 // We can overwrite the register (i.e. codegen saves it on entry)
4242 assert(argDsc->lvRefCnt == 0 || !argDsc->lvIsRegArg || argDsc->lvDoNotEnregister ||
4243 !argDsc->lvLRACandidate || (varTypeIsFloating(argDsc->TypeGet()) && compiler->opts.compDbgCode));
4247 // Now set up the reg state for the non-tracked args
4248 // (We do this here because we want to generate the ParamDef RefPositions in tracked
4249 // order, so that loop doesn't hit the non-tracked args)
4251 for (unsigned argNum = 0; argNum < compiler->info.compArgsCount; argNum++, argDsc++)
4253 argDsc = &(compiler->lvaTable[argNum]);
4255 if (argDsc->lvPromotedStruct())
4257 noway_assert(argDsc->lvFieldCnt == 1); // We only handle one field here
4259 unsigned fieldVarNum = argDsc->lvFieldLclStart;
4260 argDsc = &(compiler->lvaTable[fieldVarNum]);
4262 noway_assert(argDsc->lvIsParam);
4263 if (!argDsc->lvTracked && argDsc->lvIsRegArg)
4265 updateRegStateForArg(argDsc);
4269 // If there is a secret stub param, it is also live in
4270 if (compiler->info.compPublishStubParam)
4272 intRegState->rsCalleeRegArgMaskLiveIn |= RBM_SECRET_STUB_PARAM;
4275 LocationInfoListNodePool listNodePool(compiler, 8);
4276 SmallHashTable<GenTree*, LocationInfoList, 32> operandToLocationInfoMap(compiler);
4278 BasicBlock* predBlock = nullptr;
4279 BasicBlock* prevBlock = nullptr;
4281 // Initialize currentLiveVars to the empty set. We will set it to the current
4282 // live-in at the entry to each block (this will include the incoming args on
4283 // the first block).
4284 VarSetOps::AssignNoCopy(compiler, currentLiveVars, VarSetOps::MakeEmpty(compiler));
4286 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
4288 JITDUMP("\nNEW BLOCK BB%02u\n", block->bbNum);
4290 bool predBlockIsAllocated = false;
4291 predBlock = findPredBlockForLiveIn(block, prevBlock DEBUGARG(&predBlockIsAllocated));
4293 if (block == compiler->fgFirstBB)
4295 insertZeroInitRefPositions();
4298 // Determine if we need any DummyDefs.
4299 // We need DummyDefs for cases where "predBlock" isn't really a predecessor.
4300 // Note that it's possible to have uses of uninitialized variables, in which case even the first
4301 // block may require DummyDefs, which we are not currently adding - this means that these variables
4302 // will always be considered to be in memory on entry (and reloaded when the use is encountered).
4303 // TODO-CQ: Consider how best to tune this. Currently, if we create DummyDefs for uninitialized
4304 // variables (which may actually be initialized along the dynamically executed paths, but not
4305 // on all static paths), we wind up with excessive liveranges for some of these variables.
4306 VARSET_TP VARSET_INIT(compiler, newLiveIn, block->bbLiveIn);
4309 JITDUMP("\n\nSetting incoming variable registers of BB%02u to outVarToRegMap of BB%02u\n", block->bbNum,
4311 assert(predBlock->bbNum <= bbNumMaxBeforeResolution);
4312 blockInfo[block->bbNum].predBBNum = predBlock->bbNum;
4313 // Compute set difference: newLiveIn = block->bbLiveIn - predBlock->bbLiveOut
4314 VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
4316 bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
4318 // Create dummy def RefPositions
4322 // If we are using locations from a predecessor, we should never require DummyDefs.
4323 assert(!predBlockIsAllocated);
4325 JITDUMP("Creating dummy definitions\n");
4326 VARSET_ITER_INIT(compiler, iter, newLiveIn, varIndex);
4327 while (iter.NextElem(compiler, &varIndex))
4329 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4330 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4331 // Add a dummyDef for any candidate vars that are in the "newLiveIn" set.
4332 // If this is the entry block, don't add any incoming parameters (they're handled with ParamDefs).
4333 if (isCandidateVar(varDsc) && (predBlock != nullptr || !varDsc->lvIsParam))
4335 Interval* interval = getIntervalForLocalVar(varNum);
4337 newRefPosition(interval, currentLoc, RefTypeDummyDef, nullptr, allRegs(interval->registerType));
4340 JITDUMP("Finished creating dummy definitions\n\n");
4343 // Add a dummy RefPosition to mark the block boundary.
4344 // Note that we do this AFTER adding the exposed uses above, because the
4345 // register positions for those exposed uses need to be recorded at
4348 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4350 VarSetOps::Assign(compiler, currentLiveVars, block->bbLiveIn);
4352 LIR::Range& blockRange = LIR::AsRange(block);
4353 for (GenTree* node : blockRange.NonPhiNodes())
4355 assert(node->gtLsraInfo.loc >= currentLoc);
4356 assert(((node->gtLIRFlags & LIR::Flags::IsUnusedValue) == 0) || node->gtLsraInfo.isLocalDefUse);
4358 currentLoc = node->gtLsraInfo.loc;
4359 buildRefPositionsForNode(node, block, listNodePool, operandToLocationInfoMap, currentLoc);
4362 if (currentLoc > maxNodeLocation)
4364 maxNodeLocation = currentLoc;
4369 // Increment the LsraLocation at this point, so that the dummy RefPositions
4370 // will not have the same LsraLocation as any "real" RefPosition.
4373 // Note: the visited set is cleared in LinearScan::doLinearScan()
4374 markBlockVisited(block);
4376 // Insert exposed uses for a lclVar that is live-out of 'block' but not live-in to the
4377 // next block, or any unvisited successors.
4378 // This will address lclVars that are live on a backedge, as well as those that are kept
4379 // live at a GT_JMP.
4381 // Blocks ending with "jmp method" are marked as BBJ_HAS_JMP,
4382 // and a jmp call is represented using a GT_JMP node, which is a leaf node.
4383 // The liveness phase keeps all the arguments of the method live until the end of
4384 // the block by adding them to the liveout set of the block containing the GT_JMP.
4386 // The target of a GT_JMP implicitly uses all the current method arguments, however
4387 // there are no actual references to them. This can cause LSRA to assert, because
4388 // the variables are live but it sees no references. In order to correctly model the
4389 // liveness of these arguments, we add dummy exposed uses, in the same manner as for
4390 // backward branches. This will happen automatically via expUseSet.
4392 // Note that a block ending with GT_JMP has no successors and hence the variables
4393 // for which dummy use ref positions are added are arguments of the method.
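// Illustrative example (a sketch): a loop variable V that is live-out of this block
// only along a backedge to an already-visited header stays in expUseSet below and
// receives a RefTypeExpUse at the end of the block, keeping its interval live across
// the backedge even though no node in this block reads it after its last real use.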
4395 VARSET_TP VARSET_INIT(compiler, expUseSet, block->bbLiveOut);
4396 BasicBlock* nextBlock = getNextBlock();
4397 if (nextBlock != nullptr)
4399 VarSetOps::DiffD(compiler, expUseSet, nextBlock->bbLiveIn);
4401 AllSuccessorIter succsEnd = block->GetAllSuccs(compiler).end();
4402 for (AllSuccessorIter succs = block->GetAllSuccs(compiler).begin();
4403 succs != succsEnd && !VarSetOps::IsEmpty(compiler, expUseSet); ++succs)
4405 BasicBlock* succ = (*succs);
4406 if (isBlockVisited(succ))
4410 VarSetOps::DiffD(compiler, expUseSet, succ->bbLiveIn);
4413 if (!VarSetOps::IsEmpty(compiler, expUseSet))
4415 JITDUMP("Exposed uses:");
4416 VARSET_ITER_INIT(compiler, iter, expUseSet, varIndex);
4417 while (iter.NextElem(compiler, &varIndex))
4419 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4420 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4421 if (isCandidateVar(varDsc))
4423 Interval* interval = getIntervalForLocalVar(varNum);
4425 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4426 JITDUMP(" V%02u", varNum);
4432 // Identify the last uses of each variable, except in the case of MinOpts, where all vars
4433 // are kept live everywhere.
4435 if (!compiler->opts.MinOpts())
4444 dumpConvertedVarSet(compiler, block->bbVarUse);
4446 dumpConvertedVarSet(compiler, block->bbVarDef);
4454 // If we need to KeepAliveAndReportThis, add a dummy exposed use of it at the end
4455 if (compiler->lvaKeepAliveAndReportThis())
4457 unsigned keepAliveVarNum = compiler->info.compThisArg;
4458 assert(compiler->info.compIsStatic == false);
4459 if (isCandidateVar(&compiler->lvaTable[keepAliveVarNum]))
4461 JITDUMP("Adding exposed use of this, for lvaKeepAliveAndReportThis\n");
4462 Interval* interval = getIntervalForLocalVar(keepAliveVarNum);
4464 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4469 if (getLsraExtendLifeTimes())
4472 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
4474 if (varDsc->lvLRACandidate)
4476 JITDUMP("Adding exposed use of V%02u for LsraExtendLifetimes\n", lclNum);
4477 Interval* interval = getIntervalForLocalVar(lclNum);
4479 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4485 // If the last block has successors, create a RefTypeBB to record
4488 if (prevBlock->NumSucc(compiler) > 0)
4490 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4494 // Make sure we don't have any blocks that were not visited
4495 foreach_block(compiler, block)
4497 assert(isBlockVisited(block));
4502 lsraDumpIntervals("BEFORE VALIDATING INTERVALS");
4503 dumpRefPositions("BEFORE VALIDATING INTERVALS");
4504 validateIntervals();
4510 void LinearScan::dumpVarRefPositions(const char* title)
4512 printf("\nVAR REFPOSITIONS %s\n", title);
4514 for (unsigned i = 0; i < compiler->lvaCount; i++)
4516 Interval* interval = getIntervalForLocalVar(i);
4517 printf("--- V%02u\n", i);
4519 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4528 void LinearScan::validateIntervals()
4530 for (unsigned i = 0; i < compiler->lvaCount; i++)
4532 Interval* interval = getIntervalForLocalVar(i);
4534 bool defined = false;
4535 printf("-----------------\n");
4536 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4539 RefType refType = ref->refType;
4540 if (!defined && RefTypeIsUse(refType))
4542 if (compiler->info.compMethodName != nullptr)
4544 printf("%s: ", compiler->info.compMethodName);
4546 printf("LocalVar V%02u: undefined use at %u\n", i, ref->nodeLocation);
4548 // Note that there can be multiple last uses if they are on disjoint paths,
4549 // so we can't really check the lastUse flag
4554 if (RefTypeIsDef(refType))
4563 // Set the default rpFrameType based upon codeGen->isFramePointerRequired()
4564 // This was lifted from the register predictor
4566 void LinearScan::setFrameType()
4568 FrameType frameType = FT_NOT_SET;
4569 if (compiler->codeGen->isFramePointerRequired())
4571 frameType = FT_EBP_FRAME;
4575 if (compiler->rpMustCreateEBPCalled == false)
4580 compiler->rpMustCreateEBPCalled = true;
4581 if (compiler->rpMustCreateEBPFrame(INDEBUG(&reason)))
4583 JITDUMP("; Decided to create an EBP based frame for ETW stackwalking (%s)\n", reason);
4584 compiler->codeGen->setFrameRequired(true);
4588 if (compiler->codeGen->isFrameRequired())
4590 frameType = FT_EBP_FRAME;
4594 frameType = FT_ESP_FRAME;
4599 // The DOUBLE_ALIGN feature indicates whether the JIT will attempt to double-align the
4600 // frame if needed. Note that this feature isn't on for amd64, because the stack is
4601 // always double-aligned by default.
4602 compiler->codeGen->setDoubleAlign(false);
4604 // TODO-CQ: Tune this (see regalloc.cpp, in which raCntWtdStkDblStackFP is used to
4605 // determine whether to double-align). Note, though, that there is at least one test
4606 // (jit\opt\Perf\DoubleAlign\Locals.exe) that depends on double-alignment being set
4607 // in certain situations.
4608 if (!compiler->opts.MinOpts() && !compiler->codeGen->isFramePointerRequired() && compiler->compFloatingPointUsed)
4610 frameType = FT_DOUBLE_ALIGN_FRAME;
4612 #endif // DOUBLE_ALIGN
4617 noway_assert(!compiler->codeGen->isFramePointerRequired());
4618 noway_assert(!compiler->codeGen->isFrameRequired());
4619 compiler->codeGen->setFramePointerUsed(false);
4622 compiler->codeGen->setFramePointerUsed(true);
4625 case FT_DOUBLE_ALIGN_FRAME:
4626 noway_assert(!compiler->codeGen->isFramePointerRequired());
4627 compiler->codeGen->setFramePointerUsed(false);
4628 compiler->codeGen->setDoubleAlign(true);
4630 #endif // DOUBLE_ALIGN
4632 noway_assert(!"rpFrameType not set correctly!");
4636 // If we are using FPBASE as the frame register, we cannot also use it for
4637 // a local var. Note that we may have already added it to the register masks,
4638 // which are computed when the LinearScan class constructor is created, and
4639 // used during lowering. Luckily, the TreeNodeInfo only stores an index to
4640 // the masks stored in the LinearScan class, so we only need to walk the
4641 // unique masks and remove FPBASE.
4642 if (frameType == FT_EBP_FRAME)
4644 if ((availableIntRegs & RBM_FPBASE) != 0)
4646 RemoveRegisterFromMasks(REG_FPBASE);
4648 // We know that we're already in "read mode" for availableIntRegs. However,
4649 // we need to remove the FPBASE register, so subsequent users (like callers
4650 // to allRegs()) get the right thing. The RemoveRegisterFromMasks() code
4651 // fixes up everything that already took a dependency on the value that was
4652 // previously read, so this completes the picture.
4653 availableIntRegs.OverrideAssign(availableIntRegs & ~RBM_FPBASE);
4657 compiler->rpFrameType = frameType;
4660 // Is the copyReg given by this RefPosition still busy at the given location?
4662 bool copyRegInUse(RefPosition* ref, LsraLocation loc)
4664 assert(ref->copyReg);
4665 if (ref->getRefEndLocation() >= loc)
4669 Interval* interval = ref->getInterval();
4670 RefPosition* nextRef = interval->getNextRefPosition();
4671 if (nextRef != nullptr && nextRef->treeNode == ref->treeNode && nextRef->getRefEndLocation() >= loc)
4678 // Determine whether the register represented by "physRegRecord" is available at least
4679 // at the "currentLoc", and if so, return the next location at which it is in use in
4680 // "nextRefLocationPtr"
4682 bool LinearScan::registerIsAvailable(RegRecord* physRegRecord,
4683 LsraLocation currentLoc,
4684 LsraLocation* nextRefLocationPtr,
4685 RegisterType regType)
4687 *nextRefLocationPtr = MaxLocation;
4688 LsraLocation nextRefLocation = MaxLocation;
4689 regMaskTP regMask = genRegMask(physRegRecord->regNum);
4690 if (physRegRecord->isBusyUntilNextKill)
4695 RefPosition* nextPhysReference = physRegRecord->getNextRefPosition();
4696 if (nextPhysReference != nullptr)
4698 nextRefLocation = nextPhysReference->nodeLocation;
4699 // if (nextPhysReference->refType == RefTypeFixedReg) nextRefLocation--;
4701 else if (!physRegRecord->isCalleeSave)
4703 nextRefLocation = MaxLocation - 1;
4706 Interval* assignedInterval = physRegRecord->assignedInterval;
4708 if (assignedInterval != nullptr)
4710 RefPosition* recentReference = assignedInterval->recentRefPosition;
4712 // The only case where we have an assignedInterval, but recentReference is null
4713 // is where this interval is live at procedure entry (i.e. an arg register), in which
4714 // case it's still live and its assigned register is not available
4715 // (Note that the ParamDef will be recorded as a recentReference when we encounter
4716 // it, but we will be allocating registers, potentially to other incoming parameters,
4717 // as we process the ParamDefs.)
4719 if (recentReference == nullptr)
4724 // Is this a copyReg? It is if the register assignment doesn't match.
4725 // (the recentReference may not be a copyReg, because we could have seen another
4726 // reference since the copyReg)
4728 if (!assignedInterval->isAssignedTo(physRegRecord->regNum))
4730 // Don't reassign it if it's still in use
4731 if (recentReference->copyReg && copyRegInUse(recentReference, currentLoc))
4736 else if (!assignedInterval->isActive && assignedInterval->isConstant)
4738 // Treat this as unassigned, i.e. do nothing.
4739 // TODO-CQ: Consider adjusting the heuristics (probably in the caller of this method)
4740 // to avoid reusing these registers.
4742 // If this interval isn't active, it's available if it isn't referenced
4743 // at this location (or the previous location, if the recent RefPosition
4744 // is a delayRegFree).
4745 else if (!assignedInterval->isActive &&
4746 (recentReference->refType == RefTypeExpUse || recentReference->getRefEndLocation() < currentLoc))
4748 // This interval must have a next reference (otherwise it wouldn't be assigned to this register)
4749 RefPosition* nextReference = recentReference->nextRefPosition;
4750 if (nextReference != nullptr)
4752 if (nextReference->nodeLocation < nextRefLocation)
4754 nextRefLocation = nextReference->nodeLocation;
4759 assert(recentReference->copyReg && recentReference->registerAssignment != regMask);
4767 if (nextRefLocation < *nextRefLocationPtr)
4769 *nextRefLocationPtr = nextRefLocation;
4773 if (regType == TYP_DOUBLE)
4775 // Recurse, but check the other half this time (TYP_FLOAT)
4776 if (!registerIsAvailable(getRegisterRecord(REG_NEXT(physRegRecord->regNum)), currentLoc, nextRefLocationPtr,
4779 nextRefLocation = *nextRefLocationPtr;
4781 #endif // _TARGET_ARM_
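// Illustrative note (ARM32 assumed): a TYP_DOUBLE occupies a consecutive pair of
// float registers (e.g. d0 overlays s0 and s1), so the recursion above also checks
// the other half, and the register is reported available only if both halves are.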
4783 return (nextRefLocation >= currentLoc);
4786 //------------------------------------------------------------------------
4787 // getRegisterType: Get the RegisterType to use for the given RefPosition
4790 //    currentInterval: The interval for the current allocation
4791 //    refPosition:     The RefPosition of the current Interval for which a register is being allocated
4794 //    The RegisterType that should be allocated for this RefPosition
4797 //    This will nearly always be identical to the registerType of the interval, except in the case
4798 //    of SIMD types of 8 bytes (currently only Vector2) when they are passed and returned in integer
4799 //    registers, or copied to a return temp.
4800 //    This method need only be called in situations where we may be dealing with the register requirements
4801 //    of a RefTypeUse RefPosition (i.e. not when we are only looking at the type of an interval, nor when
4802 //    we are interested in the "defining" type of the interval). This is because the situation of interest
4803 //    only happens at the use (where it must be copied to an integer register).
4805 RegisterType LinearScan::getRegisterType(Interval* currentInterval, RefPosition* refPosition)
4807 assert(refPosition->getInterval() == currentInterval);
4808 RegisterType regType = currentInterval->registerType;
4809 regMaskTP candidates = refPosition->registerAssignment;
4810 #if defined(FEATURE_SIMD) && defined(_TARGET_AMD64_)
4811 if ((candidates & allRegs(regType)) == RBM_NONE)
4813 assert((regType == TYP_SIMD8) && (refPosition->refType == RefTypeUse) &&
4814 ((candidates & allRegs(TYP_INT)) != RBM_NONE));
4817 #else // !(defined(FEATURE_SIMD) && defined(_TARGET_AMD64_))
4818 assert((candidates & allRegs(regType)) != RBM_NONE);
4819 #endif // !(defined(FEATURE_SIMD) && defined(_TARGET_AMD64_))
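// Illustrative example (a sketch of the SIMD8 case the assert above allows): a
// TYP_SIMD8 (Vector2) value whose use must go in an integer register - e.g. when it
// is passed or returned as a long - has RefTypeUse candidates containing only integer
// registers, so this method yields an integer RegisterType for that use even though
// the interval itself is TYP_SIMD8.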
4823 //------------------------------------------------------------------------
4824 // tryAllocateFreeReg: Find a free register that satisfies the requirements for refPosition,
4825 // and takes into account the preferences for the given Interval
4828 //    currentInterval: The interval for the current allocation
4829 //    refPosition:     The RefPosition of the current Interval for which a register is being allocated
4832 //    The regNumber, if any, allocated to the RefPosition. Returns REG_NA if no free register is found.
4835 //    TODO-CQ: Consider whether we need to use a different order for tree temps than for vars, as
4838 static const regNumber lsraRegOrder[] = {REG_VAR_ORDER};
4839 const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
4840 static const regNumber lsraRegOrderFlt[] = {REG_VAR_ORDER_FLT};
4841 const unsigned lsraRegOrderFltSize = ArrLen(lsraRegOrderFlt);
4843 regNumber LinearScan::tryAllocateFreeReg(Interval* currentInterval, RefPosition* refPosition)
4845 regNumber foundReg = REG_NA;
4847 RegisterType regType = getRegisterType(currentInterval, refPosition);
4848 const regNumber* regOrder;
4849 unsigned regOrderSize;
4850 if (useFloatReg(regType))
4852 regOrder = lsraRegOrderFlt;
4853 regOrderSize = lsraRegOrderFltSize;
4857 regOrder = lsraRegOrder;
4858 regOrderSize = lsraRegOrderSize;
4861 LsraLocation currentLocation = refPosition->nodeLocation;
4862 RefPosition* nextRefPos = refPosition->nextRefPosition;
4863 LsraLocation nextLocation = (nextRefPos == nullptr) ? currentLocation : nextRefPos->nodeLocation;
4864 regMaskTP candidates = refPosition->registerAssignment;
4865 regMaskTP preferences = currentInterval->registerPreferences;
4867 if (RefTypeIsDef(refPosition->refType))
4869 if (currentInterval->hasConflictingDefUse)
4871 resolveConflictingDefAndUse(currentInterval, refPosition);
4872 candidates = refPosition->registerAssignment;
4874 // Otherwise, check for the case of a fixed-reg def of a reg that will be killed before the
4875 // use, or interferes at the point of use (which shouldn't happen, but Lower doesn't mark
4876 // the contained nodes as interfering).
4877 // Note that we may have a ParamDef RefPosition that is marked isFixedRegRef, but which
4878 // has had its registerAssignment changed to no longer be a single register.
4879 else if (refPosition->isFixedRegRef && nextRefPos != nullptr && RefTypeIsUse(nextRefPos->refType) &&
4880 !nextRefPos->isFixedRegRef && genMaxOneBit(refPosition->registerAssignment))
4882 regNumber defReg = refPosition->assignedReg();
4883 RegRecord* defRegRecord = getRegisterRecord(defReg);
4885 RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
4886 assert(currFixedRegRefPosition != nullptr &&
4887 currFixedRegRefPosition->nodeLocation == refPosition->nodeLocation);
4889 // If there is another fixed reference to this register before the use, change the candidates
4890 // on this RefPosition to include that of nextRefPos.
4891 if (currFixedRegRefPosition->nextRefPosition != nullptr &&
4892 currFixedRegRefPosition->nextRefPosition->nodeLocation <= nextRefPos->getRefEndLocation())
4894 candidates |= nextRefPos->registerAssignment;
4895 if (preferences == refPosition->registerAssignment)
4897 preferences = candidates;
4903 preferences &= candidates;
4904 if (preferences == RBM_NONE)
4906 preferences = candidates;
4908 regMaskTP relatedPreferences = RBM_NONE;
4911 candidates = stressLimitRegs(refPosition, candidates);
4913 bool mustAssignARegister = true;
4914 assert(candidates != RBM_NONE);
4916 // If the related interval has no further references, it is possible that it is a source of the
4917 // node that produces this interval. However, we don't want to use the relatedInterval for preferencing
4918 // if its next reference is not a new definition (as it either is or will become live).
4919 Interval* relatedInterval = currentInterval->relatedInterval;
4920 if (relatedInterval != nullptr)
4922 RefPosition* nextRelatedRefPosition = relatedInterval->getNextRefPosition();
4923 if (nextRelatedRefPosition != nullptr)
4925 // Don't use the relatedInterval for preferencing if its next reference is not a new definition.
4926 if (!RefTypeIsDef(nextRelatedRefPosition->refType))
4928 relatedInterval = nullptr;
4930 // Is the relatedInterval simply a copy to another relatedInterval?
4931 else if ((relatedInterval->relatedInterval != nullptr) &&
4932 (nextRelatedRefPosition->nextRefPosition != nullptr) &&
4933 (nextRelatedRefPosition->nextRefPosition->nextRefPosition == nullptr) &&
4934 (nextRelatedRefPosition->nextRefPosition->nodeLocation <
4935 relatedInterval->relatedInterval->getNextRefLocation()))
4937 // The current relatedInterval has only two remaining RefPositions, both of which
4938 // occur prior to the next RefPosition for its relatedInterval.
4939 // It is likely a copy.
4940 relatedInterval = relatedInterval->relatedInterval;
4945 if (relatedInterval != nullptr)
4947 // If the related interval already has an assigned register, then use that
4948 // as the related preference. We'll take the related
4949 // interval preferences into account in the loop over all the registers.
4951 if (relatedInterval->assignedReg != nullptr)
4953 relatedPreferences = genRegMask(relatedInterval->assignedReg->regNum);
4957 relatedPreferences = relatedInterval->registerPreferences;
4961 bool preferCalleeSave = currentInterval->preferCalleeSave;
4963 // For floating point, we want to be less aggressive about using callee-save registers.
4964 // So in that case, we just need to ensure that the current RefPosition is covered.
4965 RefPosition* rangeEndRefPosition;
4966 RefPosition* lastRefPosition = currentInterval->lastRefPosition;
4967 if (useFloatReg(currentInterval->registerType))
4969 rangeEndRefPosition = refPosition;
4973 rangeEndRefPosition = currentInterval->lastRefPosition;
4974 // If we have a relatedInterval that is not currently occupying a register,
4975 // and whose lifetime begins after this one ends,
4976 // we want to try to select a register that will cover its lifetime.
4977 if ((relatedInterval != nullptr) && (relatedInterval->assignedReg == nullptr) &&
4978 (relatedInterval->getNextRefLocation() >= rangeEndRefPosition->nodeLocation))
4980 lastRefPosition = relatedInterval->lastRefPosition;
4981 preferCalleeSave = relatedInterval->preferCalleeSave;
4985 // If this has a delayed use (due to being used in a rmw position of a
4986 // non-commutative operator), its endLocation is delayed until the "def"
4987 // position, which is one location past the use (getRefEndLocation() takes care of this).
4988 LsraLocation rangeEndLocation = rangeEndRefPosition->getRefEndLocation();
4989 LsraLocation lastLocation = lastRefPosition->getRefEndLocation();
4990 regNumber prevReg = REG_NA;
4992 if (currentInterval->assignedReg)
4994 bool useAssignedReg = false;
4995 // This was an interval that was previously allocated to the given
4996 // physical register, and we should try to allocate it to that register
4997 // again, if possible and reasonable.
4998 // Use it preemptively (i.e. before checking other available regs)
4999 // only if it is preferred and available.
5001 RegRecord* regRec = currentInterval->assignedReg;
5002 prevReg = regRec->regNum;
5003 regMaskTP prevRegBit = genRegMask(prevReg);
5005 // Is it in the preferred set of regs?
5006 if ((prevRegBit & preferences) != RBM_NONE)
5008 // Is it currently available?
5009 LsraLocation nextPhysRefLoc;
5010 if (registerIsAvailable(regRec, currentLocation, &nextPhysRefLoc, currentInterval->registerType))
5012 // If the register is next referenced at this location, only use it if
5013 // this has a fixed reg requirement (i.e. this is the reference that caused
5014 // the FixedReg ref to be created)
5016 if (!regRec->conflictingFixedRegReference(refPosition))
5018 useAssignedReg = true;
5024 regNumber foundReg = prevReg;
5025 assignPhysReg(regRec, currentInterval);
5026 refPosition->registerAssignment = genRegMask(foundReg);
5031 // Don't keep trying to allocate to this register
5032 currentInterval->assignedReg = nullptr;
5036 RegRecord* availablePhysRegInterval = nullptr;
5037 Interval* intervalToUnassign = nullptr;
5039 // Each register will receive a score which is the sum of the scoring criteria below.
5040 // These were selected on the assumption that they will have an impact on the "goodness"
5041 // of a register selection, and have been tuned to a certain extent by observing the impact
5042 // of the ordering on asmDiffs. However, there is probably much more room for tuning,
5043 // and perhaps additional criteria.
5045 // These are FLAGS (bits) so that we can easily order them and add them together.
5046 // If the scores are equal, but one covers more of the current interval's range,
5047 // then it wins. Otherwise, the one encountered earlier in the regOrder wins.
5051 VALUE_AVAILABLE = 0x40, // It is a constant value that is already in an acceptable register.
5052 COVERS = 0x20, // It is in the interval's preference set and it covers the entire lifetime.
5053 OWN_PREFERENCE = 0x10, // It is in the preference set of this interval.
5054 COVERS_RELATED = 0x08, // It is in the preference set of the related interval and covers the entire lifetime.
5055 RELATED_PREFERENCE = 0x04, // It is in the preference set of the related interval.
5056 CALLER_CALLEE = 0x02, // It is in the right "set" for the interval (caller or callee-save).
5057 UNASSIGNED = 0x01, // It is not currently assigned to an inactive interval.
5062 // Compute the best possible score so we can stop looping early if we find it.
5063 // TODO-Throughput: At some point we may want to short-circuit the computation of each score, but
5064 // probably not until we've tuned the order of these criteria. At that point,
5065 // we'll need to avoid the short-circuit if we've got a stress option to reverse
5067 int bestPossibleScore = COVERS + UNASSIGNED + OWN_PREFERENCE + CALLER_CALLEE;
5068 if (relatedPreferences != RBM_NONE)
5070 bestPossibleScore |= RELATED_PREFERENCE + COVERS_RELATED;
5073 LsraLocation bestLocation = MinLocation;
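// Worked example of the scoring above (illustrative, not from the original sources):
// a callee-save register (with preferCalleeSave true) that is in this interval's
// preference set, currently unassigned, and whose next physical reference lies beyond
// the interval's last use scores OWN_PREFERENCE | COVERS | CALLER_CALLEE | UNASSIGNED
// = 0x10 + 0x20 + 0x02 + 0x01 = 0x33, which equals bestPossibleScore when there is no
// related interval.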
5075 // In non-debug builds, this will simply get optimized away
5076 bool reverseSelect = false;
5078 reverseSelect = doReverseSelect();
5081 // An optimization for the common case where there is only one candidate -
5082 // avoid looping over all the other registers
5084 regNumber singleReg = REG_NA;
5086 if (genMaxOneBit(candidates))
5089 singleReg = genRegNumFromMask(candidates);
5090 regOrder = &singleReg;
5093 for (unsigned i = 0; i < regOrderSize && (candidates != RBM_NONE); i++)
5095 regNumber regNum = regOrder[i];
5096 regMaskTP candidateBit = genRegMask(regNum);
5098 if (!(candidates & candidateBit))
5103 candidates &= ~candidateBit;
5105 RegRecord* physRegRecord = getRegisterRecord(regNum);
5108 LsraLocation nextPhysRefLocation = MaxLocation;
5110 // By chance, is this register already holding this interval, as a copyReg or having
5111 // been restored as inactive after a kill?
5112 if (physRegRecord->assignedInterval == currentInterval)
5114 availablePhysRegInterval = physRegRecord;
5115 intervalToUnassign = nullptr;
5119 // Find the next RefPosition of the physical register
5120 if (!registerIsAvailable(physRegRecord, currentLocation, &nextPhysRefLocation, regType))
5125 // If the register is next referenced at this location, only use it if
5126 // this has a fixed reg requirement (i.e. this is the reference that caused
5127 // the FixedReg ref to be created)
5129 if (physRegRecord->conflictingFixedRegReference(refPosition))
5134 // If this is a definition of a constant interval, check to see if its value is already in this register.
5135 if (currentInterval->isConstant && RefTypeIsDef(refPosition->refType) &&
5136 (physRegRecord->assignedInterval != nullptr) && physRegRecord->assignedInterval->isConstant)
5138 noway_assert(refPosition->treeNode != nullptr);
5139 GenTree* otherTreeNode = physRegRecord->assignedInterval->firstRefPosition->treeNode;
5140 noway_assert(otherTreeNode != nullptr);
5142 if (refPosition->treeNode->OperGet() == otherTreeNode->OperGet())
5144 switch (otherTreeNode->OperGet())
5147 if ((refPosition->treeNode->AsIntCon()->IconValue() ==
5148 otherTreeNode->AsIntCon()->IconValue()) &&
5149 (varTypeGCtype(refPosition->treeNode) == varTypeGCtype(otherTreeNode)))
5151 #ifdef _TARGET_64BIT_
5152 // If the constant is negative, only reuse registers of the same type.
5153 // This is because, on a 64-bit system, we do not sign-extend immediates in registers to
5154 // 64-bits unless they are actually longs, as this requires a longer instruction.
5155 // This doesn't apply to a 32-bit system, on which long values occupy multiple registers.
5156 // (We could sign-extend, but we would have to always sign-extend, because if we reuse more
5157 // than once, we won't have access to the instruction that originally defines the constant).
5158 if ((refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()) ||
5159 (refPosition->treeNode->AsIntCon()->IconValue() >= 0))
5160 #endif // _TARGET_64BIT_
5162 score |= VALUE_AVAILABLE;
5168 // For floating point constants, the values must be identical, not simply compare
5169 // equal. So we compare the bits.
5170 if (refPosition->treeNode->AsDblCon()->isBitwiseEqual(otherTreeNode->AsDblCon()) &&
5171 (refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()))
5173 score |= VALUE_AVAILABLE;
5178 // for all other 'otherTreeNode->OperGet()' kinds, we leave 'score' unchanged
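// Illustrative example (a sketch; 64-bit target assumed): a second GT_CNS_INT with
// value 10 and type TYP_INT whose value already sits in a register scores
// VALUE_AVAILABLE, so the register can be reused instead of re-materializing the
// constant. A register holding a TYP_INT -1 is not reused for a TYP_LONG -1, since
// the upper 32 bits of the two values differ in the register.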
5184 // If the nextPhysRefLocation is a fixedRef for the rangeEndRefPosition, increment it so that
5185 // we don't mistakenly conclude that it fails to cover the live range.
5186 // This doesn't handle the case where earlier RefPositions for this Interval are also
5187 // FixedRefs of this regNum, but at least those are only interesting in the case where those
5188 // are "local last uses" of the Interval - otherwise the liveRange would interfere with the reg.
5189 if (nextPhysRefLocation == rangeEndLocation && rangeEndRefPosition->isFixedRefOfReg(regNum))
5191 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_INCREMENT_RANGE_END, currentInterval, regNum));
5192 nextPhysRefLocation++;
5195 if ((candidateBit & preferences) != RBM_NONE)
5197 score |= OWN_PREFERENCE;
5198 if (nextPhysRefLocation > rangeEndLocation)
5203 if (relatedInterval != nullptr && (candidateBit & relatedPreferences) != RBM_NONE)
5205 score |= RELATED_PREFERENCE;
5206 if (nextPhysRefLocation > relatedInterval->lastRefPosition->nodeLocation)
5208 score |= COVERS_RELATED;
5212 // If we had a fixed-reg def of a reg that will be killed before the use, prefer it to any other registers
5213 // with the same score. (Note that we haven't changed the original registerAssignment on the RefPosition).
5214 // Overload the RELATED_PREFERENCE value.
5215 else if (candidateBit == refPosition->registerAssignment)
5217 score |= RELATED_PREFERENCE;
5220 if ((preferCalleeSave && physRegRecord->isCalleeSave) || (!preferCalleeSave && !physRegRecord->isCalleeSave))
5222 score |= CALLER_CALLEE;
5225 // The register is considered unassigned if it has no assignedInterval, OR
5226 // if its next reference is beyond the range of this interval.
5227 if (physRegRecord->assignedInterval == nullptr ||
5228 physRegRecord->assignedInterval->getNextRefLocation() > lastLocation)
5230 score |= UNASSIGNED;
5233 bool foundBetterCandidate = false;
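// Illustrative tie-break example (a sketch): if two registers have equal scores and
// both have their next physical reference beyond lastLocation (both cover the
// interval), the one whose next reference comes sooner wins, leaving the
// longer-available register for other intervals; if those references are at the same
// location, the interval's previous register (prevReg) is preferred.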
5235 if (score > bestScore)
5237 foundBetterCandidate = true;
5239 else if (score == bestScore)
5241 // Prefer a register that covers the range.
5242 if (bestLocation <= lastLocation)
5244 if (nextPhysRefLocation > bestLocation)
5246 foundBetterCandidate = true;
5249 // If both cover the range, prefer a register that is killed sooner (leaving the longer-range register
5250 // available). If both cover the range and are killed at the same location, prefer the one that
5251 // matches the previous assignment.
5252 else if (nextPhysRefLocation > lastLocation)
5254 if (nextPhysRefLocation < bestLocation)
5256 foundBetterCandidate = true;
5258 else if (nextPhysRefLocation == bestLocation && prevReg == regNum)
5260 foundBetterCandidate = true;
5266 if (doReverseSelect() && bestScore != 0)
5268 foundBetterCandidate = !foundBetterCandidate;
5272 if (foundBetterCandidate)
5274 bestLocation = nextPhysRefLocation;
5275 availablePhysRegInterval = physRegRecord;
5276 intervalToUnassign = physRegRecord->assignedInterval;
5280 // there is no way we can get a better score so break out
5281 if (!reverseSelect && score == bestPossibleScore && bestLocation == rangeEndLocation + 1)
5287 if (availablePhysRegInterval != nullptr)
5289 if (intervalToUnassign != nullptr)
5291 unassignPhysReg(availablePhysRegInterval, intervalToUnassign->recentRefPosition);
5292 if (bestScore & VALUE_AVAILABLE)
5294 assert(intervalToUnassign->isConstant);
5295 refPosition->treeNode->SetReuseRegVal();
5296 refPosition->treeNode->SetInReg();
5298 // If we considered this "unassigned" because this interval's lifetime ends before
5299 // the next ref, remember it.
5300 else if ((bestScore & UNASSIGNED) != 0 && intervalToUnassign != nullptr)
5302 availablePhysRegInterval->previousInterval = intervalToUnassign;
5307 assert((bestScore & VALUE_AVAILABLE) == 0);
5309 assignPhysReg(availablePhysRegInterval, currentInterval);
5310 foundReg = availablePhysRegInterval->regNum;
5311 regMaskTP foundRegMask = genRegMask(foundReg);
5312 refPosition->registerAssignment = foundRegMask;
5313 if (relatedInterval != nullptr)
5315 relatedInterval->updateRegisterPreferences(foundRegMask);
5322 //------------------------------------------------------------------------
5323 // allocateBusyReg: Find a busy register that satisfies the requirements for refPosition,
5324 // and that can be spilled.
5327 //    current               The interval for the current allocation
5328 //    refPosition           The RefPosition of the current Interval for which a register is being allocated
5329 //    allocateIfProfitable  If true, a reg may not be allocated if all other ref positions currently
5330 //                          occupying registers are more important than the 'refPosition'.
5333 //    The regNumber allocated to the RefPosition. Returns REG_NA if no suitable register is found.
5335 // Note: Currently this routine uses weight and farthest distance of next reference
5336 // to select a ref position for spilling.
5337 // a) if allocateIfProfitable = false
5338 // The ref position chosen for spilling will be the lowest weight
5339 // of all, and if there is more than one ref position with the
5340 // same lowest weight, it chooses among them the one with the farthest
5341 // distance to its next reference.
5343 // b) if allocateIfProfitable = true
5344 // The ref position chosen for spilling will not only be lowest weight
5345 // of all but also has a weight lower than 'refPosition'. If there is
5346 // no such ref position, reg will not be allocated.
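// Illustrative example (a sketch): with allocateIfProfitable == false, if all but one
// candidate register hold intervals of weight 1 and the remaining one holds a
// weight-4 interval, the weight-1 interval whose next reference is farthest away is
// the one spilled. With allocateIfProfitable == true and a refPosition of weight 1,
// none of those candidates has a strictly lower weight, so REG_NA is returned and the
// value stays in memory.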
5347 regNumber LinearScan::allocateBusyReg(Interval* current, RefPosition* refPosition, bool allocateIfProfitable)
5349 regNumber foundReg = REG_NA;
5351 RegisterType regType = getRegisterType(current, refPosition);
5352 regMaskTP candidates = refPosition->registerAssignment;
5353 regMaskTP preferences = (current->registerPreferences & candidates);
5354 if (preferences == RBM_NONE)
5356 preferences = candidates;
5358 if (candidates == RBM_NONE)
5360 // This assumes only integer and floating point register types
5361 // if we target a processor with additional register types,
5362 // this would have to change
5363 candidates = allRegs(regType);
5367 candidates = stressLimitRegs(refPosition, candidates);
5370 // TODO-CQ: Determine whether/how to take preferences into account in addition to
5371 // preferring the one with the furthest ref position when considering
5372 // a candidate to spill
5373 RegRecord* farthestRefPhysRegRecord = nullptr;
5374 LsraLocation farthestLocation = MinLocation;
5375 LsraLocation refLocation = refPosition->nodeLocation;
5376 unsigned farthestRefPosWeight;
5377 if (allocateIfProfitable)
5379 // If allocating a reg is optional, we will consider those ref positions
5380 // whose weight is less than 'refPosition' for spilling.
5381 farthestRefPosWeight = getWeight(refPosition);
5385 // If allocating a reg is a must, we start off with max weight so
5386 // that the first spill candidate will be selected based on
5387 // farthest distance alone. Since we start off with farthestLocation
5388 // initialized to MinLocation, the first available ref position
5389 // will be selected as the spill candidate and its weight as the
5390 // farthestRefPosWeight.
5391 farthestRefPosWeight = BB_MAX_WEIGHT;
5394 for (regNumber regNum : Registers(regType))
5396 regMaskTP candidateBit = genRegMask(regNum);
5397 if (!(candidates & candidateBit))
5401 RegRecord* physRegRecord = getRegisterRecord(regNum);
5403 if (physRegRecord->isBusyUntilNextKill)
5407 Interval* assignedInterval = physRegRecord->assignedInterval;
5409 // If there is a fixed reference at the same location (and it's not due to this reference),
5412 if (physRegRecord->conflictingFixedRegReference(refPosition))
5414 assert(candidates != candidateBit);
5418 LsraLocation physRegNextLocation = MaxLocation;
5419 if (refPosition->isFixedRefOfRegMask(candidateBit))
5421 // Either there is a fixed reference due to this node, or one associated with a
5422 // fixed use fed by a def at this node.
5423 // In either case, we must use this register as it's the only candidate
5424 // TODO-CQ: At the time we allocate a register to a fixed-reg def, if it's not going
5425 // to remain live until the use, we should set the candidates to allRegs(regType)
5426 // to avoid a spill - codegen can then insert the copy.
5427 assert(candidates == candidateBit);
5428 physRegNextLocation = MaxLocation;
5429 farthestRefPosWeight = BB_MAX_WEIGHT;
5433 physRegNextLocation = physRegRecord->getNextRefLocation();
5435 // If refPosition requires a fixed register, we should reject all others.
5436 // Otherwise, we will still evaluate all physRegs even though their next location is
5437 // not better than the farthestLocation found so far.
5439 // TODO: this method should be using an approach similar to tryAllocateFreeReg()
5440 // where it uses a regOrder array to avoid iterating over any but the single
5442 if (refPosition->isFixedRegRef && physRegNextLocation < farthestLocation)
5448 // If this register is not assigned to an interval, either
5449 // - it has a FixedReg reference at the current location that is not this reference, OR
5450 // - this is the special case of a fixed loReg, where this interval has a use at the same location
5451 // In either case, we cannot use it
5453 if (assignedInterval == nullptr)
5455 RefPosition* nextPhysRegPosition = physRegRecord->getNextRefPosition();
5457 #ifndef _TARGET_ARM64_
5458 // TODO-Cleanup: Revisit this after Issue #3524 is complete
5459 // On ARM64 the nodeLocation is not always == refLocation, Disabling this assert for now.
5460 assert(nextPhysRegPosition->nodeLocation == refLocation && candidateBit != candidates);
5465 RefPosition* recentAssignedRef = assignedInterval->recentRefPosition;
5467 if (!assignedInterval->isActive)
5469 // The assigned interval has a reference at this location - otherwise, we would have found
5470 // this in tryAllocateFreeReg().
5471 // Note that we may or may not have actually handled the reference yet, so it could either
5472 // be recentAssignedRef, or the next reference.
5473 assert(recentAssignedRef != nullptr);
5474 if (recentAssignedRef->nodeLocation != refLocation)
5476 if (recentAssignedRef->nodeLocation + 1 == refLocation)
5478 assert(recentAssignedRef->delayRegFree);
5482 RefPosition* nextAssignedRef = recentAssignedRef->nextRefPosition;
5483 assert(nextAssignedRef != nullptr);
5484 assert(nextAssignedRef->nodeLocation == refLocation ||
5485 (nextAssignedRef->nodeLocation + 1 == refLocation && nextAssignedRef->delayRegFree));
5491 // If we have a recentAssignedRef, check that it is going to be OK to spill it
// TODO-Review: Under what conditions would recentAssignedRef be null?
5494 unsigned recentAssignedRefWeight = BB_ZERO_WEIGHT;
5495 if (recentAssignedRef != nullptr)
5497 if (recentAssignedRef->nodeLocation == refLocation)
5499 // We can't spill a register that's being used at the current location
5500 RefPosition* physRegRef = physRegRecord->recentRefPosition;
// If the current position has the candidate register marked to be delayed,
// check whether the previous location is using this register; if so, we have to skip it
// since we can't spill this register.
5507 if (recentAssignedRef->delayRegFree && (refLocation == recentAssignedRef->nodeLocation + 1))
// We prefer not to spill a register if the weight of its recentAssignedRef is greater than the
// weight of the spill candidate found so far. We would consider spilling a greater-weight
// ref position only if the refPosition being allocated must have a register.
5515 recentAssignedRefWeight = getWeight(recentAssignedRef);
5516 if (recentAssignedRefWeight > farthestRefPosWeight)
5522 LsraLocation nextLocation = assignedInterval->getNextRefLocation();
5524 // We should never spill a register that's occupied by an Interval with its next use at the current location.
5525 // Normally this won't occur (unless we actually had more uses in a single node than there are registers),
5526 // because we'll always find something with a later nextLocation, but it can happen in stress when
5527 // we have LSRA_SELECT_NEAREST.
5528 if ((nextLocation == refLocation) && !refPosition->isFixedRegRef)
5533 if (nextLocation > physRegNextLocation)
5535 nextLocation = physRegNextLocation;
5538 bool isBetterLocation;
5541 if (doSelectNearest() && farthestRefPhysRegRecord != nullptr)
5543 isBetterLocation = (nextLocation <= farthestLocation);
// This if-stmt is associated with the 'else' of the doSelectNearest() check above
5548 if (recentAssignedRefWeight < farthestRefPosWeight)
5550 isBetterLocation = true;
// This would mean the weight of the spill ref position we found so far is equal
// to the weight of the ref position that is being evaluated. In this case
// we prefer to spill the ref position whose distance to its next reference is the farthest.
5558 assert(recentAssignedRefWeight == farthestRefPosWeight);
5560 // If allocateIfProfitable=true, the first spill candidate selected
5561 // will be based on weight alone. After we have found a spill
5562 // candidate whose weight is less than the 'refPosition', we will
5563 // consider farthest distance when there is a tie in weights.
5564 // This is to ensure that we don't spill a ref position whose
// weight is equal to the weight of 'refPosition'.
5566 if (allocateIfProfitable && farthestRefPhysRegRecord == nullptr)
5568 isBetterLocation = false;
5572 isBetterLocation = (nextLocation > farthestLocation);
5574 if (nextLocation > farthestLocation)
5576 isBetterLocation = true;
5578 else if (nextLocation == farthestLocation)
5580 // Both weight and distance are equal.
5581 // Prefer that ref position which is marked both reload and
// allocate-if-profitable. These ref positions don't need
// to be spilled, as they are already in memory and
5584 // codegen considers them as contained memory operands.
5585 isBetterLocation = (recentAssignedRef != nullptr) && recentAssignedRef->reload &&
5586 recentAssignedRef->AllocateIfProfitable();
5590 isBetterLocation = false;
5595 if (isBetterLocation)
5597 farthestLocation = nextLocation;
5598 farthestRefPhysRegRecord = physRegRecord;
5599 farthestRefPosWeight = recentAssignedRefWeight;
5604 if (allocateIfProfitable)
// There may not be a spill candidate, but if one is found,
// its weight must be less than the weight of 'refPosition'.
5608 assert((farthestRefPhysRegRecord == nullptr) || (farthestRefPosWeight < getWeight(refPosition)));
5612 // Must have found a spill candidate.
5613 assert((farthestRefPhysRegRecord != nullptr) && (farthestLocation > refLocation || refPosition->isFixedRegRef));
5617 if (farthestRefPhysRegRecord != nullptr)
5619 foundReg = farthestRefPhysRegRecord->regNum;
5620 unassignPhysReg(farthestRefPhysRegRecord, farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
5621 assignPhysReg(farthestRefPhysRegRecord, current);
5622 refPosition->registerAssignment = genRegMask(foundReg);
5627 refPosition->registerAssignment = RBM_NONE;
5633 // Grab a register to use to copy and then immediately use.
5634 // This is called only for localVar intervals that already have a register
5635 // assignment that is not compatible with the current RefPosition.
5636 // This is not like regular assignment, because we don't want to change
5637 // any preferences or existing register assignments.
5638 // Prefer a free register that's got the earliest next use.
5639 // Otherwise, spill something with the farthest next use
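// For illustration only, a hypothetical sketch of the expected calling pattern (mirroring the
// use in allocateRegisters below):
//
//     regNumber copyReg = assignCopyReg(currentRefPosition);
//     // currentRefPosition->copyReg is now true; the interval keeps its original register
//     // assignment and preferences, and codegen will insert the copy into 'copyReg'.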
5641 regNumber LinearScan::assignCopyReg(RefPosition* refPosition)
5643 Interval* currentInterval = refPosition->getInterval();
5644 assert(currentInterval != nullptr);
5645 assert(currentInterval->isActive);
5647 bool foundFreeReg = false;
5648 RegRecord* bestPhysReg = nullptr;
5649 LsraLocation bestLocation = MinLocation;
5650 regMaskTP candidates = refPosition->registerAssignment;
5652 // Save the relatedInterval, if any, so that it doesn't get modified during allocation.
5653 Interval* savedRelatedInterval = currentInterval->relatedInterval;
5654 currentInterval->relatedInterval = nullptr;
// We don't really want to change the default assignment,
// so 1) pretend this isn't active, and 2) remember the old reg
5658 regNumber oldPhysReg = currentInterval->physReg;
5659 RegRecord* oldRegRecord = currentInterval->assignedReg;
5660 assert(oldRegRecord->regNum == oldPhysReg);
5661 currentInterval->isActive = false;
5663 regNumber allocatedReg = tryAllocateFreeReg(currentInterval, refPosition);
5664 if (allocatedReg == REG_NA)
5666 allocatedReg = allocateBusyReg(currentInterval, refPosition, false);
5669 // Now restore the old info
5670 currentInterval->relatedInterval = savedRelatedInterval;
5671 currentInterval->physReg = oldPhysReg;
5672 currentInterval->assignedReg = oldRegRecord;
5673 currentInterval->isActive = true;
5675 refPosition->copyReg = true;
5676 return allocatedReg;
// Check whether this register record is already assigned to another interval; if it is, unassign
// the physical record, then set its assignedInterval to 'interval'.
5682 void LinearScan::checkAndAssignInterval(RegRecord* regRec, Interval* interval)
5684 if (regRec->assignedInterval != nullptr && regRec->assignedInterval != interval)
5686 // This is allocated to another interval. Either it is inactive, or it was allocated as a
5687 // copyReg and is therefore not the "assignedReg" of the other interval. In the latter case,
5688 // we simply unassign it - in the former case we need to set the physReg on the interval to
5689 // REG_NA to indicate that it is no longer in that register.
5690 // The lack of checking for this case resulted in an assert in the retail version of System.dll,
5691 // in method SerialStream.GetDcbFlag.
5692 // Note that we can't check for the copyReg case, because we may have seen a more recent
5693 // RefPosition for the Interval that was NOT a copyReg.
5694 if (regRec->assignedInterval->assignedReg == regRec)
5696 assert(regRec->assignedInterval->isActive == false);
5697 regRec->assignedInterval->physReg = REG_NA;
5699 unassignPhysReg(regRec->regNum);
5702 regRec->assignedInterval = interval;
// Assign the given physical register record to the given interval
5706 void LinearScan::assignPhysReg(RegRecord* regRec, Interval* interval)
5708 regMaskTP assignedRegMask = genRegMask(regRec->regNum);
5709 compiler->codeGen->regSet.rsSetRegsModified(assignedRegMask DEBUGARG(dumpTerse));
5711 checkAndAssignInterval(regRec, interval);
5712 interval->assignedReg = regRec;
5715 if ((interval->registerType == TYP_DOUBLE) && isFloatRegType(regRec->registerType))
5717 regNumber nextRegNum = REG_NEXT(regRec->regNum);
5718 RegRecord* nextRegRec = getRegisterRecord(nextRegNum);
5720 checkAndAssignInterval(nextRegRec, interval);
5722 #endif // _TARGET_ARM_
5724 interval->physReg = regRec->regNum;
5725 interval->isActive = true;
5726 if (interval->isLocalVar)
5728 // Prefer this register for future references
5729 interval->updateRegisterPreferences(assignedRegMask);
5733 //------------------------------------------------------------------------
5734 // spill: Spill this Interval between "fromRefPosition" and "toRefPosition"
5737 // fromRefPosition - The RefPosition at which the Interval is to be spilled
5738 // toRefPosition - The RefPosition at which it must be reloaded
5744 // fromRefPosition and toRefPosition must not be null
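// Illustrative summary (see the body below): unless 'fromRefPosition' is a last use, it is either
// marked spillAfter or, for certain refs that don't require a register, simply loses its register
// assignment; the interval is then marked inactive and isSpilled, and if the spill point precedes
// the current block the variable is recorded as living on the stack at block entry.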
5746 void LinearScan::spillInterval(Interval* interval, RefPosition* fromRefPosition, RefPosition* toRefPosition)
5748 assert(fromRefPosition != nullptr && toRefPosition != nullptr);
5749 assert(fromRefPosition->getInterval() == interval && toRefPosition->getInterval() == interval);
5750 assert(fromRefPosition->nextRefPosition == toRefPosition);
5752 if (!fromRefPosition->lastUse)
// Lcl var def/use ref positions, even if reg-optional, should be marked as spillAfter
// when they have not been allocated a register.
5756 if (!fromRefPosition->RequiresRegister() && !(interval->isLocalVar && fromRefPosition->IsActualRef()))
5758 fromRefPosition->registerAssignment = RBM_NONE;
5762 fromRefPosition->spillAfter = true;
5765 assert(toRefPosition != nullptr);
5770 dumpLsraAllocationEvent(LSRA_EVENT_SPILL, interval);
5774 interval->isActive = false;
5775 interval->isSpilled = true;
5777 // If fromRefPosition occurs before the beginning of this block, mark this as living in the stack
5778 // on entry to this block.
5779 if (fromRefPosition->nodeLocation <= curBBStartLocation)
5781 // This must be a lclVar interval
5782 assert(interval->isLocalVar);
5783 setInVarRegForBB(curBBNum, interval->varNum, REG_STK);
5787 //------------------------------------------------------------------------
5788 // unassignPhysRegNoSpill: Unassign the given physical register record from
5789 // an active interval, without spilling.
// regRec - the RegRecord to be unassigned
5798 // The assignedInterval must not be null, and must be active.
5801 // This method is used to unassign a register when an interval needs to be moved to a
5802 // different register, but not (yet) spilled.
5804 void LinearScan::unassignPhysRegNoSpill(RegRecord* regRec)
5806 Interval* assignedInterval = regRec->assignedInterval;
5807 assert(assignedInterval != nullptr && assignedInterval->isActive);
5808 assignedInterval->isActive = false;
5809 unassignPhysReg(regRec, nullptr);
5810 assignedInterval->isActive = true;
5813 //------------------------------------------------------------------------
5814 // checkAndClearInterval: Clear the assignedInterval for the given
5815 // physical register record
// regRec - the physical RegRecord to be unassigned
5819 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
5820 // or nullptr if we aren't spilling
5826 // see unassignPhysReg
5828 void LinearScan::checkAndClearInterval(RegRecord* regRec, RefPosition* spillRefPosition)
5830 Interval* assignedInterval = regRec->assignedInterval;
5831 assert(assignedInterval != nullptr);
5832 regNumber thisRegNum = regRec->regNum;
5834 if (spillRefPosition == nullptr)
5836 // Note that we can't assert for the copyReg case
5838 if (assignedInterval->physReg == thisRegNum)
5840 assert(assignedInterval->isActive == false);
5845 assert(spillRefPosition->getInterval() == assignedInterval);
5848 regRec->assignedInterval = nullptr;
5851 //------------------------------------------------------------------------
5852 // unassignPhysReg: Unassign the given physical register record, and spill the
5853 // assignedInterval at the given spillRefPosition, if any.
// regRec - the RegRecord to be unassigned
5857 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
5863 // The assignedInterval must not be null.
5864 // If spillRefPosition is null, the assignedInterval must be inactive, or not currently
5865 // assigned to this register (e.g. this is a copyReg for that Interval).
5866 // Otherwise, spillRefPosition must be associated with the assignedInterval.
5868 void LinearScan::unassignPhysReg(RegRecord* regRec, RefPosition* spillRefPosition)
5870 Interval* assignedInterval = regRec->assignedInterval;
5871 assert(assignedInterval != nullptr);
5872 checkAndClearInterval(regRec, spillRefPosition);
5873 regNumber thisRegNum = regRec->regNum;
5876 if ((assignedInterval->registerType == TYP_DOUBLE) && isFloatRegType(regRec->registerType))
5878 regNumber nextRegNum = REG_NEXT(regRec->regNum);
5879 RegRecord* nextRegRec = getRegisterRecord(nextRegNum);
5880 checkAndClearInterval(nextRegRec, spillRefPosition);
5882 #endif // _TARGET_ARM_
5885 if (VERBOSE && !dumpTerse)
5887 printf("unassigning %s: ", getRegName(regRec->regNum));
5888 assignedInterval->dump();
5893 RefPosition* nextRefPosition = nullptr;
5894 if (spillRefPosition != nullptr)
5896 nextRefPosition = spillRefPosition->nextRefPosition;
5899 if (assignedInterval->physReg != REG_NA && assignedInterval->physReg != thisRegNum)
5901 // This must have been a temporary copy reg, but we can't assert that because there
5902 // may have been intervening RefPositions that were not copyRegs.
5903 regRec->assignedInterval = nullptr;
5907 regNumber victimAssignedReg = assignedInterval->physReg;
5908 assignedInterval->physReg = REG_NA;
5910 bool spill = assignedInterval->isActive && nextRefPosition != nullptr;
5913 // If this is an active interval, it must have a recentRefPosition,
5914 // otherwise it would not be active
5915 assert(spillRefPosition != nullptr);
5918 // TODO-CQ: Enable this and insert an explicit GT_COPY (otherwise there's no way to communicate
5919 // to codegen that we want the copyReg to be the new home location).
5920 // If the last reference was a copyReg, and we're spilling the register
5921 // it was copied from, then make the copyReg the new primary location
5923 if (spillRefPosition->copyReg)
5925 regNumber copyFromRegNum = victimAssignedReg;
5926 regNumber copyRegNum = genRegNumFromMask(spillRefPosition->registerAssignment);
5927 if (copyFromRegNum == thisRegNum &&
5928 getRegisterRecord(copyRegNum)->assignedInterval == assignedInterval)
5930 assert(copyRegNum != thisRegNum);
5931 assignedInterval->physReg = copyRegNum;
5932 assignedInterval->assignedReg = this->getRegisterRecord(copyRegNum);
5938 // With JitStressRegs == 0x80 (LSRA_EXTEND_LIFETIMES), we may have a RefPosition
5939 // that is not marked lastUse even though the treeNode is a lastUse. In that case
5940 // we must not mark it for spill because the register will have been immediately freed
5941 // after use. While we could conceivably add special handling for this case in codegen,
// it would be messy and undesirably cause the "bleeding" of LSRA stress modes outside
// the register allocator.
5944 if (extendLifetimes() && assignedInterval->isLocalVar && RefTypeIsUse(spillRefPosition->refType) &&
5945 spillRefPosition->treeNode != nullptr && (spillRefPosition->treeNode->gtFlags & GTF_VAR_DEATH) != 0)
5947 dumpLsraAllocationEvent(LSRA_EVENT_SPILL_EXTENDED_LIFETIME, assignedInterval);
5948 assignedInterval->isActive = false;
5950 // If the spillRefPosition occurs before the beginning of this block, it will have
5951 // been marked as living in this register on entry to this block, but we now need
5952 // to mark this as living on the stack.
5953 if (spillRefPosition->nodeLocation <= curBBStartLocation)
5955 setInVarRegForBB(curBBNum, assignedInterval->varNum, REG_STK);
5956 if (spillRefPosition->nextRefPosition != nullptr)
5958 assignedInterval->isSpilled = true;
5963 // Otherwise, we need to mark spillRefPosition as lastUse, or the interval
5964 // will remain active beyond its allocated range during the resolution phase.
5965 spillRefPosition->lastUse = true;
5971 spillInterval(assignedInterval, spillRefPosition, nextRefPosition);
5974 // Maintain the association with the interval, if it has more references.
5975 // Or, if we "remembered" an interval assigned to this register, restore it.
5976 if (nextRefPosition != nullptr)
5978 assignedInterval->assignedReg = regRec;
5980 else if (regRec->previousInterval != nullptr && regRec->previousInterval->assignedReg == regRec &&
5981 regRec->previousInterval->getNextRefPosition() != nullptr)
5983 regRec->assignedInterval = regRec->previousInterval;
5984 regRec->previousInterval = nullptr;
5988 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL, regRec->assignedInterval,
5993 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL, regRec->assignedInterval, thisRegNum);
5999 regRec->assignedInterval = nullptr;
6000 regRec->previousInterval = nullptr;
6004 //------------------------------------------------------------------------
// spillGCRefs: Spill any GC-type intervals that are currently in registers.
6008 // killRefPosition - The RefPosition for the kill
6013 void LinearScan::spillGCRefs(RefPosition* killRefPosition)
6015 // For each physical register that can hold a GC type,
6016 // if it is occupied by an interval of a GC type, spill that interval.
6017 regMaskTP candidateRegs = killRefPosition->registerAssignment;
6018 while (candidateRegs != RBM_NONE)
6020 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6021 candidateRegs &= ~nextRegBit;
6022 regNumber nextReg = genRegNumFromMask(nextRegBit);
6023 RegRecord* regRecord = getRegisterRecord(nextReg);
6024 Interval* assignedInterval = regRecord->assignedInterval;
6025 if (assignedInterval == nullptr || (assignedInterval->isActive == false) ||
6026 !varTypeIsGC(assignedInterval->registerType))
6030 unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
6032 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DONE_KILL_GC_REFS, nullptr, REG_NA, nullptr));
6035 //------------------------------------------------------------------------
6036 // processBlockEndAllocation: Update var locations after 'currentBlock' has been allocated
6039 // currentBlock - the BasicBlock we have just finished allocating registers for
6045 // Calls processBlockEndLocation() to set the outVarToRegMap, then gets the next block,
6046 // and sets the inVarToRegMap appropriately.
6048 void LinearScan::processBlockEndAllocation(BasicBlock* currentBlock)
6050 assert(currentBlock != nullptr);
6051 processBlockEndLocations(currentBlock);
6052 markBlockVisited(currentBlock);
6054 // Get the next block to allocate.
6055 // When the last block in the method has successors, there will be a final "RefTypeBB" to
6056 // ensure that we get the varToRegMap set appropriately, but in that case we don't need
6057 // to worry about "nextBlock".
6058 BasicBlock* nextBlock = getNextBlock();
6059 if (nextBlock != nullptr)
6061 processBlockStartLocations(nextBlock, true);
6065 //------------------------------------------------------------------------
6066 // rotateBlockStartLocation: When in the LSRA_BLOCK_BOUNDARY_ROTATE stress mode, attempt to
6067 // "rotate" the register assignment for a localVar to the next higher
6068 // register that is available.
6071 // interval - the Interval for the variable whose register is getting rotated
6072 // targetReg - its register assignment from the predecessor block being used for live-in
6073 // availableRegs - registers available for use
6076 // The new register to use.
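// Illustrative example (register names hypothetical): if the available candidates of the right
// type are {RAX, RCX, RDX} and targetReg is RCX, the loop below picks RDX, the next higher
// candidate; if targetReg is RDX, there is no higher candidate, so it wraps around to RAX.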
6079 regNumber LinearScan::rotateBlockStartLocation(Interval* interval, regNumber targetReg, regMaskTP availableRegs)
6081 if (targetReg != REG_STK && getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE)
6083 // If we're rotating the register locations at block boundaries, try to use
6084 // the next higher register number of the appropriate register type.
6085 regMaskTP candidateRegs = allRegs(interval->registerType) & availableRegs;
6086 regNumber firstReg = REG_NA;
6087 regNumber newReg = REG_NA;
6088 while (candidateRegs != RBM_NONE)
6090 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6091 candidateRegs &= ~nextRegBit;
6092 regNumber nextReg = genRegNumFromMask(nextRegBit);
6093 if (nextReg > targetReg)
6098 else if (firstReg == REG_NA)
6103 if (newReg == REG_NA)
6105 assert(firstReg != REG_NA);
6114 //------------------------------------------------------------------------
6115 // processBlockStartLocations: Update var locations on entry to 'currentBlock'
// currentBlock   - the BasicBlock we are about to process (i.e. whose entry locations are being set)
6119 // allocationPass - true if we are currently allocating registers (versus writing them back)
6125 // During the allocation pass, we use the outVarToRegMap of the selected predecessor to
6126 // determine the lclVar locations for the inVarToRegMap.
6127 // During the resolution (write-back) pass, we only modify the inVarToRegMap in cases where
6128 // a lclVar was spilled after the block had been completed.
6129 void LinearScan::processBlockStartLocations(BasicBlock* currentBlock, bool allocationPass)
6131 unsigned predBBNum = blockInfo[currentBlock->bbNum].predBBNum;
6132 VarToRegMap predVarToRegMap = getOutVarToRegMap(predBBNum);
6133 VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
6134 bool hasCriticalInEdge = blockInfo[currentBlock->bbNum].hasCriticalInEdge;
6136 VARSET_TP VARSET_INIT_NOCOPY(liveIn, currentBlock->bbLiveIn);
6138 if (getLsraExtendLifeTimes())
6140 VarSetOps::AssignNoCopy(compiler, liveIn, compiler->lvaTrackedVars);
6142 // If we are rotating register assignments at block boundaries, we want to make the
6143 // inactive registers available for the rotation.
6144 regMaskTP inactiveRegs = RBM_NONE;
6146 regMaskTP liveRegs = RBM_NONE;
6147 VARSET_ITER_INIT(compiler, iter, liveIn, varIndex);
6148 while (iter.NextElem(compiler, &varIndex))
6150 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
6151 if (!compiler->lvaTable[varNum].lvLRACandidate)
6155 regNumber targetReg;
6156 Interval* interval = getIntervalForLocalVar(varNum);
6157 RefPosition* nextRefPosition = interval->getNextRefPosition();
6158 assert(nextRefPosition != nullptr);
6162 targetReg = predVarToRegMap[varIndex];
6163 INDEBUG(targetReg = rotateBlockStartLocation(interval, targetReg, (~liveRegs | inactiveRegs)));
6164 inVarToRegMap[varIndex] = targetReg;
6166 else // !allocationPass (i.e. resolution/write-back pass)
6168 targetReg = inVarToRegMap[varIndex];
6169 // There are four cases that we need to consider during the resolution pass:
6170 // 1. This variable had a register allocated initially, and it was not spilled in the RefPosition
6171 // that feeds this block. In this case, both targetReg and predVarToRegMap[varIndex] will be targetReg.
6172 // 2. This variable had not been spilled prior to the end of predBB, but was later spilled, so
6173 // predVarToRegMap[varIndex] will be REG_STK, but targetReg is its former allocated value.
6174 // In this case, we will normally change it to REG_STK. We will update its "spilled" status when we
6175 // encounter it in resolveLocalRef().
6176 // 2a. If the next RefPosition is marked as a copyReg, we need to retain the allocated register. This is
6177 // because the copyReg RefPosition will not have recorded the "home" register, yet downstream
6178 // RefPositions rely on the correct "home" register.
6179 // 3. This variable was spilled before we reached the end of predBB. In this case, both targetReg and
6180 // predVarToRegMap[varIndex] will be REG_STK, and the next RefPosition will have been marked
6181 // as reload during allocation time if necessary (note that by the time we actually reach the next
// RefPosition, we may be using a different predecessor, in which it may still be in a register).
6183 // 4. This variable was spilled during the allocation of this block, so targetReg is REG_STK
6184 // (because we set inVarToRegMap at the time we spilled it), but predVarToRegMap[varIndex]
6185 // is not REG_STK. We retain the REG_STK value in the inVarToRegMap.
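// Illustrative example of case 2 vs. 2a (variable and register names hypothetical): suppose V02
// ended predBB in RSI during allocation but was spilled later, so predVarToRegMap[varIndex] is
// REG_STK while targetReg is still RSI. Unless V02's next RefPosition is a copyReg (case 2a),
// targetReg is downgraded to REG_STK below.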
6186 if (targetReg != REG_STK)
6188 if (predVarToRegMap[varIndex] != REG_STK)
6191 assert(predVarToRegMap[varIndex] == targetReg ||
6192 getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE);
6194 else if (!nextRefPosition->copyReg)
6197 inVarToRegMap[varIndex] = REG_STK;
6198 targetReg = REG_STK;
6200 // Else case 2a. - retain targetReg.
// Else this is case #3 or #4: we retain targetReg, and there is nothing further to do or assert.
6204 if (interval->physReg == targetReg)
6206 if (interval->isActive)
6208 assert(targetReg != REG_STK);
6209 assert(interval->assignedReg != nullptr && interval->assignedReg->regNum == targetReg &&
6210 interval->assignedReg->assignedInterval == interval);
6211 liveRegs |= genRegMask(targetReg);
6215 else if (interval->physReg != REG_NA)
6217 // This can happen if we are using the locations from a basic block other than the
6218 // immediately preceding one - where the variable was in a different location.
6219 if (targetReg != REG_STK)
6221 // Unassign it from the register (it will get a new register below).
6222 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
6224 interval->isActive = false;
6225 unassignPhysReg(getRegisterRecord(interval->physReg), nullptr);
6229 // This interval was live in this register the last time we saw a reference to it,
6230 // but has since been displaced.
6231 interval->physReg = REG_NA;
6234 else if (allocationPass)
6236 // Keep the register assignment - if another var has it, it will get unassigned.
6237 // Otherwise, resolution will fix it up later, and it will be more
6238 // likely to match other assignments this way.
6239 interval->isActive = true;
6240 liveRegs |= genRegMask(interval->physReg);
6241 INDEBUG(inactiveRegs |= genRegMask(interval->physReg));
6242 inVarToRegMap[varIndex] = interval->physReg;
6246 interval->physReg = REG_NA;
6249 if (targetReg != REG_STK)
6251 RegRecord* targetRegRecord = getRegisterRecord(targetReg);
6252 liveRegs |= genRegMask(targetReg);
6253 if (!interval->isActive)
6255 interval->isActive = true;
6256 interval->physReg = targetReg;
6257 interval->assignedReg = targetRegRecord;
6259 Interval* assignedInterval = targetRegRecord->assignedInterval;
6260 if (assignedInterval != interval)
6262 // Is there another interval currently assigned to this register? If so unassign it.
6263 if (assignedInterval != nullptr)
6265 if (assignedInterval->assignedReg == targetRegRecord)
6267 // If the interval is active, it will be set to active when we reach its new
6268 // register assignment (which we must not yet have done, or it wouldn't still be
6269 // assigned to this register).
6270 assignedInterval->isActive = false;
6271 unassignPhysReg(targetRegRecord, nullptr);
6272 if (allocationPass && assignedInterval->isLocalVar &&
6273 inVarToRegMap[assignedInterval->getVarIndex(compiler)] == targetReg)
6275 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
6280 // This interval is no longer assigned to this register.
6281 targetRegRecord->assignedInterval = nullptr;
6284 assignPhysReg(targetRegRecord, interval);
6286 if (interval->recentRefPosition != nullptr && !interval->recentRefPosition->copyReg &&
6287 interval->recentRefPosition->registerAssignment != genRegMask(targetReg))
6289 interval->getNextRefPosition()->outOfOrder = true;
6294 // Unassign any registers that are no longer live.
6295 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
6297 if ((liveRegs & genRegMask(reg)) == 0)
6299 RegRecord* physRegRecord = getRegisterRecord(reg);
6300 Interval* assignedInterval = physRegRecord->assignedInterval;
6302 if (assignedInterval != nullptr)
6304 assert(assignedInterval->isLocalVar || assignedInterval->isConstant);
6305 if (!assignedInterval->isConstant && assignedInterval->assignedReg == physRegRecord)
6307 assignedInterval->isActive = false;
6308 if (assignedInterval->getNextRefPosition() == nullptr)
6310 unassignPhysReg(physRegRecord, nullptr);
6312 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
6316 // This interval may still be active, but was in another register in an
6317 // intervening block.
6318 physRegRecord->assignedInterval = nullptr;
6323 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
6326 //------------------------------------------------------------------------
6327 // processBlockEndLocations: Record the variables occupying registers after completing the current block.
6330 // currentBlock - the block we have just completed.
6336 // This must be called both during the allocation and resolution (write-back) phases.
6337 // This is because we need to have the outVarToRegMap locations in order to set the locations
6338 // at successor blocks during allocation time, but if lclVars are spilled after a block has been
6339 // completed, we need to record the REG_STK location for those variables at resolution time.
6341 void LinearScan::processBlockEndLocations(BasicBlock* currentBlock)
6343 assert(currentBlock != nullptr && currentBlock->bbNum == curBBNum);
6344 VarToRegMap outVarToRegMap = getOutVarToRegMap(curBBNum);
6346 VARSET_TP VARSET_INIT_NOCOPY(liveOut, currentBlock->bbLiveOut);
6348 if (getLsraExtendLifeTimes())
6350 VarSetOps::AssignNoCopy(compiler, liveOut, compiler->lvaTrackedVars);
6353 regMaskTP liveRegs = RBM_NONE;
6354 VARSET_ITER_INIT(compiler, iter, liveOut, varIndex);
6355 while (iter.NextElem(compiler, &varIndex))
6357 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
6358 Interval* interval = getIntervalForLocalVar(varNum);
6359 if (interval->isActive)
6361 assert(interval->physReg != REG_NA && interval->physReg != REG_STK);
6362 outVarToRegMap[varIndex] = interval->physReg;
6366 outVarToRegMap[varIndex] = REG_STK;
6369 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_END_BB));
6373 void LinearScan::dumpRefPositions(const char* str)
6375 printf("------------\n");
6376 printf("REFPOSITIONS %s: \n", str);
6377 printf("------------\n");
6378 for (auto& refPos : refPositions)
6385 bool LinearScan::registerIsFree(regNumber regNum, RegisterType regType)
6387 RegRecord* physRegRecord = getRegisterRecord(regNum);
6389 bool isFree = physRegRecord->isFree();
6392 if (isFree && regType == TYP_DOUBLE)
6394 isFree = getRegisterRecord(REG_NEXT(regNum))->isFree();
6396 #endif // _TARGET_ARM_
6401 //------------------------------------------------------------------------
6402 // LinearScan::freeRegister: Make a register available for use
6405 // physRegRecord - the RegRecord for the register to be freed.
6412 // It may be that the RegRecord has already been freed, e.g. due to a kill,
6413 // in which case this method has no effect.
6416 // If there is currently an Interval assigned to this register, and it has
6417 // more references (i.e. this is a local last-use, but more uses and/or
6418 // defs remain), it will remain assigned to the physRegRecord. However, since
6419 // it is marked inactive, the register will be available, albeit less desirable
6421 void LinearScan::freeRegister(RegRecord* physRegRecord)
6423 Interval* assignedInterval = physRegRecord->assignedInterval;
6424 // It may have already been freed by a "Kill"
6425 if (assignedInterval != nullptr)
6427 assignedInterval->isActive = false;
// If this is a constant node that we may encounter again (e.g. a re-usable constant),
// don't unassign it until we need the register.
6430 if (!assignedInterval->isConstant)
6432 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
6433 // Unassign the register only if there are no more RefPositions, or the next
6434 // one is a def. Note that the latter condition doesn't actually ensure that
6435 // there aren't subsequent uses that could be reached by a def in the assigned
6436 // register, but is merely a heuristic to avoid tying up the register (or using
6437 // it when it's non-optimal). A better alternative would be to use SSA, so that
6438 // we wouldn't unnecessarily link separate live ranges to the same register.
6439 if (nextRefPosition == nullptr || RefTypeIsDef(nextRefPosition->refType))
6441 unassignPhysReg(physRegRecord, nullptr);
6447 void LinearScan::freeRegisters(regMaskTP regsToFree)
6449 if (regsToFree == RBM_NONE)
6454 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FREE_REGS));
6455 while (regsToFree != RBM_NONE)
6457 regMaskTP nextRegBit = genFindLowestBit(regsToFree);
6458 regsToFree &= ~nextRegBit;
6459 regNumber nextReg = genRegNumFromMask(nextRegBit);
6460 freeRegister(getRegisterRecord(nextReg));
6464 // Actual register allocation, accomplished by iterating over all of the previously
6465 // constructed Intervals
6466 // Loosely based on raAssignVars()
6468 void LinearScan::allocateRegisters()
6470 JITDUMP("*************** In LinearScan::allocateRegisters()\n");
6471 DBEXEC(VERBOSE, lsraDumpIntervals("before allocateRegisters"));
6473 // at start, nothing is active except for register args
6474 for (auto& interval : intervals)
6476 Interval* currentInterval = &interval;
6477 currentInterval->recentRefPosition = nullptr;
6478 currentInterval->isActive = false;
6479 if (currentInterval->isLocalVar)
6481 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
6482 if (varDsc->lvIsRegArg && currentInterval->firstRefPosition != nullptr)
6484 currentInterval->isActive = true;
6489 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
6491 getRegisterRecord(reg)->recentRefPosition = nullptr;
6492 getRegisterRecord(reg)->isActive = false;
6496 regNumber lastAllocatedReg = REG_NA;
6499 dumpRefPositions("BEFORE ALLOCATION");
6500 dumpVarRefPositions("BEFORE ALLOCATION");
6502 printf("\n\nAllocating Registers\n"
6503 "--------------------\n");
6506 dumpRegRecordHeader();
6507 // Now print an empty indent
6508 printf(indentFormat, "");
6513 BasicBlock* currentBlock = nullptr;
6515 LsraLocation prevLocation = MinLocation;
6516 regMaskTP regsToFree = RBM_NONE;
6517 regMaskTP delayRegsToFree = RBM_NONE;
6519 // This is the most recent RefPosition for which a register was allocated
6520 // - currently only used for DEBUG but maintained in non-debug, for clarity of code
6521 // (and will be optimized away because in non-debug spillAlways() unconditionally returns false)
6522 RefPosition* lastAllocatedRefPosition = nullptr;
6524 bool handledBlockEnd = false;
6526 for (auto& refPosition : refPositions)
6528 RefPosition* currentRefPosition = &refPosition;
6531 // Set the activeRefPosition to null until we're done with any boundary handling.
6532 activeRefPosition = nullptr;
6537 // We're really dumping the RegRecords "after" the previous RefPosition, but it's more convenient
6538 // to do this here, since there are a number of "continue"s in this loop.
6548 // This is the previousRefPosition of the current Referent, if any
6549 RefPosition* previousRefPosition = nullptr;
6551 Interval* currentInterval = nullptr;
6552 Referenceable* currentReferent = nullptr;
6553 bool isInternalRef = false;
6554 RefType refType = currentRefPosition->refType;
6556 currentReferent = currentRefPosition->referent;
6558 if (spillAlways() && lastAllocatedRefPosition != nullptr && !lastAllocatedRefPosition->isPhysRegRef &&
6559 !lastAllocatedRefPosition->getInterval()->isInternal &&
6560 (RefTypeIsDef(lastAllocatedRefPosition->refType) || lastAllocatedRefPosition->getInterval()->isLocalVar))
6562 assert(lastAllocatedRefPosition->registerAssignment != RBM_NONE);
6563 RegRecord* regRecord = lastAllocatedRefPosition->getInterval()->assignedReg;
6564 unassignPhysReg(regRecord, lastAllocatedRefPosition);
6565 // Now set lastAllocatedRefPosition to null, so that we don't try to spill it again
6566 lastAllocatedRefPosition = nullptr;
6569 // We wait to free any registers until we've completed all the
6570 // uses for the current node.
6571 // This avoids reusing registers too soon.
6572 // We free before the last true def (after all the uses & internal
6573 // registers), and then again at the beginning of the next node.
6574 // This is made easier by assigning two LsraLocations per node - one
6575 // for all the uses, internal registers & all but the last def, and
6576 // another for the final def (if any).
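// Illustrative example (location numbers hypothetical): if a node's uses and internal registers
// are at location L and its final def is at L+1, registers freed while processing L are not
// handed out again until we advance past L, so a source register can't be re-used for another
// operand of the same node.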
6578 LsraLocation currentLocation = currentRefPosition->nodeLocation;
6580 if ((regsToFree | delayRegsToFree) != RBM_NONE)
6582 bool doFreeRegs = false;
6583 // Free at a new location, or at a basic block boundary
6584 if (currentLocation > prevLocation || refType == RefTypeBB)
6591 freeRegisters(regsToFree);
6592 regsToFree = delayRegsToFree;
6593 delayRegsToFree = RBM_NONE;
6596 prevLocation = currentLocation;
// Get the previous RefPosition; the current RefPosition then becomes the new 'previous'.
6599 if (currentReferent != nullptr)
6601 previousRefPosition = currentReferent->recentRefPosition;
6602 currentReferent->recentRefPosition = currentRefPosition;
6606 assert((refType == RefTypeBB) || (refType == RefTypeKillGCRefs));
6609 // For the purposes of register resolution, we handle the DummyDefs before
6610 // the block boundary - so the RefTypeBB is after all the DummyDefs.
6611 // However, for the purposes of allocation, we want to handle the block
6612 // boundary first, so that we can free any registers occupied by lclVars
// that aren't live in the next block and make them available for the incoming DummyDefs.
6616 if (!handledBlockEnd && (refType == RefTypeBB || refType == RefTypeDummyDef))
6618 // Free any delayed regs (now in regsToFree) before processing the block boundary
6619 freeRegisters(regsToFree);
6620 regsToFree = RBM_NONE;
6621 handledBlockEnd = true;
6622 curBBStartLocation = currentRefPosition->nodeLocation;
6623 if (currentBlock == nullptr)
6625 currentBlock = startBlockSequence();
6629 processBlockEndAllocation(currentBlock);
6630 currentBlock = moveToNextBlock();
6633 if (VERBOSE && currentBlock != nullptr && !dumpTerse)
6635 currentBlock->dspBlockHeader(compiler);
6642 activeRefPosition = currentRefPosition;
6647 dumpRefPositionShort(currentRefPosition, currentBlock);
6651 currentRefPosition->dump();
6656 if (refType == RefTypeBB)
6658 handledBlockEnd = false;
6662 if (refType == RefTypeKillGCRefs)
6664 spillGCRefs(currentRefPosition);
6668 // If this is a FixedReg, disassociate any inactive constant interval from this register.
6669 // Otherwise, do nothing.
6670 if (refType == RefTypeFixedReg)
6672 RegRecord* regRecord = currentRefPosition->getReg();
6673 if (regRecord->assignedInterval != nullptr && !regRecord->assignedInterval->isActive &&
6674 regRecord->assignedInterval->isConstant)
6676 regRecord->assignedInterval = nullptr;
6678 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FIXED_REG, nullptr, currentRefPosition->assignedReg()));
6682 // If this is an exposed use, do nothing - this is merely a placeholder to attempt to
6683 // ensure that a register is allocated for the full lifetime. The resolution logic
6684 // will take care of moving to the appropriate register if needed.
6686 if (refType == RefTypeExpUse)
6688 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_EXP_USE));
6692 regNumber assignedRegister = REG_NA;
6694 if (currentRefPosition->isIntervalRef())
6696 currentInterval = currentRefPosition->getInterval();
6697 assignedRegister = currentInterval->physReg;
6699 if (VERBOSE && !dumpTerse)
6701 currentInterval->dump();
6705 // Identify the special cases where we decide up-front not to allocate
6706 bool allocate = true;
6707 bool didDump = false;
6709 if (refType == RefTypeParamDef || refType == RefTypeZeroInit)
6711 // For a ParamDef with a weighted refCount less than unity, don't enregister it at entry.
6712 // TODO-CQ: Consider doing this only for stack parameters, since otherwise we may be needlessly
6713 // inserting a store.
6714 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
6715 assert(varDsc != nullptr);
6716 if (refType == RefTypeParamDef && varDsc->lvRefCntWtd <= BB_UNITY_WEIGHT)
6718 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_ENTRY_REG_ALLOCATED, currentInterval));
// If it has no actual references, mark it as "lastUse"; since such references are not actually
// part of any flow, they won't have been marked during dataflow. Otherwise, if we allocate a
// register we won't unassign it.
6725 else if (currentRefPosition->nextRefPosition == nullptr)
6727 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ZERO_REF, currentInterval));
6728 currentRefPosition->lastUse = true;
6732 else if (refType == RefTypeUpperVectorSaveDef || refType == RefTypeUpperVectorSaveUse)
6734 Interval* lclVarInterval = currentInterval->relatedInterval;
6735 if (lclVarInterval->physReg == REG_NA)
6740 #endif // FEATURE_SIMD
6742 if (allocate == false)
6744 if (assignedRegister != REG_NA)
6746 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
6750 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
6753 currentRefPosition->registerAssignment = RBM_NONE;
6757 if (currentInterval->isSpecialPutArg)
6759 assert(!currentInterval->isLocalVar);
6760 Interval* srcInterval = currentInterval->relatedInterval;
6761 assert(srcInterval->isLocalVar);
6762 if (refType == RefTypeDef)
6764 assert(srcInterval->recentRefPosition->nodeLocation == currentLocation - 1);
6765 RegRecord* physRegRecord = srcInterval->assignedReg;
// For a putarg_reg to be special, its next use location has to be the same
// as the fixed reg's next kill location. Otherwise, if the source lcl var's next use
// is after the kill of the fixed reg but before putarg_reg's next use, the fixed reg's
// kill would lead to a spill of the source but not of the putarg_reg, if it were treated as special.
6772 if (srcInterval->isActive &&
6773 genRegMask(srcInterval->physReg) == currentRefPosition->registerAssignment &&
6774 currentInterval->getNextRefLocation() == physRegRecord->getNextRefLocation())
6776 assert(physRegRecord->regNum == srcInterval->physReg);
// A special putarg_reg acts as a pass-thru, since both the source lcl var
// and the putarg_reg have the same register allocated. The physical reg
// record continues to point to the source lcl var's interval
// instead of to putarg_reg's interval. So if the reg allocated to the
// source lcl var were spilled and reallocated to another
// tree node before its use at the call node, that would spill the
// lcl var rather than the putarg_reg, since the physical reg record points
// to the lcl var's interval. As a result, the arg reg would get trashed, leading
// to bad codegen. The assumption here is that the source lcl var of a
// special putarg_reg doesn't get spilled and re-allocated prior to
// its use at the call node. This is ensured by marking the physical reg
// record as busy until the next kill.
6790 physRegRecord->isBusyUntilNextKill = true;
6794 currentInterval->isSpecialPutArg = false;
6797 // If this is still a SpecialPutArg, continue;
6798 if (currentInterval->isSpecialPutArg)
6800 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, currentInterval,
6801 currentRefPosition->assignedReg()));
6806 if (assignedRegister == REG_NA && RefTypeIsUse(refType))
6808 currentRefPosition->reload = true;
6809 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, currentInterval, assignedRegister));
6813 regMaskTP assignedRegBit = RBM_NONE;
6814 bool isInRegister = false;
6815 if (assignedRegister != REG_NA)
6817 isInRegister = true;
6818 assignedRegBit = genRegMask(assignedRegister);
6819 if (!currentInterval->isActive)
6821 // If this is a use, it must have started the block on the stack, but the register
6822 // was available for use so we kept the association.
6823 if (RefTypeIsUse(refType))
6825 assert(inVarToRegMaps[curBBNum][currentInterval->getVarIndex(compiler)] == REG_STK &&
6826 previousRefPosition->nodeLocation <= curBBStartLocation);
6827 isInRegister = false;
6831 currentInterval->isActive = true;
6834 assert(currentInterval->assignedReg != nullptr &&
6835 currentInterval->assignedReg->regNum == assignedRegister &&
6836 currentInterval->assignedReg->assignedInterval == currentInterval);
6839 // If this is a physical register, we unconditionally assign it to itself!
6840 if (currentRefPosition->isPhysRegRef)
6842 RegRecord* currentReg = currentRefPosition->getReg();
6843 Interval* assignedInterval = currentReg->assignedInterval;
6845 if (assignedInterval != nullptr)
6847 unassignPhysReg(currentReg, assignedInterval->recentRefPosition);
6849 currentReg->isActive = true;
6850 assignedRegister = currentReg->regNum;
6851 assignedRegBit = genRegMask(assignedRegister);
6852 if (refType == RefTypeKill)
6854 currentReg->isBusyUntilNextKill = false;
6857 else if (previousRefPosition != nullptr)
6859 assert(previousRefPosition->nextRefPosition == currentRefPosition);
6860 assert(assignedRegister == REG_NA || assignedRegBit == previousRefPosition->registerAssignment ||
6861 currentRefPosition->outOfOrder || previousRefPosition->copyReg ||
6862 previousRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef);
6864 else if (assignedRegister != REG_NA)
6866 // Handle the case where this is a preassigned register (i.e. parameter).
6867 // We don't want to actually use the preassigned register if it's not
6868 // going to cover the lifetime - but we had to preallocate it to ensure
6869 // that it remained live.
6870 // TODO-CQ: At some point we may want to refine the analysis here, in case
6871 // it might be beneficial to keep it in this reg for PART of the lifetime
6872 if (currentInterval->isLocalVar)
6874 regMaskTP preferences = currentInterval->registerPreferences;
6875 bool keepAssignment = true;
6876 bool matchesPreferences = (preferences & genRegMask(assignedRegister)) != RBM_NONE;
6878 // Will the assigned register cover the lifetime? If not, does it at least
6879 // meet the preferences for the next RefPosition?
6880 RegRecord* physRegRecord = getRegisterRecord(currentInterval->physReg);
6881 RefPosition* nextPhysRegRefPos = physRegRecord->getNextRefPosition();
6882 if (nextPhysRegRefPos != nullptr &&
6883 nextPhysRegRefPos->nodeLocation <= currentInterval->lastRefPosition->nodeLocation)
6885 // Check to see if the existing assignment matches the preferences (e.g. callee save registers)
6886 // and ensure that the next use of this localVar does not occur after the nextPhysRegRefPos
6887 // There must be a next RefPosition, because we know that the Interval extends beyond the
6888 // nextPhysRegRefPos.
6889 RefPosition* nextLclVarRefPos = currentRefPosition->nextRefPosition;
6890 assert(nextLclVarRefPos != nullptr);
6891 if (!matchesPreferences || nextPhysRegRefPos->nodeLocation < nextLclVarRefPos->nodeLocation ||
6892 physRegRecord->conflictingFixedRegReference(nextLclVarRefPos))
6894 keepAssignment = false;
6897 else if (refType == RefTypeParamDef && !matchesPreferences)
6899 // Don't use the register, even if available, if it doesn't match the preferences.
6900 // Note that this case is only for ParamDefs, for which we haven't yet taken preferences
6901 // into account (we've just automatically got the initial location). In other cases,
6902 // we would already have put it in a preferenced register, if it was available.
6903 // TODO-CQ: Consider expanding this to check availability - that would duplicate
6904 // code here, but otherwise we may wind up in this register anyway.
6905 keepAssignment = false;
6908 if (keepAssignment == false)
6910 currentRefPosition->registerAssignment = allRegs(currentInterval->registerType);
6911 unassignPhysRegNoSpill(physRegRecord);
// If the preferences are currently set to just this register, reset them to allRegs
// of the appropriate type (just as we reset the registerAssignment for this RefPosition).
// Otherwise, simply remove this register from the preferences, if it's there.
6918 if (currentInterval->registerPreferences == assignedRegBit)
6920 currentInterval->registerPreferences = currentRefPosition->registerAssignment;
6924 currentInterval->registerPreferences &= ~assignedRegBit;
6927 assignedRegister = REG_NA;
6928 assignedRegBit = RBM_NONE;
6933 if (assignedRegister != REG_NA)
6935 // If there is a conflicting fixed reference, insert a copy.
6936 RegRecord* physRegRecord = getRegisterRecord(assignedRegister);
6937 if (physRegRecord->conflictingFixedRegReference(currentRefPosition))
6939 // We may have already reassigned the register to the conflicting reference.
6940 // If not, we need to unassign this interval.
6941 if (physRegRecord->assignedInterval == currentInterval)
6943 unassignPhysRegNoSpill(physRegRecord);
6945 currentRefPosition->moveReg = true;
6946 assignedRegister = REG_NA;
6947 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_MOVE_REG, currentInterval, assignedRegister));
6949 else if ((genRegMask(assignedRegister) & currentRefPosition->registerAssignment) != 0)
6951 currentRefPosition->registerAssignment = assignedRegBit;
6952 if (!currentReferent->isActive)
6954 // If we've got an exposed use at the top of a block, the
6955 // interval might not have been active. Otherwise if it's a use,
6956 // the interval must be active.
6957 if (refType == RefTypeDummyDef)
6959 currentReferent->isActive = true;
6960 assert(getRegisterRecord(assignedRegister)->assignedInterval == currentInterval);
6964 currentRefPosition->reload = true;
6967 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, currentInterval, assignedRegister));
6971 assert(currentInterval != nullptr);
6973 // It's already in a register, but not one we need.
6974 // If it is a fixed use that is not marked "delayRegFree", there is already a FixedReg to ensure that
6975 // the needed reg is not otherwise in use, so we can simply ignore it and codegen will do the copy.
6976 // The reason we need special handling for the "delayRegFree" case is that we need to mark the
6977 // fixed-reg as in-use and delayed (the FixedReg RefPosition doesn't handle the delay requirement).
6978 // Otherwise, if this is a pure use localVar or tree temp, we assign a copyReg, but must free both regs
6979 // if it is a last use.
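// Illustrative example (registers hypothetical): a lclVar living in RAX whose use must be in RCX
// and is marked delayRegFree is given a copyReg here; if this is its last use, both RAX and RCX
// are added to the (possibly delayed) free sets below. For a tree temp, the copy is instead
// turned into a moveReg.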
6980 if (!currentRefPosition->isFixedRegRef || currentRefPosition->delayRegFree)
6982 if (!RefTypeIsDef(currentRefPosition->refType))
6984 regNumber copyReg = assignCopyReg(currentRefPosition);
6985 assert(copyReg != REG_NA);
6986 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, currentInterval, copyReg));
6987 lastAllocatedRefPosition = currentRefPosition;
6988 if (currentRefPosition->lastUse)
6990 if (currentRefPosition->delayRegFree)
6992 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED, currentInterval,
6995 (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
7000 dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE, currentInterval, assignedRegister));
7001 regsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
7004 // If this is a tree temp (non-localVar) interval, we will need an explicit move.
7005 if (!currentInterval->isLocalVar)
7007 currentRefPosition->moveReg = true;
7008 currentRefPosition->copyReg = false;
7014 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NEEDS_NEW_REG, nullptr, assignedRegister));
7015 regsToFree |= genRegMask(assignedRegister);
7016 // We want a new register, but we don't want this to be considered a spill.
7017 assignedRegister = REG_NA;
7018 if (physRegRecord->assignedInterval == currentInterval)
7020 unassignPhysRegNoSpill(physRegRecord);
7026 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, assignedRegister));
7031 if (assignedRegister == REG_NA)
7033 bool allocateReg = true;
7035 if (currentRefPosition->AllocateIfProfitable())
// We can avoid allocating a register if this is a last use that requires a reload.
7038 if (currentRefPosition->lastUse && currentRefPosition->reload)
7040 allocateReg = false;
// Under stress mode, don't attempt to allocate a reg to
// a reg-optional ref position.
7046 if (allocateReg && regOptionalNoAlloc())
7048 allocateReg = false;
7055 // Try to allocate a register
7056 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
7059 // If no register was found, and if the currentRefPosition must have a register,
7060 // then find a register to spill
7061 if (assignedRegister == REG_NA)
7064 if (refType == RefTypeUpperVectorSaveDef)
7066 // TODO-CQ: Determine whether copying to two integer callee-save registers would be profitable.
7067 currentRefPosition->registerAssignment = (allRegs(TYP_FLOAT) & RBM_FLT_CALLEE_TRASH);
7068 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
7069 // There MUST be caller-save registers available, because they have all just been killed.
7070 assert(assignedRegister != REG_NA);
7072 // (These will look a bit backward in the dump, but it's a pain to dump the alloc before the spill).
7073 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
7074 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, currentInterval, assignedRegister));
7075 // Now set assignedRegister to REG_NA again so that we don't re-activate it.
7076 assignedRegister = REG_NA;
7079 #endif // FEATURE_SIMD
7080 if (currentRefPosition->RequiresRegister() || currentRefPosition->AllocateIfProfitable())
7084 assignedRegister = allocateBusyReg(currentInterval, currentRefPosition,
7085 currentRefPosition->AllocateIfProfitable());
7088 if (assignedRegister != REG_NA)
7091 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_SPILLED_REG, currentInterval, assignedRegister));
7095 // This can happen only for those ref positions that are to be allocated
7096 // only if profitable.
7097 noway_assert(currentRefPosition->AllocateIfProfitable());
7099 currentRefPosition->registerAssignment = RBM_NONE;
7100 currentRefPosition->reload = false;
7102 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7107 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7108 currentRefPosition->registerAssignment = RBM_NONE;
7109 currentInterval->isActive = false;
7117 if (currentInterval->isConstant && (currentRefPosition->treeNode != nullptr) &&
7118 currentRefPosition->treeNode->IsReuseRegVal())
7120 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, assignedRegister, currentBlock);
7124 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, assignedRegister, currentBlock);
7130 if (refType == RefTypeDummyDef && assignedRegister != REG_NA)
7132 setInVarRegForBB(curBBNum, currentInterval->varNum, assignedRegister);
7135 // If we allocated a register, and this is a use of a spilled value,
7136 // it should have been marked for reload above.
7137 if (assignedRegister != REG_NA && RefTypeIsUse(refType) && !isInRegister)
7139 assert(currentRefPosition->reload);
7143 // If we allocated a register, record it
7144 if (currentInterval != nullptr && assignedRegister != REG_NA)
7146 assignedRegBit = genRegMask(assignedRegister);
7147 currentRefPosition->registerAssignment = assignedRegBit;
7148 currentInterval->physReg = assignedRegister;
7149 regsToFree &= ~assignedRegBit; // we'll set it again later if it's dead
7151 // If this interval is dead, free the register.
7152 // The interval could be dead if this is a user variable, or if the
7153 // node is being evaluated for side effects, or a call whose result
7154 // is not used, etc.
7155 if (currentRefPosition->lastUse || currentRefPosition->nextRefPosition == nullptr)
7157 assert(currentRefPosition->isIntervalRef());
7159 if (refType != RefTypeExpUse && currentRefPosition->nextRefPosition == nullptr)
7161 if (currentRefPosition->delayRegFree)
7163 delayRegsToFree |= assignedRegBit;
7164 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED));
7168 regsToFree |= assignedRegBit;
7169 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE));
7174 currentInterval->isActive = false;
7178 lastAllocatedRefPosition = currentRefPosition;
7182 // Free registers to clear associated intervals for resolution phase
7183 CLANG_FORMAT_COMMENT_ANCHOR;
7186 if (getLsraExtendLifeTimes())
7188 // If we have extended lifetimes, we need to make sure all the registers are freed.
7189 for (int regNumIndex = 0; regNumIndex <= REG_FP_LAST; regNumIndex++)
7191 RegRecord& regRecord = physRegs[regNumIndex];
7192 Interval* interval = regRecord.assignedInterval;
7193 if (interval != nullptr)
7195 interval->isActive = false;
unassignPhysReg(&regRecord, nullptr);
7203 freeRegisters(regsToFree | delayRegsToFree);
7211 // Dump the RegRecords after the last RefPosition is handled.
7216 dumpRefPositions("AFTER ALLOCATION");
7217 dumpVarRefPositions("AFTER ALLOCATION");
7219 // Dump the intervals that remain active
7220 printf("Active intervals at end of allocation:\n");
7222 // We COULD just reuse the intervalIter from above, but ArrayListIterator doesn't
7223 // provide a Reset function (!) - we'll probably replace this so don't bother
7226 for (auto& interval : intervals)
7228 if (interval.isActive)
7240 // LinearScan::resolveLocalRef
7242 // Update the graph for a local reference.
7243 // Also, track the register (if any) that is currently occupied.
7245 // treeNode: The lclVar that's being resolved
7246 // currentRefPosition: the RefPosition associated with the treeNode
7249 // This method is called for each local reference, during the resolveRegisters
7250 // phase of LSRA. It is responsible for keeping the following in sync:
7251 // - varDsc->lvRegNum (and lvOtherReg) contain the unique register location.
7252 // If it is not in the same register through its lifetime, it is set to REG_STK.
7253 // - interval->physReg is set to the assigned register
7254 // (i.e. at the code location which is currently being handled by resolveRegisters())
7255 // - interval->isActive is true iff the interval is live and occupying a register
7256 // - interval->isSpilled is set to true if the interval is EVER spilled
7257 // - interval->isSplit is set to true if the interval does not occupy the same
7258 // register throughout the method
7259 // - RegRecord->assignedInterval points to the interval which currently occupies
7261 // - For each lclVar node:
7262 // - gtRegNum/gtRegPair is set to the currently allocated register(s)
7263 // - GTF_REG_VAL is set if it is a use, and is in a register
7264 // - GTF_SPILLED is set on a use if it must be reloaded prior to use (GTF_REG_VAL
7266 // - GTF_SPILL is set if it must be spilled after use (GTF_REG_VAL may or may not
7269 // A copyReg is an ugly case where the variable must be in a specific (fixed) register,
7270 // but it currently resides elsewhere. The register allocator must track the use of the
7271 // fixed register, but it marks the lclVar node with the register it currently lives in
7272 // and the code generator does the necessary move.
7274 // Before beginning, the varDsc for each parameter must be set to its initial location.
7276 // NICE: Consider tracking whether an Interval is always in the same location (register/stack)
7277 // in which case it will require no resolution.
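// For illustration, a hedged sketch of the copyReg path (the variable, registers, and the
// shape of the consumer are hypothetical): suppose V01 currently lives in RSI, but its next
// use is a call argument that must be in RCX, so the RefPosition is marked copyReg with a
// fixed-register assignment of RCX. In that case this method:
//   - sets treeNode->gtRegNum to RSI (where the value currently lives),
//   - leaves homeReg (and thus varDsc->lvRegNum / interval->physReg) as RSI, and
//   - does not insert a GT_COPY, because isFixedRegRef is true; the code generator
//     emits the move from RSI to RCX itself.
// If instead the copyReg arose from interference or JitStressRegs (no fixed register),
// insertCopyOrReload() is called below to add an explicit GT_COPY node.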
7279 void LinearScan::resolveLocalRef(BasicBlock* block, GenTreePtr treeNode, RefPosition* currentRefPosition)
7281 assert((block == nullptr) == (treeNode == nullptr));
7283 // Is this a tracked local? Or just a register allocated for loading
7284 // a non-tracked one?
7285 Interval* interval = currentRefPosition->getInterval();
7286 if (!interval->isLocalVar)
7290 interval->recentRefPosition = currentRefPosition;
7291 LclVarDsc* varDsc = interval->getLocalVar(compiler);
7293 if (currentRefPosition->registerAssignment == RBM_NONE)
7295 assert(!currentRefPosition->RequiresRegister());
7297 interval->isSpilled = true;
7298 varDsc->lvRegNum = REG_STK;
7299 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
7301 interval->assignedReg->assignedInterval = nullptr;
7303 interval->assignedReg = nullptr;
7304 interval->physReg = REG_NA;
7309 // In most cases, assigned and home registers will be the same
7310 // The exception is the copyReg case, where we've assigned a register
7311 // for a specific purpose, but will be keeping the register assignment
7312 regNumber assignedReg = currentRefPosition->assignedReg();
7313 regNumber homeReg = assignedReg;
7315 // Undo any previous association with a physical register, UNLESS this is a copyReg.
7317 if (!currentRefPosition->copyReg)
7319 regNumber oldAssignedReg = interval->physReg;
7320 if (oldAssignedReg != REG_NA && assignedReg != oldAssignedReg)
7322 RegRecord* oldRegRecord = getRegisterRecord(oldAssignedReg);
7323 if (oldRegRecord->assignedInterval == interval)
7325 oldRegRecord->assignedInterval = nullptr;
7330 if (currentRefPosition->refType == RefTypeUse && !currentRefPosition->reload)
7332 // Was this spilled after our predecessor was scheduled?
7333 if (interval->physReg == REG_NA)
7335 assert(inVarToRegMaps[curBBNum][varDsc->lvVarIndex] == REG_STK);
7336 currentRefPosition->reload = true;
7340 bool reload = currentRefPosition->reload;
7341 bool spillAfter = currentRefPosition->spillAfter;
7343 // In the reload case we simply do not set GTF_REG_VAL, and it gets
7344 // referenced from the variable's home location.
7345 // This is also true for a pure def which is spilled.
7346 if (reload && currentRefPosition->refType != RefTypeDef)
7348 varDsc->lvRegNum = REG_STK;
7351 interval->physReg = assignedReg;
7354 // If there is no treeNode, this must be a RefTypeExpUse, in
7355 // which case we did the reload already
7356 if (treeNode != nullptr)
7358 treeNode->gtFlags |= GTF_SPILLED;
7361 if (currentRefPosition->AllocateIfProfitable())
7363 // This is a use of lclVar that is flagged as reg-optional
7364 // by lower/codegen and marked for both reload and spillAfter.
7365 // In this case we can avoid unnecessary reload and spill
7366 // by setting reg on lclVar to REG_STK and reg on tree node
7367 // to REG_NA. Codegen will generate the code by considering
7368 // it as a contained memory operand.
7370 // Note that varDsc->lvRegNum is already set to REG_STK above.
7371 interval->physReg = REG_NA;
7372 treeNode->gtRegNum = REG_NA;
7373 treeNode->gtFlags &= ~GTF_SPILLED;
7377 treeNode->gtFlags |= GTF_SPILL;
7383 assert(currentRefPosition->refType == RefTypeExpUse);
7386 // If we have an undefined use, set it as non-reg
7387 if (!interval->isSpilled)
7389 if (varDsc->lvIsParam && !varDsc->lvIsRegArg && currentRefPosition == interval->firstRefPosition)
7391 // Parameters are the only thing that can be used before defined
7395 // if we see a use before def of something else, the zero init flag better not be set.
7396 noway_assert(!compiler->info.compInitMem);
7397 // if it is not set, then the behavior is undefined but we don't want to crash or assert
7398 interval->isSpilled = true;
7402 else if (spillAfter && !RefTypeIsUse(currentRefPosition->refType))
7404 // In the case of a pure def, don't bother spilling - just assign it to the
7405 // stack. However, we need to remember that it was spilled.
7407 interval->isSpilled = true;
7408 varDsc->lvRegNum = REG_STK;
7409 interval->physReg = REG_NA;
7410 if (treeNode != nullptr)
7412 treeNode->gtRegNum = REG_NA;
7417 // Not reload and Not pure-def that's spillAfter
7419 if (currentRefPosition->copyReg || currentRefPosition->moveReg)
7421 // For a copyReg or moveReg, we have two cases:
7422 // - In the first case, we have a fixedReg - i.e. a register which the code
7423 // generator is constrained to use.
7424 // The code generator will generate the appropriate move to meet the requirement.
7425 // - In the second case, we were forced to use a different register because of
7426 // interference (or JitStressRegs).
7427 // In this case, we generate a GT_COPY.
7428 // In either case, we annotate the treeNode with the register in which the value
7429 // currently lives. For moveReg, the homeReg is the new register (as assigned above).
7430 // But for copyReg, the homeReg remains unchanged.
7432 assert(treeNode != nullptr);
7433 treeNode->gtRegNum = interval->physReg;
7435 if (currentRefPosition->copyReg)
7437 homeReg = interval->physReg;
7441 interval->physReg = assignedReg;
7444 if (!currentRefPosition->isFixedRegRef || currentRefPosition->moveReg)
7446 // This is the second case, where we need to generate a copy
7447 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(), currentRefPosition);
7452 interval->physReg = assignedReg;
7454 if (!interval->isSpilled && !interval->isSplit)
7456 if (varDsc->lvRegNum != REG_STK)
7458 // If the register assignments don't match, then this interval is split,
7459 // but not spilled (yet)
7460 // However, we don't have a single register assignment now
7461 if (varDsc->lvRegNum != assignedReg)
7463 interval->isSplit = true;
7464 varDsc->lvRegNum = REG_STK;
7469 varDsc->lvRegNum = assignedReg;
7475 if (treeNode != nullptr)
7477 treeNode->gtFlags |= GTF_SPILL;
7479 interval->isSpilled = true;
7480 interval->physReg = REG_NA;
7481 varDsc->lvRegNum = REG_STK;
7484 // This value is in a register, UNLESS we already saw this treeNode
7485 // and marked it for reload
7486 if (treeNode != nullptr && !(treeNode->gtFlags & GTF_SPILLED))
7488 treeNode->gtFlags |= GTF_REG_VAL;
7492 // Update the physRegRecord for the register, so that we know what vars are in
7493 // regs at the block boundaries
7494 RegRecord* physRegRecord = getRegisterRecord(homeReg);
7495 if (spillAfter || currentRefPosition->lastUse)
7497 physRegRecord->assignedInterval = nullptr;
7498 interval->assignedReg = nullptr;
7499 interval->physReg = REG_NA;
7500 interval->isActive = false;
7504 interval->isActive = true;
7505 physRegRecord->assignedInterval = interval;
7506 interval->assignedReg = physRegRecord;
7510 void LinearScan::writeRegisters(RefPosition* currentRefPosition, GenTree* tree)
7512 lsraAssignRegToTree(tree, currentRefPosition->assignedReg(), currentRefPosition->getMultiRegIdx());
7515 //------------------------------------------------------------------------
7516 // insertCopyOrReload: Insert a copy in the case where a tree node value must be moved
7517 // to a different register at the point of use (GT_COPY), or it is reloaded to a different register
7518 // than the one it was spilled from (GT_RELOAD).
7521 // tree - This is the node to copy or reload.
7522 // Insert copy or reload node between this node and its parent.
7523 // multiRegIdx - register position of tree node for which copy or reload is needed.
7524 // refPosition - The RefPosition at which copy or reload will take place.
7527 // The GT_COPY or GT_RELOAD will be inserted in the proper spot in execution order where the reload is to occur.
7529 // For example, for this tree (numbers are execution order, lower is earlier and higher is later):
7531 // +---------+----------+
7533 // +---------+----------+
7538 // +-------------------+ +----------------------+
7539 // | x (1) | "tree" | y (2) |
7540 // +-------------------+ +----------------------+
7542 // generate this tree:
7544 // +---------+----------+
7546 // +---------+----------+
7551 // +-------------------+ +----------------------+
7552 // | GT_RELOAD (3) | | y (2) |
7553 // +-------------------+ +----------------------+
7555 // +-------------------+
7557 // +-------------------+
7559 // Note in particular that the GT_RELOAD node gets inserted in execution order immediately before the parent of "tree",
7560 // which seems a bit weird since normally a node's parent (in this case, the parent of "x", GT_RELOAD in the "after"
7561 // picture) immediately follows all of its children (that is, normally the execution ordering is postorder).
7562 // The ordering must be this weird "out of normal order" way because the "x" node is being spilled, probably
7563 // because the expression in the tree represented above by "y" has high register requirements. We don't want
7564 // to reload immediately, of course. So we put GT_RELOAD where the reload should actually happen.
7566 // Note that GT_RELOAD is required when we reload to a different register than the one we spilled to. It can also be
7567 // used if we reload to the same register. Normally, though, in that case we just mark the node with GTF_SPILLED,
7568 // and the unspilling code automatically reuses the same register, and does the reload when it notices that flag
7569 // when considering a node's operands.
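// A hedged sketch of how this is typically invoked (the node name and index below are
// hypothetical; the real call sites are in resolveLocalRef and resolveRegisters):
//
//   // reload the (single-reg) value defined by 'defNode' into the register chosen
//   // for its next use
//   insertCopyOrReload(block, defNode, 0 /* multiRegIdx */, nextUseRefPosition);
//
// When refPosition->reload is set, the new node is a GT_RELOAD; otherwise (the copyReg case)
// it is a GT_COPY. For a multi-reg call, only the register at 'multiRegIdx' is set on the
// GT_COPY/GT_RELOAD node (new or pre-existing); the other positions remain REG_NA.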
7571 void LinearScan::insertCopyOrReload(BasicBlock* block, GenTreePtr tree, unsigned multiRegIdx, RefPosition* refPosition)
7573 LIR::Range& blockRange = LIR::AsRange(block);
7576 bool foundUse = blockRange.TryGetUse(tree, &treeUse);
7579 GenTree* parent = treeUse.User();
7582 if (refPosition->reload)
7591 // If the parent is a reload/copy node, then tree must be a multi-reg call node
7592 // that has already had one of its registers spilled. This is because multi-reg
7593 // call node is the only node whose RefTypeDef positions get independently
7594 // spilled or reloaded. It is possible that one of its RefTypeDef position got
7595 // spilled and the next use of it requires it to be in a different register.
7597 // In this case set the ith position reg of reload/copy node to the reg allocated
7598 // for copy/reload refPosition. Essentially a copy/reload node will have a reg
7599 // for each multi-reg position of its child. If there is a valid reg in ith
7600 // position of GT_COPY or GT_RELOAD node then the corresponding result of its
7601 // child needs to be copied or reloaded to that reg.
7602 if (parent->IsCopyOrReload())
7604 noway_assert(parent->OperGet() == oper);
7605 noway_assert(tree->IsMultiRegCall());
7606 GenTreeCall* call = tree->AsCall();
7607 GenTreeCopyOrReload* copyOrReload = parent->AsCopyOrReload();
7608 noway_assert(copyOrReload->GetRegNumByIdx(multiRegIdx) == REG_NA);
7609 copyOrReload->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
7613 // Create the new node, with "tree" as its only child.
7614 var_types treeType = tree->TypeGet();
7617 // Check to see whether we need to move to a different register set.
7618 // This currently only happens in the case of SIMD vector types that are small enough (pointer size)
7619 // that they must be passed & returned in integer registers.
7620 // 'treeType' is the type of the register we are moving FROM,
7621 // and refPosition->registerAssignment is the mask for the register we are moving TO.
7622 // If they don't match, we need to reverse the type for the "move" node.
7624 if ((allRegs(treeType) & refPosition->registerAssignment) == 0)
7626 treeType = (useFloatReg(treeType)) ? TYP_I_IMPL : TYP_SIMD8;
7628 #endif // FEATURE_SIMD
7630 GenTreeCopyOrReload* newNode = new (compiler, oper) GenTreeCopyOrReload(oper, treeType, tree);
7631 assert(refPosition->registerAssignment != RBM_NONE);
7632 newNode->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
7633 newNode->gtLsraInfo.isLsraAdded = true;
7634 newNode->gtLsraInfo.isLocalDefUse = false;
7635 if (refPosition->copyReg)
7637 // This is a TEMPORARY copy
7638 assert(isCandidateLocalRef(tree));
7639 newNode->gtFlags |= GTF_VAR_DEATH;
7642 // Insert the copy/reload after the spilled node and replace the use of the original node with a use
7643 // of the copy/reload.
7644 blockRange.InsertAfter(tree, newNode);
7645 treeUse.ReplaceWith(compiler, newNode);
7649 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7650 //------------------------------------------------------------------------
7651 // insertUpperVectorSaveAndReload: Insert code to save and restore the upper half of a vector that lives
7652 // in a callee-save register at the point of a kill (the upper half is not preserved).
7656 // tree - This is the node around which we will insert the Save & Reload.
7657 // It will be a call or some node that turns into a call.
7658 // refPosition - The RefTypeUpperVectorSaveDef RefPosition.
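// A hedged sketch of the code that gets inserted (V05, xmm6 and xmm4 are hypothetical;
// the actual nodes are built below):
//
//   GT_LCL_VAR  V05 (in callee-saved xmm6)                  ; saveLcl
//   GT_SIMD     SIMDIntrinsicUpperSave    -> xmm4           ; inserted before the call
//   ... the call (the kill point) ...
//   GT_LCL_VAR  V05 (in xmm6)                               ; restoreLcl
//   GT_SIMD     SIMDIntrinsicUpperRestore <- xmm4           ; inserted after the call
//
// If refPosition->spillAfter is set, the save node is additionally marked GTF_SPILL and the
// restore node GTF_SPILLED, so the upper half travels through memory rather than a register.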
7660 void LinearScan::insertUpperVectorSaveAndReload(GenTreePtr tree, RefPosition* refPosition, BasicBlock* block)
7662 Interval* lclVarInterval = refPosition->getInterval()->relatedInterval;
7663 assert(lclVarInterval->isLocalVar == true);
7664 LclVarDsc* varDsc = compiler->lvaTable + lclVarInterval->varNum;
7665 assert(varDsc->lvType == LargeVectorType);
7666 regNumber lclVarReg = lclVarInterval->physReg;
7667 if (lclVarReg == REG_NA)
7672 assert((genRegMask(lclVarReg) & RBM_FLT_CALLEE_SAVED) != RBM_NONE);
7674 regNumber spillReg = refPosition->assignedReg();
7675 bool spillToMem = refPosition->spillAfter;
7677 LIR::Range& blockRange = LIR::AsRange(block);
7679 // First, insert the save as an embedded statement before the call.
7681 GenTreePtr saveLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
7682 saveLcl->gtLsraInfo.isLsraAdded = true;
7683 saveLcl->gtRegNum = lclVarReg;
7684 saveLcl->gtFlags |= GTF_REG_VAL;
7685 saveLcl->gtLsraInfo.isLocalDefUse = false;
7687 GenTreeSIMD* simdNode =
7688 new (compiler, GT_SIMD) GenTreeSIMD(LargeVectorSaveType, saveLcl, nullptr, SIMDIntrinsicUpperSave,
7689 varDsc->lvBaseType, genTypeSize(LargeVectorType));
7690 simdNode->gtLsraInfo.isLsraAdded = true;
7691 simdNode->gtRegNum = spillReg;
7694 simdNode->gtFlags |= GTF_SPILL;
7697 blockRange.InsertBefore(tree, LIR::SeqTree(compiler, simdNode));
7699 // Now insert the restore after the call.
7701 GenTreePtr restoreLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
7702 restoreLcl->gtLsraInfo.isLsraAdded = true;
7703 restoreLcl->gtRegNum = lclVarReg;
7704 restoreLcl->gtFlags |= GTF_REG_VAL;
7705 restoreLcl->gtLsraInfo.isLocalDefUse = false;
7707 simdNode = new (compiler, GT_SIMD)
7708 GenTreeSIMD(LargeVectorType, restoreLcl, nullptr, SIMDIntrinsicUpperRestore, varDsc->lvBaseType, 32);
7709 simdNode->gtLsraInfo.isLsraAdded = true;
7710 simdNode->gtRegNum = spillReg;
7713 simdNode->gtFlags |= GTF_SPILLED;
7716 blockRange.InsertAfter(tree, LIR::SeqTree(compiler, simdNode));
7718 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7720 //------------------------------------------------------------------------
7721 // initMaxSpill: Initializes the LinearScan members used to track the max number
7722 // of concurrent spills. This is needed so that we can set the
7723 // fields in Compiler, so that the code generator, in turn can
7724 // allocate the right number of spill locations.
7733 // This is called before any calls to updateMaxSpill().
7735 void LinearScan::initMaxSpill()
7737 needDoubleTmpForFPCall = false;
7738 needFloatTmpForFPCall = false;
7739 for (int i = 0; i < TYP_COUNT; i++)
7742 currentSpill[i] = 0;
7746 //------------------------------------------------------------------------
7747 // recordMaxSpill: Sets the fields in Compiler for the max number of concurrent spills.
7748 // (See the comment on initMaxSpill.)
7757 // This is called after updateMaxSpill() has been called for all "real"
7760 void LinearScan::recordMaxSpill()
7762 // Note: due to the temp normalization process (see tmpNormalizeType)
7763 // only a few types should actually be seen here.
7764 JITDUMP("Recording the maximum number of concurrent spills:\n");
7766 var_types returnType = compiler->tmpNormalizeType(compiler->info.compRetType);
7767 if (needDoubleTmpForFPCall || (returnType == TYP_DOUBLE))
7769 JITDUMP("Adding a spill temp for moving a double call/return value between xmm reg and x87 stack.\n");
7770 maxSpill[TYP_DOUBLE] += 1;
7772 if (needFloatTmpForFPCall || (returnType == TYP_FLOAT))
7774 JITDUMP("Adding a spill temp for moving a float call/return value between xmm reg and x87 stack.\n");
7775 maxSpill[TYP_FLOAT] += 1;
7777 #endif // _TARGET_X86_
7778 for (int i = 0; i < TYP_COUNT; i++)
7780 if (var_types(i) != compiler->tmpNormalizeType(var_types(i)))
7782 // Only normalized types should have anything in the maxSpill array.
7783 // We assume here that if type 'i' does not normalize to itself, then
7784 // nothing else normalizes to 'i', either.
7785 assert(maxSpill[i] == 0);
7787 JITDUMP(" %s: %d\n", varTypeName(var_types(i)), maxSpill[i]);
7788 if (maxSpill[i] != 0)
7790 compiler->tmpPreAllocateTemps(var_types(i), maxSpill[i]);
7795 //------------------------------------------------------------------------
7796 // updateMaxSpill: Update the maximum number of concurrent spills
7799 // refPosition - the current RefPosition being handled
7805 // The RefPosition has an associated interval (getInterval() will
7806 // otherwise assert).
7809 // This is called for each "real" RefPosition during the writeback
7810 // phase of LSRA. It keeps track of how many concurrently-live
7811 // spills there are, and the largest number seen so far.
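// A worked example (the temps and ordering are hypothetical): suppose two TYP_INT tree temps
// are spilled over overlapping ranges.
//   def A, spillAfter : currentSpill[TYP_INT] = 1, maxSpill[TYP_INT] = 1
//   def B, spillAfter : currentSpill[TYP_INT] = 2, maxSpill[TYP_INT] = 2
//   use A, reload     : currentSpill[TYP_INT] = 1
//   use B, reload     : currentSpill[TYP_INT] = 0
// recordMaxSpill() will then ask the Compiler to pre-allocate two TYP_INT spill temps.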
7813 void LinearScan::updateMaxSpill(RefPosition* refPosition)
7815 RefType refType = refPosition->refType;
7817 if (refPosition->spillAfter || refPosition->reload ||
7818 (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA))
7820 Interval* interval = refPosition->getInterval();
7821 if (!interval->isLocalVar)
7823 // The tmp allocation logic 'normalizes' types to a small number of
7824 // types that need distinct stack locations from each other.
7825 // Those types are currently gc refs, byrefs, <= 4 byte non-GC items,
7826 // 8-byte non-GC items, and 16-byte or 32-byte SIMD vectors.
7827 // LSRA is agnostic to those choices but needs
7828 // to know what they are here.
7831 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7832 if ((refType == RefTypeUpperVectorSaveDef) || (refType == RefTypeUpperVectorSaveUse))
7834 typ = LargeVectorSaveType;
7837 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7839 GenTreePtr treeNode = refPosition->treeNode;
7840 if (treeNode == nullptr)
7842 assert(RefTypeIsUse(refType));
7843 treeNode = interval->firstRefPosition->treeNode;
7845 assert(treeNode != nullptr);
7847 // In case of multi-reg call nodes, we need to use the type
7848 // of the return register given by multiRegIdx of the refposition.
7849 if (treeNode->IsMultiRegCall())
7851 ReturnTypeDesc* retTypeDesc = treeNode->AsCall()->GetReturnTypeDesc();
7852 typ = retTypeDesc->GetReturnRegType(refPosition->getMultiRegIdx());
7856 typ = treeNode->TypeGet();
7858 typ = compiler->tmpNormalizeType(typ);
7861 if (refPosition->spillAfter && !refPosition->reload)
7863 currentSpill[typ]++;
7864 if (currentSpill[typ] > maxSpill[typ])
7866 maxSpill[typ] = currentSpill[typ];
7869 else if (refPosition->reload)
7871 assert(currentSpill[typ] > 0);
7872 currentSpill[typ]--;
7874 else if (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA)
7876 // A spill temp is not getting reloaded into a reg because it is
7877 // marked as allocate-if-profitable and is being used from its
7878 // memory location. To properly account for the max spill of typ,
7879 // we decrement the spill count.
7880 assert(RefTypeIsUse(refType));
7881 assert(currentSpill[typ] > 0);
7882 currentSpill[typ]--;
7884 JITDUMP(" Max spill for %s is %d\n", varTypeName(typ), maxSpill[typ]);
7889 // This is the final phase of register allocation. It writes the register assignments to
7890 // the tree, and performs resolution across joins and backedges.
7892 void LinearScan::resolveRegisters()
7894 // Iterate over the tree and the RefPositions in lockstep
7895 // - annotate the tree with register assignments by setting gtRegNum or gtRegPair (for longs)
7897 // - track globally-live var locations
7898 // - add resolution points at split/merge/critical points as needed
7900 // Need to use the same traversal order as the one that assigns the location numbers.
7902 // Dummy RefPositions have been added at any split, join or critical edge, at the
7903 // point where resolution may be required. These are located:
7904 // - for a split, at the top of the non-adjacent block
7905 // - for a join, at the bottom of the non-adjacent joining block
7906 // - for a critical edge, at the top of the target block of each critical edge
7908 // Note that a target block may have multiple incoming critical or split edges
7910 // These RefPositions record the expected location of the Interval at that point.
7911 // At each branch, we identify the location of each liveOut interval, and check
7912 // against the RefPositions at the target.
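// For example (hypothetical): at the top of a block with two predecessors, a RefTypeDummyDef
// may record that V04 is expected in RDX. resolveLocalRef() is called for it below (with a
// null treeNode) to update the interval and the inVarToRegMap; any predecessor that actually
// leaves V04 somewhere else is fixed up later by resolveEdges().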
7915 LsraLocation currentLocation = MinLocation;
7917 // Clear register assignments - these will be reestablished as lclVar defs (including RefTypeParamDefs)
7919 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7921 RegRecord* physRegRecord = getRegisterRecord(reg);
7922 Interval* assignedInterval = physRegRecord->assignedInterval;
7923 if (assignedInterval != nullptr)
7925 assignedInterval->assignedReg = nullptr;
7926 assignedInterval->physReg = REG_NA;
7928 physRegRecord->assignedInterval = nullptr;
7929 physRegRecord->recentRefPosition = nullptr;
7932 // Clear "recentRefPosition" for lclVar intervals
7933 for (unsigned lclNum = 0; lclNum < compiler->lvaCount; lclNum++)
7935 localVarIntervals[lclNum]->recentRefPosition = nullptr;
7936 localVarIntervals[lclNum]->isActive = false;
7939 // handle incoming arguments and special temps
7940 auto currentRefPosition = refPositions.begin();
7942 VarToRegMap entryVarToRegMap = inVarToRegMaps[compiler->fgFirstBB->bbNum];
7943 while (currentRefPosition != refPositions.end() &&
7944 (currentRefPosition->refType == RefTypeParamDef || currentRefPosition->refType == RefTypeZeroInit))
7946 Interval* interval = currentRefPosition->getInterval();
7947 assert(interval != nullptr && interval->isLocalVar);
7948 resolveLocalRef(nullptr, nullptr, currentRefPosition);
7949 regNumber reg = REG_STK;
7950 int varIndex = interval->getVarIndex(compiler);
7952 if (!currentRefPosition->spillAfter && currentRefPosition->registerAssignment != RBM_NONE)
7954 reg = currentRefPosition->assignedReg();
7959 interval->isActive = false;
7961 entryVarToRegMap[varIndex] = reg;
7962 ++currentRefPosition;
7965 JITDUMP("------------------------\n");
7966 JITDUMP("WRITING BACK ASSIGNMENTS\n");
7967 JITDUMP("------------------------\n");
7969 BasicBlock* insertionBlock = compiler->fgFirstBB;
7970 GenTreePtr insertionPoint = LIR::AsRange(insertionBlock).FirstNonPhiNode();
7972 // write back assignments
7973 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
7975 assert(curBBNum == block->bbNum);
7980 block->dspBlockHeader(compiler);
7981 currentRefPosition->dump();
7985 // Record the var locations at the start of this block.
7986 // (If it's fgFirstBB, we've already done that above, see entryVarToRegMap)
7988 curBBStartLocation = currentRefPosition->nodeLocation;
7989 if (block != compiler->fgFirstBB)
7991 processBlockStartLocations(block, false);
7994 // Handle the DummyDefs, updating the incoming var location.
7995 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType == RefTypeDummyDef;
7996 ++currentRefPosition)
7998 assert(currentRefPosition->isIntervalRef());
7999 // Don't mark dummy defs as reload
8000 currentRefPosition->reload = false;
8001 resolveLocalRef(nullptr, nullptr, currentRefPosition);
8003 if (currentRefPosition->registerAssignment != RBM_NONE)
8005 reg = currentRefPosition->assignedReg();
8010 currentRefPosition->getInterval()->isActive = false;
8012 setInVarRegForBB(curBBNum, currentRefPosition->getInterval()->varNum, reg);
8015 // The next RefPosition should be for the block. Move past it.
8016 assert(currentRefPosition != refPositions.end());
8017 assert(currentRefPosition->refType == RefTypeBB);
8018 ++currentRefPosition;
8020 // Handle the RefPositions for the block
8021 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB &&
8022 currentRefPosition->refType != RefTypeDummyDef;
8023 ++currentRefPosition)
8025 currentLocation = currentRefPosition->nodeLocation;
8026 JITDUMP("current : ");
8027 DBEXEC(VERBOSE, currentRefPosition->dump());
8029 // Ensure that the spill & copy info is valid.
8030 // First, if it's reload, it must not be copyReg or moveReg
8031 assert(!currentRefPosition->reload || (!currentRefPosition->copyReg && !currentRefPosition->moveReg));
8032 // If it's copyReg it must not be moveReg, and vice-versa
8033 assert(!currentRefPosition->copyReg || !currentRefPosition->moveReg);
8035 switch (currentRefPosition->refType)
8038 case RefTypeUpperVectorSaveUse:
8039 case RefTypeUpperVectorSaveDef:
8040 #endif // FEATURE_SIMD
8043 // These are the ones we're interested in
8046 case RefTypeFixedReg:
8047 // These require no handling at resolution time
8048 assert(currentRefPosition->referent != nullptr);
8049 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8052 // Ignore the ExpUse cases - a RefTypeExpUse would only exist if the
8053 // variable is dead at the entry to the next block. So we'll mark
8054 // it as in its current location and resolution will take care of any
8056 assert(getNextBlock() == nullptr ||
8057 !VarSetOps::IsMember(compiler, getNextBlock()->bbLiveIn,
8058 currentRefPosition->getInterval()->getVarIndex(compiler)));
8059 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8061 case RefTypeKillGCRefs:
8062 // No action to take at resolution time, and no interval to update recentRefPosition for.
8064 case RefTypeDummyDef:
8065 case RefTypeParamDef:
8066 case RefTypeZeroInit:
8067 // Should have handled all of these already
8072 updateMaxSpill(currentRefPosition);
8073 GenTree* treeNode = currentRefPosition->treeNode;
8075 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8076 if (currentRefPosition->refType == RefTypeUpperVectorSaveDef)
8078 // The treeNode must be a call, and this must be a RefPosition for a LargeVectorType LocalVar.
8079 // If the LocalVar is in a callee-save register, we are going to spill its upper half around the call.
8080 // If we have allocated a register to spill it to, we will use that; otherwise, we will spill it
8081 // to the stack. We can use as a temp register any non-arg caller-save register.
8082 noway_assert(treeNode != nullptr);
8083 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8084 insertUpperVectorSaveAndReload(treeNode, currentRefPosition, block);
8086 else if (currentRefPosition->refType == RefTypeUpperVectorSaveUse)
8090 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8092 // Most uses won't actually need to be recorded (they're on the def).
8093 // In those cases, treeNode will be nullptr.
8094 if (treeNode == nullptr)
8096 // This is either a use, a dead def, or a field of a struct
8097 Interval* interval = currentRefPosition->getInterval();
8098 assert(currentRefPosition->refType == RefTypeUse ||
8099 currentRefPosition->registerAssignment == RBM_NONE || interval->isStructField);
8101 // TODO-Review: Need to handle the case where any of the struct fields
8102 // are reloaded/spilled at this use
8103 assert(!interval->isStructField ||
8104 (currentRefPosition->reload == false && currentRefPosition->spillAfter == false));
8106 if (interval->isLocalVar && !interval->isStructField)
8108 LclVarDsc* varDsc = interval->getLocalVar(compiler);
8110 // This must be a dead definition. We need to mark the lclVar
8111 // so that it's not considered a candidate for lvRegister, as
8112 // this dead def will have to go to the stack.
8113 assert(currentRefPosition->refType == RefTypeDef);
8114 varDsc->lvRegNum = REG_STK;
8117 JITDUMP("No tree node to write back to\n");
8121 DBEXEC(VERBOSE, lsraDispNode(treeNode, LSRA_DUMP_REFPOS, true));
8124 LsraLocation loc = treeNode->gtLsraInfo.loc;
8125 JITDUMP("curr = %u mapped = %u", currentLocation, loc);
8126 assert(treeNode->IsLocal() || currentLocation == loc || currentLocation == loc + 1);
8128 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isInternal)
8130 JITDUMP(" internal");
8131 GenTreePtr indNode = nullptr;
8132 if (treeNode->OperIsIndir())
8135 JITDUMP(" allocated at GT_IND");
8137 if (indNode != nullptr)
8139 GenTreePtr addrNode = indNode->gtOp.gtOp1->gtEffectiveVal();
8140 if (addrNode->OperGet() != GT_ARR_ELEM)
8142 addrNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
8143 JITDUMP(", recorded on addr");
8146 if (treeNode->OperGet() == GT_ARR_ELEM)
8148 // TODO-Review: See WORKAROUND ALERT in buildRefPositionsForNode()
8149 GenTreePtr firstIndexTree = treeNode->gtArrElem.gtArrInds[0]->gtEffectiveVal();
8150 assert(firstIndexTree != nullptr);
8151 if (firstIndexTree->IsLocal() && (firstIndexTree->gtFlags & GTF_VAR_DEATH) == 0)
8153 // Record the LAST internal interval
8154 // (Yes, this naively just records each one, but the next will replace it;
8155 // I'd fix this if it wasn't just a temporary fix)
8156 if (currentRefPosition->refType == RefTypeDef)
8158 JITDUMP(" allocated at GT_ARR_ELEM, recorded on firstIndex V%02u", firstIndexTree->AsLclVarCommon()->gtLclNum);
8159 firstIndexTree->gtRsvdRegs = (regMaskSmall)currentRefPosition->registerAssignment;
8163 treeNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
8167 writeRegisters(currentRefPosition, treeNode);
8169 if (treeNode->IsLocal() && currentRefPosition->getInterval()->isLocalVar)
8171 resolveLocalRef(block, treeNode, currentRefPosition);
8174 // Mark spill locations on temps
8175 // (local vars are handled in resolveLocalRef, above)
8176 // Note that the tree node will be changed from GTF_SPILL to GTF_SPILLED
8177 // in codegen, taking care of the "reload" case for temps
8178 else if (currentRefPosition->spillAfter || (currentRefPosition->nextRefPosition != nullptr &&
8179 currentRefPosition->nextRefPosition->moveReg))
8181 if (treeNode != nullptr && currentRefPosition->isIntervalRef())
8183 if (currentRefPosition->spillAfter)
8185 treeNode->gtFlags |= GTF_SPILL;
8187 // If this is a constant interval that is reusing a pre-existing value, we actually need
8188 // to generate the value at this point in order to spill it.
8189 if (treeNode->IsReuseRegVal())
8191 treeNode->ResetReuseRegVal();
8194 // In case of multi-reg call node, also set spill flag on the
8195 // register specified by multi-reg index of current RefPosition.
8196 // Note that the spill flag on treeNode indicates that one or
8197 // more of its allocated registers are in that state.
8198 if (treeNode->IsMultiRegCall())
8200 GenTreeCall* call = treeNode->AsCall();
8201 call->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
8205 // If the value is reloaded or moved to a different register, we need to insert
8206 // a node to hold the register to which it should be reloaded
8207 RefPosition* nextRefPosition = currentRefPosition->nextRefPosition;
8208 assert(nextRefPosition != nullptr);
8209 if (INDEBUG(alwaysInsertReload() ||)
8210 nextRefPosition->assignedReg() != currentRefPosition->assignedReg())
8212 if (nextRefPosition->assignedReg() != REG_NA)
8214 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(),
8219 assert(nextRefPosition->AllocateIfProfitable());
8221 // In case of tree temps, if def is spilled and use didn't
8222 // get a register, set a flag on tree node to be treated as
8223 // contained at the point of its use.
8224 if (currentRefPosition->spillAfter && currentRefPosition->refType == RefTypeDef &&
8225 nextRefPosition->refType == RefTypeUse)
8227 assert(nextRefPosition->treeNode == nullptr);
8228 treeNode->gtFlags |= GTF_NOREG_AT_USE;
8234 // We should never have to "spill after" a temp use, since
8235 // they're single use
8245 processBlockEndLocations(block);
8251 printf("-----------------------\n");
8252 printf("RESOLVING BB BOUNDARIES\n");
8253 printf("-----------------------\n");
8255 printf("Prior to Resolution\n");
8256 foreach_block(compiler, block)
8258 printf("\nBB%02u use def in out\n", block->bbNum);
8259 dumpConvertedVarSet(compiler, block->bbVarUse);
8261 dumpConvertedVarSet(compiler, block->bbVarDef);
8263 dumpConvertedVarSet(compiler, block->bbLiveIn);
8265 dumpConvertedVarSet(compiler, block->bbLiveOut);
8268 dumpInVarToRegMap(block);
8269 dumpOutVarToRegMap(block);
8278 // Verify register assignments on variables
8281 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
8283 if (!isCandidateVar(varDsc))
8285 varDsc->lvRegNum = REG_STK;
8289 Interval* interval = getIntervalForLocalVar(lclNum);
8291 // Determine initial position for parameters
8293 if (varDsc->lvIsParam)
8295 regMaskTP initialRegMask = interval->firstRefPosition->registerAssignment;
8296 regNumber initialReg = (initialRegMask == RBM_NONE || interval->firstRefPosition->spillAfter)
8298 : genRegNumFromMask(initialRegMask);
8299 regNumber sourceReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
8302 if (varTypeIsMultiReg(varDsc))
8304 // TODO-ARM-NYI: Map the hi/lo intervals back to lvRegNum and lvOtherReg (these should NYI before
8306 assert(!"Multi-reg types not yet supported");
8309 #endif // _TARGET_ARM_
8311 varDsc->lvArgInitReg = initialReg;
8312 JITDUMP(" Set V%02u argument initial register to %s\n", lclNum, getRegName(initialReg));
8314 if (!varDsc->lvIsRegArg)
8317 if (compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc))
8319 if (sourceReg != initialReg)
8321 // The code generator won't initialize struct
8322 // fields, so we have to do that if it's not already
8323 // where it belongs.
8324 assert(interval->isStructField);
8325 JITDUMP(" Move struct field param V%02u from %s to %s\n", lclNum, getRegName(sourceReg),
8326 getRegName(initialReg));
8327 insertMove(insertionBlock, insertionPoint, lclNum, sourceReg, initialReg);
8333 // If lvRegNum is REG_STK, that means that either no register
8334 // was assigned, or (more likely) that the same register was not
8335 // used for all references. In that case, codegen gets the register
8336 // from the tree node.
8337 if (varDsc->lvRegNum == REG_STK || interval->isSpilled || interval->isSplit)
8339 // For codegen purposes, we'll set lvRegNum to whatever register
8340 // it's currently in as we go.
8341 // However, we never mark an interval as lvRegister if it has either been spilled
8343 varDsc->lvRegister = false;
8345 // Skip any dead defs or exposed uses
8346 // (first use exposed will only occur when there is no explicit initialization)
8347 RefPosition* firstRefPosition = interval->firstRefPosition;
8348 while ((firstRefPosition != nullptr) && (firstRefPosition->refType == RefTypeExpUse))
8350 firstRefPosition = firstRefPosition->nextRefPosition;
8352 if (firstRefPosition == nullptr)
8355 varDsc->lvLRACandidate = false;
8356 if (varDsc->lvRefCnt == 0)
8358 varDsc->lvOnFrame = false;
8362 // We may encounter cases where a lclVar actually has no references, but
8363 // a non-zero refCnt. For safety (in case this is some "hidden" lclVar that we're
8364 // not correctly recognizing), we'll mark those as needing a stack location.
8365 // TODO-Cleanup: Make this an assert if/when we correct the refCnt
8367 varDsc->lvOnFrame = true;
8372 // If the interval was not spilled, it doesn't need a stack location.
8373 if (!interval->isSpilled)
8375 varDsc->lvOnFrame = false;
8377 if (firstRefPosition->registerAssignment == RBM_NONE || firstRefPosition->spillAfter)
8379 // Either this RefPosition is spilled, or it is not a "real" def or use
8380 assert(firstRefPosition->spillAfter ||
8381 (firstRefPosition->refType != RefTypeDef && firstRefPosition->refType != RefTypeUse));
8382 varDsc->lvRegNum = REG_STK;
8386 varDsc->lvRegNum = firstRefPosition->assignedReg();
8393 varDsc->lvRegister = true;
8394 varDsc->lvOnFrame = false;
8397 regMaskTP registerAssignment = genRegMask(varDsc->lvRegNum);
8398 assert(!interval->isSpilled && !interval->isSplit);
8399 RefPosition* refPosition = interval->firstRefPosition;
8400 assert(refPosition != nullptr);
8402 while (refPosition != nullptr)
8404 // All RefPositions must match, except for dead definitions,
8405 // copyReg/moveReg and RefTypeExpUse positions
8406 if (refPosition->registerAssignment != RBM_NONE && !refPosition->copyReg && !refPosition->moveReg &&
8407 refPosition->refType != RefTypeExpUse)
8409 assert(refPosition->registerAssignment == registerAssignment);
8411 refPosition = refPosition->nextRefPosition;
8421 printf("Trees after linear scan register allocator (LSRA)\n");
8422 compiler->fgDispBasicBlocks(true);
8425 verifyFinalAllocation();
8428 compiler->raMarkStkVars();
8431 // TODO-CQ: Review this comment and address as needed.
8432 // Change all unused promoted non-argument struct locals to a non-GC type (in this case TYP_INT)
8433 // so that the gc tracking logic and lvMustInit logic will ignore them.
8434 // Extract the code that does this from raAssignVars, and call it here.
8435 // PRECONDITIONS: Ensure that lvPromoted is set on promoted structs, if and
8436 // only if it is promoted on all paths.
8437 // Call might be something like:
8438 // compiler->BashUnusedStructLocals();
8442 //------------------------------------------------------------------------
8443 // insertMove: Insert a move of a lclVar with the given lclNum into the given block.
8446 // block - the BasicBlock into which the move will be inserted.
8447 // insertionPoint - the instruction before which to insert the move
8448 // lclNum - the lclNum of the var to be moved
8449 // fromReg - the register from which the var is moving
8450 // toReg - the register to which the var is moving
8456 // If insertionPoint is non-NULL, insert before that instruction;
8457 // otherwise, insert "near" the end (prior to the branch, if any).
8458 // If fromReg or toReg is REG_STK, then move from/to memory, respectively.
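// A hedged sketch of the three forms this produces (V03 and the registers are hypothetical):
//   REG_STK -> RAX : GT_LCL_VAR V03 marked GTF_SPILLED, gtRegNum = RAX
//                    (codegen reloads it from its stack home).
//   RCX -> REG_STK : GT_LCL_VAR V03 marked GTF_SPILL, gtRegNum = RCX
//                    (codegen stores it to its stack home).
//   RCX -> RAX     : GT_COPY with gtRegNum = RAX whose source is the GT_LCL_VAR in RCX.
// The resulting range is inserted at 'insertionPoint' if one is given, otherwise just before
// the block's branch (or at the end of the block if it has none).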
8460 void LinearScan::insertMove(
8461 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum, regNumber fromReg, regNumber toReg)
8463 LclVarDsc* varDsc = compiler->lvaTable + lclNum;
8464 // One or both MUST be a register
8465 assert(fromReg != REG_STK || toReg != REG_STK);
8466 // They must not be the same register.
8467 assert(fromReg != toReg);
8469 // This var can't be marked lvRegister now
8470 varDsc->lvRegNum = REG_STK;
8472 var_types lclTyp = varDsc->TypeGet();
8473 if (varDsc->lvNormalizeOnStore())
8475 lclTyp = genActualType(lclTyp);
8477 GenTreePtr src = compiler->gtNewLclvNode(lclNum, lclTyp);
8478 src->gtLsraInfo.isLsraAdded = true;
8481 // If we are moving from STK to reg, mark the lclVar nodes with GTF_SPILLED
8482 // Otherwise, if we are moving from reg to stack, mark it as GTF_SPILL
8483 // Finally, for a reg-to-reg move, generate a GT_COPY
8486 if (fromReg == REG_STK)
8488 src->gtFlags |= GTF_SPILLED;
8489 src->gtRegNum = toReg;
8491 else if (toReg == REG_STK)
8493 src->gtFlags |= GTF_SPILL;
8495 src->gtRegNum = fromReg;
8499 top = new (compiler, GT_COPY) GenTreeCopyOrReload(GT_COPY, varDsc->TypeGet(), src);
8500 // This is the new home of the lclVar - indicate that by clearing the GTF_VAR_DEATH flag.
8501 // Note that if src is itself a lastUse, this will have no effect.
8502 top->gtFlags &= ~(GTF_VAR_DEATH);
8503 src->gtRegNum = fromReg;
8505 top->gtRegNum = toReg;
8508 src->gtLsraInfo.isLocalDefUse = false;
8509 top->gtLsraInfo.isLsraAdded = true;
8511 top->gtLsraInfo.isLocalDefUse = true;
8513 LIR::Range treeRange = LIR::SeqTree(compiler, top);
8514 LIR::Range& blockRange = LIR::AsRange(block);
8516 if (insertionPoint != nullptr)
8518 blockRange.InsertBefore(insertionPoint, std::move(treeRange));
8522 // Put the copy at the bottom
8523 // If there's a branch, make an embedded statement that executes just prior to the branch
8524 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
8526 noway_assert(!blockRange.IsEmpty());
8528 GenTree* branch = blockRange.LastNode();
8529 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
8530 branch->OperGet() == GT_SWITCH);
8532 blockRange.InsertBefore(branch, std::move(treeRange));
8536 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
8537 blockRange.InsertAtEnd(std::move(treeRange));
8542 void LinearScan::insertSwap(
8543 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum1, regNumber reg1, unsigned lclNum2, regNumber reg2)
8548 const char* insertionPointString = "top";
8549 if (insertionPoint == nullptr)
8551 insertionPointString = "bottom";
8553 printf(" BB%02u %s: swap V%02u in %s with V%02u in %s\n", block->bbNum, insertionPointString, lclNum1,
8554 getRegName(reg1), lclNum2, getRegName(reg2));
8558 LclVarDsc* varDsc1 = compiler->lvaTable + lclNum1;
8559 LclVarDsc* varDsc2 = compiler->lvaTable + lclNum2;
8560 assert(reg1 != REG_STK && reg1 != REG_NA && reg2 != REG_STK && reg2 != REG_NA);
8562 GenTreePtr lcl1 = compiler->gtNewLclvNode(lclNum1, varDsc1->TypeGet());
8563 lcl1->gtLsraInfo.isLsraAdded = true;
8564 lcl1->gtLsraInfo.isLocalDefUse = false;
8566 lcl1->gtRegNum = reg1;
8568 GenTreePtr lcl2 = compiler->gtNewLclvNode(lclNum2, varDsc2->TypeGet());
8569 lcl2->gtLsraInfo.isLsraAdded = true;
8570 lcl2->gtLsraInfo.isLocalDefUse = false;
8572 lcl2->gtRegNum = reg2;
8574 GenTreePtr swap = compiler->gtNewOperNode(GT_SWAP, TYP_VOID, lcl1, lcl2);
8575 swap->gtLsraInfo.isLsraAdded = true;
8576 swap->gtLsraInfo.isLocalDefUse = false;
8577 swap->gtRegNum = REG_NA;
8579 lcl1->gtNext = lcl2;
8580 lcl2->gtPrev = lcl1;
8581 lcl2->gtNext = swap;
8582 swap->gtPrev = lcl2;
8584 LIR::Range swapRange = LIR::SeqTree(compiler, swap);
8585 LIR::Range& blockRange = LIR::AsRange(block);
8587 if (insertionPoint != nullptr)
8589 blockRange.InsertBefore(insertionPoint, std::move(swapRange));
8593 // Put the copy at the bottom
8594 // If there's a branch, make an embedded statement that executes just prior to the branch
8595 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
8597 noway_assert(!blockRange.IsEmpty());
8599 GenTree* branch = blockRange.LastNode();
8600 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
8601 branch->OperGet() == GT_SWITCH);
8603 blockRange.InsertBefore(branch, std::move(swapRange));
8607 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
8608 blockRange.InsertAtEnd(std::move(swapRange));
8613 //------------------------------------------------------------------------
8614 // getTempRegForResolution: Get a free register to use for resolution code.
8617 // fromBlock - The "from" block on the edge being resolved.
8618 // toBlock - The "to" block on the edge
8619 // type - the type of register required
8622 // Returns a register that is free on the given edge, or REG_NA if none is available.
8625 // It is up to the caller to check the return value, determine whether a register is
8626 // available, and handle the no-register case appropriately.
8627 // It is also up to the caller to cache the return value, as this is not cheap to compute.
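// A worked example (hypothetical assignments): if the vars live-in to 'toBlock' occupy
// {RAX, RCX} at the end of 'fromBlock' and {RCX, RDX} at the start of 'toBlock', then for an
// integer type freeRegs starts as allRegs(TYP_INT) and has RAX, RCX and RDX removed; the
// lowest remaining register (say RBX) is returned, or REG_NA if no register is free on
// this edge.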
8629 regNumber LinearScan::getTempRegForResolution(BasicBlock* fromBlock, BasicBlock* toBlock, var_types type)
8631 // TODO-Throughput: This would be much more efficient if we add RegToVarMaps instead of VarToRegMaps
8632 // and they would be more space-efficient as well.
8633 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
8634 VarToRegMap toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
8636 regMaskTP freeRegs = allRegs(type);
8638 if (getStressLimitRegs() == LSRA_LIMIT_SMALL_SET)
8643 INDEBUG(freeRegs = stressLimitRegs(nullptr, freeRegs));
8645 // We are only interested in the variables that are live-in to the "to" block.
8646 VARSET_ITER_INIT(compiler, iter, toBlock->bbLiveIn, varIndex);
8647 while (iter.NextElem(compiler, &varIndex) && freeRegs != RBM_NONE)
8649 regNumber fromReg = fromVarToRegMap[varIndex];
8650 regNumber toReg = toVarToRegMap[varIndex];
8651 assert(fromReg != REG_NA && toReg != REG_NA);
8652 if (fromReg != REG_STK)
8654 freeRegs &= ~genRegMask(fromReg);
8656 if (toReg != REG_STK)
8658 freeRegs &= ~genRegMask(toReg);
8661 if (freeRegs == RBM_NONE)
8667 regNumber tempReg = genRegNumFromMask(genFindLowestBit(freeRegs));
8672 //------------------------------------------------------------------------
8673 // addResolution: Add a resolution move of the given interval
8676 // block - the BasicBlock into which the move will be inserted.
8677 // insertionPoint - the instruction before which to insert the move
8678 // interval - the interval of the var to be moved
8679 // toReg - the register to which the var is moving
8680 // fromReg - the register from which the var is moving
8686 // For joins, we insert at the bottom (indicated by an insertionPoint
8687 // of nullptr), while for splits we insert at the top.
8688 // This is because for joins 'block' is a pred of the join, while for splits it is a succ.
8689 // For critical edges, this function may be called twice - once to move from
8690 // the source (fromReg), if any, to the stack, in which case toReg will be
8691 // REG_STK, and we insert at the bottom (leave insertionPoint as nullptr).
8692 // The next time, we want to move from the stack to the destination (toReg),
8693 // in which case fromReg will be REG_STK, and we insert at the top.
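// For example (hypothetical variable and registers): to resolve a critical edge where V07
// must move from RAX (at the end of the source block) to RCX (at the top of the target)
// and no temp register is free, the caller makes two calls:
//   addResolution(block, nullptr,        interval, REG_STK, REG_RAX); // spill at the bottom
//   addResolution(block, insertionPoint, interval, REG_RCX, REG_STK); // reload at the top
// Both calls go through the stack, so interval->isSpilled is set; a direct reg-to-reg
// resolution move would instead set interval->isSplit.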
8695 void LinearScan::addResolution(
8696 BasicBlock* block, GenTreePtr insertionPoint, Interval* interval, regNumber toReg, regNumber fromReg)
8699 const char* insertionPointString = "top";
8701 if (insertionPoint == nullptr)
8704 insertionPointString = "bottom";
8708 JITDUMP(" BB%02u %s: move V%02u from ", block->bbNum, insertionPointString, interval->varNum);
8709 JITDUMP("%s to %s", getRegName(fromReg), getRegName(toReg));
8711 insertMove(block, insertionPoint, interval->varNum, fromReg, toReg);
8712 if (fromReg == REG_STK || toReg == REG_STK)
8714 interval->isSpilled = true;
8718 interval->isSplit = true;
8722 //------------------------------------------------------------------------
8723 // handleOutgoingCriticalEdges: Performs the necessary resolution on all critical edges that feed out of 'block'
8726 // block - the block with outgoing critical edges.
8732 // For all outgoing critical edges (i.e. any successor of this block which is
8733 // a join edge), if there are any conflicts, split the edge by adding a new block,
8734 // and generate the resolution code into that block.
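// For example (hypothetical): suppose 'block' ends with V10 in RAX and has two critical
// successors that both expect V10 in RDX. V10 goes into sameResolutionSet and a single
// RAX->RDX move is added at the end of 'block' (subject to the live-out, switch-register,
// and same/diff interference checks below). If instead one successor expects V10 in RSI and
// the other in R8, V10 goes into diffResolutionSet and the moves are placed on the split
// edges, one per successor.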
8736 void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
8738 VARSET_TP VARSET_INIT_NOCOPY(sameResolutionSet, VarSetOps::MakeEmpty(compiler));
8739 VARSET_TP VARSET_INIT_NOCOPY(sameLivePathsSet, VarSetOps::MakeEmpty(compiler));
8740 VARSET_TP VARSET_INIT_NOCOPY(singleTargetSet, VarSetOps::MakeEmpty(compiler));
8741 VARSET_TP VARSET_INIT_NOCOPY(diffResolutionSet, VarSetOps::MakeEmpty(compiler));
8743 // Get the outVarToRegMap for this block
8744 VarToRegMap outVarToRegMap = getOutVarToRegMap(block->bbNum);
8745 unsigned succCount = block->NumSucc(compiler);
8746 assert(succCount > 1);
8747 VarToRegMap firstSuccInVarToRegMap = nullptr;
8748 BasicBlock* firstSucc = nullptr;
8750 // First, determine the live regs at the end of this block so that we know what regs are
8751 // available to copy into.
8752 regMaskTP liveOutRegs = RBM_NONE;
8753 VARSET_ITER_INIT(compiler, iter1, block->bbLiveOut, varIndex1);
8754 while (iter1.NextElem(compiler, &varIndex1))
8756 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex1];
8757 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
8758 if (fromReg != REG_STK)
8760 liveOutRegs |= genRegMask(fromReg);
8764 // Next, if this block ends with a switch table, we have to make sure not to copy
8765 // into the registers that it uses.
8766 regMaskTP switchRegs = RBM_NONE;
8767 if (block->bbJumpKind == BBJ_SWITCH)
8769 // At this point, Lowering has transformed any non-switch-table blocks into
8771 GenTree* switchTable = LIR::AsRange(block).LastNode();
8772 assert(switchTable != nullptr && switchTable->OperGet() == GT_SWITCH_TABLE);
8774 switchRegs = switchTable->gtRsvdRegs;
8775 GenTree* op1 = switchTable->gtGetOp1();
8776 GenTree* op2 = switchTable->gtGetOp2();
8777 noway_assert(op1 != nullptr && op2 != nullptr);
8778 assert(op1->gtRegNum != REG_NA && op2->gtRegNum != REG_NA);
8779 switchRegs |= genRegMask(op1->gtRegNum);
8780 switchRegs |= genRegMask(op2->gtRegNum);
8783 VarToRegMap sameVarToRegMap = sharedCriticalVarToRegMap;
8784 regMaskTP sameWriteRegs = RBM_NONE;
8785 regMaskTP diffReadRegs = RBM_NONE;
8787 // For each var, classify them as:
8788 // - in the same register at the end of this block and at each target (no resolution needed)
8789 // - in different registers at different targets (resolve separately):
8790 // diffResolutionSet
8791 // - in the same register at each target at which it's live, but different from the end of
8792 // this block. We may be able to resolve these as if it is "join", but only if they do not
8793 // write to any registers that are read by those in the diffResolutionSet:
8794 // sameResolutionSet
8796 VARSET_ITER_INIT(compiler, iter, block->bbLiveOut, varIndex);
8797 while (iter.NextElem(compiler, &varIndex))
8799 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
8800 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
8801 bool isMatch = true;
8802 bool isSame = false;
8803 bool maybeSingleTarget = false;
8804 bool maybeSameLivePaths = false;
8805 bool liveOnlyAtSplitEdge = true;
8806 regNumber sameToReg = REG_NA;
8807 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
8809 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
8810 if (!VarSetOps::IsMember(compiler, succBlock->bbLiveIn, varIndex))
8812 maybeSameLivePaths = true;
8815 else if (liveOnlyAtSplitEdge)
8817 // Is the var live only at those target blocks which are connected by a split edge to this block
8818 liveOnlyAtSplitEdge = ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB));
8821 regNumber toReg = getVarReg(getInVarToRegMap(succBlock->bbNum), varNum);
8822 if (sameToReg == REG_NA)
8827 if (toReg == sameToReg)
8835 // Check for the cases where we can't write to a register.
8836 // We only need to check for these cases if sameToReg is an actual register (not REG_STK).
8837 if (sameToReg != REG_NA && sameToReg != REG_STK)
8839 // If there's a path on which this var isn't live, it may use the original value in sameToReg.
8840 // In this case, sameToReg will be in the liveOutRegs of this block.
8841 // Similarly, if sameToReg is in sameWriteRegs, it has already been used (i.e. for a lclVar that's
8842 // live only at another target), and we can't copy another lclVar into that reg in this block.
8843 regMaskTP sameToRegMask = genRegMask(sameToReg);
8844 if (maybeSameLivePaths &&
8845 (((sameToRegMask & liveOutRegs) != RBM_NONE) || ((sameToRegMask & sameWriteRegs) != RBM_NONE)))
8849 // If this register is used by a switch table at the end of the block, we can't do the copy
8850 // in this block (since we can't insert it after the switch).
8851 if ((sameToRegMask & switchRegs) != RBM_NONE)
8856 // If the var is live only at those blocks connected by a split edge and not live-in at some of the
8857 // target blocks, we will resolve it the same way as if it were in diffResolutionSet and resolution
8858 // will be deferred to the handling of split edges, which means copy will only be at those target(s).
8860 // Another way to achieve similar resolution for vars live only at split edges is by removing them
8861 // from consideration up-front but it requires that we traverse those edges anyway to account for
8862 // the registers that must not be overwritten.
8863 if (liveOnlyAtSplitEdge && maybeSameLivePaths)
8869 if (sameToReg == REG_NA)
8871 VarSetOps::AddElemD(compiler, diffResolutionSet, varIndex);
8872 if (fromReg != REG_STK)
8874 diffReadRegs |= genRegMask(fromReg);
8877 else if (sameToReg != fromReg)
8879 VarSetOps::AddElemD(compiler, sameResolutionSet, varIndex);
8880 sameVarToRegMap[varIndex] = sameToReg;
8881 if (sameToReg != REG_STK)
8883 sameWriteRegs |= genRegMask(sameToReg);
8888 if (!VarSetOps::IsEmpty(compiler, sameResolutionSet))
8890 if ((sameWriteRegs & diffReadRegs) != RBM_NONE)
8892 // We cannot split the "same" and "diff" regs if the "same" set writes registers
8893 // that must be read by the "diff" set. (Note that when these are done as a "batch"
8894 // we carefully order them to ensure all the input regs are read before they are
8896 VarSetOps::UnionD(compiler, diffResolutionSet, sameResolutionSet);
8897 VarSetOps::ClearD(compiler, sameResolutionSet);
8901 // For any vars in the sameResolutionSet, we can simply add the move at the end of "block".
8902 resolveEdge(block, nullptr, ResolveSharedCritical, sameResolutionSet);
8905 if (!VarSetOps::IsEmpty(compiler, diffResolutionSet))
8907 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
8909 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
8911 // Any "diffResolutionSet" resolution for a block with no other predecessors will be handled later
8912 // as split resolution.
8913 if ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB))
8918 // Now collect the resolution set for just this edge, if any.
8919 // Check only the vars in diffResolutionSet that are live-in to this successor.
8920 bool needsResolution = false;
8921 VarToRegMap succInVarToRegMap = getInVarToRegMap(succBlock->bbNum);
8922 VARSET_TP VARSET_INIT_NOCOPY(edgeResolutionSet,
8923 VarSetOps::Intersection(compiler, diffResolutionSet, succBlock->bbLiveIn));
8924 VARSET_ITER_INIT(compiler, iter, edgeResolutionSet, varIndex);
8925 while (iter.NextElem(compiler, &varIndex))
8927 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
8928 Interval* interval = getIntervalForLocalVar(varNum);
8929 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
8930 regNumber toReg = getVarReg(succInVarToRegMap, varNum);
8932 if (fromReg == toReg)
8934 VarSetOps::RemoveElemD(compiler, edgeResolutionSet, varIndex);
8937 if (!VarSetOps::IsEmpty(compiler, edgeResolutionSet))
8939 resolveEdge(block, succBlock, ResolveCritical, edgeResolutionSet);
8945 //------------------------------------------------------------------------
8946 // resolveEdges: Perform resolution across basic block edges
8955 // Traverse the basic blocks.
8956 // - If this block has a single predecessor that is not the immediately
8957 // preceding block, perform any needed 'split' resolution at the beginning of this block
8958 // - Otherwise if this block has critical incoming edges, handle them.
8959 // - If this block has a single successor that has multiple predecessors, perform any needed
8960 // 'join' resolution at the end of this block.
8961 // Note that a block may have both 'split' or 'critical' incoming edge(s) and 'join' outgoing edges.
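// As an illustrative sketch (hypothetical block numbers):
// - If BB20's only predecessor is BB10, and BB10 is not the block that immediately precedes BB20,
//   any required moves are inserted at the start of BB20 (ResolveSplit).
// - If BB10 has multiple successors and one of them, BB30, also has multiple predecessors, the
//   edge BB10->BB30 is critical and is handled by handleOutgoingCriticalEdges() (ResolveCritical,
//   possibly splitting the edge).
// - If BB25's only successor is BB30, and BB30 has other predecessors (and is not reached from BB25
//   via a critical edge), any required moves are inserted at the end of BB25 (ResolveJoin).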
8964 void LinearScan::resolveEdges()
8966 JITDUMP("RESOLVING EDGES\n");
8968 BasicBlock *block, *prevBlock = nullptr;
8970 // Handle all the critical edges first.
8971 // We will try to avoid resolution across critical edges in cases where all the critical-edge
8972 // targets of a block have the same home. We will then split the edges only for the
8973 // remaining mismatches. We visit the out-edges, as that allows us to share the moves that are
8974 // common among all the targets.
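// As a hypothetical example: if V03 lives in RAX at the end of this block and every critical-edge target
// expects it in RSI, a single move from RAX to RSI at the end of this block (ResolveSharedCritical)
// handles all of those edges at once; only the variables whose target locations differ across the
// successors need per-edge (ResolveCritical) resolution, which may require splitting those edges.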
8976 foreach_block(compiler, block)
8978 if (block->bbNum > bbNumMaxBeforeResolution)
8980 // This is a new block added during resolution - we don't need to visit these now.
8983 if (blockInfo[block->bbNum].hasCriticalOutEdge)
8985 handleOutgoingCriticalEdges(block);
8990 prevBlock = nullptr;
8991 foreach_block(compiler, block)
8993 if (block->bbNum > bbNumMaxBeforeResolution)
8995 // This is a new block added during resolution - we don't need to visit these now.
8999 unsigned succCount = block->NumSucc(compiler);
9000 flowList* preds = block->bbPreds;
9001 BasicBlock* uniquePredBlock = block->GetUniquePred(compiler);
9003 // First, if this block has a single predecessor,
9004 // we may need resolution at the beginning of this block.
9005 // This may be true even if it's the block we used for starting locations,
9006 // if a variable was spilled.
9007 if (!VarSetOps::IsEmpty(compiler, block->bbLiveIn))
9009 if (uniquePredBlock != nullptr)
9011 // We may have split edges during critical edge resolution, and in the process split
9012 // a non-critical edge as well.
9013 // It is unlikely that we would ever have more than one of these in sequence (indeed,
9014 // I don't think it's possible), but there's no need to assume that it can't.
9015 while (uniquePredBlock->bbNum > bbNumMaxBeforeResolution)
9017 uniquePredBlock = uniquePredBlock->GetUniquePred(compiler);
9018 noway_assert(uniquePredBlock != nullptr);
9020 resolveEdge(uniquePredBlock, block, ResolveSplit, block->bbLiveIn);
9024 // Finally, if this block has a single successor:
9025 // - and that has at least one other predecessor (otherwise we will do the resolution at the
9026 // top of the successor),
9027 // - and that is not the target of a critical edge (otherwise we've already handled it)
9028 // we may need resolution at the end of this block.
9032 BasicBlock* succBlock = block->GetSucc(0, compiler);
9033 if (succBlock->GetUniquePred(compiler) == nullptr)
9035 resolveEdge(block, succBlock, ResolveJoin, succBlock->bbLiveIn);
9040 // Now, fix up the mapping for any blocks that were added for edge splitting.
9041 // See the comment prior to the call to fgSplitEdge() in resolveEdge().
9042 // Note that we could fold this loop in with the checking code below, but that
9043 // would only improve the debug case, and would clutter up the code somewhat.
9044 if (compiler->fgBBNumMax > bbNumMaxBeforeResolution)
9046 foreach_block(compiler, block)
9048 if (block->bbNum > bbNumMaxBeforeResolution)
9050 // There may be multiple blocks inserted when we split. But we must always have exactly
9051 // one path (i.e. all blocks must be single-successor and single-predecessor),
9052 // and only one block along the path may be non-empty.
9053 // Note that we may have a newly-inserted block that is empty, but which connects
9054 // two non-resolution blocks. This happens when an edge is split that requires it.
9056 BasicBlock* succBlock = block;
9059 succBlock = succBlock->GetUniqueSucc();
9060 noway_assert(succBlock != nullptr);
9061 } while ((succBlock->bbNum > bbNumMaxBeforeResolution) && succBlock->isEmpty());
9063 BasicBlock* predBlock = block;
9066 predBlock = predBlock->GetUniquePred(compiler);
9067 noway_assert(predBlock != nullptr);
9068 } while ((predBlock->bbNum > bbNumMaxBeforeResolution) && predBlock->isEmpty());
9070 unsigned succBBNum = succBlock->bbNum;
9071 unsigned predBBNum = predBlock->bbNum;
9072 if (block->isEmpty())
9074 // For the case of the empty block, find the non-resolution block (succ or pred).
9075 if (predBBNum > bbNumMaxBeforeResolution)
9077 assert(succBBNum <= bbNumMaxBeforeResolution);
9087 assert((succBBNum <= bbNumMaxBeforeResolution) && (predBBNum <= bbNumMaxBeforeResolution));
9089 SplitEdgeInfo info = {predBBNum, succBBNum};
9090 getSplitBBNumToTargetBBNumMap()->Set(block->bbNum, info);
9096 // Make sure the varToRegMaps match up on all edges.
9097 bool foundMismatch = false;
9098 foreach_block(compiler, block)
9100 if (block->isEmpty() && block->bbNum > bbNumMaxBeforeResolution)
9104 VarToRegMap toVarToRegMap = getInVarToRegMap(block->bbNum);
9105 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
9107 BasicBlock* predBlock = pred->flBlock;
9108 VarToRegMap fromVarToRegMap = getOutVarToRegMap(predBlock->bbNum);
9109 VARSET_ITER_INIT(compiler, iter, block->bbLiveIn, varIndex);
9110 while (iter.NextElem(compiler, &varIndex))
9112 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9113 regNumber fromReg = getVarReg(fromVarToRegMap, varNum);
9114 regNumber toReg = getVarReg(toVarToRegMap, varNum);
9115 if (fromReg != toReg)
9117 Interval* interval = getIntervalForLocalVar(varNum);
9120 foundMismatch = true;
9121 printf("Found mismatched var locations after resolution!\n");
9123 printf(" V%02u: BB%02u to BB%02u: ", varNum, predBlock->bbNum, block->bbNum);
9124 printf("%s to %s\n", getRegName(fromReg), getRegName(toReg));
9129 assert(!foundMismatch);
9134 //------------------------------------------------------------------------
9135 // resolveEdge: Perform the specified type of resolution between two blocks.
9138 // fromBlock - the block from which the edge originates
9139 // toBlock - the block at which the edge terminates
9140 // resolveType - the type of resolution to be performed
9141 // liveSet - the set of tracked lclVar indices which may require resolution
9147 // The caller must have performed the analysis to determine the type of the edge.
9150 // This method emits the correctly ordered moves necessary to place variables in the
9151 // correct registers across a Split, Join or Critical edge.
9152 // In order to avoid overwriting register values before they have been moved to their
9153 // new home (register/stack), it first does the register-to-stack moves (to free those
9154 // registers), then the register-to-register moves, ensuring that the target register
9155 // is free before each move, and finally the stack-to-register moves.
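// As a hypothetical example: suppose V01 must move from RAX to the stack, V02 from RBX to RAX, and
// V03 from the stack to RBX. Storing RAX first frees RAX, so the move from RBX to RAX can then be
// done safely, which in turn frees RBX for the final load from the stack. If a set of registers forms
// a cycle (e.g. RAX and RBX simply exchange their contents), the code below breaks the cycle with a
// temporary register or, on xarch, a swap.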
9157 void LinearScan::resolveEdge(BasicBlock* fromBlock,
9158 BasicBlock* toBlock,
9159 ResolveType resolveType,
9160 VARSET_VALARG_TP liveSet)
9162 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
9163 VarToRegMap toVarToRegMap;
9164 if (resolveType == ResolveSharedCritical)
9166 toVarToRegMap = sharedCriticalVarToRegMap;
9170 toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
9173 // The block to which we add the resolution moves depends on the resolveType
9175 switch (resolveType)
9178 case ResolveSharedCritical:
9184 case ResolveCritical:
9185 // fgSplitEdge may add one or two BasicBlocks. It returns the block that splits
9186 // the edge from 'fromBlock' and 'toBlock', but if it inserts that block right after
9187 // a block with a fall-through it will have to create another block to handle that edge.
9188 // These new blocks can be mapped to existing blocks in order to correctly handle
9189 // the calls to recordVarLocationsAtStartOfBB() from codegen. That mapping is handled
9190 // in resolveEdges(), after all the edge resolution has been done (by calling this
9191 // method for each edge).
9192 block = compiler->fgSplitEdge(fromBlock, toBlock);
9199 #ifndef _TARGET_XARCH_
9200 // We record tempregs for beginning and end of each block.
9201 // For amd64/x86 we only need a tempReg for float - we'll use xchg for int.
9202 // TODO-Throughput: It would be better to determine the tempRegs on demand, but the code below
9203 // modifies the varToRegMaps so we don't have all the correct registers at the time
9204 // we need to get the tempReg.
9205 regNumber tempRegInt =
9206 (resolveType == ResolveSharedCritical) ? REG_NA : getTempRegForResolution(fromBlock, toBlock, TYP_INT);
9207 #endif // !_TARGET_XARCH_
9208 regNumber tempRegFlt = REG_NA;
9209 if ((compiler->compFloatingPointUsed) && (resolveType != ResolveSharedCritical))
9211 tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
9214 regMaskTP targetRegsToDo = RBM_NONE;
9215 regMaskTP targetRegsReady = RBM_NONE;
9216 regMaskTP targetRegsFromStack = RBM_NONE;
9218 // The following arrays capture the location of the registers as they are moved:
9219 // - location[reg] gives the current location of the var that was originally in 'reg'.
9220 // (Note that a var may be moved more than once.)
9221 // - source[reg] gives the original location of the var that needs to be moved to 'reg'.
9222 // For example, if a var is in rax and needs to be moved to rsi, then we would start with:
9223 // location[rax] == rax
9224 // source[rsi] == rax -- this doesn't change
9225 // Then, if for some reason we need to move it temporarily to rbx, we would have:
9226 // location[rax] == rbx
9227 // Once we have completed the move, we will have:
9228 // location[rax] == REG_NA
9229 // This indicates that the var originally in rax is now in its target register.
9231 regNumberSmall location[REG_COUNT];
9232 C_ASSERT(sizeof(char) == sizeof(regNumberSmall)); // for memset to work
9233 memset(location, REG_NA, REG_COUNT);
9234 regNumberSmall source[REG_COUNT];
9235 memset(source, REG_NA, REG_COUNT);
9237 // What interval is this register associated with?
9238 // (associated with incoming reg)
9239 Interval* sourceIntervals[REG_COUNT] = {nullptr};
9241 // Intervals for vars that need to be loaded from the stack
9242 Interval* stackToRegIntervals[REG_COUNT] = {nullptr};
9244 // Get the starting insertion point for the "to" resolution
9245 GenTreePtr insertionPoint = nullptr;
9246 if (resolveType == ResolveSplit || resolveType == ResolveCritical)
9248 insertionPoint = LIR::AsRange(block).FirstNonPhiNode();
9252 // - Perform all moves from reg to stack (no ordering needed on these)
9253 // - For reg to reg moves, record the current location, associating their
9254 // source location with the target register they need to go into
9255 // - For stack to reg moves (done last, no ordering needed between them)
9256 // record the interval associated with the target reg
9257 // TODO-Throughput: We should be looping over the liveIn and liveOut registers, since
9258 // that will scale better than the live variables
9260 VARSET_ITER_INIT(compiler, iter, liveSet, varIndex);
9261 while (iter.NextElem(compiler, &varIndex))
9263 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9264 bool isSpilled = false;
9265 Interval* interval = getIntervalForLocalVar(varNum);
9266 regNumber fromReg = getVarReg(fromVarToRegMap, varNum);
9267 regNumber toReg = getVarReg(toVarToRegMap, varNum);
9268 if (fromReg == toReg)
9273 // For Critical edges, the location will not change on either side of the edge,
9274 // since we'll add a new block to do the move.
9275 if (resolveType == ResolveSplit)
9277 toVarToRegMap[varIndex] = fromReg;
9279 else if (resolveType == ResolveJoin || resolveType == ResolveSharedCritical)
9281 fromVarToRegMap[varIndex] = toReg;
9284 assert(fromReg < UCHAR_MAX && toReg < UCHAR_MAX);
9288 if (fromReg != toReg)
9290 if (fromReg == REG_STK)
9292 stackToRegIntervals[toReg] = interval;
9293 targetRegsFromStack |= genRegMask(toReg);
9295 else if (toReg == REG_STK)
9297 // Do the reg to stack moves now
9298 addResolution(block, insertionPoint, interval, REG_STK, fromReg);
9299 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9303 location[fromReg] = (regNumberSmall)fromReg;
9304 source[toReg] = (regNumberSmall)fromReg;
9305 sourceIntervals[fromReg] = interval;
9306 targetRegsToDo |= genRegMask(toReg);
9311 // REGISTER to REGISTER MOVES
9313 // First, find all the ones that are ready to move now
9314 regMaskTP targetCandidates = targetRegsToDo;
9315 while (targetCandidates != RBM_NONE)
9317 regMaskTP targetRegMask = genFindLowestBit(targetCandidates);
9318 targetCandidates &= ~targetRegMask;
9319 regNumber targetReg = genRegNumFromMask(targetRegMask);
9320 if (location[targetReg] == REG_NA)
9322 targetRegsReady |= targetRegMask;
9326 // Perform reg to reg moves
9327 while (targetRegsToDo != RBM_NONE)
9329 while (targetRegsReady != RBM_NONE)
9331 regMaskTP targetRegMask = genFindLowestBit(targetRegsReady);
9332 targetRegsToDo &= ~targetRegMask;
9333 targetRegsReady &= ~targetRegMask;
9334 regNumber targetReg = genRegNumFromMask(targetRegMask);
9335 assert(location[targetReg] != targetReg);
9336 regNumber sourceReg = (regNumber)source[targetReg];
9337 regNumber fromReg = (regNumber)location[sourceReg];
9338 assert(fromReg < UCHAR_MAX && sourceReg < UCHAR_MAX);
9339 Interval* interval = sourceIntervals[sourceReg];
9340 assert(interval != nullptr);
9341 addResolution(block, insertionPoint, interval, targetReg, fromReg);
9342 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9343 sourceIntervals[sourceReg] = nullptr;
9344 location[sourceReg] = REG_NA;
9346 // Do we have a free targetReg?
9347 if (fromReg == sourceReg && source[fromReg] != REG_NA)
9349 regMaskTP fromRegMask = genRegMask(fromReg);
9350 targetRegsReady |= fromRegMask;
9353 if (targetRegsToDo != RBM_NONE)
9355 regMaskTP targetRegMask = genFindLowestBit(targetRegsToDo);
9356 regNumber targetReg = genRegNumFromMask(targetRegMask);
9358 // Is it already there due to other moves?
9359 // If not, move it to the temp reg, OR swap it with another register
9360 regNumber sourceReg = (regNumber)source[targetReg];
9361 regNumber fromReg = (regNumber)location[sourceReg];
9362 if (targetReg == fromReg)
9364 targetRegsToDo &= ~targetRegMask;
9368 regNumber tempReg = REG_NA;
9369 bool useSwap = false;
9370 if (emitter::isFloatReg(targetReg))
9372 tempReg = tempRegFlt;
9374 #ifdef _TARGET_XARCH_
9379 #else // !_TARGET_XARCH_
9382 tempReg = tempRegInt;
9384 #endif // !_TARGET_XARCH_
9385 if (useSwap || tempReg == REG_NA)
9387 // First, we have to figure out the destination register for what's currently in fromReg,
9388 // so that we can find its sourceInterval.
9389 regNumber otherTargetReg = REG_NA;
9391 // By chance, is fromReg going where it belongs?
9392 if (location[source[fromReg]] == targetReg)
9394 otherTargetReg = fromReg;
9395 // If we can swap, we will be done with otherTargetReg as well.
9396 // Otherwise, we'll spill it to the stack and reload it later.
9399 regMaskTP fromRegMask = genRegMask(fromReg);
9400 targetRegsToDo &= ~fromRegMask;
9405 // Look at the remaining registers from targetRegsToDo (which we expect to be relatively
9406 // small at this point) to find out what's currently in targetReg.
9407 regMaskTP mask = targetRegsToDo;
9408 while (mask != RBM_NONE && otherTargetReg == REG_NA)
9410 regMaskTP nextRegMask = genFindLowestBit(mask);
9411 regNumber nextReg = genRegNumFromMask(nextRegMask);
9412 mask &= ~nextRegMask;
9413 if (location[source[nextReg]] == targetReg)
9415 otherTargetReg = nextReg;
9419 assert(otherTargetReg != REG_NA);
9423 // Generate a "swap" of fromReg and targetReg
9424 insertSwap(block, insertionPoint, sourceIntervals[source[otherTargetReg]]->varNum, targetReg,
9425 sourceIntervals[sourceReg]->varNum, fromReg);
9426 location[sourceReg] = REG_NA;
9427 location[source[otherTargetReg]] = (regNumberSmall)fromReg;
9431 // Spill "targetReg" to the stack and add its eventual target (otherTargetReg)
9432 // to "targetRegsFromStack", which will be handled below.
9433 // NOTE: This condition is very rare. Setting COMPlus_JitStressRegs=0x203
9434 // has been known to trigger it in JIT SH.
9436 // First, spill "otherInterval" from targetReg to the stack.
9437 Interval* otherInterval = sourceIntervals[source[otherTargetReg]];
9438 addResolution(block, insertionPoint, otherInterval, REG_STK, targetReg);
9439 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9440 location[source[otherTargetReg]] = REG_STK;
9442 // Now, move the interval that is going to targetReg, and add its "fromReg" to
9443 // "targetRegsReady".
9444 addResolution(block, insertionPoint, sourceIntervals[sourceReg], targetReg, fromReg);
9445 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9446 location[sourceReg] = REG_NA;
9447 targetRegsReady |= genRegMask(fromReg);
9449 targetRegsToDo &= ~targetRegMask;
9453 compiler->codeGen->regSet.rsSetRegsModified(genRegMask(tempReg) DEBUGARG(dumpTerse));
9454 assert(sourceIntervals[targetReg] != nullptr);
9455 addResolution(block, insertionPoint, sourceIntervals[targetReg], tempReg, targetReg);
9456 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9457 location[targetReg] = (regNumberSmall)tempReg;
9458 targetRegsReady |= targetRegMask;
9464 // Finally, perform stack to reg moves
9465 // All the target regs will be empty at this point
9466 while (targetRegsFromStack != RBM_NONE)
9468 regMaskTP targetRegMask = genFindLowestBit(targetRegsFromStack);
9469 targetRegsFromStack &= ~targetRegMask;
9470 regNumber targetReg = genRegNumFromMask(targetRegMask);
9472 Interval* interval = stackToRegIntervals[targetReg];
9473 assert(interval != nullptr);
9475 addResolution(block, insertionPoint, interval, targetReg, REG_STK);
9476 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9480 void TreeNodeInfo::Initialize(LinearScan* lsra, GenTree* node, LsraLocation location)
9482 regMaskTP dstCandidates;
9484 // if there is a reg indicated on the tree node, use that for dstCandidates
9485 // the exception is the NOP, which sometimes shows up around late args.
9486 // TODO-Cleanup: get rid of those NOPs.
9487 if (node->gtRegNum == REG_NA || node->gtOper == GT_NOP)
9489 dstCandidates = lsra->allRegs(node->TypeGet());
9493 dstCandidates = genRegMask(node->gtRegNum);
9496 internalIntCount = 0;
9497 internalFloatCount = 0;
9498 isLocalDefUse = false;
9499 isHelperCallWithKills = false;
9500 isLsraAdded = false;
9501 definesAnyRegisters = false;
9503 setDstCandidates(lsra, dstCandidates);
9504 srcCandsIndex = dstCandsIndex;
9506 setInternalCandidates(lsra, lsra->allRegs(TYP_INT));
9510 isInitialized = true;
9513 assert(IsValid(lsra));
9516 regMaskTP TreeNodeInfo::getSrcCandidates(LinearScan* lsra)
9518 return lsra->GetRegMaskForIndex(srcCandsIndex);
9521 void TreeNodeInfo::setSrcCandidates(LinearScan* lsra, regMaskTP mask)
9523 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9524 assert(FitsIn<unsigned char>(i));
9525 srcCandsIndex = (unsigned char)i;
9528 regMaskTP TreeNodeInfo::getDstCandidates(LinearScan* lsra)
9530 return lsra->GetRegMaskForIndex(dstCandsIndex);
9533 void TreeNodeInfo::setDstCandidates(LinearScan* lsra, regMaskTP mask)
9535 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9536 assert(FitsIn<unsigned char>(i));
9537 dstCandsIndex = (unsigned char)i;
9540 regMaskTP TreeNodeInfo::getInternalCandidates(LinearScan* lsra)
9542 return lsra->GetRegMaskForIndex(internalCandsIndex);
9545 void TreeNodeInfo::setInternalCandidates(LinearScan* lsra, regMaskTP mask)
9547 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9548 assert(FitsIn<unsigned char>(i));
9549 internalCandsIndex = (unsigned char)i;
9552 void TreeNodeInfo::addInternalCandidates(LinearScan* lsra, regMaskTP mask)
9554 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(lsra->GetRegMaskForIndex(internalCandsIndex) | mask);
9555 assert(FitsIn<unsigned char>(i));
9556 internalCandsIndex = (unsigned char)i;
9560 void dumpRegMask(regMaskTP regs)
9562 if (regs == RBM_ALLINT)
9566 else if (regs == (RBM_ALLINT & ~RBM_FPBASE))
9568 printf("[allIntButFP]");
9570 else if (regs == RBM_ALLFLOAT)
9572 printf("[allFloat]");
9574 else if (regs == RBM_ALLDOUBLE)
9576 printf("[allDouble]");
9584 static const char* getRefTypeName(RefType refType)
9588 #define DEF_REFTYPE(memberName, memberValue, shortName) \
9591 #include "lsra_reftypes.h"
9598 static const char* getRefTypeShortName(RefType refType)
9602 #define DEF_REFTYPE(memberName, memberValue, shortName) \
9605 #include "lsra_reftypes.h"
9612 void RefPosition::dump()
9614 printf("<RefPosition #%-3u @%-3u", rpNum, nodeLocation);
9616 if (nextRefPosition)
9618 printf(" ->#%-3u", nextRefPosition->rpNum);
9621 printf(" %s ", getRefTypeName(refType));
9623 if (this->isPhysRegRef)
9625 this->getReg()->tinyDump();
9627 else if (getInterval())
9629 this->getInterval()->tinyDump();
9634 printf("%s ", treeNode->OpName(treeNode->OperGet()));
9636 printf("BB%02u ", this->bbNum);
9639 dumpRegMask(registerAssignment);
9649 if (this->spillAfter)
9651 printf(" spillAfter");
9661 if (this->isFixedRegRef)
9665 if (this->isLocalDefUse)
9669 if (this->delayRegFree)
9673 if (this->outOfOrder)
9675 printf(" outOfOrder");
9680 void RegRecord::dump()
9685 void Interval::dump()
9687 printf("Interval %2u:", intervalIndex);
9691 printf(" (V%02u)", varNum);
9695 printf(" (INTERNAL)");
9699 printf(" (SPILLED)");
9707 printf(" (struct)");
9709 if (isSpecialPutArg)
9711 printf(" (specialPutArg)");
9715 printf(" (constant)");
9718 printf(" RefPositions {");
9719 for (RefPosition* refPosition = this->firstRefPosition; refPosition != nullptr;
9720 refPosition = refPosition->nextRefPosition)
9722 printf("#%u@%u", refPosition->rpNum, refPosition->nodeLocation);
9723 if (refPosition->nextRefPosition)
9730 // this is not used (yet?)
9731 // printf(" SpillOffset %d", this->spillOffset);
9733 printf(" physReg:%s", getRegName(physReg));
9735 printf(" Preferences=");
9736 dumpRegMask(this->registerPreferences);
9738 if (relatedInterval)
9740 printf(" RelatedInterval ");
9741 relatedInterval->microDump();
9742 printf("[%p]", dspPtr(relatedInterval));
9748 // print out very concise representation
9749 void Interval::tinyDump()
9751 printf("<Ivl:%u", intervalIndex);
9754 printf(" V%02u", varNum);
9758 printf(" internal");
9763 // print out extremely concise representation
9764 void Interval::microDump()
9766 char intervalTypeChar = 'I';
9769 intervalTypeChar = 'T';
9771 else if (isLocalVar)
9773 intervalTypeChar = 'L';
9776 printf("<%c%u>", intervalTypeChar, intervalIndex);
9779 void RegRecord::tinyDump()
9781 printf("<Reg:%-3s> ", getRegName(regNum));
9784 void TreeNodeInfo::dump(LinearScan* lsra)
9786 printf("<TreeNodeInfo @ %2u %d=%d %di %df", loc, dstCount, srcCount, internalIntCount, internalFloatCount);
9788 dumpRegMask(getSrcCandidates(lsra));
9790 dumpRegMask(getInternalCandidates(lsra));
9792 dumpRegMask(getDstCandidates(lsra));
9801 if (isHelperCallWithKills)
9820 void LinearScan::lsraDumpIntervals(const char* msg)
9824 printf("\nLinear scan intervals %s:\n", msg);
9825 for (auto& interval : intervals)
9827 // only dump something if it has references
9828 // if (interval->firstRefPosition)
9835 // Dumps a tree node as a destination or source operand, with the style
9836 // of dump dependent on the mode
9837 void LinearScan::lsraGetOperandString(GenTreePtr tree,
9838 LsraTupleDumpMode mode,
9839 char* operandString,
9840 unsigned operandStringLength)
9842 const char* lastUseChar = "";
9843 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
9849 case LinearScan::LSRA_DUMP_PRE:
9850 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtSeqNum, lastUseChar);
9852 case LinearScan::LSRA_DUMP_REFPOS:
9853 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtSeqNum, lastUseChar);
9855 case LinearScan::LSRA_DUMP_POST:
9857 Compiler* compiler = JitTls::GetCompiler();
9859 if (!tree->gtHasReg())
9861 _snprintf_s(operandString, operandStringLength, operandStringLength, "STK%s", lastUseChar);
9865 _snprintf_s(operandString, operandStringLength, operandStringLength, "%s%s",
9866 getRegName(tree->gtRegNum, useFloatReg(tree->TypeGet())), lastUseChar);
9871 printf("ERROR: INVALID TUPLE DUMP MODE\n");
9875 void LinearScan::lsraDispNode(GenTreePtr tree, LsraTupleDumpMode mode, bool hasDest)
9877 Compiler* compiler = JitTls::GetCompiler();
9878 const unsigned operandStringLength = 16;
9879 char operandString[operandStringLength];
9880 const char* emptyDestOperand = " ";
9881 char spillChar = ' ';
9883 if (mode == LinearScan::LSRA_DUMP_POST)
9885 if ((tree->gtFlags & GTF_SPILL) != 0)
9889 if (!hasDest && tree->gtHasReg())
9891 // This can be true for the "localDefUse" case - defining a reg, but
9892 // pushing it on the stack
9893 assert(spillChar == ' ');
9898 printf("%c N%03u. ", spillChar, tree->gtSeqNum);
9900 LclVarDsc* varDsc = nullptr;
9901 unsigned varNum = UINT_MAX;
9902 if (tree->IsLocal())
9904 varNum = tree->gtLclVarCommon.gtLclNum;
9905 varDsc = &(compiler->lvaTable[varNum]);
9906 if (varDsc->lvLRACandidate)
9913 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
9915 assert(tree->gtHasReg());
9917 lsraGetOperandString(tree, mode, operandString, operandStringLength);
9918 printf("%-15s =", operandString);
9922 printf("%-15s ", emptyDestOperand);
9924 if (varDsc != nullptr)
9926 if (varDsc->lvLRACandidate)
9928 if (mode == LSRA_DUMP_REFPOS)
9930 printf(" V%02u(L%d)", varNum, getIntervalForLocalVar(varNum)->intervalIndex);
9934 lsraGetOperandString(tree, mode, operandString, operandStringLength);
9935 printf(" V%02u(%s)", varNum, operandString);
9936 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
9944 printf(" V%02u MEM", varNum);
9947 else if (tree->OperIsAssignment())
9949 assert(!tree->gtHasReg());
9950 const char* isRev = "";
9951 if ((tree->gtFlags & GTF_REVERSE_OPS) != 0)
9955 printf(" asg%s%s ", GenTree::NodeName(tree->OperGet()), isRev);
9959 compiler->gtDispNodeName(tree);
9960 if ((tree->gtFlags & GTF_REVERSE_OPS) != 0)
9964 if (tree->OperKind() & GTK_LEAF)
9966 compiler->gtDispLeaf(tree, nullptr);
9971 //------------------------------------------------------------------------
9972 // ComputeOperandDstCount: computes the number of registers defined by a node.
9975 // For most nodes, this is simple:
9976 // - Nodes that do not produce values (e.g. stores and other void-typed
9977 // nodes) and nodes that immediately use the registers they define
9978 // produce no registers
9979 // - Nodes that are marked as defining N registers define N registers.
9981 // For contained nodes, however, things are more complicated: for purposes
9982 // of bookkeeping, a contained node is treated as producing the transitive
9983 // closure of the registers produced by its sources.
9986 // operand - The operand for which to compute a register count.
9989 // The number of registers defined by `operand`.
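// As a hypothetical example: a contained GT_LEA whose base and index are both register-producing lclVars
// defines no register of its own, but for bookkeeping purposes it is treated as producing two registers,
// i.e. the union of the defs of its operands (applied transitively through any contained operands).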
9991 void LinearScan::DumpOperandDefs(
9992 GenTree* operand, bool& first, LsraTupleDumpMode mode, char* operandString, const unsigned operandStringLength)
9994 assert(operand != nullptr);
9995 assert(operandString != nullptr);
9997 if (ComputeOperandDstCount(operand) == 0)
10002 if (operand->gtLsraInfo.dstCount != 0)
10004 // This operand directly produces registers; print it.
10005 for (int i = 0; i < operand->gtLsraInfo.dstCount; i++)
10012 lsraGetOperandString(operand, mode, operandString, operandStringLength);
10013 printf("%s", operandString);
10020 // This is a contained node. Dump the defs produced by its operands.
10021 for (GenTree* op : operand->Operands())
10023 DumpOperandDefs(op, first, mode, operandString, operandStringLength);
10028 void LinearScan::TupleStyleDump(LsraTupleDumpMode mode)
10031 LsraLocation currentLoc = 1; // 0 is the entry
10032 const unsigned operandStringLength = 16;
10033 char operandString[operandStringLength];
10035 // currentRefPosition is not used for LSRA_DUMP_PRE
10036 // We keep separate iterators for defs, so that we can print them
10037 // on the lhs of the dump
10038 auto currentRefPosition = refPositions.begin();
10042 case LSRA_DUMP_PRE:
10043 printf("TUPLE STYLE DUMP BEFORE LSRA\n");
10045 case LSRA_DUMP_REFPOS:
10046 printf("TUPLE STYLE DUMP WITH REF POSITIONS\n");
10048 case LSRA_DUMP_POST:
10049 printf("TUPLE STYLE DUMP WITH REGISTER ASSIGNMENTS\n");
10052 printf("ERROR: INVALID TUPLE DUMP MODE\n");
10056 if (mode != LSRA_DUMP_PRE)
10058 printf("Incoming Parameters: ");
10059 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB;
10060 ++currentRefPosition)
10062 Interval* interval = currentRefPosition->getInterval();
10063 assert(interval != nullptr && interval->isLocalVar);
10064 printf(" V%02d", interval->varNum);
10065 if (mode == LSRA_DUMP_POST)
10068 if (currentRefPosition->registerAssignment == RBM_NONE)
10074 reg = currentRefPosition->assignedReg();
10076 LclVarDsc* varDsc = &(compiler->lvaTable[interval->varNum]);
10078 regNumber assignedReg = varDsc->lvRegNum;
10079 regNumber argReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
10081 assert(reg == assignedReg || varDsc->lvRegister == false);
10084 printf(getRegName(argReg, isFloatRegType(interval->registerType)));
10087 printf("%s)", getRegName(reg, isFloatRegType(interval->registerType)));
10093 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
10097 if (mode == LSRA_DUMP_REFPOS)
10099 bool printedBlockHeader = false;
10100 // We should find the boundary RefPositions in the order of exposed uses, dummy defs, and the block boundary (RefTypeBB)
10101 for (; currentRefPosition != refPositions.end() &&
10102 (currentRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef ||
10103 (currentRefPosition->refType == RefTypeBB && !printedBlockHeader));
10104 ++currentRefPosition)
10106 Interval* interval = nullptr;
10107 if (currentRefPosition->isIntervalRef())
10109 interval = currentRefPosition->getInterval();
10111 switch (currentRefPosition->refType)
10113 case RefTypeExpUse:
10114 assert(interval != nullptr);
10115 assert(interval->isLocalVar);
10116 printf(" Exposed use of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
10118 case RefTypeDummyDef:
10119 assert(interval != nullptr);
10120 assert(interval->isLocalVar);
10121 printf(" Dummy def of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
10124 block->dspBlockHeader(compiler);
10125 printedBlockHeader = true;
10129 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
10136 block->dspBlockHeader(compiler);
10139 if (mode == LSRA_DUMP_POST && block != compiler->fgFirstBB && block->bbNum <= bbNumMaxBeforeResolution)
10141 printf("Predecessor for variable locations: BB%02u\n", blockInfo[block->bbNum].predBBNum);
10142 dumpInVarToRegMap(block);
10144 if (block->bbNum > bbNumMaxBeforeResolution)
10146 SplitEdgeInfo splitEdgeInfo;
10147 splitBBNumToTargetBBNumMap->Lookup(block->bbNum, &splitEdgeInfo);
10148 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
10149 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
10150 printf("New block introduced for resolution from BB%02u to BB%02u\n", splitEdgeInfo.fromBBNum,
10151 splitEdgeInfo.toBBNum);
10154 for (GenTree* node : LIR::AsRange(block).NonPhiNodes())
10156 GenTree* tree = node;
10158 genTreeOps oper = tree->OperGet();
10159 TreeNodeInfo& info = tree->gtLsraInfo;
10160 if (tree->gtLsraInfo.isLsraAdded)
10162 // This must be one of the nodes that we add during LSRA
10164 if (oper == GT_LCL_VAR)
10169 else if (oper == GT_RELOAD || oper == GT_COPY)
10174 #ifdef FEATURE_SIMD
10175 else if (oper == GT_SIMD)
10177 if (tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperSave)
10184 assert(tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperRestore);
10189 #endif // FEATURE_SIMD
10192 assert(oper == GT_SWAP);
10196 info.internalIntCount = 0;
10197 info.internalFloatCount = 0;
10200 int consume = info.srcCount;
10201 int produce = info.dstCount;
10202 regMaskTP killMask = RBM_NONE;
10203 regMaskTP fixedMask = RBM_NONE;
10205 lsraDispNode(tree, mode, produce != 0 && mode != LSRA_DUMP_REFPOS);
10207 if (mode != LSRA_DUMP_REFPOS)
10214 for (GenTree* operand : tree->Operands())
10216 DumpOperandDefs(operand, first, mode, operandString, operandStringLength);
10222 // Print each RefPosition on a new line, but
10223 // printing all the kills for each node on a single line
10224 // and combining the fixed regs with their associated def or use
10225 bool killPrinted = false;
10226 RefPosition* lastFixedRegRefPos = nullptr;
10227 for (; currentRefPosition != refPositions.end() &&
10228 (currentRefPosition->refType == RefTypeUse || currentRefPosition->refType == RefTypeFixedReg ||
10229 currentRefPosition->refType == RefTypeKill || currentRefPosition->refType == RefTypeDef) &&
10230 (currentRefPosition->nodeLocation == tree->gtSeqNum ||
10231 currentRefPosition->nodeLocation == tree->gtSeqNum + 1);
10232 ++currentRefPosition)
10234 Interval* interval = nullptr;
10235 if (currentRefPosition->isIntervalRef())
10237 interval = currentRefPosition->getInterval();
10239 switch (currentRefPosition->refType)
10242 if (currentRefPosition->isPhysRegRef)
10244 printf("\n Use:R%d(#%d)",
10245 currentRefPosition->getReg()->regNum, currentRefPosition->rpNum);
10249 assert(interval != nullptr);
10251 interval->microDump();
10252 printf("(#%d)", currentRefPosition->rpNum);
10253 if (currentRefPosition->isFixedRegRef)
10255 assert(genMaxOneBit(currentRefPosition->registerAssignment));
10256 assert(lastFixedRegRefPos != nullptr);
10257 printf(" Fixed:%s(#%d)", getRegName(currentRefPosition->assignedReg(),
10258 isFloatRegType(interval->registerType)),
10259 lastFixedRegRefPos->rpNum);
10260 lastFixedRegRefPos = nullptr;
10262 if (currentRefPosition->isLocalDefUse)
10264 printf(" LocalDefUse");
10266 if (currentRefPosition->lastUse)
10274 // Print each def on a new line
10275 assert(interval != nullptr);
10277 interval->microDump();
10278 printf("(#%d)", currentRefPosition->rpNum);
10279 if (currentRefPosition->isFixedRegRef)
10281 assert(genMaxOneBit(currentRefPosition->registerAssignment));
10282 printf(" %s", getRegName(currentRefPosition->assignedReg(),
10283 isFloatRegType(interval->registerType)));
10285 if (currentRefPosition->isLocalDefUse)
10287 printf(" LocalDefUse");
10289 if (currentRefPosition->lastUse)
10293 if (interval->relatedInterval != nullptr)
10296 interval->relatedInterval->microDump();
10303 printf("\n Kill: ");
10304 killPrinted = true;
10306 printf(getRegName(currentRefPosition->assignedReg(),
10307 isFloatRegType(currentRefPosition->getReg()->registerType)));
10310 case RefTypeFixedReg:
10311 lastFixedRegRefPos = currentRefPosition;
10314 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
10320 if (info.internalIntCount != 0 && mode != LSRA_DUMP_REFPOS)
10322 printf("\tinternal (%d):\t", info.internalIntCount);
10323 if (mode == LSRA_DUMP_POST)
10325 dumpRegMask(tree->gtRsvdRegs);
10327 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
10329 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
10333 if (info.internalFloatCount != 0 && mode != LSRA_DUMP_REFPOS)
10335 printf("\tinternal (%d):\t", info.internalFloatCount);
10336 if (mode == LSRA_DUMP_POST)
10338 dumpRegMask(tree->gtRsvdRegs);
10340 else if ((info.getInternalCandidates(this) & allRegs(TYP_FLOAT)) != allRegs(TYP_FLOAT))
10342 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_FLOAT));
10347 if (mode == LSRA_DUMP_POST)
10349 dumpOutVarToRegMap(block);
10356 void LinearScan::dumpLsraAllocationEvent(LsraDumpEvent event,
10357 Interval* interval,
10359 BasicBlock* currentBlock)
10367 // Conflicting def/use
10368 case LSRA_EVENT_DEFUSE_CONFLICT:
10371 printf(" Def and Use have conflicting register requirements:");
10375 printf("DUconflict ");
10379 case LSRA_EVENT_DEFUSE_FIXED_DELAY_USE:
10382 printf(" Can't change useAssignment ");
10385 case LSRA_EVENT_DEFUSE_CASE1:
10388 printf(" case #1, use the defRegAssignment\n");
10392 printf(indentFormat, " case #1 use defRegAssignment");
10394 dumpEmptyRefPosition();
10397 case LSRA_EVENT_DEFUSE_CASE2:
10400 printf(" case #2, use the useRegAssignment\n");
10404 printf(indentFormat, " case #2 use useRegAssignment");
10406 dumpEmptyRefPosition();
10409 case LSRA_EVENT_DEFUSE_CASE3:
10412 printf(" case #3, change the defRegAssignment to the use regs\n");
10416 printf(indentFormat, " case #3 use useRegAssignment");
10418 dumpEmptyRefPosition();
10421 case LSRA_EVENT_DEFUSE_CASE4:
10424 printf(" case #4, change the useRegAssignment to the def regs\n");
10428 printf(indentFormat, " case #4 use defRegAssignment");
10430 dumpEmptyRefPosition();
10433 case LSRA_EVENT_DEFUSE_CASE5:
10436 printf(" case #5, Conflicting Def and Use single-register requirements require copies - set def to all "
10437 "regs of the appropriate type\n");
10441 printf(indentFormat, " case #5 set def to all regs");
10443 dumpEmptyRefPosition();
10446 case LSRA_EVENT_DEFUSE_CASE6:
10449 printf(" case #6, Conflicting Def and Use register requirements require a copy\n");
10453 printf(indentFormat, " case #6 need a copy");
10455 dumpEmptyRefPosition();
10459 case LSRA_EVENT_SPILL:
10462 printf("Spilled:\n");
10467 assert(interval != nullptr && interval->assignedReg != nullptr);
10468 printf("Spill %-4s ", getRegName(interval->assignedReg->regNum));
10470 dumpEmptyRefPosition();
10473 case LSRA_EVENT_SPILL_EXTENDED_LIFETIME:
10476 printf(" Spilled extended lifetime var V%02u at last use; not marked for actual spill.",
10477 interval->intervalIndex);
10481 // Restoring the previous register
10482 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL:
10483 assert(interval != nullptr);
10486 printf(" Assign register %s to previous interval Ivl:%d after spill\n", getRegName(reg),
10487 interval->intervalIndex);
10491 // If we spilled, then the dump is already pre-indented, but we need to pre-indent for the subsequent
10493 // allocation with a dumpEmptyRefPosition().
10494 printf("SRstr %-4s ", getRegName(reg));
10496 dumpEmptyRefPosition();
10499 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL:
10500 assert(interval != nullptr);
10503 printf(" Assign register %s to previous interval Ivl:%d\n", getRegName(reg), interval->intervalIndex);
10507 if (activeRefPosition == nullptr)
10509 printf(emptyRefPositionFormat, "");
10511 printf("Restr %-4s ", getRegName(reg));
10513 if (activeRefPosition != nullptr)
10515 printf(emptyRefPositionFormat, "");
10520 // Done with GC Kills
10521 case LSRA_EVENT_DONE_KILL_GC_REFS:
10522 printf("DoneKillGC ");
10525 // Block boundaries
10526 case LSRA_EVENT_START_BB:
10527 assert(currentBlock != nullptr);
10530 printf("\n\n Live Vars(Regs) at start of BB%02u (from pred BB%02u):", currentBlock->bbNum,
10531 blockInfo[currentBlock->bbNum].predBBNum);
10532 dumpVarToRegMap(inVarToRegMaps[currentBlock->bbNum]);
10535 case LSRA_EVENT_END_BB:
10538 printf("\n\n Live Vars(Regs) after BB%02u:", currentBlock->bbNum);
10539 dumpVarToRegMap(outVarToRegMaps[currentBlock->bbNum]);
10543 case LSRA_EVENT_FREE_REGS:
10546 printf("Freeing registers:\n");
10550 // Characteristics of the current RefPosition
10551 case LSRA_EVENT_INCREMENT_RANGE_END:
10554 printf(" Incrementing nextPhysRegLocation for %s\n", getRegName(reg));
10558 case LSRA_EVENT_LAST_USE:
10561 printf(" Last use, marked to be freed\n");
10564 case LSRA_EVENT_LAST_USE_DELAYED:
10567 printf(" Last use, marked to be freed (delayed)\n");
10570 case LSRA_EVENT_NEEDS_NEW_REG:
10573 printf(" Needs new register; mark %s to be freed\n", getRegName(reg));
10577 printf("Free %-4s ", getRegName(reg));
10579 dumpEmptyRefPosition();
10583 // Allocation decisions
10584 case LSRA_EVENT_FIXED_REG:
10585 case LSRA_EVENT_EXP_USE:
10588 printf("No allocation\n");
10592 printf("Keep %-4s ", getRegName(reg));
10595 case LSRA_EVENT_ZERO_REF:
10596 assert(interval != nullptr && interval->isLocalVar);
10599 printf("Marking V%02u as last use; there are no actual references\n", interval->varNum);
10605 dumpEmptyRefPosition();
10608 case LSRA_EVENT_KEPT_ALLOCATION:
10611 printf("already allocated %4s\n", getRegName(reg));
10615 printf("Keep %-4s ", getRegName(reg));
10618 case LSRA_EVENT_COPY_REG:
10619 assert(interval != nullptr && interval->recentRefPosition != nullptr);
10622 printf("allocated %s as copyReg\n\n", getRegName(reg));
10626 printf("Copy %-4s ", getRegName(reg));
10629 case LSRA_EVENT_MOVE_REG:
10630 assert(interval != nullptr && interval->recentRefPosition != nullptr);
10633 printf(" needs a new register; marked as moveReg\n");
10637 printf("Move %-4s ", getRegName(reg));
10639 dumpEmptyRefPosition();
10642 case LSRA_EVENT_ALLOC_REG:
10645 printf("allocated %s\n", getRegName(reg));
10649 printf("Alloc %-4s ", getRegName(reg));
10652 case LSRA_EVENT_REUSE_REG:
10655 printf("reused constant in %s\n", getRegName(reg));
10659 printf("Reuse %-4s ", getRegName(reg));
10662 case LSRA_EVENT_ALLOC_SPILLED_REG:
10665 printf("allocated spilled register %s\n", getRegName(reg));
10669 printf("Steal %-4s ", getRegName(reg));
10672 case LSRA_EVENT_NO_ENTRY_REG_ALLOCATED:
10673 assert(interval != nullptr && interval->isLocalVar);
10676 printf("Not allocating an entry register for V%02u due to low ref count\n", interval->varNum);
10683 case LSRA_EVENT_NO_REG_ALLOCATED:
10686 printf("no register allocated\n");
10693 case LSRA_EVENT_RELOAD:
10696 printf(" Marked for reload\n");
10700 printf("ReLod %-4s ", getRegName(reg));
10702 dumpEmptyRefPosition();
10705 case LSRA_EVENT_SPECIAL_PUTARG:
10708 printf(" Special case of putArg - using lclVar that's in the expected reg\n");
10712 printf("PtArg %-4s ", getRegName(reg));
10720 //------------------------------------------------------------------------
10721 // dumpRegRecordHeader: Dump the header for a column-based dump of the register state.
10730 // Reg names fit in 4 characters (minimum width of the columns)
10733 // In order to make the table as dense as possible (for ease of reading the dumps),
10734 // we determine the minimum regColumnWidth width required to represent:
10735 // regs, by name (e.g. eax or xmm0) - this is fixed at 4 characters.
10736 // intervals, as Vnn for lclVar intervals, or as I<num> for other intervals.
10737 // The table is indented by the amount needed for dumpRefPositionShort, which is
10738 // captured in shortRefPositionDumpWidth.
10740 void LinearScan::dumpRegRecordHeader()
10742 printf("The following table has one or more rows for each RefPosition that is handled during allocation.\n"
10743 "The first column provides the basic information about the RefPosition, with its type (e.g. Def,\n"
10744 "Use, Fixd) followed by a '*' if it is a last use, and a 'D' if it is delayRegFree, and then the\n"
10745 "action taken during allocation (e.g. Alloc a new register, or Keep an existing one).\n"
10746 "The subsequent columns show the Interval occupying each register, if any, followed by 'a' if it is\n"
10747 "active, and 'i' if it is inactive. Columns are only printed up to the last modified register, which\n"
10748 "may increase during allocation, in which case additional columns will appear. Registers which are\n"
10749 "not marked modified have ---- in their column.\n\n");
10751 // First, determine the width of each register column (which holds a reg name in the
10752 // header, and an interval name in each subsequent row).
10753 int intervalNumberWidth = (int)log10((double)intervals.size()) + 1;
10754 // The regColumnWidth includes the identifying character (I or V) and an 'i' or 'a' (inactive or active)
10755 regColumnWidth = intervalNumberWidth + 2;
10756 if (regColumnWidth < 4)
10758 regColumnWidth = 4;
10760 sprintf_s(intervalNameFormat, MAX_FORMAT_CHARS, "%%c%%-%dd", regColumnWidth - 2);
10761 sprintf_s(regNameFormat, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
10763 // Next, determine the width of the short RefPosition (see dumpRefPositionShort()).
10764 // This is in the form:
10765 // nnn.#mmm NAME TYPEld
10767 // nnn is the Location, right-justified to the width needed for the highest location.
10768 // mmm is the RefPosition rpNum, left-justified to the width needed for the highest rpNum.
10769 // NAME is dumped by dumpReferentName(), and is "regColumnWidth".
10770 // TYPE is RefTypeNameShort, and is 4 characters
10771 // l is either '*' (if a last use) or ' ' (otherwise)
10772 // d is either 'D' (if a delayed use) or ' ' (otherwise)
10774 maxNodeLocation = (maxNodeLocation == 0) ? maxNodeLocation + 1
10776 : maxNodeLocation; // corner case of a method with an infinite loop without any gentree nodes
10777 assert(maxNodeLocation >= 1);
10778 assert(refPositions.size() >= 1);
10779 int nodeLocationWidth = (int)log10((double)maxNodeLocation) + 1;
10780 int refPositionWidth = (int)log10((double)refPositions.size()) + 1;
10781 int refTypeInfoWidth = 4 /*TYPE*/ + 2 /* last-use and delayed */ + 1 /* space */;
10782 int locationAndRPNumWidth = nodeLocationWidth + 2 /* .# */ + refPositionWidth + 1 /* space */;
10783 int shortRefPositionDumpWidth = locationAndRPNumWidth + regColumnWidth + 1 /* space */ + refTypeInfoWidth;
10784 sprintf_s(shortRefPositionFormat, MAX_FORMAT_CHARS, "%%%dd.#%%-%dd ", nodeLocationWidth, refPositionWidth);
10785 sprintf_s(emptyRefPositionFormat, MAX_FORMAT_CHARS, "%%-%ds", shortRefPositionDumpWidth);
10787 // The width of the "allocation info"
10788 // - a 5-character allocation decision
10790 // - a 4-character register
10792 int allocationInfoWidth = 5 + 1 + 4 + 1;
10794 // Next, determine the width of the legend for each row. This includes:
10795 // - a short RefPosition dump (shortRefPositionDumpWidth), which includes a space
10796 // - the allocation info (allocationInfoWidth), which also includes a space
10798 regTableIndent = shortRefPositionDumpWidth + allocationInfoWidth;
10800 // BBnn printed left-justified in the NAME Typeld and allocationInfo space.
10801 int bbDumpWidth = regColumnWidth + 1 + refTypeInfoWidth + allocationInfoWidth;
10802 int bbNumWidth = (int)log10((double)compiler->fgBBNumMax) + 1;
10803 // In the unlikely event that BB numbers overflow the space, we'll simply omit the predBB
10804 int predBBNumDumpSpace = regTableIndent - locationAndRPNumWidth - bbNumWidth - 9; // 'BB' + ' PredBB'
10805 if (predBBNumDumpSpace < bbNumWidth)
10807 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd", shortRefPositionDumpWidth - 2);
10811 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd PredBB%%-%dd", bbNumWidth, predBBNumDumpSpace);
10814 if (compiler->shouldDumpASCIITrees())
10816 columnSeparator = "|";
10824 columnSeparator = "\xe2\x94\x82";
10825 line = "\xe2\x94\x80";
10826 leftBox = "\xe2\x94\x9c";
10827 middleBox = "\xe2\x94\xbc";
10828 rightBox = "\xe2\x94\xa4";
10830 sprintf_s(indentFormat, MAX_FORMAT_CHARS, "%%-%ds", regTableIndent);
10832 // Now, set up the legend format for the RefPosition info
10833 sprintf_s(legendFormat, MAX_LEGEND_FORMAT_CHARS, "%%-%d.%ds%%-%d.%ds%%-%ds%%s", nodeLocationWidth + 1,
10834 nodeLocationWidth + 1, refPositionWidth + 2, refPositionWidth + 2, regColumnWidth + 1);
10836 // Finally, print a "title row" including the legend and the reg names
10837 dumpRegRecordTitle();
10840 int LinearScan::getLastUsedRegNumIndex()
10842 int lastUsedRegNumIndex = 0;
10843 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
10844 int lastRegNumIndex = compiler->compFloatingPointUsed ? REG_FP_LAST : REG_INT_LAST;
10845 for (int regNumIndex = 0; regNumIndex <= lastRegNumIndex; regNumIndex++)
10847 if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) != 0)
10849 lastUsedRegNumIndex = regNumIndex;
10852 return lastUsedRegNumIndex;
10855 void LinearScan::dumpRegRecordTitleLines()
10857 for (int i = 0; i < regTableIndent; i++)
10859 printf("%s", line);
10861 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
10862 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
10864 printf("%s", middleBox);
10865 for (int i = 0; i < regColumnWidth; i++)
10867 printf("%s", line);
10870 printf("%s\n", rightBox);
10872 void LinearScan::dumpRegRecordTitle()
10874 dumpRegRecordTitleLines();
10876 // Print out the legend for the RefPosition info
10877 printf(legendFormat, "Loc ", "RP# ", "Name ", "Type Action Reg ");
10879 // Print out the register name column headers
10880 char columnFormatArray[MAX_FORMAT_CHARS];
10881 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%s%%-%d.%ds", columnSeparator, regColumnWidth, regColumnWidth);
10882 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
10883 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
10885 regNumber regNum = (regNumber)regNumIndex;
10886 const char* regName = getRegName(regNum);
10887 printf(columnFormatArray, regName);
10889 printf("%s\n", columnSeparator);
10891 rowCountSinceLastTitle = 0;
10893 dumpRegRecordTitleLines();
10896 void LinearScan::dumpRegRecords()
10898 static char columnFormatArray[18];
10899 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
10900 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
10902 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
10904 printf("%s", columnSeparator);
10905 RegRecord& regRecord = physRegs[regNumIndex];
10906 Interval* interval = regRecord.assignedInterval;
10907 if (interval != nullptr)
10909 dumpIntervalName(interval);
10910 char activeChar = interval->isActive ? 'a' : 'i';
10911 printf("%c", activeChar);
10913 else if (regRecord.isBusyUntilNextKill)
10915 printf(columnFormatArray, "Busy");
10917 else if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) == 0)
10919 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
10920 printf(columnFormatArray, "----");
10924 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
10925 printf(columnFormatArray, "");
10928 printf("%s\n", columnSeparator);
10930 if (rowCountSinceLastTitle > MAX_ROWS_BETWEEN_TITLES)
10932 dumpRegRecordTitle();
10934 rowCountSinceLastTitle++;
10937 void LinearScan::dumpIntervalName(Interval* interval)
10940 if (interval->isLocalVar)
10942 intervalChar = 'V';
10944 else if (interval->isConstant)
10946 intervalChar = 'C';
10950 intervalChar = 'I';
10952 printf(intervalNameFormat, intervalChar, interval->intervalIndex);
10955 void LinearScan::dumpEmptyRefPosition()
10957 printf(emptyRefPositionFormat, "");
10960 // Note that the size of this dump is computed in dumpRegRecordHeader().
10962 void LinearScan::dumpRefPositionShort(RefPosition* refPosition, BasicBlock* currentBlock)
10964 BasicBlock* block = currentBlock;
10965 if (refPosition->refType == RefTypeBB)
10967 // Always print a title row before a RefTypeBB (except for the first, because we
10968 // will already have printed it before the parameters)
10969 if (refPosition->refType == RefTypeBB && block != compiler->fgFirstBB && block != nullptr)
10971 dumpRegRecordTitle();
10974 printf(shortRefPositionFormat, refPosition->nodeLocation, refPosition->rpNum);
10975 if (refPosition->refType == RefTypeBB)
10977 if (block == nullptr)
10979 printf(regNameFormat, "END");
10981 printf(regNameFormat, "");
10985 printf(bbRefPosFormat, block->bbNum, block == compiler->fgFirstBB ? 0 : blockInfo[block->bbNum].predBBNum);
10988 else if (refPosition->isIntervalRef())
10990 Interval* interval = refPosition->getInterval();
10991 dumpIntervalName(interval);
10992 char lastUseChar = ' ';
10993 char delayChar = ' ';
10994 if (refPosition->lastUse)
10997 if (refPosition->delayRegFree)
11002 printf(" %s%c%c ", getRefTypeShortName(refPosition->refType), lastUseChar, delayChar);
11004 else if (refPosition->isPhysRegRef)
11006 RegRecord* regRecord = refPosition->getReg();
11007 printf(regNameFormat, getRegName(regRecord->regNum));
11008 printf(" %s ", getRefTypeShortName(refPosition->refType));
11012 assert(refPosition->refType == RefTypeKillGCRefs);
11013 // There's no interval or reg name associated with this.
11014 printf(regNameFormat, " ");
11015 printf(" %s ", getRefTypeShortName(refPosition->refType));
11019 //------------------------------------------------------------------------
11020 // LinearScan::IsResolutionMove:
11021 // Returns true if the given node is a move inserted by LSRA resolution.
11025 // node - the node to check.
11027 bool LinearScan::IsResolutionMove(GenTree* node)
11029 if (!node->gtLsraInfo.isLsraAdded)
11034 switch (node->OperGet())
11038 return node->gtLsraInfo.isLocalDefUse;
11048 //------------------------------------------------------------------------
11049 // LinearScan::IsResolutionNode:
11050 // Returns true if the given node is either a move inserted by LSRA
11051 // resolution or an operand to such a move.
11054 // containingRange - the range that contains the node to check.
11055 // node - the node to check.
11057 bool LinearScan::IsResolutionNode(LIR::Range& containingRange, GenTree* node)
11061 if (IsResolutionMove(node))
11066 if (!node->gtLsraInfo.isLsraAdded || (node->OperGet() != GT_LCL_VAR))
11072 bool foundUse = containingRange.TryGetUse(node, &use);
11079 //------------------------------------------------------------------------
11080 // verifyFinalAllocation: Traverse the RefPositions and verify various invariants.
11089 // If verbose is set, this will also dump a table of the final allocations.
11090 void LinearScan::verifyFinalAllocation()
11094 printf("\nFinal allocation\n");
11097 // Clear register assignments.
11098 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
11100 RegRecord* physRegRecord = getRegisterRecord(reg);
11101 physRegRecord->assignedInterval = nullptr;
11104 for (auto& interval : intervals)
11106 interval.assignedReg = nullptr;
11107 interval.physReg = REG_NA;
11110 DBEXEC(VERBOSE, dumpRegRecordTitle());
11112 BasicBlock* currentBlock = nullptr;
11113 GenTree* firstBlockEndResolutionNode = nullptr;
11114 regMaskTP regsToFree = RBM_NONE;
11115 regMaskTP delayRegsToFree = RBM_NONE;
11116 LsraLocation currentLocation = MinLocation;
11117 for (auto& refPosition : refPositions)
11119 RefPosition* currentRefPosition = &refPosition;
11120 Interval* interval = nullptr;
11121 RegRecord* regRecord = nullptr;
11122 regNumber regNum = REG_NA;
11123 if (currentRefPosition->refType == RefTypeBB)
11125 regsToFree |= delayRegsToFree;
11126 delayRegsToFree = RBM_NONE;
11127 // For BB RefPositions, wait until we dump the "end of block" info before dumping the basic RefPosition
11132 // For other RefPosition types, we can dump the basic RefPosition info now.
11133 DBEXEC(VERBOSE, dumpRefPositionShort(currentRefPosition, currentBlock));
11135 if (currentRefPosition->isPhysRegRef)
11137 regRecord = currentRefPosition->getReg();
11138 regRecord->recentRefPosition = currentRefPosition;
11139 regNum = regRecord->regNum;
11141 else if (currentRefPosition->isIntervalRef())
11143 interval = currentRefPosition->getInterval();
11144 interval->recentRefPosition = currentRefPosition;
11145 if (currentRefPosition->registerAssignment != RBM_NONE)
11147 if (!genMaxOneBit(currentRefPosition->registerAssignment))
11149 assert(currentRefPosition->refType == RefTypeExpUse ||
11150 currentRefPosition->refType == RefTypeDummyDef);
11154 regNum = currentRefPosition->assignedReg();
11155 regRecord = getRegisterRecord(regNum);
11161 LsraLocation newLocation = currentRefPosition->nodeLocation;
11163 if (newLocation > currentLocation)
11166 // We could use the freeRegisters() method, but we'd have to carefully manage the active intervals.
11167 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
11169 regMaskTP regMask = genRegMask(reg);
11170 if ((regsToFree & regMask) != RBM_NONE)
11172 RegRecord* physRegRecord = getRegisterRecord(reg);
11173 physRegRecord->assignedInterval = nullptr;
11176 regsToFree = delayRegsToFree;
11177 delayRegsToFree = RBM_NONE;
11179 currentLocation = newLocation;
        switch (currentRefPosition->refType)
            case RefTypeBB:
                if (currentBlock == nullptr)
                    currentBlock = startBlockSequence();
                else
                    // Verify the resolution moves at the end of the previous block.
                    for (GenTree* node = firstBlockEndResolutionNode; node != nullptr; node = node->gtNext)
                        // Only verify nodes that are actually moves; don't bother with the nodes that are
                        // operands to moves.
                        if (IsResolutionMove(node))
                            verifyResolutionMove(node, currentLocation);

                    // Validate the locations at the end of the previous block.
                    VarToRegMap outVarToRegMap = outVarToRegMaps[currentBlock->bbNum];
                    VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveOut, varIndex);
                    while (iter.NextElem(compiler, &varIndex))
                        unsigned  varNum = compiler->lvaTrackedToVarNum[varIndex];
                        regNumber regNum = getVarReg(outVarToRegMap, varNum);
                        interval         = getIntervalForLocalVar(varNum);
                        assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
                        interval->physReg     = REG_NA;
                        interval->assignedReg = nullptr;
                        interval->isActive    = false;

                    // Clear register assignments.
                    for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
                        RegRecord* physRegRecord        = getRegisterRecord(reg);
                        physRegRecord->assignedInterval = nullptr;

                    // Now, record the locations at the beginning of this block.
                    currentBlock = moveToNextBlock();

                if (currentBlock != nullptr)
                    VarToRegMap inVarToRegMap = inVarToRegMaps[currentBlock->bbNum];
                    VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveIn, varIndex);
                    while (iter.NextElem(compiler, &varIndex))
                        unsigned  varNum = compiler->lvaTrackedToVarNum[varIndex];
                        regNumber regNum = getVarReg(inVarToRegMap, varNum);
                        interval         = getIntervalForLocalVar(varNum);
                        interval->physReg                 = regNum;
                        interval->assignedReg             = &(physRegs[regNum]);
                        interval->isActive                = true;
                        physRegs[regNum].assignedInterval = interval;
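                    // The incoming VarToRegMap now seeds the expected interval-to-register state
                    // against which this block's RefPositions will be verified.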
                    dumpRefPositionShort(currentRefPosition, currentBlock);

                    // Finally, handle the resolution moves, if any, at the beginning of the next block.
                    firstBlockEndResolutionNode = nullptr;
                    bool foundNonResolutionNode = false;

                    LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
                    for (GenTree* node : currentBlockRange.NonPhiNodes())
                        if (IsResolutionNode(currentBlockRange, node))
                            if (foundNonResolutionNode)
                                firstBlockEndResolutionNode = node;
                            else if (IsResolutionMove(node))
                                // Only verify nodes that are actually moves; don't bother with the nodes that are
                                // operands to moves.
                                verifyResolutionMove(node, currentLocation);
                        else
                            foundNonResolutionNode = true;
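            // A kill must not find an interval assigned to the killed register; the recorded
            // assignment is otherwise left unchanged.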
            case RefTypeKill:
                assert(regRecord != nullptr);
                assert(regRecord->assignedInterval == nullptr);
                dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);

            case RefTypeFixedReg:
                assert(regRecord != nullptr);
                dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
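            // Interval definitions and uses: bring the mirrored interval/register state up to
            // date with the allocation recorded on this RefPosition.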
            case RefTypeUpperVectorSaveDef:
            case RefTypeUpperVectorSaveUse:
            case RefTypeDef:
            case RefTypeUse:
            case RefTypeParamDef:
            case RefTypeZeroInit:
                assert(interval != nullptr);

                if (interval->isSpecialPutArg)
                    dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, interval, regNum);
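                // A reload re-activates the interval in its assigned register and re-establishes
                // the two-way link between the Interval and the RegRecord.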
                if (currentRefPosition->reload)
                    interval->isActive = true;
                    assert(regNum != REG_NA);
                    interval->physReg           = regNum;
                    interval->assignedReg       = regRecord;
                    regRecord->assignedInterval = interval;
                    dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, nullptr, regRecord->regNum, currentBlock);
                if (regNum == REG_NA)
                    dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, interval);
                else if (RefTypeIsDef(currentRefPosition->refType))
                    interval->isActive = true;

                    if (interval->isConstant && (currentRefPosition->treeNode != nullptr) &&
                        currentRefPosition->treeNode->IsReuseRegVal())
                        dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, regRecord->regNum, currentBlock);
                    else
                        dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, regRecord->regNum, currentBlock);
                else if (currentRefPosition->copyReg)
                    dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, interval, regRecord->regNum, currentBlock);
                else if (currentRefPosition->moveReg)
                    assert(interval->assignedReg != nullptr);
                    interval->assignedReg->assignedInterval = nullptr;
                    interval->physReg           = regNum;
                    interval->assignedReg       = regRecord;
                    regRecord->assignedInterval = interval;
                    printf("Move %-4s ", getRegName(regRecord->regNum));
                else
                    dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
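                // A last use or a spill after this reference ends the interval's active lifetime.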
                if (currentRefPosition->lastUse || currentRefPosition->spillAfter)
                    interval->isActive = false;
                if (regNum != REG_NA)
                    if (currentRefPosition->spillAfter)
                        dumpEmptyRefPosition();
                        printf("Spill %-4s ", getRegName(regNum));
                    else if (currentRefPosition->copyReg)
                        regRecord->assignedInterval = interval;
                    else
                        interval->physReg           = regNum;
                        interval->assignedReg       = regRecord;
                        regRecord->assignedInterval = interval;

            case RefTypeKillGCRefs:
                // No action to take.
                // However, we will assert that, at resolution time, no registers contain GC refs.
                DBEXEC(VERBOSE, printf(" "));
                regMaskTP candidateRegs = currentRefPosition->registerAssignment;
                while (candidateRegs != RBM_NONE)
                    regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
                    candidateRegs &= ~nextRegBit;
                    regNumber  nextReg          = genRegNumFromMask(nextRegBit);
                    RegRecord* regRecord        = getRegisterRecord(nextReg);
                    Interval*  assignedInterval = regRecord->assignedInterval;
                    assert(assignedInterval == nullptr || !varTypeIsGC(assignedInterval->registerType));
            case RefTypeExpUse:
            case RefTypeDummyDef:
                // Do nothing; these will be handled by the RefTypeBB.
                DBEXEC(VERBOSE, printf(" "));

            case RefTypeInvalid:
                // No action to take for these 'currentRefPosition->refType' values.

        if (currentRefPosition->refType != RefTypeBB)
            DBEXEC(VERBOSE, dumpRegRecords());
            if (interval != nullptr)
                if (currentRefPosition->copyReg)
                    assert(interval->physReg != regNum);
                    regRecord->assignedInterval = nullptr;
                    assert(interval->assignedReg != nullptr);
                    regRecord = interval->assignedReg;
                if (currentRefPosition->spillAfter || currentRefPosition->lastUse)
                    interval->physReg     = REG_NA;
                    interval->assignedReg = nullptr;

                    // regRecord could be null if the RefPosition is to be allocated a
                    // reg only if profitable.
                    if (regRecord != nullptr)
                        regRecord->assignedInterval = nullptr;
                    else
                        assert(currentRefPosition->AllocateIfProfitable());
    // Now, verify the resolution blocks.
    // Currently these are nearly always at the end of the method, but that may not always be the case.
    // So, we'll go through all the BBs looking for blocks whose bbNum is greater than bbNumMaxBeforeResolution.
    for (BasicBlock* currentBlock = compiler->fgFirstBB; currentBlock != nullptr; currentBlock = currentBlock->bbNext)
        if (currentBlock->bbNum > bbNumMaxBeforeResolution)
            dumpRegRecordTitle();
            printf(shortRefPositionFormat, 0, 0);
            assert(currentBlock->bbPreds != nullptr && currentBlock->bbPreds->flBlock != nullptr);
            printf(bbRefPosFormat, currentBlock->bbNum, currentBlock->bbPreds->flBlock->bbNum);

            // Clear register assignments.
            for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
                RegRecord* physRegRecord        = getRegisterRecord(reg);
                physRegRecord->assignedInterval = nullptr;

            // Set the incoming register assignments
            VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
            VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveIn, varIndex);
            while (iter.NextElem(compiler, &varIndex))
                unsigned  varNum   = compiler->lvaTrackedToVarNum[varIndex];
                regNumber regNum   = getVarReg(inVarToRegMap, varNum);
                Interval* interval = getIntervalForLocalVar(varNum);
                interval->physReg                 = regNum;
                interval->assignedReg             = &(physRegs[regNum]);
                interval->isActive                = true;
                physRegs[regNum].assignedInterval = interval;

            // Verify the moves in this block
            LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
            for (GenTree* node : currentBlockRange.NonPhiNodes())
                assert(IsResolutionNode(currentBlockRange, node));
                if (IsResolutionMove(node))
                    // Only verify nodes that are actually moves; don't bother with the nodes that are
                    // operands to moves.
                    verifyResolutionMove(node, currentLocation);

            // Verify the outgoing register assignments
            VarToRegMap outVarToRegMap = getOutVarToRegMap(currentBlock->bbNum);
            VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveOut, varIndex);
            while (iter.NextElem(compiler, &varIndex))
                unsigned  varNum   = compiler->lvaTrackedToVarNum[varIndex];
                regNumber regNum   = getVarReg(outVarToRegMap, varNum);
                Interval* interval = getIntervalForLocalVar(varNum);
                assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
                interval->physReg     = REG_NA;
                interval->assignedReg = nullptr;
                interval->isActive    = false;

    DBEXEC(VERBOSE, printf("\n"));
//------------------------------------------------------------------------
// verifyResolutionMove: Verify a resolution statement. Called by verifyFinalAllocation()
//
// Arguments:
//    resolutionMove  - A GenTree* that must be a resolution move.
//    currentLocation - The LsraLocation of the most recent RefPosition that has been verified.
//
// Notes:
//    If verbose is set, this will also dump the moves into the table of final allocations.
//
void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation currentLocation)
    GenTree* dst = resolutionMove;
    assert(IsResolutionMove(dst));
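    // A GT_SWAP exchanges the registers of two live lclVars, so simply swap the mirrored
    // register assignments of the two intervals.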
    if (dst->OperGet() == GT_SWAP)
        GenTreeLclVarCommon* left          = dst->gtGetOp1()->AsLclVarCommon();
        GenTreeLclVarCommon* right         = dst->gtGetOp2()->AsLclVarCommon();
        regNumber            leftRegNum    = left->gtRegNum;
        regNumber            rightRegNum   = right->gtRegNum;
        Interval*            leftInterval  = getIntervalForLocalVar(left->gtLclNum);
        Interval*            rightInterval = getIntervalForLocalVar(right->gtLclNum);
        assert(leftInterval->physReg == leftRegNum && rightInterval->physReg == rightRegNum);
        leftInterval->physReg                  = rightRegNum;
        rightInterval->physReg                 = leftRegNum;
        physRegs[rightRegNum].assignedInterval = leftInterval;
        physRegs[leftRegNum].assignedInterval  = rightInterval;

        printf(shortRefPositionFormat, currentLocation, 0);
        dumpIntervalName(leftInterval);
        printf(" %-4s ", getRegName(rightRegNum));
        printf(shortRefPositionFormat, currentLocation, 0);
        dumpIntervalName(rightInterval);
        printf(" %-4s ", getRegName(leftRegNum));
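    // Otherwise this is a single-lclVar move: a GT_COPY from one register to another, a reload
    // from the stack (GTF_SPILLED), or a spill to the stack (GTF_SPILL).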
    regNumber            dstRegNum = dst->gtRegNum;
    regNumber            srcRegNum;
    GenTreeLclVarCommon* lcl;
    if (dst->OperGet() == GT_COPY)
        lcl       = dst->gtGetOp1()->AsLclVarCommon();
        srcRegNum = lcl->gtRegNum;
    else
        lcl = dst->AsLclVarCommon();
        if ((lcl->gtFlags & GTF_SPILLED) != 0)
            srcRegNum = REG_STK;
        else
            assert((lcl->gtFlags & GTF_SPILL) != 0);
            srcRegNum = dstRegNum;
            dstRegNum = REG_STK;
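    // Update the mirrored state: the interval leaves srcRegNum (unless it was on the stack) and
    // becomes active in dstRegNum, or becomes inactive with no register if it is being spilled.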
    Interval* interval = getIntervalForLocalVar(lcl->gtLclNum);
    assert(interval->physReg == srcRegNum || (srcRegNum == REG_STK && interval->physReg == REG_NA));
    if (srcRegNum != REG_STK)
        physRegs[srcRegNum].assignedInterval = nullptr;
    if (dstRegNum != REG_STK)
        interval->physReg                    = dstRegNum;
        interval->assignedReg                = &(physRegs[dstRegNum]);
        physRegs[dstRegNum].assignedInterval = interval;
        interval->isActive                   = true;
    else
        interval->physReg     = REG_NA;
        interval->assignedReg = nullptr;
        interval->isActive    = false;

    printf(shortRefPositionFormat, currentLocation, 0);
    dumpIntervalName(interval);
    printf(" %-4s ", getRegName(dstRegNum));
#endif // !LEGACY_BACKEND