// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
/*
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

                            Linear Scan Register Allocation

                                     a.k.a. LSRA

  Preconditions
    - All register requirements are expressed in the code stream, either as destination
      registers of tree nodes, or as internal registers.  These requirements are
      expressed in the TreeNodeInfo (gtLsraInfo) on each node, which includes:
      - The number of register sources and destinations.
      - The register restrictions (candidates) of the target register, both from itself,
        as producer of the value (dstCandidates), and from its consuming node (srcCandidates).
        Note that the srcCandidates field of TreeNodeInfo refers to the destination register
        (not any of its sources).
      - The number (internalCount) of registers required, and their register restrictions (internalCandidates).
        These are neither inputs nor outputs of the node, but used in the sequence of code generated for the tree.
    "Internal registers" are registers used during the code sequence generated for the node.
    The register lifetimes must obey the following lifetime model:
    - First, any internal registers are defined.
    - Next, any source registers are used (and are then freed if they are last use and are not identified as
      "delayRegFree").
    - Next, the internal registers are used (and are then freed).
    - Next, any registers in the kill set for the instruction are killed.
    - Next, the destination register(s) are defined (multiple destination registers are only supported on ARM).
    - Finally, any "delayRegFree" source registers are freed.
  There are several things to note about this order:
    - The internal registers will never overlap any use, but they may overlap a destination register.
    - Internal registers are never live beyond the node.
    - The "delayRegFree" annotation is used for instructions that are only available in a Read-Modify-Write form.
      That is, the destination register is one of the sources.  In this case, we must not use the same register for
      the non-RMW operand as for the destination.
  Overview (doLinearScan):
    - Walk all blocks, building intervals and RefPositions (buildIntervals)
    - Allocate registers (allocateRegisters)
    - Annotate nodes with register assignments (resolveRegisters)
    - Add move nodes as needed to resolve conflicting register
      assignments across non-adjacent edges. (resolveEdges, called from resolveRegisters)
  Postconditions:

    Tree nodes (GenTree):
    - GenTree::gtRegNum (and gtRegPair for ARM) is annotated with the register
      assignment for a node.  If the node does not require a register, it is
      annotated as such (for single registers, gtRegNum = REG_NA; for register
      pair type, gtRegPair = REG_PAIR_NONE).  For a variable definition or interior
      tree node (an "implicit" definition), this is the register to put the result.
      For an expression use, this is the place to find the value that has previously
      been computed.
      - In most cases, this register must satisfy the constraints specified by the TreeNodeInfo.
      - In some cases, this is difficult:
        - If a lclVar node currently lives in some register, it may not be desirable to move it
          (i.e. its current location may be desirable for future uses, e.g. if it's a callee save register,
          but needs to be in a specific arg register for a call).
        - In other cases there may be conflicts on the restrictions placed by the defining node and the node which
          consumes it.
      - If such a node is constrained to a single fixed register (e.g. an arg register, or a return from a call),
        then LSRA is free to annotate the node with a different register.  The code generator must issue the
        appropriate move.
      - However, if such a node is constrained to a set of registers, and its current location does not satisfy that
        requirement, LSRA must insert a GT_COPY node between the node and its parent.  The gtRegNum on the GT_COPY
        node must satisfy the register requirement of the parent.
    - GenTree::gtRsvdRegs has a set of registers used for internal temps.
    - A tree node is marked GTF_SPILL if the tree node must be spilled by the code generator after it has been
      evaluated.
      - LSRA currently does not set GTF_SPILLED on such nodes, because it caused problems in the old code generator.
        In the new backend perhaps this should change (see also the note below under CodeGen).
    - A tree node is marked GTF_SPILLED if it is a lclVar that must be reloaded prior to use.
      - The register (gtRegNum) on the node indicates the register to which it must be reloaded.
      - For lclVar nodes, since the uses and defs are distinct tree nodes, it is always possible to annotate the node
        with the register to which the variable must be reloaded.
      - For other nodes, since they represent both the def and use, if the value must be reloaded to a different
        register, LSRA must insert a GT_RELOAD node in order to specify the register to which it should be reloaded.
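
  As a hypothetical illustration of the GT_COPY case: suppose a tree-temp was
  allocated to RAX, but its parent requires one of {RCX, RDX}.  LSRA leaves the
  producing node annotated with RAX and inserts a GT_COPY node whose gtRegNum is,
  say, RCX; the code generator emits the mov when it encounters the GT_COPY.
  GT_RELOAD is analogous, except that the value is reloaded from its spill
  location rather than copied from another register.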
  Local variable table (LclVarDsc):
    - LclVarDsc::lvRegister is set to true if a local variable has the
      same register assignment for its entire lifetime.
    - LclVarDsc::lvRegNum / lvOtherReg: these are initialized to their
      first value at the end of LSRA (it looks like lvOtherReg isn't?
      This is probably a bug (ARM)).  Codegen will set them to their current value
      as it processes the trees, since a variable can (now) be assigned different
      registers over its lifetime.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/
#include "jitpch.h"
#ifdef _MSC_VER
#pragma hdrstop
#endif

#ifndef LEGACY_BACKEND // This file is ONLY used for the RyuJIT backend that uses the linear scan register allocator

#include "lsra.h"

#ifdef DEBUG
const char* LinearScan::resolveTypeName[] = {"Split", "Join", "Critical", "SharedCritical"};
#endif // DEBUG
/*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XX                                                                           XX
XX                           Small Helper functions                          XX
XX                                                                           XX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/
//--------------------------------------------------------------
// lsraAssignRegToTree: Assign the given reg to tree node.
//
// Arguments:
//    tree    -  GenTree node
//    reg     -  register to be assigned
//    regIdx  -  register idx, if tree is a multi-reg call node.
//               regIdx will be zero for single-reg result producing tree nodes.
//
// Return Value:
//    None
//
void lsraAssignRegToTree(GenTreePtr tree, regNumber reg, unsigned regIdx)
{
    if (regIdx == 0)
    {
        tree->gtRegNum = reg;
    }
    else
    {
        assert(tree->IsMultiRegCall());
        GenTreeCall* call = tree->AsCall();
        call->SetRegNumByIdx(reg, regIdx);
    }
}
//-------------------------------------------------------------
// getWeight: Returns the weight of the RefPosition.
//
// Arguments:
//    refPos   -   ref position
//
// Returns:
//    Weight of ref position.
unsigned LinearScan::getWeight(RefPosition* refPos)
{
    unsigned   weight;
    GenTreePtr treeNode = refPos->treeNode;

    if (treeNode != nullptr)
    {
        if (isCandidateLocalRef(treeNode))
        {
            // Tracked locals: use weighted ref cnt as the weight of the
            // ref position.
            GenTreeLclVarCommon* lclCommon = treeNode->AsLclVarCommon();
            LclVarDsc*           varDsc    = &(compiler->lvaTable[lclCommon->gtLclNum]);
            weight                         = varDsc->lvRefCntWtd;
        }
        else
        {
            // Non-candidate local ref or non-lcl tree node.
            // These are considered to have two references in the basic block:
            // a def and a use and hence weighted ref count is 2 times
            // the basic block weight in which they appear.
            weight = 2 * this->blockInfo[refPos->bbNum].weight;
        }
    }
    else
    {
        // Non-tree node ref positions.  These will have a single
        // reference in the basic block and hence their weighted
        // refcount is equal to the block weight in which they
        // appear.
        weight = this->blockInfo[refPos->bbNum].weight;
    }

    return weight;
}
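
// For intuition (illustrative numbers, not from a real dump): a RefPosition for a
// tracked local with lvRefCntWtd == 12 has weight 12 wherever it occurs; a tree-temp
// RefPosition in a block of weight 4 has weight 8 (a def plus a use, 2 * 4); and a
// non-tree RefPosition (e.g. a block boundary) in that block has weight 4.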
// allRegs represents a set of registers that can
// be used to allocate the specified type in any point
// in time (more of a 'bank' of registers).
regMaskTP LinearScan::allRegs(RegisterType rt)
{
    if (rt == TYP_FLOAT)
    {
        return availableFloatRegs;
    }
    else if (rt == TYP_DOUBLE)
    {
        return availableDoubleRegs;
#ifdef FEATURE_SIMD
        // TODO-Cleanup: Add an RBM_ALLSIMD
    }
    else if (varTypeIsSIMD(rt))
    {
        return availableDoubleRegs;
#endif // FEATURE_SIMD
    }
    else
    {
        return availableIntRegs;
    }
}
//--------------------------------------------------------------------------
// allMultiRegCallNodeRegs: represents a set of registers that can be used
// to allocate a multi-reg call node.
//
// Arguments:
//    call   -   Multi-reg call node
//
// Return Value:
//    Mask representing the set of available registers for multi-reg call
//    nodes.
//
// Note:
//    Multi-reg call node available regs = Bitwise-OR(allregs(GetReturnRegType(i)))
//    for all i=0..RetRegCount-1.
regMaskTP LinearScan::allMultiRegCallNodeRegs(GenTreeCall* call)
{
    assert(call->HasMultiRegRetVal());

    ReturnTypeDesc* retTypeDesc = call->GetReturnTypeDesc();
    regMaskTP       resultMask  = allRegs(retTypeDesc->GetReturnRegType(0));

    unsigned count = retTypeDesc->GetReturnRegCount();
    for (unsigned i = 1; i < count; ++i)
    {
        resultMask |= allRegs(retTypeDesc->GetReturnRegType(i));
    }

    return resultMask;
}
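
// For example (hypothetical, for a target that splits a struct return across an int
// and a float register): a call whose ReturnTypeDesc reports {TYP_LONG, TYP_DOUBLE}
// yields allRegs(TYP_LONG) | allRegs(TYP_DOUBLE), i.e. the union of the int and float
// register banks - deliberately broader than the actual ABI return registers.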
//--------------------------------------------------------------------------
// allRegs: returns the set of registers that can accommodate the type of
// the given node.
//
// Arguments:
//    tree   -   GenTree node
//
// Return Value:
//    Mask representing the set of available registers for given tree
//    node.
//
// Note: In case of multi-reg call node, the full set of registers must be
// determined by looking at types of individual return register types.
// In this case, the registers may include registers from different register
// sets and will not be limited to the actual ABI return registers.
regMaskTP LinearScan::allRegs(GenTree* tree)
{
    regMaskTP resultMask;

    // In case of multi-reg calls, allRegs is defined as
    // Bitwise-Or(allRegs(GetReturnRegType(i)) for i=0..ReturnRegCount-1
    if (tree->IsMultiRegCall())
    {
        resultMask = allMultiRegCallNodeRegs(tree->AsCall());
    }
    else
    {
        resultMask = allRegs(tree->TypeGet());
    }

    return resultMask;
}
regMaskTP LinearScan::allSIMDRegs()
{
    return availableFloatRegs;
}
//------------------------------------------------------------------------
// internalFloatRegCandidates: Return the set of registers that are appropriate
//                             for use as internal float registers.
//
// Return Value:
//    The set of registers (as a regMaskTP).
//
// Notes:
//    compFloatingPointUsed is only required to be set if it is possible that we
//    will use floating point callee-save registers.
//    It is unlikely, if an internal register is the only use of floating point,
//    that it will select a callee-save register.  But to be safe, we restrict
//    the set of candidates if compFloatingPointUsed is not already set.
regMaskTP LinearScan::internalFloatRegCandidates()
{
    if (compiler->compFloatingPointUsed)
    {
        return allRegs(TYP_FLOAT);
    }
    else
    {
        return RBM_FLT_CALLEE_TRASH;
    }
}
/*****************************************************************************
 * Register types
 *****************************************************************************/
template <class T>
RegisterType regType(T type)
{
#ifdef FEATURE_SIMD
    if (varTypeIsSIMD(type))
    {
        return FloatRegisterType;
    }
#endif // FEATURE_SIMD
    return varTypeIsFloating(TypeGet(type)) ? FloatRegisterType : IntRegisterType;
}
bool useFloatReg(var_types type)
{
    return (regType(type) == FloatRegisterType);
}

bool registerTypesEquivalent(RegisterType a, RegisterType b)
{
    return varTypeIsIntegralOrI(a) == varTypeIsIntegralOrI(b);
}

bool isSingleRegister(regMaskTP regMask)
{
    return (regMask != RBM_NONE && genMaxOneBit(regMask));
}
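
// Quick sanity examples (hypothetical x64 masks): isSingleRegister(RBM_RAX) is true
// since exactly one bit is set, while isSingleRegister(RBM_RAX | RBM_RCX) and
// isSingleRegister(RBM_NONE) are both false.  registerTypesEquivalent(TYP_INT, TYP_REF)
// is true (both live in the integer bank); registerTypesEquivalent(TYP_INT, TYP_FLOAT)
// is false.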
/*****************************************************************************
 * Inline functions for RegRecord
 *****************************************************************************/

bool RegRecord::isFree()
{
    return ((assignedInterval == nullptr || !assignedInterval->isActive) && !isBusyUntilNextKill);
}
/*****************************************************************************
 * Inline functions for LinearScan
 *****************************************************************************/
RegRecord* LinearScan::getRegisterRecord(regNumber regNum)
{
    return &physRegs[regNum];
}
//----------------------------------------------------------------------------
// getConstrainedRegMask: Returns new regMask which is the intersection of
// regMaskActual and regMaskConstraint if the new regMask has at least
// minRegCount registers, otherwise returns regMaskActual.
//
// Arguments:
//     regMaskActual      -  regMask that needs to be constrained
//     regMaskConstraint  -  regMask constraint that needs to be
//                           applied to regMaskActual
//     minRegCount        -  Minimum number of regs that should be
//                           present in new regMask.
//
// Return Value:
//     New regMask that has minRegCount registers after intersection.
//     Otherwise returns regMaskActual.
regMaskTP LinearScan::getConstrainedRegMask(regMaskTP regMaskActual, regMaskTP regMaskConstraint, unsigned minRegCount)
{
    regMaskTP newMask = regMaskActual & regMaskConstraint;
    if (genCountBits(newMask) >= minRegCount)
    {
        return newMask;
    }

    return regMaskActual;
}
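
// Worked example (hypothetical masks): with regMaskActual = {RAX, RCX, RDX},
// regMaskConstraint = {RCX, RBX} and minRegCount = 1, the intersection {RCX} has at
// least one register, so {RCX} is returned.  With minRegCount = 2, the intersection
// is too small and the unconstrained {RAX, RCX, RDX} is returned instead.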
//------------------------------------------------------------------------
// stressLimitRegs: Given a set of registers, expressed as a register mask, reduce
//                  them based on the current stress options.
//
// Arguments:
//    mask      - The current mask of register candidates for a node
//
// Return Value:
//    A possibly-modified mask, based on the value of COMPlus_JitStressRegs.
//
// Notes:
//    This is the method used to implement the stress options that limit
//    the set of registers considered for allocation.

regMaskTP LinearScan::stressLimitRegs(RefPosition* refPosition, regMaskTP mask)
{
    if (getStressLimitRegs() != LSRA_LIMIT_NONE)
    {
        // The refPosition could be null, for example when called
        // by getTempRegForResolution().
        int minRegCount = (refPosition != nullptr) ? refPosition->minRegCandidateCount : 1;

        switch (getStressLimitRegs())
        {
            case LSRA_LIMIT_CALLEE:
                if (!compiler->opts.compDbgEnC)
                {
                    mask = getConstrainedRegMask(mask, RBM_CALLEE_SAVED, minRegCount);
                }
                break;

            case LSRA_LIMIT_CALLER:
            {
                mask = getConstrainedRegMask(mask, RBM_CALLEE_TRASH, minRegCount);
            }
            break;

            case LSRA_LIMIT_SMALL_SET:
                if ((mask & LsraLimitSmallIntSet) != RBM_NONE)
                {
                    mask = getConstrainedRegMask(mask, LsraLimitSmallIntSet, minRegCount);
                }
                else if ((mask & LsraLimitSmallFPSet) != RBM_NONE)
                {
                    mask = getConstrainedRegMask(mask, LsraLimitSmallFPSet, minRegCount);
                }
                break;

            default:
                unreached();
        }

        if (refPosition != nullptr && refPosition->isFixedRegRef)
        {
            mask |= refPosition->registerAssignment;
        }
    }

    return mask;
}
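
// Usage note (the value-to-mode mapping here is an assumption; see getStressLimitRegs()
// for the authoritative decoding): running with COMPlus_JitStressRegs set so that
// LSRA_LIMIT_CALLEE is selected squeezes every candidate set toward the callee-saved
// registers, forcing spills and copies that rarely occur otherwise.  The final
// "mask |= registerAssignment" above ensures that a fixed-register requirement (such
// as a call argument register) is never stressed away.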
// TODO-Cleanup: Consider adding an overload that takes a varDsc, and can appropriately
// set such fields as isStructField
//
Interval* LinearScan::newInterval(RegisterType theRegisterType)
{
    intervals.emplace_back(theRegisterType, allRegs(theRegisterType));
    Interval* newInt = &intervals.back();

#ifdef DEBUG
    newInt->intervalIndex = static_cast<unsigned>(intervals.size() - 1);
#endif // DEBUG

    DBEXEC(VERBOSE, newInt->dump());
    return newInt;
}
RefPosition* LinearScan::newRefPositionRaw(LsraLocation nodeLocation, GenTree* treeNode, RefType refType)
{
    refPositions.emplace_back(curBBNum, nodeLocation, treeNode, refType);
    RefPosition* newRP = &refPositions.back();
#ifdef DEBUG
    newRP->rpNum = static_cast<unsigned>(refPositions.size() - 1);
#endif // DEBUG
    return newRP;
}
//------------------------------------------------------------------------
// resolveConflictingDefAndUse: Resolve the situation where we have conflicting def and use
//                              register requirements on a single-def, single-use interval.
//
// Arguments:
//    defRefPosition - The interval definition
//    useRefPosition - The (sole) interval use
//
// Return Value:
//    None.
//
// Assumptions:
//    The two RefPositions are for the same interval, which is a tree-temp.
//
// Notes:
//    We require some special handling for the case where the use is a "delayRegFree" case of a fixedReg.
//    In that case, if we change the registerAssignment on the useRefPosition, we will lose the fact that,
//    even if we assign a different register (and rely on codegen to do the copy), that fixedReg also needs
//    to remain busy until the Def register has been allocated.  In that case, we don't allow Case 1 or Case 4
//    below.
//    Here are the cases we consider (in this order):
//    1. If the defRefPosition specifies a single register, and there are no conflicting
//       FixedReg uses of it between the def and use, we use that register, and the code generator
//       will insert the copy.  Note that it cannot be in use because there is a FixedRegRef for the def.
//    2. If the useRefPosition specifies a single register, and it is not in use, and there are no
//       conflicting FixedReg uses of it between the def and use, we use that register, and the code generator
//       will insert the copy.
//    3. If the defRefPosition specifies a single register (but there are conflicts, as determined
//       in 1.), and there are no conflicts with the useRefPosition register (if it's a single register),
//       we set the register requirements on the defRefPosition to the use registers, and the
//       code generator will insert a copy on the def.  We can't rely on the code generator to put a copy
//       on the use if it has multiple possible candidates, as it won't know which one has been allocated.
//    4. If the useRefPosition specifies a single register, and there are no conflicts with the register
//       on the defRefPosition, we leave the register requirements on the defRefPosition as-is, and set
//       the useRefPosition to the def registers, for similar reasons to case #3.
//    5. If both the defRefPosition and the useRefPosition specify single registers, but both have conflicts,
//       we set the candidates on defRefPosition to be all regs of the appropriate type, and since they are
//       single registers, codegen can insert the copy.
//    6. Finally, if the RefPositions specify disjoint subsets of the registers (or the use is fixed but
//       has a conflict), we must insert a copy.  The copy will be inserted before the use if the
//       use is not fixed (in the fixed case, the code generator will insert the use).
//
// TODO-CQ: We get bad register allocation in case #3 in the situation where no register is
// available for the lifetime.  We end up allocating a register that must be spilled, and it probably
// won't be the register that is actually defined by the target instruction.  So, we have to copy it
// and THEN spill it.  In this case, we should be using the def requirement.  But we need to change
// the interface to this method a bit to make that work (e.g. returning a candidate set to use, but
// leaving the registerAssignment as-is on the def, so that if we find that we need to spill anyway
// we can use the fixed-reg on the def).
//
void LinearScan::resolveConflictingDefAndUse(Interval* interval, RefPosition* defRefPosition)
{
    assert(!interval->isLocalVar);

    RefPosition* useRefPosition   = defRefPosition->nextRefPosition;
    regMaskTP    defRegAssignment = defRefPosition->registerAssignment;
    regMaskTP    useRegAssignment = useRefPosition->registerAssignment;
    RegRecord*   defRegRecord     = nullptr;
    RegRecord*   useRegRecord     = nullptr;
    regNumber    defReg           = REG_NA;
    regNumber    useReg           = REG_NA;
    bool         defRegConflict   = false;
    bool         useRegConflict   = false;

    // If the useRefPosition is a "delayRegFree", we can't change the registerAssignment
    // on it, or we will fail to ensure that the fixedReg is busy at the time the target
    // (of the node that uses this interval) is allocated.
    bool canChangeUseAssignment = !useRefPosition->isFixedRegRef || !useRefPosition->delayRegFree;

    INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CONFLICT));
    if (!canChangeUseAssignment)
    {
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_FIXED_DELAY_USE));
    }
    if (defRefPosition->isFixedRegRef)
    {
        defReg       = defRefPosition->assignedReg();
        defRegRecord = getRegisterRecord(defReg);
        if (canChangeUseAssignment)
        {
            RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
            assert(currFixedRegRefPosition != nullptr &&
                   currFixedRegRefPosition->nodeLocation == defRefPosition->nodeLocation);

            if (currFixedRegRefPosition->nextRefPosition == nullptr ||
                currFixedRegRefPosition->nextRefPosition->nodeLocation > useRefPosition->getRefEndLocation())
            {
                // This is case #1.  Use the defRegAssignment
                INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE1));
                useRefPosition->registerAssignment = defRegAssignment;
                return;
            }
            else
            {
                defRegConflict = true;
            }
        }
    }
    if (useRefPosition->isFixedRegRef)
    {
        useReg                               = useRefPosition->assignedReg();
        useRegRecord                         = getRegisterRecord(useReg);
        RefPosition* currFixedRegRefPosition = useRegRecord->recentRefPosition;

        // We know that useRefPosition is a fixed use, so the nextRefPosition must not be null.
        RefPosition* nextFixedRegRefPosition = useRegRecord->getNextRefPosition();
        assert(nextFixedRegRefPosition != nullptr &&
               nextFixedRegRefPosition->nodeLocation <= useRefPosition->nodeLocation);

        // First, check to see if there are any conflicting FixedReg references between the def and use.
        if (nextFixedRegRefPosition->nodeLocation == useRefPosition->nodeLocation)
        {
            // OK, no conflicting FixedReg references.
            // Now, check to see whether it is currently in use.
            if (useRegRecord->assignedInterval != nullptr)
            {
                RefPosition* possiblyConflictingRef         = useRegRecord->assignedInterval->recentRefPosition;
                LsraLocation possiblyConflictingRefLocation = possiblyConflictingRef->getRefEndLocation();
                if (possiblyConflictingRefLocation >= defRefPosition->nodeLocation)
                {
                    useRegConflict = true;
                }
            }
            if (!useRegConflict)
            {
                // This is case #2.  Use the useRegAssignment
                INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE2));
                defRefPosition->registerAssignment = useRegAssignment;
                return;
            }
        }
        else
        {
            useRegConflict = true;
        }
    }
    if (defRegRecord != nullptr && !useRegConflict)
    {
        // This is case #3.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE3));
        defRefPosition->registerAssignment = useRegAssignment;
        return;
    }
    if (useRegRecord != nullptr && !defRegConflict && canChangeUseAssignment)
    {
        // This is case #4.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE4));
        useRefPosition->registerAssignment = defRegAssignment;
        return;
    }
    if (defRegRecord != nullptr && useRegRecord != nullptr)
    {
        // This is case #5.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE5));
        RegisterType regType = interval->registerType;
        assert((getRegisterType(interval, defRefPosition) == regType) &&
               (getRegisterType(interval, useRefPosition) == regType));
        regMaskTP candidates               = allRegs(regType);
        defRefPosition->registerAssignment = candidates;
        return;
    }
    INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE6));
    return;
}
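
// Illustrative walk-through (hypothetical): suppose a tree-temp is defined by an
// instruction that must target RAX (a fixed def) and its use merely prefers
// {RCX, RDX}.  If no other FixedReg reference to RAX occurs between the def and the
// use, case #1 applies: the use's candidates are replaced with RAX, and the code
// generator inserts a copy into RCX or RDX only if the consumer truly requires it.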
//------------------------------------------------------------------------
// conflictingFixedRegReference: Determine whether the current RegRecord has a
//                               fixed register use that conflicts with 'refPosition'
//
// Arguments:
//    refPosition - The RefPosition of interest
//
// Return Value:
//    Returns true iff the given RefPosition is NOT a fixed use of this register,
//    AND either:
//    - there is a RefPosition on this RegRecord at the nodeLocation of the given RefPosition, or
//    - the given RefPosition has a delayRegFree, and there is a RefPosition on this RegRecord at
//      the nodeLocation just past the given RefPosition.
//
// Assumptions:
//    'refPosition' is non-null.

bool RegRecord::conflictingFixedRegReference(RefPosition* refPosition)
{
    // Is this a fixed reference of this register?  If so, there is no conflict.
    if (refPosition->isFixedRefOfRegMask(genRegMask(regNum)))
    {
        return false;
    }
    // Otherwise, check for conflicts.
    // There is a conflict if:
    // 1. There is a recent RefPosition on this RegRecord that is at this location,
    //    except in the case where it is a special "putarg" that is associated with this interval, OR
    // 2. There is an upcoming RefPosition at this location, or at the next location
    //    if refPosition is a delayed use (i.e. must be kept live through the next/def location).

    LsraLocation refLocation = refPosition->nodeLocation;
    if (recentRefPosition != nullptr && recentRefPosition->refType != RefTypeKill &&
        recentRefPosition->nodeLocation == refLocation &&
        (!isBusyUntilNextKill || assignedInterval != refPosition->getInterval()))
    {
        return true;
    }
    LsraLocation nextPhysRefLocation = getNextRefLocation();
    if (nextPhysRefLocation == refLocation || (refPosition->delayRegFree && nextPhysRefLocation == (refLocation + 1)))
    {
        return true;
    }
    return false;
}
void LinearScan::applyCalleeSaveHeuristics(RefPosition* rp)
{
#ifdef _TARGET_AMD64_
    if (compiler->opts.compDbgEnC)
    {
        // We only use RSI and RDI for EnC code, so we don't want to favor callee-save regs.
        return;
    }
#endif // _TARGET_AMD64_

    Interval* theInterval = rp->getInterval();

#ifdef DEBUG
    regMaskTP calleeSaveMask = calleeSaveRegs(getRegisterType(theInterval, rp));
    if (doReverseCallerCallee())
    {
        rp->registerAssignment =
            getConstrainedRegMask(rp->registerAssignment, calleeSaveMask, rp->minRegCandidateCount);
    }
    else
#endif // DEBUG
    {
        // Set preferences so that this register set will be preferred for earlier refs
        theInterval->updateRegisterPreferences(rp->registerAssignment);
    }
}
void LinearScan::associateRefPosWithInterval(RefPosition* rp)
{
    Referenceable* theReferent = rp->referent;

    if (theReferent != nullptr)
    {
        // All RefPositions except the dummy ones at the beginning of blocks

        if (rp->isIntervalRef())
        {
            Interval* theInterval = rp->getInterval();

            applyCalleeSaveHeuristics(rp);

            if (theInterval->isLocalVar)
            {
                if (RefTypeIsUse(rp->refType))
                {
                    RefPosition* const prevRP = theInterval->recentRefPosition;
                    if ((prevRP != nullptr) && (prevRP->bbNum == rp->bbNum))
                    {
                        prevRP->lastUse = false;
                    }
                }

                rp->lastUse = (rp->refType != RefTypeExpUse) && (rp->refType != RefTypeParamDef) &&
                              (rp->refType != RefTypeZeroInit) && !extendLifetimes();
            }
            else if (rp->refType == RefTypeUse)
            {
                // Ensure that we have consistent def/use on SDSU temps.
                // However, in the case of a non-commutative rmw def, we must avoid over-constraining
                // the def, so don't propagate a single-register restriction from the consumer to the producer
                RefPosition* prevRefPosition = theInterval->recentRefPosition;
                assert(prevRefPosition != nullptr && theInterval->firstRefPosition == prevRefPosition);
                regMaskTP prevAssignment = prevRefPosition->registerAssignment;
                regMaskTP newAssignment  = (prevAssignment & rp->registerAssignment);
                if (newAssignment != RBM_NONE)
                {
                    if (!theInterval->hasNonCommutativeRMWDef || !isSingleRegister(newAssignment))
                    {
                        prevRefPosition->registerAssignment = newAssignment;
                    }
                }
                else
                {
                    theInterval->hasConflictingDefUse = true;
                }
            }
        }

        RefPosition* prevRP = theReferent->recentRefPosition;
        if (prevRP != nullptr)
        {
            prevRP->nextRefPosition = rp;
        }
        else
        {
            theReferent->firstRefPosition = rp;
        }
        theReferent->recentRefPosition = rp;
        theReferent->lastRefPosition   = rp;
    }
    else
    {
        assert((rp->refType == RefTypeBB) || (rp->refType == RefTypeKillGCRefs));
    }
}
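
// Example of the SDSU def/use intersection above (hypothetical candidate sets): if a
// temp's def RefPosition allows {RAX, RCX} and its use allows {RCX, RDX}, the def is
// narrowed to {RCX} and no copy is needed - unless the def is a non-commutative RMW
// def, where a single-register narrowing is skipped to avoid over-constraining it.
// If the sets are disjoint, e.g. {RAX} vs. {RDX}, hasConflictingDefUse is set and
// resolveConflictingDefAndUse (above) later decides which side to relax.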
//---------------------------------------------------------------------------
// newRefPosition: allocate and initialize a new RefPosition.
//
// Arguments:
//     reg             -  reg number that identifies RegRecord to be associated
//                        with this RefPosition
//     theLocation     -  LSRA location of RefPosition
//     theRefType      -  RefPosition type
//     theTreeNode     -  GenTree node for which this RefPosition is created
//     mask            -  Set of valid registers for this RefPosition
//     multiRegIdx     -  register position if this RefPosition corresponds to a
//                        multi-reg call node.
//
// Return Value:
//     a new RefPosition
//
RefPosition* LinearScan::newRefPosition(
    regNumber reg, LsraLocation theLocation, RefType theRefType, GenTree* theTreeNode, regMaskTP mask)
{
    RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);

    newRP->setReg(getRegisterRecord(reg));
    newRP->registerAssignment = mask;

    newRP->setMultiRegIdx(0);
    newRP->setAllocateIfProfitable(0);

    associateRefPosWithInterval(newRP);

    DBEXEC(VERBOSE, newRP->dump());
    return newRP;
}
//---------------------------------------------------------------------------
// newRefPosition: allocate and initialize a new RefPosition.
//
// Arguments:
//     theInterval     -  interval to which RefPosition is associated with.
//     theLocation     -  LSRA location of RefPosition
//     theRefType      -  RefPosition type
//     theTreeNode     -  GenTree node for which this RefPosition is created
//     mask            -  Set of valid registers for this RefPosition
//     multiRegIdx     -  register position if this RefPosition corresponds to a
//                        multi-reg call node.
//     minRegCount     -  Minimum number registers that needs to be ensured while
//                        constraining candidates for this ref position under
//                        LSRA stress. This is a DEBUG only arg.
//
// Return Value:
//     a new RefPosition
//
RefPosition* LinearScan::newRefPosition(Interval*    theInterval,
                                        LsraLocation theLocation,
                                        RefType      theRefType,
                                        GenTree*     theTreeNode,
                                        regMaskTP    mask,
                                        unsigned     multiRegIdx /* = 0 */
                                        DEBUGARG(unsigned minRegCandidateCount /* = 1 */))
{
#ifdef DEBUG
    if (theInterval != nullptr && regType(theInterval->registerType) == FloatRegisterType)
    {
        // In the case we're using floating point registers we must make sure
        // this flag was set previously in the compiler since this will mandate
        // whether LSRA will take into consideration FP reg killsets.
        assert(compiler->compFloatingPointUsed || ((mask & RBM_FLT_CALLEE_SAVED) == 0));
    }
#endif // DEBUG

    // If this reference is constrained to a single register (and it's not a dummy
    // or Kill reftype already), add a RefTypeFixedReg at this location so that its
    // availability can be more accurately determined

    bool isFixedRegister = isSingleRegister(mask);
    bool insertFixedRef  = false;
    if (isFixedRegister)
    {
        // Insert a RefTypeFixedReg for any normal def or use (not ParamDef or BB)
        if (theRefType == RefTypeUse || theRefType == RefTypeDef)
        {
            insertFixedRef = true;
        }
    }

    if (insertFixedRef)
    {
        regNumber    physicalReg = genRegNumFromMask(mask);
        RefPosition* pos         = newRefPosition(physicalReg, theLocation, RefTypeFixedReg, nullptr, mask);
        assert(theInterval != nullptr);
        assert((allRegs(theInterval->registerType) & mask) != 0);
    }

    RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);

    newRP->setInterval(theInterval);

    // Spill info
    newRP->isFixedRegRef = isFixedRegister;

#ifndef _TARGET_AMD64_
    // We don't need this for AMD because the PInvoke method epilog code is explicit
    // at register allocation time.
    if (theInterval != nullptr && theInterval->isLocalVar && compiler->info.compCallUnmanaged &&
        theInterval->varNum == compiler->genReturnLocal)
    {
        mask &= ~(RBM_PINVOKE_TCB | RBM_PINVOKE_FRAME);
        noway_assert(mask != RBM_NONE);
    }
#endif // !_TARGET_AMD64_
    newRP->registerAssignment = mask;

    newRP->setMultiRegIdx(multiRegIdx);
    newRP->setAllocateIfProfitable(0);

#ifdef DEBUG
    newRP->minRegCandidateCount = minRegCandidateCount;
#endif // DEBUG

    associateRefPosWithInterval(newRP);

    DBEXEC(VERBOSE, newRP->dump());
    return newRP;
}
/*****************************************************************************
 * Inline functions for Interval
 *****************************************************************************/
RefPosition* Referenceable::getNextRefPosition()
{
    if (recentRefPosition == nullptr)
    {
        return firstRefPosition;
    }
    else
    {
        return recentRefPosition->nextRefPosition;
    }
}

LsraLocation Referenceable::getNextRefLocation()
{
    RefPosition* nextRefPosition = getNextRefPosition();
    if (nextRefPosition == nullptr)
    {
        return MaxLocation;
    }
    else
    {
        return nextRefPosition->nodeLocation;
    }
}
// Iterate through all the registers of the given type
class RegisterIterator
{
    friend class Registers;

public:
    RegisterIterator(RegisterType type) : regType(type)
    {
        if (useFloatReg(regType))
        {
            currentRegNum = REG_FP_FIRST;
        }
        else
        {
            currentRegNum = REG_INT_FIRST;
        }
    }

protected:
    static RegisterIterator Begin(RegisterType regType)
    {
        return RegisterIterator(regType);
    }
    static RegisterIterator End(RegisterType regType)
    {
        RegisterIterator endIter = RegisterIterator(regType);
        // This assumes only integer and floating point register types
        // if we target a processor with additional register types,
        // this would have to change
        if (useFloatReg(regType))
        {
            // This just happens to work for both double & float
            endIter.currentRegNum = REG_NEXT(REG_FP_LAST);
        }
        else
        {
            endIter.currentRegNum = REG_NEXT(REG_INT_LAST);
        }
        return endIter;
    }

public:
    void operator++(int dummy) // int dummy is c++ for "this is postfix ++"
    {
        currentRegNum = REG_NEXT(currentRegNum);
#ifdef _TARGET_ARM_
        if (regType == TYP_DOUBLE)
            currentRegNum = REG_NEXT(currentRegNum);
#endif
    }
    void operator++() // prefix operator++
    {
        currentRegNum = REG_NEXT(currentRegNum);
#ifdef _TARGET_ARM_
        if (regType == TYP_DOUBLE)
            currentRegNum = REG_NEXT(currentRegNum);
#endif
    }
    regNumber operator*()
    {
        return currentRegNum;
    }
    bool operator!=(const RegisterIterator& other)
    {
        return other.currentRegNum != currentRegNum;
    }

private:
    regNumber    currentRegNum;
    RegisterType regType;
};

class Registers
{
public:
    friend class RegisterIterator;
    RegisterType type;
    Registers(RegisterType t)
    {
        type = t;
    }
    RegisterIterator begin()
    {
        return RegisterIterator::Begin(type);
    }
    RegisterIterator end()
    {
        return RegisterIterator::End(type);
    }
};
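
// Usage sketch (illustrative; not a loop that appears in this file): Registers
// exposes begin()/end(), so a range-based for visits each register of a bank in
// turn, stepping by two registers for TYP_DOUBLE on ARM.
//
//     regMaskTP mask = RBM_NONE;
//     for (regNumber reg : Registers(IntRegisterType))
//     {
//         mask |= genRegMask(reg);
//     }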
#ifdef DEBUG
void LinearScan::dumpVarToRegMap(VarToRegMap map)
{
    bool anyPrinted = false;
    for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
    {
        unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
        if (map[varIndex] != REG_STK)
        {
            printf("V%02u=%s ", varNum, getRegName(map[varIndex]));
            anyPrinted = true;
        }
    }
    if (!anyPrinted)
    {
        printf("none");
    }
    printf("\n");
}

void LinearScan::dumpInVarToRegMap(BasicBlock* block)
{
    printf("Var=Reg beg of BB%02u: ", block->bbNum);
    VarToRegMap map = getInVarToRegMap(block->bbNum);
    dumpVarToRegMap(map);
}

void LinearScan::dumpOutVarToRegMap(BasicBlock* block)
{
    printf("Var=Reg end of BB%02u: ", block->bbNum);
    VarToRegMap map = getOutVarToRegMap(block->bbNum);
    dumpVarToRegMap(map);
}
#endif // DEBUG
LinearScanInterface* getLinearScanAllocator(Compiler* comp)
{
    return new (comp, CMK_LSRA) LinearScan(comp);
}
//------------------------------------------------------------------------
// LinearScan: Construct a LinearScan object
//
// Arguments:
//    theCompiler
//
// Notes:
//    The constructor takes care of initializing the data structures that are used
//    during Lowering, including (in DEBUG) getting the stress environment variables,
//    as they may affect the block ordering.
//
LinearScan::LinearScan(Compiler* theCompiler)
    : compiler(theCompiler)
#if MEASURE_MEM_ALLOC
    , lsraIAllocator(nullptr)
#endif // MEASURE_MEM_ALLOC
    , intervals(LinearScanMemoryAllocatorInterval(theCompiler))
    , refPositions(LinearScanMemoryAllocatorRefPosition(theCompiler))
{
#ifdef DEBUG
    maxNodeLocation   = 0;
    activeRefPosition = nullptr;

    // Get the value of the environment variable that controls stress for register allocation
    lsraStressMask = JitConfig.JitStressRegs();
#if 0
    if (lsraStressMask != 0)
    {
        // The code in this #if can be used to debug JitStressRegs issues according to
        // method hash.  To use, simply set environment variables JitStressRegsHashLo and JitStressRegsHashHi
        unsigned methHash = compiler->info.compMethodHash();
        char* lostr = getenv("JitStressRegsHashLo");
        unsigned methHashLo = 0;
        bool dump = false;
        if (lostr != nullptr)
        {
            sscanf_s(lostr, "%x", &methHashLo);
            dump = true;
        }
        char* histr = getenv("JitStressRegsHashHi");
        unsigned methHashHi = UINT32_MAX;
        if (histr != nullptr)
        {
            sscanf_s(histr, "%x", &methHashHi);
            dump = true;
        }
        if (methHash < methHashLo || methHash > methHashHi)
        {
            lsraStressMask = 0;
        }
        else if (dump == true)
        {
            printf("JitStressRegs = %x for method %s, hash = 0x%x.\n",
                   lsraStressMask, compiler->info.compFullName, compiler->info.compMethodHash());
            printf(""); // in our logic this causes a flush
        }
    }
#endif // 0

    dumpTerse = (JitConfig.JitDumpTerseLsra() != 0);
#endif // DEBUG

    availableIntRegs = (RBM_ALLINT & ~compiler->codeGen->regSet.rsMaskResvd);
#if ETW_EBP_FRAMED
    availableIntRegs &= ~RBM_FPBASE;
#endif // ETW_EBP_FRAMED

    availableFloatRegs  = RBM_ALLFLOAT;
    availableDoubleRegs = RBM_ALLDOUBLE;

#ifdef _TARGET_AMD64_
    if (compiler->opts.compDbgEnC)
    {
        // On x64 when the EnC option is set, we always save exactly RBP, RSI and RDI.
        // RBP is not available to the register allocator, so RSI and RDI are the only
        // callee-save registers available.
        availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_RSI | RBM_RDI;
        availableFloatRegs &= ~RBM_CALLEE_SAVED;
        availableDoubleRegs &= ~RBM_CALLEE_SAVED;
    }
#endif // _TARGET_AMD64_

    compiler->rpFrameType           = FT_NOT_SET;
    compiler->rpMustCreateEBPCalled = false;

    compiler->codeGen->intRegState.rsIsFloat   = false;
    compiler->codeGen->floatRegState.rsIsFloat = true;

    // Block sequencing (the order in which we schedule).
    // Note that we don't initialize the bbVisitedSet until we do the first traversal
    // (currently during Lowering's second phase, where it sets the TreeNodeInfo).
    // This is so that any blocks that are added during the first phase of Lowering
    // are accounted for (and we don't have BasicBlockEpoch issues).
    blockSequencingDone   = false;
    blockSequence         = nullptr;
    blockSequenceWorkList = nullptr;
    curBBSeqNum           = 0;
    bbSeqCount            = 0;

    // Information about each block, including predecessor blocks used for variable locations at block entry.
    blockInfo = nullptr;

    // Populate the register mask table.
    // The first two masks in the table are allint/allfloat
    // The next N are the masks for each single register.
    // After that are the dynamically added ones.
    regMaskTable               = new (compiler, CMK_LSRA) regMaskTP[numMasks];
    regMaskTable[ALLINT_IDX]   = allRegs(TYP_INT);
    regMaskTable[ALLFLOAT_IDX] = allRegs(TYP_DOUBLE);

    regNumber reg;
    for (reg = REG_FIRST; reg < REG_COUNT; reg = REG_NEXT(reg))
    {
        regMaskTable[FIRST_SINGLE_REG_IDX + reg - REG_FIRST] = (reg == REG_STK) ? RBM_NONE : genRegMask(reg);
    }
    nextFreeMask = FIRST_SINGLE_REG_IDX + REG_COUNT;
    noway_assert(nextFreeMask <= numMasks);
}
// Return the reg mask corresponding to the given index.
regMaskTP LinearScan::GetRegMaskForIndex(RegMaskIndex index)
{
    assert(index < numMasks);
    assert(index < nextFreeMask);
    return regMaskTable[index];
}

// Given a reg mask, return the index it corresponds to. If it is not a 'well known' reg mask,
// add it at the end. This method has linear behavior in the worst cases but that is fairly rare.
// Most methods never use any but the well-known masks, and when they do use more
// it is only one or two more.
LinearScan::RegMaskIndex LinearScan::GetIndexForRegMask(regMaskTP mask)
{
    RegMaskIndex result;
    if (isSingleRegister(mask))
    {
        result = genRegNumFromMask(mask) + FIRST_SINGLE_REG_IDX;
    }
    else if (mask == allRegs(TYP_INT))
    {
        result = ALLINT_IDX;
    }
    else if (mask == allRegs(TYP_DOUBLE))
    {
        result = ALLFLOAT_IDX;
    }
    else
    {
        for (int i = FIRST_SINGLE_REG_IDX + REG_COUNT; i < nextFreeMask; i++)
        {
            if (regMaskTable[i] == mask)
            {
                return i;
            }
        }

        // We only allocate a fixed number of masks. Since we don't reallocate, we will throw a
        // noway_assert if we exceed this limit.
        noway_assert(nextFreeMask < numMasks);

        regMaskTable[nextFreeMask] = mask;
        result                     = nextFreeMask;
        nextFreeMask++;
    }
    assert(mask == regMaskTable[result]);
    return result;
}
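
// Round-trip sketch (illustrative): interning masks this way lets callers hold a
// compact RegMaskIndex instead of a full regMaskTP.
//
//     RegMaskIndex idx = GetIndexForRegMask(RBM_RAX | RBM_RCX); // stored past the well-known entries
//     assert(GetRegMaskForIndex(idx) == (RBM_RAX | RBM_RCX));   // recovers the identical mask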
// We've decided that we can't use a register during register allocation (probably FPBASE),
// but we've already added it to the register masks. Go through the masks and remove it.
void LinearScan::RemoveRegisterFromMasks(regNumber reg)
{
    JITDUMP("Removing register %s from LSRA register masks\n", getRegName(reg));

    regMaskTP mask = ~genRegMask(reg);
    for (int i = 0; i < nextFreeMask; i++)
    {
        regMaskTable[i] &= mask;
    }

    JITDUMP("After removing register:\n");
    DBEXEC(VERBOSE, dspRegisterMaskTable());
}
#ifdef DEBUG
void LinearScan::dspRegisterMaskTable()
{
    printf("LSRA register masks. Total allocated: %d, total used: %d\n", numMasks, nextFreeMask);
    for (int i = 0; i < nextFreeMask; i++)
    {
        printf("%2u: ", i);
        dspRegMask(regMaskTable[i]);
        printf("\n");
    }
}
#endif // DEBUG
//------------------------------------------------------------------------
// getNextCandidateFromWorkList: Get the next candidate for block sequencing
//
// Arguments:
//    None
//
// Return Value:
//    The next block to be placed in the sequence.
//
// Notes:
//    This method currently always returns the next block in the list, and relies on having
//    blocks added to the list only when they are "ready", and on the
//    addToBlockSequenceWorkList() method to insert them in the proper order.
//    However, a block may be in the list and already selected, if it was subsequently
//    encountered as both a flow and layout successor of the most recently selected
//    block.

BasicBlock* LinearScan::getNextCandidateFromWorkList()
{
    BasicBlockList* nextWorkList = nullptr;
    for (BasicBlockList* workList = blockSequenceWorkList; workList != nullptr; workList = nextWorkList)
    {
        nextWorkList = workList->next;
        BasicBlock* candBlock = workList->block;
        removeFromBlockSequenceWorkList(workList, nullptr);
        if (!isBlockVisited(candBlock))
        {
            return candBlock;
        }
    }
    return nullptr;
}
//------------------------------------------------------------------------
// setBlockSequence: Determine the block order for register allocation.
//
// Arguments:
//    None
//
// Return Value:
//    None
//
// Notes:
//    On return, the blockSequence array contains the blocks, in the order in which they
//    will be allocated.
//    This method clears the bbVisitedSet on LinearScan, and when it returns the set
//    contains all the bbNums for the block.
//    This requires a traversal of the BasicBlocks, and could potentially be
//    combined with the first traversal (currently the one in Lowering that sets the
//    TreeNodeInfo).

void LinearScan::setBlockSequence()
{
    // Reset the "visited" flag on each block.
    compiler->EnsureBasicBlockEpoch();
    bbVisitedSet = BlockSetOps::MakeEmpty(compiler);
    BlockSet BLOCKSET_INIT_NOCOPY(readySet, BlockSetOps::MakeEmpty(compiler));
    assert(blockSequence == nullptr && bbSeqCount == 0);
    blockSequence            = new (compiler, CMK_LSRA) BasicBlock*[compiler->fgBBcount];
    bbNumMaxBeforeResolution = compiler->fgBBNumMax;
    blockInfo                = new (compiler, CMK_LSRA) LsraBlockInfo[bbNumMaxBeforeResolution + 1];

    assert(blockSequenceWorkList == nullptr);

    bool addedInternalBlocks = false;
    verifiedAllBBs   = false;
    hasCriticalEdges = false;
    BasicBlock* nextBlock;
    for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = nextBlock)
    {
        blockSequence[bbSeqCount] = block;
        markBlockVisited(block);
        bbSeqCount++;
        nextBlock = nullptr;

        // Initialize the blockInfo.
        // predBBNum will be set later.  0 is never used as a bbNum.
        blockInfo[block->bbNum].predBBNum = 0;
        // We check for critical edges below, but initialize to false.
        blockInfo[block->bbNum].hasCriticalInEdge  = false;
        blockInfo[block->bbNum].hasCriticalOutEdge = false;
        blockInfo[block->bbNum].weight             = block->bbWeight;

#if TRACK_LSRA_STATS
        blockInfo[block->bbNum].spillCount         = 0;
        blockInfo[block->bbNum].copyRegCount       = 0;
        blockInfo[block->bbNum].resolutionMovCount = 0;
        blockInfo[block->bbNum].splitEdgeCount     = 0;
#endif // TRACK_LSRA_STATS

        if (block->GetUniquePred(compiler) == nullptr)
        {
            for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
            {
                BasicBlock* predBlock = pred->flBlock;
                if (predBlock->NumSucc(compiler) > 1)
                {
                    blockInfo[block->bbNum].hasCriticalInEdge = true;
                    hasCriticalEdges                          = true;
                    break;
                }
                else if (predBlock->bbJumpKind == BBJ_SWITCH)
                {
                    assert(!"Switch with single successor");
                }
            }
        }

        // Determine which block to schedule next.

        // First, update the NORMAL successors of the current block, adding them to the worklist
        // according to the desired order.  We will handle the EH successors below.
        bool checkForCriticalOutEdge = (block->NumSucc(compiler) > 1);
        if (!checkForCriticalOutEdge && block->bbJumpKind == BBJ_SWITCH)
        {
            assert(!"Switch with single successor");
        }

        const unsigned numSuccs = block->NumSucc(compiler);
        for (unsigned succIndex = 0; succIndex < numSuccs; succIndex++)
        {
            BasicBlock* succ = block->GetSucc(succIndex, compiler);
            if (checkForCriticalOutEdge && succ->GetUniquePred(compiler) == nullptr)
            {
                blockInfo[block->bbNum].hasCriticalOutEdge = true;
                hasCriticalEdges                           = true;
                // We can stop checking now.
                checkForCriticalOutEdge = false;
            }

            if (isTraversalLayoutOrder() || isBlockVisited(succ))
            {
                continue;
            }

            // We've now seen a predecessor, so add it to the work list and the "readySet".
            // It will be inserted in the worklist according to the specified traversal order
            // (i.e. pred-first or random, since layout order is handled above).
            if (!BlockSetOps::IsMember(compiler, readySet, succ->bbNum))
            {
                addToBlockSequenceWorkList(readySet, succ);
                BlockSetOps::AddElemD(compiler, readySet, succ->bbNum);
            }
        }

        // For layout order, simply use bbNext
        if (isTraversalLayoutOrder())
        {
            nextBlock = block->bbNext;
            continue;
        }

        while (nextBlock == nullptr)
        {
            nextBlock = getNextCandidateFromWorkList();

            // TODO-Throughput: We would like to bypass this traversal if we know we've handled all
            // the blocks - but fgBBcount does not appear to be updated when blocks are removed.
            if (nextBlock == nullptr /* && bbSeqCount != compiler->fgBBcount*/ && !verifiedAllBBs)
            {
                // If we don't encounter all blocks by traversing the regular successor links, do a full
                // traversal of all the blocks, and add them in layout order.
                // This may include:
                //   - internal-only blocks (in the fgAddCodeList) which may not be in the flow graph
                //     (these are not even in the bbNext links).
                //   - blocks that have become unreachable due to optimizations, but that are strongly
                //     connected (these are not removed)
                //   - EH blocks

                for (Compiler::AddCodeDsc* desc = compiler->fgAddCodeList; desc != nullptr; desc = desc->acdNext)
                {
                    if (!isBlockVisited(block))
                    {
                        addToBlockSequenceWorkList(readySet, block);
                        BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
                    }
                }

                for (BasicBlock* block = compiler->fgFirstBB; block; block = block->bbNext)
                {
                    if (!isBlockVisited(block))
                    {
                        addToBlockSequenceWorkList(readySet, block);
                        BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
                    }
                }
                verifiedAllBBs = true;
            }
            else
            {
                break;
            }
        }
    }
    blockSequencingDone = true;

#ifdef DEBUG
    // Make sure that we've visited all the blocks.
    for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
    {
        assert(isBlockVisited(block));
    }

    JITDUMP("LSRA Block Sequence: ");
    int i = 1;
    for (BasicBlock *block = startBlockSequence(); block != nullptr; ++i, block = moveToNextBlock())
    {
        JITDUMP("BB%02u", block->bbNum);

        if (block->isMaxBBWeight())
        {
            JITDUMP("(MAX) ");
        }
        else
        {
            JITDUMP("(%6s) ", refCntWtd2str(block->getBBWeight(compiler)));
        }

        if (i % 10 == 0)
        {
            JITDUMP("\n                     ");
        }
    }
    JITDUMP("\n\n");
#endif // DEBUG
}
//------------------------------------------------------------------------
// compareBlocksForSequencing: Compare two basic blocks for sequencing order.
//
// Arguments:
//    block1            - the first block for comparison
//    block2            - the second block for comparison
//    useBlockWeights   - whether to use block weights for comparison
//
// Return Value:
//    -1 if block1 is preferred.
//     0 if the blocks are equivalent.
//     1 if block2 is preferred.
//
// Notes:
//    See addToBlockSequenceWorkList.
int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights)
{
    if (useBlockWeights)
    {
        unsigned weight1 = block1->getBBWeight(compiler);
        unsigned weight2 = block2->getBBWeight(compiler);

        if (weight1 > weight2)
        {
            return -1;
        }
        else if (weight1 < weight2)
        {
            return 1;
        }
    }

    // If weights are the same prefer LOWER bbnum
    if (block1->bbNum < block2->bbNum)
    {
        return -1;
    }
    else if (block1->bbNum == block2->bbNum)
    {
        return 0;
    }
    else
    {
        return 1;
    }
}
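
// For example (made-up weights): comparing BB03 (weight 8) against BB07 (weight 2)
// with useBlockWeights == true returns -1, so the hotter BB03 is sequenced first.
// With equal weights, the lower block number (BB03) wins, keeping the order stable
// and deterministic.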
//------------------------------------------------------------------------
// addToBlockSequenceWorkList: Add a BasicBlock to the work list for sequencing.
//
// Arguments:
//    sequencedBlockSet - the set of blocks that are already sequenced
//    block             - the new block to be added
//
// Return Value:
//    None.
//
// Notes:
//    The first block in the list will be the next one to be sequenced, as soon
//    as we encounter a block whose successors have all been sequenced, in pred-first
//    order, or the very next block if we are traversing in random order (once implemented).
//    This method uses a comparison method to determine the order in which to place
//    the blocks in the list.  This method queries whether all predecessors of the
//    block are sequenced at the time it is added to the list and if so uses block weights
//    for inserting the block.  A block is never inserted ahead of its predecessors.
//    A block at the time of insertion may not have all its predecessors sequenced, in
//    which case it will be sequenced based on its block number.  Once a block is inserted,
//    its priority\order will not be changed later once its remaining predecessors are
//    sequenced.  This means that the work list may not be sorted entirely based on
//    block weights alone.
//
//    Note also that, when random traversal order is implemented, this method
//    should insert the blocks into the list in random order, so that we can always
//    simply select the first block in the list.
void LinearScan::addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block)
{
    // The block that is being added is not already sequenced
    assert(!BlockSetOps::IsMember(compiler, sequencedBlockSet, block->bbNum));

    // Get predSet of block
    BlockSet BLOCKSET_INIT_NOCOPY(predSet, BlockSetOps::MakeEmpty(compiler));
    flowList* pred;
    for (pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
    {
        BlockSetOps::AddElemD(compiler, predSet, pred->flBlock->bbNum);
    }

    // If either a rarely run block or all its preds are already sequenced, use block's weight to sequence
    bool useBlockWeight = block->isRunRarely() || BlockSetOps::IsSubset(compiler, sequencedBlockSet, predSet);

    BasicBlockList* prevNode = nullptr;
    BasicBlockList* nextNode = blockSequenceWorkList;

    while (nextNode != nullptr)
    {
        int seqResult;

        if (nextNode->block->isRunRarely())
        {
            // If the block that is yet to be sequenced is a rarely run block, always use block weights for sequencing
            seqResult = compareBlocksForSequencing(nextNode->block, block, true);
        }
        else if (BlockSetOps::IsMember(compiler, predSet, nextNode->block->bbNum))
        {
            // always prefer unsequenced pred blocks
            seqResult = -1;
        }
        else
        {
            seqResult = compareBlocksForSequencing(nextNode->block, block, useBlockWeight);
        }

        if (seqResult > 0)
        {
            break;
        }

        prevNode = nextNode;
        nextNode = nextNode->next;
    }

    BasicBlockList* newListNode = new (compiler, CMK_LSRA) BasicBlockList(block, nextNode);
    if (prevNode == nullptr)
    {
        blockSequenceWorkList = newListNode;
    }
    else
    {
        prevNode->next = newListNode;
    }
}
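
// Worked insertion example (hypothetical weights): given a work list [BB05 (weight 6),
// BB09 (weight 3)], adding BB07 (weight 4, all preds sequenced) compares after BB05
// but before BB09 under the weight ordering, so the list becomes [BB05, BB07, BB09].
// If BB05 were instead an unsequenced predecessor of BB07, BB07 could never be placed
// ahead of it.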
void LinearScan::removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode)
{
    if (listNode == blockSequenceWorkList)
    {
        assert(prevNode == nullptr);
        blockSequenceWorkList = listNode->next;
    }
    else
    {
        assert(prevNode != nullptr && prevNode->next == listNode);
        prevNode->next = listNode->next;
    }
    // TODO-Cleanup: consider merging Compiler::BlockListNode and BasicBlockList
    // compiler->FreeBlockListNode(listNode);
}
// Initialize the block order for allocation (called each time a new traversal begins).
BasicBlock* LinearScan::startBlockSequence()
{
    if (!blockSequencingDone)
    {
        setBlockSequence();
    }
    BasicBlock* curBB = compiler->fgFirstBB;
    curBBSeqNum       = 0;
    curBBNum          = curBB->bbNum;
    clearVisitedBlocks();
    assert(blockSequence[0] == compiler->fgFirstBB);
    markBlockVisited(curBB);
    return curBB;
}
//------------------------------------------------------------------------
// moveToNextBlock: Move to the next block in order for allocation or resolution.
//
// Arguments:
//    None
//
// Return Value:
//    The next block.
//
// Notes:
//    This method is used when the next block is actually going to be handled.
//    It changes curBBNum.
BasicBlock* LinearScan::moveToNextBlock()
{
    BasicBlock* nextBlock = getNextBlock();
    curBBSeqNum++;
    if (nextBlock != nullptr)
    {
        curBBNum = nextBlock->bbNum;
    }
    return nextBlock;
}
//------------------------------------------------------------------------
// getNextBlock: Get the next block in order for allocation or resolution.
//
// Arguments:
//    None
//
// Return Value:
//    The next block.
//
// Notes:
//    This method does not actually change the current block - it is used simply
//    to determine which block will be next.
BasicBlock* LinearScan::getNextBlock()
{
    assert(blockSequencingDone);
    unsigned int nextBBSeqNum = curBBSeqNum + 1;
    if (nextBBSeqNum < bbSeqCount)
    {
        return blockSequence[nextBBSeqNum];
    }
    return nullptr;
}
//------------------------------------------------------------------------
// doLinearScan: The main method for register allocation.
//
// Arguments:
//    None
//
// Return Value:
//    None.
//
// Assumptions:
//    Lowering must have set the NodeInfo (gtLsraInfo) on each node to communicate
//    the register requirements.
void LinearScan::doLinearScan()
{
#ifdef DEBUG
    if (VERBOSE)
    {
        printf("*************** In doLinearScan\n");
        printf("Trees before linear scan register allocator (LSRA)\n");
        compiler->fgDispBasicBlocks(true);
    }
#endif // DEBUG

    splitBBNumToTargetBBNumMap = nullptr;

    // This is complicated by the fact that physical registers have refs associated
    // with locations where they are killed (e.g. calls), but we don't want to
    // count these as being touched.

    compiler->codeGen->regSet.rsClearRegsModified();

    buildIntervals();
    DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_REFPOS));
    compiler->EndPhase(PHASE_LINEAR_SCAN_BUILD);

    DBEXEC(VERBOSE, lsraDumpIntervals("after buildIntervals"));

    BlockSetOps::ClearD(compiler, bbVisitedSet);
    allocateRegisters();
    compiler->EndPhase(PHASE_LINEAR_SCAN_ALLOC);
    resolveRegisters();
    compiler->EndPhase(PHASE_LINEAR_SCAN_RESOLVE);

#if TRACK_LSRA_STATS
    if ((JitConfig.DisplayLsraStats() != 0)
#ifdef DEBUG
        || VERBOSE
#endif
        )
    {
        dumpLsraStats(jitstdout);
    }
#endif // TRACK_LSRA_STATS

    DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_POST));

    compiler->compLSRADone = true;
}
//------------------------------------------------------------------------
// recordVarLocationsAtStartOfBB: Update live-in LclVarDscs with the appropriate
//                                register location at the start of a block, during
//                                codegen.
//
// Arguments:
//    bb - the block for which code is about to be generated.
//
// Return Value:
//    None.
//
// Assumptions:
//    CodeGen will take care of updating the reg masks and the current var liveness,
//    after calling this method.
//    This is because we need to kill off the dead registers before setting the newly live ones.
void LinearScan::recordVarLocationsAtStartOfBB(BasicBlock* bb)
{
    JITDUMP("Recording Var Locations at start of BB%02u\n", bb->bbNum);
    VarToRegMap map   = getInVarToRegMap(bb->bbNum);
    unsigned    count = 0;

    VARSET_ITER_INIT(compiler, iter, bb->bbLiveIn, varIndex);
    while (iter.NextElem(compiler, &varIndex))
    {
        unsigned   varNum = compiler->lvaTrackedToVarNum[varIndex];
        LclVarDsc* varDsc = &(compiler->lvaTable[varNum]);
        regNumber  regNum = getVarReg(map, varNum);

        regNumber oldRegNum = varDsc->lvRegNum;
        regNumber newRegNum = regNum;

        if (oldRegNum != newRegNum)
        {
            JITDUMP("  V%02u(%s->%s)", varNum, compiler->compRegVarName(oldRegNum),
                    compiler->compRegVarName(newRegNum));
            varDsc->lvRegNum = newRegNum;
            count++;
        }
        else if (newRegNum != REG_STK)
        {
            JITDUMP("  V%02u(%s)", varNum, compiler->compRegVarName(newRegNum));
            count++;
        }
    }

    if (count == 0)
    {
        JITDUMP("  <none>\n");
    }

    JITDUMP("\n");
}
void Interval::setLocalNumber(unsigned lclNum, LinearScan* linScan)
{
    linScan->localVarIntervals[lclNum] = this;

    assert(linScan->getIntervalForLocalVar(lclNum) == this);
    this->isLocalVar = true;
    this->varNum     = lclNum;
}
// Identify the candidates which we are not going to enregister due to
// being used in EH in a way we don't want to deal with.
// This logic is cloned from fgInterBlockLocalVarLiveness.
void LinearScan::identifyCandidatesExceptionDataflow()
{
    VARSET_TP VARSET_INIT_NOCOPY(exceptVars, VarSetOps::MakeEmpty(compiler));
    VARSET_TP VARSET_INIT_NOCOPY(filterVars, VarSetOps::MakeEmpty(compiler));
    VARSET_TP VARSET_INIT_NOCOPY(finallyVars, VarSetOps::MakeEmpty(compiler));
    BasicBlock* block;

    foreach_block(compiler, block)
    {
        if (block->bbCatchTyp != BBCT_NONE)
        {
            // live on entry to handler
            VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
        }

        if (block->bbJumpKind == BBJ_EHFILTERRET)
        {
            // live on exit from filter
            VarSetOps::UnionD(compiler, filterVars, block->bbLiveOut);
        }
        else if (block->bbJumpKind == BBJ_EHFINALLYRET)
        {
            // live on exit from finally
            VarSetOps::UnionD(compiler, finallyVars, block->bbLiveOut);
        }
#if FEATURE_EH_FUNCLETS
        // Funclets are called and returned from, as such we can only count on the frame
        // pointer being restored, and thus everything live in or live out must be on the
        // stack
        if (block->bbFlags & BBF_FUNCLET_BEG)
        {
            VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
        }
        if ((block->bbJumpKind == BBJ_EHFINALLYRET) || (block->bbJumpKind == BBJ_EHFILTERRET) ||
            (block->bbJumpKind == BBJ_EHCATCHRET))
        {
            VarSetOps::UnionD(compiler, exceptVars, block->bbLiveOut);
        }
#endif // FEATURE_EH_FUNCLETS
    }

    // slam them all together (there was really no need to use more than 2 bitvectors here)
    VarSetOps::UnionD(compiler, exceptVars, filterVars);
    VarSetOps::UnionD(compiler, exceptVars, finallyVars);

    /* Mark all pointer variables live on exit from a 'finally'
       block as either volatile for non-GC ref types or as
       'explicitly initialized' (volatile and must-init) for GC-ref types */

    VARSET_ITER_INIT(compiler, iter, exceptVars, varIndex);
    while (iter.NextElem(compiler, &varIndex))
    {
        unsigned   varNum = compiler->lvaTrackedToVarNum[varIndex];
        LclVarDsc* varDsc = compiler->lvaTable + varNum;

        compiler->lvaSetVarDoNotEnregister(varNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));

        if (varTypeIsGC(varDsc))
        {
            if (VarSetOps::IsMember(compiler, finallyVars, varIndex) && !varDsc->lvIsParam)
            {
                varDsc->lvMustInit = true;
            }
        }
    }
}
1896 bool LinearScan::isRegCandidate(LclVarDsc* varDsc)
1898 // Check to see if opt settings permit register variables
1899 if ((compiler->opts.compFlags & CLFLG_REGVAR) == 0)
1904 // If we have JMP, reg args must be put on the stack
1906 if (compiler->compJmpOpUsed && varDsc->lvIsRegArg)
1911 if (!varDsc->lvTracked)
1916 // Don't allocate registers for dependently promoted struct fields
1917 if (compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc))
1924 // Identify locals & compiler temps that are register candidates
1925 // TODO-Cleanup: This was cloned from Compiler::lvaSortByRefCount() in lclvars.cpp in order
1926 // to avoid perturbation, but should be merged.
1928 void LinearScan::identifyCandidates()
1931 // Initialize the sets of lclVars that are used to determine whether, and for which lclVars,
1932 // we need to perform resolution across basic blocks.
1933 // Note that we can't do this in the constructor because the number of tracked lclVars may
1934 // change between the constructor and the actual allocation.
1935 VarSetOps::AssignNoCopy(compiler, resolutionCandidateVars, VarSetOps::MakeEmpty(compiler));
1936 VarSetOps::AssignNoCopy(compiler, splitOrSpilledVars, VarSetOps::MakeEmpty(compiler));
1938 if (compiler->lvaCount == 0)
1943 if (compiler->compHndBBtabCount > 0)
1945 identifyCandidatesExceptionDataflow();
1948 // initialize mapping from local to interval
1949 localVarIntervals = new (compiler, CMK_LSRA) Interval*[compiler->lvaCount];
1954 // While we build intervals for the candidate lclVars, we will determine the floating point
1955 // lclVars, if any, to consider for callee-save register preferencing.
1956 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
1957 // and those that meet the second.
1958 // The first threshold is used for methods that are heuristically deemed either to have light
1959 // fp usage, or other factors that encourage conservative use of callee-save registers, such
1960 // as multiple exits (where there might be an early exit that would be excessively penalized by
1961 // lots of prolog/epilog saves & restores).
1962 // The second threshold is used where there are factors deemed to make it more likely that
1963 // fp callee save registers will be needed, such as loops or many fp vars.
1964 // We keep two sets of vars, since we collect some of the information to determine which set to
1965 // use as we iterate over the vars.
1966 // When we are generating AVX code on non-Unix (FEATURE_PARTIAL_SIMD_CALLEE_SAVE), we maintain an
1967 // additional set of LargeVectorType vars, and there is a separate threshold defined for those.
1968 // It is assumed that if we encounter these, we should consider this a "high use" scenario,
1969 // so we don't maintain two sets of these vars.
1970 // This is defined as thresholdLargeVectorRefCntWtd, as we are likely to use the same mechanism
1971 // for vectors on Arm64, though the actual value may differ.
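// A small worked example of the thresholds computed below, under the assumption
// (not guaranteed here) that BB_UNITY_WEIGHT is 100:
#if 0 // illustrative sketch only, not part of the allocator
unsigned int demoThresholdFP = 4 * 100; // thresholdFPRefCntWtd: 400 weighted refs
unsigned int demoMaybeFP     = 2 * 100; // maybeFPRefCntWtd: 200 weighted refs
// A var with lvRefCntWtd == 250 lands only in fpMaybeCandidateVars; it joins
// fpCalleeSaveCandidateVars only if the more aggressive threshold is enabled.
#endif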
1973 VarSetOps::AssignNoCopy(compiler, fpCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
1974 VARSET_TP VARSET_INIT_NOCOPY(fpMaybeCandidateVars, VarSetOps::MakeEmpty(compiler));
1975 unsigned int floatVarCount = 0;
1976 unsigned int thresholdFPRefCntWtd = 4 * BB_UNITY_WEIGHT;
1977 unsigned int maybeFPRefCntWtd = 2 * BB_UNITY_WEIGHT;
1978 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
1979 VarSetOps::AssignNoCopy(compiler, largeVectorVars, VarSetOps::MakeEmpty(compiler));
1980 VarSetOps::AssignNoCopy(compiler, largeVectorCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
1981 unsigned int largeVectorVarCount = 0;
1982 unsigned int thresholdLargeVectorRefCntWtd = 4 * BB_UNITY_WEIGHT;
1983 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
1985 unsigned refCntStk = 0;
1986 unsigned refCntReg = 0;
1987 unsigned refCntWtdReg = 0;
1988 unsigned refCntStkParam = 0; // sum of ref counts for all stack based parameters
1989 unsigned refCntWtdStkDbl = 0; // sum of wtd ref counts for stack based doubles
1990 doDoubleAlign = false;
1991 bool checkDoubleAlign = true;
1992 if (compiler->codeGen->isFramePointerRequired() || compiler->opts.MinOpts())
1994 checkDoubleAlign = false;
1998 switch (compiler->getCanDoubleAlign())
2000 case MUST_DOUBLE_ALIGN:
2001 doDoubleAlign = true;
2002 checkDoubleAlign = false;
2004 case CAN_DOUBLE_ALIGN:
2006 case CANT_DOUBLE_ALIGN:
2007 doDoubleAlign = false;
2008 checkDoubleAlign = false;
2014 #endif // DOUBLE_ALIGN
2016 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
2018 // Assign intervals to all the variables - this makes it easier to map them back.
2020 var_types intervalType = (var_types)varDsc->lvType;
2021 Interval* newInt = newInterval(intervalType);
2023 newInt->setLocalNumber(lclNum, this);
2026 if (checkDoubleAlign)
2028 if (varDsc->lvIsParam && !varDsc->lvIsRegArg)
2030 refCntStkParam += varDsc->lvRefCnt;
2032 else if (!isRegCandidate(varDsc) || varDsc->lvDoNotEnregister)
2034 refCntStk += varDsc->lvRefCnt;
2035 if ((varDsc->lvType == TYP_DOUBLE) ||
2036 ((varTypeIsStruct(varDsc) && varDsc->lvStructDoubleAlign &&
2037 (compiler->lvaGetPromotionType(varDsc) != Compiler::PROMOTION_TYPE_INDEPENDENT))))
2039 refCntWtdStkDbl += varDsc->lvRefCntWtd;
2044 refCntReg += varDsc->lvRefCnt;
2045 refCntWtdReg += varDsc->lvRefCntWtd;
2048 #endif // DOUBLE_ALIGN
2050 if (varDsc->lvIsStructField)
2052 newInt->isStructField = true;
2055 // Initialize all variables to REG_STK
2056 varDsc->lvRegNum = REG_STK;
2057 #ifndef _TARGET_64BIT_
2058 varDsc->lvOtherReg = REG_STK;
2059 #endif // !_TARGET_64BIT_
2061 #if !defined(_TARGET_64BIT_)
2062 if (intervalType == TYP_LONG)
2064 // Long variables should not be register candidates.
2065 // Lowering will have split any candidate lclVars into lo/hi vars.
2066 varDsc->lvLRACandidate = 0;
2069 #endif // !defined(_TARGET_64BIT_)
2071 /* Track all locals that can be enregistered */
2073 varDsc->lvLRACandidate = 1;
2075 if (!isRegCandidate(varDsc))
2077 varDsc->lvLRACandidate = 0;
2081 // Start with lvRegister as false - set it true only if the variable gets
2082 // the same register assignment throughout
2083 varDsc->lvRegister = false;
2085 /* If the ref count is zero */
2086 if (varDsc->lvRefCnt == 0)
2088 /* Zero ref count, make this untracked */
2089 varDsc->lvRefCntWtd = 0;
2090 varDsc->lvLRACandidate = 0;
2093 // Variables that are address-exposed are never enregistered, or tracked.
2094 // A struct may be promoted, and a struct that fits in a register may be fully enregistered.
2095 // Pinned variables may not be tracked (a condition of the GCInfo representation)
2096 // or enregistered, on x86 -- it is believed that we can enregister pinned (more properly, "pinning")
2097 // references when using the general GC encoding.
2099 if (varDsc->lvAddrExposed || !varTypeIsEnregisterableStruct(varDsc))
2101 varDsc->lvLRACandidate = 0;
2103 Compiler::DoNotEnregisterReason dner = Compiler::DNER_AddrExposed;
2104 if (!varDsc->lvAddrExposed)
2106 dner = Compiler::DNER_IsStruct;
2109 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(dner));
2111 else if (varDsc->lvPinned)
2113 varDsc->lvTracked = 0;
2114 #ifdef JIT32_GCENCODER
2115 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_PinningRef));
2116 #endif // JIT32_GCENCODER
2119 // Are we not optimizing and we have exception handlers?
2120 // If so, mark all args and locals as volatile, so that they
2121 // won't ever get enregistered.
2123 if (compiler->opts.MinOpts() && compiler->compHndBBtabCount > 0)
2125 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
2126 varDsc->lvLRACandidate = 0;
2130 if (varDsc->lvDoNotEnregister)
2132 varDsc->lvLRACandidate = 0;
2136 var_types type = genActualType(varDsc->TypeGet());
2140 #if CPU_HAS_FP_SUPPORT
2143 if (compiler->opts.compDbgCode)
2145 varDsc->lvLRACandidate = 0;
2148 #endif // CPU_HAS_FP_SUPPORT
2160 if (varDsc->lvPromoted)
2162 varDsc->lvLRACandidate = 0;
2165 // TODO-1stClassStructs: Move TYP_SIMD8 up with the other SIMD types, after handling the param issue
2166 // (passing & returning as TYP_LONG).
2168 #endif // FEATURE_SIMD
2172 varDsc->lvLRACandidate = 0;
2178 noway_assert(!"lvType not set correctly");
2179 varDsc->lvType = TYP_INT;
2184 varDsc->lvLRACandidate = 0;
2187 // we will set this later when we have determined liveness
2188 if (varDsc->lvLRACandidate)
2190 varDsc->lvMustInit = false;
2193 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
2194 // and those that meet the second (see the definitions of thresholdFPRefCntWtd and maybeFPRefCntWtd above).
2196 CLANG_FORMAT_COMMENT_ANCHOR;
2198 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2199 // Additionally, when we are generating AVX on non-UNIX amd64, we keep a separate set of the LargeVectorType vars.
2201 if (varDsc->lvType == LargeVectorType)
2203 largeVectorVarCount++;
2204 VarSetOps::AddElemD(compiler, largeVectorVars, varDsc->lvVarIndex);
2205 unsigned refCntWtd = varDsc->lvRefCntWtd;
2206 if (refCntWtd >= thresholdLargeVectorRefCntWtd)
2208 VarSetOps::AddElemD(compiler, largeVectorCalleeSaveCandidateVars, varDsc->lvVarIndex);
2212 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2213 if (regType(newInt->registerType) == FloatRegisterType)
2216 unsigned refCntWtd = varDsc->lvRefCntWtd;
2217 if (varDsc->lvIsRegArg)
2219 // Don't count the initial reference for register params. In those cases,
2220 // using a callee-save causes an extra copy.
2221 refCntWtd -= BB_UNITY_WEIGHT;
2223 if (refCntWtd >= thresholdFPRefCntWtd)
2225 VarSetOps::AddElemD(compiler, fpCalleeSaveCandidateVars, varDsc->lvVarIndex);
2227 else if (refCntWtd >= maybeFPRefCntWtd)
2229 VarSetOps::AddElemD(compiler, fpMaybeCandidateVars, varDsc->lvVarIndex);
2235 if (checkDoubleAlign)
2237 // TODO-CQ: Fine-tune this:
2238 // In the legacy reg predictor, this runs after allocation, and then demotes any lclVars
2239 // allocated to the frame pointer, which is probably the wrong order.
2240 // However, because it runs after allocation, it can determine the impact of demoting
2241 // the lclVars allocated to the frame pointer.
2242 // => Here, estimate of the EBP refCnt and weighted refCnt is a wild guess.
2244 unsigned refCntEBP = refCntReg / 8;
2245 unsigned refCntWtdEBP = refCntWtdReg / 8;
2248 compiler->shouldDoubleAlign(refCntStk, refCntEBP, refCntWtdEBP, refCntStkParam, refCntWtdStkDbl);
2250 #endif // DOUBLE_ALIGN
2252 // The factors we consider to determine which set of fp vars to use as candidates for callee save
2253 // registers currently include the number of fp vars, whether there are loops, and whether there are
2254 // multiple exits. These have been selected somewhat empirically, but there is probably room for improvement.
2256 CLANG_FORMAT_COMMENT_ANCHOR;
2261 printf("\nFP callee save candidate vars: ");
2262 if (!VarSetOps::IsEmpty(compiler, fpCalleeSaveCandidateVars))
2264 dumpConvertedVarSet(compiler, fpCalleeSaveCandidateVars);
2274 JITDUMP("floatVarCount = %d; hasLoops = %d, singleExit = %d\n", floatVarCount, compiler->fgHasLoops,
2275 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr));
2277 // Determine whether to use the 2nd, more aggressive, threshold for fp callee saves.
2278 if (floatVarCount > 6 && compiler->fgHasLoops &&
2279 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr))
2284 printf("Adding additional fp callee save candidates: \n");
2285 if (!VarSetOps::IsEmpty(compiler, fpMaybeCandidateVars))
2287 dumpConvertedVarSet(compiler, fpMaybeCandidateVars);
2296 VarSetOps::UnionD(compiler, fpCalleeSaveCandidateVars, fpMaybeCandidateVars);
2303 // Frame layout is only pre-computed for ARM
2304 printf("\nlvaTable after IdentifyCandidates\n");
2305 compiler->lvaTableDump();
2308 #endif // _TARGET_ARM_
2311 // TODO-Throughput: This mapping can surely be more efficiently done
2312 void LinearScan::initVarRegMaps()
2314 assert(compiler->lvaTrackedFixed); // We should have already set this to prevent us from adding any new tracked
2317 // The compiler memory allocator requires that the allocation be an
2318 // even multiple of int-sized objects
2319 unsigned int varCount = compiler->lvaTrackedCount;
2320 regMapCount = (unsigned int)roundUp(varCount, sizeof(int));
2322 // Not sure why blocks aren't numbered from zero, but they don't appear to be.
2323 // So, if we want to index by bbNum we have to know the maximum value.
2324 unsigned int bbCount = compiler->fgBBNumMax + 1;
2326 inVarToRegMaps = new (compiler, CMK_LSRA) regNumber*[bbCount];
2327 outVarToRegMaps = new (compiler, CMK_LSRA) regNumber*[bbCount];
2331 // This VarToRegMap is used during the resolution of critical edges.
2332 sharedCriticalVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2334 for (unsigned int i = 0; i < bbCount; i++)
2336 regNumber* inVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2337 regNumber* outVarToRegMap = new (compiler, CMK_LSRA) regNumber[regMapCount];
2339 for (unsigned int j = 0; j < regMapCount; j++)
2341 inVarToRegMap[j] = REG_STK;
2342 outVarToRegMap[j] = REG_STK;
2344 inVarToRegMaps[i] = inVarToRegMap;
2345 outVarToRegMaps[i] = outVarToRegMap;
2350 sharedCriticalVarToRegMap = nullptr;
2351 for (unsigned int i = 0; i < bbCount; i++)
2353 inVarToRegMaps[i] = nullptr;
2354 outVarToRegMaps[i] = nullptr;
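// The allocation pattern above, as a self-contained sketch (std::vector standing in
// for the compiler allocator, DemoReg for regNumber; names are illustrative only):
#if 0 // illustrative sketch only, not part of the allocator
#include <vector>
typedef unsigned char DemoReg;
const DemoReg DEMO_REG_STK = 0xFF; // stand-in for REG_STK
// One in-map and one out-map per block, each indexed by tracked var index,
// with every variable initially on the stack:
std::vector<std::vector<DemoReg>> demoInMaps(bbCount, std::vector<DemoReg>(regMapCount, DEMO_REG_STK));
std::vector<std::vector<DemoReg>> demoOutMaps(bbCount, std::vector<DemoReg>(regMapCount, DEMO_REG_STK));
#endif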
2359 void LinearScan::setInVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2361 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2362 inVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = reg;
2365 void LinearScan::setOutVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2367 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2368 outVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = reg;
2371 LinearScan::SplitEdgeInfo LinearScan::getSplitEdgeInfo(unsigned int bbNum)
2373 SplitEdgeInfo splitEdgeInfo;
2374 assert(bbNum <= compiler->fgBBNumMax);
2375 assert(bbNum > bbNumMaxBeforeResolution);
2376 assert(splitBBNumToTargetBBNumMap != nullptr);
2377 splitBBNumToTargetBBNumMap->Lookup(bbNum, &splitEdgeInfo);
2378 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
2379 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
2380 return splitEdgeInfo;
2383 VarToRegMap LinearScan::getInVarToRegMap(unsigned int bbNum)
2385 assert(bbNum <= compiler->fgBBNumMax);
2386 // For the blocks inserted to split critical edges, the inVarToRegMap is
2387 // equal to the outVarToRegMap at the "from" block.
2388 if (bbNum > bbNumMaxBeforeResolution)
2390 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2391 unsigned fromBBNum = splitEdgeInfo.fromBBNum;
2392 if (fromBBNum == 0)
2394 assert(splitEdgeInfo.toBBNum != 0);
2395 return inVarToRegMaps[splitEdgeInfo.toBBNum];
2399 return outVarToRegMaps[fromBBNum];
2403 return inVarToRegMaps[bbNum];
2406 VarToRegMap LinearScan::getOutVarToRegMap(unsigned int bbNum)
2408 assert(bbNum <= compiler->fgBBNumMax);
2409 // For the blocks inserted to split critical edges, the outVarToRegMap is
2410 // equal to the inVarToRegMap at the target.
2411 if (bbNum > bbNumMaxBeforeResolution)
2413 // If this is an empty block, its in and out maps are both the same.
2414 // We identify this case by setting fromBBNum or toBBNum to 0, and using only the other.
2415 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2416 unsigned toBBNum = splitEdgeInfo.toBBNum;
2417 if (toBBNum == 0)
2419 assert(splitEdgeInfo.fromBBNum != 0);
2420 return outVarToRegMaps[splitEdgeInfo.fromBBNum];
2424 return inVarToRegMaps[toBBNum];
2427 return outVarToRegMaps[bbNum];
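// In sketch form: for a resolution block S inserted on the critical edge B1 -> B2
// (hypothetical block numbers), the two lookups above mirror each other, so S
// needs no maps of its own:
#if 0 // illustrative sketch only, not part of the allocator
VarToRegMap sIn  = getInVarToRegMap(splitBlockNum);  // == outVarToRegMaps[B1's bbNum]
VarToRegMap sOut = getOutVarToRegMap(splitBlockNum); // == inVarToRegMaps[B2's bbNum]
#endif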
2430 regNumber LinearScan::getVarReg(VarToRegMap bbVarToRegMap, unsigned int varNum)
2432 assert(compiler->lvaTable[varNum].lvTracked);
2433 return bbVarToRegMap[compiler->lvaTable[varNum].lvVarIndex];
2436 // Initialize the incoming VarToRegMap to the given map values (generally a predecessor of
2438 VarToRegMap LinearScan::setInVarToRegMap(unsigned int bbNum, VarToRegMap srcVarToRegMap)
2440 VarToRegMap inVarToRegMap = inVarToRegMaps[bbNum];
2441 memcpy(inVarToRegMap, srcVarToRegMap, (regMapCount * sizeof(regNumber)));
2442 return inVarToRegMap;
2445 // find the last node in the tree in execution order
2446 // TODO-Throughput: this is inefficient!
2447 GenTree* lastNodeInTree(GenTree* tree)
2449 // There is no gtprev on the top level tree node so
2450 // apparently the way to walk a tree backwards is to walk
2451 // it forward, find the last node, and walk back from there.
2453 GenTree* last = nullptr;
2454 if (tree->OperGet() == GT_STMT)
2456 GenTree* statement = tree;
2458 foreach_treenode_execution_order(tree, statement)
2469 tree = tree->gtNext;
2476 // Given a local var tree node, return the RefType (RefTypeDef or RefTypeUse) for its reference.
2476 RefType refTypeForLocalRefNode(GenTree* node)
2478 assert(node->IsLocal());
2480 // We don't support updates
2481 assert((node->gtFlags & GTF_VAR_USEASG) == 0);
2483 if (node->gtFlags & GTF_VAR_DEF)
2493 // This function sets RefPosition last uses by walking the RefPositions, instead of walking the
2494 // tree nodes in execution order (as was done in a previous version).
2495 // This is because the execution order isn't strictly correct, specifically for
2496 // references to local variables that occur in arg lists.
2498 // TODO-Throughput: This function should eventually be eliminated, as we should be able to rely on last uses
2499 // being set by dataflow analysis. It is necessary to do it this way only because the execution
2500 // order wasn't strictly correct.
2503 void LinearScan::checkLastUses(BasicBlock* block)
2507 JITDUMP("\n\nCHECKING LAST USES for block %u, liveout=", block->bbNum);
2508 dumpConvertedVarSet(compiler, block->bbLiveOut);
2509 JITDUMP("\n==============================\n");
2512 unsigned keepAliveVarNum = BAD_VAR_NUM;
2513 if (compiler->lvaKeepAliveAndReportThis())
2515 keepAliveVarNum = compiler->info.compThisArg;
2516 assert(compiler->info.compIsStatic == false);
2519 // find which uses are lastUses
2521 // Work backwards starting with live out.
2522 // 'temp' is updated to include any exposed use (including those in this
2523 // block that we've already seen). When we encounter a use, if it's
2524 // not in that set, then it's a last use.
2526 VARSET_TP VARSET_INIT(compiler, temp, block->bbLiveOut);
2528 bool foundDiff = false;
2529 auto currentRefPosition = refPositions.rbegin();
2530 while (currentRefPosition->refType != RefTypeBB)
2532 // We should never see ParamDefs or ZeroInits within a basic block.
2533 assert(currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit);
2534 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isLocalVar)
2536 unsigned varNum = currentRefPosition->getInterval()->varNum;
2537 unsigned varIndex = currentRefPosition->getInterval()->getVarIndex(compiler);
2539 LsraLocation loc = currentRefPosition->nodeLocation;
2541 // We should always have a tree node for a localVar, except for the "special" RefPositions.
2542 GenTreePtr tree = currentRefPosition->treeNode;
2543 assert(tree != nullptr || currentRefPosition->refType == RefTypeExpUse ||
2544 currentRefPosition->refType == RefTypeDummyDef);
2546 if (!VarSetOps::IsMember(compiler, temp, varIndex) && varNum != keepAliveVarNum)
2548 // There was no exposed use, so this is a "last use" (and we mark it thus even if it's a def)
2550 if (extendLifetimes())
2552 // NOTE: this is a bit of a hack. When extending lifetimes, the "last use" bit will be clear.
2553 // This bit, however, would normally be used during resolveLocalRef to set the value of
2554 // GTF_VAR_DEATH on the node for a ref position. If this bit is not set correctly even when
2555 // extending lifetimes, the code generator will assert as it expects to have accurate last
2556 // use information. To avoid these asserts, set the GTF_VAR_DEATH bit here.
2557 if (tree != nullptr)
2559 tree->gtFlags |= GTF_VAR_DEATH;
2562 else if (!currentRefPosition->lastUse)
2564 JITDUMP("missing expected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2567 VarSetOps::AddElemD(compiler, temp, varIndex);
2569 else if (currentRefPosition->lastUse)
2571 JITDUMP("unexpected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2574 else if (extendLifetimes() && tree != nullptr)
2576 // NOTE: see the comment above re: the extendLifetimes hack.
2577 tree->gtFlags &= ~GTF_VAR_DEATH;
2580 if (currentRefPosition->refType == RefTypeDef || currentRefPosition->refType == RefTypeDummyDef)
2582 VarSetOps::RemoveElemD(compiler, temp, varIndex);
2586 assert(currentRefPosition != refPositions.rend());
2587 ++currentRefPosition;
2590 VARSET_TP VARSET_INIT(compiler, temp2, block->bbLiveIn);
2591 VarSetOps::DiffD(compiler, temp2, temp);
2592 VarSetOps::DiffD(compiler, temp, block->bbLiveIn);
2595 VARSET_ITER_INIT(compiler, iter, temp, varIndex);
2596 while (iter.NextElem(compiler, &varIndex))
2598 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2599 if (compiler->lvaTable[varNum].lvLRACandidate)
2601 JITDUMP("BB%02u: V%02u is computed live, but not in LiveIn set.\n", block->bbNum, varNum);
2608 VARSET_ITER_INIT(compiler, iter, temp2, varIndex);
2609 while (iter.NextElem(compiler, &varIndex))
2611 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2612 if (compiler->lvaTable[varNum].lvLRACandidate)
2614 JITDUMP("BB%02u: V%02u is in LiveIn set, but not computed live.\n", block->bbNum, varNum);
2624 void LinearScan::addRefsForPhysRegMask(regMaskTP mask, LsraLocation currentLoc, RefType refType, bool isLastUse)
2626 for (regNumber reg = REG_FIRST; mask; reg = REG_NEXT(reg), mask >>= 1)
2630 // This assumes that these are all "special" RefTypes that
2631 // don't need to be recorded on the tree (hence treeNode is nullptr)
2632 RefPosition* pos = newRefPosition(reg, currentLoc, refType, nullptr,
2633 genRegMask(reg)); // This MUST occupy the physical register (obviously)
2637 pos->lastUse = true;
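// The loop above visits every register in 'mask' by shifting the mask right once
// per register number; the same pattern with plain integers, for illustration:
#if 0 // illustrative sketch only, not part of the allocator
void demoForEachReg(unsigned mask)
{
    for (unsigned reg = 0; mask != 0; reg++, mask >>= 1)
    {
        if ((mask & 1) != 0)
        {
            // 'reg' is a register whose bit was set in the original mask
        }
    }
}
#endif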
2643 //------------------------------------------------------------------------
2644 // getKillSetForNode: Return the registers killed by the given tree node.
2648 // tree - the tree for which the kill set is needed.
2650 // Return Value: a register mask of the registers killed
2652 regMaskTP LinearScan::getKillSetForNode(GenTree* tree)
2654 regMaskTP killMask = RBM_NONE;
2655 switch (tree->OperGet())
2657 #ifdef _TARGET_XARCH_
2659 // We use the 128-bit multiply when performing an overflow checking unsigned multiply
2661 if (((tree->gtFlags & GTF_UNSIGNED) != 0) && tree->gtOverflowEx())
2663 // Both RAX and RDX are killed by the operation
2664 killMask = RBM_RAX | RBM_RDX;
2669 #if defined(_TARGET_X86_) && !defined(LEGACY_BACKEND)
2672 killMask = RBM_RAX | RBM_RDX;
2679 if (!varTypeIsFloating(tree->TypeGet()))
2681 // RDX needs to be killed early, because it must not be used as a source register
2682 // (unlike most cases, where the kill happens AFTER the uses). So for this kill,
2683 // we add the RefPosition at the tree loc (where the uses are located) instead of the
2684 // usual kill location which is the same as the defs at tree loc+1.
2685 // Note that we don't have to add interference for the live vars, because that
2686 // will be done below, and is not sensitive to the precise location.
2687 LsraLocation currentLoc = tree->gtLsraInfo.loc;
2688 assert(currentLoc != 0);
2689 addRefsForPhysRegMask(RBM_RDX, currentLoc, RefTypeKill, true);
2690 // Both RAX and RDX are killed by the operation
2691 killMask = RBM_RAX | RBM_RDX;
2694 #endif // _TARGET_XARCH_
2697 if (tree->OperIsCopyBlkOp())
2699 assert(tree->AsObj()->gtGcPtrCount != 0);
2700 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_ASSIGN_BYREF);
2706 case GT_STORE_DYN_BLK:
2708 GenTreeBlk* blkNode = tree->AsBlk();
2709 bool isCopyBlk = varTypeIsStruct(blkNode->Data());
2710 switch (blkNode->gtBlkOpKind)
2712 case GenTreeBlk::BlkOpKindHelper:
2715 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMCPY);
2719 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMSET);
2723 #ifdef _TARGET_XARCH_
2724 case GenTreeBlk::BlkOpKindRepInstr:
2727 // rep movs kills RCX, RDI and RSI
2728 killMask = RBM_RCX | RBM_RDI | RBM_RSI;
2732 // rep stos kills RCX and RDI.
2733 // (Note that the Data() node, if not constant, will be assigned to
2734 // RCX, but it's fine that this kills it, as the value is not available
2735 // after this node in any case.)
2736 killMask = RBM_RDI | RBM_RCX;
2740 case GenTreeBlk::BlkOpKindRepInstr:
2742 case GenTreeBlk::BlkOpKindUnroll:
2743 case GenTreeBlk::BlkOpKindInvalid:
2744 // for these 'gtBlkOpKind' kinds, we leave 'killMask' = RBM_NONE
2755 if (tree->gtLsraInfo.isHelperCallWithKills)
2757 killMask = RBM_CALLEE_TRASH;
2761 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_STOP_FOR_GC);
2765 if (compiler->compFloatingPointUsed)
2767 if (tree->TypeGet() == TYP_DOUBLE)
2769 needDoubleTmpForFPCall = true;
2771 else if (tree->TypeGet() == TYP_FLOAT)
2773 needFloatTmpForFPCall = true;
2776 #endif // _TARGET_X86_
2777 #if defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2778 if (tree->IsHelperCall())
2780 GenTreeCall* call = tree->AsCall();
2781 CorInfoHelpFunc helpFunc = compiler->eeGetHelperNum(call->gtCallMethHnd);
2782 killMask = compiler->compHelperCallKillSet(helpFunc);
2785 #endif // defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2787 // if there is no FP used, we can ignore the FP kills
2788 if (compiler->compFloatingPointUsed)
2790 killMask = RBM_CALLEE_TRASH;
2794 killMask = RBM_INT_CALLEE_TRASH;
2799 if (compiler->codeGen->gcInfo.gcIsWriteBarrierAsgNode(tree))
2801 killMask = RBM_CALLEE_TRASH_NOGC;
2805 #if defined(PROFILING_SUPPORTED)
2806 // If this method requires profiler ELT hook then mark these nodes as killing
2807 // callee trash registers (excluding RAX and XMM0). The reason for this is that
2808 // profiler callback would trash these registers. See vm\amd64\asmhelpers.asm for more details.
2811 if (compiler->compIsProfilerHookNeeded())
2813 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_LEAVE);
2818 if (compiler->compIsProfilerHookNeeded())
2820 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_TAILCALL);
2823 #endif // PROFILING_SUPPORTED
2826 // for all other 'tree->OperGet()' kinds, leave 'killMask' = RBM_NONE
2832 //------------------------------------------------------------------------
2833 // buildKillPositionsForNode:
2834 // Given a tree node, add RefPositions for all the registers this node kills.
2837 // tree - the tree for which kill positions should be generated
2838 // currentLoc - the location at which the kills should be added
2841 // true - kills were inserted
2842 // false - no kills were inserted
2845 // The return value is needed because if we have any kills, we need to make sure that
2846 // all defs are located AFTER the kills. On the other hand, if there aren't kills,
2847 // the multiple defs for a regPair are in different locations.
2848 // If we generate any kills, we will mark all currentLiveVars as being preferenced
2849 // to avoid the killed registers. This is somewhat conservative.
2851 bool LinearScan::buildKillPositionsForNode(GenTree* tree, LsraLocation currentLoc)
2853 regMaskTP killMask = getKillSetForNode(tree);
2854 bool isCallKill = ((killMask == RBM_INT_CALLEE_TRASH) || (killMask == RBM_CALLEE_TRASH));
2855 if (killMask != RBM_NONE)
2857 // The killMask identifies a set of registers that will be used during codegen.
2858 // Mark these as modified here, so when we do final frame layout, we'll know about
2859 // all these registers. This is especially important if killMask contains
2860 // callee-saved registers, which affect the frame size since we need to save/restore them.
2861 // In the case where we have a copyBlk with GC pointers, we may need to call the
2862 // CORINFO_HELP_ASSIGN_BYREF helper, which kills callee-saved RSI and RDI; if
2863 // LSRA doesn't assign RSI/RDI, they wouldn't get marked as modified until codegen,
2864 // which is too late.
2865 compiler->codeGen->regSet.rsSetRegsModified(killMask DEBUGARG(dumpTerse));
2867 addRefsForPhysRegMask(killMask, currentLoc, RefTypeKill, true);
2869 // TODO-CQ: It appears to be valuable for both fp and int registers to avoid killing the callee
2870 // save regs on infrequently executed paths. However, it results in a large number of asmDiffs,
2871 // many of which appear to be regressions (because there is more spill on the infrequently path),
2872 // but are not really because the frequent path becomes smaller. Validating these diffs will need
2873 // to be done before making this change.
2874 // if (!blockSequence[curBBSeqNum]->isRunRarely())
2877 VARSET_ITER_INIT(compiler, iter, currentLiveVars, varIndex);
2878 while (iter.NextElem(compiler, &varIndex))
2880 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2881 LclVarDsc* varDsc = compiler->lvaTable + varNum;
2882 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2883 if (varDsc->lvType == LargeVectorType)
2885 if (!VarSetOps::IsMember(compiler, largeVectorCalleeSaveCandidateVars, varIndex))
2891 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2892 if (varTypeIsFloating(varDsc) &&
2893 !VarSetOps::IsMember(compiler, fpCalleeSaveCandidateVars, varIndex))
2897 Interval* interval = getIntervalForLocalVar(varNum);
2900 interval->preferCalleeSave = true;
2902 regMaskTP newPreferences = allRegs(interval->registerType) & (~killMask);
2904 if (newPreferences != RBM_NONE)
2906 interval->updateRegisterPreferences(newPreferences);
2910 // If there are no callee-saved registers, the call could kill all the registers.
2911 // This is a valid state, so in that case assert should not trigger. The RA will spill in order to
2912 // free a register later.
2913 assert(compiler->opts.compDbgEnC || (calleeSaveRegs(varDsc->lvType)) == RBM_NONE);
2918 if (tree->IsCall() && (tree->gtFlags & GTF_CALL_UNMANAGED) != 0)
2920 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeKillGCRefs, tree,
2921 (allRegs(TYP_REF) & ~RBM_ARG_REGS));
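// The preference update performed above is plain mask arithmetic; e.g. with a
// hypothetical 8-register file where the call kills the low four registers:
#if 0 // illustrative sketch only, not part of the allocator
unsigned allRegsMask        = 0xFF;                        // allRegs(registerType) analogue
unsigned demoKillMask       = 0x0F;                        // registers trashed across the call
unsigned demoNewPreferences = allRegsMask & ~demoKillMask; // 0xF0: prefer the surviving registers
#endif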
2929 //----------------------------------------------------------------------------
2930 // defineNewInternalTemp: Defines a ref position for an internal temp.
2933 // tree - Gentree node requiring an internal register
2934 // regType - Register type
2935 // currentLoc - Location of the temp Def position
2936 // regMask - register mask of candidates for temp
2937 // minRegCandidateCount - Minimum registers to be ensured in candidate
2938 // set under LSRA stress mode. This is a
2940 RefPosition* LinearScan::defineNewInternalTemp(GenTree* tree,
2941 RegisterType regType,
2942 LsraLocation currentLoc,
2943 regMaskTP regMask DEBUGARG(unsigned minRegCandidateCount))
2945 Interval* current = newInterval(regType);
2946 current->isInternal = true;
2947 return newRefPosition(current, currentLoc, RefTypeDef, tree, regMask, 0 DEBUG_ARG(minRegCandidateCount));
2950 //------------------------------------------------------------------------
2951 // buildInternalRegisterDefsForNode - build Def positions for internal
2952 // registers required for tree node.
2955 // tree - Gentree node that needs internal registers
2956 // currentLoc - Location at which Def positions need to be defined
2957 // temps - in-out array which is populated with ref positions
2958 // created for Def of internal registers
2959 // minRegCandidateCount - Minimum registers to be ensured in candidate
2960 // set of ref positions under LSRA stress. This is
2961 // a DEBUG only arg.
2964 // The total number of Def positions created for internal registers of tree node.
2965 int LinearScan::buildInternalRegisterDefsForNode(GenTree* tree,
2966 LsraLocation currentLoc,
2967 RefPosition* temps[] // populates
2968 DEBUGARG(unsigned minRegCandidateCount))
2971 int internalIntCount = tree->gtLsraInfo.internalIntCount;
2972 regMaskTP internalCands = tree->gtLsraInfo.getInternalCandidates(this);
2974 // If the number of internal integer registers required is the same as the number of candidate integer registers in
2975 // the candidate set, then they must be handled as fixed registers.
2976 // (E.g. for the integer registers that floating point arguments must be copied into for a varargs call.)
2977 bool fixedRegs = false;
2978 regMaskTP internalIntCandidates = (internalCands & allRegs(TYP_INT));
2979 if (((int)genCountBits(internalIntCandidates)) == internalIntCount)
2981 fixedRegs = true;
2984 for (count = 0; count < internalIntCount; count++)
2986 regMaskTP internalIntCands = (internalCands & allRegs(TYP_INT));
2987 if (fixedRegs)
2989 internalIntCands = genFindLowestBit(internalIntCands);
2990 internalCands &= ~internalIntCands;
2993 defineNewInternalTemp(tree, IntRegisterType, currentLoc, internalIntCands DEBUG_ARG(minRegCandidateCount));
2996 int internalFloatCount = tree->gtLsraInfo.internalFloatCount;
2997 for (int i = 0; i < internalFloatCount; i++)
2999 regMaskTP internalFPCands = (internalCands & internalFloatRegCandidates());
3001 defineNewInternalTemp(tree, FloatRegisterType, currentLoc, internalFPCands DEBUG_ARG(minRegCandidateCount));
3004 noway_assert(count < MaxInternalRegisters);
3005 assert(count == (internalIntCount + internalFloatCount));
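// When the candidates must be treated as fixed registers, the loop above peels one
// candidate per def by isolating the lowest set bit; the same peeling in isolation:
#if 0 // illustrative sketch only, not part of the allocator
unsigned demoCandidates = 0x06; // say {reg1, reg2}
while (demoCandidates != 0)
{
    unsigned lowest = demoCandidates & (~demoCandidates + 1); // genFindLowestBit analogue
    demoCandidates &= ~lowest;                                // remove it from the pool
    // each internal def is then constrained to exactly one fixed register ('lowest')
}
#endif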
3009 //------------------------------------------------------------------------
3010 // buildInternalRegisterUsesForNode - adds Use positions for internal
3011 // registers required for tree node.
3014 // tree - Gentree node that needs internal registers
3015 // currentLoc - Location at which Use positions need to be defined
3016 // defs - int array containing Def positions of internal
3018 // total - Total number of Def positions in 'defs' array.
3019 // minRegCandidateCount - Minimum registers to be ensured in candidate
3020 // set of ref positions under LSRA stress. This is
3021 // a DEBUG only arg.
3025 void LinearScan::buildInternalRegisterUsesForNode(GenTree* tree,
3026 LsraLocation currentLoc,
3027 RefPosition* defs[],
3028 int total DEBUGARG(unsigned minRegCandidateCount))
3030 assert(total < MaxInternalRegisters);
3032 // defs[] has been populated by buildInternalRegisterDefsForNode
3033 // now just add uses to the defs previously added.
3034 for (int i = 0; i < total; i++)
3036 RefPosition* prevRefPosition = defs[i];
3037 assert(prevRefPosition != nullptr);
3038 regMaskTP mask = prevRefPosition->registerAssignment;
3039 if (prevRefPosition->isPhysRegRef)
3041 newRefPosition(defs[i]->getReg()->regNum, currentLoc, RefTypeUse, tree, mask);
3045 RefPosition* newest = newRefPosition(defs[i]->getInterval(), currentLoc, RefTypeUse, tree, mask,
3046 0 DEBUG_ARG(minRegCandidateCount));
3048 if (tree->gtLsraInfo.isInternalRegDelayFree)
3050 newest->delayRegFree = true;
3056 regMaskTP LinearScan::getUseCandidates(GenTree* useNode)
3058 TreeNodeInfo info = useNode->gtLsraInfo;
3059 return info.getSrcCandidates(this);
3062 regMaskTP LinearScan::getDefCandidates(GenTree* tree)
3064 TreeNodeInfo info = tree->gtLsraInfo;
3065 return info.getDstCandidates(this);
3068 RegisterType LinearScan::getDefType(GenTree* tree)
3070 return tree->TypeGet();
3073 regMaskTP fixedCandidateMask(var_types type, regMaskTP candidates)
3075 if (genMaxOneBit(candidates))
3082 //------------------------------------------------------------------------
3083 // LocationInfoListNode: used to store a single `LocationInfo` value for a
3084 // node during `buildIntervals`.
3086 // This is the node type for `LocationInfoList` below.
3088 class LocationInfoListNode final : public LocationInfo
3090 friend class LocationInfoList;
3091 friend class LocationInfoListNodePool;
3093 LocationInfoListNode* m_next; // The next node in the list
3096 LocationInfoListNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0) : LocationInfo(l, i, t, regIdx)
3100 //------------------------------------------------------------------------
3101 // LocationInfoListNode::Next: Returns the next node in the list.
3102 LocationInfoListNode* Next() const
3108 //------------------------------------------------------------------------
3109 // LocationInfoList: used to store a list of `LocationInfo` values for a
3110 // node during `buildIntervals`.
3112 // Given an IR node that either directly defines N registers or that is a
3113 // contained node with uses that define a total of N registers, that node
3114 // will map to N `LocationInfo` values. These values are stored as a
3115 // linked list of `LocationInfoListNode` values.
3117 class LocationInfoList final
3119 friend class LocationInfoListNodePool;
3121 LocationInfoListNode* m_head; // The head of the list
3122 LocationInfoListNode* m_tail; // The tail of the list
3125 LocationInfoList() : m_head(nullptr), m_tail(nullptr)
3129 LocationInfoList(LocationInfoListNode* node) : m_head(node), m_tail(node)
3131 assert(m_head->m_next == nullptr);
3134 //------------------------------------------------------------------------
3135 // LocationInfoList::IsEmpty: Returns true if the list is empty.
3137 bool IsEmpty() const
3139 return m_head == nullptr;
3142 //------------------------------------------------------------------------
3143 // LocationInfoList::Begin: Returns the first node in the list.
3145 LocationInfoListNode* Begin() const
3150 //------------------------------------------------------------------------
3151 // LocationInfoList::End: Returns the position after the last node in the
3152 // list. The returned value is suitable for use as
3153 // a sentinel for iteration.
3155 LocationInfoListNode* End() const
3160 //------------------------------------------------------------------------
3161 // LocationInfoList::Append: Appends a node to the list.
3164 // node - The node to append. Must not be part of an existing list.
3166 void Append(LocationInfoListNode* node)
3168 assert(node->m_next == nullptr);
3170 if (m_tail == nullptr)
3172 assert(m_head == nullptr);
3177 m_tail->m_next = node;
3183 //------------------------------------------------------------------------
3184 // LocationInfoList::Append: Appends another list to this list.
3187 // other - The list to append.
3189 void Append(LocationInfoList other)
3191 if (m_tail == nullptr)
3193 assert(m_head == nullptr);
3194 m_head = other.m_head;
3198 m_tail->m_next = other.m_head;
3201 m_tail = other.m_tail;
3205 //------------------------------------------------------------------------
3206 // LocationInfoListNodePool: manages a pool of `LocationInfoListNode`
3207 // values to decrease overall memory usage
3208 // during `buildIntervals`.
3210 // `buildIntervals` involves creating a list of location info values per
3211 // node that either directly produces a set of registers or that is a
3212 // contained node with register-producing sources. However, these lists
3213 // are short-lived: they are destroyed once the use of the corresponding
3214 // node is processed. As such, there is typically only a small number of
3215 // `LocationInfoListNode` values in use at any given time. Pooling these
3216 // values avoids otherwise frequent allocations.
3217 class LocationInfoListNodePool final
3219 LocationInfoListNode* m_freeList;
3220 Compiler* m_compiler;
3223 //------------------------------------------------------------------------
3224 // LocationInfoListNodePool::LocationInfoListNodePool:
3225 // Creates a pool of `LocationInfoListNode` values.
3228 // compiler - The compiler context.
3229 // preallocate - The number of nodes to preallocate.
3231 LocationInfoListNodePool(Compiler* compiler, unsigned preallocate = 0) : m_compiler(compiler)
3233 if (preallocate > 0)
3235 size_t preallocateSize = sizeof(LocationInfoListNode) * preallocate;
3236 auto* preallocatedNodes = reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize));
3238 LocationInfoListNode* head = preallocatedNodes;
3239 head->m_next = nullptr;
3241 for (unsigned i = 1; i < preallocate; i++)
3243 LocationInfoListNode* node = &preallocatedNodes[i];
3244 node->m_next = head;
3252 //------------------------------------------------------------------------
3253 // LocationInfoListNodePool::GetNode: Fetches an unused node from the pool.
3257 // l - The `LsraLocation` for the `LocationInfo` value.
3258 // i - The interval for the `LocationInfo` value.
3259 // t - The IR node for the `LocationInfo` value
3260 // regIdx - The register index for the `LocationInfo` value.
3263 // A pooled or newly-allocated `LocationInfoListNode`, depending on the
3264 // contents of the pool.
3265 LocationInfoListNode* GetNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0)
3267 LocationInfoListNode* head = m_freeList;
3268 if (head == nullptr)
3270 head = reinterpret_cast<LocationInfoListNode*>(m_compiler->compGetMem(sizeof(LocationInfoListNode)));
3274 m_freeList = head->m_next;
3280 head->multiRegIdx = regIdx;
3281 head->m_next = nullptr;
3286 //------------------------------------------------------------------------
3287 // LocationInfoListNodePool::ReturnNodes: Returns a list of nodes to the pool.
3291 // list - The list to return.
3293 void ReturnNodes(LocationInfoList& list)
3295 assert(list.m_head != nullptr);
3296 assert(list.m_tail != nullptr);
3298 LocationInfoListNode* head = m_freeList;
3299 list.m_tail->m_next = head;
3300 m_freeList = list.m_head;
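// The pool above is a classic intrusive free list; a minimal standalone version of
// the same pattern (hypothetical node type, plain 'new' in place of compGetMem):
#if 0 // illustrative sketch only, not part of the allocator
struct DemoNode
{
    DemoNode* next;
};
struct DemoPool
{
    DemoNode* freeList = nullptr;
    DemoNode* Get()
    {
        DemoNode* n = freeList;
        if (n == nullptr)
        {
            n = new DemoNode(); // fall back to the allocator when the pool is empty
        }
        else
        {
            freeList = n->next; // pop the head of the free list
        }
        n->next = nullptr;
        return n;
    }
    void Return(DemoNode* n)
    {
        n->next  = freeList; // push back onto the free list for reuse
        freeList = n;
    }
};
#endif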
3304 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3306 LinearScan::buildUpperVectorSaveRefPositions(GenTree* tree, LsraLocation currentLoc)
3308 VARSET_TP VARSET_INIT_NOCOPY(liveLargeVectors, VarSetOps::MakeEmpty(compiler));
3309 regMaskTP fpCalleeKillSet = RBM_NONE;
3310 if (!VarSetOps::IsEmpty(compiler, largeVectorVars))
3312 // We actually need to find any calls that kill the upper-half of the callee-save vector registers.
3313 // But we will use as a proxy any node that kills floating point registers.
3314 // (Note that some calls are masquerading as other nodes at this point so we can't just check for calls.)
3315 fpCalleeKillSet = getKillSetForNode(tree);
3316 if ((fpCalleeKillSet & RBM_FLT_CALLEE_TRASH) != RBM_NONE)
3318 VarSetOps::AssignNoCopy(compiler, liveLargeVectors,
3319 VarSetOps::Intersection(compiler, currentLiveVars, largeVectorVars));
3320 VARSET_ITER_INIT(compiler, iter, liveLargeVectors, varIndex);
3321 while (iter.NextElem(compiler, &varIndex))
3323 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3324 Interval* varInterval = getIntervalForLocalVar(varNum);
3325 Interval* tempInterval = newInterval(LargeVectorType);
3326 tempInterval->isInternal = true;
3328 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveDef, tree, RBM_FLT_CALLEE_SAVED);
3329 // We are going to save the existing relatedInterval of varInterval on tempInterval, so that we can set
3330 // the tempInterval as the relatedInterval of varInterval, so that we can build the corresponding
3331 // RefTypeUpperVectorSaveUse RefPosition. We will then restore the relatedInterval onto varInterval,
3332 // and set varInterval as the relatedInterval of tempInterval.
3333 tempInterval->relatedInterval = varInterval->relatedInterval;
3334 varInterval->relatedInterval = tempInterval;
3338 return liveLargeVectors;
3341 void LinearScan::buildUpperVectorRestoreRefPositions(GenTree* tree,
3342 LsraLocation currentLoc,
3343 VARSET_VALARG_TP liveLargeVectors)
3345 if (!VarSetOps::IsEmpty(compiler, liveLargeVectors))
3347 VARSET_ITER_INIT(compiler, iter, liveLargeVectors, varIndex);
3348 while (iter.NextElem(compiler, &varIndex))
3350 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3351 Interval* varInterval = getIntervalForLocalVar(varNum);
3352 Interval* tempInterval = varInterval->relatedInterval;
3353 assert(tempInterval->isInternal == true);
3355 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveUse, tree, RBM_FLT_CALLEE_SAVED);
3356 // Restore the relatedInterval onto varInterval, and set varInterval as the relatedInterval of tempInterval.
3358 varInterval->relatedInterval = tempInterval->relatedInterval;
3359 tempInterval->relatedInterval = varInterval;
3363 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
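// The save/restore pair above threads the temp interval into the var's
// relatedInterval chain and then unthreads it; the pointer dance in sketch form
// (hypothetical struct, mirroring the assignments above):
#if 0 // illustrative sketch only, not part of the allocator
struct DemoInterval
{
    DemoInterval* related;
};
void demoSave(DemoInterval* var, DemoInterval* temp) // at RefTypeUpperVectorSaveDef
{
    temp->related = var->related; // stash var's original related interval on temp
    var->related  = temp;         // so the restore can find 'temp' again
}
void demoRestore(DemoInterval* var, DemoInterval* temp) // at RefTypeUpperVectorSaveUse
{
    var->related  = temp->related; // put the original related interval back
    temp->related = var;           // and relate temp to var for preferencing
}
#endif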
3366 //------------------------------------------------------------------------
3367 // ComputeOperandDstCount: computes the number of registers defined by a node.
3370 // For most nodes, this is simple:
3371 // - Nodes that do not produce values (e.g. stores and other void-typed
3372 // nodes) and nodes that immediately use the registers they define
3373 // produce no registers
3374 // - Nodes that are marked as defining N registers define N registers.
3376 // For contained nodes, however, things are more complicated: for purposes
3377 // of bookkeeping, a contained node is treated as producing the transitive
3378 // closure of the registers produced by its sources.
3381 // operand - The operand for which to compute a register count.
3384 // The number of registers defined by `operand`.
3386 static int ComputeOperandDstCount(GenTree* operand)
3388 TreeNodeInfo& operandInfo = operand->gtLsraInfo;
3390 if (operandInfo.isLocalDefUse)
3392 // Operands that define an unused value do not produce any registers.
3395 else if (operandInfo.dstCount != 0)
3397 // Operands that have a specified number of destination registers consume all of their operands
3398 // and therefore produce exactly that number of registers.
3399 return operandInfo.dstCount;
3401 else if (operandInfo.srcCount != 0)
3403 // If an operand has no destination registers but does have source registers, it must be a store or a compare.
3405 assert(operand->OperIsStore() || operand->OperIsBlkOp() || operand->OperIsPutArgStk() ||
3406 operand->OperIsCompare() || operand->IsSIMDEqualityOrInequality());
3409 else if (!operand->OperIsFieldListHead() && (operand->OperIsStore() || operand->TypeGet() == TYP_VOID))
3411 // Stores and void-typed operands may be encountered when processing call nodes, which contain
3412 // pointers to argument setup stores.
3417 // If a field list or non-void-typed operand is not an unused value and does not have source registers,
3418 // that argument is contained within its parent and produces `sum(operand_dst_count)` registers.
3420 for (GenTree* op : operand->Operands())
3422 dstCount += ComputeOperandDstCount(op);
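// A toy version of the recursion above, omitting the store/void special cases:
// a contained node reports the sum of the registers produced by its operands
// (hypothetical node shape, not the JIT's GenTree):
#if 0 // illustrative sketch only, not part of the allocator
#include <vector>
struct DemoNode
{
    bool                   isLocalDefUse;
    int                    dstCount;
    std::vector<DemoNode*> operands;
};
int DemoDstCount(DemoNode* n)
{
    if (n->isLocalDefUse)
    {
        return 0; // unused value: produces no registers for a consumer
    }
    if (n->dstCount != 0)
    {
        return n->dstCount; // explicit definition count
    }
    int sum = 0;
    for (DemoNode* op : n->operands)
    {
        sum += DemoDstCount(op); // contained: transitive closure of operand defs
    }
    return sum;
}
#endif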
3429 //------------------------------------------------------------------------
3430 // ComputeAvailableSrcCount: computes the number of registers available as
3431 // sources for a node.
3433 // This is simply the sum of the number of registers produced by each
3434 // operand to the node.
3437 // node - The node for which to compute a source count.
3440 // The number of registers available as sources for `node`.
3442 static int ComputeAvailableSrcCount(GenTree* node)
3445 for (GenTree* operand : node->Operands())
3447 numSources += ComputeOperandDstCount(operand);
3454 void LinearScan::buildRefPositionsForNode(GenTree* tree,
3456 LocationInfoListNodePool& listNodePool,
3457 HashTableBase<GenTree*, LocationInfoList>& operandToLocationInfoMap,
3458 LsraLocation currentLoc)
3461 assert(!isRegPairType(tree->TypeGet()));
3462 #endif // _TARGET_ARM_
3464 // The LIR traversal doesn't visit GT_LIST or GT_ARGPLACE nodes.
3465 // GT_CLS_VAR nodes should have been eliminated by rationalizer.
3466 assert(tree->OperGet() != GT_ARGPLACE);
3467 assert(tree->OperGet() != GT_LIST);
3468 assert(tree->OperGet() != GT_CLS_VAR);
3470 // The LIR traversal visits only the first node in a GT_FIELD_LIST.
3471 assert((tree->OperGet() != GT_FIELD_LIST) || tree->AsFieldList()->IsFieldListHead());
3473 // The set of internal temporary registers used by this node are stored in the
3474 // gtRsvdRegs register mask. Clear it out.
3475 tree->gtRsvdRegs = RBM_NONE;
3480 JITDUMP("at start of tree, map contains: { ");
3482 for (auto kvp : operandToLocationInfoMap)
3484 GenTree* node = kvp.Key();
3485 LocationInfoList defList = kvp.Value();
3487 JITDUMP("%sN%03u. %s -> (", first ? "" : "; ", node->gtSeqNum, GenTree::NodeName(node->OperGet()));
3488 for (LocationInfoListNode *def = defList.Begin(), *end = defList.End(); def != end; def = def->Next())
3490 JITDUMP("%s%d.N%03u", def == defList.Begin() ? "" : ", ", def->loc, def->treeNode->gtSeqNum);
3500 TreeNodeInfo info = tree->gtLsraInfo;
3501 assert(info.IsValid(this));
3502 int consume = info.srcCount;
3503 int produce = info.dstCount;
3505 assert(((consume == 0) && (produce == 0)) || (ComputeAvailableSrcCount(tree) == consume));
3507 if (isCandidateLocalRef(tree) && !tree->OperIsLocalStore())
3509 assert(consume == 0);
3511 // We handle tracked variables differently from non-tracked ones. If it is tracked,
3512 // we simply add a use or def of the tracked variable. Otherwise, for a use we need
3513 // to actually add the appropriate references for loading or storing the variable.
3515 // It won't actually get used or defined until the appropriate ancestor tree node
3516 // is processed, unless this is marked "isLocalDefUse" because it is a stack-based argument to a call.
3519 Interval* interval = getIntervalForLocalVar(tree->gtLclVarCommon.gtLclNum);
3520 regMaskTP candidates = getUseCandidates(tree);
3521 regMaskTP fixedAssignment = fixedCandidateMask(tree->TypeGet(), candidates);
3523 // We have only approximate last-use information at this point. This is because the
3524 // execution order doesn't actually reflect the true order in which the localVars
3525 // are referenced - but the order of the RefPositions will, so we recompute it after
3526 // RefPositions are built.
3527 // Use the old value for setting currentLiveVars - note that we do this with the
3528 // not-quite-correct setting of lastUse. However, this is OK because
3529 // 1) this is only for preferencing, which doesn't require strict correctness, and
3530 // 2) the cases where these out-of-order uses occur should not overlap a kill.
3531 // TODO-Throughput: clean this up once we have the execution order correct. At that point
3532 // we can update currentLiveVars at the same place that we create the RefPosition.
3533 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
3535 VarSetOps::RemoveElemD(compiler, currentLiveVars,
3536 compiler->lvaTable[tree->gtLclVarCommon.gtLclNum].lvVarIndex);
3539 JITDUMP("t%u (i:%u)\n", currentLoc, interval->intervalIndex);
3541 if (!info.isLocalDefUse)
3545 LocationInfoList list(listNodePool.GetNode(currentLoc, interval, tree));
3546 bool added = operandToLocationInfoMap.AddOrUpdate(tree, list);
3549 tree->gtLsraInfo.definesAnyRegisters = true;
3556 JITDUMP(" Not added to map\n");
3557 regMaskTP candidates = getUseCandidates(tree);
3559 if (fixedAssignment != RBM_NONE)
3561 candidates = fixedAssignment;
3563 RefPosition* pos = newRefPosition(interval, currentLoc, RefTypeUse, tree, candidates);
3564 pos->isLocalDefUse = true;
3565 pos->setAllocateIfProfitable(tree->IsRegOptional());
3566 DBEXEC(VERBOSE, pos->dump());
3574 lsraDispNode(tree, LSRA_DUMP_REFPOS, (produce != 0));
3576 JITDUMP(" consume=%d produce=%d\n", consume, produce);
3580 const bool isContainedNode = !info.isLocalDefUse && consume == 0 && produce == 0 && tree->canBeContained();
3581 if (isContainedNode)
3583 assert(info.internalIntCount == 0);
3584 assert(info.internalFloatCount == 0);
3586 // Contained nodes map to the concatenated lists of their operands.
3587 LocationInfoList locationInfoList;
3588 for (GenTree* op : tree->Operands())
3590 if (!op->gtLsraInfo.definesAnyRegisters)
3592 assert(ComputeOperandDstCount(op) == 0);
3596 LocationInfoList operandList;
3597 bool removed = operandToLocationInfoMap.TryRemove(op, &operandList);
3600 locationInfoList.Append(operandList);
3603 if (!locationInfoList.IsEmpty())
3605 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
3607 tree->gtLsraInfo.definesAnyRegisters = true;
3613 // Handle the case of local variable assignment
3614 Interval* varDefInterval = nullptr;
3615 RefType defRefType = RefTypeDef;
3617 GenTree* defNode = tree;
3619 // noAdd means the node creates a def but for purposes of map
3620 // management do not add it because data is not flowing up the
3621 // tree but over (as in ASG nodes)
3623 bool noAdd = info.isLocalDefUse;
3624 RefPosition* prevPos = nullptr;
3626 bool isSpecialPutArg = false;
3628 assert(!tree->OperIsAssignment());
3629 if (tree->OperIsLocalStore())
3631 if (isCandidateLocalRef(tree))
3633 // We always push the tracked lclVar intervals
3634 varDefInterval = getIntervalForLocalVar(tree->gtLclVarCommon.gtLclNum);
3635 defRefType = refTypeForLocalRefNode(tree);
3643 assert(consume <= MAX_RET_REG_COUNT);
3646 // Get the location info for the register defined by the first operand.
3647 LocationInfoList operandDefs;
3648 bool found = operandToLocationInfoMap.TryGetValue(*(tree->OperandsBegin()), &operandDefs);
3651 // Since we only expect to consume one register, we should only have a single register to consume.
3653 assert(operandDefs.Begin()->Next() == operandDefs.End());
3655 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3657 Interval* srcInterval = operandInfo.interval;
3658 if (srcInterval->relatedInterval == nullptr)
3660 // Preference the source to the dest, unless this is a non-last-use localVar.
3661 // Note that the last-use info is not correct, but it is a better approximation than preferencing
3662 // the source to the dest, if the source's lifetime extends beyond the dest.
3663 if (!srcInterval->isLocalVar || (operandInfo.treeNode->gtFlags & GTF_VAR_DEATH) != 0)
3665 srcInterval->assignRelatedInterval(varDefInterval);
3668 else if (!srcInterval->isLocalVar)
3670 // Preference the source to dest, if src is not a local var.
3671 srcInterval->assignRelatedInterval(varDefInterval);
3674 // We can have a case where the source of the store has a different register type,
3675 // e.g. when the store is of a return value temp, and op1 is a Vector2
3676 // (TYP_SIMD8). We will need to set the
3677 // src candidates accordingly on op1 so that LSRA will generate a copy.
3678 // We could do this during Lowering, but at that point we don't know whether
3679 // this lclVar will be a register candidate, and if not, we would prefer to leave
3681 if (regType(tree->gtGetOp1()->TypeGet()) != regType(tree->TypeGet()))
3683 tree->gtGetOp1()->gtLsraInfo.setSrcCandidates(this, allRegs(tree->TypeGet()));
3687 if ((tree->gtFlags & GTF_VAR_DEATH) == 0)
3689 VarSetOps::AddElemD(compiler, currentLiveVars,
3690 compiler->lvaTable[tree->gtLclVarCommon.gtLclNum].lvVarIndex);
3694 else if (noAdd && produce == 0)
3696 // This is the case for dead nodes that occur after
3697 // tree rationalization
3698 // TODO-Cleanup: Identify and remove these dead nodes prior to register allocation.
3699 if (tree->IsMultiRegCall())
3701 // In case of multi-reg call node, produce = number of return registers
3702 produce = tree->AsCall()->GetReturnTypeDesc()->GetReturnRegCount();
3715 if (varDefInterval != nullptr)
3717 printf("t%u (i:%u) = op ", currentLoc, varDefInterval->intervalIndex);
3721 for (int i = 0; i < produce; i++)
3723 printf("t%u ", currentLoc);
3736 Interval* prefSrcInterval = nullptr;
3738 // If this is a binary operator that will be encoded with 2 operand fields
3739 // (i.e. the target is read-modify-write), preference the dst to op1.
3741 bool hasDelayFreeSrc = tree->gtLsraInfo.hasDelayFreeSrc;
3743 #if defined(DEBUG) && defined(_TARGET_X86_)
3744 // On x86, `LSRA_LIMIT_CALLER` is too restrictive to allow the use of special put args: this stress mode
3745 // leaves only three registers allocatable--eax, ecx, and edx--of which the latter two are also used for the
3746 // first two integral arguments to a call. This can leave us with too few registers to successfully allocate in
3747 // situations like the following:
3749 // t1026 = lclVar ref V52 tmp35 u:3 REG NA <l:$3a1, c:$98d>
3752 // t1352 = * putarg_reg ref REG NA
3754 // t342 = lclVar int V14 loc6 u:4 REG NA $50c
3756 // t343 = const int 1 REG NA $41
3760 // t344 = * + int REG NA $495
3762 // t345 = lclVar int V04 arg4 u:2 REG NA $100
3766 // t346 = * % int REG NA $496
3769 // t1353 = * putarg_reg int REG NA
3771 // t1354 = lclVar ref V52 tmp35 (last use) REG NA
3774 // t1355 = * lea(b+0) byref REG NA
3776 // Here, the first `putarg_reg` would normally be considered a special put arg, which would remove `ecx` from the
3777 // set of allocatable registers, leaving only `eax` and `edx`. The allocator will then fail to allocate a register
3778 // for the def of `t345` if arg4 is not a register candidate: the corresponding ref position will be constrained to
3779 // { `ecx`, `ebx`, `esi`, `edi` }, which `LSRA_LIMIT_CALLER` will further constrain to `ecx`, which will not be
3780 // available due to the special put arg.
3781 const bool supportsSpecialPutArg = getStressLimitRegs() != LSRA_LIMIT_CALLER;
3783 const bool supportsSpecialPutArg = true;
if (supportsSpecialPutArg && tree->OperGet() == GT_PUTARG_REG && isCandidateLocalRef(tree->gtGetOp1()) &&
    (tree->gtGetOp1()->gtFlags & GTF_VAR_DEATH) == 0)
// This is the case for a "pass-through" copy of a lclVar. In the case where it is a non-last-use,
// we don't want the def of the copy to kill the lclVar register, if it is assigned the same register
// (which is actually what we hope will happen).
JITDUMP("Setting putarg_reg as a pass-through of a non-last use lclVar\n");
// Get the register information for the first operand of the node.
LocationInfoList operandDefs;
bool found = operandToLocationInfoMap.TryGetValue(*(tree->OperandsBegin()), &operandDefs);
// Preference the destination to the interval of the first register defined by the first operand.
Interval* srcInterval = operandDefs.Begin()->interval;
assert(srcInterval->isLocalVar);
prefSrcInterval = srcInterval;
isSpecialPutArg = true;
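// A hypothetical example of the above: for "tArg = PUTARG_REG(V02)" where V02
// remains live after the call, marking the node as a special put arg means that
// if tArg is assigned the arg register that V02 already occupies, no copy is
// needed and V02's register assignment survives the putarg def.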
RefPosition* internalRefs[MaxInternalRegisters];
// The number of registers required for a tree node is the sum of
// consume + produce + internalCount. This is the minimum
// set of registers that needs to be ensured in the candidate
// set of the ref positions created.
unsigned minRegCount = consume + produce + info.internalIntCount + info.internalFloatCount;
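// A worked example (illustrative only): a node that consumes two source
// registers, produces one value, and needs one internal int register has
// minRegCount = 2 + 1 + 1 + 0 = 4; this is used as a lower bound so that
// (e.g. under register-limiting stress modes) the candidate sets of the
// RefPositions built below are not over-constrained.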
// make intervals for all the 'internal' register requirements for this node
// where internal means additional registers required temporarily
int internalCount = buildInternalRegisterDefsForNode(tree, currentLoc, internalRefs DEBUG_ARG(minRegCount));
// pop all ref'd tree temps
GenTreeOperandIterator iterator = tree->OperandsBegin();
// `operandDefs` holds the list of `LocationInfo` values for the registers defined by the current
// operand. `operandDefsIterator` points to the current `LocationInfo` value in `operandDefs`.
LocationInfoList operandDefs;
LocationInfoListNode* operandDefsIterator = operandDefs.End();
for (int useIndex = 0; useIndex < consume; useIndex++)
// If we've consumed all of the registers defined by the current operand, advance to the next
// operand that defines any registers.
if (operandDefsIterator == operandDefs.End())
// Skip operands that do not define any registers, whether directly or indirectly.
GenTree* operand;
do
{
assert(iterator != tree->OperandsEnd());
operand = *iterator;
++iterator;
} while (!operand->gtLsraInfo.definesAnyRegisters);
// If we have already processed a previous operand, return its `LocationInfo` list to the
// pool.
if (useIndex > 0)
assert(!operandDefs.IsEmpty());
listNodePool.ReturnNodes(operandDefs);
// Remove the list of registers defined by the current operand from the map. Note that this
// is only correct because tree nodes are singly-used: if this property ever changes (e.g.
// if tree nodes are eventually allowed to be multiply-used), then the removal is only
// correct at the last use.
bool removed = operandToLocationInfoMap.TryRemove(operand, &operandDefs);
// Move the operand def iterator to the `LocationInfo` for the first register defined by the
// current operand.
operandDefsIterator = operandDefs.Begin();
assert(operandDefsIterator != operandDefs.End());
LocationInfo& locInfo = *static_cast<LocationInfo*>(operandDefsIterator);
operandDefsIterator = operandDefsIterator->Next();
JITDUMP("t%u ", locInfo.loc);
// for interstitial tree temps, a use is always last and end;
// this is set by default in newRefPosition
GenTree* useNode = locInfo.treeNode;
assert(useNode != nullptr);
var_types type = useNode->TypeGet();
regMaskTP candidates = getUseCandidates(useNode);
Interval* i = locInfo.interval;
unsigned multiRegIdx = locInfo.multiRegIdx;
#ifdef FEATURE_SIMD
// In case of a multi-reg call store to a local, there won't be any mismatch of
// use candidates with the type of the tree node.
if (tree->OperIsLocalStore() && varDefInterval == nullptr && !useNode->IsMultiRegCall())
// This is a non-candidate store. If this is a SIMD type, the use candidates
// may not match the type of the tree node. If that is the case, change the
// type of the tree node to match, so that we do the right kind of store.
if ((candidates & allRegs(tree->gtType)) == RBM_NONE)
noway_assert((candidates & allRegs(useNode->gtType)) != RBM_NONE);
// Currently, the only case where this should happen is for a TYP_LONG
// source and a TYP_SIMD8 target.
assert((useNode->gtType == TYP_LONG && tree->gtType == TYP_SIMD8) ||
       (useNode->gtType == TYP_SIMD8 && tree->gtType == TYP_LONG));
tree->gtType = useNode->gtType;
#endif // FEATURE_SIMD
if (useNode->gtLsraInfo.isTgtPref)
prefSrcInterval = i;
regMaskTP fixedAssignment = fixedCandidateMask(type, candidates);
if (fixedAssignment != RBM_NONE)
candidates = fixedAssignment;
const bool regOptionalAtUse = useNode->IsRegOptional();
const bool delayRegFree = (hasDelayFreeSrc && useNode->gtLsraInfo.isDelayFree);
assert(isCandidateLocalRef(useNode) == i->isLocalVar);
// For non-localVar uses we record nothing,
// as nothing needs to be written back to the tree.
// If delayRegFree, then the Use will interfere with the destination of
// the consuming node. Therefore, we also need to add the kill set of
// the consuming node to minRegCount.
//
// For example, consider the following IR on x86, where v01 and v02
// are method args coming in ecx and edx respectively.
//   GT_DIV(v01, v02)
//
// For GT_DIV, minRegCount will be 3 without adding the kill set
// of the GT_DIV node.
//
// Assume further JitStressRegs=2, which would constrain
// candidates to callee trashable regs { eax, ecx, edx } on
// use positions of v01 and v02. LSRA allocates ecx for v01.
// The use position of v02 cannot be allocated a reg since it
// is marked delay-reg free and {eax,edx} are getting killed
// before the def of GT_DIV. For this reason, minRegCount
// for the use position of v02 also needs to take into account
// the kill set of its consuming node.
unsigned minRegCountForUsePos = minRegCount;
regMaskTP killMask = getKillSetForNode(tree);
if (killMask != RBM_NONE)
minRegCountForUsePos += genCountBits(killMask);
if ((candidates & allRegs(i->registerType)) == 0)
// This should only occur where we've got a type mismatch due to SIMD
// pointer-size types that are passed & returned as longs.
i->hasConflictingDefUse = true;
if (fixedAssignment != RBM_NONE)
// Explicitly insert a FixedRefPosition and fake the candidates, because otherwise newRefPosition
// will complain about the types not matching.
regNumber physicalReg = genRegNumFromMask(fixedAssignment);
RefPosition* pos = newRefPosition(physicalReg, currentLoc, RefTypeFixedReg, nullptr, fixedAssignment);
pos = newRefPosition(i, currentLoc, RefTypeUse, useNode, allRegs(i->registerType),
                     multiRegIdx DEBUG_ARG(minRegCountForUsePos));
pos->registerAssignment = candidates;
else
pos = newRefPosition(i, currentLoc, RefTypeUse, useNode, candidates,
                     multiRegIdx DEBUG_ARG(minRegCountForUsePos));
if (delayRegFree)
hasDelayFreeSrc = true;
pos->delayRegFree = true;
if (regOptionalAtUse)
pos->setAllocateIfProfitable(1);
buildInternalRegisterUsesForNode(tree, currentLoc, internalRefs, internalCount DEBUG_ARG(minRegCount));
RegisterType registerType = getDefType(tree);
regMaskTP candidates = getDefCandidates(tree);
regMaskTP useCandidates = getUseCandidates(tree);
printf("Def candidates ");
dumpRegMask(candidates);
printf(", Use candidates ");
dumpRegMask(useCandidates);
#if defined(_TARGET_AMD64_)
// A multi-reg call node is the only node that can produce a multi-reg value.
assert(produce <= 1 || (tree->IsMultiRegCall() && produce == MAX_RET_REG_COUNT));
#endif // _TARGET_AMD64_
// Add kill positions before adding def positions
buildKillPositionsForNode(tree, currentLoc + 1);
#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
VARSET_TP VARSET_INIT_NOCOPY(liveLargeVectors, VarSetOps::UninitVal());
if (RBM_FLT_CALLEE_SAVED != RBM_NONE)
// Build RefPositions for saving any live large vectors.
// This must be done after the kills, so that we know which large vectors are still live.
VarSetOps::AssignNoCopy(compiler, liveLargeVectors, buildUpperVectorSaveRefPositions(tree, currentLoc + 1));
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
ReturnTypeDesc* retTypeDesc = nullptr;
bool isMultiRegCall = tree->IsMultiRegCall();
if (isMultiRegCall)
retTypeDesc = tree->AsCall()->GetReturnTypeDesc();
assert((int)genCountBits(candidates) == produce);
assert(candidates == retTypeDesc->GetABIReturnRegs());
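// An illustrative example (assuming the SysV AMD64 ABI): a call returning a
// 16-byte struct in RAX/RDX has produce == 2 and candidates == (RBM_RAX | RBM_RDX);
// the loop below then builds one def RefPosition per return register.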
LocationInfoList locationInfoList;
LsraLocation defLocation = currentLoc + 1;
for (int i = 0; i < produce; i++)
regMaskTP currCandidates = candidates;
Interval* interval = varDefInterval;
// In case of a multi-reg call node, registerType is given by
// the type of the ith position return register.
if (isMultiRegCall)
registerType = retTypeDesc->GetReturnRegType((unsigned)i);
currCandidates = genRegMask(retTypeDesc->GetABIReturnReg(i));
useCandidates = allRegs(registerType);
if (interval == nullptr)
// Make a new interval
interval = newInterval(registerType);
if (hasDelayFreeSrc)
interval->hasNonCommutativeRMWDef = true;
else if (tree->OperIsConst())
assert(!tree->IsReuseRegVal());
interval->isConstant = true;
if ((currCandidates & useCandidates) != RBM_NONE)
interval->updateRegisterPreferences(currCandidates & useCandidates);
if (isSpecialPutArg)
interval->isSpecialPutArg = true;
else
assert(registerTypesEquivalent(interval->registerType, registerType));
if (prefSrcInterval != nullptr)
interval->assignRelatedIntervalIfUnassigned(prefSrcInterval);
// for assignments, we want to create a refposition for the def
locationInfoList.Append(listNodePool.GetNode(defLocation, interval, tree, (unsigned)i));
RefPosition* pos = newRefPosition(interval, defLocation, defRefType, defNode, currCandidates,
                                  (unsigned)i DEBUG_ARG(minRegCount));
if (info.isLocalDefUse)
pos->isLocalDefUse = true;
pos->lastUse = true;
DBEXEC(VERBOSE, pos->dump());
interval->updateRegisterPreferences(currCandidates);
interval->updateRegisterPreferences(useCandidates);
#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
// The SaveDef position must be at the same location as the Def position of the call node.
buildUpperVectorRestoreRefPositions(tree, defLocation, liveLargeVectors);
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
if (!locationInfoList.IsEmpty())
bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
assert(added);
tree->gtLsraInfo.definesAnyRegisters = true;
// make an interval for each physical register
void LinearScan::buildPhysRegRecords()
RegisterType regType = IntRegisterType;
for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
RegRecord* curr = &physRegs[reg];
BasicBlock* getNonEmptyBlock(BasicBlock* block)
while (block != nullptr && block->bbTreeList == nullptr)
BasicBlock* nextBlock = block->bbNext;
// Note that here we use the version of NumSucc that does not take a compiler.
// That way this doesn't have to take a compiler, or be an instance method, e.g. of LinearScan.
// If we have an empty block, it must have jump type BBJ_NONE or BBJ_ALWAYS, in which
// case we don't need the version that takes a compiler.
assert(block->NumSucc() == 1 && ((block->bbJumpKind == BBJ_ALWAYS) || (block->bbJumpKind == BBJ_NONE)));
// sometimes the first block is empty and ends with an uncond branch
// assert( block->GetSucc(0) == nextBlock);
block = nextBlock;
assert(block != nullptr && block->bbTreeList != nullptr);
return block;
//------------------------------------------------------------------------
// insertZeroInitRefPositions: Handle lclVars that are live-in to the first block
//
// For each lclVar that is live-in to the first block:
// - If it is a GC ref, or if compInitMem is set, a ZeroInit RefPosition will be created.
// - Otherwise, it will be marked as spilled, since it will not be assigned a register
//   on entry and will be loaded from memory on the undefined path.
//   Note that, when the compInitMem option is not set, we may encounter these on
//   paths that are protected by the same condition as an earlier def. However, since
//   we don't do the analysis to determine this - and couldn't rely on always identifying
//   such cases even if we tried - we must conservatively treat the undefined path as
//   being possible. This is a relatively rare case, so the introduced conservatism is
//   not expected to warrant the analysis required to determine the best placement of
//   an initialization.
//
void LinearScan::insertZeroInitRefPositions()
// insert defs for this, then a block boundary
VARSET_ITER_INIT(compiler, iter, compiler->fgFirstBB->bbLiveIn, varIndex);
while (iter.NextElem(compiler, &varIndex))
unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
LclVarDsc* varDsc = compiler->lvaTable + varNum;
if (!varDsc->lvIsParam && isCandidateVar(varDsc))
JITDUMP("V%02u was live in to first block:", varNum);
Interval* interval = getIntervalForLocalVar(varNum);
if (compiler->info.compInitMem || varTypeIsGC(varDsc->TypeGet()))
JITDUMP(" creating ZeroInit\n");
GenTree* firstNode = getNonEmptyBlock(compiler->fgFirstBB)->firstNode();
newRefPosition(interval, MinLocation, RefTypeZeroInit, firstNode, allRegs(interval->registerType));
varDsc->lvMustInit = true;
else
setIntervalAsSpilled(interval);
JITDUMP(" marking as spilled\n");
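// An illustrative (hypothetical) example: for
//     int x;                  // tracked, not a param
//     if (cond) { x = 1; }
//     use(x);                 // x is live-in to the first block on the undefined path
// we create a ZeroInit RefPosition (and set lvMustInit) when compInitMem is set
// or x is a GC ref; otherwise x starts out spilled and is loaded at its first use.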
#if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
// -----------------------------------------------------------------------
// Sets the register state for an argument of type STRUCT for System V systems.
// See Compiler::raUpdateRegStateForArg(RegState *regState, LclVarDsc *argDsc) in regalloc.cpp
// for how the state for an argument is updated for unix non-structs and Windows AMD64 structs.
void LinearScan::unixAmd64UpdateRegStateForArg(LclVarDsc* argDsc)
assert(varTypeIsStruct(argDsc));
RegState* intRegState = &compiler->codeGen->intRegState;
RegState* floatRegState = &compiler->codeGen->floatRegState;
if ((argDsc->lvArgReg != REG_STK) && (argDsc->lvArgReg != REG_NA))
if (genRegMask(argDsc->lvArgReg) & (RBM_ALLFLOAT))
assert(genRegMask(argDsc->lvArgReg) & (RBM_FLTARG_REGS));
floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
else
assert(genRegMask(argDsc->lvArgReg) & (RBM_ARG_REGS));
intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
if ((argDsc->lvOtherArgReg != REG_STK) && (argDsc->lvOtherArgReg != REG_NA))
if (genRegMask(argDsc->lvOtherArgReg) & (RBM_ALLFLOAT))
assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_FLTARG_REGS));
floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
else
assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_ARG_REGS));
intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
#endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
//------------------------------------------------------------------------
// updateRegStateForArg: Updates rsCalleeRegArgMaskLiveIn for the appropriate
//    regState (either compiler->intRegState or compiler->floatRegState),
//    with the lvArgReg on "argDsc"
//
// Arguments:
//    argDsc - the argument for which the state is to be updated.
//
// Return Value: None
//
// Assumptions:
//    The argument is live on entry to the function
//    (or is untracked and therefore assumed live)
//
// Notes:
//    This relies on a method in regAlloc.cpp that is shared between LSRA
//    and regAlloc. It is further abstracted here because regState is updated
//    separately for tracked and untracked variables in LSRA.
//
void LinearScan::updateRegStateForArg(LclVarDsc* argDsc)
#if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
// For System V AMD64 calls the argDsc can have 2 registers (for structs.)
// Handle them here.
if (varTypeIsStruct(argDsc))
unixAmd64UpdateRegStateForArg(argDsc);
else
#endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
RegState* intRegState = &compiler->codeGen->intRegState;
RegState* floatRegState = &compiler->codeGen->floatRegState;
// In the case of AMD64 we'll still use the floating point registers
// to model the register usage for arguments on vararg calls, so
// we will ignore the varargs condition to determine whether we use
// XMM registers or not for setting up the call.
bool isFloat = (isFloatRegType(argDsc->lvType)
#ifndef _TARGET_AMD64_
                && !compiler->info.compIsVarArgs
#endif // !_TARGET_AMD64_
                );
if (argDsc->lvIsHfaRegArg())
isFloat = true;
if (isFloat)
JITDUMP("Float arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
compiler->raUpdateRegStateForArg(floatRegState, argDsc);
else
JITDUMP("Int arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
#if FEATURE_MULTIREG_ARGS
if (argDsc->lvOtherArgReg != REG_NA)
JITDUMP("(second half) in reg %s\n", getRegName(argDsc->lvOtherArgReg));
#endif // FEATURE_MULTIREG_ARGS
compiler->raUpdateRegStateForArg(intRegState, argDsc);
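// An illustrative example (assuming the Windows x64 ABI): an enregistered int
// arg passed in RCX causes RBM_RCX to be OR'ed into
// intRegState->rsCalleeRegArgMaskLiveIn, so the allocator models RCX as already
// occupied by the incoming arg at method entry.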
//------------------------------------------------------------------------
// findPredBlockForLiveIn: Determine which block should be used for the register locations of the live-in variables.
//
// Arguments:
//    block                 - The block for which we're selecting a predecessor.
//    prevBlock             - The previous block in allocation order.
//    pPredBlockIsAllocated - A debug-only argument that indicates whether any of the predecessors have been seen
//                            in allocation order.
//
// Return Value:
//    The selected predecessor.
//
// Assumptions:
//    in DEBUG, caller initializes *pPredBlockIsAllocated to false, and it will be set to true if the block
//    returned is in fact a predecessor.
//
// Notes:
//    This will select a predecessor based on the heuristics obtained by getLsraBlockBoundaryLocations(), which can
//    be one of:
//      LSRA_BLOCK_BOUNDARY_PRED    - Use the register locations of a predecessor block (default)
//      LSRA_BLOCK_BOUNDARY_LAYOUT  - Use the register locations of the previous block in layout order.
//                                    This is the only case where this actually returns a different block.
//      LSRA_BLOCK_BOUNDARY_ROTATE  - Rotate the register locations from a predecessor.
//                                    For this case, the block returned is the same as for LSRA_BLOCK_BOUNDARY_PRED,
//                                    but the register locations will be "rotated" to stress the resolution and
//                                    allocation code.
//
BasicBlock* LinearScan::findPredBlockForLiveIn(BasicBlock* block,
                                               BasicBlock* prevBlock DEBUGARG(bool* pPredBlockIsAllocated))
BasicBlock* predBlock = nullptr;
assert(*pPredBlockIsAllocated == false);
if (getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_LAYOUT)
if (prevBlock != nullptr)
predBlock = prevBlock;
else
if (block != compiler->fgFirstBB)
predBlock = block->GetUniquePred(compiler);
if (predBlock != nullptr)
if (isBlockVisited(predBlock))
if (predBlock->bbJumpKind == BBJ_COND)
// Special handling to improve matching on backedges.
BasicBlock* otherBlock = (block == predBlock->bbNext) ? predBlock->bbJumpDest : predBlock->bbNext;
noway_assert(otherBlock != nullptr);
if (isBlockVisited(otherBlock))
// This is the case when we have a conditional branch where one target has already
// been visited. It would be best to use the same incoming regs as that block,
// so that we have less likelihood of having to move registers.
// For example, in determining the block to use for the starting register locations for
// "block" in the following example, we'd like to use the same predecessor for "block"
// as for "otherBlock", so that both successors of predBlock have the same locations, reducing
// the likelihood of needing a split block on a backedge:
for (flowList* pred = otherBlock->bbPreds; pred != nullptr; pred = pred->flNext)
BasicBlock* otherPred = pred->flBlock;
if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum)
predBlock = otherPred;
break;
else
predBlock = nullptr;
for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
BasicBlock* candidatePredBlock = pred->flBlock;
if (isBlockVisited(candidatePredBlock))
if (predBlock == nullptr || predBlock->bbWeight < candidatePredBlock->bbWeight)
predBlock = candidatePredBlock;
INDEBUG(*pPredBlockIsAllocated = true;)
if (predBlock == nullptr)
predBlock = prevBlock;
assert(predBlock != nullptr);
JITDUMP("\n\nNo allocated predecessor; ");
void LinearScan::buildIntervals()
BasicBlock* block;
// start numbering at 1; 0 is the entry
LsraLocation currentLoc = 1;
JITDUMP("\nbuildIntervals ========\n");
// Now build (empty) records for all of the physical registers
buildPhysRegRecords();
#ifdef DEBUG
if (VERBOSE)
printf("\n-----------------\n");
printf("LIVENESS:\n");
printf("-----------------\n");
foreach_block(compiler, block)
printf("BB%02u use def in out\n", block->bbNum);
dumpConvertedVarSet(compiler, block->bbVarUse);
dumpConvertedVarSet(compiler, block->bbVarDef);
dumpConvertedVarSet(compiler, block->bbLiveIn);
dumpConvertedVarSet(compiler, block->bbLiveOut);
#endif // DEBUG
#if DOUBLE_ALIGN
// We will determine whether we should double align the frame during
// identifyCandidates(), but we initially assume that we will not.
doDoubleAlign = false;
#endif // DOUBLE_ALIGN
identifyCandidates();
// Figure out if we're going to use a frame pointer. We need to do this before building
// the ref positions, because those objects will embed the frame register in various register masks
// if the frame pointer is not reserved. If we decide to have a frame pointer, setFrameType() will
// remove the frame pointer from the masks.
setFrameType();
DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_PRE));
JITDUMP("\nbuildIntervals second part ========\n");
// Next, create ParamDef RefPositions for all the tracked parameters,
// in order of their varIndex
unsigned int lclNum;
LclVarDsc* argDsc;
RegState* intRegState = &compiler->codeGen->intRegState;
RegState* floatRegState = &compiler->codeGen->floatRegState;
intRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
floatRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
for (unsigned int varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
lclNum = compiler->lvaTrackedToVarNum[varIndex];
argDsc = &(compiler->lvaTable[lclNum]);
if (!argDsc->lvIsParam)
continue;
// Only reserve a register if the argument is actually used.
// Is it dead on entry? If compJmpOpUsed is true, then the arguments
// have to be kept alive, so we have to consider it as live on entry.
// Use lvRefCnt instead of checking bbLiveIn because if it's volatile we
// won't have done dataflow on it, but it needs to be marked as live-in so
// it will get saved in the prolog.
if (!compiler->compJmpOpUsed && argDsc->lvRefCnt == 0 && !compiler->opts.compDbgCode)
continue;
if (argDsc->lvIsRegArg)
updateRegStateForArg(argDsc);
if (isCandidateVar(argDsc))
Interval* interval = getIntervalForLocalVar(lclNum);
regMaskTP mask = allRegs(TypeGet(argDsc));
if (argDsc->lvIsRegArg)
// Set this interval as currently assigned to that register
regNumber inArgReg = argDsc->lvArgReg;
assert(inArgReg < REG_COUNT);
mask = genRegMask(inArgReg);
assignPhysReg(inArgReg, interval);
RefPosition* pos = newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, mask);
else if (varTypeIsStruct(argDsc->lvType))
for (unsigned fieldVarNum = argDsc->lvFieldLclStart;
     fieldVarNum < argDsc->lvFieldLclStart + argDsc->lvFieldCnt; ++fieldVarNum)
LclVarDsc* fieldVarDsc = &(compiler->lvaTable[fieldVarNum]);
if (fieldVarDsc->lvLRACandidate)
Interval* interval = getIntervalForLocalVar(fieldVarNum);
newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, allRegs(TypeGet(fieldVarDsc)));
else
// We can overwrite the register (i.e. codegen saves it on entry)
assert(argDsc->lvRefCnt == 0 || !argDsc->lvIsRegArg || argDsc->lvDoNotEnregister ||
       !argDsc->lvLRACandidate || (varTypeIsFloating(argDsc->TypeGet()) && compiler->opts.compDbgCode));
// Now set up the reg state for the non-tracked args
// (We do this here because we want to generate the ParamDef RefPositions in tracked
// order, so that loop doesn't hit the non-tracked args)
for (unsigned argNum = 0; argNum < compiler->info.compArgsCount; argNum++, argDsc++)
argDsc = &(compiler->lvaTable[argNum]);
if (argDsc->lvPromotedStruct())
noway_assert(argDsc->lvFieldCnt == 1); // We only handle one field here
unsigned fieldVarNum = argDsc->lvFieldLclStart;
argDsc = &(compiler->lvaTable[fieldVarNum]);
noway_assert(argDsc->lvIsParam);
if (!argDsc->lvTracked && argDsc->lvIsRegArg)
updateRegStateForArg(argDsc);
// If there is a secret stub param, it is also live in
if (compiler->info.compPublishStubParam)
intRegState->rsCalleeRegArgMaskLiveIn |= RBM_SECRET_STUB_PARAM;
LocationInfoListNodePool listNodePool(compiler, 8);
SmallHashTable<GenTree*, LocationInfoList, 32> operandToLocationInfoMap(compiler);
BasicBlock* predBlock = nullptr;
BasicBlock* prevBlock = nullptr;
// Initialize currentLiveVars to the empty set. We will set it to the current
// live-in at the entry to each block (this will include the incoming args on
// the first block).
VarSetOps::AssignNoCopy(compiler, currentLiveVars, VarSetOps::MakeEmpty(compiler));
for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
JITDUMP("\nNEW BLOCK BB%02u\n", block->bbNum);
bool predBlockIsAllocated = false;
predBlock = findPredBlockForLiveIn(block, prevBlock DEBUGARG(&predBlockIsAllocated));
if (block == compiler->fgFirstBB)
insertZeroInitRefPositions();
// Any lclVars live-in to a block are resolution candidates.
VarSetOps::UnionD(compiler, resolutionCandidateVars, block->bbLiveIn);
// Determine if we need any DummyDefs.
// We need DummyDefs for cases where "predBlock" isn't really a predecessor.
// Note that it's possible to have uses of uninitialized variables, in which case even the first
// block may require DummyDefs, which we are not currently adding - this means that these variables
// will always be considered to be in memory on entry (and reloaded when the use is encountered).
// TODO-CQ: Consider how best to tune this. Currently, if we create DummyDefs for uninitialized
// variables (which may actually be initialized along the dynamically executed paths, but not
// on all static paths), we wind up with excessive live ranges for some of these variables.
// (An example of a predBlock that is not a flow predecessor is given after this comment block.)
VARSET_TP VARSET_INIT(compiler, newLiveIn, block->bbLiveIn);
if (predBlock != nullptr)
JITDUMP("\n\nSetting BB%02u as the predecessor for determining incoming variable registers of BB%02u\n",
        block->bbNum, predBlock->bbNum);
assert(predBlock->bbNum <= bbNumMaxBeforeResolution);
blockInfo[block->bbNum].predBBNum = predBlock->bbNum;
// Compute set difference: newLiveIn = block->bbLiveIn - predBlock->bbLiveOut
VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
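// An illustrative example: under LSRA_BLOCK_BOUNDARY_LAYOUT, "predBlock" may be
// the layout predecessor rather than an actual flow predecessor, so a variable
// can be live-in to "block" without being live-out of predBlock; the DummyDefs
// created below give such intervals a defined location at this boundary.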
// Create dummy def RefPositions
if (needsDummyDefs)
// If we are using locations from a predecessor, we should never require DummyDefs.
assert(!predBlockIsAllocated);
JITDUMP("Creating dummy definitions\n");
VARSET_ITER_INIT(compiler, iter, newLiveIn, varIndex);
while (iter.NextElem(compiler, &varIndex))
unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
LclVarDsc* varDsc = compiler->lvaTable + varNum;
// Add a dummyDef for any candidate vars that are in the "newLiveIn" set.
// If this is the entry block, don't add any incoming parameters (they're handled with ParamDefs).
if (isCandidateVar(varDsc) && (predBlock != nullptr || !varDsc->lvIsParam))
Interval* interval = getIntervalForLocalVar(varNum);
newRefPosition(interval, currentLoc, RefTypeDummyDef, nullptr, allRegs(interval->registerType));
JITDUMP("Finished creating dummy definitions\n\n");
// Add a dummy RefPosition to mark the block boundary.
// Note that we do this AFTER adding the exposed uses above, because the
// register positions for those exposed uses need to be recorded at
// this location.
RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
VarSetOps::Assign(compiler, currentLiveVars, block->bbLiveIn);
LIR::Range& blockRange = LIR::AsRange(block);
for (GenTree* node : blockRange.NonPhiNodes())
assert(node->gtLsraInfo.loc >= currentLoc);
assert(((node->gtLIRFlags & LIR::Flags::IsUnusedValue) == 0) || node->gtLsraInfo.isLocalDefUse);
currentLoc = node->gtLsraInfo.loc;
buildRefPositionsForNode(node, block, listNodePool, operandToLocationInfoMap, currentLoc);
if (currentLoc > maxNodeLocation)
maxNodeLocation = currentLoc;
// Insert exposed uses for a lclVar that is live-out of 'block' but not live-in to the
// next block, or any unvisited successors.
// This will address lclVars that are live on a backedge, as well as those that are kept
// live at a GT_JMP.
//
// Blocks ending with "jmp method" are marked as BBJ_HAS_JMP,
// and a jmp call is represented using a GT_JMP node, which is a leaf node.
// The liveness phase keeps all the arguments of the method live till the end of
// the block by adding them to the liveout set of the block containing the GT_JMP.
//
// The target of a GT_JMP implicitly uses all the current method arguments, however
// there are no actual references to them. This can cause LSRA to assert, because
// the variables are live but it sees no references. In order to correctly model the
// liveness of these arguments, we add dummy exposed uses, in the same manner as for
// backward branches. This will happen automatically via expUseSet.
//
// Note that a block ending with GT_JMP has no successors, and hence the variables
// for which dummy use ref positions are added are arguments of the method.
VARSET_TP VARSET_INIT(compiler, expUseSet, block->bbLiveOut);
BasicBlock* nextBlock = getNextBlock();
if (nextBlock != nullptr)
VarSetOps::DiffD(compiler, expUseSet, nextBlock->bbLiveIn);
AllSuccessorIter succsEnd = block->GetAllSuccs(compiler).end();
for (AllSuccessorIter succs = block->GetAllSuccs(compiler).begin();
     succs != succsEnd && !VarSetOps::IsEmpty(compiler, expUseSet); ++succs)
BasicBlock* succ = (*succs);
if (isBlockVisited(succ))
continue;
VarSetOps::DiffD(compiler, expUseSet, succ->bbLiveIn);
if (!VarSetOps::IsEmpty(compiler, expUseSet))
JITDUMP("Exposed uses:");
VARSET_ITER_INIT(compiler, iter, expUseSet, varIndex);
while (iter.NextElem(compiler, &varIndex))
unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
LclVarDsc* varDsc = compiler->lvaTable + varNum;
if (isCandidateVar(varDsc))
Interval* interval = getIntervalForLocalVar(varNum);
newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
JITDUMP(" V%02u", varNum);
// Clear the "last use" flag on any vars that are live-out from this block.
VARSET_ITER_INIT(compiler, iter, block->bbLiveOut, varIndex);
while (iter.NextElem(compiler, &varIndex))
unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
LclVarDsc* const varDsc = &compiler->lvaTable[varNum];
if (isCandidateVar(varDsc))
RefPosition* const lastRP = getIntervalForLocalVar(varNum)->lastRefPosition;
if ((lastRP != nullptr) && (lastRP->bbNum == block->bbNum))
lastRP->lastUse = false;
#ifdef DEBUG
checkLastUses(block);
if (VERBOSE)
dumpConvertedVarSet(compiler, block->bbVarUse);
dumpConvertedVarSet(compiler, block->bbVarDef);
#endif // DEBUG
// If we need to KeepAliveAndReportThis, add a dummy exposed use of it at the end
if (compiler->lvaKeepAliveAndReportThis())
unsigned keepAliveVarNum = compiler->info.compThisArg;
assert(compiler->info.compIsStatic == false);
if (isCandidateVar(&compiler->lvaTable[keepAliveVarNum]))
JITDUMP("Adding exposed use of this, for lvaKeepAliveAndReportThis\n");
Interval* interval = getIntervalForLocalVar(keepAliveVarNum);
newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
#ifdef DEBUG
if (getLsraExtendLifeTimes())
LclVarDsc* varDsc;
for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
if (varDsc->lvLRACandidate)
JITDUMP("Adding exposed use of V%02u for LsraExtendLifetimes\n", lclNum);
Interval* interval = getIntervalForLocalVar(lclNum);
newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
#endif // DEBUG
// If the last block has successors, create a RefTypeBB to record
// what's live on exit from the last block.
if (prevBlock->NumSucc(compiler) > 0)
RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
#ifdef DEBUG
// Make sure we don't have any blocks that were not visited
foreach_block(compiler, block)
assert(isBlockVisited(block));
if (VERBOSE)
lsraDumpIntervals("BEFORE VALIDATING INTERVALS");
dumpRefPositions("BEFORE VALIDATING INTERVALS");
validateIntervals();
#endif // DEBUG
void LinearScan::dumpVarRefPositions(const char* title)
printf("\nVAR REFPOSITIONS %s\n", title);
for (unsigned i = 0; i < compiler->lvaCount; i++)
Interval* interval = getIntervalForLocalVar(i);
printf("--- V%02u\n", i);
for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
ref->dump();
void LinearScan::validateIntervals()
for (unsigned i = 0; i < compiler->lvaCount; i++)
Interval* interval = getIntervalForLocalVar(i);
bool defined = false;
printf("-----------------\n");
for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
RefType refType = ref->refType;
if (!defined && RefTypeIsUse(refType))
if (compiler->info.compMethodName != nullptr)
printf("%s: ", compiler->info.compMethodName);
printf("LocalVar V%02u: undefined use at %u\n", i, ref->nodeLocation);
// Note that there can be multiple last uses if they are on disjoint paths,
// so we can't really check the lastUse flag
if (RefTypeIsDef(refType))
defined = true;
// Set the default rpFrameType based upon codeGen->isFramePointerRequired()
// This was lifted from the register predictor
//
void LinearScan::setFrameType()
FrameType frameType = FT_NOT_SET;
#if DOUBLE_ALIGN
compiler->codeGen->setDoubleAlign(false);
if (doDoubleAlign)
frameType = FT_DOUBLE_ALIGN_FRAME;
compiler->codeGen->setDoubleAlign(true);
#endif // DOUBLE_ALIGN
if (compiler->codeGen->isFramePointerRequired())
frameType = FT_EBP_FRAME;
else
if (compiler->rpMustCreateEBPCalled == false)
INDEBUG(const char* reason;)
compiler->rpMustCreateEBPCalled = true;
if (compiler->rpMustCreateEBPFrame(INDEBUG(&reason)))
JITDUMP("; Decided to create an EBP based frame for ETW stackwalking (%s)\n", reason);
compiler->codeGen->setFrameRequired(true);
if (compiler->codeGen->isFrameRequired())
frameType = FT_EBP_FRAME;
else
frameType = FT_ESP_FRAME;
switch (frameType)
case FT_ESP_FRAME:
noway_assert(!compiler->codeGen->isFramePointerRequired());
noway_assert(!compiler->codeGen->isFrameRequired());
compiler->codeGen->setFramePointerUsed(false);
break;
case FT_EBP_FRAME:
compiler->codeGen->setFramePointerUsed(true);
break;
#if DOUBLE_ALIGN
case FT_DOUBLE_ALIGN_FRAME:
noway_assert(!compiler->codeGen->isFramePointerRequired());
compiler->codeGen->setFramePointerUsed(false);
break;
#endif // DOUBLE_ALIGN
default:
noway_assert(!"rpFrameType not set correctly!");
break;
// If we are using FPBASE as the frame register, we cannot also use it for
// a local var. Note that we may have already added it to the register masks,
// which are computed in the LinearScan class constructor and used during
// lowering. Luckily, the TreeNodeInfo only stores an index to
// the masks stored in the LinearScan class, so we only need to walk the
// unique masks and remove FPBASE.
if (frameType == FT_EBP_FRAME)
if ((availableIntRegs & RBM_FPBASE) != 0)
RemoveRegisterFromMasks(REG_FPBASE);
// We know that we're already in "read mode" for availableIntRegs. However,
// we need to remove the FPBASE register, so subsequent users (like callers
// to allRegs()) get the right thing. The RemoveRegisterFromMasks() code
// fixes up everything that already took a dependency on the value that was
// previously read, so this completes the picture.
availableIntRegs.OverrideAssign(availableIntRegs & ~RBM_FPBASE);
compiler->rpFrameType = frameType;
// Is the copyReg/moveReg given by this RefPosition still busy at the
// given location?
bool copyOrMoveRegInUse(RefPosition* ref, LsraLocation loc)
assert(ref->copyReg || ref->moveReg);
if (ref->getRefEndLocation() >= loc)
return true;
Interval* interval = ref->getInterval();
RefPosition* nextRef = interval->getNextRefPosition();
if (nextRef != nullptr && nextRef->treeNode == ref->treeNode && nextRef->getRefEndLocation() >= loc)
return true;
return false;
// Determine whether the register represented by "physRegRecord" is available at least
// at the "currentLoc", and if so, return the next location at which it is in use in
// "nextRefLocationPtr"
//
bool LinearScan::registerIsAvailable(RegRecord* physRegRecord,
                                     LsraLocation currentLoc,
                                     LsraLocation* nextRefLocationPtr,
                                     RegisterType regType)
*nextRefLocationPtr = MaxLocation;
LsraLocation nextRefLocation = MaxLocation;
regMaskTP regMask = genRegMask(physRegRecord->regNum);
if (physRegRecord->isBusyUntilNextKill)
return false;
RefPosition* nextPhysReference = physRegRecord->getNextRefPosition();
if (nextPhysReference != nullptr)
nextRefLocation = nextPhysReference->nodeLocation;
// if (nextPhysReference->refType == RefTypeFixedReg) nextRefLocation--;
else if (!physRegRecord->isCalleeSave)
nextRefLocation = MaxLocation - 1;
Interval* assignedInterval = physRegRecord->assignedInterval;
if (assignedInterval != nullptr)
RefPosition* recentReference = assignedInterval->recentRefPosition;
// The only case where we have an assignedInterval, but recentReference is null
// is where this interval is live at procedure entry (i.e. an arg register), in which
// case it's still live and its assigned register is not available
// (Note that the ParamDef will be recorded as a recentReference when we encounter
// it, but we will be allocating registers, potentially to other incoming parameters,
// as we process the ParamDefs.)
if (recentReference == nullptr)
return false;
// Is this a copyReg/moveReg? It is if the register assignment doesn't match.
// (the recentReference may not be a copyReg/moveReg, because we could have seen another
// reference since the copyReg/moveReg)
if (!assignedInterval->isAssignedTo(physRegRecord->regNum))
// Don't reassign it if it's still in use
if ((recentReference->copyReg || recentReference->moveReg) &&
    copyOrMoveRegInUse(recentReference, currentLoc))
return false;
else if (!assignedInterval->isActive && assignedInterval->isConstant)
// Treat this as unassigned, i.e. do nothing.
// TODO-CQ: Consider adjusting the heuristics (probably in the caller of this method)
// to avoid reusing these registers.
// If this interval isn't active, it's available if it isn't referenced
// at this location (or the previous location, if the recent RefPosition
// is a delayRegFree).
else if (!assignedInterval->isActive &&
         (recentReference->refType == RefTypeExpUse || recentReference->getRefEndLocation() < currentLoc))
// This interval must have a next reference (otherwise it wouldn't be assigned to this register)
RefPosition* nextReference = recentReference->nextRefPosition;
if (nextReference != nullptr)
if (nextReference->nodeLocation < nextRefLocation)
nextRefLocation = nextReference->nodeLocation;
else
assert(recentReference->copyReg && recentReference->registerAssignment != regMask);
else
return false;
if (nextRefLocation < *nextRefLocationPtr)
*nextRefLocationPtr = nextRefLocation;
#ifdef _TARGET_ARM_
if (regType == TYP_DOUBLE)
// Recurse, but check the other half this time (TYP_FLOAT)
if (!registerIsAvailable(getRegisterRecord(REG_NEXT(physRegRecord->regNum)), currentLoc, nextRefLocationPtr,
                         TYP_FLOAT))
return false;
nextRefLocation = *nextRefLocationPtr;
#endif // _TARGET_ARM_
return (nextRefLocation >= currentLoc);
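// A usage sketch (illustrative): callers ask both "is this register free at
// currentLoc?" and "until when?". For example, tryAllocateFreeReg (below)
// compares the returned next-reference location against the interval's last
// use to decide whether a free register COVERS the entire lifetime.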
//------------------------------------------------------------------------
// getRegisterType: Get the RegisterType to use for the given RefPosition
//
// Arguments:
//    currentInterval: The interval for the current allocation
//    refPosition:     The RefPosition of the current Interval for which a register is being allocated
//
// Return Value:
//    The RegisterType that should be allocated for this RefPosition
//
// Notes:
//    This will nearly always be identical to the registerType of the interval, except in the case
//    of SIMD types of 8 bytes (currently only Vector2) when they are passed and returned in integer
//    registers, or copied to a return temp.
//    This method need only be called in situations where we may be dealing with the register requirements
//    of a RefTypeUse RefPosition (i.e. not when we are only looking at the type of an interval, nor when
//    we are interested in the "defining" type of the interval). This is because the situation of interest
//    only happens at the use (where it must be copied to an integer register).
//
RegisterType LinearScan::getRegisterType(Interval* currentInterval, RefPosition* refPosition)
assert(refPosition->getInterval() == currentInterval);
RegisterType regType = currentInterval->registerType;
regMaskTP candidates = refPosition->registerAssignment;
#if defined(FEATURE_SIMD) && defined(_TARGET_AMD64_)
if ((candidates & allRegs(regType)) == RBM_NONE)
assert((regType == TYP_SIMD8) && (refPosition->refType == RefTypeUse) &&
       ((candidates & allRegs(TYP_INT)) != RBM_NONE));
regType = TYP_INT;
#else  // !(defined(FEATURE_SIMD) && defined(_TARGET_AMD64_))
assert((candidates & allRegs(regType)) != RBM_NONE);
#endif // !(defined(FEATURE_SIMD) && defined(_TARGET_AMD64_))
return regType;
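// An illustrative example (assuming the Windows x64 ABI): an 8-byte Vector2
// (TYP_SIMD8) is returned in RAX, so a RefTypeUse feeding such a return can
// carry integer-register candidates even though the interval's registerType is
// a float type; in that case this method returns TYP_INT and a copy is generated.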
//------------------------------------------------------------------------
// tryAllocateFreeReg: Find a free register that satisfies the requirements for refPosition,
//                     and takes into account the preferences for the given Interval
//
// Arguments:
//    currentInterval: The interval for the current allocation
//    refPosition:     The RefPosition of the current Interval for which a register is being allocated
//
// Return Value:
//    The regNumber, if any, allocated to the RefPosition. Returns REG_NA if no free register is found.
//
// Notes:
//    TODO-CQ: Consider whether we need to use a different order for tree temps than for vars, as
//    they may benefit from a different ordering.
//
static const regNumber lsraRegOrder[] = {REG_VAR_ORDER};
const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
static const regNumber lsraRegOrderFlt[] = {REG_VAR_ORDER_FLT};
const unsigned lsraRegOrderFltSize = ArrLen(lsraRegOrderFlt);
regNumber LinearScan::tryAllocateFreeReg(Interval* currentInterval, RefPosition* refPosition)
regNumber foundReg = REG_NA;
RegisterType regType = getRegisterType(currentInterval, refPosition);
const regNumber* regOrder;
unsigned regOrderSize;
if (useFloatReg(regType))
regOrder = lsraRegOrderFlt;
regOrderSize = lsraRegOrderFltSize;
else
regOrder = lsraRegOrder;
regOrderSize = lsraRegOrderSize;
LsraLocation currentLocation = refPosition->nodeLocation;
RefPosition* nextRefPos = refPosition->nextRefPosition;
LsraLocation nextLocation = (nextRefPos == nullptr) ? currentLocation : nextRefPos->nodeLocation;
regMaskTP candidates = refPosition->registerAssignment;
regMaskTP preferences = currentInterval->registerPreferences;
if (RefTypeIsDef(refPosition->refType))
if (currentInterval->hasConflictingDefUse)
resolveConflictingDefAndUse(currentInterval, refPosition);
candidates = refPosition->registerAssignment;
// Otherwise, check for the case of a fixed-reg def of a reg that will be killed before the
// use, or interferes at the point of use (which shouldn't happen, but Lower doesn't mark
// the contained nodes as interfering).
// Note that we may have a ParamDef RefPosition that is marked isFixedRegRef, but which
// has had its registerAssignment changed to no longer be a single register.
else if (refPosition->isFixedRegRef && nextRefPos != nullptr && RefTypeIsUse(nextRefPos->refType) &&
         !nextRefPos->isFixedRegRef && genMaxOneBit(refPosition->registerAssignment))
regNumber defReg = refPosition->assignedReg();
RegRecord* defRegRecord = getRegisterRecord(defReg);
RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
assert(currFixedRegRefPosition != nullptr &&
       currFixedRegRefPosition->nodeLocation == refPosition->nodeLocation);
// If there is another fixed reference to this register before the use, change the candidates
// on this RefPosition to include that of nextRefPos.
if (currFixedRegRefPosition->nextRefPosition != nullptr &&
    currFixedRegRefPosition->nextRefPosition->nodeLocation <= nextRefPos->getRefEndLocation())
candidates |= nextRefPos->registerAssignment;
if (preferences == refPosition->registerAssignment)
preferences = candidates;
preferences &= candidates;
if (preferences == RBM_NONE)
preferences = candidates;
regMaskTP relatedPreferences = RBM_NONE;
#ifdef DEBUG
candidates = stressLimitRegs(refPosition, candidates);
#endif // DEBUG
bool mustAssignARegister = true;
assert(candidates != RBM_NONE);
// If the related interval has no further references, it is possible that it is a source of the
// node that produces this interval. However, we don't want to use the relatedInterval for preferencing
// if its next reference is not a new definition (as it either is or will become live).
Interval* relatedInterval = currentInterval->relatedInterval;
if (relatedInterval != nullptr)
RefPosition* nextRelatedRefPosition = relatedInterval->getNextRefPosition();
if (nextRelatedRefPosition != nullptr)
// Don't use the relatedInterval for preferencing if its next reference is not a new definition.
if (!RefTypeIsDef(nextRelatedRefPosition->refType))
relatedInterval = nullptr;
// Is the relatedInterval simply a copy to another relatedInterval?
else if ((relatedInterval->relatedInterval != nullptr) &&
         (nextRelatedRefPosition->nextRefPosition != nullptr) &&
         (nextRelatedRefPosition->nextRefPosition->nextRefPosition == nullptr) &&
         (nextRelatedRefPosition->nextRefPosition->nodeLocation <
          relatedInterval->relatedInterval->getNextRefLocation()))
// The current relatedInterval has only two remaining RefPositions, both of which
// occur prior to the next RefPosition for its relatedInterval.
// It is likely a copy.
relatedInterval = relatedInterval->relatedInterval;
if (relatedInterval != nullptr)
// If the related interval already has an assigned register, then use that
// as the related preference. We'll take the related
// interval preferences into account in the loop over all the registers.
if (relatedInterval->assignedReg != nullptr)
relatedPreferences = genRegMask(relatedInterval->assignedReg->regNum);
else
relatedPreferences = relatedInterval->registerPreferences;
bool preferCalleeSave = currentInterval->preferCalleeSave;
// For floating point, we want to be less aggressive about using callee-save registers.
// So in that case, we just need to ensure that the current RefPosition is covered.
RefPosition* rangeEndRefPosition;
RefPosition* lastRefPosition = currentInterval->lastRefPosition;
if (useFloatReg(currentInterval->registerType))
rangeEndRefPosition = refPosition;
else
rangeEndRefPosition = currentInterval->lastRefPosition;
// If we have a relatedInterval that is not currently occupying a register,
// and whose lifetime begins after this one ends,
// we want to try to select a register that will cover its lifetime.
if ((relatedInterval != nullptr) && (relatedInterval->assignedReg == nullptr) &&
    (relatedInterval->getNextRefLocation() >= rangeEndRefPosition->nodeLocation))
lastRefPosition = relatedInterval->lastRefPosition;
preferCalleeSave = relatedInterval->preferCalleeSave;
// If this has a delayed use (due to being used in a rmw position of a
// non-commutative operator), its endLocation is delayed until the "def"
// position, which is one location past the use (getRefEndLocation() takes care of this).
LsraLocation rangeEndLocation = rangeEndRefPosition->getRefEndLocation();
LsraLocation lastLocation = lastRefPosition->getRefEndLocation();
regNumber prevReg = REG_NA;
if (currentInterval->assignedReg)
bool useAssignedReg = false;
// This was an interval that was previously allocated to the given
// physical register, and we should try to allocate it to that register
// again, if possible and reasonable.
// Use it preemptively (i.e. before checking other available regs)
// only if it is preferred and available.
RegRecord* regRec = currentInterval->assignedReg;
prevReg = regRec->regNum;
regMaskTP prevRegBit = genRegMask(prevReg);
// Is it in the preferred set of regs?
if ((prevRegBit & preferences) != RBM_NONE)
// Is it currently available?
LsraLocation nextPhysRefLoc;
if (registerIsAvailable(regRec, currentLocation, &nextPhysRefLoc, currentInterval->registerType))
// If the register is next referenced at this location, only use it if
// this has a fixed reg requirement (i.e. this is the reference that caused
// the FixedReg ref to be created)
if (!regRec->conflictingFixedRegReference(refPosition))
useAssignedReg = true;
if (useAssignedReg)
regNumber foundReg = prevReg;
assignPhysReg(regRec, currentInterval);
refPosition->registerAssignment = genRegMask(foundReg);
return foundReg;
else
// Don't keep trying to allocate to this register
currentInterval->assignedReg = nullptr;
RegRecord* availablePhysRegInterval = nullptr;
Interval* intervalToUnassign = nullptr;
// Each register will receive a score which is the sum of the scoring criteria below.
// These were selected on the assumption that they will have an impact on the "goodness"
// of a register selection, and have been tuned to a certain extent by observing the impact
// of the ordering on asmDiffs. However, there is probably much more room for tuning,
// and perhaps additional criteria.
//
// These are FLAGS (bits) so that we can easily order them and add them together.
// If the scores are equal, but one covers more of the current interval's range,
// then it wins. Otherwise, the one encountered earlier in the regOrder wins.
enum RegisterScore
{
    VALUE_AVAILABLE    = 0x40, // It is a constant value that is already in an acceptable register.
    COVERS             = 0x20, // It is in the interval's preference set and it covers the entire lifetime.
    OWN_PREFERENCE     = 0x10, // It is in the preference set of this interval.
    COVERS_RELATED     = 0x08, // It is in the preference set of the related interval and covers the entire lifetime.
    RELATED_PREFERENCE = 0x04, // It is in the preference set of the related interval.
    CALLER_CALLEE      = 0x02, // It is in the right "set" for the interval (caller or callee-save).
    UNASSIGNED         = 0x01, // It is not currently assigned to an inactive interval.
};
int bestScore = 0;
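// A worked example (illustrative only): a free register that is in this
// interval's preference set, in the right caller/callee set, and unassigned
// scores OWN_PREFERENCE + CALLER_CALLEE + UNASSIGNED = 0x10 + 0x02 + 0x01 = 0x13;
// one that additionally COVERS the whole lifetime scores 0x33 and wins. Equal
// scores are broken by coverage, and then by position in regOrder, as noted above.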
5367 // Compute the best possible score so we can stop looping early if we find it.
5368 // TODO-Throughput: At some point we may want to short-circuit the computation of each score, but
5369 // probably not until we've tuned the order of these criteria. At that point,
5370 // we'll need to avoid the short-circuit if we've got a stress option to reverse
5372 int bestPossibleScore = COVERS + UNASSIGNED + OWN_PREFERENCE + CALLER_CALLEE;
5373 if (relatedPreferences != RBM_NONE)
5375 bestPossibleScore |= RELATED_PREFERENCE + COVERS_RELATED;
5378 LsraLocation bestLocation = MinLocation;
5380 // In non-debug builds, this will simply get optimized away
5381 bool reverseSelect = false;
5383 reverseSelect = doReverseSelect();
5386 // An optimization for the common case where there is only one candidate -
5387 // avoid looping over all the other registers
5389 regNumber singleReg = REG_NA;
5391 if (genMaxOneBit(candidates))
5394 singleReg = genRegNumFromMask(candidates);
5395 regOrder = &singleReg;
5398 for (unsigned i = 0; i < regOrderSize && (candidates != RBM_NONE); i++)
5400 regNumber regNum = regOrder[i];
5401 regMaskTP candidateBit = genRegMask(regNum);
5403 if (!(candidates & candidateBit))
5408 candidates &= ~candidateBit;
5410 RegRecord* physRegRecord = getRegisterRecord(regNum);
5413 LsraLocation nextPhysRefLocation = MaxLocation;
5415 // By chance, is this register already holding this interval, as a copyReg or having
5416 // been restored as inactive after a kill?
5417 if (physRegRecord->assignedInterval == currentInterval)
5419 availablePhysRegInterval = physRegRecord;
5420 intervalToUnassign = nullptr;
5424 // Find the next RefPosition of the physical register
5425 if (!registerIsAvailable(physRegRecord, currentLocation, &nextPhysRefLocation, regType))
5430 // If the register is next referenced at this location, only use it if
5431 // this has a fixed reg requirement (i.e. this is the reference that caused
5432 // the FixedReg ref to be created)
5434 if (physRegRecord->conflictingFixedRegReference(refPosition))
5439 // If this is a definition of a constant interval, check to see if its value is already in this register.
5440 if (currentInterval->isConstant && RefTypeIsDef(refPosition->refType) &&
5441 (physRegRecord->assignedInterval != nullptr) && physRegRecord->assignedInterval->isConstant)
5443 noway_assert(refPosition->treeNode != nullptr);
5444 GenTree* otherTreeNode = physRegRecord->assignedInterval->firstRefPosition->treeNode;
5445 noway_assert(otherTreeNode != nullptr);
5447 if (refPosition->treeNode->OperGet() == otherTreeNode->OperGet())
5449 switch (otherTreeNode->OperGet())
5452 if ((refPosition->treeNode->AsIntCon()->IconValue() ==
5453 otherTreeNode->AsIntCon()->IconValue()) &&
5454 (varTypeGCtype(refPosition->treeNode) == varTypeGCtype(otherTreeNode)))
5456 #ifdef _TARGET_64BIT_
5457 // If the constant is negative, only reuse registers of the same type.
5458 // This is because, on a 64-bit system, we do not sign-extend immediates in registers to
5459 // 64-bits unless they are actually longs, as this requires a longer instruction.
5460 // This doesn't apply to a 32-bit system, on which long values occupy multiple registers.
5461 // (We could sign-extend, but we would have to always sign-extend, because if we reuse more
5462 // than once, we won't have access to the instruction that originally defines the constant).
5463 if ((refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()) ||
5464 (refPosition->treeNode->AsIntCon()->IconValue() >= 0))
5465 #endif // _TARGET_64BIT_
5467 score |= VALUE_AVAILABLE;
5473 // For floating point constants, the values must be identical, not simply compare
5474 // equal. So we compare the bits.
5475 if (refPosition->treeNode->AsDblCon()->isBitwiseEqual(otherTreeNode->AsDblCon()) &&
5476 (refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()))
5478 score |= VALUE_AVAILABLE;
5483 // for all other 'otherTreeNode->OperGet()' kinds, we leave 'score' unchanged
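// As an illustration: if some register still holds the integer constant 0x2A from an
// earlier definition, and this RefPosition defines the constant 0x2A with the same
// GC-ness (and, on 64-bit targets, a compatible type), VALUE_AVAILABLE is set so that
// this register is preferred, and codegen can reuse its value rather than
// re-materializing the constant (see SetReuseRegVal below).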
5489 // If the nextPhysRefLocation is a fixedRef for the rangeEndRefPosition, increment it so that
5490 // we don't mistakenly treat it as failing to cover the live range.
5491 // This doesn't handle the case where earlier RefPositions for this Interval are also
5492 // FixedRefs of this regNum, but at least those are only interesting in the case where those
5493 // are "local last uses" of the Interval - otherwise the liveRange would interfere with the reg.
5494 if (nextPhysRefLocation == rangeEndLocation && rangeEndRefPosition->isFixedRefOfReg(regNum))
5496 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_INCREMENT_RANGE_END, currentInterval, regNum));
5497 nextPhysRefLocation++;
5500 if ((candidateBit & preferences) != RBM_NONE)
5502 score |= OWN_PREFERENCE;
5503 if (nextPhysRefLocation > rangeEndLocation)
5505 score |= COVERS;
5508 if (relatedInterval != nullptr && (candidateBit & relatedPreferences) != RBM_NONE)
5510 score |= RELATED_PREFERENCE;
5511 if (nextPhysRefLocation > relatedInterval->lastRefPosition->nodeLocation)
5513 score |= COVERS_RELATED;
5517 // If we had a fixed-reg def of a reg that will be killed before the use, prefer it to any other registers
5518 // with the same score. (Note that we haven't changed the original registerAssignment on the RefPosition).
5519 // Overload the RELATED_PREFERENCE value.
5520 else if (candidateBit == refPosition->registerAssignment)
5522 score |= RELATED_PREFERENCE;
5525 if ((preferCalleeSave && physRegRecord->isCalleeSave) || (!preferCalleeSave && !physRegRecord->isCalleeSave))
5527 score |= CALLER_CALLEE;
5530 // The register is considered unassigned if it has no assignedInterval, OR
5531 // if its next reference is beyond the range of this interval.
5532 if (physRegRecord->assignedInterval == nullptr ||
5533 physRegRecord->assignedInterval->getNextRefLocation() > lastLocation)
5535 score |= UNASSIGNED;
5538 bool foundBetterCandidate = false;
5540 if (score > bestScore)
5542 foundBetterCandidate = true;
5544 else if (score == bestScore)
5546 // Prefer a register that covers the range.
5547 if (bestLocation <= lastLocation)
5549 if (nextPhysRefLocation > bestLocation)
5551 foundBetterCandidate = true;
5554 // If both cover the range, prefer a register that is killed sooner (leaving the longer range register
5555 // available). If both cover the range and are also killed at the same location, prefer the one
5556 // that is the same as the previous assignment.
5557 else if (nextPhysRefLocation > lastLocation)
5559 if (nextPhysRefLocation < bestLocation)
5561 foundBetterCandidate = true;
5563 else if (nextPhysRefLocation == bestLocation && prevReg == regNum)
5565 foundBetterCandidate = true;
5571 if (doReverseSelect() && bestScore != 0)
5573 foundBetterCandidate = !foundBetterCandidate;
5577 if (foundBetterCandidate)
5579 bestLocation = nextPhysRefLocation;
5580 availablePhysRegInterval = physRegRecord;
5581 intervalToUnassign = physRegRecord->assignedInterval;
5585 // there is no way we can get a better score so break out
5586 if (!reverseSelect && score == bestPossibleScore && bestLocation == rangeEndLocation + 1)
5588 break;
5592 if (availablePhysRegInterval != nullptr)
5594 if (intervalToUnassign != nullptr)
5596 unassignPhysReg(availablePhysRegInterval, intervalToUnassign->recentRefPosition);
5597 if (bestScore & VALUE_AVAILABLE)
5599 assert(intervalToUnassign->isConstant);
5600 refPosition->treeNode->SetReuseRegVal();
5601 refPosition->treeNode->SetInReg();
5603 // If we considered this "unassigned" because this interval's lifetime ends before
5604 // the next ref, remember it.
5605 else if ((bestScore & UNASSIGNED) != 0 && intervalToUnassign != nullptr)
5607 availablePhysRegInterval->previousInterval = intervalToUnassign;
5612 assert((bestScore & VALUE_AVAILABLE) == 0);
5614 assignPhysReg(availablePhysRegInterval, currentInterval);
5615 foundReg = availablePhysRegInterval->regNum;
5616 regMaskTP foundRegMask = genRegMask(foundReg);
5617 refPosition->registerAssignment = foundRegMask;
5618 if (relatedInterval != nullptr)
5620 relatedInterval->updateRegisterPreferences(foundRegMask);
5627 //------------------------------------------------------------------------
5628 // allocateBusyReg: Find a busy register that satisfies the requirements for refPosition,
5629 // and that can be spilled.
5632 // current The interval for the current allocation
5633 // refPosition The RefPosition of the current Interval for which a register is being allocated
5634 // allocateIfProfitable If true, a reg may not be allocated if all other ref positions currently
5635 // occupying registers are more important than the 'refPosition'.
5638 // The regNumber allocated to the RefPosition. Returns REG_NA if no suitable register is found.
5640 // Note: Currently this routine uses weight and farthest distance of next reference
5641 // to select a ref position for spilling.
5642 // a) if allocateIfProfitable = false
5643 // The ref position chosen for spilling will be the lowest weight
5644 // of all, and if there is more than one ref position with the
5645 // same lowest weight, it chooses among them the one with the farthest
5646 // distance to its next reference.
5648 // b) if allocateIfProfitable = true
5649 // The ref position chosen for spilling will not only be the lowest weight
5650 // of all, but will also have a weight lower than 'refPosition'. If there is
5651 // no such ref position, a reg will not be allocated.
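// As an illustration: if two occupied registers both satisfy the candidates - one whose
// assigned interval has weight 4 and next reference at location 40, another with weight 2
// and next reference at location 30 - the second is chosen: weight dominates, and the
// distance to the next reference only breaks ties between equal weights.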
5652 regNumber LinearScan::allocateBusyReg(Interval* current, RefPosition* refPosition, bool allocateIfProfitable)
5654 regNumber foundReg = REG_NA;
5656 RegisterType regType = getRegisterType(current, refPosition);
5657 regMaskTP candidates = refPosition->registerAssignment;
5658 regMaskTP preferences = (current->registerPreferences & candidates);
5659 if (preferences == RBM_NONE)
5661 preferences = candidates;
5663 if (candidates == RBM_NONE)
5665 // This assumes only integer and floating point register types
5666 // if we target a processor with additional register types,
5667 // this would have to change
5668 candidates = allRegs(regType);
5672 candidates = stressLimitRegs(refPosition, candidates);
5675 // TODO-CQ: Determine whether/how to take preferences into account in addition to
5676 // preferring the one with the furthest ref position when considering
5677 // a candidate to spill
5678 RegRecord* farthestRefPhysRegRecord = nullptr;
5679 LsraLocation farthestLocation = MinLocation;
5680 LsraLocation refLocation = refPosition->nodeLocation;
5681 unsigned farthestRefPosWeight;
5682 if (allocateIfProfitable)
5684 // If allocating a reg is optional, we will consider those ref positions
5685 // whose weight is less than 'refPosition' for spilling.
5686 farthestRefPosWeight = getWeight(refPosition);
5690 // If allocating a reg is a must, we start off with max weight so
5691 // that the first spill candidate will be selected based on
5692 // farthest distance alone. Since we start off with farthestLocation
5693 // initialized to MinLocation, the first available ref position
5694 // will be selected as spill candidate and its weight as the
5695 // farthestRefPosWeight.
5696 farthestRefPosWeight = BB_MAX_WEIGHT;
5699 for (regNumber regNum : Registers(regType))
5701 regMaskTP candidateBit = genRegMask(regNum);
5702 if (!(candidates & candidateBit))
5704 continue;
5706 RegRecord* physRegRecord = getRegisterRecord(regNum);
5708 if (physRegRecord->isBusyUntilNextKill)
5710 continue;
5712 Interval* assignedInterval = physRegRecord->assignedInterval;
5714 // If there is a fixed reference at the same location (and it's not due to this reference),
5715 // don't use it.
5717 if (physRegRecord->conflictingFixedRegReference(refPosition))
5719 assert(candidates != candidateBit);
5720 continue;
5723 LsraLocation physRegNextLocation = MaxLocation;
5724 if (refPosition->isFixedRefOfRegMask(candidateBit))
5726 // Either there is a fixed reference due to this node, or one associated with a
5727 // fixed use fed by a def at this node.
5728 // In either case, we must use this register as it's the only candidate
5729 // TODO-CQ: At the time we allocate a register to a fixed-reg def, if it's not going
5730 // to remain live until the use, we should set the candidates to allRegs(regType)
5731 // to avoid a spill - codegen can then insert the copy.
5732 assert(candidates == candidateBit);
5734 // If a refPosition has a fixed reg as its candidate and is also marked
5735 // as allocateIfProfitable, we should allocate fixed reg only if the
5736 // weight of this ref position is greater than the weight of the ref
5737 // position to which fixed reg is assigned. Such a case would arise
5738 // on x86 under LSRA stress.
5739 if (!allocateIfProfitable)
5741 physRegNextLocation = MaxLocation;
5742 farthestRefPosWeight = BB_MAX_WEIGHT;
5747 physRegNextLocation = physRegRecord->getNextRefLocation();
5749 // If refPosition requires a fixed register, we should reject all others.
5750 // Otherwise, we will still evaluate all physRegs even though their next location is
5751 // not better than the farthestLocation found so far.
5753 // TODO: this method should be using an approach similar to tryAllocateFreeReg()
5754 // where it uses a regOrder array to avoid iterating over any but the single fixed register.
5756 if (refPosition->isFixedRegRef && physRegNextLocation < farthestLocation)
5758 continue;
5762 // If this register is not assigned to an interval, either
5763 // - it has a FixedReg reference at the current location that is not this reference, OR
5764 // - this is the special case of a fixed loReg, where this interval has a use at the same location
5765 // In either case, we cannot use it
5767 if (assignedInterval == nullptr)
5769 RefPosition* nextPhysRegPosition = physRegRecord->getNextRefPosition();
5771 #ifndef _TARGET_ARM64_
5772 // TODO-Cleanup: Revisit this after Issue #3524 is complete
5773 // On ARM64 the nodeLocation is not always == refLocation, so this assert is disabled for now.
5774 assert(nextPhysRegPosition->nodeLocation == refLocation && candidateBit != candidates);
5779 RefPosition* recentAssignedRef = assignedInterval->recentRefPosition;
5781 if (!assignedInterval->isActive)
5783 // The assigned interval has a reference at this location - otherwise, we would have found
5784 // this in tryAllocateFreeReg().
5785 // Note that we may or may not have actually handled the reference yet, so it could either
5786 // be recentAssignedRef, or the next reference.
5787 assert(recentAssignedRef != nullptr);
5788 if (recentAssignedRef->nodeLocation != refLocation)
5790 if (recentAssignedRef->nodeLocation + 1 == refLocation)
5792 assert(recentAssignedRef->delayRegFree);
5796 RefPosition* nextAssignedRef = recentAssignedRef->nextRefPosition;
5797 assert(nextAssignedRef != nullptr);
5798 assert(nextAssignedRef->nodeLocation == refLocation ||
5799 (nextAssignedRef->nodeLocation + 1 == refLocation && nextAssignedRef->delayRegFree));
5805 // If we have a recentAssignedRef, check that it is going to be OK to spill it
5807 // TODO-Review: Under what conditions would recentAssignedRef be null?
5808 unsigned recentAssignedRefWeight = BB_ZERO_WEIGHT;
5809 if (recentAssignedRef != nullptr)
5811 if (recentAssignedRef->nodeLocation == refLocation)
5813 // We can't spill a register that's being used at the current location
5814 RefPosition* physRegRef = physRegRecord->recentRefPosition;
5818 // If the candidate register is marked to be delayed (delayRegFree) at its most recent
5819 // reference, and that reference is at the previous location, we have to skip it,
5820 // since we can't spill this register.
5821 if (recentAssignedRef->delayRegFree && (refLocation == recentAssignedRef->nodeLocation + 1))
5823 continue;
5826 // We don't prefer to spill a register if the weight of recentAssignedRef > weight
5827 // of the spill candidate found so far. We would consider spilling a greater-weight
5828 // ref position only if the refPosition being allocated must have a reg.
5829 recentAssignedRefWeight = getWeight(recentAssignedRef);
5830 if (recentAssignedRefWeight > farthestRefPosWeight)
5832 continue;
5836 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
5837 LsraLocation nextLocation = assignedInterval->getNextRefLocation();
5839 // We should never spill a register that's occupied by an Interval with its next use at the current location.
5840 // Normally this won't occur (unless we actually had more uses in a single node than there are registers),
5841 // because we'll always find something with a later nextLocation, but it can happen in stress when
5842 // we have LSRA_SELECT_NEAREST.
5843 if ((nextLocation == refLocation) && !refPosition->isFixedRegRef && nextRefPosition->RequiresRegister())
5845 continue;
5848 if (nextLocation > physRegNextLocation)
5850 nextLocation = physRegNextLocation;
5853 bool isBetterLocation;
5856 if (doSelectNearest() && farthestRefPhysRegRecord != nullptr)
5858 isBetterLocation = (nextLocation <= farthestLocation);
5862 // This if-stmt is associated with the above else
5863 if (recentAssignedRefWeight < farthestRefPosWeight)
5865 isBetterLocation = true;
5869 // This would mean the weight of the spill ref position we found so far is equal
5870 // to the weight of the ref position that is being evaluated. In this case
5871 // we prefer to spill the ref position whose distance to its next reference is the farthest.
5873 assert(recentAssignedRefWeight == farthestRefPosWeight);
5875 // If allocateIfProfitable=true, the first spill candidate selected
5876 // will be based on weight alone. After we have found a spill
5877 // candidate whose weight is less than the 'refPosition', we will
5878 // consider farthest distance when there is a tie in weights.
5879 // This is to ensure that we don't spill a ref position whose
5880 // weight is equal to weight of 'refPosition'.
5881 if (allocateIfProfitable && farthestRefPhysRegRecord == nullptr)
5883 isBetterLocation = false;
5887 isBetterLocation = (nextLocation > farthestLocation);
5889 if (nextLocation > farthestLocation)
5891 isBetterLocation = true;
5893 else if (nextLocation == farthestLocation)
5895 // Both weight and distance are equal.
5896 // Prefer that ref position which is marked both reload and
5897 // allocate-if-profitable. These ref positions don't need
5898 // to be spilled, as they are already in memory and
5899 // codegen considers them as contained memory operands.
5900 isBetterLocation = (recentAssignedRef != nullptr) && recentAssignedRef->reload &&
5901 recentAssignedRef->AllocateIfProfitable();
5905 isBetterLocation = false;
5910 if (isBetterLocation)
5912 farthestLocation = nextLocation;
5913 farthestRefPhysRegRecord = physRegRecord;
5914 farthestRefPosWeight = recentAssignedRefWeight;
5919 if (allocateIfProfitable)
5921 // There may not be a spill candidate; if one is found,
5922 // its weight must be less than the weight of 'refPosition'.
5923 assert((farthestRefPhysRegRecord == nullptr) || (farthestRefPosWeight < getWeight(refPosition)));
5927 // Must have found a spill candidate.
5928 assert(farthestRefPhysRegRecord != nullptr);
5929 if ((farthestLocation == refLocation) && !refPosition->isFixedRegRef)
5931 Interval* assignedInterval = farthestRefPhysRegRecord->assignedInterval;
5932 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
5933 assert(!nextRefPosition->RequiresRegister());
5937 assert(farthestLocation > refLocation || refPosition->isFixedRegRef);
5942 if (farthestRefPhysRegRecord != nullptr)
5944 foundReg = farthestRefPhysRegRecord->regNum;
5945 unassignPhysReg(farthestRefPhysRegRecord, farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
5946 assignPhysReg(farthestRefPhysRegRecord, current);
5947 refPosition->registerAssignment = genRegMask(foundReg);
5952 refPosition->registerAssignment = RBM_NONE;
5958 // Grab a register to use for a copy, and then immediately use it.
5959 // This is called only for localVar intervals that already have a register
5960 // assignment that is not compatible with the current RefPosition.
5961 // This is not like regular assignment, because we don't want to change
5962 // any preferences or existing register assignments.
5963 // Prefer a free register that's got the earliest next use.
5964 // Otherwise, spill something with the farthest next use
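// In effect, the register returned is only a scratch location for this one reference:
// the interval's physReg/assignedReg are saved and restored around the allocation below,
// and the RefPosition is marked as a copyReg so that the copy is generated later.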
5966 regNumber LinearScan::assignCopyReg(RefPosition* refPosition)
5968 Interval* currentInterval = refPosition->getInterval();
5969 assert(currentInterval != nullptr);
5970 assert(currentInterval->isActive);
5972 bool foundFreeReg = false;
5973 RegRecord* bestPhysReg = nullptr;
5974 LsraLocation bestLocation = MinLocation;
5975 regMaskTP candidates = refPosition->registerAssignment;
5977 // Save the relatedInterval, if any, so that it doesn't get modified during allocation.
5978 Interval* savedRelatedInterval = currentInterval->relatedInterval;
5979 currentInterval->relatedInterval = nullptr;
5981 // We don't really want to change the default assignment,
5982 // so 1) pretend this isn't active, and 2) remember the old reg
5983 regNumber oldPhysReg = currentInterval->physReg;
5984 RegRecord* oldRegRecord = currentInterval->assignedReg;
5985 assert(oldRegRecord->regNum == oldPhysReg);
5986 currentInterval->isActive = false;
5988 regNumber allocatedReg = tryAllocateFreeReg(currentInterval, refPosition);
5989 if (allocatedReg == REG_NA)
5991 allocatedReg = allocateBusyReg(currentInterval, refPosition, false);
5994 // Now restore the old info
5995 currentInterval->relatedInterval = savedRelatedInterval;
5996 currentInterval->physReg = oldPhysReg;
5997 currentInterval->assignedReg = oldRegRecord;
5998 currentInterval->isActive = true;
6000 refPosition->copyReg = true;
6001 return allocatedReg;
6004 // Check if the register is already assigned to another interval; if it is, unassign the physical record,
6005 // then set its assignedInterval to 'interval'.
6007 void LinearScan::checkAndAssignInterval(RegRecord* regRec, Interval* interval)
6009 if (regRec->assignedInterval != nullptr && regRec->assignedInterval != interval)
6011 // This is allocated to another interval. Either it is inactive, or it was allocated as a
6012 // copyReg and is therefore not the "assignedReg" of the other interval. In the latter case,
6013 // we simply unassign it - in the former case we need to set the physReg on the interval to
6014 // REG_NA to indicate that it is no longer in that register.
6015 // The lack of checking for this case resulted in an assert in the retail version of System.dll,
6016 // in method SerialStream.GetDcbFlag.
6017 // Note that we can't check for the copyReg case, because we may have seen a more recent
6018 // RefPosition for the Interval that was NOT a copyReg.
6019 if (regRec->assignedInterval->assignedReg == regRec)
6021 assert(regRec->assignedInterval->isActive == false);
6022 regRec->assignedInterval->physReg = REG_NA;
6024 unassignPhysReg(regRec->regNum);
6027 regRec->assignedInterval = interval;
6030 // Assign the given physical register interval to the given interval
6031 void LinearScan::assignPhysReg(RegRecord* regRec, Interval* interval)
6033 regMaskTP assignedRegMask = genRegMask(regRec->regNum);
6034 compiler->codeGen->regSet.rsSetRegsModified(assignedRegMask DEBUGARG(dumpTerse));
6036 checkAndAssignInterval(regRec, interval);
6037 interval->assignedReg = regRec;
6040 if ((interval->registerType == TYP_DOUBLE) && isFloatRegType(regRec->registerType))
6042 regNumber nextRegNum = REG_NEXT(regRec->regNum);
6043 RegRecord* nextRegRec = getRegisterRecord(nextRegNum);
6045 checkAndAssignInterval(nextRegRec, interval);
6047 #endif // _TARGET_ARM_
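// On ARM, a TYP_DOUBLE interval occupies a consecutive pair of float registers, so the
// block above records the interval in the RegRecord of the second half as well.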
6049 interval->physReg = regRec->regNum;
6050 interval->isActive = true;
6051 if (interval->isLocalVar)
6053 // Prefer this register for future references
6054 interval->updateRegisterPreferences(assignedRegMask);
6058 //------------------------------------------------------------------------
6059 // setIntervalAsSplit: Set this Interval as being split
6062 // interval - The Interval which is being split
6068 // The given Interval will be marked as split, and it will be added to the
6069 // set of splitOrSpilledVars.
6072 // "interval" must be a lclVar interval, as tree temps are never split.
6073 // This is asserted in the call to getVarIndex().
6075 void LinearScan::setIntervalAsSplit(Interval* interval)
6077 if (interval->isLocalVar)
6079 unsigned varIndex = interval->getVarIndex(compiler);
6080 if (!interval->isSplit)
6082 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6086 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6089 interval->isSplit = true;
6092 //------------------------------------------------------------------------
6093 // setIntervalAsSpilled: Set this Interval as being spilled
6096 // interval - The Interval which is being spilled
6102 // The given Interval will be marked as spilled, and it will be added
6103 // to the set of splitOrSpilledVars.
6105 void LinearScan::setIntervalAsSpilled(Interval* interval)
6107 if (interval->isLocalVar)
6109 unsigned varIndex = interval->getVarIndex(compiler);
6110 if (!interval->isSpilled)
6112 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6116 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6119 interval->isSpilled = true;
6122 //------------------------------------------------------------------------
6123 // spill: Spill this Interval between "fromRefPosition" and "toRefPosition"
6126 // fromRefPosition - The RefPosition at which the Interval is to be spilled
6127 // toRefPosition - The RefPosition at which it must be reloaded
6133 // fromRefPosition and toRefPosition must not be null
6135 void LinearScan::spillInterval(Interval* interval, RefPosition* fromRefPosition, RefPosition* toRefPosition)
6137 assert(fromRefPosition != nullptr && toRefPosition != nullptr);
6138 assert(fromRefPosition->getInterval() == interval && toRefPosition->getInterval() == interval);
6139 assert(fromRefPosition->nextRefPosition == toRefPosition);
6141 if (!fromRefPosition->lastUse)
6143 // Lcl var def/use ref positions, even if reg-optional, should be marked as
6144 // spillAfter if they are not allocated a register.
6145 if (!fromRefPosition->RequiresRegister() && !(interval->isLocalVar && fromRefPosition->IsActualRef()))
6147 fromRefPosition->registerAssignment = RBM_NONE;
6151 fromRefPosition->spillAfter = true;
6154 assert(toRefPosition != nullptr);
6159 dumpLsraAllocationEvent(LSRA_EVENT_SPILL, interval);
6163 INTRACK_STATS(updateLsraStat(LSRA_STAT_SPILL, fromRefPosition->bbNum));
6165 interval->isActive = false;
6166 setIntervalAsSpilled(interval);
6168 // If fromRefPosition occurs before the beginning of this block, mark this as living in the stack
6169 // on entry to this block.
6170 if (fromRefPosition->nodeLocation <= curBBStartLocation)
6172 // This must be a lclVar interval
6173 assert(interval->isLocalVar);
6174 setInVarRegForBB(curBBNum, interval->varNum, REG_STK);
6178 //------------------------------------------------------------------------
6179 // unassignPhysRegNoSpill: Unassign the given physical register record from
6180 // an active interval, without spilling.
6183 // regRec - the RegRecord to be unassigned
6189 // The assignedInterval must not be null, and must be active.
6192 // This method is used to unassign a register when an interval needs to be moved to a
6193 // different register, but not (yet) spilled.
6195 void LinearScan::unassignPhysRegNoSpill(RegRecord* regRec)
6197 Interval* assignedInterval = regRec->assignedInterval;
6198 assert(assignedInterval != nullptr && assignedInterval->isActive);
6199 assignedInterval->isActive = false;
6200 unassignPhysReg(regRec, nullptr);
6201 assignedInterval->isActive = true;
6204 //------------------------------------------------------------------------
6205 // checkAndClearInterval: Clear the assignedInterval for the given
6206 // physical register record
6209 // regRec - the physical RegRecord to be unassigned
6210 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6211 // or nullptr if we aren't spilling
6217 // see unassignPhysReg
6219 void LinearScan::checkAndClearInterval(RegRecord* regRec, RefPosition* spillRefPosition)
6221 Interval* assignedInterval = regRec->assignedInterval;
6222 assert(assignedInterval != nullptr);
6223 regNumber thisRegNum = regRec->regNum;
6225 if (spillRefPosition == nullptr)
6227 // Note that we can't assert for the copyReg case
6229 if (assignedInterval->physReg == thisRegNum)
6231 assert(assignedInterval->isActive == false);
6236 assert(spillRefPosition->getInterval() == assignedInterval);
6239 regRec->assignedInterval = nullptr;
6242 //------------------------------------------------------------------------
6243 // unassignPhysReg: Unassign the given physical register record, and spill the
6244 // assignedInterval at the given spillRefPosition, if any.
6247 // regRec - the RegRecord to be unassigned
6248 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6254 // The assignedInterval must not be null.
6255 // If spillRefPosition is null, the assignedInterval must be inactive, or not currently
6256 // assigned to this register (e.g. this is a copyReg for that Interval).
6257 // Otherwise, spillRefPosition must be associated with the assignedInterval.
6259 void LinearScan::unassignPhysReg(RegRecord* regRec, RefPosition* spillRefPosition)
6261 Interval* assignedInterval = regRec->assignedInterval;
6262 assert(assignedInterval != nullptr);
6263 checkAndClearInterval(regRec, spillRefPosition);
6264 regNumber thisRegNum = regRec->regNum;
6267 if ((assignedInterval->registerType == TYP_DOUBLE) && isFloatRegType(regRec->registerType))
6269 regNumber nextRegNum = REG_NEXT(regRec->regNum);
6270 RegRecord* nextRegRec = getRegisterRecord(nextRegNum);
6271 checkAndClearInterval(nextRegRec, spillRefPosition);
6273 #endif // _TARGET_ARM_
6276 if (VERBOSE && !dumpTerse)
6278 printf("unassigning %s: ", getRegName(regRec->regNum));
6279 assignedInterval->dump();
6284 RefPosition* nextRefPosition = nullptr;
6285 if (spillRefPosition != nullptr)
6287 nextRefPosition = spillRefPosition->nextRefPosition;
6290 if (assignedInterval->physReg != REG_NA && assignedInterval->physReg != thisRegNum)
6292 // This must have been a temporary copy reg, but we can't assert that because there
6293 // may have been intervening RefPositions that were not copyRegs.
6294 regRec->assignedInterval = nullptr;
6298 regNumber victimAssignedReg = assignedInterval->physReg;
6299 assignedInterval->physReg = REG_NA;
6301 bool spill = assignedInterval->isActive && nextRefPosition != nullptr;
6304 // If this is an active interval, it must have a recentRefPosition,
6305 // otherwise it would not be active
6306 assert(spillRefPosition != nullptr);
6309 // TODO-CQ: Enable this and insert an explicit GT_COPY (otherwise there's no way to communicate
6310 // to codegen that we want the copyReg to be the new home location).
6311 // If the last reference was a copyReg, and we're spilling the register
6312 // it was copied from, then make the copyReg the new primary location
6314 if (spillRefPosition->copyReg)
6316 regNumber copyFromRegNum = victimAssignedReg;
6317 regNumber copyRegNum = genRegNumFromMask(spillRefPosition->registerAssignment);
6318 if (copyFromRegNum == thisRegNum &&
6319 getRegisterRecord(copyRegNum)->assignedInterval == assignedInterval)
6321 assert(copyRegNum != thisRegNum);
6322 assignedInterval->physReg = copyRegNum;
6323 assignedInterval->assignedReg = this->getRegisterRecord(copyRegNum);
6329 // With JitStressRegs == 0x80 (LSRA_EXTEND_LIFETIMES), we may have a RefPosition
6330 // that is not marked lastUse even though the treeNode is a lastUse. In that case
6331 // we must not mark it for spill because the register will have been immediately freed
6332 // after use. While we could conceivably add special handling for this case in codegen,
6333 // it would be messy and undesirably cause the "bleeding" of LSRA stress modes outside of LSRA.
6335 if (extendLifetimes() && assignedInterval->isLocalVar && RefTypeIsUse(spillRefPosition->refType) &&
6336 spillRefPosition->treeNode != nullptr && (spillRefPosition->treeNode->gtFlags & GTF_VAR_DEATH) != 0)
6338 dumpLsraAllocationEvent(LSRA_EVENT_SPILL_EXTENDED_LIFETIME, assignedInterval);
6339 assignedInterval->isActive = false;
6341 // If the spillRefPosition occurs before the beginning of this block, it will have
6342 // been marked as living in this register on entry to this block, but we now need
6343 // to mark this as living on the stack.
6344 if (spillRefPosition->nodeLocation <= curBBStartLocation)
6346 setInVarRegForBB(curBBNum, assignedInterval->varNum, REG_STK);
6347 if (spillRefPosition->nextRefPosition != nullptr)
6349 setIntervalAsSpilled(assignedInterval);
6354 // Otherwise, we need to mark spillRefPosition as lastUse, or the interval
6355 // will remain active beyond its allocated range during the resolution phase.
6356 spillRefPosition->lastUse = true;
6362 spillInterval(assignedInterval, spillRefPosition, nextRefPosition);
6365 // Maintain the association with the interval, if it has more references.
6366 // Or, if we "remembered" an interval assigned to this register, restore it.
6367 if (nextRefPosition != nullptr)
6369 assignedInterval->assignedReg = regRec;
6371 else if (regRec->previousInterval != nullptr && regRec->previousInterval != assignedInterval &&
6372 regRec->previousInterval->assignedReg == regRec &&
6373 regRec->previousInterval->getNextRefPosition() != nullptr)
6375 regRec->assignedInterval = regRec->previousInterval;
6376 regRec->previousInterval = nullptr;
6380 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL, regRec->assignedInterval,
6385 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL, regRec->assignedInterval, thisRegNum);
6391 regRec->assignedInterval = nullptr;
6392 regRec->previousInterval = nullptr;
6396 //------------------------------------------------------------------------
6397 // spillGCRefs: Spill any GC-type intervals that are currently in registers.
6400 // killRefPosition - The RefPosition for the kill
6405 void LinearScan::spillGCRefs(RefPosition* killRefPosition)
6407 // For each physical register that can hold a GC type,
6408 // if it is occupied by an interval of a GC type, spill that interval.
6409 regMaskTP candidateRegs = killRefPosition->registerAssignment;
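// Each iteration below isolates the lowest set bit of the mask and then removes it,
// so every candidate register is examined exactly once.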
6410 while (candidateRegs != RBM_NONE)
6412 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6413 candidateRegs &= ~nextRegBit;
6414 regNumber nextReg = genRegNumFromMask(nextRegBit);
6415 RegRecord* regRecord = getRegisterRecord(nextReg);
6416 Interval* assignedInterval = regRecord->assignedInterval;
6417 if (assignedInterval == nullptr || (assignedInterval->isActive == false) ||
6418 !varTypeIsGC(assignedInterval->registerType))
6422 unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
6424 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DONE_KILL_GC_REFS, nullptr, REG_NA, nullptr));
6427 //------------------------------------------------------------------------
6428 // processBlockEndAllocation: Update var locations after 'currentBlock' has been allocated
6431 // currentBlock - the BasicBlock we have just finished allocating registers for
6437 // Calls processBlockEndLocations() to set the outVarToRegMap, then gets the next block,
6438 // and sets the inVarToRegMap appropriately.
6440 void LinearScan::processBlockEndAllocation(BasicBlock* currentBlock)
6442 assert(currentBlock != nullptr);
6443 processBlockEndLocations(currentBlock);
6444 markBlockVisited(currentBlock);
6446 // Get the next block to allocate.
6447 // When the last block in the method has successors, there will be a final "RefTypeBB" to
6448 // ensure that we get the varToRegMap set appropriately, but in that case we don't need
6449 // to worry about "nextBlock".
6450 BasicBlock* nextBlock = getNextBlock();
6451 if (nextBlock != nullptr)
6453 processBlockStartLocations(nextBlock, true);
6457 //------------------------------------------------------------------------
6458 // rotateBlockStartLocation: When in the LSRA_BLOCK_BOUNDARY_ROTATE stress mode, attempt to
6459 // "rotate" the register assignment for a localVar to the next higher
6460 // register that is available.
6463 // interval - the Interval for the variable whose register is getting rotated
6464 // targetReg - its register assignment from the predecessor block being used for live-in
6465 // availableRegs - registers available for use
6468 // The new register to use.
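// As an illustration: if the variable's register from the predecessor is edx, and eax,
// ecx and esi are available, the rotation selects esi - the next register above edx;
// if no higher register were available, it would wrap around to eax.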
6471 regNumber LinearScan::rotateBlockStartLocation(Interval* interval, regNumber targetReg, regMaskTP availableRegs)
6473 if (targetReg != REG_STK && getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE)
6475 // If we're rotating the register locations at block boundaries, try to use
6476 // the next higher register number of the appropriate register type.
6477 regMaskTP candidateRegs = allRegs(interval->registerType) & availableRegs;
6478 regNumber firstReg = REG_NA;
6479 regNumber newReg = REG_NA;
6480 while (candidateRegs != RBM_NONE)
6482 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6483 candidateRegs &= ~nextRegBit;
6484 regNumber nextReg = genRegNumFromMask(nextRegBit);
6485 if (nextReg > targetReg)
6490 else if (firstReg == REG_NA)
6495 if (newReg == REG_NA)
6497 assert(firstReg != REG_NA);
6498 newReg = firstReg;
6506 //------------------------------------------------------------------------
6507 // processBlockStartLocations: Update var locations on entry to 'currentBlock'
6510 // currentBlock - the BasicBlock we are entering, whose inVarToRegMap is being set
6511 // allocationPass - true if we are currently allocating registers (versus writing them back)
6517 // During the allocation pass, we use the outVarToRegMap of the selected predecessor to
6518 // determine the lclVar locations for the inVarToRegMap.
6519 // During the resolution (write-back) pass, we only modify the inVarToRegMap in cases where
6520 // a lclVar was spilled after the block had been completed.
6521 void LinearScan::processBlockStartLocations(BasicBlock* currentBlock, bool allocationPass)
6523 unsigned predBBNum = blockInfo[currentBlock->bbNum].predBBNum;
6524 VarToRegMap predVarToRegMap = getOutVarToRegMap(predBBNum);
6525 VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
6526 bool hasCriticalInEdge = blockInfo[currentBlock->bbNum].hasCriticalInEdge;
6528 VARSET_TP VARSET_INIT_NOCOPY(liveIn, currentBlock->bbLiveIn);
6530 if (getLsraExtendLifeTimes())
6532 VarSetOps::AssignNoCopy(compiler, liveIn, compiler->lvaTrackedVars);
6534 // If we are rotating register assignments at block boundaries, we want to make the
6535 // inactive registers available for the rotation.
6536 regMaskTP inactiveRegs = RBM_NONE;
6538 regMaskTP liveRegs = RBM_NONE;
6539 VARSET_ITER_INIT(compiler, iter, liveIn, varIndex);
6540 while (iter.NextElem(compiler, &varIndex))
6542 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
6543 if (!compiler->lvaTable[varNum].lvLRACandidate)
6547 regNumber targetReg;
6548 Interval* interval = getIntervalForLocalVar(varNum);
6549 RefPosition* nextRefPosition = interval->getNextRefPosition();
6550 assert(nextRefPosition != nullptr);
6554 targetReg = predVarToRegMap[varIndex];
6556 regNumber newTargetReg = rotateBlockStartLocation(interval, targetReg, (~liveRegs | inactiveRegs));
6557 if (newTargetReg != targetReg)
6559 targetReg = newTargetReg;
6560 setIntervalAsSplit(interval);
6563 inVarToRegMap[varIndex] = targetReg;
6565 else // !allocationPass (i.e. resolution/write-back pass)
6567 targetReg = inVarToRegMap[varIndex];
6568 // There are four cases that we need to consider during the resolution pass:
6569 // 1. This variable had a register allocated initially, and it was not spilled in the RefPosition
6570 // that feeds this block. In this case, both targetReg and predVarToRegMap[varIndex] will be targetReg.
6571 // 2. This variable had not been spilled prior to the end of predBB, but was later spilled, so
6572 // predVarToRegMap[varIndex] will be REG_STK, but targetReg is its former allocated value.
6573 // In this case, we will normally change it to REG_STK. We will update its "spilled" status when we
6574 // encounter it in resolveLocalRef().
6575 // 2a. If the next RefPosition is marked as a copyReg, we need to retain the allocated register. This is
6576 // because the copyReg RefPosition will not have recorded the "home" register, yet downstream
6577 // RefPositions rely on the correct "home" register.
6578 // 3. This variable was spilled before we reached the end of predBB. In this case, both targetReg and
6579 // predVarToRegMap[varIndex] will be REG_STK, and the next RefPosition will have been marked
6580 // as reload during allocation time if necessary (note that by the time we actually reach the next
6581 // RefPosition, we may be using a different predecessor, in which it is still in a register).
6582 // 4. This variable was spilled during the allocation of this block, so targetReg is REG_STK
6583 // (because we set inVarToRegMap at the time we spilled it), but predVarToRegMap[varIndex]
6584 // is not REG_STK. We retain the REG_STK value in the inVarToRegMap.
6585 if (targetReg != REG_STK)
6587 if (predVarToRegMap[varIndex] != REG_STK)
6590 assert(predVarToRegMap[varIndex] == targetReg ||
6591 getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE);
6593 else if (!nextRefPosition->copyReg)
6596 inVarToRegMap[varIndex] = REG_STK;
6597 targetReg = REG_STK;
6599 // Else case 2a. - retain targetReg.
6601 // Else case #3 or #4, we retain targetReg and nothing further to do or assert.
6603 if (interval->physReg == targetReg)
6605 if (interval->isActive)
6607 assert(targetReg != REG_STK);
6608 assert(interval->assignedReg != nullptr && interval->assignedReg->regNum == targetReg &&
6609 interval->assignedReg->assignedInterval == interval);
6610 liveRegs |= genRegMask(targetReg);
6614 else if (interval->physReg != REG_NA)
6616 // This can happen if we are using the locations from a basic block other than the
6617 // immediately preceding one - where the variable was in a different location.
6618 if (targetReg != REG_STK)
6620 // Unassign it from the register (it will get a new register below).
6621 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
6623 interval->isActive = false;
6624 unassignPhysReg(getRegisterRecord(interval->physReg), nullptr);
6628 // This interval was live in this register the last time we saw a reference to it,
6629 // but has since been displaced.
6630 interval->physReg = REG_NA;
6633 else if (allocationPass)
6635 // Keep the register assignment - if another var has it, it will get unassigned.
6636 // Otherwise, resolution will fix it up later, and it will be more
6637 // likely to match other assignments this way.
6638 interval->isActive = true;
6639 liveRegs |= genRegMask(interval->physReg);
6640 INDEBUG(inactiveRegs |= genRegMask(interval->physReg));
6641 inVarToRegMap[varIndex] = interval->physReg;
6645 interval->physReg = REG_NA;
6648 if (targetReg != REG_STK)
6650 RegRecord* targetRegRecord = getRegisterRecord(targetReg);
6651 liveRegs |= genRegMask(targetReg);
6652 if (!interval->isActive)
6654 interval->isActive = true;
6655 interval->physReg = targetReg;
6656 interval->assignedReg = targetRegRecord;
6658 Interval* assignedInterval = targetRegRecord->assignedInterval;
6659 if (assignedInterval != interval)
6661 // Is there another interval currently assigned to this register? If so unassign it.
6662 if (assignedInterval != nullptr)
6664 if (assignedInterval->assignedReg == targetRegRecord)
6666 // If the interval is active, it will be set to active when we reach its new
6667 // register assignment (which we must not yet have done, or it wouldn't still be
6668 // assigned to this register).
6669 assignedInterval->isActive = false;
6670 unassignPhysReg(targetRegRecord, nullptr);
6671 if (allocationPass && assignedInterval->isLocalVar &&
6672 inVarToRegMap[assignedInterval->getVarIndex(compiler)] == targetReg)
6674 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
6679 // This interval is no longer assigned to this register.
6680 targetRegRecord->assignedInterval = nullptr;
6683 assignPhysReg(targetRegRecord, interval);
6685 if (interval->recentRefPosition != nullptr && !interval->recentRefPosition->copyReg &&
6686 interval->recentRefPosition->registerAssignment != genRegMask(targetReg))
6688 interval->getNextRefPosition()->outOfOrder = true;
6693 // Unassign any registers that are no longer live.
6694 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
6696 if ((liveRegs & genRegMask(reg)) == 0)
6698 RegRecord* physRegRecord = getRegisterRecord(reg);
6699 Interval* assignedInterval = physRegRecord->assignedInterval;
6701 if (assignedInterval != nullptr)
6703 assert(assignedInterval->isLocalVar || assignedInterval->isConstant);
6704 if (!assignedInterval->isConstant && assignedInterval->assignedReg == physRegRecord)
6706 assignedInterval->isActive = false;
6707 if (assignedInterval->getNextRefPosition() == nullptr)
6709 unassignPhysReg(physRegRecord, nullptr);
6711 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
6715 // This interval may still be active, but was in another register in an
6716 // intervening block.
6717 physRegRecord->assignedInterval = nullptr;
6722 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
6725 //------------------------------------------------------------------------
6726 // processBlockEndLocations: Record the variables occupying registers after completing the current block.
6729 // currentBlock - the block we have just completed.
6735 // This must be called both during the allocation and resolution (write-back) phases.
6736 // This is because we need to have the outVarToRegMap locations in order to set the locations
6737 // at successor blocks during allocation time, but if lclVars are spilled after a block has been
6738 // completed, we need to record the REG_STK location for those variables at resolution time.
6740 void LinearScan::processBlockEndLocations(BasicBlock* currentBlock)
6742 assert(currentBlock != nullptr && currentBlock->bbNum == curBBNum);
6743 VarToRegMap outVarToRegMap = getOutVarToRegMap(curBBNum);
6745 VARSET_TP VARSET_INIT_NOCOPY(liveOut, currentBlock->bbLiveOut);
6747 if (getLsraExtendLifeTimes())
6749 VarSetOps::AssignNoCopy(compiler, liveOut, compiler->lvaTrackedVars);
6752 regMaskTP liveRegs = RBM_NONE;
6753 VARSET_ITER_INIT(compiler, iter, liveOut, varIndex);
6754 while (iter.NextElem(compiler, &varIndex))
6756 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
6757 Interval* interval = getIntervalForLocalVar(varNum);
6758 if (interval->isActive)
6760 assert(interval->physReg != REG_NA && interval->physReg != REG_STK);
6761 outVarToRegMap[varIndex] = interval->physReg;
6765 outVarToRegMap[varIndex] = REG_STK;
6768 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_END_BB));
6772 void LinearScan::dumpRefPositions(const char* str)
6774 printf("------------\n");
6775 printf("REFPOSITIONS %s: \n", str);
6776 printf("------------\n");
6777 for (auto& refPos : refPositions)
6784 bool LinearScan::registerIsFree(regNumber regNum, RegisterType regType)
6786 RegRecord* physRegRecord = getRegisterRecord(regNum);
6788 bool isFree = physRegRecord->isFree();
6791 if (isFree && regType == TYP_DOUBLE)
6793 isFree = getRegisterRecord(REG_NEXT(regNum))->isFree();
6795 #endif // _TARGET_ARM_
6800 //------------------------------------------------------------------------
6801 // LinearScan::freeRegister: Make a register available for use
6804 // physRegRecord - the RegRecord for the register to be freed.
6811 // It may be that the RegRecord has already been freed, e.g. due to a kill,
6812 // in which case this method has no effect.
6815 // If there is currently an Interval assigned to this register, and it has
6816 // more references (i.e. this is a local last-use, but more uses and/or
6817 // defs remain), it will remain assigned to the physRegRecord. However, since
6818 // it is marked inactive, the register will be available, albeit less desirable to allocate.
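// As an illustration: if a lclVar's register is freed at a use that is the last within a
// block, but the variable is referenced again in a later block, the RegRecord keeps
// pointing at the lclVar's interval; a later reference can then be given the same
// register without a move, while other intervals remain free to claim it in the interim.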
6820 void LinearScan::freeRegister(RegRecord* physRegRecord)
6822 Interval* assignedInterval = physRegRecord->assignedInterval;
6823 // It may have already been freed by a "Kill"
6824 if (assignedInterval != nullptr)
6826 assignedInterval->isActive = false;
6827 // If this is a constant interval, whose value we may be able to reuse later,
6828 // don't unassign it until we need the register.
6829 if (!assignedInterval->isConstant)
6831 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
6832 // Unassign the register only if there are no more RefPositions, or the next
6833 // one is a def. Note that the latter condition doesn't actually ensure that
6834 // there aren't subsequent uses that could be reached by a def in the assigned
6835 // register, but is merely a heuristic to avoid tying up the register (or using
6836 // it when it's non-optimal). A better alternative would be to use SSA, so that
6837 // we wouldn't unnecessarily link separate live ranges to the same register.
6838 if (nextRefPosition == nullptr || RefTypeIsDef(nextRefPosition->refType))
6840 unassignPhysReg(physRegRecord, nullptr);
6846 void LinearScan::freeRegisters(regMaskTP regsToFree)
6848 if (regsToFree == RBM_NONE)
6853 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FREE_REGS));
6854 while (regsToFree != RBM_NONE)
6856 regMaskTP nextRegBit = genFindLowestBit(regsToFree);
6857 regsToFree &= ~nextRegBit;
6858 regNumber nextReg = genRegNumFromMask(nextRegBit);
6859 freeRegister(getRegisterRecord(nextReg));
6863 // Actual register allocation, accomplished by iterating over all of the previously
6864 // constructed Intervals
6865 // Loosely based on raAssignVars()
6867 void LinearScan::allocateRegisters()
6869 JITDUMP("*************** In LinearScan::allocateRegisters()\n");
6870 DBEXEC(VERBOSE, lsraDumpIntervals("before allocateRegisters"));
6872 // at start, nothing is active except for register args
6873 for (auto& interval : intervals)
6875 Interval* currentInterval = &interval;
6876 currentInterval->recentRefPosition = nullptr;
6877 currentInterval->isActive = false;
6878 if (currentInterval->isLocalVar)
6880 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
6881 if (varDsc->lvIsRegArg && currentInterval->firstRefPosition != nullptr)
6883 currentInterval->isActive = true;
6888 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
6890 getRegisterRecord(reg)->recentRefPosition = nullptr;
6891 getRegisterRecord(reg)->isActive = false;
6895 regNumber lastAllocatedReg = REG_NA;
6898 dumpRefPositions("BEFORE ALLOCATION");
6899 dumpVarRefPositions("BEFORE ALLOCATION");
6901 printf("\n\nAllocating Registers\n"
6902 "--------------------\n");
6905 dumpRegRecordHeader();
6906 // Now print an empty indent
6907 printf(indentFormat, "");
6912 BasicBlock* currentBlock = nullptr;
6914 LsraLocation prevLocation = MinLocation;
6915 regMaskTP regsToFree = RBM_NONE;
6916 regMaskTP delayRegsToFree = RBM_NONE;
6918 // This is the most recent RefPosition for which a register was allocated
6919 // - currently only used for DEBUG but maintained in non-debug, for clarity of code
6920 // (and will be optimized away because in non-debug spillAlways() unconditionally returns false)
6921 RefPosition* lastAllocatedRefPosition = nullptr;
6923 bool handledBlockEnd = false;
6925 for (auto& refPosition : refPositions)
6927 RefPosition* currentRefPosition = &refPosition;
6930 // Set the activeRefPosition to null until we're done with any boundary handling.
6931 activeRefPosition = nullptr;
6936 // We're really dumping the RegRecords "after" the previous RefPosition, but it's more convenient
6937 // to do this here, since there are a number of "continue"s in this loop.
6947 // This is the previousRefPosition of the current Referent, if any
6948 RefPosition* previousRefPosition = nullptr;
6950 Interval* currentInterval = nullptr;
6951 Referenceable* currentReferent = nullptr;
6952 bool isInternalRef = false;
6953 RefType refType = currentRefPosition->refType;
6955 currentReferent = currentRefPosition->referent;
6957 if (spillAlways() && lastAllocatedRefPosition != nullptr && !lastAllocatedRefPosition->isPhysRegRef &&
6958 !lastAllocatedRefPosition->getInterval()->isInternal &&
6959 (RefTypeIsDef(lastAllocatedRefPosition->refType) || lastAllocatedRefPosition->getInterval()->isLocalVar))
6961 assert(lastAllocatedRefPosition->registerAssignment != RBM_NONE);
6962 RegRecord* regRecord = lastAllocatedRefPosition->getInterval()->assignedReg;
6963 unassignPhysReg(regRecord, lastAllocatedRefPosition);
6964 // Now set lastAllocatedRefPosition to null, so that we don't try to spill it again
6965 lastAllocatedRefPosition = nullptr;
6968 // We wait to free any registers until we've completed all the
6969 // uses for the current node.
6970 // This avoids reusing registers too soon.
6971 // We free before the last true def (after all the uses & internal
6972 // registers), and then again at the beginning of the next node.
6973 // This is made easier by assigning two LsraLocations per node - one
6974 // for all the uses, internal registers & all but the last def, and
6975 // another for the final def (if any).
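// As an illustration: for a node spanning locations 10 and 11, the sources and internal
// registers are used at location 10 and the result is defined at location 11; registers
// freed at location 10 are available again for the def, while any "delayRegFree"
// registers are accumulated in delayRegsToFree and not freed until the next location.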
6977 LsraLocation currentLocation = currentRefPosition->nodeLocation;
6979 if ((regsToFree | delayRegsToFree) != RBM_NONE)
6981 bool doFreeRegs = false;
6982 // Free at a new location, or at a basic block boundary
6983 if (currentLocation > prevLocation || refType == RefTypeBB)
6990 freeRegisters(regsToFree);
6991 regsToFree = delayRegsToFree;
6992 delayRegsToFree = RBM_NONE;
6995 prevLocation = currentLocation;
6997 // get previous refposition, then current refpos is the new previous
6998 if (currentReferent != nullptr)
7000 previousRefPosition = currentReferent->recentRefPosition;
7001 currentReferent->recentRefPosition = currentRefPosition;
7005 assert((refType == RefTypeBB) || (refType == RefTypeKillGCRefs));
7008 // For the purposes of register resolution, we handle the DummyDefs before
7009 // the block boundary - so the RefTypeBB is after all the DummyDefs.
7010 // However, for the purposes of allocation, we want to handle the block
7011 // boundary first, so that we can free any registers occupied by lclVars
7012 // that aren't live in the next block and make them available for the DummyDefs.
7015 if (!handledBlockEnd && (refType == RefTypeBB || refType == RefTypeDummyDef))
7017 // Free any delayed regs (now in regsToFree) before processing the block boundary
7018 freeRegisters(regsToFree);
7019 regsToFree = RBM_NONE;
7020 handledBlockEnd = true;
7021 curBBStartLocation = currentRefPosition->nodeLocation;
7022 if (currentBlock == nullptr)
7024 currentBlock = startBlockSequence();
7028 processBlockEndAllocation(currentBlock);
7029 currentBlock = moveToNextBlock();
7032 if (VERBOSE && currentBlock != nullptr && !dumpTerse)
7034 currentBlock->dspBlockHeader(compiler);
7041 activeRefPosition = currentRefPosition;
7046 dumpRefPositionShort(currentRefPosition, currentBlock);
7050 currentRefPosition->dump();
7055 if (refType == RefTypeBB)
7057 handledBlockEnd = false;
7061 if (refType == RefTypeKillGCRefs)
7063 spillGCRefs(currentRefPosition);
7067 // If this is a FixedReg, disassociate any inactive constant interval from this register.
7068 // Otherwise, do nothing.
7069 if (refType == RefTypeFixedReg)
7071 RegRecord* regRecord = currentRefPosition->getReg();
7072 if (regRecord->assignedInterval != nullptr && !regRecord->assignedInterval->isActive &&
7073 regRecord->assignedInterval->isConstant)
7075 regRecord->assignedInterval = nullptr;
7077 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FIXED_REG, nullptr, currentRefPosition->assignedReg()));
7081 // If this is an exposed use, do nothing - this is merely a placeholder to attempt to
7082 // ensure that a register is allocated for the full lifetime. The resolution logic
7083 // will take care of moving to the appropriate register if needed.
7085 if (refType == RefTypeExpUse)
7087 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_EXP_USE));
7091 regNumber assignedRegister = REG_NA;
7093 if (currentRefPosition->isIntervalRef())
7095 currentInterval = currentRefPosition->getInterval();
7096 assignedRegister = currentInterval->physReg;
7098 if (VERBOSE && !dumpTerse)
7100 currentInterval->dump();
7104 // Identify the special cases where we decide up-front not to allocate
7105 bool allocate = true;
7106 bool didDump = false;
7108 if (refType == RefTypeParamDef || refType == RefTypeZeroInit)
7110 // For a ParamDef with a weighted refCount less than unity, don't enregister it at entry.
7111 // TODO-CQ: Consider doing this only for stack parameters, since otherwise we may be needlessly
7112 // inserting a store.
7113 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
7114 assert(varDsc != nullptr);
7115 if (refType == RefTypeParamDef && varDsc->lvRefCntWtd <= BB_UNITY_WEIGHT)
7117 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_ENTRY_REG_ALLOCATED, currentInterval));
7120 setIntervalAsSpilled(currentInterval);
7122 // If it has no actual references, mark it as "lastUse"; since such references are not part
7123 // of any flow, they won't have been marked during dataflow. Otherwise, if we allocate a
7124 // register, we won't unassign it.
7125 else if (currentRefPosition->nextRefPosition == nullptr)
7127 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ZERO_REF, currentInterval));
7128 currentRefPosition->lastUse = true;
7132 else if (refType == RefTypeUpperVectorSaveDef || refType == RefTypeUpperVectorSaveUse)
7134 Interval* lclVarInterval = currentInterval->relatedInterval;
7135 if (lclVarInterval->physReg == REG_NA)
7140 #endif // FEATURE_SIMD
7142 if (allocate == false)
7144 if (assignedRegister != REG_NA)
7146 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
7150 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7153 currentRefPosition->registerAssignment = RBM_NONE;
7157 if (currentInterval->isSpecialPutArg)
7159 assert(!currentInterval->isLocalVar);
7160 Interval* srcInterval = currentInterval->relatedInterval;
7161 assert(srcInterval->isLocalVar);
7162 if (refType == RefTypeDef)
7164 assert(srcInterval->recentRefPosition->nodeLocation == currentLocation - 1);
7165 RegRecord* physRegRecord = srcInterval->assignedReg;
7167 // For a putarg_reg to be special, its next use location has to be the same
7168 // as fixed reg's next kill location. Otherwise, if source lcl var's next use
7169 // is after the kill of fixed reg but before putarg_reg's next use, fixed reg's
7170 // kill would lead to spill of source but not the putarg_reg if it were treated as special.
7172 if (srcInterval->isActive &&
7173 genRegMask(srcInterval->physReg) == currentRefPosition->registerAssignment &&
7174 currentInterval->getNextRefLocation() == physRegRecord->getNextRefLocation())
7176 assert(physRegRecord->regNum == srcInterval->physReg);
7178 // Special putarg_reg acts as a pass-thru since both source lcl var
7179 // and putarg_reg have the same register allocated. Physical reg
7180 // record of the reg continues to point to the source lcl var's interval
7181 // instead of to putarg_reg's interval. So if the reg allocated to the
7182 // source lcl var were spilled and reallocated to another
7183 // tree node before its use at the call node, it would lead to a spill of the
7184 // lcl var instead of the putarg_reg, since the physical reg record points
7185 // to the lcl var's interval. As a result, the arg reg would get trashed, leading
7186 // to bad codegen. The assumption here is that source lcl var of a
7187 // special putarg_reg doesn't get spilled and re-allocated prior to
7188 // its use at the call node. This is ensured by marking physical reg
7189 // record as busy until next kill.
7190 physRegRecord->isBusyUntilNextKill = true;
7194 currentInterval->isSpecialPutArg = false;
7197 // If this is still a SpecialPutArg, continue;
7198 if (currentInterval->isSpecialPutArg)
7200 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, currentInterval,
7201 currentRefPosition->assignedReg()));
7206 if (assignedRegister == REG_NA && RefTypeIsUse(refType))
7208 currentRefPosition->reload = true;
7209 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, currentInterval, assignedRegister));
7213 regMaskTP assignedRegBit = RBM_NONE;
7214 bool isInRegister = false;
7215 if (assignedRegister != REG_NA)
7217 isInRegister = true;
7218 assignedRegBit = genRegMask(assignedRegister);
7219 if (!currentInterval->isActive)
7221 // If this is a use, it must have started the block on the stack, but the register
7222 // was available for use so we kept the association.
7223 if (RefTypeIsUse(refType))
7225 assert(inVarToRegMaps[curBBNum][currentInterval->getVarIndex(compiler)] == REG_STK &&
7226 previousRefPosition->nodeLocation <= curBBStartLocation);
7227 isInRegister = false;
7231 currentInterval->isActive = true;
7234 assert(currentInterval->assignedReg != nullptr &&
7235 currentInterval->assignedReg->regNum == assignedRegister &&
7236 currentInterval->assignedReg->assignedInterval == currentInterval);
7239 // If this is a physical register, we unconditionally assign it to itself!
7240 if (currentRefPosition->isPhysRegRef)
7242 RegRecord* currentReg = currentRefPosition->getReg();
7243 Interval* assignedInterval = currentReg->assignedInterval;
7245 if (assignedInterval != nullptr)
7247 unassignPhysReg(currentReg, assignedInterval->recentRefPosition);
7249 currentReg->isActive = true;
7250 assignedRegister = currentReg->regNum;
7251 assignedRegBit = genRegMask(assignedRegister);
7252 if (refType == RefTypeKill)
7254 currentReg->isBusyUntilNextKill = false;
7257 else if (previousRefPosition != nullptr)
7259 assert(previousRefPosition->nextRefPosition == currentRefPosition);
7260 assert(assignedRegister == REG_NA || assignedRegBit == previousRefPosition->registerAssignment ||
7261 currentRefPosition->outOfOrder || previousRefPosition->copyReg ||
7262 previousRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef);
7264 else if (assignedRegister != REG_NA)
7266 // Handle the case where this is a preassigned register (i.e. parameter).
7267 // We don't want to actually use the preassigned register if it's not
7268 // going to cover the lifetime - but we had to preallocate it to ensure
7269 // that it remained live.
7270 // TODO-CQ: At some point we may want to refine the analysis here, in case
7271 // it might be beneficial to keep it in this reg for PART of the lifetime
7272 if (currentInterval->isLocalVar)
7274 regMaskTP preferences = currentInterval->registerPreferences;
7275 bool keepAssignment = true;
7276 bool matchesPreferences = (preferences & genRegMask(assignedRegister)) != RBM_NONE;
7278 // Will the assigned register cover the lifetime? If not, does it at least
7279 // meet the preferences for the next RefPosition?
7280 RegRecord* physRegRecord = getRegisterRecord(currentInterval->physReg);
7281 RefPosition* nextPhysRegRefPos = physRegRecord->getNextRefPosition();
7282 if (nextPhysRegRefPos != nullptr &&
7283 nextPhysRegRefPos->nodeLocation <= currentInterval->lastRefPosition->nodeLocation)
7285 // Check to see if the existing assignment matches the preferences (e.g. callee save registers)
7286 // and ensure that the next use of this localVar does not occur after the nextPhysRegRefPos
7287 // There must be a next RefPosition, because we know that the Interval extends beyond the
7288 // nextPhysRegRefPos.
7289 RefPosition* nextLclVarRefPos = currentRefPosition->nextRefPosition;
7290 assert(nextLclVarRefPos != nullptr);
7291 if (!matchesPreferences || nextPhysRegRefPos->nodeLocation < nextLclVarRefPos->nodeLocation ||
7292 physRegRecord->conflictingFixedRegReference(nextLclVarRefPos))
7294 keepAssignment = false;
7297 else if (refType == RefTypeParamDef && !matchesPreferences)
7299 // Don't use the register, even if available, if it doesn't match the preferences.
7300 // Note that this case is only for ParamDefs, for which we haven't yet taken preferences
7301 // into account (we've just automatically got the initial location). In other cases,
7302 // we would already have put it in a preferenced register, if it was available.
7303 // TODO-CQ: Consider expanding this to check availability - that would duplicate
7304 // code here, but otherwise we may wind up in this register anyway.
7305 keepAssignment = false;
7308 if (keepAssignment == false)
7310 currentRefPosition->registerAssignment = allRegs(currentInterval->registerType);
7311 unassignPhysRegNoSpill(physRegRecord);
7313 // If the preferences are currently set to just this register, reset them to allRegs
// of the appropriate type (just as we just reset the registerAssignment for this
// RefPosition).
7316 // Otherwise, simply remove this register from the preferences, if it's there.
7318 if (currentInterval->registerPreferences == assignedRegBit)
7320 currentInterval->registerPreferences = currentRefPosition->registerAssignment;
7324 currentInterval->registerPreferences &= ~assignedRegBit;
7327 assignedRegister = REG_NA;
7328 assignedRegBit = RBM_NONE;
7333 if (assignedRegister != REG_NA)
7335 // If there is a conflicting fixed reference, insert a copy.
7336 RegRecord* physRegRecord = getRegisterRecord(assignedRegister);
7337 if (physRegRecord->conflictingFixedRegReference(currentRefPosition))
7339 // We may have already reassigned the register to the conflicting reference.
7340 // If not, we need to unassign this interval.
7341 if (physRegRecord->assignedInterval == currentInterval)
7343 unassignPhysRegNoSpill(physRegRecord);
7345 currentRefPosition->moveReg = true;
7346 assignedRegister = REG_NA;
7347 setIntervalAsSplit(currentInterval);
7348 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_MOVE_REG, currentInterval, assignedRegister));
7350 else if ((genRegMask(assignedRegister) & currentRefPosition->registerAssignment) != 0)
7352 currentRefPosition->registerAssignment = assignedRegBit;
7353 if (!currentReferent->isActive)
7355 // If we've got an exposed use at the top of a block, the
7356 // interval might not have been active. Otherwise if it's a use,
7357 // the interval must be active.
7358 if (refType == RefTypeDummyDef)
7360 currentReferent->isActive = true;
7361 assert(getRegisterRecord(assignedRegister)->assignedInterval == currentInterval);
7365 currentRefPosition->reload = true;
7368 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, currentInterval, assignedRegister));
7372 assert(currentInterval != nullptr);
7374 // It's already in a register, but not one we need.
7375 if (!RefTypeIsDef(currentRefPosition->refType))
7377 regNumber copyReg = assignCopyReg(currentRefPosition);
7378 assert(copyReg != REG_NA);
7379 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, currentInterval, copyReg));
7380 lastAllocatedRefPosition = currentRefPosition;
7381 if (currentRefPosition->lastUse)
7383 if (currentRefPosition->delayRegFree)
7385 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED, currentInterval,
7387 delayRegsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
7391 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE, currentInterval, assignedRegister));
7392 regsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
7395 // If this is a tree temp (non-localVar) interval, we will need an explicit move.
7396 if (!currentInterval->isLocalVar)
7398 currentRefPosition->moveReg = true;
7399 currentRefPosition->copyReg = false;
7405 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NEEDS_NEW_REG, nullptr, assignedRegister));
7406 regsToFree |= genRegMask(assignedRegister);
7407 // We want a new register, but we don't want this to be considered a spill.
7408 assignedRegister = REG_NA;
7409 if (physRegRecord->assignedInterval == currentInterval)
7411 unassignPhysRegNoSpill(physRegRecord);
7417 if (assignedRegister == REG_NA)
7419 bool allocateReg = true;
7421 if (currentRefPosition->AllocateIfProfitable())
// We can avoid allocating a register if it is the last use requiring a reload.
7424 if (currentRefPosition->lastUse && currentRefPosition->reload)
7426 allocateReg = false;
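// (A reg-optional last use that must otherwise be reloaded can instead be
// used directly from memory, so allocating a register would only add a reload.)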
// Under stress mode, don't attempt to allocate a reg to
// a reg-optional ref position.
7432 if (allocateReg && regOptionalNoAlloc())
7434 allocateReg = false;
7441 // Try to allocate a register
7442 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
7445 // If no register was found, and if the currentRefPosition must have a register,
7446 // then find a register to spill
7447 if (assignedRegister == REG_NA)
7449 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7450 if (refType == RefTypeUpperVectorSaveDef)
7452 // TODO-CQ: Determine whether copying to two integer callee-save registers would be profitable.
// The SaveDef position occurs after the Use of the args and at the same location as the
// Kill/Def positions of a call node. But the SaveDef position cannot use any of the arg
// regs, as they are needed for the call node.
7457 currentRefPosition->registerAssignment =
7458 (allRegs(TYP_FLOAT) & RBM_FLT_CALLEE_TRASH & ~RBM_FLTARG_REGS);
7459 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
7461 // There MUST be caller-save registers available, because they have all just been killed.
// Amd64 Windows: xmm4-xmm5 are guaranteed to be available as xmm0-xmm3 are used for passing args.
// Amd64 Unix: xmm8-xmm15 are guaranteed to be available as xmm0-xmm7 are used for passing args.
// X86 RyuJIT Windows: xmm4-xmm7 are guaranteed to be available.
7465 assert(assignedRegister != REG_NA);
// i) The reason we have to spill is that the SaveDef position is allocated after the Kill
//    positions of the call node are processed. Since callee-trash registers are killed by
//    the call node, we explicitly spill and unassign the register.
// ii) These will look a bit backward in the dump, but it's a pain to dump the alloc before
//     the spill.
7474 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
7475 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, currentInterval, assignedRegister));
7477 // Now set assignedRegister to REG_NA again so that we don't re-activate it.
7478 assignedRegister = REG_NA;
7481 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
7482 if (currentRefPosition->RequiresRegister() || currentRefPosition->AllocateIfProfitable())
7486 assignedRegister = allocateBusyReg(currentInterval, currentRefPosition,
7487 currentRefPosition->AllocateIfProfitable());
7490 if (assignedRegister != REG_NA)
7493 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_SPILLED_REG, currentInterval, assignedRegister));
7497 // This can happen only for those ref positions that are to be allocated
7498 // only if profitable.
7499 noway_assert(currentRefPosition->AllocateIfProfitable());
7501 currentRefPosition->registerAssignment = RBM_NONE;
7502 currentRefPosition->reload = false;
7503 setIntervalAsSpilled(currentInterval);
7505 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7510 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7511 currentRefPosition->registerAssignment = RBM_NONE;
7512 currentInterval->isActive = false;
7513 setIntervalAsSpilled(currentInterval);
7521 if (currentInterval->isConstant && (currentRefPosition->treeNode != nullptr) &&
7522 currentRefPosition->treeNode->IsReuseRegVal())
7524 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, assignedRegister, currentBlock);
7528 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, assignedRegister, currentBlock);
7534 if (refType == RefTypeDummyDef && assignedRegister != REG_NA)
7536 setInVarRegForBB(curBBNum, currentInterval->varNum, assignedRegister);
7539 // If we allocated a register, and this is a use of a spilled value,
7540 // it should have been marked for reload above.
7541 if (assignedRegister != REG_NA && RefTypeIsUse(refType) && !isInRegister)
7543 assert(currentRefPosition->reload);
7547 // If we allocated a register, record it
7548 if (currentInterval != nullptr && assignedRegister != REG_NA)
7550 assignedRegBit = genRegMask(assignedRegister);
7551 currentRefPosition->registerAssignment = assignedRegBit;
7552 currentInterval->physReg = assignedRegister;
7553 regsToFree &= ~assignedRegBit; // we'll set it again later if it's dead
7555 // If this interval is dead, free the register.
7556 // The interval could be dead if this is a user variable, or if the
7557 // node is being evaluated for side effects, or a call whose result
7558 // is not used, etc.
7559 if (currentRefPosition->lastUse || currentRefPosition->nextRefPosition == nullptr)
7561 assert(currentRefPosition->isIntervalRef());
7563 if (refType != RefTypeExpUse && currentRefPosition->nextRefPosition == nullptr)
7565 if (currentRefPosition->delayRegFree)
7567 delayRegsToFree |= assignedRegBit;
7568 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED));
7572 regsToFree |= assignedRegBit;
7573 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE));
7578 currentInterval->isActive = false;
7582 lastAllocatedRefPosition = currentRefPosition;
7586 // Free registers to clear associated intervals for resolution phase
7587 CLANG_FORMAT_COMMENT_ANCHOR;
7590 if (getLsraExtendLifeTimes())
7592 // If we have extended lifetimes, we need to make sure all the registers are freed.
7593 for (int regNumIndex = 0; regNumIndex <= REG_FP_LAST; regNumIndex++)
7595 RegRecord& regRecord = physRegs[regNumIndex];
7596 Interval* interval = regRecord.assignedInterval;
7597 if (interval != nullptr)
7599 interval->isActive = false;
unassignPhysReg(&regRecord, nullptr);
7607 freeRegisters(regsToFree | delayRegsToFree);
7615 // Dump the RegRecords after the last RefPosition is handled.
7620 dumpRefPositions("AFTER ALLOCATION");
7621 dumpVarRefPositions("AFTER ALLOCATION");
7623 // Dump the intervals that remain active
7624 printf("Active intervals at end of allocation:\n");
7626 // We COULD just reuse the intervalIter from above, but ArrayListIterator doesn't
7627 // provide a Reset function (!) - we'll probably replace this so don't bother
7630 for (auto& interval : intervals)
7632 if (interval.isActive)
7644 // LinearScan::resolveLocalRef
7646 // Update the graph for a local reference.
7647 // Also, track the register (if any) that is currently occupied.
7649 // treeNode: The lclVar that's being resolved
7650 // currentRefPosition: the RefPosition associated with the treeNode
7653 // This method is called for each local reference, during the resolveRegisters
7654 // phase of LSRA. It is responsible for keeping the following in sync:
7655 // - varDsc->lvRegNum (and lvOtherReg) contain the unique register location.
7656 // If it is not in the same register through its lifetime, it is set to REG_STK.
7657 // - interval->physReg is set to the assigned register
7658 // (i.e. at the code location which is currently being handled by resolveRegisters())
7659 // - interval->isActive is true iff the interval is live and occupying a register
7660 // - interval->isSpilled should have already been set to true if the interval is EVER spilled
7661 // - interval->isSplit is set to true if the interval does not occupy the same
7662 // register throughout the method
// - RegRecord->assignedInterval points to the interval which currently occupies the register
7665 // - For each lclVar node:
7666 // - gtRegNum/gtRegPair is set to the currently allocated register(s)
7667 // - GTF_REG_VAL is set if it is a use, and is in a register
// - GTF_SPILLED is set on a use if it must be reloaded prior to use (GTF_REG_VAL
//   will not be set, since the value is referenced from its home location)
// - GTF_SPILL is set if it must be spilled after use (GTF_REG_VAL may or may not
//   be set)
7673 // A copyReg is an ugly case where the variable must be in a specific (fixed) register,
7674 // but it currently resides elsewhere. The register allocator must track the use of the
7675 // fixed register, but it marks the lclVar node with the register it currently lives in
7676 // and the code generator does the necessary move.
7678 // Before beginning, the varDsc for each parameter must be set to its initial location.
7680 // NICE: Consider tracking whether an Interval is always in the same location (register/stack)
7681 // in which case it will require no resolution.
7683 void LinearScan::resolveLocalRef(BasicBlock* block, GenTreePtr treeNode, RefPosition* currentRefPosition)
7685 assert((block == nullptr) == (treeNode == nullptr));
7687 // Is this a tracked local? Or just a register allocated for loading
7688 // a non-tracked one?
7689 Interval* interval = currentRefPosition->getInterval();
7690 if (!interval->isLocalVar)
7694 interval->recentRefPosition = currentRefPosition;
7695 LclVarDsc* varDsc = interval->getLocalVar(compiler);
7697 // NOTE: we set the GTF_VAR_DEATH flag here unless we are extending lifetimes, in which case we write
7698 // this bit in checkLastUses. This is a bit of a hack, but is necessary because codegen requires
7699 // accurate last use info that is not reflected in the lastUse bit on ref positions when we are extending
7700 // lifetimes. See also the comments in checkLastUses.
7701 if ((treeNode != nullptr) && !extendLifetimes())
7703 if (currentRefPosition->lastUse)
7705 treeNode->gtFlags |= GTF_VAR_DEATH;
7709 treeNode->gtFlags &= ~GTF_VAR_DEATH;
7713 if (currentRefPosition->registerAssignment == RBM_NONE)
7715 assert(!currentRefPosition->RequiresRegister());
7716 assert(interval->isSpilled);
7718 varDsc->lvRegNum = REG_STK;
7719 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
7721 interval->assignedReg->assignedInterval = nullptr;
7723 interval->assignedReg = nullptr;
7724 interval->physReg = REG_NA;
7729 // In most cases, assigned and home registers will be the same
7730 // The exception is the copyReg case, where we've assigned a register
7731 // for a specific purpose, but will be keeping the register assignment
7732 regNumber assignedReg = currentRefPosition->assignedReg();
7733 regNumber homeReg = assignedReg;
// Undo any previous association with a physical register, UNLESS this is a copyReg.
7737 if (!currentRefPosition->copyReg)
7739 regNumber oldAssignedReg = interval->physReg;
7740 if (oldAssignedReg != REG_NA && assignedReg != oldAssignedReg)
7742 RegRecord* oldRegRecord = getRegisterRecord(oldAssignedReg);
7743 if (oldRegRecord->assignedInterval == interval)
7745 oldRegRecord->assignedInterval = nullptr;
7750 if (currentRefPosition->refType == RefTypeUse && !currentRefPosition->reload)
7752 // Was this spilled after our predecessor was scheduled?
7753 if (interval->physReg == REG_NA)
7755 assert(inVarToRegMaps[curBBNum][varDsc->lvVarIndex] == REG_STK);
7756 currentRefPosition->reload = true;
7760 bool reload = currentRefPosition->reload;
7761 bool spillAfter = currentRefPosition->spillAfter;
7763 // In the reload case we simply do not set GTF_REG_VAL, and it gets
7764 // referenced from the variable's home location.
7765 // This is also true for a pure def which is spilled.
7768 assert(currentRefPosition->refType != RefTypeDef);
7769 assert(interval->isSpilled);
7770 varDsc->lvRegNum = REG_STK;
7773 interval->physReg = assignedReg;
7776 // If there is no treeNode, this must be a RefTypeExpUse, in
7777 // which case we did the reload already
7778 if (treeNode != nullptr)
7780 treeNode->gtFlags |= GTF_SPILLED;
7783 if (currentRefPosition->AllocateIfProfitable())
7785 // This is a use of lclVar that is flagged as reg-optional
7786 // by lower/codegen and marked for both reload and spillAfter.
7787 // In this case we can avoid unnecessary reload and spill
7788 // by setting reg on lclVar to REG_STK and reg on tree node
7789 // to REG_NA. Codegen will generate the code by considering
7790 // it as a contained memory operand.
// Note that varDsc->lvRegNum has already been set to REG_STK above.
7793 interval->physReg = REG_NA;
7794 treeNode->gtRegNum = REG_NA;
7795 treeNode->gtFlags &= ~GTF_SPILLED;
7799 treeNode->gtFlags |= GTF_SPILL;
7805 assert(currentRefPosition->refType == RefTypeExpUse);
7808 else if (spillAfter && !RefTypeIsUse(currentRefPosition->refType))
7810 // In the case of a pure def, don't bother spilling - just assign it to the
7811 // stack. However, we need to remember that it was spilled.
7813 assert(interval->isSpilled);
7814 varDsc->lvRegNum = REG_STK;
7815 interval->physReg = REG_NA;
7816 if (treeNode != nullptr)
7818 treeNode->gtRegNum = REG_NA;
7823 // Not reload and Not pure-def that's spillAfter
7825 if (currentRefPosition->copyReg || currentRefPosition->moveReg)
7827 // For a copyReg or moveReg, we have two cases:
7828 // - In the first case, we have a fixedReg - i.e. a register which the code
7829 // generator is constrained to use.
7830 // The code generator will generate the appropriate move to meet the requirement.
7831 // - In the second case, we were forced to use a different register because of
7832 // interference (or JitStressRegs).
7833 // In this case, we generate a GT_COPY.
7834 // In either case, we annotate the treeNode with the register in which the value
7835 // currently lives. For moveReg, the homeReg is the new register (as assigned above).
7836 // But for copyReg, the homeReg remains unchanged.
7838 assert(treeNode != nullptr);
7839 treeNode->gtRegNum = interval->physReg;
7841 if (currentRefPosition->copyReg)
7843 homeReg = interval->physReg;
7847 assert(interval->isSplit);
7848 interval->physReg = assignedReg;
7851 if (!currentRefPosition->isFixedRegRef || currentRefPosition->moveReg)
7853 // This is the second case, where we need to generate a copy
7854 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(), currentRefPosition);
7859 interval->physReg = assignedReg;
7861 if (!interval->isSpilled && !interval->isSplit)
7863 if (varDsc->lvRegNum != REG_STK)
7865 // If the register assignments don't match, then this interval is split.
7866 if (varDsc->lvRegNum != assignedReg)
7868 setIntervalAsSplit(interval);
7869 varDsc->lvRegNum = REG_STK;
7874 varDsc->lvRegNum = assignedReg;
7880 if (treeNode != nullptr)
7882 treeNode->gtFlags |= GTF_SPILL;
7884 assert(interval->isSpilled);
7885 interval->physReg = REG_NA;
7886 varDsc->lvRegNum = REG_STK;
7889 // This value is in a register, UNLESS we already saw this treeNode
7890 // and marked it for reload
7891 if (treeNode != nullptr && !(treeNode->gtFlags & GTF_SPILLED))
7893 treeNode->gtFlags |= GTF_REG_VAL;
7897 // Update the physRegRecord for the register, so that we know what vars are in
7898 // regs at the block boundaries
7899 RegRecord* physRegRecord = getRegisterRecord(homeReg);
7900 if (spillAfter || currentRefPosition->lastUse)
7902 physRegRecord->assignedInterval = nullptr;
7903 interval->assignedReg = nullptr;
7904 interval->physReg = REG_NA;
7905 interval->isActive = false;
7909 interval->isActive = true;
7910 physRegRecord->assignedInterval = interval;
7911 interval->assignedReg = physRegRecord;
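//------------------------------------------------------------------------
// writeRegisters: Write the assigned register of a RefPosition back to its tree node.
//
// Arguments:
//    currentRefPosition - the RefPosition whose register assignment is being written back
//    tree               - the tree node to annotate (at the RefPosition's multi-reg index)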
7915 void LinearScan::writeRegisters(RefPosition* currentRefPosition, GenTree* tree)
7917 lsraAssignRegToTree(tree, currentRefPosition->assignedReg(), currentRefPosition->getMultiRegIdx());
7920 //------------------------------------------------------------------------
7921 // insertCopyOrReload: Insert a copy in the case where a tree node value must be moved
7922 // to a different register at the point of use (GT_COPY), or it is reloaded to a different register
7923 // than the one it was spilled from (GT_RELOAD).
7926 // block - basic block in which GT_COPY/GT_RELOAD is inserted.
7927 // tree - This is the node to copy or reload.
7928 // Insert copy or reload node between this node and its parent.
7929 // multiRegIdx - register position of tree node for which copy or reload is needed.
7930 // refPosition - The RefPosition at which copy or reload will take place.
7933 // The GT_COPY or GT_RELOAD will be inserted in the proper spot in execution order where the reload is to occur.
7935 // For example, for this tree (numbers are execution order, lower is earlier and higher is later):
//
//                  +---------+----------+
//                  |     parent (3)     |
//                  +---------+----------+
//                      /           \
//         +-------------------+   +----------------------+
//         |       x (1)       |   |        y (2)         |
//         +-------------------+   +----------------------+
//               "tree"
//
// generate this tree:
//
//                  +---------+----------+
//                  |     parent (4)     |
//                  +---------+----------+
//                      /           \
//         +-------------------+   +----------------------+
//         |   GT_RELOAD (3)   |   |        y (2)         |
//         +-------------------+   +----------------------+
//                   |
//         +-------------------+
//         |       x (1)       |
//         +-------------------+
7965 // Note in particular that the GT_RELOAD node gets inserted in execution order immediately before the parent of "tree",
7966 // which seems a bit weird since normally a node's parent (in this case, the parent of "x", GT_RELOAD in the "after"
7967 // picture) immediately follows all of its children (that is, normally the execution ordering is postorder).
7968 // The ordering must be this weird "out of normal order" way because the "x" node is being spilled, probably
7969 // because the expression in the tree represented above by "y" has high register requirements. We don't want
7970 // to reload immediately, of course. So we put GT_RELOAD where the reload should actually happen.
7972 // Note that GT_RELOAD is required when we reload to a different register than the one we spilled to. It can also be
7973 // used if we reload to the same register. Normally, though, in that case we just mark the node with GTF_SPILLED,
7974 // and the unspilling code automatically reuses the same register, and does the reload when it notices that flag
7975 // when considering a node's operands.
7977 void LinearScan::insertCopyOrReload(BasicBlock* block, GenTreePtr tree, unsigned multiRegIdx, RefPosition* refPosition)
7979 LIR::Range& blockRange = LIR::AsRange(block);
7982 bool foundUse = blockRange.TryGetUse(tree, &treeUse);
7985 GenTree* parent = treeUse.User();
7988 if (refPosition->reload)
7996 #if TRACK_LSRA_STATS
7997 updateLsraStat(LSRA_STAT_COPY_REG, block->bbNum);
// If the parent is a reload/copy node, then tree must be a multi-reg call node
// that has already had one of its registers spilled. This is because a multi-reg
// call node is the only node whose RefTypeDef positions get independently
// spilled or reloaded. It is possible that one of its RefTypeDef positions got
// spilled and the next use of it requires it to be in a different register.
//
// In this case, set the i-th position reg of the reload/copy node to the reg allocated
// for the copy/reload refPosition. Essentially, a copy/reload node will have a reg
// for each multi-reg position of its child. If there is a valid reg in the i-th
// position of a GT_COPY or GT_RELOAD node, then the corresponding result of its
// child needs to be copied or reloaded to that reg.
8012 if (parent->IsCopyOrReload())
8014 noway_assert(parent->OperGet() == oper);
8015 noway_assert(tree->IsMultiRegCall());
8016 GenTreeCall* call = tree->AsCall();
8017 GenTreeCopyOrReload* copyOrReload = parent->AsCopyOrReload();
8018 noway_assert(copyOrReload->GetRegNumByIdx(multiRegIdx) == REG_NA);
8019 copyOrReload->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8023 // Create the new node, with "tree" as its only child.
8024 var_types treeType = tree->TypeGet();
8027 // Check to see whether we need to move to a different register set.
8028 // This currently only happens in the case of SIMD vector types that are small enough (pointer size)
8029 // that they must be passed & returned in integer registers.
8030 // 'treeType' is the type of the register we are moving FROM,
8031 // and refPosition->registerAssignment is the mask for the register we are moving TO.
8032 // If they don't match, we need to reverse the type for the "move" node.
8034 if ((allRegs(treeType) & refPosition->registerAssignment) == 0)
8036 treeType = (useFloatReg(treeType)) ? TYP_I_IMPL : TYP_SIMD8;
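// For example, if a TYP_SIMD8 value was produced in a floating-point register but
// refPosition->registerAssignment names only integer registers, the copy/reload
// node is retyped to TYP_I_IMPL so that codegen emits a cross-register-file move.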
8038 #endif // FEATURE_SIMD
8040 GenTreeCopyOrReload* newNode = new (compiler, oper) GenTreeCopyOrReload(oper, treeType, tree);
8041 assert(refPosition->registerAssignment != RBM_NONE);
8042 newNode->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8043 newNode->gtLsraInfo.isLsraAdded = true;
8044 newNode->gtLsraInfo.isLocalDefUse = false;
8045 if (refPosition->copyReg)
8047 // This is a TEMPORARY copy
8048 assert(isCandidateLocalRef(tree));
8049 newNode->gtFlags |= GTF_VAR_DEATH;
8052 // Insert the copy/reload after the spilled node and replace the use of the original node with a use
8053 // of the copy/reload.
8054 blockRange.InsertAfter(tree, newNode);
8055 treeUse.ReplaceWith(compiler, newNode);
8059 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8060 //------------------------------------------------------------------------
// insertUpperVectorSaveAndReload: Insert code to save and restore the upper half of a vector that lives
//                                 in a callee-save register at the point of a kill (the upper half is
//                                 not preserved across the call).
8066 // tree - This is the node around which we will insert the Save & Reload.
8067 // It will be a call or some node that turns into a call.
8068 // refPosition - The RefTypeUpperVectorSaveDef RefPosition.
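// Notes:
//    For example, on a target where the callee-save xmm registers preserve only their
//    lower 128 bits across a call (e.g. Amd64 Windows), a TYP_SIMD32 local living in
//    such a register needs its upper half saved before the call and restored after it;
//    the SIMDIntrinsicUpperSave/SIMDIntrinsicUpperRestore nodes inserted below do that.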
8070 void LinearScan::insertUpperVectorSaveAndReload(GenTreePtr tree, RefPosition* refPosition, BasicBlock* block)
8072 Interval* lclVarInterval = refPosition->getInterval()->relatedInterval;
8073 assert(lclVarInterval->isLocalVar == true);
8074 LclVarDsc* varDsc = compiler->lvaTable + lclVarInterval->varNum;
8075 assert(varDsc->lvType == LargeVectorType);
8076 regNumber lclVarReg = lclVarInterval->physReg;
8077 if (lclVarReg == REG_NA)
8082 assert((genRegMask(lclVarReg) & RBM_FLT_CALLEE_SAVED) != RBM_NONE);
8084 regNumber spillReg = refPosition->assignedReg();
8085 bool spillToMem = refPosition->spillAfter;
8087 LIR::Range& blockRange = LIR::AsRange(block);
8089 // First, insert the save as an embedded statement before the call.
8091 GenTreePtr saveLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8092 saveLcl->gtLsraInfo.isLsraAdded = true;
8093 saveLcl->gtRegNum = lclVarReg;
8094 saveLcl->gtFlags |= GTF_REG_VAL;
8095 saveLcl->gtLsraInfo.isLocalDefUse = false;
8097 GenTreeSIMD* simdNode =
8098 new (compiler, GT_SIMD) GenTreeSIMD(LargeVectorSaveType, saveLcl, nullptr, SIMDIntrinsicUpperSave,
8099 varDsc->lvBaseType, genTypeSize(LargeVectorType));
8100 simdNode->gtLsraInfo.isLsraAdded = true;
8101 simdNode->gtRegNum = spillReg;
8104 simdNode->gtFlags |= GTF_SPILL;
8107 blockRange.InsertBefore(tree, LIR::SeqTree(compiler, simdNode));
8109 // Now insert the restore after the call.
8111 GenTreePtr restoreLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8112 restoreLcl->gtLsraInfo.isLsraAdded = true;
8113 restoreLcl->gtRegNum = lclVarReg;
8114 restoreLcl->gtFlags |= GTF_REG_VAL;
8115 restoreLcl->gtLsraInfo.isLocalDefUse = false;
simdNode = new (compiler, GT_SIMD)
    GenTreeSIMD(LargeVectorType, restoreLcl, nullptr, SIMDIntrinsicUpperRestore, varDsc->lvBaseType,
                genTypeSize(LargeVectorType));
8119 simdNode->gtLsraInfo.isLsraAdded = true;
8120 simdNode->gtRegNum = spillReg;
8123 simdNode->gtFlags |= GTF_SPILLED;
8126 blockRange.InsertAfter(tree, LIR::SeqTree(compiler, simdNode));
8128 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8130 //------------------------------------------------------------------------
8131 // initMaxSpill: Initializes the LinearScan members used to track the max number
8132 // of concurrent spills. This is needed so that we can set the
8133 // fields in Compiler, so that the code generator, in turn can
8134 // allocate the right number of spill locations.
8143 // This is called before any calls to updateMaxSpill().
8145 void LinearScan::initMaxSpill()
8147 needDoubleTmpForFPCall = false;
8148 needFloatTmpForFPCall = false;
for (int i = 0; i < TYP_COUNT; i++)
{
    maxSpill[i]     = 0;
    currentSpill[i] = 0;
}
8156 //------------------------------------------------------------------------
8157 // recordMaxSpill: Sets the fields in Compiler for the max number of concurrent spills.
8158 // (See the comment on initMaxSpill.)
// This is called after updateMaxSpill() has been called for all "real" RefPositions.
8170 void LinearScan::recordMaxSpill()
8172 // Note: due to the temp normalization process (see tmpNormalizeType)
8173 // only a few types should actually be seen here.
8174 JITDUMP("Recording the maximum number of concurrent spills:\n");
8176 var_types returnType = compiler->tmpNormalizeType(compiler->info.compRetType);
8177 if (needDoubleTmpForFPCall || (returnType == TYP_DOUBLE))
8179 JITDUMP("Adding a spill temp for moving a double call/return value between xmm reg and x87 stack.\n");
8180 maxSpill[TYP_DOUBLE] += 1;
8182 if (needFloatTmpForFPCall || (returnType == TYP_FLOAT))
8184 JITDUMP("Adding a spill temp for moving a float call/return value between xmm reg and x87 stack.\n");
8185 maxSpill[TYP_FLOAT] += 1;
8187 #endif // _TARGET_X86_
8188 for (int i = 0; i < TYP_COUNT; i++)
8190 if (var_types(i) != compiler->tmpNormalizeType(var_types(i)))
8192 // Only normalized types should have anything in the maxSpill array.
8193 // We assume here that if type 'i' does not normalize to itself, then
8194 // nothing else normalizes to 'i', either.
8195 assert(maxSpill[i] == 0);
8197 JITDUMP(" %s: %d\n", varTypeName(var_types(i)), maxSpill[i]);
8198 if (maxSpill[i] != 0)
8200 compiler->tmpPreAllocateTemps(var_types(i), maxSpill[i]);
8205 //------------------------------------------------------------------------
8206 // updateMaxSpill: Update the maximum number of concurrent spills
8209 // refPosition - the current RefPosition being handled
8215 // The RefPosition has an associated interval (getInterval() will
8216 // otherwise assert).
8219 // This is called for each "real" RefPosition during the writeback
8220 // phase of LSRA. It keeps track of how many concurrently-live
8221 // spills there are, and the largest number seen so far.
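//
//    For example (a hypothetical sequence, for illustration only): if two TYP_INT
//    tree temps are spilled across overlapping ranges, the first spillAfter brings
//    currentSpill[TYP_INT] to 1 and the second to 2 (raising maxSpill[TYP_INT] to 2);
//    each subsequent reload decrements the count again. recordMaxSpill() will then
//    have the compiler preallocate two TYP_INT spill temps.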
8223 void LinearScan::updateMaxSpill(RefPosition* refPosition)
8225 RefType refType = refPosition->refType;
8227 if (refPosition->spillAfter || refPosition->reload ||
8228 (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA))
8230 Interval* interval = refPosition->getInterval();
8231 if (!interval->isLocalVar)
8233 // The tmp allocation logic 'normalizes' types to a small number of
8234 // types that need distinct stack locations from each other.
8235 // Those types are currently gc refs, byrefs, <= 4 byte non-GC items,
8236 // 8-byte non-GC items, and 16-byte or 32-byte SIMD vectors.
8237 // LSRA is agnostic to those choices but needs
8238 // to know what they are here.
8241 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8242 if ((refType == RefTypeUpperVectorSaveDef) || (refType == RefTypeUpperVectorSaveUse))
8244 typ = LargeVectorSaveType;
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8249 GenTreePtr treeNode = refPosition->treeNode;
8250 if (treeNode == nullptr)
8252 assert(RefTypeIsUse(refType));
8253 treeNode = interval->firstRefPosition->treeNode;
8255 assert(treeNode != nullptr);
8257 // In case of multi-reg call nodes, we need to use the type
8258 // of the return register given by multiRegIdx of the refposition.
8259 if (treeNode->IsMultiRegCall())
8261 ReturnTypeDesc* retTypeDesc = treeNode->AsCall()->GetReturnTypeDesc();
8262 typ = retTypeDesc->GetReturnRegType(refPosition->getMultiRegIdx());
8266 typ = treeNode->TypeGet();
8268 typ = compiler->tmpNormalizeType(typ);
8271 if (refPosition->spillAfter && !refPosition->reload)
8273 currentSpill[typ]++;
8274 if (currentSpill[typ] > maxSpill[typ])
8276 maxSpill[typ] = currentSpill[typ];
8279 else if (refPosition->reload)
8281 assert(currentSpill[typ] > 0);
8282 currentSpill[typ]--;
8284 else if (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA)
// This is a spill temp that is not getting reloaded into a register, because it
// is marked as allocate-if-profitable and is being used directly from its
// memory location. To properly account for the max spill of typ, we
// decrement the spill count.
8290 assert(RefTypeIsUse(refType));
8291 assert(currentSpill[typ] > 0);
8292 currentSpill[typ]--;
8294 JITDUMP(" Max spill for %s is %d\n", varTypeName(typ), maxSpill[typ]);
8299 // This is the final phase of register allocation. It writes the register assignments to
8300 // the tree, and performs resolution across joins and backedges.
8302 void LinearScan::resolveRegisters()
8304 // Iterate over the tree and the RefPositions in lockstep
8305 // - annotate the tree with register assignments by setting gtRegNum or gtRegPair (for longs)
8307 // - track globally-live var locations
8308 // - add resolution points at split/merge/critical points as needed
8310 // Need to use the same traversal order as the one that assigns the location numbers.
8312 // Dummy RefPositions have been added at any split, join or critical edge, at the
8313 // point where resolution may be required. These are located:
8314 // - for a split, at the top of the non-adjacent block
8315 // - for a join, at the bottom of the non-adjacent joining block
// - for a critical edge, at the top of the target block of each critical edge
8318 // Note that a target block may have multiple incoming critical or split edges
8320 // These RefPositions record the expected location of the Interval at that point.
8321 // At each branch, we identify the location of each liveOut interval, and check
8322 // against the RefPositions at the target.
8325 LsraLocation currentLocation = MinLocation;
// Clear register assignments - these will be reestablished as lclVar defs (including
// RefTypeParamDefs) are encountered.
8329 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
8331 RegRecord* physRegRecord = getRegisterRecord(reg);
8332 Interval* assignedInterval = physRegRecord->assignedInterval;
8333 if (assignedInterval != nullptr)
8335 assignedInterval->assignedReg = nullptr;
8336 assignedInterval->physReg = REG_NA;
8338 physRegRecord->assignedInterval = nullptr;
8339 physRegRecord->recentRefPosition = nullptr;
8342 // Clear "recentRefPosition" for lclVar intervals
8343 for (unsigned lclNum = 0; lclNum < compiler->lvaCount; lclNum++)
8345 localVarIntervals[lclNum]->recentRefPosition = nullptr;
8346 localVarIntervals[lclNum]->isActive = false;
8349 // handle incoming arguments and special temps
8350 auto currentRefPosition = refPositions.begin();
8352 VarToRegMap entryVarToRegMap = inVarToRegMaps[compiler->fgFirstBB->bbNum];
8353 while (currentRefPosition != refPositions.end() &&
8354 (currentRefPosition->refType == RefTypeParamDef || currentRefPosition->refType == RefTypeZeroInit))
8356 Interval* interval = currentRefPosition->getInterval();
8357 assert(interval != nullptr && interval->isLocalVar);
8358 resolveLocalRef(nullptr, nullptr, currentRefPosition);
8359 regNumber reg = REG_STK;
8360 int varIndex = interval->getVarIndex(compiler);
8362 if (!currentRefPosition->spillAfter && currentRefPosition->registerAssignment != RBM_NONE)
8364 reg = currentRefPosition->assignedReg();
8369 interval->isActive = false;
8371 entryVarToRegMap[varIndex] = reg;
8372 ++currentRefPosition;
8375 JITDUMP("------------------------\n");
8376 JITDUMP("WRITING BACK ASSIGNMENTS\n");
8377 JITDUMP("------------------------\n");
8379 BasicBlock* insertionBlock = compiler->fgFirstBB;
8380 GenTreePtr insertionPoint = LIR::AsRange(insertionBlock).FirstNonPhiNode();
8382 // write back assignments
8383 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
8385 assert(curBBNum == block->bbNum);
8390 block->dspBlockHeader(compiler);
8391 currentRefPosition->dump();
8395 // Record the var locations at the start of this block.
8396 // (If it's fgFirstBB, we've already done that above, see entryVarToRegMap)
8398 curBBStartLocation = currentRefPosition->nodeLocation;
8399 if (block != compiler->fgFirstBB)
8401 processBlockStartLocations(block, false);
8404 // Handle the DummyDefs, updating the incoming var location.
8405 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType == RefTypeDummyDef;
8406 ++currentRefPosition)
8408 assert(currentRefPosition->isIntervalRef());
8409 // Don't mark dummy defs as reload
8410 currentRefPosition->reload = false;
8411 resolveLocalRef(nullptr, nullptr, currentRefPosition);
8413 if (currentRefPosition->registerAssignment != RBM_NONE)
8415 reg = currentRefPosition->assignedReg();
8420 currentRefPosition->getInterval()->isActive = false;
8422 setInVarRegForBB(curBBNum, currentRefPosition->getInterval()->varNum, reg);
8425 // The next RefPosition should be for the block. Move past it.
8426 assert(currentRefPosition != refPositions.end());
8427 assert(currentRefPosition->refType == RefTypeBB);
8428 ++currentRefPosition;
8430 // Handle the RefPositions for the block
8431 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB &&
8432 currentRefPosition->refType != RefTypeDummyDef;
8433 ++currentRefPosition)
8435 currentLocation = currentRefPosition->nodeLocation;
8436 JITDUMP("current : ");
8437 DBEXEC(VERBOSE, currentRefPosition->dump());
8439 // Ensure that the spill & copy info is valid.
8440 // First, if it's reload, it must not be copyReg or moveReg
8441 assert(!currentRefPosition->reload || (!currentRefPosition->copyReg && !currentRefPosition->moveReg));
8442 // If it's copyReg it must not be moveReg, and vice-versa
8443 assert(!currentRefPosition->copyReg || !currentRefPosition->moveReg);
8445 switch (currentRefPosition->refType)
8448 case RefTypeUpperVectorSaveUse:
8449 case RefTypeUpperVectorSaveDef:
8450 #endif // FEATURE_SIMD
8453 // These are the ones we're interested in
8456 case RefTypeFixedReg:
8457 // These require no handling at resolution time
8458 assert(currentRefPosition->referent != nullptr);
8459 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8462 // Ignore the ExpUse cases - a RefTypeExpUse would only exist if the
8463 // variable is dead at the entry to the next block. So we'll mark
// it as in its current location and resolution will take care of any mismatch.
8466 assert(getNextBlock() == nullptr ||
8467 !VarSetOps::IsMember(compiler, getNextBlock()->bbLiveIn,
8468 currentRefPosition->getInterval()->getVarIndex(compiler)));
8469 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8471 case RefTypeKillGCRefs:
8472 // No action to take at resolution time, and no interval to update recentRefPosition for.
8474 case RefTypeDummyDef:
8475 case RefTypeParamDef:
8476 case RefTypeZeroInit:
8477 // Should have handled all of these already
8482 updateMaxSpill(currentRefPosition);
8483 GenTree* treeNode = currentRefPosition->treeNode;
8485 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8486 if (currentRefPosition->refType == RefTypeUpperVectorSaveDef)
8488 // The treeNode must be a call, and this must be a RefPosition for a LargeVectorType LocalVar.
8489 // If the LocalVar is in a callee-save register, we are going to spill its upper half around the call.
8490 // If we have allocated a register to spill it to, we will use that; otherwise, we will spill it
8491 // to the stack. We can use as a temp register any non-arg caller-save register.
8492 noway_assert(treeNode != nullptr);
8493 currentRefPosition->referent->recentRefPosition = currentRefPosition;
8494 insertUpperVectorSaveAndReload(treeNode, currentRefPosition, block);
8496 else if (currentRefPosition->refType == RefTypeUpperVectorSaveUse)
8500 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8502 // Most uses won't actually need to be recorded (they're on the def).
8503 // In those cases, treeNode will be nullptr.
8504 if (treeNode == nullptr)
8506 // This is either a use, a dead def, or a field of a struct
8507 Interval* interval = currentRefPosition->getInterval();
8508 assert(currentRefPosition->refType == RefTypeUse ||
8509 currentRefPosition->registerAssignment == RBM_NONE || interval->isStructField);
8511 // TODO-Review: Need to handle the case where any of the struct fields
8512 // are reloaded/spilled at this use
8513 assert(!interval->isStructField ||
8514 (currentRefPosition->reload == false && currentRefPosition->spillAfter == false));
8516 if (interval->isLocalVar && !interval->isStructField)
8518 LclVarDsc* varDsc = interval->getLocalVar(compiler);
8520 // This must be a dead definition. We need to mark the lclVar
8521 // so that it's not considered a candidate for lvRegister, as
8522 // this dead def will have to go to the stack.
8523 assert(currentRefPosition->refType == RefTypeDef);
8524 varDsc->lvRegNum = REG_STK;
8527 JITDUMP("No tree node to write back to\n");
8531 DBEXEC(VERBOSE, lsraDispNode(treeNode, LSRA_DUMP_REFPOS, true));
8534 LsraLocation loc = treeNode->gtLsraInfo.loc;
8535 JITDUMP("curr = %u mapped = %u", currentLocation, loc);
8536 assert(treeNode->IsLocal() || currentLocation == loc || currentLocation == loc + 1);
8538 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isInternal)
8540 JITDUMP(" internal");
8541 GenTreePtr indNode = nullptr;
8542 if (treeNode->OperGet() == GT_IND)
8545 JITDUMP(" allocated at GT_IND");
8547 if (indNode != nullptr)
8549 GenTreePtr addrNode = indNode->gtOp.gtOp1->gtEffectiveVal();
8550 if (addrNode->OperGet() != GT_ARR_ELEM)
8552 addrNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
8553 JITDUMP(", recorded on addr");
8556 if (treeNode->OperGet() == GT_ARR_ELEM)
8558 // TODO-Review: See WORKAROUND ALERT in buildRefPositionsForNode()
8559 GenTreePtr firstIndexTree = treeNode->gtArrElem.gtArrInds[0]->gtEffectiveVal();
8560 assert(firstIndexTree != nullptr);
8561 if (firstIndexTree->IsLocal() && (firstIndexTree->gtFlags & GTF_VAR_DEATH) == 0)
8563 // Record the LAST internal interval
8564 // (Yes, this naively just records each one, but the next will replace it;
8565 // I'd fix this if it wasn't just a temporary fix)
8566 if (currentRefPosition->refType == RefTypeDef)
JITDUMP(" allocated at GT_ARR_ELEM, recorded on firstIndex V%02u",
        firstIndexTree->AsLclVarCommon()->GetLclNum());
8569 firstIndexTree->gtRsvdRegs = (regMaskSmall)currentRefPosition->registerAssignment;
8573 treeNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
8577 writeRegisters(currentRefPosition, treeNode);
8579 if (treeNode->IsLocal() && currentRefPosition->getInterval()->isLocalVar)
8581 resolveLocalRef(block, treeNode, currentRefPosition);
8584 // Mark spill locations on temps
8585 // (local vars are handled in resolveLocalRef, above)
8586 // Note that the tree node will be changed from GTF_SPILL to GTF_SPILLED
8587 // in codegen, taking care of the "reload" case for temps
8588 else if (currentRefPosition->spillAfter || (currentRefPosition->nextRefPosition != nullptr &&
8589 currentRefPosition->nextRefPosition->moveReg))
8591 if (treeNode != nullptr && currentRefPosition->isIntervalRef())
8593 if (currentRefPosition->spillAfter)
8595 treeNode->gtFlags |= GTF_SPILL;
8597 // If this is a constant interval that is reusing a pre-existing value, we actually need
8598 // to generate the value at this point in order to spill it.
8599 if (treeNode->IsReuseRegVal())
8601 treeNode->ResetReuseRegVal();
8604 // In case of multi-reg call node, also set spill flag on the
8605 // register specified by multi-reg index of current RefPosition.
// Note that the spill flag on the treeNode indicates that one or
// more of its allocated registers are in that state.
8608 if (treeNode->IsMultiRegCall())
8610 GenTreeCall* call = treeNode->AsCall();
8611 call->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
8615 // If the value is reloaded or moved to a different register, we need to insert
8616 // a node to hold the register to which it should be reloaded
8617 RefPosition* nextRefPosition = currentRefPosition->nextRefPosition;
8618 assert(nextRefPosition != nullptr);
8619 if (INDEBUG(alwaysInsertReload() ||)
8620 nextRefPosition->assignedReg() != currentRefPosition->assignedReg())
8622 if (nextRefPosition->assignedReg() != REG_NA)
8624 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(),
8629 assert(nextRefPosition->AllocateIfProfitable());
// In the case of tree temps, if the def is spilled and the use didn't
// get a register, set a flag on the tree node so that it is treated as
// contained at the point of its use.
8634 if (currentRefPosition->spillAfter && currentRefPosition->refType == RefTypeDef &&
8635 nextRefPosition->refType == RefTypeUse)
8637 assert(nextRefPosition->treeNode == nullptr);
8638 treeNode->gtFlags |= GTF_NOREG_AT_USE;
8644 // We should never have to "spill after" a temp use, since
8645 // they're single use
8655 processBlockEndLocations(block);
8661 printf("-----------------------\n");
8662 printf("RESOLVING BB BOUNDARIES\n");
8663 printf("-----------------------\n");
8665 printf("Resolution Candidates: ");
8666 dumpConvertedVarSet(compiler, resolutionCandidateVars);
8668 printf("Has %sCritical Edges\n\n", hasCriticalEdges ? "" : "No");
8670 printf("Prior to Resolution\n");
8671 foreach_block(compiler, block)
8673 printf("\nBB%02u use def in out\n", block->bbNum);
8674 dumpConvertedVarSet(compiler, block->bbVarUse);
8676 dumpConvertedVarSet(compiler, block->bbVarDef);
8678 dumpConvertedVarSet(compiler, block->bbLiveIn);
8680 dumpConvertedVarSet(compiler, block->bbLiveOut);
8683 dumpInVarToRegMap(block);
8684 dumpOutVarToRegMap(block);
8693 // Verify register assignments on variables
8696 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
8698 if (!isCandidateVar(varDsc))
8700 varDsc->lvRegNum = REG_STK;
8704 Interval* interval = getIntervalForLocalVar(lclNum);
8706 // Determine initial position for parameters
8708 if (varDsc->lvIsParam)
8710 regMaskTP initialRegMask = interval->firstRefPosition->registerAssignment;
8711 regNumber initialReg = (initialRegMask == RBM_NONE || interval->firstRefPosition->spillAfter)
8713 : genRegNumFromMask(initialRegMask);
8714 regNumber sourceReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
8717 if (varTypeIsMultiReg(varDsc))
// TODO-ARM-NYI: Map the hi/lo intervals back to lvRegNum and lvOtherReg (these should NYI
// before this point).
8721 assert(!"Multi-reg types not yet supported");
8724 #endif // _TARGET_ARM_
8726 varDsc->lvArgInitReg = initialReg;
8727 JITDUMP(" Set V%02u argument initial register to %s\n", lclNum, getRegName(initialReg));
8730 // Stack args that are part of dependently-promoted structs should never be register candidates (see
8731 // LinearScan::isRegCandidate).
8732 assert(varDsc->lvIsRegArg || !compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc));
8735 // If lvRegNum is REG_STK, that means that either no register
8736 // was assigned, or (more likely) that the same register was not
8737 // used for all references. In that case, codegen gets the register
8738 // from the tree node.
8739 if (varDsc->lvRegNum == REG_STK || interval->isSpilled || interval->isSplit)
8741 // For codegen purposes, we'll set lvRegNum to whatever register
8742 // it's currently in as we go.
// However, we never mark an interval as lvRegister if it has either been spilled or split.
8745 varDsc->lvRegister = false;
8747 // Skip any dead defs or exposed uses
8748 // (first use exposed will only occur when there is no explicit initialization)
8749 RefPosition* firstRefPosition = interval->firstRefPosition;
8750 while ((firstRefPosition != nullptr) && (firstRefPosition->refType == RefTypeExpUse))
8752 firstRefPosition = firstRefPosition->nextRefPosition;
8754 if (firstRefPosition == nullptr)
8757 varDsc->lvLRACandidate = false;
8758 if (varDsc->lvRefCnt == 0)
8760 varDsc->lvOnFrame = false;
8764 // We may encounter cases where a lclVar actually has no references, but
8765 // a non-zero refCnt. For safety (in case this is some "hidden" lclVar that we're
8766 // not correctly recognizing), we'll mark those as needing a stack location.
8767 // TODO-Cleanup: Make this an assert if/when we correct the refCnt
8769 varDsc->lvOnFrame = true;
8774 // If the interval was not spilled, it doesn't need a stack location.
8775 if (!interval->isSpilled)
8777 varDsc->lvOnFrame = false;
8779 if (firstRefPosition->registerAssignment == RBM_NONE || firstRefPosition->spillAfter)
// Either this RefPosition is spilled or regOptional, or it is not a "real" def or use.
8782 assert(firstRefPosition->spillAfter || firstRefPosition->AllocateIfProfitable() ||
8783 (firstRefPosition->refType != RefTypeDef && firstRefPosition->refType != RefTypeUse));
8784 varDsc->lvRegNum = REG_STK;
8788 varDsc->lvRegNum = firstRefPosition->assignedReg();
8795 varDsc->lvRegister = true;
8796 varDsc->lvOnFrame = false;
8799 regMaskTP registerAssignment = genRegMask(varDsc->lvRegNum);
8800 assert(!interval->isSpilled && !interval->isSplit);
8801 RefPosition* refPosition = interval->firstRefPosition;
8802 assert(refPosition != nullptr);
8804 while (refPosition != nullptr)
8806 // All RefPositions must match, except for dead definitions,
8807 // copyReg/moveReg and RefTypeExpUse positions
8808 if (refPosition->registerAssignment != RBM_NONE && !refPosition->copyReg && !refPosition->moveReg &&
8809 refPosition->refType != RefTypeExpUse)
8811 assert(refPosition->registerAssignment == registerAssignment);
8813 refPosition = refPosition->nextRefPosition;
8823 printf("Trees after linear scan register allocator (LSRA)\n");
8824 compiler->fgDispBasicBlocks(true);
8827 verifyFinalAllocation();
8830 compiler->raMarkStkVars();
8833 // TODO-CQ: Review this comment and address as needed.
8834 // Change all unused promoted non-argument struct locals to a non-GC type (in this case TYP_INT)
8835 // so that the gc tracking logic and lvMustInit logic will ignore them.
8836 // Extract the code that does this from raAssignVars, and call it here.
8837 // PRECONDITIONS: Ensure that lvPromoted is set on promoted structs, if and
8838 // only if it is promoted on all paths.
8839 // Call might be something like:
8840 // compiler->BashUnusedStructLocals();
8844 //------------------------------------------------------------------------
8845 // insertMove: Insert a move of a lclVar with the given lclNum into the given block.
8848 // block - the BasicBlock into which the move will be inserted.
8849 // insertionPoint - the instruction before which to insert the move
8850 // lclNum - the lclNum of the var to be moved
8851 // fromReg - the register from which the var is moving
8852 // toReg - the register to which the var is moving
8858 // If insertionPoint is non-NULL, insert before that instruction;
8859 // otherwise, insert "near" the end (prior to the branch, if any).
8860 // If fromReg or toReg is REG_STK, then move from/to memory, respectively.
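//    For example (a hypothetical call, for illustration): insertMove(block, nullptr,
//    lclNum, REG_STK, REG_RAX) reloads the lclVar from its home location into RAX at
//    the bottom of "block" (prior to any branch) by marking the lclVar node GTF_SPILLED,
//    while a register-to-register move instead produces a GT_COPY of the lclVar.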
8862 void LinearScan::insertMove(
8863 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum, regNumber fromReg, regNumber toReg)
8865 LclVarDsc* varDsc = compiler->lvaTable + lclNum;
8866 // the lclVar must be a register candidate
8867 assert(isRegCandidate(varDsc));
8868 // One or both MUST be a register
8869 assert(fromReg != REG_STK || toReg != REG_STK);
8870 // They must not be the same register.
8871 assert(fromReg != toReg);
8873 // This var can't be marked lvRegister now
8874 varDsc->lvRegNum = REG_STK;
8876 GenTreePtr src = compiler->gtNewLclvNode(lclNum, varDsc->TypeGet());
8877 src->gtLsraInfo.isLsraAdded = true;
8879 // There are three cases we need to handle:
8880 // - We are loading a lclVar from the stack.
8881 // - We are storing a lclVar to the stack.
8882 // - We are copying a lclVar between registers.
8884 // In the first and second cases, the lclVar node will be marked with GTF_SPILLED and GTF_SPILL, respectively.
8885 // It is up to the code generator to ensure that any necessary normalization is done when loading or storing the
8888 // In the third case, we generate GT_COPY(GT_LCL_VAR) and type each node with the normalized type of the lclVar.
8889 // This is safe because a lclVar is always normalized once it is in a register.
8892 if (fromReg == REG_STK)
8894 src->gtFlags |= GTF_SPILLED;
8895 src->gtRegNum = toReg;
8897 else if (toReg == REG_STK)
8899 src->gtFlags |= GTF_SPILL;
8901 src->gtRegNum = fromReg;
8905 var_types movType = genActualType(varDsc->TypeGet());
8906 src->gtType = movType;
8908 dst = new (compiler, GT_COPY) GenTreeCopyOrReload(GT_COPY, movType, src);
8909 // This is the new home of the lclVar - indicate that by clearing the GTF_VAR_DEATH flag.
8910 // Note that if src is itself a lastUse, this will have no effect.
8911 dst->gtFlags &= ~(GTF_VAR_DEATH);
8912 src->gtRegNum = fromReg;
8914 dst->gtRegNum = toReg;
8915 src->gtLsraInfo.isLocalDefUse = false;
8916 dst->gtLsraInfo.isLsraAdded = true;
8918 dst->gtLsraInfo.isLocalDefUse = true;
8920 LIR::Range treeRange = LIR::SeqTree(compiler, dst);
8921 LIR::Range& blockRange = LIR::AsRange(block);
8923 if (insertionPoint != nullptr)
8925 blockRange.InsertBefore(insertionPoint, std::move(treeRange));
8929 // Put the copy at the bottom
8930 // If there's a branch, make an embedded statement that executes just prior to the branch
8931 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
8933 noway_assert(!blockRange.IsEmpty());
8935 GenTree* branch = blockRange.LastNode();
8936 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
8937 branch->OperGet() == GT_SWITCH);
8939 blockRange.InsertBefore(branch, std::move(treeRange));
8943 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
8944 blockRange.InsertAtEnd(std::move(treeRange));
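// Usage sketch (hypothetical lclNum and registers, for illustration only): a resolution
// move of V02 from RDX to the stack, inserted at the bottom of 'block' (before any
// branch), is simply:
//
//     insertMove(block, nullptr, /* lclNum */ 2, REG_RDX, REG_STK);
//
// addResolution() below is the normal caller; it wraps this call with dump output and
// spill/split bookkeeping.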
8949 void LinearScan::insertSwap(
8950 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum1, regNumber reg1, unsigned lclNum2, regNumber reg2)
8955 const char* insertionPointString = "top";
8956 if (insertionPoint == nullptr)
8958 insertionPointString = "bottom";
8960 printf(" BB%02u %s: swap V%02u in %s with V%02u in %s\n", block->bbNum, insertionPointString, lclNum1,
8961 getRegName(reg1), lclNum2, getRegName(reg2));
8965 LclVarDsc* varDsc1 = compiler->lvaTable + lclNum1;
8966 LclVarDsc* varDsc2 = compiler->lvaTable + lclNum2;
8967 assert(reg1 != REG_STK && reg1 != REG_NA && reg2 != REG_STK && reg2 != REG_NA);
8969 GenTreePtr lcl1 = compiler->gtNewLclvNode(lclNum1, varDsc1->TypeGet());
8970 lcl1->gtLsraInfo.isLsraAdded = true;
8971 lcl1->gtLsraInfo.isLocalDefUse = false;
8973 lcl1->gtRegNum = reg1;
8975 GenTreePtr lcl2 = compiler->gtNewLclvNode(lclNum2, varDsc2->TypeGet());
8976 lcl2->gtLsraInfo.isLsraAdded = true;
8977 lcl2->gtLsraInfo.isLocalDefUse = false;
8979 lcl2->gtRegNum = reg2;
8981 GenTreePtr swap = compiler->gtNewOperNode(GT_SWAP, TYP_VOID, lcl1, lcl2);
8982 swap->gtLsraInfo.isLsraAdded = true;
8983 swap->gtLsraInfo.isLocalDefUse = false;
8984 swap->gtRegNum = REG_NA;
8986 lcl1->gtNext = lcl2;
8987 lcl2->gtPrev = lcl1;
8988 lcl2->gtNext = swap;
8989 swap->gtPrev = lcl2;
8991 LIR::Range swapRange = LIR::SeqTree(compiler, swap);
8992 LIR::Range& blockRange = LIR::AsRange(block);
8994 if (insertionPoint != nullptr)
8996 blockRange.InsertBefore(insertionPoint, std::move(swapRange));
9000 // Put the swap at the bottom
9001 // If there's a branch, make an embedded statement that executes just prior to the branch
9002 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
9004 noway_assert(!blockRange.IsEmpty());
9006 GenTree* branch = blockRange.LastNode();
9007 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
9008 branch->OperGet() == GT_SWITCH);
9010 blockRange.InsertBefore(branch, std::move(swapRange));
9014 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
9015 blockRange.InsertAtEnd(std::move(swapRange));
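// Usage sketch (hypothetical vars and registers): when V01 and V02 must exchange RAX and
// RCX across an edge and no temp register is available, resolveEdge() below emits:
//
//     insertSwap(block, insertionPoint, /* lclNum1 */ 1, REG_RAX, /* lclNum2 */ 2, REG_RCX);
//
// which produces a single GT_SWAP node rather than a sequence of moves through a temp.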
9020 //------------------------------------------------------------------------
9021 // getTempRegForResolution: Get a free register to use for resolution code.
9024 // fromBlock - The "from" block on the edge being resolved.
9025 // toBlock - The "to" block on the edge
9026 // type - the type of register required
9029 // Returns a register that is free on the given edge, or REG_NA if none is available.
9032 // It is up to the caller to check the return value, and to handle the case where no register is
9033 // available.
9034 // It is also up to the caller to cache the return value, as this is not cheap to compute.
9036 regNumber LinearScan::getTempRegForResolution(BasicBlock* fromBlock, BasicBlock* toBlock, var_types type)
9038 // TODO-Throughput: This would be much more efficient if we add RegToVarMaps instead of VarToRegMaps
9039 // and they would be more space-efficient as well.
9040 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
9041 VarToRegMap toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
9043 regMaskTP freeRegs = allRegs(type);
9045 if (getStressLimitRegs() == LSRA_LIMIT_SMALL_SET)
9050 INDEBUG(freeRegs = stressLimitRegs(nullptr, freeRegs));
9052 // We are only interested in the variables that are live-in to the "to" block.
9053 VARSET_ITER_INIT(compiler, iter, toBlock->bbLiveIn, varIndex);
9054 while (iter.NextElem(compiler, &varIndex) && freeRegs != RBM_NONE)
9056 regNumber fromReg = fromVarToRegMap[varIndex];
9057 regNumber toReg = toVarToRegMap[varIndex];
9058 assert(fromReg != REG_NA && toReg != REG_NA);
9059 if (fromReg != REG_STK)
9061 freeRegs &= ~genRegMask(fromReg);
9063 if (toReg != REG_STK)
9065 freeRegs &= ~genRegMask(toReg);
9068 if (freeRegs == RBM_NONE)
9074 regNumber tempReg = genRegNumFromMask(genFindLowestBit(freeRegs));
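// Typical caller pattern (sketch; compare the code near the top of resolveEdge() below):
// the result must be checked for REG_NA, and is computed once per edge because this walk
// over the live-in set is not cheap:
//
//     regNumber tempRegInt = getTempRegForResolution(fromBlock, toBlock, TYP_INT);
//     if (tempRegInt == REG_NA)
//     {
//         // No free integer register on this edge; fall back to GT_SWAP or a
//         // spill/reload through the stack.
//     }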
9079 //------------------------------------------------------------------------
9080 // addResolution: Add a resolution move of the given interval
9083 // block - the BasicBlock into which the move will be inserted.
9084 // insertionPoint - the instruction before which to insert the move
9085 // interval - the interval of the var to be moved
9086 // toReg - the register to which the var is moving
9087 // fromReg - the register from which the var is moving
9093 // For joins, we insert at the bottom (indicated by an insertionPoint
9094 // of nullptr), while for splits we insert at the top.
9095 // This is because for joins 'block' is a pred of the join, while for splits it is a succ.
9096 // For critical edges, this function may be called twice - once to move from
9097 // the source (fromReg), if any, to the stack, in which case toReg will be
9098 // REG_STK, and we insert at the bottom (leave insertionPoint as nullptr).
9099 // The next time, we want to move from the stack to the destination (toReg),
9100 // in which case fromReg will be REG_STK, and we insert at the top.
9102 void LinearScan::addResolution(
9103 BasicBlock* block, GenTreePtr insertionPoint, Interval* interval, regNumber toReg, regNumber fromReg)
9106 const char* insertionPointString = "top";
9108 if (insertionPoint == nullptr)
9111 insertionPointString = "bottom";
9115 JITDUMP(" BB%02u %s: move V%02u from ", block->bbNum, insertionPointString, interval->varNum);
9116 JITDUMP("%s to %s", getRegName(fromReg), getRegName(toReg));
9118 insertMove(block, insertionPoint, interval->varNum, fromReg, toReg);
9119 if (fromReg == REG_STK || toReg == REG_STK)
9121 assert(interval->isSpilled);
9125 // We should have already marked this as spilled or split.
9126 assert((interval->isSpilled) || (interval->isSplit));
9129 INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
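// For a critical edge resolved in two steps (see the header comment above), the calls
// pair up as follows (sketch):
//
//     // At the bottom of the "from" block: vacate fromReg by moving to the stack.
//     addResolution(fromBlock, nullptr, interval, REG_STK, fromReg);
//     // At the top of the "to" block: load from the stack into the final register.
//     addResolution(toBlock, insertionPoint, interval, toReg, REG_STK);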
9132 //------------------------------------------------------------------------
9133 // handleOutgoingCriticalEdges: Performs the necessary resolution on all critical edges that feed out of 'block'
9136 // block - the block with outgoing critical edges.
9142 // For all outgoing critical edges (i.e. any successor of this block which is
9143 // a join edge), if there are any conflicts, split the edge by adding a new block,
9144 // and generate the resolution code into that block.
9146 void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
9148 VARSET_TP VARSET_INIT_NOCOPY(outResolutionSet,
9149 VarSetOps::Intersection(compiler, block->bbLiveOut, resolutionCandidateVars));
9150 if (VarSetOps::IsEmpty(compiler, outResolutionSet))
9154 VARSET_TP VARSET_INIT_NOCOPY(sameResolutionSet, VarSetOps::MakeEmpty(compiler));
9155 VARSET_TP VARSET_INIT_NOCOPY(sameLivePathsSet, VarSetOps::MakeEmpty(compiler));
9156 VARSET_TP VARSET_INIT_NOCOPY(singleTargetSet, VarSetOps::MakeEmpty(compiler));
9157 VARSET_TP VARSET_INIT_NOCOPY(diffResolutionSet, VarSetOps::MakeEmpty(compiler));
9159 // Get the outVarToRegMap for this block
9160 VarToRegMap outVarToRegMap = getOutVarToRegMap(block->bbNum);
9161 unsigned succCount = block->NumSucc(compiler);
9162 assert(succCount > 1);
9163 VarToRegMap firstSuccInVarToRegMap = nullptr;
9164 BasicBlock* firstSucc = nullptr;
9166 // First, determine the live regs at the end of this block so that we know what regs are
9167 // available to copy into.
9168 // Note that for this purpose we use the full live-out set, because we must ensure that
9169 // even the registers that remain the same across the edge are preserved correctly.
9170 regMaskTP liveOutRegs = RBM_NONE;
9171 VARSET_ITER_INIT(compiler, iter1, block->bbLiveOut, varIndex1);
9172 while (iter1.NextElem(compiler, &varIndex1))
9174 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex1];
9175 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
9176 if (fromReg != REG_STK)
9178 liveOutRegs |= genRegMask(fromReg);
9182 // Next, if this block ends with a switch table, we have to make sure not to copy
9183 // into the registers that it uses.
9184 regMaskTP switchRegs = RBM_NONE;
9185 if (block->bbJumpKind == BBJ_SWITCH)
9187 // At this point, Lowering has transformed any non-switch-table blocks into
9188 // cascading ifs.
9189 GenTree* switchTable = LIR::AsRange(block).LastNode();
9190 assert(switchTable != nullptr && switchTable->OperGet() == GT_SWITCH_TABLE);
9192 switchRegs = switchTable->gtRsvdRegs;
9193 GenTree* op1 = switchTable->gtGetOp1();
9194 GenTree* op2 = switchTable->gtGetOp2();
9195 noway_assert(op1 != nullptr && op2 != nullptr);
9196 assert(op1->gtRegNum != REG_NA && op2->gtRegNum != REG_NA);
9197 switchRegs |= genRegMask(op1->gtRegNum);
9198 switchRegs |= genRegMask(op2->gtRegNum);
9201 VarToRegMap sameVarToRegMap = sharedCriticalVarToRegMap;
9202 regMaskTP sameWriteRegs = RBM_NONE;
9203 regMaskTP diffReadRegs = RBM_NONE;
9205 // For each var that may require resolution, classify them as:
9206 // - in the same register at the end of this block and at each target (no resolution needed)
9207 // - in different registers at different targets (resolve separately):
9208 // diffResolutionSet
9209 // - in the same register at each target at which it's live, but different from the end of
9210 // this block. We may be able to resolve these as if this were a "join", but only if they do not
9211 // write to any registers that are read by those in the diffResolutionSet:
9212 // sameResolutionSet
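// For example (hypothetical): suppose this block ends with V01 in RAX and V02 in RCX,
// and has successors S1 and S2. If both S1 and S2 expect V01 in RDX, then V01 goes in
// sameResolutionSet, and a single RAX->RDX move at the end of this block suffices. If S1
// expects V02 in RSI but S2 expects it in RDI, then V02 goes in diffResolutionSet and
// each edge must be resolved separately (by splitting it).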
9214 VARSET_ITER_INIT(compiler, iter, outResolutionSet, varIndex);
9215 while (iter.NextElem(compiler, &varIndex))
9217 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9218 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
9219 bool isMatch = true;
9220 bool isSame = false;
9221 bool maybeSingleTarget = false;
9222 bool maybeSameLivePaths = false;
9223 bool liveOnlyAtSplitEdge = true;
9224 regNumber sameToReg = REG_NA;
9225 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
9227 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
9228 if (!VarSetOps::IsMember(compiler, succBlock->bbLiveIn, varIndex))
9230 maybeSameLivePaths = true;
9233 else if (liveOnlyAtSplitEdge)
9235 // Is the var live only at those target blocks which are connected by a split edge to this block
9236 liveOnlyAtSplitEdge = ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB));
9239 regNumber toReg = getVarReg(getInVarToRegMap(succBlock->bbNum), varNum);
9240 if (sameToReg == REG_NA)
9245 if (toReg == sameToReg)
9253 // Check for the cases where we can't write to a register.
9254 // We only need to check for these cases if sameToReg is an actual register (not REG_STK).
9255 if (sameToReg != REG_NA && sameToReg != REG_STK)
9257 // If there's a path on which this var isn't live, it may use the original value in sameToReg.
9258 // In this case, sameToReg will be in the liveOutRegs of this block.
9259 // Similarly, if sameToReg is in sameWriteRegs, it has already been used (i.e. for a lclVar that's
9260 // live only at another target), and we can't copy another lclVar into that reg in this block.
9261 regMaskTP sameToRegMask = genRegMask(sameToReg);
9262 if (maybeSameLivePaths &&
9263 (((sameToRegMask & liveOutRegs) != RBM_NONE) || ((sameToRegMask & sameWriteRegs) != RBM_NONE)))
9267 // If this register is used by a switch table at the end of the block, we can't do the copy
9268 // in this block (since we can't insert it after the switch).
9269 if ((sameToRegMask & switchRegs) != RBM_NONE)
9274 // If the var is live only at those blocks connected by a split edge and not live-in at some of the
9275 // target blocks, we will resolve it the same way as if it were in diffResolutionSet, and resolution
9276 // will be deferred to the handling of split edges, which means the copy will only be at those target(s).
9278 // Another way to achieve similar resolution for vars live only at split edges is by removing them
9279 // from consideration up-front, but that requires that we traverse those edges anyway to account for
9280 // the registers that must not be overwritten.
9281 if (liveOnlyAtSplitEdge && maybeSameLivePaths)
9287 if (sameToReg == REG_NA)
9289 VarSetOps::AddElemD(compiler, diffResolutionSet, varIndex);
9290 if (fromReg != REG_STK)
9292 diffReadRegs |= genRegMask(fromReg);
9295 else if (sameToReg != fromReg)
9297 VarSetOps::AddElemD(compiler, sameResolutionSet, varIndex);
9298 sameVarToRegMap[varIndex] = sameToReg;
9299 if (sameToReg != REG_STK)
9301 sameWriteRegs |= genRegMask(sameToReg);
9306 if (!VarSetOps::IsEmpty(compiler, sameResolutionSet))
9308 if ((sameWriteRegs & diffReadRegs) != RBM_NONE)
9310 // We cannot split the "same" and "diff" regs if the "same" set writes registers
9311 // that must be read by the "diff" set. (Note that when these are done as a "batch"
9312 // we carefully order them to ensure all the input regs are read before they are
9313 // overwritten.)
9314 VarSetOps::UnionD(compiler, diffResolutionSet, sameResolutionSet);
9315 VarSetOps::ClearD(compiler, sameResolutionSet);
9319 // For any vars in the sameResolutionSet, we can simply add the move at the end of "block".
9320 resolveEdge(block, nullptr, ResolveSharedCritical, sameResolutionSet);
9323 if (!VarSetOps::IsEmpty(compiler, diffResolutionSet))
9325 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
9327 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
9329 // Any "diffResolutionSet" resolution for a block with no other predecessors will be handled later
9330 // as split resolution.
9331 if ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB))
9336 // Now collect the resolution set for just this edge, if any.
9337 // Check only the vars in diffResolutionSet that are live-in to this successor.
9338 bool needsResolution = false;
9339 VarToRegMap succInVarToRegMap = getInVarToRegMap(succBlock->bbNum);
9340 VARSET_TP VARSET_INIT_NOCOPY(edgeResolutionSet,
9341 VarSetOps::Intersection(compiler, diffResolutionSet, succBlock->bbLiveIn));
9342 VARSET_ITER_INIT(compiler, iter, edgeResolutionSet, varIndex);
9343 while (iter.NextElem(compiler, &varIndex))
9345 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9346 Interval* interval = getIntervalForLocalVar(varNum);
9347 regNumber fromReg = getVarReg(outVarToRegMap, varNum);
9348 regNumber toReg = getVarReg(succInVarToRegMap, varNum);
9350 if (fromReg == toReg)
9352 VarSetOps::RemoveElemD(compiler, edgeResolutionSet, varIndex);
9355 if (!VarSetOps::IsEmpty(compiler, edgeResolutionSet))
9357 resolveEdge(block, succBlock, ResolveCritical, edgeResolutionSet);
9363 //------------------------------------------------------------------------
9364 // resolveEdges: Perform resolution across basic block edges
9373 // Traverse the basic blocks.
9374 // - If this block has a single predecessor that is not the immediately
9375 // preceding block, perform any needed 'split' resolution at the beginning of this block
9376 // - Otherwise if this block has critical incoming edges, handle them.
9377 // - If this block has a single successor that has multiple predecessors, perform any needed
9378 //   'join' resolution at the end of this block.
9379 // Note that a block may have both 'split' or 'critical' incoming edge(s) and 'join' outgoing
9380 // edges.
9382 void LinearScan::resolveEdges()
9384 JITDUMP("RESOLVING EDGES\n");
9386 // The resolutionCandidateVars set was initialized with all the lclVars that are live-in to
9387 // any block. We now intersect that set with any lclVars that ever spilled or split.
9388 // If there are no candidates for resolution, simply return.
9390 VarSetOps::IntersectionD(compiler, resolutionCandidateVars, splitOrSpilledVars);
9391 if (VarSetOps::IsEmpty(compiler, resolutionCandidateVars))
9396 BasicBlock *block, *prevBlock = nullptr;
9398 // Handle all the critical edges first.
9399 // We will try to avoid resolution across critical edges in cases where all the critical-edge
9400 // targets of a block have the same home. We will then split the edges only for the
9401 // remaining mismatches. We visit the out-edges, as that allows us to share the moves that are
9402 // common among all the targets.
9404 if (hasCriticalEdges)
9406 foreach_block(compiler, block)
9408 if (block->bbNum > bbNumMaxBeforeResolution)
9410 // This is a new block added during resolution - we don't need to visit these now.
9413 if (blockInfo[block->bbNum].hasCriticalOutEdge)
9415 handleOutgoingCriticalEdges(block);
9421 prevBlock = nullptr;
9422 foreach_block(compiler, block)
9424 if (block->bbNum > bbNumMaxBeforeResolution)
9426 // This is a new block added during resolution - we don't need to visit these now.
9430 unsigned succCount = block->NumSucc(compiler);
9431 flowList* preds = block->bbPreds;
9432 BasicBlock* uniquePredBlock = block->GetUniquePred(compiler);
9434 // First, if this block has a single predecessor,
9435 // we may need resolution at the beginning of this block.
9436 // This may be true even if it's the block we used for starting locations,
9437 // if a variable was spilled.
9438 VARSET_TP VARSET_INIT_NOCOPY(inResolutionSet,
9439 VarSetOps::Intersection(compiler, block->bbLiveIn, resolutionCandidateVars));
9440 if (!VarSetOps::IsEmpty(compiler, inResolutionSet))
9442 if (uniquePredBlock != nullptr)
9444 // We may have split edges during critical edge resolution, and in the process split
9445 // a non-critical edge as well.
9446 // It is unlikely that we would ever have more than one of these in sequence (indeed,
9447 // I don't think it's possible), but there's no need to assume that it can't.
9448 while (uniquePredBlock->bbNum > bbNumMaxBeforeResolution)
9450 uniquePredBlock = uniquePredBlock->GetUniquePred(compiler);
9451 noway_assert(uniquePredBlock != nullptr);
9453 resolveEdge(uniquePredBlock, block, ResolveSplit, inResolutionSet);
9457 // Finally, if this block has a single successor:
9458 // - and that has at least one other predecessor (otherwise we will do the resolution at the
9459 // top of the successor),
9460 // - and that is not the target of a critical edge (otherwise we've already handled it)
9461 // we may need resolution at the end of this block.
9465 BasicBlock* succBlock = block->GetSucc(0, compiler);
9466 if (succBlock->GetUniquePred(compiler) == nullptr)
9468 VARSET_TP VARSET_INIT_NOCOPY(outResolutionSet, VarSetOps::Intersection(compiler, succBlock->bbLiveIn,
9469 resolutionCandidateVars));
9470 if (!VarSetOps::IsEmpty(compiler, outResolutionSet))
9472 resolveEdge(block, succBlock, ResolveJoin, outResolutionSet);
9478 // Now, fix up the mapping for any blocks that were added for edge splitting.
9479 // See the comment prior to the call to fgSplitEdge() in resolveEdge().
9480 // Note that we could fold this loop in with the checking code below, but that
9481 // would only improve the debug case, and would clutter up the code somewhat.
9482 if (compiler->fgBBNumMax > bbNumMaxBeforeResolution)
9484 foreach_block(compiler, block)
9486 if (block->bbNum > bbNumMaxBeforeResolution)
9488 // There may be multiple blocks inserted when we split. But we must always have exactly
9489 // one path (i.e. all blocks must be single-successor and single-predecessor),
9490 // and only one block along the path may be non-empty.
9491 // Note that we may have a newly-inserted block that is empty, but which connects
9492 // two non-resolution blocks. This happens when an edge is split that requires it.
9494 BasicBlock* succBlock = block;
9497 succBlock = succBlock->GetUniqueSucc();
9498 noway_assert(succBlock != nullptr);
9499 } while ((succBlock->bbNum > bbNumMaxBeforeResolution) && succBlock->isEmpty());
9501 BasicBlock* predBlock = block;
9504 predBlock = predBlock->GetUniquePred(compiler);
9505 noway_assert(predBlock != nullptr);
9506 } while ((predBlock->bbNum > bbNumMaxBeforeResolution) && predBlock->isEmpty());
9508 unsigned succBBNum = succBlock->bbNum;
9509 unsigned predBBNum = predBlock->bbNum;
9510 if (block->isEmpty())
9512 // For the case of the empty block, find the non-resolution block (succ or pred).
9513 if (predBBNum > bbNumMaxBeforeResolution)
9515 assert(succBBNum <= bbNumMaxBeforeResolution);
9525 assert((succBBNum <= bbNumMaxBeforeResolution) && (predBBNum <= bbNumMaxBeforeResolution));
9527 SplitEdgeInfo info = {predBBNum, succBBNum};
9528 getSplitBBNumToTargetBBNumMap()->Set(block->bbNum, info);
9534 // Make sure the varToRegMaps match up on all edges.
9535 bool foundMismatch = false;
9536 foreach_block(compiler, block)
9538 if (block->isEmpty() && block->bbNum > bbNumMaxBeforeResolution)
9542 VarToRegMap toVarToRegMap = getInVarToRegMap(block->bbNum);
9543 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
9545 BasicBlock* predBlock = pred->flBlock;
9546 VarToRegMap fromVarToRegMap = getOutVarToRegMap(predBlock->bbNum);
9547 VARSET_ITER_INIT(compiler, iter, block->bbLiveIn, varIndex);
9548 while (iter.NextElem(compiler, &varIndex))
9550 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9551 regNumber fromReg = getVarReg(fromVarToRegMap, varNum);
9552 regNumber toReg = getVarReg(toVarToRegMap, varNum);
9553 if (fromReg != toReg)
9555 Interval* interval = getIntervalForLocalVar(varNum);
9558 foundMismatch = true;
9559 printf("Found mismatched var locations after resolution!\n");
9561 printf(" V%02u: BB%02u to BB%02u: ", varNum, predBlock->bbNum, block->bbNum);
9562 printf("%s to %s\n", getRegName(fromReg), getRegName(toReg));
9567 assert(!foundMismatch);
9572 //------------------------------------------------------------------------
9573 // resolveEdge: Perform the specified type of resolution between two blocks.
9576 // fromBlock - the block from which the edge originates
9577 // toBlock - the block at which the edge terminates
9578 // resolveType - the type of resolution to be performed
9579 // liveSet - the set of tracked lclVar indices which may require resolution
9585 // The caller must have performed the analysis to determine the type of the edge.
9588 // This method emits the correctly ordered moves necessary to place variables in the
9589 // correct registers across a Split, Join or Critical edge.
9590 // In order to avoid overwriting register values before they have been moved to their
9591 // new home (register/stack), it first does the register-to-stack moves (to free those
9592 // registers), then the register to register moves, ensuring that the target register
9593 // is free before the move, and then finally the stack to register moves.
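// For example (sketch): given the pending moves { RAX -> stack, RBX -> RAX, stack -> RBX },
// the ordering above emits:
//     1. RAX -> stack   (frees RAX)
//     2. RBX -> RAX     (the target RAX is now free)
//     3. stack -> RBX   (RBX was freed by step 2)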
9595 void LinearScan::resolveEdge(BasicBlock* fromBlock,
9596 BasicBlock* toBlock,
9597 ResolveType resolveType,
9598 VARSET_VALARG_TP liveSet)
9600 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
9601 VarToRegMap toVarToRegMap;
9602 if (resolveType == ResolveSharedCritical)
9604 toVarToRegMap = sharedCriticalVarToRegMap;
9608 toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
9611 // The block to which we add the resolution moves depends on the resolveType
9613 switch (resolveType)
9616 case ResolveSharedCritical:
9622 case ResolveCritical:
9623 // fgSplitEdge may add one or two BasicBlocks. It returns the block that splits
9624 // the edge from 'fromBlock' and 'toBlock', but if it inserts that block right after
9625 // a block with a fall-through it will have to create another block to handle that edge.
9626 // These new blocks can be mapped to existing blocks in order to correctly handle
9627 // the calls to recordVarLocationsAtStartOfBB() from codegen. That mapping is handled
9628 // in resolveEdges(), after all the edge resolution has been done (by calling this
9629 // method for each edge).
9630 block = compiler->fgSplitEdge(fromBlock, toBlock);
9632 // Split edges are counted against fromBlock.
9633 INTRACK_STATS(updateLsraStat(LSRA_STAT_SPLIT_EDGE, fromBlock->bbNum));
9640 #ifndef _TARGET_XARCH_
9641 // We record tempregs for beginning and end of each block.
9642 // For amd64/x86 we only need a tempReg for float - we'll use xchg for int.
9643 // TODO-Throughput: It would be better to determine the tempRegs on demand, but the code below
9644 // modifies the varToRegMaps so we don't have all the correct registers at the time
9645 // we need to get the tempReg.
9646 regNumber tempRegInt =
9647 (resolveType == ResolveSharedCritical) ? REG_NA : getTempRegForResolution(fromBlock, toBlock, TYP_INT);
9648 #endif // !_TARGET_XARCH_
9649 regNumber tempRegFlt = REG_NA;
9650 if ((compiler->compFloatingPointUsed) && (resolveType != ResolveSharedCritical))
9652 tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
9655 regMaskTP targetRegsToDo = RBM_NONE;
9656 regMaskTP targetRegsReady = RBM_NONE;
9657 regMaskTP targetRegsFromStack = RBM_NONE;
9659 // The following arrays capture the location of the registers as they are moved:
9660 // - location[reg] gives the current location of the var that was originally in 'reg'.
9661 // (Note that a var may be moved more than once.)
9662 // - source[reg] gives the original location of the var that needs to be moved to 'reg'.
9663 // For example, if a var is in rax and needs to be moved to rsi, then we would start with:
9664 // location[rax] == rax
9665 // source[rsi] == rax -- this doesn't change
9666 // Then, if for some reason we need to move it temporarily to rbx, we would have:
9667 // location[rax] == rbx
9668 // Once we have completed the move, we will have:
9669 // location[rax] == REG_NA
9670 // This indicates that the var originally in rax is now in its target register.
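// A concrete cycle (sketch): if V01 must move rax -> rsi and V02 must move rsi -> rax,
// then after the recording pass below we have:
//     source[rsi] == rax     source[rax] == rsi
//     location[rax] == rax   location[rsi] == rsi
// Neither target register is free, so the cycle is broken either with a GT_SWAP (via
// insertSwap) or by moving one value through a temp register (or, rarely, the stack).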
9672 regNumberSmall location[REG_COUNT];
9673 C_ASSERT(sizeof(char) == sizeof(regNumberSmall)); // for memset to work
9674 memset(location, REG_NA, REG_COUNT);
9675 regNumberSmall source[REG_COUNT];
9676 memset(source, REG_NA, REG_COUNT);
9678 // What interval is this register associated with?
9679 // (associated with incoming reg)
9680 Interval* sourceIntervals[REG_COUNT];
9681 memset(&sourceIntervals, 0, sizeof(sourceIntervals));
9683 // Intervals for vars that need to be loaded from the stack
9684 Interval* stackToRegIntervals[REG_COUNT];
9685 memset(&stackToRegIntervals, 0, sizeof(stackToRegIntervals));
9687 // Get the starting insertion point for the "to" resolution
9688 GenTreePtr insertionPoint = nullptr;
9689 if (resolveType == ResolveSplit || resolveType == ResolveCritical)
9691 insertionPoint = LIR::AsRange(block).FirstNonPhiNode();
9695 // - Perform all moves from reg to stack (no ordering needed on these)
9696 // - For reg to reg moves, record the current location, associating their
9697 // source location with the target register they need to go into
9698 // - For stack to reg moves (done last, no ordering needed between them)
9699 // record the interval associated with the target reg
9700 // TODO-Throughput: We should be looping over the liveIn and liveOut registers, since
9701 // that will scale better than the live variables
9703 VARSET_ITER_INIT(compiler, iter, liveSet, varIndex);
9704 while (iter.NextElem(compiler, &varIndex))
9706 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
9707 bool isSpilled = false;
9708 Interval* interval = getIntervalForLocalVar(varNum);
9709 regNumber fromReg = getVarReg(fromVarToRegMap, varNum);
9710 regNumber toReg = getVarReg(toVarToRegMap, varNum);
9711 if (fromReg == toReg)
9716 // For Critical edges, the location will not change on either side of the edge,
9717 // since we'll add a new block to do the move.
9718 if (resolveType == ResolveSplit)
9720 toVarToRegMap[varIndex] = fromReg;
9722 else if (resolveType == ResolveJoin || resolveType == ResolveSharedCritical)
9724 fromVarToRegMap[varIndex] = toReg;
9727 assert(fromReg < UCHAR_MAX && toReg < UCHAR_MAX);
9731 if (fromReg != toReg)
9733 if (fromReg == REG_STK)
9735 stackToRegIntervals[toReg] = interval;
9736 targetRegsFromStack |= genRegMask(toReg);
9738 else if (toReg == REG_STK)
9740 // Do the reg to stack moves now
9741 addResolution(block, insertionPoint, interval, REG_STK, fromReg);
9742 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9746 location[fromReg] = (regNumberSmall)fromReg;
9747 source[toReg] = (regNumberSmall)fromReg;
9748 sourceIntervals[fromReg] = interval;
9749 targetRegsToDo |= genRegMask(toReg);
9754 // REGISTER to REGISTER MOVES
9756 // First, find all the ones that are ready to move now
9757 regMaskTP targetCandidates = targetRegsToDo;
9758 while (targetCandidates != RBM_NONE)
9760 regMaskTP targetRegMask = genFindLowestBit(targetCandidates);
9761 targetCandidates &= ~targetRegMask;
9762 regNumber targetReg = genRegNumFromMask(targetRegMask);
9763 if (location[targetReg] == REG_NA)
9765 targetRegsReady |= targetRegMask;
9769 // Perform reg to reg moves
9770 while (targetRegsToDo != RBM_NONE)
9772 while (targetRegsReady != RBM_NONE)
9774 regMaskTP targetRegMask = genFindLowestBit(targetRegsReady);
9775 targetRegsToDo &= ~targetRegMask;
9776 targetRegsReady &= ~targetRegMask;
9777 regNumber targetReg = genRegNumFromMask(targetRegMask);
9778 assert(location[targetReg] != targetReg);
9779 regNumber sourceReg = (regNumber)source[targetReg];
9780 regNumber fromReg = (regNumber)location[sourceReg];
9781 assert(fromReg < UCHAR_MAX && sourceReg < UCHAR_MAX);
9782 Interval* interval = sourceIntervals[sourceReg];
9783 assert(interval != nullptr);
9784 addResolution(block, insertionPoint, interval, targetReg, fromReg);
9785 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9786 sourceIntervals[sourceReg] = nullptr;
9787 location[sourceReg] = REG_NA;
9789 // Do we have a free targetReg?
9790 if (fromReg == sourceReg && source[fromReg] != REG_NA)
9792 regMaskTP fromRegMask = genRegMask(fromReg);
9793 targetRegsReady |= fromRegMask;
9796 if (targetRegsToDo != RBM_NONE)
9798 regMaskTP targetRegMask = genFindLowestBit(targetRegsToDo);
9799 regNumber targetReg = genRegNumFromMask(targetRegMask);
9801 // Is it already there due to other moves?
9802 // If not, move it to the temp reg, OR swap it with another register
9803 regNumber sourceReg = (regNumber)source[targetReg];
9804 regNumber fromReg = (regNumber)location[sourceReg];
9805 if (targetReg == fromReg)
9807 targetRegsToDo &= ~targetRegMask;
9811 regNumber tempReg = REG_NA;
9812 bool useSwap = false;
9813 if (emitter::isFloatReg(targetReg))
9815 tempReg = tempRegFlt;
9817 #ifdef _TARGET_XARCH_
9822 #else // !_TARGET_XARCH_
9826 tempReg = tempRegInt;
9829 #endif // !_TARGET_XARCH_
9830 if (useSwap || tempReg == REG_NA)
9832 // First, we have to figure out the destination register for what's currently in fromReg,
9833 // so that we can find its sourceInterval.
9834 regNumber otherTargetReg = REG_NA;
9836 // By chance, is fromReg going where it belongs?
9837 if (location[source[fromReg]] == targetReg)
9839 otherTargetReg = fromReg;
9840 // If we can swap, we will be done with otherTargetReg as well.
9841 // Otherwise, we'll spill it to the stack and reload it later.
9844 regMaskTP fromRegMask = genRegMask(fromReg);
9845 targetRegsToDo &= ~fromRegMask;
9850 // Look at the remaining registers from targetRegsToDo (which we expect to be relatively
9851 // small at this point) to find out what's currently in targetReg.
9852 regMaskTP mask = targetRegsToDo;
9853 while (mask != RBM_NONE && otherTargetReg == REG_NA)
9855 regMaskTP nextRegMask = genFindLowestBit(mask);
9856 regNumber nextReg = genRegNumFromMask(nextRegMask);
9857 mask &= ~nextRegMask;
9858 if (location[source[nextReg]] == targetReg)
9860 otherTargetReg = nextReg;
9864 assert(otherTargetReg != REG_NA);
9868 // Generate a "swap" of fromReg and targetReg
9869 insertSwap(block, insertionPoint, sourceIntervals[source[otherTargetReg]]->varNum, targetReg,
9870 sourceIntervals[sourceReg]->varNum, fromReg);
9871 location[sourceReg] = REG_NA;
9872 location[source[otherTargetReg]] = (regNumberSmall)fromReg;
9874 INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
9878 // Spill "targetReg" to the stack and add its eventual target (otherTargetReg)
9879 // to "targetRegsFromStack", which will be handled below.
9880 // NOTE: This condition is very rare. Setting COMPlus_JitStressRegs=0x203
9881 // has been known to trigger it in JIT SH.
9883 // First, spill "otherInterval" from targetReg to the stack.
9884 Interval* otherInterval = sourceIntervals[source[otherTargetReg]];
9885 setIntervalAsSpilled(otherInterval);
9886 addResolution(block, insertionPoint, otherInterval, REG_STK, targetReg);
9887 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9888 location[source[otherTargetReg]] = REG_STK;
9890 // Now, move the interval that is going to targetReg, and add its "fromReg" to
9891 // "targetRegsReady".
9892 addResolution(block, insertionPoint, sourceIntervals[sourceReg], targetReg, fromReg);
9893 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9894 location[sourceReg] = REG_NA;
9895 targetRegsReady |= genRegMask(fromReg);
9897 targetRegsToDo &= ~targetRegMask;
9901 compiler->codeGen->regSet.rsSetRegsModified(genRegMask(tempReg) DEBUGARG(dumpTerse));
9902 assert(sourceIntervals[targetReg] != nullptr);
9903 addResolution(block, insertionPoint, sourceIntervals[targetReg], tempReg, targetReg);
9904 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9905 location[targetReg] = (regNumberSmall)tempReg;
9906 targetRegsReady |= targetRegMask;
9912 // Finally, perform stack to reg moves
9913 // All the target regs will be empty at this point
9914 while (targetRegsFromStack != RBM_NONE)
9916 regMaskTP targetRegMask = genFindLowestBit(targetRegsFromStack);
9917 targetRegsFromStack &= ~targetRegMask;
9918 regNumber targetReg = genRegNumFromMask(targetRegMask);
9920 Interval* interval = stackToRegIntervals[targetReg];
9921 assert(interval != nullptr);
9923 addResolution(block, insertionPoint, interval, targetReg, REG_STK);
9924 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
9928 void TreeNodeInfo::Initialize(LinearScan* lsra, GenTree* node, LsraLocation location)
9930 regMaskTP dstCandidates;
9932 // if there is a reg indicated on the tree node, use that for dstCandidates
9933 // the exception is the NOP, which sometimes shows up around late args.
9934 // TODO-Cleanup: get rid of those NOPs.
9935 if (node->gtRegNum == REG_NA || node->gtOper == GT_NOP)
9937 dstCandidates = lsra->allRegs(node->TypeGet());
9941 dstCandidates = genRegMask(node->gtRegNum);
9944 internalIntCount = 0;
9945 internalFloatCount = 0;
9946 isLocalDefUse = false;
9947 isHelperCallWithKills = false;
9948 isLsraAdded = false;
9949 definesAnyRegisters = false;
9951 setDstCandidates(lsra, dstCandidates);
9952 srcCandsIndex = dstCandsIndex;
9954 setInternalCandidates(lsra, lsra->allRegs(TYP_INT));
9958 isInitialized = true;
9961 assert(IsValid(lsra));
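// For example (sketch, hypothetical register): a node that has been pre-assigned to a
// specific register, say node->gtRegNum == REG_RAX, is initialized with
// dstCandidates == genRegMask(REG_RAX), while an unassigned node (gtRegNum == REG_NA)
// gets all registers valid for its type via lsra->allRegs(node->TypeGet()).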
9964 regMaskTP TreeNodeInfo::getSrcCandidates(LinearScan* lsra)
9966 return lsra->GetRegMaskForIndex(srcCandsIndex);
9969 void TreeNodeInfo::setSrcCandidates(LinearScan* lsra, regMaskTP mask)
9971 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9972 assert(FitsIn<unsigned char>(i));
9973 srcCandsIndex = (unsigned char)i;
9976 regMaskTP TreeNodeInfo::getDstCandidates(LinearScan* lsra)
9978 return lsra->GetRegMaskForIndex(dstCandsIndex);
9981 void TreeNodeInfo::setDstCandidates(LinearScan* lsra, regMaskTP mask)
9983 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9984 assert(FitsIn<unsigned char>(i));
9985 dstCandsIndex = (unsigned char)i;
9988 regMaskTP TreeNodeInfo::getInternalCandidates(LinearScan* lsra)
9990 return lsra->GetRegMaskForIndex(internalCandsIndex);
9993 void TreeNodeInfo::setInternalCandidates(LinearScan* lsra, regMaskTP mask)
9995 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
9996 assert(FitsIn<unsigned char>(i));
9997 internalCandsIndex = (unsigned char)i;
10000 void TreeNodeInfo::addInternalCandidates(LinearScan* lsra, regMaskTP mask)
10002 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(lsra->GetRegMaskForIndex(internalCandsIndex) | mask);
10003 assert(FitsIn<unsigned char>(i));
10004 internalCandsIndex = (unsigned char)i;
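// Usage sketch (hypothetical client code, e.g. in a TreeNodeInfo initializer): requesting
// one internal int register that must be RCX (RBM_RCX is target-specific):
//
//     info->internalIntCount = 1;
//     info->setInternalCandidates(lsra, RBM_RCX);
//
// addInternalCandidates() widens an existing restriction instead of replacing it.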
10007 #if TRACK_LSRA_STATS
10008 // ----------------------------------------------------------
10009 // updateLsraStat: Increment LSRA stat counter.
10012 // stat - LSRA stat enum
10013 // bbNum - Basic block with which the LSRA stat is to be
10014 //         associated.
10016 void LinearScan::updateLsraStat(LsraStat stat, unsigned bbNum)
10018 if (bbNum > bbNumMaxBeforeResolution)
10020 // This is a newly created basic block as part of resolution.
10021 // These blocks contain resolution moves that are already accounted for.
10027 case LSRA_STAT_SPILL:
10028 ++(blockInfo[bbNum].spillCount);
10031 case LSRA_STAT_COPY_REG:
10032 ++(blockInfo[bbNum].copyRegCount);
10035 case LSRA_STAT_RESOLUTION_MOV:
10036 ++(blockInfo[bbNum].resolutionMovCount);
10039 case LSRA_STAT_SPLIT_EDGE:
10040 ++(blockInfo[bbNum].splitEdgeCount);
10048 // -----------------------------------------------------------
10049 // dumpLsraStats - dumps LSRA stats to the given file.
10052 // file - file to which stats are to be written.
10054 void LinearScan::dumpLsraStats(FILE* file)
10056 unsigned sumSpillCount = 0;
10057 unsigned sumCopyRegCount = 0;
10058 unsigned sumResolutionMovCount = 0;
10059 unsigned sumSplitEdgeCount = 0;
10060 UINT64 wtdSpillCount = 0;
10061 UINT64 wtdCopyRegCount = 0;
10062 UINT64 wtdResolutionMovCount = 0;
10064 fprintf(file, "----------\n");
10065 fprintf(file, "LSRA Stats");
10069 fprintf(file, " : %s\n", compiler->info.compFullName);
10073 // In verbose mode there is no need to print the full method name
10074 // while printing LSRA stats.
10075 fprintf(file, "\n");
10078 fprintf(file, " : %s\n", compiler->eeGetMethodFullName(compiler->info.compCompHnd));
10081 fprintf(file, "----------\n");
10083 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
10085 if (block->bbNum > bbNumMaxBeforeResolution)
10090 unsigned spillCount = blockInfo[block->bbNum].spillCount;
10091 unsigned copyRegCount = blockInfo[block->bbNum].copyRegCount;
10092 unsigned resolutionMovCount = blockInfo[block->bbNum].resolutionMovCount;
10093 unsigned splitEdgeCount = blockInfo[block->bbNum].splitEdgeCount;
10095 if (spillCount != 0 || copyRegCount != 0 || resolutionMovCount != 0 || splitEdgeCount != 0)
10097 fprintf(file, "BB%02u [%8d]: ", block->bbNum, block->bbWeight);
10098 fprintf(file, "SpillCount = %d, ResolutionMovs = %d, SplitEdges = %d, CopyReg = %d\n", spillCount,
10099 resolutionMovCount, splitEdgeCount, copyRegCount);
10102 sumSpillCount += spillCount;
10103 sumCopyRegCount += copyRegCount;
10104 sumResolutionMovCount += resolutionMovCount;
10105 sumSplitEdgeCount += splitEdgeCount;
10107 wtdSpillCount += (UINT64)spillCount * block->bbWeight;
10108 wtdCopyRegCount += (UINT64)copyRegCount * block->bbWeight;
10109 wtdResolutionMovCount += (UINT64)resolutionMovCount * block->bbWeight;
10112 fprintf(file, "Total Spill Count: %d Weighted: %I64u\n", sumSpillCount, wtdSpillCount);
10113 fprintf(file, "Total CopyReg Count: %d Weighted: %I64u\n", sumCopyRegCount, wtdCopyRegCount);
10114 fprintf(file, "Total ResolutionMov Count: %d Weighted: %I64u\n", sumResolutionMovCount, wtdResolutionMovCount);
10115 fprintf(file, "Total number of split edges: %d\n", sumSplitEdgeCount);
10117 // compute total number of spill temps created
10118 unsigned numSpillTemps = 0;
10119 for (int i = 0; i < TYP_COUNT; i++)
10121 numSpillTemps += maxSpill[i];
10123 fprintf(file, "Total Number of spill temps created: %d\n\n", numSpillTemps);
10125 #endif // TRACK_LSRA_STATS
10128 void dumpRegMask(regMaskTP regs)
10130 if (regs == RBM_ALLINT)
10132 printf("[allInt]");
10134 else if (regs == (RBM_ALLINT & ~RBM_FPBASE))
10136 printf("[allIntButFP]");
10138 else if (regs == RBM_ALLFLOAT)
10140 printf("[allFloat]");
10142 else if (regs == RBM_ALLDOUBLE)
10144 printf("[allDouble]");
10152 static const char* getRefTypeName(RefType refType)
10156 #define DEF_REFTYPE(memberName, memberValue, shortName) \
10158 return #memberName;
10159 #include "lsra_reftypes.h"
10166 static const char* getRefTypeShortName(RefType refType)
10170 #define DEF_REFTYPE(memberName, memberValue, shortName) \
10173 #include "lsra_reftypes.h"
10180 void RefPosition::dump()
10182 printf("<RefPosition #%-3u @%-3u", rpNum, nodeLocation);
10184 if (nextRefPosition)
10186 printf(" ->#%-3u", nextRefPosition->rpNum);
10189 printf(" %s ", getRefTypeName(refType));
10191 if (this->isPhysRegRef)
10193 this->getReg()->tinyDump();
10195 else if (getInterval())
10197 this->getInterval()->tinyDump();
10200 if (this->treeNode)
10202 printf("%s ", treeNode->OpName(treeNode->OperGet()));
10204 printf("BB%02u ", this->bbNum);
10206 printf("regmask=");
10207 dumpRegMask(registerAssignment);
10217 if (this->spillAfter)
10219 printf(" spillAfter");
10229 if (this->isFixedRegRef)
10233 if (this->isLocalDefUse)
10237 if (this->delayRegFree)
10241 if (this->outOfOrder)
10243 printf(" outOfOrder");
10246 if (this->AllocateIfProfitable())
10248 printf(" regOptional");
10253 void RegRecord::dump()
10258 void Interval::dump()
10260 printf("Interval %2u:", intervalIndex);
10264 printf(" (V%02u)", varNum);
10268 printf(" (INTERNAL)");
10272 printf(" (SPILLED)");
10276 printf(" (SPLIT)");
10280 printf(" (struct)");
10282 if (isSpecialPutArg)
10284 printf(" (specialPutArg)");
10288 printf(" (constant)");
10291 printf(" RefPositions {");
10292 for (RefPosition* refPosition = this->firstRefPosition; refPosition != nullptr;
10293 refPosition = refPosition->nextRefPosition)
10295 printf("#%u@%u", refPosition->rpNum, refPosition->nodeLocation);
10296 if (refPosition->nextRefPosition)
10303 // this is not used (yet?)
10304 // printf(" SpillOffset %d", this->spillOffset);
10306 printf(" physReg:%s", getRegName(physReg));
10308 printf(" Preferences=");
10309 dumpRegMask(this->registerPreferences);
10311 if (relatedInterval)
10313 printf(" RelatedInterval ");
10314 relatedInterval->microDump();
10315 printf("[%p]", dspPtr(relatedInterval));
10321 // print out very concise representation
10322 void Interval::tinyDump()
10324 printf("<Ivl:%u", intervalIndex);
10327 printf(" V%02u", varNum);
10331 printf(" internal");
10336 // print out extremely concise representation
10337 void Interval::microDump()
10339 char intervalTypeChar = 'I';
10342 intervalTypeChar = 'T';
10344 else if (isLocalVar)
10346 intervalTypeChar = 'L';
10349 printf("<%c%u>", intervalTypeChar, intervalIndex);
10352 void RegRecord::tinyDump()
10354 printf("<Reg:%-3s> ", getRegName(regNum));
10357 void TreeNodeInfo::dump(LinearScan* lsra)
10359 printf("<TreeNodeInfo @ %2u %d=%d %di %df", loc, dstCount, srcCount, internalIntCount, internalFloatCount);
10361 dumpRegMask(getSrcCandidates(lsra));
10363 dumpRegMask(getInternalCandidates(lsra));
10365 dumpRegMask(getDstCandidates(lsra));
10374 if (isHelperCallWithKills)
10393 void LinearScan::lsraDumpIntervals(const char* msg)
10397 printf("\nLinear scan intervals %s:\n", msg);
10398 for (auto& interval : intervals)
10400 // only dump something if it has references
10401 // if (interval->firstRefPosition)
10408 // Dumps a tree node as a destination or source operand, with the style
10409 // of dump dependent on the mode
10410 void LinearScan::lsraGetOperandString(GenTreePtr tree,
10411 LsraTupleDumpMode mode,
10412 char* operandString,
10413 unsigned operandStringLength)
10415 const char* lastUseChar = "";
10416 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
10422 case LinearScan::LSRA_DUMP_PRE:
10423 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtSeqNum, lastUseChar);
10425 case LinearScan::LSRA_DUMP_REFPOS:
10426 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtSeqNum, lastUseChar);
10428 case LinearScan::LSRA_DUMP_POST:
10430 Compiler* compiler = JitTls::GetCompiler();
10432 if (!tree->gtHasReg())
10434 _snprintf_s(operandString, operandStringLength, operandStringLength, "STK%s", lastUseChar);
10438 _snprintf_s(operandString, operandStringLength, operandStringLength, "%s%s",
10439 getRegName(tree->gtRegNum, useFloatReg(tree->TypeGet())), lastUseChar);
10444 printf("ERROR: INVALID TUPLE DUMP MODE\n");
10448 void LinearScan::lsraDispNode(GenTreePtr tree, LsraTupleDumpMode mode, bool hasDest)
10450 Compiler* compiler = JitTls::GetCompiler();
10451 const unsigned operandStringLength = 16;
10452 char operandString[operandStringLength];
10453 const char* emptyDestOperand = " ";
10454 char spillChar = ' ';
10456 if (mode == LinearScan::LSRA_DUMP_POST)
10458 if ((tree->gtFlags & GTF_SPILL) != 0)
10462 if (!hasDest && tree->gtHasReg())
10464 // A node can define a register, but not produce a value for a parent to consume,
10465 // i.e. in the "localDefUse" case.
10466 // There used to be an assert here that we wouldn't spill such a node.
10467 // However, we can have unused lclVars that wind up being the node at which
10468 // it is spilled. This probably indicates a bug, but we don't really want to
10469 // assert during a dump.
10470 if (spillChar == 'S')
10481 printf("%c N%03u. ", spillChar, tree->gtSeqNum);
10483 LclVarDsc* varDsc = nullptr;
10484 unsigned varNum = UINT_MAX;
10485 if (tree->IsLocal())
10487 varNum = tree->gtLclVarCommon.gtLclNum;
10488 varDsc = &(compiler->lvaTable[varNum]);
10489 if (varDsc->lvLRACandidate)
10496 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
10498 assert(tree->gtHasReg());
10500 lsraGetOperandString(tree, mode, operandString, operandStringLength);
10501 printf("%-15s =", operandString);
10505 printf("%-15s ", emptyDestOperand);
10507 if (varDsc != nullptr)
10509 if (varDsc->lvLRACandidate)
10511 if (mode == LSRA_DUMP_REFPOS)
10513 printf(" V%02u(L%d)", varNum, getIntervalForLocalVar(varNum)->intervalIndex);
10517 lsraGetOperandString(tree, mode, operandString, operandStringLength);
10518 printf(" V%02u(%s)", varNum, operandString);
10519 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
10527 printf(" V%02u MEM", varNum);
10530 else if (tree->OperIsAssignment())
10532 assert(!tree->gtHasReg());
10533 const char* isRev = "";
10534 if ((tree->gtFlags & GTF_REVERSE_OPS) != 0)
10538 printf(" asg%s%s ", GenTree::NodeName(tree->OperGet()), isRev);
10542 compiler->gtDispNodeName(tree);
10543 if ((tree->gtFlags & GTF_REVERSE_OPS) != 0)
10547 if (tree->OperKind() & GTK_LEAF)
10549 compiler->gtDispLeaf(tree, nullptr);
10554 //------------------------------------------------------------------------
10555 // ComputeOperandDstCount: computes the number of registers defined by a
10556 // node.
10558 // For most nodes, this is simple:
10559 // - Nodes that do not produce values (e.g. stores and other void-typed
10560 // nodes) and nodes that immediately use the registers they define
10561 // produce no registers
10562 // - Nodes that are marked as defining N registers define N registers.
10564 // For contained nodes, however, things are more complicated: for purposes
10565 // of bookkeeping, a contained node is treated as producing the transitive
10566 // closure of the registers produced by its sources.
10569 // operand - The operand for which to compute a register count.
10572 // The number of registers defined by `operand`.
10574 void LinearScan::DumpOperandDefs(
10575 GenTree* operand, bool& first, LsraTupleDumpMode mode, char* operandString, const unsigned operandStringLength)
10577 assert(operand != nullptr);
10578 assert(operandString != nullptr);
10580 if (ComputeOperandDstCount(operand) == 0)
10585 if (operand->gtLsraInfo.dstCount != 0)
10587 // This operand directly produces registers; print it.
10588 for (int i = 0; i < operand->gtLsraInfo.dstCount; i++)
10595 lsraGetOperandString(operand, mode, operandString, operandStringLength);
10596 printf("%s", operandString);
10603 // This is a contained node. Dump the defs produced by its operands.
10604 for (GenTree* op : operand->Operands())
10606 DumpOperandDefs(op, first, mode, operandString, operandStringLength);
10611 void LinearScan::TupleStyleDump(LsraTupleDumpMode mode)
10614 LsraLocation currentLoc = 1; // 0 is the entry
10615 const unsigned operandStringLength = 16;
10616 char operandString[operandStringLength];
10618 // currentRefPosition is not used for LSRA_DUMP_PRE
10619 // We keep separate iterators for defs, so that we can print them
10620 // on the lhs of the dump
10621 auto currentRefPosition = refPositions.begin();
10625 case LSRA_DUMP_PRE:
10626 printf("TUPLE STYLE DUMP BEFORE LSRA\n");
10628 case LSRA_DUMP_REFPOS:
10629 printf("TUPLE STYLE DUMP WITH REF POSITIONS\n");
10631 case LSRA_DUMP_POST:
10632 printf("TUPLE STYLE DUMP WITH REGISTER ASSIGNMENTS\n");
10635 printf("ERROR: INVALID TUPLE DUMP MODE\n");
10639 if (mode != LSRA_DUMP_PRE)
10641 printf("Incoming Parameters: ");
10642 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB;
10643 ++currentRefPosition)
10645 Interval* interval = currentRefPosition->getInterval();
10646 assert(interval != nullptr && interval->isLocalVar);
10647 printf(" V%02d", interval->varNum);
10648 if (mode == LSRA_DUMP_POST)
10651 if (currentRefPosition->registerAssignment == RBM_NONE)
10657 reg = currentRefPosition->assignedReg();
10659 LclVarDsc* varDsc = &(compiler->lvaTable[interval->varNum]);
10661 regNumber assignedReg = varDsc->lvRegNum;
10662 regNumber argReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
10664 assert(reg == assignedReg || varDsc->lvRegister == false);
10667 printf(getRegName(argReg, isFloatRegType(interval->registerType)));
10670 printf("%s)", getRegName(reg, isFloatRegType(interval->registerType)));
10676 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
10680 if (mode == LSRA_DUMP_REFPOS)
10682 bool printedBlockHeader = false;
10683 // We should find the boundary RefPositions in the order of exposed uses, dummy defs, and the blocks
10684 for (; currentRefPosition != refPositions.end() &&
10685 (currentRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef ||
10686 (currentRefPosition->refType == RefTypeBB && !printedBlockHeader));
10687 ++currentRefPosition)
10689 Interval* interval = nullptr;
10690 if (currentRefPosition->isIntervalRef())
10692 interval = currentRefPosition->getInterval();
10694 switch (currentRefPosition->refType)
10696 case RefTypeExpUse:
10697 assert(interval != nullptr);
10698 assert(interval->isLocalVar);
10699 printf(" Exposed use of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
10701 case RefTypeDummyDef:
10702 assert(interval != nullptr);
10703 assert(interval->isLocalVar);
10704 printf(" Dummy def of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
10707 block->dspBlockHeader(compiler);
10708 printedBlockHeader = true;
10712 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
10719 block->dspBlockHeader(compiler);
10722 if (mode == LSRA_DUMP_POST && block != compiler->fgFirstBB && block->bbNum <= bbNumMaxBeforeResolution)
10724 printf("Predecessor for variable locations: BB%02u\n", blockInfo[block->bbNum].predBBNum);
10725 dumpInVarToRegMap(block);
10727 if (block->bbNum > bbNumMaxBeforeResolution)
10729 SplitEdgeInfo splitEdgeInfo;
10730 splitBBNumToTargetBBNumMap->Lookup(block->bbNum, &splitEdgeInfo);
10731 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
10732 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
10733 printf("New block introduced for resolution from BB%02u to BB%02u\n", splitEdgeInfo.fromBBNum,
10734 splitEdgeInfo.toBBNum);
10737 for (GenTree* node : LIR::AsRange(block).NonPhiNodes())
10739 GenTree* tree = node;
10741 genTreeOps oper = tree->OperGet();
10742 TreeNodeInfo& info = tree->gtLsraInfo;
10743 if (tree->gtLsraInfo.isLsraAdded)
10745 // This must be one of the nodes that we add during LSRA
10747 if (oper == GT_LCL_VAR)
10752 else if (oper == GT_RELOAD || oper == GT_COPY)
10757 #ifdef FEATURE_SIMD
10758 else if (oper == GT_SIMD)
10760 if (tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperSave)
10767 assert(tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperRestore);
10772 #endif // FEATURE_SIMD
10775 assert(oper == GT_SWAP);
10779 info.internalIntCount = 0;
10780 info.internalFloatCount = 0;
10783 int consume = info.srcCount;
10784 int produce = info.dstCount;
10785 regMaskTP killMask = RBM_NONE;
10786 regMaskTP fixedMask = RBM_NONE;
10788 lsraDispNode(tree, mode, produce != 0 && mode != LSRA_DUMP_REFPOS);
10790 if (mode != LSRA_DUMP_REFPOS)
10797 for (GenTree* operand : tree->Operands())
10799 DumpOperandDefs(operand, first, mode, operandString, operandStringLength);
10805 // Print each RefPosition on a new line, but
10806 // printing all the kills for each node on a single line
10807 // and combining the fixed regs with their associated def or use
10808 bool killPrinted = false;
10809 RefPosition* lastFixedRegRefPos = nullptr;
10810 for (; currentRefPosition != refPositions.end() &&
10811 (currentRefPosition->refType == RefTypeUse || currentRefPosition->refType == RefTypeFixedReg ||
10812 currentRefPosition->refType == RefTypeKill || currentRefPosition->refType == RefTypeDef) &&
10813 (currentRefPosition->nodeLocation == tree->gtSeqNum ||
10814 currentRefPosition->nodeLocation == tree->gtSeqNum + 1);
10815 ++currentRefPosition)
10817 Interval* interval = nullptr;
10818 if (currentRefPosition->isIntervalRef())
10820 interval = currentRefPosition->getInterval();
10822 switch (currentRefPosition->refType)
10825 if (currentRefPosition->isPhysRegRef)
10827 printf("\n Use:R%d(#%d)",
10828 currentRefPosition->getReg()->regNum, currentRefPosition->rpNum);
10832 assert(interval != nullptr);
10834 interval->microDump();
10835 printf("(#%d)", currentRefPosition->rpNum);
10836 if (currentRefPosition->isFixedRegRef)
10838 assert(genMaxOneBit(currentRefPosition->registerAssignment));
10839 assert(lastFixedRegRefPos != nullptr);
10840 printf(" Fixed:%s(#%d)", getRegName(currentRefPosition->assignedReg(),
10841 isFloatRegType(interval->registerType)),
10842 lastFixedRegRefPos->rpNum);
10843 lastFixedRegRefPos = nullptr;
10845 if (currentRefPosition->isLocalDefUse)
10847 printf(" LocalDefUse");
10849 if (currentRefPosition->lastUse)
10857 // Print each def on a new line
10858 assert(interval != nullptr);
10860 interval->microDump();
10861 printf("(#%d)", currentRefPosition->rpNum);
10862 if (currentRefPosition->isFixedRegRef)
10864 assert(genMaxOneBit(currentRefPosition->registerAssignment));
10865 printf(" %s", getRegName(currentRefPosition->assignedReg(),
10866 isFloatRegType(interval->registerType)));
10868 if (currentRefPosition->isLocalDefUse)
10870 printf(" LocalDefUse");
10872 if (currentRefPosition->lastUse)
10876 if (interval->relatedInterval != nullptr)
10879 interval->relatedInterval->microDump();
10886 printf("\n Kill: ");
10887 killPrinted = true;
10889 printf(getRegName(currentRefPosition->assignedReg(),
10890 isFloatRegType(currentRefPosition->getReg()->registerType)));
10893 case RefTypeFixedReg:
10894 lastFixedRegRefPos = currentRefPosition;
10897 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
10903 if (info.internalIntCount != 0 && mode != LSRA_DUMP_REFPOS)
10905 printf("\tinternal (%d):\t", info.internalIntCount);
10906 if (mode == LSRA_DUMP_POST)
10908 dumpRegMask(tree->gtRsvdRegs);
10910 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
10912 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
10916 if (info.internalFloatCount != 0 && mode != LSRA_DUMP_REFPOS)
10918 printf("\tinternal (%d):\t", info.internalFloatCount);
10919 if (mode == LSRA_DUMP_POST)
10921 dumpRegMask(tree->gtRsvdRegs);
10923 else if ((info.getInternalCandidates(this) & allRegs(TYP_FLOAT)) != allRegs(TYP_FLOAT))
10925 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_FLOAT));
10930 if (mode == LSRA_DUMP_POST)
10932 dumpOutVarToRegMap(block);
void LinearScan::dumpLsraAllocationEvent(LsraDumpEvent event,
                                         Interval*     interval,
                                         regNumber     reg,
                                         BasicBlock*   currentBlock)
{
    if (!(VERBOSE))
    {
        return;
    }
    switch (event)
    {
        // Conflicting def/use
        case LSRA_EVENT_DEFUSE_CONFLICT:
            if (!dumpTerse)
            {
                printf(" Def and Use have conflicting register requirements:");
            }
            else
            {
                printf("DUconflict ");
                dumpRegRecords();
            }
            break;
        case LSRA_EVENT_DEFUSE_FIXED_DELAY_USE:
            if (!dumpTerse)
            {
                printf(" Can't change useAssignment ");
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE1:
            if (!dumpTerse)
            {
                printf(" case #1, use the defRegAssignment\n");
            }
            else
            {
                printf(indentFormat, " case #1 use defRegAssignment");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE2:
            if (!dumpTerse)
            {
                printf(" case #2, use the useRegAssignment\n");
            }
            else
            {
                printf(indentFormat, " case #2 use useRegAssignment");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE3:
            if (!dumpTerse)
            {
                printf(" case #3, change the defRegAssignment to the use regs\n");
            }
            else
            {
                printf(indentFormat, " case #3 use useRegAssignment");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE4:
            if (!dumpTerse)
            {
                printf(" case #4, change the useRegAssignment to the def regs\n");
            }
            else
            {
                printf(indentFormat, " case #4 use defRegAssignment");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE5:
            if (!dumpTerse)
            {
                printf(" case #5, Conflicting Def and Use single-register requirements require copies - set def to all "
                       "regs of the appropriate type\n");
            }
            else
            {
                printf(indentFormat, " case #5 set def to all regs");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_DEFUSE_CASE6:
            if (!dumpTerse)
            {
                printf(" case #6, Conflicting Def and Use register requirements require a copy\n");
            }
            else
            {
                printf(indentFormat, " case #6 need a copy");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;

        case LSRA_EVENT_SPILL:
            if (!dumpTerse)
            {
                printf("Spilled:\n");
                interval->dump();
            }
            else
            {
                assert(interval != nullptr && interval->assignedReg != nullptr);
                printf("Spill %-4s ", getRegName(interval->assignedReg->regNum));
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_SPILL_EXTENDED_LIFETIME:
            if (!dumpTerse)
            {
                printf(" Spilled extended lifetime var V%02u at last use; not marked for actual spill.",
                       interval->intervalIndex);
            }
            break;

        // Restoring the previous register
        case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL:
            assert(interval != nullptr);
            if (!dumpTerse)
            {
                printf(" Assign register %s to previous interval Ivl:%d after spill\n", getRegName(reg),
                       interval->intervalIndex);
            }
            else
            {
                // If we spilled, then the dump is already pre-indented, but we need to pre-indent for the
                // subsequent allocation, which we do with a dumpEmptyRefPosition().
                printf("SRstr %-4s ", getRegName(reg));
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL:
            assert(interval != nullptr);
            if (!dumpTerse)
            {
                printf(" Assign register %s to previous interval Ivl:%d\n", getRegName(reg), interval->intervalIndex);
            }
            else
            {
                if (activeRefPosition == nullptr)
                {
                    printf(emptyRefPositionFormat, "");
                }
                printf("Restr %-4s ", getRegName(reg));
                dumpRegRecords();
                if (activeRefPosition != nullptr)
                {
                    printf(emptyRefPositionFormat, "");
                }
            }
            break;

        // Done with GC Kills
        case LSRA_EVENT_DONE_KILL_GC_REFS:
            printf("DoneKillGC ");
            break;

        // Block boundaries
        case LSRA_EVENT_START_BB:
            assert(currentBlock != nullptr);
            if (!dumpTerse)
            {
                printf("\n\n Live Vars(Regs) at start of BB%02u (from pred BB%02u):", currentBlock->bbNum,
                       blockInfo[currentBlock->bbNum].predBBNum);
                dumpVarToRegMap(inVarToRegMaps[currentBlock->bbNum]);
            }
            break;
        case LSRA_EVENT_END_BB:
            if (!dumpTerse)
            {
                printf("\n\n Live Vars(Regs) after BB%02u:", currentBlock->bbNum);
                dumpVarToRegMap(outVarToRegMaps[currentBlock->bbNum]);
            }
            break;

        case LSRA_EVENT_FREE_REGS:
            if (!dumpTerse)
            {
                printf("Freeing registers:\n");
            }
            break;

        // Characteristics of the current RefPosition
        case LSRA_EVENT_INCREMENT_RANGE_END:
            if (!dumpTerse)
            {
                printf(" Incrementing nextPhysRegLocation for %s\n", getRegName(reg));
            }
            break;
        case LSRA_EVENT_LAST_USE:
            if (!dumpTerse)
            {
                printf(" Last use, marked to be freed\n");
            }
            break;
        case LSRA_EVENT_LAST_USE_DELAYED:
            if (!dumpTerse)
            {
                printf(" Last use, marked to be freed (delayed)\n");
            }
            break;
        case LSRA_EVENT_NEEDS_NEW_REG:
            if (!dumpTerse)
            {
                printf(" Needs new register; mark %s to be freed\n", getRegName(reg));
            }
            else
            {
                printf("Free %-4s ", getRegName(reg));
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;

        // Allocation decisions
        case LSRA_EVENT_FIXED_REG:
        case LSRA_EVENT_EXP_USE:
            if (!dumpTerse)
            {
                printf("No allocation\n");
            }
            else
            {
                printf("Keep %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_ZERO_REF:
            assert(interval != nullptr && interval->isLocalVar);
            if (!dumpTerse)
            {
                printf("Marking V%02u as last use; there are no actual references\n", interval->varNum);
            }
            else
            {
                printf("NoRef      ");
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_KEPT_ALLOCATION:
            if (!dumpTerse)
            {
                printf("already allocated %4s\n", getRegName(reg));
            }
            else
            {
                printf("Keep %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_COPY_REG:
            assert(interval != nullptr && interval->recentRefPosition != nullptr);
            if (!dumpTerse)
            {
                printf("allocated %s as copyReg\n\n", getRegName(reg));
            }
            else
            {
                printf("Copy %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_MOVE_REG:
            assert(interval != nullptr && interval->recentRefPosition != nullptr);
            if (!dumpTerse)
            {
                printf(" needs a new register; marked as moveReg\n");
            }
            else
            {
                printf("Move %-4s ", getRegName(reg));
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_ALLOC_REG:
            if (!dumpTerse)
            {
                printf("allocated %s\n", getRegName(reg));
            }
            else
            {
                printf("Alloc %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_REUSE_REG:
            if (!dumpTerse)
            {
                printf("reused constant in %s\n", getRegName(reg));
            }
            else
            {
                printf("Reuse %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_ALLOC_SPILLED_REG:
            if (!dumpTerse)
            {
                printf("allocated spilled register %s\n", getRegName(reg));
            }
            else
            {
                printf("Steal %-4s ", getRegName(reg));
            }
            break;
        case LSRA_EVENT_NO_ENTRY_REG_ALLOCATED:
            assert(interval != nullptr && interval->isLocalVar);
            if (!dumpTerse)
            {
                printf("Not allocating an entry register for V%02u due to low ref count\n", interval->varNum);
            }
            else
            {
                printf("LoRef      ");
            }
            break;
        case LSRA_EVENT_NO_REG_ALLOCATED:
            if (!dumpTerse)
            {
                printf("no register allocated\n");
            }
            else
            {
                printf("NoReg      ");
            }
            break;
        case LSRA_EVENT_RELOAD:
            if (!dumpTerse)
            {
                printf(" Marked for reload\n");
            }
            else
            {
                printf("ReLod %-4s ", getRegName(reg));
                dumpRegRecords();
                dumpEmptyRefPosition();
            }
            break;
        case LSRA_EVENT_SPECIAL_PUTARG:
            if (!dumpTerse)
            {
                printf(" Special case of putArg - using lclVar that's in the expected reg\n");
            }
            else
            {
                printf("PtArg %-4s ", getRegName(reg));
            }
            break;
    }
}
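
// Note on the two dump modes above: in the terse (column-based) dump, each event contributes a
// 5-character action plus a 4-character register (e.g. "Alloc rax " or "Keep rcx ") that becomes
// the "Action Reg" portion of a table row; in the verbose dump, the same events print full
// sentences instead. (The register names here are illustrative.)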
//------------------------------------------------------------------------
// dumpRegRecordHeader: Dump the header for a column-based dump of the register state.
//
// Arguments:
//    None.
//
// Return Value:
//    None.
//
// Assumptions:
//    Reg names fit in 4 characters (minimum width of the columns).
//
// Notes:
//    In order to make the table as dense as possible (for ease of reading the dumps),
//    we determine the minimum regColumnWidth width required to represent:
//    regs, by name (e.g. eax or xmm0) - this is fixed at 4 characters.
//    intervals, as Vnn for lclVar intervals, or as I<num> for other intervals.
//    The table is indented by the amount needed for dumpRefPositionShort, which is
//    captured in shortRefPositionDumpWidth.
//
void LinearScan::dumpRegRecordHeader()
{
    printf("The following table has one or more rows for each RefPosition that is handled during allocation.\n"
           "The first column provides the basic information about the RefPosition, with its type (e.g. Def,\n"
           "Use, Fixd) followed by a '*' if it is a last use, and a 'D' if it is delayRegFree, and then the\n"
           "action taken during allocation (e.g. Alloc a new register, or Keep an existing one).\n"
           "The subsequent columns show the Interval occupying each register, if any, followed by 'a' if it is\n"
           "active, and 'i' if it is inactive. Columns are only printed up to the last modified register, which\n"
           "may increase during allocation, in which case additional columns will appear. Registers which are\n"
           "not marked modified have ---- in their column.\n\n");
    // First, determine the width of each register column (which holds a reg name in the
    // header, and an interval name in each subsequent row).
    int intervalNumberWidth = (int)log10((double)intervals.size()) + 1;
    // The regColumnWidth includes the identifying character (I or V) and an 'i' or 'a' (inactive or active).
    regColumnWidth = intervalNumberWidth + 2;
    if (regColumnWidth < 4)
    {
        regColumnWidth = 4;
    }
    sprintf_s(intervalNameFormat, MAX_FORMAT_CHARS, "%%c%%-%dd", regColumnWidth - 2);
    sprintf_s(regNameFormat, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);

    // Next, determine the width of the short RefPosition (see dumpRefPositionShort()).
    // This is in the form:
    //   nnn.#mmm NAME TYPEld
    // where:
    //   nnn  is the Location, right-justified to the width needed for the highest location.
    //   mmm  is the RefPosition rpNum, left-justified to the width needed for the highest rpNum.
    //   NAME is dumped by dumpReferentName(), and is regColumnWidth wide.
    //   TYPE is RefTypeNameShort, and is 4 characters wide.
    //   l    is either '*' (if a last use) or ' ' (otherwise).
    //   d    is either 'D' (if a delayed use) or ' ' (otherwise).
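
    // As an illustration (hypothetical values): with a highest location of 99 and 150 RefPositions,
    // a last ('*'), non-delayed use of lclVar interval V01 at location 12 would begin its row as:
    //   " 12.#45  V01      Use* "
    // (the exact spacing depends on the widths computed below).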
    maxNodeLocation = (maxNodeLocation == 0)
                          ? 1
                          : maxNodeLocation; // corner case of a method with an infinite loop without any GenTree nodes
    assert(maxNodeLocation >= 1);
    assert(refPositions.size() >= 1);
    int nodeLocationWidth         = (int)log10((double)maxNodeLocation) + 1;
    int refPositionWidth          = (int)log10((double)refPositions.size()) + 1;
    int refTypeInfoWidth          = 4 /*TYPE*/ + 2 /* last-use and delayed */ + 1 /* space */;
    int locationAndRPNumWidth     = nodeLocationWidth + 2 /* .# */ + refPositionWidth + 1 /* space */;
    int shortRefPositionDumpWidth = locationAndRPNumWidth + regColumnWidth + 1 /* space */ + refTypeInfoWidth;
    sprintf_s(shortRefPositionFormat, MAX_FORMAT_CHARS, "%%%dd.#%%-%dd ", nodeLocationWidth, refPositionWidth);
    sprintf_s(emptyRefPositionFormat, MAX_FORMAT_CHARS, "%%-%ds", shortRefPositionDumpWidth);
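
    // For example (hypothetical widths): if nodeLocationWidth is 3 and refPositionWidth is 4,
    // shortRefPositionFormat becomes "%3d.#%-4d " and, with a regColumnWidth of 4,
    // shortRefPositionDumpWidth is (3 + 2 + 4 + 1) + 4 + 1 + 7 = 22 characters.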
    // The width of the "allocation info" is:
    //  - a 5-character allocation decision
    //  - a space
    //  - a 4-character register
    //  - a space
    int allocationInfoWidth = 5 + 1 + 4 + 1;

    // Next, determine the width of the legend for each row. This includes:
    //  - a short RefPosition dump (shortRefPositionDumpWidth), which includes a space
    //  - the allocation info (allocationInfoWidth), which also includes a space
    regTableIndent = shortRefPositionDumpWidth + allocationInfoWidth;

    // BBnn printed left-justified in the NAME Typeld and allocationInfo space.
    int bbDumpWidth = regColumnWidth + 1 + refTypeInfoWidth + allocationInfoWidth;
    int bbNumWidth  = (int)log10((double)compiler->fgBBNumMax) + 1;
    // In the unlikely event that BB numbers overflow the space, we'll simply omit the predBB.
    int predBBNumDumpSpace = regTableIndent - locationAndRPNumWidth - bbNumWidth - 9; // 'BB' + ' PredBB'
    if (predBBNumDumpSpace < bbNumWidth)
    {
        sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd", shortRefPositionDumpWidth - 2);
    }
    else
    {
        sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd PredBB%%-%dd", bbNumWidth, predBBNumDumpSpace);
    }

    if (compiler->shouldDumpASCIITrees())
    {
        columnSeparator = "|";
        line            = "-";
        leftBox         = "+";
        middleBox       = "+";
        rightBox        = "+";
    }
    else
    {
        columnSeparator = "\xe2\x94\x82";
        line            = "\xe2\x94\x80";
        leftBox         = "\xe2\x94\x9c";
        middleBox       = "\xe2\x94\xbc";
        rightBox        = "\xe2\x94\xa4";
    }
    sprintf_s(indentFormat, MAX_FORMAT_CHARS, "%%-%ds", regTableIndent);

    // Now, set up the legend format for the RefPosition info.
    sprintf_s(legendFormat, MAX_LEGEND_FORMAT_CHARS, "%%-%d.%ds%%-%d.%ds%%-%ds%%s", nodeLocationWidth + 1,
              nodeLocationWidth + 1, refPositionWidth + 2, refPositionWidth + 2, regColumnWidth + 1);

    // Finally, print a "title row" including the legend and the reg names.
    dumpRegRecordTitle();
}
int LinearScan::getLastUsedRegNumIndex()
{
    int       lastUsedRegNumIndex = 0;
    regMaskTP usedRegsMask        = compiler->codeGen->regSet.rsGetModifiedRegsMask();
    int       lastRegNumIndex     = compiler->compFloatingPointUsed ? REG_FP_LAST : REG_INT_LAST;
    for (int regNumIndex = 0; regNumIndex <= lastRegNumIndex; regNumIndex++)
    {
        if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) != 0)
        {
            lastUsedRegNumIndex = regNumIndex;
        }
    }
    return lastUsedRegNumIndex;
}
void LinearScan::dumpRegRecordTitleLines()
{
    for (int i = 0; i < regTableIndent; i++)
    {
        printf("%s", line);
    }
    int lastUsedRegNumIndex = getLastUsedRegNumIndex();
    for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
    {
        printf("%s", middleBox);
        for (int i = 0; i < regColumnWidth; i++)
        {
            printf("%s", line);
        }
    }
    printf("%s\n", rightBox);
}
void LinearScan::dumpRegRecordTitle()
{
    dumpRegRecordTitleLines();

    // Print out the legend for the RefPosition info.
    printf(legendFormat, "Loc ", "RP# ", "Name ", "Type Action Reg ");

    // Print out the register name column headers.
    char columnFormatArray[MAX_FORMAT_CHARS];
    sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%s%%-%d.%ds", columnSeparator, regColumnWidth, regColumnWidth);
    int lastUsedRegNumIndex = getLastUsedRegNumIndex();
    for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
    {
        regNumber   regNum  = (regNumber)regNumIndex;
        const char* regName = getRegName(regNum);
        printf(columnFormatArray, regName);
    }
    printf("%s\n", columnSeparator);

    rowCountSinceLastTitle = 0;

    dumpRegRecordTitleLines();
}
void LinearScan::dumpRegRecords()
{
    static char columnFormatArray[18];
    int         lastUsedRegNumIndex = getLastUsedRegNumIndex();
    regMaskTP   usedRegsMask        = compiler->codeGen->regSet.rsGetModifiedRegsMask();

    for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
    {
        printf("%s", columnSeparator);
        RegRecord& regRecord = physRegs[regNumIndex];
        Interval*  interval  = regRecord.assignedInterval;
        if (interval != nullptr)
        {
            dumpIntervalName(interval);
            char activeChar = interval->isActive ? 'a' : 'i';
            printf("%c", activeChar);
        }
        else if (regRecord.isBusyUntilNextKill)
        {
            printf(columnFormatArray, "Busy");
        }
        else if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) == 0)
        {
            sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
            printf(columnFormatArray, "----");
        }
        else
        {
            sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
            printf(columnFormatArray, "");
        }
    }
    printf("%s\n", columnSeparator);

    if (rowCountSinceLastTitle > MAX_ROWS_BETWEEN_TITLES)
    {
        dumpRegRecordTitle();
    }
    rowCountSinceLastTitle++;
}
void LinearScan::dumpIntervalName(Interval* interval)
{
    char intervalChar;
    if (interval->isLocalVar)
    {
        intervalChar = 'V';
    }
    else if (interval->isConstant)
    {
        intervalChar = 'C';
    }
    else
    {
        intervalChar = 'I';
    }
    printf(intervalNameFormat, intervalChar, interval->intervalIndex);
}

void LinearScan::dumpEmptyRefPosition()
{
    printf(emptyRefPositionFormat, "");
}
// Note that the size of this dump is computed in dumpRegRecordHeader().
//
void LinearScan::dumpRefPositionShort(RefPosition* refPosition, BasicBlock* currentBlock)
{
    BasicBlock* block = currentBlock;
    if (refPosition->refType == RefTypeBB)
    {
        // Always print a title row before a RefTypeBB (except for the first, because we
        // will already have printed it before the parameters).
        if (block != compiler->fgFirstBB && block != nullptr)
        {
            dumpRegRecordTitle();
        }
    }
    printf(shortRefPositionFormat, refPosition->nodeLocation, refPosition->rpNum);
    if (refPosition->refType == RefTypeBB)
    {
        if (block == nullptr)
        {
            printf(regNameFormat, "END");
            printf(regNameFormat, "");
        }
        else
        {
            printf(bbRefPosFormat, block->bbNum, block == compiler->fgFirstBB ? 0 : blockInfo[block->bbNum].predBBNum);
        }
    }
    else if (refPosition->isIntervalRef())
    {
        Interval* interval = refPosition->getInterval();
        dumpIntervalName(interval);
        char lastUseChar = ' ';
        char delayChar   = ' ';
        if (refPosition->lastUse)
        {
            lastUseChar = '*';
            if (refPosition->delayRegFree)
            {
                delayChar = 'D';
            }
        }
        printf(" %s%c%c ", getRefTypeShortName(refPosition->refType), lastUseChar, delayChar);
    }
    else if (refPosition->isPhysRegRef)
    {
        RegRecord* regRecord = refPosition->getReg();
        printf(regNameFormat, getRegName(regRecord->regNum));
        printf(" %s ", getRefTypeShortName(refPosition->refType));
    }
    else
    {
        assert(refPosition->refType == RefTypeKillGCRefs);
        // There's no interval or reg name associated with this.
        printf(regNameFormat, " ");
        printf(" %s ", getRefTypeShortName(refPosition->refType));
    }
}
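
// As an illustration (hypothetical values), dumpRefPositionShort might produce rows beginning:
//     " 12.#45  V01      Use*D"  - a last ('*'), delay-freed ('D') use of lclVar interval V01
//     " 14.#46  rcx      Fixd  " - a fixed-register reference to rcx
//     "  8.#7   BB02 PredBB1"    - a block-boundary row
// (the exact spacing depends on the widths computed in dumpRegRecordHeader()).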
//------------------------------------------------------------------------
// LinearScan::IsResolutionMove:
//     Returns true if the given node is a move inserted by LSRA
//     resolution.
//
// Arguments:
//     node - the node to check.
//
bool LinearScan::IsResolutionMove(GenTree* node)
{
    if (!node->gtLsraInfo.isLsraAdded)
    {
        return false;
    }

    switch (node->OperGet())
    {
        case GT_LCL_VAR:
        case GT_COPY:
            return node->gtLsraInfo.isLocalDefUse;

        case GT_SWAP:
            return true;

        default:
            return false;
    }
}
//------------------------------------------------------------------------
// LinearScan::IsResolutionNode:
//     Returns true if the given node is either a move inserted by LSRA
//     resolution or an operand to such a move.
//
// Arguments:
//     containingRange - the range that contains the node to check.
//     node            - the node to check.
//
bool LinearScan::IsResolutionNode(LIR::Range& containingRange, GenTree* node)
{
    for (;;)
    {
        if (IsResolutionMove(node))
        {
            return true;
        }

        if (!node->gtLsraInfo.isLsraAdded || (node->OperGet() != GT_LCL_VAR))
        {
            return false;
        }

        LIR::Use use;
        bool     foundUse = containingRange.TryGetUse(node, &use);
        assert(foundUse);

        node = use.User();
    }
}
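
// Usage sketch: these two predicates are used together when scanning a block's LIR, as in
// verifyFinalAllocation() below - walk the non-phi nodes, and verify only the nodes that are
// actual moves, skipping the nodes that are merely operands to moves:
//
//     LIR::Range& range = LIR::AsRange(block);
//     for (GenTree* node : range.NonPhiNodes())
//     {
//         if (IsResolutionNode(range, node) && IsResolutionMove(node))
//         {
//             verifyResolutionMove(node, location);
//         }
//     }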
//------------------------------------------------------------------------
// verifyFinalAllocation: Traverse the RefPositions and verify various invariants.
//
// Arguments:
//    None.
//
// Return Value:
//    None.
//
// Notes:
//    If verbose is set, this will also dump a table of the final allocations.
void LinearScan::verifyFinalAllocation()
{
    if (VERBOSE)
    {
        printf("\nFinal allocation\n");
    }

    // Clear register assignments.
    for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
    {
        RegRecord* physRegRecord        = getRegisterRecord(reg);
        physRegRecord->assignedInterval = nullptr;
    }

    for (auto& interval : intervals)
    {
        interval.assignedReg = nullptr;
        interval.physReg     = REG_NA;
    }

    DBEXEC(VERBOSE, dumpRegRecordTitle());

    BasicBlock*  currentBlock                = nullptr;
    GenTree*     firstBlockEndResolutionNode = nullptr;
    regMaskTP    regsToFree                  = RBM_NONE;
    regMaskTP    delayRegsToFree             = RBM_NONE;
    LsraLocation currentLocation             = MinLocation;
    for (auto& refPosition : refPositions)
    {
        RefPosition* currentRefPosition = &refPosition;
        Interval*    interval           = nullptr;
        RegRecord*   regRecord          = nullptr;
        regNumber    regNum             = REG_NA;
        if (currentRefPosition->refType == RefTypeBB)
        {
            regsToFree |= delayRegsToFree;
            delayRegsToFree = RBM_NONE;
            // For BB RefPositions, wait until we dump the "end of block" info before dumping the basic RefPosition
            // info.
        }
        else
        {
            // For other RefPosition types, we can dump the basic RefPosition info now.
            DBEXEC(VERBOSE, dumpRefPositionShort(currentRefPosition, currentBlock));

            if (currentRefPosition->isPhysRegRef)
            {
                regRecord                    = currentRefPosition->getReg();
                regRecord->recentRefPosition = currentRefPosition;
                regNum                       = regRecord->regNum;
            }
            else if (currentRefPosition->isIntervalRef())
            {
                interval                    = currentRefPosition->getInterval();
                interval->recentRefPosition = currentRefPosition;
                if (currentRefPosition->registerAssignment != RBM_NONE)
                {
                    if (!genMaxOneBit(currentRefPosition->registerAssignment))
                    {
                        assert(currentRefPosition->refType == RefTypeExpUse ||
                               currentRefPosition->refType == RefTypeDummyDef);
                    }
                    else
                    {
                        regNum    = currentRefPosition->assignedReg();
                        regRecord = getRegisterRecord(regNum);
                    }
                }
            }
        }

        LsraLocation newLocation = currentRefPosition->nodeLocation;

        if (newLocation > currentLocation)
        {
            // Free registers.
            // We could use the freeRegisters() method, but we'd have to carefully manage the active intervals.
            for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
            {
                regMaskTP regMask = genRegMask(reg);
                if ((regsToFree & regMask) != RBM_NONE)
                {
                    RegRecord* physRegRecord        = getRegisterRecord(reg);
                    physRegRecord->assignedInterval = nullptr;
                }
            }
            regsToFree      = delayRegsToFree;
            delayRegsToFree = RBM_NONE;
            currentLocation = newLocation;
        }
        switch (currentRefPosition->refType)
        {
            case RefTypeBB:
            {
                if (currentBlock == nullptr)
                {
                    currentBlock = startBlockSequence();
                }
                else
                {
                    // Verify the resolution moves at the end of the previous block.
                    for (GenTree* node = firstBlockEndResolutionNode; node != nullptr; node = node->gtNext)
                    {
                        // Only verify nodes that are actually moves; don't bother with the nodes that are
                        // operands to moves.
                        if (IsResolutionMove(node))
                        {
                            verifyResolutionMove(node, currentLocation);
                        }
                    }

                    // Validate the locations at the end of the previous block.
                    VarToRegMap outVarToRegMap = outVarToRegMaps[currentBlock->bbNum];
                    VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveOut, varIndex);
                    while (iter.NextElem(compiler, &varIndex))
                    {
                        unsigned  varNum = compiler->lvaTrackedToVarNum[varIndex];
                        regNumber regNum = getVarReg(outVarToRegMap, varNum);
                        interval         = getIntervalForLocalVar(varNum);
                        assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
                        interval->physReg     = REG_NA;
                        interval->assignedReg = nullptr;
                        interval->isActive    = false;
                    }

                    // Clear register assignments.
                    for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
                    {
                        RegRecord* physRegRecord        = getRegisterRecord(reg);
                        physRegRecord->assignedInterval = nullptr;
                    }

                    // Now, record the locations at the beginning of this block.
                    currentBlock = moveToNextBlock();
                }

                if (currentBlock != nullptr)
                {
                    VarToRegMap inVarToRegMap = inVarToRegMaps[currentBlock->bbNum];
                    VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveIn, varIndex);
                    while (iter.NextElem(compiler, &varIndex))
                    {
                        unsigned  varNum = compiler->lvaTrackedToVarNum[varIndex];
                        regNumber regNum = getVarReg(inVarToRegMap, varNum);
                        interval         = getIntervalForLocalVar(varNum);
                        interval->physReg     = regNum;
                        interval->assignedReg = &(physRegs[regNum]);
                        interval->isActive    = true;
                        physRegs[regNum].assignedInterval = interval;
                    }

                    if (VERBOSE)
                    {
                        dumpRefPositionShort(currentRefPosition, currentBlock);
                        dumpRegRecords();
                    }

                    // Finally, handle the resolution moves, if any, at the beginning of the next block.
                    firstBlockEndResolutionNode = nullptr;
                    bool foundNonResolutionNode = false;

                    LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
                    for (GenTree* node : currentBlockRange.NonPhiNodes())
                    {
                        if (IsResolutionNode(currentBlockRange, node))
                        {
                            if (foundNonResolutionNode)
                            {
                                firstBlockEndResolutionNode = node;
                                break;
                            }
                            else if (IsResolutionMove(node))
                            {
                                // Only verify nodes that are actually moves; don't bother with the nodes that are
                                // operands to moves.
                                verifyResolutionMove(node, currentLocation);
                            }
                        }
                        else
                        {
                            foundNonResolutionNode = true;
                        }
                    }
                }
            }
            break;
            case RefTypeKill:
                assert(regRecord != nullptr);
                assert(regRecord->assignedInterval == nullptr);
                dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
                break;

            case RefTypeFixedReg:
                assert(regRecord != nullptr);
                dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
                break;

            case RefTypeUpperVectorSaveDef:
            case RefTypeUpperVectorSaveUse:
            case RefTypeDef:
            case RefTypeUse:
            case RefTypeParamDef:
            case RefTypeZeroInit:
                assert(interval != nullptr);

                if (interval->isSpecialPutArg)
                {
                    dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, interval, regNum);
                    break;
                }
                if (currentRefPosition->reload)
                {
                    interval->isActive = true;
                    assert(regNum != REG_NA);
                    interval->physReg           = regNum;
                    interval->assignedReg       = regRecord;
                    regRecord->assignedInterval = interval;
                    dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, nullptr, regRecord->regNum, currentBlock);
                }
                if (regNum == REG_NA)
                {
                    dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, interval);
                }
                else if (RefTypeIsDef(currentRefPosition->refType))
                {
                    interval->isActive = true;
                    if (VERBOSE)
                    {
                        if (interval->isConstant && (currentRefPosition->treeNode != nullptr) &&
                            currentRefPosition->treeNode->IsReuseRegVal())
                        {
                            dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, regRecord->regNum, currentBlock);
                        }
                        else
                        {
                            dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, regRecord->regNum, currentBlock);
                        }
                    }
                }
                else if (currentRefPosition->copyReg)
                {
                    dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, interval, regRecord->regNum, currentBlock);
                }
                else if (currentRefPosition->moveReg)
                {
                    assert(interval->assignedReg != nullptr);
                    interval->assignedReg->assignedInterval = nullptr;
                    interval->physReg                       = regNum;
                    interval->assignedReg                   = regRecord;
                    regRecord->assignedInterval             = interval;
                    if (VERBOSE)
                    {
                        printf("Move %-4s ", getRegName(regRecord->regNum));
                    }
                }
                else
                {
                    dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
                }
                if (currentRefPosition->lastUse || currentRefPosition->spillAfter)
                {
                    interval->isActive = false;
                }
                if (regNum != REG_NA)
                {
                    if (currentRefPosition->spillAfter)
                    {
                        if (VERBOSE)
                        {
                            // If refPos is marked as copyReg, then the reg that is spilled
                            // is the homeReg of the interval, not the reg currently assigned
                            // to refPos.
                            regNumber spillReg = regNum;
                            if (currentRefPosition->copyReg)
                            {
                                assert(interval != nullptr);
                                spillReg = interval->physReg;
                            }
                            dumpRegRecords();
                            dumpEmptyRefPosition();
                            printf("Spill %-4s ", getRegName(spillReg));
                        }
                    }
                    else if (currentRefPosition->copyReg)
                    {
                        regRecord->assignedInterval = interval;
                    }
                    else
                    {
                        interval->physReg           = regNum;
                        interval->assignedReg       = regRecord;
                        regRecord->assignedInterval = interval;
                    }
                }
                break;

            case RefTypeKillGCRefs:
                // No action to take.
                // However, we will assert that, at resolution time, no registers contain GC refs.
                {
                    DBEXEC(VERBOSE, printf(" "));
                    regMaskTP candidateRegs = currentRefPosition->registerAssignment;
                    while (candidateRegs != RBM_NONE)
                    {
                        regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
                        candidateRegs &= ~nextRegBit;
                        regNumber  nextReg          = genRegNumFromMask(nextRegBit);
                        RegRecord* regRecord        = getRegisterRecord(nextReg);
                        Interval*  assignedInterval = regRecord->assignedInterval;
                        assert(assignedInterval == nullptr || !varTypeIsGC(assignedInterval->registerType));
                    }
                }
                break;

            case RefTypeExpUse:
            case RefTypeDummyDef:
                // Do nothing; these will be handled by the RefTypeBB.
                DBEXEC(VERBOSE, printf(" "));
                break;

            case RefTypeInvalid:
                // For these 'currentRefPosition->refType' values, there is no action to take.
                break;
        }
        if (currentRefPosition->refType != RefTypeBB)
        {
            DBEXEC(VERBOSE, dumpRegRecords());
            if (interval != nullptr)
            {
                if (currentRefPosition->copyReg)
                {
                    assert(interval->physReg != regNum);
                    regRecord->assignedInterval = nullptr;
                    assert(interval->assignedReg != nullptr);
                    regRecord = interval->assignedReg;
                }
                if (currentRefPosition->spillAfter || currentRefPosition->lastUse)
                {
                    interval->physReg     = REG_NA;
                    interval->assignedReg = nullptr;

                    // regRecord could be null if the RefPosition does not require a register.
                    if (regRecord != nullptr)
                    {
                        regRecord->assignedInterval = nullptr;
                    }
                    else
                    {
                        assert(!currentRefPosition->RequiresRegister());
                    }
                }
            }
        }
    }
    // Now, verify the resolution blocks.
    // Currently these are nearly always at the end of the method, but that may not always be the case.
    // So, we'll go through all the BBs looking for blocks whose bbNum is greater than bbNumMaxBeforeResolution.
    for (BasicBlock* currentBlock = compiler->fgFirstBB; currentBlock != nullptr; currentBlock = currentBlock->bbNext)
    {
        if (currentBlock->bbNum > bbNumMaxBeforeResolution)
        {
            if (VERBOSE)
            {
                dumpRegRecordTitle();
                printf(shortRefPositionFormat, 0, 0);
                assert(currentBlock->bbPreds != nullptr && currentBlock->bbPreds->flBlock != nullptr);
                printf(bbRefPosFormat, currentBlock->bbNum, currentBlock->bbPreds->flBlock->bbNum);
                dumpRegRecords();
            }

            // Clear register assignments.
            for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
            {
                RegRecord* physRegRecord        = getRegisterRecord(reg);
                physRegRecord->assignedInterval = nullptr;
            }

            // Set the incoming register assignments.
            VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
            VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveIn, varIndex);
            while (iter.NextElem(compiler, &varIndex))
            {
                unsigned  varNum   = compiler->lvaTrackedToVarNum[varIndex];
                regNumber regNum   = getVarReg(inVarToRegMap, varNum);
                Interval* interval = getIntervalForLocalVar(varNum);
                interval->physReg     = regNum;
                interval->assignedReg = &(physRegs[regNum]);
                interval->isActive    = true;
                physRegs[regNum].assignedInterval = interval;
            }

            // Verify the moves in this block.
            LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
            for (GenTree* node : currentBlockRange.NonPhiNodes())
            {
                assert(IsResolutionNode(currentBlockRange, node));
                if (IsResolutionMove(node))
                {
                    // Only verify nodes that are actually moves; don't bother with the nodes that are
                    // operands to moves.
                    verifyResolutionMove(node, currentLocation);
                }
            }

            // Verify the outgoing register assignments.
            {
                VarToRegMap outVarToRegMap = getOutVarToRegMap(currentBlock->bbNum);
                VARSET_ITER_INIT(compiler, iter, currentBlock->bbLiveOut, varIndex);
                while (iter.NextElem(compiler, &varIndex))
                {
                    unsigned  varNum   = compiler->lvaTrackedToVarNum[varIndex];
                    regNumber regNum   = getVarReg(outVarToRegMap, varNum);
                    Interval* interval = getIntervalForLocalVar(varNum);
                    assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
                    interval->physReg     = REG_NA;
                    interval->assignedReg = nullptr;
                    interval->isActive    = false;
                }
            }
        }
    }

    DBEXEC(VERBOSE, printf("\n"));
}
//------------------------------------------------------------------------
// verifyResolutionMove: Verify a resolution statement. Called by verifyFinalAllocation().
//
// Arguments:
//    resolutionMove  - A GenTree* that must be a resolution move.
//    currentLocation - The LsraLocation of the most recent RefPosition that has been verified.
//
// Return Value:
//    None.
//
// Notes:
//    If verbose is set, this will also dump the moves into the table of final allocations.
void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation currentLocation)
{
    GenTree* dst = resolutionMove;
    assert(IsResolutionMove(dst));

    if (dst->OperGet() == GT_SWAP)
    {
        GenTreeLclVarCommon* left          = dst->gtGetOp1()->AsLclVarCommon();
        GenTreeLclVarCommon* right         = dst->gtGetOp2()->AsLclVarCommon();
        regNumber            leftRegNum    = left->gtRegNum;
        regNumber            rightRegNum   = right->gtRegNum;
        Interval*            leftInterval  = getIntervalForLocalVar(left->gtLclNum);
        Interval*            rightInterval = getIntervalForLocalVar(right->gtLclNum);
        assert(leftInterval->physReg == leftRegNum && rightInterval->physReg == rightRegNum);

        // Exchange the registers (and the register assignments) of the two intervals.
        leftInterval->physReg                  = rightRegNum;
        rightInterval->physReg                 = leftRegNum;
        leftInterval->assignedReg              = &physRegs[rightRegNum];
        rightInterval->assignedReg             = &physRegs[leftRegNum];
        physRegs[rightRegNum].assignedInterval = leftInterval;
        physRegs[leftRegNum].assignedInterval  = rightInterval;
        if (VERBOSE)
        {
            printf(shortRefPositionFormat, currentLocation, 0);
            dumpIntervalName(leftInterval);
            printf(" %-4s ", getRegName(rightRegNum));
            dumpRegRecords();
            printf(shortRefPositionFormat, currentLocation, 0);
            dumpIntervalName(rightInterval);
            printf(" %-4s ", getRegName(leftRegNum));
            dumpRegRecords();
        }
        return;
    }

    regNumber            dstRegNum = dst->gtRegNum;
    regNumber            srcRegNum;
    GenTreeLclVarCommon* lcl;
    if (dst->OperGet() == GT_COPY)
    {
        lcl       = dst->gtGetOp1()->AsLclVarCommon();
        srcRegNum = lcl->gtRegNum;
    }
    else
    {
        lcl = dst->AsLclVarCommon();
        if ((lcl->gtFlags & GTF_SPILLED) != 0)
        {
            srcRegNum = REG_STK;
        }
        else
        {
            assert((lcl->gtFlags & GTF_SPILL) != 0);
            srcRegNum = dstRegNum;
            dstRegNum = REG_STK;
        }
    }

    Interval* interval = getIntervalForLocalVar(lcl->gtLclNum);
    assert(interval->physReg == srcRegNum || (srcRegNum == REG_STK && interval->physReg == REG_NA));
    if (srcRegNum != REG_STK)
    {
        physRegs[srcRegNum].assignedInterval = nullptr;
    }
    if (dstRegNum != REG_STK)
    {
        interval->physReg                    = dstRegNum;
        interval->assignedReg                = &(physRegs[dstRegNum]);
        physRegs[dstRegNum].assignedInterval = interval;
        interval->isActive                   = true;
    }
    else
    {
        interval->physReg     = REG_NA;
        interval->assignedReg = nullptr;
        interval->isActive    = false;
    }
    if (VERBOSE)
    {
        printf(shortRefPositionFormat, currentLocation, 0);
        dumpIntervalName(interval);
        printf(" %-4s ", getRegName(dstRegNum));
        dumpRegRecords();
    }
}
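
// For reference, the forms of resolution move handled above are:
//     GT_COPY(GT_LCL_VAR Vnn)      - a register-to-register move (source reg on the lclVar,
//                                    destination reg on the GT_COPY)
//     GT_LCL_VAR Vnn (GTF_SPILLED) - a reload from the stack (srcRegNum == REG_STK)
//     GT_LCL_VAR Vnn (GTF_SPILL)   - a store to the stack (dstRegNum == REG_STK)
//     GT_SWAP                      - exchanges the registers of its two lclVar operands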
#endif // !LEGACY_BACKEND