1 // Licensed to the .NET Foundation under one or more agreements.
2 // The .NET Foundation licenses this file to you under the MIT license.
3 // See the LICENSE file in the project root for more information.
6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
7 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
9 Linear Scan Register Allocation
14 - All register requirements are expressed in the code stream, either as destination
15 registers of tree nodes, or as internal registers. These requirements are
16 expressed in the TreeNodeInfo (gtLsraInfo) on each node, which includes:
17 - The number of register sources and destinations.
18 - The register restrictions (candidates) of the target register, both from itself,
19 as producer of the value (dstCandidates), and from its consuming node (srcCandidates).
20 Note that the srcCandidates field of TreeNodeInfo refers to restrictions that the consuming node
21 places on this node's destination register (not to restrictions on any of this node's sources).
22 - The number (internalCount) of registers required, and their register restrictions (internalCandidates).
23 These are neither inputs nor outputs of the node, but used in the sequence of code generated for the tree.
24 "Internal registers" are registers used during the code sequence generated for the node.
25 The register lifetimes must obey the following lifetime model:
26 - First, any internal registers are defined.
27 - Next, any source registers are used (and are then freed if they are last use and are not identified as "delayRegFree").
29 - Next, the internal registers are used (and are then freed).
30 - Next, any registers in the kill set for the instruction are killed.
31 - Next, the destination register(s) are defined (multiple destination registers are only supported on ARM)
32 - Finally, any "delayRegFree" source registers are freed.
33 There are several things to note about this order:
34 - The internal registers will never overlap any use, but they may overlap a destination register.
35 - Internal registers are never live beyond the node.
36 - The "delayRegFree" annotation is used for instructions that are only available in a Read-Modify-Write form.
37 That is, the destination register is one of the sources. In this case, we must not use the same register for
38 the non-RMW operand as for the destination.
40 Overview (doLinearScan):
41 - Walk all blocks, building intervals and RefPositions (buildIntervals)
42 - Allocate registers (allocateRegisters)
43 - Annotate nodes with register assignments (resolveRegisters)
44 - Add move nodes as needed to resolve conflicting register
45 assignments across non-adjacent edges. (resolveEdges, called from resolveRegisters)
50 - GenTree::gtRegNum (and gtRegPair for ARM) is annotated with the register
51 assignment for a node. If the node does not require a register, it is
52 annotated as such (for single registers, gtRegNum = REG_NA; for register
53 pair type, gtRegPair = REG_PAIR_NONE). For a variable definition or interior
54 tree node (an "implicit" definition), this is the register to put the result.
55 For an expression use, this is the place to find the value that has previously been computed.
57 - In most cases, this register must satisfy the constraints specified by the TreeNodeInfo.
58 - In some cases, this is difficult:
59 - If a lclVar node currently lives in some register, it may not be desirable to move it
60 (i.e. its current location may be desirable for future uses, e.g. if it's a callee save register,
61 but needs to be in a specific arg register for a call).
62 - In other cases there may be conflicts on the restrictions placed by the defining node and the node which consumes it.
64 - If such a node is constrained to a single fixed register (e.g. an arg register, or a return from a call),
65 then LSRA is free to annotate the node with a different register. The code generator must issue the appropriate move.
67 - However, if such a node is constrained to a set of registers, and its current location does not satisfy that
68 requirement, LSRA must insert a GT_COPY node between the node and its parent. The gtRegNum on the GT_COPY node
69 must satisfy the register requirement of the parent.
70 - GenTree::gtRsvdRegs has a set of registers used for internal temps.
71 - A tree node is marked GTF_SPILL if the tree node must be spilled by the code generator after it has been evaluated.
73 - LSRA currently does not set GTF_SPILLED on such nodes, because it caused problems in the old code generator.
74 In the new backend perhaps this should change (see also the note below under CodeGen).
75 - A tree node is marked GTF_SPILLED if it is a lclVar that must be reloaded prior to use.
76 - The register (gtRegNum) on the node indicates the register to which it must be reloaded.
77 - For lclVar nodes, since the uses and defs are distinct tree nodes, it is always possible to annotate the node
78 with the register to which the variable must be reloaded.
79 - For other nodes, since they represent both the def and use, if the value must be reloaded to a different
80 register, LSRA must insert a GT_RELOAD node in order to specify the register to which it should be reloaded.
82 Local variable table (LclVarDsc):
83 - LclVarDsc::lvRegister is set to true if a local variable has the
84 same register assignment for its entire lifetime.
85 - LclVarDsc::lvRegNum / lvOtherReg: these are initialized to their
86 first value at the end of LSRA (it looks like lvOtherReg isn't?
87 This is probably a bug (ARM)). Codegen will set them to their current value
88 as it processes the trees, since a variable can (now) be assigned different
89 registers over its lifetimes.
91 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
92 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
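// The "Overview (doLinearScan)" steps above describe RyuJIT's RefPosition/Interval-based allocator.
// For orientation only, the block below is a heavily simplified, standalone sketch of the classic
// linear-scan idea (intervals sorted by start, an active list, furthest-end spilling). All names here
// (SketchInterval, sketchLinearScan) are hypothetical; this is not how LSRA itself is structured.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <algorithm>
#include <vector>

struct SketchInterval
{
    int start; // first location at which the value is live
    int end;   // last location at which the value is live
    int reg;   // assigned register, or -1 if spilled
};

// Allocate 'numRegs' registers over the intervals, spilling the interval that ends last
// whenever no register is free.
void sketchLinearScan(std::vector<SketchInterval>& intervals, int numRegs)
{
    std::sort(intervals.begin(), intervals.end(),
              [](const SketchInterval& a, const SketchInterval& b) { return a.start < b.start; });

    std::vector<SketchInterval*> active; // currently live, sorted by increasing end point
    std::vector<int>             freeRegs;
    for (int r = 0; r < numRegs; r++)
    {
        freeRegs.push_back(r);
    }

    for (SketchInterval& cur : intervals)
    {
        // Expire intervals that end before 'cur' starts, returning their registers.
        while (!active.empty() && (active.front()->end < cur.start))
        {
            freeRegs.push_back(active.front()->reg);
            active.erase(active.begin());
        }

        if (!freeRegs.empty())
        {
            cur.reg = freeRegs.back();
            freeRegs.pop_back();
        }
        else
        {
            // Spill heuristic: evict the active interval with the furthest end point.
            SketchInterval* victim = active.back();
            if (victim->end > cur.end)
            {
                cur.reg     = victim->reg;
                victim->reg = -1;
                active.pop_back();
            }
            else
            {
                cur.reg = -1; // spill 'cur' itself
                continue;
            }
        }

        // Keep 'active' sorted by end point.
        auto insertAt = std::lower_bound(active.begin(), active.end(), &cur,
                                         [](const SketchInterval* a, const SketchInterval* b) { return a->end < b->end; });
        active.insert(insertAt, &cur);
    }
}
#endif // 0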
100 #ifndef LEGACY_BACKEND // This file is ONLY used for the RyuJIT backend that uses the linear scan register allocator
105 const char* LinearScan::resolveTypeName[] = {"Split", "Join", "Critical", "SharedCritical"};
108 /*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
109 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
111 XX Small Helper functions XX
114 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
115 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
118 //--------------------------------------------------------------
119 // lsraAssignRegToTree: Assign the given reg to tree node.
122 // tree - Gentree node
123 // reg - register to be assigned
124 // regIdx - register idx, if tree is a multi-reg call node.
125 // regIdx will be zero for single-reg result producing tree nodes.
130 void lsraAssignRegToTree(GenTreePtr tree, regNumber reg, unsigned regIdx)
134 tree->gtRegNum = reg;
136 #if defined(_TARGET_ARM_)
137 else if (tree->OperGet() == GT_MUL_LONG || tree->OperGet() == GT_PUTARG_REG)
140 GenTreeMultiRegOp* mul = tree->AsMultiRegOp();
141 mul->gtOtherReg = reg;
143 else if (tree->OperGet() == GT_COPY)
146 GenTreeCopyOrReload* copy = tree->AsCopyOrReload();
147 copy->gtOtherRegs[0] = (regNumberSmall)reg;
149 else if (tree->OperGet() == GT_PUTARG_SPLIT)
151 GenTreePutArgSplit* putArg = tree->AsPutArgSplit();
152 putArg->SetRegNumByIdx(reg, regIdx);
154 #endif // _TARGET_ARM_
157 assert(tree->IsMultiRegCall());
158 GenTreeCall* call = tree->AsCall();
159 call->SetRegNumByIdx(reg, regIdx);
163 //-------------------------------------------------------------
164 // getWeight: Returns the weight of the RefPosition.
167 // refPos - ref position
170 // Weight of ref position.
171 unsigned LinearScan::getWeight(RefPosition* refPos)
174 GenTreePtr treeNode = refPos->treeNode;
176 if (treeNode != nullptr)
178 if (isCandidateLocalRef(treeNode))
180 // Tracked locals: use weighted ref cnt as the weight of the ref position.
182 GenTreeLclVarCommon* lclCommon = treeNode->AsLclVarCommon();
183 LclVarDsc* varDsc = &(compiler->lvaTable[lclCommon->gtLclNum]);
184 weight = varDsc->lvRefCntWtd;
188 // Non-candidate local ref or non-lcl tree node.
189 // These are considered to have two references in the basic block:
190 // a def and a use and hence weighted ref count is 2 times
191 // the basic block weight in which they appear.
192 weight = 2 * this->blockInfo[refPos->bbNum].weight;
197 // Non-tree node ref positions. These will have a single
198 // reference in the basic block and hence their weighted
199 // refcount is equal to the weight of the block in which they appear.
201 weight = this->blockInfo[refPos->bbNum].weight;
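// The weighting rule implemented above can be restated as a small standalone function.
// SketchRefPos and its fields are hypothetical stand-ins for RefPosition/LclVarDsc data.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
struct SketchRefPos
{
    bool     hasTreeNode;         // does this ref position refer to a tree node?
    bool     isTrackedLocal;      // is that node a register-candidate local var?
    unsigned localWeightedRefCnt; // the local's weighted ref count (lvRefCntWtd analogue)
    unsigned blockWeight;         // weight of the containing basic block
};

unsigned sketchGetWeight(const SketchRefPos& rp)
{
    if (rp.hasTreeNode)
    {
        if (rp.isTrackedLocal)
        {
            return rp.localWeightedRefCnt; // tracked local: use its weighted ref count
        }
        return 2 * rp.blockWeight; // tree temp: counted as one def plus one use in the block
    }
    return rp.blockWeight; // non-tree ref position: a single reference in the block
}
#endif // 0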
207 // allRegs represents a set of registers that can
208 // be used to allocate the specified type in any point
209 // in time (more of a 'bank' of registers).
210 regMaskTP LinearScan::allRegs(RegisterType rt)
214 return availableFloatRegs;
216 else if (rt == TYP_DOUBLE)
218 return availableDoubleRegs;
220 // TODO-Cleanup: Add an RBM_ALLSIMD
222 else if (varTypeIsSIMD(rt))
224 return availableDoubleRegs;
225 #endif // FEATURE_SIMD
229 return availableIntRegs;
233 //--------------------------------------------------------------------------
234 // allMultiRegCallNodeRegs: represents a set of registers that can be used
235 // to allocate a multi-reg call node.
238 // call - Multi-reg call node
241 // Mask representing the set of available registers for a multi-reg call node.
245 // Multi-reg call node available regs = Bitwise-OR(allregs(GetReturnRegType(i)))
246 // for all i=0..RetRegCount-1.
247 regMaskTP LinearScan::allMultiRegCallNodeRegs(GenTreeCall* call)
249 assert(call->HasMultiRegRetVal());
251 ReturnTypeDesc* retTypeDesc = call->GetReturnTypeDesc();
252 regMaskTP resultMask = allRegs(retTypeDesc->GetReturnRegType(0));
254 unsigned count = retTypeDesc->GetReturnRegCount();
255 for (unsigned i = 1; i < count; ++i)
257 resultMask |= allRegs(retTypeDesc->GetReturnRegType(i));
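// The Bitwise-OR described above, as a standalone sketch: the candidate set for a multi-reg
// node is the union of the register banks of its individual return slot types. The masks and
// names below are hypothetical toy values, not the JIT's RBM_* constants.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <cstdint>
#include <vector>

using SketchRegMask = uint64_t;

const SketchRegMask SKETCH_INT_BANK   = 0x00000000000000FF; // registers usable for integer slots
const SketchRegMask SKETCH_FLOAT_BANK = 0x000000000000FF00; // registers usable for floating slots

SketchRegMask sketchAllMultiRegNodeRegs(const std::vector<bool>& slotIsFloat)
{
    SketchRegMask result = 0;
    for (bool isFloat : slotIsFloat)
    {
        result |= isFloat ? SKETCH_FLOAT_BANK : SKETCH_INT_BANK;
    }
    return result;
}
#endif // 0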
263 //--------------------------------------------------------------------------
264 // allRegs: returns the set of registers that can accommodate the type of the given tree node.
268 // tree - GenTree node
271 // Mask representing the set of available registers for the given tree node.
273 // Note: In case of multi-reg call node, the full set of registers must be
274 // determined by looking at the types of the individual return registers.
275 // In this case, the registers may include registers from different register
276 // sets and will not be limited to the actual ABI return registers.
277 regMaskTP LinearScan::allRegs(GenTree* tree)
279 regMaskTP resultMask;
281 // In case of multi-reg calls, allRegs is defined as
282 // Bitwise-Or(allRegs(GetReturnRegType(i)) for i=0..ReturnRegCount-1
283 if (tree->IsMultiRegCall())
285 resultMask = allMultiRegCallNodeRegs(tree->AsCall());
289 resultMask = allRegs(tree->TypeGet());
295 regMaskTP LinearScan::allSIMDRegs()
297 return availableFloatRegs;
300 //------------------------------------------------------------------------
301 // internalFloatRegCandidates: Return the set of registers that are appropriate
302 // for use as internal float registers.
305 // The set of registers (as a regMaskTP).
308 // compFloatingPointUsed is only required to be set if it is possible that we
309 // will use floating point callee-save registers.
310 // It is unlikely, if an internal register is the only use of floating point,
311 // that it will select a callee-save register. But to be safe, we restrict
312 // the set of candidates if compFloatingPointUsed is not already set.
314 regMaskTP LinearScan::internalFloatRegCandidates()
316 if (compiler->compFloatingPointUsed)
318 return allRegs(TYP_FLOAT);
322 return RBM_FLT_CALLEE_TRASH;
326 /*****************************************************************************
328 *****************************************************************************/
329 template <class T>
330 RegisterType regType(T type)
333 if (varTypeIsSIMD(type))
335 return FloatRegisterType;
337 #endif // FEATURE_SIMD
338 return varTypeIsFloating(TypeGet(type)) ? FloatRegisterType : IntRegisterType;
341 bool useFloatReg(var_types type)
343 return (regType(type) == FloatRegisterType);
346 bool registerTypesEquivalent(RegisterType a, RegisterType b)
348 return varTypeIsIntegralOrI(a) == varTypeIsIntegralOrI(b);
351 bool isSingleRegister(regMaskTP regMask)
353 return (regMask != RBM_NONE && genMaxOneBit(regMask));
356 /*****************************************************************************
357 * Inline functions for RegRecord
358 *****************************************************************************/
360 bool RegRecord::isFree()
362 return ((assignedInterval == nullptr || !assignedInterval->isActive) && !isBusyUntilNextKill);
365 /*****************************************************************************
366 * Inline functions for LinearScan
367 *****************************************************************************/
368 RegRecord* LinearScan::getRegisterRecord(regNumber regNum)
370 return &physRegs[regNum];
375 //----------------------------------------------------------------------------
376 // getConstrainedRegMask: Returns new regMask which is the intersection of
377 // regMaskActual and regMaskConstraint if the new regMask has at least
378 // minRegCount registers, otherwise returns regMaskActual.
381 // regMaskActual - regMask that needs to be constrained
382 // regMaskConstraint - regMask constraint that needs to be
383 // applied to regMaskActual
384 // minRegCount - Minimum number of regs that should be
385 // present in the new regMask.
388 // New regMask that has at least minRegCount registers after intersection.
389 // Otherwise returns regMaskActual.
390 regMaskTP LinearScan::getConstrainedRegMask(regMaskTP regMaskActual, regMaskTP regMaskConstraint, unsigned minRegCount)
392 regMaskTP newMask = regMaskActual & regMaskConstraint;
393 if (genCountBits(newMask) >= minRegCount)
398 return regMaskActual;
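// The constrain-with-fallback policy above, restated as a standalone sketch: intersect the
// candidate set with the constraint, but keep the original set if the intersection would drop
// below the minimum register count. Names are hypothetical; genCountBits is replaced by a
// simple popcount loop.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <cstdint>

using SketchRegMask = uint64_t;

static unsigned sketchCountBits(SketchRegMask mask)
{
    unsigned count = 0;
    while (mask != 0)
    {
        mask &= (mask - 1); // clear the lowest set bit
        count++;
    }
    return count;
}

SketchRegMask sketchGetConstrainedRegMask(SketchRegMask actual, SketchRegMask constraint, unsigned minRegCount)
{
    SketchRegMask newMask = actual & constraint;
    if (sketchCountBits(newMask) >= minRegCount)
    {
        return newMask;
    }
    return actual; // constraint would leave too few registers; ignore it
}
#endif // 0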
401 //------------------------------------------------------------------------
402 // stressLimitRegs: Given a set of registers, expressed as a register mask, reduce
403 // them based on the current stress options.
406 // mask - The current mask of register candidates for a node
409 // A possibly-modified mask, based on the value of COMPlus_JitStressRegs.
412 // This is the method used to implement the stress options that limit
413 // the set of registers considered for allocation.
415 regMaskTP LinearScan::stressLimitRegs(RefPosition* refPosition, regMaskTP mask)
417 if (getStressLimitRegs() != LSRA_LIMIT_NONE)
419 // The refPosition could be null, for example when called
420 // by getTempRegForResolution().
421 int minRegCount = (refPosition != nullptr) ? refPosition->minRegCandidateCount : 1;
423 switch (getStressLimitRegs())
425 case LSRA_LIMIT_CALLEE:
426 if (!compiler->opts.compDbgEnC)
428 mask = getConstrainedRegMask(mask, RBM_CALLEE_SAVED, minRegCount);
432 case LSRA_LIMIT_CALLER:
434 mask = getConstrainedRegMask(mask, RBM_CALLEE_TRASH, minRegCount);
438 case LSRA_LIMIT_SMALL_SET:
439 if ((mask & LsraLimitSmallIntSet) != RBM_NONE)
441 mask = getConstrainedRegMask(mask, LsraLimitSmallIntSet, minRegCount);
443 else if ((mask & LsraLimitSmallFPSet) != RBM_NONE)
445 mask = getConstrainedRegMask(mask, LsraLimitSmallFPSet, minRegCount);
453 if (refPosition != nullptr && refPosition->isFixedRegRef)
455 mask |= refPosition->registerAssignment;
463 // TODO-Cleanup: Consider adding an overload that takes a varDsc, and can appropriately
464 // set such fields as isStructField
466 Interval* LinearScan::newInterval(RegisterType theRegisterType)
468 intervals.emplace_back(theRegisterType, allRegs(theRegisterType));
469 Interval* newInt = &intervals.back();
472 newInt->intervalIndex = static_cast<unsigned>(intervals.size() - 1);
475 DBEXEC(VERBOSE, newInt->dump());
479 RefPosition* LinearScan::newRefPositionRaw(LsraLocation nodeLocation, GenTree* treeNode, RefType refType)
481 refPositions.emplace_back(curBBNum, nodeLocation, treeNode, refType);
482 RefPosition* newRP = &refPositions.back();
484 newRP->rpNum = static_cast<unsigned>(refPositions.size() - 1);
489 //------------------------------------------------------------------------
490 // resolveConflictingDefAndUse: Resolve the situation where we have conflicting def and use
491 // register requirements on a single-def, single-use interval.
494 // defRefPosition - The interval definition
495 // useRefPosition - The (sole) interval use
501 // The two RefPositions are for the same interval, which is a tree-temp.
504 // We require some special handling for the case where the use is a "delayRegFree" case of a fixedReg.
505 // In that case, if we change the registerAssignment on the useRefPosition, we will lose the fact that,
506 // even if we assign a different register (and rely on codegen to do the copy), that fixedReg also needs
507 // to remain busy until the Def register has been allocated. In that case, we don't allow Case 1 or Case 4 below.
509 // Here are the cases we consider (in this order):
510 // 1. If the defRefPosition specifies a single register, and there are no conflicting
511 // FixedReg uses of it between the def and use, we use that register, and the code generator
512 // will insert the copy. Note that it cannot be in use because there is a FixedRegRef for the def.
513 // 2. If the useRefPosition specifies a single register, and it is not in use, and there are no
514 // conflicting FixedReg uses of it between the def and use, we use that register, and the code generator
515 // will insert the copy.
516 // 3. If the defRefPosition specifies a single register (but there are conflicts, as determined
517 // in 1.), and there are no conflicts with the useRefPosition register (if it's a single register),
518 // we set the register requirements on the defRefPosition to the use registers, and the
519 // code generator will insert a copy on the def. We can't rely on the code generator to put a copy
520 // on the use if it has multiple possible candidates, as it won't know which one has been allocated.
521 // 4. If the useRefPosition specifies a single register, and there are no conflicts with the register
522 // on the defRefPosition, we leave the register requirements on the defRefPosition as-is, and set
523 // the useRefPosition to the def registers, for similar reasons to case #3.
524 // 5. If both the defRefPosition and the useRefPosition specify single registers, but both have conflicts,
525 // we set the candidates on defRefPosition to be all regs of the appropriate type, and since they are
526 // single registers, codegen can insert the copy.
527 // 6. Finally, if the RefPositions specify disjoint subsets of the registers (or the use is fixed but
528 // has a conflict), we must insert a copy. The copy will be inserted before the use if the
529 // use is not fixed (in the fixed case, the code generator will insert the use).
531 // TODO-CQ: We get bad register allocation in case #3 in the situation where no register is
532 // available for the lifetime. We end up allocating a register that must be spilled, and it probably
533 // won't be the register that is actually defined by the target instruction. So, we have to copy it
534 // and THEN spill it. In this case, we should be using the def requirement. But we need to change
535 // the interface to this method a bit to make that work (e.g. returning a candidate set to use, but
536 // leaving the registerAssignment as-is on the def, so that if we find that we need to spill anyway
537 // we can use the fixed-reg on the def.
540 void LinearScan::resolveConflictingDefAndUse(Interval* interval, RefPosition* defRefPosition)
542 assert(!interval->isLocalVar);
544 RefPosition* useRefPosition = defRefPosition->nextRefPosition;
545 regMaskTP defRegAssignment = defRefPosition->registerAssignment;
546 regMaskTP useRegAssignment = useRefPosition->registerAssignment;
547 RegRecord* defRegRecord = nullptr;
548 RegRecord* useRegRecord = nullptr;
549 regNumber defReg = REG_NA;
550 regNumber useReg = REG_NA;
551 bool defRegConflict = false;
552 bool useRegConflict = false;
554 // If the useRefPosition is a "delayRegFree", we can't change the registerAssignment
555 // on it, or we will fail to ensure that the fixedReg is busy at the time the target
556 // (of the node that uses this interval) is allocated.
557 bool canChangeUseAssignment = !useRefPosition->isFixedRegRef || !useRefPosition->delayRegFree;
559 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CONFLICT));
560 if (!canChangeUseAssignment)
562 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_FIXED_DELAY_USE));
564 if (defRefPosition->isFixedRegRef)
566 defReg = defRefPosition->assignedReg();
567 defRegRecord = getRegisterRecord(defReg);
568 if (canChangeUseAssignment)
570 RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
571 assert(currFixedRegRefPosition != nullptr &&
572 currFixedRegRefPosition->nodeLocation == defRefPosition->nodeLocation);
574 if (currFixedRegRefPosition->nextRefPosition == nullptr ||
575 currFixedRegRefPosition->nextRefPosition->nodeLocation > useRefPosition->getRefEndLocation())
577 // This is case #1. Use the defRegAssignment
578 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE1));
579 useRefPosition->registerAssignment = defRegAssignment;
584 defRegConflict = true;
588 if (useRefPosition->isFixedRegRef)
590 useReg = useRefPosition->assignedReg();
591 useRegRecord = getRegisterRecord(useReg);
592 RefPosition* currFixedRegRefPosition = useRegRecord->recentRefPosition;
594 // We know that useRefPosition is a fixed use, so the nextRefPosition must not be null.
595 RefPosition* nextFixedRegRefPosition = useRegRecord->getNextRefPosition();
596 assert(nextFixedRegRefPosition != nullptr &&
597 nextFixedRegRefPosition->nodeLocation <= useRefPosition->nodeLocation);
599 // First, check to see if there are any conflicting FixedReg references between the def and use.
600 if (nextFixedRegRefPosition->nodeLocation == useRefPosition->nodeLocation)
602 // OK, no conflicting FixedReg references.
603 // Now, check to see whether it is currently in use.
604 if (useRegRecord->assignedInterval != nullptr)
606 RefPosition* possiblyConflictingRef = useRegRecord->assignedInterval->recentRefPosition;
607 LsraLocation possiblyConflictingRefLocation = possiblyConflictingRef->getRefEndLocation();
608 if (possiblyConflictingRefLocation >= defRefPosition->nodeLocation)
610 useRegConflict = true;
615 // This is case #2. Use the useRegAssignment
616 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE2));
617 defRefPosition->registerAssignment = useRegAssignment;
623 useRegConflict = true;
626 if (defRegRecord != nullptr && !useRegConflict)
629 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE3));
630 defRefPosition->registerAssignment = useRegAssignment;
633 if (useRegRecord != nullptr && !defRegConflict && canChangeUseAssignment)
636 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE4));
637 useRefPosition->registerAssignment = defRegAssignment;
640 if (defRegRecord != nullptr && useRegRecord != nullptr)
643 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE5));
644 RegisterType regType = interval->registerType;
645 assert((getRegisterType(interval, defRefPosition) == regType) &&
646 (getRegisterType(interval, useRefPosition) == regType));
647 regMaskTP candidates = allRegs(regType);
648 defRefPosition->registerAssignment = candidates;
651 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE6));
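// A drastically simplified flavor of the case analysis above, for a single-def, single-use tree
// temp. The real method also reasons about interfering FixedReg references between the def and
// the use, delay-freed uses, and register busy-ness; this standalone sketch (hypothetical names)
// only captures cases 1, 2 and the "give the def full freedom" fallback.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <cstdint>

using SketchRegMask = uint64_t;

inline bool sketchIsSingleReg(SketchRegMask m)
{
    return (m != 0) && ((m & (m - 1)) == 0);
}

void sketchResolveConflictingDefAndUse(SketchRegMask& defCandidates,
                                       SketchRegMask& useCandidates,
                                       bool           defRegConflicts, // def's fixed reg is unusable across the lifetime
                                       bool           useRegConflicts, // use's fixed reg is unusable across the lifetime
                                       SketchRegMask  allRegsOfType)
{
    if (sketchIsSingleReg(defCandidates) && !defRegConflicts)
    {
        useCandidates = defCandidates; // ~case 1: allocate at the def's register; codegen copies at the use
    }
    else if (sketchIsSingleReg(useCandidates) && !useRegConflicts)
    {
        defCandidates = useCandidates; // ~case 2: allocate at the use's register; codegen copies at the def
    }
    else
    {
        defCandidates = allRegsOfType; // ~cases 5/6: give the def full freedom; a copy will be needed
    }
}
#endif // 0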
655 //------------------------------------------------------------------------
656 // conflictingFixedRegReference: Determine whether the current RegRecord has a
657 // fixed register use that conflicts with 'refPosition'
660 // refPosition - The RefPosition of interest
663 // Returns true iff the given RefPosition is NOT a fixed use of this register, AND either:
665 // - there is a RefPosition on this RegRecord at the nodeLocation of the given RefPosition, or
666 // - the given RefPosition has a delayRegFree, and there is a RefPosition on this RegRecord at
667 // the nodeLocation just past the given RefPosition.
670 // 'refPosition' is non-null.
672 bool RegRecord::conflictingFixedRegReference(RefPosition* refPosition)
674 // Is this a fixed reference of this register? If so, there is no conflict.
675 if (refPosition->isFixedRefOfRegMask(genRegMask(regNum)))
679 // Otherwise, check for conflicts.
680 // There is a conflict if:
681 // 1. There is a recent RefPosition on this RegRecord that is at this location,
682 // except in the case where it is a special "putarg" that is associated with this interval, OR
683 // 2. There is an upcoming RefPosition at this location, or at the next location
684 // if refPosition is a delayed use (i.e. must be kept live through the next/def location).
686 LsraLocation refLocation = refPosition->nodeLocation;
687 if (recentRefPosition != nullptr && recentRefPosition->refType != RefTypeKill &&
688 recentRefPosition->nodeLocation == refLocation &&
689 (!isBusyUntilNextKill || assignedInterval != refPosition->getInterval()))
693 LsraLocation nextPhysRefLocation = getNextRefLocation();
694 if (nextPhysRefLocation == refLocation || (refPosition->delayRegFree && nextPhysRefLocation == (refLocation + 1)))
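// The conflict test above, restated as a standalone sketch over plain values: a reference at
// 'refLoc' conflicts with this register if the register is also referenced at that location,
// or (for a delayed use) at the immediately following location. Names are hypothetical.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
bool sketchConflictingFixedRegReference(bool     hasRecentRef,
                                        unsigned recentRefLoc, // most recent reference on this register
                                        bool     hasNextRef,
                                        unsigned nextRefLoc,   // next upcoming reference on this register
                                        unsigned refLoc,       // location of the RefPosition being tested
                                        bool     delayRegFree)
{
    if (hasRecentRef && (recentRefLoc == refLoc))
    {
        return true;
    }
    if (hasNextRef && ((nextRefLoc == refLoc) || (delayRegFree && (nextRefLoc == (refLoc + 1)))))
    {
        return true;
    }
    return false;
}
#endif // 0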
701 void LinearScan::applyCalleeSaveHeuristics(RefPosition* rp)
703 #ifdef _TARGET_AMD64_
704 if (compiler->opts.compDbgEnC)
706 // We only use RSI and RDI for EnC code, so we don't want to favor callee-save regs.
709 #endif // _TARGET_AMD64_
711 Interval* theInterval = rp->getInterval();
714 regMaskTP calleeSaveMask = calleeSaveRegs(getRegisterType(theInterval, rp));
715 if (doReverseCallerCallee())
717 rp->registerAssignment =
718 getConstrainedRegMask(rp->registerAssignment, calleeSaveMask, rp->minRegCandidateCount);
723 // Set preferences so that this register set will be preferred for earlier refs
724 theInterval->updateRegisterPreferences(rp->registerAssignment);
728 void LinearScan::associateRefPosWithInterval(RefPosition* rp)
730 Referenceable* theReferent = rp->referent;
732 if (theReferent != nullptr)
734 // All RefPositions except the dummy ones at the beginning of blocks
736 if (rp->isIntervalRef())
738 Interval* theInterval = rp->getInterval();
740 applyCalleeSaveHeuristics(rp);
742 if (theInterval->isLocalVar)
744 if (RefTypeIsUse(rp->refType))
746 RefPosition* const prevRP = theInterval->recentRefPosition;
747 if ((prevRP != nullptr) && (prevRP->bbNum == rp->bbNum))
749 prevRP->lastUse = false;
753 rp->lastUse = (rp->refType != RefTypeExpUse) && (rp->refType != RefTypeParamDef) &&
754 (rp->refType != RefTypeZeroInit) && !extendLifetimes();
756 else if (rp->refType == RefTypeUse)
758 // Ensure that we have consistent def/use on SDSU temps.
759 // However, there are a couple of cases where this may over-constrain allocation:
760 // 1. In the case of a non-commutative rmw def (in which the rmw source must be delay-free), or
761 // 2. In the case where the defining node requires a temp distinct from the target (also a delay-free case).
763 // In those cases, if we propagate a single-register restriction from the consumer to the producer
764 // the delayed uses will not see a fixed reference in the PhysReg at that position, and may
765 // incorrectly allocate that register.
766 // TODO-CQ: This means that we may often require a copy at the use of this node's result.
767 // This case could be moved to BuildRefPositionsForNode, at the point where the def RefPosition is
768 // created, causing a RefTypeFixedRef to be added at that location. This, however, results in
769 // more PhysReg RefPositions (a throughput impact), and a large number of diffs that require
770 // further analysis to determine benefit.
772 RefPosition* prevRefPosition = theInterval->recentRefPosition;
773 assert(prevRefPosition != nullptr && theInterval->firstRefPosition == prevRefPosition);
774 // All defs must have a valid treeNode, but we check it below to be conservative.
775 assert(prevRefPosition->treeNode != nullptr);
776 regMaskTP prevAssignment = prevRefPosition->registerAssignment;
777 regMaskTP newAssignment = (prevAssignment & rp->registerAssignment);
778 if (newAssignment != RBM_NONE)
780 if (!isSingleRegister(newAssignment) ||
781 (!theInterval->hasNonCommutativeRMWDef && (prevRefPosition->treeNode != nullptr) &&
782 !prevRefPosition->treeNode->gtLsraInfo.isInternalRegDelayFree))
784 prevRefPosition->registerAssignment = newAssignment;
789 theInterval->hasConflictingDefUse = true;
796 RefPosition* prevRP = theReferent->recentRefPosition;
797 if (prevRP != nullptr)
799 prevRP->nextRefPosition = rp;
803 theReferent->firstRefPosition = rp;
805 theReferent->recentRefPosition = rp;
806 theReferent->lastRefPosition = rp;
810 assert((rp->refType == RefTypeBB) || (rp->refType == RefTypeKillGCRefs));
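// The list bookkeeping at the end of associateRefPosWithInterval, in miniature: each referent
// (an Interval or RegRecord) keeps an in-order, singly linked list of its RefPositions via
// first/recent/last pointers. The types below are hypothetical stand-ins.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
struct SketchRefPosition
{
    SketchRefPosition* nextRefPosition = nullptr;
};

struct SketchReferenceable
{
    SketchRefPosition* firstRefPosition  = nullptr;
    SketchRefPosition* recentRefPosition = nullptr;
    SketchRefPosition* lastRefPosition   = nullptr;

    // Append a newly created RefPosition; RefPositions are created in program order.
    void append(SketchRefPosition* rp)
    {
        if (recentRefPosition != nullptr)
        {
            recentRefPosition->nextRefPosition = rp;
        }
        else
        {
            firstRefPosition = rp;
        }
        recentRefPosition = rp;
        lastRefPosition   = rp;
    }
};
#endif // 0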
814 //---------------------------------------------------------------------------
815 // newRefPosition: allocate and initialize a new RefPosition.
818 // reg - reg number that identifies RegRecord to be associated
819 // with this RefPosition
820 // theLocation - LSRA location of RefPosition
821 // theRefType - RefPosition type
822 // theTreeNode - GenTree node for which this RefPosition is created
823 // mask - Set of valid registers for this RefPosition
824 // multiRegIdx - register position if this RefPosition corresponds to a
825 // multi-reg call node.
830 RefPosition* LinearScan::newRefPosition(
831 regNumber reg, LsraLocation theLocation, RefType theRefType, GenTree* theTreeNode, regMaskTP mask)
833 RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);
835 newRP->setReg(getRegisterRecord(reg));
836 newRP->registerAssignment = mask;
838 newRP->setMultiRegIdx(0);
839 newRP->setAllocateIfProfitable(false);
841 associateRefPosWithInterval(newRP);
843 DBEXEC(VERBOSE, newRP->dump());
847 //---------------------------------------------------------------------------
848 // newRefPosition: allocate and initialize a new RefPosition.
851 // theInterval - interval to which RefPosition is associated with.
852 // theLocation - LSRA location of RefPosition
853 // theRefType - RefPosition type
854 // theTreeNode - GenTree node for which this RefPosition is created
855 // mask - Set of valid registers for this RefPosition
856 // multiRegIdx - register position if this RefPosition corresponds to a
857 // multi-reg call node.
858 // minRegCount - Minimum number of registers that need to be ensured while
859 // constraining candidates for this ref position under
860 // LSRA stress. This is a DEBUG only arg.
865 RefPosition* LinearScan::newRefPosition(Interval* theInterval,
866 LsraLocation theLocation,
868 GenTree* theTreeNode,
870 unsigned multiRegIdx /* = 0 */
871 DEBUGARG(unsigned minRegCandidateCount /* = 1 */))
874 if (theInterval != nullptr && regType(theInterval->registerType) == FloatRegisterType)
876 // If we are using floating point registers, we must make sure this flag was
877 // set previously in the compiler, since it determines whether LSRA will take
878 // FP register kill sets into consideration.
879 assert(compiler->compFloatingPointUsed || ((mask & RBM_FLT_CALLEE_SAVED) == 0));
883 // If this reference is constrained to a single register (and it's not a dummy
884 // or Kill reftype already), add a RefTypeFixedReg at this location so that its
885 // availability can be more accurately determined
887 bool isFixedRegister = isSingleRegister(mask);
888 bool insertFixedRef = false;
891 // Insert a RefTypeFixedReg for any normal def or use (not ParamDef or BB)
892 if (theRefType == RefTypeUse || theRefType == RefTypeDef)
894 insertFixedRef = true;
900 regNumber physicalReg = genRegNumFromMask(mask);
901 RefPosition* pos = newRefPosition(physicalReg, theLocation, RefTypeFixedReg, nullptr, mask);
902 assert(theInterval != nullptr);
903 assert((allRegs(theInterval->registerType) & mask) != 0);
906 RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);
908 newRP->setInterval(theInterval);
911 newRP->isFixedRegRef = isFixedRegister;
913 #ifndef _TARGET_AMD64_
914 // We don't need this for AMD because the PInvoke method epilog code is explicit
915 // at register allocation time.
916 if (theInterval != nullptr && theInterval->isLocalVar && compiler->info.compCallUnmanaged &&
917 theInterval->varNum == compiler->genReturnLocal)
919 mask &= ~(RBM_PINVOKE_TCB | RBM_PINVOKE_FRAME);
920 noway_assert(mask != RBM_NONE);
922 #endif // !_TARGET_AMD64_
923 newRP->registerAssignment = mask;
925 newRP->setMultiRegIdx(multiRegIdx);
926 newRP->setAllocateIfProfitable(false);
929 newRP->minRegCandidateCount = minRegCandidateCount;
932 associateRefPosWithInterval(newRP);
934 DBEXEC(VERBOSE, newRP->dump());
938 /*****************************************************************************
939 * Inline functions for Interval
940 *****************************************************************************/
941 RefPosition* Referenceable::getNextRefPosition()
943 if (recentRefPosition == nullptr)
945 return firstRefPosition;
949 return recentRefPosition->nextRefPosition;
953 LsraLocation Referenceable::getNextRefLocation()
955 RefPosition* nextRefPosition = getNextRefPosition();
956 if (nextRefPosition == nullptr)
962 return nextRefPosition->nodeLocation;
966 // Iterate through all the registers of the given type
967 class RegisterIterator
969 friend class Registers;
972 RegisterIterator(RegisterType type) : regType(type)
974 if (useFloatReg(regType))
976 currentRegNum = REG_FP_FIRST;
980 currentRegNum = REG_INT_FIRST;
985 static RegisterIterator Begin(RegisterType regType)
987 return RegisterIterator(regType);
989 static RegisterIterator End(RegisterType regType)
991 RegisterIterator endIter = RegisterIterator(regType);
992 // This assumes only integer and floating point register types;
993 // if we target a processor with additional register types,
994 // this would have to change.
995 if (useFloatReg(regType))
997 // This just happens to work for both double & float
998 endIter.currentRegNum = REG_NEXT(REG_FP_LAST);
1002 endIter.currentRegNum = REG_NEXT(REG_INT_LAST);
1008 void operator++(int dummy) // int dummy is c++ for "this is postfix ++"
1010 currentRegNum = REG_NEXT(currentRegNum);
1012 if (regType == TYP_DOUBLE)
1013 currentRegNum = REG_NEXT(currentRegNum);
1016 void operator++() // prefix operator++
1018 currentRegNum = REG_NEXT(currentRegNum);
1020 if (regType == TYP_DOUBLE)
1021 currentRegNum = REG_NEXT(currentRegNum);
1024 regNumber operator*()
1026 return currentRegNum;
1028 bool operator!=(const RegisterIterator& other)
1030 return other.currentRegNum != currentRegNum;
1034 regNumber currentRegNum;
1035 RegisterType regType;
1041 friend class RegisterIterator;
1043 Registers(RegisterType t)
1047 RegisterIterator begin()
1049 return RegisterIterator::Begin(type);
1051 RegisterIterator end()
1053 return RegisterIterator::End(type);
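// The iterator above walks the integer or floating register file, and for TYP_DOUBLE steps by
// two registers at a time (on targets where a double occupies a consecutive float-register pair).
// A loop-based restatement with hypothetical names:
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <vector>

std::vector<int> sketchRegistersOfType(int firstReg, int lastReg, bool isDouble)
{
    std::vector<int> regs;
    const int        step = isDouble ? 2 : 1;
    for (int reg = firstReg; reg <= lastReg; reg += step)
    {
        regs.push_back(reg);
    }
    return regs;
}
#endif // 0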
1058 void LinearScan::dumpVarToRegMap(VarToRegMap map)
1060 bool anyPrinted = false;
1061 for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
1063 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1064 if (map[varIndex] != REG_STK)
1066 printf("V%02u=%s ", varNum, getRegName(map[varIndex]));
1077 void LinearScan::dumpInVarToRegMap(BasicBlock* block)
1079 printf("Var=Reg beg of BB%02u: ", block->bbNum);
1080 VarToRegMap map = getInVarToRegMap(block->bbNum);
1081 dumpVarToRegMap(map);
1084 void LinearScan::dumpOutVarToRegMap(BasicBlock* block)
1086 printf("Var=Reg end of BB%02u: ", block->bbNum);
1087 VarToRegMap map = getOutVarToRegMap(block->bbNum);
1088 dumpVarToRegMap(map);
1093 LinearScanInterface* getLinearScanAllocator(Compiler* comp)
1095 return new (comp, CMK_LSRA) LinearScan(comp);
1098 //------------------------------------------------------------------------
1105 // The constructor takes care of initializing the data structures that are used
1106 // during Lowering, including (in DEBUG) getting the stress environment variables,
1107 // as they may affect the block ordering.
1109 LinearScan::LinearScan(Compiler* theCompiler)
1110 : compiler(theCompiler)
1111 #if MEASURE_MEM_ALLOC
1112 , lsraIAllocator(nullptr)
1113 #endif // MEASURE_MEM_ALLOC
1114 , intervals(LinearScanMemoryAllocatorInterval(theCompiler))
1115 , refPositions(LinearScanMemoryAllocatorRefPosition(theCompiler))
1118 maxNodeLocation = 0;
1119 activeRefPosition = nullptr;
1121 // Get the value of the environment variable that controls stress for register allocation
1122 lsraStressMask = JitConfig.JitStressRegs();
1125 if (lsraStressMask != 0)
1127 // The code in this #if can be used to debug JitStressRegs issues according to
1128 // method hash. To use, simply set environment variables JitStressRegsHashLo and JitStressRegsHashHi
1129 unsigned methHash = compiler->info.compMethodHash();
1130 char* lostr = getenv("JitStressRegsHashLo");
1131 unsigned methHashLo = 0;
1133 if (lostr != nullptr)
1135 sscanf_s(lostr, "%x", &methHashLo);
1138 char* histr = getenv("JitStressRegsHashHi");
1139 unsigned methHashHi = UINT32_MAX;
1140 if (histr != nullptr)
1142 sscanf_s(histr, "%x", &methHashHi);
1145 if (methHash < methHashLo || methHash > methHashHi)
1149 else if (dump == true)
1151 printf("JitStressRegs = %x for method %s, hash = 0x%x.\n",
1152 lsraStressMask, compiler->info.compFullName, compiler->info.compMethodHash());
1153 printf(""); // in our logic this causes a flush
1159 dumpTerse = (JitConfig.JitDumpTerseLsra() != 0);
1162 enregisterLocalVars = ((compiler->opts.compFlags & CLFLG_REGVAR) != 0) && compiler->lvaTrackedCount > 0;
1163 availableIntRegs = (RBM_ALLINT & ~compiler->codeGen->regSet.rsMaskResvd);
1166 availableIntRegs &= ~RBM_FPBASE;
1167 #endif // ETW_EBP_FRAMED
1169 availableFloatRegs = RBM_ALLFLOAT;
1170 availableDoubleRegs = RBM_ALLDOUBLE;
1172 #ifdef _TARGET_AMD64_
1173 if (compiler->opts.compDbgEnC)
1175 // On x64 when the EnC option is set, we always save exactly RBP, RSI and RDI.
1176 // RBP is not available to the register allocator, so RSI and RDI are the only
1177 // callee-save registers available.
1178 availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_RSI | RBM_RDI;
1179 availableFloatRegs &= ~RBM_CALLEE_SAVED;
1180 availableDoubleRegs &= ~RBM_CALLEE_SAVED;
1182 #endif // _TARGET_AMD64_
1183 compiler->rpFrameType = FT_NOT_SET;
1184 compiler->rpMustCreateEBPCalled = false;
1186 compiler->codeGen->intRegState.rsIsFloat = false;
1187 compiler->codeGen->floatRegState.rsIsFloat = true;
1189 // Block sequencing (the order in which we schedule).
1190 // Note that we don't initialize the bbVisitedSet until we do the first traversal
1191 // (currently during Lowering's second phase, where it sets the TreeNodeInfo).
1192 // This is so that any blocks that are added during the first phase of Lowering
1193 // are accounted for (and we don't have BasicBlockEpoch issues).
1194 blockSequencingDone = false;
1195 blockSequence = nullptr;
1196 blockSequenceWorkList = nullptr;
1200 // Information about each block, including predecessor blocks used for variable locations at block entry.
1201 blockInfo = nullptr;
1203 // Populate the register mask table.
1204 // The first two masks in the table are allint/allfloat
1205 // The next N are the masks for each single register.
1206 // After that are the dynamically added ones.
1207 regMaskTable = new (compiler, CMK_LSRA) regMaskTP[numMasks];
1208 regMaskTable[ALLINT_IDX] = allRegs(TYP_INT);
1209 regMaskTable[ALLFLOAT_IDX] = allRegs(TYP_DOUBLE);
1212 for (reg = REG_FIRST; reg < REG_COUNT; reg = REG_NEXT(reg))
1214 regMaskTable[FIRST_SINGLE_REG_IDX + reg - REG_FIRST] = (reg == REG_STK) ? RBM_NONE : genRegMask(reg);
1216 nextFreeMask = FIRST_SINGLE_REG_IDX + REG_COUNT;
1217 noway_assert(nextFreeMask <= numMasks);
1220 // Return the reg mask corresponding to the given index.
1221 regMaskTP LinearScan::GetRegMaskForIndex(RegMaskIndex index)
1223 assert(index < numMasks);
1224 assert(index < nextFreeMask);
1225 return regMaskTable[index];
1228 // Given a reg mask, return the index it corresponds to. If it is not a 'well known' reg mask,
1229 // add it at the end. This method has linear behavior in the worst cases but that is fairly rare.
1230 // Most methods never use any but the well-known masks, and when they do use more
1231 // it is only one or two more.
1232 LinearScan::RegMaskIndex LinearScan::GetIndexForRegMask(regMaskTP mask)
1234 RegMaskIndex result;
1235 if (isSingleRegister(mask))
1237 result = genRegNumFromMask(mask) + FIRST_SINGLE_REG_IDX;
1239 else if (mask == allRegs(TYP_INT))
1241 result = ALLINT_IDX;
1243 else if (mask == allRegs(TYP_DOUBLE))
1245 result = ALLFLOAT_IDX;
1249 for (int i = FIRST_SINGLE_REG_IDX + REG_COUNT; i < nextFreeMask; i++)
1251 if (regMaskTable[i] == mask)
1257 // We only allocate a fixed number of masks. Since we don't reallocate, we will throw a
1258 // noway_assert if we exceed this limit.
1259 noway_assert(nextFreeMask < numMasks);
1261 regMaskTable[nextFreeMask] = mask;
1262 result = nextFreeMask;
1265 assert(mask == regMaskTable[result]);
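// The interning scheme above, in miniature: well-known masks occupy fixed slots, and any other
// mask is found by linear search or appended at the end. The real table is a fixed-size array
// with a noway_assert on overflow; this standalone sketch (hypothetical names) uses a vector.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
#include <cstdint>
#include <vector>

using SketchRegMask = uint64_t;

struct SketchMaskTable
{
    std::vector<SketchRegMask> masks; // the leading slots would hold the well-known masks

    size_t getIndex(SketchRegMask mask)
    {
        for (size_t i = 0; i < masks.size(); i++)
        {
            if (masks[i] == mask)
            {
                return i;
            }
        }
        masks.push_back(mask); // rare: a mask we have not seen before
        return masks.size() - 1;
    }
};
#endif // 0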
1269 // We've decided that we can't use a register during register allocation (probably FPBASE),
1270 // but we've already added it to the register masks. Go through the masks and remove it.
1271 void LinearScan::RemoveRegisterFromMasks(regNumber reg)
1273 JITDUMP("Removing register %s from LSRA register masks\n", getRegName(reg));
1275 regMaskTP mask = ~genRegMask(reg);
1276 for (int i = 0; i < nextFreeMask; i++)
1278 regMaskTable[i] &= mask;
1281 JITDUMP("After removing register:\n");
1282 DBEXEC(VERBOSE, dspRegisterMaskTable());
1286 void LinearScan::dspRegisterMaskTable()
1288 printf("LSRA register masks. Total allocated: %d, total used: %d\n", numMasks, nextFreeMask);
1289 for (int i = 0; i < nextFreeMask; i++)
1292 dspRegMask(regMaskTable[i]);
1298 //------------------------------------------------------------------------
1299 // getNextCandidateFromWorkList: Get the next candidate for block sequencing
1305 // The next block to be placed in the sequence.
1308 // This method currently always returns the next block in the list, and relies on having
1309 // blocks added to the list only when they are "ready", and on the
1310 // addToBlockSequenceWorkList() method to insert them in the proper order.
1311 // However, a block may be in the list and already selected, if it was subsequently
1312 // encountered as both a flow and layout successor of the most recently selected block.
1315 BasicBlock* LinearScan::getNextCandidateFromWorkList()
1317 BasicBlockList* nextWorkList = nullptr;
1318 for (BasicBlockList* workList = blockSequenceWorkList; workList != nullptr; workList = nextWorkList)
1320 nextWorkList = workList->next;
1321 BasicBlock* candBlock = workList->block;
1322 removeFromBlockSequenceWorkList(workList, nullptr);
1323 if (!isBlockVisited(candBlock))
1331 //------------------------------------------------------------------------
1332 // setBlockSequence: Determine the block order for register allocation.
1341 // On return, the blockSequence array contains the blocks, in the order in which they
1342 // will be allocated.
1343 // This method clears the bbVisitedSet on LinearScan, and when it returns the set
1344 // contains all the bbNums for the block.
1345 // This requires a traversal of the BasicBlocks, and could potentially be
1346 // combined with the first traversal (currently the one in Lowering that sets the TreeNodeInfo).
1349 void LinearScan::setBlockSequence()
1351 // Reset the "visited" flag on each block.
1352 compiler->EnsureBasicBlockEpoch();
1353 bbVisitedSet = BlockSetOps::MakeEmpty(compiler);
1354 BlockSet readySet(BlockSetOps::MakeEmpty(compiler));
1355 BlockSet predSet(BlockSetOps::MakeEmpty(compiler));
1357 assert(blockSequence == nullptr && bbSeqCount == 0);
1358 blockSequence = new (compiler, CMK_LSRA) BasicBlock*[compiler->fgBBcount];
1359 bbNumMaxBeforeResolution = compiler->fgBBNumMax;
1360 blockInfo = new (compiler, CMK_LSRA) LsraBlockInfo[bbNumMaxBeforeResolution + 1];
1362 assert(blockSequenceWorkList == nullptr);
1364 bool addedInternalBlocks = false;
1365 verifiedAllBBs = false;
1366 hasCriticalEdges = false;
1367 BasicBlock* nextBlock;
1368 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = nextBlock)
1370 blockSequence[bbSeqCount] = block;
1371 markBlockVisited(block);
1373 nextBlock = nullptr;
1375 // Initialize the blockInfo.
1376 // predBBNum will be set later. 0 is never used as a bbNum.
1377 blockInfo[block->bbNum].predBBNum = 0;
1378 // We check for critical edges below, but initialize to false.
1379 blockInfo[block->bbNum].hasCriticalInEdge = false;
1380 blockInfo[block->bbNum].hasCriticalOutEdge = false;
1381 blockInfo[block->bbNum].weight = block->bbWeight;
1383 #if TRACK_LSRA_STATS
1384 blockInfo[block->bbNum].spillCount = 0;
1385 blockInfo[block->bbNum].copyRegCount = 0;
1386 blockInfo[block->bbNum].resolutionMovCount = 0;
1387 blockInfo[block->bbNum].splitEdgeCount = 0;
1388 #endif // TRACK_LSRA_STATS
1390 if (block->GetUniquePred(compiler) == nullptr)
1392 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
1394 BasicBlock* predBlock = pred->flBlock;
1395 if (predBlock->NumSucc(compiler) > 1)
1397 blockInfo[block->bbNum].hasCriticalInEdge = true;
1398 hasCriticalEdges = true;
1401 else if (predBlock->bbJumpKind == BBJ_SWITCH)
1403 assert(!"Switch with single successor");
1408 // Determine which block to schedule next.
1410 // First, update the NORMAL successors of the current block, adding them to the worklist
1411 // according to the desired order. We will handle the EH successors below.
1412 bool checkForCriticalOutEdge = (block->NumSucc(compiler) > 1);
1413 if (!checkForCriticalOutEdge && block->bbJumpKind == BBJ_SWITCH)
1415 assert(!"Switch with single successor");
1418 const unsigned numSuccs = block->NumSucc(compiler);
1419 for (unsigned succIndex = 0; succIndex < numSuccs; succIndex++)
1421 BasicBlock* succ = block->GetSucc(succIndex, compiler);
1422 if (checkForCriticalOutEdge && succ->GetUniquePred(compiler) == nullptr)
1424 blockInfo[block->bbNum].hasCriticalOutEdge = true;
1425 hasCriticalEdges = true;
1426 // We can stop checking now.
1427 checkForCriticalOutEdge = false;
1430 if (isTraversalLayoutOrder() || isBlockVisited(succ))
1435 // We've now seen a predecessor, so add it to the work list and the "readySet".
1436 // It will be inserted in the worklist according to the specified traversal order
1437 // (i.e. pred-first or random, since layout order is handled above).
1438 if (!BlockSetOps::IsMember(compiler, readySet, succ->bbNum))
1440 addToBlockSequenceWorkList(readySet, succ, predSet);
1441 BlockSetOps::AddElemD(compiler, readySet, succ->bbNum);
1445 // For layout order, simply use bbNext
1446 if (isTraversalLayoutOrder())
1448 nextBlock = block->bbNext;
1452 while (nextBlock == nullptr)
1454 nextBlock = getNextCandidateFromWorkList();
1456 // TODO-Throughput: We would like to bypass this traversal if we know we've handled all
1457 // the blocks - but fgBBcount does not appear to be updated when blocks are removed.
1458 if (nextBlock == nullptr /* && bbSeqCount != compiler->fgBBcount*/ && !verifiedAllBBs)
1460 // If we don't encounter all blocks by traversing the regular successor links, do a full
1461 // traversal of all the blocks, and add them in layout order.
1462 // This may include:
1463 // - internal-only blocks (in the fgAddCodeList) which may not be in the flow graph
1464 // (these are not even in the bbNext links).
1465 // - blocks that have become unreachable due to optimizations, but that are strongly
1466 // connected (these are not removed)
1469 for (Compiler::AddCodeDsc* desc = compiler->fgAddCodeList; desc != nullptr; desc = desc->acdNext)
1471 if (!isBlockVisited(block))
1473 addToBlockSequenceWorkList(readySet, block, predSet);
1474 BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
1478 for (BasicBlock* block = compiler->fgFirstBB; block; block = block->bbNext)
1480 if (!isBlockVisited(block))
1482 addToBlockSequenceWorkList(readySet, block, predSet);
1483 BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
1486 verifiedAllBBs = true;
1494 blockSequencingDone = true;
1497 // Make sure that we've visited all the blocks.
1498 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
1500 assert(isBlockVisited(block));
1503 JITDUMP("LSRA Block Sequence: ");
1505 for (BasicBlock *block = startBlockSequence(); block != nullptr; ++i, block = moveToNextBlock())
1507 JITDUMP("BB%02u", block->bbNum);
1509 if (block->isMaxBBWeight())
1515 JITDUMP("(%6s) ", refCntWtd2str(block->getBBWeight(compiler)));
1527 //------------------------------------------------------------------------
1528 // compareBlocksForSequencing: Compare two basic blocks for sequencing order.
1531 // block1 - the first block for comparison
1532 // block2 - the second block for comparison
1533 // useBlockWeights - whether to use block weights for comparison
1536 // -1 if block1 is preferred.
1537 // 0 if the blocks are equivalent.
1538 // 1 if block2 is preferred.
1541 // See addToBlockSequenceWorkList.
1542 int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights)
1544 if (useBlockWeights)
1546 unsigned weight1 = block1->getBBWeight(compiler);
1547 unsigned weight2 = block2->getBBWeight(compiler);
1549 if (weight1 > weight2)
1553 else if (weight1 < weight2)
1559 // If weights are the same prefer LOWER bbnum
1560 if (block1->bbNum < block2->bbNum)
1564 else if (block1->bbNum == block2->bbNum)
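// The comparison above, restated as a standalone sketch: when weights are considered, heavier
// blocks sort first; block number is the tie-breaker (lower numbers first). Names are hypothetical.
#if 0 // Standalone illustrative sketch; not part of the JIT build.
int sketchCompareBlocks(unsigned weight1, unsigned num1, unsigned weight2, unsigned num2, bool useWeights)
{
    if (useWeights)
    {
        if (weight1 > weight2)
        {
            return -1; // prefer block1
        }
        if (weight1 < weight2)
        {
            return 1; // prefer block2
        }
    }
    if (num1 < num2)
    {
        return -1;
    }
    if (num1 == num2)
    {
        return 0;
    }
    return 1;
}
#endif // 0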
1574 //------------------------------------------------------------------------
1575 // addToBlockSequenceWorkList: Add a BasicBlock to the work list for sequencing.
1578 // sequencedBlockSet - the set of blocks that are already sequenced
1579 // block - the new block to be added
1580 // predSet - a block set allocated by the caller, used here as a temporary set for
1581 // constructing the block's predecessor set. It is allocated by the caller to avoid
1582 // reallocating a new block set with every call to this function.
1588 // The first block in the list will be the next one to be sequenced, as soon
1589 // as we encounter a block whose successors have all been sequenced, in pred-first
1590 // order, or the very next block if we are traversing in random order (once implemented).
1591 // This method uses a comparison method to determine the order in which to place
1592 // the blocks in the list. This method queries whether all predecessors of the
1593 // block are sequenced at the time it is added to the list and if so uses block weights
1594 // for inserting the block. A block is never inserted ahead of its predecessors.
1595 // A block at the time of insertion may not have all its predecessors sequenced, in
1596 // which case it will be sequenced based on its block number. Once a block is inserted,
1597 // its priority/order will not be changed later once its remaining predecessors are
1598 // sequenced. This means that the work list may not be sorted entirely based on
1599 // block weights alone.
1601 // Note also that, when random traversal order is implemented, this method
1602 // should insert the blocks into the list in random order, so that we can always
1603 // simply select the first block in the list.
1604 void LinearScan::addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block, BlockSet& predSet)
1606 // The block that is being added is not already sequenced
1607 assert(!BlockSetOps::IsMember(compiler, sequencedBlockSet, block->bbNum));
1609 // Get predSet of block
1610 BlockSetOps::ClearD(compiler, predSet);
1612 for (pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
1614 BlockSetOps::AddElemD(compiler, predSet, pred->flBlock->bbNum);
1617 // If the block is rarely run, or all its predecessors are already sequenced, use the block's weight to sequence it.
1618 bool useBlockWeight = block->isRunRarely() || BlockSetOps::IsSubset(compiler, sequencedBlockSet, predSet);
1620 BasicBlockList* prevNode = nullptr;
1621 BasicBlockList* nextNode = blockSequenceWorkList;
1623 while (nextNode != nullptr)
1627 if (nextNode->block->isRunRarely())
1629 // If the block that is yet to be sequenced is a rarely run block, always use block weights for sequencing
1630 seqResult = compareBlocksForSequencing(nextNode->block, block, true);
1632 else if (BlockSetOps::IsMember(compiler, predSet, nextNode->block->bbNum))
1634 // always prefer unsequenced pred blocks
1639 seqResult = compareBlocksForSequencing(nextNode->block, block, useBlockWeight);
1647 prevNode = nextNode;
1648 nextNode = nextNode->next;
1651 BasicBlockList* newListNode = new (compiler, CMK_LSRA) BasicBlockList(block, nextNode);
1652 if (prevNode == nullptr)
1654 blockSequenceWorkList = newListNode;
1658 prevNode->next = newListNode;
1662 void LinearScan::removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode)
1664 if (listNode == blockSequenceWorkList)
1666 assert(prevNode == nullptr);
1667 blockSequenceWorkList = listNode->next;
1671 assert(prevNode != nullptr && prevNode->next == listNode);
1672 prevNode->next = listNode->next;
1674 // TODO-Cleanup: consider merging Compiler::BlockListNode and BasicBlockList
1675 // compiler->FreeBlockListNode(listNode);
1678 // Initialize the block order for allocation (called each time a new traversal begins).
1679 BasicBlock* LinearScan::startBlockSequence()
1681 if (!blockSequencingDone)
1685 BasicBlock* curBB = compiler->fgFirstBB;
1687 curBBNum = curBB->bbNum;
1688 clearVisitedBlocks();
1689 assert(blockSequence[0] == compiler->fgFirstBB);
1690 markBlockVisited(curBB);
1694 //------------------------------------------------------------------------
1695 // moveToNextBlock: Move to the next block in order for allocation or resolution.
1704 // This method is used when the next block is actually going to be handled.
1705 // It changes curBBNum.
1707 BasicBlock* LinearScan::moveToNextBlock()
1709 BasicBlock* nextBlock = getNextBlock();
1711 if (nextBlock != nullptr)
1713 curBBNum = nextBlock->bbNum;
1718 //------------------------------------------------------------------------
1719 // getNextBlock: Get the next block in order for allocation or resolution.
1728 // This method does not actually change the current block - it is used simply
1729 // to determine which block will be next.
1731 BasicBlock* LinearScan::getNextBlock()
1733 assert(blockSequencingDone);
1734 unsigned int nextBBSeqNum = curBBSeqNum + 1;
1735 if (nextBBSeqNum < bbSeqCount)
1737 return blockSequence[nextBBSeqNum];
1742 //------------------------------------------------------------------------
1743 // doLinearScan: The main method for register allocation.
1752 // Lowering must have set the NodeInfo (gtLsraInfo) on each node to communicate
1753 // the register requirements.
1755 void LinearScan::doLinearScan()
1757 unsigned lsraBlockEpoch = compiler->GetCurBasicBlockEpoch();
1759 splitBBNumToTargetBBNumMap = nullptr;
1761 // This is complicated by the fact that physical registers have refs associated
1762 // with locations where they are killed (e.g. calls), but we don't want to
1763 // count these as being touched.
1765 compiler->codeGen->regSet.rsClearRegsModified();
1769 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_REFPOS));
1770 compiler->EndPhase(PHASE_LINEAR_SCAN_BUILD);
1772 DBEXEC(VERBOSE, lsraDumpIntervals("after buildIntervals"));
1774 clearVisitedBlocks();
1776 allocateRegisters();
1777 compiler->EndPhase(PHASE_LINEAR_SCAN_ALLOC);
1779 compiler->EndPhase(PHASE_LINEAR_SCAN_RESOLVE);
1781 #if TRACK_LSRA_STATS
1782 if ((JitConfig.DisplayLsraStats() != 0)
1788 dumpLsraStats(jitstdout);
1790 #endif // TRACK_LSRA_STATS
1792 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_POST));
1794 compiler->compLSRADone = true;
1795 noway_assert(lsraBlockEpoch == compiler->GetCurBasicBlockEpoch());
1798 //------------------------------------------------------------------------
1799 // recordVarLocationsAtStartOfBB: Update live-in LclVarDscs with the appropriate
1800 // register location at the start of a block, during codegen.
1803 // bb - the block for which code is about to be generated.
1809 // CodeGen will take care of updating the reg masks and the current var liveness,
1810 // after calling this method.
1811 // This is because we need to kill off the dead registers before setting the newly live ones.
1813 void LinearScan::recordVarLocationsAtStartOfBB(BasicBlock* bb)
1815 if (!enregisterLocalVars)
1819 JITDUMP("Recording Var Locations at start of BB%02u\n", bb->bbNum);
1820 VarToRegMap map = getInVarToRegMap(bb->bbNum);
1823 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
1824 VarSetOps::Intersection(compiler, registerCandidateVars, bb->bbLiveIn));
1825 VarSetOps::Iter iter(compiler, currentLiveVars);
1826 unsigned varIndex = 0;
1827 while (iter.NextElem(&varIndex))
1829 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1830 LclVarDsc* varDsc = &(compiler->lvaTable[varNum]);
1831 regNumber regNum = getVarReg(map, varIndex);
1833 regNumber oldRegNum = varDsc->lvRegNum;
1834 regNumber newRegNum = regNum;
1836 if (oldRegNum != newRegNum)
1838 JITDUMP(" V%02u(%s->%s)", varNum, compiler->compRegVarName(oldRegNum),
1839 compiler->compRegVarName(newRegNum));
1840 varDsc->lvRegNum = newRegNum;
1843 else if (newRegNum != REG_STK)
1845 JITDUMP(" V%02u(%s)", varNum, compiler->compRegVarName(newRegNum));
1852 JITDUMP(" <none>\n");
1858 void Interval::setLocalNumber(Compiler* compiler, unsigned lclNum, LinearScan* linScan)
1860 LclVarDsc* varDsc = &compiler->lvaTable[lclNum];
1861 assert(varDsc->lvTracked);
1862 assert(varDsc->lvVarIndex < compiler->lvaTrackedCount);
1864 linScan->localVarIntervals[varDsc->lvVarIndex] = this;
1866 assert(linScan->getIntervalForLocalVar(varDsc->lvVarIndex) == this);
1867 this->isLocalVar = true;
1868 this->varNum = lclNum;
1871 // identify the candidates which we are not going to enregister due to
1872 // being used in EH in a way we don't want to deal with
1873 // this logic cloned from fgInterBlockLocalVarLiveness
1874 void LinearScan::identifyCandidatesExceptionDataflow()
1876 VARSET_TP exceptVars(VarSetOps::MakeEmpty(compiler));
1877 VARSET_TP filterVars(VarSetOps::MakeEmpty(compiler));
1878 VARSET_TP finallyVars(VarSetOps::MakeEmpty(compiler));
1881 foreach_block(compiler, block)
1883 if (block->bbCatchTyp != BBCT_NONE)
1885 // live on entry to handler
1886 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1889 if (block->bbJumpKind == BBJ_EHFILTERRET)
1891 // live on exit from filter
1892 VarSetOps::UnionD(compiler, filterVars, block->bbLiveOut);
1894 else if (block->bbJumpKind == BBJ_EHFINALLYRET)
1896 // live on exit from finally
1897 VarSetOps::UnionD(compiler, finallyVars, block->bbLiveOut);
1899 #if FEATURE_EH_FUNCLETS
1900 // Funclets are called and returned from, as such we can only count on the frame
1901 // pointer being restored, and thus everything live in or live out must be on the
1903 if (block->bbFlags & BBF_FUNCLET_BEG)
1905 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1907 if ((block->bbJumpKind == BBJ_EHFINALLYRET) || (block->bbJumpKind == BBJ_EHFILTERRET) ||
1908 (block->bbJumpKind == BBJ_EHCATCHRET))
1910 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveOut);
1912 #endif // FEATURE_EH_FUNCLETS
1915 // slam them all together (there was really no need to use more than 2 bitvectors here)
1916 VarSetOps::UnionD(compiler, exceptVars, filterVars);
1917 VarSetOps::UnionD(compiler, exceptVars, finallyVars);
1919 /* Mark all pointer variables live on exit from a 'finally'
1920 block as either volatile for non-GC ref types or as
1921 'explicitly initialized' (volatile and must-init) for GC-ref types */
1923 VarSetOps::Iter iter(compiler, exceptVars);
1924 unsigned varIndex = 0;
1925 while (iter.NextElem(&varIndex))
1927 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1928 LclVarDsc* varDsc = compiler->lvaTable + varNum;
1930 compiler->lvaSetVarDoNotEnregister(varNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
1932 if (varTypeIsGC(varDsc))
1934 if (VarSetOps::IsMember(compiler, finallyVars, varIndex) && !varDsc->lvIsParam)
1936 varDsc->lvMustInit = true;
1942 bool LinearScan::isRegCandidate(LclVarDsc* varDsc)
1944 // We shouldn't be called if opt settings do not permit register variables.
1945 assert((compiler->opts.compFlags & CLFLG_REGVAR) != 0);
1947 if (!varDsc->lvTracked)
1952 #if !defined(_TARGET_64BIT_)
1953 if (varDsc->lvType == TYP_LONG)
1955 // Long variables should not be register candidates.
1956 // Lowering will have split any candidate lclVars into lo/hi vars.
1959 #endif // !defined(_TARGET_64BIT_)
1961 // If we have JMP, reg args must be put on the stack
1963 if (compiler->compJmpOpUsed && varDsc->lvIsRegArg)
1968 // Don't allocate registers for dependently promoted struct fields
1969 if (compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc))
1976 // Identify locals & compiler temps that are register candidates
1977 // TODO-Cleanup: This was cloned from Compiler::lvaSortByRefCount() in lclvars.cpp in order
1978 // to avoid perturbation, but should be merged.
1980 void LinearScan::identifyCandidates()
1982 if (enregisterLocalVars)
1984 // Initialize the set of lclVars that are candidates for register allocation.
1985 VarSetOps::AssignNoCopy(compiler, registerCandidateVars, VarSetOps::MakeEmpty(compiler));
1987 // Initialize the sets of lclVars that are used to determine whether, and for which lclVars,
1988 // we need to perform resolution across basic blocks.
1989 // Note that we can't do this in the constructor because the number of tracked lclVars may
1990 // change between the constructor and the actual allocation.
1991 VarSetOps::AssignNoCopy(compiler, resolutionCandidateVars, VarSetOps::MakeEmpty(compiler));
1992 VarSetOps::AssignNoCopy(compiler, splitOrSpilledVars, VarSetOps::MakeEmpty(compiler));
1994 // We set enregisterLocalVars to true only if there are tracked lclVars
1995 assert(compiler->lvaCount != 0);
1997 else if (compiler->lvaCount == 0)
1999 // Nothing to do. Note that even if enregisterLocalVars is false, we still need to set the
2000 // lvLRACandidate field on all the lclVars to false if we have any.
2004 if (compiler->compHndBBtabCount > 0)
2006 identifyCandidatesExceptionDataflow();
2012 // While we build intervals for the candidate lclVars, we will determine the floating point
2013 // lclVars, if any, to consider for callee-save register preferencing.
2014 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
2015 // and those that meet the second.
2016 // The first threshold is used for methods that are heuristically deemed either to have light
2017 // fp usage, or other factors that encourage conservative use of callee-save registers, such
2018 // as multiple exits (where there might be an early exit that would be excessively penalized by
2019 // lots of prolog/epilog saves & restores).
2020 // The second threshold is used where there are factors deemed to make it more likely that fp
2021 // callee save registers will be needed, such as loops or many fp vars.
2022 // We keep two sets of vars, since we collect some of the information to determine which set to
2023 // use as we iterate over the vars.
2024 // When we are generating AVX code on non-Unix (FEATURE_PARTIAL_SIMD_CALLEE_SAVE), we maintain an
2025 // additional set of LargeVectorType vars, and there is a separate threshold defined for those.
2026 // It is assumed that if we encounter these, we should consider this a "high use" scenario,
2027 // so we don't maintain two sets of these vars.
2028 // This is defined as thresholdLargeVectorRefCntWtd, as we are likely to use the same mechanism
2029 // for vectors on Arm64, though the actual value may differ.
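// For example (illustrative, and assuming BB_UNITY_WEIGHT is 100): thresholdFPRefCntWtd below
// works out to 400 and maybeFPRefCntWtd to 200, so an fp lclVar with a weighted ref count of at
// least 400 is always a callee-save candidate, one between 200 and 399 becomes a candidate only
// if the more aggressive threshold is selected later, and one below 200 is never preferenced to
// a callee-save register by this heuristic.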
2031 unsigned int floatVarCount = 0;
2032 unsigned int thresholdFPRefCntWtd = 4 * BB_UNITY_WEIGHT;
2033 unsigned int maybeFPRefCntWtd = 2 * BB_UNITY_WEIGHT;
2034 VARSET_TP fpMaybeCandidateVars(VarSetOps::UninitVal());
2035 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2036 unsigned int largeVectorVarCount = 0;
2037 unsigned int thresholdLargeVectorRefCntWtd = 4 * BB_UNITY_WEIGHT;
2038 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2039 if (enregisterLocalVars)
2041 VarSetOps::AssignNoCopy(compiler, fpCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
2042 VarSetOps::AssignNoCopy(compiler, fpMaybeCandidateVars, VarSetOps::MakeEmpty(compiler));
2043 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2044 VarSetOps::AssignNoCopy(compiler, largeVectorVars, VarSetOps::MakeEmpty(compiler));
2045 VarSetOps::AssignNoCopy(compiler, largeVectorCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
2046 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2049 unsigned refCntStk = 0;
2050 unsigned refCntReg = 0;
2051 unsigned refCntWtdReg = 0;
2052 unsigned refCntStkParam = 0; // sum of ref counts for all stack based parameters
2053 unsigned refCntWtdStkDbl = 0; // sum of wtd ref counts for stack based doubles
2054 doDoubleAlign = false;
2055 bool checkDoubleAlign = true;
2056 if (compiler->codeGen->isFramePointerRequired() || compiler->opts.MinOpts())
2058 checkDoubleAlign = false;
2062 switch (compiler->getCanDoubleAlign())
2064 case MUST_DOUBLE_ALIGN:
2065 doDoubleAlign = true;
2066 checkDoubleAlign = false;
2068 case CAN_DOUBLE_ALIGN:
2070 case CANT_DOUBLE_ALIGN:
2071 doDoubleAlign = false;
2072 checkDoubleAlign = false;
2078 #endif // DOUBLE_ALIGN
2080 // Check whether register variables are permitted.
2081 if (!enregisterLocalVars)
2083 localVarIntervals = nullptr;
2085 else if (compiler->lvaTrackedCount > 0)
2087 // initialize mapping from tracked local to interval
2088 localVarIntervals = new (compiler, CMK_LSRA) Interval*[compiler->lvaTrackedCount];
2091 INTRACK_STATS(regCandidateVarCount = 0);
2092 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
2094 // Initialize all variables to REG_STK
2095 varDsc->lvRegNum = REG_STK;
2096 #ifndef _TARGET_64BIT_
2097 varDsc->lvOtherReg = REG_STK;
2098 #endif // !_TARGET_64BIT_
2100 if (!enregisterLocalVars)
2102 varDsc->lvLRACandidate = false;
2107 if (checkDoubleAlign)
2109 if (varDsc->lvIsParam && !varDsc->lvIsRegArg)
2111 refCntStkParam += varDsc->lvRefCnt;
2113 else if (!isRegCandidate(varDsc) || varDsc->lvDoNotEnregister)
2115 refCntStk += varDsc->lvRefCnt;
2116 if ((varDsc->lvType == TYP_DOUBLE) ||
2117 ((varTypeIsStruct(varDsc) && varDsc->lvStructDoubleAlign &&
2118 (compiler->lvaGetPromotionType(varDsc) != Compiler::PROMOTION_TYPE_INDEPENDENT))))
2120 refCntWtdStkDbl += varDsc->lvRefCntWtd;
2125 refCntReg += varDsc->lvRefCnt;
2126 refCntWtdReg += varDsc->lvRefCntWtd;
2129 #endif // DOUBLE_ALIGN
2131 /* Track all locals that can be enregistered */
2133 if (!isRegCandidate(varDsc))
2135 varDsc->lvLRACandidate = 0;
2136 if (varDsc->lvTracked)
2138 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2143 assert(varDsc->lvTracked);
2145 varDsc->lvLRACandidate = 1;
2147 // Start with lvRegister as false - set it true only if the variable gets
2148 // the same register assignment throughout
2149 varDsc->lvRegister = false;
2151 /* If the ref count is zero */
2152 if (varDsc->lvRefCnt == 0)
2154 /* Zero ref count, make this untracked */
2155 varDsc->lvRefCntWtd = 0;
2156 varDsc->lvLRACandidate = 0;
2159 // Variables that are address-exposed are never enregistered, or tracked.
2160 // A struct may be promoted, and a struct that fits in a register may be fully enregistered.
2161 // Pinned variables may not be tracked (a condition of the GCInfo representation)
2162 // or enregistered, on x86 -- it is believed that we can enregister pinned (more properly, "pinning")
2163 // references when using the general GC encoding.
2165 if (varDsc->lvAddrExposed || !varTypeIsEnregisterableStruct(varDsc))
2167 varDsc->lvLRACandidate = 0;
2169 Compiler::DoNotEnregisterReason dner = Compiler::DNER_AddrExposed;
2170 if (!varDsc->lvAddrExposed)
2172 dner = Compiler::DNER_IsStruct;
2175 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(dner));
2177 else if (varDsc->lvPinned)
2179 varDsc->lvTracked = 0;
2180 #ifdef JIT32_GCENCODER
2181 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_PinningRef));
2182 #endif // JIT32_GCENCODER
2185 // Are we not optimizing and we have exception handlers?
2186 // if so mark all args and locals as volatile, so that they
2187 // won't ever get enregistered.
2189 if (compiler->opts.MinOpts() && compiler->compHndBBtabCount > 0)
2191 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
2194 if (varDsc->lvDoNotEnregister)
2196 varDsc->lvLRACandidate = 0;
2197 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2201 var_types type = genActualType(varDsc->TypeGet());
2205 #if CPU_HAS_FP_SUPPORT
2208 if (compiler->opts.compDbgCode)
2210 varDsc->lvLRACandidate = 0;
2213 if (varDsc->lvIsParam && varDsc->lvIsRegArg)
2215 type = (type == TYP_DOUBLE) ? TYP_LONG : TYP_INT;
2217 #endif // ARM_SOFTFP
2219 #endif // CPU_HAS_FP_SUPPORT
2231 if (varDsc->lvPromoted)
2233 varDsc->lvLRACandidate = 0;
2237 // TODO-1stClassStructs: Move TYP_SIMD8 up with the other SIMD types, after handling the param issue
2238 // (passing & returning as TYP_LONG).
2240 #endif // FEATURE_SIMD
2244 varDsc->lvLRACandidate = 0;
2250 noway_assert(!"lvType not set correctly");
2251 varDsc->lvType = TYP_INT;
2256 varDsc->lvLRACandidate = 0;
2259 if (varDsc->lvLRACandidate)
2261 Interval* newInt = newInterval(type);
2262 newInt->setLocalNumber(compiler, lclNum, this);
2263 VarSetOps::AddElemD(compiler, registerCandidateVars, varDsc->lvVarIndex);
2265 // we will set this later when we have determined liveness
2266 varDsc->lvMustInit = false;
2268 if (varDsc->lvIsStructField)
2270 newInt->isStructField = true;
2273 INTRACK_STATS(regCandidateVarCount++);
2275 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
2276 // and those that meet the second (see the definitions of thresholdFPRefCntWtd and maybeFPRefCntWtd
2278 CLANG_FORMAT_COMMENT_ANCHOR;
2280 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2281 // Additionally, when we are generating AVX on non-UNIX amd64, we keep a separate set of the LargeVectorType
2283 if (varDsc->lvType == LargeVectorType)
2285 largeVectorVarCount++;
2286 VarSetOps::AddElemD(compiler, largeVectorVars, varDsc->lvVarIndex);
2287 unsigned refCntWtd = varDsc->lvRefCntWtd;
2288 if (refCntWtd >= thresholdLargeVectorRefCntWtd)
2290 VarSetOps::AddElemD(compiler, largeVectorCalleeSaveCandidateVars, varDsc->lvVarIndex);
2294 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2295 if (regType(type) == FloatRegisterType)
2298 unsigned refCntWtd = varDsc->lvRefCntWtd;
2299 if (varDsc->lvIsRegArg)
2301 // Don't count the initial reference for register params. In those cases,
2302 // using a callee-save causes an extra copy.
2303 refCntWtd -= BB_UNITY_WEIGHT;
2305 if (refCntWtd >= thresholdFPRefCntWtd)
2307 VarSetOps::AddElemD(compiler, fpCalleeSaveCandidateVars, varDsc->lvVarIndex);
2309 else if (refCntWtd >= maybeFPRefCntWtd)
2311 VarSetOps::AddElemD(compiler, fpMaybeCandidateVars, varDsc->lvVarIndex);
2317 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2322 if (checkDoubleAlign)
2324 // TODO-CQ: Fine-tune this:
2325 // In the legacy reg predictor, this runs after allocation, and then demotes any lclVars
2326 // allocated to the frame pointer, which is probably the wrong order.
2327 // However, because it runs after allocation, it can determine the impact of demoting
2328 // the lclVars allocated to the frame pointer.
2329 // => Here, estimate of the EBP refCnt and weighted refCnt is a wild guess.
2331 unsigned refCntEBP = refCntReg / 8;
2332 unsigned refCntWtdEBP = refCntWtdReg / 8;
2335 compiler->shouldDoubleAlign(refCntStk, refCntEBP, refCntWtdEBP, refCntStkParam, refCntWtdStkDbl);
2337 #endif // DOUBLE_ALIGN
2339 // The factors we consider to determine which set of fp vars to use as candidates for callee save
2340 // registers currently include the number of fp vars, whether there are loops, and whether there are
2341 // multiple exits. These have been selected somewhat empirically, but there is probably room for
2343 CLANG_FORMAT_COMMENT_ANCHOR;
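// Concretely, the check below (the 'if' on floatVarCount) switches to the more aggressive
// threshold only when floatVarCount is greater than 6, the method has loops (fgHasLoops), and
// there is at most one return block.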
2348 printf("\nFP callee save candidate vars: ");
2349 if (enregisterLocalVars && !VarSetOps::IsEmpty(compiler, fpCalleeSaveCandidateVars))
2351 dumpConvertedVarSet(compiler, fpCalleeSaveCandidateVars);
2361 JITDUMP("floatVarCount = %d; hasLoops = %d, singleExit = %d\n", floatVarCount, compiler->fgHasLoops,
2362 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr));
2364 // Determine whether to use the 2nd, more aggressive, threshold for fp callee saves.
2365 if (floatVarCount > 6 && compiler->fgHasLoops &&
2366 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr))
2368 assert(enregisterLocalVars);
2372 printf("Adding additional fp callee save candidates: \n");
2373 if (!VarSetOps::IsEmpty(compiler, fpMaybeCandidateVars))
2375 dumpConvertedVarSet(compiler, fpMaybeCandidateVars);
2384 VarSetOps::UnionD(compiler, fpCalleeSaveCandidateVars, fpMaybeCandidateVars);
2391 // Frame layout is only pre-computed for ARM
2392 printf("\nlvaTable after IdentifyCandidates\n");
2393 compiler->lvaTableDump();
2396 #endif // _TARGET_ARM_
2399 // TODO-Throughput: This mapping can surely be more efficiently done
2400 void LinearScan::initVarRegMaps()
2402 if (!enregisterLocalVars)
2404 inVarToRegMaps = nullptr;
2405 outVarToRegMaps = nullptr;
2408 assert(compiler->lvaTrackedFixed); // We should have already set this to prevent us from adding any new tracked
2411 // The compiler memory allocator requires that the allocation be an
2412 // even multiple of int-sized objects
2413 unsigned int varCount = compiler->lvaTrackedCount;
2414 regMapCount = (unsigned int)roundUp(varCount, sizeof(int));
2416 // Not sure why blocks aren't numbered from zero, but they don't appear to be.
2417 // So, if we want to index by bbNum we have to know the maximum value.
2418 unsigned int bbCount = compiler->fgBBNumMax + 1;
2420 inVarToRegMaps = new (compiler, CMK_LSRA) regNumberSmall*[bbCount];
2421 outVarToRegMaps = new (compiler, CMK_LSRA) regNumberSmall*[bbCount];
2425 // This VarToRegMap is used during the resolution of critical edges.
2426 sharedCriticalVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2428 for (unsigned int i = 0; i < bbCount; i++)
2430 VarToRegMap inVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2431 VarToRegMap outVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2433 for (unsigned int j = 0; j < regMapCount; j++)
2435 inVarToRegMap[j] = REG_STK;
2436 outVarToRegMap[j] = REG_STK;
2438 inVarToRegMaps[i] = inVarToRegMap;
2439 outVarToRegMaps[i] = outVarToRegMap;
2444 sharedCriticalVarToRegMap = nullptr;
2445 for (unsigned int i = 0; i < bbCount; i++)
2447 inVarToRegMaps[i] = nullptr;
2448 outVarToRegMaps[i] = nullptr;
2453 void LinearScan::setInVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2455 assert(enregisterLocalVars);
2456 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2457 inVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = (regNumberSmall)reg;
2460 void LinearScan::setOutVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2462 assert(enregisterLocalVars);
2463 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2464 outVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = (regNumberSmall)reg;
2467 LinearScan::SplitEdgeInfo LinearScan::getSplitEdgeInfo(unsigned int bbNum)
2469 assert(enregisterLocalVars);
2470 SplitEdgeInfo splitEdgeInfo;
2471 assert(bbNum <= compiler->fgBBNumMax);
2472 assert(bbNum > bbNumMaxBeforeResolution);
2473 assert(splitBBNumToTargetBBNumMap != nullptr);
2474 splitBBNumToTargetBBNumMap->Lookup(bbNum, &splitEdgeInfo);
2475 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
2476 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
2477 return splitEdgeInfo;
2480 VarToRegMap LinearScan::getInVarToRegMap(unsigned int bbNum)
2482 assert(enregisterLocalVars);
2483 assert(bbNum <= compiler->fgBBNumMax);
2484 // For the blocks inserted to split critical edges, the inVarToRegMap is
2485 // equal to the outVarToRegMap at the "from" block.
2486 if (bbNum > bbNumMaxBeforeResolution)
2488 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2489 unsigned fromBBNum = splitEdgeInfo.fromBBNum;
2492 assert(splitEdgeInfo.toBBNum != 0);
2493 return inVarToRegMaps[splitEdgeInfo.toBBNum];
2497 return outVarToRegMaps[fromBBNum];
2501 return inVarToRegMaps[bbNum];
2504 VarToRegMap LinearScan::getOutVarToRegMap(unsigned int bbNum)
2506 assert(enregisterLocalVars);
2507 assert(bbNum <= compiler->fgBBNumMax);
2508 // For the blocks inserted to split critical edges, the outVarToRegMap is
2509 // equal to the inVarToRegMap at the target.
2510 if (bbNum > bbNumMaxBeforeResolution)
2512 // If this is an empty block, its in and out maps are both the same.
2513 // We identify this case by setting fromBBNum or toBBNum to 0, and using only the other.
2514 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2515 unsigned toBBNum = splitEdgeInfo.toBBNum;
2518 assert(splitEdgeInfo.fromBBNum != 0);
2519 return outVarToRegMaps[splitEdgeInfo.fromBBNum];
2523 return inVarToRegMaps[toBBNum];
2526 return outVarToRegMaps[bbNum];
2529 //------------------------------------------------------------------------
2530 // setVarReg: Set the register associated with a variable in the given 'bbVarToRegMap'.
2533 // bbVarToRegMap - the map of interest
2534 // trackedVarIndex - the lvVarIndex for the variable
2535 // reg - the register to which it is being mapped
2540 void LinearScan::setVarReg(VarToRegMap bbVarToRegMap, unsigned int trackedVarIndex, regNumber reg)
2542 assert(trackedVarIndex < compiler->lvaTrackedCount);
2543 regNumberSmall regSmall = (regNumberSmall)reg;
2544 assert((regNumber)regSmall == reg);
2545 bbVarToRegMap[trackedVarIndex] = regSmall;
2548 //------------------------------------------------------------------------
2549 // getVarReg: Get the register associated with a variable in the given 'bbVarToRegMap'.
2552 // bbVarToRegMap - the map of interest
2553 // trackedVarIndex - the lvVarIndex for the variable
2556 // The register to which 'trackedVarIndex' is mapped
2558 regNumber LinearScan::getVarReg(VarToRegMap bbVarToRegMap, unsigned int trackedVarIndex)
2560 assert(enregisterLocalVars);
2561 assert(trackedVarIndex < compiler->lvaTrackedCount);
2562 return (regNumber)bbVarToRegMap[trackedVarIndex];
2565 // Initialize the incoming VarToRegMap to the given map values (generally a predecessor of
2567 VarToRegMap LinearScan::setInVarToRegMap(unsigned int bbNum, VarToRegMap srcVarToRegMap)
2569 assert(enregisterLocalVars);
2570 VarToRegMap inVarToRegMap = inVarToRegMaps[bbNum];
2571 memcpy(inVarToRegMap, srcVarToRegMap, (regMapCount * sizeof(regNumber)));
2572 return inVarToRegMap;
2575 // given a tree node
2576 RefType refTypeForLocalRefNode(GenTree* node)
2578 assert(node->IsLocal());
2580 // We don't support updates
2581 assert((node->gtFlags & GTF_VAR_USEASG) == 0);
2583 if (node->gtFlags & GTF_VAR_DEF)
2593 //------------------------------------------------------------------------
2594 // checkLastUses: Check correctness of last use flags
2597 // The block for which we are checking last uses.
2600 // This does a backward walk of the RefPositions, starting from the liveOut set.
2601 // This method was previously used to set the last uses, which were computed by
2602 // liveness, but were not created in some cases of multiple lclVar references in the
2603 // same tree. However, now that last uses are computed as RefPositions are created,
2604 // that is no longer necessary, and this method is simply retained as a check.
2605 // The exception to the check-only behavior is when LSRA_EXTEND_LIFETIMES is set via
2606 // COMPlus_JitStressRegs. In that case, this method is required, because even though
2607 // the RefPositions will not be marked lastUse in that case, we still need to correctly
2608 // mark the last uses on the tree nodes, which is done by this method.
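// For example (illustrative): walking backwards from liveOut, a reference to V05 that is not yet in
// 'computedLive' should be marked lastUse; if it isn't, a "missing expected last use" is dumped.
// Conversely, a reference marked lastUse for a variable that is already in 'computedLive' (i.e. has
// an exposed use later in the block) is dumped as an "unexpected last use".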
2611 void LinearScan::checkLastUses(BasicBlock* block)
2615 JITDUMP("\n\nCHECKING LAST USES for block %u, liveout=", block->bbNum);
2616 dumpConvertedVarSet(compiler, block->bbLiveOut);
2617 JITDUMP("\n==============================\n");
2620 unsigned keepAliveVarNum = BAD_VAR_NUM;
2621 if (compiler->lvaKeepAliveAndReportThis())
2623 keepAliveVarNum = compiler->info.compThisArg;
2624 assert(compiler->info.compIsStatic == false);
2627 // find which uses are lastUses
2629 // Work backwards starting with live out.
2630 // 'computedLive' is updated to include any exposed use (including those in this
2631 // block that we've already seen). When we encounter a use, if it's
2632 // not in that set, then it's a last use.
2634 VARSET_TP computedLive(VarSetOps::MakeCopy(compiler, block->bbLiveOut));
2636 bool foundDiff = false;
2637 auto currentRefPosition = refPositions.rbegin();
2638 while (currentRefPosition->refType != RefTypeBB)
2640 // We should never see ParamDefs or ZeroInits within a basic block.
2641 assert(currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit);
2642 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isLocalVar)
2644 unsigned varNum = currentRefPosition->getInterval()->varNum;
2645 unsigned varIndex = currentRefPosition->getInterval()->getVarIndex(compiler);
2647 LsraLocation loc = currentRefPosition->nodeLocation;
2649 // We should always have a tree node for a localVar, except for the "special" RefPositions.
2650 GenTreePtr tree = currentRefPosition->treeNode;
2651 assert(tree != nullptr || currentRefPosition->refType == RefTypeExpUse ||
2652 currentRefPosition->refType == RefTypeDummyDef);
2654 if (!VarSetOps::IsMember(compiler, computedLive, varIndex) && varNum != keepAliveVarNum)
2656 // There was no exposed use, so this is a "last use" (and we mark it thus even if it's a def)
2658 if (extendLifetimes())
2660 // NOTE: this is a bit of a hack. When extending lifetimes, the "last use" bit will be clear.
2661 // This bit, however, would normally be used during resolveLocalRef to set the value of
2662 // GTF_VAR_DEATH on the node for a ref position. If this bit is not set correctly even when
2663 // extending lifetimes, the code generator will assert as it expects to have accurate last
2664 // use information. To avoid these asserts, set the GTF_VAR_DEATH bit here.
2665 // Note also that extendLifetimes() is an LSRA stress mode, so it will only be true for
2666 // Checked or Debug builds, for which this method will be executed.
2667 if (tree != nullptr)
2669 tree->gtFlags |= GTF_VAR_DEATH;
2672 else if (!currentRefPosition->lastUse)
2674 JITDUMP("missing expected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2677 VarSetOps::AddElemD(compiler, computedLive, varIndex);
2679 else if (currentRefPosition->lastUse)
2681 JITDUMP("unexpected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2684 else if (extendLifetimes() && tree != nullptr)
2686 // NOTE: see the comment above re: the extendLifetimes hack.
2687 tree->gtFlags &= ~GTF_VAR_DEATH;
2690 if (currentRefPosition->refType == RefTypeDef || currentRefPosition->refType == RefTypeDummyDef)
2692 VarSetOps::RemoveElemD(compiler, computedLive, varIndex);
2696 assert(currentRefPosition != refPositions.rend());
2697 ++currentRefPosition;
2700 VARSET_TP liveInNotComputedLive(VarSetOps::Diff(compiler, block->bbLiveIn, computedLive));
2702 VarSetOps::Iter liveInNotComputedLiveIter(compiler, liveInNotComputedLive);
2703 unsigned liveInNotComputedLiveIndex = 0;
2704 while (liveInNotComputedLiveIter.NextElem(&liveInNotComputedLiveIndex))
2706 unsigned varNum = compiler->lvaTrackedToVarNum[liveInNotComputedLiveIndex];
2707 if (compiler->lvaTable[varNum].lvLRACandidate)
2709 JITDUMP("BB%02u: V%02u is in LiveIn set, but not computed live.\n", block->bbNum, varNum);
2714 VarSetOps::DiffD(compiler, computedLive, block->bbLiveIn);
2715 const VARSET_TP& computedLiveNotLiveIn(computedLive); // reuse the buffer.
2716 VarSetOps::Iter computedLiveNotLiveInIter(compiler, computedLiveNotLiveIn);
2717 unsigned computedLiveNotLiveInIndex = 0;
2718 while (computedLiveNotLiveInIter.NextElem(&computedLiveNotLiveInIndex))
2720 unsigned varNum = compiler->lvaTrackedToVarNum[computedLiveNotLiveInIndex];
2721 if (compiler->lvaTable[varNum].lvLRACandidate)
2723 JITDUMP("BB%02u: V%02u is computed live, but not in LiveIn set.\n", block->bbNum, varNum);
2732 void LinearScan::addRefsForPhysRegMask(regMaskTP mask, LsraLocation currentLoc, RefType refType, bool isLastUse)
2734 for (regNumber reg = REG_FIRST; mask; reg = REG_NEXT(reg), mask >>= 1)
2738 // This assumes that these are all "special" RefTypes that
2739 // don't need to be recorded on the tree (hence treeNode is nullptr)
2740 RefPosition* pos = newRefPosition(reg, currentLoc, refType, nullptr,
2741 genRegMask(reg)); // This MUST occupy the physical register (obviously)
2745 pos->lastUse = true;
2751 //------------------------------------------------------------------------
2752 // getKillSetForNode: Return the registers killed by the given tree node.
2755 // compiler - the compiler context to use
2756 // tree - the tree for which the kill set is needed.
2758 // Return Value: a register mask of the registers killed
2760 regMaskTP LinearScan::getKillSetForNode(GenTree* tree)
2762 regMaskTP killMask = RBM_NONE;
2763 switch (tree->OperGet())
2765 #ifdef _TARGET_XARCH_
2767 // We use the 128-bit multiply when performing an overflow checking unsigned multiply
2769 if (((tree->gtFlags & GTF_UNSIGNED) != 0) && tree->gtOverflowEx())
2771 // Both RAX and RDX are killed by the operation
2772 killMask = RBM_RAX | RBM_RDX;
2777 #if defined(_TARGET_X86_) && !defined(LEGACY_BACKEND)
2780 killMask = RBM_RAX | RBM_RDX;
2787 if (!varTypeIsFloating(tree->TypeGet()))
2789 // RDX needs to be killed early, because it must not be used as a source register
2790 // (unlike most cases, where the kill happens AFTER the uses). So for this kill,
2791 // we add the RefPosition at the tree loc (where the uses are located) instead of the
2792 // usual kill location which is the same as the defs at tree loc+1.
2793 // Note that we don't have to add interference for the live vars, because that
2794 // will be done below, and is not sensitive to the precise location.
2795 LsraLocation currentLoc = tree->gtLsraInfo.loc;
2796 assert(currentLoc != 0);
2797 addRefsForPhysRegMask(RBM_RDX, currentLoc, RefTypeKill, true);
2798 // Both RAX and RDX are killed by the operation
2799 killMask = RBM_RAX | RBM_RDX;
2802 #endif // _TARGET_XARCH_
2805 if (tree->OperIsCopyBlkOp())
2807 assert(tree->AsObj()->gtGcPtrCount != 0);
2808 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_ASSIGN_BYREF);
2814 case GT_STORE_DYN_BLK:
2816 GenTreeBlk* blkNode = tree->AsBlk();
2817 bool isCopyBlk = varTypeIsStruct(blkNode->Data());
2818 switch (blkNode->gtBlkOpKind)
2820 case GenTreeBlk::BlkOpKindHelper:
2823 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMCPY);
2827 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMSET);
2831 #ifdef _TARGET_XARCH_
2832 case GenTreeBlk::BlkOpKindRepInstr:
2835 // rep movs kills RCX, RDI and RSI
2836 killMask = RBM_RCX | RBM_RDI | RBM_RSI;
2840 // rep stos kills RCX and RDI.
2841 // (Note that the Data() node, if not constant, will be assigned to
2842 // RCX, but it's fine that this kills it, as the value is not available
2843 // after this node in any case.)
2844 killMask = RBM_RDI | RBM_RCX;
2848 case GenTreeBlk::BlkOpKindRepInstr:
2850 case GenTreeBlk::BlkOpKindUnroll:
2851 case GenTreeBlk::BlkOpKindInvalid:
2852 // for these 'gtBlkOpKind' kinds, we leave 'killMask' = RBM_NONE
2859 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_STOP_FOR_GC);
2863 if (compiler->compFloatingPointUsed)
2865 if (tree->TypeGet() == TYP_DOUBLE)
2867 needDoubleTmpForFPCall = true;
2869 else if (tree->TypeGet() == TYP_FLOAT)
2871 needFloatTmpForFPCall = true;
2874 #endif // _TARGET_X86_
2875 #if defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2876 if (tree->IsHelperCall())
2878 GenTreeCall* call = tree->AsCall();
2879 CorInfoHelpFunc helpFunc = compiler->eeGetHelperNum(call->gtCallMethHnd);
2880 killMask = compiler->compHelperCallKillSet(helpFunc);
2883 #endif // defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2885 // if there is no FP used, we can ignore the FP kills
2886 if (compiler->compFloatingPointUsed)
2888 killMask = RBM_CALLEE_TRASH;
2892 killMask = RBM_INT_CALLEE_TRASH;
2895 if (tree->AsCall()->IsVirtualStub())
2897 killMask |= compiler->virtualStubParamInfo->GetRegMask();
2899 #else // !_TARGET_ARM_
2900 // Verify that the special virtual stub call registers are in the kill mask.
2901 // We don't just add them unconditionally to the killMask because for most architectures
2902 // they are already in the RBM_CALLEE_TRASH set,
2903 // and we don't want to introduce extra checks and calls in this hot function.
2904 assert(!tree->AsCall()->IsVirtualStub() || ((killMask & compiler->virtualStubParamInfo->GetRegMask()) ==
2905 compiler->virtualStubParamInfo->GetRegMask()));
2910 if (compiler->codeGen->gcInfo.gcIsWriteBarrierAsgNode(tree))
2912 killMask = RBM_CALLEE_TRASH_NOGC;
2916 #if defined(PROFILING_SUPPORTED)
2917 // If this method requires profiler ELT hook then mark these nodes as killing
2918 // callee trash registers (excluding RAX and XMM0). The reason for this is that
2919 // profiler callback would trash these registers. See vm\amd64\asmhelpers.asm for
2922 if (compiler->compIsProfilerHookNeeded())
2924 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_LEAVE);
2929 if (compiler->compIsProfilerHookNeeded())
2931 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_TAILCALL);
2934 #endif // PROFILING_SUPPORTED
2937 // for all other 'tree->OperGet()' kinds, leave 'killMask' = RBM_NONE
2943 //------------------------------------------------------------------------
2944 // buildKillPositionsForNode:
2945 // Given some tree node add refpositions for all the registers this node kills
2948 // tree - the tree for which kill positions should be generated
2949 // currentLoc - the location at which the kills should be added
2952 // true - kills were inserted
2953 // false - no kills were inserted
2956 // The return value is needed because if we have any kills, we need to make sure that
2957 // all defs are located AFTER the kills. On the other hand, if there aren't kills,
2958 // the multiple defs for a regPair are in different locations.
2959 // If we generate any kills, we will mark all currentLiveVars as being preferenced
2960 // to avoid the killed registers. This is somewhat conservative.
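// For example (illustrative): for a call whose kill set is RBM_CALLEE_TRASH, each lclVar in
// currentLiveVars (other than fp/vector vars that are not callee-save candidates) is marked as
// preferring a callee-save register for call-like kills, and its register preferences are narrowed
// to exclude the killed registers, so values live across the call tend to land where the call does
// not trash them.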
2962 bool LinearScan::buildKillPositionsForNode(GenTree* tree, LsraLocation currentLoc)
2964 regMaskTP killMask = getKillSetForNode(tree);
2965 bool isCallKill = ((killMask == RBM_INT_CALLEE_TRASH) || (killMask == RBM_CALLEE_TRASH));
2966 if (killMask != RBM_NONE)
2968 // The killMask identifies a set of registers that will be used during codegen.
2969 // Mark these as modified here, so when we do final frame layout, we'll know about
2970 // all these registers. This is especially important if killMask contains
2971 // callee-saved registers, which affect the frame size since we need to save/restore them.
2972 // In the case where we have a copyBlk with GC pointers, we may need to call the
2973 // CORINFO_HELP_ASSIGN_BYREF helper, which kills callee-saved RSI and RDI. If
2974 // LSRA doesn't assign RSI/RDI, they wouldn't get marked as modified until codegen,
2975 // which is too late.
2976 compiler->codeGen->regSet.rsSetRegsModified(killMask DEBUGARG(dumpTerse));
2978 addRefsForPhysRegMask(killMask, currentLoc, RefTypeKill, true);
2980 // TODO-CQ: It appears to be valuable for both fp and int registers to avoid killing the callee
2981 // save regs on infrequently executed paths. However, it results in a large number of asmDiffs,
2982 // many of which appear to be regressions (because there is more spill on the infrequently executed path),
2983 // but are not really regressions, because the frequent path becomes smaller. Validating these diffs will need
2984 // to be done before making this change.
2985 // if (!blockSequence[curBBSeqNum]->isRunRarely())
2986 if (enregisterLocalVars)
2988 VarSetOps::Iter iter(compiler, currentLiveVars);
2989 unsigned varIndex = 0;
2990 while (iter.NextElem(&varIndex))
2992 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
2993 LclVarDsc* varDsc = compiler->lvaTable + varNum;
2994 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2995 if (varDsc->lvType == LargeVectorType)
2997 if (!VarSetOps::IsMember(compiler, largeVectorCalleeSaveCandidateVars, varIndex))
3003 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3004 if (varTypeIsFloating(varDsc) &&
3005 !VarSetOps::IsMember(compiler, fpCalleeSaveCandidateVars, varIndex))
3009 Interval* interval = getIntervalForLocalVar(varIndex);
3012 interval->preferCalleeSave = true;
3014 regMaskTP newPreferences = allRegs(interval->registerType) & (~killMask);
3016 if (newPreferences != RBM_NONE)
3018 interval->updateRegisterPreferences(newPreferences);
3022 // If there are no callee-saved registers, the call could kill all the registers.
3023 // This is a valid state, so in that case the assert should not trigger. The RA will spill in order to
3024 // free a register later.
3025 assert(compiler->opts.compDbgEnC || (calleeSaveRegs(varDsc->lvType)) == RBM_NONE);
3030 if (tree->IsCall() && (tree->gtFlags & GTF_CALL_UNMANAGED) != 0)
3032 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeKillGCRefs, tree,
3033 (allRegs(TYP_REF) & ~RBM_ARG_REGS));
3041 //----------------------------------------------------------------------------
3042 // defineNewInternalTemp: Defines a ref position for an internal temp.
3045 // tree - Gentree node requiring an internal register
3046 // regType - Register type
3047 // currentLoc - Location of the temp Def position
3048 // regMask - register mask of candidates for temp
3049 // minRegCandidateCount - Minimum registers to be ensured in candidate
3050 // set under LSRA stress mode. This is a
3052 RefPosition* LinearScan::defineNewInternalTemp(GenTree* tree,
3053 RegisterType regType,
3054 LsraLocation currentLoc,
3055 regMaskTP regMask DEBUGARG(unsigned minRegCandidateCount))
3057 Interval* current = newInterval(regType);
3058 current->isInternal = true;
3059 return newRefPosition(current, currentLoc, RefTypeDef, tree, regMask, 0 DEBUG_ARG(minRegCandidateCount));
3062 //------------------------------------------------------------------------
3063 // buildInternalRegisterDefsForNode - build Def positions for internal
3064 // registers required for tree node.
3067 // tree - Gentree node that needs internal registers
3068 // currentLoc - Location at which Def positions need to be defined
3069 // temps - in-out array which is populated with ref positions
3070 // created for Def of internal registers
3071 // minRegCandidateCount - Minimum registers to be ensured in candidate
3072 // set of ref positions under LSRA stress. This is
3073 // a DEBUG only arg.
3076 // The total number of Def positions created for internal registers of tree node.
3077 int LinearScan::buildInternalRegisterDefsForNode(GenTree* tree,
3078 LsraLocation currentLoc,
3079 RefPosition* temps[] // populates
3080 DEBUGARG(unsigned minRegCandidateCount))
3083 int internalIntCount = tree->gtLsraInfo.internalIntCount;
3084 regMaskTP internalCands = tree->gtLsraInfo.getInternalCandidates(this);
3086 // If the number of internal integer registers required is the same as the number of candidate integer registers in
3087 // the candidate set, then they must be handled as fixed registers.
3088 // (E.g. for the integer registers that floating point arguments must be copied into for a varargs call.)
3089 bool fixedRegs = false;
3090 regMaskTP internalIntCandidates = (internalCands & allRegs(TYP_INT));
3091 if (((int)genCountBits(internalIntCandidates)) == internalIntCount)
3096 for (count = 0; count < internalIntCount; count++)
3098 regMaskTP internalIntCands = (internalCands & allRegs(TYP_INT));
3101 internalIntCands = genFindLowestBit(internalIntCands);
3102 internalCands &= ~internalIntCands;
3105 defineNewInternalTemp(tree, IntRegisterType, currentLoc, internalIntCands DEBUG_ARG(minRegCandidateCount));
3108 int internalFloatCount = tree->gtLsraInfo.internalFloatCount;
3109 for (int i = 0; i < internalFloatCount; i++)
3111 regMaskTP internalFPCands = (internalCands & internalFloatRegCandidates());
3113 defineNewInternalTemp(tree, FloatRegisterType, currentLoc, internalFPCands DEBUG_ARG(minRegCandidateCount));
3116 assert(count < MaxInternalRegisters);
3117 assert(count == (internalIntCount + internalFloatCount));
3121 //------------------------------------------------------------------------
3122 // buildInternalRegisterUsesForNode - adds Use positions for internal
3123 // registers required for tree node.
3126 // tree - Gentree node that needs internal registers
3127 // currentLoc - Location at which Use positions need to be defined
3128 // defs - int array containing Def positions of internal
3130 // total - Total number of Def positions in 'defs' array.
3131 // minRegCandidateCount - Minimum registers to be ensured in candidate
3132 // set of ref positions under LSRA stress. This is
3133 // a DEBUG only arg.
3137 void LinearScan::buildInternalRegisterUsesForNode(GenTree* tree,
3138 LsraLocation currentLoc,
3139 RefPosition* defs[],
3140 int total DEBUGARG(unsigned minRegCandidateCount))
3142 assert(total < MaxInternalRegisters);
3144 // defs[] has been populated by buildInternalRegisterDefsForNode
3145 // now just add uses to the defs previously added.
3146 for (int i = 0; i < total; i++)
3148 RefPosition* prevRefPosition = defs[i];
3149 assert(prevRefPosition != nullptr);
3150 regMaskTP mask = prevRefPosition->registerAssignment;
3151 if (prevRefPosition->isPhysRegRef)
3153 newRefPosition(defs[i]->getReg()->regNum, currentLoc, RefTypeUse, tree, mask);
3157 RefPosition* newest = newRefPosition(defs[i]->getInterval(), currentLoc, RefTypeUse, tree, mask,
3158 0 DEBUG_ARG(minRegCandidateCount));
3160 if (tree->gtLsraInfo.isInternalRegDelayFree)
3162 newest->delayRegFree = true;
3168 regMaskTP LinearScan::getUseCandidates(GenTree* useNode)
3170 TreeNodeInfo info = useNode->gtLsraInfo;
3171 return info.getSrcCandidates(this);
3174 regMaskTP LinearScan::getDefCandidates(GenTree* tree)
3176 TreeNodeInfo info = tree->gtLsraInfo;
3177 return info.getDstCandidates(this);
3180 RegisterType LinearScan::getDefType(GenTree* tree)
3182 return tree->TypeGet();
3185 //------------------------------------------------------------------------
3186 // LocationInfoListNode: used to store a single `LocationInfo` value for a
3187 // node during `buildIntervals`.
3189 // This is the node type for `LocationInfoList` below.
3191 class LocationInfoListNode final : public LocationInfo
3193 friend class LocationInfoList;
3194 friend class LocationInfoListNodePool;
3196 LocationInfoListNode* m_next; // The next node in the list
3199 LocationInfoListNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0) : LocationInfo(l, i, t, regIdx)
3203 //------------------------------------------------------------------------
3204 // LocationInfoListNode::Next: Returns the next node in the list.
3205 LocationInfoListNode* Next() const
3211 //------------------------------------------------------------------------
3212 // LocationInfoList: used to store a list of `LocationInfo` values for a
3213 // node during `buildIntervals`.
3215 // Given an IR node that either directly defines N registers or that is a
3216 // contained node with uses that define a total of N registers, that node
3217 // will map to N `LocationInfo` values. These values are stored as a
3218 // linked list of `LocationInfoListNode` values.
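// For example (illustrative): a contained address-mode GT_LEA whose base and index are two
// register-producing lclVars maps to a two-entry list, one LocationInfo per source register; the
// node that consumes the address consumes both entries when it is processed.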
3220 class LocationInfoList final
3222 friend class LocationInfoListNodePool;
3224 LocationInfoListNode* m_head; // The head of the list
3225 LocationInfoListNode* m_tail; // The tail of the list
3228 LocationInfoList() : m_head(nullptr), m_tail(nullptr)
3232 LocationInfoList(LocationInfoListNode* node) : m_head(node), m_tail(node)
3234 assert(m_head->m_next == nullptr);
3237 //------------------------------------------------------------------------
3238 // LocationInfoList::IsEmpty: Returns true if the list is empty.
3240 bool IsEmpty() const
3242 return m_head == nullptr;
3245 //------------------------------------------------------------------------
3246 // LocationInfoList::Begin: Returns the first node in the list.
3248 LocationInfoListNode* Begin() const
3253 //------------------------------------------------------------------------
3254 // LocationInfoList::End: Returns the position after the last node in the
3255 // list. The returned value is suitable for use as
3256 // a sentinel for iteration.
3258 LocationInfoListNode* End() const
3263 //------------------------------------------------------------------------
3264 // LocationInfoList::Append: Appends a node to the list.
3267 // node - The node to append. Must not be part of an existing list.
3269 void Append(LocationInfoListNode* node)
3271 assert(node->m_next == nullptr);
3273 if (m_tail == nullptr)
3275 assert(m_head == nullptr);
3280 m_tail->m_next = node;
3286 //------------------------------------------------------------------------
3287 // LocationInfoList::Append: Appends another list to this list.
3290 // other - The list to append.
3292 void Append(LocationInfoList other)
3294 if (m_tail == nullptr)
3296 assert(m_head == nullptr);
3297 m_head = other.m_head;
3301 m_tail->m_next = other.m_head;
3304 m_tail = other.m_tail;
3308 //------------------------------------------------------------------------
3309 // LocationInfoListNodePool: manages a pool of `LocationInfoListNode`
3310 // values to decrease overall memory usage
3311 // during `buildIntervals`.
3313 // `buildIntervals` involves creating a list of location info values per
3314 // node that either directly produces a set of registers or that is a
3315 // contained node with register-producing sources. However, these lists
3316 // are short-lived: they are destroyed once the use of the corresponding
3317 // node is processed. As such, there is typically only a small number of
3318 // `LocationInfoListNode` values in use at any given time. Pooling these
3319 // values avoids otherwise frequent allocations.
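// A minimal usage sketch (illustrative only, not taken from a call site):
//
//     LocationInfoListNodePool pool(compiler, 8);               // preallocate a few nodes
//     LocationInfoList list(pool.GetNode(loc, interval, tree)); // single-entry list for a def
//     // ... the list is consumed when the defining node's use is processed ...
//     pool.ReturnNodes(list);                                   // recycle the nodes for reuse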
3320 class LocationInfoListNodePool final
3322 LocationInfoListNode* m_freeList;
3323 Compiler* m_compiler;
3326 //------------------------------------------------------------------------
3327 // LocationInfoListNodePool::LocationInfoListNodePool:
3328 // Creates a pool of `LocationInfoListNode` values.
3331 // compiler - The compiler context.
3332 // preallocate - The number of nodes to preallocate.
3334 LocationInfoListNodePool(Compiler* compiler, unsigned preallocate = 0) : m_compiler(compiler)
3336 if (preallocate > 0)
3338 size_t preallocateSize = sizeof(LocationInfoListNode) * preallocate;
3339 auto* preallocatedNodes = reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize));
3341 LocationInfoListNode* head = preallocatedNodes;
3342 head->m_next = nullptr;
3344 for (unsigned i = 1; i < preallocate; i++)
3346 LocationInfoListNode* node = &preallocatedNodes[i];
3347 node->m_next = head;
3355 //------------------------------------------------------------------------
3356 // LocationInfoListNodePool::GetNode: Fetches an unused node from the
3360 // l - - The `LsraLocation` for the `LocationInfo` value.
3361 // i - The interval for the `LocationInfo` value.
3362 // t - The IR node for the `LocationInfo` value
3363 // regIdx - The register index for the `LocationInfo` value.
3366 // A pooled or newly-allocated `LocationInfoListNode`, depending on the
3367 // contents of the pool.
3368 LocationInfoListNode* GetNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0)
3370 LocationInfoListNode* head = m_freeList;
3371 if (head == nullptr)
3373 head = reinterpret_cast<LocationInfoListNode*>(m_compiler->compGetMem(sizeof(LocationInfoListNode)));
3377 m_freeList = head->m_next;
3383 head->multiRegIdx = regIdx;
3384 head->m_next = nullptr;
3389 //------------------------------------------------------------------------
3390 // LocationInfoListNodePool::ReturnNodes: Returns a list of nodes to the
3394 // list - The list to return.
3396 void ReturnNodes(LocationInfoList& list)
3398 assert(list.m_head != nullptr);
3399 assert(list.m_tail != nullptr);
3401 LocationInfoListNode* head = m_freeList;
3402 list.m_tail->m_next = head;
3403 m_freeList = list.m_head;
3407 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3409 LinearScan::buildUpperVectorSaveRefPositions(GenTree* tree, LsraLocation currentLoc)
3411 assert(enregisterLocalVars);
3412 VARSET_TP liveLargeVectors(VarSetOps::MakeEmpty(compiler));
3413 regMaskTP fpCalleeKillSet = RBM_NONE;
3414 if (!VarSetOps::IsEmpty(compiler, largeVectorVars))
3416 // We actually need to find any calls that kill the upper-half of the callee-save vector registers.
3417 // But we will use as a proxy any node that kills floating point registers.
3418 // (Note that some calls are masquerading as other nodes at this point so we can't just check for calls.)
3419 fpCalleeKillSet = getKillSetForNode(tree);
3420 if ((fpCalleeKillSet & RBM_FLT_CALLEE_TRASH) != RBM_NONE)
3422 VarSetOps::AssignNoCopy(compiler, liveLargeVectors,
3423 VarSetOps::Intersection(compiler, currentLiveVars, largeVectorVars));
3424 VarSetOps::Iter iter(compiler, liveLargeVectors);
3425 unsigned varIndex = 0;
3426 while (iter.NextElem(&varIndex))
3428 Interval* varInterval = getIntervalForLocalVar(varIndex);
3429 Interval* tempInterval = newInterval(LargeVectorType);
3430 tempInterval->isInternal = true;
3432 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveDef, tree, RBM_FLT_CALLEE_SAVED);
3433 // We are going to save the existing relatedInterval of varInterval on tempInterval, so that we can set
3434 // the tempInterval as the relatedInterval of varInterval, so that we can build the corresponding
3435 // RefTypeUpperVectorSaveUse RefPosition. We will then restore the relatedInterval onto varInterval,
3436 // and set varInterval as the relatedInterval of tempInterval.
3437 tempInterval->relatedInterval = varInterval->relatedInterval;
3438 varInterval->relatedInterval = tempInterval;
3442 return liveLargeVectors;
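// Note: the save positions built above and the restore positions built below bracket a node that
// kills the floating point registers: for each live large-vector lclVar, an internal interval
// restricted to RBM_FLT_CALLEE_SAVED gets a RefTypeUpperVectorSaveDef at the save point and a
// matching RefTypeUpperVectorSaveUse at the restore point, so the upper half of the vector can be
// preserved across the call.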
3445 void LinearScan::buildUpperVectorRestoreRefPositions(GenTree* tree,
3446 LsraLocation currentLoc,
3447 VARSET_VALARG_TP liveLargeVectors)
3449 assert(enregisterLocalVars);
3450 if (!VarSetOps::IsEmpty(compiler, liveLargeVectors))
3452 VarSetOps::Iter iter(compiler, liveLargeVectors);
3453 unsigned varIndex = 0;
3454 while (iter.NextElem(&varIndex))
3456 Interval* varInterval = getIntervalForLocalVar(varIndex);
3457 Interval* tempInterval = varInterval->relatedInterval;
3458 assert(tempInterval->isInternal == true);
3460 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveUse, tree, RBM_FLT_CALLEE_SAVED);
3461 // Restore the relatedInterval onto varInterval, and set varInterval as the relatedInterval
3463 varInterval->relatedInterval = tempInterval->relatedInterval;
3464 tempInterval->relatedInterval = varInterval;
3468 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3471 //------------------------------------------------------------------------
3472 // ComputeOperandDstCount: computes the number of registers defined by a
3475 // For most nodes, this is simple:
3476 // - Nodes that do not produce values (e.g. stores and other void-typed
3477 // nodes) and nodes that immediately use the registers they define
3478 // produce no registers
3479 // - Nodes that are marked as defining N registers define N registers.
3481 // For contained nodes, however, things are more complicated: for purposes
3482 // of bookkeeping, a contained node is treated as producing the transitive
3483 // closure of the registers produced by its sources.
3486 // operand - The operand for which to compute a register count.
3489 // The number of registers defined by `operand`.
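// For example (illustrative): an ordinary node with dstCount == 1 reports 1; a store reports 0;
// and a contained address mode whose base and index are two register-producing lclVars reports 2,
// the sum over its operands.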
3491 static int ComputeOperandDstCount(GenTree* operand)
3493 TreeNodeInfo& operandInfo = operand->gtLsraInfo;
3495 if (operandInfo.isLocalDefUse)
3497 // Operands that define an unused value do not produce any registers.
3500 else if (operandInfo.dstCount != 0)
3502 // Operands that have a specified number of destination registers consume all of their operands
3503 // and therefore produce exactly that number of registers.
3504 return operandInfo.dstCount;
3506 else if (operandInfo.srcCount != 0)
3508 // If an operand has no destination registers but does have source registers, it must be a store
3510 assert(operand->OperIsStore() || operand->OperIsBlkOp() || operand->OperIsPutArgStk() ||
3511 operand->OperIsCompare() || operand->OperIs(GT_CMP) || operand->IsSIMDEqualityOrInequality());
3514 else if (!operand->OperIsFieldListHead() && (operand->OperIsStore() || operand->TypeGet() == TYP_VOID))
3516 // Stores and void-typed operands may be encountered when processing call nodes, which contain
3517 // pointers to argument setup stores.
3522 // If a field list or non-void-typed operand is not an unused value and does not have source registers,
3523 // that argument is contained within its parent and produces `sum(operand_dst_count)` registers.
3525 for (GenTree* op : operand->Operands())
3527 dstCount += ComputeOperandDstCount(op);
3534 //------------------------------------------------------------------------
3535 // ComputeAvailableSrcCount: computes the number of registers available as
3536 // sources for a node.
3538 // This is simply the sum of the number of registers produced by each
3539 // operand to the node.
3542 // node - The node for which to compute a source count.
3545 // The number of registers available as sources for `node`.
3547 static int ComputeAvailableSrcCount(GenTree* node)
3550 for (GenTree* operand : node->Operands())
3552 numSources += ComputeOperandDstCount(operand);
3559 static GenTree* GetFirstOperand(GenTree* node)
3561 GenTree* firstOperand = nullptr;
3562 node->VisitOperands([&firstOperand](GenTree* operand) -> GenTree::VisitResult {
3563 firstOperand = operand;
3564 return GenTree::VisitResult::Abort;
3566 return firstOperand;
3569 void LinearScan::buildRefPositionsForNode(GenTree* tree,
3571 LocationInfoListNodePool& listNodePool,
3572 HashTableBase<GenTree*, LocationInfoList>& operandToLocationInfoMap,
3573 LsraLocation currentLoc)
3576 assert(!isRegPairType(tree->TypeGet()));
3577 #endif // _TARGET_ARM_
3579 // The LIR traversal doesn't visit GT_LIST or GT_ARGPLACE nodes.
3580 // GT_CLS_VAR nodes should have been eliminated by rationalizer.
3581 assert(tree->OperGet() != GT_ARGPLACE);
3582 assert(tree->OperGet() != GT_LIST);
3583 assert(tree->OperGet() != GT_CLS_VAR);
3585 // The LIR traversal visits only the first node in a GT_FIELD_LIST.
3586 assert((tree->OperGet() != GT_FIELD_LIST) || tree->AsFieldList()->IsFieldListHead());
3588 // The set of internal temporary registers used by this node are stored in the
3589 // gtRsvdRegs register mask. Clear it out.
3590 tree->gtRsvdRegs = RBM_NONE;
3592 TreeNodeInfo info = tree->gtLsraInfo;
3593 assert(info.IsValid(this));
3594 int consume = info.srcCount;
3595 int produce = info.dstCount;
3600 lsraDispNode(tree, LSRA_DUMP_REFPOS, (produce != 0));
3602 if (tree->isContained())
3604 JITDUMP("Contained\n");
3606 else if (tree->OperIs(GT_LCL_VAR, GT_LCL_FLD) && info.isLocalDefUse)
3608 JITDUMP("Unused\n");
3612 JITDUMP(" consume=%d produce=%d\n", consume, produce);
3617 JITDUMP("at start of tree, map contains: { ");
3619 for (auto kvp : operandToLocationInfoMap)
3621 GenTree* node = kvp.Key();
3622 LocationInfoList defList = kvp.Value();
3624 JITDUMP("%sN%03u. %s -> (", first ? "" : "; ", node->gtSeqNum, GenTree::OpName(node->OperGet()));
3625 for (LocationInfoListNode *def = defList.Begin(), *end = defList.End(); def != end; def = def->Next())
3627 JITDUMP("%s%d.N%03u", def == defList.Begin() ? "" : ", ", def->loc, def->treeNode->gtSeqNum);
3638 assert(((consume == 0) && (produce == 0)) || (ComputeAvailableSrcCount(tree) == consume));
3640 if (tree->OperIs(GT_LCL_VAR, GT_LCL_FLD))
3642 LclVarDsc* const varDsc = &compiler->lvaTable[tree->AsLclVarCommon()->gtLclNum];
3643 if (isCandidateVar(varDsc))
3645 assert(consume == 0);
3647 // We handle tracked variables differently from non-tracked ones. If it is tracked,
3648 // we simply add a use or def of the tracked variable. Otherwise, for a use we need
3649 // to actually add the appropriate references for loading or storing the variable.
3651 // It won't actually get used or defined until the appropriate ancestor tree node
3652 // is processed, unless this is marked "isLocalDefUse" because it is a stack-based argument
3655 assert(varDsc->lvTracked);
3656 unsigned varIndex = varDsc->lvVarIndex;
3658 // We have only approximate last-use information at this point. This is because the
3659 // execution order doesn't actually reflect the true order in which the localVars
3660 // are referenced - but the order of the RefPositions will, so we recompute it after
3661 // RefPositions are built.
3662 // Use the old value for setting currentLiveVars - note that we do this with the
3663 // not-quite-correct setting of lastUse. However, this is OK because
3664 // 1) this is only for preferencing, which doesn't require strict correctness, and
3665 // 2) the cases where these out-of-order uses occur should not overlap a kill.
3666 // TODO-Throughput: clean this up once we have the execution order correct. At that point
3667 // we can update currentLiveVars at the same place that we create the RefPosition.
3668 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
3670 VarSetOps::RemoveElemD(compiler, currentLiveVars, varIndex);
3673 if (!info.isLocalDefUse && !tree->isContained())
3675 assert(produce != 0);
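// Record this lclVar's interval in the map so that the node that consumes this value can
// find it when building its own use RefPositions.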
3677 LocationInfoList list(listNodePool.GetNode(currentLoc, getIntervalForLocalVar(varIndex), tree));
3678 bool added = operandToLocationInfoMap.AddOrUpdate(tree, list);
3681 tree->gtLsraInfo.definesAnyRegisters = true;
3687 if (tree->isContained())
3689 assert(!info.isLocalDefUse);
3690 assert(consume == 0);
3691 assert(produce == 0);
3692 assert(info.internalIntCount == 0);
3693 assert(info.internalFloatCount == 0);
3695 // Contained nodes map to the concatenated lists of their operands.
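// For example, a contained address mode (GT_LEA) contributes the defs of its base and index
// operands, so the indirection that contains it consumes those registers directly.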
3696 LocationInfoList locationInfoList;
3697 tree->VisitOperands([&](GenTree* op) -> GenTree::VisitResult {
3698 if (!op->gtLsraInfo.definesAnyRegisters)
3700 assert(ComputeOperandDstCount(op) == 0);
3701 return GenTree::VisitResult::Continue;
3704 LocationInfoList operandList;
3705 bool removed = operandToLocationInfoMap.TryRemove(op, &operandList);
3708 locationInfoList.Append(operandList);
3709 return GenTree::VisitResult::Continue;
3712 if (!locationInfoList.IsEmpty())
3714 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
3716 tree->gtLsraInfo.definesAnyRegisters = true;
3722 // Handle the case of local variable assignment
3723 Interval* varDefInterval = nullptr;
3724 RefType defRefType = RefTypeDef;
3726 GenTree* defNode = tree;
3728 // noAdd means that the node creates a def, but for purposes of map
3729 // management we do not add it, because the data is not flowing up the
3730 // tree but across (as in ASG nodes)
3732 bool noAdd = info.isLocalDefUse;
3733 RefPosition* prevPos = nullptr;
3735 bool isSpecialPutArg = false;
3737 assert(!tree->OperIsAssignment());
3738 if (tree->OperIsLocalStore())
3740 GenTreeLclVarCommon* const store = tree->AsLclVarCommon();
3741 assert((consume > 1) || (regType(store->gtOp1->TypeGet()) == regType(store->TypeGet())));
3743 LclVarDsc* varDsc = &compiler->lvaTable[store->gtLclNum];
3744 if (isCandidateVar(varDsc))
3746 // We always push the tracked lclVar intervals
3747 assert(varDsc->lvTracked);
3748 unsigned varIndex = varDsc->lvVarIndex;
3749 varDefInterval = getIntervalForLocalVar(varIndex);
3750 defRefType = refTypeForLocalRefNode(tree);
3758 assert(consume <= MAX_RET_REG_COUNT);
3761 // Get the location info for the register defined by the first operand.
3762 LocationInfoList operandDefs;
3763 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(tree), &operandDefs);
3766 // Since we only expect to consume one register, we should only have a single register to consume.
3768 assert(operandDefs.Begin()->Next() == operandDefs.End());
3770 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3772 Interval* srcInterval = operandInfo.interval;
3773 if (srcInterval->relatedInterval == nullptr)
3775 // Preference the source to the dest, unless this is a non-last-use localVar.
3776 // Note that the last-use info is not correct, but it is a better approximation than preferencing
3777 // the source to the dest, if the source's lifetime extends beyond the dest.
3778 if (!srcInterval->isLocalVar || (operandInfo.treeNode->gtFlags & GTF_VAR_DEATH) != 0)
3780 srcInterval->assignRelatedInterval(varDefInterval);
3783 else if (!srcInterval->isLocalVar)
3785 // Preference the source to dest, if src is not a local var.
3786 srcInterval->assignRelatedInterval(varDefInterval);
3790 if ((tree->gtFlags & GTF_VAR_DEATH) == 0)
3792 VarSetOps::AddElemD(compiler, currentLiveVars, varIndex);
3795 else if (store->gtOp1->OperIs(GT_BITCAST))
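// For a store whose source is a contained BITCAST: retype the store (and the bitcast) to the
// bitcast operand's type, and set the source def's candidates to all registers of that type,
// so the value can be stored directly from the register in which it was produced.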
3797 store->gtType = store->gtOp1->gtType = store->gtOp1->AsUnOp()->gtOp1->TypeGet();
3799 // Get the location info for the register defined by the first operand.
3800 LocationInfoList operandDefs;
3801 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(store), &operandDefs);
3804 // Since we only expect to consume one register, we should only have a single register to consume.
3805 assert(operandDefs.Begin()->Next() == operandDefs.End());
3807 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3809 Interval* srcInterval = operandInfo.interval;
3810 srcInterval->registerType = regType(store->TypeGet());
3812 RefPosition* srcDefPosition = srcInterval->firstRefPosition;
3813 assert(srcDefPosition != nullptr);
3814 assert(srcDefPosition->refType == RefTypeDef);
3815 assert(srcDefPosition->treeNode == store->gtOp1);
3817 srcDefPosition->registerAssignment = allRegs(store->TypeGet());
3818 store->gtOp1->gtLsraInfo.setSrcCandidates(this, allRegs(store->TypeGet()));
3821 else if (noAdd && produce == 0)
3823 // This is the case for dead nodes that occur after
3824 // tree rationalization
3825 // TODO-Cleanup: Identify and remove these dead nodes prior to register allocation.
3826 if (tree->IsMultiRegCall())
3828 // In case of multi-reg call node, produce = number of return registers
3829 produce = tree->AsCall()->GetReturnTypeDesc()->GetReturnRegCount();
3837 Interval* prefSrcInterval = nullptr;
3839 // If this is a binary operator that will be encoded with 2 operand fields
3840 // (i.e. the target is read-modify-write), preference the dst to op1.
3842 bool hasDelayFreeSrc = tree->gtLsraInfo.hasDelayFreeSrc;
3844 #if defined(DEBUG) && defined(_TARGET_X86_)
3845 // On x86, `LSRA_LIMIT_CALLER` is too restrictive to allow the use of special put args: this stress mode
3846 // leaves only three registers allocatable--eax, ecx, and edx--of which the latter two are also used for the
3847 // first two integral arguments to a call. This can leave us with too few registers to successfully allocate in
3848 // situations like the following:
3850 // t1026 = lclVar ref V52 tmp35 u:3 REG NA <l:$3a1, c:$98d>
3853 // t1352 = * putarg_reg ref REG NA
3855 // t342 = lclVar int V14 loc6 u:4 REG NA $50c
3857 // t343 = const int 1 REG NA $41
3861 // t344 = * + int REG NA $495
3863 // t345 = lclVar int V04 arg4 u:2 REG NA $100
3867 // t346 = * % int REG NA $496
3870 // t1353 = * putarg_reg int REG NA
3872 // t1354 = lclVar ref V52 tmp35 (last use) REG NA
3875 // t1355 = * lea(b+0) byref REG NA
3877 // Here, the first `putarg_reg` would normally be considered a special put arg, which would remove `ecx` from the
3878 // set of allocatable registers, leaving only `eax` and `edx`. The allocator will then fail to allocate a register
3879 // for the def of `t345` if arg4 is not a register candidate: the corresponding ref position will be constrained to
3880 // { `ecx`, `ebx`, `esi`, `edi` }, which `LSRA_LIMIT_CALLER` will further constrain to `ecx`, which will not be
3881 // available due to the special put arg.
3882 const bool supportsSpecialPutArg = getStressLimitRegs() != LSRA_LIMIT_CALLER;
3884 const bool supportsSpecialPutArg = true;
3887 if (supportsSpecialPutArg && tree->OperGet() == GT_PUTARG_REG && isCandidateLocalRef(tree->gtGetOp1()) &&
3888 (tree->gtGetOp1()->gtFlags & GTF_VAR_DEATH) == 0)
3890 // This is the case for a "pass-through" copy of a lclVar. In the case where it is a non-last-use,
3891 // we don't want the def of the copy to kill the lclVar register, if it is assigned the same register
3892 // (which is actually what we hope will happen).
3893 JITDUMP("Setting putarg_reg as a pass-through of a non-last use lclVar\n");
3895 // Get the register information for the first operand of the node.
3896 LocationInfoList operandDefs;
3897 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(tree), &operandDefs);
3900 // Preference the destination to the interval of the first register defined by the first operand.
3901 Interval* srcInterval = operandDefs.Begin()->interval;
3902 assert(srcInterval->isLocalVar);
3903 prefSrcInterval = srcInterval;
3904 isSpecialPutArg = true;
3907 RefPosition* internalRefs[MaxInternalRegisters];
3910 // The number of registers required for a tree node is the sum of
3911 // consume + produce + internalCount. This is the minimum number
3912 // of registers that must be present in the candidate set of each
3913 // ref position created for this node.
3914 unsigned minRegCount = consume + produce + info.internalIntCount + info.internalFloatCount;
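// For example (purely illustrative), a node with two register sources, one internal temp and
// one destination would require a minimum of four registers in its candidate sets.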
3917 // make intervals for all the 'internal' register requirements for this node
3918 // where internal means additional registers required temporarily
3919 int internalCount = buildInternalRegisterDefsForNode(tree, currentLoc, internalRefs DEBUG_ARG(minRegCount));
3921 // pop all ref'd tree temps
3922 tree->VisitOperands([&](GenTree* operand) -> GenTree::VisitResult {
3923 // Skip operands that do not define any registers, whether directly or indirectly.
3924 if (!operand->gtLsraInfo.definesAnyRegisters)
3926 return GenTree::VisitResult::Continue;
3929 // Remove the list of registers defined by the current operand from the map. Note that this
3930 // is only correct because tree nodes are singly-used: if this property ever changes (e.g.
3931 // if tree nodes are eventually allowed to be multiply-used), then the removal is only
3932 // correct at the last use.
3933 LocationInfoList operandDefs;
3934 bool removed = operandToLocationInfoMap.TryRemove(operand, &operandDefs);
3936 assert(!operandDefs.IsEmpty());
3938 LocationInfoListNode* const operandDefsEnd = operandDefs.End();
3939 for (LocationInfoListNode* operandDefsIterator = operandDefs.Begin(); operandDefsIterator != operandDefsEnd;
3940 operandDefsIterator = operandDefsIterator->Next())
3942 LocationInfo& locInfo = *static_cast<LocationInfo*>(operandDefsIterator);
3944 // for interstitial tree temps, a use is always a last use and ends the interval; this is set by default in newRefPosition
3945 GenTree* const useNode = locInfo.treeNode;
3946 assert(useNode != nullptr);
3948 Interval* const i = locInfo.interval;
3949 if (useNode->gtLsraInfo.isTgtPref)
3951 prefSrcInterval = i;
3954 const bool delayRegFree = (hasDelayFreeSrc && useNode->gtLsraInfo.isDelayFree);
3957 // If delayRegFree, then the use will interfere with the destination of
3958 // the consuming node. Therefore, we also need to add the kill set of the
3959 // consuming node to minRegCount.
3961 // For example consider the following IR on x86, where v01 and v02
3962 // are method args coming in ecx and edx respectively.
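//      A sketch of such IR (temp numbers are illustrative only):
//          t1 = lclVar int V01
//          t2 = lclVar int V02
//          t3 = div    int t1, t2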
3965 // For GT_DIV minRegCount will be 3 without adding kill set
3968 // Assume further JitStressRegs=2, which would constrain
3969 // candidates to callee trashable regs { eax, ecx, edx } on
3970 // use positions of v01 and v02. LSRA allocates ecx for v01.
3971 // The use position of v02 cannot be allocated a register since it
3972 // is marked delay-reg free and {eax,edx} are killed
3973 // before the def of GT_DIV. For this reason, minRegCount
3974 // for the use position of v02 also needs to take into account
3975 // the kill set of its consuming node.
3976 unsigned minRegCountForUsePos = minRegCount;
3979 regMaskTP killMask = getKillSetForNode(tree);
3980 if (killMask != RBM_NONE)
3982 minRegCountForUsePos += genCountBits(killMask);
3987 regMaskTP candidates = getUseCandidates(useNode);
3988 assert((candidates & allRegs(i->registerType)) != 0);
3990 // For non-localVar uses we record nothing, as nothing needs to be written back to the tree.
3991 GenTree* const refPosNode = i->isLocalVar ? useNode : nullptr;
3992 RefPosition* pos = newRefPosition(i, currentLoc, RefTypeUse, refPosNode, candidates,
3993 locInfo.multiRegIdx DEBUG_ARG(minRegCountForUsePos));
3997 pos->delayRegFree = true;
4000 if (useNode->IsRegOptional())
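// A reg-optional use may remain in memory, so only allocate a register for it if doing so
// is considered profitable.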
4002 pos->setAllocateIfProfitable(true);
4006 listNodePool.ReturnNodes(operandDefs);
4008 return GenTree::VisitResult::Continue;
4011 buildInternalRegisterUsesForNode(tree, currentLoc, internalRefs, internalCount DEBUG_ARG(minRegCount));
4013 RegisterType registerType = getDefType(tree);
4014 regMaskTP candidates = getDefCandidates(tree);
4015 regMaskTP useCandidates = getUseCandidates(tree);
4018 if (VERBOSE && produce)
4020 printf("Def candidates ");
4021 dumpRegMask(candidates);
4022 printf(", Use candidates ");
4023 dumpRegMask(useCandidates);
4028 #if defined(_TARGET_AMD64_)
4029 // A multi-reg call node is the only node that can produce a multi-reg value
4030 assert(produce <= 1 || (tree->IsMultiRegCall() && produce == MAX_RET_REG_COUNT));
4031 #endif // _TARGET_xxx_
4033 // Add kill positions before adding def positions
4034 buildKillPositionsForNode(tree, currentLoc + 1);
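// Note that the kills (and the defs created below) are placed at currentLoc + 1, one location
// past the uses created above at currentLoc.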
4036 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4037 VARSET_TP liveLargeVectors(VarSetOps::UninitVal());
4038 if (enregisterLocalVars && (RBM_FLT_CALLEE_SAVED != RBM_NONE))
4040 // Build RefPositions for saving any live large vectors.
4041 // This must be done after the kills, so that we know which large vectors are still live.
4042 VarSetOps::AssignNoCopy(compiler, liveLargeVectors, buildUpperVectorSaveRefPositions(tree, currentLoc + 1));
4044 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4046 ReturnTypeDesc* retTypeDesc = nullptr;
4047 bool isMultiRegCall = tree->IsMultiRegCall();
4050 retTypeDesc = tree->AsCall()->GetReturnTypeDesc();
4051 assert((int)genCountBits(candidates) == produce);
4052 assert(candidates == retTypeDesc->GetABIReturnRegs());
4056 LocationInfoList locationInfoList;
4057 LsraLocation defLocation = currentLoc + 1;
4059 regMaskTP remainingUseCandidates = useCandidates;
4061 for (int i = 0; i < produce; i++)
4063 regMaskTP currCandidates = candidates;
4064 Interval* interval = varDefInterval;
4066 // In the case of a multi-reg call node, registerType is given by
4067 // the type of the return register at the i-th position.
4070 registerType = retTypeDesc->GetReturnRegType((unsigned)i);
4071 currCandidates = genRegMask(retTypeDesc->GetABIReturnReg(i));
4072 useCandidates = allRegs(registerType);
4076 if (tree->OperIsPutArgSplit())
4078 // get i-th candidate
4079 currCandidates = genFindLowestReg(candidates);
4080 candidates &= ~currCandidates;
4084 // If the oper is GT_PUTARG_REG, the set bits in useCandidates must be assigned in sequential order.
4084 else if (tree->OperGet() == GT_PUTARG_REG || tree->OperGet() == GT_COPY)
4086 useCandidates = genFindLowestReg(remainingUseCandidates);
4087 remainingUseCandidates &= ~useCandidates;
4089 #endif // ARM_SOFTFP
4090 #endif // _TARGET_ARM_
4092 if (interval == nullptr)
4094 // Make a new interval
4095 interval = newInterval(registerType);
4096 if (hasDelayFreeSrc)
4098 interval->hasNonCommutativeRMWDef = true;
4100 else if (tree->OperIsConst())
4102 assert(!tree->IsReuseRegVal());
4103 interval->isConstant = true;
4106 if ((currCandidates & useCandidates) != RBM_NONE)
4108 interval->updateRegisterPreferences(currCandidates & useCandidates);
4111 if (isSpecialPutArg)
4113 interval->isSpecialPutArg = true;
4118 assert(registerTypesEquivalent(interval->registerType, registerType));
4121 if (prefSrcInterval != nullptr)
4123 interval->assignRelatedIntervalIfUnassigned(prefSrcInterval);
4126 // for assignments, we want to create a refposition for the def
4130 locationInfoList.Append(listNodePool.GetNode(defLocation, interval, tree, (unsigned)i));
4133 RefPosition* pos = newRefPosition(interval, defLocation, defRefType, defNode, currCandidates,
4134 (unsigned)i DEBUG_ARG(minRegCount));
4135 if (info.isLocalDefUse)
4137 pos->isLocalDefUse = true;
4138 pos->lastUse = true;
4140 interval->updateRegisterPreferences(currCandidates);
4141 interval->updateRegisterPreferences(useCandidates);
4144 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4145 // SaveDef position must be at the same location as Def position of call node.
4146 if (enregisterLocalVars)
4148 buildUpperVectorRestoreRefPositions(tree, defLocation, liveLargeVectors);
4150 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4152 if (!locationInfoList.IsEmpty())
4154 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
4156 tree->gtLsraInfo.definesAnyRegisters = true;
4161 // make an interval for each physical register
4162 void LinearScan::buildPhysRegRecords()
4164 RegisterType regType = IntRegisterType;
4165 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
4167 RegRecord* curr = &physRegs[reg];
4172 BasicBlock* getNonEmptyBlock(BasicBlock* block)
4174 while (block != nullptr && block->bbTreeList == nullptr)
4176 BasicBlock* nextBlock = block->bbNext;
4177 // Note that here we use the version of NumSucc that does not take a compiler.
4178 // That way this doesn't have to take a compiler, or be an instance method, e.g. of LinearScan.
4179 // If we have an empty block, it must have jump type BBJ_NONE or BBJ_ALWAYS, in which
4180 // case we don't need the version that takes a compiler.
4181 assert(block->NumSucc() == 1 && ((block->bbJumpKind == BBJ_ALWAYS) || (block->bbJumpKind == BBJ_NONE)));
4182 // sometimes the first block is empty and ends with an uncond branch
4183 // assert( block->GetSucc(0) == nextBlock);
4186 assert(block != nullptr && block->bbTreeList != nullptr);
4190 //------------------------------------------------------------------------
4191 // insertZeroInitRefPositions: Handle lclVars that are live-in to the first block
4194 // Prior to calling this method, 'currentLiveVars' must be set to the set of register
4195 // candidate variables that are liveIn to the first block.
4196 // For each register candidate that is live-in to the first block:
4197 // - If it is a GC ref, or if compInitMem is set, a ZeroInit RefPosition will be created.
4198 // - Otherwise, it will be marked as spilled, since it will not be assigned a register
4199 // on entry and will be loaded from memory on the undefined path.
4200 // Note that, when the compInitMem option is not set, we may encounter these on
4201 // paths that are protected by the same condition as an earlier def. However, since
4202 // we don't do the analysis to determine this - and couldn't rely on always identifying
4203 // such cases even if we tried - we must conservatively treat the undefined path as
4204 // being possible. This is a relatively rare case, so the introduced conservatism is
4205 // not expected to warrant the analysis required to determine the best placement of
4206 // an initialization.
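// For example, a non-parameter GC ref lclVar that is live-in to the first block gets a ZeroInit
// RefPosition at MinLocation and is marked lvMustInit, while (absent compInitMem) a non-GC lclVar
// is simply marked as spilled.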
4208 void LinearScan::insertZeroInitRefPositions()
4210 assert(enregisterLocalVars);
4212 VARSET_TP expectedLiveVars(VarSetOps::Intersection(compiler, registerCandidateVars, compiler->fgFirstBB->bbLiveIn));
4213 assert(VarSetOps::Equal(compiler, currentLiveVars, expectedLiveVars));
4216 // insert defs for this, then a block boundary
4218 VarSetOps::Iter iter(compiler, currentLiveVars);
4219 unsigned varIndex = 0;
4220 while (iter.NextElem(&varIndex))
4222 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4223 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4224 if (!varDsc->lvIsParam && isCandidateVar(varDsc))
4226 JITDUMP("V%02u was live in to first block:", varNum);
4227 Interval* interval = getIntervalForLocalVar(varIndex);
4228 if (compiler->info.compInitMem || varTypeIsGC(varDsc->TypeGet()))
4230 JITDUMP(" creating ZeroInit\n");
4231 GenTree* firstNode = getNonEmptyBlock(compiler->fgFirstBB)->firstNode();
4233 newRefPosition(interval, MinLocation, RefTypeZeroInit, firstNode, allRegs(interval->registerType));
4234 varDsc->lvMustInit = true;
4238 setIntervalAsSpilled(interval);
4239 JITDUMP(" marking as spilled\n");
4245 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4246 // -----------------------------------------------------------------------
4247 // Sets the register state for an argument of type STRUCT for System V systems.
4248 // See Compiler::raUpdateRegStateForArg(RegState *regState, LclVarDsc *argDsc) in regalloc.cpp
4249 // for how state for argument is updated for unix non-structs and Windows AMD64 structs.
4250 void LinearScan::unixAmd64UpdateRegStateForArg(LclVarDsc* argDsc)
4252 assert(varTypeIsStruct(argDsc));
4253 RegState* intRegState = &compiler->codeGen->intRegState;
4254 RegState* floatRegState = &compiler->codeGen->floatRegState;
4256 if ((argDsc->lvArgReg != REG_STK) && (argDsc->lvArgReg != REG_NA))
4258 if (genRegMask(argDsc->lvArgReg) & (RBM_ALLFLOAT))
4260 assert(genRegMask(argDsc->lvArgReg) & (RBM_FLTARG_REGS));
4261 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
4265 assert(genRegMask(argDsc->lvArgReg) & (RBM_ARG_REGS));
4266 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
4270 if ((argDsc->lvOtherArgReg != REG_STK) && (argDsc->lvOtherArgReg != REG_NA))
4272 if (genRegMask(argDsc->lvOtherArgReg) & (RBM_ALLFLOAT))
4274 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_FLTARG_REGS));
4275 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
4279 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_ARG_REGS));
4280 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
4285 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4287 //------------------------------------------------------------------------
4288 // updateRegStateForArg: Updates rsCalleeRegArgMaskLiveIn for the appropriate
4289 // regState (either compiler->intRegState or compiler->floatRegState),
4290 // with the lvArgReg on "argDsc"
4293 // argDsc - the argument for which the state is to be updated.
4295 // Return Value: None
4298 // The argument is live on entry to the function
4299 // (or is untracked and therefore assumed live)
4302 // This relies on a method in regAlloc.cpp that is shared between LSRA
4303 // and regAlloc. It is further abstracted here because regState is updated
4304 // separately for tracked and untracked variables in LSRA.
4306 void LinearScan::updateRegStateForArg(LclVarDsc* argDsc)
4308 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4309 // For System V AMD64 calls the argDsc can have 2 registers (for structs.)
4310 // Handle them here.
4311 if (varTypeIsStruct(argDsc))
4313 unixAmd64UpdateRegStateForArg(argDsc);
4316 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4318 RegState* intRegState = &compiler->codeGen->intRegState;
4319 RegState* floatRegState = &compiler->codeGen->floatRegState;
4320 // In the case of AMD64 we'll still use the floating point registers
4321 // to model the register usage for arguments on vararg calls, so
4322 // we will ignore the varargs condition when determining whether we use
4323 // XMM registers or not for setting up the call.
4324 bool isFloat = (isFloatRegType(argDsc->lvType)
4325 #ifndef _TARGET_AMD64_
4326 && !compiler->info.compIsVarArgs
4328 && !compiler->opts.compUseSoftFP);
4330 if (argDsc->lvIsHfaRegArg())
4337 JITDUMP("Float arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4338 compiler->raUpdateRegStateForArg(floatRegState, argDsc);
4342 JITDUMP("Int arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4343 #if FEATURE_MULTIREG_ARGS
4344 if (argDsc->lvOtherArgReg != REG_NA)
4346 JITDUMP("(second half) in reg %s\n", getRegName(argDsc->lvOtherArgReg));
4348 #endif // FEATURE_MULTIREG_ARGS
4349 compiler->raUpdateRegStateForArg(intRegState, argDsc);
4354 //------------------------------------------------------------------------
4355 // findPredBlockForLiveIn: Determine which block should be used for the register locations of the live-in variables.
4358 // block - The block for which we're selecting a predecessor.
4359 // prevBlock - The previous block in allocation order.
4360 // pPredBlockIsAllocated - A debug-only argument that indicates whether any of the predecessors have been seen
4361 // in allocation order.
4364 // The selected predecessor.
4367 // in DEBUG, caller initializes *pPredBlockIsAllocated to false, and it will be set to true if the block
4368 // returned is in fact a predecessor.
4371 // This will select a predecessor based on the heuristics obtained by getLsraBlockBoundaryLocations(), which can be
4373 // LSRA_BLOCK_BOUNDARY_PRED - Use the register locations of a predecessor block (default)
4374 // LSRA_BLOCK_BOUNDARY_LAYOUT - Use the register locations of the previous block in layout order.
4375 // This is the only case where this actually returns a different block.
4376 // LSRA_BLOCK_BOUNDARY_ROTATE - Rotate the register locations from a predecessor.
4377 // For this case, the block returned is the same as for LSRA_BLOCK_BOUNDARY_PRED, but
4378 // the register locations will be "rotated" to stress the resolution and allocation
4381 BasicBlock* LinearScan::findPredBlockForLiveIn(BasicBlock* block,
4382 BasicBlock* prevBlock DEBUGARG(bool* pPredBlockIsAllocated))
4384 BasicBlock* predBlock = nullptr;
4386 assert(*pPredBlockIsAllocated == false);
4387 if (getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_LAYOUT)
4389 if (prevBlock != nullptr)
4391 predBlock = prevBlock;
4396 if (block != compiler->fgFirstBB)
4398 predBlock = block->GetUniquePred(compiler);
4399 if (predBlock != nullptr)
4401 if (isBlockVisited(predBlock))
4403 if (predBlock->bbJumpKind == BBJ_COND)
4405 // Special handling to improve matching on backedges.
4406 BasicBlock* otherBlock = (block == predBlock->bbNext) ? predBlock->bbJumpDest : predBlock->bbNext;
4407 noway_assert(otherBlock != nullptr);
4408 if (isBlockVisited(otherBlock))
4410 // This is the case when we have a conditional branch where one target has already
4411 // been visited. It would be best to use the same incoming regs as that block,
4412 // so that we have less likelihood of having to move registers.
4413 // For example, in determining the block to use for the starting register locations for
4414 // "block" in the following example, we'd like to use the same predecessor for "block"
4415 // as for "otherBlock", so that both successors of predBlock have the same locations, reducing
4416 // the likelihood of needing a split block on a backedge:
4427 for (flowList* pred = otherBlock->bbPreds; pred != nullptr; pred = pred->flNext)
4429 BasicBlock* otherPred = pred->flBlock;
4430 if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum)
4432 predBlock = otherPred;
4441 predBlock = nullptr;
4446 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
4448 BasicBlock* candidatePredBlock = pred->flBlock;
4449 if (isBlockVisited(candidatePredBlock))
4451 if (predBlock == nullptr || predBlock->bbWeight < candidatePredBlock->bbWeight)
4453 predBlock = candidatePredBlock;
4454 INDEBUG(*pPredBlockIsAllocated = true;)
4459 if (predBlock == nullptr)
4461 predBlock = prevBlock;
4462 assert(predBlock != nullptr);
4463 JITDUMP("\n\nNo allocated predecessor; ");
4469 void LinearScan::buildIntervals()
4473 // start numbering at 1; 0 is the entry
4474 LsraLocation currentLoc = 1;
4476 JITDUMP("\nbuildIntervals ========\n");
4478 // Now build (empty) records for all of the physical registers
4479 buildPhysRegRecords();
4484 printf("\n-----------------\n");
4485 printf("LIVENESS:\n");
4486 printf("-----------------\n");
4487 foreach_block(compiler, block)
4489 printf("BB%02u use def in out\n", block->bbNum);
4490 dumpConvertedVarSet(compiler, block->bbVarUse);
4492 dumpConvertedVarSet(compiler, block->bbVarDef);
4494 dumpConvertedVarSet(compiler, block->bbLiveIn);
4496 dumpConvertedVarSet(compiler, block->bbLiveOut);
4503 // We will determine whether we should double align the frame during
4504 // identifyCandidates(), but we initially assume that we will not.
4505 doDoubleAlign = false;
4508 identifyCandidates();
4510 // Figure out if we're going to use a frame pointer. We need to do this before building
4511 // the ref positions, because those objects will embed the frame register in various register masks
4512 // if the frame pointer is not reserved. If we decide to have a frame pointer, setFrameType() will
4513 // remove the frame pointer from the masks.
4516 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_PRE));
4519 JITDUMP("\nbuildIntervals second part ========\n");
4522 // Next, create ParamDef RefPositions for all the tracked parameters,
4523 // in order of their varIndex
4526 unsigned int lclNum;
4528 RegState* intRegState = &compiler->codeGen->intRegState;
4529 RegState* floatRegState = &compiler->codeGen->floatRegState;
4530 intRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4531 floatRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4533 for (unsigned int varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
4535 lclNum = compiler->lvaTrackedToVarNum[varIndex];
4536 argDsc = &(compiler->lvaTable[lclNum]);
4538 if (!argDsc->lvIsParam)
4543 // Only reserve a register if the argument is actually used.
4544 // Is it dead on entry? If compJmpOpUsed is true, then the arguments
4545 // have to be kept alive, so we have to consider it as live on entry.
4546 // Use lvRefCnt instead of checking bbLiveIn because if it's volatile we
4547 // won't have done dataflow on it, but it needs to be marked as live-in so
4548 // it will get saved in the prolog.
4549 if (!compiler->compJmpOpUsed && argDsc->lvRefCnt == 0 && !compiler->opts.compDbgCode)
4554 if (argDsc->lvIsRegArg)
4556 updateRegStateForArg(argDsc);
4559 if (isCandidateVar(argDsc))
4561 Interval* interval = getIntervalForLocalVar(varIndex);
4562 regMaskTP mask = allRegs(TypeGet(argDsc));
4563 if (argDsc->lvIsRegArg)
4565 // Set this interval as currently assigned to that register
4566 regNumber inArgReg = argDsc->lvArgReg;
4567 assert(inArgReg < REG_COUNT);
4568 mask = genRegMask(inArgReg);
4569 assignPhysReg(inArgReg, interval);
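// The parameter begins life in its incoming argument register; record that assignment so the
// ParamDef RefPosition created below reflects it.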
4571 RefPosition* pos = newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, mask);
4573 else if (varTypeIsStruct(argDsc->lvType))
4575 for (unsigned fieldVarNum = argDsc->lvFieldLclStart;
4576 fieldVarNum < argDsc->lvFieldLclStart + argDsc->lvFieldCnt; ++fieldVarNum)
4578 LclVarDsc* fieldVarDsc = &(compiler->lvaTable[fieldVarNum]);
4579 if (fieldVarDsc->lvLRACandidate)
4581 assert(fieldVarDsc->lvTracked);
4582 Interval* interval = getIntervalForLocalVar(fieldVarDsc->lvVarIndex);
4584 newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, allRegs(TypeGet(fieldVarDsc)));
4590 // We can overwrite the register (i.e. codegen saves it on entry)
4591 assert(argDsc->lvRefCnt == 0 || !argDsc->lvIsRegArg || argDsc->lvDoNotEnregister ||
4592 !argDsc->lvLRACandidate || (varTypeIsFloating(argDsc->TypeGet()) && compiler->opts.compDbgCode));
4596 // Now set up the reg state for the non-tracked args
4597 // (We do this here because we want to generate the ParamDef RefPositions in tracked
4598 // order, so that loop doesn't hit the non-tracked args)
4600 for (unsigned argNum = 0; argNum < compiler->info.compArgsCount; argNum++, argDsc++)
4602 argDsc = &(compiler->lvaTable[argNum]);
4604 if (argDsc->lvPromotedStruct())
4606 noway_assert(argDsc->lvFieldCnt == 1); // We only handle one field here
4608 unsigned fieldVarNum = argDsc->lvFieldLclStart;
4609 argDsc = &(compiler->lvaTable[fieldVarNum]);
4611 noway_assert(argDsc->lvIsParam);
4612 if (!argDsc->lvTracked && argDsc->lvIsRegArg)
4614 updateRegStateForArg(argDsc);
4618 // If there is a secret stub param, it is also live in
4619 if (compiler->info.compPublishStubParam)
4621 intRegState->rsCalleeRegArgMaskLiveIn |= RBM_SECRET_STUB_PARAM;
4624 LocationInfoListNodePool listNodePool(compiler, 8);
4625 SmallHashTable<GenTree*, LocationInfoList, 32> operandToLocationInfoMap(compiler);
4627 BasicBlock* predBlock = nullptr;
4628 BasicBlock* prevBlock = nullptr;
4630 // Initialize currentLiveVars to the empty set. We will set it to the current
4631 // live-in at the entry to each block (this will include the incoming args on
4632 // the first block).
4633 VarSetOps::AssignNoCopy(compiler, currentLiveVars, VarSetOps::MakeEmpty(compiler));
4635 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
4637 JITDUMP("\nNEW BLOCK BB%02u\n", block->bbNum);
4639 bool predBlockIsAllocated = false;
4640 predBlock = findPredBlockForLiveIn(block, prevBlock DEBUGARG(&predBlockIsAllocated));
4643 JITDUMP("\n\nSetting BB%02u as the predecessor for determining incoming variable registers of BB%02u\n",
4644 block->bbNum, predBlock->bbNum);
4645 assert(predBlock->bbNum <= bbNumMaxBeforeResolution);
4646 blockInfo[block->bbNum].predBBNum = predBlock->bbNum;
4649 if (enregisterLocalVars)
4651 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
4652 VarSetOps::Intersection(compiler, registerCandidateVars, block->bbLiveIn));
4654 if (block == compiler->fgFirstBB)
4656 insertZeroInitRefPositions();
4659 // Any lclVars live-in to a block are resolution candidates.
4660 VarSetOps::UnionD(compiler, resolutionCandidateVars, currentLiveVars);
4662 // Determine if we need any DummyDefs.
4663 // We need DummyDefs for cases where "predBlock" isn't really a predecessor.
4664 // Note that it's possible to have uses of uninitialized variables, in which case even the first
4665 // block may require DummyDefs, which we are not currently adding - this means that these variables
4666 // will always be considered to be in memory on entry (and reloaded when the use is encountered).
4667 // TODO-CQ: Consider how best to tune this. Currently, if we create DummyDefs for uninitialized
4668 // variables (which may actually be initialized along the dynamically executed paths, but not
4669 // on all static paths), we wind up with excessive liveranges for some of these variables.
4670 VARSET_TP newLiveIn(VarSetOps::MakeCopy(compiler, currentLiveVars));
4673 // Compute set difference: newLiveIn = currentLiveVars - predBlock->bbLiveOut
4674 VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
4676 bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
4678 // Create dummy def RefPositions
4682 // If we are using locations from a predecessor, we should never require DummyDefs.
4683 assert(!predBlockIsAllocated);
4685 JITDUMP("Creating dummy definitions\n");
4686 VarSetOps::Iter iter(compiler, newLiveIn);
4687 unsigned varIndex = 0;
4688 while (iter.NextElem(&varIndex))
4690 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4691 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4692 // Add a dummyDef for any candidate vars that are in the "newLiveIn" set.
4693 // If this is the entry block, don't add any incoming parameters (they're handled with ParamDefs).
4694 if (isCandidateVar(varDsc) && (predBlock != nullptr || !varDsc->lvIsParam))
4696 Interval* interval = getIntervalForLocalVar(varIndex);
4697 RefPosition* pos = newRefPosition(interval, currentLoc, RefTypeDummyDef, nullptr,
4698 allRegs(interval->registerType));
4701 JITDUMP("Finished creating dummy definitions\n\n");
4705 // Add a dummy RefPosition to mark the block boundary.
4706 // Note that we do this AFTER adding the exposed uses above, because the
4707 // register positions for those exposed uses need to be recorded at
4710 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4713 LIR::Range& blockRange = LIR::AsRange(block);
4714 for (GenTree* node : blockRange.NonPhiNodes())
4716 assert(node->gtLsraInfo.loc >= currentLoc);
4717 assert(!node->IsValue() || !node->IsUnusedValue() || node->gtLsraInfo.isLocalDefUse);
4719 currentLoc = node->gtLsraInfo.loc;
4720 buildRefPositionsForNode(node, block, listNodePool, operandToLocationInfoMap, currentLoc);
4723 if (currentLoc > maxNodeLocation)
4725 maxNodeLocation = currentLoc;
4730 // Increment the LsraLocation at this point, so that the dummy RefPositions
4731 // will not have the same LsraLocation as any "real" RefPosition.
4734 // Note: the visited set is cleared in LinearScan::doLinearScan()
4735 markBlockVisited(block);
4737 if (enregisterLocalVars)
4739 // Insert exposed uses for a lclVar that is live-out of 'block' but not live-in to the
4740 // next block, or any unvisited successors.
4741 // This will address lclVars that are live on a backedge, as well as those that are kept
4742 // live at a GT_JMP.
4744 // Blocks ending with "jmp method" are marked as BBJ_HAS_JMP,
4745 // and jmp call is represented using GT_JMP node which is a leaf node.
4746 // The liveness phase keeps all the arguments of the method live till the end of
4747 // the block by adding them to the liveout set of the block containing GT_JMP.
4749 // The target of a GT_JMP implicitly uses all the current method arguments, however
4750 // there are no actual references to them. This can cause LSRA to assert, because
4751 // the variables are live but it sees no references. In order to correctly model the
4752 // liveness of these arguments, we add dummy exposed uses, in the same manner as for
4753 // backward branches. This will happen automatically via expUseSet.
4755 // Note that a block ending with GT_JMP has no successors and hence the variables
4756 // for which dummy use ref positions are added are arguments of the method.
4758 VARSET_TP expUseSet(VarSetOps::MakeCopy(compiler, block->bbLiveOut));
4759 VarSetOps::IntersectionD(compiler, expUseSet, registerCandidateVars);
4760 BasicBlock* nextBlock = getNextBlock();
4761 if (nextBlock != nullptr)
4763 VarSetOps::DiffD(compiler, expUseSet, nextBlock->bbLiveIn);
4765 for (BasicBlock* succ : block->GetAllSuccs(compiler))
4767 if (VarSetOps::IsEmpty(compiler, expUseSet))
4772 if (isBlockVisited(succ))
4776 VarSetOps::DiffD(compiler, expUseSet, succ->bbLiveIn);
4779 if (!VarSetOps::IsEmpty(compiler, expUseSet))
4781 JITDUMP("Exposed uses:");
4782 VarSetOps::Iter iter(compiler, expUseSet);
4783 unsigned varIndex = 0;
4784 while (iter.NextElem(&varIndex))
4786 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4787 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4788 assert(isCandidateVar(varDsc));
4789 Interval* interval = getIntervalForLocalVar(varIndex);
4791 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4792 JITDUMP(" V%02u", varNum);
4797 // Clear the "last use" flag on any vars that are live-out from this block.
4799 VarSetOps::Iter iter(compiler, block->bbLiveOut);
4800 unsigned varIndex = 0;
4801 while (iter.NextElem(&varIndex))
4803 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4804 LclVarDsc* const varDsc = &compiler->lvaTable[varNum];
4805 if (isCandidateVar(varDsc))
4807 RefPosition* const lastRP = getIntervalForLocalVar(varIndex)->lastRefPosition;
4808 if ((lastRP != nullptr) && (lastRP->bbNum == block->bbNum))
4810 lastRP->lastUse = false;
4817 checkLastUses(block);
4822 dumpConvertedVarSet(compiler, block->bbVarUse);
4824 dumpConvertedVarSet(compiler, block->bbVarDef);
4833 if (enregisterLocalVars)
4835 if (compiler->lvaKeepAliveAndReportThis())
4837 // If we need to KeepAliveAndReportThis, add a dummy exposed use of it at the end
4838 unsigned keepAliveVarNum = compiler->info.compThisArg;
4839 assert(compiler->info.compIsStatic == false);
4840 LclVarDsc* varDsc = compiler->lvaTable + keepAliveVarNum;
4841 if (isCandidateVar(varDsc))
4843 JITDUMP("Adding exposed use of this, for lvaKeepAliveAndReportThis\n");
4844 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4846 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4851 if (getLsraExtendLifeTimes())
4854 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
4856 if (varDsc->lvLRACandidate)
4858 JITDUMP("Adding exposed use of V%02u for LsraExtendLifetimes\n", lclNum);
4859 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4861 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4868 // If the last block has successors, create a RefTypeBB to record
4871 if (prevBlock->NumSucc(compiler) > 0)
4873 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4877 // Make sure we don't have any blocks that were not visited
4878 foreach_block(compiler, block)
4880 assert(isBlockVisited(block));
4885 lsraDumpIntervals("BEFORE VALIDATING INTERVALS");
4886 dumpRefPositions("BEFORE VALIDATING INTERVALS");
4887 validateIntervals();
4893 void LinearScan::dumpVarRefPositions(const char* title)
4895 if (enregisterLocalVars)
4897 printf("\nVAR REFPOSITIONS %s\n", title);
4899 for (unsigned i = 0; i < compiler->lvaCount; i++)
4901 printf("--- V%02u\n", i);
4903 LclVarDsc* varDsc = compiler->lvaTable + i;
4904 if (varDsc->lvIsRegCandidate())
4906 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4907 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4917 void LinearScan::validateIntervals()
4919 if (enregisterLocalVars)
4921 for (unsigned i = 0; i < compiler->lvaTrackedCount; i++)
4923 if (!compiler->lvaTable[compiler->lvaTrackedToVarNum[i]].lvLRACandidate)
4927 Interval* interval = getIntervalForLocalVar(i);
4929 bool defined = false;
4930 printf("-----------------\n");
4931 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4934 RefType refType = ref->refType;
4935 if (!defined && RefTypeIsUse(refType))
4937 if (compiler->info.compMethodName != nullptr)
4939 printf("%s: ", compiler->info.compMethodName);
4941 printf("LocalVar V%02u: undefined use at %u\n", interval->varNum, ref->nodeLocation);
4943 // Note that there can be multiple last uses if they are on disjoint paths,
4944 // so we can't really check the lastUse flag
4949 if (RefTypeIsDef(refType))
4959 // Set the default rpFrameType based upon codeGen->isFramePointerRequired()
4960 // This was lifted from the register predictor
4962 void LinearScan::setFrameType()
4964 FrameType frameType = FT_NOT_SET;
4966 compiler->codeGen->setDoubleAlign(false);
4969 frameType = FT_DOUBLE_ALIGN_FRAME;
4970 compiler->codeGen->setDoubleAlign(true);
4973 #endif // DOUBLE_ALIGN
4974 if (compiler->codeGen->isFramePointerRequired())
4976 frameType = FT_EBP_FRAME;
4980 if (compiler->rpMustCreateEBPCalled == false)
4985 compiler->rpMustCreateEBPCalled = true;
4986 if (compiler->rpMustCreateEBPFrame(INDEBUG(&reason)))
4988 JITDUMP("; Decided to create an EBP based frame for ETW stackwalking (%s)\n", reason);
4989 compiler->codeGen->setFrameRequired(true);
4993 if (compiler->codeGen->isFrameRequired())
4995 frameType = FT_EBP_FRAME;
4999 frameType = FT_ESP_FRAME;
5006 noway_assert(!compiler->codeGen->isFramePointerRequired());
5007 noway_assert(!compiler->codeGen->isFrameRequired());
5008 compiler->codeGen->setFramePointerUsed(false);
5011 compiler->codeGen->setFramePointerUsed(true);
5014 case FT_DOUBLE_ALIGN_FRAME:
5015 noway_assert(!compiler->codeGen->isFramePointerRequired());
5016 compiler->codeGen->setFramePointerUsed(false);
5018 #endif // DOUBLE_ALIGN
5020 noway_assert(!"rpFrameType not set correctly!");
5024 // If we are using FPBASE as the frame register, we cannot also use it for
5025 // a local var. Note that we may have already added it to the register masks,
5026 // which are computed when the LinearScan class constructor is created, and
5027 // used during lowering. Luckily, the TreeNodeInfo only stores an index to
5028 // the masks stored in the LinearScan class, so we only need to walk the
5029 // unique masks and remove FPBASE.
5030 if (frameType == FT_EBP_FRAME)
5032 if ((availableIntRegs & RBM_FPBASE) != 0)
5034 RemoveRegisterFromMasks(REG_FPBASE);
5036 // We know that we're already in "read mode" for availableIntRegs. However,
5037 // we need to remove the FPBASE register, so subsequent users (like callers
5038 // to allRegs()) get the right thing. The RemoveRegisterFromMasks() code
5039 // fixes up everything that already took a dependency on the value that was
5040 // previously read, so this completes the picture.
5041 availableIntRegs.OverrideAssign(availableIntRegs & ~RBM_FPBASE);
5045 compiler->rpFrameType = frameType;
5048 // Is the copyReg/moveReg given by this RefPosition still busy at the given location?
5050 bool copyOrMoveRegInUse(RefPosition* ref, LsraLocation loc)
5052 assert(ref->copyReg || ref->moveReg);
5053 if (ref->getRefEndLocation() >= loc)
5057 Interval* interval = ref->getInterval();
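// The copy/move register is also still busy if the interval's next reference is on the same
// node and does not end before 'loc'.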
5058 RefPosition* nextRef = interval->getNextRefPosition();
5059 if (nextRef != nullptr && nextRef->treeNode == ref->treeNode && nextRef->getRefEndLocation() >= loc)
5066 // Determine whether the register represented by "physRegRecord" is available at least
5067 // at the "currentLoc", and if so, return the next location at which it is in use in
5068 // "nextRefLocationPtr"
5070 bool LinearScan::registerIsAvailable(RegRecord* physRegRecord,
5071 LsraLocation currentLoc,
5072 LsraLocation* nextRefLocationPtr,
5073 RegisterType regType)
5075 *nextRefLocationPtr = MaxLocation;
5076 LsraLocation nextRefLocation = MaxLocation;
5077 regMaskTP regMask = genRegMask(physRegRecord->regNum);
5078 if (physRegRecord->isBusyUntilNextKill)
5083 RefPosition* nextPhysReference = physRegRecord->getNextRefPosition();
5084 if (nextPhysReference != nullptr)
5086 nextRefLocation = nextPhysReference->nodeLocation;
5087 // if (nextPhysReference->refType == RefTypeFixedReg) nextRefLocation--;
5089 else if (!physRegRecord->isCalleeSave)
5091 nextRefLocation = MaxLocation - 1;
5094 Interval* assignedInterval = physRegRecord->assignedInterval;
5096 if (assignedInterval != nullptr)
5098 RefPosition* recentReference = assignedInterval->recentRefPosition;
5100 // The only case where we have an assignedInterval, but recentReference is null
5101 // is where this interval is live at procedure entry (i.e. an arg register), in which
5102 // case it's still live and its assigned register is not available
5103 // (Note that the ParamDef will be recorded as a recentReference when we encounter
5104 // it, but we will be allocating registers, potentially to other incoming parameters,
5105 // as we process the ParamDefs.)
5107 if (recentReference == nullptr)
5112 // Is this a copyReg/moveReg? It is if the register assignment doesn't match.
5113 // (the recentReference may not be a copyReg/moveReg, because we could have seen another
5114 // reference since the copyReg/moveReg)
5116 if (!assignedInterval->isAssignedTo(physRegRecord->regNum))
5118 // Don't reassign it if it's still in use
5119 if ((recentReference->copyReg || recentReference->moveReg) &&
5120 copyOrMoveRegInUse(recentReference, currentLoc))
5125 else if (!assignedInterval->isActive && assignedInterval->isConstant)
5127 // Treat this as unassigned, i.e. do nothing.
5128 // TODO-CQ: Consider adjusting the heuristics (probably in the caller of this method)
5129 // to avoid reusing these registers.
5131 // If this interval isn't active, it's available if it isn't referenced
5132 // at this location (or the previous location, if the recent RefPosition
5133 // is a delayRegFree).
5134 else if (!assignedInterval->isActive &&
5135 (recentReference->refType == RefTypeExpUse || recentReference->getRefEndLocation() < currentLoc))
5137 // This interval must have a next reference (otherwise it wouldn't be assigned to this register)
5138 RefPosition* nextReference = recentReference->nextRefPosition;
5139 if (nextReference != nullptr)
5141 if (nextReference->nodeLocation < nextRefLocation)
5143 nextRefLocation = nextReference->nodeLocation;
5148 assert(recentReference->copyReg && recentReference->registerAssignment != regMask);
5156 if (nextRefLocation < *nextRefLocationPtr)
5158 *nextRefLocationPtr = nextRefLocation;
5162 if (regType == TYP_DOUBLE)
5164 // Recurse, but check the other half this time (TYP_FLOAT)
5165 if (!registerIsAvailable(getRegisterRecord(REG_NEXT(physRegRecord->regNum)), currentLoc, nextRefLocationPtr,
5168 nextRefLocation = *nextRefLocationPtr;
5170 #endif // _TARGET_ARM_
5172 return (nextRefLocation >= currentLoc);
5175 //------------------------------------------------------------------------
5176 // getRegisterType: Get the RegisterType to use for the given RefPosition
5179 // currentInterval: The interval for the current allocation
5180 // refPosition: The RefPosition of the current Interval for which a register is being allocated
5183 // The RegisterType that should be allocated for this RefPosition
5186 // This will nearly always be identical to the registerType of the interval, except in the case
5187 // of SIMD types of 8 bytes (currently only Vector2) when they are passed and returned in integer
5188 // registers, or copied to a return temp.
5189 // This method need only be called in situations where we may be dealing with the register requirements
5190 // of a RefTypeUse RefPosition (i.e. not when we are only looking at the type of an interval, nor when
5191 // we are interested in the "defining" type of the interval). This is because the situation of interest
5192 // only happens at the use (where it must be copied to an integer register).
5194 RegisterType LinearScan::getRegisterType(Interval* currentInterval, RefPosition* refPosition)
5196 assert(refPosition->getInterval() == currentInterval);
5197 RegisterType regType = currentInterval->registerType;
5198 regMaskTP candidates = refPosition->registerAssignment;
5200 assert((candidates & allRegs(regType)) != RBM_NONE);
5204 //------------------------------------------------------------------------
5205 // tryAllocateFreeReg: Find a free register that satisfies the requirements for refPosition,
5206 // and takes into account the preferences for the given Interval
5209 // currentInterval: The interval for the current allocation
5210 // refPosition: The RefPosition of the current Interval for which a register is being allocated
5213 // The regNumber, if any, allocated to the RefPosition. Returns REG_NA if no free register is found.
5216 // TODO-CQ: Consider whether we need to use a different order for tree temps than for vars, as
5219 static const regNumber lsraRegOrder[] = {REG_VAR_ORDER};
5220 const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
5221 static const regNumber lsraRegOrderFlt[] = {REG_VAR_ORDER_FLT};
5222 const unsigned lsraRegOrderFltSize = ArrLen(lsraRegOrderFlt);
5224 regNumber LinearScan::tryAllocateFreeReg(Interval* currentInterval, RefPosition* refPosition)
5226 regNumber foundReg = REG_NA;
5228 RegisterType regType = getRegisterType(currentInterval, refPosition);
5229 const regNumber* regOrder;
5230 unsigned regOrderSize;
5231 if (useFloatReg(regType))
5233 regOrder = lsraRegOrderFlt;
5234 regOrderSize = lsraRegOrderFltSize;
5238 regOrder = lsraRegOrder;
5239 regOrderSize = lsraRegOrderSize;
5242 LsraLocation currentLocation = refPosition->nodeLocation;
5243 RefPosition* nextRefPos = refPosition->nextRefPosition;
5244 LsraLocation nextLocation = (nextRefPos == nullptr) ? currentLocation : nextRefPos->nodeLocation;
5245 regMaskTP candidates = refPosition->registerAssignment;
5246 regMaskTP preferences = currentInterval->registerPreferences;
5248 if (RefTypeIsDef(refPosition->refType))
5250 if (currentInterval->hasConflictingDefUse)
5252 resolveConflictingDefAndUse(currentInterval, refPosition);
5253 candidates = refPosition->registerAssignment;
5255 // Otherwise, check for the case of a fixed-reg def of a reg that will be killed before the
5256 // use, or interferes at the point of use (which shouldn't happen, but Lower doesn't mark
5257 // the contained nodes as interfering).
5258 // Note that we may have a ParamDef RefPosition that is marked isFixedRegRef, but which
5259 // has had its registerAssignment changed to no longer be a single register.
5260 else if (refPosition->isFixedRegRef && nextRefPos != nullptr && RefTypeIsUse(nextRefPos->refType) &&
5261 !nextRefPos->isFixedRegRef && genMaxOneBit(refPosition->registerAssignment))
5263 regNumber defReg = refPosition->assignedReg();
5264 RegRecord* defRegRecord = getRegisterRecord(defReg);
5266 RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
5267 assert(currFixedRegRefPosition != nullptr &&
5268 currFixedRegRefPosition->nodeLocation == refPosition->nodeLocation);
5270 // If there is another fixed reference to this register before the use, change the candidates
5271 // on this RefPosition to include that of nextRefPos.
5272 if (currFixedRegRefPosition->nextRefPosition != nullptr &&
5273 currFixedRegRefPosition->nextRefPosition->nodeLocation <= nextRefPos->getRefEndLocation())
5275 candidates |= nextRefPos->registerAssignment;
5276 if (preferences == refPosition->registerAssignment)
5278 preferences = candidates;
5284 preferences &= candidates;
5285 if (preferences == RBM_NONE)
5287 preferences = candidates;
5289 regMaskTP relatedPreferences = RBM_NONE;
5292 candidates = stressLimitRegs(refPosition, candidates);
5294 bool mustAssignARegister = true;
5295 assert(candidates != RBM_NONE);
5297 // If the related interval has no further references, it is possible that it is a source of the
5298 // node that produces this interval. However, we don't want to use the relatedInterval for preferencing
5299 // if its next reference is not a new definition (as it either is or will become live).
5300 Interval* relatedInterval = currentInterval->relatedInterval;
5301 if (relatedInterval != nullptr)
5303 RefPosition* nextRelatedRefPosition = relatedInterval->getNextRefPosition();
5304 if (nextRelatedRefPosition != nullptr)
5306 // Don't use the relatedInterval for preferencing if its next reference is not a new definition.
5307 if (!RefTypeIsDef(nextRelatedRefPosition->refType))
5309 relatedInterval = nullptr;
5311 // Is the relatedInterval simply a copy to another relatedInterval?
5312 else if ((relatedInterval->relatedInterval != nullptr) &&
5313 (nextRelatedRefPosition->nextRefPosition != nullptr) &&
5314 (nextRelatedRefPosition->nextRefPosition->nextRefPosition == nullptr) &&
5315 (nextRelatedRefPosition->nextRefPosition->nodeLocation <
5316 relatedInterval->relatedInterval->getNextRefLocation()))
5318 // The current relatedInterval has only two remaining RefPositions, both of which
5319 // occur prior to the next RefPosition for its relatedInterval.
5320 // It is likely a copy.
5321 relatedInterval = relatedInterval->relatedInterval;
5326 if (relatedInterval != nullptr)
5328 // If the related interval already has an assigned register, then use that
5329 // as the related preference. We'll take the related
5330 // interval preferences into account in the loop over all the registers.
5332 if (relatedInterval->assignedReg != nullptr)
5334 relatedPreferences = genRegMask(relatedInterval->assignedReg->regNum);
5338 relatedPreferences = relatedInterval->registerPreferences;
5342 bool preferCalleeSave = currentInterval->preferCalleeSave;
5344 // For floating point, we want to be less aggressive about using callee-save registers.
5345 // So in that case, we just need to ensure that the current RefPosition is covered.
5346 RefPosition* rangeEndRefPosition;
5347 RefPosition* lastRefPosition = currentInterval->lastRefPosition;
5348 if (useFloatReg(currentInterval->registerType))
5350 rangeEndRefPosition = refPosition;
5354 rangeEndRefPosition = currentInterval->lastRefPosition;
5355 // If we have a relatedInterval that is not currently occupying a register,
5356 // and whose lifetime begins after this one ends,
5357 // we want to try to select a register that will cover its lifetime.
5358 if ((relatedInterval != nullptr) && (relatedInterval->assignedReg == nullptr) &&
5359 (relatedInterval->getNextRefLocation() >= rangeEndRefPosition->nodeLocation))
5361 lastRefPosition = relatedInterval->lastRefPosition;
5362 preferCalleeSave = relatedInterval->preferCalleeSave;
5366 // If this has a delayed use (due to being used in a rmw position of a
5367 // non-commutative operator), its endLocation is delayed until the "def"
5368 // position, which is one location past the use (getRefEndLocation() takes care of this).
5369 LsraLocation rangeEndLocation = rangeEndRefPosition->getRefEndLocation();
5370 LsraLocation lastLocation = lastRefPosition->getRefEndLocation();
5371 regNumber prevReg = REG_NA;
5373 if (currentInterval->assignedReg)
5375 bool useAssignedReg = false;
5376 // This was an interval that was previously allocated to the given
5377 // physical register, and we should try to allocate it to that register
5378 // again, if possible and reasonable.
5379 // Use it preemptively (i.e. before checking other available regs)
5380 // only if it is preferred and available.
5382 RegRecord* regRec = currentInterval->assignedReg;
5383 prevReg = regRec->regNum;
5384 regMaskTP prevRegBit = genRegMask(prevReg);
5386 // Is it in the preferred set of regs?
5387 if ((prevRegBit & preferences) != RBM_NONE)
5389 // Is it currently available?
5390 LsraLocation nextPhysRefLoc;
5391 if (registerIsAvailable(regRec, currentLocation, &nextPhysRefLoc, currentInterval->registerType))
5393 // If the register is next referenced at this location, only use it if
5394 // this has a fixed reg requirement (i.e. this is the reference that caused
5395 // the FixedReg ref to be created)
5397 if (!regRec->conflictingFixedRegReference(refPosition))
5399 useAssignedReg = true;
5405 regNumber foundReg = prevReg;
5406 assignPhysReg(regRec, currentInterval);
5407 refPosition->registerAssignment = genRegMask(foundReg);
5412 // Don't keep trying to allocate to this register
5413 currentInterval->assignedReg = nullptr;
5417 RegRecord* availablePhysRegInterval = nullptr;
5418 Interval* intervalToUnassign = nullptr;
5420 // Each register will receive a score which is the sum of the scoring criteria below.
5421 // These were selected on the assumption that they will have an impact on the "goodness"
5422 // of a register selection, and have been tuned to a certain extent by observing the impact
5423 // of the ordering on asmDiffs. However, there is probably much more room for tuning,
5424 // and perhaps additional criteria.
5426 // These are FLAGS (bits) so that we can easily order them and add them together.
5427 // If the scores are equal, but one covers more of the current interval's range,
5428 // then it wins. Otherwise, the one encountered earlier in the regOrder wins.
5432 VALUE_AVAILABLE = 0x40, // It is a constant value that is already in an acceptable register.
5433 COVERS = 0x20, // It is in the interval's preference set and it covers the entire lifetime.
5434 OWN_PREFERENCE = 0x10, // It is in the preference set of this interval.
5435 COVERS_RELATED = 0x08, // It is in the preference set of the related interval and covers the entire lifetime.
5436 RELATED_PREFERENCE = 0x04, // It is in the preference set of the related interval.
5437 CALLER_CALLEE = 0x02, // It is in the right "set" for the interval (caller or callee-save).
5438 UNASSIGNED = 0x01, // It is not currently assigned to an inactive interval.
5443 // Compute the best possible score so we can stop looping early if we find it.
5444 // TODO-Throughput: At some point we may want to short-circuit the computation of each score, but
5445 // probably not until we've tuned the order of these criteria. At that point,
5446 // we'll need to avoid the short-circuit if we've got a stress option to reverse the selection order.
5448 int bestPossibleScore = COVERS + UNASSIGNED + OWN_PREFERENCE + CALLER_CALLEE;
5449 if (relatedPreferences != RBM_NONE)
5451 bestPossibleScore |= RELATED_PREFERENCE + COVERS_RELATED;
5454 LsraLocation bestLocation = MinLocation;
5456 // In non-debug builds, this will simply get optimized away
5457 bool reverseSelect = false;
5459 reverseSelect = doReverseSelect();
5462 // An optimization for the common case where there is only one candidate -
5463 // avoid looping over all the other registers
5465 regNumber singleReg = REG_NA;
5467 if (genMaxOneBit(candidates))
5470 singleReg = genRegNumFromMask(candidates);
5471 regOrder = &singleReg;
5474 for (unsigned i = 0; i < regOrderSize && (candidates != RBM_NONE); i++)
5476 regNumber regNum = regOrder[i];
5477 regMaskTP candidateBit = genRegMask(regNum);
5479 if (!(candidates & candidateBit))
5484 candidates &= ~candidateBit;
5486 RegRecord* physRegRecord = getRegisterRecord(regNum);
5489 LsraLocation nextPhysRefLocation = MaxLocation;
5491 // By chance, is this register already holding this interval, as a copyReg or having
5492 // been restored as inactive after a kill?
5493 if (physRegRecord->assignedInterval == currentInterval)
5495 availablePhysRegInterval = physRegRecord;
5496 intervalToUnassign = nullptr;
5500 // Find the next RefPosition of the physical register
5501 if (!registerIsAvailable(physRegRecord, currentLocation, &nextPhysRefLocation, regType))
5506 // If the register is next referenced at this location, only use it if
5507 // this has a fixed reg requirement (i.e. this is the reference that caused
5508 // the FixedReg ref to be created)
5510 if (physRegRecord->conflictingFixedRegReference(refPosition))
5515 // If this is a definition of a constant interval, check to see if its value is already in this register.
5516 if (currentInterval->isConstant && RefTypeIsDef(refPosition->refType) &&
5517 (physRegRecord->assignedInterval != nullptr) && physRegRecord->assignedInterval->isConstant)
5519 noway_assert(refPosition->treeNode != nullptr);
5520 GenTree* otherTreeNode = physRegRecord->assignedInterval->firstRefPosition->treeNode;
5521 noway_assert(otherTreeNode != nullptr);
5523 if (refPosition->treeNode->OperGet() == otherTreeNode->OperGet())
5525 switch (otherTreeNode->OperGet())
5528 if ((refPosition->treeNode->AsIntCon()->IconValue() ==
5529 otherTreeNode->AsIntCon()->IconValue()) &&
5530 (varTypeGCtype(refPosition->treeNode) == varTypeGCtype(otherTreeNode)))
5532 #ifdef _TARGET_64BIT_
5533 // If the constant is negative, only reuse registers of the same type.
5534 // This is because, on a 64-bit system, we do not sign-extend immediates in registers to
5535 // 64-bits unless they are actually longs, as this requires a longer instruction.
5536 // This doesn't apply to a 32-bit system, on which long values occupy multiple registers.
5537 // (We could sign-extend, but we would have to always sign-extend, because if we reuse it more
5538 // than once, we won't have access to the instruction that originally defines the constant).
5539 if ((refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()) ||
5540 (refPosition->treeNode->AsIntCon()->IconValue() >= 0))
5541 #endif // _TARGET_64BIT_
5543 score |= VALUE_AVAILABLE;
5549 // For floating point constants, the values must be identical, not simply compare
5550 // equal. So we compare the bits.
5551 if (refPosition->treeNode->AsDblCon()->isBitwiseEqual(otherTreeNode->AsDblCon()) &&
5552 (refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()))
5554 score |= VALUE_AVAILABLE;
5559 // for all other 'otherTreeNode->OperGet()' kinds, we leave 'score' unchanged
5565 // If the nextPhysRefLocation is a fixedRef for the rangeEndRefPosition, increment it so that
5566 // we don't mistakenly conclude that it doesn't cover the live range.
5567 // This doesn't handle the case where earlier RefPositions for this Interval are also
5568 // FixedRefs of this regNum, but at least those are only interesting in the case where those
5569 // are "local last uses" of the Interval - otherwise the liveRange would interfere with the reg.
5570 if (nextPhysRefLocation == rangeEndLocation && rangeEndRefPosition->isFixedRefOfReg(regNum))
5572 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_INCREMENT_RANGE_END, currentInterval, regNum));
5573 nextPhysRefLocation++;
5576 if ((candidateBit & preferences) != RBM_NONE)
5578 score |= OWN_PREFERENCE;
5579 if (nextPhysRefLocation > rangeEndLocation)
5584 if (relatedInterval != nullptr && (candidateBit & relatedPreferences) != RBM_NONE)
5586 score |= RELATED_PREFERENCE;
5587 if (nextPhysRefLocation > relatedInterval->lastRefPosition->nodeLocation)
5589 score |= COVERS_RELATED;
5593 // If we had a fixed-reg def of a reg that will be killed before the use, prefer it to any other registers
5594 // with the same score. (Note that we haven't changed the original registerAssignment on the RefPosition).
5595 // Overload the RELATED_PREFERENCE value.
5596 else if (candidateBit == refPosition->registerAssignment)
5598 score |= RELATED_PREFERENCE;
5601 if ((preferCalleeSave && physRegRecord->isCalleeSave) || (!preferCalleeSave && !physRegRecord->isCalleeSave))
5603 score |= CALLER_CALLEE;
5606 // The register is considered unassigned if it has no assignedInterval, OR
5607 // if its next reference is beyond the range of this interval.
5608 if (physRegRecord->assignedInterval == nullptr ||
5609 physRegRecord->assignedInterval->getNextRefLocation() > lastLocation)
5611 score |= UNASSIGNED;
5614 bool foundBetterCandidate = false;
5616 if (score > bestScore)
5618 foundBetterCandidate = true;
5620 else if (score == bestScore)
5622 // Prefer a register that covers the range.
5623 if (bestLocation <= lastLocation)
5625 if (nextPhysRefLocation > bestLocation)
5627 foundBetterCandidate = true;
5630 // If both cover the range, prefer a register that is killed sooner (leaving the longer range register
5631 // available). If both cover the range and are also killed at the same location, prefer the one that
5632 // is the same as the previous assignment.
5633 else if (nextPhysRefLocation > lastLocation)
5635 if (nextPhysRefLocation < bestLocation)
5637 foundBetterCandidate = true;
5639 else if (nextPhysRefLocation == bestLocation && prevReg == regNum)
5641 foundBetterCandidate = true;
5647 if (doReverseSelect() && bestScore != 0)
5649 foundBetterCandidate = !foundBetterCandidate;
5653 if (foundBetterCandidate)
5655 bestLocation = nextPhysRefLocation;
5656 availablePhysRegInterval = physRegRecord;
5657 intervalToUnassign = physRegRecord->assignedInterval;
5661 // There is no way we can get a better score, so break out.
5662 if (!reverseSelect && score == bestPossibleScore && bestLocation == rangeEndLocation + 1)
5668 if (availablePhysRegInterval != nullptr)
5670 if (intervalToUnassign != nullptr)
5672 RegRecord* physRegToUnassign = availablePhysRegInterval;
5674 // We should unassign a double register if availablePhysRegInterval is part of the double register
5675 if (availablePhysRegInterval->assignedInterval->registerType == TYP_DOUBLE &&
5676 !genIsValidDoubleReg(availablePhysRegInterval->regNum))
5677 physRegToUnassign = findAnotherHalfRegRec(availablePhysRegInterval);
5679 unassignPhysReg(physRegToUnassign, intervalToUnassign->recentRefPosition);
5680 if (bestScore & VALUE_AVAILABLE)
5682 assert(intervalToUnassign->isConstant);
5683 refPosition->treeNode->SetReuseRegVal();
5685 // If we considered this "unassigned" because this interval's lifetime ends before
5686 // the next ref, remember it.
5687 else if ((bestScore & UNASSIGNED) != 0 && intervalToUnassign != nullptr)
5689 updatePreviousInterval(physRegToUnassign, intervalToUnassign, intervalToUnassign->registerType);
5694 assert((bestScore & VALUE_AVAILABLE) == 0);
5696 assignPhysReg(availablePhysRegInterval, currentInterval);
5697 foundReg = availablePhysRegInterval->regNum;
5698 regMaskTP foundRegMask = genRegMask(foundReg);
5699 refPosition->registerAssignment = foundRegMask;
5700 if (relatedInterval != nullptr)
5702 relatedInterval->updateRegisterPreferences(foundRegMask);
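// Illustrative sketch (not part of the allocator): a simplified, hypothetical version of the
// flag-based scoring described above in tryAllocateFreeReg(). The flag values mirror the
// scoring constants (COVERS, OWN_PREFERENCE, CALLER_CALLEE, UNASSIGNED); the selection rule is
// "highest score wins, with ties broken by how much of the range the candidate covers and then
// by position in regOrder". All names below are invented for illustration only.
namespace LsraScoringSketch
{
    enum SimpleScore
    {
        SKETCH_COVERS         = 0x20, // in the preference set and covers the entire lifetime
        SKETCH_OWN_PREFERENCE = 0x10, // in the preference set of this interval
        SKETCH_CALLER_CALLEE  = 0x02, // matches the caller-save/callee-save preference
        SKETCH_UNASSIGNED     = 0x01  // not currently assigned to an inactive interval
    };

    inline int scoreCandidate(bool inPreferenceSet, bool coversRange, bool matchesSaveKind, bool unassigned)
    {
        int score = 0;
        if (inPreferenceSet)
        {
            score |= SKETCH_OWN_PREFERENCE;
            if (coversRange)
            {
                score |= SKETCH_COVERS;
            }
        }
        if (matchesSaveKind)
        {
            score |= SKETCH_CALLER_CALLEE;
        }
        if (unassigned)
        {
            score |= SKETCH_UNASSIGNED;
        }
        // A preferred, covering, save-kind-matching, unassigned register scores 0x33, which is the
        // "best possible score" when there is no related interval, so the search can stop early.
        return score;
    }
}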
5710 //------------------------------------------------------------------------
5711 // canSpillThisReg: Determine whether we can spill physRegRecord and anotherPhysRegRecord
5714 // physRegRecord - reg to spill
5715 // recentAssignedRef - recent RefPosition of physRegRecord
5716 // anotherPhysRegRecord - When we have to test a double register, this is the second float register
5717 // of a double register in ARM32.
5718 // Otherwise it is nullptr.
5719 // anotherRecentAssignedRef - recent RefPosition of anotherPhysRegRecord
5720 // refLocation - Location of RefPosition where this register will be spilled
5721 // recentAssignedRefWeight - Weight of recent assigned RefPosition which will be determined in this function
5722 // farthestRefPosWeight - Current farthestRefPosWeight at allocateBusyReg()
5725 // True - if we can spill physRegRecord and anotherPhysRegRecord
5726 // False - otherwise
5728 // Note: This helper is designed to be used only from allocateBusyReg()
5730 bool LinearScan::canSpillThisReg(RegRecord* physRegRecord,
5731 RefPosition* recentAssignedRef,
5732 RegRecord* anotherPhysRegRecord,
5733 RefPosition* anotherRecentAssignedRef,
5734 LsraLocation refLocation,
5735 unsigned* recentAssignedRefWeight,
5736 unsigned farthestRefPosWeight)
5738 // There are four cases for ARM32 when we are trying to allocate a double register for TYP_DOUBLE
5739 // Case 1: recentAssignedRef !=nullptr && anotherRecentAssignedRef != nullptr
5740 // Case 2: recentAssignedRef !=nullptr && anotherRecentAssignedRef == nullptr
5741 // Case 3: recentAssignedRef ==nullptr && anotherRecentAssignedRef != nullptr
5742 // Case 4: recentAssignedRef ==nullptr && anotherRecentAssignedRef == nullptr
5743 if (recentAssignedRef != nullptr)
5745 if (anotherRecentAssignedRef != nullptr)
5747 // Case 1: recentAssignedRef !=nullptr && anotherRecentAssignedRef != nullptr
5748 if (anotherRecentAssignedRef->nodeLocation == refLocation)
5750 // We can't spill a register that's being used at the current location
5754 // If the current position has the candidate register marked to be delayed,
5755 // check if the previous location is using this register; if that's the case we have to skip it
5756 // since we can't spill this register.
5757 if (anotherRecentAssignedRef->delayRegFree && (refLocation == anotherRecentAssignedRef->nodeLocation + 1))
5762 // fallthrough to test recentAssignedRef
5764 // Case 2: recentAssignedRef !=nullptr && anotherRecentAssignedRef == nullptr
5765 if (recentAssignedRef->nodeLocation == refLocation)
5767 // We can't spill a register that's being used at the current location
5768 RefPosition* physRegRef = physRegRecord->recentRefPosition;
5772 // If the current position has the candidate register marked to be delayed,
5773 // check if the previous location is using this register; if that's the case we have to skip it
5774 // since we can't spill this register.
5775 if (recentAssignedRef->delayRegFree && (refLocation == recentAssignedRef->nodeLocation + 1))
5780 // We prefer not to spill a register if the weight of recentAssignedRef > weight
5781 // of the spill candidate found so far. We would consider spilling a greater weight
5782 // ref position only if the refPosition being allocated must have a register.
5783 unsigned weight1 = BB_ZERO_WEIGHT;
5784 unsigned weight2 = BB_ZERO_WEIGHT;
5785 if (recentAssignedRef != nullptr)
5786 weight1 = getWeight(recentAssignedRef);
5787 if (anotherRecentAssignedRef != nullptr)
5788 weight2 = getWeight(anotherRecentAssignedRef);
5790 *recentAssignedRefWeight = (weight1 > weight2) ? weight1 : weight2;
5792 if (*recentAssignedRefWeight > farthestRefPosWeight)
5799 if (anotherRecentAssignedRef != nullptr)
5801 // Case 3: recentAssignedRef ==nullptr && anotherRecentAssignedRef != nullptr
5802 if (anotherRecentAssignedRef->nodeLocation == refLocation)
5804 // We can't spill a register that's being used at the current location
5808 // If the current position has the candidate register marked to be delayed,
5809 // check if the previous location is using this register; if that's the case we have to skip it
5810 // since we can't spill this register.
5811 if (anotherRecentAssignedRef->delayRegFree && (refLocation == anotherRecentAssignedRef->nodeLocation + 1))
5816 // We prefer not to spill a register if the weight of recentAssignedRef > weight
5817 // of the spill candidate found so far. We would consider spilling a greater weight
5818 // ref position only if the refPosition being allocated must have a register.
5819 *recentAssignedRefWeight = getWeight(anotherRecentAssignedRef);
5820 if (*recentAssignedRefWeight > farthestRefPosWeight)
5827 // Case 4: recentAssignedRef ==nullptr && anotherRecentAssignedRef == nullptr
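// Illustrative sketch (not part of the allocator): the per-register test performed by
// canSpillThisReg() above, reduced to plain values. "location" and "weight" stand in for
// LsraLocation and block weights; the struct and helper names are invented for illustration only.
namespace LsraSpillTestSketch
{
    struct RecentRefSketch
    {
        unsigned location;     // nodeLocation of the register's most recent RefPosition
        bool     delayRegFree; // the register must stay live one extra location (RMW use)
    };

    // Returns true if a register whose most recent reference is 'recent' may be spilled at
    // 'refLocation', given the weight of the best spill candidate found so far.
    inline bool canSpillSketch(const RecentRefSketch& recent,
                               unsigned               refLocation,
                               unsigned               recentWeight,
                               unsigned               bestWeightSoFar)
    {
        if (recent.location == refLocation)
        {
            return false; // in use at the current location
        }
        if (recent.delayRegFree && (refLocation == recent.location + 1))
        {
            return false; // still held for a delayed (RMW) use at the previous location
        }
        // Prefer to keep the lowest-weight spill candidate found so far.
        return recentWeight <= bestWeightSoFar;
    }
}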
5834 //----------------------------------------------------------------------------
5835 // checkReferenceAt: Test assertion conditions for recent assigned RefPosition
5838 // recentAssignedRef - RefPosition to be tested
5839 // refLocation - LsraLocation of interval being allocated
5844 // Note: This helper is designed to be used only from allocateBusyReg()
5846 void LinearScan::checkReferenceAt(RefPosition* recentAssignedRef, LsraLocation refLocation)
5848 // The assigned interval has a reference at this location - otherwise, we would have found
5849 // this in tryAllocateFreeReg().
5850 // Note that we may or may not have actually handled the reference yet, so it could either
5851 // be recentAssignedRef, or the next reference.
5852 assert(recentAssignedRef != nullptr);
5853 if (recentAssignedRef->nodeLocation != refLocation)
5855 if (recentAssignedRef->nodeLocation + 1 == refLocation)
5857 assert(recentAssignedRef->delayRegFree);
5861 RefPosition* nextAssignedRef = recentAssignedRef->nextRefPosition;
5862 assert(nextAssignedRef != nullptr);
5863 assert(nextAssignedRef->nodeLocation == refLocation ||
5864 (nextAssignedRef->nodeLocation + 1 == refLocation && nextAssignedRef->delayRegFree));
5869 //------------------------------------------------------------------------
5870 // allocateBusyReg: Find a busy register that satisfies the requirements for refPosition,
5871 // and that can be spilled.
5874 // current The interval for the current allocation
5875 // refPosition The RefPosition of the current Interval for which a register is being allocated
5876 // allocateIfProfitable If true, a reg may not be allocated if all other ref positions currently
5877 // occupying registers are more important than the 'refPosition'.
5880 // The regNumber allocated to the RefPosition. Returns REG_NA if no suitable register is found.
5882 // Note: Currently this routine uses weight and farthest distance of next reference
5883 // to select a ref position for spilling.
5884 // a) if allocateIfProfitable = false
5885 // The ref position chosen for spilling will be the lowest weight
5886 // of all, and if there is more than one ref position with the
5887 // same lowest weight, among them it chooses the one with the farthest
5888 // distance to its next reference.
5890 // b) if allocateIfProfitable = true
5891 // The ref position chosen for spilling will not only be the lowest weight
5892 // of all but will also have a weight lower than 'refPosition'. If there is
5893 // no such ref position, a reg will not be allocated. (See the illustrative sketch after this function.)
5894 regNumber LinearScan::allocateBusyReg(Interval* current, RefPosition* refPosition, bool allocateIfProfitable)
5896 regNumber foundReg = REG_NA;
5898 RegisterType regType = getRegisterType(current, refPosition);
5899 regMaskTP candidates = refPosition->registerAssignment;
5900 regMaskTP preferences = (current->registerPreferences & candidates);
5901 if (preferences == RBM_NONE)
5903 preferences = candidates;
5905 if (candidates == RBM_NONE)
5907 // This assumes only integer and floating point register types;
5908 // if we target a processor with additional register types,
5909 // this would have to change.
5910 candidates = allRegs(regType);
5914 candidates = stressLimitRegs(refPosition, candidates);
5917 // TODO-CQ: Determine whether/how to take preferences into account in addition to
5918 // preferring the one with the furthest ref position when considering
5919 // a candidate to spill
5920 RegRecord* farthestRefPhysRegRecord = nullptr;
5922 RegRecord* farthestRefPhysRegRecord2 = nullptr;
5924 LsraLocation farthestLocation = MinLocation;
5925 LsraLocation refLocation = refPosition->nodeLocation;
5926 unsigned farthestRefPosWeight;
5927 if (allocateIfProfitable)
5929 // If allocating a reg is optional, we will consider those ref positions
5930 // whose weight is less than 'refPosition' for spilling.
5931 farthestRefPosWeight = getWeight(refPosition);
5935 // If allocating a reg is a must, we start off with max weight so
5936 // that the first spill candidate will be selected based on
5937 // farthest distance alone. Since we start off with farthestLocation
5938 // initialized to MinLocation, the first available ref position
5939 // will be selected as the spill candidate and its weight as the
5940 // farthestRefPosWeight.
5941 farthestRefPosWeight = BB_MAX_WEIGHT;
5944 for (regNumber regNum : Registers(regType))
5946 regMaskTP candidateBit = genRegMask(regNum);
5947 if (!(candidates & candidateBit))
5951 RegRecord* physRegRecord = getRegisterRecord(regNum);
5953 RegRecord* physRegRecord2 = nullptr;
5954 // For ARM32, consider the two float registers that together form a double register
5955 // when allocating a double register.
5956 if (current->registerType == TYP_DOUBLE && genIsValidDoubleReg(regNum))
5958 physRegRecord2 = findAnotherHalfRegRec(physRegRecord);
5962 if (physRegRecord->isBusyUntilNextKill)
5966 Interval* assignedInterval = physRegRecord->assignedInterval;
5968 Interval* assignedInterval2 = (physRegRecord2 == nullptr) ? nullptr : physRegRecord2->assignedInterval;
5971 // If there is a fixed reference at the same location (and it's not due to this reference), we can't use this register.
5974 if (physRegRecord->conflictingFixedRegReference(refPosition))
5976 assert(candidates != candidateBit);
5980 LsraLocation physRegNextLocation = MaxLocation;
5981 if (refPosition->isFixedRefOfRegMask(candidateBit))
5983 // Either there is a fixed reference due to this node, or one associated with a
5984 // fixed use fed by a def at this node.
5985 // In either case, we must use this register as it's the only candidate
5986 // TODO-CQ: At the time we allocate a register to a fixed-reg def, if it's not going
5987 // to remain live until the use, we should set the candidates to allRegs(regType)
5988 // to avoid a spill - codegen can then insert the copy.
5989 assert(candidates == candidateBit);
5991 // If a refPosition has a fixed reg as its candidate and is also marked
5992 // as allocateIfProfitable, we should allocate fixed reg only if the
5993 // weight of this ref position is greater than the weight of the ref
5994 // position to which fixed reg is assigned. Such a case would arise
5995 // on x86 under LSRA stress.
5996 if (!allocateIfProfitable)
5998 physRegNextLocation = MaxLocation;
5999 farthestRefPosWeight = BB_MAX_WEIGHT;
6004 physRegNextLocation = physRegRecord->getNextRefLocation();
6006 // If refPosition requires a fixed register, we should reject all others.
6007 // Otherwise, we will still evaluate all physRegs even though their next location is
6008 // not better than farthestLocation found so far.
6010 // TODO: this method should be using an approach similar to tryAllocateFreeReg()
6011 // where it uses a regOrder array to avoid iterating over any but the single candidate register.
6013 if (refPosition->isFixedRegRef && physRegNextLocation < farthestLocation)
6019 // If this register is not assigned to an interval, either
6020 // - it has a FixedReg reference at the current location that is not this reference, OR
6021 // - this is the special case of a fixed loReg, where this interval has a use at the same location
6022 // In either case, we cannot use it
6023 CLANG_FORMAT_COMMENT_ANCHOR;
6026 if (assignedInterval == nullptr && assignedInterval2 == nullptr)
6028 if (assignedInterval == nullptr)
6031 RefPosition* nextPhysRegPosition = physRegRecord->getNextRefPosition();
6033 #ifndef _TARGET_ARM64_
6034 // TODO-Cleanup: Revisit this after Issue #3524 is complete
6035 // On ARM64 the nodeLocation is not always == refLocation, Disabling this assert for now.
6036 assert(nextPhysRegPosition->nodeLocation == refLocation && candidateBit != candidates);
6042 RefPosition* recentAssignedRef = (assignedInterval == nullptr) ? nullptr : assignedInterval->recentRefPosition;
6043 RefPosition* recentAssignedRef2 =
6044 (assignedInterval2 == nullptr) ? nullptr : assignedInterval2->recentRefPosition;
6045 // There are four cases for ARM32 when we are trying to allocate a double register for TYP_DOUBLE
6046 // Case 1: LoReg->assignedInterval != nullptr && HiReg->assignedInterval != nullptr
6047 // Case 2: LoReg->assignedInterval != nullptr && HiReg->assignedInterval == nullptr
6048 // Case 3: LoReg->assignedInterval == nullptr && HiReg->assignedInterval != nullptr
6049 // Case 4: LoReg->assignedInterval == nullptr && HiReg->assignedInterval == nullptr
6050 if (assignedInterval != nullptr)
6052 // Case 1: LoReg->assignedInterval != nullptr && HiReg->assignedInterval != nullptr
6053 // Case 2: LoReg->assignedInterval != nullptr && HiReg->assignedInterval == nullptr
6054 if (!assignedInterval->isActive)
6056 checkReferenceAt(recentAssignedRef, refLocation);
6060 if (assignedInterval2 != nullptr && !assignedInterval2->isActive)
6062 checkReferenceAt(recentAssignedRef2, refLocation);
6068 if (assignedInterval2 != nullptr)
6070 // Case 3: LoReg->assignedInterval == nullptr && HiReg->assignedInterval != nullptr
6071 if (!assignedInterval2->isActive)
6073 checkReferenceAt(recentAssignedRef2, refLocation);
6079 // Case 4: LoReg->assignedInterval == nullptr && HiReg->assignedInterval == nullptr
6080 // This should not happen, because we already handled this case above.
6085 RefPosition* recentAssignedRef = assignedInterval->recentRefPosition;
6086 if (!assignedInterval->isActive)
6088 // The assigned interval has a reference at this location - otherwise, we would have found
6089 // this in tryAllocateFreeReg().
6090 // Note that we may or may not have actually handled the reference yet, so it could either
6091 // be recentAssignedRef, or the next reference.
6092 assert(recentAssignedRef != nullptr);
6093 if (recentAssignedRef->nodeLocation != refLocation)
6095 if (recentAssignedRef->nodeLocation + 1 == refLocation)
6097 assert(recentAssignedRef->delayRegFree);
6101 RefPosition* nextAssignedRef = recentAssignedRef->nextRefPosition;
6102 assert(nextAssignedRef != nullptr);
6103 assert(nextAssignedRef->nodeLocation == refLocation ||
6104 (nextAssignedRef->nodeLocation + 1 == refLocation && nextAssignedRef->delayRegFree));
6110 // If we have a recentAssignedRef, check that it is going to be OK to spill it
6112 // TODO-Review: Under what conditions would recentAssignedRef be null?
6113 unsigned recentAssignedRefWeight = BB_ZERO_WEIGHT;
6115 if (!canSpillThisReg(physRegRecord, recentAssignedRef, physRegRecord2, recentAssignedRef2, refLocation,
6116 &recentAssignedRefWeight, farthestRefPosWeight))
6120 #else // !_TARGET_ARM_
6121 if (recentAssignedRef != nullptr)
6123 if (recentAssignedRef->nodeLocation == refLocation)
6125 // We can't spill a register that's being used at the current location
6126 RefPosition* physRegRef = physRegRecord->recentRefPosition;
6130 // If the current position has the candidate register marked to be delayed,
6131 // check if the previous location is using this register; if that's the case we have to skip it
6132 // since we can't spill this register.
6133 if (recentAssignedRef->delayRegFree && (refLocation == recentAssignedRef->nodeLocation + 1))
6138 // We prefer not to spill a register if the weight of recentAssignedRef > weight
6139 // of the spill candidate found so far. We would consider spilling a greater weight
6140 // ref position only if the refPosition being allocated must have a register.
6141 recentAssignedRefWeight = getWeight(recentAssignedRef);
6142 if (recentAssignedRefWeight > farthestRefPosWeight)
6147 #endif // !_TARGET_ARM_
6149 RefPosition* nextRefPosition = nullptr;
6150 LsraLocation nextLocation = MinLocation;
6152 if (assignedInterval != nullptr)
6155 nextRefPosition = assignedInterval->getNextRefPosition();
6156 nextLocation = assignedInterval->getNextRefLocation();
6158 // We should never spill a register that's occupied by an Interval with its next use at the current location.
6160 // Normally this won't occur (unless we actually had more uses in a single node than there are registers),
6161 // because we'll always find something with a later nextLocation, but it can happen in stress when
6162 // we have LSRA_SELECT_NEAREST.
6163 if ((nextLocation == refLocation) && !refPosition->isFixedRegRef && nextRefPosition->RequiresRegister())
6170 if (assignedInterval2 != nullptr)
6172 RefPosition* nextRefPosition2 = assignedInterval2->getNextRefPosition();
6173 LsraLocation nextLocation2 = assignedInterval2->getNextRefLocation();
6174 if ((nextLocation2 == refLocation) && !refPosition->isFixedRegRef && nextRefPosition2->RequiresRegister())
6178 nextLocation = (nextLocation > nextLocation2) ? nextLocation : nextLocation2;
6181 if (nextLocation > physRegNextLocation)
6183 nextLocation = physRegNextLocation;
6186 bool isBetterLocation;
6189 if (doSelectNearest() && farthestRefPhysRegRecord != nullptr)
6191 isBetterLocation = (nextLocation <= farthestLocation);
6195 // This if-stmt is associated with the above else
6196 if (recentAssignedRefWeight < farthestRefPosWeight)
6198 isBetterLocation = true;
6202 // This would mean the weight of the spill ref position we found so far is equal
6203 // to the weight of the ref position that is being evaluated. In this case
6204 // we prefer to spill the ref position whose distance to its next reference is the farthest.
6206 assert(recentAssignedRefWeight == farthestRefPosWeight);
6208 // If allocateIfProfitable=true, the first spill candidate selected
6209 // will be based on weight alone. After we have found a spill
6210 // candidate whose weight is less than the 'refPosition', we will
6211 // consider farthest distance when there is a tie in weights.
6212 // This is to ensure that we don't spill a ref position whose
6213 // weight is equal to weight of 'refPosition'.
6214 if (allocateIfProfitable && farthestRefPhysRegRecord == nullptr)
6216 isBetterLocation = false;
6220 isBetterLocation = (nextLocation > farthestLocation);
6222 if (nextLocation > farthestLocation)
6224 isBetterLocation = true;
6226 else if (nextLocation == farthestLocation)
6228 // Both weight and distance are equal.
6229 // Prefer that ref position which is marked both reload and
6230 // allocate-if-profitable. These ref positions don't need
6231 // to be spilled as they are already in memory and
6232 // codegen considers them as contained memory operands.
6233 CLANG_FORMAT_COMMENT_ANCHOR;
6235 // TODO-CQ-ARM: Just conservatively "and" the two conditions. We may implement a better condition later.
6236 isBetterLocation = true;
6237 if (recentAssignedRef != nullptr)
6238 isBetterLocation &= (recentAssignedRef->reload && recentAssignedRef->AllocateIfProfitable());
6240 if (recentAssignedRef2 != nullptr)
6241 isBetterLocation &= (recentAssignedRef2->reload && recentAssignedRef2->AllocateIfProfitable());
6243 isBetterLocation = (recentAssignedRef != nullptr) && recentAssignedRef->reload &&
6244 recentAssignedRef->AllocateIfProfitable();
6249 isBetterLocation = false;
6254 if (isBetterLocation)
6256 farthestLocation = nextLocation;
6257 farthestRefPhysRegRecord = physRegRecord;
6259 farthestRefPhysRegRecord2 = physRegRecord2;
6261 farthestRefPosWeight = recentAssignedRefWeight;
6266 if (allocateIfProfitable)
6268 // There may not be a spill candidate, or if one is found,
6269 // its weight must be less than the weight of 'refPosition'.
6270 assert((farthestRefPhysRegRecord == nullptr) || (farthestRefPosWeight < getWeight(refPosition)));
6274 // Must have found a spill candidate.
6275 assert(farthestRefPhysRegRecord != nullptr);
6276 if ((farthestLocation == refLocation) && !refPosition->isFixedRegRef)
6279 Interval* assignedInterval =
6280 (farthestRefPhysRegRecord == nullptr) ? nullptr : farthestRefPhysRegRecord->assignedInterval;
6281 Interval* assignedInterval2 =
6282 (farthestRefPhysRegRecord2 == nullptr) ? nullptr : farthestRefPhysRegRecord2->assignedInterval;
6283 RefPosition* nextRefPosition =
6284 (assignedInterval == nullptr) ? nullptr : assignedInterval->getNextRefPosition();
6285 RefPosition* nextRefPosition2 =
6286 (assignedInterval2 == nullptr) ? nullptr : assignedInterval2->getNextRefPosition();
6287 if (nextRefPosition != nullptr)
6288 if (nextRefPosition2 != nullptr)
6289 assert(!nextRefPosition->RequiresRegister() || !nextRefPosition2->RequiresRegister());
6291 assert(!nextRefPosition->RequiresRegister());
6294 assert(nextRefPosition2 != nullptr && !nextRefPosition2->RequiresRegister());
6297 Interval* assignedInterval = farthestRefPhysRegRecord->assignedInterval;
6298 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
6299 assert(!nextRefPosition->RequiresRegister());
6304 assert(farthestLocation > refLocation || refPosition->isFixedRegRef);
6309 if (farthestRefPhysRegRecord != nullptr)
6311 foundReg = farthestRefPhysRegRecord->regNum;
6313 if (genIsValidDoubleReg(foundReg))
6315 // For a double register, we have the following four cases.
6316 // Case 1: farthestRefPhysRegRecord is assigned to TYP_DOUBLE interval
6317 // Case 2: farthestRefPhysRegRecord and farthestRefPhysRegRecord2 are assigned to
6318 // different TYP_FLOAT intervals
6319 // Case 3: farthestRefPhysRegRecord is assigned to a TYP_FLOAT interval
6320 // and farthestRefPhysRegRecord2 is nullptr
6321 // Case 4: farthestRefPhysRegRecord is nullptr, and farthestRefPhysRegRecord2 is
6322 // assigned to a TYP_FLOAT interval
6323 if (farthestRefPhysRegRecord->assignedInterval != nullptr)
6325 if (farthestRefPhysRegRecord->assignedInterval->registerType == TYP_DOUBLE)
6327 // Case 1: farthestRefPhysRegRecord is assigned to TYP_DOUBLE interval
6328 unassignPhysReg(farthestRefPhysRegRecord,
6329 farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
6333 // Case 2: farthestRefPhysRegRecord and farthestRefPhysRegRecord2 are assigned to
6334 // different TYP_FLOAT intervals
6335 // Case 3: farthestRefPhysRegRecord is assigned to a TYP_FLOAT interval
6336 // and farthestRefPhysRegRecord2 is nullptr
6337 unassignPhysReg(farthestRefPhysRegRecord,
6338 farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
6339 if (farthestRefPhysRegRecord2 != nullptr)
6340 unassignPhysReg(farthestRefPhysRegRecord2,
6341 farthestRefPhysRegRecord2->assignedInterval->recentRefPosition);
6346 // Case 4: farthestRefPhysRegRecord is nullptr, and farthestRefPhysRegRecord2 is
6347 // assigned to a TYP_FLOAT interval
6348 assert(farthestRefPhysRegRecord2->assignedInterval != nullptr);
6349 assert(farthestRefPhysRegRecord2->assignedInterval->registerType == TYP_FLOAT);
6350 unassignPhysReg(farthestRefPhysRegRecord2,
6351 farthestRefPhysRegRecord2->assignedInterval->recentRefPosition);
6356 unassignPhysReg(farthestRefPhysRegRecord, farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
6359 unassignPhysReg(farthestRefPhysRegRecord, farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
6361 assignPhysReg(farthestRefPhysRegRecord, current);
6362 refPosition->registerAssignment = genRegMask(foundReg);
6367 refPosition->registerAssignment = RBM_NONE;
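// Illustrative sketch (not part of the allocator): the two-key candidate comparison used by
// allocateBusyReg() above - lowest weight first, then farthest next reference. The real code
// additionally breaks ties in favor of reload/reg-optional RefPositions and the previous
// assignment; the names below are invented simplifications for illustration only.
namespace LsraBusyRegSketch
{
    struct SpillCandidateSketch
    {
        unsigned weight;       // weight of the most recent RefPosition occupying the register
        unsigned nextLocation; // location of the register's next reference
    };

    // Returns true if 'candidate' is a better register to spill than 'best'.
    inline bool isBetterSpillCandidate(const SpillCandidateSketch& candidate, const SpillCandidateSketch& best)
    {
        if (candidate.weight != best.weight)
        {
            return candidate.weight < best.weight; // lower weight is cheaper to spill
        }
        return candidate.nextLocation > best.nextLocation; // a farther next use keeps the spill out of the way longer
    }
}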
6373 // Grab a register to use to copy and then immediately use.
6374 // This is called only for localVar intervals that already have a register
6375 // assignment that is not compatible with the current RefPosition.
6376 // This is not like regular assignment, because we don't want to change
6377 // any preferences or existing register assignments.
6378 // Prefer a free register that's got the earliest next use.
6379 // Otherwise, spill something with the farthest next use
6381 regNumber LinearScan::assignCopyReg(RefPosition* refPosition)
6383 Interval* currentInterval = refPosition->getInterval();
6384 assert(currentInterval != nullptr);
6385 assert(currentInterval->isActive);
6387 bool foundFreeReg = false;
6388 RegRecord* bestPhysReg = nullptr;
6389 LsraLocation bestLocation = MinLocation;
6390 regMaskTP candidates = refPosition->registerAssignment;
6392 // Save the relatedInterval, if any, so that it doesn't get modified during allocation.
6393 Interval* savedRelatedInterval = currentInterval->relatedInterval;
6394 currentInterval->relatedInterval = nullptr;
6396 // We don't really want to change the default assignment,
6397 // so 1) pretend this isn't active, and 2) remember the old reg
6398 regNumber oldPhysReg = currentInterval->physReg;
6399 RegRecord* oldRegRecord = currentInterval->assignedReg;
6400 assert(oldRegRecord->regNum == oldPhysReg);
6401 currentInterval->isActive = false;
6403 regNumber allocatedReg = tryAllocateFreeReg(currentInterval, refPosition);
6404 if (allocatedReg == REG_NA)
6406 allocatedReg = allocateBusyReg(currentInterval, refPosition, false);
6409 // Now restore the old info
6410 currentInterval->relatedInterval = savedRelatedInterval;
6411 currentInterval->physReg = oldPhysReg;
6412 currentInterval->assignedReg = oldRegRecord;
6413 currentInterval->isActive = true;
6415 refPosition->copyReg = true;
6416 return allocatedReg;
6419 // Check if the interval is already assigned and if it is then unassign the physical record
6420 // then set the assignedInterval to 'interval'
6422 void LinearScan::checkAndAssignInterval(RegRecord* regRec, Interval* interval)
6424 if (regRec->assignedInterval != nullptr && regRec->assignedInterval != interval)
6426 // This is allocated to another interval. Either it is inactive, or it was allocated as a
6427 // copyReg and is therefore not the "assignedReg" of the other interval. In the latter case,
6428 // we simply unassign it - in the former case we need to set the physReg on the interval to
6429 // REG_NA to indicate that it is no longer in that register.
6430 // The lack of checking for this case resulted in an assert in the retail version of System.dll,
6431 // in method SerialStream.GetDcbFlag.
6432 // Note that we can't check for the copyReg case, because we may have seen a more recent
6433 // RefPosition for the Interval that was NOT a copyReg.
6434 if (regRec->assignedInterval->assignedReg == regRec)
6436 assert(regRec->assignedInterval->isActive == false);
6437 regRec->assignedInterval->physReg = REG_NA;
6439 unassignPhysReg(regRec->regNum);
6442 updateAssignedInterval(regRec, interval, interval->registerType);
6445 // Assign the given physical register interval to the given interval
6446 void LinearScan::assignPhysReg(RegRecord* regRec, Interval* interval)
6448 regMaskTP assignedRegMask = genRegMask(regRec->regNum);
6449 compiler->codeGen->regSet.rsSetRegsModified(assignedRegMask DEBUGARG(dumpTerse));
6451 checkAndAssignInterval(regRec, interval);
6452 interval->assignedReg = regRec;
6454 interval->physReg = regRec->regNum;
6455 interval->isActive = true;
6456 if (interval->isLocalVar)
6458 // Prefer this register for future references
6459 interval->updateRegisterPreferences(assignedRegMask);
6463 //------------------------------------------------------------------------
6464 // setIntervalAsSplit: Set this Interval as being split
6467 // interval - The Interval which is being split
6473 // The given Interval will be marked as split, and it will be added to the
6474 // set of splitOrSpilledVars.
6477 // "interval" must be a lclVar interval, as tree temps are never split.
6478 // This is asserted in the call to getVarIndex().
6480 void LinearScan::setIntervalAsSplit(Interval* interval)
6482 if (interval->isLocalVar)
6484 unsigned varIndex = interval->getVarIndex(compiler);
6485 if (!interval->isSplit)
6487 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6491 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6494 interval->isSplit = true;
6497 //------------------------------------------------------------------------
6498 // setIntervalAsSpilled: Set this Interval as being spilled
6501 // interval - The Interval which is being spilled
6507 // The given Interval will be marked as spilled, and it will be added
6508 // to the set of splitOrSpilledVars.
6510 void LinearScan::setIntervalAsSpilled(Interval* interval)
6512 if (interval->isLocalVar)
6514 unsigned varIndex = interval->getVarIndex(compiler);
6515 if (!interval->isSpilled)
6517 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6521 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6524 interval->isSpilled = true;
6527 //------------------------------------------------------------------------
6528 // spillInterval: Spill this Interval between "fromRefPosition" and "toRefPosition"
6531 // fromRefPosition - The RefPosition at which the Interval is to be spilled
6532 // toRefPosition - The RefPosition at which it must be reloaded
6538 // fromRefPosition and toRefPosition must not be null
6540 void LinearScan::spillInterval(Interval* interval, RefPosition* fromRefPosition, RefPosition* toRefPosition)
6542 assert(fromRefPosition != nullptr && toRefPosition != nullptr);
6543 assert(fromRefPosition->getInterval() == interval && toRefPosition->getInterval() == interval);
6544 assert(fromRefPosition->nextRefPosition == toRefPosition);
6546 if (!fromRefPosition->lastUse)
6548 // Lcl var def/use ref positions should be marked as spillAfter even if they are
6549 // reg-optional and were not allocated a register.
6550 if (!fromRefPosition->RequiresRegister() && !(interval->isLocalVar && fromRefPosition->IsActualRef()))
6552 fromRefPosition->registerAssignment = RBM_NONE;
6556 fromRefPosition->spillAfter = true;
6559 assert(toRefPosition != nullptr);
6564 dumpLsraAllocationEvent(LSRA_EVENT_SPILL, interval);
6568 INTRACK_STATS(updateLsraStat(LSRA_STAT_SPILL, fromRefPosition->bbNum));
6570 interval->isActive = false;
6571 setIntervalAsSpilled(interval);
6573 // If fromRefPosition occurs before the beginning of this block, mark this as living on the stack
6574 // on entry to this block.
6575 if (fromRefPosition->nodeLocation <= curBBStartLocation)
6577 // This must be a lclVar interval
6578 assert(interval->isLocalVar);
6579 setInVarRegForBB(curBBNum, interval->varNum, REG_STK);
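// Illustrative sketch (not part of the allocator): the decision spillInterval() makes above for a
// non-last-use RefPosition. A reg-optional tree temp simply loses its register assignment, while
// anything that genuinely needs the value later is marked spillAfter. The helper name and boolean
// parameters are invented simplifications for illustration only.
inline bool shouldMarkSpillAfterSketch(bool requiresRegister, bool isLocalVar, bool isActualRef)
{
    // Lcl var def/use ref positions are marked spillAfter even when reg-optional.
    return requiresRegister || (isLocalVar && isActualRef);
}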
6583 //------------------------------------------------------------------------
6584 // unassignPhysRegNoSpill: Unassign the given physical register record from
6585 // an active interval, without spilling.
6588 // regRec - the RegRecord to be unassigned
6594 // The assignedInterval must not be null, and must be active.
6597 // This method is used to unassign a register when an interval needs to be moved to a
6598 // different register, but not (yet) spilled.
6600 void LinearScan::unassignPhysRegNoSpill(RegRecord* regRec)
6602 Interval* assignedInterval = regRec->assignedInterval;
6603 assert(assignedInterval != nullptr && assignedInterval->isActive);
6604 assignedInterval->isActive = false;
6605 unassignPhysReg(regRec, nullptr);
6606 assignedInterval->isActive = true;
6609 //------------------------------------------------------------------------
6610 // checkAndClearInterval: Clear the assignedInterval for the given
6611 // physical register record
6614 // regRec - the physical RegRecord to be unassigned
6615 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6616 // or nullptr if we aren't spilling
6622 // see unassignPhysReg
6624 void LinearScan::checkAndClearInterval(RegRecord* regRec, RefPosition* spillRefPosition)
6626 Interval* assignedInterval = regRec->assignedInterval;
6627 assert(assignedInterval != nullptr);
6628 regNumber thisRegNum = regRec->regNum;
6630 if (spillRefPosition == nullptr)
6632 // Note that we can't assert for the copyReg case
6634 if (assignedInterval->physReg == thisRegNum)
6636 assert(assignedInterval->isActive == false);
6641 assert(spillRefPosition->getInterval() == assignedInterval);
6644 updateAssignedInterval(regRec, nullptr, assignedInterval->registerType);
6647 //------------------------------------------------------------------------
6648 // unassignPhysReg: Unassign the given physical register record, and spill the
6649 // assignedInterval at the given spillRefPosition, if any.
6652 // regRec - the RegRecord to be unassigned
6653 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6659 // The assignedInterval must not be null.
6660 // If spillRefPosition is null, the assignedInterval must be inactive, or not currently
6661 // assigned to this register (e.g. this is a copyReg for that Interval).
6662 // Otherwise, spillRefPosition must be associated with the assignedInterval.
6664 void LinearScan::unassignPhysReg(RegRecord* regRec, RefPosition* spillRefPosition)
6666 Interval* assignedInterval = regRec->assignedInterval;
6667 assert(assignedInterval != nullptr);
6669 regNumber thisRegNum = regRec->regNum;
6672 regNumber nextRegNum = REG_NA;
6673 RegRecord* nextRegRec = nullptr;
6675 // Prepare second half RegRecord of a double register for TYP_DOUBLE
6676 if (assignedInterval->registerType == TYP_DOUBLE)
6678 assert(isFloatRegType(regRec->registerType));
6679 assert(genIsValidDoubleReg(regRec->regNum));
6681 nextRegNum = REG_NEXT(regRec->regNum);
6682 nextRegRec = getRegisterRecord(nextRegNum);
6684 // Both RegRecords should have been assigned to the same interval.
6685 assert(assignedInterval == nextRegRec->assignedInterval);
6687 #endif // _TARGET_ARM_
6689 checkAndClearInterval(regRec, spillRefPosition);
6692 if (assignedInterval->registerType == TYP_DOUBLE)
6694 // Both RegRecords should have been unassigned together.
6695 assert(regRec->assignedInterval == nullptr);
6696 assert(nextRegRec->assignedInterval == nullptr);
6698 #endif // _TARGET_ARM_
6701 if (VERBOSE && !dumpTerse)
6703 printf("unassigning %s: ", getRegName(regRec->regNum));
6704 assignedInterval->dump();
6709 RefPosition* nextRefPosition = nullptr;
6710 if (spillRefPosition != nullptr)
6712 nextRefPosition = spillRefPosition->nextRefPosition;
6715 if (assignedInterval->physReg != REG_NA && assignedInterval->physReg != thisRegNum)
6717 // This must have been a temporary copy reg, but we can't assert that because there
6718 // may have been intervening RefPositions that were not copyRegs.
6720 // reg->assignedInterval has already been set to nullptr by checkAndClearInterval()
6721 assert(regRec->assignedInterval == nullptr);
6725 regNumber victimAssignedReg = assignedInterval->physReg;
6726 assignedInterval->physReg = REG_NA;
6728 bool spill = assignedInterval->isActive && nextRefPosition != nullptr;
6731 // If this is an active interval, it must have a recentRefPosition,
6732 // otherwise it would not be active
6733 assert(spillRefPosition != nullptr);
6736 // TODO-CQ: Enable this and insert an explicit GT_COPY (otherwise there's no way to communicate
6737 // to codegen that we want the copyReg to be the new home location).
6738 // If the last reference was a copyReg, and we're spilling the register
6739 // it was copied from, then make the copyReg the new primary location
6741 if (spillRefPosition->copyReg)
6743 regNumber copyFromRegNum = victimAssignedReg;
6744 regNumber copyRegNum = genRegNumFromMask(spillRefPosition->registerAssignment);
6745 if (copyFromRegNum == thisRegNum &&
6746 getRegisterRecord(copyRegNum)->assignedInterval == assignedInterval)
6748 assert(copyRegNum != thisRegNum);
6749 assignedInterval->physReg = copyRegNum;
6750 assignedInterval->assignedReg = this->getRegisterRecord(copyRegNum);
6756 // With JitStressRegs == 0x80 (LSRA_EXTEND_LIFETIMES), we may have a RefPosition
6757 // that is not marked lastUse even though the treeNode is a lastUse. In that case
6758 // we must not mark it for spill because the register will have been immediately freed
6759 // after use. While we could conceivably add special handling for this case in codegen,
6760 // it would be messy and undesirably cause the "bleeding" of LSRA stress modes outside of LSRA.
6762 if (extendLifetimes() && assignedInterval->isLocalVar && RefTypeIsUse(spillRefPosition->refType) &&
6763 spillRefPosition->treeNode != nullptr && (spillRefPosition->treeNode->gtFlags & GTF_VAR_DEATH) != 0)
6765 dumpLsraAllocationEvent(LSRA_EVENT_SPILL_EXTENDED_LIFETIME, assignedInterval);
6766 assignedInterval->isActive = false;
6768 // If the spillRefPosition occurs before the beginning of this block, it will have
6769 // been marked as living in this register on entry to this block, but we now need
6770 // to mark this as living on the stack.
6771 if (spillRefPosition->nodeLocation <= curBBStartLocation)
6773 setInVarRegForBB(curBBNum, assignedInterval->varNum, REG_STK);
6774 if (spillRefPosition->nextRefPosition != nullptr)
6776 setIntervalAsSpilled(assignedInterval);
6781 // Otherwise, we need to mark spillRefPosition as lastUse, or the interval
6782 // will remain active beyond its allocated range during the resolution phase.
6783 spillRefPosition->lastUse = true;
6789 spillInterval(assignedInterval, spillRefPosition, nextRefPosition);
6792 // Maintain the association with the interval, if it has more references.
6793 // Or, if we "remembered" an interval assigned to this register, restore it.
6794 if (nextRefPosition != nullptr)
6796 assignedInterval->assignedReg = regRec;
6798 else if (canRestorePreviousInterval(regRec, assignedInterval))
6800 regRec->assignedInterval = regRec->previousInterval;
6801 regRec->previousInterval = nullptr;
6805 // We cannot use updateAssignedInterval() and updatePreviousInterval() here,
6806 // because regRec may not be an even-numbered float register.
6808 // Update second half RegRecord of a double register for TYP_DOUBLE
6809 if (regRec->assignedInterval->registerType == TYP_DOUBLE)
6811 RegRecord* anotherHalfRegRec = findAnotherHalfRegRec(regRec);
6813 anotherHalfRegRec->assignedInterval = regRec->assignedInterval;
6814 anotherHalfRegRec->previousInterval = nullptr;
6816 #endif // _TARGET_ARM_
6821 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL, regRec->assignedInterval,
6826 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL, regRec->assignedInterval, thisRegNum);
6832 updateAssignedInterval(regRec, nullptr, assignedInterval->registerType);
6833 updatePreviousInterval(regRec, nullptr, assignedInterval->registerType);
6837 //------------------------------------------------------------------------
6838 // spillGCRefs: Spill any GC-type intervals that are currently in registers.
6841 // killRefPosition - The RefPosition for the kill
6846 void LinearScan::spillGCRefs(RefPosition* killRefPosition)
6848 // For each physical register that can hold a GC type,
6849 // if it is occupied by an interval of a GC type, spill that interval.
6850 regMaskTP candidateRegs = killRefPosition->registerAssignment;
6851 while (candidateRegs != RBM_NONE)
6853 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6854 candidateRegs &= ~nextRegBit;
6855 regNumber nextReg = genRegNumFromMask(nextRegBit);
6856 RegRecord* regRecord = getRegisterRecord(nextReg);
6857 Interval* assignedInterval = regRecord->assignedInterval;
6858 if (assignedInterval == nullptr || (assignedInterval->isActive == false) ||
6859 !varTypeIsGC(assignedInterval->registerType))
6863 unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
6865 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DONE_KILL_GC_REFS, nullptr, REG_NA, nullptr));
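// Illustrative sketch (not part of the allocator): the mask-walking pattern used by spillGCRefs()
// above. genFindLowestBit() isolates the lowest set bit of a register mask; clearing that bit on
// each iteration visits every candidate register exactly once. The mask is modeled here as a plain
// unsigned long long, and the helper name is invented for illustration only.
inline unsigned countCandidateRegsSketch(unsigned long long candidateRegs)
{
    unsigned count = 0;
    while (candidateRegs != 0)
    {
        unsigned long long nextRegBit = candidateRegs & (~candidateRegs + 1); // isolate the lowest set bit
        candidateRegs &= ~nextRegBit;                                         // clear it so the loop terminates
        count++; // ... a real walker would examine the register corresponding to nextRegBit here ...
    }
    return count;
}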
6868 //------------------------------------------------------------------------
6869 // processBlockEndAllocation: Update var locations after 'currentBlock' has been allocated
6872 // currentBlock - the BasicBlock we have just finished allocating registers for
6878 // Calls processBlockEndLocations() to set the outVarToRegMap, then gets the next block,
6879 // and sets the inVarToRegMap appropriately.
6881 void LinearScan::processBlockEndAllocation(BasicBlock* currentBlock)
6883 assert(currentBlock != nullptr);
6884 if (enregisterLocalVars)
6886 processBlockEndLocations(currentBlock);
6888 markBlockVisited(currentBlock);
6890 // Get the next block to allocate.
6891 // When the last block in the method has successors, there will be a final "RefTypeBB" to
6892 // ensure that we get the varToRegMap set appropriately, but in that case we don't need
6893 // to worry about "nextBlock".
6894 BasicBlock* nextBlock = getNextBlock();
6895 if (nextBlock != nullptr)
6897 processBlockStartLocations(nextBlock, true);
6901 //------------------------------------------------------------------------
6902 // rotateBlockStartLocation: When in the LSRA_BLOCK_BOUNDARY_ROTATE stress mode, attempt to
6903 // "rotate" the register assignment for a localVar to the next higher
6904 // register that is available.
6907 // interval - the Interval for the variable whose register is getting rotated
6908 // targetReg - its register assignment from the predecessor block being used for live-in
6909 // availableRegs - registers available for use
6912 // The new register to use.
6915 regNumber LinearScan::rotateBlockStartLocation(Interval* interval, regNumber targetReg, regMaskTP availableRegs)
6917 if (targetReg != REG_STK && getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE)
6919 // If we're rotating the register locations at block boundaries, try to use
6920 // the next higher register number of the appropriate register type.
6921 regMaskTP candidateRegs = allRegs(interval->registerType) & availableRegs;
6922 regNumber firstReg = REG_NA;
6923 regNumber newReg = REG_NA;
6924 while (candidateRegs != RBM_NONE)
6926 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6927 candidateRegs &= ~nextRegBit;
6928 regNumber nextReg = genRegNumFromMask(nextRegBit);
6929 if (nextReg > targetReg)
6934 else if (firstReg == REG_NA)
6939 if (newReg == REG_NA)
6941 assert(firstReg != REG_NA);
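// Illustrative sketch (not part of the allocator): the "rotate to the next higher register"
// selection performed by rotateBlockStartLocation() above, using a plain bitmask and small integer
// register indices in place of regMaskTP/regNumber. The helper name is invented for illustration only.
inline int rotateRegSketch(int targetReg, unsigned long long availableRegs)
{
    int firstReg = -1;
    for (int reg = 0; reg < 64; reg++) // walk candidates from the lowest register number upward
    {
        if ((availableRegs & (1ull << reg)) == 0)
        {
            continue;
        }
        if (reg > targetReg)
        {
            return reg; // the next higher available register
        }
        if (firstReg == -1)
        {
            firstReg = reg; // remember the lowest candidate in case we need to wrap around
        }
    }
    return firstReg; // wrap around to the lowest candidate (or -1 if none was available)
}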
6951 //--------------------------------------------------------------------------------------
6952 // isSecondHalfReg: Test if regRec is the second half of a double register
6953 // which is assigned to an interval.
6956 // regRec - a register to be tested
6957 // interval - an interval which is assigned to some register
6963 // True only if regRec is the second half of the assignedReg of interval
6965 bool LinearScan::isSecondHalfReg(RegRecord* regRec, Interval* interval)
6967 RegRecord* assignedReg = interval->assignedReg;
6969 if (assignedReg != nullptr && interval->registerType == TYP_DOUBLE)
6971 // interval should have been allocated to a valid double register
6972 assert(genIsValidDoubleReg(assignedReg->regNum));
6974 // Find a second half RegRecord of double register
6975 regNumber firstRegNum = assignedReg->regNum;
6976 regNumber secondRegNum = REG_NEXT(firstRegNum);
6978 assert(genIsValidFloatReg(secondRegNum) && !genIsValidDoubleReg(secondRegNum));
6980 RegRecord* secondRegRec = getRegisterRecord(secondRegNum);
6982 return secondRegRec == regRec;
6988 //------------------------------------------------------------------------------------------
6989 // findAnotherHalfRegRec: Find the other half RegRecord that forms the same ARM32 double register
6992 // regRec - A float RegRecord
6998 // The RegRecord that forms the same double register as regRec
7000 RegRecord* LinearScan::findAnotherHalfRegRec(RegRecord* regRec)
7002 regNumber anotherHalfRegNum;
7003 RegRecord* anotherHalfRegRec;
7005 assert(genIsValidFloatReg(regRec->regNum));
7007 // Find the other half register for a TYP_DOUBLE interval,
7008 // following the same logic as in canRestorePreviousInterval().
7009 if (genIsValidDoubleReg(regRec->regNum))
7011 anotherHalfRegNum = REG_NEXT(regRec->regNum);
7012 assert(!genIsValidDoubleReg(anotherHalfRegNum));
7016 anotherHalfRegNum = REG_PREV(regRec->regNum);
7017 assert(genIsValidDoubleReg(anotherHalfRegNum));
7019 anotherHalfRegRec = getRegisterRecord(anotherHalfRegNum);
7021 return anotherHalfRegRec;
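// Illustrative sketch (not part of the allocator): on ARM32 a double register occupies an
// even/odd pair of float registers, so the "other half" of float register N is N+1 when N is even
// (the half that genIsValidDoubleReg() accepts) and N-1 when N is odd. Register numbers are plain
// integers here, and the helper name is invented for illustration only.
inline int otherHalfFloatRegSketch(int floatRegIndex)
{
    return (floatRegIndex % 2 == 0) ? (floatRegIndex + 1) : (floatRegIndex - 1);
}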
7025 //--------------------------------------------------------------------------------------
7026 // canRestorePreviousInterval: Test if we can restore previous interval
7029 // regRec - a register which contains previous interval to be restored
7030 // assignedInterval - an interval just unassigned
7036 // True only if previous interval of regRec can be restored
7038 bool LinearScan::canRestorePreviousInterval(RegRecord* regRec, Interval* assignedInterval)
7041 (regRec->previousInterval != nullptr && regRec->previousInterval != assignedInterval &&
7042 regRec->previousInterval->assignedReg == regRec && regRec->previousInterval->getNextRefPosition() != nullptr);
7045 if (retVal && regRec->previousInterval->registerType == TYP_DOUBLE)
7047 RegRecord* anotherHalfRegRec = findAnotherHalfRegRec(regRec);
7049 retVal = retVal && anotherHalfRegRec->assignedInterval == nullptr;
7056 bool LinearScan::isAssignedToInterval(Interval* interval, RegRecord* regRec)
7058 bool isAssigned = (interval->assignedReg == regRec);
7060 isAssigned |= isSecondHalfReg(regRec, interval);
7065 //------------------------------------------------------------------------
7066 // processBlockStartLocations: Update var locations on entry to 'currentBlock' and clear constant registers.
7070 // currentBlock - the BasicBlock we are about to allocate registers for
7071 // allocationPass - true if we are currently allocating registers (versus writing them back)
7077 // During the allocation pass, we use the outVarToRegMap of the selected predecessor to
7078 // determine the lclVar locations for the inVarToRegMap.
7079 // During the resolution (write-back) pass, we only modify the inVarToRegMap in cases where
7080 // a lclVar was spilled after the block had been completed.
7081 void LinearScan::processBlockStartLocations(BasicBlock* currentBlock, bool allocationPass)
7083 // If we have no register candidates we should only call this method during allocation.
7085 assert(enregisterLocalVars || allocationPass);
7087 if (!enregisterLocalVars)
7089 // Just clear any constant registers and return.
7090 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7092 RegRecord* physRegRecord = getRegisterRecord(reg);
7093 Interval* assignedInterval = physRegRecord->assignedInterval;
7095 if (assignedInterval != nullptr)
7097 assert(assignedInterval->isConstant);
7098 physRegRecord->assignedInterval = nullptr;
7101 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
7105 unsigned predBBNum = blockInfo[currentBlock->bbNum].predBBNum;
7106 VarToRegMap predVarToRegMap = getOutVarToRegMap(predBBNum);
7107 VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
7108 bool hasCriticalInEdge = blockInfo[currentBlock->bbNum].hasCriticalInEdge;
7110 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
7111 VarSetOps::Intersection(compiler, registerCandidateVars, currentBlock->bbLiveIn));
7113 if (getLsraExtendLifeTimes())
7115 VarSetOps::AssignNoCopy(compiler, currentLiveVars, registerCandidateVars);
7117 // If we are rotating register assignments at block boundaries, we want to make the
7118 // inactive registers available for the rotation.
7119 regMaskTP inactiveRegs = RBM_NONE;
7121 regMaskTP liveRegs = RBM_NONE;
7122 VarSetOps::Iter iter(compiler, currentLiveVars);
7123 unsigned varIndex = 0;
7124 while (iter.NextElem(&varIndex))
7126 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
7127 if (!compiler->lvaTable[varNum].lvLRACandidate)
7131 regNumber targetReg;
7132 Interval* interval = getIntervalForLocalVar(varIndex);
7133 RefPosition* nextRefPosition = interval->getNextRefPosition();
7134 assert(nextRefPosition != nullptr);
7138 targetReg = getVarReg(predVarToRegMap, varIndex);
7140 regNumber newTargetReg = rotateBlockStartLocation(interval, targetReg, (~liveRegs | inactiveRegs));
7141 if (newTargetReg != targetReg)
7143 targetReg = newTargetReg;
7144 setIntervalAsSplit(interval);
7147 setVarReg(inVarToRegMap, varIndex, targetReg);
7149 else // !allocationPass (i.e. resolution/write-back pass)
7151 targetReg = getVarReg(inVarToRegMap, varIndex);
7152 // There are four cases that we need to consider during the resolution pass:
7153 // 1. This variable had a register allocated initially, and it was not spilled in the RefPosition
7154 // that feeds this block. In this case, predVarToRegMap[varIndex] and targetReg will be the same register.
7155 // 2. This variable had not been spilled prior to the end of predBB, but was later spilled, so
7156 // predVarToRegMap[varIndex] will be REG_STK, but targetReg is its former allocated value.
7157 // In this case, we will normally change it to REG_STK. We will update its "spilled" status when we
7158 // encounter it in resolveLocalRef().
7159 // 2a. If the next RefPosition is marked as a copyReg, we need to retain the allocated register. This is
7160 // because the copyReg RefPosition will not have recorded the "home" register, yet downstream
7161 // RefPositions rely on the correct "home" register.
7162 // 3. This variable was spilled before we reached the end of predBB. In this case, both targetReg and
7163 // predVarToRegMap[varIndex] will be REG_STK, and the next RefPosition will have been marked
7164 // as reload during allocation time if necessary (note that by the time we actually reach the next
7165 // RefPosition, we may be using a different predecessor, in which it is still in a register).
7166 // 4. This variable was spilled during the allocation of this block, so targetReg is REG_STK
7167 // (because we set inVarToRegMap at the time we spilled it), but predVarToRegMap[varIndex]
7168 // is not REG_STK. We retain the REG_STK value in the inVarToRegMap.
7169 if (targetReg != REG_STK)
7171 if (getVarReg(predVarToRegMap, varIndex) != REG_STK)
7174 assert(getVarReg(predVarToRegMap, varIndex) == targetReg ||
7175 getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE);
7177 else if (!nextRefPosition->copyReg)
7180 setVarReg(inVarToRegMap, varIndex, REG_STK);
7181 targetReg = REG_STK;
7183 // Else case 2a. - retain targetReg.
7185 // Else case #3 or #4, we retain targetReg and nothing further to do or assert.
7187 if (interval->physReg == targetReg)
7189 if (interval->isActive)
7191 assert(targetReg != REG_STK);
7192 assert(interval->assignedReg != nullptr && interval->assignedReg->regNum == targetReg &&
7193 interval->assignedReg->assignedInterval == interval);
7194 liveRegs |= genRegMask(targetReg);
7198 else if (interval->physReg != REG_NA)
7200 // This can happen if we are using the locations from a basic block other than the
7201 // immediately preceding one - where the variable was in a different location.
7202 if (targetReg != REG_STK)
7204 // Unassign it from the register (it will get a new register below).
7205 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
7207 interval->isActive = false;
7208 unassignPhysReg(getRegisterRecord(interval->physReg), nullptr);
7212 // This interval was live in this register the last time we saw a reference to it,
7213 // but has since been displaced.
7214 interval->physReg = REG_NA;
7217 else if (allocationPass)
7219 // Keep the register assignment - if another var has it, it will get unassigned.
7220 // Otherwise, resolution will fix it up later, and it will be more
7221 // likely to match other assignments this way.
7222 interval->isActive = true;
7223 liveRegs |= genRegMask(interval->physReg);
7224 INDEBUG(inactiveRegs |= genRegMask(interval->physReg));
7225 setVarReg(inVarToRegMap, varIndex, interval->physReg);
7229 interval->physReg = REG_NA;
7232 if (targetReg != REG_STK)
7234 RegRecord* targetRegRecord = getRegisterRecord(targetReg);
7235 liveRegs |= genRegMask(targetReg);
7236 if (!interval->isActive)
7238 interval->isActive = true;
7239 interval->physReg = targetReg;
7240 interval->assignedReg = targetRegRecord;
7242 Interval* assignedInterval = targetRegRecord->assignedInterval;
7243 if (assignedInterval != interval)
7245 // Is there another interval currently assigned to this register? If so unassign it.
7246 if (assignedInterval != nullptr)
7248 if (isAssignedToInterval(assignedInterval, targetRegRecord))
7250 regNumber assignedRegNum = assignedInterval->assignedReg->regNum;
7252 // If the interval is active, it will be set to active when we reach its new
7253 // register assignment (which we must not yet have done, or it wouldn't still be
7254 // assigned to this register).
7255 assignedInterval->isActive = false;
7256 unassignPhysReg(assignedInterval->assignedReg, nullptr);
7257 if (allocationPass && assignedInterval->isLocalVar &&
7258 inVarToRegMap[assignedInterval->getVarIndex(compiler)] == assignedRegNum)
7260 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
7265 // This interval is no longer assigned to this register.
7266 updateAssignedInterval(targetRegRecord, nullptr, assignedInterval->registerType);
7269 assignPhysReg(targetRegRecord, interval);
7271 if (interval->recentRefPosition != nullptr && !interval->recentRefPosition->copyReg &&
7272 interval->recentRefPosition->registerAssignment != genRegMask(targetReg))
7274 interval->getNextRefPosition()->outOfOrder = true;
7279 // Unassign any registers that are no longer live.
7280 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7282 if ((liveRegs & genRegMask(reg)) == 0)
7284 RegRecord* physRegRecord = getRegisterRecord(reg);
7285 Interval* assignedInterval = physRegRecord->assignedInterval;
7287 if (assignedInterval != nullptr)
7289 assert(assignedInterval->isLocalVar || assignedInterval->isConstant);
7291 if (!assignedInterval->isConstant && assignedInterval->assignedReg == physRegRecord)
7293 assignedInterval->isActive = false;
7294 if (assignedInterval->getNextRefPosition() == nullptr)
7296 unassignPhysReg(physRegRecord, nullptr);
7298 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
7302 // This interval may still be active, but was in another register in an
7303 // intervening block.
7304 updateAssignedInterval(physRegRecord, nullptr, assignedInterval->registerType);
7308 if (assignedInterval->registerType == TYP_DOUBLE)
7310 // Skip next float register, because we already addressed a double register
7311 assert(genIsValidDoubleReg(reg));
7312 reg = REG_NEXT(reg);
7314 #endif // _TARGET_ARM_
7320 RegRecord* physRegRecord = getRegisterRecord(reg);
7321 Interval* assignedInterval = physRegRecord->assignedInterval;
7323 if (assignedInterval != nullptr && assignedInterval->registerType == TYP_DOUBLE)
7325 // Skip next float register, because we already addressed a double register
7326 assert(genIsValidDoubleReg(reg));
7327 reg = REG_NEXT(reg);
7330 #endif // _TARGET_ARM_
7332 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
7335 //------------------------------------------------------------------------
7336 // processBlockEndLocations: Record the variables occupying registers after completing the current block.
7339 // currentBlock - the block we have just completed.
7345 // This must be called both during the allocation and resolution (write-back) phases.
7346 // This is because we need to have the outVarToRegMap locations in order to set the locations
7347 // at successor blocks during allocation time, but if lclVars are spilled after a block has been
7348 // completed, we need to record the REG_STK location for those variables at resolution time.
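// For example, a live-out lclVar whose interval is still active here is recorded in the
// outVarToRegMap with its current register; a live-out lclVar whose interval is inactive
// is recorded as REG_STK.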
7350 void LinearScan::processBlockEndLocations(BasicBlock* currentBlock)
7352 assert(currentBlock != nullptr && currentBlock->bbNum == curBBNum);
7353 VarToRegMap outVarToRegMap = getOutVarToRegMap(curBBNum);
7355 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
7356 VarSetOps::Intersection(compiler, registerCandidateVars, currentBlock->bbLiveOut));
7358 if (getLsraExtendLifeTimes())
7360 VarSetOps::Assign(compiler, currentLiveVars, registerCandidateVars);
7363 regMaskTP liveRegs = RBM_NONE;
7364 VarSetOps::Iter iter(compiler, currentLiveVars);
7365 unsigned varIndex = 0;
7366 while (iter.NextElem(&varIndex))
7368 Interval* interval = getIntervalForLocalVar(varIndex);
7369 if (interval->isActive)
7371 assert(interval->physReg != REG_NA && interval->physReg != REG_STK);
7372 setVarReg(outVarToRegMap, varIndex, interval->physReg);
7376 outVarToRegMap[varIndex] = REG_STK;
7379 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_END_BB));
7383 void LinearScan::dumpRefPositions(const char* str)
7385 printf("------------\n");
7386 printf("REFPOSITIONS %s: \n", str);
7387 printf("------------\n");
7388 for (auto& refPos : refPositions)
7395 bool LinearScan::registerIsFree(regNumber regNum, RegisterType regType)
7397 RegRecord* physRegRecord = getRegisterRecord(regNum);
7399 bool isFree = physRegRecord->isFree();
7402 if (isFree && regType == TYP_DOUBLE)
7404 isFree = getRegisterRecord(REG_NEXT(regNum))->isFree();
7406 #endif // _TARGET_ARM_
7411 //------------------------------------------------------------------------
7412 // LinearScan::freeRegister: Make a register available for use
7415 // physRegRecord - the RegRecord for the register to be freed.
7422 // It may be that the RegRecord has already been freed, e.g. due to a kill,
7423 // in which case this method has no effect.
7426 // If there is currently an Interval assigned to this register, and it has
7427 // more references (i.e. this is a local last-use, but more uses and/or
7428 // defs remain), it will remain assigned to the physRegRecord. However, since
7429 // it is marked inactive, the register will be available, albeit less desirable to allocate.
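// In other words, freeing a register only deactivates its interval; the association
// between the RegRecord and a non-constant Interval is severed below only when the
// interval has no further references or its next reference is a def.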
7431 void LinearScan::freeRegister(RegRecord* physRegRecord)
7433 Interval* assignedInterval = physRegRecord->assignedInterval;
7434 // It may have already been freed by a "Kill"
7435 if (assignedInterval != nullptr)
7437 assignedInterval->isActive = false;
7438 // If this interval represents a constant that we may encounter (and reuse) again,
7439 // don't unassign it until we need the register.
7440 if (!assignedInterval->isConstant)
7442 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
7443 // Unassign the register only if there are no more RefPositions, or the next
7444 // one is a def. Note that the latter condition doesn't actually ensure that
7445 // there aren't subsequent uses that could be reached by a def in the assigned
7446 // register, but is merely a heuristic to avoid tying up the register (or using
7447 // it when it's non-optimal). A better alternative would be to use SSA, so that
7448 // we wouldn't unnecessarily link separate live ranges to the same register.
7449 if (nextRefPosition == nullptr || RefTypeIsDef(nextRefPosition->refType))
7452 assert((assignedInterval->registerType != TYP_DOUBLE) || genIsValidDoubleReg(physRegRecord->regNum));
7453 #endif // _TARGET_ARM_
7454 unassignPhysReg(physRegRecord, nullptr);
7460 void LinearScan::freeRegisters(regMaskTP regsToFree)
7462 if (regsToFree == RBM_NONE)
7467 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FREE_REGS));
7468 while (regsToFree != RBM_NONE)
7470 regMaskTP nextRegBit = genFindLowestBit(regsToFree);
7471 regsToFree &= ~nextRegBit;
7472 regNumber nextReg = genRegNumFromMask(nextRegBit);
7473 freeRegister(getRegisterRecord(nextReg));
7477 // Actual register allocation, accomplished by iterating in order over all of the
7478 // previously constructed RefPositions (the references to Intervals and physical registers)
7479 // Loosely based on raAssignVars()
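// High-level flow of the main loop below: walk the RefPositions in order; at block
// boundaries (RefTypeBB/RefTypeDummyDef) record the completed block's locations and set
// up the next block; free registers whose last use has passed; then, for each interval
// reference, either keep the existing assignment, allocate a free register
// (tryAllocateFreeReg), or spill another interval (allocateBusyReg) when a register is
// required.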
7481 void LinearScan::allocateRegisters()
7483 JITDUMP("*************** In LinearScan::allocateRegisters()\n");
7484 DBEXEC(VERBOSE, lsraDumpIntervals("before allocateRegisters"));
7486 // at start, nothing is active except for register args
7487 for (auto& interval : intervals)
7489 Interval* currentInterval = &interval;
7490 currentInterval->recentRefPosition = nullptr;
7491 currentInterval->isActive = false;
7492 if (currentInterval->isLocalVar)
7494 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
7495 if (varDsc->lvIsRegArg && currentInterval->firstRefPosition != nullptr)
7497 currentInterval->isActive = true;
7502 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7504 getRegisterRecord(reg)->recentRefPosition = nullptr;
7505 getRegisterRecord(reg)->isActive = false;
7509 regNumber lastAllocatedReg = REG_NA;
7512 dumpRefPositions("BEFORE ALLOCATION");
7513 dumpVarRefPositions("BEFORE ALLOCATION");
7515 printf("\n\nAllocating Registers\n"
7516 "--------------------\n");
7519 dumpRegRecordHeader();
7520 // Now print an empty indent
7521 printf(indentFormat, "");
7526 BasicBlock* currentBlock = nullptr;
7528 LsraLocation prevLocation = MinLocation;
7529 regMaskTP regsToFree = RBM_NONE;
7530 regMaskTP delayRegsToFree = RBM_NONE;
7532 // This is the most recent RefPosition for which a register was allocated
7533 // - currently only used for DEBUG but maintained in non-debug, for clarity of code
7534 // (and will be optimized away because in non-debug spillAlways() unconditionally returns false)
7535 RefPosition* lastAllocatedRefPosition = nullptr;
7537 bool handledBlockEnd = false;
7539 for (auto& refPosition : refPositions)
7541 RefPosition* currentRefPosition = &refPosition;
7544 // Set the activeRefPosition to null until we're done with any boundary handling.
7545 activeRefPosition = nullptr;
7550 // We're really dumping the RegRecords "after" the previous RefPosition, but it's more convenient
7551 // to do this here, since there are a number of "continue"s in this loop.
7561 // This is the previousRefPosition of the current Referent, if any
7562 RefPosition* previousRefPosition = nullptr;
7564 Interval* currentInterval = nullptr;
7565 Referenceable* currentReferent = nullptr;
7566 bool isInternalRef = false;
7567 RefType refType = currentRefPosition->refType;
7569 currentReferent = currentRefPosition->referent;
7571 if (spillAlways() && lastAllocatedRefPosition != nullptr && !lastAllocatedRefPosition->isPhysRegRef &&
7572 !lastAllocatedRefPosition->getInterval()->isInternal &&
7573 (RefTypeIsDef(lastAllocatedRefPosition->refType) || lastAllocatedRefPosition->getInterval()->isLocalVar))
7575 assert(lastAllocatedRefPosition->registerAssignment != RBM_NONE);
7576 RegRecord* regRecord = lastAllocatedRefPosition->getInterval()->assignedReg;
7577 unassignPhysReg(regRecord, lastAllocatedRefPosition);
7578 // Now set lastAllocatedRefPosition to null, so that we don't try to spill it again
7579 lastAllocatedRefPosition = nullptr;
7582 // We wait to free any registers until we've completed all the
7583 // uses for the current node.
7584 // This avoids reusing registers too soon.
7585 // We free before the last true def (after all the uses & internal
7586 // registers), and then again at the beginning of the next node.
7587 // This is made easier by assigning two LsraLocations per node - one
7588 // for all the uses, internal registers & all but the last def, and
7589 // another for the final def (if any).
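// For example (locations are illustrative): if a node is at location 10, its uses,
// internal registers and all but its last def are at location 10, and its final def is
// at location 11; the next node then begins at location 12.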
7591 LsraLocation currentLocation = currentRefPosition->nodeLocation;
7593 if ((regsToFree | delayRegsToFree) != RBM_NONE)
7595 bool doFreeRegs = false;
7596 // Free at a new location, or at a basic block boundary
7597 if (currentLocation > prevLocation || refType == RefTypeBB)
7604 freeRegisters(regsToFree);
7605 regsToFree = delayRegsToFree;
7606 delayRegsToFree = RBM_NONE;
7609 prevLocation = currentLocation;
7611 // get previous refposition, then current refpos is the new previous
7612 if (currentReferent != nullptr)
7614 previousRefPosition = currentReferent->recentRefPosition;
7615 currentReferent->recentRefPosition = currentRefPosition;
7619 assert((refType == RefTypeBB) || (refType == RefTypeKillGCRefs));
7622 // For the purposes of register resolution, we handle the DummyDefs before
7623 // the block boundary - so the RefTypeBB is after all the DummyDefs.
7624 // However, for the purposes of allocation, we want to handle the block
7625 // boundary first, so that we can free any registers occupied by lclVars
7626 // that aren't live in the next block and make them available for the DummyDefs.
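// (The handling below records the completed block's end locations via
// processBlockEndAllocation and advances to the next block before the DummyDefs and the
// RefTypeBB itself are processed.)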
7629 if (!handledBlockEnd && (refType == RefTypeBB || refType == RefTypeDummyDef))
7631 // Free any delayed regs (now in regsToFree) before processing the block boundary
7632 freeRegisters(regsToFree);
7633 regsToFree = RBM_NONE;
7634 handledBlockEnd = true;
7635 curBBStartLocation = currentRefPosition->nodeLocation;
7636 if (currentBlock == nullptr)
7638 currentBlock = startBlockSequence();
7642 processBlockEndAllocation(currentBlock);
7643 currentBlock = moveToNextBlock();
7646 if (VERBOSE && currentBlock != nullptr && !dumpTerse)
7648 currentBlock->dspBlockHeader(compiler);
7655 activeRefPosition = currentRefPosition;
7660 dumpRefPositionShort(currentRefPosition, currentBlock);
7664 currentRefPosition->dump();
7669 if (refType == RefTypeBB)
7671 handledBlockEnd = false;
7675 if (refType == RefTypeKillGCRefs)
7677 spillGCRefs(currentRefPosition);
7681 // If this is a FixedReg, disassociate any inactive constant interval from this register.
7682 // Otherwise, do nothing.
7683 if (refType == RefTypeFixedReg)
7685 RegRecord* regRecord = currentRefPosition->getReg();
7686 Interval* assignedInterval = regRecord->assignedInterval;
7688 if (assignedInterval != nullptr && !assignedInterval->isActive && assignedInterval->isConstant)
7690 regRecord->assignedInterval = nullptr;
7693 // Update overlapping floating point register for TYP_DOUBLE
7694 if (assignedInterval->registerType == TYP_DOUBLE)
7696 regRecord = getRegisterRecord(REG_NEXT(regRecord->regNum));
7697 assignedInterval = regRecord->assignedInterval;
7699 assert(assignedInterval != nullptr && !assignedInterval->isActive && assignedInterval->isConstant);
7700 regRecord->assignedInterval = nullptr;
7704 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FIXED_REG, nullptr, currentRefPosition->assignedReg()));
7708 // If this is an exposed use, do nothing - this is merely a placeholder to attempt to
7709 // ensure that a register is allocated for the full lifetime. The resolution logic
7710 // will take care of moving to the appropriate register if needed.
7712 if (refType == RefTypeExpUse)
7714 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_EXP_USE));
7718 regNumber assignedRegister = REG_NA;
7720 if (currentRefPosition->isIntervalRef())
7722 currentInterval = currentRefPosition->getInterval();
7723 assignedRegister = currentInterval->physReg;
7725 if (VERBOSE && !dumpTerse)
7727 currentInterval->dump();
7731 // Identify the special cases where we decide up-front not to allocate
7732 bool allocate = true;
7733 bool didDump = false;
7735 if (refType == RefTypeParamDef || refType == RefTypeZeroInit)
7737 // For a ParamDef with a weighted refCount less than unity, don't enregister it at entry.
7738 // TODO-CQ: Consider doing this only for stack parameters, since otherwise we may be needlessly
7739 // inserting a store.
7740 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
7741 assert(varDsc != nullptr);
7742 if (refType == RefTypeParamDef && varDsc->lvRefCntWtd <= BB_UNITY_WEIGHT)
7744 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_ENTRY_REG_ALLOCATED, currentInterval));
7747 setIntervalAsSpilled(currentInterval);
7749 // If it has no actual references, mark it as "lastUse"; since they're not actually part
7750 // of any flow they won't have been marked during dataflow. Otherwise, if we allocate a
7751 // register we won't unassign it.
7752 else if (currentRefPosition->nextRefPosition == nullptr)
7754 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ZERO_REF, currentInterval));
7755 currentRefPosition->lastUse = true;
7759 else if (refType == RefTypeUpperVectorSaveDef || refType == RefTypeUpperVectorSaveUse)
7761 Interval* lclVarInterval = currentInterval->relatedInterval;
7762 if (lclVarInterval->physReg == REG_NA)
7767 #endif // FEATURE_SIMD
7769 if (allocate == false)
7771 if (assignedRegister != REG_NA)
7773 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
7777 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7780 currentRefPosition->registerAssignment = RBM_NONE;
7784 if (currentInterval->isSpecialPutArg)
7786 assert(!currentInterval->isLocalVar);
7787 Interval* srcInterval = currentInterval->relatedInterval;
7788 assert(srcInterval->isLocalVar);
7789 if (refType == RefTypeDef)
7791 assert(srcInterval->recentRefPosition->nodeLocation == currentLocation - 1);
7792 RegRecord* physRegRecord = srcInterval->assignedReg;
7794 // For a putarg_reg to be special, its next use location has to be the same
7795 // as fixed reg's next kill location. Otherwise, if source lcl var's next use
7796 // is after the kill of fixed reg but before putarg_reg's next use, fixed reg's
7797 // kill would lead to spill of source but not the putarg_reg if it were treated as special.
7799 if (srcInterval->isActive &&
7800 genRegMask(srcInterval->physReg) == currentRefPosition->registerAssignment &&
7801 currentInterval->getNextRefLocation() == physRegRecord->getNextRefLocation())
7803 assert(physRegRecord->regNum == srcInterval->physReg);
7805 // A special putarg_reg acts as a pass-thru, since both the source lcl var
7806 // and the putarg_reg have the same register allocated. The physical reg
7807 // record of that reg continues to point to the source lcl var's interval
7808 // instead of to putarg_reg's interval. So if the reg
7809 // allocated to the source lcl var were spilled and reallocated to another
7810 // tree node before its use at the call node, the lcl var would be spilled
7811 // instead of the putarg_reg, since the physical reg record points
7812 // to the lcl var's interval. As a result, the arg reg would get trashed, leading
7813 // to bad codegen. The assumption here is that the source lcl var of a
7814 // special putarg_reg doesn't get spilled and re-allocated prior to
7815 // its use at the call node. This is ensured by marking the physical reg
7816 // record as busy until the next kill.
7817 physRegRecord->isBusyUntilNextKill = true;
7821 currentInterval->isSpecialPutArg = false;
7824 // If this is still a SpecialPutArg, continue;
7825 if (currentInterval->isSpecialPutArg)
7827 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, currentInterval,
7828 currentRefPosition->assignedReg()));
7833 if (assignedRegister == REG_NA && RefTypeIsUse(refType))
7835 currentRefPosition->reload = true;
7836 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, currentInterval, assignedRegister));
7840 regMaskTP assignedRegBit = RBM_NONE;
7841 bool isInRegister = false;
7842 if (assignedRegister != REG_NA)
7844 isInRegister = true;
7845 assignedRegBit = genRegMask(assignedRegister);
7846 if (!currentInterval->isActive)
7848 // If this is a use, it must have started the block on the stack, but the register
7849 // was available for use so we kept the association.
7850 if (RefTypeIsUse(refType))
7852 assert(enregisterLocalVars);
7853 assert(inVarToRegMaps[curBBNum][currentInterval->getVarIndex(compiler)] == REG_STK &&
7854 previousRefPosition->nodeLocation <= curBBStartLocation);
7855 isInRegister = false;
7859 currentInterval->isActive = true;
7862 assert(currentInterval->assignedReg != nullptr &&
7863 currentInterval->assignedReg->regNum == assignedRegister &&
7864 currentInterval->assignedReg->assignedInterval == currentInterval);
7867 // If this is a physical register, we unconditionally assign it to itself!
7868 if (currentRefPosition->isPhysRegRef)
7870 RegRecord* currentReg = currentRefPosition->getReg();
7871 Interval* assignedInterval = currentReg->assignedInterval;
7873 if (assignedInterval != nullptr)
7875 unassignPhysReg(currentReg, assignedInterval->recentRefPosition);
7877 currentReg->isActive = true;
7878 assignedRegister = currentReg->regNum;
7879 assignedRegBit = genRegMask(assignedRegister);
7880 if (refType == RefTypeKill)
7882 currentReg->isBusyUntilNextKill = false;
7885 else if (previousRefPosition != nullptr)
7887 assert(previousRefPosition->nextRefPosition == currentRefPosition);
7888 assert(assignedRegister == REG_NA || assignedRegBit == previousRefPosition->registerAssignment ||
7889 currentRefPosition->outOfOrder || previousRefPosition->copyReg ||
7890 previousRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef);
7892 else if (assignedRegister != REG_NA)
7894 // Handle the case where this is a preassigned register (i.e. parameter).
7895 // We don't want to actually use the preassigned register if it's not
7896 // going to cover the lifetime - but we had to preallocate it to ensure
7897 // that it remained live.
7898 // TODO-CQ: At some point we may want to refine the analysis here, in case
7899 // it might be beneficial to keep it in this reg for PART of the lifetime
7900 if (currentInterval->isLocalVar)
7902 regMaskTP preferences = currentInterval->registerPreferences;
7903 bool keepAssignment = true;
7904 bool matchesPreferences = (preferences & genRegMask(assignedRegister)) != RBM_NONE;
7906 // Will the assigned register cover the lifetime? If not, does it at least
7907 // meet the preferences for the next RefPosition?
7908 RegRecord* physRegRecord = getRegisterRecord(currentInterval->physReg);
7909 RefPosition* nextPhysRegRefPos = physRegRecord->getNextRefPosition();
7910 if (nextPhysRegRefPos != nullptr &&
7911 nextPhysRegRefPos->nodeLocation <= currentInterval->lastRefPosition->nodeLocation)
7913 // Check to see if the existing assignment matches the preferences (e.g. callee save registers)
7914 // and ensure that the next use of this localVar does not occur after the nextPhysRegRefPos
7915 // There must be a next RefPosition, because we know that the Interval extends beyond the
7916 // nextPhysRegRefPos.
7917 RefPosition* nextLclVarRefPos = currentRefPosition->nextRefPosition;
7918 assert(nextLclVarRefPos != nullptr);
7919 if (!matchesPreferences || nextPhysRegRefPos->nodeLocation < nextLclVarRefPos->nodeLocation ||
7920 physRegRecord->conflictingFixedRegReference(nextLclVarRefPos))
7922 keepAssignment = false;
7925 else if (refType == RefTypeParamDef && !matchesPreferences)
7927 // Don't use the register, even if available, if it doesn't match the preferences.
7928 // Note that this case is only for ParamDefs, for which we haven't yet taken preferences
7929 // into account (we've just automatically got the initial location). In other cases,
7930 // we would already have put it in a preferenced register, if it was available.
7931 // TODO-CQ: Consider expanding this to check availability - that would duplicate
7932 // code here, but otherwise we may wind up in this register anyway.
7933 keepAssignment = false;
7936 if (keepAssignment == false)
7938 currentRefPosition->registerAssignment = allRegs(currentInterval->registerType);
7939 unassignPhysRegNoSpill(physRegRecord);
7941 // If the preferences are currently set to just this register, reset them to allRegs
7942 // of the appropriate type (just as we just reset the registerAssignment for this RefPosition).
7944 // Otherwise, simply remove this register from the preferences, if it's there.
7946 if (currentInterval->registerPreferences == assignedRegBit)
7948 currentInterval->registerPreferences = currentRefPosition->registerAssignment;
7952 currentInterval->registerPreferences &= ~assignedRegBit;
7955 assignedRegister = REG_NA;
7956 assignedRegBit = RBM_NONE;
7961 if (assignedRegister != REG_NA)
7963 // If there is a conflicting fixed reference, insert a copy.
7964 RegRecord* physRegRecord = getRegisterRecord(assignedRegister);
7965 if (physRegRecord->conflictingFixedRegReference(currentRefPosition))
7967 // We may have already reassigned the register to the conflicting reference.
7968 // If not, we need to unassign this interval.
7969 if (physRegRecord->assignedInterval == currentInterval)
7971 unassignPhysRegNoSpill(physRegRecord);
7973 currentRefPosition->moveReg = true;
7974 assignedRegister = REG_NA;
7975 setIntervalAsSplit(currentInterval);
7976 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_MOVE_REG, currentInterval, assignedRegister));
7978 else if ((genRegMask(assignedRegister) & currentRefPosition->registerAssignment) != 0)
7980 currentRefPosition->registerAssignment = assignedRegBit;
7981 if (!currentReferent->isActive)
7983 // If we've got an exposed use at the top of a block, the
7984 // interval might not have been active. Otherwise if it's a use,
7985 // the interval must be active.
7986 if (refType == RefTypeDummyDef)
7988 currentReferent->isActive = true;
7989 assert(getRegisterRecord(assignedRegister)->assignedInterval == currentInterval);
7993 currentRefPosition->reload = true;
7996 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, currentInterval, assignedRegister));
8000 assert(currentInterval != nullptr);
8002 // It's already in a register, but not one we need.
8003 if (!RefTypeIsDef(currentRefPosition->refType))
8005 regNumber copyReg = assignCopyReg(currentRefPosition);
8006 assert(copyReg != REG_NA);
8007 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, currentInterval, copyReg));
8008 lastAllocatedRefPosition = currentRefPosition;
8009 if (currentRefPosition->lastUse)
8011 if (currentRefPosition->delayRegFree)
8013 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED, currentInterval,
8015 delayRegsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
8019 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE, currentInterval, assignedRegister));
8020 regsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
8023 // If this is a tree temp (non-localVar) interval, we will need an explicit move.
8024 if (!currentInterval->isLocalVar)
8026 currentRefPosition->moveReg = true;
8027 currentRefPosition->copyReg = false;
8033 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NEEDS_NEW_REG, nullptr, assignedRegister));
8034 regsToFree |= genRegMask(assignedRegister);
8035 // We want a new register, but we don't want this to be considered a spill.
8036 assignedRegister = REG_NA;
8037 if (physRegRecord->assignedInterval == currentInterval)
8039 unassignPhysRegNoSpill(physRegRecord);
8045 if (assignedRegister == REG_NA)
8047 bool allocateReg = true;
8049 if (currentRefPosition->AllocateIfProfitable())
8051 // We can avoid allocating a register if it is the last use requiring a reload.
8052 if (currentRefPosition->lastUse && currentRefPosition->reload)
8054 allocateReg = false;
8058 // Under stress mode, don't attempt to allocate a reg to
8059 // reg optional ref position.
8060 if (allocateReg && regOptionalNoAlloc())
8062 allocateReg = false;
8069 // Try to allocate a register
8070 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
8073 // If no register was found, and if the currentRefPosition must have a register,
8074 // then find a register to spill
8075 if (assignedRegister == REG_NA)
8077 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8078 if (refType == RefTypeUpperVectorSaveDef)
8080 // TODO-CQ: Determine whether copying to two integer callee-save registers would be profitable.
8082 // SaveDef position occurs after the Use of args and at the same location as Kill/Def
8083 // positions of a call node. But the SaveDef position cannot use any of the arg regs, as
8084 // they are needed for the call node.
8085 currentRefPosition->registerAssignment =
8086 (allRegs(TYP_FLOAT) & RBM_FLT_CALLEE_TRASH & ~RBM_FLTARG_REGS);
8087 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
8089 // There MUST be caller-save registers available, because they have all just been killed.
8090 // Amd64 Windows: xmm4-xmm5 are guaranteed to be available as xmm0-xmm3 are used for passing args.
8091 // Amd64 Unix: xmm8-xmm15 are guaranteed to be available as xmm0-xmm7 are used for passing args.
8092 // X86 RyuJIT Windows: xmm4-xmm7 are guaranteed to be available.
8093 assert(assignedRegister != REG_NA);
8097 // i) The reason we have to spill is that SaveDef position is allocated after the Kill positions
8098 // of the call node are processed. Since callee-trash registers are killed by call node
8099 // we explicitly spill and unassign the register.
8100 // ii) These will look a bit backward in the dump, but it's a pain to dump the alloc before the
8102 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
8103 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, currentInterval, assignedRegister));
8105 // Now set assignedRegister to REG_NA again so that we don't re-activate it.
8106 assignedRegister = REG_NA;
8109 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8110 if (currentRefPosition->RequiresRegister() || currentRefPosition->AllocateIfProfitable())
8114 assignedRegister = allocateBusyReg(currentInterval, currentRefPosition,
8115 currentRefPosition->AllocateIfProfitable());
8118 if (assignedRegister != REG_NA)
8121 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_SPILLED_REG, currentInterval, assignedRegister));
8125 // This can happen only for those ref positions that are to be allocated
8126 // only if profitable.
8127 noway_assert(currentRefPosition->AllocateIfProfitable());
8129 currentRefPosition->registerAssignment = RBM_NONE;
8130 currentRefPosition->reload = false;
8131 setIntervalAsSpilled(currentInterval);
8133 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
8138 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
8139 currentRefPosition->registerAssignment = RBM_NONE;
8140 currentInterval->isActive = false;
8141 setIntervalAsSpilled(currentInterval);
8149 if (currentInterval->isConstant && (currentRefPosition->treeNode != nullptr) &&
8150 currentRefPosition->treeNode->IsReuseRegVal())
8152 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, assignedRegister, currentBlock);
8156 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, assignedRegister, currentBlock);
8162 if (refType == RefTypeDummyDef && assignedRegister != REG_NA)
8164 setInVarRegForBB(curBBNum, currentInterval->varNum, assignedRegister);
8167 // If we allocated a register, and this is a use of a spilled value,
8168 // it should have been marked for reload above.
8169 if (assignedRegister != REG_NA && RefTypeIsUse(refType) && !isInRegister)
8171 assert(currentRefPosition->reload);
8175 // If we allocated a register, record it
8176 if (currentInterval != nullptr && assignedRegister != REG_NA)
8178 assignedRegBit = genRegMask(assignedRegister);
8179 currentRefPosition->registerAssignment = assignedRegBit;
8180 currentInterval->physReg = assignedRegister;
8181 regsToFree &= ~assignedRegBit; // we'll set it again later if it's dead
8183 // If this interval is dead, free the register.
8184 // The interval could be dead if this is a user variable, or if the
8185 // node is being evaluated for side effects, or a call whose result
8186 // is not used, etc.
8187 if (currentRefPosition->lastUse || currentRefPosition->nextRefPosition == nullptr)
8189 assert(currentRefPosition->isIntervalRef());
8191 if (refType != RefTypeExpUse && currentRefPosition->nextRefPosition == nullptr)
8193 if (currentRefPosition->delayRegFree)
8195 delayRegsToFree |= assignedRegBit;
8197 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED));
8201 regsToFree |= assignedRegBit;
8203 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE));
8208 currentInterval->isActive = false;
8212 lastAllocatedRefPosition = currentRefPosition;
8216 // Free registers to clear associated intervals for resolution phase
8217 CLANG_FORMAT_COMMENT_ANCHOR;
8220 if (getLsraExtendLifeTimes())
8222 // If we have extended lifetimes, we need to make sure all the registers are freed.
8223 for (int regNumIndex = 0; regNumIndex <= REG_FP_LAST; regNumIndex++)
8225 RegRecord& regRecord = physRegs[regNumIndex];
8226 Interval* interval = regRecord.assignedInterval;
8227 if (interval != nullptr)
8229 interval->isActive = false;
8230 unassignPhysReg(&regRecord, nullptr);
8237 freeRegisters(regsToFree | delayRegsToFree);
8245 // Dump the RegRecords after the last RefPosition is handled.
8250 dumpRefPositions("AFTER ALLOCATION");
8251 dumpVarRefPositions("AFTER ALLOCATION");
8253 // Dump the intervals that remain active
8254 printf("Active intervals at end of allocation:\n");
8256 // We COULD just reuse the intervalIter from above, but ArrayListIterator doesn't
8257 // provide a Reset function (!) - we'll probably replace this so don't bother
8260 for (auto& interval : intervals)
8262 if (interval.isActive)
8274 //-----------------------------------------------------------------------------
8275 // updateAssignedInterval: Update assigned interval of register.
8278 // reg - register to be updated
8279 // interval - interval to be assigned
8280 // regType - register type
8286 // For ARM32, when "regType" is TYP_DOUBLE, "reg" should be an even-numbered
8287 // float register, i.e. the lower half of a double register.
8290 // For ARM32, the two float registers constituting a double register are updated
8291 // together when "regType" is TYP_DOUBLE.
8293 void LinearScan::updateAssignedInterval(RegRecord* reg, Interval* interval, RegisterType regType)
8295 reg->assignedInterval = interval;
8298 // Update overlapping floating point register for TYP_DOUBLE
8299 if (regType == TYP_DOUBLE)
8301 assert(genIsValidDoubleReg(reg->regNum));
8303 RegRecord* anotherHalfReg = findAnotherHalfRegRec(reg);
8305 anotherHalfReg->assignedInterval = interval;
8310 //-----------------------------------------------------------------------------
8311 // updatePreviousInterval: Update previous interval of register.
8314 // reg - register to be updated
8315 // interval - interval to be assigned
8316 // regType - register type
8322 // For ARM32, when "regType" is TYP_DOUBLE, "reg" should be an even-numbered
8323 // float register, i.e. the lower half of a double register.
8326 // For ARM32, the two float registers constituting a double register are updated
8327 // together when "regType" is TYP_DOUBLE.
8329 void LinearScan::updatePreviousInterval(RegRecord* reg, Interval* interval, RegisterType regType)
8331 reg->previousInterval = interval;
8334 // Update overlapping floating point register for TYP_DOUBLE
8335 if (regType == TYP_DOUBLE)
8337 assert(genIsValidDoubleReg(reg->regNum));
8339 RegRecord* anotherHalfReg = findAnotherHalfRegRec(reg);
8341 anotherHalfReg->previousInterval = interval;
8346 // LinearScan::resolveLocalRef
8348 // Update the graph for a local reference.
8349 // Also, track the register (if any) that is currently occupied.
8351 // treeNode: The lclVar that's being resolved
8352 // currentRefPosition: the RefPosition associated with the treeNode
8355 // This method is called for each local reference, during the resolveRegisters
8356 // phase of LSRA. It is responsible for keeping the following in sync:
8357 // - varDsc->lvRegNum (and lvOtherReg) contain the unique register location.
8358 // If it is not in the same register through its lifetime, it is set to REG_STK.
8359 // - interval->physReg is set to the assigned register
8360 // (i.e. at the code location which is currently being handled by resolveRegisters())
8361 // - interval->isActive is true iff the interval is live and occupying a register
8362 // - interval->isSpilled should have already been set to true if the interval is EVER spilled
8363 // - interval->isSplit is set to true if the interval does not occupy the same
8364 // register throughout the method
8365 // - RegRecord->assignedInterval points to the interval which currently occupies the register.
8367 // - For each lclVar node:
8368 // - gtRegNum/gtRegPair is set to the currently allocated register(s).
8369 // - GTF_SPILLED is set on a use if it must be reloaded prior to use.
8370 // - GTF_SPILL is set if it must be spilled after use.
8372 // A copyReg is an ugly case where the variable must be in a specific (fixed) register,
8373 // but it currently resides elsewhere. The register allocator must track the use of the
8374 // fixed register, but it marks the lclVar node with the register it currently lives in
8375 // and the code generator does the necessary move.
8377 // Before beginning, the varDsc for each parameter must be set to its initial location.
8379 // NICE: Consider tracking whether an Interval is always in the same location (register/stack)
8380 // in which case it will require no resolution.
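// Illustrative example of the copyReg case described above: a lclVar currently living in
// a callee-saved register but required in a fixed argument register for a call has its
// RefPosition marked as copyReg; the lclVar node keeps the register it currently lives
// in, and the required move is performed by the code generator (or by an inserted
// GT_COPY) - see the copyReg/moveReg handling below.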
8382 void LinearScan::resolveLocalRef(BasicBlock* block, GenTreePtr treeNode, RefPosition* currentRefPosition)
8384 assert((block == nullptr) == (treeNode == nullptr));
8385 assert(enregisterLocalVars);
8387 // Is this a tracked local? Or just a register allocated for loading
8388 // a non-tracked one?
8389 Interval* interval = currentRefPosition->getInterval();
8390 if (!interval->isLocalVar)
8394 interval->recentRefPosition = currentRefPosition;
8395 LclVarDsc* varDsc = interval->getLocalVar(compiler);
8397 // NOTE: we set the GTF_VAR_DEATH flag here unless we are extending lifetimes, in which case we write
8398 // this bit in checkLastUses. This is a bit of a hack, but is necessary because codegen requires
8399 // accurate last use info that is not reflected in the lastUse bit on ref positions when we are extending
8400 // lifetimes. See also the comments in checkLastUses.
8401 if ((treeNode != nullptr) && !extendLifetimes())
8403 if (currentRefPosition->lastUse)
8405 treeNode->gtFlags |= GTF_VAR_DEATH;
8409 treeNode->gtFlags &= ~GTF_VAR_DEATH;
8413 if (currentRefPosition->registerAssignment == RBM_NONE)
8415 assert(!currentRefPosition->RequiresRegister());
8416 assert(interval->isSpilled);
8418 varDsc->lvRegNum = REG_STK;
8419 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
8421 updateAssignedInterval(interval->assignedReg, nullptr, interval->registerType);
8423 interval->assignedReg = nullptr;
8424 interval->physReg = REG_NA;
8425 if (treeNode != nullptr)
8427 treeNode->SetContained();
8433 // In most cases, assigned and home registers will be the same
8434 // The exception is the copyReg case, where we've assigned a register
8435 // for a specific purpose, but will be keeping the register assignment
8436 regNumber assignedReg = currentRefPosition->assignedReg();
8437 regNumber homeReg = assignedReg;
8439 // Undo any previous association with a physical register, UNLESS this is a copyReg.
8441 if (!currentRefPosition->copyReg)
8443 regNumber oldAssignedReg = interval->physReg;
8444 if (oldAssignedReg != REG_NA && assignedReg != oldAssignedReg)
8446 RegRecord* oldRegRecord = getRegisterRecord(oldAssignedReg);
8447 if (oldRegRecord->assignedInterval == interval)
8449 updateAssignedInterval(oldRegRecord, nullptr, interval->registerType);
8454 if (currentRefPosition->refType == RefTypeUse && !currentRefPosition->reload)
8456 // Was this spilled after our predecessor was scheduled?
8457 if (interval->physReg == REG_NA)
8459 assert(inVarToRegMaps[curBBNum][varDsc->lvVarIndex] == REG_STK);
8460 currentRefPosition->reload = true;
8464 bool reload = currentRefPosition->reload;
8465 bool spillAfter = currentRefPosition->spillAfter;
8467 // In the reload case we either:
8468 // - Set the register to REG_STK if it will be referenced only from the home location, or
8469 // - Set the register to the assigned register and set GTF_SPILLED if it must be loaded into a register.
8472 assert(currentRefPosition->refType != RefTypeDef);
8473 assert(interval->isSpilled);
8474 varDsc->lvRegNum = REG_STK;
8477 interval->physReg = assignedReg;
8480 // If there is no treeNode, this must be a RefTypeExpUse, in
8481 // which case we did the reload already
8482 if (treeNode != nullptr)
8484 treeNode->gtFlags |= GTF_SPILLED;
8487 if (currentRefPosition->AllocateIfProfitable())
8489 // This is a use of lclVar that is flagged as reg-optional
8490 // by lower/codegen and marked for both reload and spillAfter.
8491 // In this case we can avoid unnecessary reload and spill
8492 // by setting reg on lclVar to REG_STK and reg on tree node
8493 // to REG_NA. Codegen will generate the code by considering
8494 // it as a contained memory operand.
8496 // Note that varDsc->lvRegNum is already set to REG_STK above.
8497 interval->physReg = REG_NA;
8498 treeNode->gtRegNum = REG_NA;
8499 treeNode->gtFlags &= ~GTF_SPILLED;
8500 treeNode->SetContained();
8504 treeNode->gtFlags |= GTF_SPILL;
8510 assert(currentRefPosition->refType == RefTypeExpUse);
8513 else if (spillAfter && !RefTypeIsUse(currentRefPosition->refType))
8515 // In the case of a pure def, don't bother spilling - just assign it to the
8516 // stack. However, we need to remember that it was spilled.
8518 assert(interval->isSpilled);
8519 varDsc->lvRegNum = REG_STK;
8520 interval->physReg = REG_NA;
8521 if (treeNode != nullptr)
8523 treeNode->gtRegNum = REG_NA;
8528 // Neither a reload nor a pure def that is spillAfter
8530 if (currentRefPosition->copyReg || currentRefPosition->moveReg)
8532 // For a copyReg or moveReg, we have two cases:
8533 // - In the first case, we have a fixedReg - i.e. a register which the code
8534 // generator is constrained to use.
8535 // The code generator will generate the appropriate move to meet the requirement.
8536 // - In the second case, we were forced to use a different register because of
8537 // interference (or JitStressRegs).
8538 // In this case, we generate a GT_COPY.
8539 // In either case, we annotate the treeNode with the register in which the value
8540 // currently lives. For moveReg, the homeReg is the new register (as assigned above).
8541 // But for copyReg, the homeReg remains unchanged.
8543 assert(treeNode != nullptr);
8544 treeNode->gtRegNum = interval->physReg;
8546 if (currentRefPosition->copyReg)
8548 homeReg = interval->physReg;
8552 assert(interval->isSplit);
8553 interval->physReg = assignedReg;
8556 if (!currentRefPosition->isFixedRegRef || currentRefPosition->moveReg)
8558 // This is the second case, where we need to generate a copy
8559 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(), currentRefPosition);
8564 interval->physReg = assignedReg;
8566 if (!interval->isSpilled && !interval->isSplit)
8568 if (varDsc->lvRegNum != REG_STK)
8570 // If the register assignments don't match, then this interval is split.
8571 if (varDsc->lvRegNum != assignedReg)
8573 setIntervalAsSplit(interval);
8574 varDsc->lvRegNum = REG_STK;
8579 varDsc->lvRegNum = assignedReg;
8585 if (treeNode != nullptr)
8587 treeNode->gtFlags |= GTF_SPILL;
8589 assert(interval->isSpilled);
8590 interval->physReg = REG_NA;
8591 varDsc->lvRegNum = REG_STK;
8595 // Update the physRegRecord for the register, so that we know what vars are in
8596 // regs at the block boundaries
8597 RegRecord* physRegRecord = getRegisterRecord(homeReg);
8598 if (spillAfter || currentRefPosition->lastUse)
8600 interval->isActive = false;
8601 interval->assignedReg = nullptr;
8602 interval->physReg = REG_NA;
8604 updateAssignedInterval(physRegRecord, nullptr, interval->registerType);
8608 interval->isActive = true;
8609 interval->assignedReg = physRegRecord;
8611 updateAssignedInterval(physRegRecord, interval, interval->registerType);
8615 void LinearScan::writeRegisters(RefPosition* currentRefPosition, GenTree* tree)
8617 lsraAssignRegToTree(tree, currentRefPosition->assignedReg(), currentRefPosition->getMultiRegIdx());
8620 //------------------------------------------------------------------------
8621 // insertCopyOrReload: Insert a copy in the case where a tree node value must be moved
8622 // to a different register at the point of use (GT_COPY), or it is reloaded to a different register
8623 // than the one it was spilled from (GT_RELOAD).
8626 // block - basic block in which GT_COPY/GT_RELOAD is inserted.
8627 // tree - This is the node to copy or reload.
8628 // Insert copy or reload node between this node and its parent.
8629 // multiRegIdx - register position of tree node for which copy or reload is needed.
8630 // refPosition - The RefPosition at which copy or reload will take place.
8633 // The GT_COPY or GT_RELOAD will be inserted in the proper spot in execution order where the reload is to occur.
8635 // For example, for this tree (numbers are execution order, lower is earlier and higher is later):
//                   +---------+----------+
//                   |      GT_ADD (3)    |
//                   +---------+----------+
//                             |
//             +---------------+------------------+
//             |                                  |
//   +-------------------+           +----------------------+
//   |         x (1)     | "tree"    |         y (2)        |
//   +-------------------+           +----------------------+
//
// generate this tree:
//
//                   +---------+----------+
//                   |      GT_ADD (4)    |
//                   +---------+----------+
//                             |
//             +---------------+------------------+
//             |                                  |
//   +-------------------+           +----------------------+
//   |  GT_RELOAD (3)    |           |         y (2)        |
//   +-------------------+           +----------------------+
//             |
//   +-------------------+
//   |         x (1)     | "tree"
//   +-------------------+
8665 // Note in particular that the GT_RELOAD node gets inserted in execution order immediately before the parent of "tree",
8666 // which seems a bit weird since normally a node's parent (in this case, the parent of "x", GT_RELOAD in the "after"
8667 // picture) immediately follows all of its children (that is, normally the execution ordering is postorder).
8668 // The ordering must be this weird "out of normal order" way because the "x" node is being spilled, probably
8669 // because the expression in the tree represented above by "y" has high register requirements. We don't want
8670 // to reload immediately, of course. So we put GT_RELOAD where the reload should actually happen.
8672 // Note that GT_RELOAD is required when we reload to a different register than the one we spilled to. It can also be
8673 // used if we reload to the same register. Normally, though, in that case we just mark the node with GTF_SPILLED,
8674 // and the unspilling code automatically reuses the same register, and does the reload when it notices that flag
8675 // when considering a node's operands.
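// Note: as implemented below, a 'reload' RefPosition produces a GT_RELOAD node, while a
// 'copyReg' RefPosition produces a GT_COPY node (counted in LSRA_STAT_COPY_REG when LSRA
// stats are tracked).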
8677 void LinearScan::insertCopyOrReload(BasicBlock* block, GenTreePtr tree, unsigned multiRegIdx, RefPosition* refPosition)
8679 LIR::Range& blockRange = LIR::AsRange(block);
8682 bool foundUse = blockRange.TryGetUse(tree, &treeUse);
8685 GenTree* parent = treeUse.User();
8688 if (refPosition->reload)
8696 #if TRACK_LSRA_STATS
8697 updateLsraStat(LSRA_STAT_COPY_REG, block->bbNum);
8701 // If the parent is a reload/copy node, then tree must be a multi-reg call node
8702 // that has already had one of its registers spilled. This is because a multi-reg
8703 // call node is the only node whose RefTypeDef positions get independently
8704 // spilled or reloaded. It is possible that one of its RefTypeDef position got
8705 // spilled and the next use of it requires it to be in a different register.
8707 // In this case set the ith position reg of reload/copy node to the reg allocated
8708 // for copy/reload refPosition. Essentially a copy/reload node will have a reg
8709 // for each multi-reg position of its child. If there is a valid reg in ith
8710 // position of GT_COPY or GT_RELOAD node then the corresponding result of its
8711 // child needs to be copied or reloaded to that reg.
8712 if (parent->IsCopyOrReload())
8714 noway_assert(parent->OperGet() == oper);
8715 noway_assert(tree->IsMultiRegCall());
8716 GenTreeCall* call = tree->AsCall();
8717 GenTreeCopyOrReload* copyOrReload = parent->AsCopyOrReload();
8718 noway_assert(copyOrReload->GetRegNumByIdx(multiRegIdx) == REG_NA);
8719 copyOrReload->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8723 // Create the new node, with "tree" as its only child.
8724 var_types treeType = tree->TypeGet();
8726 GenTreeCopyOrReload* newNode = new (compiler, oper) GenTreeCopyOrReload(oper, treeType, tree);
8727 assert(refPosition->registerAssignment != RBM_NONE);
8728 newNode->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8729 newNode->gtLsraInfo.isLsraAdded = true;
8730 newNode->gtLsraInfo.isLocalDefUse = false;
8731 if (refPosition->copyReg)
8733 // This is a TEMPORARY copy
8734 assert(isCandidateLocalRef(tree));
8735 newNode->gtFlags |= GTF_VAR_DEATH;
8738 // Insert the copy/reload after the spilled node and replace the use of the original node with a use
8739 // of the copy/reload.
8740 blockRange.InsertAfter(tree, newNode);
8741 treeUse.ReplaceWith(compiler, newNode);
8745 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8746 //------------------------------------------------------------------------
8747 // insertUpperVectorSaveAndReload: Insert code to save and restore the upper half of a vector that lives
8748 // in a callee-save register at the point of a kill (the upper half is not preserved).
8752 // tree - This is the node around which we will insert the Save & Reload.
8753 // It will be a call or some node that turns into a call.
8754 // refPosition - The RefTypeUpperVectorSaveDef RefPosition.
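// Notes:
//    The save is emitted as a SIMDIntrinsicUpperSave node inserted before 'tree', and the
//    restore as a SIMDIntrinsicUpperRestore node inserted after it. If the RefPosition is
//    marked spillAfter, the saved upper half is additionally spilled to (and later
//    reloaded from) memory via GTF_SPILL/GTF_SPILLED.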
8756 void LinearScan::insertUpperVectorSaveAndReload(GenTreePtr tree, RefPosition* refPosition, BasicBlock* block)
8758 Interval* lclVarInterval = refPosition->getInterval()->relatedInterval;
8759 assert(lclVarInterval->isLocalVar == true);
8760 LclVarDsc* varDsc = compiler->lvaTable + lclVarInterval->varNum;
8761 assert(varDsc->lvType == LargeVectorType);
8762 regNumber lclVarReg = lclVarInterval->physReg;
8763 if (lclVarReg == REG_NA)
8768 assert((genRegMask(lclVarReg) & RBM_FLT_CALLEE_SAVED) != RBM_NONE);
8770 regNumber spillReg = refPosition->assignedReg();
8771 bool spillToMem = refPosition->spillAfter;
8773 LIR::Range& blockRange = LIR::AsRange(block);
8775 // First, insert the save before the call.
8777 GenTreePtr saveLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8778 saveLcl->gtLsraInfo.isLsraAdded = true;
8779 saveLcl->gtRegNum = lclVarReg;
8780 saveLcl->gtLsraInfo.isLocalDefUse = false;
8782 GenTreeSIMD* simdNode =
8783 new (compiler, GT_SIMD) GenTreeSIMD(LargeVectorSaveType, saveLcl, nullptr, SIMDIntrinsicUpperSave,
8784 varDsc->lvBaseType, genTypeSize(LargeVectorType));
8785 simdNode->gtLsraInfo.isLsraAdded = true;
8786 simdNode->gtRegNum = spillReg;
8789 simdNode->gtFlags |= GTF_SPILL;
8792 blockRange.InsertBefore(tree, LIR::SeqTree(compiler, simdNode));
8794 // Now insert the restore after the call.
8796 GenTreePtr restoreLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8797 restoreLcl->gtLsraInfo.isLsraAdded = true;
8798 restoreLcl->gtRegNum = lclVarReg;
8799 restoreLcl->gtLsraInfo.isLocalDefUse = false;
8801 simdNode = new (compiler, GT_SIMD)
8802 GenTreeSIMD(LargeVectorType, restoreLcl, nullptr, SIMDIntrinsicUpperRestore, varDsc->lvBaseType, 32);
8803 simdNode->gtLsraInfo.isLsraAdded = true;
8804 simdNode->gtRegNum = spillReg;
8807 simdNode->gtFlags |= GTF_SPILLED;
8810 blockRange.InsertAfter(tree, LIR::SeqTree(compiler, simdNode));
8812 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8814 //------------------------------------------------------------------------
8815 // initMaxSpill: Initializes the LinearScan members used to track the max number
8816 // of concurrent spills. This is needed so that we can set the
8817 // fields in Compiler, so that the code generator, in turn can
8818 // allocate the right number of spill locations.
8827 // This is called before any calls to updateMaxSpill().
8829 void LinearScan::initMaxSpill()
8831 needDoubleTmpForFPCall = false;
8832 needFloatTmpForFPCall = false;
8833 for (int i = 0; i < TYP_COUNT; i++)
8836 currentSpill[i] = 0;
8840 //------------------------------------------------------------------------
8841 // recordMaxSpill: Sets the fields in Compiler for the max number of concurrent spills.
8842 // (See the comment on initMaxSpill.)
8851 // This is called after updateMaxSpill() has been called for all "real"
8854 void LinearScan::recordMaxSpill()
8856 // Note: due to the temp normalization process (see tmpNormalizeType)
8857 // only a few types should actually be seen here.
8858 JITDUMP("Recording the maximum number of concurrent spills:\n");
8860 var_types returnType = compiler->tmpNormalizeType(compiler->info.compRetType);
8861 if (needDoubleTmpForFPCall || (returnType == TYP_DOUBLE))
8863 JITDUMP("Adding a spill temp for moving a double call/return value between xmm reg and x87 stack.\n");
8864 maxSpill[TYP_DOUBLE] += 1;
8866 if (needFloatTmpForFPCall || (returnType == TYP_FLOAT))
8868 JITDUMP("Adding a spill temp for moving a float call/return value between xmm reg and x87 stack.\n");
8869 maxSpill[TYP_FLOAT] += 1;
8871 #endif // _TARGET_X86_
8872 for (int i = 0; i < TYP_COUNT; i++)
8874 if (var_types(i) != compiler->tmpNormalizeType(var_types(i)))
8876 // Only normalized types should have anything in the maxSpill array.
8877 // We assume here that if type 'i' does not normalize to itself, then
8878 // nothing else normalizes to 'i', either.
8879 assert(maxSpill[i] == 0);
8881 if (maxSpill[i] != 0)
8883 JITDUMP(" %s: %d\n", varTypeName(var_types(i)), maxSpill[i]);
8884 compiler->tmpPreAllocateTemps(var_types(i), maxSpill[i]);
8890 //------------------------------------------------------------------------
8891 // updateMaxSpill: Update the maximum number of concurrent spills
8894 // refPosition - the current RefPosition being handled
8900 // The RefPosition has an associated interval (getInterval() will
8901 // otherwise assert).
8904 // This is called for each "real" RefPosition during the writeback
8905 // phase of LSRA. It keeps track of how many concurrently-live
8906 // spills there are, and the largest number seen so far.
8908 void LinearScan::updateMaxSpill(RefPosition* refPosition)
8910 RefType refType = refPosition->refType;
8912 if (refPosition->spillAfter || refPosition->reload ||
8913 (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA))
8915 Interval* interval = refPosition->getInterval();
8916 if (!interval->isLocalVar)
8918 // The tmp allocation logic 'normalizes' types to a small number of
8919 // types that need distinct stack locations from each other.
8920 // Those types are currently gc refs, byrefs, <= 4 byte non-GC items,
8921 // 8-byte non-GC items, and 16-byte or 32-byte SIMD vectors.
8922 // LSRA is agnostic to those choices but needs
8923 // to know what they are here.
8926 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8927 if ((refType == RefTypeUpperVectorSaveDef) || (refType == RefTypeUpperVectorSaveUse))
8929 typ = LargeVectorSaveType;
8932 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8934 GenTreePtr treeNode = refPosition->treeNode;
8935 if (treeNode == nullptr)
8937 assert(RefTypeIsUse(refType));
8938 treeNode = interval->firstRefPosition->treeNode;
8940 assert(treeNode != nullptr);
8942 // In case of multi-reg call nodes, we need to use the type
8943 // of the return register given by multiRegIdx of the refposition.
8944 if (treeNode->IsMultiRegCall())
8946 ReturnTypeDesc* retTypeDesc = treeNode->AsCall()->GetReturnTypeDesc();
8947 typ = retTypeDesc->GetReturnRegType(refPosition->getMultiRegIdx());
8951 typ = treeNode->TypeGet();
8953 typ = compiler->tmpNormalizeType(typ);
8956 if (refPosition->spillAfter && !refPosition->reload)
8958 currentSpill[typ]++;
8959 if (currentSpill[typ] > maxSpill[typ])
8961 maxSpill[typ] = currentSpill[typ];
8964 else if (refPosition->reload)
8966 assert(currentSpill[typ] > 0);
8967 currentSpill[typ]--;
8969 else if (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA)
8971 // A spill temp is not getting reloaded into a reg because it is
8972 // marked as allocate-if-profitable and is being used from its
8973 // memory location. To properly account for the max spill of 'typ',
8974 // we decrement the spill count.
8975 assert(RefTypeIsUse(refType));
8976 assert(currentSpill[typ] > 0);
8977 currentSpill[typ]--;
8979 JITDUMP(" Max spill for %s is %d\n", varTypeName(typ), maxSpill[typ]);
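// Illustrative trace (types and counts are hypothetical): two TYP_INT tree temps that are each
// spilled at their def and reloaded at their use, with overlapping lifetimes, would produce:
//     spillAfter def #1 : currentSpill[TYP_INT] = 1, maxSpill[TYP_INT] = 1
//     spillAfter def #2 : currentSpill[TYP_INT] = 2, maxSpill[TYP_INT] = 2
//     reload use #1     : currentSpill[TYP_INT] = 1
//     reload use #2     : currentSpill[TYP_INT] = 0
// so recordMaxSpill() would pre-allocate two TYP_INT spill temps.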
8984 // This is the final phase of register allocation. It writes the register assignments to
8985 // the tree, and performs resolution across joins and backedges.
8987 void LinearScan::resolveRegisters()
8989 // Iterate over the tree and the RefPositions in lockstep
8990 // - annotate the tree with register assignments by setting gtRegNum or gtRegPair (for longs)
8992 // - track globally-live var locations
8993 // - add resolution points at split/merge/critical points as needed
8995 // Need to use the same traversal order as the one that assigns the location numbers.
8997 // Dummy RefPositions have been added at any split, join or critical edge, at the
8998 // point where resolution may be required. These are located:
8999 // - for a split, at the top of the non-adjacent block
9000 // - for a join, at the bottom of the non-adjacent joining block
9001 //   - for a critical edge, at the top of the target block of each critical edge
9003 // Note that a target block may have multiple incoming critical or split edges
9005 // These RefPositions record the expected location of the Interval at that point.
9006 // At each branch, we identify the location of each liveOut interval, and check
9007 // against the RefPositions at the target.
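// Illustrative example (block numbers and registers are hypothetical): if V01 is live out of BB03
// in rsi, but the dummy RefPosition at the top of its non-adjacent successor BB07 records rdi as
// the expected location, the two locations disagree and resolveEdges() will later insert the
// rsi -> rdi move on that edge.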
9010 LsraLocation currentLocation = MinLocation;
9012 // Clear register assignments - these will be reestablished as lclVar defs (including RefTypeParamDefs)
9014 if (enregisterLocalVars)
9016 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
9018 RegRecord* physRegRecord = getRegisterRecord(reg);
9019 Interval* assignedInterval = physRegRecord->assignedInterval;
9020 if (assignedInterval != nullptr)
9022 assignedInterval->assignedReg = nullptr;
9023 assignedInterval->physReg = REG_NA;
9025 physRegRecord->assignedInterval = nullptr;
9026 physRegRecord->recentRefPosition = nullptr;
9029 // Clear "recentRefPosition" for lclVar intervals
9030 for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
9032 if (localVarIntervals[varIndex] != nullptr)
9034 localVarIntervals[varIndex]->recentRefPosition = nullptr;
9035 localVarIntervals[varIndex]->isActive = false;
9039 assert(compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate == false);
9044 // handle incoming arguments and special temps
9045 auto currentRefPosition = refPositions.begin();
9047 if (enregisterLocalVars)
9049 VarToRegMap entryVarToRegMap = inVarToRegMaps[compiler->fgFirstBB->bbNum];
9050 while (currentRefPosition != refPositions.end() &&
9051 (currentRefPosition->refType == RefTypeParamDef || currentRefPosition->refType == RefTypeZeroInit))
9053 Interval* interval = currentRefPosition->getInterval();
9054 assert(interval != nullptr && interval->isLocalVar);
9055 resolveLocalRef(nullptr, nullptr, currentRefPosition);
9056 regNumber reg = REG_STK;
9057 int varIndex = interval->getVarIndex(compiler);
9059 if (!currentRefPosition->spillAfter && currentRefPosition->registerAssignment != RBM_NONE)
9061 reg = currentRefPosition->assignedReg();
9066 interval->isActive = false;
9068 setVarReg(entryVarToRegMap, varIndex, reg);
9069 ++currentRefPosition;
9074 assert(currentRefPosition == refPositions.end() ||
9075 (currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit));
9078 BasicBlock* insertionBlock = compiler->fgFirstBB;
9079 GenTreePtr insertionPoint = LIR::AsRange(insertionBlock).FirstNonPhiNode();
9081 // write back assignments
9082 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
9084 assert(curBBNum == block->bbNum);
9086 if (enregisterLocalVars)
9088 // Record the var locations at the start of this block.
9089 // (If it's fgFirstBB, we've already done that above, see entryVarToRegMap)
9091 curBBStartLocation = currentRefPosition->nodeLocation;
9092 if (block != compiler->fgFirstBB)
9094 processBlockStartLocations(block, false);
9097 // Handle the DummyDefs, updating the incoming var location.
9098 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType == RefTypeDummyDef;
9099 ++currentRefPosition)
9101 assert(currentRefPosition->isIntervalRef());
9102 // Don't mark dummy defs as reload
9103 currentRefPosition->reload = false;
9104 resolveLocalRef(nullptr, nullptr, currentRefPosition);
9106 if (currentRefPosition->registerAssignment != RBM_NONE)
9108 reg = currentRefPosition->assignedReg();
9113 currentRefPosition->getInterval()->isActive = false;
9115 setInVarRegForBB(curBBNum, currentRefPosition->getInterval()->varNum, reg);
9119 // The next RefPosition should be for the block. Move past it.
9120 assert(currentRefPosition != refPositions.end());
9121 assert(currentRefPosition->refType == RefTypeBB);
9122 ++currentRefPosition;
9124 // Handle the RefPositions for the block
9125 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB &&
9126 currentRefPosition->refType != RefTypeDummyDef;
9127 ++currentRefPosition)
9129 currentLocation = currentRefPosition->nodeLocation;
9131 // Ensure that the spill & copy info is valid.
9132 // First, if it's reload, it must not be copyReg or moveReg
9133 assert(!currentRefPosition->reload || (!currentRefPosition->copyReg && !currentRefPosition->moveReg));
9134 // If it's copyReg it must not be moveReg, and vice-versa
9135 assert(!currentRefPosition->copyReg || !currentRefPosition->moveReg);
9137 switch (currentRefPosition->refType)
9140 case RefTypeUpperVectorSaveUse:
9141 case RefTypeUpperVectorSaveDef:
9142 #endif // FEATURE_SIMD
9145 // These are the ones we're interested in
9148 case RefTypeFixedReg:
9149 // These require no handling at resolution time
9150 assert(currentRefPosition->referent != nullptr);
9151 currentRefPosition->referent->recentRefPosition = currentRefPosition;
9154 // Ignore the ExpUse cases - a RefTypeExpUse would only exist if the
9155 // variable is dead at the entry to the next block. So we'll mark
9158 // it as in its current location and resolution will take care of any mismatch.
9158 assert(getNextBlock() == nullptr ||
9159 !VarSetOps::IsMember(compiler, getNextBlock()->bbLiveIn,
9160 currentRefPosition->getInterval()->getVarIndex(compiler)));
9161 currentRefPosition->referent->recentRefPosition = currentRefPosition;
9163 case RefTypeKillGCRefs:
9164 // No action to take at resolution time, and no interval to update recentRefPosition for.
9166 case RefTypeDummyDef:
9167 case RefTypeParamDef:
9168 case RefTypeZeroInit:
9169 // Should have handled all of these already
9174 updateMaxSpill(currentRefPosition);
9175 GenTree* treeNode = currentRefPosition->treeNode;
9177 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
9178 if (currentRefPosition->refType == RefTypeUpperVectorSaveDef)
9180 // The treeNode must be a call, and this must be a RefPosition for a LargeVectorType LocalVar.
9181 // If the LocalVar is in a callee-save register, we are going to spill its upper half around the call.
9182 // If we have allocated a register to spill it to, we will use that; otherwise, we will spill it
9183 // to the stack. We can use as a temp register any non-arg caller-save register.
9184 noway_assert(treeNode != nullptr);
9185 currentRefPosition->referent->recentRefPosition = currentRefPosition;
9186 insertUpperVectorSaveAndReload(treeNode, currentRefPosition, block);
9188 else if (currentRefPosition->refType == RefTypeUpperVectorSaveUse)
9192 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
9194 // Most uses won't actually need to be recorded (they're on the def).
9195 // In those cases, treeNode will be nullptr.
9196 if (treeNode == nullptr)
9198 // This is either a use, a dead def, or a field of a struct
9199 Interval* interval = currentRefPosition->getInterval();
9200 assert(currentRefPosition->refType == RefTypeUse ||
9201 currentRefPosition->registerAssignment == RBM_NONE || interval->isStructField);
9203 // TODO-Review: Need to handle the case where any of the struct fields
9204 // are reloaded/spilled at this use
9205 assert(!interval->isStructField ||
9206 (currentRefPosition->reload == false && currentRefPosition->spillAfter == false));
9208 if (interval->isLocalVar && !interval->isStructField)
9210 LclVarDsc* varDsc = interval->getLocalVar(compiler);
9212 // This must be a dead definition. We need to mark the lclVar
9213 // so that it's not considered a candidate for lvRegister, as
9214 // this dead def will have to go to the stack.
9215 assert(currentRefPosition->refType == RefTypeDef);
9216 varDsc->lvRegNum = REG_STK;
9221 LsraLocation loc = treeNode->gtLsraInfo.loc;
9222 assert(treeNode->IsLocal() || currentLocation == loc || currentLocation == loc + 1);
9224 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isInternal)
9226 treeNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
9230 writeRegisters(currentRefPosition, treeNode);
9232 if (treeNode->IsLocal() && currentRefPosition->getInterval()->isLocalVar)
9234 resolveLocalRef(block, treeNode, currentRefPosition);
9237 // Mark spill locations on temps
9238 // (local vars are handled in resolveLocalRef, above)
9239 // Note that the tree node will be changed from GTF_SPILL to GTF_SPILLED
9240 // in codegen, taking care of the "reload" case for temps
9241 else if (currentRefPosition->spillAfter || (currentRefPosition->nextRefPosition != nullptr &&
9242 currentRefPosition->nextRefPosition->moveReg))
9244 if (treeNode != nullptr && currentRefPosition->isIntervalRef())
9246 if (currentRefPosition->spillAfter)
9248 treeNode->gtFlags |= GTF_SPILL;
9250 // If this is a constant interval that is reusing a pre-existing value, we actually need
9251 // to generate the value at this point in order to spill it.
9252 if (treeNode->IsReuseRegVal())
9254 treeNode->ResetReuseRegVal();
9257 // In case of multi-reg call node, also set spill flag on the
9258 // register specified by multi-reg index of current RefPosition.
9259 // Note that the spill flag on treeNode indicates that one or
9260 // more of its allocated registers are in that state.
9261 if (treeNode->IsMultiRegCall())
9263 GenTreeCall* call = treeNode->AsCall();
9264 call->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
9267 else if (treeNode->OperIsPutArgSplit())
9269 GenTreePutArgSplit* splitArg = treeNode->AsPutArgSplit();
9270 splitArg->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
9275 // If the value is reloaded or moved to a different register, we need to insert
9276 // a node to hold the register to which it should be reloaded
9277 RefPosition* nextRefPosition = currentRefPosition->nextRefPosition;
9278 assert(nextRefPosition != nullptr);
9279 if (INDEBUG(alwaysInsertReload() ||)
9280 nextRefPosition->assignedReg() != currentRefPosition->assignedReg())
9282 if (nextRefPosition->assignedReg() != REG_NA)
9284 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(),
9289 assert(nextRefPosition->AllocateIfProfitable());
9291 // In case of tree temps, if def is spilled and use didn't
9292 // get a register, set a flag on tree node to be treated as
9293 // contained at the point of its use.
9294 if (currentRefPosition->spillAfter && currentRefPosition->refType == RefTypeDef &&
9295 nextRefPosition->refType == RefTypeUse)
9297 assert(nextRefPosition->treeNode == nullptr);
9298 treeNode->gtFlags |= GTF_NOREG_AT_USE;
9304 // We should never have to "spill after" a temp use, since
9305 // they're single use
9314 if (enregisterLocalVars)
9316 processBlockEndLocations(block);
9320 if (enregisterLocalVars)
9325 printf("-----------------------\n");
9326 printf("RESOLVING BB BOUNDARIES\n");
9327 printf("-----------------------\n");
9329 printf("Resolution Candidates: ");
9330 dumpConvertedVarSet(compiler, resolutionCandidateVars);
9332 printf("Has %sCritical Edges\n\n", hasCriticalEdges ? "" : "No");
9334 printf("Prior to Resolution\n");
9335 foreach_block(compiler, block)
9337 printf("\nBB%02u use def in out\n", block->bbNum);
9338 dumpConvertedVarSet(compiler, block->bbVarUse);
9340 dumpConvertedVarSet(compiler, block->bbVarDef);
9342 dumpConvertedVarSet(compiler, block->bbLiveIn);
9344 dumpConvertedVarSet(compiler, block->bbLiveOut);
9347 dumpInVarToRegMap(block);
9348 dumpOutVarToRegMap(block);
9357 // Verify register assignments on variables
9360 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
9362 if (!isCandidateVar(varDsc))
9364 varDsc->lvRegNum = REG_STK;
9368 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
9370 // Determine initial position for parameters
9372 if (varDsc->lvIsParam)
9374 regMaskTP initialRegMask = interval->firstRefPosition->registerAssignment;
9375 regNumber initialReg = (initialRegMask == RBM_NONE || interval->firstRefPosition->spillAfter)
9377 : genRegNumFromMask(initialRegMask);
9378 regNumber sourceReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
9381 if (varTypeIsMultiReg(varDsc))
9383 // TODO-ARM-NYI: Map the hi/lo intervals back to lvRegNum and lvOtherReg (these should NYI
9385 assert(!"Multi-reg types not yet supported");
9388 #endif // _TARGET_ARM_
9390 varDsc->lvArgInitReg = initialReg;
9391 JITDUMP(" Set V%02u argument initial register to %s\n", lclNum, getRegName(initialReg));
9394 // Stack args that are part of dependently-promoted structs should never be register candidates (see
9395 // LinearScan::isRegCandidate).
9396 assert(varDsc->lvIsRegArg || !compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc));
9399 // If lvRegNum is REG_STK, that means that either no register
9400 // was assigned, or (more likely) that the same register was not
9401 // used for all references. In that case, codegen gets the register
9402 // from the tree node.
9403 if (varDsc->lvRegNum == REG_STK || interval->isSpilled || interval->isSplit)
9405 // For codegen purposes, we'll set lvRegNum to whatever register
9406 // it's currently in as we go.
9407 // However, we never mark an interval as lvRegister if it has either been spilled or split.
9409 varDsc->lvRegister = false;
9411 // Skip any dead defs or exposed uses
9412 // (first use exposed will only occur when there is no explicit initialization)
9413 RefPosition* firstRefPosition = interval->firstRefPosition;
9414 while ((firstRefPosition != nullptr) && (firstRefPosition->refType == RefTypeExpUse))
9416 firstRefPosition = firstRefPosition->nextRefPosition;
9418 if (firstRefPosition == nullptr)
9421 varDsc->lvLRACandidate = false;
9422 if (varDsc->lvRefCnt == 0)
9424 varDsc->lvOnFrame = false;
9428 // We may encounter cases where a lclVar actually has no references, but
9429 // a non-zero refCnt. For safety (in case this is some "hidden" lclVar that we're
9430 // not correctly recognizing), we'll mark those as needing a stack location.
9431 // TODO-Cleanup: Make this an assert if/when we correct the refCnt
9433 varDsc->lvOnFrame = true;
9438 // If the interval was not spilled, it doesn't need a stack location.
9439 if (!interval->isSpilled)
9441 varDsc->lvOnFrame = false;
9443 if (firstRefPosition->registerAssignment == RBM_NONE || firstRefPosition->spillAfter)
9445 // Either this RefPosition is spilled, or it is regOptional, or it is not a "real" def or use
9447 firstRefPosition->spillAfter || firstRefPosition->AllocateIfProfitable() ||
9448 (firstRefPosition->refType != RefTypeDef && firstRefPosition->refType != RefTypeUse));
9449 varDsc->lvRegNum = REG_STK;
9453 varDsc->lvRegNum = firstRefPosition->assignedReg();
9460 varDsc->lvRegister = true;
9461 varDsc->lvOnFrame = false;
9464 regMaskTP registerAssignment = genRegMask(varDsc->lvRegNum);
9465 assert(!interval->isSpilled && !interval->isSplit);
9466 RefPosition* refPosition = interval->firstRefPosition;
9467 assert(refPosition != nullptr);
9469 while (refPosition != nullptr)
9471 // All RefPositions must match, except for dead definitions,
9472 // copyReg/moveReg and RefTypeExpUse positions
9473 if (refPosition->registerAssignment != RBM_NONE && !refPosition->copyReg &&
9474 !refPosition->moveReg && refPosition->refType != RefTypeExpUse)
9476 assert(refPosition->registerAssignment == registerAssignment);
9478 refPosition = refPosition->nextRefPosition;
9489 printf("Trees after linear scan register allocator (LSRA)\n");
9490 compiler->fgDispBasicBlocks(true);
9493 verifyFinalAllocation();
9496 compiler->raMarkStkVars();
9499 // TODO-CQ: Review this comment and address as needed.
9500 // Change all unused promoted non-argument struct locals to a non-GC type (in this case TYP_INT)
9501 // so that the gc tracking logic and lvMustInit logic will ignore them.
9502 // Extract the code that does this from raAssignVars, and call it here.
9503 // PRECONDITIONS: Ensure that lvPromoted is set on promoted structs, if and
9504 // only if it is promoted on all paths.
9505 // Call might be something like:
9506 // compiler->BashUnusedStructLocals();
9510 //------------------------------------------------------------------------
9511 // insertMove: Insert a move of a lclVar with the given lclNum into the given block.
9514 // block - the BasicBlock into which the move will be inserted.
9515 // insertionPoint - the instruction before which to insert the move
9516 // lclNum - the lclNum of the var to be moved
9517 // fromReg - the register from which the var is moving
9518 // toReg - the register to which the var is moving
9524 // If insertionPoint is non-NULL, insert before that instruction;
9525 // otherwise, insert "near" the end (prior to the branch, if any).
9526 // If fromReg or toReg is REG_STK, then move from/to memory, respectively.
9528 void LinearScan::insertMove(
9529 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum, regNumber fromReg, regNumber toReg)
9531 LclVarDsc* varDsc = compiler->lvaTable + lclNum;
9532 // the lclVar must be a register candidate
9533 assert(isRegCandidate(varDsc));
9534 // One or both MUST be a register
9535 assert(fromReg != REG_STK || toReg != REG_STK);
9536 // They must not be the same register.
9537 assert(fromReg != toReg);
9539 // This var can't be marked lvRegister now
9540 varDsc->lvRegNum = REG_STK;
9542 GenTreePtr src = compiler->gtNewLclvNode(lclNum, varDsc->TypeGet());
9543 src->gtLsraInfo.isLsraAdded = true;
9545 // There are three cases we need to handle:
9546 // - We are loading a lclVar from the stack.
9547 // - We are storing a lclVar to the stack.
9548 // - We are copying a lclVar between registers.
9550 // In the first and second cases, the lclVar node will be marked with GTF_SPILLED and GTF_SPILL, respectively.
9551 // It is up to the code generator to ensure that any necessary normalization is done when loading or storing the lclVar's value.
9554 // In the third case, we generate GT_COPY(GT_LCL_VAR) and type each node with the normalized type of the lclVar.
9555 // This is safe because a lclVar is always normalized once it is in a register.
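// Illustrative shapes of the IR created below (the local and registers are hypothetical):
//     load from stack : GT_LCL_VAR V05 with GTF_SPILLED, gtRegNum = rdi   (fromReg == REG_STK)
//     store to stack  : GT_LCL_VAR V05 with GTF_SPILL,   gtRegNum = rsi   (toReg == REG_STK)
//     reg-to-reg copy : GT_COPY(GT_LCL_VAR V05 @ rsi) -> rdi              (both are registers)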
9558 if (fromReg == REG_STK)
9560 src->gtFlags |= GTF_SPILLED;
9561 src->gtRegNum = toReg;
9563 else if (toReg == REG_STK)
9565 src->gtFlags |= GTF_SPILL;
9566 src->gtRegNum = fromReg;
9570 var_types movType = genActualType(varDsc->TypeGet());
9571 src->gtType = movType;
9573 dst = new (compiler, GT_COPY) GenTreeCopyOrReload(GT_COPY, movType, src);
9574 // This is the new home of the lclVar - indicate that by clearing the GTF_VAR_DEATH flag.
9575 // Note that if src is itself a lastUse, this will have no effect.
9576 dst->gtFlags &= ~(GTF_VAR_DEATH);
9577 src->gtRegNum = fromReg;
9578 dst->gtRegNum = toReg;
9579 src->gtLsraInfo.isLocalDefUse = false;
9580 dst->gtLsraInfo.isLsraAdded = true;
9582 dst->gtLsraInfo.isLocalDefUse = true;
9584 LIR::Range treeRange = LIR::SeqTree(compiler, dst);
9585 LIR::Range& blockRange = LIR::AsRange(block);
9587 if (insertionPoint != nullptr)
9589 blockRange.InsertBefore(insertionPoint, std::move(treeRange));
9593 // Put the copy at the bottom
9594 // If there's a branch, make an embedded statement that executes just prior to the branch
9595 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
9597 noway_assert(!blockRange.IsEmpty());
9599 GenTree* branch = blockRange.LastNode();
9600 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
9601 branch->OperGet() == GT_SWITCH);
9603 blockRange.InsertBefore(branch, std::move(treeRange));
9607 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
9608 blockRange.InsertAtEnd(std::move(treeRange));
9613 void LinearScan::insertSwap(
9614 BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum1, regNumber reg1, unsigned lclNum2, regNumber reg2)
9619 const char* insertionPointString = "top";
9620 if (insertionPoint == nullptr)
9622 insertionPointString = "bottom";
9624 printf(" BB%02u %s: swap V%02u in %s with V%02u in %s\n", block->bbNum, insertionPointString, lclNum1,
9625 getRegName(reg1), lclNum2, getRegName(reg2));
9629 LclVarDsc* varDsc1 = compiler->lvaTable + lclNum1;
9630 LclVarDsc* varDsc2 = compiler->lvaTable + lclNum2;
9631 assert(reg1 != REG_STK && reg1 != REG_NA && reg2 != REG_STK && reg2 != REG_NA);
9633 GenTreePtr lcl1 = compiler->gtNewLclvNode(lclNum1, varDsc1->TypeGet());
9634 lcl1->gtLsraInfo.isLsraAdded = true;
9635 lcl1->gtLsraInfo.isLocalDefUse = false;
9636 lcl1->gtRegNum = reg1;
9638 GenTreePtr lcl2 = compiler->gtNewLclvNode(lclNum2, varDsc2->TypeGet());
9639 lcl2->gtLsraInfo.isLsraAdded = true;
9640 lcl2->gtLsraInfo.isLocalDefUse = false;
9641 lcl2->gtRegNum = reg2;
9643 GenTreePtr swap = compiler->gtNewOperNode(GT_SWAP, TYP_VOID, lcl1, lcl2);
9644 swap->gtLsraInfo.isLsraAdded = true;
9645 swap->gtLsraInfo.isLocalDefUse = false;
9646 swap->gtRegNum = REG_NA;
9648 lcl1->gtNext = lcl2;
9649 lcl2->gtPrev = lcl1;
9650 lcl2->gtNext = swap;
9651 swap->gtPrev = lcl2;
9653 LIR::Range swapRange = LIR::SeqTree(compiler, swap);
9654 LIR::Range& blockRange = LIR::AsRange(block);
9656 if (insertionPoint != nullptr)
9658 blockRange.InsertBefore(insertionPoint, std::move(swapRange));
9662 // Put the swap at the bottom
9663 // If there's a branch, make an embedded statement that executes just prior to the branch
9664 if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
9666 noway_assert(!blockRange.IsEmpty());
9668 GenTree* branch = blockRange.LastNode();
9669 assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
9670 branch->OperGet() == GT_SWITCH);
9672 blockRange.InsertBefore(branch, std::move(swapRange));
9676 assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
9677 blockRange.InsertAtEnd(std::move(swapRange));
9682 //------------------------------------------------------------------------
9683 // getTempRegForResolution: Get a free register to use for resolution code.
9686 // fromBlock - The "from" block on the edge being resolved.
9687 //    toBlock   - The "to" block on the edge
9688 // type - the type of register required
9691 // Returns a register that is free on the given edge, or REG_NA if none is available.
9694 // It is up to the caller to check the return value to determine whether a register is
9695 // available, and to handle appropriately the case where none is.
9696 // It is also up to the caller to cache the return value, as this is not cheap to compute.
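// Rough sketch of the computation performed below (illustrative only):
//     freeRegs = allRegs(type);
//     for each var live-in to 'toBlock':
//         freeRegs &= ~genRegMask(its register on the "from" side);   // unless it is REG_STK
//         freeRegs &= ~genRegMask(its register on the "to" side);     // unless it is REG_STK
//     return (freeRegs == RBM_NONE) ? REG_NA : genRegNumFromMask(genFindLowestBit(freeRegs));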
9698 regNumber LinearScan::getTempRegForResolution(BasicBlock* fromBlock, BasicBlock* toBlock, var_types type)
9700 // TODO-Throughput: This would be much more efficient if we add RegToVarMaps instead of VarToRegMaps
9701 // and they would be more space-efficient as well.
9702 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
9703 VarToRegMap toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
9705 regMaskTP freeRegs = allRegs(type);
9707 if (getStressLimitRegs() == LSRA_LIMIT_SMALL_SET)
9712 INDEBUG(freeRegs = stressLimitRegs(nullptr, freeRegs));
9714 // We are only interested in the variables that are live-in to the "to" block.
9715 VarSetOps::Iter iter(compiler, toBlock->bbLiveIn);
9716 unsigned varIndex = 0;
9717 while (iter.NextElem(&varIndex) && freeRegs != RBM_NONE)
9719 regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
9720 regNumber toReg = getVarReg(toVarToRegMap, varIndex);
9721 assert(fromReg != REG_NA && toReg != REG_NA);
9722 if (fromReg != REG_STK)
9724 freeRegs &= ~genRegMask(fromReg);
9726 if (toReg != REG_STK)
9728 freeRegs &= ~genRegMask(toReg);
9731 if (freeRegs == RBM_NONE)
9737 regNumber tempReg = genRegNumFromMask(genFindLowestBit(freeRegs));
9742 //------------------------------------------------------------------------
9743 // addResolution: Add a resolution move of the given interval
9746 // block - the BasicBlock into which the move will be inserted.
9747 // insertionPoint - the instruction before which to insert the move
9748 // interval - the interval of the var to be moved
9749 // toReg - the register to which the var is moving
9750 // fromReg - the register from which the var is moving
9756 // For joins, we insert at the bottom (indicated by an insertionPoint
9757 // of nullptr), while for splits we insert at the top.
9758 // This is because for joins 'block' is a pred of the join, while for splits it is a succ.
9759 // For critical edges, this function may be called twice - once to move from
9760 // the source (fromReg), if any, to the stack, in which case toReg will be
9761 // REG_STK, and we insert at the bottom (leave insertionPoint as nullptr).
9762 // The next time, we want to move from the stack to the destination (toReg),
9763 // in which case fromReg will be REG_STK, and we insert at the top.
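// Illustrative critical-edge sequence (the local and registers are hypothetical): if V03 ends the
// "from" block in rcx but is expected in rdx at the target, we might first emit
//     addResolution(fromBlock, nullptr, interval, REG_STK, REG_RCX);          // bottom: rcx -> stack
// and then, at the top of the target block,
//     addResolution(targetBlock, insertionPoint, interval, REG_RDX, REG_STK); // top: stack -> rdx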
9765 void LinearScan::addResolution(
9766 BasicBlock* block, GenTreePtr insertionPoint, Interval* interval, regNumber toReg, regNumber fromReg)
9769 const char* insertionPointString = "top";
9771 if (insertionPoint == nullptr)
9774 insertionPointString = "bottom";
9778 JITDUMP(" BB%02u %s: move V%02u from ", block->bbNum, insertionPointString, interval->varNum);
9779 JITDUMP("%s to %s", getRegName(fromReg), getRegName(toReg));
9781 insertMove(block, insertionPoint, interval->varNum, fromReg, toReg);
9782 if (fromReg == REG_STK || toReg == REG_STK)
9784 assert(interval->isSpilled);
9788 // We should have already marked this as spilled or split.
9789 assert((interval->isSpilled) || (interval->isSplit));
9792 INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
9795 //------------------------------------------------------------------------
9796 // handleOutgoingCriticalEdges: Performs the necessary resolution on all critical edges that feed out of 'block'
9799 // block - the block with outgoing critical edges.
9805 // For all outgoing critical edges (i.e. edges to any successor of this block that is
9806 // also a join point), if there are any conflicts, split the edge by adding a new block,
9807 // and generate the resolution code into that block.
9809 void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
9811 VARSET_TP outResolutionSet(VarSetOps::Intersection(compiler, block->bbLiveOut, resolutionCandidateVars));
9812 if (VarSetOps::IsEmpty(compiler, outResolutionSet))
9816 VARSET_TP sameResolutionSet(VarSetOps::MakeEmpty(compiler));
9817 VARSET_TP sameLivePathsSet(VarSetOps::MakeEmpty(compiler));
9818 VARSET_TP singleTargetSet(VarSetOps::MakeEmpty(compiler));
9819 VARSET_TP diffResolutionSet(VarSetOps::MakeEmpty(compiler));
9821 // Get the outVarToRegMap for this block
9822 VarToRegMap outVarToRegMap = getOutVarToRegMap(block->bbNum);
9823 unsigned succCount = block->NumSucc(compiler);
9824 assert(succCount > 1);
9825 VarToRegMap firstSuccInVarToRegMap = nullptr;
9826 BasicBlock* firstSucc = nullptr;
9828 // First, determine the live regs at the end of this block so that we know what regs are
9829 // available to copy into.
9830 // Note that for this purpose we use the full live-out set, because we must ensure that
9831 // even the registers that remain the same across the edge are preserved correctly.
9832 regMaskTP liveOutRegs = RBM_NONE;
9833 VarSetOps::Iter liveOutIter(compiler, block->bbLiveOut);
9834 unsigned liveOutVarIndex = 0;
9835 while (liveOutIter.NextElem(&liveOutVarIndex))
9837 regNumber fromReg = getVarReg(outVarToRegMap, liveOutVarIndex);
9838 if (fromReg != REG_STK)
9840 liveOutRegs |= genRegMask(fromReg);
9844 // Next, if this block ends with a switch table, we have to make sure not to copy
9845 // into the registers that it uses.
9846 regMaskTP switchRegs = RBM_NONE;
9847 if (block->bbJumpKind == BBJ_SWITCH)
9849 // At this point, Lowering has transformed any non-switch-table blocks into
9851 GenTree* switchTable = LIR::AsRange(block).LastNode();
9852 assert(switchTable != nullptr && switchTable->OperGet() == GT_SWITCH_TABLE);
9854 switchRegs = switchTable->gtRsvdRegs;
9855 GenTree* op1 = switchTable->gtGetOp1();
9856 GenTree* op2 = switchTable->gtGetOp2();
9857 noway_assert(op1 != nullptr && op2 != nullptr);
9858 assert(op1->gtRegNum != REG_NA && op2->gtRegNum != REG_NA);
9859 switchRegs |= genRegMask(op1->gtRegNum);
9860 switchRegs |= genRegMask(op2->gtRegNum);
9863 VarToRegMap sameVarToRegMap = sharedCriticalVarToRegMap;
9864 regMaskTP sameWriteRegs = RBM_NONE;
9865 regMaskTP diffReadRegs = RBM_NONE;
9867 // For each var that may require resolution, classify them as:
9868 // - in the same register at the end of this block and at each target (no resolution needed)
9869 // - in different registers at different targets (resolve separately):
9870 // diffResolutionSet
9871 // - in the same register at each target at which it's live, but different from the end of
9872 // this block. We may be able to resolve these as if this were a "join", but only if they do not
9873 // write to any registers that are read by those in the diffResolutionSet:
9874 // sameResolutionSet
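// Worked example (block numbers and registers are hypothetical): suppose BB05 has successors BB06 and BB07:
//     V01: in rax at the end of BB05, and expected in rax at every successor         -> no resolution needed
//     V02: in rcx at the end of BB05, but expected in rdx at every live-in successor -> sameResolutionSet
//     V03: in rsi at the end of BB05, expected in rdi at BB06 but in r8 at BB07      -> diffResolutionSet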
9876 VarSetOps::Iter outResolutionSetIter(compiler, outResolutionSet);
9877 unsigned outResolutionSetVarIndex = 0;
9878 while (outResolutionSetIter.NextElem(&outResolutionSetVarIndex))
9880 regNumber fromReg = getVarReg(outVarToRegMap, outResolutionSetVarIndex);
9881 bool isMatch = true;
9882 bool isSame = false;
9883 bool maybeSingleTarget = false;
9884 bool maybeSameLivePaths = false;
9885 bool liveOnlyAtSplitEdge = true;
9886 regNumber sameToReg = REG_NA;
9887 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
9889 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
9890 if (!VarSetOps::IsMember(compiler, succBlock->bbLiveIn, outResolutionSetVarIndex))
9892 maybeSameLivePaths = true;
9895 else if (liveOnlyAtSplitEdge)
9897 // Is the var live only at those target blocks which are connected by a split edge to this block
9898 liveOnlyAtSplitEdge = ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB));
9901 regNumber toReg = getVarReg(getInVarToRegMap(succBlock->bbNum), outResolutionSetVarIndex);
9902 if (sameToReg == REG_NA)
9907 if (toReg == sameToReg)
9915 // Check for the cases where we can't write to a register.
9916 // We only need to check for these cases if sameToReg is an actual register (not REG_STK).
9917 if (sameToReg != REG_NA && sameToReg != REG_STK)
9919 // If there's a path on which this var isn't live, it may use the original value in sameToReg.
9920 // In this case, sameToReg will be in the liveOutRegs of this block.
9921 // Similarly, if sameToReg is in sameWriteRegs, it has already been used (i.e. for a lclVar that's
9922 // live only at another target), and we can't copy another lclVar into that reg in this block.
9923 regMaskTP sameToRegMask = genRegMask(sameToReg);
9924 if (maybeSameLivePaths &&
9925 (((sameToRegMask & liveOutRegs) != RBM_NONE) || ((sameToRegMask & sameWriteRegs) != RBM_NONE)))
9929 // If this register is used by a switch table at the end of the block, we can't do the copy
9930 // in this block (since we can't insert it after the switch).
9931 if ((sameToRegMask & switchRegs) != RBM_NONE)
9936 // If the var is live only at those blocks connected by a split edge and not live-in at some of the
9937 // target blocks, we will resolve it the same way as if it were in diffResolutionSet and resolution
9938 // will be deferred to the handling of split edges, which means copy will only be at those target(s).
9940 // Another way to achieve similar resolution for vars live only at split edges is by removing them
9941 // from consideration up-front, but it requires that we traverse those edges anyway to account for
9942 // the registers that must not be overwritten.
9943 if (liveOnlyAtSplitEdge && maybeSameLivePaths)
9949 if (sameToReg == REG_NA)
9951 VarSetOps::AddElemD(compiler, diffResolutionSet, outResolutionSetVarIndex);
9952 if (fromReg != REG_STK)
9954 diffReadRegs |= genRegMask(fromReg);
9957 else if (sameToReg != fromReg)
9959 VarSetOps::AddElemD(compiler, sameResolutionSet, outResolutionSetVarIndex);
9960 setVarReg(sameVarToRegMap, outResolutionSetVarIndex, sameToReg);
9961 if (sameToReg != REG_STK)
9963 sameWriteRegs |= genRegMask(sameToReg);
9968 if (!VarSetOps::IsEmpty(compiler, sameResolutionSet))
9970 if ((sameWriteRegs & diffReadRegs) != RBM_NONE)
9972 // We cannot split the "same" and "diff" regs if the "same" set writes registers
9973 // that must be read by the "diff" set. (Note that when these are done as a "batch"
9974 // we carefully order them to ensure all the input regs are read before they are
9976 VarSetOps::UnionD(compiler, diffResolutionSet, sameResolutionSet);
9977 VarSetOps::ClearD(compiler, sameResolutionSet);
9981 // For any vars in the sameResolutionSet, we can simply add the move at the end of "block".
9982 resolveEdge(block, nullptr, ResolveSharedCritical, sameResolutionSet);
9985 if (!VarSetOps::IsEmpty(compiler, diffResolutionSet))
9987 for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
9989 BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
9991 // Any "diffResolutionSet" resolution for a block with no other predecessors will be handled later
9992 // as split resolution.
9993 if ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB))
9998 // Now collect the resolution set for just this edge, if any.
9999 // Check only the vars in diffResolutionSet that are live-in to this successor.
10000 bool needsResolution = false;
10001 VarToRegMap succInVarToRegMap = getInVarToRegMap(succBlock->bbNum);
10002 VARSET_TP edgeResolutionSet(VarSetOps::Intersection(compiler, diffResolutionSet, succBlock->bbLiveIn));
10003 VarSetOps::Iter iter(compiler, edgeResolutionSet);
10004 unsigned varIndex = 0;
10005 while (iter.NextElem(&varIndex))
10007 regNumber fromReg = getVarReg(outVarToRegMap, varIndex);
10008 regNumber toReg = getVarReg(succInVarToRegMap, varIndex);
10010 if (fromReg == toReg)
10012 VarSetOps::RemoveElemD(compiler, edgeResolutionSet, varIndex);
10015 if (!VarSetOps::IsEmpty(compiler, edgeResolutionSet))
10017 resolveEdge(block, succBlock, ResolveCritical, edgeResolutionSet);
10023 //------------------------------------------------------------------------
10024 // resolveEdges: Perform resolution across basic block edges
10033 // Traverse the basic blocks.
10034 // - If this block has a single predecessor that is not the immediately
10035 // preceding block, perform any needed 'split' resolution at the beginning of this block
10036 // - Otherwise if this block has critical incoming edges, handle them.
10037 // - If this block has a single successor that has multiple predecessors, perform any needed
10038 // 'join' resolution at the end of this block.
10039 // Note that a block may have both 'split' or 'critical' incoming edge(s) and 'join' outgoing edges.
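// Illustrative example (block numbers are hypothetical):
//     BB01 -> BB03, where BB01 has two successors and BB03 has only this predecessor  : 'split'    (resolve at the top of BB03)
//     BB03 -> BB04, where BB03 has a single successor and BB04 has several preds      : 'join'     (resolve at the end of BB03)
//     BB01 -> BB04, where BB01 has several successors and BB04 has several preds      : 'critical' (the edge may be split)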
10042 void LinearScan::resolveEdges()
10044 JITDUMP("RESOLVING EDGES\n");
10046 // The resolutionCandidateVars set was initialized with all the lclVars that are live-in to
10047 // any block. We now intersect that set with any lclVars that ever spilled or split.
10048 // If there are no candidates for resolution, simply return.
10050 VarSetOps::IntersectionD(compiler, resolutionCandidateVars, splitOrSpilledVars);
10051 if (VarSetOps::IsEmpty(compiler, resolutionCandidateVars))
10056 BasicBlock *block, *prevBlock = nullptr;
10058 // Handle all the critical edges first.
10059 // We will try to avoid resolution across critical edges in cases where all the critical-edge
10060 // targets of a block have the same home. We will then split the edges only for the
10061 // remaining mismatches. We visit the out-edges, as that allows us to share the moves that are
10062 // common among all the targets.
10064 if (hasCriticalEdges)
10066 foreach_block(compiler, block)
10068 if (block->bbNum > bbNumMaxBeforeResolution)
10070 // This is a new block added during resolution - we don't need to visit these now.
10073 if (blockInfo[block->bbNum].hasCriticalOutEdge)
10075 handleOutgoingCriticalEdges(block);
10081 prevBlock = nullptr;
10082 foreach_block(compiler, block)
10084 if (block->bbNum > bbNumMaxBeforeResolution)
10086 // This is a new block added during resolution - we don't need to visit these now.
10090 unsigned succCount = block->NumSucc(compiler);
10091 flowList* preds = block->bbPreds;
10092 BasicBlock* uniquePredBlock = block->GetUniquePred(compiler);
10094 // First, if this block has a single predecessor,
10095 // we may need resolution at the beginning of this block.
10096 // This may be true even if it's the block we used for starting locations,
10097 // if a variable was spilled.
10098 VARSET_TP inResolutionSet(VarSetOps::Intersection(compiler, block->bbLiveIn, resolutionCandidateVars));
10099 if (!VarSetOps::IsEmpty(compiler, inResolutionSet))
10101 if (uniquePredBlock != nullptr)
10103 // We may have split edges during critical edge resolution, and in the process split
10104 // a non-critical edge as well.
10105 // It is unlikely that we would ever have more than one of these in sequence (indeed,
10106 // I don't think it's possible), but there's no need to assume that it can't.
10107 while (uniquePredBlock->bbNum > bbNumMaxBeforeResolution)
10109 uniquePredBlock = uniquePredBlock->GetUniquePred(compiler);
10110 noway_assert(uniquePredBlock != nullptr);
10112 resolveEdge(uniquePredBlock, block, ResolveSplit, inResolutionSet);
10116 // Finally, if this block has a single successor:
10117 // - and that has at least one other predecessor (otherwise we will do the resolution at the
10118 // top of the successor),
10119 // - and that is not the target of a critical edge (otherwise we've already handled it)
10120 // we may need resolution at the end of this block.
10122 if (succCount == 1)
10124 BasicBlock* succBlock = block->GetSucc(0, compiler);
10125 if (succBlock->GetUniquePred(compiler) == nullptr)
10127 VARSET_TP outResolutionSet(
10128 VarSetOps::Intersection(compiler, succBlock->bbLiveIn, resolutionCandidateVars));
10129 if (!VarSetOps::IsEmpty(compiler, outResolutionSet))
10131 resolveEdge(block, succBlock, ResolveJoin, outResolutionSet);
10137 // Now, fix up the mapping for any blocks that were added for edge splitting.
10138 // See the comment prior to the call to fgSplitEdge() in resolveEdge().
10139 // Note that we could fold this loop in with the checking code below, but that
10140 // would only improve the debug case, and would clutter up the code somewhat.
10141 if (compiler->fgBBNumMax > bbNumMaxBeforeResolution)
10143 foreach_block(compiler, block)
10145 if (block->bbNum > bbNumMaxBeforeResolution)
10147 // There may be multiple blocks inserted when we split. But we must always have exactly
10148 // one path (i.e. all blocks must be single-successor and single-predecessor),
10149 // and only one block along the path may be non-empty.
10150 // Note that we may have a newly-inserted block that is empty, but which connects
10151 // two non-resolution blocks. This happens when an edge is split that requires it.
10153 BasicBlock* succBlock = block;
10156 succBlock = succBlock->GetUniqueSucc();
10157 noway_assert(succBlock != nullptr);
10158 } while ((succBlock->bbNum > bbNumMaxBeforeResolution) && succBlock->isEmpty());
10160 BasicBlock* predBlock = block;
10163 predBlock = predBlock->GetUniquePred(compiler);
10164 noway_assert(predBlock != nullptr);
10165 } while ((predBlock->bbNum > bbNumMaxBeforeResolution) && predBlock->isEmpty());
10167 unsigned succBBNum = succBlock->bbNum;
10168 unsigned predBBNum = predBlock->bbNum;
10169 if (block->isEmpty())
10171 // For the case of the empty block, find the non-resolution block (succ or pred).
10172 if (predBBNum > bbNumMaxBeforeResolution)
10174 assert(succBBNum <= bbNumMaxBeforeResolution);
10184 assert((succBBNum <= bbNumMaxBeforeResolution) && (predBBNum <= bbNumMaxBeforeResolution));
10186 SplitEdgeInfo info = {predBBNum, succBBNum};
10187 getSplitBBNumToTargetBBNumMap()->Set(block->bbNum, info);
10193 // Make sure the varToRegMaps match up on all edges.
10194 bool foundMismatch = false;
10195 foreach_block(compiler, block)
10197 if (block->isEmpty() && block->bbNum > bbNumMaxBeforeResolution)
10201 VarToRegMap toVarToRegMap = getInVarToRegMap(block->bbNum);
10202 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
10204 BasicBlock* predBlock = pred->flBlock;
10205 VarToRegMap fromVarToRegMap = getOutVarToRegMap(predBlock->bbNum);
10206 VarSetOps::Iter iter(compiler, block->bbLiveIn);
10207 unsigned varIndex = 0;
10208 while (iter.NextElem(&varIndex))
10210 regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
10211 regNumber toReg = getVarReg(toVarToRegMap, varIndex);
10212 if (fromReg != toReg)
10214 if (!foundMismatch)
10216 foundMismatch = true;
10217 printf("Found mismatched var locations after resolution!\n");
10219 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
10220 printf(" V%02u: BB%02u to BB%02u: %s to %s\n", varNum, predBlock->bbNum, block->bbNum,
10221 getRegName(fromReg), getRegName(toReg));
10226 assert(!foundMismatch);
10231 //------------------------------------------------------------------------
10232 // resolveEdge: Perform the specified type of resolution between two blocks.
10235 // fromBlock - the block from which the edge originates
10236 // toBlock - the block at which the edge terminates
10237 // resolveType - the type of resolution to be performed
10238 // liveSet - the set of tracked lclVar indices which may require resolution
10244 // The caller must have performed the analysis to determine the type of the edge.
10247 // This method emits the correctly ordered moves necessary to place variables in the
10248 // correct registers across a Split, Join or Critical edge.
10249 // In order to avoid overwriting register values before they have been moved to their
10250 // new home (register/stack), it first does the register-to-stack moves (to free those
10251 // registers), then the register to register moves, ensuring that the target register
10252 // is free before the move, and then finally the stack to register moves.
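// Illustrative ordering (registers are hypothetical): given the moves {rax -> stack, rbx -> rcx, stack -> rdx},
// we first emit rax -> stack (freeing rax), then the register-to-register move rbx -> rcx, and only then
// stack -> rdx, so that no location is overwritten before its value has been read.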
10254 void LinearScan::resolveEdge(BasicBlock* fromBlock,
10255 BasicBlock* toBlock,
10256 ResolveType resolveType,
10257 VARSET_VALARG_TP liveSet)
10259 VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
10260 VarToRegMap toVarToRegMap;
10261 if (resolveType == ResolveSharedCritical)
10263 toVarToRegMap = sharedCriticalVarToRegMap;
10267 toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
10270 // The block to which we add the resolution moves depends on the resolveType
10272 switch (resolveType)
10275 case ResolveSharedCritical:
10281 case ResolveCritical:
10282 // fgSplitEdge may add one or two BasicBlocks. It returns the block that splits
10283 // the edge from 'fromBlock' and 'toBlock', but if it inserts that block right after
10284 // a block with a fall-through it will have to create another block to handle that edge.
10285 // These new blocks can be mapped to existing blocks in order to correctly handle
10286 // the calls to recordVarLocationsAtStartOfBB() from codegen. That mapping is handled
10287 // in resolveEdges(), after all the edge resolution has been done (by calling this
10288 // method for each edge).
10289 block = compiler->fgSplitEdge(fromBlock, toBlock);
10291 // Split edges are counted against fromBlock.
10292 INTRACK_STATS(updateLsraStat(LSRA_STAT_SPLIT_EDGE, fromBlock->bbNum));
10299 #ifndef _TARGET_XARCH_
10300 // We record tempregs for beginning and end of each block.
10301 // For amd64/x86 we only need a tempReg for float - we'll use xchg for int.
10302 // TODO-Throughput: It would be better to determine the tempRegs on demand, but the code below
10303 // modifies the varToRegMaps so we don't have all the correct registers at the time
10304 // we need to get the tempReg.
10305 regNumber tempRegInt =
10306 (resolveType == ResolveSharedCritical) ? REG_NA : getTempRegForResolution(fromBlock, toBlock, TYP_INT);
10307 #endif // !_TARGET_XARCH_
10308 regNumber tempRegFlt = REG_NA;
10309 if ((compiler->compFloatingPointUsed) && (resolveType != ResolveSharedCritical))
10312 #ifdef _TARGET_ARM_
10313 // Let's try to reserve a double register for TYP_FLOAT and TYP_DOUBLE
10314 tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_DOUBLE);
10315 if (tempRegFlt == REG_NA)
10317 // If that fails, try to reserve a float register for TYP_FLOAT
10318 tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
10321 tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
10325 regMaskTP targetRegsToDo = RBM_NONE;
10326 regMaskTP targetRegsReady = RBM_NONE;
10327 regMaskTP targetRegsFromStack = RBM_NONE;
10329 // The following arrays capture the location of the registers as they are moved:
10330 // - location[reg] gives the current location of the var that was originally in 'reg'.
10331 // (Note that a var may be moved more than once.)
10332 // - source[reg] gives the original location of the var that needs to be moved to 'reg'.
10333 // For example, if a var is in rax and needs to be moved to rsi, then we would start with:
10334 // location[rax] == rax
10335 // source[rsi] == rax -- this doesn't change
10336 // Then, if for some reason we need to move it temporarily to rbx, we would have:
10337 // location[rax] == rbx
10338 // Once we have completed the move, we will have:
10339 // location[rax] == REG_NA
10340 // This indicates that the var originally in rax is now in its target register.
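// Worked example of a cycle (registers are hypothetical): if the var in rax must move to rsi and the
// var in rsi must move to rax, then initially
//     location[rax] == rax, location[rsi] == rsi, source[rsi] == rax, source[rax] == rsi
// and neither target register is free. The loop below breaks the cycle either with a swap
// (see insertSwap) or by first moving one of the vars to a temp register or the stack, after which
// the remaining move becomes "ready", and each location[] entry is cleared to REG_NA as its var
// reaches its target.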
10342 regNumberSmall location[REG_COUNT];
10343 C_ASSERT(sizeof(char) == sizeof(regNumberSmall)); // for memset to work
10344 memset(location, REG_NA, REG_COUNT);
10345 regNumberSmall source[REG_COUNT];
10346 memset(source, REG_NA, REG_COUNT);
10348 // What interval is this register associated with?
10349 // (associated with incoming reg)
10350 Interval* sourceIntervals[REG_COUNT];
10351 memset(&sourceIntervals, 0, sizeof(sourceIntervals));
10353 // Intervals for vars that need to be loaded from the stack
10354 Interval* stackToRegIntervals[REG_COUNT];
10355 memset(&stackToRegIntervals, 0, sizeof(stackToRegIntervals));
10357 // Get the starting insertion point for the "to" resolution
10358 GenTreePtr insertionPoint = nullptr;
10359 if (resolveType == ResolveSplit || resolveType == ResolveCritical)
10361 insertionPoint = LIR::AsRange(block).FirstNonPhiNode();
10365 // - Perform all moves from reg to stack (no ordering needed on these)
10366 // - For reg to reg moves, record the current location, associating their
10367 // source location with the target register they need to go into
10368 // - For stack to reg moves (done last, no ordering needed between them)
10369 // record the interval associated with the target reg
10370 // TODO-Throughput: We should be looping over the liveIn and liveOut registers, since
10371 // that will scale better than the live variables
10373 VarSetOps::Iter iter(compiler, liveSet);
10374 unsigned varIndex = 0;
10375 while (iter.NextElem(&varIndex))
10377 regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
10378 regNumber toReg = getVarReg(toVarToRegMap, varIndex);
10379 if (fromReg == toReg)
10384 // For Critical edges, the location will not change on either side of the edge,
10385 // since we'll add a new block to do the move.
10386 if (resolveType == ResolveSplit)
10388 setVarReg(toVarToRegMap, varIndex, fromReg);
10390 else if (resolveType == ResolveJoin || resolveType == ResolveSharedCritical)
10392 setVarReg(fromVarToRegMap, varIndex, toReg);
10395 assert(fromReg < UCHAR_MAX && toReg < UCHAR_MAX);
10397 Interval* interval = getIntervalForLocalVar(varIndex);
10399 if (fromReg == REG_STK)
10401 stackToRegIntervals[toReg] = interval;
10402 targetRegsFromStack |= genRegMask(toReg);
10404 else if (toReg == REG_STK)
10406 // Do the reg to stack moves now
10407 addResolution(block, insertionPoint, interval, REG_STK, fromReg);
10408 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10412 location[fromReg] = (regNumberSmall)fromReg;
10413 source[toReg] = (regNumberSmall)fromReg;
10414 sourceIntervals[fromReg] = interval;
10415 targetRegsToDo |= genRegMask(toReg);
10419 // REGISTER to REGISTER MOVES
10421 // First, find all the ones that are ready to move now
10422 regMaskTP targetCandidates = targetRegsToDo;
10423 while (targetCandidates != RBM_NONE)
10425 regMaskTP targetRegMask = genFindLowestBit(targetCandidates);
10426 targetCandidates &= ~targetRegMask;
10427 regNumber targetReg = genRegNumFromMask(targetRegMask);
10428 if (location[targetReg] == REG_NA)
10430 targetRegsReady |= targetRegMask;
10434 // Perform reg to reg moves
10435 while (targetRegsToDo != RBM_NONE)
10437 while (targetRegsReady != RBM_NONE)
10439 regMaskTP targetRegMask = genFindLowestBit(targetRegsReady);
10440 targetRegsToDo &= ~targetRegMask;
10441 targetRegsReady &= ~targetRegMask;
10442 regNumber targetReg = genRegNumFromMask(targetRegMask);
10443 assert(location[targetReg] != targetReg);
10444 regNumber sourceReg = (regNumber)source[targetReg];
10445 regNumber fromReg = (regNumber)location[sourceReg];
10446 assert(fromReg < UCHAR_MAX && sourceReg < UCHAR_MAX);
10447 Interval* interval = sourceIntervals[sourceReg];
10448 assert(interval != nullptr);
10449 addResolution(block, insertionPoint, interval, targetReg, fromReg);
10450 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10451 sourceIntervals[sourceReg] = nullptr;
10452 location[sourceReg] = REG_NA;
10454 // Do we have a free targetReg?
10455 if (fromReg == sourceReg && source[fromReg] != REG_NA)
10457 regMaskTP fromRegMask = genRegMask(fromReg);
10458 targetRegsReady |= fromRegMask;
10461 if (targetRegsToDo != RBM_NONE)
10463 regMaskTP targetRegMask = genFindLowestBit(targetRegsToDo);
10464 regNumber targetReg = genRegNumFromMask(targetRegMask);
10466 // Is it already there due to other moves?
10467 // If not, move it to the temp reg, OR swap it with another register
10468 regNumber sourceReg = (regNumber)source[targetReg];
10469 regNumber fromReg = (regNumber)location[sourceReg];
10470 if (targetReg == fromReg)
10472 targetRegsToDo &= ~targetRegMask;
10476 regNumber tempReg = REG_NA;
10477 bool useSwap = false;
10478 if (emitter::isFloatReg(targetReg))
10480 #ifdef _TARGET_ARM_
10481 if (sourceIntervals[fromReg]->registerType == TYP_DOUBLE)
10483 // ARM32 requires a double temp register for TYP_DOUBLE.
10484 // We tried to reserve a double temp register first, but sometimes we can't.
10485 tempReg = genIsValidDoubleReg(tempRegFlt) ? tempRegFlt : REG_NA;
10488 #endif // _TARGET_ARM_
10489 tempReg = tempRegFlt;
10491 #ifdef _TARGET_XARCH_
10496 #else // !_TARGET_XARCH_
10500 tempReg = tempRegInt;
10503 #endif // !_TARGET_XARCH_
10504 if (useSwap || tempReg == REG_NA)
10506 // First, we have to figure out the destination register for what's currently in fromReg,
10507 // so that we can find its sourceInterval.
10508 regNumber otherTargetReg = REG_NA;
10510 // By chance, is fromReg going where it belongs?
10511 if (location[source[fromReg]] == targetReg)
10513 otherTargetReg = fromReg;
10514 // If we can swap, we will be done with otherTargetReg as well.
10515 // Otherwise, we'll spill it to the stack and reload it later.
10518 regMaskTP fromRegMask = genRegMask(fromReg);
10519 targetRegsToDo &= ~fromRegMask;
10524 // Look at the remaining registers from targetRegsToDo (which we expect to be relatively
10525 // small at this point) to find out what's currently in targetReg.
10526 regMaskTP mask = targetRegsToDo;
10527 while (mask != RBM_NONE && otherTargetReg == REG_NA)
10529 regMaskTP nextRegMask = genFindLowestBit(mask);
10530 regNumber nextReg = genRegNumFromMask(nextRegMask);
10531 mask &= ~nextRegMask;
10532 if (location[source[nextReg]] == targetReg)
10534 otherTargetReg = nextReg;
10538 assert(otherTargetReg != REG_NA);
10542 // Generate a "swap" of fromReg and targetReg
10543 insertSwap(block, insertionPoint, sourceIntervals[source[otherTargetReg]]->varNum, targetReg,
10544 sourceIntervals[sourceReg]->varNum, fromReg);
10545 location[sourceReg] = REG_NA;
10546 location[source[otherTargetReg]] = (regNumberSmall)fromReg;
10548 INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
10552 // Spill "targetReg" to the stack and add its eventual target (otherTargetReg)
10553 // to "targetRegsFromStack", which will be handled below.
10554 // NOTE: This condition is very rare. Setting COMPlus_JitStressRegs=0x203
10555 // has been known to trigger it in JIT SH.
10557 // First, spill "otherInterval" from targetReg to the stack.
10558 Interval* otherInterval = sourceIntervals[source[otherTargetReg]];
10559 setIntervalAsSpilled(otherInterval);
10560 addResolution(block, insertionPoint, otherInterval, REG_STK, targetReg);
10561 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10562 location[source[otherTargetReg]] = REG_STK;
10564 // Now, move the interval that is going to targetReg, and add its "fromReg" to
10565 // "targetRegsReady".
10566 addResolution(block, insertionPoint, sourceIntervals[sourceReg], targetReg, fromReg);
10567 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10568 location[sourceReg] = REG_NA;
10569 targetRegsReady |= genRegMask(fromReg);
10571 targetRegsToDo &= ~targetRegMask;
10575 compiler->codeGen->regSet.rsSetRegsModified(genRegMask(tempReg) DEBUGARG(dumpTerse));
10576 assert(sourceIntervals[targetReg] != nullptr);
10577 addResolution(block, insertionPoint, sourceIntervals[targetReg], tempReg, targetReg);
10578 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10579 location[targetReg] = (regNumberSmall)tempReg;
10580 targetRegsReady |= targetRegMask;
10586 // Finally, perform stack to reg moves
10587 // All the target regs will be empty at this point
10588 while (targetRegsFromStack != RBM_NONE)
10590 regMaskTP targetRegMask = genFindLowestBit(targetRegsFromStack);
10591 targetRegsFromStack &= ~targetRegMask;
10592 regNumber targetReg = genRegNumFromMask(targetRegMask);
10594 Interval* interval = stackToRegIntervals[targetReg];
10595 assert(interval != nullptr);
10597 addResolution(block, insertionPoint, interval, targetReg, REG_STK);
10598 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
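// Illustrative walkthrough of the shuffle above (comment only; the variable and
// register names are hypothetical, not taken from any particular method):
//
//   edge requirement:   V01  EAX -> EBX        V02  EBX -> EAX
//
// Neither target register starts out free, so nothing lands in targetRegsReady and the
// circular-dependency path is taken: either insertSwap() exchanges EAX and EBX directly
// (the useSwap case), or one of the intervals is parked in a temp register
// (tempRegInt/tempRegFlt), or, in the rare case that no temp is available, it is spilled
// to the stack and picked up again by the final stack-to-reg pass above.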
10602 void TreeNodeInfo::Initialize(LinearScan* lsra, GenTree* node, LsraLocation location)
10604 regMaskTP dstCandidates;
10606 // if there is a reg indicated on the tree node, use that for dstCandidates
10607 // the exception is the NOP, which sometimes shows up around late args.
10608 // TODO-Cleanup: get rid of those NOPs.
10609 if (node->gtRegNum == REG_NA || node->gtOper == GT_NOP)
10612 if (node->OperGet() == GT_PUTARG_REG)
10614 dstCandidates = lsra->allRegs(TYP_INT);
10619 dstCandidates = lsra->allRegs(node->TypeGet());
10624 dstCandidates = genRegMask(node->gtRegNum);
10627 internalIntCount = 0;
10628 internalFloatCount = 0;
10629 isLocalDefUse = false;
10630 isLsraAdded = false;
10631 definesAnyRegisters = false;
10633 setDstCandidates(lsra, dstCandidates);
10634 srcCandsIndex = dstCandsIndex;
10636 setInternalCandidates(lsra, lsra->allRegs(TYP_INT));
10640 isInitialized = true;
10643 assert(IsValid(lsra));
10646 regMaskTP TreeNodeInfo::getSrcCandidates(LinearScan* lsra)
10648 return lsra->GetRegMaskForIndex(srcCandsIndex);
10651 void TreeNodeInfo::setSrcCandidates(LinearScan* lsra, regMaskTP mask)
10653 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10654 assert(FitsIn<unsigned char>(i));
10655 srcCandsIndex = (unsigned char)i;
10658 regMaskTP TreeNodeInfo::getDstCandidates(LinearScan* lsra)
10660 return lsra->GetRegMaskForIndex(dstCandsIndex);
10663 void TreeNodeInfo::setDstCandidates(LinearScan* lsra, regMaskTP mask)
10665 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10666 assert(FitsIn<unsigned char>(i));
10667 dstCandsIndex = (unsigned char)i;
10670 regMaskTP TreeNodeInfo::getInternalCandidates(LinearScan* lsra)
10672 return lsra->GetRegMaskForIndex(internalCandsIndex);
10675 void TreeNodeInfo::setInternalCandidates(LinearScan* lsra, regMaskTP mask)
10677 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10678 assert(FitsIn<unsigned char>(i));
10679 internalCandsIndex = (unsigned char)i;
10682 void TreeNodeInfo::addInternalCandidates(LinearScan* lsra, regMaskTP mask)
10684 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(lsra->GetRegMaskForIndex(internalCandsIndex) | mask);
10685 assert(FitsIn<unsigned char>(i));
10686 internalCandsIndex = (unsigned char)i;
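// A minimal sketch of how these accessors are expected to be used (the mask below is
// hypothetical; GetIndexForRegMask/GetRegMaskForIndex are the LinearScan helpers, used
// above, that map between a full regMaskTP and the small index stored in TreeNodeInfo):
//
//   regMaskTP mask = RBM_EAX | RBM_EDX;               // desired candidate set
//   info->setDstCandidates(lsra, mask);               // stored as an unsigned char index
//   assert(info->getDstCandidates(lsra) == mask);     // round-trips losslessly
//
// Encoding the mask as an index keeps TreeNodeInfo compact (see the FitsIn<unsigned char>
// asserts above) at the cost of a table lookup on each get*Candidates call.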
10689 #if TRACK_LSRA_STATS
10690 // ----------------------------------------------------------
10691 // updateLsraStat: Increment LSRA stat counter.
10694 // stat - LSRA stat enum
10695 // bbNum - Basic block with which the LSRA stat is to be
10696 //         associated.
10698 void LinearScan::updateLsraStat(LsraStat stat, unsigned bbNum)
10700 if (bbNum > bbNumMaxBeforeResolution)
10702 // This is a newly created basic block as part of resolution.
10703 // These blocks contain resolution moves that are already accounted for.
10709 case LSRA_STAT_SPILL:
10710 ++(blockInfo[bbNum].spillCount);
10713 case LSRA_STAT_COPY_REG:
10714 ++(blockInfo[bbNum].copyRegCount);
10717 case LSRA_STAT_RESOLUTION_MOV:
10718 ++(blockInfo[bbNum].resolutionMovCount);
10721 case LSRA_STAT_SPLIT_EDGE:
10722 ++(blockInfo[bbNum].splitEdgeCount);
10730 // -----------------------------------------------------------
10731 // dumpLsraStats - dumps LSRA stats to the given file.
10734 // file - file to which stats are to be written.
10736 void LinearScan::dumpLsraStats(FILE* file)
10738 unsigned sumSpillCount = 0;
10739 unsigned sumCopyRegCount = 0;
10740 unsigned sumResolutionMovCount = 0;
10741 unsigned sumSplitEdgeCount = 0;
10742 UINT64 wtdSpillCount = 0;
10743 UINT64 wtdCopyRegCount = 0;
10744 UINT64 wtdResolutionMovCount = 0;
10746 fprintf(file, "----------\n");
10747 fprintf(file, "LSRA Stats");
10751 fprintf(file, " : %s\n", compiler->info.compFullName);
10755 // In verbose mode there is no need to print the full method name
10756 // while printing LSRA stats.
10757 fprintf(file, "\n");
10760 fprintf(file, " : %s\n", compiler->eeGetMethodFullName(compiler->info.compCompHnd));
10763 fprintf(file, "----------\n");
10765 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
10767 if (block->bbNum > bbNumMaxBeforeResolution)
10772 unsigned spillCount = blockInfo[block->bbNum].spillCount;
10773 unsigned copyRegCount = blockInfo[block->bbNum].copyRegCount;
10774 unsigned resolutionMovCount = blockInfo[block->bbNum].resolutionMovCount;
10775 unsigned splitEdgeCount = blockInfo[block->bbNum].splitEdgeCount;
10777 if (spillCount != 0 || copyRegCount != 0 || resolutionMovCount != 0 || splitEdgeCount != 0)
10779 fprintf(file, "BB%02u [%8d]: ", block->bbNum, block->bbWeight);
10780 fprintf(file, "SpillCount = %d, ResolutionMovs = %d, SplitEdges = %d, CopyReg = %d\n", spillCount,
10781 resolutionMovCount, splitEdgeCount, copyRegCount);
10784 sumSpillCount += spillCount;
10785 sumCopyRegCount += copyRegCount;
10786 sumResolutionMovCount += resolutionMovCount;
10787 sumSplitEdgeCount += splitEdgeCount;
10789 wtdSpillCount += (UINT64)spillCount * block->bbWeight;
10790 wtdCopyRegCount += (UINT64)copyRegCount * block->bbWeight;
10791 wtdResolutionMovCount += (UINT64)resolutionMovCount * block->bbWeight;
10794 fprintf(file, "Total Tracked Vars: %d\n", compiler->lvaTrackedCount);
10795 fprintf(file, "Total Reg Cand Vars: %d\n", regCandidateVarCount);
10796 fprintf(file, "Total number of Intervals: %d\n", static_cast<unsigned>(intervals.size() - 1));
10797 fprintf(file, "Total number of RefPositions: %d\n", static_cast<unsigned>(refPositions.size() - 1));
10798 fprintf(file, "Total Spill Count: %d Weighted: %I64u\n", sumSpillCount, wtdSpillCount);
10799 fprintf(file, "Total CopyReg Count: %d Weighted: %I64u\n", sumCopyRegCount, wtdCopyRegCount);
10800 fprintf(file, "Total ResolutionMov Count: %d Weighted: %I64u\n", sumResolutionMovCount, wtdResolutionMovCount);
10801 fprintf(file, "Total number of split edges: %d\n", sumSplitEdgeCount);
10803 // compute total number of spill temps created
10804 unsigned numSpillTemps = 0;
10805 for (int i = 0; i < TYP_COUNT; i++)
10807 numSpillTemps += maxSpill[i];
10809 fprintf(file, "Total Number of spill temps created: %d\n\n", numSpillTemps);
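// Worked example of the weighted totals printed above (hypothetical numbers): a block
// with bbWeight = 100 that incurred 2 spills and 1 resolution move contributes
//   wtdSpillCount         += 2 * 100 = 200
//   wtdResolutionMovCount += 1 * 100 = 100
// while the unweighted sums (sumSpillCount etc.) go up by just 2 and 1.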
10811 #endif // TRACK_LSRA_STATS
10814 void dumpRegMask(regMaskTP regs)
10816 if (regs == RBM_ALLINT)
10818 printf("[allInt]");
10820 else if (regs == (RBM_ALLINT & ~RBM_FPBASE))
10822 printf("[allIntButFP]");
10824 else if (regs == RBM_ALLFLOAT)
10826 printf("[allFloat]");
10828 else if (regs == RBM_ALLDOUBLE)
10830 printf("[allDouble]");
10838 static const char* getRefTypeName(RefType refType)
10842 #define DEF_REFTYPE(memberName, memberValue, shortName) \
10844 return #memberName;
10845 #include "lsra_reftypes.h"
10852 static const char* getRefTypeShortName(RefType refType)
10856 #define DEF_REFTYPE(memberName, memberValue, shortName) \
10859 #include "lsra_reftypes.h"
10866 void RefPosition::dump()
10868 printf("<RefPosition #%-3u @%-3u", rpNum, nodeLocation);
10870 if (nextRefPosition)
10872 printf(" ->#%-3u", nextRefPosition->rpNum);
10875 printf(" %s ", getRefTypeName(refType));
10877 if (this->isPhysRegRef)
10879 this->getReg()->tinyDump();
10881 else if (getInterval())
10883 this->getInterval()->tinyDump();
10886 if (this->treeNode)
10888 printf("%s ", treeNode->OpName(treeNode->OperGet()));
10890 printf("BB%02u ", this->bbNum);
10892 printf("regmask=");
10893 dumpRegMask(registerAssignment);
10903 if (this->spillAfter)
10905 printf(" spillAfter");
10915 if (this->isFixedRegRef)
10919 if (this->isLocalDefUse)
10923 if (this->delayRegFree)
10927 if (this->outOfOrder)
10929 printf(" outOfOrder");
10932 if (this->AllocateIfProfitable())
10934 printf(" regOptional");
10939 void RegRecord::dump()
10944 void Interval::dump()
10946 printf("Interval %2u:", intervalIndex);
10950 printf(" (V%02u)", varNum);
10954 printf(" (INTERNAL)");
10958 printf(" (SPILLED)");
10962 printf(" (SPLIT)");
10966 printf(" (struct)");
10968 if (isSpecialPutArg)
10970 printf(" (specialPutArg)");
10974 printf(" (constant)");
10977 printf(" RefPositions {");
10978 for (RefPosition* refPosition = this->firstRefPosition; refPosition != nullptr;
10979 refPosition = refPosition->nextRefPosition)
10981 printf("#%u@%u", refPosition->rpNum, refPosition->nodeLocation);
10982 if (refPosition->nextRefPosition)
10989 // this is not used (yet?)
10990 // printf(" SpillOffset %d", this->spillOffset);
10992 printf(" physReg:%s", getRegName(physReg));
10994 printf(" Preferences=");
10995 dumpRegMask(this->registerPreferences);
10997 if (relatedInterval)
10999 printf(" RelatedInterval ");
11000 relatedInterval->microDump();
11001 printf("[%p]", dspPtr(relatedInterval));
11007 // print out very concise representation
11008 void Interval::tinyDump()
11010 printf("<Ivl:%u", intervalIndex);
11013 printf(" V%02u", varNum);
11017 printf(" internal");
11022 // print out extremely concise representation
11023 void Interval::microDump()
11025 char intervalTypeChar = 'I';
11028 intervalTypeChar = 'T';
11030 else if (isLocalVar)
11032 intervalTypeChar = 'L';
11035 printf("<%c%u>", intervalTypeChar, intervalIndex);
11038 void RegRecord::tinyDump()
11040 printf("<Reg:%-3s> ", getRegName(regNum));
11043 void TreeNodeInfo::dump(LinearScan* lsra)
11045 printf("<TreeNodeInfo @ %2u %d=%d %di %df", loc, dstCount, srcCount, internalIntCount, internalFloatCount);
11047 dumpRegMask(getSrcCandidates(lsra));
11049 dumpRegMask(getInternalCandidates(lsra));
11051 dumpRegMask(getDstCandidates(lsra));
11076 if (isInternalRegDelayFree)
11083 void LinearScan::lsraDumpIntervals(const char* msg)
11085 Interval* interval;
11087 printf("\nLinear scan intervals %s:\n", msg);
11088 for (auto& interval : intervals)
11090 // only dump something if it has references
11091 // if (interval->firstRefPosition)
11098 // Dumps a tree node as a destination or source operand, with the style
11099 // of dump dependent on the mode
11100 void LinearScan::lsraGetOperandString(GenTreePtr tree,
11101 LsraTupleDumpMode mode,
11102 char* operandString,
11103 unsigned operandStringLength)
11105 const char* lastUseChar = "";
11106 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
11112 case LinearScan::LSRA_DUMP_PRE:
11113 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtTreeID, lastUseChar);
11115 case LinearScan::LSRA_DUMP_REFPOS:
11116 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtTreeID, lastUseChar);
11118 case LinearScan::LSRA_DUMP_POST:
11120 Compiler* compiler = JitTls::GetCompiler();
11122 if (!tree->gtHasReg())
11124 _snprintf_s(operandString, operandStringLength, operandStringLength, "STK%s", lastUseChar);
11128 _snprintf_s(operandString, operandStringLength, operandStringLength, "%s%s",
11129 getRegName(tree->gtRegNum, useFloatReg(tree->TypeGet())), lastUseChar);
11134 printf("ERROR: INVALID TUPLE DUMP MODE\n");
11138 void LinearScan::lsraDispNode(GenTreePtr tree, LsraTupleDumpMode mode, bool hasDest)
11140 Compiler* compiler = JitTls::GetCompiler();
11141 const unsigned operandStringLength = 16;
11142 char operandString[operandStringLength];
11143 const char* emptyDestOperand = " ";
11144 char spillChar = ' ';
11146 if (mode == LinearScan::LSRA_DUMP_POST)
11148 if ((tree->gtFlags & GTF_SPILL) != 0)
11152 if (!hasDest && tree->gtHasReg())
11154 // A node can define a register, but not produce a value for a parent to consume,
11155 // i.e. in the "localDefUse" case.
11156 // There used to be an assert here that we wouldn't spill such a node.
11157 // However, we can have unused lclVars that wind up being the node at which
11158 // it is spilled. This probably indicates a bug, but we don't really want to
11159 // assert during a dump.
11160 if (spillChar == 'S')
11171 printf("%c N%03u. ", spillChar, tree->gtSeqNum);
11173 LclVarDsc* varDsc = nullptr;
11174 unsigned varNum = UINT_MAX;
11175 if (tree->IsLocal())
11177 varNum = tree->gtLclVarCommon.gtLclNum;
11178 varDsc = &(compiler->lvaTable[varNum]);
11179 if (varDsc->lvLRACandidate)
11186 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
11188 assert(tree->gtHasReg());
11190 lsraGetOperandString(tree, mode, operandString, operandStringLength);
11191 printf("%-15s =", operandString);
11195 printf("%-15s ", emptyDestOperand);
11197 if (varDsc != nullptr)
11199 if (varDsc->lvLRACandidate)
11201 if (mode == LSRA_DUMP_REFPOS)
11203 printf(" V%02u(L%d)", varNum, getIntervalForLocalVar(varDsc->lvVarIndex)->intervalIndex);
11207 lsraGetOperandString(tree, mode, operandString, operandStringLength);
11208 printf(" V%02u(%s)", varNum, operandString);
11209 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
11217 printf(" V%02u MEM", varNum);
11220 else if (tree->OperIsAssignment())
11222 assert(!tree->gtHasReg());
11223 printf(" asg%s ", GenTree::OpName(tree->OperGet()));
11227 compiler->gtDispNodeName(tree);
11228 if (tree->OperKind() & GTK_LEAF)
11230 compiler->gtDispLeaf(tree, nullptr);
11235 //------------------------------------------------------------------------
11236 // ComputeOperandDstCount: computes the number of registers defined by a node.
11239 // For most nodes, this is simple:
11240 // - Nodes that do not produce values (e.g. stores and other void-typed
11241 // nodes) and nodes that immediately use the registers they define
11242 // produce no registers
11243 // - Nodes that are marked as defining N registers define N registers.
11245 // For contained nodes, however, things are more complicated: for purposes
11246 // of bookkeeping, a contained node is treated as producing the transitive
11247 // closure of the registers produced by its sources.
11250 // operand - The operand for which to compute a register count.
11253 // The number of registers defined by `operand`.
11255 void LinearScan::DumpOperandDefs(
11256 GenTree* operand, bool& first, LsraTupleDumpMode mode, char* operandString, const unsigned operandStringLength)
11258 assert(operand != nullptr);
11259 assert(operandString != nullptr);
11261 if (ComputeOperandDstCount(operand) == 0)
11266 if (operand->gtLsraInfo.dstCount != 0)
11268 // This operand directly produces registers; print it.
11269 for (int i = 0; i < operand->gtLsraInfo.dstCount; i++)
11276 lsraGetOperandString(operand, mode, operandString, operandStringLength);
11277 printf("%s", operandString);
11284 // This is a contained node. Dump the defs produced by its operands.
11285 for (GenTree* op : operand->Operands())
11287 DumpOperandDefs(op, first, mode, operandString, operandStringLength);
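// Example of the recursion above (hypothetical tree shape): for an indirection whose
// address is a contained GT_LEA, the GT_LEA itself reports dstCount == 0, so
// DumpOperandDefs walks its base/index operands and prints the registers they define;
// that is, the contained node is dumped as if it produced the defs of its own sources,
// matching the "transitive closure" rule described in the header comment above.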
11292 void LinearScan::TupleStyleDump(LsraTupleDumpMode mode)
11295 LsraLocation currentLoc = 1; // 0 is the entry
11296 const unsigned operandStringLength = 16;
11297 char operandString[operandStringLength];
11299 // currentRefPosition is not used for LSRA_DUMP_PRE
11300 // We keep separate iterators for defs, so that we can print them
11301 // on the lhs of the dump
11302 auto currentRefPosition = refPositions.begin();
11306 case LSRA_DUMP_PRE:
11307 printf("TUPLE STYLE DUMP BEFORE LSRA\n");
11309 case LSRA_DUMP_REFPOS:
11310 printf("TUPLE STYLE DUMP WITH REF POSITIONS\n");
11312 case LSRA_DUMP_POST:
11313 printf("TUPLE STYLE DUMP WITH REGISTER ASSIGNMENTS\n");
11316 printf("ERROR: INVALID TUPLE DUMP MODE\n");
11320 if (mode != LSRA_DUMP_PRE)
11322 printf("Incoming Parameters: ");
11323 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB;
11324 ++currentRefPosition)
11326 Interval* interval = currentRefPosition->getInterval();
11327 assert(interval != nullptr && interval->isLocalVar);
11328 printf(" V%02d", interval->varNum);
11329 if (mode == LSRA_DUMP_POST)
11332 if (currentRefPosition->registerAssignment == RBM_NONE)
11338 reg = currentRefPosition->assignedReg();
11340 LclVarDsc* varDsc = &(compiler->lvaTable[interval->varNum]);
11342 regNumber assignedReg = varDsc->lvRegNum;
11343 regNumber argReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
11345 assert(reg == assignedReg || varDsc->lvRegister == false);
11348 printf(getRegName(argReg, isFloatRegType(interval->registerType)));
11351 printf("%s)", getRegName(reg, isFloatRegType(interval->registerType)));
11357 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
11361 if (mode == LSRA_DUMP_REFPOS)
11363 bool printedBlockHeader = false;
11364 // We should find the boundary RefPositions in the order of exposed uses, dummy defs, and the blocks
11365 for (; currentRefPosition != refPositions.end() &&
11366 (currentRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef ||
11367 (currentRefPosition->refType == RefTypeBB && !printedBlockHeader));
11368 ++currentRefPosition)
11370 Interval* interval = nullptr;
11371 if (currentRefPosition->isIntervalRef())
11373 interval = currentRefPosition->getInterval();
11375 switch (currentRefPosition->refType)
11377 case RefTypeExpUse:
11378 assert(interval != nullptr);
11379 assert(interval->isLocalVar);
11380 printf(" Exposed use of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
11382 case RefTypeDummyDef:
11383 assert(interval != nullptr);
11384 assert(interval->isLocalVar);
11385 printf(" Dummy def of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
11388 block->dspBlockHeader(compiler);
11389 printedBlockHeader = true;
11393 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
11400 block->dspBlockHeader(compiler);
11403 if (enregisterLocalVars && mode == LSRA_DUMP_POST && block != compiler->fgFirstBB &&
11404 block->bbNum <= bbNumMaxBeforeResolution)
11406 printf("Predecessor for variable locations: BB%02u\n", blockInfo[block->bbNum].predBBNum);
11407 dumpInVarToRegMap(block);
11409 if (block->bbNum > bbNumMaxBeforeResolution)
11411 SplitEdgeInfo splitEdgeInfo;
11412 splitBBNumToTargetBBNumMap->Lookup(block->bbNum, &splitEdgeInfo);
11413 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
11414 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
11415 printf("New block introduced for resolution from BB%02u to BB%02u\n", splitEdgeInfo.fromBBNum,
11416 splitEdgeInfo.toBBNum);
11419 for (GenTree* node : LIR::AsRange(block).NonPhiNodes())
11421 GenTree* tree = node;
11423 genTreeOps oper = tree->OperGet();
11424 TreeNodeInfo& info = tree->gtLsraInfo;
11425 if (tree->gtLsraInfo.isLsraAdded)
11427 // This must be one of the nodes that we add during LSRA
11429 if (oper == GT_LCL_VAR)
11434 else if (oper == GT_RELOAD || oper == GT_COPY)
11439 #ifdef FEATURE_SIMD
11440 else if (oper == GT_SIMD)
11442 if (tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperSave)
11449 assert(tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperRestore);
11454 #endif // FEATURE_SIMD
11457 assert(oper == GT_SWAP);
11461 info.internalIntCount = 0;
11462 info.internalFloatCount = 0;
11465 int consume = info.srcCount;
11466 int produce = info.dstCount;
11467 regMaskTP killMask = RBM_NONE;
11468 regMaskTP fixedMask = RBM_NONE;
11470 lsraDispNode(tree, mode, produce != 0 && mode != LSRA_DUMP_REFPOS);
11472 if (mode != LSRA_DUMP_REFPOS)
11479 for (GenTree* operand : tree->Operands())
11481 DumpOperandDefs(operand, first, mode, operandString, operandStringLength);
11487 // Print each RefPosition on a new line, but
11488 // printing all the kills for each node on a single line
11489 // and combining the fixed regs with their associated def or use
11490 bool killPrinted = false;
11491 RefPosition* lastFixedRegRefPos = nullptr;
11492 for (; currentRefPosition != refPositions.end() &&
11493 (currentRefPosition->refType == RefTypeUse || currentRefPosition->refType == RefTypeFixedReg ||
11494 currentRefPosition->refType == RefTypeKill || currentRefPosition->refType == RefTypeDef) &&
11495 (currentRefPosition->nodeLocation == tree->gtSeqNum ||
11496 currentRefPosition->nodeLocation == tree->gtSeqNum + 1);
11497 ++currentRefPosition)
11499 Interval* interval = nullptr;
11500 if (currentRefPosition->isIntervalRef())
11502 interval = currentRefPosition->getInterval();
11504 switch (currentRefPosition->refType)
11507 if (currentRefPosition->isPhysRegRef)
11509 printf("\n Use:R%d(#%d)",
11510 currentRefPosition->getReg()->regNum, currentRefPosition->rpNum);
11514 assert(interval != nullptr);
11516 interval->microDump();
11517 printf("(#%d)", currentRefPosition->rpNum);
11518 if (currentRefPosition->isFixedRegRef)
11520 assert(genMaxOneBit(currentRefPosition->registerAssignment));
11521 assert(lastFixedRegRefPos != nullptr);
11522 printf(" Fixed:%s(#%d)", getRegName(currentRefPosition->assignedReg(),
11523 isFloatRegType(interval->registerType)),
11524 lastFixedRegRefPos->rpNum);
11525 lastFixedRegRefPos = nullptr;
11527 if (currentRefPosition->isLocalDefUse)
11529 printf(" LocalDefUse");
11531 if (currentRefPosition->lastUse)
11539 // Print each def on a new line
11540 assert(interval != nullptr);
11542 interval->microDump();
11543 printf("(#%d)", currentRefPosition->rpNum);
11544 if (currentRefPosition->isFixedRegRef)
11546 assert(genMaxOneBit(currentRefPosition->registerAssignment));
11547 printf(" %s", getRegName(currentRefPosition->assignedReg(),
11548 isFloatRegType(interval->registerType)));
11550 if (currentRefPosition->isLocalDefUse)
11552 printf(" LocalDefUse");
11554 if (currentRefPosition->lastUse)
11558 if (interval->relatedInterval != nullptr)
11561 interval->relatedInterval->microDump();
11568 printf("\n Kill: ");
11569 killPrinted = true;
11571 printf(getRegName(currentRefPosition->assignedReg(),
11572 isFloatRegType(currentRefPosition->getReg()->registerType)));
11575 case RefTypeFixedReg:
11576 lastFixedRegRefPos = currentRefPosition;
11579 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
11585 if (info.internalIntCount != 0 && mode != LSRA_DUMP_REFPOS)
11587 printf("\tinternal (%d):\t", info.internalIntCount);
11588 if (mode == LSRA_DUMP_POST)
11590 dumpRegMask(tree->gtRsvdRegs);
11592 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
11594 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
11598 if (info.internalFloatCount != 0 && mode != LSRA_DUMP_REFPOS)
11600 printf("\tinternal (%d):\t", info.internalFloatCount);
11601 if (mode == LSRA_DUMP_POST)
11603 dumpRegMask(tree->gtRsvdRegs);
11605 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
11607 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
11612 if (enregisterLocalVars && mode == LSRA_DUMP_POST)
11614 dumpOutVarToRegMap(block);
11621 void LinearScan::dumpLsraAllocationEvent(LsraDumpEvent event,
11622 Interval* interval,
11624 BasicBlock* currentBlock)
11632 // Conflicting def/use
11633 case LSRA_EVENT_DEFUSE_CONFLICT:
11636 printf(" Def and Use have conflicting register requirements:");
11640 printf("DUconflict ");
11644 case LSRA_EVENT_DEFUSE_FIXED_DELAY_USE:
11647 printf(" Can't change useAssignment ");
11650 case LSRA_EVENT_DEFUSE_CASE1:
11653 printf(" case #1, use the defRegAssignment\n");
11657 printf(indentFormat, " case #1 use defRegAssignment");
11659 dumpEmptyRefPosition();
11662 case LSRA_EVENT_DEFUSE_CASE2:
11665 printf(" case #2, use the useRegAssignment\n");
11669 printf(indentFormat, " case #2 use useRegAssignment");
11671 dumpEmptyRefPosition();
11674 case LSRA_EVENT_DEFUSE_CASE3:
11677 printf(" case #3, change the defRegAssignment to the use regs\n");
11681 printf(indentFormat, " case #3 use useRegAssignment");
11683 dumpEmptyRefPosition();
11686 case LSRA_EVENT_DEFUSE_CASE4:
11689 printf(" case #4, change the useRegAssignment to the def regs\n");
11693 printf(indentFormat, " case #4 use defRegAssignment");
11695 dumpEmptyRefPosition();
11698 case LSRA_EVENT_DEFUSE_CASE5:
11701 printf(" case #5, Conflicting Def and Use single-register requirements require copies - set def to all "
11702 "regs of the appropriate type\n");
11706 printf(indentFormat, " case #5 set def to all regs");
11708 dumpEmptyRefPosition();
11711 case LSRA_EVENT_DEFUSE_CASE6:
11714 printf(" case #6, Conflicting Def and Use register requirements require a copy\n");
11718 printf(indentFormat, " case #6 need a copy");
11720 dumpEmptyRefPosition();
11724 case LSRA_EVENT_SPILL:
11727 printf("Spilled:\n");
11732 assert(interval != nullptr && interval->assignedReg != nullptr);
11733 printf("Spill %-4s ", getRegName(interval->assignedReg->regNum));
11735 dumpEmptyRefPosition();
11738 case LSRA_EVENT_SPILL_EXTENDED_LIFETIME:
11741 printf(" Spilled extended lifetime var V%02u at last use; not marked for actual spill.",
11742 interval->intervalIndex);
11746 // Restoring the previous register
11747 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL:
11748 assert(interval != nullptr);
11751 printf(" Assign register %s to previous interval Ivl:%d after spill\n", getRegName(reg),
11752 interval->intervalIndex);
11756 // If we spilled, then the dump is already pre-indented, but we need to pre-indent
11758 // the subsequent line, with a dumpEmptyRefPosition().
11759 printf("SRstr %-4s ", getRegName(reg));
11761 dumpEmptyRefPosition();
11764 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL:
11765 assert(interval != nullptr);
11768 printf(" Assign register %s to previous interval Ivl:%d\n", getRegName(reg), interval->intervalIndex);
11772 if (activeRefPosition == nullptr)
11774 printf(emptyRefPositionFormat, "");
11776 printf("Restr %-4s ", getRegName(reg));
11778 if (activeRefPosition != nullptr)
11780 printf(emptyRefPositionFormat, "");
11785 // Done with GC Kills
11786 case LSRA_EVENT_DONE_KILL_GC_REFS:
11787 printf("DoneKillGC ");
11790 // Block boundaries
11791 case LSRA_EVENT_START_BB:
11792 assert(currentBlock != nullptr);
11795 printf("\n\n Live Vars(Regs) at start of BB%02u (from pred BB%02u):", currentBlock->bbNum,
11796 blockInfo[currentBlock->bbNum].predBBNum);
11797 dumpVarToRegMap(inVarToRegMaps[currentBlock->bbNum]);
11800 case LSRA_EVENT_END_BB:
11803 printf("\n\n Live Vars(Regs) after BB%02u:", currentBlock->bbNum);
11804 dumpVarToRegMap(outVarToRegMaps[currentBlock->bbNum]);
11808 case LSRA_EVENT_FREE_REGS:
11811 printf("Freeing registers:\n");
11815 // Characteristics of the current RefPosition
11816 case LSRA_EVENT_INCREMENT_RANGE_END:
11819 printf(" Incrementing nextPhysRegLocation for %s\n", getRegName(reg));
11823 case LSRA_EVENT_LAST_USE:
11826 printf(" Last use, marked to be freed\n");
11829 case LSRA_EVENT_LAST_USE_DELAYED:
11832 printf(" Last use, marked to be freed (delayed)\n");
11835 case LSRA_EVENT_NEEDS_NEW_REG:
11838 printf(" Needs new register; mark %s to be freed\n", getRegName(reg));
11842 printf("Free %-4s ", getRegName(reg));
11844 dumpEmptyRefPosition();
11848 // Allocation decisions
11849 case LSRA_EVENT_FIXED_REG:
11850 case LSRA_EVENT_EXP_USE:
11853 printf("No allocation\n");
11857 printf("Keep %-4s ", getRegName(reg));
11860 case LSRA_EVENT_ZERO_REF:
11861 assert(interval != nullptr && interval->isLocalVar);
11864 printf("Marking V%02u as last use there are no actual references\n", interval->varNum);
11870 dumpEmptyRefPosition();
11873 case LSRA_EVENT_KEPT_ALLOCATION:
11876 printf("already allocated %4s\n", getRegName(reg));
11880 printf("Keep %-4s ", getRegName(reg));
11883 case LSRA_EVENT_COPY_REG:
11884 assert(interval != nullptr && interval->recentRefPosition != nullptr);
11887 printf("allocated %s as copyReg\n\n", getRegName(reg));
11891 printf("Copy %-4s ", getRegName(reg));
11894 case LSRA_EVENT_MOVE_REG:
11895 assert(interval != nullptr && interval->recentRefPosition != nullptr);
11898 printf(" needs a new register; marked as moveReg\n");
11902 printf("Move %-4s ", getRegName(reg));
11904 dumpEmptyRefPosition();
11907 case LSRA_EVENT_ALLOC_REG:
11910 printf("allocated %s\n", getRegName(reg));
11914 printf("Alloc %-4s ", getRegName(reg));
11917 case LSRA_EVENT_REUSE_REG:
11920 printf("reused constant in %s\n", getRegName(reg));
11924 printf("Reuse %-4s ", getRegName(reg));
11927 case LSRA_EVENT_ALLOC_SPILLED_REG:
11930 printf("allocated spilled register %s\n", getRegName(reg));
11934 printf("Steal %-4s ", getRegName(reg));
11937 case LSRA_EVENT_NO_ENTRY_REG_ALLOCATED:
11938 assert(interval != nullptr && interval->isLocalVar);
11941 printf("Not allocating an entry register for V%02u due to low ref count\n", interval->varNum);
11948 case LSRA_EVENT_NO_REG_ALLOCATED:
11951 printf("no register allocated\n");
11958 case LSRA_EVENT_RELOAD:
11961 printf(" Marked for reload\n");
11965 printf("ReLod %-4s ", getRegName(reg));
11967 dumpEmptyRefPosition();
11970 case LSRA_EVENT_SPECIAL_PUTARG:
11973 printf(" Special case of putArg - using lclVar that's in the expected reg\n");
11977 printf("PtArg %-4s ", getRegName(reg));
11985 //------------------------------------------------------------------------
11986 // dumpRegRecordHeader: Dump the header for a column-based dump of the register state.
11995 // Reg names fit in 4 characters (minimum width of the columns)
11998 // In order to make the table as dense as possible (for ease of reading the dumps),
11999 // we determine the minimum regColumnWidth required to represent:
12000 // regs, by name (e.g. eax or xmm0) - this is fixed at 4 characters.
12001 // intervals, as Vnn for lclVar intervals, or as I<num> for other intervals.
12002 // The table is indented by the amount needed for dumpRefPositionShort, which is
12003 // captured in shortRefPositionDumpWidth.
12005 void LinearScan::dumpRegRecordHeader()
12007 printf("The following table has one or more rows for each RefPosition that is handled during allocation.\n"
12008 "The first column provides the basic information about the RefPosition, with its type (e.g. Def,\n"
12009 "Use, Fixd) followed by a '*' if it is a last use, and a 'D' if it is delayRegFree, and then the\n"
12010 "action taken during allocation (e.g. Alloc a new register, or Keep an existing one).\n"
12011 "The subsequent columns show the Interval occupying each register, if any, followed by 'a' if it is\n"
12012 "active, and 'i'if it is inactive. Columns are only printed up to the last modifed register, which\n"
12013 "may increase during allocation, in which case additional columns will appear. Registers which are\n"
12014 "not marked modified have ---- in their column.\n\n");
12016 // First, determine the width of each register column (which holds a reg name in the
12017 // header, and an interval name in each subsequent row).
12018 int intervalNumberWidth = (int)log10((double)intervals.size()) + 1;
12019 // The regColumnWidth includes the identifying character (I or V) and an 'i' or 'a' (inactive or active)
12020 regColumnWidth = intervalNumberWidth + 2;
12021 if (regColumnWidth < 4)
12023 regColumnWidth = 4;
12025 sprintf_s(intervalNameFormat, MAX_FORMAT_CHARS, "%%c%%-%dd", regColumnWidth - 2);
12026 sprintf_s(regNameFormat, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12028 // Next, determine the width of the short RefPosition (see dumpRefPositionShort()).
12029 // This is in the form:
12030 // nnn.#mmm NAME TYPEld
12032 // nnn is the Location, right-justified to the width needed for the highest location.
12033 // mmm is the RefPosition rpNum, left-justified to the width needed for the highest rpNum.
12034 // NAME is dumped by dumpReferentName(), and is "regColumnWidth".
12035 // TYPE is RefTypeNameShort, and is 4 characters
12036 // l is either '*' (if a last use) or ' ' (otherwise)
12037 // d is either 'D' (if a delayed use) or ' ' (otherwise)
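// A hypothetical example of one such short RefPosition dump, for a last use of V03 at
// location 12 with RefPosition number 45:
//
//   12.#45   V03  Use *
//
// (the exact column widths are computed below from maxNodeLocation, refPositions.size()
// and regColumnWidth).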
12039 maxNodeLocation = (maxNodeLocation == 0)
12041 : maxNodeLocation; // corner case of a method with an infinite loop without any gentree nodes
12042 assert(maxNodeLocation >= 1);
12043 assert(refPositions.size() >= 1);
12044 int nodeLocationWidth = (int)log10((double)maxNodeLocation) + 1;
12045 int refPositionWidth = (int)log10((double)refPositions.size()) + 1;
12046 int refTypeInfoWidth = 4 /*TYPE*/ + 2 /* last-use and delayed */ + 1 /* space */;
12047 int locationAndRPNumWidth = nodeLocationWidth + 2 /* .# */ + refPositionWidth + 1 /* space */;
12048 int shortRefPositionDumpWidth = locationAndRPNumWidth + regColumnWidth + 1 /* space */ + refTypeInfoWidth;
12049 sprintf_s(shortRefPositionFormat, MAX_FORMAT_CHARS, "%%%dd.#%%-%dd ", nodeLocationWidth, refPositionWidth);
12050 sprintf_s(emptyRefPositionFormat, MAX_FORMAT_CHARS, "%%-%ds", shortRefPositionDumpWidth);
12052 // The width of the "allocation info"
12053 // - a 5-character allocation decision
12055 // - a 4-character register
12057 int allocationInfoWidth = 5 + 1 + 4 + 1;
12059 // Next, determine the width of the legend for each row. This includes:
12060 // - a short RefPosition dump (shortRefPositionDumpWidth), which includes a space
12061 // - the allocation info (allocationInfoWidth), which also includes a space
12063 regTableIndent = shortRefPositionDumpWidth + allocationInfoWidth;
12065 // BBnn printed left-justified in the NAME Typeld and allocationInfo space.
12066 int bbDumpWidth = regColumnWidth + 1 + refTypeInfoWidth + allocationInfoWidth;
12067 int bbNumWidth = (int)log10((double)compiler->fgBBNumMax) + 1;
12068 // In the unlikely event that BB numbers overflow the space, we'll simply omit the predBB
12069 int predBBNumDumpSpace = regTableIndent - locationAndRPNumWidth - bbNumWidth - 9; // 'BB' + ' PredBB'
12070 if (predBBNumDumpSpace < bbNumWidth)
12072 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd", shortRefPositionDumpWidth - 2);
12076 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd PredBB%%-%dd", bbNumWidth, predBBNumDumpSpace);
12079 if (compiler->shouldDumpASCIITrees())
12081 columnSeparator = "|";
12089 columnSeparator = "\xe2\x94\x82";
12090 line = "\xe2\x94\x80";
12091 leftBox = "\xe2\x94\x9c";
12092 middleBox = "\xe2\x94\xbc";
12093 rightBox = "\xe2\x94\xa4";
12095 sprintf_s(indentFormat, MAX_FORMAT_CHARS, "%%-%ds", regTableIndent);
12097 // Now, set up the legend format for the RefPosition info
12098 sprintf_s(legendFormat, MAX_LEGEND_FORMAT_CHARS, "%%-%d.%ds%%-%d.%ds%%-%ds%%s", nodeLocationWidth + 1,
12099 nodeLocationWidth + 1, refPositionWidth + 2, refPositionWidth + 2, regColumnWidth + 1);
12101 // Finally, print a "title row" including the legend and the reg names
12102 dumpRegRecordTitle();
12105 int LinearScan::getLastUsedRegNumIndex()
12107 int lastUsedRegNumIndex = 0;
12108 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
12109 int lastRegNumIndex = compiler->compFloatingPointUsed ? REG_FP_LAST : REG_INT_LAST;
12110 for (int regNumIndex = 0; regNumIndex <= lastRegNumIndex; regNumIndex++)
12112 if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) != 0)
12114 lastUsedRegNumIndex = regNumIndex;
12117 return lastUsedRegNumIndex;
12120 void LinearScan::dumpRegRecordTitleLines()
12122 for (int i = 0; i < regTableIndent; i++)
12124 printf("%s", line);
12126 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12127 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12129 printf("%s", middleBox);
12130 for (int i = 0; i < regColumnWidth; i++)
12132 printf("%s", line);
12135 printf("%s\n", rightBox);
12137 void LinearScan::dumpRegRecordTitle()
12139 dumpRegRecordTitleLines();
12141 // Print out the legend for the RefPosition info
12142 printf(legendFormat, "Loc ", "RP# ", "Name ", "Type Action Reg ");
12144 // Print out the register name column headers
12145 char columnFormatArray[MAX_FORMAT_CHARS];
12146 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%s%%-%d.%ds", columnSeparator, regColumnWidth, regColumnWidth);
12147 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12148 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12150 regNumber regNum = (regNumber)regNumIndex;
12151 const char* regName = getRegName(regNum);
12152 printf(columnFormatArray, regName);
12154 printf("%s\n", columnSeparator);
12156 rowCountSinceLastTitle = 0;
12158 dumpRegRecordTitleLines();
12161 void LinearScan::dumpRegRecords()
12163 static char columnFormatArray[18];
12164 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12165 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
12167 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12169 printf("%s", columnSeparator);
12170 RegRecord& regRecord = physRegs[regNumIndex];
12171 Interval* interval = regRecord.assignedInterval;
12172 if (interval != nullptr)
12174 dumpIntervalName(interval);
12175 char activeChar = interval->isActive ? 'a' : 'i';
12176 printf("%c", activeChar);
12178 else if (regRecord.isBusyUntilNextKill)
12180 printf(columnFormatArray, "Busy");
12182 else if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) == 0)
12184 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12185 printf(columnFormatArray, "----");
12189 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12190 printf(columnFormatArray, "");
12193 printf("%s\n", columnSeparator);
12195 if (rowCountSinceLastTitle > MAX_ROWS_BETWEEN_TITLES)
12197 dumpRegRecordTitle();
12199 rowCountSinceLastTitle++;
12202 void LinearScan::dumpIntervalName(Interval* interval)
12204 if (interval->isLocalVar)
12206 printf(intervalNameFormat, 'V', interval->varNum);
12208 else if (interval->isConstant)
12210 printf(intervalNameFormat, 'C', interval->intervalIndex);
12214 printf(intervalNameFormat, 'I', interval->intervalIndex);
12218 void LinearScan::dumpEmptyRefPosition()
12220 printf(emptyRefPositionFormat, "");
12223 // Note that the size of this dump is computed in dumpRegRecordHeader().
12225 void LinearScan::dumpRefPositionShort(RefPosition* refPosition, BasicBlock* currentBlock)
12227 BasicBlock* block = currentBlock;
12228 if (refPosition->refType == RefTypeBB)
12230 // Always print a title row before a RefTypeBB (except for the first, because we
12231 // will already have printed it before the parameters)
12232 if (refPosition->refType == RefTypeBB && block != compiler->fgFirstBB && block != nullptr)
12234 dumpRegRecordTitle();
12237 printf(shortRefPositionFormat, refPosition->nodeLocation, refPosition->rpNum);
12238 if (refPosition->refType == RefTypeBB)
12240 if (block == nullptr)
12242 printf(regNameFormat, "END");
12244 printf(regNameFormat, "");
12248 printf(bbRefPosFormat, block->bbNum, block == compiler->fgFirstBB ? 0 : blockInfo[block->bbNum].predBBNum);
12251 else if (refPosition->isIntervalRef())
12253 Interval* interval = refPosition->getInterval();
12254 dumpIntervalName(interval);
12255 char lastUseChar = ' ';
12256 char delayChar = ' ';
12257 if (refPosition->lastUse)
12260 if (refPosition->delayRegFree)
12265 printf(" %s%c%c ", getRefTypeShortName(refPosition->refType), lastUseChar, delayChar);
12267 else if (refPosition->isPhysRegRef)
12269 RegRecord* regRecord = refPosition->getReg();
12270 printf(regNameFormat, getRegName(regRecord->regNum));
12271 printf(" %s ", getRefTypeShortName(refPosition->refType));
12275 assert(refPosition->refType == RefTypeKillGCRefs);
12276 // There's no interval or reg name associated with this.
12277 printf(regNameFormat, " ");
12278 printf(" %s ", getRefTypeShortName(refPosition->refType));
12282 //------------------------------------------------------------------------
12283 // LinearScan::IsResolutionMove:
12284 // Returns true if the given node is a move inserted by LSRA
12288 // node - the node to check.
12290 bool LinearScan::IsResolutionMove(GenTree* node)
12292 if (!node->gtLsraInfo.isLsraAdded)
12297 switch (node->OperGet())
12301 return node->gtLsraInfo.isLocalDefUse;
12311 //------------------------------------------------------------------------
12312 // LinearScan::IsResolutionNode:
12313 // Returns true if the given node is either a move inserted by LSRA
12314 // resolution or an operand to such a move.
12317 // containingRange - the range that contains the node to check.
12318 // node - the node to check.
12320 bool LinearScan::IsResolutionNode(LIR::Range& containingRange, GenTree* node)
12324 if (IsResolutionMove(node))
12329 if (!node->gtLsraInfo.isLsraAdded || (node->OperGet() != GT_LCL_VAR))
12335 bool foundUse = containingRange.TryGetUse(node, &use);
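// Summarizing the check above (illustrative, not an exhaustive specification): a node
// participates in resolution either because it is itself an LSRA-inserted move
// (IsResolutionMove), or because it is an LSRA-added GT_LCL_VAR whose use, found via
// TryGetUse on the containing range, feeds such a move; everything else in the range is
// ordinary user code.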
12342 //------------------------------------------------------------------------
12343 // verifyFinalAllocation: Traverse the RefPositions and verify various invariants.
12352 // If verbose is set, this will also dump a table of the final allocations.
12353 void LinearScan::verifyFinalAllocation()
12357 printf("\nFinal allocation\n");
12360 // Clear register assignments.
12361 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12363 RegRecord* physRegRecord = getRegisterRecord(reg);
12364 physRegRecord->assignedInterval = nullptr;
12367 for (auto& interval : intervals)
12369 interval.assignedReg = nullptr;
12370 interval.physReg = REG_NA;
12373 DBEXEC(VERBOSE, dumpRegRecordTitle());
12375 BasicBlock* currentBlock = nullptr;
12376 GenTree* firstBlockEndResolutionNode = nullptr;
12377 regMaskTP regsToFree = RBM_NONE;
12378 regMaskTP delayRegsToFree = RBM_NONE;
12379 LsraLocation currentLocation = MinLocation;
12380 for (auto& refPosition : refPositions)
12382 RefPosition* currentRefPosition = &refPosition;
12383 Interval* interval = nullptr;
12384 RegRecord* regRecord = nullptr;
12385 regNumber regNum = REG_NA;
12386 if (currentRefPosition->refType == RefTypeBB)
12388 regsToFree |= delayRegsToFree;
12389 delayRegsToFree = RBM_NONE;
12390 // For BB RefPositions, wait until we dump the "end of block" info before dumping the basic RefPosition
12395 // For other RefPosition types, we can dump the basic RefPosition info now.
12396 DBEXEC(VERBOSE, dumpRefPositionShort(currentRefPosition, currentBlock));
12398 if (currentRefPosition->isPhysRegRef)
12400 regRecord = currentRefPosition->getReg();
12401 regRecord->recentRefPosition = currentRefPosition;
12402 regNum = regRecord->regNum;
12404 else if (currentRefPosition->isIntervalRef())
12406 interval = currentRefPosition->getInterval();
12407 interval->recentRefPosition = currentRefPosition;
12408 if (currentRefPosition->registerAssignment != RBM_NONE)
12410 if (!genMaxOneBit(currentRefPosition->registerAssignment))
12412 assert(currentRefPosition->refType == RefTypeExpUse ||
12413 currentRefPosition->refType == RefTypeDummyDef);
12417 regNum = currentRefPosition->assignedReg();
12418 regRecord = getRegisterRecord(regNum);
12424 LsraLocation newLocation = currentRefPosition->nodeLocation;
12426 if (newLocation > currentLocation)
12429 // We could use the freeRegisters() method, but we'd have to carefully manage the active intervals.
12430 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12432 regMaskTP regMask = genRegMask(reg);
12433 if ((regsToFree & regMask) != RBM_NONE)
12435 RegRecord* physRegRecord = getRegisterRecord(reg);
12436 physRegRecord->assignedInterval = nullptr;
12439 regsToFree = delayRegsToFree;
12440 delayRegsToFree = RBM_NONE;
12442 currentLocation = newLocation;
12444 switch (currentRefPosition->refType)
12448 if (currentBlock == nullptr)
12450 currentBlock = startBlockSequence();
12454 // Verify the resolution moves at the end of the previous block.
12455 for (GenTree* node = firstBlockEndResolutionNode; node != nullptr; node = node->gtNext)
12457 assert(enregisterLocalVars);
12458 // Only verify nodes that are actually moves; don't bother with the nodes that are
12459 // operands to moves.
12460 if (IsResolutionMove(node))
12462 verifyResolutionMove(node, currentLocation);
12466 // Validate the locations at the end of the previous block.
12467 if (enregisterLocalVars)
12469 VarToRegMap outVarToRegMap = outVarToRegMaps[currentBlock->bbNum];
12470 VarSetOps::Iter iter(compiler, currentBlock->bbLiveOut);
12471 unsigned varIndex = 0;
12472 while (iter.NextElem(&varIndex))
12474 if (localVarIntervals[varIndex] == nullptr)
12476 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12479 regNumber regNum = getVarReg(outVarToRegMap, varIndex);
12480 interval = getIntervalForLocalVar(varIndex);
12481 assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
12482 interval->physReg = REG_NA;
12483 interval->assignedReg = nullptr;
12484 interval->isActive = false;
12488 // Clear register assignments.
12489 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12491 RegRecord* physRegRecord = getRegisterRecord(reg);
12492 physRegRecord->assignedInterval = nullptr;
12495 // Now, record the locations at the beginning of this block.
12496 currentBlock = moveToNextBlock();
12499 if (currentBlock != nullptr)
12501 if (enregisterLocalVars)
12503 VarToRegMap inVarToRegMap = inVarToRegMaps[currentBlock->bbNum];
12504 VarSetOps::Iter iter(compiler, currentBlock->bbLiveIn);
12505 unsigned varIndex = 0;
12506 while (iter.NextElem(&varIndex))
12508 if (localVarIntervals[varIndex] == nullptr)
12510 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12513 regNumber regNum = getVarReg(inVarToRegMap, varIndex);
12514 interval = getIntervalForLocalVar(varIndex);
12515 interval->physReg = regNum;
12516 interval->assignedReg = &(physRegs[regNum]);
12517 interval->isActive = true;
12518 physRegs[regNum].assignedInterval = interval;
12524 dumpRefPositionShort(currentRefPosition, currentBlock);
12528 // Finally, handle the resolution moves, if any, at the beginning of the next block.
12529 firstBlockEndResolutionNode = nullptr;
12530 bool foundNonResolutionNode = false;
12532 LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
12533 for (GenTree* node : currentBlockRange.NonPhiNodes())
12535 if (IsResolutionNode(currentBlockRange, node))
12537 assert(enregisterLocalVars);
12538 if (foundNonResolutionNode)
12540 firstBlockEndResolutionNode = node;
12543 else if (IsResolutionMove(node))
12545 // Only verify nodes that are actually moves; don't bother with the nodes that are
12546 // operands to moves.
12547 verifyResolutionMove(node, currentLocation);
12552 foundNonResolutionNode = true;
12561 assert(regRecord != nullptr);
12562 assert(regRecord->assignedInterval == nullptr);
12563 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12565 case RefTypeFixedReg:
12566 assert(regRecord != nullptr);
12567 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12570 case RefTypeUpperVectorSaveDef:
12571 case RefTypeUpperVectorSaveUse:
12574 case RefTypeParamDef:
12575 case RefTypeZeroInit:
12576 assert(interval != nullptr);
12578 if (interval->isSpecialPutArg)
12580 dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, interval, regNum);
12583 if (currentRefPosition->reload)
12585 interval->isActive = true;
12586 assert(regNum != REG_NA);
12587 interval->physReg = regNum;
12588 interval->assignedReg = regRecord;
12589 regRecord->assignedInterval = interval;
12590 dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, nullptr, regRecord->regNum, currentBlock);
12592 if (regNum == REG_NA)
12594 dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, interval);
12596 else if (RefTypeIsDef(currentRefPosition->refType))
12598 interval->isActive = true;
12601 if (interval->isConstant && (currentRefPosition->treeNode != nullptr) &&
12602 currentRefPosition->treeNode->IsReuseRegVal())
12604 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, regRecord->regNum, currentBlock);
12608 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, regRecord->regNum, currentBlock);
12612 else if (currentRefPosition->copyReg)
12614 dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, interval, regRecord->regNum, currentBlock);
12616 else if (currentRefPosition->moveReg)
12618 assert(interval->assignedReg != nullptr);
12619 interval->assignedReg->assignedInterval = nullptr;
12620 interval->physReg = regNum;
12621 interval->assignedReg = regRecord;
12622 regRecord->assignedInterval = interval;
12625 printf("Move %-4s ", getRegName(regRecord->regNum));
12630 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12632 if (currentRefPosition->lastUse || currentRefPosition->spillAfter)
12634 interval->isActive = false;
12636 if (regNum != REG_NA)
12638 if (currentRefPosition->spillAfter)
12642 // If refPos is marked as copyReg, then the reg that is spilled
12643 // is the homeReg of the interval, not the reg currently assigned.
12645 regNumber spillReg = regNum;
12646 if (currentRefPosition->copyReg)
12648 assert(interval != nullptr);
12649 spillReg = interval->physReg;
12652 dumpEmptyRefPosition();
12653 printf("Spill %-4s ", getRegName(spillReg));
12656 else if (currentRefPosition->copyReg)
12658 regRecord->assignedInterval = interval;
12662 interval->physReg = regNum;
12663 interval->assignedReg = regRecord;
12664 regRecord->assignedInterval = interval;
12668 case RefTypeKillGCRefs:
12669 // No action to take.
12670 // However, we will assert that, at resolution time, no registers contain GC refs.
12672 DBEXEC(VERBOSE, printf(" "));
12673 regMaskTP candidateRegs = currentRefPosition->registerAssignment;
12674 while (candidateRegs != RBM_NONE)
12676 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
12677 candidateRegs &= ~nextRegBit;
12678 regNumber nextReg = genRegNumFromMask(nextRegBit);
12679 RegRecord* regRecord = getRegisterRecord(nextReg);
12680 Interval* assignedInterval = regRecord->assignedInterval;
12681 assert(assignedInterval == nullptr || !varTypeIsGC(assignedInterval->registerType));
12686 case RefTypeExpUse:
12687 case RefTypeDummyDef:
12688 // Do nothing; these will be handled by the RefTypeBB.
12689 DBEXEC(VERBOSE, printf(" "));
12692 case RefTypeInvalid:
12693 // For these 'currentRefPosition->refType' values, no action is needed.
12697 if (currentRefPosition->refType != RefTypeBB)
12699 DBEXEC(VERBOSE, dumpRegRecords());
12700 if (interval != nullptr)
12702 if (currentRefPosition->copyReg)
12704 assert(interval->physReg != regNum);
12705 regRecord->assignedInterval = nullptr;
12706 assert(interval->assignedReg != nullptr);
12707 regRecord = interval->assignedReg;
12709 if (currentRefPosition->spillAfter || currentRefPosition->lastUse)
12711 interval->physReg = REG_NA;
12712 interval->assignedReg = nullptr;
12714 // regRecord could be null if the RefPosition does not require a register.
12715 if (regRecord != nullptr)
12717 regRecord->assignedInterval = nullptr;
12721 assert(!currentRefPosition->RequiresRegister());
12728 // Now, verify the resolution blocks.
12729 // Currently these are nearly always at the end of the method, but that may not always be the case.
12730 // So, we'll go through all the BBs looking for blocks whose bbNum is greater than bbNumMaxBeforeResolution.
12731 for (BasicBlock* currentBlock = compiler->fgFirstBB; currentBlock != nullptr; currentBlock = currentBlock->bbNext)
12733 if (currentBlock->bbNum > bbNumMaxBeforeResolution)
12735 // If we haven't enregistered any lclVars, we have no resolution blocks.
12736 assert(enregisterLocalVars);
12740 dumpRegRecordTitle();
12741 printf(shortRefPositionFormat, 0, 0);
12742 assert(currentBlock->bbPreds != nullptr && currentBlock->bbPreds->flBlock != nullptr);
12743 printf(bbRefPosFormat, currentBlock->bbNum, currentBlock->bbPreds->flBlock->bbNum);
12747 // Clear register assignments.
12748 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12750 RegRecord* physRegRecord = getRegisterRecord(reg);
12751 physRegRecord->assignedInterval = nullptr;
12754 // Set the incoming register assignments
12755 VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
12756 VarSetOps::Iter iter(compiler, currentBlock->bbLiveIn);
12757 unsigned varIndex = 0;
12758 while (iter.NextElem(&varIndex))
12760 if (localVarIntervals[varIndex] == nullptr)
12762 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12765 regNumber regNum = getVarReg(inVarToRegMap, varIndex);
12766 Interval* interval = getIntervalForLocalVar(varIndex);
12767 interval->physReg = regNum;
12768 interval->assignedReg = &(physRegs[regNum]);
12769 interval->isActive = true;
12770 physRegs[regNum].assignedInterval = interval;
12773 // Verify the moves in this block
12774 LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
12775 for (GenTree* node : currentBlockRange.NonPhiNodes())
12777 assert(IsResolutionNode(currentBlockRange, node));
12778 if (IsResolutionMove(node))
12780 // Only verify nodes that are actually moves; don't bother with the nodes that are
12781 // operands to moves.
12782 verifyResolutionMove(node, currentLocation);
12786 // Verify the outgoing register assignments
12788 VarToRegMap outVarToRegMap = getOutVarToRegMap(currentBlock->bbNum);
12789 VarSetOps::Iter iter(compiler, currentBlock->bbLiveOut);
12790 unsigned varIndex = 0;
12791 while (iter.NextElem(&varIndex))
12793 if (localVarIntervals[varIndex] == nullptr)
12795 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12798 regNumber regNum = getVarReg(outVarToRegMap, varIndex);
12799 Interval* interval = getIntervalForLocalVar(varIndex);
12800 assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
12801 interval->physReg = REG_NA;
12802 interval->assignedReg = nullptr;
12803 interval->isActive = false;
12809 DBEXEC(VERBOSE, printf("\n"));
12812 //------------------------------------------------------------------------
12813 // verifyResolutionMove: Verify a resolution statement. Called by verifyFinalAllocation()
12816 // resolutionMove - A GenTree* that must be a resolution move.
12817 // currentLocation - The LsraLocation of the most recent RefPosition that has been verified.
12823 // If verbose is set, this will also dump the moves into the table of final allocations.
12824 void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation currentLocation)
12826 GenTree* dst = resolutionMove;
12827 assert(IsResolutionMove(dst));
12829 if (dst->OperGet() == GT_SWAP)
12831 GenTreeLclVarCommon* left = dst->gtGetOp1()->AsLclVarCommon();
12832 GenTreeLclVarCommon* right = dst->gtGetOp2()->AsLclVarCommon();
12833 regNumber leftRegNum = left->gtRegNum;
12834 regNumber rightRegNum = right->gtRegNum;
12835 LclVarDsc* leftVarDsc = compiler->lvaTable + left->gtLclNum;
12836 LclVarDsc* rightVarDsc = compiler->lvaTable + right->gtLclNum;
12837 Interval* leftInterval = getIntervalForLocalVar(leftVarDsc->lvVarIndex);
12838 Interval* rightInterval = getIntervalForLocalVar(rightVarDsc->lvVarIndex);
12839 assert(leftInterval->physReg == leftRegNum && rightInterval->physReg == rightRegNum);
12840 leftInterval->physReg = rightRegNum;
12841 rightInterval->physReg = leftRegNum;
12842 leftInterval->assignedReg = &physRegs[rightRegNum];
12843 rightInterval->assignedReg = &physRegs[leftRegNum];
12844 physRegs[rightRegNum].assignedInterval = leftInterval;
12845 physRegs[leftRegNum].assignedInterval = rightInterval;
12848 printf(shortRefPositionFormat, currentLocation, 0);
12849 dumpIntervalName(leftInterval);
12851 printf(" %-4s ", getRegName(rightRegNum));
12853 printf(shortRefPositionFormat, currentLocation, 0);
12854 dumpIntervalName(rightInterval);
12856 printf(" %-4s ", getRegName(leftRegNum));
12861 regNumber dstRegNum = dst->gtRegNum;
12862 regNumber srcRegNum;
12863 GenTreeLclVarCommon* lcl;
12864 if (dst->OperGet() == GT_COPY)
12866 lcl = dst->gtGetOp1()->AsLclVarCommon();
12867 srcRegNum = lcl->gtRegNum;
12871 lcl = dst->AsLclVarCommon();
12872 if ((lcl->gtFlags & GTF_SPILLED) != 0)
12874 srcRegNum = REG_STK;
12878 assert((lcl->gtFlags & GTF_SPILL) != 0);
12879 srcRegNum = dstRegNum;
12880 dstRegNum = REG_STK;
12884 Interval* interval = getIntervalForLocalVarNode(lcl);
12885 assert(interval->physReg == srcRegNum || (srcRegNum == REG_STK && interval->physReg == REG_NA));
12886 if (srcRegNum != REG_STK)
12888 physRegs[srcRegNum].assignedInterval = nullptr;
12890 if (dstRegNum != REG_STK)
12892 interval->physReg = dstRegNum;
12893 interval->assignedReg = &(physRegs[dstRegNum]);
12894 physRegs[dstRegNum].assignedInterval = interval;
12895 interval->isActive = true;
12899 interval->physReg = REG_NA;
12900 interval->assignedReg = nullptr;
12901 interval->isActive = false;
12905 printf(shortRefPositionFormat, currentLocation, 0);
12906 dumpIntervalName(interval);
12908 printf(" %-4s ", getRegName(dstRegNum));
12914 #endif // !LEGACY_BACKEND