// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
/*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

                     Linear Scan Register Allocation

                            a.k.a. LSRA

    Preconditions:
    - All register requirements are expressed in the code stream, either as destination
      registers of tree nodes, or as internal registers. These requirements are
      expressed in the TreeNodeInfo (gtLsraInfo) on each node, which includes:
      - The number of register sources and destinations.
      - The register restrictions (candidates) of the target register, both from itself,
        as producer of the value (dstCandidates), and from its consuming node (srcCandidates).
        Note that the srcCandidates field of TreeNodeInfo refers to the destination register
        (not any of its sources).
      - The number (internalCount) of registers required, and their register restrictions
        (internalCandidates). These are neither inputs nor outputs of the node, but used
        in the sequence of code generated for the tree.
    "Internal registers" are registers used during the code sequence generated for the node.
    The register lifetimes must obey the following lifetime model:
    - First, any internal registers are defined.
    - Next, any source registers are used (and are then freed if they are last use and are not
      identified as delayRegFree).
    - Next, the internal registers are used (and are then freed).
    - Next, any registers in the kill set for the instruction are killed.
    - Next, the destination register(s) are defined (multiple destination registers are only
      supported on ARM).
    - Finally, any "delayRegFree" source registers are freed.
    There are several things to note about this order:
    - The internal registers will never overlap any use, but they may overlap a destination register.
    - Internal registers are never live beyond the node.
    - The "delayRegFree" annotation is used for instructions that are only available in a
      Read-Modify-Write form. That is, the destination register is one of the sources.
      In this case, we must not use the same register for the non-RMW operand as for the
      destination.
    Overview (doLinearScan):
    - Walk all blocks, building intervals and RefPositions (buildIntervals)
    - Allocate registers (allocateRegisters)
    - Annotate nodes with register assignments (resolveRegisters)
    - Add move nodes as needed to resolve conflicting register
      assignments across non-adjacent edges. (resolveEdges, called from resolveRegisters)
    Postconditions:

    Tree nodes (GenTree):
    - GenTree::gtRegNum (and gtRegPair for ARM) is annotated with the register
      assignment for a node. If the node does not require a register, it is
      annotated as such (for single registers, gtRegNum = REG_NA; for register
      pair type, gtRegPair = REG_PAIR_NONE). For a variable definition or interior
      tree node (an "implicit" definition), this is the register to put the result.
      For an expression use, this is the place to find the value that has previously
      been computed.
      - In most cases, this register must satisfy the constraints specified by the TreeNodeInfo.
      - In some cases, this is difficult:
        - If a lclVar node currently lives in some register, it may not be desirable to move it
          (i.e. its current location may be desirable for future uses, e.g. if it's a callee save register,
          but needs to be in a specific arg register for a call).
        - In other cases there may be conflicts on the restrictions placed by the defining node and the
          node which consumes it.
      - If such a node is constrained to a single fixed register (e.g. an arg register, or a return from
        a call), then LSRA is free to annotate the node with a different register. The code generator
        must issue the appropriate move.
      - However, if such a node is constrained to a set of registers, and its current location does not
        satisfy that requirement, LSRA must insert a GT_COPY node between the node and its parent.
        The gtRegNum on the GT_COPY node must satisfy the register requirement of the parent.
    - GenTree::gtRsvdRegs has a set of registers used for internal temps.
    - A tree node is marked GTF_SPILL if the tree node must be spilled by the code generator after it
      has been evaluated.
      - LSRA currently does not set GTF_SPILLED on such nodes, because it caused problems in the old
        code generator. In the new backend perhaps this should change (see also the note below under CodeGen).
    - A tree node is marked GTF_SPILLED if it is a lclVar that must be reloaded prior to use.
      - The register (gtRegNum) on the node indicates the register to which it must be reloaded.
      - For lclVar nodes, since the uses and defs are distinct tree nodes, it is always possible to
        annotate the node with the register to which the variable must be reloaded.
      - For other nodes, since they represent both the def and use, if the value must be reloaded to a
        different register, LSRA must insert a GT_RELOAD node in order to specify the register to which
        it should be reloaded.
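
    An illustrative sketch of these shapes (hypothetical registers; not part of the original
    comment): if V01 was allocated to rsi but its use requires an arg register, LSRA rewrites

        call(V01 (rsi))         as    call(GT_COPY(V01 (rsi)) (rdx))

    and if a spilled tree-temp must come back in a particular register:

        use(t5 (GTF_SPILLED))   as    use(GT_RELOAD(t5) (rcx))

    In both cases the register on the inserted node satisfies the parent's requirement,
    while the child keeps its original assignment.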
    Local variable table (LclVarDsc):
    - LclVarDsc::lvRegister is set to true if a local variable has the
      same register assignment for its entire lifetime.
    - LclVarDsc::lvRegNum / lvOtherReg: these are initialized to their
      first value at the end of LSRA (it looks like lvOtherReg isn't?
      This is probably a bug (ARM)). Codegen will set them to their current value
      as it processes the trees, since a variable can (now) be assigned different
      registers over its lifetimes.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/
#ifndef LEGACY_BACKEND // This file is ONLY used for the RyuJIT backend that uses the linear scan register allocator

const char* LinearScan::resolveTypeName[] = {"Split", "Join", "Critical", "SharedCritical"};
/*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XX                                                                           XX
XX                       Small Helper functions                              XX
XX                                                                           XX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*/
//--------------------------------------------------------------
// lsraAssignRegToTree: Assign the given reg to tree node.
//
// Arguments:
//    tree    -  GenTree node
//    reg     -  register to be assigned
//    regIdx  -  register idx, if tree is a multi-reg call node.
//               regIdx will be zero for single-reg result producing tree nodes.
//
// Return Value:
//    None
//
void lsraAssignRegToTree(GenTreePtr tree, regNumber reg, unsigned regIdx)
{
    if (regIdx == 0)
    {
        tree->gtRegNum = reg;
    }
#if defined(_TARGET_ARM_)
    else if (tree->OperGet() == GT_MUL_LONG || tree->OperGet() == GT_PUTARG_REG)
    {
        assert(regIdx == 1);
        GenTreeMultiRegOp* mul = tree->AsMultiRegOp();
        mul->gtOtherReg        = reg;
    }
    else if (tree->OperGet() == GT_COPY)
    {
        assert(regIdx == 1);
        GenTreeCopyOrReload* copy = tree->AsCopyOrReload();
        copy->gtOtherRegs[0]      = (regNumberSmall)reg;
    }
    else if (tree->OperGet() == GT_PUTARG_SPLIT)
    {
        GenTreePutArgSplit* putArg = tree->AsPutArgSplit();
        putArg->SetRegNumByIdx(reg, regIdx);
    }
#endif // _TARGET_ARM_
    else
    {
        assert(tree->IsMultiRegCall());
        GenTreeCall* call = tree->AsCall();
        call->SetRegNumByIdx(reg, regIdx);
    }
}
//-------------------------------------------------------------
// getWeight: Returns the weight of the RefPosition.
//
// Arguments:
//    refPos   -   ref position
//
// Returns:
//    Weight of ref position.
unsigned LinearScan::getWeight(RefPosition* refPos)
{
    unsigned   weight;
    GenTreePtr treeNode = refPos->treeNode;

    if (treeNode != nullptr)
    {
        if (isCandidateLocalRef(treeNode))
        {
            // Tracked locals: use weighted ref cnt as the weight of the
            // ref position.
            GenTreeLclVarCommon* lclCommon = treeNode->AsLclVarCommon();
            LclVarDsc*           varDsc    = &(compiler->lvaTable[lclCommon->gtLclNum]);
            weight                         = varDsc->lvRefCntWtd;
        }
        else
        {
            // Non-candidate local ref or non-lcl tree node.
            // These are considered to have two references in the basic block:
            // a def and a use and hence weighted ref count is 2 times
            // the basic block weight in which they appear.
            weight = 2 * this->blockInfo[refPos->bbNum].weight;
        }
    }
    else
    {
        // Non-tree node ref positions. These will have a single
        // reference in the basic block and hence their weighted
        // refcount is equal to the block weight in which they
        // appear.
        weight = this->blockInfo[refPos->bbNum].weight;
    }

    return weight;
}
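
// An illustrative worked example of the weights computed above (block and local
// numbers are hypothetical): for RefPositions in a block with weight 4,
//   - a tracked local V03 uses its weighted ref count (e.g. lvRefCntWtd = 12);
//   - a tree-temp def or use gets 2 * 4 = 8 (an implied def plus use in the block);
//   - a non-tree RefPosition (e.g. a block boundary) gets the block weight, 4.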
// allRegs represents a set of registers that can
// be used to allocate the specified type in any point
// in time (more of a 'bank' of registers).
regMaskTP LinearScan::allRegs(RegisterType rt)
{
    if (rt == TYP_FLOAT)
    {
        return availableFloatRegs;
    }
    else if (rt == TYP_DOUBLE)
    {
        return availableDoubleRegs;
#ifdef FEATURE_SIMD
        // TODO-Cleanup: Add an RBM_ALLSIMD
    }
    else if (varTypeIsSIMD(rt))
    {
        return availableDoubleRegs;
#endif // FEATURE_SIMD
    }
    else
    {
        return availableIntRegs;
    }
}
//--------------------------------------------------------------------------
// allMultiRegCallNodeRegs: represents a set of registers that can be used
// to allocate a multi-reg call node.
//
// Arguments:
//    call   -  Multi-reg call node
//
// Return Value:
//    Mask representing the set of available registers for multi-reg call
//    node.
//
// Note:
//    Multi-reg call node available regs = Bitwise-OR(allregs(GetReturnRegType(i)))
//    for all i = 0..RetRegCount-1.
regMaskTP LinearScan::allMultiRegCallNodeRegs(GenTreeCall* call)
{
    assert(call->HasMultiRegRetVal());

    ReturnTypeDesc* retTypeDesc = call->GetReturnTypeDesc();
    regMaskTP       resultMask  = allRegs(retTypeDesc->GetReturnRegType(0));

    unsigned count = retTypeDesc->GetReturnRegCount();
    for (unsigned i = 1; i < count; ++i)
    {
        resultMask |= allRegs(retTypeDesc->GetReturnRegType(i));
    }

    return resultMask;
}
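
// An illustrative example (assuming the SysV AMD64 ABI) of the union computed above:
// a call returning a struct in two registers, say { RAX, XMM0 }, has return reg types
// { TYP_LONG, TYP_DOUBLE }, so the resulting mask is
// allRegs(TYP_LONG) | allRegs(TYP_DOUBLE) - i.e. both the integer and the float
// banks, not just the two ABI return registers themselves.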
//--------------------------------------------------------------------------
// allRegs: returns the set of registers that can accommodate the type of
// the given node.
//
// Arguments:
//    tree   -  GenTree node
//
// Return Value:
//    Mask representing the set of available registers for given tree
//
// Note: In case of multi-reg call node, the full set of registers must be
// determined by looking at types of individual return register types.
// In this case, the registers may include registers from different register
// sets and will not be limited to the actual ABI return registers.
regMaskTP LinearScan::allRegs(GenTree* tree)
{
    regMaskTP resultMask;

    // In case of multi-reg calls, allRegs is defined as
    // Bitwise-Or(allRegs(GetReturnRegType(i)) for i = 0..ReturnRegCount-1
    if (tree->IsMultiRegCall())
    {
        resultMask = allMultiRegCallNodeRegs(tree->AsCall());
    }
    else
    {
        resultMask = allRegs(tree->TypeGet());
    }

    return resultMask;
}
regMaskTP LinearScan::allSIMDRegs()
{
    return availableFloatRegs;
}
//------------------------------------------------------------------------
// internalFloatRegCandidates: Return the set of registers that are appropriate
//                             for use as internal float registers.
//
// Return Value:
//    The set of registers (as a regMaskTP).
//
// Notes:
//    compFloatingPointUsed is only required to be set if it is possible that we
//    will use floating point callee-save registers.
//    It is unlikely, if an internal register is the only use of floating point,
//    that it will select a callee-save register. But to be safe, we restrict
//    the set of candidates if compFloatingPointUsed is not already set.
regMaskTP LinearScan::internalFloatRegCandidates()
{
    if (compiler->compFloatingPointUsed)
    {
        return allRegs(TYP_FLOAT);
    }
    else
    {
        return RBM_FLT_CALLEE_TRASH;
    }
}
/*****************************************************************************
 * Register type helpers
 *****************************************************************************/
template <class T>
RegisterType regType(T type)
{
#ifdef FEATURE_SIMD
    if (varTypeIsSIMD(type))
    {
        return FloatRegisterType;
    }
#endif // FEATURE_SIMD
    return varTypeIsFloating(TypeGet(type)) ? FloatRegisterType : IntRegisterType;
}

bool useFloatReg(var_types type)
{
    return (regType(type) == FloatRegisterType);
}

bool registerTypesEquivalent(RegisterType a, RegisterType b)
{
    return varTypeIsIntegralOrI(a) == varTypeIsIntegralOrI(b);
}

bool isSingleRegister(regMaskTP regMask)
{
    return (regMask != RBM_NONE && genMaxOneBit(regMask));
}
/*****************************************************************************
 * Inline functions for RegRecord
 *****************************************************************************/

bool RegRecord::isFree()
{
    return ((assignedInterval == nullptr || !assignedInterval->isActive) && !isBusyUntilNextKill);
}
/*****************************************************************************
 * Inline functions for LinearScan
 *****************************************************************************/
RegRecord* LinearScan::getRegisterRecord(regNumber regNum)
{
    return &physRegs[regNum];
}
//----------------------------------------------------------------------------
// getConstrainedRegMask: Returns new regMask which is the intersection of
// regMaskActual and regMaskConstraint if the new regMask has at least
// minRegCount registers, otherwise returns regMaskActual.
//
// Arguments:
//     regMaskActual      -  regMask that needs to be constrained
//     regMaskConstraint  -  regMask constraint that needs to be
//                           applied to regMaskActual
//     minRegCount        -  Minimum number of regs that should be
//                           present in new regMask.
//
// Return Value:
//     New regMask that has minRegCount registers after intersection.
//     Otherwise returns regMaskActual.
regMaskTP LinearScan::getConstrainedRegMask(regMaskTP regMaskActual, regMaskTP regMaskConstraint, unsigned minRegCount)
{
    regMaskTP newMask = regMaskActual & regMaskConstraint;
    if (genCountBits(newMask) >= minRegCount)
    {
        return newMask;
    }
    return regMaskActual;
}
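
// An illustrative (hypothetical) usage sketch: with minRegCount = 2, constraining
// (RBM_RAX | RBM_RCX | RBM_RDX) by a disjoint mask yields an empty intersection,
// which has fewer than two bits set, so the original three-register mask is returned
// unchanged; constraining it by (RBM_RAX | RBM_RCX) yields that two-register
// intersection, which satisfies the minimum and is returned instead.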
//------------------------------------------------------------------------
// stressLimitRegs: Given a set of registers, expressed as a register mask, reduce
//                  them based on the current stress options.
//
// Arguments:
//    mask      - The current mask of register candidates for a node
//
// Return Value:
//    A possibly-modified mask, based on the value of COMPlus_JitStressRegs.
//
// Notes:
//    This is the method used to implement the stress options that limit
//    the set of registers considered for allocation.
//
regMaskTP LinearScan::stressLimitRegs(RefPosition* refPosition, regMaskTP mask)
{
    if (getStressLimitRegs() != LSRA_LIMIT_NONE)
    {
        // The refPosition could be null, for example when called
        // by getTempRegForResolution().
        int minRegCount = (refPosition != nullptr) ? refPosition->minRegCandidateCount : 1;

        switch (getStressLimitRegs())
        {
            case LSRA_LIMIT_CALLEE:
                if (!compiler->opts.compDbgEnC)
                {
                    mask = getConstrainedRegMask(mask, RBM_CALLEE_SAVED, minRegCount);
                }
                break;

            case LSRA_LIMIT_CALLER:
            {
                mask = getConstrainedRegMask(mask, RBM_CALLEE_TRASH, minRegCount);
            }
            break;

            case LSRA_LIMIT_SMALL_SET:
                if ((mask & LsraLimitSmallIntSet) != RBM_NONE)
                {
                    mask = getConstrainedRegMask(mask, LsraLimitSmallIntSet, minRegCount);
                }
                else if ((mask & LsraLimitSmallFPSet) != RBM_NONE)
                {
                    mask = getConstrainedRegMask(mask, LsraLimitSmallFPSet, minRegCount);
                }
                break;

            default:
                unreached();
        }

        if (refPosition != nullptr && refPosition->isFixedRegRef)
        {
            mask |= refPosition->registerAssignment;
        }
    }

    return mask;
}
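
// An illustrative (hypothetical) usage note: the stress mode is chosen via the
// COMPlus_JitStressRegs environment variable (the exact value encoding lives in
// lsra.h and is assumed here, not shown above); e.g. a value selecting
// LSRA_LIMIT_CALLEE restricts each candidate set to callee-saved registers
// whenever the constrained mask still satisfies minRegCount.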
// TODO-Cleanup: Consider adding an overload that takes a varDsc, and can appropriately
// set such fields as isStructField
//
Interval* LinearScan::newInterval(RegisterType theRegisterType)
{
    intervals.emplace_back(theRegisterType, allRegs(theRegisterType));
    Interval* newInt = &intervals.back();

#ifdef DEBUG
    newInt->intervalIndex = static_cast<unsigned>(intervals.size() - 1);
#endif // DEBUG

    DBEXEC(VERBOSE, newInt->dump());
    return newInt;
}
RefPosition* LinearScan::newRefPositionRaw(LsraLocation nodeLocation, GenTree* treeNode, RefType refType)
{
    refPositions.emplace_back(curBBNum, nodeLocation, treeNode, refType);
    RefPosition* newRP = &refPositions.back();
#ifdef DEBUG
    newRP->rpNum = static_cast<unsigned>(refPositions.size() - 1);
#endif // DEBUG
    return newRP;
}
//------------------------------------------------------------------------
// resolveConflictingDefAndUse: Resolve the situation where we have conflicting def and use
//                              register requirements on a single-def, single-use interval.
//
// Arguments:
//    defRefPosition - The interval definition
//    useRefPosition - The (sole) interval use
//
// Return Value:
//    None.
//
// Assumptions:
//    The two RefPositions are for the same interval, which is a tree-temp.
//
// Notes:
//    We require some special handling for the case where the use is a "delayRegFree" case of a fixedReg.
//    In that case, if we change the registerAssignment on the useRefPosition, we will lose the fact that,
//    even if we assign a different register (and rely on codegen to do the copy), that fixedReg also needs
//    to remain busy until the Def register has been allocated. In that case, we don't allow Case 1 or Case 4
//    below.
//    Here are the cases we consider (in this order):
//    1. If the defRefPosition specifies a single register, and there are no conflicting
//       FixedReg uses of it between the def and use, we use that register, and the code generator
//       will insert the copy. Note that it cannot be in use because there is a FixedRegRef for the def.
//    2. If the useRefPosition specifies a single register, and it is not in use, and there are no
//       conflicting FixedReg uses of it between the def and use, we use that register, and the code generator
//       will insert the copy.
//    3. If the defRefPosition specifies a single register (but there are conflicts, as determined
//       in 1.), and there are no conflicts with the useRefPosition register (if it's a single register),
//       we set the register requirements on the defRefPosition to the use registers, and the
//       code generator will insert a copy on the def. We can't rely on the code generator to put a copy
//       on the use if it has multiple possible candidates, as it won't know which one has been allocated.
//    4. If the useRefPosition specifies a single register, and there are no conflicts with the register
//       on the defRefPosition, we leave the register requirements on the defRefPosition as-is, and set
//       the useRefPosition to the def registers, for similar reasons to case #3.
//    5. If both the defRefPosition and the useRefPosition specify single registers, but both have conflicts,
//       we set the candidates on defRefPosition to be all regs of the appropriate type, and since they are
//       single registers, codegen can insert the copy.
//    6. Finally, if the RefPositions specify disjoint subsets of the registers (or the use is fixed but
//       has a conflict), we must insert a copy. The copy will be inserted before the use if the
//       use is not fixed (in the fixed case, the code generator will insert the use).
//
// TODO-CQ: We get bad register allocation in case #3 in the situation where no register is
// available for the lifetime. We end up allocating a register that must be spilled, and it probably
// won't be the register that is actually defined by the target instruction. So, we have to copy it
// and THEN spill it. In this case, we should be using the def requirement. But we need to change
// the interface to this method a bit to make that work (e.g. returning a candidate set to use, but
// leaving the registerAssignment as-is on the def, so that if we find that we need to spill anyway
// we can use the fixed-reg on the def).
//
void LinearScan::resolveConflictingDefAndUse(Interval* interval, RefPosition* defRefPosition)
{
    assert(!interval->isLocalVar);

    RefPosition* useRefPosition   = defRefPosition->nextRefPosition;
    regMaskTP    defRegAssignment = defRefPosition->registerAssignment;
    regMaskTP    useRegAssignment = useRefPosition->registerAssignment;
    RegRecord*   defRegRecord     = nullptr;
    RegRecord*   useRegRecord     = nullptr;
    regNumber    defReg           = REG_NA;
    regNumber    useReg           = REG_NA;
    bool         defRegConflict   = false;
    bool         useRegConflict   = false;

    // If the useRefPosition is a "delayRegFree", we can't change the registerAssignment
    // on it, or we will fail to ensure that the fixedReg is busy at the time the target
    // (of the node that uses this interval) is allocated.
    bool canChangeUseAssignment = !useRefPosition->isFixedRegRef || !useRefPosition->delayRegFree;

    INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CONFLICT));
    if (!canChangeUseAssignment)
    {
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_FIXED_DELAY_USE));
    }
    if (defRefPosition->isFixedRegRef)
    {
        defReg       = defRefPosition->assignedReg();
        defRegRecord = getRegisterRecord(defReg);
        if (canChangeUseAssignment)
        {
            RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
            assert(currFixedRegRefPosition != nullptr &&
                   currFixedRegRefPosition->nodeLocation == defRefPosition->nodeLocation);

            if (currFixedRegRefPosition->nextRefPosition == nullptr ||
                currFixedRegRefPosition->nextRefPosition->nodeLocation > useRefPosition->getRefEndLocation())
            {
                // This is case #1.  Use the defRegAssignment
                INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE1));
                useRefPosition->registerAssignment = defRegAssignment;
                return;
            }
            else
            {
                defRegConflict = true;
            }
        }
    }
    if (useRefPosition->isFixedRegRef)
    {
        useReg                               = useRefPosition->assignedReg();
        useRegRecord                         = getRegisterRecord(useReg);
        RefPosition* currFixedRegRefPosition = useRegRecord->recentRefPosition;

        // We know that useRefPosition is a fixed use, so the nextRefPosition must not be null.
        RefPosition* nextFixedRegRefPosition = useRegRecord->getNextRefPosition();
        assert(nextFixedRegRefPosition != nullptr &&
               nextFixedRegRefPosition->nodeLocation <= useRefPosition->nodeLocation);

        // First, check to see if there are any conflicting FixedReg references between the def and use.
        if (nextFixedRegRefPosition->nodeLocation == useRefPosition->nodeLocation)
        {
            // OK, no conflicting FixedReg references.
            // Now, check to see whether it is currently in use.
            if (useRegRecord->assignedInterval != nullptr)
            {
                RefPosition* possiblyConflictingRef         = useRegRecord->assignedInterval->recentRefPosition;
                LsraLocation possiblyConflictingRefLocation = possiblyConflictingRef->getRefEndLocation();
                if (possiblyConflictingRefLocation >= defRefPosition->nodeLocation)
                {
                    useRegConflict = true;
                }
            }
            if (!useRegConflict)
            {
                // This is case #2.  Use the useRegAssignment
                INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE2));
                defRefPosition->registerAssignment = useRegAssignment;
                return;
            }
        }
        else
        {
            useRegConflict = true;
        }
    }
    if (defRegRecord != nullptr && !useRegConflict)
    {
        // This is case #3.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE3));
        defRefPosition->registerAssignment = useRegAssignment;
        return;
    }
    if (useRegRecord != nullptr && !defRegConflict && canChangeUseAssignment)
    {
        // This is case #4.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE4));
        useRefPosition->registerAssignment = defRegAssignment;
        return;
    }
    if (defRegRecord != nullptr && useRegRecord != nullptr)
    {
        // This is case #5.
        INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE5));
        RegisterType regType = interval->registerType;
        assert((getRegisterType(interval, defRefPosition) == regType) &&
               (getRegisterType(interval, useRefPosition) == regType));
        regMaskTP candidates               = allRegs(regType);
        defRefPosition->registerAssignment = candidates;
        return;
    }
    INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DEFUSE_CASE6));
    return;
}
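
// An illustrative example (hypothetical registers) of case #1 above: suppose a
// tree-temp is defined by a call (fixed to RAX) and consumed as a shift count
// (fixed to RCX). If no other FixedReg reference to RAX occurs between the def and
// the use, the use's registerAssignment is simply changed to RAX and codegen emits
// the RAX->RCX copy; if RAX is referenced in between, the later cases apply instead.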
//------------------------------------------------------------------------
// conflictingFixedRegReference: Determine whether the current RegRecord has a
//                               fixed register use that conflicts with 'refPosition'
//
// Arguments:
//    refPosition - The RefPosition of interest
//
// Return Value:
//    Returns true iff the given RefPosition is NOT a fixed use of this register,
//    AND either:
//    - there is a RefPosition on this RegRecord at the nodeLocation of the given RefPosition, or
//    - the given RefPosition has a delayRegFree, and there is a RefPosition on this RegRecord at
//      the nodeLocation just past the given RefPosition.
//
// Assumptions:
//    'refPosition' is non-null.
//
bool RegRecord::conflictingFixedRegReference(RefPosition* refPosition)
{
    // Is this a fixed reference of this register?  If so, there is no conflict.
    if (refPosition->isFixedRefOfRegMask(genRegMask(regNum)))
    {
        return false;
    }
    // Otherwise, check for conflicts.
    // There is a conflict if:
    // 1. There is a recent RefPosition on this RegRecord that is at this location,
    //    except in the case where it is a special "putarg" that is associated with this interval, OR
    // 2. There is an upcoming RefPosition at this location, or at the next location
    //    if refPosition is a delayed use (i.e. must be kept live through the next/def location).

    LsraLocation refLocation = refPosition->nodeLocation;
    if (recentRefPosition != nullptr && recentRefPosition->refType != RefTypeKill &&
        recentRefPosition->nodeLocation == refLocation &&
        (!isBusyUntilNextKill || assignedInterval != refPosition->getInterval()))
    {
        return true;
    }
    LsraLocation nextPhysRefLocation = getNextRefLocation();
    if (nextPhysRefLocation == refLocation || (refPosition->delayRegFree && nextPhysRefLocation == (refLocation + 1)))
    {
        return true;
    }
    return false;
}
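
// An illustrative timeline sketch of the conflict test above (hypothetical locations):
// if this RegRecord has a fixed RefPosition at location L, a candidate RefPosition at L
// (that is not itself a fixed reference of this register) conflicts, and a delayRegFree
// candidate at L also conflicts with a fixed RefPosition at L + 1, since it must stay
// live through the next (def) location.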
void LinearScan::applyCalleeSaveHeuristics(RefPosition* rp)
{
#ifdef _TARGET_AMD64_
    if (compiler->opts.compDbgEnC)
    {
        // We only use RSI and RDI for EnC code, so we don't want to favor callee-save regs.
        return;
    }
#endif // _TARGET_AMD64_

    Interval* theInterval = rp->getInterval();

#ifdef DEBUG
    regMaskTP calleeSaveMask = calleeSaveRegs(getRegisterType(theInterval, rp));
    if (doReverseCallerCallee())
    {
        rp->registerAssignment =
            getConstrainedRegMask(rp->registerAssignment, calleeSaveMask, rp->minRegCandidateCount);
    }
    else
#endif // DEBUG
    {
        // Set preferences so that this register set will be preferred for earlier refs
        theInterval->updateRegisterPreferences(rp->registerAssignment);
    }
}
void LinearScan::associateRefPosWithInterval(RefPosition* rp)
{
    Referenceable* theReferent = rp->referent;

    if (theReferent != nullptr)
    {
        // All RefPositions except the dummy ones at the beginning of blocks

        if (rp->isIntervalRef())
        {
            Interval* theInterval = rp->getInterval();

            applyCalleeSaveHeuristics(rp);

            if (theInterval->isLocalVar)
            {
                if (RefTypeIsUse(rp->refType))
                {
                    RefPosition* const prevRP = theInterval->recentRefPosition;
                    if ((prevRP != nullptr) && (prevRP->bbNum == rp->bbNum))
                    {
                        prevRP->lastUse = false;
                    }
                }

                rp->lastUse = (rp->refType != RefTypeExpUse) && (rp->refType != RefTypeParamDef) &&
                              (rp->refType != RefTypeZeroInit) && !extendLifetimes();
            }
            else if (rp->refType == RefTypeUse)
            {
                // Ensure that we have consistent def/use on SDSU temps.
                // However, there are a couple of cases where this may over-constrain allocation:
                // 1. In the case of a non-commutative rmw def (in which the rmw source must be delay-free), or
                // 2. In the case where the defining node requires a temp distinct from the target (also a
                //    delay-free case).
                // In those cases, if we propagate a single-register restriction from the consumer to the producer
                // the delayed uses will not see a fixed reference in the PhysReg at that position, and may
                // incorrectly allocate that register.
                // TODO-CQ: This means that we may often require a copy at the use of this node's result.
                // This case could be moved to BuildRefPositionsForNode, at the point where the def RefPosition is
                // created, causing a RefTypeFixedRef to be added at that location. This, however, results in
                // more PhysReg RefPositions (a throughput impact), and a large number of diffs that require
                // further analysis to determine benefit.
                RefPosition* prevRefPosition = theInterval->recentRefPosition;
                assert(prevRefPosition != nullptr && theInterval->firstRefPosition == prevRefPosition);
                // All defs must have a valid treeNode, but we check it below to be conservative.
                assert(prevRefPosition->treeNode != nullptr);
                regMaskTP prevAssignment = prevRefPosition->registerAssignment;
                regMaskTP newAssignment  = (prevAssignment & rp->registerAssignment);
                if (newAssignment != RBM_NONE)
                {
                    if (!isSingleRegister(newAssignment) ||
                        (!theInterval->hasNonCommutativeRMWDef && (prevRefPosition->treeNode != nullptr) &&
                         !prevRefPosition->treeNode->gtLsraInfo.isInternalRegDelayFree))
                    {
                        prevRefPosition->registerAssignment = newAssignment;
                    }
                }
                else
                {
                    theInterval->hasConflictingDefUse = true;
                }

                rp->lastUse = true;
            }
        }

        RefPosition* prevRP = theReferent->recentRefPosition;
        if (prevRP != nullptr)
        {
            prevRP->nextRefPosition = rp;
        }
        else
        {
            theReferent->firstRefPosition = rp;
        }
        theReferent->recentRefPosition = rp;
        theReferent->lastRefPosition   = rp;
    }
    else
    {
        assert((rp->refType == RefTypeBB) || (rp->refType == RefTypeKillGCRefs));
    }
}
//---------------------------------------------------------------------------
// newRefPosition: allocate and initialize a new RefPosition.
//
// Arguments:
//     reg             -  reg number that identifies RegRecord to be associated
//                        with this RefPosition
//     theLocation     -  LSRA location of RefPosition
//     theRefType      -  RefPosition type
//     theTreeNode     -  GenTree node for which this RefPosition is created
//     mask            -  Set of valid registers for this RefPosition
//     multiRegIdx     -  register position if this RefPosition corresponds to a
//                        multi-reg call node.
//
// Return Value:
//     a new RefPosition
//
RefPosition* LinearScan::newRefPosition(
    regNumber reg, LsraLocation theLocation, RefType theRefType, GenTree* theTreeNode, regMaskTP mask)
{
    RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);

    newRP->setReg(getRegisterRecord(reg));
    newRP->registerAssignment = mask;

    newRP->setMultiRegIdx(0);
    newRP->setAllocateIfProfitable(false);

    associateRefPosWithInterval(newRP);

    DBEXEC(VERBOSE, newRP->dump());
    return newRP;
}
//---------------------------------------------------------------------------
// newRefPosition: allocate and initialize a new RefPosition.
//
// Arguments:
//     theInterval     -  interval to which RefPosition is associated with.
//     theLocation     -  LSRA location of RefPosition
//     theRefType      -  RefPosition type
//     theTreeNode     -  GenTree node for which this RefPosition is created
//     mask            -  Set of valid registers for this RefPosition
//     multiRegIdx     -  register position if this RefPosition corresponds to a
//                        multi-reg call node.
//     minRegCount     -  Minimum number of registers that need to be ensured while
//                        constraining candidates for this ref position under
//                        LSRA stress. This is a DEBUG only arg.
//
// Return Value:
//     a new RefPosition
//
RefPosition* LinearScan::newRefPosition(Interval*    theInterval,
                                        LsraLocation theLocation,
                                        RefType      theRefType,
                                        GenTree*     theTreeNode,
                                        regMaskTP    mask,
                                        unsigned     multiRegIdx /* = 0 */
                                        DEBUGARG(unsigned minRegCandidateCount /* = 1 */))
{
#ifdef DEBUG
    if (theInterval != nullptr && regType(theInterval->registerType) == FloatRegisterType)
    {
        // In the case we're using floating point registers we must make sure
        // this flag was set previously in the compiler since this will mandate
        // whether LSRA will take into consideration FP reg killsets.
        assert(compiler->compFloatingPointUsed || ((mask & RBM_FLT_CALLEE_SAVED) == 0));
    }
#endif // DEBUG

    // If this reference is constrained to a single register (and it's not a dummy
    // or Kill reftype already), add a RefTypeFixedReg at this location so that its
    // availability can be more accurately determined

    bool isFixedRegister = isSingleRegister(mask);
    bool insertFixedRef  = false;
    if (isFixedRegister)
    {
        // Insert a RefTypeFixedReg for any normal def or use (not ParamDef or BB)
        if (theRefType == RefTypeUse || theRefType == RefTypeDef)
        {
            insertFixedRef = true;
        }
    }

    if (insertFixedRef)
    {
        regNumber    physicalReg = genRegNumFromMask(mask);
        RefPosition* pos         = newRefPosition(physicalReg, theLocation, RefTypeFixedReg, nullptr, mask);
        assert(theInterval != nullptr);
        assert((allRegs(theInterval->registerType) & mask) != 0);
    }

    RefPosition* newRP = newRefPositionRaw(theLocation, theTreeNode, theRefType);

    newRP->setInterval(theInterval);

    newRP->isFixedRegRef = isFixedRegister;

#ifndef _TARGET_AMD64_
    // We don't need this for AMD because the PInvoke method epilog code is explicit
    // at register allocation time.
    if (theInterval != nullptr && theInterval->isLocalVar && compiler->info.compCallUnmanaged &&
        theInterval->varNum == compiler->genReturnLocal)
    {
        mask &= ~(RBM_PINVOKE_TCB | RBM_PINVOKE_FRAME);
        noway_assert(mask != RBM_NONE);
    }
#endif // !_TARGET_AMD64_
    newRP->registerAssignment = mask;

    newRP->setMultiRegIdx(multiRegIdx);
    newRP->setAllocateIfProfitable(false);

#ifdef DEBUG
    newRP->minRegCandidateCount = minRegCandidateCount;
#endif // DEBUG

    associateRefPosWithInterval(newRP);

    DBEXEC(VERBOSE, newRP->dump());
    return newRP;
}
/*****************************************************************************
 * Inline functions for Interval
 *****************************************************************************/
RefPosition* Referenceable::getNextRefPosition()
{
    if (recentRefPosition == nullptr)
    {
        return firstRefPosition;
    }
    else
    {
        return recentRefPosition->nextRefPosition;
    }
}

LsraLocation Referenceable::getNextRefLocation()
{
    RefPosition* nextRefPosition = getNextRefPosition();
    if (nextRefPosition == nullptr)
    {
        return MaxLocation;
    }
    else
    {
        return nextRefPosition->nodeLocation;
    }
}
// Iterate through all the registers of the given type
class RegisterIterator
{
    friend class Registers;

public:
    RegisterIterator(RegisterType type) : regType(type)
    {
        if (useFloatReg(regType))
        {
            currentRegNum = REG_FP_FIRST;
        }
        else
        {
            currentRegNum = REG_INT_FIRST;
        }
    }

protected:
    static RegisterIterator Begin(RegisterType regType)
    {
        return RegisterIterator(regType);
    }
    static RegisterIterator End(RegisterType regType)
    {
        RegisterIterator endIter = RegisterIterator(regType);
        // This assumes only integer and floating point register types
        // if we target a processor with additional register types,
        // this would have to change
        if (useFloatReg(regType))
        {
            // This just happens to work for both double & float
            endIter.currentRegNum = REG_NEXT(REG_FP_LAST);
        }
        else
        {
            endIter.currentRegNum = REG_NEXT(REG_INT_LAST);
        }
        return endIter;
    }

public:
    void operator++(int dummy) // int dummy is c++ for "this is postfix ++"
    {
        currentRegNum = REG_NEXT(currentRegNum);
#ifdef _TARGET_ARM_
        if (regType == TYP_DOUBLE)
            currentRegNum = REG_NEXT(currentRegNum);
#endif
    }
    void operator++() // prefix operator++
    {
        currentRegNum = REG_NEXT(currentRegNum);
#ifdef _TARGET_ARM_
        if (regType == TYP_DOUBLE)
            currentRegNum = REG_NEXT(currentRegNum);
#endif
    }
    regNumber operator*()
    {
        return currentRegNum;
    }
    bool operator!=(const RegisterIterator& other)
    {
        return other.currentRegNum != currentRegNum;
    }

private:
    regNumber    currentRegNum;
    RegisterType regType;
};

class Registers
{
public:
    friend class RegisterIterator;
    RegisterType type;
    Registers(RegisterType t)
    {
        type = t;
    }
    RegisterIterator begin()
    {
        return RegisterIterator::Begin(type);
    }
    RegisterIterator end()
    {
        return RegisterIterator::End(type);
    }
};
#ifdef DEBUG
void LinearScan::dumpVarToRegMap(VarToRegMap map)
{
    bool anyPrinted = false;
    for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
    {
        unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
        if (map[varIndex] != REG_STK)
        {
            printf("V%02u=%s ", varNum, getRegName(map[varIndex]));
            anyPrinted = true;
        }
    }
    if (!anyPrinted)
    {
        printf("none");
    }
    printf("\n");
}

void LinearScan::dumpInVarToRegMap(BasicBlock* block)
{
    printf("Var=Reg beg of BB%02u: ", block->bbNum);
    VarToRegMap map = getInVarToRegMap(block->bbNum);
    dumpVarToRegMap(map);
}

void LinearScan::dumpOutVarToRegMap(BasicBlock* block)
{
    printf("Var=Reg end of BB%02u: ", block->bbNum);
    VarToRegMap map = getOutVarToRegMap(block->bbNum);
    dumpVarToRegMap(map);
}
#endif // DEBUG
LinearScanInterface* getLinearScanAllocator(Compiler* comp)
{
    return new (comp, CMK_LSRA) LinearScan(comp);
}

//------------------------------------------------------------------------
// LinearScan: Construct the LinearScan instance for this method.
//
// Notes:
//    The constructor takes care of initializing the data structures that are used
//    during Lowering, including (in DEBUG) getting the stress environment variables,
//    as they may affect the block ordering.
//
LinearScan::LinearScan(Compiler* theCompiler)
    : compiler(theCompiler)
#if MEASURE_MEM_ALLOC
    , lsraIAllocator(nullptr)
#endif // MEASURE_MEM_ALLOC
    , intervals(LinearScanMemoryAllocatorInterval(theCompiler))
    , refPositions(LinearScanMemoryAllocatorRefPosition(theCompiler))
{
#ifdef DEBUG
    maxNodeLocation   = 0;
    activeRefPosition = nullptr;

    // Get the value of the environment variable that controls stress for register allocation
    lsraStressMask = JitConfig.JitStressRegs();
#if 0
    if (lsraStressMask != 0)
    {
        // The code in this #if can be used to debug JitStressRegs issues according to
        // method hash. To use, simply set environment variables JitStressRegsHashLo and JitStressRegsHashHi
        unsigned methHash = compiler->info.compMethodHash();
        char* lostr = getenv("JitStressRegsHashLo");
        unsigned methHashLo = 0;
        if (lostr != nullptr)
        {
            sscanf_s(lostr, "%x", &methHashLo);
        }
        char* histr = getenv("JitStressRegsHashHi");
        unsigned methHashHi = UINT32_MAX;
        if (histr != nullptr)
        {
            sscanf_s(histr, "%x", &methHashHi);
        }
        if (methHash < methHashLo || methHash > methHashHi)
        {
            lsraStressMask = 0;
        }
        else if (dump == true)
        {
            printf("JitStressRegs = %x for method %s, hash = 0x%x.\n",
                   lsraStressMask, compiler->info.compFullName, compiler->info.compMethodHash());
            printf("");         // in our logic this causes a flush
        }
    }
#endif // 0

    dumpTerse = (JitConfig.JitDumpTerseLsra() != 0);
#endif // DEBUG

    enregisterLocalVars = ((compiler->opts.compFlags & CLFLG_REGVAR) != 0) && compiler->lvaTrackedCount > 0;
    availableIntRegs    = (RBM_ALLINT & ~compiler->codeGen->regSet.rsMaskResvd);

#if ETW_EBP_FRAMED
    availableIntRegs &= ~RBM_FPBASE;
#endif // ETW_EBP_FRAMED
    availableFloatRegs  = RBM_ALLFLOAT;
    availableDoubleRegs = RBM_ALLDOUBLE;

#ifdef _TARGET_AMD64_
    if (compiler->opts.compDbgEnC)
    {
        // On x64 when the EnC option is set, we always save exactly RBP, RSI and RDI.
        // RBP is not available to the register allocator, so RSI and RDI are the only
        // callee-save registers available.
        availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_RSI | RBM_RDI;
        availableFloatRegs &= ~RBM_CALLEE_SAVED;
        availableDoubleRegs &= ~RBM_CALLEE_SAVED;
    }
#endif // _TARGET_AMD64_
    compiler->rpFrameType           = FT_NOT_SET;
    compiler->rpMustCreateEBPCalled = false;

    compiler->codeGen->intRegState.rsIsFloat   = false;
    compiler->codeGen->floatRegState.rsIsFloat = true;

    // Block sequencing (the order in which we schedule).
    // Note that we don't initialize the bbVisitedSet until we do the first traversal
    // (currently during Lowering's second phase, where it sets the TreeNodeInfo).
    // This is so that any blocks that are added during the first phase of Lowering
    // are accounted for (and we don't have BasicBlockEpoch issues).
    blockSequencingDone   = false;
    blockSequence         = nullptr;
    blockSequenceWorkList = nullptr;
    curBBSeqNum           = 0;
    bbSeqCount            = 0;
    // Information about each block, including predecessor blocks used for variable locations at block entry.
    blockInfo = nullptr;

    // Populate the register mask table.
    // The first two masks in the table are allint/allfloat
    // The next N are the masks for each single register.
    // After that are the dynamically added ones.
    regMaskTable               = new (compiler, CMK_LSRA) regMaskTP[numMasks];
    regMaskTable[ALLINT_IDX]   = allRegs(TYP_INT);
    regMaskTable[ALLFLOAT_IDX] = allRegs(TYP_DOUBLE);

    regNumber reg;
    for (reg = REG_FIRST; reg < REG_COUNT; reg = REG_NEXT(reg))
    {
        regMaskTable[FIRST_SINGLE_REG_IDX + reg - REG_FIRST] = (reg == REG_STK) ? RBM_NONE : genRegMask(reg);
    }
    nextFreeMask = FIRST_SINGLE_REG_IDX + REG_COUNT;
    noway_assert(nextFreeMask <= numMasks);
}
// Return the reg mask corresponding to the given index.
regMaskTP LinearScan::GetRegMaskForIndex(RegMaskIndex index)
{
    assert(index < numMasks);
    assert(index < nextFreeMask);
    return regMaskTable[index];
}
// Given a reg mask, return the index it corresponds to. If it is not a 'well known' reg mask,
// add it at the end. This method has linear behavior in the worst cases but that is fairly rare.
// Most methods never use any but the well-known masks, and when they do use more
// it is only one or two more.
LinearScan::RegMaskIndex LinearScan::GetIndexForRegMask(regMaskTP mask)
{
    RegMaskIndex result;
    if (isSingleRegister(mask))
    {
        result = genRegNumFromMask(mask) + FIRST_SINGLE_REG_IDX;
    }
    else if (mask == allRegs(TYP_INT))
    {
        result = ALLINT_IDX;
    }
    else if (mask == allRegs(TYP_DOUBLE))
    {
        result = ALLFLOAT_IDX;
    }
    else
    {
        for (int i = FIRST_SINGLE_REG_IDX + REG_COUNT; i < nextFreeMask; i++)
        {
            if (regMaskTable[i] == mask)
            {
                return i;
            }
        }

        // We only allocate a fixed number of masks. Since we don't reallocate, we will throw a
        // noway_assert if we exceed this limit.
        noway_assert(nextFreeMask < numMasks);

        regMaskTable[nextFreeMask] = mask;
        result                     = nextFreeMask;
        nextFreeMask++;
    }
    assert(mask == regMaskTable[result]);
    return result;
}
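
// An illustrative (hypothetical) round-trip sketch: GetIndexForRegMask(RBM_RAX) maps a
// single-register mask directly to its FIRST_SINGLE_REG_IDX slot without searching,
// while an arbitrary mask such as (RBM_RAX | RBM_RCX) is looked up linearly past the
// well-known entries and appended on first use; in either case
// GetRegMaskForIndex(GetIndexForRegMask(m)) == m for any mask m that fits the table.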
// We've decided that we can't use a register during register allocation (probably FPBASE),
// but we've already added it to the register masks. Go through the masks and remove it.
void LinearScan::RemoveRegisterFromMasks(regNumber reg)
{
    JITDUMP("Removing register %s from LSRA register masks\n", getRegName(reg));

    regMaskTP mask = ~genRegMask(reg);
    for (int i = 0; i < nextFreeMask; i++)
    {
        regMaskTable[i] &= mask;
    }

    JITDUMP("After removing register:\n");
    DBEXEC(VERBOSE, dspRegisterMaskTable());
}

#ifdef DEBUG
void LinearScan::dspRegisterMaskTable()
{
    printf("LSRA register masks. Total allocated: %d, total used: %d\n", numMasks, nextFreeMask);
    for (int i = 0; i < nextFreeMask; i++)
    {
        printf("%2u: ", i);
        dspRegMask(regMaskTable[i]);
        printf("\n");
    }
}
#endif // DEBUG
//------------------------------------------------------------------------
// getNextCandidateFromWorkList: Get the next candidate for block sequencing
//
// Arguments:
//    None.
//
// Return Value:
//    The next block to be placed in the sequence.
//
// Notes:
//    This method currently always returns the next block in the list, and relies on having
//    blocks added to the list only when they are "ready", and on the
//    addToBlockSequenceWorkList() method to insert them in the proper order.
//    However, a block may be in the list and already selected, if it was subsequently
//    encountered as both a flow and layout successor of the most recently selected
//    block.
//
BasicBlock* LinearScan::getNextCandidateFromWorkList()
{
    BasicBlockList* nextWorkList = nullptr;
    for (BasicBlockList* workList = blockSequenceWorkList; workList != nullptr; workList = nextWorkList)
    {
        nextWorkList = workList->next;
        BasicBlock* candBlock = workList->block;
        removeFromBlockSequenceWorkList(workList, nullptr);
        if (!isBlockVisited(candBlock))
        {
            return candBlock;
        }
    }
    return nullptr;
}
//------------------------------------------------------------------------
// setBlockSequence: Determine the block order for register allocation.
//
// Arguments:
//    None
//
// Return Value:
//    None
//
// Notes:
//    On return, the blockSequence array contains the blocks, in the order in which they
//    will be allocated.
//    This method clears the bbVisitedSet on LinearScan, and when it returns the set
//    contains all the bbNums for the block.
//    This requires a traversal of the BasicBlocks, and could potentially be
//    combined with the first traversal (currently the one in Lowering that sets the
//    TreeNodeInfo).
//
void LinearScan::setBlockSequence()
{
    // Reset the "visited" flag on each block.
    compiler->EnsureBasicBlockEpoch();
    bbVisitedSet = BlockSetOps::MakeEmpty(compiler);
    BlockSet readySet(BlockSetOps::MakeEmpty(compiler));
    BlockSet predSet(BlockSetOps::MakeEmpty(compiler));

    assert(blockSequence == nullptr && bbSeqCount == 0);
    blockSequence            = new (compiler, CMK_LSRA) BasicBlock*[compiler->fgBBcount];
    bbNumMaxBeforeResolution = compiler->fgBBNumMax;
    blockInfo                = new (compiler, CMK_LSRA) LsraBlockInfo[bbNumMaxBeforeResolution + 1];

    assert(blockSequenceWorkList == nullptr);

    bool addedInternalBlocks = false;
    verifiedAllBBs           = false;
    hasCriticalEdges         = false;
    BasicBlock* nextBlock;
    for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = nextBlock)
    {
        blockSequence[bbSeqCount] = block;
        markBlockVisited(block);
        bbSeqCount++;
        nextBlock = nullptr;

        // Initialize the blockInfo.
        // predBBNum will be set later. 0 is never used as a bbNum.
        blockInfo[block->bbNum].predBBNum = 0;
        // We check for critical edges below, but initialize to false.
        blockInfo[block->bbNum].hasCriticalInEdge  = false;
        blockInfo[block->bbNum].hasCriticalOutEdge = false;
        blockInfo[block->bbNum].weight             = block->bbWeight;

#if TRACK_LSRA_STATS
        blockInfo[block->bbNum].spillCount         = 0;
        blockInfo[block->bbNum].copyRegCount       = 0;
        blockInfo[block->bbNum].resolutionMovCount = 0;
        blockInfo[block->bbNum].splitEdgeCount     = 0;
#endif // TRACK_LSRA_STATS

        if (block->GetUniquePred(compiler) == nullptr)
        {
            for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
            {
                BasicBlock* predBlock = pred->flBlock;
                if (predBlock->NumSucc(compiler) > 1)
                {
                    blockInfo[block->bbNum].hasCriticalInEdge = true;
                    hasCriticalEdges                          = true;
                    break;
                }
                else if (predBlock->bbJumpKind == BBJ_SWITCH)
                {
                    assert(!"Switch with single successor");
                }
            }
        }
        // Determine which block to schedule next.

        // First, update the NORMAL successors of the current block, adding them to the worklist
        // according to the desired order. We will handle the EH successors below.
        bool checkForCriticalOutEdge = (block->NumSucc(compiler) > 1);
        if (!checkForCriticalOutEdge && block->bbJumpKind == BBJ_SWITCH)
        {
            assert(!"Switch with single successor");
        }

        const unsigned numSuccs = block->NumSucc(compiler);
        for (unsigned succIndex = 0; succIndex < numSuccs; succIndex++)
        {
            BasicBlock* succ = block->GetSucc(succIndex, compiler);
            if (checkForCriticalOutEdge && succ->GetUniquePred(compiler) == nullptr)
            {
                blockInfo[block->bbNum].hasCriticalOutEdge = true;
                hasCriticalEdges                           = true;
                // We can stop checking now.
                checkForCriticalOutEdge = false;
            }

            if (isTraversalLayoutOrder() || isBlockVisited(succ))
            {
                continue;
            }

            // We've now seen a predecessor, so add it to the work list and the "readySet".
            // It will be inserted in the worklist according to the specified traversal order
            // (i.e. pred-first or random, since layout order is handled above).
            if (!BlockSetOps::IsMember(compiler, readySet, succ->bbNum))
            {
                addToBlockSequenceWorkList(readySet, succ, predSet);
                BlockSetOps::AddElemD(compiler, readySet, succ->bbNum);
            }
        }

        // For layout order, simply use bbNext
        if (isTraversalLayoutOrder())
        {
            nextBlock = block->bbNext;
            continue;
        }

        while (nextBlock == nullptr)
        {
            nextBlock = getNextCandidateFromWorkList();

            // TODO-Throughput: We would like to bypass this traversal if we know we've handled all
            // the blocks - but fgBBcount does not appear to be updated when blocks are removed.
            if (nextBlock == nullptr /* && bbSeqCount != compiler->fgBBcount*/ && !verifiedAllBBs)
            {
                // If we don't encounter all blocks by traversing the regular successor links, do a full
                // traversal of all the blocks, and add them in layout order.
                // This may include:
                //   - internal-only blocks (in the fgAddCodeList) which may not be in the flow graph
                //     (these are not even in the bbNext links).
                //   - blocks that have become unreachable due to optimizations, but that are strongly
                //     connected (these are not removed)

                for (Compiler::AddCodeDsc* desc = compiler->fgAddCodeList; desc != nullptr; desc = desc->acdNext)
                {
                    if (!isBlockVisited(block))
                    {
                        addToBlockSequenceWorkList(readySet, block, predSet);
                        BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
                    }
                }

                for (BasicBlock* block = compiler->fgFirstBB; block; block = block->bbNext)
                {
                    if (!isBlockVisited(block))
                    {
                        addToBlockSequenceWorkList(readySet, block, predSet);
                        BlockSetOps::AddElemD(compiler, readySet, block->bbNum);
                    }
                }
                verifiedAllBBs = true;
            }
            else
            {
                break;
            }
        }
    }
    blockSequencingDone = true;

#ifdef DEBUG
    // Make sure that we've visited all the blocks.
    for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
    {
        assert(isBlockVisited(block));
    }

    JITDUMP("LSRA Block Sequence: ");
    int i = 1;
    for (BasicBlock *block = startBlockSequence(); block != nullptr; ++i, block = moveToNextBlock())
    {
        JITDUMP("BB%02u", block->bbNum);

        if (block->isMaxBBWeight())
        {
            JITDUMP("(MAX) ");
        }
        else
        {
            JITDUMP("(%6s) ", refCntWtd2str(block->getBBWeight(compiler)));
        }
    }
    JITDUMP("\n\n");
#endif // DEBUG
}
//------------------------------------------------------------------------
// compareBlocksForSequencing: Compare two basic blocks for sequencing order.
//
// Arguments:
//    block1            - the first block for comparison
//    block2            - the second block for comparison
//    useBlockWeights   - whether to use block weights for comparison
//
// Return Value:
//    -1 if block1 is preferred.
//     0 if the blocks are equivalent.
//     1 if block2 is preferred.
//
// Notes:
//    See addToBlockSequenceWorkList.
int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights)
{
    if (useBlockWeights)
    {
        unsigned weight1 = block1->getBBWeight(compiler);
        unsigned weight2 = block2->getBBWeight(compiler);

        if (weight1 > weight2)
        {
            return -1;
        }
        else if (weight1 < weight2)
        {
            return 1;
        }
    }

    // If weights are the same prefer LOWER bbnum
    if (block1->bbNum < block2->bbNum)
    {
        return -1;
    }
    else if (block1->bbNum == block2->bbNum)
    {
        return 0;
    }
    else
    {
        return 1;
    }
}
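
// An illustrative worked example: with useBlockWeights = true, a block of weight 8
// sorts ahead of (returns -1 against) a block of weight 2; if both weigh 2, the one
// with the lower bbNum is preferred, making this comparison a total order suitable
// for the ordered insertion in addToBlockSequenceWorkList below.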
//------------------------------------------------------------------------
// addToBlockSequenceWorkList: Add a BasicBlock to the work list for sequencing.
//
// Arguments:
//    sequencedBlockSet - the set of blocks that are already sequenced
//    block             - the new block to be added
//    predSet           - the buffer to save predecessors set. A block set allocated by the caller used here
//                        as a temporary block set for constructing a predecessor set. Allocated by the caller
//                        to avoid reallocating a new block set with every call to this function
//
// Return Value:
//    None.
//
// Notes:
//    The first block in the list will be the next one to be sequenced, as soon
//    as we encounter a block whose successors have all been sequenced, in pred-first
//    order, or the very next block if we are traversing in random order (once implemented).
//    This method uses a comparison method to determine the order in which to place
//    the blocks in the list. This method queries whether all predecessors of the
//    block are sequenced at the time it is added to the list and if so uses block weights
//    for inserting the block. A block is never inserted ahead of its predecessors.
//    A block at the time of insertion may not have all its predecessors sequenced, in
//    which case it will be sequenced based on its block number. Once a block is inserted,
//    its priority/order will not be changed later once its remaining predecessors are
//    sequenced. This means that the work list may not be sorted entirely based on
//    block weights alone.
//
//    Note also that, when random traversal order is implemented, this method
//    should insert the blocks into the list in random order, so that we can always
//    simply select the first block in the list.
//
void LinearScan::addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block, BlockSet& predSet)
{
    // The block that is being added is not already sequenced
    assert(!BlockSetOps::IsMember(compiler, sequencedBlockSet, block->bbNum));

    // Get predSet of block
    BlockSetOps::ClearD(compiler, predSet);
    flowList* pred;
    for (pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
    {
        BlockSetOps::AddElemD(compiler, predSet, pred->flBlock->bbNum);
    }

    // If either a rarely run block or all its preds are already sequenced, use block's weight to sequence
    bool useBlockWeight = block->isRunRarely() || BlockSetOps::IsSubset(compiler, sequencedBlockSet, predSet);

    BasicBlockList* prevNode = nullptr;
    BasicBlockList* nextNode = blockSequenceWorkList;

    while (nextNode != nullptr)
    {
        int seqResult;

        if (nextNode->block->isRunRarely())
        {
            // If the block that is yet to be sequenced is a rarely run block, always use block weights for
            // sequencing
            seqResult = compareBlocksForSequencing(nextNode->block, block, true);
        }
        else if (BlockSetOps::IsMember(compiler, predSet, nextNode->block->bbNum))
        {
            // always prefer unsequenced pred blocks
            seqResult = -1;
        }
        else
        {
            seqResult = compareBlocksForSequencing(nextNode->block, block, useBlockWeight);
        }

        if (seqResult > 0)
        {
            break;
        }

        prevNode = nextNode;
        nextNode = nextNode->next;
    }

    BasicBlockList* newListNode = new (compiler, CMK_LSRA) BasicBlockList(block, nextNode);
    if (prevNode == nullptr)
    {
        blockSequenceWorkList = newListNode;
    }
    else
    {
        prevNode->next = newListNode;
    }
}
void LinearScan::removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode)
{
    if (listNode == blockSequenceWorkList)
    {
        assert(prevNode == nullptr);
        blockSequenceWorkList = listNode->next;
    }
    else
    {
        assert(prevNode != nullptr && prevNode->next == listNode);
        prevNode->next = listNode->next;
    }

    // TODO-Cleanup: consider merging Compiler::BlockListNode and BasicBlockList
    // compiler->FreeBlockListNode(listNode);
}
// Initialize the block order for allocation (called each time a new traversal begins).
BasicBlock* LinearScan::startBlockSequence()
{
    if (!blockSequencingDone)
    {
        setBlockSequence();
    }
    BasicBlock* curBB = compiler->fgFirstBB;
    curBBSeqNum       = 0;
    curBBNum          = curBB->bbNum;
    clearVisitedBlocks();
    assert(blockSequence[0] == compiler->fgFirstBB);
    markBlockVisited(curBB);
    return curBB;
}
//------------------------------------------------------------------------
// moveToNextBlock: Move to the next block in order for allocation or resolution.
//
// Arguments:
//    None
//
// Return Value:
//    The next block.
//
// Notes:
//    This method is used when the next block is actually going to be handled.
//    It changes curBBNum.
//
BasicBlock* LinearScan::moveToNextBlock()
{
    BasicBlock* nextBlock = getNextBlock();
    curBBSeqNum++;
    if (nextBlock != nullptr)
    {
        curBBNum = nextBlock->bbNum;
    }
    return nextBlock;
}
//------------------------------------------------------------------------
// getNextBlock: Get the next block in order for allocation or resolution.
//
// Arguments:
//    None
//
// Return Value:
//    The next block.
//
// Notes:
//    This method does not actually change the current block - it is used simply
//    to determine which block will be next.
//
BasicBlock* LinearScan::getNextBlock()
{
    assert(blockSequencingDone);
    unsigned int nextBBSeqNum = curBBSeqNum + 1;
    if (nextBBSeqNum < bbSeqCount)
    {
        return blockSequence[nextBBSeqNum];
    }
    return nullptr;
}
//------------------------------------------------------------------------
// doLinearScan: The main method for register allocation.
//
// Arguments:
//    None
//
// Return Value:
//    None.
//
// Assumptions:
//    Lowering must have set the NodeInfo (gtLsraInfo) on each node to communicate
//    the register requirements.
//
void LinearScan::doLinearScan()
{
    unsigned lsraBlockEpoch = compiler->GetCurBasicBlockEpoch();

    splitBBNumToTargetBBNumMap = nullptr;

    // This is complicated by the fact that physical registers have refs associated
    // with locations where they are killed (e.g. calls), but we don't want to
    // count these as being touched.

    compiler->codeGen->regSet.rsClearRegsModified();

    initMaxSpill();
    buildIntervals();
    DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_REFPOS));
    compiler->EndPhase(PHASE_LINEAR_SCAN_BUILD);

    DBEXEC(VERBOSE, lsraDumpIntervals("after buildIntervals"));

    clearVisitedBlocks();
    initVarRegMaps();
    allocateRegisters();
    compiler->EndPhase(PHASE_LINEAR_SCAN_ALLOC);
    resolveRegisters();
    compiler->EndPhase(PHASE_LINEAR_SCAN_RESOLVE);

#if TRACK_LSRA_STATS
    if ((JitConfig.DisplayLsraStats() != 0)
#ifdef DEBUG
        || VERBOSE
#endif
        )
    {
        dumpLsraStats(jitstdout);
    }
#endif // TRACK_LSRA_STATS

    DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_POST));

    compiler->compLSRADone = true;
    noway_assert(lsraBlockEpoch == compiler->GetCurBasicBlockEpoch());
}
//------------------------------------------------------------------------
// recordVarLocationsAtStartOfBB: Update live-in LclVarDscs with the appropriate
//                                register location at the start of a block, during codegen.
//
// Arguments:
//    bb - the block for which code is about to be generated.
//
// Return Value:
//    None.
//
// Assumptions:
//    CodeGen will take care of updating the reg masks and the current var liveness,
//    after calling this method.
//    This is because we need to kill off the dead registers before setting the newly live ones.
//
1813 void LinearScan::recordVarLocationsAtStartOfBB(BasicBlock* bb)
1815 if (!enregisterLocalVars)
1819 JITDUMP("Recording Var Locations at start of BB%02u\n", bb->bbNum);
1820 VarToRegMap map = getInVarToRegMap(bb->bbNum);
1823 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
1824 VarSetOps::Intersection(compiler, registerCandidateVars, bb->bbLiveIn));
1825 VarSetOps::Iter iter(compiler, currentLiveVars);
1826 unsigned varIndex = 0;
1827 while (iter.NextElem(&varIndex))
1829 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1830 LclVarDsc* varDsc = &(compiler->lvaTable[varNum]);
1831 regNumber regNum = getVarReg(map, varIndex);
1833 regNumber oldRegNum = varDsc->lvRegNum;
1834 regNumber newRegNum = regNum;
1836 if (oldRegNum != newRegNum)
1838 JITDUMP(" V%02u(%s->%s)", varNum, compiler->compRegVarName(oldRegNum),
1839 compiler->compRegVarName(newRegNum));
1840 varDsc->lvRegNum = newRegNum;
1843 else if (newRegNum != REG_STK)
1845 JITDUMP(" V%02u(%s)", varNum, compiler->compRegVarName(newRegNum));
1852 JITDUMP(" <none>\n");
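// Hypothetical call-site sketch (illustration only; helper names are placeholders):
// codegen calls this at each block boundary and only afterwards updates the reg masks
// and variable liveness, so dead registers are killed before newly live ones are set.
#if 0
    // At the start of emitting code for 'block':
    m_pLinearScan->recordVarLocationsAtStartOfBB(block); // refresh each live-in lclVar's lvRegNum
    killDeadRegisters(block);                            // placeholder: drop regs whose vars died
    markNewlyLiveRegisters(block->bbLiveIn);             // placeholder: then add the new ones
#endif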
1858 void Interval::setLocalNumber(Compiler* compiler, unsigned lclNum, LinearScan* linScan)
1860 LclVarDsc* varDsc = &compiler->lvaTable[lclNum];
1861 assert(varDsc->lvTracked);
1862 assert(varDsc->lvVarIndex < compiler->lvaTrackedCount);
1864 linScan->localVarIntervals[varDsc->lvVarIndex] = this;
1866 assert(linScan->getIntervalForLocalVar(varDsc->lvVarIndex) == this);
1867 this->isLocalVar = true;
1868 this->varNum = lclNum;
1871 // Identify the candidates which we are not going to enregister because they are
1872 // used in EH in a way we don't want to deal with.
1873 // This logic is cloned from fgInterBlockLocalVarLiveness.
1874 void LinearScan::identifyCandidatesExceptionDataflow()
1876 VARSET_TP exceptVars(VarSetOps::MakeEmpty(compiler));
1877 VARSET_TP filterVars(VarSetOps::MakeEmpty(compiler));
1878 VARSET_TP finallyVars(VarSetOps::MakeEmpty(compiler));
1881 foreach_block(compiler, block)
1883 if (block->bbCatchTyp != BBCT_NONE)
1885 // live on entry to handler
1886 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1889 if (block->bbJumpKind == BBJ_EHFILTERRET)
1891 // live on exit from filter
1892 VarSetOps::UnionD(compiler, filterVars, block->bbLiveOut);
1894 else if (block->bbJumpKind == BBJ_EHFINALLYRET)
1896 // live on exit from finally
1897 VarSetOps::UnionD(compiler, finallyVars, block->bbLiveOut);
1899 #if FEATURE_EH_FUNCLETS
1900 // Funclets are called and returned from, as such we can only count on the frame
1901 // pointer being restored, and thus everything live in or live out must be on the stack.
1903 if (block->bbFlags & BBF_FUNCLET_BEG)
1905 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveIn);
1907 if ((block->bbJumpKind == BBJ_EHFINALLYRET) || (block->bbJumpKind == BBJ_EHFILTERRET) ||
1908 (block->bbJumpKind == BBJ_EHCATCHRET))
1910 VarSetOps::UnionD(compiler, exceptVars, block->bbLiveOut);
1912 #endif // FEATURE_EH_FUNCLETS
1915 // slam them all together (there was really no need to use more than 2 bitvectors here)
1916 VarSetOps::UnionD(compiler, exceptVars, filterVars);
1917 VarSetOps::UnionD(compiler, exceptVars, finallyVars);
1919 /* Mark all pointer variables live on exit from a 'finally'
1920 block as either volatile for non-GC ref types or as
1921 'explicitly initialized' (volatile and must-init) for GC-ref types */
1923 VarSetOps::Iter iter(compiler, exceptVars);
1924 unsigned varIndex = 0;
1925 while (iter.NextElem(&varIndex))
1927 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
1928 LclVarDsc* varDsc = compiler->lvaTable + varNum;
1930 compiler->lvaSetVarDoNotEnregister(varNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
1932 if (varTypeIsGC(varDsc))
1934 if (VarSetOps::IsMember(compiler, finallyVars, varIndex) && !varDsc->lvIsParam)
1936 varDsc->lvMustInit = true;
1942 //------------------------------------------------------------------------
1943 // IsContainableMemoryOp: Checks whether this is a memory op that can be contained.
1946 // node - the node of interest.
1949 // True if this will definitely be a memory reference that could be contained.
1952 // This differs from the isMemoryOp() method on GenTree because it checks for
1953 // the case of a doNotEnregister local. This won't include locals that
1954 // for some other reason do not become register candidates, nor those that get spilled.
1956 // Also, because we usually call this before we redo dataflow, any new lclVars
1957 // introduced after the last dataflow analysis will not yet be marked lvTracked,
1958 // so we don't use that.
1960 bool LinearScan::isContainableMemoryOp(GenTree* node)
1962 #ifdef _TARGET_XARCH_
1963 if (node->isMemoryOp())
1965 return true;
1967 if (node->IsLocal())
1969 if (!enregisterLocalVars)
1971 return false;
1973 LclVarDsc* varDsc = &compiler->lvaTable[node->AsLclVar()->gtLclNum];
1974 return varDsc->lvDoNotEnregister;
1976 #endif // _TARGET_XARCH_
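// Consumer sketch (illustration only; 'TryMakeSrcContained' is a hypothetical stand-in
// for Lowering's containment helper): fold a memory operand directly into the consuming
// instruction only when this predicate holds, e.g. 'add reg, [mem]' on xarch.
#if 0
    GenTree* op2 = node->gtGetOp2();
    if (isContainableMemoryOp(op2))
    {
        TryMakeSrcContained(node, op2); // hypothetical: mark op2 as contained in 'node'
    }
#endif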
1980 bool LinearScan::isRegCandidate(LclVarDsc* varDsc)
1982 // We shouldn't be called if opt settings do not permit register variables.
1983 assert((compiler->opts.compFlags & CLFLG_REGVAR) != 0);
1985 if (!varDsc->lvTracked)
1990 #if !defined(_TARGET_64BIT_)
1991 if (varDsc->lvType == TYP_LONG)
1993 // Long variables should not be register candidates.
1994 // Lowering will have split any candidate lclVars into lo/hi vars.
1997 #endif // !defined(_TARGET_64BIT_)
1999 // If we have JMP, reg args must be put on the stack
2001 if (compiler->compJmpOpUsed && varDsc->lvIsRegArg)
2006 // Don't allocate registers for dependently promoted struct fields
2007 if (compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc))
2014 // Identify locals & compiler temps that are register candidates
2015 // TODO-Cleanup: This was cloned from Compiler::lvaSortByRefCount() in lclvars.cpp in order
2016 // to avoid perturbation, but should be merged.
2018 void LinearScan::identifyCandidates()
2020 if (enregisterLocalVars)
2022 // Initialize the set of lclVars that are candidates for register allocation.
2023 VarSetOps::AssignNoCopy(compiler, registerCandidateVars, VarSetOps::MakeEmpty(compiler));
2025 // Initialize the sets of lclVars that are used to determine whether, and for which lclVars,
2026 // we need to perform resolution across basic blocks.
2027 // Note that we can't do this in the constructor because the number of tracked lclVars may
2028 // change between the constructor and the actual allocation.
2029 VarSetOps::AssignNoCopy(compiler, resolutionCandidateVars, VarSetOps::MakeEmpty(compiler));
2030 VarSetOps::AssignNoCopy(compiler, splitOrSpilledVars, VarSetOps::MakeEmpty(compiler));
2032 // We set enregisterLocalVars to true only if there are tracked lclVars
2033 assert(compiler->lvaCount != 0);
2035 else if (compiler->lvaCount == 0)
2037 // Nothing to do. Note that even if enregisterLocalVars is false, we still need to set the
2038 // lvLRACandidate field on all the lclVars to false if we have any.
2042 if (compiler->compHndBBtabCount > 0)
2044 identifyCandidatesExceptionDataflow();
2050 // While we build intervals for the candidate lclVars, we will determine the floating point
2051 // lclVars, if any, to consider for callee-save register preferencing.
2052 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
2053 // and those that meet the second.
2054 // The first threshold is used for methods that are heuristically deemed either to have light
2055 // fp usage, or other factors that encourage conservative use of callee-save registers, such
2056 // as multiple exits (where there might be an early exit that would be excessively penalized by
2057 // lots of prolog/epilog saves & restores).
2058 // The second threshold is used where there are factors deemed to make it more likely that
2059 // fp callee save registers will be needed, such as loops or many fp vars.
2060 // We keep two sets of vars, since we collect some of the information to determine which set to
2061 // use as we iterate over the vars.
2062 // When we are generating AVX code on non-Unix (FEATURE_PARTIAL_SIMD_CALLEE_SAVE), we maintain an
2063 // additional set of LargeVectorType vars, and there is a separate threshold defined for those.
2064 // It is assumed that if we encounter these, that we should consider this a "high use" scenario,
2065 // so we don't maintain two sets of these vars.
2066 // This is defined as thresholdLargeVectorRefCntWtd, as we are likely to use the same mechanism
2067 // for vectors on Arm64, though the actual value may differ.
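// Worked example (assuming BB_UNITY_WEIGHT == 100, so the thresholds defined below are
// 400 and 200): a non-arg fp lclVar with lvRefCntWtd == 250 lands only in
// fpMaybeCandidateVars, one with lvRefCntWtd == 450 goes straight into
// fpCalleeSaveCandidateVars, and the "maybe" set is merged in later only if the more
// aggressive heuristic (enough fp vars, loops, single exit) fires.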
2069 unsigned int floatVarCount = 0;
2070 unsigned int thresholdFPRefCntWtd = 4 * BB_UNITY_WEIGHT;
2071 unsigned int maybeFPRefCntWtd = 2 * BB_UNITY_WEIGHT;
2072 VARSET_TP fpMaybeCandidateVars(VarSetOps::UninitVal());
2073 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2074 unsigned int largeVectorVarCount = 0;
2075 unsigned int thresholdLargeVectorRefCntWtd = 4 * BB_UNITY_WEIGHT;
2076 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2077 if (enregisterLocalVars)
2079 VarSetOps::AssignNoCopy(compiler, fpCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
2080 VarSetOps::AssignNoCopy(compiler, fpMaybeCandidateVars, VarSetOps::MakeEmpty(compiler));
2081 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2082 VarSetOps::AssignNoCopy(compiler, largeVectorVars, VarSetOps::MakeEmpty(compiler));
2083 VarSetOps::AssignNoCopy(compiler, largeVectorCalleeSaveCandidateVars, VarSetOps::MakeEmpty(compiler));
2084 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2087 unsigned refCntStk = 0;
2088 unsigned refCntReg = 0;
2089 unsigned refCntWtdReg = 0;
2090 unsigned refCntStkParam = 0; // sum of ref counts for all stack based parameters
2091 unsigned refCntWtdStkDbl = 0; // sum of wtd ref counts for stack based doubles
2092 doDoubleAlign = false;
2093 bool checkDoubleAlign = true;
2094 if (compiler->codeGen->isFramePointerRequired() || compiler->opts.MinOpts())
2096 checkDoubleAlign = false;
2100 switch (compiler->getCanDoubleAlign())
2102 case MUST_DOUBLE_ALIGN:
2103 doDoubleAlign = true;
2104 checkDoubleAlign = false;
2106 case CAN_DOUBLE_ALIGN:
2108 case CANT_DOUBLE_ALIGN:
2109 doDoubleAlign = false;
2110 checkDoubleAlign = false;
2116 #endif // DOUBLE_ALIGN
2118 // Check whether register variables are permitted.
2119 if (!enregisterLocalVars)
2121 localVarIntervals = nullptr;
2123 else if (compiler->lvaTrackedCount > 0)
2125 // initialize mapping from tracked local to interval
2126 localVarIntervals = new (compiler, CMK_LSRA) Interval*[compiler->lvaTrackedCount];
2129 INTRACK_STATS(regCandidateVarCount = 0);
2130 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
2132 // Initialize all variables to REG_STK
2133 varDsc->lvRegNum = REG_STK;
2134 #ifndef _TARGET_64BIT_
2135 varDsc->lvOtherReg = REG_STK;
2136 #endif // !_TARGET_64BIT_
2138 if (!enregisterLocalVars)
2140 varDsc->lvLRACandidate = false;
2145 if (checkDoubleAlign)
2147 if (varDsc->lvIsParam && !varDsc->lvIsRegArg)
2149 refCntStkParam += varDsc->lvRefCnt;
2151 else if (!isRegCandidate(varDsc) || varDsc->lvDoNotEnregister)
2153 refCntStk += varDsc->lvRefCnt;
2154 if ((varDsc->lvType == TYP_DOUBLE) ||
2155 ((varTypeIsStruct(varDsc) && varDsc->lvStructDoubleAlign &&
2156 (compiler->lvaGetPromotionType(varDsc) != Compiler::PROMOTION_TYPE_INDEPENDENT))))
2158 refCntWtdStkDbl += varDsc->lvRefCntWtd;
2163 refCntReg += varDsc->lvRefCnt;
2164 refCntWtdReg += varDsc->lvRefCntWtd;
2167 #endif // DOUBLE_ALIGN
2169 /* Track all locals that can be enregistered */
2171 if (!isRegCandidate(varDsc))
2173 varDsc->lvLRACandidate = 0;
2174 if (varDsc->lvTracked)
2176 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2181 assert(varDsc->lvTracked);
2183 varDsc->lvLRACandidate = 1;
2185 // Start with lvRegister as false - set it true only if the variable gets
2186 // the same register assignment throughout
2187 varDsc->lvRegister = false;
2189 /* If the ref count is zero */
2190 if (varDsc->lvRefCnt == 0)
2192 /* Zero ref count, make this untracked */
2193 varDsc->lvRefCntWtd = 0;
2194 varDsc->lvLRACandidate = 0;
2197 // Variables that are address-exposed are never enregistered, or tracked.
2198 // A struct may be promoted, and a struct that fits in a register may be fully enregistered.
2199 // Pinned variables may not be tracked (a condition of the GCInfo representation)
2200 // or enregistered, on x86 -- it is believed that we can enregister pinned (more properly, "pinning")
2201 // references when using the general GC encoding.
2203 if (varDsc->lvAddrExposed || !varTypeIsEnregisterableStruct(varDsc))
2205 varDsc->lvLRACandidate = 0;
2207 Compiler::DoNotEnregisterReason dner = Compiler::DNER_AddrExposed;
2208 if (!varDsc->lvAddrExposed)
2210 dner = Compiler::DNER_IsStruct;
2213 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(dner));
2215 else if (varDsc->lvPinned)
2217 varDsc->lvTracked = 0;
2218 #ifdef JIT32_GCENCODER
2219 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_PinningRef));
2220 #endif // JIT32_GCENCODER
2223 // Are we not optimizing, and do we have exception handlers?
2224 // If so, mark all args and locals as volatile, so that they
2225 // won't ever get enregistered.
2227 if (compiler->opts.MinOpts() && compiler->compHndBBtabCount > 0)
2229 compiler->lvaSetVarDoNotEnregister(lclNum DEBUGARG(Compiler::DNER_LiveInOutOfHandler));
2232 if (varDsc->lvDoNotEnregister)
2234 varDsc->lvLRACandidate = 0;
2235 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2239 var_types type = genActualType(varDsc->TypeGet());
2243 #if CPU_HAS_FP_SUPPORT
2246 if (compiler->opts.compDbgCode)
2248 varDsc->lvLRACandidate = 0;
2251 if (varDsc->lvIsParam && varDsc->lvIsRegArg)
2253 type = (type == TYP_DOUBLE) ? TYP_LONG : TYP_INT;
2255 #endif // ARM_SOFTFP
2257 #endif // CPU_HAS_FP_SUPPORT
2269 if (varDsc->lvPromoted)
2271 varDsc->lvLRACandidate = 0;
2275 // TODO-1stClassStructs: Move TYP_SIMD8 up with the other SIMD types, after handling the param issue
2276 // (passing & returning as TYP_LONG).
2278 #endif // FEATURE_SIMD
2282 varDsc->lvLRACandidate = 0;
2288 noway_assert(!"lvType not set correctly");
2289 varDsc->lvType = TYP_INT;
2294 varDsc->lvLRACandidate = 0;
2297 if (varDsc->lvLRACandidate)
2299 Interval* newInt = newInterval(type);
2300 newInt->setLocalNumber(compiler, lclNum, this);
2301 VarSetOps::AddElemD(compiler, registerCandidateVars, varDsc->lvVarIndex);
2303 // we will set this later when we have determined liveness
2304 varDsc->lvMustInit = false;
2306 if (varDsc->lvIsStructField)
2308 newInt->isStructField = true;
2311 INTRACK_STATS(regCandidateVarCount++);
2313 // We maintain two sets of FP vars - those that meet the first threshold of weighted ref Count,
2314 // and those that meet the second (see the definitions of thresholdFPRefCntWtd and maybeFPRefCntWtd above).
2316 CLANG_FORMAT_COMMENT_ANCHOR;
2318 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2319 // Additionally, when we are generating AVX on non-UNIX amd64, we keep a separate set of the LargeVectorType vars.
2321 if (varDsc->lvType == LargeVectorType)
2323 largeVectorVarCount++;
2324 VarSetOps::AddElemD(compiler, largeVectorVars, varDsc->lvVarIndex);
2325 unsigned refCntWtd = varDsc->lvRefCntWtd;
2326 if (refCntWtd >= thresholdLargeVectorRefCntWtd)
2328 VarSetOps::AddElemD(compiler, largeVectorCalleeSaveCandidateVars, varDsc->lvVarIndex);
2332 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
2333 if (regType(type) == FloatRegisterType)
2336 unsigned refCntWtd = varDsc->lvRefCntWtd;
2337 if (varDsc->lvIsRegArg)
2339 // Don't count the initial reference for register params. In those cases,
2340 // using a callee-save causes an extra copy.
2341 refCntWtd -= BB_UNITY_WEIGHT;
2343 if (refCntWtd >= thresholdFPRefCntWtd)
2345 VarSetOps::AddElemD(compiler, fpCalleeSaveCandidateVars, varDsc->lvVarIndex);
2347 else if (refCntWtd >= maybeFPRefCntWtd)
2349 VarSetOps::AddElemD(compiler, fpMaybeCandidateVars, varDsc->lvVarIndex);
2355 localVarIntervals[varDsc->lvVarIndex] = nullptr;
2360 if (checkDoubleAlign)
2362 // TODO-CQ: Fine-tune this:
2363 // In the legacy reg predictor, this runs after allocation, and then demotes any lclVars
2364 // allocated to the frame pointer, which is probably the wrong order.
2365 // However, because it runs after allocation, it can determine the impact of demoting
2366 // the lclVars allocated to the frame pointer.
2367 // => Here, the estimates of the EBP refCnt and weighted refCnt are wild guesses.
2369 unsigned refCntEBP = refCntReg / 8;
2370 unsigned refCntWtdEBP = refCntWtdReg / 8;
2373 compiler->shouldDoubleAlign(refCntStk, refCntEBP, refCntWtdEBP, refCntStkParam, refCntWtdStkDbl);
2375 #endif // DOUBLE_ALIGN
2377 // The factors we consider to determine which set of fp vars to use as candidates for callee save
2378 // registers currently include the number of fp vars, whether there are loops, and whether there are
2379 // multiple exits. These have been selected somewhat empirically, but there is probably room for tuning.
2381 CLANG_FORMAT_COMMENT_ANCHOR;
2386 printf("\nFP callee save candidate vars: ");
2387 if (enregisterLocalVars && !VarSetOps::IsEmpty(compiler, fpCalleeSaveCandidateVars))
2389 dumpConvertedVarSet(compiler, fpCalleeSaveCandidateVars);
2399 JITDUMP("floatVarCount = %d; hasLoops = %d, singleExit = %d\n", floatVarCount, compiler->fgHasLoops,
2400 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr));
2402 // Determine whether to use the 2nd, more aggressive, threshold for fp callee saves.
2403 if (floatVarCount > 6 && compiler->fgHasLoops &&
2404 (compiler->fgReturnBlocks == nullptr || compiler->fgReturnBlocks->next == nullptr))
2406 assert(enregisterLocalVars);
2410 printf("Adding additional fp callee save candidates: \n");
2411 if (!VarSetOps::IsEmpty(compiler, fpMaybeCandidateVars))
2413 dumpConvertedVarSet(compiler, fpMaybeCandidateVars);
2422 VarSetOps::UnionD(compiler, fpCalleeSaveCandidateVars, fpMaybeCandidateVars);
2429 // Frame layout is only pre-computed for ARM
2430 printf("\nlvaTable after IdentifyCandidates\n");
2431 compiler->lvaTableDump();
2434 #endif // _TARGET_ARM_
2437 // TODO-Throughput: This mapping can surely be more efficiently done
2438 void LinearScan::initVarRegMaps()
2440 if (!enregisterLocalVars)
2442 inVarToRegMaps = nullptr;
2443 outVarToRegMaps = nullptr;
2446 assert(compiler->lvaTrackedFixed); // We should have already set this to prevent us from adding any new tracked
2449 // The compiler memory allocator requires that the allocation be an
2450 // even multiple of int-sized objects
2451 unsigned int varCount = compiler->lvaTrackedCount;
2452 regMapCount = (unsigned int)roundUp(varCount, sizeof(int));
2454 // Not sure why blocks aren't numbered from zero, but they don't appear to be.
2455 // So, if we want to index by bbNum we have to know the maximum value.
2456 unsigned int bbCount = compiler->fgBBNumMax + 1;
2458 inVarToRegMaps = new (compiler, CMK_LSRA) regNumberSmall*[bbCount];
2459 outVarToRegMaps = new (compiler, CMK_LSRA) regNumberSmall*[bbCount];
2463 // This VarToRegMap is used during the resolution of critical edges.
2464 sharedCriticalVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2466 for (unsigned int i = 0; i < bbCount; i++)
2468 VarToRegMap inVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2469 VarToRegMap outVarToRegMap = new (compiler, CMK_LSRA) regNumberSmall[regMapCount];
2471 for (unsigned int j = 0; j < regMapCount; j++)
2473 inVarToRegMap[j] = REG_STK;
2474 outVarToRegMap[j] = REG_STK;
2476 inVarToRegMaps[i] = inVarToRegMap;
2477 outVarToRegMaps[i] = outVarToRegMap;
2482 sharedCriticalVarToRegMap = nullptr;
2483 for (unsigned int i = 0; i < bbCount; i++)
2485 inVarToRegMaps[i] = nullptr;
2486 outVarToRegMaps[i] = nullptr;
2491 void LinearScan::setInVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2493 assert(enregisterLocalVars);
2494 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2495 inVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = (regNumberSmall)reg;
2498 void LinearScan::setOutVarRegForBB(unsigned int bbNum, unsigned int varNum, regNumber reg)
2500 assert(enregisterLocalVars);
2501 assert(reg < UCHAR_MAX && varNum < compiler->lvaCount);
2502 outVarToRegMaps[bbNum][compiler->lvaTable[varNum].lvVarIndex] = (regNumberSmall)reg;
2505 LinearScan::SplitEdgeInfo LinearScan::getSplitEdgeInfo(unsigned int bbNum)
2507 assert(enregisterLocalVars);
2508 SplitEdgeInfo splitEdgeInfo;
2509 assert(bbNum <= compiler->fgBBNumMax);
2510 assert(bbNum > bbNumMaxBeforeResolution);
2511 assert(splitBBNumToTargetBBNumMap != nullptr);
2512 splitBBNumToTargetBBNumMap->Lookup(bbNum, &splitEdgeInfo);
2513 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
2514 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
2515 return splitEdgeInfo;
2518 VarToRegMap LinearScan::getInVarToRegMap(unsigned int bbNum)
2520 assert(enregisterLocalVars);
2521 assert(bbNum <= compiler->fgBBNumMax);
2522 // For the blocks inserted to split critical edges, the inVarToRegMap is
2523 // equal to the outVarToRegMap at the "from" block.
2524 if (bbNum > bbNumMaxBeforeResolution)
2526 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2527 unsigned fromBBNum = splitEdgeInfo.fromBBNum;
2529 if (fromBBNum == 0)
2530 assert(splitEdgeInfo.toBBNum != 0);
2531 return inVarToRegMaps[splitEdgeInfo.toBBNum];
2535 return outVarToRegMaps[fromBBNum];
2539 return inVarToRegMaps[bbNum];
2542 VarToRegMap LinearScan::getOutVarToRegMap(unsigned int bbNum)
2544 assert(enregisterLocalVars);
2545 assert(bbNum <= compiler->fgBBNumMax);
2546 // For the blocks inserted to split critical edges, the outVarToRegMap is
2547 // equal to the inVarToRegMap at the target.
2548 if (bbNum > bbNumMaxBeforeResolution)
2550 // If this is an empty block, its in and out maps are both the same.
2551 // We identify this case by setting fromBBNum or toBBNum to 0, and using only the other.
2552 SplitEdgeInfo splitEdgeInfo = getSplitEdgeInfo(bbNum);
2553 unsigned toBBNum = splitEdgeInfo.toBBNum;
2555 if (toBBNum == 0)
2556 assert(splitEdgeInfo.fromBBNum != 0);
2557 return outVarToRegMaps[splitEdgeInfo.fromBBNum];
2561 return inVarToRegMaps[toBBNum];
2564 return outVarToRegMaps[bbNum];
2567 //------------------------------------------------------------------------
2568 // setVarReg: Set the register associated with a variable in the given 'bbVarToRegMap'.
2571 // bbVarToRegMap - the map of interest
2572 // trackedVarIndex - the lvVarIndex for the variable
2573 // reg - the register to which it is being mapped
2578 void LinearScan::setVarReg(VarToRegMap bbVarToRegMap, unsigned int trackedVarIndex, regNumber reg)
2580 assert(trackedVarIndex < compiler->lvaTrackedCount);
2581 regNumberSmall regSmall = (regNumberSmall)reg;
2582 assert((regNumber)regSmall == reg);
2583 bbVarToRegMap[trackedVarIndex] = regSmall;
2586 //------------------------------------------------------------------------
2587 // getVarReg: Get the register associated with a variable in the given 'bbVarToRegMap'.
2590 // bbVarToRegMap - the map of interest
2591 // trackedVarIndex - the lvVarIndex for the variable
2594 // The register to which 'trackedVarIndex' is mapped
2596 regNumber LinearScan::getVarReg(VarToRegMap bbVarToRegMap, unsigned int trackedVarIndex)
2598 assert(enregisterLocalVars);
2599 assert(trackedVarIndex < compiler->lvaTrackedCount);
2600 return (regNumber)bbVarToRegMap[trackedVarIndex];
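// Round-trip sketch (illustration only; assumes lclVar 'someVarNum' is tracked with
// lvVarIndex == 3): record an outgoing location for a block, then read it back. Note
// that the setters are keyed by varNum while getVarReg is keyed by the tracked index.
#if 0
    setOutVarRegForBB(5, someVarNum, REG_ESI);  // BB05 leaves the var in ESI
    VarToRegMap outMap = getOutVarToRegMap(5);
    assert(getVarReg(outMap, 3) == REG_ESI);
#endif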
2603 // Initialize the incoming VarToRegMap to the given map values (generally a predecessor of
2605 VarToRegMap LinearScan::setInVarToRegMap(unsigned int bbNum, VarToRegMap srcVarToRegMap)
2607 assert(enregisterLocalVars);
2608 VarToRegMap inVarToRegMap = inVarToRegMaps[bbNum];
2609 memcpy(inVarToRegMap, srcVarToRegMap, (regMapCount * sizeof(regNumberSmall))); // entries are regNumberSmall, not regNumber
2610 return inVarToRegMap;
2613 // refTypeForLocalRefNode: Determine the RefType (RefTypeDef or RefTypeUse) for a local variable reference node.
2614 RefType refTypeForLocalRefNode(GenTree* node)
2616 assert(node->IsLocal());
2618 // We don't support updates
2619 assert((node->gtFlags & GTF_VAR_USEASG) == 0);
2621 if (node->gtFlags & GTF_VAR_DEF)
2623 return RefTypeDef;
2627 return RefTypeUse;
2631 //------------------------------------------------------------------------
2632 // checkLastUses: Check correctness of last use flags
2635 // The block for which we are checking last uses.
2638 // This does a backward walk of the RefPositions, starting from the liveOut set.
2639 // This method was previously used to set the last uses, which were computed by
2640 // liveness, but were not created in some cases of multiple lclVar references in the
2641 // same tree. However, now that last uses are computed as RefPositions are created,
2642 // that is no longer necessary, and this method is simply retained as a check.
2643 // The exception to the check-only behavior is when LSRA_EXTEND_LIFETIMES is set via
2644 // COMPlus_JitStressRegs. In that case, this method is required, because even though
2645 // the RefPositions will not be marked lastUse in that case, we still need to correctly
2646 // mark the last uses on the tree nodes, which is done by this method.
2649 void LinearScan::checkLastUses(BasicBlock* block)
2653 JITDUMP("\n\nCHECKING LAST USES for block %u, liveout=", block->bbNum);
2654 dumpConvertedVarSet(compiler, block->bbLiveOut);
2655 JITDUMP("\n==============================\n");
2658 unsigned keepAliveVarNum = BAD_VAR_NUM;
2659 if (compiler->lvaKeepAliveAndReportThis())
2661 keepAliveVarNum = compiler->info.compThisArg;
2662 assert(compiler->info.compIsStatic == false);
2665 // find which uses are lastUses
2667 // Work backwards starting with live out.
2668 // 'computedLive' is updated to include any exposed use (including those in this
2669 // block that we've already seen). When we encounter a use, if it's
2670 // not in that set, then it's a last use.
2672 VARSET_TP computedLive(VarSetOps::MakeCopy(compiler, block->bbLiveOut));
2674 bool foundDiff = false;
2675 auto currentRefPosition = refPositions.rbegin();
2676 while (currentRefPosition->refType != RefTypeBB)
2678 // We should never see ParamDefs or ZeroInits within a basic block.
2679 assert(currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit);
2680 if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isLocalVar)
2682 unsigned varNum = currentRefPosition->getInterval()->varNum;
2683 unsigned varIndex = currentRefPosition->getInterval()->getVarIndex(compiler);
2685 LsraLocation loc = currentRefPosition->nodeLocation;
2687 // We should always have a tree node for a localVar, except for the "special" RefPositions.
2688 GenTreePtr tree = currentRefPosition->treeNode;
2689 assert(tree != nullptr || currentRefPosition->refType == RefTypeExpUse ||
2690 currentRefPosition->refType == RefTypeDummyDef);
2692 if (!VarSetOps::IsMember(compiler, computedLive, varIndex) && varNum != keepAliveVarNum)
2694 // There was no exposed use, so this is a "last use" (and we mark it thus even if it's a def)
2696 if (extendLifetimes())
2698 // NOTE: this is a bit of a hack. When extending lifetimes, the "last use" bit will be clear.
2699 // This bit, however, would normally be used during resolveLocalRef to set the value of
2700 // GTF_VAR_DEATH on the node for a ref position. If this bit is not set correctly even when
2701 // extending lifetimes, the code generator will assert as it expects to have accurate last
2702 // use information. To avoid these asserts, set the GTF_VAR_DEATH bit here.
2703 // Note also that extendLifetimes() is an LSRA stress mode, so it will only be true for
2704 // Checked or Debug builds, for which this method will be executed.
2705 if (tree != nullptr)
2707 tree->gtFlags |= GTF_VAR_DEATH;
2710 else if (!currentRefPosition->lastUse)
2712 JITDUMP("missing expected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2715 VarSetOps::AddElemD(compiler, computedLive, varIndex);
2717 else if (currentRefPosition->lastUse)
2719 JITDUMP("unexpected last use of V%02u @%u\n", compiler->lvaTrackedToVarNum[varIndex], loc);
2722 else if (extendLifetimes() && tree != nullptr)
2724 // NOTE: see the comment above re: the extendLifetimes hack.
2725 tree->gtFlags &= ~GTF_VAR_DEATH;
2728 if (currentRefPosition->refType == RefTypeDef || currentRefPosition->refType == RefTypeDummyDef)
2730 VarSetOps::RemoveElemD(compiler, computedLive, varIndex);
2734 assert(currentRefPosition != refPositions.rend());
2735 ++currentRefPosition;
2738 VARSET_TP liveInNotComputedLive(VarSetOps::Diff(compiler, block->bbLiveIn, computedLive));
2740 VarSetOps::Iter liveInNotComputedLiveIter(compiler, liveInNotComputedLive);
2741 unsigned liveInNotComputedLiveIndex = 0;
2742 while (liveInNotComputedLiveIter.NextElem(&liveInNotComputedLiveIndex))
2744 unsigned varNum = compiler->lvaTrackedToVarNum[liveInNotComputedLiveIndex];
2745 if (compiler->lvaTable[varNum].lvLRACandidate)
2747 JITDUMP("BB%02u: V%02u is in LiveIn set, but not computed live.\n", block->bbNum, varNum);
2752 VarSetOps::DiffD(compiler, computedLive, block->bbLiveIn);
2753 const VARSET_TP& computedLiveNotLiveIn(computedLive); // reuse the buffer.
2754 VarSetOps::Iter computedLiveNotLiveInIter(compiler, computedLiveNotLiveIn);
2755 unsigned computedLiveNotLiveInIndex = 0;
2756 while (computedLiveNotLiveInIter.NextElem(&computedLiveNotLiveInIndex))
2758 unsigned varNum = compiler->lvaTrackedToVarNum[computedLiveNotLiveInIndex];
2759 if (compiler->lvaTable[varNum].lvLRACandidate)
2761 JITDUMP("BB%02u: V%02u is computed live, but not in LiveIn set.\n", block->bbNum, varNum);
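// The backward scan above in miniature (illustration only, plain C++; assumes 'refs'
// is a std::vector<Ref> in execution order and 'liveOut' a std::set<unsigned>): a ref
// whose variable has no exposed use after it is a last use; the computed live set is
// updated as the walk crosses uses and defs.
#if 0
    struct Ref { unsigned varIndex; bool isDef; bool lastUse; };
    std::set<unsigned> computedLive = liveOut;
    for (auto it = refs.rbegin(); it != refs.rend(); ++it)
    {
        it->lastUse = (computedLive.count(it->varIndex) == 0); // no exposed use => last use
        if (it->isDef)
            computedLive.erase(it->varIndex);  // walking backward, a def ends liveness
        else
            computedLive.insert(it->varIndex); // a use exposes the variable upward
    }
#endif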
2770 void LinearScan::addRefsForPhysRegMask(regMaskTP mask, LsraLocation currentLoc, RefType refType, bool isLastUse)
2772 for (regNumber reg = REG_FIRST; mask; reg = REG_NEXT(reg), mask >>= 1)
2774 if (mask & 1)
2776 // This assumes that these are all "special" RefTypes that
2777 // don't need to be recorded on the tree (hence treeNode is nullptr)
2778 RefPosition* pos = newRefPosition(reg, currentLoc, refType, nullptr,
2779 genRegMask(reg)); // This MUST occupy the physical register (obviously)
2781 if (isLastUse)
2783 pos->lastUse = true;
2789 //------------------------------------------------------------------------
2790 // getKillSetForNode: Return the registers killed by the given tree node.
2793 // compiler - the compiler context to use
2794 // tree - the tree for which the kill set is needed.
2796 // Return Value: a register mask of the registers killed
2798 regMaskTP LinearScan::getKillSetForNode(GenTree* tree)
2800 regMaskTP killMask = RBM_NONE;
2801 switch (tree->OperGet())
2803 #ifdef _TARGET_XARCH_
2805 // We use the 128-bit multiply when performing an overflow checking unsigned multiply
2807 if (((tree->gtFlags & GTF_UNSIGNED) != 0) && tree->gtOverflowEx())
2809 // Both RAX and RDX are killed by the operation
2810 killMask = RBM_RAX | RBM_RDX;
2815 #if defined(_TARGET_X86_) && !defined(LEGACY_BACKEND)
2818 killMask = RBM_RAX | RBM_RDX;
2825 if (!varTypeIsFloating(tree->TypeGet()))
2827 // RDX needs to be killed early, because it must not be used as a source register
2828 // (unlike most cases, where the kill happens AFTER the uses). So for this kill,
2829 // we add the RefPosition at the tree loc (where the uses are located) instead of the
2830 // usual kill location which is the same as the defs at tree loc+1.
2831 // Note that we don't have to add interference for the live vars, because that
2832 // will be done below, and is not sensitive to the precise location.
2833 LsraLocation currentLoc = tree->gtLsraInfo.loc;
2834 assert(currentLoc != 0);
2835 addRefsForPhysRegMask(RBM_RDX, currentLoc, RefTypeKill, true);
2836 // Both RAX and RDX are killed by the operation
2837 killMask = RBM_RAX | RBM_RDX;
2840 #endif // _TARGET_XARCH_
2843 if (tree->OperIsCopyBlkOp())
2845 assert(tree->AsObj()->gtGcPtrCount != 0);
2846 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_ASSIGN_BYREF);
2852 case GT_STORE_DYN_BLK:
2854 GenTreeBlk* blkNode = tree->AsBlk();
2855 bool isCopyBlk = varTypeIsStruct(blkNode->Data());
2856 switch (blkNode->gtBlkOpKind)
2858 case GenTreeBlk::BlkOpKindHelper:
2861 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMCPY);
2865 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_MEMSET);
2869 #ifdef _TARGET_XARCH_
2870 case GenTreeBlk::BlkOpKindRepInstr:
2873 // rep movs kills RCX, RDI and RSI
2874 killMask = RBM_RCX | RBM_RDI | RBM_RSI;
2878 // rep stos kills RCX and RDI.
2879 // (Note that the Data() node, if not constant, will be assigned to
2880 // RCX, but it's fine that this kills it, as the value is not available
2881 // after this node in any case.)
2882 killMask = RBM_RDI | RBM_RCX;
2886 case GenTreeBlk::BlkOpKindRepInstr:
2888 case GenTreeBlk::BlkOpKindUnroll:
2889 case GenTreeBlk::BlkOpKindInvalid:
2890 // for these 'gtBlkOpKind' kinds, we leave 'killMask' = RBM_NONE
2897 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_STOP_FOR_GC);
2901 if (compiler->compFloatingPointUsed)
2903 if (tree->TypeGet() == TYP_DOUBLE)
2905 needDoubleTmpForFPCall = true;
2907 else if (tree->TypeGet() == TYP_FLOAT)
2909 needFloatTmpForFPCall = true;
2912 #endif // _TARGET_X86_
2913 #if defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2914 if (tree->IsHelperCall())
2916 GenTreeCall* call = tree->AsCall();
2917 CorInfoHelpFunc helpFunc = compiler->eeGetHelperNum(call->gtCallMethHnd);
2918 killMask = compiler->compHelperCallKillSet(helpFunc);
2921 #endif // defined(_TARGET_X86_) || defined(_TARGET_ARM_)
2923 // if there is no FP used, we can ignore the FP kills
2924 if (compiler->compFloatingPointUsed)
2926 killMask = RBM_CALLEE_TRASH;
2930 killMask = RBM_INT_CALLEE_TRASH;
2933 if (tree->AsCall()->IsVirtualStub())
2935 killMask |= compiler->virtualStubParamInfo->GetRegMask();
2937 #else // !_TARGET_ARM_
2938 // Verify that the special virtual stub call registers are in the kill mask.
2939 // We don't just add them unconditionally to the killMask because for most architectures
2940 // they are already in the RBM_CALLEE_TRASH set,
2941 // and we don't want to introduce extra checks and calls in this hot function.
2942 assert(!tree->AsCall()->IsVirtualStub() || ((killMask & compiler->virtualStubParamInfo->GetRegMask()) ==
2943 compiler->virtualStubParamInfo->GetRegMask()));
2948 if (compiler->codeGen->gcInfo.gcIsWriteBarrierAsgNode(tree))
2950 killMask = RBM_CALLEE_TRASH_NOGC;
2954 #if defined(PROFILING_SUPPORTED)
2955 // If this method requires profiler ELT hook then mark these nodes as killing
2956 // callee trash registers (excluding RAX and XMM0). The reason for this is that
2957 // profiler callback would trash these registers. See vm\amd64\asmhelpers.asm for more details.
2960 if (compiler->compIsProfilerHookNeeded())
2962 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_LEAVE);
2967 if (compiler->compIsProfilerHookNeeded())
2969 killMask = compiler->compHelperCallKillSet(CORINFO_HELP_PROF_FCN_TAILCALL);
2972 #endif // PROFILING_SUPPORTED
2975 // for all other 'tree->OperGet()' kinds, leave 'killMask' = RBM_NONE
2981 //------------------------------------------------------------------------
2982 // buildKillPositionsForNode:
2983 // Given some tree node add refpositions for all the registers this node kills
2986 // tree - the tree for which kill positions should be generated
2987 // currentLoc - the location at which the kills should be added
2990 // true - kills were inserted
2991 // false - no kills were inserted
2994 // The return value is needed because if we have any kills, we need to make sure that
2995 // all defs are located AFTER the kills. On the other hand, if there aren't kills,
2996 // the multiple defs for a regPair are in different locations.
2997 // If we generate any kills, we will mark all currentLiveVars as being preferenced
2998 // to avoid the killed registers. This is somewhat conservative.
3000 bool LinearScan::buildKillPositionsForNode(GenTree* tree, LsraLocation currentLoc)
3002 regMaskTP killMask = getKillSetForNode(tree);
3003 bool isCallKill = ((killMask == RBM_INT_CALLEE_TRASH) || (killMask == RBM_CALLEE_TRASH));
3004 if (killMask != RBM_NONE)
3006 // The killMask identifies a set of registers that will be used during codegen.
3007 // Mark these as modified here, so when we do final frame layout, we'll know about
3008 // all these registers. This is especially important if killMask contains
3009 // callee-saved registers, which affect the frame size since we need to save/restore them.
3010 // In the case where we have a copyBlk with GC pointers, we may need to call the
3011 // CORINFO_HELP_ASSIGN_BYREF helper, which kills callee-saved RSI and RDI. If
3012 // LSRA doesn't assign RSI/RDI, they wouldn't get marked as modified until codegen,
3013 // which is too late.
3014 compiler->codeGen->regSet.rsSetRegsModified(killMask DEBUGARG(dumpTerse));
3016 addRefsForPhysRegMask(killMask, currentLoc, RefTypeKill, true);
3018 // TODO-CQ: It appears to be valuable for both fp and int registers to avoid killing the callee
3019 // save regs on infrequently executed paths. However, it results in a large number of asmDiffs,
3020 // many of which appear to be regressions (because there is more spill on the infrequently executed path),
3021 // but are not really because the frequent path becomes smaller. Validating these diffs will need
3022 // to be done before making this change.
3023 // if (!blockSequence[curBBSeqNum]->isRunRarely())
3024 if (enregisterLocalVars)
3026 VarSetOps::Iter iter(compiler, currentLiveVars);
3027 unsigned varIndex = 0;
3028 while (iter.NextElem(&varIndex))
3030 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
3031 LclVarDsc* varDsc = compiler->lvaTable + varNum;
3032 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3033 if (varDsc->lvType == LargeVectorType)
3035 if (!VarSetOps::IsMember(compiler, largeVectorCalleeSaveCandidateVars, varIndex))
3041 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3042 if (varTypeIsFloating(varDsc) &&
3043 !VarSetOps::IsMember(compiler, fpCalleeSaveCandidateVars, varIndex))
3047 Interval* interval = getIntervalForLocalVar(varIndex);
3050 interval->preferCalleeSave = true;
3052 regMaskTP newPreferences = allRegs(interval->registerType) & (~killMask);
3054 if (newPreferences != RBM_NONE)
3056 interval->updateRegisterPreferences(newPreferences);
3060 // If there are no callee-saved registers, the call could kill all the registers.
3061 // This is a valid state, so in that case assert should not trigger. The RA will spill in order to
3062 // free a register later.
3063 assert(compiler->opts.compDbgEnC || (calleeSaveRegs(varDsc->lvType)) == RBM_NONE);
3068 if (tree->IsCall() && (tree->gtFlags & GTF_CALL_UNMANAGED) != 0)
3070 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeKillGCRefs, tree,
3071 (allRegs(TYP_REF) & ~RBM_ARG_REGS));
3079 //----------------------------------------------------------------------------
3080 // defineNewInternalTemp: Defines a ref position for an internal temp.
3083 // tree - Gentree node requiring an internal register
3084 // regType - Register type
3085 // currentLoc - Location of the temp Def position
3086 // regMask - register mask of candidates for temp
3087 // minRegCandidateCount - Minimum registers to be ensured in candidate
3088 // set under LSRA stress mode. This is a DEBUG only arg.
3090 RefPosition* LinearScan::defineNewInternalTemp(GenTree* tree,
3091 RegisterType regType,
3092 LsraLocation currentLoc,
3093 regMaskTP regMask DEBUGARG(unsigned minRegCandidateCount))
3095 Interval* current = newInterval(regType);
3096 current->isInternal = true;
3097 return newRefPosition(current, currentLoc, RefTypeDef, tree, regMask, 0 DEBUG_ARG(minRegCandidateCount));
3100 //------------------------------------------------------------------------
3101 // buildInternalRegisterDefsForNode - build Def positions for internal
3102 // registers required for tree node.
3105 // tree - Gentree node that needs internal registers
3106 // currentLoc - Location at which Def positions need to be defined
3107 // temps - in-out array which is populated with ref positions
3108 // created for Def of internal registers
3109 // minRegCandidateCount - Minimum registers to be ensured in candidate
3110 // set of ref positions under LSRA stress. This is
3111 // a DEBUG only arg.
3114 // The total number of Def positions created for internal registers of tree node.
3115 int LinearScan::buildInternalRegisterDefsForNode(GenTree* tree,
3116 LsraLocation currentLoc,
3117 RefPosition* temps[] // populates
3118 DEBUGARG(unsigned minRegCandidateCount))
3121 int internalIntCount = tree->gtLsraInfo.internalIntCount;
3122 regMaskTP internalCands = tree->gtLsraInfo.getInternalCandidates(this);
3124 // If the number of internal integer registers required is the same as the number of candidate integer registers in
3125 // the candidate set, then they must be handled as fixed registers.
3126 // (E.g. for the integer registers that floating point arguments must be copied into for a varargs call.)
3127 bool fixedRegs = false;
3128 regMaskTP internalIntCandidates = (internalCands & allRegs(TYP_INT));
3129 if (((int)genCountBits(internalIntCandidates)) == internalIntCount)
3134 for (count = 0; count < internalIntCount; count++)
3136 regMaskTP internalIntCands = (internalCands & allRegs(TYP_INT));
3139 internalIntCands = genFindLowestBit(internalIntCands);
3140 internalCands &= ~internalIntCands;
3143 defineNewInternalTemp(tree, IntRegisterType, currentLoc, internalIntCands DEBUG_ARG(minRegCandidateCount));
3146 int internalFloatCount = tree->gtLsraInfo.internalFloatCount;
3147 for (int i = 0; i < internalFloatCount; i++)
3149 regMaskTP internalFPCands = (internalCands & internalFloatRegCandidates());
3151 defineNewInternalTemp(tree, FloatRegisterType, currentLoc, internalFPCands DEBUG_ARG(minRegCandidateCount));
3154 assert(count < MaxInternalRegisters);
3155 assert(count == (internalIntCount + internalFloatCount));
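// Worked example (hypothetical): a node with internalIntCount == 2 and internal
// candidates {RCX, RDX} has genCountBits(candidates) == internalIntCount, so fixedRegs
// is true; each loop iteration then peels genFindLowestBit off the remaining mask,
// pinning the first temp to RCX and the second to RDX instead of letting both float.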
3159 //------------------------------------------------------------------------
3160 // buildInternalRegisterUsesForNode - adds Use positions for internal
3161 // registers required for tree node.
3164 // tree - Gentree node that needs internal registers
3165 // currentLoc - Location at which Use positions need to be defined
3166 // defs - array of RefPositions containing the Def positions of the internal registers.
3168 // total - Total number of Def positions in 'defs' array.
3169 // minRegCandidateCount - Minimum registers to be ensured in candidate
3170 // set of ref positions under LSRA stress. This is
3171 // a DEBUG only arg.
3175 void LinearScan::buildInternalRegisterUsesForNode(GenTree* tree,
3176 LsraLocation currentLoc,
3177 RefPosition* defs[],
3178 int total DEBUGARG(unsigned minRegCandidateCount))
3180 assert(total < MaxInternalRegisters);
3182 // defs[] has been populated by buildInternalRegisterDefsForNode
3183 // now just add uses to the defs previously added.
3184 for (int i = 0; i < total; i++)
3186 RefPosition* prevRefPosition = defs[i];
3187 assert(prevRefPosition != nullptr);
3188 regMaskTP mask = prevRefPosition->registerAssignment;
3189 if (prevRefPosition->isPhysRegRef)
3191 newRefPosition(defs[i]->getReg()->regNum, currentLoc, RefTypeUse, tree, mask);
3195 RefPosition* newest = newRefPosition(defs[i]->getInterval(), currentLoc, RefTypeUse, tree, mask,
3196 0 DEBUG_ARG(minRegCandidateCount));
3198 if (tree->gtLsraInfo.isInternalRegDelayFree)
3200 newest->delayRegFree = true;
3206 regMaskTP LinearScan::getUseCandidates(GenTree* useNode)
3208 TreeNodeInfo info = useNode->gtLsraInfo;
3209 return info.getSrcCandidates(this);
3212 regMaskTP LinearScan::getDefCandidates(GenTree* tree)
3214 TreeNodeInfo info = tree->gtLsraInfo;
3215 return info.getDstCandidates(this);
3218 RegisterType LinearScan::getDefType(GenTree* tree)
3220 return tree->TypeGet();
3223 //------------------------------------------------------------------------
3224 // LocationInfoListNode: used to store a single `LocationInfo` value for a
3225 // node during `buildIntervals`.
3227 // This is the node type for `LocationInfoList` below.
3229 class LocationInfoListNode final : public LocationInfo
3231 friend class LocationInfoList;
3232 friend class LocationInfoListNodePool;
3234 LocationInfoListNode* m_next; // The next node in the list
3237 LocationInfoListNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0) : LocationInfo(l, i, t, regIdx)
3241 //------------------------------------------------------------------------
3242 // LocationInfoListNode::Next: Returns the next node in the list.
3243 LocationInfoListNode* Next() const
3249 //------------------------------------------------------------------------
3250 // LocationInfoList: used to store a list of `LocationInfo` values for a
3251 // node during `buildIntervals`.
3253 // Given an IR node that either directly defines N registers or that is a
3254 // contained node with uses that define a total of N registers, that node
3255 // will map to N `LocationInfo` values. These values are stored as a
3256 // linked list of `LocationInfoListNode` values.
3258 class LocationInfoList final
3260 friend class LocationInfoListNodePool;
3262 LocationInfoListNode* m_head; // The head of the list
3263 LocationInfoListNode* m_tail; // The tail of the list
3266 LocationInfoList() : m_head(nullptr), m_tail(nullptr)
3270 LocationInfoList(LocationInfoListNode* node) : m_head(node), m_tail(node)
3272 assert(m_head->m_next == nullptr);
3275 //------------------------------------------------------------------------
3276 // LocationInfoList::IsEmpty: Returns true if the list is empty.
3278 bool IsEmpty() const
3280 return m_head == nullptr;
3283 //------------------------------------------------------------------------
3284 // LocationInfoList::Begin: Returns the first node in the list.
3286 LocationInfoListNode* Begin() const
3291 //------------------------------------------------------------------------
3292 // LocationInfoList::End: Returns the position after the last node in the
3293 // list. The returned value is suitable for use as
3294 // a sentinel for iteration.
3296 LocationInfoListNode* End() const
3301 //------------------------------------------------------------------------
3302 // LocationInfoList::Append: Appends a node to the list.
3305 // node - The node to append. Must not be part of an existing list.
3307 void Append(LocationInfoListNode* node)
3309 assert(node->m_next == nullptr);
3311 if (m_tail == nullptr)
3313 assert(m_head == nullptr);
3318 m_tail->m_next = node;
3324 //------------------------------------------------------------------------
3325 // LocationInfoList::Append: Appends another list to this list.
3328 // other - The list to append.
3330 void Append(LocationInfoList other)
3332 if (m_tail == nullptr)
3334 assert(m_head == nullptr);
3335 m_head = other.m_head;
3339 m_tail->m_next = other.m_head;
3342 m_tail = other.m_tail;
3346 //------------------------------------------------------------------------
3347 // LocationInfoListNodePool: manages a pool of `LocationInfoListNode`
3348 // values to decrease overall memory usage
3349 // during `buildIntervals`.
3351 // `buildIntervals` involves creating a list of location info values per
3352 // node that either directly produces a set of registers or that is a
3353 // contained node with register-producing sources. However, these lists
3354 // are short-lived: they are destroyed once the use of the corresponding
3355 // node is processed. As such, there is typically only a small number of
3356 // `LocationInfoListNode` values in use at any given time. Pooling these
3357 // values avoids otherwise frequent allocations.
3358 class LocationInfoListNodePool final
3360 LocationInfoListNode* m_freeList;
3361 Compiler* m_compiler;
3364 //------------------------------------------------------------------------
3365 // LocationInfoListNodePool::LocationInfoListNodePool:
3366 // Creates a pool of `LocationInfoListNode` values.
3369 // compiler - The compiler context.
3370 // preallocate - The number of nodes to preallocate.
3372 LocationInfoListNodePool(Compiler* compiler, unsigned preallocate = 0) : m_compiler(compiler)
3374 if (preallocate > 0)
3376 size_t preallocateSize = sizeof(LocationInfoListNode) * preallocate;
3377 auto* preallocatedNodes = reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize));
3379 LocationInfoListNode* head = preallocatedNodes;
3380 head->m_next = nullptr;
3382 for (unsigned i = 1; i < preallocate; i++)
3384 LocationInfoListNode* node = &preallocatedNodes[i];
3385 node->m_next = head;
3393 //------------------------------------------------------------------------
3394 // LocationInfoListNodePool::GetNode: Fetches an unused node from the
3398 // l - The `LsraLocation` for the `LocationInfo` value.
3399 // i - The interval for the `LocationInfo` value.
3400 // t - The IR node for the `LocationInfo` value
3401 // regIdx - The register index for the `LocationInfo` value.
3404 // A pooled or newly-allocated `LocationInfoListNode`, depending on the
3405 // contents of the pool.
3406 LocationInfoListNode* GetNode(LsraLocation l, Interval* i, GenTree* t, unsigned regIdx = 0)
3408 LocationInfoListNode* head = m_freeList;
3409 if (head == nullptr)
3411 head = reinterpret_cast<LocationInfoListNode*>(m_compiler->compGetMem(sizeof(LocationInfoListNode)));
3415 m_freeList = head->m_next;
3421 head->multiRegIdx = regIdx;
3422 head->m_next = nullptr;
3427 //------------------------------------------------------------------------
3428 // LocationInfoListNodePool::ReturnNodes: Returns a list of nodes to the
3432 // list - The list to return.
3434 void ReturnNodes(LocationInfoList& list)
3436 assert(list.m_head != nullptr);
3437 assert(list.m_tail != nullptr);
3439 LocationInfoListNode* head = m_freeList;
3440 list.m_tail->m_next = head;
3441 m_freeList = list.m_head;
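// Pool round-trip sketch (illustration only; 'loc', 'interval1', 'interval2' and 'tree'
// are placeholders): nodes are fetched one at a time, but an entire list is returned in
// constant time by splicing it onto the free list.
#if 0
    LocationInfoListNodePool pool(compiler, 8);
    LocationInfoList list(pool.GetNode(loc, interval1, tree));
    list.Append(pool.GetNode(loc, interval2, tree, 1)); // second reg of a multi-reg def
    // ... record the defs, process the use of 'tree' ...
    pool.ReturnNodes(list); // all nodes recycled; no per-node frees
#endif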
3445 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
3447 LinearScan::buildUpperVectorSaveRefPositions(GenTree* tree, LsraLocation currentLoc)
3449 assert(enregisterLocalVars);
3450 VARSET_TP liveLargeVectors(VarSetOps::MakeEmpty(compiler));
3451 regMaskTP fpCalleeKillSet = RBM_NONE;
3452 if (!VarSetOps::IsEmpty(compiler, largeVectorVars))
3454 // We actually need to find any calls that kill the upper-half of the callee-save vector registers.
3455 // But we will use as a proxy any node that kills floating point registers.
3456 // (Note that some calls are masquerading as other nodes at this point so we can't just check for calls.)
3457 fpCalleeKillSet = getKillSetForNode(tree);
3458 if ((fpCalleeKillSet & RBM_FLT_CALLEE_TRASH) != RBM_NONE)
3460 VarSetOps::AssignNoCopy(compiler, liveLargeVectors,
3461 VarSetOps::Intersection(compiler, currentLiveVars, largeVectorVars));
3462 VarSetOps::Iter iter(compiler, liveLargeVectors);
3463 unsigned varIndex = 0;
3464 while (iter.NextElem(&varIndex))
3466 Interval* varInterval = getIntervalForLocalVar(varIndex);
3467 Interval* tempInterval = newInterval(LargeVectorType);
3468 tempInterval->isInternal = true;
3470 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveDef, tree, RBM_FLT_CALLEE_SAVED);
3471 // We are going to save the existing relatedInterval of varInterval on tempInterval, so that we can set
3472 // the tempInterval as the relatedInterval of varInterval, so that we can build the corresponding
3473 // RefTypeUpperVectorSaveUse RefPosition. We will then restore the relatedInterval onto varInterval,
3474 // and set varInterval as the relatedInterval of tempInterval.
3475 tempInterval->relatedInterval = varInterval->relatedInterval;
3476 varInterval->relatedInterval = tempInterval;
3480 return liveLargeVectors;
3483 void LinearScan::buildUpperVectorRestoreRefPositions(GenTree* tree,
3484 LsraLocation currentLoc,
3485 VARSET_VALARG_TP liveLargeVectors)
3487 assert(enregisterLocalVars);
3488 if (!VarSetOps::IsEmpty(compiler, liveLargeVectors))
3490 VarSetOps::Iter iter(compiler, liveLargeVectors);
3491 unsigned varIndex = 0;
3492 while (iter.NextElem(&varIndex))
3494 Interval* varInterval = getIntervalForLocalVar(varIndex);
3495 Interval* tempInterval = varInterval->relatedInterval;
3496 assert(tempInterval->isInternal == true);
3498 newRefPosition(tempInterval, currentLoc, RefTypeUpperVectorSaveUse, tree, RBM_FLT_CALLEE_SAVED);
3499 // Restore the relatedInterval onto varInterval, and set varInterval as the relatedInterval
3501 varInterval->relatedInterval = tempInterval->relatedInterval;
3502 tempInterval->relatedInterval = varInterval;
3506 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
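// The relatedInterval hand-off performed by the save/restore pair above, spelled out
// (R is whatever relatedInterval varInterval had before the call site):
//   before save:    varInterval->relatedInterval  == R
//   after save:     tempInterval->relatedInterval == R
//                   varInterval->relatedInterval  == tempInterval
//   after restore:  varInterval->relatedInterval  == R            (restored)
//                   tempInterval->relatedInterval == varInterval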
3509 //------------------------------------------------------------------------
3510 // ComputeOperandDstCount: computes the number of registers defined by a
3513 // For most nodes, this is simple:
3514 // - Nodes that do not produce values (e.g. stores and other void-typed
3515 // nodes) and nodes that immediately use the registers they define
3516 // produce no registers
3517 // - Nodes that are marked as defining N registers define N registers.
3519 // For contained nodes, however, things are more complicated: for purposes
3520 // of bookkeeping, a contained node is treated as producing the transitive
3521 // closure of the registers produced by its sources.
3524 // operand - The operand for which to compute a register count.
3527 // The number of registers defined by `operand`.
3529 static int ComputeOperandDstCount(GenTree* operand)
3531 TreeNodeInfo& operandInfo = operand->gtLsraInfo;
3533 if (operandInfo.isLocalDefUse)
3535 // Operands that define an unused value do not produce any registers.
3538 else if (operandInfo.dstCount != 0)
3540 // Operands that have a specified number of destination registers consume all of their operands
3541 // and therefore produce exactly that number of registers.
3542 return operandInfo.dstCount;
3544 else if (operandInfo.srcCount != 0)
3546 // If an operand has no destination registers but does have source registers, it must be a store or a compare.
3548 assert(operand->OperIsStore() || operand->OperIsBlkOp() || operand->OperIsPutArgStk() ||
3549 operand->OperIsCompare() || operand->OperIs(GT_CMP) || operand->IsSIMDEqualityOrInequality());
3552 else if (!operand->OperIsFieldListHead() && (operand->OperIsStore() || operand->TypeGet() == TYP_VOID))
3554 // Stores and void-typed operands may be encountered when processing call nodes, which contain
3555 // pointers to argument setup stores.
3558 #ifdef _TARGET_ARMARCH_
3559 else if (operand->OperIsPutArgStk())
3561 // A PUTARG_STK argument is an operand of a call, but is neither contained, nor does it produce a value.
3563 assert(!operand->isContained());
3566 #endif // _TARGET_ARMARCH_
3569 // If a field list or non-void-typed operand is not an unused value and does not have source registers,
3570 // that argument is contained within its parent and produces `sum(operand_dst_count)` registers.
3572 for (GenTree* op : operand->Operands())
3574 dstCount += ComputeOperandDstCount(op);
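// Worked example (hypothetical tree): for a contained address mode such as a GT_LEA
// whose base and index are each register-producing lclVars, none of the earlier cases
// match, so this sums its operands and returns 1 + 1 == 2 -- a contained node is
// credited with the registers its sources produce, transitively.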
3581 //------------------------------------------------------------------------
3582 // ComputeAvailableSrcCount: computes the number of registers available as
3583 // sources for a node.
3585 // This is simply the sum of the number of registers produced by each
3586 // operand to the node.
3589 // node - The node for which to compute a source count.
3592 // The number of registers available as sources for `node`.
3594 static int ComputeAvailableSrcCount(GenTree* node)
3597 for (GenTree* operand : node->Operands())
3599 numSources += ComputeOperandDstCount(operand);
3606 static GenTree* GetFirstOperand(GenTree* node)
3608 GenTree* firstOperand = nullptr;
3609 node->VisitOperands([&firstOperand](GenTree* operand) -> GenTree::VisitResult {
3610 firstOperand = operand;
3611 return GenTree::VisitResult::Abort;
3613 return firstOperand;
3616 void LinearScan::buildRefPositionsForNode(GenTree* tree,
3618 LocationInfoListNodePool& listNodePool,
3619 HashTableBase<GenTree*, LocationInfoList>& operandToLocationInfoMap,
3620 LsraLocation currentLoc)
3623 assert(!isRegPairType(tree->TypeGet()));
3624 #endif // _TARGET_ARM_
3626 // The LIR traversal doesn't visit GT_LIST or GT_ARGPLACE nodes.
3627 // GT_CLS_VAR nodes should have been eliminated by rationalizer.
3628 assert(tree->OperGet() != GT_ARGPLACE);
3629 assert(tree->OperGet() != GT_LIST);
3630 assert(tree->OperGet() != GT_CLS_VAR);
3632 // The LIR traversal visits only the first node in a GT_FIELD_LIST.
3633 assert((tree->OperGet() != GT_FIELD_LIST) || tree->AsFieldList()->IsFieldListHead());
3635 // The set of internal temporary registers used by this node are stored in the
3636 // gtRsvdRegs register mask. Clear it out.
3637 tree->gtRsvdRegs = RBM_NONE;
3639 TreeNodeInfo info = tree->gtLsraInfo;
3640 assert(info.IsValid(this));
3641 int consume = info.srcCount;
3642 int produce = info.dstCount;
3647 lsraDispNode(tree, LSRA_DUMP_REFPOS, (produce != 0));
3649 if (tree->isContained())
3651 JITDUMP("Contained\n");
3653 else if (tree->OperIs(GT_LCL_VAR, GT_LCL_FLD) && info.isLocalDefUse)
3655 JITDUMP("Unused\n");
3659 JITDUMP(" consume=%d produce=%d\n", consume, produce);
3664 JITDUMP("at start of tree, map contains: { ");
3666 for (auto kvp : operandToLocationInfoMap)
3668 GenTree* node = kvp.Key();
3669 LocationInfoList defList = kvp.Value();
3671 JITDUMP("%sN%03u. %s -> (", first ? "" : "; ", node->gtSeqNum, GenTree::OpName(node->OperGet()));
3672 for (LocationInfoListNode *def = defList.Begin(), *end = defList.End(); def != end; def = def->Next())
3674 JITDUMP("%s%d.N%03u", def == defList.Begin() ? "" : ", ", def->loc, def->treeNode->gtSeqNum);
3685 assert(((consume == 0) && (produce == 0)) || (ComputeAvailableSrcCount(tree) == consume));
3687 if (tree->OperIs(GT_LCL_VAR, GT_LCL_FLD))
3689 LclVarDsc* const varDsc = &compiler->lvaTable[tree->AsLclVarCommon()->gtLclNum];
3690 if (isCandidateVar(varDsc))
3692 assert(consume == 0);
3694 // We handle tracked variables differently from non-tracked ones. If it is tracked,
3695 // we simply add a use or def of the tracked variable. Otherwise, for a use we need
3696 // to actually add the appropriate references for loading or storing the variable.
3698 // It won't actually get used or defined until the appropriate ancestor tree node
3699 // is processed, unless this is marked "isLocalDefUse" because it is a stack-based argument
3700 // defined by the caller.
3702 assert(varDsc->lvTracked);
3703 unsigned varIndex = varDsc->lvVarIndex;
3705 // We have only approximate last-use information at this point. This is because the
3706 // execution order doesn't actually reflect the true order in which the localVars
3707 // are referenced - but the order of the RefPositions will, so we recompute it after
3708 // RefPositions are built.
3709 // Use the old value for setting currentLiveVars - note that we do this with the
3710 // not-quite-correct setting of lastUse. However, this is OK because
3711 // 1) this is only for preferencing, which doesn't require strict correctness, and
3712 // 2) the cases where these out-of-order uses occur should not overlap a kill.
3713 // TODO-Throughput: clean this up once we have the execution order correct. At that point
3714 // we can update currentLiveVars at the same place that we create the RefPosition.
3715 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
3717 VarSetOps::RemoveElemD(compiler, currentLiveVars, varIndex);
3720 if (!info.isLocalDefUse && !tree->isContained())
3722 assert(produce != 0);
3724 LocationInfoList list(listNodePool.GetNode(currentLoc, getIntervalForLocalVar(varIndex), tree));
3725 bool added = operandToLocationInfoMap.AddOrUpdate(tree, list);
3728 tree->gtLsraInfo.definesAnyRegisters = true;
3734 if (tree->isContained())
3736 assert(!info.isLocalDefUse);
3737 assert(consume == 0);
3738 assert(produce == 0);
3739 assert(info.internalIntCount == 0);
3740 assert(info.internalFloatCount == 0);
3742 // Contained nodes map to the concatenated lists of their operands.
3743 LocationInfoList locationInfoList;
3744 tree->VisitOperands([&](GenTree* op) -> GenTree::VisitResult {
3745 if (!op->gtLsraInfo.definesAnyRegisters)
3747 assert(ComputeOperandDstCount(op) == 0);
3748 return GenTree::VisitResult::Continue;
3751 LocationInfoList operandList;
3752 bool removed = operandToLocationInfoMap.TryRemove(op, &operandList);
3755 locationInfoList.Append(operandList);
3756 return GenTree::VisitResult::Continue;
3759 if (!locationInfoList.IsEmpty())
3761 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
3763 tree->gtLsraInfo.definesAnyRegisters = true;
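// Sketch of the map management above (hypothetical nodes): for a contained
// address computation such as
//
//    t1 = lclVar ref V01          // defines one register
//    t2 = const int 8             // contained; defines none
//    t3 = lea(t1 + t2)            // contained in its parent indirection
//
// processing t3 removes t1's LocationInfoList from the map and re-adds the
// concatenated list keyed by t3, so the consuming indirection finds t1's
// register by looking up its immediate (contained) operand t3.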
3769 // Handle the case of local variable assignment
3770 Interval* varDefInterval = nullptr;
3771 RefType defRefType = RefTypeDef;
3773 GenTree* defNode = tree;
3775 // noAdd means the node creates a def, but for purposes of map
3776 // management we do not add it, because the data is not flowing up the
3777 // tree but across (as in ASG nodes).
3779 bool noAdd = info.isLocalDefUse;
3780 RefPosition* prevPos = nullptr;
3782 bool isSpecialPutArg = false;
3784 assert(!tree->OperIsAssignment());
3785 if (tree->OperIsLocalStore())
3787 GenTreeLclVarCommon* const store = tree->AsLclVarCommon();
3788 assert((consume > 1) || (regType(store->gtOp1->TypeGet()) == regType(store->TypeGet())));
3790 LclVarDsc* varDsc = &compiler->lvaTable[store->gtLclNum];
3791 if (isCandidateVar(varDsc))
3793 // We always push the tracked lclVar intervals
3794 assert(varDsc->lvTracked);
3795 unsigned varIndex = varDsc->lvVarIndex;
3796 varDefInterval = getIntervalForLocalVar(varIndex);
3797 defRefType = refTypeForLocalRefNode(tree);
3805 assert(consume <= MAX_RET_REG_COUNT);
3808 // Get the location info for the register defined by the first operand.
3809 LocationInfoList operandDefs;
3810 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(tree), &operandDefs);
3813 // Since we only expect to consume one register, we should only have a single register to
3814 // consume.
3815 assert(operandDefs.Begin()->Next() == operandDefs.End());
3817 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3819 Interval* srcInterval = operandInfo.interval;
3820 if (srcInterval->relatedInterval == nullptr)
3822 // Preference the source to the dest, unless this is a non-last-use localVar.
3823 // Note that the last-use info is not yet correct, but using it is a better approximation
3824 // than unconditionally preferencing the source to the dest, since the source's lifetime may extend beyond the dest.
3825 if (!srcInterval->isLocalVar || (operandInfo.treeNode->gtFlags & GTF_VAR_DEATH) != 0)
3827 srcInterval->assignRelatedInterval(varDefInterval);
3830 else if (!srcInterval->isLocalVar)
3832 // Preference the source to dest, if src is not a local var.
3833 srcInterval->assignRelatedInterval(varDefInterval);
3837 if ((tree->gtFlags & GTF_VAR_DEATH) == 0)
3839 VarSetOps::AddElemD(compiler, currentLiveVars, varIndex);
3842 else if (store->gtOp1->OperIs(GT_BITCAST))
3844 store->gtType = store->gtOp1->gtType = store->gtOp1->AsUnOp()->gtOp1->TypeGet();
3846 // Get the location info for the register defined by the first operand.
3847 LocationInfoList operandDefs;
3848 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(store), &operandDefs);
3851 // Since we only expect to consume one register, we should only have a single register to consume.
3852 assert(operandDefs.Begin()->Next() == operandDefs.End());
3854 LocationInfo& operandInfo = *static_cast<LocationInfo*>(operandDefs.Begin());
3856 Interval* srcInterval = operandInfo.interval;
3857 srcInterval->registerType = regType(store->TypeGet());
3859 RefPosition* srcDefPosition = srcInterval->firstRefPosition;
3860 assert(srcDefPosition != nullptr);
3861 assert(srcDefPosition->refType == RefTypeDef);
3862 assert(srcDefPosition->treeNode == store->gtOp1);
3864 srcDefPosition->registerAssignment = allRegs(store->TypeGet());
3865 store->gtOp1->gtLsraInfo.setSrcCandidates(this, allRegs(store->TypeGet()));
3868 else if (noAdd && produce == 0)
3870 // This is the case for dead nodes that occur after
3871 // tree rationalization
3872 // TODO-Cleanup: Identify and remove these dead nodes prior to register allocation.
3873 if (tree->IsMultiRegCall())
3875 // In the case of a multi-reg call node, produce = the number of return registers.
3876 produce = tree->AsCall()->GetReturnTypeDesc()->GetReturnRegCount();
3884 Interval* prefSrcInterval = nullptr;
3886 // If this is a binary operator that will be encoded with 2 operand fields
3887 // (i.e. the target is read-modify-write), preference the dst to op1.
3889 bool hasDelayFreeSrc = tree->gtLsraInfo.hasDelayFreeSrc;
3891 #if defined(DEBUG) && defined(_TARGET_X86_)
3892 // On x86, `LSRA_LIMIT_CALLER` is too restrictive to allow the use of special put args: this stress mode
3893 // leaves only three registers allocatable--eax, ecx, and edx--of which the latter two are also used for the
3894 // first two integral arguments to a call. This can leave us with too few registers to successfully allocate in
3895 // situations like the following:
3897 // t1026 = lclVar ref V52 tmp35 u:3 REG NA <l:$3a1, c:$98d>
3900 // t1352 = * putarg_reg ref REG NA
3902 // t342 = lclVar int V14 loc6 u:4 REG NA $50c
3904 // t343 = const int 1 REG NA $41
3908 // t344 = * + int REG NA $495
3910 // t345 = lclVar int V04 arg4 u:2 REG NA $100
3914 // t346 = * % int REG NA $496
3917 // t1353 = * putarg_reg int REG NA
3919 // t1354 = lclVar ref V52 tmp35 (last use) REG NA
3922 // t1355 = * lea(b+0) byref REG NA
3924 // Here, the first `putarg_reg` would normally be considered a special put arg, which would remove `ecx` from the
3925 // set of allocatable registers, leaving only `eax` and `edx`. The allocator will then fail to allocate a register
3926 // for the def of `t345` if arg4 is not a register candidate: the corresponding ref position will be constrained to
3927 // { `ecx`, `ebx`, `esi`, `edi` }, which `LSRA_LIMIT_CALLER` will further constrain to `ecx`, which will not be
3928 // available due to the special put arg.
3929 const bool supportsSpecialPutArg = getStressLimitRegs() != LSRA_LIMIT_CALLER;
3931 const bool supportsSpecialPutArg = true;
3934 if (supportsSpecialPutArg && tree->OperGet() == GT_PUTARG_REG && isCandidateLocalRef(tree->gtGetOp1()) &&
3935 (tree->gtGetOp1()->gtFlags & GTF_VAR_DEATH) == 0)
3937 // This is the case for a "pass-through" copy of a lclVar. In the case where it is a non-last-use,
3938 // we don't want the def of the copy to kill the lclVar register, if it is assigned the same register
3939 // (which is actually what we hope will happen).
3940 JITDUMP("Setting putarg_reg as a pass-through of a non-last use lclVar\n");
3942 // Get the register information for the first operand of the node.
3943 LocationInfoList operandDefs;
3944 bool found = operandToLocationInfoMap.TryGetValue(GetFirstOperand(tree), &operandDefs);
3947 // Preference the destination to the interval of the first register defined by the first operand.
3948 Interval* srcInterval = operandDefs.Begin()->interval;
3949 assert(srcInterval->isLocalVar);
3950 prefSrcInterval = srcInterval;
3951 isSpecialPutArg = true;
3954 RefPosition* internalRefs[MaxInternalRegisters];
3957 // The number of registers required for a tree node is the sum of
3958 // consume + produce + internalCount. This is the minimum
3959 // number of registers that must be ensured in the candidate
3960 // sets of the ref positions created.
3961 unsigned minRegCount = consume + produce + info.internalIntCount + info.internalFloatCount;
3964 // make intervals for all the 'internal' register requirements for this node
3965 // where internal means additional registers required temporarily
3966 int internalCount = buildInternalRegisterDefsForNode(tree, currentLoc, internalRefs DEBUG_ARG(minRegCount));
3968 // pop all ref'd tree temps
3969 tree->VisitOperands([&](GenTree* operand) -> GenTree::VisitResult {
3970 // Skip operands that do not define any registers, whether directly or indirectly.
3971 if (!operand->gtLsraInfo.definesAnyRegisters)
3973 return GenTree::VisitResult::Continue;
3976 // Remove the list of registers defined by the current operand from the map. Note that this
3977 // is only correct because tree nodes are singly-used: if this property ever changes (e.g.
3978 // if tree nodes are eventually allowed to be multiply-used), then the removal is only
3979 // correct at the last use.
3980 LocationInfoList operandDefs;
3981 bool removed = operandToLocationInfoMap.TryRemove(operand, &operandDefs);
3983 assert(!operandDefs.IsEmpty());
3986 regMaskTP currCandidates = RBM_NONE;
3987 #endif // _TARGET_ARM_
3989 LocationInfoListNode* const operandDefsEnd = operandDefs.End();
3990 for (LocationInfoListNode* operandDefsIterator = operandDefs.Begin(); operandDefsIterator != operandDefsEnd;
3991 operandDefsIterator = operandDefsIterator->Next())
3993 LocationInfo& locInfo = *static_cast<LocationInfo*>(operandDefsIterator);
3995 // For interstitial tree temps, a use is always a last use and ends the interval; this is set by default in newRefPosition.
3996 GenTree* const useNode = locInfo.treeNode;
3997 assert(useNode != nullptr);
3999 Interval* const i = locInfo.interval;
4000 if (useNode->gtLsraInfo.isTgtPref)
4002 prefSrcInterval = i;
4005 const bool delayRegFree = (hasDelayFreeSrc && useNode->gtLsraInfo.isDelayFree);
4008 // If delayRegFree, then the Use will interfere with the destination of
4009 // the consuming node. Therefore, we also need to add the kill set of the
4010 // consuming node to minRegCount.
4012 // For example, consider the following IR on x86, where v01 and v02
4013 // are method args coming in ecx and edx respectively.
4014 //    GT_DIV(v01, v02)
4016 // For GT_DIV, minRegCount will be 3 without adding the kill set of the GT_DIV node.
4019 // Assume further JitStressRegs=2, which would constrain the
4020 // candidates to callee-trashable regs { eax, ecx, edx } on the
4021 // use positions of v01 and v02. LSRA allocates ecx for v01.
4022 // The use position of v02 cannot be allocated a register since it
4023 // is marked delay-reg free and {eax, edx} are getting killed
4024 // before the def of GT_DIV. For this reason, minRegCount
4025 // for the use position of v02 also needs to take into account
4026 // the kill set of its consuming node.
4027 unsigned minRegCountForUsePos = minRegCount;
4030 regMaskTP killMask = getKillSetForNode(tree);
4031 if (killMask != RBM_NONE)
4033 minRegCountForUsePos += genCountBits(killMask);
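// Worked arithmetic for the GT_DIV example above (a sketch; the exact masks
// are target-dependent): with consume=2, produce=1, and no internal
// registers, minRegCount == 3; the x86 kill set for GT_DIV is
// { eax, edx }, so genCountBits(killMask) == 2 and the delay-reg-free use
// of v02 gets minRegCountForUsePos == 5.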
4038 regMaskTP candidates = getUseCandidates(useNode);
4040 if (useNode->OperIsPutArgSplit() || (compiler->opts.compUseSoftFP && useNode->OperIsPutArgReg()))
4042 // Get the i-th candidate; the set bits in useCandidates must be in sequential order.
4043 candidates = genFindLowestReg(candidates & ~currCandidates);
4044 currCandidates |= candidates;
4046 #endif // _TARGET_ARM_
4048 assert((candidates & allRegs(i->registerType)) != 0);
4050 // For non-localVar uses we record nothing, as nothing needs to be written back to the tree.
4051 GenTree* const refPosNode = i->isLocalVar ? useNode : nullptr;
4052 RefPosition* pos = newRefPosition(i, currentLoc, RefTypeUse, refPosNode, candidates,
4053 locInfo.multiRegIdx DEBUG_ARG(minRegCountForUsePos));
4057 pos->delayRegFree = true;
4060 if (useNode->IsRegOptional())
4062 pos->setAllocateIfProfitable(true);
4066 listNodePool.ReturnNodes(operandDefs);
4068 return GenTree::VisitResult::Continue;
4071 buildInternalRegisterUsesForNode(tree, currentLoc, internalRefs, internalCount DEBUG_ARG(minRegCount));
4073 RegisterType registerType = getDefType(tree);
4074 regMaskTP candidates = getDefCandidates(tree);
4075 regMaskTP useCandidates = getUseCandidates(tree);
4078 if (VERBOSE && produce)
4080 printf("Def candidates ");
4081 dumpRegMask(candidates);
4082 printf(", Use candidates ");
4083 dumpRegMask(useCandidates);
4088 #if defined(_TARGET_AMD64_)
4089 // A multi-reg call node is the only node that can produce a multi-reg value.
4090 assert(produce <= 1 || (tree->IsMultiRegCall() && produce == MAX_RET_REG_COUNT));
4091 #endif // _TARGET_AMD64_
4093 // Add kill positions before adding def positions
4094 buildKillPositionsForNode(tree, currentLoc + 1);
4096 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4097 VARSET_TP liveLargeVectors(VarSetOps::UninitVal());
4098 if (enregisterLocalVars && (RBM_FLT_CALLEE_SAVED != RBM_NONE))
4100 // Build RefPositions for saving any live large vectors.
4101 // This must be done after the kills, so that we know which large vectors are still live.
4102 VarSetOps::AssignNoCopy(compiler, liveLargeVectors, buildUpperVectorSaveRefPositions(tree, currentLoc + 1));
4104 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4106 ReturnTypeDesc* retTypeDesc = nullptr;
4107 bool isMultiRegCall = tree->IsMultiRegCall();
4110 retTypeDesc = tree->AsCall()->GetReturnTypeDesc();
4111 assert((int)genCountBits(candidates) == produce);
4112 assert(candidates == retTypeDesc->GetABIReturnRegs());
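// Illustrative instance (SysV AMD64, a sketch): a two-eightbyte struct
// returned in { RAX, RDX } has produce == 2 and
// candidates == RBM_RAX | RBM_RDX, so genCountBits(candidates) == produce
// holds, and the i-th def below is constrained to
// genRegMask(retTypeDesc->GetABIReturnReg(i)).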
4116 LocationInfoList locationInfoList;
4117 LsraLocation defLocation = currentLoc + 1;
4119 regMaskTP remainingUseCandidates = useCandidates;
4121 for (int i = 0; i < produce; i++)
4123 regMaskTP currCandidates = candidates;
4124 Interval* interval = varDefInterval;
4126 // In the case of a multi-reg call node, registerType is given by
4127 // the type of the i-th return register.
4130 registerType = retTypeDesc->GetReturnRegType((unsigned)i);
4131 currCandidates = genRegMask(retTypeDesc->GetABIReturnReg(i));
4132 useCandidates = allRegs(registerType);
4136 if (tree->OperIsPutArgSplit())
4138 // get i-th candidate
4139 currCandidates = genFindLowestReg(candidates);
4140 candidates &= ~currCandidates;
4143 // If the oper is GT_PUTARG_REG, the set bits in useCandidates must be in sequential order.
4144 else if (tree->OperIsMultiRegOp())
4146 useCandidates = genFindLowestReg(remainingUseCandidates);
4147 remainingUseCandidates &= ~useCandidates;
4149 #endif // ARM_SOFTFP
4150 #endif // _TARGET_ARM_
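// Sketch of the sequential peeling above (illustrative registers): if the
// candidate set is { r0, r1, r2, r3 }, successive calls to
// genFindLowestReg return r0, r1, r2, r3, and each chosen bit is removed
// from the remaining set, so the i-th position receives the i-th register
// in ascending order.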
4152 if (interval == nullptr)
4154 // Make a new interval
4155 interval = newInterval(registerType);
4156 if (hasDelayFreeSrc)
4158 interval->hasNonCommutativeRMWDef = true;
4160 else if (tree->OperIsConst())
4162 assert(!tree->IsReuseRegVal());
4163 interval->isConstant = true;
4166 if ((currCandidates & useCandidates) != RBM_NONE)
4168 interval->updateRegisterPreferences(currCandidates & useCandidates);
4171 if (isSpecialPutArg)
4173 interval->isSpecialPutArg = true;
4178 assert(registerTypesEquivalent(interval->registerType, registerType));
4181 if (prefSrcInterval != nullptr)
4183 interval->assignRelatedIntervalIfUnassigned(prefSrcInterval);
4186 // for assignments, we want to create a refposition for the def
4190 locationInfoList.Append(listNodePool.GetNode(defLocation, interval, tree, (unsigned)i));
4193 RefPosition* pos = newRefPosition(interval, defLocation, defRefType, defNode, currCandidates,
4194 (unsigned)i DEBUG_ARG(minRegCount));
4195 if (info.isLocalDefUse)
4197 pos->isLocalDefUse = true;
4198 pos->lastUse = true;
4200 interval->updateRegisterPreferences(currCandidates);
4201 interval->updateRegisterPreferences(useCandidates);
4204 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4205 // The SaveDef position must be at the same location as the Def position of the call node.
4206 if (enregisterLocalVars)
4208 buildUpperVectorRestoreRefPositions(tree, defLocation, liveLargeVectors);
4210 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
4212 if (!locationInfoList.IsEmpty())
4214 bool added = operandToLocationInfoMap.AddOrUpdate(tree, locationInfoList);
4216 tree->gtLsraInfo.definesAnyRegisters = true;
4221 // make an interval for each physical register
4222 void LinearScan::buildPhysRegRecords()
4224 RegisterType regType = IntRegisterType;
4225 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
4227 RegRecord* curr = &physRegs[reg];
4232 BasicBlock* getNonEmptyBlock(BasicBlock* block)
4234 while (block != nullptr && block->bbTreeList == nullptr)
4236 BasicBlock* nextBlock = block->bbNext;
4237 // Note that here we use the version of NumSucc that does not take a compiler.
4238 // That way this doesn't have to take a compiler, or be an instance method, e.g. of LinearScan.
4239 // If we have an empty block, it must have jump type BBJ_NONE or BBJ_ALWAYS, in which
4240 // case we don't need the version that takes a compiler.
4241 assert(block->NumSucc() == 1 && ((block->bbJumpKind == BBJ_ALWAYS) || (block->bbJumpKind == BBJ_NONE)));
4242 // sometimes the first block is empty and ends with an uncond branch
4243 // assert( block->GetSucc(0) == nextBlock);
4246 assert(block != nullptr && block->bbTreeList != nullptr);
4250 //------------------------------------------------------------------------
4251 // insertZeroInitRefPositions: Handle lclVars that are live-in to the first block
4254 // Prior to calling this method, 'currentLiveVars' must be set to the set of register
4255 // candidate variables that are liveIn to the first block.
4256 // For each register candidate that is live-in to the first block:
4257 // - If it is a GC ref, or if compInitMem is set, a ZeroInit RefPosition will be created.
4258 // - Otherwise, it will be marked as spilled, since it will not be assigned a register
4259 // on entry and will be loaded from memory on the undefined path.
4260 // Note that, when the compInitMem option is not set, we may encounter these on
4261 // paths that are protected by the same condition as an earlier def. However, since
4262 // we don't do the analysis to determine this - and couldn't rely on always identifying
4263 // such cases even if we tried - we must conservatively treat the undefined path as
4264 // being possible. This is a relatively rare case, so the introduced conservatism is
4265 // not expected to warrant the analysis required to determine the best placement of
4266 // an initialization.
4268 void LinearScan::insertZeroInitRefPositions()
4270 assert(enregisterLocalVars);
4272 VARSET_TP expectedLiveVars(VarSetOps::Intersection(compiler, registerCandidateVars, compiler->fgFirstBB->bbLiveIn));
4273 assert(VarSetOps::Equal(compiler, currentLiveVars, expectedLiveVars));
4276 // insert defs for this, then a block boundary
4278 VarSetOps::Iter iter(compiler, currentLiveVars);
4279 unsigned varIndex = 0;
4280 while (iter.NextElem(&varIndex))
4282 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4283 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4284 if (!varDsc->lvIsParam && isCandidateVar(varDsc))
4286 JITDUMP("V%02u was live in to first block:", varNum);
4287 Interval* interval = getIntervalForLocalVar(varIndex);
4288 if (compiler->info.compInitMem || varTypeIsGC(varDsc->TypeGet()))
4290 JITDUMP(" creating ZeroInit\n");
4291 GenTree* firstNode = getNonEmptyBlock(compiler->fgFirstBB)->firstNode();
4293 newRefPosition(interval, MinLocation, RefTypeZeroInit, firstNode, allRegs(interval->registerType));
4294 varDsc->lvMustInit = true;
4298 setIntervalAsSpilled(interval);
4299 JITDUMP(" marking as spilled\n");
4305 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4306 // -----------------------------------------------------------------------
4307 // Sets the register state for an argument of type STRUCT for System V systems.
4308 // See Compiler::raUpdateRegStateForArg(RegState *regState, LclVarDsc *argDsc) in regalloc.cpp
4309 // for how state for argument is updated for unix non-structs and Windows AMD64 structs.
4310 void LinearScan::unixAmd64UpdateRegStateForArg(LclVarDsc* argDsc)
4312 assert(varTypeIsStruct(argDsc));
4313 RegState* intRegState = &compiler->codeGen->intRegState;
4314 RegState* floatRegState = &compiler->codeGen->floatRegState;
4316 if ((argDsc->lvArgReg != REG_STK) && (argDsc->lvArgReg != REG_NA))
4318 if (genRegMask(argDsc->lvArgReg) & (RBM_ALLFLOAT))
4320 assert(genRegMask(argDsc->lvArgReg) & (RBM_FLTARG_REGS));
4321 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
4325 assert(genRegMask(argDsc->lvArgReg) & (RBM_ARG_REGS));
4326 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvArgReg);
4330 if ((argDsc->lvOtherArgReg != REG_STK) && (argDsc->lvOtherArgReg != REG_NA))
4332 if (genRegMask(argDsc->lvOtherArgReg) & (RBM_ALLFLOAT))
4334 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_FLTARG_REGS));
4335 floatRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
4339 assert(genRegMask(argDsc->lvOtherArgReg) & (RBM_ARG_REGS));
4340 intRegState->rsCalleeRegArgMaskLiveIn |= genRegMask(argDsc->lvOtherArgReg);
4345 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4347 //------------------------------------------------------------------------
4348 // updateRegStateForArg: Updates rsCalleeRegArgMaskLiveIn for the appropriate
4349 // regState (either compiler->intRegState or compiler->floatRegState),
4350 // with the lvArgReg on "argDsc"
4353 // argDsc - the argument for which the state is to be updated.
4355 // Return Value: None
4358 // The argument is live on entry to the function
4359 // (or is untracked and therefore assumed live)
4362 // This relies on a method in regAlloc.cpp that is shared between LSRA
4363 // and regAlloc. It is further abstracted here because regState is updated
4364 // separately for tracked and untracked variables in LSRA.
4366 void LinearScan::updateRegStateForArg(LclVarDsc* argDsc)
4368 #if defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4369 // For System V AMD64 calls the argDsc can have 2 registers (for structs.)
4370 // Handle them here.
4371 if (varTypeIsStruct(argDsc))
4373 unixAmd64UpdateRegStateForArg(argDsc);
4376 #endif // defined(FEATURE_UNIX_AMD64_STRUCT_PASSING)
4378 RegState* intRegState = &compiler->codeGen->intRegState;
4379 RegState* floatRegState = &compiler->codeGen->floatRegState;
4380 // In the case of AMD64 we'll still use the floating point registers
4381 // to model the register usage for arguments on vararg calls, so
4382 // we will ignore the varargs condition when determining whether to use
4383 // XMM registers for setting up the call.
4384 bool isFloat = (isFloatRegType(argDsc->lvType)
4385 #ifndef _TARGET_AMD64_
4386 && !compiler->info.compIsVarArgs
4388 && !compiler->opts.compUseSoftFP);
4390 if (argDsc->lvIsHfaRegArg())
4397 JITDUMP("Float arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4398 compiler->raUpdateRegStateForArg(floatRegState, argDsc);
4402 JITDUMP("Int arg V%02u in reg %s\n", (argDsc - compiler->lvaTable), getRegName(argDsc->lvArgReg));
4403 #if FEATURE_MULTIREG_ARGS
4404 if (argDsc->lvOtherArgReg != REG_NA)
4406 JITDUMP("(second half) in reg %s\n", getRegName(argDsc->lvOtherArgReg));
4408 #endif // FEATURE_MULTIREG_ARGS
4409 compiler->raUpdateRegStateForArg(intRegState, argDsc);
4414 //------------------------------------------------------------------------
4415 // findPredBlockForLiveIn: Determine which block should be used for the register locations of the live-in variables.
4418 // block - The block for which we're selecting a predecessor.
4419 // prevBlock - The previous block in allocation order.
4420 // pPredBlockIsAllocated - A debug-only argument that indicates whether any of the predecessors have been seen
4421 // in allocation order.
4424 // The selected predecessor.
4427 // in DEBUG, caller initializes *pPredBlockIsAllocated to false, and it will be set to true if the block
4428 // returned is in fact a predecessor.
4431 // This will select a predecessor based on the heuristics obtained by getLsraBlockBoundaryLocations(), which can be
4432 // one of:
4433 // LSRA_BLOCK_BOUNDARY_PRED - Use the register locations of a predecessor block (default)
4434 // LSRA_BLOCK_BOUNDARY_LAYOUT - Use the register locations of the previous block in layout order.
4435 // This is the only case where this actually returns a different block.
4436 // LSRA_BLOCK_BOUNDARY_ROTATE - Rotate the register locations from a predecessor.
4437 // For this case, the block returned is the same as for LSRA_BLOCK_BOUNDARY_PRED, but
4438 // the register locations will be "rotated" to stress the resolution and allocation
4441 BasicBlock* LinearScan::findPredBlockForLiveIn(BasicBlock* block,
4442 BasicBlock* prevBlock DEBUGARG(bool* pPredBlockIsAllocated))
4444 BasicBlock* predBlock = nullptr;
4446 assert(*pPredBlockIsAllocated == false);
4447 if (getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_LAYOUT)
4449 if (prevBlock != nullptr)
4451 predBlock = prevBlock;
4456 if (block != compiler->fgFirstBB)
4458 predBlock = block->GetUniquePred(compiler);
4459 if (predBlock != nullptr)
4461 if (isBlockVisited(predBlock))
4463 if (predBlock->bbJumpKind == BBJ_COND)
4465 // Special handling to improve matching on backedges.
4466 BasicBlock* otherBlock = (block == predBlock->bbNext) ? predBlock->bbJumpDest : predBlock->bbNext;
4467 noway_assert(otherBlock != nullptr);
4468 if (isBlockVisited(otherBlock))
4470 // This is the case when we have a conditional branch where one target has already
4471 // been visited. It would be best to use the same incoming regs as that block,
4472 // so that we have less likelihood of having to move registers.
4473 // For example, in determining the block to use for the starting register locations for
4474 // "block" in the following example, we'd like to use the same predecessor for "block"
4475 // as for "otherBlock", so that both successors of predBlock have the same locations, reducing
4476 // the likelihood of needing a split block on a backedge:
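// An illustrative shape (a reconstruction, not the original figure):
//
//      otherPred
//          |
//      otherBlock <----+
//          .           |
//          .           |
//      predBlock ------+   (BBJ_COND: jumps back to otherBlock,
//          |                falls through to block)
//        block
//
// Choosing the predecessor that otherBlock recorded (otherPred) for "block"
// gives both of predBlock's successors the same incoming locations.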
4487 for (flowList* pred = otherBlock->bbPreds; pred != nullptr; pred = pred->flNext)
4489 BasicBlock* otherPred = pred->flBlock;
4490 if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum)
4492 predBlock = otherPred;
4501 predBlock = nullptr;
4506 for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
4508 BasicBlock* candidatePredBlock = pred->flBlock;
4509 if (isBlockVisited(candidatePredBlock))
4511 if (predBlock == nullptr || predBlock->bbWeight < candidatePredBlock->bbWeight)
4513 predBlock = candidatePredBlock;
4514 INDEBUG(*pPredBlockIsAllocated = true;)
4519 if (predBlock == nullptr)
4521 predBlock = prevBlock;
4522 assert(predBlock != nullptr);
4523 JITDUMP("\n\nNo allocated predecessor; ");
4529 void LinearScan::buildIntervals()
4533 JITDUMP("\nbuildIntervals ========\n");
4535 // Now build (empty) records for all of the physical registers
4536 buildPhysRegRecords();
4541 printf("\n-----------------\n");
4542 printf("LIVENESS:\n");
4543 printf("-----------------\n");
4544 foreach_block(compiler, block)
4546 printf("BB%02u use def in out\n", block->bbNum);
4547 dumpConvertedVarSet(compiler, block->bbVarUse);
4549 dumpConvertedVarSet(compiler, block->bbVarDef);
4551 dumpConvertedVarSet(compiler, block->bbLiveIn);
4553 dumpConvertedVarSet(compiler, block->bbLiveOut);
4560 // We will determine whether we should double align the frame during
4561 // identifyCandidates(), but we initially assume that we will not.
4562 doDoubleAlign = false;
4565 identifyCandidates();
4567 // Figure out if we're going to use a frame pointer. We need to do this before building
4568 // the ref positions, because those objects will embed the frame register in various register masks
4569 // if the frame pointer is not reserved. If we decide to have a frame pointer, setFrameType() will
4570 // remove the frame pointer from the masks.
4573 DBEXEC(VERBOSE, TupleStyleDump(LSRA_DUMP_PRE));
4576 JITDUMP("\nbuildIntervals second part ========\n");
4577 LsraLocation currentLoc = 0;
4579 // Next, create ParamDef RefPositions for all the tracked parameters,
4580 // in order of their varIndex
4583 unsigned int lclNum;
4585 RegState* intRegState = &compiler->codeGen->intRegState;
4586 RegState* floatRegState = &compiler->codeGen->floatRegState;
4587 intRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4588 floatRegState->rsCalleeRegArgMaskLiveIn = RBM_NONE;
4590 for (unsigned int varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
4592 lclNum = compiler->lvaTrackedToVarNum[varIndex];
4593 argDsc = &(compiler->lvaTable[lclNum]);
4595 if (!argDsc->lvIsParam)
4600 // Only reserve a register if the argument is actually used.
4601 // Is it dead on entry? If compJmpOpUsed is true, then the arguments
4602 // have to be kept alive, so we have to consider it as live on entry.
4603 // Use lvRefCnt instead of checking bbLiveIn because if it's volatile we
4604 // won't have done dataflow on it, but it needs to be marked as live-in so
4605 // it will get saved in the prolog.
4606 if (!compiler->compJmpOpUsed && argDsc->lvRefCnt == 0 && !compiler->opts.compDbgCode)
4611 if (argDsc->lvIsRegArg)
4613 updateRegStateForArg(argDsc);
4616 if (isCandidateVar(argDsc))
4618 Interval* interval = getIntervalForLocalVar(varIndex);
4619 regMaskTP mask = allRegs(TypeGet(argDsc));
4620 if (argDsc->lvIsRegArg)
4622 // Set this interval as currently assigned to that register
4623 regNumber inArgReg = argDsc->lvArgReg;
4624 assert(inArgReg < REG_COUNT);
4625 mask = genRegMask(inArgReg);
4626 assignPhysReg(inArgReg, interval);
4628 RefPosition* pos = newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, mask);
4630 else if (varTypeIsStruct(argDsc->lvType))
4632 for (unsigned fieldVarNum = argDsc->lvFieldLclStart;
4633 fieldVarNum < argDsc->lvFieldLclStart + argDsc->lvFieldCnt; ++fieldVarNum)
4635 LclVarDsc* fieldVarDsc = &(compiler->lvaTable[fieldVarNum]);
4636 if (fieldVarDsc->lvLRACandidate)
4638 assert(fieldVarDsc->lvTracked);
4639 Interval* interval = getIntervalForLocalVar(fieldVarDsc->lvVarIndex);
4641 newRefPosition(interval, MinLocation, RefTypeParamDef, nullptr, allRegs(TypeGet(fieldVarDsc)));
4647 // We can overwrite the register (i.e. codegen saves it on entry)
4648 assert(argDsc->lvRefCnt == 0 || !argDsc->lvIsRegArg || argDsc->lvDoNotEnregister ||
4649 !argDsc->lvLRACandidate || (varTypeIsFloating(argDsc->TypeGet()) && compiler->opts.compDbgCode));
4653 // Now set up the reg state for the non-tracked args
4654 // (We do this here because we want to generate the ParamDef RefPositions in tracked
4655 // order, so that loop doesn't hit the non-tracked args)
4657 for (unsigned argNum = 0; argNum < compiler->info.compArgsCount; argNum++, argDsc++)
4659 argDsc = &(compiler->lvaTable[argNum]);
4661 if (argDsc->lvPromotedStruct())
4663 noway_assert(argDsc->lvFieldCnt == 1); // We only handle one field here
4665 unsigned fieldVarNum = argDsc->lvFieldLclStart;
4666 argDsc = &(compiler->lvaTable[fieldVarNum]);
4668 noway_assert(argDsc->lvIsParam);
4669 if (!argDsc->lvTracked && argDsc->lvIsRegArg)
4671 updateRegStateForArg(argDsc);
4675 // If there is a secret stub param, it is also live in
4676 if (compiler->info.compPublishStubParam)
4678 intRegState->rsCalleeRegArgMaskLiveIn |= RBM_SECRET_STUB_PARAM;
4681 LocationInfoListNodePool listNodePool(compiler, 8);
4682 SmallHashTable<GenTree*, LocationInfoList, 32> operandToLocationInfoMap(compiler);
4684 BasicBlock* predBlock = nullptr;
4685 BasicBlock* prevBlock = nullptr;
4687 // Initialize currentLiveVars to the empty set. We will set it to the current
4688 // live-in at the entry to each block (this will include the incoming args on
4689 // the first block).
4690 VarSetOps::AssignNoCopy(compiler, currentLiveVars, VarSetOps::MakeEmpty(compiler));
4692 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
4694 JITDUMP("\nNEW BLOCK BB%02u\n", block->bbNum);
4696 bool predBlockIsAllocated = false;
4697 predBlock = findPredBlockForLiveIn(block, prevBlock DEBUGARG(&predBlockIsAllocated));
4700 JITDUMP("\n\nSetting BB%02u as the predecessor for determining incoming variable registers of BB%02u\n",
4701 block->bbNum, predBlock->bbNum);
4702 assert(predBlock->bbNum <= bbNumMaxBeforeResolution);
4703 blockInfo[block->bbNum].predBBNum = predBlock->bbNum;
4706 if (enregisterLocalVars)
4708 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
4709 VarSetOps::Intersection(compiler, registerCandidateVars, block->bbLiveIn));
4711 if (block == compiler->fgFirstBB)
4713 insertZeroInitRefPositions();
4716 // Any lclVars live-in to a block are resolution candidates.
4717 VarSetOps::UnionD(compiler, resolutionCandidateVars, currentLiveVars);
4719 // Determine if we need any DummyDefs.
4720 // We need DummyDefs for cases where "predBlock" isn't really a predecessor.
4721 // Note that it's possible to have uses of uninitialized variables, in which case even the first
4722 // block may require DummyDefs, which we are not currently adding - this means that these variables
4723 // will always be considered to be in memory on entry (and reloaded when the use is encountered).
4724 // TODO-CQ: Consider how best to tune this. Currently, if we create DummyDefs for uninitialized
4725 // variables (which may actually be initialized along the dynamically executed paths, but not
4726 // on all static paths), we wind up with excessive liveranges for some of these variables.
4727 VARSET_TP newLiveIn(VarSetOps::MakeCopy(compiler, currentLiveVars));
4730 // Compute set difference: newLiveIn = currentLiveVars - predBlock->bbLiveOut
4731 VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
4733 bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
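// Worked example of the set difference above (hypothetical liveness): if
// currentLiveVars = { V01, V02 } and predBlock->bbLiveOut = { V01 }, then
// newLiveIn = { V02 }, and V02 receives a RefTypeDummyDef below so that it
// has a recorded location on entry even though no allocated predecessor
// defines it.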
4735 // Create dummy def RefPositions
4739 // If we are using locations from a predecessor, we should never require DummyDefs.
4740 assert(!predBlockIsAllocated);
4742 JITDUMP("Creating dummy definitions\n");
4743 VarSetOps::Iter iter(compiler, newLiveIn);
4744 unsigned varIndex = 0;
4745 while (iter.NextElem(&varIndex))
4747 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4748 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4749 // Add a dummyDef for any candidate vars that are in the "newLiveIn" set.
4750 // If this is the entry block, don't add any incoming parameters (they're handled with ParamDefs).
4751 if (isCandidateVar(varDsc) && (predBlock != nullptr || !varDsc->lvIsParam))
4753 Interval* interval = getIntervalForLocalVar(varIndex);
4754 RefPosition* pos = newRefPosition(interval, currentLoc, RefTypeDummyDef, nullptr,
4755 allRegs(interval->registerType));
4758 JITDUMP("Finished creating dummy definitions\n\n");
4762 // Add a dummy RefPosition to mark the block boundary.
4763 // Note that we do this AFTER adding the exposed uses above, because the
4764 // register positions for those exposed uses need to be recorded at
4765 // this point.
4767 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4770 LIR::Range& blockRange = LIR::AsRange(block);
4771 for (GenTree* node : blockRange.NonPhiNodes())
4773 assert(node->gtLsraInfo.loc >= currentLoc);
4774 assert(!node->IsValue() || !node->IsUnusedValue() || node->gtLsraInfo.isLocalDefUse);
4776 currentLoc = node->gtLsraInfo.loc;
4777 buildRefPositionsForNode(node, block, listNodePool, operandToLocationInfoMap, currentLoc);
4780 if (currentLoc > maxNodeLocation)
4782 maxNodeLocation = currentLoc;
4787 // Increment the LsraLocation at this point, so that the dummy RefPositions
4788 // will not have the same LsraLocation as any "real" RefPosition.
4791 // Note: the visited set is cleared in LinearScan::doLinearScan()
4792 markBlockVisited(block);
4794 if (enregisterLocalVars)
4796 // Insert exposed uses for a lclVar that is live-out of 'block' but not live-in to the
4797 // next block, or any unvisited successors.
4798 // This will address lclVars that are live on a backedge, as well as those that are kept
4799 // live at a GT_JMP.
4801 // Blocks ending with "jmp method" are marked as BBJ_HAS_JMP,
4802 // and a jmp call is represented using a GT_JMP node, which is a leaf node.
4803 // The liveness phase keeps all the arguments of the method live till the end of
4804 // the block by adding them to the live-out set of the block containing the GT_JMP.
4806 // The target of a GT_JMP implicitly uses all the current method arguments; however,
4807 // there are no actual references to them. This can cause LSRA to assert, because
4808 // the variables are live but it sees no references. In order to correctly model the
4809 // liveness of these arguments, we add dummy exposed uses, in the same manner as for
4810 // backward branches. This will happen automatically via expUseSet.
4812 // Note that a block ending with GT_JMP has no successors, and hence the variables
4813 // for which dummy use ref positions are added are arguments of the method.
4815 VARSET_TP expUseSet(VarSetOps::MakeCopy(compiler, block->bbLiveOut));
4816 VarSetOps::IntersectionD(compiler, expUseSet, registerCandidateVars);
4817 BasicBlock* nextBlock = getNextBlock();
4818 if (nextBlock != nullptr)
4820 VarSetOps::DiffD(compiler, expUseSet, nextBlock->bbLiveIn);
4822 for (BasicBlock* succ : block->GetAllSuccs(compiler))
4824 if (VarSetOps::IsEmpty(compiler, expUseSet))
4829 if (isBlockVisited(succ))
4833 VarSetOps::DiffD(compiler, expUseSet, succ->bbLiveIn);
4836 if (!VarSetOps::IsEmpty(compiler, expUseSet))
4838 JITDUMP("Exposed uses:");
4839 VarSetOps::Iter iter(compiler, expUseSet);
4840 unsigned varIndex = 0;
4841 while (iter.NextElem(&varIndex))
4843 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4844 LclVarDsc* varDsc = compiler->lvaTable + varNum;
4845 assert(isCandidateVar(varDsc));
4846 Interval* interval = getIntervalForLocalVar(varIndex);
4848 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4849 JITDUMP(" V%02u", varNum);
4854 // Clear the "last use" flag on any vars that are live-out from this block.
4856 VarSetOps::Iter iter(compiler, block->bbLiveOut);
4857 unsigned varIndex = 0;
4858 while (iter.NextElem(&varIndex))
4860 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
4861 LclVarDsc* const varDsc = &compiler->lvaTable[varNum];
4862 if (isCandidateVar(varDsc))
4864 RefPosition* const lastRP = getIntervalForLocalVar(varIndex)->lastRefPosition;
4865 if ((lastRP != nullptr) && (lastRP->bbNum == block->bbNum))
4867 lastRP->lastUse = false;
4874 checkLastUses(block);
4879 dumpConvertedVarSet(compiler, block->bbVarUse);
4881 dumpConvertedVarSet(compiler, block->bbVarDef);
4890 if (enregisterLocalVars)
4892 if (compiler->lvaKeepAliveAndReportThis())
4894 // If we need to KeepAliveAndReportThis, add a dummy exposed use of it at the end
4895 unsigned keepAliveVarNum = compiler->info.compThisArg;
4896 assert(compiler->info.compIsStatic == false);
4897 LclVarDsc* varDsc = compiler->lvaTable + keepAliveVarNum;
4898 if (isCandidateVar(varDsc))
4900 JITDUMP("Adding exposed use of this, for lvaKeepAliveAndReportThis\n");
4901 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4903 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4908 if (getLsraExtendLifeTimes())
4911 for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
4913 if (varDsc->lvLRACandidate)
4915 JITDUMP("Adding exposed use of V%02u for LsraExtendLifetimes\n", lclNum);
4916 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4918 newRefPosition(interval, currentLoc, RefTypeExpUse, nullptr, allRegs(interval->registerType));
4925 // If the last block has successors, create a RefTypeBB to record
4926 // the final location.
4928 if (prevBlock->NumSucc(compiler) > 0)
4930 RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
4934 // Make sure we don't have any blocks that were not visited
4935 foreach_block(compiler, block)
4937 assert(isBlockVisited(block));
4942 lsraDumpIntervals("BEFORE VALIDATING INTERVALS");
4943 dumpRefPositions("BEFORE VALIDATING INTERVALS");
4944 validateIntervals();
4950 void LinearScan::dumpVarRefPositions(const char* title)
4952 if (enregisterLocalVars)
4954 printf("\nVAR REFPOSITIONS %s\n", title);
4956 for (unsigned i = 0; i < compiler->lvaCount; i++)
4958 printf("--- V%02u\n", i);
4960 LclVarDsc* varDsc = compiler->lvaTable + i;
4961 if (varDsc->lvIsRegCandidate())
4963 Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);
4964 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4974 void LinearScan::validateIntervals()
4976 if (enregisterLocalVars)
4978 for (unsigned i = 0; i < compiler->lvaTrackedCount; i++)
4980 if (!compiler->lvaTable[compiler->lvaTrackedToVarNum[i]].lvLRACandidate)
4984 Interval* interval = getIntervalForLocalVar(i);
4986 bool defined = false;
4987 printf("-----------------\n");
4988 for (RefPosition* ref = interval->firstRefPosition; ref != nullptr; ref = ref->nextRefPosition)
4991 RefType refType = ref->refType;
4992 if (!defined && RefTypeIsUse(refType))
4994 if (compiler->info.compMethodName != nullptr)
4996 printf("%s: ", compiler->info.compMethodName);
4998 printf("LocalVar V%02u: undefined use at %u\n", interval->varNum, ref->nodeLocation);
5000 // Note that there can be multiple last uses if they are on disjoint paths,
5001 // so we can't really check the lastUse flag
5006 if (RefTypeIsDef(refType))
5016 // Set the default rpFrameType based upon codeGen->isFramePointerRequired()
5017 // This was lifted from the register predictor
5019 void LinearScan::setFrameType()
5021 FrameType frameType = FT_NOT_SET;
5023 compiler->codeGen->setDoubleAlign(false);
5026 frameType = FT_DOUBLE_ALIGN_FRAME;
5027 compiler->codeGen->setDoubleAlign(true);
5030 #endif // DOUBLE_ALIGN
5031 if (compiler->codeGen->isFramePointerRequired())
5033 frameType = FT_EBP_FRAME;
5037 if (compiler->rpMustCreateEBPCalled == false)
5042 compiler->rpMustCreateEBPCalled = true;
5043 if (compiler->rpMustCreateEBPFrame(INDEBUG(&reason)))
5045 JITDUMP("; Decided to create an EBP based frame for ETW stackwalking (%s)\n", reason);
5046 compiler->codeGen->setFrameRequired(true);
5050 if (compiler->codeGen->isFrameRequired())
5052 frameType = FT_EBP_FRAME;
5056 frameType = FT_ESP_FRAME;
5063 noway_assert(!compiler->codeGen->isFramePointerRequired());
5064 noway_assert(!compiler->codeGen->isFrameRequired());
5065 compiler->codeGen->setFramePointerUsed(false);
5068 compiler->codeGen->setFramePointerUsed(true);
5071 case FT_DOUBLE_ALIGN_FRAME:
5072 noway_assert(!compiler->codeGen->isFramePointerRequired());
5073 compiler->codeGen->setFramePointerUsed(false);
5075 #endif // DOUBLE_ALIGN
5077 noway_assert(!"rpFrameType not set correctly!");
5081 // If we are using FPBASE as the frame register, we cannot also use it for
5082 // a local var. Note that we may have already added it to the register masks,
5083 // which are computed when the LinearScan class is constructed, and
5084 // used during lowering. Luckily, the TreeNodeInfo only stores an index to
5085 // the masks stored in the LinearScan class, so we only need to walk the
5086 // unique masks and remove FPBASE.
5087 if (frameType == FT_EBP_FRAME)
5089 if ((availableIntRegs & RBM_FPBASE) != 0)
5091 RemoveRegisterFromMasks(REG_FPBASE);
5093 // We know that we're already in "read mode" for availableIntRegs. However,
5094 // we need to remove the FPBASE register, so subsequent users (like callers
5095 // to allRegs()) get the right thing. The RemoveRegisterFromMasks() code
5096 // fixes up everything that already took a dependency on the value that was
5097 // previously read, so this completes the picture.
5098 availableIntRegs.OverrideAssign(availableIntRegs & ~RBM_FPBASE);
5102 compiler->rpFrameType = frameType;
5105 // Is the copyReg/moveReg given by this RefPosition still busy at the
5106 // given location?
5107 bool copyOrMoveRegInUse(RefPosition* ref, LsraLocation loc)
5109 assert(ref->copyReg || ref->moveReg);
5110 if (ref->getRefEndLocation() >= loc)
5114 Interval* interval = ref->getInterval();
5115 RefPosition* nextRef = interval->getNextRefPosition();
5116 if (nextRef != nullptr && nextRef->treeNode == ref->treeNode && nextRef->getRefEndLocation() >= loc)
5123 // Determine whether the register represented by "physRegRecord" is available at least
5124 // at the "currentLoc", and if so, return the next location at which it is in use in
5125 // "nextRefLocationPtr"
5127 bool LinearScan::registerIsAvailable(RegRecord* physRegRecord,
5128 LsraLocation currentLoc,
5129 LsraLocation* nextRefLocationPtr,
5130 RegisterType regType)
5132 *nextRefLocationPtr = MaxLocation;
5133 LsraLocation nextRefLocation = MaxLocation;
5134 regMaskTP regMask = genRegMask(physRegRecord->regNum);
5135 if (physRegRecord->isBusyUntilNextKill)
5140 RefPosition* nextPhysReference = physRegRecord->getNextRefPosition();
5141 if (nextPhysReference != nullptr)
5143 nextRefLocation = nextPhysReference->nodeLocation;
5144 // if (nextPhysReference->refType == RefTypeFixedReg) nextRefLocation--;
5146 else if (!physRegRecord->isCalleeSave)
5148 nextRefLocation = MaxLocation - 1;
5151 Interval* assignedInterval = physRegRecord->assignedInterval;
5153 if (assignedInterval != nullptr)
5155 RefPosition* recentReference = assignedInterval->recentRefPosition;
5157 // The only case where we have an assignedInterval, but recentReference is null
5158 // is where this interval is live at procedure entry (i.e. an arg register), in which
5159 // case it's still live and its assigned register is not available
5160 // (Note that the ParamDef will be recorded as a recentReference when we encounter
5161 // it, but we will be allocating registers, potentially to other incoming parameters,
5162 // as we process the ParamDefs.)
5164 if (recentReference == nullptr)
5169 // Is this a copyReg/moveReg? It is if the register assignment doesn't match.
5170 // (the recentReference may not be a copyReg/moveReg, because we could have seen another
5171 // reference since the copyReg/moveReg)
5173 if (!assignedInterval->isAssignedTo(physRegRecord->regNum))
5175 // Don't reassign it if it's still in use
5176 if ((recentReference->copyReg || recentReference->moveReg) &&
5177 copyOrMoveRegInUse(recentReference, currentLoc))
5182 else if (!assignedInterval->isActive && assignedInterval->isConstant)
5184 // Treat this as unassigned, i.e. do nothing.
5185 // TODO-CQ: Consider adjusting the heuristics (probably in the caller of this method)
5186 // to avoid reusing these registers.
5188 // If this interval isn't active, it's available if it isn't referenced
5189 // at this location (or the previous location, if the recent RefPosition
5190 // is a delayRegFree).
5191 else if (!assignedInterval->isActive &&
5192 (recentReference->refType == RefTypeExpUse || recentReference->getRefEndLocation() < currentLoc))
5194 // This interval must have a next reference (otherwise it wouldn't be assigned to this register)
5195 RefPosition* nextReference = recentReference->nextRefPosition;
5196 if (nextReference != nullptr)
5198 if (nextReference->nodeLocation < nextRefLocation)
5200 nextRefLocation = nextReference->nodeLocation;
5205 assert(recentReference->copyReg && recentReference->registerAssignment != regMask);
5213 if (nextRefLocation < *nextRefLocationPtr)
5215 *nextRefLocationPtr = nextRefLocation;
5219 if (regType == TYP_DOUBLE)
5221 // Recurse, but check the other half this time (TYP_FLOAT)
5222 if (!registerIsAvailable(getRegisterRecord(REG_NEXT(physRegRecord->regNum)), currentLoc, nextRefLocationPtr,
5225 nextRefLocation = *nextRefLocationPtr;
5227 #endif // _TARGET_ARM_
5229 return (nextRefLocation >= currentLoc);
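// Sketch of the TYP_DOUBLE case above (ARM, illustrative): a double
// occupies a pair of consecutive float registers, so the availability check
// recurses via REG_NEXT on the other half; the pair is only reported
// available up to the earlier of the two halves' next references.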
5232 //------------------------------------------------------------------------
5233 // getRegisterType: Get the RegisterType to use for the given RefPosition
5236 // currentInterval: The interval for the current allocation
5237 // refPosition: The RefPosition of the current Interval for which a register is being allocated
5240 // The RegisterType that should be allocated for this RefPosition
5243 // This will nearly always be identical to the registerType of the interval, except in the case
5244 // of SIMD types of 8 bytes (currently only Vector2) when they are passed and returned in integer
5245 // registers, or copied to a return temp.
5246 // This method need only be called in situations where we may be dealing with the register requirements
5247 // of a RefTypeUse RefPosition (i.e. not when we are only looking at the type of an interval, nor when
5248 // we are interested in the "defining" type of the interval). This is because the situation of interest
5249 // only happens at the use (where it must be copied to an integer register).
5251 RegisterType LinearScan::getRegisterType(Interval* currentInterval, RefPosition* refPosition)
5253 assert(refPosition->getInterval() == currentInterval);
5254 RegisterType regType = currentInterval->registerType;
5255 regMaskTP candidates = refPosition->registerAssignment;
5257 assert((candidates & allRegs(regType)) != RBM_NONE);
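// Illustrative case for the Notes above (a sketch): a TYP_SIMD8 (Vector2)
// value being returned or passed in an integer register has a RefTypeUse
// whose registerAssignment contains only integer registers; for that use
// the RegisterType to allocate is an integer type rather than the
// interval's float type, so the allocator selects from the integer file.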
5261 //------------------------------------------------------------------------
5262 // tryAllocateFreeReg: Find a free register that satisfies the requirements for refPosition,
5263 // and takes into account the preferences for the given Interval
5266 // currentInterval: The interval for the current allocation
5267 // refPosition: The RefPosition of the current Interval for which a register is being allocated
5270 // The regNumber, if any, allocated to the RefPosition. Returns REG_NA if no free register is found.
5273 // TODO-CQ: Consider whether we need to use a different order for tree temps than for vars, as
5276 static const regNumber lsraRegOrder[] = {REG_VAR_ORDER};
5277 const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
5278 static const regNumber lsraRegOrderFlt[] = {REG_VAR_ORDER_FLT};
5279 const unsigned lsraRegOrderFltSize = ArrLen(lsraRegOrderFlt);
5281 regNumber LinearScan::tryAllocateFreeReg(Interval* currentInterval, RefPosition* refPosition)
5283 regNumber foundReg = REG_NA;
5285 RegisterType regType = getRegisterType(currentInterval, refPosition);
5286 const regNumber* regOrder;
5287 unsigned regOrderSize;
5288 if (useFloatReg(regType))
5290 regOrder = lsraRegOrderFlt;
5291 regOrderSize = lsraRegOrderFltSize;
5295 regOrder = lsraRegOrder;
5296 regOrderSize = lsraRegOrderSize;
5299 LsraLocation currentLocation = refPosition->nodeLocation;
5300 RefPosition* nextRefPos = refPosition->nextRefPosition;
5301 LsraLocation nextLocation = (nextRefPos == nullptr) ? currentLocation : nextRefPos->nodeLocation;
5302 regMaskTP candidates = refPosition->registerAssignment;
5303 regMaskTP preferences = currentInterval->registerPreferences;
5305 if (RefTypeIsDef(refPosition->refType))
5307 if (currentInterval->hasConflictingDefUse)
5309 resolveConflictingDefAndUse(currentInterval, refPosition);
5310 candidates = refPosition->registerAssignment;
5312 // Otherwise, check for the case of a fixed-reg def of a reg that will be killed before the
5313 // use, or interferes at the point of use (which shouldn't happen, but Lower doesn't mark
5314 // the contained nodes as interfering).
5315 // Note that we may have a ParamDef RefPosition that is marked isFixedRegRef, but which
5316 // has had its registerAssignment changed to no longer be a single register.
5317 else if (refPosition->isFixedRegRef && nextRefPos != nullptr && RefTypeIsUse(nextRefPos->refType) &&
5318 !nextRefPos->isFixedRegRef && genMaxOneBit(refPosition->registerAssignment))
5320 regNumber defReg = refPosition->assignedReg();
5321 RegRecord* defRegRecord = getRegisterRecord(defReg);
5323 RefPosition* currFixedRegRefPosition = defRegRecord->recentRefPosition;
5324 assert(currFixedRegRefPosition != nullptr &&
5325 currFixedRegRefPosition->nodeLocation == refPosition->nodeLocation);
5327 // If there is another fixed reference to this register before the use, change the candidates
5328 // on this RefPosition to include that of nextRefPos.
5329 if (currFixedRegRefPosition->nextRefPosition != nullptr &&
5330 currFixedRegRefPosition->nextRefPosition->nodeLocation <= nextRefPos->getRefEndLocation())
5332 candidates |= nextRefPos->registerAssignment;
5333 if (preferences == refPosition->registerAssignment)
5335 preferences = candidates;
5341 preferences &= candidates;
5342 if (preferences == RBM_NONE)
5344 preferences = candidates;
5346 regMaskTP relatedPreferences = RBM_NONE;
5349 candidates = stressLimitRegs(refPosition, candidates);
5351 bool mustAssignARegister = true;
5352 assert(candidates != RBM_NONE);
5354 // If the related interval has no further references, it is possible that it is a source of the
5355 // node that produces this interval. However, we don't want to use the relatedInterval for preferencing
5356 // if its next reference is not a new definition (as it either is or will become live).
5357 Interval* relatedInterval = currentInterval->relatedInterval;
5358 if (relatedInterval != nullptr)
5360 RefPosition* nextRelatedRefPosition = relatedInterval->getNextRefPosition();
5361 if (nextRelatedRefPosition != nullptr)
5363 // Don't use the relatedInterval for preferencing if its next reference is not a new definition.
5364 if (!RefTypeIsDef(nextRelatedRefPosition->refType))
5366 relatedInterval = nullptr;
5368 // Is the relatedInterval simply a copy to another relatedInterval?
5369 else if ((relatedInterval->relatedInterval != nullptr) &&
5370 (nextRelatedRefPosition->nextRefPosition != nullptr) &&
5371 (nextRelatedRefPosition->nextRefPosition->nextRefPosition == nullptr) &&
5372 (nextRelatedRefPosition->nextRefPosition->nodeLocation <
5373 relatedInterval->relatedInterval->getNextRefLocation()))
5375 // The current relatedInterval has only two remaining RefPositions, both of which
5376 // occur prior to the next RefPosition for its relatedInterval.
5377 // It is likely a copy.
5378 relatedInterval = relatedInterval->relatedInterval;
5383 if (relatedInterval != nullptr)
5385 // If the related interval already has an assigned register, then use that
5386 // as the related preference. We'll take the related
5387 // interval preferences into account in the loop over all the registers.
5389 if (relatedInterval->assignedReg != nullptr)
5391 relatedPreferences = genRegMask(relatedInterval->assignedReg->regNum);
5395 relatedPreferences = relatedInterval->registerPreferences;
5399 bool preferCalleeSave = currentInterval->preferCalleeSave;
5401 // For floating point, we want to be less aggressive about using callee-save registers.
5402 // So in that case, we just need to ensure that the current RefPosition is covered.
5403 RefPosition* rangeEndRefPosition;
5404 RefPosition* lastRefPosition = currentInterval->lastRefPosition;
5405 if (useFloatReg(currentInterval->registerType))
5407 rangeEndRefPosition = refPosition;
5411 rangeEndRefPosition = currentInterval->lastRefPosition;
5412 // If we have a relatedInterval that is not currently occupying a register,
5413 // and whose lifetime begins after this one ends,
5414 // we want to try to select a register that will cover its lifetime.
5415 if ((relatedInterval != nullptr) && (relatedInterval->assignedReg == nullptr) &&
5416 (relatedInterval->getNextRefLocation() >= rangeEndRefPosition->nodeLocation))
5418 lastRefPosition = relatedInterval->lastRefPosition;
5419 preferCalleeSave = relatedInterval->preferCalleeSave;
5423 // If this has a delayed use (due to being used in a rmw position of a
5424 // non-commutative operator), its endLocation is delayed until the "def"
5425 // position, which is one location past the use (getRefEndLocation() takes care of this).
5426 LsraLocation rangeEndLocation = rangeEndRefPosition->getRefEndLocation();
5427 LsraLocation lastLocation = lastRefPosition->getRefEndLocation();
5428 regNumber prevReg = REG_NA;
5430 if (currentInterval->assignedReg)
5432 bool useAssignedReg = false;
5433 // This was an interval that was previously allocated to the given
5434 // physical register, and we should try to allocate it to that register
5435 // again, if possible and reasonable.
5436 // Use it preemptively (i.e. before checking other available regs)
5437 // only if it is preferred and available.
5439 RegRecord* regRec = currentInterval->assignedReg;
5440 prevReg = regRec->regNum;
5441 regMaskTP prevRegBit = genRegMask(prevReg);
5443 // Is it in the preferred set of regs?
5444 if ((prevRegBit & preferences) != RBM_NONE)
5446 // Is it currently available?
5447 LsraLocation nextPhysRefLoc;
5448 if (registerIsAvailable(regRec, currentLocation, &nextPhysRefLoc, currentInterval->registerType))
5450 // If the register is next referenced at this location, only use it if
5451 // this has a fixed reg requirement (i.e. this is the reference that caused
5452 // the FixedReg ref to be created)
5454 if (!regRec->conflictingFixedRegReference(refPosition))
5456 useAssignedReg = true;
5462 regNumber foundReg = prevReg;
5463 assignPhysReg(regRec, currentInterval);
5464 refPosition->registerAssignment = genRegMask(foundReg);
5469 // Don't keep trying to allocate to this register
5470 currentInterval->assignedReg = nullptr;
5474 RegRecord* availablePhysRegInterval = nullptr;
5475 Interval* intervalToUnassign = nullptr;
5477 // Each register will receive a score which is the sum of the scoring criteria below.
5478 // These were selected on the assumption that they will have an impact on the "goodness"
5479 // of a register selection, and have been tuned to a certain extent by observing the impact
5480 // of the ordering on asmDiffs. However, there is probably much more room for tuning,
5481 // and perhaps additional criteria.
5483 // These are FLAGS (bits) so that we can easily order them and add them together.
5484 // If the scores are equal, but one covers more of the current interval's range,
5485 // then it wins. Otherwise, the one encountered earlier in the regOrder wins.
5489 VALUE_AVAILABLE = 0x40, // It is a constant value that is already in an acceptable register.
5490 COVERS = 0x20, // It is in the interval's preference set and it covers the entire lifetime.
5491 OWN_PREFERENCE = 0x10, // It is in the preference set of this interval.
5492 COVERS_RELATED = 0x08, // It is in the preference set of the related interval and covers the entire lifetime.
5493 RELATED_PREFERENCE = 0x04, // It is in the preference set of the related interval.
5494 CALLER_CALLEE = 0x02, // It is in the right "set" for the interval (caller or callee-save).
5495 UNASSIGNED = 0x01, // It is not currently assigned to an inactive interval.
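// Worked example (hypothetical candidates, for illustration only): suppose RCX and RDX
// are both in this interval's preference set and both are in the right caller/callee set,
// but a FixedReg RefPosition interferes with RCX partway through the lifetime:
//   RCX: OWN_PREFERENCE + CALLER_CALLEE + UNASSIGNED          = 0x10 + 0x02 + 0x01 = 0x13
//   RDX: COVERS + OWN_PREFERENCE + CALLER_CALLEE + UNASSIGNED = 0x20 + 0x10 + 0x02 + 0x01 = 0x33
// RDX wins, because covering the entire lifetime outranks everything below it in this ordering.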
5500 // Compute the best possible score so we can stop looping early if we find it.
5501 // TODO-Throughput: At some point we may want to short-circuit the computation of each score, but
5502 // probably not until we've tuned the order of these criteria. At that point,
5503 // we'll need to avoid the short-circuit if we've got a stress option to reverse
5504 // the selection order.
5505 int bestPossibleScore = COVERS + UNASSIGNED + OWN_PREFERENCE + CALLER_CALLEE;
5506 if (relatedPreferences != RBM_NONE)
5508 bestPossibleScore |= RELATED_PREFERENCE + COVERS_RELATED;
5511 LsraLocation bestLocation = MinLocation;
5513 // In non-debug builds, this will simply get optimized away
5514 bool reverseSelect = false;
5516 reverseSelect = doReverseSelect();
5519 // An optimization for the common case where there is only one candidate -
5520 // avoid looping over all the other registers
5522 regNumber singleReg = REG_NA;
5524 if (genMaxOneBit(candidates))
5527 singleReg = genRegNumFromMask(candidates);
5528 regOrder = &singleReg;
5531 for (unsigned i = 0; i < regOrderSize && (candidates != RBM_NONE); i++)
5533 regNumber regNum = regOrder[i];
5534 regMaskTP candidateBit = genRegMask(regNum);
5536 if (!(candidates & candidateBit))
5541 candidates &= ~candidateBit;
5543 RegRecord* physRegRecord = getRegisterRecord(regNum);
5546 LsraLocation nextPhysRefLocation = MaxLocation;
5548 // By chance, is this register already holding this interval, as a copyReg or having
5549 // been restored as inactive after a kill?
5550 if (physRegRecord->assignedInterval == currentInterval)
5552 availablePhysRegInterval = physRegRecord;
5553 intervalToUnassign = nullptr;
5557 // Find the next RefPosition of the physical register
5558 if (!registerIsAvailable(physRegRecord, currentLocation, &nextPhysRefLocation, regType))
5563 // If the register is next referenced at this location, only use it if
5564 // this has a fixed reg requirement (i.e. this is the reference that caused
5565 // the FixedReg ref to be created)
5567 if (physRegRecord->conflictingFixedRegReference(refPosition))
5572 // If this is a definition of a constant interval, check to see if its value is already in this register.
5573 if (currentInterval->isConstant && RefTypeIsDef(refPosition->refType) &&
5574 (physRegRecord->assignedInterval != nullptr) && physRegRecord->assignedInterval->isConstant)
5576 noway_assert(refPosition->treeNode != nullptr);
5577 GenTree* otherTreeNode = physRegRecord->assignedInterval->firstRefPosition->treeNode;
5578 noway_assert(otherTreeNode != nullptr);
5580 if (refPosition->treeNode->OperGet() == otherTreeNode->OperGet())
5582 switch (otherTreeNode->OperGet())
5585 if ((refPosition->treeNode->AsIntCon()->IconValue() ==
5586 otherTreeNode->AsIntCon()->IconValue()) &&
5587 (varTypeGCtype(refPosition->treeNode) == varTypeGCtype(otherTreeNode)))
5589 #ifdef _TARGET_64BIT_
5590 // If the constant is negative, only reuse registers of the same type.
5591 // This is because, on a 64-bit system, we do not sign-extend immediates in registers to
5592 // 64-bits unless they are actually longs, as this requires a longer instruction.
5593 // This doesn't apply to a 32-bit system, on which long values occupy multiple registers.
5594 // (We could sign-extend, but we would have to always sign-extend, because if we reuse more
5595 // than once, we won't have access to the instruction that originally defines the constant).
5596 if ((refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()) ||
5597 (refPosition->treeNode->AsIntCon()->IconValue() >= 0))
5598 #endif // _TARGET_64BIT_
5600 score |= VALUE_AVAILABLE;
5606 // For floating point constants, the values must be identical, not simply compare
5607 // equal. So we compare the bits.
5608 if (refPosition->treeNode->AsDblCon()->isBitwiseEqual(otherTreeNode->AsDblCon()) &&
5609 (refPosition->treeNode->TypeGet() == otherTreeNode->TypeGet()))
5611 score |= VALUE_AVAILABLE;
5616 // for all other 'otherTreeNode->OperGet()' kinds, we leave 'score' unchanged
5622 // If the nextPhysRefLocation is a fixedRef for the rangeEndRefPosition, increment it so that
5623 // we don't mistakenly conclude that it fails to cover the live range.
5624 // This doesn't handle the case where earlier RefPositions for this Interval are also
5625 // FixedRefs of this regNum, but at least those are only interesting in the case where those
5626 // are "local last uses" of the Interval - otherwise the liveRange would interfere with the reg.
5627 if (nextPhysRefLocation == rangeEndLocation && rangeEndRefPosition->isFixedRefOfReg(regNum))
5629 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_INCREMENT_RANGE_END, currentInterval, regNum));
5630 nextPhysRefLocation++;
5633 if ((candidateBit & preferences) != RBM_NONE)
5635 score |= OWN_PREFERENCE;
5636 if (nextPhysRefLocation > rangeEndLocation)
5641 if (relatedInterval != nullptr && (candidateBit & relatedPreferences) != RBM_NONE)
5643 score |= RELATED_PREFERENCE;
5644 if (nextPhysRefLocation > relatedInterval->lastRefPosition->nodeLocation)
5646 score |= COVERS_RELATED;
5650 // If we had a fixed-reg def of a reg that will be killed before the use, prefer it to any other registers
5651 // with the same score. (Note that we haven't changed the original registerAssignment on the RefPosition).
5652 // Overload the RELATED_PREFERENCE value.
5653 else if (candidateBit == refPosition->registerAssignment)
5655 score |= RELATED_PREFERENCE;
5658 if ((preferCalleeSave && physRegRecord->isCalleeSave) || (!preferCalleeSave && !physRegRecord->isCalleeSave))
5660 score |= CALLER_CALLEE;
5663 // The register is considered unassigned if it has no assignedInterval, OR
5664 // if its next reference is beyond the range of this interval.
5665 if (!isAssigned(physRegRecord, lastLocation ARM_ARG(currentInterval->registerType)))
5667 score |= UNASSIGNED;
5670 bool foundBetterCandidate = false;
5672 if (score > bestScore)
5674 foundBetterCandidate = true;
5676 else if (score == bestScore)
5678 // Prefer a register that covers the range.
5679 if (bestLocation <= lastLocation)
5681 if (nextPhysRefLocation > bestLocation)
5683 foundBetterCandidate = true;
5686 // If both cover the range, prefer a register that is killed sooner (leaving the longer range register
5687 // available). If both cover the range and are also killed at the same location, prefer the one
5688 // that is the same as the previous assignment.
5689 else if (nextPhysRefLocation > lastLocation)
5691 if (nextPhysRefLocation < bestLocation)
5693 foundBetterCandidate = true;
5695 else if (nextPhysRefLocation == bestLocation && prevReg == regNum)
5697 foundBetterCandidate = true;
5703 if (doReverseSelect() && bestScore != 0)
5705 foundBetterCandidate = !foundBetterCandidate;
5709 if (foundBetterCandidate)
5711 bestLocation = nextPhysRefLocation;
5712 availablePhysRegInterval = physRegRecord;
5713 intervalToUnassign = physRegRecord->assignedInterval;
5717 // there is no way we can get a better score so break out
5718 if (!reverseSelect && score == bestPossibleScore && bestLocation == rangeEndLocation + 1)
5724 if (availablePhysRegInterval != nullptr)
5726 if (isAssigned(availablePhysRegInterval ARM_ARG(currentInterval->registerType)))
5728 intervalToUnassign = availablePhysRegInterval->assignedInterval;
5729 unassignPhysReg(availablePhysRegInterval ARM_ARG(currentInterval->registerType));
5731 if ((bestScore & VALUE_AVAILABLE) != 0 && intervalToUnassign != nullptr)
5733 assert(intervalToUnassign->isConstant);
5734 refPosition->treeNode->SetReuseRegVal();
5736 // If we considered this "unassigned" because this interval's lifetime ends before
5737 // the next ref, remember it.
5738 else if ((bestScore & UNASSIGNED) != 0 && intervalToUnassign != nullptr)
5740 updatePreviousInterval(availablePhysRegInterval, intervalToUnassign, intervalToUnassign->registerType);
5745 assert((bestScore & VALUE_AVAILABLE) == 0);
5747 assignPhysReg(availablePhysRegInterval, currentInterval);
5748 foundReg = availablePhysRegInterval->regNum;
5749 regMaskTP foundRegMask = genRegMask(foundReg);
5750 refPosition->registerAssignment = foundRegMask;
5751 if (relatedInterval != nullptr)
5753 relatedInterval->updateRegisterPreferences(foundRegMask);
5759 //------------------------------------------------------------------------
5760 // canSpillReg: Determine whether we can spill physRegRecord
5763 // physRegRecord - reg to spill
5764 // refLocation - Location of RefPosition where this register will be spilled
5765 // recentAssignedRefWeight - [out] Weight of the recently-assigned RefPosition, computed by this function
5766 // farthestRefPosWeight - Current farthestRefPosWeight at allocateBusyReg()
5769 // True - if we can spill physRegRecord
5770 // False - otherwise
5772 // Note: This helper is designed to be used only from allocateBusyReg() and canSpillDoubleReg()
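// Worked example (hypothetical weights, for illustration only): with farthestRefPosWeight == 2,
// a register whose recentAssignedRef has weight 8 (say, a use inside a loop) is rejected as a
// spill victim, while one whose recentAssignedRef has weight 1 is accepted; in either case the
// weight is reported back through recentAssignedRefWeight so the caller can compare candidates.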
5774 bool LinearScan::canSpillReg(RegRecord* physRegRecord,
5775 LsraLocation refLocation,
5776 unsigned* recentAssignedRefWeight,
5777 unsigned farthestRefPosWeight)
5779 assert(physRegRecord->assignedInterval != nullptr);
5780 RefPosition* recentAssignedRef = physRegRecord->assignedInterval->recentRefPosition;
5782 if (recentAssignedRef != nullptr)
5784 if (recentAssignedRef->nodeLocation == refLocation)
5786 // We can't spill a register that's being used at the current location
5790 // If the candidate register is marked delayRegFree at its most recent reference, and that
5791 // reference is at the previous location, the register is still in use at the current
5792 // location, so we have to skip it - we can't spill this register.
5793 if (recentAssignedRef->delayRegFree && (refLocation == recentAssignedRef->nodeLocation + 1))
5798 // We don't want to spill a register if the weight of its recentAssignedRef is greater than the
5799 // weight of the spill candidate found so far. We would consider spilling a greater-weight
5800 // ref position only if the refPosition being allocated must have a register.
5801 *recentAssignedRefWeight = getWeight(recentAssignedRef);
5802 if (*recentAssignedRefWeight > farthestRefPosWeight)
5811 bool LinearScan::canSpillDoubleReg(RegRecord* physRegRecord,
5812 LsraLocation refLocation,
5813 unsigned* recentAssignedRefWeight,
5814 unsigned farthestRefPosWeight)
5817 unsigned weight = BB_ZERO_WEIGHT;
5818 unsigned weight2 = BB_ZERO_WEIGHT;
5820 RegRecord* physRegRecord2 = findAnotherHalfRegRec(physRegRecord);
5822 if (physRegRecord->assignedInterval != nullptr)
5823 retVal &= canSpillReg(physRegRecord, refLocation, &weight, farthestRefPosWeight);
5825 if (physRegRecord2->assignedInterval != nullptr)
5826 retVal &= canSpillReg(physRegRecord2, refLocation, &weight2, farthestRefPosWeight);
5828 if (!(weight == BB_ZERO_WEIGHT && weight2 == BB_ZERO_WEIGHT))
5830 // weight and/or weight2 have been updated.
5831 *recentAssignedRefWeight = (weight > weight2) ? weight : weight2;
5838 //----------------------------------------------------------------------------
5839 // checkActiveInterval: Test whether an interval is active,
5840 //                      and check assertions if it is not active
5843 // interval - An interval to be tested
5844 // refLocation - Location where the interval is being tested
5847 // True - iff the interval is active
5848 // False - otherwise
5850 // Note: This helper is designed to be used only from checkActiveIntervals()
5852 bool LinearScan::checkActiveInterval(Interval* interval, LsraLocation refLocation)
5854 if (!interval->isActive)
5856 RefPosition* recentAssignedRef = interval->recentRefPosition;
5857 // Note that we may or may not have actually handled the reference yet, so it could either
5858 // be recentAssignedRef, or the next reference.
5859 assert(recentAssignedRef != nullptr);
5860 if (recentAssignedRef->nodeLocation != refLocation)
5862 if (recentAssignedRef->nodeLocation + 1 == refLocation)
5864 assert(recentAssignedRef->delayRegFree);
5868 RefPosition* nextAssignedRef = recentAssignedRef->nextRefPosition;
5869 assert(nextAssignedRef != nullptr);
5870 assert(nextAssignedRef->nodeLocation == refLocation ||
5871 (nextAssignedRef->nodeLocation + 1 == refLocation && nextAssignedRef->delayRegFree));
5879 //----------------------------------------------------------------------------------------
5880 // checkActiveIntervals: Test whether the interval(s) assigned to a register are active,
5881 //                       and check assertions if an interval is not active.
5882 //                       For ARM32, we look at the intervals of both float registers
5883 //                       that make up a double register.
5886 //    physRegRecord - A register
5887 //    refLocation   - Location where the interval is being tested
5888 //    registerType  - Type of register
5891 //    True - iff all the intervals are active
5892 // False - otherwise
5894 // Note: This helper is designed to be used only from allocateBusyReg()
5896 bool LinearScan::checkActiveIntervals(RegRecord* physRegRecord, LsraLocation refLocation, RegisterType registerType)
5898 Interval* assignedInterval = physRegRecord->assignedInterval;
5901 // Check two intervals for a double register in ARM32
5902 Interval* assignedInterval2 = nullptr;
5903 if (registerType == TYP_DOUBLE)
5904 assignedInterval2 = findAnotherHalfRegRec(physRegRecord)->assignedInterval;
5906 // Both intervals should not be nullptr at the same time, because we already handled this case earlier.
5907 assert(!(assignedInterval == nullptr && assignedInterval2 == nullptr));
5909 if (assignedInterval != nullptr && !checkActiveInterval(assignedInterval, refLocation))
5912 if (assignedInterval2 != nullptr && !checkActiveInterval(assignedInterval2, refLocation))
5917 return checkActiveInterval(assignedInterval, refLocation);
5922 void LinearScan::unassignDoublePhysReg(RegRecord* doubleRegRecord)
5924 assert(genIsValidDoubleReg(doubleRegRecord->regNum));
5926 RegRecord* doubleRegRecordLo = doubleRegRecord;
5927 RegRecord* doubleRegRecordHi = findAnotherHalfRegRec(doubleRegRecordLo);
5928 // For a double register, there are the following four cases.
5929 // Case 1: doubleRegRecLo is assigned to a TYP_DOUBLE interval
5930 // Case 2: doubleRegRecLo and doubleRegRecHi are assigned to different TYP_FLOAT intervals
5931 // Case 3: doubleRegRecLo is assigned to a TYP_FLOAT interval and doubleRegRecHi is nullptr
5932 // Case 4: doubleRegRecLo is nullptr, and doubleRegRecHi is assigned to a TYP_FLOAT interval
5933 if (doubleRegRecordLo->assignedInterval != nullptr)
5935 if (doubleRegRecordLo->assignedInterval->registerType == TYP_DOUBLE)
5937 // Case 1: doubleRegRecLo is assigned to TYP_DOUBLE interval
5938 unassignPhysReg(doubleRegRecordLo, doubleRegRecordLo->assignedInterval->recentRefPosition);
5942 // Case 2: doubleRegRecLo and doubleRegRecHi are assigned to different TYP_FLOAT intervals
5943 // Case 3: doubleRegRecLo is assigned to a TYP_FLOAT interval and doubleRegRecHi is nullptr
5944 assert(doubleRegRecordLo->assignedInterval->registerType == TYP_FLOAT);
5945 unassignPhysReg(doubleRegRecordLo, doubleRegRecordLo->assignedInterval->recentRefPosition);
5947 if (doubleRegRecordHi != nullptr)
5949 if (doubleRegRecordHi->assignedInterval != nullptr)
5951 assert(doubleRegRecordHi->assignedInterval->registerType == TYP_FLOAT);
5952 unassignPhysReg(doubleRegRecordHi, doubleRegRecordHi->assignedInterval->recentRefPosition);
5959 // Case 4: doubleRegRecordLo is nullptr, and doubleRegRecordHi is assigned to a TYP_FLOAT interval
5960 assert(doubleRegRecordHi->assignedInterval != nullptr);
5961 assert(doubleRegRecordHi->assignedInterval->registerType == TYP_FLOAT);
5962 unassignPhysReg(doubleRegRecordHi, doubleRegRecordHi->assignedInterval->recentRefPosition);
5966 #endif // _TARGET_ARM_
5968 //----------------------------------------------------------------------------------------
5969 // isRegInUse: Test whether regRec is being used at the refPosition
5972 // regRec - A register to be tested
5973 // refPosition - RefPosition where regRec is tested
5974 //    nextLocation - [out] the location of the next RefPosition of the interval assigned to regRec
5977 //    True - if regRec is being used
5978 // False - otherwise
5980 // Note: This helper is designed to be used only from allocateBusyReg()
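// For example (a hypothetical stress scenario): under LSRA_SELECT_NEAREST, the interval occupying
// a candidate register may have its next use at this very node; if that use requires a register
// (and this RefPosition is not a fixed reference), spilling the occupant could not succeed, so
// the register is reported as in use.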
5982 bool LinearScan::isRegInUse(RegRecord* regRec, RefPosition* refPosition, LsraLocation* nextLocation)
5984 Interval* assignedInterval = regRec->assignedInterval;
5985 if (assignedInterval != nullptr)
5987 LsraLocation refLocation = refPosition->nodeLocation;
5988 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
5989 *nextLocation = assignedInterval->getNextRefLocation();
5991 // We should never spill a register that's occupied by an Interval with its next use at the current
5992 // location.
5993 // Normally this won't occur (unless we actually had more uses in a single node than there are registers),
5994 // because we'll always find something with a later nextLocation, but it can happen in stress when
5995 // we have LSRA_SELECT_NEAREST.
5996 if ((*nextLocation == refLocation) && !refPosition->isFixedRegRef && nextRefPosition->RequiresRegister())
6004 //------------------------------------------------------------------------
6005 // allocateBusyReg: Find a busy register that satisfies the requirements for refPosition,
6006 // and that can be spilled.
6009 // current The interval for the current allocation
6010 // refPosition The RefPosition of the current Interval for which a register is being allocated
6011 // allocateIfProfitable If true, a reg may not be allocated if all other ref positions currently
6012 // occupying registers are more important than the 'refPosition'.
6015 //    The regNumber allocated to the RefPosition. Returns REG_NA if no free register is found.
6017 // Note: Currently this routine uses weight and farthest distance of next reference
6018 // to select a ref position for spilling.
6019 //       a) if allocateIfProfitable = false
6020 //          The ref position chosen for spilling will be the one with the lowest
6021 //          weight of all; if there is more than one ref position with the
6022 //          same lowest weight, among them it chooses the one with the farthest
6023 //          distance to its next reference.
6025 //       b) if allocateIfProfitable = true
6026 //          The ref position chosen for spilling will not only have the lowest weight
6027 //          of all, but also a weight lower than that of 'refPosition'. If there is
6028 //          no such ref position, no register will be allocated.
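// Worked example (hypothetical weights, for illustration only): suppose the candidate registers
// are occupied by ref positions with weights {4, 1, 1}. In case (a), one of the weight-1
// positions is spilled, with the tie between them broken by the farther next reference.
// In case (b), with getWeight(refPosition) == 1, neither weight-1 position qualifies (its
// weight is not strictly lower), so no register is allocated and REG_NA is returned.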
6029 regNumber LinearScan::allocateBusyReg(Interval* current, RefPosition* refPosition, bool allocateIfProfitable)
6031 regNumber foundReg = REG_NA;
6033 RegisterType regType = getRegisterType(current, refPosition);
6034 regMaskTP candidates = refPosition->registerAssignment;
6035 regMaskTP preferences = (current->registerPreferences & candidates);
6036 if (preferences == RBM_NONE)
6038 preferences = candidates;
6040 if (candidates == RBM_NONE)
6042 // This assumes only integer and floating point register types
6043 // if we target a processor with additional register types,
6044 // this would have to change
6045 candidates = allRegs(regType);
6049 candidates = stressLimitRegs(refPosition, candidates);
6052 // TODO-CQ: Determine whether/how to take preferences into account in addition to
6053 // preferring the one with the furthest ref position when considering
6054 // a candidate to spill
6055 RegRecord* farthestRefPhysRegRecord = nullptr;
6057 RegRecord* farthestRefPhysRegRecord2 = nullptr;
6059 LsraLocation farthestLocation = MinLocation;
6060 LsraLocation refLocation = refPosition->nodeLocation;
6061 unsigned farthestRefPosWeight;
6062 if (allocateIfProfitable)
6064 // If allocating a reg is optional, we will consider those ref positions
6065 // whose weight is less than 'refPosition' for spilling.
6066 farthestRefPosWeight = getWeight(refPosition);
6070 // If allocating a reg is a must, we start off with max weight so
6071 // that the first spill candidate will be selected based on
6072 // farthest distance alone. Since we start off with farthestLocation
6073 // initialized to MinLocation, the first available ref position
6074 // will be selected as the spill candidate, and its weight as the
6075 // farthestRefPosWeight.
6076 farthestRefPosWeight = BB_MAX_WEIGHT;
6079 for (regNumber regNum : Registers(regType))
6081 regMaskTP candidateBit = genRegMask(regNum);
6082 if (!(candidates & candidateBit))
6086 RegRecord* physRegRecord = getRegisterRecord(regNum);
6088 RegRecord* physRegRecord2 = nullptr;
6089 // For ARM32, consider together the two float registers that make up a double register,
6090 // when allocating a double register.
6091 if (current->registerType == TYP_DOUBLE)
6093 assert(genIsValidDoubleReg(regNum));
6094 physRegRecord2 = findAnotherHalfRegRec(physRegRecord);
6098 if (physRegRecord->isBusyUntilNextKill)
6102 Interval* assignedInterval = physRegRecord->assignedInterval;
6104 Interval* assignedInterval2 = (physRegRecord2 == nullptr) ? nullptr : physRegRecord2->assignedInterval;
6107 // If there is a fixed reference at the same location (and it's not due to this reference),
6110 if (physRegRecord->conflictingFixedRegReference(refPosition))
6112 assert(candidates != candidateBit);
6116 LsraLocation physRegNextLocation = MaxLocation;
6117 if (refPosition->isFixedRefOfRegMask(candidateBit))
6119 // Either there is a fixed reference due to this node, or one associated with a
6120 // fixed use fed by a def at this node.
6121 // In either case, we must use this register as it's the only candidate
6122 // TODO-CQ: At the time we allocate a register to a fixed-reg def, if it's not going
6123 // to remain live until the use, we should set the candidates to allRegs(regType)
6124 // to avoid a spill - codegen can then insert the copy.
6125 assert(candidates == candidateBit);
6127 // If a refPosition has a fixed reg as its candidate and is also marked
6128 // as allocateIfProfitable, we should allocate fixed reg only if the
6129 // weight of this ref position is greater than the weight of the ref
6130 // position to which fixed reg is assigned. Such a case would arise
6131 // on x86 under LSRA stress.
6132 if (!allocateIfProfitable)
6134 physRegNextLocation = MaxLocation;
6135 farthestRefPosWeight = BB_MAX_WEIGHT;
6140 physRegNextLocation = physRegRecord->getNextRefLocation();
6142 // If refPosition requires a fixed register, we should reject all others.
6143 // Otherwise, we will still evaluate all physRegs even though their next location is
6144 // not better than the farthestLocation found so far.
6146 // TODO: this method should be using an approach similar to tryAllocateFreeReg()
6147 // where it uses a regOrder array to avoid iterating over any but the single
6149 if (refPosition->isFixedRegRef && physRegNextLocation < farthestLocation)
6155 // If this register is not assigned to an interval, either
6156 // - it has a FixedReg reference at the current location that is not this reference, OR
6157 // - this is the special case of a fixed loReg, where this interval has a use at the same location
6158 // In either case, we cannot use it
6159 CLANG_FORMAT_COMMENT_ANCHOR;
6162 if (assignedInterval == nullptr && assignedInterval2 == nullptr)
6164 if (assignedInterval == nullptr)
6167 RefPosition* nextPhysRegPosition = physRegRecord->getNextRefPosition();
6169 #ifndef _TARGET_ARM64_
6170 // TODO-Cleanup: Revisit this after Issue #3524 is complete
6171 // On ARM64 the nodeLocation is not always == refLocation; disabling this assert for now.
6172 assert(nextPhysRegPosition->nodeLocation == refLocation && candidateBit != candidates);
6178 RefPosition* recentAssignedRef = (assignedInterval == nullptr) ? nullptr : assignedInterval->recentRefPosition;
6179 RefPosition* recentAssignedRef2 =
6180 (assignedInterval2 == nullptr) ? nullptr : assignedInterval2->recentRefPosition;
6182 RefPosition* recentAssignedRef = assignedInterval->recentRefPosition;
6185 if (!checkActiveIntervals(physRegRecord, refLocation, current->registerType))
6190 // If we have a recentAssignedRef, check that it is going to be OK to spill it
6192 // TODO-Review: Under what conditions would recentAssignedRef be null?
6193 unsigned recentAssignedRefWeight = BB_ZERO_WEIGHT;
6196 if (current->registerType == TYP_DOUBLE)
6198 if (!canSpillDoubleReg(physRegRecord, refLocation, &recentAssignedRefWeight, farthestRefPosWeight))
6203 // This if-stmt is associated with the above else
6204 if (!canSpillReg(physRegRecord, refLocation, &recentAssignedRefWeight, farthestRefPosWeight))
6209 LsraLocation nextLocation = MinLocation;
6211 if (isRegInUse(physRegRecord, refPosition, &nextLocation))
6217 if (current->registerType == TYP_DOUBLE)
6219 LsraLocation nextLocation2 = MinLocation;
6220 if (isRegInUse(physRegRecord2, refPosition, &nextLocation2))
6224 nextLocation = (nextLocation > nextLocation2) ? nextLocation : nextLocation2;
6228 if (nextLocation > physRegNextLocation)
6230 nextLocation = physRegNextLocation;
6233 bool isBetterLocation;
6236 if (doSelectNearest() && farthestRefPhysRegRecord != nullptr)
6238 isBetterLocation = (nextLocation <= farthestLocation);
6242 // This if-stmt is associated with the above else
6243 if (recentAssignedRefWeight < farthestRefPosWeight)
6245 isBetterLocation = true;
6249 // This would mean the weight of the spill ref position we found so far is equal
6250 // to the weight of the ref position that is being evaluated. In this case
6251 // we prefer to spill the ref position whose distance to its next reference is
6252 // the farthest.
6253 assert(recentAssignedRefWeight == farthestRefPosWeight);
6255 // If allocateIfProfitable=true, the first spill candidate selected
6256 // will be based on weight alone. After we have found a spill
6257 // candidate whose weight is less than the 'refPosition', we will
6258 // consider farthest distance when there is a tie in weights.
6259 // This is to ensure that we don't spill a ref position whose
6260 // weight is equal to the weight of 'refPosition'.
6261 if (allocateIfProfitable && farthestRefPhysRegRecord == nullptr)
6263 isBetterLocation = false;
6267 isBetterLocation = (nextLocation > farthestLocation);
6269 if (nextLocation > farthestLocation)
6271 isBetterLocation = true;
6273 else if (nextLocation == farthestLocation)
6275 // Both weight and distance are equal.
6276 // Prefer that ref position which is marked both reload and
6277 // allocate if profitable. These ref positions don't need
6278 // to be spilled as they are already in memory and
6279 // codegen considers them as contained memory operands.
6280 CLANG_FORMAT_COMMENT_ANCHOR;
6282 // TODO-CQ-ARM: Just conservatively "and" the two conditions. We may implement a better condition later.
6283 isBetterLocation = true;
6284 if (recentAssignedRef != nullptr)
6285 isBetterLocation &= (recentAssignedRef->reload && recentAssignedRef->AllocateIfProfitable());
6287 if (recentAssignedRef2 != nullptr)
6288 isBetterLocation &= (recentAssignedRef2->reload && recentAssignedRef2->AllocateIfProfitable());
6290 isBetterLocation = (recentAssignedRef != nullptr) && recentAssignedRef->reload &&
6291 recentAssignedRef->AllocateIfProfitable();
6296 isBetterLocation = false;
6301 if (isBetterLocation)
6303 farthestLocation = nextLocation;
6304 farthestRefPhysRegRecord = physRegRecord;
6306 farthestRefPhysRegRecord2 = physRegRecord2;
6308 farthestRefPosWeight = recentAssignedRefWeight;
6313 if (allocateIfProfitable)
6315 // There may not be a spill candidate, or, if one is found,
6316 // its weight must be less than the weight of 'refPosition'.
6317 assert((farthestRefPhysRegRecord == nullptr) || (farthestRefPosWeight < getWeight(refPosition)));
6321 // Must have found a spill candidate.
6322 assert(farthestRefPhysRegRecord != nullptr);
6323 if ((farthestLocation == refLocation) && !refPosition->isFixedRegRef)
6326 Interval* assignedInterval =
6327 (farthestRefPhysRegRecord == nullptr) ? nullptr : farthestRefPhysRegRecord->assignedInterval;
6328 Interval* assignedInterval2 =
6329 (farthestRefPhysRegRecord2 == nullptr) ? nullptr : farthestRefPhysRegRecord2->assignedInterval;
6330 RefPosition* nextRefPosition =
6331 (assignedInterval == nullptr) ? nullptr : assignedInterval->getNextRefPosition();
6332 RefPosition* nextRefPosition2 =
6333 (assignedInterval2 == nullptr) ? nullptr : assignedInterval2->getNextRefPosition();
6334 if (nextRefPosition != nullptr)
6336 if (nextRefPosition2 != nullptr)
6338 assert(!nextRefPosition->RequiresRegister() || !nextRefPosition2->RequiresRegister());
6342 assert(!nextRefPosition->RequiresRegister());
6347 assert(nextRefPosition2 != nullptr && !nextRefPosition2->RequiresRegister());
6349 #else // !_TARGET_ARM_
6350 Interval* assignedInterval = farthestRefPhysRegRecord->assignedInterval;
6351 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
6352 assert(!nextRefPosition->RequiresRegister());
6353 #endif // !_TARGET_ARM_
6357 assert(farthestLocation > refLocation || refPosition->isFixedRegRef);
6362 if (farthestRefPhysRegRecord != nullptr)
6364 foundReg = farthestRefPhysRegRecord->regNum;
6367 if (current->registerType == TYP_DOUBLE)
6369 assert(genIsValidDoubleReg(foundReg));
6370 unassignDoublePhysReg(farthestRefPhysRegRecord);
6375 unassignPhysReg(farthestRefPhysRegRecord, farthestRefPhysRegRecord->assignedInterval->recentRefPosition);
6378 assignPhysReg(farthestRefPhysRegRecord, current);
6379 refPosition->registerAssignment = genRegMask(foundReg);
6384 refPosition->registerAssignment = RBM_NONE;
6390 // Grab a register to use for a copy, which will then be used immediately.
6391 // This is called only for localVar intervals that already have a register
6392 // assignment that is not compatible with the current RefPosition.
6393 // This is not like regular assignment, because we don't want to change
6394 // any preferences or existing register assignments.
6395 // Prefer a free register that's got the earliest next use.
6396 // Otherwise, spill something with the farthest next use
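// For example (hypothetical registers): a lclVar whose assigned home is REG_ESI, but whose
// current use must be in REG_ECX (e.g. a variable shift count on x86), gets a copyReg:
// ECX holds the value just for this use, while ESI remains the home register, so the
// interval's preferences and later uses are untouched.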
6398 regNumber LinearScan::assignCopyReg(RefPosition* refPosition)
6400 Interval* currentInterval = refPosition->getInterval();
6401 assert(currentInterval != nullptr);
6402 assert(currentInterval->isActive);
6404 bool foundFreeReg = false;
6405 RegRecord* bestPhysReg = nullptr;
6406 LsraLocation bestLocation = MinLocation;
6407 regMaskTP candidates = refPosition->registerAssignment;
6409 // Save the relatedInterval, if any, so that it doesn't get modified during allocation.
6410 Interval* savedRelatedInterval = currentInterval->relatedInterval;
6411 currentInterval->relatedInterval = nullptr;
6413 // We don't really want to change the default assignment,
6414 // so 1) pretend this isn't active, and 2) remember the old reg
6415 regNumber oldPhysReg = currentInterval->physReg;
6416 RegRecord* oldRegRecord = currentInterval->assignedReg;
6417 assert(oldRegRecord->regNum == oldPhysReg);
6418 currentInterval->isActive = false;
6420 regNumber allocatedReg = tryAllocateFreeReg(currentInterval, refPosition);
6421 if (allocatedReg == REG_NA)
6423 allocatedReg = allocateBusyReg(currentInterval, refPosition, false);
6426 // Now restore the old info
6427 currentInterval->relatedInterval = savedRelatedInterval;
6428 currentInterval->physReg = oldPhysReg;
6429 currentInterval->assignedReg = oldRegRecord;
6430 currentInterval->isActive = true;
6432 refPosition->copyReg = true;
6433 return allocatedReg;
6436 //------------------------------------------------------------------------
6437 // isAssigned: Check whether the given RegRecord has an assignedInterval,
6438 //             regardless of lastLocation.
6439 //             This simply calls the two-argument isAssigned() with MaxLocation.
6442 //    regRec     - The RegRecord to check
6443 //    newRegType - The RegisterType of the interval that would be assigned
6446 //    Returns true if the given RegRecord has an assignedInterval.
6449 //    Use this overload when the check should not be bounded by a last location.
6451 bool LinearScan::isAssigned(RegRecord* regRec ARM_ARG(RegisterType newRegType))
6453 return isAssigned(regRec, MaxLocation ARM_ARG(newRegType));
6456 //------------------------------------------------------------------------
6457 // isAssigned: Check whether the given RegRecord has an assignedInterval
6458 // that has a reference prior to the given location.
6461 // regRec - The RegRecord of interest
6462 // lastLocation - The LsraLocation up to which we want to check
6463 // newRegType - The `RegisterType` of interval we want to check
6464 // (this is for the purposes of checking the other half of a TYP_DOUBLE RegRecord)
6467 // Returns true if the given RegRecord (and its other half, if TYP_DOUBLE) has an assignedInterval
6468 // that is referenced prior to the given location
6471 // The register is not considered to be assigned if it has no assignedInterval, or that Interval's
6472 // next reference is beyond lastLocation
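// For example (hypothetical locations): if the assignedInterval's next reference is at
// location 50 but lastLocation is 40, the register is treated as not assigned for this
// query, since any conflict lies beyond the range the caller cares about.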
6474 bool LinearScan::isAssigned(RegRecord* regRec, LsraLocation lastLocation ARM_ARG(RegisterType newRegType))
6476 Interval* assignedInterval = regRec->assignedInterval;
6478 if ((assignedInterval == nullptr) || assignedInterval->getNextRefLocation() > lastLocation)
6481 if (newRegType == TYP_DOUBLE)
6483 RegRecord* anotherRegRec = findAnotherHalfRegRec(regRec);
6485 if ((anotherRegRec->assignedInterval == nullptr) ||
6486 (anotherRegRec->assignedInterval->getNextRefLocation() > lastLocation))
6488 // For a TYP_DOUBLE, the register is only considered unassigned
6489 // if the other half of the double register is also unassigned.
6503 // Check if the register is already assigned to another interval; if so, unassign it,
6504 // and then set its assignedInterval to 'interval'.
6506 void LinearScan::checkAndAssignInterval(RegRecord* regRec, Interval* interval)
6508 if (regRec->assignedInterval != nullptr && regRec->assignedInterval != interval)
6510 // This is allocated to another interval. Either it is inactive, or it was allocated as a
6511 // copyReg and is therefore not the "assignedReg" of the other interval. In the latter case,
6512 // we simply unassign it - in the former case we need to set the physReg on the interval to
6513 // REG_NA to indicate that it is no longer in that register.
6514 // The lack of checking for this case resulted in an assert in the retail version of System.dll,
6515 // in method SerialStream.GetDcbFlag.
6516 // Note that we can't check for the copyReg case, because we may have seen a more recent
6517 // RefPosition for the Interval that was NOT a copyReg.
6518 if (regRec->assignedInterval->assignedReg == regRec)
6520 assert(regRec->assignedInterval->isActive == false);
6521 regRec->assignedInterval->physReg = REG_NA;
6523 unassignPhysReg(regRec->regNum);
6526 updateAssignedInterval(regRec, interval, interval->registerType);
6529 // Assign the given physical register record to the given interval
6530 void LinearScan::assignPhysReg(RegRecord* regRec, Interval* interval)
6532 regMaskTP assignedRegMask = genRegMask(regRec->regNum);
6533 compiler->codeGen->regSet.rsSetRegsModified(assignedRegMask DEBUGARG(dumpTerse));
6535 checkAndAssignInterval(regRec, interval);
6536 interval->assignedReg = regRec;
6538 interval->physReg = regRec->regNum;
6539 interval->isActive = true;
6540 if (interval->isLocalVar)
6542 // Prefer this register for future references
6543 interval->updateRegisterPreferences(assignedRegMask);
6547 //------------------------------------------------------------------------
6548 // setIntervalAsSplit: Set this Interval as being split
6551 // interval - The Interval which is being split
6557 // The given Interval will be marked as split, and it will be added to the
6558 // set of splitOrSpilledVars.
6561 // "interval" must be a lclVar interval, as tree temps are never split.
6562 // This is asserted in the call to getVarIndex().
6564 void LinearScan::setIntervalAsSplit(Interval* interval)
6566 if (interval->isLocalVar)
6568 unsigned varIndex = interval->getVarIndex(compiler);
6569 if (!interval->isSplit)
6571 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6575 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6578 interval->isSplit = true;
6581 //------------------------------------------------------------------------
6582 // setIntervalAsSpilled: Set this Interval as being spilled
6585 // interval - The Interval which is being spilled
6591 // The given Interval will be marked as spilled, and it will be added
6592 // to the set of splitOrSpilledVars.
6594 void LinearScan::setIntervalAsSpilled(Interval* interval)
6596 if (interval->isLocalVar)
6598 unsigned varIndex = interval->getVarIndex(compiler);
6599 if (!interval->isSpilled)
6601 VarSetOps::AddElemD(compiler, splitOrSpilledVars, varIndex);
6605 assert(VarSetOps::IsMember(compiler, splitOrSpilledVars, varIndex));
6608 interval->isSpilled = true;
6611 //------------------------------------------------------------------------
6612 // spill: Spill this Interval between "fromRefPosition" and "toRefPosition"
6615 // fromRefPosition - The RefPosition at which the Interval is to be spilled
6616 // toRefPosition - The RefPosition at which it must be reloaded
6622 // fromRefPosition and toRefPosition must not be null
6624 void LinearScan::spillInterval(Interval* interval, RefPosition* fromRefPosition, RefPosition* toRefPosition)
6626 assert(fromRefPosition != nullptr && toRefPosition != nullptr);
6627 assert(fromRefPosition->getInterval() == interval && toRefPosition->getInterval() == interval);
6628 assert(fromRefPosition->nextRefPosition == toRefPosition);
6630 if (!fromRefPosition->lastUse)
6632 // Lcl var def/use ref positions, even if reg-optional, should be marked
6633 // as spillAfter if they are not allocated a register.
6634 if (!fromRefPosition->RequiresRegister() && !(interval->isLocalVar && fromRefPosition->IsActualRef()))
6636 fromRefPosition->registerAssignment = RBM_NONE;
6640 fromRefPosition->spillAfter = true;
6643 assert(toRefPosition != nullptr);
6648 dumpLsraAllocationEvent(LSRA_EVENT_SPILL, interval);
6652 INTRACK_STATS(updateLsraStat(LSRA_STAT_SPILL, fromRefPosition->bbNum));
6654 interval->isActive = false;
6655 setIntervalAsSpilled(interval);
6657 // If fromRefPosition occurs before the beginning of this block, mark this interval
6658 // as living on the stack on entry to this block.
6659 if (fromRefPosition->nodeLocation <= curBBStartLocation)
6661 // This must be a lclVar interval
6662 assert(interval->isLocalVar);
6663 setInVarRegForBB(curBBNum, interval->varNum, REG_STK);
6667 //------------------------------------------------------------------------
6668 // unassignPhysRegNoSpill: Unassign the given physical register record from
6669 // an active interval, without spilling.
6672 //    regRec - the RegRecord to be unassigned
6678 // The assignedInterval must not be null, and must be active.
6681 // This method is used to unassign a register when an interval needs to be moved to a
6682 // different register, but not (yet) spilled.
6684 void LinearScan::unassignPhysRegNoSpill(RegRecord* regRec)
6686 Interval* assignedInterval = regRec->assignedInterval;
6687 assert(assignedInterval != nullptr && assignedInterval->isActive);
6688 assignedInterval->isActive = false;
6689 unassignPhysReg(regRec, nullptr);
6690 assignedInterval->isActive = true;
6693 //------------------------------------------------------------------------
6694 // checkAndClearInterval: Clear the assignedInterval for the given
6695 // physical register record
6698 //    regRec           - the physical RegRecord to be unassigned
6699 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6700 // or nullptr if we aren't spilling
6706 // see unassignPhysReg
6708 void LinearScan::checkAndClearInterval(RegRecord* regRec, RefPosition* spillRefPosition)
6710 Interval* assignedInterval = regRec->assignedInterval;
6711 assert(assignedInterval != nullptr);
6712 regNumber thisRegNum = regRec->regNum;
6714 if (spillRefPosition == nullptr)
6716 // Note that we can't assert for the copyReg case
6718 if (assignedInterval->physReg == thisRegNum)
6720 assert(assignedInterval->isActive == false);
6725 assert(spillRefPosition->getInterval() == assignedInterval);
6728 updateAssignedInterval(regRec, nullptr, assignedInterval->registerType);
6731 //------------------------------------------------------------------------
6732 // unassignPhysReg: Unassign the given physical register record, and spill the
6733 // assignedInterval at the given spillRefPosition, if any.
6736 //    regRec     - The RegRecord to be unassigned
6737 //    newRegType - The RegisterType of the interval that would be assigned
6743 //    On ARM, Intervals have to be unassigned taking into account
6744 //    the register type of the interval that would be assigned.
6746 void LinearScan::unassignPhysReg(RegRecord* regRec ARM_ARG(RegisterType newRegType))
6748 RegRecord* regRecToUnassign = regRec;
6750 RegRecord* anotherRegRec = nullptr;
6752 if ((regRecToUnassign->assignedInterval != nullptr) &&
6753 (regRecToUnassign->assignedInterval->registerType == TYP_DOUBLE))
6755 // If the register type of the interval (the one being unassigned, or the new one) is TYP_DOUBLE,
6756 // the register must be a valid double register (i.e. an even-numbered one)
6757 if (!genIsValidDoubleReg(regRecToUnassign->regNum))
6759 regRecToUnassign = findAnotherHalfRegRec(regRec);
6764 if (newRegType == TYP_DOUBLE)
6766 anotherRegRec = findAnotherHalfRegRec(regRecToUnassign);
6771 if (regRecToUnassign->assignedInterval != nullptr)
6773 unassignPhysReg(regRecToUnassign, regRecToUnassign->assignedInterval->recentRefPosition);
6776 if ((anotherRegRec != nullptr) && (anotherRegRec->assignedInterval != nullptr))
6778 unassignPhysReg(anotherRegRec, anotherRegRec->assignedInterval->recentRefPosition);
6783 //------------------------------------------------------------------------
6784 // unassignPhysReg: Unassign the given physical register record, and spill the
6785 // assignedInterval at the given spillRefPosition, if any.
6788 //    regRec           - the RegRecord to be unassigned
6789 // spillRefPosition - The RefPosition at which the assignedInterval is to be spilled
6795 // The assignedInterval must not be null.
6796 // If spillRefPosition is null, the assignedInterval must be inactive, or not currently
6797 // assigned to this register (e.g. this is a copyReg for that Interval).
6798 // Otherwise, spillRefPosition must be associated with the assignedInterval.
6800 void LinearScan::unassignPhysReg(RegRecord* regRec, RefPosition* spillRefPosition)
6802 Interval* assignedInterval = regRec->assignedInterval;
6803 assert(assignedInterval != nullptr);
6805 regNumber thisRegNum = regRec->regNum;
6808 RegRecord* anotherRegRec = nullptr;
6810 // Prepare second half RegRecord of a double register for TYP_DOUBLE
6811 if (assignedInterval->registerType == TYP_DOUBLE)
6813 assert(isFloatRegType(regRec->registerType));
6815 anotherRegRec = findAnotherHalfRegRec(regRec);
6817 // Both RegRecords should have been assigned to the same interval.
6818 assert(assignedInterval == anotherRegRec->assignedInterval);
6820 #endif // _TARGET_ARM_
6822 checkAndClearInterval(regRec, spillRefPosition);
6825 if (assignedInterval->registerType == TYP_DOUBLE)
6827 // Both RegRecords should have been unassigned together.
6828 assert(regRec->assignedInterval == nullptr);
6829 assert(anotherRegRec->assignedInterval == nullptr);
6831 #endif // _TARGET_ARM_
6834 if (VERBOSE && !dumpTerse)
6836 printf("unassigning %s: ", getRegName(regRec->regNum));
6837 assignedInterval->dump();
6842 RefPosition* nextRefPosition = nullptr;
6843 if (spillRefPosition != nullptr)
6845 nextRefPosition = spillRefPosition->nextRefPosition;
6848 if (assignedInterval->physReg != REG_NA && assignedInterval->physReg != thisRegNum)
6850 // This must have been a temporary copy reg, but we can't assert that because there
6851 // may have been intervening RefPositions that were not copyRegs.
6853 // reg->assignedInterval has already been set to nullptr by checkAndClearInterval()
6854 assert(regRec->assignedInterval == nullptr);
6858 regNumber victimAssignedReg = assignedInterval->physReg;
6859 assignedInterval->physReg = REG_NA;
6861 bool spill = assignedInterval->isActive && nextRefPosition != nullptr;
6864 // If this is an active interval, it must have a recentRefPosition,
6865 // otherwise it would not be active
6866 assert(spillRefPosition != nullptr);
6869 // TODO-CQ: Enable this and insert an explicit GT_COPY (otherwise there's no way to communicate
6870 // to codegen that we want the copyReg to be the new home location).
6871 // If the last reference was a copyReg, and we're spilling the register
6872 // it was copied from, then make the copyReg the new primary location
6874 if (spillRefPosition->copyReg)
6876 regNumber copyFromRegNum = victimAssignedReg;
6877 regNumber copyRegNum = genRegNumFromMask(spillRefPosition->registerAssignment);
6878 if (copyFromRegNum == thisRegNum &&
6879 getRegisterRecord(copyRegNum)->assignedInterval == assignedInterval)
6881 assert(copyRegNum != thisRegNum);
6882 assignedInterval->physReg = copyRegNum;
6883 assignedInterval->assignedReg = this->getRegisterRecord(copyRegNum);
6889 // With JitStressRegs == 0x80 (LSRA_EXTEND_LIFETIMES), we may have a RefPosition
6890 // that is not marked lastUse even though the treeNode is a lastUse. In that case
6891 // we must not mark it for spill because the register will have been immediately freed
6892 // after use. While we could conceivably add special handling for this case in codegen,
6893 // it would be messy and undesirably cause the "bleeding" of LSRA stress modes outside
6894 // of LSRA.
6895 if (extendLifetimes() && assignedInterval->isLocalVar && RefTypeIsUse(spillRefPosition->refType) &&
6896 spillRefPosition->treeNode != nullptr && (spillRefPosition->treeNode->gtFlags & GTF_VAR_DEATH) != 0)
6898 dumpLsraAllocationEvent(LSRA_EVENT_SPILL_EXTENDED_LIFETIME, assignedInterval);
6899 assignedInterval->isActive = false;
6901 // If the spillRefPosition occurs before the beginning of this block, it will have
6902 // been marked as living in this register on entry to this block, but we now need
6903 // to mark this as living on the stack.
6904 if (spillRefPosition->nodeLocation <= curBBStartLocation)
6906 setInVarRegForBB(curBBNum, assignedInterval->varNum, REG_STK);
6907 if (spillRefPosition->nextRefPosition != nullptr)
6909 setIntervalAsSpilled(assignedInterval);
6914 // Otherwise, we need to mark spillRefPosition as lastUse, or the interval
6915 // will remain active beyond its allocated range during the resolution phase.
6916 spillRefPosition->lastUse = true;
6922 spillInterval(assignedInterval, spillRefPosition, nextRefPosition);
6925 // Maintain the association with the interval, if it has more references.
6926 // Or, if we "remembered" an interval assigned to this register, restore it.
6927 if (nextRefPosition != nullptr)
6929 assignedInterval->assignedReg = regRec;
6931 else if (canRestorePreviousInterval(regRec, assignedInterval))
6933 regRec->assignedInterval = regRec->previousInterval;
6934 regRec->previousInterval = nullptr;
6938 // We cannot use updateAssignedInterval() and updatePreviousInterval() here,
6939 // because regRec may not be an even-numbered float register.
6941 // Update second half RegRecord of a double register for TYP_DOUBLE
6942 if (regRec->assignedInterval->registerType == TYP_DOUBLE)
6944 RegRecord* anotherHalfRegRec = findAnotherHalfRegRec(regRec);
6946 anotherHalfRegRec->assignedInterval = regRec->assignedInterval;
6947 anotherHalfRegRec->previousInterval = nullptr;
6949 #endif // _TARGET_ARM_
6954 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL, regRec->assignedInterval,
6959 dumpLsraAllocationEvent(LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL, regRec->assignedInterval, thisRegNum);
6965 updateAssignedInterval(regRec, nullptr, assignedInterval->registerType);
6966 updatePreviousInterval(regRec, nullptr, assignedInterval->registerType);
6970 //------------------------------------------------------------------------
6971 // spillGCRefs: Spill any GC-type intervals that are currently in registers.
6974 // killRefPosition - The RefPosition for the kill
6979 void LinearScan::spillGCRefs(RefPosition* killRefPosition)
6981 // For each physical register that can hold a GC type,
6982 // if it is occupied by an interval of a GC type, spill that interval.
6983 regMaskTP candidateRegs = killRefPosition->registerAssignment;
6984 while (candidateRegs != RBM_NONE)
6986 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
6987 candidateRegs &= ~nextRegBit;
6988 regNumber nextReg = genRegNumFromMask(nextRegBit);
6989 RegRecord* regRecord = getRegisterRecord(nextReg);
6990 Interval* assignedInterval = regRecord->assignedInterval;
6991 if (assignedInterval == nullptr || (assignedInterval->isActive == false) ||
6992 !varTypeIsGC(assignedInterval->registerType))
6996 unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
6998 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_DONE_KILL_GC_REFS, nullptr, REG_NA, nullptr));
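// The loop above uses the usual mask-peeling idiom; a minimal standalone sketch, assuming only
// the genFindLowestBit/genRegNumFromMask helpers used above ('process' is a placeholder):
//
//     while (mask != RBM_NONE)
//     {
//         regMaskTP lowBit = genFindLowestBit(mask);
//         mask &= ~lowBit;
//         process(genRegNumFromMask(lowBit));
//     }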
7001 //------------------------------------------------------------------------
7002 // processBlockEndAllocation: Update var locations after 'currentBlock' has been allocated
7005 // currentBlock - the BasicBlock we have just finished allocating registers for
7011 // Calls processBlockEndLocations() to set the outVarToRegMap, then gets the next block,
7012 // and sets the inVarToRegMap appropriately.
7014 void LinearScan::processBlockEndAllocation(BasicBlock* currentBlock)
7016 assert(currentBlock != nullptr);
7017 if (enregisterLocalVars)
7019 processBlockEndLocations(currentBlock);
7021 markBlockVisited(currentBlock);
7023 // Get the next block to allocate.
7024 // When the last block in the method has successors, there will be a final "RefTypeBB" to
7025 // ensure that we get the varToRegMap set appropriately, but in that case we don't need
7026 // to worry about "nextBlock".
7027 BasicBlock* nextBlock = getNextBlock();
7028 if (nextBlock != nullptr)
7030 processBlockStartLocations(nextBlock, true);
7034 //------------------------------------------------------------------------
7035 // rotateBlockStartLocation: When in the LSRA_BLOCK_BOUNDARY_ROTATE stress mode, attempt to
7036 // "rotate" the register assignment for a localVar to the next higher
7037 // register that is available.
7040 // interval - the Interval for the variable whose register is getting rotated
7041 // targetReg - its register assignment from the predecessor block being used for live-in
7042 // availableRegs - registers available for use
7045 // The new register to use.
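// For example (hypothetical registers): with availableRegs = {RDX, RSI} and targetReg = RDX,
// the rotation selects RSI, the next higher available register; if targetReg were RSI (the
// highest available), it would wrap around to RDX, the first available register.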
7048 regNumber LinearScan::rotateBlockStartLocation(Interval* interval, regNumber targetReg, regMaskTP availableRegs)
7050 if (targetReg != REG_STK && getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE)
7052 // If we're rotating the register locations at block boundaries, try to use
7053 // the next higher register number of the appropriate register type.
7054 regMaskTP candidateRegs = allRegs(interval->registerType) & availableRegs;
7055 regNumber firstReg = REG_NA;
7056 regNumber newReg = REG_NA;
7057 while (candidateRegs != RBM_NONE)
7059 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
7060 candidateRegs &= ~nextRegBit;
7061 regNumber nextReg = genRegNumFromMask(nextRegBit);
7062 if (nextReg > targetReg)
7067 else if (firstReg == REG_NA)
7072 if (newReg == REG_NA)
7074 assert(firstReg != REG_NA);
7084 //--------------------------------------------------------------------------------------
7085 // isSecondHalfReg: Test whether regRec is the second half of a double register
7086 //                  that is assigned to an interval.
7089 // regRec - a register to be tested
7090 // interval - an interval which is assigned to some register
7096 //    True only if regRec is the second half of the interval's assignedReg
7098 bool LinearScan::isSecondHalfReg(RegRecord* regRec, Interval* interval)
7100 RegRecord* assignedReg = interval->assignedReg;
7102 if (assignedReg != nullptr && interval->registerType == TYP_DOUBLE)
7104 // interval should have been allocated to a valid double register
7105 assert(genIsValidDoubleReg(assignedReg->regNum));
7107 // Find the second half RegRecord of the double register
7108 regNumber firstRegNum = assignedReg->regNum;
7109 regNumber secondRegNum = REG_NEXT(firstRegNum);
7111 assert(genIsValidFloatReg(secondRegNum) && !genIsValidDoubleReg(secondRegNum));
7113 RegRecord* secondRegRec = getRegisterRecord(secondRegNum);
7115 return secondRegRec == regRec;
7121 //------------------------------------------------------------------------------------------
7122 // findAnotherHalfRegRec: Find the other half RegRecord that forms the same ARM32 double register
7125 //    regRec - A float RegRecord
7131 //    The RegRecord that forms the same double register as regRec
7133 RegRecord* LinearScan::findAnotherHalfRegRec(RegRecord* regRec)
7135 regNumber anotherHalfRegNum;
7136 RegRecord* anotherHalfRegRec;
7138 assert(genIsValidFloatReg(regRec->regNum));
7140 // Find the other half register for a TYP_DOUBLE interval,
7141 // following the same logic as canRestorePreviousInterval().
7142 if (genIsValidDoubleReg(regRec->regNum))
7144 anotherHalfRegNum = REG_NEXT(regRec->regNum);
7145 assert(!genIsValidDoubleReg(anotherHalfRegNum));
7149 anotherHalfRegNum = REG_PREV(regRec->regNum);
7150 assert(genIsValidDoubleReg(anotherHalfRegNum));
7152 anotherHalfRegRec = getRegisterRecord(anotherHalfRegNum);
7154 return anotherHalfRegRec;
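// For example: on ARM32, each double register overlays a pair of float registers
// (d0 = s0:s1, d1 = s2:s3, ...); given the RegRecord for the even (valid double)
// register of a pair this returns the odd one, and vice versa.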
7158 //--------------------------------------------------------------------------------------
7159 // canRestorePreviousInterval: Test whether we can restore the previous interval
7162 //    regRec           - a register whose previous interval may be restored
7163 //    assignedInterval - an interval that was just unassigned
7169 //    True only if the previous interval of regRec can be restored
7171 bool LinearScan::canRestorePreviousInterval(RegRecord* regRec, Interval* assignedInterval)
7174 (regRec->previousInterval != nullptr && regRec->previousInterval != assignedInterval &&
7175 regRec->previousInterval->assignedReg == regRec && regRec->previousInterval->getNextRefPosition() != nullptr);
7178 if (retVal && regRec->previousInterval->registerType == TYP_DOUBLE)
7180 RegRecord* anotherHalfRegRec = findAnotherHalfRegRec(regRec);
7182 retVal = retVal && anotherHalfRegRec->assignedInterval == nullptr;
7189 bool LinearScan::isAssignedToInterval(Interval* interval, RegRecord* regRec)
7191 bool isAssigned = (interval->assignedReg == regRec);
7193 isAssigned |= isSecondHalfReg(regRec, interval);
7198 //------------------------------------------------------------------------
7199 // processBlockStartLocations: Update var locations on entry to 'currentBlock' and clear constant
7203 // currentBlock - the BasicBlock we are about to allocate registers for
7204 // allocationPass - true if we are currently allocating registers (versus writing them back)
7210 // During the allocation pass, we use the outVarToRegMap of the selected predecessor to
7211 // determine the lclVar locations for the inVarToRegMap.
7212 // During the resolution (write-back) pass, we only modify the inVarToRegMap in cases where
7213 // a lclVar was spilled after the block had been completed.
7214 void LinearScan::processBlockStartLocations(BasicBlock* currentBlock, bool allocationPass)
7216 // If we have no register candidates we should only call this method during allocation.
7218 assert(enregisterLocalVars || allocationPass);
7220 if (!enregisterLocalVars)
7222 // Just clear any constant registers and return.
7223 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7225 RegRecord* physRegRecord = getRegisterRecord(reg);
7226 Interval* assignedInterval = physRegRecord->assignedInterval;
7228 if (assignedInterval != nullptr)
7230 assert(assignedInterval->isConstant);
7231 physRegRecord->assignedInterval = nullptr;
7234 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
7238 unsigned predBBNum = blockInfo[currentBlock->bbNum].predBBNum;
7239 VarToRegMap predVarToRegMap = getOutVarToRegMap(predBBNum);
7240 VarToRegMap inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
7241 bool hasCriticalInEdge = blockInfo[currentBlock->bbNum].hasCriticalInEdge;
7243 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
7244 VarSetOps::Intersection(compiler, registerCandidateVars, currentBlock->bbLiveIn));
7246 if (getLsraExtendLifeTimes())
7248 VarSetOps::AssignNoCopy(compiler, currentLiveVars, registerCandidateVars);
7250 // If we are rotating register assignments at block boundaries, we want to make the
7251 // inactive registers available for the rotation.
7252 regMaskTP inactiveRegs = RBM_NONE;
7254 regMaskTP liveRegs = RBM_NONE;
7255 VarSetOps::Iter iter(compiler, currentLiveVars);
7256 unsigned varIndex = 0;
7257 while (iter.NextElem(&varIndex))
7259 unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
7260 if (!compiler->lvaTable[varNum].lvLRACandidate)
7264 regNumber targetReg;
7265 Interval* interval = getIntervalForLocalVar(varIndex);
7266 RefPosition* nextRefPosition = interval->getNextRefPosition();
7267 assert(nextRefPosition != nullptr);
7271 targetReg = getVarReg(predVarToRegMap, varIndex);
7273 regNumber newTargetReg = rotateBlockStartLocation(interval, targetReg, (~liveRegs | inactiveRegs));
7274 if (newTargetReg != targetReg)
7276 targetReg = newTargetReg;
7277 setIntervalAsSplit(interval);
7280 setVarReg(inVarToRegMap, varIndex, targetReg);
7282 else // !allocationPass (i.e. resolution/write-back pass)
7284 targetReg = getVarReg(inVarToRegMap, varIndex);
7285 // There are four cases that we need to consider during the resolution pass:
7286 // 1. This variable had a register allocated initially, and it was not spilled in the RefPosition
7287 // that feeds this block. In this case, both targetReg and predVarToRegMap[varIndex] will be targetReg.
7288 // 2. This variable had not been spilled prior to the end of predBB, but was later spilled, so
7289 // predVarToRegMap[varIndex] will be REG_STK, but targetReg is its former allocated value.
7290 // In this case, we will normally change it to REG_STK. We will update its "spilled" status when we
7291 // encounter it in resolveLocalRef().
7292 // 2a. If the next RefPosition is marked as a copyReg, we need to retain the allocated register. This is
7293 // because the copyReg RefPosition will not have recorded the "home" register, yet downstream
7294 // RefPositions rely on the correct "home" register.
7295 // 3. This variable was spilled before we reached the end of predBB. In this case, both targetReg and
7296 // predVarToRegMap[varIndex] will be REG_STK, and the next RefPosition will have been marked
7297 // as reload during allocation time if necessary (note that by the time we actually reach the next
7298 // RefPosition, we may be using a different predecessor, at which point it may still be in a register).
7299 // 4. This variable was spilled during the allocation of this block, so targetReg is REG_STK
7300 // (because we set inVarToRegMap at the time we spilled it), but predVarToRegMap[varIndex]
7301 // is not REG_STK. We retain the REG_STK value in the inVarToRegMap.
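// For illustration, a sketch of case 4 with hypothetical names: suppose V01 entered
// this block in REG_ESI (so predVarToRegMap[varIndex] == REG_ESI), but was spilled
// while allocating this block, so that inVarToRegMap[varIndex] was set to REG_STK.
// On this (resolution) pass targetReg is REG_STK while predVarToRegMap[varIndex] is
// not, and per case 4 above we simply retain the REG_STK value.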
7302 if (targetReg != REG_STK)
7304 if (getVarReg(predVarToRegMap, varIndex) != REG_STK)
7307 assert(getVarReg(predVarToRegMap, varIndex) == targetReg ||
7308 getLsraBlockBoundaryLocations() == LSRA_BLOCK_BOUNDARY_ROTATE);
7310 else if (!nextRefPosition->copyReg)
7313 setVarReg(inVarToRegMap, varIndex, REG_STK);
7314 targetReg = REG_STK;
7316 // Else case 2a. - retain targetReg.
7318 // Else case #3 or #4, we retain targetReg and nothing further to do or assert.
7320 if (interval->physReg == targetReg)
7322 if (interval->isActive)
7324 assert(targetReg != REG_STK);
7325 assert(interval->assignedReg != nullptr && interval->assignedReg->regNum == targetReg &&
7326 interval->assignedReg->assignedInterval == interval);
7327 liveRegs |= genRegMask(targetReg);
7331 else if (interval->physReg != REG_NA)
7333 // This can happen if we are using the locations from a basic block other than the
7334 // immediately preceding one - where the variable was in a different location.
7335 if (targetReg != REG_STK)
7337 // Unassign it from the register (it will get a new register below).
7338 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
7340 interval->isActive = false;
7341 unassignPhysReg(getRegisterRecord(interval->physReg), nullptr);
7345 // This interval was live in this register the last time we saw a reference to it,
7346 // but has since been displaced.
7347 interval->physReg = REG_NA;
7350 else if (allocationPass)
7352 // Keep the register assignment - if another var has it, it will get unassigned.
7353 // Otherwise, resolution will fix it up later, and it will be more
7354 // likely to match other assignments this way.
7355 interval->isActive = true;
7356 liveRegs |= genRegMask(interval->physReg);
7357 INDEBUG(inactiveRegs |= genRegMask(interval->physReg));
7358 setVarReg(inVarToRegMap, varIndex, interval->physReg);
7362 interval->physReg = REG_NA;
7365 if (targetReg != REG_STK)
7367 RegRecord* targetRegRecord = getRegisterRecord(targetReg);
7368 liveRegs |= genRegMask(targetReg);
7369 if (!interval->isActive)
7371 interval->isActive = true;
7372 interval->physReg = targetReg;
7373 interval->assignedReg = targetRegRecord;
7375 Interval* assignedInterval = targetRegRecord->assignedInterval;
7376 if (assignedInterval != interval)
7378 // Is there another interval currently assigned to this register? If so unassign it.
7379 if (assignedInterval != nullptr)
7381 if (isAssignedToInterval(assignedInterval, targetRegRecord))
7383 regNumber assignedRegNum = assignedInterval->assignedReg->regNum;
7385 // If the interval is active, it will be set to active when we reach its new
7386 // register assignment (which we must not yet have done, or it wouldn't still be
7387 // assigned to this register).
7388 assignedInterval->isActive = false;
7389 unassignPhysReg(assignedInterval->assignedReg, nullptr);
7390 if (allocationPass && assignedInterval->isLocalVar &&
7391 inVarToRegMap[assignedInterval->getVarIndex(compiler)] == assignedRegNum)
7393 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
7398 // This interval is no longer assigned to this register.
7399 updateAssignedInterval(targetRegRecord, nullptr, assignedInterval->registerType);
7402 assignPhysReg(targetRegRecord, interval);
7404 if (interval->recentRefPosition != nullptr && !interval->recentRefPosition->copyReg &&
7405 interval->recentRefPosition->registerAssignment != genRegMask(targetReg))
7407 interval->getNextRefPosition()->outOfOrder = true;
7412 // Unassign any registers that are no longer live.
7413 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7415 if ((liveRegs & genRegMask(reg)) == 0)
7417 RegRecord* physRegRecord = getRegisterRecord(reg);
7418 Interval* assignedInterval = physRegRecord->assignedInterval;
7420 if (assignedInterval != nullptr)
7422 assert(assignedInterval->isLocalVar || assignedInterval->isConstant);
7424 if (!assignedInterval->isConstant && assignedInterval->assignedReg == physRegRecord)
7426 assignedInterval->isActive = false;
7427 if (assignedInterval->getNextRefPosition() == nullptr)
7429 unassignPhysReg(physRegRecord, nullptr);
7431 inVarToRegMap[assignedInterval->getVarIndex(compiler)] = REG_STK;
7435 // This interval may still be active, but was in another register in an
7436 // intervening block.
7437 updateAssignedInterval(physRegRecord, nullptr, assignedInterval->registerType);
7440 #ifdef _TARGET_ARM_
7441 if (assignedInterval->registerType == TYP_DOUBLE)
7443 // Skip next float register, because we already addressed a double register
7444 assert(genIsValidDoubleReg(reg));
7445 reg = REG_NEXT(reg);
7447 #endif // _TARGET_ARM_
7450 #ifdef _TARGET_ARM_
7451 else
7453 RegRecord* physRegRecord = getRegisterRecord(reg);
7454 Interval* assignedInterval = physRegRecord->assignedInterval;
7456 if (assignedInterval != nullptr && assignedInterval->registerType == TYP_DOUBLE)
7458 // Skip next float register, because we already addressed a double register
7459 assert(genIsValidDoubleReg(reg));
7460 reg = REG_NEXT(reg);
7463 #endif // _TARGET_ARM_
7465 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_START_BB, nullptr, REG_NA, currentBlock));
7468 //------------------------------------------------------------------------
7469 // processBlockEndLocations: Record the variables occupying registers after completing the current block.
7472 // currentBlock - the block we have just completed.
7478 // This must be called both during the allocation and resolution (write-back) phases.
7479 // This is because we need to have the outVarToRegMap locations in order to set the locations
7480 // at successor blocks during allocation time, but if lclVars are spilled after a block has been
7481 // completed, we need to record the REG_STK location for those variables at resolution time.
7483 void LinearScan::processBlockEndLocations(BasicBlock* currentBlock)
7485 assert(currentBlock != nullptr && currentBlock->bbNum == curBBNum);
7486 VarToRegMap outVarToRegMap = getOutVarToRegMap(curBBNum);
7488 VarSetOps::AssignNoCopy(compiler, currentLiveVars,
7489 VarSetOps::Intersection(compiler, registerCandidateVars, currentBlock->bbLiveOut));
7491 if (getLsraExtendLifeTimes())
7493 VarSetOps::Assign(compiler, currentLiveVars, registerCandidateVars);
7496 regMaskTP liveRegs = RBM_NONE;
7497 VarSetOps::Iter iter(compiler, currentLiveVars);
7498 unsigned varIndex = 0;
7499 while (iter.NextElem(&varIndex))
7501 Interval* interval = getIntervalForLocalVar(varIndex);
7502 if (interval->isActive)
7504 assert(interval->physReg != REG_NA && interval->physReg != REG_STK);
7505 setVarReg(outVarToRegMap, varIndex, interval->physReg);
7509 outVarToRegMap[varIndex] = REG_STK;
7512 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_END_BB));
7516 void LinearScan::dumpRefPositions(const char* str)
7518 printf("------------\n");
7519 printf("REFPOSITIONS %s: \n", str);
7520 printf("------------\n");
7521 for (auto& refPos : refPositions)
7528 bool LinearScan::registerIsFree(regNumber regNum, RegisterType regType)
7530 RegRecord* physRegRecord = getRegisterRecord(regNum);
7532 bool isFree = physRegRecord->isFree();
7534 #ifdef _TARGET_ARM_
7535 if (isFree && regType == TYP_DOUBLE)
7537 isFree = getRegisterRecord(REG_NEXT(regNum))->isFree();
7539 #endif // _TARGET_ARM_
7544 //------------------------------------------------------------------------
7545 // LinearScan::freeRegister: Make a register available for use
7548 // physRegRecord - the RegRecord for the register to be freed.
7555 // It may be that the RegRecord has already been freed, e.g. due to a kill,
7556 // in which case this method has no effect.
7559 // If there is currently an Interval assigned to this register, and it has
7560 // more references (i.e. this is a local last-use, but more uses and/or
7561 // defs remain), it will remain assigned to the physRegRecord. However, since
7562 // it is marked inactive, the register will be available, albeit less desirable
7563 // to allocate.
7564 void LinearScan::freeRegister(RegRecord* physRegRecord)
7566 Interval* assignedInterval = physRegRecord->assignedInterval;
7567 // It may have already been freed by a "Kill"
7568 if (assignedInterval != nullptr)
7570 assignedInterval->isActive = false;
7571 // If this interval represents a constant that we may encounter again (e.g. a
7572 // reusable constant node), don't unassign it until we need the register.
7573 if (!assignedInterval->isConstant)
7575 RefPosition* nextRefPosition = assignedInterval->getNextRefPosition();
7576 // Unassign the register only if there are no more RefPositions, or the next
7577 // one is a def. Note that the latter condition doesn't actually ensure that
7578 // there aren't subsequent uses that could be reached by a def in the assigned
7579 // register, but is merely a heuristic to avoid tying up the register (or using
7580 // it when it's non-optimal). A better alternative would be to use SSA, so that
7581 // we wouldn't unnecessarily link separate live ranges to the same register.
7582 if (nextRefPosition == nullptr || RefTypeIsDef(nextRefPosition->refType))
7584 #ifdef _TARGET_ARM_
7585 assert((assignedInterval->registerType != TYP_DOUBLE) || genIsValidDoubleReg(physRegRecord->regNum));
7586 #endif // _TARGET_ARM_
7587 unassignPhysReg(physRegRecord, nullptr);
7593 void LinearScan::freeRegisters(regMaskTP regsToFree)
7595 if (regsToFree == RBM_NONE)
7600 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FREE_REGS));
7601 while (regsToFree != RBM_NONE)
7603 regMaskTP nextRegBit = genFindLowestBit(regsToFree);
7604 regsToFree &= ~nextRegBit;
7605 regNumber nextReg = genRegNumFromMask(nextRegBit);
7606 freeRegister(getRegisterRecord(nextReg));
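// For illustration only, a sketch of how the loop above peels registers off the
// mask, using x86 register names (not an actual call site):
//
//   regMaskTP regsToFree = genRegMask(REG_EAX) | genRegMask(REG_EDX);
//   freeRegisters(regsToFree); // frees REG_EAX first (the lowest bit), then REG_EDX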
7610 // Actual register allocation, accomplished by iterating over all of the previously
7611 // constructed Intervals
7612 // Loosely based on raAssignVars()
7614 void LinearScan::allocateRegisters()
7616 JITDUMP("*************** In LinearScan::allocateRegisters()\n");
7617 DBEXEC(VERBOSE, lsraDumpIntervals("before allocateRegisters"));
7619 // at start, nothing is active except for register args
7620 for (auto& interval : intervals)
7622 Interval* currentInterval = &interval;
7623 currentInterval->recentRefPosition = nullptr;
7624 currentInterval->isActive = false;
7625 if (currentInterval->isLocalVar)
7627 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
7628 if (varDsc->lvIsRegArg && currentInterval->firstRefPosition != nullptr)
7630 currentInterval->isActive = true;
7635 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
7637 getRegisterRecord(reg)->recentRefPosition = nullptr;
7638 getRegisterRecord(reg)->isActive = false;
7642 regNumber lastAllocatedReg = REG_NA;
7645 dumpRefPositions("BEFORE ALLOCATION");
7646 dumpVarRefPositions("BEFORE ALLOCATION");
7648 printf("\n\nAllocating Registers\n"
7649 "--------------------\n");
7652 dumpRegRecordHeader();
7653 // Now print an empty indent
7654 printf(indentFormat, "");
7659 BasicBlock* currentBlock = nullptr;
7661 LsraLocation prevLocation = MinLocation;
7662 regMaskTP regsToFree = RBM_NONE;
7663 regMaskTP delayRegsToFree = RBM_NONE;
7665 // This is the most recent RefPosition for which a register was allocated
7666 // - currently only used for DEBUG but maintained in non-debug, for clarity of code
7667 // (and will be optimized away because in non-debug spillAlways() unconditionally returns false)
7668 RefPosition* lastAllocatedRefPosition = nullptr;
7670 bool handledBlockEnd = false;
7672 for (auto& refPosition : refPositions)
7674 RefPosition* currentRefPosition = &refPosition;
7677 // Set the activeRefPosition to null until we're done with any boundary handling.
7678 activeRefPosition = nullptr;
7683 // We're really dumping the RegRecords "after" the previous RefPosition, but it's more convenient
7684 // to do this here, since there are a number of "continue"s in this loop.
7694 // This is the previousRefPosition of the current Referent, if any
7695 RefPosition* previousRefPosition = nullptr;
7697 Interval* currentInterval = nullptr;
7698 Referenceable* currentReferent = nullptr;
7699 bool isInternalRef = false;
7700 RefType refType = currentRefPosition->refType;
7702 currentReferent = currentRefPosition->referent;
7704 if (spillAlways() && lastAllocatedRefPosition != nullptr && !lastAllocatedRefPosition->isPhysRegRef &&
7705 !lastAllocatedRefPosition->getInterval()->isInternal &&
7706 (RefTypeIsDef(lastAllocatedRefPosition->refType) || lastAllocatedRefPosition->getInterval()->isLocalVar))
7708 assert(lastAllocatedRefPosition->registerAssignment != RBM_NONE);
7709 RegRecord* regRecord = lastAllocatedRefPosition->getInterval()->assignedReg;
7710 unassignPhysReg(regRecord, lastAllocatedRefPosition);
7711 // Now set lastAllocatedRefPosition to null, so that we don't try to spill it again
7712 lastAllocatedRefPosition = nullptr;
7715 // We wait to free any registers until we've completed all the
7716 // uses for the current node.
7717 // This avoids reusing registers too soon.
7718 // We free before the last true def (after all the uses & internal
7719 // registers), and then again at the beginning of the next node.
7720 // This is made easier by assigning two LsraLocations per node - one
7721 // for all the uses, internal registers & all but the last def, and
7722 // another for the final def (if any).
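// For illustration, a hypothetical node assigned locations L and L+1 under the
// model described above:
//   location L:   all uses, internal registers, and all but the last def
//   location L+1: the final def (if any)
// Registers freed at L become available for the final def at L+1, while any
// delayRegFree registers are deferred one more step (moved from delayRegsToFree
// to regsToFree at the next location change, and freed only then).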
7724 LsraLocation currentLocation = currentRefPosition->nodeLocation;
7726 if ((regsToFree | delayRegsToFree) != RBM_NONE)
7728 bool doFreeRegs = false;
7729 // Free at a new location, or at a basic block boundary
7730 if (currentLocation > prevLocation || refType == RefTypeBB)
7737 freeRegisters(regsToFree);
7738 regsToFree = delayRegsToFree;
7739 delayRegsToFree = RBM_NONE;
7742 prevLocation = currentLocation;
7744 // get previous refposition, then current refpos is the new previous
7745 if (currentReferent != nullptr)
7747 previousRefPosition = currentReferent->recentRefPosition;
7748 currentReferent->recentRefPosition = currentRefPosition;
7752 assert((refType == RefTypeBB) || (refType == RefTypeKillGCRefs));
7755 // For the purposes of register resolution, we handle the DummyDefs before
7756 // the block boundary - so the RefTypeBB is after all the DummyDefs.
7757 // However, for the purposes of allocation, we want to handle the block
7758 // boundary first, so that we can free any registers occupied by lclVars
7759 // that aren't live in the next block and make them available for the
7760 // DummyDefs.
7762 if (!handledBlockEnd && (refType == RefTypeBB || refType == RefTypeDummyDef))
7764 // Free any delayed regs (now in regsToFree) before processing the block boundary
7765 freeRegisters(regsToFree);
7766 regsToFree = RBM_NONE;
7767 handledBlockEnd = true;
7768 curBBStartLocation = currentRefPosition->nodeLocation;
7769 if (currentBlock == nullptr)
7771 currentBlock = startBlockSequence();
7775 processBlockEndAllocation(currentBlock);
7776 currentBlock = moveToNextBlock();
7779 if (VERBOSE && currentBlock != nullptr && !dumpTerse)
7781 currentBlock->dspBlockHeader(compiler);
7788 activeRefPosition = currentRefPosition;
7793 dumpRefPositionShort(currentRefPosition, currentBlock);
7797 currentRefPosition->dump();
7802 if (refType == RefTypeBB)
7804 handledBlockEnd = false;
7808 if (refType == RefTypeKillGCRefs)
7810 spillGCRefs(currentRefPosition);
7814 // If this is a FixedReg, disassociate any inactive constant interval from this register.
7815 // Otherwise, do nothing.
7816 if (refType == RefTypeFixedReg)
7818 RegRecord* regRecord = currentRefPosition->getReg();
7819 Interval* assignedInterval = regRecord->assignedInterval;
7821 if (assignedInterval != nullptr && !assignedInterval->isActive && assignedInterval->isConstant)
7823 regRecord->assignedInterval = nullptr;
7826 // Update overlapping floating point register for TYP_DOUBLE
7827 if (assignedInterval->registerType == TYP_DOUBLE)
7829 regRecord = getRegisterRecord(REG_NEXT(regRecord->regNum));
7830 assignedInterval = regRecord->assignedInterval;
7832 assert(assignedInterval != nullptr && !assignedInterval->isActive && assignedInterval->isConstant);
7833 regRecord->assignedInterval = nullptr;
7837 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_FIXED_REG, nullptr, currentRefPosition->assignedReg()));
7841 // If this is an exposed use, do nothing - this is merely a placeholder to attempt to
7842 // ensure that a register is allocated for the full lifetime. The resolution logic
7843 // will take care of moving to the appropriate register if needed.
7845 if (refType == RefTypeExpUse)
7847 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_EXP_USE));
7851 regNumber assignedRegister = REG_NA;
7853 if (currentRefPosition->isIntervalRef())
7855 currentInterval = currentRefPosition->getInterval();
7856 assignedRegister = currentInterval->physReg;
7858 if (VERBOSE && !dumpTerse)
7860 currentInterval->dump();
7864 // Identify the special cases where we decide up-front not to allocate
7865 bool allocate = true;
7866 bool didDump = false;
7868 if (refType == RefTypeParamDef || refType == RefTypeZeroInit)
7870 // For a ParamDef with a weighted refCount no greater than unity, don't enregister it at entry.
7871 // TODO-CQ: Consider doing this only for stack parameters, since otherwise we may be needlessly
7872 // inserting a store.
7873 LclVarDsc* varDsc = currentInterval->getLocalVar(compiler);
7874 assert(varDsc != nullptr);
7875 if (refType == RefTypeParamDef && varDsc->lvRefCntWtd <= BB_UNITY_WEIGHT)
7877 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_ENTRY_REG_ALLOCATED, currentInterval));
7880 setIntervalAsSpilled(currentInterval);
7882 // If it has no actual references, mark it as "lastUse"; since they're not actually part
7883 // of any flow they won't have been marked during dataflow. Otherwise, if we allocate a
7884 // register we won't unassign it.
7885 else if (currentRefPosition->nextRefPosition == nullptr)
7887 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ZERO_REF, currentInterval));
7888 currentRefPosition->lastUse = true;
7892 else if (refType == RefTypeUpperVectorSaveDef || refType == RefTypeUpperVectorSaveUse)
7894 Interval* lclVarInterval = currentInterval->relatedInterval;
7895 if (lclVarInterval->physReg == REG_NA)
7897 allocate = false;
7900 #endif // FEATURE_SIMD
7902 if (allocate == false)
7904 if (assignedRegister != REG_NA)
7906 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
7910 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
7913 currentRefPosition->registerAssignment = RBM_NONE;
7917 if (currentInterval->isSpecialPutArg)
7919 assert(!currentInterval->isLocalVar);
7920 Interval* srcInterval = currentInterval->relatedInterval;
7921 assert(srcInterval->isLocalVar);
7922 if (refType == RefTypeDef)
7924 assert(srcInterval->recentRefPosition->nodeLocation == currentLocation - 1);
7925 RegRecord* physRegRecord = srcInterval->assignedReg;
7927 // For a putarg_reg to be special, its next use location has to be the same
7928 // as the fixed reg's next kill location. Otherwise, if the source lcl var's
7929 // next use came after the kill of the fixed reg but before the putarg_reg's
7930 // next use, the fixed reg's kill would lead to a spill of the source but not
7931 // of the putarg_reg if it were treated as special.
7932 if (srcInterval->isActive &&
7933 genRegMask(srcInterval->physReg) == currentRefPosition->registerAssignment &&
7934 currentInterval->getNextRefLocation() == physRegRecord->getNextRefLocation())
7936 assert(physRegRecord->regNum == srcInterval->physReg);
7938 // A special putarg_reg acts as a pass-thru, since both the source lcl var
7939 // and the putarg_reg have the same register allocated. The physical reg
7940 // record continues to point to the source lcl var's interval instead of
7941 // to the putarg_reg's interval. So if the register allocated to the source
7942 // lcl var were spilled and reallocated to another tree node before its use
7943 // at the call node, it would be the lcl var, not the putarg_reg, that got
7944 // spilled, since the physical reg record points to the lcl var's interval.
7945 // As a result, the arg reg would be trashed, leading to bad codegen. The
7946 // assumption here is that the source lcl var of a special putarg_reg does
7947 // not get spilled and re-allocated prior to its use at the call node. This
7948 // is ensured by marking the physical reg record as busy until the next
7949 // kill.
7950 physRegRecord->isBusyUntilNextKill = true;
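// For illustration, a hypothetical timeline (the locations are made up):
//   10: V03 (the source lcl var) is defined in REG_ARG_0
//   11: the special putarg_reg is defined from V03, also in REG_ARG_0
//   15: the call kills REG_ARG_0 and consumes the putarg_reg
// Marking REG_ARG_0 busy-until-next-kill prevents an unrelated node at
// locations 12-14 from stealing REG_ARG_0 and thereby spilling V03 out
// from under the putarg_reg.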
7954 currentInterval->isSpecialPutArg = false;
7957 // If this is still a SpecialPutArg, continue;
7958 if (currentInterval->isSpecialPutArg)
7960 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, currentInterval,
7961 currentRefPosition->assignedReg()));
7966 if (assignedRegister == REG_NA && RefTypeIsUse(refType))
7968 currentRefPosition->reload = true;
7969 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, currentInterval, assignedRegister));
7973 regMaskTP assignedRegBit = RBM_NONE;
7974 bool isInRegister = false;
7975 if (assignedRegister != REG_NA)
7977 isInRegister = true;
7978 assignedRegBit = genRegMask(assignedRegister);
7979 if (!currentInterval->isActive)
7981 // If this is a use, it must have started the block on the stack, but the register
7982 // was available for use so we kept the association.
7983 if (RefTypeIsUse(refType))
7985 assert(enregisterLocalVars);
7986 assert(inVarToRegMaps[curBBNum][currentInterval->getVarIndex(compiler)] == REG_STK &&
7987 previousRefPosition->nodeLocation <= curBBStartLocation);
7988 isInRegister = false;
7992 currentInterval->isActive = true;
7995 assert(currentInterval->assignedReg != nullptr &&
7996 currentInterval->assignedReg->regNum == assignedRegister &&
7997 currentInterval->assignedReg->assignedInterval == currentInterval);
8000 // If this is a physical register, we unconditionally assign it to itself!
8001 if (currentRefPosition->isPhysRegRef)
8003 RegRecord* currentReg = currentRefPosition->getReg();
8004 Interval* assignedInterval = currentReg->assignedInterval;
8006 if (assignedInterval != nullptr)
8008 unassignPhysReg(currentReg, assignedInterval->recentRefPosition);
8010 currentReg->isActive = true;
8011 assignedRegister = currentReg->regNum;
8012 assignedRegBit = genRegMask(assignedRegister);
8013 if (refType == RefTypeKill)
8015 currentReg->isBusyUntilNextKill = false;
8018 else if (previousRefPosition != nullptr)
8020 assert(previousRefPosition->nextRefPosition == currentRefPosition);
8021 assert(assignedRegister == REG_NA || assignedRegBit == previousRefPosition->registerAssignment ||
8022 currentRefPosition->outOfOrder || previousRefPosition->copyReg ||
8023 previousRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef);
8025 else if (assignedRegister != REG_NA)
8027 // Handle the case where this is a preassigned register (i.e. parameter).
8028 // We don't want to actually use the preassigned register if it's not
8029 // going to cover the lifetime - but we had to preallocate it to ensure
8030 // that it remained live.
8031 // TODO-CQ: At some point we may want to refine the analysis here, in case
8032 // it might be beneficial to keep it in this reg for PART of the lifetime
8033 if (currentInterval->isLocalVar)
8035 regMaskTP preferences = currentInterval->registerPreferences;
8036 bool keepAssignment = true;
8037 bool matchesPreferences = (preferences & genRegMask(assignedRegister)) != RBM_NONE;
8039 // Will the assigned register cover the lifetime? If not, does it at least
8040 // meet the preferences for the next RefPosition?
8041 RegRecord* physRegRecord = getRegisterRecord(currentInterval->physReg);
8042 RefPosition* nextPhysRegRefPos = physRegRecord->getNextRefPosition();
8043 if (nextPhysRegRefPos != nullptr &&
8044 nextPhysRegRefPos->nodeLocation <= currentInterval->lastRefPosition->nodeLocation)
8046 // Check to see if the existing assignment matches the preferences (e.g. callee save registers)
8047 // and ensure that the next use of this localVar does not occur after the nextPhysRegRefPos
8048 // There must be a next RefPosition, because we know that the Interval extends beyond the
8049 // nextPhysRegRefPos.
8050 RefPosition* nextLclVarRefPos = currentRefPosition->nextRefPosition;
8051 assert(nextLclVarRefPos != nullptr);
8052 if (!matchesPreferences || nextPhysRegRefPos->nodeLocation < nextLclVarRefPos->nodeLocation ||
8053 physRegRecord->conflictingFixedRegReference(nextLclVarRefPos))
8055 keepAssignment = false;
8058 else if (refType == RefTypeParamDef && !matchesPreferences)
8060 // Don't use the register, even if available, if it doesn't match the preferences.
8061 // Note that this case is only for ParamDefs, for which we haven't yet taken preferences
8062 // into account (we've just automatically got the initial location). In other cases,
8063 // we would already have put it in a preferenced register, if it was available.
8064 // TODO-CQ: Consider expanding this to check availability - that would duplicate
8065 // code here, but otherwise we may wind up in this register anyway.
8066 keepAssignment = false;
8069 if (keepAssignment == false)
8071 currentRefPosition->registerAssignment = allRegs(currentInterval->registerType);
8072 unassignPhysRegNoSpill(physRegRecord);
8074 // If the preferences are currently set to just this register, reset them to allRegs
8075 // of the appropriate type (just as we just reset the registerAssignment for this
8076 // RefPosition).
8077 // Otherwise, simply remove this register from the preferences, if it's there.
8079 if (currentInterval->registerPreferences == assignedRegBit)
8081 currentInterval->registerPreferences = currentRefPosition->registerAssignment;
8085 currentInterval->registerPreferences &= ~assignedRegBit;
8088 assignedRegister = REG_NA;
8089 assignedRegBit = RBM_NONE;
8094 if (assignedRegister != REG_NA)
8096 // If there is a conflicting fixed reference, insert a copy.
8097 RegRecord* physRegRecord = getRegisterRecord(assignedRegister);
8098 if (physRegRecord->conflictingFixedRegReference(currentRefPosition))
8100 // We may have already reassigned the register to the conflicting reference.
8101 // If not, we need to unassign this interval.
8102 if (physRegRecord->assignedInterval == currentInterval)
8104 unassignPhysRegNoSpill(physRegRecord);
8106 currentRefPosition->moveReg = true;
8107 assignedRegister = REG_NA;
8108 setIntervalAsSplit(currentInterval);
8109 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_MOVE_REG, currentInterval, assignedRegister));
8111 else if ((genRegMask(assignedRegister) & currentRefPosition->registerAssignment) != 0)
8113 currentRefPosition->registerAssignment = assignedRegBit;
8114 if (!currentReferent->isActive)
8116 // If we've got an exposed use at the top of a block, the
8117 // interval might not have been active. Otherwise if it's a use,
8118 // the interval must be active.
8119 if (refType == RefTypeDummyDef)
8121 currentReferent->isActive = true;
8122 assert(getRegisterRecord(assignedRegister)->assignedInterval == currentInterval);
8126 currentRefPosition->reload = true;
8129 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, currentInterval, assignedRegister));
8133 assert(currentInterval != nullptr);
8135 // It's already in a register, but not one we need.
8136 if (!RefTypeIsDef(currentRefPosition->refType))
8138 regNumber copyReg = assignCopyReg(currentRefPosition);
8139 assert(copyReg != REG_NA);
8140 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, currentInterval, copyReg));
8141 lastAllocatedRefPosition = currentRefPosition;
8142 if (currentRefPosition->lastUse)
8144 if (currentRefPosition->delayRegFree)
8146 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED, currentInterval,
8148 delayRegsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
8152 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE, currentInterval, assignedRegister));
8153 regsToFree |= (genRegMask(assignedRegister) | currentRefPosition->registerAssignment);
8156 // If this is a tree temp (non-localVar) interval, we will need an explicit move.
8157 if (!currentInterval->isLocalVar)
8159 currentRefPosition->moveReg = true;
8160 currentRefPosition->copyReg = false;
8166 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NEEDS_NEW_REG, nullptr, assignedRegister));
8167 regsToFree |= genRegMask(assignedRegister);
8168 // We want a new register, but we don't want this to be considered a spill.
8169 assignedRegister = REG_NA;
8170 if (physRegRecord->assignedInterval == currentInterval)
8172 unassignPhysRegNoSpill(physRegRecord);
8178 if (assignedRegister == REG_NA)
8180 bool allocateReg = true;
8182 if (currentRefPosition->AllocateIfProfitable())
8184 // We can avoid allocating a register if it is the last use and would require a reload.
8185 if (currentRefPosition->lastUse && currentRefPosition->reload)
8187 allocateReg = false;
8191 // Under stress mode, don't attempt to allocate a register for
8192 // a reg-optional ref position.
8193 if (allocateReg && regOptionalNoAlloc())
8195 allocateReg = false;
8202 // Try to allocate a register
8203 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
8206 // If no register was found, and if the currentRefPosition must have a register,
8207 // then find a register to spill
8208 if (assignedRegister == REG_NA)
8210 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8211 if (refType == RefTypeUpperVectorSaveDef)
8213 // TODO-CQ: Determine whether copying to two integer callee-save registers would be profitable.
8215 // SaveDef position occurs after the Use of args and at the same location as Kill/Def
8216 // positions of a call node. But SaveDef position cannot use any of the arg regs as
8217 // they are needed for call node.
8218 currentRefPosition->registerAssignment =
8219 (allRegs(TYP_FLOAT) & RBM_FLT_CALLEE_TRASH & ~RBM_FLTARG_REGS);
8220 assignedRegister = tryAllocateFreeReg(currentInterval, currentRefPosition);
8222 // There MUST be caller-save registers available, because they have all just been killed.
8223 // Amd64 Windows: xmm4-xmm5 are guaranteed to be available as xmm0-xmm3 are used for passing args.
8224 // Amd64 Unix: xmm8-xmm15 are guaranteed to be available as xmm0-xmm7 are used for passing args.
8225 // X86 RyuJIT Windows: xmm4-xmm7 are guaranteed to be available.
8226 assert(assignedRegister != REG_NA);
8230 // i) The reason we have to spill is that the SaveDef position is allocated after the Kill positions
8231 // of the call node are processed. Since callee-trash registers are killed by the call node,
8232 // we explicitly spill and unassign the register.
8233 // ii) These will look a bit backward in the dump, but it's a pain to dump the alloc before the
8234 // spill).
8235 unassignPhysReg(getRegisterRecord(assignedRegister), currentRefPosition);
8236 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, currentInterval, assignedRegister));
8238 // Now set assignedRegister to REG_NA again so that we don't re-activate it.
8239 assignedRegister = REG_NA;
8242 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8243 if (currentRefPosition->RequiresRegister() || currentRefPosition->AllocateIfProfitable())
8247 assignedRegister = allocateBusyReg(currentInterval, currentRefPosition,
8248 currentRefPosition->AllocateIfProfitable());
8251 if (assignedRegister != REG_NA)
8254 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_SPILLED_REG, currentInterval, assignedRegister));
8258 // This can happen only for those ref positions that are to be allocated
8259 // only if profitable.
8260 noway_assert(currentRefPosition->AllocateIfProfitable());
8262 currentRefPosition->registerAssignment = RBM_NONE;
8263 currentRefPosition->reload = false;
8264 setIntervalAsSpilled(currentInterval);
8266 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
8271 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
8272 currentRefPosition->registerAssignment = RBM_NONE;
8273 currentInterval->isActive = false;
8274 setIntervalAsSpilled(currentInterval);
8282 if (currentInterval->isConstant && (currentRefPosition->treeNode != nullptr) &&
8283 currentRefPosition->treeNode->IsReuseRegVal())
8285 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, assignedRegister, currentBlock);
8289 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, assignedRegister, currentBlock);
8295 if (refType == RefTypeDummyDef && assignedRegister != REG_NA)
8297 setInVarRegForBB(curBBNum, currentInterval->varNum, assignedRegister);
8300 // If we allocated a register, and this is a use of a spilled value,
8301 // it should have been marked for reload above.
8302 if (assignedRegister != REG_NA && RefTypeIsUse(refType) && !isInRegister)
8304 assert(currentRefPosition->reload);
8308 // If we allocated a register, record it
8309 if (currentInterval != nullptr && assignedRegister != REG_NA)
8311 assignedRegBit = genRegMask(assignedRegister);
8312 currentRefPosition->registerAssignment = assignedRegBit;
8313 currentInterval->physReg = assignedRegister;
8314 regsToFree &= ~assignedRegBit; // we'll set it again later if it's dead
8316 // If this interval is dead, free the register.
8317 // The interval could be dead if this is a user variable, or if the
8318 // node is being evaluated for side effects, or a call whose result
8319 // is not used, etc.
8320 if (currentRefPosition->lastUse || currentRefPosition->nextRefPosition == nullptr)
8322 assert(currentRefPosition->isIntervalRef());
8324 if (refType != RefTypeExpUse && currentRefPosition->nextRefPosition == nullptr)
8326 if (currentRefPosition->delayRegFree)
8328 delayRegsToFree |= assignedRegBit;
8330 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE_DELAYED));
8334 regsToFree |= assignedRegBit;
8336 INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_LAST_USE));
8341 currentInterval->isActive = false;
8345 lastAllocatedRefPosition = currentRefPosition;
8349 // Free registers to clear associated intervals for resolution phase
8350 CLANG_FORMAT_COMMENT_ANCHOR;
8353 if (getLsraExtendLifeTimes())
8355 // If we have extended lifetimes, we need to make sure all the registers are freed.
8356 for (int regNumIndex = 0; regNumIndex <= REG_FP_LAST; regNumIndex++)
8358 RegRecord& regRecord = physRegs[regNumIndex];
8359 Interval* interval = regRecord.assignedInterval;
8360 if (interval != nullptr)
8362 interval->isActive = false;
8363 unassignPhysReg(&regRecord, nullptr);
8370 freeRegisters(regsToFree | delayRegsToFree);
8378 // Dump the RegRecords after the last RefPosition is handled.
8383 dumpRefPositions("AFTER ALLOCATION");
8384 dumpVarRefPositions("AFTER ALLOCATION");
8386 // Dump the intervals that remain active
8387 printf("Active intervals at end of allocation:\n");
8389 // We COULD just reuse the intervalIter from above, but ArrayListIterator doesn't
8390 // provide a Reset function (!) - we'll probably replace this so don't bother
8393 for (auto& interval : intervals)
8395 if (interval.isActive)
8407 //-----------------------------------------------------------------------------
8408 // updateAssignedInterval: Update assigned interval of register.
8411 // reg - register to be updated
8412 // interval - interval to be assigned
8413 // regType - register type
8419 // For ARM32, when "regType" is TYP_DOUBLE, "reg" should be an even-numbered
8420 // float register, i.e. the lower half of a double register.
8423 // For ARM32, the two float registers comprising a double register are updated
8424 // together when "regType" is TYP_DOUBLE.
8426 void LinearScan::updateAssignedInterval(RegRecord* reg, Interval* interval, RegisterType regType)
8428 reg->assignedInterval = interval;
8431 // Update overlapping floating point register for TYP_DOUBLE
8432 if (regType == TYP_DOUBLE)
8434 RegRecord* anotherHalfReg = findAnotherHalfRegRec(reg);
8436 anotherHalfReg->assignedInterval = interval;
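// For illustration only (an ARM32 sketch; REG_F2 and REG_F3 form one double register):
//
//   // Assigning a TYP_DOUBLE interval to the even half also claims the odd half:
//   updateAssignedInterval(getRegisterRecord(REG_F2), interval, TYP_DOUBLE);
//   assert(getRegisterRecord(REG_F3)->assignedInterval == interval);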
8441 //-----------------------------------------------------------------------------
8442 // updatePreviousInterval: Update previous interval of register.
8445 // reg - register to be updated
8446 // interval - interval to be assigned
8447 // regType - register type
8453 // For ARM32, when "regType" is TYP_DOUBLE, "reg" should be an even-numbered
8454 // float register, i.e. the lower half of a double register.
8457 // For ARM32, the two float registers comprising a double register are updated
8458 // together when "regType" is TYP_DOUBLE.
8460 void LinearScan::updatePreviousInterval(RegRecord* reg, Interval* interval, RegisterType regType)
8462 reg->previousInterval = interval;
8465 // Update overlapping floating point register for TYP_DOUBLE
8466 if (regType == TYP_DOUBLE)
8468 RegRecord* anotherHalfReg = findAnotherHalfRegRec(reg);
8470 anotherHalfReg->previousInterval = interval;
8475 // LinearScan::resolveLocalRef
8477 // Update the graph for a local reference.
8478 // Also, track the register (if any) that is currently occupied.
8480 // treeNode: The lclVar that's being resolved
8481 // currentRefPosition: the RefPosition associated with the treeNode
8484 // This method is called for each local reference, during the resolveRegisters
8485 // phase of LSRA. It is responsible for keeping the following in sync:
8486 // - varDsc->lvRegNum (and lvOtherReg) contain the unique register location.
8487 // If it is not in the same register through its lifetime, it is set to REG_STK.
8488 // - interval->physReg is set to the assigned register
8489 // (i.e. at the code location which is currently being handled by resolveRegisters())
8490 // - interval->isActive is true iff the interval is live and occupying a register
8491 // - interval->isSpilled should have already been set to true if the interval is EVER spilled
8492 // - interval->isSplit is set to true if the interval does not occupy the same
8493 // register throughout the method
8494 // - RegRecord->assignedInterval points to the interval which currently occupies
8495 // the register.
8496 // - For each lclVar node:
8497 // - gtRegNum/gtRegPair is set to the currently allocated register(s).
8498 // - GTF_SPILLED is set on a use if it must be reloaded prior to use.
8499 // - GTF_SPILL is set if it must be spilled after use.
8501 // A copyReg is an ugly case where the variable must be in a specific (fixed) register,
8502 // but it currently resides elsewhere. The register allocator must track the use of the
8503 // fixed register, but it marks the lclVar node with the register it currently lives in
8504 // and the code generator does the necessary move.
8506 // Before beginning, the varDsc for each parameter must be set to its initial location.
8508 // NICE: Consider tracking whether an Interval is always in the same location (register/stack)
8509 // in which case it will require no resolution.
8511 void LinearScan::resolveLocalRef(BasicBlock* block, GenTreePtr treeNode, RefPosition* currentRefPosition)
8513 assert((block == nullptr) == (treeNode == nullptr));
8514 assert(enregisterLocalVars);
8516 // Is this a tracked local? Or just a register allocated for loading
8517 // a non-tracked one?
8518 Interval* interval = currentRefPosition->getInterval();
8519 if (!interval->isLocalVar)
8523 interval->recentRefPosition = currentRefPosition;
8524 LclVarDsc* varDsc = interval->getLocalVar(compiler);
8526 // NOTE: we set the GTF_VAR_DEATH flag here unless we are extending lifetimes, in which case we write
8527 // this bit in checkLastUses. This is a bit of a hack, but is necessary because codegen requires
8528 // accurate last use info that is not reflected in the lastUse bit on ref positions when we are extending
8529 // lifetimes. See also the comments in checkLastUses.
8530 if ((treeNode != nullptr) && !extendLifetimes())
8532 if (currentRefPosition->lastUse)
8534 treeNode->gtFlags |= GTF_VAR_DEATH;
8538 treeNode->gtFlags &= ~GTF_VAR_DEATH;
8542 if (currentRefPosition->registerAssignment == RBM_NONE)
8544 assert(!currentRefPosition->RequiresRegister());
8545 assert(interval->isSpilled);
8547 varDsc->lvRegNum = REG_STK;
8548 if (interval->assignedReg != nullptr && interval->assignedReg->assignedInterval == interval)
8550 updateAssignedInterval(interval->assignedReg, nullptr, interval->registerType);
8552 interval->assignedReg = nullptr;
8553 interval->physReg = REG_NA;
8554 if (treeNode != nullptr)
8556 treeNode->SetContained();
8562 // In most cases, assigned and home registers will be the same
8563 // The exception is the copyReg case, where we've assigned a register
8564 // for a specific purpose, but will be keeping the register assignment
8565 regNumber assignedReg = currentRefPosition->assignedReg();
8566 regNumber homeReg = assignedReg;
8568 // Undo any previous association with a physical register, UNLESS this
8569 // is a copyReg.
8570 if (!currentRefPosition->copyReg)
8572 regNumber oldAssignedReg = interval->physReg;
8573 if (oldAssignedReg != REG_NA && assignedReg != oldAssignedReg)
8575 RegRecord* oldRegRecord = getRegisterRecord(oldAssignedReg);
8576 if (oldRegRecord->assignedInterval == interval)
8578 updateAssignedInterval(oldRegRecord, nullptr, interval->registerType);
8583 if (currentRefPosition->refType == RefTypeUse && !currentRefPosition->reload)
8585 // Was this spilled after our predecessor was scheduled?
8586 if (interval->physReg == REG_NA)
8588 assert(inVarToRegMaps[curBBNum][varDsc->lvVarIndex] == REG_STK);
8589 currentRefPosition->reload = true;
8593 bool reload = currentRefPosition->reload;
8594 bool spillAfter = currentRefPosition->spillAfter;
8596 // In the reload case we either:
8597 // - Set the register to REG_STK if it will be referenced only from the home location, or
8598 // - Set the register to the assigned register and set GTF_SPILLED if it must be loaded into a register.
8601 assert(currentRefPosition->refType != RefTypeDef);
8602 assert(interval->isSpilled);
8603 varDsc->lvRegNum = REG_STK;
8606 interval->physReg = assignedReg;
8609 // If there is no treeNode, this must be a RefTypeExpUse, in
8610 // which case we did the reload already
8611 if (treeNode != nullptr)
8613 treeNode->gtFlags |= GTF_SPILLED;
8616 if (currentRefPosition->AllocateIfProfitable())
8618 // This is a use of lclVar that is flagged as reg-optional
8619 // by lower/codegen and marked for both reload and spillAfter.
8620 // In this case we can avoid unnecessary reload and spill
8621 // by setting reg on lclVar to REG_STK and reg on tree node
8622 // to REG_NA. Codegen will generate the code by considering
8623 // it as a contained memory operand.
8625 // Note that varDsc->lvRegNum is already set to REG_STK above.
8626 interval->physReg = REG_NA;
8627 treeNode->gtRegNum = REG_NA;
8628 treeNode->gtFlags &= ~GTF_SPILLED;
8629 treeNode->SetContained();
8633 treeNode->gtFlags |= GTF_SPILL;
8639 assert(currentRefPosition->refType == RefTypeExpUse);
8642 else if (spillAfter && !RefTypeIsUse(currentRefPosition->refType))
8644 // In the case of a pure def, don't bother spilling - just assign it to the
8645 // stack. However, we need to remember that it was spilled.
8647 assert(interval->isSpilled);
8648 varDsc->lvRegNum = REG_STK;
8649 interval->physReg = REG_NA;
8650 if (treeNode != nullptr)
8652 treeNode->gtRegNum = REG_NA;
8657 // Not reload and Not pure-def that's spillAfter
8659 if (currentRefPosition->copyReg || currentRefPosition->moveReg)
8661 // For a copyReg or moveReg, we have two cases:
8662 // - In the first case, we have a fixedReg - i.e. a register which the code
8663 // generator is constrained to use.
8664 // The code generator will generate the appropriate move to meet the requirement.
8665 // - In the second case, we were forced to use a different register because of
8666 // interference (or JitStressRegs).
8667 // In this case, we generate a GT_COPY.
8668 // In either case, we annotate the treeNode with the register in which the value
8669 // currently lives. For moveReg, the homeReg is the new register (as assigned above).
8670 // But for copyReg, the homeReg remains unchanged.
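//
// For illustration (a sketch with hypothetical names): if V02 currently lives in
// REG_EDI but this RefPosition is a fixed-reg use of REG_ECX, a copyReg annotates
// the node with REG_EDI and codegen emits the move to REG_ECX; if instead we were
// forced off REG_EDI by interference (or JitStressRegs), we insert an explicit
// GT_COPY via insertCopyOrReload below.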
8672 assert(treeNode != nullptr);
8673 treeNode->gtRegNum = interval->physReg;
8675 if (currentRefPosition->copyReg)
8677 homeReg = interval->physReg;
8681 assert(interval->isSplit);
8682 interval->physReg = assignedReg;
8685 if (!currentRefPosition->isFixedRegRef || currentRefPosition->moveReg)
8687 // This is the second case, where we need to generate a copy
8688 insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(), currentRefPosition);
8693 interval->physReg = assignedReg;
8695 if (!interval->isSpilled && !interval->isSplit)
8697 if (varDsc->lvRegNum != REG_STK)
8699 // If the register assignments don't match, then this interval is split.
8700 if (varDsc->lvRegNum != assignedReg)
8702 setIntervalAsSplit(interval);
8703 varDsc->lvRegNum = REG_STK;
8708 varDsc->lvRegNum = assignedReg;
8714 if (treeNode != nullptr)
8716 treeNode->gtFlags |= GTF_SPILL;
8718 assert(interval->isSpilled);
8719 interval->physReg = REG_NA;
8720 varDsc->lvRegNum = REG_STK;
8724 // Update the physRegRecord for the register, so that we know what vars are in
8725 // regs at the block boundaries
8726 RegRecord* physRegRecord = getRegisterRecord(homeReg);
8727 if (spillAfter || currentRefPosition->lastUse)
8729 interval->isActive = false;
8730 interval->assignedReg = nullptr;
8731 interval->physReg = REG_NA;
8733 updateAssignedInterval(physRegRecord, nullptr, interval->registerType);
8737 interval->isActive = true;
8738 interval->assignedReg = physRegRecord;
8740 updateAssignedInterval(physRegRecord, interval, interval->registerType);
8744 void LinearScan::writeRegisters(RefPosition* currentRefPosition, GenTree* tree)
8746 lsraAssignRegToTree(tree, currentRefPosition->assignedReg(), currentRefPosition->getMultiRegIdx());
8749 //------------------------------------------------------------------------
8750 // insertCopyOrReload: Insert a copy in the case where a tree node value must be moved
8751 // to a different register at the point of use (GT_COPY), or it is reloaded to a different register
8752 // than the one it was spilled from (GT_RELOAD).
8755 // block - basic block in which GT_COPY/GT_RELOAD is inserted.
8756 // tree - This is the node to copy or reload.
8757 // Insert copy or reload node between this node and its parent.
8758 // multiRegIdx - register position of tree node for which copy or reload is needed.
8759 // refPosition - The RefPosition at which copy or reload will take place.
8762 // The GT_COPY or GT_RELOAD will be inserted in the proper spot in execution order where the reload is to occur.
8764 // For example, for this tree (numbers are execution order, lower is earlier and higher is later):
8766 // +---------+----------+
8768 // +---------+----------+
8773 // +-------------------+ +----------------------+
8774 // | x (1) | "tree" | y (2) |
8775 // +-------------------+ +----------------------+
8777 // generate this tree:
8779 // +---------+----------+
8781 // +---------+----------+
8786 // +-------------------+ +----------------------+
8787 // | GT_RELOAD (3) | | y (2) |
8788 // +-------------------+ +----------------------+
8790 // +-------------------+
8792 // +-------------------+
8794 // Note in particular that the GT_RELOAD node gets inserted in execution order immediately before the parent of "tree",
8795 // which seems a bit weird since normally a node's parent (in this case, the parent of "x", GT_RELOAD in the "after"
8796 // picture) immediately follows all of its children (that is, normally the execution ordering is postorder).
8797 // The ordering must be this weird "out of normal order" way because the "x" node is being spilled, probably
8798 // because the expression in the tree represented above by "y" has high register requirements. We don't want
8799 // to reload immediately, of course. So we put GT_RELOAD where the reload should actually happen.
8801 // Note that GT_RELOAD is required when we reload to a different register than the one we spilled to. It can also be
8802 // used if we reload to the same register. Normally, though, in that case we just mark the node with GTF_SPILLED,
8803 // and the unspilling code automatically reuses the same register, and does the reload when it notices that flag
8804 // when considering a node's operands.
8806 void LinearScan::insertCopyOrReload(BasicBlock* block, GenTreePtr tree, unsigned multiRegIdx, RefPosition* refPosition)
8808 LIR::Range& blockRange = LIR::AsRange(block);
8810 LIR::Use treeUse;
8811 bool foundUse = blockRange.TryGetUse(tree, &treeUse);
8812 assert(foundUse);
8814 GenTree* parent = treeUse.User();
8816 genTreeOps oper;
8817 if (refPosition->reload)
8819 oper = GT_RELOAD;
8821 else
8823 oper = GT_COPY;
8825 #if TRACK_LSRA_STATS
8826 updateLsraStat(LSRA_STAT_COPY_REG, block->bbNum);
8827 #endif // TRACK_LSRA_STATS
8830 // If the parent is a reload/copy node, then tree must be a multi-reg call node
8831 // that has already had one of its registers spilled. This is because a multi-reg
8832 // call node is the only node whose RefTypeDef positions get independently
8833 // spilled or reloaded. It is possible that one of its RefTypeDef positions got
8834 // spilled, and the next use of it requires it to be in a different register.
8836 // In this case, set the ith-position reg of the reload/copy node to the reg allocated
8837 // for the copy/reload refPosition. Essentially, a copy/reload node will have a reg
8838 // for each multi-reg position of its child. If there is a valid reg in the ith
8839 // position of a GT_COPY or GT_RELOAD node, then the corresponding result of its
8840 // child needs to be copied or reloaded to that reg.
8841 if (parent->IsCopyOrReload())
8843 noway_assert(parent->OperGet() == oper);
8844 noway_assert(tree->IsMultiRegCall());
8845 GenTreeCall* call = tree->AsCall();
8846 GenTreeCopyOrReload* copyOrReload = parent->AsCopyOrReload();
8847 noway_assert(copyOrReload->GetRegNumByIdx(multiRegIdx) == REG_NA);
8848 copyOrReload->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8852 // Create the new node, with "tree" as its only child.
8853 var_types treeType = tree->TypeGet();
8855 GenTreeCopyOrReload* newNode = new (compiler, oper) GenTreeCopyOrReload(oper, treeType, tree);
8856 assert(refPosition->registerAssignment != RBM_NONE);
8857 newNode->SetRegNumByIdx(refPosition->assignedReg(), multiRegIdx);
8858 newNode->gtLsraInfo.isLsraAdded = true;
8859 newNode->gtLsraInfo.isLocalDefUse = false;
8860 if (refPosition->copyReg)
8862 // This is a TEMPORARY copy
8863 assert(isCandidateLocalRef(tree));
8864 newNode->gtFlags |= GTF_VAR_DEATH;
8867 // Insert the copy/reload after the spilled node and replace the use of the original node with a use
8868 // of the copy/reload.
8869 blockRange.InsertAfter(tree, newNode);
8870 treeUse.ReplaceWith(compiler, newNode);
8874 #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
8875 //------------------------------------------------------------------------
8876 // insertUpperVectorSaveAndReload: Insert code to save and restore the upper half of a vector that lives
8877 // in a callee-save register at the point of a kill (the upper half is
8878 // not preserved).
8881 // tree - This is the node around which we will insert the Save & Reload.
8882 // It will be a call or some node that turns into a call.
8883 // refPosition - The RefTypeUpperVectorSaveDef RefPosition.
8885 void LinearScan::insertUpperVectorSaveAndReload(GenTreePtr tree, RefPosition* refPosition, BasicBlock* block)
8887 Interval* lclVarInterval = refPosition->getInterval()->relatedInterval;
8888 assert(lclVarInterval->isLocalVar == true);
8889 LclVarDsc* varDsc = compiler->lvaTable + lclVarInterval->varNum;
8890 assert(varDsc->lvType == LargeVectorType);
8891 regNumber lclVarReg = lclVarInterval->physReg;
8892 if (lclVarReg == REG_NA)
8897 assert((genRegMask(lclVarReg) & RBM_FLT_CALLEE_SAVED) != RBM_NONE);
8899 regNumber spillReg = refPosition->assignedReg();
8900 bool spillToMem = refPosition->spillAfter;
8902 LIR::Range& blockRange = LIR::AsRange(block);
8904 // First, insert the save before the call.
8906 GenTreePtr saveLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8907 saveLcl->gtLsraInfo.isLsraAdded = true;
8908 saveLcl->gtRegNum = lclVarReg;
8909 saveLcl->gtLsraInfo.isLocalDefUse = false;
8911 GenTreeSIMD* simdNode =
8912 new (compiler, GT_SIMD) GenTreeSIMD(LargeVectorSaveType, saveLcl, nullptr, SIMDIntrinsicUpperSave,
8913 varDsc->lvBaseType, genTypeSize(LargeVectorType));
8914 simdNode->gtLsraInfo.isLsraAdded = true;
8915 simdNode->gtRegNum = spillReg;
8918 simdNode->gtFlags |= GTF_SPILL;
8921 blockRange.InsertBefore(tree, LIR::SeqTree(compiler, simdNode));
8923 // Now insert the restore after the call.
8925 GenTreePtr restoreLcl = compiler->gtNewLclvNode(lclVarInterval->varNum, LargeVectorType);
8926 restoreLcl->gtLsraInfo.isLsraAdded = true;
8927 restoreLcl->gtRegNum = lclVarReg;
8928 restoreLcl->gtLsraInfo.isLocalDefUse = false;
8930 simdNode = new (compiler, GT_SIMD)
8931 GenTreeSIMD(LargeVectorType, restoreLcl, nullptr, SIMDIntrinsicUpperRestore, varDsc->lvBaseType, 32);
8932 simdNode->gtLsraInfo.isLsraAdded = true;
8933 simdNode->gtRegNum = spillReg;
8936 simdNode->gtFlags |= GTF_SPILLED;
8939 blockRange.InsertAfter(tree, LIR::SeqTree(compiler, simdNode));
8941 #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
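
// The generated sequence is, schematically:
//     spillReg = GT_SIMD(SIMDIntrinsicUpperSave, lclVarReg)      ; inserted before the call
//     ...the call...
//     spillReg = GT_SIMD(SIMDIntrinsicUpperRestore, lclVarReg)   ; inserted after the call
// with GTF_SPILL/GTF_SPILLED set on the save/restore nodes when no temp register was
// available and the upper half must live in memory across the call.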
//------------------------------------------------------------------------
// initMaxSpill: Initializes the LinearScan members used to track the max number
//               of concurrent spills. This is needed so that we can set the
//               fields in Compiler, so that the code generator, in turn, can
//               allocate the right number of spill locations.
//
// Notes:
//    This is called before any calls to updateMaxSpill().
//
void LinearScan::initMaxSpill()
{
    needDoubleTmpForFPCall = false;
    needFloatTmpForFPCall  = false;
    for (int i = 0; i < TYP_COUNT; i++)
    {
        maxSpill[i]     = 0;
        currentSpill[i] = 0;
    }
}
//------------------------------------------------------------------------
// recordMaxSpill: Sets the fields in Compiler for the max number of concurrent spills.
//                 (See the comment on initMaxSpill.)
//
// Notes:
//    This is called after updateMaxSpill() has been called for all "real"
//    RefPositions.
//
void LinearScan::recordMaxSpill()
{
    // Note: due to the temp normalization process (see tmpNormalizeType)
    // only a few types should actually be seen here.
    JITDUMP("Recording the maximum number of concurrent spills:\n");
#ifdef _TARGET_X86_
    var_types returnType = compiler->tmpNormalizeType(compiler->info.compRetType);
    if (needDoubleTmpForFPCall || (returnType == TYP_DOUBLE))
    {
        JITDUMP("Adding a spill temp for moving a double call/return value between xmm reg and x87 stack.\n");
        maxSpill[TYP_DOUBLE] += 1;
    }
    if (needFloatTmpForFPCall || (returnType == TYP_FLOAT))
    {
        JITDUMP("Adding a spill temp for moving a float call/return value between xmm reg and x87 stack.\n");
        maxSpill[TYP_FLOAT] += 1;
    }
#endif // _TARGET_X86_
    for (int i = 0; i < TYP_COUNT; i++)
    {
        if (var_types(i) != compiler->tmpNormalizeType(var_types(i)))
        {
            // Only normalized types should have anything in the maxSpill array.
            // We assume here that if type 'i' does not normalize to itself, then
            // nothing else normalizes to 'i', either.
            assert(maxSpill[i] == 0);
        }
        if (maxSpill[i] != 0)
        {
            JITDUMP("  %s: %d\n", varTypeName(var_types(i)), maxSpill[i]);
            compiler->tmpPreAllocateTemps(var_types(i), maxSpill[i]);
        }
    }
}
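
// For example (illustrative): small integer types such as TYP_UBYTE normalize to
// TYP_INT, so after normalization maxSpill[TYP_UBYTE] must be zero, and any byte-sized
// spill is counted (and pre-allocated) as a TYP_INT temp.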
//------------------------------------------------------------------------
// updateMaxSpill: Update the maximum number of concurrent spills
//
// Arguments:
//    refPosition - the current RefPosition being handled
//
// Assumptions:
//    The RefPosition has an associated interval (getInterval() will
//    otherwise assert).
//
// Notes:
//    This is called for each "real" RefPosition during the writeback
//    phase of LSRA. It keeps track of how many concurrently-live
//    spills there are, and the largest number seen so far.
//
void LinearScan::updateMaxSpill(RefPosition* refPosition)
{
    RefType refType = refPosition->refType;

    if (refPosition->spillAfter || refPosition->reload ||
        (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA))
    {
        Interval* interval = refPosition->getInterval();
        if (!interval->isLocalVar)
        {
            // The tmp allocation logic 'normalizes' types to a small number of
            // types that need distinct stack locations from each other.
            // Those types are currently gc refs, byrefs, <= 4 byte non-GC items,
            // 8-byte non-GC items, and 16-byte or 32-byte SIMD vectors.
            // LSRA is agnostic to those choices but needs
            // to know what they are here.
            var_types typ;

#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
            if ((refType == RefTypeUpperVectorSaveDef) || (refType == RefTypeUpperVectorSaveUse))
            {
                typ = LargeVectorSaveType;
            }
            else
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE
            {
                GenTreePtr treeNode = refPosition->treeNode;
                if (treeNode == nullptr)
                {
                    assert(RefTypeIsUse(refType));
                    treeNode = interval->firstRefPosition->treeNode;
                }
                assert(treeNode != nullptr);

                // In case of multi-reg call nodes, we need to use the type
                // of the return register given by multiRegIdx of the refposition.
                if (treeNode->IsMultiRegCall())
                {
                    ReturnTypeDesc* retTypeDesc = treeNode->AsCall()->GetReturnTypeDesc();
                    typ                         = retTypeDesc->GetReturnRegType(refPosition->getMultiRegIdx());
                }
#ifdef _TARGET_ARM_
                else if (treeNode->OperIsPutArgSplit())
                {
                    typ = treeNode->AsPutArgSplit()->GetRegType(refPosition->getMultiRegIdx());
                }
#ifdef ARM_SOFTFP
                else if (treeNode->OperIsPutArgReg())
                {
                    // For double arg regs, the type is changed to long since they must be passed via `r0-r3`.
                    // However when they get spilled, they should be treated as separate int registers.
                    var_types typNode = treeNode->TypeGet();
                    typ               = (typNode == TYP_LONG) ? TYP_INT : typNode;
                }
#endif // ARM_SOFTFP
#endif // _TARGET_ARM_
                else
                {
                    typ = treeNode->TypeGet();
                }
                typ = compiler->tmpNormalizeType(typ);
            }

            if (refPosition->spillAfter && !refPosition->reload)
            {
                currentSpill[typ]++;
                if (currentSpill[typ] > maxSpill[typ])
                {
                    maxSpill[typ] = currentSpill[typ];
                }
            }
            else if (refPosition->reload)
            {
                assert(currentSpill[typ] > 0);
                currentSpill[typ]--;
            }
            else if (refPosition->AllocateIfProfitable() && refPosition->assignedReg() == REG_NA)
            {
                // A spill temp not getting reloaded into a reg because it is
                // marked as allocate-if-profitable and getting used from its
                // memory location. To properly account max spill for typ we
                // decrement spill count.
                assert(RefTypeIsUse(refType));
                assert(currentSpill[typ] > 0);
                currentSpill[typ]--;
            }
            JITDUMP("  Max spill for %s is %d\n", varTypeName(typ), maxSpill[typ]);
        }
    }
}
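
// For example (illustrative): for a TYP_INT tree temp that is defined, spilled, and
// later reloaded, the def's RefPosition has spillAfter set, so currentSpill[TYP_INT]
// goes from 0 to 1 (and maxSpill[TYP_INT] becomes at least 1); the use's RefPosition
// has reload set, so currentSpill[TYP_INT] returns to 0. Two such temps live across
// the same region would drive maxSpill[TYP_INT] to 2, and recordMaxSpill() would then
// pre-allocate two TYP_INT spill temps.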
//------------------------------------------------------------------------
// resolveRegisters: Write back the register assignments, and perform resolution.
//
// Notes:
//    This is the final phase of register allocation. It writes the register assignments to
//    the tree, and performs resolution across joins and backedges.
//
void LinearScan::resolveRegisters()
{
    // Iterate over the tree and the RefPositions in lockstep
    //   - annotate the tree with register assignments by setting gtRegNum or gtRegPair (for longs)
    //   - track globally-live var locations
    //   - add resolution points at split/merge/critical points as needed
    //
    // Need to use the same traversal order as the one that assigns the location numbers.
    //
    // Dummy RefPositions have been added at any split, join or critical edge, at the
    // point where resolution may be required. These are located:
    //   - for a split, at the top of the non-adjacent block
    //   - for a join, at the bottom of the non-adjacent joining block
    //   - for a critical edge, at the top of the target block of each critical
    //     edge.
    // Note that a target block may have multiple incoming critical or split edges.
    //
    // These RefPositions record the expected location of the Interval at that point.
    // At each branch, we identify the location of each liveOut interval, and check
    // against the RefPositions at the target.

    BasicBlock*  block;
    LsraLocation currentLocation = MinLocation;

    // Clear register assignments - these will be reestablished as lclVar defs (including RefTypeParamDefs)
    // are encountered.
    if (enregisterLocalVars)
    {
        for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
        {
            RegRecord* physRegRecord    = getRegisterRecord(reg);
            Interval*  assignedInterval = physRegRecord->assignedInterval;
            if (assignedInterval != nullptr)
            {
                assignedInterval->assignedReg = nullptr;
                assignedInterval->physReg     = REG_NA;
            }
            physRegRecord->assignedInterval  = nullptr;
            physRegRecord->recentRefPosition = nullptr;
        }

        // Clear "recentRefPosition" for lclVar intervals
        for (unsigned varIndex = 0; varIndex < compiler->lvaTrackedCount; varIndex++)
        {
            if (localVarIntervals[varIndex] != nullptr)
            {
                localVarIntervals[varIndex]->recentRefPosition = nullptr;
                localVarIntervals[varIndex]->isActive          = false;
            }
            else
            {
                assert(compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate == false);
            }
        }
    }
    // Handle incoming arguments and special temps.
    auto currentRefPosition = refPositions.begin();

    if (enregisterLocalVars)
    {
        VarToRegMap entryVarToRegMap = inVarToRegMaps[compiler->fgFirstBB->bbNum];
        while (currentRefPosition != refPositions.end() &&
               (currentRefPosition->refType == RefTypeParamDef || currentRefPosition->refType == RefTypeZeroInit))
        {
            Interval* interval = currentRefPosition->getInterval();
            assert(interval != nullptr && interval->isLocalVar);
            resolveLocalRef(nullptr, nullptr, currentRefPosition);
            regNumber reg      = REG_STK;
            int       varIndex = interval->getVarIndex(compiler);

            if (!currentRefPosition->spillAfter && currentRefPosition->registerAssignment != RBM_NONE)
            {
                reg = currentRefPosition->assignedReg();
            }
            else
            {
                interval->isActive = false;
            }
            setVarReg(entryVarToRegMap, varIndex, reg);
            ++currentRefPosition;
        }
    }
    else
    {
        assert(currentRefPosition == refPositions.end() ||
               (currentRefPosition->refType != RefTypeParamDef && currentRefPosition->refType != RefTypeZeroInit));
    }

    BasicBlock* insertionBlock = compiler->fgFirstBB;
    GenTreePtr  insertionPoint = LIR::AsRange(insertionBlock).FirstNonPhiNode();
    // Write back assignments.
    for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
    {
        assert(curBBNum == block->bbNum);

        if (enregisterLocalVars)
        {
            // Record the var locations at the start of this block.
            // (If it's fgFirstBB, we've already done that above, see entryVarToRegMap)

            curBBStartLocation = currentRefPosition->nodeLocation;
            if (block != compiler->fgFirstBB)
            {
                processBlockStartLocations(block, false);
            }

            // Handle the DummyDefs, updating the incoming var location.
            for (; currentRefPosition != refPositions.end() && currentRefPosition->refType == RefTypeDummyDef;
                 ++currentRefPosition)
            {
                assert(currentRefPosition->isIntervalRef());
                // Don't mark dummy defs as reload
                currentRefPosition->reload = false;
                resolveLocalRef(nullptr, nullptr, currentRefPosition);
                regNumber reg = REG_STK;
                if (currentRefPosition->registerAssignment != RBM_NONE)
                {
                    reg = currentRefPosition->assignedReg();
                }
                else
                {
                    currentRefPosition->getInterval()->isActive = false;
                }
                setInVarRegForBB(curBBNum, currentRefPosition->getInterval()->varNum, reg);
            }
        }

        // The next RefPosition should be for the block. Move past it.
        assert(currentRefPosition != refPositions.end());
        assert(currentRefPosition->refType == RefTypeBB);
        ++currentRefPosition;
        // Handle the RefPositions for the block
        for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB &&
               currentRefPosition->refType != RefTypeDummyDef;
             ++currentRefPosition)
        {
            currentLocation = currentRefPosition->nodeLocation;

            // Ensure that the spill & copy info is valid.
            // First, if it's reload, it must not be copyReg or moveReg
            assert(!currentRefPosition->reload || (!currentRefPosition->copyReg && !currentRefPosition->moveReg));
            // If it's copyReg it must not be moveReg, and vice-versa
            assert(!currentRefPosition->copyReg || !currentRefPosition->moveReg);

            switch (currentRefPosition->refType)
            {
#ifdef FEATURE_SIMD
                case RefTypeUpperVectorSaveUse:
                case RefTypeUpperVectorSaveDef:
#endif // FEATURE_SIMD
                case RefTypeUse:
                case RefTypeDef:
                    // These are the ones we're interested in
                    break;
                case RefTypeKill:
                case RefTypeFixedReg:
                    // These require no handling at resolution time
                    assert(currentRefPosition->referent != nullptr);
                    currentRefPosition->referent->recentRefPosition = currentRefPosition;
                    continue;
                case RefTypeExpUse:
                    // Ignore the ExpUse cases - a RefTypeExpUse would only exist if the
                    // variable is dead at the entry to the next block. So we'll mark
                    // it as in its current location and resolution will take care of any
                    // mismatch.
                    assert(getNextBlock() == nullptr ||
                           !VarSetOps::IsMember(compiler, getNextBlock()->bbLiveIn,
                                                currentRefPosition->getInterval()->getVarIndex(compiler)));
                    currentRefPosition->referent->recentRefPosition = currentRefPosition;
                    continue;
                case RefTypeKillGCRefs:
                    // No action to take at resolution time, and no interval to update recentRefPosition for.
                    continue;
                case RefTypeDummyDef:
                case RefTypeParamDef:
                case RefTypeZeroInit:
                    // Should have handled all of these already
                    unreached();
                    break;
                default:
                    unreached();
                    break;
            }

            updateMaxSpill(currentRefPosition);
            GenTree* treeNode = currentRefPosition->treeNode;

#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
            if (currentRefPosition->refType == RefTypeUpperVectorSaveDef)
            {
                // The treeNode must be a call, and this must be a RefPosition for a LargeVectorType LocalVar.
                // If the LocalVar is in a callee-save register, we are going to spill its upper half around the call.
                // If we have allocated a register to spill it to, we will use that; otherwise, we will spill it
                // to the stack. We can use as a temp register any non-arg caller-save register.
                noway_assert(treeNode != nullptr);
                currentRefPosition->referent->recentRefPosition = currentRefPosition;
                insertUpperVectorSaveAndReload(treeNode, currentRefPosition, block);
            }
            else if (currentRefPosition->refType == RefTypeUpperVectorSaveUse)
            {
                continue;
            }
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

            // Most uses won't actually need to be recorded (they're on the def).
            // In those cases, treeNode will be nullptr.
            if (treeNode == nullptr)
            {
                // This is either a use, a dead def, or a field of a struct
                Interval* interval = currentRefPosition->getInterval();
                assert(currentRefPosition->refType == RefTypeUse ||
                       currentRefPosition->registerAssignment == RBM_NONE || interval->isStructField);

                // TODO-Review: Need to handle the case where any of the struct fields
                // are reloaded/spilled at this use.
                assert(!interval->isStructField ||
                       (currentRefPosition->reload == false && currentRefPosition->spillAfter == false));

                if (interval->isLocalVar && !interval->isStructField)
                {
                    LclVarDsc* varDsc = interval->getLocalVar(compiler);

                    // This must be a dead definition. We need to mark the lclVar
                    // so that it's not considered a candidate for lvRegister, as
                    // this dead def will have to go to the stack.
                    assert(currentRefPosition->refType == RefTypeDef);
                    varDsc->lvRegNum = REG_STK;
                }
                continue;
            }
            LsraLocation loc = treeNode->gtLsraInfo.loc;
            assert(treeNode->IsLocal() || currentLocation == loc || currentLocation == loc + 1);

            if (currentRefPosition->isIntervalRef() && currentRefPosition->getInterval()->isInternal)
            {
                treeNode->gtRsvdRegs |= currentRefPosition->registerAssignment;
            }
            else
            {
                writeRegisters(currentRefPosition, treeNode);

                if (treeNode->IsLocal() && currentRefPosition->getInterval()->isLocalVar)
                {
                    resolveLocalRef(block, treeNode, currentRefPosition);
                }

                // Mark spill locations on temps
                // (local vars are handled in resolveLocalRef, above)
                // Note that the tree node will be changed from GTF_SPILL to GTF_SPILLED
                // in codegen, taking care of the "reload" case for temps
                else if (currentRefPosition->spillAfter || (currentRefPosition->nextRefPosition != nullptr &&
                                                            currentRefPosition->nextRefPosition->moveReg))
                {
                    if (treeNode != nullptr && currentRefPosition->isIntervalRef())
                    {
                        if (currentRefPosition->spillAfter)
                        {
                            treeNode->gtFlags |= GTF_SPILL;

                            // If this is a constant interval that is reusing a pre-existing value, we actually need
                            // to generate the value at this point in order to spill it.
                            if (treeNode->IsReuseRegVal())
                            {
                                treeNode->ResetReuseRegVal();
                            }

                            // In case of multi-reg call node, also set spill flag on the
                            // register specified by multi-reg index of current RefPosition.
                            // Note that the spill flag on treeNode indicates that one or
                            // more of its allocated registers are in that state.
                            if (treeNode->IsMultiRegCall())
                            {
                                GenTreeCall* call = treeNode->AsCall();
                                call->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
                            }
#ifdef _TARGET_ARM_
                            else if (treeNode->OperIsPutArgSplit())
                            {
                                GenTreePutArgSplit* splitArg = treeNode->AsPutArgSplit();
                                splitArg->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
                            }
                            else if (treeNode->OperIsMultiRegOp())
                            {
                                GenTreeMultiRegOp* multiReg = treeNode->AsMultiRegOp();
                                multiReg->SetRegSpillFlagByIdx(GTF_SPILL, currentRefPosition->getMultiRegIdx());
                            }
#endif // _TARGET_ARM_
                        }
                        // If the value is reloaded or moved to a different register, we need to insert
                        // a node to hold the register to which it should be reloaded
                        RefPosition* nextRefPosition = currentRefPosition->nextRefPosition;
                        assert(nextRefPosition != nullptr);
                        if (INDEBUG(alwaysInsertReload() ||)
                                nextRefPosition->assignedReg() != currentRefPosition->assignedReg())
                        {
                            if (nextRefPosition->assignedReg() != REG_NA)
                            {
                                insertCopyOrReload(block, treeNode, currentRefPosition->getMultiRegIdx(),
                                                   nextRefPosition);
                            }
                            else
                            {
                                assert(nextRefPosition->AllocateIfProfitable());

                                // In case of tree temps, if def is spilled and use didn't
                                // get a register, set a flag on tree node to be treated as
                                // contained at the point of its use.
                                if (currentRefPosition->spillAfter && currentRefPosition->refType == RefTypeDef &&
                                    nextRefPosition->refType == RefTypeUse)
                                {
                                    assert(nextRefPosition->treeNode == nullptr);
                                    treeNode->gtFlags |= GTF_NOREG_AT_USE;
                                }
                            }
                        }
                    }
                    // We should never have to "spill after" a temp use, since
                    // they're single use
                    else
                    {
                        unreached();
                    }
                }
            }
        }
        if (enregisterLocalVars)
        {
            processBlockEndLocations(block);
        }
    }

    if (enregisterLocalVars)
    {
#ifdef DEBUG
        if (VERBOSE)
        {
            printf("-----------------------\n");
            printf("RESOLVING BB BOUNDARIES\n");
            printf("-----------------------\n");

            printf("Resolution Candidates: ");
            dumpConvertedVarSet(compiler, resolutionCandidateVars);
            printf("\n");
            printf("Has %sCritical Edges\n\n", hasCriticalEdges ? "" : "No");

            printf("Prior to Resolution\n");
            foreach_block(compiler, block)
            {
                printf("\nBB%02u use def in out\n", block->bbNum);
                dumpConvertedVarSet(compiler, block->bbVarUse);
                printf("\n");
                dumpConvertedVarSet(compiler, block->bbVarDef);
                printf("\n");
                dumpConvertedVarSet(compiler, block->bbLiveIn);
                printf("\n");
                dumpConvertedVarSet(compiler, block->bbLiveOut);
                printf("\n");

                dumpInVarToRegMap(block);
                dumpOutVarToRegMap(block);
            }

            printf("\n\n");
        }
#endif // DEBUG
        resolveEdges();

        // Verify register assignments on variables
        unsigned   lclNum;
        LclVarDsc* varDsc;
        for (lclNum = 0, varDsc = compiler->lvaTable; lclNum < compiler->lvaCount; lclNum++, varDsc++)
        {
            if (!isCandidateVar(varDsc))
            {
                varDsc->lvRegNum = REG_STK;
            }
            else
            {
                Interval* interval = getIntervalForLocalVar(varDsc->lvVarIndex);

                // Determine initial position for parameters

                if (varDsc->lvIsParam)
                {
                    regMaskTP initialRegMask = interval->firstRefPosition->registerAssignment;
                    regNumber initialReg     = (initialRegMask == RBM_NONE || interval->firstRefPosition->spillAfter)
                                                   ? REG_STK
                                                   : genRegNumFromMask(initialRegMask);
                    regNumber sourceReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;

#ifdef _TARGET_ARM_
                    if (varTypeIsMultiReg(varDsc))
                    {
                        // TODO-ARM-NYI: Map the hi/lo intervals back to lvRegNum and lvOtherReg (these should NYI
                        // before this).
                        assert(!"Multi-reg types not yet supported");
                    }
                    else
#endif // _TARGET_ARM_
                    {
                        varDsc->lvArgInitReg = initialReg;
                        JITDUMP("  Set V%02u argument initial register to %s\n", lclNum, getRegName(initialReg));
                    }

                    // Stack args that are part of dependently-promoted structs should never be register candidates
                    // (see LinearScan::isRegCandidate).
                    assert(varDsc->lvIsRegArg || !compiler->lvaIsFieldOfDependentlyPromotedStruct(varDsc));
                }

                // If lvRegNum is REG_STK, that means that either no register
                // was assigned, or (more likely) that the same register was not
                // used for all references. In that case, codegen gets the register
                // from the tree node.
                if (varDsc->lvRegNum == REG_STK || interval->isSpilled || interval->isSplit)
                {
                    // For codegen purposes, we'll set lvRegNum to whatever register
                    // it's currently in as we go.
                    // However, we never mark an interval as lvRegister if it has either been spilled
                    // or split.
                    varDsc->lvRegister = false;
                    // Skip any dead defs or exposed uses
                    // (first use exposed will only occur when there is no explicit initialization)
                    RefPosition* firstRefPosition = interval->firstRefPosition;
                    while ((firstRefPosition != nullptr) && (firstRefPosition->refType == RefTypeExpUse))
                    {
                        firstRefPosition = firstRefPosition->nextRefPosition;
                    }
                    if (firstRefPosition == nullptr)
                    {
                        // Dead interval
                        varDsc->lvLRACandidate = false;
                        if (varDsc->lvRefCnt == 0)
                        {
                            varDsc->lvOnFrame = false;
                        }
                        else
                        {
                            // We may encounter cases where a lclVar actually has no references, but
                            // a non-zero refCnt. For safety (in case this is some "hidden" lclVar that we're
                            // not correctly recognizing), we'll mark those as needing a stack location.
                            // TODO-Cleanup: Make this an assert if/when we correct the refCnt
                            // computation.
                            varDsc->lvOnFrame = true;
                        }
                    }
                    else
                    {
                        // If the interval was not spilled, it doesn't need a stack location.
                        if (!interval->isSpilled)
                        {
                            varDsc->lvOnFrame = false;
                        }
                        if (firstRefPosition->registerAssignment == RBM_NONE || firstRefPosition->spillAfter)
                        {
                            // Either this RefPosition is spilled, or regOptional or it is not a "real" def or use
                            assert(
                                firstRefPosition->spillAfter || firstRefPosition->AllocateIfProfitable() ||
                                (firstRefPosition->refType != RefTypeDef && firstRefPosition->refType != RefTypeUse));
                            varDsc->lvRegNum = REG_STK;
                        }
                        else
                        {
                            varDsc->lvRegNum = firstRefPosition->assignedReg();
                        }
                    }
                }
                else
                {
                    varDsc->lvRegister = true;
                    varDsc->lvOnFrame  = false;
#ifdef DEBUG
                    regMaskTP registerAssignment = genRegMask(varDsc->lvRegNum);
                    assert(!interval->isSpilled && !interval->isSplit);
                    RefPosition* refPosition = interval->firstRefPosition;
                    assert(refPosition != nullptr);

                    while (refPosition != nullptr)
                    {
                        // All RefPositions must match, except for dead definitions,
                        // copyReg/moveReg and RefTypeExpUse positions
                        if (refPosition->registerAssignment != RBM_NONE && !refPosition->copyReg &&
                            !refPosition->moveReg && refPosition->refType != RefTypeExpUse)
                        {
                            assert(refPosition->registerAssignment == registerAssignment);
                        }
                        refPosition = refPosition->nextRefPosition;
                    }
#endif // DEBUG
                }
            }
        }
    }
#ifdef DEBUG
    if (VERBOSE)
    {
        printf("Trees after linear scan register allocator (LSRA)\n");
        compiler->fgDispBasicBlocks(true);
    }

    verifyFinalAllocation();
#endif // DEBUG

    compiler->raMarkStkVars();
    recordMaxSpill();

    // TODO-CQ: Review this comment and address as needed.
    // Change all unused promoted non-argument struct locals to a non-GC type (in this case TYP_INT)
    // so that the gc tracking logic and lvMustInit logic will ignore them.
    // Extract the code that does this from raAssignVars, and call it here.
    // PRECONDITIONS: Ensure that lvPromoted is set on promoted structs, if and
    // only if it is promoted on all paths.
    // Call might be something like:
    //     compiler->BashUnusedStructLocals();
}
//------------------------------------------------------------------------
// insertMove: Insert a move of a lclVar with the given lclNum into the given block.
//
// Arguments:
//    block          - the BasicBlock into which the move will be inserted.
//    insertionPoint - the instruction before which to insert the move
//    lclNum         - the lclNum of the var to be moved
//    fromReg        - the register from which the var is moving
//    toReg          - the register to which the var is moving
//
// Notes:
//    If insertionPoint is non-NULL, insert before that instruction;
//    otherwise, insert "near" the end (prior to the branch, if any).
//    If fromReg or toReg is REG_STK, then move from/to memory, respectively.
//
void LinearScan::insertMove(
    BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum, regNumber fromReg, regNumber toReg)
{
    LclVarDsc* varDsc = compiler->lvaTable + lclNum;
    // The lclVar must be a register candidate
    assert(isRegCandidate(varDsc));
    // One or both MUST be a register
    assert(fromReg != REG_STK || toReg != REG_STK);
    // They must not be the same register.
    assert(fromReg != toReg);

    // This var can't be marked lvRegister now
    varDsc->lvRegNum = REG_STK;

    GenTreePtr src              = compiler->gtNewLclvNode(lclNum, varDsc->TypeGet());
    src->gtLsraInfo.isLsraAdded = true;

    // There are three cases we need to handle:
    // - We are loading a lclVar from the stack.
    // - We are storing a lclVar to the stack.
    // - We are copying a lclVar between registers.
    //
    // In the first and second cases, the lclVar node will be marked with GTF_SPILLED and GTF_SPILL, respectively.
    // It is up to the code generator to ensure that any necessary normalization is done when loading or storing the
    // lclVar's value.
    //
    // In the third case, we generate GT_COPY(GT_LCL_VAR) and type each node with the normalized type of the lclVar.
    // This is safe because a lclVar is always normalized once it is in a register.

    GenTreePtr dst = src;
    if (fromReg == REG_STK)
    {
        src->gtFlags |= GTF_SPILLED;
        src->gtRegNum = toReg;
    }
    else if (toReg == REG_STK)
    {
        src->gtFlags |= GTF_SPILL;
        src->gtRegNum = fromReg;
    }
    else
    {
        var_types movType = genActualType(varDsc->TypeGet());
        src->gtType       = movType;

        dst = new (compiler, GT_COPY) GenTreeCopyOrReload(GT_COPY, movType, src);
        // This is the new home of the lclVar - indicate that by clearing the GTF_VAR_DEATH flag.
        // Note that if src is itself a lastUse, this will have no effect.
        dst->gtFlags &= ~(GTF_VAR_DEATH);
        src->gtRegNum                 = fromReg;
        dst->gtRegNum                 = toReg;
        src->gtLsraInfo.isLocalDefUse = false;
        dst->gtLsraInfo.isLsraAdded   = true;
    }
    dst->gtLsraInfo.isLocalDefUse = true;

    LIR::Range  treeRange  = LIR::SeqTree(compiler, dst);
    LIR::Range& blockRange = LIR::AsRange(block);

    if (insertionPoint != nullptr)
    {
        blockRange.InsertBefore(insertionPoint, std::move(treeRange));
    }
    else
    {
        // Put the copy at the bottom.
        // If there's a branch, make an embedded statement that executes just prior to the branch.
        if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
        {
            noway_assert(!blockRange.IsEmpty());

            GenTree* branch = blockRange.LastNode();
            assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
                   branch->OperGet() == GT_SWITCH);

            blockRange.InsertBefore(branch, std::move(treeRange));
        }
        else
        {
            assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
            blockRange.InsertAtEnd(std::move(treeRange));
        }
    }
}
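
// Schematically, the three cases above produce:
//     fromReg == REG_STK:  GT_LCL_VAR (GTF_SPILLED, gtRegNum = toReg)    ; load from stack
//     toReg   == REG_STK:  GT_LCL_VAR (GTF_SPILL, gtRegNum = fromReg)    ; store to stack
//     otherwise:           GT_COPY(GT_LCL_VAR), fromReg -> toReg         ; reg-to-reg move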
void LinearScan::insertSwap(
    BasicBlock* block, GenTreePtr insertionPoint, unsigned lclNum1, regNumber reg1, unsigned lclNum2, regNumber reg2)
{
#ifdef DEBUG
    if (VERBOSE)
    {
        const char* insertionPointString = "top";
        if (insertionPoint == nullptr)
        {
            insertionPointString = "bottom";
        }
        printf("  BB%02u %s: swap V%02u in %s with V%02u in %s\n", block->bbNum, insertionPointString, lclNum1,
               getRegName(reg1), lclNum2, getRegName(reg2));
    }
#endif // DEBUG

    LclVarDsc* varDsc1 = compiler->lvaTable + lclNum1;
    LclVarDsc* varDsc2 = compiler->lvaTable + lclNum2;
    assert(reg1 != REG_STK && reg1 != REG_NA && reg2 != REG_STK && reg2 != REG_NA);

    GenTreePtr lcl1                = compiler->gtNewLclvNode(lclNum1, varDsc1->TypeGet());
    lcl1->gtLsraInfo.isLsraAdded   = true;
    lcl1->gtLsraInfo.isLocalDefUse = false;
    lcl1->gtRegNum                 = reg1;

    GenTreePtr lcl2                = compiler->gtNewLclvNode(lclNum2, varDsc2->TypeGet());
    lcl2->gtLsraInfo.isLsraAdded   = true;
    lcl2->gtLsraInfo.isLocalDefUse = false;
    lcl2->gtRegNum                 = reg2;

    GenTreePtr swap                = compiler->gtNewOperNode(GT_SWAP, TYP_VOID, lcl1, lcl2);
    swap->gtLsraInfo.isLsraAdded   = true;
    swap->gtLsraInfo.isLocalDefUse = false;
    swap->gtRegNum                 = REG_NA;

    lcl1->gtNext = lcl2;
    lcl2->gtPrev = lcl1;
    lcl2->gtNext = swap;
    swap->gtPrev = lcl2;

    LIR::Range  swapRange  = LIR::SeqTree(compiler, swap);
    LIR::Range& blockRange = LIR::AsRange(block);

    if (insertionPoint != nullptr)
    {
        blockRange.InsertBefore(insertionPoint, std::move(swapRange));
    }
    else
    {
        // Put the swap at the bottom.
        // If there's a branch, make an embedded statement that executes just prior to the branch.
        if (block->bbJumpKind == BBJ_COND || block->bbJumpKind == BBJ_SWITCH)
        {
            noway_assert(!blockRange.IsEmpty());

            GenTree* branch = blockRange.LastNode();
            assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
                   branch->OperGet() == GT_SWITCH);

            blockRange.InsertBefore(branch, std::move(swapRange));
        }
        else
        {
            assert(block->bbJumpKind == BBJ_NONE || block->bbJumpKind == BBJ_ALWAYS);
            blockRange.InsertAtEnd(std::move(swapRange));
        }
    }
}
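
// A GT_SWAP exchanges the contents of the two registers directly; this lets resolution
// break a two-register cycle without an integer temp register (see the xchg note in
// resolveEdge below).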
//------------------------------------------------------------------------
// getTempRegForResolution: Get a free register to use for resolution code.
//
// Arguments:
//    fromBlock - The "from" block on the edge being resolved.
//    toBlock   - The "to" block on the edge.
//    type      - The type of register required.
//
// Return Value:
//    Returns a register that is free on the given edge, or REG_NA if none is available.
//
// Notes:
//    It is up to the caller to check the return value, determine whether a register is
//    available, and handle the case where none is.
//    It is also up to the caller to cache the return value, as this is not cheap to compute.
//
regNumber LinearScan::getTempRegForResolution(BasicBlock* fromBlock, BasicBlock* toBlock, var_types type)
{
    // TODO-Throughput: This would be much more efficient if we kept RegToVarMaps instead of
    // VarToRegMaps; they would also be more space-efficient.
    VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
    VarToRegMap toVarToRegMap   = getInVarToRegMap(toBlock->bbNum);

    regMaskTP freeRegs = allRegs(type);
#ifdef DEBUG
    if (getStressLimitRegs() == LSRA_LIMIT_SMALL_SET)
    {
        return REG_NA;
    }
#endif // DEBUG
    INDEBUG(freeRegs = stressLimitRegs(nullptr, freeRegs));

    // We are only interested in the variables that are live-in to the "to" block.
    VarSetOps::Iter iter(compiler, toBlock->bbLiveIn);
    unsigned        varIndex = 0;
    while (iter.NextElem(&varIndex) && freeRegs != RBM_NONE)
    {
        regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
        regNumber toReg   = getVarReg(toVarToRegMap, varIndex);
        assert(fromReg != REG_NA && toReg != REG_NA);
        if (fromReg != REG_STK)
        {
            freeRegs &= ~genRegMask(fromReg);
        }
        if (toReg != REG_STK)
        {
            freeRegs &= ~genRegMask(toReg);
        }
    }
    if (freeRegs == RBM_NONE)
    {
        return REG_NA;
    }
    else
    {
        regNumber tempReg = genRegNumFromMask(genFindLowestBit(freeRegs));
        return tempReg;
    }
}
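
// For example (illustrative): if the edge moves V01 from rax to rcx while V02 stays in
// rdx, then rax, rcx and rdx are all removed from freeRegs, and the lowest remaining
// register of the requested type is returned.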
//------------------------------------------------------------------------
// addResolution: Add a resolution move of the given interval
//
// Arguments:
//    block          - the BasicBlock into which the move will be inserted.
//    insertionPoint - the instruction before which to insert the move
//    interval       - the interval of the var to be moved
//    toReg          - the register to which the var is moving
//    fromReg        - the register from which the var is moving
//
// Notes:
//    For joins, we insert at the bottom (indicated by an insertionPoint
//    of nullptr), while for splits we insert at the top.
//    This is because for joins 'block' is a pred of the join, while for splits it is a succ.
//    For critical edges, this function may be called twice - once to move from
//    the source (fromReg), if any, to the stack, in which case toReg will be
//    REG_STK, and we insert at the bottom (leave insertionPoint as nullptr).
//    The next time, we want to move from the stack to the destination (toReg),
//    in which case fromReg will be REG_STK, and we insert at the top.
//
void LinearScan::addResolution(
    BasicBlock* block, GenTreePtr insertionPoint, Interval* interval, regNumber toReg, regNumber fromReg)
{
#ifdef DEBUG
    const char* insertionPointString = "top";
#endif // DEBUG
    if (insertionPoint == nullptr)
    {
#ifdef DEBUG
        insertionPointString = "bottom";
#endif // DEBUG
    }

    JITDUMP("  BB%02u %s: move V%02u from ", block->bbNum, insertionPointString, interval->varNum);
    JITDUMP("%s to %s", getRegName(fromReg), getRegName(toReg));

    insertMove(block, insertionPoint, interval->varNum, fromReg, toReg);
    if (fromReg == REG_STK || toReg == REG_STK)
    {
        assert(interval->isSpilled);
    }
    else
    {
        // We should have already marked this as spilled or split.
        assert((interval->isSpilled) || (interval->isSplit));
    }

    INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
}
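
// Note: the dump output above deliberately omits the trailing newline; callers append
// the resolve type name (via resolveTypeName[resolveType]) to complete the line.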
//------------------------------------------------------------------------
// handleOutgoingCriticalEdges: Performs the necessary resolution on all critical edges that feed out of 'block'
//
// Arguments:
//    block - the block with outgoing critical edges.
//
// Notes:
//    For all outgoing critical edges (i.e. any successor of this block which is
//    a join edge), if there are any conflicts, split the edge by adding a new block,
//    and generate the resolution code into that block.
//
void LinearScan::handleOutgoingCriticalEdges(BasicBlock* block)
{
    VARSET_TP outResolutionSet(VarSetOps::Intersection(compiler, block->bbLiveOut, resolutionCandidateVars));
    if (VarSetOps::IsEmpty(compiler, outResolutionSet))
    {
        return;
    }
    VARSET_TP sameResolutionSet(VarSetOps::MakeEmpty(compiler));
    VARSET_TP sameLivePathsSet(VarSetOps::MakeEmpty(compiler));
    VARSET_TP singleTargetSet(VarSetOps::MakeEmpty(compiler));
    VARSET_TP diffResolutionSet(VarSetOps::MakeEmpty(compiler));

    // Get the outVarToRegMap for this block
    VarToRegMap outVarToRegMap = getOutVarToRegMap(block->bbNum);
    unsigned    succCount      = block->NumSucc(compiler);
    assert(succCount > 1);
    VarToRegMap firstSuccInVarToRegMap = nullptr;
    BasicBlock* firstSucc              = nullptr;

    // First, determine the live regs at the end of this block so that we know what regs are
    // available to copy into.
    // Note that for this purpose we use the full live-out set, because we must ensure that
    // even the registers that remain the same across the edge are preserved correctly.
    regMaskTP       liveOutRegs = RBM_NONE;
    VarSetOps::Iter liveOutIter(compiler, block->bbLiveOut);
    unsigned        liveOutVarIndex = 0;
    while (liveOutIter.NextElem(&liveOutVarIndex))
    {
        regNumber fromReg = getVarReg(outVarToRegMap, liveOutVarIndex);
        if (fromReg != REG_STK)
        {
            liveOutRegs |= genRegMask(fromReg);
        }
    }

    // Next, if this block ends with a switch table, we have to make sure not to copy
    // into the registers that it uses.
    regMaskTP switchRegs = RBM_NONE;
    if (block->bbJumpKind == BBJ_SWITCH)
    {
        // At this point, Lowering has transformed any non-switch-table blocks into
        // conditional branches.
        GenTree* switchTable = LIR::AsRange(block).LastNode();
        assert(switchTable != nullptr && switchTable->OperGet() == GT_SWITCH_TABLE);

        switchRegs   = switchTable->gtRsvdRegs;
        GenTree* op1 = switchTable->gtGetOp1();
        GenTree* op2 = switchTable->gtGetOp2();
        noway_assert(op1 != nullptr && op2 != nullptr);
        assert(op1->gtRegNum != REG_NA && op2->gtRegNum != REG_NA);
        switchRegs |= genRegMask(op1->gtRegNum);
        switchRegs |= genRegMask(op2->gtRegNum);
    }
    VarToRegMap sameVarToRegMap = sharedCriticalVarToRegMap;
    regMaskTP   sameWriteRegs   = RBM_NONE;
    regMaskTP   diffReadRegs    = RBM_NONE;

    // For each var that may require resolution, classify it as:
    // - in the same register at the end of this block and at each target (no resolution needed)
    // - in different registers at different targets (resolve separately):
    //     diffResolutionSet
    // - in the same register at each target at which it's live, but different from the end of
    //   this block. We may be able to resolve these as if it were a "join", but only if they do
    //   not write to any registers that are read by those in the diffResolutionSet:
    //     sameResolutionSet

    VarSetOps::Iter outResolutionSetIter(compiler, outResolutionSet);
    unsigned        outResolutionSetVarIndex = 0;
    while (outResolutionSetIter.NextElem(&outResolutionSetVarIndex))
    {
        regNumber fromReg             = getVarReg(outVarToRegMap, outResolutionSetVarIndex);
        bool      isMatch             = true;
        bool      isSame              = false;
        bool      maybeSingleTarget   = false;
        bool      maybeSameLivePaths  = false;
        bool      liveOnlyAtSplitEdge = true;
        regNumber sameToReg           = REG_NA;
        for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
        {
            BasicBlock* succBlock = block->GetSucc(succIndex, compiler);
            if (!VarSetOps::IsMember(compiler, succBlock->bbLiveIn, outResolutionSetVarIndex))
            {
                maybeSameLivePaths = true;
                continue;
            }
            else if (liveOnlyAtSplitEdge)
            {
                // Is the var live only at those target blocks which are connected by a split edge to this block?
                liveOnlyAtSplitEdge = ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB));
            }

            regNumber toReg = getVarReg(getInVarToRegMap(succBlock->bbNum), outResolutionSetVarIndex);
            if (sameToReg == REG_NA)
            {
                sameToReg = toReg;
                continue;
            }
            if (toReg == sameToReg)
            {
                continue;
            }
            sameToReg = REG_NA;
            break;
        }

        // Check for the cases where we can't write to a register.
        // We only need to check for these cases if sameToReg is an actual register (not REG_STK).
        if (sameToReg != REG_NA && sameToReg != REG_STK)
        {
            // If there's a path on which this var isn't live, it may use the original value in sameToReg.
            // In this case, sameToReg will be in the liveOutRegs of this block.
            // Similarly, if sameToReg is in sameWriteRegs, it has already been used (i.e. for a lclVar that's
            // live only at another target), and we can't copy another lclVar into that reg in this block.
            regMaskTP sameToRegMask = genRegMask(sameToReg);
            if (maybeSameLivePaths &&
                (((sameToRegMask & liveOutRegs) != RBM_NONE) || ((sameToRegMask & sameWriteRegs) != RBM_NONE)))
            {
                sameToReg = REG_NA;
            }
            // If this register is used by a switch table at the end of the block, we can't do the copy
            // in this block (since we can't insert it after the switch).
            if ((sameToRegMask & switchRegs) != RBM_NONE)
            {
                sameToReg = REG_NA;
            }

            // If the var is live only at those blocks connected by a split edge and not live-in at some of the
            // target blocks, we will resolve it the same way as if it were in diffResolutionSet and resolution
            // will be deferred to the handling of split edges, which means the copy will only be at those target(s).
            //
            // Another way to achieve similar resolution for vars live only at split edges is by removing them
            // from consideration up-front, but that would require us to traverse those edges anyway to account
            // for the registers that must not be overwritten.
            if (liveOnlyAtSplitEdge && maybeSameLivePaths)
            {
                sameToReg = REG_NA;
            }
        }

        if (sameToReg == REG_NA)
        {
            VarSetOps::AddElemD(compiler, diffResolutionSet, outResolutionSetVarIndex);
            if (fromReg != REG_STK)
            {
                diffReadRegs |= genRegMask(fromReg);
            }
        }
        else if (sameToReg != fromReg)
        {
            VarSetOps::AddElemD(compiler, sameResolutionSet, outResolutionSetVarIndex);
            setVarReg(sameVarToRegMap, outResolutionSetVarIndex, sameToReg);
            if (sameToReg != REG_STK)
            {
                sameWriteRegs |= genRegMask(sameToReg);
            }
        }
    }
    if (!VarSetOps::IsEmpty(compiler, sameResolutionSet))
    {
        if ((sameWriteRegs & diffReadRegs) != RBM_NONE)
        {
            // We cannot split the "same" and "diff" regs if the "same" set writes registers
            // that must be read by the "diff" set. (Note that when these are done as a "batch"
            // we carefully order them to ensure all the input regs are read before they are
            // overwritten.)
            VarSetOps::UnionD(compiler, diffResolutionSet, sameResolutionSet);
            VarSetOps::ClearD(compiler, sameResolutionSet);
        }
        else
        {
            // For any vars in the sameResolutionSet, we can simply add the move at the end of "block".
            resolveEdge(block, nullptr, ResolveSharedCritical, sameResolutionSet);
        }
    }
    if (!VarSetOps::IsEmpty(compiler, diffResolutionSet))
    {
        for (unsigned succIndex = 0; succIndex < succCount; succIndex++)
        {
            BasicBlock* succBlock = block->GetSucc(succIndex, compiler);

            // Any "diffResolutionSet" resolution for a block with no other predecessors will be handled later
            // as split resolution.
            if ((succBlock->bbPreds->flNext == nullptr) && (succBlock != compiler->fgFirstBB))
            {
                continue;
            }

            // Now collect the resolution set for just this edge, if any.
            // Check only the vars in diffResolutionSet that are live-in to this successor.
            bool        needsResolution   = false;
            VarToRegMap succInVarToRegMap = getInVarToRegMap(succBlock->bbNum);
            VARSET_TP   edgeResolutionSet(VarSetOps::Intersection(compiler, diffResolutionSet, succBlock->bbLiveIn));
            VarSetOps::Iter iter(compiler, edgeResolutionSet);
            unsigned        varIndex = 0;
            while (iter.NextElem(&varIndex))
            {
                regNumber fromReg = getVarReg(outVarToRegMap, varIndex);
                regNumber toReg   = getVarReg(succInVarToRegMap, varIndex);

                if (fromReg == toReg)
                {
                    VarSetOps::RemoveElemD(compiler, edgeResolutionSet, varIndex);
                }
            }
            if (!VarSetOps::IsEmpty(compiler, edgeResolutionSet))
            {
                resolveEdge(block, succBlock, ResolveCritical, edgeResolutionSet);
            }
        }
    }
}
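
// For example (illustrative): suppose 'block' ends with a conditional branch to two join
// targets, with V01 in rax at the end of 'block' and in rcx at both targets, while V02 is
// in rdx at one target but rsi at the other. V01 lands in sameResolutionSet, so a single
// shared move (rax -> rcx) is added at the bottom of 'block'; V02 lands in
// diffResolutionSet, so each of its edges is split and gets its own move.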
//------------------------------------------------------------------------
// resolveEdges: Perform resolution across basic block edges
//
// Notes:
//    Traverse the basic blocks.
//    - If this block has a single predecessor that is not the immediately
//      preceding block, perform any needed 'split' resolution at the beginning of this block
//    - Otherwise if this block has critical incoming edges, handle them.
//    - If this block has a single successor that has multiple predecessors, perform any needed
//      'join' resolution at the end of this block.
//    Note that a block may have both 'split' or 'critical' incoming edge(s) and 'join' outgoing
//    edges.
//
void LinearScan::resolveEdges()
{
    JITDUMP("RESOLVING EDGES\n");

    // The resolutionCandidateVars set was initialized with all the lclVars that are live-in to
    // any block. We now intersect that set with any lclVars that ever spilled or split.
    // If there are no candidates for resolution, simply return.

    VarSetOps::IntersectionD(compiler, resolutionCandidateVars, splitOrSpilledVars);
    if (VarSetOps::IsEmpty(compiler, resolutionCandidateVars))
    {
        return;
    }
    BasicBlock *block, *prevBlock = nullptr;

    // Handle all the critical edges first.
    // We will try to avoid resolution across critical edges in cases where all the critical-edge
    // targets of a block have the same home. We will then split the edges only for the
    // remaining mismatches. We visit the out-edges, as that allows us to share the moves that are
    // common among all the targets.
    if (hasCriticalEdges)
    {
        foreach_block(compiler, block)
        {
            if (block->bbNum > bbNumMaxBeforeResolution)
            {
                // This is a new block added during resolution - we don't need to visit these now.
                continue;
            }
            if (blockInfo[block->bbNum].hasCriticalOutEdge)
            {
                handleOutgoingCriticalEdges(block);
            }
            prevBlock = block;
        }
    }

    prevBlock = nullptr;
    foreach_block(compiler, block)
    {
        if (block->bbNum > bbNumMaxBeforeResolution)
        {
            // This is a new block added during resolution - we don't need to visit these now.
            continue;
        }

        unsigned    succCount       = block->NumSucc(compiler);
        flowList*   preds           = block->bbPreds;
        BasicBlock* uniquePredBlock = block->GetUniquePred(compiler);

        // First, if this block has a single predecessor,
        // we may need resolution at the beginning of this block.
        // This may be true even if it's the block we used for starting locations,
        // if a variable was spilled.
        VARSET_TP inResolutionSet(VarSetOps::Intersection(compiler, block->bbLiveIn, resolutionCandidateVars));
        if (!VarSetOps::IsEmpty(compiler, inResolutionSet))
        {
            if (uniquePredBlock != nullptr)
            {
                // We may have split edges during critical edge resolution, and in the process split
                // a non-critical edge as well.
                // It is unlikely that we would ever have more than one of these in sequence (indeed,
                // I don't think it's possible), but there's no need to assume that it can't.
                while (uniquePredBlock->bbNum > bbNumMaxBeforeResolution)
                {
                    uniquePredBlock = uniquePredBlock->GetUniquePred(compiler);
                    noway_assert(uniquePredBlock != nullptr);
                }
                resolveEdge(uniquePredBlock, block, ResolveSplit, inResolutionSet);
            }
        }

        // Finally, if this block has a single successor:
        //   - and that has at least one other predecessor (otherwise we will do the resolution at the
        //     top of the successor),
        //   - and that is not the target of a critical edge (otherwise we've already handled it)
        // we may need resolution at the end of this block.

        if (succCount == 1)
        {
            BasicBlock* succBlock = block->GetSucc(0, compiler);
            if (succBlock->GetUniquePred(compiler) == nullptr)
            {
                VARSET_TP outResolutionSet(
                    VarSetOps::Intersection(compiler, succBlock->bbLiveIn, resolutionCandidateVars));
                if (!VarSetOps::IsEmpty(compiler, outResolutionSet))
                {
                    resolveEdge(block, succBlock, ResolveJoin, outResolutionSet);
                }
            }
        }
        prevBlock = block;
    }
    // Now, fixup the mapping for any blocks that were added for edge splitting.
    // See the comment prior to the call to fgSplitEdge() in resolveEdge().
    // Note that we could fold this loop in with the checking code below, but that
    // would only improve the debug case, and would clutter up the code somewhat.
    if (compiler->fgBBNumMax > bbNumMaxBeforeResolution)
    {
        foreach_block(compiler, block)
        {
            if (block->bbNum > bbNumMaxBeforeResolution)
            {
                // There may be multiple blocks inserted when we split. But we must always have exactly
                // one path (i.e. all blocks must be single-successor and single-predecessor),
                // and only one block along the path may be non-empty.
                // Note that we may have a newly-inserted block that is empty, but which connects
                // two non-resolution blocks. This happens when an edge is split that requires it.

                BasicBlock* succBlock = block;
                do
                {
                    succBlock = succBlock->GetUniqueSucc();
                    noway_assert(succBlock != nullptr);
                } while ((succBlock->bbNum > bbNumMaxBeforeResolution) && succBlock->isEmpty());

                BasicBlock* predBlock = block;
                do
                {
                    predBlock = predBlock->GetUniquePred(compiler);
                    noway_assert(predBlock != nullptr);
                } while ((predBlock->bbNum > bbNumMaxBeforeResolution) && predBlock->isEmpty());

                unsigned succBBNum = succBlock->bbNum;
                unsigned predBBNum = predBlock->bbNum;
                if (block->isEmpty())
                {
                    // For the case of the empty block, find the non-resolution block (succ or pred).
                    if (predBBNum > bbNumMaxBeforeResolution)
                    {
                        assert(succBBNum <= bbNumMaxBeforeResolution);
                        predBBNum = 0;
                    }
                    else
                    {
                        succBBNum = 0;
                    }
                }
                else
                {
                    assert((succBBNum <= bbNumMaxBeforeResolution) && (predBBNum <= bbNumMaxBeforeResolution));
                }
                SplitEdgeInfo info = {predBBNum, succBBNum};
                getSplitBBNumToTargetBBNumMap()->Set(block->bbNum, info);
            }
        }
    }
#ifdef DEBUG
    // Make sure the varToRegMaps match up on all edges.
    bool foundMismatch = false;
    foreach_block(compiler, block)
    {
        if (block->isEmpty() && block->bbNum > bbNumMaxBeforeResolution)
        {
            continue;
        }
        VarToRegMap toVarToRegMap = getInVarToRegMap(block->bbNum);
        for (flowList* pred = block->bbPreds; pred != nullptr; pred = pred->flNext)
        {
            BasicBlock*     predBlock       = pred->flBlock;
            VarToRegMap     fromVarToRegMap = getOutVarToRegMap(predBlock->bbNum);
            VarSetOps::Iter iter(compiler, block->bbLiveIn);
            unsigned        varIndex = 0;
            while (iter.NextElem(&varIndex))
            {
                regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
                regNumber toReg   = getVarReg(toVarToRegMap, varIndex);
                if (fromReg != toReg)
                {
                    if (!foundMismatch)
                    {
                        foundMismatch = true;
                        printf("Found mismatched var locations after resolution!\n");
                    }
                    unsigned varNum = compiler->lvaTrackedToVarNum[varIndex];
                    printf(" V%02u: BB%02u to BB%02u: %s to %s\n", varNum, predBlock->bbNum, block->bbNum,
                           getRegName(fromReg), getRegName(toReg));
                }
            }
        }
    }
    assert(!foundMismatch);
#endif // DEBUG
}
//------------------------------------------------------------------------
// resolveEdge: Perform the specified type of resolution between two blocks.
//
// Arguments:
//    fromBlock   - the block from which the edge originates
//    toBlock     - the block at which the edge terminates
//    resolveType - the type of resolution to be performed
//    liveSet     - the set of tracked lclVar indices which may require resolution
//
// Assumptions:
//    The caller must have performed the analysis to determine the type of the edge.
//
// Notes:
//    This method emits the correctly ordered moves necessary to place variables in the
//    correct registers across a Split, Join or Critical edge.
//    In order to avoid overwriting register values before they have been moved to their
//    new home (register/stack), it first does the register-to-stack moves (to free those
//    registers), then the register-to-register moves, ensuring that the target register
//    is free before the move, and then finally the stack-to-register moves.
//
void LinearScan::resolveEdge(BasicBlock*      fromBlock,
                             BasicBlock*      toBlock,
                             ResolveType      resolveType,
                             VARSET_VALARG_TP liveSet)
{
    VarToRegMap fromVarToRegMap = getOutVarToRegMap(fromBlock->bbNum);
    VarToRegMap toVarToRegMap;
    if (resolveType == ResolveSharedCritical)
    {
        toVarToRegMap = sharedCriticalVarToRegMap;
    }
    else
    {
        toVarToRegMap = getInVarToRegMap(toBlock->bbNum);
    }
    // The block to which we add the resolution moves depends on the resolveType
    BasicBlock* block;
    switch (resolveType)
    {
        case ResolveJoin:
        case ResolveSharedCritical:
            block = fromBlock;
            break;
        case ResolveSplit:
            block = toBlock;
            break;
        case ResolveCritical:
            // fgSplitEdge may add one or two BasicBlocks. It returns the block that splits
            // the edge from 'fromBlock' and 'toBlock', but if it inserts that block right after
            // a block with a fall-through it will have to create another block to handle that edge.
            // These new blocks can be mapped to existing blocks in order to correctly handle
            // the calls to recordVarLocationsAtStartOfBB() from codegen. That mapping is handled
            // in resolveEdges(), after all the edge resolution has been done (by calling this
            // method for each edge).
            block = compiler->fgSplitEdge(fromBlock, toBlock);

            // Split edges are counted against fromBlock.
            INTRACK_STATS(updateLsraStat(LSRA_STAT_SPLIT_EDGE, fromBlock->bbNum));
            break;
        default:
            unreached();
            break;
    }

#ifndef _TARGET_XARCH_
    // We record tempregs for beginning and end of each block.
    // For amd64/x86 we only need a tempReg for float - we'll use xchg for int.
    // TODO-Throughput: It would be better to determine the tempRegs on demand, but the code below
    // modifies the varToRegMaps so we don't have all the correct registers at the time
    // we need to get the tempReg.
    regNumber tempRegInt =
        (resolveType == ResolveSharedCritical) ? REG_NA : getTempRegForResolution(fromBlock, toBlock, TYP_INT);
#endif // !_TARGET_XARCH_
    regNumber tempRegFlt = REG_NA;
    if ((compiler->compFloatingPointUsed) && (resolveType != ResolveSharedCritical))
    {
#ifdef _TARGET_ARM_
        // Try to reserve a double register for TYP_FLOAT and TYP_DOUBLE.
        tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_DOUBLE);
        if (tempRegFlt == REG_NA)
        {
            // If that fails, try to reserve a float register for TYP_FLOAT.
            tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
        }
#else  // !_TARGET_ARM_
        tempRegFlt = getTempRegForResolution(fromBlock, toBlock, TYP_FLOAT);
#endif // !_TARGET_ARM_
    }
    regMaskTP targetRegsToDo      = RBM_NONE;
    regMaskTP targetRegsReady     = RBM_NONE;
    regMaskTP targetRegsFromStack = RBM_NONE;

    // The following arrays capture the location of the registers as they are moved:
    // - location[reg] gives the current location of the var that was originally in 'reg'.
    //   (Note that a var may be moved more than once.)
    // - source[reg] gives the original location of the var that needs to be moved to 'reg'.
    // For example, if a var is in rax and needs to be moved to rsi, then we would start with:
    //     location[rax] == rax
    //     source[rsi]   == rax  -- this doesn't change
    // Then, if for some reason we need to move it temporarily to rbx, we would have:
    //     location[rax] == rbx
    // Once we have completed the move, we will have:
    //     location[rax] == REG_NA
    // This indicates that the var originally in rax is now in its target register.

    regNumberSmall location[REG_COUNT];
    C_ASSERT(sizeof(char) == sizeof(regNumberSmall)); // for memset to work
    memset(location, REG_NA, REG_COUNT);
    regNumberSmall source[REG_COUNT];
    memset(source, REG_NA, REG_COUNT);

    // What interval is this register associated with?
    // (associated with incoming reg)
    Interval* sourceIntervals[REG_COUNT];
    memset(&sourceIntervals, 0, sizeof(sourceIntervals));

    // Intervals for vars that need to be loaded from the stack
    Interval* stackToRegIntervals[REG_COUNT];
    memset(&stackToRegIntervals, 0, sizeof(stackToRegIntervals));

    // Get the starting insertion point for the "to" resolution
    GenTreePtr insertionPoint = nullptr;
    if (resolveType == ResolveSplit || resolveType == ResolveCritical)
    {
        insertionPoint = LIR::AsRange(block).FirstNonPhiNode();
    }
    // First:
    //   - Perform all moves from reg to stack (no ordering needed on these)
    //   - For reg to reg moves, record the current location, associating their
    //     source location with the target register they need to go into
    //   - For stack to reg moves (done last, no ordering needed between them)
    //     record the interval associated with the target reg
    // TODO-Throughput: We should be looping over the liveIn and liveOut registers, since
    // that will scale better than the live variables.

    VarSetOps::Iter iter(compiler, liveSet);
    unsigned        varIndex = 0;
    while (iter.NextElem(&varIndex))
    {
        regNumber fromReg = getVarReg(fromVarToRegMap, varIndex);
        regNumber toReg   = getVarReg(toVarToRegMap, varIndex);
        if (fromReg == toReg)
        {
            continue;
        }

        // For Critical edges, the location will not change on either side of the edge,
        // since we'll add a new block to do the move.
        if (resolveType == ResolveSplit)
        {
            setVarReg(toVarToRegMap, varIndex, fromReg);
        }
        else if (resolveType == ResolveJoin || resolveType == ResolveSharedCritical)
        {
            setVarReg(fromVarToRegMap, varIndex, toReg);
        }

        assert(fromReg < UCHAR_MAX && toReg < UCHAR_MAX);

        Interval* interval = getIntervalForLocalVar(varIndex);

        if (fromReg == REG_STK)
        {
            stackToRegIntervals[toReg] = interval;
            targetRegsFromStack |= genRegMask(toReg);
        }
        else if (toReg == REG_STK)
        {
            // Do the reg to stack moves now
            addResolution(block, insertionPoint, interval, REG_STK, fromReg);
            JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
        }
        else
        {
            location[fromReg]        = (regNumberSmall)fromReg;
            source[toReg]            = (regNumberSmall)fromReg;
            sourceIntervals[fromReg] = interval;
            targetRegsToDo |= genRegMask(toReg);
        }
    }
    // REGISTER to REGISTER MOVES

    // First, find all the ones that are ready to move now
    regMaskTP targetCandidates = targetRegsToDo;
    while (targetCandidates != RBM_NONE)
    {
        regMaskTP targetRegMask = genFindLowestBit(targetCandidates);
        targetCandidates &= ~targetRegMask;
        regNumber targetReg = genRegNumFromMask(targetRegMask);
        if (location[targetReg] == REG_NA)
        {
            // Nothing pending still lives in this target register, so it can be written immediately.
            targetRegsReady |= targetRegMask;
        }
    }
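
    // For example (illustrative): if the only remaining moves are rax -> rbx and
    // rbx -> rax, neither target is ready, since each target register still holds a
    // value that another move needs. Such cycles are broken below, either with a
    // GT_SWAP (for integer registers on xarch) or by moving one value through a temp
    // register or the stack.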
    // Perform reg to reg moves
    while (targetRegsToDo != RBM_NONE)
    {
        while (targetRegsReady != RBM_NONE)
        {
            regMaskTP targetRegMask = genFindLowestBit(targetRegsReady);
            targetRegsToDo &= ~targetRegMask;
            targetRegsReady &= ~targetRegMask;
            regNumber targetReg = genRegNumFromMask(targetRegMask);
            assert(location[targetReg] != targetReg);
            regNumber sourceReg = (regNumber)source[targetReg];
            regNumber fromReg   = (regNumber)location[sourceReg];
            assert(fromReg < UCHAR_MAX && sourceReg < UCHAR_MAX);
            Interval* interval = sourceIntervals[sourceReg];
            assert(interval != nullptr);
            addResolution(block, insertionPoint, interval, targetReg, fromReg);
            JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
            sourceIntervals[sourceReg] = nullptr;
            location[sourceReg]        = REG_NA;

            // Do we have a free targetReg?
            if (fromReg == sourceReg && source[fromReg] != REG_NA)
            {
                regMaskTP fromRegMask = genRegMask(fromReg);
                targetRegsReady |= fromRegMask;
            }
        }
10610 if (targetRegsToDo != RBM_NONE)
10612 regMaskTP targetRegMask = genFindLowestBit(targetRegsToDo);
10613 regNumber targetReg = genRegNumFromMask(targetRegMask);
10615 // Is it already there due to other moves?
10616 // If not, move it to the temp reg, OR swap it with another register
10617 regNumber sourceReg = (regNumber)source[targetReg];
10618 regNumber fromReg = (regNumber)location[sourceReg];
10619 if (targetReg == fromReg)
10621 targetRegsToDo &= ~targetRegMask;
10625 regNumber tempReg = REG_NA;
10626 bool useSwap = false;
10627 if (emitter::isFloatReg(targetReg))
10629 #ifdef _TARGET_ARM_
10630 if (sourceIntervals[fromReg]->registerType == TYP_DOUBLE)
10632 // ARM32 requires a double temp register for TYP_DOUBLE.
10633 // We tried to reserve a double temp register first, but sometimes we can't.
10634 tempReg = genIsValidDoubleReg(tempRegFlt) ? tempRegFlt : REG_NA;
10637 #endif // _TARGET_ARM_
10638 tempReg = tempRegFlt;
10640 #ifdef _TARGET_XARCH_
10643 useSwap = true;
10645 #else // !_TARGET_XARCH_
10649 tempReg = tempRegInt;
10652 #endif // !_TARGET_XARCH_
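// For integer registers on xarch we use a swap, since the ISA provides a
// register exchange (e.g. xchg eax, ecx) and no temporary is needed.
// Floating-point cycles, and integer cycles on other targets, are broken by
// first moving one interval into the reserved temp register, which turns a
// two-register cycle into three moves through that temp.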
10653 if (useSwap || tempReg == REG_NA)
10655 // First, we have to figure out the destination register for what's currently in fromReg,
10656 // so that we can find its sourceInterval.
10657 regNumber otherTargetReg = REG_NA;
10659 // By chance, is fromReg going where it belongs?
10660 if (location[source[fromReg]] == targetReg)
10662 otherTargetReg = fromReg;
10663 // If we can swap, we will be done with otherTargetReg as well.
10664 // Otherwise, we'll spill it to the stack and reload it later.
10667 regMaskTP fromRegMask = genRegMask(fromReg);
10668 targetRegsToDo &= ~fromRegMask;
10673 // Look at the remaining registers from targetRegsToDo (which we expect to be relatively
10674 // small at this point) to find out what's currently in targetReg.
10675 regMaskTP mask = targetRegsToDo;
10676 while (mask != RBM_NONE && otherTargetReg == REG_NA)
10678 regMaskTP nextRegMask = genFindLowestBit(mask);
10679 regNumber nextReg = genRegNumFromMask(nextRegMask);
10680 mask &= ~nextRegMask;
10681 if (location[source[nextReg]] == targetReg)
10683 otherTargetReg = nextReg;
10687 assert(otherTargetReg != REG_NA);
10691 // Generate a "swap" of fromReg and targetReg
10692 insertSwap(block, insertionPoint, sourceIntervals[source[otherTargetReg]]->varNum, targetReg,
10693 sourceIntervals[sourceReg]->varNum, fromReg);
10694 location[sourceReg] = REG_NA;
10695 location[source[otherTargetReg]] = (regNumberSmall)fromReg;
10697 INTRACK_STATS(updateLsraStat(LSRA_STAT_RESOLUTION_MOV, block->bbNum));
10701 // Spill "targetReg" to the stack and add its eventual target (otherTargetReg)
10702 // to "targetRegsFromStack", which will be handled below.
10703 // NOTE: This condition is very rare. Setting COMPlus_JitStressRegs=0x203
10704 // has been known to trigger it in JIT SH.
10706 // First, spill "otherInterval" from targetReg to the stack.
10707 Interval* otherInterval = sourceIntervals[source[otherTargetReg]];
10708 setIntervalAsSpilled(otherInterval);
10709 addResolution(block, insertionPoint, otherInterval, REG_STK, targetReg);
10710 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10711 location[source[otherTargetReg]] = REG_STK;
10713 // Now, move the interval that is going to targetReg, and add its "fromReg" to
10714 // "targetRegsReady".
10715 addResolution(block, insertionPoint, sourceIntervals[sourceReg], targetReg, fromReg);
10716 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10717 location[sourceReg] = REG_NA;
10718 targetRegsReady |= genRegMask(fromReg);
10720 targetRegsToDo &= ~targetRegMask;
10724 compiler->codeGen->regSet.rsSetRegsModified(genRegMask(tempReg) DEBUGARG(dumpTerse));
10725 assert(sourceIntervals[targetReg] != nullptr);
10726 addResolution(block, insertionPoint, sourceIntervals[targetReg], tempReg, targetReg);
10727 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10728 location[targetReg] = (regNumberSmall)tempReg;
10729 targetRegsReady |= targetRegMask;
10735 // Finally, perform stack to reg moves
10736 // All the target regs will be empty at this point
10737 while (targetRegsFromStack != RBM_NONE)
10739 regMaskTP targetRegMask = genFindLowestBit(targetRegsFromStack);
10740 targetRegsFromStack &= ~targetRegMask;
10741 regNumber targetReg = genRegNumFromMask(targetRegMask);
10743 Interval* interval = stackToRegIntervals[targetReg];
10744 assert(interval != nullptr);
10746 addResolution(block, insertionPoint, interval, targetReg, REG_STK);
10747 JITDUMP(" (%s)\n", resolveTypeName[resolveType]);
10751 //------------------------------------------------------------------------
10752 // GetIndirSourceCount: Get the source registers for an indirection that might be contained.
10755 //    indirTree - The indirection node of interest
10758 //    The number of source registers used by this indirection; if the
10759 //    indirection is contained, these registers are consumed by its parent.
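// Example (illustrative): for GT_IND(GT_LEA(base, index, 4, 24)) where the
// LEA is contained and both base and index are in registers, this returns 2;
// for an indirection whose address is not contained, it returns 1.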
10760 int LinearScan::GetIndirSourceCount(GenTreeIndir* indirTree)
10762 GenTree* const addr = indirTree->gtOp1;
10763 if (!addr->isContained())
10765 return 1;
10767 if (!addr->OperIs(GT_LEA))
10769 return 0;
10772 GenTreeAddrMode* const addrMode = addr->AsAddrMode();
10774 unsigned srcCount = 0;
10775 if ((addrMode->Base() != nullptr) && !addrMode->Base()->isContained())
10777 srcCount++;
10779 if (addrMode->Index() != nullptr)
10781 // We never have a contained index.
10782 assert(!addrMode->Index()->isContained());
10783 srcCount++;
10785 return srcCount;
10788 void TreeNodeInfo::Initialize(LinearScan* lsra, GenTree* node, LsraLocation location)
10790 regMaskTP dstCandidates;
10792 // if there is a reg indicated on the tree node, use that for dstCandidates
10793 // the exception is the NOP, which sometimes shows up around late args.
10794 // TODO-Cleanup: get rid of those NOPs.
10795 if (node->gtRegNum == REG_STK)
10797 dstCandidates = RBM_NONE;
10799 else if (node->gtRegNum == REG_NA || node->gtOper == GT_NOP)
10802 if (node->OperGet() == GT_PUTARG_REG)
10804 dstCandidates = lsra->allRegs(TYP_INT);
10809 dstCandidates = lsra->allRegs(node->TypeGet());
10814 dstCandidates = genRegMask(node->gtRegNum);
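// For example (illustrative values): a node that lowering pre-assigned to
// REG_ECX gets dstCandidates == genRegMask(REG_ECX); a node with no
// assignment gets all registers of its type; and REG_STK yields RBM_NONE,
// i.e. no register is required for the definition.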
10817 internalIntCount = 0;
10818 internalFloatCount = 0;
10819 isLocalDefUse = false;
10820 isLsraAdded = false;
10821 definesAnyRegisters = false;
10823 setDstCandidates(lsra, dstCandidates);
10824 srcCandsIndex = dstCandsIndex;
10826 setInternalCandidates(lsra, lsra->allRegs(TYP_INT));
10830 isInitialized = true;
10833 assert(IsValid(lsra));
10836 regMaskTP TreeNodeInfo::getSrcCandidates(LinearScan* lsra)
10838 return lsra->GetRegMaskForIndex(srcCandsIndex);
10841 void TreeNodeInfo::setSrcCandidates(LinearScan* lsra, regMaskTP mask)
10843 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10844 assert(FitsIn<unsigned char>(i));
10845 srcCandsIndex = (unsigned char)i;
10848 regMaskTP TreeNodeInfo::getDstCandidates(LinearScan* lsra)
10850 return lsra->GetRegMaskForIndex(dstCandsIndex);
10853 void TreeNodeInfo::setDstCandidates(LinearScan* lsra, regMaskTP mask)
10855 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10856 assert(FitsIn<unsigned char>(i));
10857 dstCandsIndex = (unsigned char)i;
10860 regMaskTP TreeNodeInfo::getInternalCandidates(LinearScan* lsra)
10862 return lsra->GetRegMaskForIndex(internalCandsIndex);
10865 void TreeNodeInfo::setInternalCandidates(LinearScan* lsra, regMaskTP mask)
10867 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(mask);
10868 assert(FitsIn<unsigned char>(i));
10869 internalCandsIndex = (unsigned char)i;
10872 void TreeNodeInfo::addInternalCandidates(LinearScan* lsra, regMaskTP mask)
10874 LinearScan::RegMaskIndex i = lsra->GetIndexForRegMask(lsra->GetRegMaskForIndex(internalCandsIndex) | mask);
10875 assert(FitsIn<unsigned char>(i));
10876 internalCandsIndex = (unsigned char)i;
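// Note that TreeNodeInfo does not store candidate masks directly: each mask
// is interned by LinearScan (GetIndexForRegMask), and only a one-byte index
// is kept here, hence the FitsIn<unsigned char> asserts above. Conceptually
// (a sketch of the idea, not the exact implementation):
//
//   RegMaskIndex GetIndexForRegMask(regMaskTP mask); // find-or-append in a mask table
//   regMaskTP    GetRegMaskForIndex(RegMaskIndex i); // table lookup
//
// This keeps TreeNodeInfo compact while still permitting arbitrary masks.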
10879 #if TRACK_LSRA_STATS
10880 // ----------------------------------------------------------
10881 // updateLsraStat: Increment LSRA stat counter.
10884 // stat - LSRA stat enum
10885 // bbNum - Basic block with which the LSRA stat is to be
10886 // associated.
10888 void LinearScan::updateLsraStat(LsraStat stat, unsigned bbNum)
10890 if (bbNum > bbNumMaxBeforeResolution)
10892 // This is a basic block newly created as part of resolution.
10893 // These blocks contain resolution moves that have already been accounted for.
10899 case LSRA_STAT_SPILL:
10900 ++(blockInfo[bbNum].spillCount);
10903 case LSRA_STAT_COPY_REG:
10904 ++(blockInfo[bbNum].copyRegCount);
10907 case LSRA_STAT_RESOLUTION_MOV:
10908 ++(blockInfo[bbNum].resolutionMovCount);
10911 case LSRA_STAT_SPLIT_EDGE:
10912 ++(blockInfo[bbNum].splitEdgeCount);
10920 // -----------------------------------------------------------
10921 // dumpLsraStats - dumps Lsra stats to given file.
10924 // file - file to which stats are to be written.
10926 void LinearScan::dumpLsraStats(FILE* file)
10928 unsigned sumSpillCount = 0;
10929 unsigned sumCopyRegCount = 0;
10930 unsigned sumResolutionMovCount = 0;
10931 unsigned sumSplitEdgeCount = 0;
10932 UINT64 wtdSpillCount = 0;
10933 UINT64 wtdCopyRegCount = 0;
10934 UINT64 wtdResolutionMovCount = 0;
10936 fprintf(file, "----------\n");
10937 fprintf(file, "LSRA Stats");
10941 fprintf(file, " : %s\n", compiler->info.compFullName);
10945 // In verbose mode there is no need to print the full method name
10946 // while printing LSRA stats.
10947 fprintf(file, "\n");
10950 fprintf(file, " : %s\n", compiler->eeGetMethodFullName(compiler->info.compCompHnd));
10953 fprintf(file, "----------\n");
10955 for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
10957 if (block->bbNum > bbNumMaxBeforeResolution)
10962 unsigned spillCount = blockInfo[block->bbNum].spillCount;
10963 unsigned copyRegCount = blockInfo[block->bbNum].copyRegCount;
10964 unsigned resolutionMovCount = blockInfo[block->bbNum].resolutionMovCount;
10965 unsigned splitEdgeCount = blockInfo[block->bbNum].splitEdgeCount;
10967 if (spillCount != 0 || copyRegCount != 0 || resolutionMovCount != 0 || splitEdgeCount != 0)
10969 fprintf(file, "BB%02u [%8d]: ", block->bbNum, block->bbWeight);
10970 fprintf(file, "SpillCount = %d, ResolutionMovs = %d, SplitEdges = %d, CopyReg = %d\n", spillCount,
10971 resolutionMovCount, splitEdgeCount, copyRegCount);
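// The line above has the following shape (values are illustrative, not from
// a real dump):
//
//   BB03 [     200]: SpillCount = 2, ResolutionMovs = 1, SplitEdges = 0, CopyReg = 3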
10974 sumSpillCount += spillCount;
10975 sumCopyRegCount += copyRegCount;
10976 sumResolutionMovCount += resolutionMovCount;
10977 sumSplitEdgeCount += splitEdgeCount;
10979 wtdSpillCount += (UINT64)spillCount * block->bbWeight;
10980 wtdCopyRegCount += (UINT64)copyRegCount * block->bbWeight;
10981 wtdResolutionMovCount += (UINT64)resolutionMovCount * block->bbWeight;
10984 fprintf(file, "Total Tracked Vars: %d\n", compiler->lvaTrackedCount);
10985 fprintf(file, "Total Reg Cand Vars: %d\n", regCandidateVarCount);
10986 fprintf(file, "Total number of Intervals: %d\n", static_cast<unsigned>(intervals.size() - 1));
10987 fprintf(file, "Total number of RefPositions: %d\n", static_cast<unsigned>(refPositions.size() - 1));
10988 fprintf(file, "Total Spill Count: %d Weighted: %I64u\n", sumSpillCount, wtdSpillCount);
10989 fprintf(file, "Total CopyReg Count: %d Weighted: %I64u\n", sumCopyRegCount, wtdCopyRegCount);
10990 fprintf(file, "Total ResolutionMov Count: %d Weighted: %I64u\n", sumResolutionMovCount, wtdResolutionMovCount);
10991 fprintf(file, "Total number of split edges: %d\n", sumSplitEdgeCount);
10993 // compute total number of spill temps created
10994 unsigned numSpillTemps = 0;
10995 for (int i = 0; i < TYP_COUNT; i++)
10997 numSpillTemps += maxSpill[i];
10999 fprintf(file, "Total Number of spill temps created: %d\n\n", numSpillTemps);
11001 #endif // TRACK_LSRA_STATS
11004 void dumpRegMask(regMaskTP regs)
11006 if (regs == RBM_ALLINT)
11008 printf("[allInt]");
11010 else if (regs == (RBM_ALLINT & ~RBM_FPBASE))
11012 printf("[allIntButFP]");
11014 else if (regs == RBM_ALLFLOAT)
11016 printf("[allFloat]");
11018 else if (regs == RBM_ALLDOUBLE)
11020 printf("[allDouble]");
11028 static const char* getRefTypeName(RefType refType)
11032 #define DEF_REFTYPE(memberName, memberValue, shortName) \
11033 case memberName: \
11034 return #memberName;
11035 #include "lsra_reftypes.h"
11042 static const char* getRefTypeShortName(RefType refType)
11046 #define DEF_REFTYPE(memberName, memberValue, shortName) \
11047 case memberName: \
11048 return shortName;
11049 #include "lsra_reftypes.h"
11056 void RefPosition::dump()
11058 printf("<RefPosition #%-3u @%-3u", rpNum, nodeLocation);
11060 if (nextRefPosition)
11062 printf(" ->#%-3u", nextRefPosition->rpNum);
11065 printf(" %s ", getRefTypeName(refType));
11067 if (this->isPhysRegRef)
11069 this->getReg()->tinyDump();
11071 else if (getInterval())
11073 this->getInterval()->tinyDump();
11076 if (this->treeNode)
11078 printf("%s ", treeNode->OpName(treeNode->OperGet()));
11080 printf("BB%02u ", this->bbNum);
11082 printf("regmask=");
11083 dumpRegMask(registerAssignment);
11093 if (this->spillAfter)
11095 printf(" spillAfter");
11105 if (this->isFixedRegRef)
11109 if (this->isLocalDefUse)
11113 if (this->delayRegFree)
11117 if (this->outOfOrder)
11119 printf(" outOfOrder");
11122 if (this->AllocateIfProfitable())
11124 printf(" regOptional");
11129 void RegRecord::dump()
11134 void Interval::dump()
11136 printf("Interval %2u:", intervalIndex);
11140 printf(" (V%02u)", varNum);
11144 printf(" (INTERNAL)");
11148 printf(" (SPILLED)");
11152 printf(" (SPLIT)");
11156 printf(" (struct)");
11158 if (isSpecialPutArg)
11160 printf(" (specialPutArg)");
11164 printf(" (constant)");
11167 printf(" RefPositions {");
11168 for (RefPosition* refPosition = this->firstRefPosition; refPosition != nullptr;
11169 refPosition = refPosition->nextRefPosition)
11171 printf("#%u@%u", refPosition->rpNum, refPosition->nodeLocation);
11172 if (refPosition->nextRefPosition)
11179 // this is not used (yet?)
11180 // printf(" SpillOffset %d", this->spillOffset);
11182 printf(" physReg:%s", getRegName(physReg));
11184 printf(" Preferences=");
11185 dumpRegMask(this->registerPreferences);
11187 if (relatedInterval)
11189 printf(" RelatedInterval ");
11190 relatedInterval->microDump();
11191 printf("[%p]", dspPtr(relatedInterval));
11197 // print out very concise representation
11198 void Interval::tinyDump()
11200 printf("<Ivl:%u", intervalIndex);
11203 printf(" V%02u", varNum);
11207 printf(" internal");
11212 // print out extremely concise representation
11213 void Interval::microDump()
11215 char intervalTypeChar = 'I';
11218 intervalTypeChar = 'T';
11220 else if (isLocalVar)
11222 intervalTypeChar = 'L';
11225 printf("<%c%u>", intervalTypeChar, intervalIndex);
11228 void RegRecord::tinyDump()
11230 printf("<Reg:%-3s> ", getRegName(regNum));
11233 void TreeNodeInfo::dump(LinearScan* lsra)
11235 printf("<TreeNodeInfo @ %2u %d=%d %di %df", loc, dstCount, srcCount, internalIntCount, internalFloatCount);
11237 dumpRegMask(getSrcCandidates(lsra));
11239 dumpRegMask(getInternalCandidates(lsra));
11241 dumpRegMask(getDstCandidates(lsra));
11266 if (isInternalRegDelayFree)
11273 void LinearScan::lsraDumpIntervals(const char* msg)
11277 printf("\nLinear scan intervals %s:\n", msg);
11278 for (auto& interval : intervals)
11280 // only dump something if it has references
11281 // if (interval->firstRefPosition)
11288 // Dumps a tree node as a destination or source operand, with the style
11289 // of dump dependent on the mode
11290 void LinearScan::lsraGetOperandString(GenTreePtr tree,
11291 LsraTupleDumpMode mode,
11292 char* operandString,
11293 unsigned operandStringLength)
11295 const char* lastUseChar = "";
11296 if ((tree->gtFlags & GTF_VAR_DEATH) != 0)
11302 case LinearScan::LSRA_DUMP_PRE:
11303 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtTreeID, lastUseChar);
11305 case LinearScan::LSRA_DUMP_REFPOS:
11306 _snprintf_s(operandString, operandStringLength, operandStringLength, "t%d%s", tree->gtTreeID, lastUseChar);
11308 case LinearScan::LSRA_DUMP_POST:
11310 Compiler* compiler = JitTls::GetCompiler();
11312 if (!tree->gtHasReg())
11314 _snprintf_s(operandString, operandStringLength, operandStringLength, "STK%s", lastUseChar);
11318 _snprintf_s(operandString, operandStringLength, operandStringLength, "%s%s",
11319 getRegName(tree->gtRegNum, useFloatReg(tree->TypeGet())), lastUseChar);
11324 printf("ERROR: INVALID TUPLE DUMP MODE\n");
11328 void LinearScan::lsraDispNode(GenTreePtr tree, LsraTupleDumpMode mode, bool hasDest)
11330 Compiler* compiler = JitTls::GetCompiler();
11331 const unsigned operandStringLength = 16;
11332 char operandString[operandStringLength];
11333 const char* emptyDestOperand = " ";
11334 char spillChar = ' ';
11336 if (mode == LinearScan::LSRA_DUMP_POST)
11338 if ((tree->gtFlags & GTF_SPILL) != 0)
11342 if (!hasDest && tree->gtHasReg())
11344 // A node can define a register, but not produce a value for a parent to consume,
11345 // i.e. in the "localDefUse" case.
11346 // There used to be an assert here that we wouldn't spill such a node.
11347 // However, we can have unused lclVars that wind up being the node at which
11348 // it is spilled. This probably indicates a bug, but we don't really want to
11349 // assert during a dump.
11350 if (spillChar == 'S')
11361 printf("%c N%03u. ", spillChar, tree->gtSeqNum);
11363 LclVarDsc* varDsc = nullptr;
11364 unsigned varNum = UINT_MAX;
11365 if (tree->IsLocal())
11367 varNum = tree->gtLclVarCommon.gtLclNum;
11368 varDsc = &(compiler->lvaTable[varNum]);
11369 if (varDsc->lvLRACandidate)
11376 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
11378 assert(tree->gtHasReg());
11380 lsraGetOperandString(tree, mode, operandString, operandStringLength);
11381 printf("%-15s =", operandString);
11385 printf("%-15s ", emptyDestOperand);
11387 if (varDsc != nullptr)
11389 if (varDsc->lvLRACandidate)
11391 if (mode == LSRA_DUMP_REFPOS)
11393 printf(" V%02u(L%d)", varNum, getIntervalForLocalVar(varDsc->lvVarIndex)->intervalIndex);
11397 lsraGetOperandString(tree, mode, operandString, operandStringLength);
11398 printf(" V%02u(%s)", varNum, operandString);
11399 if (mode == LinearScan::LSRA_DUMP_POST && tree->gtFlags & GTF_SPILLED)
11407 printf(" V%02u MEM", varNum);
11410 else if (tree->OperIsAssignment())
11412 assert(!tree->gtHasReg());
11413 printf(" asg%s ", GenTree::OpName(tree->OperGet()));
11417 compiler->gtDispNodeName(tree);
11418 if (tree->OperKind() & GTK_LEAF)
11420 compiler->gtDispLeaf(tree, nullptr);
11425 //------------------------------------------------------------------------
11426 // DumpOperandDefs: dumps the registers defined by an operand.
11428 // An operand's def count follows ComputeOperandDstCount.
11429 // For most nodes, this is simple:
11430 // - Nodes that do not produce values (e.g. stores and other void-typed
11431 //   nodes) and nodes that immediately use the registers they define
11432 //   produce no registers.
11433 // - Nodes that are marked as defining N registers define N registers.
11435 // For contained nodes, however, things are more complicated: for purposes
11436 // of bookkeeping, a contained node is treated as producing the transitive
11437 // closure of the registers produced by its sources.
11439 // Arguments:
11440 //    operand             - The operand whose definitions are to be dumped.
11441 //    first               - [in, out] True if no operand has been dumped yet.
11442 //    mode                - The tuple dump mode.
11443 //    operandString       - The buffer used to format each operand.
11444 //    operandStringLength - The length of that buffer.
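// For example (illustrative): if a node's operand is a contained GT_LEA, the
// LEA itself defines no register, so the dump recurses and prints the defs of
// the LEA's base and index operands in its place.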
11445 void LinearScan::DumpOperandDefs(
11446 GenTree* operand, bool& first, LsraTupleDumpMode mode, char* operandString, const unsigned operandStringLength)
11448 assert(operand != nullptr);
11449 assert(operandString != nullptr);
11451 if (ComputeOperandDstCount(operand) == 0)
11456 if (operand->gtLsraInfo.dstCount != 0)
11458 // This operand directly produces registers; print it.
11459 for (int i = 0; i < operand->gtLsraInfo.dstCount; i++)
11466 lsraGetOperandString(operand, mode, operandString, operandStringLength);
11467 printf("%s", operandString);
11474 // This is a contained node. Dump the defs produced by its operands.
11475 for (GenTree* op : operand->Operands())
11477 DumpOperandDefs(op, first, mode, operandString, operandStringLength);
11482 void LinearScan::TupleStyleDump(LsraTupleDumpMode mode)
11485 LsraLocation currentLoc = 1; // 0 is the entry
11486 const unsigned operandStringLength = 16;
11487 char operandString[operandStringLength];
11489 // currentRefPosition is not used for LSRA_DUMP_PRE
11490 // We keep separate iterators for defs, so that we can print them
11491 // on the lhs of the dump
11492 auto currentRefPosition = refPositions.begin();
11496 case LSRA_DUMP_PRE:
11497 printf("TUPLE STYLE DUMP BEFORE LSRA\n");
11499 case LSRA_DUMP_REFPOS:
11500 printf("TUPLE STYLE DUMP WITH REF POSITIONS\n");
11502 case LSRA_DUMP_POST:
11503 printf("TUPLE STYLE DUMP WITH REGISTER ASSIGNMENTS\n");
11506 printf("ERROR: INVALID TUPLE DUMP MODE\n");
11510 if (mode != LSRA_DUMP_PRE)
11512 printf("Incoming Parameters: ");
11513 for (; currentRefPosition != refPositions.end() && currentRefPosition->refType != RefTypeBB;
11514 ++currentRefPosition)
11516 Interval* interval = currentRefPosition->getInterval();
11517 assert(interval != nullptr && interval->isLocalVar);
11518 printf(" V%02d", interval->varNum);
11519 if (mode == LSRA_DUMP_POST)
11522 if (currentRefPosition->registerAssignment == RBM_NONE)
11528 reg = currentRefPosition->assignedReg();
11530 LclVarDsc* varDsc = &(compiler->lvaTable[interval->varNum]);
11532 regNumber assignedReg = varDsc->lvRegNum;
11533 regNumber argReg = (varDsc->lvIsRegArg) ? varDsc->lvArgReg : REG_STK;
11535 assert(reg == assignedReg || varDsc->lvRegister == false);
11538 printf(getRegName(argReg, isFloatRegType(interval->registerType)));
11541 printf("%s)", getRegName(reg, isFloatRegType(interval->registerType)));
11547 for (block = startBlockSequence(); block != nullptr; block = moveToNextBlock())
11551 if (mode == LSRA_DUMP_REFPOS)
11553 bool printedBlockHeader = false;
11554 // We should find the boundary RefPositions in the order of exposed uses, dummy defs, and the blocks
11555 for (; currentRefPosition != refPositions.end() &&
11556 (currentRefPosition->refType == RefTypeExpUse || currentRefPosition->refType == RefTypeDummyDef ||
11557 (currentRefPosition->refType == RefTypeBB && !printedBlockHeader));
11558 ++currentRefPosition)
11560 Interval* interval = nullptr;
11561 if (currentRefPosition->isIntervalRef())
11563 interval = currentRefPosition->getInterval();
11565 switch (currentRefPosition->refType)
11567 case RefTypeExpUse:
11568 assert(interval != nullptr);
11569 assert(interval->isLocalVar);
11570 printf(" Exposed use of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
11572 case RefTypeDummyDef:
11573 assert(interval != nullptr);
11574 assert(interval->isLocalVar);
11575 printf(" Dummy def of V%02u at #%d\n", interval->varNum, currentRefPosition->rpNum);
11578 block->dspBlockHeader(compiler);
11579 printedBlockHeader = true;
11583 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
11590 block->dspBlockHeader(compiler);
11593 if (enregisterLocalVars && mode == LSRA_DUMP_POST && block != compiler->fgFirstBB &&
11594 block->bbNum <= bbNumMaxBeforeResolution)
11596 printf("Predecessor for variable locations: BB%02u\n", blockInfo[block->bbNum].predBBNum);
11597 dumpInVarToRegMap(block);
11599 if (block->bbNum > bbNumMaxBeforeResolution)
11601 SplitEdgeInfo splitEdgeInfo;
11602 splitBBNumToTargetBBNumMap->Lookup(block->bbNum, &splitEdgeInfo);
11603 assert(splitEdgeInfo.toBBNum <= bbNumMaxBeforeResolution);
11604 assert(splitEdgeInfo.fromBBNum <= bbNumMaxBeforeResolution);
11605 printf("New block introduced for resolution from BB%02u to BB%02u\n", splitEdgeInfo.fromBBNum,
11606 splitEdgeInfo.toBBNum);
11609 for (GenTree* node : LIR::AsRange(block).NonPhiNodes())
11611 GenTree* tree = node;
11613 genTreeOps oper = tree->OperGet();
11614 TreeNodeInfo& info = tree->gtLsraInfo;
11615 if (tree->gtLsraInfo.isLsraAdded)
11617 // This must be one of the nodes that we add during LSRA
11619 if (oper == GT_LCL_VAR)
11624 else if (oper == GT_RELOAD || oper == GT_COPY)
11629 #ifdef FEATURE_SIMD
11630 else if (oper == GT_SIMD)
11632 if (tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperSave)
11639 assert(tree->gtSIMD.gtSIMDIntrinsicID == SIMDIntrinsicUpperRestore);
11644 #endif // FEATURE_SIMD
11647 assert(oper == GT_SWAP);
11651 info.internalIntCount = 0;
11652 info.internalFloatCount = 0;
11655 int consume = info.srcCount;
11656 int produce = info.dstCount;
11657 regMaskTP killMask = RBM_NONE;
11658 regMaskTP fixedMask = RBM_NONE;
11660 lsraDispNode(tree, mode, produce != 0 && mode != LSRA_DUMP_REFPOS);
11662 if (mode != LSRA_DUMP_REFPOS)
11669 for (GenTree* operand : tree->Operands())
11671 DumpOperandDefs(operand, first, mode, operandString, operandStringLength);
11677 // Print each RefPosition on a new line, but
11678 // print all the kills for each node on a single line,
11679 // combining the fixed regs with their associated def or use.
11680 bool killPrinted = false;
11681 RefPosition* lastFixedRegRefPos = nullptr;
11682 for (; currentRefPosition != refPositions.end() &&
11683 (currentRefPosition->refType == RefTypeUse || currentRefPosition->refType == RefTypeFixedReg ||
11684 currentRefPosition->refType == RefTypeKill || currentRefPosition->refType == RefTypeDef) &&
11685 (currentRefPosition->nodeLocation == tree->gtSeqNum ||
11686 currentRefPosition->nodeLocation == tree->gtSeqNum + 1);
11687 ++currentRefPosition)
11689 Interval* interval = nullptr;
11690 if (currentRefPosition->isIntervalRef())
11692 interval = currentRefPosition->getInterval();
11694 switch (currentRefPosition->refType)
11697 if (currentRefPosition->isPhysRegRef)
11699 printf("\n Use:R%d(#%d)",
11700 currentRefPosition->getReg()->regNum, currentRefPosition->rpNum);
11704 assert(interval != nullptr);
11706 interval->microDump();
11707 printf("(#%d)", currentRefPosition->rpNum);
11708 if (currentRefPosition->isFixedRegRef)
11710 assert(genMaxOneBit(currentRefPosition->registerAssignment));
11711 assert(lastFixedRegRefPos != nullptr);
11712 printf(" Fixed:%s(#%d)", getRegName(currentRefPosition->assignedReg(),
11713 isFloatRegType(interval->registerType)),
11714 lastFixedRegRefPos->rpNum);
11715 lastFixedRegRefPos = nullptr;
11717 if (currentRefPosition->isLocalDefUse)
11719 printf(" LocalDefUse");
11721 if (currentRefPosition->lastUse)
11729 // Print each def on a new line
11730 assert(interval != nullptr);
11732 interval->microDump();
11733 printf("(#%d)", currentRefPosition->rpNum);
11734 if (currentRefPosition->isFixedRegRef)
11736 assert(genMaxOneBit(currentRefPosition->registerAssignment));
11737 printf(" %s", getRegName(currentRefPosition->assignedReg(),
11738 isFloatRegType(interval->registerType)));
11740 if (currentRefPosition->isLocalDefUse)
11742 printf(" LocalDefUse");
11744 if (currentRefPosition->lastUse)
11748 if (interval->relatedInterval != nullptr)
11751 interval->relatedInterval->microDump();
11758 printf("\n Kill: ");
11759 killPrinted = true;
11761 printf(getRegName(currentRefPosition->assignedReg(),
11762 isFloatRegType(currentRefPosition->getReg()->registerType)));
11765 case RefTypeFixedReg:
11766 lastFixedRegRefPos = currentRefPosition;
11769 printf("Unexpected RefPosition type at #%d\n", currentRefPosition->rpNum);
11775 if (info.internalIntCount != 0 && mode != LSRA_DUMP_REFPOS)
11777 printf("\tinternal (%d):\t", info.internalIntCount);
11778 if (mode == LSRA_DUMP_POST)
11780 dumpRegMask(tree->gtRsvdRegs);
11782 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
11784 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
11788 if (info.internalFloatCount != 0 && mode != LSRA_DUMP_REFPOS)
11790 printf("\tinternal (%d):\t", info.internalFloatCount);
11791 if (mode == LSRA_DUMP_POST)
11793 dumpRegMask(tree->gtRsvdRegs);
11795 else if ((info.getInternalCandidates(this) & allRegs(TYP_INT)) != allRegs(TYP_INT))
11797 dumpRegMask(info.getInternalCandidates(this) & allRegs(TYP_INT));
11802 if (enregisterLocalVars && mode == LSRA_DUMP_POST)
11804 dumpOutVarToRegMap(block);
11811 void LinearScan::dumpLsraAllocationEvent(LsraDumpEvent event,
11812 Interval* interval,
11813 regNumber reg,
11814 BasicBlock* currentBlock)
11822 // Conflicting def/use
11823 case LSRA_EVENT_DEFUSE_CONFLICT:
11826 printf(" Def and Use have conflicting register requirements:");
11830 printf("DUconflict ");
11834 case LSRA_EVENT_DEFUSE_FIXED_DELAY_USE:
11837 printf(" Can't change useAssignment ");
11840 case LSRA_EVENT_DEFUSE_CASE1:
11843 printf(" case #1, use the defRegAssignment\n");
11847 printf(indentFormat, " case #1 use defRegAssignment");
11849 dumpEmptyRefPosition();
11852 case LSRA_EVENT_DEFUSE_CASE2:
11855 printf(" case #2, use the useRegAssignment\n");
11859 printf(indentFormat, " case #2 use useRegAssignment");
11861 dumpEmptyRefPosition();
11864 case LSRA_EVENT_DEFUSE_CASE3:
11867 printf(" case #3, change the defRegAssignment to the use regs\n");
11871 printf(indentFormat, " case #3 use useRegAssignment");
11873 dumpEmptyRefPosition();
11876 case LSRA_EVENT_DEFUSE_CASE4:
11879 printf(" case #4, change the useRegAssignment to the def regs\n");
11883 printf(indentFormat, " case #4 use defRegAssignment");
11885 dumpEmptyRefPosition();
11888 case LSRA_EVENT_DEFUSE_CASE5:
11891 printf(" case #5, Conflicting Def and Use single-register requirements require copies - set def to all "
11892 "regs of the appropriate type\n");
11896 printf(indentFormat, " case #5 set def to all regs");
11898 dumpEmptyRefPosition();
11901 case LSRA_EVENT_DEFUSE_CASE6:
11904 printf(" case #6, Conflicting Def and Use register requirements require a copy\n");
11908 printf(indentFormat, " case #6 need a copy");
11910 dumpEmptyRefPosition();
11914 case LSRA_EVENT_SPILL:
11917 printf("Spilled:\n");
11922 assert(interval != nullptr && interval->assignedReg != nullptr);
11923 printf("Spill %-4s ", getRegName(interval->assignedReg->regNum));
11925 dumpEmptyRefPosition();
11928 case LSRA_EVENT_SPILL_EXTENDED_LIFETIME:
11931 printf(" Spilled extended lifetime var V%02u at last use; not marked for actual spill.",
11932 interval->intervalIndex);
11936 // Restoring the previous register
11937 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL_AFTER_SPILL:
11938 assert(interval != nullptr);
11941 printf(" Assign register %s to previous interval Ivl:%d after spill\n", getRegName(reg),
11942 interval->intervalIndex);
11946 // If we spilled, then the dump is already pre-indented, but we need to pre-indent for the subsequent
11948 // allocation, which we do with a dumpEmptyRefPosition().
11949 printf("SRstr %-4s ", getRegName(reg));
11951 dumpEmptyRefPosition();
11954 case LSRA_EVENT_RESTORE_PREVIOUS_INTERVAL:
11955 assert(interval != nullptr);
11958 printf(" Assign register %s to previous interval Ivl:%d\n", getRegName(reg), interval->intervalIndex);
11962 if (activeRefPosition == nullptr)
11964 printf(emptyRefPositionFormat, "");
11966 printf("Restr %-4s ", getRegName(reg));
11968 if (activeRefPosition != nullptr)
11970 printf(emptyRefPositionFormat, "");
11975 // Done with GC Kills
11976 case LSRA_EVENT_DONE_KILL_GC_REFS:
11977 printf("DoneKillGC ");
11980 // Block boundaries
11981 case LSRA_EVENT_START_BB:
11982 assert(currentBlock != nullptr);
11985 printf("\n\n Live Vars(Regs) at start of BB%02u (from pred BB%02u):", currentBlock->bbNum,
11986 blockInfo[currentBlock->bbNum].predBBNum);
11987 dumpVarToRegMap(inVarToRegMaps[currentBlock->bbNum]);
11990 case LSRA_EVENT_END_BB:
11993 printf("\n\n Live Vars(Regs) after BB%02u:", currentBlock->bbNum);
11994 dumpVarToRegMap(outVarToRegMaps[currentBlock->bbNum]);
11998 case LSRA_EVENT_FREE_REGS:
12001 printf("Freeing registers:\n");
12005 // Characteristics of the current RefPosition
12006 case LSRA_EVENT_INCREMENT_RANGE_END:
12009 printf(" Incrementing nextPhysRegLocation for %s\n", getRegName(reg));
12013 case LSRA_EVENT_LAST_USE:
12016 printf(" Last use, marked to be freed\n");
12019 case LSRA_EVENT_LAST_USE_DELAYED:
12022 printf(" Last use, marked to be freed (delayed)\n");
12025 case LSRA_EVENT_NEEDS_NEW_REG:
12028 printf(" Needs new register; mark %s to be freed\n", getRegName(reg));
12032 printf("Free %-4s ", getRegName(reg));
12034 dumpEmptyRefPosition();
12038 // Allocation decisions
12039 case LSRA_EVENT_FIXED_REG:
12040 case LSRA_EVENT_EXP_USE:
12043 printf("No allocation\n");
12047 printf("Keep %-4s ", getRegName(reg));
12050 case LSRA_EVENT_ZERO_REF:
12051 assert(interval != nullptr && interval->isLocalVar);
12054 printf("Marking V%02u as last use there are no actual references\n", interval->varNum);
12060 dumpEmptyRefPosition();
12063 case LSRA_EVENT_KEPT_ALLOCATION:
12066 printf("already allocated %4s\n", getRegName(reg));
12070 printf("Keep %-4s ", getRegName(reg));
12073 case LSRA_EVENT_COPY_REG:
12074 assert(interval != nullptr && interval->recentRefPosition != nullptr);
12077 printf("allocated %s as copyReg\n\n", getRegName(reg));
12081 printf("Copy %-4s ", getRegName(reg));
12084 case LSRA_EVENT_MOVE_REG:
12085 assert(interval != nullptr && interval->recentRefPosition != nullptr);
12088 printf(" needs a new register; marked as moveReg\n");
12092 printf("Move %-4s ", getRegName(reg));
12094 dumpEmptyRefPosition();
12097 case LSRA_EVENT_ALLOC_REG:
12100 printf("allocated %s\n", getRegName(reg));
12104 printf("Alloc %-4s ", getRegName(reg));
12107 case LSRA_EVENT_REUSE_REG:
12110 printf("reused constant in %s\n", getRegName(reg));
12114 printf("Reuse %-4s ", getRegName(reg));
12117 case LSRA_EVENT_ALLOC_SPILLED_REG:
12120 printf("allocated spilled register %s\n", getRegName(reg));
12124 printf("Steal %-4s ", getRegName(reg));
12127 case LSRA_EVENT_NO_ENTRY_REG_ALLOCATED:
12128 assert(interval != nullptr && interval->isLocalVar);
12131 printf("Not allocating an entry register for V%02u due to low ref count\n", interval->varNum);
12138 case LSRA_EVENT_NO_REG_ALLOCATED:
12141 printf("no register allocated\n");
12148 case LSRA_EVENT_RELOAD:
12151 printf(" Marked for reload\n");
12155 printf("ReLod %-4s ", getRegName(reg));
12157 dumpEmptyRefPosition();
12160 case LSRA_EVENT_SPECIAL_PUTARG:
12163 printf(" Special case of putArg - using lclVar that's in the expected reg\n");
12167 printf("PtArg %-4s ", getRegName(reg));
12175 //------------------------------------------------------------------------
12176 // dumpRegRecordHeader: Dump the header for a column-based dump of the register state.
12185 // Reg names fit in 4 characters (minimum width of the columns)
12188 // In order to make the table as dense as possible (for ease of reading the dumps),
12189 // we determine the minimum regColumnWidth required to represent:
12190 // regs, by name (e.g. eax or xmm0) - this is fixed at 4 characters.
12191 // intervals, as Vnn for lclVar intervals, or as I<num> for other intervals.
12192 // The table is indented by the amount needed for dumpRefPositionShort, which is
12193 // captured in shortRefPositionDumpWidth.
12195 void LinearScan::dumpRegRecordHeader()
12197 printf("The following table has one or more rows for each RefPosition that is handled during allocation.\n"
12198 "The first column provides the basic information about the RefPosition, with its type (e.g. Def,\n"
12199 "Use, Fixd) followed by a '*' if it is a last use, and a 'D' if it is delayRegFree, and then the\n"
12200 "action taken during allocation (e.g. Alloc a new register, or Keep an existing one).\n"
12201 "The subsequent columns show the Interval occupying each register, if any, followed by 'a' if it is\n"
12202 "active, and 'i'if it is inactive. Columns are only printed up to the last modifed register, which\n"
12203 "may increase during allocation, in which case additional columns will appear. Registers which are\n"
12204 "not marked modified have ---- in their column.\n\n");
12206 // First, determine the width of each register column (which holds a reg name in the
12207 // header, and an interval name in each subsequent row).
12208 int intervalNumberWidth = (int)log10((double)intervals.size()) + 1;
12209 // The regColumnWidth includes the identifying character (I or V) and an 'i' or 'a' (inactive or active)
12210 regColumnWidth = intervalNumberWidth + 2;
12211 if (regColumnWidth < 4)
12213 regColumnWidth = 4;
12215 sprintf_s(intervalNameFormat, MAX_FORMAT_CHARS, "%%c%%-%dd", regColumnWidth - 2);
12216 sprintf_s(regNameFormat, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12218 // Next, determine the width of the short RefPosition (see dumpRefPositionShort()).
12219 // This is in the form:
12220 // nnn.#mmm NAME TYPEld
12222 // nnn is the Location, right-justified to the width needed for the highest location.
12223 // mmm is the RefPosition rpNum, left-justified to the width needed for the highest rpNum.
12224 // NAME is dumped by dumpReferentName(), and is "regColumnWidth".
12225 // TYPE is RefTypeNameShort, and is 4 characters
12226 // l is either '*' (if a last use) or ' ' (otherwise)
12227 // d is either 'D' (if a delayed use) or ' ' (otherwise)
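// Putting that together, a short RefPosition dump line might look like
// (illustrative values):
//
//   12.#45   V03  Use*  Alloc eax
//
// i.e. location 12, RefPosition #45, lclVar interval V03, a last use ('*'),
// for which the allocator chose eax.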
12229 maxNodeLocation = (maxNodeLocation == 0)
12230 ? 1
12231 : maxNodeLocation; // corner case of a method with an infinite loop without any gentree nodes
12232 assert(maxNodeLocation >= 1);
12233 assert(refPositions.size() >= 1);
12234 int nodeLocationWidth = (int)log10((double)maxNodeLocation) + 1;
12235 int refPositionWidth = (int)log10((double)refPositions.size()) + 1;
12236 int refTypeInfoWidth = 4 /*TYPE*/ + 2 /* last-use and delayed */ + 1 /* space */;
12237 int locationAndRPNumWidth = nodeLocationWidth + 2 /* .# */ + refPositionWidth + 1 /* space */;
12238 int shortRefPositionDumpWidth = locationAndRPNumWidth + regColumnWidth + 1 /* space */ + refTypeInfoWidth;
12239 sprintf_s(shortRefPositionFormat, MAX_FORMAT_CHARS, "%%%dd.#%%-%dd ", nodeLocationWidth, refPositionWidth);
12240 sprintf_s(emptyRefPositionFormat, MAX_FORMAT_CHARS, "%%-%ds", shortRefPositionDumpWidth);
12242 // The width of the "allocation info"
12243 // - a 5-character allocation decision
12245 // - a 4-character register
12247 int allocationInfoWidth = 5 + 1 + 4 + 1;
12249 // Next, determine the width of the legend for each row. This includes:
12250 // - a short RefPosition dump (shortRefPositionDumpWidth), which includes a space
12251 // - the allocation info (allocationInfoWidth), which also includes a space
12253 regTableIndent = shortRefPositionDumpWidth + allocationInfoWidth;
12255 // BBnn printed left-justified in the NAME Typeld and allocationInfo space.
12256 int bbDumpWidth = regColumnWidth + 1 + refTypeInfoWidth + allocationInfoWidth;
12257 int bbNumWidth = (int)log10((double)compiler->fgBBNumMax) + 1;
12258 // In the unlikely event that BB numbers overflow the space, we'll simply omit the predBB
12259 int predBBNumDumpSpace = regTableIndent - locationAndRPNumWidth - bbNumWidth - 9; // 'BB' + ' PredBB'
12260 if (predBBNumDumpSpace < bbNumWidth)
12262 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd", shortRefPositionDumpWidth - 2);
12266 sprintf_s(bbRefPosFormat, MAX_LEGEND_FORMAT_CHARS, "BB%%-%dd PredBB%%-%dd", bbNumWidth, predBBNumDumpSpace);
12269 if (compiler->shouldDumpASCIITrees())
12271 columnSeparator = "|";
12279 columnSeparator = "\xe2\x94\x82";
12280 line = "\xe2\x94\x80";
12281 leftBox = "\xe2\x94\x9c";
12282 middleBox = "\xe2\x94\xbc";
12283 rightBox = "\xe2\x94\xa4";
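// (The escape sequences above are the UTF-8 encodings of the box-drawing
// characters U+2502, U+2500, U+251C, U+253C and U+2524, respectively.)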
12285 sprintf_s(indentFormat, MAX_FORMAT_CHARS, "%%-%ds", regTableIndent);
12287 // Now, set up the legend format for the RefPosition info
12288 sprintf_s(legendFormat, MAX_LEGEND_FORMAT_CHARS, "%%-%d.%ds%%-%d.%ds%%-%ds%%s", nodeLocationWidth + 1,
12289 nodeLocationWidth + 1, refPositionWidth + 2, refPositionWidth + 2, regColumnWidth + 1);
12291 // Finally, print a "title row" including the legend and the reg names
12292 dumpRegRecordTitle();
12295 int LinearScan::getLastUsedRegNumIndex()
12297 int lastUsedRegNumIndex = 0;
12298 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
12299 int lastRegNumIndex = compiler->compFloatingPointUsed ? REG_FP_LAST : REG_INT_LAST;
12300 for (int regNumIndex = 0; regNumIndex <= lastRegNumIndex; regNumIndex++)
12302 if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) != 0)
12304 lastUsedRegNumIndex = regNumIndex;
12307 return lastUsedRegNumIndex;
12310 void LinearScan::dumpRegRecordTitleLines()
12312 for (int i = 0; i < regTableIndent; i++)
12314 printf("%s", line);
12316 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12317 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12319 printf("%s", middleBox);
12320 for (int i = 0; i < regColumnWidth; i++)
12322 printf("%s", line);
12325 printf("%s\n", rightBox);
12327 void LinearScan::dumpRegRecordTitle()
12329 dumpRegRecordTitleLines();
12331 // Print out the legend for the RefPosition info
12332 printf(legendFormat, "Loc ", "RP# ", "Name ", "Type Action Reg ");
12334 // Print out the register name column headers
12335 char columnFormatArray[MAX_FORMAT_CHARS];
12336 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%s%%-%d.%ds", columnSeparator, regColumnWidth, regColumnWidth);
12337 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12338 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12340 regNumber regNum = (regNumber)regNumIndex;
12341 const char* regName = getRegName(regNum);
12342 printf(columnFormatArray, regName);
12344 printf("%s\n", columnSeparator);
12346 rowCountSinceLastTitle = 0;
12348 dumpRegRecordTitleLines();
12351 void LinearScan::dumpRegRecords()
12353 static char columnFormatArray[18];
12354 int lastUsedRegNumIndex = getLastUsedRegNumIndex();
12355 regMaskTP usedRegsMask = compiler->codeGen->regSet.rsGetModifiedRegsMask();
12357 for (int regNumIndex = 0; regNumIndex <= lastUsedRegNumIndex; regNumIndex++)
12359 printf("%s", columnSeparator);
12360 RegRecord& regRecord = physRegs[regNumIndex];
12361 Interval* interval = regRecord.assignedInterval;
12362 if (interval != nullptr)
12364 dumpIntervalName(interval);
12365 char activeChar = interval->isActive ? 'a' : 'i';
12366 printf("%c", activeChar);
12368 else if (regRecord.isBusyUntilNextKill)
12370 printf(columnFormatArray, "Busy");
12372 else if ((usedRegsMask & genRegMask((regNumber)regNumIndex)) == 0)
12374 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12375 printf(columnFormatArray, "----");
12379 sprintf_s(columnFormatArray, MAX_FORMAT_CHARS, "%%-%ds", regColumnWidth);
12380 printf(columnFormatArray, "");
12383 printf("%s\n", columnSeparator);
12385 if (rowCountSinceLastTitle > MAX_ROWS_BETWEEN_TITLES)
12387 dumpRegRecordTitle();
12389 rowCountSinceLastTitle++;
12392 void LinearScan::dumpIntervalName(Interval* interval)
12394 if (interval->isLocalVar)
12396 printf(intervalNameFormat, 'V', interval->varNum);
12398 else if (interval->isConstant)
12400 printf(intervalNameFormat, 'C', interval->intervalIndex);
12404 printf(intervalNameFormat, 'I', interval->intervalIndex);
12408 void LinearScan::dumpEmptyRefPosition()
12410 printf(emptyRefPositionFormat, "");
12413 // Note that the size of this dump is computed in dumpRegRecordHeader().
12415 void LinearScan::dumpRefPositionShort(RefPosition* refPosition, BasicBlock* currentBlock)
12417 BasicBlock* block = currentBlock;
12418 if (refPosition->refType == RefTypeBB)
12420 // Always print a title row before a RefTypeBB (except for the first, because we
12421 // will already have printed it before the parameters)
12422 if (block != compiler->fgFirstBB && block != nullptr)
12424 dumpRegRecordTitle();
12427 printf(shortRefPositionFormat, refPosition->nodeLocation, refPosition->rpNum);
12428 if (refPosition->refType == RefTypeBB)
12430 if (block == nullptr)
12432 printf(regNameFormat, "END");
12434 printf(regNameFormat, "");
12438 printf(bbRefPosFormat, block->bbNum, block == compiler->fgFirstBB ? 0 : blockInfo[block->bbNum].predBBNum);
12441 else if (refPosition->isIntervalRef())
12443 Interval* interval = refPosition->getInterval();
12444 dumpIntervalName(interval);
12445 char lastUseChar = ' ';
12446 char delayChar = ' ';
12447 if (refPosition->lastUse)
12450 if (refPosition->delayRegFree)
12455 printf(" %s%c%c ", getRefTypeShortName(refPosition->refType), lastUseChar, delayChar);
12457 else if (refPosition->isPhysRegRef)
12459 RegRecord* regRecord = refPosition->getReg();
12460 printf(regNameFormat, getRegName(regRecord->regNum));
12461 printf(" %s ", getRefTypeShortName(refPosition->refType));
12465 assert(refPosition->refType == RefTypeKillGCRefs);
12466 // There's no interval or reg name associated with this.
12467 printf(regNameFormat, " ");
12468 printf(" %s ", getRefTypeShortName(refPosition->refType));
12472 //------------------------------------------------------------------------
12473 // LinearScan::IsResolutionMove:
12474 // Returns true if the given node is a move inserted by LSRA
12478 // node - the node to check.
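// For example: resolution inserts moves as GT_COPY nodes, or as lone
// GT_LCL_VAR nodes marked isLocalDefUse when one side of the move is the
// variable's stack home, and swaps as GT_SWAP nodes; all of these are
// tagged isLsraAdded, which distinguishes them from user code.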
12480 bool LinearScan::IsResolutionMove(GenTree* node)
12482 if (!node->gtLsraInfo.isLsraAdded)
12487 switch (node->OperGet())
12489 case GT_LCL_VAR:
12490 case GT_COPY:
12491 return node->gtLsraInfo.isLocalDefUse;
12493 case GT_SWAP:
12494 return true;
12496 default:
12497 return false;
12501 //------------------------------------------------------------------------
12502 // LinearScan::IsResolutionNode:
12503 // Returns true if the given node is either a move inserted by LSRA
12504 // resolution or an operand to such a move.
12507 // containingRange - the range that contains the node to check.
12508 // node - the node to check.
12510 bool LinearScan::IsResolutionNode(LIR::Range& containingRange, GenTree* node)
12512 for (;;)
12514 if (IsResolutionMove(node))
12516 return true;
12519 if (!node->gtLsraInfo.isLsraAdded || (node->OperGet() != GT_LCL_VAR))
12521 return false;
12524 LIR::Use use;
12525 bool foundUse = containingRange.TryGetUse(node, &use);
12526 assert(foundUse);
12528 node = use.User();
12532 //------------------------------------------------------------------------
12533 // verifyFinalAllocation: Traverse the RefPositions and verify various invariants.
12542 // If verbose is set, this will also dump a table of the final allocations.
12543 void LinearScan::verifyFinalAllocation()
12547 printf("\nFinal allocation\n");
12550 // Clear register assignments.
12551 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12553 RegRecord* physRegRecord = getRegisterRecord(reg);
12554 physRegRecord->assignedInterval = nullptr;
12557 for (auto& interval : intervals)
12559 interval.assignedReg = nullptr;
12560 interval.physReg = REG_NA;
12563 DBEXEC(VERBOSE, dumpRegRecordTitle());
12565 BasicBlock* currentBlock = nullptr;
12566 GenTree* firstBlockEndResolutionNode = nullptr;
12567 regMaskTP regsToFree = RBM_NONE;
12568 regMaskTP delayRegsToFree = RBM_NONE;
12569 LsraLocation currentLocation = MinLocation;
12570 for (auto& refPosition : refPositions)
12572 RefPosition* currentRefPosition = &refPosition;
12573 Interval* interval = nullptr;
12574 RegRecord* regRecord = nullptr;
12575 regNumber regNum = REG_NA;
12576 if (currentRefPosition->refType == RefTypeBB)
12578 regsToFree |= delayRegsToFree;
12579 delayRegsToFree = RBM_NONE;
12580 // For BB RefPositions, wait until we dump the "end of block" info before dumping the basic RefPosition
12585 // For other RefPosition types, we can dump the basic RefPosition info now.
12586 DBEXEC(VERBOSE, dumpRefPositionShort(currentRefPosition, currentBlock));
12588 if (currentRefPosition->isPhysRegRef)
12590 regRecord = currentRefPosition->getReg();
12591 regRecord->recentRefPosition = currentRefPosition;
12592 regNum = regRecord->regNum;
12594 else if (currentRefPosition->isIntervalRef())
12596 interval = currentRefPosition->getInterval();
12597 interval->recentRefPosition = currentRefPosition;
12598 if (currentRefPosition->registerAssignment != RBM_NONE)
12600 if (!genMaxOneBit(currentRefPosition->registerAssignment))
12602 assert(currentRefPosition->refType == RefTypeExpUse ||
12603 currentRefPosition->refType == RefTypeDummyDef);
12607 regNum = currentRefPosition->assignedReg();
12608 regRecord = getRegisterRecord(regNum);
12614 LsraLocation newLocation = currentRefPosition->nodeLocation;
12616 if (newLocation > currentLocation)
12619 // We could use the freeRegisters() method, but we'd have to carefully manage the active intervals.
12620 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12622 regMaskTP regMask = genRegMask(reg);
12623 if ((regsToFree & regMask) != RBM_NONE)
12625 RegRecord* physRegRecord = getRegisterRecord(reg);
12626 physRegRecord->assignedInterval = nullptr;
12629 regsToFree = delayRegsToFree;
12630 delayRegsToFree = RBM_NONE;
12632 currentLocation = newLocation;
12634 switch (currentRefPosition->refType)
12638 if (currentBlock == nullptr)
12640 currentBlock = startBlockSequence();
12644 // Verify the resolution moves at the end of the previous block.
12645 for (GenTree* node = firstBlockEndResolutionNode; node != nullptr; node = node->gtNext)
12647 assert(enregisterLocalVars);
12648 // Only verify nodes that are actually moves; don't bother with the nodes that are
12649 // operands to moves.
12650 if (IsResolutionMove(node))
12652 verifyResolutionMove(node, currentLocation);
12656 // Validate the locations at the end of the previous block.
12657 if (enregisterLocalVars)
12659 VarToRegMap outVarToRegMap = outVarToRegMaps[currentBlock->bbNum];
12660 VarSetOps::Iter iter(compiler, currentBlock->bbLiveOut);
12661 unsigned varIndex = 0;
12662 while (iter.NextElem(&varIndex))
12664 if (localVarIntervals[varIndex] == nullptr)
12666 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12669 regNumber regNum = getVarReg(outVarToRegMap, varIndex);
12670 interval = getIntervalForLocalVar(varIndex);
12671 assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
12672 interval->physReg = REG_NA;
12673 interval->assignedReg = nullptr;
12674 interval->isActive = false;
12678 // Clear register assignments.
12679 for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
12681 RegRecord* physRegRecord = getRegisterRecord(reg);
12682 physRegRecord->assignedInterval = nullptr;
12685 // Now, record the locations at the beginning of this block.
12686 currentBlock = moveToNextBlock();
12689 if (currentBlock != nullptr)
12691 if (enregisterLocalVars)
12693 VarToRegMap inVarToRegMap = inVarToRegMaps[currentBlock->bbNum];
12694 VarSetOps::Iter iter(compiler, currentBlock->bbLiveIn);
12695 unsigned varIndex = 0;
12696 while (iter.NextElem(&varIndex))
12698 if (localVarIntervals[varIndex] == nullptr)
12700 assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
12703 regNumber regNum = getVarReg(inVarToRegMap, varIndex);
12704 interval = getIntervalForLocalVar(varIndex);
12705 interval->physReg = regNum;
12706 interval->assignedReg = &(physRegs[regNum]);
12707 interval->isActive = true;
12708 physRegs[regNum].assignedInterval = interval;
12714 dumpRefPositionShort(currentRefPosition, currentBlock);
12718 // Finally, handle the resolution moves, if any, at the beginning of the next block.
12719 firstBlockEndResolutionNode = nullptr;
12720 bool foundNonResolutionNode = false;
12722 LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
12723 for (GenTree* node : currentBlockRange.NonPhiNodes())
12725 if (IsResolutionNode(currentBlockRange, node))
12727 assert(enregisterLocalVars);
12728 if (foundNonResolutionNode)
12730 firstBlockEndResolutionNode = node;
12733 else if (IsResolutionMove(node))
12735 // Only verify nodes that are actually moves; don't bother with the nodes that are
12736 // operands to moves.
12737 verifyResolutionMove(node, currentLocation);
12742 foundNonResolutionNode = true;
12751 assert(regRecord != nullptr);
12752 assert(regRecord->assignedInterval == nullptr);
12753 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12755 case RefTypeFixedReg:
12756 assert(regRecord != nullptr);
12757 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12760 case RefTypeUpperVectorSaveDef:
12761 case RefTypeUpperVectorSaveUse:
12764 case RefTypeParamDef:
12765 case RefTypeZeroInit:
12766 assert(interval != nullptr);
12768 if (interval->isSpecialPutArg)
12770 dumpLsraAllocationEvent(LSRA_EVENT_SPECIAL_PUTARG, interval, regNum);
12773 if (currentRefPosition->reload)
12775 interval->isActive = true;
12776 assert(regNum != REG_NA);
12777 interval->physReg = regNum;
12778 interval->assignedReg = regRecord;
12779 regRecord->assignedInterval = interval;
12780 dumpLsraAllocationEvent(LSRA_EVENT_RELOAD, nullptr, regRecord->regNum, currentBlock);
12782 if (regNum == REG_NA)
12784 dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, interval);
12786 else if (RefTypeIsDef(currentRefPosition->refType))
12788 interval->isActive = true;
12791 if (interval->isConstant && (currentRefPosition->treeNode != nullptr) &&
12792 currentRefPosition->treeNode->IsReuseRegVal())
12794 dumpLsraAllocationEvent(LSRA_EVENT_REUSE_REG, nullptr, regRecord->regNum, currentBlock);
12798 dumpLsraAllocationEvent(LSRA_EVENT_ALLOC_REG, nullptr, regRecord->regNum, currentBlock);
12802 else if (currentRefPosition->copyReg)
12804 dumpLsraAllocationEvent(LSRA_EVENT_COPY_REG, interval, regRecord->regNum, currentBlock);
12806 else if (currentRefPosition->moveReg)
12808 assert(interval->assignedReg != nullptr);
12809 interval->assignedReg->assignedInterval = nullptr;
12810 interval->physReg = regNum;
12811 interval->assignedReg = regRecord;
12812 regRecord->assignedInterval = interval;
12815 printf("Move %-4s ", getRegName(regRecord->regNum));
12820 dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
12822 if (currentRefPosition->lastUse || currentRefPosition->spillAfter)
12824 interval->isActive = false;
12826 if (regNum != REG_NA)
12828 if (currentRefPosition->spillAfter)
12832 // If refPos is marked as copyReg, then the reg that is spilled
12833 // is the homeReg of the interval, not the reg currently assigned
12835 regNumber spillReg = regNum;
12836 if (currentRefPosition->copyReg)
12838 assert(interval != nullptr);
12839 spillReg = interval->physReg;
12842 dumpEmptyRefPosition();
12843 printf("Spill %-4s ", getRegName(spillReg));
12846 else if (currentRefPosition->copyReg)
12848 regRecord->assignedInterval = interval;
12852 interval->physReg = regNum;
12853 interval->assignedReg = regRecord;
12854 regRecord->assignedInterval = interval;
12858 case RefTypeKillGCRefs:
12859 // No action to take.
12860 // However, we will assert that, at resolution time, no registers contain GC refs.
12862 DBEXEC(VERBOSE, printf(" "));
12863 regMaskTP candidateRegs = currentRefPosition->registerAssignment;
12864 while (candidateRegs != RBM_NONE)
12866 regMaskTP nextRegBit = genFindLowestBit(candidateRegs);
12867 candidateRegs &= ~nextRegBit;
12868 regNumber nextReg = genRegNumFromMask(nextRegBit);
12869 RegRecord* regRecord = getRegisterRecord(nextReg);
12870 Interval* assignedInterval = regRecord->assignedInterval;
12871 assert(assignedInterval == nullptr || !varTypeIsGC(assignedInterval->registerType));
12876 case RefTypeExpUse:
12877 case RefTypeDummyDef:
12878 // Do nothing; these will be handled by the RefTypeBB.
12879 DBEXEC(VERBOSE, printf(" "));
12882 case RefTypeInvalid:
12883 // for these 'currentRefPosition->refType' values, no action is needed.
12887 if (currentRefPosition->refType != RefTypeBB)
12889 DBEXEC(VERBOSE, dumpRegRecords());
12890 if (interval != nullptr)
12892 if (currentRefPosition->copyReg)
12894 assert(interval->physReg != regNum);
12895 regRecord->assignedInterval = nullptr;
12896 assert(interval->assignedReg != nullptr);
12897 regRecord = interval->assignedReg;
12899 if (currentRefPosition->spillAfter || currentRefPosition->lastUse)
12901 interval->physReg = REG_NA;
12902 interval->assignedReg = nullptr;
12904 // regRecord could be null if the RefPosition does not require a register.
12905 if (regRecord != nullptr)
12907 regRecord->assignedInterval = nullptr;
12911 assert(!currentRefPosition->RequiresRegister());
12918 // Now, verify the resolution blocks.
12919 // Currently these are nearly always at the end of the method, but that may not always be the case.
12920 // So, we'll go through all the BBs looking for blocks whose bbNum is greater than bbNumMaxBeforeResolution.
12921 for (BasicBlock* currentBlock = compiler->fgFirstBB; currentBlock != nullptr; currentBlock = currentBlock->bbNext)
12923 if (currentBlock->bbNum > bbNumMaxBeforeResolution)
12925 // If we haven't enregistered any lclVars, we have no resolution blocks.
12926 assert(enregisterLocalVars);
12930 dumpRegRecordTitle();
12931 printf(shortRefPositionFormat, 0, 0);
12932 assert(currentBlock->bbPreds != nullptr && currentBlock->bbPreds->flBlock != nullptr);
12933 printf(bbRefPosFormat, currentBlock->bbNum, currentBlock->bbPreds->flBlock->bbNum);
            // Clear register assignments.
            for (regNumber reg = REG_FIRST; reg < ACTUAL_REG_COUNT; reg = REG_NEXT(reg))
            {
                RegRecord* physRegRecord        = getRegisterRecord(reg);
                physRegRecord->assignedInterval = nullptr;
            }

            // Set the incoming register assignments.
            VarToRegMap     inVarToRegMap = getInVarToRegMap(currentBlock->bbNum);
            VarSetOps::Iter iter(compiler, currentBlock->bbLiveIn);
            unsigned        varIndex = 0;
            while (iter.NextElem(&varIndex))
            {
                if (localVarIntervals[varIndex] == nullptr)
                {
                    // This variable was not a register candidate, so it has no interval to set up.
                    assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
                    continue;
                }
                regNumber regNum                  = getVarReg(inVarToRegMap, varIndex);
                Interval* interval                = getIntervalForLocalVar(varIndex);
                interval->physReg                 = regNum;
                interval->assignedReg             = &(physRegs[regNum]);
                interval->isActive                = true;
                physRegs[regNum].assignedInterval = interval;
            }

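            // The loop above re-establishes the two-way interval <-> RegRecord mapping that
            // verifyResolutionMove() relies on: an interval's physReg/assignedReg name a
            // register exactly when that register's RegRecord names the interval back.
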
            // Verify the moves in this block.
            LIR::Range& currentBlockRange = LIR::AsRange(currentBlock);
            for (GenTree* node : currentBlockRange.NonPhiNodes())
            {
                assert(IsResolutionNode(currentBlockRange, node));
                if (IsResolutionMove(node))
                {
                    // Only verify nodes that are actually moves; don't bother with the nodes that are
                    // operands to moves.
                    verifyResolutionMove(node, currentLocation);
                }
            }

            // Verify the outgoing register assignments.
            {
                VarToRegMap     outVarToRegMap = getOutVarToRegMap(currentBlock->bbNum);
                VarSetOps::Iter iter(compiler, currentBlock->bbLiveOut);
                unsigned        varIndex = 0;
                while (iter.NextElem(&varIndex))
                {
                    if (localVarIntervals[varIndex] == nullptr)
                    {
                        assert(!compiler->lvaTable[compiler->lvaTrackedToVarNum[varIndex]].lvLRACandidate);
                        continue;
                    }
                    regNumber regNum   = getVarReg(outVarToRegMap, varIndex);
                    Interval* interval = getIntervalForLocalVar(varIndex);
                    assert(interval->physReg == regNum || (interval->physReg == REG_NA && regNum == REG_STK));
                    interval->physReg     = REG_NA;
                    interval->assignedReg = nullptr;
                    interval->isActive    = false;
                }
            }
        }
    }

    DBEXEC(VERBOSE, printf("\n"));
}

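// Commentary (an assumption about usage, not stated in the code above): verifyFinalAllocation()
// is intended as a debug-time consistency pass, run after resolveRegisters() has annotated the
// IR. It replays the RefPositions and resolution moves against the final VarToRegMaps so that
// any disagreement between allocation and resolution fails an assert rather than silently
// producing bad code.
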
//------------------------------------------------------------------------
// verifyResolutionMove: Verify a resolution statement. Called by verifyFinalAllocation().
//
// Arguments:
//    resolutionMove  - A GenTree* that must be a resolution move.
//    currentLocation - The LsraLocation of the most recent RefPosition that has been verified.
//
// Return Value:
//    None.
//
// Notes:
//    If verbose is set, this will also dump the moves into the table of final allocations.
//
void LinearScan::verifyResolutionMove(GenTree* resolutionMove, LsraLocation currentLocation)
{
    GenTree* dst = resolutionMove;
    assert(IsResolutionMove(dst));

    if (dst->OperGet() == GT_SWAP)
    {
        GenTreeLclVarCommon* left          = dst->gtGetOp1()->AsLclVarCommon();
        GenTreeLclVarCommon* right         = dst->gtGetOp2()->AsLclVarCommon();
        regNumber            leftRegNum    = left->gtRegNum;
        regNumber            rightRegNum   = right->gtRegNum;
        LclVarDsc*           leftVarDsc    = compiler->lvaTable + left->gtLclNum;
        LclVarDsc*           rightVarDsc   = compiler->lvaTable + right->gtLclNum;
        Interval*            leftInterval  = getIntervalForLocalVar(leftVarDsc->lvVarIndex);
        Interval*            rightInterval = getIntervalForLocalVar(rightVarDsc->lvVarIndex);
        assert(leftInterval->physReg == leftRegNum && rightInterval->physReg == rightRegNum);

        // Exchange the assignments of the two intervals, keeping the interval <-> RegRecord
        // mapping consistent in both directions.
        leftInterval->physReg                  = rightRegNum;
        rightInterval->physReg                 = leftRegNum;
        leftInterval->assignedReg              = &physRegs[rightRegNum];
        rightInterval->assignedReg             = &physRegs[leftRegNum];
        physRegs[rightRegNum].assignedInterval = leftInterval;
        physRegs[leftRegNum].assignedInterval  = rightInterval;

        if (VERBOSE)
        {
            printf(shortRefPositionFormat, currentLocation, 0);
            dumpIntervalName(leftInterval);
            printf(" %-4s ", getRegName(rightRegNum));
            printf(shortRefPositionFormat, currentLocation, 0);
            dumpIntervalName(rightInterval);
            printf(" %-4s ", getRegName(leftRegNum));
        }
        return;
    }

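    // Commentary (an assumption about why GT_SWAP exists, not stated here): resolution can
    // produce a cycle of register moves with no free register to break it, in which case a
    // GT_SWAP exchanges two registers' contents directly (on xarch it can be emitted as an
    // xchg). That is why both intervals above are required to already be live in registers,
    // which is what the preceding assert checks.
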
    regNumber            dstRegNum = dst->gtRegNum;
    regNumber            srcRegNum;
    GenTreeLclVarCommon* lcl;
    if (dst->OperGet() == GT_COPY)
    {
        // A register-to-register move: the source register is on the copied lclVar.
        lcl       = dst->gtGetOp1()->AsLclVarCommon();
        srcRegNum = lcl->gtRegNum;
    }
    else
    {
        // This is a spill or reload, expressed directly on the lclVar node.
        lcl = dst->AsLclVarCommon();
        if ((lcl->gtFlags & GTF_SPILLED) != 0)
        {
            // A reload: the value comes from the stack.
            srcRegNum = REG_STK;
        }
        else
        {
            // A spill: the value goes to the stack.
            assert((lcl->gtFlags & GTF_SPILL) != 0);
            srcRegNum = dstRegNum;
            dstRegNum = REG_STK;
        }
    }

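    // To summarize the three shapes of a (non-swap) resolution move decoded above:
    //   GT_COPY                     : register -> register (srcRegNum from the copied lclVar)
    //   lclVar with GTF_SPILLED set : stack    -> register (a reload; srcRegNum == REG_STK)
    //   lclVar with GTF_SPILL set   : register -> stack    (a spill;  dstRegNum == REG_STK)
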
    Interval* interval = getIntervalForLocalVarNode(lcl);
    assert(interval->physReg == srcRegNum || (srcRegNum == REG_STK && interval->physReg == REG_NA));
    if (srcRegNum != REG_STK)
    {
        physRegs[srcRegNum].assignedInterval = nullptr;
    }
    if (dstRegNum != REG_STK)
    {
        interval->physReg                    = dstRegNum;
        interval->assignedReg                = &(physRegs[dstRegNum]);
        physRegs[dstRegNum].assignedInterval = interval;
        interval->isActive                   = true;
    }
    else
    {
        interval->physReg     = REG_NA;
        interval->assignedReg = nullptr;
        interval->isActive    = false;
    }

    if (VERBOSE)
    {
        printf(shortRefPositionFormat, currentLocation, 0);
        dumpIntervalName(interval);
        printf(" %-4s ", getRegName(dstRegNum));
    }
}

#endif // !LEGACY_BACKEND