Documentation/botr/ryujit-overview.md

   1 JIT Compiler Structure
   2 ===
   3
   4 # Introduction
   5
   6 RyuJIT is the code name for the next generation Just in Time Compiler (aka “JIT”) for the AMD64 .NET runtime. Its first implementation is for the AMD64 architecture. It is derived from a code base that is still in use for the other targets of .NET.
   7
   8 The primary design considerations for RyuJIT are to:
   9
  10 * Maintain a high compatibility bar with previous JITs, especially those for x86 (jit32) and x64 (jit64).
  11 * Support and enable good runtime performance through code optimizations, register allocation, and code generation.
  12 * Ensure good throughput via largely linear-order optimizations and transformations, along with limitations on tracked variables for analyses (such as dataflow) that are inherently super-linear.
  13 * Ensure that the JIT architecture is designed to support a range of targets and scenarios.
  14
  15 The first objective was the primary motivation for evolving the existing code base, rather than starting from scratch or departing more drastically from the existing IR and architecture.
  16
  17 # Execution Environment and External Interface
  18
  19 RyuJIT provides the just in time compilation service for the .NET runtime. The runtime itself is variously called the EE (execution engine), the VM (virtual machine) or simply the CLR (common language runtime). Depending upon the configuration, the EE and JIT may reside in the same or different executable files. RyuJIT implements the JIT side of the JIT/EE interfaces:
  20
  21 * `ICorJitCompiler` – this is the interface that the JIT compiler implements. This interface is defined in [src/inc/corjit.h](https://github.com/dotnet/coreclr/blob/master/src/inc/corjit.h) and its implementation is in [src/jit/ee_il_dll.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/ee_il_dll.cpp). The following are the key methods on this interface:
  22   * `compileMethod` is the main entry point for the JIT. The EE passes it a `ICorJitInfo` object, and the “info” containing the IL, the method header, and various other useful tidbits. It returns a pointer to the code, its size, and additional GC, EH and (optionally) debug info.
  23   * `getVersionIdentifier` is the mechanism by which the JIT/EE interface is versioned. There is a single GUID (manually generated) which the JIT and EE must agree on.
  24   * `getMaxIntrinsicSIMDVectorLength` communicates to the EE the largest SIMD vector length that the JIT can support.
  25 * `ICorJitInfo` – this is the interface that the EE implements. It has many methods defined on it that allow the JIT to look up metadata tokens, traverse type signatures, compute field and vtable offsets, find method entry points, construct string literals, etc. This bulk of this interface is inherited from `ICorDynamicInfo` which is defined in [src/inc/corinfo.h](https://github.com/dotnet/coreclr/blob/master/src/inc/corinfo.h). The implementation is defined in [src/vm/jitinterface.cpp](https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp).
  26
  27 # Internal Representation (IR)
  28
  29 ## `Compiler` object
  30
  31 The `Compiler` object is the primary data structure of the JIT. While it is not part of the JIT's IR per se, it serves as the root from which the data structures that implement the IR are accessible. For example, the `Compiler` object points to the head of the function's `BasicBlock` list with the `fgFirstBB` field, as well as having additional pointers to the end of the list, and other distinguished locations. `ICorJit`Compiler`::CompileMethod()` is invoked for each method, and creates a new `Compiler` object. Thus, the JIT need not worry about thread synchronization while accessing `Compiler` state. The EE has the necessary synchronization to ensure there is a single JIT’d copy of a method when two or more threads try to trigger JIT compilation of the same method.
  32
  33 ## Overview of the IR
  34
  35 RyuJIT represents a function as a doubly-linked list of `BasicBlock` values. Each `BasicBlock` has explicit edges to its successors that define the function's non-exceptional control flow. Exceptional control flow is implicit, with protected regions and handlers described in a table of `EHblkDsc` values. At the beginning of a compilation, each `BasicBlock` contains nodes in a high-level, statement- and tree-oriented form (HIR); this form persists throughout the JIT's front end. During the first phase of the back end--the rationalization phase--the HIR for each block is lowered to a linearly-ordered, node-oriented form (LIR). The fundamental distinction between HIR and LIR is in ordering semantics, though there are also some restrictions on the types of nodes that may appear in an HIR or LIR block.
  36
  37 Both HIR and LIR blocks are composed of `GenTree` nodes that define the operations performed by the block. A `GenTree` node may consume some number of operands and may produce a singly-defined, at-most-singly-used value as a result. These values are referred to interchangably as *SDSU* temps or *tree* temps. Defs of SDSU temps are represented by `GenTree` nodes themselves, and uses are represented by edges from the using node to the defining node. Furthermore, SDSU temps defined in one block may not be used in a different block. In cases where a value must be multiply-defined, multiply-used, or defined in one block and used in another, the IR provides another class of temporary: the local var. Local vars are defined by assignment nodes in HIR or store nodes in LIR, and are used by local var nodes in both forms.
  38
  39 An HIR block is composed of a doubly-linked list of statement nodes (`GenTreeStmt`), each of which references a single expression tree (`gtStmtExpr`). The `GenTree` nodes in this tree execute in "tree order", which is defined as the order produced by a depth-first, left-to-right traversal of the tree, with two notable exceptions:
  40 - Binary nodes marked with the `GTF_REVERSE_OPS` flag execute their right operand tree (`gtOp2`) before their left operand tree (`gtOp1`)
  41 - Dynamically-sized block copy nodes where `gtEvalSizeFirst` is `true` execute the `gtDynamicSize` tree before executing their other operand trees
  42 In addition to tree order, HIR also requires that no SDSU temp is defined in one statement and used in another. In situations where the requirements of tree and statement order prove onerous (e.g. when code must execute at a particular point in a function), HIR provides `GT_COMMA` nodes as an escape valve: these nodes consume and discard the results of their left-hand side while producing a copy of the vaule produced by their right-hand side. This allows the compiler to insert code in the middle of a statement without requiring that the statement be split apart.
  43
  44 An LIR block is composed of a doubly-linked list of `GenTree` nodes, each of which describes a single operation in the method. These nodes execute in the order given by the list; there is no relationship between the order in which a node's operands appear and the order in which the operators that produced those operands execute. The only exception to this rule occurs after the register allocator, which may introduce `GT_COPY` and `GT_RELOAD` nodes that execute in "spill order". Spill order is defined as the order in which the register allocator visits a node's operands. For correctness, the code generator must generate code for spills, reloads, and `GT_COPY`/`GT_RELOAD` nodes in this order.
  45
  46 In addition to HIR and LIR `BasicBlock`s, a separate representation--`insGroup` and `instrDesc`--is used during the actual instruction encoding.
  47
  48 ![RyuJIT IR Overview](../images/ryujit-ir-overview.png)
  49
  50 ## GenTree Nodes
  51
  52 Each operation is represented as a GenTree node, with an opcode (`GT_xxx`), zero or more child/operand `GenTree` nodes, and additional fields as needed to represent the semantics of that node. Every node includes its type, value number, assertions, register assignments, etc. when available.
  53
  54 `GenTree` nodes are doubly-linked in execution order, but the links are not necessarily valid during all phases of the JIT. In HIR these links are primarily a convenience, as the order produced by a traversal of the links must match the order produced by a "tree order" traversal (see above for details). In LIR these links define the execution order of the nodes.
  55
  56 HIR statement nodes utilize the same `GenTree` base type as the operation nodes, though they are not truly related.
  57 * The statement nodes are doubly-linked. The first statement node in a block points to the last node in the block via its `gtPrev` link. Note that the last statement node does *not* point to the first; that is, the list is not fully circular.
  58 * Each statement node contains two `GenTree` links – `gtStmtExpr` points to the top-level node in the statement (i.e. the root of the tree that represents the statement), while `gtStmtList` points to the first node in execution order (again, this link is not always valid).
  59
  60 ## Local var descriptors
  61
  62 A `LclVarDsc` represents a possibly-multiply-defined, possibly-multiply-used temporary. These temporaries may be used to represent user local variables, arguments or JIT-created temps. Each lclVar has a `gtLclNum` which is the identifier usually associated with the variable in the JIT and its dumps. The `LclVarDsc` contains the type, use count, weighted use count, frame or register assignment etc. A local var may be "tracked" (`lvTracked`), in which case it participates in dataflow analysis, and has a secondary name (`lvVarIndex`) that allows for the use of dense bit vectors.
  63
  64 ### Example of Post-Import IR
  65
  66 For this snippet of code (extracted from [tests/src/JIT/CodeGenBringUpTests/DblRoots.cs](https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/CodeGenBringUpTests/DblRoots.cs)):
  67
  68        r1 = (-b + Math.Sqrt(b*b - 4*a*c))/(2*a);
  69
  70 A stripped-down dump of the `GenTree` nodes just after they are imported looks like this:
  71
  72     ▌ stmtExpr  void  (top level) (IL 0x000...0x026)
  73     │        ┌──▌ lclVar    double V00 arg0
  74     │     ┌──▌ *         double
  75     │     │  └──▌ dconst    double 2.00
  76     │  ┌──▌ /         double
  77     │  │  │  ┌──▌ mathFN    double sqrt
  78     │  │  │  │  │     ┌──▌ lclVar    double V02 arg2
  79     │  │  │  │  │  ┌──▌ *         double
  80     │  │  │  │  │  │  │  ┌──▌ lclVar    double V00 arg0
  81     │  │  │  │  │  │  └──▌ *         double
  82     │  │  │  │  │  │     └──▌ dconst    double 4.00
  83     │  │  │  │  └──▌ -         double
  84     │  │  │  │     │       lclVar    double V01 arg1
  85     │  │  │  │     └──▌ *         double
  86     │  │  │  │        └──▌ lclVar    double V01 arg1
  87     │  │  └──▌ +         double
  88     │  │     └──▌ unary -   double
  89     │  │        └──▌ lclVar    double V01 arg1
  90     └──▌  =         double
  91        └──▌ indir     double
  92           └──▌ lclVar    byref  V03 arg3
  93
  94 ## Types
  95
  96 The JIT is primarily concerned with “primitive” types, i.e. integers, reference types, pointers, and floating point types.  It must also be concerned with the format of user-defined value types (i.e. struct types derived from `System.ValueType`) – specifically, their size and the offset of any GC references they contain, so that they can be correctly initialized and copied.  The primitive types are represented in the JIT by the `var_types` enum, and any additional information required for struct types is obtained from the JIT/EE interface by the use of an opaque `CORINFO_CLASS_HANDLE`.
  97
  98 ## Dataflow Information
  99
 100 In order to limit throughput impact, the JIT limits the number of lclVars for which liveness information is computed. These are the tracked lclVars (`lvTracked` is true), and they are the only candidates for register allocation (i.e. only these lclVars may be assigned registers over their entire lifetime). Defs and uses of untracked lclVars are treated as stores and loads to/from the appropriate stack location, and the corresponding nodes act as normal operators during register allocation.
 101
 102 The liveness analysis determines the set of defs, as well as the uses that are upward exposed, for each block. It then propagates the liveness information. The result of the analysis is captured in the following:
 103
 104 * The live-in and live-out sets are captured in the `bbLiveIn` and `bbLiveOut` fields of the `BasicBlock`.
 105 * The `GTF_VAR_DEF` flag is set on a lclVar node (all of which are of type `GenTreeLclVarCommon`) that is a definition.
 106 * The `GTF_VAR_USEASG` flag is set (in addition to the `GTF_VAR_DEF` flag) for the target of an update (e.g. +=).
 107
 108 ## SSA
 109
 110 Static single assignment (SSA) form is constructed in a traditional manner [[1]](#[1]). The SSA names are recorded on the lclVar references. While SSA form usually retains a pointer or link to the defining reference, RyuJIT currently retains only the `BasicBlock` in which the definition of each SSA name resides.
 111
 112 ## Value Numbering
 113
 114 Value numbering utilizes SSA for lclVar values, but also performs value numbering of expression trees. It takes advantage of type safety by not invalidating the value number for field references with a heap write, unless the write is to the same field. The IR nodes are annotated with the value numbers, which are indexes into a type-specific value number store. Value numbering traverses the trees, performing symbolic evaluation of many operations.
 115
 116 # Phases of RyuJIT
 117
 118 The top-level function of interest is `Compiler::compCompile`. It invokes the following phases in order.
 119
 120 | **Phase** | **IR Transformations** |
 121 | --- | --- |
 122 |[Pre-import](#pre-import)|`Compiler->lvaTable` created and filled in for each user argument and variable. BasicBlock list initialized.|
 123 |[Importation](#importation)|`GenTree` nodes created and linked in to Statements, and Statements into BasicBlocks. Inlining candidates identified.|
 124 |[Inlining](#inlining)|The IR for inlined methods is incorporated into the flowgraph.|
 125 |[Struct Promotion](#struct-promotion)|New lclVars are created for each field of a promoted struct.|
 126 |[Mark Address-Exposed Locals](#mark-addr-exposed)|lclVars with references occurring in an address-taken context are marked.  This must be kept up-to-date.|
 127 |[Morph Blocks](#morph-blocks)|Performs localized transformations, including mandatory normalization as well as simple optimizations.|
 128 |[Eliminate Qmarks](#eliminate-qmarks)|All `GT_QMARK` nodes are eliminated, other than simple ones that do not require control flow.|
 129 |[Flowgraph Analysis](#flowgraph-analysis)|`BasicBlock` predecessors are computed, and must be kept valid. Loops are identified, and normalized, cloned and/or unrolled.|
 130 |[Normalize IR for Optimization](#normalize-ir)|lclVar references counts are set, and must be kept valid. Evaluation order of `GenTree` nodes (`gtNext`/`gtPrev`) is determined, and must be kept valid.|
 131 |[SSA and Value Numbering Optimizations](#ssa-vn)|Computes liveness (`bbLiveIn` and `bbLiveOut` on `BasicBlocks`), and dominators. Builds SSA for tracked lclVars. Computes value numbers.|
 132 |[Loop Invariant Code Hoisting](#licm)|Hoists expressions out of loops.|
 133 |[Copy Propagation](#copy-propagation)|Copy propagation based on value numbers.|
 134 |[Common Subexpression Elimination (CSE)](#cse)|Elimination of redundant subexressions based on value numbers.|
 135 |[Assertion Propagation](#assertion-propagation)|Utilizes value numbers to propagate and transform based on properties such as non-nullness.|
 136 |[Range analysis](#range-analysis)|Eliminate array index range checks based on value numbers and assertions|
 137 |[Rationalization](#rationalization)|Flowgraph order changes from `FGOrderTree` to `FGOrderLinear`. All `GT_COMMA`, `GT_ASG` and `GT_ADDR` nodes are transformed.|
 138 |[Lowering](#lowering)|Register requirements are fully specified (`gtLsraInfo`). All control flow is explicit.|
 139 |[Register allocation](#reg-alloc)|Registers are assigned (`gtRegNum` and/or `gtRsvdRegs`),and the number of spill temps calculated.|
 140 |[Code Generation](#code-generation)|Determines frame layout. Generates code for each `BasicBlock`. Generates prolog & epilog code for the method. Emit EH, GC and Debug info.|
 141
 142 ## <a name="pre-import"/>Pre-import
 143
 144 Prior to reading in the IL for the method, the JIT initializes the local variable table, and scans the IL to find branch targets and form BasicBlocks.
 145
 146 ## <a name="importation">Importation
 147
 148 Importation is the phase that creates the IR for the method, reading in one IL instruction at a time, and building up the statements. During this process, it may need to generate IR with multiple, nested expressions. This is the purpose of the non-expression-like IR nodes:
 149
 150 * It may need to evaluate part of the expression into a temp, in which case it will use a comma (`GT_COMMA`) node to ensure that the temp is evaluated in the proper execution order – i.e. `GT_COMMA(GT_ASG(temp, exp), temp)` is inserted into the tree where “exp” would go.
 151 * It may need to create conditional expressions, but adding control flow at this point would be quite messy. In this case it generates question mark/colon (?: or `GT_QMARK`/`GT_COLON`) trees that may be nested within an expression.
 152
 153 During importation, tail call candidates (either explicitly marked or opportunistically identified) are identified and flagged. They are further validated, and possibly unmarked, during morphing.
 154
 155 ## Morphing
 156
 157 The `fgMorph` phase includes a number of transformations:
 158
 159 ### <a name="inlining"/>Inlining
 160
 161 The `fgInline` phase determines whether each call site is a candidate for inlining. The initial determination is made via a state machine that runs over the candidate method’s IL. It estimates the native code size corresponding to the inline method, and uses a set of heuristics, including the estimated size of the current method) to determine if inlining would be profitable. If so, a separate Compiler object is created, and the importation phase is called to create the tree for the candidate inline method. Inlining may be aborted prior to completion, if any conditions are encountered that indicate that it may be unprofitable (or otherwise incorrect). If inlining is successful, the inlinee compiler’s trees are incorporated into the inliner compiler (the “parent”), with args and returns appropriately transformed.
 162
 163 ### <a name="struct-promotion"/>Struct Promotion
 164
 165 Struct promotion (`fgPromoteStructs()`) analyzes the local variables and temps, and determines if their fields are candidates for tracking (and possibly enregistering) separately. It first determines whether it is possible to promote, which takes into account whether the layout may have holes or overlapping fields, whether its fields (flattening any contained structs) will fit in registers, etc.
 166
 167 Next, it determines whether it is likely to be profitable, based on the number of fields, and whether the fields are individually referenced.
 168
 169 When a lclVar is promoted, there are now N+1 lclVars for the struct, where N is the number of fields. The original struct lclVar is not considered to be tracked, but its fields may be.
 170
 171 ### <a name="mark-addr-exposed"/>Mark Address-Exposed Locals
 172
 173 This phase traverses the expression trees, propagating the context (e.g. taking the address, indirecting) to determine which lclVars have their address taken, and which therefore will not be register candidates. If a struct lclVar has been promoted, and is then found to be address-taken, it will be considered “dependently promoted”, which is an odd way of saying that the fields will still be separately tracked, but they will not be register candidates.
 174
 175 ### <a name="morph-blocks"/>Morph Blocks
 176
 177 What is often thought of as “morph” involves localized transformations to the trees. In addition to performing simple optimizing transformations, it performs some normalization that is required, such as converting field and array accesses into pointer arithmetic. It can (and must) be called by subsequent phases on newly added or modified trees. During the main Morph phase, the boolean `fgGlobalMorph` is set on the Compiler argument, which governs which transformations are permissible.
 178
 179 ### <a name="eliminate-qmarks"/>Eliminate Qmarks
 180
 181 This expands most `GT_QMARK`/`GT_COLON` trees into blocks, except for the case that is instantiating a condition.
 182
 183 ## <a name="flowgraph-analysis"/>Flowgraph Analysis
 184
 185 At this point, a number of analyses and transformations are done on the flowgraph:
 186
 187 * Computing the predecessors of each block
 188 * Computing edge weights, if profile information is available
 189 * Computing reachability and dominators
 190 * Identifying and normalizing loops (transforming while loops to “do while”)
 191 * Cloning and unrolling of loops
 192
 193 ## <a name="normalize-ir"/>Normalize IR for Optimization
 194
 195 At this point, a number of properties are computed on the IR, and must remain valid for the remaining phases. We will call this “normalization”
 196
 197 * `lvaMarkLocalVars` – set the reference counts (raw and weighted) for lclVars, sort them, and determine which will be tracked (currently up to 128). Note that after this point any transformation that adds or removes lclVar references must update the reference counts.
 198 * `optOptimizeBools` – this optimizes Boolean expressions, and may change the flowgraph (why is it not done prior to reachability and dominators?)
 199 * Link the trees in evaluation order (setting `gtNext` and `gtPrev` fields): and `fgFindOperOrder()` and `fgSetBlockOrder()`.
 200
 201 ## <a name="ssa-vn"/>SSA and Value Numbering Optimizations
 202
 203 The next set of optimizations are built on top of SSA and value numbering. First, the SSA representation is built (during which dataflow analysis, aka liveness, is computed on the lclVars), then value numbering is done using SSA.
 204
 205 ### <a name="licm"/>Loop Invariant Code Hoisting
 206
 207 This phase traverses all the loop nests, in outer-to-inner order (thus hoisting expressions outside the largest loop in which they are invariant). It traverses all of the statements in the blocks in the loop that are always executed. If the statement is:
 208
 209 * A valid CSE candidate
 210 * Has no side-effects
 211 * Does not raise an exception OR occurs in the loop prior to any side-effects
 212 * Has a valid value number, and it is a lclVar defined outside the loop, or its children (the value numbers from which it was computed) are invariant.
 213
 214 ### <a name="copy-propagation"/>Copy Propagation
 215
 216 This phase walks each block in the graph (in dominator-first order, maintaining context between dominator and child) keeping track of every live definition. When it encounters a variable that shares the VN with a live definition, it is replaced with the variable in the live definition.
 217
 218 The JIT currently requires that the IR be maintained in conventional SSA form, as there is no “out of SSA” translation (see the comments on `optVnCopyProp()` for more information).
 219
 220 ### <a name="cse"/>Common Subexpression Elimination (CSE)
 221
 222 Utilizes value numbers to identify redundant computations, which are then evaluated to a new temp lclVar, and then reused.
 223
 224 ### <a name="assertion-propagation"/>Assertion Propagation
 225
 226 Utilizes value numbers to propagate and transform based on properties such as non-nullness.
 227
 228 ### <a name="range-analysis"/>Range analysis
 229
 230 Optimize array index range checks based on value numbers and assertions.
 231
 232 ## <a name="rationalization"/>Rationalization
 233
 234 As the JIT has evolved, changes have been made to improve the ability to reason over the tree in both “tree order” and “linear order”. These changes have been termed the “rationalization” of the IR. In the spirit of reuse and evolution, some of the changes have been made only in the later (“backend”) components of the JIT. The corresponding transformations are made to the IR by a “Rationalizer” component. It is expected that over time some of these changes will migrate to an earlier place in the JIT phase order:
 235
 236 * Elimination of assignment nodes (`GT_ASG`). The assignment node was problematic because the semantics of its destination (left hand side of the assignment) could not be determined without context. For example, a `GT_LCL_VAR` on the left-hand side of an assignment is a definition of the local variable, but on the right-hand side it is a use. Furthermore, since the execution order requires that the children be executed before the parent, it is unnatural that the left-hand side of the assignment appears in execution order before the assignment operator.
 237   * During rationalization, all assignments are replaced by stores, which either represent their destination on the store node itself (e.g. `GT_LCL_VAR`), or by the use of a child address node (e.g. `GT_STORE_IND`).
 238 * Elimination of address nodes (`GT_ADDR`). These are problematic because of the need for parent context to analyze the child.
 239 * Elimination of “comma” nodes (`GT_COMMA`). These nodes are introduced for convenience during importation, during which a single tree is constructed at a time, and not incorporated into the statement list until it is completed. When it is necessary, for example, to store a partially-constructed tree into a temporary variable, a `GT_COMMA` node is used to link it into the tree. However, in later phases, these comma nodes are an impediment to analysis, and thus are eliminated.
 240   * In some cases, it is not possible to fully extract the tree into a separate statement, due to execution order dependencies. In these cases, an “embedded” statement is created. While these are conceptually very similar to the `GT_COMMA` nodes, they do not masquerade as expressions.
 241 * Elimination of “QMark” (`GT_QMARK`/`GT_COLON`) nodes is actually done at the end of morphing, long before the current rationalization phase. The presence of these nodes made analyses (especially dataflow) overly complex.
 242 * Elimination of statements. Without statements, the execution order of a basic block's contents is fully defined by the `gtNext`/`gtPrev` links between `GenTree` nodes.
 243
 244 For our earlier example (Example of Post-Import IR), here is what the simplified dump looks like just prior to Rationalization (the $ annotations are value numbers).  Note that some common subexpressions have been computed into new temporary lclVars, and that computation has been inserted as a `GT_COMMA` (comma) node in the IR:
 245
 246     ▌  stmtExpr  void  (top level) (IL 0x000...0x026)
 247     │        ┌──▌  lclVar    double V07 cse1          $185
 248     │     ┌──▌  comma     double                      $185
 249     │     │  │     ┌──▌  dconst    double 2.00        $143
 250     │     │  │  ┌──▌  \*         double                $185
 251     │     │  │  │  └──▌  lclVar   double V00 arg0 u:2 $80
 252     │     │  └──▌  =         double                   $VN.Void
 253     │     │     └──▌  lclVar    double V07 cse1       $185
 254     │  ┌──▌  /         double                         $186
 255     │  │  │  ┌──▌  unary -   double                   $84
 256     │  │  │  │  └──▌  lclVar    double V01 arg1   u:2 $81
 257     │  │  └──▌  +         double                      $184
 258     │  │     │  ┌──▌  lclVar    double V06 cse0       $83
 259     │  │     └──▌  comma     double                   $83
 260     │  │        │  ┌──▌  mathFN    double sqrt        $83
 261     │  │        │  │  │     ┌──▌  lclVar double V02 arg2 u:2 $82
 262     │  │        │  │  │  ┌──▌  \*         double              $182
 263     │  │        │  │  │  │  │  ┌──▌  dconst    double 4.00   $141
 264     │  │        │  │  │  │  └──▌  \*         double           $181
 265     │  │        │  │  │  │     └──▌  lclVar double V00 arg0 u:2 $80
 266     │  │        │  │  └──▌  -         double                    $183
 267     │  │        │  │     │  ┌──▌  lclVar    double V01 arg1 u:2 $81
 268     │  │        │  │     └──▌  \*         double                 $180
 269     │  │        │  │        └──▌  lclVar    double V01 arg1 u:2 $81
 270     │  │        └──▌  =         double                          $VN.Void
 271     │  │           └──▌  lclVar    double V06 cse0              $83
 272     └──▌  =         double                                      $VN.Void
 273        └──▌  indir     double $186
 274           └──▌  lclVar    byref  V03 arg3        u:2 (last use) $c0
 275
 276 After rationalization, the nodes are presented in execution order, and the `GT_COMMA` (comma), `GT_ASG` (=), and `GT_STMT` nodes have been eliminated:
 277
 278     t0   = lclVar     double V01 arg1
 279     t1   = lclVar     double V01 arg1
 280            ┌──▌  t0   double
 281            ├──▌  t1   double
 282     t2   = \*          double
 283     t3   = lclVar     double V00 arg0
 284     t4   = dconst     double 4.00
 285            ┌──▌  t3   double
 286            ├──▌  t4   double
 287     t5   = \*          double
 288     t6   = lclVar     double V02 arg2
 289            ┌──▌  t5   double
 290            ├──▌  t6   double
 291     t7   = \*          double
 292            ┌──▌  t2   double
 293            ├──▌  t7   double
 294     t8   = -          double
 295            ┌──▌  t8   double
 296     t9   = mathFN     double sqrt
 297            ┌──▌  t9   double
 298            st.lclVar  double V06
 299     t10  = lclVar     double V06
 300     t11  = lclVar     double V01 arg1
 301            ┌──▌  t11  double
 302     t12  = unary -    double
 303            ┌──▌  t10  double
 304            ├──▌  t12  double
 305     t13  = +          double
 306     t14  = lclVar     double V00 arg0
 307     t15  = dconst     double 2.00
 308            ┌──▌  t14  double
 309            ├──▌  t15  double
 310     t16  = \*          double
 311            ┌──▌  t16  double
 312            st.lclVar  double V07
 313     t18  = lclVar     double V07
 314            ┌──▌  t17  double
 315            ├──▌  t18  double
 316     t19  = /          double
 317     t20  = lclVar     byref  V03 arg3
 318            ┌──▌  t20  double
 319            storeIndir double
 320
 321 ## <a name="lowering"/>Lowering
 322
 323 Lowering is responsible for transforming the IR in such a way that the control flow, and any register requirements, are fully exposed.
 324
 325 It accomplishes this in two passes.
 326
 327 The first pass is an execution-order traversal that performs context-dependent transformations such as expanding switch statements (using a switch table or a series of conditional branches), constructing addressing modes, etc.  For example, this:
 328
 329     t0   = lclVar     ref    V00 arg0
 330     t1   = lclVar     int    V03 loc1
 331            ┌──▌  t1   int
 332     t2   = cast       long <- int
 333     t3   = const      long   2
 334            ┌──▌  t2   long
 335            ├──▌  t3   long
 336     t4   = <<         long
 337            ┌──▌  t0   ref
 338            ├──▌  t4   long
 339     t5   = +          byref
 340     t6   = const      long   16
 341            ┌──▌  t5   ref
 342            ├──▌  t6   long
 343     t7   = +          byref
 344            ┌──▌  t7   ref
 345     t8   = indir      int
 346
 347 Is transformed into this, in which the addressing mode is explicit:
 348
 349     t0   = lclVar     ref    V00 arg0
 350     t1   = lclVar     int    V03 loc1
 351            ┌──▌  t1   int
 352     t2   = cast       long <- int
 353            ┌──▌  t0   ref
 354            ├──▌  t2   long
 355     t7   = lea(b+(i*4)+16) byref
 356            ┌──▌  t7   ref
 357     t8   = indir      int
 358
 359 The next pass annotates the nodes with register requirements. This is done in an execution order traversal in order to ensure that each node's operands are visited prior to the node itself. This pass may also do some transformations that do not require the parent context, such as determining the code generation strategy for block assignments (e.g. `GT_COPYBLK`) which may become helper calls, unrolled loops, or an instruction like `rep stos`.
 360
 361 The register requirements of a node are expressed by its `TreeNodeInfo` (`gtLsraInfo`) value.  For example, for the `copyBlk` node in this snippet:
 362
 363     t0   = const(h)   long   0xCA4000 static
 364     t1   = &lclVar    byref  V04 loc4
 365     t2   = const      int    34
 366            ┌──▌  t0   long
 367            ├──▌  t1   byref
 368            ├──▌  t2   int
 369            copyBlk    void
 370
 371 The `TreeNodeInfo` would be as follows:
 372
 373     +<TreeNodeInfo @ 15 0=1 1i 1f
 374           src=[allInt]
 375           int=[rax rcx rdx rbx rbp rsi rdi r8-r15 mm0-mm5]
 376           dst=[allInt] I>
 377
 378 The “@ 15” is the location number of the node.  The “0=1” indicates that there are zero destination registers (because this defines only memory), and 1 source register (the address of lclVar V04).  The “1i” indicates that it requires 1 internal integer register (for copying the remainder after copying 16-byte sized chunks), the “1f” indicates that it requires 1 internal floating point register (for copying the two 16-byte chunks).  The src, int and dst fields are encoded masks that indicate the register constraints for the source, internal and destination registers, respectively.
 379
 380 ## <a name="reg-alloc"/>Register allocation
 381
 382 The RyuJIT register allocator uses a Linear Scan algorithm, with an approach similar to [[2]](#[2]). In brief, it operates on two main data structures:
 383
 384 * `Intervals` (representing live ranges of variables or tree expressions) and `RegRecords` (representing physical registers), both of which derive from `Referent`.
 385 * `RefPositions`, which represent uses or defs (or variants thereof, such as ExposedUses) of either `Intervals` or physical registers.
 386
 387 Pre-conditions:
 388
 389 * The `NodeInfo` is initialized for each tree node to indicate:
 390   * Number of registers consumed and produced by the node.
 391   * Number and type (int versus float) of internal registers required.
 392
 393 Allocation proceeds in 4 phases:
 394
 395 * Determine the order in which the `BasicBlocks` will be allocated, and which predecessor of each block will be used to determine the starting location for variables live-in to the `BasicBlock`.
 396 * Construct Intervals for each tracked lclVar, then walk the `BasicBlocks` in the determined order building `RefPositions` for each register use, def, or kill.
 397 * Allocate the registers by traversing the `RefPositions`.
 398 * Write back the register assignments, and perform any necessary moves at block boundaries where the allocations don’t match.
 399
 400 Post-conditions:
 401
 402 * The `gtRegNum` property of all `GenTree` nodes that require a register has been set to a valid register number.
 403 * The `gtRsvdRegs` field (a set/mask of registers) has the requested number of registers specified for internal use.
 404 * All spilled values (lclVar or expression) are marked with `GTF_SPILL` at their definition. For lclVars, they are also marked with `GTF_SPILLED` at any use at which the value must be reloaded.
 405 * For all lclVars that are register candidates:
 406   * `lvRegNum` = initial register location (or `REG_STK`)
 407   * `lvRegister` flag set if it always lives in the same register
 408   * `lvSpilled` flag is set if it is ever spilled
 409 * The maximum number of simultaneously-live spill locations of each type (used for spilling expression trees) has been communicated via calls to `compiler->tmpPreAllocateTemps(type)`.
 410
 411 ## <a name="code-generation"/>Code Generation
 412
 413 The process of code generation is relatively straightforward, as Lowering has done some of the work already. Code generation proceeds roughly as follows:
 414
 415 * Determine the frame layout – allocating space on the frame for any lclVars that are not fully enregistered, as well as any spill temps required for spilling non-lclVar expressions.
 416 * For each `BasicBlock`, in layout order, and each `GenTree` node in the block, in execution order:
 417   * If the node is “contained” (i.e. its operation is subsumed by a parent node), do nothing.
 418   * Otherwise, “consume” all the register operands of the node.
 419     * This updates the liveness information (i.e. marking a lclVar as dead if this is the last use), and performs any needed copies.
 420     * This must be done in "spill order" so that any spill/restore code inserted by the register allocator to resolve register conflicts is generated in the correct order. "
 421   * Track the live variables in registers, as well as the live stack variables that contain GC refs.
 422   * Produce the `instrDesc(s)` for the operation, with the current live GC references.
 423   * Update the scope information (debug info) at block boundaries.
 424 * Generate the prolog and epilog code.
 425 * Write the final instruction bytes. It does this by invoking the emitter, which holds all the `instrDescs`.
 426
 427 # Phase-dependent Properties and Invariants of the IR
 428
 429 There are several properties of the IR that are valid only during (or after) specific phases of the JIT. This section describes the phase transitions, and how the IR properties are affected.
 430
 431 ## Phase Transitions
 432
 433 * Flowgraph analysis
 434   * Sets the predecessors of each block, which must be kept valid after this phase.
 435   * Computes reachability and dominators. These may be invalidated by changes to the flowgraph.
 436   * Computes edge weights, if profile information is available.
 437   * Identifies and normalizes loops. These may be invalidated, but must be marked as such.
 438 * Normalization
 439   * The lclVar reference counts are set by `lvaMarkLocalVars()`.
 440   * Statement ordering is determined by `fgSetBlockOrder()`. Execution order is a depth-first preorder traversal of the nodes, with the operands usually executed in order. The exceptions are:
 441     * Binary operators, which can have the `GTF_REVERSE_OPS` flag set to indicate that the RHS (`gtOp2`) should be evaluated before the LHS (`gtOp1`).
 442     * Dynamically-sized block copy nodes, which can have `gtEvalSizeFirst` set to `true` to indicate that their `gtDynamicSize` tree should be evaluated before executing their other operands.
 443 * Rationalization
 444   * All `GT_ASG` trees are transformed into `GT_STORE` variants (e.g. `GT_STORE_LCL_VAR`).
 445   * All `GT_ADDR` nodes are eliminated (e.g. with `GT_LCL_VAR_ADDR`).
 446   * All `GT_COMMA` and `GT_STMT` nodes are removed and their constituent nodes linked into execution order.
 447 * Lowering
 448   * `GenTree` nodes are split or transformed as needed to expose all of their register requirements and any necessary `flowgraph` changes (e.g., for switch statements).
 449
 450 ## GenTree phase-dependent properties
 451
 452 Ordering:
 453
 454 * For `GenTreeStmt` nodes, the `gtNext` and `gtPrev` fields must always be consistent. The last statement in the `BasicBlock` must have `gtNext` equal to null. By convention, the `gtPrev` of the first statement in the `BasicBlock` must be the last statement of the `BasicBlock`.
 455   * In all phases, `gtStmtExpr` points to the top-level node of the expression.
 456 * For non-statement nodes, the `gtNext` and `gtPrev` fields are either null, prior to ordering, or they are consistent (i.e. `A->gtPrev->gtNext = A`, and `A->gtNext->gtPrev == A`, if they are non-null).
 457 * After normalization the `gtStmtList` of the containing statement points to the first node to be executed.
 458 * Prior to normalization, the `gtNext` and `gtPrev` pointers on the expression (non-statement) `GenTree` nodes are invalid. The expression nodes are only traversed via the links from parent to child (e.g. `node->gtGetOp1()`, or `node->gtOp.gtOp1`). The `gtNext/gtPrev` links are set by `fgSetBlockOrder()`.
 459   * After normalization, and prior to rationalization, the parent/child links remain the primary traversal mechanism. The evaluation order of any nested expression-statements (usually assignments) is enforced by the `GT_COMMA` in which they are contained.
 460 * After rationalization, all `GT_COMMA` nodes are eliminated, statements are flattened, and the primary traversal mechanism becomes the `gtNext/gtPrev` links which define the execution order.
 461 * In tree ordering:
 462   * The `gtPrev` of the first node (`gtStmtList`) is always null.
 463   * The `gtNext` of the last node (`gtStmtExpr`) is always null.
 464
 465 TreeNodeInfo:
 466
 467 * The `TreeNodeInfo` (`gtLsraInfo`) is set during the Lowering phase, and communicates the register requirements of the node, including the number and types of registers used as sources, destinations and internal registers. Currently only a single destination per node is supported.
 468
 469 ## LclVar phase-dependent properties
 470
 471 Prior to normalization, the reference counts (`lvRefCnt` and `lvRefCntWtd`) are not valid. After normalization they must be updated when lclVar references are added or removed.
 472
 473 # Supporting technologies and components
 474
 475 ## Instruction encoding
 476
 477 Instruction encoding is performed by the emitter ([emit.h](https://github.com/dotnet/coreclr/blob/master/src/jit/emit.h)), using the `insGroup`/`instrDesc` representation. The code generator calls methods on the emitter to construct `instrDescs`. The encodings information is captured in the following:
 478
 479 * The “instruction” enumeration itemizes the different instructions available on each target, and is used as an index into the various encoding tables (e.g. `instInfo[]`, `emitInsModeFmtTab[]`) generated from the `instrs{tgt}.h` (e.g., [instrsxarch.h](https://github.com/dotnet/coreclr/blob/master/src/jit/instrsxarch.h)).
 480 * The skeleton encodings are contained in the tables, and then there are methods on the emitter that handle the special encoding constraints for the various instructions, addressing modes, register types, etc.
 481
 482 ## GC Info
 483
 484 Reporting of live GC references is done in two ways:
 485
 486 * For stack locations that are not tracked (these could be spill locations or lclVars – local variables or temps – that are not register candidates), they are initialized to null in the prolog, and reported as live for the entire method.
 487 * For lclVars with tracked lifetimes, or for expression involving GC references, we report the range over which the reference is live. This is done by the emitter, which adds this information to the instruction group, and which terminates instruction groups when the GC info changes.
 488
 489 The tracking of GC reference lifetimes is done via the `GCInfo` class in the JIT. It is declared in [src/jit/jitgcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/jit/jitgcinfo.h) (to differentiate it from [src/inc/gcinfo.h](https://github.com/dotnet/coreclr/blob/master/src/inc/gcinfo.h)), and implemented in [src/jit/gcinfo.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/gcinfo.cpp).
 490
 491 In a JitDump, the generated GC info can be seen following the “In gcInfoBlockHdrSave()” line.
 492
 493 ## Debugger info
 494
 495 Debug info consists primarily of two types of information in the JIT:
 496
 497 * Mapping of IL offsets to native code offsets. This is accomplished via:
 498   * the `gtStmtILoffsx` on the statement nodes (`GenTreeStmt`)
 499   * the `gtLclILoffs` on lclVar references (`GenTreeLclVar`)
 500   * The IL offsets are captured during CodeGen by calling `CodeGen::genIPmappingAdd()`, and then written to debug tables by `CodeGen::genIPmappingGen()`.
 501 * Mapping of user locals to location (register or stack). This is accomplished via:
 502   * Struct `siVarLoc` (in [compiler.h](https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h)) captures the location
 503   * `VarScopeDsc` ([compiler.h](https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h)) captures the live range of a local variable in a given location.
 504
 505 ## Exception handling
 506
 507 Exception handling information is captured in an `EHblkDsc` for each exception handling region. Each region includes the first and last blocks of the try and handler regions, exception type, enclosing region, among other things. Look at [jiteh.h](https://github.com/dotnet/coreclr/blob/master/src/jit/jiteh.h) and [jiteh.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/jiteh.cpp), especially, for details. Look at `Compiler::fgVerifyHandlerTab()` to see how the exception table constraints are verified.
 508
 509 # Reading a JitDump
 510
 511 One of the best ways of learning about the JIT compiler is examining a compilation dump in detail. The dump shows you all the really important details of the basic data structures without all the implementation detail of the code. Debugging a JIT bug almost always begins with a JitDump. Only after the problem is isolated by the dump does it make sense to start debugging the JIT code itself.
 512
 513 Dumps are also useful because they give you good places to place breakpoints. If you want to see what is happening at some point in the dump, simply search for the dump text in the source code. This gives you a great place to put a conditional breakpoint.
 514
 515 There is not a strong convention about what or how the information is dumped, but generally you can find phase-specific information by searching for the phase name. Some useful points follow.
 516
 517 ## How to create a JitDump
 518
 519 You can enable dumps by setting the `COMPlus_JitDump` environment variable to a space-separated list of the method(s) you want to dump. For example:
 520
 521 ```cmd
 522 :: Print out lots of useful info when
 523 :: compiling methods named Main/GetEnumerator
 524 set "COMPlus_JitDump=Main GetEnumerator"
 525 ```
 526
 527 See [Setting configuration variables](../building/viewing-jit-dumps.md#setting-configuration-variables) for more details on this.
 528
 529 Full instructions for dumping the compilation of some managed code can be found here: [viewing-jit-dumps.md](../building/viewing-jit-dumps.md)
 530
 531 ## Reading expression trees
 532
 533 It takes some time to learn to “read” the expression trees, which are printed with the children indented from the parent, and, for binary operators, with the first operand below the parent and the second operand above.
 534
 535 Here is an example dump
 536
 537     [000027] ------------             ▌  stmtExpr  void  (top level) (IL 0x010...  ???)
 538     [000026] --C-G-------             └──▌  return    double
 539     [000024] --C-G-------                └──▌  call      double BringUpTest.DblSqrt
 540     [000021] ------------                   │     ┌──▌  lclVar    double V02 arg2
 541     [000022] ------------                   │  ┌──▌  -         double
 542     [000020] ------------                   │  │  └──▌  lclVar    double V03 loc0
 543     [000023] ------------ arg0              └──▌  *         double
 544     [000017] ------------                      │     ┌──▌  lclVar    double V01 arg1
 545     [000018] ------------                      │  ┌──▌  -         double
 546     [000016] ------------                      │  │  └──▌  lclVar    double V03 loc0
 547     [000019] ------------                      └──▌  *         double
 548     [000013] ------------                         │     ┌──▌  lclVar    double V00 arg0
 549     [000014] ------------                         │  ┌──▌  -         double
 550     [000012] ------------                         │  │  └──▌  lclVar    double V03 loc0
 551     [000015] ------------                         └──▌  *         double
 552     [000011] ------------                            └──▌  lclVar    double V03 loc0
 553
 554 The tree nodes are indented to represent the parent-child relationship. Binary operators print first the right hand side, then the operator node itself, then the left hand side. This scheme makes sense if you look at the dump “sideways” (lean your head to the left). Oriented this way, the left hand side operator is actually on the left side, and the right hand operator is on the right side so you can almost visualize the tree if you look at it sideways. The indentation level is also there as a backup.
 555
 556 Tree nodes are identified by their `gtTreeID`. This field only exists in DEBUG builds, but is quite useful for debugging, since all tree nodes are created from the routine `gtNewNode` (in [src/jit/gentree.cpp](https://github.com/dotnet/coreclr/blob/master/src/jit/gentree.cpp)). If you find a bad tree and wish to understand how it got corrupted, you can place a conditional breakpoint at the end of `gtNewNode` to see when it is created, and then a data breakpoint on the field that you believe is corrupted.
 557
 558 The trees are connected by line characters (either in ASCII, by default, or in slightly more readable Unicode when `COMPlus_JitDumpAscii=0` is specified), to make it a bit easier to read.
 559
 560     N037 (  0,  0) [000391] ----------L- arg0 SETUP  │  ┌──▌  argPlace  ref    REG NA $1c1
 561     N041 (  2,  8) [000389] ------------             │  │     ┌──▌  const(h) long 0xB410A098 REG rcx $240
 562     N043 (  4, 10) [000390] ----G-------             │  │  ┌──▌  indir     ref    REG rcx $1c1
 563     N045 (  4, 10) [000488] ----G------- arg0 in rcx │  ├──▌  putarg_reg ref    REG rcx
 564     N049 ( 18, 16) [000269] --C-G-------             └──▌  call void System.Diagnostics.TraceInternal.Fail $VN.Void
 565
 566 ## Variable naming
 567
 568 The dump uses the index into the local variable table as its name. The arguments to the function come first, then the local variables, then any compiler generated temps. Thus in a function with 2 parameters (remember “this” is also a parameter), and one local variable, the first argument would be variable 0, the second argument variable 1, and the local variable would be variable 2. As described earlier, tracked variables are given a tracked variable index which identifies the bit for that variable in the dataflow bit vectors. This can lead to confusion as to whether the variable number is its index into the local variable table, or its tracked index. In the dumps when we refer to a variable by its local variable table index we use the ‘V’ prefix, and when we print the tracked index we prefix it by a ‘T’.
 569
 570 ## References
 571
 572 <a name="[1]"/>
 573 [1] P. Briggs, K. D. Cooper, T. J. Harvey, and L. T. Simpson, "Practical improvements to the construction and destruction of static single assignment form," Software --- Practice and Experience, vol. 28, no. 8, pp. 859---881, Jul. 1998.
 574
 575 <a name="[2]"/>
 576 [2] Wimmer, C. and Mössenböck, D. "Optimized Interval Splitting in a Linear Scan Register Allocator," ACM VEE 2005, pp. 132-141. [http://portal.acm.org/citation.cfm?id=1064998&dl=ACM&coll=ACM&CFID=105967773&CFTOKEN=80545349](http://portal.acm.org/citation.cfm?id=1064998&dl=ACM&coll=ACM&CFID=105967773&CFTOKEN=80545349)