* Avoid forcing structs to the stack if they are only assigned to/from, or passed to/returned
  from a call or intrinsic
  - Including SIMD types as well as other pointer-sized-or-less struct types
  - Enable enregistration of structs that have no field accesses
* Optimize these types as effectively as any other basic type
  - Value numbering, especially for types that are used in intrinsics (e.g. SIMD)
* No “swizzling” or lying about struct types – they are always struct types
  - No confusing use of GT_LCL_FLD to refer to the entire struct as a different type

Struct-Related Issues in RyuJIT
-------------------------------
The following issues illustrate some of the motivation for improving the handling of value types
(structs) in RyuJIT:

* [\#11407 [RyuJIT] Fully enregister structs that fit into a single register when profitable](https://github.com/dotnet/coreclr/issues/11407), also VSO Bug 98404: .NET JIT x86 - poor code generated for value type initialization
  * This is a simple test case that should generate just `xor eax, eax; ret` on x86 and x64, but
    instead generates many unnecessary copies. It is addressed by full enregistration of
    structs that fit into a register:

        struct foo { public byte b1, b2, b3, b4; }
        static foo getfoo() { return new foo(); }

* [\#1133 JIT: Excessive copies when inlining](https://github.com/dotnet/coreclr/issues/1133)
  * The scenario given in this issue involves a struct that is larger than 8 bytes, so
    it is not impacted by the fixed-size types. However, by enabling assertion propagation
    for struct types (which, in turn, is made easier by using normal assignments), the
    excess copies can be eliminated.
  * Note that these copies are not generated when passing and returning scalar types,
    and it may be worth considering (in future) whether we can avoid adding them.

* [\#1161 RyuJIT properly optimizes structs with a single field if the field type is int but not if it is double](https://github.com/dotnet/coreclr/issues/1161)
  * This issue arises because we never promote a struct with a single double field, due to
    the fact that such a struct may be passed or returned in a general-purpose register.
    This issue could be addressed independently, but should "fall out" of improved heuristics
    for when to promote and enregister structs.

* [\#1636 Add optimization to avoid copying a struct if passed by reference and there are no writes to and no reads after passed to a callee](https://github.com/dotnet/coreclr/issues/1636).
  * This issue is nearly the same as the above, except that in this case the desire is to
    eliminate unneeded copies locally (i.e. not just due to inlining), in the case where
    the struct may or may not be passed or returned directly.
  * Unfortunately, there is not currently a scenario or test case for this issue.

* [\#3144 Avoid marking tmp as DoNotEnregister in tmp=GT_CALL() where call returns a enregisterable struct in two return registers](https://github.com/dotnet/coreclr/issues/3144)
  * This issue could be addressed without First Class Structs. However,
    it will be easier with struct assignments that are normalized as regular assignments, and
    should be done along with the streamlining of the handling of ABI-specific struct passing.

* [\#3539 RyuJIT: Poor code quality for tight generic loop with many inlineable calls](https://github.com/dotnet/coreclr/issues/3539)
  (factor x8 slower than non-generic few calls loop).
  * I am still investigating this issue.

* [\#5556 RuyJIT: structs in parameters and enregistering](https://github.com/dotnet/coreclr/issues/5556)
  * This also requires further investigation, but it will require us to "Add support in prolog to extract fields, and
    remove the restriction of not promoting incoming reg structs that have more than one field" - see [Dependent Work Items](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md#dependent-work-items)

Normalizing Struct Types
------------------------
We would like to facilitate full enregistration of any struct with the following properties:
1. Its fields are infrequently accessed, and
2. The entire struct fits into a register, and
3. Its value is used or defined in a register
   (i.e. as an argument to or return value from calls or intrinsics).

In RyuJIT, the concept of a type is very simplistic (which helps support the high throughput
of the JIT). Rather than a symbol table to hold the properties of a type, RyuJIT primarily
deals with types as simple values of an enumeration. When more detailed information is
required about the structure of a type, we query the type system, across the JIT/EE interface.
This is generally done only during the importer (translation from MSIL to the RyuJIT IR), and
during struct promotion analysis. As a result, struct types are treated as an opaque type
(TYP_STRUCT) of unknown size and structure.

In order to treat fully-enregisterable struct types as "first class" types in RyuJIT, we
create new types with fixed size and structure:
* TYP_SIMD8, TYP_SIMD12, TYP_SIMD16 and (where supported by the target) TYP_SIMD32
  - These types already exist, and represent some already-completed steps toward First Class Structs.
* TYP_STRUCT1, TYP_STRUCT2, TYP_STRUCT4, TYP_STRUCT8 (on 64-bit systems)
  - These types are new, and will be used where struct types of the given size are passed and/or
    returned in registers.

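
As an illustration only, the size-based normalization described above might look like the following sketch. The `VarType` enumeration and `NormalizeStructType` are hypothetical stand-ins, not the JIT's actual `var_types` or importer code, and SIMD types are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for the JIT's type enumeration.
enum class VarType { Struct, Struct1, Struct2, Struct4, Struct8 };

// Map a struct's exact size to a fixed-size struct type, falling back to the
// opaque Struct type when no fixed-size type applies.
// 'pointerSize' is the target pointer size in bytes (4 or 8).
inline VarType NormalizeStructType(std::size_t exactSize, std::size_t pointerSize)
{
    assert(pointerSize == 4 || pointerSize == 8);
    switch (exactSize)
    {
        case 1:  return VarType::Struct1;
        case 2:  return VarType::Struct2;
        case 4:  return VarType::Struct4;
        // The 8-byte type is only available on 64-bit targets.
        case 8:  return (pointerSize == 8) ? VarType::Struct8 : VarType::Struct;
        default: return VarType::Struct;
    }
}
```

With this sketch, the 4-byte `foo` struct from issue #11407 would normalize to the `Struct4` case on both 32-bit and 64-bit targets.
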
We want to identify and normalize these types early in the compiler, before any decisions are
made regarding whether they are constrained to live on the stack and whether and how they are
promoted (scalar replaced) or copied.

One issue that arises is that it becomes necessary to know the size of any struct type that
we encounter, even if we may not actually need to know the size in order to generate code.
The major cause of additional queries seems to be for field references. It is possible to
defer some of these cases. I don't know what the throughput impact will be to always do the
normalization, but in principle I think it is worth doing because the alternative would be
to transform the types later (e.g. during morph) and use a contextual tree walk to see if we
care about the size of the struct. That would likely be a messier analysis.

Current Struct IR Phase Transitions
-----------------------------------

There are three phases in the JIT that make changes to the representation of struct tree
nodes:

* All struct type lclVars have TYP_STRUCT
* All struct assignments/inits are block ops
* All struct call args are ldobj
* Other struct nodes have TYP_STRUCT
* Fields of promoted structs become separate lclVars (scalar promoted) with primitive types
* All struct nodes are transformed to block ops
* Some promoted structs are forced to stack
  - Become “dependently promoted”
  - Morphed to GT_LCL_FLD if passed in a register
  - Treated in various ways otherwise (inconsistent)

The most fundamental change with first class structs is that struct assignments become
just a special case of assignment. The existing block ops (GT_INITBLK, GT_COPYBLK,
GT_COPYOBJ, GT_LDOBJ) are eliminated. Instead, the block operations in the incoming MSIL
are translated into assignments to or from a new GT_OBJ node.

New fixed-size struct types are added: (TYP_STRUCT[1|2|4|8]), which are somewhat similar
to the (existing) SIMD types (TYP_SIMD[8|12|16|32]). As is currently done for the SIMD types,
these types are normalized in the importer.

Conceptually, struct nodes refer to the object, not the address. This is important, as
the existing block operations all take address operands, meaning that any lclVar involved
in an assignment (including initialization) will be in an address-taken context in the JIT,
requiring special analysis to identify the cases where the address is only taken in order
to assign to or from the lclVar. This further allows for consistency in the treatment of
structs and simple types - even potentially enabling optimizations of non-enregisterable
structs.

* Struct promotion analysis
  * Aggressively promote pointer-sized fields of structs used as args or returns
  * Consider FULL promotion of pointer-size structs
    * If there are fewer field references than calls or returns

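
The full-promotion heuristic above can be sketched roughly as follows. `StructLocalInfo` and `ShouldFullyPromote` are hypothetical names introduced for illustration, and the reference counts are assumed to have been gathered by an earlier analysis pass; the real promotion logic weighs more factors than this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-local summary gathered during struct promotion analysis.
struct StructLocalInfo
{
    std::size_t sizeInBytes;       // exact size of the struct
    unsigned    fieldRefCount;     // references to individual fields
    unsigned    callOrRetRefCount; // uses as a call argument, call result, or return value
    bool        addressTaken;      // if true, the local must live on the stack
};

// Returns true if the struct should be kept whole in a single register
// (full promotion) rather than being split into per-field locals.
inline bool ShouldFullyPromote(const StructLocalInfo& info, std::size_t pointerSize)
{
    assert(pointerSize == 4 || pointerSize == 8);
    if (info.addressTaken || info.sizeInBytes > pointerSize)
    {
        return false;
    }
    // Fewer field references than call/return references: keeping the struct
    // whole avoids assembling it from fields at each call or return.
    return info.fieldRefCount < info.callOrRetRefCount;
}
```

For the `getfoo` example (one return, no field accesses) this sketch answers "yes", which matches the intent of enregistering structs that have no field accesses.
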
* Struct assignments look like any other assignment
* GenTreeAsg (GT_ASG) extends GenTreeOp with:

      // True if this assignment is a volatile memory operation.
      bool IsVolatile() const { return (gtFlags & GTF_BLK_VOLATILE) != 0; }

      // What code sequence we will be using to encode this operation.

### Struct “objects” as lvalues
* Lhs of a struct assignment is a block node or lclVar
* Block nodes represent the address and “shape” info formerly on the block copy:
  * GT_BLK and GT_STORE_BLK (GenTreeBlk)
    * Has a (non-tree node) size field
  * GT_OBJ and GT_STORE_OBJ (GenTreeObj extends GenTreeBlk)
    * gtClass, gtGcPtrs, gtGcPtrCount, gtSlots
  * GT_DYN_BLK and GT_STORE_DYN_BLK (GenTreeDynBlk extends GenTreeBlk)
    * Additional child gtDynamicSize

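
The node hierarchy listed above could be pictured with the heavily simplified sketch below. Only the class and field names that appear in the list are taken from this document; the constructors, the opaque class-handle type, and `HasGCPtrs` are assumptions made for illustration, and the per-slot `gtGcPtrs` layout array is omitted.

```cpp
#include <cassert>

// Minimal stand-in for the JIT's tree node base class (assumption).
struct GenTree
{
    virtual ~GenTree() = default;
};

// GT_BLK / GT_STORE_BLK: an address operand plus a statically known size.
struct GenTreeBlk : GenTree
{
    GenTree* addr;
    unsigned size; // in bytes; a plain field, not a tree node

    GenTreeBlk(GenTree* a, unsigned s) : addr(a), size(s) { assert(a != nullptr); }
};

// GT_OBJ / GT_STORE_OBJ: adds class and GC layout information.
struct GenTreeObj : GenTreeBlk
{
    const void* gtClass;      // opaque class handle (type is an assumption)
    unsigned    gtGcPtrCount; // number of slots holding GC references
    unsigned    gtSlots;      // number of pointer-sized slots

    GenTreeObj(GenTree* a, unsigned s, const void* cls, unsigned gcCount, unsigned slots)
        : GenTreeBlk(a, s), gtClass(cls), gtGcPtrCount(gcCount), gtSlots(slots)
    {
        assert(gcCount <= slots);
    }

    bool HasGCPtrs() const { return gtGcPtrCount != 0; }
};

// GT_DYN_BLK / GT_STORE_DYN_BLK: the size is an additional child node.
struct GenTreeDynBlk : GenTreeBlk
{
    GenTree* gtDynamicSize;

    GenTreeDynBlk(GenTree* a, GenTree* dynSize) : GenTreeBlk(a, 0), gtDynamicSize(dynSize)
    {
        assert(dynSize != nullptr);
    }
};
```
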
### Struct “objects” as rvalues
After morph, structs on rhs of assignment are either:
* The tree node for the object: e.g. call, retExpr
* GT_IND of an address (e.g. GT_LEA)

The lhs provides the “shape” for the assignment. Note: it has been suggested that these could
remain as GT_BLK nodes, but I have not given that any deep consideration.

### Preserving Struct Types in Trees

Prior to morphing, all nodes that may represent a struct type will have a class handle.
After morphing, some will become GT_IND.

### Structs As Call Arguments

All struct args are imported as GT_OBJ and transformed as follows during morph:
* P_FULL promoted locals:
  * Remain as GT_LCL_VAR nodes, with the appropriate fixed-size struct type.
  * Note that these may or may not be passed in registers.
* P_INDEP promoted locals:
  * These are the ones where the fields don’t match the reg types.
    Use GT_STRUCT (or something similar) for aggregating multiple fields into a single register:
    * Op1 is a lclVar for the first promoted field
    * Op2 is the lclVar for the next field, OR another GT_STRUCT
    * Bit offset for the second child
* All other cases (non-locals, OR P_DEP or non-promoted locals):
  * GT_LIST of GT_IND for each half

The return of a struct value from the current method is represented as follows:
* GT_RET(GT_OBJ) initially
* GT_OBJ morphed, and then transformed similarly to call args

Proposed Struct IR Phase Transitions
------------------------------------

* Struct assignments are imported as GT_ASG
* Struct type is normalized to TYP_STRUCT* or TYP_SIMD*
* Fields of promoted structs become separate lclVars (as is)
* Enregisterable structs (including Pair Types) may be promoted to P_FULL (i.e. fully enregistered)
* As a future optimization, we may "restructure" multi-register argument or return values as a
  synthesized struct of appropriately typed fields, which is then promoted in the normal manner.
* All struct type local variables remain as simple GT_LCL_VAR nodes.
* All other struct nodes are transformed to GT_IND (rhs of assignment) or remain as GT_OBJ
  * In Lowering, GT_OBJ will be changed to GT_BLK if there are no gc ptrs. This could be done
    earlier, but there are some places where the object pointer is desired.
  * It is not actually clear if there is a great deal of value in the GT_BLK, but it was added
    to be more compatible with existing code that expects block copies with gc pointers to be
    distinguished from those that do not.
* Promoted structs are forced to stack ONLY if address taken
* Fixed-size enregisterable structs: GT_LCL_VAR or GT_OBJ of appropriate type.
* Multi-register arguments: GT_LIST of register-sized operands:
  * GT_LCL_VAR if there is a promoted field that exactly matches the register size and type
    (note that, if we have performed the optimization mentioned above in struct promotion,
    we may have a GT_LCL_VAR of a synthesized struct field).
  * GT_LCL_FLD if there is a matching field in the struct that has not been promoted.
  * GT_IND otherwise. Note that if this is a struct local that does not have a matching field,
    this will force the local to live on the stack.

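
The choice of operand for each register-sized slot of a multi-register argument, as listed above, can be sketched as a small decision function. `FieldInfo`, `OperandKind` and `ChooseSlotOperand` are hypothetical names for illustration; the actual morphing code works on lclVar descriptors and tree nodes rather than on a plain field list.

```cpp
#include <cassert>
#include <vector>

enum class RegType { Int, Float };

// The three operand forms named in the list above.
enum class OperandKind { LclVar, LclFld, Ind };

// Hypothetical description of one field of a struct local.
struct FieldInfo
{
    unsigned offset;   // byte offset within the struct
    unsigned size;     // size in bytes
    RegType  type;     // register class the field naturally lives in
    bool     promoted; // true if the field has its own lclVar
};

// Pick the operand form for the register-sized slot at 'slotOffset'.
inline OperandKind ChooseSlotOperand(const std::vector<FieldInfo>& fields,
                                     unsigned slotOffset,
                                     unsigned regSize,
                                     RegType  regType)
{
    assert(regSize > 0);
    for (const FieldInfo& f : fields)
    {
        if (f.offset == slotOffset && f.size == regSize && f.type == regType)
        {
            // Exact match: use the promoted field's lclVar if there is one,
            // otherwise refer to the field within the struct local.
            return f.promoted ? OperandKind::LclVar : OperandKind::LclFld;
        }
    }
    // No matching field: load through an indirection, which forces a
    // struct local to live on the stack.
    return OperandKind::Ind;
}
```
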
* Pair types (e.g. TYP_LONG on 32-bit targets) are decomposed as needed to expose register requirements.
  Although these are not strictly structs, their handling is similar.
  * Computations are decomposed into their constituent parts when they independently write
    separate registers.
  * TYP_LONG lclVars (and TYP_DOUBLE on ARM) are split (similar to promotion/scalar replacement of
    structs) if and only if they are register candidates.
  * Other TYP_LONG/TYP_DOUBLE lclVars are loaded into independent registers either via:
    * Single GT_LCL_VAR that will translate into a pair load instruction (ldp), with two register
      targets, or
    * GT_LCL_FLD (current approach) or GT_IND (probably a better approach)
* Calls and loads that target multiple registers
  * Existing gtLsraInfo has the capability to specify multiple destination registers
  * Additional work is required in LSRA to handle these correctly
  * If HFAs can be return values (not just call args), then we may need to support up to 4 destination
    registers.

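
To make the idea of decomposing a pair-type computation concrete, here is a minimal sketch of a 64-bit add expressed as two 32-bit operations, which is the shape a 32-bit target needs (an add of the low halves followed by an add-with-carry of the high halves). `LongPair`, `Split`, `Join` and `AddDecomposed` are hypothetical helpers, not JIT functions.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two independent 32-bit halves, as it would be
// after a TYP_LONG local is split on a 32-bit target.
struct LongPair
{
    uint32_t lo;
    uint32_t hi;
};

inline LongPair Split(uint64_t v)
{
    return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

inline uint64_t Join(LongPair p)
{
    return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit add as two 32-bit adds; each half writes its own register.
inline LongPair AddDecomposed(LongPair a, LongPair b)
{
    LongPair r;
    r.lo = a.lo + b.lo;                       // low halves (wraps modulo 2^32)
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u; // carry out of the low add
    r.hi = a.hi + b.hi + carry;               // high halves plus carry
    assert(Join(r) == Join(a) + Join(b));     // agrees with the native 64-bit add
    return r;
}
```
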
The `getfoo` method initializes a struct of 4 bytes.
The dump of the (single) local variable is included to show the change from `struct (8)` to
`struct4`, as the "exact size" of the struct is 4 bytes.
Here is the IR after Import:

    ; V00 loc0 struct ( 8)

    ▌ stmtExpr void (top level) (IL 0x000... ???)
    └──▌ lclVar struct V00 loc0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ lclFld int V00 loc0 [+0]

This is how it currently looks just before code generation:

    ▌ stmtExpr void (top level) (IL 0x000...0x003)
    │ ┌──▌ const int 0 REG rax $81
    │ ├──▌ &lclVar byref V00 loc0 d:3 REG NA
    └──▌ storeIndir int REG NA

    ▌ stmtExpr void (top level) (IL 0x008...0x009)
    │ ┌──▌ lclFld int V00 loc0 u:3[+0] (last use) REG rax $180
    └──▌ return int REG NA $181

And here is the resulting code:

    mov qword ptr [V00 rsp], rax
    mov dword ptr [V00 rsp], eax
    mov eax, dword ptr [V00 rsp]

Here is the IR after Import with the prototype First Class Struct changes.
Note that the fixed-size struct variable is assigned and returned just as for a scalar type.

    ▌ stmtExpr void (top level) (IL 0x000... ???)
    └──▌ = struct4 (init)
    └──▌ lclVar struct4 V00 loc0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ lclVar struct4 V00 loc0

And here is the resulting IR just prior to code generation:

    ▌ stmtExpr void (top level) (IL 0x008...0x009)
    │ ┌──▌ const struct4 0 REG rax $81
    └──▌ return struct4 REG NA $140

Finally, here is the resulting code that we were hoping to achieve:

Here is the IR after Inlining for the `TestValueTypesInInlinedMethods` method that invokes a
sequence of methods that are inlined, creating a sequence of copies.
Because this struct type does not fit into a single register, the types do not change (and
therefore the local variable table is not shown).

    ▌ stmtExpr void (top level) (IL 0x000...0x003)
    └──▌ lclVar struct V00 loc0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ │ └──▌ lclVar struct V00 loc0
    └──▌ lclVar struct V01 tmp0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ │ └──▌ lclVar struct V01 tmp0
    └──▌ lclVar struct V02 tmp1

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ │ └──▌ lclVar struct V02 tmp1
    └──▌ lclVar struct V03 tmp2

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    ├──▌ const long 0x7ff918494e10

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ │ └──▌ lclVar struct V03 tmp2
    │ ┌──▌ const long 8 Fseq[#FirstElem]

    ▌ stmtExpr void (top level) (IL 0x00E... ???)

And here is the resulting code:

    mov qword ptr [V00 rsp+58H], rax
    mov qword ptr [V00+0x8 rsp+60H], rax
    lea rdx, bword ptr [V00 rsp+58H]
    vmovdqu qword ptr [rdx], ymm0
    vmovdqu ymm0, qword ptr [V00 rsp+58H]
    vmovdqu qword ptr [V01 rsp+48H], ymm0
    vmovdqu ymm0, qword ptr [V01 rsp+48H]
    vmovdqu qword ptr [V02 rsp+38H], ymm0
    vmovdqu ymm0, qword ptr [V02 rsp+38H]
    vmovdqu qword ptr [V03 rsp+28H], ymm0
    mov rcx, 0x7FF918494E10
    call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    mov rax, 0x1FAC6EB29C8
    mov rax, gword ptr [rax]
    vmovdqu ymm0, qword ptr [V03 rsp+28H]
    vmovdqu qword ptr [rax], ymm0

(note that the obj node will become a blk node downstream).

    ▌ stmtExpr void (top level) (IL 0x000...0x003)
    └──▌ lclVar struct V00 loc0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ lclVar struct V00 loc0
    └──▌ lclVar struct V01 tmp0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ lclVar struct V01 tmp0
    └──▌ lclVar struct V02 tmp1

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ lclVar struct V02 tmp1
    └──▌ lclVar struct V03 tmp2

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    ├──▌ const long 0x7ff9184b4e10

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ lclVar struct V03 tmp2
    │ ┌──▌ const long 8 Fseq[#FirstElem]

    ▌ stmtExpr void (top level) (IL 0x00E... ???)

Here is the IR after fgMorph:
Note that copy propagation has propagated the zero initialization through to the final store.

    ▌ stmtExpr void (top level) (IL 0x000...0x003)
    └──▌ lclVar struct V00 loc0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ const struct 0
    └──▌ lclVar struct V01 tmp0

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ const struct 0
    └──▌ lclVar struct V02 tmp1

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ const struct 0
    └──▌ lclVar struct V03 tmp2

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    ├──▌ const long 0x7ffc8bbb4e10

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ const struct 0
    │ ┌──▌ const long 8 Fseq[#FirstElem]
    └──▌ const(h) long 0x2425b6229c8 static Fseq[s_dt]

    ▌ stmtExpr void (top level) (IL 0x00E... ???)

After liveness analysis the dead stores have been eliminated:

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    └──▌ call help long HELPER.CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    ├──▌ const long 0x7ffc8bbb4e10

    ▌ stmtExpr void (top level) (IL 0x008... ???)
    │ ┌──▌ const struct 0
    │ ┌──▌ const long 8 Fseq[#FirstElem]
    └──▌ const(h) long 0x2425b6229c8 static Fseq[s_dt]

    ▌ stmtExpr void (top level) (IL 0x00E... ???)

And here is the resulting code, going from a code size of 129 bytes down to 58.

    mov rcx, 0x7FFC8BBB4E10
    call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
    mov rdx, 0x2425B6229C8
    mov rdx, gword ptr [rdx]
    vmovdqu qword ptr [rdx], ymm0

This is a preliminary breakdown of the work into somewhat separable tasks. Those whose descriptions
are prefaced by '*' have been prototyped in an earlier version of the JIT, and that work is now
being re-integrated and tested, but may require some cleanup and/or phasing with other work items
before a PR is submitted.

### Mostly-Independent work items
1. *Replace block ops with assignments & new nodes.

2. *Add new fixed-size types, and normalize them in the importer (might be best to do this with or after #1, but not really dependent)

* Enable support for multiple destination regs, call nodes that return a struct in multiple
  registers (for x64/ux, and for arm)
* Handle multiple destination regs for ldp on arm64 (could be done before or concurrently with the above).
  Note that this work item is specifically intended for call arguments. It is likely the case that
  utilizing ldp for general-purpose code sequences would be handled separately.

4. X64/ux: aggressively promote lclVar struct incoming or outgoing args with two 8-byte fields

* modify the handling of multireg struct args to use GT_LIST of GT_IND
* remove the restriction to NOT promote things that are multi-reg args, as long as they match (i.e. two 8-byte fields).
  Pass those using GT_LIST of GT_LCL_VAR.
* stop adding extra lclVar copies

* Promote 16-byte struct lclVars that are incoming or outgoing register arguments only if they have 2 8-byte fields (DONE).
  Pass those using GT_LIST of GT_LCL_VAR (as above for x64/ux).
  Note that, if the types do not match, e.g. a TYP_DOUBLE field that will be passed in an integer register,
  it will require special handling in Lowering and LSRA, as is currently done in the TYP_SIMD8 case.
* For other cases, pass as GT_LIST of GT_IND (DONE)
* The GT_LIST would be created in fgMorphArgs(). Then in Lower, putarg_reg nodes will be inserted between
  the GT_LIST and the list item (GT_LCL_VAR or GT_IND). (DONE)
* Add support for HFAs.

### Dependent work items:

7. *(Depends on 1 & 2): Fully enregister TYP_STRUCT[1|2|4|8] with no field accesses.

8. *(Depends on 1 & 2): Enable value numbering and assertion propagation for struct types.

9. (Depends on 1 & 2, mostly to avoid conflicts): Add support in prolog to extract fields, and
   remove the restriction of not promoting incoming reg structs that have more than one field.
   Note that SIMD types are already reassembled in the prolog.

10. (Not really dependent, but probably best done after 1, 2, 5, 6): Add support for assembling
    non-matching fields into registers for call args and returns. This includes producing the
    appropriate IR, which may simply be shifts and or's of the appropriate fields.
    This would either be done during `fgMorphArgs()` and the `GT_RETURN` case of `fgMorphSmpOp()`
    or as described below in
    [Extracting and Assembling Structs](#Extract-Assemble).

11. (Not really dependent, but probably best done after 1, 2, 5, 6): Add support for extracting the fields for the
    returned struct value of a call, producing the appropriate IR, which may simply be shifts and
    masks.
    This would either be done during the morphing of the call itself, or as described below in
    [Extracting and Assembling Structs](#Extract-Assemble).

12. (Depends on 3, may replace the second part of 6): For arm64, add support for loading non-promoted
    or non-local structs with ldp
    * Either using TYP_STRUCT and special case handling, OR adding TYP_STRUCT16

13. (Depends on 7, 9, 10, 11): Enhance struct promotion to allow full enregistration of structs,
    even if some fields are accessed, if there are more call/return references than field references.
    This work item should address issue #1161, by removing the automatic non-promotion
    of structs with a single double field, and adding appropriate heuristics for when it
    is profitable to promote them.

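
Items 10 and 11 mention assembling fields into a register with shifts and or's, and extracting them again. The sketch below shows that bit manipulation on plain integers; `FieldMask`, `InsertField` and `ExtractField` are hypothetical helpers, and a little-endian layout (field at byte offset N occupies bits 8N and up) is assumed. In the JIT this would be expressed as IR nodes rather than as C++ functions.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low 'bitWidth' bits.
inline uint64_t FieldMask(unsigned bitWidth)
{
    assert(bitWidth >= 1 && bitWidth <= 64);
    return (bitWidth == 64) ? ~uint64_t{0} : ((uint64_t{1} << bitWidth) - 1);
}

// Assemble: place 'value' into 'reg' at 'bitOffset' (a shift and an or,
// after clearing whatever was previously in those bits).
inline uint64_t InsertField(uint64_t reg, uint64_t value, unsigned bitOffset, unsigned bitWidth)
{
    assert(bitOffset + bitWidth <= 64);
    const uint64_t mask = FieldMask(bitWidth);
    return (reg & ~(mask << bitOffset)) | ((value & mask) << bitOffset);
}

// Extract: recover the field at 'bitOffset' (a shift and a mask).
inline uint64_t ExtractField(uint64_t reg, unsigned bitOffset, unsigned bitWidth)
{
    assert(bitOffset + bitWidth <= 64);
    return (reg >> bitOffset) & FieldMask(bitWidth);
}
```

For a struct like `foo` with four byte fields, four `InsertField` calls at bit offsets 0, 8, 16 and 24 build the value that would be passed or returned in a single register.
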
These changes are somewhat orthogonal, though will likely have merge issues if done in parallel with any of
the above:
* Unified API for ABI info
  * Num regs used for passing
  * Per-slot location (reg num / REG_STK)
  * Per-slot type (for reg “slots”)
  * Starting stack slot offset (if passed on stack)
  * We should be able to unify HFA handling into this model
  * For arg passing, the API for creating the argEntry should take an arg state that keeps track of
    what regs have been used, and handles the backfilling case for ARM

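
One possible shape for the per-argument information listed above is sketched here. Every name in it (`AbiSlot`, `StructPassingInfo`, and the value chosen for `REG_STK`) is an assumption made for illustration; it is not a proposal for the actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "this slot is passed on the stack".
constexpr int REG_STK = -1;

enum class SlotType { Int, Float };

// Per-slot location and type.
struct AbiSlot
{
    int      reg;  // register number, or REG_STK
    SlotType type; // register class; meaningful for register slots
};

// ABI description of how one struct argument (or return value) is passed.
struct StructPassingInfo
{
    std::vector<AbiSlot> slots;
    unsigned             stackSlotOffset = 0; // starting stack slot, if any slot is on the stack

    // Number of registers used for passing.
    unsigned NumRegsUsed() const
    {
        unsigned count = 0;
        for (const AbiSlot& s : slots)
        {
            if (s.reg != REG_STK)
            {
                ++count;
            }
        }
        return count;
    }

    // True if at least one slot lives on the stack.
    bool UsesStack() const { return NumRegsUsed() < slots.size(); }

    const AbiSlot& Slot(std::size_t i) const
    {
        assert(i < slots.size());
        return slots[i];
    }
};
```

A split argument (first half in a register, second half on the stack) would then be two slots, one with a register number and one with `REG_STK`.
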

Open Design Questions
---------------------
### <a name="Extract-Assemble"/>Extracting and Assembling Structs

Should the IR for extracting and assembling struct arguments from or to argument or return registers
be generated directly during the morphing of call arguments and returns, or should this capability
be handled in a more general fashion in `fgMorphCopyBlock()`?
The latter seems desirable for its general applicability.

One way to handle this might be:

1. Whenever you have a case of mismatched structs (call args, call node, or return node),
   create a promoted temp of the "fake struct type", e.g. for arm you would introduce three
   new temps: one for the struct, and one for each of its TYP_LONG promoted fields.
2. Add an assignment to or from the temp (e.g. as a setup arg node), BUT the structs on
   both sides of that assignment can now be promoted.
3. Add code to `fgMorphCopyBlock()` to handle the extraction and assembling of structs.
4. The promoted fields of the temp would be preferenced to the appropriate argument or return registers.