Use a smaller expansion of `GT_INDEX` in MinOpts.
We currently expand `GT_INDEX` nodes during morph into an explicit
bounds check followed by a load. For example, this tree:
```
[000059] ------------ /--* LCL_VAR int V09 loc6
[000060] R--XG------- /--* INDEX ref
[000058] ------------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG------- * ASG ref
[000061] D------N---- \--* LCL_VAR ref V10 loc7
```
is expanded into this tree:
```
[000060] R--XG+------ /--* IND ref
[000491] -----+------ | | /--* CNS_INT long 16 Fseq[#FirstElem]
[000492] -----+------ | \--* ADD byref
[000488] -----+-N---- | | /--* CNS_INT long 3
[000489] -----+------ | | /--* LSH long
[000487] -----+------ | | | \--* CAST long <- int
[000484] i----+------ | | | \--* LCL_VAR int V09 loc6
[000490] -----+------ | \--* ADD byref
[000483] -----+------ | \--* LCL_VAR ref V00 arg0
[000493] ---XG+------ /--* COMMA ref
[000486] ---X-+------ | \--* ARR_BOUNDS_CHECK_Rng void
[000059] -----+------ | +--* LCL_VAR int V09 loc6
[000485] ---X-+------ | \--* ARR_LENGTH int
[000058] -----+------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG+------ * ASG ref
[000061] D----+-N---- \--* LCL_VAR ref V10 loc7
```
Even in this simple case where both the array object and the index are
lclVars, this represents a rather large increase in the size of the IR.
In the worst case, the JIT introduces and additional lclVar for both the
array object and the index, adding several additional nodes to the tree.
When optimizing, exposing the structure of the array access may be
helpful, as it may allow the compiler to better analyze the program.
When we are not optimizing, however, the expansion serves little purpose
besides constraining the IR shapes that must be handled by the backend.
Due to its need for lclVars in the worst case, this expansion may even
bloat the size of the generated code, as all lclVar references are
generated as loads/stores from/to the stack when we are not optimizing.
In the case above, the expanded tree generates the following x64
assembly:
```
IN0018: 000092 mov rdi, gword ptr [V00 rbp-10H]
IN0019: 000096 mov edi, dword ptr [rdi+8]
IN001a: 000099 cmp dword ptr [V09 rbp-48H], edi
IN001b: 00009C jae G_M5106_IG38
IN001c: 0000A2 mov rdi, gword ptr [V00 rbp-10H]
IN001d: 0000A6 mov esi, dword ptr [V09 rbp-48H]
IN001e: 0000A9 movsxd rsi, esi
IN001f: 0000AC mov rdi, gword ptr [rdi+8*rsi+16]
IN0020: 0000B1 mov gword ptr [V10 rbp-50H], rdi
```
Inspired by other recent experiments (e.g. #13188), this change
introduces a new node that replaces the above expansion in MinOpts. This
node, `GT_INDEX_ADDR`, represents the bounds check and address
computation involved in an array access, and returns the address of the
element that is to be loaded or stored. Using this node, the example
tree given above expands to the following:
```
[000489] a--XG+------ /--* IND ref
[000059] -----+------ | | /--* LCL_VAR int V09 loc6
[000060] R--XG+--R--- | \--* INDEX_ADDR byref
[000058] -----+------ | \--* LCL_VAR ref V00 arg0
[000062] -A-XG+------ * ASG ref
[000061] D----+-N---- \--* LCL_VAR ref V10 loc7
```
This expansion requires only the addition of the `GT_IND` node that
represents the memory access itself. This savings in IR size translates
to about a 2% decrease in instructions retired during non-optimizing
compilation. Furthermore, this expansion tends to generate smaller
code; for example, the tree given above is generated in 29 rather than
35 bytes:
```
IN0018: 000092 mov edi, dword ptr [V09 rbp-48H]
IN0019: 000095 mov rsi, gword ptr [V00 rbp-10H]
IN001a: 000099 cmp rdi, qword ptr [rsi+8]
IN001b: 00009D jae G_M5106_IG38
IN001c: 0000A3 lea rsi, bword ptr [rsi+8*rdi+16]
IN001d: 0000A8 mov rdi, gword ptr [rsi]
IN001e: 0000AB mov gword ptr [V10 rbp-50H], rdi
```
13 files changed: