JIT: Handle half accesses for SIMDs in local morph (#89520)
While it's not generally expected that halves can be accessed directly (ending
up with LCL_FLD), it sometimes happens in some of the SW implementations of
Vector256/Vector512 methods. In rare cases the JIT still falls back to these
even when there is HW acceleration. In those cases we want to avoid DNER'ing the
involved locals, so expand the existing recognition with these patterns.
Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a
SIMD16.
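
As a rough illustration of what "half access" means here, the following is a minimal standalone C++ sketch, not the JIT's actual code. The names `HalfAccess` and `classifyHalfAccess` are hypothetical and exist only for this example. It assumes a half access is a field access that covers exactly the lower or upper half of a 32-byte (Vector256) or 64-byte (Vector512) local.

```cpp
#include <optional>

// Hypothetical classification of a field access on a SIMD local.
enum class HalfAccess
{
    Lower,
    Upper
};

// Returns which half of a SIMD local of `localSize` bytes is covered by an
// access of `accessSize` bytes at byte `offset`, or nullopt when the access
// is not exactly one half. This is an assumption-laden sketch; the real
// recognition works on JIT IR nodes and types rather than raw sizes.
std::optional<HalfAccess> classifyHalfAccess(unsigned localSize, unsigned offset, unsigned accessSize)
{
    // Only 32-byte and 64-byte locals have halves that are themselves SIMD sized.
    if ((localSize != 32) && (localSize != 64))
    {
        return std::nullopt;
    }

    unsigned halfSize = localSize / 2;
    if (accessSize != halfSize)
    {
        return std::nullopt;
    }

    if (offset == 0)
    {
        return HalfAccess::Lower;
    }

    if (offset == halfSize)
    {
        return HalfAccess::Upper;
    }

    // A misaligned access straddles both halves and is not recognized.
    return std::nullopt;
}
```

An access that matches can conceptually be rewritten as a get/set of the lower or upper half of the whole local, so the local does not need to be address exposed or marked DNER; anything that does not match keeps its existing handling.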
Fix #85359
Fix #89456
Some size-wise regressions are expected, which come down to:
- A large number of similar-looking tests now end up enregistering some locals,
which requires new upper half saves/restores. This accounts for
most of the size-wise regressions.
- The expansions often do not result in smaller code because loading/storing the
halves directly from/to the stack takes less code than the vector equivalent
with extraction/insertion.
Many of the regressions are in SW implementations of Vector256/Vector512 methods
that we usually do not expect to see called with HW acceleration supported.