RyuJIT/x86: Implement TYP_SIMD12 support
There is no native load/store instruction for Vector3/TYP_SIMD12,
so we need to break this type down into two loads or two stores,
with an additional instruction to put the values together in the
xmm target register. AMD64 SIMD support already implements most of
this. For RyuJIT/x86, we need to implement stack argument support
(both incoming and outgoing), which is different from the AMD64 ABI.
In addition, this change implements accurate alignment-sensitive
codegen for all SIMD types. For RyuJIT/x86, the stack is only 4
byte aligned (unless we have double alignment), so SIMD locals are
not known to be aligned (TYP_SIMD8 could be with double alignment).
For AMD64, we were unnecessarily pessimizing alignment information,
and were always generating unaligned moves when on AVX2 hardware.
Now, all SIMD types are given their preferred alignment in
getSIMDTypeAlignment() and alignment determination in
isSIMDTypeLocalAligned() takes into account stack alignment (it
still needs support for x86 dynamic alignment). X86 still needs to
consider dynamic stack alignment for SIMD locals.
Fixes dotnet/coreclr#7863
Commit migrated from https://github.com/dotnet/coreclr/commit/
e401df83de9a4f135e71ca2ab06eff19c112a881