Correctly pass 12-byte GT_SIMD nodes on x86.
During lowering, all TYP_SIMD12 GT_SIMD nodes are retyped to TYP_SIMD16.
This is correct unless the GT_SIMD node is passed by value on x86, in
which case the retyping causes us to push 16 bytes rather than 12. This
change detects a retyped GT_SIMD node in LowerArg by checking the value
of gtSIMDSize and uses TYP_SIMD12 for the putarg instead of TYP_SIMD16.
Apart from how retyped nodes are detected, this is the same
transformation we already do when passing TYP_SIMD12 lclVars that are
being used as TYP_SIMD16 on x86.
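
The check described above can be sketched roughly as follows. This is a
simplified, self-contained model and not the actual JIT code: SimdNode,
RetypeForLowering, PutArgType, and SizeInBytes are hypothetical names
invented for illustration, and simdSize only stands in for gtSIMDSize.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's types (assumption:
// the real definitions carry far more state than this).
enum VarType { TYP_SIMD12, TYP_SIMD16 };

struct SimdNode
{
    VarType  type;     // node type; may be widened during lowering
    unsigned simdSize; // original vector size in bytes, modeled on gtSIMDSize
};

// Models the retyping done during lowering: 12-byte nodes become 16-byte.
inline void RetypeForLowering(SimdNode& node)
{
    if (node.type == TYP_SIMD12)
    {
        node.type = TYP_SIMD16;
    }
}

// Picks the type to use for the putarg when the node is passed by value
// on the x86 stack. A node typed TYP_SIMD16 whose recorded size is still
// 12 bytes was retyped, so the narrower type is used for the push.
inline VarType PutArgType(const SimdNode& node)
{
    assert(node.simdSize == 12 || node.simdSize == 16);
    if (node.type == TYP_SIMD16 && node.simdSize == 12)
    {
        return TYP_SIMD12;
    }
    return node.type;
}

// Number of bytes pushed for a given putarg type.
inline unsigned SizeInBytes(VarType type)
{
    return (type == TYP_SIMD12) ? 12 : 16;
}
```

In this model the recorded size is left untouched by the retyping, which
is what lets the argument-lowering step recover the original 12-byte
width after the node's type has been widened.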
Fixes https://github.com/dotnet/corefx/issues/15913.
Commit migrated from https://github.com/dotnet/coreclr/commit/57c83fb63d8d8f442f9c7d0c93cb4f722eead545