[mono][interp] Add intrinsics for common Vector128 operations (#81782)
* [mono][interp] Move defines from transform.c
* [mono][interp] Move more defines from transform.c
* [mono][interp] Add intrinsics for most common V128 operations
We add interpreter intrinsics for the Vector128 operations that are actively used within our BCL.
We declare a set of SIMD method names in `simd-methods.def`, the same way we do for the JIT. In `transform-simd.c` we look up method names in the list of supported intrinsics for `Vector128` and `Vector128<T>`. Once we find a supported intrinsic, we generate code for it, typically a `MINT_SIMD_INTRINS_*` opcode. To avoid adding too many new opcodes to the interpreter, SIMD intrinsics are grouped by signature. For example, all SIMD intrinsics that receive a single argument and return a value are called through `MINT_SIMD_INTRINS_P_P`. This instruction receives an index that selects the intrinsic implementation, which it then calls indirectly.
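A minimal sketch of the signature-grouped dispatch described above. The `v128` struct, the table, the enum values and the helper names are all hypothetical and only illustrate the idea of one opcode per signature plus an index into a function table; they are not the names used in the interpreter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value type; the interpreter's real type may differ. */
typedef struct { uint8_t bytes[16]; } v128;

/* One signature group: a single vector argument in, a vector result out. */
typedef void (*simd_p_p_fn)(v128 *ret, const v128 *arg);

/* Negate four 32-bit lanes; unsigned arithmetic avoids signed overflow. */
static void negate_u32(v128 *ret, const v128 *arg) {
    uint32_t lanes[4];
    memcpy(lanes, arg, sizeof lanes);
    for (int i = 0; i < 4; i++)
        lanes[i] = 0u - lanes[i];
    memcpy(ret, lanes, sizeof lanes);
}

/* Bitwise complement of all 16 bytes. */
static void ones_complement(v128 *ret, const v128 *arg) {
    for (int i = 0; i < 16; i++)
        ret->bytes[i] = (uint8_t)~arg->bytes[i];
}

/* Hypothetical indices, kept in the same order as the table below. */
enum { SIMD_P_P_NEGATE_U32, SIMD_P_P_ONES_COMPLEMENT, SIMD_P_P_COUNT };

static const simd_p_p_fn simd_p_p_table[SIMD_P_P_COUNT] = {
    negate_u32,
    ones_complement,
};

/* What a single "P_P" opcode handler could do: pick the implementation
 * by index and call it indirectly. */
static void exec_simd_intrins_p_p(unsigned index, v128 *ret, const v128 *arg) {
    assert(index < SIMD_P_P_COUNT);
    simd_p_p_table[index](ret, arg);
}
```

With this shape, adding another one-argument intrinsic means adding a table entry rather than a new opcode.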
Some of the intrinsics are implemented using the standard vector extensions supported by gcc and clang. These do not fully expose the SIMD capabilities of the hardware, so some intrinsics are implemented naively. This should still be faster than using a non-vectorized approach from managed code. In the future we can add better implementations on platforms where we have low-level support. This would both be faster and reduce code size.
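As an illustration of the two styles mentioned above, the sketch below implements one operation with the gcc/clang `vector_size` extension (with a scalar fallback for other compilers) and one operation naively, lane by lane. The `v128` struct and both function names are assumptions made for this example, not the actual interpreter code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value type used only for this sketch. */
typedef struct { uint8_t bytes[16]; } v128;

#if defined(__GNUC__)
/* gcc/clang vector extension: four 32-bit unsigned lanes in 16 bytes. */
typedef uint32_t v128_u4 __attribute__((vector_size(16)));
#endif

/* Lane-wise addition: expressed directly with the vector extension when
 * available, so the compiler can pick a SIMD instruction for it. */
static void add_u32(v128 *ret, const v128 *a, const v128 *b) {
#if defined(__GNUC__)
    v128_u4 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;
    memcpy(ret, &vr, sizeof vr);
#else
    uint32_t la[4], lb[4];
    memcpy(la, a, sizeof la);
    memcpy(lb, b, sizeof lb);
    for (int i = 0; i < 4; i++)
        la[i] += lb[i];
    memcpy(ret, la, sizeof la);
#endif
}

/* Naive implementation: "is any lane equal?" has no direct operator in
 * the generic vector extension, so it is written as a plain loop. */
static int equals_any_u32(const v128 *a, const v128 *b) {
    uint32_t la[4], lb[4];
    memcpy(la, a, sizeof la);
    memcpy(lb, b, sizeof lb);
    for (int i = 0; i < 4; i++)
        if (la[i] == lb[i])
            return 1;
    return 0;
}
```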
* [mono][interp] Add option to disable simd intrinsics
* [mono][interp] Disable simd intrinsics by default on wasm
These intrinsics are not yet implemented in the jiterpreter, so enabling them makes execution slightly slower instead of faster.
* [mono][interp] Replace v128_create with v128_ldc if possible
`v128_create` receives every single element of the vector as an argument. This method is typically used with constants. For a `Vector128<short>` this means that creating a constant vector requires 8 `ldc.i4` instructions and a `v128_create`. We can instead use a single instruction and embed the vector value directly in the code stream.
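A rough sketch of the idea, assuming a code stream of 16-bit units and a byte-addressed locals area. The opcode value, the instruction layout and the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode value for the constant-load instruction. */
enum { OP_V128_LDC = 1 };

/* Emit [opcode][dest offset][16 bytes of payload] into the code stream.
 * Returns the number of 16-bit code units written. */
static size_t emit_v128_ldc(uint16_t *code, uint16_t dest, const void *value16) {
    code[0] = OP_V128_LDC;
    code[1] = dest;
    memcpy(&code[2], value16, 16);          /* constant lives inline */
    return 2 + 16 / sizeof(uint16_t);
}

/* Execute the instruction: copy the inline payload into the destination
 * local and return a pointer to the next instruction. */
static const uint16_t *exec_v128_ldc(const uint16_t *ip, uint8_t *locals) {
    assert(ip[0] == OP_V128_LDC);
    memcpy(locals + ip[1], &ip[2], 16);
    return ip + 2 + 16 / sizeof(uint16_t);
}
```

Under this layout a constant `Vector128<short>` costs one instruction of 10 code units, instead of eight separate constant loads followed by a create call.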
* [mono][interp] Remove op_Division
It is not actually used in the BCL, it is not really vectorized on any platform, and the codegen for the interp implementation is massive and inefficient.
* [mono][interp] Don't emit intrinsics for unsupported vector types
* [mono][interp] Vector extensions used in these intrinsics are a GNUC extension
13 files changed: