[mono] Implement AdvSimd (#49260)
This change adds AdvSimd and AdvSimd.Arm64 support to LLVM-enabled Mono.
Most aarch64 LLVM intrinsic functions are overloaded and have names determined
by an invariant base string prepended to a string representation of one or two
type parameters. Intrinsic functions used by an LLVM module must have a
declaration somewhere in memory when JITting or somewhere in the output bitcode
file when AOTing. Currently Mono maintains a hash table that maps internal
intrinsic IDs to LLVM intrinsic declarations. These IDs have been extended: a
simplified type representation is added to the key's upper bits. This
representation is not especially compact, and currently uses 9 bits to label 18
states, but it's easy to look at in a debugger. (A simple base-18 encoding
could encode three parameters in 13 bits.)
These overload-tagged IDs can be passed to
`OP_XOP_OVR{_,_SCALAR,_BYSCALAR}X_{X,X_X,X_X_X}`. The return type of the
intrinsic that generates these mini ops is used to derive the overload tag to
find the corresponding LLVM intrinsic function declaration.
`MonoLLVMModule::intrins_by_id` is removed, because LLVM intrinsic lookup keys
are no longer small contiguous integers. It only seemed to serve as a lookup
table for data already contained in a hash table.
The corresponding instructions for some of these .NET-level intrinsics take
immediate parameters. For some of these instructions, the LLVM IR code that
selects these immediate-argument instructions can emit a fallback for
non-constant parameters, either by using an equivalent instruction with a
register operand or by using a longer and less-efficient instruction sequence.
For the rest, a branching code sequence is emitted. Helper functions
(`immediate_unroll_begin` etc.) are added to make this a little less
repetitious.
Some operations take an immediate operand denoting a lane to select in a vector
before proceeding with another generic vector or scalar operation. These are
decomposed into a sequence of `OP_ARM64_SELECT_SCALAR` followed by the
non-lane-specific operation. LLVM can still optimize this to the lane-selecting
instruction when possible, and can generate fallback code for non-immediate
lane selection.
The tables describing the intrinsics supported by the runtime are extended to
support intrinsics with different target instructions for signed, unsigned and
floating point parameters. Whenever possible, .NET-level intrinsics that
correspond to a single LLVM intrinsic function are stored as a single entry in
these tables. Unfortunately many intrinsics need to be translated into a
sequence of LLVM IR operations; for these, new mini IR opcodes are added to
select the LLVM IR builder code that should run.