Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem).
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).
E.g. ld1(3 registers version) will load 32-bit elements {A, B, C, D, E, F} sequentially into the three 64-bit vectors list {BA, DC, FE}.
E.g. ld3 will load 32-bit elements {A, B, C, D, E, F} into the three 64-bit vectors list {DA, EB, FC}.
llvm-svn: 192351