From: Jonathan Wright
Date: Mon, 9 Aug 2021 14:26:48 +0000 (+0100)
Subject: aarch64: Add machine modes for Neon vector-tuple types
X-Git-Tag: upstream/12.2.0~3787
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=66f206b85395c273980e2b81a54dbddc4897e4a7;p=platform%2Fupstream%2Fgcc.git

aarch64: Add machine modes for Neon vector-tuple types

Until now, GCC has used large integer machine modes (OI, CI and XI) to
model Neon vector-tuple types. This is suboptimal for many reasons, the
most notable being:

1) Large integer modes are opaque and modifying one vector in the tuple
   requires a lot of inefficient set/get gymnastics. The result is a
   lot of superfluous move instructions.

2) Large integer modes do not map well to types that are tuples of
   64-bit vectors - we need additional zero-padding, which again
   results in superfluous move instructions.

This patch adds new machine modes that better model the C-level Neon
vector-tuple types. The approach is somewhat similar to that already
used for SVE vector-tuple types. All of the AArch64 backend patterns
and builtins that manipulate Neon vector tuples are updated to use the
new machine modes, which significantly reduces the amount of
boilerplate code in the arm_neon.h header.

While this patch improves the quality of the generated code in many
instances, there is still room for significant improvement, which will
be attempted in subsequent patches.

gcc/ChangeLog:

2021-08-09  Jonathan Wright
	    Richard Sandiford

	* config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define.
	(v2x4hi_UP): Likewise.
	(v2x4hf_UP): Likewise.
	(v2x4bf_UP): Likewise.
	(v2x2si_UP): Likewise.
	(v2x2sf_UP): Likewise.
	(v2x1di_UP): Likewise.
	(v2x1df_UP): Likewise.
	(v2x16qi_UP): Likewise.
	(v2x8hi_UP): Likewise.
	(v2x8hf_UP): Likewise.
	(v2x8bf_UP): Likewise.
	(v2x4si_UP): Likewise.
	(v2x4sf_UP): Likewise.
	(v2x2di_UP): Likewise.
	(v2x2df_UP): Likewise.
	(v3x8qi_UP): Likewise.
	(v3x4hi_UP): Likewise.
	(v3x4hf_UP): Likewise.
	(v3x4bf_UP): Likewise.
	(v3x2si_UP): Likewise.
	(v3x2sf_UP): Likewise.
	(v3x1di_UP): Likewise.
	(v3x1df_UP): Likewise.
	(v3x16qi_UP): Likewise.
	(v3x8hi_UP): Likewise.
	(v3x8hf_UP): Likewise.
	(v3x8bf_UP): Likewise.
	(v3x4si_UP): Likewise.
	(v3x4sf_UP): Likewise.
	(v3x2di_UP): Likewise.
	(v3x2df_UP): Likewise.
	(v4x8qi_UP): Likewise.
	(v4x4hi_UP): Likewise.
	(v4x4hf_UP): Likewise.
	(v4x4bf_UP): Likewise.
	(v4x2si_UP): Likewise.
	(v4x2sf_UP): Likewise.
	(v4x1di_UP): Likewise.
	(v4x1df_UP): Likewise.
	(v4x16qi_UP): Likewise.
	(v4x8hi_UP): Likewise.
	(v4x8hf_UP): Likewise.
	(v4x8bf_UP): Likewise.
	(v4x4si_UP): Likewise.
	(v4x4sf_UP): Likewise.
	(v4x2di_UP): Likewise.
	(v4x2df_UP): Likewise.
	(TYPES_GETREGP): Delete.
	(TYPES_SETREGP): Likewise.
	(TYPES_LOADSTRUCT_U): Define.
	(TYPES_LOADSTRUCT_P): Likewise.
	(TYPES_LOADSTRUCT_LANE_U): Likewise.
	(TYPES_LOADSTRUCT_LANE_P): Likewise.
	(TYPES_STORE1P): Move for consistency.
	(TYPES_STORESTRUCT_U): Define.
	(TYPES_STORESTRUCT_P): Likewise.
	(TYPES_STORESTRUCT_LANE_U): Likewise.
	(TYPES_STORESTRUCT_LANE_P): Likewise.
	(aarch64_simd_tuple_types): Define.
	(aarch64_lookup_simd_builtin_type): Handle tuple type lookup.
	(aarch64_init_simd_builtin_functions): Update frontend lookup
	for builtin functions after handling arm_neon.h pragma.
	(register_tuple_type): Manually set modes of single-integer
	tuple types.  Record tuple types.
	* config/aarch64/aarch64-modes.def
	(ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes.
	(ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes.
	(SVE_MODES): Give single-vector modes priority over
	vector-tuple modes.
(VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to be after all single-vector modes. * config/aarch64/aarch64-simd-builtins.def: Update builtin generator macros to reflect modifications to the backend patterns. * config/aarch64/aarch64-simd.md (aarch64_simd_ld2): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld2): This. (aarch64_simd_ld2r): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld2r): This. (aarch64_vec_load_lanesoi_lane): Use vector-tuple mode iterator and rename to... (aarch64_vec_load_lanes_lane): This. (vec_load_lanesoi): Use vector-tuple mode iterator and rename to... (vec_load_lanes): This. (aarch64_simd_st2): Use vector-tuple mode iterator and rename to... (aarch64_simd_st2): This. (aarch64_vec_store_lanesoi_lane): Use vector-tuple mode iterator and rename to... (aarch64_vec_store_lanes_lane): This. (vec_store_lanesoi): Use vector-tuple mode iterator and rename to... (vec_store_lanes): This. (aarch64_simd_ld3): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld3): This. (aarch64_simd_ld3r): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld3r): This. (aarch64_vec_load_lanesci_lane): Use vector-tuple mode iterator and rename to... (vec_load_lanesci): This. (aarch64_simd_st3): Use vector-tuple mode iterator and rename to... (aarch64_simd_st3): This. (aarch64_vec_store_lanesci_lane): Use vector-tuple mode iterator and rename to... (vec_store_lanesci): This. (aarch64_simd_ld4): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld4): This. (aarch64_simd_ld4r): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld4r): This. (aarch64_vec_load_lanesxi_lane): Use vector-tuple mode iterator and rename to... (vec_load_lanesxi): This. (aarch64_simd_st4): Use vector-tuple mode iterator and rename to... (aarch64_simd_st4): This. (aarch64_vec_store_lanesxi_lane): Use vector-tuple mode iterator and rename to... (vec_store_lanesxi): This. (mov): Define for Neon vector-tuple modes. (aarch64_ld1x3): Use vector-tuple mode iterator and rename to... (aarch64_ld1x3): This. (aarch64_ld1_x3_): Use vector-tuple mode iterator and rename to... (aarch64_ld1_x3_): This. (aarch64_ld1x4): Use vector-tuple mode iterator and rename to... (aarch64_ld1x4): This. (aarch64_ld1_x4_): Use vector-tuple mode iterator and rename to... (aarch64_ld1_x4_): This. (aarch64_st1x2): Use vector-tuple mode iterator and rename to... (aarch64_st1x2): This. (aarch64_st1_x2_): Use vector-tuple mode iterator and rename to... (aarch64_st1_x2_): This. (aarch64_st1x3): Use vector-tuple mode iterator and rename to... (aarch64_st1x3): This. (aarch64_st1_x3_): Use vector-tuple mode iterator and rename to... (aarch64_st1_x3_): This. (aarch64_st1x4): Use vector-tuple mode iterator and rename to... (aarch64_st1x4): This. (aarch64_st1_x4_): Use vector-tuple mode iterator and rename to... (aarch64_st1_x4_): This. (*aarch64_mov): Define for vector-tuple modes. (*aarch64_be_mov): Likewise. (aarch64_ldr): Use vector-tuple mode iterator and rename to... (aarch64_ldr): This. (aarch64_ld2_dreg): Use vector-tuple mode iterator and rename to... (aarch64_ld2_dreg): This. (aarch64_ld3_dreg): Use vector-tuple mode iterator and rename to... (aarch64_ld3_dreg): This. (aarch64_ld4_dreg): Use vector-tuple mode iterator and rename to... (aarch64_ld4_dreg): This. (aarch64_ld): Use vector-tuple mode iterator and rename to... (aarch64_ld): Use vector-tuple mode iterator and rename to... (aarch64_ld): Use vector-tuple mode (aarch64_ld1x2): Delete. 
(aarch64_ld1x2): Use vector-tuple mode iterator and rename to... (aarch64_ld1x2): This. (aarch64_ld_lane): Use vector- tuple mode iterator and rename to... (aarch64_ld_lane): This. (aarch64_get_dreg): Delete. (aarch64_get_qreg): Likewise. (aarch64_st2_dreg): Use vector-tuple mode iterator and rename to... (aarch64_st2_dreg): This. (aarch64_st3_dreg): Use vector-tuple mode iterator and rename to... (aarch64_st3_dreg): This. (aarch64_st4_dreg): Use vector-tuple mode iterator and rename to... (aarch64_st4_dreg): This. (aarch64_st): Use vector-tuple mode iterator and rename to... (aarch64_st): This. (aarch64_st): Use vector-tuple mode iterator and rename to aarch64_st. (aarch64_st_lane): Use vector- tuple mode iterator and rename to... (aarch64_st_lane): This. (aarch64_set_qreg): Delete. (aarch64_simd_ld1_x2): Use vector-tuple mode iterator and rename to... (aarch64_simd_ld1_x2): This. * config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p): Refactor to include new vector-tuple modes. (aarch64_classify_vector_mode): Add cases for new vector- tuple modes. (aarch64_advsimd_partial_struct_mode_p): Define. (aarch64_advsimd_full_struct_mode_p): Likewise. (aarch64_advsimd_vector_array_mode): Likewise. (aarch64_sve_data_mode): Change location in file. (aarch64_array_mode): Handle case of Neon vector-tuple modes. (aarch64_hard_regno_nregs): Handle case of partial Neon vector structures. (aarch64_classify_address): Refactor to include handling of Neon vector-tuple modes. (aarch64_print_operand): Print "d" for "%R" for a partial Neon vector structure. (aarch64_expand_vec_perm_1): Use new vector-tuple mode. (aarch64_modes_tieable_p): Prevent tieing Neon partial struct modes with scalar machines modes larger than 8 bytes. (aarch64_can_change_mode_class): Don't allow changes between partial and full Neon vector-structure modes. * config/aarch64/arm_neon.h (vst2_lane_f16): Use updated builtin and remove boiler-plate code for opaque mode. (vst2_lane_f32): Likewise. (vst2_lane_f64): Likewise. (vst2_lane_p8): Likewise. (vst2_lane_p16): Likewise. (vst2_lane_p64): Likewise. (vst2_lane_s8): Likewise. (vst2_lane_s16): Likewise. (vst2_lane_s32): Likewise. (vst2_lane_s64): Likewise. (vst2_lane_u8): Likewise. (vst2_lane_u16): Likewise. (vst2_lane_u32): Likewise. (vst2_lane_u64): Likewise. (vst2q_lane_f16): Likewise. (vst2q_lane_f32): Likewise. (vst2q_lane_f64): Likewise. (vst2q_lane_p8): Likewise. (vst2q_lane_p16): Likewise. (vst2q_lane_p64): Likewise. (vst2q_lane_s8): Likewise. (vst2q_lane_s16): Likewise. (vst2q_lane_s32): Likewise. (vst2q_lane_s64): Likewise. (vst2q_lane_u8): Likewise. (vst2q_lane_u16): Likewise. (vst2q_lane_u32): Likewise. (vst2q_lane_u64): Likewise. (vst3_lane_f16): Likewise. (vst3_lane_f32): Likewise. (vst3_lane_f64): Likewise. (vst3_lane_p8): Likewise. (vst3_lane_p16): Likewise. (vst3_lane_p64): Likewise. (vst3_lane_s8): Likewise. (vst3_lane_s16): Likewise. (vst3_lane_s32): Likewise. (vst3_lane_s64): Likewise. (vst3_lane_u8): Likewise. (vst3_lane_u16): Likewise. (vst3_lane_u32): Likewise. (vst3_lane_u64): Likewise. (vst3q_lane_f16): Likewise. (vst3q_lane_f32): Likewise. (vst3q_lane_f64): Likewise. (vst3q_lane_p8): Likewise. (vst3q_lane_p16): Likewise. (vst3q_lane_p64): Likewise. (vst3q_lane_s8): Likewise. (vst3q_lane_s16): Likewise. (vst3q_lane_s32): Likewise. (vst3q_lane_s64): Likewise. (vst3q_lane_u8): Likewise. (vst3q_lane_u16): Likewise. (vst3q_lane_u32): Likewise. (vst3q_lane_u64): Likewise. (vst4_lane_f16): Likewise. (vst4_lane_f32): Likewise. (vst4_lane_f64): Likewise. 
(vst4_lane_p8): Likewise. (vst4_lane_p16): Likewise. (vst4_lane_p64): Likewise. (vst4_lane_s8): Likewise. (vst4_lane_s16): Likewise. (vst4_lane_s32): Likewise. (vst4_lane_s64): Likewise. (vst4_lane_u8): Likewise. (vst4_lane_u16): Likewise. (vst4_lane_u32): Likewise. (vst4_lane_u64): Likewise. (vst4q_lane_f16): Likewise. (vst4q_lane_f32): Likewise. (vst4q_lane_f64): Likewise. (vst4q_lane_p8): Likewise. (vst4q_lane_p16): Likewise. (vst4q_lane_p64): Likewise. (vst4q_lane_s8): Likewise. (vst4q_lane_s16): Likewise. (vst4q_lane_s32): Likewise. (vst4q_lane_s64): Likewise. (vst4q_lane_u8): Likewise. (vst4q_lane_u16): Likewise. (vst4q_lane_u32): Likewise. (vst4q_lane_u64): Likewise. (vtbl3_s8): Likewise. (vtbl3_u8): Likewise. (vtbl3_p8): Likewise. (vtbl4_s8): Likewise. (vtbl4_u8): Likewise. (vtbl4_p8): Likewise. (vld1_u8_x3): Likewise. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_f16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vld1_u8_x2): Likewise. (vld1_s8_x2): Likewise. (vld1_u16_x2): Likewise. (vld1_s16_x2): Likewise. (vld1_u32_x2): Likewise. (vld1_s32_x2): Likewise. (vld1_u64_x2): Likewise. (vld1_s64_x2): Likewise. (vld1_f16_x2): Likewise. (vld1_f32_x2): Likewise. (vld1_f64_x2): Likewise. (vld1_p8_x2): Likewise. (vld1_p16_x2): Likewise. (vld1_p64_x2): Likewise. (vld1q_u8_x2): Likewise. (vld1q_s8_x2): Likewise. (vld1q_u16_x2): Likewise. (vld1q_s16_x2): Likewise. (vld1q_u32_x2): Likewise. (vld1q_s32_x2): Likewise. (vld1q_u64_x2): Likewise. (vld1q_s64_x2): Likewise. (vld1q_f16_x2): Likewise. (vld1q_f32_x2): Likewise. (vld1q_f64_x2): Likewise. (vld1q_p8_x2): Likewise. (vld1q_p16_x2): Likewise. (vld1q_p64_x2): Likewise. (vld1_s8_x4): Likewise. (vld1q_s8_x4): Likewise. (vld1_s16_x4): Likewise. (vld1q_s16_x4): Likewise. (vld1_s32_x4): Likewise. (vld1q_s32_x4): Likewise. (vld1_u8_x4): Likewise. (vld1q_u8_x4): Likewise. (vld1_u16_x4): Likewise. (vld1q_u16_x4): Likewise. (vld1_u32_x4): Likewise. (vld1q_u32_x4): Likewise. (vld1_f16_x4): Likewise. (vld1q_f16_x4): Likewise. (vld1_f32_x4): Likewise. (vld1q_f32_x4): Likewise. (vld1_p8_x4): Likewise. (vld1q_p8_x4): Likewise. (vld1_p16_x4): Likewise. (vld1q_p16_x4): Likewise. (vld1_s64_x4): Likewise. (vld1_u64_x4): Likewise. (vld1_p64_x4): Likewise. (vld1q_s64_x4): Likewise. (vld1q_u64_x4): Likewise. (vld1q_p64_x4): Likewise. (vld1_f64_x4): Likewise. (vld1q_f64_x4): Likewise. (vld2_s64): Likewise. (vld2_u64): Likewise. (vld2_f64): Likewise. (vld2_s8): Likewise. (vld2_p8): Likewise. (vld2_p64): Likewise. (vld2_s16): Likewise. (vld2_p16): Likewise. (vld2_s32): Likewise. (vld2_u8): Likewise. (vld2_u16): Likewise. (vld2_u32): Likewise. (vld2_f16): Likewise. (vld2_f32): Likewise. (vld2q_s8): Likewise. (vld2q_p8): Likewise. (vld2q_s16): Likewise. (vld2q_p16): Likewise. (vld2q_p64): Likewise. (vld2q_s32): Likewise. (vld2q_s64): Likewise. (vld2q_u8): Likewise. (vld2q_u16): Likewise. (vld2q_u32): Likewise. (vld2q_u64): Likewise. (vld2q_f16): Likewise. (vld2q_f32): Likewise. 
(vld2q_f64): Likewise. (vld3_s64): Likewise. (vld3_u64): Likewise. (vld3_f64): Likewise. (vld3_s8): Likewise. (vld3_p8): Likewise. (vld3_s16): Likewise. (vld3_p16): Likewise. (vld3_s32): Likewise. (vld3_u8): Likewise. (vld3_u16): Likewise. (vld3_u32): Likewise. (vld3_f16): Likewise. (vld3_f32): Likewise. (vld3_p64): Likewise. (vld3q_s8): Likewise. (vld3q_p8): Likewise. (vld3q_s16): Likewise. (vld3q_p16): Likewise. (vld3q_s32): Likewise. (vld3q_s64): Likewise. (vld3q_u8): Likewise. (vld3q_u16): Likewise. (vld3q_u32): Likewise. (vld3q_u64): Likewise. (vld3q_f16): Likewise. (vld3q_f32): Likewise. (vld3q_f64): Likewise. (vld3q_p64): Likewise. (vld4_s64): Likewise. (vld4_u64): Likewise. (vld4_f64): Likewise. (vld4_s8): Likewise. (vld4_p8): Likewise. (vld4_s16): Likewise. (vld4_p16): Likewise. (vld4_s32): Likewise. (vld4_u8): Likewise. (vld4_u16): Likewise. (vld4_u32): Likewise. (vld4_f16): Likewise. (vld4_f32): Likewise. (vld4_p64): Likewise. (vld4q_s8): Likewise. (vld4q_p8): Likewise. (vld4q_s16): Likewise. (vld4q_p16): Likewise. (vld4q_s32): Likewise. (vld4q_s64): Likewise. (vld4q_u8): Likewise. (vld4q_u16): Likewise. (vld4q_u32): Likewise. (vld4q_u64): Likewise. (vld4q_f16): Likewise. (vld4q_f32): Likewise. (vld4q_f64): Likewise. (vld4q_p64): Likewise. (vld2_dup_s8): Likewise. (vld2_dup_s16): Likewise. (vld2_dup_s32): Likewise. (vld2_dup_f16): Likewise. (vld2_dup_f32): Likewise. (vld2_dup_f64): Likewise. (vld2_dup_u8): Likewise. (vld2_dup_u16): Likewise. (vld2_dup_u32): Likewise. (vld2_dup_p8): Likewise. (vld2_dup_p16): Likewise. (vld2_dup_p64): Likewise. (vld2_dup_s64): Likewise. (vld2_dup_u64): Likewise. (vld2q_dup_s8): Likewise. (vld2q_dup_p8): Likewise. (vld2q_dup_s16): Likewise. (vld2q_dup_p16): Likewise. (vld2q_dup_s32): Likewise. (vld2q_dup_s64): Likewise. (vld2q_dup_u8): Likewise. (vld2q_dup_u16): Likewise. (vld2q_dup_u32): Likewise. (vld2q_dup_u64): Likewise. (vld2q_dup_f16): Likewise. (vld2q_dup_f32): Likewise. (vld2q_dup_f64): Likewise. (vld2q_dup_p64): Likewise. (vld3_dup_s64): Likewise. (vld3_dup_u64): Likewise. (vld3_dup_f64): Likewise. (vld3_dup_s8): Likewise. (vld3_dup_p8): Likewise. (vld3_dup_s16): Likewise. (vld3_dup_p16): Likewise. (vld3_dup_s32): Likewise. (vld3_dup_u8): Likewise. (vld3_dup_u16): Likewise. (vld3_dup_u32): Likewise. (vld3_dup_f16): Likewise. (vld3_dup_f32): Likewise. (vld3_dup_p64): Likewise. (vld3q_dup_s8): Likewise. (vld3q_dup_p8): Likewise. (vld3q_dup_s16): Likewise. (vld3q_dup_p16): Likewise. (vld3q_dup_s32): Likewise. (vld3q_dup_s64): Likewise. (vld3q_dup_u8): Likewise. (vld3q_dup_u16): Likewise. (vld3q_dup_u32): Likewise. (vld3q_dup_u64): Likewise. (vld3q_dup_f16): Likewise. (vld3q_dup_f32): Likewise. (vld3q_dup_f64): Likewise. (vld3q_dup_p64): Likewise. (vld4_dup_s64): Likewise. (vld4_dup_u64): Likewise. (vld4_dup_f64): Likewise. (vld4_dup_s8): Likewise. (vld4_dup_p8): Likewise. (vld4_dup_s16): Likewise. (vld4_dup_p16): Likewise. (vld4_dup_s32): Likewise. (vld4_dup_u8): Likewise. (vld4_dup_u16): Likewise. (vld4_dup_u32): Likewise. (vld4_dup_f16): Likewise. (vld4_dup_f32): Likewise. (vld4_dup_p64): Likewise. (vld4q_dup_s8): Likewise. (vld4q_dup_p8): Likewise. (vld4q_dup_s16): Likewise. (vld4q_dup_p16): Likewise. (vld4q_dup_s32): Likewise. (vld4q_dup_s64): Likewise. (vld4q_dup_u8): Likewise. (vld4q_dup_u16): Likewise. (vld4q_dup_u32): Likewise. (vld4q_dup_u64): Likewise. (vld4q_dup_f16): Likewise. (vld4q_dup_f32): Likewise. (vld4q_dup_f64): Likewise. (vld4q_dup_p64): Likewise. (vld2_lane_u8): Likewise. (vld2_lane_u16): Likewise. 
(vld2_lane_u32): Likewise. (vld2_lane_u64): Likewise. (vld2_lane_s8): Likewise. (vld2_lane_s16): Likewise. (vld2_lane_s32): Likewise. (vld2_lane_s64): Likewise. (vld2_lane_f16): Likewise. (vld2_lane_f32): Likewise. (vld2_lane_f64): Likewise. (vld2_lane_p8): Likewise. (vld2_lane_p16): Likewise. (vld2_lane_p64): Likewise. (vld2q_lane_u8): Likewise. (vld2q_lane_u16): Likewise. (vld2q_lane_u32): Likewise. (vld2q_lane_u64): Likewise. (vld2q_lane_s8): Likewise. (vld2q_lane_s16): Likewise. (vld2q_lane_s32): Likewise. (vld2q_lane_s64): Likewise. (vld2q_lane_f16): Likewise. (vld2q_lane_f32): Likewise. (vld2q_lane_f64): Likewise. (vld2q_lane_p8): Likewise. (vld2q_lane_p16): Likewise. (vld2q_lane_p64): Likewise. (vld3_lane_u8): Likewise. (vld3_lane_u16): Likewise. (vld3_lane_u32): Likewise. (vld3_lane_u64): Likewise. (vld3_lane_s8): Likewise. (vld3_lane_s16): Likewise. (vld3_lane_s32): Likewise. (vld3_lane_s64): Likewise. (vld3_lane_f16): Likewise. (vld3_lane_f32): Likewise. (vld3_lane_f64): Likewise. (vld3_lane_p8): Likewise. (vld3_lane_p16): Likewise. (vld3_lane_p64): Likewise. (vld3q_lane_u8): Likewise. (vld3q_lane_u16): Likewise. (vld3q_lane_u32): Likewise. (vld3q_lane_u64): Likewise. (vld3q_lane_s8): Likewise. (vld3q_lane_s16): Likewise. (vld3q_lane_s32): Likewise. (vld3q_lane_s64): Likewise. (vld3q_lane_f16): Likewise. (vld3q_lane_f32): Likewise. (vld3q_lane_f64): Likewise. (vld3q_lane_p8): Likewise. (vld3q_lane_p16): Likewise. (vld3q_lane_p64): Likewise. (vld4_lane_u8): Likewise. (vld4_lane_u16): Likewise. (vld4_lane_u32): Likewise. (vld4_lane_u64): Likewise. (vld4_lane_s8): Likewise. (vld4_lane_s16): Likewise. (vld4_lane_s32): Likewise. (vld4_lane_s64): Likewise. (vld4_lane_f16): Likewise. (vld4_lane_f32): Likewise. (vld4_lane_f64): Likewise. (vld4_lane_p8): Likewise. (vld4_lane_p16): Likewise. (vld4_lane_p64): Likewise. (vld4q_lane_u8): Likewise. (vld4q_lane_u16): Likewise. (vld4q_lane_u32): Likewise. (vld4q_lane_u64): Likewise. (vld4q_lane_s8): Likewise. (vld4q_lane_s16): Likewise. (vld4q_lane_s32): Likewise. (vld4q_lane_s64): Likewise. (vld4q_lane_f16): Likewise. (vld4q_lane_f32): Likewise. (vld4q_lane_f64): Likewise. (vld4q_lane_p8): Likewise. (vld4q_lane_p16): Likewise. (vld4q_lane_p64): Likewise. (vqtbl2_s8): Likewise. (vqtbl2_u8): Likewise. (vqtbl2_p8): Likewise. (vqtbl2q_s8): Likewise. (vqtbl2q_u8): Likewise. (vqtbl2q_p8): Likewise. (vqtbl3_s8): Likewise. (vqtbl3_u8): Likewise. (vqtbl3_p8): Likewise. (vqtbl3q_s8): Likewise. (vqtbl3q_u8): Likewise. (vqtbl3q_p8): Likewise. (vqtbl4_s8): Likewise. (vqtbl4_u8): Likewise. (vqtbl4_p8): Likewise. (vqtbl4q_s8): Likewise. (vqtbl4q_u8): Likewise. (vqtbl4q_p8): Likewise. (vqtbx2_s8): Likewise. (vqtbx2_u8): Likewise. (vqtbx2_p8): Likewise. (vqtbx2q_s8): Likewise. (vqtbx2q_u8): Likewise. (vqtbx2q_p8): Likewise. (vqtbx3_s8): Likewise. (vqtbx3_u8): Likewise. (vqtbx3_p8): Likewise. (vqtbx3q_s8): Likewise. (vqtbx3q_u8): Likewise. (vqtbx3q_p8): Likewise. (vqtbx4_s8): Likewise. (vqtbx4_u8): Likewise. (vqtbx4_p8): Likewise. (vqtbx4q_s8): Likewise. (vqtbx4q_u8): Likewise. (vqtbx4q_p8): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise. (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. 
(vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. (vst1_s8_x4): Likewise. (vst1q_s8_x4): Likewise. (vst1_s16_x4): Likewise. (vst1q_s16_x4): Likewise. (vst1_s32_x4): Likewise. (vst1q_s32_x4): Likewise. (vst1_u8_x4): Likewise. (vst1q_u8_x4): Likewise. (vst1_u16_x4): Likewise. (vst1q_u16_x4): Likewise. (vst1_u32_x4): Likewise. (vst1q_u32_x4): Likewise. (vst1_f16_x4): Likewise. (vst1q_f16_x4): Likewise. (vst1_f32_x4): Likewise. (vst1q_f32_x4): Likewise. (vst1_p8_x4): Likewise. (vst1q_p8_x4): Likewise. (vst1_p16_x4): Likewise. (vst1q_p16_x4): Likewise. (vst1_s64_x4): Likewise. (vst1_u64_x4): Likewise. (vst1_p64_x4): Likewise. (vst1q_s64_x4): Likewise. (vst1q_u64_x4): Likewise. (vst1q_p64_x4): Likewise. (vst1_f64_x4): Likewise. (vst1q_f64_x4): Likewise. (vst2_s64): Likewise. (vst2_u64): Likewise. (vst2_f64): Likewise. (vst2_s8): Likewise. (vst2_p8): Likewise. (vst2_s16): Likewise. (vst2_p16): Likewise. (vst2_s32): Likewise. (vst2_u8): Likewise. (vst2_u16): Likewise. (vst2_u32): Likewise. (vst2_f16): Likewise. (vst2_f32): Likewise. (vst2_p64): Likewise. (vst2q_s8): Likewise. (vst2q_p8): Likewise. (vst2q_s16): Likewise. (vst2q_p16): Likewise. (vst2q_s32): Likewise. (vst2q_s64): Likewise. (vst2q_u8): Likewise. (vst2q_u16): Likewise. (vst2q_u32): Likewise. (vst2q_u64): Likewise. (vst2q_f16): Likewise. (vst2q_f32): Likewise. (vst2q_f64): Likewise. (vst2q_p64): Likewise. (vst3_s64): Likewise. (vst3_u64): Likewise. (vst3_f64): Likewise. (vst3_s8): Likewise. (vst3_p8): Likewise. (vst3_s16): Likewise. (vst3_p16): Likewise. (vst3_s32): Likewise. (vst3_u8): Likewise. (vst3_u16): Likewise. (vst3_u32): Likewise. (vst3_f16): Likewise. (vst3_f32): Likewise. (vst3_p64): Likewise. (vst3q_s8): Likewise. (vst3q_p8): Likewise. (vst3q_s16): Likewise. (vst3q_p16): Likewise. (vst3q_s32): Likewise. (vst3q_s64): Likewise. (vst3q_u8): Likewise. (vst3q_u16): Likewise. (vst3q_u32): Likewise. (vst3q_u64): Likewise. (vst3q_f16): Likewise. (vst3q_f32): Likewise. (vst3q_f64): Likewise. (vst3q_p64): Likewise. (vst4_s64): Likewise. (vst4_u64): Likewise. (vst4_f64): Likewise. (vst4_s8): Likewise. (vst4_p8): Likewise. (vst4_s16): Likewise. (vst4_p16): Likewise. (vst4_s32): Likewise. (vst4_u8): Likewise. (vst4_u16): Likewise. (vst4_u32): Likewise. (vst4_f16): Likewise. (vst4_f32): Likewise. (vst4_p64): Likewise. (vst4q_s8): Likewise. (vst4q_p8): Likewise. (vst4q_s16): Likewise. (vst4q_p16): Likewise. (vst4q_s32): Likewise. (vst4q_s64): Likewise. (vst4q_u8): Likewise. (vst4q_u16): Likewise. (vst4q_u32): Likewise. (vst4q_u64): Likewise. (vst4q_f16): Likewise. (vst4q_f32): Likewise. (vst4q_f64): Likewise. (vst4q_p64): Likewise. 
(vtbx4_s8): Likewise. (vtbx4_u8): Likewise. (vtbx4_p8): Likewise. (vld1_bf16_x2): Likewise. (vld1q_bf16_x2): Likewise. (vld1_bf16_x3): Likewise. (vld1q_bf16_x3): Likewise. (vld1_bf16_x4): Likewise. (vld1q_bf16_x4): Likewise. (vld2_bf16): Likewise. (vld2q_bf16): Likewise. (vld2_dup_bf16): Likewise. (vld2q_dup_bf16): Likewise. (vld3_bf16): Likewise. (vld3q_bf16): Likewise. (vld3_dup_bf16): Likewise. (vld3q_dup_bf16): Likewise. (vld4_bf16): Likewise. (vld4q_bf16): Likewise. (vld4_dup_bf16): Likewise. (vld4q_dup_bf16): Likewise. (vst1_bf16_x2): Likewise. (vst1q_bf16_x2): Likewise. (vst1_bf16_x3): Likewise. (vst1q_bf16_x3): Likewise. (vst1_bf16_x4): Likewise. (vst1q_bf16_x4): Likewise. (vst2_bf16): Likewise. (vst2q_bf16): Likewise. (vst3_bf16): Likewise. (vst3q_bf16): Likewise. (vst4_bf16): Likewise. (vst4q_bf16): Likewise. (vld2_lane_bf16): Likewise. (vld2q_lane_bf16): Likewise. (vld3_lane_bf16): Likewise. (vld3q_lane_bf16): Likewise. (vld4_lane_bf16): Likewise. (vld4q_lane_bf16): Likewise. (vst2_lane_bf16): Likewise. (vst2q_lane_bf16): Likewise. (vst3_lane_bf16): Likewise. (vst3q_lane_bf16): Likewise. (vst4_lane_bf16): Likewise. (vst4q_lane_bf16): Likewise. * config/aarch64/geniterators.sh: Modify iterator regex to match new vector-tuple modes. * config/aarch64/iterators.md (insn_count): Extend mode attribute with vector-tuple type information. (nregs): Likewise. (Vendreg): Likewise. (Vetype): Likewise. (Vtype): Likewise. (VSTRUCT_2D): New mode iterator. (VSTRUCT_2DNX): Likewise. (VSTRUCT_2DX): Likewise. (VSTRUCT_2Q): Likewise. (VSTRUCT_2QD): Likewise. (VSTRUCT_3D): Likewise. (VSTRUCT_3DNX): Likewise. (VSTRUCT_3DX): Likewise. (VSTRUCT_3Q): Likewise. (VSTRUCT_3QD): Likewise. (VSTRUCT_4D): Likewise. (VSTRUCT_4DNX): Likewise. (VSTRUCT_4DX): Likewise. (VSTRUCT_4Q): Likewise. (VSTRUCT_4QD): Likewise. (VSTRUCT_D): Likewise. (VSTRUCT_Q): Likewise. (VSTRUCT_QD): Likewise. (VSTRUCT_ELT): New mode attribute. (vstruct_elt): Likewise. * genmodes.c (VECTOR_MODE): Add default prefix and order parameters. (VECTOR_MODE_WITH_PREFIX): Define. (make_vector_mode): Add mode prefix and order parameters. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c: Relax incorrect register number requirement. * gcc.target/aarch64/sve/pcs/struct_3_256.c: Accept equivalent codegen with fmov. 
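
As an illustration of the C-level types this patch targets, the sketch
below is not part of the patch (the function name and data layout are
invented for illustration). It uses the standard arm_neon.h vector-tuple
type int32x4x2_t, which the patch maps onto the new V2x4SI machine mode
rather than the opaque OImode. Modifying one vector of the tuple through
.val[0] is the kind of operation that previously needed the set/get
gymnastics described above:

    #include <arm_neon.h>

    /* De-interleaving load into a two-vector tuple, an update of one
       vector in the tuple, then an interleaving store.  */
    void
    add_to_even_lanes (int32_t *data, int32x4_t bias)
    {
      int32x4x2_t t = vld2q_s32 (data);       /* ld2 of two .4s vectors.  */
      t.val[0] = vaddq_s32 (t.val[0], bias);  /* touch only one vector.  */
      vst2q_s32 (data, t);                    /* st2 of two .4s vectors.  */
    }
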
--- diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index eff4cdc..ed91c2b 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -75,6 +75,54 @@ #define bf_UP E_BFmode #define v4bf_UP E_V4BFmode #define v8bf_UP E_V8BFmode +#define v2x8qi_UP E_V2x8QImode +#define v2x4hi_UP E_V2x4HImode +#define v2x4hf_UP E_V2x4HFmode +#define v2x4bf_UP E_V2x4BFmode +#define v2x2si_UP E_V2x2SImode +#define v2x2sf_UP E_V2x2SFmode +#define v2x1di_UP E_V2x1DImode +#define v2x1df_UP E_V2x1DFmode +#define v2x16qi_UP E_V2x16QImode +#define v2x8hi_UP E_V2x8HImode +#define v2x8hf_UP E_V2x8HFmode +#define v2x8bf_UP E_V2x8BFmode +#define v2x4si_UP E_V2x4SImode +#define v2x4sf_UP E_V2x4SFmode +#define v2x2di_UP E_V2x2DImode +#define v2x2df_UP E_V2x2DFmode +#define v3x8qi_UP E_V3x8QImode +#define v3x4hi_UP E_V3x4HImode +#define v3x4hf_UP E_V3x4HFmode +#define v3x4bf_UP E_V3x4BFmode +#define v3x2si_UP E_V3x2SImode +#define v3x2sf_UP E_V3x2SFmode +#define v3x1di_UP E_V3x1DImode +#define v3x1df_UP E_V3x1DFmode +#define v3x16qi_UP E_V3x16QImode +#define v3x8hi_UP E_V3x8HImode +#define v3x8hf_UP E_V3x8HFmode +#define v3x8bf_UP E_V3x8BFmode +#define v3x4si_UP E_V3x4SImode +#define v3x4sf_UP E_V3x4SFmode +#define v3x2di_UP E_V3x2DImode +#define v3x2df_UP E_V3x2DFmode +#define v4x8qi_UP E_V4x8QImode +#define v4x4hi_UP E_V4x4HImode +#define v4x4hf_UP E_V4x4HFmode +#define v4x4bf_UP E_V4x4BFmode +#define v4x2si_UP E_V4x2SImode +#define v4x2sf_UP E_V4x2SFmode +#define v4x1di_UP E_V4x1DImode +#define v4x1df_UP E_V4x1DFmode +#define v4x16qi_UP E_V4x16QImode +#define v4x8hi_UP E_V4x8HImode +#define v4x8hf_UP E_V4x8HFmode +#define v4x8bf_UP E_V4x8BFmode +#define v4x4si_UP E_V4x4SImode +#define v4x4sf_UP E_V4x4SFmode +#define v4x2di_UP E_V4x2DImode +#define v4x2df_UP E_V4x2DFmode #define UP(X) X##_UP #define SIMD_MAX_BUILTIN_ARGS 5 @@ -228,7 +276,6 @@ aarch64_types_binop_pppu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly, qualifier_unsigned }; #define TYPES_TERNOP_PPPU (aarch64_types_binop_pppu_qualifiers) - static enum aarch64_type_qualifiers aarch64_types_quadop_lane_pair_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, @@ -265,10 +312,6 @@ aarch64_types_quadopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TYPES_QUADOPUI (aarch64_types_quadopu_imm_qualifiers) static enum aarch64_type_qualifiers -aarch64_types_binop_imm_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_poly, qualifier_none, qualifier_immediate }; -#define TYPES_GETREGP (aarch64_types_binop_imm_p_qualifiers) -static enum aarch64_type_qualifiers aarch64_types_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_immediate }; #define TYPES_GETREG (aarch64_types_binop_imm_qualifiers) @@ -292,10 +335,6 @@ aarch64_types_shift2_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TYPES_SHIFT2IMM_UUSS (aarch64_types_shift2_to_unsigned_qualifiers) static enum aarch64_type_qualifiers -aarch64_types_ternop_s_imm_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_none, qualifier_none, qualifier_poly, qualifier_immediate}; -#define TYPES_SETREGP (aarch64_types_ternop_s_imm_p_qualifiers) -static enum aarch64_type_qualifiers aarch64_types_ternop_s_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate}; #define TYPES_SETREG (aarch64_types_ternop_s_imm_qualifiers) @@ -331,10 +370,29 @@ 
aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TYPES_LOAD1 (aarch64_types_load1_qualifiers) #define TYPES_LOADSTRUCT (aarch64_types_load1_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_load1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_const_pointer_map_mode }; +#define TYPES_LOADSTRUCT_U (aarch64_types_load1_u_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_load1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_poly, qualifier_const_pointer_map_mode }; +#define TYPES_LOADSTRUCT_P (aarch64_types_load1_p_qualifiers) + +static enum aarch64_type_qualifiers aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode, qualifier_none, qualifier_struct_load_store_lane_index }; #define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_loadstruct_lane_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_const_pointer_map_mode, + qualifier_unsigned, qualifier_struct_load_store_lane_index }; +#define TYPES_LOADSTRUCT_LANE_U (aarch64_types_loadstruct_lane_u_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_loadstruct_lane_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_poly, qualifier_const_pointer_map_mode, + qualifier_poly, qualifier_struct_load_store_lane_index }; +#define TYPES_LOADSTRUCT_LANE_P (aarch64_types_loadstruct_lane_p_qualifiers) static enum aarch64_type_qualifiers aarch64_types_bsl_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -358,19 +416,35 @@ aarch64_types_bsl_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_map_mode | qualifier_pointer to build a pointer to the element type of the vector. */ static enum aarch64_type_qualifiers -aarch64_types_store1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_void, qualifier_pointer_map_mode, qualifier_poly }; -#define TYPES_STORE1P (aarch64_types_store1_p_qualifiers) -static enum aarch64_type_qualifiers aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_pointer_map_mode, qualifier_none }; #define TYPES_STORE1 (aarch64_types_store1_qualifiers) #define TYPES_STORESTRUCT (aarch64_types_store1_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_store1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer_map_mode, qualifier_unsigned }; +#define TYPES_STORESTRUCT_U (aarch64_types_store1_u_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_store1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer_map_mode, qualifier_poly }; +#define TYPES_STORE1P (aarch64_types_store1_p_qualifiers) +#define TYPES_STORESTRUCT_P (aarch64_types_store1_p_qualifiers) + +static enum aarch64_type_qualifiers aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_pointer_map_mode, qualifier_none, qualifier_struct_load_store_lane_index }; #define TYPES_STORESTRUCT_LANE (aarch64_types_storestruct_lane_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_storestruct_lane_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer_map_mode, + qualifier_unsigned, qualifier_struct_load_store_lane_index }; +#define TYPES_STORESTRUCT_LANE_U (aarch64_types_storestruct_lane_u_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_storestruct_lane_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_pointer_map_mode, + qualifier_poly, 
qualifier_struct_load_store_lane_index }; +#define TYPES_STORESTRUCT_LANE_P (aarch64_types_storestruct_lane_p_qualifiers) #define CF0(N, X) CODE_FOR_aarch64_##N##X #define CF1(N, X) CODE_FOR_##N##X##1 @@ -644,6 +718,8 @@ static GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = { }; #undef ENTRY +static GTY(()) tree aarch64_simd_tuple_types[ARM_NEON_H_TYPES_LAST][3]; + static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE; static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE; @@ -764,9 +840,16 @@ aarch64_lookup_simd_builtin_type (machine_mode mode, return aarch64_simd_builtin_std_type (mode, q); for (i = 0; i < nelts; i++) - if (aarch64_simd_types[i].mode == mode - && aarch64_simd_types[i].q == q) - return aarch64_simd_types[i].itype; + { + if (aarch64_simd_types[i].mode == mode + && aarch64_simd_types[i].q == q) + return aarch64_simd_types[i].itype; + if (aarch64_simd_tuple_types[i][0] != NULL_TREE) + for (int j = 0; j < 3; j++) + if (TYPE_MODE (aarch64_simd_tuple_types[i][j]) == mode + && aarch64_simd_types[i].q == q) + return aarch64_simd_tuple_types[i][j]; + } return NULL_TREE; } @@ -1173,7 +1256,17 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma) tree attrs = aarch64_get_attributes (d->flags, d->mode); - fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode, attrs); + if (called_from_pragma) + { + unsigned int raw_code + = (fcode << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_GENERAL; + fndecl = simulate_builtin_function_decl (input_location, namebuf, + ftype, raw_code, NULL, + attrs); + } + else + fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode, attrs); + aarch64_builtin_decls[fcode] = fndecl; } } @@ -1195,6 +1288,16 @@ register_tuple_type (unsigned int num_vectors, unsigned int type_index) tree vector_type = type->itype; tree array_type = build_array_type_nelts (vector_type, num_vectors); + if (type->mode == DImode) + { + if (num_vectors == 2) + SET_TYPE_MODE (array_type, V2x1DImode); + else if (num_vectors == 3) + SET_TYPE_MODE (array_type, V3x1DImode); + else if (num_vectors == 4) + SET_TYPE_MODE (array_type, V4x1DImode); + } + unsigned int alignment = (known_eq (GET_MODE_SIZE (type->mode), 16) ? 128 : 64); gcc_assert (TYPE_MODE_RAW (array_type) == TYPE_MODE (array_type) @@ -1209,6 +1312,13 @@ register_tuple_type (unsigned int num_vectors, unsigned int type_index) 1)); gcc_assert (TYPE_MODE_RAW (t) == TYPE_MODE (t) && TYPE_ALIGN (t) == alignment); + + if (num_vectors == 2) + aarch64_simd_tuple_types[type_index][0] = t; + else if (num_vectors == 3) + aarch64_simd_tuple_types[type_index][1] = t; + else if (num_vectors == 4) + aarch64_simd_tuple_types[type_index][2] = t; } static bool diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 1a07bc1..ac97d22 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -81,13 +81,69 @@ INT_MODE (OI, 32); INT_MODE (CI, 48); INT_MODE (XI, 64); +/* Define Advanced SIMD modes for structures of 2, 3 and 4 d-registers. 
*/ +#define ADV_SIMD_D_REG_STRUCT_MODES(NVECS, VB, VH, VS, VD) \ + VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 8, 3); \ + VECTOR_MODES_WITH_PREFIX (V##NVECS##x, FLOAT, 8, 3); \ + VECTOR_MODE_WITH_PREFIX (V##NVECS##x, FLOAT, DF, 1, 3); \ + VECTOR_MODE_WITH_PREFIX (V##NVECS##x, INT, DI, 1, 3); \ + \ + ADJUST_NUNITS (VB##QI, NVECS * 8); \ + ADJUST_NUNITS (VH##HI, NVECS * 4); \ + ADJUST_NUNITS (VS##SI, NVECS * 2); \ + ADJUST_NUNITS (VD##DI, NVECS); \ + ADJUST_NUNITS (VH##BF, NVECS * 4); \ + ADJUST_NUNITS (VH##HF, NVECS * 4); \ + ADJUST_NUNITS (VS##SF, NVECS * 2); \ + ADJUST_NUNITS (VD##DF, NVECS); \ + \ + ADJUST_ALIGNMENT (VB##QI, 8); \ + ADJUST_ALIGNMENT (VH##HI, 8); \ + ADJUST_ALIGNMENT (VS##SI, 8); \ + ADJUST_ALIGNMENT (VD##DI, 8); \ + ADJUST_ALIGNMENT (VH##BF, 8); \ + ADJUST_ALIGNMENT (VH##HF, 8); \ + ADJUST_ALIGNMENT (VS##SF, 8); \ + ADJUST_ALIGNMENT (VD##DF, 8); + +ADV_SIMD_D_REG_STRUCT_MODES (2, V2x8, V2x4, V2x2, V2x1) +ADV_SIMD_D_REG_STRUCT_MODES (3, V3x8, V3x4, V3x2, V3x1) +ADV_SIMD_D_REG_STRUCT_MODES (4, V4x8, V4x4, V4x2, V4x1) + +/* Define Advanced SIMD modes for structures of 2, 3 and 4 q-registers. */ +#define ADV_SIMD_Q_REG_STRUCT_MODES(NVECS, VB, VH, VS, VD) \ + VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 16, 3); \ + VECTOR_MODES_WITH_PREFIX (V##NVECS##x, FLOAT, 16, 3); \ + \ + ADJUST_NUNITS (VB##QI, NVECS * 16); \ + ADJUST_NUNITS (VH##HI, NVECS * 8); \ + ADJUST_NUNITS (VS##SI, NVECS * 4); \ + ADJUST_NUNITS (VD##DI, NVECS * 2); \ + ADJUST_NUNITS (VH##BF, NVECS * 8); \ + ADJUST_NUNITS (VH##HF, NVECS * 8); \ + ADJUST_NUNITS (VS##SF, NVECS * 4); \ + ADJUST_NUNITS (VD##DF, NVECS * 2); \ + \ + ADJUST_ALIGNMENT (VB##QI, 16); \ + ADJUST_ALIGNMENT (VH##HI, 16); \ + ADJUST_ALIGNMENT (VS##SI, 16); \ + ADJUST_ALIGNMENT (VD##DI, 16); \ + ADJUST_ALIGNMENT (VH##BF, 16); \ + ADJUST_ALIGNMENT (VH##HF, 16); \ + ADJUST_ALIGNMENT (VS##SF, 16); \ + ADJUST_ALIGNMENT (VD##DF, 16); + +ADV_SIMD_Q_REG_STRUCT_MODES (2, V2x16, V2x8, V2x4, V2x2) +ADV_SIMD_Q_REG_STRUCT_MODES (3, V3x16, V3x8, V3x4, V3x2) +ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2) + /* Define SVE modes for NVECS vectors. VB, VH, VS and VD are the prefixes for 8-bit, 16-bit, 32-bit and 64-bit elements respectively. It isn't strictly necessary to set the alignment here, since the default would be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer. */ #define SVE_MODES(NVECS, VB, VH, VS, VD) \ - VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, 0); \ - VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, 0); \ + VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, NVECS == 1 ? 1 : 4); \ + VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, NVECS == 1 ? 1 : 4); \ \ ADJUST_NUNITS (VB##QI, aarch64_sve_vg * NVECS * 8); \ ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \ @@ -123,14 +179,14 @@ SVE_MODES (4, VNx64, VNx32, VNx16, VNx8) In memory they occupy contiguous locations, in the same way as fixed-length vectors. E.g. VNx8QImode is half the size of VNx16QImode. - Passing 1 as the final argument ensures that the modes come after all - other modes in the GET_MODE_WIDER chain, so that we never pick them - in preference to a full vector mode. 
*/ -VECTOR_MODES_WITH_PREFIX (VNx, INT, 2, 1); -VECTOR_MODES_WITH_PREFIX (VNx, INT, 4, 1); -VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 1); -VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 4, 1); -VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 8, 1); + Passing 2 as the final argument ensures that the modes come after all + other single-vector modes in the GET_MODE_WIDER chain, so that we never + pick them in preference to a full vector mode. */ +VECTOR_MODES_WITH_PREFIX (VNx, INT, 2, 2); +VECTOR_MODES_WITH_PREFIX (VNx, INT, 4, 2); +VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 2); +VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 4, 2); +VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 8, 2); ADJUST_NUNITS (VNx2QI, aarch64_sve_vg); ADJUST_NUNITS (VNx2HI, aarch64_sve_vg); diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index be1c5b5..6546e91 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -76,59 +76,86 @@ BUILTIN_VSDQ_I (BINOP_SSU, suqadd, 0, NONE) BUILTIN_VSDQ_I (BINOP_UUS, usqadd, 0, NONE) - /* Implemented by aarch64_get_dreg. */ - BUILTIN_VDC (GETREG, get_dregoi, 0, AUTO_FP) - BUILTIN_VDC (GETREG, get_dregci, 0, AUTO_FP) - BUILTIN_VDC (GETREG, get_dregxi, 0, AUTO_FP) - VAR1 (GETREGP, get_dregoi, 0, AUTO_FP, di) - VAR1 (GETREGP, get_dregci, 0, AUTO_FP, di) - VAR1 (GETREGP, get_dregxi, 0, AUTO_FP, di) - /* Implemented by aarch64_get_qreg. */ - BUILTIN_VQ (GETREG, get_qregoi, 0, AUTO_FP) - BUILTIN_VQ (GETREG, get_qregci, 0, AUTO_FP) - BUILTIN_VQ (GETREG, get_qregxi, 0, AUTO_FP) - VAR1 (GETREGP, get_qregoi, 0, AUTO_FP, v2di) - VAR1 (GETREGP, get_qregci, 0, AUTO_FP, v2di) - VAR1 (GETREGP, get_qregxi, 0, AUTO_FP, v2di) - /* Implemented by aarch64_set_qreg. */ - BUILTIN_VQ (SETREG, set_qregoi, 0, AUTO_FP) - BUILTIN_VQ (SETREG, set_qregci, 0, AUTO_FP) - BUILTIN_VQ (SETREG, set_qregxi, 0, AUTO_FP) - VAR1 (SETREGP, set_qregoi, 0, AUTO_FP, v2di) - VAR1 (SETREGP, set_qregci, 0, AUTO_FP, v2di) - VAR1 (SETREGP, set_qregxi, 0, AUTO_FP, v2di) - /* Implemented by aarch64_ld1x2. */ - BUILTIN_VQ (LOADSTRUCT, ld1x2, 0, LOAD) - /* Implemented by aarch64_ld1x2. */ - BUILTIN_VDC (LOADSTRUCT, ld1x2, 0, LOAD) - /* Implemented by aarch64_ld. */ - BUILTIN_VDC (LOADSTRUCT, ld2, 0, LOAD) - BUILTIN_VDC (LOADSTRUCT, ld3, 0, LOAD) - BUILTIN_VDC (LOADSTRUCT, ld4, 0, LOAD) - /* Implemented by aarch64_ld. */ - BUILTIN_VQ (LOADSTRUCT, ld2, 0, LOAD) - BUILTIN_VQ (LOADSTRUCT, ld3, 0, LOAD) - BUILTIN_VQ (LOADSTRUCT, ld4, 0, LOAD) - /* Implemented by aarch64_ldr. */ + /* Implemented by aarch64_ld1x2. */ + BUILTIN_VALLDIF (LOADSTRUCT, ld1x2, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld1x2, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld1x2, 0, LOAD) + /* Implemented by aarch64_ld1x3. */ + BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld1x3, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld1x3, 0, LOAD) + /* Implemented by aarch64_ld1x4. */ + BUILTIN_VALLDIF (LOADSTRUCT, ld1x4, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld1x4, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld1x4, 0, LOAD) + + /* Implemented by aarch64_st1x2. */ + BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st1x2, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st1x2, 0, STORE) + /* Implemented by aarch64_st1x3. */ + BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st1x3, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st1x3, 0, STORE) + /* Implemented by aarch64_st1x4. 
*/ + BUILTIN_VALLDIF (STORESTRUCT, st1x4, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st1x4, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st1x4, 0, STORE) + + /* Implemented by aarch64_ld. */ + BUILTIN_VALLDIF (LOADSTRUCT, ld2, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld2, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld2, 0, LOAD) + BUILTIN_VALLDIF (LOADSTRUCT, ld3, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld3, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld3, 0, LOAD) + BUILTIN_VALLDIF (LOADSTRUCT, ld4, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld4, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld4, 0, LOAD) + + /* Implemented by aarch64_st. */ + BUILTIN_VALLDIF (STORESTRUCT, st2, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st2, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st2, 0, STORE) + BUILTIN_VALLDIF (STORESTRUCT, st3, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st3, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st3, 0, STORE) + BUILTIN_VALLDIF (STORESTRUCT, st4, 0, STORE) + BUILTIN_VSDQ_I_DI (STORESTRUCT_U, st4, 0, STORE) + BUILTIN_VALLP (STORESTRUCT_P, st4, 0, STORE) + + /* Implemented by aarch64_ldr. */ BUILTIN_VALLDIF (LOADSTRUCT, ld2r, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld2r, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld2r, 0, LOAD) BUILTIN_VALLDIF (LOADSTRUCT, ld3r, 0, LOAD) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld3r, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld3r, 0, LOAD) BUILTIN_VALLDIF (LOADSTRUCT, ld4r, 0, LOAD) - /* Implemented by aarch64_ld_lane. */ + BUILTIN_VSDQ_I_DI (LOADSTRUCT_U, ld4r, 0, LOAD) + BUILTIN_VALLP (LOADSTRUCT_P, ld4r, 0, LOAD) + + /* Implemented by aarch64_ld_lane. */ BUILTIN_VALLDIF (LOADSTRUCT_LANE, ld2_lane, 0, ALL) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_LANE_U, ld2_lane, 0, ALL) + BUILTIN_VALLP (LOADSTRUCT_LANE_P, ld2_lane, 0, ALL) BUILTIN_VALLDIF (LOADSTRUCT_LANE, ld3_lane, 0, ALL) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_LANE_U, ld3_lane, 0, ALL) + BUILTIN_VALLP (LOADSTRUCT_LANE_P, ld3_lane, 0, ALL) BUILTIN_VALLDIF (LOADSTRUCT_LANE, ld4_lane, 0, ALL) - /* Implemented by aarch64_st. */ - BUILTIN_VDC (STORESTRUCT, st2, 0, STORE) - BUILTIN_VDC (STORESTRUCT, st3, 0, STORE) - BUILTIN_VDC (STORESTRUCT, st4, 0, STORE) - /* Implemented by aarch64_st. */ - BUILTIN_VQ (STORESTRUCT, st2, 0, STORE) - BUILTIN_VQ (STORESTRUCT, st3, 0, STORE) - BUILTIN_VQ (STORESTRUCT, st4, 0, STORE) + BUILTIN_VSDQ_I_DI (LOADSTRUCT_LANE_U, ld4_lane, 0, ALL) + BUILTIN_VALLP (LOADSTRUCT_LANE_P, ld4_lane, 0, ALL) + /* Implemented by aarch64_st_lane. */ BUILTIN_VALLDIF (STORESTRUCT_LANE, st2_lane, 0, ALL) + BUILTIN_VSDQ_I_DI (STORESTRUCT_LANE_U, st2_lane, 0, ALL) + BUILTIN_VALLP (STORESTRUCT_LANE_P, st2_lane, 0, ALL) BUILTIN_VALLDIF (STORESTRUCT_LANE, st3_lane, 0, ALL) + BUILTIN_VSDQ_I_DI (STORESTRUCT_LANE_U, st3_lane, 0, ALL) + BUILTIN_VALLP (STORESTRUCT_LANE_P, st3_lane, 0, ALL) BUILTIN_VALLDIF (STORESTRUCT_LANE, st4_lane, 0, ALL) + BUILTIN_VSDQ_I_DI (STORESTRUCT_LANE_U, st4_lane, 0, ALL) + BUILTIN_VALLP (STORESTRUCT_LANE_P, st4_lane, 0, ALL) BUILTIN_VQW (BINOP, saddl2, 0, NONE) BUILTIN_VQW (BINOP, uaddl2, 0, NONE) @@ -657,21 +684,6 @@ BUILTIN_VALL_F16 (STORE1, st1, 0, STORE) VAR1 (STORE1P, st1, 0, STORE, v2di) - /* Implemented by aarch64_ld1x3. */ - BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0, LOAD) - - /* Implemented by aarch64_ld1x4. */ - BUILTIN_VALLDIF (LOADSTRUCT, ld1x4, 0, LOAD) - - /* Implemented by aarch64_st1x2. */ - BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0, STORE) - - /* Implemented by aarch64_st1x3. */ - BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0, STORE) - - /* Implemented by aarch64_st1x4. 
*/ - BUILTIN_VALLDIF (STORESTRUCT, st1x4, 0, STORE) - /* Implemented by fma4. */ BUILTIN_VHSDF (TERNOP, fma, 4, FP) VAR1 (TERNOP, fma, 4, FP, hf) @@ -726,12 +738,21 @@ /* Implemented by aarch64_qtbl2. */ VAR2 (BINOP, qtbl2, 0, NONE, v8qi, v16qi) + VAR2 (BINOPU, qtbl2, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_PPU, qtbl2, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_SSU, qtbl2, 0, NONE, v8qi, v16qi) /* Implemented by aarch64_qtbl3. */ VAR2 (BINOP, qtbl3, 0, NONE, v8qi, v16qi) + VAR2 (BINOPU, qtbl3, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_PPU, qtbl3, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_SSU, qtbl3, 0, NONE, v8qi, v16qi) /* Implemented by aarch64_qtbl4. */ VAR2 (BINOP, qtbl4, 0, NONE, v8qi, v16qi) + VAR2 (BINOPU, qtbl4, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_PPU, qtbl4, 0, NONE, v8qi, v16qi) + VAR2 (BINOP_SSU, qtbl4, 0, NONE, v8qi, v16qi) /* Implemented by aarch64_qtbx1. */ VAR2 (TERNOP, qtbx1, 0, NONE, v8qi, v16qi) @@ -741,12 +762,21 @@ /* Implemented by aarch64_qtbx2. */ VAR2 (TERNOP, qtbx2, 0, NONE, v8qi, v16qi) + VAR2 (TERNOPU, qtbx2, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_PPPU, qtbx2, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_SSSU, qtbx2, 0, NONE, v8qi, v16qi) /* Implemented by aarch64_qtbx3. */ VAR2 (TERNOP, qtbx3, 0, NONE, v8qi, v16qi) + VAR2 (TERNOPU, qtbx3, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_PPPU, qtbx3, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_SSSU, qtbx3, 0, NONE, v8qi, v16qi) /* Implemented by aarch64_qtbx4. */ VAR2 (TERNOP, qtbx4, 0, NONE, v8qi, v16qi) + VAR2 (TERNOPU, qtbx4, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_PPPU, qtbx4, 0, NONE, v8qi, v16qi) + VAR2 (TERNOP_SSSU, qtbx4, 0, NONE, v8qi, v16qi) /* Builtins for ARMv8.1-A Adv.SIMD instructions. */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 61c3d7e..bff76e4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -6768,162 +6768,165 @@ ;; Patterns for vector struct loads and stores. -(define_insn "aarch64_simd_ld2" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD2))] +(define_insn "aarch64_simd_ld2" + [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w") + (unspec:VSTRUCT_2Q [ + (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD2))] "TARGET_SIMD" "ld2\\t{%S0. - %T0.}, %1" [(set_attr "type" "neon_load2_2reg")] ) -(define_insn "aarch64_simd_ld2r" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] - UNSPEC_LD2_DUP))] +(define_insn "aarch64_simd_ld2r" + [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w") + (unspec:VSTRUCT_2QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD2_DUP))] "TARGET_SIMD" "ld2r\\t{%S0. 
- %T0.}, %1" [(set_attr "type" "neon_load2_all_lanes")] ) -(define_insn "aarch64_vec_load_lanesoi_lane" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (match_operand:OI 2 "register_operand" "0") - (match_operand:SI 3 "immediate_operand" "i") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] - UNSPEC_LD2_LANE))] +(define_insn "aarch64_vec_load_lanes_lane" + [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w") + (unspec:VSTRUCT_2QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:VSTRUCT_2QD 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_LD2_LANE))] "TARGET_SIMD" { - operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); + operands[3] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[3])); return "ld2\\t{%S0. - %T0.}[%3], %1"; } [(set_attr "type" "neon_load2_one_lane")] ) -(define_expand "vec_load_lanesoi" - [(set (match_operand:OI 0 "register_operand") - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD2))] +(define_expand "vec_load_lanes" + [(set (match_operand:VSTRUCT_2Q 0 "register_operand") + (unspec:VSTRUCT_2Q [ + (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand")] + UNSPEC_LD2))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (OImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_simd_ld2 (tmp, operands[1])); - emit_insn (gen_aarch64_rev_reglistoi (operands[0], tmp, mask)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_simd_ld2 (tmp, operands[1])); + emit_insn (gen_aarch64_rev_reglist (operands[0], tmp, mask)); } else - emit_insn (gen_aarch64_simd_ld2 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_ld2 (operands[0], operands[1])); DONE; }) -(define_insn "aarch64_simd_st2" - [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:OI [(match_operand:OI 1 "register_operand" "w") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST2))] +(define_insn "aarch64_simd_st2" + [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_2Q [ + (match_operand:VSTRUCT_2Q 1 "register_operand" "w")] + UNSPEC_ST2))] "TARGET_SIMD" "st2\\t{%S1. - %T1.}, %0" [(set_attr "type" "neon_store2_2reg")] ) ;; RTL uses GCC vector extension indices, so flip only for assembly. -(define_insn "aarch64_vec_store_lanesoi_lane" +(define_insn "aarch64_vec_store_lanes_lane" [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:OI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) - (match_operand:SI 2 "immediate_operand" "i")] - UNSPEC_ST2_LANE))] + (unspec:BLK [(match_operand:VSTRUCT_2QD 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_ST2_LANE))] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "st2\\t{%S1. 
- %T1.}[%2], %0"; } [(set_attr "type" "neon_store2_one_lane")] ) -(define_expand "vec_store_lanesoi" - [(set (match_operand:OI 0 "aarch64_simd_struct_operand") - (unspec:OI [(match_operand:OI 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "vec_store_lanes" + [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand") + (unspec:VSTRUCT_2Q [(match_operand:VSTRUCT_2Q 1 "register_operand")] UNSPEC_ST2))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (OImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_rev_reglistoi (tmp, operands[1], mask)); - emit_insn (gen_aarch64_simd_st2 (operands[0], tmp)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_rev_reglist (tmp, operands[1], mask)); + emit_insn (gen_aarch64_simd_st2 (operands[0], tmp)); } else - emit_insn (gen_aarch64_simd_st2 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_st2 (operands[0], operands[1])); DONE; }) -(define_insn "aarch64_simd_ld3" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD3))] +(define_insn "aarch64_simd_ld3" + [(set (match_operand:VSTRUCT_3Q 0 "register_operand" "=w") + (unspec:VSTRUCT_3Q [ + (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD3))] "TARGET_SIMD" "ld3\\t{%S0. - %U0.}, %1" [(set_attr "type" "neon_load3_3reg")] ) -(define_insn "aarch64_simd_ld3r" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] - UNSPEC_LD3_DUP))] +(define_insn "aarch64_simd_ld3r" + [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w") + (unspec:VSTRUCT_3QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD3_DUP))] "TARGET_SIMD" "ld3r\\t{%S0. - %U0.}, %1" [(set_attr "type" "neon_load3_all_lanes")] ) -(define_insn "aarch64_vec_load_lanesci_lane" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (match_operand:CI 2 "register_operand" "0") - (match_operand:SI 3 "immediate_operand" "i") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD3_LANE))] +(define_insn "aarch64_vec_load_lanes_lane" + [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w") + (unspec:VSTRUCT_3QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:VSTRUCT_3QD 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_LD3_LANE))] "TARGET_SIMD" { - operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); + operands[3] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[3])); return "ld3\\t{%S0. 
- %U0.}[%3], %1"; } [(set_attr "type" "neon_load3_one_lane")] ) -(define_expand "vec_load_lanesci" - [(set (match_operand:CI 0 "register_operand") - (unspec:CI [(match_operand:CI 1 "aarch64_simd_struct_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD3))] +(define_expand "vec_load_lanes" + [(set (match_operand:VSTRUCT_3Q 0 "register_operand") + (unspec:VSTRUCT_3Q [ + (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand")] + UNSPEC_LD3))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (CImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_simd_ld3 (tmp, operands[1])); - emit_insn (gen_aarch64_rev_reglistci (operands[0], tmp, mask)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_simd_ld3 (tmp, operands[1])); + emit_insn (gen_aarch64_rev_reglist (operands[0], tmp, mask)); } else - emit_insn (gen_aarch64_simd_ld3 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_ld3 (operands[0], operands[1])); DONE; }) -(define_insn "aarch64_simd_st3" - [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:CI [(match_operand:CI 1 "register_operand" "w") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_insn "aarch64_simd_st3" + [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_3Q [(match_operand:VSTRUCT_3Q 1 "register_operand" "w")] UNSPEC_ST3))] "TARGET_SIMD" "st3\\t{%S1. - %U1.}, %0" @@ -6931,141 +6934,144 @@ ) ;; RTL uses GCC vector extension indices, so flip only for assembly. -(define_insn "aarch64_vec_store_lanesci_lane" +(define_insn "aarch64_vec_store_lanes_lane" [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:CI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) + (unspec:BLK [(match_operand:VSTRUCT_3QD 1 "register_operand" "w") (match_operand:SI 2 "immediate_operand" "i")] - UNSPEC_ST3_LANE))] + UNSPEC_ST3_LANE))] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "st3\\t{%S1. 
- %U1.}[%2], %0"; } [(set_attr "type" "neon_store3_one_lane")] ) -(define_expand "vec_store_lanesci" - [(set (match_operand:CI 0 "aarch64_simd_struct_operand") - (unspec:CI [(match_operand:CI 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST3))] +(define_expand "vec_store_lanes" + [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand") + (unspec:VSTRUCT_3Q [ + (match_operand:VSTRUCT_3Q 1 "register_operand")] + UNSPEC_ST3))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (CImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_rev_reglistci (tmp, operands[1], mask)); - emit_insn (gen_aarch64_simd_st3 (operands[0], tmp)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_rev_reglist (tmp, operands[1], mask)); + emit_insn (gen_aarch64_simd_st3 (operands[0], tmp)); } else - emit_insn (gen_aarch64_simd_st3 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_st3 (operands[0], operands[1])); DONE; }) -(define_insn "aarch64_simd_ld4" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:XI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD4))] +(define_insn "aarch64_simd_ld4" + [(set (match_operand:VSTRUCT_4Q 0 "register_operand" "=w") + (unspec:VSTRUCT_4Q [ + (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD4))] "TARGET_SIMD" "ld4\\t{%S0. - %V0.}, %1" [(set_attr "type" "neon_load4_4reg")] ) -(define_insn "aarch64_simd_ld4r" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] - UNSPEC_LD4_DUP))] +(define_insn "aarch64_simd_ld4r" + [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w") + (unspec:VSTRUCT_4QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD4_DUP))] "TARGET_SIMD" "ld4r\\t{%S0. - %V0.}, %1" [(set_attr "type" "neon_load4_all_lanes")] ) -(define_insn "aarch64_vec_load_lanesxi_lane" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (match_operand:XI 2 "register_operand" "0") - (match_operand:SI 3 "immediate_operand" "i") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD4_LANE))] +(define_insn "aarch64_vec_load_lanes_lane" + [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w") + (unspec:VSTRUCT_4QD [ + (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:VSTRUCT_4QD 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_LD4_LANE))] "TARGET_SIMD" { - operands[3] = aarch64_endian_lane_rtx (mode, INTVAL (operands[3])); + operands[3] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[3])); return "ld4\\t{%S0. 
- %V0.}[%3], %1"; } [(set_attr "type" "neon_load4_one_lane")] ) -(define_expand "vec_load_lanesxi" - [(set (match_operand:XI 0 "register_operand") - (unspec:XI [(match_operand:XI 1 "aarch64_simd_struct_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD4))] +(define_expand "vec_load_lanes" + [(set (match_operand:VSTRUCT_4Q 0 "register_operand") + (unspec:VSTRUCT_4Q [ + (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand")] + UNSPEC_LD4))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (XImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_simd_ld4 (tmp, operands[1])); - emit_insn (gen_aarch64_rev_reglistxi (operands[0], tmp, mask)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_simd_ld4 (tmp, operands[1])); + emit_insn (gen_aarch64_rev_reglist (operands[0], tmp, mask)); } else - emit_insn (gen_aarch64_simd_ld4 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_ld4 (operands[0], operands[1])); DONE; }) -(define_insn "aarch64_simd_st4" - [(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:XI [(match_operand:XI 1 "register_operand" "w") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST4))] +(define_insn "aarch64_simd_st4" + [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_4Q [ + (match_operand:VSTRUCT_4Q 1 "register_operand" "w")] + UNSPEC_ST4))] "TARGET_SIMD" "st4\\t{%S1. - %V1.}, %0" [(set_attr "type" "neon_store4_4reg")] ) ;; RTL uses GCC vector extension indices, so flip only for assembly. -(define_insn "aarch64_vec_store_lanesxi_lane" +(define_insn "aarch64_vec_store_lanes_lane" [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:XI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) + (unspec:BLK [(match_operand:VSTRUCT_4QD 1 "register_operand" "w") (match_operand:SI 2 "immediate_operand" "i")] - UNSPEC_ST4_LANE))] + UNSPEC_ST4_LANE))] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "st4\\t{%S1. 
- %V1.}[%2], %0"; } [(set_attr "type" "neon_store4_one_lane")] ) -(define_expand "vec_store_lanesxi" - [(set (match_operand:XI 0 "aarch64_simd_struct_operand") - (unspec:XI [(match_operand:XI 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "vec_store_lanes" + [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand") + (unspec:VSTRUCT_4Q [(match_operand:VSTRUCT_4Q 1 "register_operand")] UNSPEC_ST4))] "TARGET_SIMD" { if (BYTES_BIG_ENDIAN) { - rtx tmp = gen_reg_rtx (XImode); - rtx mask = aarch64_reverse_mask (mode, ); - emit_insn (gen_aarch64_rev_reglistxi (tmp, operands[1], mask)); - emit_insn (gen_aarch64_simd_st4 (operands[0], tmp)); + rtx tmp = gen_reg_rtx (mode); + rtx mask = aarch64_reverse_mask (mode, + GET_MODE_NUNITS (mode).to_constant () / ); + emit_insn (gen_aarch64_rev_reglist (tmp, operands[1], mask)); + emit_insn (gen_aarch64_simd_st4 (operands[0], tmp)); } else - emit_insn (gen_aarch64_simd_st4 (operands[0], operands[1])); + emit_insn (gen_aarch64_simd_st4 (operands[0], operands[1])); DONE; }) (define_insn_and_split "aarch64_rev_reglist" -[(set (match_operand:VSTRUCT 0 "register_operand" "=&w") - (unspec:VSTRUCT - [(match_operand:VSTRUCT 1 "register_operand" "w") +[(set (match_operand:VSTRUCT_QD 0 "register_operand" "=&w") + (unspec:VSTRUCT_QD + [(match_operand:VSTRUCT_QD 1 "register_operand" "w") (match_operand:V16QI 2 "register_operand" "w")] UNSPEC_REV_REGLIST))] "TARGET_SIMD" @@ -7074,7 +7080,7 @@ [(const_int 0)] { int i; - int nregs = GET_MODE_SIZE (mode) / UNITS_PER_VREG; + int nregs = GET_MODE_SIZE (mode).to_constant () / UNITS_PER_VREG; for (i = 0; i < nregs; i++) { rtx op0 = gen_rtx_REG (V16QImode, REGNO (operands[0]) + i); @@ -7090,6 +7096,18 @@ ;; Reload patterns for AdvSIMD register list operands. (define_expand "mov" + [(set (match_operand:VSTRUCT_QD 0 "nonimmediate_operand") + (match_operand:VSTRUCT_QD 1 "general_operand"))] + "TARGET_SIMD" +{ + if (can_create_pseudo_p ()) + { + if (GET_CODE (operands[0]) != REG) + operands[1] = force_reg (mode, operands[1]); + } +}) + +(define_expand "mov" [(set (match_operand:VSTRUCT 0 "nonimmediate_operand") (match_operand:VSTRUCT 1 "general_operand"))] "TARGET_SIMD" @@ -7101,115 +7119,122 @@ } }) - -(define_expand "aarch64_ld1x3" - [(match_operand:CI 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ld1x3" + [(match_operand:VSTRUCT_3QD 0 "register_operand") + (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (CImode, operands[1]); - emit_insn (gen_aarch64_ld1_x3_ (operands[0], mem)); + rtx mem = gen_rtx_MEM (mode, operands[1]); + emit_insn (gen_aarch64_ld1_x3_ (operands[0], mem)); DONE; }) -(define_insn "aarch64_ld1_x3_" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI - [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VALLDIF [(const_int 3)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_LD1))] +(define_insn "aarch64_ld1_x3_" + [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w") + (unspec:VSTRUCT_3QD + [(match_operand:VSTRUCT_3QD 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD1))] "TARGET_SIMD" "ld1\\t{%S0. 
- %U0.}, %1" [(set_attr "type" "neon_load1_3reg")] ) -(define_expand "aarch64_ld1x4" - [(match_operand:XI 0 "register_operand" "=w") - (match_operand:DI 1 "register_operand" "r") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ld1x4" + [(match_operand:VSTRUCT_4QD 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "r")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (XImode, operands[1]); - emit_insn (gen_aarch64_ld1_x4_ (operands[0], mem)); + rtx mem = gen_rtx_MEM (mode, operands[1]); + emit_insn (gen_aarch64_ld1_x4_ (operands[0], mem)); DONE; }) -(define_insn "aarch64_ld1_x4_" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI - [(match_operand:XI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VALLDIF [(const_int 4)] UNSPEC_VSTRUCTDUMMY)] +(define_insn "aarch64_ld1_x4_" + [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w") + (unspec:VSTRUCT_4QD + [(match_operand:VSTRUCT_4QD 1 "aarch64_simd_struct_operand" "Utv")] UNSPEC_LD1))] "TARGET_SIMD" "ld1\\t{%S0. - %V0.}, %1" [(set_attr "type" "neon_load1_4reg")] ) -(define_expand "aarch64_st1x2" +(define_expand "aarch64_st1x2" [(match_operand:DI 0 "register_operand") - (match_operand:OI 1 "register_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_2QD 1 "register_operand")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (OImode, operands[0]); - emit_insn (gen_aarch64_st1_x2_ (mem, operands[1])); + rtx mem = gen_rtx_MEM (mode, operands[0]); + emit_insn (gen_aarch64_st1_x2_ (mem, operands[1])); DONE; }) -(define_insn "aarch64_st1_x2_" - [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:OI - [(match_operand:OI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 2)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST1))] +(define_insn "aarch64_st1_x2_" + [(set (match_operand:VSTRUCT_2QD 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_2QD + [(match_operand:VSTRUCT_2QD 1 "register_operand" "w")] + UNSPEC_ST1))] "TARGET_SIMD" "st1\\t{%S1. - %T1.}, %0" [(set_attr "type" "neon_store1_2reg")] ) -(define_expand "aarch64_st1x3" +(define_expand "aarch64_st1x3" [(match_operand:DI 0 "register_operand") - (match_operand:CI 1 "register_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_3QD 1 "register_operand")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (CImode, operands[0]); - emit_insn (gen_aarch64_st1_x3_ (mem, operands[1])); + rtx mem = gen_rtx_MEM (mode, operands[0]); + emit_insn (gen_aarch64_st1_x3_ (mem, operands[1])); DONE; }) -(define_insn "aarch64_st1_x3_" - [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:CI - [(match_operand:CI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 3)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST1))] +(define_insn "aarch64_st1_x3_" + [(set (match_operand:VSTRUCT_3QD 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_3QD + [(match_operand:VSTRUCT_3QD 1 "register_operand" "w")] + UNSPEC_ST1))] "TARGET_SIMD" "st1\\t{%S1. 
- %U1.}, %0" [(set_attr "type" "neon_store1_3reg")] ) -(define_expand "aarch64_st1x4" +(define_expand "aarch64_st1x4" [(match_operand:DI 0 "register_operand" "") - (match_operand:XI 1 "register_operand" "") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_4QD 1 "register_operand" "")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (XImode, operands[0]); - emit_insn (gen_aarch64_st1_x4_ (mem, operands[1])); + rtx mem = gen_rtx_MEM (mode, operands[0]); + emit_insn (gen_aarch64_st1_x4_ (mem, operands[1])); DONE; }) -(define_insn "aarch64_st1_x4_" - [(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:XI - [(match_operand:XI 1 "register_operand" "w") - (unspec:VALLDIF [(const_int 4)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST1))] +(define_insn "aarch64_st1_x4_" + [(set (match_operand:VSTRUCT_4QD 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_4QD + [(match_operand:VSTRUCT_4QD 1 "register_operand" "w")] + UNSPEC_ST1))] "TARGET_SIMD" "st1\\t{%S1. - %V1.}, %0" [(set_attr "type" "neon_store1_4reg")] ) (define_insn "*aarch64_mov" + [(set (match_operand:VSTRUCT_QD 0 "aarch64_simd_nonimmediate_operand" "=w,Utv,w") + (match_operand:VSTRUCT_QD 1 "aarch64_simd_general_operand" " w,w,Utv"))] + "TARGET_SIMD && !BYTES_BIG_ENDIAN + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "@ + # + st1\\t{%S1. - %1.}, %0 + ld1\\t{%S0. - %0.}, %1" + [(set_attr "type" "multiple,neon_store_reg_q,\ + neon_load_reg_q") + (set_attr "length" ",4,4")] +) + +(define_insn "*aarch64_mov" [(set (match_operand:VSTRUCT 0 "aarch64_simd_nonimmediate_operand" "=w,Utv,w") (match_operand:VSTRUCT 1 "aarch64_simd_general_operand" " w,w,Utv"))] "TARGET_SIMD && !BYTES_BIG_ENDIAN @@ -7243,6 +7268,34 @@ [(set_attr "type" "neon_store1_1reg")] ) +(define_insn "*aarch64_be_mov" + [(set (match_operand:VSTRUCT_2D 0 "nonimmediate_operand" "=w,m,w") + (match_operand:VSTRUCT_2D 1 "general_operand" " w,w,m"))] + "TARGET_SIMD && BYTES_BIG_ENDIAN + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "@ + # + stp\\t%d1, %R1, %0 + ldp\\t%d0, %R0, %1" + [(set_attr "type" "multiple,neon_stp,neon_ldp") + (set_attr "length" "8,4,4")] +) + +(define_insn "*aarch64_be_mov" + [(set (match_operand:VSTRUCT_2Q 0 "nonimmediate_operand" "=w,m,w") + (match_operand:VSTRUCT_2Q 1 "general_operand" " w,w,m"))] + "TARGET_SIMD && BYTES_BIG_ENDIAN + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "@ + # + stp\\t%q1, %R1, %0 + ldp\\t%q0, %R0, %1" + [(set_attr "type" "multiple,neon_stp_q,neon_ldp_q") + (set_attr "length" "8,4,4")] +) + (define_insn "*aarch64_be_movoi" [(set (match_operand:OI 0 "nonimmediate_operand" "=w,m,w") (match_operand:OI 1 "general_operand" " w,w,m"))] @@ -7257,6 +7310,17 @@ (set_attr "length" "8,4,4")] ) +(define_insn "*aarch64_be_mov" + [(set (match_operand:VSTRUCT_3QD 0 "nonimmediate_operand" "=w,o,w") + (match_operand:VSTRUCT_3QD 1 "general_operand" " w,w,o"))] + "TARGET_SIMD && BYTES_BIG_ENDIAN + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "#" + [(set_attr "type" "multiple") + (set_attr "length" "12,8,8")] +) + (define_insn "*aarch64_be_movci" [(set (match_operand:CI 0 "nonimmediate_operand" "=w,o,w") (match_operand:CI 1 "general_operand" " w,w,o"))] @@ -7268,6 +7332,17 @@ (set_attr "length" "12,4,4")] ) +(define_insn "*aarch64_be_mov" + [(set (match_operand:VSTRUCT_4QD 0 "nonimmediate_operand" "=w,o,w") + (match_operand:VSTRUCT_4QD 1 
"general_operand" " w,w,o"))] + "TARGET_SIMD && BYTES_BIG_ENDIAN + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "#" + [(set_attr "type" "multiple") + (set_attr "length" "16,8,8")] +) + (define_insn "*aarch64_be_movxi" [(set (match_operand:XI 0 "nonimmediate_operand" "=w,o,w") (match_operand:XI 1 "general_operand" " w,w,o"))] @@ -7280,6 +7355,16 @@ ) (define_split + [(set (match_operand:VSTRUCT_2QD 0 "register_operand") + (match_operand:VSTRUCT_2QD 1 "register_operand"))] + "TARGET_SIMD && reload_completed" + [(const_int 0)] +{ + aarch64_simd_emit_reg_reg_move (operands, mode, 2); + DONE; +}) + +(define_split [(set (match_operand:OI 0 "register_operand") (match_operand:OI 1 "register_operand"))] "TARGET_SIMD && reload_completed" @@ -7290,6 +7375,42 @@ }) (define_split + [(set (match_operand:VSTRUCT_3QD 0 "nonimmediate_operand") + (match_operand:VSTRUCT_3QD 1 "general_operand"))] + "TARGET_SIMD && reload_completed" + [(const_int 0)] +{ + if (register_operand (operands[0], mode) + && register_operand (operands[1], mode)) + { + aarch64_simd_emit_reg_reg_move (operands, mode, 3); + DONE; + } + else if (BYTES_BIG_ENDIAN) + { + int elt_size = GET_MODE_SIZE (mode).to_constant () / ; + machine_mode pair_mode = elt_size == 16 ? V2x16QImode : V2x8QImode; + emit_move_insn (simplify_gen_subreg (pair_mode, operands[0], + mode, 0), + simplify_gen_subreg (pair_mode, operands[1], + mode, 0)); + emit_move_insn (gen_lowpart (mode, + simplify_gen_subreg (mode, + operands[0], + mode, + 2 * elt_size)), + gen_lowpart (mode, + simplify_gen_subreg (mode, + operands[1], + mode, + 2 * elt_size))); + DONE; + } + else + FAIL; +}) + +(define_split [(set (match_operand:CI 0 "nonimmediate_operand") (match_operand:CI 1 "general_operand"))] "TARGET_SIMD && reload_completed" @@ -7318,6 +7439,36 @@ }) (define_split + [(set (match_operand:VSTRUCT_4QD 0 "nonimmediate_operand") + (match_operand:VSTRUCT_4QD 1 "general_operand"))] + "TARGET_SIMD && reload_completed" + [(const_int 0)] +{ + if (register_operand (operands[0], mode) + && register_operand (operands[1], mode)) + { + aarch64_simd_emit_reg_reg_move (operands, mode, 4); + DONE; + } + else if (BYTES_BIG_ENDIAN) + { + int elt_size = GET_MODE_SIZE (mode).to_constant () / ; + machine_mode pair_mode = elt_size == 16 ? 
V2x16QImode : V2x8QImode; + emit_move_insn (simplify_gen_subreg (pair_mode, operands[0], + mode, 0), + simplify_gen_subreg (pair_mode, operands[1], + mode, 0)); + emit_move_insn (simplify_gen_subreg (pair_mode, operands[0], + mode, 2 * elt_size), + simplify_gen_subreg (pair_mode, operands[1], + mode, 2 * elt_size)); + DONE; + } + else + FAIL; +}) + +(define_split [(set (match_operand:XI 0 "nonimmediate_operand") (match_operand:XI 1 "general_operand"))] "TARGET_SIMD && reload_completed" @@ -7341,91 +7492,85 @@ FAIL; }) -(define_expand "aarch64_ldr" - [(match_operand:VSTRUCT 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ldr" + [(match_operand:VSTRUCT_QD 0 "register_operand") + (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) - * ); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * ); - emit_insn (gen_aarch64_simd_ldr (operands[0], - mem)); + emit_insn (gen_aarch64_simd_ldr (operands[0], mem)); DONE; }) -(define_insn "aarch64_ld2_dreg" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD2_DREG))] +(define_insn "aarch64_ld2_dreg" + [(set (match_operand:VSTRUCT_2DNX 0 "register_operand" "=w") + (unspec:VSTRUCT_2DNX [ + (match_operand:VSTRUCT_2DNX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD2_DREG))] "TARGET_SIMD" "ld2\\t{%S0. - %T0.}, %1" [(set_attr "type" "neon_load2_2reg")] ) -(define_insn "aarch64_ld2_dreg" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD2_DREG))] +(define_insn "aarch64_ld2_dreg" + [(set (match_operand:VSTRUCT_2DX 0 "register_operand" "=w") + (unspec:VSTRUCT_2DX [ + (match_operand:VSTRUCT_2DX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD2_DREG))] "TARGET_SIMD" "ld1\\t{%S0.1d - %T0.1d}, %1" [(set_attr "type" "neon_load1_2reg")] ) -(define_insn "aarch64_ld3_dreg" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD3_DREG))] +(define_insn "aarch64_ld3_dreg" + [(set (match_operand:VSTRUCT_3DNX 0 "register_operand" "=w") + (unspec:VSTRUCT_3DNX [ + (match_operand:VSTRUCT_3DNX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD3_DREG))] "TARGET_SIMD" "ld3\\t{%S0. 
- %U0.}, %1" [(set_attr "type" "neon_load3_3reg")] ) -(define_insn "aarch64_ld3_dreg" - [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD3_DREG))] +(define_insn "aarch64_ld3_dreg" + [(set (match_operand:VSTRUCT_3DX 0 "register_operand" "=w") + (unspec:VSTRUCT_3DX [ + (match_operand:VSTRUCT_3DX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD3_DREG))] "TARGET_SIMD" "ld1\\t{%S0.1d - %U0.1d}, %1" [(set_attr "type" "neon_load1_3reg")] ) -(define_insn "aarch64_ld4_dreg" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD4_DREG))] +(define_insn "aarch64_ld4_dreg" + [(set (match_operand:VSTRUCT_4DNX 0 "register_operand" "=w") + (unspec:VSTRUCT_4DNX [ + (match_operand:VSTRUCT_4DNX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD4_DREG))] "TARGET_SIMD" "ld4\\t{%S0. - %V0.}, %1" [(set_attr "type" "neon_load4_4reg")] ) -(define_insn "aarch64_ld4_dreg" - [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD4_DREG))] +(define_insn "aarch64_ld4_dreg" + [(set (match_operand:VSTRUCT_4DX 0 "register_operand" "=w") + (unspec:VSTRUCT_4DX [ + (match_operand:VSTRUCT_4DX 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD4_DREG))] "TARGET_SIMD" "ld1\\t{%S0.1d - %V0.1d}, %1" [(set_attr "type" "neon_load1_4reg")] ) -(define_expand "aarch64_ld" - [(match_operand:VSTRUCT 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ld" + [(match_operand:VSTRUCT_D 0 "register_operand") + (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, * 8); - - emit_insn (gen_aarch64_ld_dreg (operands[0], mem)); + rtx mem = gen_rtx_MEM (mode, operands[1]); + emit_insn (gen_aarch64_ld_dreg (operands[0], mem)); DONE; }) @@ -7444,97 +7589,42 @@ DONE; }) -(define_expand "aarch64_ld" - [(match_operand:VSTRUCT 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_SIMD" -{ - machine_mode mode = mode; - rtx mem = gen_rtx_MEM (mode, operands[1]); - - emit_insn (gen_aarch64_simd_ld (operands[0], mem)); - DONE; -}) - -(define_expand "aarch64_ld1x2" - [(match_operand:OI 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ld" + [(match_operand:VSTRUCT_Q 0 "register_operand") + (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { - machine_mode mode = OImode; - rtx mem = gen_rtx_MEM (mode, operands[1]); - - emit_insn (gen_aarch64_simd_ld1_x2 (operands[0], mem)); + rtx mem = gen_rtx_MEM (mode, operands[1]); + emit_insn (gen_aarch64_simd_ld (operands[0], mem)); DONE; }) -(define_expand "aarch64_ld1x2" - [(match_operand:OI 0 "register_operand") - (match_operand:DI 1 "register_operand") - (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] +(define_expand "aarch64_ld1x2" + [(match_operand:VSTRUCT_2QD 0 "register_operand") + (match_operand:DI 1 "register_operand")] "TARGET_SIMD" { - machine_mode mode = OImode; + machine_mode mode = mode; rtx mem = gen_rtx_MEM (mode, operands[1]); - emit_insn (gen_aarch64_simd_ld1_x2 (operands[0], mem)); 
+ emit_insn (gen_aarch64_simd_ld1_x2 (operands[0], mem)); DONE; }) - -(define_expand "aarch64_ld_lane" - [(match_operand:VSTRUCT 0 "register_operand") +(define_expand "aarch64_ld_lane" + [(match_operand:VSTRUCT_QD 0 "register_operand") (match_operand:DI 1 "register_operand") - (match_operand:VSTRUCT 2 "register_operand") - (match_operand:SI 3 "immediate_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_QD 2 "register_operand") + (match_operand:SI 3 "immediate_operand")] "TARGET_SIMD" { rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) - * ); - - aarch64_simd_lane_bounds (operands[3], 0, , NULL); - emit_insn (gen_aarch64_vec_load_lanes_lane ( - operands[0], mem, operands[2], operands[3])); - DONE; -}) - -;; Expanders for builtins to extract vector registers from large -;; opaque integer modes. - -;; D-register list. + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * ); -(define_expand "aarch64_get_dreg" - [(match_operand:VDC 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (match_operand:SI 2 "immediate_operand")] - "TARGET_SIMD" -{ - int part = INTVAL (operands[2]); - rtx temp = gen_reg_rtx (mode); - int offset = part * 16; - - emit_move_insn (temp, gen_rtx_SUBREG (mode, operands[1], offset)); - emit_move_insn (operands[0], gen_lowpart (mode, temp)); - DONE; -}) - -;; Q-register list. - -(define_expand "aarch64_get_qreg" - [(match_operand:VQ 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (match_operand:SI 2 "immediate_operand")] - "TARGET_SIMD" -{ - int part = INTVAL (operands[2]); - int offset = part * 16; - - emit_move_insn (operands[0], - gen_rtx_SUBREG (mode, operands[1], offset)); + aarch64_simd_lane_bounds (operands[3], 0, + GET_MODE_NUNITS (mode).to_constant () / , NULL); + emit_insn (gen_aarch64_vec_load_lanes_lane (operands[0], + mem, operands[2], operands[3])); DONE; }) @@ -7581,7 +7671,7 @@ (define_insn "aarch64_qtbl2" [(set (match_operand:VB 0 "register_operand" "=w") - (unspec:VB [(match_operand:OI 1 "register_operand" "w") + (unspec:VB [(match_operand:V2x16QI 1 "register_operand" "w") (match_operand:VB 2 "register_operand" "w")] UNSPEC_TBL))] "TARGET_SIMD" @@ -7592,7 +7682,7 @@ (define_insn "aarch64_qtbx2" [(set (match_operand:VB 0 "register_operand" "=w") (unspec:VB [(match_operand:VB 1 "register_operand" "0") - (match_operand:OI 2 "register_operand" "w") + (match_operand:V2x16QI 2 "register_operand" "w") (match_operand:VB 3 "register_operand" "w")] UNSPEC_TBX))] "TARGET_SIMD" @@ -7604,7 +7694,7 @@ (define_insn "aarch64_qtbl3" [(set (match_operand:VB 0 "register_operand" "=w") - (unspec:VB [(match_operand:CI 1 "register_operand" "w") + (unspec:VB [(match_operand:V3x16QI 1 "register_operand" "w") (match_operand:VB 2 "register_operand" "w")] UNSPEC_TBL))] "TARGET_SIMD" @@ -7615,7 +7705,7 @@ (define_insn "aarch64_qtbx3" [(set (match_operand:VB 0 "register_operand" "=w") (unspec:VB [(match_operand:VB 1 "register_operand" "0") - (match_operand:CI 2 "register_operand" "w") + (match_operand:V3x16QI 2 "register_operand" "w") (match_operand:VB 3 "register_operand" "w")] UNSPEC_TBX))] "TARGET_SIMD" @@ -7627,7 +7717,7 @@ (define_insn "aarch64_qtbl4" [(set (match_operand:VB 0 "register_operand" "=w") - (unspec:VB [(match_operand:XI 1 "register_operand" "w") + (unspec:VB [(match_operand:V4x16QI 1 "register_operand" "w") (match_operand:VB 2 "register_operand" "w")] UNSPEC_TBL))] "TARGET_SIMD" @@ -7638,7 +7728,7 @@ (define_insn 
"aarch64_qtbx4" [(set (match_operand:VB 0 "register_operand" "=w") (unspec:VB [(match_operand:VB 1 "register_operand" "0") - (match_operand:XI 2 "register_operand" "w") + (match_operand:V4x16QI 2 "register_operand" "w") (match_operand:VB 3 "register_operand" "w")] UNSPEC_TBX))] "TARGET_SIMD" @@ -7647,10 +7737,10 @@ ) (define_insn_and_split "aarch64_combinev16qi" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:V16QI 1 "register_operand" "w") - (match_operand:V16QI 2 "register_operand" "w")] - UNSPEC_CONCAT))] + [(set (match_operand:V2x16QI 0 "register_operand" "=w") + (unspec:V2x16QI [(match_operand:V16QI 1 "register_operand" "w") + (match_operand:V16QI 2 "register_operand" "w")] + UNSPEC_CONCAT))] "TARGET_SIMD" "#" "&& reload_completed" @@ -7706,105 +7796,99 @@ [(set_attr "type" "neon_rev")] ) -(define_insn "aarch64_st2_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:OI 1 "register_operand" "w") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST2))] +(define_insn "aarch64_st2_dreg" + [(set (match_operand:VSTRUCT_2DNX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_2DNX [ + (match_operand:VSTRUCT_2DNX 1 "register_operand" "w")] + UNSPEC_ST2))] "TARGET_SIMD" "st2\\t{%S1. - %T1.}, %0" [(set_attr "type" "neon_store2_2reg")] ) -(define_insn "aarch64_st2_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:OI 1 "register_operand" "w") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST2))] +(define_insn "aarch64_st2_dreg" + [(set (match_operand:VSTRUCT_2DX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_2DX [ + (match_operand:VSTRUCT_2DX 1 "register_operand" "w")] + UNSPEC_ST2))] "TARGET_SIMD" "st1\\t{%S1.1d - %T1.1d}, %0" [(set_attr "type" "neon_store1_2reg")] ) -(define_insn "aarch64_st3_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:CI 1 "register_operand" "w") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST3))] +(define_insn "aarch64_st3_dreg" + [(set (match_operand:VSTRUCT_3DNX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_3DNX [ + (match_operand:VSTRUCT_3DNX 1 "register_operand" "w")] + UNSPEC_ST3))] "TARGET_SIMD" "st3\\t{%S1. - %U1.}, %0" [(set_attr "type" "neon_store3_3reg")] ) -(define_insn "aarch64_st3_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:CI 1 "register_operand" "w") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST3))] +(define_insn "aarch64_st3_dreg" + [(set (match_operand:VSTRUCT_3DX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_3DX [ + (match_operand:VSTRUCT_3DX 1 "register_operand" "w")] + UNSPEC_ST3))] "TARGET_SIMD" "st1\\t{%S1.1d - %U1.1d}, %0" [(set_attr "type" "neon_store1_3reg")] ) -(define_insn "aarch64_st4_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:XI 1 "register_operand" "w") - (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST4))] +(define_insn "aarch64_st4_dreg" + [(set (match_operand:VSTRUCT_4DNX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_4DNX [ + (match_operand:VSTRUCT_4DNX 1 "register_operand" "w")] + UNSPEC_ST4))] "TARGET_SIMD" "st4\\t{%S1. 
- %V1.}, %0" [(set_attr "type" "neon_store4_4reg")] ) -(define_insn "aarch64_st4_dreg" - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:BLK [(match_operand:XI 1 "register_operand" "w") - (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_ST4))] +(define_insn "aarch64_st4_dreg" + [(set (match_operand:VSTRUCT_4DX 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VSTRUCT_4DX [ + (match_operand:VSTRUCT_4DX 1 "register_operand" "w")] + UNSPEC_ST4))] "TARGET_SIMD" "st1\\t{%S1.1d - %V1.1d}, %0" [(set_attr "type" "neon_store1_4reg")] ) -(define_expand "aarch64_st" +(define_expand "aarch64_st" [(match_operand:DI 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_D 1 "register_operand")] "TARGET_SIMD" { - rtx mem = gen_rtx_MEM (BLKmode, operands[0]); - set_mem_size (mem, * 8); - - emit_insn (gen_aarch64_st_dreg (mem, operands[1])); + rtx mem = gen_rtx_MEM (mode, operands[0]); + emit_insn (gen_aarch64_st_dreg (mem, operands[1])); DONE; }) -(define_expand "aarch64_st" +(define_expand "aarch64_st" [(match_operand:DI 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (match_operand:VSTRUCT_Q 1 "register_operand")] "TARGET_SIMD" { - machine_mode mode = mode; - rtx mem = gen_rtx_MEM (mode, operands[0]); - - emit_insn (gen_aarch64_simd_st (mem, operands[1])); + rtx mem = gen_rtx_MEM (mode, operands[0]); + emit_insn (gen_aarch64_simd_st (mem, operands[1])); DONE; }) -(define_expand "aarch64_st_lane" +(define_expand "aarch64_st_lane" [(match_operand:DI 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) + (match_operand:VSTRUCT_QD 1 "register_operand") (match_operand:SI 2 "immediate_operand")] "TARGET_SIMD" { rtx mem = gen_rtx_MEM (BLKmode, operands[0]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) - * ); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (mode)) * ); - emit_insn (gen_aarch64_vec_store_lanes_lane ( - mem, operands[1], operands[2])); + aarch64_simd_lane_bounds (operands[2], 0, + GET_MODE_NUNITS (mode).to_constant () / , NULL); + emit_insn (gen_aarch64_vec_store_lanes_lane (mem, + operands[1], operands[2])); DONE; }) @@ -7823,28 +7907,6 @@ DONE; }) -;; Expander for builtins to insert vector registers into large -;; opaque integer modes. - -;; Q-register list. We don't need a D-reg inserter as we zero -;; extend them in arm_neon.h and insert the resulting Q-regs. - -(define_expand "aarch64_set_qreg" - [(match_operand:VSTRUCT 0 "register_operand") - (match_operand:VSTRUCT 1 "register_operand") - (match_operand:VQ 2 "register_operand") - (match_operand:SI 3 "immediate_operand")] - "TARGET_SIMD" -{ - int part = INTVAL (operands[3]); - int offset = part * 16; - - emit_move_insn (operands[0], operands[1]); - emit_move_insn (gen_rtx_SUBREG (mode, operands[0], offset), - operands[2]); - DONE; -}) - ;; Standard pattern name vec_init. (define_expand "vec_init" @@ -7874,21 +7936,11 @@ [(set_attr "type" "neon_load1_all_lanes")] ) -(define_insn "aarch64_simd_ld1_x2" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD1))] - "TARGET_SIMD" - "ld1\\t{%S0. 
- %T0.}, %1" - [(set_attr "type" "neon_load1_2reg")] -) - -(define_insn "aarch64_simd_ld1_x2" - [(set (match_operand:OI 0 "register_operand" "=w") - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv") - (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_LD1))] +(define_insn "aarch64_simd_ld1_x2" + [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w") + (unspec:VSTRUCT_2QD [ + (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")] + UNSPEC_LD1))] "TARGET_SIMD" "ld1\\t{%S0. - %T0.}, %1" [(set_attr "type" "neon_load1_2reg")] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1780751..f7c3f80 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2863,14 +2863,6 @@ aarch64_estimated_sve_vq () return estimated_poly_value (BITS_PER_SVE_VECTOR) / 128; } -/* Return true if MODE is any of the Advanced SIMD structure modes. */ -bool -aarch64_advsimd_struct_mode_p (machine_mode mode) -{ - return (TARGET_SIMD - && (mode == OImode || mode == CImode || mode == XImode)); -} - /* Return true if MODE is an SVE predicate mode. */ static bool aarch64_sve_pred_mode_p (machine_mode mode) @@ -2901,9 +2893,6 @@ const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA; static unsigned int aarch64_classify_vector_mode (machine_mode mode) { - if (aarch64_advsimd_struct_mode_p (mode)) - return VEC_ADVSIMD | VEC_STRUCT; - if (aarch64_sve_pred_mode_p (mode)) return VEC_SVE_PRED; @@ -2970,6 +2959,65 @@ aarch64_classify_vector_mode (machine_mode mode) case E_VNx8DFmode: return TARGET_SVE ? VEC_SVE_DATA | VEC_STRUCT : 0; + case E_OImode: + case E_CImode: + case E_XImode: + return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT : 0; + + /* Structures of 64-bit Advanced SIMD vectors. */ + case E_V2x8QImode: + case E_V2x4HImode: + case E_V2x2SImode: + case E_V2x1DImode: + case E_V2x4BFmode: + case E_V2x4HFmode: + case E_V2x2SFmode: + case E_V2x1DFmode: + case E_V3x8QImode: + case E_V3x4HImode: + case E_V3x2SImode: + case E_V3x1DImode: + case E_V3x4BFmode: + case E_V3x4HFmode: + case E_V3x2SFmode: + case E_V3x1DFmode: + case E_V4x8QImode: + case E_V4x4HImode: + case E_V4x2SImode: + case E_V4x1DImode: + case E_V4x4BFmode: + case E_V4x4HFmode: + case E_V4x2SFmode: + case E_V4x1DFmode: + return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL : 0; + + /* Structures of 128-bit Advanced SIMD vectors. */ + case E_V2x16QImode: + case E_V2x8HImode: + case E_V2x4SImode: + case E_V2x2DImode: + case E_V2x8BFmode: + case E_V2x8HFmode: + case E_V2x4SFmode: + case E_V2x2DFmode: + case E_V3x16QImode: + case E_V3x8HImode: + case E_V3x4SImode: + case E_V3x2DImode: + case E_V3x8BFmode: + case E_V3x8HFmode: + case E_V3x4SFmode: + case E_V3x2DFmode: + case E_V4x16QImode: + case E_V4x8HImode: + case E_V4x4SImode: + case E_V4x2DImode: + case E_V4x8BFmode: + case E_V4x8HFmode: + case E_V4x4SFmode: + case E_V4x2DFmode: + return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT : 0; + /* 64-bit Advanced SIMD vectors. */ case E_V8QImode: case E_V4HImode: @@ -2995,6 +3043,29 @@ aarch64_classify_vector_mode (machine_mode mode) } } +/* Return true if MODE is any of the Advanced SIMD structure modes. */ +bool +aarch64_advsimd_struct_mode_p (machine_mode mode) +{ + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + return (vec_flags & VEC_ADVSIMD) && (vec_flags & VEC_STRUCT); +} + +/* Return true if MODE is an Advanced SIMD D-register structure mode. 
*/ +static bool +aarch64_advsimd_partial_struct_mode_p (machine_mode mode) +{ + return (aarch64_classify_vector_mode (mode) + == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL)); +} + +/* Return true if MODE is an Advanced SIMD Q-register structure mode. */ +static bool +aarch64_advsimd_full_struct_mode_p (machine_mode mode) +{ + return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT)); +} + /* Return true if MODE is any of the data vector modes, including structure modes. */ static bool @@ -3037,14 +3108,53 @@ aarch64_vl_bytes (machine_mode mode, unsigned int vec_flags) return BYTES_PER_SVE_PRED; } +/* Given an Advanced SIMD vector mode MODE and a tuple size NELEMS, return the + corresponding vector structure mode. */ +static opt_machine_mode +aarch64_advsimd_vector_array_mode (machine_mode mode, + unsigned HOST_WIDE_INT nelems) +{ + unsigned int flags = VEC_ADVSIMD | VEC_STRUCT; + if (known_eq (GET_MODE_SIZE (mode), 8)) + flags |= VEC_PARTIAL; + + machine_mode struct_mode; + FOR_EACH_MODE_IN_CLASS (struct_mode, GET_MODE_CLASS (mode)) + if (aarch64_classify_vector_mode (struct_mode) == flags + && GET_MODE_INNER (struct_mode) == GET_MODE_INNER (mode) + && known_eq (GET_MODE_NUNITS (struct_mode), + GET_MODE_NUNITS (mode) * nelems)) + return struct_mode; + return opt_machine_mode (); +} + +/* Return the SVE vector mode that has NUNITS elements of mode INNER_MODE. */ + +opt_machine_mode +aarch64_sve_data_mode (scalar_mode inner_mode, poly_uint64 nunits) +{ + enum mode_class mclass = (is_a (inner_mode) + ? MODE_VECTOR_FLOAT : MODE_VECTOR_INT); + machine_mode mode; + FOR_EACH_MODE_IN_CLASS (mode, mclass) + if (inner_mode == GET_MODE_INNER (mode) + && known_eq (nunits, GET_MODE_NUNITS (mode)) + && aarch64_sve_data_mode_p (mode)) + return mode; + return opt_machine_mode (); +} + /* Implement target hook TARGET_ARRAY_MODE. */ static opt_machine_mode aarch64_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems) { if (aarch64_classify_vector_mode (mode) == VEC_SVE_DATA && IN_RANGE (nelems, 2, 4)) - return mode_for_vector (GET_MODE_INNER (mode), - GET_MODE_NUNITS (mode) * nelems); + return aarch64_sve_data_mode (GET_MODE_INNER (mode), + GET_MODE_NUNITS (mode) * nelems); + if (aarch64_classify_vector_mode (mode) == VEC_ADVSIMD + && IN_RANGE (nelems, 2, 4)) + return aarch64_advsimd_vector_array_mode (mode, nelems); return opt_machine_mode (); } @@ -3121,22 +3231,6 @@ aarch64_get_mask_mode (machine_mode mode) return default_get_mask_mode (mode); } -/* Return the SVE vector mode that has NUNITS elements of mode INNER_MODE. */ - -opt_machine_mode -aarch64_sve_data_mode (scalar_mode inner_mode, poly_uint64 nunits) -{ - enum mode_class mclass = (is_a (inner_mode) - ? MODE_VECTOR_FLOAT : MODE_VECTOR_INT); - machine_mode mode; - FOR_EACH_MODE_IN_CLASS (mode, mclass) - if (inner_mode == GET_MODE_INNER (mode) - && known_eq (nunits, GET_MODE_NUNITS (mode)) - && aarch64_sve_data_mode_p (mode)) - return mode; - return opt_machine_mode (); -} - /* Return the integer element mode associated with SVE mode MODE. 
*/ static scalar_int_mode @@ -3261,6 +3355,8 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode) if (vec_flags & VEC_SVE_DATA) return exact_div (GET_MODE_SIZE (mode), aarch64_vl_bytes (mode, vec_flags)).to_constant (); + if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL)) + return GET_MODE_SIZE (mode).to_constant () / 8; return CEIL (lowest_size, UNITS_PER_VREG); } case PR_REGS: @@ -9890,21 +9986,39 @@ aarch64_classify_address (struct aarch64_address_info *info, instruction (only big endian will get here). For ldp/stp instructions, the offset is scaled for the size of a single element of the pair. */ - if (mode == OImode) + if (aarch64_advsimd_partial_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 16)) + return aarch64_offset_7bit_signed_scaled_p (DImode, offset); + if (aarch64_advsimd_full_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 32)) return aarch64_offset_7bit_signed_scaled_p (TImode, offset); /* Three 9/12 bit offsets checks because CImode will emit three ldr/str instructions (only big endian will get here). */ - if (mode == CImode) + if (aarch64_advsimd_partial_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 24)) + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) + && (aarch64_offset_9bit_signed_unscaled_p (DImode, + offset + 16) + || offset_12bit_unsigned_scaled_p (DImode, + offset + 16))); + if (aarch64_advsimd_full_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 48)) return (aarch64_offset_7bit_signed_scaled_p (TImode, offset) - && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, + && (aarch64_offset_9bit_signed_unscaled_p (TImode, offset + 32) - || offset_12bit_unsigned_scaled_p (V16QImode, + || offset_12bit_unsigned_scaled_p (TImode, offset + 32))); /* Two 7bit offsets checks because XImode will emit two ldp/stp instructions (only big endian will get here). 
*/ - if (mode == XImode) + if (aarch64_advsimd_partial_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 32)) + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) + && aarch64_offset_7bit_signed_scaled_p (DImode, + offset + 16)); + if (aarch64_advsimd_full_struct_mode_p (mode) + && known_eq (GET_MODE_SIZE (mode), 64)) return (aarch64_offset_7bit_signed_scaled_p (TImode, offset) && aarch64_offset_7bit_signed_scaled_p (TImode, offset + 32)); @@ -10991,7 +11105,10 @@ aarch64_print_operand (FILE *f, rtx x, int code) break; case 'R': - if (REG_P (x) && FP_REGNUM_P (REGNO (x))) + if (REG_P (x) && FP_REGNUM_P (REGNO (x)) + && (aarch64_advsimd_partial_struct_mode_p (GET_MODE (x)))) + asm_fprintf (f, "d%d", REGNO (x) - V0_REGNUM + 1); + else if (REG_P (x) && FP_REGNUM_P (REGNO (x))) asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1); else if (REG_P (x) && GP_REGNUM_P (REGNO (x))) asm_fprintf (f, "x%d", REGNO (x) - R0_REGNUM + 1); @@ -22343,7 +22460,7 @@ aarch64_expand_vec_perm_1 (rtx target, rtx op0, rtx op1, rtx sel) } else { - pair = gen_reg_rtx (OImode); + pair = gen_reg_rtx (V2x16QImode); emit_insn (gen_aarch64_combinev16qi (pair, op0, op1)); emit_insn (gen_aarch64_qtbl2v16qi (target, pair, sel)); } @@ -23320,6 +23437,12 @@ aarch64_expand_sve_vcond (machine_mode data_mode, machine_mode cmp_mode, static bool aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2) { + if ((aarch64_advsimd_partial_struct_mode_p (mode1) + != aarch64_advsimd_partial_struct_mode_p (mode2)) + && maybe_gt (GET_MODE_SIZE (mode1), 8) + && maybe_gt (GET_MODE_SIZE (mode2), 8)) + return false; + if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) return true; @@ -25175,6 +25298,10 @@ aarch64_can_change_mode_class (machine_mode from, bool from_pred_p = (from_flags & VEC_SVE_PRED); bool to_pred_p = (to_flags & VEC_SVE_PRED); + bool from_full_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT)); + bool to_partial_advsimd_struct_p = (to_flags == (VEC_ADVSIMD | VEC_STRUCT + | VEC_PARTIAL)); + /* Don't allow changes between predicate modes and other modes. Only predicate registers can hold predicate modes and only non-predicate registers can hold non-predicate modes, so any @@ -25195,6 +25322,11 @@ aarch64_can_change_mode_class (machine_mode from, || GET_MODE_UNIT_SIZE (from) != GET_MODE_UNIT_SIZE (to))) return false; + /* Don't allow changes between partial and full Advanced SIMD structure + modes. 
*/ + if (from_full_advsimd_struct_p && to_partial_advsimd_struct_p) + return false; + if (maybe_ne (BITS_PER_SVE_VECTOR, 128u)) { /* Don't allow changes between SVE modes and other modes that might diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ed0dfa9..9838c39 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8676,14 +8676,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_f16 (float16_t *__ptr, float16x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - float16x8x2_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st2_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -8691,14 +8684,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_f32 (float32_t *__ptr, float32x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - float32x4x2_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st2_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -8706,14 +8692,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_f64 (float64_t *__ptr, float64x1x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - float64x2x2_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanedf ((__builtin_aarch64_simd_df *) __ptr, __o, + __builtin_aarch64_st2_lanedf ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -8721,59 +8700,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_p8 (poly8_t *__ptr, poly8x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - poly8x16x2_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev8qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_p16 (poly16_t *__ptr, poly16x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - poly16x8x2_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + 
__builtin_aarch64_st2_lanev4hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_p64 (poly64_t *__ptr, poly64x1x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - poly64x2x2_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanedi_sps ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_s8 (int8_t *__ptr, int8x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - int8x16x2_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st2_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -8781,14 +8732,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_s16 (int16_t *__ptr, int16x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - int16x8x2_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st2_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -8796,14 +8740,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_s32 (int32_t *__ptr, int32x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - int32x4x2_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st2_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -8811,14 +8748,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_s64 (int64_t *__ptr, int64x1x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - int64x2x2_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st2_lanedi ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -8826,69 +8756,39 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_u8 (uint8_t *__ptr, uint8x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - uint8x16x2_t __temp; - __temp.val[0] = vcombine_u8 
(__val.val[0], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev8qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_u16 (uint16_t *__ptr, uint16x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - uint16x8x2_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev4hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_u32 (uint32_t *__ptr, uint32x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - uint32x4x2_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev2si_sus ((__builtin_aarch64_simd_si *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_u64 (uint64_t *__ptr, uint64x1x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - uint64x2x2_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanedi_sus ((__builtin_aarch64_simd_di *) __ptr, __val, + __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_f16 (float16_t *__ptr, float16x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st2_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -8896,9 +8796,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_f32 (float32_t *__ptr, float32x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st2_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -8906,9 +8804,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_f64 (float64_t *__ptr, float64x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __o, + 
__builtin_aarch64_st2_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -8916,39 +8812,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_p8 (poly8_t *__ptr, poly8x16x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev16qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_p16 (poly16_t *__ptr, poly16x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev8hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_p64 (poly64_t *__ptr, poly64x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev2di_sps ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_s8 (int8_t *__ptr, int8x16x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st2_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -8956,9 +8844,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_s16 (int16_t *__ptr, int16x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st2_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -8966,9 +8852,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_s32 (int32_t *__ptr, int32x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st2_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -8976,9 +8860,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_s64 (int64_t *__ptr, int64x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st2_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -8986,56 +8868,39 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_u8 (uint8_t *__ptr, uint8x16x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, 
- __lane); + __builtin_aarch64_st2_lanev16qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_u16 (uint16_t *__ptr, uint16x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev8hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_u32 (uint32_t *__ptr, uint32x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev4si_sus ((__builtin_aarch64_simd_si *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_u64 (uint64_t *__ptr, uint64x2x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st2_lanev2di_sus ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_f16 (float16_t *__ptr, float16x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - float16x8x3_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st3_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -9043,16 +8908,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_f32 (float32_t *__ptr, float32x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - float32x4x3_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st3_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -9060,16 +8916,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_f64 (float64_t *__ptr, float64x1x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - float64x2x3_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanedf ((__builtin_aarch64_simd_df *) __ptr, __o, + 
__builtin_aarch64_st3_lanedf ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -9077,67 +8924,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_p8 (poly8_t *__ptr, poly8x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - poly8x16x3_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev8qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_p16 (poly16_t *__ptr, poly16x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - poly16x8x3_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p16 (__val.val[2], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev4hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_p64 (poly64_t *__ptr, poly64x1x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - poly64x2x3_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanedi_sps ((__builtin_aarch64_simd_di *) __ptr, __val, + __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_s8 (int8_t *__ptr, int8x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - int8x16x3_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st3_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -9145,16 +8956,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_s16 (int16_t *__ptr, int16x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - int16x8x3_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - 
__builtin_aarch64_st3_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st3_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -9162,16 +8964,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_s32 (int32_t *__ptr, int32x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - int32x4x3_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st3_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -9179,16 +8972,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_s64 (int64_t *__ptr, int64x1x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - int64x2x3_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st3_lanedi ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -9196,77 +8980,39 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_u8 (uint8_t *__ptr, uint8x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - uint8x16x3_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev8qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_u16 (uint16_t *__ptr, uint16x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - uint16x8x3_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev4hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_u32 (uint32_t *__ptr, uint32x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - uint32x4x3_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], - vcreate_u32 
(__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev2si_sus ((__builtin_aarch64_simd_si *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_u64 (uint64_t *__ptr, uint64x1x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - uint64x2x3_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanedi_sus ((__builtin_aarch64_simd_di *) __ptr, __val, + __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_f16 (float16_t *__ptr, float16x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st3_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -9274,9 +9020,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_f32 (float32_t *__ptr, float32x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st3_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -9284,9 +9028,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_f64 (float64_t *__ptr, float64x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __o, + __builtin_aarch64_st3_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -9294,39 +9036,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_p8 (poly8_t *__ptr, poly8x16x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev16qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_p16 (poly16_t *__ptr, poly16x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev8hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_p64 (poly64_t *__ptr, poly64x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, 
- __lane); + __builtin_aarch64_st3_lanev2di_sps ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_s8 (int8_t *__ptr, int8x16x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st3_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -9334,9 +9068,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_s16 (int16_t *__ptr, int16x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st3_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -9344,9 +9076,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_s32 (int32_t *__ptr, int32x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st3_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -9354,9 +9084,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_s64 (int64_t *__ptr, int64x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st3_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -9364,58 +9092,39 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_u8 (uint8_t *__ptr, uint8x16x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev16qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_u16 (uint16_t *__ptr, uint16x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev8hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_u32 (uint32_t *__ptr, uint32x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st3_lanev4si_sus ((__builtin_aarch64_simd_si *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_u64 (uint64_t *__ptr, uint64x2x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, - 
__lane); + __builtin_aarch64_st3_lanev2di_sus ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_f16 (float16_t *__ptr, float16x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - float16x8x4_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f16 (__val.val[3], - vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st4_lanev4hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -9423,18 +9132,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_f32 (float32_t *__ptr, float32x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - float32x4x4_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f32 (__val.val[3], - vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st4_lanev2sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -9442,18 +9140,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_f64 (float64_t *__ptr, float64x1x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - float64x2x4_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f64 (__val.val[3], - vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanedf ((__builtin_aarch64_simd_df *) __ptr, __o, + __builtin_aarch64_st4_lanedf ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -9461,75 +9148,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_p8 (poly8_t *__ptr, poly8x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - poly8x16x4_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p8 (__val.val[3], - vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev8qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_p16 (poly16_t *__ptr, poly16x4x4_t __val, const int __lane) { - 
__builtin_aarch64_simd_xi __o; - poly16x8x4_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p16 (__val.val[2], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p16 (__val.val[3], - vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev4hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_p64 (poly64_t *__ptr, poly64x1x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - poly64x2x4_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p64 (__val.val[3], - vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanedi_sps ((__builtin_aarch64_simd_di *) __ptr, __val, + __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_s8 (int8_t *__ptr, int8x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - int8x16x4_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_s8 (__val.val[3], - vcreate_s8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st4_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -9537,18 +9180,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_s16 (int16_t *__ptr, int16x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - int16x8x4_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_s16 (__val.val[3], - vcreate_s16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st4_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -9556,18 +9188,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_s32 (int32_t *__ptr, int32x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - int32x4x4_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], - vcreate_s32 (__AARCH64_UINT64_C 
(0))); - __temp.val[3] = vcombine_s32 (__val.val[3], - vcreate_s32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st4_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -9575,18 +9196,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_s64 (int64_t *__ptr, int64x1x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - int64x2x4_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_s64 (__val.val[3], - vcreate_s64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st4_lanedi ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -9594,85 +9204,39 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_u8 (uint8_t *__ptr, uint8x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - uint8x16x4_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u8 (__val.val[3], - vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev8qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev8qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_u16 (uint16_t *__ptr, uint16x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - uint16x8x4_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u16 (__val.val[3], - vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev4hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev4hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_u32 (uint32_t *__ptr, uint32x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - uint32x4x4_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u32 (__val.val[3], - vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev2si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev2si_sus ((__builtin_aarch64_simd_si *) 
__ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_u64 (uint64_t *__ptr, uint64x1x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - uint64x2x4_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u64 (__val.val[3], - vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanedi ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanedi_sus ((__builtin_aarch64_simd_di *) __ptr, __val, + __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_f16 (float16_t *__ptr, float16x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __o, + __builtin_aarch64_st4_lanev8hf ((__builtin_aarch64_simd_hf *) __ptr, __val, __lane); } @@ -9680,9 +9244,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_f32 (float32_t *__ptr, float32x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __o, + __builtin_aarch64_st4_lanev4sf ((__builtin_aarch64_simd_sf *) __ptr, __val, __lane); } @@ -9690,9 +9252,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_f64 (float64_t *__ptr, float64x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __o, + __builtin_aarch64_st4_lanev2df ((__builtin_aarch64_simd_df *) __ptr, __val, __lane); } @@ -9700,39 +9260,31 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_p8 (poly8_t *__ptr, poly8x16x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev16qi_sps ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_p16 (poly16_t *__ptr, poly16x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev8hi_sps ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_p64 (poly64_t *__ptr, poly64x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev2di_sps ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline void 
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_s8 (int8_t *__ptr, int8x16x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, + __builtin_aarch64_st4_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __val, __lane); } @@ -9740,9 +9292,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_s16 (int16_t *__ptr, int16x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, + __builtin_aarch64_st4_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __val, __lane); } @@ -9750,9 +9300,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_s32 (int32_t *__ptr, int32x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, + __builtin_aarch64_st4_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __val, __lane); } @@ -9760,9 +9308,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_s64 (int64_t *__ptr, int64x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, + __builtin_aarch64_st4_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __val, __lane); } @@ -9770,40 +9316,32 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_u8 (uint8_t *__ptr, uint8x16x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev16qi_sus ((__builtin_aarch64_simd_qi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_u16 (uint16_t *__ptr, uint16x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev8hi ((__builtin_aarch64_simd_hi *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev8hi_sus ((__builtin_aarch64_simd_hi *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_u32 (uint32_t *__ptr, uint32x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev4si ((__builtin_aarch64_simd_si *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev4si_sus ((__builtin_aarch64_simd_si *) __ptr, + __val, __lane); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_u64 (uint64_t *__ptr, uint64x2x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev2di ((__builtin_aarch64_simd_di *) __ptr, __o, - __lane); + __builtin_aarch64_st4_lanev2di_sus ((__builtin_aarch64_simd_di *) __ptr, + __val, __lane); } __extension__ extern __inline int64_t 
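The lane-store rewrites above keep the public intrinsic signatures; only the header internals stop marshalling the tuple arguments through the opaque OI/CI/XI temporaries. A minimal caller-side sketch (the function name, buffer and lane index are hypothetical, not part of the patch) of code that is source-compatible before and after this change:

#include <arm_neon.h>

void
store_two_rows (int16_t *out, int16x8_t a, int16x8_t b)
{
  /* int16x8x2_t is the C-level vector-tuple type; with this patch the
     header passes it straight to the st2_lane builtin instead of first
     copying it into a __builtin_aarch64_simd_oi temporary.  */
  int16x8x2_t rows = { { a, b } };
  vst2q_lane_s16 (out, rows, 3);
}
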
@@ -10020,12 +9558,10 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl3_s8 (int8x8x3_t __tab, int8x8_t __idx) { int8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_s8 (__tab.val[2], vcreate_s8 (__AARCH64_UINT64_C (0x0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return __builtin_aarch64_qtbl2v8qi (__o, __idx); + return __builtin_aarch64_qtbl2v8qi (__temp, __idx); } __extension__ extern __inline uint8x8_t @@ -10033,12 +9569,10 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl3_u8 (uint8x8x3_t __tab, uint8x8_t __idx) { uint8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_u8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_u8 (__tab.val[2], vcreate_u8 (__AARCH64_UINT64_C (0x0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return (uint8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_uuu (__temp, __idx); } __extension__ extern __inline poly8x8_t @@ -10046,12 +9580,10 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl3_p8 (poly8x8x3_t __tab, uint8x8_t __idx) { poly8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_p8 (__tab.val[2], vcreate_p8 (__AARCH64_UINT64_C (0x0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return (poly8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_ppu (__temp, __idx); } __extension__ extern __inline int8x8_t @@ -10059,11 +9591,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl4_s8 (int8x8x4_t __tab, int8x8_t __idx) { int8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_s8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return __builtin_aarch64_qtbl2v8qi (__o, __idx); + return __builtin_aarch64_qtbl2v8qi (__temp, __idx); } __extension__ extern __inline uint8x8_t @@ -10071,11 +9601,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl4_u8 (uint8x8x4_t __tab, uint8x8_t __idx) { uint8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_u8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_u8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return (uint8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_uuu (__temp, __idx); } __extension__ extern __inline poly8x8_t @@ -10083,11 +9611,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbl4_p8 (poly8x8x4_t __tab, uint8x8_t __idx) { poly8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_p8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return(poly8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_ppu (__temp, __idx); } __extension__ extern __inline int8x8_t @@ -15653,366 +15179,211 @@ __extension__ extern __inline uint8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u8_x3 (const uint8_t *__a) { - uint8x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = (__builtin_aarch64_simd_ci)__builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) 
__a); - __i.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - __i.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - __i.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s8_x3 (const int8_t *__a) { - int8x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); - __i.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - __i.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - __i.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u16_x3 (const uint16_t *__a) { - uint16x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - __i.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - __i.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s16_x3 (const int16_t *__a) { - int16x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - __i.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - __i.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u32_x3 (const uint32_t *__a) { - uint32x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2si ((const __builtin_aarch64_simd_si *) __a); - __i.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - __i.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - __i.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s32_x3 (const int32_t *__a) { - int32x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2si ((const __builtin_aarch64_simd_si *) __a); - __i.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - __i.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - __i.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u64_x3 (const uint64_t *__a) { - uint64x1x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (uint64x1_t) 
__builtin_aarch64_get_dregcidi (__o, 0); - __i.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __i.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s64_x3 (const int64_t *__a) { - int64x1x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __i.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __i.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - - return __i; + return __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f16_x3 (const float16_t *__a) { - float16x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4hf ((const __builtin_aarch64_simd_hf *) __a); - __i.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0); - __i.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1); - __i.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f32_x3 (const float32_t *__a) { - float32x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2sf ((const __builtin_aarch64_simd_sf *) __a); - __i.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0); - __i.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1); - __i.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f64_x3 (const float64_t *__a) { - float64x1x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3df ((const __builtin_aarch64_simd_df *) __a); - __i.val[0] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __i.val[1] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __i.val[2] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p8_x3 (const poly8_t *__a) { - poly8x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); - __i.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - __i.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - __i.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p16_x3 (const poly16_t *__a) { - poly16x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi 
(__o, 0); - __i.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - __i.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p64_x3 (const poly64_t *__a) { - poly64x1x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __i.val[1] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __i.val[2] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - -return __i; + return __builtin_aarch64_ld1x3di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u8_x3 (const uint8_t *__a) { - uint8x16x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v16qi ((const __builtin_aarch64_simd_qi *) __a); - __i.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - __i.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - __i.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s8_x3 (const int8_t *__a) { - int8x16x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v16qi ((const __builtin_aarch64_simd_qi *) __a); - __i.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - __i.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - __i.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v16qi ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u16_x3 (const uint16_t *__a) { - uint16x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - __i.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - __i.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s16_x3 (const int16_t *__a) { - int16x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - __i.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - __i.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u32_x3 (const uint32_t *__a) { - uint32x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4si ((const __builtin_aarch64_simd_si *) __a); - __i.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); 
- __i.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - __i.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s32_x3 (const int32_t *__a) { - int32x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4si ((const __builtin_aarch64_simd_si *) __a); - __i.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - __i.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - __i.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u64_x3 (const uint64_t *__a) { - uint64x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - __i.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - __i.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s64_x3 (const int64_t *__a) { - int64x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - __i.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - __i.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f16_x3 (const float16_t *__a) { - float16x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8hf ((const __builtin_aarch64_simd_hf *) __a); - __i.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0); - __i.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1); - __i.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f32_x3 (const float32_t *__a) { - float32x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4sf ((const __builtin_aarch64_simd_sf *) __a); - __i.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0); - __i.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1); - __i.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f64_x3 (const float64_t *__a) { - float64x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2df ((const __builtin_aarch64_simd_df *) __a); - __i.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0); - 
__i.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1); - __i.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p8_x3 (const poly8_t *__a) { - poly8x16x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v16qi ((const __builtin_aarch64_simd_qi *) __a); - __i.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - __i.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - __i.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p16_x3 (const poly16_t *__a) { - poly16x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); - __i.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - __i.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - __i.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p64_x3 (const poly64_t *__a) { - poly64x2x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); - __i.val[0] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - __i.val[1] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - __i.val[2] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } /* vld1q */ @@ -16102,336 +15473,211 @@ __extension__ extern __inline uint8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u8_x2 (const uint8_t *__a) { - uint8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s8_x2 (const int8_t *__a) { - int8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u16_x2 (const uint16_t *__a) { - uint16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return 
__builtin_aarch64_ld1x2v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s16_x2 (const int16_t *__a) { - int16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u32_x2 (const uint32_t *__a) { - uint32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s32_x2 (const int32_t *__a) { - int32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u64_x2 (const uint64_t *__a) { - uint64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s64_x2 (const int64_t *__a) { - int64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f16_x2 (const float16_t *__a) { - float16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 0); - ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f32_x2 (const float32_t *__a) { - float32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0); - ret.val[1] = (float32x2_t) 
__extension__ extern __inline float32x2x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_f32_x2 (const float32_t *__a)
{
- float32x2x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v2sf ((const __builtin_aarch64_simd_sf *) __a);
- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0);
- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2v2sf ((const __builtin_aarch64_simd_sf *) __a);
}
__extension__ extern __inline float64x1x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_f64_x2 (const float64_t *__a)
{
- float64x1x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2df ((const __builtin_aarch64_simd_df *) __a);
- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)};
- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)};
- return ret;
+ return __builtin_aarch64_ld1x2df ((const __builtin_aarch64_simd_df *) __a);
}
__extension__ extern __inline poly8x8x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_p8_x2 (const poly8_t *__a)
{
- poly8x8x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v8qi ((const __builtin_aarch64_simd_qi *) __a);
- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2v8qi_ps (
+ (const __builtin_aarch64_simd_qi *) __a);
}
__extension__ extern __inline poly16x4x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_p16_x2 (const poly16_t *__a)
{
- poly16x4x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v4hi ((const __builtin_aarch64_simd_hi *) __a);
- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2v4hi_ps (
+ (const __builtin_aarch64_simd_hi *) __a);
}
__extension__ extern __inline poly64x1x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1_p64_x2 (const poly64_t *__a)
{
- poly64x1x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2di ((const __builtin_aarch64_simd_di *) __a);
- ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
- ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2di_ps (
+ (const __builtin_aarch64_simd_di *) __a);
}
__extension__ extern __inline uint8x16x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1q_u8_x2 (const uint8_t *__a)
{
- uint8x16x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v16qi ((const __builtin_aarch64_simd_qi *) __a);
- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2v16qi_us (
+ (const __builtin_aarch64_simd_qi *) __a);
}
__extension__ extern __inline int8x16x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1q_s8_x2 (const int8_t *__a)
{
- int8x16x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v16qi ((const __builtin_aarch64_simd_qi *) __a);
- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
- return ret;
+ return __builtin_aarch64_ld1x2v16qi (
+ (const __builtin_aarch64_simd_qi *) __a);
}
__extension__ extern __inline uint16x8x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vld1q_u16_x2 (const uint16_t *__a)
{
- uint16x8x2_t ret;
- __builtin_aarch64_simd_oi __o;
- __o = __builtin_aarch64_ld1x2v8hi ((const __builtin_aarch64_simd_hi *) __a);
- ret.val[0] =
(uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s16_x2 (const int16_t *__a) { - int16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u32_x2 (const uint32_t *__a) { - uint32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s32_x2 (const int32_t *__a) { - int32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u64_x2 (const uint64_t *__a) { - uint64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s64_x2 (const int64_t *__a) { - int64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f16_x2 (const float16_t *__a) { - float16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f32_x2 (const float32_t *__a) { - float32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = 
__builtin_aarch64_ld1x2v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f64_x2 (const float64_t *__a) { - float64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p8_x2 (const poly8_t *__a) { - poly8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p16_x2 (const poly16_t *__a) { - poly16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p64_x2 (const poly64_t *__a) { - poly64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint16x8_t @@ -16464,280 +15710,211 @@ __extension__ extern __inline int8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s8_x4 (const int8_t *__a) { - union { int8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s8_x4 (const int8_t *__a) { - union { int8x16x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v16qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v16qi ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s16_x4 (const int16_t *__a) { - union { int16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4hi ((const 
__builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s16_x4 (const int16_t *__a) { - union { int16x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8hi ((const __builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_s32_x4 (const int32_t *__a) { - union { int32x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2si ((const __builtin_aarch64_simd_si *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s32_x4 (const int32_t *__a) { - union { int32x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4si ((const __builtin_aarch64_simd_si *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u8_x4 (const uint8_t *__a) { - union { uint8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u8_x4 (const uint8_t *__a) { - union { uint8x16x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v16qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u16_x4 (const uint16_t *__a) { - union { uint16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4hi ((const __builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u16_x4 (const uint16_t *__a) { - union { uint16x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8hi ((const __builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u32_x4 (const uint32_t *__a) { - union { uint32x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2si ((const __builtin_aarch64_simd_si *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u32_x4 (const uint32_t *__a) { - union { uint32x4x4_t __i; 
__builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4si ((const __builtin_aarch64_simd_si *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f16_x4 (const float16_t *__a) { - union { float16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4hf ((const __builtin_aarch64_simd_hf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f16_x4 (const float16_t *__a) { - union { float16x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8hf ((const __builtin_aarch64_simd_hf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f32_x4 (const float32_t *__a) { - union { float32x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2sf ((const __builtin_aarch64_simd_sf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f32_x4 (const float32_t *__a) { - union { float32x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4sf ((const __builtin_aarch64_simd_sf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline poly8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p8_x4 (const poly8_t *__a) { - union { poly8x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p8_x4 (const poly8_t *__a) { - union { poly8x16x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v16qi ((const __builtin_aarch64_simd_qi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p16_x4 (const poly16_t *__a) { - union { poly16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4hi ((const __builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p16_x4 (const poly16_t *__a) { - union { poly16x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8hi ((const __builtin_aarch64_simd_hi *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int64x1x4_t __attribute__ ((__always_inline__, 
__gnu_inline__, __artificial__)) vld1_s64_x4 (const int64_t *__a) { - union { int64x1x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_u64_x4 (const uint64_t *__a) { - union { uint64x1x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline poly64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_p64_x4 (const poly64_t *__a) { - union { poly64x1x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_s64_x4 (const int64_t *__a) { - union { int64x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_u64_x4 (const uint64_t *__a) { - union { uint64x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline poly64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_p64_x4 (const poly64_t *__a) { - union { poly64x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2di ((const __builtin_aarch64_simd_di *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_f64_x4 (const float64_t *__a) { - union { float64x1x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4df ((const __builtin_aarch64_simd_df *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline float64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_f64_x4 (const float64_t *__a) { - union { float64x2x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v2df ((const __builtin_aarch64_simd_df *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v2df ((const __builtin_aarch64_simd_df *) __a); } /* vld1_dup */ @@ -17146,1092 +16323,622 @@ __extension__ extern __inline int64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_s64 (const int64_t * __a) { - int64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (int64x1_t) 
__builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_u64 (const uint64_t * __a) { - uint64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld2di_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_f64 (const float64_t * __a) { - float64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)}; - return ret; + return __builtin_aarch64_ld2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_s8 (const int8_t * __a) { - int8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_p8 (const poly8_t * __a) { - poly8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_p64 (const poly64_t * __a) { - poly64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregoidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregoidi_pss (__o, 1); - return ret; + return __builtin_aarch64_ld2di_ps ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_s16 (const int16_t * __a) { - int16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_p16 (const poly16_t * __a) { - poly16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = 
(poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_s32 (const int32_t * __a) { - int32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_u8 (const uint8_t * __a) { - uint8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_u16 (const uint16_t * __a) { - uint16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_u32 (const uint32_t * __a) { - uint32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld2v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_f16 (const float16_t * __a) { - float16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4hf (__a); - ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_dregoiv4hf (__o, 1); - return ret; + return __builtin_aarch64_ld2v4hf (__a); } __extension__ extern __inline float32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_f32 (const float32_t * __a) { - float32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1); - return ret; + return __builtin_aarch64_ld2v2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline int8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_s8 (const int8_t * __a) { - int8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - 
return ret; + return __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_p8 (const poly8_t * __a) { - poly8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld2v16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_s16 (const int16_t * __a) { - int16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_p16 (const poly16_t * __a) { - poly16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_p64 (const poly64_t * __a) { - poly64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di_pss (__o, 1); - return ret; + return __builtin_aarch64_ld2v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_s32 (const int32_t * __a) { - int32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_s64 (const int64_t * __a) { - int64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_u8 (const uint8_t * __a) { - uint8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) 
__builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld2v16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_u16 (const uint16_t * __a) { - uint16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld2v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_u32 (const uint32_t * __a) { - uint32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_u64 (const uint64_t * __a) { - uint64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld2v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_f16 (const float16_t * __a) { - float16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hf (__a); - ret.val[0] = __builtin_aarch64_get_qregoiv8hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1); - return ret; + return __builtin_aarch64_ld2v8hf (__a); } __extension__ extern __inline float32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_f32 (const float32_t * __a) { - float32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1); - return ret; + return __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_f64 (const float64_t * __a) { - float64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1); - return ret; + return __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_s64 (const int64_t * __a) { - int64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 
1); - ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return ret; + return __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_u64 (const uint64_t * __a) { - uint64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return ret; + return __builtin_aarch64_ld3di_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_f64 (const float64_t * __a) { - float64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)}; - ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)}; - return ret; + return __builtin_aarch64_ld3df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_s8 (const int8_t * __a) { - int8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_p8 (const poly8_t * __a) { - poly8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_s16 (const int16_t * __a) { - int16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + return __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_p16 (const poly16_t * __a) { - poly16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + 
return __builtin_aarch64_ld3v4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_s32 (const int32_t * __a) { - int32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return ret; + return __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_u8 (const uint8_t * __a) { - uint8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_u16 (const uint16_t * __a) { - uint16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + return __builtin_aarch64_ld3v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_u32 (const uint32_t * __a) { - uint32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return ret; + return __builtin_aarch64_ld3v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_f16 (const float16_t * __a) { - float16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4hf (__a); - ret.val[0] = __builtin_aarch64_get_dregciv4hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_dregciv4hf (__o, 1); - ret.val[2] = __builtin_aarch64_get_dregciv4hf (__o, 2); - return ret; + return __builtin_aarch64_ld3v4hf (__a); } __extension__ extern __inline float32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_f32 (const float32_t * __a) { - float32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1); - ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2); - return ret; + return __builtin_aarch64_ld3v2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline poly64x1x3_t __attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) vld3_p64 (const poly64_t * __a) { - poly64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 1); - ret.val[2] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 2); - return ret; + return __builtin_aarch64_ld3di_ps ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_s8 (const int8_t * __a) { - int8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_p8 (const poly8_t * __a) { - poly8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v16qi_ps ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_s16 (const int16_t * __a) { - int16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_p16 (const poly16_t * __a) { - poly16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8hi_ps ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_s32 (const int32_t * __a) { - int32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) 
vld3q_s64 (const int64_t * __a) { - int64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + return __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_u8 (const uint8_t * __a) { - uint8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3v16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_u16 (const uint16_t * __a) { - uint16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_u32 (const uint32_t * __a) { - uint32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_u64 (const uint64_t * __a) { - uint64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + return __builtin_aarch64_ld3v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_f16 (const float16_t * __a) { - float16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hf (__a); - ret.val[0] = __builtin_aarch64_get_qregciv8hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_qregciv8hf (__o, 1); - ret.val[2] = __builtin_aarch64_get_qregciv8hf (__o, 2); - return ret; + return __builtin_aarch64_ld3v8hf (__a); } __extension__ extern __inline float32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_f32 (const float32_t * __a) { - float32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4sf ((const 
__builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2); - return ret; + return __builtin_aarch64_ld3v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_f64 (const float64_t * __a) { - float64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2); - return ret; + return __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_p64 (const poly64_t * __a) { - poly64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 2); - return ret; + return __builtin_aarch64_ld3v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_s64 (const int64_t * __a) { - int64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return ret; + return __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_u64 (const uint64_t * __a) { - uint64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return ret; + return __builtin_aarch64_ld4di_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_f64 (const float64_t * __a) { - float64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)}; - ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)}; - ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)}; - return ret; + return __builtin_aarch64_ld4df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int8x8x4_t __attribute__ ((__always_inline__, 
__gnu_inline__, __artificial__)) vld4_s8 (const int8_t * __a) { - int8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_p8 (const poly8_t * __a) { - poly8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_s16 (const int16_t * __a) { - int16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_p16 (const poly16_t * __a) { - poly16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_s32 (const int32_t * __a) { - int32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1); - ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2); - ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3); - return ret; + return __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_u8 (const uint8_t * __a) { - uint8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (uint8x8_t) 
__builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_u16 (const uint16_t * __a) { - uint16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_u32 (const uint32_t * __a) { - uint32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1); - ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2); - ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3); - return ret; + return __builtin_aarch64_ld4v2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_f16 (const float16_t * __a) { - float16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4hf (__a); - ret.val[0] = __builtin_aarch64_get_dregxiv4hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_dregxiv4hf (__o, 1); - ret.val[2] = __builtin_aarch64_get_dregxiv4hf (__o, 2); - ret.val[3] = __builtin_aarch64_get_dregxiv4hf (__o, 3); - return ret; + return __builtin_aarch64_ld4v4hf (__a); } __extension__ extern __inline float32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_f32 (const float32_t * __a) { - float32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1); - ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2); - ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3); - return ret; + return __builtin_aarch64_ld4v2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline poly64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_p64 (const poly64_t * __a) { - poly64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 1); - ret.val[2] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 2); - ret.val[3] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 3); - return ret; + return __builtin_aarch64_ld4di_ps ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__)) vld4q_s8 (const int8_t * __a) { - int8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_p8 (const poly8_t * __a) { - poly8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_s16 (const int16_t * __a) { - int16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_p16 (const poly16_t * __a) { - poly16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_s32 (const int32_t * __a) { - int32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_s64 (const int64_t * __a) { - int64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (int64x2_t) 
__builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + return __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_u8 (const uint8_t * __a) { - uint8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4v16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_u16 (const uint16_t * __a) { - uint16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4v8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_u32 (const uint32_t * __a) { - uint32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4v4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_u64 (const uint64_t * __a) { - uint64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + return __builtin_aarch64_ld4v2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_f16 (const float16_t * __a) { - float16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hf (__a); - ret.val[0] = __builtin_aarch64_get_qregxiv8hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_qregxiv8hf (__o, 1); - ret.val[2] = __builtin_aarch64_get_qregxiv8hf (__o, 2); - ret.val[3] = __builtin_aarch64_get_qregxiv8hf (__o, 3); - return ret; + return __builtin_aarch64_ld4v8hf (__a); } __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__)) vld4q_f32 (const float32_t * __a) { - float32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2); - ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3); - return ret; + return __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_f64 (const float64_t * __a) { - float64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2); - ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3); - return ret; + return __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_p64 (const poly64_t * __a) { - poly64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 2); - ret.val[3] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 3); - return ret; + return __builtin_aarch64_ld4v2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline poly128_t @@ -18247,1093 +16954,624 @@ __extension__ extern __inline int8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_s8 (const int8_t * __a) { - int8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_s16 (const int16_t * __a) { - int16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_s32 (const int32_t * __a) { - int32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x2_t 
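/* Illustrative aside, not part of the patch: in the vld2_dup_* bodies above
   and below, the old implementation unpacked an opaque __builtin_aarch64_simd_oi
   value with __builtin_aarch64_get_dregoi*, whereas the new builtin call
   returns the two-vector tuple type directly, so the intrinsic body collapses
   to a single return statement.  The observable semantics of the intrinsic are
   unchanged: two consecutive elements are loaded and each is broadcast across
   all lanes of one vector of the pair.  A minimal scalar reference model of
   that behaviour (hypothetical helper, int8 8-lane case shown), assuming only
   <stdint.h>:

     #include <stdint.h>

     // ref_vld2_dup_s8: load a[0] and a[1], broadcast a[0] into every lane
     // of val0 and a[1] into every lane of val1 (LD2R-style behaviour).
     static void
     ref_vld2_dup_s8 (const int8_t *a, int8_t val0[8], int8_t val1[8])
     {
       for (int i = 0; i < 8; i++)
         {
           val0[i] = a[0];
           val1[i] = a[1];
         }
     }

   For example, with a[0] == 1 and a[1] == 2 the result is
   val0 = {1,1,1,1,1,1,1,1} and val1 = {2,2,2,2,2,2,2,2}.  */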
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_f16 (const float16_t * __a) { - float16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0); - ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_f32 (const float32_t * __a) { - float32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_f64 (const float64_t * __a) { - float64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rdf ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)}; - return ret; + return __builtin_aarch64_ld2rdf ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline uint8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_u8 (const uint8_t * __a) { - uint8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_u16 (const uint16_t * __a) { - uint16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_u32 (const uint32_t * __a) { - uint32x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline poly8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_p8 (const poly8_t * __a) { - poly8x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8qi_ps ( + (const 
__builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_p16 (const poly16_t * __a) { - poly16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0); - ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_p64 (const poly64_t * __a) { - poly64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregoidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregoidi_pss (__o, 1); - return ret; + return __builtin_aarch64_ld2rdi_ps ((const __builtin_aarch64_simd_di *) __a); } - __extension__ extern __inline int64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_s64 (const int64_t * __a) { - int64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_u64 (const uint64_t * __a) { - uint64x1x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return ret; + return __builtin_aarch64_ld2rdi_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_s8 (const int8_t * __a) { - int8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_p8 (const poly8_t * __a) { - poly8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_s16 (const int16_t * __a) { - int16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 
1); - return ret; + return __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_p16 (const poly16_t * __a) { - poly16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_s32 (const int32_t * __a) { - int32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_s64 (const int64_t * __a) { - int64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_u8 (const uint8_t * __a) { - uint8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_u16 (const uint16_t * __a) { - uint16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_u32 (const uint32_t * __a) { - uint32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_u64 (const uint64_t * __a) { - uint64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) 
__builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_f16 (const float16_t * __a) { - float16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 0); - ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_f32 (const float32_t * __a) { - float32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_f64 (const float64_t * __a) { - float64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_p64 (const poly64_t * __a) { - poly64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregoiv2di_pss (__o, 1); - return ret; + return __builtin_aarch64_ld2rv2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_s64 (const int64_t * __a) { - int64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return ret; + return __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_u64 (const uint64_t * __a) { - uint64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return ret; + return __builtin_aarch64_ld3rdi_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__)) vld3_dup_f64 (const float64_t * __a) { - float64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rdf ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)}; - ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)}; - return ret; + return __builtin_aarch64_ld3rdf ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_s8 (const int8_t * __a) { - int8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_p8 (const poly8_t * __a) { - poly8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_s16 (const int16_t * __a) { - int16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_p16 (const poly16_t * __a) { - poly16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_s32 (const int32_t * __a) { - int32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_u8 (const uint8_t * 
__a) { - uint8x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); - ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_u16 (const uint16_t * __a) { - uint16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); - ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_u32 (const uint32_t * __a) { - uint32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); - ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_f16 (const float16_t * __a) { - float16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0); - ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1); - ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_f32 (const float32_t * __a) { - float32x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1); - ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline poly64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_p64 (const poly64_t * __a) { - poly64x1x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 1); - ret.val[2] = (poly64x1_t) __builtin_aarch64_get_dregcidi_pss (__o, 2); - return ret; + return __builtin_aarch64_ld3rdi_ps ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_s8 (const 
int8_t * __a) { - int8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_p8 (const poly8_t * __a) { - poly8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_s16 (const int16_t * __a) { - int16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_p16 (const poly16_t * __a) { - poly16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_s32 (const int32_t * __a) { - int32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_s64 (const int64_t * __a) { - int64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_u8 (const uint8_t * __a) { - 
uint8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_u16 (const uint16_t * __a) { - uint16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_u32 (const uint32_t * __a) { - uint32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_u64 (const uint64_t * __a) { - uint64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_f16 (const float16_t * __a) { - float16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1); - ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_f32 (const float32_t * __a) { - float32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_f64 
(const float64_t * __a) { - float64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_p64 (const poly64_t * __a) { - poly64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregciv2di_pss (__o, 2); - return ret; + return __builtin_aarch64_ld3rv2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_s64 (const int64_t * __a) { - int64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return ret; + return __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_u64 (const uint64_t * __a) { - uint64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return ret; + return __builtin_aarch64_ld4rdi_us ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_f64 (const float64_t * __a) { - float64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rdf ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)}; - ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)}; - ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)}; - ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)}; - return ret; + return __builtin_aarch64_ld4rdf ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline int8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_s8 (const int8_t * __a) { - int8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (int8x8_t) 
__builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_p8 (const poly8_t * __a) { - poly8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_s16 (const int16_t * __a) { - int16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_p16 (const poly16_t * __a) { - poly16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_s32 (const int32_t * __a) { - int32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0); - ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1); - ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2); - ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_u8 (const uint8_t * __a) { - uint8x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0); - ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1); - ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2); - ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_u16 (const 
uint16_t * __a) { - uint16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0); - ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1); - ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2); - ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_u32 (const uint32_t * __a) { - uint32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0); - ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1); - ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2); - ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline float16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_f16 (const float16_t * __a) { - float16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 0); - ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 1); - ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 2); - ret.val[3] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_f32 (const float32_t * __a) { - float32x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0); - ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1); - ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2); - ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline poly64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_p64 (const poly64_t * __a) { - poly64x1x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 0); - ret.val[1] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 1); - ret.val[2] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 2); - ret.val[3] = (poly64x1_t) __builtin_aarch64_get_dregxidi_pss (__o, 3); - return ret; + return __builtin_aarch64_ld4rdi_ps ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline int8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_s8 (const int8_t * __a) { - int8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - 
ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline poly8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_p8 (const poly8_t * __a) { - poly8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv16qi_ps ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline int16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_s16 (const int16_t * __a) { - int16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline poly16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_p16 (const poly16_t * __a) { - poly16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8hi_ps ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline int32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_s32 (const int32_t * __a) { - int32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline int64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_s64 (const int64_t * __a) { - int64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + return 
__builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline uint8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_u8 (const uint8_t * __a) { - uint8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv16qi_us ( + (const __builtin_aarch64_simd_qi *) __a); } __extension__ extern __inline uint16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_u16 (const uint16_t * __a) { - uint16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8hi_us ( + (const __builtin_aarch64_simd_hi *) __a); } __extension__ extern __inline uint32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_u32 (const uint32_t * __a) { - uint32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4si_us ( + (const __builtin_aarch64_simd_si *) __a); } __extension__ extern __inline uint64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_u64 (const uint64_t * __a) { - uint64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2di_us ( + (const __builtin_aarch64_simd_di *) __a); } __extension__ extern __inline float16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_f16 (const float16_t * __a) { - float16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8hf ((const __builtin_aarch64_simd_hf *) __a); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 1); - ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 2); - ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8hf ((const __builtin_aarch64_simd_hf *) __a); } __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_f32 (const float32_t * 
__a) { - float32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2); - ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4sf ((const __builtin_aarch64_simd_sf *) __a); } __extension__ extern __inline float64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_f64 (const float64_t * __a) { - float64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2); - ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2df ((const __builtin_aarch64_simd_df *) __a); } __extension__ extern __inline poly64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_p64 (const poly64_t * __a) { - poly64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 2); - ret.val[3] = (poly64x2_t) __builtin_aarch64_get_qregxiv2di_pss (__o, 3); - return ret; + return __builtin_aarch64_ld4rv2di_ps ( + (const __builtin_aarch64_simd_di *) __a); } /* vld2_lane */ @@ -19342,238 +17580,112 @@ __extension__ extern __inline uint8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_u8 (const uint8_t * __ptr, uint8x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint8x16x2_t __temp; - __temp.val[0] = vcombine_u8 (__b.val[0], vcreate_u8 (0)); - __temp.val[1] = vcombine_u8 (__b.val[1], vcreate_u8 (0)); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev8qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_u16 (const uint16_t * __ptr, uint16x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint16x8x2_t __temp; - __temp.val[0] = vcombine_u16 (__b.val[0], vcreate_u16 (0)); - __temp.val[1] = vcombine_u16 (__b.val[1], vcreate_u16 (0)); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return 
__builtin_aarch64_ld2_lanev4hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_u32 (const uint32_t * __ptr, uint32x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint32x4x2_t __temp; - __temp.val[0] = vcombine_u32 (__b.val[0], vcreate_u32 (0)); - __temp.val[1] = vcombine_u32 (__b.val[1], vcreate_u32 (0)); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev2si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_u64 (const uint64_t * __ptr, uint64x1x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint64x2x2_t __temp; - __temp.val[0] = vcombine_u64 (__b.val[0], vcreate_u64 (0)); - __temp.val[1] = vcombine_u64 (__b.val[1], vcreate_u64 (0)); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanedi_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_s8 (const int8_t * __ptr, int8x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int8x16x2_t __temp; - __temp.val[0] = vcombine_s8 (__b.val[0], vcreate_s8 (0)); - __temp.val[1] = vcombine_s8 (__b.val[1], vcreate_s8 (0)); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (int8x8_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (int8x8_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline int16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_s16 (const int16_t * __ptr, int16x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int16x8x2_t __temp; - __temp.val[0] = vcombine_s16 (__b.val[0], vcreate_s16 (0)); - __temp.val[1] = vcombine_s16 (__b.val[1], vcreate_s16 (0)); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (int16x4_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (int16x4_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x2x2_t 
__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_s32 (const int32_t * __ptr, int32x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int32x4x2_t __temp; - __temp.val[0] = vcombine_s32 (__b.val[0], vcreate_s32 (0)); - __temp.val[1] = vcombine_s32 (__b.val[1], vcreate_s32 (0)); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (int32x2_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (int32x2_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev2si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_s64 (const int64_t * __ptr, int64x1x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int64x2x2_t __temp; - __temp.val[0] = vcombine_s64 (__b.val[0], vcreate_s64 (0)); - __temp.val[1] = vcombine_s64 (__b.val[1], vcreate_s64 (0)); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline float16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_f16 (const float16_t * __ptr, float16x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - float16x8x2_t __temp; - __temp.val[0] = vcombine_f16 (__b.val[0], vcreate_f16 (0)); - __temp.val[1] = vcombine_f16 (__b.val[1], vcreate_f16 (0)); - __o = __builtin_aarch64_set_qregoiv8hf (__o, (float16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hf (__o, (float16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - __b.val[0] = (float16x4_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (float16x4_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev4hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x2x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_f32 (const float32_t * __ptr, float32x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - float32x4x2_t __temp; - __temp.val[0] = vcombine_f32 (__b.val[0], vcreate_f32 (0)); - __temp.val[1] = vcombine_f32 (__b.val[1], vcreate_f32 (0)); - __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - __b.val[0] = (float32x2_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (float32x2_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev2sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_f64 (const float64_t * __ptr, float64x1x2_t 
__b, const int __c) { - __builtin_aarch64_simd_oi __o; - float64x2x2_t __temp; - __temp.val[0] = vcombine_f64 (__b.val[0], vcreate_f64 (0)); - __temp.val[1] = vcombine_f64 (__b.val[1], vcreate_f64 (0)); - __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanedf ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - __b.val[0] = (float64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (float64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanedf ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_p8 (const poly8_t * __ptr, poly8x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly8x16x2_t __temp; - __temp.val[0] = vcombine_p8 (__b.val[0], vcreate_p8 (0)); - __temp.val[1] = vcombine_p8 (__b.val[1], vcreate_p8 (0)); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev8qi_psps ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline poly16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_p16 (const poly16_t * __ptr, poly16x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly16x8x2_t __temp; - __temp.val[0] = vcombine_p16 (__b.val[0], vcreate_p16 (0)); - __temp.val[1] = vcombine_p16 (__b.val[1], vcreate_p16 (0)); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev4hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x1x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_p64 (const poly64_t * __ptr, poly64x1x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly64x2x2_t __temp; - __temp.val[0] = vcombine_p64 (__b.val[0], vcreate_p64 (0)); - __temp.val[1] = vcombine_p64 (__b.val[1], vcreate_p64 (0)); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (poly64x1_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (poly64x1_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanedi_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vld2q_lane */ @@ -19582,210 +17694,112 @@ __extension__ extern __inline uint8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_u8 (const uint8_t * __ptr, uint8x16x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint8x16x2_t ret; - 
__o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev16qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_u16 (const uint16_t * __ptr, uint16x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint16x8x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev8hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_u32 (const uint32_t * __ptr, uint32x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint32x4x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev4si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_u64 (const uint64_t * __ptr, uint64x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - uint64x2x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev2di_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_s8 (const int8_t * __ptr, int8x16x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int8x16x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline int16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_s16 (const 
int16_t * __ptr, int16x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int16x8x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_s32 (const int32_t * __ptr, int32x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int32x4x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev4si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_s64 (const int64_t * __ptr, int64x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - int64x2x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev2di ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline float16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_f16 (const float16_t * __ptr, float16x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - float16x8x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev8hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_f32 (const float32_t * __ptr, float32x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - float32x4x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev4sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x2x2_t __attribute__ ((__always_inline__, 
__gnu_inline__,__artificial__)) vld2q_lane_f64 (const float64_t * __ptr, float64x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - float64x2x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2df ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev2df ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x16x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_p8 (const poly8_t * __ptr, poly8x16x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly8x16x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev16qi_psps ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline poly16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_p16 (const poly16_t * __ptr, poly16x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly16x8x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev8hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x2x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_p64 (const poly64_t * __ptr, poly64x2x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - poly64x2x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev2di_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vld3_lane */ @@ -19794,280 +17808,112 @@ __extension__ extern __inline uint8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_u8 (const uint8_t * __ptr, uint8x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint8x16x3_t __temp; - __temp.val[0] = vcombine_u8 (__b.val[0], vcreate_u8 (0)); - __temp.val[1] = vcombine_u8 (__b.val[1], vcreate_u8 (0)); - __temp.val[2] = vcombine_u8 (__b.val[2], vcreate_u8 (0)); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[2], 2); 
- __o = __builtin_aarch64_ld3_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (uint8x8_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (uint8x8_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (uint8x8_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev8qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_u16 (const uint16_t * __ptr, uint16x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint16x8x3_t __temp; - __temp.val[0] = vcombine_u16 (__b.val[0], vcreate_u16 (0)); - __temp.val[1] = vcombine_u16 (__b.val[1], vcreate_u16 (0)); - __temp.val[2] = vcombine_u16 (__b.val[2], vcreate_u16 (0)); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (uint16x4_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (uint16x4_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (uint16x4_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev4hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_u32 (const uint32_t * __ptr, uint32x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint32x4x3_t __temp; - __temp.val[0] = vcombine_u32 (__b.val[0], vcreate_u32 (0)); - __temp.val[1] = vcombine_u32 (__b.val[1], vcreate_u32 (0)); - __temp.val[2] = vcombine_u32 (__b.val[2], vcreate_u32 (0)); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (uint32x2_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (uint32x2_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (uint32x2_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev2si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_u64 (const uint64_t * __ptr, uint64x1x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint64x2x3_t __temp; - __temp.val[0] = vcombine_u64 (__b.val[0], vcreate_u64 (0)); - __temp.val[1] = vcombine_u64 (__b.val[1], vcreate_u64 (0)); - __temp.val[2] = vcombine_u64 (__b.val[2], vcreate_u64 (0)); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return 
__builtin_aarch64_ld3_lanedi_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_s8 (const int8_t * __ptr, int8x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int8x16x3_t __temp; - __temp.val[0] = vcombine_s8 (__b.val[0], vcreate_s8 (0)); - __temp.val[1] = vcombine_s8 (__b.val[1], vcreate_s8 (0)); - __temp.val[2] = vcombine_s8 (__b.val[2], vcreate_s8 (0)); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (int8x8_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (int8x8_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (int8x8_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline int16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_s16 (const int16_t * __ptr, int16x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int16x8x3_t __temp; - __temp.val[0] = vcombine_s16 (__b.val[0], vcreate_s16 (0)); - __temp.val[1] = vcombine_s16 (__b.val[1], vcreate_s16 (0)); - __temp.val[2] = vcombine_s16 (__b.val[2], vcreate_s16 (0)); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (int16x4_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (int16x4_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (int16x4_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_s32 (const int32_t * __ptr, int32x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int32x4x3_t __temp; - __temp.val[0] = vcombine_s32 (__b.val[0], vcreate_s32 (0)); - __temp.val[1] = vcombine_s32 (__b.val[1], vcreate_s32 (0)); - __temp.val[2] = vcombine_s32 (__b.val[2], vcreate_s32 (0)); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (int32x2_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (int32x2_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (int32x2_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev2si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_s64 (const int64_t * __ptr, int64x1x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int64x2x3_t __temp; - __temp.val[0] = vcombine_s64 (__b.val[0], vcreate_s64 (0)); - 
__temp.val[1] = vcombine_s64 (__b.val[1], vcreate_s64 (0)); - __temp.val[2] = vcombine_s64 (__b.val[2], vcreate_s64 (0)); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline float16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_f16 (const float16_t * __ptr, float16x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float16x8x3_t __temp; - __temp.val[0] = vcombine_f16 (__b.val[0], vcreate_f16 (0)); - __temp.val[1] = vcombine_f16 (__b.val[1], vcreate_f16 (0)); - __temp.val[2] = vcombine_f16 (__b.val[2], vcreate_f16 (0)); - __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - __b.val[0] = (float16x4_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (float16x4_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (float16x4_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev4hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_f32 (const float32_t * __ptr, float32x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float32x4x3_t __temp; - __temp.val[0] = vcombine_f32 (__b.val[0], vcreate_f32 (0)); - __temp.val[1] = vcombine_f32 (__b.val[1], vcreate_f32 (0)); - __temp.val[2] = vcombine_f32 (__b.val[2], vcreate_f32 (0)); - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - __b.val[0] = (float32x2_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (float32x2_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (float32x2_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev2sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_f64 (const float64_t * __ptr, float64x1x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float64x2x3_t __temp; - __temp.val[0] = vcombine_f64 (__b.val[0], vcreate_f64 (0)); - __temp.val[1] = vcombine_f64 (__b.val[1], vcreate_f64 (0)); - __temp.val[2] = vcombine_f64 (__b.val[2], vcreate_f64 (0)); - __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv2df (__o, 
(float64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanedf ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - __b.val[0] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (float64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanedf ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_p8 (const poly8_t * __ptr, poly8x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly8x16x3_t __temp; - __temp.val[0] = vcombine_p8 (__b.val[0], vcreate_p8 (0)); - __temp.val[1] = vcombine_p8 (__b.val[1], vcreate_p8 (0)); - __temp.val[2] = vcombine_p8 (__b.val[2], vcreate_p8 (0)); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (poly8x8_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (poly8x8_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (poly8x8_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev8qi_psps ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline poly16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_p16 (const poly16_t * __ptr, poly16x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly16x8x3_t __temp; - __temp.val[0] = vcombine_p16 (__b.val[0], vcreate_p16 (0)); - __temp.val[1] = vcombine_p16 (__b.val[1], vcreate_p16 (0)); - __temp.val[2] = vcombine_p16 (__b.val[2], vcreate_p16 (0)); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (poly16x4_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (poly16x4_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (poly16x4_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev4hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x1x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_p64 (const poly64_t * __ptr, poly64x1x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly64x2x3_t __temp; - __temp.val[0] = vcombine_p64 (__b.val[0], vcreate_p64 (0)); - __temp.val[1] = vcombine_p64 (__b.val[1], vcreate_p64 (0)); - __temp.val[2] = vcombine_p64 (__b.val[2], vcreate_p64 (0)); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (poly64x1_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return 
__builtin_aarch64_ld3_lanedi_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vld3q_lane */ @@ -20076,238 +17922,112 @@ __extension__ extern __inline uint8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_u8 (const uint8_t * __ptr, uint8x16x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint8x16x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev16qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_u16 (const uint16_t * __ptr, uint16x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint16x8x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev8hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_u32 (const uint32_t * __ptr, uint32x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint32x4x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev4si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_u64 (const uint64_t * __ptr, uint64x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - uint64x2x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; 
+ return __builtin_aarch64_ld3_lanev2di_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_s8 (const int8_t * __ptr, int8x16x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int8x16x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline int16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_s16 (const int16_t * __ptr, int16x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int16x8x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_s32 (const int32_t * __ptr, int32x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int32x4x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev4si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_s64 (const int64_t * __ptr, int64x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - int64x2x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev2di ( + (__builtin_aarch64_simd_di *) 
__ptr, __b, __c); } __extension__ extern __inline float16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_f16 (const float16_t * __ptr, float16x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float16x8x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev8hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_f32 (const float32_t * __ptr, float32x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float32x4x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev4sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_f64 (const float64_t * __ptr, float64x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - float64x2x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2df ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev2df ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x16x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_p8 (const poly8_t * __ptr, poly8x16x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly8x16x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev16qi_psps ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } 
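/* Editorial note: the short sketch below is illustrative only and is not
   part of the patch.  It shows how a caller uses one of the lane-load
   intrinsics whose wrappers are simplified in the hunks above; with the new
   vector-tuple machine modes each wrapper reduces to a single builtin call
   instead of the set/get sequence on an opaque __builtin_aarch64_simd_oi
   value.  The function name example_reload_lane is hypothetical; the
   intrinsic and its signature come from arm_neon.h.  */

#include <arm_neon.h>

int32x4x2_t
example_reload_lane (const int32_t *__ptr, int32x4x2_t __acc)
{
  /* Reload lane 1 of both vectors in the tuple from the two consecutive
     int32 values at __ptr; the lane index must be a constant expression
     in the range 0-3 for a 4-lane vector.  */
  return vld2q_lane_s32 (__ptr, __acc, 1);
}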
__extension__ extern __inline poly16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_p16 (const poly16_t * __ptr, poly16x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly16x8x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev8hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x2x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_p64 (const poly64_t * __ptr, poly64x2x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - poly64x2x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev2di_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vld4_lane */ @@ -20316,322 +18036,112 @@ __extension__ extern __inline uint8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_u8 (const uint8_t * __ptr, uint8x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint8x16x4_t __temp; - __temp.val[0] = vcombine_u8 (__b.val[0], vcreate_u8 (0)); - __temp.val[1] = vcombine_u8 (__b.val[1], vcreate_u8 (0)); - __temp.val[2] = vcombine_u8 (__b.val[2], vcreate_u8 (0)); - __temp.val[3] = vcombine_u8 (__b.val[3], vcreate_u8 (0)); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev8qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_u16 (const uint16_t * __ptr, uint16x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint16x8x4_t __temp; - __temp.val[0] = vcombine_u16 (__b.val[0], vcreate_u16 (0)); - __temp.val[1] = vcombine_u16 (__b.val[1], vcreate_u16 (0)); - __temp.val[2] = vcombine_u16 (__b.val[2], vcreate_u16 (0)); - __temp.val[3] = vcombine_u16 (__b.val[3], vcreate_u16 
(0)); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev4hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_u32 (const uint32_t * __ptr, uint32x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint32x4x4_t __temp; - __temp.val[0] = vcombine_u32 (__b.val[0], vcreate_u32 (0)); - __temp.val[1] = vcombine_u32 (__b.val[1], vcreate_u32 (0)); - __temp.val[2] = vcombine_u32 (__b.val[2], vcreate_u32 (0)); - __temp.val[3] = vcombine_u32 (__b.val[3], vcreate_u32 (0)); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev2si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_u64 (const uint64_t * __ptr, uint64x1x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint64x2x4_t __temp; - __temp.val[0] = vcombine_u64 (__b.val[0], vcreate_u64 (0)); - __temp.val[1] = vcombine_u64 (__b.val[1], vcreate_u64 (0)); - __temp.val[2] = vcombine_u64 (__b.val[2], vcreate_u64 (0)); - __temp.val[3] = vcombine_u64 (__b.val[3], vcreate_u64 (0)); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanedi_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_s8 (const int8_t * __ptr, int8x8x4_t __b, const int __c) { - 
__builtin_aarch64_simd_xi __o; - int8x16x4_t __temp; - __temp.val[0] = vcombine_s8 (__b.val[0], vcreate_s8 (0)); - __temp.val[1] = vcombine_s8 (__b.val[1], vcreate_s8 (0)); - __temp.val[2] = vcombine_s8 (__b.val[2], vcreate_s8 (0)); - __temp.val[3] = vcombine_s8 (__b.val[3], vcreate_s8 (0)); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (int8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev8qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline int16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_s16 (const int16_t * __ptr, int16x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int16x8x4_t __temp; - __temp.val[0] = vcombine_s16 (__b.val[0], vcreate_s16 (0)); - __temp.val[1] = vcombine_s16 (__b.val[1], vcreate_s16 (0)); - __temp.val[2] = vcombine_s16 (__b.val[2], vcreate_s16 (0)); - __temp.val[3] = vcombine_s16 (__b.val[3], vcreate_s16 (0)); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (int16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev4hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_s32 (const int32_t * __ptr, int32x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int32x4x4_t __temp; - __temp.val[0] = vcombine_s32 (__b.val[0], vcreate_s32 (0)); - __temp.val[1] = vcombine_s32 (__b.val[1], vcreate_s32 (0)); - __temp.val[2] = vcombine_s32 (__b.val[2], vcreate_s32 (0)); - __temp.val[3] = vcombine_s32 (__b.val[3], vcreate_s32 (0)); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - __b.val[0] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (int32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return 
__builtin_aarch64_ld4_lanev2si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_s64 (const int64_t * __ptr, int64x1x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int64x2x4_t __temp; - __temp.val[0] = vcombine_s64 (__b.val[0], vcreate_s64 (0)); - __temp.val[1] = vcombine_s64 (__b.val[1], vcreate_s64 (0)); - __temp.val[2] = vcombine_s64 (__b.val[2], vcreate_s64 (0)); - __temp.val[3] = vcombine_s64 (__b.val[3], vcreate_s64 (0)); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanedi ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline float16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_f16 (const float16_t * __ptr, float16x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float16x8x4_t __temp; - __temp.val[0] = vcombine_f16 (__b.val[0], vcreate_f16 (0)); - __temp.val[1] = vcombine_f16 (__b.val[1], vcreate_f16 (0)); - __temp.val[2] = vcombine_f16 (__b.val[2], vcreate_f16 (0)); - __temp.val[3] = vcombine_f16 (__b.val[3], vcreate_f16 (0)); - __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - __b.val[0] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (float16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev4hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_f32 (const float32_t * __ptr, float32x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float32x4x4_t __temp; - __temp.val[0] = vcombine_f32 (__b.val[0], vcreate_f32 (0)); - __temp.val[1] = vcombine_f32 (__b.val[1], vcreate_f32 (0)); - __temp.val[2] = vcombine_f32 (__b.val[2], vcreate_f32 (0)); - __temp.val[3] = vcombine_f32 (__b.val[3], vcreate_f32 (0)); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - __b.val[0] 
= (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (float32x2_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev2sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_f64 (const float64_t * __ptr, float64x1x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float64x2x4_t __temp; - __temp.val[0] = vcombine_f64 (__b.val[0], vcreate_f64 (0)); - __temp.val[1] = vcombine_f64 (__b.val[1], vcreate_f64 (0)); - __temp.val[2] = vcombine_f64 (__b.val[2], vcreate_f64 (0)); - __temp.val[3] = vcombine_f64 (__b.val[3], vcreate_f64 (0)); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanedf ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - __b.val[0] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (float64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanedf ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_p8 (const poly8_t * __ptr, poly8x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly8x16x4_t __temp; - __temp.val[0] = vcombine_p8 (__b.val[0], vcreate_p8 (0)); - __temp.val[1] = vcombine_p8 (__b.val[1], vcreate_p8 (0)); - __temp.val[2] = vcombine_p8 (__b.val[2], vcreate_p8 (0)); - __temp.val[3] = vcombine_p8 (__b.val[3], vcreate_p8 (0)); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - __b.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev8qi_psps ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline poly16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_p16 (const poly16_t * __ptr, poly16x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly16x8x4_t __temp; - __temp.val[0] = vcombine_p16 (__b.val[0], vcreate_p16 (0)); - __temp.val[1] = vcombine_p16 (__b.val[1], vcreate_p16 (0)); - __temp.val[2] = vcombine_p16 (__b.val[2], vcreate_p16 (0)); - __temp.val[3] = vcombine_p16 (__b.val[3], vcreate_p16 (0)); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) 
__temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - __b.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev4hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x1x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_p64 (const poly64_t * __ptr, poly64x1x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly64x2x4_t __temp; - __temp.val[0] = vcombine_p64 (__b.val[0], vcreate_p64 (0)); - __temp.val[1] = vcombine_p64 (__b.val[1], vcreate_p64 (0)); - __temp.val[2] = vcombine_p64 (__b.val[2], vcreate_p64 (0)); - __temp.val[3] = vcombine_p64 (__b.val[3], vcreate_p64 (0)); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanedi ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - __b.val[0] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (poly64x1_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanedi_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vld4q_lane */ @@ -20640,266 +18150,112 @@ __extension__ extern __inline uint8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_u8 (const uint8_t * __ptr, uint8x16x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint8x16x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev16qi_usus ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline uint16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_u16 (const uint16_t * __ptr, uint16x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint16x8x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = 
__builtin_aarch64_ld4_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev8hi_usus ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline uint32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_u32 (const uint32_t * __ptr, uint32x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint32x4x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev4si_usus ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline uint64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_u64 (const uint64_t * __ptr, uint64x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - uint64x2x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev2di_usus ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline int8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_s8 (const int8_t * __ptr, int8x16x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int8x16x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev16qi ( + (__builtin_aarch64_simd_qi *) __ptr, __b, __c); } 
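[Editorial aside, not part of the patch: the hunks in this region only change how the vld4q_lane_* wrappers are lowered; their user-facing signatures are unchanged. As a rough illustration of the kind of caller that benefits, here is a minimal sketch (hypothetical example, assuming an AArch64 target and optimisation enabled, not taken from the patch). With the old opaque XImode lowering each such call went through four __builtin_aarch64_set_qregxiv4si copies and four __builtin_aarch64_get_qregxiv4si extracts, whereas with the new V4x16QI-style tuple modes the int8x16x4_t argument is handed to the builtin directly, so those superfluous register moves should no longer be emitted.

#include <arm_neon.h>

/* Hypothetical helper: reload lane 3 of each of the four vectors in
   TAB from four consecutive bytes at P (one byte per vector, as LD4
   lane does); the rest of the tuple is passed through unchanged.  */
int8x16x4_t
refresh_lane3 (const int8_t *p, int8x16x4_t tab)
{
  return vld4q_lane_s8 (p, tab, 3);
}

End of aside.]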
__extension__ extern __inline int16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_s16 (const int16_t * __ptr, int16x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int16x8x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev8hi ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline int32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_s32 (const int32_t * __ptr, int32x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int32x4x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4si ( - (__builtin_aarch64_simd_si *) __ptr, __o, __c); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev4si ( + (__builtin_aarch64_simd_si *) __ptr, __b, __c); } __extension__ extern __inline int64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_s64 (const int64_t * __ptr, int64x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - int64x2x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev2di ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } __extension__ extern __inline float16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_f16 (const float16_t * __ptr, float16x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float16x8x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, 
(int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8hf ( - (__builtin_aarch64_simd_hf *) __ptr, __o, __c); - ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev8hf ( + (__builtin_aarch64_simd_hf *) __ptr, __b, __c); } __extension__ extern __inline float32x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_f32 (const float32_t * __ptr, float32x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float32x4x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4sf ( - (__builtin_aarch64_simd_sf *) __ptr, __o, __c); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev4sf ( + (__builtin_aarch64_simd_sf *) __ptr, __b, __c); } __extension__ extern __inline float64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_f64 (const float64_t * __ptr, float64x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - float64x2x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2df ( - (__builtin_aarch64_simd_df *) __ptr, __o, __c); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev2df ( + (__builtin_aarch64_simd_df *) __ptr, __b, __c); } __extension__ extern __inline poly8x16x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_p8 (const poly8_t * __ptr, poly8x16x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly8x16x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev16qi ( - (__builtin_aarch64_simd_qi *) __ptr, __o, __c); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev16qi_psps ( + 
(__builtin_aarch64_simd_qi *) __ptr, __b, __c); } __extension__ extern __inline poly16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_p16 (const poly16_t * __ptr, poly16x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly16x8x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8hi ( - (__builtin_aarch64_simd_hi *) __ptr, __o, __c); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev8hi_psps ( + (__builtin_aarch64_simd_hi *) __ptr, __b, __c); } __extension__ extern __inline poly64x2x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_p64 (const poly64_t * __ptr, poly64x2x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - poly64x2x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev2di ( - (__builtin_aarch64_simd_di *) __ptr, __o, __c); - ret.val[0] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (poly64x2_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev2di_psps ( + (__builtin_aarch64_simd_di *) __ptr, __b, __c); } /* vmax */ @@ -24916,54 +22272,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2_s8 (int8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2_u8 (uint8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2_p8 (poly8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbl2v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl2v8qi_ppu (__tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2q_s8 (int8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl2v16qi (__o, (int8x16_t)__idx); + 
return __builtin_aarch64_qtbl2v16qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2q_u8 (uint8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbl2v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl2v16qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl2q_p8 (poly8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbl2v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl2v16qi_ppu (__tab, __idx); } /* vqtbl3 */ @@ -24972,54 +22316,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3_s8 (int8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl3v8qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3_u8 (uint8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl3v8qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3_p8 (poly8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl3v8qi_ppu (__tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3q_s8 (int8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl3v16qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3q_u8 (uint8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl3v16qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl3q_p8 (poly8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl3v16qi_ppu (__tab, __idx); } /* vqtbl4 */ @@ -25028,54 +22360,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4_s8 (int8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl4v8qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4_u8 (uint8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl4v8qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4_p8 (poly8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)__idx); + return __builtin_aarch64_qtbl4v8qi_ppu (__tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4q_s8 (int8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl4v16qi_ssu (__tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4q_u8 (uint8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl4v16qi_uuu (__tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbl4q_p8 (poly8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)__idx); + return __builtin_aarch64_qtbl4v16qi_ppu (__tab, __idx); } /* vqtbx2 */ @@ -25084,58 +22404,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx2_s8 (int8x8_t __r, int8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbx2v8qi (__r, __o, (int8x8_t)__idx); + return __builtin_aarch64_qtbx2v8qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx2_u8 (uint8x8_t __r, uint8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbx2v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx2v8qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx2_p8 (poly8x8_t __r, poly8x16x2_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbx2v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx2v8qi_pppu (__r, __tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx2q_s8 (int8x16_t __r, int8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbx2v16qi (__r, __o, (int8x16_t)__idx); + return __builtin_aarch64_qtbx2v16qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__)) vqtbx2q_u8 (uint8x16_t __r, uint8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbx2v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx2v16qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx2q_p8 (poly8x16_t __r, poly8x16x2_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbx2v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx2v16qi_pppu (__r, __tab, __idx); } /* vqtbx3 */ @@ -25144,58 +22448,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3_s8 (int8x8_t __r, int8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbx3v8qi (__r, __o, (int8x8_t)__idx); + return __builtin_aarch64_qtbx3v8qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3_u8 (uint8x8_t __r, uint8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx3v8qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3_p8 (poly8x8_t __r, poly8x16x3_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx3v8qi_pppu (__r, __tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3q_s8 (int8x16_t __r, int8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbx3v16qi (__r, __o, (int8x16_t)__idx); + return __builtin_aarch64_qtbx3v16qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3q_u8 (uint8x16_t __r, uint8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx3v16qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx3q_p8 (poly8x16_t __r, poly8x16x3_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx3v16qi_pppu (__r, __tab, __idx); } /* vqtbx4 */ @@ -25204,58 +22492,42 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4_s8 (int8x8_t __r, int8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return 
__builtin_aarch64_qtbx4v8qi (__r, __o, (int8x8_t)__idx); + return __builtin_aarch64_qtbx4v8qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4_u8 (uint8x8_t __r, uint8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx4v8qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4_p8 (poly8x8_t __r, poly8x16x4_t __tab, uint8x8_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx4v8qi_pppu (__r, __tab, __idx); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4q_s8 (int8x16_t __r, int8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return __builtin_aarch64_qtbx4v16qi (__r, __o, (int8x16_t)__idx); + return __builtin_aarch64_qtbx4v16qi_sssu (__r, __tab, __idx); } __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4q_u8 (uint8x16_t __r, uint8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (uint8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx4v16qi_uuuu (__r, __tab, __idx); } __extension__ extern __inline poly8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vqtbx4q_p8 (poly8x16_t __r, poly8x16x4_t __tab, uint8x16_t __idx) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__tab, sizeof (__tab)); - return (poly8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)__r, __o, - (int8x16_t)__idx); + return __builtin_aarch64_qtbx4v16qi_pppu (__r, __tab, __idx); } /* vrbit */ @@ -27881,310 +25153,196 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s64_x2 (int64_t * __a, int64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - int64x2x2_t __temp; - __temp.val[0] - = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] - = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u64_x2 (uint64_t * __a, uint64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint64x2x2_t __temp; - __temp.val[0] - = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] - = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f64_x2 (float64_t * __a, float64x1x2_t __val) { - 
__builtin_aarch64_simd_oi __o; - float64x2x2_t __temp; - __temp.val[0] - = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] - = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s8_x2 (int8_t * __a, int8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - int8x16x2_t __temp; - __temp.val[0] - = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] - = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p8_x2 (poly8_t * __a, poly8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly8x16x2_t __temp; - __temp.val[0] - = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] - = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s16_x2 (int16_t * __a, int16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - int16x8x2_t __temp; - __temp.val[0] - = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] - = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p16_x2 (poly16_t * __a, poly16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly16x8x2_t __temp; - __temp.val[0] - = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] - = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s32_x2 (int32_t * __a, int32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - int32x4x2_t __temp; - __temp.val[0] - = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] - = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x2v2si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u8_x2 (uint8_t * __a, uint8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint8x16x2_t __temp; - __temp.val[0] = 
vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u16_x2 (uint16_t * __a, uint16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint16x8x2_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u32_x2 (uint32_t * __a, uint32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint32x4x2_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x2v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f16_x2 (float16_t * __a, float16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - float16x8x2_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v4hf (__a, __o); + __builtin_aarch64_st1x2v4hf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f32_x2 (float32_t * __a, float32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - float32x4x2_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x2v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly64x2x2_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s8_x2 (int8_t * __a, int8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v16qi 
((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p8_x2 (poly8_t * __a, poly8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s16_x2 (int16_t * __a, int16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p16_x2 (poly16_t * __a, poly16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s32_x2 (int32_t * __a, int32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x2v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s64_x2 (int64_t * __a, int64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u8_x2 (uint8_t * __a, uint8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x2v16qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u16_x2 (uint16_t * __a, uint16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x2v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u32_x2 (uint32_t * __a, uint32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x2v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u64_x2 (uint64_t * __a, uint64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ 
extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f16_x2 (float16_t * __a, float16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v8hf (__a, __o); + __builtin_aarch64_st1x2v8hf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f32_x2 (float32_t * __a, float32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x2v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f64_x2 (float64_t * __a, float64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x2v2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x2v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } /* vst1x3 */ @@ -28193,308 +25351,197 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s64_x3 (int64_t * __a, int64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - int64x2x3_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u64_x3 (uint64_t * __a, uint64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint64x2x3_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f64_x3 (float64_t * __a, float64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - float64x2x3_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x3df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void 
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s8_x3 (int8_t * __a, int8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - int8x16x3_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p8_x3 (poly8_t * __a, poly8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly8x16x3_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s16_x3 (int16_t * __a, int16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - int16x8x3_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p16_x3 (poly16_t * __a, poly16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly16x8x3_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p16 (__val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s32_x3 (int32_t * __a, int32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - int32x4x3_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x3v2si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u8_x3 (uint8_t * __a, uint8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint8x16x3_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); 
- __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u16_x3 (uint16_t * __a, uint16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint16x8x3_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u32_x3 (uint32_t * __a, uint32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint32x4x3_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x3v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f16_x3 (float16_t * __a, float16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - float16x8x3_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v4hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st1x3v4hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f32_x3 (float32_t * __a, float32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - float32x4x3_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x3v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly64x2x3_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof 
(__temp)); - __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s8_x3 (int8_t * __a, int8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p8_x3 (poly8_t * __a, poly8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s16_x3 (int16_t * __a, int16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p16_x3 (poly16_t * __a, poly16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s32_x3 (int32_t * __a, int32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x3v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s64_x3 (int64_t * __a, int64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, + (int64x2x3_t) __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u8_x3 (uint8_t * __a, uint8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x3v16qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u16_x3 (uint16_t * __a, uint16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x3v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u32_x3 (uint32_t * __a, uint32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v4si 
((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x3v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u64_x3 (uint64_t * __a, uint64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f16_x3 (float16_t * __a, float16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v8hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st1x3v8hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f32_x3 (float32_t * __a, float32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x3v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f64_x3 (float64_t * __a, float64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x3v2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x3v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } /* vst1(q)_x4. 
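The x4 stores below follow the same pattern as the x3 forms above: the
   tuple argument is passed straight to a builtin whose machine mode is one
   of the new vector-tuple modes (V3x16QI, V4x8HI and so on), and the
   unsigned and polynomial intrinsics call _su / _sp suffixed entry points
   that share the signed builtin's mode.  A rough caller-side sketch, purely
   for illustration (only the intrinsic and builtin names are taken from
   this patch, the pointer names are made up):

     uint8x16x3_t __v = vld1q_u8_x3 (__src);
     vst1q_u8_x3 (__dst, __v);    ... now maps onto __builtin_aarch64_st1x3v16qi_su

   with no memcpy through an opaque CI- or XI-mode temporary left in the
   inline expansion.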
*/ @@ -28503,322 +25550,196 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s8_x4 (int8_t * __a, int8x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - int8x16x4_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s8 (__val.val[3], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s8_x4 (int8_t * __a, int8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v16qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s16_x4 (int16_t * __a, int16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - int16x8x4_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s16 (__val.val[3], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s16_x4 (int16_t * __a, int16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s32_x4 (int32_t * __a, int32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - int32x4x4_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s32 (__val.val[3], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x4v2si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s32_x4 (int32_t * __a, int32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x4v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u8_x4 (uint8_t * __a, uint8x8x4_t 
__val) { - __builtin_aarch64_simd_xi __o; - uint8x16x4_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u8 (__val.val[3], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u8_x4 (uint8_t * __a, uint8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v16qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u16_x4 (uint16_t * __a, uint16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint16x8x4_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u16 (__val.val[3], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u16_x4 (uint16_t * __a, uint16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u32_x4 (uint32_t * __a, uint32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint32x4x4_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u32 (__val.val[3], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x4v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u32_x4 (uint32_t * __a, uint32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st1x4v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f16_x4 (float16_t * __a, float16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - float16x8x4_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 
(__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f16 (__val.val[3], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v4hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st1x4v4hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f16_x4 (float16_t * __a, float16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v8hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st1x4v8hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f32_x4 (float32_t * __a, float32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - float32x4x4_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f32 (__val.val[3], vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x4v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f32_x4 (float32_t * __a, float32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st1x4v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p8_x4 (poly8_t * __a, poly8x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly8x16x4_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p8 (__val.val[3], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p8_x4 (poly8_t * __a, poly8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st1x4v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p16_x4 (poly16_t * __a, poly16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly16x8x4_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = 
vcombine_p16 (__val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p16 (__val.val[3], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p16_x4 (poly16_t * __a, poly16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st1x4v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_s64_x4 (int64_t * __a, int64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - int64x2x4_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s64 (__val.val[3], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_u64_x4 (uint64_t * __a, uint64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint64x2x4_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u64 (__val.val[3], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_p64_x4 (poly64_t * __a, poly64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly64x2x4_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p64 (__val.val[3], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_s64_x4 (int64_t * __a, int64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4v2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_u64_x4 (uint64_t * __a, uint64x2x4_t __val) { - __builtin_aarch64_simd_xi 
__o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_p64_x4 (poly64_t * __a, poly64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st1x4v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_f64_x4 (float64_t * __a, float64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - float64x2x4_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f64 (__val.val[3], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x4df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_f64_x4 (float64_t * __a, float64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st1x4v2df ((__builtin_aarch64_simd_df *) __a, __val); } /* vstn */ @@ -28827,924 +25748,588 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_s64 (int64_t * __a, int64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - int64x2x2_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_u64 (uint64_t * __a, uint64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint64x2x2_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_f64 (float64_t * __a, float64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - float64x2x2_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ 
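/* Note on the D-register vst2/vst3/vst4 forms rewritten in this hunk: the
   old sequences widened each 64-bit vector with vcombine_* (x, vcreate_* (0))
   and then memcpy'd the widened tuple into an OI/CI/XI temporary.  With the
   new D-register tuple modes (V2x1DI, V2x8QI, ...) the builtin takes the
   narrow tuple type itself; for example the builtin behind vst2_u64 has,
   roughly (inferred from the call site, not quoted from the patch), the
   prototype

     void __builtin_aarch64_st2di_su (__builtin_aarch64_simd_di *,
                                      uint64x1x2_t);

   so the zero-padding disappears along with the opaque temporary.  */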
((__always_inline__, __gnu_inline__, __artificial__)) vst2_s8 (int8_t * __a, int8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - int8x16x2_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_p8 (poly8_t * __a, poly8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly8x16x2_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_s16 (int16_t * __a, int16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - int16x8x2_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_p16 (poly16_t * __a, poly16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly16x8x2_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_s32 (int32_t * __a, int32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - int32x4x2_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_u8 (uint8_t * __a, uint8x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint8x16x2_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_u16 (uint16_t * __a, uint16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - 
uint16x8x2_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_u32 (uint32_t * __a, uint32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - uint32x4x2_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st2v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_f16 (float16_t * __a, float16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - float16x8x2_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v4hf (__a, __o); + __builtin_aarch64_st2v4hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_f32 (float32_t * __a, float32x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - float32x4x2_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st2v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_p64 (poly64_t * __a, poly64x1x2_t __val) { - __builtin_aarch64_simd_oi __o; - poly64x2x2_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_s8 (int8_t * __a, int8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_p8 (poly8_t * __a, poly8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_s16 (int16_t * __a, 
int16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_p16 (poly16_t * __a, poly16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_s32 (int32_t * __a, int32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_s64 (int64_t * __a, int64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_u8 (uint8_t * __a, uint8x16x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st2v16qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_u16 (uint16_t * __a, uint16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st2v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_u32 (uint32_t * __a, uint32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st2v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_u64 (uint64_t * __a, uint64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_f16 (float16_t * __a, float16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v8hf (__a, __o); + __builtin_aarch64_st2v8hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_f32 (float32_t * __a, float32x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v4sf 
((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st2v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_f64 (float64_t * __a, float64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_p64 (poly64_t * __a, poly64x2x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st2v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_s64 (int64_t * __a, int64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - int64x2x3_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_u64 (uint64_t * __a, uint64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint64x2x3_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_f64 (float64_t * __a, float64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - float64x2x3_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st3df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_s8 (int8_t * __a, int8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - int8x16x3_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ 
((__always_inline__, __gnu_inline__, __artificial__)) vst3_p8 (poly8_t * __a, poly8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly8x16x3_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_s16 (int16_t * __a, int16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - int16x8x3_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_p16 (poly16_t * __a, poly16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly16x8x3_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p16 (__val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_s32 (int32_t * __a, int32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - int32x4x3_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_u8 (uint8_t * __a, uint8x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint8x16x3_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_u16 (uint16_t * __a, uint16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint16x8x3_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 
(__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_u32 (uint32_t * __a, uint32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - uint32x4x3_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st3v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_f16 (float16_t * __a, float16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - float16x8x3_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v4hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st3v4hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_f32 (float32_t * __a, float32x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - float32x4x3_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st3v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_p64 (poly64_t * __a, poly64x1x3_t __val) { - __builtin_aarch64_simd_ci __o; - poly64x2x3_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_s8 (int8_t * __a, int8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_p8 (poly8_t * __a, poly8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - 
__builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_s16 (int16_t * __a, int16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_p16 (poly16_t * __a, poly16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_s32 (int32_t * __a, int32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_s64 (int64_t * __a, int64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_u8 (uint8_t * __a, uint8x16x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st3v16qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_u16 (uint16_t * __a, uint16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st3v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_u32 (uint32_t * __a, uint32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st3v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_u64 (uint64_t * __a, uint64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_f16 (float16_t * __a, float16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v8hf ((__builtin_aarch64_simd_hf *) __a, __o); 
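/* For the Q-register forms such as vst3q_f16 the old code's only job was
   to reinterpret the tuple as an opaque CI-mode value via memcpy.  With the
   new modes (V3x8HF here) the builtin accepts float16x8x3_t directly, and
   because ST3 needs its three source vectors in consecutive SIMD registers,
   describing the tuple as a single machine-mode value lets the register
   allocator meet that constraint directly instead of shuffling data through
   the opaque temporary.  Illustration only (pointer names made up):

     float16x8x3_t __v = vld3q_f16 (__src);
     vst3q_f16 (__dst, __v);

   Both calls now operate on the tuple value itself rather than on a
   large-integer temporary.  */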
+ __builtin_aarch64_st3v8hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_f32 (float32_t * __a, float32x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_f64 (float64_t * __a, float64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_p64 (poly64_t * __a, poly64x2x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st3v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_s64 (int64_t * __a, int64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - int64x2x4_t __temp; - __temp.val[0] = vcombine_s64 (__val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s64 (__val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s64 (__val.val[2], vcreate_s64 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s64 (__val.val[3], vcreate_s64 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_u64 (uint64_t * __a, uint64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint64x2x4_t __temp; - __temp.val[0] = vcombine_u64 (__val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u64 (__val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u64 (__val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u64 (__val.val[3], vcreate_u64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_f64 (float64_t * __a, float64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - float64x2x4_t __temp; - __temp.val[0] = vcombine_f64 (__val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f64 (__val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f64 (__val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f64 (__val.val[3], vcreate_f64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st4df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_s8 
(int8_t * __a, int8x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - int8x16x4_t __temp; - __temp.val[0] = vcombine_s8 (__val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s8 (__val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s8 (__val.val[2], vcreate_s8 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s8 (__val.val[3], vcreate_s8 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_p8 (poly8_t * __a, poly8x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly8x16x4_t __temp; - __temp.val[0] = vcombine_p8 (__val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p8 (__val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p8 (__val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p8 (__val.val[3], vcreate_p8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v8qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_s16 (int16_t * __a, int16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - int16x8x4_t __temp; - __temp.val[0] = vcombine_s16 (__val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s16 (__val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s16 (__val.val[2], vcreate_s16 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s16 (__val.val[3], vcreate_s16 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_p16 (poly16_t * __a, poly16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly16x8x4_t __temp; - __temp.val[0] = vcombine_p16 (__val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p16 (__val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p16 (__val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p16 (__val.val[3], vcreate_p16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v4hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_s32 (int32_t * __a, int32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - int32x4x4_t __temp; - __temp.val[0] = vcombine_s32 (__val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[1] = vcombine_s32 (__val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[2] = vcombine_s32 (__val.val[2], vcreate_s32 (__AARCH64_INT64_C (0))); - __temp.val[3] = vcombine_s32 (__val.val[3], vcreate_s32 (__AARCH64_INT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si 
*) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_u8 (uint8_t * __a, uint8x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint8x16x4_t __temp; - __temp.val[0] = vcombine_u8 (__val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u8 (__val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u8 (__val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u8 (__val.val[3], vcreate_u8 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v8qi_su ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_u16 (uint16_t * __a, uint16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint16x8x4_t __temp; - __temp.val[0] = vcombine_u16 (__val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u16 (__val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u16 (__val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u16 (__val.val[3], vcreate_u16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v4hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_u32 (uint32_t * __a, uint32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - uint32x4x4_t __temp; - __temp.val[0] = vcombine_u32 (__val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_u32 (__val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_u32 (__val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_u32 (__val.val[3], vcreate_u32 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st4v2si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_f16 (float16_t * __a, float16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - float16x8x4_t __temp; - __temp.val[0] = vcombine_f16 (__val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f16 (__val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f16 (__val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f16 (__val.val[3], vcreate_f16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v4hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st4v4hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_f32 (float32_t * __a, float32x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - float32x4x4_t __temp; - __temp.val[0] = vcombine_f32 (__val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_f32 (__val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_f32 (__val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_f32 (__val.val[3], vcreate_f32 (__AARCH64_UINT64_C (0))); - 
__builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v2sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st4v2sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_p64 (poly64_t * __a, poly64x1x4_t __val) { - __builtin_aarch64_simd_xi __o; - poly64x2x4_t __temp; - __temp.val[0] = vcombine_p64 (__val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_p64 (__val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_p64 (__val.val[2], vcreate_p64 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_p64 (__val.val[3], vcreate_p64 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_s8 (int8_t * __a, int8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_p8 (poly8_t * __a, poly8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v16qi_sp ((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_s16 (int16_t * __a, int16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_p16 (poly16_t * __a, poly16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v8hi_sp ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_s32 (int32_t * __a, int32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_s64 (int64_t * __a, int64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_u8 (uint8_t * __a, uint8x16x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + __builtin_aarch64_st4v16qi_su 
((__builtin_aarch64_simd_qi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_u16 (uint16_t * __a, uint16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + __builtin_aarch64_st4v8hi_su ((__builtin_aarch64_simd_hi *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_u32 (uint32_t * __a, uint32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o); + __builtin_aarch64_st4v4si_su ((__builtin_aarch64_simd_si *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_u64 (uint64_t * __a, uint64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4v2di_su ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_f16 (float16_t * __a, float16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v8hf ((__builtin_aarch64_simd_hf *) __a, __o); + __builtin_aarch64_st4v8hf ((__builtin_aarch64_simd_hf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_f32 (float32_t * __a, float32x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_f64 (float64_t * __a, float64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __o); + __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_p64 (poly64_t * __a, poly64x2x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o); + __builtin_aarch64_st4v2di_sp ((__builtin_aarch64_simd_di *) __a, __val); } __extension__ extern __inline void @@ -29843,11 +26428,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbx4_s8 (int8x8_t __r, int8x8x4_t __tab, int8x8_t __idx) { int8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_s8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return __builtin_aarch64_qtbx2v8qi (__r, __o, __idx); + return __builtin_aarch64_qtbx2v8qi (__r, __temp, __idx); } __extension__ extern __inline uint8x8_t @@ -29855,12 +26438,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbx4_u8 (uint8x8_t __r, uint8x8x4_t __tab, uint8x8_t __idx) { uint8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_u8 (__tab.val[0], 
__tab.val[1]); __temp.val[1] = vcombine_u8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return (uint8x8_t)__builtin_aarch64_qtbx2v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx2v8qi_uuuu (__r, __temp, __idx); } __extension__ extern __inline poly8x8_t @@ -29868,12 +26448,9 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx) { poly8x16x2_t __temp; - __builtin_aarch64_simd_oi __o; __temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]); __temp.val[1] = vcombine_p8 (__tab.val[2], __tab.val[3]); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - return (poly8x8_t)__builtin_aarch64_qtbx2v8qi ((int8x8_t)__r, __o, - (int8x8_t)__idx); + return __builtin_aarch64_qtbx2v8qi_pppu (__r, __temp, __idx); } /* vtrn */ @@ -34302,69 +30879,42 @@ __extension__ extern __inline bfloat16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_bf16_x2 (const bfloat16_t *__a) { - bfloat16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_bf16_x2 (const bfloat16_t *__a) { - bfloat16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld1x2v8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 1); - return ret; + return __builtin_aarch64_ld1x2v8bf ( + (const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_bf16_x3 (const bfloat16_t *__a) { - bfloat16x4x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v4bf ((const __builtin_aarch64_simd_bf *) __a); - __i.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 0); - __i.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 1); - __i.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_bf16_x3 (const bfloat16_t *__a) { - bfloat16x8x3_t __i; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld1x3v8bf ((const __builtin_aarch64_simd_bf *) __a); - __i.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 0); - __i.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 1); - __i.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 2); - return __i; + return __builtin_aarch64_ld1x3v8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1_bf16_x4 (const bfloat16_t *__a) { - union { bfloat16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v4bf ((const __builtin_aarch64_simd_bf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v4bf ((const __builtin_aarch64_simd_bf *) 
__a); } __extension__ extern __inline bfloat16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld1q_bf16_x4 (const bfloat16_t *__a) { - union { bfloat16x8x4_t __i; __builtin_aarch64_simd_xi __o; } __au; - __au.__o - = __builtin_aarch64_ld1x4v8bf ((const __builtin_aarch64_simd_bf *) __a); - return __au.__i; + return __builtin_aarch64_ld1x4v8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4_t @@ -34399,156 +30949,84 @@ __extension__ extern __inline bfloat16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_bf16 (const bfloat16_t * __a) { - bfloat16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4bf (__a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 1); - return ret; + return __builtin_aarch64_ld2v4bf (__a); } __extension__ extern __inline bfloat16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_bf16 (const bfloat16_t * __a) { - bfloat16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 1); - return ret; + return __builtin_aarch64_ld2v8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2_dup_bf16 (const bfloat16_t * __a) { - bfloat16x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregoiv4bf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld2q_dup_bf16 (const bfloat16_t * __a) { - bfloat16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2rv8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv8bf (__o, 1); - return ret; + return __builtin_aarch64_ld2rv8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_bf16 (const bfloat16_t * __a) { - bfloat16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 1); - ret.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 2); - return ret; + return __builtin_aarch64_ld3v4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_bf16 (const bfloat16_t * __a) { - bfloat16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 1); - ret.val[2] 
= (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 2); - return ret; + return __builtin_aarch64_ld3v8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3_dup_bf16 (const bfloat16_t * __a) { - bfloat16x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 1); - ret.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregciv4bf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld3q_dup_bf16 (const bfloat16_t * __a) { - bfloat16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3rv8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 1); - ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregciv8bf (__o, 2); - return ret; + return __builtin_aarch64_ld3rv8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_bf16 (const bfloat16_t * __a) { - bfloat16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 1); - ret.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 2); - ret.val[3] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 3); - return ret; + return __builtin_aarch64_ld4v4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_bf16 (const bfloat16_t * __a) { - bfloat16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 1); - ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 2); - ret.val[3] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 3); - return ret; + return __builtin_aarch64_ld4v8bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4_dup_bf16 (const bfloat16_t * __a) { - bfloat16x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv4bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 0); - ret.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 1); - ret.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 2); - ret.val[3] = (bfloat16x4_t) __builtin_aarch64_get_dregxiv4bf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv4bf ((const __builtin_aarch64_simd_bf *) __a); } __extension__ extern __inline bfloat16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vld4q_dup_bf16 (const bfloat16_t * __a) { - bfloat16x8x4_t ret; - 
__builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4rv8bf ((const __builtin_aarch64_simd_bf *) __a); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 1); - ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 2); - ret.val[3] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv8bf (__o, 3); - return ret; + return __builtin_aarch64_ld4rv8bf ((const __builtin_aarch64_simd_bf *) __a); } /* vst */ @@ -34564,66 +31042,42 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_bf16_x2 (bfloat16_t * __a, bfloat16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - bfloat16x8x2_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x2v4bf (__a, __o); + __builtin_aarch64_st1x2v4bf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_bf16_x2 (bfloat16_t * __a, bfloat16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x2v8bf (__a, __o); + __builtin_aarch64_st1x2v8bf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_bf16_x3 (bfloat16_t * __a, bfloat16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - bfloat16x8x3_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x3v4bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st1x3v4bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_bf16_x3 (bfloat16_t * __a, bfloat16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x3v8bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st1x3v8bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1_bf16_x4 (bfloat16_t * __a, bfloat16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - bfloat16x8x4_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_bf16 (__val.val[3], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st1x4v4bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st1x4v4bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst1q_bf16_x4 (bfloat16_t * __a, bfloat16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st1x4v8bf ((__builtin_aarch64_simd_bf *) __a, __o); + 
__builtin_aarch64_st1x4v8bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void @@ -34651,66 +31105,42 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_bf16 (bfloat16_t * __a, bfloat16x4x2_t __val) { - __builtin_aarch64_simd_oi __o; - bfloat16x8x2_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2v4bf (__a, __o); + __builtin_aarch64_st2v4bf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_bf16 (bfloat16_t * __a, bfloat16x8x2_t __val) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2v8bf (__a, __o); + __builtin_aarch64_st2v8bf (__a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_bf16 (bfloat16_t * __a, bfloat16x4x3_t __val) { - __builtin_aarch64_simd_ci __o; - bfloat16x8x3_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3v4bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st3v4bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_bf16 (bfloat16_t * __a, bfloat16x8x3_t __val) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3v8bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st3v8bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_bf16 (bfloat16_t * __a, bfloat16x4x4_t __val) { - __builtin_aarch64_simd_xi __o; - bfloat16x8x4_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_bf16 (__val.val[3], vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4v4bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st4v4bf ((__builtin_aarch64_simd_bf *) __a, __val); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_bf16 (bfloat16_t * __a, bfloat16x8x4_t __val) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4v8bf ((__builtin_aarch64_simd_bf *) __a, __o); + __builtin_aarch64_st4v8bf ((__builtin_aarch64_simd_bf *) __a, __val); } /* vreinterpret */ @@ -35317,125 +31747,55 @@ __extension__ extern __inline bfloat16x4x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2_lane_bf16 (const bfloat16_t * __ptr, bfloat16x4x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - bfloat16x8x2_t __temp; - __temp.val[0] = vcombine_bf16 (__b.val[0], vcreate_bf16 (0)); - 
__temp.val[1] = vcombine_bf16 (__b.val[1], vcreate_bf16 (0)); - __o = __builtin_aarch64_set_qregoiv8bf (__o, (bfloat16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8bf (__o, (bfloat16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_ld2_lanev4bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - __b.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregoidi (__o, 0); - __b.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregoidi (__o, 1); - return __b; + return __builtin_aarch64_ld2_lanev4bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline bfloat16x8x2_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld2q_lane_bf16 (const bfloat16_t * __ptr, bfloat16x8x2_t __b, const int __c) { - __builtin_aarch64_simd_oi __o; - bfloat16x8x2_t ret; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_ld2_lanev8bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + return __builtin_aarch64_ld2_lanev8bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline bfloat16x4x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3_lane_bf16 (const bfloat16_t * __ptr, bfloat16x4x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - bfloat16x8x3_t __temp; - __temp.val[0] = vcombine_bf16 (__b.val[0], vcreate_bf16 (0)); - __temp.val[1] = vcombine_bf16 (__b.val[1], vcreate_bf16 (0)); - __temp.val[2] = vcombine_bf16 (__b.val[2], vcreate_bf16 (0)); - __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_ld3_lanev4bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - __b.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregcidi (__o, 0); - __b.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregcidi (__o, 1); - __b.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregcidi (__o, 2); - return __b; + return __builtin_aarch64_ld3_lanev4bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline bfloat16x8x3_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld3q_lane_bf16 (const bfloat16_t * __ptr, bfloat16x8x3_t __b, const int __c) { - __builtin_aarch64_simd_ci __o; - bfloat16x8x3_t ret; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_ld3_lanev8bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + return __builtin_aarch64_ld3_lanev8bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline bfloat16x4x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4_lane_bf16 (const bfloat16_t * __ptr, bfloat16x4x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - bfloat16x8x4_t __temp; - 
__temp.val[0] = vcombine_bf16 (__b.val[0], vcreate_bf16 (0)); - __temp.val[1] = vcombine_bf16 (__b.val[1], vcreate_bf16 (0)); - __temp.val[2] = vcombine_bf16 (__b.val[2], vcreate_bf16 (0)); - __temp.val[3] = vcombine_bf16 (__b.val[3], vcreate_bf16 (0)); - __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8bf (__o, (bfloat16x8_t) __temp.val[3], 3); - __o = __builtin_aarch64_ld4_lanev4bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - __b.val[0] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 0); - __b.val[1] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 1); - __b.val[2] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 2); - __b.val[3] = (bfloat16x4_t) __builtin_aarch64_get_dregxidi (__o, 3); - return __b; + return __builtin_aarch64_ld4_lanev4bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline bfloat16x8x4_t __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) vld4q_lane_bf16 (const bfloat16_t * __ptr, bfloat16x8x4_t __b, const int __c) { - __builtin_aarch64_simd_xi __o; - bfloat16x8x4_t ret; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); - __o = __builtin_aarch64_ld4_lanev8bf ( - (__builtin_aarch64_simd_bf *) __ptr, __o, __c); - ret.val[0] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (bfloat16x8_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + return __builtin_aarch64_ld4_lanev8bf ( + (__builtin_aarch64_simd_bf *) __ptr, __b, __c); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2_lane_bf16 (bfloat16_t *__ptr, bfloat16x4x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - bfloat16x8x2_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st2_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st2_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } @@ -35443,9 +31803,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst2q_lane_bf16 (bfloat16_t *__ptr, bfloat16x8x2_t __val, const int __lane) { - __builtin_aarch64_simd_oi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st2_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st2_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } @@ -35453,16 +31811,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3_lane_bf16 (bfloat16_t *__ptr, bfloat16x4x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - bfloat16x8x3_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] 
= vcombine_bf16 (__val.val[1], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st3_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st3_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } @@ -35470,9 +31819,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst3q_lane_bf16 (bfloat16_t *__ptr, bfloat16x8x3_t __val, const int __lane) { - __builtin_aarch64_simd_ci __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st3_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st3_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } @@ -35480,18 +31827,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4_lane_bf16 (bfloat16_t *__ptr, bfloat16x4x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - bfloat16x8x4_t __temp; - __temp.val[0] = vcombine_bf16 (__val.val[0], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[1] = vcombine_bf16 (__val.val[1], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[2] = vcombine_bf16 (__val.val[2], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __temp.val[3] = vcombine_bf16 (__val.val[3], - vcreate_bf16 (__AARCH64_UINT64_C (0))); - __builtin_memcpy (&__o, &__temp, sizeof (__temp)); - __builtin_aarch64_st4_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st4_lanev4bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } @@ -35499,9 +31835,7 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vst4q_lane_bf16 (bfloat16_t *__ptr, bfloat16x8x4_t __val, const int __lane) { - __builtin_aarch64_simd_xi __o; - __builtin_memcpy (&__o, &__val, sizeof (__val)); - __builtin_aarch64_st4_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __o, + __builtin_aarch64_st4_lanev8bf ((__builtin_aarch64_simd_bf *) __ptr, __val, __lane); } diff --git a/gcc/config/aarch64/geniterators.sh b/gcc/config/aarch64/geniterators.sh index 5fd8bec9..52e07d4 100644 --- a/gcc/config/aarch64/geniterators.sh +++ b/gcc/config/aarch64/geniterators.sh @@ -64,7 +64,7 @@ iterdef { gsub(/ *"[^"]*" *\)/, "", s) gsub(/\( */, "", s) - if (s !~ /^[A-Za-z0-9_]+ \[[A-Z0-9 ]*\]$/) + if (s !~ /^[A-Za-z0-9_]+ \[[A-Za-z0-9 ]*\]$/) next sub(/\[ */, "", s) sub(/ *\]/, "", s) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index aee32dc..bdc8ba3 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -299,6 +299,102 @@ ;; Advanced SIMD opaque structure modes. (define_mode_iterator VSTRUCT [OI CI XI]) +;; Advanced SIMD 64-bit vector structure modes. +(define_mode_iterator VSTRUCT_D [V2x8QI V2x4HI V2x2SI V2x1DI + V2x4HF V2x2SF V2x1DF V2x4BF + V3x8QI V3x4HI V3x2SI V3x1DI + V3x4HF V3x2SF V3x1DF V3x4BF + V4x8QI V4x4HI V4x2SI V4x1DI + V4x4HF V4x2SF V4x1DF V4x4BF]) + +;; Advanced SIMD 64-bit 2-vector structure modes. +(define_mode_iterator VSTRUCT_2D [V2x8QI V2x4HI V2x2SI V2x1DI + V2x4HF V2x2SF V2x1DF V2x4BF]) + +;; Advanced SIMD 64-bit 3-vector structure modes. +(define_mode_iterator VSTRUCT_3D [V3x8QI V3x4HI V3x2SI V3x1DI + V3x4HF V3x2SF V3x1DF V3x4BF]) + +;; Advanced SIMD 64-bit 4-vector structure modes. 
+(define_mode_iterator VSTRUCT_4D [V4x8QI V4x4HI V4x2SI V4x1DI + V4x4HF V4x2SF V4x1DF V4x4BF]) + +;; Advanced SIMD 64-bit 2-vector structure modes minus V2x1DI and V2x1DF. +(define_mode_iterator VSTRUCT_2DNX [V2x8QI V2x4HI V2x2SI V2x4HF + V2x2SF V2x4BF]) + +;; Advanced SIMD 64-bit 3-vector structure modes minus V3x1DI and V3x1DF. +(define_mode_iterator VSTRUCT_3DNX [V3x8QI V3x4HI V3x2SI V3x4HF + V3x2SF V3x4BF]) + +;; Advanced SIMD 64-bit 4-vector structure modes minus V4x1DI and V4x1DF. +(define_mode_iterator VSTRUCT_4DNX [V4x8QI V4x4HI V4x2SI V4x4HF + V4x2SF V4x4BF]) + +;; Advanced SIMD 64-bit structure modes with 64-bit elements. +(define_mode_iterator VSTRUCT_DX [V2x1DI V2x1DF V3x1DI V3x1DF V4x1DI V4x1DF]) + +;; Advanced SIMD 64-bit 2-vector structure modes with 64-bit elements. +(define_mode_iterator VSTRUCT_2DX [V2x1DI V2x1DF]) + +;; Advanced SIMD 64-bit 3-vector structure modes with 64-bit elements. +(define_mode_iterator VSTRUCT_3DX [V3x1DI V3x1DF]) + +;; Advanced SIMD 64-bit 4-vector structure modes with 64-bit elements. +(define_mode_iterator VSTRUCT_4DX [V4x1DI V4x1DF]) + +;; Advanced SIMD 128-bit vector structure modes. +(define_mode_iterator VSTRUCT_Q [V2x16QI V2x8HI V2x4SI V2x2DI + V2x8HF V2x4SF V2x2DF V2x8BF + V3x16QI V3x8HI V3x4SI V3x2DI + V3x8HF V3x4SF V3x2DF V3x8BF + V4x16QI V4x8HI V4x4SI V4x2DI + V4x8HF V4x4SF V4x2DF V4x8BF]) + +;; Advanced SIMD 128-bit 2-vector structure modes. +(define_mode_iterator VSTRUCT_2Q [V2x16QI V2x8HI V2x4SI V2x2DI + V2x8HF V2x4SF V2x2DF V2x8BF]) + +;; Advanced SIMD 128-bit 3-vector structure modes. +(define_mode_iterator VSTRUCT_3Q [V3x16QI V3x8HI V3x4SI V3x2DI + V3x8HF V3x4SF V3x2DF V3x8BF]) + +;; Advanced SIMD 128-bit 4-vector structure modes. +(define_mode_iterator VSTRUCT_4Q [V4x16QI V4x8HI V4x4SI V4x2DI + V4x8HF V4x4SF V4x2DF V4x8BF]) + +;; Advanced SIMD 2-vector structure modes. +(define_mode_iterator VSTRUCT_2QD [V2x8QI V2x4HI V2x2SI V2x1DI + V2x4HF V2x2SF V2x1DF V2x4BF + V2x16QI V2x8HI V2x4SI V2x2DI + V2x8HF V2x4SF V2x2DF V2x8BF]) + +;; Advanced SIMD 3-vector structure modes. +(define_mode_iterator VSTRUCT_3QD [V3x8QI V3x4HI V3x2SI V3x1DI + V3x4HF V3x2SF V3x1DF V3x4BF + V3x16QI V3x8HI V3x4SI V3x2DI + V3x8HF V3x4SF V3x2DF V3x8BF]) + +;; Advanced SIMD 4-vector structure modes. +(define_mode_iterator VSTRUCT_4QD [V4x8QI V4x4HI V4x2SI V4x1DI + V4x4HF V4x2SF V4x1DF V4x4BF + V4x16QI V4x8HI V4x4SI V4x2DI + V4x8HF V4x4SF V4x2DF V4x8BF]) + +;; Advanced SIMD vector structure modes. 
+(define_mode_iterator VSTRUCT_QD [V2x8QI V2x4HI V2x2SI V2x1DI + V2x4HF V2x2SF V2x1DF V2x4BF + V3x8QI V3x4HI V3x2SI V3x1DI + V3x4HF V3x2SF V3x1DF V3x4BF + V4x8QI V4x4HI V4x2SI V4x1DI + V4x4HF V4x2SF V4x1DF V4x4BF + V2x16QI V2x8HI V2x4SI V2x2DI + V2x8HF V2x4SF V2x2DF V2x8BF + V3x16QI V3x8HI V3x4SI V3x2DI + V3x8HF V3x4SF V3x2DF V3x8BF + V4x16QI V4x8HI V4x4SI V4x2DI + V4x8HF V4x4SF V4x2DF V4x8BF]) + ;; Double scalar modes (define_mode_iterator DX [DI DF]) @@ -1021,7 +1117,31 @@ (DI "1d") (DF "1d") (V2DI "2d") (V2SF "2s") (V4SF "4s") (V2DF "2d") - (V4HF "4h") (V8HF "8h")]) + (V4HF "4h") (V8HF "8h") + (V2x8QI "8b") (V2x4HI "4h") + (V2x2SI "2s") (V2x1DI "1d") + (V2x4HF "4h") (V2x2SF "2s") + (V2x1DF "1d") (V2x4BF "4h") + (V2x16QI "16b") (V2x8HI "8h") + (V2x4SI "4s") (V2x2DI "2d") + (V2x8HF "8h") (V2x4SF "4s") + (V2x2DF "2d") (V2x8BF "8h") + (V3x8QI "8b") (V3x4HI "4h") + (V3x2SI "2s") (V3x1DI "1d") + (V3x4HF "4h") (V3x2SF "2s") + (V3x1DF "1d") (V3x4BF "4h") + (V3x16QI "16b") (V3x8HI "8h") + (V3x4SI "4s") (V3x2DI "2d") + (V3x8HF "8h") (V3x4SF "4s") + (V3x2DF "2d") (V3x8BF "8h") + (V4x8QI "8b") (V4x4HI "4h") + (V4x2SI "2s") (V4x1DI "1d") + (V4x4HF "4h") (V4x2SF "2s") + (V4x1DF "1d") (V4x4BF "4h") + (V4x16QI "16b") (V4x8HI "8h") + (V4x4SI "4s") (V4x2DI "2d") + (V4x8HF "8h") (V4x4SF "4s") + (V4x2DF "2d") (V4x8BF "8h")]) ;; Map mode to type used in widening multiplies. (define_mode_attr Vcondtype [(V4HI "4h") (V8HI "4h") (V2SI "2s") (V4SI "2s")]) @@ -1059,6 +1179,30 @@ (V4HF "h") (V8HF "h") (V2SF "s") (V4SF "s") (V2DF "d") + (V2x8QI "b") (V2x4HI "h") + (V2x2SI "s") (V2x1DI "d") + (V2x4HF "h") (V2x2SF "s") + (V2x1DF "d") (V2x4BF "h") + (V2x16QI "b") (V2x8HI "h") + (V2x4SI "s") (V2x2DI "d") + (V2x8HF "h") (V2x4SF "s") + (V2x2DF "d") (V2x8BF "h") + (V3x8QI "b") (V3x4HI "h") + (V3x2SI "s") (V3x1DI "d") + (V3x4HF "h") (V3x2SF "s") + (V3x1DF "d") (V3x4BF "h") + (V3x16QI "b") (V3x8HI "h") + (V3x4SI "s") (V3x2DI "d") + (V3x8HF "h") (V3x4SF "s") + (V3x2DF "d") (V3x8BF "h") + (V4x8QI "b") (V4x4HI "h") + (V4x2SI "s") (V4x1DI "d") + (V4x4HF "h") (V4x2SF "s") + (V4x1DF "d") (V4x4BF "h") + (V4x16QI "b") (V4x8HI "h") + (V4x4SI "s") (V4x2DI "d") + (V4x8HF "h") (V4x4SF "s") + (V4x2DF "d") (V4x8BF "h") (VNx16BI "b") (VNx8BI "h") (VNx4BI "s") (VNx2BI "d") (VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b") (VNx8HI "h") (VNx4HI "h") (VNx2HI "h") @@ -1138,6 +1282,58 @@ (SI "8b") (SF "8b") (V4BF "8b") (V8BF "16b")]) +;; Advanced SIMD vector structure to element modes. +(define_mode_attr VSTRUCT_ELT [(V2x8QI "V8QI") (V2x4HI "V4HI") + (V2x2SI "V2SI") (V2x1DI "DI") + (V2x4HF "V4HF") (V2x2SF "V2SF") + (V2x1DF "DF") (V2x4BF "V4BF") + (V3x8QI "V8QI") (V3x4HI "V4HI") + (V3x2SI "V2SI") (V3x1DI "DI") + (V3x4HF "V4HF") (V3x2SF "V2SF") + (V3x1DF "DF") (V3x4BF "V4BF") + (V4x8QI "V8QI") (V4x4HI "V4HI") + (V4x2SI "V2SI") (V4x1DI "DI") + (V4x4HF "V4HF") (V4x2SF "V2SF") + (V4x1DF "DF") (V4x4BF "V4BF") + (V2x16QI "V16QI") (V2x8HI "V8HI") + (V2x4SI "V4SI") (V2x2DI "V2DI") + (V2x8HF "V8HF") (V2x4SF "V4SF") + (V2x2DF "V2DF") (V2x8BF "V8BF") + (V3x16QI "V16QI") (V3x8HI "V8HI") + (V3x4SI "V4SI") (V3x2DI "V2DI") + (V3x8HF "V8HF") (V3x4SF "V4SF") + (V3x2DF "V2DF") (V3x8BF "V8BF") + (V4x16QI "V16QI") (V4x8HI "V8HI") + (V4x4SI "V4SI") (V4x2DI "V2DI") + (V4x8HF "V8HF") (V4x4SF "V4SF") + (V4x2DF "V2DF") (V4x8BF "V8BF")]) + +;; Advanced SIMD vector structure to element modes in lower case. 
+(define_mode_attr vstruct_elt [(V2x8QI "v8qi") (V2x4HI "v4hi") + (V2x2SI "v2si") (V2x1DI "di") + (V2x4HF "v4hf") (V2x2SF "v2sf") + (V2x1DF "df") (V2x4BF "v4bf") + (V3x8QI "v8qi") (V3x4HI "v4hi") + (V3x2SI "v2si") (V3x1DI "di") + (V3x4HF "v4hf") (V3x2SF "v2sf") + (V3x1DF "df") (V3x4BF "v4bf") + (V4x8QI "v8qi") (V4x4HI "v4hi") + (V4x2SI "v2si") (V4x1DI "di") + (V4x4HF "v4hf") (V4x2SF "v2sf") + (V4x1DF "df") (V4x4BF "v4bf") + (V2x16QI "v16qi") (V2x8HI "v8hi") + (V2x4SI "v4si") (V2x2DI "v2di") + (V2x8HF "v8hf") (V2x4SF "v4sf") + (V2x2DF "v2df") (V2x8BF "v8bf") + (V3x16QI "v16qi") (V3x8HI "v8hi") + (V3x4SI "v4si") (V3x2DI "v2di") + (V3x8HF "v8hf") (V3x4SF "v4sf") + (V3x2DF "v2df") (V3x8BF "v8bf") + (V4x16QI "v16qi") (V4x8HI "v8hi") + (V4x4SI "v4si") (V4x2DI "v2di") + (V4x8HF "v8hf") (V4x4SF "v4sf") + (V4x2DF "v2df") (V4x8BF "v8bf")]) + ;; Define element mode for each vector mode. (define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (V4HI "HI") (V8HI "HI") @@ -1492,12 +1688,60 @@ (define_mode_attr vwx [(V4HI "x") (V8HI "x") (HI "x") (V2SI "w") (V4SI "w") (SI "w")]) -(define_mode_attr Vendreg [(OI "T") (CI "U") (XI "V")]) +(define_mode_attr Vendreg [(OI "T") (CI "U") (XI "V") + (V2x8QI "T") (V2x16QI "T") + (V2x4HI "T") (V2x8HI "T") + (V2x2SI "T") (V2x4SI "T") + (V2x1DI "T") (V2x2DI "T") + (V2x4HF "T") (V2x8HF "T") + (V2x2SF "T") (V2x4SF "T") + (V2x1DF "T") (V2x2DF "T") + (V2x4BF "T") (V2x8BF "T") + (V3x8QI "U") (V3x16QI "U") + (V3x4HI "U") (V3x8HI "U") + (V3x2SI "U") (V3x4SI "U") + (V3x1DI "U") (V3x2DI "U") + (V3x4HF "U") (V3x8HF "U") + (V3x2SF "U") (V3x4SF "U") + (V3x1DF "U") (V3x2DF "U") + (V3x4BF "U") (V3x8BF "U") + (V4x8QI "V") (V4x16QI "V") + (V4x4HI "V") (V4x8HI "V") + (V4x2SI "V") (V4x4SI "V") + (V4x1DI "V") (V4x2DI "V") + (V4x4HF "V") (V4x8HF "V") + (V4x2SF "V") (V4x4SF "V") + (V4x1DF "V") (V4x2DF "V") + (V4x4BF "V") (V4x8BF "V")]) ;; This is both the number of Q-Registers needed to hold the corresponding ;; opaque large integer mode, and the number of elements touched by the ;; ld..._lane and st..._lane operations. 
-(define_mode_attr nregs [(OI "2") (CI "3") (XI "4")]) +(define_mode_attr nregs [(OI "2") (CI "3") (XI "4") + (V2x8QI "2") (V2x16QI "2") + (V2x4HI "2") (V2x8HI "2") + (V2x2SI "2") (V2x4SI "2") + (V2x1DI "2") (V2x2DI "2") + (V2x4HF "2") (V2x8HF "2") + (V2x2SF "2") (V2x4SF "2") + (V2x1DF "2") (V2x2DF "2") + (V2x4BF "2") (V2x8BF "2") + (V3x8QI "3") (V3x16QI "3") + (V3x4HI "3") (V3x8HI "3") + (V3x2SI "3") (V3x4SI "3") + (V3x1DI "3") (V3x2DI "3") + (V3x4HF "3") (V3x8HF "3") + (V3x2SF "3") (V3x4SF "3") + (V3x1DF "3") (V3x2DF "3") + (V3x4BF "3") (V3x8BF "3") + (V4x8QI "4") (V4x16QI "4") + (V4x4HI "4") (V4x8HI "4") + (V4x2SI "4") (V4x4SI "4") + (V4x1DI "4") (V4x2DI "4") + (V4x4HF "4") (V4x8HF "4") + (V4x2SF "4") (V4x4SF "4") + (V4x1DF "4") (V4x2DF "4") + (V4x4BF "4") (V4x8BF "4")]) ;; Mode for atomic operation suffixes (define_mode_attr atomic_sfx @@ -1575,7 +1819,31 @@ (V4BF "") (V8BF "_q") (V2SF "") (V4SF "_q") (V2DF "_q") - (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")]) + (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "") + (V2x8QI "") (V2x16QI "_q") + (V2x4HI "") (V2x8HI "_q") + (V2x2SI "") (V2x4SI "_q") + (V2x1DI "") (V2x2DI "_q") + (V2x4HF "") (V2x8HF "_q") + (V2x2SF "") (V2x4SF "_q") + (V2x1DF "") (V2x2DF "_q") + (V2x4BF "") (V2x8BF "_q") + (V3x8QI "") (V3x16QI "_q") + (V3x4HI "") (V3x8HI "_q") + (V3x2SI "") (V3x4SI "_q") + (V3x1DI "") (V3x2DI "_q") + (V3x4HF "") (V3x8HF "_q") + (V3x2SF "") (V3x4SF "_q") + (V3x1DF "") (V3x2DF "_q") + (V3x4BF "") (V3x8BF "_q") + (V4x8QI "") (V4x16QI "_q") + (V4x4HI "") (V4x8HI "_q") + (V4x2SI "") (V4x4SI "_q") + (V4x1DI "") (V4x2DI "_q") + (V4x4HF "") (V4x8HF "_q") + (V4x2SF "") (V4x4SF "_q") + (V4x1DF "") (V4x2DF "_q") + (V4x4BF "") (V4x8BF "_q")]) (define_mode_attr vp [(V8QI "v") (V16QI "v") (V4HI "v") (V8HI "v") @@ -1597,7 +1865,31 @@ (define_mode_attr Vbfdottype [(V2SF "4h") (V4SF "8h")]) ;; Sum of lengths of instructions needed to move vector registers of a mode. -(define_mode_attr insn_count [(OI "8") (CI "12") (XI "16")]) +(define_mode_attr insn_count [(OI "8") (CI "12") (XI "16") + (V2x8QI "8") (V2x16QI "8") + (V2x4HI "8") (V2x8HI "8") + (V2x2SI "8") (V2x4SI "8") + (V2x1DI "8") (V2x2DI "8") + (V2x4HF "8") (V2x8HF "8") + (V2x2SF "8") (V2x4SF "8") + (V2x1DF "8") (V2x2DF "8") + (V2x4BF "8") (V2x8BF "8") + (V3x8QI "12") (V3x16QI "12") + (V3x4HI "12") (V3x8HI "12") + (V3x2SI "12") (V3x4SI "12") + (V3x1DI "12") (V3x2DI "12") + (V3x4HF "12") (V3x8HF "12") + (V3x2SF "12") (V3x4SF "12") + (V3x1DF "12") (V3x2DF "12") + (V3x4BF "12") (V3x8BF "12") + (V4x8QI "16") (V4x16QI "16") + (V4x4HI "16") (V4x8HI "16") + (V4x2SI "16") (V4x4SI "16") + (V4x1DI "16") (V4x2DI "16") + (V4x4HF "16") (V4x8HF "16") + (V4x2SF "16") (V4x4SF "16") + (V4x1DF "16") (V4x2DF "16") + (V4x4BF "16") (V4x8BF "16")]) ;; -fpic small model GOT reloc modifers: gotpage_lo15/lo14 for ILP64/32. ;; No need of iterator for -fPIC as it use got_lo12 for both modes. diff --git a/gcc/genmodes.c b/gcc/genmodes.c index c268ebc..c9af4ef 100644 --- a/gcc/genmodes.c +++ b/gcc/genmodes.c @@ -749,12 +749,15 @@ make_partial_integer_mode (const char *base, const char *name, /* A single vector mode can be specified by naming its component mode and the number of components. 
*/ -#define VECTOR_MODE(C, M, N) \ - make_vector_mode (MODE_##C, #M, N, __FILE__, __LINE__); +#define VECTOR_MODE_WITH_PREFIX(PREFIX, C, M, N, ORDER) \ + make_vector_mode (MODE_##C, #PREFIX, #M, N, ORDER, __FILE__, __LINE__); +#define VECTOR_MODE(C, M, N) VECTOR_MODE_WITH_PREFIX(V, C, M, N, 0); static void ATTRIBUTE_UNUSED make_vector_mode (enum mode_class bclass, + const char *prefix, const char *base, unsigned int ncomponents, + unsigned int order, const char *file, unsigned int line) { struct mode_data *v; @@ -778,7 +781,7 @@ make_vector_mode (enum mode_class bclass, return; } - if ((size_t)snprintf (namebuf, sizeof namebuf, "V%u%s", + if ((size_t)snprintf (namebuf, sizeof namebuf, "%s%u%s", prefix, ncomponents, base) >= sizeof namebuf) { error ("%s:%d: mode name \"%s\" is too long", @@ -787,6 +790,7 @@ make_vector_mode (enum mode_class bclass, } v = new_mode (vclass, xstrdup (namebuf), file, line); + v->order = order; v->ncomponents = ncomponents; v->component = component; } diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c index 670cf0b..87b5fc3 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c @@ -17,7 +17,7 @@ test_vld2q_lane_bf16 (const bfloat16_t *ptr, bfloat16x8x2_t b) return vld2q_lane_bf16 (ptr, b, 2); } -/* { dg-final { scan-assembler-times "ld2\\t{v2.h - v3.h}\\\[2\\\], \\\[x0\\\]" 2 } } */ +/* { dg-final { scan-assembler-times "ld2\\t{v\[0-9\]+.h - v\[0-9\]+.h}\\\[2\\\], \\\[x0\\\]" 2 } } */ bfloat16x4x3_t test_vld3_lane_bf16 (const bfloat16_t *ptr, bfloat16x4x3_t b) @@ -25,15 +25,13 @@ test_vld3_lane_bf16 (const bfloat16_t *ptr, bfloat16x4x3_t b) return vld3_lane_bf16 (ptr, b, 2); } -/* { dg-final { scan-assembler-times "ld3\t{v4.h - v6.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ - bfloat16x8x3_t test_vld3q_lane_bf16 (const bfloat16_t *ptr, bfloat16x8x3_t b) { return vld3q_lane_bf16 (ptr, b, 2); } -/* { dg-final { scan-assembler-times "ld3\t{v1.h - v3.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ +/* { dg-final { scan-assembler-times "ld3\t{v\[0-9\]+.h - v\[0-9\]+.h}\\\[2\\\], \\\[x0\\\]" 2 } } */ bfloat16x4x4_t test_vld4_lane_bf16 (const bfloat16_t *ptr, bfloat16x4x4_t b) @@ -41,12 +39,10 @@ test_vld4_lane_bf16 (const bfloat16_t *ptr, bfloat16x4x4_t b) return vld4_lane_bf16 (ptr, b, 2); } -/* { dg-final { scan-assembler-times "ld4\t{v4.h - v7.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ - bfloat16x8x4_t test_vld4q_lane_bf16 (const bfloat16_t *ptr, bfloat16x8x4_t b) { return vld4q_lane_bf16 (ptr, b, 2); } -/* { dg-final { scan-assembler-times "ld4\t{v0.h - v3.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ +/* { dg-final { scan-assembler-times "ld4\t{v\[0-9\]+.h - v\[0-9\]+.h}\\\[2\\\], \\\[x0\\\]" 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_256.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_256.c index fdfbec5..af0af29 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_256.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_256.c @@ -933,7 +933,11 @@ SEL2 (struct, nonpst1) /* ** test_nonpst1: +** ( +** fmov d0, d3 +** | ** mov v0\.8b, v3\.8b +** ) ** ret */ /* { dg-final { scan-assembler-not {\t\.variant_pcs\ttest_nonpst1\n} } } */
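
For reference, a minimal user-level example (hypothetical, not part of this patch) of the kind of code that flows through the rewritten wrappers above: after the change, vld4q_u8/vst4q_u8 hand the uint8x16x4_t tuple straight to the builtins instead of copying it into an opaque __builtin_aarch64_simd_xi temporary with __builtin_memcpy, so a tuple manipulated element-by-element like this can be modelled directly in the new V4x16QI mode.

    #include <stddef.h>
    #include <arm_neon.h>

    /* Hypothetical example (not from the patch): swap the R and B channels
       of interleaved RGBA bytes.  vld4q_u8 de-interleaves 64 bytes into a
       uint8x16x4_t and vst4q_u8 takes the tuple by value, so modifying one
       val[i] no longer has to go through an opaque XImode set/get sequence.  */
    void
    swap_r_b (uint8_t *rgba, size_t n)
    {
      for (size_t i = 0; i + 64 <= n; i += 64)
        {
          uint8x16x4_t px = vld4q_u8 (rgba + i);   /* de-interleave 16 pixels */
          uint8x16_t tmp = px.val[0];
          px.val[0] = px.val[2];                   /* swap R and B planes */
          px.val[2] = tmp;
          vst4q_u8 (rgba + i, px);                 /* re-interleave and store */
        }
    }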