review.tizen.org Git - platform/upstream/gcc.git/log

Fortran manual: Revise introductory chapter.

Fix various bit-rot in the discussion of standards conformance, remove
material that is only of historical interest, copy-editing.  Also move
discussion of preprocessing out of the introductory chapter.

2021-11-01  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (About GNU Fortran): Consolidate material
formerly in other sections.  Copy-editing.
(Preprocessing and conditional compilation): Delete, moving
most material to invoke.texi.
(GNU Fortran and G77): Delete.
(Project Status): Delete.
(Standards): Update.
(Fortran 95 status): Mention conditional compilation here.
(Fortran 2003 status): Rewrite to mention the 1 missing feature
instead of all the ones implemented.
(Fortran 2008 status): Similarly for the 2 missing features.
(Fortran 2018 status): Rewrite to reflect completion of TS29113
feature support.
* invoke.texi (Preprocessing Options): Move material formerly
in introductory chapter here.

Fortran manual: Combine standard conformance docs in one place.

Discussion of conformance with various revisions of the
Fortran standard was split between two separate parts of the
manual. This patch moves it all to the introductory chapter.

2021-11-01 Sandra Loosemore <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (Standards): Move discussion of specific
standard versions here....
(Fortran standards status): ...from here, and delete this node.

Workaround ICE in gimple_call_static_chain_flags

gcc/ChangeLog:

2021-11-04 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103058
* gimple.c (gimple_call_static_chain_flags): Handle case when
nested function does not bind locally.

c++: use range-for more

gcc/cp/ChangeLog:

* call.c (build_array_conv): Use range-for.
(build_complex_conv): Likewise.
* constexpr.c (clear_no_implicit_zero)
(reduced_constant_expression_p): Likewise.
* decl.c (cp_complete_array_type): Likewise.
* decl2.c (mark_vtable_entries): Likewise.
* pt.c (iterative_hash_template_arg):
(invalid_tparm_referent_p, unify)
(type_dependent_expression_p): Likewise.
* typeck.c (build_ptrmemfunc_access_expr): Likewise.

aarch64: Pass and return Neon vector-tuple types without a parallel

Neon vector-tuple types can be passed in registers on function call
and return - there is no need to generate a parallel rtx. This patch
adds cases to detect vector-tuple modes and generates an appropriate
register rtx.

This change greatly improves code generated when passing Neon vector-
tuple types between functions; many new test cases are added to
defend these improvements.

gcc/ChangeLog:

2021-10-07 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64.c (aarch64_function_value): Generate
a register rtx for Neon vector-tuple modes.
(aarch64_layout_arg): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New code
generation tests.

gcc/lower_subreg.c: Prevent decomposition if modes are not tieable

Preventing decomposition if modes are not tieable is necessary to
stop AArch64 partial Neon structure modes being treated as packed in
registers.

This is a necessary prerequisite for a future AArch64 PCS change to
maintain good code generation.

gcc/ChangeLog:

2021-10-14 Jonathan Wright <jonathan.wright@arm.com>

* lower-subreg.c (simple_move): Prevent decomposition if
modes are not tieable.

aarch64: Add machine modes for Neon vector-tuple types

Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable are:

1) Large integer modes are opaque and modifying one vector in the
    tuple requires a lot of inefficient set/get gymnastics. The
    result is a lot of superfluous move instructions.
2) Large integer modes do not map well to types that are tuples of
    64-bit vectors - we need additional zero-padding which again
    results in superfluous move instructions.

This patch adds new machine modes that better model the C-level Neon
vector-tuple types. The approach is somewhat similar to that already
used for SVE vector-tuple types.

All of the AArch64 backend patterns and builtins that manipulate Neon
vector tuples are updated to use the new machine modes. This has the
effect of significantly reducing the amount of boiler-plate code in
the arm_neon.h header.

While this patch increases the quality of code generated in many
instances, there is still room for significant improvement - which
will be attempted in subsequent patches.

gcc/ChangeLog:

2021-08-09  Jonathan Wright  <jonathan.wright@arm.com>
    Richard Sandiford  <richard.sandiford@arm.com>

* config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define.
(v2x4hi_UP): Likewise.
(v2x4hf_UP): Likewise.
(v2x4bf_UP): Likewise.
(v2x2si_UP): Likewise.
(v2x2sf_UP): Likewise.
(v2x1di_UP): Likewise.
(v2x1df_UP): Likewise.
(v2x16qi_UP): Likewise.
(v2x8hi_UP): Likewise.
(v2x8hf_UP): Likewise.
(v2x8bf_UP): Likewise.
(v2x4si_UP): Likewise.
(v2x4sf_UP): Likewise.
(v2x2di_UP): Likewise.
(v2x2df_UP): Likewise.
(v3x8qi_UP): Likewise.
(v3x4hi_UP): Likewise.
(v3x4hf_UP): Likewise.
(v3x4bf_UP): Likewise.
(v3x2si_UP): Likewise.
(v3x2sf_UP): Likewise.
(v3x1di_UP): Likewise.
(v3x1df_UP): Likewise.
(v3x16qi_UP): Likewise.
(v3x8hi_UP): Likewise.
(v3x8hf_UP): Likewise.
(v3x8bf_UP): Likewise.
(v3x4si_UP): Likewise.
(v3x4sf_UP): Likewise.
(v3x2di_UP): Likewise.
(v3x2df_UP): Likewise.
(v4x8qi_UP): Likewise.
(v4x4hi_UP): Likewise.
(v4x4hf_UP): Likewise.
(v4x4bf_UP): Likewise.
(v4x2si_UP): Likewise.
(v4x2sf_UP): Likewise.
(v4x1di_UP): Likewise.
(v4x1df_UP): Likewise.
(v4x16qi_UP): Likewise.
(v4x8hi_UP): Likewise.
(v4x8hf_UP): Likewise.
(v4x8bf_UP): Likewise.
(v4x4si_UP): Likewise.
(v4x4sf_UP): Likewise.
(v4x2di_UP): Likewise.
(v4x2df_UP): Likewise.
(TYPES_GETREGP): Delete.
(TYPES_SETREGP): Likewise.
(TYPES_LOADSTRUCT_U): Define.
(TYPES_LOADSTRUCT_P): Likewise.
(TYPES_LOADSTRUCT_LANE_U): Likewise.
(TYPES_LOADSTRUCT_LANE_P): Likewise.
(TYPES_STORE1P): Move for consistency.
(TYPES_STORESTRUCT_U): Define.
(TYPES_STORESTRUCT_P): Likewise.
(TYPES_STORESTRUCT_LANE_U): Likewise.
(TYPES_STORESTRUCT_LANE_P): Likewise.
(aarch64_simd_tuple_types): Define.
(aarch64_lookup_simd_builtin_type): Handle tuple type lookup.
(aarch64_init_simd_builtin_functions): Update frontend lookup
for builtin functions after handling arm_neon.h pragma.
(register_tuple_type): Manually set modes of single-integer
tuple types. Record tuple types.
* config/aarch64/aarch64-modes.def
(ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes.
(ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes.
(SVE_MODES): Give single-vector modes priority over vector-
tuple modes.
(VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to
be after all single-vector modes.
* config/aarch64/aarch64-simd-builtins.def: Update builtin
generator macros to reflect modifications to the backend
patterns.
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2<mode>):
Use vector-tuple mode iterator and rename to...
(aarch64_simd_ld2<vstruct_elt>): This.
(aarch64_simd_ld2r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld2r<vstruct_elt>): This.
(aarch64_vec_load_lanesoi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): This.
(vec_load_lanesoi<mode>): Use vector-tuple mode iterator and
rename to...
(vec_load_lanes<mode><vstruct_elt>): This.
(aarch64_simd_st2<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st2<vstruct_elt>): This.
(aarch64_vec_store_lanesoi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_store_lanes<mode>_lane<vstruct_elt>): This.
(vec_store_lanesoi<mode>): Use vector-tuple mode iterator and
rename to...
(vec_store_lanes<mode><vstruct_elt>): This.
(aarch64_simd_ld3<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld3<vstruct_elt>): This.
(aarch64_simd_ld3r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld3r<vstruct_elt>): This.
(aarch64_vec_load_lanesci_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_load_lanesci<mode>): This.
(aarch64_simd_st3<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st3<vstruct_elt>): This.
(aarch64_vec_store_lanesci_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_store_lanesci<mode>): This.
(aarch64_simd_ld4<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld4<vstruct_elt>): This.
(aarch64_simd_ld4r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld4r<vstruct_elt>): This.
(aarch64_vec_load_lanesxi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_load_lanesxi<mode>): This.
(aarch64_simd_st4<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st4<vstruct_elt>): This.
(aarch64_vec_store_lanesxi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_store_lanesxi<mode>): This.
(mov<mode>): Define for Neon vector-tuple modes.
(aarch64_ld1x3<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_ld1x3<vstruct_elt>): This.
(aarch64_ld1_x3_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1_x3_<vstruct_elt>): This.
(aarch64_ld1x4<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_ld1x4<vstruct_elt>): This.
(aarch64_ld1_x4_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1_x4_<vstruct_elt>): This.
(aarch64_st1x2<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x2<vstruct_elt>): This.
(aarch64_st1_x2_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x2_<vstruct_elt>): This.
(aarch64_st1x3<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x3<vstruct_elt>): This.
(aarch64_st1_x3_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x3_<vstruct_elt>): This.
(aarch64_st1x4<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x4<vstruct_elt>): This.
(aarch64_st1_x4_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x4_<vstruct_elt>): This.
(*aarch64_mov<mode>): Define for vector-tuple modes.
(*aarch64_be_mov<mode>): Likewise.
(aarch64_ld<VSTRUCT:nregs>r<VALLDIF:mode>): Use vector-tuple
mode iterator and rename to...
(aarch64_ld<nregs>r<vstruct_elt>): This.
(aarch64_ld2<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld2<vstruct_elt>_dreg): This.
(aarch64_ld3<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld3<vstruct_elt>_dreg): This.
(aarch64_ld4<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld4<vstruct_elt>_dreg): This.
(aarch64_ld<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_ld<nregs><vstruct_elt>): Use vector-tuple mode
iterator and rename to...
(aarch64_ld<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
(aarch64_ld1x2<VQ:mode>): Delete.
(aarch64_ld1x2<VDC:mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1x2<vstruct_elt>): This.
(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
tuple mode iterator and rename to...
(aarch64_ld<nregs>_lane<vstruct_elt>): This.
(aarch64_get_dreg<VSTRUCT:mode><VDC:mode>): Delete.
(aarch64_get_qreg<VSTRUCT:mode><VQ:mode>): Likewise.
(aarch64_st2<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st2<vstruct_elt>_dreg): This.
(aarch64_st3<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st3<vstruct_elt>_dreg): This.
(aarch64_st4<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st4<vstruct_elt>_dreg): This.
(aarch64_st<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_st<nregs><vstruct_elt>): This.
(aarch64_st<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
iterator and rename to aarch64_st<nregs><vstruct_elt>.
(aarch64_st<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
tuple mode iterator and rename to...
(aarch64_st<nregs>_lane<vstruct_elt>): This.
(aarch64_set_qreg<VSTRUCT:mode><VQ:mode>): Delete.
(aarch64_simd_ld1<mode>_x2): Use vector-tuple mode iterator
and rename to...
(aarch64_simd_ld1<vstruct_elt>_x2): This.
* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
Refactor to include new vector-tuple modes.
(aarch64_classify_vector_mode): Add cases for new vector-
tuple modes.
(aarch64_advsimd_partial_struct_mode_p): Define.
(aarch64_advsimd_full_struct_mode_p): Likewise.
(aarch64_advsimd_vector_array_mode): Likewise.
(aarch64_sve_data_mode): Change location in file.
(aarch64_array_mode): Handle case of Neon vector-tuple modes.
(aarch64_hard_regno_nregs): Handle case of partial Neon
vector structures.
(aarch64_classify_address): Refactor to include handling of
Neon vector-tuple modes.
(aarch64_print_operand): Print "d" for "%R" for a partial
Neon vector structure.
(aarch64_expand_vec_perm_1): Use new vector-tuple mode.
(aarch64_modes_tieable_p): Prevent tieing Neon partial struct
modes with scalar machines modes larger than 8 bytes.
(aarch64_can_change_mode_class): Don't allow changes between
partial and full Neon vector-structure modes.
* config/aarch64/arm_neon.h (vst2_lane_f16): Use updated
builtin and remove boiler-plate code for opaque mode.
(vst2_lane_f32): Likewise.
(vst2_lane_f64): Likewise.
(vst2_lane_p8): Likewise.
(vst2_lane_p16): Likewise.
(vst2_lane_p64): Likewise.
(vst2_lane_s8): Likewise.
(vst2_lane_s16): Likewise.
(vst2_lane_s32): Likewise.
(vst2_lane_s64): Likewise.
(vst2_lane_u8): Likewise.
(vst2_lane_u16): Likewise.
(vst2_lane_u32): Likewise.
(vst2_lane_u64): Likewise.
(vst2q_lane_f16): Likewise.
(vst2q_lane_f32): Likewise.
(vst2q_lane_f64): Likewise.
(vst2q_lane_p8): Likewise.
(vst2q_lane_p16): Likewise.
(vst2q_lane_p64): Likewise.
(vst2q_lane_s8): Likewise.
(vst2q_lane_s16): Likewise.
(vst2q_lane_s32): Likewise.
(vst2q_lane_s64): Likewise.
(vst2q_lane_u8): Likewise.
(vst2q_lane_u16): Likewise.
(vst2q_lane_u32): Likewise.
(vst2q_lane_u64): Likewise.
(vst3_lane_f16): Likewise.
(vst3_lane_f32): Likewise.
(vst3_lane_f64): Likewise.
(vst3_lane_p8): Likewise.
(vst3_lane_p16): Likewise.
(vst3_lane_p64): Likewise.
(vst3_lane_s8): Likewise.
(vst3_lane_s16): Likewise.
(vst3_lane_s32): Likewise.
(vst3_lane_s64): Likewise.
(vst3_lane_u8): Likewise.
(vst3_lane_u16): Likewise.
(vst3_lane_u32): Likewise.
(vst3_lane_u64): Likewise.
(vst3q_lane_f16): Likewise.
(vst3q_lane_f32): Likewise.
(vst3q_lane_f64): Likewise.
(vst3q_lane_p8): Likewise.
(vst3q_lane_p16): Likewise.
(vst3q_lane_p64): Likewise.
(vst3q_lane_s8): Likewise.
(vst3q_lane_s16): Likewise.
(vst3q_lane_s32): Likewise.
(vst3q_lane_s64): Likewise.
(vst3q_lane_u8): Likewise.
(vst3q_lane_u16): Likewise.
(vst3q_lane_u32): Likewise.
(vst3q_lane_u64): Likewise.
(vst4_lane_f16): Likewise.
(vst4_lane_f32): Likewise.
(vst4_lane_f64): Likewise.
(vst4_lane_p8): Likewise.
(vst4_lane_p16): Likewise.
(vst4_lane_p64): Likewise.
(vst4_lane_s8): Likewise.
(vst4_lane_s16): Likewise.
(vst4_lane_s32): Likewise.
(vst4_lane_s64): Likewise.
(vst4_lane_u8): Likewise.
(vst4_lane_u16): Likewise.
(vst4_lane_u32): Likewise.
(vst4_lane_u64): Likewise.
(vst4q_lane_f16): Likewise.
(vst4q_lane_f32): Likewise.
(vst4q_lane_f64): Likewise.
(vst4q_lane_p8): Likewise.
(vst4q_lane_p16): Likewise.
(vst4q_lane_p64): Likewise.
(vst4q_lane_s8): Likewise.
(vst4q_lane_s16): Likewise.
(vst4q_lane_s32): Likewise.
(vst4q_lane_s64): Likewise.
(vst4q_lane_u8): Likewise.
(vst4q_lane_u16): Likewise.
(vst4q_lane_u32): Likewise.
(vst4q_lane_u64): Likewise.
(vtbl3_s8): Likewise.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.
(vld1_u8_x3): Likewise.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_f16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vld1_u8_x2): Likewise.
(vld1_s8_x2): Likewise.
(vld1_u16_x2): Likewise.
(vld1_s16_x2): Likewise.
(vld1_u32_x2): Likewise.
(vld1_s32_x2): Likewise.
(vld1_u64_x2): Likewise.
(vld1_s64_x2): Likewise.
(vld1_f16_x2): Likewise.
(vld1_f32_x2): Likewise.
(vld1_f64_x2): Likewise.
(vld1_p8_x2): Likewise.
(vld1_p16_x2): Likewise.
(vld1_p64_x2): Likewise.
(vld1q_u8_x2): Likewise.
(vld1q_s8_x2): Likewise.
(vld1q_u16_x2): Likewise.
(vld1q_s16_x2): Likewise.
(vld1q_u32_x2): Likewise.
(vld1q_s32_x2): Likewise.
(vld1q_u64_x2): Likewise.
(vld1q_s64_x2): Likewise.
(vld1q_f16_x2): Likewise.
(vld1q_f32_x2): Likewise.
(vld1q_f64_x2): Likewise.
(vld1q_p8_x2): Likewise.
(vld1q_p16_x2): Likewise.
(vld1q_p64_x2): Likewise.
(vld1_s8_x4): Likewise.
(vld1q_s8_x4): Likewise.
(vld1_s16_x4): Likewise.
(vld1q_s16_x4): Likewise.
(vld1_s32_x4): Likewise.
(vld1q_s32_x4): Likewise.
(vld1_u8_x4): Likewise.
(vld1q_u8_x4): Likewise.
(vld1_u16_x4): Likewise.
(vld1q_u16_x4): Likewise.
(vld1_u32_x4): Likewise.
(vld1q_u32_x4): Likewise.
(vld1_f16_x4): Likewise.
(vld1q_f16_x4): Likewise.
(vld1_f32_x4): Likewise.
(vld1q_f32_x4): Likewise.
(vld1_p8_x4): Likewise.
(vld1q_p8_x4): Likewise.
(vld1_p16_x4): Likewise.
(vld1q_p16_x4): Likewise.
(vld1_s64_x4): Likewise.
(vld1_u64_x4): Likewise.
(vld1_p64_x4): Likewise.
(vld1q_s64_x4): Likewise.
(vld1q_u64_x4): Likewise.
(vld1q_p64_x4): Likewise.
(vld1_f64_x4): Likewise.
(vld1q_f64_x4): Likewise.
(vld2_s64): Likewise.
(vld2_u64): Likewise.
(vld2_f64): Likewise.
(vld2_s8): Likewise.
(vld2_p8): Likewise.
(vld2_p64): Likewise.
(vld2_s16): Likewise.
(vld2_p16): Likewise.
(vld2_s32): Likewise.
(vld2_u8): Likewise.
(vld2_u16): Likewise.
(vld2_u32): Likewise.
(vld2_f16): Likewise.
(vld2_f32): Likewise.
(vld2q_s8): Likewise.
(vld2q_p8): Likewise.
(vld2q_s16): Likewise.
(vld2q_p16): Likewise.
(vld2q_p64): Likewise.
(vld2q_s32): Likewise.
(vld2q_s64): Likewise.
(vld2q_u8): Likewise.
(vld2q_u16): Likewise.
(vld2q_u32): Likewise.
(vld2q_u64): Likewise.
(vld2q_f16): Likewise.
(vld2q_f32): Likewise.
(vld2q_f64): Likewise.
(vld3_s64): Likewise.
(vld3_u64): Likewise.
(vld3_f64): Likewise.
(vld3_s8): Likewise.
(vld3_p8): Likewise.
(vld3_s16): Likewise.
(vld3_p16): Likewise.
(vld3_s32): Likewise.
(vld3_u8): Likewise.
(vld3_u16): Likewise.
(vld3_u32): Likewise.
(vld3_f16): Likewise.
(vld3_f32): Likewise.
(vld3_p64): Likewise.
(vld3q_s8): Likewise.
(vld3q_p8): Likewise.
(vld3q_s16): Likewise.
(vld3q_p16): Likewise.
(vld3q_s32): Likewise.
(vld3q_s64): Likewise.
(vld3q_u8): Likewise.
(vld3q_u16): Likewise.
(vld3q_u32): Likewise.
(vld3q_u64): Likewise.
(vld3q_f16): Likewise.
(vld3q_f32): Likewise.
(vld3q_f64): Likewise.
(vld3q_p64): Likewise.
(vld4_s64): Likewise.
(vld4_u64): Likewise.
(vld4_f64): Likewise.
(vld4_s8): Likewise.
(vld4_p8): Likewise.
(vld4_s16): Likewise.
(vld4_p16): Likewise.
(vld4_s32): Likewise.
(vld4_u8): Likewise.
(vld4_u16): Likewise.
(vld4_u32): Likewise.
(vld4_f16): Likewise.
(vld4_f32): Likewise.
(vld4_p64): Likewise.
(vld4q_s8): Likewise.
(vld4q_p8): Likewise.
(vld4q_s16): Likewise.
(vld4q_p16): Likewise.
(vld4q_s32): Likewise.
(vld4q_s64): Likewise.
(vld4q_u8): Likewise.
(vld4q_u16): Likewise.
(vld4q_u32): Likewise.
(vld4q_u64): Likewise.
(vld4q_f16): Likewise.
(vld4q_f32): Likewise.
(vld4q_f64): Likewise.
(vld4q_p64): Likewise.
(vld2_dup_s8): Likewise.
(vld2_dup_s16): Likewise.
(vld2_dup_s32): Likewise.
(vld2_dup_f16): Likewise.
(vld2_dup_f32): Likewise.
(vld2_dup_f64): Likewise.
(vld2_dup_u8): Likewise.
(vld2_dup_u16): Likewise.
(vld2_dup_u32): Likewise.
(vld2_dup_p8): Likewise.
(vld2_dup_p16): Likewise.
(vld2_dup_p64): Likewise.
(vld2_dup_s64): Likewise.
(vld2_dup_u64): Likewise.
(vld2q_dup_s8): Likewise.
(vld2q_dup_p8): Likewise.
(vld2q_dup_s16): Likewise.
(vld2q_dup_p16): Likewise.
(vld2q_dup_s32): Likewise.
(vld2q_dup_s64): Likewise.
(vld2q_dup_u8): Likewise.
(vld2q_dup_u16): Likewise.
(vld2q_dup_u32): Likewise.
(vld2q_dup_u64): Likewise.
(vld2q_dup_f16): Likewise.
(vld2q_dup_f32): Likewise.
(vld2q_dup_f64): Likewise.
(vld2q_dup_p64): Likewise.
(vld3_dup_s64): Likewise.
(vld3_dup_u64): Likewise.
(vld3_dup_f64): Likewise.
(vld3_dup_s8): Likewise.
(vld3_dup_p8): Likewise.
(vld3_dup_s16): Likewise.
(vld3_dup_p16): Likewise.
(vld3_dup_s32): Likewise.
(vld3_dup_u8): Likewise.
(vld3_dup_u16): Likewise.
(vld3_dup_u32): Likewise.
(vld3_dup_f16): Likewise.
(vld3_dup_f32): Likewise.
(vld3_dup_p64): Likewise.
(vld3q_dup_s8): Likewise.
(vld3q_dup_p8): Likewise.
(vld3q_dup_s16): Likewise.
(vld3q_dup_p16): Likewise.
(vld3q_dup_s32): Likewise.
(vld3q_dup_s64): Likewise.
(vld3q_dup_u8): Likewise.
(vld3q_dup_u16): Likewise.
(vld3q_dup_u32): Likewise.
(vld3q_dup_u64): Likewise.
(vld3q_dup_f16): Likewise.
(vld3q_dup_f32): Likewise.
(vld3q_dup_f64): Likewise.
(vld3q_dup_p64): Likewise.
(vld4_dup_s64): Likewise.
(vld4_dup_u64): Likewise.
(vld4_dup_f64): Likewise.
(vld4_dup_s8): Likewise.
(vld4_dup_p8): Likewise.
(vld4_dup_s16): Likewise.
(vld4_dup_p16): Likewise.
(vld4_dup_s32): Likewise.
(vld4_dup_u8): Likewise.
(vld4_dup_u16): Likewise.
(vld4_dup_u32): Likewise.
(vld4_dup_f16): Likewise.
(vld4_dup_f32): Likewise.
(vld4_dup_p64): Likewise.
(vld4q_dup_s8): Likewise.
(vld4q_dup_p8): Likewise.
(vld4q_dup_s16): Likewise.
(vld4q_dup_p16): Likewise.
(vld4q_dup_s32): Likewise.
(vld4q_dup_s64): Likewise.
(vld4q_dup_u8): Likewise.
(vld4q_dup_u16): Likewise.
(vld4q_dup_u32): Likewise.
(vld4q_dup_u64): Likewise.
(vld4q_dup_f16): Likewise.
(vld4q_dup_f32): Likewise.
(vld4q_dup_f64): Likewise.
(vld4q_dup_p64): Likewise.
(vld2_lane_u8): Likewise.
(vld2_lane_u16): Likewise.
(vld2_lane_u32): Likewise.
(vld2_lane_u64): Likewise.
(vld2_lane_s8): Likewise.
(vld2_lane_s16): Likewise.
(vld2_lane_s32): Likewise.
(vld2_lane_s64): Likewise.
(vld2_lane_f16): Likewise.
(vld2_lane_f32): Likewise.
(vld2_lane_f64): Likewise.
(vld2_lane_p8): Likewise.
(vld2_lane_p16): Likewise.
(vld2_lane_p64): Likewise.
(vld2q_lane_u8): Likewise.
(vld2q_lane_u16): Likewise.
(vld2q_lane_u32): Likewise.
(vld2q_lane_u64): Likewise.
(vld2q_lane_s8): Likewise.
(vld2q_lane_s16): Likewise.
(vld2q_lane_s32): Likewise.
(vld2q_lane_s64): Likewise.
(vld2q_lane_f16): Likewise.
(vld2q_lane_f32): Likewise.
(vld2q_lane_f64): Likewise.
(vld2q_lane_p8): Likewise.
(vld2q_lane_p16): Likewise.
(vld2q_lane_p64): Likewise.
(vld3_lane_u8): Likewise.
(vld3_lane_u16): Likewise.
(vld3_lane_u32): Likewise.
(vld3_lane_u64): Likewise.
(vld3_lane_s8): Likewise.
(vld3_lane_s16): Likewise.
(vld3_lane_s32): Likewise.
(vld3_lane_s64): Likewise.
(vld3_lane_f16): Likewise.
(vld3_lane_f32): Likewise.
(vld3_lane_f64): Likewise.
(vld3_lane_p8): Likewise.
(vld3_lane_p16): Likewise.
(vld3_lane_p64): Likewise.
(vld3q_lane_u8): Likewise.
(vld3q_lane_u16): Likewise.
(vld3q_lane_u32): Likewise.
(vld3q_lane_u64): Likewise.
(vld3q_lane_s8): Likewise.
(vld3q_lane_s16): Likewise.
(vld3q_lane_s32): Likewise.
(vld3q_lane_s64): Likewise.
(vld3q_lane_f16): Likewise.
(vld3q_lane_f32): Likewise.
(vld3q_lane_f64): Likewise.
(vld3q_lane_p8): Likewise.
(vld3q_lane_p16): Likewise.
(vld3q_lane_p64): Likewise.
(vld4_lane_u8): Likewise.
(vld4_lane_u16): Likewise.
(vld4_lane_u32): Likewise.
(vld4_lane_u64): Likewise.
(vld4_lane_s8): Likewise.
(vld4_lane_s16): Likewise.
(vld4_lane_s32): Likewise.
(vld4_lane_s64): Likewise.
(vld4_lane_f16): Likewise.
(vld4_lane_f32): Likewise.
(vld4_lane_f64): Likewise.
(vld4_lane_p8): Likewise.
(vld4_lane_p16): Likewise.
(vld4_lane_p64): Likewise.
(vld4q_lane_u8): Likewise.
(vld4q_lane_u16): Likewise.
(vld4q_lane_u32): Likewise.
(vld4q_lane_u64): Likewise.
(vld4q_lane_s8): Likewise.
(vld4q_lane_s16): Likewise.
(vld4q_lane_s32): Likewise.
(vld4q_lane_s64): Likewise.
(vld4q_lane_f16): Likewise.
(vld4q_lane_f32): Likewise.
(vld4q_lane_f64): Likewise.
(vld4q_lane_p8): Likewise.
(vld4q_lane_p16): Likewise.
(vld4q_lane_p64): Likewise.
(vqtbl2_s8): Likewise.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Likewise.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Likewise.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.
(vqtbx2_s8): Likewise.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Likewise.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Likewise.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
(vst1_s64_x3): Likewise.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
(vst1_s8_x4): Likewise.
(vst1q_s8_x4): Likewise.
(vst1_s16_x4): Likewise.
(vst1q_s16_x4): Likewise.
(vst1_s32_x4): Likewise.
(vst1q_s32_x4): Likewise.
(vst1_u8_x4): Likewise.
(vst1q_u8_x4): Likewise.
(vst1_u16_x4): Likewise.
(vst1q_u16_x4): Likewise.
(vst1_u32_x4): Likewise.
(vst1q_u32_x4): Likewise.
(vst1_f16_x4): Likewise.
(vst1q_f16_x4): Likewise.
(vst1_f32_x4): Likewise.
(vst1q_f32_x4): Likewise.
(vst1_p8_x4): Likewise.
(vst1q_p8_x4): Likewise.
(vst1_p16_x4): Likewise.
(vst1q_p16_x4): Likewise.
(vst1_s64_x4): Likewise.
(vst1_u64_x4): Likewise.
(vst1_p64_x4): Likewise.
(vst1q_s64_x4): Likewise.
(vst1q_u64_x4): Likewise.
(vst1q_p64_x4): Likewise.
(vst1_f64_x4): Likewise.
(vst1q_f64_x4): Likewise.
(vst2_s64): Likewise.
(vst2_u64): Likewise.
(vst2_f64): Likewise.
(vst2_s8): Likewise.
(vst2_p8): Likewise.
(vst2_s16): Likewise.
(vst2_p16): Likewise.
(vst2_s32): Likewise.
(vst2_u8): Likewise.
(vst2_u16): Likewise.
(vst2_u32): Likewise.
(vst2_f16): Likewise.
(vst2_f32): Likewise.
(vst2_p64): Likewise.
(vst2q_s8): Likewise.
(vst2q_p8): Likewise.
(vst2q_s16): Likewise.
(vst2q_p16): Likewise.
(vst2q_s32): Likewise.
(vst2q_s64): Likewise.
(vst2q_u8): Likewise.
(vst2q_u16): Likewise.
(vst2q_u32): Likewise.
(vst2q_u64): Likewise.
(vst2q_f16): Likewise.
(vst2q_f32): Likewise.
(vst2q_f64): Likewise.
(vst2q_p64): Likewise.
(vst3_s64): Likewise.
(vst3_u64): Likewise.
(vst3_f64): Likewise.
(vst3_s8): Likewise.
(vst3_p8): Likewise.
(vst3_s16): Likewise.
(vst3_p16): Likewise.
(vst3_s32): Likewise.
(vst3_u8): Likewise.
(vst3_u16): Likewise.
(vst3_u32): Likewise.
(vst3_f16): Likewise.
(vst3_f32): Likewise.
(vst3_p64): Likewise.
(vst3q_s8): Likewise.
(vst3q_p8): Likewise.
(vst3q_s16): Likewise.
(vst3q_p16): Likewise.
(vst3q_s32): Likewise.
(vst3q_s64): Likewise.
(vst3q_u8): Likewise.
(vst3q_u16): Likewise.
(vst3q_u32): Likewise.
(vst3q_u64): Likewise.
(vst3q_f16): Likewise.
(vst3q_f32): Likewise.
(vst3q_f64): Likewise.
(vst3q_p64): Likewise.
(vst4_s64): Likewise.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.
(vtbx4_s8): Likewise.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.
(vld1_bf16_x2): Likewise.
(vld1q_bf16_x2): Likewise.
(vld1_bf16_x3): Likewise.
(vld1q_bf16_x3): Likewise.
(vld1_bf16_x4): Likewise.
(vld1q_bf16_x4): Likewise.
(vld2_bf16): Likewise.
(vld2q_bf16): Likewise.
(vld2_dup_bf16): Likewise.
(vld2q_dup_bf16): Likewise.
(vld3_bf16): Likewise.
(vld3q_bf16): Likewise.
(vld3_dup_bf16): Likewise.
(vld3q_dup_bf16): Likewise.
(vld4_bf16): Likewise.
(vld4q_bf16): Likewise.
(vld4_dup_bf16): Likewise.
(vld4q_dup_bf16): Likewise.
(vst1_bf16_x2): Likewise.
(vst1q_bf16_x2): Likewise.
(vst1_bf16_x3): Likewise.
(vst1q_bf16_x3): Likewise.
(vst1_bf16_x4): Likewise.
(vst1q_bf16_x4): Likewise.
(vst2_bf16): Likewise.
(vst2q_bf16): Likewise.
(vst3_bf16): Likewise.
(vst3q_bf16): Likewise.
(vst4_bf16): Likewise.
(vst4q_bf16): Likewise.
(vld2_lane_bf16): Likewise.
(vld2q_lane_bf16): Likewise.
(vld3_lane_bf16): Likewise.
(vld3q_lane_bf16): Likewise.
(vld4_lane_bf16): Likewise.
(vld4q_lane_bf16): Likewise.
(vst2_lane_bf16): Likewise.
(vst2q_lane_bf16): Likewise.
(vst3_lane_bf16): Likewise.
(vst3q_lane_bf16): Likewise.
(vst4_lane_bf16): Likewise.
(vst4q_lane_bf16): Likewise.
* config/aarch64/geniterators.sh: Modify iterator regex to
match new vector-tuple modes.
* config/aarch64/iterators.md (insn_count): Extend mode
attribute with vector-tuple type information.
(nregs): Likewise.
(Vendreg): Likewise.
(Vetype): Likewise.
(Vtype): Likewise.
(VSTRUCT_2D): New mode iterator.
(VSTRUCT_2DNX): Likewise.
(VSTRUCT_2DX): Likewise.
(VSTRUCT_2Q): Likewise.
(VSTRUCT_2QD): Likewise.
(VSTRUCT_3D): Likewise.
(VSTRUCT_3DNX): Likewise.
(VSTRUCT_3DX): Likewise.
(VSTRUCT_3Q): Likewise.
(VSTRUCT_3QD): Likewise.
(VSTRUCT_4D): Likewise.
(VSTRUCT_4DNX): Likewise.
(VSTRUCT_4DX): Likewise.
(VSTRUCT_4Q): Likewise.
(VSTRUCT_4QD): Likewise.
(VSTRUCT_D): Likewise.
(VSTRUCT_Q): Likewise.
(VSTRUCT_QD): Likewise.
(VSTRUCT_ELT): New mode attribute.
(vstruct_elt): Likewise.
* genmodes.c (VECTOR_MODE): Add default prefix and order
parameters.
(VECTOR_MODE_WITH_PREFIX): Define.
(make_vector_mode): Add mode prefix and order parameters.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c:
Relax incorrect register number requirement.
* gcc.target/aarch64/sve/pcs/struct_3_256.c: Accept
equivalent codegen with fmov.

gcc/expmed.c: Ensure vector modes are tieable before extraction

Extracting a bitfield from a vector can be achieved by casting the
vector to a new type whose elements are the same size as the desired
bitfield, before generating a subreg. However, this is only an
optimization if the original vector can be accessed in the new
machine mode without first being copied - a condition denoted by the
TARGET_MODES_TIEABLE_P hook.

This patch adds a check to make sure that the vector modes are
tieable before attempting to generate a subreg. This is a necessary
prerequisite for a subsequent patch that will introduce new machine
modes for Arm Neon vector-tuple types.

gcc/ChangeLog:

2021-10-11 Jonathan Wright <jonathan.wright@arm.com>

* expmed.c (extract_bit_field_1): Ensure modes are tieable.

gcc/expr.c: Remove historic workaround for broken SIMD subreg

A long time ago, using a parallel to take a subreg of a SIMD register
was broken. This temporary fix[1] (from 2003) spilled these registers
to memory and reloaded the appropriate part to obtain the subreg.

The fix initially existed for the benefit of the PowerPC E500 - a
platform for which GCC removed support a number of years ago.
Regardless, a proper mechanism for taking a subreg of a SIMD register
exists now anyway.

This patch removes the workaround thus preventing SIMD registers
being dumped to memory unnecessarily - which sometimes can't be fixed
by later passes.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html

gcc/ChangeLog:

2021-10-11 Jonathan Wright <jonathan.wright@arm.com>

* expr.c (emit_group_load_1): Remove historic workaround.

aarch64: Move Neon vector-tuple type declaration into the compiler

Declare the Neon vector-tuple types inside the compiler instead of in
the arm_neon.h header. This is a necessary first step before adding
corresponding machine modes to the AArch64 backend.

The vector-tuple types are implemented using a #pragma. This means
initialization of builtin functions that have vector-tuple types as
arguments or return values has to be delayed until the #pragma is
handled.

gcc/ChangeLog:

2021-09-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
Factor out main loop to...
(aarch64_init_simd_builtin_functions): This new function.
(register_tuple_type): Define.
(aarch64_scalar_builtin_type_p): Define.
(handle_arm_neon_h): Define.
* config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle
pragma for arm_neon.h.
* config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p):
Declare.
(handle_arm_neon_h): Likewise.
* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
Remove static modifier.
* config/aarch64/arm_neon.h (target): Remove Neon vector
structure type definitions.

x86: Check leal/addl gcc.target/i386/amxtile-3.c for x32

Check leal and addl for x32 to fix:

FAIL: gcc.target/i386/amxtile-3.c scan-assembler addq[ \\t]+\\$12
FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+4
FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+8

* gcc.target/i386/amxtile-3.c: Check leal/addl for x32.

path solver: Prefer range_of_expr instead of range_on_edge.

The range_of_expr method provides better caching than range_on_edge.
If we have a statement, we can just it and avoid the range_on_edge
dance. Plus we can use all the range_of_expr fanciness.

Tested on x86-64 and ppc64le Linux with the usual regstrap. I also
verified that the before and after number of threads was the same or
greater in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* gimple-range-path.cc (path_range_query::range_on_path_entry):
Prefer range_of_expr unless there are no statements in the BB.

Avoid repeating calculations in threader.

We already attempt to resolve the current path on entry to
find_paths_to_name(), so there's no need to do so again for each
exported range since nothing has changed.

Removing this redundant calculation avoids 22% of calls into the path
solver.

Tested on x86-64 and ppc64le Linux with the usual regstrap. I also
verified that the before and after number of threads was the same
in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
Avoid duplicate calculation of paths.

path solver: Only compute relations for imports.

We are currently calculating implicit PHI relations for all PHI
arguments.  This creates unecessary work, as we only care about SSA
names in the import bitmap.  Similarly for inter-path relationals.  We
can avoid things not in the bitmap.

Tested on x86-64 and ppc64le Linux with the usual regstrap.  I also
verified that the before and after number of threads was the same
in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* gimple-range-path.cc (path_range_query::compute_phi_relations):
Only compute relations for SSA names in the import list.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (path_range_query::import_p): New.

libffi: Add --enable-cet to configure

When --enable-cet is used to configure GCC, enable Intel CET in libffi.

* Makefile.am (AM_CFLAGS): Add $(CET_FLAGS).
(AM_CCASFLAGS): Likewise.
* configure.ac (CET_FLAGS): Add GCC_CET_FLAGS and AC_SUBST.
* Makefile.in: Regenerate.
* aclocal.m4: Likewise.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.

Add -v option for git_check_commit.py.

Doing so, one can see:
$ git gcc-verify a50914d2111c72d2cd5cb8cf474133f4f85a25f6 -v
Checking a50914d2111c72d2cd5cb8cf474133f4f85a25f6: FAILED
ERR: unchanged file mentioned in a ChangeLog: "gcc/common.opt"
ERR: unchanged file mentioned in a ChangeLog (did you mean "gcc/testsuite/g++.dg/pr102955.C"?): "gcc/testsuite/gcc.dg/pr102955.c"
- gcc/testsuite/gcc.dg/pr102955.c
? ^^ ^

+ gcc/testsuite/g++.dg/pr102955.C
? ^^ ^

contrib/ChangeLog:

* gcc-changelog/git_check_commit.py: Add -v option.
* gcc-changelog/git_commit.py: Print verbose diff for wrong
filename.

testsuite: Add more guards to complex tests

This test hopefully fixes all the remaining target specific test issues by

1: Unrolling all add testcases by 16 using pragma GCC unroll
2. On armhf use Adv.SIMD instead of MVE to test. MVE's autovec is too incomplete
to be a general test target.
3. Add appropriate vect_<type> and float<size> guards on testcases.

gcc/testsuite/ChangeLog:

PR testsuite/103042
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: Update guards.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c:
Likewise.
* gcc.dg/vect/complex/complex-add-pattern-template.c: Likewise.
* gcc.dg/vect/complex/complex-add-template.c: Likewise.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c:
Likewise.

analyzer: fix ICE in sm_state_map::dump when dumping trees

gcc/analyzer/ChangeLog:
* program-state.cc (sm_state_map::dump): Use default_tree_printer
as format decoder.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

rtl-optimization/103075 - avoid ICEing on unfolded int-to-float converts

The following avoids asserting in exact_int_to_float_conversion_p that
the argument is not constant which it in fact can be with
-frounding-math and inexact int-to-float conversions. Say so.

2021-11-04 Richard Biener <rguenther@suse.de>

PR rtl-optimization/103075
* simplify-rtx.c (exact_int_to_float_conversion_p): Return
false for a VOIDmode operand.

* gcc.dg/pr103075.c: New testcase.

aarch64: Move more code into aarch64_vector_costs

This patch moves more code into aarch64_vector_costs and reuses
some of the information that is now available in the base class.

I'm planing to significantly rework this code, with more hooks
into the vectoriser, but this seemed worth doing as a first step.

gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs): Make member
variables private and add "m_" to their names. Remove is_loop.
(aarch64_record_potential_advsimd_unrolling): Replace with...
(aarch64_vector_costs::record_potential_advsimd_unrolling): ...this.
(aarch64_analyze_loop_vinfo): Replace with...
(aarch64_vector_costs::analyze_loop_vinfo): ...this.
Move initialization of (m_)vec_flags to add_stmt_cost.
(aarch64_analyze_bb_vinfo): Delete.
(aarch64_count_ops): Replace with...
(aarch64_vector_costs::count_ops): ...this.
(aarch64_vector_costs::add_stmt_cost): Set m_vec_flags,
using m_costing_for_scalar to test whether we're costing
scalar or vector code.
(aarch64_adjust_body_cost_sve): Replace with...
(aarch64_vector_costs::adjust_body_cost_sve): ...this.
(aarch64_adjust_body_cost): Replace with...
(aarch64_vector_costs::adjust_body_cost): ...this.
(aarch64_vector_costs::finish_cost): Use m_vinfo instead of is_loop.

vect: Convert cost hooks to classes

The current vector cost interface has a quite a bit of redundancy
built in.  Each target that defines its own hooks has to replicate
the basic unsigned[3] management.  Currently each target also
duplicates the cost adjustment for inner loops.

This patch instead defines a vector_costs class for holding
the scalar or vector cost and allows targets to subclass it.
There is then only one costing hook: to create a new costs
structure of the appropriate type.  Everything else can be
virtual functions, with common concepts implemented in the
base class rather than in each target's derivation.

This might seem like excess C++-ification, but it shaves
~100 LOC.  I've also got some follow-on changes that become
significantly easier with this patch.  Maybe it could help
with things like weighting blocks based on frequency too.

This will clash with Andre's unrolling patches.  His patches
have priority so this patch should queue behind them.

The x86 and rs6000 parts fully convert to a self-contained class.
The equivalent aarch64 changes are more complex, so this patch
just does the bare minimum.  A later patch will rework the
aarch64 bits.

gcc/
* target.def (targetm.vectorize.init_cost): Replace with...
(targetm.vectorize.create_costs): ...this.
(targetm.vectorize.add_stmt_cost): Delete.
(targetm.vectorize.finish_cost): Likewise.
(targetm.vectorize.destroy_cost_data): Likewise.
* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
parameter.
(vec_info::target_cost_data): Change from a void * to a vector_costs *.
(vector_costs): New class.
(init_cost): Take a vec_info and return a vector_costs.
(dump_stmt_cost): Remove data parameter.
(add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
(add_stmt_costs): Likewise.
(finish_cost): Replace data parameter with a vector_costs.
(destroy_cost_data): Delete.
* tree-vectorizer.c (dump_stmt_cost): Remove data argument and
don't print it.
(vec_info::vec_info): Remove the target_cost_data parameter and
initialize the member variable to null instead.
(vec_info::~vec_info): Delete target_cost_data instead of calling
destroy_cost_data.
(vector_costs::add_stmt_cost): New function.
(vector_costs::finish_cost): Likewise.
(vector_costs::record_stmt_cost): Likewise.
(vector_costs::adjust_cost_for_freq): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
call to vec_info::vec_info.
(vect_compute_single_scalar_iteration_cost): Update after above
changes to costing interface.
(vect_analyze_loop_operations): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
at the start_over point, where it needs to be recreated after
trying without slp.  Update retry code accordingly.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
to vec_info::vec_info.
(vect_slp_analyze_operation): Update after above changes to costing
interface.
(vect_bb_vectorization_profitable_p): Likewise.
* targhooks.h (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete.
(default_finish_cost, default_destroy_cost_data): Likewise.
* targhooks.c (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete, moving logic to vector_costs instead.
(default_finish_cost, default_destroy_cost_data): Delete.
* config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
vector_costs.  Add a constructor.
(aarch64_init_cost): Replace with...
(aarch64_vectorize_create_costs): ...this.
(aarch64_add_stmt_cost): Replace with...
(aarch64_vector_costs::add_stmt_cost): ...this.  Use record_stmt_cost
to adjust the cost for inner loops.
(aarch64_finish_cost): Replace with...
(aarch64_vector_costs::finish_cost): ...this.
(aarch64_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/i386/i386.c (ix86_vector_costs): New structure.
(ix86_init_cost): Replace with...
(ix86_vectorize_create_costs): ...this.
(ix86_add_stmt_cost): Replace with...
(ix86_vector_costs::add_stmt_cost): ...this.  Use adjust_cost_for_freq
to adjust the cost for inner loops.
(ix86_finish_cost, ix86_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
(rs6000_cost_data): Inherit from vector_costs.
Add a constructor.  Drop loop_info, cost and costing_for_scalar
in favor of the corresponding vector_costs member variables.
Add "m_" to the names of the remaining member variables and
initialize them.
(rs6000_density_test): Replace with...
(rs6000_cost_data::density_test): ...this.
(rs6000_init_cost): Replace with...
(rs6000_vectorize_create_costs): ...this.
(rs6000_update_target_cost_per_stmt): Replace with...
(rs6000_cost_data::update_target_cost_per_stmt): ...this.
(rs6000_add_stmt_cost): Replace with...
(rs6000_cost_data::add_stmt_cost): ...this.  Use adjust_cost_for_freq
to adjust the cost for inner loops.
(rs6000_adjust_vect_cost_per_loop): Replace with...
(rs6000_cost_data::adjust_vect_cost_per_loop): ...this.
(rs6000_finish_cost): Replace with...
(rs6000_cost_data::finish_cost): ...this.  Group loop code
into a single if statement and pass the loop_vinfo down to
subroutines.
(rs6000_destroy_cost_data): Delete.

libsanitizer: update LOCAL_PATCHES

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Update git revision.

libsanitizer: Apply local patches

lisanitizer: Apply autoreconf.

libsanitizer: merge from master (c86b4503a94c277534ce4b9a5c015a6ac151b98a).

Convert arrays in ssa pointer_equiv_analyzer to auto_vec's.

The problem in this PR is an off-by-one bug.  We should've allocated
num_ssa_names + 1.  However, in fixing this, I noticed that
num_ssa_names can change between queries, so I have replaced the array
with an auto_vec and added code to grow the vector as necessary.

Tested on x86-64 Linux.

PR tree-optimization/103062

gcc/ChangeLog:

PR tree-optimization/103062
* value-pointer-equiv.cc (ssa_equiv_stack::ssa_equiv_stack):
Increase size of allocation by 1.
(ssa_equiv_stack::push_replacement): Grow as needed.
(ssa_equiv_stack::get_replacement): Same.
(pointer_equiv_analyzer::pointer_equiv_analyzer): Same.
(pointer_equiv_analyzer::~pointer_equiv_analyzer): Remove delete.
(pointer_equiv_analyzer::set_global_equiv): Grow as needed.
(pointer_equiv_analyzer::get_equiv): Same.
(pointer_equiv_analyzer::get_equiv_expr): Remove const.
* value-pointer-equiv.h (class pointer_equiv_analyzer): Remove
const markers.  Use auto_vec instead of tree *.

gcc/testsuite/ChangeLog:

* gcc.dg/pr103062.c: New test.

libstdc++: Refactor emplace-like functions in std::variant

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__emplace): New
function template.
(_Copy_assign_base::operator=): Reorder conditions to match
bulleted list of effects in the standard. Use __emplace instead
of _M_reset followed by _Construct.
(_Move_assign_base::operator=): Likewise.
(__construct_by_index): Remove.
(variant::emplace): Use __emplace instead of _M_reset followed
by __construct_by_index.
(variant::swap): Hoist valueless cases out of visitor. Use
__emplace to replace _M_reset followed by _Construct.

libstdc++: Optimize std::variant traits and improve diagnostics

By defining additional partial specializations of _Nth_type we can
reduce the number of recursive instantiations needed to get from N to 0.
We can also use _Nth_type in variant_alternative, to take advantage of
that new optimization.

By adding a static_assert to variant_alternative we get a nicer error
than 'invalid use of incomplete type'.

By defining partial specializations of std::variant_size_v for the
common case we can avoid instantiating the std::variant_size class
template.

The __tuple_count class template and __tuple_count_v variable template
can be simplified to a single variable template, __count.

By adding a deleted constructor to the _Variant_union primary template
we can (very slightly) improve diagnostics for invalid attempts to
construct a std::variant with an out-of-range index. Instead of a
confusing error about "too many initializers for ..." we get a call to a
deleted function.

By using _Nth_type instead of variant_alternative (for cv-unqualified
variant types) we avoid instantiating variant_alternative.

By adding deleted overloads of variant::emplace we get better
diagnostics for emplace<invalid-index> or emplace<invalid-type>. Instead
of getting errors explaining why each of the four overloads wasn't
valid, we just get one error about calling a deleted function.

libstdc++-v3/ChangeLog:

* include/std/variant (_Nth_type): Define partial
specializations to reduce number of instantiations.
(variant_size_v): Define partial specializations to avoid
instantiations.
(variant_alternative): Use _Nth_type. Add static assert.
(__tuple_count, __tuple_count_v): Replace with ...
(__count): New variable template.
(_Variant_union): Add deleted constructor.
(variant::__to_type): Use _Nth_type.
(variant::emplace): Use _Nth_type. Add deleted overloads for
invalid types and indices.

libstdc++: Fix handling of const types in std::variant [PR102912]

Prior to r12-4447 (implementing P2231R1 constexpr changes) we didn't
construct the correct member of the union in __variant_construct_single,
we just plopped an object in the memory occupied by the union:

  void* __storage = std::addressof(__lhs._M_u);
  using _Type = remove_reference_t<decltype(__rhs_mem)>;
  ::new (__storage) _Type(std::forward<decltype(__rhs_mem)>(__rhs_mem));

We didn't care whether we had variant<int, const int>, we would just
place an int (or const int) into the storage, and then set the _M_index
to say which one it was.

In the new constexpr-friendly code we use std::construct_at to construct
the union object, which constructs the active member of the right type.
But now we need to know exactly the right type. We have to distinguish
between alternatives of type int and const int, and we have to be able
to find a const int (or const std::string, as in the OP) among the
alternatives. So my change from remove_reference_t<decltype(__rhs_mem)>
to remove_cvref_t<_Up> was wrong. It strips the const from const int,
and then we can't find the index of the const int alternative.

But just using remove_reference_t doesn't work either. When the copy
assignment operator of std::variant<int> uses __variant_construct_single
it passes a const int& as __rhs_mem, but if we don't strip the const
then we try to find const int among the alternatives, and *that* fails.
Similarly for the copy constructor, which also uses a const int& as the
initializer for a non-const int alternative.

The root cause of the problem is that __variant_construct_single doesn't
know the index of the type it's supposed to construct, and the new
_Variant_storage::__index_of<_Type> helper doesn't work if __rhs_mem and
the alternative being constructed have different const-qualification. We
need to replace __variant_construct_single with something that knows the
index of the alternative being constructed. All uses of that function do
actually know the index, but that context is lost by the time we call
__variant_construct_single. This patch replaces that function and
__variant_construct, inlining their effects directly into the callers.

libstdc++-v3/ChangeLog:

PR libstdc++/102912
* include/std/variant (_Variant_storage::__index_of): Remove.
(__variant_construct_single): Remove.
(__variant_construct): Remove.
(_Copy_ctor_base::_Copy_ctor_base(const _Copy_ctor_base&)): Do
construction directly instead of using __variant_construct.
(_Move_ctor_base::_Move_ctor_base(_Move_ctor_base&&)): Likewise.
(_Move_ctor_base::_M_destructive_move()): Remove.
(_Move_ctor_base::_M_destructive_copy()): Remove.
(_Copy_assign_base::operator=(const _Copy_assign_base&)): Do
construction directly instead of using _M_destructive_copy.
(variant::swap): Do construction directly instead of using
_M_destructive_move.
* testsuite/20_util/variant/102912.cc: New test.

VN/PRE TLC

This removes an always true parameter of vn_nary_op_insert_into and moves
valueization to the two callers of vn_nary_op_compute_hash instead of doing it
therein where this function name does not suggest such thing.
Also remove extra valueization from PRE phi-translation.

2021-11-03 Richard Biener <rguenther@suse.de>

* tree-ssa-sccvn.c (vn_nary_op_insert_into): Remove always
true parameter and inline valueization.
(vn_nary_op_lookup_1): Inline valueization from ...
(vn_nary_op_compute_hash): ... here and remove it here.
* tree-ssa-pre.c (phi_translate_1): Do not valueize
before vn_nary_lookup_pieces.
(get_representative_for): Mark created SSA representatives
as visited.

Update dg-require-effective-target for pr101145 cases

For test cases pr101145*.c, some types are not able to be
vectorized on some targets. This patch updates
dg-require-effective-target according to test cases.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr101145_1.c: Update case.
* gcc.dg/vect/pr101145_2.c: Update case.
* gcc.dg/vect/pr101145_3.c: Update case.

Disable warning for an ASAN test-case.

gcc/testsuite/ChangeLog:

* g++.dg/asan/asan_test.C: Disable one warning.

simplify-rtx: Fix vec_select index check

Vector lane indices follow memory (array) order, so lane 0 corresponds
to the high element rather than the low element on big-endian targets.

This was causing quite a few execution failures on aarch64_be,
such as gcc.c-torture/execute/pr47538.c.

gcc/
* simplify-rtx.c (simplify_context::simplify_gen_vec_select): Assert
that the operand has a vector mode. Use subreg_lowpart_offset
to test whether an index corresponds to the low part.

gcc/testsuite/
* gcc.dg/rtl/aarch64/big-endian-cse-1.c: New test.

Fix RTL frontend handling of const_vectors

The RTL frontend makes sure that CONST_INTs use shared rtxes where
appropriate.  We should do the same thing for CONST_VECTORs,
reusing CONST0_RTX, CONST1_RTX and CONSTM1_RTX.  This also has
the effect of setting CONST_VECTOR_NELTS_PER_PATTERN and
CONST_VECTOR_NPATTERNS.

While looking at where to add that, I noticed we had some dead #includes
in read-rtl.c.  Some of the stuff that read-rtl-function.c does was once
in that file instead.

gcc/
* read-rtl.c: Remove dead !GENERATOR_FILE block.
* read-rtl-function.c (function_reader::consolidate_singletons):
Generate canonical CONST_VECTORs.

Extend vternlog define_insn_and_split to memory_operand to enable more optimziation.

gcc/ChangeLog:

PR target/101989
* config/i386/predicates.md (reg_or_notreg_operand): Rename to ..
(regmem_or_bitnot_regmem_operand): .. and extend to handle
memory_operand.
* config/i386/sse.md (*<avx512>_vpternlog<mode>_1): Force_reg
the operands which are required to be register_operand.
(*<avx512>_vpternlog<mode>_2): Ditto.
(*<avx512>_vpternlog<mode>_3): Ditto.
(*<avx512>_vternlog<mode>_all): Disallow embeded broadcast for
vector HFmodes since it's not a real AVX512FP16 instruction.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101989-3.c: New test.

Simplify (trunc)copysign((extend)a, (extend)b) to .COPYSIGN (a,b).

a and b are same type as the truncation type and has less precision
than extend type.

gcc/ChangeLog:

PR target/102464
* match.pd: simplify (trunc)copysign((extend)a, (extend)b) to
.COPYSIGN (a,b) when a and b are same type as the truncation
type and has less precision than extend type.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102464-copysign-1.c: New test.

Update TARGET_MEM_REF documentation

This updates the internals manual documentation of TARGET_MEM_REF
and amends MEM_REF. The former was seriously out of date.

2021-11-04 Richard Biener <rguenther@suse.de>

gcc/
* doc/generic.texi: Update TARGET_MEM_REF and MEM_REF
documentation.

i386: Auto vectorize sdot_prod, usdot_prod with VNNI instruction.

AVX512VNNI/AVXVNNI has vpdpwssd for HImode, vpdpbusd for QImode, so
Adjust HImode sdot_prod expander and add QImode usdot_prod expander
to enhance vectorization for dotprod.

gcc/ChangeLog:

* config/i386/sse.md (VI2_AVX512VNNIBW): New mode iterator.
(VI1_AVX512VNNI): Likewise.
(SDOT_VPDP_SUF): New mode_attr.
(VI1SI): Likewise.
(vi1si): Likewise.
(sdot_prod<mode>): Use VI2_AVX512F iterator, expand to
vpdpwssd when VNNI targets available.
(usdot_prod<mode>): New expander for vector QImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vnni-auto-vectorize-1.c: New test.
* gcc.target/i386/vnni-auto-vectorize-2.c: Ditto.

i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.

_tile_loadd, _tile_stored, _tile_streamloadd intrinsics are defined by
macro, so the parameters should be wrapped by parentheses to accept
expressions.

gcc/ChangeLog:

* config/i386/amxtileintrin.h (_tile_loadd_internal): Add
parentheses to base and stride.
(_tile_stream_loadd_internal): Likewise.
(_tile_stored_internal): Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/i386/amxtile-3.c: New test.

testsuite: Fix g++.dg/opt/pr102970.C

This test uses a generic lambda, only available since C++14, so don't
run it in earlier modes.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr102970.C: Only run in C++14 and up.

Daily bump.

MAINTAINERS: Clarify the policy WRT the Write After Approval list

* MAINTAINERS: Clarify the policy WRT the Write After Approval
list.

RISC-V: Fix register class subset checks for CLASS_MAX_NREGS

Fix the register class subset checks in the determination of the maximum
number of consecutive registers needed to hold a value of a given mode.

The number depends on whether a register is a general-purpose or a
floating-point register, so check whether the register class requested
is a subset (argument 1 to `reg_class_subset_p') rather than superset
(argument 2) of GR_REGS or FP_REGS class respectively.

gcc/
* config/riscv/riscv.c (riscv_class_max_nregs): Swap the
arguments to `reg_class_subset_p'.

libstdc++: Fix regression in std::list::sort [PR66742]

The standard does not require const-correct comparisons in list::sort.

libstdc++-v3/ChangeLog:

PR libstdc++/66742
* include/bits/list.tcc (list::sort): Use mutable iterators for
comparisons.
* include/bits/stl_list.h (_Scratch_list::_Ptr_cmp): Likewise.
* testsuite/23_containers/list/operations/66742.cc: Check
non-const comparisons.

c: Fold implicit integer-to-floating conversions in static initializers with -frounding-math [PR103031]

Recent fixes to avoid inappropriate folding of some conversions to
floating-point types with -frounding-math also prevented such folding
in C static initializers, when folding (in the default rounding mode,
exceptions discarded) is required for correctness.

Folding for static initializers is handled via functions in
fold-const.c calling START_FOLD_INIT and END_FOLD_INIT to adjust flags
such as flag_rounding_math that should not apply in static initializer
context, but no such function was being called for the folding of
these implicit conversions to the type of the object being
initialized, only for explicit conversions as part of the initializer.

Arrange for relevant folding (a fold call in convert, in particular)
to use this special initializer handling (via a new fold_init
function, in particular).

Because convert is used by language-independent code but defined in
each front end, this isn't as simple as just adding a new default
argument to it.  Instead, I added a new convert_init function; that
then gets called by c-family code, and C and C++ need convert_init
implementations (the C++ one does nothing different from convert and
will never actually get called because the new convert_and_check
argument will never be true from C++), but other languages don't.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
PR c/103031
* fold-const.c (fold_init): New function.
* fold-const.h (fold_init): New prototype.

gcc/c-family/
PR c/103031
* c-common.c (convert_and_check): Add argument init_const.  Call
convert_init if init_const.
* c-common.h (convert_and_check): Update prototype.
(convert_init): New prototype.

gcc/c/
PR c/103031
* c-convert.c (c_convert): New function, based on convert.
(convert): Make into wrapper of c_convert.
(convert_init): New function.
* c-typeck.c (enum impl_conv): Add ic_init_const.
(convert_for_assignment): Handle ic_init_const like ic_init.  Add
new argument to convert_and_check call.
(digest_init): Pass ic_init_const to convert_for_assignment for
initializers required to be constant.

gcc/cp/
PR c/103031
* cvt.c (convert_init): New function.

gcc/testsuite/
PR c/103031
* gcc.dg/init-rounding-math-1.c: New test.

Switch vrp2 to ranger.

This patch flips the default for the VRP2 pass to execute ranger vrp rather
than the assert_expr version of VRP.

* params.opt (param_vrp2_mode): Make ranger the default for VRP2.

Testcase adjustments for pass vrp1.

Unify testcases for the vrp1 pass so they will work with the output from either
VRP or ranger.

gcc/testsuite/
* gcc.dg/tree-ssa/pr23744.c: Tweak output checks.
* gcc.dg/tree-ssa/vrp07.c: Ditto.
* gcc.dg/tree-ssa/vrp08.c: Ditto.
* gcc.dg/tree-ssa/vrp09.c: Ditto.
* gcc.dg/tree-ssa/vrp20.c: Ditto.
* gcc.dg/tree-ssa/vrp92.c: Ditto.
* jit.dg/test-sum-of-squares.c: Ditto.

For ranges, PHIs don't need to process arg == def.

If an argument of a phi is the same as the DEF of the phi, then the range
on the incoming edge doesn't need to be taken into account since it can't
be anything other than itself.

* gimple-range-fold.cc (fold_using_range::range_of_phi): Don't import
a range from edge if arg == phidef.

Check for constant builtin value first.

The original code imported from EVRP for evaluating built_in_constant_p
didn't check to see if the value was a constant before checking the
inlining flag. Now we check for a constant first.

* gimple-range-fold.cc (fold_using_range::range_of_builtin_call): Test
for constant before any other processing.

Fix --param=ranger-debug=all to include a trace.

A recent change made each debug flag its own value, but the 'all' value was
not adjusted properly and 'trace' was left out.

* flag-types.h (RANGER_DEBUG_ALL): Fix values.

Provide some context to folding via ranger.

Provide an internal mechanism to supply context to range_of_expr for calls
to ::fold_stmt.

* gimple-range.cc (gimple_ranger::gimple_ranger): Initialize current_bb.
(gimple_ranger::range_of_expr): Pick up range_on_entry when there is
no explcit context and current_bb is set.
(gimple_ranger::fold_stmt): New.
* gimple-range.h (current_bb, fold_stmt): New.
* tree-vrp.c (rvrp_folder::fold_stmt): Call ranger's fold_stmt.

tree-optimization/102970 - remap cliques when translating over backedges

The following makes sure to remap (or rather drop for simplicity)
dependence info encoded in MR_DEPENDENCE_CLIQUE when PRE PHI translation
translates a reference over a backedge since that ends up interleaving
two different loop iterations which boils down to two different
inline copies.

2021-11-03 Richard Biener <rguenther@suse.de>

PR tree-optimization/102970
* tree-ssa-pre.c (phi_translate_1): Drop clique and base
when translating a MEM_REF over a backedge.

* g++.dg/opt/pr102970.C: New testcase.

aarch64: enable Ampere-1 CPU

This adds support and a basic turning model for the Ampere Computing
"Ampere-1" CPU.

The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).

This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1 core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
* doc/invoke.texi: Add documentation for Ampere-1 core.

AArch64: Improve GOT addressing

Improve GOT addressing by treating the instructions as a pair.  This reduces
register pressure and improves code quality significantly.  SPECINT2017
improves by 0.6% with -fPIC and codesize is 0.73% smaller.  Perlbench has
0.9% smaller codesize, 1.5% fewer executed instructions and is 1.8% faster
on Neoverse N1.

ChangeLog:
2021-11-02  Wilco Dijkstra  <wdijkstr@arm.com>

* config/aarch64/aarch64.md (movsi): Add alternative for GOT accesses.
(movdi): Likewise.
(ldr_got_small_<mode>): Remove pattern.
(ldr_got_small_sidi): Likewise.
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Keep
GOT accesses as moves.
(aarch64_print_operand): Correctly print got_lo12 in L specifier.
(aarch64_mov_operand_p): Make GOT accesses valid move operands.
* config/aarch64/constraints.md: Add new constraint Usw for GOT access.

gcov: Remove dead variable.

gcc/ChangeLog:

* gcov.c (read_line): Remove dead variable.

Rename predicate class to ipa_predicate

PR bootstrap/102828

gcc/ChangeLog:

* ipa-fnsummary.c (edge_predicate_pool): Rename predicate class to ipa_predicate.
(ipa_fn_summary::account_size_time): Likewise.
(edge_set_predicate): Likewise.
(set_hint_predicate): Likewise.
(add_freqcounting_predicate): Likewise.
(evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(remap_freqcounting_preds_after_dup): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(set_cond_stmt_execution_predicate): Likewise.
(set_switch_stmt_execution_predicate): Likewise.
(compute_bb_predicates): Likewise.
(will_be_nonconstant_expr_predicate): Likewise.
(will_be_nonconstant_predicate): Likewise.
(phi_result_unknown_predicate): Likewise.
(predicate_for_phi_result): Likewise.
(analyze_function_body): Likewise.
(compute_fn_summary): Likewise.
(summarize_calls_size_and_time): Likewise.
(estimate_calls_size_and_time): Likewise.
(ipa_call_context::estimate_size_and_time): Likewise.
(remap_edge_summaries): Likewise.
(remap_freqcounting_predicate): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(ipa_update_overall_fn_summary): Likewise.
(read_ipa_call_summary): Likewise.
(inline_read_section): Likewise.
* ipa-fnsummary.h (struct ipa_freqcounting_predicate): Likewise.
* ipa-predicate.c (predicate::add_clause): Likewise.
(ipa_predicate::add_clause): Likewise.
(predicate::or_with): Likewise.
(ipa_predicate::or_with): Likewise.
(predicate::evaluate): Likewise.
(ipa_predicate::evaluate): Likewise.
(predicate::probability): Likewise.
(ipa_predicate::probability): Likewise.
(dump_condition): Likewise.
(dump_clause): Likewise.
(predicate::dump): Likewise.
(ipa_predicate::dump): Likewise.
(predicate::debug): Likewise.
(ipa_predicate::debug): Likewise.
(predicate::remap_after_duplication): Likewise.
(ipa_predicate::remap_after_duplication): Likewise.
(predicate::remap_after_inlining): Likewise.
(ipa_predicate::remap_after_inlining): Likewise.
(predicate::stream_in): Likewise.
(ipa_predicate::stream_in): Likewise.
(predicate::stream_out): Likewise.
(ipa_predicate::stream_out): Likewise.
(add_condition): Likewise.
* ipa-predicate.h (class predicate): Likewise.
(class ipa_predicate): Likewise.
(add_condition): Likewise.

Make sbitmap bitmap_set_bit and bitmap_clear_bit return changed state

The following adjusts the sbitmap bitmap_set_bit and bitmap_clear_bit
APIs to match that of bitmap by returning a bool indicating whether
the bitmap was changed. I've also changed bitmap_bit_p to return
a bool rather than an int and made use of the sbitmap bitmap_set_bit
API change in one place.

2021-11-03 Richard Biener <rguenther@suse.de>

* bitmap.h (bitmap_bit_p): Change the return type to bool.
* bitmap.c (bitmap_bit_p): Likewise.
* sbitmap.h (bitmap_bit_p): Likewise.
(bitmap_set_bit): Return whether the bit changed.
(bitmap_clear_bit): Likewise.
* tree-ssa.c (verify_vssa): Make use of the changed state
from bitmap_set_bit.

middle-end/103033 - drop native_interpret_expr with .DEFERRED_INIT expansion

This drops the use of native_interpret_expr which can fail even though
can_native_interpret_expr_p returns true in favor of simply folding
the VIEW_CONVERT_EXPR punning.

2021-11-03 Richard Biener <rguenther@suse.de>

PR middle-end/103033
* internal-fn.c (expand_DEFERRED_INIT): Elide the
native_interpret_expr path in favor of folding the
VIEW_CONVERT_EXPR generated when punning the RHS.

IBM Z: Free bbs in s390_loop_unroll_adjust

gcc/ChangeLog:

* config/s390/s390.c (s390_loop_unroll_adjust): In case of early
exit free bbs.

Fix wrong code caulsed by retslot EAF flags propagation [PR103040]

Fixe (quite nasty) thinko in how I propagate EAF flags from callee
to caller.  In this case some flags needs to be changed.  In particular
  - EAF_NOT_RETURNED in callee does not really mean EAF_NOT_RETURNED in caller
    since we speak of different return values
  - if callee escapes the parametr, we caller may return it
  - for retslot the rewritting is even bit more funny, since escaping to of
    return slot to return slot is not really an escape, however escape of
    argument to itself is.

This patch should correct all of the cases above and does fix the testcase from PR103040.

Bootstrapped/regtested x86_64 with all languages.  Also lto-bootstrapped.

gcc/ChangeLog:

PR ipa/103040
* ipa-modref.c (callee_to_caller_flags): New function.
(modref_eaf_analysis::analyze_ssa_name): Use it.
(ipa_merge_modref_summary_after_inlining): Fix whitespace.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr103040.C: New test.

Daily bump.

libstdc++: Add some noexcept to std::valarray

libstdc++-v3/ChangeLog:

* include/std/valarray (valarray::valarray()): Add noexcept.
(valarray::operator[]): Likewise.

Revert accidental commit.

2021-11-02 Jan Hubicka <hubicka@ucw.cz>

* ipa-modref.c (modref_eaf_analysis::analyze_ssa_name): Revert
accidental commit.

x86_64: Improved implementation of TImode rotations.

This simple patch improves the implementation of 128-bit (TImode)
rotations on x86_64 (a missed optimization opportunity spotted
during the recent V1TImode improvements).

Currently, the function:

unsigned __int128 rotrti3(unsigned __int128 x, unsigned int i) {
  return (x >> i) | (x << (128-i));
}

produces:

rotrti3:
        movq    %rsi, %r8
        movq    %rdi, %r9
        movl    %edx, %ecx
        movq    %rdi, %rsi
        movq    %r9, %rax
        movq    %r8, %rdx
        movq    %r8, %rdi
        shrdq   %r8, %rax
        shrq    %cl, %rdx
        xorl    %r8d, %r8d
        testb   $64, %cl
        cmovne  %rdx, %rax
        cmovne  %r8, %rdx
        negl    %ecx
        andl    $127, %ecx
        shldq   %r9, %rdi
        salq    %cl, %rsi
        xorl    %r9d, %r9d
        testb   $64, %cl
        cmovne  %rsi, %rdi
        cmovne  %r9, %rsi
        orq     %rdi, %rdx
        orq     %rsi, %rax
        ret

with this patch, GCC will now generate the much nicer:
rotrti3:
        movl    %edx, %ecx
        movq    %rdi, %rdx
        shrdq   %rsi, %rdx
        shrdq   %rdi, %rsi
        andl    $64, %ecx
        movq    %rdx, %rax
        cmove   %rsi, %rdx
        cmovne  %rsi, %rax
        ret

Even I wasn't expecting the optimizer's choice of the final three
instructions; a thing of beauty.  For rotations larger than 64,
the lowpart and the highpart (%rax and %rdx) are transposed, and
it would be nice to have a conditional swap/exchange.  The inspired
solution the compiler comes up with is to store/duplicate the same
value in both %rax/%rdx, and then use complementary conditional moves
to either update the lowpart or highpart, which cleverly avoids the
potential decode-stage pipeline stall (on some microarchitectures)
from having multiple instructions conditional on the same condition.
See X86_TUNE_ONE_IF_CONV_INSN, and notice there are two such stalls
in the original expansion of rot[rl]ti3.

2021-11-02  Roger Sayle  <roger@nextmovesoftware.com>
    Uroš Bizjak  <ubizjak@gmail.com>

* config/i386/i386.md (<any_rotate>ti3): Provide expansion for
rotations by non-constant amounts.

ipa-modref cleanup

A small refactoring of ipa-modref to make it bit more
C++y by moving logic analyzing ssa name flags to a class
and I also moved the anonymous namespace markers so we do not
export unnecessary stuff. There are no functional changes.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* ipa-modref.c: Fix anonymous namespace placement.
(class modref_eaf_analysis): New class.
(analyze_ssa_name_flags): Turn to ...
(modref_eaf_analysis::analyze_ssa_name): ... this one.
(merge_call_lhs_flags): Turn to ...
(modref_eaf_analysis::merge_call_lhs_flags): .. this one
(modref_eaf_analysis::merge_with_ssa_name): New member function.
(record_escape_points): Turn to ...
(modref_eaf_analysis::record_escape_points): ... this one.
(analyze_parms): Updat
(ipa_merge_modref_summary_after_inlining): Move to the end of file.

Static chain support in ipa-modref

Teach ipa-modref about the static chain that is, like
retslot, a hiden argument. The patch is pretty much symemtric to what
was done for retslot handling and I verified it does the intended job
for Ada LTO bootstrap.

gcc/ChangeLog:

* gimple.c (gimple_call_static_chain_flags): New function.
* gimple.h (gimple_call_static_chain_flags): Declare
* ipa-modref.c (modref_summary::modref_summary): Initialize
static_chain_flags.
(modref_summary_lto::modref_summary_lto): Likewise.
(modref_summary::useful_p): Test static_chain_flags.
(modref_summary_lto::useful_p): Likewise.
(struct modref_summary_lto): Add static_chain_flags.
(modref_summary::dump): Dump static_chain_flags.
(modref_summary_lto::dump): Likewise.
(struct escape_point): Add static_cahin_arg.
(analyze_ssa_name_flags): Use gimple_call_static_chain_flags.
(analyze_parms): Handle static chains.
(modref_summaries::duplicate): Duplicate static_chain_flags.
(modref_summaries_lto::duplicate): Likewise.
(modref_write): Stream static_chain_flags.
(read_section): Likewise.
(modref_merge_call_site_flags): Handle static_chain_flags.
* ipa-modref.h (struct modref_summary): Add static_chain_flags.
* tree-ssa-structalias.c (handle_rhs_call): Use
gimple_static_chain_flags.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/modref-3.c: New test.

tree-optimization/103029 - ensure vect loop versioning constraint on PHIs

PHI nodes in vectorizer loop versioning need to maintain the same
order of PHI arguments to not disturb SLP discovery. The following
adds an assertion and mitigation in case loop versioning breaks this
which happens more often after the recent reorg.

2021-11-02 Richard Biener <rguenther@suse.de>

PR tree-optimization/103029
* tree-vect-loop-manip.c (vect_loop_versioning): Ensure
the PHI nodes in the loop maintain their original operand
order.

addS EAF_NOT_RETURNED_DIRECTLY

addS EAF_NOT_RETURNED_DIRECTLY which works similarly as
EAF_NODIRECTESCAPE. Values pointed to by a given argument may be returned but
not the argument itself. This helps PTA quite noticeably because we mostly
care about tracking points to which given memory location can escape.

gcc/ChangeLog:

* tree-core.h (EAF_NOT_RETURNED_DIRECTLY): New flag.
(EAF_NOREAD): Renumber.
* ipa-modref.c (dump_eaf_flags): Dump EAF_NOT_RETURNED_DIRECTLY.
(remove_useless_eaf_flags): Handle EAF_NOT_RETURNED_DIRECTLY
(deref_flags): Likewise.
(modref_lattice::init): Likewise.
(modref_lattice::merge): Likewise.
(merge_call_lhs_flags): Likewise.
(analyze_ssa_name_flags): Likewise.
(modref_merge_call_site_flags): Likewise.
* tree-ssa-structalias.c (handle_call_arg): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-10.c: New test.

RISC-V: Fix build errors with shNadd/shNadd.uw patterns in zba cost model

Fix a build regression from commit 04a9b554ba1a ("RISC-V: Cost model
for zba extension."):

.../gcc/config/riscv/riscv.c: In function 'bool riscv_rtx_costs(rtx, machine_mode, int, int, int*, bool)':
.../gcc/config/riscv/riscv.c:2018:11: error: 'and' of mutually exclusive equal-tests is always 0 [-Werror]
2018 |           && IN_RANGE (INTVAL (XEXP (XEXP (x, 0), 0)), 1, 3))
      |           ^~
.../gcc/config/riscv/riscv.c:2047:17: error: unused variable 'ashift_lhs' [-Werror=unused-variable]
2047 |             rtx ashift_lhs = XEXP (and_lhs, 0);
      |                 ^~~~~~~~~~

by correcting a CONST_INT_P check referring the wrong operand and
getting rid of the unused variable.

gcc/
* config/riscv/riscv.c (riscv_rtx_costs): Correct a CONST_INT_P
check and remove an unused local variable with shNadd/shNadd.uw
pattern handling.

IBM Z: ldist-{rawmemchr,strlen} tests require vector extensions

The tests require vector extensions which are only available for z13 and
later while using the z/Architecture.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: For IBM Z set arch to z13
and use z/Architecture since the tests require vector extensions.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Likewise.
* gcc.dg/tree-ssa/ldist-strlen-1.c: Likewise.
* gcc.dg/tree-ssa/ldist-strlen-3.c: Likewise.

middle-end: Fix PR103007, add missing check on complex fms detection.

The complex FMS detection is missing a check on if the nodes of the VEC_PERM
has the amount of children we expect before it recurses.

This check is there on MUL and FMA but was missing for FMS, due to this the
compiler goes on further than it should and hits an assert.

gcc/ChangeLog:

PR tree-optimization/103007
* tree-vect-slp-patterns.c (complex_fms_pattern::matches): Add elem
check.

gcc/testsuite/ChangeLog:

PR tree-optimization/103007
* g++.dg/pr103007.C: New test.

middle-end/103038 - avoid ICE with -ftrivial-auto-var-init=pattern

This avoids ICEing with expanding a VIEW_CONVERT_EXRP of a SSA name
on the LHS by making sure we can native-interpret OFFSET_TYPE and
by never building such a LHS but instead view-converting the RHS
for SSA LHS.

2021-11-02 Richard Biener <rguenther@suse.de>

PR middle-end/103038
* fold-const.c (native_interpret_expr): Handle OFFSET_TYPE.
(can_native_interpret_type_p): Likewise.
* internal-fn.c (expand_DEFERRED_INIT): View-convert the
RHS if the LHS is an SSA name.

* g++.dg/pr103038.C: New testcase.

Add a simulate_record_decl lang hook

This patch adds a lang hook for defining a struct/RECORD_TYPE
“as if” it had appeared directly in the source code. It follows
the similar existing hook for enums.

It's the caller's responsibility to create the fields
(as FIELD_DECLs) but the hook's responsibility to create
and declare the associated RECORD_TYPE.

For now the hook is hard-coded to do the equivalent of:

typedef struct NAME { FIELDS } NAME;

but this could be controlled by an extra parameter if some callers
want a different behaviour in future.

The motivating use case is to allow the long list of struct
definitions in arm_neon.h to be provided by the compiler,
which in turn unblocks various arm_neon.h optimisations.

gcc/
* langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook.
* langhooks-def.h (lhd_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
* langhooks.c (lhd_simulate_record_decl): New function.

gcc/c/
* c-tree.h (c_simulate_record_decl): Declare.
* c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
* c-decl.c (c_simulate_record_decl): New function.

gcc/cp/
* decl.c: Include langhooks-def.h.
(cxx_simulate_record_decl): New function.
* cp-objcp-common.h (cxx_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Override.

update my email address

Update my email address, and move myself into the Write After Approval
list - I've not done any ARC work for years now.

/

* MAINTAINERS (Reviewers, arc): Remove my entry.
(Write After Approval): Add an entry for myself.

Fix flake8 errors.

contrib/ChangeLog:

* check-internal-format-escaping.py: Fix flake8 errors.

ia32: Disallow mode(V1TI) [PR103020]

As discussed in the PR, TImode isn't supported for -m32 on x86 (for the same
reason as on most 32-bit targets, no support for > 2 * BITS_PER_WORD
precision integers), but since PR32280 V1TImode is allowed with -msse in SSE
regs, V2TImode with -mavx or V4TImode with -mavx512f.
typedef __int128 V __attribute__((vector_size ({16,32,64}));
will not work, neither typedef int I __attribute__((mode(TI)));
but mode(V1TI), mode(V2TI) etc. are accepted with a warning when those
ISAs are enabled.  But they are certainly not fully supported, for some
optabs maybe, but most of them will not.  And, veclower lowering those ops
to TImode scalar operations will not work either because TImode isn't
supported.

So, this patch keeps V1TImode etc. in VALID*_MODE macros so that we can use
it in certain instructions, but disallows it in
targetm.vector_mode_supported_p, so that we don't offer those modes to the
user as supported.

2021-11-02  Jakub Jelinek  <jakub@redhat.com>

PR target/103020
* config/i386/i386.c (ix86_vector_mode_supported_p): Reject vector
modes with TImode inner mode if 32-bit.

* gcc.target/i386/pr103020.c: New test.

Add TSVC tests.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect.exp: Include also tsvc sub-directory.
* gcc.dg/vect/tsvc/license.txt: New test.
* gcc.dg/vect/tsvc/tsvc.h: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s000.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s111.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1111.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1112.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1113.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1119.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s112.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s113.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s115.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s116.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s118.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s119.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s121.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1213.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s122.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1221.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s123.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1244.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s125.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1251.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s126.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s127.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s128.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1281.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s131.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s13110.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s132.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1351.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s141.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s1421.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s151.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s152.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s162.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s171.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s172.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s173.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s174.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s175.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s176.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2101.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2102.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s211.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2111.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s212.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s221.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s222.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2233.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2244.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2251.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2275.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s231.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s232.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s233.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s235.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s241.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s242.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s243.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s244.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s251.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s252.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s255.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s256.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s258.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s261.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2710.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s272.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s273.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s274.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s275.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s277.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s278.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s279.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s281.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s291.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s292.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s293.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s311.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s3110.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s3111.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s31111.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s3112.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s3113.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s312.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s313.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s314.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s315.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s316.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s317.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s318.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s319.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s321.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s322.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s323.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s3251.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s331.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s332.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s341.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s342.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s343.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s351.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s352.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4112.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4114.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4115.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4116.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4117.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s4121.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s421.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s422.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s423.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s424.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s431.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s441.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s442.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s443.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s451.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s452.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s453.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s471.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-s491.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-va.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vag.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vas.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vbor.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vdotr.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vif.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vpv.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vpvpv.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vpvts.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vpvtv.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vsumr.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vtv.c: New test.
* gcc.dg/vect/tsvc/vect-tsvc-vtvtv.c: New test.

Adjust testcase for O2 vect.

Adjust code in check_vect_slp_store_usage to make it an exact
pattern match of the corresponding testcases.
These new target/xfail selectors are added as a temporary solution,
and should be removed after real issue is fixed for Wstringop-overflow.

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_slp_v4qi_store_unalign,
vect_slp_v2hi_store_unalign, vect_slp_v4hi_store_unalign,
vect_slp_v4si_store_unalign): Document efficient target.
(vect_slp_v4qi_store_unalign_1, vect_slp_v8qi_store_unalign_1,
vect_slp_v16qi_store_unalign_1): Ditto.
(vect_slp_v2hi_store_align,vect_slp_v2qi_store_align,
vect_slp_v2si_store_align, vect_slp_v4qi_store_align): Ditto.
(struct_4char_block_move, struct_8char_block_move,
struct_16char_block_move): Ditto.

gcc/testsuite/ChangeLog:

PR testsuite/102944
* c-c++-common/Wstringop-overflow-2.c: Adjust target/xfail
selector.
* gcc.dg/Warray-bounds-48.c: Ditto.
* gcc.dg/Warray-bounds-51.c: Ditto.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-14.c: Ditto.
* gcc.dg/Wstringop-overflow-21.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto
* gcc.dg/Wstringop-overflow-76.c: Ditto
* gcc.dg/Wzero-length-array-bounds-2.c: Ditto.
* lib/target-supports.exp (vect_slp_v4qi_store_unalign): New
efficient target.
(vect_slp_v4qi_store_unalign_1): Ditto.
(struct_4char_block_move): Ditto.
(struct_8char_block_move): Ditto.
(stryct_16char_block_move): Ditto.
(vect_slp_v2hi_store_align): Ditto.
(vect_slp_v2qi_store): Rename to ..
(vect_slp_v2qi_store_align): .. this.
(vect_slp_v4qi_store): Rename to ..
(vect_slp_v4qi_store_align): .. This.
(vect_slp_v8qi_store): Rename to ..
(vect_slp_v8qi_store_unalign_1): .. This.
(vect_slp_v16qi_store): Rename to ..
(vect_slp_v16qi_store_unalign_1): .. This.
(vect_slp_v2hi_store): Rename to ..
(vect_slp_v2hi_store_unalign): .. This.
(vect_slp_v4hi_store): Rename to ..
(vect_slp_v4hi_store_unalign): This.
(vect_slp_v2si_store): Rename to ..
(vect_slp_v2si_store_align): .. This.
(vect_slp_v4si_store): Rename to ..
(vect_slp_v4si_store_unalign): Ditto.
(check_vect_slp_aligned_store_usage): Rename to ..
(check_vect_slp_store_usage): .. this and adjust code to make
it an exact pattern match of corresponding testcase.

x86_64: Expand ashrv1ti (and PR target/102986)

This patch was originally intended to implement 128-bit arithmetic right
shifts by constants of vector registers (V1TImode), but while working on
it I discovered the (my) recent ICE on valid regression now known as
PR target/102986.

As diagnosed by Jakub, expanders for shifts are not allowed to fail, and
so any backend that provides a shift optab needs to handle variable amount
shifts as well as constant shifts [even though the middle-end knows how
to synthesize these for vector modes].  This constraint could be relaxed
in the middle-end, but it makes sense to fix this in my erroneous code.

The solution is to change the constraints on the recently added (and new)
shift expanders from SImode const_int_register to QImode general operand,
matching the TImode expanders' constraints, and then simply check for
!CONST_INT_P at the start of the ix86_expand_v1ti_* functions, converting
the operands from V1TImode to TImode, performing the TImode operation
and converting the result back to V1TImode.

One nice benefit of this strategy, is that it allows us to implement
Uros' recent suggestion, that we should be more efficiently converting
between these modes, avoiding the use of memory and using the same idiom
as LLVM or using pextrq/pinsrq where available.  The new helper functions
ix86_expand_v1ti_to_ti and ix86_expand_ti_to_v1ti are sufficient to take
care of this.  Interestingly partial support for this is already present,
but x86_64's generic tuning prefers memory transfers to avoid penalizing
microarchitectures with significant interunit delays.  With these changes
we now generate both pextrq and pinsrq for -mtune=native.

The main body of the patch is to implement arithmetic right shift in
addition to the logical right shift and left shift implemented in the
previous patch.  This expander provides no less than 13 different code
sequences, special casing the different constant shifts, including
variants taking advantage of TARGET_AVX2 and TARGET_SSE4_1.  The code
is structured with the faster/shorter sequences and the start, and
the generic implementations at the end.

For the record, the implementations are:

ashr_127: // Shift 127, 2 operations, 10 bytes
        pshufd  $255, %xmm0, %xmm0
        psrad   $31, %xmm0
        ret

ashr_64: // Shift by 64, 3 operations, 14 bytes
        pshufd  $255, %xmm0, %xmm1
        psrad   $31, %xmm1
        punpckhqdq      %xmm1, %xmm0
        ret

ashr_96: // Shift by 96, 3 operations, 18 bytes
        movdqa  %xmm0, %xmm1
        psrad   $31, %xmm1
        punpckhqdq      %xmm1, %xmm0
        pshufd  $253, %xmm0, %xmm0
        ret

ashr_8: // Shift by 8/16/24/32 on AVX2, 3 operations, 16 bytes
        vpsrad  $8, %xmm0, %xmm1
        vpsrldq $1, %xmm0, %xmm0
        vpblendd        $7, %xmm0, %xmm1, %xmm0
        ret

ashr_8: // Shift by 8/16/24/32 on SSE4.1, 3 operations, 24 bytes
        movdqa  %xmm0, %xmm1
        psrldq  $1, %xmm0
        psrad   $8, %xmm1
        pblendw $63, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        ret

ashr_97: // Shifts by 97..126, 4 operations, 23 bytes
        movdqa  %xmm0, %xmm1
        psrad   $31, %xmm0
        psrad   $1, %xmm1
        punpckhqdq      %xmm0, %xmm1
        pshufd  $253, %xmm1, %xmm0
        ret

ashr_48: // Shifts by 48/80 on SSE4.1, 4 operations, 25 bytes
        movdqa  %xmm0, %xmm1
        pshufd  $255, %xmm0, %xmm0
        psrldq  $6, %xmm1
        psrad   $31, %xmm0
        pblendw $31, %xmm1, %xmm0
        ret

ashr_8: // Shifts by multiples of 8, 5 operations, 28 bytes
        movdqa  %xmm0, %xmm1
        pshufd  $255, %xmm0, %xmm0
        psrad   $31, %xmm0
        psrldq  $1, %xmm1
        pslldq  $15, %xmm0
        por     %xmm1, %xmm0
        ret

ashr_1: // Shifts by 1..31 on AVX2, 6 operations, 30 bytes
        vpsrldq $8, %xmm0, %xmm2
        vpsrad  $1, %xmm0, %xmm1
        vpsllq  $63, %xmm2, %xmm2
        vpsrlq  $1, %xmm0, %xmm0
        vpor    %xmm2, %xmm0, %xmm0
        vpblendd        $7, %xmm0, %xmm1, %xmm0
        ret

ashr_1: // Shifts by 1..15 on SSE4.1, 6 operations, 42 bytes
        movdqa  %xmm0, %xmm2
        movdqa  %xmm0, %xmm1
        psrldq  $8, %xmm2
        psrlq   $1, %xmm0
        psllq   $63, %xmm2
        psrad   $1, %xmm1
        por     %xmm2, %xmm0
        pblendw $63, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        ret

ashr_1: // Shift by 1, 8 operations, 46 bytes
        movdqa  %xmm0, %xmm1
        movdqa  %xmm0, %xmm2
        psrldq  $8, %xmm2
        psrlq   $63, %xmm1
        psllq   $63, %xmm2
        psrlq   $1, %xmm0
        pshufd  $191, %xmm1, %xmm1
        por     %xmm2, %xmm0
        psllq   $31, %xmm1
        por     %xmm1, %xmm0
        ret

ashr_65: // Shifts by 65..95, 8 operations, 42 bytes
        pshufd  $255, %xmm0, %xmm1
        psrldq  $8, %xmm0
        psrad   $31, %xmm1
        psrlq   $1, %xmm0
        movdqa  %xmm1, %xmm2
        psllq   $63, %xmm1
        pslldq  $8, %xmm2
        por     %xmm2, %xmm1
        por     %xmm1, %xmm0
        ret

ashr_2: // Shifts from 2..63, 9 operations, 47 bytes
        pshufd  $255, %xmm0, %xmm1
        movdqa  %xmm0, %xmm2
        psrad   $31, %xmm1
        psrldq  $8, %xmm2
        psllq   $62, %xmm2
        psrlq   $2, %xmm0
        pslldq  $8, %xmm1
        por     %xmm2, %xmm0
        psllq   $62, %xmm1
        por     %xmm1, %xmm0
        ret

To test these changes there are several new test cases.  sse2-v1ti-shift-2.c
is a compile-test designed to spot/catch PR target/102986 [for all shifts
and rotates by variable amounts], and sse2-v1ti-shift-3.c is an execution
test to confirm shifts/rotates by variable amounts produce the same results
for TImode and V1TImode.  sse2-v1ti-ashiftrt-1.c is a (similar) execution
test to confirm arithmetic right shifts by different constants produce
identical results between TImode and V1TImode.  sse2-v1ti-ashift-[23].c are
duplicates of this file as compilation tests specifying -mavx2 and -msse4.1
respectively to trigger all the paths through the new expander.

2021-11-02  Roger Sayle  <roger@nextmovesoftware.com>
    Jakub Jelinek  <jakub@redhat.com>

gcc/ChangeLog
PR target/102986
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
ix86_expand_ti_to_v1ti): New helper functions.
(ix86_expand_v1ti_shift): Check if the amount operand is an
integer constant, and expand as a TImode shift if it isn't.
(ix86_expand_v1ti_rotate): Check if the amount operand is an
integer constant, and expand as a TImode rotate if it isn't.
(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
right shifts of V1TImode quantities.
* config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
to QImode general_operand, and let the helper functions lower
shifts by non-constant operands, as TImode shifts.  Make
conditional on TARGET_64BIT.
(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
Make conditional on TARGET_64BIT.

gcc/testsuite/ChangeLog
PR target/102986
* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.

IBM Z: Fix address of operands will never be NULL warnings

Since a recent enhancement of -Waddress a couple of warnings are emitted
and turned into errors during bootstrap:

gcc/config/s390/s390.md:12087:25: error: the address of 'operands' will never be NULL [-Werror=address]
12087 |   "TARGET_HTM && operands != NULL
build/gencondmd.c:59:12: note: 'operands' declared here
   59 | extern rtx operands[];
      |            ^~~~~~~~

Fixed by removing those non-null checks.

gcc/ChangeLog:

* config/s390/s390.md ("*cc_to_int", "tabort", "*tabort_1",
"*tabort_1_plus"): Remove operands non-null check.

openmp: Add testcase for threadprivate random access class iterators

This adds a testcase for random access class iterators. The diagnostics
can be different between templates and non-templates, as for some
threadprivate vars finish_id_expression replaces them with call to their
corresponding wrapper, but I think it is not that big deal, we reject
it in either case.

2021-11-02 Jakub Jelinek <jakub@redhat.com>

* g++.dg/gomp/loop-8.C: New test.

Daily bump.

libstdc++: Missing constexpr for __gnu_debug::__valid_range etc

The new 25_algorithms/move/constexpr.cc test fails in debug mode,
because the debug assertions use the non-constexpr overloads in
<debug/stl_iterator.h>.

libstdc++-v3/ChangeLog:

* include/debug/stl_iterator.h (__valid_range): Add constexpr
for C++20. Qualify call to avoid ADL.
(__get_distance, __can_advance, __unsafe, __base): Likewise.
* testsuite/25_algorithms/move/constexpr.cc: Also check with
std::reverse_iterator arguments.

libstdc++: Reorder constraints on std::span::span(Range&&) constructor.

In PR libstdc++/103013 Tim Song pointed out that we could reorder the
constraints of this constructor. That's worth doing just to reduce the
work the compiler has to do during overload resolution, even if it isn't
needed to make the code in the PR work.

libstdc++-v3/ChangeLog:

* include/std/span (span(Range&&)): Reorder constraints.

Fix negative integer range for UInteger.

gcc/ChangeLog:

* opt-functions.awk: Add new sanity checking.
* optc-gen.awk: Add new argument to integer_range_info.
* params.opt: Update 2 params which have negative IntegerRange.

Fix test-suite pattern scanning.

Fixes:

UNRESOLVED: g++.dg/ipa/modref-1.C scan-ipa-dump local-pure-const1 "Function found to be const: {anonymous}::B::genB"
UNRESOLVED: g++.dg/ipa/modref-1.C scan-ipa-dump modref1 "Retslot flags: direct noescape nodirectescape not_returned noread"

gcc/testsuite/ChangeLog:

* g++.dg/ipa/modref-1.C: Fix test-suite pattern scanning.

contrib: add unicode/utf8-dump.py

This script may be useful when debugging issues relating to Unicode
encoding (e.g. when investigating source files with bidirectional control
characters).

It dumps a UTF-8 file as a list of numbered lines (mimicking GCC's
diagnostic output format), interleaved with lines per character showing
the Unicode codepoints, the UTF-8 encoding bytes, the name of the
character, and, where printable, the characters themselves.
The lines are printed in logical order, which may help the reader to grok
the relationship between visual and logical ordering in bi-di files.

For example:

$ cat test.c
int གྷ;
const char *אבג = "ALEF-BET-GIMEL";

$ ./contrib/unicode/utf8-dump.py test.c
   1 | int གྷ;
     |   U+0069            0x69                     LATIN SMALL LETTER I i
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0F43  0xe0 0xbd 0x83                       TIBETAN LETTER GHA གྷ
     |   U+003B            0x3b                                SEMICOLON ;
     |   U+000A            0x0a                           LINE FEED (LF) (control character)
   2 | const char *אבג = "ALEF-BET-GIMEL";
     |   U+0063            0x63                     LATIN SMALL LETTER C c
     |   U+006F            0x6f                     LATIN SMALL LETTER O o
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0073            0x73                     LATIN SMALL LETTER S s
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0063            0x63                     LATIN SMALL LETTER C c
     |   U+0068            0x68                     LATIN SMALL LETTER H h
     |   U+0061            0x61                     LATIN SMALL LETTER A a
     |   U+0072            0x72                     LATIN SMALL LETTER R r
     |   U+0020            0x20                                    SPACE (separator)
     |   U+002A            0x2a                                 ASTERISK *
     |   U+05D0       0xd7 0x90                       HEBREW LETTER ALEF א
     |   U+05D1       0xd7 0x91                        HEBREW LETTER BET ב
     |   U+05D2       0xd7 0x92                      HEBREW LETTER GIMEL ג
     |   U+0020            0x20                                    SPACE (separator)
     |   U+003D            0x3d                              EQUALS SIGN =
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0022            0x22                           QUOTATION MARK "
     |   U+0041            0x41                   LATIN CAPITAL LETTER A A
     |   U+004C            0x4c                   LATIN CAPITAL LETTER L L
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+0046            0x46                   LATIN CAPITAL LETTER F F
     |   U+002D            0x2d                             HYPHEN-MINUS -
     |   U+0042            0x42                   LATIN CAPITAL LETTER B B
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+0054            0x54                   LATIN CAPITAL LETTER T T
     |   U+002D            0x2d                             HYPHEN-MINUS -
     |   U+0047            0x47                   LATIN CAPITAL LETTER G G
     |   U+0049            0x49                   LATIN CAPITAL LETTER I I
     |   U+004D            0x4d                   LATIN CAPITAL LETTER M M
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+004C            0x4c                   LATIN CAPITAL LETTER L L
     |   U+0022            0x22                           QUOTATION MARK "
     |   U+003B            0x3b                                SEMICOLON ;
     |   U+000A            0x0a                           LINE FEED (LF) (control character)

Tested with Python 3.8

contrib/ChangeLog:
* unicode/utf8-dump.py: New file.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

PR 102281 (-ftrivial-auto-var-init=zero causes ice)

Do not add call to __builtin_clear_padding when a variable is a gimple
register or it might not have padding.

gcc/ChangeLog:

2021-11-01 qing zhao <qing.zhao@oracle.com>

* gimplify.c (gimplify_decl_expr): Do not add call to
__builtin_clear_padding when a variable is a gimple register
or it might not have padding.
(gimplify_init_constructor): Likewise.

gcc/testsuite/ChangeLog:

2021-11-01 qing zhao <qing.zhao@oracle.com>

* c-c++-common/pr102281.c: New test.
* gcc.target/i386/auto-init-2.c: Adjust testing case.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-6.c: Likewise.
* gcc.target/aarch64/auto-init-6.c: Likewise.

AArch64: Add better costing for vector constants and operations

This patch adds extended costing to cost the creation of constants and the
manipulation of constants.  The default values provided are based on
architectural expectations and each cost models can be individually tweaked as
needed.

The changes in this patch covers:

* Construction of PARALLEL or CONST_VECTOR:
  Adds better costing for vector of constants which is based on the constant
  being created and the instruction that can be used to create it.  i.e. a movi
  is cheaper than a literal load etc.
* Construction of a vector through a vec_dup.

gcc/ChangeLog:

* config/arm/aarch-common-protos.h (struct vector_cost_table): Add
movi, dup and extract costing fields.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs, thunderx2t99_extra_costs,
thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use
them.
* config/arm/aarch-cost-tables.h (generic_extra_costs,
cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
exynosm1_extra_costs, xgene1_extra_costs): Likewise
* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>): Add r->w dup.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Add extra costs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-cse-codegen.c: New test.

middle-end: Teach CSE to be able to do vector extracts.

This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential however notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
        adrp    x2, .LC0
        sub     sp, sp, #16
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x2, 16502
        movk    x2, 0x1023, lsl 16
        movk    x2, 0x4308, lsl 32
        add     v1.2d, v1.2d, v0.2d
        movk    x2, 0x942, lsl 48
        orr     x0, x0, x2
        str     q1, [x1]
        add     sp, sp, 16
        ret
.LC0:
        .xword  667169396713799798
        .xword  667169396713799798

Essentially we materialize the same constant twice.  The reason for this is
because the front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

  <bb 2> :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

Which makes sense for further optimization.  However come expand time if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
        (const_vector:V2DI [
                (const_int 667169396713799798 [0x942430810234076]) repeated x2
            ])) "cse.c":7:12 -1
     (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
        (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
     (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
        (ior:DI (reg/v:DI 96 [ a ])
            (reg:DI 103))) "cse.c":8:12 -1
     (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this this patch extends CSE to be able to generate an extract for a
constant from another vector, or to make a vector for a constant by duplicating
another constant.

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.

I Initially also investigated doing this in PRE, but PRE requires at least 2 BB
to work and does not currently have any way to remove redundancies within a
single BB and it did not look easy to support.

gcc/ChangeLog:

* cse.c (add_to_set): New.
(find_sets_in_insn): Register constants in sets.
(canonicalize_insn): Use auto_vec instead.
(cse_insn): Try materializing using vec_dup.
* rtl.h (simplify_context::simplify_gen_vec_select,
simplify_gen_vec_select): New.
* simplify-rtx.c (simplify_context::simplify_gen_vec_select): New.

testsuite: fix failing complex add testcases PR103000

Some targets have overriden the default unroll factor and so do not have enough
data to succeed for SLP vectorization if loop vect is turned off.

To fix this just always unroll in these testcases.

gcc/testsuite/ChangeLog:

PR testsuite/103000
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
Force unroll.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: likewise
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.

diagnostics: escape non-ASCII source bytes for certain diagnostics

This patch adds support to GCC's diagnostic subsystem for escaping certain
bytes and Unicode characters when quoting source code.

Specifically, this patch adds a new flag rich_location::m_escape_on_output
which is a hint from a diagnostic that non-ASCII bytes in the pertinent
lines of the user's source code should be escaped when printed.

The patch sets this for the following diagnostics:
- when complaining about stray bytes in the program (when these
are non-printable)
- when complaining about "null character(s) ignored");
- for -Wnormalized= (and generate source ranges for such warnings)

The escaping is controlled by a new option:
  -fdiagnostics-escape-format=[unicode|bytes]

For example, consider a diagnostic involing a source line containing the
string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
(a stray UTF-8 trailing byte), followed by the string "after", where the
diagnostic highlights the U+03C0 character.

By default, this line will be printed verbatim to the user when
reporting a diagnostic at it, as:

beforeπXafter
       ^

(using X for the stray byte to avoid putting invalid UTF-8 in this
commit message)

If the diagnostic sets the "escape" flag, it will be printed as:

before<U+03C0><BF>after
       ^~~~~~~~

with -fdiagnostics-escape-format=unicode (the default), or as:

  before<CF><80><BF>after
        ^~~~~~~~

if the user supplies -fdiagnostics-escape-format=bytes.

This only affects how the source is printed; it does not affect
how column numbers that are printed (as per -fdiagnostics-column-unit=
and -fdiagnostics-column-origin=).

gcc/c-family/ChangeLog:
* c-lex.c (c_lex_with_flags): When complaining about non-printable
CPP_OTHER tokens, set the "escape on output" flag.

gcc/ChangeLog:
* common.opt (fdiagnostics-escape-format=): New.
(diagnostics_escape_format): New enum.
(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
* diagnostic-format-json.cc (json_end_diagnostic): Add
"escape-source" attribute.
* diagnostic-show-locus.c
(exploc_with_display_col::exploc_with_display_col): Replace
"tabstop" param with a cpp_char_column_policy and add an "aspect"
param.  Use these to compute m_display_col accordingly.
(struct char_display_policy): New struct.
(layout::m_policy): New field.
(layout::m_escape_on_output): New field.
(def_policy): New function.
(make_range): Update for changes to exploc_with_display_col ctor.
(default_print_decoded_ch): New.
(width_per_escaped_byte): New.
(escape_as_bytes_width): New.
(escape_as_bytes_print): New.
(escape_as_unicode_width): New.
(escape_as_unicode_print): New.
(make_policy): New.
(layout::layout): Initialize new fields.  Update m_exploc ctor
call for above change to ctor.
(layout::maybe_add_location_range): Update for changes to
exploc_with_display_col ctor.
(layout::calculate_x_offset_display): Update for change to
cpp_display_width.
(layout::print_source_line): Pass policy
to cpp_display_width_computation. Capture cpp_decoded_char when
calling process_next_codepoint.  Move printing of source code to
m_policy.m_print_cb.
(line_label::line_label): Pass in policy rather than context.
(layout::print_any_labels): Update for change to line_label ctor.
(get_affected_range): Pass in policy rather than context, updating
calls to location_compute_display_column accordingly.
(get_printed_columns): Likewise, also for cpp_display_width.
(correction::correction): Pass in policy rather than tabstop.
(correction::compute_display_cols): Pass m_policy rather than
m_tabstop to cpp_display_width.
(correction::m_tabstop): Replace with...
(correction::m_policy): ...this.
(line_corrections::line_corrections): Pass in policy rather than
context.
(line_corrections::m_context): Replace with...
(line_corrections::m_policy): ...this.
(line_corrections::add_hint): Update to use m_policy rather than
m_context.
(line_corrections::add_hint): Likewise.
(layout::print_trailing_fixits): Likewise.
(selftest::test_display_widths): New.
(selftest::test_layout_x_offset_display_utf8): Update to use
policy rather than tabstop.
(selftest::test_one_liner_labels_utf8): Add test of escaping
source lines.
(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
use policy rather than tabstop.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): New.
(selftest::test_escaping_bytes_2): New.
(selftest::diagnostic_show_locus_c_tests): Call the new tests.
* diagnostic.c (diagnostic_initialize): Initialize
context->escape_format.
(convert_column_unit): Update to use default character width policy.
(selftest::test_diagnostic_get_location_text): Likewise.
* diagnostic.h (enum diagnostics_escape_format): New enum.
(diagnostic_context::escape_format): New field.
* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
(-fdiagnostics-format=): Add "escape-source" attribute to examples
of JSON output, and document it.
* input.c (location_compute_display_column): Pass in "policy"
rather than "tabstop", passing to
cpp_byte_column_to_display_column.
(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
* input.h (class cpp_char_column_policy): New forward decl.
(location_compute_display_column): Pass in "policy" rather than
"tabstop".
* opts.c (common_handle_option): Handle
OPT_fdiagnostics_escape_format_.
* selftest.c (temp_source_file::temp_source_file): New ctor
overload taking a size_t.
* selftest.h (temp_source_file::temp_source_file): Likewise.

gcc/testsuite/ChangeLog:
* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
"escape-source" attribute.
* c-c++-common/diagnostic-format-json-2.c: Likewise.
* c-c++-common/diagnostic-format-json-3.c: Likewise.
* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
* c-c++-common/diagnostic-format-json-5.c: Likewise.
* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
* gcc.dg/encoding-issues-bytes.c: New test.
* gcc.dg/encoding-issues-unicode.c: New test.
* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
"escape-source" attribute.
* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
* gfortran.dg/diagnostic-format-json-3.F90: Likewise.

libcpp/ChangeLog:
* charset.c (convert_escape): Use encoding_rich_location when
complaining about nonprintable unknown escape sequences.
(cpp_display_width_computation::::cpp_display_width_computation):
Pass in policy rather than tabstop.
(cpp_display_width_computation::process_next_codepoint): Add "out"
param and populate *out if non-NULL.
(cpp_display_width_computation::advance_display_cols): Pass NULL
to process_next_codepoint.
(cpp_byte_column_to_display_column): Pass in policy rather than
tabstop.  Pass NULL to process_next_codepoint.
(cpp_display_column_to_byte_column): Pass in policy rather than
tabstop.
* errors.c (cpp_diagnostic_get_current_location): New function,
splitting out the logic from...
(cpp_diagnostic): ...here.
(cpp_warning_at): New function.
(cpp_pedwarning_at): New function.
* include/cpplib.h (cpp_warning_at): New decl for rich_location.
(cpp_pedwarning_at): Likewise.
(struct cpp_decoded_char): New.
(struct cpp_char_column_policy): New.
(cpp_display_width_computation::cpp_display_width_computation):
Replace "tabstop" param with "policy".
(cpp_display_width_computation::process_next_codepoint): Add "out"
param.
(cpp_display_width_computation::m_tabstop): Replace with...
(cpp_display_width_computation::m_policy): ...this.
(cpp_byte_column_to_display_column): Replace "tabstop" param with
"policy".
(cpp_display_width): Likewise.
(cpp_display_column_to_byte_column): Likewise.
* include/line-map.h (rich_location::escape_on_output_p): New.
(rich_location::set_escape_on_output): New.
(rich_location::m_escape_on_output): New.
* internal.h (cpp_diagnostic_get_current_location): New decl.
(class encoding_rich_location): New.
* lex.c (skip_whitespace): Use encoding_rich_location when
complaining about null characters.
(warn_about_normalization): Generate a source range when
complaining about improperly normalized tokens, rather than just a
point, and use encoding_rich_location so that the source code
is escaped on printing.
* line-map.c (rich_location::rich_location): Initialize
m_escape_on_output.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Fix range access for empty std::valarray [PR103022]

The std::begin and std::end overloads for std::valarray are defined in
terms of std::addressof(v[0]) which is undefined for an empty valarray.

libstdc++-v3/ChangeLog:

PR libstdc++/103022
* include/std/valarray (begin, end): Do not dereference an empty
valarray. Add noexcept and [[nodiscard]].
* testsuite/26_numerics/valarray/range_access.cc: Check empty
valarray. Check iterator properties. Run as well as compiling.
* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
* testsuite/26_numerics/valarray/103022.cc: New test.

Add debug counters to back threader.

Chasing down stage3 miscomparisons is never fun, and having no way to
distinguish between jump threads registered by a particular
pass, is even harder. This patch adds debug counters for the individual
back threading passes. I've left the ethread pass alone, as that one is
usually benign, but we could easily add it if needed.

The fact that we can only pass one boolean argument to the passes
infrastructure has us do all sorts of gymnastics to differentiate
between the various back threading passes.

Tested on x86-64 Linux.

gcc/ChangeLog:

* dbgcnt.def: Add debug counter for back_thread[12] and
back_threadfull[12].
* passes.def: Pass "first" argument to each back threading pass.
* tree-ssa-threadbackward.c (back_threader::back_threader): Add
first argument.
(back_threader::debug_counter): New.
(back_threader::maybe_register_path): Call debug_counter.

Move statics to threader pass class.

This patch moves all the static functions into the pass class, and
cleans up things a little. The goal is to shuffle things around such
that we can add debug counters that depend on different threading
passes, but it's a clean-up on its own right.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (BT_NONE): New.
(BT_SPEED): New.
(BT_RESOLVE): New.
(back_threader::back_threader): Add flags.
Move loop initialization here.
(back_threader::~back_threader): New.
(back_threader::find_taken_edge_switch): Change solver and ranger
to pointers.
(back_threader::find_taken_edge_cond): Same.
(back_threader::find_paths_to_names): Same.
(back_threader::find_paths): Same.
(back_threader::dump): Same.
(try_thread_blocks): Merge into thread_blocks.
(back_threader::thread_blocks): New.
(do_early_thread_jumps): Merge into thread_blocks.
(do_thread_jumps): Merge into thread_blocks.
(back_threader::thread_through_all_blocks): Remove.

Don't register nonsensical relations.

gcc/
PR tree-optimization/103003
* value-relation.cc (dom_oracle::register_relation): If the 2
ssa names are the same, don't register any relation.

gcc/testsuite/
* gcc.dg/pr103003.c: New.

aarch64: Fix redundant check in aut insn generation

During the generation of the epilogue of aarch64(aarch64_expand_epilogue),
the value of crtl->calls_eh_return does not need to be checked again.
This value has been checked during aarch64_return_address_signing_enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_expand_epilogue): Remove
redundant check for calls_eh_return.
* config/aarch64/aarch64.md (*do_return): Likewise.

Signed-off-by: Dan Li <ashimida@linux.alibaba.com>

Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

gcc/ChangeLog:

2021-11-01 Xionghu Luo <luoxhu@linux.ibm.com>

* cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
duplicate_loop_to_header_edge to
duplicate_loop_body_to_header_edge.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfghooks.h (struct cfg_hooks): Likewise.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
(clone_loop_to_header_edge): Likewise.
* cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
* cfgrtl.c (struct cfg_hooks): Likewise.
* doc/loop.texi: Likewise.
* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
(unroll_loop_runtime_iterations): Likewise.
(unroll_loop_stupid): Likewise.
(apply_opt_in_copies): Likewise.
* tree-cfg.c (struct cfg_hooks): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
(try_peel_loop): Likewise.
* tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
(gimple_duplicate_loop_body_to_header_edge): Likewise.
(tree_transform_and_unroll_loop): Likewise.
* tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
Likewise.

Refactor loop_version

loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header.  This patch moves
the condition generation later and thus we have four pieces to help
understanding of how the adjustment works:
1) duplicating the loop on the entry edge.
2) loopify the duplicated new loop.
3) adjusting the CFG to insert a condition branching to either loop
with lv_adjust_loop_entry_edge.
4) From loopify extract the scale_loop_frequencies bits.

Also removed some piece of code seems obviously useless:
- redirect_all_edges since it is false and loopify only called once.
- extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
edge is not redirected actually.

gcc/ChangeLog:

2021-11-01  Xionghu Luo  <luoxhu@linux.ibm.com>

* cfgloopmanip.c (loop_version): Refactor loopify to
loop_version.  Move condition generation after loopify.
(loopify): Delete.
* cfgloopmanip.h (loopify): Delete.

libcody: add mostlyclean Makefile target

PR other/102657

libcody/ChangeLog:

* Makefile.in: Add mostlyclean Makefile target.