review.tizen.org Git - platform/upstream/gcc.git/log

Fortran/openmp: Add support for 2 argument num_teams clause

Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
num_teams_upper, add num_teams_upper.
* dump-parse-tree.c (show_omp_clauses): Update to handle
lower-bound num_teams clause.
* frontend-passes.c (gfc_code_walker): Likewise
* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
resolve_omp_clauses): Likewise.
* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/teams-1.f90: New test.

aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics

Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
(TYPES_COMBINEP): Delete.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vcombine_* intrinsics.
* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
cast.
(vcombine_s16): Likewise.
(vcombine_s32): Likewise.
(vcombine_f32): Likewise.
(vcombine_u8): Use type-qualified builtin and remove casts.
(vcombine_u16): Likewise.
(vcombine_u32): Likewise.
(vcombine_u64): Likewise.
(vcombine_p8): Likewise.
(vcombine_p16): Likewise.
(vcombine_p64): Likewise.
(vcombine_bf16): Remove unnecessary cast.
* config/aarch64/iterators.md (VD_I): New mode iterator.
(VDC_P): New mode iterator.

aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics

Declare unsigned and polynomial type-qualified builtins for LD1/ST1
Neon intrinsics. Using these builtins removes the need for many casts
in arm_neon.h.

The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.

gcc/ChangeLog:

2021-11-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
(TYPES_LOAD1_P): Define.
(TYPES_STORE1_U): Define.
(TYPES_STORE1P): Rename to...
(TYPES_STORE1_P): This.
(get_mem_type_for_load_store): Add unsigned and poly types.
(aarch64_general_gimple_fold_builtin): Add unsigned and poly
type-qualified builtin declarations.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for LD1/ST1.
* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
builtin and remove cast.
(vld1_p16): Likewise.
(vld1_u8): Likewise.
(vld1_u16): Likewise.
(vld1_u32): Likewise.
(vld1q_p8): Likewise.
(vld1q_p16): Likewise.
(vld1q_p64): Likewise.
(vld1q_u8): Likewise.
(vld1q_u16): Likewise.
(vld1q_u32): Likewise.
(vld1q_u64): Likewise.
(vst1_p8): Likewise.
(vst1_p16): Likewise.
(vst1_u8): Likewise.
(vst1_u16): Likewise.
(vst1_u32): Likewise.
(vst1q_p8): Likewise.
(vst1q_p16): Likewise.
(vst1q_p64): Likewise.
(vst1q_u8): Likewise.
(vst1q_u16): Likewise.
(vst1q_u32): Likewise.
(vst1q_u64): Likewise.
* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.

aarch64: Use type-qualified builtins for ADDV Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for vector reduction.
* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
builtin and remove casts.
(vaddv_u16): Likewise.
(vaddv_u32): Likewise.
(vaddvq_u8): Likewise.
(vaddvq_u16): Likewise.
(vaddvq_u32): Likewise.
(vaddvq_u64): Likewise.

aarch64: Use type-qualified builtins for ADDP Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def:
* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
builtin and remove casts.
(vpaddq_u16): Likewise.
(vpaddq_u32): Likewise.
(vpaddq_u64): Likewise.
(vpadd_u8): Likewise.
(vpadd_u16): Likewise.
(vpadd_u32): Likewise.
(vpaddd_u64): Likewise.

aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-subtract Neon intrinsics. This removes
the need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]subhn[2].
* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
cast.
(vsubhn_s32): Likewise.
(vsubhn_s64): Likewise.
(vsubhn_u16): Use type-qualified builtin and remove casts.
(vsubhn_u32): Likewise.
(vsubhn_u64): Likewise.
(vrsubhn_s16): Remove unnecessary cast.
(vrsubhn_s32): Likewise.
(vrsubhn_s64): Likewise.
(vrsubhn_u16): Use type-qualified builtin and remove casts.
(vrsubhn_u32): Likewise.
(vrsubhn_u64): Likewise.
(vrsubhn_high_s16): Remove unnecessary cast.
(vrsubhn_high_s32): Likewise.
(vrsubhn_high_s64): Likewise.
(vrsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vrsubhn_high_u32): Likewise.
(vrsubhn_high_u64): Likewise.
(vsubhn_high_s16): Remove unnecessary cast.
(vsubhn_high_s32): Likewise.
(vsubhn_high_s64): Likewise.
(vsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vsubhn_high_u32): Likewise.
(vsubhn_high_u64): Likewise.

aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]addhn[2].
* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
cast.
(vaddhn_s32): Likewise.
(vaddhn_s64): Likewise.
(vaddhn_u16): Use type-qualified builtin and remove casts.
(vaddhn_u32): Likewise.
(vaddhn_u64): Likewise.
(vraddhn_s16): Remove unnecessary cast.
(vraddhn_s32): Likewise.
(vraddhn_s64): Likewise.
(vraddhn_u16): Use type-qualified builtin and remove casts.
(vraddhn_u32): Likewise.
(vraddhn_u64): Likewise.
(vaddhn_high_s16): Remove unnecessary cast.
(vaddhn_high_s32): Likewise.
(vaddhn_high_s64): Likewise.
(vaddhn_high_u16): Use type-qualified builtin and remove
casts.
(vaddhn_high_u32): Likewise.
(vaddhn_high_u64): Likewise.
(vraddhn_high_s16): Remove unnecessary cast.
(vraddhn_high_s32): Likewise.
(vraddhn_high_s64): Likewise.
(vraddhn_high_u16): Use type-qualified builtin and remove
casts.
(vraddhn_high_u32): Likewise.
(vraddhn_high_u64): Likewise.

aarch64: Use type-qualified builtins for UHSUB Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
halving-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uhsub builtins.
* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
cast.
(vhsub_s16): Likewise.
(vhsub_s32): Likewise.
(vhsub_u8): Use type-qualified builtin and remove casts.
(vhsub_u16): Likewise.
(vhsub_u32): Likewise.
(vhsubq_s8): Remove unnecessary cast.
(vhsubq_s16): Likewise.
(vhsubq_s32): Likewise.
(vhsubq_u8): Use type-qualified builtin and remove casts.
(vhsubq_u16): Likewise.
(vhsubq_u32): Likewise.

aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for u[r]hadd builtins.
* config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
cast.
(vhadd_s16): Likewise.
(vhadd_s32): Likewise.
(vhadd_u8): Use type-qualified builtin and remove casts.
(vhadd_u16): Likewise.
(vhadd_u32): Likewise.
(vhaddq_s8): Remove unnecessary cast.
(vhaddq_s16): Likewise.
(vhaddq_s32): Likewise.
(vhaddq_u8): Use type-qualified builtin and remove casts.
(vhaddq_u16): Likewise.
(vhaddq_u32): Likewise.
(vrhadd_s8): Remove unnecessary cast.
(vrhadd_s16): Likewise.
(vrhadd_s32): Likewise.
(vrhadd_u8): Use type-qualified builtin and remove casts.
(vrhadd_u16): Likewise.
(vrhadd_u32): Likewise.
(vrhaddq_s8): Remove unnecessary cast.
(vrhaddq_s16): Likewise.
(vrhaddq_s32): Likewise.
(vrhaddq_u8): Use type-wualified builtin and remove casts.
(vrhaddq_u16): Likewise.
(vrhaddq_u32): Likewise.

aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
widening-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for usub[lw][2] builtins.
* config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
cast.
(vsubl_s16): Likewise.
(vsubl_s32): Likewise.
(vsubl_u8): Use type-qualified builtin and remove casts.
(vsubl_u16): Likewise.
(vsubl_u32): Likewise.
(vsubl_high_s8): Remove unnecessary cast.
(vsubl_high_s16): Likewise.
(vsubl_high_s32): Likewise.
(vsubl_high_u8): Use type-qualified builtin and remove casts.
(vsubl_high_u16): Likewise.
(vsubl_high_u32): Likewise.
(vsubw_s8): Remove unnecessary casts.
(vsubw_s16): Likewise.
(vsubw_s32): Likewise.
(vsubw_u8): Use type-qualified builtin and remove casts.
(vsubw_u16): Likewise.
(vsubw_u32): Likewise.
(vsubw_high_s8): Remove unnecessary cast.
(vsubw_high_s16): Likewise.
(vsubw_high_s32): Likewise.
(vsubw_high_u8): Use type-qualified builtin and remove casts.
(vsubw_high_u16): Likewise.
(vsubw_high_u32): Likewise.

aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
widening-add Neon intrinsics. This removes the need for many casts in
arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uadd[lw][2] builtins.
* config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
cast.
(vaddl_s16): Likewise.
(vaddl_s32): Likewise.
(vaddl_u8): Use type-qualified builtin and remove casts.
(vaddl_u16): Likewise.
(vaddl_u32): Likewise.
(vaddl_high_s8): Remove unnecessary cast.
(vaddl_high_s16): Likewise.
(vaddl_high_s32): Likewise.
(vaddl_high_u8): Use type-qualified builtin and remove casts.
(vaddl_high_u16): Likewise.
(vaddl_high_u32): Likewise.
(vaddw_s8): Remove unnecessary cast.
(vaddw_s16): Likewise.
(vaddw_s32): Likewise.
(vaddw_u8): Use type-qualified builtin and remove casts.
(vaddw_u16): Likewise.
(vaddw_u32): Likewise.
(vaddw_high_s8): Remove unnecessary cast.
(vaddw_high_s16): Likewise.
(vaddw_high_s32): Likewise.
(vaddw_high_u8): Use type-qualified builtin and remove casts.
(vaddw_high_u16): Likewise.
(vaddw_high_u32): Likewise.

aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them for [R]SHRN[2]
Neon intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for [R]SHRN[2].
* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
builtin and remove casts.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.

aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them for XTN[2] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
type-qualified builtins for XTN[2].
* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
qualified builtin and remove casts.
(vmovn_high_u32): Likewise.
(vmovn_high_u64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.

aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics

Declare poly type-qualified builtins and use them for PMUL[L] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use poly type
qualifier in builtin generator macros.
* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
builtin and remove casts.
(vmulq_p8): Likewise.
(vmull_high_p8): Likewise.
(vmull_p8): Likewise.

aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics

Declare type-qualified builtins and use them for MLA/MLS Neon
intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtin generators for unsigned MLA/MLS intrinsics.
* config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
builtin.
(vmla_n_u32): Likewise.
(vmla_u8): Likewise.
(vmla_u16): Likewise.
(vmla_u32): Likewise.
(vmlaq_n_u16): Likewise.
(vmlaq_n_u32): Likewise.
(vmlaq_u8): Likewise.
(vmlaq_u16): Likewise.
(vmlaq_u32): Likewise.
(vmls_n_u16): Likewise.
(vmls_n_u32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_n_u16): Likewise.
(vmlsq_n_u32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.

libgcc: Fix backtrace fallback on PowerPC Big-endian

At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs the cfa of the trace will be behind of context's cfa.
Also, libgo’s probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem, in this case
we are only interested in find a stack map, we don't need nor can do a
backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.

libgcc/ChangeLog:
PR libgcc/103044
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
called with a null argument or at the end of the backtrace and return.
* unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.

Fix some side cases of side effects discovery

I wrote script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends.  This plugs the modref differences in handling
looping pure consts which were previously missed due to early exits on
ECF_CONST | ECF_PURE.  Those early exists are bit anoying and I think as
a cleanup I may just drop some of them as premature optimizations coming from
time modref was very simplistic on what it propagates.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc): Update.

tree-optimization/103190 - fix assert in reassoc stmt placement with asm

This makes sure to only assert we don't run into a asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p. We can handle EH edges from an asm just like
EH edges from any other stmt.

2021-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/103190
* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.

Move import population from threader to path solver.

Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block.  For example,
in the following snippet:

    <bb 2> :
    _1 = b_10 == block_11;
    _2 = b_10 != -1;
    _3 = _1 & _2;
    if (_3 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

...the imports to the block are b_10 and block_11 since they are both
needed to calculate _3.

The path solver takes a bitmap of imports in addition to the path
itself.  This sets up the number of SSA names to be on the lookout
for, while resolving the final conditional.

Calculating these imports was initially done in the threader, since it
was the only user of the path solver.  With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.

This patch moves the import code to the solver, making both the solver
and the threader simpler in the process.  This is because intent is
clearer and some duplicate code was removed.

This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125).  This was unexpected, but
welcome nevertheless.  There is no performance difference in callgrind
over the same suite.

Regstrapped on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::add_copies_to_imports):
Rename to...
(path_range_query::compute_imports): ...this.  Adapt it so it can
be passed the imports bitmap instead of working on m_imports.
(path_range_query::compute_ranges): Call compute_imports in all
cases unless an imports bitmap is passed.
* gimple-range-path.h (path_range_query::compute_imports): New.
(path_range_query::add_copies_to_imports): Remove.
* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
(back_threader::find_paths_to_names): Inline resolve_def.
(back_threader::find_paths): Call compute_imports.
(back_threader::resolve_phi): Adjust comment.

Testsuite: Various fixes for nios2.

2021-11-11 Sandra Loosemore <sandra@codesourcery.com>

gcc/testsuite/
* g++.dg/warn/Wmismatched-new-delete-5.C: Add
-fdelete-null-pointer-checks.
* gcc.dg/attr-returns-nonnull.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2.
* gcc.dg/ifcvt-4.c: Skip on nios2.
* gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.

tree-optimization/103188 - avoid running ranger on not-up-to-date SSA

The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without to avoid
running ranger on IL that has SSA form not updated.

2021-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/103188
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
Remove query parameter, split out check for size
optimization.
(ch_base::m_ranger, cb_base::m_query): Remove.
(ch_base::copy_headers): Split processing loop into
analysis around which we allocate and use ranger and
transform where we do not.
(pass_ch::execute): Do not allocate/free ranger here.
(pass_ch_vect::execute): Likewise.

* gcc.dg/torture/pr103188.c: New testcase.

Fix recursion discovery in ipa-pure-const

We make self recursive functions as looping of fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in
callgraph, but for trivial SCCs we miss the flag.

I think it is bad decision since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch is fixing current behaviour to be consistent.

gcc/ChangeLog:

2021-11-11 Jan Hubicka <hubicka@ucw.cz>

* ipa-pure-const.c (propagate_pure_const): Self recursion is
a side effects.

Fix noreturn discovery.

Fix ipa-pure-const handling of noreturn flags.  It is not safe to set it for
interposable symbols and we should also set it for aliases (just like we do for
other flags).  This patch merely copies other flag handling and implements it
here.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function
* cgraph.h (cgraph_node::set_noreturn_flags): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.

c++: use auto_vec in cp_parser_template_argument_list

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_argument_list): Use auto_vec
instead of manual memory management.

libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values

When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num.  Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.

Resolve entry loop condition for the edge remaining in the loop.

There is a known failure for gfortran.dg/vector_subscript_1.f90. It
was previously failing for all optimization levels except -Os.
Getting the loop header copying right, now makes it fail for all
levels :-).

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:

* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
statically to the edge remaining in the loop.

middle-end/103181 - fix operation_could_trap_p for vector division

For integer vector division we only checked for all zero vector
constants rather than checking whether any element in the constant
vector is zero.

2021-11-11 Richard Biener <rguenther@suse.de>

PR middle-end/103181
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division. Separate floating point and integer division code.

* gcc.dg/torture/pr103181.c: New testcase.

dwarf2out: Fix up field_byte_offset [PR101378]

For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size and so one can end up from the
computation with negative offset or offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.

* g++.dg/debug/dwarf2/pr101378.C: New test.

[aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr.

gcc/ChangeLog:
PR target/102376
* config/aarch64/aarch64.c (aarch64_process_target_attr): Check if
token is arch extension without leading '+' and emit appropriate
diagnostic for the same.

gcc/testsuite/ChangeLog:
PR target/102376
* gcc.target/aarch64/pr102376.c: New test.

openmp: Add support for 2 argument num_teams clause

In OpenMP 5.1, num_teams clause can accept either one expression as before,
but it in that case changed meaning, rather than create <= expression
teams it is now create == expression teams.  Or it accepts two expressions
separated by :, with the meaning that the first is low bound and second upper
bound on how many teams should be created.  The other ways to set number of
teams are upper bounds with lower bound of 1.

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now, we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
    unsigned int retval.3;
    struct S * D.2549;
    struct S b;

    retval.3 = (unsigned int) retval.1;
    D.2549 = .omp_data_i->b;
    S::S (&b, D.2549);
    #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
    __builtin_GOMP_teams (retval.3, 0);
    {
      bar (&a, &b);
    }
    S::~S (&b);
    #pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs
exit equivalent (at least on NVPTX) which seems bad because that means
the destructors of e.g. private variables on target aren't invoked, and
at the current placement neither destructors of the already constructed
privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x .  I think ideal would be to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that, are .shared variables zero initialized by default,
or do they have random value at start?  PTX docs say they aren't initializable.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
* tree.c (omp_clause_num_ops): Increase num ops for
OMP_CLAUSE_NUM_TEAMS to 2.
* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
for OMP_CLAUSE_NUM_TEAMS.
* gimplify.c (gimplify_scan_omp_clauses): Gimplify
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
* semantics.c (finish_omp_clauses): Handle
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
* pt.c (tsubst_omp_clauses): Likewise.
(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use
OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
to half of the num_teams clauses.
* c-c++-common/gomp/num-teams-1.c: New test.
* c-c++-common/gomp/num-teams-2.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
to half of the num_teams clauses.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
* g++.dg/gomp/num-teams-1.C: New test.
* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/teams-1.c: New test.

Remove find_pdom and find_dom

This removes now useless wrappers around get_immediate_dominator.

2021-11-11 Richard Biener <rguenther@suse.de>

* cfganal.c (find_pdom): Remove.
(control_dependences::find_control_dependence): Remove
special-casing of entry block, call get_immediate_dominator
directly.
* gimple-predicate-analysis.cc (find_pdom): Remove.
(find_dom): Likewise.
(find_control_equiv_block): Call get_immediate_dominator
directly.
(compute_control_dep_chain): Likewise.
(predicate::init_from_phi_def): Likewise.

Apply TLC to control dependence compute

This makes the control dependence compute avoid a find_edge
and optimizes allocation by embedding the bitmap head into the
vector of control dependences instead of allocating all of them.
It also uses a local bitmap obstack.

The bitmap changes make it necessary to shuffle some includes.

2021-11-10 Richard Biener <rguenther@suse.de>

* cfganal.h (control_dependences::control_dependence_map):
Embed bitmap_head.
(control_dependences::m_bitmaps): New.
* cfganal.c (control_dependences::set_control_dependence_map_bit):
Adjust.
(control_dependences::clear_control_dependence_bitmap):
Likewise.
(control_dependences::find_control_dependence): Do not
find_edge for the abnormal edge test.
(control_dependences::control_dependences): Instead do not
add abnormal edges to the edge list. Adjust.
(control_dependences::~control_dependences): Likewise.
(control_dependences::get_edges_dependent_on): Likewise.
* function-tests.c: Include bitmap.h.

gcc/analyzer/
* supergraph.cc: Include bitmap.h.

gcc/c/
* gimple-parser.c: Shuffle bitmap.h include.

rs6000/doc: Rename future cpu with power10

Commmit 5d9d0c94588 renamed future to power10 and ace60939fd2
updated the documentation for "future" renaming. This patch
is to rename the remaining "future architecture" references in
documentation and polish the words for float128.

gcc/ChangeLog:

* doc/invoke.texi: Change references to "future cpu" to "power10",
"-mcpu=future" to "-mcpu=power10". Adjust words for float128.

x86: Update -mtune=alderlake

Update mtune for alderlake, Alder Lake Intel Hybrid Technology will not support
Intel® AVX-512. ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and
UMONITOR/UMWAIT/TPAUSE are supported.

gcc/ChangeLog

* config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake
from m_CORE_AVX2.
(processor_cost_table): Use alderlake_cost for Alderlake.
* config/i386/i386.c (ix86_sched_init_global): Handle Alderlake.
* config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake
cost.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake
issue rate to 4.
(ix86_adjust_cost): Handle Alderlake.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_USE_INCDEC): Likewise.
(X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_SAHF): Likewise.
(X86_TUNE_USE_BT): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_ONE_IF_CONV_INSN): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_USE_SIMODE_FIOP): Likewise.
(X86_TUNE_EXT_80387_CONSTANTS): Likewise.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise.
(X86_TUNE_USE_GATHER): Disable for Alderlake.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.

Extend vpcmov to handle V8HF/V16HFmode under TARGET_XOP.

gcc/ChangeLog:

PR target/103151
* config/i386/sse.md (V_128_256): Extend to V8HF/V16HF.
(avxsizesuffix): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103151.c: New test.

RISC-V: Fix wrong zifencei handling in riscv_subset_list::to_string

This issue cause zifencei never correctly appended on the ISA string.

gcc/ChangeLog

* common/config/riscv/riscv-common.c (riscv_subset_list::to_string): Fix
wrong marco checking.

Daily bump.

Allow loop header copying when first iteration condition is known.

As discussed in the PR, the loop header copying pass avoids doing so
when optimizing for size.  However, sometimes we can determine the
loop entry conditional statically for the first iteration of the loop.

This patch uses the path solver to determine the outgoing edge
out of preheader->header->xx.  If so, it allows header copying.  Doing
this in the loop optimizer saves us from doing gymnastics in the
threader which doesn't have the context to determine if a loop
transformation is profitable.

I am only returning true in entry_loop_condition_is_static for
a true conditional.  Technically a false conditional is also
provably static, but allowing any boolean value causes a regression
in gfortran.dg/vector_subscript_1.f90.

I would have preferred not passing around the query object, but the
layout of pass_ch and should_duplicate_loop_header_p make it a bit
awkward to get it right without an outright refactor to the
pass.

Tested on x86-64 Linux.

gcc/ChangeLog:

PR tree-optimization/102906
* tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
(should_duplicate_loop_header_p): Call entry_loop_condition_is_static.
(class ch_base): Add m_ranger and m_query.
(ch_base::copy_headers): Pass m_query to
entry_loop_condition_is_static.
(pass_ch::execute): Allocate and deallocate m_ranger and
m_query.
(pass_ch_vect::execute): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr102906.c: New test.

[COMMITTED] aarch64: [PR103170] Fix aarch64_simd_dup<mode>

The problem here is aarch64_simd_dup<mode> use
the vw iterator rather than vwcore iterator. This causes
problems for the V4SF and V2DF modes. I changed both of
aarch64_simd_dup<mode> patterns to be consistent.

Committed as obvious after a bootstrap/test on aarch64-linux-gnu.

PR target/103170

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>):
Use vwcore iterator for the r constraint output string.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/vector-dup-1.c: New test.

Fortran: avoid NULL pointer dereferences

CLASS(), PARAMETER is not yet properly implemented in gfortran. Using it
in declarations could lead to subsequent NULL pointer dereferences during
checking or simplification of expressions involving those CLASS variables.

gcc/fortran/ChangeLog:

PR fortran/103137
PR fortran/103138
* check.c (gfc_check_shape): Avoid NULL pointer dereference on
missing ref.
* simplify.c (gfc_simplify_cshift): Avoid NULL pointer dereference
when shape not set.
(gfc_simplify_transpose): Likewise.

Add a testcase for PR tree-optimization/102892

PR tree-optimization/102892 is fixed by

commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae
Author: Aldy Hernandez <aldyh@redhat.com>
Date: Thu Oct 28 15:35:21 2021 +0200

Remove VRP threader passes in exchange for better threading pre-VRP.

PR tree-optimization/102892
* gcc.dg/pr102892-1.c: New file.
* gcc.dg/pr102892-2.c: Likewise.

Adjust test to avoid target-specific failures [PR103161].

Resolves:
PR testsuite/103161 - Better ranges cause builtin-sprintf-warn-16.c failure

gcc/testsuite:
PR testsuite/103161
* gcc.dg/tree-ssa/builtin-sprintf-warn-16.c: Avoid relying on
argument evaluation order. Cast width and precision to signed
to avoid undefined behavior.

Apply pattern initialization only when have_insn_for return true.

For -ftrivial-auto-var-init=pattern, initialize the variable with patterns only
when have_insn_for (SET, mode) return true.  Otherwise initialize it with zeros.
with this change, _Complex long double on X86 is initialized to zero for
pattern initialization.

gcc/ChangeLog:

2021-11-10  qing zhao  <qing.zhao@oracle.com>

* internal-fn.c (expand_DEFERRED_INIT): Apply pattern initialization
only when have_insn_for return true for the mode. Fix a memory leak.

gcc/testsuite/ChangeLog:

2021-11-10  qing zhao  <qing.zhao@oracle.com>

* gcc.target/i386/auto-init-6.c: _Complex long double is initialized
to zero now with -ftrivial-auto-var-init=pattern.

arm: Initialize vector costing fields

The movi, dup and extract costing fields were recently added to struct
vector_cost_table, but there initialization is missing for the arm
(aarch32) specific descriptions.

Although the arm port does not use these fields (only aarch64 does),
this is causing warnings during the build, and even build failures
when using gcc-4.8.5 as host compiler:

/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::movi'
};
  ^
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::movi' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::dup'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::dup' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::extract'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::extract' [-Wmissing-field-initializers]

This patch uses the same initialization values as in aarch64 for
consistency:
+    COSTS_N_INSNS (1),  /* movi.  */
+    COSTS_N_INSNS (2),  /* dup.  */
+    COSTS_N_INSNS (2)   /* extract.  */

2021-11-10  Christophe Lyon  <christophe.lyon@foss.st.com>

gcc/
* config/arm/arm.c (cortexa9_extra_costs, cortexa8_extra_costs,
cortexa5_extra_costs, cortexa7_extra_costs,
cortexa12_extra_costs, cortexa15_extra_costs, v7m_extra_costs):
Initialize movi, dup and extract costing fields.

path solver: Adjustments for use outside of the backward threader.

Here are some enhancements to make it easier for other clients to use
the path solver.

First, I've made the imports to the solver optional since we can
calculate them ourselves.  However, I've left the ability to set them,
since the backward threader adds a few SSA names in addition to the
default ones.  As a follow-up I may move all the import set up code
from the threader to the solver, as the extra imports tend to improve
the behavior slightly.

Second, Richi suggested an entry point where you just feed the solver
an edge, which will be quite convenient for a subsequent patch adding
a client in the header copying pass.  The required some shuffling,
since we'll be adding the blocks on the fly.  There's now a vector
copy, but the impact will be minimal, since these are just 5-6 entries
at the most.

Tested on ppc64le Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::path_range_query): Do
not init m_path.
(path_range_query::dump): Change m_path uses to non-pointer.
(path_range_query::defined_outside_path):  Same.
(path_range_query::set_path): Same.
(path_range_query::add_copies_to_imports): Same.
(path_range_query::range_of_stmt): Same.
(path_range_query::compute_outgoing_relations): Same.
(path_range_query::compute_ranges): Imports are now optional.
Implement overload that takes an edge.
* gimple-range-path.h (class path_range_query): Make imports
optional for compute_ranges.  Add compute_ranges(edge) overload.
Make m_path an auto_vec instead of a pointer and adjust
accordingly.

AArch64: do not keep negated mask and inverse mask live at the same time

The following example:

void f11(double * restrict z, double * restrict w, double * restrict x,
double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? w[i] : y[i];
    }
}

Generates currently:

        ptrue   p2.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p2/z, z0.d, #0.0
        bic     p3.b, p2/z, p0.b, p1.b
        ld1d    z1.d, p3/z, [x3, x2, lsl 3]

and after the previous patches generates:

        ptrue   p3.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        fcmgt   p2.d, p3/z, z0.d, #0.0
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]

where a duplicate comparison is performed for w[i] > 0.

This is because in the vectorizer we're emitting a comparison for both a and ~a
where we just need to emit one of them and invert the other.  After this patch
we generate:

        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        mov     p2.b, p1.b
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]

In order to perform the check I have to fully expand the NOT stmts when
recording them as the SSA names for the top level expressions differ but
their arguments don't. e.g. in _31 = ~_34 the value of _34 differs but not
the operands in _34.

But we only do this when the operation is an ordered one because mixing
ordered and unordered expressions can lead to de-optimized code.

Note: This patch series is working incrementally towards generating the most
      efficient code for this and other loops in small steps. The mov is
      created by postreload when it does a late CSE.

gcc/ChangeLog:

* tree-vectorizer.h (struct scalar_cond_masked_key): Add inverted_p.
(default_hash_traits<scalar_conf_masked_key>): Likewise.
* tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask
is live.
* tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree):
Register mask inverses.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pred-not-gen-1.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-2.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-3.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-4.c: Update testcase.

middle-end: Add an RPO pass after successful vectorization

Following my current SVE predicate optimization series a problem has presented
itself in that the way vector masks are generated for masked operations relies
on CSE to share masks efficiently.

The issue however is that masking is done using the & operand and & is
associative and so reassoc decides to reassociate the masked operations.

This makes CSE then unable to CSE an unmasked and a masked operation leading to
duplicate operations being performed.

To counter this we want to add an RPO pass over the vectorized loop body when
vectorization succeeds. This makes it then no longer reliant on the RTL level
CSE.

I have not added a testcase for this as it requires the changes in my patch
series, however the entire series relies on this patch to work so all the
tests there cover it.

gcc/ChangeLog:

* tree-vectorizer.c (vectorize_loops): Do local CSE through RPVN upon
successful vectorization.

Grow sbr_vector in ranger's on-entry cache as needed.

The on-entry cache does not expect the number of BBs to change. This
could happen in various scenarios, recently in the suggestion to use
ranger with loop unswitching and also with a work in progress to use
the path solver in the loopch pass. This patch fixes both.

This is a patch from Andrew, who tested it on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-cache.cc (sbr_vector::grow): New.
(sbr_vector::set_bb_range): Call grow.
(sbr_vector::get_bb_range): Same.
(sbr_vector::bb_range_p): Remove assert.

AArch64: Remove shuffle pattern for rounding variant.

This removed the patterns to optimize the rounding shift and narrow.
The optimization is valid only for the truncating rounding shift and narrow,
for the rounding shift and narrow we need a different pattern that I will submit
separately.

This wasn't noticed before as the benchmarks did not run conformance as part of
the run, which we now do and this now passes again.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_topbits_shuffle<mode>_le
,*aarch64_topbits_shuffle<mode>_be): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrn-combine-8.c: Update.
* gcc.target/aarch64/shrn-combine-9.c: Update.

Extend modref by side-effect analysis

Make modref to also collect info whether function has side
effects. This allows pure/const function detection and also handling
functions which do store some memory in similar way as we handle
pure/consts now.

The code is symmetric to what ipa-pure-const does. Modref is actually more
capable on proving that a given function is pure/const (since it understands
that non-pure function can be called when it only modifies data on stack)
so we could retire ipa-pure-const's pure-const discovery at some point.

However this patch only does the anlaysis - the consumers of this flag
will come next.

Bootstrapped/regtested x86_64-linux. I plan to commit it later today
if there are no complains.

gcc/ChangeLog:

* ipa-modref.c: Include tree-eh.h
(modref_summary::modref_summary): Initialize side_effects.
(struct modref_summary_lto): New bool field side_effects.
(modref_summary_lto::modref_summary_lto): Initialize side_effects.
(modref_summary::dump): Dump side_effects.
(modref_summary_lto::dump): Dump side_effects.
(merge_call_side_effects): Merge side effects.
(process_fnspec): Calls to non-const/pure or looping
function is a side effect.
(analyze_call): Self-recursion is a side-effect; handle
special builtins.
(analyze_load): Watch for volatile and throwing memory.
(analyze_store): Likewise.
(analyze_stmt): Watch for volatitle asm.
(analyze_function): Handle side_effects.
(modref_summaries::duplicate): Duplicate side_effects.
(modref_summaries_lto::duplicate): Likewise.
(modref_write): Stream side_effects.
(read_section): Likewise.
(update_signature): Update.
(propagate_unknown_call): Handle side_effects.
(modref_propagate_in_scc): Likewise.
* ipa-modref.h (struct modref_summary): Add side_effects.
* ipa-pure-const.c (special_builtin_state): Rename to ...
(builtin_safe_for_const_function_p): ... this one.
(check_call): Update.
(finite_function_p): Break out from ...
(propagate_pure_const): ... here
* ipa-utils.h (finite_function): Declare.

Fix typo in modref-13.c

gcc/testsuite/ChangeLog:

2021-11-10 Jan Hubicka <hubicka@ucw.cz>

* gcc.dg/tree-ssa/modref-13.c: Fix typo.

rs6000: Remove LINK_OS_EXTRA_SPEC{32,64} from --with-advance-toolchain

Historically this was added to fill gaps from ld.so.cache on early AT
releases. This now are just causing errors and rework. Since AT5.0 the
AT's ld.so is using a correctly configured ld.so.cache and sets the
DT_INTERP to AT's ld.so. This two factors are sufficient for an AT
builded program to get the correct libraries.

GCC congured with --with-advance-toolchain has issues building GlibC
releases because it adds DT_RUNPATH to ld.so and that's unsupported.

2021-11-10 Lucas A. M. Magalhães <lamm@linux.ibm.com>

gcc/
* config.gcc (powerpc*-*-*): Remove -rpath from
--with-advance-toolchain.

attribs: Implement -Wno-attributes=vendor::attr [PR101940]

It is desirable for -Wattributes to warn about e.g.

[[deprecate]] void g(); // typo, should warn

However, -Wattributes also warns about vendor-specific attributes
(that's because lookup_scoped_attribute_spec -> find_attribute_namespace
finds nothing), which, with -Werror, causes grief.  We don't want the
-Wattributes warning for

[[company::attr]] void f();

GCC warns because it doesn't know the "company" namespace; it only knows
the "gnu" and "omp" namespaces.  We could entirely disable warning about
attributes in unknown scopes but then the compiler would also miss typos
like

  [[company::attrx]] void f();

or

  [[gmu::warn_used_result]] int write();

so that is not a viable solution.  A workaround is to use a #pragma:

  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wattributes"
  [[company::attr]] void f() {}
  #pragma GCC diagnostic pop

but that's a mouthful and awkward to use and could also hide typos.  In
fact, any macro-based solution doesn't seem like a way forward.

This patch implements -Wno-attributes=, which takes these arguments:

company::attr
company::

This option should go well with using @file: the user could have a file
containing
-Wno-attributes=vendor::attr1,vendor::attr2
and then invoke gcc with '@attrs' or similar.

I've also added a new pragma which has the same effect:

The pragma along with the new option should help with various static
analysis tools.

PR c++/101940

gcc/ChangeLog:

* attribs.c (struct scoped_attributes): Add a bool member.
(lookup_scoped_attribute_spec): Forward declare.
(register_scoped_attributes): New bool parameter, defaulted to
false.  Use it.
(handle_ignored_attributes_option): New function.
(free_attr_data): New function.
(init_attributes): Call handle_ignored_attributes_option.
(attr_namespace_ignored_p): New function.
(decl_attributes): Check attr_namespace_ignored_p before
warning.
* attribs.h (free_attr_data): Declare.
(register_scoped_attributes): Adjust declaration.
(handle_ignored_attributes_option): Declare.
(canonicalize_attr_name): New function template.
(canonicalize_attr_name): Use it.
* common.opt (Wattributes=): New option with a variable.
* doc/extend.texi: Document #pragma GCC diagnostic ignored_attributes.
* doc/invoke.texi: Document -Wno-attributes=.
* opts.c (common_handle_option) <case OPT_Wattributes_>: Handle.
* plugin.h (register_scoped_attributes): Adjust declaration.
* toplev.c (compile_file): Call free_attr_data.

gcc/c-family/ChangeLog:

* c-pragma.c (handle_pragma_diagnostic): Handle #pragma GCC diagnostic
ignored_attributes.

gcc/testsuite/ChangeLog:

* c-c++-common/Wno-attributes-1.c: New test.
* c-c++-common/Wno-attributes-2.c: New test.
* c-c++-common/Wno-attributes-3.c: New test.

arm: enable cortex-a710 CPU

This patch is adding support for Cortex-A710 CPU in Arm.

gcc/ChangeLog:

* config/arm/arm-cpus.in (cortex-a710): New CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.

[AArch64] Fix bootstrap failure due to missing ATTRIBUTE_UNUSED

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Mark argument as unused.

lto-wrapper: fix memory corruption.

The first argument of merge_and_complain is actually vector where
we merge options and it should be propagated to caller properly.

Fixes:

==6656== Invalid read of size 8
==6656==    at 0x408056: merge_and_complain (lto-wrapper.c:335)
==6656==    by 0x408056: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1139)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)
==6656==  Address 0x4e69b18 is 344 bytes inside a block of size 1,768 free'd
==6656==    at 0x484339F: realloc (vg_replace_malloc.c:1192)
==6656==    by 0x4993C0: xrealloc (xmalloc.c:181)
==6656==    by 0x406A82: reserve<cl_decoded_option> (vec.h:290)
==6656==    by 0x406A82: reserve (vec.h:1858)
==6656==    by 0x406A82: vec<cl_decoded_option, va_heap, vl_ptr>::safe_push(cl_decoded_option const&) [clone .isra.0] (vec.h:1967)
==6656==    by 0x4077E0: merge_and_complain (lto-wrapper.c:457)
==6656==    by 0x4077E0: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1139)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)
==6656==  Block was alloc'd at
==6656==    at 0x483E70F: malloc (vg_replace_malloc.c:380)
==6656==    by 0x4993D7: xrealloc (xmalloc.c:179)
==6656==    by 0x407476: reserve<cl_decoded_option> (vec.h:290)
==6656==    by 0x407476: reserve (vec.h:1858)
==6656==    by 0x407476: reserve_exact (vec.h:1878)
==6656==    by 0x407476: create (vec.h:1893)
==6656==    by 0x407476: get_options_from_collect_gcc_options(char const*, char const*) (lto-wrapper.c:163)
==6656==    by 0x407674: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1132)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)

gcc/ChangeLog:

* lto-wrapper.c (merge_and_complain): Make the first argument
a reference type.

aarch64: Tweak FMAX/FMIN iterators

There was some duplication between the maxmin_uns (uns for unspec
rather than unsigned) int attribute and the optab int attribute.
The difficulty for FMAXNM and FMINNM is that the instructions
really correspond to two things: the smax/smin optabs for floats
(used only for fast-math-like flags) and the fmax/fmin optabs
(used for built-in functions).  The optab attribute was
consistently for the former but maxmin_uns had a mixture of both.

This patch renames maxmin_uns to fmaxmin and only uses it
for the fmax and fmin optabs.  The reductions that previously
used the maxmin_uns attribute now use the optab attribute instead.

FMAX and FMIN are awkward in that they don't correspond to any
optab.  It's nevertheless useful to define them alongside the
“real” optabs.  Previously they were known as “smax_nan” and
“smin_nan”, but the problem with those names it that smax and
smin are only used for floats if NaNs don't matter.  This patch
therefore uses fmax_nan and fmin_nan instead.

There is still some inconsistency, in that the optab attribute
handles UNSPEC_COND_FMAX but the fmaxmin attribute handles
UNSPEC_FMAX.  This is because the SVE FP instructions, being
predicated, have to use unspecs in cases where the Advanced
SIMD ones could use rtl codes.

At least there are no duplicate entries though, so this seemed
like the best compromise for now.

gcc/
* config/aarch64/iterators.md (optab): Use fmax_nan instead of
smax_nan and fmin_nan instead of smin_nan.
(maxmin_uns): Rename to...
(fmaxmin): ...this and make the same changes.  Remove entries
unrelated to fmax* and fmin*.
* config/aarch64/aarch64.md (<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
* config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>):
Rename to...
(aarch64_<optab>p<mode>): ...this.
(<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
(reduc_<maxmin_uns>_scal_<mode>): Rename to...
(reduc_<optab>_scal_<mode>): ...this and update gen* call.
(aarch64_reduc_<maxmin_uns>_internal<mode>): Rename to...
(aarch64_reduc_<optab>_internal<mode>): ...this.
(aarch64_reduc_<maxmin_uns>_internalv2si): Rename to...
(aarch64_reduc_<optab>_internalv2si): ...this.
* config/aarch64/aarch64-sve.md (<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
* config/aarch64/aarch64-simd-builtins.def (smax_nan, smin_nan)
Rename to...
(fmax_nan, fmin_nan): ...this.
* config/aarch64/arm_neon.h (vmax_f32, vmax_f64, vmaxq_f32, vmaxq_f64)
(vmin_f32, vmin_f64, vminq_f32, vminq_f64, vmax_f16, vmaxq_f16)
(vmin_f16, vminq_f16): Update accordingly.

vect: Pass scalar_costs to finish_cost

When finishing the vector costs, it can be useful to know
what the associated scalar costs were. This allows targets
to read information collected about the original scalar loop
when trying to make a final judgement about the cost of the
vector code.

This patch therefore passes the scalar costs to
vector_costs::finish_cost. The parameter is null for the
scalar costs themselves.

gcc/
* tree-vectorizer.h (vector_costs::finish_cost): Take the
corresponding scalar costs as a parameter.
(finish_cost): Likewise.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost)
(vect_estimate_min_profitable_iters): Update accordingly.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
* tree-vectorizer.c (vector_costs::finish_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_vector_costs::finish_cost):
Likewise.
* config/rs6000/rs6000.c (rs6000_cost_data::finish_cost): Likewise.

vect: Keep scalar costs around longer

The scalar costs for a loop are fleeting, with only the final
single_scalar_iteration_cost being kept for later comparison.
This patch replaces single_scalar_iteration_cost with the cost
structure, so that (with later patches) it's possible for targets
to examine other target-specific cost properties as well. This will
be done by passing the scalar costs to hooks where appropriate;
targets shouldn't try to read the information directly from
loop_vec_infos.

gcc/
* tree-vectorizer.h (_loop_vec_info::scalar_costs): New member
variable.
(_loop_vec_info::single_scalar_iteration_cost): Delete.
(LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Delete.
(vector_costs::total_cost): New function.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
after above changes.
(_loop_vec_info::~_loop_vec_info): Delete scalar_costs.
(vect_compute_single_scalar_iteration_cost): Store the costs
in loop_vinfo->scalar_costs.
(vect_estimate_min_profitable_iters): Get the scalar cost from
loop_vinfo->scalar_costs.

vect: Hookize better_loop_vinfo_p

One of the things we want to do on AArch64 is compare vector loops
side-by-side and pick the best one.  For some targets, we want this
to be based on issue rates as well as the usual latency-based costs
(at least for loops with relatively high iteration counts).

The current approach to doing this is: when costing vectorisation
candidate A, try to guess what the other main candidate B will look
like and adjust A's latency-based cost up or down based on the likely
difference between A and B's issue rates.  This effectively means
that we try to cost parts of B at the same time as A, without actually
being able to see B.

This is needlessly indirect and complex.  It was a compromise due
to the code being added (too) late in the GCC 11 cycle, so that
target-independent changes weren't possible.

The target-independent code already compares two candidate loop_vec_infos
side-by-side, so that information about A and B above are available
directly.  This patch creates a way for targets to hook into this
comparison.

The AArch64 code can therefore hook into better_main_loop_than_p to
compare issue rates.  If the issue rate comparison isn't decisive,
the code can fall back to the normal latency-based comparison instead.

gcc/
* tree-vectorizer.h (vector_costs::better_main_loop_than_p)
(vector_costs::better_epilogue_loop_than_p)
(vector_costs::compare_inside_loop_cost)
(vector_costs::compare_outside_loop_cost): Likewise.
* tree-vectorizer.c (vector_costs::better_main_loop_than_p)
(vector_costs::better_epilogue_loop_than_p)
(vector_costs::compare_inside_loop_cost)
(vector_costs::compare_outside_loop_cost): New functions,
containing code moved from...
* tree-vect-loop.c (vect_better_loop_vinfo_p): ...here.

vect: Remove vec_outside/inside_cost fields

The vector costs now use a common base class instead of being
completely abstract. This means that there's no longer a
need to record the inside and outside costs separately.

gcc/
* tree-vectorizer.h (_loop_vec_info): Remove vec_outside_cost
and vec_inside_cost.
(vector_costs::outside_cost): New function.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
after above.
(vect_estimate_min_profitable_iters): Likewise.
(vect_better_loop_vinfo_p): Get the inside and outside costs
from the loop_vec_infos' vector_costs.

vect: Move vector costs to loop_vec_info

target_cost_data is in vec_info but is really specific to
loop_vec_info. This patch moves it there and renames it to
vector_costs, to distinguish it from scalar target costs.

gcc/
* tree-vectorizer.h (vec_info::target_cost_data): Replace with...
(_loop_vec_info::vector_costs): ...this.
(LOOP_VINFO_TARGET_COST_DATA): Delete.
* tree-vectorizer.c (vec_info::vec_info): Remove target_cost_data
initialization.
(vec_info::~vec_info): Remove corresponding delete.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
vector_costs to null.
(_loop_vec_info::~_loop_vec_info): Delete vector_costs.
(vect_analyze_loop_operations): Update after above changes.
(vect_analyze_loop_2): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vect-slp.c (vect_slp_analyze_operations): Likewise.

Make EAF flags more regular (and expressive)

I hoped that I am done with EAF flags related changes, but while looking into
the Fortran testcases I noticed that I have designed them in unnecesarily
restricted way.  I followed the scheme of NOESCAPE and NODIRECTESCAPE which is
however the only property tht is naturally transitive.

This patch replaces the existing flags by 9 flags:

EAF_UNUSED
EAF_NO_DIRECT_CLOBBER and EAF_NO_INDIRECT_CLOBBER
EAF_NO_DIRECT_READ and EAF_NO_INDIRECT_READ
EAF_NO_DIRECT_ESCAPE and EAF_NO_INDIRECT_ESCAPE
EAF_NO_DIRECT_READ and EAF_NO_INDIRECT_READ

So I have removed the unified EAF_DIRECT flag and made each of the flags to come
in direct and indirect variant.  Newly the indirect variant is not implied by direct
(well except for escape but it is not special cased in the code)
Consequently we can analyse i.e. the case where function reads directly and clobber
indirectly as in the following testcase:

struct wrap {
void **array;
};
__attribute__ ((noinline))
void
write_array (struct wrap *ptr)
{
ptr->array[0]=0;
}
int
test ()
{
void *arrayval;
struct wrap w = {&arrayval};
write_array (&w);
return w.array == &arrayval;
}

This is pretty common in array descriptors and also C++ pointer wrappers or structures
containing pointers to arrays.

Other advantage is that !binds_to_current_def_p functions we can still track the fact
that the value is not clobbered indirectly while previously we implied EAF_DIRECT
for all three cases.

Finally the propagation becomes more regular and I hope easier to understand
because the flags are handled in a symmetric way.

In tree-ssa-structalias I now produce "callarg" var_info as before and if necessary
also "indircallarg" for the indirect accesses.  I added some logic to optimize the
common case where we can not make difference between direct and indirect.

gcc/ChangeLog:

2021-11-09  Jan Hubicka  <hubicka@ucw.cz>

* tree-core.h (EAF_DIRECT): Remove.
(EAF_NOCLOBBER): Remove.
(EAF_UNUSED): Remove.
(EAF_NOESCAPE): Remove.
(EAF_NO_DIRECT_CLOBBER): New.
(EAF_NO_INDIRECT_CLOBBER): New.
(EAF_NODIRECTESCAPE): Remove.
(EAF_NO_DIRECT_ESCAPE): New.
(EAF_NO_INDIRECT_ESCAPE): New.
(EAF_NOT_RETURNED): Remove.
(EAF_NOT_RETURNED_INDIRECTLY): New.
(EAF_NOREAD): Remove.
(EAF_NO_DIRECT_READ): New.
(EAF_NO_INDIRECT_READ): New.
* gimple.c (gimple_call_arg_flags): Update for new flags.
(gimple_call_retslot_flags): Update for new flags.
* ipa-modref.c (dump_eaf_flags): Likewise.
(remove_useless_eaf_flags): Likewise.
(deref_flags): Likewise.
(modref_lattice::init): Likewise.
(modref_lattice::merge): Likewise.
(modref_lattice::merge_direct_load): Likewise.
(modref_lattice::merge_direct_store): Likewise.
(modref_eaf_analysis::merge_call_lhs_flags): Likewise.
(callee_to_caller_flags): Likewise.
(modref_eaf_analysis::analyze_ssa_name): Likewise.
(modref_eaf_analysis::propagate): Likewise.
(modref_merge_call_site_flags): Likewise.
* ipa-modref.h (interposable_eaf_flags): Likewise.
* tree-ssa-alias.c: (ref_maybe_used_by_call_p_1) Likewise.
* tree-ssa-structalias.c (handle_call_arg): Likewise.
(handle_rhs_call): Likewise.
* tree-ssa-uninit.c (maybe_warn_pass_by_reference): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/ipa/modref-3.c: Update template.
* gcc.dg/lto/modref-3_0.c: Update template.
* gcc.dg/lto/modref-4_0.c: Update template.
* gcc.dg/tree-ssa/modref-10.c: Update template.
* gcc.dg/tree-ssa/modref-11.c: Update template.
* gcc.dg/tree-ssa/modref-5.c: Update template.
* gcc.dg/tree-ssa/modref-6.c: Update template.
* gcc.dg/tree-ssa/modref-13.c: New test.

testsuite: change vect_long to vect_long_long in complex tests.

These tests are still failing on SPARC and it looks like this is because I need
to use vect_long_long instead of vect_long.

gcc/testsuite/ChangeLog:

PR testsuite/103042
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Use
vect_long_long instead of vect_long.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
Likewise.

middle-end: Fix signbit tests when ran on ISA with support for masks.

These test don't work on vector ISAs where the truth
type don't match the vector mode of the operation.

However I still want the tests to run on these
architectures but just turn off the ISA modes that
enable masks.

This thus turns off SVE is it's on and turns off
AVX512 if it's on.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: Turn off masks.
* gcc.dg/signbit-5.c: Likewise.

vect: remove unused variable in complex numbers detection code.

This removed an unused variable that clang seems to catch when
compiling GCC with Clang.

gcc/ChangeLog:

* tree-vect-slp-patterns.c (complex_mul_pattern::matches): Remove l1node.

libstdc++: Fix test for libstdc++ not including <unistd.h> [PR100117]

The <cxxx> headers for the C library are not under our control, so we
can't prevent them from including <unistd.h>. Change the PR 49745 test
to only include the C++ library headers, not the <cxxx> ones.

To ensure <bits/stdc++.h> isn't included automatically we need to use
no_pch to disable PCH.

libstdc++-v3/ChangeLog:

PR libstdc++/100117
* testsuite/17_intro/headers/c++1998/49745.cc: Explicitly list
all C++ headers instead of including <bits/stdc++.h>

libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]

Since Glibc 2.34 all pthreads symbols are defined directly in libc not
libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
avoid unnecessary locking in single-threaded programs. This means there
is no reason to avoid linking to libpthread now, and so no reason to use
weak symbols defined in gthr-posix.h for all the pthread_xxx functions.

libstdc++-v3/ChangeLog:

PR libstdc++/100748
PR libstdc++/103133
* config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK):
Define for glibc 2.34 and later.

testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C

This XFAILs the bogus diagnostic test and rectifies the expectation
on the optimization.

2021-11-10 Richard Biener <rguenther@suse.de>

PR testsuite/102690
* g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part
and optimization.

[AArch64] Fix TBAA information when lowering NEON loads and stores to gimple

This patch fixes the wrong TBAA information when lowering NEON loads and stores
to gimple that showed up when bootstrapping with UBSAN.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Change pointer alignment and
alias.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/lowering_tbaa.c: New test.

[AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch

This patch reverts the tests for big-endian after the NEON gimple lowering
patch. The earlier patch only lowers NEON loads and stores for little-endian,
meaning the codegen now differs between endinanness so we need target specific
testing.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmla_intrinsic_1.c: Fix big-endian testism.
* gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.
* gcc.target/aarch64/fmul_intrinsic_1.c: Likewise.

Fix modref_tree::remap_params

gcc/ChangeLog:

* ipa-modref-tree.h (modref_tree::remap_params): Fix off-by-one error.

rs6000, libgcc: Fix up -Wmissing-prototypes warning on rs6000/linux-unwind.h

Jonathan reported and I've verified a
In file included from ../../../libgcc/unwind-dw2.c:412:
./md-unwind-support.h:398:6: warning: no previous prototype for ‘ppc_backchain_fallback’ [-Wmissing-prototypes]
  398 | void ppc_backchain_fallback (struct _Unwind_Context *context, void *a)
      |      ^~~~~~~~~~~~~~~~~~~~~~
warning on powerpc*-linux* libgcc build.

All the other MD_* macro functions are static, so I think the following
is the right thing rather than adding a previous prototype for
ppc_backchain_fallback.

2021-11-10  Jakub Jelinek  <jakub@redhat.com>

* config/rs6000/linux-unwind.h (ppc_back_fallback): Make it static,
formatting fix.

Improve integer bit test on __atomic_fetch_[or|and]_* returns

commit adedd5c173388ae505470df152b9cb3947339566
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue May 3 13:37:25 2016 +0200

    re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')

optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
with lock bts/btr/btc by turning

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

into

  _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
  _5 = _4;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

into

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
  _4 = _11 != 0;

But it failed to optimize many equivalent, but slighly different cases:

1.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _4 = (_Bool) _1;
2.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _4 = (_Bool) _1;
3.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
4.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
5.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
6.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
7.
  _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
8.
  _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
9.
  _1 = 1 << bit_4(D);
  mask_5 = (unsigned int) _1;
  _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
  _3 = _2 & mask_5;
10.
  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _2 = (unsigned int) _1;
  _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
  _4 = (int) _3;
  _5 = _4 & mask_7;

We make

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

the canonical forms for this optimization and transform cases 1-9 to the
equivalent canonical form.  For cases 10 and 11, we simply remove the cast
before __atomic_fetch_or_4/__atomic_fetch_and_4 with

  _1 = 1 << bit_4(D);
  _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
  _3 = _2 & _1;

and

  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
  _6 = _3 & mask_7;
  _5 = (int) _6;

2021-11-04  H.J. Lu  <hongjiu.lu@intel.com>
    Hongtao Liu  <hongtao.liu@intel.com>
gcc/

PR middle-end/102566
* match.pd (nop_atomic_bit_test_and_p): New match.
* tree-ssa-ccp.c (convert_atomic_bit_not): New function.
(gimple_nop_atomic_bit_test_and_p): New prototype.
(optimize_atomic_bit_test_and): Transform equivalent, but slighly
different cases to their canonical forms.

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-1.C: New test.
* g++.target/i386/pr102566-2.C: Likewise.
* g++.target/i386/pr102566-3.C: Likewise.
* g++.target/i386/pr102566-4.C: Likewise.
* g++.target/i386/pr102566-5a.C: Likewise.
* g++.target/i386/pr102566-5b.C: Likewise.
* g++.target/i386/pr102566-6a.C: Likewise.
* g++.target/i386/pr102566-6b.C: Likewise.
* gcc.target/i386/pr102566-1a.c: Likewise.
* gcc.target/i386/pr102566-1b.c: Likewise.
* gcc.target/i386/pr102566-2.c: Likewise.
* gcc.target/i386/pr102566-3a.c: Likewise.
* gcc.target/i386/pr102566-3b.c: Likewise.
* gcc.target/i386/pr102566-4.c: Likewise.
* gcc.target/i386/pr102566-5.c: Likewise.
* gcc.target/i386/pr102566-6.c: Likewise.
* gcc.target/i386/pr102566-7.c: Likewise.
* gcc.target/i386/pr102566-8a.c: Likewise.
* gcc.target/i386/pr102566-8b.c: Likewise.
* gcc.target/i386/pr102566-9a.c: Likewise.
* gcc.target/i386/pr102566-9b.c: Likewise.
* gcc.target/i386/pr102566-10a.c: Likewise.
* gcc.target/i386/pr102566-10b.c: Likewise.
* gcc.target/i386/pr102566-11.c: Likewise.
* gcc.target/i386/pr102566-12.c: Likewise.
* gcc.target/i386/pr102566-13.c: New test.
* gcc.target/i386/pr102566-14.c: New test.

[Ada] Minor cleanup in translation of calls to subprograms

gcc/ada/

* gcc-interface/ada-tree.h (DECL_STUBBED_P): Delete.
* gcc-interface/decl.c (gnat_to_gnu_entity): Do not set it.
* gcc-interface/trans.c (Call_to_gnu): Use GNAT_NAME local variable
and adjust accordingly. Replace test on DECL_STUBBED_P with direct
test on Convention and move it down in the processing.

[Ada] Warn for bidirectional characters

gcc/ada/

* scng.adb (Check_Bidi): New procedure to give warning. Note
that this is called only for non-ASCII characters, so should not
be an efficiency issue.
(Slit): Call Check_Bidi for wide characters in string_literals.
(Minus_Case): Call Check_Bidi for wide characters in comments.
(Char_Literal_Case): Call Check_Bidi for wide characters in
character_literals. Move Accumulate_Checksum down, because
otherwise, if Err is True, the Code is uninitialized.
* errout.ads: Make the obsolete nature of "Insertion character
?" more prominent; one should not have to read several
paragraphs before finding out that it's obsolete.

[Ada] Avoid warnings regarding rep clauses in generics -- follow-on

gcc/ada/

* repinfo.adb (List_Component_Layout): Initialize Sbit.

[Ada] Fix comments about expansion of array equality

gcc/ada/

* exp_ch4.adb (Expand_Array_Equality): Fix inconsistent casing
in comment about the template for expansion of array equality;
now we use lower case for true/false/boolean.
(Handle_One_Dimension): Fix comment about the template for
expansion of array equality.

[Ada] Avoid warnings regarding rep clauses in generics

gcc/ada/

* repinfo.adb (List_Common_Type_Info, List_Object_Info): Add
check for In_Generic_Scope.
(List_Component_Layout): Check for known static values.
* sem_ch13.adb (Check_Record_Representation_Clause): Add check
for In_Generic_Scope.

[Ada] ACATS BDC1002 shall not error on arbitrary aspect

gcc/ada/

* aspects.adb, aspects.ads (Is_Aspect_Id): New function.
* namet-sp.ads, namet-sp.adb (Aspect_Spell_Check,
Attribute_Spell_Check): New Functions.
* par-ch13.adb (Possible_Misspelled_Aspect): Removed.
(With_Present): Use Aspect_Spell_Check, use Is_Aspect_Id.
(Get_Aspect_Specifications): Use Aspect_Spell_Check,
Is_Aspect_Id, Bad_Aspect.
* par-sync.adb (Resync_Past_Malformed_Aspect): Use Is_Aspect_Id.
* sem_ch13.adb (Check_One_Attr): Use Is_Aspect_Id.
* sem_prag.adb (Process_Restrictions_Or_Restriction_Warnings):
Introduce the Process_No_Specification_Of_Aspect, emit a warning
instead of an error on unknown aspect, hint for typos.
Introduce Process_No_Use_Of_Attribute to add spell check for
attributes too.
(Set_Error_Msg_To_Profile_Name): Use Is_Aspect_Id.
* sem_util.adb (Bad_Attribute): Use Attribute_Spell_Check.
(Bad_Aspect): New function.
* sem_util.ads (Bad_Aspect): New function.

[Ada] Do not assume a priority value of zero is a valid priority

gcc/ada/

* libgnarl/s-taskin.adb (Initialize_ATCB): Initialize
T.Common.Current_Priority to Priority'First.
* libgnarl/s-taskin.ads (Unspecified_Priority): Redefined as -1.
* libgnat/system-rtems.ads: Start priority range from 1, as 0 is
reserved by the operating system.

[Ada] Prove double precision integer arithmetic unit

gcc/ada/

* libgnat/a-nbnbig.ads: Mark the unit as Pure.
* libgnat/s-aridou.adb: Add contracts and ghost code for proof.
(Scaled_Divide): Reorder operations and use of temporaries in
two places to facilitate proof.
* libgnat/s-aridou.ads: Add full functional contracts.
* libgnat/s-arit64.adb: Mark in SPARK.
* libgnat/s-arit64.ads: Add contracts similar to those from
s-aridou.ads.
* rtsfind.ads: Document the limitation that runtime units
loading does not work for private with-clauses.

[Ada] Don't carry action bodies for expansion of array equality

gcc/ada/

* exp_ch3.adb (Make_Eq_Body): Adapt call to
Expand_Record_Equality.
* exp_ch4.ads, exp_ch4.adb (Expand_Composite_Equality): Remove
Bodies parameter; adapt comment; fix style in body; adapt calls
to Expand_Record_Equality.
(Expand_Array_Equality): Adapt calls to
Expand_Composite_Equality.
(Expand_Record_Equality): Remove Bodies parameter; adapt
comment; adapt call to Expand_Composite_Equality.
* exp_ch8.adb (Build_Body_For_Renaming): Adapt call to
Expand_Record_Equality.

[Ada] Use predefined equality for arrays inside records

gcc/ada/

* exp_ch4.adb (Expand_Composite_Equality): Handle arrays inside
records just like scalars; only records inside records need
dedicated handling.

[Ada] Fix oversight in latest change to Has_Compatible_Type

gcc/ada/

* sem_type.ads (Has_Compatible_Type): Add For_Comparison parameter.
* sem_type.adb (Has_Compatible_Type): Put back the reversed calls
to Covers guarded with For_Comparison.
* sem_ch4.adb (Analyze_Membership_Op) <Try_One_Interp>: Remove new
reversed call to Covers and set For_Comparison to true instead.
(Find_Comparison_Types) <Try_One_Interp>: Likewise
(Find_Equality_Types) <Try_One_Interp>: Likewise.

[Ada] Create explicit ghost mirror unit for big integers

gcc/ada/

* Makefile.rtl: Add unit.
* libgnat/a-nbnbin__ghost.adb: Move...
* libgnat/a-nbnbig.adb: ... here. Mark ghost as ignored.
* libgnat/a-nbnbin__ghost.ads: Move...
* libgnat/a-nbnbig.ads: ... here. Add comment for purpose of
this unit. Mark ghost as ignored.
* libgnat/s-widthu.adb: Use new unit.
* sem_aux.adb (First_Subtype): Adapt to the case of a ghost type
whose freeze node is rewritten to a null statement.

[Ada] Fix Constraint error on rexgexp close bracket find algorithm

gcc/ada/

* libgnat/s-regexp.adb (Check_Well_Formed_Pattern): Fix
Constraint_Error on missing close bracket.

[Ada] Extend optimized equality of 2-element arrays

gcc/ada/

* exp_ch4.adb (Expand_Array_Equality): Remove check of the array
bound being an N_Range node; use Type_High_Bound/Type_Low_Bound,
which handle all kinds of array bounds.

[Ada] Warn when interfaces swapped between full and partial view

gcc/ada/

* sem_ch3.adb (Derived_Type_Declaration): Introduce a subprogram
for tree transformation. If a tree transformation is performed,
then warn that it would be better to reorder the interfaces.

[Ada] Add guard against previous error for peculiar ACATS test

gcc/ada/

* sem_ch4.adb (Find_Non_Universal_Interpretations): Add guard.

[Ada] Better error message on missing parentheses

gcc/ada/

* par-ch4.adb (P_Primary): Adapt test for getting error message
on missing parentheses.

Extend is_cond_scalar_reduction to handle bit_and/bit_xor/bit_ior.

This will enable transformation like

-  # sum1_50 = PHI <prephitmp_64(13), 0(4)>
-  # sum2_52 = PHI <sum2_21(13), 0(4)>
+  # sum1_50 = PHI <_87(13), 0(4)>
+  # sum2_52 = PHI <_89(13), 0(4)>
   # ivtmp_62 = PHI <ivtmp_61(13), 64(4)>
   i.2_7 = (long unsigned int) i_49;
   _8 = i.2_7 * 8;
...
   vec1_i_38 = vec1_29 >> _10;
   vec2_i_39 = vec2_31 >> _10;
   _11 = vec1_i_38 & 1;
-  _63 = tmp_37 ^ sum1_50;
-  prephitmp_64 = _11 == 0 ? sum1_50 : _63;
+  _ifc__86 = _11 != 0 ? tmp_37 : 0;
+  _87 = sum1_50 ^ _ifc__86;
   _12 = vec2_i_39 & 1;
:

so that vectorizer won't failed due to

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used ouside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"reduction used in loop.\n");
      return NULL;
    }

gcc/ChangeLog:

PR tree-optimization/103126
* tree-vect-loop.c (neutral_op_for_reduction): Remove static.
* tree-vectorizer.h (neutral_op_for_reduction): Declare.
* tree-if-conv.c : Include tree-vectorizer.h.
(is_cond_scalar_reduction): Handle
BIT_XOR_EXPR/BIT_IOR_EXPR/BIT_AND_EXPR.
(convert_scalar_cond_reduction): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/ifcvt-reduction-logic-op.c: New test.

i386: Support complex fma/conj_fma for _Float16.

Support cmla_optab, cmul_optab, cmla_conj_optab, cmul_conj_optab for vector _Float16.

gcc/ChangeLog:

* config/i386/sse.md (cmul<conj_op><mode>3): add new define_expand.
(cmla<conj_op><mode>4): Likewise

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vector-complex-float.c: New test.

Remove unused gimple-ssa-evr-analyze.h header file.

gcc/ChangeLog:

* tree-ssa-threadedge.c: Do not include
gimple-ssa-evrp-analyze.h.
* value-pointer-equiv.cc: Same.

Include PHI threading restrictions in backthreader diagnostics.

I forgot to include the path dump when failing a path in resolve_phi.
To do so I abstracted dump_path into its own function, which made me
realize we had another copy with slightly different output.

I've merged everything and cleaned it up.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader::maybe_register_path_dump): Abstract path dumping...
(dump_path): ...here.
(back_threader::resolve_phi): Call dump_path.
(debug): Same.

i386: Optimization for mm512_set1_pch.

This patch is to support fold _mm512_fmadd_pch (a, _mm512_set1_pch(*(b)), c)
to 1 instruction vfmaddcph (%rsp){1to16}, %zmm1, %zmm2;

gcc/ChangeLog:

* config/i386/sse.md (fma_<complexpairopname>_<mode>_pair):
Add new define_insn.
(fma_<mode>_fmaddc_bcst): Add new define_insn_and_split.
(fma_<mode>_fcmaddc_bcst): Likewise

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16vl-complex-broadcast-1.c: New test.

Simplify (trunc)MAX/MIN((extend)a, (extend)b) to MAX/MIN(a,b)

a and b are same type as trunc type and has less precision than
extend type.

gcc/ChangeLog:

PR target/102464
* match.pd: Simplify (trunc)fmax/fmin((extend)a, (extend)b) to
MAX/MIN(a,b)

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102464-maxmin.c: New test.

aarch64: [PR101529] Fix vector shuffle insertion expansion

The function aarch64_evpc_ins would reuse the target even though
it might be the same register as the two inputs.
Instead of checking to see if we can reuse the target, just use the
original input directly.

Committed as approved after bootstrapped and tested on
aarch64-linux-gnu with no regressions.

PR target/101529

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_evpc_ins): Don't use target
as an input, use original one.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/builtin-convertvector-2.c: New test.
* c-c++-common/torture/builtin-shufflevector-2.c: New test.

Nios2: Add TARGET_CAN_INLINE_P hook.

2021-11-09 Sandra Loosemore <sandra@codesourcery.com>

gcc/
* config/nios2/nios2.c (nios2_can_inline_p): New.
(TARGET_CAN_INLINE_P): Define.

gcc/testsuite/
* gcc.target/nios2/custom-fp-inline-1.c: New.
* gcc.target/nios2/custom-fp-inline-2.c: New.
* gcc.target/nios2/custom-fp-inline-3.c: New.
* gcc.target/nios2/custom-fp-inline-4.c: New.

Daily bump.