review.tizen.org Git - platform/upstream/gcc.git/log

Add loads/stores relative to static chain in ipa-modref

Adds tracking of accesses relative to the static chain to the modref
load/store analysis.  This helps some Fortran benchmarks; however, it is still
quite limited.  One problem is that we never discover functions with nested
functions as const, pure or not accessing global memory because they contain
a __builtin_dwarf_cfa call which we believe to be non-pure.
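
As a rough illustration of what an access relative to the static chain looks like, the GNU C sketch below (nested functions are a GCC extension, so it needs GCC) has a nested function that updates a variable of its enclosing function. The names sum_to and add are made up for the example and do not come from the patch or its testcase.

```c
/* Sum 1..n using a nested helper.  GNU C only: nested functions are a
   GCC extension.  */
int
sum_to (int n)
{
  int total = 0;

  /* 'add' has no parameter for 'total'; it reaches the variable in the
     enclosing frame through the static chain passed by the caller.  */
  void add (int x)
  {
    total += x;
  }

  for (int i = 1; i <= n; i++)
    add (i);
  return total;
}
```

Here the store to total inside add is the kind of access the analysis would describe as relative to the static chain rather than to an ordinary parameter.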

Bootstrapped/regtested x86_64-linux.  Plan to commit it tomorrow if there are
no complaints and once periodic testers pick up today's modref changes.

Honza

gcc/ChangeLog:

* ipa-modref-tree.h (enum modref_special_parms): New enum.
(struct modref_access_node): Update for special parms.
(struct modref_ref_node): Likewise.
(struct modref_parm_map): Likewise.
(struct modref_tree): Likewise.
* ipa-modref.c (dump_access): Likewise.
(get_access): Detect static chain.
(parm_map_for_arg): Take tree as arg instead of
stmt and index.
(merge_call_side_effects): Compute map for static chain.
(process_fnspec): Update.
(struct escape_point): Remove retslot_arg and static_chain_arg.
(analyze_parms): Update.
(compute_parm_map): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(modref_merge_call_site_flags): Update.
(ipa_merge_modref_summary_after_inlining): Update.
* tree-ssa-alias.c (modref_may_conflict): Handle static chain.
* ipa-modref-tree.c (test_merge): Update.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/modref-12.c: New test.

Disables gimple folding for VSX_BUILTIN_XVMINDP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMINFP and ALTIVEC_BUILTIN_VMAXFP when fast-math is not set.
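
The log does not say why the fold is limited to fast-math. One plausible reason (an assumption here, not stated in the commit) is that a generic min/max operation gives no guarantees for NaN operands, while the hardware instructions have defined behaviour. The scalar C sketch below, with hypothetical helper names, shows how a minimum written as a plain comparison becomes order-dependent once a NaN is involved.

```c
#include <math.h>

/* Minimum written as a plain comparison.  Hypothetical helper used only
   for illustration.  */
double
min_by_compare (double a, double b)
{
  return a < b ? a : b;
}

/* Returns nonzero when swapping the operands changes whether the
   result is a NaN.  */
int
order_matters (double a, double b)
{
  return !isnan (min_by_compare (a, b)) != !isnan (min_by_compare (b, a));
}
```

With ordinary operands the operand order is irrelevant, but with one NaN operand the comparison-based form returns whichever value happens to be second.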

gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable
gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not
set.

gcc/testsuite/
* gcc.target/powerpc/vec-minmax-1.c: New test.
* gcc.target/powerpc/vec-minmax-2.c: Likewise.

Update documentation for -ftree-loop-vectorize and -ftree-slp-vectorize which are enabled by default at -O2.

gcc/ChangeLog:

PR tree-optimization/103077
* doc/invoke.texi (Options That Control Optimization):
Update documentation for -ftree-loop-vectorize and
-ftree-slp-vectorize which are enabled by default at -O2.

Add !HONOR_SNANS to simplification: (trunc)copysign((extend)a, (extend)b) to copysign (a, b).

> Note that this is not safe with -fsignaling-nans, so needs to be disabled
> for that option (if there isn't already logic somewhere with that effect),
> because the extend will convert a signaling NaN to quiet (raising
> "invalid"), but copysign won't, so this transformation could result in a
> signaling NaN being wrongly returned when the original code would never
> have returned a signaling NaN.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
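
For reference, the two forms involved in this simplification can be written in plain C as below. This is only a sketch with made-up function names, using float as the narrow type and double as the extended type. For ordinary values both forms agree; the difference described in the quoted note only shows up for signaling NaNs, which this sketch does not try to exercise.

```c
#include <math.h>

/* Wide form: extend both operands, call copysign, truncate the result.  */
float
copysign_via_double (float a, float b)
{
  return (float) copysign ((double) a, (double) b);
}

/* Narrow form that the simplification produces.  */
float
copysign_direct (float a, float b)
{
  return copysignf (a, b);
}
```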

gcc/ChangeLog

PR target/102464
* match.pd (Simplification (trunc)copysign((extend)a, (extend)b)
to .COPYSIGN (a, b)): Add !HONOR_SNANS.

[Gimple] Simplify (trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a,b, c).

a, b and c have the same type as the truncation type, which has less
precision than the extend type; the optimization is guarded under
flag_unsafe_math_optimizations.

gcc/ChangeLog:
PR target/102464
* match.pd: Simplify
(trunc)fma ((extend)a, (extend)b, (extend)c) to IFN_FMA (a, b,
c) under flag_unsafe_math_optimizations.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102464-fma.c: New test.

Daily bump.

Fix keyword name for co_reduce.

gcc/fortran/ChangeLog:

* intrinsic.c (add_subroutines): Change keyword "operator"
to the correct one, "operation".
* check.c (gfc_check_co_reduce): Change OPERATOR to
OPERATION in error messages.
* intrinsic.texi: Change OPERATOR to OPERATION in
documentation.

gcc/testsuite/ChangeLog:

* gfortran.dg/co_reduce_2.f90: New test.
* gfortran.dg/coarray_collectives_14.f90: Change OPERATOR
to OPERATION.
* gfortran.dg/coarray_collectives_16.f90: Likewise.
* gfortran.dg/coarray_collectives_9.f90: Likewise.

Co-authored-by: Steve Kargl <steve@gcc.gnu.org>

Limit range of modref-max-depth

gcc/ChangeLog:

PR ipa/103055
* params.opt (modref-max-depth): Add range.
(modref-max-adjustments): Fix range.

Remove VRP threader.

Now that things have stabilized, we can remove the old code.

I have left the hybrid threader in tree-ssa-threadedge, even though the
VRP threader was the only user, because we may need it as an interim
step for DOM threading removal.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-pass.h (make_pass_vrp_threader): Remove.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove
ASSERT_EXPR references.
* tree-ssa-threadedge.c (jt_state::register_equivs_stmt): Same.
* tree-vrp.c (vrp_folder::simplify_casted_conds): Same.
(execute_vrp): Same.
(class hybrid_threader): Remove.
(hybrid_threader::hybrid_threader): Remove.
(hybrid_threader::~hybrid_threader): Remove.
(hybrid_threader::before_dom_children): Remove.
(hybrid_threader::after_dom_children): Remove.
(execute_vrp_threader): Remove.
(class pass_vrp_threader): Remove.
(make_pass_vrp_threader): Remove.

Fortran: Diagnose all operands/arguments with constraint violations

04-Nov-2021 Sandra Loosemore <sandra@codesourcery.com>
Bernhard Reutner-Fischer <aldot@gcc.gnu.org>

PR fortran/101337

gcc/fortran/ChangeLog:
* interface.c (gfc_compare_actual_formal): Continue checking
all arguments after encountering an error.
* intrinsic.c (do_ts29113_check): Likewise.
* resolve.c (resolve_operator): Continue resolving on op2 error.

gcc/testsuite/ChangeLog:
* gfortran.dg/bessel_3.f90: Expect additional diagnostics from
multiple bad arguments in the call.
* gfortran.dg/pr24823.f: Likewise.
* gfortran.dg/pr39937.f: Likewise.
* gfortran.dg/pr41011.f: Likewise.
* gfortran.dg/pr61318.f90: Likewise.
* gfortran.dg/c-interop/c407b-2.f90: Remove xfails.
* gfortran.dg/c-interop/c535b-2.f90: Likewise.

Fix inter-procedural EAF flags propagation with respect to !binds_to_current_def_p

While proofreading the code for handling EAF flags of !binds_to_current_def_p I
noticed that the interprocedural dataflow actually ignores the flag, possibly
introducing wrong code on quite complex interposable functions in non-trivial
recursion cycles (or at ltrans partition boundary).

This patch unifies the flag changes in a single place (remove_useless_eaf_flags)
and extends modref_merge_call_site_flags to do the right thing.

lto-bootstrapped/regtested x86_64-linux.  Plan to commit it today after a bit
more testing (firefox/clang build).

gcc/ChangeLog:

* gimple.c (gimple_call_arg_flags): Use interposable_eaf_flags.
(gimple_call_retslot_flags): Likewise.
(gimple_call_static_chain_flags): Likewise.
* ipa-modref.c (remove_useless_eaf_flags): Do not remove everything for
NOVOPS.
(modref_summary::useful_p): Likewise.
(modref_summary_lto::useful_p): Likewise.
(analyze_parms): Do not give up on NOVOPS.
(analyze_function): When dumping report changes in EAF flags
between IPA and local pass.
(modref_merge_call_site_flags): Compute implicit eaf flags
based on callee ecf_flags and fnspec; if the function does not
bind to current defs use interposable_eaf_flags.
(modref_propagate_flags_in_scc): Update.
* ipa-modref.h (interposable_eaf_flags): New function.

rs6000: Replace the builtin expansion machinery

This patch forms the meat of the improvements for this patch series.
We develop a replacement for rs6000_expand_builtin and its supporting
functions, which are inefficient and difficult to maintain.

Differences between the old and new support in this patch include:
- Make use of the new builtin data structures, directly looking up
   a function's information rather than searching for the function
   multiple times;
- Test for enablement of builtins at expand time, to support #pragma
   target changes within a compilation unit;
- Use the builtin function attributes (e.g., bif_is_cpu) to control
   special handling;
- Refactor common code into one place; and
- Provide common error handling in one place for operands that are
   restricted to specific values or ranges.

2021-11-07  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
forward decl.
(rs6000_invalid_new_builtin): New function.
(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
(rs6000_expand_ldst_mask): New function.
(new_cpu_expand_builtin): Likewise.
(elemrev_icode): Likewise.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
(new_mma_expand_builtin): Likewise.
(new_htm_spr_num): Likewise.
(new_htm_expand_builtin): Likewise.
(rs6000_expand_new_builtin): Likewise.
(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.

Implement intra-procedural dataflow in ipa-modref flags propagation.

Implement the (long promised) intraprocedural dataflow for
propagating EAF flags, so we can handle parameters that participate
in loops in SSA graphs.  Typical examples are accessors that walk linked
lists.
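
The linked-list accessor case mentioned above could look like the following sketch. The struct and function names are hypothetical and not taken from the patch. In SSA form the pointer p inside the loop becomes a PHI that merges the incoming parameter with a value loaded through it, so the parameter's SSA name takes part in a cycle.

```c
#include <stddef.h>

/* Hypothetical singly linked list node.  */
struct node
{
  struct node *next;
  int value;
};

/* Walk to the last node and return its value.  The parameter is only
   read and dereferenced, never stored anywhere.  */
int
last_value (const struct node *p)
{
  while (p->next != NULL)
    p = p->next;
  return p->value;
}
```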

I implemented dataflow using the standard iteration over BBs in RPO some time
ago, but did not like it because it had a measurable compile time impact with
a very small code quality effect.  This is why I kept mainline doing the DFS
walk instead.  The reason is that we care about the flags of SSA names that
correspond to parameters, and those can often be determined from a small
fraction of the SSA graph, so solving dataflow for all SSA names in a function
is a waste.

This patch implements dataflow more carefully.  The DFS walk is kept in place to
solve acyclic cases and to discover the relevant part of the SSA graph into a
new graph (which is similar to the one used for inter-procedural dataflow - we
only need to know the edges and whether the access is direct or dereferenced).
The RPO iterative dataflow then works on this simplified graph.

This seems to be fast in practice.  For GCC linktime we do dataflow for 4881
functions.  Out of those, 4726 finish in one iteration, 144 in two and 10 in 3.

Overall 31979 functions are analysed, so we do dataflow only for a bit over
10% of cases.  131123 edges are visited by the solver.  I measured no compile
time impact of this.

gcc/ChangeLog:

* ipa-modref.c (modref_lattice): Add do_dataflow,
changed and propagate_to fields.
(modref_lattice::release): Free propagate_to.
(modref_lattice::merge): Do not give up early on unknown
lattice values.
(modref_lattice::merge_deref): Likewise.
(modref_eaf_analysis): Update toplevel comment.
(modref_eaf_analysis::analyze_ssa_name): Record postponed ssa names;
do optimistic dataflow initialization.
(modref_eaf_analysis::merge_with_ssa_name): Build dataflow graph.
(modref_eaf_analysis::propagate): New member function.
(analyze_parms): Update to new API of modref_eaf_analysis.

Daily bump.

Fix can_be_discarded_p wrt partitioned functions.

gcc/ChangeLog:

* cgraph.h (cgraph_node::can_be_discarded_p): Do not
return true on functions from other partition.

gcc/lto/ChangeLog:

PR ipa/103070
PR ipa/103058
* lto-partition.c (must_not_rename): Update comment.
(promote_symbol): Set resolution to LDPR_PREVAILING_DEF_IRONLY.

Fortran: error recovery on rank mismatch of array and its initializer

gcc/fortran/ChangeLog:

PR fortran/102715
* decl.c (add_init_expr_to_sym): Reject rank mismatch between
array and its initializer.

gcc/testsuite/ChangeLog:

PR fortran/102715
* gfortran.dg/pr68019.f90: Adjust error message.
* gfortran.dg/pr102715.f90: New test.

powerpc: Fix vsx_splat_v4si in 32 bit mode

Tamar's recent patch to teach CSE to perform vector extract exercises
VSX splat more frequently, which exposed a constraint error for the
vsx_splat patterns. The pattern could be created for Power9, but
the "we" constraint only provided alternatives in 64 bit mode. The
instructions are valid in 32 bit mode and SImode is allowed in VSX
registers. This patch updates the constraints from "we" to "wa" to
allow the pattern and fix the failing testcases.

gcc/ChangeLog:

* config/rs6000/vsx.md (vsx_splat_v4si): Change constraints to "wa".
(vsx_splat_v4si_di): Change constraint to "wa".

path oracle: Do not look at root oracle for killed defs.

The problem here is that we are incorrectly threading 41->20->21 here:

  <bb 35> [local count: 56063504182]:
  _134 = M.10_120 + 1;
  if (_71 <= _134)
    goto <bb 19>; [11.00%]
  else
    goto <bb 41>; [89.00%]
...
...
...
  <bb 41> [local count: 49896518755]:

  <bb 20> [local count: 56063503181]:
  # lb_75 = PHI <_134(41), 1(18)>
  _117 = mstep_49 + lb_75;
  _118 = _117 + -1;
  _119 = mstep_49 + _118;
  M.10_120 = MIN_EXPR <_119, _71>;
  if (lb_75 > M.10_120)
    goto <bb 21>; [11.00%]
  else
    goto <bb 22>; [89.00%]

First, lb_75 == _134 because of the PHI.
Second, _134 > M.10_120 because of _134 = M.10_120 + 1.

We then assume that lb_75 > M.10_120, but this is incorrect because
M.10_120 was killed along the path.

This incorrect thread causes the miscompilation in 527.cam4_r.

Tested on x86-64 and ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103061
* value-relation.cc (path_oracle::path_oracle): Initialize
m_killed_defs.
(path_oracle::killing_def): Set m_killed_defs.
(path_oracle::query_relation): Do not look at the root oracle for
killed defs.
* value-relation.h (class path_oracle): Add m_killed_defs.

testsuite: Use posix_memalign on AIX for tsvc

AIX does not provide memalign, so the testcases must use
posix_memalign for portability on AIX.
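
A minimal sketch of the portable pattern, not the actual tsvc.h change: the wrapper name alloc_aligned is made up for the example. posix_memalign requires the alignment to be a power of two and a multiple of sizeof (void *), and it reports failure through its return value rather than by returning NULL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical wrapper: return SIZE bytes aligned to ALIGNMENT, or NULL
   on failure.  The result is released with free ().  */
void *
alloc_aligned (size_t alignment, size_t size)
{
  void *p = NULL;

  if (posix_memalign (&p, alignment, size) != 0)
    return NULL;
  return p;
}
```

A caller would use it as a drop-in for memalign (alignment, size) and release the memory with free ().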

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/tsvc.h (init): Use posix_memalign on AIX.

Cleanup back_threader::find_paths_to_names.

The main path discovery function was due for a cleanup.  First,
there's a nagging goto and second, my bitmap use was sloppy.  Hopefully
this makes the code easier for others to read.

Regstrapped on x86-64 Linux.  I also made sure there were no differences
in the number of threads with this patch.

No functional changes.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
Remove gotos and other cleanups.

Daily bump.

Fortran: fix simplification of array-valued parameter expressions

gcc/fortran/ChangeLog:

PR fortran/102817
* expr.c (simplify_parameter_variable): Copy shape of referenced
subobject when simplifying.

gcc/testsuite/ChangeLog:

PR fortran/102817
* gfortran.dg/pr102817.f90: New test.

Fix ice in insert_access

gcc/ChangeLog:

PR ipa/103073
* ipa-modref-tree.h (modref_tree::insert): Do nothing for
paradoxical and zero sized accesses.

gcc/testsuite/ChangeLog:

PR ipa/103073
* g++.dg/torture/pr103073.C: New test.
* gcc.dg/tree-ssa/modref-11.c: New test.

Avoid left shift of negative value in ipa-modref-tree.h

gcc/ChangeLog:

PR ipa/103082
* ipa-modref-tree.h (struct modref_access_node): Avoid left shift
of negative value

Fortran: a symbol in a COMMON cannot be a coarray

gcc/fortran/ChangeLog:

PR fortran/69419
* match.c (gfc_match_common): Check array spec of a symbol in a
COMMON object list and reject it if it is a coarray.

gcc/testsuite/ChangeLog:

PR fortran/69419
* gfortran.dg/pr69419.f90: New test.

libstdc++: Fix inconsistent noexcept-specifier for valarray begin/end

These declarations should be noexcept after I added it to the
definitions in <valarray>.

libstdc++-v3/ChangeLog:

* include/bits/range_access.h (begin(valarray), end(valarray)):
Add noexcept.

libstdc++: Fix pack expansions in tuple_size_v specializations

libstdc++-v3/ChangeLog:

* include/std/tuple (tuple_size_v): Fix pack expansion.

Fortran: Missing error with IMPLICIT none (external) [PR100972]

gcc/fortran/ChangeLog:

PR fortran/100972
* decl.c (gfc_match_implicit_none): Fix typo in warning.
* resolve.c (resolve_unknown_f): Reject external procedures
without explicit EXTERNAL attribute when IMPLICIT none (external)
is in effect.

gcc/testsuite/ChangeLog:

PR fortran/100972
* gfortran.dg/implicit_14.f90: Adjust error.
* gfortran.dg/external_implicit_none_3.f08: New test.

Fortran: Delete unused decl in gfortran.h

gcc/fortran/ChangeLog:

* decl.c (gfc_insert_kind_parameter_exprs): Make static.
* expr.c (gfc_build_init_expr): Make static.
(gfc_build_default_init_expr): Move below its static helper.
* gfortran.h (gfc_insert_kind_parameter_exprs, gfc_add_saved_common,
gfc_add_common, gfc_use_derived_tree, gfc_free_charlen,
gfc_get_ultimate_derived_super_type,
gfc_resolve_oacc_parallel_loop_blocks, gfc_build_init_expr,
gfc_iso_c_sub_interface): Delete.
* symbol.c (gfc_new_charlen, gfc_get_derived_super_type): Make
static.

Fortran: Add more documentation for mixed-language programming [PR35276]

2021-11-05 Sandra Loosemore <sandra@codesourcery.com>

PR fortran/35276

gcc/fortran/
* gfortran.texi (Mixed-Language Programming): Talk about C++,
and how to link.

testsuite, Darwin : Fix tsvc test build on Darwin.

Currently all the tsvc tests fail to build on Darwin because
they assume that <malloc.h> and memalign() are available.

For Darwin, <stdlib.h> is sufficient to obtain the declarations
for malloc and the port has posix_memalign () but not memalign.

Fixed as below.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/tsvc.h: Do not try to include malloc.h
on Darwin; also use posix_memalign ().

Darwin : Make trampoline templates linker-visible.

For aarch64, the alignment of the LTRAMPn symbols matters.

Actually, the LTRAMPn symbols _are_ 8 byte aligned, but because
they are Local, the linker doesn't know that this guarantee can be met.
It assumes that they are not necessarily more aligned than the
containing section (ld64 atoms strike again).

The fix is to publish the trampoline symbol for the linker to access
directly - it can then see that the atom is suitably aligned.

Fixes issue #11 on the development branch.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config/darwin.h (ASM_GENERATE_INTERNAL_LABEL): Add LTRAMP
to the list of symbol prefixes that must be made linker-
visible.

Darwin, aarch64 : Ada fixes for hosted tools.

This will allow someone (with an existing Ada compiler on the
platform - which can be provided by the experimental aarch64-darwin
branch) - to build the host tools (gnatmake and friends) for a
non-native cross.

The existing provisions for iOS are OK for cross-compilation from
an x86-64-darwin platform, but we need some adjustments so that these
host tools can be built to run on aarch64-darwin.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ada/

* gcc-interface/Make-lang.in: Use iOS signal trampoline code
for hosted Ada tools.
* sigtramp-ios.c: Wrap the declarations in extern "C" when
the code is built by a C++ compiler.

Darwin, aarch64 : Initial support for the self-host driver.

At present, there is no special action needed for aarch64-darwin;
this just pulls in generic Darwin code.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config.host: Add support for aarch64-*-darwin.
* config/aarch64/host-aarch64-darwin.c: New file.
* config/aarch64/x-darwin: New file.

Darwin, crts: Fix a build warning.

We have a shim crt for Darwin10 that implements functionality
missing in libSystem. Provide this with a prototype to silence the
warning about this.

libgcc/ChangeLog:

* config/darwin10-unwind-find-enc-func.c: Include libgcc_tm.h.
* config/i386/darwin-lib.h: Declare Darwin10 crt function.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

libstdc++: Add [[unlikely]] attributes to std::random_device routines

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (__x86_rdrand, __x86_rdseed): Add
[[unlikely]] attribute.

libstdc++: Add support for POWER9 DARN instruction to std::random_device

The ISA-3.0 instruction set includes DARN ("deliver a random number")
which can be used similarly to the existing support for RDRAND and RDSEED.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc [__powerpc__] (USE_DARN): Define.
(__ppc_darn): New function to use POWER9 DARN instruction.
(Which): Add 'darn' enumerator.
(which_source): Check for __ppc_darn.
(random_device::_M_init): Support "darn" and "hw" tokens.
(random_device::_M_getentropy): Add darn to switch.
* testsuite/26_numerics/random/random_device/cons/token.cc:
Check "darn" token.
* testsuite/26_numerics/random/random_device/entropy.cc:
Likewise.

libsanitizer: update LOCAL_PATCHES.

libsanitizer: Apply local patches

libsanitizer: merge from master (78d3e0a4f1406b17cdecc77540e09210670fe9a9).

Remove def chain import assert from GORI.

When the IL has changed, any new ssa-names import calculations may not jibe
with existing ssa-names, so just remove the assert.

gcc/
PR tree-optimization/103093
* gimple-range-gori.cc (range_def_chain::get_imports): Remove assert.

gcc/testsuite/
* gcc.dg/pr103093.c: New.

Abstract ranger cache update list.

Make it more efficient by removing the call to vec::contains.
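
The general idea of avoiding a linear membership scan can be sketched as below. This is an assumption about the technique, not the actual update_list class: a worklist keeps a per-block flag next to the pending entries, so checking whether a block is already queued is a constant-time lookup. All names and the MAX_BLOCKS limit are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64		/* Hypothetical upper bound on block indices.  */

struct worklist
{
  int items[MAX_BLOCKS];	/* Pending block indices, used as a stack.  */
  size_t count;			/* Number of pending entries.  */
  bool queued[MAX_BLOCKS];	/* queued[bb] is true while bb is pending.  */
};

void
worklist_init (struct worklist *w)
{
  w->count = 0;
  for (size_t i = 0; i < MAX_BLOCKS; i++)
    w->queued[i] = false;
}

/* Add BB unless it is out of range or already pending.  Returns true if
   it was added.  No scan over items[] is needed.  */
bool
worklist_add (struct worklist *w, int bb)
{
  if (bb < 0 || bb >= MAX_BLOCKS || w->queued[bb])
    return false;
  w->queued[bb] = true;
  w->items[w->count++] = bb;
  return true;
}

/* Remove and return the most recently added block, or -1 if empty.  */
int
worklist_pop (struct worklist *w)
{
  if (w->count == 0)
    return -1;
  int bb = w->items[--w->count];
  w->queued[bb] = false;
  return bb;
}
```

Because each index can be pending at most once, count never exceeds MAX_BLOCKS.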

PR tree-optimization/102943
* gimple-range-cache.cc (class update_list): New.
(update_list::add): Replace add_to_update.
(update_list::pop): New.
(ranger_cache::ranger_cache): Adjust.
(ranger_cache::~ranger_cache): Adjust.
(ranger_cache::add_to_update): Delete.
(ranger_cache::propagate_cache): Adjust to new class.
(ranger_cache::propagate_updated_value): Ditto.
(ranger_cache::fill_block_cache): Ditto.
* gimple-range-cache.h (class ranger_cache): Adjust to update class.

Amend split vector loop analysis into main and epilogue analysis

I forgot to commit the changes made in response to Richard's review
before committing.

2021-11-05 Richard Biener <rguenther@suse.de>

* tree-vect-loop.c (vect_analyze_loop): Remove obsolete
comment and expand on another one. Combine nested if.

Support TI mode and soft float on PA64

This change implements TI mode on PA64.  Various new patterns are
added to pa.md.  The libgcc build needed modification to build both
DI and TI routines.  We also need various softfp routines to
convert to and from TImode.

I added full softfp for the -msoft-float option.  At the moment,
this doesn't completely eliminate all use of the floating-point
co-processor.  For this, libgcc needs to be built with -msoft-mult.
The floating-point exception support also needs a soft option.

2021-11-05  John David Anglin  <danglin@gcc.gnu.org>

PR libgomp/96661

gcc/ChangeLog:

* config/pa/pa-modes.def: Add OImode integer type.
* config/pa/pa.c (pa_scalar_mode_supported_p): Allow TImode
for TARGET_64BIT.
* config/pa/pa.h (MIN_UNITS_PER_WORD): Define to UNITS_PER_WORD
if IN_LIBGCC2.
* config/pa/pa.md (addti3, addvti3, subti3, subvti3, negti2,
negvti2, ashlti3, shrpd_internal): New patterns.
Change some multi instruction types to multi.

libgcc/ChangeLog:

* config.host (hppa*64*-*-linux*): Revise tmake_file.
(hppa*64*-*-hpux11*): Likewise.
* config/pa/sfp-exceptions.c: New.
* config/pa/sfp-machine.h: New.
* config/pa/t-dimode: New.
* config/pa/t-softfp-sfdftf: New.

x86: Make stringop_algs::stringop_strategy ctor constexpr [PR100246]

> Several older compilers fail to build modern GCC because of missing
> or incomplete C++11 support.
>
>       * config/i386/i386.h (struct stringop_algs): Define a CTOR for
>       this type.

Unfortunately, as mentioned in my
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583289.html
mail, without the new dyninit pass this causes dynamic initialization of
many variables, 6.5KB _GLOBAL__sub_I_* on x86_64 and 12.5KB on i686.

The following patch makes the ctor constexpr so that already the FE
is able to statically initialize all those.

I have tested on godbolt a reduced testcase without a constructor,
with a constructor and with a constexpr constructor:
- clang before 3.3 is unhappy about all 3 cases;
- clang 3.3 and 3.4 are ok with the ctor and the constexpr ctor cases
   and optimize them into static initialization;
- clang 3.5+ is ok with all 3 versions and optimizes them;
- gcc 4.8 and 5+ are ok with all 3 versions, and the no ctor and
   constexpr ctor cases are optimized;
- gcc 4.9 is unhappy about the no ctor case and happy with the
   other two.

2021-11-05  Jakub Jelinek  <jakub@redhat.com>

PR bootstrap/100246
* config/i386/i386.h
(stringop_algs::stringop_strategy::stringop_strategy): Make the ctor
constexpr.

contrib: testsuite-management: Update to be python3 compatible

contrib/ChangeLog:

* testsuite-management/validate_failures.py: 2to3

AArch64: Fix PR103085

The stack protector implementation hides symbols in a const unspec, which means
movdi/movsi patterns must always support const on symbol operands and
explicitly strip away the unspec. Do this for the recently added GOT
alternatives. Add a test to ensure stack-protector tests GOT accesses as well.

2021-11-05 Wilco Dijkstra <wdijkstr@arm.com>

PR target/103085
* config/aarch64/aarch64.c (aarch64_mov_operand_p): Strip the salt
first.
* config/aarch64/constraints.md: Support const in Usw.

gcc/testsuite/
PR target/103085
* gcc.target/aarch64/pr103085.c: New test

Move PREFERRED_DEBUGGING_TYPE define in pa64-hpux.h to pa.h

This fixes D language build on hppa64-hpux11.

2021-11-05 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.h (PREFERRED_DEBUGGING_TYPE): Define to DWARF2_DEBUG.
* config/pa/pa64-hpux.h (PREFERRED_DEBUGGING_TYPE): Remove define.

gcov-profile: Filter test only for some targets [PR102945]

PR gcov-profile/102945

gcc/testsuite/ChangeLog:

* gcc.dg/gcov-info-to-gcda.c: Filter supported targets.

Split vector loop analysis into main and epilogue analysis

As discussed this splits the analysis loop into two, first settling
on a vector mode used for the main loop and only then analyzing
the epilogue of that for possible vectorization.  That makes it
easier to put in support for unrolled main loops.

On the way I've realized some cleanup opportunities, namely caching
n_stmts in vec_info_shared (it's computed by dataref analysis),
which avoids passing it around, and no longer setting/clearing loop->aux
during analysis - try_vectorize_loop_1 will ultimately set it
on those we vectorize.

This also gets rid of the previously introduced callback in
vect_analyze_loop_1 in favor of making that advance the mode iterator.
I'm now pushing VOIDmode explicitly into the vector_modes array
which makes the re-start on the epilogue side a bit more
straight-forward.  Note that will now use auto-detection of the
vector mode in case the main loop used it and we want to try
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P and the first mode from
the target array if not.  I've added a comment that says we may
want to make sure we don't try vectorizing the epilogue with a
bigger vector size than the main loop but the situation isn't
very likely to appear in practice I guess (and it was also present
before this change).

In principle this change should not change vectorization decisions
but the way we handled re-analyzing epilogues as main loops makes
me only 99% sure that it doesn't.

2021-11-05  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vec_info_shared::n_stmts): Add.
(LOOP_VINFO_N_STMTS): Likewise.
(vec_info_for_bb): Remove unused function.
* tree-vectorizer.c (vec_info_shared::vec_info_shared):
Initialize n_stmts member.
* tree-vect-loop.c: Remove INCLUDE_FUNCTIONAL.
(vect_create_loop_vinfo): Do not set loop->aux.
(vect_analyze_loop_2): Do not get n_stmts as argument,
instead use LOOP_VINFO_N_STMTS.  Set LOOP_VINFO_VECTORIZABLE_P
here.
(vect_analyze_loop_1): Remove callback, get the mode iterator
and autodetected_vector_mode as argument, advancing the
iterator and initializing autodetected_vector_mode here.
(vect_analyze_loop): Split analysis loop into two, first
processing main loops only and then epilogues.

ipa: Do not require RECORD_TYPE for ancestor jump functions

The check this patch removes has remained from times when ancestor
jump functions were only used for devirtualization and also
contained BINFOs.  It is not necessary now and should have been
removed a long time ago.

gcc/ChangeLog:

2021-11-04 Martin Jambor <mjambor@suse.cz>

* ipa-prop.c (compute_complex_assign_jump_func): Remove
unnecessary check for RECORD_TYPE.

libstdc++: Add xfail to pretty printer tests that fail in C++20

For some reason the type printer for std::string doesn't work in C++20
mode, so std::basic_string<char, char_traits<char>, allocator<char>> is
printed out in full rather than being shown as std::string. It's
probably related to the fact that the extern template declarations are
disabled for C++20, but I don't know why that affects GDB.

For now I'm just marking the relevant tests as XFAIL. That requires
adding support for target selectors to individual GDB directives such as
note-test and whatis-regexp-test.

libstdc++-v3/ChangeLog:

* testsuite/lib/gdb-test.exp: Add target selector support to the
dg-final directives.
* testsuite/libstdc++-prettyprinters/80276.cc: Add xfail for
C++20.
* testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.
* testsuite/libstdc++-prettyprinters/prettyprinters.exp: Tweak
comment.

include: Allow for our md5.h to defer to the system header

This came up in the context of libsanitizer, where platform-specific
support for FreeBSD relies on aspects provided by FreeBSD's own md5.h.

Address this by allowing GCC's md5.h to pull in the system header
instead, controlled by a new macro USE_SYSTEM_MD5.

2021-11-05 Gerald Pfeifer <gerald@pfeifer.com>
Jakub Jelinek <jakub@redhat.com>

include/
* md5.h (USE_SYSTEM_MD5): Introduce.

doc: No longer generate old.html

Commit 431d26e1dd18c1146d3d4dcd3b45a3b04f7f7d59 removed
doc/install-old.texi, alas we still tried to generate the
associated web page old.html - which then turned out empty.

Simply remove this from the list of pages to be generated.

gcc:
* doc/install.texi2html: Do not generate old.html any longer.

Reset when -gtoggle is used in gcc_options.

PR debug/102955

gcc/ChangeLog:

* opts.c (finish_options): Reset flag_gtoggle when it is used.

gcc/testsuite/ChangeLog:

* g++.dg/pr102955.C: New test.

dwarf2out: Fix up CONST_WIDE_INT handling once more [PR103046]

My last change to CONST_WIDE_INT handling in add_const_value_attribute broke
handling of CONST_WIDE_INT constants like ((__uint128_t) 1 << 120).
wi::min_precision (w1, UNSIGNED) is in that case 121, but wide_int::from
creates a wide_int that has 0 and 0xff00000000000000ULL in its elts and
precision 121.  When we output that, we output both elements and thus emit
0, 0xff00000000000000 instead of the desired 0, 0x0100000000000000.
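
To make the expected element values concrete, the sketch below splits a 128-bit value into its low and high 64-bit halves using the compiler's unsigned __int128 extension (available in GCC on 64-bit targets). The helper names are made up for the example; they only mirror the "low element, high element" order used in the text above.

```c
#include <stdint.h>

typedef unsigned __int128 U;

/* Low 64 bits of a 128-bit value.  */
uint64_t
low_elt (U v)
{
  return (uint64_t) v;
}

/* High 64 bits of a 128-bit value.  */
uint64_t
high_elt (U v)
{
  return (uint64_t) (v >> 64);
}
```

For (U) 1 << 120, bit 120 of the full value is bit 56 of the high half, which is where the desired 0, 0x0100000000000000 pair comes from.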

IMHO we should actually pass machine_mode to add_const_value_attribute from
callers, so that we know exactly what precision we want.  Because
hypothetically, if say mode is OImode and the CONST_WIDE_INT value fits into
128 bits or 192 bits, we'd emit just those 128 or 192 bits but debug info
users would expect 256 bits.

On
typedef unsigned __int128 U;

int
main ()
{
  U a = (U) 1 << 120;
  U b = 0xffffffffffffffffULL;
  U c = ((U) 0xffffffff00000000ULL) << 64;
  return 0;
}
vanilla gcc incorrectly emits 0, 0xff00000000000000 for a,
0xffffffffffffffff alone (DW_FORM_data8) for b and 0, 0xffffffff00000000
for c.  gcc with the previously posted PR103046 patch emits
0, 0x0100000000000000 for a, 0xffffffffffffffff alone for b and
0, 0xffffffff00000000 for c.  And with this patch we emit
0, 0x0100000000000000 for a, 0xffffffffffffffff, 0 for b and
0, 0xffffffff00000000 for c.
So, the patch below certainly causes larger debug info (well, 128-bit
integers are pretty rare), but in this case the question is if it isn't
more correct, as debug info consumers generally will not know if they
should sign or zero extend the value in DW_AT_const_value.
The previous code assumes they will always zero extend it...

2021-11-05  Jakub Jelinek  <jakub@redhat.com>

PR debug/103046
* dwarf2out.c (add_const_value_attribute): Add MODE argument, use it
in CONST_WIDE_INT handling.  Adjust recursive calls.
(add_location_or_const_value_attribute): Pass DECL_MODE (decl) to
new add_const_value_attribute argument.
(tree_add_const_value_attribute): Pass TYPE_MODE (type) to new
add_const_value_attribute argument.

gcc: vx-common.h: fix test for VxWorks7

The macro TARGET_VXWORKS7 is always defined (see vxworks-dummy.h).
Thus we need to test its value, not its definedness.

Fixes aca124df (define NO_DOT_IN_LABEL only in vxworks6).

gcc/ChangeLog:

* config/vx-common.h: Test value of TARGET_VXWORKS7 rather
than definedness.

First refactor of vect_analyze_loop

This refactors the main loop analysis part in vect_analyze_loop,
re-purposing the existing vect_reanalyze_as_main_loop for this
to reduce code duplication.  Failure flow is a bit tricky since
we want to extract info from the analyzed loop but I wanted to
share the destruction part.  Thus I add some std::function and
lambda to funnel post-analysis for the case we want that
(when analyzing from the main iteration but not when re-analyzing
an epilogue as main).

In addition I split vect_analyze_loop_form into analysis and
vinfo creation so we can do the analysis only once, simplifying
the new vect_analyze_loop_1.

As discussed we probably want to change the loop over vector
modes to first only analyze things as the main loop, picking
the best (or simd VF) mode for the main loop and then analyze
for a vectorized epilogue.  The unroll would then integrate
with the main loop vectorization.  I think that currently
we may fail to analyze the epilogue with the same mode as
the main loop when using partial vectors since we increment
mode_i before doing that.

2021-11-04  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (struct vect_loop_form_info): New.
(vect_analyze_loop_form): Adjust.
(vect_create_loop_vinfo): New.
* tree-parloops.c (gather_scalar_reductions): Adjust for
vect_analyze_loop_form API change.
* tree-vect-loop.c: Include <functional>.
(vect_analyze_loop_form_1): Rename to vect_analyze_loop_form,
take struct vect_loop_form_info as output parameter and adjust.
(vect_analyze_loop_form): Rename to vect_create_loop_vinfo and
split out call to the original vect_analyze_loop_form_1.
(vect_reanalyze_as_main_loop): Rename to...
(vect_analyze_loop_1): ... this, factor out the call to
vect_analyze_loop_form and generalize to be able to use it twice ...
(vect_analyze_loop): ... here.  Perform vect_analyze_loop_form
once only and here.

rs6000: Fix incorrect fusion constraint [PR102991]

gcc/ChangeLog:

2021-11-05 Xionghu Luo <luoxhu@linux.ibm.com>

PR target/102991
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/genfusion.pl: Fix incorrect clobber constraint.

Daily bump.

libstdc++: Fix pretty printing of std::unique_ptr [PR103086]

Since std::tuple started using [[no_unique_address]] the tuple<T*, D>
member of std::unique_ptr<T, D> has two _M_head_impl subobjects, in
different base classes. That means this printer code is ambiguous:

    tuple_head_type = tuple_impl_type.fields()[1].type   # _Head_base
    head_field = tuple_head_type.fields()[0]
    if head_field.name == '_M_head_impl':
        self.pointer = tuple_member['_M_head_impl']

In older versions of GDB it happened to work by chance, because GDB
returned the last _M_head_impl member and std::tuple's base classes are
stored in reverse order, so the last one was the T* element of the
tuple. Since GDB 11 it returns the first _M_head_impl, which is the
deleter element.

The fix is for the printer to stop using an ambiguous field name and
cast the tuple to the correct base class before accessing the
_M_head_impl member.
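
The ambiguity can be illustrated without GDB.  The sketch below is a toy
model only: Obj, all_members and cast are hypothetical stand-ins for a
debugger value, and the type names are simplified.  The base-class order
follows the description above, with the tail of the tuple stored before the
head element.

```python
class Obj:
    """Toy stand-in for a debugger value: own fields plus base subobjects."""

    def __init__(self, type_name, fields=None, bases=()):
        self.type_name = type_name
        self.fields = dict(fields or {})
        self.bases = list(bases)

    def all_members(self, name):
        """Every member called `name`, walking base classes first, in order."""
        found = []
        for base in self.bases:
            found.extend(base.all_members(name))
        if name in self.fields:
            found.append(self.fields[name])
        return found

    def cast(self, type_name):
        """Return this object or the base subobject with the given type name."""
        if self.type_name == type_name:
            return self
        for base in self.bases:
            hit = base.cast(type_name)
            if hit is not None:
                return hit
        return None


# tuple<T*, D>: two subobjects both own a member named _M_head_impl.
head0 = Obj("_Head_base<0, T*>", {"_M_head_impl": "pointer"})
head1 = Obj("_Head_base<1, D>", {"_M_head_impl": "deleter"})
impl1 = Obj("_Tuple_impl<1, D>", bases=[head1])
impl0 = Obj("_Tuple_impl<0, T*, D>", bases=[impl1, head0])

candidates = impl0.all_members("_M_head_impl")
print("first match:", candidates[0])    # what a first-match lookup would pick
print("last match: ", candidates[-1])   # what a last-match lookup would pick
print("after cast: ", impl0.cast("_Head_base<0, T*>").fields["_M_head_impl"])
```

Casting to the specific base first removes the dependence on lookup order,
which is the approach the fix takes.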

Instead of fixing this in both UniquePointerPrinter and StdPathPrinter a
new unique_ptr_get function is defined to do it correctly. That is
defined in terms of new tuple_get and _tuple_impl_get functions.

It would be possible to reuse _tuple_impl_get to access each element in
StdTuplePrinter._iterator.__next__, but that already does the correct
casting, and wouldn't be much simpler anyway.

libstdc++-v3/ChangeLog:

PR libstdc++/103086
* python/libstdcxx/v6/printers.py (_tuple_impl_get): New helper
for accessing the tuple element stored in a _Tuple_impl node.
(tuple_get): New function for accessing a tuple element.
(unique_ptr_get): New function for accessing a unique_ptr.
(UniquePointerPrinter, StdPathPrinter): Use unique_ptr_get.
* python/libstdcxx/v6/xmethods.py (UniquePtrGetWorker): Cast
tuple to its base class before accessing _M_head_impl.

libstdc++: Deprecate std::unexpected and handler functions

These functions have been deprecated since C++11, and were removed in
C++17. The proposal P0323 wants to reuse the name std::unexpected for a
class template, so we will need to stop defining the current function
for C++23 anyway.

This marks them as deprecated for C++11 and up, to warn users they won't
continue to be available. It disables them for C++17 and up, unless the
_GLIBCXX_USE_DEPRECATED macro is defined.

The <unwind-cxx.h> header uses std::unexpected_handler in the public
API, but since that type is the same as std::terminate_handler we can
just use that instead, to avoid warnings about it being deprecated.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* libsupc++/exception (unexpected_handler, unexpected)
(get_unexpected, set_unexpected): Add deprecated attribute.
Do not define without _GLIBCXX_USE_DEPRECATED for C++17 and up.
* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Disable
deprecated warnings.
* libsupc++/eh_ptr.cc (std::rethrow_exception): Likewise.
* libsupc++/eh_terminate.cc: Likewise.
* libsupc++/eh_throw.cc (__cxa_init_primary_exception):
Likewise.
* libsupc++/unwind-cxx.h (struct __cxa_exception): Use
terminate_handler instead of unexpected_handler.
(struct __cxa_dependent_exception): Likewise.
(__unexpected): Likewise.
* testsuite/18_support/headers/exception/synopsis.cc: Add
dg-warning for deprecated warning.
* testsuite/18_support/exception_ptr/60612-unexpected.cc:
Disable deprecated warnings.
* testsuite/18_support/set_unexpected.cc: Likewise.
* testsuite/18_support/unexpected_handler.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-eh2.C: Add dg-warning for new
deprecation warnings.
* g++.dg/cpp0x/noexcept06.C: Likewise.
* g++.dg/cpp0x/noexcept07.C: Likewise.
* g++.dg/eh/forced3.C: Likewise.
* g++.dg/eh/unexpected1.C: Likewise.
* g++.old-deja/g++.eh/spec1.C: Likewise.
* g++.old-deja/g++.eh/spec2.C: Likewise.
* g++.old-deja/g++.eh/spec3.C: Likewise.
* g++.old-deja/g++.eh/spec4.C: Likewise.
* g++.old-deja/g++.mike/eh33.C: Likewise.
* g++.old-deja/g++.mike/eh34.C: Likewise.
* g++.old-deja/g++.mike/eh50.C: Likewise.
* g++.old-deja/g++.mike/eh51.C: Likewise.

IBM Z: Define STACK_CHECK_MOVING_SP

With -fstack-check the stack probes emitted access memory below the
stack pointer.

gcc/ChangeLog:

* config/s390/s390.h (STACK_CHECK_MOVING_SP): New macro
definition.

libstdc++: Consolidate duplicate metaprogramming utilities

Currently std::variant uses __index_of<T, Types...> to find the first
occurrence of a type in a pack, and __exactly_once<T, Types...> to check
that there is no other occurrence.

We can reuse the __find_uniq_type_in_pack<T, Types...>() function for
both tasks, and remove the recursive templates used to implement
__index_of and __exactly_once.
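
A rough Python model of how one helper can serve both purposes.  It assumes
that __find_uniq_type_in_pack yields the index of T when T occurs exactly
once in the pack and the pack size otherwise; the function names below are
illustrative, not the libstdc++ definitions.

```python
def find_uniq_type_in_pack(t, types):
    """Index of t in types if it occurs exactly once, else len(types)."""
    matches = [i for i, u in enumerate(types) if u == t]
    return matches[0] if len(matches) == 1 else len(types)


def exactly_once(t, types):
    """True when t names exactly one alternative of the pack."""
    return find_uniq_type_in_pack(t, types) < len(types)


def index_of(t, types):
    """Index of the unique alternative t; only meaningful if exactly_once holds."""
    return find_uniq_type_in_pack(t, types)


pack = ["int", "char", "double"]
print(exactly_once("char", pack), index_of("char", pack))
print(exactly_once("int", ["int", "long", "int"]))
print(exactly_once("float", pack))
```

An out-of-range result doubles as the "not found or ambiguous" signal, so no
separate recursive counting template is needed.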

libstdc++-v3/ChangeLog:

* include/bits/utility.h (__find_uniq_type_in_pack): Move
definition to here, ...
* include/std/tuple (__find_uniq_type_in_pack): ... from here.
* include/std/variant (__detail::__variant::__index_of): Remove.
(__detail::__variant::__exactly_once): Define using
__find_uniq_type_in_pack instead of __index_of.
(get<T>, get_if<T>, variant::__index_of): Likewise.

libstdc++: Optimize std::tuple_element and std::tuple_size_v

This reduces the number of class template instantiations needed for code
using tuples, by reusing _Nth_type in tuple_element and specializing
tuple_size_v for tuple, pair and array (and const-qualified versions of
them).

Also define the _Nth_type primary template as a complete type (but with
no nested 'type' member). This avoids "invalid use of incomplete type"
errors for out-of-range specializations of tuple_element. Those errors
would probably be confusing and unhelpful for users. We already have
a user-friendly static assert in tuple_element itself.

Also ensure that tuple_size_v is available whenever tuple_size is (as
proposed by LWG 3387). We already do that for tuple_element_t.

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (tuple_size_v): Define partial
specializations for std::pair.
* include/bits/utility.h (_Nth_type): Move definition here
and define primary template.
(tuple_size_v): Move definition here.
* include/std/array (tuple_size_v): Define partial
specializations for std::array.
* include/std/tuple (tuple_size_v): Move primary template to
<bits/utility.h>. Define partial specializations for
std::tuple.
(tuple_element): Change definition to use _Nth_type.
* include/std/variant (_Nth_type): Move to <bits/utility.h>.
(variant_alternative, variant): Adjust qualification of
_Nth_type.
* testsuite/20_util/tuple/element_access/get_neg.cc: Prune
additional errors from _Nth_type.

AArch64: Lower intrinsics shift to GIMPLE when possible.

This lowers shifts to GIMPLE when the C interpretation of the shift operation
matches that of AArch64.

In C shifting right by BITSIZE is undefined, but the behavior is defined in
AArch64.  Additionally, negative left shifts are undefined in C but are defined
for the register variant of the instruction (SSHL, USHL) as being right shifts.

Since we have a right shift by immediate, I rewrite those cases into right shifts.

So:

int64x1_t foo3 (int64x1_t a)
{
  return vshl_s64 (a, vdup_n_s64(-6));
}

produces:

foo3:
        sshr    d0, d0, 6
        ret

instead of:

foo3:
        mov     x0, -6
        fmov    d1, x0
        sshl    d0, d0, d1
        ret

This behavior isn't specifically mentioned for a left shift by immediate, but I
believe that is only the case because we do have a right shift by immediate but
not a right shift by register.  As such I do the same for left shift by immediate.
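
The register-variant behavior relied on above can be modeled in a few lines.
This is a sketch of one signed 64-bit lane, written from the description in
this message; it assumes the shift amount is taken as a signed value from the
low byte of the shift operand, and sshl and to_signed are illustrative names.

```python
BITS = 64


def to_signed(value, bits=BITS):
    """Wrap an integer to a signed two's complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sshl(a, shift, bits=BITS):
    """Model one signed lane of a shift-by-register (SSHL-style) operation."""
    shift = to_signed(shift, 8)        # assumed: only the low byte is used
    if shift >= 0:
        return to_signed(a << shift, bits)   # left shift, truncated to the lane
    return a >> -shift                 # negative amount: arithmetic right shift


# vshl_s64 (a, vdup_n_s64 (-6)) behaves like a right shift by 6.
for a in (1000, -1000, 63, -1):
    print(a, sshl(a, -6), a >> 6)
```

In this model a left shift by a negative constant and a right shift by the
negated constant agree, which is what allows the immediate form to be used.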

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
ashr_simd, lshr, lshr_simd.
* config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
* config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test.
* gcc.target/aarch64/signbit-2.c: New test.

middle-end: convert negate + right shift into compare greater.

This turns an inversion of the sign bit + arithmetic right shift into a
comparison with 0.

i.e.

void fun1(int32_t *x, int n)
{
    for (int i = 0; i < (n & -16); i++)
      x[i] = (-x[i]) >> 31;
}

now generates:

.L3:
        ldr     q0, [x0]
        cmgt    v0.4s, v0.4s, #0
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L3

instead of:

.L3:
        ldr     q0, [x0]
        neg     v0.4s, v0.4s
        sshr    v0.4s, v0.4s, 31
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L3
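
The equivalence behind the new pattern can be checked with a small Python
model of 32-bit signed arithmetic.  The function names are illustrative.
INT32_MIN is excluded because negating it overflows, which is undefined for
signed int in C.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def negate_then_shift(x):
    """(-x) >> 31 for a 32-bit signed x, excluding INT32_MIN."""
    assert INT32_MIN < x <= INT32_MAX
    return (-x) >> 31                  # Python's >> is an arithmetic shift


def compare_greater_mask(x):
    """Lane result of 'compare greater than zero': -1 (all ones) or 0."""
    return -1 if x > 0 else 0


samples = [INT32_MIN + 1, -5, -1, 0, 1, 7, INT32_MAX]
for x in samples:
    print(x, negate_then_shift(x), compare_greater_mask(x))
```

Negating a positive value sets the sign bit, and the arithmetic shift by 31
smears it across the lane, so both forms give all ones exactly when x > 0.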

gcc/ChangeLog:

* match.pd: New negate+shift pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: New test.
* gcc.dg/signbit-3.c: New test.
* gcc.dg/signbit-4.c: New test.
* gcc.dg/signbit-5.c: New test.
* gcc.dg/signbit-6.c: New test.
* gcc.target/aarch64/signbit-1.c: New test.

Treat undefined operands as varying in GORI.

If the LHS is UNDEFINED simply stop calculating. Treat op1 and op2
as VARYING if they are UNDEFINED.

PR tree-optimization/103079
gcc/
* gimple-range-gori.cc (gimple_range_calc_op1): Treat undefined as
varying.
(gimple_range_calc_op2): Ditto.

gcc/testsuite/
* gcc.dg/pr103079.c: New.

ipa-sra: Improve debug info for removed parameters (PR 93385)

In spring I added code eliminating any statements using parameters
removed by IPA passes (to fix PR 93385).  That patch fixed issues such
as divisions by zero that such code could perform, but it only reset
all affected debug bind statements.  This one updates them with
expressions which can allow the debugger to print the removed value -
see the added test-case for an example.

Even though I originally did not want to create DEBUG_EXPR_DECLs for
intermediate values, I ended up doing so, because otherwise the code
started creating statements like

   # DEBUG __aD.198693 => &MEM[(const struct _Alloc_nodeD.171110 *)D#195]._M_tD.184726->_M_implD.171154

which not only is a bit scary but also gimple-fold ICEs on
it. Therefore I decided they are probably quite necessary.

The patch simply notes each removed SSA name present in a debug
statement and then works from it backwards, looking if it can
reconstruct the expression it represents (which can fail if a
non-degenerate PHI node is in the way).  If it can, it populates two
hash maps with those expressions so that 1) removed assignments are
replaced with a debug bind defining a new intermediate DEBUG_EXPR_DECL
and 2) existing debug binds that refer to SSA names that are being
removed now refer to corresponding DEBUG_EXPR_DECLs.
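
The backward walk can be pictured with a toy model.  Everything below is
hypothetical and much simpler than the code in ipa-param-manipulation.c:
definitions are a plain dictionary, expressions are strings, and the
definitions are assumed to be acyclic.

```python
def reconstruct(name, defs, cache=None):
    """Rebuild an expression for SSA name `name`, or return None on failure.

    defs maps each name to one of:
      ("param", text)             a removed parameter, described by `text`
      ("assign", op, [operands])  an assignment combining other names
      ("phi", [arguments])        a PHI node
    Names missing from defs are treated as constants or still-live values.
    """
    if cache is None:
        cache = {}
    if name not in defs:
        return str(name)
    if name in cache:
        return cache[name]
    kind, *rest = defs[name]
    if kind == "param":
        result = rest[0]
    elif kind == "phi":
        args = set(rest[0])
        # Only a degenerate PHI (all arguments identical) can be looked through.
        result = reconstruct(args.pop(), defs, cache) if len(args) == 1 else None
    else:
        op, operands = rest
        parts = [reconstruct(o, defs, cache) for o in operands]
        result = None if None in parts else "(" + f" {op} ".join(parts) + ")"
    cache[name] = result
    return result


defs = {
    "x_1": ("param", "x"),
    "t_2": ("assign", "+", ["x_1", "1"]),
    "t_3": ("phi", ["t_2", "t_2"]),        # degenerate PHI
    "u_4": ("assign", "*", ["t_3", "2"]),
    "v_5": ("phi", ["t_2", "u_4"]),        # non-degenerate PHI
}
print(reconstruct("u_4", defs))
print(reconstruct("v_5", defs))
```

The cache plays the role of the hash maps: each removed name is resolved once
and every later debug bind that mentions it reuses the stored expression.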

If a removed parameter is passed to another function, the debugging
information still cannot describe its value there - see the xfailed
test in the testcase.  I sort of know what needs to be done but that
needs a little bit more of IPA infrastructure on top of this patch and
so I would like to get this patch reviewed first.

Bootstrapped and tested on x86_64-linux, i686-linux and (long time
ago) on aarch64-linux.  Also LTO-bootstrapped on x86_64-linux.

Perhaps it is good to go to trunk?

Thanks,

Martin

gcc/ChangeLog:

2021-03-29  Martin Jambor  <mjambor@suse.cz>

PR ipa/93385
* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
members remap_with_debug_expressions, m_dead_ssa_debug_equiv,
m_dead_stmt_debug_equiv and prepare_debug_expressions.  Added
parameter to mark_dead_statements.
* ipa-param-manipulation.c: Include tree-phinodes.h and cfgexpand.h.
(ipa_param_body_adjustments::mark_dead_statements): New parameter
debugstack, push into it all SSA names used in debug statements,
produce m_dead_ssa_debug_equiv mapping for the removed param.
(replace_with_mapped_expr): New function.
(ipa_param_body_adjustments::remap_with_debug_expressions): Likewise.
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
(ipa_param_body_adjustments::common_initialization): Gather and
process SSA which will be removed but are in debug statements. Simplify.
(ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
new members.
* tree-inline.c (remap_gimple_stmt): Create a debug bind when possible
when avoiding a copy of an unnecessary statement.  Remap removed SSA
names in existing debug statements.
(tree_function_versioning): Do not create DEBUG_EXPR_DECL for removed
parameters if we have already done so.

gcc/testsuite/ChangeLog:

2021-03-29  Martin Jambor  <mjambor@suse.cz>

PR ipa/93385
* gcc.dg/guality/ipa-sra-1.c: New test.

Fortran manual: Remove old docs for never-implemented extensions.

2021-11-01 Sandra Loosemore <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (Projects): Add bullet for helping with
incomplete standards compliance.
(Proposed Extensions): Delete section.

Fortran manual: Update miscellaneous references to old standard versions.

2021-11-01 Sandra Loosemore <sandra@codesourcery.com>

gcc/fortran/
* intrinsic.texi (Introduction to Intrinsics): Genericize
references to standard versions.
* invoke.texi (-fall-intrinsics): Likewise.
(-fmax-identifier-length=): Likewise.

Fortran manual: Update section on Interoperability with C

2021-11-01  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (Interoperability with C): Copy-editing.  Add
more index entries.
(Intrinsic Types): Likewise.
(Derived Types and struct): Likewise.
(Interoperable Global Variables): Likewise.
(Interoperable Subroutines and Functions): Likewise.
(Working with C Pointers): Likewise.
(Further Interoperability of Fortran with C): Likewise.  Rewrite
to reflect that this is now fully supported by gfortran.

Fortran manual: Revise introductory chapter.

Fix various bit-rot in the discussion of standards conformance, remove
material that is only of historical interest, copy-editing.  Also move
discussion of preprocessing out of the introductory chapter.

2021-11-01  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (About GNU Fortran): Consolidate material
formerly in other sections.  Copy-editing.
(Preprocessing and conditional compilation): Delete, moving
most material to invoke.texi.
(GNU Fortran and G77): Delete.
(Project Status): Delete.
(Standards): Update.
(Fortran 95 status): Mention conditional compilation here.
(Fortran 2003 status): Rewrite to mention the 1 missing feature
instead of all the ones implemented.
(Fortran 2008 status): Similarly for the 2 missing features.
(Fortran 2018 status): Rewrite to reflect completion of TS29113
feature support.
* invoke.texi (Preprocessing Options): Move material formerly
in introductory chapter here.

Fortran manual: Combine standard conformance docs in one place.

Discussion of conformance with various revisions of the
Fortran standard was split between two separate parts of the
manual. This patch moves it all to the introductory chapter.

2021-11-01 Sandra Loosemore <sandra@codesourcery.com>

gcc/fortran/
* gfortran.texi (Standards): Move discussion of specific
standard versions here....
(Fortran standards status): ...from here, and delete this node.

Workaround ICE in gimple_call_static_chain_flags

gcc/ChangeLog:

2021-11-04 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103058
* gimple.c (gimple_call_static_chain_flags): Handle case when
nested function does not bind locally.

c++: use range-for more

gcc/cp/ChangeLog:

* call.c (build_array_conv): Use range-for.
(build_complex_conv): Likewise.
* constexpr.c (clear_no_implicit_zero)
(reduced_constant_expression_p): Likewise.
* decl.c (cp_complete_array_type): Likewise.
* decl2.c (mark_vtable_entries): Likewise.
* pt.c (iterative_hash_template_arg):
(invalid_tparm_referent_p, unify)
(type_dependent_expression_p): Likewise.
* typeck.c (build_ptrmemfunc_access_expr): Likewise.

aarch64: Pass and return Neon vector-tuple types without a parallel

Neon vector-tuple types can be passed in registers on function call
and return - there is no need to generate a parallel rtx. This patch
adds cases to detect vector-tuple modes and generates an appropriate
register rtx.

This change greatly improves code generated when passing Neon vector-
tuple types between functions; many new test cases are added to
defend these improvements.

gcc/ChangeLog:

2021-10-07 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64.c (aarch64_function_value): Generate
a register rtx for Neon vector-tuple modes.
(aarch64_layout_arg): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New code
generation tests.

gcc/lower-subreg.c: Prevent decomposition if modes are not tieable

Preventing decomposition if modes are not tieable is necessary to
stop AArch64 partial Neon structure modes being treated as packed in
registers.

This is a necessary prerequisite for a future AArch64 PCS change to
maintain good code generation.

gcc/ChangeLog:

2021-10-14 Jonathan Wright <jonathan.wright@arm.com>

* lower-subreg.c (simple_move): Prevent decomposition if
modes are not tieable.

aarch64: Add machine modes for Neon vector-tuple types

Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable are:

1) Large integer modes are opaque and modifying one vector in the
    tuple requires a lot of inefficient set/get gymnastics. The
    result is a lot of superfluous move instructions.
2) Large integer modes do not map well to types that are tuples of
    64-bit vectors - we need additional zero-padding which again
    results in superfluous move instructions.
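
The padding in point 2 is simple arithmetic.  The sketch below assumes the
usual sizes of the large integer modes (OI 32 bytes, CI 48 bytes, XI 64
bytes) and that tuples of 2, 3 and 4 vectors used OI, CI and XI respectively;
padding_bytes is an illustrative helper, not a GCC function.

```python
# Assumed sizes in bytes of the large integer modes, keyed by tuple length.
INT_MODE_BYTES = {2: ("OI", 32), 3: ("CI", 48), 4: ("XI", 64)}


def padding_bytes(n_vectors, vector_bytes):
    """Return (mode name, unused bytes) when a tuple is held in an integer mode."""
    mode, mode_bytes = INT_MODE_BYTES[n_vectors]
    return mode, mode_bytes - n_vectors * vector_bytes


for n in (2, 3, 4):
    print(n, "x 64-bit: ", padding_bytes(n, 8))
    print(n, "x 128-bit:", padding_bytes(n, 16))
```

Tuples of 128-bit vectors fill the integer mode exactly, while tuples of
64-bit vectors leave half of it as padding.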

This patch adds new machine modes that better model the C-level Neon
vector-tuple types. The approach is somewhat similar to that already
used for SVE vector-tuple types.

All of the AArch64 backend patterns and builtins that manipulate Neon
vector tuples are updated to use the new machine modes. This has the
effect of significantly reducing the amount of boiler-plate code in
the arm_neon.h header.

While this patch increases the quality of code generated in many
instances, there is still room for significant improvement - which
will be attempted in subsequent patches.

gcc/ChangeLog:

2021-08-09  Jonathan Wright  <jonathan.wright@arm.com>
    Richard Sandiford  <richard.sandiford@arm.com>

* config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define.
(v2x4hi_UP): Likewise.
(v2x4hf_UP): Likewise.
(v2x4bf_UP): Likewise.
(v2x2si_UP): Likewise.
(v2x2sf_UP): Likewise.
(v2x1di_UP): Likewise.
(v2x1df_UP): Likewise.
(v2x16qi_UP): Likewise.
(v2x8hi_UP): Likewise.
(v2x8hf_UP): Likewise.
(v2x8bf_UP): Likewise.
(v2x4si_UP): Likewise.
(v2x4sf_UP): Likewise.
(v2x2di_UP): Likewise.
(v2x2df_UP): Likewise.
(v3x8qi_UP): Likewise.
(v3x4hi_UP): Likewise.
(v3x4hf_UP): Likewise.
(v3x4bf_UP): Likewise.
(v3x2si_UP): Likewise.
(v3x2sf_UP): Likewise.
(v3x1di_UP): Likewise.
(v3x1df_UP): Likewise.
(v3x16qi_UP): Likewise.
(v3x8hi_UP): Likewise.
(v3x8hf_UP): Likewise.
(v3x8bf_UP): Likewise.
(v3x4si_UP): Likewise.
(v3x4sf_UP): Likewise.
(v3x2di_UP): Likewise.
(v3x2df_UP): Likewise.
(v4x8qi_UP): Likewise.
(v4x4hi_UP): Likewise.
(v4x4hf_UP): Likewise.
(v4x4bf_UP): Likewise.
(v4x2si_UP): Likewise.
(v4x2sf_UP): Likewise.
(v4x1di_UP): Likewise.
(v4x1df_UP): Likewise.
(v4x16qi_UP): Likewise.
(v4x8hi_UP): Likewise.
(v4x8hf_UP): Likewise.
(v4x8bf_UP): Likewise.
(v4x4si_UP): Likewise.
(v4x4sf_UP): Likewise.
(v4x2di_UP): Likewise.
(v4x2df_UP): Likewise.
(TYPES_GETREGP): Delete.
(TYPES_SETREGP): Likewise.
(TYPES_LOADSTRUCT_U): Define.
(TYPES_LOADSTRUCT_P): Likewise.
(TYPES_LOADSTRUCT_LANE_U): Likewise.
(TYPES_LOADSTRUCT_LANE_P): Likewise.
(TYPES_STORE1P): Move for consistency.
(TYPES_STORESTRUCT_U): Define.
(TYPES_STORESTRUCT_P): Likewise.
(TYPES_STORESTRUCT_LANE_U): Likewise.
(TYPES_STORESTRUCT_LANE_P): Likewise.
(aarch64_simd_tuple_types): Define.
(aarch64_lookup_simd_builtin_type): Handle tuple type lookup.
(aarch64_init_simd_builtin_functions): Update frontend lookup
for builtin functions after handling arm_neon.h pragma.
(register_tuple_type): Manually set modes of single-integer
tuple types. Record tuple types.
* config/aarch64/aarch64-modes.def
(ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes.
(ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes.
(SVE_MODES): Give single-vector modes priority over vector-
tuple modes.
(VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to
be after all single-vector modes.
* config/aarch64/aarch64-simd-builtins.def: Update builtin
generator macros to reflect modifications to the backend
patterns.
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2<mode>):
Use vector-tuple mode iterator and rename to...
(aarch64_simd_ld2<vstruct_elt>): This.
(aarch64_simd_ld2r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld2r<vstruct_elt>): This.
(aarch64_vec_load_lanesoi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): This.
(vec_load_lanesoi<mode>): Use vector-tuple mode iterator and
rename to...
(vec_load_lanes<mode><vstruct_elt>): This.
(aarch64_simd_st2<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st2<vstruct_elt>): This.
(aarch64_vec_store_lanesoi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_store_lanes<mode>_lane<vstruct_elt>): This.
(vec_store_lanesoi<mode>): Use vector-tuple mode iterator and
rename to...
(vec_store_lanes<mode><vstruct_elt>): This.
(aarch64_simd_ld3<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld3<vstruct_elt>): This.
(aarch64_simd_ld3r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld3r<vstruct_elt>): This.
(aarch64_vec_load_lanesci_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_load_lanesci<mode>): This.
(aarch64_simd_st3<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st3<vstruct_elt>): This.
(aarch64_vec_store_lanesci_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_store_lanesci<mode>): This.
(aarch64_simd_ld4<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld4<vstruct_elt>): This.
(aarch64_simd_ld4r<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld4r<vstruct_elt>): This.
(aarch64_vec_load_lanesxi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_load_lanesxi<mode>): This.
(aarch64_simd_st4<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st4<vstruct_elt>): This.
(aarch64_vec_store_lanesxi_lane<mode>): Use vector-tuple mode
iterator and rename to...
(vec_store_lanesxi<mode>): This.
(mov<mode>): Define for Neon vector-tuple modes.
(aarch64_ld1x3<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_ld1x3<vstruct_elt>): This.
(aarch64_ld1_x3_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1_x3_<vstruct_elt>): This.
(aarch64_ld1x4<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_ld1x4<vstruct_elt>): This.
(aarch64_ld1_x4_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1_x4_<vstruct_elt>): This.
(aarch64_st1x2<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x2<vstruct_elt>): This.
(aarch64_st1_x2_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x2_<vstruct_elt>): This.
(aarch64_st1x3<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x3<vstruct_elt>): This.
(aarch64_st1_x3_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x3_<vstruct_elt>): This.
(aarch64_st1x4<VALLDIF:mode>): Use vector-tuple mode iterator
and rename to...
(aarch64_st1x4<vstruct_elt>): This.
(aarch64_st1_x4_<mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_st1_x4_<vstruct_elt>): This.
(*aarch64_mov<mode>): Define for vector-tuple modes.
(*aarch64_be_mov<mode>): Likewise.
(aarch64_ld<VSTRUCT:nregs>r<VALLDIF:mode>): Use vector-tuple
mode iterator and rename to...
(aarch64_ld<nregs>r<vstruct_elt>): This.
(aarch64_ld2<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld2<vstruct_elt>_dreg): This.
(aarch64_ld3<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld3<vstruct_elt>_dreg): This.
(aarch64_ld4<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_ld4<vstruct_elt>_dreg): This.
(aarch64_ld<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_ld<nregs><vstruct_elt>): This.
(aarch64_ld<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
iterator and rename to aarch64_ld<nregs><vstruct_elt>.
(aarch64_ld1x2<VQ:mode>): Delete.
(aarch64_ld1x2<VDC:mode>): Use vector-tuple mode iterator and
rename to...
(aarch64_ld1x2<vstruct_elt>): This.
(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
tuple mode iterator and rename to...
(aarch64_ld<nregs>_lane<vstruct_elt>): This.
(aarch64_get_dreg<VSTRUCT:mode><VDC:mode>): Delete.
(aarch64_get_qreg<VSTRUCT:mode><VQ:mode>): Likewise.
(aarch64_st2<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st2<vstruct_elt>_dreg): This.
(aarch64_st3<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st3<vstruct_elt>_dreg): This.
(aarch64_st4<mode>_dreg): Use vector-tuple mode iterator and
rename to...
(aarch64_st4<vstruct_elt>_dreg): This.
(aarch64_st<VSTRUCT:nregs><VDC:mode>): Use vector-tuple mode
iterator and rename to...
(aarch64_st<nregs><vstruct_elt>): This.
(aarch64_st<VSTRUCT:nregs><VQ:mode>): Use vector-tuple mode
iterator and rename to aarch64_st<nregs><vstruct_elt>.
(aarch64_st<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use vector-
tuple mode iterator and rename to...
(aarch64_st<nregs>_lane<vstruct_elt>): This.
(aarch64_set_qreg<VSTRUCT:mode><VQ:mode>): Delete.
(aarch64_simd_ld1<mode>_x2): Use vector-tuple mode iterator
and rename to...
(aarch64_simd_ld1<vstruct_elt>_x2): This.
* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
Refactor to include new vector-tuple modes.
(aarch64_classify_vector_mode): Add cases for new vector-
tuple modes.
(aarch64_advsimd_partial_struct_mode_p): Define.
(aarch64_advsimd_full_struct_mode_p): Likewise.
(aarch64_advsimd_vector_array_mode): Likewise.
(aarch64_sve_data_mode): Change location in file.
(aarch64_array_mode): Handle case of Neon vector-tuple modes.
(aarch64_hard_regno_nregs): Handle case of partial Neon
vector structures.
(aarch64_classify_address): Refactor to include handling of
Neon vector-tuple modes.
(aarch64_print_operand): Print "d" for "%R" for a partial
Neon vector structure.
(aarch64_expand_vec_perm_1): Use new vector-tuple mode.
(aarch64_modes_tieable_p): Prevent tying Neon partial struct
modes with scalar machine modes larger than 8 bytes.
(aarch64_can_change_mode_class): Don't allow changes between
partial and full Neon vector-structure modes.
* config/aarch64/arm_neon.h (vst2_lane_f16): Use updated
builtin and remove boiler-plate code for opaque mode.
(vst2_lane_f32): Likewise.
(vst2_lane_f64): Likewise.
(vst2_lane_p8): Likewise.
(vst2_lane_p16): Likewise.
(vst2_lane_p64): Likewise.
(vst2_lane_s8): Likewise.
(vst2_lane_s16): Likewise.
(vst2_lane_s32): Likewise.
(vst2_lane_s64): Likewise.
(vst2_lane_u8): Likewise.
(vst2_lane_u16): Likewise.
(vst2_lane_u32): Likewise.
(vst2_lane_u64): Likewise.
(vst2q_lane_f16): Likewise.
(vst2q_lane_f32): Likewise.
(vst2q_lane_f64): Likewise.
(vst2q_lane_p8): Likewise.
(vst2q_lane_p16): Likewise.
(vst2q_lane_p64): Likewise.
(vst2q_lane_s8): Likewise.
(vst2q_lane_s16): Likewise.
(vst2q_lane_s32): Likewise.
(vst2q_lane_s64): Likewise.
(vst2q_lane_u8): Likewise.
(vst2q_lane_u16): Likewise.
(vst2q_lane_u32): Likewise.
(vst2q_lane_u64): Likewise.
(vst3_lane_f16): Likewise.
(vst3_lane_f32): Likewise.
(vst3_lane_f64): Likewise.
(vst3_lane_p8): Likewise.
(vst3_lane_p16): Likewise.
(vst3_lane_p64): Likewise.
(vst3_lane_s8): Likewise.
(vst3_lane_s16): Likewise.
(vst3_lane_s32): Likewise.
(vst3_lane_s64): Likewise.
(vst3_lane_u8): Likewise.
(vst3_lane_u16): Likewise.
(vst3_lane_u32): Likewise.
(vst3_lane_u64): Likewise.
(vst3q_lane_f16): Likewise.
(vst3q_lane_f32): Likewise.
(vst3q_lane_f64): Likewise.
(vst3q_lane_p8): Likewise.
(vst3q_lane_p16): Likewise.
(vst3q_lane_p64): Likewise.
(vst3q_lane_s8): Likewise.
(vst3q_lane_s16): Likewise.
(vst3q_lane_s32): Likewise.
(vst3q_lane_s64): Likewise.
(vst3q_lane_u8): Likewise.
(vst3q_lane_u16): Likewise.
(vst3q_lane_u32): Likewise.
(vst3q_lane_u64): Likewise.
(vst4_lane_f16): Likewise.
(vst4_lane_f32): Likewise.
(vst4_lane_f64): Likewise.
(vst4_lane_p8): Likewise.
(vst4_lane_p16): Likewise.
(vst4_lane_p64): Likewise.
(vst4_lane_s8): Likewise.
(vst4_lane_s16): Likewise.
(vst4_lane_s32): Likewise.
(vst4_lane_s64): Likewise.
(vst4_lane_u8): Likewise.
(vst4_lane_u16): Likewise.
(vst4_lane_u32): Likewise.
(vst4_lane_u64): Likewise.
(vst4q_lane_f16): Likewise.
(vst4q_lane_f32): Likewise.
(vst4q_lane_f64): Likewise.
(vst4q_lane_p8): Likewise.
(vst4q_lane_p16): Likewise.
(vst4q_lane_p64): Likewise.
(vst4q_lane_s8): Likewise.
(vst4q_lane_s16): Likewise.
(vst4q_lane_s32): Likewise.
(vst4q_lane_s64): Likewise.
(vst4q_lane_u8): Likewise.
(vst4q_lane_u16): Likewise.
(vst4q_lane_u32): Likewise.
(vst4q_lane_u64): Likewise.
(vtbl3_s8): Likewise.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.
(vld1_u8_x3): Likewise.
(vld1_s8_x3): Likewise.
(vld1_u16_x3): Likewise.
(vld1_s16_x3): Likewise.
(vld1_u32_x3): Likewise.
(vld1_s32_x3): Likewise.
(vld1_u64_x3): Likewise.
(vld1_s64_x3): Likewise.
(vld1_f16_x3): Likewise.
(vld1_f32_x3): Likewise.
(vld1_f64_x3): Likewise.
(vld1_p8_x3): Likewise.
(vld1_p16_x3): Likewise.
(vld1_p64_x3): Likewise.
(vld1q_u8_x3): Likewise.
(vld1q_s8_x3): Likewise.
(vld1q_u16_x3): Likewise.
(vld1q_s16_x3): Likewise.
(vld1q_u32_x3): Likewise.
(vld1q_s32_x3): Likewise.
(vld1q_u64_x3): Likewise.
(vld1q_s64_x3): Likewise.
(vld1q_f16_x3): Likewise.
(vld1q_f32_x3): Likewise.
(vld1q_f64_x3): Likewise.
(vld1q_p8_x3): Likewise.
(vld1q_p16_x3): Likewise.
(vld1q_p64_x3): Likewise.
(vld1_u8_x2): Likewise.
(vld1_s8_x2): Likewise.
(vld1_u16_x2): Likewise.
(vld1_s16_x2): Likewise.
(vld1_u32_x2): Likewise.
(vld1_s32_x2): Likewise.
(vld1_u64_x2): Likewise.
(vld1_s64_x2): Likewise.
(vld1_f16_x2): Likewise.
(vld1_f32_x2): Likewise.
(vld1_f64_x2): Likewise.
(vld1_p8_x2): Likewise.
(vld1_p16_x2): Likewise.
(vld1_p64_x2): Likewise.
(vld1q_u8_x2): Likewise.
(vld1q_s8_x2): Likewise.
(vld1q_u16_x2): Likewise.
(vld1q_s16_x2): Likewise.
(vld1q_u32_x2): Likewise.
(vld1q_s32_x2): Likewise.
(vld1q_u64_x2): Likewise.
(vld1q_s64_x2): Likewise.
(vld1q_f16_x2): Likewise.
(vld1q_f32_x2): Likewise.
(vld1q_f64_x2): Likewise.
(vld1q_p8_x2): Likewise.
(vld1q_p16_x2): Likewise.
(vld1q_p64_x2): Likewise.
(vld1_s8_x4): Likewise.
(vld1q_s8_x4): Likewise.
(vld1_s16_x4): Likewise.
(vld1q_s16_x4): Likewise.
(vld1_s32_x4): Likewise.
(vld1q_s32_x4): Likewise.
(vld1_u8_x4): Likewise.
(vld1q_u8_x4): Likewise.
(vld1_u16_x4): Likewise.
(vld1q_u16_x4): Likewise.
(vld1_u32_x4): Likewise.
(vld1q_u32_x4): Likewise.
(vld1_f16_x4): Likewise.
(vld1q_f16_x4): Likewise.
(vld1_f32_x4): Likewise.
(vld1q_f32_x4): Likewise.
(vld1_p8_x4): Likewise.
(vld1q_p8_x4): Likewise.
(vld1_p16_x4): Likewise.
(vld1q_p16_x4): Likewise.
(vld1_s64_x4): Likewise.
(vld1_u64_x4): Likewise.
(vld1_p64_x4): Likewise.
(vld1q_s64_x4): Likewise.
(vld1q_u64_x4): Likewise.
(vld1q_p64_x4): Likewise.
(vld1_f64_x4): Likewise.
(vld1q_f64_x4): Likewise.
(vld2_s64): Likewise.
(vld2_u64): Likewise.
(vld2_f64): Likewise.
(vld2_s8): Likewise.
(vld2_p8): Likewise.
(vld2_p64): Likewise.
(vld2_s16): Likewise.
(vld2_p16): Likewise.
(vld2_s32): Likewise.
(vld2_u8): Likewise.
(vld2_u16): Likewise.
(vld2_u32): Likewise.
(vld2_f16): Likewise.
(vld2_f32): Likewise.
(vld2q_s8): Likewise.
(vld2q_p8): Likewise.
(vld2q_s16): Likewise.
(vld2q_p16): Likewise.
(vld2q_p64): Likewise.
(vld2q_s32): Likewise.
(vld2q_s64): Likewise.
(vld2q_u8): Likewise.
(vld2q_u16): Likewise.
(vld2q_u32): Likewise.
(vld2q_u64): Likewise.
(vld2q_f16): Likewise.
(vld2q_f32): Likewise.
(vld2q_f64): Likewise.
(vld3_s64): Likewise.
(vld3_u64): Likewise.
(vld3_f64): Likewise.
(vld3_s8): Likewise.
(vld3_p8): Likewise.
(vld3_s16): Likewise.
(vld3_p16): Likewise.
(vld3_s32): Likewise.
(vld3_u8): Likewise.
(vld3_u16): Likewise.
(vld3_u32): Likewise.
(vld3_f16): Likewise.
(vld3_f32): Likewise.
(vld3_p64): Likewise.
(vld3q_s8): Likewise.
(vld3q_p8): Likewise.
(vld3q_s16): Likewise.
(vld3q_p16): Likewise.
(vld3q_s32): Likewise.
(vld3q_s64): Likewise.
(vld3q_u8): Likewise.
(vld3q_u16): Likewise.
(vld3q_u32): Likewise.
(vld3q_u64): Likewise.
(vld3q_f16): Likewise.
(vld3q_f32): Likewise.
(vld3q_f64): Likewise.
(vld3q_p64): Likewise.
(vld4_s64): Likewise.
(vld4_u64): Likewise.
(vld4_f64): Likewise.
(vld4_s8): Likewise.
(vld4_p8): Likewise.
(vld4_s16): Likewise.
(vld4_p16): Likewise.
(vld4_s32): Likewise.
(vld4_u8): Likewise.
(vld4_u16): Likewise.
(vld4_u32): Likewise.
(vld4_f16): Likewise.
(vld4_f32): Likewise.
(vld4_p64): Likewise.
(vld4q_s8): Likewise.
(vld4q_p8): Likewise.
(vld4q_s16): Likewise.
(vld4q_p16): Likewise.
(vld4q_s32): Likewise.
(vld4q_s64): Likewise.
(vld4q_u8): Likewise.
(vld4q_u16): Likewise.
(vld4q_u32): Likewise.
(vld4q_u64): Likewise.
(vld4q_f16): Likewise.
(vld4q_f32): Likewise.
(vld4q_f64): Likewise.
(vld4q_p64): Likewise.
(vld2_dup_s8): Likewise.
(vld2_dup_s16): Likewise.
(vld2_dup_s32): Likewise.
(vld2_dup_f16): Likewise.
(vld2_dup_f32): Likewise.
(vld2_dup_f64): Likewise.
(vld2_dup_u8): Likewise.
(vld2_dup_u16): Likewise.
(vld2_dup_u32): Likewise.
(vld2_dup_p8): Likewise.
(vld2_dup_p16): Likewise.
(vld2_dup_p64): Likewise.
(vld2_dup_s64): Likewise.
(vld2_dup_u64): Likewise.
(vld2q_dup_s8): Likewise.
(vld2q_dup_p8): Likewise.
(vld2q_dup_s16): Likewise.
(vld2q_dup_p16): Likewise.
(vld2q_dup_s32): Likewise.
(vld2q_dup_s64): Likewise.
(vld2q_dup_u8): Likewise.
(vld2q_dup_u16): Likewise.
(vld2q_dup_u32): Likewise.
(vld2q_dup_u64): Likewise.
(vld2q_dup_f16): Likewise.
(vld2q_dup_f32): Likewise.
(vld2q_dup_f64): Likewise.
(vld2q_dup_p64): Likewise.
(vld3_dup_s64): Likewise.
(vld3_dup_u64): Likewise.
(vld3_dup_f64): Likewise.
(vld3_dup_s8): Likewise.
(vld3_dup_p8): Likewise.
(vld3_dup_s16): Likewise.
(vld3_dup_p16): Likewise.
(vld3_dup_s32): Likewise.
(vld3_dup_u8): Likewise.
(vld3_dup_u16): Likewise.
(vld3_dup_u32): Likewise.
(vld3_dup_f16): Likewise.
(vld3_dup_f32): Likewise.
(vld3_dup_p64): Likewise.
(vld3q_dup_s8): Likewise.
(vld3q_dup_p8): Likewise.
(vld3q_dup_s16): Likewise.
(vld3q_dup_p16): Likewise.
(vld3q_dup_s32): Likewise.
(vld3q_dup_s64): Likewise.
(vld3q_dup_u8): Likewise.
(vld3q_dup_u16): Likewise.
(vld3q_dup_u32): Likewise.
(vld3q_dup_u64): Likewise.
(vld3q_dup_f16): Likewise.
(vld3q_dup_f32): Likewise.
(vld3q_dup_f64): Likewise.
(vld3q_dup_p64): Likewise.
(vld4_dup_s64): Likewise.
(vld4_dup_u64): Likewise.
(vld4_dup_f64): Likewise.
(vld4_dup_s8): Likewise.
(vld4_dup_p8): Likewise.
(vld4_dup_s16): Likewise.
(vld4_dup_p16): Likewise.
(vld4_dup_s32): Likewise.
(vld4_dup_u8): Likewise.
(vld4_dup_u16): Likewise.
(vld4_dup_u32): Likewise.
(vld4_dup_f16): Likewise.
(vld4_dup_f32): Likewise.
(vld4_dup_p64): Likewise.
(vld4q_dup_s8): Likewise.
(vld4q_dup_p8): Likewise.
(vld4q_dup_s16): Likewise.
(vld4q_dup_p16): Likewise.
(vld4q_dup_s32): Likewise.
(vld4q_dup_s64): Likewise.
(vld4q_dup_u8): Likewise.
(vld4q_dup_u16): Likewise.
(vld4q_dup_u32): Likewise.
(vld4q_dup_u64): Likewise.
(vld4q_dup_f16): Likewise.
(vld4q_dup_f32): Likewise.
(vld4q_dup_f64): Likewise.
(vld4q_dup_p64): Likewise.
(vld2_lane_u8): Likewise.
(vld2_lane_u16): Likewise.
(vld2_lane_u32): Likewise.
(vld2_lane_u64): Likewise.
(vld2_lane_s8): Likewise.
(vld2_lane_s16): Likewise.
(vld2_lane_s32): Likewise.
(vld2_lane_s64): Likewise.
(vld2_lane_f16): Likewise.
(vld2_lane_f32): Likewise.
(vld2_lane_f64): Likewise.
(vld2_lane_p8): Likewise.
(vld2_lane_p16): Likewise.
(vld2_lane_p64): Likewise.
(vld2q_lane_u8): Likewise.
(vld2q_lane_u16): Likewise.
(vld2q_lane_u32): Likewise.
(vld2q_lane_u64): Likewise.
(vld2q_lane_s8): Likewise.
(vld2q_lane_s16): Likewise.
(vld2q_lane_s32): Likewise.
(vld2q_lane_s64): Likewise.
(vld2q_lane_f16): Likewise.
(vld2q_lane_f32): Likewise.
(vld2q_lane_f64): Likewise.
(vld2q_lane_p8): Likewise.
(vld2q_lane_p16): Likewise.
(vld2q_lane_p64): Likewise.
(vld3_lane_u8): Likewise.
(vld3_lane_u16): Likewise.
(vld3_lane_u32): Likewise.
(vld3_lane_u64): Likewise.
(vld3_lane_s8): Likewise.
(vld3_lane_s16): Likewise.
(vld3_lane_s32): Likewise.
(vld3_lane_s64): Likewise.
(vld3_lane_f16): Likewise.
(vld3_lane_f32): Likewise.
(vld3_lane_f64): Likewise.
(vld3_lane_p8): Likewise.
(vld3_lane_p16): Likewise.
(vld3_lane_p64): Likewise.
(vld3q_lane_u8): Likewise.
(vld3q_lane_u16): Likewise.
(vld3q_lane_u32): Likewise.
(vld3q_lane_u64): Likewise.
(vld3q_lane_s8): Likewise.
(vld3q_lane_s16): Likewise.
(vld3q_lane_s32): Likewise.
(vld3q_lane_s64): Likewise.
(vld3q_lane_f16): Likewise.
(vld3q_lane_f32): Likewise.
(vld3q_lane_f64): Likewise.
(vld3q_lane_p8): Likewise.
(vld3q_lane_p16): Likewise.
(vld3q_lane_p64): Likewise.
(vld4_lane_u8): Likewise.
(vld4_lane_u16): Likewise.
(vld4_lane_u32): Likewise.
(vld4_lane_u64): Likewise.
(vld4_lane_s8): Likewise.
(vld4_lane_s16): Likewise.
(vld4_lane_s32): Likewise.
(vld4_lane_s64): Likewise.
(vld4_lane_f16): Likewise.
(vld4_lane_f32): Likewise.
(vld4_lane_f64): Likewise.
(vld4_lane_p8): Likewise.
(vld4_lane_p16): Likewise.
(vld4_lane_p64): Likewise.
(vld4q_lane_u8): Likewise.
(vld4q_lane_u16): Likewise.
(vld4q_lane_u32): Likewise.
(vld4q_lane_u64): Likewise.
(vld4q_lane_s8): Likewise.
(vld4q_lane_s16): Likewise.
(vld4q_lane_s32): Likewise.
(vld4q_lane_s64): Likewise.
(vld4q_lane_f16): Likewise.
(vld4q_lane_f32): Likewise.
(vld4q_lane_f64): Likewise.
(vld4q_lane_p8): Likewise.
(vld4q_lane_p16): Likewise.
(vld4q_lane_p64): Likewise.
(vqtbl2_s8): Likewise.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Likewise.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Likewise.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.
(vqtbx2_s8): Likewise.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Likewise.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Likewise.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.
(vst1_s64_x2): Likewise.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.
(vst1_s64_x3): Likewise.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.
(vst1_s8_x4): Likewise.
(vst1q_s8_x4): Likewise.
(vst1_s16_x4): Likewise.
(vst1q_s16_x4): Likewise.
(vst1_s32_x4): Likewise.
(vst1q_s32_x4): Likewise.
(vst1_u8_x4): Likewise.
(vst1q_u8_x4): Likewise.
(vst1_u16_x4): Likewise.
(vst1q_u16_x4): Likewise.
(vst1_u32_x4): Likewise.
(vst1q_u32_x4): Likewise.
(vst1_f16_x4): Likewise.
(vst1q_f16_x4): Likewise.
(vst1_f32_x4): Likewise.
(vst1q_f32_x4): Likewise.
(vst1_p8_x4): Likewise.
(vst1q_p8_x4): Likewise.
(vst1_p16_x4): Likewise.
(vst1q_p16_x4): Likewise.
(vst1_s64_x4): Likewise.
(vst1_u64_x4): Likewise.
(vst1_p64_x4): Likewise.
(vst1q_s64_x4): Likewise.
(vst1q_u64_x4): Likewise.
(vst1q_p64_x4): Likewise.
(vst1_f64_x4): Likewise.
(vst1q_f64_x4): Likewise.
(vst2_s64): Likewise.
(vst2_u64): Likewise.
(vst2_f64): Likewise.
(vst2_s8): Likewise.
(vst2_p8): Likewise.
(vst2_s16): Likewise.
(vst2_p16): Likewise.
(vst2_s32): Likewise.
(vst2_u8): Likewise.
(vst2_u16): Likewise.
(vst2_u32): Likewise.
(vst2_f16): Likewise.
(vst2_f32): Likewise.
(vst2_p64): Likewise.
(vst2q_s8): Likewise.
(vst2q_p8): Likewise.
(vst2q_s16): Likewise.
(vst2q_p16): Likewise.
(vst2q_s32): Likewise.
(vst2q_s64): Likewise.
(vst2q_u8): Likewise.
(vst2q_u16): Likewise.
(vst2q_u32): Likewise.
(vst2q_u64): Likewise.
(vst2q_f16): Likewise.
(vst2q_f32): Likewise.
(vst2q_f64): Likewise.
(vst2q_p64): Likewise.
(vst3_s64): Likewise.
(vst3_u64): Likewise.
(vst3_f64): Likewise.
(vst3_s8): Likewise.
(vst3_p8): Likewise.
(vst3_s16): Likewise.
(vst3_p16): Likewise.
(vst3_s32): Likewise.
(vst3_u8): Likewise.
(vst3_u16): Likewise.
(vst3_u32): Likewise.
(vst3_f16): Likewise.
(vst3_f32): Likewise.
(vst3_p64): Likewise.
(vst3q_s8): Likewise.
(vst3q_p8): Likewise.
(vst3q_s16): Likewise.
(vst3q_p16): Likewise.
(vst3q_s32): Likewise.
(vst3q_s64): Likewise.
(vst3q_u8): Likewise.
(vst3q_u16): Likewise.
(vst3q_u32): Likewise.
(vst3q_u64): Likewise.
(vst3q_f16): Likewise.
(vst3q_f32): Likewise.
(vst3q_f64): Likewise.
(vst3q_p64): Likewise.
(vst4_s64): Likewise.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.
(vtbx4_s8): Likewise.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.
(vld1_bf16_x2): Likewise.
(vld1q_bf16_x2): Likewise.
(vld1_bf16_x3): Likewise.
(vld1q_bf16_x3): Likewise.
(vld1_bf16_x4): Likewise.
(vld1q_bf16_x4): Likewise.
(vld2_bf16): Likewise.
(vld2q_bf16): Likewise.
(vld2_dup_bf16): Likewise.
(vld2q_dup_bf16): Likewise.
(vld3_bf16): Likewise.
(vld3q_bf16): Likewise.
(vld3_dup_bf16): Likewise.
(vld3q_dup_bf16): Likewise.
(vld4_bf16): Likewise.
(vld4q_bf16): Likewise.
(vld4_dup_bf16): Likewise.
(vld4q_dup_bf16): Likewise.
(vst1_bf16_x2): Likewise.
(vst1q_bf16_x2): Likewise.
(vst1_bf16_x3): Likewise.
(vst1q_bf16_x3): Likewise.
(vst1_bf16_x4): Likewise.
(vst1q_bf16_x4): Likewise.
(vst2_bf16): Likewise.
(vst2q_bf16): Likewise.
(vst3_bf16): Likewise.
(vst3q_bf16): Likewise.
(vst4_bf16): Likewise.
(vst4q_bf16): Likewise.
(vld2_lane_bf16): Likewise.
(vld2q_lane_bf16): Likewise.
(vld3_lane_bf16): Likewise.
(vld3q_lane_bf16): Likewise.
(vld4_lane_bf16): Likewise.
(vld4q_lane_bf16): Likewise.
(vst2_lane_bf16): Likewise.
(vst2q_lane_bf16): Likewise.
(vst3_lane_bf16): Likewise.
(vst3q_lane_bf16): Likewise.
(vst4_lane_bf16): Likewise.
(vst4q_lane_bf16): Likewise.
* config/aarch64/geniterators.sh: Modify iterator regex to
match new vector-tuple modes.
* config/aarch64/iterators.md (insn_count): Extend mode
attribute with vector-tuple type information.
(nregs): Likewise.
(Vendreg): Likewise.
(Vetype): Likewise.
(Vtype): Likewise.
(VSTRUCT_2D): New mode iterator.
(VSTRUCT_2DNX): Likewise.
(VSTRUCT_2DX): Likewise.
(VSTRUCT_2Q): Likewise.
(VSTRUCT_2QD): Likewise.
(VSTRUCT_3D): Likewise.
(VSTRUCT_3DNX): Likewise.
(VSTRUCT_3DX): Likewise.
(VSTRUCT_3Q): Likewise.
(VSTRUCT_3QD): Likewise.
(VSTRUCT_4D): Likewise.
(VSTRUCT_4DNX): Likewise.
(VSTRUCT_4DX): Likewise.
(VSTRUCT_4Q): Likewise.
(VSTRUCT_4QD): Likewise.
(VSTRUCT_D): Likewise.
(VSTRUCT_Q): Likewise.
(VSTRUCT_QD): Likewise.
(VSTRUCT_ELT): New mode attribute.
(vstruct_elt): Likewise.
* genmodes.c (VECTOR_MODE): Add default prefix and order
parameters.
(VECTOR_MODE_WITH_PREFIX): Define.
(make_vector_mode): Add mode prefix and order parameters.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c:
Relax incorrect register number requirement.
* gcc.target/aarch64/sve/pcs/struct_3_256.c: Accept
equivalent codegen with fmov.

gcc/expmed.c: Ensure vector modes are tieable before extraction

Extracting a bitfield from a vector can be achieved by casting the
vector to a new type whose elements are the same size as the desired
bitfield, before generating a subreg. However, this is only an
optimization if the original vector can be accessed in the new
machine mode without first being copied - a condition denoted by the
TARGET_MODES_TIEABLE_P hook.

This patch adds a check to make sure that the vector modes are
tieable before attempting to generate a subreg. This is a necessary
prerequisite for a subsequent patch that will introduce new machine
modes for Arm Neon vector-tuple types.
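
As a source-level analogy only (the patch itself operates on RTL and machine modes), the idea of viewing a vector in a mode whose elements match the desired field can be sketched as below. The helpers extract_u16_lane and pack_u16_lanes are hypothetical names and are not part of GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not GCC code: read 16-bit field number `idx` from a
// 128-bit vector stored as four 32-bit lanes, by viewing the same bytes as
// eight 16-bit lanes.
std::uint16_t extract_u16_lane(const std::uint32_t (&vec)[4], std::size_t idx)
{
  assert(idx < 8);
  std::uint16_t field;
  // memcpy is the well-defined way to reinterpret bytes in C++.
  std::memcpy(&field,
              reinterpret_cast<const unsigned char *>(vec) + idx * sizeof field,
              sizeof field);
  return field;
}

// Hypothetical helper: fill the 4 x 32-bit view from eight 16-bit lanes.
void pack_u16_lanes(const std::uint16_t (&lanes)[8], std::uint32_t (&vec)[4])
{
  static_assert(sizeof lanes == sizeof vec, "both views cover 128 bits");
  std::memcpy(vec, lanes, sizeof vec);
}
```

In this sketch the reinterpretation is free because both views share the same storage. That is the property the tieable check asks the target about before taking the shortcut.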

gcc/ChangeLog:

2021-10-11 Jonathan Wright <jonathan.wright@arm.com>

* expmed.c (extract_bit_field_1): Ensure modes are tieable.

gcc/expr.c: Remove historic workaround for broken SIMD subreg

A long time ago, using a parallel to take a subreg of a SIMD register
was broken. This temporary fix[1] (from 2003) spilled these registers
to memory and reloaded the appropriate part to obtain the subreg.

The fix initially existed for the benefit of the PowerPC E500 - a
platform for which GCC removed support a number of years ago.
Regardless, a proper mechanism for taking a subreg of a SIMD register
exists now anyway.

This patch removes the workaround thus preventing SIMD registers
being dumped to memory unnecessarily - which sometimes can't be fixed
by later passes.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html

gcc/ChangeLog:

2021-10-11 Jonathan Wright <jonathan.wright@arm.com>

* expr.c (emit_group_load_1): Remove historic workaround.

aarch64: Move Neon vector-tuple type declaration into the compiler

Declare the Neon vector-tuple types inside the compiler instead of in
the arm_neon.h header. This is a necessary first step before adding
corresponding machine modes to the AArch64 backend.

The vector-tuple types are implemented using a #pragma. This means
initialization of builtin functions that have vector-tuple types as
arguments or return values has to be delayed until the #pragma is
handled.

gcc/ChangeLog:

2021-09-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
Factor out main loop to...
(aarch64_init_simd_builtin_functions): This new function.
(register_tuple_type): Define.
(aarch64_scalar_builtin_type_p): Define.
(handle_arm_neon_h): Define.
* config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle
pragma for arm_neon.h.
* config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p):
Declare.
(handle_arm_neon_h): Likewise.
* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
Remove static modifier.
* config/aarch64/arm_neon.h (target): Remove Neon vector
structure type definitions.

x86: Check leal/addl gcc.target/i386/amxtile-3.c for x32

Check leal and addl for x32 to fix:

FAIL: gcc.target/i386/amxtile-3.c scan-assembler addq[ \\t]+\\$12
FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+4
FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+8

* gcc.target/i386/amxtile-3.c: Check leal/addl for x32.

path solver: Prefer range_of_expr instead of range_on_edge.

The range_of_expr method provides better caching than range_on_edge.
If we have a statement, we can just use it and avoid the range_on_edge
dance. Plus we can use all the range_of_expr fanciness.

Tested on x86-64 and ppc64le Linux with the usual regstrap. I also
verified that the before and after number of threads was the same or
greater in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* gimple-range-path.cc (path_range_query::range_on_path_entry):
Prefer range_of_expr unless there are no statements in the BB.

Avoid repeating calculations in threader.

We already attempt to resolve the current path on entry to
find_paths_to_name(), so there's no need to do so again for each
exported range since nothing has changed.

Removing this redundant calculation avoids 22% of calls into the path
solver.

Tested on x86-64 and ppc64le Linux with the usual regstrap. I also
verified that the before and after number of threads was the same
in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
Avoid duplicate calculation of paths.

path solver: Only compute relations for imports.

We are currently calculating implicit PHI relations for all PHI
arguments.  This creates unnecessary work, as we only care about SSA
names in the import bitmap.  Similarly for inter-path relations.  We
can avoid things not in the bitmap.
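
A minimal sketch of the filtering idea, assuming SSA names are modelled as plain indices and the import bitmap as a std::vector<bool>. The phi_arg struct and the compute_phi_relations signature here are illustrative stand-ins, not the real GCC data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PHI: `result` and `arg` are SSA name indices.
struct phi_arg
{
  std::size_t result;
  std::size_t arg;
};

// Record a relation candidate only for PHI results that are in the import
// set, and return how many candidates were recorded.
std::size_t compute_phi_relations(const std::vector<phi_arg> &phis,
                                  const std::vector<bool> &imports,
                                  std::vector<phi_arg> &out)
{
  std::size_t recorded = 0;
  for (const phi_arg &p : phis)
    {
      assert(p.result < imports.size());
      if (!imports[p.result])
        continue;  // Not an import: no relation is needed for this name.
      out.push_back(p);
      ++recorded;
    }
  return recorded;
}
```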

Tested on x86-64 and ppc64le Linux with the usual regstrap.  I also
verified that the before and after number of threads was the same
in a suite of .ii files from a bootstrap.

gcc/ChangeLog:

PR tree-optimization/102943
* gimple-range-path.cc (path_range_query::compute_phi_relations):
Only compute relations for SSA names in the import list.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (path_range_query::import_p): New.

libffi: Add --enable-cet to configure

When --enable-cet is used to configure GCC, enable Intel CET in libffi.

* Makefile.am (AM_CFLAGS): Add $(CET_FLAGS).
(AM_CCASFLAGS): Likewise.
* configure.ac (CET_FLAGS): Add GCC_CET_FLAGS and AC_SUBST.
* Makefile.in: Regenerate.
* aclocal.m4: Likewise.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.

Add -v option for git_check_commit.py.

Doing so, one can see:
$ git gcc-verify a50914d2111c72d2cd5cb8cf474133f4f85a25f6 -v
Checking a50914d2111c72d2cd5cb8cf474133f4f85a25f6: FAILED
ERR: unchanged file mentioned in a ChangeLog: "gcc/common.opt"
ERR: unchanged file mentioned in a ChangeLog (did you mean "gcc/testsuite/g++.dg/pr102955.C"?): "gcc/testsuite/gcc.dg/pr102955.c"
- gcc/testsuite/gcc.dg/pr102955.c
? ^^ ^

+ gcc/testsuite/g++.dg/pr102955.C
? ^^ ^

contrib/ChangeLog:

* gcc-changelog/git_check_commit.py: Add -v option.
* gcc-changelog/git_commit.py: Print verbose diff for wrong
filename.

testsuite: Add more guards to complex tests

This patch hopefully fixes all the remaining target-specific test issues by:

1. Unrolling all add testcases by 16 using pragma GCC unroll.
2. On armhf, using Adv.SIMD instead of MVE to test.  MVE's autovec is too
incomplete to be a general test target.
3. Adding appropriate vect_<type> and float<size> guards on testcases.
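
For illustration, a loop in the spirit of the complex-add tests with the unroll pragma applied might look like the sketch below. The function name add_rot90 and the interleaved real/imaginary layout are assumptions made for this example and are not taken from the testsuite.

```cpp
#include <cassert>
#include <cstddef>

// Computes c = a + b * i for n_complex complex numbers whose real and
// imaginary parts are interleaved: {re0, im0, re1, im1, ...}.
void add_rot90(const float *a, const float *b, float *c, std::size_t n_complex)
{
  assert(a && b && c);
  // Ask GCC to unroll the following loop 16 times.
#pragma GCC unroll 16
  for (std::size_t i = 0; i < n_complex; ++i)
    {
      // (b_re + b_im * i) * i == -b_im + b_re * i
      c[2 * i] = a[2 * i] - b[2 * i + 1];
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];
    }
}
```

The pragma only affects code generation; the computed values are the same with or without it.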

gcc/testsuite/ChangeLog:

PR testsuite/103042
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: Update guards.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
Likewise.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c:
Likewise.
* gcc.dg/vect/complex/complex-add-pattern-template.c: Likewise.
* gcc.dg/vect/complex/complex-add-template.c: Likewise.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
Likewise.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c:
Likewise.

analyzer: fix ICE in sm_state_map::dump when dumping trees

gcc/analyzer/ChangeLog:
* program-state.cc (sm_state_map::dump): Use default_tree_printer
as format decoder.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

rtl-optimization/103075 - avoid ICEing on unfolded int-to-float converts

The following avoids asserting in exact_int_to_float_conversion_p that
the argument is not constant, which it in fact can be with
-frounding-math and inexact int-to-float conversions. Say so.
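
To make "inexact int-to-float conversion" concrete, here is a small sketch assuming float is IEEE binary32 with a 24-bit significand. The helper converts_exactly_to_float is a hypothetical name and is unrelated to the GCC function mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `value` survives a round trip through float
// unchanged, i.e. the int-to-float conversion is exact.
bool converts_exactly_to_float(std::int32_t value)
{
  static_assert(std::numeric_limits<float>::digits == 24,
                "sketch assumes IEEE binary32 float");
  float f = static_cast<float>(value);
  // Compare in 64 bits so that a float equal to 2^31 stays in range.
  return static_cast<std::int64_t>(f) == static_cast<std::int64_t>(value);
}
```

A value such as 16777217 (2^24 + 1) needs 25 significant bits, so its float result depends on the rounding mode. That is why such a conversion of a constant may be left unfolded under -frounding-math.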

2021-11-04 Richard Biener <rguenther@suse.de>

PR rtl-optimization/103075
* simplify-rtx.c (exact_int_to_float_conversion_p): Return
false for a VOIDmode operand.

* gcc.dg/pr103075.c: New testcase.

aarch64: Move more code into aarch64_vector_costs

This patch moves more code into aarch64_vector_costs and reuses
some of the information that is now available in the base class.

I'm planning to significantly rework this code, with more hooks
into the vectoriser, but this seemed worth doing as a first step.

gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs): Make member
variables private and add "m_" to their names. Remove is_loop.
(aarch64_record_potential_advsimd_unrolling): Replace with...
(aarch64_vector_costs::record_potential_advsimd_unrolling): ...this.
(aarch64_analyze_loop_vinfo): Replace with...
(aarch64_vector_costs::analyze_loop_vinfo): ...this.
Move initialization of (m_)vec_flags to add_stmt_cost.
(aarch64_analyze_bb_vinfo): Delete.
(aarch64_count_ops): Replace with...
(aarch64_vector_costs::count_ops): ...this.
(aarch64_vector_costs::add_stmt_cost): Set m_vec_flags,
using m_costing_for_scalar to test whether we're costing
scalar or vector code.
(aarch64_adjust_body_cost_sve): Replace with...
(aarch64_vector_costs::adjust_body_cost_sve): ...this.
(aarch64_adjust_body_cost): Replace with...
(aarch64_vector_costs::adjust_body_cost): ...this.
(aarch64_vector_costs::finish_cost): Use m_vinfo instead of is_loop.

vect: Convert cost hooks to classes

The current vector cost interface has quite a bit of redundancy
built in.  Each target that defines its own hooks has to replicate
the basic unsigned[3] management.  Currently each target also
duplicates the cost adjustment for inner loops.

This patch instead defines a vector_costs class for holding
the scalar or vector cost and allows targets to subclass it.
There is then only one costing hook: to create a new costs
structure of the appropriate type.  Everything else can be
virtual functions, with common concepts implemented in the
base class rather than in each target's derivation.
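
The shape of this design can be sketched roughly as follows. This is a simplified model and not the real interface: the class names vector_costs_sketch and example_target_costs, all signatures, and the inner-loop factor of 50 are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>

enum cost_location { PROLOGUE = 0, BODY = 1, EPILOGUE = 2 };

// Base class: owns the three running totals and the shared helpers.
class vector_costs_sketch
{
public:
  virtual ~vector_costs_sketch() = default;

  // Targets may override this to apply their own per-statement costs.
  virtual unsigned add_stmt_cost(unsigned count, unsigned stmt_cost,
                                 cost_location where, bool in_inner_loop)
  {
    return record_stmt_cost(where,
                            adjust_cost_for_freq(in_inner_loop,
                                                 count * stmt_cost));
  }

  unsigned total(cost_location where) const { return m_costs[where]; }

protected:
  // Common bookkeeping: add `cost` to the running total for `where`.
  unsigned record_stmt_cost(cost_location where, unsigned cost)
  {
    assert(where <= EPILOGUE);
    m_costs[where] += cost;
    return cost;
  }

  // Weight inner-loop statements more heavily (the factor is an assumption).
  unsigned adjust_cost_for_freq(bool in_inner_loop, unsigned cost) const
  {
    return in_inner_loop ? cost * 50 : cost;
  }

private:
  unsigned m_costs[3] = {0, 0, 0};
};

// Hypothetical target: every statement costs one extra unit.
class example_target_costs : public vector_costs_sketch
{
public:
  unsigned add_stmt_cost(unsigned count, unsigned stmt_cost,
                         cost_location where, bool in_inner_loop) override
  {
    return record_stmt_cost(where,
                            adjust_cost_for_freq(in_inner_loop,
                                                 count * (stmt_cost + 1)));
  }
};

// The single "hook": create a costs object of the target's type.
std::unique_ptr<vector_costs_sketch> create_costs()
{
  return std::unique_ptr<vector_costs_sketch>(new example_target_costs);
}
```

In this sketch the target only overrides the per-statement policy. The totals and the inner-loop weighting stay in the base class, so they do not need to be repeated per target.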

This might seem like excess C++-ification, but it shaves
~100 LOC.  I've also got some follow-on changes that become
significantly easier with this patch.  Maybe it could help
with things like weighting blocks based on frequency too.

This will clash with Andre's unrolling patches.  His patches
have priority so this patch should queue behind them.

The x86 and rs6000 parts fully convert to a self-contained class.
The equivalent aarch64 changes are more complex, so this patch
just does the bare minimum.  A later patch will rework the
aarch64 bits.

gcc/
* target.def (targetm.vectorize.init_cost): Replace with...
(targetm.vectorize.create_costs): ...this.
(targetm.vectorize.add_stmt_cost): Delete.
(targetm.vectorize.finish_cost): Likewise.
(targetm.vectorize.destroy_cost_data): Likewise.
* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
parameter.
(vec_info::target_cost_data): Change from a void * to a vector_costs *.
(vector_costs): New class.
(init_cost): Take a vec_info and return a vector_costs.
(dump_stmt_cost): Remove data parameter.
(add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
(add_stmt_costs): Likewise.
(finish_cost): Replace data parameter with a vector_costs.
(destroy_cost_data): Delete.
* tree-vectorizer.c (dump_stmt_cost): Remove data argument and
don't print it.
(vec_info::vec_info): Remove the target_cost_data parameter and
initialize the member variable to null instead.
(vec_info::~vec_info): Delete target_cost_data instead of calling
destroy_cost_data.
(vector_costs::add_stmt_cost): New function.
(vector_costs::finish_cost): Likewise.
(vector_costs::record_stmt_cost): Likewise.
(vector_costs::adjust_cost_for_freq): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
call to vec_info::vec_info.
(vect_compute_single_scalar_iteration_cost): Update after above
changes to costing interface.
(vect_analyze_loop_operations): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
at the start_over point, where it needs to be recreated after
trying without slp.  Update retry code accordingly.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
to vec_info::vec_info.
(vect_slp_analyze_operation): Update after above changes to costing
interface.
(vect_bb_vectorization_profitable_p): Likewise.
* targhooks.h (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete.
(default_finish_cost, default_destroy_cost_data): Likewise.
* targhooks.c (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete, moving logic to vector_costs instead.
(default_finish_cost, default_destroy_cost_data): Delete.
* config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
vector_costs.  Add a constructor.
(aarch64_init_cost): Replace with...
(aarch64_vectorize_create_costs): ...this.
(aarch64_add_stmt_cost): Replace with...
(aarch64_vector_costs::add_stmt_cost): ...this.  Use record_stmt_cost
to adjust the cost for inner loops.
(aarch64_finish_cost): Replace with...
(aarch64_vector_costs::finish_cost): ...this.
(aarch64_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/i386/i386.c (ix86_vector_costs): New structure.
(ix86_init_cost): Replace with...
(ix86_vectorize_create_costs): ...this.
(ix86_add_stmt_cost): Replace with...
(ix86_vector_costs::add_stmt_cost): ...this.  Use adjust_cost_for_freq
to adjust the cost for inner loops.
(ix86_finish_cost, ix86_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
(rs6000_cost_data): Inherit from vector_costs.
Add a constructor.  Drop loop_info, cost and costing_for_scalar
in favor of the corresponding vector_costs member variables.
Add "m_" to the names of the remaining member variables and
initialize them.
(rs6000_density_test): Replace with...
(rs6000_cost_data::density_test): ...this.
(rs6000_init_cost): Replace with...
(rs6000_vectorize_create_costs): ...this.
(rs6000_update_target_cost_per_stmt): Replace with...
(rs6000_cost_data::update_target_cost_per_stmt): ...this.
(rs6000_add_stmt_cost): Replace with...
(rs6000_cost_data::add_stmt_cost): ...this.  Use adjust_cost_for_freq
to adjust the cost for inner loops.
(rs6000_adjust_vect_cost_per_loop): Replace with...
(rs6000_cost_data::adjust_vect_cost_per_loop): ...this.
(rs6000_finish_cost): Replace with...
(rs6000_cost_data::finish_cost): ...this.  Group loop code
into a single if statement and pass the loop_vinfo down to
subroutines.
(rs6000_destroy_cost_data): Delete.
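
The shape of this conversion, from separate init/add_stmt/finish/destroy
hooks to one class returned by a single "create costs" hook, can be
sketched roughly as below.  This is a toy model, not the GCC interface:
every name prefixed with toy_, the signatures, and the inner-loop weight
of 8 are assumptions made for illustration only.

```cpp
#include <memory>

// Toy base class playing the role the ChangeLog assigns to vector_costs.
// Signatures here are invented for the sketch.
class toy_vector_costs
{
public:
  // The virtual destructor takes over from a separate "destroy" hook.
  virtual ~toy_vector_costs () = default;

  // Takes over from the "add_stmt_cost" hook.
  virtual unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                                  bool in_inner_loop)
  {
    unsigned cost = adjust_cost_for_freq (in_inner_loop, count * stmt_cost);
    m_total += cost;
    return cost;
  }

  // Takes over from the "finish_cost" hook.
  virtual void finish_cost () { m_finished = true; }

  unsigned total () const { return m_total; }
  bool finished () const { return m_finished; }

protected:
  // Shared helper: weight statements in inner loops more heavily.
  // The factor 8 is a made-up value for this sketch.
  unsigned adjust_cost_for_freq (bool in_inner_loop, unsigned cost) const
  {
    return in_inner_loop ? cost * 8 : cost;
  }

  unsigned m_total = 0;
  bool m_finished = false;
};

// A target customizes costing by overriding member functions rather than
// by registering several free-function hooks.
class toy_target_costs : public toy_vector_costs
{
public:
  unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                          bool in_inner_loop) override
  {
    // Hypothetical target tweak: every statement costs one unit more.
    return toy_vector_costs::add_stmt_cost (count, stmt_cost + 1,
                                            in_inner_loop);
  }
};

// Stand-in for the single "create costs" hook.
std::unique_ptr<toy_vector_costs> toy_create_costs ()
{
  return std::unique_ptr<toy_vector_costs> (new toy_target_costs);
}
```

In this arrangement per-target state lives in member variables of the
derived class, which is consistent with the rs6000 entry above renaming
its remaining fields with an "m_" prefix.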

libsanitizer: update LOCAL_PATCHES

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Update git revision.

libsanitizer: Apply local patches

libsanitizer: Apply autoreconf.

libsanitizer: merge from master (c86b4503a94c277534ce4b9a5c015a6ac151b98a).

Convert arrays in ssa pointer_equiv_analyzer to auto_vec's.

The problem in this PR is an off-by-one bug.  We should've allocated
num_ssa_names + 1.  However, in fixing this, I noticed that
num_ssa_names can change between queries, so I have replaced the array
with an auto_vec and added code to grow the vector as necessary.
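
A minimal sketch of the grow-on-demand idea follows, using std::vector
as a stand-in for auto_vec.  The class name equiv_table and its members
are hypothetical and do not correspond to the code in
value-pointer-equiv.cc.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical table indexed by SSA name version.  A fixed-size array
// sized once at construction can be overrun if more names appear later;
// a vector that grows before each access cannot.
class equiv_table
{
public:
  // Allocate one more slot than the initial name count, mirroring the
  // "num_ssa_names + 1" fix described in the commit message.
  explicit equiv_table (std::size_t initial_names)
    : m_slots (initial_names + 1, nullptr)
  {
  }

  void set (std::size_t version, const void *equiv)
  {
    grow_to_fit (version);
    m_slots[version] = equiv;
  }

  // Not const: a lookup may need to grow the table first.
  const void *get (std::size_t version)
  {
    grow_to_fit (version);
    return m_slots[version];
  }

  std::size_t size () const { return m_slots.size (); }

private:
  // Ensure m_slots[version] is a valid element; new slots start empty.
  void grow_to_fit (std::size_t version)
  {
    if (version >= m_slots.size ())
      m_slots.resize (version + 1, nullptr);
  }

  std::vector<const void *> m_slots;
};
```

Because lookups may resize the storage in this sketch, the getter cannot
be const, which is in line with the ChangeLog below removing const
markers.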

Tested on x86-64 Linux.

PR tree-optimization/103062

gcc/ChangeLog:

PR tree-optimization/103062
* value-pointer-equiv.cc (ssa_equiv_stack::ssa_equiv_stack):
Increase size of allocation by 1.
(ssa_equiv_stack::push_replacement): Grow as needed.
(ssa_equiv_stack::get_replacement): Same.
(pointer_equiv_analyzer::pointer_equiv_analyzer): Same.
(pointer_equiv_analyzer::~pointer_equiv_analyzer): Remove delete.
(pointer_equiv_analyzer::set_global_equiv): Grow as needed.
(pointer_equiv_analyzer::get_equiv): Same.
(pointer_equiv_analyzer::get_equiv_expr): Remove const.
* value-pointer-equiv.h (class pointer_equiv_analyzer): Remove
const markers.  Use auto_vec instead of tree *.

gcc/testsuite/ChangeLog:

* gcc.dg/pr103062.c: New test.

libstdc++: Refactor emplace-like functions in std::variant

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__emplace): New
function template.
(_Copy_assign_base::operator=): Reorder conditions to match
bulleted list of effects in the standard. Use __emplace instead
of _M_reset followed by _Construct.
(_Move_assign_base::operator=): Likewise.
(__construct_by_index): Remove.
(variant::emplace): Use __emplace instead of _M_reset followed
by __construct_by_index.
(variant::swap): Hoist valueless cases out of visitor. Use
__emplace to replace _M_reset followed by _Construct.

libstdc++: Optimize std::variant traits and improve diagnostics

By defining additional partial specializations of _Nth_type we can
reduce the number of recursive instantiations needed to get from N to 0.
We can also use _Nth_type in variant_alternative, to take advantage of
that new optimization.
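
To illustrate the idea (this is a sketch, not the libstdc++ code), the
example below contrasts a naive "peel one type per step" lookup with one
that has direct specializations for indices 0, 1 and 2 and otherwise
skips three types at a time.  All names are hypothetical; the extra bool
parameter is one assumed way of keeping the partial specializations
unambiguous.

```cpp
#include <cstddef>
#include <type_traits>

// Naive form: one instantiation for every step from N down to 0.
template <std::size_t N, typename... Ts> struct nth_naive;

template <typename T, typename... Rest>
struct nth_naive<0, T, Rest...> { using type = T; };

template <std::size_t N, typename T, typename... Rest>
struct nth_naive<N, T, Rest...> : nth_naive<N - 1, Rest...> {};

// Faster form: answer indices 0..2 directly, skip three types otherwise.
// The leading bool says whether N >= 3, so exactly one specialization
// can match any given argument list.
template <bool AtLeast3, std::size_t N, typename... Ts> struct nth_impl;

template <typename T0, typename... Rest>
struct nth_impl<false, 0, T0, Rest...> { using type = T0; };

template <typename T0, typename T1, typename... Rest>
struct nth_impl<false, 1, T0, T1, Rest...> { using type = T1; };

template <typename T0, typename T1, typename T2, typename... Rest>
struct nth_impl<false, 2, T0, T1, T2, Rest...> { using type = T2; };

template <std::size_t N, typename T0, typename T1, typename T2,
          typename... Rest>
struct nth_impl<true, N, T0, T1, T2, Rest...>
  : nth_impl<(N - 3 >= 3), N - 3, Rest...> {};

template <std::size_t N, typename... Ts>
using nth_t = typename nth_impl<(N >= 3), N, Ts...>::type;

// Both forms agree; the second needs about a third of the recursion depth.
static_assert (std::is_same<nth_t<4, char, short, int, long, float>,
                            float>::value, "index 4 is float");
static_assert (std::is_same<typename nth_naive<4, char, short, int, long,
                                               float>::type,
                            nth_t<4, char, short, int, long, float>>::value,
               "naive and fast lookups agree");
```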

By adding a static_assert to variant_alternative we get a nicer error
than 'invalid use of incomplete type'.

By defining partial specializations of std::variant_size_v for the
common case we can avoid instantiating the std::variant_size class
template.
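
The technique can be sketched with a stand-in variant type, as below.
The names my_variant, my_variant_size and my_variant_size_v are
hypothetical; only the pattern (a partial specialization of the variable
template that computes the value directly) is what the paragraph above
describes.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for std::variant.
template <typename... Ts> struct my_variant {};

// Class-template trait, as a generic interface would require.
template <typename V> struct my_variant_size;

template <typename... Ts>
struct my_variant_size<my_variant<Ts...>>
  : std::integral_constant<std::size_t, sizeof...(Ts)> {};

template <typename V>
struct my_variant_size<const V> : my_variant_size<V> {};

// Generic variable template: instantiates the class template.
template <typename V>
constexpr std::size_t my_variant_size_v = my_variant_size<V>::value;

// Partial specializations for the common cases: the value comes straight
// from sizeof..., so my_variant_size is never instantiated here.
template <typename... Ts>
constexpr std::size_t my_variant_size_v<my_variant<Ts...>> = sizeof...(Ts);

template <typename... Ts>
constexpr std::size_t my_variant_size_v<const my_variant<Ts...>>
  = sizeof...(Ts);

static_assert (my_variant_size_v<my_variant<int, char, double>> == 3,
               "three alternatives");
```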

The __tuple_count class template and __tuple_count_v variable template
can be simplified to a single variable template, __count.

By adding a deleted constructor to the _Variant_union primary template
we can (very slightly) improve diagnostics for invalid attempts to
construct a std::variant with an out-of-range index. Instead of a
confusing error about "too many initializers for ..." we get a call to a
deleted function.

By using _Nth_type instead of variant_alternative (for cv-unqualified
variant types) we avoid instantiating variant_alternative.

By adding deleted overloads of variant::emplace we get better
diagnostics for emplace<invalid-index> or emplace<invalid-type>. Instead
of getting errors explaining why each of the four overloads wasn't
valid, we just get one error about calling a deleted function.
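
A reduced sketch of the deleted-overload pattern is shown below.  The
toy_variant class is hypothetical and stores nothing; it only shows how
a deleted catch-all overload turns an invalid index into a single "use
of deleted function" error instead of a list of rejected candidates.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template <typename... Ts>
struct toy_variant
{
  // Valid index: participates in overload resolution only if N is in range.
  template <std::size_t N, typename... Args>
  typename std::enable_if<(N < sizeof...(Ts)), std::size_t>::type
  emplace (Args &&...)
  {
    m_index = N;   // a real variant would construct alternative N here
    return N;
  }

  // Out-of-range index: this overload is the only viable one, and it is
  // deleted, so the compiler reports one clear error at the call site.
  template <std::size_t N, typename... Args>
  typename std::enable_if<!(N < sizeof...(Ts))>::type
  emplace (Args &&...) = delete;

  std::size_t m_index = 0;
};
```

With this sketch, calling emplace<1> on a toy_variant<int, char>
compiles, while emplace<5> is rejected as a call to a deleted function.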

libstdc++-v3/ChangeLog:

* include/std/variant (_Nth_type): Define partial
specializations to reduce number of instantiations.
(variant_size_v): Define partial specializations to avoid
instantiations.
(variant_alternative): Use _Nth_type. Add static assert.
(__tuple_count, __tuple_count_v): Replace with ...
(__count): New variable template.
(_Variant_union): Add deleted constructor.
(variant::__to_type): Use _Nth_type.
(variant::emplace): Use _Nth_type. Add deleted overloads for
invalid types and indices.