Fix checking disabled build.

gcc/ChangeLog:

2021-11-13 Jan Hubicka <hubicka@ucw.cz>

* ipa-modref-tree.c: Move #if CHECKING_P to proper place.

fixincludes: simplify handling for access() failure [PR21283, PR80047]

POSIX says:

    On some implementations, if buf is a null pointer, getcwd() may obtain
    size bytes of memory using malloc(). In this case, the pointer returned
    by getcwd() may be used as the argument in a subsequent call to free().
    Invoking getcwd() with buf as a null pointer is not recommended in
    conforming applications.

This produces an error building GCC with --enable-werror-always:

    ../../../fixincludes/fixincl.c: In function ‘process’:
    ../../../fixincludes/fixincl.c:1356:7: error: argument 1 is null but
    the corresponding size argument 2 value is 4096 [-Werror=nonnull]

It's suggested by POSIX to call getcwd() with progressively larger
buffers until it does not give an [ERANGE] error. However, it's highly
unlikely that this error-handling route is ever used.

So we can simplify it instead of writing too much code.  We give up on
using getcwd(), because `make` will output a `Leaving directory ...` message
containing the path to cwd when we call abort().
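
For reference, the POSIX-suggested approach that this patch deliberately
avoids could look roughly like the sketch below.  It is an illustration
only, not code from fixincl.c: the helper name xgetcwd_grow and the
fallback size of 64 are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of fixincludes): return a malloc'ed
   string holding the current working directory, or NULL on failure.
   The caller is responsible for freeing the result.  */
char *
xgetcwd_grow (size_t initial_size)
{
  /* getcwd() rejects a zero size, so fall back to a small default.  */
  size_t size = initial_size ? initial_size : 64;

  for (;;)
    {
      char *buf = malloc (size);
      if (buf == NULL)
        return NULL;

      if (getcwd (buf, size) != NULL)
        return buf;

      /* Save errno before free(), which is allowed to change it.  */
      int saved_errno = errno;
      free (buf);
      if (saved_errno != ERANGE)
        {
          errno = saved_errno;
          return NULL;
        }

      /* ERANGE: the buffer was too small.  Double it, guarding
         against overflow of the size computation.  */
      if (size > (size_t) -1 / 2)
        return NULL;
      size *= 2;
    }
}
```

Calling xgetcwd_grow (1) exercises the retry path, since even "/" needs
two bytes.  This is the amount of code the message refers to as "too
much" for an error path that is almost never taken.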

fixincludes/ChangeLog:

PR other/21823
PR bootstrap/80047
* fixincl.c (process): Simplify the handling for highly
  unlikely access() failure, to avoid using non-standard
  extensions.

modref_access_node cleanup

Move member functions of modref_access_node from ipa-modref-tree.h to
ipa-modref-tree.c, since they have become long and are not a good fit for
inlining anyway.  I also cleaned up the interface by adding a static insert
method (which handles inserting accesses into a vector and optimizing them),
which makes it possible to make most of the interval-merging interface private.

Honza

gcc/ChangeLog:

* ipa-modref-tree.h
(struct modref_access_node): Move longer member functions to
ipa-modref-tree.c
(modref_ref_node::try_merge_with): Turn into modref_access_node member
function.
* ipa-modref-tree.c (modref_access_node::contains): Move here
from ipa-modref-tree.h.
(modref_access_node::update): Likewise.
(modref_access_node::merge): Likewise.
(modref_access_node::closer_pair_p): Likewise.
(modref_access_node::forced_merge): Likewise.
(modref_access_node::update2): Likewise.
(modref_access_node::combined_offsets): Likewise.
(modref_access_node::try_merge_with): Likewise.
(modref_access_node::insert): Likewise.

Add finalize method to modref summary.

gcc/ChangeLog:

* ipa-modref.c (modref_summary::global_memory_read_p): Remove.
(modref_summary::global_memory_written_p): Remove.
(modref_summary::dump): Dump new flags.
(modref_summary::finalize): New member function.
(analyze_function): Call it.
(read_section): Call it.
(update_signature): Call it.
(pass_ipa_modref::execute): Call it.
* ipa-modref.h (struct modref_summary): Remove
global_memory_read_p and global_memory_written_p.
Add global_memory_read, global_memory_written.
* tree-ssa-structalias.c (determine_global_memory_access):
Update.

Whitelist type attributes for function signature change

gcc/ChangeLog:

* ipa-fnsummary.c (compute_fn_summary): Use type_attribute_allowed_p.
* ipa-param-manipulation.c
(ipa_param_adjustments::type_attribute_allowed_p):
New member function.
(drop_type_attribute_if_params_changed_p): New function.
(build_adjusted_function_type): Use it.
* ipa-param-manipulation.h: Add type_attribute_allowed_p.

analyzer: add four new taint-based warnings

The initial commit of the analyzer in GCC 10 had a single warning,
  -Wanalyzer-tainted-array-index
and required manually enabling the taint checker with
-fanalyzer-checker=taint (due to scaling issues).

This patch extends the taint detection to add four new taint-based
warnings:

  -Wanalyzer-tainted-allocation-size
     for e.g. attacker-controlled malloc/alloca
  -Wanalyzer-tainted-divisor
     for detecting where an attacker can inject a divide-by-zero
  -Wanalyzer-tainted-offset
     for attacker-controlled pointer offsets
  -Wanalyzer-tainted-size
     for e.g. attacker-controlled memset

and rewords all the warnings to talk about "attacker-controlled" values
rather than "tainted" values.

Unfortunately I haven't yet addressed the scaling issues, so all of
these still require -fanalyzer-checker=taint (in addition to -fanalyzer).
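
As an illustration of the kind of code -Wanalyzer-tainted-divisor is
aimed at, here is a minimal sketch.  It assumes that the taint checker
treats data filled in by fread() as attacker-controlled; that is an
assumption about the state machine, not something stated in this message.
The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical on-disk record: both fields come straight from the file.  */
struct sample
{
  unsigned int total;
  unsigned int count;
};

/* Hypothetical example: return total / count for one record read from F,
   or -1 if the record cannot be read or its count is zero.  */
long
average_from_file (FILE *f)
{
  struct sample s;

  if (fread (&s, sizeof s, 1, f) != 1)
    return -1;

  /* Without this check, s.count would reach the division unchecked,
     which is the pattern the new divisor warning is meant to catch.  */
  if (s.count == 0)
    return -1;

  return (long) (s.total / s.count);
}
```

To try this against the analyzer, compile it with both -fanalyzer and
-fanalyzer-checker=taint, as noted above.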

gcc/analyzer/ChangeLog:
* analyzer.opt (Wanalyzer-tainted-allocation-size): New.
(Wanalyzer-tainted-divisor): New.
(Wanalyzer-tainted-offset): New.
(Wanalyzer-tainted-size): New.
* engine.cc (impl_region_model_context::get_taint_map): New.
* exploded-graph.h (impl_region_model_context::get_taint_map):
New decl.
* program-state.cc (sm_state_map::get_state): Call
alt_get_inherited_state.
(sm_state_map::impl_set_state): Modify states within
compound svalues.
(program_state::impl_call_analyzer_dump_state): Undo casts.
(selftest::test_program_state_1): Update for new context param of
create_region_for_heap_alloc.
(selftest::test_program_state_merging): Likewise.
* region-model-impl-calls.cc (region_model::impl_call_alloca):
Likewise.
(region_model::impl_call_calloc): Likewise.
(region_model::impl_call_malloc): Likewise.
(region_model::impl_call_operator_new): Likewise.
(region_model::impl_call_realloc): Likewise.
* region-model.cc (region_model::check_region_access): Call
check_region_for_taint.
(region_model::get_representative_path_var_1): Handle binops.
(region_model::create_region_for_heap_alloc): Add "ctxt" param and
pass it to set_dynamic_extents.
(region_model::create_region_for_alloca): Likewise.
(region_model::set_dynamic_extents): Add "ctxt" param and use it
to call check_dynamic_size_for_taint.
(selftest::test_state_merging): Update for new context param of
create_region_for_heap_alloc.
(selftest::test_malloc_constraints): Likewise.
(selftest::test_malloc): Likewise.
(selftest::test_alloca): Likewise for create_region_for_alloca.
* region-model.h (region_model::create_region_for_heap_alloc): Add
"ctxt" param.
(region_model::create_region_for_alloca): Likewise.
(region_model::set_dynamic_extents): Likewise.
(region_model::check_dynamic_size_for_taint): New decl.
(region_model::check_region_for_taint): New decl.
(region_model_context::get_taint_map): New vfunc.
(noop_region_model_context::get_taint_map): New.
* sm-taint.cc: Remove include of "diagnostic-event-id.h"; add
includes of "gimple-iterator.h", "tristate.h", "selftest.h",
"ordered-hash-map.h", "cgraph.h", "cfg.h", "digraph.h",
"analyzer/supergraph.h", "analyzer/call-string.h",
"analyzer/program-point.h", "analyzer/store.h",
"analyzer/region-model.h", and "analyzer/program-state.h".
(enum bounds): Move to top of file.
(class taint_diagnostic): New.
(class tainted_array_index): Convert to subclass of taint_diagnostic.
(tainted_array_index::emit): Add CWE-129.  Reword warning to use
"attacker-controlled" rather than "tainted".
(tainted_array_index::describe_state_change): Move to
taint_diagnostic::describe_state_change.
(tainted_array_index::describe_final_event): Reword to use
"attacker-controlled" rather than "tainted".
(class tainted_offset): New.
(class tainted_size): New.
(class tainted_divisor): New.
(class tainted_allocation_size): New.
(taint_state_machine::alt_get_inherited_state): New.
(taint_state_machine::on_stmt): In assignment handling, remove
ARRAY_REF handling in favor of check_region_for_taint.  Add
detection of tainted divisors.
(taint_state_machine::get_taint): New.
(taint_state_machine::combine_states): New.
(region_model::check_region_for_taint): New.
(region_model::check_dynamic_size_for_taint): New.
* sm.h (state_machine::alt_get_inherited_state): New.

gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add
-Wno-analyzer-tainted-allocation-size,
-Wno-analyzer-tainted-divisor, -Wno-analyzer-tainted-offset, and
-Wno-analyzer-tainted-size to list.  Add
-Wanalyzer-tainted-allocation-size, -Wanalyzer-tainted-divisor,
-Wanalyzer-tainted-offset, and -Wanalyzer-tainted-size to list
of options effectively enabled by -fanalyzer.
(-Wanalyzer-tainted-allocation-size): New.
(-Wanalyzer-tainted-array-index): Tweak wording; add link to CWE.
(-Wanalyzer-tainted-divisor): New.
(-Wanalyzer-tainted-offset): New.
(-Wanalyzer-tainted-size): New.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/pr93382.c: Tweak expected wording.
* gcc.dg/analyzer/taint-alloc-1.c: New test.
* gcc.dg/analyzer/taint-alloc-2.c: New test.
* gcc.dg/analyzer/taint-divisor-1.c: New test.
* gcc.dg/analyzer/taint-1.c: Rename to...
* gcc.dg/analyzer/taint-read-index-1.c: ...this.  Tweak expected
wording.  Mark some events as xfail.
* gcc.dg/analyzer/taint-read-offset-1.c: New test.
* gcc.dg/analyzer/taint-size-1.c: New test.
* gcc.dg/analyzer/taint-write-index-1.c: New test.
* gcc.dg/analyzer/taint-write-offset-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Remember fnspec based EAF flags in modref summary.

gcc/ChangeLog:

* attr-fnspec.h (attr_fnspec::arg_eaf_flags): Break out from ...
* gimple.c (gimple_call_arg_flags): ... here.
* ipa-modref.c (analyze_parms): Record flags known from fnspec.
(modref_merge_call_site_flags): Use arg_eaf_flags.

path solver: Compute all PHI ranges simultaneously.

PHIs must be resolved simultaneously, otherwise we may not pick up the
ranges incoming to the block.

For example, if we put p3_7 in the cache before all PHIs have been
computed, we will pick up the wrong p3_7 value for p2_17:

    # p3_7 = PHI <1(2), 0(5)>
    # p2_17 = PHI <1(2), p3_7(5)>

This patch delays updating the cache until all PHIs have been
analyzed.
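
The ordering problem can be reproduced with a toy model.  The sketch
below is not the path solver's code: ranges are reduced to single
integers, only the argument on the incoming edge is modelled, and all
names (struct phi, resolve_phis_*) are hypothetical.

```c
#include <stddef.h>

#define MAX_PHIS 8

/* Indices of the SSA names in the toy cache.  */
enum { P3_7, P2_17, NUM_NAMES };

/* A toy PHI seen along one incoming edge: its argument on that edge is
   either a constant or the current value of another SSA name.  */
struct phi
{
  int result;        /* index of the name defined by the PHI */
  int arg_is_name;   /* nonzero if ARG is an index into the cache */
  int arg;           /* constant value, or index of the name to read */
};

static int
read_arg (const int *cache, const struct phi *p)
{
  return p->arg_is_name ? cache[p->arg] : p->arg;
}

/* Buggy order: each result is written to the cache immediately, so a
   later PHI can observe the new value of an earlier PHI.  */
void
resolve_phis_sequentially (int *cache, const struct phi *phis, size_t n)
{
  for (size_t i = 0; i < n; i++)
    cache[phis[i].result] = read_arg (cache, &phis[i]);
}

/* Fixed order: read every argument first, then update the cache.  */
void
resolve_phis_simultaneously (int *cache, const struct phi *phis, size_t n)
{
  int pending[MAX_PHIS];

  if (n > MAX_PHIS)
    return;

  for (size_t i = 0; i < n; i++)
    pending[i] = read_arg (cache, &phis[i]);
  for (size_t i = 0; i < n; i++)
    cache[phis[i].result] = pending[i];
}
```

With the cache holding p3_7 == 1 on entry and the two PHIs above taken
along edge 5, the sequential version leaves p2_17 == 0 (it reads the
freshly written p3_7), while the simultaneous version leaves p2_17 == 1,
the value p3_7 had on entry to the block.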

gcc/ChangeLog:

PR tree-optimization/103222
* gimple-range-path.cc (path_range_query::compute_ranges_in_phis):
New.
(path_range_query::compute_ranges_in_block): Call
compute_ranges_in_phis.
* gimple-range-path.h (path_range_query::compute_ranges_in_phis):
New.

gcc/testsuite/ChangeLog:

* gcc.dg/pr103222.c: New test.

libsanitizer: Update LOCAL_PATCHES

* LOCAL_PATCHES: Update to the corresponding revision.

libsanitizer: Apply local patches

libsanitizer: Merge with upstream

Merged revision: 82bc6a094e85014f1891ef9407496f44af8fe442

with the fix for PR sanitizer/102911

libstdc++: Implement std::spanstream for C++23

This implements the <spanstream> header, as proposed for C++23 by P0448R4.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add spanstream header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Add spanstream header.
* include/std/version (__cpp_lib_spanstream): Define.
* include/std/spanstream: New file.
* testsuite/27_io/spanstream/1.cc: New test.
* testsuite/27_io/spanstream/version.cc: New test.

Enable ipa-sra with fnspec attributes

Enable some ipa-sra on Fortran by allowing signature changes on functions
with the "fn spec" attribute when ipa-modref is enabled.  This is possible since
ipa-modref knows how to preserve the things we track in fnspec, and the fnspecs
generated by the Fortran frontend are quite simple and can now be analysed
automatically.  To be sure, I will also add code that merges fnspec into parameters.

This unfortunately hits a bug in ipa-param-manipulation when we remove a parameter
that specifies the size of a variable-length parameter.  For this reason I added a hack
that prevents signature changes on such functions and will handle it incrementally.

I tried creating a C testcase, but it is blocked by another problem: we punt ipa-sra
on the access attribute.  This is an optimization regression we ought to fix, so I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223.

As a followup I will add code classifying the type attributes (we have just a few) and
get stats on the access attribute.

gcc/ChangeLog:

* ipa-fnsummary.c (compute_fn_summary): Do not give up on signature
changes on "fn spec" attribute; give up on variadic types.
* ipa-param-manipulation.c: Include attribs.h.
(build_adjusted_function_type): New parameter ARG_MODIFIED; if it is
true remove "fn spec" attribute.
(ipa_param_adjustments::build_new_function_type): Update.
(ipa_param_body_adjustments::modify_formal_parameters): Update.
* ipa-sra.c: Include attribs.h.
(ipa_sra_preliminary_function_checks): Do not check for TYPE_ATTRIBUTES.

path solver: Merge path_range_query constructors.

There's no need for two constructors, when we can do it all with one
that defaults to the common behavior:

path_range_query (bool resolve = true, gimple_ranger *ranger = NULL);

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::path_range_query): Merge
ctors.
(path_range_query::import_p): Move from header file.
(path_range_query::~path_range_query): Adjust for combined ctors.
* gimple-range-path.h: Merge ctors.
(path_range_query::import_p): Move to .cc file.

Fix wrong code with modref and some builtins.

ipa-modref gets confused by the EAF flags of memcpy because parameter 1 is
escaping but used only directly. In modref we do not track values saved to
memory and thus we clear all other flags on each store. This also needs to
happen when the called function escapes a parameter.

gcc/ChangeLog:

PR tree-optimization/103182
* ipa-modref.c (callee_to_caller_flags): Fix merging of flags.
(modref_eaf_analysis::analyze_ssa_name): Fix merging of flags.

libstdc++: Use GCC_TRY_COMPILE_OR_LINK for getentropy, arc4random

Since r12-5056-g3439657b0286, there has been a regression in
test results; an additional 100 FAILs running the g++ and
libstdc++ testsuite on cris-elf, a newlib target.  The
failures are linker errors, not finding a definition for
getentropy.  It appears newlib has since 2017-12-03
declarations of getentropy and arc4random, and provides an
implementation of arc4random using getentropy, but provides no
definition of getentropy, not even a stub yielding ENOSYS.
This is similar to what it does for many other functions too.

While fixing newlib (like adding said stub) would likely help,
it still leaves older newlib releases hanging.  Thankfully,
the libstdc++ configury test can be improved to try linking
where possible; using the bespoke GCC_TRY_COMPILE_OR_LINK
instead of AC_TRY_COMPILE.  BTW, I see a lack of consistency;
some tests use AC_TRY_COMPILE and some GCC_TRY_COMPILE_OR_LINK
for no apparent reason, but this commit just amends
r12-5056-g3439657b0286.

libstdc++-v3:
PR libstdc++/103166
* acinclude.m4 (GLIBCXX_CHECK_GETENTROPY, GLIBCXX_CHECK_ARC4RANDOM):
Use GCC_TRY_COMPILE_OR_LINK instead of AC_TRY_COMPILE.
* configure: Regenerate.

Daily bump.

or1k: Fix clobbering of _mcount argument if fPIC is enabled

Recently we changed the PROFILE_HOOK _mcount call to pass in the link
register as an argument.  This actually does not work when the _mcount
call uses a PLT because the GOT register setup code ends up getting
inserted before the PROFILE_HOOK and clobbers the link register
argument.

These glibc tests are failing:
  gmon/tst-gmon-pie-gprof
  gmon/tst-gmon-static-gprof

This patch fixes this by saving the instruction that stores the Link
Register to the _mcount argument and then inserts the GOT register setup
instructions after that.

For example:

main.c:

    extern int e;

    int f2(int a) {
      return a + e;
    }

    int f1(int a) {
      return f2 (a + a);
    }

    int main(int argc, char ** argv) {
      return f1 (argc);
    }

Compiled:

    or1k-smh-linux-gnu-gcc -Wall -c -O2 -fPIC -pg -S main.c

Before Fix:

    main:
        l.addi  r1, r1, -16
        l.sw    8(r1), r2
        l.sw    0(r1), r16
        l.addi  r2, r1, 16   # Keeping FP, but not needed
        l.sw    4(r1), r18
        l.sw    12(r1), r9
        l.jal   8            # GOT Setup clobbers r9 (Link Register)
         l.movhi        r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
        l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
        l.add   r16, r16, r9
        l.or    r18, r3, r3
        l.or    r3, r9, r9    # This is not the original LR
        l.jal   plt(_mcount)
         l.nop

        l.jal   plt(f1)
         l.or    r3, r18, r18
        l.lwz   r9, 12(r1)
        l.lwz   r16, 0(r1)
        l.lwz   r18, 4(r1)
        l.lwz   r2, 8(r1)
        l.jr    r9
         l.addi  r1, r1, 16

After the fix:

    main:
        l.addi  r1, r1, -12
        l.sw    0(r1), r16
        l.sw    4(r1), r18
        l.sw    8(r1), r9
        l.or    r18, r3, r3
        l.or    r3, r9, r9    # We now have r9 (LR) set early
        l.jal   8             # Clobbers r9 (Link Register)
         l.movhi        r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
        l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
        l.add   r16, r16, r9
        l.jal   plt(_mcount)
         l.nop

        l.jal   plt(f1)
         l.or    r3, r18, r18
        l.lwz   r9, 8(r1)
        l.lwz   r16, 0(r1)
        l.lwz   r18, 4(r1)
        l.jr    r9
         l.addi  r1, r1, 12

Fixes: 308531d148a ("or1k: Add return address argument to _mcount call")

gcc/ChangeLog:
* config/or1k/or1k-protos.h (or1k_profile_hook): New function.
* config/or1k/or1k.h (PROFILE_HOOK): Change macro to reference
new function or1k_profile_hook.
* config/or1k/or1k.c (struct machine_function): Add new field
set_mcount_arg_insn.
(or1k_profile_hook): New function.
(or1k_init_pic_reg): Update to inject pic rtx after _mcount arg
when profiling.
(or1k_frame_pointer_required): Frame pointer no longer needed
when profiling.

Fix wrong code with pure functions

I introduced a bug into find_func_aliases_for_call in the handling of pure
functions.  Instead of reading global memory, pure functions are believed to
write global memory.  This results in misoptimization of the testcase at -O1.

The change to pta-callused.c updates the template for new behaviour of the
constraint generation. We copy nonlocal memory to calluse which is correct but
also not strictly necessary because later we take care to add nonlocal_p flag
manually.

gcc/ChangeLog:

PR tree-optimization/103209
* tree-ssa-structalias.c (find_func_aliases_for_call): Fix
use of handle_rhs_call

gcc/testsuite/ChangeLog:

PR tree-optimization/103209
* gcc.dg/tree-ssa/pta-callused.c: Update template.
* gcc.c-torture/execute/pr103209.c: New test.

path solver: Solve PHI imports first for ranges.

PHIs must be resolved first while solving ranges in a block,
regardless of where they appear in the import bitmap. We went through
a similar exercise for the relational code, but missed these.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103202
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Solve PHI imports first.

Fix ipa-pure-const

gcc/ChangeLog:

* ipa-pure-const.c (propagate_pure_const): Remove redundant check;
fix call of ipa_make_function_const and ipa_make_function_pure.

analyzer: "__analyzer_dump_state" has no side-effects

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::on_stmt_pre): Return when handling
"__analyzer_dump_state".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

aarch64: Remove redundant costing code

Previous patches made some of the complex parts of the issue rate
code redundant.

gcc/
* config/aarch64/aarch64.c (aarch64_vector_op::n_advsimd_ops): Delete.
(aarch64_vector_op::m_seen_loads): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Don't push to
m_advsimd_ops.
(aarch64_vector_op::count_ops): Remove vectype and factor parameters.
Remove code that tries to predict different vec_flags from the
current loop's.
(aarch64_vector_costs::add_stmt_cost): Update accordingly.
Remove m_advsimd_ops handling.

aarch64: Use new hooks for vector comparisons

Previously we tried to account for the different issue rates of
the various vector modes by guessing what the Advanced SIMD version
of an SVE loop would look like and what its issue rate was likely to be.
We'd then increase the cost of the SVE loop if the Advanced SIMD loop
might issue more quickly.

This patch moves that logic to better_main_loop_than_p, so that we
can compare loops side-by-side rather than having to guess. This also
means we can apply the issue rate heuristics to *any* vector loop
comparison, rather than just weighting SVE vs. Advanced SIMD.

The actual heuristics are otherwise unchanged. We're just
applying them in a different place.

gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_saw_sve_only_op)
(aarch64_sve_only_stmt_p): Delete.
(aarch64_vector_costs::prefer_unrolled_loop): New function,
extracted from adjust_body_cost.
(aarch64_vector_costs::better_main_loop_than_p): New function,
using heuristics extracted from adjust_body_cost and
adjust_body_cost_sve.
(aarch64_vector_costs::adjust_body_cost_sve): Remove
advsimd_cycles_per_iter and could_use_advsimd parameters.
Update after changes above.
(aarch64_vector_costs::adjust_body_cost): Update after changes above.

aarch64: Add vf_factor to aarch64_vec_op_count

-mtune=neoverse-512tvb sets the likely SVE vector length to 128 bits,
but it also takes into account Neoverse V1, which is a 256-bit target.
This patch adds this VF (VL) factor to aarch64_vec_op_count.

gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count::m_vf_factor):
New member variable.
(aarch64_vec_op_count::aarch64_vec_op_count): Add a parameter for it.
(aarch64_vec_op_count::vf_factor): New function.
(aarch64_vector_costs::aarch64_vector_costs): When costing for
neoverse-512tvb, pass a vf_factor of 2 for the Neoverse V1 version
of an SVE loop.
(aarch64_vector_costs::adjust_body_cost): Read the vf factor
instead of hard-coding 2.

aarch64: Move cycle estimation into aarch64_vec_op_count

This patch just moves the main cycle estimation routines
into aarch64_vec_op_count.

gcc/
* config/aarch64/aarch64.c
(aarch64_vec_op_count::rename_cycles_per_iter): New function.
(aarch64_vec_op_count::min_nonpred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_pred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_cycles_per_iter): Likewise.
(aarch64_vec_op_count::dump): Move earlier in file.  Dump the
above properties too.
(aarch64_estimate_min_cycles_per_iter): Delete.
(adjust_body_cost): Use aarch64_vec_op_count::min_cycles_per_iter
instead of aarch64_estimate_min_cycles_per_iter.  Rely on the dump
routine to print CPI estimates.
(adjust_body_cost_sve): Likewise.  Use the other functions above
instead of doing the work inline.

aarch64: Use an array of aarch64_vec_op_counts

-mtune=neoverse-512tvb uses two issue rates, one for Neoverse V1
and one with more generic parameters. We use both rates when
making a choice between scalar, Advanced SIMD and SVE code.

Previously we calculated the Neoverse V1 issue rates from the
more generic issue rates, but by removing m_scalar_ops and
(later) m_advsimd_ops, it becomes easier to track multiple
issue rates directly.

This patch therefore converts m_ops and (temporarily) m_advsimd_ops
into arrays.

gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count): Allow default
initialization.
(aarch64_vec_op_count::base_issue_info): Remove handling of null
issue_infos.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vector_costs::m_ops): Turn into a vector.
(aarch64_vector_costs::m_advsimd_ops): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Add entries to
the vectors based on aarch64_tune_params.
(aarch64_vector_costs::analyze_loop_vinfo): Update the pred_ops
of all entries in m_ops.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for all
entries in m_ops.
(aarch64_estimate_min_cycles_per_iter): Remove issue_info
parameter and get the information from the ops instead.
(aarch64_vector_costs::adjust_body_cost_sve): Take an
aarch64_vec_issue_info instead of an aarch64_vec_op_count.
(aarch64_vector_costs::adjust_body_cost): Update call accordingly.
Exit earlier if m_ops is empty for either cost structure.

aarch64: Use real scalar op counts

Now that vector finish_costs is passed the associated scalar costs,
we can record the scalar issue information while computing the scalar
costs, rather than trying to estimate it while computing the vector
costs.

This simplifies things a little, but the main motivation is to improve
accuracy.

gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_scalar_ops)
(aarch64_vector_costs::m_sve_ops): Replace with...
(aarch64_vector_costs::m_ops): ...this.
(aarch64_vector_costs::analyze_loop_vinfo): Update accordingly.
(aarch64_vector_costs::adjust_body_cost_sve): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Likewise.
Initialize m_vec_flags here rather than in add_stmt_cost.
(aarch64_vector_costs::count_ops): Test for scalar reductions too.
Allow vectype to be null.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for scalar
code too. Don't require vectype to be nonnull.
(aarch64_vector_costs::adjust_body_cost): Take the loop_vec_info
and scalar costs as parameters. Use the scalar costs to determine
the cycles per iteration of the scalar loop, then multiply it
by the estimated VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.

aarch64: Get floatness from stmt_info

This patch gets the floatness of a memory access from the data
reference rather than the vectype. This makes it more suitable
for use in scalar costing code.

gcc/
* config/aarch64/aarch64.c (aarch64_dr_type): New function.
(aarch64_vector_costs::count_ops): Use it rather than the
vectype to determine floatness.

aarch64: Remove vectype from latency tests

This patch gets the scalar mode of a reduction operation from the
gimple stmt rather than the vectype. This makes it more suitable
for use in scalar costs.

gcc/
* config/aarch64/aarch64.c (aarch64_sve_in_loop_reduction_latency):
Remove vectype parameter and get floatness from the type of the
stmt lhs instead.
(aarch64_in_loop_reduction_latency): Likewise.
(aarch64_detect_vector_stmt_subtype): Update caller.
(aarch64_vector_costs::count_ops): Likewise.

aarch64: Fold aarch64_sve_op_count into aarch64_vec_op_count

Later patches make aarch64 use the new vector hooks.  We then
only need to track one set of ops for each aarch64_vector_costs
structure.  This in turn means that it's more convenient to merge
aarch64_sve_op_count and aarch64_vec_op_count.

The patch also adds issue info and vec flags to aarch64_vec_op_count,
so that the structure is more self-descriptive.  This simplifies some
things later.

gcc/
* config/aarch64/aarch64.c (aarch64_sve_op_count): Fold into...
(aarch64_vec_op_count): ...this.  Add a constructor.
(aarch64_vec_op_count::vec_flags): New function.
(aarch64_vec_op_count::base_issue_info): Likewise.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vec_op_count::m_issue_info): New member variable.
(aarch64_vec_op_count::m_vec_flags): Likewise.
(aarch64_vector_costs): Add a constructor.
(aarch64_vector_costs::m_sve_ops): Change type to aarch64_vec_op_count.
(aarch64_vector_costs::aarch64_vector_costs): New function.
Initialize m_scalar_ops, m_advsimd_ops and m_sve_ops.
(aarch64_vector_costs::count_ops): Remove vec_flags and
issue_info parameters, using the new aarch64_vec_op_count
functions instead.
(aarch64_vector_costs::add_stmt_cost): Update call accordingly.
(aarch64_sve_op_count::dump): Fold into...
(aarch64_vec_op_count::dump): ...here.

aarch64: Detect more consecutive MEMs

For tests like:

    int res[2];
    void
    f1 (int x, int y)
    {
      res[0] = res[1] = x + y;
    }

we generated:

        add     w0, w0, w1
        adrp    x1, .LANCHOR0
        add     x2, x1, :lo12:.LANCHOR0
        str     w0, [x1, #:lo12:.LANCHOR0]
        str     w0, [x2, 4]
        ret

Using [x1, #:lo12:.LANCHOR0] for the first store prevented the
two stores being recognised as a pair.  However, the MEM_EXPR
and MEM_OFFSET information tell us that the MEMs really are
consecutive.  The peephole2 context then guarantees that the
first address is equivalent to [x2, 0].

While there: the reg_mentioned_p tests for loads were probably correct,
but seemed a bit indirect.  We're matching two consecutive loads,
so the thing we need to test is that the second MEM in the original
sequence doesn't depend on the result of the first load in the
original sequence.

gcc/
* config/aarch64/aarch64.c: Include tree-dfa.h.
(aarch64_check_consecutive_mems): New function that takes MEM_EXPR
and MEM_OFFSET into account.
(aarch64_swap_ldrstr_operands): Use it.
(aarch64_operands_ok_for_ldpstp): Likewise.  Check that the
address of the second memory doesn't depend on the result of
the first load.

gcc/testsuite/
* gcc.target/aarch64/stp_1.c: New test.

Fortran/openmp: Fix '!$omp end'

gcc/fortran/ChangeLog:

* parse.c (decode_omp_directive): Fix permitting 'nowait' for some
combined directives, add missing 'omp end ... loop'.
(gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
* openmp.c (resolve_omp_clauses): Add missing combined loop constructs
case values to the 'if(directive-name: ...)' check.
* trans-openmp.c (gfc_split_omp_clauses): Put nowait on target if
first leaf construct accepting it.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
* gfortran.dg/gomp/clauses-1.f90: New test.
* gfortran.dg/gomp/nowait-2.f90: New test.
* gfortran.dg/gomp/nowait-3.f90: New test.

Fix exit condition in ipa_make_function_pure

gcc/ChangeLog:

* ipa-pure-const.c (ipa_make_function_pure): Fix exit condition.

Fix ICE in tree-ssa-structalias.c

PR tree-optimization/103175
* ipa-modref.c (modref_lattice::merge): Add sanity check.
(callee_to_caller_flags): Make flags adjustment sane.
(modref_eaf_analysis::analyze_ssa_name): Likewise.

libgomp: Unbreak gcn offload build

My recent libgomp change apparently broke libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override teams.c source file
and the patch I've committed assumed all the non-LIBGOMP_USE_PTHREADS targets
do not use it.  My understanding is that gcn included omp_get_num_teams
and omp_get_team_num definitions in both icv-device.o and teams.o,
with the definitions only in the former working correctly.

This patch brings gcn into sync with how nvptx does it: teams.c is
overridden and provides a dummy GOMP_teams_reg plus the
omp_get_{num_teams,team_num} definitions, and icv-device.c no longer
provides those.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

PR target/103201
* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
to ...
* config/gcn/teams.c: ... here.  New file.

Fortran: Use build_debug_expr_decl to create DEBUG_EXPR_DECLs

This patch converts one more open-coded construction of a
DEBUG_EXPR_DECL to a call of build_debug_expr_decl that I missed in my
previous patch because it happens to be in the Fortran front-end.

gcc/fortran/ChangeLog:

2021-11-11 Martin Jambor <mjambor@suse.cz>

* trans-types.c (gfc_get_array_descr_info): Use build_debug_expr_decl
instead of building DEBUG_EXPR_DECL manually.

testsuite: Filter out TSVC test on Power [PR103051]

PR testsuite/103051

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s112.c: Skip test for old Power
CPUs.

libbacktrace: fix UBSAN issues

Fix issues mentioned in the PR.

PR libbacktrace/103167

libbacktrace/ChangeLog:

* elf.c (elf_uncompress_lzma_block): Cast to unsigned int.
(elf_uncompress_lzma): Likewise.
* xztest.c (test_samples): memcpy only if v > 0.

jit: fix -Werror=format-overflow= in testsuite [PR103199]

gcc/jit/ChangeLog:
PR jit/103199
* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile):
Increase size of buffer.
* docs/examples/tut04-toyvm/toyvm.cc
(compilation_state::create_function): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Fix ipa-modref pure/const discovery

PR ipa/103200
* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
not mark pure/const function if there are side-effects.

openmp: Relax handling of implicit map vs. existing device mappings

This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an
implicit data-mapping attribute has corresponding storage in the device data
environment prior to a task encountering the construct that is associated with
the map clause, only that part of the original storage will have corresponding
storage in the device data environment as a result of the map clause."
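
The situation described by the quoted paragraph can be sketched as follows.
This is an illustrative example written for this note, not the new
target-implicit-map-1.c test; the function name and sizes are made up. Part of
an array is mapped explicitly, then a target region refers to the array
without a map clause, so the whole array gets an implicit map that overlaps
the existing one. Built without -fopenmp the pragmas are ignored and the loop
simply runs on the host.

```cpp
#include <cassert>

// Hypothetical example: x[10..19] is mapped explicitly, then the target
// region below maps all of x implicitly. With the relaxed handling, only the
// already-mapped part has corresponding device storage, so the region must
// touch only that part.
int sum_mapped_part() {
  const int N = 40;
  int x[N];
  for (int i = 0; i < N; ++i)
    x[i] = i;

  int sum = 0;
  #pragma omp target enter data map(to: x[10:10])
  #pragma omp target map(tofrom: sum)
  {
    for (int i = 10; i < 20; ++i)
      sum += x[i];
  }
  #pragma omp target exit data map(release: x[10:10])
  return sum;
}

// Small usage example: 10 + 11 + ... + 19 == 145.
void demo_sum_mapped_part() {
  assert(sum_mapped_part() == 145);
}
```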

2021-11-12 Chung-Lin Tang <cltang@codesourcery.com>

include/ChangeLog:

* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
'implicit' bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

libgomp/ChangeLog:

* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
implicit map handling to allow a "superset" existing map as valid case.
(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
(get_implicit): New function to extract implicit status.
(gomp_map_fields_existing): Adjust arguments in calls to
gomp_map_vars_existing, and add uses of get_implicit.
(gomp_map_vars_internal): Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.

libstdc++: Print assertion messages to stderr [PR59675]

This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.

To avoid including <stdio.h> the assert function is moved into the
library. To avoid programs using a vague linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with the newer GCC will call the new function and write to stderr.
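
The idea of an out-of-line assertion handler that writes to stderr can be
sketched as below. This is not the libstdc++ source: format_assert_message and
assert_fail_sketch are hypothetical names, and the message layout is an
assumption chosen for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: build the diagnostic text for a failed assertion.
std::string format_assert_message(const char *file, int line,
                                  const char *function,
                                  const char *condition) {
  return std::string(file) + ":" + std::to_string(line) + ": " + function +
         ": Assertion '" + condition + "' failed.\n";
}

// Hypothetical stand-in for the library function: because it is defined out
// of line, headers that call it need only a declaration, not <stdio.h>.
[[noreturn]] void assert_fail_sketch(const char *file, int line,
                                     const char *function,
                                     const char *condition) {
  std::fputs(format_assert_message(file, line, function, condition).c_str(),
             stderr);  // stderr rather than stdout
  std::abort();
}

// Small usage example for the formatting part only.
void demo_format_assert_message() {
  assert(format_assert_message("v.cc", 7, "f", "i < n") ==
         "v.cc:7: f: Assertion 'i < n' failed.\n");
}
```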

libstdc++-v3/ChangeLog:

PR libstdc++/59675
* acinclude.m4 (libtool_VERSION): Bump version.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
export new symbol.
* configure: Regenerate.
* include/bits/c++config (__replacement_assert): Remove, declare
__glibcxx_assert_fail instead.
* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
replace __replacement_assert, writing to stderr instead of
stdout.
* testsuite/util/testsuite_abi.cc: Update latest version.

fortran: Ignore unused args in scalarization [PR97896]

The KIND argument of the INDEX intrinsic is a compile time constant
that is used at compile time only to resolve to a kind-specific library
function. That argument is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer which expects to see every argument
of elemental functions used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b965a2c2cda063827ce362d4c9c86f2 except for its testcase).

PR fortran/97896

gcc/fortran/ChangeLog:
* intrinsic.c (add_sym_4ind): Remove.
(add_functions): Use add_sym_4 instead of add_sym_4ind.
Don’t special case the index intrinsic.
* iresolve.c (gfc_resolve_index_func): Use the individual arguments
directly instead of the full argument list.
* intrinsic.h (gfc_resolve_index_func): Update the declaration
accordingly.
* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
list of arguments in the case of the index intrinsic.
* trans-array.h (gfc_get_intrinsic_for_expr,
gfc_get_proc_ifc_for_expr): New.
* trans-array.c (gfc_get_intrinsic_for_expr,
arg_evaluated_for_scalarization): New.
(gfc_walk_elemental_function_args): Add intrinsic procedure
as argument. Count arguments. Check arg_evaluated_for_scalarization.
* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
* trans-stmt.c (get_intrinsic_for_code): New.
(gfc_trans_call): Update call.

gcc/testsuite/ChangeLog:
* gfortran.dg/index_5.f90: New.

openmp: Honor OpenMP 5.1 num_teams lower bound

The following patch implements what I've been talking about earlier:
for an explicit num_teams clause, we create at least the lower-bound
(if not specified, upper-bound) number of teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.
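
The scheme proposed above for PTX and GCN can be modelled on the host as
below. This is only a sketch of the idea, not libgomp code: BlockState,
teams4_model and teams_run_by_block are hypothetical names, block_id stands in
for %ctaid.x or __builtin_gcn_dim_pos (0), and team_num stands in for the
.shared variable. It assumes teams are numbered 0 to num_teams_lower - 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of one block/workgroup.
struct BlockState {
  unsigned block_id;    // index of this block among the launched blocks
  unsigned num_blocks;  // number of blocks/workgroups actually launched
  unsigned team_num;    // what omp_get_team_num () would report
};

// Models the proposed behaviour: on the first call the team number is the
// block index, on later calls it advances by num_blocks. Returns false once
// the team number is past the last requested team.
bool teams4_model(BlockState &s, unsigned num_teams_lower, bool first) {
  if (first)
    s.team_num = s.block_id;
  else
    s.team_num += s.num_blocks;
  return s.team_num < num_teams_lower;
}

// Lists the team numbers that one block would execute, one after another.
std::vector<unsigned> teams_run_by_block(unsigned block_id,
                                         unsigned num_blocks,
                                         unsigned num_teams_lower) {
  BlockState s = {block_id, num_blocks, 0};
  std::vector<unsigned> teams;
  for (bool first = true; teams4_model(s, num_teams_lower, first);
       first = false)
    teams.push_back(s.team_num);
  return teams;
}

// Small usage example: 10 teams requested, only 4 blocks launched.
void demo_teams_model() {
  assert((teams_run_by_block(1, 4, 10) == std::vector<unsigned>{1, 5, 9}));
  assert((teams_run_by_block(3, 4, 10) == std::vector<unsigned>{3, 7}));
}
```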

2021-11-12 Jakub Jelinek <jakub@redhat.com>

gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.

Remove unused function.

PR tree-optimization/102497

gcc/ChangeLog:

* gimple-predicate-analysis.cc (add_pred): Remove unused
function.

tree-optimization/103204 - fix missed valueization in VN

The following fixes a missed valueization when simplifying
a MEM[&...] combination during valueization.

2021-11-12 Richard Biener <rguenther@suse.de>

PR tree-optimization/103204
* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the
top operand after folding in an address.

* gcc.dg/torture/pr103204.c: New testcase.

Make opcodes configure depend on bfd configure

The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit. A lot of --enable-targets=all libopcodes is wasted space
if bfd can't load 64-bit target object files.

* Makefile.def (configure-opcodes): Depend on configure-bfd.
* Makefile.in: Regenerate.

libstdc++: Implement constexpr std::vector for C++20

This implements P1004R2 ("Making std::vector constexpr") for C++20.

For now, debug mode vectors are not supported in constant expressions.
To make that work we might need to disable all attaching/detaching of
safe iterators. That can be fixed later.
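
A small example of what this enables is sketched below. It is an illustration
written for this note, not one of the new tests; sum_of_squares is a
hypothetical name. The code is guarded by the standard feature-test macro, so
it compiles to nothing when the library or language mode lacks constexpr
std::vector.

```cpp
#include <vector>

// <vector> defines __cpp_lib_constexpr_vector when P1004R2 is available.
#ifdef __cpp_lib_constexpr_vector
// Hypothetical example: a vector created, filled and destroyed entirely
// during constant evaluation.
constexpr int sum_of_squares(int n) {
  std::vector<int> v;
  for (int i = 1; i <= n; ++i)
    v.push_back(i * i);
  int total = 0;
  for (int x : v)
    total += x;
  return total;  // v's storage is released before the evaluation ends
}

// 1 + 4 + 9 == 14, checked at compile time.
static_assert(sum_of_squares(3) == 14, "evaluated at compile time");
#endif
```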

Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com>
libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (_Destroy): Make constexpr for
C++20 mode.
* include/bits/allocator.h (__shrink_to_fit::_S_do_it):
Likewise.
* include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator
overload constexpr for C++20.
* include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out
of inline namespace.
(_Bit_reference, _Bit_iterator_base, _Bit_iterator)
(_Bit_const_iterator, _Bvector_impl_data, _Bvector_base)
(vector<bool, A>): Add constexpr to every member function.
(_Bvector_base::_M_allocate): Initialize storage during constant
evaluation.
(vector<bool, A>::_M_initialize_value): Use __fill_bvector_n
instead of memset.
(__fill_bvector_n): New helper function to replace memset during
constant evaluation.
* include/bits/stl_uninitialized.h (__uninitialized_copy<false>):
Move logic to ...
(__do_uninit_copy): New function.
(__uninitialized_fill<false>): Move logic to ...
(__do_uninit_fill): New function.
(__uninitialized_fill_n<false>): Move logic to ...
(__do_uninit_fill_n): New function.
(__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy.
(__uninitialized_move_a, __uninitialized_move_if_noexcept_a):
Add constexpr.
(__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill.
(__uninitialized_fill_n_a): Add constexpr. Use
__do_uninit_fill_n.
(__uninitialized_default_n, __uninitialized_default_n_a)
(__relocate_a_1, __relocate_a): Add constexpr.
* include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl)
(_Vector_base, vector): Add constexpr to every member function.
(_Vector_impl::_S_adjust): Disable ASan annotation during
constant evaluation.
(_Vector_base::_S_use_relocate): Disable bitwise-relocation
during constant evaluation.
(vector::_Temporary_value): Use a union for storage.
* include/bits/vector.tcc (vector, vector<bool>): Add constexpr
to every member function.
* include/std/vector (erase_if, erase): Add constexpr.
* testsuite/23_containers/headers/vector/synopsis.cc: Add
constexpr for C++20 mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to
compile-only test using constant expressions.
* testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust
namespace for _S_word_bit.
* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Adjust errors
for C++20 and move C++17 test to ...
* testsuite/23_containers/vector/cons/89164_c++17.cc: ... here.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test.
* testsuite/23_containers/vector/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/data_access/constexpr.cc: New test.
* testsuite/23_containers/vector/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.

Daily bump.

libstdc++: Fix debug containers for C++98 mode

Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).

The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.
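
The access rule behind this can be shown with a reduced example. The class
names below are hypothetical stand-ins, not the debug-mode classes. A derived
class may not call a protected base member through a reference of the base
type, but the compiler-generated assignment operator of the derived class may
call it.

```cpp
#include <cassert>

// Hypothetical stand-in for a base class with a protected copy assignment.
class SafeBase {
public:
  SafeBase() : assigned(0) {}
  int assigned;  // counts calls to the protected operator=
protected:
  SafeBase &operator=(const SafeBase &) {
    ++assigned;
    return *this;
  }
};

// Hypothetical stand-in for a debug container.
class DebugContainer : public SafeBase {
public:
  explicit DebugContainer(int v) : value(v) {}
  int value;
  // A user-provided operator= containing
  //   static_cast<SafeBase &>(*this) = other;
  // would be rejected: the object expression has type SafeBase, so the
  // protected member is not accessible. The qualified call
  //   this->SafeBase::operator=(other);
  // would be accepted. Declaring nothing lets the compiler generate
  // base-then-member assignment, which is also accepted.
};

// Small usage example.
void demo_implicit_assignment() {
  DebugContainer a(1), b(2);
  a = b;                    // implicit operator=, calls SafeBase::operator=
  assert(a.value == 2);
  assert(a.assigned == 1);
}
```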

libstdc++-v3/ChangeLog:

* include/debug/deque (deque::operator=(const deque&)): Remove
definition.
* include/debug/list (list::operator=(const list&)): Likewise.
* include/debug/map.h (map::operator=(const map&)): Likewise.
* include/debug/multimap.h (multimap::operator=(const multimap&)):
Likewise.
* include/debug/multiset.h (multiset::operator=(const multiset&)):
Likewise.
* include/debug/set.h (set::operator=(const set&)): Likewise.
* include/debug/string (basic_string::operator=(const basic_string&)):
Likewise.
* include/debug/vector (vector::operator=(const vector&)):
Likewise.
(_Safe_vector::operator=(const _Safe_vector&)): Define for
C++98 as well.

Make ranger optional in path_range_query.

All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object. It's tidier to just do it from
path_range_query if no ranger was passed.
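
The ownership pattern can be sketched as below. Ranger and PathQuery are
hypothetical stand-ins for gimple_ranger and path_range_query, reduced to the
constructor and destructor logic; the member names are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the ranger; counts live objects for the demo.
struct Ranger {
  static int live;
  Ranger() { ++live; }
  ~Ranger() { --live; }
  Ranger(const Ranger &) = delete;
  Ranger &operator=(const Ranger &) = delete;
};
int Ranger::live = 0;

// Hypothetical stand-in for the query object.
class PathQuery {
public:
  // Borrow a ranger supplied by the caller.
  explicit PathQuery(Ranger &ranger)
      : m_ranger(&ranger), m_owns_ranger(false) {}
  // No ranger supplied: allocate one and remember to free it.
  PathQuery() : m_ranger(new Ranger), m_owns_ranger(true) {}
  ~PathQuery() {
    if (m_owns_ranger)
      delete m_ranger;
  }
  PathQuery(const PathQuery &) = delete;
  PathQuery &operator=(const PathQuery &) = delete;
  bool owns_ranger() const { return m_owns_ranger; }

private:
  Ranger *m_ranger;    // a pointer in both cases, borrowed or owned
  bool m_owns_ranger;
};

// Small usage example.
void demo_optional_ranger() {
  {
    PathQuery q;  // allocates its own ranger
    assert(Ranger::live == 1 && q.owns_ranger());
  }
  assert(Ranger::live == 0);  // freed by the destructor
  Ranger shared;
  {
    PathQuery q(shared);  // borrows the caller's ranger
    assert(Ranger::live == 1 && !q.owns_ranger());
  }
  assert(Ranger::live == 1);  // the caller's ranger is left alone
}
```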

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::path_range_query): New
ctor without a ranger.
(path_range_query::~path_range_query): Free ranger if necessary.
(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::compute_imports): Same.
(path_range_query::compute_ranges): Same.
(path_range_query::range_of_stmt): Same.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (class path_range_query): New ctor.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
as path_range_query allocates one.
* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
(back_threader::~back_threader): Same.

Remove loop crossing restriction from the backward threader.

We have much more thorough restrictions, that are shared between both
threader implementations, in the registry. I've been meaning to
remove the backward threader one, since its only purpose was reducing
the search space. Previously there was a small time penalty for its
removal, but with the various patches in the past month, it looks like
the removal is a wash performance-wise.

This catches 8 more jump threads in the backward threader in my suite.
Presumably, because we disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc).

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove loop
crossing restriction.

rs6000: Fix test_mffsl.c to require Power9 support

2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/testsuite/
* gcc.target/powerpc/test_mffsl.c: Require Power9.

compiler: traverse func subexprs when creating func descriptors

Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression.  There are no subexpressions in the
normal case of calling a function or a method directly, but there are
subexpressions in code like F().M() when F returns an interface type.

Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend.  However, when
the thunks were created there, they did not go through the
order_evaluations pass.  This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called.  Avoid hiding errors like this by
changing those methods to only look up the previously created thunk,
rather than creating it if it was not already created.

The test case for this is https://golang.org/cl/363156.

Fixes https://golang.org/issue/49512

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274

libstdc++: Make pmr::memory_resource::allocate implicitly create objects

Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
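
From the caller's side, the effect can be sketched as below. This is an
illustration written for this note, not library code; sum_in_pmr_storage is a
hypothetical name. The storage returned by allocate is used as an array of int
directly, with no placement new by the caller. It needs C++17 for
<memory_resource>.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical example: treat freshly allocated storage as an int array.
// int is an implicit-lifetime type, so the array is created implicitly.
int sum_in_pmr_storage(std::size_t n) {
  std::pmr::memory_resource *mr = std::pmr::new_delete_resource();
  void *p = mr->allocate(n * sizeof(int), alignof(int));
  int *a = static_cast<int *>(p);
  for (std::size_t i = 0; i < n; ++i)
    a[i] = static_cast<int>(i);  // write before any read
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += a[i];
  mr->deallocate(p, n * sizeof(int), alignof(int));
  return total;
}

// Small usage example: 0 + 1 + 2 + 3 + 4 == 10.
void demo_sum_in_pmr_storage() {
  assert(sum_in_pmr_storage(5) == 10);
}
```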

libstdc++-v3/ChangeLog:

* include/std/memory_resource (memory_resource::allocate):
Implicitly create objects in the returned storage.

libstdc++: Remove public std::vector<bool>::data() member

This function only exists to avoid an error in the debug mode vector, so it
doesn't need to be public.

libstdc++-v3/ChangeLog:

* include/bits/stl_bvector.h (vector<bool>::data()): Give
protected access, and delete for C++11 and later.

Fix gfortran.dg/inline_matmul_17.f90 template.

As discussed on the mailing list, the template actually tests for a missed
optimization where we fail to propagate the size of an array. We no longer miss
this after the modref improvements.

gcc/testsuite/ChangeLog:

2021-11-11 Jan Hubicka <hubicka@ucw.cz>

* gfortran.dg/inline_matmul_17.f90: Fix template.

Enable pure-const discovery in modref.

We can now handle some extra cases, for example:

struct a {int a,b,c;};
__attribute__ ((noinline))
void init (struct a *a)
{
  a->a=1;
  a->b=2;
  a->c=3;
}
int const_fn ()
{
  struct a a;
  init (&a);
  return a.a + a.b + a.c;
}

Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.

I ended up reordering passes so early modref is done after early pure-const,
mostly to avoid the need to change the testsuite, which greps for const functions
being detected in pure-const.  Still, some testsuite compensation is needed.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.

diagnostic: fix unused variable 'def_tabstop' [PR103129]

gcc/ChangeLog:
PR other/103129
* diagnostic-show-locus.c (def_policy): Use def_tabstop.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Fortran/openmp: Add support for 2 argument num_teams clause

Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
num_teams_upper, add num_teams_lower.
* dump-parse-tree.c (show_omp_clauses): Update to handle
lower-bound num_teams clause.
* frontend-passes.c (gfc_code_walker): Likewise.
* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
resolve_omp_clauses): Likewise.
* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/teams-1.f90: New test.

aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics

Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
(TYPES_COMBINEP): Delete.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vcombine_* intrinsics.
* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
cast.
(vcombine_s16): Likewise.
(vcombine_s32): Likewise.
(vcombine_f32): Likewise.
(vcombine_u8): Use type-qualified builtin and remove casts.
(vcombine_u16): Likewise.
(vcombine_u32): Likewise.
(vcombine_u64): Likewise.
(vcombine_p8): Likewise.
(vcombine_p16): Likewise.
(vcombine_p64): Likewise.
(vcombine_bf16): Remove unnecessary cast.
* config/aarch64/iterators.md (VD_I): New mode iterator.
(VDC_P): New mode iterator.

aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics

Declare unsigned and polynomial type-qualified builtins for LD1/ST1
Neon intrinsics. Using these builtins removes the need for many casts
in arm_neon.h.

The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.

gcc/ChangeLog:

2021-11-10 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
(TYPES_LOAD1_P): Define.
(TYPES_STORE1_U): Define.
(TYPES_STORE1P): Rename to...
(TYPES_STORE1_P): This.
(get_mem_type_for_load_store): Add unsigned and poly types.
(aarch64_general_gimple_fold_builtin): Add unsigned and poly
type-qualified builtin declarations.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for LD1/ST1.
* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
builtin and remove cast.
(vld1_p16): Likewise.
(vld1_u8): Likewise.
(vld1_u16): Likewise.
(vld1_u32): Likewise.
(vld1q_p8): Likewise.
(vld1q_p16): Likewise.
(vld1q_p64): Likewise.
(vld1q_u8): Likewise.
(vld1q_u16): Likewise.
(vld1q_u32): Likewise.
(vld1q_u64): Likewise.
(vst1_p8): Likewise.
(vst1_p16): Likewise.
(vst1_u8): Likewise.
(vst1_u16): Likewise.
(vst1_u32): Likewise.
(vst1q_p8): Likewise.
(vst1q_p16): Likewise.
(vst1q_p64): Likewise.
(vst1q_u8): Likewise.
(vst1q_u16): Likewise.
(vst1q_u32): Likewise.
(vst1q_u64): Likewise.
* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.

aarch64: Use type-qualified builtins for ADDV Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for vector reduction.
* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
builtin and remove casts.
(vaddv_u16): Likewise.
(vaddv_u32): Likewise.
(vaddvq_u8): Likewise.
(vaddvq_u16): Likewise.
(vaddvq_u32): Likewise.
(vaddvq_u64): Likewise.

aarch64: Use type-qualified builtins for ADDP Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for pairwise addition.
* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
builtin and remove casts.
(vpaddq_u16): Likewise.
(vpaddq_u32): Likewise.
(vpaddq_u64): Likewise.
(vpadd_u8): Likewise.
(vpadd_u16): Likewise.
(vpadd_u32): Likewise.
(vpaddd_u64): Likewise.

aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-subtract Neon intrinsics. This removes
the need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]subhn[2].
* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
cast.
(vsubhn_s32): Likewise.
(vsubhn_s64): Likewise.
(vsubhn_u16): Use type-qualified builtin and remove casts.
(vsubhn_u32): Likewise.
(vsubhn_u64): Likewise.
(vrsubhn_s16): Remove unnecessary cast.
(vrsubhn_s32): Likewise.
(vrsubhn_s64): Likewise.
(vrsubhn_u16): Use type-qualified builtin and remove casts.
(vrsubhn_u32): Likewise.
(vrsubhn_u64): Likewise.
(vrsubhn_high_s16): Remove unnecessary cast.
(vrsubhn_high_s32): Likewise.
(vrsubhn_high_s64): Likewise.
(vrsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vrsubhn_high_u32): Likewise.
(vrsubhn_high_u64): Likewise.
(vsubhn_high_s16): Remove unnecessary cast.
(vsubhn_high_s32): Likewise.
(vsubhn_high_s64): Likewise.
(vsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vsubhn_high_u32): Likewise.
(vsubhn_high_u64): Likewise.

aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]addhn[2].
* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
cast.
(vaddhn_s32): Likewise.
(vaddhn_s64): Likewise.
(vaddhn_u16): Use type-qualified builtin and remove casts.
(vaddhn_u32): Likewise.
(vaddhn_u64): Likewise.
(vraddhn_s16): Remove unnecessary cast.
(vraddhn_s32): Likewise.
(vraddhn_s64): Likewise.
(vraddhn_u16): Use type-qualified builtin and remove casts.
(vraddhn_u32): Likewise.
(vraddhn_u64): Likewise.
(vaddhn_high_s16): Remove unnecessary cast.
(vaddhn_high_s32): Likewise.
(vaddhn_high_s64): Likewise.
(vaddhn_high_u16): Use type-qualified builtin and remove
casts.
(vaddhn_high_u32): Likewise.
(vaddhn_high_u64): Likewise.
(vraddhn_high_s16): Remove unnecessary cast.
(vraddhn_high_s32): Likewise.
(vraddhn_high_s64): Likewise.
(vraddhn_high_u16): Use type-qualified builtin and remove
casts.
(vraddhn_high_u32): Likewise.
(vraddhn_high_u64): Likewise.

aarch64: Use type-qualified builtins for UHSUB Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
halving-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uhsub builtins.
* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
cast.
(vhsub_s16): Likewise.
(vhsub_s32): Likewise.
(vhsub_u8): Use type-qualified builtin and remove casts.
(vhsub_u16): Likewise.
(vhsub_u32): Likewise.
(vhsubq_s8): Remove unnecessary cast.
(vhsubq_s16): Likewise.
(vhsubq_s32): Likewise.
(vhsubq_u8): Use type-qualified builtin and remove casts.
(vhsubq_u16): Likewise.
(vhsubq_u32): Likewise.

aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for u[r]hadd builtins.
* config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
cast.
(vhadd_s16): Likewise.
(vhadd_s32): Likewise.
(vhadd_u8): Use type-qualified builtin and remove casts.
(vhadd_u16): Likewise.
(vhadd_u32): Likewise.
(vhaddq_s8): Remove unnecessary cast.
(vhaddq_s16): Likewise.
(vhaddq_s32): Likewise.
(vhaddq_u8): Use type-qualified builtin and remove casts.
(vhaddq_u16): Likewise.
(vhaddq_u32): Likewise.
(vrhadd_s8): Remove unnecessary cast.
(vrhadd_s16): Likewise.
(vrhadd_s32): Likewise.
(vrhadd_u8): Use type-qualified builtin and remove casts.
(vrhadd_u16): Likewise.
(vrhadd_u32): Likewise.
(vrhaddq_s8): Remove unnecessary cast.
(vrhaddq_s16): Likewise.
(vrhaddq_s32): Likewise.
(vrhaddq_u8): Use type-qualified builtin and remove casts.
(vrhaddq_u16): Likewise.
(vrhaddq_u32): Likewise.

aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
widening-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for usub[lw][2] builtins.
* config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
cast.
(vsubl_s16): Likewise.
(vsubl_s32): Likewise.
(vsubl_u8): Use type-qualified builtin and remove casts.
(vsubl_u16): Likewise.
(vsubl_u32): Likewise.
(vsubl_high_s8): Remove unnecessary cast.
(vsubl_high_s16): Likewise.
(vsubl_high_s32): Likewise.
(vsubl_high_u8): Use type-qualified builtin and remove casts.
(vsubl_high_u16): Likewise.
(vsubl_high_u32): Likewise.
(vsubw_s8): Remove unnecessary casts.
(vsubw_s16): Likewise.
(vsubw_s32): Likewise.
(vsubw_u8): Use type-qualified builtin and remove casts.
(vsubw_u16): Likewise.
(vsubw_u32): Likewise.
(vsubw_high_s8): Remove unnecessary cast.
(vsubw_high_s16): Likewise.
(vsubw_high_s32): Likewise.
(vsubw_high_u8): Use type-qualified builtin and remove casts.
(vsubw_high_u16): Likewise.
(vsubw_high_u32): Likewise.

aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics

Declare unsigned type-qualified builtins and use them to implement
widening-add Neon intrinsics. This removes the need for many casts in
arm_neon.h.

gcc/ChangeLog:

2021-11-09 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uadd[lw][2] builtins.
* config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
cast.
(vaddl_s16): Likewise.
(vaddl_s32): Likewise.
(vaddl_u8): Use type-qualified builtin and remove casts.
(vaddl_u16): Likewise.
(vaddl_u32): Likewise.
(vaddl_high_s8): Remove unnecessary cast.
(vaddl_high_s16): Likewise.
(vaddl_high_s32): Likewise.
(vaddl_high_u8): Use type-qualified builtin and remove casts.
(vaddl_high_u16): Likewise.
(vaddl_high_u32): Likewise.
(vaddw_s8): Remove unnecessary cast.
(vaddw_s16): Likewise.
(vaddw_s32): Likewise.
(vaddw_u8): Use type-qualified builtin and remove casts.
(vaddw_u16): Likewise.
(vaddw_u32): Likewise.
(vaddw_high_s8): Remove unnecessary cast.
(vaddw_high_s16): Likewise.
(vaddw_high_s32): Likewise.
(vaddw_high_u8): Use type-qualified builtin and remove casts.
(vaddw_high_u16): Likewise.
(vaddw_high_u32): Likewise.

aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them for [R]SHRN[2]
Neon intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for [R]SHRN[2].
* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
builtin and remove casts.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.

aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics

Declare unsigned type-qualified builtins and use them for XTN[2] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
type-qualified builtins for XTN[2].
* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
qualified builtin and remove casts.
(vmovn_high_u32): Likewise.
(vmovn_high_u64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.

aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics

Declare poly type-qualified builtins and use them for PMUL[L] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Use poly type
qualifier in builtin generator macros.
* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
builtin and remove casts.
(vmulq_p8): Likewise.
(vmull_high_p8): Likewise.
(vmull_p8): Likewise.

aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics

Declare type-qualified builtins and use them for MLA/MLS Neon
intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtin generators for unsigned MLA/MLS intrinsics.
* config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
builtin.
(vmla_n_u32): Likewise.
(vmla_u8): Likewise.
(vmla_u16): Likewise.
(vmla_u32): Likewise.
(vmlaq_n_u16): Likewise.
(vmlaq_n_u32): Likewise.
(vmlaq_u8): Likewise.
(vmlaq_u16): Likewise.
(vmlaq_u32): Likewise.
(vmls_n_u16): Likewise.
(vmls_n_u32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_n_u16): Likewise.
(vmlsq_n_u32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.

libgcc: Fix backtrace fallback on PowerPC Big-endian

At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs the cfa of the trace will be behind the context's cfa.
Also, libgo’s probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem; in this case
we are only interested in finding a stack map, so we neither need nor can
do a backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.

libgcc/ChangeLog:
PR libgcc/103044
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
called with a null argument or at the end of the backtrace and return.
* unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.

Fix some side cases of side effects discovery

I wrote a script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends.  This plugs the modref differences in handling
looping pure consts which were previously missed due to early exits on
ECF_CONST | ECF_PURE.  Those early exits are a bit annoying and I think as
a cleanup I may just drop some of them as premature optimizations coming from
the time modref was very simplistic on what it propagates.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc): Update.

tree-optimization/103190 - fix assert in reassoc stmt placement with asm

This makes sure to only assert we don't run into an asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p. We can handle EH edges from an asm just like
EH edges from any other stmt.

2021-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/103190
* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.

Move import population from threader to path solver.

Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block.  For example,
in the following snippet:

    <bb 2> :
    _1 = b_10 == block_11;
    _2 = b_10 != -1;
    _3 = _1 & _2;
    if (_3 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

...the imports to the block are b_10 and block_11 since they are both
needed to calculate _3.
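
The definition above can be illustrated with a small stand-alone sketch. It is not the path solver's code: the `sketch_*` names, the string representation of SSA names, and the fixed-size worklist are all assumptions made for illustration. It walks backwards from the conditional through the definitions inside the block and reports every name defined outside it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One statement of a block, "lhs = op1 <operator> op2".  The operator is
   irrelevant for finding imports, so it is not stored.  */
struct sketch_stmt
{
  const char *lhs, *op1, *op2;
};

/* Treat anything starting with a digit or '-' as a constant, not a name.  */
static int
sketch_is_constant (const char *s)
{
  return s[0] == '-' || (s[0] >= '0' && s[0] <= '9');
}

/* Return the statement in the block defining NAME, or NULL if NAME is
   defined outside the block.  */
static const struct sketch_stmt *
sketch_find_def (const struct sketch_stmt *stmts, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (stmts[i].lhs, name) == 0)
      return &stmts[i];
  return NULL;
}

/* Walk backwards from COND through the definitions inside the block and
   store every externally defined name once in OUT.  Returns the count.
   Assumes the block has no cyclic definitions (no PHIs).  */
size_t
sketch_compute_imports (const struct sketch_stmt *stmts, size_t n,
                        const char *cond, const char **out, size_t cap)
{
  const char *work[32];
  size_t n_work = 0, n_out = 0;

  work[n_work++] = cond;
  while (n_work > 0)
    {
      const char *name = work[--n_work];
      if (sketch_is_constant (name))
        continue;
      const struct sketch_stmt *def = sketch_find_def (stmts, n, name);
      if (def != NULL)
        {
          /* Defined in the block: look through to its operands.  */
          assert (n_work + 2 <= 32);
          work[n_work++] = def->op1;
          work[n_work++] = def->op2;
          continue;
        }
      /* Defined elsewhere: record it unless already recorded.  */
      size_t i = 0;
      while (i < n_out && strcmp (out[i], name) != 0)
        i++;
      if (i == n_out)
        {
          assert (n_out < cap);
          out[n_out++] = name;
        }
    }
  return n_out;
}

/* The block from the snippet above, with the conditional testing _3.  */
size_t
sketch_example (const char **out, size_t cap)
{
  static const struct sketch_stmt bb2[] = {
    { "_1", "b_10", "block_11" },
    { "_2", "b_10", "-1" },
    { "_3", "_1", "_2" },
  };
  return sketch_compute_imports (bb2, 3, "_3", out, cap);
}
```

Calling sketch_example yields the two names b_10 and block_11, matching the imports named above.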

The path solver takes a bitmap of imports in addition to the path
itself.  This sets up the number of SSA names to be on the lookout
for, while resolving the final conditional.

Calculating these imports was initially done in the threader, since it
was the only user of the path solver.  With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.

This patch moves the import code to the solver, making both the solver
and the threader simpler in the process.  This is because intent is
clearer and some duplicate code was removed.

This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125).  This was unexpected, but
welcome nevertheless.  There is no performance difference in callgrind
over the same suite.

Regstrapped on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::add_copies_to_imports):
Rename to...
(path_range_query::compute_imports): ...this.  Adapt it so it can
be passed the imports bitmap instead of working on m_imports.
(path_range_query::compute_ranges): Call compute_imports in all
cases unless an imports bitmap is passed.
* gimple-range-path.h (path_range_query::compute_imports): New.
(path_range_query::add_copies_to_imports): Remove.
* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
(back_threader::find_paths_to_names): Inline resolve_def.
(back_threader::find_paths): Call compute_imports.
(back_threader::resolve_phi): Adjust comment.

Testsuite: Various fixes for nios2.

2021-11-11 Sandra Loosemore <sandra@codesourcery.com>

gcc/testsuite/
* g++.dg/warn/Wmismatched-new-delete-5.C: Add
-fdelete-null-pointer-checks.
* gcc.dg/attr-returns-nonnull.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2.
* gcc.dg/ifcvt-4.c: Skip on nios2.
* gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.

tree-optimization/103188 - avoid running ranger on not-up-to-date SSA

The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without to avoid
running ranger on IL that has SSA form not updated.

2021-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/103188
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
Remove query parameter, split out check for size
optimization.
(ch_base::m_ranger, ch_base::m_query): Remove.
(ch_base::copy_headers): Split processing loop into
analysis around which we allocate and use ranger and
transform where we do not.
(pass_ch::execute): Do not allocate/free ranger here.
(pass_ch_vect::execute): Likewise.

* gcc.dg/torture/pr103188.c: New testcase.

Fix recursion discovery in ipa-pure-const

We mark self-recursive functions as looping for fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in
callgraph, but for trivial SCCs we miss the flag.

I think it is a bad decision since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch is fixing current behaviour to be consistent.
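
As a hedged illustration of why self recursion is treated like a loop, consider the hypothetical function below (the name and body are assumptions, not taken from the patch or its testcases). It reads no memory and has no side effects, yet it only terminates for even arguments, so a call to it cannot be discarded as freely as a call to a function that always returns.

```c
/* Looks const: no memory accesses and no side effects.  For an odd N the
   unsigned subtraction wraps around and N never reaches zero, so the
   recursion never terminates normally.  */
unsigned int
sketch_pairs (unsigned int n)
{
  return n == 0 ? 0 : 1 + sketch_pairs (n - 2);
}
```

For example, sketch_pairs (6) returns 3, while sketch_pairs (1) never returns.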

gcc/ChangeLog:

2021-11-11 Jan Hubicka <hubicka@ucw.cz>

* ipa-pure-const.c (propagate_pure_const): Self recursion is
a side effect.

Fix noreturn discovery.

Fix ipa-pure-const handling of noreturn flags.  It is not safe to set it for
interposable symbols and we should also set it for aliases (just like we do for
other flags).  This patch merely copies other flag handling and implements it
here.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function.
* cgraph.h (cgraph_node::set_noreturn_flag): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.

c++: use auto_vec in cp_parser_template_argument_list

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_argument_list): Use auto_vec
instead of manual memory management.

libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values

When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.
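
A minimal sketch of the idea, assuming a hypothetical per-thread record (the `sketch_*` names are illustrative and this is not libgomp's code): the team count is stored biased by one, so a zero-initialized record already reports one team and team number zero, and each thread sees only its own values.

```c
#include <assert.h>

/* Hypothetical per-thread record, loosely modelled on the num_teams and
   team_num members the patch adds to struct gomp_thread.  */
struct sketch_thread
{
  unsigned int num_teams;   /* Stored biased by one: 0 means one team.  */
  unsigned int team_num;
};

/* Each thread gets its own zero-initialized copy.  */
static _Thread_local struct sketch_thread sketch_thr;

int
sketch_get_num_teams (void)
{
  return (int) sketch_thr.num_teams + 1;
}

int
sketch_get_team_num (void)
{
  return (int) sketch_thr.team_num;
}

/* Enter team TEAM of N teams; returns the previous values so the caller
   can restore them afterwards.  */
struct sketch_thread
sketch_enter_team (unsigned int n, unsigned int team)
{
  assert (n >= 1 && team < n);
  struct sketch_thread saved = sketch_thr;
  sketch_thr.num_teams = n - 1;
  sketch_thr.team_num = team;
  return saved;
}

void
sketch_leave_team (struct sketch_thread saved)
{
  sketch_thr = saved;
}

/* Returns 1 if the values behave as described in the comments above.  */
int
sketch_demo (void)
{
  int ok = sketch_get_num_teams () == 1 && sketch_get_team_num () == 0;
  struct sketch_thread saved = sketch_enter_team (4, 2);
  ok = ok && sketch_get_num_teams () == 4 && sketch_get_team_num () == 2;
  sketch_leave_team (saved);
  return ok && sketch_get_num_teams () == 1 && sketch_get_team_num () == 0;
}
```

With a global variable instead of `_Thread_local`, two threads entering teams regions at the same time would overwrite each other's values, which is the first problem described above.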

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num.  Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.

Resolve entry loop condition for the edge remaining in the loop.

There is a known failure for gfortran.dg/vector_subscript_1.f90. It
was previously failing for all optimization levels except -Os.
Getting the loop header copying right, now makes it fail for all
levels :-).

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:

* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
statically to the edge remaining in the loop.

middle-end/103181 - fix operation_could_trap_p for vector division

For integer vector division we only checked for all zero vector
constants rather than checking whether any element in the constant
vector is zero.
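
The difference between the two checks can be sketched on a plain array standing in for a constant vector. The function names and the array representation are assumptions for illustration, not the code in tree-eh.c.

```c
#include <assert.h>
#include <stddef.h>

/* Check described as the old behaviour: only an all-zero divisor is
   considered trapping.  */
int
sketch_all_zero (const int *divisor, size_t n)
{
  assert (divisor != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    if (divisor[i] != 0)
      return 0;
  return 1;
}

/* Check described as the fix: a single zero element is enough, since
   that lane divides by zero.  */
int
sketch_any_zero (const int *divisor, size_t n)
{
  assert (divisor != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    if (divisor[i] == 0)
      return 1;
  return 0;
}

/* A divisor with exactly one zero lane.  */
static const int sketch_divisor[4] = { 1, 0, 3, 4 };
```

For sketch_divisor the all-zero check reports no trap even though the second lane divides by zero; the any-zero check catches it.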

2021-11-11 Richard Biener <rguenther@suse.de>

PR middle-end/103181
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division. Separate floating point and integer division code.

* gcc.dg/torture/pr103181.c: New testcase.

dwarf2out: Fix up field_byte_offset [PR101378]

For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size and so one can end up from the
computation with negative offset or offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.

* g++.dg/debug/dwarf2/pr101378.C: New test.

[aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr.

gcc/ChangeLog:
PR target/102376
* config/aarch64/aarch64.c (aarch64_process_target_attr): Check if
token is arch extension without leading '+' and emit appropriate
diagnostic for the same.

gcc/testsuite/ChangeLog:
PR target/102376
* gcc.target/aarch64/pr102376.c: New test.

openmp: Add support for 2 argument num_teams clause

In OpenMP 5.1, the num_teams clause accepts either one expression as before,
but with a changed meaning: rather than creating <= expression teams it now
creates == expression teams.  Or it accepts two expressions
separated by :, with the meaning that the first is low bound and second upper
bound on how many teams should be created.  The other ways to set number of
teams are upper bounds with lower bound of 1.
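
The three cases can be modelled as a pair of bounds. This is only a sketch of the semantics described above, with hypothetical `sketch_*` names; it is not how the compiler or libgomp represents the clause, and the lower <= upper precondition is an assumption of the model.

```c
#include <assert.h>

/* Inclusive range of team counts an implementation may create.  */
struct sketch_team_bounds
{
  unsigned int lower, upper;
};

/* num_teams (lower : upper)  */
struct sketch_team_bounds
sketch_two_arg (unsigned int lower, unsigned int upper)
{
  assert (lower >= 1 && lower <= upper);
  struct sketch_team_bounds b = { lower, upper };
  return b;
}

/* num_teams (expr): exactly expr teams under the 5.1 meaning.  */
struct sketch_team_bounds
sketch_one_arg (unsigned int expr)
{
  return sketch_two_arg (expr, expr);
}

/* Any other way of setting the number of teams: an upper bound only,
   with a lower bound of 1.  */
struct sketch_team_bounds
sketch_upper_only (unsigned int upper)
{
  return sketch_two_arg (1, upper);
}

/* Nonzero if creating CREATED teams respects the bounds.  */
int
sketch_count_allowed (struct sketch_team_bounds b, unsigned int created)
{
  return created >= b.lower && created <= b.upper;
}
```

Under this model num_teams (4) no longer allows 3 teams, while num_teams (2 : 4) does.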

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now, we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
    unsigned int retval.3;
    struct S * D.2549;
    struct S b;

    retval.3 = (unsigned int) retval.1;
    D.2549 = .omp_data_i->b;
    S::S (&b, D.2549);
    #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
    __builtin_GOMP_teams (retval.3, 0);
    {
      bar (&a, &b);
    }
    S::~S (&b);
    #pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs
exit equivalent (at least on NVPTX) which seems bad because that means
the destructors of e.g. private variables on target aren't invoked, and
at the current placement neither destructors of the already constructed
privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x .  I think ideal would be to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that, are .shared variables zero initialized by default,
or do they have random value at start?  PTX docs say they aren't initializable.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
* tree.c (omp_clause_num_ops): Increase num ops for
OMP_CLAUSE_NUM_TEAMS to 2.
* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
for OMP_CLAUSE_NUM_TEAMS.
* gimplify.c (gimplify_scan_omp_clauses): Gimplify
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
* semantics.c (finish_omp_clauses): Handle
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
* pt.c (tsubst_omp_clauses): Likewise.
(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use
OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
to half of the num_teams clauses.
* c-c++-common/gomp/num-teams-1.c: New test.
* c-c++-common/gomp/num-teams-2.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
to half of the num_teams clauses.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
* g++.dg/gomp/num-teams-1.C: New test.
* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/teams-1.c: New test.

Remove find_pdom and find_dom

This removes now useless wrappers around get_immediate_dominator.

2021-11-11 Richard Biener <rguenther@suse.de>

* cfganal.c (find_pdom): Remove.
(control_dependences::find_control_dependence): Remove
special-casing of entry block, call get_immediate_dominator
directly.
* gimple-predicate-analysis.cc (find_pdom): Remove.
(find_dom): Likewise.
(find_control_equiv_block): Call get_immediate_dominator
directly.
(compute_control_dep_chain): Likewise.
(predicate::init_from_phi_def): Likewise.

Apply TLC to control dependence compute

This makes the control dependence compute avoid a find_edge
and optimizes allocation by embedding the bitmap head into the
vector of control dependences instead of allocating all of them.
It also uses a local bitmap obstack.

The bitmap changes make it necessary to shuffle some includes.

2021-11-10 Richard Biener <rguenther@suse.de>

* cfganal.h (control_dependences::control_dependence_map):
Embed bitmap_head.
(control_dependences::m_bitmaps): New.
* cfganal.c (control_dependences::set_control_dependence_map_bit):
Adjust.
(control_dependences::clear_control_dependence_bitmap):
Likewise.
(control_dependences::find_control_dependence): Do not
find_edge for the abnormal edge test.
(control_dependences::control_dependences): Instead do not
add abnormal edges to the edge list. Adjust.
(control_dependences::~control_dependences): Likewise.
(control_dependences::get_edges_dependent_on): Likewise.
* function-tests.c: Include bitmap.h.

gcc/analyzer/
* supergraph.cc: Include bitmap.h.

gcc/c/
* gimple-parser.c: Shuffle bitmap.h include.

rs6000/doc: Rename future cpu with power10

Commit 5d9d0c94588 renamed future to power10 and ace60939fd2
updated the documentation for "future" renaming. This patch
is to rename the remaining "future architecture" references in
documentation and polish the words for float128.

gcc/ChangeLog:

* doc/invoke.texi: Change references to "future cpu" to "power10",
"-mcpu=future" to "-mcpu=power10". Adjust words for float128.

x86: Update -mtune=alderlake

Update mtune for alderlake.  Alder Lake Intel Hybrid Technology will not support
Intel® AVX-512. ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and
UMONITOR/UMWAIT/TPAUSE are supported.

gcc/ChangeLog

* config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake
from m_CORE_AVX2.
(processor_cost_table): Use alderlake_cost for Alderlake.
* config/i386/i386.c (ix86_sched_init_global): Handle Alderlake.
* config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake
cost.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake
issue rate to 4.
(ix86_adjust_cost): Handle Alderlake.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_USE_INCDEC): Likewise.
(X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_SAHF): Likewise.
(X86_TUNE_USE_BT): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_ONE_IF_CONV_INSN): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_USE_SIMODE_FIOP): Likewise.
(X86_TUNE_EXT_80387_CONSTANTS): Likewise.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise.
(X86_TUNE_USE_GATHER): Disable for Alderlake.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.

Extend vpcmov to handle V8HF/V16HFmode under TARGET_XOP.

gcc/ChangeLog:

PR target/103151
* config/i386/sse.md (V_128_256): Extend to V8HF/V16HF.
(avxsizesuffix): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103151.c: New test.

RISC-V: Fix wrong zifencei handling in riscv_subset_list::to_string

This issue caused zifencei to never be correctly appended to the ISA string.

gcc/ChangeLog

* common/config/riscv/riscv-common.c (riscv_subset_list::to_string): Fix
wrong macro checking.

Daily bump.

Allow loop header copying when first iteration condition is known.

As discussed in the PR, the loop header copying pass avoids doing so
when optimizing for size.  However, sometimes we can determine the
loop entry conditional statically for the first iteration of the loop.

This patch uses the path solver to determine the outgoing edge
out of preheader->header->xx.  If so, it allows header copying.  Doing
this in the loop optimizer saves us from doing gymnastics in the
threader which doesn't have the context to determine if a loop
transformation is profitable.
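
As a hand-written illustration (the functions below are hypothetical and not taken from the PR or its testcase), header copying turns a loop that tests its condition at the top into a guarded do-while. When the guard is statically true on entry, it folds away and the copy adds no extra test.

```c
/* Original shape: the exit condition is tested in the loop header.  */
int
sketch_sum_header_test (const int *a)
{
  int sum = 0;
  int i = 0;
  while (i < 4)
    {
      sum += a[i];
      i++;
    }
  return sum;
}

/* Shape after header copying.  The copied test "if (0 < 4)" in front of
   the loop is statically true for the first iteration, so only the test
   at the bottom of the loop remains.  */
int
sketch_sum_header_copied (const int *a)
{
  int sum = 0;
  int i = 0;
  do
    {
      sum += a[i];
      i++;
    }
  while (i < 4);
  return sum;
}

/* Sample input for comparing the two shapes.  */
static const int sketch_data[4] = { 1, 2, 3, 4 };
```

Both functions compute the same sum for sketch_data; only the placement of the exit test differs.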

I am only returning true in entry_loop_condition_is_static for
a true conditional.  Technically a false conditional is also
provably static, but allowing any boolean value causes a regression
in gfortran.dg/vector_subscript_1.f90.

I would have preferred not passing around the query object, but the
layout of pass_ch and should_duplicate_loop_header_p make it a bit
awkward to get it right without an outright refactor to the
pass.

Tested on x86-64 Linux.

gcc/ChangeLog:

PR tree-optimization/102906
* tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
(should_duplicate_loop_header_p): Call entry_loop_condition_is_static.
(class ch_base): Add m_ranger and m_query.
(ch_base::copy_headers): Pass m_query to
entry_loop_condition_is_static.
(pass_ch::execute): Allocate and deallocate m_ranger and
m_query.
(pass_ch_vect::execute): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr102906.c: New test.

[COMMITTED] aarch64: [PR103170] Fix aarch64_simd_dup<mode>

The problem here is aarch64_simd_dup<mode> uses
the vw iterator rather than vwcore iterator. This causes
problems for the V4SF and V2DF modes. I changed both of
aarch64_simd_dup<mode> patterns to be consistent.

Committed as obvious after a bootstrap/test on aarch64-linux-gnu.

PR target/103170

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>):
Use vwcore iterator for the r constraint output string.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/vector-dup-1.c: New test.

Fortran: avoid NULL pointer dereferences

CLASS(), PARAMETER is not yet properly implemented in gfortran. Using it
in declarations could lead to subsequent NULL pointer dereferences during
checking or simplification of expressions involving those CLASS variables.

gcc/fortran/ChangeLog:

PR fortran/103137
PR fortran/103138
* check.c (gfc_check_shape): Avoid NULL pointer dereference on
missing ref.
* simplify.c (gfc_simplify_cshift): Avoid NULL pointer dereference
when shape not set.
(gfc_simplify_transpose): Likewise.