review.tizen.org Git - platform/upstream/gcc.git/log

c++: Rename new -flang-note-module-read option [PR 99166]

I realized that the just-added flang-note-module-read option should
also cover module writes, and was therefore misnamed. This addresses
that, replacing it with a -flang-note-module-cmi pair of options. As
this was such a recent addition, I didn't leave the old option
available.

PR c++/99166
gcc/c-family/
* c.opt (-flang-info-module-cmi): Renamed option.
gcc/
* doc/invoke.texi (flang-info-module-cmi): Renamed option.
gcc/cp/
* module.cc (module_state::inform_cmi_p): Renamed field.
(module_state::do_import): Adjust.
(init_modules, finish_module_processing): Likewise.
(handle_module_option): Likewise.
gcc/testsuite/
* g++.dg/modules/pr99166_a.X: Adjust.
* g++.dg/modules/pr99166_b.C: Adjust.
* g++.dg/modules/pr99166_c.C: Adjust.
* g++.dg/modules/pr99166_d.C: Adjust.

c++tools: Make NETWORKING define check consistent [PR 98318]

PR98318 also pointed out that the NETWORKING #define was being checked
with both #if and #ifdef. Let's consistently use one form.

c++tools/
* server.cc: Use #if NETWORKING not #ifdef, to be consistent
with elsewhere.

libstdc++: Use uint32_t for all year_month_day::_S_from_days arithmetic

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day::_S_from_days): Perform
all calculations with type uint32_t.

pr95690.f90: move error line for CRIS.

I don't know what it is that ix86, x86_64, Solaris and
apparently CRIS has in common here.

According to
https://gcc.gnu.org/pipermail/gcc-testresults/2021-February/652763.html
m68k-unknown-linux-gnu is also in that bunch, but since
there's a *-*-solaris* in the target specifier and also m68k
vs. m68k*, I'm leaving the adjustment to a maintainer.

gcc/testsuite:
* gfortran.dg/pr95690.f90: CRIS error appears on line 5.

slp: Don't traverse tree on (nil) nodes.

The given testcase shows that one of the children of the complex MUL contains a
PHI node.  This results in the vectorizer having a child that's (nil).

The pattern matcher handles this correctly, but optimize_load_redistribution_1
needs to not traverse/inspect the NULL nodes.

This however does high-light a missed opportunity.  This testcase seems to
result in a different canonicalization than normally.

Normally the expressions are right leaning.  But sometimes, especially when type
casts are introduced the trees suddenly become left leaning. For instance this
testcase (even without type casts) won't detect the FMA form because the addition
gets the MUL node in the left and not right node as it expects.

Checking all forms would be quite expensive so for GCC 12 it probably makes sense to make
forms with type casts in them have the same form as those without?

gcc/ChangeLog:

* tree-vect-slp.c (optimize_load_redistribution_1): Abort on NULL nodes.

gcc/testsuite/ChangeLog:

* g++.dg/vect/simd-complex-num-null-node.cc: New test.

[PR99233] tesstsuite: Run test pr96264.c only for little endian

The test in question is assumed to work only for little endian target.

gcc/testsuite/ChangeLog:

PR testsuite/99233
* gcc.target/powerpc/pr96264.c: Run it only for powerpc64le.

PR middle-end/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams

Skip test when -shared is not supported.

2021-02-25 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
PR middle-end/97172
* gcc.dg/pr97172-2.c: Add dg-require-effective-target shared.

libstdc++: Document library versioning for GCC 11

libstdc++-v3/ChangeLog:

* doc/xml/manual/abi.xml: Document versioning for GCC 11.
* doc/html/manual/abi.html: Regenerate.

libstdc++: Do not assume std::FILE is complete [PR 99270]

libstdc++-v3/ChangeLog:

PR libstdc++/99270
* testsuite/27_io/headers/cstdio/types_std.cc: Use pointer to
FILE instead of FILE.

libstdc++: Update baseline symbols for {aarch64,ia64,m68k,riscv64}-linux

libstdc++-v3/
* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.

c++: Fix typo in module-mapper [PR 98318]

User reported this typo: '0' and '-' are right next to each other, and
as it happened I always had networking, so it went unnoticed.

PR c++/98318
gcc/cp/
* mapper-client.cc (module_client::open_module_client): Fix typo
of fd init.

libstdc++: Fix narrowing conversion in year_month_day [PR 99265]

libstdc++-v3/ChangeLog:

PR libstdc++/99265
* include/std/chrono (year_month_day::_S_from_days): Cast long
to int explicitly.

libstdc++: Add std::to_underlying for C++23

Implement P1682R2 as just approved for C++23.

libstdc++-v3/ChangeLog:

* include/std/utility (to_underlying): Define.
* include/std/version (__cpp_lib_to_underlying): Define.
* testsuite/20_util/to_underlying/1.cc: New test.
* testsuite/20_util/to_underlying/version.cc: New test.

Bump gcc/BASE-VER to 11.0.1 now that we are in stage4.

2021-02-25 Jakub Jelinek <jakub@redhat.com>

* BASE-VER: Bump to 11.0.1.

tree-optimization/99253 - fix reduction path check

This fixes an ordering problem with verifying that no intermediate
computations in a reduction path are used outside of the chain.  The
check was disabled for value-preserving conversions at the tail
but whether a stmt was a conversion or not was only computed after
the first use.  The following fixes this by re-ordering things
accordingly.

2021-02-25  Richard Biener  <rguenther@suse.de>

PR tree-optimization/99253
* tree-vect-loop.c (check_reduction_path): First compute
code, then verify out-of-loop uses.

* gcc.dg/vect/pr99253.c: New testcase.

match.pd: Use :s for (T)(A) + CST -> (T)(A + CST) [PR95798]

The r10-2806 change regressed following testcases, instead of doing
int -> unsigned long sign-extension once and then add 8, 16, ... 56 to it
for each of the memory access, it adds 8, 16, ... 56 in int mode and then
sign extends each.  So that means:
+       movq    $0, (%rsp,%rax,8)
+       leal    1(%rdx), %eax
+       cltq
+       movq    $1, (%rsp,%rax,8)
+       leal    2(%rdx), %eax
+       cltq
+       movq    $2, (%rsp,%rax,8)
+       leal    3(%rdx), %eax
+       cltq
+       movq    $3, (%rsp,%rax,8)
+       leal    4(%rdx), %eax
+       cltq
+       movq    $4, (%rsp,%rax,8)
+       leal    5(%rdx), %eax
+       cltq
+       movq    $5, (%rsp,%rax,8)
+       leal    6(%rdx), %eax
+       addl    $7, %edx
+       cltq
+       movslq  %edx, %rdx
+       movq    $6, (%rsp,%rax,8)
+       movq    $7, (%rsp,%rdx,8)
-       movq    $0, (%rsp,%rdx,8)
-       movq    $1, 8(%rsp,%rdx,8)
-       movq    $2, 16(%rsp,%rdx,8)
-       movq    $3, 24(%rsp,%rdx,8)
-       movq    $4, 32(%rsp,%rdx,8)
-       movq    $5, 40(%rsp,%rdx,8)
-       movq    $6, 48(%rsp,%rdx,8)
-       movq    $7, 56(%rsp,%rdx,8)
GCC 9 -> 10 change or:
-       movq    $0, (%rsp,%rdx,8)
-       movq    $1, 8(%rsp,%rdx,8)
-       movq    $2, 16(%rsp,%rdx,8)
-       movq    $3, 24(%rsp,%rdx,8)
-       movq    $4, 32(%rsp,%rdx,8)
-       movq    $5, 40(%rsp,%rdx,8)
-       movq    $6, 48(%rsp,%rdx,8)
-       movq    $7, 56(%rsp,%rdx,8)
+       movq    $0, (%rsp,%rax,8)
+       leal    1(%rdx), %eax
+       movq    $1, (%rsp,%rax,8)
+       leal    2(%rdx), %eax
+       movq    $2, (%rsp,%rax,8)
+       leal    3(%rdx), %eax
+       movq    $3, (%rsp,%rax,8)
+       leal    4(%rdx), %eax
+       movq    $4, (%rsp,%rax,8)
+       leal    5(%rdx), %eax
+       movq    $5, (%rsp,%rax,8)
+       leal    6(%rdx), %eax
+       movq    $6, (%rsp,%rax,8)
+       leal    7(%rdx), %eax
+       movq    $7, (%rsp,%rax,8)
change on the other test.  While for the former case of
int there is due to signed integer overflow (unless -fwrapv)
the possibility to undo it e.g. during expansion, for the unsigned
case information is unfortunately lost.

The following patch adds :s to the convert which restores these
testcases but keeps the testcases the patch meant to improve as is.

2021-02-25  Jakub Jelinek  <jakub@redhat.com>

PR target/95798
* match.pd ((T)(A) + CST -> (T)(A + CST)): Add :s to convert.

* gcc.target/i386/pr95798-1.c: New test.
* gcc.target/i386/pr95798-2.c: New test.

vrp: Handle VCE in vrp_simplify_cond_using_ranges [PR80635]

> So I wonder what other optimizations are prevented here?

> Why does uninit warn with VCE but not with NOP_EXPR?  Or does the
> warning disappear because of those other optimizations you mention?

The optimization that it prevents is in this particular case in tree-vrp.c
(vrp_simplify_cond_using_ranges):

      if (!is_gimple_assign (def_stmt)
          || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
        return;
so it punts on VIEW_CONVERT_EXPR, with NOP_EXPR it optimizes that:
  _9 = (bool) maybe_a$4_7;
  if (_9 != 0)
into:
  _9 = (bool) maybe_a$4_7;
  if (maybe_a$4_7 != 0)

Now, if I apply my patch but manually disable this
vrp_simplify_cond_using_ranges optimization, then the uninit warning is
back, so on the uninit side it is not about VIEW_CONVERT_EXPR vs. NOP_EXPR,
both are bad there, uninit wants the guarding condition to be
that SSA_NAME and not some demotion cast thereof.
We have:
  # maybe_a$m_6 = PHI <_5(4), maybe_a$m_4(D)(6)>
  # maybe_a$4_7 = PHI <1(4), 0(6)>
...
One of:
  _9 = VIEW_CONVERT_EXPR<bool>(maybe_a$4_7);
  if (_9 != 0)
or:
  _9 = (bool) maybe_a$4_7;
  if (_9 != 0)
or:
  if (maybe_a$4_7 != 0)
followed by:
    goto <bb 11>; [0.00%]
  else
    goto <bb 14>; [0.00%]
...
  <bb 11> [count: 0]:
  set (maybe_a$m_6);
and uninit wants to see that maybe_a$m_4(D) is not used if
bb 11 is encountered.

This patch fixes it by teaching vrp_simplify_cond_using_ranges
to handle VCE (when from an integral type) in addition to
NOP_EXPR/CONVERT_EXPR, of course as long as the VCE or demotion
doesn't change any values, i.e. when the range of the VCE or
conversion operand fits into the target type.

2021-02-25  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/80635
* tree-vrp.c (vrp_simplify_cond_using_ranges): Also handle
VIEW_CONVERT_EXPR if modes are the same, innerop is integral and
has mode precision.

* g++.dg/warn/pr80635-1.C: New test.
* g++.dg/warn/pr80635-2.C: New test.

Make the PR99220 fix more robust

This avoids interleaving pattern recognition and load redistribution
optimization since the load_map used in the latter is fragile with
respect to release and reuse of SLP nodes, something which can also
occur within the pattern recognition machinery.

2021-02-25 Richard Biener <rguenther@suse.de>

* tree-vect-slp.c (optimize_load_redistribution_1): Delay
load_map population.
(vect_match_slp_patterns_2): Revert part of last change.
(vect_analyze_slp): Do not interleave optimize_load_redistribution
with pattern detection but do it afterwards. Dump the
whole SLP graph after pattern recognition and load
redistribution optimization finished.

analyzer: fix false positive on realloc [PR99193]

PR analyzer/99193 describes various false positives from
-Wanalyzer-mismatching-deallocation on realloc(3) calls
of the form:

    |   31 |   void *p = malloc (1024);
    |      |             ^~~~~~~~~~~~~
    |      |             |
    |      |             (1) allocated here (expects deallocation with ‘free’)
    |   32 |   void *q = realloc (p, 4096);
    |      |             ~~~~~~~~~~~~~~~~~
    |      |             |
    |      |             (2) deallocated with ‘realloc’ here; allocation at (1) expects deallocation with ‘free’
    |

The underlying issue is that the analyzer has no knowledge of
realloc(3), and realloc has awkward semantics.

Unfortunately, the analyzer is currently structured so that each call
statement can only have at most one successor state; there is no
way to "bifurcate" the state, or have N-way splits into multiple
outcomes.  The existing "on_stmt" code works on a copy of the next
state, updating it in place, rather than copying it and making any
necessary changes.  I did this as an optimization to avoid unnecessary
copying of state objects, but it makes it hard to support multiple
outcomes.  (ideally our state objects would be immutable and thus
support trivial copying, alternatively, C++11 move semantics may
help here)

I attempted a few approaches to implementing bifurcation within the
existing state-update framework, but they were messy and thus likely
buggy; a proper implementation would rework state-updating to
generate copies, but this would be a major change, and seems too
late for GCC 11.

As a workaround, this patch implements enough of realloc(3) to
suppress the false positives.

This fixes the false positives in PR analyzer/99193.
I've filed PR analyzer/99260 to track "properly" implementing realloc(3).

gcc/analyzer/ChangeLog:
PR analyzer/99193
* region-model-impl-calls.cc (region_model::impl_call_realloc): New.
* region-model.cc (region_model::on_call_pre): Call it.
* region-model.h (region_model::impl_call_realloc): New decl.
* sm-malloc.cc (enum wording): Add WORDING_REALLOCATED.
(malloc_state_machine::m_realloc): New field.
(use_after_free::describe_state_change): Add case for
WORDING_REALLOCATED.
(use_after_free::describe_final_event): Likewise.
(malloc_state_machine::malloc_state_machine): Initialize
m_realloc.
(malloc_state_machine::on_stmt): Handle realloc by calling...
(malloc_state_machine::on_realloc_call): New.

gcc/testsuite/ChangeLog:
PR analyzer/99193
* gcc.dg/analyzer/pr99193-1.c: New test.
* gcc.dg/analyzer/pr99193-2.c: New test.
* gcc.dg/analyzer/pr99193-3.c: New test.
* gcc.dg/analyzer/realloc-1.c: New test.

Daily bump.

libstdc++: Fix order of arguments to sprintf [PR 99261]

libstdc++-v3/ChangeLog:

PR libstdc++/99261
* src/c++17/floating_to_chars.cc (sprintf_ld): Add extra args
before value to be printed.

libstdc++: Fix __floating_to_chars_precision for __float128

The code path in __floating_to_chars_precision for handling long double
by going through printf now also handles __float128, so the condition
that guards this code path needs to get updated accordingly.

libstdc++-v3/ChangeLog:

* src/c++17/floating_to_chars.cc (__floating_to_chars_precision):
Relax the condition that guards the printf code path to accept
F128_type as well as long double.

c++: Macro location fixes [PR 98718]

This fixes some issues with macro maps.  We were incorrectly
calculating the number of macro expansions in a location span, and I
had a workaround that partially covered that up.  Further, while macro
location spans are monotonic, that is not true of ordinary location
spans.  Thus we need to insert an indirection array when binary
searching the latter. (We load ordinary locations before loading
imports, but macro locations afterwards.  We make sure an import
location is de-macrofied, if needed.)

PR c++/98718
gcc/cp/
* module.cc (ool): New indirection vector.
(loc_spans::maybe_propagate): Location is not optional.
(loc_spans::open): Likewise.  Assert monotonically advancing.
(module_for_ordinary_loc): Use ool indirection vector.
(module_state::write_prepare_maps): Do not count empty macro
expansions.  Elide empty spans.
(module_state::write_macro_maps): Skip empty expansions.
(ool_cmp): New qsort comparator.
(module_state::write): Create and destroy ool vector.
(name_pending_imports): Fix dump push/pop.
(preprocess_module): Likewise.  Add more dumping.
(preprocessed_module): Likewise.
libcpp/
* include/line-map.h
* line-map.c
gcc/testsuite/
* g++.dg/modules/pr98718_a.C: New.
* g++.dg/modules/pr98718_b.C: New.

testsuite, coroutines : Make final_suspend calls noexcept.

The wording of [dcl.fct.def.coroutine]/15 states:
The expression co_await promise.final_suspend() shall not be
potentially-throwing. A fair number of testcases are not correctly
marked. Fixed here.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-await-void_type.C: Mark promise
final_suspend call as noexcept.
* g++.dg/coroutines/co-return-syntax-08-bad-return.C: Likewise.
* g++.dg/coroutines/co-return-syntax-10-movable.C: Likewise.
* g++.dg/coroutines/co-return-warning-1.C: Likewise.
* g++.dg/coroutines/co-yield-syntax-08-needs-expr.C: Likewise.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C: Likewise.
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: Likewise.
* g++.dg/coroutines/coro-missing-gro.C: Likewise.
* g++.dg/coroutines/coro-missing-promise-yield.C: Likewise.
* g++.dg/coroutines/coro-missing-ret-value.C: Likewise.
* g++.dg/coroutines/coro-missing-ret-void.C: Likewise.
* g++.dg/coroutines/coro-missing-ueh.h: Likewise.
* g++.dg/coroutines/coro1-allocators.h: Likewise.
* g++.dg/coroutines/coro1-refs-and-ctors.h: Likewise.
* g++.dg/coroutines/coro1-ret-int-yield-int.h: Likewise.
* g++.dg/coroutines/pr94682-preview-this.C: Likewise.
* g++.dg/coroutines/pr94752.C: Likewise.
* g++.dg/coroutines/pr94760-mismatched-traits-and-promise-prev.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr95050.C: Likewise.
* g++.dg/coroutines/pr95345.C: Likewise.
* g++.dg/coroutines/pr95440.C: Likewise.
* g++.dg/coroutines/pr95591.C: Likewise.
* g++.dg/coroutines/pr95711.C: Likewise.
* g++.dg/coroutines/pr95813.C: Likewise.
* g++.dg/coroutines/symmetric-transfer-00-basic.C: Likewise.
* g++.dg/coroutines/torture/co-await-07-tmpl.C: Likewise.
* g++.dg/coroutines/torture/co-await-17-capture-comp-ref.C: Likewise.
* g++.dg/coroutines/torture/co-ret-00-void-return-is-ready.C: Likewise.
* g++.dg/coroutines/torture/co-ret-01-void-return-is-suspend.C: Likewise.
* g++.dg/coroutines/torture/co-ret-03-different-GRO-type.C: Likewise.
* g++.dg/coroutines/torture/co-ret-04-GRO-nontriv.C: Likewise.
* g++.dg/coroutines/torture/co-ret-06-template-promise-val-1.C: Likewise.
* g++.dg/coroutines/torture/co-ret-08-template-cast-ret.C: Likewise.
* g++.dg/coroutines/torture/co-ret-09-bool-await-susp.C: Likewise.
* g++.dg/coroutines/torture/co-ret-15-default-return_void.C: Likewise.
* g++.dg/coroutines/torture/co-ret-17-void-ret-coro.C: Likewise.
* g++.dg/coroutines/torture/co-yield-00-triv.C: Likewise.
* g++.dg/coroutines/torture/co-yield-03-tmpl.C: Likewise.
* g++.dg/coroutines/torture/co-yield-04-complex-local-state.C: Likewise.
* g++.dg/coroutines/torture/exceptions-test-0.C: Likewise.
* g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C: Likewise.
* g++.dg/coroutines/torture/func-params-04.C: Likewise.
* g++.dg/coroutines/torture/local-var-06-structured-binding.C: Likewise.
* g++.dg/coroutines/torture/mid-suspend-destruction-0.C: Likewise.

openmp: Diagnose invalid teams nested in target construct [PR99226]

The OpenMP standard says:
"A teams region can only be strictly nested within the implicit parallel region
or a target region. If a teams construct is nested within a target construct,
that target construct must contain no statements, declarations or directives
outside of the teams construct."
We weren't diagnosing that restriction, because we need to allow e.g.
#pragma omp target
{{{{{{
   #pragma omp teams
   ;
}}}}}}
and as target doesn't need to have teams nested in it, using some special
parser of the target body didn't feel right.  And after the parsing,
the question is if e.g. already parsing of the clauses doesn't add some
statements before the teams statement (gimplification certainly will).

As we now have a bugreport where we ICE on the invalid code, this just
diagnoses a subset of the invalid programs, in particular those where
nest to the teams strictly nested in targets the target region contains
some other OpenMP construct.

2021-02-24  Jakub Jelinek  <jakub@redhat.com>

PR fortran/99226
* omp-low.c (struct omp_context): Add teams_nested_p and
nonteams_nested_p members.
(scan_omp_target): Diagnose teams nested inside of target with other
directives strictly nested inside of the same target.
(check_omp_nesting_restrictions): Set ctx->teams_nested_p or
ctx->nonteams_nested_p as needed.

* c-c++-common/gomp/pr99226.c: New test.
* gfortran.dg/gomp/pr99226.f90: New test.

libgcc: Avoid signed negation overflow in __powi?f2 [PR99236]

When these functions are called with integer minimum, there is UB on the libgcc
side. Fixed in the obvious way, the code in the end wants ABSU_EXPR behavior.

2021-02-24 Jakub Jelinek <jakub@redhat.com>

PR libgcc/99236
* libgcc2.c (__powisf2, __powidf2, __powitf2, __powixf2): Perform
negation of m in unsigned type.

[PR99123] inline-asm: Don't use decompose_mem_address to find used hard regs

Inline asm in question has empty constraint which means anything
including memory with invalid address. To check used hard regs we
used decompose_mem_address which assumes memory with valid address.
The patch implements the same semantics without assuming valid
addresses.

gcc/ChangeLog:

PR inline-asm/99123
* lra-constraints.c (uses_hard_regs_p): Don't use decompose_mem_address.

gcc/testsuite/ChangeLog:

PR inline-asm/99123
* gcc.target/i386/pr99123.c: New.

libstdc++: More efficient last day of month

This patch reimplements std::chrono::year_month_day_last:day() which yields the
last day of a particular month.  The current implementation uses a look-up table
implemented as an unsigned[12] array.  The new implementation instead
is based on
the fact that a month m in [1, 12], except for m == 2 (February), is
either 31 or
30 days long and m's length depends on two things: m's parity and whether m >= 8
or not. These two conditions are determined by the 0th and 3th bit of m and,
therefore, cheap and straightforward bit-twiddling can provide the right result.

Measurements in x86_64 [1] suggest a 10% performance boost.  Although this does
not seem to be huge, notice that measurements are done in hot L1 cache
conditions which might not be very representative of production runs. Also
freeing L1 cache from holding the look-up table might allow performance
improvements elsewhere.

References:
[1] https://github.com/cassioneri/calendar

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day_last:day): New
implementation.

libstdc++: More efficient is_leap

This patch reimplements std::chrono::year::is_leap().  Leap year check is
ubiquitously implemented (including here) as:

    y % 4 == 0 && (y % 100 != 0 || y % 400 == 0).

The rationale being that testing divisibility by 4 first implies an earlier
return for 75% of the cases, therefore, avoiding the needless calculations of
y % 100 and y % 400. Although this fact is true, it does not take into account
the cost of branching.  This patch, instead, tests divisibility by 100 first:

    (y % 100 != 0 || y % 400 == 0) && y % 4 == 0.

It is certainly counterintuitive that this could be more efficient since among
the three divisibility tests (4, 100 and 400) the one by 100 is the only one
that can never provide a definitive answer and a second divisibility test (by 4
or 400) is always required. However, measurements [1] in x86_64 suggest this is
3x more efficient!  A possible explanation is that checking divisibility by 100
first implies a split in the execution path with probabilities of (1%, 99%)
rather than (25%, 75%) when divisibility by 4 is checked first.  This decreases
the entropy of the branching distribution which seems to help prediction.

Given that y belongs to [-32767, 32767] [time.cal.year.members], a more
efficient algorithm [2] to check divisibility by 100 is used (instead of
y % 100 != 0).  Measurements suggest that this optimization improves performance
by 20%.

The patch adds a test that exhaustively compares the result of this
implementation with the ubiquitous one for all y in [-32767, 32767]. Although
its completeness, the test completes in a matter of seconds.

References:
[1] https://stackoverflow.com/a/60646967/1137388
[2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16

libstdc++-v3/ChangeLog:

* include/std/chrono (year::is_leap): New implementation.
* testsuite/std/time/year/2.cc: New test.

libstdc++: More efficient days from date

This patch reimplements std::chrono::year_month_day::_M_days_since_epoch()
which calculates the number of elapsed days since 1970/01/01.  The new
implementation is based on Proposition 6.2 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.

The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation).  The results, shown in Figure 3, indicate the new algorithm is
1.7 times faster than the current one.

The patch adds a test which loops through all dates in [-32767/01/01,
32767/12/31], and for each of them, gets the number of days and compares the
result against its expected value. The latter is calculated using a much
simpler and easy to understand algorithm but which is also much slower.

The dates used in the test covers the full range of possible values
[time.cal.year.members].  Despite its completeness the test runs in matter of
seconds.

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day::_M_days_since_epoch):
New implementation.
* testsuite/std/time/year_month_day/4.cc: New test.

libstdc++: More efficient date from days

This patch reimplements std::chrono::year_month_day::_S_from_days() which
retrieves a date from the number of elapsed days since 1970/01/01.  The new
implementation is based on Proposition 6.3 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.

The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation).  The results, shown in Figure 4, indicate the new algorithm is
2.2 times faster than the current one.

The patch adds a test which loops through all integers in [-12687428, 11248737],
and for each of them, gets the corresponding date and compares the result
against its expected value.  The latter is calculated using a much simpler and
easy to understand algorithm but which is also much slower.

The interval used in the test covers the full range of values for which a
roundtrip must work [time.cal.ymd.members].  Despite its completeness the test
runs in a matter of seconds.

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day::_S_from_days): New
implementation.
* testsuite/std/time/year_month_day/3.cc: New test.

cris: support -fstack-usage

All the bits were there, used with a pre-existing
-mmax-stackframe=SIZE which unfortunately seems to lack
test-cases.

Note that the early-return for -mno-prologue-epilogue (what
some targets call -mnaked) is deliberately not clearing
current_function_static_stack_size, as I consider that
erroneous usage but don't really care to emit a better error
message.

For stack-usage-1.c, like most ILP32 targets, CRIS (at -O0)
needs 4 bytes for the return-address. The default size of
256 seems ill chosen but not worth fixing.

gcc:
* config/cris/cris.c (cris_expand_prologue): Set
current_function_static_stack_size, if flag_stack_usage_info.

gcc/testsuite:
* gcc.dg/stack-usage-1.c: Adjust for CRIS.

libstdc++: Robustify long double std::to_chars testcase [PR98384]

The long double std::to_chars testcase currently verifies the
correctness of its output by comparing it to that of printf, so if
there's a mismatch between to_chars and printf, the test FAILs.  This
works well for the scientific, fixed and general formatting modes,
because the corresponding printf conversion specifiers (%e, %f and %g)
are rigidly specified.

But this doesn't work well for the hex formatting mode because the
corresponding printf conversion specifier %a is more flexibly specified.
For instance, the hexadecimal forms 0x1p+0, 0x2p-1, 0x4p-2 and 0x8p-3
are all equivalent and valid outputs of the %a specifier for the number 1.
The apparent freedom here is the choice of leading hex digit -- the
standard just requires that the leading hex digit is nonzero for
normalized numbers.

Currently, our hexadecimal formatting implementation uses 0/1/2 as the
leading hex digit for floating point types that have an implicit leading
mantissa bit which in practice means all supported floating point types
except x86 long double.  The latter type has a 64 bit mantissa with an
explicit leading mantissa bit, and for this type our implementation uses
the most significant four bits of the mantissa as leading hex digit.
This seems to be consistent with most printf implementations, but not
all, as PR98384 illustrates.

In order to avoid false-positive FAILs due to arbitrary disagreement
between to_chars and printf about the choice of leading hex digit, this
patch makes the testcase's verification via printf conditional on the
leading hex digits first agreeing.  An additional verification step is
also added: round-tripping the output of to_chars through from_chars
should recover the value exactly.

libstdc++-v3/ChangeLog:

PR libstdc++/98384
* testsuite/20_util/to_chars/long_double.cc: Include <optional>.
(test01): Simplify verifying the nearby values by using a
2-iteration loop and a dedicated output buffer to check that the
nearby values are different.  Factor out the printf-based
verification into a local function, and check that the leading
hex digits agree before comparing to the output of printf.  Also
verify the output by round-tripping it through from_chars.

c++: modules & -fpreprocessed [PR 99072]

When we read preprocessed source, we deal with a couple of special
location lines at the start of the file.  These provide information
about the original filename of the source and the current directory,
so we can process the source in the same manner.  When updating that
code, I had a somewhat philosophical question: Should the line table
contain evidence of the filename the user provided to the compiler?  I
figured to leave it there, as it did no harm.  But this defect shows
an issue.  It's in the line table and our (non optimizing) line table
serializer emits that filename.  Which means if one re-preprocesses
the original source to a differently-named intermediate file, the
resultant CMI is different.  Boo.  That's a difference that doesn't
matter, except the CRC matching then fails.  We should elide the
filename, so that one can preprocess to mktemp intermediate filenames
for whatever reason.

This patch takes the approach of expunging it from the line table --
so the line table will end up with exactly the same form.  That seems
a better bet than trying to fix up mismatching line tables in CMI
emission.

PR c++/99072
libcpp/
* init.c (read_original_filename): Expunge all evidence of the
original filename.
gcc/testsuite/
* g++.dg/modules/pr99072.H: New.

libstdc++: Define std::to_chars overloads for __ieee128 [PR 98389]

This adds overloads of std::to_chars for powerpc64's __ieee128, so that
std::to_chars can be used for long double when -mabi=ieeelongdouble is
in used.

Eventually we'll want to extend these new overloads to work for
__float128 on all targets that support that type. For now, we're only
doing it for powerpc64 when the new long double type is supported in
parallel to the old long double type.

Additionally the existing std::to_chars overloads for long double
are given the right symbol version, resolving PR libstdc++/98389.

libstdc++-v3/ChangeLog:

PR libstdc++/98389
* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Do not match to_chars
symbols for long double arguments mangled as 'g'.
* config/os/gnu-linux/ldbl-extra.ver: Likewise.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: Likewise.
* src/c++17/Makefile.am [GLIBCXX_LDBL_ALT128_COMPAT_TRUE]:
Use -mabi=ibmlongdouble for floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc (floating_type_traits_binary128):
New type defining type traits of IEEE binary128 format.
(floating_type_traits<__float128>): Define specialization.
(floating_type_traits<long double>): Define in terms of
floating_type_traits_binary128 when appropriate.
(floating_to_shortest_scientific): Handle __float128.
(sprintf_ld): New function template for printing a long double
or __ieee128 value using sprintf.
(__floating_to_chars_shortest, __floating_to_chars_precision):
Use sprintf_ld.
(to_chars): Define overloads for __float128.

libstdc++: Fix failing tests due to 'u' identifier in kernel header

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Undefine 'u' on powerpc*-linux*.

Rename next_insn_prefixed_p for improved clarity.

2021-02-24 Pat Haugen <pthaugen@linux.ibm.com>

gcc/
* config/rs6000/rs6000.c (next_insn_prefixed_p): Rename.
(rs6000_final_prescan_insn): Adjust.
(rs6000_asm_output_opcode): Likewise.

Fortran: Fix memory problems with assumed rank formal args [PR98342].

2021-02-24 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/98342
* trans-expr.c (gfc_conv_derived_to_class): Add optional arg.
'derived_array' to hold the fixed, parmse expr in the case of
assumed rank formal arguments. Deal with optional arguments.
(gfc_conv_procedure_call): Null 'derived' array for each actual
argument. Add its address to the call to gfc_conv_derived_to_
class. Access the 'data' field of scalar descriptors before
deallocating allocatable components. Also strip NOPs before the
calls to gfc_deallocate_alloc_comp. Use 'derived' array as the
input to gfc_deallocate_alloc_comp if it is available.
* trans.h : Include the optional argument 'derived_array' to
the prototype of gfc_conv_derived_to_class. The default value
is NULL_TREE.

gcc/testsuite/
PR fortran/98342
* gfortran.dg/assumed_rank_21.f90 : New test.

arm: Fix CMSE support detection in libgcc (PR target/99157)

As discussed in the PR, the Makefile fragment lacks a double '$' to
get the return-code from GCC invocation, resulting is CMSE support
missing from multilibs.

I checked that the simple patch proposed in the PR fixes the problem.

2021-02-23 Christophe Lyon <christophe.lyon@linaro.org>
Hau Hsu <hsuhau617@gmail.com>

PR target/99157
libgcc/
* config/arm/t-arm: Fix cmse support detection.

PR middle-end/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams

gcc/ChangeLog:
PR middle-end/97172
* attribs.c (attr_access::free_lang_data): Clear attribute arg spec
from function arguments.

gcc/c/ChangeLog:

PR middle-end/97172
* c-decl.c (free_attr_access_data): Clear attribute arg spec.

gcc/testsuite/ChangeLog:

PR middle-end/97172
* gcc.dg/pr97172-2.c: New test.

slp: fix accidental resource re-use of slp_tree (PR99220)

The attached testcase shows a bug where two nodes end up with the same pointer.
During the loop that analyzes all the instances
in optimize_load_redistribution_1 we do

      if (value)
        {
          SLP_TREE_REF_COUNT (value)++;
          SLP_TREE_CHILDREN (root)[i] = value;
          vect_free_slp_tree (node);
        }

when doing a replacement.  When this is done and the refcount for the node
reaches 0, the node is removed, which allows the libc to return the pointer
again in the next call to new, which it does..

First instance

note:   node 0x5325f48 (max_nunits=1, refcnt=2)
note:   op: VEC_PERM_EXPR
note:           { }
note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
note:           children 0x5325db0 0x5325200

Second instance

note:   node 0x5325f48 (max_nunits=1, refcnt=1)
note:   op: VEC_PERM_EXPR
note:           { }
note:           lane permutation { 0[0] 1[1] }
note:           children 0x53255b8 0x5325530

This will end up with the illegal construction of

note:   node 0x53258e8 (max_nunits=2, refcnt=2)
note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
note:           stmt 0 _16 = _14 - _15;
note:           stmt 1 _23 = _17 + _22;
note:           children 0x53257d8 0x5325d28
note:   node 0x53257d8 (max_nunits=2, refcnt=3)
note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
note:           load permutation { 0 1 }
note:   node 0x5325d28 (max_nunits=2, refcnt=8)
note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
note:           load permutation { 0 1 0 1 }

To prevent this we remove the node from the load_map if it's
about to be deleted.

gcc/ChangeLog:

PR tree-optimization/99220
* tree-vect-slp.c (optimize_load_redistribution_1): Remove
node from cache when it's about to be deleted.

gcc/testsuite/ChangeLog:

PR tree-optimization/99220
* g++.dg/vect/pr99220.cc: New test.

[comitted] Testsuite: Disable PR99149 test on big-endian

This patch disables the test for PR99149 on Big-endian
where for standard AArch64 the patterns are disabled.

gcc/testsuite/ChangeLog:

PR tree-optimization/99149
* g++.dg/vect/pr99149.cc: Disabled on BE.

coroutines : Adjust error handling for type-dependent coroutines [PR96251].

Although coroutines are not permitted to be constexpr, generic lambdas
are implicitly from C++17 and, because of this, a generic coroutine lambda
can be marked as potentially constexpr. As per the PR, this then fails when
type substitution is attempted because the check disallowing constexpr in
the coroutines code was overly restrictive.

This changes the error handing to mark the function as 'invalid_constexpr'
but suppresses the error in the case that we are instantiating a constexpr.

gcc/cp/ChangeLog:

PR c++/96251
* coroutines.cc (coro_common_keyword_context_valid_p): Suppress
error reporting when instantiating for a constexpr.

gcc/testsuite/ChangeLog:

PR c++/96251
* g++.dg/coroutines/pr96251.C: New test.

fold-const: Fix up ((1 << x) & y) != 0 folding for vectors [PR99225]

This optimization was written purely with scalar integers in mind,
can work fine even with vectors, but we can't use build_int_cst but
need to use build_one_cst instead.

2021-02-24 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99225
* fold-const.c (fold_binary_loc) <case NE_EXPR>: In (x & (1 << y)) != 0
to ((x >> y) & 1) != 0 simplifications use build_one_cst instead of
build_int_cst (..., 1). Formatting fixes.

* gcc.c-torture/compile/pr99225.c: New test.

slp: fix sharing of SLP only patterns.

The attached testcase ICEs due to a couple of issues.
In the testcase you have two SLP instances that share the majority of their
definition with each other.  One tree defines a COMPLEX_MUL sequence and the
other tree a COMPLEX_FMA.

The ice happens because:

1. the refcounts are wrong, in particular the FMA case doesn't correctly count
the references for the COMPLEX_MUL that it consumes.

2. when the FMA is created it incorrectly assumes it can just tear apart the MUL
node that it's consuming.  This is wrong and should only be done when there is
no more uses of the node, in which case the vector only pattern is no longer
relevant.

To fix the last part the SLP only pattern reset code was moved into
vect_free_slp_tree which results in cleaner code.  I also think it does belong
there since that function knows when there are no more uses of the node and so
the pattern should be unmarked, so when the the vectorizer is inspecting the BB
it doesn't find the now invalid vector only patterns.

The patch also clears the SLP_TREE_REPRESENTATIVE when stores are removed such
that we don't hit an error later trying to free the stmt_vec_info again.

Lastly it also tweaks the results of whether a pattern has been detected or not
to return true when another SLP instance has created a pattern that is only used
by a different instance (due to the trees being unshared).

Instead of ICEing this code now produces

        adrp    x1, .LANCHOR0
        add     x2, x1, :lo12:.LANCHOR0
        movi    v1.2s, 0
        mov     w0, 0
        ldr     x4, [x1, #:lo12:.LANCHOR0]
        ldrsw   x3, [x2, 16]
        ldr     x1, [x2, 8]
        ldrsw   x2, [x2, 20]
        ldr     d0, [x4]
        ldr     d2, [x1, x3, lsl 3]
        fcmla   v2.2s, v0.2s, v0.2s, #0
        fcmla   v2.2s, v0.2s, v0.2s, #90
        str     d2, [x1, x3, lsl 3]
        fcmla   v1.2s, v0.2s, v0.2s, #0
        fcmla   v1.2s, v0.2s, v0.2s, #90
        str     d1, [x1, x2, lsl 3]
        ret

PS. This testcase actually shows that the codegen we get in these cases is not
optimal. It should generate a MUL + ADD instead MUL + FMA.

But that's for GCC 12.

gcc/ChangeLog:

PR tree-optimization/99149
* tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the
buffer.
(vect_slp_reset_pattern): Remove.
(complex_fma_pattern::matches): Remove call to vect_slp_reset_pattern.
(complex_mul_pattern::build, complex_fma_pattern::build,
complex_fms_pattern::build): Fix ref counts.
* tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern relevancy
when node is being deleted.
(vect_match_slp_patterns_2): Correct result of cache hit on patterns.
(vect_schedule_slp): Invalidate SLP_TREE_REPRESENTATIVE of removed
stores.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value.

gcc/testsuite/ChangeLog:

PR tree-optimization/99149
* g++.dg/vect/pr99149.cc: New test.

Revert: "Don't build insn-extract.o with rtl checking"

This reverts commit 8441545d4f2afb9e9342e0dac378eafd03f00462.

c/99224 - avoid ICEing on invalid __builtin_next_arg

This avoids crashes with __builtin_next_arg on non-parameters. For
the specific testcase we arrive with an anonymous SSA_NAME so that
SSA_NAME_VAR becomes NULL and we crash.

2021-02-24 Richard Biener <rguenther@suse.de>

PR c/99224
* builtins.c (fold_builtin_next_arg): Avoid NULL arg.

* gcc.dg/pr99224.c: New testcase.

Daily bump.

rs6000: Add support for compatibility built-ins

The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. It's too late to remove the
old names, so this patch renames the built-ins to the new names and then
adds support for creating compatibility built-ins (ie, multiple built-in
functions generate the same code) and then creates compatibility built-ins
using the old names.

2021-02-23 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/mma.md (mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
(*mma_assemble_pair): Rename from this...
(*vsx_assemble_pair): ...to this.
(mma_disassemble_pair): Rename from this...
(vsx_disassemble_pair): ...to this.
(*mma_disassemble_pair): Rename from this...
(*vsx_disassemble_pair): ...to this.
* config/rs6000/rs6000-builtin.def (BU_MMA_V2, BU_MMA_V3,
BU_COMPAT): New macros.
(mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
(mma_disassemble_pair): Rename from this...
(vsx_disassemble_pair): ...to this.
(mma_assemble_pair): New compatibility built-in.
(mma_disassemble_pair): Likewise.
* config/rs6000/rs6000-call.c (struct builtin_compatibility): New.
(RS6000_BUILTIN_COMPAT): Define.
(bdesc_compat): New.
(mma_expand_builtin): Use VSX_BUILTIN_DISASSEMBLE_PAIR_INTERNAL.
(rs6000_gimple_fold_mma_builtin): Use MMA_BUILTIN_DISASSEMBLE_PAIR
and VSX_BUILTIN_ASSEMBLE_PAIR.
(rs6000_init_builtins): Register compatibility built-ins.
(mma_init_builtins): Use VSX_BUILTIN_ASSEMBLE_PAIR,
VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL, VSX_BUILTIN_DISASSEMBLE_PAIR and
VSX_BUILTIN_DISASSEMBLE_PAIR_INTERNAL.
* doc/extend.texi (__builtin_mma_assemble_pair): Rename from this...
(__builtin_vsx_assemble_pair): ...to this.
(__builtin_mma_disassemble_pair): Rename from this...
(__builtin_vsx_disassemble_pair): ...to this.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-4.c: Add tests for
__builtin_vsx_assemble_pair and __builtin_vsx_disassemble_pair.
Add __has_builtin tests for built-ins.
Update expected instruction counts.

PR c++/99074 - crash in dynamic_cast<>() on null pointer

libstdc++-v3/ChangeLog:

PR c++/99074
* libsupc++/dyncast.cc (__dynamic_cast): Return null when
first argument is null.

gcc/testsuite/ChangeLog:

PR c++/99074
* g++.dg/warn/Wnonnull11.C: New test.

Fortran: Fix for class defined operators [PR99124].

2021-02-23 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/99124
* resolve.c (resolve_fl_procedure): Include class results in
the test for F2018, C15100.
* trans-array.c (get_class_info_from_ss): Do not use the saved
descriptor to obtain the class expression for variables. Use
gfc_get_class_from_expr instead.

gcc/testsuite/
PR fortran/99124
* gfortran.dg/class_defined_operator_2.f03 : New test.
* gfortran.dg/elemental_result_2.f90 : New test.
* gfortran.dg/class_assign_4.f90: Correct the non-conforming
elemental function with an allocatable result with an operator
interface with array dummies and result.

PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980

Make sure that the string length is properly set during simplification
even when the resulting array is zero-sized.

gcc/fortran/ChangeLog:

PR fortran/99206
* simplify.c (gfc_simplify_reshape): Set string length for
character arguments.

gcc/testsuite/ChangeLog:

PR fortran/99206
* gfortran.dg/reshape_zerosize_4.f90: New test.

c++: typedef for linkage [PR 99208]

Unnamed types with a typedef name for linkage were always troublesome
in modules.  This is the underlying cause of that trouble -- we were
creating incorrect type structures.  Classes have an implicit
self-reference, and we created that for unnamed classes too.  It turns
out we make use of this member, so just not generating it turned into
a rathole.  This member is created using the anonymous name -- because
we've not yet met the typedef name.  When we retrofit the typedef name
we were checking identifier matching and changing all type variants
with that identifier.  Which meant we ended up with a strange typedef
for the self reference.  This fixes things to check for DECL identity
of the variants, so we don't smash the self-reference -- that
continues to have the anonymous name.

PR c++/99208
gcc/cp/
* decl.c (name_unnamed_type): Check DECL identity, not IDENTIFIER
identity.
gcc/testsuite/
* g++.dg/modules/pr99208_a.C: New.
* g++.dg/modules/pr99208_b.C: New.

IPA ICF + ASAN: do not merge vars with different alignment

gcc/ChangeLog:

PR sanitizer/99168
* ipa-icf.c (sem_variable::merge): Do not merge 2 variables
with different alignment. That leads to an invalid red zone
size allocated in runtime.

gcc/testsuite/ChangeLog:

PR sanitizer/99168
* c-c++-common/asan/pr99168.c: New test.

c++: Fix folding of non-dependent BASELINKs [PR95468]

Here, the problem ultimately seems to be that tsubst_copy_and_build,
when called with empty args as we do during non-dependent expression
folding, doesn't touch BASELINKs at all: it delegates to tsubst_copy
which then immediately exits early due to the empty args. This means
that the CAST_EXPR int(1) in the BASELINK A::condition<int(1)> never
gets folded (as part of folding of the overall CALL_EXPR), which later
causes us to crash when performing overload resolution of the rebuilt
CALL_EXPR (which is still in terms of this templated BASELINK).

This doesn't happen when condition() is a namespace-scope function
because then condition<int(1)> is represented by a TEMPLATE_ID_EXPR
rather than by a BASELINK, which does get handled directly from
tsubst_copy_and_build.

This patch fixes this issue by having tsubst_copy_and_build handle
BASELINK directly rather than delegating to tsubst_copy, so that it
processes BASELINKs even when args is empty.

gcc/cp/ChangeLog:

PR c++/95468
* pt.c (tsubst_copy_and_build) <case BASELINK>: New case, copied
over from tsubst_copy.

gcc/testsuite/ChangeLog:

PR c++/95468
* g++.dg/template/non-dependent15.C: New test.

c++: Micro-optimize instantiation_dependent_expression_p

This makes instantiation_dependent_expression_p avoid calling
potential_constant_expression when processing_template_decl isn't set
(and hence when value_dependent_expression_p is definitely false).

gcc/cp/ChangeLog:

* pt.c (instantiation_dependent_expression_p): Check
processing_template_decl before calling
potential_constant_expression.

Fix UBSAN in __ubsan::Value::getSIntValue

/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.

config.sub, config.guess : Import upstream 2021-01-25.

Hi

Does it update config.sub and config.guess, I know it's already
stage 4, but the config.* stuff update should be harmless things,
and we need this for RISC-V big-endian support, which is already
supported in binutils 2.36.

This imports from:

sha1 6faca61810d335c7837f320733fe8e15a1431fc2

ChangeLog:

* config.guess: Import latest upstream.
* config.sub: Import latest upstream.

fold-const: Fix ICE in fold_read_from_constant_string on invalid code [PR99204]

fold_read_from_constant_string and expand_expr_real_1 have code to optimize
constant reads from string (tree vs. rtl).
If the STRING_CST array type has zero low bound, index is fold converted to
sizetype and so the compare_tree_int works fine, but if it has some other
low bound, it calls size_diffop_loc and that function from 2 sizetype
operands creates a ssizetype difference. expand_expr_real_1 then uses
tree_fits_uhwi_p + compare_tree_int and so works fine, but fold-const.c
only checked if index is INTEGER_CST and calls compare_tree_int, which means
for negative index it will succeed and result in UB in the compiler.

2021-02-23 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99204
* fold-const.c (fold_read_from_constant_string): Check that
tree_fits_uhwi_p (index) rather than just that index is INTEGER_CST.

* gfortran.dg/pr99204.f90: New test.

libstdc++: Fix up constexpr std::char_traits<char>::compare [PR99181]

Because of LWG 467, std::char_traits<char>::lt compares the values
cast to unsigned char rather than char, so even when char is signed
we get unsigned comparision.  std::char_traits<char>::compare uses
__builtin_memcmp and that works the same, but during constexpr evaluation
we were calling __gnu_cxx::char_traits<char_type>::compare.  As
char_traits::lt is not virtual, __gnu_cxx::char_traits<char_type>::compare
used __gnu_cxx::char_traits<char_type>::lt rather than
std::char_traits<char>::lt and thus compared chars as signed if char is
signed.
This change fixes it by inlining __gnu_cxx::char_traits<char_type>::compare
into std::char_traits<char>::compare by hand, so that it calls the right
lt method.

2021-02-23  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/99181
* include/bits/char_traits.h (char_traits<char>::compare): For
constexpr evaluation don't call
__gnu_cxx::char_traits<char_type>::compare but do the comparison loop
directly.

* testsuite/21_strings/char_traits/requirements/char/99181.cc: New
test.

libstdc++: Fix up parallel_backend_serial.h [PR97549]

In GCC 10, parallel_backend.h just included parallel_backend_{serial,tbb}.h and
did nothing beyond that, and parallel_backend_tbb.h provided directly
namespace __pstl { namespace __par_backend { ... } }
and defined everything in there, while parallel_backend_serial.h did:
namespace __pstl { namespace __serial { ... } } and had this
namespace __pstl { namespace __par_backend { using namespace __pstl::__serial; } }
at the end.
In GCC 11, parallel_backend.h does:
namespace __pstl { namespace __par_backend = __serial_backend; }
after including parallel_backend_serial.h or
namespace __pstl { namespace __par_backend = __tbb_backend; }
after including parallel_backend_tbb.h.  The latter then has:
namespace __pstl { namespace __tbb_backend { ... } }
and no using etc. at the end, while parallel_backend_serial.h changed to:
namespace __pstl { namespace __serial_backend { ... } }
but has this leftover block from the GCC 10 times.  Even changing that
using namespace __pstl::__serial;
to
using namespace __pstl::__serial_backend;
doesn't work, as it clashes with
namespace __pstl { namespace __par_backend = __serial_backend; }
in parallel_backend.h.

2021-02-23  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/97549
* include/pstl/parallel_backend_serial.h: Remove __pstl::__par_backend.

rs6000: Use rldimi for vec init instead of shift + ior

This patch is to teach unsigned int vector init to use rldimi to
merge two integers instead of shift and ior.  It also adds one
required splitter made by Segher.

An rl*imi is usually written as an IOR of an ASHIFT or similar, and
an AND of a register with a constant mask.  In some cases combine
knows that that AND doesn't do anything (because all zero bits in
that mask correspond to bits known to be already zero), and then no
pattern matches.  This patch adds a define_split for such cases.

It uses nonzero_bits in the condition of the splitter, but does not
need it afterwards for the instruction to be recognised.  This is
necessary because later passes can see fewer nonzero_bits.

Because it is a splitter, combine will only use it when starting with
three insns (or more), even though the result is just one.  This isn't
a huge problem in practice, but some possible combinations still won't
happen.

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.

gcc/ChangeLog:

2020-02-23  Segher Boessenkool  <segher@kernel.crashing.org>
    Kewen Lin  <linkw@gcc.gnu.org>

* config/rs6000/rs6000.md (*rotl<mode>3_insert_3): Renamed to...
(rotl<mode>3_insert_3): ...this.
(plus_ior_xor): New code_iterator.
(define_split for GPR rl*imi): New splitter.
* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
for integer merging.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-init-10.c: New test.

libstdc++: Fix endianness issue with IBM long double [PR98384]

The code in std::to_chars for extracting the high- and low-order parts
of an IBM long double value does the right thing on powerpc64le, but not
on powerpc64be. This patch makes the extraction endian-agnostic, which
fixes the execution FAIL of to_chars/long_double.cc on powerpc64be.

libstdc++-v3/ChangeLog:

PR libstdc++/98384
* src/c++17/floating_to_chars.cc (get_ieee_repr): Extract
the high- and low-order parts from an IBM long double value
in an endian-agnostic way.

Update gcc sv.po.

* sv.po: Update.

g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets

Looking at commit de05c19d5fd6, that adjustment to these
tests apparently assumed that the testsuite is run (only) on
targets where structure memory layout has padding as per
"natural alignment". For cris-elf, a target with no padding
in structure memory layout, these tests have been failing
since that commit.

Tested cris-elf and x86_64-linux, committed as obvious.

gcc/testsuite:
* g++.dg/warn/Wplacement-new-size-1.C,
g++.dg/warn/Wplacement-new-size-2.C,
g++.dg/warn/Wplacement-new-size-6.C: Adjust for
default_packed targets.

Daily bump.

analyzer: handle error/error_at_line [PR99196]

PR analyzer/99196 describes a false positive from -fanalyzer due to
the analyzer not "knowing" that calls to GNU libc's error(3) with a
nonzero status terminate the process and thus don't return.

This patch fixes the false positive by special-casing "error" and
"error_at_line".

gcc/analyzer/ChangeLog:
PR analyzer/99196
* engine.cc (exploded_node::on_stmt): Provide terminate_path
flag as a way for on_call_pre to terminate the current analysis
path.
* region-model-impl-calls.cc (call_details::num_args): New.
(region_model::impl_call_error): New.
* region-model.cc (region_model::on_call_pre): Add param
"out_terminate_path". Handle "error" and "error_at_line".
* region-model.h (call_details::num_args): New decl.
(region_model::on_call_pre): Add param "out_terminate_path".
(region_model::impl_call_error): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/99196
* gcc.dg/analyzer/error-1.c: New test.
* gcc.dg/analyzer/error-2.c: New test.
* gcc.dg/analyzer/error-3.c: New test.

Require SHF_GNU_RETAIN support for retain tests

Since retain attribute requires SHF_GNU_RETAIN, run retain tests only
if SHF_GNU_RETAIN is supported.

PR testsuite/99173
* c-c++-common/attr-retain-5.c: Require R_flag_in_section.
* c-c++-common/attr-retain-6.c: Likewise.
* c-c++-common/attr-retain-7.c: Likewise.
* c-c++-common/attr-retain-8.c: Likewise.
* c-c++-common/attr-retain-9.c: Likewise.

aarch64: Add internal tune flag to minimise VL-based scalar ops

This patch introduces an internal tune flag to break up VL-based scalar ops
into a GP-reg scalar op with the VL read kept separate. This can be preferable on some CPUs.

I went for a tune param rather than extending the rtx costs as our RTX costs tables aren't set up to track
this intricacy.

I've confirmed that on the simple loop:
void vadd (int *dst, int *op1, int *op2, int count)
{
  for (int i = 0; i < count; ++i)
    dst[i] = op1[i] + op2[i];
}

we now split the incw into a cntw outside the loop and the add inside.

+       cntw    x5
...
loop:
-       incw    x4
+       add     x4, x4, x5

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
Define.
* config/aarch64/aarch64.md (add<mode>3): Force CONST_POLY_INT immediates
into a register when the above is enabled.
* config/aarch64/aarch64.c (neoversev1_tunings):
AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
(aarch64_rtx_costs): Use AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.

gcc/testsuite/

* gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.

Fix a comment line that was too long.

libgcc/
2021-02-22 Michael Meissner <meissner@linux.ibm.com>

* dfp-bit.c (BFP_TO_DFP): Fix a comment line that was too long.

Add conversions between _Float128 and Decimal.

This patch implements conversions between _Float128 and the 3 Decimal floating
types.  It does this by extendending the dfp-bit conversions to add a new
binary floating point type (KF), and doing the conversions in the same manner
as the other binary/decimal conversions.

For conversions from _Float128 to Decimal, this patch uses a function
(__sprintfkf) instead of the sprintf function to convert long double values to
strings.  The __sprintfkf function determines if GLIBC 2.32 or newer is used
and calls the IEEE 128-bit version of sprintf (__sprintfieee128).  If the GLIBC
is earlier than 2.32, the code will convert _Float128 to __ibm128 and then use
the normal sprintf to convert this value.

For conversions from Decimal to _Float128, this patch uses a function
(__strtokf) instead of strtold to convert the strings from the Decimal
conversion to long double.  The __strtokf function determines if GLIBC 2.32 or
newer is used, and if it is, calls the IEEE 128-bit version (__strtoieee128).
If the GLIBC is earlier than 2.32, the code will call strtold and convert the
__ibm128 value to _Float128.

These functions will primarily be used if/when the default PowerPC long double
type is changed to IEEE 128-bit, but they could also be used if the user
explicitly converts _Float128 to/from a Decimal type.

libgcc/
2021-02-22  Michael Meissner  <meissner@linux.ibm.com>

* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_sprintfkf.c: New file.
* config/rs6000/_sprintfkf.h: New file.
* config/rs6000/_strtokf.h: New file.
* config/rs6000/_strtokf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
* config/rs6000/quad-float128.h: Add new declarations.
* config/rs6000/t-float128 (fp128_dec_funcs): New macro.
(fp128_decstr_funcs): New macro.
(ibm128_dec_funcs): New macro.
(fp128_ppc_funcs): Add the new conversions.
(fp128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(fp128_decstr_objs): Force __float128 <-> string conversions to be
compiled with -mabi=ibmlongdouble.
(ibm128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(FP128_CFLAGS_DECIMAL): New macro.
(IBM128_CFLAGS_DECIMAL): New macro.
* dfp-bit.c (DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
* dfp-bit.h (BFP_KIND): Add new binary floating point kind for
IEEE 128-bit floating point.
(DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
(BFP_SPRINTF): New macro.

g++.dg/warn/Warray-bounds-10..13: Fix for 32-bit newlib targets

See gcc/config/newlib-stdint.h, where targets that have
LONG_TYPE_SIZE == 32, get __INT32_TYPE__ defined to "long
int".

All these tests have "typedef __INT32_TYPE__ int32_t;".
Thus the tests fail for 32-bit newlib targets due to related
warning messages being matched to "aka int" where the
emitted message for these targets have "aka long int".

Tested cris-elf and x86_64-linux, committed as obvious.

gcc/testsuite:
* g++.dg/warn/Warray-bounds-10.C, g++.dg/warn/Warray-bounds-11.C,
g++.dg/warn/Warray-bounds-12.C, g++.dg/warn/Warray-bounds-13.C:
Handle __INT32_TYPE__ being "long int".

testsuite/gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4.

Also, tweak the scan-assembler regexps to include a tab,
lest they may spuriously match file-paths in the emitted
assembly code, should some be added at some point. And, add
"mul", "move" and (non-addi-)"add" to insns that shouldn't
appear.

gcc/testsuite:
* gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4.

testsuite/gcc.target/cris/biap-mul.c: New test.

gcc/testsuite:
* gcc.target/cris/biap-mul.c: New test.

cris: Fix addi insn mult vs. shift canonicalization

Ever since the canonicalization clean-up of (mult X (1 << N)) into
(ashift X N) outside addresses, the CRIS addi patterns have been
unmatched.  No big deal.

Unfortunately, nobody thought of adjusting reloaded addresses, so
transforming mult into a shift has to be a kludged for when reload
decides that it has to move an address like (plus (mult reg0 4) reg1)
into a register, as happens building libgfortran.  (No, simplify_rtx
et al don't automatically DTRT.)  Something less kludgy would make
sense if it wasn't for the current late development stage and reload
being deprecated.  I don't know whether this issue is absent for LRA,
though.

I added a testsuite for the reload issue, despite being exposed by a
libgfortran build, so the issue would be covered by C/C++ builds, but
to the CRIS test-suite, not as a generic test, to avoid bad feelings
from anyone preferring short test-times to redundant coverage.

gcc:
* config/cris/cris.c (cris_print_operand) <'T'>: Change
valid operand from is now an addi mult-value to shift-value.
* config/cris/cris.md (*addi): Change expression of scaled
operand from mult to ashift.
* config/cris/cris.md (*addi_reload): New insn_and_split.

gcc/testsuite:
* gcc.target/cris/torture/sync-reload-mul-1.c: New test.

c++: Recursing header imports and other duplicates [PR 99174]

The fix for 98741 introduced two issues.  (a) recursive header units
iced because we tried to read the preprocessor state after having
failed to read the config.  (b) we could have duplicate imports of
named modules in our pending queue, and that would lead to duplicate
requests for pathnames, which coupled with the use of a null-pathname
to indicate we'd asked could lead to desynchronization with the module
mapper.  Fixed by adding a 'visited' flag to module state, so we ask
exactly once.

PR c++/99174
gcc/cp
* module.cc (struct module_state): Add visited_p flag.
(name_pending_imports): Use it to avoid duplicate requests.
(preprocess_module): Don't read preprocessor state if we failed to
load a module's config.
gcc/testsuite/
* g++.dg/modules/pr99174-1_a.C: New.
* g++.dg/modules/pr99174-1_b.C: New.
* g++.dg/modules/pr99174-1_c.C: New.
* g++.dg/modules/pr99174.H: New.

Add mi_thunk support for vcalls on hppa.

gcc/ChangeLog:

PR target/85074
* config/pa/pa.c (TARGET_ASM_CAN_OUTPUT_MI_THUNK): Define as
hook_bool_const_tree_hwi_hwi_const_tree_true.
(pa_asm_output_mi_thunk): Add support for nonzero vcall_offset.

c++: cross-header-unit template definitions [PR 99153]

A member function can be defined in a different header-file than the
one defining the class.  In such situations we must unmark the decl as
imported.  When the entity is a template we failed to unmark the
template_decl.

Perhaps the duplication of these flags on the template_decl from the
underlying decl is an error.  I set on the fence about it for a long
time during development, but I don't think now is the time to change
that (barring catastrophic bugs).

PR c++/99153
gcc/cp/
* decl.c (duplicate_decls): Move DECL_MODULE_IMPORT_P propagation
to common-path.
* module.cc (set_defining_module): Add assert.
gcc/testsuite/
* g++.dg/modules/pr99153_a.H: New.
* g++.dg/modules/pr99153_b.H: New.

ira: Make sure allocno copies are ordered [PR98791]

gcc/ChangeLog:
2021-02-22 Andre Vieira <andre.simoesdiasvieira@arm.com>

PR rtl-optimization/98791
* ira-conflicts.c (process_regs_for_copy): Don't create allocno copies
for unordered modes.

gcc/testsuite/ChangeLog:
2021-02-22 Andre Vieira <andre.simoesdiasvieira@arm.com>

PR rtl-optimization/98791
* gcc.target/aarch64/sve/pr98791.c: New test.

Fortran/OpenMP: Fix optional dummy procedures [PR99171]

gcc/fortran/ChangeLog:

PR fortran/99171
* trans-openmp.c (gfc_omp_is_optional_argument): Regard optional
dummy procs as nonoptional as no special treatment is needed.

libgomp/ChangeLog:

PR fortran/99171
* testsuite/libgomp.fortran/dummy-procs-1.f90: New test.

Fix ICE in tree_inlinable_function_p.

After g:1a2a7096e5e20d736c6138179470b21aa5a74864 we forbid inlining
for a VLA types. What we miss is setting inline_forbidden_reason
variable.

Fixes:

./xgcc -B. -O3 -c /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c -Winline

during GIMPLE pass: local-fnsummary
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c: In function ‘foo’:
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c:21:1: internal compiler error: Segmentation fault
21 | }
| ^
0xe8b2ca crash_signal
/home/marxin/Programming/gcc/gcc/toplev.c:327
0x1a92733 pp_format(pretty_printer*, text_info*)
/home/marxin/Programming/gcc/gcc/pretty-print.c:1096
0x1a76b90 diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
/home/marxin/Programming/gcc/gcc/diagnostic.c:1244
0x1a79994 diagnostic_impl
/home/marxin/Programming/gcc/gcc/diagnostic.c:1406
0x1a79994 warning(int, char const*, ...)
/home/marxin/Programming/gcc/gcc/diagnostic.c:1527
0xf1bb16 tree_inlinable_function_p(tree_node*)
/home/marxin/Programming/gcc/gcc/tree-inline.c:4123
0xc3f1c5 compute_fn_summary(cgraph_node*, bool)
/home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3110
0xc3f937 compute_fn_summary_for_current
/home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3160
0xc3f937 execute
/home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:4768

gcc/ChangeLog:

* tree-inline.c (inline_forbidden_p): Set
inline_forbidden_reason.

dump SLP subgraph before costing

This adds another dump of the SLP subgraph we're throwing at costing.

2021-02-22 Richard Biener <rguenther@suse.de>

* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Dump
costed subgraph.

tree-optimization/99165 - fix ICE in store-merging w/ non-call EH

This adds a missing accumulation to ret.

2021-02-22 Richard Biener <rguenther@suse.de>

PR tree-optimization/99165
* gimple-ssa-store-merging.c (pass_store_merging::process_store):
Accumulate changed to ret.

* g++.dg/pr99165.C: New testcase.

Daily bump.

PR fortran/99169 - Do not clobber allocatable intent(out) dummy argument

gcc/fortran/ChangeLog:

* trans-expr.c (gfc_conv_procedure_call): Do not add clobber to
allocatable intent(out) argument.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_3.f90: New test.

Revert: "i386: Remove REG_ALLOC_ORDER definition"

This reverts commit 4c61e35f20fe2ffeb9421dbd6f26c767a234a4a0.

Daily bump.

libiberty: autogenerate aclocal.m4

Move custom macros to acinclude.m4 so we can autogenerate aclocal.m4
with aclocal. This matches every other project in the tree.

libiberty/ChangeLog:

* Makefile.in (ACLOCAL, ACLOCAL_AMFLAGS, $(srcdir)/aclocal.m4): Define.
(configure_deps): Rename to ...
(aclocal_deps): ... this. Replace aclocal.m4 with acinclude.m4.
($(srcdir)/configure): Replace $(configure_deps) with
$(srcdir)/aclocal.m4.
* aclocal.m4: Move libiberty macros to acinclude.m4, then regenerate.
* acinclude.m4: New file.
* configure: Regenerate.

testsuite: skip attr-retain-?.c on AIX

The attr-retain-?.c tests assume ELF file syntax / semantics. Some of the
tests skip AIX because of other requirements, and some explicitly skip
Darwin. This patch adds AIX to the explicit skip list.

gcc/testsuite/ChangeLog:

* c-c++-common/attr-retain-5.c: Skip on AIX.
* c-c++-common/attr-retain-6.c: Same.
* c-c++-common/attr-retain-7.c: Same.
* c-c++-common/attr-retain-8.c: Same.
* c-c++-common/attr-retain-9.c: Same.

IBM Z: Fix long double <-> DFP conversions

When switching the s390 backend to store long doubles in vector
registers, the patterns for long double <-> DFP conversions were
forgotten. This did not cause observable problems so far, because
libdfp calls are emitted instead of pfpo. However, when building
libdfp itself, this leads to infinite recursion.

gcc/ChangeLog:

PR target/99134
* config/s390/vector.md (trunctf<DFP_ALL:mode>2_vr): New
pattern.
(trunctf<DFP_ALL:mode>2): Likewise.
(trunctdtf2_vr): Likewise.
(trunctdtf2): Likewise.
(extend<DFP_ALL:mode>tf2_vr): Likewise.
(extend<DFP_ALL:mode>tf2): Likewise.
(extendtftd2_vr): Likewise.
(extendtftd2): Likewise.

gcc/testsuite/ChangeLog:

PR target/99134
* gcc.target/s390/vector/long-double-from-decimal128.c: New test.
* gcc.target/s390/vector/long-double-from-decimal32.c: New test.
* gcc.target/s390/vector/long-double-from-decimal64.c: New test.
* gcc.target/s390/vector/long-double-to-decimal128.c: New test.
* gcc.target/s390/vector/long-double-to-decimal32.c: New test.
* gcc.target/s390/vector/long-double-to-decimal64.c: New test.

IBM Z: Improve FPRX2 <-> TF conversions

gcc/ChangeLog:

* config/s390/vector.md (*fprx2_to_tf): Rename to fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

Daily bump.

c++: Incorrect module-number ordering [PR 98741]

One of the very strong invariants in modules is that module numbers
are allocated such that (other than the current TU), all imports have
lesser module numbers, and also that the binding vector is only
appended to with increasing module numbers.   This broke down when
module-directives became a thing and the preprocessing became entirely
decoupled from parsing.  We'd load header units and their macros (but
not symbols of course) during preprocessing.  Then we'd load named
modules during parsing.  This could lead to the situation where a
header unit appearing after a named import had a lower module number
than the import.  Consequently, if they both bound the same
identifier, the binding vector would be misorderd and bad things
happen.

This patch restores a pending import queue I previously had, but in
simpler form (hurrah).  During preprocessing we queue all
module-directives and when we meet one for a header unit we do the
minimal loading for all of the queue, so they get appropriate
numbering.  Then we load the preprocessor state for the header unit.

PR c++/98741
gcc/cp/
* module.cc (pending_imports): New.
(declare_module): Adjust test condition.
(name_pending_imports): New.
(preprocess_module): Reimplement using pending_imports.
(preprocessed_module): Move name-getting to name_pending_imports.
* name-lookup.c (append_imported_binding_slot): Assert module
ordering is increasing.
gcc/testsuite/
* g++.dg/modules/pr98741_a.H: New.
* g++.dg/modules/pr98741_b.H: New.
* g++.dg/modules/pr98741_c.C: New.
* g++.dg/modules/pr98741_d.C: New.

fortran: Object types should be declared before use in NAMELIST.

gcc/fortran/ChangeLog:

PR fortran/98686
* match.c (gfc_match_namelist): If BT_UNKNOWN, check for
IMPLICIT NONE and and issue an error, otherwise set the type
to its IMPLICIT type so that any subsequent use of objects will
will confirm their types.

gcc/testsuite/ChangeLog:

PR fortran/98686
* gfortran.dg/namelist_4.f90: Modify.
* gfortran.dg/namelist_98.f90: New test.

Update gcc fr.po.

* fr.po: Update.

libgo: update to Go1.16 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/293793

PR fortran/99147 - Sanitizer detects heap-use-after-free in gfc_add_flavor

Reverse order of conditions to avoid invalid read.

gcc/fortran/ChangeLog:

* symbol.c (gfc_add_flavor): Reverse order of conditions.

Update .po files.

gcc/po/
* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po,
zh_TW.po: Update.

libcpp/po/
* be.po, ca.po, da.po, de.po, el.po, eo.po, es.po, fi.po, fr.po,
id.po, ja.po, nl.po, pt_BR.po, ru.po, sr.po, sv.po, tr.po, uk.po,
vi.po, zh_CN.po, zh_TW.po: Update.

PR c/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams

gcc/ChangeLog:

PR c/97172
* attribs.c (init_attr_rdwr_indices): Guard vblist use.
(attr_access::free_lang_data): Remove a spurious test.

gcc/testsuite/ChangeLog:

PR c/97172
* gcc.dg/pr97172.c: Add test cases.

c++: Inform of CMI reads [PR 99166]

When successfully reading a module CMI, the user gets no indication of
where that CMI was located.  I originally didn't consider this a
problem -- the read was successful after all.  But it can make it
difficult to interact with build systems, particularly when caching
can be involved.  Grovelling over internal dump files is not really
useful to the user.  Hence this option, which is similar to the
-flang-info-include-translate variants, and allows the user to ask for
all, or specific module read notification.

gcc/c-family/
* c.opt (flang-info-module-read, flang-info-module-read=): New.
gcc/
* doc/invoke.texi (flang-info-module-read): Document.
gcc/cp/
* module.cc (note_cmis): New.
(struct module_state): Add inform_read_p bit.
(module_state::do_import): Inform of CMI location, if enabled.
(init_modules): Canonicalize note_cmis entries.
(handle_module_option): Handle -flang-info-module-read=FOO.
gcc/testsuite/
* g++.dg/modules/pr99166_a.X: New.
* g++.dg/modules/pr99166_b.C: New.
* g++.dg/modules/pr99166_c.C: New.
* g++.dg/modules/pr99166_d.C: New.