review.tizen.org Git - platform/upstream/gcc.git/log

Update gcc sv.po

* sv.po: Update.

vec: Add array_slice constructors from non-const and gc vectors

This patch adds constructors of array_slice that are required to
create them from non-const (heap or auto) vectors or from GC vectors.

gcc/ChangeLog:

2022-08-08 Martin Jambor <mjambor@suse.cz>

* vec.h (array_slice): Add constructors for non-const reference to
heap vector and pointers to heap vectors.

Improve union of ranges containing NAN.

Previously [5,6] U NAN would just drop to VARYING.  With this patch,
the resulting range becomes [5,6] with the NAN bit set to unknown.

[I still have yet to decide what to do with intersections.  ISTM, the
intersection of a known NAN with anything else should be a NAN, but it
could also be undefined (the empty set).  I'll have to run some tests
and see.  Currently, we drop to VARYING cause well... it's always safe
to give up;-).]
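
As a rough illustration of the union behaviour described above, here is a
minimal sketch using a simplified range type.  The names toy_frange,
nan_state and toy_union are hypothetical and are not the frange API; the
sketch only models "one side is a known NAN, keep the other side's
endpoints and set the NAN property to unknown".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state NAN property; not GCC's representation.  */
enum nan_state { NAN_NO, NAN_YES, NAN_UNKNOWN };

/* Toy closed interval [lo, hi] plus a NAN property.  When nan is
   NAN_YES the endpoints are ignored: the range is a known NAN.  */
struct toy_frange
{
  double lo, hi;
  enum nan_state nan;
};

static bool
toy_known_nan_p (struct toy_frange r)
{
  return r.nan == NAN_YES;
}

/* Union of A and B.  If exactly one side is a known NAN, keep the other
   side's endpoints and mark the NAN property as unknown instead of
   giving up and returning VARYING.  */
struct toy_frange
toy_union (struct toy_frange a, struct toy_frange b)
{
  if (toy_known_nan_p (a) && toy_known_nan_p (b))
    return a;
  if (toy_known_nan_p (a) || toy_known_nan_p (b))
    {
      struct toy_frange r = toy_known_nan_p (a) ? b : a;
      r.nan = NAN_UNKNOWN;
      return r;
    }
  /* Neither side is a known NAN: take the hull of the endpoints.  */
  struct toy_frange r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  r.nan = (a.nan == b.nan) ? a.nan : NAN_UNKNOWN;
  assert (r.lo <= r.hi);
  return r;
}
```

With this model, the union of [5,6] (no NAN) and a known NAN keeps the
endpoints 5 and 6 and reports the NAN property as unknown.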

gcc/ChangeLog:

* value-range.cc (early_nan_resolve): Change comment.
(frange::union_): Handle union when one side is a NAN.
(range_tests_nan): Add tests for NAN union.

amdgcn: OpenMP SIMD routine support

Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.

Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New.
(gcn_simd_clone_adjust): New.
(gcn_simd_clone_usable): New.
(TARGET_SIMD_CLONE_ADJUST): New.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New.
(TARGET_SIMD_CLONE_USABLE): New.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning.

omp-simd-clone: Allow fixed-lane vectors

The vecsize_int/vecsize_float has an assumption that all arguments will use
the same bitsize, and vary the number of lanes according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).

With this change the vecsize can be left zero and the vectorization factor will
be the same for all types.
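
The difference between the two models can be sketched as follows.  This
is only an illustration under simplifying assumptions: toy_lanes_for_type
is a hypothetical helper, not a function in omp-simd-clone.cc, and it
ignores everything except the lane count.

```c
#include <assert.h>

/* Hypothetical helper: lanes per vector for an element of ELEM_BITS bits.
   A nonzero VECSIZE_BITS means the vector width is fixed, so the lane
   count depends on the element size (capped at SIMDLEN).  A zero
   VECSIZE_BITS means every type uses SIMDLEN lanes and the vector width
   varies instead.  */
unsigned
toy_lanes_for_type (unsigned vecsize_bits, unsigned simdlen,
                    unsigned elem_bits)
{
  assert (elem_bits != 0);
  if (vecsize_bits == 0)
    return simdlen;
  unsigned lanes = vecsize_bits / elem_bits;
  return lanes > simdlen ? simdlen : lanes;
}
```

For example, with a 128-bit vecsize and simdlen 4, a 32-bit element gets
4 lanes and a 64-bit element gets 2, whereas with a zero vecsize and
simdlen 64 both element sizes get 64 lanes.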

gcc/ChangeLog:

* doc/tm.texi: Regenerate.
* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
vecsize.
(simd_clone_adjust_argument_types): Likewise.
* target.def (compute_vecsize_and_simdlen): Document the new
vecsize_int and vecsize_float semantics.

expmed: Fix store_bit_field_1 subreg offset

store_bit_field_1 tries to convert a field assignment into a subreg
assignment. Normally it must check that the field occupies a full
word (or more specifically, a full REGMODE_NATURAL_SIZE chunk),
so that writing to the subreg doesn't clobber any other fields.
But it can skip that check if the structure is known to be in
an undefined state.

The idea was that, in the undefined case, we could rely on
simplify_gen_subreg to do the check for a valid subreg, rather
than having to repeat the required endianness logic in the caller.

Before the addition of the undefined case, the code could use
regnum * regsize to get the byte offset, where regnum came from
checking that the start was word-aligned. In the undefined case
we need to calculate the byte offset explicitly.
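
A minimal sketch of the two offset calculations, assuming plain unsigned
integers and 8-bit units.  The function names and TOY_BITS_PER_UNIT are
hypothetical; the real code works on poly_ints and leaves subreg validity
to simplify_gen_subreg.

```c
#include <assert.h>

#define TOY_BITS_PER_UNIT 8

/* Offset via regnum * regsize: valid only when BITNUM is a multiple of
   the register size, because REGNUM comes from that alignment check.  */
unsigned
toy_offset_aligned (unsigned bitnum, unsigned regsize_bytes)
{
  unsigned regbits = regsize_bytes * TOY_BITS_PER_UNIT;
  assert (bitnum % regbits == 0);
  unsigned regnum = bitnum / regbits;
  return regnum * regsize_bytes;
}

/* Explicit offset: needs only byte alignment, so it also covers a field
   that does not start on a register-sized boundary.  */
unsigned
toy_offset_explicit (unsigned bitnum)
{
  assert (bitnum % TOY_BITS_PER_UNIT == 0);
  return bitnum / TOY_BITS_PER_UNIT;
}
```

Both functions agree for register-aligned starts (bit 64 with 8-byte
registers gives byte 8), but only the explicit form handles a start such
as bit 16.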

gcc/
* expmed.cc (store_bit_field_1): Fix byte offset calculation
for undefined structures.

Extend SLP permutation optimisations

Currently SLP tries to force permute operations "down" the graph
from loads in the hope of reducing the total number of permutations
needed or (in the best case) removing the need for the permutations
entirely.  This patch tries to extend it as follows:

- Allow loads to take a different permutation from the one they
  started with, rather than choosing between "original permutation"
  and "no permutation".

- Allow changes in both directions, if the target supports the
  reverse permutation.

- Treat the placement of permutations as a two-way dataflow problem:
  after propagating information from leaves to roots (as now), propagate
  information back up the graph.

- Take execution frequency into account when optimising for speed,
  so that (for example) permutations inside loops have a higher
  cost than permutations outside loops.

- Try to reduce the total number of permutations when optimising for
  size, even if that increases the number of permutations on a given
  execution path.

See the big block comment above vect_optimize_slp_pass for
a detailed description.
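
The speed versus size trade-off in the list above can be sketched with a
toy cost function.  This is an illustration only: toy_permute and
toy_placement_cost are hypothetical names, and the real cost model in
vect_optimize_slp_pass is considerably more detailed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one permutation: how often the block
   containing it executes relative to function entry.  */
struct toy_permute
{
  unsigned frequency;
};

/* Cost of a placement: the sum of execution frequencies when optimising
   for speed, the plain number of permutations when optimising for size.  */
unsigned
toy_placement_cost (const struct toy_permute *perms, size_t n,
                    int optimize_for_size)
{
  assert (perms != NULL || n == 0);
  unsigned cost = 0;
  for (size_t i = 0; i < n; ++i)
    cost += optimize_for_size ? 1 : perms[i].frequency;
  return cost;
}
```

Under this model, one permutation inside a loop that runs 100 times costs
100 for speed but 1 for size, while two permutations outside the loop
cost 2 either way.  Speed therefore prefers the two outside the loop and
size prefers the one inside.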

The original motivation for doing this was to add a framework that would
allow other layout differences in future.  The two main ones are:

- Make it easier to represent predicated operations, including
  predicated operations with gaps.  E.g.:

     a[0] += 1;
     a[1] += 1;
     a[3] += 1;

  could be a single load/add/store for SVE.  We could handle this
  by representing a layout such as { 0, 1, _, 2 } or { 0, 1, _, 3 }
  (depending on what's being counted).  We might need to move
  elements between lanes at various points, like with permutes.

  (This would first mean adding support for stores with gaps.)

- Make it easier to switch between an even/odd and unpermuted layout
  when switching between wide and narrow elements.  E.g. if a widening
  operation produces an even vector and an odd vector, we should try
  to keep operations on the wide elements in that order rather than
  force them to be permuted back "in order".

To give some examples of what the patch does:

int f1(int *__restrict a, int *__restrict b, int *__restrict c,
       int *__restrict d)
{
  a[0] = (b[1] << c[3]) - d[1];
  a[1] = (b[0] << c[2]) - d[0];
  a[2] = (b[3] << c[1]) - d[3];
  a[3] = (b[2] << c[0]) - d[2];
}

continues to produce the same code as before when optimising for
speed: b, c and d are permuted at load time.  But when optimising
for size we instead permute c into the same order as b+d and then
permute the result of the arithmetic into the same order as a:

        ldr     q1, [x2]
        ldr     q0, [x1]
        ext     v1.16b, v1.16b, v1.16b, #8     // <------
        sshl    v0.4s, v0.4s, v1.4s
        ldr     q1, [x3]
        sub     v0.4s, v0.4s, v1.4s
        rev64   v0.4s, v0.4s                   // <------
        str     q0, [x0]
        ret

The following function:

int f2(int *__restrict a, int *__restrict b, int *__restrict c,
       int *__restrict d)
{
  a[0] = (b[3] << c[3]) - d[3];
  a[1] = (b[2] << c[2]) - d[2];
  a[2] = (b[1] << c[1]) - d[1];
  a[3] = (b[0] << c[0]) - d[0];
}

continues to push the reverse down to just before the store,
like the previous code did.

In:

int f3(int *__restrict a, int *__restrict b, int *__restrict c,
       int *__restrict d)
{
  for (int i = 0; i < 100; ++i)
    {
      a[0] = (a[0] + c[3]);
      a[1] = (a[1] + c[2]);
      a[2] = (a[2] + c[1]);
      a[3] = (a[3] + c[0]);
      c += 4;
    }
}

the loads of a are hoisted and the stores of a are sunk, so that
only the load from c happens in the loop.  When optimising for
speed, we prefer to have the loop operate on the reversed layout,
changing on entry and exit from the loop:

        mov     x3, x0
        adrp    x0, .LC0
        add     x1, x2, 1600
        ldr     q2, [x0, #:lo12:.LC0]
        ldr     q0, [x3]
        mov     v1.16b, v0.16b
        tbl     v0.16b, {v0.16b - v1.16b}, v2.16b    // <--------
        .p2align 3,,7
.L6:
        ldr     q1, [x2], 16
        add     v0.4s, v0.4s, v1.4s
        cmp     x2, x1
        bne     .L6
        mov     v1.16b, v0.16b
        adrp    x0, .LC0
        ldr     q2, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b - v1.16b}, v2.16b    // <--------
        str     q0, [x3]
        ret

Similarly, for the very artificial testcase:

int f4(int *__restrict a, int *__restrict b, int *__restrict c,
       int *__restrict d)
{
  int a0 = a[0];
  int a1 = a[1];
  int a2 = a[2];
  int a3 = a[3];
  for (int i = 0; i < 100; ++i)
    {
      a0 ^= c[0];
      a1 ^= c[1];
      a2 ^= c[2];
      a3 ^= c[3];
      c += 4;
      for (int j = 0; j < 100; ++j)
        {
          a0 += d[1];
          a1 += d[0];
          a2 += d[3];
          a3 += d[2];
          d += 4;
        }
      b[0] = a0;
      b[1] = a1;
      b[2] = a2;
      b[3] = a3;
      b += 4;
    }
  a[0] = a0;
  a[1] = a1;
  a[2] = a2;
  a[3] = a3;
}

the a vector in the inner loop maintains the order { 1, 0, 3, 2 },
even though it's part of an SCC that includes the outer loop.
In other words, this is a motivating case for not assigning
permutes at SCC granularity.  The code we get is:

        ldr     q0, [x0]
        mov     x4, x1
        mov     x5, x0
        add     x1, x3, 1600
        add     x3, x4, 1600
        .p2align 3,,7
.L11:
        ldr     q1, [x2], 16
        sub     x0, x1, #1600
        eor     v0.16b, v1.16b, v0.16b
        rev64   v0.4s, v0.4s              // <---
        .p2align 3,,7
.L10:
        ldr     q1, [x0], 16
        add     v0.4s, v0.4s, v1.4s
        cmp     x0, x1
        bne     .L10
        rev64   v0.4s, v0.4s              // <---
        add     x1, x0, 1600
        str     q0, [x4], 16
        cmp     x3, x4
        bne     .L11
        str     q0, [x5]
        ret
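
The layout { 1, 0, 3, 2 } is its own inverse, which is why the same rev64
can be used both on entry to and on exit from the inner loop.  A small
sketch of that property, using hypothetical helper names:

```c
#include <assert.h>

#define TOY_LANES 4

/* Apply the lane permutation PERM to IN, writing OUT[i] = IN[PERM[i]].  */
void
toy_permute (const int in[TOY_LANES], const unsigned perm[TOY_LANES],
             int out[TOY_LANES])
{
  for (unsigned i = 0; i < TOY_LANES; ++i)
    {
      assert (perm[i] < TOY_LANES);
      out[i] = in[perm[i]];
    }
}

/* Return 1 if applying { 1, 0, 3, 2 } twice restores the original
   order of IN, 0 otherwise.  */
int
toy_swap_pairs_round_trip (const int in[TOY_LANES])
{
  static const unsigned swap_pairs[TOY_LANES] = { 1, 0, 3, 2 };
  int once[TOY_LANES], twice[TOY_LANES];
  toy_permute (in, swap_pairs, once);
  toy_permute (once, swap_pairs, twice);
  for (unsigned i = 0; i < TOY_LANES; ++i)
    if (twice[i] != in[i])
      return 0;
  return 1;
}
```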

bb-slp-layout-17.c is a collection of compile tests for problems
I hit with earlier versions of the patch.  The same problems might
show up elsewhere, but it seemed worth having the test anyway.

In slp-11b.c we previously pushed the permutation of the in[i*4]
group down from the load to just before the store.  That didn't
reduce the number or frequency of the permutations (or increase
them either).  But separating the permute from the load meant
that we could no longer use load/store lanes.

Whether load/store lanes are a good idea here is another question.
If there were two sets of loads, and if we could use a single
permutation instead of one per load, then avoiding load/store
lanes should be a good thing even under the current abstract
cost model.  But I think under the current model we should
try to avoid splitting up potential load/store lanes groups
if there is no specific benefit to the split.

Preferring load/store lanes is still a source of missed optimisations
that we should fix one day...

gcc/
* params.opt (-param=vect-max-layout-candidates=): New parameter.
* doc/invoke.texi (vect-max-layout-candidates): Document it.
* tree-vectorizer.h (auto_lane_permutation_t): New typedef.
(auto_load_permutation_t): Likewise.
* tree-vect-slp.cc (vect_slp_node_weight): New function.
(slpg_layout_cost): New class.
(slpg_vertex): Replace perm_in and perm_out with partition,
out_degree, weight and out_weight.
(slpg_partition_info, slpg_partition_layout_costs): New classes.
(vect_optimize_slp_pass): Likewise, cannibalizing some part of
the previous vect_optimize_slp.
(vect_optimize_slp): Use it.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_var_shift):
Return true for aarch64.
* gcc.dg/vect/bb-slp-layout-1.c: New test.
* gcc.dg/vect/bb-slp-layout-2.c: New test.
* gcc.dg/vect/bb-slp-layout-3.c: New test.
* gcc.dg/vect/bb-slp-layout-4.c: New test.
* gcc.dg/vect/bb-slp-layout-5.c: New test.
* gcc.dg/vect/bb-slp-layout-6.c: New test.
* gcc.dg/vect/bb-slp-layout-7.c: New test.
* gcc.dg/vect/bb-slp-layout-8.c: New test.
* gcc.dg/vect/bb-slp-layout-9.c: New test.
* gcc.dg/vect/bb-slp-layout-10.c: New test.
* gcc.dg/vect/bb-slp-layout-11.c: New test.
* gcc.dg/vect/bb-slp-layout-13.c: New test.
* gcc.dg/vect/bb-slp-layout-14.c: New test.
* gcc.dg/vect/bb-slp-layout-15.c: New test.
* gcc.dg/vect/bb-slp-layout-16.c: New test.
* gcc.dg/vect/bb-slp-layout-17.c: New test.
* gcc.dg/vect/slp-11b.c: XFAIL SLP test for load-lanes targets.

Add base hash traits for vectors

This patch adds a class that provides basic hash/equal functions
for vectors, based on corresponding traits for the element type.

gcc/
* hash-traits.h (vec_hash_base): New class.
(vec_free_hash_base): Likewise.

Rearrange unbounded_hashmap_traits

int_hash combines two kinds of operation:

(1) hashing and equality of integers
(2) using spare integer encodings to represent empty and deleted slots

(1) is really independent of (2), and could be useful in cases where
no spare integer encodings are available. This patch adds a base class
(int_hash_base) for (1) and makes int_hash inherit from it.

If we follow a similar style for future hashes, we can make
unbounded_hashmap_traits take the "base" hash for the key
as a template parameter, rather than requiring every type of
key to have a separate derivative of unbounded_hashmap_traits.
A later patch applies this to vector keys.

No functional change intended.

gcc/
* hash-traits.h (int_hash_base): New struct, split out from...
(int_hash): ...this class, which now inherits from int_hash_base.
* hash-map-traits.h (unbounded_hashmap_traits): Take a template
parameter for the key that provides hash and equality functions.
(unbounded_int_hashmap_traits): Turn into a type alias of
unbounded_hashmap_traits.

Make graphds_scc pass the node order back to callers

As a side-effect, graphds_scc constructs a vector in which all
nodes in an SCC are listed consecutively. This can be useful
information, so the patch adds an optional pass-back parameter
for it. The interface is similar to the one for graphds_dfs.

gcc/
* graphds.cc (graphds_scc): Add a pass-back parameter for the
final node order.
* graphds.h (graphds_scc): Update prototype accordingly.

Split code out of vect_transform_slp_perm_load

Similarly to the previous vectorizable_slp_permutation patch,
this one splits out the main part of vect_transform_slp_perm_load
so that a later patch can test a permutation without constructing
a node for it.

Also fixes a lingering use of STMT_VINFO_VECTYPE.

gcc/
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Split out from...
(vect_transform_slp_perm_load): ...here. Use SLP_TREE_VECTYPE instead
of STMT_VINFO_VECTYPE.

Split code out of vectorizable_slp_permutation

A later patch needs to test whether the target supports a
lane_permutation_t without having to construct a full SLP
node to test that. This patch splits out most of the work
of vectorizable_slp_permutation into a subroutine, so that
properties of the permutation can be passed explicitly without
disturbing the main interface.

The new subroutine still uses an slp_tree argument to get things
like the number of lanes and the vector type. That's a bit clunky,
but it seemed like the least worst option.

gcc/
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Split out from...
(vectorizable_slp_permutation): ...here.

vect: Tighten get_related_vectype_for_scalar_type

Builds of glibc with SVE enabled have been failing since V1DI was added
to the aarch64 port.  The problem is that BB SLP starts the (hopeless)
attempt to use variable-length modes to vectorise a single-element
vector, and that now gets further than it did before.

Initially we tried getting a vector mode with 1 + 1X DI elements
(i.e. 1 DI per 128-bit vector chunk).  We don't provide such a mode --
it would be VNx1DI -- because it isn't a native SVE format.  We then
try just 1 DI, which previously failed but now succeeds.

There are numerous ways we could fix this.  Perhaps the most obvious
would be to skip variable-length modes for BB SLP.  However, I think
that'd just be kicking the can down the road, since eventually we want
to support BB SLP and VLA vectors using predication.

However, if we do use VLA vectors for BB SLP, the vector modes
we use should actually be variable length.  We don't want to use
variable-length vectors for some element types/group sizes and
fixed-length vectors for others, since it would be difficult
to handle the seams.

The same principle applies during loop vectorisation.  We can't
use a mixture of variable-length and fixed-length vectors for
the same loop because the relative unroll/vectorisation factors
would not be constant (compile-time) multiples of each other.

This patch therefore makes get_related_vectype_for_scalar_type
check that the provided number of units is interoperable with
the provided prevailing mode.  The function is generally quite
forgiving -- it does basic things like checking for scalarness
itself rather than expecting callers to do them -- so the new
check feels in keeping with that.

This seems to subsume the fix for PR96974.  I'm not sure it's
worth reverting that code to an assert though, so the patch just
drops the scan for the associated message.

gcc/
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type): Check
that the requested number of units is interoperable with the requested
prevailing mode.

gcc/testsuite/
* gcc.target/aarch64/sve/slp_15.c: New test.
* g++.target/aarch64/sve/pr96974.C: Remove scan test.

Change get_std_name_hint to use generated hash table

The get_std_name_hint function so far uses linear search to locate
matching entries.  After adding more hint entries this might not be
appropriate anymore.  Therefore this patch also replaces the linear
array with a gperf-generated hash table.

contrib/ChangeLog

* gcc_update (files_and_dependencies): Add rule for
gcc/cp/std-name-hint.h.

gcc/cp/ChangeLog

* Make-lang.in: Add rule to rebuild std-name-hint.h from
std-name-hint.gperf.
* name-lookup.cc (get_std_name_hint): Remove hints array.
Use gperf-generated class std_name_hint_lookup.
Include "std-name-hint.h".
* std-name-hint.gperf: New file.
* std-name-hint.h: New file.  Generated from the .gperf file.

m32c-rtems: remove obsoleted port

contrib/ChangeLog:

* config-list.mk: Remove the port.

gcc/ChangeLog:

* config.gcc: Remove the port.
* config/m32c/rtems.h: Removed.

libgcc/ChangeLog:

* config.host: Remove the port.

tree-optimization/73550 - apply MAX_NUM_CHAINS consistently

The MAX_NUM_CHAINS limit is applied once with <= and once with <, which
results in the chains not being limited but the analysis being dropped
completely.
That's one issue in the PR.
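
The effect of mixing the two comparisons can be sketched as below.  The
function names and the limit value are hypothetical, and which check uses
which operator in the real code is an assumption here; the point is only
that the two tests disagree when the count equals the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit; the real value is not stated in this log.  */
#define TOY_MAX_NUM_CHAINS 8

/* Collection step: accepts the result as long as NUM_CHAINS <= limit.  */
bool
toy_collect_ok (unsigned num_chains)
{
  return num_chains <= TOY_MAX_NUM_CHAINS;
}

/* Later step applying a stricter "<" test against the same limit.  */
bool
toy_init_ok_inconsistent (unsigned num_chains)
{
  return num_chains < TOY_MAX_NUM_CHAINS;
}

/* True when the two checks disagree, i.e. the chains were collected but
   the analysis is then dropped.  */
bool
toy_checks_disagree (unsigned num_chains)
{
  return toy_collect_ok (num_chains)
         && !toy_init_ok_inconsistent (num_chains);
}

/* Self-check: the only disagreeing count is exactly the limit.  */
void
toy_self_check (void)
{
  assert (toy_checks_disagree (TOY_MAX_NUM_CHAINS));
  assert (!toy_checks_disagree (TOY_MAX_NUM_CHAINS - 1));
}
```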

PR tree-optimization/73550
* gimple-predicate-analysis.cc (predicate::init_from_control_deps):
Do not apply MAX_NUM_CHAINS again.

Improve uninit pass dumping

This produces less redundancy and more complete info when dumping
the control dependence chains.

* gimple-predicate-analysis.cc (format_edge_vec): Dump
both source and destination.
(dump_dep_chains): Remove.
(uninit_analysis::init_use_preds): Remove redundant
dumping of chains.

c++: __has_builtin gives the wrong answer [PR106759]

We've supported __is_nothrow_constructible since r11-4386, but
names_builtin_p didn't know about it, so it gave the wrong answer for
#if __has_builtin(__is_nothrow_constructible)
...
#endif

I've tested all C++-only built-ins and only two were missing.

PR c++/106759

gcc/cp/ChangeLog:

* cp-objcp-common.cc (names_builtin_p): Handle RID_IS_NOTHROW_ASSIGNABLE
and RID_IS_NOTHROW_CONSTRUCTIBLE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: New test.

Force a [NAN, NAN] range when the definite NAN property is set.

Setting the definite NAN property should also force a [NAN, NAN]
range, otherwise we'd have two ways of representing a NAN: with the
endpoints or with the property. In the ranger world we avoid at all
costs having more than one representation for a range.

In doing this, I removed the FRANGE_PROP_ACCESSOR macro, since it
looks like setting a property may have repercussions in the range
itself, so it's best for the client to define its own setter.

gcc/ChangeLog:

* value-range-storage.cc (frange_storage_slot::get_frange): Use
frange_nan.
* value-range.cc (frange::set_nan): New.
(frange_nan): Move to header file.
(range_tests_nan): Adjust frange_nan callers to pass type.
New test.
* value-range.h (FRANGE_PROP_ACCESSOR): Remove.
(frange_nan): New.

automake: regenerate

gotools/ChangeLog:

* Makefile.in: Regenerate.

automake: regenerate

gotools/ChangeLog:

* Makefile.in: Regenerate.

libatomic/ChangeLog:

* testsuite/Makefile.in: Regenerate.

tree-optimization/67196 - normalize use predicates earlier

The following makes sure to have use predicates simplified and
normalized before doing uninit_analysis::overlap because that
otherwise cannot pick up all flag setting cases. This fixes
half of the issue in PR67196 and conveniently resolves the
XFAIL in gcc.dg/uninit-pred-7_a.c.

PR tree-optimization/67196
* gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded):
Simplify and normalize use predicates before first use.

* gcc.dg/uninit-pred-7_a.c: Un-XFAIL.

libsanitizer: update LOCAL_PATCHES

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Update.

libsanitizer: Apply local patches

libsanitizer: update build system

libsanitizer/ChangeLog:

* sanitizer_common/Makefile.am: Remove sanitizer_openbsd.
* sanitizer_common/Makefile.in: Regenerate.

libsanitizer: merge from master (84a71d5259c2682403cdbd8710592410a2f128ab)

Remove GENERIC expr building from predicate analysis, improve dumps

The following removes duplicate dumping and makes the predicate
dumping more readable. That makes the GENERIC predicate build
routines unused which is also nice.

* gimple-predicate-analysis.cc (dump_pred_chain): Fix
parentizing and AND prepending.
(predicate::dump): Do not dump the GENERIC expanded
predicate, properly parentize and prepend ORs to the
piecewise predicate dump.
(build_pred_expr): Remove.

Implement relational operators for frange with endpoints.

This is the implementation of the relational range operators for
frange. These are the core operations that require specific FP domain
knowledge.

gcc/ChangeLog:

* range-op-float.cc (finite_operand_p): New.
(build_le): New.
(build_lt): New.
(build_ge): New.
(build_gt): New.
(foperator_equal::fold_range): New implementation with endpoints.
(foperator_equal::op1_range): Same.
(foperator_not_equal::fold_range): Same.
(foperator_not_equal::op1_range): Same.
(foperator_lt::fold_range): Same.
(foperator_lt::op1_range): Same.
(foperator_lt::op2_range): Same.
(foperator_le::fold_range): Same.
(foperator_le::op1_range): Same.
(foperator_le::op2_range): Same.
(foperator_gt::fold_range): Same.
(foperator_gt::op1_range): Same.
(foperator_gt::op2_range): Same.
(foperator_ge::fold_range): Same.
(foperator_ge::op1_range): Same.
(foperator_ge::op2_range): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/recip-3.c: Avoid premature optimization so test
has a chance to succeed.

Add support for floating point endpoints to frange.

The current implementation of frange is just a type with some bits to
represent NAN and INF.  We can do better and represent endpoints to
ultimately solve longstanding PRs such as PR24021.  This patch adds
these endpoints.  In follow-up patches I will add support for a bare
bones PLUS_EXPR range-op-float entry to solve the PR.

I have chosen to use REAL_VALUE_TYPEs for the endpoints, since that's
what we use underneath the trees.  This will be somewhat analogous to
our eventual use of wide-ints in the irange.  No sense going through
added levels of indirection if we can avoid it.  That, plus real.*
already has a nice API for dealing with floats.

With this patch, ranges will be closed floating point intervals, which
makes the implementation simpler, since we don't have to keep track of
open/closed intervals.  This is conservative enough for use in the
ranger world, as we'd rather err on the side of more elements in a
range, than fewer.

For example, even though we cannot precisely represent the open
interval (3.0, 5.0) with this approach, it is perfectly reasonable
to represent it as [3.0, 5.0] since the closed interval is a super set
of the open one.  In the VRP/ranger world, it is always better to
err on the side of more information in a range, than not.  After all,
when we don't know anything about a range, we just use VARYING which
is a fancy term for a range spanning the entire domain.

Since REAL_VALUE_TYPEs have properly defined infinity and NAN
semantics, all the math can be made to work:

[-INF, 3.0] !NAN        => Numbers <= 3.0 (NAN cannot happen)
[3.0, 3.0]              => 3.0 or NAN.
[3.0, +INF]             => Numbers >= 3.0 (NAN is possible)
[-INF, +INF]            => VARYING (NAN is possible)
[-INF, +INF] !NAN       => Entire domain.  NAN cannot happen.

Also, since REAL_VALUE_TYPEs can represent the minimum and maximum
representable values of a TYPE_MODE, we can disambiguate between them
and negative and positive infinity (see get_max_float in real.cc).

This also makes the math all work.  For example, suppose we know
nothing about x and y (VARYING).  On the TRUE side of x > y, we can
deduce that:

        (a) x cannot be NAN
        (b) y cannot be NAN
        (c) y cannot be +INF.

(c) means that we can drop the upper bound of "y" from +INF to the
maximum representable value for its type.
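
Deductions (b) and (c) can be sketched on a simplified range type.  The
names toy_range and toy_refine_y_for_gt are hypothetical, and double with
DBL_MAX stands in for REAL_VALUE_TYPE and the maximum representable value
of the type.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Toy closed range with a "may be NAN" flag; not the GCC frange class.  */
struct toy_range
{
  double lo, hi;
  bool maybe_nan;
};

/* Refine Y on the true side of X > Y: Y cannot be NAN, and it cannot be
   +INF because nothing compares greater than +INF.  */
struct toy_range
toy_refine_y_for_gt (struct toy_range y)
{
  assert (y.lo <= y.hi);
  y.maybe_nan = false;
  if (isinf (y.hi) && y.hi > 0)
    y.hi = DBL_MAX;
  return y;
}
```

Starting from a VARYING y, that is [-INF, +INF] with NAN possible, the
refined range is [-INF, DBL_MAX] with NAN excluded.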

Having endpoints with different representation for infinity and the
maximum representable values, means we can drop the +-INF properties
we currently have in the frange.

gcc/ChangeLog:

* range-op-float.cc (frange_set_nan): New.
(frange_drop_inf): New.
(frange_drop_ninf): New.
(foperator_equal::op1_range): Adjust for endpoints.
(foperator_lt::op1_range): Same.
(foperator_lt::op2_range): Same.
(foperator_gt::op1_range): Same.
(foperator_gt::op2_range): Same.
(foperator_unordered::op1_range): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range-pretty-print.cc (vrange_printer::visit): Same.
* value-range-storage.cc (frange_storage_slot::get_frange): Same.
* value-range.cc (frange::set): Same.
(frange::normalize_kind): Same.
(frange::union_): Same.
(frange::intersect): Same.
(frange::operator=): Same.
(early_nan_resolve): New.
(frange::contains_p): New.
(frange::singleton_p): New.
(frange::set_nonzero): New.
(frange::nonzero_p): New.
(frange::set_zero): New.
(frange::zero_p): New.
(frange::set_nonnegative): New.
(frange_float): New.
(frange_nan): New.
(range_tests_nan): New.
(range_tests_signed_zeros): New.
(range_tests_floats): New.
(range_tests): New.
* value-range.h (frange::lower_bound): New.
(frange::upper_bound): New.
(vrp_val_min): Use real_inf with a sign instead of negating inf.
(frange::frange): New.
(frange::set_varying): Adjust for endpoints.
(real_max_representable): New.
(real_min_representable): New.

A == 0 ? A : -A    same as -A (when A is 0.0)

The upcoming work for frange triggers a regression in
gcc.dg/tree-ssa/phi-opt-24.c.

For -O2 -fno-signed-zeros, we fail to transform the following into -A:

float f0(float A)
{
  //     A == 0? A : -A    same as -A
  if (A == 0)  return A;
  return -A;
}

This is because the abs/negative match.pd pattern here:

/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
   Need to handle UN* comparisons.
   ...
   ...

Sees IL that has the 0.0 propagated.

Instead of:

  <bb 2> [local count: 1073741824]:
  if (A_2(D) == 0.0)
    goto <bb 4>; [34.00%]
  else
    goto <bb 3>; [66.00%]

  <bb 3> [local count: 708669601]:
  _3 = -A_2(D);

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <A_2(D)(2), _3(3)>

It now sees:

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <0.0(2), _3(3)>

which it leaves untouched, causing the if conditional to survive.

Changing integer_zerop to zerop fixes the problem.
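
To see why the transformation is valid when signed zeros are ignored, the
two forms can be compared directly.  This is only a sketch with
hypothetical function names; it checks the source-level equivalence under
==, not what the optimiser emits.

```c
#include <assert.h>

/* Original form from the commit message.  */
float
toy_f0 (float a)
{
  if (a == 0)
    return a;
  return -a;
}

/* Simplified form expected under -fno-signed-zeros.  */
float
toy_f0_simplified (float a)
{
  return -a;
}

/* Return 1 if both forms compare equal for A.  The only input where the
   results differ bitwise is zero, and 0.0f == -0.0f still holds.  NAN
   inputs are excluded because NAN never compares equal to itself.  */
int
toy_forms_agree (float a)
{
  assert (a == a);
  return toy_f0 (a) == toy_f0_simplified (a);
}
```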

I did not include a testcase, as it's just phi-opt-24.c which will get
triggered when I commit the frange with endpoints work.

gcc/ChangeLog:

* match.pd ((cmp @0 zerop) real_zerop (negate@1 @0)): Add variant
for real zero.

s390: fix build on 32-bit hosts

Fixes build on i686:

gcc/config/s390/s390.cc: In function 'bool s390_rtx_costs(rtx, machine_mode, int, int, int*, bool)':
gcc/config/s390/s390.cc:3728:63: error: cannot convert 'long int*' to 'long long int*'

gcc/ChangeLog:

* config/s390/s390.cc (s390_rtx_costs): Use proper type as
argument.

Use reachability analysis to improve uninit diagnostic

This patch does what the comment in the uninit diagnostic code suggests.
When the value-numbering run done without optimizing determines that
there's a fallthru path, consider blocks on it as always executed.

* tree-ssa-uninit.cc (warn_uninitialized_vars): Pre-compute
the set of fallthru reachable blocks from function entry
and use that to determine wlims.always_executed.

tree-optimization/63660 - testcase for fixed PR

This adds a testcase for the PR which was fixed with r13-2155-gbaa3ffb19c54fa

PR tree-optimization/63660
* gcc.dg/uninit-pr63660.c: New testcase.

tree-optimization/56654 - sort uninit candidates after RPO

The following sorts the immediate uses of a possibly uninitialized
SSA variable by their RPO order so we prefer warning for an
earlier occurring use rather than issuing the diagnostic for the
first uninitialized immediate use.

The sorting will inevitably be imperfect but it also allows us to
optimize the expensive predicate check for the case where there
are multiple uses in the same basic-block which is a nice side-effect.
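
A minimal sketch of sorting candidate uses by the RPO number of their
basic block.  The struct, the tie-break on a statement id and the
file-scope lookup pointer are assumptions made for illustration; they are
not taken from tree-ssa-uninit.cc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical candidate use: the index of the basic block it lives in
   and an id giving its position within that block.  */
struct toy_cand
{
  int bb_index;
  int stmt_uid;
};

/* Reverse lookup from basic-block index to RPO number.  Kept at file
   scope because qsort has no user-data argument; not thread-safe.  */
static const int *toy_bb_to_rpo;

static int
toy_cand_cmp (const void *pa, const void *pb)
{
  const struct toy_cand *a = pa, *b = pb;
  int ra = toy_bb_to_rpo[a->bb_index];
  int rb = toy_bb_to_rpo[b->bb_index];
  if (ra != rb)
    return ra < rb ? -1 : 1;
  /* Same block: order by position within the block.  */
  return (a->stmt_uid > b->stmt_uid) - (a->stmt_uid < b->stmt_uid);
}

/* Sort CANDS in place so that uses in earlier RPO blocks come first.
   Returns CANDS for convenience.  */
struct toy_cand *
toy_sort_cands (struct toy_cand *cands, size_t n, const int *bb_to_rpo)
{
  assert (bb_to_rpo != NULL);
  toy_bb_to_rpo = bb_to_rpo;
  qsort (cands, n, sizeof *cands, toy_cand_cmp);
  return cands;
}
```

Because candidates from the same block end up adjacent after sorting, a
caller could also reuse per-block work across them.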

PR tree-optimization/56654
* tree-ssa-uninit.cc (cand_cmp): New.
(find_uninit_use): First process all PHIs and collect candidate
stmts, then sort those after RPO.
(warn_uninitialized_phi): Pass on bb_to_rpo.
(execute_late_warn_uninitialized): Compute and pass on
reverse lookup of RPO number from basic block index.

Make uninit PHI processing more consistent

Currently the maybe-uninit pass mainly works by scanning over
all PHIs with possibly undefined arguments, diagnosing whether there's
a direct unguarded use.  Unguarded uses in PHIs are queued for
later processing and, to make the uninit analysis PHI def handling work,
the PHI def is marked as possibly uninitialized.  But this happens only
for those PHI uses that happen to be seen before a direct unguarded
use, and whether all arguments of a PHI node which are defined by a PHI
are properly marked as maybe uninitialized depends on the processing
order.

The following changes the uninit pass to perform an RPO walk over
the function, ensuring that PHI argument defs are visited before
the PHI node (besides backedge uses which we ignore already),
getting rid of the worklist.  It also makes sure to process all
PHI uses, but recording those that are properly guarded so they
are not treated as maybe undefined when processing the PHI use
later.

Overall this should make behavior more consistent, avoid some
false negatives caused by the previous early out and ordering issue,
and avoid some false positives caused by the missed recording
of guarded PHI uses.

The patch correctly diagnoses an uninitialized use of 'regnum'
in store_bit_field_1 and also diagnoses an uninitialized use of
best_match::m_best_candidate_len in c-decl.cc which I've chosen to
silence by initializing m_best_candidate_len.  The warning is
a false positive but GCC cannot see that m_best_candidate_len is
initialized when m_best_candidate is not NULL so from this
perspective this was a false negative.  I've added
g++.dg/uninit-pred-5.C with a reduced testcase that nicely shows
how the previous behavior missed the diagnostic because the
worklist ended up visiting the PHI with the dependent uninit
value before visiting the PHIs producing it.

* gimple-predicate-analysis.h (uninit_analysis::operator()):
Remove.
* gimple-predicate-analysis.cc
(uninit_analysis::collect_phi_def_edges): Use phi_arg_set,
simplify a bit.
* tree-ssa-uninit.cc (defined_args): New global.
(compute_uninit_opnds_pos): Mask with the recorded set
of guarded maybe-uninitialized uses.
(uninit_undef_val_t::operator()): Remove.
(find_uninit_use): Process all PHI uses, recording the
guarded ones and marking the PHI result as uninitialized
consistently.
(warn_uninitialized_phi): Adjust.
(execute_late_warn_uninitialized): Get rid of the PHI worklist
and instead walk the function in RPO order.
* spellcheck.h (best_match::m_best_candidate_len): Initialize.

* g++.dg/uninit-pred-5.C: New testcase.

middle-end: fix min/max phiopts reduction [PR106744]

This corrects the argument usage to use them in the order that they occur in
the comparisons in gimple.

gcc/ChangeLog:

PR tree-optimization/106744
* tree-ssa-phiopt.cc (minmax_replacement): Correct arguments.

gcc/testsuite/ChangeLog:

PR tree-optimization/106744
* gcc.dg/tree-ssa/minmax-10.c: Make runtime test.
* gcc.dg/tree-ssa/minmax-11.c: Likewise.
* gcc.dg/tree-ssa/minmax-12.c: Likewise.
* gcc.dg/tree-ssa/minmax-13.c: Likewise.
* gcc.dg/tree-ssa/minmax-14.c: Likewise.
* gcc.dg/tree-ssa/minmax-15.c: Likewise.
* gcc.dg/tree-ssa/minmax-16.c: Likewise.
* gcc.dg/tree-ssa/minmax-3.c: Likewise.
* gcc.dg/tree-ssa/minmax-4.c: Likewise.
* gcc.dg/tree-ssa/minmax-5.c: Likewise.
* gcc.dg/tree-ssa/minmax-6.c: Likewise.
* gcc.dg/tree-ssa/minmax-7.c: Likewise.
* gcc.dg/tree-ssa/minmax-8.c: Likewise.
* gcc.dg/tree-ssa/minmax-9.c: Likewise.

middle-end: initialize regnum in store_bit_field_1

This initializes regnum to 0 for the undefined_p case.
0 is the right default as it's supposed to get the lowpart
when undefined.

gcc/ChangeLog:

* expmed.cc (store_bit_field_1): Initialize regnum to 0.

Daily bump.

c++: Fix C++11 attribute propagation [PR106712]

When we have

  [[noreturn]] int fn1 [[nodiscard]](), fn2();

"noreturn" should apply to both fn1 and fn2 but "nodiscard" only to fn1:
[dcl.pre]/3: "The attribute-specifier-seq appertains to each of
the entities declared by the declarators of the init-declarator-list."
[dcl.spec.general]: "The attribute-specifier-seq affects the type
only for the declaration it appears in, not other declarations involving
the same type."

As Ed Catmur correctly analyzed, this is because, for the test above,
we call start_decl with prefix_attributes=noreturn, but this line:

  attributes = attr_chainon (attributes, prefix_attributes);

results in attributes == prefix_attributes, because chainon sees
that attributes is null so it just returns prefix_attributes.  Then
in grokdeclarator we reach

  *attrlist = attr_chainon (*attrlist, declarator->std_attributes);

which modifies prefix_attributes so now it's "noreturn, nodiscard"
and so fn2 is wrongly marked nodiscard as well.  Fixed by reversing
the order of arguments to attr_chainon.  That way, we tack the prefix
attributes onto ->std_attributes, avoiding modifying prefix_attributes.
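
The aliasing problem described above can be reproduced outside the
compiler.  The following is a minimal sketch, not GCC code: Attr and
chainon are hypothetical stand-ins for the attribute list and for
attr_chainon, assuming only the behavior the message describes (append
in place, and return the second list unchanged when the first is empty).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an attribute list node (not GCC's tree type).
struct Attr {
  std::string name;
  Attr *next;
  explicit Attr(const char *n) : name(n), next(nullptr) {}
};

// Simplified model of chainon: append B to the end of A in place.
// If A is empty, B itself is returned, so the result aliases B.
Attr *chainon(Attr *a, Attr *b) {
  if (!a) return b;
  Attr *tail = a;
  while (tail->next) tail = tail->next;
  tail->next = b;
  return a;
}

// Number of nodes reachable from HEAD.
int length(const Attr *head) {
  int n = 0;
  for (; head; head = head->next) ++n;
  return n;
}

// Old argument order: the shared prefix list is extended in place.
int prefix_length_old_order() {
  Attr noreturn_attr("noreturn");
  Attr nodiscard_attr("nodiscard");
  Attr *prefix = &noreturn_attr;              // shared by fn1 and fn2
  Attr *attrlist = chainon(nullptr, prefix);  // aliases prefix
  attrlist = chainon(attrlist, &nodiscard_attr);
  (void)attrlist;
  return length(prefix);  // the prefix list now also holds nodiscard
}

// Reversed order: the declarator's own list is extended instead.
int prefix_length_new_order() {
  Attr noreturn_attr("noreturn");
  Attr nodiscard_attr("nodiscard");
  Attr *prefix = &noreturn_attr;
  Attr *attrlist = chainon(nullptr, prefix);
  attrlist = chainon(&nodiscard_attr, attrlist);
  (void)attrlist;
  return length(prefix);  // the prefix list is left untouched
}

void demo_chainon() {
  assert(prefix_length_old_order() == 2);
  assert(prefix_length_new_order() == 1);
}
```

With the old order the list seen by the next declarator has grown to
two entries; with the reversed order it still holds only the prefix
attribute.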

PR c++/106712

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Reverse the order of arguments to
attr_chainon.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/gen-attrs-77.C: New test.

bpf: handle anonymous members in CO-RE reloc [PR106745]

The old method for computing a member index for a CO-RE relocation
relied on a name comparison, which could SEGV if the member in question
is itself part of an anonymous inner struct or union.

This patch changes the index computation to not rely on a name, while
maintaining the ability to account for other sibling fields which may
not have a representation in BTF.

gcc/ChangeLog:

PR target/106745
* config/bpf/coreout.cc (bpf_core_get_sou_member_index): Fix
computation of index for anonymous members.

gcc/testsuite/ChangeLog:

PR target/106745
* gcc.target/bpf/core-pr106745.c: New test.

bpf: define __bpf__ as well as __BPF__ as a target macro

LLVM defines both __bpf__ and __BPF__ as target macros.
GCC was defining only __BPF__.

This patch defines __bpf__ as a target macro for BPF.
Tested in bpf-unknown-none.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_target_macros): Define __bpf__ as a
target macro.

x86: Handle V16BF in ix86_avx256_split_vector_move_misalign

Handle E_V16BFmode in ix86_avx256_split_vector_move_misalign and add
V16BF to V_256H iterator.

gcc/

PR target/106748
* config/i386/i386-expand.cc
(ix86_avx256_split_vector_move_misalign): Handle E_V16BFmode.
* config/i386/sse.md (V_256H): Add V16BF.

gcc/testsuite/

PR target/106748
* gcc.target/i386/pr106748.c: New test.

LoongArch: testsuite: refine __tls_get_addr tests with tls_native

If GCC is not built with a working linker for the target (developers
occasionally build such a "minimal" GCC for testing and debugging),
TLS will be emulated and __tls_get_addr won't be used. Refine those
tests depending on __tls_get_addr with tls_native to avoid test
failures.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-medium-1.c: Refine test
depending on __tls_get_addr with { target tls_native }.
* gcc.target/loongarch/func-call-medium-2.c: Likewise.
* gcc.target/loongarch/func-call-medium-3.c: Likewise.
* gcc.target/loongarch/func-call-medium-4.c: Likewise.
* gcc.target/loongarch/func-call-medium-5.c: Likewise.
* gcc.target/loongarch/func-call-medium-6.c: Likewise.
* gcc.target/loongarch/func-call-medium-7.c: Likewise.
* gcc.target/loongarch/func-call-medium-8.c: Likewise.
* gcc.target/loongarch/tls-gd-noplt.c: Likewise.

s390: Change SET rtx_cost handling.

The IF_THEN_ELSE detection currently prevents us from properly costing
register-register moves which causes the lower-subreg pass to assume that
a VR-VR move is as expensive as two GPR-GPR moves.

This patch adds handling for SETs containing REGs as well as MEMs and is
inspired by the aarch64 implementation.

gcc/ChangeLog:

* config/s390/s390.cc (s390_address_cost): Declare.
(s390_hard_regno_nregs): Declare.
(s390_rtx_costs): Add handling for REG and MEM in SET.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-sum-across-no-lower-subreg-1.c: New test.

s390: Recognize reverse/element swap permute patterns.

This adds functions to recognize reverse/element swap permute patterns
for vler, vster as well as vpdi and rotate.

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_vpdi): Recognize swap pattern.
(is_reverse_perm_mask): New function.
(expand_perm_with_rot): Recognize reverse pattern.
(expand_perm_with_vstbrq): New function.
(expand_perm_with_vster): Use vler/vster for element reversal on z15.
(vectorize_vec_perm_const_1): Use.
(s390_vectorize_vec_perm_const): Add expand functions.
* config/s390/vx-builtins.md: Prefer vster over vler.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vperm-rev-z14.c: New test.
* gcc.target/s390/vector/vperm-rev-z15.c: New test.
* gcc.target/s390/zvector/vec-reve-store-byte.c: Adjust test
expectation.

s390: Implement vec_extract via vec_select.

vec_select can handle dynamic/runtime masks nowadays. Therefore we can
get rid of the UNSPEC_VEC_EXTRACT that was preventing further
optimizations like combining instructions with vec_extract patterns.

gcc/ChangeLog:

* config/s390/s390.md: Remove UNSPEC_VEC_EXTRACT.
* config/s390/vector.md: Rewrite patterns to use vec_select.
* config/s390/vx-builtins.md (vec_scatter_element<V_HW_2:mode>_SI):
Likewise.

s390: Use vpdi and verllg in vec_reve.

Swapping the two elements of a V2DImode or V2DFmode vector can be done
with vpdi instead of using the generic way of loading a permutation mask
from the literal pool and vperm.

Analogous to the V2DI/V2DF case, reversing the elements of a four-element
vector can be done by first swapping the elements of the first
doubleword as well as the ones of the second one and subsequently
rotating the doublewords by 32 bits.
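
The data movement for the four-element case can be sketched in plain
C++.  This models only the arithmetic, not the vpdi or verllg
instructions themselves, and it assumes one reading of the description:
the two doublewords are exchanged and each doubleword is rotated by 32
bits (the two steps commute).  The names Vec, pack, rotl32 and reverse4
are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// A 128-bit vector modelled as two 64-bit doublewords; each doubleword
// packs two 32-bit elements, first element in the high half.
using Vec = std::array<std::uint64_t, 2>;

// Pack two 32-bit elements into one doubleword.
constexpr std::uint64_t pack(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Rotate a doubleword left by 32 bits, which swaps its two elements.
constexpr std::uint64_t rotl32(std::uint64_t x) {
  return (x << 32) | (x >> 32);
}

// Reverse the four elements: exchange the doublewords, then rotate each.
Vec reverse4(Vec v) {
  std::swap(v[0], v[1]);
  v[0] = rotl32(v[0]);
  v[1] = rotl32(v[1]);
  return v;
}

void demo_reverse4() {
  Vec v = {pack(1, 2), pack(3, 4)};  // elements 1, 2, 3, 4
  Vec r = reverse4(v);
  assert(r[0] == pack(4, 3));
  assert(r[1] == pack(2, 1));
}
```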

gcc/ChangeLog:

PR target/100869
* config/s390/vector.md (@vpdi4_2<mode>): New pattern.
(rotl<mode>3_di): New pattern.
* config/s390/vx-builtins.md: Use vpdi and verll for reversing
elements.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-reve-int-long.c: New test.

s390: Add z15 to s390_issue_rate.

Be more explicit by mentioning z15 in s390_issue_rate.

gcc/ChangeLog:

* config/s390/s390.cc (s390_issue_rate): Add z15.

s390: Add -munroll-only-small-loops.

Inspired by Power we also introduce -munroll-only-small-loops. This
implies activating -funroll-loops and -munroll-only-small-loops at -O2 and
above.

gcc/ChangeLog:

* common/config/s390/s390-common.cc: Enable -funroll-loops and
-munroll-only-small-loops for OPT_LEVELS_2_PLUS_SPEED_ONLY.
* config/s390/s390.cc (s390_loop_unroll_adjust): Do not unroll
loops larger than 12 instructions.
(s390_override_options_after_change): Set unroll options.
(s390_option_override_internal): Likewise.
* config/s390/s390.opt: Document munroll-only-small-loops.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-copysign.c: Do not unroll.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Ditto.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: Ditto.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Ditto.
* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: Ditto.

Refactor init_use_preds and find_control_equiv_block

The following inlines find_control_equiv_block and is_loop_exit
into init_use_preds and refactors that for better readability and
similarity with the post-dominator walk in compute_control_dep_chain.

* gimple-predicate-analysis.cc (is_loop_exit,
find_control_equiv_block): Inline into single caller ...
(uninit_analysis::init_use_preds): ... here and refactor.

Improve compute_control_dep_chain documentation

The following refactors compute_control_dep_chain slightly by
inlining is_loop_exit and factoring the check on the loop
invariant condition. It also adds a comment on how I
understand the code and its current problem.

* gimple-predicate-analysis.cc (compute_control_dep_chain):
Inline is_loop_exit and refactor, add comment about
loop exits.

RISC-V: Suppress -Wclass-memaccess warning

poly_int64 is a non-trivial type, so we need to clear the frame manually
instead of using memset to prevent this warning.

../../gcc/gcc/config/riscv/riscv.cc: In function 'void riscv_compute_frame_info()':
../../gcc/gcc/config/riscv/riscv.cc:4113:10: error: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct riscv_frame_info'; use assignment or value-initialization instead [-Werror=class-memaccess]
4113 |   memset (frame, 0, sizeof (*frame));
      |   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
../../gcc/gcc/config/riscv/riscv.cc:101:17: note: 'struct riscv_frame_info' declared here
  101 | struct GTY(())  riscv_frame_info {
      |                 ^~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
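
A minimal sketch of the reset approach, assuming a much reduced frame
record: Offset and FrameInfo are hypothetical stand-ins for poly_int64
and riscv_frame_info, and the member names are illustrative only.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a non-trivial member such as poly_int64:
// the user-provided default constructor makes the type non-trivial.
struct Offset {
  long coeff[2];
  Offset() : coeff{0, 0} {}
  Offset(long a, long b) : coeff{a, b} {}
};

// Hypothetical, much reduced frame record (not the real riscv_frame_info).
struct FrameInfo {
  Offset total_size;
  unsigned mask;

  // Clear each member by assignment instead of memset.
  void reset() {
    total_size = Offset();
    mask = 0;
  }
};

// Clearing an object of such a type with memset is what the
// -Wclass-memaccess diagnostic quoted above complains about.
static_assert(!std::is_trivial<FrameInfo>::value,
              "FrameInfo is a non-trivial type");

void demo_reset() {
  FrameInfo frame;
  frame.total_size = Offset(16, 8);
  frame.mask = 0xffu;
  frame.reset();
  assert(frame.total_size.coeff[0] == 0);
  assert(frame.total_size.coeff[1] == 0);
  assert(frame.mask == 0);
}
```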

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_frame_info): Introduce `reset(void)`;
(riscv_frame_info::reset(void)): New.
(riscv_compute_frame_info): Use riscv_frame_info::reset instead
of memset when clearing the frame.

RISC-V: Add RVV registers

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_ext_vector_mode_p): New function.
(riscv_classify_address): Disallow PLUS/LO_SUM/CONST_INT address types for RVV.
(riscv_address_insns): Add RVV modes condition.
(riscv_binary_cost): Ditto.
(riscv_rtx_costs): Adjust cost for RVV.
(riscv_secondary_memory_needed): Add RVV modes condition.
(riscv_hard_regno_nregs): Add RVV register allocation.
(riscv_hard_regno_mode_ok): Add RVV register allocation.
(riscv_class_max_nregs): Add RVV register allocation.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Add VL/VTYPE and vector registers in Dwarf.
(UNITS_PER_V_REG): New macro.
(FIRST_PSEUDO_REGISTER): Adjust first pseudo num for RVV.
(V_REG_FIRST): New macro.
(V_REG_LAST): Ditto.
(V_REG_NUM): Ditto.
(V_REG_P): Ditto.
(VL_REG_P): Ditto.
(VTYPE_REG_P): Ditto.
(RISCV_DWARF_VL): Ditto.
(RISCV_DWARF_VTYPE): Ditto.
(enum reg_class): Add RVV register types.
(REG_CLASS_CONTENTS): Add RVV register types.
* config/riscv/riscv.md: Add VL/VTYPE register number constants.

RISC-V: Add RVV instructions classification

gcc/ChangeLog:

* config/riscv/riscv.md: Add new type for vector instructions.

Daily bump.

rs6000: Allow conversions of MMA pointer types [PR106017]

GCC incorrectly disables conversions between MMA pointer types, which
are allowed with clang. The original intent was to disable conversions
between MMA types and other types, but pointer conversions should
have been allowed. The fix is to just remove the MMA pointer conversion
handling code altogether.

gcc/
PR target/106017
* config/rs6000/rs6000.cc (rs6000_invalid_conversion): Remove handling
of MMA pointer conversions.

gcc/testsuite/
PR target/106017
* gcc.target/powerpc/pr106017.c: New test.

Daily bump.

d: Merge upstream dmd 817610b16d, phobos b578dfad9

D front-end changes:

    - Import latest bug fixes to mainline.

Phobos changes:

    - Import latest bug fixes to mainline.
    - std.logger module has been moved out of experimental.
    - Removed std.experimental.typecons module.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 817610b16d.
* d-ctfloat.cc (CTFloat::parse): Update for new front-end interface.
* d-lang.cc (d_parse_file): Likewise.
* expr.cc (ExprVisitor::visit (AssignExp *)): Remove handling of array
assignments to non-trivial static and dynamic arrays.
* runtime.def (ARRAYASSIGN): Remove.
(ARRAYASSIGN_L): Remove.
(ARRAYASSIGN_R): Remove.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 817610b16d.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/array/arrayassign.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos b578dfad9.
* src/Makefile.am (PHOBOS_DSOURCES): Remove
std/experimental/typecons.d. Add std/logger package.
* src/Makefile.in: Regenerate.

libstdc++: Add test for std::con/disjunction's base class

libstdc++-v3/ChangeLog:

* testsuite/20_util/logical_traits/requirements/base_classes.cc: New test.

Require fgraphite effective target for pr106737.c test [PR106737]

The test uses -floop-parallelize-all which emits a sorry when graphite
isn't configured in.

2022-08-27 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/106737
* gcc.dg/autopar/pr106737.c: Require fgraphite effective target.

contrib: modernize gen_autofdo_event.py

Python 2 has been EOL'ed for two years. egrep has been deprecated
for many years and the next grep release will start to print a warning
if it is used.

The -E option may be unsupported by some non-POSIX grep implementations,
but gcc-auto-profile won't work on non-Linux systems anyway.

contrib/ChangeLog:

* gen_autofdo_event.py: Port to Python 3, and use grep -E
instead of egrep.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.

Daily bump.

libstdc++: Implement LWG 3692/3702 changes to zip_/zip_transform_view

libstdc++-v3/ChangeLog:

* include/std/ranges (zip_view::_Iterator::operator<): Remove
as per LWG 3692.
(zip_view::_Iterator::operator>): Likewise.
(zip_view::_Iterator::operator<=): Likewise.
(zip_view::_Iterator::operator>=): Likewise.
(zip_view::_Iterator::operator<=>): Remove three_way_comparable
constraint as per LWG 3692.
(zip_transform_view::_Iterator): Ditto as per LWG 3702.

libstdc++: Implement ranges::zip_transform_view from P2321R2

libstdc++-v3/ChangeLog:

* include/std/ranges (zip_view::_Iterator): Befriend
zip_transform_view.
(__detail::__range_iter_cat): Define.
(zip_transform_view): Define.
(zip_transform_view::_Iterator): Define.
(zip_transform_view::_Sentinel): Define.
(views::__detail::__can_zip_transform_view): Define.
(views::_ZipTransform): Define.
(views::zip_transform): Define.
* testsuite/std/ranges/zip_transform/1.cc: New test.

libstdc++: Optimize std::con/disjunction, __and_/__or_, etc

The internal type-level logical operator traits __and_ and __or_ seem to
have high overhead for a couple of reasons:

  1. They are drop-in replacements for std::con/disjunction, which
     are rigidly specified to form a type that derives from the first
     type argument that caused the overall computation to short-circuit.
     In practice this inheritance property seems to be rarely needed;
     usually all we care about is the value of the overall result.
  2. Their recursive implementations instantiate O(N) class templates
     and form an inheritance chain of depth O(N).

This patch gets rid of this inheritance property of __and_ and __or_
(which seems to be unneeded in the library except indirectly by
std::con/disjunction) which allows us to redefine them non-recursively
as alias templates that yield either false_type or true_type via
enable_if_t and partial ordering of a pair of function templates
(alternatively we could use an equivalent partially specialized class
template, but using function templates appears to be slightly more
efficient).

As for std::con/disjunction, it seems we need to keep implementing them
via a recursive class template for sake of the inheritance property.
But instead of using inheritance recursion, use a recursive member
typedef that gets immediately flattened, so that specializations thereof
now have O(1) instead of O(N) inheritance depth.

In passing, redefine __not_ as an alias template for consistency with
__and_ and __or_, and to remove a layer of indirection.

Together these changes have a substantial effect on compile time and
memory usage for code that heavily uses these internal type traits.
For the following example (which tests constructibility between two
compatible 257-element tuple types):

  #include <tuple>

  #define M(x) x, x

  using ty1 = std::tuple<M(M(M(M(M(M(M(M(int)))))))), int>;
  using ty2 = std::tuple<M(M(M(M(M(M(M(M(int)))))))), long>;

  static_assert(std::is_constructible_v<ty2, ty1>);

memory usage improves ~27% from 440MB to 320MB and compile time improves
~20% from ~2s to ~1.6s (with -std=c++23).
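
The non-recursive formulation described above can be sketched as
follows.  This is an illustration under assumptions, not the libstdc++
source: the names in namespace sketch are hypothetical, and the sketch
only shows the idea of selecting between two function template overloads
with enable_if_t.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Yields T, but only if every type in the pack is well-formed.
template <typename T, typename...>
using first_t = T;

// Preferred overload (int beats ...): viable only if all Bn::value are true.
template <typename... Bn>
auto and_fn(int)
    -> first_t<std::true_type, std::enable_if_t<bool(Bn::value)>...>;

// Fallback overload, chosen when the one above is removed by SFINAE.
template <typename... Bn>
auto and_fn(...) -> std::false_type;

// Preferred overload: viable only if all Bn::value are false.
template <typename... Bn>
auto or_fn(int)
    -> first_t<std::false_type, std::enable_if_t<!bool(Bn::value)>...>;

// Fallback overload: at least one Bn::value is true.
template <typename... Bn>
auto or_fn(...) -> std::true_type;

// Non-recursive: one overload resolution, no inheritance chain.
template <typename... Bn>
using and_t = decltype(and_fn<Bn...>(0));

template <typename... Bn>
using or_t = decltype(or_fn<Bn...>(0));

}  // namespace sketch

// Empty packs behave like the standard traits.
static_assert(sketch::and_t<>::value, "empty conjunction is true");
static_assert(!sketch::or_t<>::value, "empty disjunction is false");

void demo_traits() {
  assert((sketch::and_t<std::is_integral<int>, std::is_signed<int>>::value));
  assert((!sketch::and_t<std::true_type, std::false_type>::value));
  assert((sketch::or_t<std::false_type, std::true_type>::value));
  assert((!sketch::or_t<std::is_void<int>, std::is_pointer<int>>::value));
}
```

The result is always exactly true_type or false_type, so the
information about which argument caused short-circuiting is lost; that
is the inheritance property the message says is rarely needed.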

libstdc++-v3/ChangeLog:

* include/std/type_traits (enable_if, __enable_if_t): Define them
earlier.
(__detail::__first_t): Define.
(__detail::__or_fn, __detail::__and_fn): Declare.
(__or_, __and_): Redefine as alias templates in terms of __or_fn
and __and_fn.
(__not_): Redefine as an alias template.
(__detail::__disjunction_impl, __detail::__conjunction_impl):
Define.
(conjunction, disjunction): Redefine in terms of __conjunction_impl
and __disjunction_impl.

Add real_iszero to real.*

We have real_isnegzero but no real_iszero. We could memcmp with 0,
but that's just ugly.

gcc/ChangeLog:

* real.cc (real_iszero): New.
* real.h (real_iszero): New.

Add set/get functions for negative infinity in real.*

For the frange implementation with endpoints I'm about to contribute,
we need to set REAL_VALUE_TYPEs with negative infinity. The support
is already there in real.cc, but it is awkward to get at. One could
call real_inf() and then negate the value, but I've added the ability
to pass the sign argument like many of the existing real.* functions.

I've declared the functions in such a way to avoid changes to the
existing code base:

// Unchanged function returning true for either +-INF.
bool real_isinf (const REAL_VALUE_TYPE *r);
// New overload to be able to specify the sign.
bool real_isinf (const REAL_VALUE_TYPE *r, int sign);
// Replacement function for setting INF, defaults to +INF.
void real_inf (REAL_VALUE_TYPE *, int sign = 0);

gcc/ChangeLog:

* real.cc (real_isinf): New overload.
(real_inf): Add sign argument.
* real.h (real_isinf): New overload.
(real_inf): Add sign argument.

c++: Implement -Wself-move warning [PR81159]

About 5 years ago we got a request to implement -Wself-move, which
warns about useless moves like this:

int x;
x = std::move (x);

This patch implements that warning.

PR c++/81159

gcc/c-family/ChangeLog:

* c.opt (Wself-move): New option.

gcc/cp/ChangeLog:

* typeck.cc (maybe_warn_self_move): New.
(cp_build_modify_expr): Call maybe_warn_self_move.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wself-move.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wself-move1.C: New test.

Make all default vrange setters set VARYING.

frange is using some of the default vrange setters, some of which
leave the range in an undefined state. We hadn't noticed this
because neither frange nor unsupported_range, both of which use some
of the default implementation, was being used much.

We can never go wrong with setting VARYING ;-).

gcc/ChangeLog:

* value-range.cc (vrange::set): Set varying.
(vrange::set_nonzero): Same.
(vrange::set_zero): Same.
(vrange::set_nonnegative): Same.

[ranger] x == -0.0 does not mean we can replace x with -0.0

On the true side of x == -0.0, we can't just blindly value propagate
the -0.0 into every use of x because x could be +0.0.

With this change, we only allow the transformation if
!HONOR_SIGNED_ZEROS or if the range is known not to contain 0.
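
A small illustration of why the substitution is unsafe, assuming the
default floating-point semantics where signed zeros are honored (that
is, no -ffast-math or -fno-signed-zeros).  The helper names are
hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True if X compares equal to -0.0; this also holds for +0.0.
bool equals_negative_zero(double x) { return x == -0.0; }

// An observable use of X that depends on the sign of zero.
bool sign_of(double x) { return std::signbit(x); }

void demo_signed_zero() {
  double pos = 0.0, neg = -0.0;
  // Both zeros take the true side of x == -0.0 ...
  assert(equals_negative_zero(pos));
  assert(equals_negative_zero(neg));
  // ... yet replacing pos with -0.0 would change this result.
  assert(sign_of(pos) != sign_of(neg));
}
```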

gcc/ChangeLog:

* range-op-float.cc (foperator_equal::op1_range): Do not blindly
copy op2 range when honoring signed zeros.

Add newline when checking path profitability.

It looks like we're missing a newline for cases where we don't print
anything.

gcc/ChangeLog:

* tree-ssa-threadbackward.cc (possibly_profitable_path_p): Always
add newline.
(profitable_path_p): Same.

libstdc++: Simplify std::error_code and std::error_condition

This removes the redundant operator=(E) from std::error_code and
std::error_condition. Without that overload, assignment from a custom
type will use the templated constructor to create a temporary and then
use the trivial copy assignment operator. With the overloaded
assignment, we have to check the constraints twice as often, because
that overload and its constraints are checked for simple copy
assignments (including the one in the overloaded assignment operator
itself!)

Also add tests that ADL is used as per LWG 3629.
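
From the user's side, assignment from a custom enumeration keeps
working through the converting constructor.  A minimal sketch, assuming
a hypothetical app::Errc enumeration and category; only the
std::error_code, std::error_category and std::is_error_code_enum
interfaces are standard.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <type_traits>

namespace app {

// Hypothetical user-defined error enumeration.
enum class Errc { ok = 0, bad_input = 1 };

// Hypothetical category giving the enumerators a name and messages.
class Category : public std::error_category {
 public:
  const char *name() const noexcept override { return "app"; }
  std::string message(int ev) const override {
    return ev == 1 ? "bad input" : "ok";
  }
};

inline const std::error_category &category() {
  static Category instance;
  return instance;
}

// Found by ADL from std::error_code's templated constructor.
inline std::error_code make_error_code(Errc e) {
  return std::error_code(static_cast<int>(e), category());
}

}  // namespace app

namespace std {
// Opt the enum in to implicit conversion to std::error_code.
template <>
struct is_error_code_enum<app::Errc> : true_type {};
}  // namespace std

void demo_error_code() {
  std::error_code ec;
  // Without operator=(ErrorCodeEnum) this builds a temporary via the
  // converting constructor and then uses the copy assignment operator.
  ec = app::Errc::bad_input;
  assert(ec.value() == 1);
  assert(ec.category() == app::category());
  assert(ec.message() == "bad input");
}
```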

libstdc++-v3/ChangeLog:

* include/std/system_error (error_code::_Check): New alias
template for constructor SFINAE constraint.
(error_code::error_code(ErrorCodeEnum)): Use it.
(error_code::operator=(ErrorCodeEnum)): Remove.
(error_condition::_Check): New alias template for constraint.
(error_condition::error_condition(ErrorConditionEnum)): Use it.
(error_condition::operator=(ErrorConditionEnum)): Remove.
* testsuite/19_diagnostics/error_code/cons/1.cc: Check
constructor taking user-defined error enum.
* testsuite/19_diagnostics/error_condition/cons/1.cc: Likewise.

libstdc++: Add nonnull to starts_with/ends_with/contains string members

Ideally this wouldn't be needed, because eventually these pointers all
get passed to either the basic_string_view(const CharT*) constructor, or
to basic_string_view::find(const CharT*), both of which already have the
attribute. But for that to work requires optimization, so that the null
value gets propagated through the call chain.

Adding it explicitly to each member that requires a non-null pointer
makes the diagnostics more reliable even without optimization. It's
better to give a diagnostic earlier anyway, at the actual problematic
call in the user's code.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (starts_with, ends_with, contains):
Add nonnull attribute.
* include/bits/cow_string.h (starts_with, ends_with, contains):
Likewise.
* include/std/string_view (starts_with, ends_with, contains):
Likewise.
* testsuite/21_strings/basic_string/operations/contains/nonnull.cc
* testsuite/21_strings/basic_string/operations/ends_with/nonnull.cc
* testsuite/21_strings/basic_string/operations/starts_with/nonnull.cc
* testsuite/21_strings/basic_string_view/operations/contains/nonnull.cc
* testsuite/21_strings/basic_string_view/operations/ends_with/nonnull.cc
* testsuite/21_strings/basic_string_view/operations/starts_with/nonnull.cc

libcpp: Implement P2362R3 - Remove non-encodable wide character literals and multicharacter [PR106647]

My understanding of the paper is that we just want to promote the CPP_WCHAR
"character constant too long for its type" warning to an error, as it is
already an error for u8, u and U literals.

2022-08-26 Jakub Jelinek <jakub@redhat.com>

PR c++/106647
* charset.cc (wide_str_to_charconst): Implement P2362R3 - Remove
non-encodable wide character literals and multicharacter. For
C++23 use CPP_DL_ERROR instead of CPP_DL_WARNING for
"character constant too long for its type" diagnostics on CPP_WCHAR
literals.

* g++.dg/cpp23/wchar-multi1.C: New test.
* g++.dg/cpp23/wchar-multi2.C: New test.

Remove uninit_analysis::use_cannot_happen

As written earlier, uninit_analysis::use_cannot_happen duplicates
existing functionality, implemented in a complementary way that does
not adhere to the idea of disproving a may-uninit use.  It possibly
avoids false positives, either because of this or because of the
different limits it imposes on the predicate computations, but I have
not yet found a testcase where it helps.

This patch removes it.

* gimple-predicate-analysis.h
(uninit_analysis::use_cannot_happen): Remove.
* gimple-predicate-analysis.cc (can_be_invalidated_p): Remove.
(uninit_analysis::use_cannot_happen): Likewise.
(uninit_analysis::is_use_guarded): Do not call
use_cannot_happen.
(dump_predicates): Remove.
(simple_control_dep_chain): Remove edge overload.

New testcase for uninit

The following adds a testcase that illustrates a defect in
compute_control_dep_chain and its attempt to identify loop
exits as special to continue walking post-dominators but failing
to do so for following post-dominators. On trunk there is now
simple_control_dep_chain saving the day, avoiding the false
positive but with GCC 12 we get a bogus diagnostic.

* gcc.dg/uninit-pred-11.c: New testcase.

OpenMP: Support reverse offload (middle end part)

gcc/ChangeLog:

* internal-fn.cc (expand_GOMP_TARGET_REV): New.
* internal-fn.def (GOMP_TARGET_REV): New.
* lto-cgraph.cc (lto_output_node, verify_node_partition): Mark
'omp target device_ancestor_host' as in_other_partition and don't
error if absent.
* omp-low.cc (create_omp_child_function): Mark as 'noclone'.
* omp-expand.cc (expand_omp_target): For reverse offload, remove
sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create
empty-body nohost function.
* omp-offload.cc (execute_omp_device_lower): Handle
IFN_GOMP_TARGET_REV.
(pass_omp_target_link::execute): For ACCEL_COMPILER, don't
nullify fn argument for reverse offload

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but
refer to 'requires'.
* testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test.
* testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test.
* testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-1.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry.
* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
* c-c++-common/goacc/classify-kernels-parloops.c: Add 'noclone' to
scan-tree-dump-times.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-serial.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise.
* gfortran.dg/goacc/classify-kernels-parloops.f95: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-serial.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise.

fortran: Expand ieee_arithmetic module's ieee_value inline [PR106579]

The following patch expands IEEE_VALUE function inline in the FE.

2022-08-26 Jakub Jelinek <jakub@redhat.com>

PR fortran/106579
* trans-intrinsic.cc: Include realmpfr.h.
(conv_intrinsic_ieee_value): New function.
(gfc_conv_ieee_arithmetic_function): Handle ieee_value.

fortran: Expand ieee_arithmetic module's ieee_class inline [PR106579]

The following patch expands IEEE_CLASS inline in the FE, using the
__builtin_fpclassify, __builtin_signbit and the new __builtin_issignaling
builtins.

2022-08-26 Jakub Jelinek <jakub@redhat.com>

PR fortran/106579
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_FPCLASSIFY.
* libgfortran.h (IEEE_OTHER_VALUE, IEEE_SIGNALING_NAN,
IEEE_QUIET_NAN, IEEE_NEGATIVE_INF, IEEE_NEGATIVE_NORMAL,
IEEE_NEGATIVE_DENORMAL, IEEE_NEGATIVE_SUBNORMAL,
IEEE_NEGATIVE_ZERO, IEEE_POSITIVE_ZERO, IEEE_POSITIVE_DENORMAL,
IEEE_POSITIVE_SUBNORMAL, IEEE_POSITIVE_NORMAL, IEEE_POSITIVE_INF):
New enum.
* trans-intrinsic.cc (conv_intrinsic_ieee_class): New function.
(gfc_conv_ieee_arithmetic_function): Handle ieee_class.
libgfortran/
* ieee/ieee_helper.c (IEEE_OTHER_VALUE, IEEE_SIGNALING_NAN,
IEEE_QUIET_NAN, IEEE_NEGATIVE_INF, IEEE_NEGATIVE_NORMAL,
IEEE_NEGATIVE_DENORMAL, IEEE_NEGATIVE_SUBNORMAL,
IEEE_NEGATIVE_ZERO, IEEE_POSITIVE_ZERO, IEEE_POSITIVE_DENORMAL,
IEEE_POSITIVE_SUBNORMAL, IEEE_POSITIVE_NORMAL, IEEE_POSITIVE_INF):
Move to gcc/fortran/libgfortran.h.

libgfortran: Use __builtin_issignaling in libgfortran [PR105105]

The following patch makes use of the new __builtin_issignaling,
so it no longer needs the fallback implementation and can use
the builtin even where glibc provides the macro.

2022-08-26 Jakub Jelinek <jakub@redhat.com>

PR fortran/105105
* ieee/ieee_helper.c: Don't include issignaling_fallback.h.
(CLASSMACRO): Use __builtin_issignaling instead of issignaling.
* ieee/issignaling_fallback.h: Removed.

Implement __builtin_issignaling

The following patch implements a new builtin, __builtin_issignaling,
which can be used to implement the ISO/IEC TS 18661-1 issignaling
macro.

It is implemented as type-generic function, so there is just one
builtin, not many with various suffixes.
This patch doesn't address PR56831 nor PR58416, but I think that,
compared to using the glibc issignaling macro, it could make some cases
better (the builtin is always expanded inline and for SFmode/DFmode just
reinterprets a memory or pseudo register as SImode/DImode, so it could
avoid some cases of raising an exception and turning the sNaN into a
qNaN before the builtin can analyze it).

For floating point modes that do not have NaNs it will return 0,
otherwise I've tried to implement this for all the other supported
real formats.
It handles both the MIPS/PA floats where a sNaN has the mantissa
MSB set and the rest where a sNaN has it cleared, with the exception
of formats which are known never to be in the MIPS/PA form.
The MIPS/PA floats are handled using a test like
(x & mask) == mask,
the other usually as
((x ^ bit) & mask) > val
where bit, mask and val are some constants.
IBM double double is done by doing DFmode test on the most significant
half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
handled by extracting 32-bit/16-bit words or 64-bit parts from the
value and testing those.
On x86, XFmode is handled by a special optab so that even pseudo numbers
are considered signaling, like in glibc and like the i386 specific testcase
tests.
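
The second test form can be checked by hand for one format.  The sketch
below applies ((x ^ bit) & mask) > val to raw IEEE 754 binary32 bits
where a sNaN has the mantissa MSB cleared.  The constants are derived
for this illustration and are an assumption, not values quoted from the
patch; the function works on integer bit patterns so that no
floating-point load can quiet the NaN first.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants for IEEE 754 binary32 (non-MIPS/PA encoding).
constexpr std::uint32_t kQuietBit = 0x00400000u;  // mantissa MSB
constexpr std::uint32_t kAbsMask = 0x7fffffffu;   // all bits but the sign
constexpr std::uint32_t kVal = 0x7fc00000u;       // Inf with kQuietBit flipped

// Flipping the quiet bit moves every sNaN above kVal, while infinities,
// quiet NaNs and finite values end up at or below it.
constexpr bool is_signaling_bits(std::uint32_t x) {
  return ((x ^ kQuietBit) & kAbsMask) > kVal;
}

void demo_issignaling() {
  assert(is_signaling_bits(0x7fa00000u));   // sNaN
  assert(is_signaling_bits(0xffa00000u));   // sNaN with sign bit set
  assert(is_signaling_bits(0x7f800001u));   // smallest sNaN payload
  assert(!is_signaling_bits(0x7fc00000u));  // default qNaN
  assert(!is_signaling_bits(0x7f800000u));  // +Inf
  assert(!is_signaling_bits(0x7f7fffffu));  // largest finite value
  assert(!is_signaling_bits(0x3f800000u));  // 1.0f
}
```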

2022-08-26 Jakub Jelinek <jakub@redhat.com>

gcc/
* builtins.def (BUILT_IN_ISSIGNALING): New built-in.
* builtins.cc (expand_builtin_issignaling): New function.
(expand_builtin_signbit): Don't overwrite target.
(expand_builtin): Handle BUILT_IN_ISSIGNALING.
(fold_builtin_classify): Likewise.
(fold_builtin_1): Likewise.
* optabs.def (issignaling_optab): New.
* fold-const-call.cc (fold_const_call_ss): Handle
BUILT_IN_ISSIGNALING.
* config/i386/i386.md (issignalingxf2): New expander.
* doc/extend.texi (__builtin_issignaling): Document.
(__builtin_isinf, __builtin_isnan): Clarify behavior with
-ffinite-math-only.
* doc/md.texi (issignaling<mode>2): Likewise.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_ISSIGNALING.
gcc/c/
* c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_ISSIGNALING.
gcc/testsuite/
* gcc.dg/torture/builtin-issignaling-1.c: New test.
* gcc.dg/torture/builtin-issignaling-2.c: New test.
* gcc.dg/torture/float16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float32x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float64x-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128-builtin-issignaling-1.c: New test.
* gcc.dg/torture/float128x-builtin-issignaling-1.c: New test.
* gcc.target/i386/builtin-issignaling-1.c: New test.

internal-fn, tree-cfg: Fix .TRAP handling and another __builtin_trap vops issue [PR106099]

This patch fixes 2 __builtin_unreachable/__builtin_trap related issues.
One (first hunk) is that CDDCE happily removes calls to .TRAP ()
internal-fn as useless.  The problem is that the internal-fn is
ECF_CONST | ECF_NORETURN, doesn't have lhs and so DCE thinks it doesn't
have side-effects and removes it.  __builtin_unreachable which has
the same ECF_* flags works fine, as since PR44485 we implicitly add
ECF_LOOPING_CONST_OR_PURE to ECF_CONST | ECF_NORETURN builtins, but
do it in flags_from_decl_or_type which isn't called for internal-fns.
As IFN_TRAP is the only ifn with such flags, it seems easier to
add it explicitly.

The other issue (which on the testcase can be seen only with the
first bug unfixed) is that execute_fixup_cfg can add a __builtin_trap
which needs vops, but nothing adds it and it can appear in many passes
which don't have corresponding TODO_update_ssa_only_virtuals etc.
Fixed similarly as last time but emitting ifn there instead.

2022-08-26  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106099
* internal-fn.def (TRAP): Add ECF_LOOPING_CONST_OR_PURE flag.
* tree-cfg.cc (execute_fixup_cfg): Add IFN_TRAP instead of
__builtin_trap to avoid the need of vops.

* gcc.dg/pr106099.c: New test.

c++: Implement C++23 P2071R2 - Named universal character escapes [PR106648]

The following patch implements the
C++23 P2071R2 - Named universal character escapes
paper to support \N{LATIN SMALL LETTER E} etc.
I've used Unicode 14.0, there are 144803 character name properties
(including the ones generated by Unicode NR1 and NR2 rules)
and correction/control/alternate aliases, together with zero terminators
that would be 3884745 bytes, which is clearly unacceptable for libcpp.
This patch instead contains a generator which from the UnicodeData.txt
and NameAliases.txt files emits a space optimized radix tree (208765
bytes long for 14.0), a single string literal dictionary (59418 bytes),
maximum name length (currently 88 chars) and two small helper arrays
for the NR1/NR2 name generation.
The radix tree needs 2 to 9 bytes per node, the exact format is
described in the generator program.  There could be ways to shrink
the dictionary size somewhat at the expense of slightly slower lookups.

Currently the patch implements strict matching (that is what is needed
to actually implement it on valid code) and Unicode UAX44-LM2 algorithm
loose matching to provide hints (that algorithm essentially ignores
hyphens in between two alphanumeric characters, spaces and underscores
(with one exception for hyphen) and does case insensitive matching).
In the attachment is a WIP patch that shows how to implement also
spellcheck.{h,cc} style discovery of misspellings, but I'll need to talk
to David Malcolm about it, as spellcheck.{h,cc} is in gcc/ subdir
(so the WIP incremental patch instead prints all the names to stderr).
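
The loose matching described above can be sketched roughly as follows.
This is a simplified illustration, not the code in charset.cc: the helper
names loose_key and loose_match are hypothetical, the fixed 128-byte
buffers are an assumption based on the 88-character maximum name length
mentioned earlier, and the single hyphen exception of UAX44-LM2 is
deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a comparison key from NAME by dropping
   spaces, underscores and hyphens that sit between two alphanumeric
   characters, and upper-casing everything else.  OUTSZ must be at
   least 1; longer input is truncated.  */
static void loose_key(const char *name, char *out, size_t outsz)
{
    size_t len = strlen(name);
    size_t n = 0;
    for (size_t i = 0; i < len && n + 1 < outsz; i++) {
        unsigned char c = (unsigned char) name[i];
        if (c == ' ' || c == '_')
            continue;
        /* Medial hyphen: alphanumeric on both sides.  */
        if (c == '-' && i > 0 && i + 1 < len
            && isalnum((unsigned char) name[i - 1])
            && isalnum((unsigned char) name[i + 1]))
            continue;
        out[n++] = (char) toupper(c);
    }
    out[n] = '\0';
}

/* Hypothetical helper: nonzero if A and B are equal under the
   simplified loose matching above.  */
static int loose_match(const char *a, const char *b)
{
    char ka[128], kb[128];
    loose_key(a, ka, sizeof ka);
    loose_key(b, kb, sizeof kb);
    return strcmp(ka, kb) == 0;
}
```

Under this sketch "latin_small_letter_e" matches "LATIN SMALL LETTER E",
while a hyphen that follows a space, as in a name such as
"TIBETAN LETTER -A", is kept and therefore still distinguishes that name
from "TIBETAN LETTER A".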

2022-08-26  Jakub Jelinek  <jakub@redhat.com>

PR c++/106648
libcpp/
* charset.cc: Implement C++23 P2071R2 - Named universal character
escapes.  Include uname2c.h.
(hangul_syllables, hangul_count): New variables.
(struct uname2c_data): New type.
(_cpp_uname2c, _cpp_uname2c_uax44_lm2): New functions.
(_cpp_valid_ucn): Use them.  Handle named universal character escapes.
(convert_ucn): Adjust comment.
(convert_escape): Call convert_ucn even for \N.
(_cpp_interpret_identifier): Handle named universal character escapes.
* lex.cc (get_bidi_ucn): Fix up function comment formatting.
(get_bidi_named): New function.
(forms_identifier_p, lex_string): Handle named universal character
escapes.
* makeuname2c.cc: New file.  Small parts copied from makeucnid.cc.
* uname2c.h: New generated file.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_named_character_escapes to 202207L.
gcc/testsuite/
* c-c++-common/cpp/named-universal-char-escape-1.c: New test.
* c-c++-common/cpp/named-universal-char-escape-2.c: New test.
* c-c++-common/cpp/named-universal-char-escape-3.c: New test.
* c-c++-common/cpp/named-universal-char-escape-4.c: New test.
* c-c++-common/Wbidi-chars-25.c: New test.
* gcc.dg/cpp/named-universal-char-escape-1.c: New test.
* gcc.dg/cpp/named-universal-char-escape-2.c: New test.
* g++.dg/cpp/named-universal-char-escape-1.C: New test.
* g++.dg/cpp/named-universal-char-escape-2.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_named_character_escapes.

Improve compute_control_dep_chain path finding

This improves the compute_control_dep_chain path finding by first
marking the dominating region we search and then making sure to
not walk outside of it when enumerating all paths from the dominating
block to the interesting PHI edge source. I have limited the DFS
walk done for the marking in similar ways as we limit the walking
in compute_control_dep_chain, more careful limiting might be
necessary though - the --param uninit-control-dep-attempts param
I re-use has a rather high default of 1000 which we might be able
to reduce with this patch as well (I think we'll usually hit some of the
other limits before ever reaching this).

* gimple-predicate-analysis.cc (dfs_mark_dominating_region):
New helper.
(compute_control_dep_chain): Adjust to honor marked region
if provided.
(uninit_analysis::init_from_phi_def): Pre-mark the dominating
region to improve compute_control_dep_chain walking.
* vec.h (vec<T, va_heap, vl_ptr>::allocated): Add forwarder.

Improve uninit_analysis::collect_phi_def_edges

This avoids expanding an edge to those of a PHI def if it is not
may-undefined, reducing the number of compute_control_dep_chain calls.

* gimple-predicate-analysis.cc
(uninit_analysis::collect_phi_def_edges): Only expand a
PHI def edge when it is possibly undefined.

cr16: remove obsoleted port

contrib/ChangeLog:

* config-list.mk: Remove cr16.

gcc/ChangeLog:

* doc/extend.texi: Remove cr16 related stuff.
* doc/install.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* function-tests.cc (test_expansion_to_rtl): Likewise.
* common/config/cr16/cr16-common.cc: Removed.
* config/cr16/constraints.md: Removed.
* config/cr16/cr16-protos.h: Removed.
* config/cr16/cr16.cc: Removed.
* config/cr16/cr16.h: Removed.
* config/cr16/cr16.md: Removed.
* config/cr16/cr16.opt: Removed.
* config/cr16/predicates.md: Removed.
* config/cr16/t-cr16: Removed.

libgcc/ChangeLog:

* config.host: Remove cr16 related stuff.
* config/cr16/crti.S: Removed.
* config/cr16/crtlibid.S: Removed.
* config/cr16/crtn.S: Removed.
* config/cr16/divmodhi3.c: Removed.
* config/cr16/lib1funcs.S: Removed.
* config/cr16/t-cr16: Removed.
* config/cr16/t-crtlibid: Removed.
* config/cr16/unwind-cr16.c: Removed.
* config/cr16/unwind-dw2.h: Removed.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Remove cr16 related stuff.

Don't gimple fold ymm-version vblendvpd/vblendvps/vpblendvb w/o TARGET_AVX2

256-bit vector integer comparison is only available under TARGET_AVX2,
and gimple folding for vblendvpd/vblendvps/vpblendvb relies on it, so
restrict the gimple fold condition to TARGET_AVX2.

gcc/ChangeLog:

PR target/106704
* config/i386/i386-builtin.def (BDESC): Add
CODE_FOR_avx_blendvpd256/CODE_FOR_avx_blendvps256 to
corresponding builtins.
* config/i386/i386.cc (ix86_gimple_fold_builtin):
Don't fold IX86_BUILTIN_PBLENDVB256, IX86_BUILTIN_BLENDVPS256,
IX86_BUILTIN_BLENDVPD256 w/o TARGET_AVX2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106704.c: New test.

Daily bump.

c: Implement C23 nullptr (N3042)

This patch implements the C23 nullptr literal:
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3042.htm> (with
wording fixes from N3047), which is intended to replace the problematic
definition of NULL which might be either of integer type or void*.

Since C++ has had nullptr for over a decade now, it was relatively easy
to just move the built-in node definitions from the C++ FE to the C/C++
common code.  Also, our DWARF emitter already handles NULLPTR_TYPE by
emitting DW_TAG_unspecified_type.  However, I had to handle a lot of
contexts such as ?:, comparison, conversion, etc.

There are some minor differences, e.g. in C you can do

  bool b = nullptr;

but in C++ you have to use direct-initialization:

  bool b{nullptr};

And I think that

  nullptr_t n = 0;

is only valid in C++.

Of course, C doesn't have to handle mangling, RTTI, substitution,
overloading, ...

This patch also defines nullptr_t in <stddef.h>.  However, it does not
define __STDC_VERSION_STDDEF_H__ yet, because we don't know yet what value
it should be defined to.

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Enable nullptr in C2X.
(c_common_nodes_and_builtins): Create the built-in node for nullptr.
* c-common.h (enum c_tree_index): Add CTI_NULLPTR, CTI_NULLPTR_TYPE.
(struct c_common_resword): Resize the disable member.
(D_C2X): Add.
(nullptr_node): Define.
(nullptr_type_node): Define.
(NULLPTR_TYPE_P): Define.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier): Handle
NULLPTR_TYPE.
(c_pretty_printer::direct_abstract_declarator): Likewise.
(c_pretty_printer::constant): Likewise.

gcc/c/ChangeLog:

* c-convert.cc (c_convert) <case POINTER_TYPE>: Handle NULLPTR_TYPE.
Give a better diagnostic when converting to nullptr_t.
* c-decl.cc (c_init_decl_processing): Perform C-specific nullptr
initialization.
* c-parser.cc (c_parse_init): Maybe OR D_C2X into mask.
(c_parser_postfix_expression): Handle RID_NULLPTR.
* c-typeck.cc (null_pointer_constant_p): Return true when expr is
nullptr_node.
(build_unary_op) <case TRUTH_NOT_EXPR>: Handle NULLPTR_TYPE.
(build_conditional_expr): Handle the case when the second/third operand
is NULLPTR_TYPE and third/second operand is POINTER_TYPE.
(convert_for_assignment): Handle converting an expression of type
nullptr_t to pointer/bool.
(build_binary_op) <case TRUTH_XOR_EXPR>: Handle NULLPTR_TYPE.
<case EQ_EXPR>: Handle comparing operands of type nullptr_t.

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Remove CTI_NULLPTR, CTI_NULLPTR_TYPE.
Move it to c_tree_index.
(nullptr_node): No longer define here.
(nullptr_type_node): Likewise.
(NULLPTR_TYPE_P): Likewise.
* decl.cc (cxx_init_decl_processing): Only keep C++-specific nullptr
initialization; move the shared code to c_common_nodes_and_builtins.

gcc/ChangeLog:

* ginclude/stddef.h: Define nullptr_t.

gcc/testsuite/ChangeLog:

* gcc.dg/c11-nullptr-1.c: New test.
* gcc.dg/c17-nullptr-1.c: New test.
* gcc.dg/c17-nullptr-2.c: New test.
* gcc.dg/c2x-nullptr-1.c: New test.
* gcc.dg/c2x-nullptr-2.c: New test.
* gcc.dg/c2x-nullptr-3.c: New test.
* gcc.dg/c2x-nullptr-4.c: New test.
* gcc.dg/c2x-nullptr-5.c: New test.

c: Support C2x empty initializer braces

ISO C2x standardizes empty initializer braces {}.  Implement this
feature accordingly.  The basic case was already supported and so just
needed diagnostic adjustments.  However, the standard feature also
includes two cases that were not previously supported: empty
initializer braces for scalars, and empty initializer braces for
VLAs.  Thus, add support for those features as well, updating existing
tests that expected them to be diagnosed.

There was already some gimplifier support for converting
variable-sized initializations with empty CONSTRUCTORs to memset.
However, it didn't apply here; code earlier in gimplify_modify_expr
ended up calling gimplify_init_constructor via
gimplify_modify_expr_rhs, which ended up handling the CONSTRUCTOR in a
way that generated an ICE later.  Add a check for this case earlier in
gimplify_modify_expr to avoid that issue.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
* gimplify.cc (gimplify_modify_expr): Convert initialization from
a variable-size CONSTRUCTOR to memset before call to
gimplify_modify_expr_rhs.

gcc/c/
* c-decl.cc (start_decl): Do not diagnose initialization of
variable-sized objects here.
* c-parser.cc (c_parser_braced_init): Add argument DECL.  All
callers changed.
(c_parser_initializer): Diagnose initialization of variable-sized
objects other than with braced initializer.
(c_parser_braced_init): Use pedwarn_c11 for empty initializer
braces and update diagnostic text.  Diagnose initialization of
variable-sized objects with nonempty braces.
* c-typeck.cc (digest_init): Update diagnostic for initialization
of variable-sized objects.
(really_start_incremental_init, set_designator)
(process_init_element): Update comments.
(pop_init_level): Allow scalar empty initializers.

gcc/testsuite/
* gcc.dg/c11-empty-init-1.c, gcc.dg/c11-empty-init-2.c,
gcc.dg/c11-empty-init-3.c, gcc.dg/c2x-empty-init-1.c,
gcc.dg/c2x-empty-init-2.c, gcc.dg/c2x-empty-init-3.c,
gcc.dg/gnu2x-empty-init-1.c, gcc.dg/gnu2x-empty-init-2.c: New
tests.
* gcc.dg/torture/dfp-default-init-1.c: Also test empty
initializers.
* gcc.dg/init-bad-1.c, gcc.dg/noncompile/pr71583.c,
gcc.dg/pr61096-1.c, gcc.dg/vla-init-2.c, gcc.dg/vla-init-3.c,
gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Update expected
diagnostics.
* gcc.dg/ubsan/c-shift-1.c: Use nonempty initializers for VLA
initializations expected to be diagnosed.

c++: block copy elision in delegating ctor

CWG2403 deals with the issue that copy elision is not possible when the
initialized object is a potentially-overlapping subobject and the
initializer is a function that returns by value. Jonathan pointed out that
this also affects delegating constructors, which might be used to construct
a base subobject.

gcc/cp/ChangeLog:

* call.cc (unsafe_return_slot_p): Return 2 for *this in a
constructor.

gcc/testsuite/ChangeLog:

* g++.dg/init/elide8.C: New test.

dwarf2: use DW_ATE_UTF for char8_t

While looking at the Rust changes to dwarf2out I noticed that this was
missing from the char8_t support.

gcc/ChangeLog:

* dwarf2out.cc (base_type_die): Also use DW_ATE_UTF for char8_t.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/utf-1.C: New test.

libstdc++: Some minor <ranges> cleanups

libstdc++-v3/ChangeLog:

* include/std/ranges (lazy_split_view::_OuterIter::_M_current):
Remove redundant comment.
(lazy_split_view::_M_current): Likewise.
(common_view::common_view): Remove commented out view-converting
constructor as per LWG3405.
(elements_view::_Iterator::_Iterator): Uglify 'current' and 'i'.

PR 106101: IBM zSystems: Fix strict_low_part problem

This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:

1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART.  gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.

2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs.  The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand.  However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.

Bootstrapped and regression tested on IBM zSystems 64 bit.

gcc/ChangeLog:

PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.

gcc/testsuite/ChangeLog:

PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.

regenerate configure files and config.h.in files

fixincludes/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.

libada/ChangeLog:

* configure: Regenerate.

libiberty/ChangeLog:

* configure: Regenerate.

libobjc/ChangeLog:

* configure: Regenerate.

liboffloadmic/ChangeLog:

* configure: Regenerate.
* plugin/configure: Regenerate.

libquadmath/ChangeLog:

* configure: Regenerate.

libssp/ChangeLog:

* configure: Regenerate.

libvtv/ChangeLog:

* configure: Regenerate.

zlib/ChangeLog:

* configure: Regenerate.

LoongArch: add model attribute

A linker script and/or a section attribute may locate some object
specially, so we need to handle the code model for such objects
differently than the -mcmodel setting. This happens when the Linux
kernel loads a module with per-CPU variables.

Add an attribute to override the code model for a specific variable.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h (loongarch_symbol_type):
Add SYMBOL_PCREL64 and change the description for SYMBOL_PCREL.
* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_model_attribute): New static function.
(loongarch_classify_symbol): Take TARGET_CMODEL_EXTREME and the
model attribute of SYMBOL_REF_DECL into account returning
SYMBOL_PCREL or SYMBOL_PCREL64.
(loongarch_use_anchors_for_symbol_p): New static function.
(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
(loongarch_symbol_extreme_p): New static function.
(loongarch_symbolic_constant_p): Handle SYMBOL_PCREL64.
(loongarch_symbol_insns): Likewise.
(loongarch_split_symbol_type): Likewise.
(loongarch_split_symbol): Check SYMBOL_PCREL64 instead of
TARGET_CMODEL_EXTREME for PC-relative addressing.
(loongarch_print_operand_reloc): Likewise.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/attr-model-test.c: New test.
* gcc.target/loongarch/attr-model-1.c: New test.
* gcc.target/loongarch/attr-model-2.c: New test.
* gcc.target/loongarch/attr-model-diag.c: New test.

LoongArch: Avoid RTL flag check failure in loongarch_classify_symbol

SYMBOL_REF_TLS_MODEL invokes SYMBOL_REF_FLAGS, and SYMBOL_REF_FLAGS
invokes RTL_FLAG_CHECK1 and aborts when RTL code is not SYMBOL_REF.

r13-1833 removed "gcc_assert (SYMBOL_REF_P (x))" before invoking
"SYMBOL_REF_TLS_MODEL (x)", indicating that it's now possible that "x"
is not a SYMBOL_REF.  So we need to check if "x" is SYMBOL_REF first.

This fixes a test failure happening with r13-2173 with RTL flag
checking enabled:

    pr106096.C:26:1: internal compiler error: RTL flag check:
    SYMBOL_REF_FLAGS used with unexpected rtx code 'const' in
    loongarch_classify_symbol

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_classify_symbol):
Return early if the rtx is not SYMBOL_REF.

tree-optimization/106737 - remove intermediate SSA verification in autopar

The following removes intermediate SSA verification in autopar which
isn't expected to succeed after previous changes delaying (virtual)
SSA update to the end of the pass.

PR tree-optimization/106737
* tree-parloops.cc (transform_to_exit_first_loop_alt): Do not
verify SSA form.

* gcc.dg/autopar/pr106737.c: New testcase.

Fortran/OpenMP: Fix strictly structured blocks parsing

gcc/fortran/ChangeLog:

* parse.cc (parse_omp_structured_block): When parsing strictly
structured blocks, issue an error if the end-directive comes
before the 'end block'.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/strictly-structured-block-4.f90: New test.

LoongArch: Fix pr106459 by using HWIT instead of 1UL.

gcc/ChangeLog:

PR target/106459
* config/loongarch/loongarch.cc (loongarch_build_integer):
Use HOST_WIDE_INT.
* config/loongarch/loongarch.h (IMM_REACH): Likewise.
(HWIT_1U): New macro.
(LU12I_OPERAND): Use HOST_WIDE_INT.
(LU32I_OPERAND): Likewise.
(LU52I_OPERAND): Likewise.
(HWIT_UC_0xFFF): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr106459.c: New test.