review.tizen.org Git - platform/upstream/gcc.git/log

i386 PIE: testsuite: cope with default pie on ia32

This patch continues the effort of cleaning up the testsuite for
--enable-default-pie; the focus herein is mostly 32-bit x86.

As much as I tried to avoid it, most of the changes to the testsuite
simply disable PIC/PIE, for reasons I'm going to detail below.

static-cdtor1.C gets new patterns to match PIE output.  Some
avx512fp16 tests change only in register allocation, because of the
register used to hold the GOT base address.  Interrupt tests changed
in this regard as well, but here it also affected register saving and
restoring.

The previous patch modified cet-sjlj tests, mentioning a single regexp
covering PIC and nonPIC got incorrect match counts.  I found out that
adding ?: to parenthesized subpatterns avoids miscounting matches.
Other tests that count certain kinds of insns needed adjustment over
insns in get_pc_thunk, extra loads from the GOT, or extra adds to
compute addresses.  In one case, namely stack-check-12, it is nonPIC
that had extra insns, that PIC gets rid of, or rather, pushing and
popping the PIC register obviates the dummy push and matching pop used
for stack probing in nonpic.

pr95126 tests were supposed to optimize loads into known constants,
but the @GOTOFF addresses prevent that for reasons I have not
investigated, but that would be clearly desirable, so I've XFAILed
these.  pr95852 is another case of missed optimization: sibcalls are
not possible when the PIC register needs to be set up for the call,
which prevents the expected constant propagation to the return block;
I have adjusted the codegen expectations of these tests.

As for tests that disable PIE...  Some are judgment calls, that fail
for similar reasons as tests described above, but I chose not to
adjust their expectations; others are just not possible with PIC, or
not worth the effort of adjusting.

anon[14].C check for no global or comdat symbols, respectively, but
-fPIE outputs get_pc_thunk, as global hidden comdat.
initlist-const1.C wants .rodata and checks for no .data, but PIC
outputs constant data that needs relocations in .data.rel.ro.local.
no-stack-protector-attr-3.C and stackprotectexplicit2.C count
stack_check_fail matches; -fPIE calls stack_check_fail_local instead,
which matches the pattern, but this symbol is also marked as .hidden,
so the match count needs to be adjusted.

pr71694.C checks for no movl, but get_pc_thunk contains one.
pr102892-1.c is a missed optimization, ivopts creates an induction
variable because the array address can't be part of an indexing base
address with PIE, and that ends up stopping a load from being resolved
to a constant as expected.  sibcall-11.c needs @PLT for the call,
which requires the PIC register, which makes sibcalling impossible.
builtin-self.c, in turn, expects no calls, but PIC calls get_pc_thunk.

avx* vector tests that had PIE disabled were affected in that the need
for GOT-based addressing modes changed instruction selection in ways
that deviated from the expectations of the tests.  Ditto other vector
tests: pr100865*, pr101796-1, pr101846, pr101989-broadcast-1, and
pr102021, pr54855-[37], and pr90773-17.

pr15184* tests need a PIC register to access global variables, which
affects register allocation, so the patterns would have to be
adjusted.  pr27971 can't use the expected addressing mode to
dereference the array with PIC, so it ends up selecting an indexed
addressing mode, obviating the expected separate shift insn.

pr70263-2 is another case that implicitly expects a sibcall,
impossible because of the need for the PIC register; without a
sibcall, the expected REG_EQUIV for the reuse of the stack slot of an
incoming argument does not occur.  pr78035 duplicates the final
compare in both then and else blocks with PIE, which deviates from the
expected cmp count.  pr81736-[57] test for no frame pointer, but the
PIC register assignment to a call-saved register forces a frame; the
former ends up not using the PIC register, but it's only optimized out
after committing to a stack frame to preserve it.  pr85620-6 also
expects a tail call in a situation that is impossible on ia32 PIC.

pr85667-6 doesn't expect the movl in get_pc_thunk.  pr93492-5 tests
-mfentry, not available with PIC on ia32.  pr96539 expects a
tail-call, to avoid copying a large-ish struct argument, but the call
requires the PIC register, so no tail-call.  stack-prot-sym.c expects
a nonpic addressing mode.

for  gcc/testsuite/ChangeLog

* g++.dg/abi/anon1.C: Disable pie on ia32.
* g++.dg/abi/anon4.C: Likewise.
* g++.dg/cpp0x/initlist-const1.C: Likewise.
* g++.dg/no-stack-protector-attr-3.C: Likewise.
* g++.dg/stackprotectexplicit2.C: Likewise.
* g++.dg/pr71694.C: Likewise.
* gcc.dg/pr102892-1.c: Likewise.
* gcc.dg/sibcall-11.c: Likewise.
* gcc.dg/torture/builtin-self.c: Likewise.
* gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise.
* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
* gcc.target/i386/avx512fp16-broadcast-1.c: Likewise.
* gcc.target/i386/avx512fp16-pr101846.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-6c.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr101796-1.c: Likewise.
* gcc.target/i386/pr101846-2.c: Likewise.
* gcc.target/i386/pr101989-broadcast-1.c: Likewise.
* gcc.target/i386/pr102021.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr54855-3.c: Likewise.
* gcc.target/i386/pr54855-7.c: Likewise.
* gcc.target/i386/pr15184-1.c: Likewise.
* gcc.target/i386/pr15184-2.c: Likewise.
* gcc.target/i386/pr27971.c: Likewise.
* gcc.target/i386/pr70263-2.c: Likewise.
* gcc.target/i386/pr78035.c: Likewise.
* gcc.target/i386/pr81736-5.c: Likewise.
* gcc.target/i386/pr81736-7.c: Likewise.
* gcc.target/i386/pr85620-6.c: Likewise.
* gcc.target/i386/pr85667-6.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.
* gcc.target/i386/pr96539.c: Likewise.
PR target/81708 (%gs:my_guard)
* gcc.target/i386/stack-prot-sym.c: Likewise.
* g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC.
* gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns
for PIC/PIE register allocation.
* gcc.target/i386/pr100704-3.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise.
* gcc.target/i386/interrupt-11.c: Likewise, allowing for
preservation of the PIC register.
* gcc.target/i386/interrupt-12.c: Likewise.
* gcc.target/i386/interrupt-13.c: Likewise.
* gcc.target/i386/interrupt-15.c: Likewise.
* gcc.target/i386/interrupt-16.c: Likewise.
* gcc.target/i386/interrupt-17.c: Likewise.
* gcc.target/i386/interrupt-8.c: Likewise.
* gcc.target/i386/cet-sjlj-6a.c: Combine patterns from
previous change.
* gcc.target/i386/cet-sjlj-6b.c: Likewise.
* gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk.
* gcc.target/i386/pr70321.c: Likewise.
* gcc.target/i386/pr81563.c: Likewise.
* gcc.target/i386/pr84278.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from
the GOT.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr94913-2.c: Accept additional PIC insns.
* gcc.target/i386/stack-check-17.c: Likewise.
* gcc.target/i386/stack-check-12.c: Do not require dummy stack
probing obviated with PIC.
* gcc.target/i386/pr95126-m32-1.c: Expect missed optimization
with PIC.
* gcc.target/i386/pr95126-m32-2.c: Likewise.
* gcc.target/i386/pr95852-2.c: Accept different optimization
with PIC.
* gcc.target/i386/pr95852-4.c: Likewise.

ifcvt: Fix up noce_convert_multiple_sets [PR106590]

The following testcase is miscompiled on x86_64-linux.
The problem is in the noce_convert_multiple_sets optimization.
We essentially have:
if (g == 1)
  {
    g = 1;
    f = 23;
  }
else
  {
    g = 2;
    f = 20;
  }
and for each insn try to create a conditional move sequence.
There is code to detect overlap with the regs used in the condition
and the destinations, so we actually try to construct:
tmp_g = g == 1 ? 1 : 2;
f = g == 1 ? 23 : 20;
g = tmp_g;
which is fine.  But, we actually try to create two different
conditional move sequences in each case, seq1 with the whole
(eq (reg/v:HI 82 [ g ]) (const_int 1 [0x1]))
condition and seq2 with cc_cmp
(eq (reg:CCZ 17 flags) (const_int 0 [0]))
to rely on the earlier present comparison.  In each case, we
compare the rtx costs and choose the cheaper sequence (seq1 if both
have the same cost).
The problem is that with the skylake tuning,
tmp_g = g == 1 ? 1 : 2;
is actually expanded as
tmp_g = (g == 1) + 1;
in seq1 (which clobbers (reg 17 flags)) and as a cmov in seq2
(which doesn't).  The tuning says both have the same cost, so we
pick seq1.  Next we check sequences for
f = g == 1 ? 23 : 20; and here the seq2 cmov is cheaper, but it
uses (reg 17 flags) which has been clobbered earlier.

The following patch fixes that by detecting if we in the chosen
sequence clobber some register mentioned in cc_cmp or rev_cc_cmp,
and if yes, arranges for only seq1 (i.e. sequences that emit the
comparison itself) to be used after that.

2022-08-15  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/106590
* ifcvt.cc (check_for_cc_cmp_clobbers): New function.
(noce_convert_multiple_sets_1): If SEQ sets or clobbers any regs
mentioned in cc_cmp or rev_cc_cmp, don't consider seq2 for any
further conditional moves.

* gcc.dg/torture/pr106590.c: New test.

x86: Enable __bf16 type for TARGET_SSE2 and above

gcc/ChangeLog:

* config/i386/i386-builtin-types.def (BFLOAT16): New primitive type.
* config/i386/i386-builtins.cc : Support __bf16 type for i386 backend.
(ix86_register_bf16_builtin_type): New function.
(ix86_bf16_type_node): New.
(ix86_bf16_ptr_type_node): Ditto.
(ix86_init_builtin_types): Add ix86_register_bf16_builtin_type function call.
* config/i386/i386-modes.def (FLOAT_MODE): Add BFmode.
(ADJUST_FLOAT_FORMAT): Ditto.
* config/i386/i386.cc (classify_argument): Handle BFmode.
(construct_container): Ditto.
(function_value_32): Return __bf16 by %xmm0.
(function_value_64): Return __bf16 by SSE register.
(ix86_output_ssemov): Handle BFmode.
(ix86_legitimate_constant_p): Disable BFmode constant double.
(ix86_secondary_reload): Require gpr as intermediate register
to store __bf16 from sse register when sse4 is not available.
(ix86_scalar_mode_supported_p): Enable __bf16 under sse2.
(ix86_mangle_type): Add manlging for __bf16 type.
(ix86_invalid_conversion): New function for target hook.
(ix86_invalid_unary_op): Ditto.
(ix86_invalid_binary_op): Ditto.
(TARGET_INVALID_CONVERSION): New define for target hook.
(TARGET_INVALID_UNARY_OP): Ditto.
(TARGET_INVALID_BINARY_OP): Ditto.
* config/i386/i386.h (host_detect_local_cpu): Add BFmode.
* config/i386/i386.md ("mode"): Add BFmode.
(MODE_SIZE): Ditto.
(X87MODEFH): Ditto.
(HFBF): Add new define_mode_iterator.
(*pushhf_rex64): Change for BFmode.
(*push<mode>_rex64): Ditto.
(*pushhf): Ditto.
(*push<mode>): Ditto.
(MODESH): Ditto.
(hfbfconstf): Add new define_mode_attr.
(*mov<mode>_internal): Add BFmode.

gcc/testsuite/ChangeLog:

* g++.target/i386/bfloat_cpp_typecheck.C: New test.
* gcc.target/i386/bfloat16-1.c: Ditto.
* gcc.target/i386/sse2-bfloat16-1.c: Ditto.
* gcc.target/i386/sse2-bfloat16-2.c: Ditto.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Ditto.

Daily bump.

Move V1TI shift/rotate lowering from expand to pre-reload split on x86_64.

This patch moves the lowering of 128-bit V1TImode shifts and rotations by
constant bit counts to sequences of SSE operations from the RTL expansion
pass to the pre-reload split pass.  Postponing this splitting of shifts
and rotates enables (will enable) the TImode equivalents of these operations/
instructions to be considered as candidates by the (TImode) STV pass.
Technically, this patch changes the existing expanders to continue to
lower shifts by variable amounts, but constant operands become RTL
instructions, specified by define_insn_and_split that are triggered by
x86_pre_reload_split.  The one minor complication is that logical shifts
by multiples of eight, don't get split, but are handled by existing insn
patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should be no
changes in generated code with this patch, which just adjusts the pass
in which transformations get applied.

2022-08-13  Roger Sayle  <roger@nextmovesoftware.com>
    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
New predicate for values between 0/1 and 255, not multiples of 8.
* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
shifts by constant bit counts.
(*ashlvti3_internal): New define_insn_and_split that lowers
logical left shifts by constant bit counts, that aren't multiples
of 8, before reload.
(lshrv1ti3): Delay lowering of logical right shifts by constant.
(*lshrv1ti3_internal): New define_insn_and_split that lowers
logical right shifts by constant bit counts, that aren't multiples
of 8, before reload.
(ashrv1ti3):: Delay lowering of arithmetic right shifts by
constant bit counts.
(*ashrv1ti3_internal): New define_insn_and_split that lowers
arithmetic right shifts by constant bit counts before reload.
(rotlv1ti3): Delay lowering of rotate left by constant.
(*rotlv1ti3_internal): New define_insn_and_split that lowers
rotate left by constant bits counts before reload.
(rotrv1ti3): Delay lowering of rotate right by constant.
(*rotrv1ti3_internal): New define_insn_and_split that lowers
rotate right by constant bits counts before reload.

testsuite: Disable out-of-bounds checker in analyzer/torture/pr93451.c

This patch disables Wanalyzer-out-of-bounds for analyzer/torture/pr93451.c
and makes the test case pass when compiled with -m32.

The emitted warning is a true positive but only occurs if
sizeof (long int) is less than sizeof (double). I've already discussed a
similar case with Dave in the context of pr96764.c and we came to the
conclusion that we just disable the checker in such cases.

Committed under the "obvious fix" rule.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/torture/pr93451.c:
Disable Wanalyzer-out-of-bounds.

Daily bump.

[Committed] arm: Document +no options for Cortex-M55 CPU.

This patch documents the following options for Arm Cortex-M55 CPU
under -mcpu= list.

+nomve.fp (disables MVE single precision floating point instructions)
+nomve (disables MVE integer and single precision floating point instructions)
+nodsp (disables dsp, MVE integer and single precision floating point instructions)
+nofp (disables floating point instructions)
Committed as obvious to master.

gcc/ChangeLog:

2022-08-12 Srinath Parvathaneni <srinath.parvathaneni@arm.com>

* doc/invoke.texi (Arm Options): Document -mcpu=cortex-m55 options.

Fix invalid devirtualization when combining final keyword and anonymous types

this patch fixes a wrong code issue where we incorrectly devirtualize to
__builtin_unreachable.  The problem occurs in combination of anonymous
namespaces and final keyword used on methods.  We do two optimizations here
1) when reacing final method we cut the search for possible new targets
2) if the type is anonymous we detect whether it is ever instatiated by
    looking if its vtable is referred to.
Now this goes wrong when thre is an anonymous type with final method that
is not instantiated while its derived type is.  So if 1 triggers we need
to make 2 to look for vtables of all derived types as done by this patch.

Bootstrpaped/regtested x86_64-linux

Honza

gcc/ChangeLog:

2022-08-10  Jan Hubicka  <hubicka@ucw.cz>

PR middle-end/106057
* ipa-devirt.cc (type_or_derived_type_possibly_instantiated_p): New
function.
(possible_polymorphic_call_targets): Use it.

gcc/testsuite/ChangeLog:

2022-08-10  Jan Hubicka  <hubicka@ucw.cz>

PR middle-end/106057
* g++.dg/tree-ssa/pr101839.C: New test.

Improve comment for tree_niter_desc.{control,bound,cmp}

Fix typos and explain ERROR_MARK usage.

gcc/ChangeLog:

* tree-ssa-loop.h: Improve comment

phiopt: Remove unnecessary checks from spaceship_replacement [PR106506]

Those 2 checks were just me trying to be extra careful, the
(phires & 1) == phires and variants it is folded to of course make only sense
for the -1/0/1/2 result spaceship, for -1/0/1 one can just use comparisons of
phires.  We only floating point spaceship if nans aren't honored, so the
2 case is ignored, and if it is, with Aldy's changes we can simplify the
2 case away from the phi but the (phires & 1) == phires stayed.  It is safe
to treat the phires comparison as phires >= 0 even then.

2022-08-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106506
* tree-ssa-phiopt.cc (spaceship_replacement): Don't punt for
is_cast or orig_use_lhs cases if phi_bb has 3 predecessors.

* g++.dg/opt/pr94589-2.C: New test.

tree-optimization/106593 - fix ICE with backward threading

With the last re-org I failed to make sure to not add SSA names
nor supported by ranger into m_imports which then triggers an
ICE in range_on_path_entry because range_of_expr returns false.

PR tree-optimization/106593
* tree-ssa-threadbackward.cc (back_threader::find_paths):
If the imports from the conditional do not satisfy
gimple_range_ssa_p don't try to thread anything.

sve: Fix fcmuo combine patterns [PR106524]

There's no encoding for fcmuo with zero. This restricts the combine patterns
from accepting zero registers.

gcc/ChangeLog:

PR target/106524
* config/aarch64/aarch64-sve.md (*fcmuo<mode>_nor_combine,
*fcmuo<mode>_bic_combine): Don't accept comparisons against zero.

gcc/testsuite/ChangeLog:

PR target/106524
* gcc.target/aarch64/sve/pr106524.c: New test.

analyzer: out-of-bounds checker [PR106000]

This patch adds an experimental out-of-bounds checker to the analyzer.

The checker was tested on coreutils, curl, httpd and openssh. It is mostly
accurate but does produce false-positives on yacc-generated files and
sometimes when the analyzer misses an invariant. These cases will be
documented in bugzilla.
Regression-tested on Linux x86-64, further ran the analyzer tests with
the -m32 option.

2022-08-11 Tim Lange <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

PR analyzer/106000
* analyzer.opt: Add Wanalyzer-out-of-bounds.
* region-model.cc (class out_of_bounds): Diagnostics base class
for all out-of-bounds diagnostics.
(class past_the_end): Base class derived from out_of_bounds for
the buffer_overflow and buffer_overread diagnostics.
(class buffer_overflow): Buffer overflow diagnostics.
(class buffer_overread): Buffer overread diagnostics.
(class buffer_underflow): Buffer underflow diagnostics.
(class buffer_underread): Buffer overread diagnostics.
(region_model::check_region_bounds): New function to check region
bounds for out-of-bounds accesses.
(region_model::check_region_access):
Add call to check_region_bounds.
(region_model::get_representative_tree): New function that accepts
a region instead of an svalue.
* region-model.h (class region_model):
Add region_model::check_region_bounds.
* region.cc (region::symbolic_p): New predicate.
(offset_region::get_byte_size_sval): Only return the remaining
byte size on offset_regions.
* region.h: Add region::symbolic_p.
* store.cc (byte_range::intersects_p):
Add new function equivalent to bit_range::intersects_p.
(byte_range::exceeds_p): New function.
(byte_range::falls_short_of_p): New function.
* store.h (struct byte_range): Add byte_range::intersects_p,
byte_range::exceeds_p and byte_range::falls_short_of_p.

gcc/ChangeLog:

PR analyzer/106000
* doc/invoke.texi: Add Wanalyzer-out-of-bounds.

gcc/testsuite/ChangeLog:

PR analyzer/106000
* g++.dg/analyzer/pr100244.C: Disable out-of-bounds warning.
* gcc.dg/analyzer/allocation-size-3.c:
Disable out-of-bounds warning.
* gcc.dg/analyzer/memcpy-2.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr101962.c: Add dg-warning.
* gcc.dg/analyzer/pr96764.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr97029.c:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/realloc-5.c: Add dg-warning.
* gcc.dg/analyzer/test-setjmp.h:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/zlib-3.c: Add dg-bogus.
* g++.dg/analyzer/out-of-bounds-placement-new.C: New test.
* gcc.dg/analyzer/out-of-bounds-1.c: New test.
* gcc.dg/analyzer/out-of-bounds-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-container_of.c: New test.
* gcc.dg/analyzer/out-of-bounds-coreutils.c: New test.
* gcc.dg/analyzer/out-of-bounds-curl.c: New test.

analyzer: consider that realloc could shrink the buffer [PR106539]

This patch adds the "shrinks buffer" case to the success_with_move
modelling of realloc.

Regression-tested on Linux x86-64, further ran the analyzer tests with
the -m32 option.

2022-08-11 Tim Lange <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

PR analyzer/106539
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Use the result of get_copied_size as the size for the
sized_regions in realloc.
(success_with_move::get_copied_size): New function.

gcc/testsuite/ChangeLog:

PR analyzer/106539
* gcc.dg/analyzer/pr106539.c: New test.
* gcc.dg/analyzer/realloc-5.c: New test.

[AARCH64] Remove reference to MD_INCLUDES

The comment reference to MD_INCLUDES is not needed
as it is auto generated for long time now even before
aarch64 target was added.

MD_INCLUDES has been auto generated since r0-64489.
Note some targets still manually set MD_INCLUDES and
I suspect those can be changed but I don't have access
to those targets.

Committed as obvious.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* config/aarch64/aarch64.md: Remove comment
about MD_INCLUDES as it is out of date and not needed.

Daily bump.

testsuite: fd-4.c redefines mode_t on AIX.

AIX stdio.h includes sys/types.h, which defines mode_t. The
analyzer/fd-4.c testcase provides a definition of mode_t for creat()
call, which conflicts with the AIX definition. This patch defines an
AIX macro to prevent multiple-definition of the type.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-4.c: Define _MODE_T on AIX.

testcase: Fix AIX testsuite failures

Recent testsuite additions trip over AIX-specific features.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-const1.C: XFAIL on AIX.

analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

This patch fixes the ICE caused by valid_to_unchecked_state,
at analyzer/sm-fd.cc by handling the m_start state in
check_for_dup.

Tested lightly on x86_64.

gcc/analyzer/ChangeLog:
PR analyzer/106551
* sm-fd.cc (check_for_dup): handle the m_start
state when transitioning the state of LHS
of dup, dup2 and dup3 call.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-dup-1.c: New testcases.
* gcc.dg/analyzer/fd-uninit-1.c: Remove bogus
warning.
Signed-off-by: Immad Mir <mirimmad@outlook.com>

c-family: Honor -Wno-init-self for cv-qual vars [PR102633]

Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

  const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711     case DECL_EXPR:
715       if (VAR_P (DECL_EXPR_DECL (*expr_p))
716           && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717           && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718           && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL (*expr_p))
719           && !warn_init_self)
720         suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.

PR middle-end/102633

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_gimplify_expr) <case DECL_EXPR>: Don't call
suppress_warning here.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_initializer): Add new tree parameter.  Use it.
Call suppress_warning.
(c_parser_declaration_or_fndef): Pass d down to c_parser_initializer.
(c_parser_omp_declare_reduction): Pass omp_priv down to
c_parser_initializer.

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Call suppress_warning.

gcc/testsuite/ChangeLog:

* c-c++-common/Winit-self1.c: New test.
* c-c++-common/Winit-self2.c: New test.

Tame path_range_query::compute_imports

This avoids going BBs outside of the path when adding def chains
to the set of imports. It also syncs the code with
range_def_chain::get_def_chain to not miss out on some imports
this function would identify.

* gimple-range-path.cc (path_range_query::compute_imports):
Restrict walking SSA defs to blocks inside the path. Track
the same operands as range_def_chain::get_def_chain does.

tree-optimization/106514 - revisit m_import compute in backward threading

This revisits how we compute imports later used for the ranger path
query during backwards threading.  The compute_imports function
of the path solver ends up pulling the SSA def chain of regular
stmts without limit and since it starts with just the gori imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact with a still empty path this
compute_imports function looks like not the correct tool.

The following instead implements what it does during the path discovery
and since we add the exit block we seed the initial imports and
interesting names from just the exit conditional.  When we then
process interesting names (aka imports we did not yet see the definition
of) we prune local defs but add their uses in a similar way as
compute_imports would have done.

compute_imports also is lacking in its walking of the def chain
compared to range_def_chain::get_def_chain which for example
handles &_1->x specially through range_op_handler and
gimple_range_operand1, so the code copies this.  A fix for
compute_imports will be done separately, also fixing the unbound
walk there.

The patch also properly unwinds m_imports during the path discovery
backtracking and from a debugging session I have verified the two
sets evolve as expected now while previously behaving slightly erratic.

Fortunately the m_imports set now also is shrunken significantly for
the PR69592 testcase (aka PR106514) so that there's overall speedup
when increasing --param max-jump-thread-duplication-stmts as
15 -> 30 -> 60 -> 120 from 1s -> 2s -> 13s -> 27s to with the patch
1s -> 2s -> 4s -> 8s.

This runs into a latent issue in X which doesn't seem to expect
any PHI nodes with a constant argument on an edge inside the path.
But we now have those as interesting, for example for the ICEing
g++.dg/torture/pr100925.C which just has sth like

  if (i)
    x = 1;
  else
    x = 5;
  if (x == 1)
    ...

where we now have the path from if (i) to if (x) and the PHI for x
in the set of imports to consider for resolving x == 1 which IMHO
looks exactly like what we want.  The path_range_query::ssa_range_in_phi
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).

PR tree-optimization/106514
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Compute and unwind both m_imports and interesting on the fly during
path discovery.
(back_threader::find_paths): Compute the original m_imports
from just the SSA uses of the exit conditional.  Drop
handling single_succ_to_potentially_threadable_block.
* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.

* gcc.dg/tree-ssa/ssa-thread-19.c: Un-XFAIL.
* gcc.dg/tree-ssa/ssa-thread-20.c: New testcase.

testsuite: Fix up pr106243* tests on i686-linux [PR106243]

These 2 tests were FAILing on i686-linux or e.g. with
--target_board=unix/-m32/-mno-sse on x86_64-linux due to
-Wpsabi warnings.

2022-08-11 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/106243
* gcc.dg/pr106243.c: Add -Wno-psabi to dg-options.
* gcc.dg/pr106243-1.c: Likewise.

testsuite: Fix up pr104992* tests on i686-linux [PR104992]

These 2 tests were FAILing on i686-linux or e.g. with
--target_board=unix/-m32/-mno-sse on x86_64-linux due to
-Wpsabi warnings and also because dg-options in the latter
test has been ignored due to missing space, so even -O2
wasn't passed at all.

2022-08-11 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104992
* gcc.dg/pr104992.c: Add -Wno-psabi to dg-options.
* g++.dg/pr104992-1.C: Likewise. Add space between " and } in
dg-options.

Fix path query compute_imports for external path

The following fixes the use of compute_imports from the backwards
threader which ends up accessing stale m_path from a previous
threading attempt.  The fix is to pass in the path explicitely
(and not the exit), and initializing it with the exit around this
call from the backwards threader.  That unfortunately exposed that
we rely on this broken behavior as the new testcase shows.  The
missed threading can be restored by registering all relations
from conditions on the path during solving, for the testcase the
particular important case is for relations provided by the path
entry conditional.

I've verified that the GORI query for imported ranges on edges
is not restricted this way.

This regresses the new ssa-thread-19.c testcase which is exactly
a case for the other patch re-doing how we compute imports since
this misses imports for defs that are not on the dominating path
from the exit.

That's one of the cases this regresses (it also progresses a few
due to more or the correct relations added).  Overall it
reduces the number of threads from 98649 to 98620 on my set of
cc1files.  I think it's a reasonable intermediate step to find
a stable, less random ground to compare stats to.

* gimple-range-path.h (path_range_query::compute_imports):
Take path as argument, not the exit block.
* gimple-range-path.cc (path_range_query::compute_imports):
Likewise, and adjust, avoiding possibly stale m_path.
(path_range_query::compute_outgoing_relations): Register
relations for all conditionals.
* tree-ssa-threadbackward.cc (back_threader::find_paths):
Adjust.

* gcc.dg/tree-ssa/ssa-thread-18.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-19.c: Likewise, but XFAILed.

rs6000: Simplify some code with rs6000_builtin_is_supported

In function rs6000_init_builtins, there is a oversight that
in one target debugging hunk with TARGET_DEBUG_BUILTIN we
missed to handle enum bif_enable ENB_CELL.  It's easy to
fix it by adding another if case.  But considering the long
term maintainability, this patch updates it with the existing
function rs6000_builtin_is_supported, which centralizes the
related conditions for different enum bif_enable, we only
need to update that function once some condition needs to
be changed later.  This also simplifies another usage in
function rs6000_expand_builtin.

gcc/ChangeLog:

* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Fix the
oversight on ENB_CELL by simplifying with rs6000_builtin_is_supported.
(rs6000_expand_builtin): Simplify with rs6000_builtin_is_supported.

rs6000: Remove stale rs6000_global_entry_point_needed_p

r10-631 had renamed rs6000_global_entry_point_needed_p to
rs6000_global_entry_point_prologue_needed_p. This is to
remove the stale function declaration.

gcc/ChangeLog:

* config/rs6000/rs6000-internal.h (rs6000_global_entry_point_needed_p):
Remove function declaration.

Daily bump.

tree-optimization/106513 - fix mistake in bswap symbolic number shifts

This fixes a mistake in typing a local variable in the symbolic
shift routine.

PR tree-optimization/106513
* gimple-ssa-store-merging.cc (do_shift_rotate): Use uint64_t
for head_marker.

* gcc.dg/torture/pr106513.c: New testcase.

lto: respect jobserver in parallel WPA streaming

PR lto/106328

gcc/ChangeLog:

* opts-jobserver.h (struct jobserver_info): Add pipefd.
(jobserver_info::connect): New.
(jobserver_info::disconnect): Likewise.
(jobserver_info::get_token): Likewise.
(jobserver_info::return_token): Likewise.
* opts-common.cc: Implement the new functions.

gcc/lto/ChangeLog:

* lto.cc (wait_for_child): Decrement nruns once a process
finishes.
(stream_out_partitions): Use job server if active.
(do_whole_program_analysis): Likewise.

lto: support --jobserver-style=fifo for recent GNU make

gcc/ChangeLog:

* opts-jobserver.h: Add one member.
* opts-common.cc (jobserver_info::jobserver_info): Parse FIFO
format of --jobserver-auth.

Factor out jobserver_active_p.

gcc/ChangeLog:

* gcc.cc (driver::detect_jobserver): Remove and move to
jobserver.h.
* lto-wrapper.cc (jobserver_active_p): Likewise.
(run_gcc): Likewise.
* opts-jobserver.h: New file.
* opts-common.cc (jobserver_info::jobserver_info): New function.

[Committed] PR other/106575: Use "signed char" in new fold-eqandshift-4.c

My recently added testcase gcc.dg/fold-eqandshift-4.c, incorrectly assumed
that "char" was "signed char", and hence fails on powerpc64 where this
isn't the case.  Fixed by making "signed char" explicit where needed in
this test.  Committed as obvious.

2022-08-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
PR other/106575
* gcc.dg/fold-eqandshift-4.c: Use "signed char" explicitly.

Daily bump.

analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573]

We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).

Fixed thusly.

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Ensure that we call
get_arg_svalue on all arguments.

gcc/testsuite/ChangeLog:
PR analyzer/106573
* gcc.dg/analyzer/error-uninit.c: New test.
* gcc.dg/analyzer/fd-uninit-1.c: New test.
* gcc.dg/analyzer/file-uninit-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Use PTEST to perform AND in TImode STV of (A & B) != 0 on x86_64.

This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation.  Previously
PTEST was (mostly) used for comparison against zero, by using the same
operands.  The benefits are demonstrated by the new test case:

__int128 a,b;
int foo()
{
  return (a & b) != 0;
}

Currently with -O2 -msse4 we generate:

        movdqa  a(%rip), %xmm0
        pand    b(%rip), %xmm0
        xorl    %eax, %eax
        ptest   %xmm0, %xmm0
        setne   %al
        ret

with this patch we now generate:

        movdqa  a(%rip), %xmm0
        xorl    %eax, %eax
        ptest   b(%rip), %xmm0
        setne   %al
        ret

Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to performed independently
of whether TImode STV is run before or after combine.  The one tricky
case is that immediate constant operands of the AND behave slightly
differently between TImode and V1TImode: All V1TImode immediate operands
becomes loads, but for TImode only values that are not hilo_operands
need to be loaded.  Hence the new *testti_doubleword accepts any
general_operand, but internally during split calls force_reg whenever
the second operand is not x86_64_hilo_general_operand.  This required
(benefits from) some tweaks to TImode STV to support CONST_WIDE_INT in
more places, using CONST_SCALAR_INT_P instead of just CONST_INT_P.

2022-08-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Create new pseudos only when/if needed.  Add support for TEST,
i.e. (COMPARE (AND x y) (const_int 0)), using UNSPEC_PTEST.
When broadcasting V2DImode and V4SImode use new pseudo register.
(timode_scalar_chain::convert_op): Do nothing if operand is
already V1TImode.  Avoid generating useless SUBREG conversions,
i.e. (SUBREG:V1TImode (REG:V1TImode) 0).  Handle CONST_WIDE_INT
in addition to CONST_INT by using CONST_SCALAR_INT_P.
(convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
CONST_WIDE_INT and CONST_INT.  Recognize new *testti_doubleword
pattern as an STV candidate.
(timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
operands in binary logic operations.

* config/i386/i386.cc (ix86_rtx_costs) <case UNSPEC>: Add costs
for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
as regular PTEST, i.e. cost->sse_op.

* config/i386/i386.md (*testti_doubleword): New pre-reload
define_insn_and_split that recognizes comparison of TI mode AND
against zero.
* config/i386/sse.md (*ptest<mode>_and): New pre-reload
define_insn_and_split that recognizes UNSPEC_PTEST of identical
AND operands.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-8.c: New test case.

middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.

Following my middle-end patch for PR tree-optimization/94026, I'd promised
Jeff Law that I'd clean up the dead-code in fold-const.cc now that these
optimizations are handled in match.pd.  Alas, I discovered things aren't
quite that simple, as the transformations I'd added avoided cases where
C2 overlapped with the new bits introduced by the shift, but the original
code handled any value of C2 provided that it had a single-bit set (under
the condition that C3 was always zero).

This patch upgrades the transformations supported by match.pd to cover
any values of C2 and C3, provided that C1 is a valid bit shift constant,
for all three shift types (logical right, arithmetic right and left).
This then makes the code in fold-const.cc fully redundant, and adds
support for some new (corner) cases not previously handled.  If the
constant C1 is valid for the type's precision, the shift is now always
eliminated (with C2 and C3 possibly updated to test the sign bit).

Interestingly, the fold-const.cc code that I'm now deleting was originally
added by me back in 2006 to resolve PR middle-end/21137.  I've confirmed
that those testcase(s) remain resolved with this patch (and I'll close
21137 in Bugzilla).  This patch also implements most (but not all) of the
examples mentioned in PR tree-optimization/98954, for which I have some
follow-up patches.

2022-08-09  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* fold-const.cc (fold_binary_loc): Remove optimizations to
optimize ((X >> C1) & C2) ==/!= 0.
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
check, and handle all values of INTEGER_CSTs @2 and @3.
(cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
checks, and handle all values of INTEGER_CSTs @2 and @3.

gcc/testsuite/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* gcc.dg/fold-eqandshift-4.c: New test case.

libgccjit.h: Uncomment macro definition for testing gcc_jit_context_new_bitcast support

The macro definition for LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast
was earlier located in the documentation comment for
gcc_jit_context_new_bitcast, making it unavailable to code that
consumed libgccjit.h. This commit moves the definition out of the
comment, making it effective.

gcc/jit/ChangeLog:
* libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast): Move
definition out of comment.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

docs: add notes on which functions -fanalyzer has hardcoded knowledge of

gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add notes on which
functions the analyzer has hardcoded knowledge of.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

d: Fix undefined reference to pragma(inline) symbol (PR106563)

Functions that are declared `pragma(inline)' should be treated as if
they are defined in every translation unit they are referenced from,
regardless of visibility protection. Ensure they always get
DECL_ONE_ONLY linkage, and start emitting them into other modules that
import them.

PR d/106563

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun
before generating its symbol.
(function_defined_in_root_p): New function.
(function_needs_inline_definition_p): New function.
(maybe_build_decl_tree): New function.
(get_symbol_decl): Call maybe_build_decl_tree before returning symbol.
(start_function): Use function_defined_in_root_p instead of inline
test for locally defined symbols.
(set_linkage_for_decl): Check for inline functions before private or
protected symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/torture.exp (srcdir): New proc.
* gdc.dg/torture/imports/pr106563math.d: New test.
* gdc.dg/torture/imports/pr106563regex.d: New test.
* gdc.dg/torture/imports/pr106563uni.d: New test.
* gdc.dg/torture/pr106563.d: New test.

amdgcn: Vector procedure call ABI

Adjust the (unofficial) procedure calling ABI such that vector arguments are
passed in vector registers, not on the stack.  Scalar arguments continue to
be passed in scalar registers, making a total of 12 argument registers.

The return value is also moved to a vector register (even for scalars; it
would be possible to retain the scalar location, using untyped_call, but
there's no obvious advantage in doing so).

After this change the ABI is as follows:

s0-s13  : Reserved for kernel launch parameters.
s14-s15 : Frame pointer.
s16-s17 : Stack pointer.
s18-s19 : Link register.
s20-s21 : Exec Save.
s22-s23 : CC Save.
s24-s25 : Scalar arguments.          NO LONGER RETURN VALUE.
s26-s29 : Additional scalar arguments (makes 6 total).
s30-s31 : Static Chain.
v0      : Prologue/epilogue scratch.
v1      : Constant 0, 1, 2, 3, 4, ... 63.
v2-v7   : Prologue/epilogue scratch.
v8-v9   : Return value & vector arguments.              NEW.
v10-v13 : Additional vector arguments (makes 6 total).  NEW.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_function_value): Allow vector return values.
(num_arg_regs): Allow vector arguments.
(gcn_function_arg): Likewise.
(gcn_function_arg_advance): Likewise.
(gcn_arg_partial_bytes): Likewise.
(gcn_return_in_memory): Likewise.
(gcn_expand_epilogue): Get return value from v8.
* config/gcn/gcn.h (RETURN_VALUE_REG): Set to v8.
(FIRST_PARM_REG): USE FIRST_SGPR_REG for clarity.
(FIRST_VPARM_REG): New.
(FUNCTION_ARG_REGNO_P): Allow vector parameters.
(struct gcn_args): Add vnum field.
(LIBCALL_VALUE): All vector return values.
* config/gcn/gcn.md (gcn_call_value): Add vector constraints.
(gcn_call_value_indirect): Likewise.

autopar TLC

The following removes all excessive update_ssa calls from OMP
expansion, thereby rewriting the atomic load and store cases to
GIMPLE code generation. I don't think autopar ever exercises the
atomics code though.

There's not much test coverage overall so I've built SPEC 2k17
with -floop-parallelize-all -ftree-parallelize-loops=2 with and
without LTO (and otherwise -Ofast plus -march=haswell) without
fallout.

If there's any fallout it's not OK to update SSA form for
each and every OMP stmt lowered.

* omp-expand.cc (expand_omp_atomic_load): Emit GIMPLE
directly. Avoid update_ssa when in SSA form.
(expand_omp_atomic_store): Likewise.
(expand_omp_atomic_fetch_op): Avoid update_ssa when in SSA
form.
(expand_omp_atomic_pipeline): Likewise.
(expand_omp_atomic_mutex): Likewise.
* tree-parloops.cc (gen_parallel_loop): Use
TODO_update_ssa_no_phi after loop_version.

Remove --param max-fsm-thread-length

This removes max-fsm-thread-length which is obsoleted by
max-jump-thread-paths.

* doc/invoke.texi (max-fsm-thread-length): Remove.
* params.opt (max-fsm-thread-length): Likewise.
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Do not
check max-fsm-thread-length.

tree-optimization/106514 - add --param max-jump-thread-paths

The following adds a limit for the exponential greedy search of
the backwards jump threader.  The idea is to limit the search
space in a way that the paths considered are the same if the search
were in BFS order rather than DFS.  In particular it stops considering
incoming edges into a block if the product of the in-degrees of
blocks on the path exceeds the specified limit.

When considering the low stmt copying limit of 7 (or 1 in the size
optimize case) this means the degenerate case with maximum search
space is a sequence of conditions with no actual code

  B1
   |\
   | empty
   |/
  B2
   |\
   ...
  Bn
   |\

GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so
we reach 7 already with 3 middle conditions (B1 and Bn do not count).
The search space would be 2^4 == 16 to reach this.  The FSM threads
historically allowed for a thread length of 10 but is really looking
for a single multiway branch threaded across the backedge.  I've
chosen the default of the new parameter to 64 which effectively
limits the outdegree of the switch statement (the cases reaching the
backedge) to that number (divided by 2 until I add some special
pruning for FSM threads due to the loop header indegree).  The
testcase ssa-dom-thread-7.c requires 56 at the moment (as said,
some special FSM thread pruning of considered edges would bring
it down to half of that), but we now get one more threading
and quite some more in later threadfull.  This testcase seems to
be difficult to check for expected transforms.

The new testcases add the degenerate case we currently thread
(without deciding whether that's a good idea ...) plus one with
an approripate limit that should prevent the threading.

This obsoletes the mentioned --param max-fsm-thread-length but
I am not removing it as part of this patch.  When the search
space is limited the thread stmt size limit effectively provides
max-fsm-thread-length.

The param with its default does not help PR106514 enough to unleash
path searching with the higher FSM stmt count limit.

PR tree-optimization/106514
* params.opt (max-jump-thread-paths): New.
* doc/invoke.texi (max-jump-thread-paths): Document.
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Honor max-jump-thread-paths, take overall_path argument.
(back_threader::find_paths): Pass 1 as initial overall_path.

* gcc.dg/tree-ssa/ssa-thread-16.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-17.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.

OpenMP: Fix folding with simd's linear clause [PR106492]

gcc/ChangeLog:

PR middle-end/106492
* omp-low.cc (lower_rec_input_clauses): Add missing folding
to data type of linear-clause list item.

gcc/testsuite/ChangeLog:

PR middle-end/106492
* g++.dg/gomp/pr106492.C: New test.

Daily bump.

Evaluate condition arguments with the correct type.

Processing of a cond_expr requires that a range of the correct type for the
operands of the cond_expr is passed in.

PR tree-optimization/106556
gcc/
* gimple-range-gori.cc (gori_compute::condexpr_adjust): Use the
type of the cond_expr operands being evaluted.

gcc/testsuite/
* gfortran.dg/pr106556.f90: New.

preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.

This patch corrects handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t enabled
C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
UTF-8 character literals were always treated as having the same type as
ordinary character literals (signed or unsigned dependent on target or use
of the -fsigned-char or -funsigned char options).

PR preprocessor/106426

gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.

gcc/testsuite/ChangeLog:
* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.

libcpp/ChangeLog:
* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
literals based on unsigned_utf8char.
* include/cpplib.h (cpp_options): Add unsigned_utf8char.
* init.cc (cpp_create_reader): Initialize unsigned_utf8char.

C: Implement C2X N2653 char8_t and UTF-8 string literal changes

This patch implements the core language and compiler dependent library
changes adopted for C2X via WG14 N2653.  The changes include:
- Change of type for UTF-8 string literals from array of const char to
  array of const char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
  __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.

gcc/ChangeLog:

* ginclude/stdatomic.h (atomic_char8_t,
ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_string_literal): Use char8_t as the type
of CPP_UTF8STRING when char8_t support is enabled.
* c-typeck.cc (digest_init): Allow initialization of an array
of character type by a string literal with type array of
char8_t.

gcc/c-family/ChangeLog:

* c-lex.cc (lex_string, lex_charconst): Use char8_t as the type
of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is
enabled.
* c-opts.cc (c_common_post_options): Set flag_char8_t if
targeting C2x.

gcc/testsuite/ChangeLog:
* gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/c11-utf8str-type.c: New test.
* gcc.dg/c17-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str.c: New test.
* gcc.dg/gnu2x-utf8str-type.c: New test.
* gcc.dg/gnu2x-utf8str.c: New test.

d: Fix ICE in in add_stack_var, at cfgexpand.cc:476

The type that triggers the ICE never got completed by the semantic
analysis pass. Checking for size forces it to be done, or issue a
compile-time error.

PR d/106555

gcc/d/ChangeLog:

* d-target.cc (Target::isReturnOnStack): Check for return type size.

gcc/testsuite/ChangeLog:

* gdc.dg/imports/pr106555.d: New test.
* gdc.dg/pr106555.d: New test.

libstdc++: [_GLIBCXX_DEBUG] Do not consider detached iterators as value-initialized

An attach iterator has its _M_version set to something != 0, the container version. This
value shall be preserved when detaching it so that the iterator does not look like a
value-initialized one.

libstdc++-v3/ChangeLog:

* include/debug/formatter.h (__singular_value_init): New _Iterator_state enum entry.
(_Parameter<>(const _Safe_iterator<>&, const char*, _Is_iterator)): Check if iterator
parameter is value-initialized.
(_Parameter<>(const _Safe_local_iterator<>&, const char*, _Is_iterator)): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator<>::_M_value_initialized()): New. Adapt
checks.
* include/debug/safe_local_iterator.h (_Safe_local_iterator<>::_M_value_initialized()): New.
Adapt checks.
* src/c++11/debug.cc (_Safe_iterator_base::_M_reset): Do not reset _M_version.
(print_field(PrintContext&, const _Parameter&, const char*)): Adapt state_names.
* testsuite/23_containers/deque/debug/iterator1_neg.cc: New test.
* testsuite/23_containers/deque/debug/iterator2_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator1_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator2_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator3_neg.cc: New test.

Fix middle-end/103645: empty struct store not removed when using compound literal

For compound literals empty struct stores are not removed as they go down a
different path of the gimplifier; trying to optimize the init constructor.
This fixes the problem by not adding the gimple assignment at the end
of gimplify_init_constructor if it was an empty type.

Note this updates gcc.dg/pr87052.c where we had:
const char d[0] = { };
And was expecting a store to d but after this, there is no store
as the decl's type is zero in size.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/103645
* gimplify.cc (gimplify_init_constructor): Don't build/add
gimple assignment of an empty type.

gcc/testsuite/ChangeLog:
* gcc.dg/pr87052.c: Update d var to expect nothing.

AArch32: Fix 128-bit sequential consistency atomic operations.

Similar to AArch64 the Arm implementation of 128-bit atomics is broken.

For 128-bit atomics we rely on pthread barriers to correct guard the address
in the pointer to get correct memory ordering. However for 128-bit atomics the
address under the lock is different from the original pointer.

This means that one of the values under the atomic operation is not protected
properly and so we fail during when the user has requested sequential
consistency as there's no barrier to enforce this requirement.

As such users have resorted to adding an

#ifdef GCC
<emit barrier>
#endif

around the use of these atomics.

This corrects the issue by issuing a barrier only when __ATOMIC_SEQ_CST was
requested. I have hand verified that the barriers are inserted
for atomic seq cst.

libatomic/ChangeLog:

PR target/102218
* config/arm/host-config.h (pre_seq_barrier, post_seq_barrier,
pre_post_seq_barrier): Require barrier on __ATOMIC_SEQ_CST.

AArch64: Fix 128-bit sequential consistency atomic operations.

The AArch64 implementation of 128-bit atomics is broken.

For 128-bit atomics we rely on pthread barriers to correct guard the address
in the pointer to get correct memory ordering.  However for 128-bit atomics the
address under the lock is different from the original pointer.

This means that one of the values under the atomic operation is not protected
properly and so we fail during when the user has requested sequential
consistency as there's no barrier to enforce this requirement.

As such users have resorted to adding an

#ifdef GCC
<emit barrier>
#endif

around the use of these atomics.

This corrects the issue by issuing a barrier only when __ATOMIC_SEQ_CST was
requested.  To remedy this performance hit I think we should revisit using a
similar approach to out-line-atomics for the 128-bit atomics.

Note that I believe I need the empty file due to the include_next chain but
I am not entirely sure.  I have hand verified that the barriers are inserted
for atomic seq cst.

libatomic/ChangeLog:

PR target/102218
* config/aarch64/aarch64-config.h: New file.
* config/aarch64/host-config.h: New file.

lto/106540 - fix LTO tree input wrt dwarf2out_register_external_die

I've revisited the earlier two workarounds for dwarf2out_register_external_die
getting duplicate entries. It turns out that r11-525-g03d90a20a1afcb
added dref_queue pruning to lto_input_tree but decl reading uses that
to stream in DECL_INITIAL even when in the middle of SCC streaming.
When that SCC then gets thrown away we can end up with debug nodes
registered which isn't supposed to happen. The following adjusts
the DECL_INITIAL streaming to go the in-SCC way, using lto_input_tree_1,
since no SCCs are expected at this point, just refs.

PR lto/106540
PR lto/106334
* dwarf2out.cc (dwarf2out_register_external_die): Restore
original assert.
* lto-streamer-in.cc (lto_read_tree_1): Use lto_input_tree_1
to input DECL_INITIAL, avoiding to commit drefs.

Move testcase gcc.dg/tree-ssa/pr93776.c to gcc.c-torture/compile/pr93776.c

Since this testcase is not exactly SSA specific and it would
be a good idea to compile this at more than just at -O1, moving
it to gcc.c-torture/compile would do that.

Committed as obvious after a test on x86_64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr93776.c: Moved to...
* gcc.c-torture/compile/pr93776.c: ...here.

Daily bump.

[Committed] Add -mno-stv to new gcc.target/i386/cmpti2.c test case.

Adding -march=cascadelake to the command line options of the new cmpti2.c
testcase triggers TImode STV and produces vector code that doesn't match
the scalar implementation that this test was intended to check.  Adding
-mno-stv to the options fixes this.  Committed as obvious.

2022-08-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* gcc.target/i386/cmpti2.c: Add -mno-stv to dg-options.

c++: Add support for __real__/__imag__ modifications in constant expressions [PR88174]

We claim we support P0415R1 (constexpr complex), but e.g.
#include <complex>

constexpr bool
foo ()
{
  std::complex<double> a (1.0, 2.0);
  a += 3.0;
  a.real (6.0);
  return a.real () == 6.0 && a.imag () == 2.0;
}

static_assert (foo ());

fails with
test.C:12:20: error: non-constant condition for static assertion
   12 | static_assert (foo ());
      |                ~~~~^~
test.C:12:20:   in ‘constexpr’ expansion of ‘foo()’
test.C:8:10:   in ‘constexpr’ expansion of ‘a.std::complex<double>::real(6.0e+0)’
test.C:12:20: error: modification of ‘__real__ a.std::complex<double>::_M_value’ is not a constant expression

The problem is we don't handle REALPART_EXPR and IMAGPART_EXPR
in cxx_eval_store_expression.
The following patch attempts to support it (with a requirement
that those are the outermost expressions, ARRAY_REF/COMPONENT_REF
etc. are just not possible on the result of these, BIT_FIELD_REF
would be theoretically possible if trying to extract some bits
from one part of a complex int, but I don't see how it could appear
in the FE trees.

For these references, the code handles value being COMPLEX_CST,
COMPLEX_EXPR or CONSTRUCTOR_NO_CLEARING empty CONSTRUCTOR (what we use
to represent uninitialized values for C++20 and later) and the
code starts by rewriting it to COMPLEX_EXPR, so that we can freely
adjust the individual parts and later on possibly optimize it back
to COMPLEX_CST if both halves are constant.

2022-08-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/88174
* constexpr.cc (cxx_eval_store_expression): Handle REALPART_EXPR
and IMAGPART_EXPR.  Change ctors from releasing_vec to
auto_vec<tree *>, adjust all uses.  For !preeval, update ctors
vector.

* g++.dg/cpp1y/constexpr-complex1.C: New test.

Allow any immediate constant in *cmp<dwi>_doubleword splitter on x86_64.

This patch tweaks i386.md's *cmp<dwi>_doubleword splitter's predicate to
allow general_operand, not just x86_64_hilo_general_operand, to improve
code generation.  As a general rule, i386.md's _doubleword splitters should
be post-reload splitters that require integer immediate operands to be
x86_64_hilo_int_operand, so that each part is a valid word mode immediate
constant.  As an exception to this rule, doubleword patterns that must be
split before reload, because they require additional scratch registers,
can use take advantage of this ability to create new pseudos, to accept
any immediate constant, and call force_reg on the high and/or low parts
if they are not suitable immediate operands in word mode.

The benefit is shown in the new cmpti3.c test case below.

__int128 x;
int foo()
{
    __int128 t = 0x1234567890abcdefLL;
    return x == t;
}

where GCC with -O2 currently generates:

        movabsq $1311768467294899695, %rax
        xorl    %edx, %edx
        xorq    x(%rip), %rax
        xorq    x+8(%rip), %rdx
        orq     %rdx, %rax
        sete    %al
        movzbl  %al, %eax
        ret

but with this patch now generates:

        movabsq $1311768467294899695, %rax
        xorq    x(%rip), %rax
        orq     x+8(%rip), %rax
        sete    %al
        movzbl  %al, %eax
        ret

2022-08-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (*cmp<dwi>_doubleword): Change predicate
for x86_64_hilo_general_operand to general operand.  Call
force_reg on parts that are not x86_64_immediate_operand.

gcc/testsuite/ChangeLog
* gcc.target/i386/cmpti1.c: New test case.
* gcc.target/i386/cmpti2.c: Likewise.
* gcc.target/i386/cmpti3.c: Likewise.

Daily bump.

New warning: -Wanalyzer-jump-through-null [PR105947]

This patch adds a new warning to -fanalyzer for jumps through NULL
function pointers.

gcc/analyzer/ChangeLog:
PR analyzer/105947
* analyzer.opt (Wanalyzer-jump-through-null): New option.
* engine.cc (class jump_through_null): New.
(exploded_graph::process_node): Complain about jumps through NULL
function pointers.

gcc/ChangeLog:
PR analyzer/105947
* doc/invoke.texi: Add -Wanalyzer-jump-through-null.

gcc/testsuite/ChangeLog:
PR analyzer/105947
* gcc.dg/analyzer/function-ptr-5.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

middle-end: Allow backend to expand/split double word compare to 0/-1.

This patch to the middle-end's RTL expansion reorders the code in
emit_store_flag_1 so that the backend has more control over how best
to expand/split double word equality/inequality comparisons against
zero or minus one.  With the current implementation, the middle-end
always decides to lower this idiom during RTL expansion using SUBREGs
and word mode instructions, without ever consulting the backend's
machine description.  Hence on x86_64, a TImode comparison against zero
is always expanded as:

(parallel [
  (set (reg:DI 91)
       (ior:DI (subreg:DI (reg:TI 88) 0)
               (subreg:DI (reg:TI 88) 8)))
  (clobber (reg:CC 17 flags))])
(set (reg:CCZ 17 flags)
     (compare:CCZ (reg:DI 91)
                  (const_int 0 [0])))

This patch, which makes no changes to the code itself, simply reorders
the clauses in emit_store_flag_1 so that the middle-end first attempts
expansion using the target's doubleword mode cstore optab/expander,
and only if this fails, falls back to lowering to word mode operations.
On x86_64, this allows the expander to produce:

(set (reg:CCZ 17 flags)
     (compare:CCZ (reg:TI 88)
                  (const_int 0 [0])))

which is a candidate for scalar-to-vector transformations (and
combine simplifications etc.).  On targets that don't define a cstore
pattern for doubleword integer modes, there should be no change in
behaviour.  For those that do, the current behaviour can be restored
(if desired) by restricting the expander/insn to not apply when the
comparison is EQ or NE, and operand[2] is either const0_rtx or
constm1_rtx.

This change just keeps RTL expansion more consistent (in philosophy).
For other doubleword comparisons, such as with operators LT and GT,
or with constants other than zero or -1, the wishes of the backend
are respected, and only if the optab expansion fails are the default
fall-back implementations using narrower integer mode operations
(and conditional jumps) used.

2022-08-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* expmed.cc (emit_store_flag_1): Move code to expand double word
equality and inequality against zero or -1, using word operations,
to after trying to use the backend's cstore<mode>4 optab/expander.

libstdc++: Add feature test macro for <experimental/scope>

libstdc++-v3/ChangeLog:

* include/experimental/scope (__cpp_lib_experimental_scope):
Define.
* testsuite/experimental/scopeguard/uniqueres.cc: Check macro.

libstdc++: Implement <experimental/scope> from LFTSv3

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/experimental/scope: New file.
* testsuite/experimental/scopeguard/uniqueres.cc: New test.
* testsuite/experimental/scopeguard/exit.cc: New test.

middle-end: Guard value_replacement and store_elim from seeing diamonds.

This excludes value_replacement and store_elim from diamonds as they don't
handle the form properly.

gcc/ChangeLog:

PR middle-end/106534
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Guard the
value_replacement and store_elim from diamonds.

backthreader dump fix

This fixes odd SUCCEEDED dumps from the backthreader registry that
can happen even though register_jump_thread cancelled the thread
as invalid.

* tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
Check whether the registry register_path rejected the path.
(back_threader_registry::register_path): Return whether
register_jump_thread succeeded.

Inline unsupported_range constructor.

An unsupported_range temporary is instantiated in every Value_Range
for completeness sake and should be mostly a NOP. However, it's
showing up in the callgrind stats, because it's not inline. This
fixes the oversight.

PR tree-optimization/106514

gcc/ChangeLog:

* value-range.cc (unsupported_range::unsupported_range): Move...
* value-range.h (unsupported_range::unsupported_range): ...here.
(unsupported_range::set_undefined): New.

tree-optimization/106533 - loop distribution of inner loop of nest

Loop distribution currently gives up if the outer loop of a loop
nest it analyzes contains a stmt with side-effects instead of
continuing to analyze the innermost loop. The following fixes that
by continuing anyway.

PR tree-optimization/106533
* tree-loop-distribution.cc (loop_distribution::execute): Continue
analyzing the inner loops when find_seed_stmts_for_distribution
fails.

* gcc.dg/tree-ssa/ldist-39.c: New testcase.

rs6000: Correct return value of check_p9modulo_hw_available.

Set the return value to 0 when modulo is supported, and to 1 when not supported.

gcc/testsuite/
* lib/target-supports.exp (check_p9modulo_hw_available): Correct return
value.

[RSIC-V] Fix 32bit riscv with zbs extension enabled

The problem here was a disconnect between splittable_const_int_operand
predicate and the function riscv_build_integer_1 for 32bits with zbs enabled.
The splittable_const_int_operand predicate had a check for TARGET_64BIT which
was not needed so this patch removed it.

Committed as obvious after a build for risc32-elf configured with --with-arch=rv32imac_zba_zbb_zbc_zbs.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* config/riscv/predicates.md (splittable_const_int_operand):
Remove the check for TARGET_64BIT for single bit const values.

Daily bump.

Add myself as AutoFDO maintainer

ChangeLog:

* MAINTAINERS: Add myself as AutoFDO maintainer.

libstdc++: Make std::string_view(Range&&) constructor explicit

The P2499R0 paper was recently approved for C++23.

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view(Range&&)): Add
explicit as per P2499R0.
* testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc:
Adjust implicit conversions. Check implicit conversions fail.
* testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc:
Likewise.

libstdc++: Add comparisons to std::default_sentinel_t (LWG 3719)

This library defect was recently approved for C++23.

libstdc++-v3/ChangeLog:

* include/bits/fs_dir.h (directory_iterator): Add comparison
with std::default_sentinel_t. Remove redundant operator!= for
C++20.
* (recursive_directory_iterator): Likewise.
* include/bits/iterator_concepts.h [!__cpp_lib_concepts]
(default_sentinel_t, default_sentinel): Define even if concepts
are not supported.
* include/bits/regex.h (regex_iterator): Add comparison with
std::default_sentinel_t. Remove redundant operator!= for C++20.
(regex_token_iterator): Likewise.
(regex_token_iterator::_M_end_of_seq()): Add noexcept.
* testsuite/27_io/filesystem/iterators/lwg3719.cc: New test.
* testsuite/28_regex/iterators/regex_iterator/lwg3719.cc:
New test.
* testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc:
New test.

Loop over intersected bitmaps.

compute_ranges_in_block loops over the import list and then checks the
same bit in exports. It is nmore efficent to loop over the intersection
of the 2 bitmaps.

PR tree-optimization/106514
* gimple-range-path.cc (path_range_query::compute_ranges_in_block):
Use EXECUTE_IF_AND_IN_BITMAP to loop over 2 bitmaps.

middle-end: Simplify subtract where both arguments are being bitwise inverted.

This adds a match.pd rule that drops the bitwwise nots when both arguments to a
subtract is inverted. i.e. for:

float g(float a, float b)
{
return ~(int)a - ~(int)b;
}

we instead generate

float g(float a, float b)
{
return (int)b - (int)a;
}

We already do a limited version of this from the fold_binary fold functions but
this makes a more general version in match.pd that applies more often.

gcc/ChangeLog:

* match.pd: New bit_not rule.

gcc/testsuite/ChangeLog:

* gcc.dg/subnot.c: New test.

middle-end: Fix phi-ssa assertion triggers. [PR106519]

For the diamond PHI form in tree_ssa_phiopt_worker we need to
extract edge e2 sooner. This changes it so we extract it at the
same time we determine we have a diamond shape.

gcc/ChangeLog:

PR middle-end/106519
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Check final phi edge for
diamond shapes.

gcc/testsuite/ChangeLog:

PR middle-end/106519
* gcc.dg/pr106519.c: New test.

match.pd: Add bitwise and pattern [PR106243]

This patch adds a new optimization to match.pd. The pattern, -x & 1,
now gets simplified to x & 1, reducing the number of instructions
produced.

This patch also adds tests for the optimization rule.

Bootstrapped/regtested on x86_64-pc-linux-gnu.

PR tree-optimization/106243

gcc/ChangeLog:

* match.pd (-x & 1): New simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/pr106243-1.c: New test.
* gcc.dg/pr106243.c: New test.

tree-optimization/106521 - unroll-and-jam LC SSA rewrite

The LC SSA rewrite performs SSA verification at start but the VN
run performed on the unrolled-and-jammed body can leave us with
invalid SSA form until CFG cleanup is run. So make sure we do that
before rewriting into LC SSA.

PR tree-optimization/106521
* gimple-loop-jam.cc (tree_loop_unroll_and_jam): Perform
CFG cleanup manually before rewriting into LC SSA.

* gcc.dg/torture/pr106521.c: New testcase.

Backwards threader greedy search TLC

I've tried to understand how the greedy search works seeing the
bitmap dances and the split into resolve_phi.  I've summarized
the intent of the algorithm as

      // For further greedy searching we want to remove interesting
      // names defined in BB but add ones on the PHI edges for the
      // respective edges.

but the implementation differs in detail.  In particular when
there is more than one interesting PHI in BB it seems to only consider
the first for translating defs across edges.  It also only applies
the loop crossing restriction when there is an interesting PHI.

The following preserves the loop crossing restriction to the case
of interesting PHIs but merges resolve_phi back, changing interesting
as outlined with the intent above.  It should get more threading
cases when there are multiple interesting PHI defs in a block.
It might be a bit faster due to less bitmap operations but in the
end the main intent was to make what happens more obvious.

* tree-ssa-threadbackward.cc (populate_worklist): Remove.
(back_threader::resolve_phi): Likewise.
(back_threader::find_paths_to_names): Rewrite greedy search.

libstdc++: Rename data members of std::unexpected and std::bad_expected_access

The P2549R1 paper was accepted for C++23. I already implemented it for
our <expected>, but I didn't rename the private daata members, only the
public member functions. This renames the data members for consistency
with the working draft.

libstdc++-v3/ChangeLog:

* include/std/expected (unexpected::_M_val): Rename to _M_unex.
(bad_expected_access::_M_val): Likewise.

libstdc++: Update value of __cpp_lib_ios_noreplace macro

My P2467R1 proposal was accepted for C++23 so there's an official value
for this macro now.

libstdc++-v3/ChangeLog:

* include/bits/ios_base.h (__cpp_lib_ios_noreplace): Update
value to 202207L.
* include/std/version (__cpp_lib_ios_noreplace): Likewise.
* testsuite/27_io/basic_ofstream/open/char/noreplace.cc: Check
for new value.
* testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc:
Likewise.

libstdc++: Unblock atomic wait on non-futex platforms [PR106183]

When using a mutex and condition variable, the notifying thread needs to
increment _M_ver while holding the mutex lock, and the waiting thread
needs to re-check after locking the mutex. This avoids a missed
notification as described in the PR.

By moving the increment of _M_ver to the base _M_notify we can make the
use of the mutex local to the use of the condition variable, and
simplify the code a little. We can use a relaxed store because the mutex
already provides sequential consistency. Also we don't need to check
whether __addr == &_M_ver because we know that's always true for
platforms that use a condition variable, and so we also know that we
always need to use notify_all() not notify_one().

Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/106183
* include/bits/atomic_wait.h (__waiter_pool_base::_M_notify):
Move increment of _M_ver here.
[!_GLIBCXX_HAVE_PLATFORM_WAIT]: Lock mutex around increment.
Use relaxed memory order and always notify all waiters.
(__waiter_base::_M_do_wait) [!_GLIBCXX_HAVE_PLATFORM_WAIT]:
Check value again after locking mutex.
(__waiter_base::_M_notify): Remove increment of _M_ver.

Adjust index number of tuple pretty printer

The tuple pretty printer uses 1-based indeces which is quite confusing
considering the access to the same values with the std::get functions
uses 0-based indeces. This patch changes the pretty printer since
this is not a guaranteed API.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (class StdTuplePrinter): Use
zero-based indeces just like std:get takes.

PR106342 - IBM zSystems: Provide vsel for all vector modes

dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
produces an insn that vsel<mode> is supposed to recognize, but can't,
because it's not defined for V2SF. Fix by defining it for all vector
modes supported by copysign<mode>3.

gcc/ChangeLog:

* config/s390/vector.md (V_HW_FT): New iterator.
* config/s390/vx-builtins.md (vsel<mode>): Use V_HW_FT instead
of V_HW.

Daily bump.

Do not enable -mblock-ops-vector-pair.

Testing has shown that using the load vector pair and store vector pair
instructions for block moves has some performance issues on power10.

A patch on June 11th modified the code so that GCC would not set
-mblock-ops-vector-pair by default if we are tuning for power10, but it would
set the option if we were tuning for a different machine and have load and store
vector pair instructions enabled.

This patch eliminates the code setting -mblock-ops-vector-pair. If you want to
generate load vector pair and store vector pair instructions for block moves,
you must use -mblock-ops-vector-pair.

2022-08-03 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove code
setting -mblock-ops-vector-pair.

Do not walk equivalence set in path_oracle::killing_def.

When killing a def in the path ranger, there is no need to walk the set
of existing equivalences clearing bits.  An equivalence match requires
that both ssa-names have to be in each others set.  As killing_def
creates a new empty set contianing only the current def,  it already
ensures false equivaelnces won't happen.

PR tree-optimization/106514
* value-relation.cc (path_oracle::killing_def) Do not walk the
  equivalence set clearing bits.

testsuite: btf: fix regexps in btf-int-1.c

The regexps in hte test btf-int-1.c were not working properly with the
commenting style of at least one target: powerpc64le-linux-gnu. This
patch changes the test to use better regexps.

Tested in bpf-unkonwn-none, x86_64-linux-gnu and powerpc64le-linux-gnu.
Pushed to master as obvious.

gcc/testsuite/ChangeLog:

PR testsuite/106515
* gcc.dg/debug/btf/btf-int-1.c: Fix regexps in
scan-assembler-times.

middle-end: Support recognition of three-way max/min.

This patch adds support for three-way min/max recognition in phi-opts.

Concretely for e.g.

#include <stdint.h>

uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
uint8_t xk;
    if (xc < xm) {
        xk = (uint8_t) (xc < xy ? xc : xy);
    } else {
        xk = (uint8_t) (xm < xy ? xm : xy);
    }
    return xk;
}

we generate:

  <bb 2> [local count: 1073741824]:
  _5 = MIN_EXPR <xc_1(D), xy_3(D)>;
  _7 = MIN_EXPR <xm_2(D), _5>;
  return _7;

instead of

  <bb 2>:
  if (xc_2(D) < xm_3(D))
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  xk_5 = MIN_EXPR <xc_2(D), xy_4(D)>;
  goto <bb 5>;

  <bb 4>:
  xk_6 = MIN_EXPR <xm_3(D), xy_4(D)>;

  <bb 5>:
  # xk_1 = PHI <xk_5(3), xk_6(4)>
  return xk_1;

The same function also immediately deals with turning a minimization problem
into a maximization one if the results are inverted.  We do this here since
doing it in match.pd would end up changing the shape of the BBs and adding
additional instructions which would prevent various optimizations from working.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Optionally search for the phi
sequence of a three-way conditional.
(replace_phi_edge_with_variable): Support diamonds.
(tree_ssa_phiopt_worker): Detect diamond phi structure for three-way
min/max.
(strip_bit_not, invert_minmax_code): New.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't optimize
code away.
* gcc.dg/tree-ssa/minmax-10.c: New test.
* gcc.dg/tree-ssa/minmax-11.c: New test.
* gcc.dg/tree-ssa/minmax-12.c: New test.
* gcc.dg/tree-ssa/minmax-13.c: New test.
* gcc.dg/tree-ssa/minmax-14.c: New test.
* gcc.dg/tree-ssa/minmax-15.c: New test.
* gcc.dg/tree-ssa/minmax-16.c: New test.
* gcc.dg/tree-ssa/minmax-3.c: New test.
* gcc.dg/tree-ssa/minmax-4.c: New test.
* gcc.dg/tree-ssa/minmax-5.c: New test.
* gcc.dg/tree-ssa/minmax-6.c: New test.
* gcc.dg/tree-ssa/minmax-7.c: New test.
* gcc.dg/tree-ssa/minmax-8.c: New test.
* gcc.dg/tree-ssa/minmax-9.c: New test.

d: Merge upstream dmd d7772a2369, phobos 5748ca43f.

In upstream dmd, the compiler front-end and run-time have been merged
together into one repository.  Both dmd and libdruntime now track that.

D front-end changes:

    - Deprecated `scope(failure)' blocks that contain `return' statements.
    - Deprecated using integers for `version' or `debug' conditions.
    - Deprecated returning a discarded void value from a function.
    - `new' can now allocate an associative array.

D runtime changes:

    - Added avx512f detection to core.cpuid module.

Phobos changes:

    - Changed std.experimental.logger.core.sharedLog to return
      shared(Logger).

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd d7772a2369.
* dmd/VERSION: Bump version to v2.100.1.
* d-codegen.cc (get_frameinfo): Check whether decision to generate
closure changed since semantic finished.
* d-lang.cc (d_handle_option): Remove handling of -fdebug=level and
-fversion=level.
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Generate evaluation
of noreturn variable initializers before throw.
* expr.cc (ExprVisitor::visit (AssignExp *)): Don't generate
assignment for noreturn types, only evaluate for side effects.
* lang.opt (fdebug=): Undocument -fdebug=level.
(fversion=): Undocument -fversion=level.

libphobos/ChangeLog:

* configure: Regenerate.
* configure.ac (libtool_VERSION): Update to 4:0:0.
* libdruntime/MERGE: Merge upstream druntime d7772a2369.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/array/duplication.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 5748ca43f.
* testsuite/libphobos.gc/nocollect.d:

cselib: add function to check if SET is redundant [PR106187]

A SET operation that writes memory may have the same value as an
earlier store but if the alias sets of the new and earlier store do
not conflict then the set is not truly redundant. This can happen,
for example, if objects of different types share a stack slot.

To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler:
cfgcleanup and postreload.

gcc/ChangeLog:

PR rtl-optimization/106187
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.

gcov-dump: add --stable option

The option prints TOP N counters in a stable format
usage for comparison (diff).

gcc/ChangeLog:

* doc/gcov-dump.texi: Document the new option.
* gcov-dump.cc (main): Parse the new option.
(print_usage): Show the option.
(tag_counters): Sort key:value pairs of TOP N counter.

profile: do not collect stats unless TDF_DETAILS

gcc/ChangeLog:

* profile.cc (compute_branch_probabilities): Do not collect
stats unless TDF_DETAILS.

PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86.

This patch adds a peephole2 to i386.md to implement the suggestion in
PR target/47949, of using xchg instead of mov for moving values to/from
the %rax/%eax register, controlled by -Oz, as the xchg instruction is
one byte shorter than the move it is replacing.

The new test case is taken from the PR:
int foo(int x) { return x; }

where previously we'd generate:
foo: mov %edi,%eax  // 2 bytes
ret

but with this patch, using -Oz, we generate:
foo: xchg %eax,%edi // 1 byte
ret

On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).

Interestingly, some modern architectures (such as Zen 3) implement
xchg using zero latency register renaming (just like mov), so in theory
this transformation could be enabled when optimizing for speed, if
benchmarking shows the improved code density produces consistently
better performance.  However, this is architecture dependent, and
there may be interactions using xchg (instead a single_set) in the
late RTL passes (such as cprop_hardreg), so for now I've restricted
this to -Oz.

2022-08-03  Roger Sayle  <roger@nextmovesoftware.com>
    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
PR target/47949
* config/i386/i386.md (peephole2): New peephole2 to convert
SWI48 moves to/from %rax/%eax where the src is dead to xchg,
when optimizing for minimal size with -Oz.

gcc/testsuite/ChangeLog
PR target/47949
* gcc.target/i386/pr47949.c: New test case.

Improved pre-reload split of double word comparison against -1 on x86.

This patch adds an extra optimization to *cmp<dwi>_doubleword to improve
the code generated for comparisons against -1.  Hypothetically, if a
comparison against -1 reached this splitter we'd currently generate code
that looks like:

        notq    %rdx ; 3 bytes
        notq    %rax ; 3 bytes
        orq     %rdx, %rax ; 3 bytes
        setne   %al

With this patch we would instead generate the superior:

        andq    %rdx, %rax ; 3 bytes
        cmpq    $-1, %rax ; 4 bytes
        setne   %al

which is both faster and smaller, and also what's currently generated
thanks to the middle-end splitting double word comparisons against
zero and minus one during RTL expansion.  Should that change, this would
become a missed-optimization regression, but this patch also (potentially)
helps suitable comparisons created by CSE and combine.

2022-08-03  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case
to split comparisons against -1 using AND and CMP -1 instructions.