review.tizen.org Git - test

x86: Use x constraint on KL patterns

Since KL instructions have no AVX512 version, replace the "v" register
constraint with the "x" register constraint.

PR target/105058
* config/i386/sse.md (loadiwkey): Replace "v" with "x".
(aes<aesklvariant>u8): Likewise.

x86: Use x constraint on SSSE3 patterns with MMX operands

Since PHADDW/PHADDD/PHADDSW/PHSUBW/PHSUBD/PHSUBSW/PSIGNB/PSIGNW/PSIGND
have no AVX512 version, replace the "Yv" register constraint with the
"x" register constraint.

PR target/105052
* config/i386/sse.md (ssse3_ph<plusminus_mnemonic>wv4hi3):
Replace "Yv" with "x".
(ssse3_ph<plusminus_mnemonic>dv2si3): Likewise.
(ssse3_psign<mode>3): Likewise.

analyzer: fix ICE on memset of untracked region [PR105057]

In r12-7809-g5f6197d7c197f9d2b7fb2e1a19dac39a023755e8 I added an
optimization to avoid tracking the state of certain memory regions
in the store.

Unfortunately, I didn't cover every way in which
store::get_or_create_cluster can be called for a base region, leading
to assertion failure ICEs in -fanalyzer on certain function calls
with certain params.

I've worked through all uses of store::get_or_create_cluster and found
four places where the assertion could fire.

This patch fixes them, and adds regression tests where possible.

gcc/analyzer/ChangeLog:
PR analyzer/105057
* store.cc (binding_cluster::make_unknown_relative_to): Reject
attempts to create a cluster for untracked base regions.
(store::set_value): Likewise.
(store::fill_region): Likewise.
(store::mark_region_as_unknown): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/105057
* gcc.dg/analyzer/fread-2.c: New test, as a regression test for
ICE in store::set_value on untracked base region.
* gcc.dg/analyzer/memset-2.c: Likewise, for ICE in
store::fill_region.
* gcc.dg/analyzer/strcpy-2.c: Likewise, for ICE in
store::mark_region_as_unknown.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Manually add entry for r12-7818-g3ab5c8cd03d92bf4ec41e351820349d92fbc40c4

Because update_version_git gave up on it.

Daily bump.

Add another commit to ignore

We can't handle r12-7818-g3ab5c8cd03d92bf4ec41e351820349d92fbc40c4

* gcc-changelog/git_update_version.py: Add
3ab5c8cd03d92bf4ec41e351820349d92fbc40c4 to ignored commits.

c++: Fix up __builtin_{bit_cast,convertvector} parsing

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix regression
suffix handling loop.

2022-03-26 Jakub Jelinek <jakub@redhat.com>

* parser.cc (cp_parser_postfix_expression)
<case RID_BILTIN_CONVERTVECTOR, case RID_BUILTIN_BIT_CAST>: Don't
return cp_build_{vec,convert,bit_cast} result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.
* g++.dg/cpp2a/bit-cast15.C: New test.

reload: Adjust comment in find_reloads about subset, not intersection

gcc:
* reload.cc (find_reloads): Align comment with code where
considering the intersection of register classes then tweaking the
regclass for the current alternative or rejecting it.

rs6000: Update testsuite to use -mdejagnu-cpu= and -mdejagnu-tune= options

This patch updates the POWER testsuite test cases using -mcpu= and -mtune=
to use the preferred -mdejagnu-cpu= and -mdejagnu-tune= options.  This also
obviates the need for the dg-skip-if directive, since the user cannot
override the -mcpu= value being used to compile the test case.

2022-03-25  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/

* g++.dg/pr65240-1.C: Use -mdejagnu-cpu=.  Remove dg-skip-if.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/torture/ppc-ldst-array.C: Likewise.
* gfortran.dg/nint_p7.f90: Likewise.
* gfortran.dg/pr102860.f90: Likewise.
* gcc.target/powerpc/fusion.c: Use -mdejagnu-cpu= and -mdejagnu-tune=.
* gcc.target/powerpc/fusion2.c: Likewise.
* gcc.target/powerpc/int_128bit-runnable.c: Use -mdejagnu-cpu=.
* gcc.target/powerpc/test_mffsl.c: Likewise.
* gfortran.dg/pr47614.f: Likewise.
* gfortran.dg/pr58968.f: Likewise.

x86: Use -msse2 on gcc.target/i386/pr95483-1.c

Replace -msse with -msse2 since <emmintrin.h> requires SSE2.

PR testsuite/105055
* gcc.target/i386/pr95483-1.c: Replace -msse with -msse2.

libstdc++: Add more doxygen comments in <bit>

libstdc++-v3/ChangeLog:

* include/std/bit (bit_cast, byteswap, endian): Add doxygen
comments.

arm: Revert Auto-vectorization for MVE: add pack/unpack patterns PR target/104882

This reverts commit r12-1434-g046a3beb1673bf to fix PR target/104882.

As discussed in the PR, it turns out that the MVE ISA has no natural
mapping with GCC's vec_pack_trunc / vec_unpack standard patterns, unlike
Neon or SVE for instance.

This patch also adds the executable testcase provided in the PR.
This test passes at -O3 because the generated code does not need
to use the pack/unpack patterns, hence the use of -O2 which now
triggers vectorization since a few months ago.

2022-03-18 Christophe Lyon <christohe.lyon@arm.com>

PR target/104882
Revert
2021-06-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/mve.md (mve_vec_unpack<US>_lo_<mode>): Delete.
(mve_vec_unpack<US>_hi_<mode>): Delete.
(@mve_vec_pack_trunc_lo_<mode>): Delete.
(mve_vmovntq_<supf><mode>): Remove '@' prefix.
* config/arm/neon.md (vec_unpack<US>_hi_<mode>): Move back
from vec-common.md.
(vec_unpack<US>_lo_<mode>): Likewise.
(vec_pack_trunc_<mode>): Rename from
neon_quad_vec_pack_trunc_<mode>.
* config/arm/vec-common.md (vec_unpack<US>_hi_<mode>): Delete.
(vec_unpack<US>_lo_<mode>): Delete.
(vec_pack_trunc_<mode>): Delete.

PR target/104882
gcc/testsuite/
* gcc.target/arm/simd/mve-vclz.c: Update expected results.
* gcc.target/arm/simd/mve-vshl.c: Likewise.
* gcc.target/arm/simd/mve-vec-pack.c: Delete.
* gcc.target/arm/simd/mve-vec-unpack.c: Delete.
* gcc.target/arm/simd/pr104882.c: New test.

[PR104971] LRA: check live hard regs to remove a dead insn

LRA removes insn modifying sp for given PR test set. We should also have
checked living hard regs to prevent this. The patch fixes this.

gcc/ChangeLog:

PR middle-end/104971
* lra-lives.cc (process_bb_lives): Check hard_regs_live for hard
regs to clear remove_p flag.

tree-optimization/105053 - fix reduction chain epilogue generation

When we optimize permutations in a reduction chain we have to
be careful to select the correct live-out stmt, otherwise the
reduction result will be unused and the retained scalar code will
execute only the number of vector iterations.

2022-03-25 Richard Biener <rguenther@suse.de>

PR tree-optimization/105053
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Pick
the correct live-out stmt for a reduction chain.

* g++.dg/vect/pr105053.cc: New testcase.

c++: alignas and alignof void [PR104944]

I started looking into this PR because in GCC 4.9 we were able to
detect the invalid

  struct alignas(void) S{};

but I broke it in r210262.

It's ill-formed code in C++:
[dcl.align]/3: "An alignment-specifier of the form alignas(type-id) has
the same effect as alignas(alignof(type-id))", and [expr.align]/1:
"The operand shall be a type-id representing a complete object type,
or an array thereof, or a reference to one of those types." and void
is not a complete type.

It's also invalid in C:
6.7.5: _Alignas(type-name) is equivalent to _Alignas(_Alignof(type-name))
6.5.3.4: "The _Alignof operator shall not be applied to a function type
or an incomplete type."

We have a GNU extension whereby we treat sizeof(void) as 1, but I assume
it doesn't apply to alignof, at least in C++.  However, __alignof__(void)
is still accepted with a -Wpedantic warning.

We still say "invalid application of 'alignof'" rather than 'alignas' in the
void diagnostic, but I felt that fixing that may not be suitable as part of
this patch.  The "incomplete type" diagnostic still always prints
'__alignof__'.

PR c++/104944

gcc/cp/ChangeLog:

* typeck.cc (cxx_sizeof_or_alignof_type): Diagnose alignof(void).
(cxx_alignas_expr): Call cxx_sizeof_or_alignof_type with
complain == true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alignas20.C: New test.

[libgomp, testsuite] Scale down some OpenACC test-cases

When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5 seconds watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-25  Tom de Vries  <tdevries@suse.de>

PR libgomp/105042
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
execution time.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.

middle-end/105049 - fix uniform_vector_p and vector CTOR gimplification

We have

  return VIEW_CONVERT_EXPR<U>( VEC_PERM_EXPR < {<<< Unknown tree: compound_literal_expr
        V D.1984 = { 0 }; >>>, { 0 }} , {<<< Unknown tree: compound_literal_expr
        V D.1985 = { 0 }; >>>, { 0 }} , { 0, 0 } >  & {(short int) SAVE_EXPR <c>, (short int) SAVE_EXPR <c>});

where we gimplify the init CTORs to

  _1 = {{ 0 }, { 0 }};
  _2 = {{ 0 }, { 0 }};

instead of to vector constants.  That later runs into a bug in
uniform_vector_p which doesn't handle CTORs of vector elements
correctly.

The following adjusts uniform_vector_p to handle CTORs of vector
elements.

2022-03-25  Richard Biener  <rguenther@suse.de>

PR middle-end/105049
* tree.cc (uniform_vector_p): Recurse for VECTOR_CST or
CONSTRUCTOR first elements.

* gcc.dg/pr105049.c: New testcase.

Fix issue for pointers to anonymous types with -fdump-ada-spec

This used to work long ago but broke at some point.

gcc/c-family/
* c-ada-spec.cc (dump_ada_import): Deal with the "section" attribute
(dump_ada_node) <POINTER_TYPE>: Do not modify and pass the name, but
the referenced type instead. Deal with the anonymous original type
of a typedef'ed type. In the actual access case, follow the chain
of external subtypes.
<TYPE_DECL>: Tidy up control flow.

fortran: Fix up initializers of param(0) PARAMETERs [PR103691]

On the gfortran.dg/pr103691.f90 testcase the Fortran ICE emits
  static real(kind=4) a[0] = {[0 ... -1]=2.0e+0};
That is an invalid RANGE_EXPR where the maximum is smaller than the minimum.

The following patch fixes that.  If TYPE_MAX_VALUE is smaller than
TYPE_MIN_VALUE, the array is empty and so doesn't need any initializer,
if the two are equal, we don't need to bother with a RANGE_EXPR and
can just use that INTEGER_CST as the index and finally for the 2+ values
in the range it uses a RANGE_EXPR as before.

2022-03-25  Jakub Jelinek  <jakub@redhat.com>

PR fortran/103691
* trans-array.cc (gfc_conv_array_initializer): If TYPE_MAX_VALUE is
smaller than TYPE_MIN_VALUE (i.e. empty array), ignore the
initializer; if TYPE_MIN_VALUE is equal to TYPE_MAX_VALUE, use just
the TYPE_MIN_VALUE as index instead of RANGE_EXPR.

doc/invoke.texi: Move @ignore block out of @gccoptlist [PR103533]

With TeX output ("make pdf"), @gccoptlist's content end up in a single
line such that TeX does not find the matching '@end ignore' for the
'@ignore' block – failing with a runaway error. Solution is to move
the @ignore block after the closing '}'.
(Follow up to r12-7808-g319ba7e241e7e21f9eb481f075310796f13d2035 )

gcc/
PR analyzer/103533
* doc/invoke.texi (Static Analyzer Options): Move
@ignore block after @gccoptlist's '}' for 'make pdf'.

analyzer: add region::tracked_p to optimize state objects [PR104954]

PR analyzer/104954 tracks that -fanalyzer was taking a very long time
on a particular source file in the Linux kernel:
  drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c

One issue occurs with the repeated use of dynamic debug lines e.g. via
the DC_LOG_BANDWIDTH_CALCS macro, such as in print_bw_calcs_dceip in
drivers/gpu/drm/amd/display/dc/calcs/calcs_logger.h:

  DC_LOG_BANDWIDTH_CALCS("#####################################################################");
  DC_LOG_BANDWIDTH_CALCS("struct bw_calcs_dceip");
  DC_LOG_BANDWIDTH_CALCS("#####################################################################");

  [...snip dozens of lines...]

  DC_LOG_BANDWIDTH_CALCS("[bw_fixed] dmif_request_buffer_size: %d",
                         bw_fixed_to_int(dceip->dmif_request_buffer_size));

When this is configured to use __dynamic_pr_debug, each of these becomes
code like:

  do {
    static struct _ddebug __attribute__((__aligned__(8)))
    __attribute__((__section__("__dyndbg"))) __UNIQUE_ID_ddebug277 = {
      [...snip...]
    };
    if (arch_static_branch(&__UNIQUE_ID_ddebug277.key, false))
      __dynamic_pr_debug(&__UNIQUE_ID_ddebug277, [...the message...]);
  } while (0);

The analyzer was naively seeing each call to __dynamic_pr_debug, noting
that the __UNIQUE_ID_nnnn object escapes.  At each call, as successive
__UNIQUE_ID_nnnn object escapes, there are N escaped objects, and thus N
need clobbering, and so we have O(N^2) clobbering of escaped objects overall,
leading to huge amounts of pointless work: print_bw_calcs_data has 225
uses of DC_LOG_BANDWIDTH_CALCS, many of which are in loops.

This patch adds a way to identify declarations that aren't interesting
to the analyzer, so that we don't attempt to create binding_clusters
for them (i.e. we don't store any state for them in our state objects).
This is implemented by adding a new region::tracked_p, implemented for
declarations by walking the existing IPA data the first time the
analyzer sees a declaration, setting it to false for global vars that
have no loads/stores/aliases, and "sufficiently safe" address-of
ipa-refs.

The patch gives a large speedup of -fanalyzer on the above kernel
source file:
                           Before  After
Total cc1 wallclock time:    180s    36s
analyzer wallclock time:     162s    17s
% spent in analyzer:          90%    47%

gcc/analyzer/ChangeLog:
PR analyzer/104954
* analyzer.opt (-fdump-analyzer-untracked): New option.
* engine.cc (impl_run_checkers): Handle it.
* region-model-asm.cc (region_model::on_asm_stmt): Don't attempt
to clobber regions with !tracked_p ().
* region-model-manager.cc (dump_untracked_region): New.
(region_model_manager::dump_untracked_regions): New.
(frame_region::dump_untracked_regions): New.
* region-model.h (region_model_manager::dump_untracked_regions):
New decl.
* region.cc (ipa_ref_requires_tracking): New.
(symnode_requires_tracking_p): New.
(decl_region::calc_tracked_p): New.
* region.h (region::tracked_p): New vfunc.
(frame_region::dump_untracked_regions): New decl.
(class decl_region): Note that this is also used fo SSA names.
(decl_region::decl_region): Initialize m_tracked.
(decl_region::tracked_p): New.
(decl_region::calc_tracked_p): New decl.
(decl_region::m_tracked): New.
* store.cc (store::get_or_create_cluster): Assert that we
don't try to create clusters for base regions that aren't
trackable.
(store::mark_as_escaped): Don't mark base regions that we're not
tracking.

gcc/ChangeLog:
PR analyzer/104954
* doc/invoke.texi (Static Analyzer Options): Add
-fdump-analyzer-untracked.

gcc/testsuite/ChangeLog:
PR analyzer/104954
* gcc.dg/analyzer/asm-x86-dyndbg-1.c: New test.
* gcc.dg/analyzer/asm-x86-dyndbg-2.c: New test.
* gcc.dg/analyzer/many-unused-locals.c: New test.
* gcc.dg/analyzer/untracked-1.c: New test.
* gcc.dg/analyzer/unused-local-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Docs: Document that taint analyzer checker disables some warnings [PR103533]

gcc/ChangeLog:
PR analyzer/103533
* doc/invoke.texi: Document that enabling taint analyzer
checker disables some warnings from `-fanalyzer`.

Signed-off-by: Avinash Sonawane <rootkea@gmail.com>

Daily bump.

Change my MAINTAINERS information

I have changed employers and need to withdraw as SLSR maintainer for now.
Adding myself under a new email address under the DCO session. Thanks!

2021-03-24 Bill Schmidt <bill.schmidt@gmail.com>

* MAINTAINERS: Change my information.

c++: ICE with template code in constexpr [PR104284]

Since r9-6073 cxx_eval_store_expression preevaluates the value to
be stored, and that revealed a crash where a template code (here,
code=IMPLICIT_CONV_EXPR) leaks into cxx_eval*.

It happens because we're performing build_vec_init while processing
a template, which calls get_temp_regvar which creates an INIT_EXPR.
This INIT_EXPR's RHS contains an rvalue conversion so we create an
IMPLICIT_CONV_EXPR.  Its operand is not type-dependent and the whole
INIT_EXPR is not type-dependent.  So we call build_non_dependent_expr
which, with -fchecking=2, calls fold_non_dependent_expr.  At this
point the expression still has an IMPLICIT_CONV_EXPR, which ought to
be handled in instantiate_non_dependent_expr_internal.  However,
tsubst_copy_and_build doesn't handle INIT_EXPR; it will just call
tsubst_copy which does nothing when args is null.  So we fail to
replace the IMPLICIT_CONV_EXPR and ICE.

The problem is that we call build_vec_init in a template in the
first place.  We can avoid doing so by checking p_t_d before
calling build_aggr_init in check_initializer.

PR c++/104284

gcc/cp/ChangeLog:

* decl.cc (check_initializer): Don't call build_aggr_init in
a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-104284-1.C: New test.
* g++.dg/cpp1y/constexpr-104284-2.C: New test.
* g++.dg/cpp1y/constexpr-104284-3.C: New test.
* g++.dg/cpp1y/constexpr-104284-4.C: New test.

c++: delayed parse DMI [PR96645]

With the changes for PR81359 and PR88368 to make get_nsdmi errors be treated
as substitution failure, we have the problem that if we check
std::is_default_constructible for a complete class that still has unparsed
default member initializers, we get an answer (false) that will be wrong
once the DMIs have been parsed. The traits avoid this problem for regular
incomplete classes by giving an error if the operand is incomplete; we
should do the same if get_nsdmi is going to fail due to unparsed DMI.

PR c++/96645

gcc/cp/ChangeLog:

* cp-tree.h (type_has_default_ctor_to_be_synthesized): Declare.
* class.cc (type_has_default_ctor_to_be_synthesized): New.
(type_has_non_user_provided_default_constructor_1): Support it.
(type_has_non_user_provided_default_constructor): Now a wrapper.
* method.cc (complain_about_unparsed_dmi): New.
(constructible_expr): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_constructible3.C: Expect error.
* g++.dg/ext/is_constructible7.C: New test.

c++: FIX_TRUNC_EXPR in tsubst [PR102990]

This is a crash where a FIX_TRUNC_EXPR gets into tsubst_copy_and_build
where it hits gcc_unreachable ().

The history of tsubst_copy_and_build/FIX_TRUNC_EXPR is such that it
was introduced in r181478, but it did the wrong thing, whereupon it
was turned into gcc_unreachable () in r258821 (see this thread:
<https://gcc.gnu.org/pipermail/gcc-patches/2018-March/495853.html>).

In a template, we should never create a FIX_TRUNC_EXPR (that's what
conv_unsafe_in_template_p is for). But in this test we are NOT in
a template when we call digest_nsdmi_init which ends up calling
convert_like, converting 1.0e+0 to int, so convert_to_integer_1
gives us a FIX_TRUNC_EXPR.

But then when we get to parsing f's parameters, we are in a template
when processing decltype(Helpers{}), and since r268321, when the
compound literal isn't instantiation-dependent and the type isn't
type-dependent, finish_compound_literal falls back to the normal
processing, so it calls digest_init, which does fold_non_dependent_init
and since the FIX_TRUNC_EXPR isn't dependent, we instantiate and
therefore crash in tsubst_copy_and_build.

The fateful call to fold_non_dependent_init comes from massage_init_elt,
We shouldn't be calling f_n_d_i on the result of get_nsdmi. This we can
avoid by eschewing calling f_n_d_i on CONSTRUCTORs; their elements have
already been folded.

PR c++/102990

gcc/cp/ChangeLog:

* typeck2.cc (massage_init_elt): Avoid folding CONSTRUCTORs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template22.C: New test.
* g++.dg/cpp0x/nsdmi-template23.C: New test.

c++: missing SFINAE for non-constant consteval calls [PR104620]

Here we weren't respecting SFINAE when evaluating a call to a consteval
function, which caused us to reject the new testcase below.  This patch
fixes this by making build_over_call use the SFINAE-friendly version of
cxx_constant_value.

This change causes us to no longer diagnose ahead of time a couple of
non-constant non-dependent consteval calls in consteval-if2.C with
-fchecking=2.  These errors were apparently coming from the call to
fold_non_dependent_expr in build_non_dependent_expr (for the RHS of the +=)
despite complain=tf_none being passed.  Now that build_over_call respects
the value of complain during constant evaluation of a consteval call,
the errors are gone.

That the errors are also gone without -fchecking=2 is a regression caused
by r12-7264-gc19f317a78c0e4 and is the subject of PR104620.  As described
in comment #5, I think it's basically an accident that we were diagnosing
these two calls correctly before r12-7264, so perhaps we can live without
these errors for GCC 12.  Thus this patch just XFAILs the two tests.

PR c++/104620

gcc/cp/ChangeLog:

* call.cc (build_over_call): Use cxx_constant_value_sfinae
instead of cxx_constant_value to evaluate a consteval call.
* constexpr.cc (cxx_constant_value_sfinae): Add decl parameter
and pass it to cxx_eval_outermost_constant_expr.
* cp-tree.h (cxx_constant_value_sfinae): Add decl parameter.
* pt.cc (fold_targs_r): Pass NULL_TREE as decl parameter to
cxx_constant_value_sfinae.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/consteval-if2.C: XFAIL two dg-error tests where
the argument to the non-constant non-dependent consteval call is
wrapped by NON_DEPENDENT_EXPR.
* g++.dg/cpp2a/consteval30.C: New test.

hardened conditionals: drop copied identifiers

The copies of identifiers, indended to associate hardening SSA
temporaries to the original variables they refer to, end up causing
-fcompare-debug to fail, because DECL_UIDs are not identical, and the
nouid flag used in compare-debug dumps doesn't affect the uids in
naked identifiers, so the divergence becomes apparent.

This patch drops the naked identifiers.  Though somewhat desirable,
they're not necessary.

for  gcc/ChangeLog

PR debug/104564
* gimple-harden-conditionals.cc (detach_value): Keep temps
anonymous.

for  gcc/testsuite/ChangeLog

PR debug/104564
* c-c++-common/torture/harden-comp.c: Adjust.
* c-c++-common/torture/harden-cond.c: Adjust.

hardcmp: split before dispatch edge

If we harden a compare at the end of a block with an edge to the
abnormal dispatch block, it won't have a single successor.  Arrange to
split the block at its final stmt so as to have a single succ.

for  gcc/ChangeLog

PR middle-end/104975
* gimple-harden-conditionals.cc
(pass_harden_compares::execute): Force split in case of
multiple edges.

for  gcc/testsuite/ChangeLog

PR middle-end/104975
* gcc.dg/pr104975.c: New.

[libatomic] Fix return value in libat_test_and_set

On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which mimimized to:
...
  #include <stdatomic.h>
  atomic_flag a = ATOMIC_FLAG_INIT;
  int main () {
    if ((atomic_flag_test_and_set) (&a))
      __builtin_abort ();
    return 0;
  }
...

The atomic_flag_test_and_set is implemented using __atomic_test_and_set_1,
which corresponds to the "word-sized compare-and-swap loop" version of
libat_test_and_set in libatomic/tas_n.c.

The semantics of a test-and-set is that the return value is "true if and only
if the previous contents were 'set'".

But the code uses:
...
  return woldval != 0;
...
which means it doesn't look only at the byte that was either set or not set,
but at the entire word.

Fix this by using instead:
...
  return (woldval & ((UTYPE) ~(UTYPE) 0 << shift)) != 0;
...

Tested on nvptx.

libatomic/ChangeLog:

2022-03-24  Tom de Vries  <tdevries@suse.de>

PR target/105011
* tas_n.c (libat_test_and_set): Fix return value.

testsuite: Add compat.exp testcase for most common zero width bitfld ABI passing [PR102024]

On Tue, Mar 22, 2022 at 05:51:58PM +0100, Jakub Jelinek via Gcc wrote:
> I guess it would be nice to include the testcases we are talking about,
> like { float x; int : 0; float y; } and { float x; int : 0; } and
> { int : 0; float x; } into compat.exp testsuite so that we see ABI
> differences in compat testing.

Here is a patch that does that.  It uses the struct-layout-1* framework,
but isn't generated because we don't want in this case pseudo-random
structure layouts, but particular ones we know cause or could cause problems
on some targets.  If other problematic cases are discovered, we can add
further ones.

Tested on x86_64-linux with:
make check-gcc check-g++ RUNTESTFLAGS='ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ compat.exp=pr102*'
and with
make check-gcc check-g++ RUNTESTFLAGS='compat.exp=pr102*'
The former as expected has:
FAIL: gcc.dg/compat/pr102024 c_compat_x_tst.o-c_compat_y_alt.o execute
FAIL: gcc.dg/compat/pr102024 c_compat_x_alt.o-c_compat_y_tst.o execute
fails because on x86_64 we've changed the C ABI but kept the C++ ABI here.
E.g. on rs6000 it should be the g++.dg such tests to fail (all assuming
the alt gcc/g++ is GCC 4.5 through 11).

2022-03-24  Jakub Jelinek  <jakub@redhat.com>

PR target/102024
* gcc.dg/compat/pr102024_main.c: New test.
* gcc.dg/compat/pr102024_test.h: New test.
* gcc.dg/compat/pr102024_x.c: New test.
* gcc.dg/compat/pr102024_y.c: New test.
* g++.dg/compat/pr102024_main.C: New test.
* g++.dg/compat/pr102024_test.h: New test.
* g++.dg/compat/pr102024_x.C: New test.
* g++.dg/compat/pr102024_y.C: New test.

fold-const: Handle C++ dependent COMPONENT_REFs in operand_equal_p [PR105035]

As mentioned in the PR, operand_equal_p already contains some hacks so that
it can be called already on pre-instantiation C++ trees from templates,
but the recent change to compare DECL_FIELD_OFFSET in the COMPONENT_REF
case broke this. Many such COMPONENT_REFs are already punted on earlier
because they have NULL TREE_TYPE, but in this case the code knows what
type they have but still uses an IDENTIFIER_NODE as second operand
of COMPONENT_REF (I think SCOPE_REF is something that could be used too).

The following patch looks at those DECL_FIELD_*OFFSET fields only if
both field[01] args are FIELD_DECLs and otherwise keeps it to the
earlier OP_SAME (1) check that guards this whole block.

2022-03-24 Jakub Jelinek <jakub@redhat.com>

PR c++/105035
* fold-const.cc (operand_equal_p) <case COMPONENT_REF>: If either
field0 or field1 is not a FIELD_DECL, return false.

* g++.dg/warn/Wduplicated-cond2.C: New test.

Properly reset the port handle when closing

When the serial port is closed, we need to ensure that the port handle is
properly reset for it to be detected as closed.

gcc/ada/
PR ada/104767
* libgnat/g-sercom__mingw.adb (Close): Reset port handle to -1.
* libgnat/g-sercom__linux.adb (Close): Likewise.

Fix memory leaks

When changing the predcom pass to use auto_vec leaks were introduced by
failing to replace deallocation with C++ delete.  The following does
this.  It also fixes leaks in vectorization and range folding.

2022-03-24  Richard Biener  <rguenther@suse.de>

* tree-predcom.cc (chain::chain): Add CTOR.
(component::component): Likewise.
(pcom_worker::release_chain): Use delete.
(release_components): Likewise.
(pcom_worker::filter_suitable_components): Likewise.
(pcom_worker::split_data_refs_to_components): Use new.
(make_invariant_chain): Likewise.
(make_rooted_chain): Likewise.
(pcom_worker::combine_chains): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Make sure to release previously constructed scalar_results.
* tree-vect-stmts.cc (vectorizable_load): Use auto_vec
for vec_offsets.
* vr-values.cc (simplify_using_ranges::~simplify_using_ranges):
Release m_flag_set_edges.

tree-optimization/104970: Limit size computation for access attribute

Limit object size computation only to the simple case where access
attribute has been explicitly specified. The object passed to
__builtin_dynamic_object_size could either be a pointer or a VLA whose
size has been described using access attribute.

Further, return a valid size only if the object is a void * pointer or
points to (or is a VLA of) a type that has a constant size.

gcc/ChangeLog:

PR tree-optimization/104970
* tree-object-size.cc (parm_object_size): Restrict size
computation scenarios to explicit access attributes.

gcc/testsuite/ChangeLog:

PR tree-optimization/104970
* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple2,
test_parmsz_simple3, test_parmsz_extern, test_parmsz_internal,
test_parmsz_internal2, test_parmsz_internal3): New tests.
(main): Use them.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>

c++: extern thread_local declarations in constexpr [PR104994]

C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, perhaps sizeof/decltype).
P2242 changed that for C++23 to passing through declaration but
https://cplusplus.github.io/CWG/issues/2552.html
has been filed for it yesterday.

The following patch implements the proposed wording of CWG 2552 in addition
to fixing the C++14 - C++20 handling bug.
If you'd like instead to keep the current pedantic C++23 wording for now,
that would mean taking out the first hunk (cxx_eval_constant_expression) and
g++.dg/cpp23/constexpr-nonlit2.C hunk.

2022-03-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/104994
* constexpr.cc (cxx_eval_constant_expression): Don't diagnose passing
through extern thread_local declarations.  Change wording from
declaration to definition.
(potential_constant_expression_1): Don't diagnose extern thread_local
declarations.  Change wording from declared to defined.
* decl.cc (start_decl): Likewise.

* g++.dg/diagnostic/constexpr1.C: Change expected diagnostic wording
from declared to defined.
* g++.dg/cpp23/constexpr-nonlit1.C: Likewise.
(garply): Change dg-error into dg-bogus.
* g++.dg/cpp23/constexpr-nonlit2.C: Change expected diagnostic wording
from declaration to definition.
* g++.dg/cpp23/constexpr-nonlit6.C: Change expected diagnostic wording
from declared to defined.
* g++.dg/cpp23/constexpr-nonlit7.C: New test.
* g++.dg/cpp2a/constexpr-try5.C: Change expected diagnostic wording
from declared to defined.
* g++.dg/cpp2a/consteval3.C: Likewise.

rs6000: Skip overload instances with NULL fntype [PR104967]

For some overload built-in function instance, if it requires
a data type which isn't defined on the target, its fntype
would be initialized as NULL. This patch is to consider
this possibility in function find_instance, as shown in
PR104967.

PR target/104967

gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (find_instance): Skip instances with null
function types.

Daily bump.

analyzer: fix accessing wrong stack frame on interprocedural return [PR104979]

PR analyzer/104979 reports a leak false positive when handling an
interprocedural return to a caller:

  LHS = CALL(ARGS);

where the LHS is a certain non-trivial compound expression.

The root cause is that parts of the LHS were being erroneously
evaluated with respect to the stack frame of the called function,
rather than tha of the caller.  When LHS contained a local variable
within the caller as part of certain nested expressions, this local
variable was looked for within the called frame, rather than that of the
caller.  This lookup in the wrong stack frame led to the local variable
being treated as uninitialized, and thus the write to LHS was considered
as writing to a garbage location, leading to the return value being
lost, and thus being considered as a leak.

The region_model code uses the analyzer's path_var class to try to
extend the tree type with stack depth information.  Based on the above,
I think that the path_var class is fundamentally broken, but it's used
in a few other places in the analyzer, so I don't want to rip it out
until the next stage 1.

In the meantime, this patch reworks how region_model::pop_frame works so
that the destination region for an interprocedural return value is
computed after the frame is popped, so that the region_model has the
stack frame for the *caller* at that point.  Doing so fixes the issue.

I attempted a more ambitious fix which moved the storing of the return
svalue into the destination region from region_model::pop_region into
region_model::update_for_return_gcall, with pop_frame returning the
return svalue.  Unfortunately, this regressed g++.dg/analyzer/pr93212.C,
which returns a pointer into a stale frame.
unbind_region_and_descendents and poison_any_pointers_to_descendents are
only set up to poison regions with bindings into the stale frame, not
individual svalues, and updating that became more invasive than I'm
comfortable with in stage 4.

The patch also adds assertions to verify that we have the correct
function when looking up locals/SSA names in a stack frame.  There
doesn't seem to be a general-purpose way to get at the function of an
SSA name, so the assertions go from SSA name to def-stmt to basic_block,
and from there use the analyzer's supergraph to get the function from
the basic_block.  If there's a simpler way to do this, please let me know.

gcc/analyzer/ChangeLog:
PR analyzer/104979
* engine.cc (impl_run_checkers): Create the engine after the
supergraph, and pass the supergraph to the engine.
* region-model.cc (region_model::get_lvalue_1): Pass ctxt to
frame_region::get_region_for_local.
(region_model::update_for_return_gcall): Pass the lvalue for the
result to pop_frame as a tree, rather than as a region.
(region_model::pop_frame): Update for above change, determining
the destination region after the frame is popped and thus with
respect to the caller frame rather than the called frame.
Likewise, set the value of the region to the return value after
the frame is popped.
(engine::engine): Add supergraph pointer.
(selftest::test_stack_frames): Set the DECL_CONTECT of PARM_DECLs.
(selftest::test_get_representative_path_var): Likewise.
(selftest::test_state_merging): Likewise.
* region-model.h (region_model::pop_frame): Convert first param
from a const region * to a tree.
(engine::engine): Add param "sg".
(engine::m_sg): New field.
* region.cc: Include "analyzer/sm.h" and
"analyzer/program-state.h".
(frame_region::get_region_for_local): Add "ctxt" param.
Add assertions that VAR_DECLs are locals, and that expr is for the
correct function.
* region.h (frame_region::get_region_for_local): Add "ctxt" param.

gcc/testsuite/ChangeLog:
PR analyzer/104979
* gcc.dg/analyzer/boxed-malloc-1-29.c: Deleted test, moving the
now fixed test_29 to...
* gcc.dg/analyzer/boxed-malloc-1.c: ...here.
* gcc.dg/analyzer/stale-frame-1.c: Add test coverage.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: tweak PR103337 fix

Patrick suggested a way to implement the designated-init handling without
(temporarily) modifying the CONSTRUCTOR being reshaped.

PR c++/103337

gcc/cp/ChangeLog:

* decl.cc (reshape_single_init): New.
(reshape_init_class): Use it.

c++: tweak PR105006 fix

Checking dependent_type_p avoids needing to walk the overloads in cases
where it would not be possible to find a dependent using.

PR c++/105006

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_using_decl): Check that scope is
a dependent type before looking for dependent using.

Fortran: Fix directory stat check for '.' [PR103560]

MinGW does not like a call to 'stat' for './' via gfc_do_check_include_dir.
Solution: Only append '/' when concatenating the path with the filename.

gcc/fortran/ChangeLog:

PR fortran/103560
* scanner.cc (add_path_to_list): Don't append '/' to the
save include path.
(open_included_file): Use '/' in concatenating path + file name.
* module.cc (gzopen_included_file_1): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/103560
* gfortran.dg/include_14.f90: Update dg-warning.
* gfortran.dg/include_17.f90: Likewise.
* gfortran.dg/include_18.f90: Likewise.
* gfortran.dg/include_6.f90: Update dg-*.

target/102125 - alternative memcpy folding improvement

The following extends the heuristical memcpy folding path with the
ability to use misaligned accesses on strict-alignment targets just
like the size-based path does.  That avoids regressing the following
testcase on arm

    uint64_t bar64(const uint8_t *rData1)
    {
        uint64_t buffer;
        memcpy(&buffer, rData1, sizeof(buffer));
        return buffer;
    }

when r12-3482-g5f6a6c91d7c592 is reverted.

2022-03-23  Richard Biener  <rguenther@suse.de>

PR target/102125
* gimple-fold.cc (gimple_fold_builtin_memory_op): Allow the
use of movmisalign when either the source or destination
decl is properly aligned.

rtl-optimization/105028 - fix compile-time hog in form_threads_from_copies

form_threads_from_copies processes a sorted array of copies, skipping
those with the same thread and conflicting threads and merging the
first non-conflicting ones.  After that it terminates the loop and
gathers the remaining elements of the array, skipping same thread
copies, re-starting the process.  For a large number of copies this
gathering of the rest takes considerable time and it also appears
pointless.  The following simply continues processing the array
which should be equivalent as far as I can see.

This takes form_threads_from_copies off the profile radar from
previously taking ~50% of the compile-time.

2022-03-23  Richard Biener  <rguenther@suse.de>

PR rtl-optimization/105028
* ira-color.cc (form_threads_from_copies): Remove unnecessary
copying of the sorted_copies tail.

c++: using from enclosing class template [PR105006]

Here, DECL_DEPENDENT_P was false for the second using because Row<eT> is
"the current instantiation", so lookup succeeds. But since Row itself has a
dependent using-decl for operator(), the set of functions imported by the
second using is dependent, so we should set the flag.

PR c++/105006

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_using_decl): Set DECL_DEPENDENT_P if lookup
finds a dependent using.

gcc/testsuite/ChangeLog:

* g++.dg/template/using30.C: New test.

analyzer: use tainted_allocation_size::m_mem_space [PR105017]

gcc/analyzer/ChangeLog:
PR analyzer/105017
* sm-taint.cc (taint_diagnostic::subclass_equal_p): Check
m_has_bounds as well as m_arg.
(tainted_allocation_size::subclass_equal_p): Chain up to base
class implementation. Also check m_mem_space.
(tainted_allocation_size::emit): Add note showing stack-based vs
heap-based allocations.

gcc/testsuite/ChangeLog:
PR analyzer/105017
* gcc.dg/analyzer/taint-alloc-1.c: Add expected messages relating
to heap vs stack.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix ICE adding note to disabled diagnostic [PR104997]

gcc/analyzer/ChangeLog:
PR analyzer/104997
* diagnostic-manager.cc (diagnostic_manager::add_diagnostic):
Convert return type from "void" to "bool", reporting success vs
failure to caller, for both overloads.
* diagnostic-manager.h (diagnostic_manager::add_diagnostic):
Likewise.
* engine.cc (impl_region_model_context::warn): Propagate return
value from diagnostic_manager::add_diagnostic.

gcc/testsuite/ChangeLog:
PR analyzer/104997
* gcc.dg/analyzer/write-to-string-literal-4-disabled.c: New test,
adapted from write-to-string-literal-4.c.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Add missing constraints to std::bit_cast [PR105027]

Our std::bit_cast was relying on the compiler to check for errors inside
__builtin_bit_cast, instead of checking them as constraints. That means
std::bit_cast was not SFINAE-friendly.

This fix uses a requires-clause, so for old versions of Clang without
concepts support the function will still be unconstrained. At some point
in future we can remove the #ifdef __cpp_concepts check and rely on all
compilers having full concepts support in C++20 mode.

libstdc++-v3/ChangeLog:

PR libstdc++/105027
* include/std/bit (bit_cast): Add constraints.
* testsuite/26_numerics/bit/bit.cast/105027.cc: New test.

rs6000: Adjust error messages.

gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Use %qs in format.
* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Reword the error message.

libstdc++: Fix feature test macros in <version> for freestanding

Some C++17 and C++20 feature test macros are only defined in <version>
for hosted builds, even though the features are supported for
freestanding.

All C++23 feature test macros are defined in <version> for freestanding,
but most of the features are only supported for hosted.

libstdc++-v3/ChangeLog:

* include/std/version [!_GLIBCXX_HOSTED]
(__cpp_lib_hardware_interference_size): Define for freestanding.
(__cpp_lib_bit_cast): Likewise.
(__cpp_lib_is_layout_compatible): Likewise.
(__cpp_lib_is_pointer_interconvertible): Likewise.
(__cpp_lib_adaptor_iterator_pair_constructor): Do not define for
freestanding.
(__cpp_lib_invoke_r): Likewise.
(__cpp_lib_ios_noreplace): Likewise.
(__cpp_lib_monadic_optional): Likewise.
(__cpp_lib_move_only_function): Likewise.
(__cpp_lib_spanstream): Likewise.
(__cpp_lib_stacktrace): Likewise.
(__cpp_lib_string_contains): Likewise.
(__cpp_lib_string_resize_and_overwrite): Likewise.
(__cpp_lib_to_underlying): Likewise.

libstdc++: Disable atomic wait for freestanding [PR105021]

We use either condition variables or futexes to implement atomic waits,
so we can't do it in freestanding. This is non-conforming, so should be
revisited later, probably by making freestanding atomic waiting
operations spin without ever blocking.

Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/105021
* include/bits/atomic_base.h [!_GLIBCXX_HOSTED]: Do not include
<bits/atomic_wait.h> for freestanding.

testsuite: Fix up sse2-v1ti-shift-3.c test [PR102986]

This test is dg-do run and invokes UB when these rotate functions
are called with 0 as second argument. There are some other tests
that do this but they are dg-do compile only and not even call those
functions at all, so it IMHO doesn't matter that they are only well
defined for [1,127] and not [0,127].

The following patch fixes it, we pattern recognize both forms as rotates
and we emit identical assembly.

2022-03-23 Jakub Jelinek <jakub@redhat.com>

PR target/102986
* gcc.target/i386/sse2-v1ti-shift-3.c (rotr_v1ti, rotl_v1ti, rotr_ti,
rotl_ti): Use -i&127 instead of 128-i to avoid UB on i == 0.

LTO: Fixes for renaming issues with offload/OpenMP [PR104285]

gcc/lto/ChangeLog:

PR middle-end/104285
* lto-partition.cc (maybe_rewrite_identifier): Use get_identifier
for the returned string to be usable as hash key.
(validize_symbol_for_target): Hence, use return value directly.
(privatize_symbol_name_1): Track maybe_rewrite_identifier renames.
* lto.cc (offload_handle_link_vars): Move function up before ...
(do_whole_program_analysis): Call it after static renamings.
(lto_main): Move call after static renamings.

libgomp/ChangeLog:

PR middle-end/104285
* testsuite/libgomp.c++/target-same-name-2-a.C: New test.
* testsuite/libgomp.c++/target-same-name-2-b.C: New test.
* testsuite/libgomp.c++/target-same-name-2.C: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.

Fix ICE caused by NULL_RTX returned by lowpart_subreg.

In validate_subreg, both (subreg:V2HF (reg:SI) 0)
and (subreg:V8HF (reg:V2HF) 0) are valid, but not
for (subreg:V8HF (reg:SI) 0) which causes ICE.

Ideally it should be handled in validate_subreg to support
subreg for all modes available in TARGET_CAN_CHANGE_MODE_CLASS, but
that would be too risky in stage4, so the patch is a walkround in the
backend to force_reg operands before lowpart_subreg for expanders or
pre_reload splitters.

gcc/ChangeLog:

PR target/104976
* config/i386/sse.md (ssePSmodelower): New.
(*avx_cmp<mode>3_ltint_not): Force_reg operand before
lowpart_subreg to avoid NULL_RTX.
(<avx512>_fmaddc_<mode>_mask1<round_expand_name>,
<avx512>_fcmaddc_<mode>_mask1<round_expand_name>,
fma_<mode>_fmaddc_bcst, fma_<mode>_fcmaddc_bcst,
<avx512>_<complexopname>_<mode>_mask<round_name>,
avx512fp16_fcmaddcsh_v8hf_mask1<round_expand_name>,
avx512fp16_fcmaddcsh_v8hf_mask3<round_expand_name>,
avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>,
avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>,
float<floatunssuffix><mode>v4hf2,
float<floatunssuffix>v2div2hf2,
fix<fixunssuffix>_truncv4hf<mode>2,
fix<fixunssuffix>_truncv2hfv2di2, extendv4hf<mode>2,
extendv2hfv2df2,
trunc<mode>v4hf2,truncv2dfv2hf2,
*avx512bw_permvar_truncv16siv16hi_1,
*avx512bw_permvar_truncv16siv16hi_1_hf,
*avx512f_permvar_truncv8siv8hi_1,
*avx512f_permvar_truncv8siv8hi_1_hf,
*avx512f_vpermvar_truncv8div8si_1,
*avx512f_permvar_truncv32hiv32qi_1,
*avx512f_permvar_truncv16hiv16qi_1,
*avx512f_permvar_truncv4div4si_1,
*avx512f_pshufb_truncv8hiv8qi_1,
*avx512f_pshufb_truncv4siv4hi_1,
*avx512f_pshufd_truncv2div2si_1,
sdot_prod<mode>, avx2_pblend<ssemodesuffix>_1,
ashrv2di3,ashrv2di3,usdot_prod<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104976.c: New test.
* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Scan either
vblendps or masked vmovaps.
* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Ditto
* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Ditto.

Daily bump.

libstdc++-v3 testsuite: Call fesetround(FE_DOWNWARD) only if defined

Without this, for a typical soft-float target such as cris-elf, after
commit r12-7676-g5a4e208022e704 you'll see, in libstdc++.log:
...
FAIL: 20_util/from_chars/6.cc (test for excess errors)
Excess errors:
/home/hp/tmp/auto0321/gcc/libstdc++-v3/testsuite/20_util/from_chars/6.cc:33: error: 'FE_DOWNWARD' was not declared in this scope

UNRESOLVED: 20_util/from_chars/6.cc compilation failed to produce executable
...

It appears to be a side-effect of that commit changing the
way __cpp_lib_to_chars is defined. (On the bright side,
./7.cc now passes since that commit.)

TFM, specifically fenv(3), says that "Each of the macros
FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW,
FE_UNDERFLOW is defined when the implementation supports
handling of the corresponding exception".

A git-grep shows that this was the only place using a FE_ macro
unconditionally.

libstdc++-v3:
* testsuite/20_util/from_chars/6.cc (test01) [FE_DOWNWARD]:
Conditionalize call to fesetround.

c: -Wmissing-field-initializers and designated inits [PR82283, PR84685]

This patch fixes two kinds of wrong -Wmissing-field-initializers
warnings.  Our docs say that this warning "does not warn about designated
initializers", but we give a warning for

1) the array case:

  struct S {
    struct N {
      int a;
      int b;
    } c[1];
  } d = {
    .c[0].a = 1,
    .c[0].b = 1, // missing initializer for field 'b' of 'struct N'
  };

we warn because push_init_level, when constructing an array, clears
constructor_designated (which the warning relies on), and we forget
that we were in a designated initializer context.  Fixed by the
push_init_level hunk; and

2) the compound literal case:

  struct T {
    int a;
    int *b;
    int c;
  };

  struct T t = { .b = (int[]){1} }; // missing initializer for field 'c' of 'struct T'

where set_designator properly sets constructor_designated to 1, but the
compound literal causes us to create a whole new initializer_stack in
start_init, which clears constructor_designated.  Then, after we've parsed
the compound literal, finish_init flushes the initializer_stack entry,
but doesn't restore constructor_designated, so we forget we were in
a designated initializer context, which causes the bogus warning.  (The
designated flag is also tracked in constructor_stack, but in this case,
we didn't perform push_init_level between set_designator and start_init
so it wasn't saved anywhere.)

PR c/82283
PR c/84685

gcc/c/ChangeLog:

* c-typeck.cc (struct initializer_stack): Add 'designated' member.
(start_init): Set it.
(finish_init): Restore constructor_designated.
(push_init_level): Set constructor_designated to the value of
constructor_designated in the upper constructor_stack.

gcc/testsuite/ChangeLog:

* gcc.dg/Wmissing-field-initializers-1.c: New test.
* gcc.dg/Wmissing-field-initializers-2.c: New test.
* gcc.dg/Wmissing-field-initializers-3.c: New test.
* gcc.dg/Wmissing-field-initializers-4.c: New test.
* gcc.dg/Wmissing-field-initializers-5.c: New test.

Fortran: ensure intialization of stride array

gcc/fortran/ChangeLog:

PR fortran/104999
* simplify.cc (gfc_simplify_cshift): Ensure temporary holding
source array stride is initialized.

testsuite: Add testcase for already fixed PR [PR102489]

This got broken with r12-3529 and fixed with r12-5255.

2022-03-22 Jakub Jelinek <jakub@redhat.com>

PR c++/102489
* g++.dg/coroutines/pr102489.C: New test.

[nvptx] Use '%' as register prefix

The percentage sign as first character of a ptx identifier can be used to
avoid name conflicts, e.g., between user-defined variable names and
compiler-generated names.

The insn nvptx_uniform_warp_check contains register names without '%' prefix,
which potentially could lead to name conflicts with user-defined variable
names.

Fix this by adding a '%' prefix, more specifically a '%r_' prefix to avoid a
name conflict with ptx special registers.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-20 Tom de Vries <tdevries@suse.de>

PR target/104925
* config/nvptx/nvptx.md (define_insn "nvptx_uniform_warp_check"):
Use % as register prefix.

[nvptx] Limit HFmode support to mexperimental

With PR104489 still open and end-of-stage-4 approaching, classify HFmode
support as experimental, which is not enabled by default but can be enabled
using -mexperimental.

This fixes the nvptx build when the default sm_xx is set to sm_53 or higher.

Note that we're not using -mfp16 or some such, because that might create
expectations about being able to switch support on or off in the future, and
at this point it's not clear why, once reaching non-experimental status, it
shouldn't always be enabled.

gcc/ChangeLog:

2022-03-19 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.cc (nvptx_scalar_mode_supported_p)
(nvptx_libgcc_floating_mode_supported_p): Only enable HFmode for
mexperimental.

gcc/testsuite/ChangeLog:

2022-03-19 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/float16-1.c: Add additional-options -mexperimental.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.

[nvptx] Add mexperimental

Add new option -mexperimental.

This allows, rather than developing a new feature to completion in a
development branch, to develop a new feature on trunk, without disturbing
trunk.

The equivalent of the feature branch merge then becomes making the
functionality available for -mno-experimental.

If more features at the same time will be developed, we can do something like
-mexperimental=feature1,feature2 but for now that's not necessary.

For now, has no effect.

gcc/ChangeLog:

2022-03-19 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.opt (mexperimental): New option.

[nvptx] Use .alias directive for mptx >= 6.3

Starting with ptx isa version 6.3, a ptx directive .alias is available.
Use this directive to support symbol aliases, as far as possible.

The alias support is off by default.  It can be turned on using a switch
-malias.

Furthermore, for pre-sm_75, it's not effective unless the ptx version is
bumped to 6.3 or higher using -mptx (given that the default for pre-sm_75 is
6.0).

The alias support has the following limitations.

Only function aliases are supported.

Weak aliases are not supported.  That is, if I disable the check in
nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted
and parsed by the driver.  But the test gcc.dg/globalalias.c starts failing,
with the behaviour matching the comment about "weird behavior of AIX's .set
pseudo-op": a weak alias may resolve to different functions in different
files.

Aliases to weak symbols are not supported (see gcc.dg/localalias.c).  This is
currently not prohibited by the compiler, but with the driver link we run
into: "error: Function test with .weak scope cannot be aliased".

Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
This is currently not prohibited by the compiler, but with the driver link we
run into:  "Internal error: alias to unknown symbol" .

Unreferenced aliases are not emitted (these can occur f.i. when inlining a
call to an alias).  This avoids driver link error "Internal error: reference
to deleted section".

When enabling malias by default, libgomp detects alias support and
consequently libgomp.a will contains a few uses of .alias.  This however
results in aforementioned "Internal error: reference to deleted section" in
many test-cases.  Either there's some error with how .alias is used, or
there's a driver bug.  While this issue is not resolved, we keep malias
off-by-default.

At some point we may add support in the nvptx-tools linker for symbol
aliases, and define f.i. malias=ptx and malias=ld to choose between the two in
the compiler.

An example of where this support is useful, is the OvO (OpenMP vs Offload)
testsuite.  The testsuite passes already at -O2.  But at -O0, there are errors
in some c++ test-cases due to missing symbol alias support.  By compiling with
-malias, the whole testsuite passes also at -O0.

This patch causes a regression:
...
-PASS: gcc.dg/pr60797.c  (test for errors, line 4)
+FAIL: gcc.dg/pr60797.c  (test for errors, line 4)
...
The test-case is skipped for effective target alias, and both without and with
this patch the nvptx target is considered to not support it, so the test-case is
executed.  The test-case expects an error message along the lines of "alias
definitions not supported in this configuration", but instead we run into:
...
gcc.dg/pr60797.c:4:12: error: foo aliased to undefined symbol
...
This is probably due to the fact that the nvptx backend now defines macros
ASM_OUTPUT_DEF and ASM_OUTPUT_DEF_FROM_DECLS, so from the point of view of the
common part of the compiler, aliases are supported.

gcc/ChangeLog:

2022-03-18  Tom de Vries  <tdevries@suse.de>

PR target/104957
* config/nvptx/nvptx-protos.h (nvptx_asm_output_def_from_decls): Declare.
* config/nvptx/nvptx.cc (write_fn_proto_1): Don't add function marker
for alias.
(SET_ASM_OP, NVPTX_ASM_OUTPUT_DEF): New macro def.
(nvptx_asm_output_def_from_decls): New function.
* config/nvptx/nvptx.h (ASM_OUTPUT_DEF): New macro def, define to
gcc_unreachable ().
(ASM_OUTPUT_DEF_FROM_DECLS): New macro def, define to
nvptx_asm_output_def_from_decls.
* config/nvptx/nvptx.opt (malias): New opt.

gcc/testsuite/ChangeLog:

2022-03-18  Tom de Vries  <tdevries@suse.de>

PR target/104957
* gcc.target/nvptx/alias-1.c: New test.
* gcc.target/nvptx/alias-2.c: New test.
* gcc.target/nvptx/alias-3.c: New test.
* gcc.target/nvptx/alias-4.c: New test.
* gcc.target/nvptx/nvptx.exp
(check_effective_target_runtime_ptx_isa_version_6_3): New proc.

[nvptx] Add warp sync at simt exit

Consider this code (with N defined to 1024):
...
  float v = 0.0;
  #pragma omp target map(tofrom: v)
  #pragma omp parallel for simd
  for (int i = 0 ; i < N; i++)
    {
      #pragma omp atomic update
      v = v + 1.0;
    }
...

It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).

I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).

The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.

The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.

This approach doesn't work when executing in all warp lanes multiplies the
side effects from 1 to 32 separate side effects, which is the case for atomic
insns.  So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]

Now, consider a non-atomic update: ld, add, store.  The store has side
effects, are those multiplied or not?

Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step.  So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
  at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).

Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence.  There's a "Convergence Optimizer" that can can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".

Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guarantueed to be in sync.

The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation".  So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.

Fix this by adding an explicit warp sync at simt exit.

Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-09  Tom de Vries  <tdevries@suse.de>

PR target/104916
PR target/104783
* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
sync (or uniform warp check for mptx < 6.0).

libgomp/ChangeLog:

2022-03-15  Tom de Vries  <tdevries@suse.de>

PR target/104916
PR target/104783
* testsuite/libgomp.c/pr104783-2.c: New test.

tree-optimization/105012 - fix ICE from local DSE of if-conversion

The following guards dse_classify_store with the same condition as
the DSE pass does - availability of a virtual definition.  For
the PR we run into the fortran frontend generating a clobber for
a FUNCTION_DECL lhs which is ignored by the operand scanner and has
no virtual operands assigned.  Apart from fixing the frontend the
following fixes the ICE by adjusting if-conversion.

2022-03-22  Richard Biener  <rguenther@suse.de>

PR tree-optimization/105012
* tree-if-conv.cc (ifcvt_local_dce): Only call
dse_classify_store when we have a VDEF.

nvptx: fix wrapping in an error message.

PR target/104902

gcc/ChangeLog:

* config/nvptx/nvptx.cc (handle_ptx_version_option):
Fix option wrapping in an error message.

rs6000: wrap const in an error message.

PR target/104903

gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Wrap const keyword.

v850: fix typo in pragma name

PR target/104904

gcc/ChangeLog:

* config/v850/v850-c.cc (pop_data_area): Fix typo in pragma
name.

rs6000: update error message format.

PR target/104898

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Use %qs instead of (%qs).

i386: update error message format.

Use '%qs' instead of '(%qs)'.

PR target/104898

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_option_override_internal):
Use '%qs' instead of '(%qs)'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr99753.c: Update test.
* gcc.target/i386/spellcheck-options-1.c: Likewise.
* gcc.target/i386/spellcheck-options-2.c: Likewise.
* gcc.target/i386/spellcheck-options-4.c: Likewise.

aarch64: update error message format.

Use 'qs' and remove usage '(%qs)'.

PR target/104898

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_handle_attr_arch):
Use 'qs' and remove usage '(%qs)'.
(aarch64_handle_attr_cpu): Likewise.
(aarch64_handle_attr_tune): Likewise.
(aarch64_handle_attr_isa_flags): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/branch-protection-attr.c:
Use 'qs' and remove usage '(%qs)'.
* gcc.target/aarch64/spellcheck_1.c: Likewise.
* gcc.target/aarch64/spellcheck_2.c: Likewise.
* gcc.target/aarch64/spellcheck_3.c: Likewise.

aarch64: Update regmove costs for neoverse-v1 and neoverse-512tvb tunings

This patch updates the register move tunings for
-mcpu/-mtune={neoverse-v1,neoverse-512tvb}.

gcc/ChangeLog:
2022-03-22 Tamar Christina <tamar.christina@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/aarch64/aarch64.cc (neoversev1_regmove_cost): New tuning
struct.
(neoversev1_tunings): Use neoversev1_regmove_cost and update store_int
cost.
(neoverse512tvb_tunings): Likewise.

aarch64: Add Demeter tuning structs

This patch adds tuning structs for -mcpu/-mtune=demeter.

gcc/ChangeLog:

2022-03-22 Tamar Christina <tamar.christina@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/aarch64/aarch64.cc (demeter_addrcost_table,
demeter_regmove_cost, demeter_advsimd_vector_cost,
demeter_sve_vector_cost, demeter_scalar_issue_info,
demeter_advsimd_issue_info, demeter_sve_issue_info,
demeter_vec_issue_info, demeter_vector_cost,
demeter_tunings): New tuning structs.
(aarch64_ve_op_count::rename_cycles_per_iter): Enable for demeter
tuning.
* config/aarch64/aarch64-cores.def: Add entry for demeter.
* config/aarch64/aarch64-tune.md (tune): Add demeter to list.

aarch64: Update reg-costs to differentiate between memmove costs

This patch introduces a struct to differentiate between different memmove costs
to enable a better modeling of memory operations. These have been modelled for
-mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for all other
tunings all entries are equal to the old single memmove cost to ensure the
behaviour remains the same.

2022-03-16 Tamar Christina <tamar.christina@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New struct.
(struct tune_params): Change type of memmov_cost to use cpu_memmov_cost.
* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update all
tunings to use cpu_memmov_cost struct.

aarch64: Add Neoverse-N2 tuning structs

This patch adds tuning structures for Neoverse N2.

2022-03-22 Tamar Christina <tamar.christina@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/aarch64/aarch64.cc (neoversen2_addrcost_table,
neoversen2_regmove_cost, neoversen2_advsimd_vector_cost,
neoversen2_sve_vector_cost, neoversen2_scalar_issue_info,
neoversen2_advsimd_issue_info, neoversen2_sve_issue_info,
neoversen2_vec_issue_info, neoversen2_tunings): New structs.
(neoversen2_tunings): Use new structs and update tuning flags.
(aarch64_vec_op_count::rename_cycles_per_iter): Enable for neoversen2
tuning.

aarch64: Enable FP16 feature by default for Armv9

This patch adds the feature bit for FP16 to the feature set for Armv9 since
Armv9 requires SVE to be implemented and SVE requires FP16 to be implemented.

2022-03-22 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16 feature
bit.

lto-plugin: Use GNU ld or Solaris ld version script in preference to -export-symbols-regex [PR102426]

As reported, libtool -export-symbols-regex doesn't work on Solaris
when using GNU ld instead of Sun ld, libtool just always assumes Sun ld.
As I'm unsure what is the maintainance status of libtool right now,
this patch solves it on the lto-plugin side instead, tests at configure time
similar way how libssp and other target libraries test for symbol versioning
(except omitting the symbol version because we just want one GLOBAL symbol
and rest of them LOCAL), and will use the current way of
-export-symbols-regex onload as fallback when this doesn't work.

2022-03-22 Jakub Jelinek <jakub@redhat.com>

PR lto/102426
lto-plugin/
* configure.ac (LTO_PLUGIN_USE_SYMVER, LTO_PLUGIN_USE_SYMVER_GNU,
LTO_PLUGIN_USE_SYMVER_SUN): New test for symbol versioning support.
* Makefile.am (version_arg, version_dep): Set conditionally based
on LTO_PLUGIN_USE_SYMVER*.
(liblto_plugin_la_LDFLAGS): Use $(version_arg) instead of
-export-symbols-regex onload.
(liblto_plugin_la_DEPENDENCIES): Depend on $(version_dep).
* lto-plugin.map: New file.
* configure: Regenerated.
* Makefile.in: Regenerated.

Extend splitter pattern to reversed condition by swapping then and else rtx. [PR target/104982]

Failed to match this instruction:
(set (reg/v:SI 88 [ z ])
    (if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
                (const_int 1 [0x1])
                (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
            (const_int 0 [0]))
        (reg:SI 95)
        (reg:SI 94)))

but it's equal to

(set (reg/v:SI 88 [ z ])
    (if_then_else:SI (ne (zero_extract:SI (reg:SI 92)
                (const_int 1 [0x1])
                (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
            (const_int 0 [0]))
        (reg:SI 94)
        (reg:SI 95)))

which is the exact existing splitter.

The patch will fix below regressions:

On x86-64, r12-7687 caused:

FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7

gcc/ChangeLog:

PR target/104982
* config/i386/i386.md (*jcc_bt<mode>_mask): Extend the
following splitter to reversed condition.

testsuite: Add testcase for no longer failing PR [PR102645]

This test started ICEing with r12-3876 but stopped with r12-5264.

2022-03-22 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/102645
* gcc.c-torture/compile/pr102645.c: New test.

calls: Fix error recovery after sorry differently [PR104989]

On Mon, Feb 28, 2022 at 07:52:56AM -0000, Roger Sayle wrote:
> This patch resolves PR c++/84964 which is an ICE in the middle-end after
> emitting a "sorry, unimplemented" message, and is a regression from
> earlier releases of GCC.  This issue is that after encountering a
> function call requiring an unreasonable amount of stack space, the
> code continues and falls foul of an assert checking that stack pointer
> has been correctly updated.  The fix is to (locally) consider aborted
> function calls as "no return", which skips this downstream sanity check.

As can be seen on PR104989, just setting ECF_NORETURN after sorry is quite
risky and leads to other ICEs.  The problem is that ECF_NORETURN calls
better should be at the end of basic blocks that don't have any fallthru
successor edges, otherwise we can ICE later.

This patch instead sets sibcall_failure if in pass == 0 (sibcall_failure
means that the tail call sequence is not useful/not desirable and throws
it away) and otherwise sets a new bool variable that will let us pass
the assertion and also throws away the whole call sequence, I think that is
best for error recovery.

2022-03-22  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/104989
* calls.cc (expand_call): Don't set ECF_NORETURN in flags after
sorry for passing too large argument, instead set sibcall_failure
for pass == 0, or a new normal_failure flag otherwise.  If
normal_failure is set, don't assert all stack has been deallocated
at the end and throw away the whole insn sequence.

* g++.dg/other/pr104989.C: New test.

print-tree:Avoid warnings of overflow

This patch avoids two warnings of "'sprintf' may write a
terminating nul past the end of the destination
[-Wformat-overflow=]" when build GCC.

Tested on x86_64, and committed as obvious.

gcc/ChangeLog:

* print-tree.cc: Change array length

AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

For complex scalar intrinsic like _mm_mask_fcmadd_sch, the
mask should be and by 1 to ensure the mask is bind to lowest byte.
Use masked vmovss to perform same operation which omits higher bits
of mask.

gcc/ChangeLog:

PR target/104978
* config/i386/sse.md
(avx512fp16_fmaddcsh_v8hf_mask1<round_expand_name):
Use avx512f_movsf_mask instead of vmovaps or vblend, and
force_reg before lowpart_subreg.
(avx512fp16_fcmaddcsh_v8hf_mask1<round_expand_name): Likewise.

gcc/testsuite/ChangeLog:

PR target/104978
* gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust asm scan.
* gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed.
* gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
* gcc.target/i386/pr104978.c: New test.

Daily bump.

x86: Disable SSE in ISA2 for -mgeneral-regs-only

Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
AVX512 ISAs.

gcc/

PR target/105000
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.

gcc/testsuite/

PR target/105000
* gcc.target/i386/pr105000-1.c: New test.
* gcc.target/i386/pr105000-2.c: Likewise.
* gcc.target/i386/pr105000-3.c: Likewise.
* gcc.target/i386/pr105000-4.c: Likewise.

c++: initialized array of vla [PR58646]

We went into build_vec_init because we're dealing with a VLA, but then
build_vec_init thought it was safe to just build an INIT_EXPR because the
outer dimension is constant. Nope.

PR c++/58646

gcc/cp/ChangeLog:

* init.cc (build_vec_init): Check for vla element type.

gcc/testsuite/ChangeLog:

* g++.dg/ext/vla24.C: New test.

x86: Disable AVX on pr86722.c and pr90356.c

SSE/SSE2 are enabled explicitly on pr86722.c and pr90356.c. Disable AVX
to avoid AVX with -march=native.

PR target/86722
PR tree-optimization/90356
* gcc.target/i386/pr86722.c: Add -mno-avx.
* gcc.target/i386/pr90356.c: Likewise.

x86: Properly check FEATURE_AESKLE

1. Pass 0x19 to __cpuid for bit_AESKLE.
2. Enable FEATURE_AESKLE only if bit_AESKLE is set.

PR target/104998
* common/config/i386/cpuinfo.h (get_available_features): Pass
0x19 to __cpuid for bit_AESKLE. Enable FEATURE_AESKLE only if
bit_AESKLE is set.

c++: designated init and aggregate members [PR103337]

Our C++20 designated initializer handling was broken with members of class
type; we would find the relevant member and then try to find a member of
the member with the same name. Or we would sometimes ignore the designator
entirely. The former problem is fixed by the change to reshape_init_class,
the latter by the change to reshape_init_r.

PR c++/103337
PR c++/102740
PR c++/103299
PR c++/102538

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Avoid looking for designator
after we found it.
(reshape_init_r): Keep looking for designator.

gcc/testsuite/ChangeLog:

* g++.dg/ext/flexary3.C: Remove one error.
* g++.dg/parse/pr43765.C: Likewise.
* g++.dg/cpp2a/desig22.C: New test.
* g++.dg/cpp2a/desig23.C: New test.
* g++.dg/cpp2a/desig24.C: New test.
* g++.dg/cpp2a/desig25.C: New test.

c++: designator and anon struct [PR101767]

We found .x in the anonymous struct, but then didn't find .y there; we
should decide that means we're done with the struct rather than that the
code is wrong.

PR c++/101767

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Back out of anon struct
if a designator doesn't match.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct10.C: New test.

Update gcc sv.po

* sv.po: Update.

d: Fix internal compiler error: in build_complex, at tree.c:2358

The conversion from the special _Complex enum to native complex used
build_complex, however the input value isn't necessarily a literal.

PR d/105004

gcc/d/ChangeLog:

* d-codegen.cc (build_struct_literal): Use complex_expr to build
complex expressions from __c_complex types.

gcc/testsuite/ChangeLog:

* gdc.dg/pr105004.d: New test.

d: Merge upstream dmd 2503f17e5, phobos a74fa63e6.

D front-end changes:

    - Import dmd mainline development.
    - Removed internal d_intN and d_unsN aliases to stdint types, which
      caused a regression on Solaris where int8_t is a char (PR104911).

Phobos changes:

    - Import phobos mainline development.

PR d/104911

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 2503f17e5.
* d-convert.cc (convert_expr): Replace d_uns64 with dinteger_t.
* d-lang.cc: Remove dmd/root/file.h include.
(d_handle_option): Update for new front-end interface.
(d_parse_file): Likewise.

libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos a74fa63e6.

rtl-ssa: Fix prev/next_def confusion [PR104869]

rtl-ssa chains definitions into an RPO list. It also groups
sequences of clobbers together into a single node, so that it's
possible to skip over the clobbers in constant time in order to
get the next or previous set.

When adding a clobber to an insn, the main DF barriers for that
clobber are the last use of the previous set (if any) and the next
set (if any); adding a new clobber to a sea of clobbers is fine.
def_lookup provided the basis for these barriers as prev_def ()
and next_def ().

But of course, in hindsight, those were bad names, since they
implied that the returned values were literally the previous
definition (of any kind) or the next definition (of any kind).
And function_info::make_use_available was using the same routines
assuming that they had that meaning. :-(

This made a difference for the case where the start of a BB
occurs in the middle of an (RPO) clobber group: we then want
the previous and next clobbers in the group, rather than the
set before the clobber group and the set after the clobber group.

This patch renames the existing routines to something that's hopefully
clearer (though also more long-winded). It then adds routines that
really do provide the previous and next definitions.

This complication is supposed to be internal to rtl-ssa and, as
mentioned above, is part of trying to reduce time complexity.

gcc/
PR middle-end/104869
* rtl-ssa/accesses.h (clobber_group::prev_clobber): Declare.
(clobber_group::next_clobber): Likewise.
(def_lookup::prev_def): Rename to...
(def_lookup::last_def_of_prev_group): ...this.
(def_lookup::next_def): Rename to...
(def_lookup::first_def_of_next_group): ...this.
(def_lookup::matching_or_prev_def): Rename to...
(def_lookup::matching_set_or_last_def_of_prev_group): ...this.
(def_lookup::matching_or_next_def): Rename to...
(def_lookup::matching_set_or_first_def_of_next_group): ...this.
(def_lookup::prev_def): New function, taking the lookup insn as
argument.
(def_lookup::next_def): Likewise.
* rtl-ssa/member-fns.inl (def_lookup::prev_def): Rename to...
(def_lookup::last_def_of_prev_group): ...this.
(def_lookup::next_def): Rename to...
(def_lookup::first_def_of_next_group): ...this.
(def_lookup::matching_or_prev_def): Rename to...
(def_lookup::matching_set_or_last_def_of_prev_group): ...this.
(def_lookup::matching_or_next_def): Rename to...
(def_lookup::matching_set_or_first_def_of_next_group): ...this.
* rtl-ssa/movement.h (restrict_movement_for_dead_range): Update after
above renaming.
* rtl-ssa/accesses.cc (clobber_group::prev_clobber): New function.
(clobber_group::next_clobber): Likewise.
(def_lookup::prev_def): Likewise.
(def_lookup::next_def): Likewise.
(function_info::make_use_available): Pass the lookup insn to
def_lookup::prev_def and def_lookup::next_def.

gcc/testsuite/
PR middle-end/104869
* g++.dg/pr104869.C: New test.

Avoid a warning of overflow

This patch avoid a warning of "c-ada-spec.cc:1660:34: warning:
'sprintf' may write a terminating nul past the end of the
destination [-Wformat-overflow=]" when build GCC.

gcc/c-family/ChangeLog:

* c-ada-spec.cc: Change array length

libstdc++: Work around clang misdesign in time_get<>::get [PR104990]

Apparently clang has a -fgnuc-version= option which allows it to pretend
it is any GCC version the user likes. It is already bad that it claims to
be GCC 4.2 compatible by default when it is not (various unimplemented
extensions at least), but this option is a horrible idea.

Anyway, this patch adds a hack for it.

2022-03-21 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/104990
* include/bits/locale_facets_nonio.tcc (get): Don't check if do_get
isn't overloaded if __clang__ is defined.

docs: Document min-pagesize parameter.

gcc/ChangeLog:

* doc/invoke.texi: Document min-pagesize parameter.

Dump when estimating the number of iterations of a loop

Currently the dumps are somewhat inter-mangled, not showing the
(possibly bad) recursion between niter estimation and number of
iteration computation. The following tries to improve deciphering
a little bit by dumping when we do niter estimation.

2022-03-21 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-niter.cc (estimate_numbers_of_iterations): Dump
we are estimating niter of loop.

RISC-V: Implement misc macro for vector extensions.

See also:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Update flag name and mask name.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Define
misc macro for vector extensions.
* config/riscv/riscv-opts.h (MASK_VECTOR_EEW_32): Rename to ...
(MASK_VECTOR_ELEN_32): ... this.
(MASK_VECTOR_EEW_64): Rename to ...
(MASK_VECTOR_ELEN_64): ... this.
(MASK_VECTOR_EEW_FP_32): Rename to ...
(MASK_VECTOR_ELEN_FP_32): ... this.
(MASK_VECTOR_EEW_FP_64): Rename to ...
(MASK_VECTOR_ELEN_FP_64): ... this.
(TARGET_VECTOR_ELEN_32): New.
(TARGET_VECTOR_ELEN_64): Ditto.
(TARGET_VECTOR_ELEN_FP_32): Ditto.
(TARGET_VECTOR_ELEN_FP_64): Ditto.
(TARGET_MIN_VLEN): Ditto.
* config/riscv/riscv.opt (riscv_vector_eew_flags): Rename to ...
(riscv_vector_elen_flags): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-13.c: New.
* gcc.target/riscv/arch-14.c: Ditto.
* gcc.target/riscv/arch-15.c: Ditto.
* gcc.target/riscv/predef-18.c: Ditto.
* gcc.target/riscv/predef-19.c: Ditto.
* gcc.target/riscv/predef-20.c: Ditto.

AVX512FP16: Fix masm=intel output for vfc?(madd|mul)csh [PR 104977]

Fix typo in subst for scalar complex mask_round operand.

gcc/ChangeLog:

PR target/104977
* config/i386/sse.md
(avx512fp16_fma<complexopname>sh_v8hf<mask_scalarcz_name><round_scalarcz_name>):
Correct round operand for intel dialect.

gcc/testsuite/ChangeLog:

PR target/104977
* gcc.target/i386/pr104977.c: New test.