platform/upstream/gcc.git
Jakub Jelinek [Wed, 4 Aug 2021 09:42:59 +0000 (11:42 +0200)]
testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x

The file has two identical halves; it looks like the patch was applied twice.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

* gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied patch.

liuhongt [Wed, 4 Aug 2021 02:50:28 +0000 (10:50 +0800)]
Refine predicate of peephole2 to general_reg_operand. [PR target/101743]

The define_peephole2 added by r12-2640-gf7bf03cf69ccb7dc should only
apply to general registers.  Since x86 also supports mov instructions
between GPRs, SSE registers and mask registers, limit the peephole2
predicate to general_reg_operand.

gcc/ChangeLog:

PR target/101743
* config/i386/i386.md (peephole2): Refine predicate from
register_operand to general_reg_operand.

Jakub Jelinek [Wed, 4 Aug 2021 09:40:52 +0000 (11:40 +0200)]
libgcc: Fix duplicated content of config/t-slibgcc-fuchsia

The file has two identical halves; it looks like the patch was applied twice.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

* config/t-slibgcc-fuchsia: Undo doubly applied patch.

Aldy Hernandez [Wed, 4 Aug 2021 08:55:12 +0000 (10:55 +0200)]
Mark path_range_query::dump as override.

gcc/ChangeLog:

* gimple-range-path.h (path_range_query::dump): Mark override.

Richard Biener [Wed, 4 Aug 2021 07:22:51 +0000 (09:22 +0200)]
tree-optimization/101769 - tail recursion creates possibly infinite loop

This makes tail recursion optimization produce a loop structure
manually rather than relying on loop fixup.  That also allows the
loop to be marked as finite (it would eventually blow the stack
if it were not).
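
As an illustration of the transformation being discussed (this is not code
from tree-tailcall.c, and the function names are made up for the example),
a tail-recursive function and the kind of loop that tail recursion
elimination conceptually turns it into might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Tail-recursive form: the recursive call is the last action performed,
// so an optimizer may reuse the current frame instead of pushing a new one.
std::uint64_t sum_to_recursive(std::uint64_t n, std::uint64_t acc = 0) {
    if (n == 0)
        return acc;
    return sum_to_recursive(n - 1, acc + n);  // tail call
}

// Hand-written loop equivalent: the parameters become loop-carried
// variables and the recursive call becomes the back edge of the loop.
std::uint64_t sum_to_loop(std::uint64_t n, std::uint64_t acc = 0) {
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

The loop form terminates for every input for the same reason the recursive
form does, which is the intuition behind marking the generated loop as finite.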

2021-08-04  Richard Biener  <rguenther@suse.de>

PR tree-optimization/101769
* tree-tailcall.c (eliminate_tail_call): Add the created loop
for the first recursion and return it via the new output parameter.
(optimize_tail_call): Pass through new output param.
(tree_optimize_tail_calls_1): After creating all latches,
add the created loop to the loop tree.  Do not mark loops for fixup.

* g++.dg/tree-ssa/pr101769.C: New testcase.

Martin Liska [Wed, 4 Aug 2021 07:48:05 +0000 (09:48 +0200)]
docs: document threader-mode param

gcc/ChangeLog:

* doc/invoke.texi: Document threader-mode param.

liuhongt [Wed, 4 Aug 2021 05:20:56 +0000 (13:20 +0800)]
Add dg-require-effective-target for testcases.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cond_op_addsubmul_d-2.c: Add
dg-require-effective-target for avx512.
* gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
* gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: Ditto.
* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: Ditto.
* gcc.target/i386/cond_op_fma_double-2.c: Ditto.
* gcc.target/i386/cond_op_fma_float-2.c: Ditto.

liuhongt [Wed, 4 Aug 2021 03:41:37 +0000 (11:41 +0800)]
Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.

gcc/ChangeLog:

* config/i386/sse.md (cond_fma<mode>): New expander.
(cond_fms<mode>): Ditto.
(cond_fnma<mode>): Ditto.
(cond_fnms<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cond_op_fma_double-1.c: New test.
* gcc.target/i386/cond_op_fma_double-2.c: New test.
* gcc.target/i386/cond_op_fma_float-1.c: New test.
* gcc.target/i386/cond_op_fma_float-2.c: New test.

Cherry Mui [Tue, 3 Aug 2021 23:35:55 +0000 (19:35 -0400)]
compiler: support new language constructs in escape analysis

Previous CLs add new language constructs in Go 1.17, specifically,
unsafe.Add, unsafe.Slice, and conversion from a slice to a pointer
to an array. This CL handles them in the escape analysis.

At the point of the escape analysis, unsafe.Add and unsafe.Slice
are still builtin calls, so just handle them in data flow.
Conversion from a slice to a pointer to an array has already been
lowered to a combination of compound expression, conditional
expression and slice info expressions, so handle them in the
escape analysis.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339671

GCC Administrator [Wed, 4 Aug 2021 00:16:51 +0000 (00:16 +0000)]
Daily bump.

Ian Lance Taylor [Tue, 3 Aug 2021 18:36:24 +0000 (11:36 -0700)]
compile, runtime: make selectnbrecv return two values

The only difference between selectnbrecv and selectnbrecv2 is that the
latter sets the input pointer value from the second return value of chanrecv.

By making selectnbrecv return both values from chanrecv, we can get
rid of selectnbrecv2; the compiler can now call only selectnbrecv and
generate simpler code.
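
The shape of the change can be sketched in C++ as a single non-blocking
receive that returns both results. This is only a model: `Chan` is a
hypothetical stand-in for a Go channel of int, and none of this is the
gofrontend runtime code.

```cpp
#include <cassert>
#include <queue>
#include <utility>

// Hypothetical stand-in for a Go channel of int (assumption for the sketch).
struct Chan {
    std::queue<int> buf;
    bool closed = false;
};

// Non-blocking receive returning {selected, received}:
//   selected: the receive could proceed (a value was ready or the channel
//             is closed);
//   received: a real value was stored through elem (false when the channel
//             is closed and drained).
std::pair<bool, bool> selectnbrecv(int* elem, Chan& c) {
    if (!c.buf.empty()) {
        if (elem != nullptr)
            *elem = c.buf.front();
        c.buf.pop();
        return {true, true};
    }
    if (c.closed) {
        if (elem != nullptr)
            *elem = 0;  // zero value when receiving from a closed channel
        return {true, false};
    }
    return {false, false};  // the receive would block
}
```

A caller that previously needed a second entry point only to learn the
second flag can read it from the pair instead.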

This is the gofrontend version of https://golang.org/cl/292890.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339529

Ian Lance Taylor [Mon, 2 Aug 2021 23:27:02 +0000 (16:27 -0700)]
compiler: check slice to pointer-to-array conversion element type

When checking a slice to pointer-to-array conversion, I forgot to
verify that the element types are identical.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339329

Segher Boessenkool [Fri, 4 Jun 2021 19:10:38 +0000 (19:10 +0000)]
rs6000: Replace & by &&

2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

* config/rs6000/vsx.md (*vsx_le_perm_store_<mode>): Use && instead of &.

Segher Boessenkool [Tue, 3 Aug 2021 22:22:37 +0000 (22:22 +0000)]
rs6000: "e" is not a free constraint letter

It is the prefix of the "es" and "eI" constraints.

2021-08-03  Segher Boessenkool  <segher@kernel.crashing.org>

* config/rs6000/constraints.md: Remove "e" from the list of available
constraint characters.

Eugene Rozenfeld [Tue, 3 Aug 2021 01:36:09 +0000 (18:36 -0700)]
Fix indirect call inlining with AutoFDO

The histogram value for indirect calls was incorrectly set up.
That is fixed now.

With this change the tree-prof tests checking indirect call inlining with AutoFDO
in gcc.dg and g++.dg are passing.

Resolves:
PR gcov-profile/71672 - inlining indirect calls does not work with autofdo

gcc/ChangeLog:
PR gcov-profile/71672
* auto-profile.c (afdo_indirect_call): Fix setup of the histogram value for indirect calls.

Eugene Rozenfeld [Tue, 3 Aug 2021 01:29:24 +0000 (18:29 -0700)]
Fixes for AutoFDO testing

* The create_gcov tool doesn't currently support DWARF 5, so I changed profopt.exp
  to pass -gdwarf-4 when compiling the binary to profile.

* I updated the invocation of create_gcov in profopt.exp to pass -gcov_version=2.
  I recently made a change to create_gcov to support version 2:
  https://github.com/google/autofdo/pull/117 .

* I removed the useless -o perf.data from the invocation of gcc-auto-profile in
  target-supports.exp.

These changes contribute to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

* lib/profopt.exp: Pass -gdwarf-4 when compiling test to profile; pass -gcov_version=2.
* lib/target-supports.exp: Remove unnecessary -o perf.data passed to gcc-auto-profile.

Eugene Rozenfeld [Tue, 3 Aug 2021 00:22:34 +0000 (17:22 -0700)]
Fix indir-call-prof-2.c with AutoFDO

indir-call-prof-2.c has -fno-early-inlining but AutoFDO can't work without
early inlining (it needs to match the inlining of the profiled binary).
I changed profopt.exp to always pass -fearly-inlining for AutoFDO.
With that change the indirect call inlining in indir-call-prof-2.c happens in the early inliner
so I changed the dg-final-use-autofdo.

Contributes to fixing PR gcov-profile/71672

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
* lib/profopt.exp: Pass -fearly-inlining when compiling with AutoFDO.

Eugene Rozenfeld [Tue, 3 Aug 2021 00:12:04 +0000 (17:12 -0700)]
Fixes for AutoFDO tests

* Changed several tests to use -fdump-ipa-afdo-optimized instead of -fdump-ipa-afdo
in dg-options so that the expected output can be found

* Increased the number of iterations in several tests so that perf can have
enough sampling events

Contributes to fixing PR gcov-profile/71672.

gcc/testsuite/ChangeLog:

* g++.dg/tree-prof/indir-call-prof.C: Fix options, increase the number of iterations.
* g++.dg/tree-prof/morefunc.C: Fix options, increase the number of iterations.
* g++.dg/tree-prof/reorder.C: Fix options, increase the number of iterations.
* gcc.dg/tree-prof/indir-call-prof-2.c: Fix options, increase the number of iterations.
* gcc.dg/tree-prof/indir-call-prof.c: Fix options.

Martin Sebor [Tue, 3 Aug 2021 19:53:02 +0000 (13:53 -0600)]
Disable a test case in ILP32 [PR101688].

Resolves:
PR testsuite/101688 - g++.dg/warn/Wstringop-overflow-4.C fails on 32-bit archs with new jump threader

gcc/testsuite:
PR testsuite/101688
* g++.dg/warn/Wstringop-overflow-4.C: Disable a test case in ILP32.

Paul A. Clarke [Tue, 23 Feb 2021 01:20:48 +0000 (19:20 -0600)]
rs6000: Add test for _mm_minpos_epu16

Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Create a list of input data testing various scenarios
  including more than one minimum value and different
  orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indices were 0 with the original
  generated data.
- Support big-endian.

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
* gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
gcc/testsuite/gcc.target/i386, adjust dg directives to suit,
make more robust.

Paul A. Clarke [Tue, 23 Feb 2021 01:13:28 +0000 (19:13 -0600)]
rs6000: Add support for _mm_minpos_epu16

Add a naive implementation of the subject x86 intrinsic to
ease porting.
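
For reference, the semantics of the x86 operation can be written as a scalar
sketch: find the smallest of eight unsigned 16-bit lanes, preferring the
lowest index on ties, and pack the value into bits 15:0 and the index into
bits 18:16. The function name is hypothetical, it returns only the low 32
bits rather than a full vector, and it is not the code added to smmintrin.h.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference for "minimum value and its position" over eight
// unsigned 16-bit lanes. Returns value | (index << 16).
std::uint32_t minpos_epu16_ref(const std::array<std::uint16_t, 8>& v) {
    std::uint16_t min = v[0];
    unsigned idx = 0;
    for (unsigned i = 1; i < v.size(); ++i) {
        if (v[i] < min) {  // strict '<' keeps the first minimum on ties
            min = v[i];
            idx = i;
        }
    }
    // Eight lanes need a 3-bit index, hence the 0x7 mask (not 0x3).
    return static_cast<std::uint32_t>(min) | ((idx & 0x7u) << 16);
}
```

Input whose minimum sits at an index above 3 is what distinguishes a 3-bit
index mask from a 2-bit one.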

2021-08-03  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_minpos_epu16): New.

Jonathan Wakely [Tue, 3 Aug 2021 14:03:44 +0000 (15:03 +0100)]
libstdc++: Suppress redundant definitions of inline variables

In C++17 the out-of-class definitions for static constexpr variables are
redundant, because they are implicitly inline. This change avoids
"redundant redeclaration" warnings from -Wsystem-headers -Wdeprecated.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/random.tcc (linear_congruential_engine): Do not
define static constexpr members when they are implicitly inline.
* include/std/ratio (ratio, __ratio_multiply, __ratio_divide)
(__ratio_add, __ratio_subtract): Likewise.
* include/std/type_traits (integral_constant): Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
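
A small illustration of the language rule behind this change, assuming the
code is compiled as C++17 or later (the type and function names are invented
for the example and do not come from libstdc++):

```cpp
#include <cassert>

struct Limits {
    // In C++17 a static constexpr data member is implicitly inline,
    // so this in-class declaration is also its definition.
    static constexpr int max_items = 16;
};

// Before C++17, odr-using Limits::max_items required the out-of-class
// definition below. In C++17 it is a redundant redeclaration:
// constexpr int Limits::max_items;

// Taking the address odr-uses the member; with C++17 this links without
// the out-of-class definition.
const int* max_items_address() { return &Limits::max_items; }
```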

Jonathan Wakely [Tue, 3 Aug 2021 14:02:50 +0000 (15:02 +0100)]
libstdc++: Replace TR1 components with C++11 ones in test utils

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_common_types.h: Replace uses of
tr1::unordered_map and tr1::unordered_set with their C++11
equivalents.
* testsuite/29_atomics/atomic/cons/assign_neg.cc: Adjust
dg-error line number.
* testsuite/29_atomics/atomic/cons/copy_neg.cc: Likewise.
* testsuite/29_atomics/atomic_integral/cons/assign_neg.cc:
Likewise.
* testsuite/29_atomics/atomic_integral/cons/copy_neg.cc:
Likewise.
* testsuite/29_atomics/atomic_integral/operators/bitwise_neg.cc:
Likewise.
* testsuite/29_atomics/atomic_integral/operators/decrement_neg.cc:
Likewise.
* testsuite/29_atomics/atomic_integral/operators/increment_neg.cc:
Likewise.

Jonathan Wakely [Tue, 3 Aug 2021 13:00:47 +0000 (14:00 +0100)]
libstdc++: Specialize allocator_traits<pmr::polymorphic_allocator<T>>

This adds a partial specialization of allocator_traits, similar to what
was already done for std::allocator. This means that most uses of
polymorphic_allocator via the traits can avoid the metaprogramming
overhead needed to deduce the properties from polymorphic_allocator.

In addition, I'm changing polymorphic_allocator::delete_object to invoke
the destructor (or pseudo-destructor) directly, rather than calling
allocator_traits::destroy, which calls polymorphic_allocator::destroy
(which is deprecated). This is observable if a user has specialized
allocator_traits<polymorphic_allocator<Foo>> and expects to see its
destroy member function called. I consider explicit specializations of
allocator_traits to be wrong-headed, and this use case seems unnecessary
to support. So delete_object just invokes the destructor directly.
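
The behaviour described for delete_object can be sketched with free
functions written against the C++17 interface of
std::pmr::polymorphic_allocator. The helper names and the `Tracked` type are
hypothetical, and this is an approximation of the idea, not the libstdc++
implementation.

```cpp
#include <cassert>
#include <memory_resource>
#include <new>
#include <utility>

// Test type that counts live objects (illustrative only).
struct Tracked {
    static inline int live = 0;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};

// Allocate storage from the allocator and construct a T in it.
template <typename T, typename... Args>
T* new_object_sketch(std::pmr::polymorphic_allocator<T> alloc, Args&&... args) {
    T* p = alloc.allocate(1);
    try {
        ::new (static_cast<void*>(p)) T(std::forward<Args>(args)...);
    } catch (...) {
        alloc.deallocate(p, 1);
        throw;
    }
    return p;
}

// Destroy by invoking the destructor directly (no allocator_traits::destroy
// round trip), then return the storage to the allocator.
template <typename T>
void delete_object_sketch(std::pmr::polymorphic_allocator<T> alloc, T* p) {
    p->~T();
    alloc.deallocate(p, 1);
}
```

Because the destructor is called directly, a user specialization of
allocator_traits would not observe a call to its destroy member here.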

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/std/memory_resource (polymorphic_allocator::delete_object):
Call destructor directly instead of using destroy.
(allocator_traits<polymorphic_allocator<T>>): Define partial
specialization.

Jonathan Wakely [Mon, 2 Aug 2021 23:05:01 +0000 (00:05 +0100)]
libstdc++: Remove trailing whitespace in some tests

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/20_util/function_objects/binders/3113.cc: Remove
trailing whitespace.
* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
Likewise.
* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
* testsuite/25_algorithms/headers/algorithm/synopsis.cc:
Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
Likewise.

Jonathan Wakely [Mon, 2 Aug 2021 17:35:42 +0000 (18:35 +0100)]
libstdc++: Deprecate std::random_shuffle for C++14

The std::random_shuffle algorithm was deprecated in C++14 and removed in
C++17. This adds the deprecated attribute for C++14 and later, so
that users are warned they should not be using it in those dialects.
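
For code that triggers the new warning, the usual replacement is
std::shuffle with an explicit random number engine. A minimal sketch (the
helper name and the choice of std::mt19937 are assumptions made for the
example):

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Replacement for std::random_shuffle(v.begin(), v.end()):
// std::shuffle takes an explicit uniform random bit generator.
std::vector<int> shuffled_copy(std::vector<int> v, unsigned seed) {
    std::mt19937 gen(seed);  // seeded by the caller so runs are reproducible
    std::shuffle(v.begin(), v.end(), gen);
    return v;
}
```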

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/*: Regenerate.
* include/bits/c++config (_GLIBCXX14_DEPRECATED): Define.
(_GLIBCXX14_DEPRECATED_SUGGEST): Define.
* include/bits/stl_algo.h (random_shuffle): Deprecate for C++14
and later.
* testsuite/25_algorithms/headers/algorithm/synopsis.cc: Adjust
for C++11 and C++14 changes to std::random_shuffle and
std::shuffle.
* testsuite/25_algorithms/random_shuffle/1.cc: Add options to
use deprecated algorithms.
* testsuite/25_algorithms/random_shuffle/59603.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/moveable.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/2.cc:
Likewise.
* testsuite/25_algorithms/random_shuffle/requirements/explicit_instantiation/pod.cc:
Likewise.

Jonathan Wakely [Mon, 2 Aug 2021 22:55:18 +0000 (23:55 +0100)]
libstdc++: Add testsuite proc for testing deprecated features

This change adds options to tests that explicitly use deprecated
features, so that -D_GLIBCXX_USE_DEPRECATED=0 can be used to run the
rest of the testsuite. The tests that explicitly/intentionally use
deprecated features will still be able to use them, but they can be
disabled for the majority of tests.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/23_containers/forward_list/operations/3.cc:
Use lambda instead of std::bind2nd.
* testsuite/20_util/function_objects/binders/3113.cc: Add
options for testing deprecated features.
* testsuite/20_util/pair/cons/99957.cc: Likewise.
* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
Likewise.
* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Likewise.
* testsuite/20_util/shared_ptr/cons/auto_ptr.cc: Likewise.
* testsuite/20_util/shared_ptr/cons/auto_ptr_neg.cc: Likewise.
* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
* testsuite/20_util/unique_ptr/cons/auto_ptr.cc: Likewise.
* testsuite/20_util/unique_ptr/cons/auto_ptr_neg.cc: Likewise.
* testsuite/ext/pb_ds/example/priority_queue_erase_if.cc:
Likewise.
* testsuite/ext/pb_ds/example/priority_queue_split_join.cc:
Likewise.
* testsuite/lib/dg-options.exp (dg_add_options_using-deprecated):
New proc.

Jonathan Wakely [Mon, 2 Aug 2021 17:34:19 +0000 (18:34 +0100)]
libstdc++: Reduce header dependencies in <regex>

This reduces the size of <regex> a little. This is one of the largest
and slowest headers in the library.

By using <bits/stl_algobase.h> and <bits/stl_algo.h> instead of
<algorithm> we don't need to parse all the parallel algorithms and
std::ranges:: algorithms that are not needed by <regex>. Similarly, by
using <bits/stl_tree.h> and <bits/stl_map.h> instead of <map> we don't
need to parse the definition of std::multimap.

The _State_info type is not movable or copyable, so doesn't need to use
std::unique_ptr<bool[]> to manage a bitset, we can just delete it in the
destructor. It would use a lot less space if we used a bitset instead,
but that would be an ABI break. We could do it for the versioned
namespace, but this patch doesn't do so. For future reference, using
vector<bool> would work, but would increase sizeof(_State_info) by two
pointers, because it's three times as large as unique_ptr<bool[]>. We
can't use std::bitset because the length isn't constant. We want a
bitset with a non-constant but fixed length.
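
The ownership pattern described above, a flag array whose length is chosen
at run time and then never changes, owned by a type that cannot be copied or
moved, can be sketched as follows. The class name and members are
hypothetical and do not mirror _State_info.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-length, runtime-sized set of flags owned through a raw pointer.
class VisitedStates {
public:
    explicit VisitedStates(std::size_t n)
        : flags_(new bool[n]()), size_(n) {}  // () value-initializes to false
    ~VisitedStates() { delete[] flags_; }

    // Not copyable (and therefore not movable), so the destructor is the
    // only place the array is released; no smart pointer is required.
    VisitedStates(const VisitedStates&) = delete;
    VisitedStates& operator=(const VisitedStates&) = delete;

    bool visited(std::size_t i) const { assert(i < size_); return flags_[i]; }
    void mark(std::size_t i) { assert(i < size_); flags_[i] = true; }
    std::size_t size() const { return size_; }

private:
    bool* flags_;
    std::size_t size_;
};
```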

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/regex_executor.h (_State_info): Replace
unique_ptr<bool[]> with array of bool.
* include/bits/regex_executor.tcc: Likewise.
* include/bits/regex_scanner.tcc: Replace std::strchr with
__builtin_strchr.
* include/std/regex: Replace standard headers with smaller
internal ones.
* testsuite/28_regex/traits/char/lookup_classname.cc: Include
<string.h> for strlen.
* testsuite/28_regex/traits/char/lookup_collatename.cc:
Likewise.

H.J. Lu [Fri, 16 Jul 2021 17:29:46 +0000 (10:29 -0700)]
x86: Use XMM31 for scratch SSE register

In 64-bit mode, use XMM31 for scratch SSE register to avoid vzeroupper
if possible.

gcc/

* config/i386/i386.c (ix86_gen_scratch_sse_rtx): In 64-bit mode,
try XMM31 to avoid vzeroupper.

gcc/testsuite/

* gcc.target/i386/avx-vzeroupper-14.c: Pass -mno-avx512f to
disable XMM31.
* gcc.target/i386/avx-vzeroupper-15.c: Likewise.
* gcc.target/i386/pr82941-1.c: Updated.  Check for vzeroupper.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Updated.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.

Jonathan Wakely [Mon, 2 Aug 2021 16:12:52 +0000 (17:12 +0100)]
libstdc++: Avoid using std::unique_ptr in <locale>

std::wstring_convert and std::wbuffer_convert types are not copyable or
movable, and store a plain pointer without a deleter. That means a much
simpler type that just uses delete in its destructor can be used instead
of std::unique_ptr.

That avoids including and parsing all of <bits/unique_ptr.h> in every
header that includes <locale>. It also avoids instantiating
unique_ptr<C> and std::tuple<C*, default_delete<C>> when the conversion
utilities are used.
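
A much simpler owning type of the kind described might look like the sketch
below. `ScopedPtr` and `Counted` are hypothetical names for illustration;
the interface of the real __detail::_Scoped_ptr may differ.

```cpp
#include <cassert>

// Minimal owning pointer: deletes in the destructor, cannot be copied.
template <typename T>
class ScopedPtr {
public:
    explicit ScopedPtr(T* p = nullptr) : ptr_(p) {}
    ~ScopedPtr() { delete ptr_; }

    ScopedPtr(const ScopedPtr&) = delete;
    ScopedPtr& operator=(const ScopedPtr&) = delete;

    T* get() const { return ptr_; }
    T* operator->() const { return ptr_; }
    T& operator*() const { return *ptr_; }

private:
    T* ptr_;
};

// Helper type that counts live instances, used to observe the deletion.
struct Counted {
    static inline int alive = 0;
    Counted() { ++alive; }
    ~Counted() { --alive; }
};
```

Unlike std::unique_ptr, this has no deleter parameter, no release or reset,
and no move support, which is why it needs so little code.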

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/locale_conv.h (__detail::_Scoped_ptr): Define new
RAII class template.
(wstring_convert, wbuffer_convert): Use __detail::_Scoped_ptr
instead of unique_ptr.

Richard Sandiford [Tue, 3 Aug 2021 12:00:49 +0000 (13:00 +0100)]
aarch64: Add -mtune=neoverse-512tvb

This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE).  This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narrow) and tuning for AArch64 cores
in general (which can be too wide).

-mcpu=neoverse-512tvb is equivalent to -mcpu=neoverse-v1
-mtune=neoverse-512tvb.

gcc/
* doc/invoke.texi: Document -mtune=neoverse-512tvb and
-mcpu=neoverse-512tvb.
* config/aarch64/aarch64-cores.def (neoverse-512tvb): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (neoverse512tvb_sve_vector_cost)
(neoverse512tvb_sve_issue_info, neoverse512tvb_vec_issue_info)
(neoverse512tvb_vector_cost, neoverse512tvb_tunings): New structures.
(aarch64_adjust_body_cost_sve): Handle -mtune=neoverse-512tvb.
(aarch64_adjust_body_cost): Likewise.

Richard Sandiford [Tue, 3 Aug 2021 12:00:48 +0000 (13:00 +0100)]
aarch64: Restrict issue heuristics to inner vector loop

The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.

gcc/
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Only
record issue information for operations that occur in the
innermost loop.

Richard Sandiford [Tue, 3 Aug 2021 12:00:47 +0000 (13:00 +0100)]
aarch64: Tweak MLA vector costs

The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction.  This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.

The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway.  I therefore
don't have a self-contained testcase.  However, the distinction
becomes more important with a later patch.

gcc/
* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
parameter.  Detect cases in which an Advanced SIMD MLA would almost
certainly require a MOV.
(aarch64_count_ops): Update accordingly.

Richard Sandiford [Tue, 3 Aug 2021 12:00:46 +0000 (13:00 +0100)]
aarch64: Tweak the cost of elementwise stores

When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element.  However, extracting element 0 is free on AArch64,
so it should have zero cost.

I don't have a testcase that requires this for existing -mtune
options, but it becomes more important with a later patch.

gcc/
* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
function, split out from...
(aarch64_detect_vector_stmt_subtype): ...here.
(aarch64_add_stmt_cost): Treat extracting element 0 as free.

Richard Sandiford [Tue, 3 Aug 2021 12:00:45 +0000 (13:00 +0100)]
aarch64: Add gather_load_xNN_cost tuning fields

This patch adds tuning fields for the total cost of a gather load
instruction.  Until now, we've costed them as one scalar load
per element instead.  Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.

gcc/
* config/aarch64/aarch64-protos.h (sve_vec_cost):
Add gather_load_x32_cost and gather_load_x64_cost.
* config/aarch64/aarch64.c (generic_sve_vector_cost)
(a64fx_sve_vector_cost, neoversev1_sve_vector_cost): Update
accordingly, using the values given by the scalar_load * number
of elements calculation that we used previously.
(aarch64_detect_vector_stmt_subtype): Use the new fields.

Richard Sandiford [Tue, 3 Aug 2021 12:00:45 +0000 (13:00 +0100)]
aarch64: Split out aarch64_adjust_body_cost_sve

This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once.  I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.

gcc/
* config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
function, split out from...
(aarch64_adjust_body_cost): ...here.

Richard Sandiford [Tue, 3 Aug 2021 12:00:44 +0000 (13:00 +0100)]
aarch64: Add a simple fixed-point class for costing

This patch adds a simple fixed-point class for holding fractional
cost values.  It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per_cycle fields.

For now the assumption is that the number of possible reciprocals
is fixed at compile time and so the class should always be able
to hold an exact value.

The class uses a uint64_t to hold the fixed-point value, which means
that it can hold any scaled uint32_t cost.  Normally we don't worry
about overflow when manipulating raw uint32_t costs, but just to be
on the safe side, the class uses saturating arithmetic for all
operations.
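
To make the idea concrete, here is a minimal sketch of a fixed-point cost
type with saturating arithmetic. The class name, the scale factor
(720720 = lcm(1..16), chosen here so that 1/N is exact for N in [1, 16]) and
the set of operations are assumptions for illustration; they are not taken
from fractional-cost.h.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

class FractionalCost {
public:
    // Assumed scale: 720720 = lcm(1..16), so a/b is exact whenever b is in
    // [1, 16] (more generally, whenever b divides a * kScale).
    static constexpr std::uint64_t kScale = 720720;

    // Represents a / b. Precondition: b != 0.
    FractionalCost(std::uint32_t a = 0, std::uint32_t b = 1)
        : value_(static_cast<std::uint64_t>(a) * kScale / b) {}

    // Saturating addition: clamps at the largest value instead of wrapping.
    FractionalCost operator+(FractionalCost o) const {
        const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
        FractionalCost r;
        r.value_ = (value_ > max - o.value_) ? max : value_ + o.value_;
        return r;
    }

    // Saturating subtraction: clamps at zero.
    FractionalCost operator-(FractionalCost o) const {
        FractionalCost r;
        r.value_ = (value_ < o.value_) ? 0 : value_ - o.value_;
        return r;
    }

    bool operator==(FractionalCost o) const { return value_ == o.value_; }
    bool operator<(FractionalCost o) const { return value_ < o.value_; }

    // Round up to a whole number of cost units.
    std::uint64_t ceil() const {
        return value_ / kScale + (value_ % kScale != 0 ? 1 : 0);
    }

private:
    std::uint64_t value_;  // stored as (real value) * kScale
};
```

Since a uint32_t multiplied by this scale always fits in a uint64_t,
construction cannot overflow; only repeated additions can reach the
saturation point.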

As far as the changes to the cost routines themselves go:

- The changes to aarch64_add_stmt_cost and its subroutines are
  just laying groundwork for future patches; no functional change
  intended.

- The changes to aarch64_adjust_body_cost mean that we now
  take fractional differences into account.

gcc/
* config/aarch64/fractional-cost.h: New file.
* config/aarch64/aarch64.c: Include <algorithm> (indirectly)
and cost_fraction.h.
(vec_cost_fraction): New typedef.
(aarch64_detect_scalar_stmt_subtype): Use it for statement costs.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_sve_adjust_stmt_cost, aarch64_adjust_stmt_cost): Likewise.
(aarch64_estimate_min_cycles_per_iter): Use vec_cost_fraction
for cycle counts.
(aarch64_adjust_body_cost): Likewise.
(aarch64_test_cost_fraction): New function.
(aarch64_run_selftests): Call it.

Richard Sandiford [Tue, 3 Aug 2021 12:00:43 +0000 (13:00 +0100)]
aarch64: Turn sve_width tuning field into a bitmask

The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable).  This patch turns the field into a bitmask so that
it can specify multiple widths at the same time.  For now we
always treat the minimum width as the likely width.

An alternative would have been to add extra fields, which would
have coped correctly with non-power-of-2 widths.  However,
we're very far from supporting constant non-power-of-2 vectors
in GCC, so I think the non-power-of-2 case will in reality always
have to be hidden behind VLA.
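
Assuming each supported width is a power of two and is stored directly as
its own bit (so 128 | 256 means "128 or 256 bits"), extracting the minimum
and maximum widths from such a mask could be sketched like this. The type
and function names are hypothetical, not the ones used in aarch64.c.

```cpp
#include <cassert>
#include <cstdint>

// Each set bit is a supported vector width in bits, e.g. 128 | 256.
using WidthMask = std::uint32_t;

// Least significant set bit: the smallest width (0 for an empty mask).
std::uint32_t min_width(WidthMask mask) {
    return mask & (~mask + 1u);  // isolates the lowest set bit
}

// Most significant set bit: the largest width (0 for an empty mask).
std::uint32_t max_width(WidthMask mask) {
    std::uint32_t top = 0;
    while (mask != 0) {
        top = mask & (~mask + 1u);  // current lowest set bit
        mask &= mask - 1u;          // clear it and continue upwards
    }
    return top;
}
```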

gcc/
* config/aarch64/aarch64-protos.h (tune_params::sve_width): Turn
into a bitmask.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Update
accordingly.
(aarch64_estimated_poly_value): Likewise.  Use the least significant
set bit for the minimum and likely values.  Use the most significant
set bit for the maximum value.

liuhongt [Tue, 3 Aug 2021 05:22:11 +0000 (13:22 +0800)]
Add cond_add/sub/mul for vector integer modes.

gcc/ChangeLog:

* config/i386/sse.md (cond_<insn><mode>): New expander.
(cond_mul<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cond_op_addsubmul_d-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_d-2.c: New test.
* gcc.target/i386/cond_op_addsubmul_q-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_q-2.c: New test.
* gcc.target/i386/cond_op_addsubmul_w-1.c: New test.
* gcc.target/i386/cond_op_addsubmul_w-2.c: New test.

Mosè Giordano [Fri, 18 Jun 2021 23:46:44 +0000 (23:46 +0000)]
Fix bashism in `libsanitizer/configure.tgt'

Appending to a string variable with `+=' is a bashism and does not work in
strict POSIX shells like dash.  As a result, the extra compilation flags are
not set correctly.  This patch replaces the `+=' syntax with a simple string
interpolation to append to the `EXTRA_CXXFLAGS' variable.

libsanitizer/ChangeLog

PR sanitizer/101111
* configure.tgt: Fix bashism in setting of `EXTRA_CXXFLAGS'.

Jakub Jelinek [Tue, 3 Aug 2021 10:44:17 +0000 (12:44 +0200)]
analyzer: Fix ICE on MD builtin [PR101721]

The following testcase ICEs because DECL_FUNCTION_CODE asserts the builtin
is BUILT_IN_NORMAL, but it sees a backend (MD) builtin instead.
The FE, normal and MD builtin numbers overlap, so one should always
check what kind of builtin it is before looking at specific codes.

On the other hand, region-model.cc has:
      if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL)
          && gimple_builtin_call_types_compatible_p (call, callee_fndecl))
        switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
which IMO should use DECL_FUNCTION_CODE instead, since it has already
checked that it is a normal builtin...
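
The rule "check the builtin class before interpreting the code" can be
modelled with a toy example. Everything below (the enum, the struct, the
constant and the function) is a hypothetical stand-in and not GCC's tree
API:

```cpp
#include <cassert>

// Hypothetical model of builtin kinds whose numeric codes overlap.
enum class BuiltinClass { NotBuiltin, Frontend, Normal, MachineDependent };

struct FnDecl {
    BuiltinClass klass;
    unsigned code;  // only meaningful together with klass
};

// Made-up code number for an allocator in the "normal" numbering space.
constexpr unsigned kNormalMalloc = 1;

bool known_allocator(const FnDecl& fn) {
    // Check the class first: a machine-dependent builtin may carry the
    // same number with a completely different meaning.
    if (fn.klass != BuiltinClass::Normal)
        return false;
    return fn.code == kNormalMalloc;
}
```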

2021-08-03  Jakub Jelinek  <jakub@redhat.com>

PR analyzer/101721
* sm-malloc.cc (known_allocator_p): Only check DECL_FUNCTION_CODE on
BUILT_IN_NORMAL builtins.

* gcc.dg/analyzer/pr101721.c: New test.

Martin Liska [Tue, 3 Aug 2021 07:57:21 +0000 (09:57 +0200)]
ChangeLog: add problematic commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc.

gcc/ChangeLog:

* ChangeLog: Add manually.

libgomp/ChangeLog:

* ChangeLog: Add manually.

gcc/testsuite/ChangeLog:

* ChangeLog: Add manually.

GCC Administrator [Tue, 3 Aug 2021 07:49:16 +0000 (07:49 +0000)]
Daily bump.

Martin Liska [Tue, 3 Aug 2021 07:22:30 +0000 (09:22 +0200)]
gcc-changelog: ignore one more commit

contrib/ChangeLog:

* gcc-changelog/git_update_version.py: Ignore problematic
  commit.

H.J. Lu [Tue, 3 Aug 2021 03:34:13 +0000 (20:34 -0700)]
x86: Add testcases for PR target/80566

PR target/80566
* g++.target/i386/pr80566-1.C: New test.
* g++.target/i386/pr80566-2.C: Likewise.

3 years agotree-cfg: Fix typos on dloop in move_sese_region_to_fn
Kewen Lin [Tue, 3 Aug 2021 03:12:00 +0000 (22:12 -0500)]
tree-cfg: Fix typos on dloop in move_sese_region_to_fn

As mentioned in [1], there is one issue that predates the
refactoring of FOR_EACH_LOOP_FN.  The macro always sets the
given LOOP to NULL at the end of the iteration unless there
is an early break inside.  Here there is no early break, so
dloop is NULL after the loop finishes.  It is still NULL
after the refactoring.
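
The iterator behaviour described here can be reproduced with a minimal C sketch.  FOR_EACH_NODE and struct loop below are hypothetical stand-ins, not the real FOR_EACH_LOOP_FN or GCC's loop structure; they only share the property that the iterator variable is NULL once the walk runs to completion.

```c
#include <assert.h>
#include <stddef.h>

struct loop { int num; struct loop *next; };

/* Hypothetical iteration macro: leaves IT == NULL unless the body breaks.  */
#define FOR_EACH_NODE(head, it) \
  for ((it) = (head); (it) != NULL; (it) = (it)->next)

/* No early break: the iterator is always NULL afterwards.  */
struct loop *iterator_after_full_walk(struct loop *head)
{
  struct loop *it;
  FOR_EACH_NODE(head, it)
    {
      /* Visit every node without breaking out.  */
    }
  return it;
}

/* Early break: the iterator still points at the matching node.  */
struct loop *find_loop(struct loop *head, int num)
{
  struct loop *it;
  FOR_EACH_NODE(head, it)
    if (it->num == num)
      break;
  return it;
}

/* Any later dereference needs a non-NULL pointer.  */
int loop_num(const struct loop *l)
{
  assert(l != NULL);
  return l->num;
}
```

Calling loop_num on the result of iterator_after_full_walk would fail the assertion, which is the kind of latent NULL dereference the commit message describes.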

I tried to debug the test case gcc.dg/graphite/pr83359.c
with commit 555758de90074 (also reproduced the ICE with
555758de90074~), and noticed the compilation of the test
case only covers the hunk:

  else
    {
      moved_orig_loop_num[dloop->orig_loop_num] = -1;
      dloop->orig_loop_num = 0;
    }

it doesn't reach the if-branch that increments
"moved_orig_loop_num[dloop->orig_loop_num]".  So the
following hunk guarded with

  if (moved_orig_loop_num[orig_loop_num] == 2)

which dereferences dloop, doesn't get executed.  That
explains why the problem was not exposed before.

Looking at the code that uses dloop, I think it's a
copy-and-paste typo: the modified assertion code has the
same wording as the condition check above.  In that context,
the expected original number has already been assigned to
the variable orig_loop_num, extracted from arg0 of the
IFN_LOOP_DIST_ALIAS call.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576367.html

gcc/ChangeLog:

* tree-cfg.c (move_sese_region_to_fn): Fix typos on dloop.

3 years agoSupport cond_add/sub/mul/div for vector float/double.
liuhongt [Tue, 27 Jul 2021 10:08:38 +0000 (18:08 +0800)]
Support cond_add/sub/mul/div for vector float/double.

gcc/ChangeLog:

* config/i386/sse.md (cond_<insn><mode>): New expander.
(cond_mul<mode>): Ditto.
(cond_div<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cond_op_addsubmuldiv_double-1.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_float-1.c: New test.
* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: New test.

3 years agocompiler, runtime: allow slice to array pointer conversion
Ian Lance Taylor [Sat, 31 Jul 2021 00:19:42 +0000 (17:19 -0700)]
compiler, runtime: allow slice to array pointer conversion

Panic if the slice is too short.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338630

3 years agocompiler, runtime: support unsafe.Add and unsafe.Slice
Ian Lance Taylor [Sun, 1 Aug 2021 02:28:51 +0000 (19:28 -0700)]
compiler, runtime: support unsafe.Add and unsafe.Slice

For golang/go#19367
For golang/go#40481

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338949

3 years agolibstdc++: Add missing std::move to ranges::copy/move/reverse_copy [PR101599]
Patrick Palka [Mon, 2 Aug 2021 19:30:15 +0000 (15:30 -0400)]
libstdc++: Add missing std::move to ranges::copy/move/reverse_copy [PR101599]

In passing, this also renames the template parameter _O2 to _Out2 in
ranges::partition_copy and uglifies two of its function parameters,
out_true and out_false.

PR libstdc++/101599

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__reverse_copy_fn::operator()):
Add missing std::move in return statement.
(__partition_copy_fn::operator()): Rename template parameter
_O2 to _Out2.  Uglify function parameters out_true and out_false.
* include/bits/ranges_algobase.h (__copy_or_move): Add missing
std::move to recursive call that unwraps a __normal_iterator
output iterator.
* testsuite/25_algorithms/copy/constrained.cc (test06): New test.
* testsuite/25_algorithms/move/constrained.cc (test05): New test.

3 years agolibstdc++: Fix up implementation of LWG 3533 [PR101589]
Patrick Palka [Mon, 2 Aug 2021 19:30:13 +0000 (15:30 -0400)]
libstdc++: Fix up implementation of LWG 3533 [PR101589]

In r12-569 I accidentally applied the LWG 3533 change intended for
elements_view::iterator::base to elements_view::base instead.

This patch corrects this, and also applies the corresponding LWG 3533
change to lazy_split_view::inner-iter::base now that we implement P2210.

PR libstdc++/101589

libstdc++-v3/ChangeLog:

* include/std/ranges (lazy_split_view::_InnerIter::base): Make
the const& overload unconstrained and return a const reference
as per LWG 3533.  Make unconditionally noexcept.
(elements_view::base): Revert accidental r12-569 change.
(elements_view::_Iterator::base): Make the const& overload
unconstrained and return a const reference as per LWG 3533.
Make unconditionally noexcept.

3 years agolibstdc++: Add missing std::move to join_view::iterator ctor [PR101483]
Patrick Palka [Mon, 2 Aug 2021 19:30:10 +0000 (15:30 -0400)]
libstdc++: Add missing std::move to join_view::iterator ctor [PR101483]

PR libstdc++/101483

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_Iterator): Add
missing std::move.

3 years agox86: Also pass -mno-sse to vect8-ret.c
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Also pass -mno-sse to vect8-ret.c

Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.

3 years agox86: Update gcc.target/i386/incoming-11.c
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Update gcc.target/i386/incoming-11.c

Expect no stack realignment since we no longer realign stack when
copying data.

* gcc.target/i386/incoming-11.c: Expect no stack realignment.

3 years agox86: Also pass -mno-avx to sw-1.c for ia32
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Also pass -mno-avx to sw-1.c for ia32

Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.

3 years agox86: Also pass -mno-avx to cold-attribute-1.c
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Also pass -mno-avx to cold-attribute-1.c

Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.

3 years agox86: Also pass -mno-avx to pr72839.c
H.J. Lu [Mon, 2 Aug 2021 17:01:47 +0000 (10:01 -0700)]
x86: Also pass -mno-avx to pr72839.c

Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.

3 years agox86: Add tests for piecewise move and store
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Add tests for piecewise move and store

* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.

3 years agox86: Add AVX2 tests for PR middle-end/90773
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Add AVX2 tests for PR middle-end/90773

PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.

3 years agox86: Update piecewise move and store
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Update piecewise move and store

We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to the maximum number of bytes we can move from memory
to memory in one reasonably fast instruction.  The difference between
MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant,
independent of compiler options, since it is used in reload.h to define
struct target_reload and MOVE_MAX can vary, depending on compiler options.
3. When vector register is used for piecewise move and store, we don't
increase stack_alignment_needed since vector register spill isn't
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated set by pseudo vector
register usage, we also need to check stack_realign_needed to eliminate
frame pointer.
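
The distinction in point 2 between a compile-time constant and an option-dependent value can be sketched in C.  Everything below is a hypothetical simplification: struct target_opts, move_max_sketch and MAX_MOVE_MAX_SKETCH are illustration-only names, and the 16/32/64 tiers are an assumption about vector widths rather than GCC's actual MOVE_MAX definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Must be a constant because it sizes a data structure.  */
#define MAX_MOVE_MAX_SKETCH 64

/* Hypothetical option set standing in for target flags.  */
struct target_opts { bool sse2; bool avx; bool avx512; };

/* Option-dependent upper bound for a single move, in bytes.  */
int move_max_sketch(const struct target_opts *o, int word_bytes)
{
  int n = o->avx512 ? 64 : o->avx ? 32 : o->sse2 ? 16 : word_bytes;
  assert(n <= MAX_MOVE_MAX_SKETCH);  /* the constant bounds every variant */
  return n;
}

/* A buffer that has to be sized independently of the options.  */
struct reload_scratch { unsigned char buf[MAX_MOVE_MAX_SKETCH]; };
```

The array size needs a constant expression, which is why the sketch cannot use the option-dependent function there.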

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
(MOVE_MAX): Set to bytes of the largest integer supported by
vector register.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-24.c: Likewise.
* gcc.target/i386/pr90773-25.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
XMM movd to store 4 bytes.
* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
YMM registers.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
* gcc.target/i386/pr100865-10b.c: Likewise.

3 years agox86: Avoid stack realignment when copying data
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Avoid stack realignment when copying data

To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Call
ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
data from one memory location to another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.

3 years agox86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
H.J. Lu [Mon, 2 Aug 2021 17:01:46 +0000 (10:01 -0700)]
x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX

Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

PR middle-end/90773
* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.

3 years agolibstdc++: Fix filesystem::temp_directory_path [PR101709]
Jonathan Wakely [Mon, 2 Aug 2021 14:52:41 +0000 (15:52 +0100)]
libstdc++: Fix filesystem::temp_directory_path [PR101709]

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101709
* src/filesystem/ops-common.h (get_temp_directory_from_env):
Add error_code parameter.
* src/c++17/fs_ops.cc (fs::temp_directory_path): Pass error_code
argument to get_temp_directory_from_env and check it.
* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.

3 years agolibstdc++: Add dg-error for additional error in C++11 mode
Jonathan Wakely [Mon, 2 Aug 2021 13:41:17 +0000 (14:41 +0100)]
libstdc++: Add dg-error for additional error in C++11 mode

When the comparison with a nullptr_t is ill-formed, there is an
additional error for C++11 mode due to the constexpr function body being
invalid.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Add dg-error for c++11_only target.

3 years agoRemove --param=threader-iterative.
Aldy Hernandez [Mon, 2 Aug 2021 13:12:30 +0000 (15:12 +0200)]
Remove --param=threader-iterative.

This was meant to be an internal construct, but I see folks are using
it and submitting PRs against it.  Let's just remove this to avoid
further confusion.

Tested on x86-64 Linux.

gcc/ChangeLog:

PR tree-optimization/101724
* params.opt: Remove --param=threader-iterative.
* tree-ssa-threadbackward.c (pass_thread_jumps::execute): Remove
iterative mode.

3 years ago[gcc/doc] Improve nonnull attribute documentation
Tom de Vries [Wed, 28 Jul 2021 13:44:54 +0000 (15:44 +0200)]
[gcc/doc] Improve nonnull attribute documentation

Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).

Mention -Wnonnull-compare.

gcc/ChangeLog:

2021-07-28  Tom de Vries  <tdevries@suse.de>

PR middle-end/101665
* doc/extend.texi (nonnull attribute): Improve documentation.

3 years agoFix PR 101683: FP exceptions for float->unsigned
Andrew Pinski [Fri, 30 Jul 2021 02:48:46 +0000 (19:48 -0700)]
Fix PR 101683: FP exceptions for float->unsigned

Just like the old bug PR9651, unsigned_fix rtl should
also be handled as a trapping instruction.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR rtl-optimization/101683
* rtlanal.c (may_trap_p_1): Handle UNSIGNED_FIX.

3 years agoc++: Improve memory usage of subsumption [PR100828]
Patrick Palka [Mon, 2 Aug 2021 13:59:56 +0000 (09:59 -0400)]
c++: Improve memory usage of subsumption [PR100828]

Constraint subsumption is implemented in two steps.  The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint.   Performing these two
steps separately is problematic because in the first step the DNF/CNF
can be exponentially larger than the original constraint, and by
computing it ahead of time we'd have to keep all of it in memory.

This patch fixes this exponential blowup in memory usage by interleaving
the two steps, so that as soon as we decompose one clause we check
implication for it.  In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs.  (Note the
_time_ complexity of subsumption is still exponential in the worst case.)

In order for this to work we need to make formula::branch() insert the
copy of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it.  Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.
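
The interleaving idea can be sketched in C under strong simplifications.  This is not the logic.cc implementation: atoms are modelled as bit indices, formulas are negation-free AND/OR trees, and all names (struct node, each_clause, subsumes) are hypothetical.  Each DNF clause of the left-hand side is produced and checked immediately, so only one clause plus a linear-size work list is alive at any time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kind { ATOM, AND, OR };

struct node {
  enum kind kind;
  unsigned atom;                    /* bit index, used for ATOM only */
  const struct node *lhs, *rhs;     /* used for AND and OR */
};

/* Work list of subformulas still to be conjoined into the clause.  */
struct todo { const struct node *n; const struct todo *next; };

typedef bool (*clause_fn)(unsigned clause, void *data);

/* Calls FN on each DNF clause in turn; stops at the first failure.  */
bool each_clause(const struct todo *todo, unsigned acc, clause_fn fn, void *data)
{
  if (todo == NULL)
    return fn(acc, data);           /* clause complete: check it now */
  const struct node *n = todo->n;
  const struct todo *rest = todo->next;
  if (n->kind == ATOM) {
    assert(n->atom < 32);
    return each_clause(rest, acc | (1u << n->atom), fn, data);
  }
  if (n->kind == AND) {             /* both operands join this clause */
    struct todo r = { n->rhs, rest };
    struct todo l = { n->lhs, &r };
    return each_clause(&l, acc, fn, data);
  }
  /* OR: branch, finishing the first alternative before the second.  */
  struct todo l = { n->lhs, rest };
  if (!each_clause(&l, acc, fn, data))
    return false;
  struct todo r = { n->rhs, rest };
  return each_clause(&r, acc, fn, data);
}

/* Truth value of a negation-free formula when exactly TRUE_ATOMS hold.  */
bool eval(const struct node *n, unsigned true_atoms)
{
  if (n->kind == ATOM)
    return (true_atoms >> n->atom) & 1u;
  if (n->kind == AND)
    return eval(n->lhs, true_atoms) && eval(n->rhs, true_atoms);
  return eval(n->lhs, true_atoms) || eval(n->rhs, true_atoms);
}

static bool clause_implies(unsigned clause, void *data)
{
  return eval((const struct node *)data, clause);
}

/* LHS subsumes RHS if every DNF clause of LHS implies RHS.  */
bool subsumes(const struct node *lhs, const struct node *rhs)
{
  struct todo start = { lhs, NULL };
  return each_clause(&start, 0u, clause_implies, (void *)rhs);
}
```

As in the description above, the running time can still be exponential, since every clause is visited in the worst case; only the memory footprint stays linear.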

PR c++/100828

gcc/cp/ChangeLog:

* logic.cc (formula::formula): Use emplace_back instead of
push_back.
(formula::branch): Insert a copy of m_current directly after
m_current instead of at the end of the list.
(formula::erase): Define.
(decompose_formula): Remove.
(decompose_antecedents): Remove.
(decompose_consequents): Remove.
(derive_proofs): Remove.
(max_problem_size): Remove.
(diagnose_constraint_size): Remove.
(subsumes_constraints_nonnull): Rewrite directly in terms of
decompose_clause and derive_proof, interleaving decomposition
with implication checking.  Remove limit on constraint complexity.
Use formula::erase to free the current clause before moving on to
the next one.

3 years agoOptimize x ? bswap(x) : 0 in tree-ssa-phiopt
Roger Sayle [Mon, 2 Aug 2021 12:27:53 +0000 (13:27 +0100)]
Optimize x ? bswap(x) : 0 in tree-ssa-phiopt

Many thanks again to Jakub Jelinek for a speedy fix for PR 101642.
Interestingly, that test case "bswap16(x) ? : x" also reveals a
missed optimization opportunity.  The resulting "x ? bswap(x) : 0"
can be further simplified to just bswap(x).

Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the
related "x ? popcount(x) : 0", so this patch simply makes that
transformation more general, additionally handling bswap, parity,
ffs and clrsb.  All of the required infrastructure is already
present thanks to Jakub previously adding support for clz/ctz.
To reflect this generalization, the name of the function is changed
from cond_removal_in_popcount_clz_ctz_pattern to the hopefully
equally descriptive cond_removal_in_builtin_zero_pattern.
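
Why the zero test is redundant can be checked at the source level.  The sketch below assumes a GCC-compatible compiler that provides the __builtin_* functions; check_zero_guard_is_redundant is a hypothetical helper name.  It covers only builtins whose value at zero is 0, so clrsb is left out because __builtin_clrsb(0) is not 0.

```c
#include <assert.h>
#include <stdint.h>

/* For these builtins f(0) == 0, so "x ? f(x) : 0" equals plain f(x).  */
void check_zero_guard_is_redundant(uint32_t x)
{
  assert((x ? __builtin_bswap32(x) : 0u) == __builtin_bswap32(x));
  assert((x ? __builtin_popcount(x) : 0) == __builtin_popcount(x));
  assert((x ? __builtin_parity(x) : 0) == __builtin_parity(x));
  assert((x ? __builtin_ffs((int)x) : 0) == __builtin_ffs((int)x));
}
```

The assertions hold for every input, including zero, which is the property the phiopt transformation relies on.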

2021-08-02  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern):
Renamed from cond_removal_in_popcount_clz_ctz_pattern.
Add support for BSWAP, FFS, PARITY and CLRSB builtins.
(tree_ssa_phiopt_worker): Update call to function above.

gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/phi-opt-25.c: New test case.

3 years agoi386: Improve SImode constant - __builtin_clzll for -mno-lzcnt
H.J. Lu [Sun, 1 Aug 2021 16:55:33 +0000 (09:55 -0700)]
i386: Improve SImode constant - __builtin_clzll for -mno-lzcnt

Add a zero_extend pattern for bsr_rex64_1 and use it to split SImode
constant - __builtin_clzll to avoid unnecessary zero_extend.

gcc/

PR target/78103
* config/i386/i386.md (bsr_rex64_1_zext): New.
(combine splitter for constant - clzll): Replace gen_bsr_rex64_1
with gen_bsr_rex64_1_zext.

gcc/testsuite/

PR target/78103
* gcc.target/i386/pr78103-2.c: Also scan incl.
* gcc.target/i386/pr78103-3.c: Scan leal|addl|incl for x32.  Also
scan incq.

3 years agoAdd missing descriptions gcc/testsuite/ChangeLog
Jonathan Wakely [Sun, 1 Aug 2021 18:37:52 +0000 (19:37 +0100)]
Add missing descriptions gcc/testsuite/ChangeLog

3 years agoUpdate gcc fr.po.
Joseph Myers [Sat, 31 Jul 2021 19:30:11 +0000 (19:30 +0000)]
Update gcc fr.po.

* fr.po: Update.

3 years agoc++: ICE on anon struct with base [PR96636]
Jason Merrill [Fri, 30 Jul 2021 20:49:03 +0000 (16:49 -0400)]
c++: ICE on anon struct with base [PR96636]

pinski pointed out that my recent change to reject anonymous structs with
bases was relevant to this PR.  But we still ICEd after giving that error;
this fixes the ICE.

PR c++/96636

gcc/cp/ChangeLog:

* decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING
after error.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct9.C: New test.

3 years agoc++: pretty-print TYPE_PACK_EXPANSION better
Jason Merrill [Sat, 10 Jul 2021 09:45:02 +0000 (05:45 -0400)]
c++: pretty-print TYPE_PACK_EXPANSION better

gcc/cp/ChangeLog:

* ptree.c (cxx_print_type) [TYPE_PACK_EXPANSION]: Also print
PACK_EXPANSION_PATTERN.

3 years ago[Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c
Roger Sayle [Sat, 31 Jul 2021 10:06:22 +0000 (11:06 +0100)]
[Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c

With -m32, this test case is sensitive to the instruction timings of
the target (for ifcvt to normalize bar() to foo() during the ce1 pass,
prior to the transformations actually being tested here).  Specifying
-march=core2 prevents these failures.  Committed as obvious.

2021-07-31  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* gcc.target/i386/dec-cmov-2.c: Require -march=core2 with -m32.

3 years agoopenmp: Handle OpenMP directives in attribute syntax in attribute-declaration
Jakub Jelinek [Sat, 31 Jul 2021 07:35:25 +0000 (09:35 +0200)]
openmp: Handle OpenMP directives in attribute syntax in attribute-declaration

Now that we parse attribute-declaration (outside of functions), the following
patch handles OpenMP directives in its attribute(s).
What remains to be handled incrementally is diagnosing mismatched begin/end
pairs like
 [[omp::directive (declare target)]];
 int a;
 #pragma omp end declare target
or
 #pragma omp declare target
 int b;
 [[omp::directive (end declare target)]];
and handling declare simd/declare variant on declarations (function
definitions and declarations), for those in two different spots.

2021-07-31  Jakub Jelinek  <jakub@redhat.com>

* parser.c (cp_parser_declaration): Handle OpenMP directives
in attribute-declaration.

* g++.dg/gomp/attrs-9.C: New test.

3 years agoi386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt...
Jakub Jelinek [Sat, 31 Jul 2021 07:19:32 +0000 (09:19 +0200)]
i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

This patch improves emitted code for the non-TARGET_LZCNT case.
As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT
CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we
can take advantage of that and assume the result will be 0 to 31 or
0 to 63.
Given that, sign and zero extension of that result are the same and
are actually already performed by the bsrl or xorl instructions.
And constant - __builtin_clz* can be simplified into
bsr + constant - bitmask.
For TARGET_LZCNT, a lot of this is already fine as is (e.g. the sign or
zero extensions), and other optimizations are IMHO not possible
(if we have lzcnt, we've lost information on whether it is UB at
zero or not and so can't transform it into bsr even when that is
1-2 insns shorter).
The changes on the 3 testcases between unpatched and patched gcc
are for -m64:
pr78103-1.s:
        bsrq    %rdi, %rax
-       xorq    $63, %rax
-       cltq
+       xorl    $63, %eax
...
        bsrq    %rdi, %rax
-       xorq    $63, %rax
-       cltq
+       xorl    $63, %eax
...
        bsrl    %edi, %eax
        xorl    $31, %eax
-       cltq
...
        bsrl    %edi, %eax
        xorl    $31, %eax
-       cltq
pr78103-2.s:
        bsrl    %edi, %edi
-       movl    $32, %eax
-       xorl    $31, %edi
-       subl    %edi, %eax
+       leal    1(%rdi), %eax
...
-       bsrl    %edi, %edi
-       movl    $31, %eax
-       xorl    $31, %edi
-       subl    %edi, %eax
+       bsrl    %edi, %eax
...
        bsrq    %rdi, %rdi
-       movl    $64, %eax
-       xorq    $63, %rdi
-       subl    %edi, %eax
+       leal    1(%rdi), %eax
...
-       bsrq    %rdi, %rdi
-       movl    $63, %eax
-       xorq    $63, %rdi
-       subl    %edi, %eax
+       bsrq    %rdi, %rax
pr78103-3.s:
        bsrl    %edi, %edi
-       movl    $32, %eax
-       xorl    $31, %edi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       leaq    1(%rdi), %rax
...
-       bsrl    %edi, %edi
-       movl    $31, %eax
-       xorl    $31, %edi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       bsrl    %edi, %eax
...
        bsrq    %rdi, %rdi
-       movl    $64, %eax
-       xorq    $63, %rdi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       leaq    1(%rdi), %rax
...
-       bsrq    %rdi, %rdi
-       movl    $63, %eax
-       xorq    $63, %rdi
-       movslq  %edi, %rdi
-       subq    %rdi, %rax
+       bsrq    %rdi, %rax
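
The arithmetic behind the shorter sequences can be checked in C.  The sketch assumes a 32-bit unsigned int and a GCC-compatible compiler providing __builtin_clz; bsr32 is a hypothetical portable stand-in for the value that bsr produces on nonzero input.

```c
#include <assert.h>

/* Index of the highest set bit of a nonzero value.  */
int bsr32(unsigned x)
{
  assert(x != 0);
  int i = 0;
  while (x >>= 1)
    i++;
  return i;
}

/* For nonzero x: clz == bsr ^ 31, hence C - clz == bsr + (C - 31).  */
void check_clz_identities(unsigned x)
{
  assert(x != 0);
  int clz = __builtin_clz(x);
  int bsr = bsr32(x);
  assert(clz == (bsr ^ 31));
  assert(31 - clz == bsr);        /* matches the plain bsrl case */
  assert(32 - clz == bsr + 1);    /* matches the leal 1(%rdi) case */
}
```

The identities only hold for nonzero arguments, which fits the observation that __builtin_clz is undefined at zero.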

Most of the changes are done with combine splitters, but for
*bsr_rex64_2 and *bsr_2 I had to use define_insn_and_split, because
as mentioned in the PR the combiner unfortunately doesn't create LOG_LINKS
in between the two insns created by combine splitter, so it can't be
combined further with following instructions.

2021-07-31  Jakub Jelinek  <jakub@redhat.com>

PR target/78103
* config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
define_insn patterns.
(*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
Add combine splitters for constant - clz.
(clz<mode>2): Use a temporary pseudo for bsr result.

* gcc.target/i386/pr78103-1.c: New test.
* gcc.target/i386/pr78103-2.c: New test.
* gcc.target/i386/pr78103-3.c: New test.

3 years agogcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware
Hans-Peter Nilsson [Sat, 31 Jul 2021 00:08:36 +0000 (02:08 +0200)]
gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware

Commit r12-432, rewriting the dg-stuff, reverted the
adjustment for mmix-knuth-mmixware that I added in r11-2335.
(See those commits for context.)

Hopefully this variant will age better, just skipping it
with a trivial extra line less prone to pile-on.  (Not much
is won by covering this generic case for MMIX too; might as
well skip it.)

Beware that the dg-skip-if text can't say
"temporary variables are not x and y but x::3 and y::4"
because that leads to (on one line):

ERROR: gcc.dg/tree-ssa/ssa-dse-26.c: can't set "{temporary
 variables are not x and y but x::3 and y::4} {
 mmix-knuth-mmixware }": parent namespace doesn't exist for
 " dg-skip-if 4 "temporary variables are not x and y but
 x::3 and y::4" { mmix-knuth-mmixware } "

gcc/testsuite:
* gcc.dg/tree-ssa/ssa-dse-26.c: Skip on mmix-knuth-mmixware.

3 years agogcc.dg/uninit-pred-9_b.c: Xfail for MMIX too
Hans-Peter Nilsson [Fri, 30 Jul 2021 23:23:20 +0000 (01:23 +0200)]
gcc.dg/uninit-pred-9_b.c: Xfail for MMIX too

Looks like MMIX is the "correct target" too (cf. 2f6bdd51cfe15)
and from
https://gcc.gnu.org/pipermail/gcc-testresults/2021-July/710188.html
it seems powerpc-ibm-aix7.2.3.0 is too, but I've not found
other targets failing.

gcc/testsuite:
PR middle-end/101674
* gcc.dg/uninit-pred-9_b.c: Xfail for mmix-*-* too.

3 years agors6000: Add tests for SSE4.1 "floor" intrinsics
Paul A. Clarke [Tue, 6 Jul 2021 22:35:45 +0000 (17:35 -0500)]
rs6000: Add tests for SSE4.1 "floor" intrinsics

Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
* gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
gcc/testsuite/gcc.target/i386 and adjust dg directives to suit.

3 years agors6000: Add support for SSE4.1 "floor" intrinsics
Paul A. Clarke [Tue, 6 Jul 2021 22:31:21 +0000 (17:31 -0500)]
rs6000: Add support for SSE4.1 "floor" intrinsics

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
_mm_floor_sd, _mm_floor_ss): New.

3 years agors6000: Add tests for SSE4.1 "ceil" intrinsics
Paul A. Clarke [Fri, 2 Jul 2021 02:00:26 +0000 (21:00 -0500)]
rs6000: Add tests for SSE4.1 "ceil" intrinsics

Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
* gcc.target/powerpc/sse4_1-ceilpd.c: New.
* gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc.target/powerpc/sse4_1-round.h: New.
* gcc.target/powerpc/sse4_1-round2.h: New.
* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386
and adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.

3 years agors6000: Add support for SSE4.1 "ceil" intrinsics
Paul A. Clarke [Thu, 1 Jul 2021 22:04:51 +0000 (17:04 -0500)]
rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.

3 years agors6000: Add tests for SSE4.1 "blend" intrinsics
Paul A. Clarke [Tue, 29 Jun 2021 17:09:56 +0000 (12:09 -0500)]
rs6000: Add tests for SSE4.1 "blend" intrinsics

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386
and adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.

3 years agors6000: Add support for SSE4.1 "blend" intrinsics
Paul A. Clarke [Mon, 12 Jul 2021 17:06:18 +0000 (12:06 -0500)]
rs6000: Add support for SSE4.1 "blend" intrinsics

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-07-30  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.

3 years agoDecrement followed by cmov improvements.
Roger Sayle [Fri, 30 Jul 2021 21:46:32 +0000 (22:46 +0100)]
Decrement followed by cmov improvements.

The following patch to the x86_64 backend improves the code generated
for a decrement followed by a conditional move.  The primary change is
to recognize that after subtracting one, checking the result is -1 (or
equivalently that the original value was zero) can be implemented using
the borrow/carry flag instead of requiring an explicit test instruction.
This is achieved by a new define_insn_and_split that allows combine to
split the desired sequence/composite into a *subsi_3 and *movsicc_noc.

The other change in this patch is a pair of peephole2 optimizations
to eliminate register-to-register moves generated during register
allocation.  During reload, the compiler doesn't know that inverting
the condition of a conditional move can sometimes reduce register
pressure, but this is easy to tidy up during the peephole2 pass (where
swapping the order of the insn's operands performs the required
logic inversion).

Both improvements are demonstrated by the case below:

int foo(int x) {
  if (x == 0)
    x = 16;
  else x--;
  return x;
}

Before:
foo: leal    -1(%rdi), %eax
        testl   %edi, %edi
        movl    $16, %edx
        cmove   %edx, %eax
        ret

After:
foo: subl    $1, %edi
        movl    $16, %eax
        cmovnc  %edi, %eax
        ret

And the value of the peephole2 clean-up can be seen on its own in:

int bar(int x) {
  x--;
  if (x == 0)
    x = 16;
  return x;
}

Before:
bar: movl    %edi, %eax
        movl    $16, %edx
        subl    $1, %eax
        cmove   %edx, %eax
        ret

After:
bar: subl    $1, %edi
        movl    $16, %eax
        cmovne  %edi, %eax
        ret

These idioms were inspired by the source code of NIST SciMark4's
Random_nextDouble function, where the tweaks above result in
a ~1% improvement in the MonteCarlo benchmark kernel.
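
The flag reasoning can be checked in portable code.  The sketch below is
only a model of the idea, not of the patterns in i386.md: subtracting one
from an unsigned value borrows exactly when the value was zero, so the
borrow can stand in for the explicit test.  The *_model names are
hypothetical.

```cpp
#include <cstdint>

// Reference semantics of foo from the commit message.
inline int foo_reference(int x) {
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

// Model of "subl $1; cmovnc": the borrow out of x - 1 is set iff x == 0.
inline int foo_carry_model(int x) {
  const std::uint32_t u = static_cast<std::uint32_t>(x);
  const std::uint32_t dec = u - 1u;  // wraps modulo 2^32, like subl
  const bool carry = u < 1u;         // carry/borrow flag after the subtract
  return carry ? 16 : static_cast<std::int32_t>(dec);
}

// Reference semantics of bar from the commit message.
inline int bar_reference(int x) {
  x--;
  if (x == 0)
    x = 16;
  return x;
}

// Model of "subl $1; cmovne": the zero flag of the subtract is reused.
inline int bar_zero_model(int x) {
  const std::uint32_t dec = static_cast<std::uint32_t>(x) - 1u;
  const bool zero = (dec == 0u);     // zero flag after the subtract
  return zero ? 16 : static_cast<std::int32_t>(dec);
}
```

INT_MIN is deliberately left out of any comparison, since x-- overflows
there in the reference functions.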

2021-07-30  Roger Sayle  <roger@nextmovesoftware.com>
    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386.md (*dec_cmov<mode>): New define_insn_and_split
to generate a conditional move using the carry flag after sub $1.
(peephole2): Eliminate a register-to-register move by inverting
the condition of a conditional move.

gcc/testsuite/ChangeLog
* gcc.target/i386/dec-cmov-1.c: New test.
* gcc.target/i386/dec-cmov-2.c: New test.

3 years ago MMIX: remove generic placeholder parameters in call insn patterns.
Hans-Peter Nilsson [Sun, 18 Jul 2021 02:59:30 +0000 (04:59 +0200)]
MMIX: remove generic placeholder parameters in call insn patterns.

I guess the best way to describe these operands, at least for MMIX, is
"ballast".  Some targets seem to drag along one or two of the incoming
pattern operands through the rtl passes and don't drop them until
assembly output.  Let's stop doing that for MMIX.  There really are
*two* unused parameters: one is a number corresponding to the
stack-size of arguments as a const_int and the other is whatever the
target yields for targetm.calls.function_arg (args_so_far,
function_arg_info::end_marker ()).  There's a mandatory second
argument to the "call" RTX, but the target doesn't have to keep it a
variable number; it can be replaced by (const_int 0) early, like this.

Astute readers may object that as the MMIX call-type insns (PUSHJ,
PUSHGO) have a parameter in addition to the address of the called
function, so should the emitted RTL.  But, that parameter depends only
on the local function, not the called function (IOW, it's the same for
all calls in a function), and its value isn't known until frame layout
time.  Having it a parameter in the emitted RTL for the call would
just be confusing.  (Maybe this will be amended later, if/when
improving "shrink-wrapping".)

gcc:
* config/mmix/mmix.md ("call", "call_value", "*call_real")
("*call_value_real"): Don't generate rtx mentioning the generic
operands 1 and 2 to "call", and similarly for "call_value".
* config/mmix/mmix.c (mmix_print_operand_punct_valid_p)
(mmix_print_operand): Use '!' instead of 'p'.

3 years ago doc: correct documentation of "call" (et al) operand 2.
Hans-Peter Nilsson [Sun, 18 Jul 2021 01:40:11 +0000 (03:40 +0200)]
doc: correct documentation of "call" (et al) operand 2.

An old itch being scratched: the documentation lies; it's not "the
number of registers used as operands", unless the target makes a
special arrangement to that effect, and there's nothing in the guts of
gcc setting up or assuming those semantics.

Instead, see calls.c:expand_call, variable next_arg_reg.  Or just
consider the variable name.  The text is somewhat transcribed from the
head comment of emit_call_1 for parameter next_arg_reg.  Most
important is to document the relation to function_arg_info::end_marker()
and the TARGET_FUNCTION_ARG hook.

The "normally" in the head comment, in "normally it is the first
arg-register beyond those used for args in this call, or 0 if all the
arg-registers are used in this call" means "by default", unless the
target tests end_marker_p and does something special, but the port is
free to return whatever it likes when it sees the end-marker.

And, I do mean "whatever it likes" because if the port doesn't
actually mention that operand in the RTX emitted for its "call" or
"call_value" patterns ("usually" define_expands), it can be any
mumbo-jumbo, such as a VOIDmode register, which seems to happen
for some targets, or NULL, which happens for others.  Until recently,
the targets returning a VOIDmode register included MMIX, where it made
it into the emitted RTL, confusing later passes, recently exposed as an ICE.

Tested by inspecting the info and generated pdf for sanity.

gcc:
* doc/md.texi (call): Correct information about operand 2.
* config/mmix/mmix.md ("call", "call_value"): Remove fixed FIXMEs.

3 years ago Handle constants in wi_fold for trunc_mod.
Andrew MacLeod [Thu, 29 Jul 2021 15:22:28 +0000 (11:22 -0400)]
Handle constants in wi_fold for trunc_mod.

Handle const % const, as wi_fold_in_parts may now provide this.  Before this,
[10, 10] % [4, 4] would produce [0, 3] instead of [2, 2].
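
A minimal sketch of the idea, assuming non-negative dividends and strictly
positive divisors only.  Range and fold_trunc_mod are hypothetical names and
do not reflect the wide_int based code in range-op.cc.

```cpp
#include <algorithm>
#include <cassert>

// Closed interval [lo, hi]; a hypothetical stand-in for an integer range.
struct Range {
  long lo;
  long hi;
};

inline bool singleton(const Range& r) { return r.lo == r.hi; }

// Fold a % b for 0 <= a.lo <= a.hi and 0 < b.lo <= b.hi.
inline Range fold_trunc_mod(const Range& a, const Range& b) {
  assert(a.lo >= 0 && a.lo <= a.hi && b.lo > 0 && b.lo <= b.hi);
  // const % const: the result is exactly one value.
  if (singleton(a) && singleton(b)) {
    const long r = a.lo % b.lo;
    return {r, r};
  }
  // Conservative fallback: the remainder is below the largest divisor
  // and cannot exceed the largest dividend.
  return {0, std::min(a.hi, b.hi - 1)};
}
```

Without the singleton branch, the fallback alone yields [0, 3] for
[10, 10] % [4, 4], which matches the imprecise result described above.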

gcc/
* range-op.cc (operator_trunc_mod::wi_fold): Fold constants.

gcc/testsuite/
* gcc.dg/tree-ssa/pr61839_2.c: Adjust.  Add new const fold test.

3 years ago Change integral divide by zero to produce UNDEFINED.
Andrew MacLeod [Wed, 28 Jul 2021 17:14:22 +0000 (13:14 -0400)]
Change integral divide by zero to produce UNDEFINED.

Instead of VARYING, we can get better results by treating divide by zero
as producing an undefined result.
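
One way to see why UNDEFINED can give better results: if UNDEFINED behaves
like an empty range, it disappears when ranges from several paths are
merged, whereas VARYING swallows everything.  The sketch below uses a toy
lattice under that assumption; IRange, range_union and fold_div are
hypothetical and much simpler than the real range classes.

```cpp
#include <algorithm>

// Toy range lattice: UNDEFINED (empty), a closed interval, or VARYING (all).
struct IRange {
  enum Kind { UNDEFINED, INTERVAL, VARYING } kind;
  long lo;
  long hi;
};

inline IRange make_undefined() { return {IRange::UNDEFINED, 0, 0}; }
inline IRange make_varying() { return {IRange::VARYING, 0, 0}; }
inline IRange make_interval(long lo, long hi) { return {IRange::INTERVAL, lo, hi}; }

// Union of two ranges: UNDEFINED is the identity, VARYING absorbs.
inline IRange range_union(const IRange& a, const IRange& b) {
  if (a.kind == IRange::UNDEFINED) return b;
  if (b.kind == IRange::UNDEFINED) return a;
  if (a.kind == IRange::VARYING || b.kind == IRange::VARYING)
    return make_varying();
  return make_interval(std::min(a.lo, b.lo), std::max(a.hi, b.hi));
}

// Division restricted to the cases this sketch needs.
inline IRange fold_div(const IRange& num, const IRange& den) {
  if (den.kind == IRange::INTERVAL && den.lo == 0 && den.hi == 0)
    return make_undefined();  // divide by [0, 0]
  if (num.kind == IRange::INTERVAL && den.kind == IRange::INTERVAL &&
      num.lo == num.hi && den.lo == den.hi)
    return make_interval(num.lo / den.lo, num.lo / den.lo);
  return make_varying();      // anything else: give up in this toy model
}
```

Merging the result of [8, 8] / [0, 0] with [8, 8] / [2, 2] then gives
[4, 4] instead of VARYING.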

gcc/
* range-op.cc (operator_div::wi_fold): Return UNDEFINED for [0, 0] divisor.

gcc/testsuite/
* gcc.dg/tree-ssa/pr61839_2.c: Adjust.

3 years ago Change const basic_block to const_basic_block.
Andrew MacLeod [Thu, 29 Jul 2021 13:15:45 +0000 (09:15 -0400)]
Change const basic_block to const_basic_block.

* gimple-range-cache.cc (*::set_bb_range): Change const basic_block to
const_basic_block.
(*::get_bb_range): Ditto.
(*::bb_range_p): Ditto.
* gimple-range-cache.h: Change prototypes.

3 years ago Move failed part of a test to a new file [PR101671]
Martin Sebor [Fri, 30 Jul 2021 17:41:02 +0000 (11:41 -0600)]
Move failed part of a test to a new file [PR101671]

Related:
PR middle-end/101671 - pr83510 fails with -Os because threader confuses -Warray-bounds

gcc/testsuite:
PR middle-end/101671
* gcc.c-torture/compile/pr83510.c: Move test functions...
* gcc.dg/Warray-bounds-87.c: ...to this file.

3 years ago Add QI vector mode support to by-pieces for memset
H.J. Lu [Sun, 6 Mar 2016 14:38:21 +0000 (06:38 -0800)]
Add QI vector mode support to by-pieces for memset

1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
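
To illustrate what "by pieces" with a broadcast value means, here is a
hedged sketch in plain C++.  It only mimics the shape of the expansion
(widest piece first, then narrower pieces for the tail) with an 8-byte
integer standing in for a QI vector register; set_by_pieces is a
hypothetical name, not a GCC function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store LEN copies of BYTE at DST using the widest pieces first.
inline void set_by_pieces(unsigned char* dst, unsigned char byte,
                          std::size_t len) {
  // Broadcast the byte into every lane of the widest piece.
  const std::uint64_t pattern = 0x0101010101010101ULL * byte;
  // Try piece widths 8, 4, 2, 1; each width handles what still fits.
  for (std::size_t width = 8; width >= 1; width /= 2) {
    while (len >= width) {
      // All bytes of PATTERN are equal, so any WIDTH-byte prefix works
      // regardless of endianness.
      std::memcpy(dst, &pattern, width);
      dst += width;
      len -= width;
    }
  }
}
```

For a 23-byte store this issues two 8-byte pieces followed by one 4-byte,
one 2-byte and one 1-byte piece.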

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and support CONST_VECTOR.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Pass memsetp to
widest_fixed_size_mode_for_size to support QI vector mode.
Allow all CONST_VECTORs for memset if vec_duplicate is supported.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.

3 years ago Add testcases that got lost when tree-ssa was merged
Andrew Pinski [Thu, 29 Jul 2021 19:56:18 +0000 (12:56 -0700)]
Add testcases that got lost when tree-ssa was merged

While looking at some older PRs (PR 16016 in this case),
I noticed that some of the testcases were removed when
the tree-ssa branch was merged. This adds them back in.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/testsuite/ChangeLog:

PR testsuite/101517
* g++.dg/warn/Wunused-18.C: New test.
* gcc.c-torture/compile/20030405-2.c: New test.
* gcc.c-torture/compile/20040304-2.c: New test.
* gcc.dg/20030612-2.c: New test.

3 years ago libstdc++: Use secure_getenv for filesystem::temp_directory_path() [PR65018]
Jonathan Wakely [Fri, 30 Jul 2021 12:56:14 +0000 (13:56 +0100)]
libstdc++: Use secure_getenv for filesystem::temp_directory_path() [PR65018]

This adds a configure check for the GNU extension secure_getenv and then
uses it for looking up TMPDIR and similar variables.
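
A rough sketch of such a lookup, not the libstdc++ code itself: it prefers
secure_getenv where glibc provides it and falls back to std::getenv
elsewhere.  The helper names, the exact list of variables and the "/tmp"
default are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <string>

// Look up NAME, ignoring the environment in secure-execution mode
// (e.g. setuid programs) when secure_getenv is available.
inline const char* env_lookup(const char* name) {
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
  return ::secure_getenv(name);  // GNU extension
#else
  return std::getenv(name);      // portable fallback
#endif
}

// Hypothetical helper: first non-empty candidate variable, else "/tmp".
inline std::string temp_dir_from_env() {
  for (const char* name : {"TMPDIR", "TMP", "TEMP", "TEMPDIR"}) {
    const char* value = env_lookup(name);
    if (value && *value)
      return value;
  }
  return "/tmp";
}
```

The point of secure_getenv is that a privileged program does not trust
TMPDIR set by an unprivileged caller.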

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/65018
* configure.ac: Check for secure_getenv.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h (get_temp_directory_from_env): New
helper function to obtain path from the environment.
* src/c++17/fs_ops.cc (fs::temp_directory_path): Use new helper.
* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Print messages if test cannot be run.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise. Fix incorrect condition. Use "TMP" to work with
Windows as well as POSIX.

3 years ago mips: Fix up mips_atomic_assign_expand_fenv [PR94780]
Xi Ruoyao [Fri, 30 Jul 2021 15:44:14 +0000 (23:44 +0800)]
mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

Commit message shamelessly copied from 1777beb6b129 by jakub:

This function, because it is sometimes called even outside of function
bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
for that to work, when first referenced, the VAR_DECLs need to appear in a
TARGET_EXPR so that during gimplification the var gets the right
DECL_CONTEXT and is added to local decls.

gcc/

PR target/94780
* config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
  TARGET_EXPR instead of MODIFY_EXPR.

3 years ago mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]
Xi Ruoyao [Sun, 20 Jun 2021 07:21:39 +0000 (15:21 +0800)]
mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

Middle-end started to emit vec_cmp and vec_cmpu since GCC 11, causing
ICE on MIPS with MSA enabled.  Add the pattern to prevent it.

gcc/

PR target/101132
* config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Declare.
* config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
* config/mips/mips-msa.md (vec_cmp<MSA:mode><mode_i>): New
  expander.
  (vec_cmpu<IMSA:mode><mode_i>): New expander.

gcc/testsuite/

PR target/101132
* gcc.target/mips/pr101132.c: New test.

3 years ago c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]
Jakub Jelinek [Fri, 30 Jul 2021 16:38:41 +0000 (18:38 +0200)]
c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

The following patch attempts to implement the compiler helpers for
libstdc++ std::is_pointer_interconvertible_base_of trait and
std::is_pointer_interconvertible_with_class template function.

The former is the __is_pointer_interconvertible_base_of trait.  It first
checks whether base and derived are the same non-union class type, ignoring
toplevel cv-qualifiers.  Otherwise it checks whether derived is a complete
type unambiguously derived from base (ignoring cv-qualifiers).  If so, my
limited understanding of any derived object being pointer-interconvertible
with its base subobject IMHO implies (because one can't inherit from unions
and unions can't inherit) that we check whether derived is a standard-layout
type and walk the bases of derived recursively, stopping at a class that has
any non-static data members, and check whether any of the bases is base.
For a class with non-static data members, no further bases are compared.
Upon discussion, this is something that maybe should have been changed
in the standard with CWG 2254, so the patch no longer performs this walk and
assumes all base subobjects of standard-layout class types are
pointer-interconvertible with the whole class objects.
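
A small sketch of what the trait is expected to answer under the rules
described above.  The classes A to D are made up for this example, and the
checks are guarded with __has_builtin so the code still compiles where the
trait is missing; in that case the function simply reports success.

```cpp
// Detect the compiler trait; compilers without it skip the checks.
#if defined(__has_builtin)
#  if __has_builtin(__is_pointer_interconvertible_base_of)
#    define HAVE_PI_BASE_OF 1
#  endif
#endif
#ifndef HAVE_PI_BASE_OF
#  define HAVE_PI_BASE_OF 0
#endif

struct A { int a; };
struct B { int b; };
struct C : A, B {};  // two bases with data members: not standard-layout
struct D : A {};     // standard-layout, A is its only base

// True when every expectation holds (or when the trait is unavailable).
inline bool expectations_hold() {
#if HAVE_PI_BASE_OF
  return __is_pointer_interconvertible_base_of(A, D)         // std-layout derived
      && __is_pointer_interconvertible_base_of(A, const A)   // same class, cv ignored
      && !__is_pointer_interconvertible_base_of(A, C);       // C is not std-layout
#else
  return true;
#endif
}
```

Library code would normally reach this through
std::is_pointer_interconvertible_base_of rather than the builtin.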

The latter is implemented using a FE
__builtin_is_pointer_interconvertible_with_class, but because on the library
side it will be a template function, the builtin takes ... arguments and
only during folding verifies it has a single argument with pointer to member
type.  The initial errors IMHO can only happen if one uses the builtin
incorrectly by hand; the template function should ensure that it has
exactly a single argument that has pointer to member type.
Otherwise, again with my limited understanding of what
the template function should do and pointer-interconvertibility,
it folds to false for pointer-to-member-function, errors if the
basetype of the OFFSET_TYPE is incomplete, folds to false
for a non-std-layout non-union basetype, then finds the first non-static
data member in the basetype or its bases.  It does so by ignoring
DECL_FIELD_IS_BASE FIELD_DECLs that are empty and recursing into
DECL_FIELD_IS_BASE FIELD_DECLs whose type is non-empty (I think
std layout should ensure there is at most one); for unions it
checks if membertype is the same type as any of the union FIELD_DECLs,
for non-unions the first other FIELD_DECL only, and for anonymous
aggregates similarly (union vs. non-union), but it recurses into the
anon aggr types with a std layout check for anon structures.  If
membertype doesn't match the type of the first non-static data member
(or for unions any of the members), then the builtin folds to false,
otherwise the builtin folds to a check whether the argument is equal
to OFFSET_TYPE of 0 or not, either at compile time if it is constant
(e.g. for constexpr folding) or at runtime otherwise.

As I wrote in the PR, I've tried my testcases with MSVC on godbolt
that claims to implement it, and https://godbolt.org/z/3PnjM33vM
for the first testcase shows it disagrees with my expectations on
static_assert (std::is_pointer_interconvertible_base_of_v<D, F>);
static_assert (std::is_pointer_interconvertible_base_of_v<E, F>);
static_assert (!std::is_pointer_interconvertible_base_of_v<D, G>);
static_assert (!std::is_pointer_interconvertible_base_of_v<D, I>);
static_assert (std::is_pointer_interconvertible_base_of_v<H, volatile I>);
Is that a bug in my patch or is MSVC buggy on these (or mix thereof)?
https://godbolt.org/z/aYeYnne9d
shows the second testcase, here it differs on:
static_assert (std::is_pointer_interconvertible_with_class<F, int> (&F::b));
static_assert (std::is_pointer_interconvertible_with_class<I, int> (&I::g));
static_assert (std::is_pointer_interconvertible_with_class<L, int> (&L::b));
static_assert (std::is_pointer_interconvertible_with_class (&V::a));
static_assert (std::is_pointer_interconvertible_with_class (&V::b));
Again, my bug, MSVC bug, mix thereof?
According to Jason, the <D, G> and <D, I> cases are the subject of the
CWG 2254 change discussed above, and the rest are likely MSVC bugs.

Oh, and there is another thing, the standard has an example:
struct A { int a; };                    // a standard-layout class
struct B { int b; };                    // a standard-layout class
struct C: public A, public B { };       // not a standard-layout class

static_assert( is_pointer_interconvertible_with_class( &C::b ) );
  // Succeeds because, despite its appearance, &C::b has type
  // “pointer to member of B of type int”.
static_assert( is_pointer_interconvertible_with_class<C>( &C::b ) );
  // Forces the use of class C, and fails.
It seems to work as written with MSVC (second assertion fails),
but fails with GCC with the patch:
/tmp/1.C:22:57: error: no matching function for call to ‘is_pointer_interconvertible_with_class<C>(int B::*)’
   22 | static_assert( is_pointer_interconvertible_with_class<C>( &C::b ) );
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
/tmp/1.C:8:1: note: candidate: ‘template<class S, class M> constexpr bool std::is_pointer_interconvertible_with_class(M S::*)’
    8 | is_pointer_interconvertible_with_class (M S::*m) noexcept
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/1.C:8:1: note:   template argument deduction/substitution failed:
/tmp/1.C:22:57: note:   mismatched types ‘C’ and ‘B’
   22 | static_assert( is_pointer_interconvertible_with_class<C>( &C::b ) );
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
the second int argument isn't deduced.

This boils down to:
template <class S, class M>
bool foo (M S::*m) noexcept;
struct A { int a; };
struct B { int b; };
struct C : public A, public B {};
bool a = foo (&C::b);
bool b = foo<C, int> (&C::b);
bool c = foo<C> (&C::b);
which with /std:c++20 or -std=c++20 is accepted by latest MSVC and ICC but
rejected by GCC and clang (in both cases on the last line).
Is this a GCC/clang bug in argument deduction (in that case I think we want
a separate PR), or a bug in ICC/MSVC and the standard itself that should
specify in the examples both template arguments instead of just the first?
And this has been raised with the CWG.
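
The deduction behaviour behind the first two calls can be observed
directly.  In this sketch is_member_of_C is a hypothetical helper that
reports which class S ended up as: &C::b has type int B::*, so deduction
picks S = B, while naming both template arguments forces S = C and relies
on the implicit conversion from int B::* to int C::*.

```cpp
#include <type_traits>

struct A { int a; };
struct B { int b; };
struct C : public A, public B {};

// Reports whether the class part of the pointer-to-member type is C.
template <class S, class M>
constexpr bool is_member_of_C(M S::*) noexcept {
  return std::is_same<S, C>::value;
}

// Deduced: &C::b is "pointer to member of B of type int", so S is B.
static_assert(!is_member_of_C(&C::b), "S is deduced as B");

// Fully explicit: no deduction happens, the argument is converted.
static_assert(is_member_of_C<C, int>(&C::b), "S is C when named explicitly");

// The contested form, left commented out because GCC and clang reject it:
// bool c = is_member_of_C<C>(&C::b);
```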

2021-07-30  Jakub Jelinek  <jakub@redhat.com>

PR c++/101539
gcc/c-family/
* c-common.h (enum rid): Add RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* c-common.c (c_common_reswords): Add
__is_pointer_interconvertible_base_of.
gcc/cp/
* cp-tree.h (enum cp_trait_kind): Add
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(enum cp_built_in_function): Add
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.
(fold_builtin_is_pointer_inverconvertible_with_class): Declare.
* parser.c (cp_parser_primary_expression): Handle
RID_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(cp_parser_trait_expr): Likewise.
* cp-objcp-common.c (names_builtin_p): Likewise.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
* decl.c (cxx_init_decl_processing): Register
__builtin_is_pointer_interconvertible_with_class builtin.
* constexpr.c (cxx_eval_builtin_function_call): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS builtin.
* semantics.c (pointer_interconvertible_base_of_p,
first_nonstatic_data_member_p,
fold_builtin_is_pointer_inverconvertible_with_class): New functions.
(trait_expr_value): Handle CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
(finish_trait_expr): Likewise.  Formatting fix.
* cp-gimplify.c (cp_gimplify_expr): Fold
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
(cp_fold): Likewise.
* tree.c (builtin_valid_in_constant_expr_p): Handle
CP_BUILT_IN_IS_POINTER_INTERCONVERTIBLE_WITH_CLASS.  Call
fndecl_built_in_p just once.
* cxx-pretty-print.c (pp_cxx_trait_expression): Handle
CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF.
gcc/testsuite/
* g++.dg/cpp2a/is-pointer-interconvertible-base-of1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class3.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class4.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class5.C: New test.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class6.C: New test.

3 years ago c++: Reject anonymous struct with bases
Jason Merrill [Fri, 30 Jul 2021 12:45:01 +0000 (08:45 -0400)]
c++: Reject anonymous struct with bases

In discussion of Jakub's patch for C++20 pointer-interconvertibility, it
came up that we allow anonymous structs to have bases, but don't do anything
usable with them.  Let's reject it.

The comment change is something I noticed while looking for the right place
to diagnose this: finish_struct_anon does not actually check for anything
invalid, so it shouldn't claim to.

gcc/cp/ChangeLog:

* class.c (finish_struct_anon): Improve comment.
* decl.c (fixup_anonymous_aggr): Reject anonymous struct
with bases.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct8.C: New test.

3 years ago c++: Fix up attribute rollbacks in cp_parser_statement
Jakub Jelinek [Fri, 30 Jul 2021 15:44:38 +0000 (17:44 +0200)]
c++: Fix up attribute rollbacks in cp_parser_statement

During the OpenMP directives using C++ attribute syntax work, I've noticed
that cp_parser_statement when parsing various block declarations that do
not allow attribute-specifier-seq at the start rolls back the attributes
only if std_attrs is non-NULL (i.e. some attributes have been parsed),
but doesn't roll back if some tokens were parsed as attribute-specifier-seq,
but didn't yield any attributes (e.g. [[]][[]][[]][[]]), which means
we accept those empty attributes even in places where they don't appear
in the grammar.

The following patch fixes that by instead checking if there are any
tokens to roll back.  This makes the parsing handle the first
function the same as the second one (where some attribute appears).

The testcase contains two xfails: using namespace ... apparently
allows attributes at the start, and the attributes shall appertain to
using in that case.  To be fixed incrementally.

2021-07-30  Jakub Jelinek  <jakub@redhat.com>

* parser.c (cp_parser_statement): Rollback attributes not just
when std_attrs is non-NULL, but whenever
cp_parser_std_attribute_spec_seq parsed any tokens.

* g++.dg/cpp0x/gen-attrs-76.C: New test.