Hans-Peter Nilsson [Tue, 1 Feb 2022 23:00:10 +0000 (00:00 +0100)]
cris: For expanded movsi, don't match operands we know will be reloaded
In a session investigating unexpected fallout from a change, I
noticed reload needs one operand being a register to make an
informed decision. It can happen that there's just a constant
and a memory operand, as in:
(insn 668 667 42 104 (parallel [
(set (mem:SI (plus:SI (reg/v/f:SI 347 [ fs ])
(const_int 168 [0xa8])) \
[1 fs_126(D)->regs.cfa_how+0 S4 A8])
(const_int 2 [0x2]))
(clobber (reg:CC 19 dccr))
]) "<...>/gcc/libgcc/unwind-dw2.c":1121:21 22 {*movsi_internal}
(expr_list:REG_UNUSED (reg:CC 19 dccr)
(nil)))
This was helpfully created by combine. When this happens,
reload can't check for costs and preferred register classes,
(both operands will start with NO_REGS as the preferred class)
and will default to the constraints order in the insn in reload.
(Which also does its own temporary merge in find_reloads, but
that's a different story.) It's better not to match the simple cases.
Beware that subregs have to be matched.
I'm doing this just for word_mode (SI) for now, but may repeat
this for the other valid modes as well. In particular, that
goes for DImode as I see the expanded movdi does *almost* this,
but uses register_operand instead of REG_S_P (from cris.h).
Using REG_S_P is the right choice here because register_operand
also matches (subreg (mem ...) ...) *until* reload is done.
By itself it's just a sub-0.1% performance win (coremark).
Also removing a stale comment.
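As a rough illustration only (not the actual cris.md condition), the rule
"at least one operand is a (sub-)register, or operand 1 is 0" can be sketched
in plain C++. The OperandKind enum, the Operand struct and movsi_condition are
hypothetical stand-ins for RTL operands and the insn condition; Subreg here
means a subreg of a register, which is what REG_S_P is assumed to accept.

```cpp
#include <cassert>

// Hypothetical, simplified operand model; GCC's real operands are RTL
// expressions, not this enum.
enum class OperandKind { Reg, Subreg, Mem, ConstInt };

struct Operand {
  OperandKind kind;
  long value;  // only meaningful for ConstInt
};

// True for a register or a subreg of a register (subregs must match too).
static bool is_reg_or_subreg(const Operand &op) {
  return op.kind == OperandKind::Reg || op.kind == OperandKind::Subreg;
}

// Sketch of the insn condition: at least one operand is a (sub-)register,
// or the source is the constant 0.
bool movsi_condition(const Operand &dest, const Operand &src) {
  // A destination must be something that can be stored to.
  assert(dest.kind != OperandKind::ConstInt);
  return is_reg_or_subreg(dest) || is_reg_or_subreg(src)
         || (src.kind == OperandKind::ConstInt && src.value == 0);
}
```

With this model, the combine-created insn shown earlier (a memory destination
and the constant 2) is rejected, while a store of 0 or any move involving a
register still matches.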
gcc:
* config/cris/cris.md ("*movsi_internal<setcc><setnz><setnzvc>"):
Conditionalize on (sub-)register operands or operand 1 being 0.
Hans-Peter Nilsson [Tue, 1 Feb 2022 23:00:09 +0000 (00:00 +0100)]
cris: Don't default to -mmul-bug-workaround
This flips the default for the errata handling for an old version
(TL;DR: workaround: no multiply instruction last on a cache-line).
Newer versions of the CRIS cpu don't have that bug. While the impact
of the workaround is very marginal (coremark: less than .05% larger,
less than .0005% slower) it's an irritating pseudorandom factor when
assessing the impact of other changes.
Also, fix a wart requiring changes to more than TARGET_DEFAULT to flip
the default.
People building old kernels or operating systems to run on
ETRAX 100 LX are advised to pass "-mmul-bug-workaround".
gcc:
* config/cris/cris.h (TARGET_DEFAULT): Don't include MASK_MUL_BUG.
(MUL_BUG_ASM_DEFAULT): New macro.
(MAYBE_AS_NO_MUL_BUG_ABORT): Define in terms of MUL_BUG_ASM_DEFAULT.
* doc/invoke.texi (CRIS Options, -mmul-bug-workaround): Adjust
accordingly.
GCC Administrator [Wed, 2 Feb 2022 00:17:16 +0000 (00:17 +0000)]
Daily bump.
Jonathan Wakely [Tue, 1 Feb 2022 23:58:08 +0000 (23:58 +0000)]
libstdc++: Do not use dirent::d_type unconditionally
These new tests should not use the d_type member unless it's actually
present on the OS.
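A minimal sketch of the same idea outside the testsuite, assuming a POSIX
system: use d_type only when the C library advertises it, and fall back to
stat() otherwise. The glibc/BSD macro _DIRENT_HAVE_D_TYPE is used here as a
stand-in for the autoconf-generated macro the tests rely on, and
entry_is_directory is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <dirent.h>
#include <sys/stat.h>

// Returns 1 if `name` inside `dirpath` is a directory, 0 if it is not,
// and -1 if the entry cannot be found or examined.
int entry_is_directory(const char *dirpath, const char *name) {
  assert(dirpath != nullptr && name != nullptr);
  DIR *dir = opendir(dirpath);
  if (dir == nullptr)
    return -1;

  int result = -1;
  while (struct dirent *ent = readdir(dir)) {
    if (std::strcmp(ent->d_name, name) != 0)
      continue;
#ifdef _DIRENT_HAVE_D_TYPE
    // d_type exists on this system, but may still be DT_UNKNOWN;
    // symlinks are resolved by the stat() fallback below.
    if (ent->d_type != DT_UNKNOWN && ent->d_type != DT_LNK) {
      result = (ent->d_type == DT_DIR) ? 1 : 0;
      break;
    }
#endif
    // Portable fallback: ask stat() about the full path.
    char path[4096];
    int n = std::snprintf(path, sizeof path, "%s/%s", dirpath, name);
    struct stat st;
    if (n > 0 && static_cast<std::size_t>(n) < sizeof path
        && ::stat(path, &st) == 0)
      result = S_ISDIR(st.st_mode) ? 1 : 0;
    break;
  }
  closedir(dir);
  return result;
}
```

For example, entry_is_directory(".", "..") reports a directory whether or not
the system's struct dirent has a d_type member.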
libstdc++-v3/ChangeLog:
* testsuite/27_io/filesystem/iterators/error_reporting.cc: Use
autoconf macro to check whether d_type is present.
* testsuite/experimental/filesystem/iterators/error_reporting.cc:
Likewise.
Eugene Rozenfeld [Wed, 19 Jan 2022 00:03:19 +0000 (16:03 -0800)]
AutoFDO: don't set param_early_inliner_max_iterations to 10.
param_early_inliner_max_iterations specifies the maximum number
of nested indirect inlining iterations performed by early inliner.
Normally, the default value is 1.
For AutoFDO this parameter was also used as the number of iterations for
its indirect call promotion loop and the default value was set to 10.
While it makes sense to have 10 in the indirect call promotion loop
(we want to make the IR match the profiled binary before actual annotation)
there is no reason to have a special default value for the
regular early inliner.
This change removes the special AutoFDO default value setting for
param_early_inliner_max_iterations while keeping 10 as the number of
iterations for the AutoFDO indirect call promotion loop.
This change improves a simple fibonacci benchmark in AutoFDO mode
by 15% on x86_64-pc-linux-gnu.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* auto-profile.cc (auto_profile): Hard-code the number of iterations (10).
gcc/ChangeLog:
* opts.cc (common_handle_option): Don't set param_early_inliner_max_iterations
to 10 for AutoFDO.
Andrew Pinski [Tue, 1 Feb 2022 23:05:14 +0000 (23:05 +0000)]
[COMMITTED] Change multiprecision.org to use https
As reported at
https://gcc.gnu.org/pipermail/gcc/2022-February/238216.html,
multiprecision.org now uses https so this updates the documentation
to use https instead of http.
Committed as obvious.
gcc/ChangeLog:
* doc/install.texi: Use https for multiprecision.org.
Jonathan Wakely [Tue, 1 Feb 2022 14:02:56 +0000 (14:02 +0000)]
libstdc++: Add more tests for filesystem directory iterators
The PR 97731 test was added to verify a fix to the Filesystem TS code,
but we should also have the same test to avoid similar regressions in
the C++17 std::filesystem code.
Also add tests for directory_options::follow_directory_symlink.
libstdc++-v3/ChangeLog:
* testsuite/27_io/filesystem/iterators/97731.cc: New test.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Check follow_directory_symlink option.
* testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
Jonathan Wakely [Mon, 31 Jan 2022 21:12:53 +0000 (21:12 +0000)]
libstdc++: Reset filesystem::recursive_directory_iterator on error
The standard requires directory iterators to become equal to the end
iterator value if they report an error. Some member functions of
filesystem::recursive_directory_iterator fail to do that.
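A short sketch of what that requirement means for user code, assuming a
C++17 compiler with std::filesystem; count_entries is a hypothetical helper
and is not part of the patch or its tests. Once an operation reports an
error through the error_code, the iterator is expected to compare equal to
a default-constructed (end) iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Counts the entries below `root`, stopping at the first error.
// On error, `ec` is set and the iterator should equal the end iterator.
std::size_t count_entries(const fs::path &root, std::error_code &ec) {
  std::size_t n = 0;
  fs::recursive_directory_iterator it(root, ec);
  fs::recursive_directory_iterator end;
  while (!ec && it != end) {
    ++n;
    it.increment(ec);
  }
  if (ec)
    assert(it == end);  // the behaviour the standard asks for
  return n;
}
```

Calling count_entries on a path that does not exist leaves ec set and
returns 0, because the failed constructor already yields the end iterator.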
libstdc++-v3/ChangeLog:
* src/c++17/fs_dir.cc (recursive_directory_iterator::increment):
Reset state to past-the-end iterator on error.
(fs::recursive_directory_iterator::pop(error_code&)): Likewise.
(fs::recursive_directory_iterator::pop()): Check _M_dirs before
it might get reset.
* src/filesystem/dir.cc (recursive_directory_iterator): Likewise,
for the TS implementation.
* testsuite/27_io/filesystem/iterators/error_reporting.cc: New test.
* testsuite/experimental/filesystem/iterators/error_reporting.cc: New test.
Jonathan Wakely [Mon, 31 Jan 2022 14:11:34 +0000 (14:11 +0000)]
libstdc++: Fix doxygen comment for filesystem::perms operators
libstdc++-v3/ChangeLog:
* include/bits/fs_fwd.h (filesystem::perms): Fix comment.
Jonathan Wakely [Mon, 31 Jan 2022 11:00:18 +0000 (11:00 +0000)]
libstdc++: Improve config output for --enable-cstdio [PR104301]
Currently we just print "checking for underlying I/O to use... stdio"
unconditionally, whether configured to use stdio_pure or stdio_posix. We
should make it clear that the user's configure option chose the right
thing.
libstdc++-v3/ChangeLog:
PR libstdc++/104301
* acinclude.m4 (GLIBCXX_ENABLE_CSTDIO): Print different messages
for stdio_pure and stdio_posix options.
* configure: Regenerate.
Ilya Leoshkevich [Fri, 28 Jan 2022 12:34:24 +0000 (13:34 +0100)]
IBM Z: fix `section type conflict` with -mindirect-branch-table
s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).
First of all, this is unnecessary - the other backends don't do it.
Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set. This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE. This causes a section
type conflict with the existing SECTION_CODE.
gcc/ChangeLog:
* config/s390/s390.cc (s390_code_end): Do not switch back to
code section.
gcc/testsuite/ChangeLog:
* gcc.target/s390/nobp-section-type-conflict.c: New test.
Harald Anlauf [Tue, 1 Feb 2022 20:36:42 +0000 (21:36 +0100)]
Fortran: error recovery when simplifying EOSHIFT
gcc/fortran/ChangeLog:
PR fortran/104331
* simplify.cc (gfc_simplify_eoshift): Avoid NULL pointer
dereference when shape is not set.
gcc/testsuite/ChangeLog:
PR fortran/104331
* gfortran.dg/eoshift_9.f90: New test.
Jakub Jelinek [Tue, 1 Feb 2022 19:48:03 +0000 (20:48 +0100)]
libcpp: Fix up padding handling in funlike_invocation_p [PR104147]
As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one). The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
if (token->type == CPP_PADDING)
{
if (source == NULL
|| (!(source->flags & PREV_WHITE)
&& token->val.source == NULL))
source = token->val.source;
continue;
}
and later on
/* Leading white space? */
if (dest - 1 != BUFF_FRONT (pfile->u_buff))
{
if (source == NULL)
source = token;
if (source->flags & PREV_WHITE)
*dest++ = ' ';
}
source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.
Now, the funlike_invocation_p condition
if (padding == NULL
|| (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to do the same thing and pick the right padding.
But there are significant differences. Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
(this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
case in stringify_arg)
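The three cases can be sketched as a small predicate. This is only a model:
Token is a hypothetical pared-down stand-in for cpp_token (its source member
plays the role of val.source), and should_replace_padding is not a libcpp
function.

```cpp
#include <cassert>

// Hypothetical, pared-down stand-ins for libcpp's token type and flag.
constexpr unsigned PREV_WHITE = 1u << 0;

struct Token {
  unsigned flags = 0;
  const Token *source = nullptr;  // models val.source of a CPP_PADDING token
};

// Decide whether `token` (a padding token) should replace the padding
// token remembered so far, following the three cases listed above.
bool should_replace_padding(const Token *padding, const Token &token) {
  // Padding tokens themselves are expected not to carry PREV_WHITE.
  assert(!(token.flags & PREV_WHITE));
  if (padding == nullptr)          // 1) no padding remembered yet
    return true;
  if (padding->source == nullptr)  // 2) remembered padding has no source
    return true;
  // 3) remembered source lacks PREV_WHITE and the new token has no source
  return !(padding->source->flags & PREV_WHITE) && token.source == nullptr;
}
```

In the scenario described earlier (avoid_paste followed by a padding_token
whose source has PREV_WHITE), the second token now replaces the first, and a
later avoid_paste does not displace it.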
2022-02-01 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/104147
* macro.cc (funlike_invocation_p): For padding prefer a token
with val.source non-NULL especially if it has PREV_WHITE set
on val.source->flags. Add gcc_assert that CPP_PADDING tokens
don't have PREV_WHITE set in flags.
* c-c++-common/cpp/pr104147.c: New test.
Jakub Jelinek [Tue, 1 Feb 2022 19:42:49 +0000 (20:42 +0100)]
libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens
The funlike_invocation_p assert never triggered, the other
asserts did on some tests, see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
pfile->directive_result.src_loc = pragma_token_virt_loc;
pfile->directive_result.type = CPP_PRAGMA;
pfile->directive_result.flags = pragma_token->flags;
pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
{
if (pfile->directive_result.type == CPP_PADDING)
continue;
result = &pfile->directive_result;
}
but destringize_and_run does not:
if (pfile->directive_result.type == CPP_PRAGMA)
{
...
}
else
{
count = 1;
toks = XNEW (cpp_token);
toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
_Pragma ("omp simd")
for (int i = 0; i < 64; i++)
;
}
Here is a patch that replaces those
toks = XNEW (cpp_token);
toks[0] = pfile->directive_result;
lines with
toks = &pfile->avoid_paste;
2022-02-01 Jakub Jelinek <jakub@redhat.com>
* directives.cc (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.
Jakub Jelinek [Tue, 1 Feb 2022 19:22:14 +0000 (20:22 +0100)]
rs6000: Fix up PCH on powerpc* [PR104323]
As mentioned in the PR and as can be seen on:
--- gcc/testsuite/gcc.dg/pch/pr104323-1.c.jj 2022-02-01 13:06:00.163192414 +0100
+++ gcc/testsuite/gcc.dg/pch/pr104323-1.c 2022-02-01 13:13:41.226712735 +0100
@@ -0,0 +1,16 @@
+/* PR target/104323 */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include "pr104323-1.h"
+
+__vector int a1 = { 100, 200, 300, 400 };
+__vector int a2 = { 500, 600, 700, 800 };
+__vector int r;
+
+int
+main ()
+{
+ r = vec_add (a1, a2);
+ return 0;
+}
--- gcc/testsuite/gcc.dg/pch/pr104323-1.hs.jj 2022-02-01 13:06:03.180149978 +0100
+++ gcc/testsuite/gcc.dg/pch/pr104323-1.hs 2022-02-01 13:12:30.175706620 +0100
@@ -0,0 +1,5 @@
+/* PR target/104323 */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include <altivec.h>
testcase which I'm not including into testsuite because for some reason
the test fails on non-powerpc* targets (is done even on those and fails
because of missing altivec.h etc.), PCH is broken on powerpc*-*-* since the
new builtin generator has been introduced.
The generator contains or emits comments like:
/* #### Cannot mark this as a GC root because only pointer types can
be marked as GTY((user)) and be GC roots. All trees in here are
kept alive by other globals, so not a big deal. Alternatively,
we could change the enum fields to ints and cast them in and out
to avoid requiring a GTY((user)) designation, but that seems
unnecessarily gross. */
Having the fntypes stored in other GC roots can work fine for GC,
ggc_collect will then always mark them and so they won't disappear from
the tables, but it definitely doesn't work for PCH, which when the
arrays with fntype members aren't GTY marked means on PCH write we create
copies of those FUNCTION_TYPEs and store in *.gch that the GC roots should
be updated, but don't store that rs6000_builtin_info[?].fntype etc. should
be updated. When PCH is read again, the blob is read at some other address,
GC roots are updated, rs6000_builtin_info[?].fntype contains garbage
pointers (GC freed pointers with random data, or random unrelated types or
other trees).
The following patch fixes that. It stops any user markings because that
is totally unnecessary, just skips fields we don't need to mark and adds
GTY(()) to the 2 array variables. We can get rid of all those global
vars for the fn types, they can be now automatic vars.
With the patch we get
{
&rs6000_instance_info[0].fntype,
1 * (RS6000_INST_MAX),
sizeof (rs6000_instance_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
{
&rs6000_builtin_info[0].fntype,
1 * (RS6000_BIF_MAX),
sizeof (rs6000_builtin_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
as the new roots which is exactly what we want and significantly more
compact than countless
{
&uv2di_ftype_pudi_usi,
1,
sizeof (uv2di_ftype_pudi_usi),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
{
&uv2di_ftype_lg_puv2di,
1,
sizeof (uv2di_ftype_lg_puv2di),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
{
&uv2di_ftype_lg_pudi,
1,
sizeof (uv2di_ftype_lg_pudi),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
{
&uv2di_ftype_di_puv2di,
1,
sizeof (uv2di_ftype_di_puv2di),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
},
cases (822 of these instead of just those 4 shown).
2022-02-01 Jakub Jelinek <jakub@redhat.com>
PR target/104323
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h
rather than $(srcdir)/config/rs6000/rs6000-builtins.def.
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use
GTY((user)) for struct bifdata and struct ovlddata. Instead add
GTY((skip(""))) to members with pointer and enum types that don't need
to be tracked. Add GTY(()) to rs6000_builtin_info and rs6000_instance_info
declarations. Don't emit gt_ggc_mx and gt_pch_nx declarations.
(write_extern_fntype, write_fntype): Remove.
(write_fntype_init): Emit the fntype vars as automatic vars instead
of file scope ones.
(write_header_file): Don't iterate with write_extern_fntype.
(write_init_file): Don't iterate with write_fntype. Don't emit
gt_ggc_mx and gt_pch_nx definitions.
Jason Merrill [Wed, 26 Jan 2022 21:42:57 +0000 (16:42 -0500)]
c++: lambda in template default argument [PR103186]
The problem with this testcase was that since my patch for PR97900 we
weren't preserving DECL_UID identity for parameters of instantiations of
templated functions, so using those parameters as the keys for the
defarg_inst map broke. I think this was always fragile given the
possibility of redeclarations, so instead of reverting that change let's
switch to keying off the function.
Memory use compiling stdc++.h is not noticeably different.
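For readers unfamiliar with the construct, here is a small sketch of a lambda
used as a default argument of a function template. It is not the testcase
from the PR; scale and its parameters are hypothetical names chosen for
illustration.

```cpp
#include <cassert>

// A function template whose default argument is a captureless lambda,
// converted to a plain function pointer. The default argument is only
// instantiated when a call actually relies on it.
template <typename T>
T scale(T value, T (*op)(T) = [](T v) { return v + v; }) {
  assert(op != nullptr);
  return op(value);
}
```

Each call such as scale(21) or scale(1.5) instantiates the default argument,
and with it the lambda, for the deduced T.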
PR c++/103186
gcc/cp/ChangeLog:
* pt.cc (defarg_inst): Use tree_vec_map_cache_hasher.
(defarg_insts_for): New.
(tsubst_default_argument): Adjust.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-defarg10.C: New test.
Jason Merrill [Thu, 27 Jan 2022 15:53:07 +0000 (10:53 -0500)]
tree: move tree_vec_map_cache_hasher into header
gcc/ChangeLog:
* tree.h (struct tree_vec_map_cache_hasher): Move from...
* tree.cc (struct tree_vec_map_cache_hasher): ...here.
Tom de Vries [Fri, 28 Jan 2022 09:28:59 +0000 (10:28 +0100)]
[nvptx] Add uniform_warp_check insn
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx]
Update default ptx isa to 6.3".
The problem is again that the first diverging branch is not handled as such in
SASS, which causes problems with a subsequent shfl insn, but given that we
have -mptx=3.1 we can't use the bar.warp.sync insn.
Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of
lesser importance, implement the next best thing: abort when detecting
non-convergence using this insn:
...
{ .reg.b32 act;
vote.ballot.b32 act,1;
.reg.pred uni;
setp.eq.b32 uni,act,0xffffffff;
@ !uni trap;
@ !uni exit;
}
...
Interestingly, the effect of this is that rather than aborting, the test-case
now passes.
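The logic of the uniform_warp_check sequence can be modelled on the host.
This is a simulation sketch only, not device code: a std::array of 32 flags
stands in for the active lanes of a warp, and the function names are
hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWarpSize = 32;

// Host-side model of "vote.ballot.b32 act,1": each active lane
// contributes one bit to the mask.
std::uint32_t ballot(const std::array<bool, kWarpSize> &active) {
  std::uint32_t mask = 0;
  for (std::size_t lane = 0; lane < kWarpSize; ++lane)
    if (active[lane])
      mask |= std::uint32_t{1} << lane;
  return mask;
}

// Model of "setp.eq.b32 uni,act,0xffffffff": the warp is uniform only
// if every lane voted.
bool warp_is_uniform(const std::array<bool, kWarpSize> &active) {
  return ballot(active) == 0xffffffffu;
}

// Model of "@ !uni trap": abort when the warp has diverged.
void uniform_warp_check(const std::array<bool, kWarpSize> &active) {
  assert(warp_is_uniform(active));
}
```

A single inactive lane clears one bit of the ballot mask, so the comparison
against 0xffffffff fails and the check traps.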
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_uniform_warp_check.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_UNIFORM_WARP_CHECK.
(define_insn "nvptx_uniform_warp_check"): New define_insn.
Tom de Vries [Thu, 27 Jan 2022 14:03:59 +0000 (15:03 +0100)]
[nvptx] Add bar.warp.sync
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx] Update
default ptx isa to 6.3".
The first divergent branch looks like:
...
{
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r59,%x,0;
}
@ %r59 bra $L15;
mov.u64 %r48,%ar0;
mov.u32 %r22,2;
ld.u64 %r53,[%r48];
mov.u32 %r55,%r22;
mov.u32 %r54,1;
$L15:
...
and when inspecting the generated SASS, the branch is not set up as a divergent
branch, but instead as a regular branch.
This causes us to execute a shfl.sync insn in divergent mode, which is likely
to cause trouble given a remark in the ptx isa version 6.3, which mentions
that for .target sm_6x or below, all threads must execute the same
shfl.sync instruction in convergence.
Fix this by placing a "bar.warp.sync 0xffffffff" at the desired convergence
point (in the example above, after $L15).
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_warpsync.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_WARPSYNC.
(define_insn "nvptx_warpsync"): New define_insn.
Tom de Vries [Wed, 26 Jan 2022 13:17:40 +0000 (14:17 +0100)]
[nvptx] Update default ptx isa to 6.3
With the following example, minimized from parallel-dims.c:
...
int
main (void)
{
int vectors_max = -1;
#pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
{
for (int i = 0; i < 2; i++)
for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction (max: vectors_max)
for (int k = 0; k < 32; k++)
vectors_max = k;
}
if (vectors_max != 31)
__builtin_abort ();
return 0;
}
...
I run into (T400, driver version 470.94):
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
execution test
...
The FAIL does not happen with GOMP_NVPTX_JIT=-O0.
The problem seems to be that the shfl insns for the vector reduction are not
executed uniformly by the warp. Enforcing this by using shfl.sync fixes the
problem.
Fix this by setting the ptx isa to 6.3 by default, which allows the use of
shfl.sync.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 by default.
Tom de Vries [Wed, 26 Jan 2022 13:16:42 +0000 (14:16 +0100)]
[nvptx] Update bar.sync for ptx isa 6.0
In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.
The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.
This seems fine for the form "bar.sync 0".
But a "bar.sync %rx,64" (as used for vector length > 32) may execute a
different barrier depending on the value of %rx, so we can't assume it's
aligned.
Fix this by using "barrier.sync %rx,64" instead.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_6_0.
* config/nvptx/nvptx.h (TARGET_PTX_6_0): New macro.
* config/nvptx/nvptx.md (define_insn "nvptx_barsync"): Use barrier
insn for TARGET_PTX_6_0.
Tom de Vries [Sun, 23 Jan 2022 05:42:24 +0000 (06:42 +0100)]
[nvptx] Handle nop in prevent_branch_around_nothing
When running libgomp test-case reduction-7.c on an nvptx accelerator
(T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into:
...
reduction-7.exe:reduction-7.c:312: v_p_2: \
Assertion `out[j * 32 + i] == (i + j) * 2' failed.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-7.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
During investigation I found ptx code like this:
...
@ %r163 bra $L262;
$L262:
...
There's a known problem with executing this type of code, and a workaround is
in place to address this: prevent_branch_around_nothing. The workaround does
not trigger though because it doesn't handle the nop insn.
Fix this by handling the nop insn in prevent_branch_around_nothing.
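The detection can be sketched over a toy instruction list. This is a
simplified model, not the code in nvptx.cc: Kind, Insn and
branches_around_nothing are hypothetical, and the nop insn is assumed to
produce no PTX output, which is consistent with the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model; the real pass walks GCC's RTL insn chain.
enum class Kind { Branch, Label, Nop, Other };

struct Insn {
  Kind kind;
  int label;  // target for Branch, id for Label, unused otherwise
};

// Returns true if the branch at index `i` jumps to a label that follows it
// with nothing but nops in between, i.e. it branches around nothing.
bool branches_around_nothing(const std::vector<Insn> &insns, std::size_t i) {
  assert(i < insns.size() && insns[i].kind == Kind::Branch);
  for (std::size_t j = i + 1; j < insns.size(); ++j) {
    if (insns[j].kind == Kind::Nop)
      continue;  // assumed to emit nothing, so it does not count
    return insns[j].kind == Kind::Label && insns[j].label == insns[i].label;
  }
  return false;
}
```

Without the "continue" for Kind::Nop, a branch followed by a nop and then its
own target label would go undetected, which matches the failure described
above.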
Tested libgomp on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
PR target/100428
* config/nvptx/nvptx.cc (prevent_branch_around_nothing): Handle nop
insn.
Tom de Vries [Fri, 21 Jan 2022 20:46:05 +0000 (21:46 +0100)]
[nvptx] Add some support for .local atomics
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing given that actually the operation is not
supported on local address space.
Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.
This only solves some cases that are detected at compile-time. It does
however fix the openacc private-atomic-* test-cases.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
(define_insn "atomic_exchange<mode>")
(define_insn "atomic_fetch_add<mode>")
(define_insn "atomic_fetch_addsf")
(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
if the memory operand is frame-relative.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/stack-atomics-run.c: New test.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
PR83812 workaround.
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
Tom de Vries [Fri, 21 Jan 2022 09:57:43 +0000 (10:57 +0100)]
[nvptx] Fix reduction lock
When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
execution test
...
The problem is in this code generated for a gang reduction:
...
$L39:
atom.global.cas.b32 %r59, [__reduction_lock], 0, 1;
setp.ne.u32 %r116, %r59, 0;
@%r116 bra $L39;
ld.f64 %r60, [%r44];
ld.f64 %r61, [%r44+8];
ld.f64 %r64, [%r44];
ld.f64 %r65, [%r44+8];
add.f64 %r117, %r64, %r22;
add.f64 %r118, %r65, %r41;
st.f64 [%r44], %r117;
st.f64 [%r44+8], %r118;
atom.global.cas.b32 %r119, [__reduction_lock], 1, 0;
...
which is taking and releasing a lock, but missing the appropriate barriers to
protect the loads and stores inside the lock.
Fix this by adding membar.gl barriers.
Likewise, add membar.cta barriers if we protect shared memory loads and
stores (even though the worker-partitioning part of the test-case is not
failing).
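A host-side analogue in C++11 atomics may help show where the barriers go.
This is a sketch under assumptions, not the generated PTX: the relaxed
compare-and-swap operations play the role of atom.global.cas, the
std::atomic_thread_fence calls play the role of membar.gl, and Complex,
reduction_lock and locked_add are hypothetical names. The exact placement
in the patch is assumed to be after taking and before releasing the lock.

```cpp
#include <atomic>
#include <cassert>

struct Complex {
  double re;
  double im;
};

// 0 = unlocked, 1 = locked; plays the role of __reduction_lock.
static std::atomic<unsigned> reduction_lock{0};

void locked_add(Complex &acc, Complex delta) {
  // Take the lock: spin until the 0 -> 1 compare-and-swap succeeds.
  unsigned expected = 0;
  while (!reduction_lock.compare_exchange_weak(expected, 1u,
                                               std::memory_order_relaxed))
    expected = 0;
  // Barrier after acquiring: the loads below must not move above it.
  std::atomic_thread_fence(std::memory_order_seq_cst);

  acc.re += delta.re;
  acc.im += delta.im;

  // Barrier before releasing: the stores above must not move below it.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  unsigned locked = 1;
  bool released = reduction_lock.compare_exchange_strong(
      locked, 0u, std::memory_order_relaxed);
  assert(released);
  (void)released;
}
```

Dropping either fence leaves only relaxed operations on the lock word, so
nothing orders the protected loads and stores against the lock, which is the
situation the commit describes.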
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (enum nvptx_builtins): Add
NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA.
(VOID): New macro.
(nvptx_init_builtins): Add MEMBAR_GL and MEMBAR_CTA.
(nvptx_expand_builtin): Handle NVPTX_BUILTIN_MEMBAR_GL and
NVPTX_BUILTIN_MEMBAR_CTA.
(nvptx_lockfull_update): Add level parameter. Emit barriers.
(nvptx_reduction_update, nvptx_goacc_reduction_fini): Update call to
nvptx_lockfull_update.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_MEMBAR_GL.
(define_expand "nvptx_membar_gl"): New expand.
(define_insn "*nvptx_membar_gl"): New insn.
Thomas Rodgers [Mon, 31 Jan 2022 21:39:44 +0000 (13:39 -0800)]
Strengthen memory order for atomic<T>::wait/notify
This matches the memory order in libc++.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h: Change memory order from
Acquire/Release with relaxed loads to SeqCst+Release for
accesses to the waiter's count.
Martin Liska [Tue, 1 Feb 2022 15:35:47 +0000 (16:35 +0100)]
docs: remove --disable-stage1-checking from requirements
As the minimal GCC version that can build the current master
is 4.8, it does not make sense mentioning something for older
versions.
gcc/ChangeLog:
* doc/install.texi: Remove option for GCC < 4.8.
Jakub Jelinek [Tue, 1 Feb 2022 15:02:54 +0000 (16:02 +0100)]
veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]
The following testcase fails -fcompare-debug, because expand_vector_comparison
since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets
vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses
the lhs in its condition.
Obviously we should ignore debug stmts when doing so, e.g. by not pushing
them to uses.
That would be a 2 liner change, but while looking at it, I'm also worried
about VEC_COND_EXPRs that would use the lhs in more than one operand,
like VEC_COND_EXPR <lhs, lhs, something> or VEC_COND_EXPR <lhs, something, lhs>
(sure, they ought to be folded, but what if they weren't). Because if
something like that happens, then FOR_EACH_IMM_USE_FAST would push the same
stmt multiple times and expand_vector_condition can return true even when
it modifies it (for vector bool masking).
And lastly, it seems quite wasteful to safe_push statements that will just
cause vec_cond_expr_only = false; and break; in the second loop, both for
cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like
999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR
use. So this patch only pushes VEC_COND_EXPRs there.
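The resulting scan over the immediate uses can be sketched with a toy model.
UseStmt and collect_vec_cond_uses are hypothetical; the real code iterates
with FOR_EACH_IMM_USE_FAST over gimple statements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one immediate use of the comparison's lhs.
struct UseStmt {
  bool is_debug;       // a debug stmt
  bool is_vec_cond;    // a VEC_COND_EXPR assignment
  bool lhs_in_cond;    // lhs appears as rhs1 (the condition)
  bool lhs_in_values;  // lhs also appears as rhs2 or rhs3
};

// Collects the indices of VEC_COND_EXPR uses worth expanding and reports
// whether the lhs is used only as a VEC_COND_EXPR condition.
bool collect_vec_cond_uses(const std::vector<UseStmt> &imm_uses,
                           std::vector<std::size_t> *uses) {
  assert(uses != nullptr);
  uses->clear();
  bool vec_cond_expr_only = true;
  for (std::size_t i = 0; i < imm_uses.size(); ++i) {
    const UseStmt &u = imm_uses[i];
    if (u.is_debug)
      continue;  // debug stmts must not influence code generation
    if (u.is_vec_cond && u.lhs_in_cond && !u.lhs_in_values)
      uses->push_back(i);  // only these are remembered
    else
      vec_cond_expr_only = false;  // anything else is not pushed at all
  }
  return vec_cond_expr_only;
}
```

Because debug statements are skipped before any decision is made, the result
is the same with and without debug info, which is what -fcompare-debug
checks.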
2022-02-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104307
* tree-vect-generic.cc (expand_vector_comparison): Don't push debug
stmts to uses vector, just set vec_cond_expr_only to false for
non-VEC_COND_EXPRs instead of pushing them into uses. Treat
VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too
like non-VEC_COND_EXPRs.
* gcc.target/i386/pr104307.c: New test.
Bill Schmidt [Mon, 31 Jan 2022 18:28:12 +0000 (12:28 -0600)]
rs6000: Don't #ifdef "short" built-in names
It was recently pointed out that we get anomalous behavior when using
__attribute__((target)) to select a CPU. As an example, when building for
-mcpu=power8 but using __attribute__((target("mcpu=power10"))), it is legal
to call __builtin_vec_mod, but not vec_mod, even though these are
equivalent. This is because the equivalence is established with a #define
that is guarded by #ifdef _ARCH_PWR10.
This goofy behavior occurs with both the old builtins support and the
new. One of the goals of the new builtins support was to make sure all
appropriate interfaces are available using __attribute__((target)), so I
failed in this respect. This patch corrects the problem by removing the
ifdef. Note that in a few cases we use an ifdef in a way that can't be
overridden by __attribute__((target)), and we need to keep those. For
example, #ifdef __PPU__ is still appropriate.
2022-01-06 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-overload.def (VEC_ABSD): Remove #ifdef token.
(VEC_BLENDV): Likewise.
(VEC_BPERM): Likewise.
(VEC_CFUGE): Likewise.
(VEC_CIPHER_BE): Likewise.
(VEC_CIPHERLAST_BE): Likewise.
(VEC_CLRL): Likewise.
(VEC_CLRR): Likewise.
(VEC_CMPNEZ): Likewise.
(VEC_CNTLZ): Likewise.
(VEC_CNTLZM): Likewise.
(VEC_CNTTZM): Likewise.
(VEC_CNTLZ_LSBB): Likewise.
(VEC_CNTM): Likewise.
(VEC_CNTTZ): Likewise.
(VEC_CNTTZ_LSBB): Likewise.
(VEC_CONVERT_4F32_8F16): Likewise.
(VEC_DIV): Likewise.
(VEC_DIVE): Likewise.
(VEC_EQV): Likewise.
(VEC_EXPANDM): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTH): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTL): Likewise.
(VEC_EXTRACTH): Likewise.
(VEC_EXTRACTL): Likewise.
(VEC_EXTRACTM): Likewise.
(VEC_EXTRACT4B): Likewise.
(VEC_EXTULX): Likewise.
(VEC_EXTURX): Likewise.
(VEC_FIRSTMATCHINDEX): Likewise.
(VEC_FIRSTMATCHOREOSINDEX): Likewise.
(VEC_FIRSTMISMATCHINDEX): Likewise.
(VEC_FIRSTMISMATCHOREOSINDEX): Likewise.
(VEC_GB): Likewise.
(VEC_GENBM): Likewise.
(VEC_GENHM): Likewise.
(VEC_GENWM): Likewise.
(VEC_GENDM): Likewise.
(VEC_GENQM): Likewise.
(VEC_GENPCVM): Likewise.
(VEC_GNB): Likewise.
(VEC_INSERTH): Likewise.
(VEC_INSERTL): Likewise.
(VEC_INSERT4B): Likewise.
(VEC_LXVL): Likewise.
(VEC_MERGEE): Likewise.
(VEC_MERGEO): Likewise.
(VEC_MOD): Likewise.
(VEC_MSUB): Likewise.
(VEC_MULH): Likewise.
(VEC_NAND): Likewise.
(VEC_NCIPHER_BE): Likewise.
(VEC_NCIPHERLAST_BE): Likewise.
(VEC_NEARBYINT): Likewise.
(VEC_NMADD): Likewise.
(VEC_ORC): Likewise.
(VEC_PDEP): Likewise.
(VEC_PERMX): Likewise.
(VEC_PEXT): Likewise.
(VEC_POPCNT): Likewise.
(VEC_PARITY_LSBB): Likewise.
(VEC_REPLACE_ELT): Likewise.
(VEC_REPLACE_UN): Likewise.
(VEC_REVB): Likewise.
(VEC_RINT): Likewise.
(VEC_RLMI): Likewise.
(VEC_RLNM): Likewise.
(VEC_SBOX_BE): Likewise.
(VEC_SIGNEXTI): Likewise.
(VEC_SIGNEXTLL): Likewise.
(VEC_SIGNEXTQ): Likewise.
(VEC_SLDB): Likewise.
(VEC_SLV): Likewise.
(VEC_SPLATI): Likewise.
(VEC_SPLATID): Likewise.
(VEC_SPLATI_INS): Likewise.
(VEC_SQRT): Likewise.
(VEC_SRDB): Likewise.
(VEC_SRV): Likewise.
(VEC_STRIL): Likewise.
(VEC_STRIL_P): Likewise.
(VEC_STRIR): Likewise.
(VEC_STRIR_P): Likewise.
(VEC_STXVL): Likewise.
(VEC_TERNARYLOGIC): Likewise.
(VEC_TEST_LSBB_ALL_ONES): Likewise.
(VEC_TEST_LSBB_ALL_ZEROS): Likewise.
(VEC_VEE): Likewise.
(VEC_VES): Likewise.
(VEC_VIE): Likewise.
(VEC_VPRTYB): Likewise.
(VEC_VSCEEQ): Likewise.
(VEC_VSCEGT): Likewise.
(VEC_VSCELT): Likewise.
(VEC_VSCEUO): Likewise.
(VEC_VSEE): Likewise.
(VEC_VSES): Likewise.
(VEC_VSIE): Likewise.
(VEC_VSTDC): Likewise.
(VEC_VSTDCN): Likewise.
(VEC_VTDC): Likewise.
(VEC_XL): Likewise.
(VEC_XL_BE): Likewise.
(VEC_XL_LEN_R): Likewise.
(VEC_XL_SEXT): Likewise.
(VEC_XL_ZEXT): Likewise.
(VEC_XST): Likewise.
(VEC_XST_BE): Likewise.
(VEC_XST_LEN_R): Likewise.
(VEC_XST_TRUNC): Likewise.
(VEC_XXPERMDI): Likewise.
(VEC_XXSLDWI): Likewise.
(VEC_TSTSFI_EQ_DD): Likewise.
(VEC_TSTSFI_EQ_TD): Likewise.
(VEC_TSTSFI_GT_DD): Likewise.
(VEC_TSTSFI_GT_TD): Likewise.
(VEC_TSTSFI_LT_DD): Likewise.
(VEC_TSTSFI_LT_TD): Likewise.
(VEC_TSTSFI_OV_DD): Likewise.
(VEC_TSTSFI_OV_TD): Likewise.
(VEC_VADDCUQ): Likewise.
(VEC_VADDECUQ): Likewise.
(VEC_VADDEUQM): Likewise.
(VEC_VADDUDM): Likewise.
(VEC_VADDUQM): Likewise.
(VEC_VBPERMQ): Likewise.
(VEC_VCLZB): Likewise.
(VEC_VCLZD): Likewise.
(VEC_VCLZH): Likewise.
(VEC_VCLZW): Likewise.
(VEC_VCTZB): Likewise.
(VEC_VCTZD): Likewise.
(VEC_VCTZH): Likewise.
(VEC_VCTZW): Likewise.
(VEC_VEEDP): Likewise.
(VEC_VEESP): Likewise.
(VEC_VESDP): Likewise.
(VEC_VESSP): Likewise.
(VEC_VIEDP): Likewise.
(VEC_VIESP): Likewise.
(VEC_VPKSDSS): Likewise.
(VEC_VPKSDUS): Likewise.
(VEC_VPKUDUM): Likewise.
(VEC_VPKUDUS): Likewise.
(VEC_VPOPCNT): Likewise.
(VEC_VPOPCNTB): Likewise.
(VEC_VPOPCNTD): Likewise.
(VEC_VPOPCNTH): Likewise.
(VEC_VPOPCNTW): Likewise.
(VEC_VPRTYBD): Likewise.
(VEC_VPRTYBQ): Likewise.
(VEC_VPRTYBW): Likewise.
(VEC_VRLD): Likewise.
(VEC_VSLD): Likewise.
(VEC_VSRAD): Likewise.
(VEC_VSRD): Likewise.
(VEC_VSTDCDP): Likewise.
(VEC_VSTDCNDP): Likewise.
(VEC_VSTDCNQP): Likewise.
(VEC_VSTDCNSP): Likewise.
(VEC_VSTDCQP): Likewise.
(VEC_VSTDCSP): Likewise.
(VEC_VSUBECUQ): Likewise.
(VEC_VSUBEUQM): Likewise.
(VEC_VSUBUDM): Likewise.
(VEC_VSUBUQM): Likewise.
(VEC_VTDCDP): Likewise.
(VEC_VTDCSP): Likewise.
(VEC_VUPKHSW): Likewise.
(VEC_VUPKLSW): Likewise.
Andreas Krebbel [Tue, 1 Feb 2022 12:33:55 +0000 (13:33 +0100)]
PR101260 regcprop: Add mode change check for copy reg
When propagating a multi-word register into an access with a smaller
mode the can_change_mode backend hook is already consulted for the
original register. This however is also required for the intermediate
copy in copy_regno which might use a different register class.
gcc/ChangeLog:
PR rtl-optimization/101260
* regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for
copy_regno.
gcc/testsuite/ChangeLog:
PR rtl-optimization/101260
* gcc.target/s390/pr101260.c: New testcase.
Xi Ruoyao [Sun, 30 Jan 2022 17:15:20 +0000 (01:15 +0800)]
fold-const: do not fold NaN result from non-NaN operands [PR95115]
These operations should raise an invalid operation exception at runtime.
So they should not be folded during compilation unless -fno-trapping-math
is used.
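A minimal sketch of the kind of expression affected, assuming IEEE 754
doubles. The helper names inf_minus_inf and is_nan are hypothetical and
not part of the patch. The operands are volatile so that the subtraction
is done at run time, which is where the invalid operation exception
would be raised.

```c
#include <math.h>

/* Hypothetical helper: Inf - Inf has non-NaN operands but a NaN result.
   The volatile qualifiers keep the compiler from folding the
   subtraction at compile time.  */
double
inf_minus_inf (void)
{
  volatile double a = INFINITY;
  volatile double b = INFINITY;
  return a - b;
}

/* NaN is the only value that compares unequal to itself.  */
int
is_nan (double x)
{
  return x != x;
}
```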
gcc/
PR middle-end/95115
* fold-const.cc (const_binop): Do not fold NaN result from
non-NaN operands.
gcc/testsuite
* gcc.dg/pr95115.c: New test.
Tom de Vries [Sun, 23 Jan 2022 05:29:58 +0000 (06:29 +0100)]
[libgomp, testsuite] Fix insufficient resources in test-cases
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_workers = x and vector_length = y' on \
that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
The error does not occur when using GOMP_NVPTX_JIT=-O0.
Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.
Likewise for some other test-cases.
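The arithmetic behind the new worker count, as a sketch. The limit of
896 is taken from the error message above; max_workers is a hypothetical
name used only for illustration.

```c
/* Hypothetical helper: the largest num_workers such that
   num_workers * vector_length <= limit.  */
int
max_workers (int limit, int vector_length)
{
  return limit / vector_length;
}
```

With limit 896 and vector_length 32 this gives the 28 workers used in
the test-cases.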
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
num_workers for nvidia accelerator to fix libgomp error 'insufficient
resources'.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
Tom de Vries [Thu, 20 Jan 2022 12:37:08 +0000 (13:37 +0100)]
[libgomp, testsuite] Reduce recursion depth in declare_target-*.f90
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test
...
The test-case contains:
...
! Reduced from 25 to 23, otherwise execution runs out of thread stack on
! Nvidia Titan V.
if (fib (23) /= fib_wrapper (23)) stop 2
...
Fix this by reducing the fib/fib_wrapper argument from 23 to 22.
Same for declare_target-2.f90.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
Tom de Vries [Mon, 31 Jan 2022 16:05:28 +0000 (17:05 +0100)]
[ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns
As mentioned in PR56888 comment 21:
...
-fno-tree-loop-distribute-patterns is the reliable way to not
transform loops into library calls.
...
However, since commit
6f966f06146 ("ldist: Recognize strlen and rawmemchr like
loops") a strlen or rawmemchr library call may be introduced by ldist.
This caused regressions in testcases
gcc.c-torture/execute/builtins/strlen{,-2,-3}.c for nvptx.
Fix this by not calling transform_reduction_loop from
loop_distribution::execute for -fno-tree-loop-distribute-patterns.
Tested regressed test-cases as well as gcc.dg/tree-ssa/ldist-*.c on
nvptx.
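For illustration, a strlen-like loop of the general shape involved. This
is a sketch, not the new test: count_until_nul is a hypothetical name,
and whether ldist actually replaces this particular loop with a strlen
call depends on the target and options.

```c
#include <stddef.h>

/* A strlen-like loop.  Assumption: this is the kind of loop that ldist
   may turn into a strlen library call unless
   -fno-tree-loop-distribute-patterns is given.  */
size_t
count_until_nul (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}
```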
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* tree-loop-distribution.cc (generate_reduction_builtin_1): Check for
-ftree-loop-distribute-patterns.
(loop_distribution::execute): Don't call transform_reduction_loop for
-fno-tree-loop-distribute-patterns.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.dg/tree-ssa/ldist-strlen-4.c: New test.
GCC Administrator [Tue, 1 Feb 2022 00:16:29 +0000 (00:16 +0000)]
Daily bump.
Andrew Pinski [Thu, 27 Jan 2022 01:22:48 +0000 (01:22 +0000)]
Fix comment for operand_compare::operand_equal_p.
The OEP_* enums were moved to tree-core.h in
r0-124973-g5e351e960763 but the comment was correct
when it was added to fold-const.h in
r10-4231-g7f4a8ee03d40. This fixes the reference
to the OEP_* enum to reference tree-core.
Committed as obvious after a bootstrap/test on x86_64-linux.
gcc/ChangeLog:
* fold-const.h (operand_compare::operand_equal_p):
Fix comment about OEP_* flags.
Ed Smith-Rowland [Mon, 31 Jan 2022 23:01:42 +0000 (18:01 -0500)]
MAINTAINERS: Update my email and add myself to the DCO list.
ChangeLog:
2022-01-31 Ed Smith-Rowland <esmithrowland@gmail.com>
* MAINTAINERS: Update my email and add myself to the DCO list.
Marek Polacek [Thu, 27 Jan 2022 23:11:03 +0000 (18:11 -0500)]
c++: ICE with auto[] and VLA [PR102414]
Here we ICE in unify_array_domain when we're trying to deduce the type
of an array, as in
auto(*p)[i] = (int(*)[i])0;
but unify_array_domain doesn't handle arbitrarily complex bounds. Another
test is, e.g.,
auto (*b)[0/0] = &a;
where the type of the array is
<<< Unknown tree: template_type_parm >>>[0:(sizetype) ((ssizetype) (0 / 0) - 1)]
It seems to me that we need not handle these.
PR c++/102414
PR c++/101874
gcc/cp/ChangeLog:
* decl.cc (create_array_type_for_decl): Use template_placeholder_p.
Sorry on a variable-length array of auto.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-array3.C: New test.
* g++.dg/cpp23/auto-array4.C: New test.
Marek Polacek [Sat, 29 Jan 2022 01:01:06 +0000 (20:01 -0500)]
c++: Reject union std::initializer_list [PR102434]
Weird things are going to happen if you define your std::initializer_list
as a union. In this case, we crash in output_constructor_regular_field.
Let's not allow such a definition in the first place.
PR c++/102434
gcc/cp/ChangeLog:
* class.cc (finish_struct): Don't allow union initializer_list.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist128.C: New test.
Patrick Palka [Mon, 31 Jan 2022 20:27:58 +0000 (15:27 -0500)]
c++: CTAD for class tmpl defined inside partial spec [PR104294]
Here during deduction guide generation for the nested class template
B<char(int)>::C, the computation of outer_args yields the template
arguments relative to the primary template for B (i.e. {char(int)})
but what we really want is those relative to C's enclosing scope, the
partial specialization of B (i.e. {char, int}).
PR c++/104294
gcc/cp/ChangeLog:
* pt.cc (ctor_deduction_guides_for): Correct computation of
outer_args.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction106.C: New test.
Patrick Palka [Mon, 31 Jan 2022 19:15:01 +0000 (14:15 -0500)]
c++: CONSTRUCTORs are non-deduced contexts [PR104291]
PR c++/104291
gcc/cp/ChangeLog:
* pt.cc (for_each_template_parm_r) <case CONSTRUCTOR>: Clear
walk_subtrees if !include_nondeduced_p. Simplify given that
cp_walk_subtrees already walks TYPE_PTRMEMFUNC_FN_TYPE_RAW.
gcc/testsuite/ChangeLog:
* g++.dg/template/partial20.C: New test.
Jakub Jelinek [Mon, 31 Jan 2022 19:08:18 +0000 (20:08 +0100)]
rs6000: Fix up build of non-glibc/aix/darwin powerpc* targets [PR104298]
As reported by Martin, while David has added OPTION_GLIBC define to aix
and Iain to darwin, all the other non-linux targets now fail because
the macro used in rs6000.md isn't defined.
One possibility is to define this macro in option-defaults.h which on rs6000
targets is included last, then we don't need to define it in aix/darwin
headers and for targets using linux.h or linux64.h it will DTRT too.
The other option is the first 2 hunks + changing the 3
if (!OPTION_GLIBC)
FAIL;
cases in rs6000.md to e.g.
#ifdef OPTION_GLIBC
if (!OPTION_GLIBC)
#endif
FAIL;
or to:
#ifdef OPTION_GLIBC
if (!OPTION_GLIBC)
#else
if (true)
#endif
FAIL;
(the latter case if Richi wants to push the -Wunreachable-code changes for
GCC 13).
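The first possibility can be sketched as follows. The #ifndef fallback
mirrors the ChangeLog entry for option-defaults.h below; expander_applies
is a hypothetical stand-in for an expander body in rs6000.md and returns
0 where the real code would FAIL.

```c
/* Fallback as described for option-defaults.h: define the macro to 0
   unless an earlier header (e.g. linux.h) already defined it.  */
#ifndef OPTION_GLIBC
#define OPTION_GLIBC 0
#endif

/* Hypothetical stand-in for an expander body; returns 0 where the
   real code would FAIL.  */
int
expander_applies (void)
{
  if (!OPTION_GLIBC)
    return 0;
  return 1;
}
```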
2022-01-31 Jakub Jelinek <jakub@redhat.com>
PR target/104298
* config/rs6000/aix.h (OPTION_GLIBC): Remove.
* config/rs6000/darwin.h (OPTION_GLIBC): Likewise.
* config/rs6000/option-defaults.h (OPTION_GLIBC): Define to 0
if not already defined.
Martin Sebor [Mon, 31 Jan 2022 19:04:55 +0000 (12:04 -0700)]
Constrain PHI handling in -Wuse-after-free [PR104232].
Resolves:
PR middle-end/104232 - spurious -Wuse-after-free after conditional free
gcc/ChangeLog:
PR middle-end/104232
* gimple-ssa-warn-access.cc (pointers_related_p): Add argument.
Handle PHIs. Add a synonymous overload.
(pass_waccess::check_pointer_uses): Call pointers_related_p.
gcc/testsuite/ChangeLog:
PR middle-end/104232
* g++.dg/warn/Wuse-after-free4.C: New test.
* gcc.dg/Wuse-after-free-2.c: New test.
* gcc.dg/Wuse-after-free-3.c: New test.
Martin Liska [Mon, 31 Jan 2022 15:39:02 +0000 (16:39 +0100)]
contrib: update analyze_brprob_* scripts.
contrib/ChangeLog:
* analyze_brprob.py: Support more formatted predict.def file.
* analyze_brprob_spec.py: Improve output and documentation.
Nick Clifton [Mon, 31 Jan 2022 14:28:42 +0000 (14:28 +0000)]
libiberty: Fix infinite recursion in rust demangler.
libiberty/
PR demangler/98886
PR demangler/99935
* rust-demangle.c (struct rust_demangler): Add a recursion
counter.
(demangle_path): Increment/decrement the recursion counter upon
entry and exit. Fail if the counter exceeds a fixed limit.
(demangle_type): Likewise.
(rust_demangle_callback): Initialise the recursion counter,
disabling if requested by the option flags.
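A minimal sketch of the recursion-counter technique, not the actual
libiberty code. The struct layout, the toy grammar (path := 'N' path |
'x') and the limit of 1024 are all assumptions made for illustration;
the option-flag override is left out.

```c
#include <stddef.h>

#define RECURSION_LIMIT 1024  /* assumed value, not taken from the patch */

/* Hypothetical stand-in for struct rust_demangler.  */
struct demangler
{
  const char *sym;
  size_t pos;
  unsigned recursion;  /* current nesting depth */
  int errored;
};

/* Toy grammar: path := 'N' path | 'x'.  The counter is incremented on
   entry and decremented on exit; exceeding the limit is an error.  */
void
demangle_path (struct demangler *d)
{
  if (d->errored)
    return;

  if (++d->recursion > RECURSION_LIMIT)
    d->errored = 1;
  else if (d->sym[d->pos] == 'N')
    {
      d->pos++;
      demangle_path (d);
    }
  else if (d->sym[d->pos] == 'x')
    d->pos++;
  else
    d->errored = 1;

  d->recursion--;
}

/* Return 1 if SYM is a complete path within the recursion limit.  */
int
demangle_ok (const char *sym)
{
  struct demangler d = { sym, 0, 0, 0 };
  demangle_path (&d);
  return !d.errored && sym[d.pos] == '\0';
}
```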
Pierre-Marie de Rodat [Tue, 25 Jan 2022 13:27:36 +0000 (13:27 +0000)]
[Ada] doc/share/conf.py: fix string handling
gcc/ada/
* doc/share/conf.py: Remove spurious call to ".decode()".
Arnaud Charlet [Mon, 24 Jan 2022 19:16:27 +0000 (14:16 -0500)]
[Ada] Fix up handling of ghost units PR104027 #2
gcc/ada/
PR ada/104027
* gnat1drv.adb (Gnat1drv): Only call Exit_Program when not
generating code, otherwise instead go to End_Of_Program.
Jakub Jelinek [Mon, 31 Jan 2022 09:30:58 +0000 (10:30 +0100)]
testsuite: Fix up tree-ssa/pr103514.c testcase [PR103514]
> > PR tree-optimization/103514
> > * match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
> > * match.pd (a & b) == (a ^ b) -> !(a | b): New optimization.
> > * gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514
> Note the bug was filed and fixed during stage3, review just didn't happen in
> a reasonable timeframe.
>
> I'm going to ACK this for the trunk and go ahead and commit it for you.
The testcase FAILs on short-circuit targets like powerpc64le-linux.
While the first 2 functions are identical, the last two look like:
<bb 2> :
if (a_5(D) != 0)
goto <bb 3>; [INV]
else
goto <bb 4>; [INV]
<bb 3> :
if (b_6(D) != 0)
goto <bb 5>; [INV]
else
goto <bb 4>; [INV]
<bb 4> :
<bb 5> :
# iftmp.1_4 = PHI <1(3), 0(4)>
_1 = a_5(D) == b_6(D);
_2 = (int) _1;
_3 = _2 ^ iftmp.1_4;
_9 = _2 != iftmp.1_4;
return _9;
instead of the expected:
<bb 2> :
_3 = a_8(D) & b_9(D);
_4 = (int) _3;
_5 = a_8(D) == b_9(D);
_6 = (int) _5;
_1 = a_8(D) | b_9(D);
_2 = ~_1;
_7 = (int) _2;
_10 = ~_1;
return _10;
so no wonder it doesn't match. E.g. x86_64-linux will also use jumps
if it isn't just a && b but a && b && c && d (will do
a & b and c & d tests and jump based on those).
As it is too late to implement this optimization even for the short
circuiting targets this late (not even sure which pass would be best),
this patch just forces non-short-circuiting for the test.
2022-01-31 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103514
* gcc.dg/tree-ssa/pr103514.c: Add
--param logical-op-non-short-circuit=1 to dg-options.
Martin Liska [Mon, 31 Jan 2022 08:49:41 +0000 (09:49 +0100)]
d: Fix -Werror=format-diag error.
PR d/104287
gcc/d/ChangeLog:
* decl.cc (d_finish_decl): Remove trailing dot.
Martin Liska [Fri, 21 Jan 2022 16:10:07 +0000 (17:10 +0100)]
Add mold detection for libs.
libatomic/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libgomp/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libitm/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
Richard Biener [Mon, 24 Jan 2022 13:59:00 +0000 (14:59 +0100)]
Fix multiple_of_p behavior with NOP_EXPR
We were passing down the original type to recursive invocations
of multiple_of_p for say (int)(unsigned * unsigned).
2022-01-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100499
* fold-const.cc (multiple_of_p): Pass the correct type of
the expression to the recursive invocation of multiple_of_p
for conversions and use CASE_CONVERT.
Eric Botcazou [Mon, 31 Jan 2022 08:21:48 +0000 (09:21 +0100)]
Use V8+ default in 32-bit mode on SPARC64/Linux
This is what has been done for ages on SPARC/Solaris and makes it possible
to use 64-bit atomic instructions even in 32-bit mode.
gcc/
PR target/104189
* config/sparc/linux64.h (TARGET_DEFAULT): Add MASK_V8PLUS.
Eric Botcazou [Mon, 31 Jan 2022 08:14:41 +0000 (09:14 +0100)]
Add testcase for incorrect optimization in Ada
gcc/testsuite/
* gnat.dg/div_zero.adb: New test.
Richard Biener [Mon, 24 Jan 2022 13:49:20 +0000 (14:49 +0100)]
Reduce multiple_of_p uses
There are a few cases where we know we're dealing with (poly-)integer
constants, so remove the use of multiple_of_p in those cases to make
the PR100499 fix less impactful.
2022-01-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100499
* tree-cfg.cc (verify_gimple_assign_ternary): Use multiple_p
on poly-ints instead of multiple_of_p.
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Likewise.
(non_rewritable_mem_ref_base): Likewise.
(non_rewritable_lvalue_p): Likewise.
(execute_update_addresses_taken): Likewise.
GCC Administrator [Mon, 31 Jan 2022 00:16:28 +0000 (00:16 +0000)]
Daily bump.
Hans-Peter Nilsson [Sun, 30 Jan 2022 01:01:12 +0000 (02:01 +0100)]
libstdc++ testsuite: Don't run lwg3464.cc tests on simulators
These tests have always been failing for my autotester running a
cris-elf simulator; when unrestrained they take about 20 minutes each,
compared to the (doubled) timeout of 720 seconds, of a total 2h40min
for the whole of the libstdc++-v3 testsuite. The tests cover counter
overflow and are already disabled for LP64 targets.
* testsuite/27_io/basic_istream/get/char/lwg3464.cc: Don't run on
simulator targets.
* testsuite/27_io/basic_istream/get/wchar_t/lwg3464.cc: Likewise.
GCC Administrator [Sun, 30 Jan 2022 00:16:20 +0000 (00:16 +0000)]
Daily bump.
Jakub Jelinek [Sat, 29 Jan 2022 16:55:51 +0000 (17:55 +0100)]
testsuite: Fix up tree-ssa/divide-7.c testcase [PR95424]
This test fails everywhere, because ? doesn't match literal ?.
It should use \\? instead. I've also changed those .s in there.
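The same effect can be shown outside the testsuite. This sketch uses
POSIX extended regular expressions rather than the Tcl regexps that
scan-tree-dump uses, so it only illustrates the metacharacter issue;
ere_matches is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex PATTERN matches TEXT, else 0
   (also 0 if PATTERN fails to compile).  */
int
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

An unescaped ? makes the preceding character optional, so "^a?$" does
not match the text "a?", while "^a\\?$" (a backslash followed by ? once
the C string literal is processed) does.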
2022-01-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95424
* gcc.dg/tree-ssa/divide-7.c: Fix up regexps in scan-tree-dump{,-not}.
Jakub Jelinek [Sat, 29 Jan 2022 16:54:43 +0000 (17:54 +0100)]
match.pd: Fix up 1 / X for unsigned X optimization [PR104280]
On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
> Thanks. Given the original submission and most of the review work was done
> prior to stage3 closing, I went ahead and installed this on the trunk.
Unfortunately this breaks quite a lot of things.
The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
or with TYPE_PRECISION == 1 integral type (or vector boolean).
Violating this causes verification failures in tree-cfg.cc in some cases,
in other cases wrong-code issues because before it is verified we e.g.
transform
1U / x
into
x == 1U
and later into
x (because we assume that == type must be one of the above cases and
when it is the same type as the type of the first operand, for boolean-ish
cases it should be equivalent).
Fixed by changing that
(eq @1 { build_one_cst (type); })
into
(convert (eq:boolean_type_node @1 { build_one_cst (type); }))
Note, I'm not 100% sure if :boolean_type_node is required in that case,
I see some spots in match.pd that look exactly like this, while there is
e.g. (convert (le ...)) that supposedly does the right thing too.
The signed integer 1/X case doesn't need changes, for
(cond (le ...) ...)
le gets correctly boolean_type_node and cond should use type.
I've also reformatted it, some lines were too long, match.pd uses
indentation by 1 column instead of 2 etc.
2022-01-29 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <apinski@marvell.com>
PR tree-optimization/104279
PR tree-optimization/104280
PR tree-optimization/104281
* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
boolean_type_node and convert to type. Formatting fixes.
* gcc.dg/torture/pr104279.c: New test.
* gcc.dg/torture/pr104280.c: New test.
* gcc.dg/torture/pr104281.c: New test.
GCC Administrator [Sat, 29 Jan 2022 00:16:22 +0000 (00:16 +0000)]
Daily bump.
Yoshinori Sato [Fri, 28 Jan 2022 22:16:47 +0000 (17:16 -0500)]
sh-linux fix target cpu
sh-linux does not support any SH1 or little-endian SH2a.
gcc
* config/sh/t-linux (MULTILIB_EXCEPTIONS): Add m1, mb/m1 and m2a.
Navid Rahimi [Fri, 28 Jan 2022 22:11:30 +0000 (17:11 -0500)]
tree-optimization/103514 Missing XOR-EQ-AND Optimization
This patch adds the missed pattern described in bug 103514 [1] to match.pd.
[1] also includes a proof of correctness for the patch.
1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514
gcc/
PR tree-optimization/103514
* match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
(a & b) == (a ^ b) -> !(a | b): New optimization.
gcc/testsuite
* gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
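The two identities can be checked exhaustively for boolean operands. A
small sketch; the function names are hypothetical and are not those of
the testcase.

```c
#include <stdbool.h>

/* Left-hand side of the first pattern.  */
bool
xor_eq_and (bool a, bool b)
{
  return (a & b) ^ (a == b);
}

/* Left-hand side of the second pattern.  */
bool
eq_xor_and (bool a, bool b)
{
  return (a & b) == (a ^ b);
}

/* Common right-hand side: !(a | b).  */
bool
nor (bool a, bool b)
{
  return !(a | b);
}
```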
Marek Polacek [Fri, 28 Jan 2022 20:56:42 +0000 (15:56 -0500)]
doc: Update -Wbidi-chars documentation
gcc/ChangeLog:
* doc/invoke.texi: Update -Wbidi-chars documentation.
Patrick Palka [Fri, 28 Jan 2022 20:41:15 +0000 (15:41 -0500)]
c++: bogus warning with value init of const pmf [PR92752]
Here we're emitting a -Wignored-qualifiers warning for an intermediate
compiler-generated cast of nullptr to 'method-type* const' as part of
value initialization of a const pmf. This patch suppresses the warning
by instead casting to the corresponding unqualified type.
PR c++/92752
gcc/cp/ChangeLog:
* typeck.cc (build_ptrmemfunc): Cast a nullptr constant to the
unqualified pointer type not the qualified one.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wignored-qualifiers2.C: New test.
Co-authored-by: Jason Merrill <jason@redhat.com>
Iain Sandoe [Fri, 28 Jan 2022 19:17:16 +0000 (19:17 +0000)]
Darwin, PPC: Fix bootstrap after GLIBC version changes.
A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h. This broke bootstrap for powerpc Darwin.
Fixed by adding a definition to 0 for OPTION_GLIBC.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/rs6000/darwin.h (OPTION_GLIBC): Define to 0.
Zhao Wei Liew [Fri, 28 Jan 2022 18:36:39 +0000 (13:36 -0500)]
match.pd: Simplify 1 / X for integer X [PR95424]
This patch implements an optimization for the following C++ code:
int f(int x) {
return 1 / x;
}
int f(unsigned int x) {
return 1 / x;
}
Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:
f(int):
xor edx, edx
mov eax, 1
idiv edi
ret
f(unsigned int):
xor edx, edx
mov eax, 1
div edi
ret
In comparison, clang++ -std=c++20 -O3 produces the following assembly:
f(int):
lea ecx, [rdi + 1]
xor eax, eax
cmp ecx, 3
cmovb eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
Clang's output is more efficient as it avoids expensive div operations.
With this patch, GCC now produces the following assembly:
f(int):
lea eax, [rdi + 1]
cmp eax, 2
mov eax, 0
cmovbe eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
which is virtually identical to Clang's assembly output. Any slight differences
in the output for f(int) are possibly related to a different missed optimization.
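Written out in C, the division-free forms look roughly like this. This
is a sketch of the equivalence for x != 0, not the match.pd pattern
itself; the function names are hypothetical.

```c
/* For unsigned x != 0, 1 / x is 1 only when x == 1 and 0 otherwise.  */
unsigned
one_over_unsigned (unsigned x)
{
  return x == 1u;
}

/* For signed x != 0, 1 / x is x when x is -1 or 1 and 0 otherwise.
   x + 1, computed as unsigned, is in [0, 2] exactly for x in [-1, 1].  */
int
one_over_signed (int x)
{
  return (unsigned) x + 1u <= 2u ? x : 0;
}
```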
v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html
Changes from v2:
1. Refactor from using a switch statement to using the built-in
if-else statement.
v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
Changes from v1:
1. Refactor common if conditions.
2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.
gcc/ChangeLog:
PR tree-optimization/95424
* match.pd: Simplify 1 / X where X is an integer.
Jakub Jelinek [Fri, 28 Jan 2022 18:02:26 +0000 (19:02 +0100)]
store-merging: Fix up a -fcompare-debug bug in get_status_for_store_merging [PR104263]
As mentioned in the PR, the following testcase fails, because the last
stmt of a bb with -g is a debug stmt and get_status_for_store_merging
uses gimple_seq_last_stmt (bb_seq (bb)) when testing if it is valid
for store merging. The debug stmt isn't valid, while a stmt at that
position with -g0 is valid and so the divergence.
As we walk the whole bb already, this patch just remembers the last
non-debug stmt, so that we don't need to skip backwards debug stmts at the
end of the bb to find last real stmt.
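The idea can be sketched as follows. struct stmt and
last_nondebug_stmt are hypothetical stand-ins, not GCC's GIMPLE API;
the point is only that one forward walk can track the last non-debug
statement.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GIMPLE statement.  */
struct stmt
{
  int is_debug;
  int id;
};

/* Walk the block once and remember the last non-debug statement.
   Returns NULL if the block has none.  */
const struct stmt *
last_nondebug_stmt (const struct stmt *stmts, size_t n)
{
  const struct stmt *last = NULL;
  for (size_t i = 0; i < n; i++)
    if (!stmts[i].is_debug)
      last = &stmts[i];
  return last;
}
```

A block that differs only by a trailing debug statement then yields the
same statement with and without -g.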
2022-01-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104263
* gimple-ssa-store-merging.cc (get_status_for_store_merging): For
cfun->can_throw_non_call_exceptions && cfun->eh test whether
last non-debug stmt in the bb is store_valid_for_store_merging_p
rather than last stmt.
* gcc.dg/pr104263.c: New test.
Allan McRae [Fri, 28 Jan 2022 17:44:08 +0000 (12:44 -0500)]
testsuite/70230 - fix failures with default SSP
Configuring with --enable-default-ssp triggers various testsuite
failures. These contain asm statements that are not compatible with
-fstack-protector. Adding -fno-stack-protector to dg-options to
work around this issue.
Tested on x86_64-linux.
PR testsuite/70230
* gcc.dg/asan/use-after-scope-4.c (dg-options): Add
-fno-stack-protector.
* gcc.dg/stack-usage-1.c: Likewise
* gcc.dg/superblock.c: Likewise
* gcc.target/i386/avx-vzeroupper-17.c: Likewise
* gcc.target/i386/cleanup-1.c: Likewise
* gcc.target/i386/cleanup-2.c: Likewise
* gcc.target/i386/interrupt-redzone-1.c: Likewise
* gcc.target/i386/interrupt-redzone-2.c: Likewise
* gcc.target/i386/pr79793-1.c: Likewise
* gcc.target/i386/pr79793-2.c: Likewise
* gcc.target/i386/shrink_wrap_1.c: Likewise
* gcc.target/i386/stack-check-11.c: Likewise
* gcc.target/i386/stack-check-18.c: Likewise
* gcc.target/i386/stack-check-19.c: Likewise
* gcc.target/i386/stackalign/pr88483-1.c: Likewise
* gcc.target/i386/stackalign/pr88483-2.c: Likewise
* gcc.target/i386/sw-1.c: Likewise
Martin Liska [Fri, 28 Jan 2022 15:11:33 +0000 (16:11 +0100)]
Remove extra newline in ICE report.
Revert partially what I did in g:
76ef38e3178a11e76a66b4d4c0e10e85fe186a45.
gcc/ChangeLog:
* diagnostic.cc (diagnostic_action_after_output): Remove extra
newline.
Martin Liska [Thu, 27 Jan 2022 12:37:04 +0000 (13:37 +0100)]
internal_error - do not use leading capital letter
gcc/ChangeLog:
* config/rs6000/host-darwin.cc (segv_crash_handler):
Do not use leading capital letter.
(segv_handler): Likewise.
* ipa-sra.cc (verify_splitting_accesses): Likewise.
* varasm.cc (get_section): Likewise.
gcc/d/ChangeLog:
* decl.cc (d_finish_decl): Do not use leading capital letter.
Patrick Palka [Fri, 28 Jan 2022 13:18:28 +0000 (08:18 -0500)]
c++: var tmpl w/ dependent constrained auto type [PR103341]
When deducing the type of a variable template (or templated static data
member) with a constrained auto type, we might need its template
arguments for satisfaction since the constraint could depend on them.
PR c++/103341
gcc/cp/ChangeLog:
* decl.cc (cp_finish_decl): Pass the template arguments of a
variable template specialization or a templated static data
member to do_auto_deduction when the auto is constrained.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-class4.C: New test.
* g++.dg/cpp2a/concepts-var-templ2.C: New test.
Richard Biener [Fri, 28 Jan 2022 10:32:11 +0000 (11:32 +0100)]
tree-optimization/104267 - fix external def vector type for call args
The following fixes the vector type registered for external defs
in call arguments when vectorizing with SLP. We assumed uniform
vectype_in types here but with calls like .COND_MUL we also have
mask arguments which, when invariant or external, need to have
a proper mask vector type.
2022-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/104267
* tree-vect-stmts.cc (vectorizable_call): Properly use the
per-argument determined vector type for externals and
invariants.
Richard Biener [Fri, 28 Jan 2022 09:55:29 +0000 (10:55 +0100)]
tree-optimization/104263 - avoid retaining abnormal edges for non-call/goto stmts
This removes a premature optimization from
gimple_purge_dead_abnormal_call_edges which, after eliding the
last setjmp (or computed goto) statement from a function and
thus clearing cfun->calls_setjmp, leaves us with the abnormal
edges from other calls that are elided for example via inlining
or DCE. That's a CFG / IL combination that should be impossible
(not addressing the fact that with cfun->calls_setjmp and
cfun->has_nonlocal_label cleared we should not have any abnormal
edge at all).
For the testcase in the PR this means that IPA inlining will
remove the abnormal edges from the block after inlining the call
the edge was coming from.
2022-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/104263
* tree-cfg.cc (gimple_purge_dead_abnormal_call_edges):
Purge edges also when !cfun->has_nonlocal_label
and !cfun->calls_setjmp.
* gcc.dg/tree-ssa/inline-13.c: New testcase.
Maciej W. Rozycki [Fri, 28 Jan 2022 11:55:12 +0000 (11:55 +0000)]
RISC-V: Document `auipc' and `bitmanip' `type' attributes
Document new `auipc' and `bitmanip' `type' attributes added respectively
with commit
88108b27dda9 ("RISC-V: Add sifive-7 pipeline description.")
and commit
283b1707f237 ("RISC-V: Implement instruction patterns for ZBA
extension.") but not listed so far.
gcc/
* config/riscv/riscv.md: Document `auipc' and `bitmanip' `type'
attributes.
Andre Vehreschild [Fri, 28 Jan 2022 11:34:17 +0000 (12:34 +0100)]
Prevent malicious descriptor stacking for scalar components [V2].
gcc/fortran/ChangeLog:
PR fortran/103790
* trans-array.cc (structure_alloc_comps): Prevent descriptor
stacking for non-array data; do not broadcast caf-tokens.
* trans-intrinsic.cc (conv_co_collective): Prevent generation
of unused descriptor.
gcc/testsuite/ChangeLog:
PR fortran/103790
* gfortran.dg/coarray_collectives_18.f90: New test.
Jakub Jelinek [Fri, 28 Jan 2022 10:48:18 +0000 (11:48 +0100)]
cfgrtl: Fix up locus comparison in unique_locus_on_edge_between_p [PR104237]
The testcase in the PR (not included for the testsuite because we don't
have an (easy) way to -fcompare-debug LTO, we'd need 2 compilations/linking,
one with -g and one with -g0 and -fdump-rtl-final= at the end of lto1
and compare that) has different code generation for -g vs. -g0.
The difference appears during expansion, where we have a goto_locus
that is at -O0 compared to the INSN_LOCATION of the previous and next insn
across an edge. With -g0 the locations are equal and so no nop is added.
With -g the locations aren't equal and so a nop is added holding that
location.
The reason for the different location is the way lto1 streams in
locations.
We have lto_location_cache::apply_location_cache that is called with some
set of expanded locations, qsorts them, creates location_t's for those
and remembers the last expanded location.
When the expanded_location it reads in is equal to the last expanded
location, lto_location_cache::input_location_and_block just reuses the
last location_t (or adds/changes/removes LOCATION_BLOCK in it); otherwise
it queues it for the next apply_location_cache. Now, when streaming in -g input, we can
see extra locations that don't appear with -g0, and if we are unlucky
enough, those can be sorted last during apply_location_cache and affect
what locations are used from the single entry cache next.
In particular, second apply_location_cache with non-empty loc_cache in
the testcase has 14 locations with -g0 and 16 with -g and those 2 extra
ones sort both last (they are the same). The last one from -g0 then
appears to be input_location_and_block sourced again, for -g0 triggers
the single entry cache, while for -g it doesn't and so apply_location_cache
will create for it another location_t with the same content.
The following patch fixes it by comparing everything we care about in the
location instead of (well, better in addition to) a simple location_t ==
location_t check. I think we don't care about the sysp flag for debug
info...
2022-01-28 Jakub Jelinek <jakub@redhat.com>
PR lto/104237
* cfgrtl.cc (loc_equal): New function.
(unique_locus_on_edge_between_p): Use it.
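The idea of comparing location contents rather than handles can be sketched as follows. This is a minimal illustration, not the code added to cfgrtl.cc: ExpandedLoc, table, make_loc and loc_equal_sketch are hypothetical stand-ins for the expanded location, the line map and the new helper.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an expanded location (file, line, column).
struct ExpandedLoc {
  const char *file;
  int line;
  int column;
};

// Hypothetical location table: a handle is an index into this vector,
// so two different handles may carry identical contents.
static std::vector<ExpandedLoc> table;

static unsigned make_loc(const char *file, int line, int column) {
  table.push_back({file, line, column});
  return static_cast<unsigned>(table.size() - 1);
}

// Try the cheap handle comparison first, then fall back to comparing
// what the two handles actually describe.
static bool loc_equal_sketch(unsigned a, unsigned b) {
  assert(a < table.size() && b < table.size());
  if (a == b)
    return true;
  const ExpandedLoc &x = table[a];
  const ExpandedLoc &y = table[b];
  return x.line == y.line && x.column == y.column
         && std::strcmp(x.file, y.file) == 0;
}
```

With this shape, a duplicate handle created only in the -g compilation still compares equal to the original, so the decision to emit a nop no longer depends on how many handles exist.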
Richard Biener [Fri, 28 Jan 2022 09:28:39 +0000 (10:28 +0100)]
Make graph dumping work for fn != cfun
The following makes dumping of a function as graph work as intended
when specifying a function other than cfun. Unfortunately the loop
and the dominance APIs are not set up to work for functions other
than cfun, so you won't get any fancy loop dumps, but the non-loop
dump works up to reaching mark_dfs_back_edges, which I trivially made
function aware, adjusting current callers with a wrapper.
With all this, doing dot-fn id->src_cfun from the debugger when
debugging inlining works. Previously you got a strange mix of
the src and dest functions visualized ;)
2022-01-28 Richard Biener <rguenther@suse.de>
* cfganal.h (mark_dfs_back_edges): Provide API with struct
function argument.
* cfganal.cc (mark_dfs_back_edges): Take a struct function
to work on, add a wrapper passing cfun.
* graph.cc (draw_cfg_nodes_no_loops): Replace stray cfun
uses with fun which is already passed.
(draw_cfg_edges): Likewise.
(draw_cfg_nodes_for_loop): Do not use draw_cfg_nodes_for_loop
for fun != cfun.
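The "function-aware API plus wrapper" shape described for mark_dfs_back_edges can be sketched like this. All names below (function_sketch, cfun_sketch, mark_back_edges_sketch) are hypothetical and only illustrate the pattern; they are not the cfganal.cc declarations.

```cpp
#include <cassert>

// Hypothetical stand-in for struct function.
struct function_sketch {
  int back_edges;
  bool marked;
};

// Hypothetical stand-in for the global "current function" pointer.
static function_sketch *cfun_sketch = nullptr;

// Function-aware variant: works on whichever function is passed in.
inline bool mark_back_edges_sketch(function_sketch *fun) {
  assert(fun != nullptr);
  fun->marked = true;
  return fun->back_edges > 0;
}

// Wrapper keeping existing callers unchanged: it forwards the current
// function to the function-aware variant.
inline bool mark_back_edges_sketch() {
  return mark_back_edges_sketch(cfun_sketch);
}
```

Existing call sites keep calling the zero-argument form, while the graph dumper can pass the function it was asked to draw.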
Eric Botcazou [Fri, 28 Jan 2022 10:04:06 +0000 (11:04 +0100)]
Fix wrong operator for universal_integer operands in instance
This is a regression present on mainline and 11 branch: the transformation
applied during expansion by Narrow_Large_Operation would incorrectly perform
name resolution for the operator again.
gcc/ada/
PR ada/104258
* exp_ch4.adb (Narrow_Large_Operation): Also copy the entity, if
any, when rewriting the operator node.
gcc/testsuite/
* gnat.dg/generic_comp.adb: New test.
Andre Vehreschild [Fri, 28 Jan 2022 09:35:07 +0000 (10:35 +0100)]
Revert "Prevent malicious descriptor stacking for scalar components."
Breaks bootstrap.
This reverts commit
c9c48ab7bad9fe5e096076e56a60ce0a5a2b65f7.
Andre Vehreschild [Fri, 28 Jan 2022 08:20:23 +0000 (09:20 +0100)]
Prevent malicious descriptor stacking for scalar components.
gcc/fortran/ChangeLog:
PR fortran/103790
* trans-array.cc (structure_alloc_comps): Prevent descriptor
stacking for non-array data; do not broadcast caf-tokens.
* trans-intrinsic.cc (conv_co_collective): Prevent generation
of unused descriptor.
gcc/testsuite/ChangeLog:
PR fortran/103790
* gfortran.dg/coarray_collectives_18.f90: New test.
Jason Merrill [Thu, 27 Jan 2022 22:46:43 +0000 (17:46 -0500)]
c++: pack in enumerator in lambda [PR100198]
The GCC 8 lambda overhaul fixed most uses of lambdas in pack expansions, but
local enums and classes within such lambdas that depend on parameter packs
are still broken. For now, give a sorry instead of an ICE or incorrect
error.
PR c++/100198
PR c++/100030
PR c++/100282
gcc/cp/ChangeLog:
* parser.cc (cp_parser_enumerator_definition): Sorry on parameter
pack in lambda.
(cp_parser_class_head): And in class attributes.
* pt.cc (check_for_bare_parameter_packs): Sorry instead of error
in lambda.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-variadic13.C: Accept the sorry
as well as the correct error.
* g++.dg/cpp0x/lambda/lambda-variadic14.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-variadic14a.C: New test.
* g++.dg/cpp0x/lambda/lambda-variadic15.C: New test.
* g++.dg/cpp0x/lambda/lambda-variadic16.C: New test.
GCC Administrator [Fri, 28 Jan 2022 00:16:32 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Thu, 27 Jan 2022 22:31:26 +0000 (22:31 +0000)]
libstdc++: Prevent -Wstringop-overread warning in std::deque [PR100516]
The compiler warns about the loop in deque::_M_range_initialize because
it doesn't know that the number of nodes has already been correctly
sized to match the size of the input. Use __builtin_unreachable to tell
it that the loop will never be entered if the number of elements is
smaller than a single node.
libstdc++-v3/ChangeLog:
PR libstdc++/100516
* include/bits/deque.tcc (_M_range_initialize<ForwardIterator>):
Add __builtin_unreachable to loop.
* testsuite/23_containers/deque/100516.cc: New test.
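A minimal sketch of the technique, assuming a compiler that provides __builtin_unreachable (GCC or Clang). The node layout, kNodeSize and fill_nodes are hypothetical and much simpler than the real deque code; the point is only that an invariant the compiler cannot deduce is restated where the loop body is entered.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kNodeSize = 4;  // hypothetical elements per node

// Copy n ints into fixed-size nodes. The caller guarantees that `nodes`
// has enough rows for n elements (hypothetical layout).
inline void fill_nodes(const int *first, std::size_t n,
                       int (*nodes)[kNodeSize]) {
  assert(first != nullptr || n == 0);
  const std::size_t full = n / kNodeSize;
  for (std::size_t node = 0; node < full; ++node) {
    // The loop body runs only when full >= 1, i.e. n >= kNodeSize.
    // Restate that for the optimizer; this branch is never taken.
    if (n < kNodeSize)
      __builtin_unreachable();
    for (std::size_t j = 0; j < kNodeSize; ++j)
      nodes[node][j] = first[node * kNodeSize + j];
  }
  // Remaining elements go into the last, partially filled node.
  for (std::size_t j = 0; j < n % kNodeSize; ++j)
    nodes[full][j] = first[full * kNodeSize + j];
}
```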
David Malcolm [Wed, 26 Jan 2022 21:24:08 +0000 (16:24 -0500)]
analyzer: show region creation events for uninit warnings
When reviewing the output of -fanalyzer on PR analyzer/104224 I noticed
that despite very verbose paths, the diagnostic paths for
-Wanalyzer-use-of-uninitialized-value
don't show where the uninitialized memory is allocated.
This patch adapts and simplifies material from
"[PATCH 3/6] analyzer: implement infoleak detection"
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
in order to add region creation events for the pertinent region (whether
on the stack or heap).
For example, this patch extends:
malloc-1.c: In function 'test_40':
malloc-1.c:461:5: warning: use of uninitialized value '*p' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
461 | i = *p;
| ~~^~~~
'test_40': event 1
|
| 461 | i = *p;
| | ~~^~~~
| | |
| | (1) use of uninitialized value '*p' here
|
to:
malloc-1.c: In function 'test_40':
malloc-1.c:461:5: warning: use of uninitialized value '*p' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
461 | i = *p;
| ~~^~~~
'test_40': events 1-2
|
| 460 | int *p = (int*)malloc(sizeof(int*));
| | ^~~~~~~~~~~~~~~~~~~~
| | |
| | (1) region created on heap here
| 461 | i = *p;
| | ~~~~~~
| | |
| | (2) use of uninitialized value '*p' here
|
and this helps readability of the resulting warnings, especially in
more complicated cases.
gcc/analyzer/ChangeLog:
* checker-path.cc (event_kind_to_string): Handle
EK_REGION_CREATION.
(region_creation_event::region_creation_event): New.
(region_creation_event::get_desc): New.
(checker_path::add_region_creation_event): New.
* checker-path.h (enum event_kind): Add EK_REGION_CREATION.
(class region_creation_event): New subclass.
(checker_path::add_region_creation_event): New decl.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Pass NULL for new
param to add_events_for_eedge when handling trailing eedge.
(diagnostic_manager::build_emission_path): Create an interesting_t
instance, allow the pending diagnostic to populate it, and pass it
to the calls to add_events_for_eedge.
(diagnostic_manager::add_events_for_eedge): Add "interest" param.
Use it to add region_creation_events for on-stack regions created
at function entry, and when pertinent dynamically-sized
regions are created.
(diagnostic_manager::prune_for_sm_diagnostic): Add case for
EK_REGION_CREATION.
* diagnostic-manager.h (diagnostic_manager::add_events_for_eedge):
Add "interest" param.
* pending-diagnostic.cc: Include "selftest.h", "tristate.h",
"analyzer/call-string.h", "analyzer/program-point.h",
"analyzer/store.h", and "analyzer/region-model.h".
(interesting_t::add_region_creation): New.
(interesting_t::dump_to_pp): New.
* pending-diagnostic.h (struct interesting_t): New.
(pending_diagnostic::mark_interesting_stuff): New vfunc.
* region-model.cc
(poisoned_value_diagnostic::poisoned_value_diagnostic): Add
(poisoned_value_diagnostic::operator==): Compare m_pkind and
m_src_region fields.
(poisoned_value_diagnostic::mark_interesting_stuff): New.
(poisoned_value_diagnostic::m_src_region): New.
(region_model::check_for_poison): Call
get_region_for_poisoned_expr for uninit values and pass the result
to the diagnostic.
(region_model::get_region_for_poisoned_expr): New.
(region_model::deref_rvalue): Pass NULL for
poisoned_value_diagnostic's src_region.
* region-model.h (region_model::get_region_for_poisoned_expr): New
decl.
* region.h (frame_region::get_fndecl): New.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/data-model-1.c: Add dg-message directives for
expected region creation events.
* gcc.dg/analyzer/malloc-1.c: Likewise.
* gcc.dg/analyzer/memset-CVE-2017-18549-1.c: Likewise.
* gcc.dg/analyzer/pr101547.c: Likewise.
* gcc.dg/analyzer/pr101875.c: Likewise.
* gcc.dg/analyzer/pr101962.c: Likewise.
* gcc.dg/analyzer/pr104224.c: Likewise.
* gcc.dg/analyzer/pr94047.c: Likewise.
* gcc.dg/analyzer/symbolic-1.c: Likewise.
* gcc.dg/analyzer/uninit-1.c: Likewise.
* gcc.dg/analyzer/uninit-4.c: Likewise.
* gcc.dg/analyzer/uninit-alloca.c: New test.
* gcc.dg/analyzer/uninit-pr94713.c: Add dg-message directive for
expected region creation event.
* gcc.dg/analyzer/uninit-pr94714.c: Likewise.
* gcc.dg/analyzer/zlib-3.c: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Wed, 26 Jan 2022 16:08:51 +0000 (16:08 +0000)]
libstdc++: Avoid overflow in ranges::advance(i, n, bound)
When (bound - i) or n is the most negative value of its type, the
negative of the value will overflow. Instead of abs(n) >= abs(bound - i)
use n >= (bound - i) when positive and n <= (bound - i) when negative.
The function has a precondition that they must have the same sign, so
this works correctly. The precondition check can be moved into the else
branch, and simplified.
The standard requires calling ranges::advance(i, bound) even if i==bound
is already true, which is technically observable, but that's pointless.
We can just return n in that case. Similarly, for i!=bound but n==0 we
are supposed to call ranges::advance(i, n), but that's pointless. An LWG
issue to allow omitting the pointless calls is expected to be filed.
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (ranges::advance): Avoid signed
overflow. Do nothing if already equal to desired result.
* testsuite/24_iterators/range_operations/advance_overflow.cc:
New test.
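The comparison described above can be sketched as a standalone helper. reaches_bound is a hypothetical name, not the libstdc++ code; it assumes the same precondition as the commit text, namely that n and the distance to the bound do not have opposite signs.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if stepping n positions would reach or pass
// a bound that is dist positions away. No value is negated, so the most
// negative value of D cannot overflow.
template <typename D>
bool reaches_bound(D n, D dist) {
  // Precondition: n and dist must not have opposite signs.
  assert(!(n > 0 && dist < 0) && !(n < 0 && dist > 0));
  if (dist > 0)
    return n >= dist;  // moving forwards
  if (dist < 0)
    return n <= dist;  // moving backwards
  return true;         // already at the bound
}
```

For example, with both arguments equal to std::numeric_limits<int>::min() the helper returns true without ever computing -n or -dist, which is exactly the case where the abs-based comparison overflows.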
Jason Merrill [Thu, 27 Jan 2022 21:12:18 +0000 (16:12 -0500)]
c++: dependent and non-dependent attributes [PR104245]
A flaw in my patch for PR51344 was that cplus_decl_attributes calls
decl_attributes after save_template_attributes, which messes up the ordering
that save_template_attributes set up. Fixed by splitting
save_template_attributes around the call to decl_attributes.
PR c++/104245
PR c++/51344
gcc/cp/ChangeLog:
* decl2.cc (save_template_attributes): Take late attrs as parm.
(cplus_decl_attributes): Call it after decl_attributes,
splice_template_attributes before.
gcc/testsuite/ChangeLog:
* g++.dg/lto/alignas1_0.C: New test.
Uros Bizjak [Thu, 27 Jan 2022 21:14:18 +0000 (22:14 +0100)]
testsuite: Fix gfortran.dg/ieee/signaling_?.f90 tests for x86 targets
As stated in signaling_?.f90 tests, x86-32 ABI is not suitable to
correctly handle signaling NaNs. However, XFAIL is not the correct choice
to disable these tests, since various optimizations can generate code
that avoids moves from registers to memory (and back), resulting
in the code that executes correctly, producing spurious XPASS.
These tests should be disabled on x86-32 using the { ! ia32 } dg-directive,
which leaves the x32 ilp32 ABI enabled, where the tests execute without problems.
Please note that check_effective_target_ia32 test tries to compile code that
uses __i386__ target-dependent preprocessor definition, so it is guaranteed
to fail on all non-ia32 targets.
2022-01-27 Uroš Bizjak <ubizjak@gmail.com>
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/signaling_1.f90 (dg-do):
Run only on non-ia32 targets.
* gfortran.dg/ieee/signaling_2.f90 (dg-do): Ditto.
* gfortran.dg/ieee/signaling_3.f90 (dg-do): Ditto.
Harald Anlauf [Sun, 23 Jan 2022 20:55:33 +0000 (21:55 +0100)]
Fortran: fix issues with internal conversion between default and wide char
gcc/fortran/ChangeLog:
PR fortran/104128
* expr.cc (gfc_copy_expr): Convert internal representation of
string to wide char in value only for default character kind.
* target-memory.cc (interpret_array): Pass flag for conversion of
wide chars.
(gfc_target_interpret_expr): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/104128
* gfortran.dg/transfer_simplify_14.f90: New test.
Patrick Palka [Thu, 27 Jan 2022 19:34:05 +0000 (14:34 -0500)]
c++: Add a couple of CTAD testcases [PR82632]
PR c++/82632
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction104.C: New test.
* g++.dg/cpp1z/class-deduction105.C: New test.
Harald Anlauf [Thu, 27 Jan 2022 19:23:00 +0000 (20:23 +0100)]
Fortran: add missing conversions for result of intrinsics to result type
gcc/fortran/ChangeLog:
PR fortran/84784
* trans-intrinsic.cc (conv_intrinsic_image_status): Convert result
to resulting (default) integer type.
(conv_intrinsic_team_number): Likewise.
(gfc_conv_intrinsic_popcnt_poppar): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/84784
* gfortran.dg/pr84784.f90: New test.
Martin Liska [Thu, 27 Jan 2022 18:27:51 +0000 (19:27 +0100)]
git-undescr.sh: Support full output of git-descr.sh.
contrib/ChangeLog:
* git-undescr.sh: Support full output of git-descr.sh.
Martin Liska [Thu, 27 Jan 2022 15:01:55 +0000 (16:01 +0100)]
contrib: Put gcc-descr and gcc-undescr to file.
contrib/ChangeLog:
* git-descr.sh: New file.
* git-undescr.sh: New file.
Support optional arguments --long, --short and default
to 14 characters of git hash.
* gcc-git-customization.sh: Use the created files.
Co-Authored-By: Martin Jambor <mjambor@suse.cz>
Patrick Palka [Thu, 27 Jan 2022 15:56:49 +0000 (10:56 -0500)]
c++: non-dependent immediate member fn call [PR99895]
Here we're emitting a bogus error during ahead of time evaluation of a
non-dependent immediate member function call such as a.f(args) because
the de facto templated form for such a call is (a.f)(args) but we're
trying to evaluate it using the intermediate CALL_EXPR built by
build_over_call, which has the non-member form f(a, args). The de facto
member form is built in build_new_method_call, so it seems we should
handle the immediate call there instead, or perhaps make build_over_call
build the correct form in the first place.
Given that there are many spots other than build_new_method_call that
call build_over_call for member functions, e.g. build_op_call, this
patch takes the latter approach.
In passing, this patch makes us avoid wrapping PARM_DECL in
NON_DEPENDENT_EXPR for benefit of the third testcase below.
PR c++/99895
gcc/cp/ChangeLog:
* call.cc (build_over_call): For a non-dependent member call,
build up a CALL_EXPR using a COMPONENT_REF callee, as in
build_new_method_call.
* pt.cc (build_non_dependent_expr): Don't wrap PARM_DECL either.
* tree.cc (build_min_non_dep_op_overload): Adjust accordingly
after the build_over_call change.
gcc/ChangeLog:
* tree.cc (build_call_vec): Add const to second parameter.
* tree.h (build_call_vec): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/consteval-memfn1.C: New test.
* g++.dg/cpp2a/consteval-memfn2.C: New test.
* g++.dg/cpp2a/consteval28.C: New test.
Patrick Palka [Thu, 27 Jan 2022 15:56:34 +0000 (10:56 -0500)]
c++: constrained partial spec using qualified name [PR92944, PR103678]
In the nested_name_specifier branch within cp_parser_class_head, we need
to update 'type' with the result of maybe_process_partial_specialization
like we do in the template_id_p branch.
PR c++/92944
PR c++/103678
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_head): Update 'type' with the result
of maybe_process_partial_specialization in the
nested_name_specifier branch. Refactor nearby code to accommodate
that maybe_process_partial_specialization returns a _TYPE, not a
TYPE_DECL, and eliminate local variable 'class_type' in passing.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-partial-spec10.C: New test.
* g++.dg/cpp2a/concepts-partial-spec11.C: New test.
Martin Liska [Thu, 27 Jan 2022 13:47:23 +0000 (14:47 +0100)]
libstdc++: fix typo in acinclude.m4.
PR libstdc++/104259
libstdc++-v3/ChangeLog:
* acinclude.m4: Fix typo.
* configure: Regenerate.
Marek Polacek [Wed, 26 Jan 2022 22:29:19 +0000 (17:29 -0500)]
c++: new-expr of array of deduced class tmpl [PR101988]
In r12-1933 I attempted to implement DR2397 aka allowing
int a[3];
auto (&r)[3] = a;
by removing the type_uses_auto check in create_array_type_for_decl.
That may have gone too far, because it also allows arrays of
CLASS_PLACEHOLDER_TEMPLATE and it looks like [dcl.type.class.deduct]
prohibits that: "...the declared type of the variable shall be cv T,
where T is the placeholder." However, in /2 it explicitly states that
"A placeholder for a deduced class type can also be used in the
type-specifier-seq in the new-type-id or type-id of a new-expression."
In this PR, it manifested by making us accept invalid
template<class T> struct A { A(T); };
auto p = new A[]{1};
[expr.new]/2 says that such a construct is treated as an invented
declaration of the form
A x[]{1};
but, I think, that ought to be ill-formed as per above. So this patch
sort of restores the create_array_type_for_decl check. I should mention
that the difference between [] and [1] is due to cp_parser_new_type_id:
if (*nelts == NULL_TREE)
/* Leave [] in the declarator. */;
and groktypename returning different types based on that.
PR c++/101988
gcc/cp/ChangeLog:
* decl.cc (create_array_type_for_decl): Reject forming an array of
placeholder for a deduced class type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction-new1.C: New test.
* g++.dg/cpp23/auto-array2.C: New test.
Martin Liska [Thu, 27 Jan 2022 10:22:42 +0000 (11:22 +0100)]
Improve wording for -freport-bug option.
PR web/104254
gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize):
Initialize report_bug flag.
(diagnostic_action_after_output):
Explain that -freport-bug option can be used for pre-processed
file creation. Make the message shorter.
(error_recursion): Rename Internal to internal.
* diagnostic.h (struct diagnostic_context): New field.
* opts.cc (common_handle_option): Init the field here.
Martin Liska [Thu, 27 Jan 2022 11:41:16 +0000 (12:41 +0100)]
analyzer: fix -Wformat warnings on i686
PR analyzer/104247
gcc/analyzer/ChangeLog:
* constraint-manager.cc (bounded_ranges_manager::log_stats):
Cast to long for format purpose.
* region-model-manager.cc (log_uniq_map): Likewise.
Kewen Lin [Thu, 27 Jan 2022 09:46:28 +0000 (03:46 -0600)]
rs6000: Fix an assertion in update_target_cost_per_stmt [PR103702]
This patch fixes an assertion that is too aggressive.
The vectorizer can do vec_construct costing for a vector type which
has only one unit. For the failing case, the passed-in vector type
is "vector(1) int"; although it doesn't end up with any construction
eventually, we have to handle this possibility.
gcc/ChangeLog:
PR target/103702
* config/rs6000/rs6000.cc
(rs6000_cost_data::update_target_cost_per_stmt): Fix one wrong
assertion with early return.
gcc/testsuite/ChangeLog:
PR target/103702
* gcc.target/powerpc/pr103702.c: New test.
Chung-Lin Tang [Thu, 27 Jan 2022 10:33:00 +0000 (18:33 +0800)]
Fix omp-low ICE for indirect references based off component access [PR103642]
This issue was triggered after the patch extending syntax for component access
in map clauses in commit
0ab29cf0bb68960c1f87405f14b4fb2109254e2f.
In gimplify_scan_omp_clauses, the case for handling indirect accesses (which
creates firstprivate ptr and zero-length array section map for such decls) was
erroneously entered for non-pointer cases (here being the base struct decl),
so the appropriate checks were added there.
The new testcase is a compile-only test for the ICE. The original omptests
t-partial-struct test actually should not execute correctly, because for
map(t.s->a[:N]), map(t.s[:1]) is not implicitly mapped, thus the entire
offloaded access does not work as is (fixing that omptests test is out of
scope here).
2022-01-27 Chung-Lin Tang <cltang@codesourcery.com>
PR middle-end/103642
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Do not do indir_p handling
for non-pointer or non-reference-to-pointer cases.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/pr103642.c: New test.
Andrew Pinski [Thu, 27 Jan 2022 10:28:28 +0000 (10:28 +0000)]
Fix aarch64/104201: branch-protection-attr.c fails after quoting difference
After the quoting changes in r12-6521-g03a1a86b5ee40d4e240, branch-protection-attr.c
fails due to expecting a different quoting type for "leaf".
This patch changes the quoting from "" to '' as that is what is used now.
Committed as obvious after a test of the testcase.
gcc/testsuite/ChangeLog:
PR target/104201
* gcc.target/aarch64/branch-protection-attr.c: Fix quoting for
the expected error message on line 5 of leaf.