Roger Sayle [Thu, 10 Feb 2022 13:32:07 +0000 (13:32 +0000)]
gfortran: Respect target's NO_DOT_IN_LABEL in trans-common.cc
This patch fixes 9 unexpected failures in the gfortran testsuite on
nvptx-none. The issue is that gfortran's EQUIVALENCE internally uses
symbols such as "equiv.0" even on platforms that define NO_DOT_IN_LABEL.
On nvptx-none, this then results in the following error message(s):
ptxas application ptx input, fatal: Parsing error near '.0': syntax error
ptxas fatal: Ptx assembly aborted due to errors
The fix is to tweak trans-common.cc to respect the target's NO_DOT_IN_LABEL
(and NO_DOLLAR_IN_LABEL) when generating internal equiv.%d symbols.
Only the nvptx, mmix and xtensa backends define NO_DOT_IN_LABEL which
explains why no-one has spotted/fixed this issue since the problematic
code was last changed back in 2005(!).
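The selection described above can be sketched as a preprocessor conditional.
This is only an illustration, not the committed code: the macro name EQUIV_FMT,
the helper format_equiv_name and the two fallback spellings are assumptions made
for this example; only the default "equiv.%d" form comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the new macro: pick a label format that the
   target's assembler accepts.  Only "equiv.%d" is taken from the commit
   message; the two fallback spellings are assumed for illustration.  */
#if !defined(NO_DOT_IN_LABEL)
# define EQUIV_FMT "equiv.%d"
#elif !defined(NO_DOLLAR_IN_LABEL)
# define EQUIV_FMT "equiv$%d"
#else
# define EQUIV_FMT "equiv_%d"
#endif

/* Write the internal name for EQUIVALENCE block number SERIAL into BUF.
   Returns the length the complete name needs, as snprintf does.  */
int
format_equiv_name (char *buf, size_t size, int serial)
{
  assert (buf != NULL);
  return snprintf (buf, size, EQUIV_FMT, serial);
}
```

Building the same file with -DNO_DOT_IN_LABEL would switch to one of the
dot-free spellings, which is the behaviour a target such as nvptx needs.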
2022-02-10 Roger Sayle <roger@nextmovesoftware.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/fortran/ChangeLog
* trans-common.cc (GFC_EQUIV_FMT): New macro respecting the
target's NO_DOT_IN_LABEL and NO_DOLLAR_IN_LABEL preferences.
(build_equiv_decl): Use GFC_EQUIV_FMT here.
Jonathan Wakely [Wed, 9 Feb 2022 13:38:33 +0000 (13:38 +0000)]
libstdc++: Add atomic_fetch_xor to <stdatomic.h>
This function (and the explicit memory order version) are present in both
C++ <atomic> and C <stdatomic.h>, so should be in C++ <stdatomic.h> too.
There is a library issue incoming for this, but the resolution is
obvious.
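For reference, a minimal sketch of the two functions as they are used from
C's <stdatomic.h>. The helper names toggle_bits and toggle_bits_relaxed are
made up for this example; the C++ header is expected to accept the same
spelling once the using-declarations are in place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toggle the bits in MASK and return the value held before the update.
   Uses the default sequentially consistent ordering.  */
int
toggle_bits (atomic_int *flags, int mask)
{
  assert (flags != NULL);
  return atomic_fetch_xor (flags, mask);
}

/* Same operation, with the memory order passed explicitly.  */
int
toggle_bits_relaxed (atomic_int *flags, int mask)
{
  assert (flags != NULL);
  return atomic_fetch_xor_explicit (flags, mask, memory_order_relaxed);
}
```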
libstdc++-v3/ChangeLog:
* include/c_compatibility/stdatomic.h (atomic_fetch_xor): Add
using-declaration.
(atomic_fetch_xor_explicit): Likewise.
* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc: Check
arithmetic and logical operations for atomic_int.
Jonathan Wakely [Tue, 8 Feb 2022 21:05:30 +0000 (21:05 +0000)]
libstdc++: Fix directory iterator build for newlib
When building for newlib HAVE_OPENAT and HAVE_UNLINKAT are (sometimes?)
defined, but <fcntl.h> is only included when HAVE_DIRENT_H is defined.
Since directory iterators are completely useless without <dirent.h>,
just override the HAVE_OPENAT and HAVE_UNLINKAT detection when we don't
have <dirent.h>.
libstdc++-v3/ChangeLog:
* src/filesystem/dir-common.h (_GLIBCXX_HAVE_DIRFD): Undefine
when <dirent.h> is not available.
(_GLIBCXX_HAVE_UNLINKAT): Likewise.
Richard Biener [Fri, 4 Feb 2022 08:46:43 +0000 (09:46 +0100)]
tree-optimization/104373 - early diagnostic on unreachable code
The following improves early uninit diagnostics by computing edge
reachability using VN and ignoring unreachable blocks when looking
for uninitialized uses. To not ICE with -fdump-tree-all the
early uninit pass needs a dumpfile since VN tries to dump statistics.
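A sketch of the kind of code this is aimed at, under the assumption that value
numbering can prove the branch dead. The function name is hypothetical and the
snippet is not copied from either testcase.

```c
/* The read of J sits in a block that can never execute, because I is
   known to be 0 when the condition is evaluated.  */
int
unreachable_use (void)
{
  int i = 0;
  int j;            /* never initialized */

  if (i == 1)
    return j;       /* statically unreachable read of j */

  return 0;
}
```

With unreachable blocks skipped, a read like this one should no longer be
reported by the early pass.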
2022-02-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/104373
* tree-ssa-sccvn.h (do_rpo_vn): New export exposing the
walk kind.
* tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default
walk kind as argument.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-ssa-uninit.cc (warn_uninitialized_vars): Skip
blocks not reachable.
(execute_late_warn_uninitialized): Mark all edges as
executable.
(execute_early_warn_uninitialized): Use VN to compute
executable edges.
(pass_data_early_warn_uninitialized): Enable a dump file,
change dump name to warn_uninit.
* g++.dg/warn/Wuninitialized-32.C: New testcase.
* gcc.dg/uninit-pr20644-O0.c: Remove XFAIL.
Richard Biener [Thu, 10 Feb 2022 09:01:20 +0000 (10:01 +0100)]
middle-end/104467 - fix vector extract simplification
This fixes a bogus vector type used for a CTOR build as part of
vector extract simplification. The code failed to consider a
CTOR of vector elements.
2022-02-10 Richard Biener <rguenther@suse.de>
PR middle-end/104467
* match.pd (vector extract simplification): Multiply the
number of CTOR elements by the number of elements in each CTOR element.
* gcc.dg/torture/pr104467.c: New testcase.
Richard Biener [Thu, 10 Feb 2022 08:03:48 +0000 (09:03 +0100)]
tree-optimization/104466 - fix cut&paste error preventing alias disambiguation
The following fixes a cut&paste error in disambiguating using restrict
info. Instead of using rbase1/rbase2, which are computed for this purpose
and preserve MEM_REF bases even when they are based on a decl, the
code performs the check on the bases that drop info for those ...
2022-02-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/104466
* tree-ssa-alias.cc (refs_may_alias_p_2): Use rbase1/rbase2
for the MR_DEPENDENCE checks as intended.
* gfortran.dg/pr104466.f90: New testcase.
Tom de Vries [Wed, 2 Feb 2022 15:23:37 +0000 (16:23 +0100)]
[nvptx] Handle sm_7x shared atomic store more optimally
For sm_7x atomic stores we fall back on expand_atomic_store, but this
results in using membar.sys for shared stores.
Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a
shared store.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"): New
define_insn.
(define_expand "atomic_store<mode>"): Use nvptx_atomic_store<mode> for
TARGET_SM70.
(define_c_enum "unspecv"): Add UNSPECV_ST.
gcc/testsuite/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-2.c: New test.
Tom de Vries [Thu, 13 Jan 2022 12:13:44 +0000 (13:13 +0100)]
[nvptx] Handle pre-sm_7x shared atomic store using atomic exchange
The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store instructions
to the same address.
This can be fixed by:
- inserting barriers between normal stores and atomic operations to a common
address
- using atom.exch to store to locations accessed by other atomic operations.
It's not clearly spelled out which barriers are needed, and a barrier seems more
expensive than atomic exchange.
Implement the pre-sm_7x shared atomic store using atomic exchange.
That includes stores using generic addressing, since those may also point to
shared memory.
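In portable C terms the idea looks roughly like this. It is an analogy only:
the backend emits atom.exch directly from the machine description, and the
function name store_via_exchange is made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Store VALUE into *LOC by performing an atomic exchange and discarding
   the previous contents, so the write is itself an atomic
   read-modify-write operation rather than a plain store.  */
void
store_via_exchange (atomic_int *loc, int value)
{
  assert (loc != NULL);
  (void) atomic_exchange (loc, value);
}
```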
Tested on x86-64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-protos.h (nvptx_mem_maybe_shared_p): Declare.
* config/nvptx/nvptx.cc (nvptx_mem_data_area): New static function.
(nvptx_mem_maybe_shared_p): New function.
* config/nvptx/nvptx.md (define_expand "atomic_store<mode>"): New
define_expand.
gcc/testsuite/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-1.c: New test.
* gcc.target/nvptx/atomic-store-3.c: New test.
* gcc.target/nvptx/stack-atomics-run.c: Update.
Tom de Vries [Mon, 7 Feb 2022 13:12:34 +0000 (14:12 +0100)]
[nvptx] Workaround sub.u16 driver JIT bug
There's a nvidia driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
signed char r;
unsigned char y = (unsigned char) 0x80;
if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
__builtin_abort ();
return 0;
}
...
which at ptx level minimizes to:
...
mov.u16 r22, 0x0080;
st.local.u16 [frame_var],r22;
ld.local.u16 r32,[frame_var];
sub.u16 r33,0x0000,r32;
cvt.u32.u16 r35,r33;
...
where we expect r35 == 0x0000ff80 but get instead 0xffffff80, and where using
nvptx-none-run -O0 fixes the problem. [ See also
https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ]
Try to work around the bug by using sub.s16 instead of sub.u16.
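The two values quoted above can be reproduced with plain C arithmetic. This is
a sketch of the numbers only; the function names are hypothetical, and the
second function merely models a result that looks sign-extended without
claiming that this is what the driver does internally.

```c
#include <stdint.h>

/* Expected behaviour: subtract in 16 bits, then zero-extend to 32 bits
   (sub.u16 followed by cvt.u32.u16).  */
uint32_t
sub_u16_zero_extended (uint16_t a, uint16_t b)
{
  uint16_t diff = (uint16_t) (a - b);
  return (uint32_t) diff;
}

/* One way to arrive at the observed value: the 16-bit difference is
   sign-extended instead of zero-extended.  The conversion to int16_t
   relies on the usual two's complement wrap-around, as GCC provides.  */
uint32_t
sub_u16_sign_extended (uint16_t a, uint16_t b)
{
  int16_t diff = (int16_t) (uint16_t) (a - b);
  return (uint32_t) (int32_t) diff;
}
```

For a = 0 and b = 0x80 the first function yields 0x0000ff80 and the second
0xffffff80.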
Tested on nvptx.
gcc/ChangeLog:
2022-02-07 Tom de Vries <tdevries@suse.de>
PR target/97005
* config/nvptx/nvptx.md (define_insn "sub<mode>3"): Workaround
driver JIT bug by using sub.s16 instead of sub.u16.
Tobias Burnus [Thu, 10 Feb 2022 08:30:19 +0000 (09:30 +0100)]
Fortran/OpenMP: Avoid ICE for invalid char array in omp atomic [PR104329]
PR fortran/104329
gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_atomic): Defer extra-code assert after
other diagnostics.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/atomic-28.f90: New test.
Roger Sayle [Tue, 8 Feb 2022 19:56:55 +0000 (20:56 +0100)]
nvptx: Tweak constraints on copysign instructions
Many thanks to Thomas Schwinge for confirming my hypothesis that the register
usage regression, PR target/104345, is solely due to libgcc's _muldc3 function.
In addition to the isinf functionality in the previously proposed nvptx patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which
significantly reduces the number of instructions in _muldc3, the patch below
further reduces both the number of instructions and the number of explicitly
declared registers, by permitting floating point constant immediate operands
in nvptx's copysign instruction.
Fingers-crossed, the combination with all of the previous proposed nvptx
patches improves things. Ultimately, increasing register usage from 50 to
51 registers, reducing the number of concurrent threads by ~2%, can easily
be countered if we're now executing significantly fewer instructions in each
kernel, for a net performance win.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (copysign<mode>3): Allow immediate
floating point constants as operands 1 and/or 2.
Roger Sayle [Fri, 4 Feb 2022 03:13:53 +0000 (04:13 +0100)]
PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target. This improved code generation for the
more common case of producing 0/1 Boolean values, but unfortunately
made things marginally worse when a 0/-1 mask value is desired.
Unfortunately, nvptx kernels are extremely sensitive to changes in
register usage, which was observable in the reported PR.
This patch provides optimizations for -(cond ? 1 : 0), effectively
simplifying this into cond ? -1 : 0, where these ternary operators are
provided by nvptx's selp instruction, and for the specific case of
SImode, using (restoring) nvptx's "set" instruction (which avoids
the need for a predicate register).
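The identities behind this can be checked in C. The function names are made up
for this sketch; the third function corresponds to the bitwise-not variant
listed in the ChangeLog below.

```c
#include <stdint.h>

/* Negating a 0/1 Boolean ...  */
int32_t
neg_of_bool (int cond)
{
  return -(cond ? 1 : 0);
}

/* ... gives the same value as selecting the 0/-1 mask directly.  */
int32_t
mask_select (int cond)
{
  return cond ? -1 : 0;
}

/* Likewise ~(cond ? 1 : 0) equals cond ? -2 : -1.  */
int32_t
not_of_bool (int cond)
{
  return ~(cond ? 1 : 0);
}
```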
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures. Unfortunately,
the exact register usage of a nvptx kernel depends upon the version of
the Cuda drivers being used (and the hardware), but I believe this
change should resolve the PR (for Thomas) by improving code generation
for the cases that regressed.
gcc/ChangeLog:
PR target/104345
* config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
(sel_false<mode>): Likewise.
(define_code_iterator eqne): New code iterator for EQ and NE.
(*selp<mode>_neg_<code>): New define_insn_and_split to optimize
the negation of a selp instruction.
(*selp<mode>_not_<code>): New define_insn_and_split to optimize
the bitwise not of a selp instruction.
(*setcc_int<mode>): Use set instruction for neg:SI of a selp.
gcc/testsuite/ChangeLog:
PR target/104345
* gcc.target/nvptx/neg-selp.c: New test case.
Roger Sayle [Thu, 3 Feb 2022 13:46:40 +0000 (14:46 +0100)]
nvptx: Fix and use BI mode logic instructions (e.g. and.pred)
This patch adds support for nvptx's BImode and.pred, or.pred and
xor.pred instructions. Technically, nvptx.md previously defined
andbi3, iorbi3 and xorbi3 instructions, but the assembly language
mnemonic output for these was incorrect (e.g. and.b1) and would be
rejected by the ptxas assembler. The most significant part of this
patch is the new define_split which teaches the compiler to actually
use these instructions when appropriate (exposing the latent bug above).
After https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587999.html,
the function:
int foo(int x, int y) { return (x==21) && (y==69); }
when compiled with -O2 produces:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
setp.eq.u32 %r34, %r27, 69;
selp.u32 %r37, 1, 0, %r31;
selp.u32 %r38, 1, 0, %r34;
and.b32 %value, %r37, %r38;
with this patch we now save an extra instruction and generate:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
setp.eq.u32 %r34, %r27, 69;
and.pred %r39, %r34, %r31;
selp.u32 %value, 1, 0, %r39;
This patch has been tested (on top of the patch mentioned above) on
nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a
make and make -k check with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (any_logic): Move code iterator earlier
in machine description.
(logic): Move code attribute earlier in machine description.
(ilogic): New code attribute, like logic but "ior" for IOR.
(and<mode>3, ior<mode>3, xor<mode>3): Delete. Replace with...
(<ilogic><mode>3): New define_insn for HSDIM logic operations.
(<ilogic>bi3): New define_insn for BI mode logic operations.
(define_split): Lower logic operations from integer modes to
BI mode predicate operations.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/bool-1.c: Update.
* gcc.target/nvptx/bool-2.c: New test case for and.pred.
* gcc.target/nvptx/bool-3.c: New test case for or.pred.
* gcc.target/nvptx/bool-4.c: New test case for xor.pred.
Roger Sayle [Thu, 3 Feb 2022 13:41:01 +0000 (14:41 +0100)]
nvptx: Add support for 64-bit mul.hi (and other) instructions
Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this
patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions,
as previously reviewed (provisionally pre-approved) back in August 2020:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551373.html
Since then a few things have changed, so this patch uses the new
SMUL_HIGHPART and UMUL_HIGHPART RTX expressions, but the test cases
remain the same. Like the x86_64 backend, this patch retains the
"trunc" forms of these instructions (while the RTL optimizers/combine
may still generate them).
Given that we're rapidly approaching stage 4, I also took the liberty
of including support in nvptx.md for a few other instructions. With
the new 64-bit highpart multiplication instructions added above, we
can now provide a define_expand for efficient 64-bit (to 128-bit)
widening multiplications. This patch also adds support for nvptx's
testp.infinite instruction (for implementing __builtin_isinf) and
the not.pred instruction.
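As a sketch of what "highpart" means here, the following C computes the upper
64 bits of a 64x64-bit product via a 128-bit intermediate. It assumes a target
where GCC provides __int128 and an arithmetic right shift for negative values;
the function names are hypothetical and are not taken from the new test cases.

```c
#include <stdint.h>

/* Upper 64 bits of the signed 128-bit product a * b.  */
int64_t
smul_highpart (int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a * b) >> 64);
}

/* Upper 64 bits of the unsigned 128-bit product a * b.  */
uint64_t
umul_highpart (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}
```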
As an example of the code generation improvements, the function
int foo(double x) { return __builtin_isinf(x); }
previously generated with -O2:
mov.f64 %r26, %ar0;
abs.f64 %r28, %r26;
setp.leu.f64 %r31, %r28, 0d7fefffffffffffff;
selp.u32 %r30, 1, 0, %r31;
mov.u32 %r29, %r30;
cvt.u16.u8 %r35, %r29;
mov.u16 %r33, %r35;
xor.b16 %r32, %r33, 1;
cvt.u32.u16 %r34, %r32;
cvt.u32.u8 %value, %r34;
and with this patch now generates:
mov.f64 %r23, %ar0;
testp.infinite.f64 %r24, %r23;
selp.u32 %value, 1, 0, %r24;
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (UNSPEC_ISINF): New UNSPEC.
(one_cmplbi2): New define_insn for not.pred.
(mulditi3): New define_expand for signed widening multiply.
(umulditi3): New define_expand for unsigned widening multiply.
(smul<mode>3_highpart): New define_insn for signed highpart mult.
(umul<mode>3_highpart): New define_insn for unsigned highpart mult.
(*smulhi3_highpart_2): Renamed from smulhi3_highpart.
(*smulsi3_highpart_2): Renamed from smulsi3_highpart.
(*umulhi3_highpart_2): Renamed from umulhi3_highpart.
(*umulsi3_highpart_2): Renamed from umulsi3_highpart.
(*setcc<mode>_from_not_bi): New define_insn.
(*setcc_isinf<mode>): New define_insn for testp.infinite.
(isinf<mode>2): New define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/mul-hi64.c: New test case.
* gcc.target/nvptx/umul-hi64.c: New test case.
* gcc.target/nvptx/mul-wide64.c: New test case.
* gcc.target/nvptx/umul-wide64.c: New test case.
* gcc.target/nvptx/isinf.c: New test case.
Roger Sayle [Thu, 3 Feb 2022 08:21:58 +0000 (09:21 +0100)]
nvptx: Expand QI mode operations using SI mode instructions
One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers. Somewhat
conventionally, 8-bit quantities are read from/written to memory using
special instructions, but stored internally using SImode (32-bit) registers.
GCC's middle-end accommodates targets without QImode optabs, by widening
operations until suitable support is found, and with the current nvptx
backend this means 16-bit HImode operations. The inconvenience is that
nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that
additional instructions are required to convert between the SImode
registers used to hold QImode values, and the HImode registers used to
operate on them (and back again). This results in a large amount of
shuffling and type conversion in code dealing with bytes, i.e. using
char or Boolean types.
This patch improves the situation by providing expanders in the nvptx
machine description to perform QImode operations natively in SImode
instead of HImode. An alternate implementation might be to provide
some form of target hook to specify which fallback modes to use during
RTL expansion, but I think this requirement is unusual, and a solution
entirely in the nvptx backend doesn't disturb/affect other targets.
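The underlying idea, performing a byte operation in a 32-bit register and then
keeping only the low 8 bits, can be sketched in C. The function names are made
up for this illustration and say nothing about how the expanders are written.

```c
#include <stdint.h>

/* 8-bit addition carried out as a 32-bit add, then truncated.  */
uint8_t
add_qi_in_si (uint8_t a, uint8_t b)
{
  uint32_t wide = (uint32_t) a + (uint32_t) b;   /* 32-bit operation */
  return (uint8_t) wide;                          /* keep the low 8 bits */
}

/* 8-bit subtraction carried out as a 32-bit subtract, then truncated.  */
uint8_t
sub_qi_in_si (uint8_t a, uint8_t b)
{
  uint32_t wide = (uint32_t) a - (uint32_t) b;
  return (uint8_t) wide;
}
```

The truncated result matches what a native 8-bit operation would give, for
example 200 + 100 wraps to 44.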
The improvements can be quite dramatic, as shown in the example below:
int foo(int x, int y) { return (x==21) && (y==69); }
previously with -O2 required 15 instructions:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
selp.u32 %r30, 1, 0, %r31;
mov.u32 %r29, %r30;
setp.eq.u32 %r34, %r27, 69;
selp.u32 %r33, 1, 0, %r34;
mov.u32 %r32, %r33;
cvt.u16.u8 %r39, %r29;
mov.u16 %r36, %r39;
cvt.u16.u8 %r39, %r32;
mov.u16 %r37, %r39;
and.b16 %r35, %r36, %r37;
cvt.u32.u16 %r38, %r35;
cvt.u32.u8 %value, %r38;
with this patch, now requires only 7 instructions:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
setp.eq.u32 %r34, %r27, 69;
selp.u32 %r37, 1, 0, %r31;
selp.u32 %r38, 1, 0, %r34;
and.b32 %value, %r37, %r38;
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (cmp<mode>): Renamed from *cmp<mode>.
(setcc<mode>_from_bi): Additionally support QImode.
(extendbi<mode>2): Additionally support QImode.
(zero_extendbi<mode>2): Additionally support QImode.
(any_sbinary, any_ubinary, any_sunary, any_uunary): New code
iterators for signed and unsigned, binary and unary operations.
(<sbinary>qi3, <ubinary>qi3, <sunary>qi2, <uunary>qi2): New
expanders to perform QImode operations using SImode instructions.
(cstoreqi4): New define_expand.
(*ext_truncsi2_qi): New define_insn.
(*zext_truncsi2_qi): New define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/bool-1.c: New test case.
Roger Sayle [Thu, 3 Feb 2022 08:07:22 +0000 (09:07 +0100)]
nvptx: Improved support for HFMode including neghf2 and abshf2
This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently negation, absolute value and floating point comparisons are
implemented by promoting to float (SFmode). This patch adds suitable
define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53).
This patch also adds support for HFmode fused multiply-add.
One subtlety is that neghf2 and abshf2 are implemented by (HImode)
bit manipulation operations to update the sign bit. The NVidia PTX
ISA documentation for neg.f16 and abs.f16 contains the caution
"Future implementations may comply with the IEEE 754 standard by preserving
the (NaN) payload and modifying only the sign bit". Given the availability
of suitable replacements, I thought it best to provide IEEE 754 compliant
implementations. If anyone observes a performance penalty from this
choice I'm happy to provide a -ffast-math variant (or revisit this
decision).
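A sketch of the bit manipulation on the 16-bit pattern of an IEEE 754 binary16
value, where bit 15 is the sign bit. The names are hypothetical; the point is
that only the sign bit changes, so a NaN payload is left intact.

```c
#include <stdint.h>

#define HF_SIGN_BIT 0x8000u

/* Negate: flip the sign bit, leave exponent and mantissa untouched.  */
uint16_t
neg_hf_bits (uint16_t bits)
{
  return (uint16_t) (bits ^ HF_SIGN_BIT);
}

/* Absolute value: clear the sign bit, leave everything else untouched.  */
uint16_t
abs_hf_bits (uint16_t bits)
{
  return (uint16_t) (bits & ~HF_SIGN_BIT);
}
```

For instance 0x3c00 (1.0) negates to 0xbc00 (-1.0), and the NaN pattern 0x7e01
negates to 0xfe01 with its payload bits preserved.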
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (*cmpf): New define_insn.
(cstorehf4): New define_expand.
(fmahf4): New define_insn.
(neghf2): New define_insn.
(abshf2): New define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/float16-3.c: New test case for neghf2.
* gcc.target/nvptx/float16-4.c: New test case for abshf2.
* gcc.target/nvptx/float16-5.c: New test case for fmahf4.
* gcc.target/nvptx/float16-6.c: New test case.
Gerald Pfeifer [Thu, 10 Feb 2022 07:59:53 +0000 (08:59 +0100)]
doc: Tweak the www.bitwizard.nl reference
gcc:
* doc/install.texi (Specific): Change the www.bitwizard.nl
reference to use https.
Marcel Vollweiler [Thu, 10 Feb 2022 07:47:12 +0000 (23:47 -0800)]
C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
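A minimal usage sketch in C, assuming a compiler that implements the clause
and is invoked with -fopenmp (without that option the pragmas are ignored and
the code runs on the host). The function name is made up for this example.

```c
/* X is mapped by the enclosing target data region.  use_device_addr makes
   X refer to its device address inside that region, and has_device_addr
   tells the target construct that this address is already a device
   address, so no further mapping is requested for it.  */
int
set_on_device (void)
{
  int x = 24;

#pragma omp target data map(tofrom: x) use_device_addr(x)
  {
#pragma omp target has_device_addr(x)
    {
      x = 42;
    }
  }

  return x;
}
```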
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
Eugene Rozenfeld [Wed, 9 Feb 2022 07:00:33 +0000 (23:00 -0800)]
AutoFDO: Don't try to promote indirect calls that result in recursive direct calls
AutoFDO tries to promote and inline all indirect calls that were promoted
and inlined in the original binary and that are still hot. In the included
test case, the promotion results in a direct call that is a recursive call.
inline_call and optimize_inline_calls can't handle recursive calls at this stage.
Currently, inline_call fails with a segmentation fault.
This change leaves the indirect call alone if promotion will result in a recursive call.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* auto-profile.cc (afdo_indirect_call): Don't attempt to promote indirect calls
that will result in direct recursive calls.
gcc/testsuite/ChangeLog:
* g++.dg/tree-prof/indir-call-recursive-inlining.C: New test.
Andrew Pinski [Wed, 9 Feb 2022 22:56:58 +0000 (14:56 -0800)]
[COMMITTED] Fix PR aarch64/104474: ICE with vector float initializers and non-consts.
The problem here is that the aarch64 back-end was placing const0_rtx
into the constant vector RTL even if the mode was a floating point mode.
The fix is instead to use CONST0_RTX and pass the mode to select the
correct zero (either const_int or const_double).
Committed as obvious after a bootstrap/test on aarch64-linux-gnu with
no regressions.
PR target/104474
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_sve_expand_vector_init_handle_trailing_constants):
Use CONST0_RTX instead of const0_rtx for the non-constant elements.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr104474-1.c: New test.
* gcc.target/aarch64/sve/pr104474-2.c: New test.
* gcc.target/aarch64/sve/pr104474-3.c: New test.
GCC Administrator [Thu, 10 Feb 2022 00:16:27 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Wed, 9 Feb 2022 19:35:31 +0000 (14:35 -0500)]
analyzer: more uninit test coverage
In addition to other test coverage, this adds the examples from
https://cwe.mitre.org/data/definitions/457.html
(aka "CWE-457: Use of Uninitialized Variable")
For reference, the output from -fanalyzer looks like this
(after stripping away the DejaGnu directives):
uninit-CWE-457-examples.c: In function 'example_2_bad_code':
uninit-CWE-457-examples.c:56:3: warning: use of uninitialized value 'bN' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
56 | repaint(aN, bN); /* { dg-warning "use of uninitialized value 'bN'" } */
| ^~~~~~~~~~~~~~~
'example_2_bad_code': events 1-4
|
| 34 | int aN, bN;
| | ^~
| | |
| | (1) region created on stack here
| 35 | switch (ctl) {
| | ~~~~~~
| | |
| | (2) following 'default:' branch...
|......
| 51 | default:
| | ~~~~~~~
| | |
| | (3) ...to here
|......
| 56 | repaint(aN, bN);
| | ~~~~~~~~~~~~~~~
| | |
| | (4) use of uninitialized value 'bN' here
|
uninit-CWE-457-examples.c: In function 'example_3_bad_code':
uninit-CWE-457-examples.c:95:3: warning: use of uninitialized value 'test_string' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
95 | printf("%s", test_string);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
'example_3_bad_code': events 1-4
|
| 90 | char *test_string;
| | ^~~~~~~~~~~
| | |
| | (1) region created on stack here
| 91 | if (i != err_val)
| | ~
| | |
| | (2) following 'false' branch (when 'i == err_val')...
|......
| 95 | printf("%s", test_string);
| | ~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (3) ...to here
| | (4) use of uninitialized value 'test_string' here
|
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/uninit-1.c: Add test coverage for shifts,
comparisons, +, -, *, /, and __builtin_strlen.
* gcc.dg/analyzer/uninit-CWE-457-examples.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Ian Lance Taylor [Wed, 9 Feb 2022 04:19:04 +0000 (20:19 -0800)]
compiler: don't warn for print()
We used to warn for calls to print(), because it doesn't do anything.
However, a Go 1.18 test uses that call, and it is valid Go. Change
the compiler to just accept it and compile it; this will produce calls
to printlock and printunlock, and nothing else.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384355
Ian Lance Taylor [Wed, 9 Feb 2022 04:16:38 +0000 (20:16 -0800)]
compiler: use nil pointer for zero length string constant
We used to pointlessly set the pointer of a zero length string
constant to point to a zero byte constant. Instead, just use nil.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384354
Ian Lance Taylor [Wed, 9 Feb 2022 04:22:32 +0000 (20:22 -0800)]
compiler: treat notinheap types as not being pointers
By definition, a type that is marked notinheap doesn't contain any pointers
that the garbage collector cares about, and neither does a pointer to
such a type. Change the type descriptors to consistently treat such
types as not being pointers, by setting ptrdata to 0 and gcdata to nil.
Change-Id: Id8466555ec493456ff5ff09f1670551414619bd2
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384118
Trust: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Harald Anlauf [Sun, 6 Feb 2022 20:47:20 +0000 (21:47 +0100)]
Fortran: try simplifications during reductions of array constructors
gcc/fortran/ChangeLog:
PR fortran/66193
* arith.cc (reduce_binary_ac): When reducing binary expressions,
try simplification. Handle case of empty constructor.
(reduce_binary_ca): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/66193
* gfortran.dg/array_constructor_55.f90: New test.
Ian Lance Taylor [Wed, 9 Feb 2022 21:11:55 +0000 (13:11 -0800)]
gccgo: link static libgo against -lrt on GNU/Linux
The upcoming Go 1.18 release requires linking against -lrt on GNU/Linux
(only) in order to call timer_create and friends.
Also change gotools to link the runtime test against -lrt.
* gospec.cc (RTLIB, RT_LIBRARY): Define.
(lang_specific_driver): Add -lrt if linking statically on
GNU/Linux.
* configure.ac (RT_LIBS): Define.
* Makefile.am (check-runtime): Set GOLIBS to $(RT_LIBS).
* configure, Makefile.in: Regenerate.
Thomas Rodgers [Wed, 9 Feb 2022 20:29:19 +0000 (12:29 -0800)]
libstdc++: Fix deadlock in atomic wait [PR104442]
This issue was observed as a deadlock in
29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
"laundered" (e.g. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic<T> objects. As noted in the PR, the
implementation (the __waiter::_M_do_wait_v member) correctly exits the
wait for the thread whose data changed, but the other threads waiting on
the same address were not reloading the value of _M_ver before
re-entering the wait.
Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate accepting version of __waiter::_M_do_wait.
libstdc++-v3/ChangeLog:
PR libstdc++/104442
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
loop inside do loop so that threads failing the wait reload
_M_ver.
David Edelsohn [Wed, 9 Feb 2022 15:10:45 +0000 (10:10 -0500)]
testsuite: AIX fixes
gcc/testsuite/ChangeLog:
* gcc.dg/Wstringop-overflow-69.c: Add -Wno-psabi.
* gcc.dg/loop-unswitch-6.c: Omit -fcompare-debug on AIX.
H.J. Lu [Wed, 9 Feb 2022 19:48:58 +0000 (11:48 -0800)]
x86: Compile PR target/104441 tests with -march=x86-64
Compile PR target/104441 tests with -march=x86-64 to fix test failures
when GCC is configured with --with-arch=native --with-cpu=native.
PR target/104441
* gcc.target/i386/pr104441-1a.c: Compile with -march=x86-64.
* gcc.target/i386/pr104441-1b.c: Likewise.
Jakub Jelinek [Wed, 9 Feb 2022 19:45:31 +0000 (20:45 +0100)]
c: Fix up __builtin_assoc_barrier handling in the C FE [PR104427]
The following testcase ICEs, because when creating PAREN_EXPR for
__builtin_assoc_barrier the FE doesn't do the usual tweaks for
EXCESS_PRECISION_EXPR or C_MAYBE_CONST_EXPR. I believe that the
declared effect of the builtin is just association barrier, so
e.g. excess precision should be still handled like if it wasn't
there.
The following patch uses build_unary_op to handle those.
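As a rough illustration of what the builtin is for (not the testcase from the PR), the sketch below uses __builtin_assoc_barrier to keep one addition grouped. The function name sum_keep_grouping is hypothetical, and the __has_builtin guard is an assumption added so that the snippet also builds on compilers without the builtin, where it falls back to plain parentheses.

```c
/* Adds three doubles while asking the compiler to keep (a + b) grouped.
   sum_keep_grouping is a hypothetical name used only for this sketch.  */
double sum_keep_grouping(double a, double b, double c)
{
#if defined(__has_builtin)
# if __has_builtin(__builtin_assoc_barrier)
    /* With the builtin, the grouping is kept even under
       -fassociative-math.  */
    return __builtin_assoc_barrier(a + b) + c;
# else
    /* Fallback: plain parentheses, honoured without -ffast-math.  */
    return (a + b) + c;
# endif
#else
    return (a + b) + c;
#endif
}
```

With a = 1e100, b = -1e100 and c = 1.0 the grouped sum is 1.0, whereas a reassociated a + (b + c) would lose the 1.0 to rounding.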
2022-02-09 Jakub Jelinek <jakub@redhat.com>
PR c/104427
* c-parser.cc (c_parser_postfix_expression)
<case RID_BUILTIN_ASSOC_BARRIER>: Use parser_build_unary_op
instead of build1_loc to build PAREN_EXPR.
* c-typeck.cc (build_unary_op): Handle PAREN_EXPR.
* c-fold.cc (c_fully_fold_internal): Likewise.
* gcc.dg/pr104427.c: New test.
Uros Bizjak [Wed, 9 Feb 2022 19:19:45 +0000 (20:19 +0100)]
i386: -mno-xsave should disable all relevant ISA flags [PR104462]
2022-02-09 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/104462
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_XSAVE_UNSET):
Also include OPTION_MASK_ISA2_AVX2_UNSET.
gcc/testsuite/ChangeLog:
PR target/104462
* gcc.target/i386/pr104462.c: New test.
Uros Bizjak [Wed, 9 Feb 2022 19:18:10 +0000 (20:18 +0100)]
i386: Force inputs to a register to avoid lowpart_subreg failure [PR104458]
Input operands can be in the form of:
(subreg:DI (reg:V2SF 96) 0)
which chokes lowpart_subreg. Force inputs to a register, which is
preferable even when the input operand is from memory.
2022-02-09 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/104458
* config/i386/i386-expand.cc (ix86_split_idivmod):
Force operands[2] and operands[3] into a register.
gcc/testsuite/ChangeLog:
PR target/104458
* gcc.target/i386/pr104458.c: New test.
Jeff Law [Wed, 9 Feb 2022 19:10:53 +0000 (14:10 -0500)]
Avoid using predefined insn name for instruction with different semantics
This isn't technically a regression, but it only impacts the v850 target and
fixes a long standing code correctness issue.
As outlined in slightly more detail in the PR, the v850 is using the pattern
name "fnmasf4" and "fnmssf4" to generate fnmaf.s and fnmsf.s instructions
respectively.
Unfortunately fnmasf4 is expected to produce (-a * b) + c and
fnmssf4 (-a * b) - c. Those v850 instructions actually negate the entire
result.
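The mismatch described above can be sketched in plain C. The function names are hypothetical, and the v850 variants assume that "negate the entire result" means negating the whole multiply-add (or multiply-subtract); the commit message does not spell this out.

```c
/* What GCC expects from the fnmasf4 / fnmssf4 pattern names,
   as stated in the commit message.  */
float gcc_fnma(float a, float b, float c) { return (-a * b) + c; }
float gcc_fnms(float a, float b, float c) { return (-a * b) - c; }

/* Assumed behaviour of the v850 fnmaf.s / fnmsf.s instructions:
   the entire multiply-add or multiply-subtract result is negated.  */
float v850_fnmaf_s(float a, float b, float c) { return -((a * b) + c); }
float v850_fnmsf_s(float a, float b, float c) { return -((a * b) - c); }
```

For a = 2, b = 3, c = 1 the GCC meaning of fnma gives -5 while the assumed v850 instruction gives -7, so the two cannot share a pattern name.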
The fix is trivial. Use a different pattern name so that the combiner can
still generate those instructions, but prevent those instructions from being
used to implement GCC's notion of what fnmas and fnmss should be.
This fixes pr97040 as well as a handful of testsuite failures for the v3e5
multilib.
gcc/
PR target/97040
* config/v850/v850.md (*v850_fnmasf4): Renamed from fnmasf4.
(*v850_fnmssf4): Renamed from fnmssf4.
Ian Lance Taylor [Wed, 9 Feb 2022 17:38:18 +0000 (09:38 -0800)]
-fgo-dump-spec: really name alignment field "_"
* godump.cc (go_force_record_alignment): Really name the alignment
field "_" (complete 2021-12-29 change).
* gcc.misc-tests/godump-1.c: Adjust for alignment field rename.
Bill Schmidt [Fri, 4 Feb 2022 19:07:17 +0000 (13:07 -0600)]
rs6000: Correct function prototypes for vec_replace_unaligned
Due to a pasto error in the documentation, vec_replace_unaligned was
implemented with the same function prototypes as vec_replace_elt. It was
intended that vec_replace_unaligned always specify output vectors as having
type vector unsigned char, to emphasize that elements are potentially
misaligned by this built-in function. This patch corrects the
misimplementation.
2022-02-04 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change
function prototype.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DF): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_UN_V4SF): Likewise.
(VREPLACE_UN_V4SI): Likewise.
* config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all
function prototypes.
* config/rs6000/vsx.md (vreplace_un_<mode>): Remove define_expand.
(vreplace_un_<mode>): New define_insn.
gcc/testsuite/
* gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected
prototypes for each call to vec_replace_unaligned.
Richard Sandiford [Wed, 9 Feb 2022 16:57:06 +0000 (16:57 +0000)]
aarch64: Extend vec_concat patterns to 8-byte vectors
This patch extends the previous support for 16-byte vec_concat
so that it supports pairs of 4-byte elements. This too isn't
strictly a regression fix, since the 8-byte forms weren't affected
by the same problems as the 16-byte forms, but it leaves things in
a more consistent state.
gcc/
* config/aarch64/iterators.md (VDCSIF): New mode iterator.
(VDBL): Handle SF.
(single_wx, single_type, single_dtype, dblq): New mode attributes.
* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Extend
from VDC to VDCSIF.
(store_pair_lanes<mode>): Likewise.
(*aarch64_combine_internal<mode>): Likewise.
(*aarch64_combine_internal_be<mode>): Likewise.
(*aarch64_combinez<mode>): Likewise.
(*aarch64_combinez_be<mode>): Likewise.
* config/aarch64/aarch64.cc (aarch64_classify_address): Handle
8-byte modes for ADDR_QUERY_LDP_STP_N.
(aarch64_print_operand): Likewise for %y.
gcc/testsuite/
* gcc.target/aarch64/vec-init-13.c: New test.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-15.c: Likewise.
* gcc.target/aarch64/vec-init-16.c: Likewise.
* gcc.target/aarch64/vec-init-17.c: Likewise.
Richard Sandiford [Wed, 9 Feb 2022 16:57:06 +0000 (16:57 +0000)]
aarch64: Remove move_lo/hi_quad expanders
This patch is the second of two to remove the old
move_lo/hi_quad expanders and move_hi_quad insns.
gcc/
* config/aarch64/aarch64-simd.md (@aarch64_split_simd_mov<mode>):
Use aarch64_combine instead of move_lo/hi_quad. Tabify.
(move_lo_quad_<mode>, aarch64_simd_move_hi_quad_<mode>): Delete.
(aarch64_simd_move_hi_quad_be_<mode>, move_hi_quad_<mode>): Delete.
(vec_pack_trunc_<mode>): Take general_operand elements and use
aarch64_combine rather than move_lo/hi_quad to combine them.
(vec_pack_trunc_df): Likewise.
Richard Sandiford [Wed, 9 Feb 2022 16:57:05 +0000 (16:57 +0000)]
aarch64: Add a general vec_concat expander
After previous patches, we have a (mostly new) group of vec_concat
patterns as well as vestiges of the old move_lo/hi_quad patterns.
(A previous patch removed the move_lo_quad insns, but we still
have the move_hi_quad insns and both sets of expanders.)
This patch is the first of two to remove the old move_lo/hi_quad
stuff. It isn't technically a regression fix, but it seemed
better to make the changes now rather than leave things in
a half-finished and inconsistent state.
This patch defines an aarch64_vec_concat expander that coerces the
element operands into a valid form, including the ones added by the
previous patch. This in turn lets us get rid of one move_lo/hi_quad
pair.
As a side-effect, it also means that vcombines of 2 vectors make
better use of the available forms, like vec_inits of 2 scalars
already do.
gcc/
* config/aarch64/aarch64-protos.h (aarch64_split_simd_combine):
Delete.
* config/aarch64/aarch64-simd.md (@aarch64_combinez<mode>): Rename
to...
(*aarch64_combinez<mode>): ...this.
(@aarch64_combinez_be<mode>): Rename to...
(*aarch64_combinez_be<mode>): ...this.
(@aarch64_vec_concat<mode>): New expander.
(aarch64_combine<mode>): Use it.
(@aarch64_simd_combine<mode>): Delete.
* config/aarch64/aarch64.cc (aarch64_split_simd_combine): Delete.
(aarch64_expand_vector_init): Use aarch64_vec_concat.
gcc/testsuite/
* gcc.target/aarch64/vec-init-12.c: New test.
Richard Sandiford [Wed, 9 Feb 2022 16:57:05 +0000 (16:57 +0000)]
aarch64: Add more vec_combine patterns
vec_combine is really one instruction on aarch64, provided that
the lowpart element is in the same register as the destination
vector. This patch adds patterns for that.
The patch fixes a regression from GCC 8. Before the patch:
int64x2_t s64q_1(int64_t a0, int64_t a1) {
if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
return (int64x2_t) { a1, a0 };
else
return (int64x2_t) { a0, a1 };
}
generated:
fmov d0, x0
ins v0.d[1], x1
ins v0.d[1], x1
ret
whereas GCC 8 generated the more respectable:
dup v0.2d, x0
ins v0.d[1], x1
ret
gcc/
* config/aarch64/predicates.md (aarch64_reg_or_mem_pair_operand):
New predicate.
* config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>)
(*aarch64_combine_internal_be<mode>): New patterns.
gcc/testsuite/
* gcc.target/aarch64/vec-init-9.c: New test.
* gcc.target/aarch64/vec-init-10.c: Likewise.
* gcc.target/aarch64/vec-init-11.c: Likewise.
Richard Sandiford [Wed, 9 Feb 2022 16:57:04 +0000 (16:57 +0000)]
aarch64: Remove redundant vec_concat patterns
move_lo_quad_internal_<mode> and move_lo_quad_internal_be_<mode>
partially duplicate the later aarch64_combinez{,_be}<mode> patterns.
The duplication itself is a regression.
The only substantive differences between the two are:
* combinez uses vector MOV (ORR) instead of element MOV (DUP).
The former seems more likely to be handled via renaming.
* combinez disparages the GPR->FPR alternative whereas move_lo_quad
gave it equal cost. The new test gives a token example of when
the combinez behaviour helps.
gcc/
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>)
(move_lo_quad_internal_be_<mode>): Delete.
(move_lo_quad_<mode>): Use aarch64_combine<Vhalf> instead of the above.
gcc/testsuite/
* gcc.target/aarch64/vec-init-8.c: New test.
Richard Sandiford [Wed, 9 Feb 2022 16:57:03 +0000 (16:57 +0000)]
aarch64: Generalise adjacency check for load_pair_lanes
This patch generalises the load_pair_lanes<mode> guard so that
it uses aarch64_check_consecutive_mems to check for consecutive
mems. It also allows the pattern to be used for STRICT_ALIGNMENT
targets if the alignment is high enough.
The main aim is to avoid an inline test, for the sake of a later patch
that needs to repeat it. Reusing aarch64_check_consecutive_mems seemed
simpler than writing an entirely new function.
gcc/
* config/aarch64/aarch64-protos.h (aarch64_mergeable_load_pair_p):
Declare.
* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Use
aarch64_mergeable_load_pair_p instead of inline check.
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Likewise.
(aarch64_check_consecutive_mems): Allow the reversed parameter
to be null.
(aarch64_mergeable_load_pair_p): New function.
Richard Sandiford [Wed, 9 Feb 2022 16:57:02 +0000 (16:57 +0000)]
aarch64: Generalise vec_set predicate
The aarch64_simd_vec_set<mode> define_insn takes memory operands,
so this patch makes the vec_set<mode> optab expander do the same.
gcc/
* config/aarch64/aarch64-simd.md (vec_set<mode>): Allow the
element to be an aarch64_simd_nonimmediate_operand.
Richard Sandiford [Wed, 9 Feb 2022 16:57:02 +0000 (16:57 +0000)]
aarch64: Tighten general_operand predicates
This patch fixes some cases in which *general_operand was used over
*nonimmediate_operand by patterns that don't accept immediates.
This avoids some complication with later patches.
gcc/
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set<mode>): Use
aarch64_simd_nonimmediate_operand instead of
aarch64_simd_general_operand.
(@aarch64_combinez<mode>): Use nonimmediate_operand instead of
general_operand.
(@aarch64_combinez_be<mode>): Likewise.
Patrick Palka [Wed, 9 Feb 2022 16:33:04 +0000 (11:33 -0500)]
c++: memfn lookup consistency and using-decls [PR104432]
In filter_memfn_lookup, we weren't correctly recognizing and matching up
member functions introduced via a non-dependent using-decl. This caused
us to crash in the below testcases in which we correctly pruned the
overload set for the non-dependent call ahead of time, but then at
instantiation time filter_memfn_lookup failed to match the selected
function (introduced in each case by a non-dependent using-decl) to the
corresponding function from the new lookup set. Such member functions
need special handling in filter_memfn_lookup because they look exactly
the same in the old and new lookup sets, whereas ordinary member
functions that're defined in the (dependent) current class become more
specialized in the new lookup set.
This patch reworks the matching logic in filter_memfn_lookup so that it
handles (member functions introduced by) non-dependent using-decls
correctly, and is hopefully simpler overall.
PR c++/104432
gcc/cp/ChangeLog:
* call.cc (build_new_method_call): When a non-dependent call
resolves to a specialization of a member template, always build
the pruned overload set using the member template, not the
specialization.
* pt.cc (filter_memfn_lookup): New parameter newtype. Simplify
and correct how members from the new lookup set are matched to
those from the old one.
(tsubst_baselink): Pass binfo_type as newtype to
filter_memfn_lookup.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent19.C: New test.
* g++.dg/template/non-dependent19a.C: New test.
* g++.dg/template/non-dependent20.C: New test.
Jason Merrill [Wed, 9 Feb 2022 05:31:12 +0000 (00:31 -0500)]
c++: modules and explicit(bool) [PR103752]
We weren't streaming a C++20 dependent explicit-specifier.
PR c++/103752
gcc/cp/ChangeLog:
* module.cc (trees_out::core_vals): Stream explicit specifier.
(trees_in::core_vals): Likewise.
* pt.cc (store_explicit_specifier): No longer static.
(tsubst_function_decl): Clear DECL_HAS_DEPENDENT_EXPLICIT_SPEC_P.
* cp-tree.h (lookup_explicit_specifier): Declare.
gcc/testsuite/ChangeLog:
* g++.dg/modules/explicit-bool-1_b.C: New test.
* g++.dg/modules/explicit-bool-1_a.H: New test.
Richard Biener [Wed, 9 Feb 2022 13:52:24 +0000 (14:52 +0100)]
middle-end/104464 - ISEL and non-call EH #2
The following adjusts the earlier change to still allow an
uncritical replacement.
2022-02-09 Richard Biener <rguenther@suse.de>
PR middle-end/104464
* gimple-isel.cc (gimple_expand_vec_cond_expr): Postpone
throwing check to after unproblematic replacement.
* gcc.dg/pr104464.c: New testcase.
Jason Merrill [Wed, 9 Feb 2022 05:30:49 +0000 (00:30 -0500)]
c++: P2493 feature test macro updates
The C++ committee just updated the values of these macros to reflect some
late C++20 papers that we implement but others don't yet; see PR103891.
gcc/c-family/ChangeLog:
* c-cppbuiltin.cc (c_cpp_builtins): Update values
of __cpp_constexpr and __cpp_concepts for C++20.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/feat-cxx2b.C: Adjust.
* g++.dg/cpp2a/feat-cxx2a.C: Adjust.
Roger Sayle [Wed, 9 Feb 2022 14:21:08 +0000 (14:21 +0000)]
[PATCH] PR tree-optimization/104420: Fix checks for constant folding X*0.0
This patch resolves PR tree-optimization/104420, which is a P1 regression
where, as observed by Jakub Jelinek, the conditions for constant folding
x*0.0 are incorrect (following my patch for PR tree-optimization/96392).
The multiplication x*0.0 may yield a negative zero result, -0.0, if X is
negative (not just if x may be negative zero). Hence (without -ffast-math)
(int)x*0.0 can't be optimized to 0.0, but (unsigned)x*0.0 can be constant
folded. This adds a bunch of test cases to confirm the desired behaviour,
and removes an incorrect test from gcc.dg/pr96392.c which checked for the
wrong behaviour.
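The sign behaviour can be checked with a small C sketch. It assumes IEEE 754 arithmetic and no -ffast-math, reads "(int)x" as x having type int, and uses hypothetical helper names; it is not one of the new test cases.

```c
#include <math.h>
#include <stdbool.h>

/* True when x * 0.0 evaluates to negative zero.  */
bool times_zero_is_negative_zero(double x)
{
    volatile double zero = 0.0;   /* volatile keeps the multiply at run time */
    double r = x * zero;
    return r == 0.0 && signbit(r) != 0;
}

/* An int may be negative, so the product may be -0.0.  */
bool int_times_zero_is_negative_zero(int i)
{
    return times_zero_is_negative_zero((double)i);
}

/* An unsigned value converts to a non-negative double, so the
   product is always +0.0 and folding to 0.0 is safe.  */
bool unsigned_times_zero_is_negative_zero(unsigned u)
{
    return times_zero_is_negative_zero((double)u);
}
```

Calling int_times_zero_is_negative_zero(-3) reports a negative zero, while no unsigned argument does, which is why only the unsigned case can be folded.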
2022-02-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/104420
* match.pd (mult @0 real_zerop): Tweak conditions for constant
folding X*0.0 (or X*-0.0) to HONOR_SIGNED_ZEROS when appropriate.
gcc/testsuite/ChangeLog
PR tree-optimization/104420
* gcc.dg/pr104420-1.c: New test case.
* gcc.dg/pr104420-2.c: New test case.
* gcc.dg/pr104420-3.c: New test case.
* gcc.dg/pr104420-4.c: New test case.
* gcc.dg/pr96392.c: Remove incorrect test.
Jakub Jelinek [Wed, 9 Feb 2022 14:17:52 +0000 (15:17 +0100)]
dwarf2out: Don't call expand_expr during early_dwarf [PR104407]
As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init
which can call expand_expr already during early_dwarf. The comment and PR
explain that the intent is to ensure the referenced vars and functions
are properly mangled because free_lang_data doesn't cover everything, like
template parameters etc. It doesn't work well though, because expand_expr
can set DECL_RTLs e.g. on referenced vars and keep them there, and they can
be created e.g. with different MEM_ALIGN compared to what they would be
created with if they were emitted later.
So, the following patch stops calling rtl_for_decl_init and instead
for cases for which rtl_for_decl_init does anything at all walks the
initializer and ensures referenced vars or functions are mangled.
2022-02-09 Jakub Jelinek <jakub@redhat.com>
PR debug/104407
* dwarf2out.cc (mangle_referenced_decls): New function.
(tree_add_const_value_attribute): Don't call rtl_for_decl_init if
early_dwarf. Instead walk the initializer and try to mangle vars or
functions referenced from it.
* g++.dg/debug/dwarf2/pr104407.C: New test.
Andrew MacLeod [Mon, 7 Feb 2022 20:52:16 +0000 (15:52 -0500)]
Register non-null side effects properly.
This patch adjusts uses of nonnull to accurately reflect "somewhere in block".
It also adds the ability to register statement side effects within a block
for ranger which will apply for the rest of the block.
PR tree-optimization/104288
gcc/
* gimple-range-cache.cc (non_null_ref::set_nonnull): New.
(non_null_ref::adjust_range): Move to header.
(ranger_cache::range_of_def): Don't check non-null.
(ranger_cache::entry_range): Don't check non-null.
(ranger_cache::range_on_edge): Check for nonnull on normal edges.
(ranger_cache::update_to_nonnull): New.
(non_null_loadstore): New.
(ranger_cache::block_apply_nonnull): New.
* gimple-range-cache.h (class non_null_ref): Update prototypes.
(non_null_ref::adjust_range): Move to here and inline.
(class ranger_cache): Update prototypes.
* gimple-range-path.cc (path_range_query::range_defined_in_block): Do
not search dominators.
(path_range_query::adjust_for_non_null_uses): Ditto.
* gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for
def overrides. Do not check nonnull.
(gimple_ranger::range_on_entry): Check dominators for nonnull.
(gimple_ranger::range_on_edge): Check for nonnull on normal edges.
(gimple_ranger::register_side_effects): New.
* gimple-range.h (gimple_ranger::register_side_effects): New.
* tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects.
gcc/testsuite/
* gcc.dg/pr104288.c: New.
Richard Biener [Wed, 9 Feb 2022 09:55:18 +0000 (10:55 +0100)]
tree-optimization/104445 - check for vector extraction support
This adds a missing check to epilogue reduction re-use, namely
that we can do hi/lo extracts from the vector when demoting it
to the epilogue vector size.
I've chosen to add a can_vec_extract helper to optabs-query.h.
In the future we might want to simplify the vectorizer's life by
handling vector-from-vector extraction via BIT_FIELD_REFs during
RTL expansion via the mode punning when the vec_extract is not
directly supported.
I'm not 100% sure we can always do the punning of the
vec_extract result to a vector mode of the same size, but then
I'm also not sure how to check for that (the vectorizer doesn't check
in the other places where it does that at the moment, but I suppose we
eventually just go through memory there)?
2022-02-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/104445
PR tree-optimization/102832
* optabs-query.h (can_vec_extract): New.
* optabs-query.cc (can_vec_extract): Likewise.
* tree-vect-loop.cc (vect_find_reusable_accumulator): Check
we can extract a hi/lo part from the larger vector, rework
check iteration from larger to smaller sizes.
* gcc.dg/vect/pr104445.c: New testcase.
H.J. Lu [Sat, 19 Jun 2021 12:12:48 +0000 (05:12 -0700)]
x86: Add -m[no-]direct-extern-access
Add -m[no-]direct-extern-access and nodirect_extern_access attribute.
-mdirect-extern-access is the default. With nodirect_extern_access
attribute, GOT is always used to access undefined data and function
symbols with nodirect_extern_access attribute, including in PIE and
non-PIE. With -mno-direct-extern-access:
1. Always use GOT to access undefined data and function symbols,
including in PIE and non-PIE. This will avoid copy relocations
in executables. This is compatible with existing executables and
shared libraries.
2. In executable and shared library, bind symbols with the STV_PROTECTED
visibility locally:
a. The address of data symbol is the address of data body.
b. For systems without function descriptor, the function pointer is
the address of function body.
c. The resulting shared libraries may be incompatible with
executables which have copy relocations on protected symbols or
use executable PLT entries as function addresses for protected
functions in shared libraries.
3. Update asm_preferred_eh_data_format to select PC relative EH encoding
format with -mno-direct-extern-access to avoid copy relocation.
4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
relocation with -mno-direct-extern-access.
gcc/
PR target/35513
PR target/100593
* config/i386/gnu-property.cc: Include "i386-protos.h".
(file_end_indicate_exec_stack_and_gnu_property): Generate
a GNU_PROPERTY_1_NEEDED note for -mno-direct-extern-access or
nodirect_extern_access attribute.
* config/i386/i386-options.cc
(handle_nodirect_extern_access_attribute): New function.
(ix86_attribute_table): Add nodirect_extern_access attribute.
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
bool argument.
(ix86_has_no_direct_extern_access): New.
* config/i386/i386.cc (ix86_has_no_direct_extern_access): New.
(ix86_force_load_from_GOT_p): Add a bool argument to indicate
call operand. Force non-call load from GOT for
-mno-direct-extern-access or nodirect_extern_access attribute.
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE
for -mno-direct-extern-access or nodirect_extern_access attribute.
(ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
for call operand.
(asm_preferred_eh_data_format): Use PC-relative format for
-mno-direct-extern-access to avoid copy relocation. Check
ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
(ix86_binds_local_p): Set ix86_has_no_direct_extern_access to
true for -mno-direct-extern-access or nodirect_extern_access
attribute. Don't treat protected data as extern and avoid copy
relocation on common symbol with -mno-direct-extern-access or
nodirect_extern_access attribute.
(ix86_reloc_rw_mask): New to avoid copy relocation for
-mno-direct-extern-access.
(TARGET_ASM_RELOC_RW_MASK): New.
* config/i386/i386.opt: Add -mdirect-extern-access.
* doc/extend.texi: Document nodirect_extern_access attribute.
* doc/invoke.texi: Document -m[no-]direct-extern-access.
gcc/testsuite/
PR target/35513
PR target/100593
* g++.target/i386/pr35513-1.C: New file.
* g++.target/i386/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-1a.c: Likewise.
* gcc.target/i386/pr35513-1b.c: Likewise.
* gcc.target/i386/pr35513-2a.c: Likewise.
* gcc.target/i386/pr35513-2b.c: Likewise.
* gcc.target/i386/pr35513-3a.c: Likewise.
* gcc.target/i386/pr35513-3b.c: Likewise.
* gcc.target/i386/pr35513-4a.c: Likewise.
* gcc.target/i386/pr35513-4b.c: Likewise.
* gcc.target/i386/pr35513-5a.c: Likewise.
* gcc.target/i386/pr35513-5b.c: Likewise.
* gcc.target/i386/pr35513-6a.c: Likewise.
* gcc.target/i386/pr35513-6b.c: Likewise.
* gcc.target/i386/pr35513-7a.c: Likewise.
* gcc.target/i386/pr35513-7b.c: Likewise.
* gcc.target/i386/pr35513-8.c: Likewise.
* gcc.target/i386/pr35513-9a.c: Likewise.
* gcc.target/i386/pr35513-9b.c: Likewise.
* gcc.target/i386/pr35513-10a.c: Likewise.
* gcc.target/i386/pr35513-10b.c: Likewise.
* gcc.target/i386/pr35513-11a.c: Likewise.
* gcc.target/i386/pr35513-11b.c: Likewise.
* gcc.target/i386/pr35513-12a.c: Likewise.
* gcc.target/i386/pr35513-12b.c: Likewise.
H.J. Lu [Sun, 30 Jan 2022 18:08:14 +0000 (10:08 -0800)]
x86: Check each component of source operand for AVX_U128_DIRTY
commit
9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Jul 27 07:46:04 2021 -0700
x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register
called ix86_check_avx_upper_register to check mode on source operand.
But ix86_check_avx_upper_register doesn't work on source operand like
(vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91])
(parallel [
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
Add ix86_avx_u128_mode_source to check mode for each component of source
operand.
gcc/
PR target/104441
* config/i386/i386.cc (ix86_avx_u128_mode_source): New function.
(ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN.
Call ix86_avx_u128_mode_source to check mode for each component
of source operand.
gcc/testsuite/
PR target/104441
* gcc.target/i386/pr104441-1a.c: New test.
* gcc.target/i386/pr104441-1b.c: Likewise.
liuhongt [Wed, 9 Feb 2022 05:14:43 +0000 (13:14 +0800)]
ICE: QImode(not SImode) operand should be passed to gen_vec_initv16qiqi in ashlv16qi3.
ix86_expand_vector_init expects vals to be a parallel containing
values of individual fields which should be either element mode of the
vector mode, or a vector mode with the same element mode and smaller
number of elements.
But in the expander ashlv16qi3, the second operand is SImode which
can't be directly passed to gen_vec_initv16qiqi.
gcc/ChangeLog:
PR target/104451
* config/i386/sse.md (<insn><mode>3): lowpart_subreg
operands[2] from SImode to QImode.
gcc/testsuite/ChangeLog:
PR target/104451
* gcc.target/i386/pr104451.c: New test.
Richard Biener [Wed, 9 Feb 2022 08:11:28 +0000 (09:11 +0100)]
middle-end/104450 - ISEL and non-call EH
The following avoids merging a vector compare with EH with a
VEC_COND_EXPR. We should be able to do fallback expansion, and if
we really want the optimization we need quite some shuffling
to arrange for the proper EH redirection in all cases, IMHO not
worth it.
2022-02-09 Richard Biener <rguenther@suse.de>
PR middle-end/104450
* gimple-isel.cc: Pass cfun around.
(gimple_expand_vec_cond_expr): Do not combine a throwing
comparison with the select.
* g++.dg/torture/pr104450.C: New testcase.
Richard Biener [Wed, 9 Feb 2022 07:48:35 +0000 (08:48 +0100)]
target/104453 - guard call folding with NULL LHS
This guards shift builtin folding to do nothing when there is
no LHS, similar to what other foldings do.
2022-02-09 Richard Biener <rguenther@suse.de>
PR target/104453
* config/i386/i386.cc (ix86_gimple_fold_builtin): Guard shift
folding for NULL LHS.
* gcc.target/i386/pr104453.c: New testcase.
Ian Lance Taylor [Mon, 7 Feb 2022 02:25:25 +0000 (18:25 -0800)]
compiler: recognize Go 1.18 runtime/internal/atomic methods
The Go 1.18 library introduces specific types in runtime/internal/atomic.
Recognize and optimize the methods on those types, as we do with the
functions in runtime/internal/atomic.
While we're here avoid getting confused by methods in any other
package that we recognize specially.
* go-gcc.cc (Gcc_backend::Gcc_backend): Define builtins
__atomic_load_1 and __atomic_store_1.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/383654
Ian Lance Taylor [Sat, 5 Feb 2022 03:59:59 +0000 (19:59 -0800)]
compiler, internal/abi: implement FuncPCABI0, FuncPCABIInternal
The Go 1.18 standard library uses an internal/abi package with two
functions that are implemented in the compiler. This patch implements
them in the gofrontend, to support the upcoming update to 1.18.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/383514
Joel Teichroeb [Fri, 4 Feb 2022 16:35:08 +0000 (11:35 -0500)]
analyzer: Fix tests for glibc 2.35 [PR101081]
In recent versions of glibc fopen has __attribute__((malloc)).
Since we cannot detect whether this attribute is present or not,
we avoid including stdio.h and instead forward declare what we
need in each test.
Signed-off-by: Joel Teichroeb <joel@teichroeb.net>
gcc/testsuite/ChangeLog:
PR analyzer/101081
* gcc.dg/analyzer/analyzer-verbosity-2a.c: Replace #include of
stdio.h with declarations needed by the test.
* gcc.dg/analyzer/analyzer-verbosity-3a.c: Likewise.
* gcc.dg/analyzer/edges-1.c: Likewise.
* gcc.dg/analyzer/file-1.c: Likewise.
* gcc.dg/analyzer/file-2.c: Likewise.
* gcc.dg/analyzer/file-paths-1.c: Likewise.
* gcc.dg/analyzer/file-pr58237.c: Likewise.
* gcc.dg/analyzer/pr99716-1.c: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 8 Feb 2022 21:37:08 +0000 (16:37 -0500)]
analyzer: fix hashing of bit_range_region::key_t [PR104452]
gcc/analyzer/ChangeLog:
PR analyzer/104452
* region-model.cc (selftest::test_bit_range_regions): New.
(selftest::analyzer_region_model_cc_tests): Call it.
* region.h (bit_range_region::key_t::hash): Fix hashing of m_bits
to avoid using uninitialized data.
gcc/testsuite/ChangeLog:
PR analyzer/104452
* gcc.dg/analyzer/pr104452.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jason Merrill [Fri, 4 Feb 2022 23:25:51 +0000 (18:25 -0500)]
c++: cleanup constant-init'd members [PR96876]
This is a case missed by my recent fixes to aggregate initialization and
exception cleanup for PR94041 et al: we also need to clean up members with
constant initialization if initialization of a later member throws.
It also occurs to me that we needn't bother building the cleanups if
-fno-exceptions; build_vec_init already doesn't.
PR c++/96876
gcc/cp/ChangeLog:
* typeck2.cc (split_nonconstant_init_1): Push cleanups for
preceding members with constant initialization.
(maybe_push_temp_cleanup): Do nothing if -fno-exceptions.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/aggr-base11.C: New test.
* g++.dg/eh/aggregate2.C: New test.
GCC Administrator [Wed, 9 Feb 2022 00:16:24 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Tue, 8 Feb 2022 15:57:58 +0000 (15:57 +0000)]
libstdc++: Simplify resource management in directory iterators
This replaces the _Dir constructor that takes ownership of an existing
DIR* resource with one that takes a _Dir_base rvalue instead. This means
a raw DIR* is never passed around, but is always owned by a _Dir_base
object.
libstdc++-v3/ChangeLog:
* src/c++17/fs_dir.cc (_Dir(DIR*, const path&)): Change first
parameter to _Dir_base&&.
* src/filesystem/dir-common.h (_Dir_base(DIR*)): Remove.
* src/filesystem/dir.cc (_Dir(DIR*, const path&)): Change first
parameter to _Dir_base&&.
Robin Dapp [Tue, 8 Feb 2022 15:11:20 +0000 (16:11 +0100)]
ifcvt: Fix PR104153 and PR104198.
This is a bugfix for r12-6747-gaa8cfe785953a0 which caused an ICE
on or1k (PR104153) and broke SPARC bootstrap (PR104198).
cond_exec_get_condition () returns the jump condition directly and we
now pass it to the backend. The or1k backend modified the condition
in-place (other backends do that as well) but this modification is not
reverted when the sequence in question is discarded. Therefore we copy
the RTX instead of using it directly.
The SPARC problem is due to the SPARC backend recreating the initial
condition when being passed a CC comparison. This causes the sequence
to read from an already overwritten condition operand. Generally, this
could also happen on other targets. The workaround is to always first
emit to a temporary. In a second run of noce_convert_multiple_sets_1
we know which sequences actually require the comparison and will use no
temporaries if all sequences after the current one do not require it.
PR rtl-optimization/104198
PR rtl-optimization/104153
gcc/ChangeLog:
* ifcvt.cc (noce_convert_multiple_sets_1): Copy rtx instead of
using it directly. Rework comparison handling and always
perform a second pass.
gcc/testsuite/ChangeLog:
* gcc.dg/pr104198.c: New test.
Jakub Jelinek [Tue, 8 Feb 2022 19:17:55 +0000 (20:17 +0100)]
c++: Don't emit repeated -Wshadow warnings for templates/ctors [PR104379]
The following patch suppresses extraneous -Wshadow warnings.
On the testcase without the patch we emit 14 -Wshadow warnings,
with the patch just 4. It is enough to warn once e.g. during parsing of the
template or the abstract ctor, while previously we'd warn also on the clones
of the ctors and on instantiation.
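For illustration, the kind of code affected looks roughly like the sketch below. It is not the PR104379 testcase; Accumulator is a made-up name. With -Wshadow, the local declaration in the constructor body is the sort of thing that is diagnosed, and the point of the patch is that one diagnostic per declaration is enough even though the constructor is cloned and the template is instantiated more than once.

```cpp
// Illustrative only: a template whose constructor declares a local
// that shadows a data member.
template <typename T>
struct Accumulator {
    T total;

    explicit Accumulator(T start) : total(start) {
        T total = start + 1;    // local shadows the member 'total'
        this->total += total;   // member becomes start + (start + 1)
    }
};
```

Instantiating it for two types, e.g. Accumulator<int> and Accumulator<long>, is what previously multiplied the warnings.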
In GCC 8 and earlier we didn't warn because check_local_shadow did
/* Inline decls shadow nothing. */
if (DECL_FROM_INLINE (decl))
return;
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR c++/104379
* name-lookup.cc (check_local_shadow): When diagnosing shadowing
of a member or global declaration, add warning suppression for
the decl and don't warn again on it.
* g++.dg/warn/Wshadow-18.C: New test.
Jakub Jelinek [Tue, 8 Feb 2022 19:15:42 +0000 (20:15 +0100)]
c++: Remove superfluous assert [PR104403]
I've added the assert because start_decl diagnoses such vars for C++20 and
earlier:
if (current_function_decl && VAR_P (decl)
&& DECL_DECLARED_CONSTEXPR_P (current_function_decl)
&& cxx_dialect < cxx23)
but as can be seen, we can trigger the assert in older standards, e.g. during
non-manifestly constant evaluation. Rather than refining the assert to say that
DECL_EXPRs for such vars don't appear for C++20 and older if they are inside
functions declared constexpr, this patch just removes the assert; the
code rejects those vars in constant expressions anyway.
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR c++/104403
* constexpr.cc (cxx_eval_constant_expression): Don't assert DECL_EXPRs
of TREE_STATIC vars may only appear in -std=c++23.
* g++.dg/cpp0x/lambda/lambda-104403.C: New test.
Jakub Jelinek [Tue, 8 Feb 2022 19:14:30 +0000 (20:14 +0100)]
rs6000: Fix up vspltis_shifted [PR102140]
The following testcase ICEs, because
(const_vector:V4SI [
(const_int 0 [0]) repeated x3
(const_int -2147483648 [0xffffffff80000000])
])
is recognized as valid easy_vector_constant in between split1 pass and
end of RA.
The problem is that such constants need to be split, and the only
splitter for that is:
(define_split
[(set (match_operand:VM 0 "altivec_register_operand")
(match_operand:VM 1 "easy_vector_constant_vsldoi"))]
"VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
There is only a single splitting pass before RA, so after that finishes,
if something gets matched in between that and end of RA (after that
can_create_pseudo_p () would be no longer true), it will never be
successfully split and we ICE at final.cc time or earlier.
The i386 backend (and a few others) already use
(cfun->curr_properties & PROP_rtl_split_insns)
as a test for split1 pass finished, so that some insns that should be split
during split1 and shouldn't be matched afterwards are properly guarded.
So, the following patch does that for vspltis_shifted too.
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR target/102140
* config/rs6000/rs6000.cc (vspltis_shifted): Return false also if
split1 pass has finished already.
* gcc.dg/pr102140.c: New test.
Bill Schmidt [Tue, 8 Feb 2022 16:36:14 +0000 (10:36 -0600)]
rs6000: Add support for vmsumcud and vec_msumc
2022-02-08 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtins.def (VMSUMCUD): New.
* config/rs6000/rs6000-overload.def (VEC_MSUMC): New.
* config/rs6000/vsx.md (UNSPEC_VMSUMCUD): New constant.
(vmsumcud): New define_insn.
gcc/testsuite/
* gcc.target/powerpc/vec-msumc.c: New test.
Patrick Palka [Tue, 8 Feb 2022 14:47:34 +0000 (09:47 -0500)]
c++: Add testcase for already fixed PR [PR104425]
Fixed by r12-1829.
PR c++/104425
gcc/testsuite/ChangeLog:
* g++.dg/template/partial-specialization10.C: New test.
Tom de Vries [Tue, 8 Feb 2022 14:35:37 +0000 (15:35 +0100)]
[nvptx] Unbreak build, add PTX_ISA_SM70
With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a
use of PTX_ISA_SM70, without adding it first.
Add it, as well as the corresponding TARGET_SM70.
Built for x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-08 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-opts.h (enum ptx_isa): Add PTX_ISA_SM70.
* config/nvptx/nvptx.h (TARGET_SM70): Define.
Robin Dapp [Tue, 8 Feb 2022 13:56:29 +0000 (14:56 +0100)]
s390: Increase costs for load on condition and change movqicc expander.
This patch changes the costs for a load on condition from 5 to 6 in
order to ensure that we only if-convert two and not three or more SETS like
if (cond)
{
a = b;
c = d;
e = f;
}
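As a rough source-level picture of what if-conversion does with two SETs, the sketch below shows a branchy form and an equivalent select form. The names are hypothetical, and whether a compiler actually emits load-on-condition instructions for either form depends on the target and its cost model.

```cpp
struct Values {
    int a;
    int c;
};

// Branchy form: both assignments are guarded by one conditional jump.
Values with_branch(bool cond, Values v, int b, int d) {
    if (cond) {
        v.a = b;
        v.c = d;
    }
    return v;
}

// If-converted form: each assignment becomes a conditional select,
// which maps naturally onto a load-on-condition style instruction.
Values if_converted(bool cond, Values v, int b, int d) {
    v.a = cond ? b : v.a;
    v.c = cond ? d : v.c;
    return v;
}
```

Each extra SET adds another conditional select, which is why the per-instruction cost decides how many SETs are worth converting.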
In the movqicc expander we emit a paradoxical subreg directly that
combine would otherwise try to create by using a non-optimal sequence
(which would be too expensive).
Also, fix two oversights in ifcvt testcases.
gcc/ChangeLog:
* config/s390/s390.cc (s390_rtx_costs): Increase costs for load
on condition.
* config/s390/s390.md: Use paradoxical subreg.
gcc/testsuite/ChangeLog:
* gcc.target/s390/ifcvt-two-insns-int.c: Fix array size.
* gcc.target/s390/ifcvt-two-insns-long.c: Ditto.
Robin Dapp [Tue, 8 Feb 2022 13:39:16 +0000 (14:39 +0100)]
combine: Check for paradoxical subreg.
This adds a check for a paradoxical subreg in reg_subword_p ()
in order to prevent an ICE on s390 in try_combine () triggered
by the movqicc expander.
gcc/ChangeLog:
* combine.cc (reg_subword_p): Check for paradoxical subreg.
Jonathan Wakely [Tue, 8 Feb 2022 13:41:33 +0000 (13:41 +0000)]
libstdc++: Add comment to acinclude.m4
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Add comment about
checking for CAS on correct word size.
Patrick Palka [Tue, 8 Feb 2022 14:11:31 +0000 (09:11 -0500)]
c++: deducing only from noexcept-spec [PR80951]
The "fail-fast" predicate uses_deducible_template_parms used by
unify_one_argument is neglecting to look inside the noexcept-spec of a
function type. This causes deduction to spuriously fail whenever the
noexcept-spec is the only part of the type which contains a deducible
template parameter.
PR c++/80951
gcc/cp/ChangeLog:
* pt.cc (uses_deducible_template_parms): Consider the
noexcept-spec of a function type.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/noexcept-type25.C: New test.
Patrick Palka [Tue, 8 Feb 2022 14:11:29 +0000 (09:11 -0500)]
c++: satisfaction value of type const bool [PR104410]
Here constant evaluation of the atomic constraint use_func_v<T>
sensibly yields an INTEGER_CST of type const bool, but the assert in
satisfaction_value expects unqualified bool. So let's just relax the
assert to accept cv-qualified bool.
PR c++/104410
gcc/cp/ChangeLog:
* constraint.cc (satisfaction_value): Relax assert to accept
cv-qualified bool.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-pr104410.C: New test.
Patrick Palka [Tue, 8 Feb 2022 13:46:32 +0000 (08:46 -0500)]
c++: lambda in pack expansion using pack in constraint [PR103706]
Here when expanding the pack expansion pattern containing a constrained
lambda, the template argument for each Ts is an ARGUMENT_PACK_SELECT,
which we store inside the lambda's LAMBDA_EXPR_REGEN_INFO. Then during
satisfaction of the lambda's constraint C<Ts> the satisfaction cache
uses this argument as part of the key to the corresponding sat_entry, but
iterative_hash_template_arg and template_args_equal deliberately don't
handle ARGUMENT_PACK_SELECT.
Since it's wrong to preserve ARGUMENT_PACK_SELECT inside a hash table
due to its instability (as documented in iterative_hash_template_arg),
this patch helps make sure the satisfaction cache doesn't see such trees
by resolving ARGUMENT_PACK_SELECT arguments before adding them to
LAMBDA_EXPR_REGEN_INFO.
PR c++/103706
gcc/cp/ChangeLog:
* pt.cc (preserve_args): New function.
(tsubst_lambda_expr): Use it when setting LAMBDA_EXPR_REGEN_INFO.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-lambda19.C: New test.
Patrick Palka [Tue, 8 Feb 2022 13:46:13 +0000 (08:46 -0500)]
c++: constrained auto in lambda using outer tparms [PR103706]
Here we're crashing during satisfaction of the lambda's placeholder type
constraints because the constraints depend on the template arguments
from the enclosing scope, which aren't part of the lambda's DECL_TI_ARGS.
This patch fixes this by making do_auto_deduction consider the
"regenerating" template arguments of a lambda for satisfaction,
mirroring what's done in satisfy_declaration_constraints.
PR c++/103706
gcc/cp/ChangeLog:
* constraint.cc (satisfy_declaration_constraints): Use
lambda_regenerating_args instead.
* cp-tree.h (lambda_regenerating_args): Declare.
* pt.cc (lambda_regenerating_args): Define, split out from
satisfy_declaration_constraints.
(do_auto_deduction): Use lambda_regenerating_args to obtain the
full set of outer template arguments for satisfaction when
inside a lambda.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-lambda18.C: New test.
Jonathan Wakely [Tue, 8 Feb 2022 12:45:46 +0000 (12:45 +0000)]
libstdc++: Adjust Filesystem TS test for Windows
The Filesystem TS isn't really supported for Windows, but the FAIL for
this test is just because it doesn't match what happens on Windows.
libstdc++-v3/ChangeLog:
* testsuite/experimental/filesystem/operations/create_directories.cc:
Adjust expected results for Windows.
Jonathan Wakely [Mon, 7 Feb 2022 23:36:47 +0000 (23:36 +0000)]
libstdc++: Fix filesystem::remove_all for Windows [PR104161]
The recursive_directory_iterator::__erase member was failing for
Windows, because the entry._M_type value is always file_type::none
(because _Dir_base::advance doesn't populate it for Windows) and
top.unlink uses fs::remove which sets an error using the
system_category. That meant that ec.value() was a Windows error code and
not an errno value, so the comparisons to EPERM and EISDIR failed.
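Why comparing ec.value() against errno constants breaks down can be shown with a small model. This is not the fix used in the patch (which queries the file type instead); FakeNativeCategory and kNativeIsDirectory are hypothetical stand-ins for a platform-specific category and one of its codes.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical native error number; deliberately unrelated to any errno value.
constexpr int kNativeIsDirectory = 1005;

// Hypothetical category standing in for a platform-specific system category.
class FakeNativeCategory : public std::error_category {
public:
    const char* name() const noexcept override { return "fake-native"; }
    std::string message(int code) const override {
        return code == kNativeIsDirectory ? "is a directory"
                                          : "unknown native error";
    }
    // Map the native code onto the portable condition it corresponds to.
    std::error_condition default_error_condition(int code) const noexcept override {
        if (code == kNativeIsDirectory)
            return std::make_error_condition(std::errc::is_a_directory);
        return std::error_condition(code, *this);
    }
};

const std::error_category& fake_native_category() {
    static FakeNativeCategory instance;
    return instance;
}

// Fragile: only meaningful when the code really holds an errno value.
bool matches_errno_by_value(const std::error_code& ec) {
    return ec.value() == EPERM || ec.value() == EISDIR;
}

// Category-aware: compares against portable conditions instead.
bool matches_by_condition(const std::error_code& ec) {
    return ec == std::errc::operation_not_permitted
        || ec == std::errc::is_a_directory;
}
```

The by-value check silently misses the native code, while the condition-based check still depends on the category providing a sensible mapping.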
Instead of depending on a specific Windows error code for attempting to
remove a directory, just use directory_entry::refresh() to query the
type first. This doesn't avoid the TOCTTOU races with directory
symlinks, but we can't avoid them on Windows without openat and
unlinkat, and creating symlinks requires admin privs on Windows anyway.
This also fixes the fs::remove_all(const path&) overload, which was
supposed to use the same logic as the other overload, but I forgot to
change it before my previous commit.
libstdc++-v3/ChangeLog:
PR libstdc++/104161
* src/c++17/fs_dir.cc (fs::recursive_directory_iterator::__erase):
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Refresh entry._M_type member,
instead of checking for errno values indicating a directory.
* src/c++17/fs_ops.cc (fs::remove_all(const path&)): Use similar
logic to non-throwing overload.
(fs::remove_all(const path&, error_code&)): Add comments.
* src/filesystem/ops-common.h: Likewise.
Tom de Vries [Fri, 4 Feb 2022 07:53:52 +0000 (08:53 +0100)]
[nvptx] Choose -mptx default based on -misa
While testing with driver version 390.147 I ran into the problem that it
doesn't support ptx isa version 6.3 (the new default), only 6.1.
Furthermore, using the -mptx option is a bit user-unfriendly.
Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but
then run into errors because the default ptx version is 6.3, which doesn't
support sm_80 yet.
Address both these issues by:
- picking a default -mptx based on the active -misa, and
- ensuring that the default -mptx is at least 6.0 (instead
of 6.3).
Also add an error in case of incompatible options like
"-misa=sm_80 -mptx=6.3":
...
cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \
selected -misa (sm_80)
...
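The selection logic described above can be sketched as below. This is not the nvptx.cc code: versions are encoded as major * 10 + minor purely for the sketch, the function signatures are hypothetical, and only the sm_80 to 7.0 pairing comes from the text above. The sm_30 to 3.0 and sm_53 to 4.2 pairings are assumptions.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Lowest PTX version that knows about a given sm, as major * 10 + minor.
// Only the sm_80 entry is taken from the commit message; the others are
// assumptions made for this sketch.
int first_ptx_version_supporting_sm(int sm) {
    switch (sm) {
    case 30: return 30;
    case 53: return 42;
    case 80: return 70;
    default:
        throw std::invalid_argument("unknown sm_" + std::to_string(sm));
    }
}

// Default -mptx: whatever the selected -misa needs, but never below 6.0.
int default_ptx_version(int sm) {
    return std::max(first_ptx_version_supporting_sm(sm), 60);
}

// Returns an empty string if the explicit combination is usable,
// otherwise a message in the style of the error quoted above.
std::string check_ptx_version(int sm, int ptx) {
    const int need = first_ptx_version_supporting_sm(sm);
    if (ptx >= need)
        return "";
    return "PTX version (-mptx) needs to be at least "
        + std::to_string(need / 10) + "." + std::to_string(need % 10)
        + " to support selected -misa (sm_" + std::to_string(sm) + ")";
}
```

Deriving the default from -misa means a user who only passes -misa=sm_80 never hits the incompatible combination in the first place.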
Tested on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2022-02-08 Tom de Vries <tdevries@suse.de>
PR target/104283
* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_3_0
and PTX_VERSION_4_2.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm)
(default_ptx_version_option, ptx_version_to_string)
(sm_version_to_string, handle_ptx_version_option): New functions.
(nvptx_option_override): Call handle_ptx_version_option.
(nvptx_file_start): Use ptx_version_to_string and sm_version_to_string.
* config/nvptx/nvptx.md (define_insn "nvptx_shuffle<mode>")
(define_insn "nvptx_vote_ballot"): Use TARGET_PTX_6_0.
* config/nvptx/nvptx.opt (mptx): Remove 'Init'.
Maciej W. Rozycki [Tue, 8 Feb 2022 12:14:59 +0000 (12:14 +0000)]
RISC-V/testsuite: Run target testing over all the usual optimization levels
Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
RISC-V testsuite as several targets already do. Adjust test options
across individual test cases accordingly where required.
As some tests want to be run at `-Og', add a suitable optimization
variant via ADDITIONAL_TORTURE_OPTIONS, and include the moderately
recent `-Oz' variant as well.
gcc/testsuite/
* gcc.target/riscv/riscv.exp: Use `gcc-dg-runtest' rather than
`dg-runtest'. Add `-Og -g' and `-Oz' variants via
ADDITIONAL_TORTURE_OPTIONS.
* gcc.target/riscv/arch-1.c: Adjust test options accordingly.
* gcc.target/riscv/arch-10.c: Likewise.
* gcc.target/riscv/arch-11.c: Likewise.
* gcc.target/riscv/arch-12.c: Likewise.
* gcc.target/riscv/arch-2.c: Likewise.
* gcc.target/riscv/arch-3.c: Likewise.
* gcc.target/riscv/arch-4.c: Likewise.
* gcc.target/riscv/arch-5.c: Likewise.
* gcc.target/riscv/arch-6.c: Likewise.
* gcc.target/riscv/arch-7.c: Likewise.
* gcc.target/riscv/arch-8.c: Likewise.
* gcc.target/riscv/arch-9.c: Likewise.
* gcc.target/riscv/attribute-1.c: Likewise.
* gcc.target/riscv/attribute-10.c: Likewise.
* gcc.target/riscv/attribute-11.c: Likewise.
* gcc.target/riscv/attribute-12.c: Likewise.
* gcc.target/riscv/attribute-13.c: Likewise.
* gcc.target/riscv/attribute-14.c: Likewise.
* gcc.target/riscv/attribute-15.c: Likewise.
* gcc.target/riscv/attribute-16.c: Likewise.
* gcc.target/riscv/attribute-17.c: Likewise.
* gcc.target/riscv/attribute-2.c: Likewise.
* gcc.target/riscv/attribute-3.c: Likewise.
* gcc.target/riscv/attribute-4.c: Likewise.
* gcc.target/riscv/attribute-5.c: Likewise.
* gcc.target/riscv/attribute-7.c: Likewise.
* gcc.target/riscv/attribute-8.c: Likewise.
* gcc.target/riscv/attribute-9.c: Likewise.
* gcc.target/riscv/interrupt-1.c: Likewise.
* gcc.target/riscv/interrupt-2.c: Likewise.
* gcc.target/riscv/interrupt-3.c: Likewise.
* gcc.target/riscv/interrupt-4.c: Likewise.
* gcc.target/riscv/interrupt-conflict-mode.c: Likewise.
* gcc.target/riscv/interrupt-debug.c: Likewise.
* gcc.target/riscv/interrupt-mmode.c: Likewise.
* gcc.target/riscv/interrupt-smode.c: Likewise.
* gcc.target/riscv/interrupt-umode.c: Likewise.
* gcc.target/riscv/li.c: Likewise.
* gcc.target/riscv/load-immediate.c: Likewise.
* gcc.target/riscv/losum-overflow.c: Likewise.
* gcc.target/riscv/mcpu-6.c: Likewise.
* gcc.target/riscv/mcpu-7.c: Likewise.
* gcc.target/riscv/pr102957.c: Likewise.
* gcc.target/riscv/pr103302.c: Likewise.
* gcc.target/riscv/pr104140.c: Likewise.
* gcc.target/riscv/pr84660.c: Likewise.
* gcc.target/riscv/pr93202.c: Likewise.
* gcc.target/riscv/pr93304.c: Likewise.
* gcc.target/riscv/pr95252.c: Likewise.
* gcc.target/riscv/pr95683.c: Likewise.
* gcc.target/riscv/pr98777.c: Likewise.
* gcc.target/riscv/pr99702.c: Likewise.
* gcc.target/riscv/predef-1.c: Likewise.
* gcc.target/riscv/predef-10.c: Likewise.
* gcc.target/riscv/predef-11.c: Likewise.
* gcc.target/riscv/predef-12.c: Likewise.
* gcc.target/riscv/predef-13.c: Likewise.
* gcc.target/riscv/predef-14.c: Likewise.
* gcc.target/riscv/predef-15.c: Likewise.
* gcc.target/riscv/predef-16.c: Likewise.
* gcc.target/riscv/predef-2.c: Likewise.
* gcc.target/riscv/predef-3.c: Likewise.
* gcc.target/riscv/predef-4.c: Likewise.
* gcc.target/riscv/predef-5.c: Likewise.
* gcc.target/riscv/predef-6.c: Likewise.
* gcc.target/riscv/predef-7.c: Likewise.
* gcc.target/riscv/predef-8.c: Likewise.
* gcc.target/riscv/promote-type-for-libcall.c: Likewise.
* gcc.target/riscv/save-restore-1.c: Likewise.
* gcc.target/riscv/save-restore-2.c: Likewise.
* gcc.target/riscv/save-restore-3.c: Likewise.
* gcc.target/riscv/save-restore-4.c: Likewise.
* gcc.target/riscv/save-restore-6.c: Likewise.
* gcc.target/riscv/save-restore-7.c: Likewise.
* gcc.target/riscv/save-restore-8.c: Likewise.
* gcc.target/riscv/save-restore-9.c: Likewise.
* gcc.target/riscv/shift-and-1.c: Likewise.
* gcc.target/riscv/shift-and-2.c: Likewise.
* gcc.target/riscv/shift-shift-1.c: Likewise.
* gcc.target/riscv/shift-shift-2.c: Likewise.
* gcc.target/riscv/shift-shift-3.c: Likewise.
* gcc.target/riscv/shift-shift-4.c: Likewise.
* gcc.target/riscv/shift-shift-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-1.c: Likewise.
* gcc.target/riscv/shorten-memrefs-2.c: Likewise.
* gcc.target/riscv/shorten-memrefs-3.c: Likewise.
* gcc.target/riscv/shorten-memrefs-4.c: Likewise.
* gcc.target/riscv/shorten-memrefs-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-6.c: Likewise.
* gcc.target/riscv/shorten-memrefs-7.c: Likewise.
* gcc.target/riscv/shorten-memrefs-8.c: Likewise.
* gcc.target/riscv/switch-qi.c: Likewise.
* gcc.target/riscv/switch-si.c: Likewise.
* gcc.target/riscv/weak-1.c: Likewise.
* gcc.target/riscv/zba-adduw.c: Likewise.
* gcc.target/riscv/zba-shNadd-01.c: Likewise.
* gcc.target/riscv/zba-shNadd-02.c: Likewise.
* gcc.target/riscv/zba-shNadd-03.c: Likewise.
* gcc.target/riscv/zba-slliuw.c: Likewise.
* gcc.target/riscv/zba-zextw.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-01.c: Likewise.
* gcc.target/riscv/zbb-andn-orn-xnor-02.c: Likewise.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.
* gcc.target/riscv/zbb-min-max.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-01.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-02.c: Likewise.
* gcc.target/riscv/zbb-rol-ror-03.c: Likewise.
* gcc.target/riscv/zbbw.c: Likewise.
* gcc.target/riscv/zbs-bclr.c: Likewise.
* gcc.target/riscv/zbs-bext.c: Likewise.
* gcc.target/riscv/zbs-binv.c: Likewise.
* gcc.target/riscv/zbs-bset.c: Likewise.
* gcc.target/riscv/zero-extend-1.c: Likewise.
* gcc.target/riscv/zero-extend-2.c: Likewise.
* gcc.target/riscv/zero-extend-3.c: Likewise.
* gcc.target/riscv/zero-extend-4.c: Likewise.
* gcc.target/riscv/zero-extend-5.c: Likewise.
Maciej W. Rozycki [Tue, 8 Feb 2022 12:14:58 +0000 (12:14 +0000)]
doc: RISC-V: Document the `-misa-spec=' option
We have recently updated the default for the `-misa-spec=' option, yet
we still have not documented it nor its `--with-isa-spec=' counterpart
in the GCC manuals. Fix that.
gcc/
* doc/install.texi (Configuration): Document `--with-isa-spec='
RISC-V option.
* doc/invoke.texi (Option Summary): List `-misa-spec=' RISC-V
option.
(RISC-V Options): Document it.
Maciej W. Rozycki [Tue, 8 Feb 2022 12:14:58 +0000 (12:14 +0000)]
RISC-V: Add target machine headers as a dependency for riscv-sr.o
Make riscv-sr.o depend on target machine headers, removing spurious test
failures:
FAIL: gcc.target/riscv/save-restore-3.c scan-assembler-not call[ \t]*t0,__riscv_save_0
FAIL: gcc.target/riscv/save-restore-3.c scan-assembler-not tail[ \t]*__riscv_restore_0
FAIL: gcc.target/riscv/save-restore-3.c scan-assembler tail[ \t]*foo
FAIL: gcc.target/riscv/save-restore-6.c scan-assembler-not call[ \t]*t0,__riscv_save_0
FAIL: gcc.target/riscv/save-restore-6.c scan-assembler-not tail[ \t]*__riscv_restore_0
FAIL: gcc.target/riscv/save-restore-6.c scan-assembler tail[ \t]*other_func
if the definitions of UNSPECs are locally changed and GCC rebuilt from a
dirty tree.
gcc/
* config/riscv/t-riscv (riscv-sr.o): Add $(TM_H) dependency.
Tom de Vries [Mon, 7 Feb 2022 13:50:13 +0000 (14:50 +0100)]
[nvptx] Fix 'main (int argc)' compilation
On nvptx, with test-case sso-12.c I run into:
...
spawn nvptx-none-run ./sso-12.exe^M
error: Prototype doesn't match for 'main' in 'input file 1 at offset 1796', \
first defined in 'input file 1 at offset 1796'^M
nvptx-run: cuLinkAddData failed: device kernel image is invalid \
(CUDA_ERROR_INVALID_SOURCE, 300)^M
FAIL: gcc.dg/sso-12.c execution test
...
The problem is that the test case uses 'main (int)' prototype, while __main
uses:
...
extern int main (int, void **);
...
There's code in write_fn_proto_1 to handle 'main (void)' as if
'main (int, void **)' was specified, but that's not active for 'main (int)'.
Fix this in write_fn_proto_1 by handling 'main (int)' as if
'main (int, void **)' was specified.
Tested on nvptx.
gcc/ChangeLog:
2022-02-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (write_fn_proto_1): Handle 'main (int)'.
Tom de Vries [Mon, 7 Feb 2022 13:45:12 +0000 (14:45 +0100)]
[testsuite] Require c99_runtime to run builtin-sprintf.c
On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.
The port uses newlib, which by default does not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but that requires alloca support, which nvptx doesn't have.
Fix this by requiring c99_runtime for running the test-case.
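For reference, the modifier in question behaves as in the sketch below on a C99-conforming runtime. The helper name is made up for the example; what a runtime without 'hh' support produces is not shown because it varies.

```cpp
#include <cstdio>
#include <string>

// Formats 'value' with the C99 'hh' length modifier, which makes the
// library convert the argument to signed char before printing it.
std::string format_as_signed_char(int value) {
    char buf[8];  // enough for "-128" plus the terminator
    std::snprintf(buf, sizeof buf, "%hhd", value);
    return buf;
}
```

A value such as 257 only prints as 1 if the library really implements the conversion, which is what the c99_runtime requirement guards.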
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-07 Tom de Vries <tdevries@suse.de>
* gcc.dg/tree-ssa/builtin-sprintf.c: Require c99_runtime for
dg-do run.
Tom de Vries [Thu, 3 Feb 2022 13:00:02 +0000 (14:00 +0100)]
[nvptx] Fix .local atomic regressions
In PR target/104364, two problems were reported:
- in muniform-simt mode, an atom.cas insn is no longer executed in the
"master lane" only.
- in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is
translated assuming it accesses local memory, while that's not the case.
Fix these by:
- ensuring that all insns with atomic attribute are also predicable, such
that the validate_change in nvptx_reorg_uniform_simt will succeed, and
asserting that it does, and
- guarding the local atomics implementation with a new function
nvptx_mem_local_p that correctly handles msoft-stack.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-04 Tom de Vries <tdevries@suse.de>
PR target/104364
* config/nvptx/nvptx-protos.h (nvptx_mem_local_p): Declare.
* config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Assert that
change is validated.
(nvptx_mem_local_p): New function.
* config/nvptx/nvptx.md: Use nvptx_mem_local_p.
(define_c_enum "unspecv"): Add UNSPECV_CAS_LOCAL.
(define_insn "atomic_compare_and_swap<mode>_1_local"): New
non-atomic, non-predicable define_insn, factored out of ...
(define_insn "atomic_compare_and_swap<mode>_1"): ... here.
Make predicable again.
(define_expand "atomic_compare_and_swap<mode>"): Use
atomic_compare_and_swap<mode>_1_local.
gcc/testsuite/ChangeLog:
2022-02-04 Tom de Vries <tdevries@suse.de>
PR target/104364
* gcc.target/nvptx/softstack-2.c: New test.
* gcc.target/nvptx/uniform-simt-1.c: New test.
Jakub Jelinek [Tue, 8 Feb 2022 08:30:17 +0000 (09:30 +0100)]
libgomp: Fix segfault with posthumous orphan tasks [PR104385]
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits. Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed. The following patch fixes that
by clearing the parent field when we schedule such a task for running
by adding it into the queues and we know that the sibling task which
is about to finish has a NULL parent.
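The lifetime problem can be modelled with a toy task structure. This is a sketch only: Task, clear_parent and release_depender are hypothetical simplifications of the libgomp data structures and of gomp_clear_parent and gomp_task_run_post_handle_dependers.

```cpp
#include <vector>

// Toy task: a parent pointer plus the parent's queue of runnable children.
struct Task {
    Task* parent = nullptr;
    std::vector<Task*> children_queue;
};

// Models the parent finishing: only children already in the queue
// have their parent pointer cleared.
void clear_parent(Task& finished) {
    for (Task* child : finished.children_queue)
        child->parent = nullptr;
    finished.children_queue.clear();
}

// Models a depender becoming runnable when the sibling it waited on exits.
void release_depender(const Task& finishing_sibling, Task& depender) {
    // The fix: a sibling with no parent means the parent is already gone,
    // so the depender must not keep pointing at it.
    if (finishing_sibling.parent == nullptr)
        depender.parent = nullptr;
    if (depender.parent != nullptr)
        depender.parent->children_queue.push_back(&depender);
}
```

Without the first check in release_depender, the depender would be queued on a parent that no longer exists.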
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.
Ulrich Weigand [Tue, 8 Feb 2022 08:21:07 +0000 (09:21 +0100)]
MAINTAINERS: Remove Hartmut Penner as s390 maintainer
Hartmut is no longer with IBM and has not worked on GCC for a
long time; he asked to be removed from MAINTAINERS.
ChangeLog:
* MAINTAINERS: Remove Hartmut Penner as s390 maintainer.
liuhongt [Mon, 24 Jan 2022 10:17:47 +0000 (18:17 +0800)]
Don't propagate for a more expensive reg-reg move.
For i386, it enables optimization like:
vmovd %xmm0, %edx
- vmovd %xmm0, %eax
+ movl %edx, %eax
gcc/ChangeLog:
PR rtl-optimization/104059
* regcprop.cc (copyprop_hardreg_forward_1): Don't propagate
for a more expensive reg-reg move.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104059.c: New test.
GCC Administrator [Tue, 8 Feb 2022 00:16:24 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Mon, 7 Feb 2022 19:00:55 +0000 (14:00 -0500)]
analyzer: fix ICE on realloc of non-heap [PR104417]
gcc/analyzer/ChangeLog:
PR analyzer/104417
* sm-taint.cc (tainted_allocation_size::tainted_allocation_size):
Remove overzealous assertion.
(tainted_allocation_size::emit): Likewise.
(region_model::check_dynamic_size_for_taint): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/104417
* gcc.dg/analyzer/pr104417.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 3 Feb 2022 21:21:27 +0000 (16:21 -0500)]
analyzer: fixes to memcpy [PR103872]
PR analyzer/103872 reports a failure of gcc.dg/analyzer/pr103526.c on
riscv64-unknown-elf-gcc. The issue is that I wrote the test on x86_64
where a memcpy in the test is optimized to a read/write pair,
whereas due to alignment differences the analyzer can see it as a
memcpy call, revealing problems with the analyzer's implementation
of memcpy.
This patch reimplements region_model::impl_call_memcpy in terms of a
get_store_value followed by a set_value, fixing the issue.
gcc/analyzer/ChangeLog:
PR analyzer/103872
* region-model-impl-calls.cc (region_model::impl_call_memcpy):
Reimplement in terms of a get_store_value followed by a set_value.
gcc/testsuite/ChangeLog:
PR analyzer/103872
* gcc.dg/analyzer/memcpy-1.c: Add alternate versions of test cases
in which the calls to memcpy are hidden from the optimizer. Add
further test cases.
* gcc.dg/analyzer/taint-size-1.c: Add test coverage for memcpy
with tainted size.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 19 Jan 2022 19:06:25 +0000 (14:06 -0500)]
testsuite: avoid analyzer asm failures on non-Linux
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/asm-x86-1.c: Use dg-do "compile" rather than
"assemble".
* gcc.dg/analyzer/asm-x86-lp64-1.c: Likewise.
* gcc.dg/analyzer/asm-x86-lp64-2.c: Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c:
Likewise, and restrict to x86_64-pc-linux-gnu.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: Use dg-do
"compile" rather than "assemble".
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c:
Likewise, and restrict to x86_64-pc-linux-gnu.
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: Use dg-do
"compile" rather than "assemble".
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jakub Jelinek [Mon, 7 Feb 2022 16:39:11 +0000 (17:39 +0100)]
testsuite: Fix up testsuite/gcc.c-torture/execute/builtins/lib/chk.c for powerpc [PR104380]
> > The following testcase FAILs when configured with
> > --with-long-double-format=ieee . Only happens in the -std=c* modes, not the
> > GNU modes; while the glibc headers have __asm redirects of
> > vsnprintf and __vsnprintf_chk to __vsnprintfieee128 and
> > __vsnprintf_chkieee128, the vsnprintf fortification extern inline gnu_inline
> > always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit
> > a call to __vsnprintf_chk (i.e. with IBM extended long double) instead of
> > __vsnprintf_chkieee128.
> >
> > rs6000_mangle_decl_assembler_name already had cases for *printf and *scanf,
> > so this just adds another case for *printf_chk. *scanf_chk doesn't exist.
> > __ prefixing isn't done because *printf_chk already starts with __.
Unfortunately, while I've tested the testcase also with -mabi=ieeelongdouble
by hand, the full bootstrap/regtest was on GCCFarm where glibc is too old
to test with --with-long-double-format=ieee.
I've done full bootstrap/regtest with that option during the weekend and
the patch regressed:
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O1
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O3 -g
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Og -g
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O1
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O3 -g
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Og -g
FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O1
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O3 -g
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Og -g
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O1
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O3 -g
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Og -g
FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os
The problem is that the execute/builtins/ testsuite wants to override some
of the library functions and with the change we (correctly) call
__*printf_chkieee128 and so lib/chk.c is no longer called but the glibc
APIs are.
2022-02-07 Jakub Jelinek <jakub@redhat.com>
PR target/104380
* gcc.c-torture/execute/builtins/lib/chk.c (__sprintf_chkieee128,
__vsprintf_chkieee128, __snprintf_chkieee128,
__vsnprintf_chkieee128): New aliases to non-ieee128 suffixed functions
for powerpc -mabi=ieeelongdouble.
Tamar Christina [Mon, 7 Feb 2022 12:55:12 +0000 (12:55 +0000)]
AArch32: correct usdot-product RTL patterns.
There was a bug in the ACLE specification for dot product which has now
been fixed[1]. This means some intrinsics were missing and are added by this
patch.
Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
Ok for master?
[1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
gcc/ChangeLog:
* config/arm/arm_neon.h (vusdotq_s32, vusdot_laneq_s32,
vusdotq_laneq_s32, vsudot_laneq_s32, vsudotq_laneq_s32): New.
* config/arm/arm_neon_builtins.def (usdot): Add V16QI.
(usdot_laneq, sudot_laneq): New.
* config/arm/neon.md (neon_<sup>dot_laneq<vsi2qi>): New.
(neon_<sup>dot_lane<vsi2qi>): Remove unneeded code.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vdot-2-1.c: Add new tests.
* gcc.target/arm/simd/vdot-2-2.c: Likewise and fix output.
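For readers unfamiliar with the mixed-sign dot product, the arithmetic for a single 32-bit lane can be modelled in scalar C. This is only a reference sketch of the semantics; the function name usdot_lane_model is hypothetical and nothing here is taken from arm_neon.h or neon.md.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of a mixed-sign dot product:
   four unsigned 8-bit values are multiplied by four signed 8-bit
   values and the products are added to the accumulator.  */
int32_t usdot_lane_model(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

The point of the mixed-sign form is that a byte such as 0xff in the unsigned operand contributes as 255, not as -1.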
Tamar Christina [Mon, 7 Feb 2022 12:54:42 +0000 (12:54 +0000)]
AArch32: correct dot-product RTL patterns.
The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where the intrinsics expect them.
The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.
However operand[3] is not expected to be written to so the current pattern is
bogus.
Instead we use the expand to shuffle around the RTL.
The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.
This also fixes some issues with big-endian: each dot product performs four
8-bit multiplications. However, unlike on AArch64, on AArch32 we don't use GCC
lane indexing aside from loads/stores. This means no lane remappings
are done in arm-builtins.c and so none should be done at the instruction side.
Some other instructions need inspection, as I think more of them are
incorrect.
Third, there was a bug in the ACLE specification for dot product which has now been
fixed[1]. This means some intrinsics were missing and are added by this patch.
Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
Ok for master? and active branches after some stew?
[1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
gcc/ChangeLog:
* config/arm/arm_neon.h (vdot_laneq_u32, vdotq_laneq_u32,
vdot_laneq_s32, vdotq_laneq_s32): New.
* config/arm/arm_neon_builtins.def (sdot_laneq, udot_laneq): New.
* config/arm/neon.md (neon_<sup>dot<vsi2qi>): New.
(<sup>dot_prod<vsi2qi>): Re-order rtl.
(neon_<sup>dot_lane<vsi2qi>): Fix rtl order and endianness.
(neon_<sup>dot_laneq<vsi2qi>): New.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vdot-compile.c: Add new cases.
* gcc.target/arm/simd/vdot-exec.c: Likewise.
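The operand-order mismatch described in this entry (accumulator in operands[1] for the intrinsics, in operands[3] for the vectorizer optab) can be pictured with two scalar C functions. This is a sketch of the idea only, assuming a signed 8-bit dot product on one lane; both function names are hypothetical and the code is unrelated to the actual RTL in neon.md.

```c
#include <stdint.h>

/* Intrinsic-style order: the accumulator comes first, followed by the
   two multiplicand vectors.  Result = acc + dot(a, b).  */
int32_t sdot_intrinsic_order(int32_t acc, const int8_t a[4], const int8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Optab-style order: the multiplicands come first and the accumulator
   last.  The wrapper only reorders the arguments, much as an expand can
   shuffle operands before emitting the underlying instruction.  */
int32_t sdot_prod_optab_order(const int8_t a[4], const int8_t b[4], int32_t acc)
{
    return sdot_intrinsic_order(acc, a, b);
}
```

Both entry points compute the same value; only the position of the accumulator differs, which is why a reordering step is enough and no operand needs to be written in place.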
Andreas Krebbel [Sun, 6 Feb 2022 08:07:41 +0000 (09:07 +0100)]
Check always_inline flag in s390_can_inline_p [PR104327]
MASK_MVCLE is set for -Os but not for other optimization levels. In
general it should not make much sense to inline across calls where the
flag is different but we have to allow it for always_inline.
The patch also rearranges the hook implementation a bit based on the
recommendations from Jakub and Martin in the PR.
Bootstrapped and regression tested on s390x with various arch flags.
Will commit after giving a few days for comments.
gcc/ChangeLog:
PR target/104327
* config/s390/s390.cc (s390_can_inline_p): Accept a few more flags
if always_inline is set. Don't inline when tune differs without
always_inline.
gcc/testsuite/ChangeLog:
PR target/104327
* gcc.c-torture/compile/pr104327.c: New test.
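The situation this entry addresses can be illustrated with a small C fragment in which an always_inline callee is used from a caller compiled with a per-function -Os override. This is an illustrative sketch, not the contents of pr104327.c; the function names are hypothetical, and the optimize attribute is GCC-specific (other compilers may ignore it with a warning).

```c
/* Callee that must be inlined wherever it is called; it is compiled
   with the translation unit's default options.  */
static inline __attribute__((always_inline)) int
add_one(int x)
{
    return x + 1;
}

/* Caller with a per-function -Os override.  On s390 this is the kind of
   caller whose target flags (MASK_MVCLE) can differ from the callee's,
   so the can_inline_p hook has to accept the always_inline case.  */
__attribute__((optimize("Os"))) int
call_from_os(int x)
{
    return add_one(x);
}
```

If the hook rejected this combination, the compiler would have to report an error for the always_inline function instead of inlining it.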
Richard Biener [Mon, 7 Feb 2022 08:31:07 +0000 (09:31 +0100)]
middle-end/104402 - split out _Complex compares from COND_EXPRs
This makes sure we always have a _Complex compare split to a
different stmt for the compare operand in a COND_EXPR on GIMPLE.
Complex lowering doesn't handle this and the change is something
we want for all kinds of compares at some point.
2022-02-07 Richard Biener <rguenther@suse.de>
PR middle-end/104402
* gimple-expr.cc (is_gimple_condexpr): _Complex typed
compares are not valid.
* tree-cfg.cc (verify_gimple_assign_ternary): For COND_EXPR
check is_gimple_condexpr.
* gcc.dg/torture/pr104402.c: New testcase.
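At the source level, the change in this entry corresponds roughly to the difference between the first two functions below; the third shows what a component-wise lowering of the compare amounts to. This is a sketch for illustration, not the contents of pr104402.c; all function names are hypothetical, and the C functions only mirror the shape of the GIMPLE, they are not GIMPLE themselves.

```c
#include <complex.h>
#include <string.h>

/* _Complex compare written directly as the condition of a
   conditional expression.  */
int select_embedded(double _Complex a, double _Complex b, int x, int y)
{
    return (a == b) ? x : y;
}

/* Same computation with the compare split into its own statement,
   so the conditional only sees a boolean.  */
int select_split(double _Complex a, double _Complex b, int x, int y)
{
    _Bool same = (a == b);
    return same ? x : y;
}

/* Component-wise form: a complex equality is true when both the real
   and the imaginary parts compare equal.  A complex value has the same
   layout as an array of two reals, so memcpy extracts the parts.  */
int select_lowered(double _Complex a, double _Complex b, int x, int y)
{
    double ap[2], bp[2];

    memcpy(ap, &a, sizeof ap);
    memcpy(bp, &b, sizeof bp);
    _Bool same = (ap[0] == bp[0]) && (ap[1] == bp[1]);
    return same ? x : y;
}
```

Keeping the compare as a separate statement means the lowering only has to handle standalone compares rather than compares nested inside a COND_EXPR.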
Kewen Lin [Mon, 7 Feb 2022 03:30:02 +0000 (21:30 -0600)]
rs6000: Move the hunk affecting VSX/ALTIVEC ahead [PR103627]
The modified hunk can update the VSX and ALTIVEC flags, and we have some code
that checks/warns for some flags related to VSX and ALTIVEC sitting where
the hunk is proposed to be moved to. Without this adjustment, the
VSX and ALTIVEC update happens too late, which can cause incompatibility
and result in unexpected behavior; the associated test case is one
typical case.
Since we already have code that sets TARGET_FLOAT128_TYPE and sits
after the new location, and OPTION_MASK_FLOAT128_KEYWORD relies on
TARGET_FLOAT128_TYPE, this patch simply removes them.
gcc/ChangeLog:
PR target/103627
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Move the
hunk affecting VSX and ALTIVEC to appropriate place.
gcc/testsuite/ChangeLog:
PR target/103627
* gcc.target/powerpc/pr103627-3.c: New test.