Christophe Lyon [Wed, 13 Oct 2021 09:16:22 +0000 (09:16 +0000)]
arm: Implement MVE predicates as vectors of booleans
This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode. Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).
We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.
In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'. This patch updates the aarch64 definition of VNx*BI as
needed.
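As an illustration of the representation described above, here is a minimal
sketch in plain C. It assumes that lane 0 occupies the least significant
bits of the 16-bit predicate; mve_pred_from_lanes is a hypothetical helper
written for this note, not a GCC or ACLE function.

```c
#include <stdint.h>

/* Illustrative only: build a 16-bit MVE-style predicate from one
   boolean per lane.  Each true lane has its bit replicated over the
   lane's width, so a true V4BI element becomes 0b1111.
   NLANES is assumed to be 4, 8 or 16.  */
static uint16_t
mve_pred_from_lanes (const int *lanes, unsigned nlanes)
{
  unsigned bits_per_lane = 16 / nlanes;	/* 4 for V4BI, 2 for V8BI, 1 for V16BI.  */
  uint16_t lane_mask = (uint16_t) ((1u << bits_per_lane) - 1);
  uint16_t pred = 0;

  for (unsigned i = 0; i < nlanes; i++)
    if (lanes[i])
      pred |= (uint16_t) (lane_mask << (i * bits_per_lane));

  return pred;
}
```

With four lanes {1, 0, 0, 1} the helper yields 0xF00F, i.e. lanes 0 and 3
each contribute a full nibble.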
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t, Pred4x4_t): New.
* emit-rtl.cc (init_emit_once): Handle all boolean modes.
* genmodes.cc (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def (VECTOR_BOOL_MODE): Document new COMPONENT
parameter. Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
define BImode.
* rtx-vector-builder.cc (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.cc (output_constant_pool_2): Likewise.
Christophe Lyon [Wed, 13 Oct 2021 09:16:17 +0000 (09:16 +0000)]
arm: Fix mve_vmvnq_n_<supf><mode> argument mode
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
<V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
for operand 1.
Christophe Lyon [Wed, 13 Oct 2021 09:16:14 +0000 (09:16 +0000)]
arm: Add support for VPR_REG in arm_class_likely_spilled_p
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p. No test fails without this patch, but
it seems it should be implemented.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.
Christophe Lyon [Wed, 13 Oct 2021 09:16:09 +0000 (09:16 +0000)]
arm: Add GENERAL_AND_VPR_REGS regclass
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS). The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.
CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(CLASS_MAX_NREGS): Handle VPR.
* config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.
Christophe Lyon [Wed, 13 Oct 2021 09:15:49 +0000 (09:15 +0000)]
arm: Add new tests for comparison vectorization with Neon and MVE
This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.
mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.
Christophe Lyon [Tue, 22 Feb 2022 13:55:40 +0000 (13:55 +0000)]
MAINTAINERS: Update my email address.
* MAINTAINERS (Write After Approval): Update my e-mail address.
Tom de Vries [Tue, 20 Apr 2021 06:47:03 +0000 (08:47 +0200)]
[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
#pragma omp target
#pragma omp parallel num_threads (2)
#pragma omp task
;
...
This hangs at -O0 for nvptx.
Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
- deposit a task, and
- execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
calls gomp_barrier_handle_tasks, where it:
- executes both tasks and marks the team barrier done
- executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
the team barrier.
- thread 0 hangs.
To understand why there is a hang here, it's good to understand how things
are set up for nvptx. The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...
The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait. In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
In the nvptx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.
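The difference between the two waiting styles can be sketched in portable
C. This is a simplified model, not the libgomp code: struct toy_barrier and
toy_wait_would_proceed are hypothetical names, and the real gomp_barrier_t
has more fields. A futex-style wait only sleeps while the watched value is
unchanged, so a waiter that arrives late simply proceeds.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, stripped-down barrier state.  */
struct toy_barrier
{
  atomic_uint generation;
};

/* Return true if a generation-based waiter would proceed without
   blocking, i.e. the generation has already moved on from the value
   the waiter last saw.  A rendezvous such as bar.sync has no such
   check: it always waits for the other participants to arrive.  */
static bool
toy_wait_would_proceed (struct toy_barrier *bar, unsigned seen_generation)
{
  return atomic_load (&bar->generation) != seen_generation;
}
```

In the trace above, thread 0 reaches the wait after the barrier has been
marked done, which is the case where this check returns true.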
The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}. That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.
Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.
The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex semantics where it's
not clear in advance how many threads will be woken up.
This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2021-04-20 Tom de Vries <tdevries@suse.de>
PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.
Tobias Burnus [Sat, 19 Feb 2022 23:25:33 +0000 (00:25 +0100)]
nvptx: Add -misa=sm_70
Add -misa=sm_70, and use it to specify the misa value in test-case
gcc.target/nvptx/atomic-store-2.c.
Tested on nvptx.
gcc/ChangeLog:
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm):
Likewise.
* config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70.
gcc/testsuite/ChangeLog:
2022-02-22 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70.
* gcc.target/nvptx/uniform-simt-3.c: Same.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Patrick Palka [Tue, 22 Feb 2022 14:37:58 +0000 (09:37 -0500)]
libstdc++: Implement P2415R2 changes to viewable_range / views::all
This implements the wording changes in P2415R2 "What is a view?", which
is a DR for C++20.
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (__detail::__is_initializer_list):
Define.
(viewable_range): Adjust as per P2415R2.
* include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value.
* include/std/ranges (owning_view): Define as per P2415R2.
(enable_borrowed_range<owning_view>): Likewise.
(views::__detail::__can_subrange): Replace with ...
(views::__detail::__can_owning_view): ... this.
(views::_All::_S_noexcept): Sync with operator().
(views::_All::operator()): Use owning_view instead of subrange
as per P2415R2.
* include/std/version (__cpp_lib_ranges): Adjust value.
* testsuite/std/ranges/adaptors/all.cc (test06): Adjust now that
views::all uses owning_view instead of subrange.
(test08): New test.
* testsuite/std/ranges/adaptors/lazy_split.cc (test09): Adjust
now that rvalue non-view non-borrowed ranges are viewable.
* testsuite/std/ranges/adaptors/split.cc (test06): Likewise.
Tobias Burnus [Sat, 19 Feb 2022 22:28:49 +0000 (23:28 +0100)]
nvptx: Add -mptx=6.0
Currently supported internally are 3.1, 6.0, 6.3 and 7.0.
However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0.
Add -mptx=6.0 for consistency.
Tested on nvptx.
gcc/ChangeLog:
* config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0.
* doc/invoke.texi (-mptx): Update for new values and defaults.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Tom de Vries [Fri, 18 Feb 2022 11:31:02 +0000 (12:31 +0100)]
[nvptx] Add -mptx-comment
Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...
Can be switched off using -mno-ptx-comment.
Tested on nvptx.
gcc/ChangeLog:
2022-02-21 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (gen_comment): New function.
(workaround_uninit_method_1, workaround_uninit_method_2)
(workaround_uninit_method_3): Use gen_comment.
* config/nvptx/nvptx.opt (mptx-comment): New option.
Richard Biener [Tue, 22 Feb 2022 13:26:06 +0000 (14:26 +0100)]
Dump def that we use for a splat
This makes the SLP vectorizer dump the def we use for a splat to
aid debugging.
2022-02-22 Richard Biener <rguenther@suse.de>
* tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used
for a splat.
Roger Sayle [Tue, 22 Feb 2022 12:32:22 +0000 (12:32 +0000)]
Implement constant-folding simplifications of reductions.
This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:
float f (float x)
{
vector(4) float vect_x_14.11;
vector(4) float _2;
float _32;
_2 = {x_9(D), 0.0, 0.0, 0.0};
vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
_32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
return _32;
}
With these proposed new transformations, we can simplify the
above code even further.
float f (float x)
{
float _32;
_32 = x_9(D) + 1.36e+2;
return _32;
}
[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].
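A scalar model of the two GIMPLE forms above, as a sketch only: reduc_plus4
is a hypothetical stand-in for .REDUC_PLUS, and the arrays stand in for the
vector(4) float values. The two functions agree exactly for the small
inputs used here; in general the fold reassociates floating-point
additions, which is why -ffast-math is involved.

```c
/* Stand-in for .REDUC_PLUS on a four-element vector.  */
static float
reduc_plus4 (const float v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}

/* Model of the vectorized form: {x, 0, 0, 0} + constant vector,
   followed by a plus-reduction.  */
static float
f_vector (float x)
{
  const float c[4] = { 10.0f, 26.0f, 42.0f, 58.0f };
  float v[4] = { x, 0.0f, 0.0f, 0.0f };

  for (int i = 0; i < 4; i++)
    v[i] += c[i];
  return reduc_plus4 (v);
}

/* Model of the folded form: the constant vector is reduced at
   compile time (10 + 26 + 42 + 58 = 136) and the single non-zero
   constructor element is used directly.  */
static float
f_folded (float x)
{
  return x + 136.0f;
}
```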
2022-02-22 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.
gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.
Jakub Jelinek [Tue, 22 Feb 2022 10:32:08 +0000 (11:32 +0100)]
libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]
On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive. Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can
contain any section index.
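The distinction between the 16-bit and 32-bit fields can be sketched as
follows. The toy_* names are hypothetical and this is not the
simple-object-elf.c code; the two constants carry the values the ELF
specification gives for SHN_LORESERVE and SHN_XINDEX.

```c
#include <stdint.h>

#define TOY_SHN_LORESERVE 0xff00u
#define TOY_SHN_XINDEX    0xffffu

/* st_shndx is a 16-bit field: section indices at or above
   SHN_LORESERVE cannot be stored directly, so the escape value
   SHN_XINDEX is stored and the real index lives in .symtab_shndx.  */
static uint16_t
toy_encode_st_shndx (uint32_t section_index)
{
  return (section_index >= TOY_SHN_LORESERVE
	  ? (uint16_t) TOY_SHN_XINDEX
	  : (uint16_t) section_index);
}

/* sh_link and sh_info are 32-bit fields holding a plain section
   index, so they are remapped through the table whatever their
   value, including values in the SHN_LORESERVE .. SHN_HIRESERVE
   range.  Indices outside the table are left unchanged.  */
static uint32_t
toy_remap_sh_link (uint32_t old_index, const uint32_t *remap,
		   uint32_t nsections)
{
  return old_index < nsections ? remap[old_index] : old_index;
}
```

The sh_link value 65321 from the readelf output is 0xff29, which falls
inside the reserved range and is therefore exactly the kind of value that
was previously left unadjusted.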
Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc. I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.
2022-02-22 Jakub Jelinek <jakub@redhat.com>
PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).
Jakub Jelinek [Tue, 22 Feb 2022 09:43:13 +0000 (10:43 +0100)]
ranger: Fix up REALPART_EXPR/IMAGPART_EXPR handling [PR104604]
The following testcase is miscompiled since r12-3328.
That change assumed that if rhs1 of a GIMPLE_ASSIGN is COMPLEX_CST, then
that is the value of the lhs of the stmt, but that is not the case always,
only if it is a GIMPLE_SINGLE_RHS stmt. If it is e.g.
GIMPLE_UNARY_RHS or GIMPLE_BINARY_RHS (the latter happens in the testcase),
then it can be e.g.
__complex__ (3, 0) / var
and the REALPART_EXPR of that isn't 3, but the realpart of the division.
I assume once the ranger can do complex numbers adjust_*part_expr will just
fetch one or the other range from an underlying complex range, but until
then, we should limit this to what r12-3328 meant to do.
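A small C illustration of why the COMPLEX_CST operand is not the value of
the lhs in the binary case. This is a sketch, not the PR testcase; it uses
the GNU C __real__ operator, and realpart_of_quotient is a hypothetical
name.

```c
/* Real part of (3 + 0i) / var.  The first operand of the division is
   a complex constant whose real part is 3, but the real part of the
   result is 3 / var.  */
static double
realpart_of_quotient (double var)
{
  _Complex double c = 3.0;	/* 3 + 0i.  */
  _Complex double q = c / var;

  return __real__ q;
}
```

For var equal to 2.0 the result is 1.5, so assuming a range of [3, 3] for
the real part would be wrong.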
2022-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104604
* gimple-range-fold.cc (adjust_imagpart_expr, adjust_realpart_expr):
Only check if gimple_assign_rhs1 is COMPLEX_CST if
gimple_assign_rhs_code is COMPLEX_CST.
* gcc.c-torture/execute/pr104604.c: New test.
Jakub Jelinek [Tue, 22 Feb 2022 09:38:37 +0000 (10:38 +0100)]
i386: Fix up copysign/xorsign expansion [PR104612]
We ICE on the following testcase for -m32 since r12-3435, because
operands[2] is (subreg:SF (reg:DI ...) 0) and
lowpart_subreg (V4SFmode, operands[2], SFmode)
returns NULL, and that is what we use in AND etc. insns we emit.
My earlier version of the patch fixes that by calling force_reg for the
input operands, to make sure they are really REGs and so lowpart_subreg
will succeed on them - even for theoretical MEMs using REGs there seems
desirable, we don't want to read following memory slots for the paradoxical
subreg. For the outputs, I thought we'd get better code by always computing
result into a new pseudo and then move lowpart of that pseudo into dest.
Unfortunately it regressed
FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps
on which the patch changes:
vandps .LC0(%rip), %xmm1, %xmm1
- vxorps %xmm0, %xmm1, %xmm0
+ vxorps %xmm0, %xmm1, %xmm1
+ vmovaps %xmm1, %xmm0
ret
The RA sees:
(insn 8 4 9 2 (set (reg:V4SF 85)
(and:V4SF (subreg:V4SF (reg:SF 90) 0)
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3}
(expr_list:REG_DEAD (reg:SF 90)
(nil)))
(insn 9 8 10 2 (set (reg:V4SF 87)
(xor:V4SF (reg:V4SF 85)
(subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842 {*xorv4sf3}
(expr_list:REG_DEAD (reg:SF 89)
(expr_list:REG_DEAD (reg:V4SF 85)
(nil))))
(insn 10 9 14 2 (set (reg:SF 82 [ <retval> ])
(subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142 {*movsf_internal}
(expr_list:REG_DEAD (reg:V4SF 87)
(nil)))
(insn 14 10 15 2 (set (reg/i:SF 20 xmm0)
(reg:SF 82 [ <retval> ])) "pr89984-2.c":8:1 142 {*movsf_internal}
(expr_list:REG_DEAD (reg:SF 82 [ <retval> ])
(nil)))
(insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1
(nil))
and doesn't know that if it would use xmm0 not just for pseudo 82
but also for pseudo 87, it could create a noop move in insn 10 and
so could avoid an extra register copy and nothing later on is able
to figure that out either. I don't know how the RA should know
that though.
So that we don't regress, this version of the patch
will do this stuff (i.e. use fresh vector pseudo as destination and
then move lowpart of that to dest) over what it used before (i.e.
use paradoxical subreg of the dest) only if lowpart_subreg returns NULL.
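For reference, the and/xor pair in the assembly above can be modelled in
scalar C as below. This is a sketch under the assumption that .LC0 holds
the sign-bit mask; toy_xorsign is a hypothetical name, and the real
expander works on V4SF registers rather than on integer copies.

```c
#include <stdint.h>
#include <string.h>

/* xorsign (x, y): flip the sign of X when Y is negative, i.e.
   x ^ (y & signmask) on the bit patterns.  */
static float
toy_xorsign (float x, float y)
{
  uint32_t xb, yb;
  float result;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & UINT32_C (0x80000000);	/* and with sign mask, then xor.  */
  memcpy (&result, &xb, sizeof result);
  return result;
}
```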
2022-02-22 Jakub Jelinek <jakub@redhat.com>
PR target/104612
* config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
on input operands before calling lowpart_subreg on it. For output
operand, use a vmode pseudo as destination and then move its lowpart
subreg into operands[0] if lowpart_subreg fails on dest.
(ix86_expand_xorsign): Likewise.
* gcc.dg/pr104612.c: New test.
Tom de Vries [Mon, 21 Feb 2022 19:02:13 +0000 (20:02 +0100)]
[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...
The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error : Unknown symbol 'baz'
...
But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.
So instead, we run into the same error at execution time.
Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
{ target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2022-02-21 Tom de Vries <tdevries@suse.de>
PR testsuite/104146
* testsuite/libgomp.c++/pr96390.C: Add additional-option
-foffload=-Wa,--verify for nvptx.
* testsuite/libgomp.c-c++-common/pr96390.c: Same.
Tom de Vries [Sun, 20 Feb 2022 08:41:39 +0000 (09:41 +0100)]
[nvptx] Xfail sibcall execution tests
On nvptx I see the following FAIL:
...
FAIL: gcc.dg/sibcall-3.c execution test
...
The test-case states that "this test is xfailed on targets without sibcall
patterns".
The nvptx port doesn't have a sibcall pattern, so add an xfail. Likewise in
two similar test-cases.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-20 Tom de Vries <tdevries@suse.de>
* gcc.dg/sibcall-10.c: Xfail execution test for nvptx.
* gcc.dg/sibcall-3.c: Same.
* gcc.dg/sibcall-4.c: Same.
Tom de Vries [Sat, 19 Feb 2022 22:33:27 +0000 (23:33 +0100)]
[nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests
Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx
settings, which are paired with misa settings, in order to have the mptx
version support the misa version.
Since commit
decde11183bd ("[nvptx] Choose -mptx default based on -misa"),
this is no longer necessary.
Remove the mptx settings.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-20 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/float16-1.c: Drop -mptx setting.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.
* gcc.target/nvptx/tanh-1.c: Same.
Richard Biener [Fri, 18 Feb 2022 13:32:14 +0000 (14:32 +0100)]
target/99881 - x86 vector cost of CTOR from integer regs
This uses the SLP node now passed to the vectorizer costing hook
to adjust vector construction costs for the cost of moving an
integer component from a GPR to a vector register when that's
required for building a vector from components. A crucial difference
here is whether the component is loaded from memory or extracted
from a vector register, as in those cases no intermediate GPR is involved.
The pr99881.c testcase can be Un-XFAILed with this patch. The
pr91446.c testcase now produces scalar code which looks superior
to me, so I've adjusted it as well.
2022-02-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/104582
PR target/99881
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Cost GPR to vector register moves for integer vector construction.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
* gcc.target/i386/pr99881.c: Un-XFAIL.
* gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
Richard Biener [Fri, 18 Feb 2022 10:50:44 +0000 (11:50 +0100)]
tree-optimization/104582 - make SLP node available in vector cost hook
This adjusts the vectorizer costing API to allow passing down the
SLP node the vector stmt is created from.
2022-02-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/104582
* tree-vectorizer.h (stmt_info_for_cost::node): New field.
(vector_costs::add_stmt_cost): Add SLP node parameter.
(dump_stmt_cost): Likewise.
(add_stmt_cost): Likewise, new overload and adjust.
(add_stmt_costs): Adjust.
(record_stmt_cost): New overload.
* tree-vectorizer.cc (dump_stmt_cost): Dump the SLP node.
(vector_costs::add_stmt_cost): Adjust.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Adjust.
* tree-vect-slp.cc (vect_prologue_cost_for_slp): Record
the SLP node for costing.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.cc (record_stmt_cost): Adjust and add
new overloads.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Adjust.
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Adjust.
* config/rs6000/rs6000.cc (rs6000_vector_costs::add_stmt_cost):
Adjust.
(rs6000_cost_data::adjust_vect_cost_per_loop): Likewise.
Richard Biener [Fri, 18 Feb 2022 10:34:52 +0000 (11:34 +0100)]
tree-optimization/104582 - Simplify vectorizer cost API and fixes
This simplifies the vectorizer cost API by providing overloads
to add_stmt_cost and record_stmt_cost suitable for scalar stmt
and branch stmt costing which do not need information like
a vector type or alignment. It also fixes two mistakes where
costs for versioning tests were recorded as vector stmt rather
than scalar stmt.
This is a first patch to simplify the actual fix for PR104582.
2022-02-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/104582
* tree-vectorizer.h (add_stmt_cost): New overload.
(record_stmt_cost): Likewise.
* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Use add_stmt_costs.
(vect_get_known_peeling_cost): Use new overloads.
(vect_estimate_min_profitable_iters): Likewise. Consistently
use scalar_stmt for costing versioning checks.
* tree-vect-stmts.cc (record_stmt_cost): New overload.
Hongyu Wang [Fri, 11 Feb 2022 06:44:15 +0000 (14:44 +0800)]
i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]
cmpxchg is commonly used in spin loops, and some user code such as
pthread uses cmpxchg directly as the loop condition, which causes
heavy cache-line bouncing.
This patch extends the previous implementation to relax all cmpxchg
instructions under -mrelax-cmpxchg-loop with an extra atomic load and
compare, emulating the failed cmpxchg behavior when the compare fails.
For the original spin loop, which looks like
loop: mov %eax,%r8d
or $1,%r8d
lock cmpxchg %r8d,(%rdi)
jne loop
it now turns into
loop: mov %eax,%r8d
or $1,%r8d
mov (%r8),%rsi <--- load lock first
cmp %rsi,%rax <--- compare with expected input
jne .L2 <--- lock ne expected
lock cmpxchg %r8d,(%rdi)
jne loop
L2: mov %rsi,%rax <--- perform the behavior of failed cmpxchg
jne loop
under -mrelax-cmpxchg-loop.
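The same idea expressed with C11 atomics, as a minimal sketch rather than
the code GCC emits: toy_relaxed_cas is a hypothetical name, and the default
sequentially consistent ordering is used for simplicity.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compare-and-swap preceded by a plain atomic load.  When the loaded
   value already differs from *EXPECTED, the locked cmpxchg is skipped
   and its failure behavior is emulated by storing the observed value
   into *EXPECTED.  */
static bool
toy_relaxed_cas (atomic_int *ptr, int *expected, int desired)
{
  int observed = atomic_load (ptr);	/* Load the lock word first.  */

  if (observed != *expected)
    {
      *expected = observed;		/* Behave like a failed cmpxchg.  */
      return false;
    }
  return atomic_compare_exchange_strong (ptr, expected, desired);
}
```

The caller sees the same interface as a plain compare-and-swap; only the
contended case avoids the locked instruction.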
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
Split atomic fetch and loop part.
(ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
* config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
prototype.
* config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
expander under TARGET_RELAX_CMPXCHG_LOOP.
(atomic_compare_and_swap<mode>): Likewise for doubleword modes.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-2.c: Adjust result check.
* gcc.target/i386/pr103069-3.c: New test.
* gcc.target/i386/pr103069-4.c: Likewise.
GCC Administrator [Tue, 22 Feb 2022 00:16:33 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Mon, 21 Feb 2022 17:50:28 +0000 (09:50 -0800)]
runtime/internal/syscall: build dummy package if not Linux
Fixes libgo build on non-Linux systems.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387134
Dan Li [Mon, 21 Feb 2022 20:01:14 +0000 (20:01 +0000)]
aarch64: Add compiler support for Shadow Call Stack
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].
To enable SCS in user mode, other support is required in addition to
the compiler (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and makes it convenient
for users to enable SCS.
For the Linux kernel, only compiler support is required.
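A conceptual model of the mechanism in portable C, for readers unfamiliar
with SCS. This is only a sketch: struct toy_scs and its helpers are
hypothetical, and the real implementation emits push/pop instructions in
the prologue and epilogue instead of calling functions.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SCS_DEPTH 16

/* Hypothetical shadow stack kept apart from the regular stack.  */
struct toy_scs
{
  uintptr_t slots[TOY_SCS_DEPTH];
  size_t top;
};

/* Prologue side: record the return address on the shadow stack.
   Returns 0 on success, -1 if the shadow stack is full.  */
static int
toy_scs_push (struct toy_scs *scs, uintptr_t return_address)
{
  if (scs->top == TOY_SCS_DEPTH)
    return -1;
  scs->slots[scs->top++] = return_address;
  return 0;
}

/* Epilogue side: the return address is taken from the shadow stack,
   not from the copy in the regular frame.  Returns 0 on success,
   -1 if the shadow stack is empty.  */
static int
toy_scs_pop (struct toy_scs *scs, uintptr_t *return_address)
{
  if (scs->top == 0)
    return -1;
  *return_address = scs->slots[--scs->top];
  return 0;
}
```

If the copy of the return address in the regular frame is overwritten, the
value popped from the shadow stack is unaffected.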
[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
Signed-off-by: Dan Li <ashimida@linux.alibaba.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 from SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc (parse_sanitizer_options): Add shadow-call-stack
and exclude SANITIZE_SHADOW_CALL_STACK.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.
* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
Tom de Vries [Wed, 16 Feb 2022 16:09:11 +0000 (17:09 +0100)]
[nvptx] Initialize ptx regs
With nvptx target, driver version 510.47.03 and board GT 1030, we run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...
while the test-cases pass with nvptx-none-run -O0.
The problem is that the generated ptx contains a read from an uninitialized
ptx register, and the driver JIT doesn't handle this well.
For -O2 and -O3, we can get rid of the FAIL using --param
logical-op-non-short-circuit=0. But not for -O1.
At -O1, the test-case minimizes to:
...
void __attribute__((noinline, noclone))
foo (int y) {
  int c;
  for (int i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i && d <= c)
        __builtin_abort ();
      c = d;
    }
}
int main () {
  foo (2); return 0;
}
...
Note that the test-case does not contain an uninitialized use. In the first
iteration, i is 0 and consequently c is not read. In the second iteration, c
is read, but by that time it's already initialized by 'c = d' from the first
iteration.
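To make the argument above concrete, here is a minimal sketch that replays the loop while tracking whether c has been written before each read. The function name and the bookkeeping variable c_written are invented for illustration, and c gets a dummy initial value only so that the sketch itself has defined behaviour.

```c
#include <assert.h>

/* Replay the loop from the test-case for a trip count of Y and return how
   many times c would be read before it has been written.  */
int
reads_of_unwritten_c (int y)
{
  int c = 0;          /* dummy value, never relied upon */
  int c_written = 0;  /* bookkeeping: has 'c = d' executed yet?  */
  int bad_reads = 0;

  for (int i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i)          /* '&&' short-circuits: c is only read when i != 0 */
        {
          if (!c_written)
            bad_reads++;
          assert (d > c);   /* stands in for the __builtin_abort check */
        }
      c = d;
      c_written = 1;
    }
  return bad_reads;
}
```

For any trip count the function returns 0, which matches the claim that the source never reads c before 'c = d' has run.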
AFAICT the problem is introduced as follows: the conditional use of c in the
loop body is translated into an unconditional use of c in the loop header:
...
# c_1 = PHI <c_4(D)(2), c_9(6)>
...
into which forwprop1 propagates the 'c_9 = d_7' assignment:
...
# c_1 = PHI <c_4(D)(2), d_7(6)>
...
which ends up being translated by expand into an unconditional:
...
(insn 13 12 0 (set (reg/v:SI 22 [ c ])
(reg/v:SI 23 [ d ])) -1
(nil))
...
at the start of the loop body, creating an uninitialized read of d on the
path from loop entry.
By disabling coalesce_ssa_name, we get the more usual copies on the incoming
edges. The copy on the loop entry path still does an uninitialized read, but
that one's now initialized by init-regs. The test-case passes, also when
disabling init-regs, so it's possible that the JIT driver doesn't object to
this type of uninitialized read.
Now that we characterized the problem to some degree, we need to fix this,
because either:
- we're violating an undocumented ptx invariant, and this is a compiler bug,
or
- this is a driver JIT bug and we need to work around it.
There are essentially two strategies to address this:
- stop the compiler from creating uninitialized reads
- patch up uninitialized reads using additional initialization
The former will probably involve:
- making some optimizations more conservative in the presence of
uninitialized reads, and
- disabling some other optimizations (where making them more conservative is
not possible, or cannot easily be achieved).
This will probably have a cost penalty for code that does not suffer from
the original problem.
The latter has the problem that it may paper over uninitialized reads
in the source code, or indeed over ones that were incorrectly introduced
by the compiler. But it has the advantage that it allows for the problem to
be addressed at a single location.
There's an existing pass, init-regs, which implements a form of the latter,
but it doesn't work for this example because it only inserts additional
initialization for uses that have no reaching definition at all.
Fix this by adding initialization of uninitialized ptx regs in reorg.
Control the new functionality using -minit-regs=<0|1|2|3>, meaning:
- 0: disabled.
- 1: add initialization of all regs at the entry bb
- 2: add initialization of uninitialized regs at the entry bb
- 3: add initialization of uninitialized regs close to the use
and defaulting to 3.
Tested on nvptx.
gcc/ChangeLog:
2022-02-17 Tom de Vries <tdevries@suse.de>
PR target/104440
* config/nvptx/nvptx.cc (workaround_uninit_method_1)
(workaround_uninit_method_2, workaround_uninit_method_3)
(workaround_uninit): New function.
(nvptx_reorg): Use workaround_uninit.
* config/nvptx/nvptx.opt (minit-regs): New option.
Patrick Palka [Mon, 21 Feb 2022 14:20:23 +0000 (09:20 -0500)]
c++: Add testcase for already fixed PR [PR85493]
The a1 and a2 cases were fixed (by diagnosing the invalid expression)
with r11-434, and the a3 case with r8-7625.
PR c++/85493
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype80.C: New test.
Andre Vieira [Mon, 21 Feb 2022 09:41:53 +0000 (09:41 +0000)]
rtl-optimization/104498: Fix comparing symbol reference
gcc/ChangeLog:
PR rtl-optimization/104498
* alias.cc (compare_base_symbol_refs): Correct distance computation
when swapping x and y.
Andrew Pinski [Sun, 13 Feb 2022 00:09:39 +0000 (00:09 +0000)]
c: [PR104506] Fix ICE after error due to change of type to error_mark_node
The problem here is we end up with an error_mark_node when calling
useless_type_conversion_p and that ICEs. STRIP_NOPS/tree_nop_conversion
has had a check for the inner type being an error_mark_node since g9a6bb3f78c96
(2000). This just adds the check also to tree_ssa_useless_type_conversion.
STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
and the places where it is used outside of the gimplifier would not
be adding too much overhead.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
PR c/104506
gcc/ChangeLog:
* tree-ssa.cc (tree_ssa_useless_type_conversion):
Check the inner type before calling useless_type_conversion_p.
gcc/testsuite/ChangeLog:
* gcc.dg/pr104506-1.c: New test.
* gcc.dg/pr104506-2.c: New test.
* gcc.dg/pr104506-3.c: New test.
GCC Administrator [Mon, 21 Feb 2022 00:16:24 +0000 (00:16 +0000)]
Daily bump.
Iain Buclaw [Sun, 20 Feb 2022 23:06:16 +0000 (00:06 +0100)]
d: Remove handling of deleting GC allocated classes.
Now that the `delete' keyword has been removed from the front-end, only
compiler-generated uses of DeleteExp reach the code generator via the
auto-destruction of `scope class' variables.
The run-time library helpers that previously were used to delete GC
class objects can now be removed from the compiler.
gcc/d/ChangeLog:
* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
deleting GC allocated classes.
* runtime.def (DELCLASS): Remove.
(DELINTERFACE): Remove.
Iain Buclaw [Sun, 20 Feb 2022 19:02:23 +0000 (20:02 +0100)]
d: Merge upstream dmd cb49e99f8, druntime 55528bd1, phobos 1a3e80ec2.
D front-end changes:
- Import dmd v2.099.0-beta.1.
- It's now an error to use `alias this' for partial assignment.
- The `delete' keyword has been removed from the language.
- Using `this' and `super' as types has been removed from the
language, the parser no longer specially handles this wrong code
with an informative error.
D Runtime changes:
- Import druntime v2.099.0-beta.1.
Phobos changes:
- Import phobos v2.099.0-beta.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd cb49e99f8.
* dmd/VERSION: Update version to v2.099.0-beta.1.
* decl.cc (layout_class_initializer): Update call to NewExp::create.
* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
deleting arrays and pointers.
(ExprVisitor::visit (DotVarExp *)): Convert complex types to the
front-end library type representing them.
(ExprVisitor::visit (StringExp *)): Use getCodeUnit instead of charAt
to get the value of each index in a string expression.
* runtime.def (DELMEMORY): Remove.
(DELARRAYT): Remove.
* types.cc (TypeVisitor::visit (TypeEnum *)): Handle anonymous enums.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime 55528bd1.
* src/MERGE: Merge upstream phobos 1a3e80ec2.
* testsuite/libphobos.hash/test_hash.d: Update.
* testsuite/libphobos.betterc/test19933.d: New test.
Harald Anlauf [Wed, 9 Feb 2022 20:54:29 +0000 (21:54 +0100)]
Fortran: improve check of pointer initialization in DATA statements
gcc/fortran/ChangeLog:
PR fortran/77693
* data.cc (gfc_assign_data_value): If a variable in a data
statement has the POINTER attribute, check for allowed initial
data target that is compatible with pointer assignment.
* gfortran.h (IS_POINTER): New macro.
gcc/testsuite/ChangeLog:
PR fortran/77693
* gfortran.dg/data_pointer_2.f90: New test.
GCC Administrator [Sun, 20 Feb 2022 00:16:22 +0000 (00:16 +0000)]
Daily bump.
Tom de Vries [Tue, 15 Feb 2022 13:36:26 +0000 (14:36 +0100)]
[nvptx] Use _ as destination operand of atom.exch
We currently generate this code for an atomic store:
...
.reg.u32 %r21;
atom.exch.b32 %r21,[%r22],%r23;
...
where %r21 is set but unused.
Use the ptx bit bucket operand '_' instead, such that we have:
...
atom.exch.b32 _,[%r22],%r23;
...
[ Note that the same problem still occurs for this code:
...
void atomic_store (int *ptr, int val) {
__atomic_exchange_n (ptr, val, MEMMODEL_RELAXED);
}
... ]
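For contrast, a minimal sketch of the two source-level forms involved, written with GCC's __atomic builtins. The function names are illustrative only, and __ATOMIC_RELAXED is used as the memory-model argument.

```c
#include <assert.h>

/* Plain atomic store: no result exists at the source level.  */
void
store_relaxed (int *ptr, int val)
{
  assert (ptr != 0);
  __atomic_store_n (ptr, val, __ATOMIC_RELAXED);
}

/* Atomic exchange whose result is discarded: it has the same visible
   effect as a store, but is written as an exchange.  */
void
exchange_discard (int *ptr, int val)
{
  assert (ptr != 0);
  (void) __atomic_exchange_n (ptr, val, __ATOMIC_RELAXED);
}
```

Both functions leave *ptr equal to val; they differ only in whether the old value is requested and then thrown away.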
Tested on nvptx.
gcc/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Handle SET insn.
* config/nvptx/nvptx.md
(define_insn "nvptx_atomic_store<mode>"): Rename to ...
(define_insn "nvptx_atomic_store_sm70<mode>"): This.
(define_insn "nvptx_atomic_store<mode>"): New define_insn.
(define_expand "atomic_store<mode>"): Handle rename. Use
nvptx_atomic_store instead of atomic_exchange.
gcc/testsuite/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-1.c: Update.
Tom de Vries [Fri, 18 Feb 2022 16:38:50 +0000 (17:38 +0100)]
[nvptx] Don't skip atomic insns in nvptx_reorg_uniform_simt
In nvptx_reorg_uniform_simt we have a loop:
...
for (insn = get_insns (); insn; insn = next)
  {
    next = NEXT_INSN (insn);
    if (!(CALL_P (insn) && nvptx_call_insn_is_syscall_p (insn))
        && !(NONJUMP_INSN_P (insn)
             && GET_CODE (PATTERN (insn)) == PARALLEL
             && get_attr_atomic (insn)))
      continue;
...
that intends to handle syscalls and atomic insns.
However, this also silently skips the atomic insn nvptx_atomic_store, which
has GET_CODE (PATTERN (insn)) == SET.
This does not cause problems, because the nvptx_atomic_store actually maps
onto a "st" insn, and therefore is not atomic and doesn't need to be handled
by nvptx_reorg_uniform_simt.
Fix this by:
- explicitly setting nvptx_atomic_store's atomic attribute to false,
- rewriting the skip condition to make sure all insns
with the atomic attribute are handled, and
- asserting that all handled insns are PARALLEL.
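The effect of the skip condition can be shown with a toy model. This is only a sketch: struct toy_insn and both predicates are invented here and reduce each GCC query to a boolean field; the real code in nvptx.cc may be shaped differently.

```c
#include <assert.h>
#include <stdbool.h>

enum pattern_code { PAT_SET, PAT_PARALLEL };

/* Toy stand-in for an insn; each field mirrors one query in the loop.  */
struct toy_insn
{
  bool is_syscall;        /* CALL_P && nvptx_call_insn_is_syscall_p */
  bool is_nonjump;        /* NONJUMP_INSN_P */
  enum pattern_code code; /* GET_CODE (PATTERN (insn)) */
  bool atomic;            /* get_attr_atomic */
};

/* Skip test as quoted above: the atomic attribute is only consulted for
   PARALLEL patterns.  */
bool
old_skip (const struct toy_insn *i)
{
  assert (i != 0);
  return !i->is_syscall
         && !(i->is_nonjump && i->code == PAT_PARALLEL && i->atomic);
}

/* Skip test that no longer looks at the pattern code, so every insn with
   the atomic attribute is handled.  */
bool
new_skip (const struct toy_insn *i)
{
  assert (i != 0);
  return !i->is_syscall && !(i->is_nonjump && i->atomic);
}
```

An insn with a SET pattern and the atomic attribute set is skipped by old_skip but not by new_skip, which is the silent skip described above.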
Tested on nvptx.
gcc/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Handle all
insns with atomic attribute. Assert that all handled insns are
PARALLELs.
* config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"):
Set atomic attribute to false.
gcc/testsuite/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/uniform-simt-3.c: New test.
Tom de Vries [Fri, 18 Feb 2022 15:50:03 +0000 (16:50 +0100)]
[nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt
With the default ptx isa 6.0, we have for uniform-simt-1.c:
...
@%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0xffffffff;
...
The atomic insn is predicated by -muniform-simt, and the subsequent insn does
a warp sync, at which point the warp is uniform again.
But with -mptx=3.1, we have instead:
...
@%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
shfl.idx.b32 %r26, %r26, %r32, 31;
...
The shfl does not sync the warp, and we want the warp to go back to executing
uniformly asap. We cannot enforce this, but at least check this using
nvptx_uniform_warp_check, similar to how that is done for openacc.
Likewise, detect the case that no shfl insn is emitted, and add a
nvptx_uniform_warp_check or nvptx_warpsync.
gcc/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return
type to bool.
(nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or
nvptx_warpsync, if necessary.
gcc/testsuite/ChangeLog:
2022-02-19 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test.
* gcc.target/nvptx/uniform-simt-2.c: New test.
Jakub Jelinek [Sat, 19 Feb 2022 08:03:57 +0000 (09:03 +0100)]
asan: Mark instrumented vars addressable [PR102656]
We ICE on the following testcase, because the asan1 pass decides to
instrument
<retval>.x = 0;
and does that by
_13 = &<retval>.x;
.ASAN_CHECK (7, _13, 4, 4);
<retval>.x = 0;
and later sanopt pass turns that into:
_39 = (unsigned long) &<retval>.x;
_40 = _39 >> 3;
_41 = _40 + 2147450880;
_42 = (signed char *) _41;
_43 = *_42;
_44 = _43 != 0;
_45 = _39 & 7;
_46 = (signed char) _45;
_47 = _46 + 3;
_48 = _47 >= _43;
_49 = _44 & _48;
if (_49 != 0)
goto <bb 10>; [0.05%]
else
goto <bb 9>; [99.95%]
<bb 10> [local count: 536864]:
__builtin___asan_report_store4 (_39);
<bb 9> [local count: 1073741824]:
<retval>.x = 0;
The problem is during expansion, <retval> isn't marked TREE_ADDRESSABLE,
even when we take its address in (unsigned long) &<retval>.x.
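The shadow-memory test in the sanopt dump can be modelled in plain C. This is a sketch, not the asan implementation: the function names are invented, the shadow byte is passed in as a parameter instead of being loaded from memory, and the address is assumed to be 4-byte aligned so the store cannot straddle two shadow bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset taken from the dump (2147450880 == 0x7fff8000).  */
#define SHADOW_OFFSET 0x7fff8000UL

/* Address of the shadow byte describing ADDR (_40 and _41 in the dump).  */
uintptr_t
shadow_address (uintptr_t addr)
{
  return (addr >> 3) + SHADOW_OFFSET;
}

/* Return nonzero if a 4-byte store to ADDR would be reported, given the
   value SHADOW of its shadow byte (_43).  Mirrors _44 .. _49.  */
int
store4_is_reported (uintptr_t addr, signed char shadow)
{
  assert ((addr & 3) == 0);                           /* sketch assumption */
  signed char last = (signed char) ((addr & 7) + 3);  /* _45 .. _47 */
  return shadow != 0 && last >= shadow;               /* _44, _48, _49 */
}
```

A shadow value of 0 never reports. A shadow value of 4 accepts a store to the first half of the 8-byte granule and reports one to the second half.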
Now, instrument_derefs has code to avoid the instrumentation altogether
if we can prove the access is within bounds of an automatic variable in the
current function and the var isn't TREE_ADDRESSABLE (or we don't instrument
use after scope), but we do it solely for VAR_DECLs.
I think we should treat RESULT_DECLs exactly like that too, which is what
the following patch does. I must say I'm unsure about PARM_DECLs, those can
have different cases, either they are fully or partially passed in
registers, then if we take parameter's address, they are in a local copy
inside of a function and so work like those automatic vars. But if they
are fully passed in memory, we typically just take address of the slot
and in that case they live in the caller's frame. It is true we don't
(can't) put any asan padding in between the arguments, so all asan could
detect in that case is if caller passes fewer on-stack arguments or smaller
arguments than callee accepts. Anyway, as I'm unsure, I haven't added
PARM_DECLs to that case.
And another thing is, when we actually build_fold_addr_expr, we need to
mark_addressable the inner if it isn't addressable already.
2022-02-19 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/102656
* asan.cc (instrument_derefs): If inner is a RESULT_DECL and access is
known to be within bounds, treat it like automatic variables.
If instrumenting access and inner is {VAR,PARM,RESULT}_DECL from
current function and !TREE_STATIC which is not TREE_ADDRESSABLE, mark
it addressable.
* g++.dg/asan/pr102656.C: New test.
GCC Administrator [Sat, 19 Feb 2022 00:16:17 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Fri, 18 Feb 2022 23:04:00 +0000 (15:04 -0800)]
libgo: update Hurd support
Patches from Svante Signell for PR go/104290.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386797
Pat Haugen [Fri, 18 Feb 2022 21:38:23 +0000 (15:38 -0600)]
Mark Power10 fusion option undocumented and remove sub-options.
gcc/
* config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented.
(mpower10-fusion-ld-cmpi, mpower10-fusion-2logical,
mpower10-fusion-logical-add, mpower10-fusion-add-logical,
mpower10-fusion-2add, mpower10-fusion-2store): Remove.
* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER,
OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options.
* config/rs6000/rs6000.cc (rs6000_option_override_internal,
power10_sched_reorder): Likewise.
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf,
gen_addadd): Likewise.
* config/rs6000/fusion.md: Regenerate.
Ian Lance Taylor [Fri, 18 Feb 2022 21:10:34 +0000 (13:10 -0800)]
libgo: update to Go1.18rc1 release
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386594
H.J. Lu [Fri, 18 Feb 2022 18:36:53 +0000 (10:36 -0800)]
pieces-memset-21.c: Expect vzeroupper for ia32
Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32
caused by
commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener <rguenther@suse.de>
Date: Thu Feb 17 14:40:16 2022 +0100
target/104581 - compile-time regression in mode-switching
PR target/104581
* gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.
Jakub Jelinek [Fri, 18 Feb 2022 16:21:43 +0000 (17:21 +0100)]
rs6000: Fix up posix_memalign call in _mm_malloc [PR104598]
The uglification changes went one spot too far and also uglified the
name of a function: posix_memalign must be called under that name, not
as the non-existent __posix_memalign.
2022-02-18 Jakub Jelinek <jakub@redhat.com>
PR target/104257
PR target/104598
* config/rs6000/mm_malloc.h (_mm_malloc): Call posix_memalign
rather than __posix_memalign.
Richard Biener [Thu, 17 Feb 2022 13:40:16 +0000 (14:40 +0100)]
target/104581 - compile-time regression in mode-switching
The x86 backend piggy-backs on mode-switching for insertion of
vzeroupper. A recent improvement there was implemented in a way
to walk possibly the whole basic-block for all DF reg def definitions
in its mode_needed hook which is called for each instruction in
a basic-block during mode-switching local analysis.
The following mostly reverts this improvement. It needs to be
re-done in a way more consistent with a local dataflow which
probably means making targets aware of the state of the local
dataflow analysis.
2022-02-17 Richard Biener <rguenther@suse.de>
PR target/104581
* config/i386/i386.cc (ix86_avx_u128_mode_source): Remove.
(ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY instead
of calling ix86_avx_u128_mode_source which would eventually
have returned AVX_U128_ANY in some very special case.
* gcc.target/i386/pr101456-1.c: XFAIL.
Richard Biener [Tue, 15 Feb 2022 12:32:22 +0000 (13:32 +0100)]
tree-optimization/96881 - CD-DCE and CLOBBERs
CD-DCE does not consider CLOBBERs as necessary in the attempt
to not prevent DCE of SSA defs it uses. A side-effect of that
is that it also removes all its control dependences if they are
not made necessary by other means. When we later try to preserve
as many CLOBBERs as possible we have to make sure we also
preserved the controlling conditions, otherwise a CLOBBER can
now appear on a path where it was not executed before, leading
to wrong code as seen in the testcase.
I've tried to continue to handle both direct and indirect
CLOBBERs optimistically, allowing CD-DCE to remove control
flow that just controls CLOBBERs but that regresses for
example the stack coalescing test g++.dg/opt/pr80032.C.
The pattern there is
if (pred) D.2512 = CLOBBER; else D.2512 = CLOBBER;
basically we have all paths leading to the same clobber but
we could safely cut some branches which we do not realize
early enough. This regression can be mitigated by no longer
considering direct CLOBBERs optimistically - the original
motivation for the CD-DCE handling wasn't removal of control
flow but SSA defs of the address.
Handling indirect vs. direct clobbers differently feels
somewhat wrong, still the patch goes with this solution.
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/96881
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Comment
CLOBBER handling.
(control_parents_preserved_p): New function.
(eliminate_unnecessary_stmts): Check that we preserved control
parents before retaining a CLOBBER.
(perform_tree_ssa_dce): Pass down aggressive flag
to eliminate_unnecessary_stmts.
* g++.dg/torture/pr96881-1.C: New testcase.
* g++.dg/torture/pr96881-2.C: Likewise.
Patrick Palka [Fri, 18 Feb 2022 01:20:24 +0000 (20:20 -0500)]
c++: implicit 'this' in noexcept-spec within class tmpl [PR94944]
Here when instantiating the noexcept-spec we fail to resolve the
implicit object for the member call A<T>::f() ultimately because
maybe_instantiate_noexcept sets current_class_ptr/ref to the dependent
'this' (of type B<T>) rather than the specialized 'this' (of type B<int>).
This patch fixes this by making maybe_instantiate_noexcept set
current_class_ptr/ref to the specialized 'this' instead, consistent
with what tsubst_function_type does when substituting into the trailing
return type of a non-static member function.
PR c++/94944
gcc/cp/ChangeLog:
* pt.cc (maybe_instantiate_noexcept): For non-static member
functions, set current_class_ptr/ref to the specialized 'this'
instead.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept34.C: Adjusted expected diagnostics.
* g++.dg/cpp0x/noexcept75.C: New test.
GCC Administrator [Fri, 18 Feb 2022 00:16:39 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Thu, 17 Feb 2022 17:37:42 +0000 (17:37 +0000)]
libstdc++: Deprecate non-standard std::vector<bool>::insert(pos) [PR104559]
The SGI STL and pre-1998 drafts of the C++ standard had a default
argument for vector<bool>::insert(iterator, const bool&) which was
removed by N1051. The default argument is still present in libstdc++ for
some reason. There are no tests verifying it as an extension, so I don't
think it has been kept intentionally.
This removes the default argument but adds an overload without the
second parameter, and adds the deprecated attribute to it. This allows
any code using it to keep working (for now) but with a warning.
libstdc++-v3/ChangeLog:
PR libstdc++/104559
* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/manual/api.html: Regenerate.
* include/bits/stl_bvector.h (insert(const_iterator, const bool&)):
Remove default argument.
(insert(const_iterator)): New overload with deprecated attribute.
* testsuite/23_containers/vector/bool/modifiers/insert/104559.cc:
New test.
Jason Merrill [Thu, 17 Feb 2022 05:04:21 +0000 (00:04 -0500)]
c++: inlining explicit instantiations [PR104539]
The PR10968 fix cleared DECL_COMDAT to force output of explicit
instantiations. Then the PR59469 fix added a call to mark_needed, after
which we no longer need to clear DECL_COMDAT, and leaving it set allows us
to inline explicit instantiations without worrying about symbol
interposition.
I suppose there's an argument to be made that an explicit instantiation
declaration (extern template) should clear DECL_COMDAT, since that suggests
that there will be only a single instantiation somewhere that could be
subject to interposition, but that doesn't change the 'inline' semantics,
and it seems cleaner to treat template instantiations uniformly.
PR c++/104539
gcc/cp/ChangeLog:
* pt.cc (mark_decl_instantiated): Don't clear DECL_COMDAT.
gcc/testsuite/ChangeLog:
* g++.dg/ipa/inline-4.C: New test.
Jason Merrill [Wed, 16 Feb 2022 00:17:03 +0000 (19:17 -0500)]
tree: tweak warn_deprecated_use
While looking at PR90451 I noticed that this function was failing to find
the attributes if called with a variant of the struct.
gcc/ChangeLog:
* tree.cc (warn_deprecated_use): Look for TYPE_STUB_DECL
on TYPE_MAIN_VARIANT.
gcc/testsuite/ChangeLog:
* g++.dg/warn/deprecated-16.C: New test.
Jonathan Wakely [Thu, 17 Feb 2022 17:23:36 +0000 (17:23 +0000)]
libstdc++: Make std::error_code printer more robust
This attempts to implement a partial workaround for the GDB bug
https://sourceware.org/bugzilla/show_bug.cgi?id=28856 which causes GDB
to crash when printing a frame with a std::error_code argument.
By recognising the known error categories defined in the library and
hardcoding their names we do not need to call cat->name() on the
category. This has the additional benefit of also working when
debugging a core file rather than a running process. For those known
categories we can also cast the int value to the corresponding error
code enum (e.g. future_errc) so that we show an enumerator instead of
just an integer.
For program-defined categories we just use the name of the dynamic type
to identify the category, and print the value as an integer. Once the
GDB bug is fixed and the virtual name() function can be called safely,
that would be preferable. For now it's better to have an imperfect
printer that doesn't crash GDB.
This rewritten StdErrorCodePrinter needs gdb.Value.dynamic_type, so is
only registered if that is supported, which means GDB 7.7 and later.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdErrorCodePrinter): Replace
code that calls cat->name() on std::error_category objects.
Identify known categories by symbol name and use a hardcoded
name. Print error code values as enumerators where appropriate.
* testsuite/libstdc++-prettyprinters/cxx11.cc: Adjust expected
name of custom category. Check io_errc and future_errc errors.
Jason Merrill [Wed, 16 Feb 2022 19:05:39 +0000 (14:05 -0500)]
c++: avoid duplicate deprecated warning [PR90451]
We were getting the deprecated warning twice for the same call because we
called mark_used first in finish_qualified_id_expr and then again in
build_over_call. Let's not call it the first time; C++17 clarified that a
function is used only when it is selected from an overload set, which
happens later.
Then I had to add a few more uses in places that don't do anything further
with the expression (convert_to_void, finish_decltype_type), and places that
use the expression more unusually (cp_build_addr_expr_1,
convert_nontype_argument). The new mark_single_function is mostly so
that I only have to put the comment in one place.
PR c++/90451
gcc/cp/ChangeLog:
* decl2.cc (mark_single_function): New.
* cp-tree.h: Declare it.
* typeck.cc (cp_build_addr_expr_1): mark_used when making a PMF.
* semantics.cc (finish_qualified_id_expr): Not here.
(finish_id_expression_1): Or here.
(finish_decltype_type): Call mark_single_function.
* cvt.cc (convert_to_void): And here.
* pt.cc (convert_nontype_argument): And here.
* init.cc (build_offset_ref): Adjust assert.
gcc/testsuite/ChangeLog:
* g++.dg/warn/deprecated-14.C: New test.
* g++.dg/warn/deprecated-15.C: New test.
Paul A. Clarke [Thu, 17 Feb 2022 02:01:41 +0000 (20:01 -0600)]
rs6000: __Uglify non-uglified local variables in headers
Properly prefix (with "__") all local variables in shipped headers for x86
compatibility intrinsics implementations. This avoids possible problems with
usages like:
```
```
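A hypothetical illustration of the kind of clash meant here (the macro and function names are invented, not taken from the rs6000 headers): a user macro named result would rewrite a header-local variable of the same name, whereas a __-prefixed local is out of its reach.

```c
#include <assert.h>

/* User code may define any non-reserved identifier as a macro before
   including a header.  */
#define result 0

/* Header-style inline function (hypothetical).  Its local is spelled
   __result, so the macro above cannot touch it.  Had it been spelled
   result, the declaration would expand to "int 0 = ..." and fail to
   compile.  */
static inline int
__example_add (int __a, int __b)
{
  int __result = __a + __b;
  return __result;
}

/* The user's macro is still in effect here and the function still works.  */
void
check_example_add (void)
{
  assert (__example_add (2, 3) == 5);
}

#undef result
```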
2022-02-16 Paul A. Clarke <pc@us.ibm.com>
gcc
PR target/104257
* config/rs6000/bmi2intrin.h: Uglify local variables.
* config/rs6000/emmintrin.h: Likewise.
* config/rs6000/mm_malloc.h: Likewise.
* config/rs6000/mmintrin.h: Likewise.
* config/rs6000/pmmintrin.h: Likewise.
* config/rs6000/smmintrin.h: Likewise.
* config/rs6000/tmmintrin.h: Likewise.
* config/rs6000/xmmintrin.h: Likewise.
Robin Dapp [Thu, 17 Feb 2022 18:59:51 +0000 (19:59 +0100)]
rs6000: Workaround for new ifcvt behavior [PR104335].
Since r12-6747-gaa8cfe785953a0, ifcvt passes a "cc comparison", i.e. the
representation of the result of a comparison, to the backend.
rs6000_emit_int_cmove () is not prepared to handle this.
Therefore, this patch makes it return false in such a case.
PR target/104335
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_int_cmove): Return false
if the expected comparison's first operand is of mode MODE_CC.
Jonathan Wakely [Tue, 15 Feb 2022 13:22:15 +0000 (13:22 +0000)]
c-family: Remove names of unused parameters
C++ allows unnamed parameters, which means we don't need to call them
'dummy' and mark them with the unused attribute.
gcc/c-family/ChangeLog:
* c-pragma.cc (handle_pragma_pack): Remove parameter name.
(handle_pragma_weak): Likewise.
(handle_pragma_scalar_storage_order): Likewise.
(handle_pragma_redefine_extname): Likewise.
(handle_pragma_visibility): Likewise.
(handle_pragma_diagnostic): Likewise.
(handle_pragma_target): Likewise.
(handle_pragma_optimize): Likewise.
(handle_pragma_push_options): Likewise.
(handle_pragma_pop_options): Likewise.
(handle_pragma_reset_options): Likewise.
(handle_pragma_message): Likewise.
(handle_pragma_float_const_decimal64): Likewise.
Eric Botcazou [Thu, 17 Feb 2022 17:34:06 +0000 (18:34 +0100)]
Add missing target selector
gcc/testsuite/
PR target/79754
* gcc.target/i386/pr79754.c: Add target dfp.
Ian Lance Taylor [Thu, 17 Feb 2022 00:57:28 +0000 (16:57 -0800)]
net: add hurd build tag for setReadMsgCloseOnExec
Patch from Svante Signell.
PR go/103573
PR go/104290
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386216
Mark Wielaard [Thu, 2 Dec 2021 17:00:39 +0000 (18:00 +0100)]
libiberty rust-demangle, ignore .suffix
Rust symbols can have a .suffix because of compiler transformations.
These can be ignored in the demangled name, which is what this patch
implements, by stopping at the first dot for v0 symbols and searching
backwards to the ending 'E' for legacy symbols.
An alternative implementation could be to follow what C++ does and
represent these as [clone .suffix] tagged onto the demangled name.
But this seems somewhat confusing since it results in a demangled
name that cannot be mangled again. And it would mean trying to
decode compiler internal naming.
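A rough sketch of the suffix handling follows. It is not the libiberty code: the function name is invented, and it assumes that v0 symbols start with "_R", that legacy symbols start with "_ZN", and that the ending 'E' of a legacy symbol is the last 'E' that either ends the string or is directly followed by a '.'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return how many leading characters of SYM form the mangled name proper,
   i.e. with any compiler-added ".suffix" left out.  Returns 0 if SYM is
   not recognised.  Hypothetical helper for illustration only.  */
size_t
rust_mangled_length (const char *sym)
{
  assert (sym != NULL);

  if (strncmp (sym, "_R", 2) == 0)
    /* v0: stop at the first dot, or at the end if there is none.  */
    return strcspn (sym, ".");

  if (strncmp (sym, "_ZN", 3) == 0)
    {
      /* legacy: walk backwards to an 'E' that ends the string or is
         directly followed by a dot.  */
      size_t len = strlen (sym);
      while (len > 3
             && !(sym[len - 1] == 'E'
                  && (sym[len] == '\0' || sym[len] == '.')))
        len--;
      return len > 3 ? len : 0;
    }

  return 0;
}
```

The symbols used to exercise this need not be valid manglings; only the position of the dot and of the final 'E' matters to the helper.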
https://bugs.kde.org/show_bug.cgi?id=445916
https://github.com/rust-lang/rust/issues/60705
libiberty/Changelog
* rust-demangle.c (rust_demangle_callback): Ignore everything
after '.' char in sym for v0. For legacy symbols search
backwards to find the last 'E' before any '.'.
* testsuite/rust-demangle-expected: Add new .suffix testcases.
Vladimir N. Makarov [Thu, 17 Feb 2022 16:31:50 +0000 (11:31 -0500)]
[PR104447] LRA: Do not split non-alloc hard regs.
LRA tried to split non-allocated hard regs for reload pseudos again and
again until the number of assignment passes reached the limit. The patch
fixes this.
gcc/ChangeLog:
PR rtl-optimization/104447
* lra-constraints.cc (spill_hard_reg_in_range): Initiate ignore
hard reg set by lra_no_alloc_regs.
gcc/testsuite/ChangeLog:
PR rtl-optimization/104447
* gcc.target/i386/pr104447.c: New.
Patrick Palka [Thu, 17 Feb 2022 13:35:23 +0000 (08:35 -0500)]
c++: double non-dep folding from finish_compound_literal [PR104565]
In finish_compound_literal, we perform non-dependent expr folding before
the call to check_narrowing ever since r9-5973. But ever since r10-7096,
check_narrowing also performs non-dependent expr folding of its own.
This double folding means tsubst will see non-templated trees during the
second folding, which causes a spurious error in the below testcase.
This patch removes the former folding operation; it seems obviated by
the latter one.
PR c++/104565
gcc/cp/ChangeLog:
* semantics.cc (finish_compound_literal): Don't perform
non-dependent expr folding before calling check_narrowing.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent22.C: New test.
liuhongt [Wed, 16 Feb 2022 04:15:18 +0000 (12:15 +0800)]
Restrict the two sources of vect_recog_cond_expr_convert_pattern to be of the same type when convert is extension.
It is not equivalent to transform
(cond (cmp @1 @2) (convert@3 @4) (convert@5 @6))
to
(convert (cmp @1 @2) (convert)@4 @6)
when (convert@3 @4) is an extension, because one arm may be a zero_extend and the other a sign_extend.
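A small scalar sketch of why the types must match. The function names are invented; the second function models the transformed form by selecting in the narrow type and then applying a single (sign) extension. Narrowing a value above 127 to int8_t is implementation-defined, and GCC wraps it modulo 256.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: each arm is widened by its own conversion, a sign
   extension for A and a zero extension for B.  */
int32_t
cond_of_converts (int c, int8_t a, uint8_t b)
{
  return c ? (int32_t) a : (int32_t) b;
}

/* Transformed form: select in the narrow type, then convert once.  */
int32_t
convert_of_cond (int c, int8_t a, uint8_t b)
{
  int8_t narrow = c ? a : (int8_t) b;
  return (int32_t) narrow;
}

/* The two forms agree on the signed arm but not on the unsigned one.  */
void
check_extension_mismatch (void)
{
  assert (cond_of_converts (1, -1, 200) == convert_of_cond (1, -1, 200));
  assert (cond_of_converts (0, -1, 200) == 200);
  assert (convert_of_cond (0, -1, 200) != 200);
}
```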
gcc/ChangeLog:
PR tree-optimization/104551
PR tree-optimization/103771
* match.pd (cond_expr_convert_p): Add types_match check when
convert is extension.
* tree-vect-patterns.cc
(gimple_cond_expr_convert_p): Adjust comments.
(vect_recog_cond_expr_convert_pattern): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104551.c: New test.
Jakub Jelinek [Thu, 17 Feb 2022 10:14:38 +0000 (11:14 +0100)]
valtrack: Avoid creating raw SUBREGs with VOIDmode argument [PR104557]
After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them. But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there. We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.
2022-02-17 Jakub Jelinek <jakub@redhat.com>
PR debug/104557
* valtrack.cc (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.
* gcc.dg/dfp/pr104557.c: New test.
Jakub Jelinek [Thu, 17 Feb 2022 09:29:06 +0000 (10:29 +0100)]
openmp: Ensure proper diagnostics for -> in map/to/from clauses [PR104532]
The following patch uses the functions normal CPP_DEREF parsing uses,
i.e. convert_lvalue_to_rvalue and build_indirect_ref, instead of
blindly calling build_simple_mem_ref, so that if the variable does not
have correct type, we properly diagnose it instead of ICEing on it.
2022-02-17 Jakub Jelinek <jakub@redhat.com>
PR c/104532
* c-parser.cc (c_parser_omp_variable_list): For CPP_DEREF, use
convert_lvalue_to_rvalue and build_indirect_ref instead of
build_simple_mem_ref.
* gcc.dg/gomp/pr104532.c: New test.
liuhongt [Wed, 16 Feb 2022 07:00:59 +0000 (15:00 +0800)]
Clean up MPX-related bit_{MPX,BNDREGS,BNDCSR}.
gcc/ChangeLog:
* config/i386/cpuid.h (bit_MPX): Removed.
(bit_BNDREGS): Ditto.
(bit_BNDCSR): Ditto.
Ian Lance Taylor [Thu, 17 Feb 2022 04:18:45 +0000 (20:18 -0800)]
libbacktrace: gather address ranges from skeleton units
* dwarf.c (find_address_ranges): Handle skeleton units.
(read_function_entry): Likewise.
Michael Meissner [Thu, 17 Feb 2022 03:00:00 +0000 (22:00 -0500)]
Define __SIZEOF_FLOAT128__ and __SIZEOF_IBM128__.
Define the sizes of the PowerPC specific types __float128 and __ibm128 if those
types are enabled.
This patch will define __SIZEOF_IBM128__ and __SIZEOF_FLOAT128__ if their
respective types are created in the compiler. Currently, this means both of
these will be defined if float128 support is enabled. But at some point in
the future, __ibm128 could be enabled without enabling float128 support and
__SIZEOF_IBM128__ would be defined.
2022-02-16 Michael Meissner <meissner@the-meissners.org>
gcc/
PR target/99708
* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
__SIZEOF_IBM128__ if the IBM 128-bit long double type is created.
Define __SIZEOF_FLOAT128__ if the IEEE 128-bit floating point type
is created.
gcc/testsuite/
PR target/99708
* gcc.target/powerpc/pr99708.c: New test.
David Malcolm [Wed, 16 Feb 2022 23:21:58 +0000 (18:21 -0500)]
analyzer: const functions have no side effects [PR104576]
PR analyzer/104576 tracks that we issue a false positive from
-Wanalyzer-use-of-uninitialized-value for the reproducers of PR 63311
when optimization is disabled.
The root cause is that the analyzer was considering that a call to
__builtin_sinf could have side-effects.
This patch fixes things by generalizing the handling for "pure"
functions to also consider "const" functions.
gcc/analyzer/ChangeLog:
PR analyzer/104576
* region-model.cc: Include "calls.h".
(region_model::on_call_pre): Use flags_from_decl_or_type to
generalize check for DECL_PURE_P to also check for ECF_CONST.
gcc/testsuite/ChangeLog:
PR analyzer/104576
* gcc.dg/analyzer/torture/uninit-pr63311.c: New test.
* gcc.dg/analyzer/uninit-pr104576.c: New test.
* gfortran.dg/analyzer/uninit-pr63311.f90: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
GCC Administrator [Thu, 17 Feb 2022 00:16:36 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Wed, 16 Feb 2022 14:06:46 +0000 (09:06 -0500)]
analyzer: fixes to free of non-heap detection [PR104560]
PR analyzer/104560 reports various false positives from
-Wanalyzer-free-of-non-heap seen with rdma-core, on what's
effectively:
free (&ptr->field)
where in this case "field" is the first element of its struct, and thus
&ptr->field == ptr, and could be on the heap.
The root cause is due to malloc_state_machine::on_stmt making
"LHS = &EXPR;"
transition LHS from start to non_heap when EXPR is not a MEM_REF;
this assumption doesn't hold for the above case.
This patch eliminates that state transition, instead relying on
malloc_state_machine::get_default_state to detect regions known to
not be on the heap.
Doing so fixes the false positive, but eliminates some events relating
to free-of-alloca identifying the alloca, so the patch also reworks
free_of_non_heap to capture which region has been freed, adding
region creation events to diagnostic paths, so that the alloca calls
can be identified, and using the memory space of the region for more
precise wording of the diagnostic.
The improvement to malloc_state_machine::get_default_state also
means we now detect attempts to free VLAs, functions and code labels.
In doing so I spotted that I wasn't adding region creation events for
regions for global variables, and for cases where an allocation is the
last stmt within its basic block, so the patch also fixes these issues.
gcc/analyzer/ChangeLog:
PR analyzer/104560
* diagnostic-manager.cc (diagnostic_manager::build_emission_path):
Add region creation events for globals of interest.
(null_assignment_sm_context::get_old_program_state): New.
(diagnostic_manager::add_events_for_eedge): Move check for
changing dynamic extents from PK_BEFORE_STMT case to after the
switch on the dst_point's kind so that we can emit them for the
final stmt in a basic block.
* engine.cc (impl_sm_context::get_old_program_state): New.
* sm-malloc.cc (malloc_state_machine::get_default_state): Rewrite
detection of m_non_heap to use get_memory_space.
(free_of_non_heap::free_of_non_heap): Add freed_reg param.
(free_of_non_heap::subclass_equal_p): Update for changes to
fields.
(free_of_non_heap::emit): Drop m_kind in favor of
get_memory_space.
(free_of_non_heap::describe_state_change): Remove logic for
detecting alloca.
(free_of_non_heap::mark_interesting_stuff): Add region-creation of
m_freed_reg.
(free_of_non_heap::get_memory_space): New.
(free_of_non_heap::kind): Drop enum.
(free_of_non_heap::m_freed_reg): New field.
(free_of_non_heap::m_kind): Drop field.
(malloc_state_machine::on_stmt): Drop transition to m_non_heap.
(malloc_state_machine::handle_free_of_non_heap): New function,
split out from on_deallocator_call and on_realloc_call, adding
detection of the freed region.
(malloc_state_machine::on_deallocator_call): Use it.
(malloc_state_machine::on_realloc_call): Likewise.
* sm.h (sm_context::get_old_program_state): New vfunc.
gcc/testsuite/ChangeLog:
PR analyzer/104560
* g++.dg/analyzer/placement-new.C: Update expected wording.
* g++.dg/analyzer/pr100244.C: Likewise.
* gcc.dg/analyzer/attr-malloc-1.c (test_7): Likewise.
* gcc.dg/analyzer/malloc-1.c (test_24): Likewise.
(test_25): Likewise.
(test_26): Likewise.
(test_50a, test_50b, test_50c): New.
* gcc.dg/analyzer/malloc-callbacks.c (test_5): Update expected
wording.
* gcc.dg/analyzer/malloc-paths-8.c: Likewise.
* gcc.dg/analyzer/pr104560-1.c: New test.
* gcc.dg/analyzer/pr104560-2.c: New test.
* gcc.dg/analyzer/realloc-1.c (test_7): Update expected wording.
* gcc.dg/analyzer/vla-1.c (test_2): New. Prune output from
-Wfree-nonheap-object.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Ian Lance Taylor [Wed, 16 Feb 2022 19:30:04 +0000 (11:30 -0800)]
libgo: restore building on Solaris
Add build tags and a few other changes so that libgo builds on Solaris.
Patch partially from Rainer Orth.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386215
Ian Lance Taylor [Wed, 16 Feb 2022 19:35:00 +0000 (11:35 -0800)]
libbacktrace: initialize DWARF 5 fields of unit
When I added the fields on 2019-12-13 I forgot to initialize them.
* dwarf.c (build_address_map): Initialize DWARF 5 fields of unit.
Andrew MacLeod [Wed, 16 Feb 2022 14:01:47 +0000 (09:01 -0500)]
Use range_compatible_p in condexpr_adjust
* gimple-range-gori.cc (gori_compute::condexpr_adjust): Use
range_compatible_p instead of direct type comparison.
Patrick Palka [Wed, 16 Feb 2022 17:41:35 +0000 (12:41 -0500)]
c++: treat NON_DEPENDENT_EXPR as not potentially constant [PR104507]
Here we're crashing from potential_constant_expression because it tries
to perform trial evaluation of the first operand '(bool)__r' of the
conjunction (which is overall wrapped in a NON_DEPENDENT_EXPR), but
cxx_eval_constant_expression ICEs on unsupported trees (of which CAST_EXPR
is one). The sequence of events is:
1. build_non_dependent_expr for the array subscript yields
NON_DEPENDENT_EXPR<<<(bool)__r && __s>>> ? 1 : 2
2. cp_build_array_ref calls fold_non_dependent_expr on this subscript
(after this point, processing_template_decl is cleared)
3. during which, the COND_EXPR case of tsubst_copy_and_build calls
fold_non_dependent_expr on the first operand
4. during which, we crash from p_c_e_1 because it attempts trial
evaluation of the CAST_EXPR '(bool)__r'.
Note that even if this crash didn't happen, fold_non_dependent_expr
from cp_build_array_ref would still ultimately be one big no-op here
since neither constexpr evaluation nor tsubst handle NON_DEPENDENT_EXPR.
In light of this and of the observation that we should never see
NON_DEPENDENT_EXPR in a context where a constant expression is needed
(it's used primarily in the build_x_* family of functions), it seems
futile for p_c_e_1 to ever return true for NON_DEPENDENT_EXPR. And the
otherwise inconsistent handling of NON_DEPENDENT_EXPR between p_c_e_1,
cxx_eval_constant_expression and tsubst apparently leads to weird
bugs such as this one.
PR c++/104507
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1)
<case NON_DEPENDENT_EXPR>: Return false instead of recursing.
Assert tf_error isn't set.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent21.C: New test.
Jakub Jelinek [Wed, 16 Feb 2022 16:03:58 +0000 (17:03 +0100)]
testsuite: Add testcase for already fixed PR [PR104448]
This PR has been fixed with r12-7147-g2f9ab267e725ddf2.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR target/104448
* gcc.target/i386/pr104448.c: New test.
Jakub Jelinek [Wed, 16 Feb 2022 13:48:30 +0000 (14:48 +0100)]
combine: Fix up -fcompare-debug issue in the combiner [PR104544]
On the following testcase on aarch64-linux, we behave differently
with -g and -g0.
The problem is that on:
(insn 10011 10010 10012 2 (set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi}
(expr_list:REG_DEAD (reg:DI 105)
(nil)))
(insn 10012 10011 10013 2 (set (reg:SI 109)
(eq:SI (reg:CC 66 cc)
(const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi}
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
(insn 10013 10012 10016 2 (set (reg:DI 110)
(zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64}
(expr_list:REG_DEAD (reg:SI 109)
(nil)))
(insn 10016 10013 10017 2 (parallel [
(set (reg:CC 66 cc)
(compare:CC (const_int 0 [0])
(reg:DI 110)))
(set (reg:DI 111)
(neg:DI (reg:DI 110)))
]) "pr104544.c":18:3 281 {negdi_carryout}
(expr_list:REG_DEAD (reg:DI 110)
(nil)))
...
(debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1
(nil))
(debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1
(nil))
(insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ])
(ior:DI (reg:DI 111)
(reg:DI 112))) "pr104544.c":11:6 496 {iordi3}
(expr_list:REG_DEAD (reg:DI 112)
(expr_list:REG_DEAD (reg:DI 111)
(nil))))
we successfully split 3 insns into two:
Trying 10011, 10013 -> 10016:
10011: cc:CC=cmp(r105:DI,0)
REG_DEAD r105:DI
10013: r110:DI=cc:CC==0
REG_DEAD cc:CC
10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;}
REG_DEAD r110:DI
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Successfully matched this instruction:
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
Successfully matched this instruction:
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:DI 112)
(neg:DI (eq:DI (reg:CC 66 cc)
(const_int 0 [0]))))
allowing combination of insns 10011, 10013 and 10016
original costs 4 + 4 + 4 = 16
replacement costs 4 + 4 = 12
deferring deletion of insn with uid = 10011.
but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between. So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.
The following patch fixes that by instead ignoring debug insns during the
searching. We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104544
* combine.cc (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.
* gcc.dg/pr104544.c: New test.
Richard Sandiford [Wed, 16 Feb 2022 10:21:14 +0000 (10:21 +0000)]
aarch64: Tweak atomic-inst-cas.c options
atomic-inst-cas.c has code to skip __atomic_compare_exchange_n
calls for invalid memory orderings, but -Winvalid-memory-model
applies before the dead code is removed (which is the right
behaviour IMO). This patch therefore suppresses the warning
for this test.
gcc/testsuite/
* gcc.target/aarch64/atomic-inst-cas.c: Add
-Wno-invalid-memory-model.
Richard Sandiford [Wed, 16 Feb 2022 10:21:14 +0000 (10:21 +0000)]
aarch64: Remove XFAIL for bic-bitmask-1.c
bic-bitmask-1.c is now passing, so remove the XFAIL.
gcc/testsuite/
* gcc.target/aarch64/bic-bitmask-1.c: Remove XFAIL.
Richard Sandiford [Wed, 16 Feb 2022 10:21:13 +0000 (10:21 +0000)]
aarch64: Extend PR100056 patterns to +
pr100056.c contains things like:
int
or_shift_u3a (unsigned i)
{
i &= 7;
return i | (i << 11);
}
After g:96146e61cd7aee62c21c2845916ec42152918ab7, the preferred
gimple representation of this is a multiplication:
i_2 = i_1(D) & 7;
_5 = i_2 * 2049;
Expand then open-codes the multiplication back to individual shifts,
but (of course) it uses + rather than | to combine the shifts.
This means that we end up with the RTL equivalent of:
i + (i << 11)
I wondered about canonicalising the + to | (*back* to | in this case)
when the operands have no set bits in common and when one of the
operands is &, | or ^, but that didn't seem to be a popular idea when
I asked on IRC. The feeling seemed to be that + is inherently simpler
than |, so we shouldn't be “simplifying” the other way.
This patch therefore adjusts the PR100056 patterns to handle +
as well as |, in cases where the operands are provably disjoint.
For:
int
or_shift_u8 (unsigned char i)
{
return i | (i << 11);
}
the instructions:
2: r95:SI=zero_extend(x0:QI)
REG_DEAD x0:QI
7: r98:SI=r95:SI<<0xb
are combined into:
(parallel [
(set (reg:SI 98)
(and:SI (ashift:SI (reg:SI 0 x0 [ i ])
(const_int 11 [0xb]))
(const_int 522240 [0x7f800])))
(set (reg/v:SI 95 [ i ])
(zero_extend:SI (reg:QI 0 x0 [ i ])))
])
which fails to match, but which is then split into its individual
(independent) sets. Later the zero_extend is combined with the add
to get an ADD UXTB:
(set (reg:SI 99)
(plus:SI (zero_extend:SI (reg:QI 0 x0 [ i ]))
(reg:SI 98)))
This means that there is never a 3-insn combo to match the split
against. The end result is therefore:
ubfiz w1, w0, 11, 8
add w0, w1, w0, uxtb
This is a bit redundant, since it's doing the zero_extend twice.
It is at least 2 instructions though, rather than the 3 that we
had before the original patch for PR100056. or_shift_u8_asm is
affected similarly.
The net effect is that we do still have 2 UBFIZs, but we're at
least back down to 2 instructions per function, as for GCC 11.
I think that's good enough for now.
There are probably other instructions that should be extended
to support + as well as | (e.g. the EXTR ones), but those aren't
regressions and so are GCC 13 material.
gcc/
PR target/100056
* config/aarch64/iterators.md (LOGICAL_OR_PLUS): New iterator.
* config/aarch64/aarch64.md: Extend the PR100056 patterns
to handle plus in the same way as ior, if the operands have
no set bits in common.
gcc/testsuite/
PR target/100056
* gcc.target/aarch64/pr100056.c: XFAIL the original UBFIZ test
and instead expect two UBFIZs + two ADD UXTBs.
Iain Buclaw [Sun, 13 Feb 2022 19:17:53 +0000 (20:17 +0100)]
d: Merge upstream dmd 52844d4b1, druntime dbd0c874, phobos 896b1d0e1.
D front-end changes:
- Parsing and compiling C code is now possible using `import'.
- `throw' statements can now be used as an expression.
- Improvements to the D template emission strategy when compiling
with `-funittest'.
D Runtime changes:
- New core.int128 module for implementing intrinsics to support
128-bit integer types.
- C bindings for the kernel and C runtime have been better separated
to allow compiling for hybrid targets, such as kFreeBSD.
Phobos changes:
- The std.experimental.checkedint module has been renamed to
std.checkedint.
gcc/d/ChangeLog:
* d-builtins.cc (d_build_builtins_module): Set purity of DECL_PURE_P
functions to PURE::const_.
* d-gimplify.cc (bit_field_ref): New function.
(d_gimplify_modify_expr): Handle implicit casting for assignments to
bit-fields.
(d_gimplify_unary_expr): New function.
(d_gimplify_binary_expr): New function.
(d_gimplify_expr): Handle UNARY_CLASS_P and BINARY_CLASS_P.
* d-target.cc (Target::_init): Initialize bitFieldStyle.
(TargetCPP::parameterType): Update signature.
(Target::supportsLinkerDirective): New function.
* dmd/MERGE: Merge upstream dmd 52844d4b1.
* expr.cc (ExprVisitor::visit (ThrowExp *)): New function.
* types.cc (d_build_bitfield_integer_type): New function.
(insert_aggregate_bitfield): New function.
(layout_aggregate_members): Handle inserting bit-fields into an
aggregate type.
libphobos/ChangeLog:
* Makefile.in: Regenerate.
* libdruntime/MERGE: Merge upstream druntime dbd0c874.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add core/int128.d.
(DRUNTIME_DISOURCES): Add __builtins.di.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 896b1d0e1.
* src/Makefile.am (PHOBOS_DSOURCES): Add std/checkedint.d.
* src/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Add -fall-instantiations to
--gdcflags.
Jakub Jelinek [Wed, 16 Feb 2022 08:27:11 +0000 (09:27 +0100)]
openmp: For min/max omp atomic compare forms verify arg types with build_binary_op [PR104531]
The MIN_EXPR/MAX_EXPR handling in *build_binary_op is minimal (especially
for C FE), because min/max aren't expressions the languages contain directly.
I'm using those for the
#pragma omp atomic
x = x < y ? y : x;
forms, but e.g. for the attached testcase we normally reject _Complex int vs. int
comparisons. In C++, due to MIN/MAX_EXPR, we were diagnosing it as invalid types
for <unknown>, while in C we accepted it and ICEd later on.
The following patch will try build_binary_op with LT_EXPR on the operands first
to get needed diagnostics and fail if it returns error_mark_node.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104531
* c-omp.cc (c_finish_omp_atomic): For MIN_EXPR/MAX_EXPR, try first
build_binary_op with LT_EXPR and only if that doesn't return
error_mark_node call build_modify_expr.
* c-c++-common/gomp/atomic-31.c: New test.
Jakub Jelinek [Wed, 16 Feb 2022 08:25:55 +0000 (09:25 +0100)]
c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]
The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect failure as a possible
outcome. They require that the function promotes the operands to the same
type, whether that is the originally selected *restype_ptr or some
shortened type.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104510
* c-common.cc (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.
* gcc.dg/dfp/pr104510.c: New test.
GCC Administrator [Wed, 16 Feb 2022 00:16:26 +0000 (00:16 +0000)]
Daily bump.
Peter Bergner [Tue, 15 Feb 2022 22:51:32 +0000 (16:51 -0600)]
rs6000: Retry tbegin. instructions that can fail intermittently
The HTM tbegin. instruction can fail intermittently for many reasons.
This can lead to htm-1.c FAILing from time to time. The solution is to
allow retrying the instruction a few times before aborting.
2022-02-15 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/htm-1.c: Retry intermittent failing tbegins.
Andrew MacLeod [Tue, 15 Feb 2022 00:43:40 +0000 (19:43 -0500)]
Use GORI to evaluate arguments of a COND_EXPR.
Provide an API into gori to perform a basic evaluation of the arguments of a
COND_EXPR if they are in the dependency chain of the condition.
PR tree-optimization/104526
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_cond_expr): Call
new routine.
* gimple-range-gori.cc (range_def_chain::get_def_chain): Force a build
of dependency chain if there isn't one.
(gori_compute::condexpr_adjust): New.
* gimple-range-gori.h (class gori_compute): New prototype.
gcc/testsuite/
* gcc.dg/pr104526.c: New.
David Malcolm [Mon, 14 Feb 2022 18:27:45 +0000 (13:27 -0500)]
analyzer: fix ICE on cast to NULL type [PR104524]
gcc/analyzer/ChangeLog:
PR analyzer/104524
* region-model-manager.cc
(region_model_manager::maybe_fold_sub_svalue): Only call
get_or_create_cast if type is non-NULL.
gcc/testsuite/ChangeLog:
PR analyzer/104524
* gcc.dg/analyzer/pr104524.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 11 Feb 2022 21:43:21 +0000 (16:43 -0500)]
analyzer: fix uninit false +ve due to optimized conditionals [PR102692]
There is a false positive from -Wanalyzer-use-of-uninitialized-value on
gcc.dg/analyzer/pr102692.c here:
‘fix_overlays_before’: events 1-3
|
| 75 | while (tail
| | ~~~~
| 76 | && (tem = make_lisp_ptr (tail, 5),
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (1) following ‘false’ branch (when ‘tail’ is NULL)...
| 77 | (end = marker_position (XOVERLAY (tem)->end)) >= pos))
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|......
| 82 | if (!tail || end < prev || !tail->next)
| | ~~~~~ ~~~~~~~~~~
| | | |
| | | (3) use of uninitialized value ‘end’ here
| | (2) ...to here
|
The issue is that the inner || of the conditionals has been folded within the
frontend from a chain of control flow:
5 │ if (tail == 0B) goto <D.1986>; else goto <D.1988>;
6 │ <D.1988>:
7 │ if (end < prev) goto <D.1986>; else goto <D.1989>;
8 │ <D.1989>:
9 │ _1 = tail->next;
10 │ if (_1 == 0B) goto <D.1986>; else goto <D.1987>;
11 │ <D.1986>:
to an OR expr (and then to a bitwise-or by the gimplifier):
5 │ _1 = tail == 0B;
6 │ _2 = end < prev;
7 │ _3 = _1 | _2;
8 │ if (_3 != 0) goto <D.1986>; else goto <D.1988>;
9 │ <D.1988>:
10 │ _4 = tail->next;
11 │ if (_4 == 0B) goto <D.1986>; else goto <D.1987>;
This happens for sufficiently simple conditionals in fold_truth_andor.
In particular, the (end < prev) is short-circuited without optimization,
but is evaluated with optimization, leading to the false positive.
Given how early this folding occurs, it seems the simplest fix is to
try to detect places where this optimization appears to have happened,
and suppress uninit warnings within the statement that would have
been short-circuited.
gcc/analyzer/ChangeLog:
PR analyzer/102692
* exploded-graph.h (impl_region_model_context::get_stmt): New.
* region-model.cc: Include "gimple-ssa.h", "tree-phinodes.h",
"tree-ssa-operands.h", and "ssa-iterators.h".
(within_short_circuited_stmt_p): New.
(region_model::check_for_poison): Don't warn about uninit values
if within_short_circuited_stmt_p.
* region-model.h (region_model_context::get_stmt): New vfunc.
(noop_region_model_context::get_stmt): New.
gcc/testsuite/ChangeLog:
PR analyzer/102692
* gcc.dg/analyzer/pr102692-2.c: New test.
* gcc.dg/analyzer/pr102692.c: Remove xfail. Remove -O2 from
options and move to...
* gcc.dg/analyzer/torture/pr102692.c: ...here.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Tobias Burnus [Tue, 15 Feb 2022 20:42:33 +0000 (21:42 +0100)]
Fortran/OpenMP: Fix depend-clause handling for c_ptr
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_depobj): Fix to alloc/ptr dummy
and for c_ptr.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: Add VALUE test, update scan test.
* gfortran.dg/gomp/depend-5.f90: Fix scan tree for -m32.
* gfortran.dg/gomp/depend-6.f90: New test.
Richard Sandiford [Tue, 15 Feb 2022 18:09:35 +0000 (18:09 +0000)]
aarch64: Fix subs_compare_2.c regression [PR100874]
subs_compare_2.c tests that we can use a SUBS+CSEL sequence for:
unsigned int
foo (unsigned int a, unsigned int b)
{
unsigned int x = a - 4;
if (a < 4)
return x;
else
return 0;
}
As Andrew notes in the PR, this is effectively MIN (a, 4) - 4,
and it is now recognised as such by phiopt. Previously it was
if-converted in RTL instead.
I tried to look for ways to generalise this to other situations
and to other ?:-style operations, not just max and min. However,
for general ?: we tend to push an outer “- CST” into the arms of
the ?: -- at least if one of them simplifies -- so I didn't find
any useful abstraction.
This patch therefore adds a pattern specifically for
max/min(a,cst)-cst. I'm not thrilled at having to do this,
but it seems like the least worst fix in the circumstances.
Also, max(a,cst)-cst for unsigned a is a useful saturating
subtraction idiom and so is arguably worth its own code
for that reason.
gcc/
PR target/100874
* config/aarch64/aarch64-protos.h (aarch64_maxmin_plus_const):
Declare.
* config/aarch64/aarch64.cc (aarch64_maxmin_plus_const): New function.
* config/aarch64/aarch64.md (*aarch64_minmax_plus): New pattern.
gcc/testsuite/
* gcc.target/aarch64/max_plus_1.c: New test.
* gcc.target/aarch64/max_plus_2.c: Likewise.
* gcc.target/aarch64/max_plus_3.c: Likewise.
* gcc.target/aarch64/max_plus_4.c: Likewise.
* gcc.target/aarch64/max_plus_5.c: Likewise.
* gcc.target/aarch64/max_plus_6.c: Likewise.
* gcc.target/aarch64/max_plus_7.c: Likewise.
* gcc.target/aarch64/min_plus_1.c: Likewise.
* gcc.target/aarch64/min_plus_2.c: Likewise.
* gcc.target/aarch64/min_plus_3.c: Likewise.
* gcc.target/aarch64/min_plus_4.c: Likewise.
* gcc.target/aarch64/min_plus_5.c: Likewise.
* gcc.target/aarch64/min_plus_6.c: Likewise.
* gcc.target/aarch64/min_plus_7.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:34 +0000 (18:09 +0000)]
aarch64: Fix store_v2vec_lanes.c failure
store_v2vec_lanes.c started failing after SLP was enabled at -O2.
The test is specifically checking what happens for unvectorised code,
with the vectors being constructed from individual addition results.
gcc/testsuite/
* gcc.target/aarch64/store_v2vec_lanes.c: Add -fno-tree-vectorize.
Richard Sandiford [Tue, 15 Feb 2022 18:09:34 +0000 (18:09 +0000)]
aarch64: Add +nosve to tests
This patch adds +nosve to various Advanced SIMD-only tests.
gcc/testsuite/
* gcc.target/aarch64/shl-combine-2.c: New test.
* gcc.target/aarch64/shl-combine-3.c: Likewise.
* gcc.target/aarch64/shl-combine-4.c: Likewise.
* gcc.target/aarch64/shl-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-1.c: Likewise.
* gcc.target/aarch64/xtn-combine-2.c: Likewise.
* gcc.target/aarch64/xtn-combine-3.c: Likewise.
* gcc.target/aarch64/xtn-combine-4.c: Likewise.
* gcc.target/aarch64/xtn-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-6.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:33 +0000 (18:09 +0000)]
vect+aarch64: Fix ldp_stp_* regressions
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2. In all three cases SLP is
generating vector code when scalar code would be better.
The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off. It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.
This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores. It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs. (That is, the vector code has to win on
both counts rather than on one count individually.)
However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node. Fixing the STP costs
without fixing that would regress other cases, tested in the patch.
The patch therefore makes the SLP costing code check for duplicates
within a node. Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amortised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.
Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores. However, that didn't work
too well in practice; see the comments for details. The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.
ldp_stp_5.c was affected by the same thing. The test would be
worth vectorising if we generated better vector code, but:
(1) We do a bad job of moving the { -1, 1 } constant, given that
we have { -1, -1 } and { 1, 1 } to hand.
(2) The vector code has 6 pairable stores to misaligned offsets.
We have peephole patterns to handle such misalignment for
4 pairable stores, but not 6.
So the SLP decision isn't wrong as such. It's just being let
down by later codegen.
The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).
gcc/
* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
(vect_scalar_ops_slice_hash): Likewise.
(vect_scalar_ops_slice::op): New function.
* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
(vect_scalar_ops_slice_hash::hash): Likewise.
(vect_scalar_ops_slice_hash::equal): Likewise.
(vect_prologue_cost_for_slp): Check for duplicate vectors.
* config/aarch64/aarch64.cc
(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
(aarch64_aligned_constant_offset_p): New function.
(aarch64_stp_sequence_cost): Likewise.
(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
(aarch64_vector_costs::finish_cost): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
* gcc.target/aarch64/ldp_stp_14.h,
* gcc.target/aarch64/ldp_stp_14.c: New test.
* gcc.target/aarch64/ldp_stp_15.c: Likewise.
* gcc.target/aarch64/ldp_stp_16.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_18.c: Likewise.
* gcc.target/aarch64/ldp_stp_19.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:33 +0000 (18:09 +0000)]
vect: Fix early free
When updating the target costs interface, I failed to move the
free of the scalar costs beyond the new last use.
gcc/
* tree-vect-slp.cc (vect_bb_vectorization_profitable_p): Fix
use after free.
Jonathan Wakely [Tue, 15 Feb 2022 12:47:39 +0000 (12:47 +0000)]
libstdc++: Add missing constexpr to uses-allocator construction utilities [PR104542]
libstdc++-v3/ChangeLog:
PR libstdc++/104542
* include/bits/uses_allocator_args.h (make_obj_using_allocator)
(uninitialized_construct_using_allocator): Add constexpr.
* testsuite/20_util/uses_allocator/make_obj.cc: Check constexpr.
* testsuite/20_util/uses_allocator/uninitialized_construct.cc: New test.
Richard Biener [Tue, 15 Feb 2022 11:27:14 +0000 (12:27 +0100)]
tree-optimization/104543 - fix unroll-and-jam precondition
We have to make sure that outer loop exits come after the inner
loop since we otherwise will put it into the fused loop body.
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104543
* gimple-loop-jam.cc (unroll_jam_possible_p): Check outer loop exits
come after the inner loop.
* gcc.dg/torture/pr104543.c: New testcase.
Tobias Burnus [Tue, 15 Feb 2022 11:26:48 +0000 (12:26 +0100)]
Fortran/OpenMP: Fix depend-clause handling
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
Depend on the proper addr, for ptr/alloc depend on pointee.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/depend-4.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: New test.
* gfortran.dg/gomp/depend-5.f90: New test.
Jakub Jelinek [Tue, 15 Feb 2022 11:17:41 +0000 (12:17 +0100)]
cygwin: Fix up -Werror=format-diag errors [PR104536]
As the testcase reports, cygwin has 3 can%'t contractions in diagnostics;
we use cannot everywhere else instead, and -Wformat-diag enforces that.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR target/104536
* config/i386/host-cygwin.cc (cygwin_gt_pch_get_address): Use
cannot instead of can%'t in diagnostics. Formatting fixes.
Jakub Jelinek [Tue, 15 Feb 2022 11:11:31 +0000 (12:11 +0100)]
fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]
For IBM double double I've added in PR95450 and PR99648 verification that
when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
or CONST_DOUBLE constant, we try to encode it back to target bytes and
verify it is the same.
This is because our real.c support isn't able to represent all valid values
of IBM double double which has variable precision.
In PR104522, it has been noted that we have a similar problem with the
Intel/Motorola extended XFmode formats: our internal representation isn't
able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
values.
So, the following patch is an attempt to extend that verification to all
floats.
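The round-trip check described above can be illustrated with a small C sketch. This is not GCC's code: `toy_real`, `toy_decode`, `toy_encode` and `toy_round_trips` are hypothetical names, and the model only covers one case, an x87 pseudo denormal (biased exponent 0 with the explicit integer bit set), which the toy internal form canonicalizes to exponent 1. The point is only that decoding target bytes and encoding them again exposes values the internal form cannot record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of an 80-bit extended value (little-endian bytes):
   64-bit significand with explicit integer bit, 15-bit exponent, sign bit. */
struct toy_real { bool sign; uint16_t exp; uint64_t sig; };

/* Interpret 10 target bytes.  The toy internal form cannot record a
   pseudo denormal, so it is canonicalized to biased exponent 1.  */
static struct toy_real toy_decode(const unsigned char b[10])
{
  struct toy_real r;
  uint64_t sig = 0;
  for (int i = 7; i >= 0; i--)
    sig = (sig << 8) | b[i];
  unsigned se = b[8] | ((unsigned) b[9] << 8);
  r.sign = (se >> 15) & 1;
  r.exp = se & 0x7fff;
  r.sig = sig;
  if (r.exp == 0 && (sig >> 63))
    r.exp = 1;
  return r;
}

/* Encode the internal form back to 10 target bytes.  */
static void toy_encode(struct toy_real r, unsigned char b[10])
{
  assert(r.exp <= 0x7fff);
  for (int i = 0; i < 8; i++)
    b[i] = (unsigned char) (r.sig >> (8 * i));
  unsigned se = r.exp | (r.sign ? 0x8000u : 0u);
  b[8] = se & 0xff;
  b[9] = se >> 8;
}

/* The verification: only trust the decoded constant if encoding it
   again reproduces the original bytes exactly.  */
static bool toy_round_trips(const unsigned char b[10])
{
  unsigned char back[10];
  toy_encode(toy_decode(b), back);
  return memcmp(b, back, 10) == 0;
}
```

In this model the bytes of 1.0 round-trip, while a pseudo denormal does not, so a folder using such a check would punt on the latter instead of producing a different constant.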
Unfortunately, it wasn't that straightforward, because the
__builtin_clear_padding code exactly for the XFmode long doubles needs to
discover what bits are padding and does that by interpreting memory of
all 1s. That is actually a valid supported value, a qNaN with negative
sign with all mantissa bits set, but the verification includes also the
padding bits (exactly what __builtin_clear_padding wants to figure out)
and so fails the comparison check and so we ICE.
The patch fixes that case by moving that verification from
native_interpret_real to its caller, so that clear_padding_type can
call native_interpret_real and avoid that extra check.
With this, the only thing that regresses in the testsuite is
+FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
because it decides to use a pattern that has non-zero bits in the padding
bits of the long double, so the simplify-rtx.cc change prevents folding
a SUBREG into a constant. We emit (the testcase is -O0 but we emit worse
code at all opt levels) something like:
movabsq $-72340172838076674, %rax
movabsq $-72340172838076674, %rdx
movq %rax, -48(%rbp)
movq %rdx, -40(%rbp)
fldt -48(%rbp)
fstpt -32(%rbp)
instead of
fldt .LC2(%rip)
fstpt -32(%rbp)
...
.LC2:
.long -16843010
.long -16843010
.long 65278
.long 0
Note, neither of those sequences actually stores the padding bits, fstpt
simply doesn't touch them.
For vars with clear_padding_real_needs_padding_p types that are allocated
to memory at expansion time, I'd say much better would be to do the stores
using integral modes rather than XFmode, so do that:
movabsq $-72340172838076674, %rax
movq %rax, -32(%rbp)
movq %rax, -24(%rbp)
directly. That is the only way to ensure the padding bits are initialized
(or expand __builtin_clear_padding, but then you initialize separately the
value bits and padding bits).
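As a rough illustration of the integral-mode idea, the sketch below fills a 16-byte slot with two 64-bit stores of the 0xFE byte pattern, which is what the constant -72340172838076674 is when viewed as an unsigned 64-bit value. The names `SLOT_SIZE`, `init_slot_with_pattern` and `slot_fully_initialized` are hypothetical, and the 16-byte slot size is an assumption matching the x86-64 long double layout shown above.

```c
#include <stdint.h>
#include <string.h>

/* Assumed storage size of the object, value bits plus padding.  */
enum { SLOT_SIZE = 16 };

/* Initialize the whole slot with two integer stores, so the padding
   bytes receive the pattern as well as the value bytes.  */
static void init_slot_with_pattern(unsigned char slot[SLOT_SIZE])
{
  const uint64_t pattern = UINT64_C(0xFEFEFEFEFEFEFEFE);
  memcpy(slot, &pattern, sizeof pattern);
  memcpy(slot + 8, &pattern, sizeof pattern);
}

/* Return 1 if every byte of the slot, padding included, holds 0xFE.  */
static int slot_fully_initialized(const unsigned char slot[SLOT_SIZE])
{
  for (int i = 0; i < SLOT_SIZE; i++)
    if (slot[i] != 0xFE)
      return 0;
  return 1;
}
```

A store through the floating-point type would write only the ten value bytes, which is why the integer stores are the ones that cover the padding.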
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104522
* fold-const.h (native_interpret_real): Declare.
* fold-const.cc (native_interpret_real): No longer static. Don't
perform MODE_COMPOSITE_P verification here.
(native_interpret_expr) <case REAL_TYPE>: But perform it here instead
for all modes.
* gimple-fold.cc (clear_padding_type): Call native_interpret_real
instead of native_interpret_expr.
* simplify-rtx.cc (simplify_immed_subreg): Perform the native_encode_rtx
and comparison verification for all FLOAT_MODE_P modes, not just
MODE_COMPOSITE_P.
* gcc.dg/pr104522.c: New test.
Richard Biener [Tue, 15 Feb 2022 08:40:59 +0000 (09:40 +0100)]
tree-optimization/104519 - adjust PR100499 niter fix
The following adjusts the PR100499 niter fix to use the appropriate
types when checking whether the difference between the final and base
values of the IV is a multiple of the step. It also gets rid of
an always false condition in multiple_of_p which led me to a
wrong solution first.
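Why the type matters for such a check can be seen in a small C sketch. This is not the code in number_of_iterations_ne: `diff_is_multiple_of_step` is a hypothetical helper, and the choice of a 32-bit unsigned type for the difference is an assumption made for illustration. Computing final - base in the signed type could overflow, whereas the unsigned subtraction is well defined modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: for a loop "for (i = base; i != final; i += step)",
   check whether the distance from base to final is a multiple of step.
   The difference is formed in an unsigned type so it cannot overflow.  */
static bool diff_is_multiple_of_step(int32_t base, int32_t final, uint32_t step)
{
  if (step == 0)
    return false;
  uint32_t diff = (uint32_t) final - (uint32_t) base;
  return diff % step == 0;
}
```

For example, with base 0 and step 3 the check accepts final 9 and rejects final 10, and it stays well defined even when base is INT32_MIN and final is INT32_MAX.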
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104519
* fold-const.cc (multiple_of_p): Remove never true condition.
* tree-ssa-loop-niter.cc (number_of_iterations_ne): Use
the appropriate types for determining whether the difference
of final and base is a multiple of the step.
* gcc.dg/torture/pr104519.c: New testcase.