Marek Polacek [Tue, 13 Jul 2021 21:16:54 +0000 (17:16 -0400)]
c++: constexpr array reference and value-initialization [PR101371]
This PR gave me a hard time: I saw multiple issues starting with
different revisions. But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.
In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case. In this test, we are evaluating
((B *)this)->a = rhs->a
which means that we set ctx.object to ((B *)this)->a. Then we proceed
to evaluate the initializer, rhs->a. For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) &c.arr[0] in the hash table. Then
cxx_fold_indirect_ref gives us c.arr[0]. c is evaluated to {.arr={}} so
c.arr is {}. Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'. Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead. But that is the wrong object: we're not
initializing ((B *)this)->a here. And so we wind up with an
initializer for A, and then crash in cxx_eval_component_reference:
gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));
where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.
So create a new object, if there already was one, and the element type
is not a scalar.
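For illustration only, the kind of code path described above can be sketched as follows. This is a hypothetical reduction written for this note, not the testcase added by the patch; the names A, B, C and copy_first are assumptions. B is a non-aggregate class stored in a value-initialized array, and the assignment reads its member through a reference to c.arr[0].

```cpp
// Hypothetical sketch (C++14); not the PR's testcase.
struct A { int i = 0; };

struct B {                        // non-aggregate: user-provided constructor
  A a;
  constexpr B() : a{} {}
  constexpr B& operator=(const B& rhs) {
    a = rhs.a;                    // the ((B *)this)->a = rhs->a step
    return *this;
  }
};

struct C {
  B arr[2];
  constexpr C() : arr{} {}        // array elements are value-initialized
};

constexpr int copy_first() {
  C c;
  B b;
  b = c.arr[0];                   // rhs binds to c.arr[0]
  return b.a.i;
}

static_assert(copy_first() == 0, "value-initialized element must read as 0");
```

With a correct evaluator the static_assert holds, because the element of c.arr is value-initialized as its own object rather than through the object of the enclosing assignment.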
PR c++/101371
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate non-scalar case too when
value-initializing.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
Harald Anlauf [Wed, 14 Jul 2021 15:25:29 +0000 (17:25 +0200)]
Fortran - ICE in gfc_conv_expr_present initializing non-dummy class variable
gcc/fortran/ChangeLog:
PR fortran/100949
* trans-expr.c (gfc_trans_class_init_assign): Call
gfc_conv_expr_present only for dummy variables.
gcc/testsuite/ChangeLog:
PR fortran/100949
* gfortran.dg/pr100949.f90: New test.
Tamar Christina [Wed, 14 Jul 2021 14:23:23 +0000 (15:23 +0100)]
AArch64: Correct dot-product auto-vect optab RTL
The current RTL for the vectorizer patterns for dot-product is incorrect.
Operand 3 isn't an output parameter so we can't write to it.
This fixes the issue and reduces the number of RTL patterns.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (udot, sdot): Rename to...
(sdot_prod, udot_prod): ...These.
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Remove.
(aarch64_<sur>dot<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi>): ...This.
* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32, vdotq_s32):
Update builtins.
Tamar Christina [Wed, 14 Jul 2021 14:22:37 +0000 (15:22 +0100)]
AArch32: Correct sdot RTL on aarch32
The RTL generated from <sup>dot_prod<vsi2qi> is invalid as operand 3 cannot be
written to, it's a normal input. For the expand it's just another operand
but the caller does not expect it to be written to.
gcc/ChangeLog:
* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Drop statements.
Tamar Christina [Wed, 14 Jul 2021 14:21:40 +0000 (15:21 +0100)]
middle-end: Add generic middle-end tests for sign-differing dot-product.
This adds testcases to test for auto-vect detection of the new sign differing
dot product.
gcc/ChangeLog:
* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_2a_imm8_neon_ok_nocache,
check_effective_target_arm_v8_2a_i8mm_neon_hw,
check_effective_target_vect_usdot_qi): New.
* gcc.dg/vect/vect-reduc-dot-9.c: New test.
* gcc.dg/vect/vect-reduc-dot-10.c: New test.
* gcc.dg/vect/vect-reduc-dot-11.c: New test.
* gcc.dg/vect/vect-reduc-dot-12.c: New test.
* gcc.dg/vect/vect-reduc-dot-13.c: New test.
* gcc.dg/vect/vect-reduc-dot-14.c: New test.
* gcc.dg/vect/vect-reduc-dot-15.c: New test.
* gcc.dg/vect/vect-reduc-dot-16.c: New test.
* gcc.dg/vect/vect-reduc-dot-17.c: New test.
* gcc.dg/vect/vect-reduc-dot-18.c: New test.
* gcc.dg/vect/vect-reduc-dot-19.c: New test.
* gcc.dg/vect/vect-reduc-dot-20.c: New test.
* gcc.dg/vect/vect-reduc-dot-21.c: New test.
* gcc.dg/vect/vect-reduc-dot-22.c: New test.
Tamar Christina [Wed, 14 Jul 2021 14:20:45 +0000 (15:20 +0100)]
AArch32: Add support for sign differing dot-product usdot for NEON.
This adds optabs implementing usdot_prod.
The following testcase:
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 char *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int av = a[i];
int bv = b[i];
SIGNEDNESS_2 short mult = av * bv;
res += mult;
}
return res;
}
Generates
f:
vmov.i32 q8, #0 @ v4si
add r3, r2, #480
.L2:
vld1.8 {q10}, [r2]!
vld1.8 {q9}, [r1]!
vusdot.s8 q8, q9, q10
cmp r3, r2
bne .L2
vadd.i32 d16, d16, d17
vpadd.i32 d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx lr
instead of
f:
vmov.i32 q8, #0 @ v4si
add r3, r2, #480
.L2:
vld1.8 {q9}, [r2]!
vld1.8 {q11}, [r1]!
cmp r3, r2
vmull.s8 q10, d18, d22
vmull.s8 q9, d19, d23
vaddw.s16 q8, q8, d20
vaddw.s16 q8, q8, d21
vaddw.s16 q8, q8, d18
vaddw.s16 q8, q8, d19
bne .L2
vadd.i32 d16, d16, d17
vpadd.i32 d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx lr
For NEON. I couldn't figure out if the MVE instruction vmlaldav.s16 could be
used to emulate this. Because it would require additional widening to work I
left MVE out of this patch set but perhaps someone should take a look.
gcc/ChangeLog:
* config/arm/neon.md (usdot_prod<vsi2qi>): New.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vusdot-autovec.c: New test.
Tamar Christina [Wed, 14 Jul 2021 14:19:32 +0000 (15:19 +0100)]
AArch64: Add support for sign differing dot-product usdot for NEON and SVE.
This adds optabs implementing usdot_prod.
The following testcase:
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 char *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int av = a[i];
int bv = b[i];
SIGNEDNESS_2 short mult = av * bv;
res += mult;
}
return res;
}
Generates for NEON
f:
movi v0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q1, [x2, x3]
ldr q2, [x1, x3]
usdot v0.4s, v1.16b, v2.16b
add x3, x3, 16
cmp x3, 480
bne .L2
addv s0, v0.4s
fmov w1, s0
add w0, w0, w1
ret
and for SVE
f:
mov x3, 0
cntb x5
mov w4, 480
mov z1.b, #0
whilelo p0.b, wzr, w4
mov z3.b, #0
ptrue p1.b, all
.p2align 3,,7
.L2:
ld1b z2.b, p0/z, [x1, x3]
ld1b z0.b, p0/z, [x2, x3]
add x3, x3, x5
sel z0.b, p0, z0.b, z3.b
whilelo p0.b, w3, w4
usdot z1.s, z0.b, z2.b
b.any .L2
uaddv d0, p1, z1.s
fmov x1, d0
add w0, w0, w1
ret
instead of
f:
movi v0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q2, [x1, x3]
ldr q1, [x2, x3]
add x3, x3, 16
sxtl v4.8h, v2.8b
sxtl2 v3.8h, v2.16b
uxtl v2.8h, v1.8b
uxtl2 v1.8h, v1.16b
mul v2.8h, v2.8h, v4.8h
mul v1.8h, v1.8h, v3.8h
saddw v0.4s, v0.4s, v2.4h
saddw2 v0.4s, v0.4s, v2.8h
saddw v0.4s, v0.4s, v1.4h
saddw2 v0.4s, v0.4s, v1.8h
cmp x3, 480
bne .L2
addv s0, v0.4s
fmov w1, s0
add w0, w0, w1
ret
and
f:
mov x3, 0
cnth x5
mov w4, 480
mov z1.b, #0
whilelo p0.h, wzr, w4
ptrue p2.b, all
.p2align 3,,7
.L2:
ld1sb z2.h, p0/z, [x1, x3]
punpklo p1.h, p0.b
ld1b z0.h, p0/z, [x2, x3]
add x3, x3, x5
mul z0.h, p2/m, z0.h, z2.h
sunpklo z2.s, z0.h
sunpkhi z0.s, z0.h
add z1.s, p1/m, z1.s, z2.s
punpkhi p1.h, p0.b
whilelo p0.h, w3, w4
add z1.s, p1/m, z1.s, z0.s
b.any .L2
uaddv d0, p2, z1.s
fmov x1, d0
add w0, w0, w1
ret
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
(usdot_prod<vsi2qi>): ... This.
* config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
(usdot_prod): ...This.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
Rename to...
(@<sur>dot_prod<vsi2qi>): ...This.
* config/aarch64/aarch64-sve-builtins-base.cc
(svusdot_impl::expand): Use it.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
* gcc.target/aarch64/sve/vusdot-autovec.c: New test.
Tamar Christina [Wed, 14 Jul 2021 13:54:26 +0000 (14:54 +0100)]
Vect: Add support for dot-product where the sign for the multiplicand changes.
This patch adds support for a dot product where the signs of the multiplication
arguments differ, i.e. one is signed and one is unsigned but the precisions are
the same.
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 char *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int av = a[i];
int bv = b[i];
SIGNEDNESS_2 short mult = av * bv;
res += mult;
}
return res;
}
The operations are performed as if the operands were extended to a 32-bit value.
As such this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped the same
optab is used but the operands are flipped in the optab expansion.
To support this the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype which is now made a bitfield.
The subtype can now additionally control which optab an EXPR can expand to.
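As a reading aid, the arithmetic of the mixed-sign dot product can be modelled in scalar code. This is a minimal sketch under the assumption stated above that both 8-bit operands are extended to 32 bits before multiplying; usdot_ref is a hypothetical name and is not part of GCC.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model for illustration only: one operand is unsigned, the other
// signed, and each product is formed after extending both to 32 bits.
constexpr std::int32_t usdot_ref(std::int32_t acc, const std::uint8_t* u,
                                 const std::int8_t* s, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    acc += static_cast<std::int32_t>(u[i]) * static_cast<std::int32_t>(s[i]);
  return acc;
}
```

For example, with u = {255, 2}, s = {-1, 3} and an incoming accumulator of 10, the model yields 10 + (255 * -1) + (2 * 3) = -239. Treating 255 as a signed byte would give a different answer, which is why the signedness of each operand has to be tracked.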
gcc/ChangeLog:
* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_widened_op_tree): Optionally ignore
mismatch types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
H.J. Lu [Fri, 9 Jul 2021 16:16:01 +0000 (09:16 -0700)]
x86: Don't enable UINTR in 32-bit mode
UINTR is available only in 64-bit mode. Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:
1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
indicate 32-bit and 64-bit codegen.
2. Change ix86_option_override_internal to enable UINTR only in 64-bit
mode for -march=CPU when PTA_CPU includes PTA_UINTR.
gcc/
PR target/101395
* config/i386/driver-i386.c (host_detect_local_cpu): Check
"arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
Enable UINTR only for 64-bit codegen.
* config/i386/i386-options.c
(ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
in 64-bit mode.
* config/i386/i386.h (ARCH_ARG): New.
(CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
"[arch|tune] 64" for 64-bit codegen.
gcc/testsuite/
PR target/101395
* gcc.target/i386/pr101395-1.c: New test.
* gcc.target/i386/pr101395-2.c: Likewise.
* gcc.target/i386/pr101395-3.c: Likewise.
Jonathan Wakely [Wed, 14 Jul 2021 10:03:17 +0000 (11:03 +0100)]
libstdc++: Add noexcept-specifier to basic_string_view(It, End)
This adds a conditional noexcept to the C++20 constructor. The
std::to_address call cannot throw, so only taking the difference of the
two iterators can throw.
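The idea of a conditional noexcept that depends only on the iterator difference can be sketched like this. mini_view and throwing_end are hypothetical types written for this note, not libstdc++ code, and &*first stands in for std::to_address so that the sketch does not require C++20.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical minimal view; not std::basic_string_view.
struct mini_view {
  const char* data;
  std::size_t size;

  // noexcept exactly when taking the difference of the two arguments is.
  template <typename It, typename End>
  constexpr mini_view(It first, End last) noexcept(noexcept(last - first))
    : data(&*first), size(static_cast<std::size_t>(last - first)) {}
};

// Hypothetical sentinel whose difference operation is not declared noexcept.
struct throwing_end {
  const char* p;
};

inline std::ptrdiff_t operator-(throwing_end e, const char* first) {
  return e.p - first;
}
```

Under these assumptions, std::is_nothrow_constructible reports true for a pair of const char* arguments and false when the end is a throwing_end.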
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/string_view (basic_string_view(It, End)): Add
noexcept-specifier.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Check noexcept-specifier. Also check construction without CTAD.
Richard Biener [Wed, 14 Jul 2021 09:06:58 +0000 (11:06 +0200)]
tree-optimization/101445 - fix negative stride SLP vect with gaps
The following fixes the IV adjustment for the gap in a negative
stride SLP vectorization. The adjustment was in the wrong direction,
now fixed as in the patch.
2021-07-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/101445
* tree-vect-stmts.c (vectorizable_load): Do the gap adjustment
of the IV in the correct direction for negative stride
accesses.
* gcc.dg/vect/pr101445.c: New testcase.
Jakub Jelinek [Wed, 14 Jul 2021 08:22:50 +0000 (10:22 +0200)]
godump: Fix -fdump-go-spec= reproducibility issue [PR101407]
pot_dummy_types is a hash_set from whose traversal the code prints some type
lines. hash_set normally uses default_hash_traits which for pointer types
(the hash set hashes const char *) uses pointer_hash which hashes the
addresses of the pointers except for the least significant 3 bits.
With address space randomization, that results in non-determinism in the
-fdump-go-spec= generated file: each invocation can have a different order of
the lines emitted from pot_dummy_types traversal.
This patch fixes it by hashing the string contents instead to make the
hashes reproducible.
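The difference between hashing a pointer and hashing what it points to can be sketched with standard containers. This is an illustration under assumptions: content_hash, content_equal and name_set are hypothetical names, FNV-1a is chosen here only as a simple fixed hash, and godump.c itself uses GCC's own hash traits rather than std::unordered_set.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hash the characters (FNV-1a), so equal strings hash equally no matter
// where they were allocated.
struct content_hash {
  constexpr std::size_t operator()(const char* s) const noexcept {
    std::uint64_t h = 14695981039346656037ull;
    for (; *s != '\0'; ++s) {
      h ^= static_cast<unsigned char>(*s);
      h *= 1099511628211ull;
    }
    return static_cast<std::size_t>(h);
  }
};

// Compare by content as well, to stay consistent with the hash.
struct content_equal {
  constexpr bool operator()(const char* a, const char* b) const noexcept {
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return *a == *b;
  }
};

// A set of C strings keyed by content rather than by address.
using name_set = std::unordered_set<const char*, content_hash, content_equal>;
```

Because the hash values no longer depend on addresses, they are the same from run to run even with address space randomization.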
2021-07-14 Jakub Jelinek <jakub@redhat.com>
PR go/101407
* godump.c (godump_str_hash): New type.
(godump_container::pot_dummy_types): Use string_hash instead of
ptr_hash in the hash_set.
Richard Biener [Tue, 13 Jul 2021 11:59:15 +0000 (13:59 +0200)]
Support reduction def re-use for epilogue with different vector size
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue. That's only
x86 as far as I am aware.
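A scalar model may help picture the re-use. The sketch below assumes a hypothetical 8-lane main loop and a 4-lane epilogue; the function name and lane counts are illustrative only and say nothing about how the vectorizer represents this internally.

```cpp
#include <cstddef>

// Illustration only: arrays stand in for vector registers.
constexpr int sum_with_narrower_epilogue(const int* x, std::size_t n) {
  int wide[8] = {};                          // main loop accumulator
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8)
    for (std::size_t lane = 0; lane < 8; ++lane)
      wide[lane] += x[i + lane];

  // Reduce the wide accumulator by one stage (high half added to low half)
  // to obtain the starting value of the narrower epilogue accumulator.
  int narrow[4] = {};
  for (std::size_t lane = 0; lane < 4; ++lane)
    narrow[lane] = wide[lane] + wide[lane + 4];

  for (; i + 4 <= n; i += 4)                 // vectorized epilogue
    for (std::size_t lane = 0; lane < 4; ++lane)
      narrow[lane] += x[i + lane];

  int result = 0;                            // single final reduction
  for (std::size_t lane = 0; lane < 4; ++lane)
    result += narrow[lane];
  for (; i < n; ++i)                         // scalar remainder
    result += x[i];
  return result;
}
```

The point of the sketch is that only one vector to scalar reduction remains, performed on the narrower accumulator after the epilogue.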
2021-07-13 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
vector types where the old vector type has a multiple of
the new vector type elements.
(vect_create_partial_epilog): New function, split out from...
(vect_create_epilog_for_reduction): ... here.
(vect_transform_cycle_phi): Reduce the re-used accumulator
to the new vector type.
* gcc.target/i386/vect-reduc-1.c: New testcase.
Alexandre Oliva [Wed, 14 Jul 2021 01:25:56 +0000 (22:25 -0300)]
fix typo in attr_fnspec::verify
Odd-numbered indices describing argument access sizes in the fnspec
string can only hold 't' or a digit, as tested in the beginning of the
case. When checking that the size-supplying argument does not have
additional information associated with it, the test that excludes the
't' possibility looks for it at the even position in the fnspec
string. Oops.
This might yield false positives and negatives if a function has a
fnspec in which an argument uses a 't' access-size, and ('t' - '1')
happens to be the index of an argument described in an fnspec string.
Assuming ASCII encoding, it would take a function with at least 68
arguments described in fnspec. Still, probably worth fixing.
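The figure of 68 arguments follows from character arithmetic, which can be checked at compile time. The check assumes an ASCII execution character set, as the text above does.

```cpp
// 't' is 116 and '1' is 49 in ASCII, so the stray index is 67.
static_assert('t' - '1' == 67, "assumes ASCII");
// A zero-based argument index of 67 implies at least 68 arguments.
static_assert(('t' - '1') + 1 == 68, "assumes ASCII");
```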
for gcc/ChangeLog
* tree-ssa-alias.c (attr_fnspec::verify): Fix index in
non-'t'-sized arg check.
Alexandre Oliva [Wed, 14 Jul 2021 01:25:54 +0000 (22:25 -0300)]
adjust landing pads when changing main label
If an artificial label created for a landing pad ends up being
dropped in favor of a user-supplied label, the user-supplied label
inherits the landing pad index, but the post_landing_pad field is not
adjusted to point to the new label.
This patch fixes the problem, and adds verification that we don't
remove a label that's still used as a landing pad.
The circumstance in which this problem can be hit was unusual: removal
of a block with an unreachable label moves the label to some other
unrelated block, in case its address is taken. In the case at hand
(pr42739.C, complicated by wrappers and cleanups), the chosen block
happened to be an EH landing pad. (A followup patch will change that.)
for gcc/ChangeLog
* tree-cfg.c (cleanup_dead_labels_eh): Update
post_landing_pad label upon change of landing pad block's
primary label.
(cleanup_dead_labels): Check that a removed label is not that
of a landing pad.
GCC Administrator [Wed, 14 Jul 2021 00:16:44 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wright [Wed, 2 Jun 2021 15:55:00 +0000 (16:55 +0100)]
gcc: Add vec_select -> subreg RTL simplification
Add a new RTL simplification for the case of a VEC_SELECT selecting
the low part of a vector. The simplification returns a SUBREG.
The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of 'write-to-
high-half' narrowing instructions.
Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
intrinsics tests (as the move instructions get optimized away for
lane 0.)
This patch also adds new code generation tests to
narrow_high_combine.c to verify the benefit of this RTL
simplification.
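The condition behind the simplification can be modelled on plain arrays. This is a sketch under assumptions: is_index_series and selects_low_part are hypothetical stand-ins operating on an array of lane indices instead of an rtvec, and the low part is assumed to start at lane 0, as with little-endian lane numbering.

```cpp
#include <cstddef>

// True when sel[i] == start + i for every i, i.e. a linear series of indices.
constexpr bool is_index_series(const int* sel, std::size_t n, int start) {
  for (std::size_t i = 0; i < n; ++i)
    if (sel[i] != start + static_cast<int>(i))
      return false;
  return true;
}

// Assumption: the low part of the vector begins at lane 0.
constexpr bool selects_low_part(const int* sel, std::size_t n) {
  return is_index_series(sel, n, 0);
}
```

In this model a selection of lanes {0, 1} from a four-lane vector is the low part, while {2, 3} is a series but not the low part.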
gcc/ChangeLog:
2021-06-08 Jonathan Wright <jonathan.wright@arm.com>
* combine.c (combine_simplify_rtx): Add vec_select -> subreg
simplification.
* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
Add Neon to general purpose register case for zero-extend
pattern.
* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
case to prevent some cases opting to go through memory.
* cse.c (fold_rtx): Add vec_select -> subreg simplification.
* rtl.c (rtvec_series_p): Define predicate to determine
whether a vector contains a linear series of integers.
* rtl.h (rtvec_series_p): Define.
* rtlanal.c (vec_series_lowpart_p): Define predicate to
determine if a vector selection is equivalent to the low part
of the vector.
* rtlanal.h (vec_series_lowpart_p): Define.
* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
Add vec_select -> subreg simplification.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
for RTL pattern match.
* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
scan-assembler regex to look for a scalar register instead of
lane 0 of a vector.
* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
cases to look for 'b' and 'h' registers instead of 'w'.
* gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler
regex to reflect lane 0 vector extractions being simplified
to scalar register moves.
* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
lane 1 as the moves for lane 0 now get optimized away.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
Paul A. Clarke [Tue, 29 Jun 2021 14:23:39 +0000 (09:23 -0500)]
rs6000: Add tests for SSE4.1 "test" intrinsics
Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.
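For readers unfamiliar with these intrinsics, their results can be modelled in portable scalar code. This is a sketch of the x86 PTEST flag semantics, not the rs6000 implementation; u128 and the *_model names are hypothetical and exist only for this note.

```cpp
#include <cstdint>

// Hypothetical 128-bit value as two 64-bit halves; not the real __m128i.
struct u128 { std::uint64_t lo, hi; };

// ZF as PTEST defines it: set when (a AND b) is all zeros.
constexpr bool zf(u128 a, u128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// CF as PTEST defines it: set when ((NOT a) AND b) is all zeros.
constexpr bool cf(u128 a, u128 b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

constexpr int testz_model(u128 a, u128 b)   { return zf(a, b) ? 1 : 0; }
constexpr int testc_model(u128 a, u128 b)   { return cf(a, b) ? 1 : 0; }
constexpr int testnzc_model(u128 a, u128 b) { return (!zf(a, b) && !cf(a, b)) ? 1 : 0; }
```

In the x86 headers, _mm_test_all_zeros and _mm_test_mix_ones_zeros map onto the testz and testnzc forms, and _mm_test_all_ones is the testc form with an all-ones second operand.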
2021-07-13 Paul A. Clarke <pc@us.ibm.com>
gcc/testsuite
* gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
gcc/testsuite/gcc.target/i386.
Paul A. Clarke [Tue, 29 Jun 2021 14:18:55 +0000 (09:18 -0500)]
rs6000: Add support for SSE4.1 "test" intrinsics
2021-07-13 Paul A. Clarke <pc@us.ibm.com>
gcc
* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros): New.
Jonathan Wakely [Tue, 13 Jul 2021 11:21:27 +0000 (12:21 +0100)]
libstdc++: Simplify basic_string_view::ends_with [PR 101361]
The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.
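The shape of the simplification can be sketched as a free function. ends_with_sketch is a hypothetical name and this is not the libstdc++ source; it assumes C++17 for std::string_view.

```cpp
#include <string_view>

// Compare exactly x.size() characters at the tail of s. The length is known
// up front, so neither npos nor a range-checked substr is needed.
constexpr bool ends_with_sketch(std::string_view s, std::string_view x) noexcept {
  const auto len = s.size();
  const auto xlen = x.size();
  return len >= xlen
      && std::string_view::traits_type::compare(s.data() + (len - xlen),
                                                x.data(), xlen) == 0;
}
```

For instance, ends_with_sketch("hello.c", ".c") is true, while ends_with_sketch("c", ".c") is false because the suffix is longer than the string.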
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR c++/101361
* include/std/string_view (ends_with): Use traits_type::compare
directly.
Andrew MacLeod [Tue, 13 Jul 2021 13:41:30 +0000 (09:41 -0400)]
Adjust testcase to test the call is removed.
Ranger now handles the test.
gcc/testsuite
PR tree-optimization/93781
* gcc.dg/tree-ssa/pr93781-1.c: Check that call is removed.
Roger Sayle [Tue, 13 Jul 2021 13:01:41 +0000 (14:01 +0100)]
Make gimple_could_trap_p const-safe.
Allow gimple_could_trap_p (which previously took a non-const gimple)
to be called from functions that take a const gimple (such as
gimple_has_side_effects), and update its prototypes. Pre-approved
as obvious.
2021-07-13 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* gimple.c (gimple_could_trap_p_1): Make S argument a
"const gimple*". Preserve constness in call to
gimple_asm_volatile_p.
(gimple_could_trap_p): Make S argument a "const gimple*".
* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
Update function prototypes.
Jonathan Wakely [Tue, 13 Jul 2021 11:09:37 +0000 (12:09 +0100)]
libstdc++: Remove duplicate #include in <string_view>
When I added the new C++23 constructor I added a conditional include of
<bits/ranges_base.h>, which was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/string_view: Only include <bits/ranges_base.h>
once, and only for C++20 and later.
Richard Sandiford [Tue, 13 Jul 2021 09:17:43 +0000 (10:17 +0100)]
vect: Reuse reduction accumulators between loops
This patch adds support for reusing a main loop's reduction accumulator
in an epilogue loop. This in turn lets the loops share a single piece
of vector->scalar reduction code.
The patch has the following restrictions:
(1) The epilogue reduction can only operate on a single vector
(e.g. ncopies must be 1 for non-SLP reductions, and the group size
must be <= the element count for SLP reductions).
(2) Both loops must use the same vector mode for their accumulators.
This means that the patch is restricted to targets that support
--param vect-partial-vector-usage=1.
(3) The reduction must be a standard “tree code” reduction.
However, these restrictions could be lifted in future. For example,
if the main loop operates on 128-bit vectors and the epilogue loop
operates on 64-bit vectors, we could in future reduce the 128-bit
vector by one stage and use the 64-bit result as the starting point
for the epilogue result.
The patch tries to handle chained SLP reductions, unchained SLP
reductions and non-SLP reductions. It also handles cases in which
the epilogue loop is entered directly (rather than via the main loop)
and cases in which the epilogue loop can be skipped.
vect_get_main_loop_result is a bit more general than the current
patch needs.
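A scalar model of the same-width case may make the sharing concrete. The sketch assumes a hypothetical four-lane accumulator and an epilogue that processes one partial vector; the names are illustrative and are not taken from the vectorizer.

```cpp
#include <cstddef>

constexpr std::size_t kLanes = 4;   // hypothetical vector width

// Illustration only: an array stands in for the vector accumulator.
constexpr int sum_with_shared_accumulator(const int* x, std::size_t n) {
  int acc[kLanes] = {};
  std::size_t i = 0;
  // Main loop: whole vectors only.
  for (; i + kLanes <= n; i += kLanes)
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      acc[lane] += x[i + lane];
  // Epilogue: one partial vector that continues in the same accumulator.
  for (std::size_t lane = 0; i + lane < n; ++lane)
    acc[lane] += x[i + lane];
  // The single vector->scalar reduction shared by both loops.
  int result = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    result += acc[lane];
  return result;
}
```

The model also covers the case where the main loop runs zero times (n smaller than the lane count) and the case where the epilogue has nothing left to do (n a multiple of the lane count).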
gcc/
* tree-vectorizer.h (vect_reusable_accumulator): New structure.
(_loop_vec_info::main_loop_edge): New field.
(_loop_vec_info::skip_main_loop_edge): Likewise.
(_loop_vec_info::skip_this_loop_edge): Likewise.
(_loop_vec_info::reusable_accumulators): Likewise.
(_stmt_vec_info::reduc_scalar_results): Likewise.
(_stmt_vec_info::reused_accumulator): Likewise.
(vect_get_main_loop_result): Declare.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
reduc_scalar_inputs.
(vec_info::free_stmt_vec_info): Free reduc_scalar_inputs.
* tree-vect-loop-manip.c (vect_get_main_loop_result): New function.
(vect_do_peeling): Fill an epilogue loop's main_loop_edge,
skip_main_loop_edge and skip_this_loop_edge fields.
* tree-vect-loop.c (INCLUDE_ALGORITHM): Define.
(vect_emit_reduction_init_stmts): New function.
(get_initial_def_for_reduction): Use it.
(get_initial_defs_for_reduction): Likewise. Change the vinfo
parameter to a loop_vec_info.
(vect_create_epilog_for_reduction): Store the scalar results
in the reduc_info. If an epilogue loop is reusing an accumulator
from the main loop, and if the epilogue loop can also be skipped,
try to place the reduction code in the join block. Record
accumulators that could potentially be reused by epilogue loops.
(vect_transform_cycle_phi): When vectorizing epilogue loops,
try to reuse accumulators from the main loop. Record the initial
value in reduc_info for non-SLP reductions too.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_9.c: New test.
* gcc.target/aarch64/sve/reduc_9_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_10.c: Likewise.
* gcc.target/aarch64/sve/reduc_10_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_11.c: Likewise.
* gcc.target/aarch64/sve/reduc_11_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_12.c: Likewise.
* gcc.target/aarch64/sve/reduc_12_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_13.c: Likewise.
* gcc.target/aarch64/sve/reduc_13_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_14.c: Likewise.
* gcc.target/aarch64/sve/reduc_14_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_15.c: Likewise.
* gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
Richard Sandiford [Tue, 13 Jul 2021 09:17:42 +0000 (10:17 +0100)]
vect: Simplify get_initial_def_for_reduction
After previous patches, we can now easily provide the neutral op
as an argument to get_initial_def_for_reduction. This in turn
allows the adjustment calculation to be moved outside of
get_initial_def_for_reduction, which is the main motivation
of the patch.
gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Remove
adjustment handling. Take the neutral value as an argument,
in place of the code argument.
(vect_transform_cycle_phi): Update accordingly. Handle the
initial values of cond reductions separately from code reductions.
Choose the adjustment here rather than in
get_initial_def_for_reduction. Sink the splat of vec_initial_def.
Richard Sandiford [Tue, 13 Jul 2021 09:17:41 +0000 (10:17 +0100)]
vect: Generalise neutral_op_for_slp_reduction
This patch generalises the interface to neutral_op_for_slp_reduction
so that it can be used for non-SLP reductions too. This isn't much
of a win on its own, but it helps later patches.
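As background, a neutral value for a reduction is one that leaves any operand unchanged. The sketch below shows the general notion for a few operations on 32-bit unsigned values; reduc_op and neutral_value are hypothetical names and the table is not claimed to match the GCC function case for case.

```cpp
#include <cstdint>

// Hypothetical operation tags for illustration.
enum class reduc_op { plus, mult, bit_and, bit_ior, bit_xor };

constexpr std::uint32_t neutral_value(reduc_op op) {
  switch (op) {
    case reduc_op::mult:    return 1u;    // x * 1 == x
    case reduc_op::bit_and: return ~0u;   // x & all-ones == x
    default:                return 0u;    // x + 0, x | 0, x ^ 0 == x
  }
}
```

Filling the unused lanes of an initial vector with such a value means they do not disturb the final result.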
gcc/
* tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with...
(neutral_op_for_reduction): ...this, providing a more general
interface.
(vect_create_epilog_for_reduction): Update accordingly.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
Richard Sandiford [Tue, 13 Jul 2021 09:17:40 +0000 (10:17 +0100)]
vect: Pass reduc_info to get_initial_def_for_reduction
Similarly to the previous patch, this one passes the reduc_info
to get_initial_def_for_reduction, rather than a stmt_vec_info that
lacks the metadata. This again becomes useful later.
gcc/
* tree-vect-loop.c (get_initial_def_for_reduction): Take the
reduc_info instead of the original stmt_vec_info.
(vect_transform_cycle_phi): Update accordingly.
Richard Sandiford [Tue, 13 Jul 2021 09:17:39 +0000 (10:17 +0100)]
vect: Pass reduc_info to get_initial_defs_for_reduction
This patch passes the reduc_info to get_initial_defs_for_reduction,
so that the function can get general information from there rather
than from the first SLP statement. This isn't a win on its own,
but it becomes important with later patches.
gcc/
* tree-vect-loop.c (get_initial_defs_for_reduction): Take the
reduc_info as an additional parameter.
(vect_transform_cycle_phi): Update accordingly.
Richard Sandiford [Tue, 13 Jul 2021 09:17:39 +0000 (10:17 +0100)]
vect: Add a vect_phi_initial_value helper function
This patch adds a helper function called vect_phi_initial_value
for returning the incoming value of a given loop phi. The main
reason for adding it is to ensure that the right preheader edge
is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE
itself doesn't assert that the given edge is for the right block,
although I guess that would be good to add separately.)
gcc/
* tree-vectorizer.h: Include tree-ssa-operands.h.
(vect_phi_initial_value): New function.
* tree-vect-loop.c (neutral_op_for_slp_reduction): Use it.
(get_initial_defs_for_reduction, info_for_reduction): Likewise.
(vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
(vect_transform_cycle_phi, vectorizable_induction): Likewise.
Richard Sandiford [Tue, 13 Jul 2021 09:17:38 +0000 (10:17 +0100)]
vect: Ensure reduc_inputs always have vectype
Vector reduction accumulators can differ in signedness from the
final scalar result. The conversions to handle that case were
distributed through vect_create_epilog_for_reduction; this patch
does the conversion up-front instead.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
the phi results to vectype after creating them. Remove later
conversion code that thus becomes redundant.
Richard Sandiford [Tue, 13 Jul 2021 09:17:37 +0000 (10:17 +0100)]
vect: Remove new_phis from vect_create_epilog_for_reduction
vect_create_epilog_for_reduction had a variable called new_phis.
It collected the statements that produce the exit block definitions
of the vector reduction accumulators. Although those statements
are indeed phis initially, they are often replaced with normal
statements later, leading to puzzling code like:
FOR_EACH_VEC_ELT (new_phis, i, new_phi)
{
int bit_offset;
if (gimple_code (new_phi) == GIMPLE_PHI)
vec_temp = PHI_RESULT (new_phi);
else
vec_temp = gimple_assign_lhs (new_phi);
Also, although the array collects statements, in practice all users want
the lhs instead.
This patch therefore replaces new_phis with a vector of gimple values
called “reduc_inputs”.
Also, reduction chains and ncopies>1 were handled with identical code
(and there was a comment saying so). The patch unites them into
a single “if”.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace
the new_phis vector with a reduc_inputs vector. Combine handling
of reduction chains and ncopies > 1.
Richard Sandiford [Tue, 13 Jul 2021 09:17:36 +0000 (10:17 +0100)]
vect: Create array_slice of live-out stmts
This patch constructs an array_slice of the scalar statements that
produce live-out reduction results in the original unvectorised loop.
There are three cases:
- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out
This is a slight simplification on its own, mostly because it means
“group_size” has a consistent meaning throughout the function.
The main justification though is that it helps with later patches.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate
scalar_results to group_size elements after reducing down from
N*group_size elements. Construct an array_slice of the live-out
stmts and assert that there is one stmt per scalar result.
Richard Sandiford [Tue, 13 Jul 2021 09:17:35 +0000 (10:17 +0100)]
vect: Simplify epilogue reduction code
vect_create_epilog_for_reduction only handles two cases: single-loop
reductions and double reductions. “nested cycles” (i.e. reductions
in the inner loop when vectorising an outer loop) are handled elsewhere
and don't need a vector->scalar reduction.
The function had variables called nested_in_vect_loop and double_reduc
and asserted that nested_in_vect_loop implied double_reduc, but it
still had code to handle nested_in_vect_loop && !double_reduc.
This patch removes that and uses double_reduc everywhere.
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Remove
nested_in_vect_loop and use double_reduc everywhere. Remove dead
assignment to "loop".
Richard Sandiford [Tue, 13 Jul 2021 09:17:34 +0000 (10:17 +0100)]
ifcvt: Improve tests for predicated operations
-msve-vector-bits=128 causes the AArch64 port to list 128-bit Advanced
SIMD as the first-choice mode for vectorisation, with SVE being used for
things that Advanced SIMD can't handle as easily. However, ifcvt would
not then try to use SVE's predicated FP arithmetic, leading to tests
like TSVC ControlFlow-flt failing to vectorise.
The mask load/store code did try other vector modes, but could also be
improved to make sure that SVEness sticks when computing derived modes.
(Unlike mode_for_vector, related_vector_mode always returns a vector
mode, so there's no need to check VECTOR_MODE_P as well.)
gcc/
* internal-fn.c (vectorized_internal_fn_supported_p): Handle
vector types first. For scalar types, consider both the preferred
vector mode and the alternative vector modes.
* optabs-query.c (can_vec_mask_load_store_p): Use the same
structure as above, in particular using related_vector_mode
for modes provided by autovectorize_vector_modes.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_arith_6.c: New test.
Jakub Jelinek [Tue, 13 Jul 2021 09:04:22 +0000 (11:04 +0200)]
passes: Fix up subobject __bos [PR101419]
The following testcase is miscompiled, because VN during cunrolli changes
__bos argument from address of a larger field to address of a smaller field
and so __builtin_object_size (, 1) then folds into smaller value than the
actually available size.
copy_reference_ops_from_ref has a hack for this, but it was using
cfun->after_inlining as a check whether the hack can be ignored, and
cunrolli is after_inlining.
This patch uses a property to make it exact (set at the end of objsz
pass that doesn't do insert_min_max_p) and additionally based on discussions
in the PR moves the objsz pass earlier after IPA.
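As a rough illustration of why the choice of field matters, the sketch below queries __builtin_object_size in mode 1 (closest enclosing subobject) for two fields of different sizes. The struct, the variable and the helper names are hypothetical and are not taken from the testcase in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: a small field followed by a larger one.
struct packet_sketch {
  char small_field[4];
  char large_field[16];
};

packet_sketch g_packet;

// Mode 1 asks for the bytes remaining in the closest enclosing subobject,
// so the answer depends on which field the address names.
inline std::size_t bos_of_small_field() {
  return __builtin_object_size(g_packet.small_field, 1);
}

inline std::size_t bos_of_large_field() {
  return __builtin_object_size(g_packet.large_field, 1);
}
```

If an optimization rewrites the address of large_field into an equivalent address expressed through small_field before the builtin is folded, the folded size can shrink, which is the kind of miscompilation the message describes.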
2021-07-13 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>
PR tree-optimization/101419
* tree-pass.h (PROP_objsz): Define.
(make_pass_early_object_sizes): Declare.
* passes.def (pass_all_early_optimizations): Rename pass_object_sizes
there to pass_early_object_sizes, drop parameter.
(pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
drop parameter, move pass_post_ipa_warn right after that.
* tree-object-size.c (pass_object_sizes::execute): Rename to...
(object_sizes_execute): ... this. Add insert_min_max_p argument.
(pass_data_object_sizes): Move after object_sizes_execute.
(pass_object_sizes): Likewise. In execute method call
object_sizes_execute, drop set_pass_param method and insert_min_max_p
non-static data member and its initializer in the ctor.
(pass_data_early_object_sizes, pass_early_object_sizes,
make_pass_early_object_sizes): New.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
(cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.
* gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
instead of -fdump-tree-objsz1-details in dg-options and adjust names
of dump file in scan-tree-dump.
* gcc.dg/pr101419.c: New test.
Jakub Jelinek [Tue, 13 Jul 2021 07:50:49 +0000 (09:50 +0200)]
libgomp: Don't include limits.h inside of hidden visibility block
sem.h is included between # pragma GCC visibility push(hidden)
and # pragma GCC visibility pop, and it includes limits.h there. Since
recent glibcs introduced a sysconf declaration in there, that causes
trouble. libgomp assumes it is compiled by gcc, so we don't really need
to include limits.h there and can use -__INT_MAX__ - 1 instead (which
clang and icc have also supported for years).
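A minimal sketch of the equivalence relied on here, assuming a compiler that predefines __INT_MAX__ (gcc, clang and icc do). SEM_WAIT_SKETCH and sem_wait_sketch are illustrative names, not the libgomp definitions; limits are included only to compare against INT_MIN.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the macro described above: the most negative
// int, spelled without needing limits.h at the point of use.
#define SEM_WAIT_SKETCH (-__INT_MAX__ - 1)

// Checked at compile time against the limits.h spelling.
static_assert(SEM_WAIT_SKETCH == INT_MIN,
              "-__INT_MAX__ - 1 should equal INT_MIN");

constexpr int sem_wait_sketch() { return SEM_WAIT_SKETCH; }
```

The form -__INT_MAX__ - 1 avoids writing a literal whose magnitude does not fit in int, which is also how INT_MIN is commonly defined.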
2021-07-13 Jakub Jelinek <jakub@redhat.com>
Florian Weimer <fweimer@redhat.com>
* config/linux/sem.h: Don't include limits.h.
(SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
* config/linux/affinity.c: Include limits.h.
Kito Cheng [Fri, 2 Jul 2021 02:19:30 +0000 (10:19 +0800)]
docs: Add 'S' to Machine Constraints for RISC-V
It was undocumented before, but it might be used in the Linux kernel to
resolve a code model issue, so the LLVM community suggested we document it,
making it a supported, documented, non-internal machine constraint.
gcc/ChangeLog:
PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraints
for RISC-V.
Richard Biener [Tue, 13 Jul 2021 06:04:34 +0000 (08:04 +0200)]
Revert "Display the number of components BB vectorized"
This reverts commit
c03cae4e066066278c8435c409829a9bf851e49f.
Michael Meissner [Tue, 13 Jul 2021 04:36:43 +0000 (00:36 -0400)]
Deal with prefixed loads/stores in tests, PR testsuite/100166
This patch updates the various tests in the testsuite to treat plxv
and pstxv as being vector loads/stores. This shows up if you run the
testsuite with a compiler configured with the option: --with-cpu=power10.
2021-07-13 Michael Meissner <meissner@linux.ibm.com>
gcc/testsuite/
PR testsuite/100166
* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c: Update
insn counts to account for power10 prefixed loads and stores.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-char.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-double.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-float.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-int.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-load-vec_xl-short.c: Likewise.
* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Likewise.
* gcc.target/powerpc/fold-vec-splat-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c:
Likewise.
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-char.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-double.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-float.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-int.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-store-vec_xst-short.c: Likewise.
* gcc.target/powerpc/lvsl-lvsr.c: Likewise.
* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Likewise.
Michael Meissner [Tue, 13 Jul 2021 03:51:24 +0000 (23:51 -0400)]
Fix vec-splati-runnable.c test.
I noticed that the vec-splati-runnable.c did not have an abort after one
of the tests. If the test was run with optimization, the optimizer could
delete some of the tests and throw off the count. However, due to the
fact that the value being loaded in that test is undefined, I did not
check what value was loaded, but I just stored it into a volatile global
variable.
2021-07-12 Michael Meissner <meissner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
optimization. Do not check what XXSPLTIDP generates if the value
is undefined.
Michael Meissner [Tue, 13 Jul 2021 03:50:38 +0000 (23:50 -0400)]
Change rs6000_const_f32_to_i32 return type.
The function rs6000_const_f32_to_i32 called REAL_VALUE_TO_TARGET_SINGLE
with a long long type and returns it. This patch changes the type to long
which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.
2021-07-12 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/altivec.md (xxspltiw_v4sf): Change local variable
value to long.
* config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
return type to long.
* config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
type to long.
GCC Administrator [Tue, 13 Jul 2021 00:16:30 +0000 (00:16 +0000)]
Daily bump.
Andrew MacLeod [Mon, 12 Jul 2021 18:38:42 +0000 (14:38 -0400)]
Add relation processing to ubsan builtins.
Ubsan builtins call the plus/minus/multiply fold routines, but did not
use any relation information available between the 2 operands.
Query and pass any relations.
This resolves gcc.dg/pr97505.c when operating in ranger-only mode.
* gimple-range-fold.cc (fold_using_range::range_of_builtin_ubsan_call):
Query relation between the 2 operands and use it.
Sergei Trofimovich [Mon, 12 Jul 2021 22:45:38 +0000 (23:45 +0100)]
docs: fix s/ei_safe_safe/ei_safe_edge/ typo
gcc/ChangeLog:
* doc/cfg.texi: Fix s/ei_safe_safe/ei_safe_edge/ typo.
Patrick Palka [Mon, 12 Jul 2021 20:35:18 +0000 (16:35 -0400)]
c++: permit deduction guides at class scope [PR79501]
This adds support for declaring (class-scope) deduction guides for a
member class template. Fortunately it seems only a couple of changes
are needed in order for the existing CTAD machinery to handle them
properly: we need to make sure to give them a FUNCTION_TYPE instead of a
METHOD_TYPE, and we need to avoid using a BASELINK when looking them up.
PR c++/79501
PR c++/100983
gcc/cp/ChangeLog:
* decl.c (grokfndecl): Don't require that deduction guides are
declared at namespace scope. Check that class-scope deduction
guides have the same access as the member class template.
(grokdeclarator): Pretend class-scope deduction guides are static.
* search.c (lookup_member): Don't use a BASELINK for (class-scope)
deduction guides.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction92.C: New test.
* g++.dg/cpp1z/class-deduction93.C: New test.
* g++.dg/cpp1z/class-deduction94.C: New test.
* g++.dg/cpp1z/class-deduction95.C: New test.
Uros Bizjak [Mon, 12 Jul 2021 19:06:32 +0000 (21:06 +0200)]
i386: Fix vec_set<mode> expanders [PR101424]
AVX does not support 32-byte integer compares, required by
ix86_expand_vector_set_var. The following patch fixes vec_set<mode>
expanders by introducing new vec_setm_avx2_operand predicate for AVX
vector modes.
gcc/
2021-07-12 Uroš Bizjak <ubizjak@gmail.com>
PR target/101424
* config/i386/predicates.md (vec_setm_sse41_operand):
Rename from vec_setm_operand.
(vec_setm_avx2_operand): New predicate.
* config/i386/sse.md (vec_set<V_128:mode>): Use V_128 mode iterator.
Use vec_setm_sse41_operand as operand 2 predicate.
(vec_set<V_256_512:mode>): New expander.
* config/i386/mmx.md (vec_setv2hi): Use vec_setm_sse41_operand
as operand 2 predicate.
gcc/testsuite/
2021-07-12 Uroš Bizjak <ubizjak@gmail.com>
PR target/101424
* gcc.target/i386/pr101424.c: New test.
Andrew MacLeod [Mon, 12 Jul 2021 15:38:17 +0000 (11:38 -0400)]
Do not register a cast as an equivalence.
Registering an equivalence between objects of the same size in a cast can
cause other relations to be incorrect.
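The message does not spell out a failing case, so the following is only an assumed illustration of the general hazard: an ordering that holds between two ints need not hold between the same bits viewed through a same-size unsigned cast. The helper names are hypothetical.

```cpp
#include <cassert>

// The relation "a < b" evaluated on the original signed operands.
inline bool less_as_int(int a, int b) { return a < b; }

// The same operands after a cast to a type of the same size.
inline bool less_as_unsigned(int a, int b) {
  return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}
```

For a = -1 and b = 1 the two helpers disagree, so treating the cast result as fully interchangeable with its operand could carry a relation over to a context where it is false.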
gcc/
PR tree-optimization/101335
* range-op.cc (operator_cast::lhs_op1_relation): Delete.
gcc/testsuite/
* gcc.dg/tree-ssa/pr101335.c: New.
Jonathan Wakely [Mon, 12 Jul 2021 15:09:34 +0000 (16:09 +0100)]
libstdc++: Constrain std::as_writable_bytes [PR101411]
The std::as_writable_bytes function should be constrained to only accept
writable spans. Currently it can be called but then gives an error in
the function body.
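To show the difference between a hard error in the body and a constrained declaration, here is a sketch that uses std::enable_if as a stand-in for the requires-clause added by the patch. view_sketch, as_writable_bytes_sketch and is_writable_callable are hypothetical names; this is not the libstdc++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical minimal stand-in for a span; not std::span.
template <class T>
struct view_sketch {
  T* data;
  std::size_t size;
};

// Constrained: drops out of overload resolution when T is const, instead
// of being selected and then failing inside the function body.
template <class T,
          class = typename std::enable_if<!std::is_const<T>::value>::type>
view_sketch<unsigned char> as_writable_bytes_sketch(view_sketch<T> v) {
  return {reinterpret_cast<unsigned char*>(v.data), v.size * sizeof(T)};
}

// Detects whether the call is well-formed for a given view type.
template <class V, class = void>
struct is_writable_callable : std::false_type {};

template <class V>
struct is_writable_callable<
    V, decltype(static_cast<void>(
           as_writable_bytes_sketch(std::declval<V>())))> : std::true_type {};

static_assert(is_writable_callable<view_sketch<int>>::value,
              "writable view should be accepted");
static_assert(!is_writable_callable<view_sketch<const int>>::value,
              "read-only view should be rejected by the constraint");
```

Because the rejection happens at the declaration, generic code can test for the call without triggering an error, which an unconstrained function cannot offer.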
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101411
* include/std/span (as_writable_bytes): Add requires-clause.
* testsuite/23_containers/span/101411.cc: New test.
Andrew Pinski [Fri, 9 Jul 2021 02:23:35 +0000 (19:23 -0700)]
[PHIOPT/MATCH] Remove the statement to move if not used
Instead of waiting for DCE to remove the unused statement,
and maybe optimize another conditional, it is better if
we don't move the statement and have the statement
removed.
OK? Bootstrapped and tested on x86_64-linux-gnu.
Changes from v1:
* v2: Change the order of insertion and check to see if the lhs
is used rather than see if the lhs was used in the sequence.
gcc/ChangeLog:
* tree-ssa-phiopt.c (match_simplify_replacement): Move
insert of the sequence before the movement of the
statement. Check to see if the statement is used
outside of the original phi to see if we should move it.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr96928-1.c: Update to be similar to pr96928.c.
Richard Biener [Mon, 12 Jul 2021 13:13:17 +0000 (15:13 +0200)]
produce simple DOT graphs from SLP trees
This adds a dot_slp_tree debug function producing a simple DOT
graph from a starting node down the graph. There's no fancy
direct invocation of dot but the output is directed to a specified
file. It re-uses vect_print_slp_tree, naming nodes as their
address.
2021-07-12 Richard Biener <rguenther@suse.de>
* dump-context.h (debug_dump_context::debug_dump_context):
Add FILE * parameter defaulted to stderr.
* dumpfile.c (debug_dump_context::debug_dump_context): Adjust.
* tree-vect-slp.c (dot_slp_tree): New functions.
Richard Biener [Thu, 8 Jul 2021 07:52:49 +0000 (09:52 +0200)]
tree-optimization/101373 - avoid PRE across externally throwing call
PRE already tries to avoid hoisting possibly trapping expressions
across calls that might not return normally but fails to consider
const calls that throw externally. The following fixes that and
also plugs the hole of trapping references not pruned in case
they are not caught by the actual call clobbering them.
At -Os we hit the same issue in RTL PRE and postreload-gcse has
even more incomplete checks so the patch adjusts both of those
as well.
2021-07-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/101373
* tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
references when the BB may not return.
(compute_avail): Pass in the function we're working on and
replace cfun references with it. Externally throwing
const calls also possibly terminate the function.
(pass_pre::execute): Pass down the function we're working on.
* gcse.c (compute_hash_table_work): Externally throwing
const/pure calls also need record_last_mem_set_info.
* postreload-gcse.c (record_opr_changes): Looping or externally
throwing const/pure calls also need record_last_mem_set_info.
* g++.dg/torture/pr101373.C: New testcase, XFAILed.
* gnat.dg/opt95.adb: Likewise.
Uros Bizjak [Mon, 12 Jul 2021 14:34:41 +0000 (16:34 +0200)]
Change the type of memory classification functions to bool
2021-07-12 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* recog.c (memory_address_addr_space_p): Change the type to bool.
Return true/false instead of 1/0.
(offsettable_memref_p): Ditto.
(offsettable_nonstrict_memref_p): Ditto.
(offsettable_address_addr_space_p): Ditto.
Change the type of addressp indirect function to bool.
* recog.h (memory_address_addr_space_p): Change the type to bool.
(strict_memory_address_addr_space_p): Ditto.
(offsettable_memref_p): Ditto.
(offsettable_nonstrict_memref_p): Ditto.
(offsettable_address_addr_space_p): Ditto.
* reload.c (maybe_memory_address_addr_space_p): Ditto.
(strict_memory_address_addr_space_p): Change the type to bool.
Return true/false instead of 1/0.
(maybe_memory_address_addr_space_p): Change the type to bool.
Pierre-Marie de Rodat [Fri, 25 Jun 2021 09:22:19 +0000 (09:22 +0000)]
[Ada] adaint.c minor reformatting
gcc/ada/
* adaint.c (__gnat_number_of_cpus): Replace "#ifdef" by "#if
defined".
Eric Botcazou [Fri, 4 Jun 2021 16:22:17 +0000 (18:22 +0200)]
[Ada] Use GNAT encodings only when -fgnat-encodings=all is specified
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <discrete_type>: Add a
parallel type only when -fgnat-encodings=all is specified.
<E_Array_Type>: Use the PAT name and special suffixes only when
-fgnat-encodings=all is specified.
<E_Array_Subtype>: Build a special type for debugging purposes only
when -fgnat-encodings=all is specified. Add a parallel type or use
the PAT name only when -fgnat-encodings=all is specified.
<E_Record_Type>: Generate debug info for the inner record types only
when -fgnat-encodings=all is specified.
<E_Record_Subtype>: Use a debug type for an artificial subtype only
except when -fgnat-encodings=all is specified.
(elaborate_expression_1): Reset need_for_debug when possible only
except when -fgnat-encodings=all is specified.
(components_to_record): Use XV encodings for variable size only
when -fgnat-encodings=all is specified.
(associate_original_type_to_packed_array): Add a parallel type only
when -fgnat-encodings=all is specified.
* gcc-interface/misc.c (gnat_get_array_descr_info): Do not return
full information only when -fgnat-encodings=all is specified.
* gcc-interface/utils.c (make_packable_type): Add a parallel type
only when -fgnat-encodings=all is specified.
(maybe_pad_type): Make the inner type a debug type only except when
-fgnat-encodings=all is specified. Create an XVS type for variable
size only when -fgnat-encodings=all is specified.
(rest_of_record_type_compilation): Add a parallel type only when
-fgnat-encodings=all is specified.
Eric Botcazou [Tue, 27 Apr 2021 19:18:12 +0000 (21:18 +0200)]
[Ada] Implement support for unconstrained array types with FLB
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Use a
fixed lower bound if the index subtype is marked so, as well as a
more efficient formula for the upper bound if the array cannot be
superflat.
(flb_cannot_be_superflat): New predicate.
(cannot_be_superflat): Rename into...
(range_cannot_be_superflat): ...this. Minor tweak.
Bob Duff [Tue, 15 Jun 2021 13:12:36 +0000 (09:12 -0400)]
[Ada] Clean up Uint fields
gcc/ada/
* uintp.ads, types.h: New subtypes of Uint: Valid_Uint, Unat,
Upos, Nonzero_Uint with predicates. These correspond to new
field types in Gen_IL.
* gen_il-types.ads (Valid_Uint, Unat, Upos, Nonzero_Uint): New
field types.
* einfo-utils.ads, einfo-utils.adb, fe.h (Known_Alignment,
Init_Alignment): Use the initial zero value to represent
"unknown". This will ensure that if Alignment is called before
Set_Alignment, the compiler will blow up (if assertions are
enabled).
* atree.ads, atree.adb, atree.h, gen_il-gen.adb
(Get_Valid_32_Bit_Field): New generic low-level getter for
subtypes of Uint.
(Copy_Alignment): New procedure to copy Alignment field even
when Unknown.
(Init_Object_Size_Align, Init_Size_Align): Do not bypass the
Init_ procedures.
* exp_pakd.adb, freeze.adb, layout.adb, repinfo.adb,
sem_util.adb: Protect calls to Alignment with Known_Alignment.
Use Copy_Alignment when it might be unknown.
* gen_il-gen-gen_entities.adb (Alignment,
String_Literal_Length): Use type Unat instead of Uint, to ensure
that the field is always Set_ before we get it, and that it is
set to a nonnegative value.
(Enumeration_Pos): Unat.
(Enumeration_Rep): Valid_Uint. Can be negative, but must be
valid before fetching.
(Discriminant_Number): Upos.
(Renaming_Map): Remove.
* gen_il-gen-gen_nodes.adb (Char_Literal_Value, Reason): Unat.
(Intval, Corresponding_Integer_Value): Valid_Uint.
* gen_il-internals.ads: New functions for dealing with special
defaults and new subtypes of Uint.
* scans.ads: Correct comments.
* scn.adb (Post_Scan): Do not set Intval to No_Uint; that is no
longer allowed.
* sem_ch13.adb (Analyze_Enumeration_Representation_Clause): Do
not set Enumeration_Rep to No_Uint; that is no longer allowed.
(Offset_Value): Protect calls to Alignment with Known_Alignment.
* sem_prag.adb (Set_Atomic_VFA): Do not use Uint_0 to mean
"unknown"; call Init_Alignment instead.
* sinfo.ads: Minor comment fix.
* treepr.adb: Deal with printing of new field types.
* einfo.ads, gen_il-fields.ads (Renaming_Map): Remove.
* gcc-interface/decl.c (gnat_to_gnu_entity): Use Known_Alignment
before calling Alignment. This preserves some probably buggy
behavior: if the alignment is not set, it previously defaulted
to Uint_0; we now make that explicit. Use Copy_Alignment,
because "Set_Alignment (Y, Alignment (X));" no longer works when
the Alignment of X has not yet been set.
* gcc-interface/trans.c (process_freeze_entity): Use
Copy_Alignment.
Eric Botcazou [Fri, 18 Jun 2021 14:47:48 +0000 (16:47 +0200)]
[Ada] Add DWARF 5 support to System.Dwarf_Line
gcc/ada/
* libgnat/s-dwalin.ads: Adjust a few comments left and right.
(Line_Info_Register): Comment out unused components.
(Line_Info_Header): Add DWARF 5 support.
(Dwarf_Context): Likewise. Rename "prologue" into "header".
* libgnat/s-dwalin.adb: Alphabetize "with" clauses.
(DWARF constants): Add DWARF 5 support and reorder.
(For_Each_Row): Adjust.
(Initialize_Pass): Likewise.
(Initialize_State_Machine): Likewise and fix typo.
(Open): Add DWARF 5 support.
(Parse_Prologue): Rename into...
(Parse_Header): ...this and add DWARF 5 support.
(Read_And_Execute_Isn): Rename into...
(Read_And_Execute_Insn): ...this and adjust.
(To_File_Name): Change parameter name and add DWARF 5 support.
(Read_Entry_Format_Array): New procedure.
(Skip_Form): Add DWARF 5 support and reorder.
(Seek_Abbrev): Do not count entries and add DWARF 5 support.
(Debug_Info_Lookup): Add DWARF 5 support.
(Symbolic_Address.Set_Result): Likewise.
(Symbolic_Address): Adjust.
Bob Duff [Wed, 16 Jun 2021 10:47:57 +0000 (06:47 -0400)]
[Ada] Duplicate Size/Value_Size clause
gcc/ada/
* sem_ch13.adb (Duplicate_Clause): Add a helper routine
Check_One_Attr, with a parameter for the attribute_designator we
are looking for, and one for the attribute_designator of the
current node (which are usually the same). For Size and
Value_Size, call it twice, once for each.
* errout.ads: Fix a typo.
Piotr Trojanek [Thu, 17 Jun 2021 16:49:11 +0000 (18:49 +0200)]
[Ada] Avoid unnecessary work when expanding 'Image into 'Put_Image
gcc/ada/
* exp_imgv.adb (Expand_Image_Attribute): Move rewriting to
attribute Put_Image to the beginning of expansion of attribute
Image.
Richard Biener [Wed, 7 Jul 2021 09:45:43 +0000 (11:45 +0200)]
Display the number of components BB vectorized
This amends the optimization message printed when a basic-block
part is vectorized to mention the number of SLP graph entries.
This helps when debugging vectorization differences and we end up
merging SLP instances for costing purposes.
2021-07-07 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_slp_region): Show the number of
SLP graph entries in the optimization message.
* g++.dg/vect/slp-pr87105.cc: Adjust.
* gcc.dg/vect/bb-slp-pr54400.c: Likewise.
Richard Biener [Mon, 12 Jul 2021 08:49:03 +0000 (10:49 +0200)]
tree-optimization/101394 - fix PRE full redundancy wrt abnormals
This avoids adding a copy from an abnormal picked up from PHI
translation much like we'd avoid inserting the translated
expression on pred edges.
2021-07-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/101394
* tree-ssa-pre.c (do_pre_regular_insertion): Avoid inserting
copies from abnormals for a full redundancy.
* gcc.dg/torture/pr101394.c: New testcase.
Richard Biener [Mon, 12 Jul 2021 08:26:25 +0000 (10:26 +0200)]
middle-end/101423 - internal calls do not trap
This adjusts gimple_could_trap_p to not consider internal function
calls to trap compared to indirect calls or calls to weak functions.
2021-07-12 Richard Biener <rguenther@suse.de>
PR middle-end/101423
* gimple.c (gimple_could_trap_p_1): Internal function calls
do not trap.
* tree-eh.c (tree_could_trap_p): Likewise.
Roger Sayle [Mon, 12 Jul 2021 09:59:08 +0000 (10:59 +0100)]
Tweak testcase for PR tree-optimization/101403.
Initialize unused variable u in compound expression. Committed as obvious.
2021-07-12 Roger Sayle <roger@nextmovesoftware.com>
Jakub Jelinek <jakub@redhat.com>
gcc/testsuite/ChangeLog
PR tree-optimization/101403
* gcc.dg/pr101403.c: Avoid (unimportant) uninitialized variable.
prathamesh.kulkarni [Mon, 12 Jul 2021 09:48:21 +0000 (15:18 +0530)]
arm/66791: Replace builtins for unsigned and fp vmul_n intrinsics.
gcc/ChangeLog:
PR target/66791
* config/arm/arm_neon.h (vmul_n_u32): Replace call to builtin with
__a * __b.
(vmulq_n_u32): Likewise.
(vmul_n_f32): Gate __a * __b on __FAST_MATH__.
(vmulq_n_f32): Likewise.
(vmul_n_f16): Likewise.
(vmulq_n_f16): Likewise.
gcc/testsuite/ChangeLog:
PR target/66791
* gcc.target/arm/armv8_2-fp16-neon-2.c: Adjust.
Martin Liska [Mon, 12 Jul 2021 08:59:06 +0000 (10:59 +0200)]
offloading: fix -foffload hinting
PR sanitizer/101425
gcc/ChangeLog:
* gcc.c (check_offload_target_name): Call
candidates_list_and_hint only if we have a candidate.
prathamesh.kulkarni [Mon, 12 Jul 2021 07:53:06 +0000 (13:23 +0530)]
arm/98435: Missed optimization in expanding vector constructor.
The patch moves vec_init pattern from neon.md to vec-common.md,
and adjusts the mode to VDQX to accommodate binary floats. Also,
the pattern is additionally gated on VALID_MVE_MODE.
gcc/ChangeLog:
PR target/98435
* config/arm/neon.md (vec_init): Move to ...
* config/arm/vec-common.md (vec_init): ... here.
Change the pattern's mode to VDQX and gate it on VALID_MVE_MODE.
gcc/testsuite/ChangeLog:
PR target/98435
* gcc.target/arm/simd/pr98435.c: New test.
Roger Sayle [Mon, 12 Jul 2021 07:24:27 +0000 (08:24 +0100)]
PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C
My sincere apologies for the breakage. My recent patch to fold
bswapN(x)>>C where the constant C was large enough that the result
only contains bits from the low byte, and can therefore avoid
the byte swap contains a minor logic error. The pattern contains
a convert? allowing an extension to occur between the bswap and
the shift. The logic is correct if there's no extension, or the
extension has the same sign as the shift, but I'd mistakenly
convinced myself that these couldn't have different signedness.
(T)bswap16(x)>>12 is (T)((unsigned char)x>>4) or (T)((signed char)x>>4).
The bug is that for zero-extensions to signed type T, we need to use
the unsigned char variant [the signedness of the byte shift is not
(always) the same as the signedness of T and the original shift].
Then because I'm now paranoid, I've also added a clause to handle
the hypothetical (but in practice impossible) sign-extension to an
unsigned type T, which can be implemented as (T)(x<<8)>>12.
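The zero-extension case can be checked exhaustively with a small sketch. It assumes T is a signed 32-bit type and that right shifts of negative values are arithmetic, as they are with GCC; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Reference: bswap16 result zero-extended to a signed T, then shifted.
inline std::int32_t reference_shift(std::uint16_t x) {
  return static_cast<std::int32_t>(__builtin_bswap16(x)) >> 12;
}

// Folded form using the unsigned char variant, which is the one the
// message says is needed for a zero-extension to a signed T.
inline std::int32_t folded_unsigned(std::uint16_t x) {
  return static_cast<std::int32_t>(static_cast<unsigned char>(x) >> 4);
}

// Folded form using the signed char variant, shown for comparison.
inline std::int32_t folded_signed(std::uint16_t x) {
  return static_cast<std::int32_t>(static_cast<signed char>(x) >> 4);
}

// Compares the reference against the unsigned fold for every 16-bit input.
inline bool unsigned_fold_matches_everywhere() {
  for (unsigned v = 0; v <= 0xFFFFu; ++v) {
    const std::uint16_t x = static_cast<std::uint16_t>(v);
    if (reference_shift(x) != folded_unsigned(x))
      return false;
  }
  return true;
}
```

An input whose low byte has its top bit set, such as 0x0080, is where the signed char variant diverges from the reference.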
2021-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/101403
* match.pd ((T)bswap(X)>>C): Correctly handle cases where
signedness of the shift is not the same as the signedness of
the type extension.
gcc/testsuite/ChangeLog
PR tree-optimization/101403
* gcc.dg/pr101403.c: New test case.
GCC Administrator [Mon, 12 Jul 2021 00:16:29 +0000 (00:16 +0000)]
Daily bump.
GCC Administrator [Sun, 11 Jul 2021 00:16:35 +0000 (00:16 +0000)]
Daily bump.
John David Anglin [Sat, 10 Jul 2021 16:20:32 +0000 (16:20 +0000)]
Require target lra for tests using asm goto
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr100329.c: Require target lra.
* gcc.dg/torture/pr100519.c: Likewise.
Ian Lance Taylor [Fri, 9 Jul 2021 02:25:55 +0000 (19:25 -0700)]
runtime: remove direct assignments to memory locations
PR bootstrap/101374
They cause a warning with the updated GCC -Warray-bounds option.
Replace them with calls to abort, which for our purposes is fine.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/333409
Patrick Palka [Sat, 10 Jul 2021 02:40:07 +0000 (22:40 -0400)]
c++: 'new T[N]' and SFINAE [PR82110]
Here we're failing to treat 'new T[N]' as erroneous in a SFINAE context
when T isn't default constructible because expand_aggr_init_1 doesn't
communicate to build_aggr_init (its only SFINAE caller) whether the
initialization was actually successful. To fix this, this patch makes
expand_aggr_init_1 and its subroutine expand_default_init return true on
success, false on failure so that build_aggr_init can properly return
error_mark_node on failure.
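A sketch of the kind of SFINAE probe this affects, written with a partial specialization so it does not depend on C++17. can_new_array, NoDefault and HasDefault are hypothetical names and are not taken from the new tests.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: assume 'new T[1]' is not usable.
template <class T, class = void>
struct can_new_array : std::false_type {};

// Chosen only when 'new T[1]' is well-formed in the immediate context.
template <class T>
struct can_new_array<T, decltype(static_cast<void>(new T[1]))>
    : std::true_type {};

struct NoDefault {
  NoDefault(int);  // no default constructor, so 'new NoDefault[1]' is ill-formed
};

struct HasDefault {
  HasDefault() = default;
};

static_assert(can_new_array<int>::value, "int arrays can be allocated");
static_assert(can_new_array<HasDefault>::value,
              "default-constructible class arrays can be allocated");
```

With the fix described above, can_new_array<NoDefault>::value should evaluate to false; a compiler without the fix may still select the specialization and report true.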
PR c++/82110
gcc/cp/ChangeLog:
* init.c (build_aggr_init): Return error_mark_node if
expand_aggr_init_1 returns false.
(expand_default_init): Change return type to bool. Return false
on error, true on success.
(expand_aggr_init_1): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/pr78765.C: Expect another conversion failure
diagnostic.
* g++.dg/template/sfinae14.C: Flip incorrect assertion.
* g++.dg/cpp2a/concepts-requires27.C: New test.
GCC Administrator [Sat, 10 Jul 2021 00:16:53 +0000 (00:16 +0000)]
Daily bump.
H.J. Lu [Mon, 5 Jul 2021 21:15:04 +0000 (14:15 -0700)]
libffi/x86: Always check __x86_64__ for x86 hosts
The upstream libffi has
commit
cb8474368cdef3207638d047bd6c707ad8fcb339
Author: hjl-tools <hjl.tools@gmail.com>
Date: Wed Dec 2 12:52:12 2020 -0800
libffi/x86: Always check __x86_64__ for x32 hosts (#601) (#602)
Since for x86_64-*x32 and x86_64-x32-* hosts, -m32 generates ia32 codes.
We should always check __x86_64__ for x32 hosts.
Since for gnux32 hosts, -m32 generates i386 codes, always check __x86_64__
for x86 hosts.
PR libffi/101336
* configure.host: Always check __x86_64__ for x86 hosts.
Jason Merrill [Fri, 9 Jul 2021 17:50:01 +0000 (13:50 -0400)]
c++: concepts TS and explicit specialization [PR101098]
duplicate_decls was not recognizing the explicit specialization as matching
the implicit specialization of g<Y> because
function_requirements_equivalent_p was seeing the C constraint on the
implicit one and not on the explicit.
PR c++/101098
gcc/cp/ChangeLog:
* decl.c (function_requirements_equivalent_p): Only compare
trailing requirements on a specialization.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/explicit-spec1.C: New test.
Iain Sandoe [Wed, 7 Jul 2021 18:56:20 +0000 (19:56 +0100)]
coroutines: Factor code. Match original source location in helpers [NFC].
This is primarily a source code refactoring, the only change is to
ensure that the outlined functions are marked to begin at the same
line as the original. Otherwise, they get the default (which seems
to be input_location, which corresponds to the closing brace at the
point that this is done). Having the source location point to that
confuses some debuggers.
This is a contributory fix to:
PR c++/99215 - coroutines: debugging with gdb
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:
* coroutines.cc (build_actor_fn): Move common code to
act_des_fn.
(build_destroy_fn): Likewise.
(act_des_fn): Build the void return here. Ensure that the
source location matches the original function.
Roger Sayle [Fri, 9 Jul 2021 16:45:40 +0000 (17:45 +0100)]
Improvement to signed division of integer constant on x86_64.
This patch tweaks the way GCC handles 32-bit integer division on
x86_64, when the numerator is constant. Currently the function
int foo (int x) {
  return 100/x;
}
generates the code:
foo:    movl    $100, %eax
        cltd
        idivl   %edi
        ret
where the sign-extension instruction "cltd" creates a long
dependency chain, as it depends on the "mov" before it, and
is depended upon by "idivl" after it.
With this patch, GCC now matches both icc and LLVM and uses
an xor instead, generating:
foo:    xorl    %edx, %edx
        movl    $100, %eax
        idivl   %edi
        ret
Microbenchmarking confirms that this is faster on Intel
processors (Kaby Lake), and no worse on AMD processors (Zen2),
which agrees with intuition, but oddly disagrees with the
llvm-mca cycle count prediction on godbolt.org.
The tricky bit is that this sign-extension instruction is only
produced by late (postreload) splitting, and unfortunately none
of the subsequent passes (e.g. cprop_hardreg) is able to
propagate and simplify its constant argument. The solution
here is to introduce a define_insn_and_split that allows the
constant numerator operand to be captured (by combine) and
then split into an optimal form after reload.
The above microbenchmarking also shows that eliminating the
sign extension of negative values (using movl $-1,%edx) is also
a performance improvement, as performed by icc but not by LLVM.
Both the xor and movl sign-extensions are larger than cltd,
so this transformation is prevented for -Os.
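As a rough illustration of why a constant numerator needs no cltd: the
value idivl expects in %edx is simply the sign extension of the numerator,
and that is known at compile time when the numerator is a constant. The
C++ sketch below is not part of the patch; the function names are made up
for this note, and it only models the arithmetic, not the insn pattern.

```cpp
#include <cstdint>

// Value that must sit in %edx before idivl when the numerator is a
// compile-time constant: all zeros for a non-negative numerator
// (xorl %edx, %edx), all ones for a negative one (movl $-1, %edx).
constexpr std::uint32_t edx_for_constant(std::int32_t numerator) {
    return numerator < 0 ? 0xFFFFFFFFu : 0u;
}

// Reference model of cltd: sign-extend to 64 bits and take the upper half.
// The detour through uint64_t keeps the shift well defined.
constexpr std::uint32_t edx_by_sign_extension(std::int32_t numerator) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(static_cast<std::int64_t>(numerator)) >> 32);
}

// Both models agree for the cases discussed in the commit message.
static_assert(edx_for_constant(100) == edx_by_sign_extension(100),
              "non-negative numerator: upper half is zero");
static_assert(edx_for_constant(-100) == edx_by_sign_extension(-100),
              "negative numerator: upper half is all ones");
```

Either constant can be loaded without waiting on the preceding mov, which
is the dependency the commit message describes.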
2021-07-09 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (*divmodsi4_const): Optimize SImode
divmod of a constant numerator with new define_insn_and_split.
gcc/testsuite/ChangeLog
* gcc.target/i386/divmod-9.c: New test case.
Iain Sandoe [Wed, 23 Jun 2021 07:13:22 +0000 (08:13 +0100)]
coroutines: Fix a typo in rewriting the function.
When amending the function re-write code, I made a typo in
the block connections. This has not shown up in any test
fails (as far as can be seen) but is a regression in debug
info.
Fixed thus.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:
* coroutines.cc
(coro_rewrite_function_body): Connect the replacement
function block to the block nest correctly.
Iain Sandoe [Mon, 3 May 2021 07:22:53 +0000 (08:22 +0100)]
Darwin, X86: Adjust call clobbers to allow for lazy-binding [PR 100152].
We allow public functions defined in a TU to bind locally for PIC
code (the default) on 64bit Mach-O.
If such functions are not inlined, we cannot tell at compile-time if
they might be called via the lazy symbol resolver (this can depend on
options given at link-time). Therefore, we must assume that the lazy
resolver could be used, which clobbers R11 and R10.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
PR target/100152
* config/i386/i386-expand.c (ix86_expand_call): If a call is
to a non-local-binding, or local but to a public symbol, then
assume that it might be indirected via the lazy symbol binder.
Mark R10 and R11 as clobbered in that case.
Iain Sandoe [Sat, 3 Jul 2021 14:42:16 +0000 (15:42 +0100)]
Darwin, config: Revise host config fragment.
There were two uses for the Darwin host config fragment:
The first is to arrange for targets that support mdynamic-no-pic
to be built with that enabled (since it makes a significant
difference to the compiler performance). We can be more specific
in the application of this, since it only applies to 32b hosts
plus powerpc64-darwin9.
The second was to work around a tool bug where -fno-PIE was not
propagated to the link stage. This second use is redundant,
since the buggy toolchain cannot bootstrap current GCC sources
anyway.
This makes the host fragment more specific and reduces the number
of toolchains for which it is included which reduces clutter in
configure lines.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
config/ChangeLog:
* mh-darwin: Make this specific to handling the
mdynamic-no-pic case.
ChangeLog:
* configure: Regenerate.
* configure.ac: Adjust cases for which it is necessary to
include the Darwin host config fragment.
Eric Botcazou [Fri, 9 Jul 2021 16:30:54 +0000 (18:30 +0200)]
Missing piece in earlier change
gcc/ada/
* gcc-interface/utils.c (finish_subprog_decl): Remove obsolete line.
Indu Bhagat [Fri, 9 Jul 2021 16:03:08 +0000 (09:03 -0700)]
testsuite/101269: fix testcase when used with -m32
PR testsuite/101269 - new test case gcc.dg/debug/btf/btf-datasec-1.c
fails with its introduction in r12-1852
BTF datasec records for .rodata/.data are expected for now for all targets.
For powerpc based targets, use -msdata=none when ilp32 is enabled.
2021-07-09 Indu Bhagat <indu.bhagat@oracle.com>
gcc/testsuite/ChangeLog:
PR testsuite/101269
* gcc.dg/debug/btf/btf-datasec-1.c: Force -msdata=none with ilp32 for
powerpc based targets.
Patrick Palka [Fri, 9 Jul 2021 14:20:25 +0000 (10:20 -0400)]
c++: requires-expr with dependent extra args [PR101181]
Here we're crashing ultimately because the mechanism for delaying
substitution into a requires-expression (and constexpr if and pack
expansions) doesn't expect to see dependent args. But we end up
capturing dependent args here during substitution into the default
template argument as part of coerce_template_parms for the dependent
specialization p<T>.
This patch enables the commented out code in add_extra_args for handling
this situation. This isn't needed for pack expansions (as the
accompanying comment points out), and it doesn't seem strictly necessary
for constexpr if either, but for requires-expressions delaying even
dependent substitution is important for ensuring we don't evaluate
requirements out of order.
It turns out we also need to make a copy of the arguments when capturing
them so that coerce_template_parms doesn't later add to them and form an
unexpected cycle (REQUIRES_EXPR_EXTRA_ARGS (t) would indirectly point to t).
We also need to make tsubst_template_args handle missing template
arguments, since the arguments we capture from coerce_template_parms
are incomplete at that point.
PR c++/101181
gcc/cp/ChangeLog:
* constraint.cc (tsubst_requires_expr): Pass complain/in_decl to
add_extra_args.
* cp-tree.h (add_extra_args): Add complain/in_decl parameters.
* pt.c (build_extra_args): Make a copy of args.
(add_extra_args): Add complain/in_decl parameters. Enable the
code for handling the case where the extra arguments are
dependent.
(tsubst_pack_expansion): Pass complain/in_decl to
add_extra_args.
(tsubst_template_args): Handle missing template arguments.
(tsubst_expr) <case IF_STMT>: Pass complain/in_decl to
add_extra_args.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires26.C: New test.
* g++.dg/cpp2a/lambda-uneval16.C: New test.
Patrick Palka [Fri, 9 Jul 2021 14:20:22 +0000 (10:20 -0400)]
c++: find_template_parameters and TEMPLATE_DECLs [PR101247]
r12-1989 fixed the testcase in the PR, but unfortunately the fix is
buggy: it breaks the case where the common template between the
TEMPLATE_DECL t and ctx_parms is the innermost template (as in
concepts-memtmpl5.C below). This can be fixed by instead passing the
TREE_TYPE of ctmpl to common_enclosing_class when ctmpl is a class
template.
But even after that's fixed, the analogous case where the innermost
template is a partial specialization is still broken (as in
concepts-memtmpl5a.C below), because ctmpl is always a primary template.
So this patch instead takes a different approach that doesn't rely on
ctx_parms at all: when looking for the template parameters of a
TEMPLATE_DECL that are shared with the current template context, just
walk its DECL_CONTEXT. As long as the template is not overly general
(e.g. we didn't pass it through most_general_template), this should give
us exactly what we want, since if a TEMPLATE_DECL can be referred to
from some template context then the template parameters it uses must all
be in-scope and contained in its DECL_CONTEXT. This effectively makes
us treat TEMPLATE_DECLs more similarly to other _DECLs (whose DECL_CONTEXT
we also walk).
PR c++/101247
gcc/cp/ChangeLog:
* pt.c (any_template_parm_r) <case TEMPLATE_DECL>: Just walk the
DECL_CONTEXT.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memtmpl4.C: Uncomment the commented out
example, which we now handle correctly.
* g++.dg/cpp2a/concepts-memtmpl5.C: New test.
* g++.dg/cpp2a/concepts-memtmpl5a.C: New test.
Matheus Castanho [Fri, 9 Jul 2021 14:13:38 +0000 (15:13 +0100)]
libstdc++: Only use __gthread_yield if gthreads is available
libstdc++-v3/ChangeLog:
* include/std/mutex (__lock_impl): Check
_GLIBCXX_HAS_GTHREADS before using __gthread_yield.
Piotr Trojanek [Thu, 17 Jun 2021 16:46:21 +0000 (18:46 +0200)]
[Ada] Fix style in expansion of attribute Put_Image
gcc/ada/
* exp_put_image.adb (Make_Put_Image_Name): Fix style.
(Image_Should_Call_Put_Image): Likewise.
(Build_Image_Call): Likewise.
Ghjuvan Lacambre [Thu, 17 Jun 2021 08:01:33 +0000 (10:01 +0200)]
[Ada] par-ch6: do not mark subprogram as missing "is" if imported
gcc/ada/
* par-ch6.adb (Contains_Import_Aspect): New function.
(P_Subprogram): Acknowledge `Import` aspects.
Bob Duff [Tue, 15 Jun 2021 19:36:34 +0000 (15:36 -0400)]
[Ada] Fix crash on type extensions with discriminants
gcc/ada/
* exp_put_image.adb (Make_Component_Attributes): Use
Implementation_Base_Type to get the parent type. Otherwise,
Parent_Type_Decl is actually an internally generated subtype
declaration, so we blow up on
Type_Definition (Parent_Type_Decl).
Dmitriy Anisimkov [Sun, 13 Jun 2021 02:42:54 +0000 (08:42 +0600)]
[Ada] Add missed OS constant values
gcc/ada/
* gsocket.h: Include net/if.h to get IF_NAMESIZE constant.
* s-oscons-tmplt.c: Define IPV6_FLOWINFO for Linux.
Steve Baird [Wed, 9 Jun 2021 14:29:11 +0000 (07:29 -0700)]
[Ada] Improve performance of Ada.Containers.Doubly_Linked_Lists.Generic_Sorting.Sort
gcc/ada/
* libgnat/a-cdlili.adb: Reimplement
Ada.Containers.Doubly_Linked_Lists.Generic_Sorting.Sort using
Mergesort instead of the previous Quicksort variant.
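For readers unfamiliar with the technique, here is a minimal C++ sketch of
merge sort on a doubly linked list. It is not the Ada code from
a-cdlili.adb, whose details the entry above does not give; merge_sort and
check_merge_sort are names invented for this note. The general appeal of
merge sort on linked lists is that it relinks nodes rather than copying
elements and keeps an O(n log n) worst case.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Top-down merge sort on a doubly linked list. Nodes are relinked with
// splice/merge; no element is copied.
template <typename T>
void merge_sort(std::list<T>& items) {
    if (items.size() < 2) return;   // zero or one element: already sorted

    // Move the second half of the nodes into a separate list.
    using diff_t = typename std::list<T>::difference_type;
    auto middle = std::next(items.begin(),
                            static_cast<diff_t>(items.size() / 2));
    std::list<T> back;
    back.splice(back.begin(), items, middle, items.end());

    merge_sort(items);
    merge_sort(back);
    items.merge(back);              // stable merge of two sorted lists
}

// Small self-check for the sketch.
inline void check_merge_sort() {
    std::list<int> xs{5, 1, 4, 1, 3};
    merge_sort(xs);
    assert((xs == std::list<int>{1, 1, 3, 4, 5}));
}
```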
Justin Squirek [Tue, 15 Jun 2021 16:54:12 +0000 (12:54 -0400)]
[Ada] Crash on expansion of BIP construct in -gnatf mode
gcc/ada/
* exp_ch6.adb (Is_Build_In_Place_Function_Call): Add check to
verify the Selector_Name of Exp_Node has been analyzed before
obtaining its entity.
Gary Dismukes [Mon, 14 Jun 2021 19:37:49 +0000 (15:37 -0400)]
[Ada] Typo corrections and minor reformatting
gcc/ada/
* libgnarl/s-osinte__vxworks.ads: Fix typo ("release" =>
"releases") plus comment reformatting.
* libgnat/s-os_lib.ads: In a comment, fix typo ("indended" =>
"intended"), add a hyphen and semicolon, plus reformatting. In
comment for subtype time_t, fix typo ("effect" => "affect"), add
hyphens, plus reformatting.
* libgnat/s-parame.ads, libgnat/s-parame__ae653.ads,
libgnat/s-parame__hpux.ads: Remove period from one-line comment.
Steve Baird [Thu, 10 Jun 2021 18:20:27 +0000 (11:20 -0700)]
[Ada] Add -gnatX support for casing on discriminated values
gcc/ada/
* exp_ch5.adb (Expand_General_Case_Statement): Add new function
Else_Statements to handle the case of invalid data analogously
to how it is handled when casing on a discrete value.
* sem_case.adb (Has_Static_Discriminant_Constraint): A new
Boolean-valued function.
(Composite_Case_Ops.Scalar_Part_Count): Include discriminants
when traversing components.
(Composite_Case_Ops.Choice_Analysis.Traverse_Discrete_Parts):
Include discriminants when traversing components; the component
range for a constrained discriminant is a single value.
(Composite_Case_Ops.Choice_Analysis.Parse_Choice): Eliminate
Done variable and modify how Next_Part is computed so that it is
always correct (as opposed to being incorrect when Done is
True). This includes changes in Update_Result (a local
procedure). Add new local procedure
Update_Result_For_Box_Component and call it not just for box
components but also for "missing" components (components
associated with an inactive variant).
(Check_Choices.Check_Composite_Case_Selector.Check_Component_Subtype):
Instead of disallowing all discriminated component types, allow
those that are unconstrained or statically constrained. Check
discriminant subtypes along with other component subtypes.
* doc/gnat_rm/implementation_defined_pragmas.rst: Update
documentation to reflect current implementation status.
* gnat_rm.texi: Regenerate.
Justin Squirek [Sun, 13 Jun 2021 16:52:18 +0000 (12:52 -0400)]
[Ada] Crash on inlined separate subprogram
gcc/ada/
* sem_ch6.adb (Check_Pragma_Inline): Correctly use
Corresponding_Spec_Of_Stub when dealing with subprogram body stubs.
Doug Rupp [Sat, 5 Jun 2021 19:58:35 +0000 (12:58 -0700)]
[Ada] Declare time_t uniformly based on a system parameter
gcc/ada/
* Makefile.rtl: Add translations for s-parame__posix2008.ads.
* libgnarl/s-linux.ads: Import System.Parameters.
(time_t): Declare using System.Parameters.time_t_bits.
* libgnarl/s-linux__alpha.ads: Likewise.
* libgnarl/s-linux__android.ads: Likewise.
* libgnarl/s-linux__hppa.ads: Likewise.
* libgnarl/s-linux__mips.ads: Likewise.
* libgnarl/s-linux__riscv.ads: Likewise.
* libgnarl/s-linux__sparc.ads: Likewise.
* libgnarl/s-linux__x32.ads: Likewise.
* libgnarl/s-qnx.ads: Likewise.
* libgnarl/s-osinte__aix.ads: Likewise.
* libgnarl/s-osinte__android.ads: Likewise.
* libgnarl/s-osinte__darwin.ads: Likewise.
* libgnarl/s-osinte__dragonfly.ads: Likewise.
* libgnarl/s-osinte__freebsd.ads: Likewise.
* libgnarl/s-osinte__gnu.ads: Likewise.
* libgnarl/s-osinte__hpux-dce.ads: Likewise.
* libgnarl/s-osinte__hpux.ads: Likewise.
* libgnarl/s-osinte__kfreebsd-gnu.ads: Likewise.
* libgnarl/s-osinte__lynxos178e.ads: Likewise.
* libgnarl/s-osinte__qnx.ads: Likewise.
* libgnarl/s-osinte__rtems.ads: Likewise.
* libgnarl/s-osinte__solaris.ads: Likewise.
* libgnarl/s-osinte__vxworks.ads: Likewise.
* libgnat/g-sothco.ads: Likewise.
* libgnat/s-osprim__darwin.adb: Likewise.
* libgnat/s-osprim__posix.adb: Likewise.
* libgnat/s-osprim__posix2008.adb: Likewise.
* libgnat/s-osprim__rtems.adb: Likewise.
* libgnat/s-osprim__x32.adb: Likewise.
* libgnarl/s-osinte__linux.ads: use type System.Linux.time_t.
* libgnat/s-os_lib.ads (time_t): Declare as subtype of
Long_Long_Integer.
* libgnat/s-parame.ads (time_t_bits): New constant.
* libgnat/s-parame__ae653.ads (time_t_bits): Likewise.
* libgnat/s-parame__hpux.ads (time_t_bits): Likewise.
* libgnat/s-parame__vxworks.ads (time_t_bits): Likewise.
* libgnat/s-parame__posix2008.ads: New file for 64 bit time_t.
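The idea of deriving time_t from a single per-target width parameter,
instead of hard-coding it in every OS binding, can be sketched in C++ as
follows. This is only an analogy to the Ada change: signed_of_width and
target_time_t are hypothetical names, and the value 64 is assumed here for
illustration.

```cpp
#include <climits>
#include <cstdint>

// Stand-in for a per-target constant such as System.Parameters.time_t_bits.
// The value 64 is an assumption made for this example.
constexpr int time_t_bits = 64;

// Map a bit width to a signed integer type. Only the widths a target
// might select are provided.
template <int Bits> struct signed_of_width;
template <> struct signed_of_width<32> { using type = std::int32_t; };
template <> struct signed_of_width<64> { using type = std::int64_t; };

// Every binding uses this one alias, so changing the parameter changes
// the representation everywhere at once.
using target_time_t = signed_of_width<time_t_bits>::type;

static_assert(sizeof(target_time_t) * CHAR_BIT == time_t_bits,
              "representation must match the configured width");
```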
Bob Duff [Mon, 14 Jun 2021 13:37:24 +0000 (09:37 -0400)]
[Ada] Add source file name to gnat bug box
gcc/ada/
* comperr.adb (Compiler_Abort): Print source file name.
Joffrey Huguet [Thu, 10 Jun 2021 09:39:01 +0000 (11:39 +0200)]
[Ada] Fix layout of contracts
gcc/ada/
* libgnat/a-strunb.ads, libgnat/a-strunb__shared.ads: Fix layout
in contracts.
Eric Botcazou [Fri, 11 Jun 2021 07:11:13 +0000 (09:11 +0200)]
[Ada] Fix invalid JSON for derived variant record with -gnatRj
gcc/ada/
* repinfo.ads (JSON output format): Document adjusted key name.
* repinfo.adb (List_Record_Layout): Use Original_Record_Component
if the normalized position of the component is not known.
(List_Structural_Record_Layout): Rename Outer_Ent parameter into
Ext_Ent and add Ext_Level parameter. In an extension, if the parent
subtype has static discriminants, call List_Record_Layout on it.
Output "parent_" prefixes before "variant" according to Ext_Level.
Adjust recursive calls throughout the procedure.
Piotr Trojanek [Fri, 11 Jun 2021 14:01:36 +0000 (16:01 +0200)]
[Ada] Fix typo in comment related to derived discriminated types
gcc/ada/
* exp_util.ads (Map_Types): Fix typo.
Fedor Rybin [Fri, 4 Jun 2021 18:01:27 +0000 (21:01 +0300)]
[Ada] Fix index range violations in krunch
gcc/ada/
* krunch.adb: Add safeguards against index range violations.