Jakub Jelinek [Fri, 31 Jan 2020 18:35:11 +0000 (19:35 +0100)]
testsuite: Fix up pr91838.C test [PR91838]
The test FAILs on i686-linux with:
FAIL: g++.dg/pr91838.C (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
and on x86_64-linux with -m32 testing with failure to match the
expected pattern in there (or both with e.g. -m32/-mno-mmx/-mno-sse testing).
The test is also in a wrong directory, has non-standard specification that
it requires c++11 or later.
2020-01-31 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/91838
* g++.dg/pr91838.C: Moved to ...
* g++.dg/opt/pr91838.C: ... here. Require c++11 target instead of
dg-skip-if for c++98. Pass -Wno-psabi -w to avoid psabi style
warnings on vector arg passing or return. Add -masm=att on i?86/x86_64.
Only check for pxor %xmm0, %xmm0 on lp64 i?86/x86_64.
Richard Sandiford [Thu, 30 Jan 2020 15:46:28 +0000 (15:46 +0000)]
aarch64: Add Armv8.6 SVE bfloat16 support
This patch adds support for the SVE intrinsics that map to Armv8.6
bfloat16 instructions. This means that svcvtnt is now a base SVE
function for one type suffix combination; the others are still
SVE2-specific.
This relies on a binutils fix:
https://sourceware.org/ml/binutils/2020-01/msg00450.html
so anyone testing older binutils 2.34 or binutils master sources will
need to upgrade to get clean test results. (At the time of writing,
no released version of binutils has this bug.)
2020-01-31 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.h (TARGET_SVE_BF16): New macro.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtnt): Move to
aarch64-sve-builtins-base.h.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtnt): Move to
aarch64-sve-builtins-base.cc.
* config/aarch64/aarch64-sve-builtins-base.h (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins-base.cc (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
* config/aarch64/aarch64-sve-builtins-base.def (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
(svcvt): Add a form that converts f32 to bf16.
* config/aarch64/aarch64-sve-builtins-shapes.h (ternary_bfloat)
(ternary_bfloat_lane, ternary_bfloat_lanex2, ternary_bfloat_opt_n):
Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
Treat B as bfloat16_t.
(ternary_bfloat_lane_base): New class.
(ternary_bfloat_def): Likewise.
(ternary_bfloat): New shape.
(ternary_bfloat_lane_def): New class.
(ternary_bfloat_lane): New shape.
(ternary_bfloat_lanex2_def): New class.
(ternary_bfloat_lanex2): New shape.
(ternary_bfloat_opt_n_def): New class.
(ternary_bfloat_opt_n): New shape.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_cvt_bfloat): New macro.
* config/aarch64/aarch64-sve.md (@aarch64_sve_<sve_fp_op>vnx4sf)
(@aarch64_sve_<sve_fp_op>_lanevnx4sf): New patterns.
(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
(@cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
(@aarch64_sve_cvtnt<VNx8BF_ONLY:mode>): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_cvtnt<mode>): Key
the pattern off the narrow mode instead of the wider one.
* config/aarch64/iterators.md (VNx8BF_ONLY): New mode iterator.
(UNSPEC_BFMLALB, UNSPEC_BFMLALT, UNSPEC_BFMMLA): New unspecs.
(sve_fp_op): Handle them.
(SVE_BFLOAT_TERNARY_LONG): New int itertor.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_bf16_ok):
New proc.
* gcc.target/aarch64/sve/acle/asm/bfdot_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/bfdot_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_lane_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c: Likweise.
* gcc.target/aarch64/sve/acle/asm/cvtnt_bf16.c: Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_1.c: Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lane_1.c:
Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lanex2_1.c:
Likweise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
Likweise.
Richard Sandiford [Wed, 29 Jan 2020 16:06:58 +0000 (16:06 +0000)]
aarch64: Add svbfloat16_t support to arm_sve.h
This patch adds support for the bfloat16-related vectors to
arm_sve.h. It also adds support for functions that just treat
bfloat16_t as a bag of 16 bits; these functions are available
for bf16 whenever they're available for other 16-bit types.
Previously "all_data" was used for both data movement and for arithmetic
that happened to be defined for all data types. Adding bf16 means we
need to distinguish between the two cases.
The patch also reorders the mode definitions in aarch64-modes.def,
which means we no longer need separate VECTOR_MODE entries for BF
vectors.
2020-01-31 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/arm_sve.h: Include arm_bf16.h.
* config/aarch64/aarch64-modes.def (BF): Move definition before
VECTOR_MODES. Remove separate VECTOR_MODES for V4BF and V8BF.
(SVE_MODES): Handle BF modes.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
BF modes.
(aarch64_full_sve_mode): Likewise.
* config/aarch64/iterators.md (SVE_STRUCT): Add VNx16BF, VNx24BF
and VNx32BF.
(SVE_FULL, SVE_FULL_HSD, SVE_ALL): Add VNx8BF.
(Vetype, Vesize, Vctype, VEL, Vel, VEL_INT, V128, v128, vwcore)
(V_INT_EQUIV, v_int_equiv, V_FP_EQUIV, v_fp_equiv, vector_count)
(insn_length, VSINGLE, vsingle, VPRED, vpred, VDOUBLE): Handle the
new SVE BF modes.
* config/aarch64/aarch64-sve-builtins.h (TYPE_bfloat): New
type_class_index.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_arith): New macro.
(TYPES_all_data): Add bf16.
(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
(register_tuple_type): Increase buffer size.
* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): New type.
(bf16): New type suffix.
* config/aarch64/aarch64-sve-builtins-base.def (svabd, svadd, svaddv)
(svcmpeq, svcmpge, svcmpgt, svcmple, svcmplt, svcmpne, svmad, svmax)
(svmaxv, svmin, svminv, svmla, svmls, svmsb, svmul, svsub, svsubr):
Change type from all_data to all_arith.
* config/aarch64/aarch64-sve-builtins-sve2.def (svaddp, svmaxp)
(svminp): Likewise.
gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
of svbfloat16_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
__SVBfloat16_t.
* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create2_1.c (create_bf16): Likewise.
* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_bf16): Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_bf16): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ext_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get3_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get4_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/insr_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lasta_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lastb_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld3_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld4_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/len_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
(reinterpret_f16_bf16_tied1, reinterpret_f16_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
(reinterpret_f32_bf16_tied1, reinterpret_f32_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
(reinterpret_f64_bf16_tied1, reinterpret_f64_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
(reinterpret_s16_bf16_tied1, reinterpret_s16_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
(reinterpret_s32_bf16_tied1, reinterpret_s32_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
(reinterpret_s64_bf16_tied1, reinterpret_s64_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
(reinterpret_s8_bf16_tied1, reinterpret_s8_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
(reinterpret_u16_bf16_tied1, reinterpret_u16_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
(reinterpret_u32_bf16_tied1, reinterpret_u32_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
(reinterpret_u64_bf16_tied1, reinterpret_u64_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
(reinterpret_u8_bf16_tied1, reinterpret_u8_bf16_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/rev_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sel_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set3_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set4_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/splice_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st3_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st4_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/undef2_1.c (bfloat16_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef3_1.c (bfloat16_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef4_1.c (bfloat16_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef_1.c (bfloat16_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_bf16, ret_bf16x2)
(ret_bf16x3, ret_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_bf16, fn_bf16x2)
(fn_bf16x3, fn_bf16x4): Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (bfloat16x16_t): New
typedef.
(bfloat16_callee, bfloat16_caller): New tests.
* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (bfloat16x16_t): New
typedef.
(bfloat16_callee, bfloat16_caller): New tests.
* gcc.target/aarch64/sve/pcs/return_4.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_4_128.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_4_256.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_4_512.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_4_1024.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_4_2048.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5_128.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5_256.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5_512.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5_1024.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_5_2048.c (CALLER_BF16): New macro.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6_128.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6_256.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6_512.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6_1024.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_6_2048.c (bfloat16_t): New typedef.
(callee_bf16, caller_bf16): New tests.
* gcc.target/aarch64/sve/pcs/return_7.c (callee_bf16): Likewise
(caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_8.c (callee_bf16): Likewise
(caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_9.c (callee_bf16): Likewise
(caller_bf16): Likewise.
* gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c: Likewise.
Richard Sandiford [Tue, 28 Jan 2020 13:49:49 +0000 (13:49 +0000)]
aarch64: Add Armv8.6 SVE matrix multiply support
This mostly follows existing practice. Perhaps the only noteworthy
thing is that svmmla is split across three extensions (i8mm, f32mm
and f64mm), any of which can be enabled independently. The easiest
way of coping with this seemed to be to add a fourth svmmla entry
for base SVE, but with no type suffixes. This means that the
overloaded function is always available for C, but never successfully
resolves without the appropriate target feature.
2020-01-31 Dennis Zhang <dennis.zhang@arm.com>
Matthew Malcomson <matthew.malcomson@arm.com>
Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/invoke.texi (f32mm): Document new AArch64 -march= extension.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
__ARM_FEATURE_SVE_MATMUL_FP64 as appropriate. Don't define
__ARM_FEATURE_MATMUL_FP64.
* config/aarch64/aarch64-option-extensions.def (fp, simd, fp16)
(sve): Add AARCH64_FL_F32MM to the list of extensions that should
be disabled at the same time.
(f32mm): New extension.
* config/aarch64/aarch64.h (AARCH64_FL_F32MM): New macro.
(AARCH64_FL_F64MM): Bump to the next bit up.
(AARCH64_ISA_F32MM, TARGET_SVE_I8MM, TARGET_F32MM, TARGET_SVE_F32MM)
(TARGET_SVE_F64MM): New macros.
* config/aarch64/iterators.md (SVE_MATMULF): New mode iterator.
(UNSPEC_FMMLA, UNSPEC_SMATMUL, UNSPEC_UMATMUL, UNSPEC_USMATMUL)
(UNSPEC_TRN1Q, UNSPEC_TRN2Q, UNSPEC_UZP1Q, UNSPEC_UZP2Q, UNSPEC_ZIP1Q)
(UNSPEC_ZIP2Q): New unspeccs.
(DOTPROD_US_ONLY, PERMUTEQ, MATMUL, FMMLA): New int iterators.
(optab, sur, perm_insn): Handle the new unspecs.
(sve_fp_op): Handle UNSPEC_FMMLA. Resort.
* config/aarch64/aarch64-sve.md (@aarch64_sve_ld1ro<mode>): Use
TARGET_SVE_F64MM instead of separate tests.
(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod<vsi2qi>): New pattern.
(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod_lane<vsi2qi>): Likewise.
(@aarch64_sve_add_<MATMUL:optab><vsi2qi>): Likewise.
(@aarch64_sve_<FMMLA:sve_fp_op><mode>): Likewise.
(@aarch64_sve_<PERMUTEQ:optab><mode>): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_s_float): New macro.
(TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer): Use it.
(TYPES_s_signed): New macro.
(TYPES_s_integer): Use it.
(TYPES_d_float): New macro.
(TYPES_d_data): Use it.
* config/aarch64/aarch64-sve-builtins-shapes.h (mmla): Declare.
(ternary_intq_uintq_lane, ternary_intq_uintq_opt_n, ternary_uintq_intq)
(ternary_uintq_intq_lane, ternary_uintq_intq_opt_n): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): New class.
(svmmla): New shape.
(ternary_resize2_opt_n_base): Add TYPE_CLASS2 and TYPE_CLASS3
template parameters.
(ternary_resize2_lane_base): Likewise.
(ternary_resize2_base): New class.
(ternary_qq_lane_base): Likewise.
(ternary_intq_uintq_lane_def): Likewise.
(ternary_intq_uintq_lane): New shape.
(ternary_intq_uintq_opt_n_def): New class
(ternary_intq_uintq_opt_n): New shape.
(ternary_qq_lane_def): Inherit from ternary_qq_lane_base.
(ternary_uintq_intq_def): New class.
(ternary_uintq_intq): New shape.
(ternary_uintq_intq_lane_def): New class.
(ternary_uintq_intq_lane): New shape.
(ternary_uintq_intq_opt_n_def): New class.
(ternary_uintq_intq_opt_n): New shape.
* config/aarch64/aarch64-sve-builtins-base.h (svmmla, svsudot)
(svsudot_lane, svtrn1q, svtrn2q, svusdot, svusdot_lane, svusmmla)
(svuzp1q, svuzp2q, svzip1q, svzip2q): Declare.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_lane_impl):
Generalize to...
(svdotprod_lane_impl): ...this new class.
(svmmla_impl, svusdot_impl): New classes.
(svdot_lane): Update to use svdotprod_lane_impl.
(svmmla, svsudot, svsudot_lane, svtrn1q, svtrn2q, svusdot)
(svusdot_lane, svusmmla, svuzp1q, svuzp2q, svzip1q, svzip2q): New
functions.
* config/aarch64/aarch64-sve-builtins-base.def (svmmla): New base
function, with no types defined.
(svmmla, svusmmla, svsudot, svsudot_lane, svusdot, svusdot_lane): New
AARCH64_FL_I8MM functions.
(svmmla): New AARCH64_FL_F32MM function.
(svld1ro): Depend only on AARCH64_FL_F64MM, not on AARCH64_FL_V8_6.
(svmmla, svtrn1q, svtrn2q, svuz1q, svuz2q, svzip1q, svzip2q): New
AARCH64_FL_F64MM function.
(REQUIRED_EXTENSIONS):
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_i8mm_ok)
(check_effective_target_aarch64_asm_f32mm_ok): New target selectors.
* gcc.target/aarch64/pragma_cpp_predefs_2.c: Test handling of
__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
__ARM_FEATURE_SVE_MATMUL_FP64.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_TRIPLE_Z):
(TEST_TRIPLE_Z_REV2, TEST_TRIPLE_Z_REV, TEST_TRIPLE_LANE_REG)
(TEST_TRIPLE_ZX): New macros.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Remove +sve and
rely on +f64mm to enable it.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise,
* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise,
* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise,
* gcc.target/aarch64/sve/acle/asm/sudot_lane_s32.c: Likewise,
* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: Likewise,
* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/usdot_lane_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/usdot_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_4.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_5.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_6.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/mmla_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_opt_n_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_opt_n_1.c:
Likewise.
Richard Sandiford [Fri, 31 Jan 2020 13:56:31 +0000 (13:56 +0000)]
aarch64: Fix SVE PCS failures for BE & ILP32
This patch should (finally!) give clean test results for
aarch64-sve-pcs.exp for all {be,le}{lp64,ilp32} combinations.
The *_128.c tests require aarch64_little_endian because they test for
fixed-length 128-bit code, whereas -msve-vector-bits=128 still generates
VLA code for big-endian.
Some tests require lp64 because they match (64-bit) pointer loads and
stores. Others require it because ilp32 adds extra zero extensions.
We still have a non-trivial amount of coverage for -mbig-endian -mabi=ilp32:
# of expected passes 663
# of unsupported tests 59
2020-01-31 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve/pcs/args_1.c: Require lp64 for
check-function-bodies tests.
* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Require lp64.
* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_4_128.c: Require lp64 and
aarch64_little_endian for check-function-bodies tests.
* gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_1_128.c: Likewise. Remove
target selector from dg-compile.
* gcc.target/aarch64/sve/pcs/return_6_128.c: Likewise.
Patrick Palka [Tue, 21 Jan 2020 22:00:43 +0000 (17:00 -0500)]
libstdc++: Always return a sentinel<I> from __gnu_test::test_range::end()
It seems that in practice std::sentinel_for<I, I> is always true, and so the
test_range container doesn't help us detect bugs in ranges code in which we
wrongly assume that a sentinel can be manipulated like an iterator. Make the
test_range range more strict by having end() unconditionally return a
sentinel<I>, and adjust some tests accordingly.
libstdc++-v3/ChangeLog:
* testsuite/24_iterators/range_operations/distance.cc: Do not assume
test_range::end() returns the same type as test_range::begin().
* testsuite/24_iterators/range_operations/next.cc: Likewise.
* testsuite/24_iterators/range_operations/prev.cc: Likewise.
* testsuite/util/testsuite_iterators.h (__gnu_test::test_range::end):
Always return a sentinel<I>.
Andrew Stubbs [Wed, 29 Jan 2020 16:59:08 +0000 (16:59 +0000)]
Fix conditional add LRA failure for amdgcn
Fix ICE in testcase gfortran.dg/assumed_rank_bounds_3.f90.
2020-01-31 Andrew Stubbs <ams@codesourcery.com>
gcc/
* config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each
alternative only.
Uros Bizjak [Fri, 31 Jan 2020 15:44:36 +0000 (16:44 +0100)]
Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.
The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is
only insn size, as advised in e.g. Software Optimization Guide for the
AMD Family 15h Processors [1], section 7.1.2, where it is said:
--quote--
7.1.2 Reduce Instruction SizeOptimization
Reduce the size of instructions when possible.
Rationale
Using smaller instruction sizes improves instruction fetch throughput.
Specific examples include the following:
*In SIMD code, use the single-precision (PS) form of instructions
instead of the double-precision (PD) form. For example, for register
to register moves, MOVAPS achieves the same result as MOVAPD, but uses
one less byte to encode the instruction and has no prefix byte. Other
examples in which single-precision forms can be substituted for
double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
and SHUFPS.
...
--/quote--
Please note that this optimization applies only to non-AVX forms, as
demonstrated by:
0: 0f 28 c8 movaps %xmm0,%xmm1
3: 66 0f 28 c8 movapd %xmm0,%xmm1
7: c5 f8 28 d1 vmovaps %xmm1,%xmm2
b: c5 f9 28 d1 vmovapd %xmm1,%xmm2
Also note that MOVDQA is missing in the above optimization. It is
harmful to substitute MOVDQA with MOVAPS, as it can (and does)
introduce +1 cycle forwarding penalty between FLT (FPA/FPM) and INT
(VALU) FP clusters.
[1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf
Kwok Cheung Yeung [Fri, 31 Jan 2020 14:53:30 +0000 (06:53 -0800)]
[amdgcn] Scale number of threads/workers with VGPR usage
2020-01-31 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count
to definition of hsa_kernel_description. Parse assembly to find SGPR
and VGPR count of kernel and store in hsa_kernel_description.
libgomp/
* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
and vgpr_count fields.
(struct kernel_info): Add a field for a hsa_kernel_description.
(run_kernel): Reduce the number of threads/workers if the requested
number would require too many VGPRs.
(init_basic_kernel_info): Initialize description field with
the hsa_kernel_description entry for the kernel.
Tobias Burnus [Fri, 31 Jan 2020 14:54:21 +0000 (15:54 +0100)]
[Fortran] Disable front-end optimization for OpenACC atomic (PR93462)
PR fortran/93462
* frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set
in_omp_atomic to true prevent front-end optimization.
PR fortran/93462
* gfortran.dg/goacc/atomic-1.f90: New.
Tamar Christina [Fri, 31 Jan 2020 14:39:38 +0000 (14:39 +0000)]
middle-end: Fix logical shift truncation (PR rtl-optimization/91838)
This fixes a fall-out from a patch I had submitted two years ago which started
allowing simplify-rtx to fold logical right shifts by offsets a followed by b
into >> (a + b).
However this can generate inefficient code when the resulting shift count ends
up being the same as the size of the shift mode. This will create some
undefined behavior on most platforms.
This patch changes to code to truncate to 0 if the shift amount goes out of
range. Before my older patch this used to happen in combine when it saw the
two shifts. However since we combine them here combine never gets a chance to
truncate them.
The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal
with this shift constant but it's better to do the right thing in simplify-rtx.
Note that this doesn't take care of the Arithmetic shift where you could replace
the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it.
gcc/ChangeLog:
PR rtl-optimization/91838
* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
to truncate if allowed or reject combination.
gcc/testsuite/ChangeLog:
PR rtl-optimization/91838
* g++.dg/pr91838.C: New test.
Andrew Stubbs [Thu, 30 Jan 2020 14:06:12 +0000 (14:06 +0000)]
Fix fast-math-pr55281.c ICE
2020-01-31 Andrew Stubbs <ams@codesourcery.com>
gcc/
* tree-ssa-loop-ivopts.c (get_iv): Use sizetype for zero-step.
(find_inv_vars_cb): Likewise.
David Malcolm [Sun, 26 Jan 2020 23:40:43 +0000 (18:40 -0500)]
calls.c: refactor special_function_p for use by analyzer (v2)
This patch refactors some code in special_function_p that checks for
the function being sane to match by name, splitting it out into a new
maybe_special_function_p, and using it it two places in the analyzer.
gcc/analyzer/ChangeLog:
* analyzer.cc (is_named_call_p): Replace tests for fndecl being
extern at file scope and having a non-NULL DECL_NAME with a call
to maybe_special_function_p.
* function-set.cc (function_set::contains_decl_p): Add call to
maybe_special_function_p.
gcc/ChangeLog:
* calls.c (special_function_p): Split out the check for DECL_NAME
being non-NULL and fndecl being extern at file scope into a
new maybe_special_function_p and call it. Drop check for fndecl
being non-NULL that was after a usage of DECL_NAME (fndecl).
* tree.h (maybe_special_function_p): New inline function.
David Malcolm [Thu, 30 Jan 2020 20:21:28 +0000 (15:21 -0500)]
analyzer: further fixes for comparisons between uncomparable types (PR 93450)
gcc/analyzer/ChangeLog:
PR analyzer/93450
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Only compare constants
if their types are compatible.
* region-model.cc (constant_svalue::eval_condition): Replace check
for identical types with call to types_compatible_p.
Andrew Stubbs [Wed, 29 Jan 2020 16:57:02 +0000 (16:57 +0000)]
Zero-initialise masked load destinations
Fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90.
2020-01-30 Andrew Stubbs <ams@codesourcery.com>
gcc/
* config/gcn/gcn-valu.md (gather<mode>_exec): Move contents ...
(mask_gather_load<mode>): ... here, and zero-initialize the
destination.
(maskload<mode>di): Zero-initialize the destination.
* config/gcn/gcn.c:
David Malcolm [Thu, 30 Jan 2020 21:59:15 +0000 (16:59 -0500)]
analyzer: add extrinsic_state::dump
gcc/analyzer/ChangeLog:
* program-state.cc (extrinsic_state::dump_to_pp): New.
(extrinsic_state::dump_to_file): New.
(extrinsic_state::dump): New.
* program-state.h (extrinsic_state::dump_to_pp): New decl.
(extrinsic_state::dump_to_file): New decl.
(extrinsic_state::dump): New decl.
* sm.cc: Include "pretty-print.h".
(state_machine::dump_to_pp): New.
* sm.h (state_machine::dump_to_pp): New decl.
David Malcolm [Thu, 30 Jan 2020 21:38:35 +0000 (16:38 -0500)]
analyzer: make extrinsic_state field private
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (for_each_state_change): Use
extrinsic_state::get_num_checkers rather than accessing m_checkers
directly.
* program-state.cc (program_state::program_state): Likewise.
* program-state.h (extrinsic_state::m_checkers): Make private.
GCC Administrator [Fri, 31 Jan 2020 00:16:32 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Thu, 30 Jan 2020 20:44:59 +0000 (15:44 -0500)]
analyzer: avoid using <string.h> in malloc-1.c
This test assumes that memset and strlen have been marked with
__attribute__((nonnull)), which isn't necessarily the case for an
arbitrary <string.h>. This likely explains these failures:
FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 417)
FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 418)
FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 425)
FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 429)
seen in https://gcc.gnu.org/ml/gcc-testresults/2020-01/msg01608.html
on x86_64-apple-darwin18.
Fix it by using the __builtin_ forms.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/malloc-1.c: Remove include of <string.h>.
Use __builtin_ forms of memset and strlen throughout.
David Malcolm [Thu, 30 Jan 2020 19:57:34 +0000 (14:57 -0500)]
analyzer: convert conditionals-2.c to a torture test
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/conditionals-2.c: Move to...
* gcc.dg/analyzer/torture/conditionals-2.c: ...here, converting
to a torture test. Remove redundant include.
David Malcolm [Thu, 30 Jan 2020 17:35:46 +0000 (12:35 -0500)]
analyzer: fix ICE in __builtin_isnan (PR 93356)
PR analyzer/93356 reports an ICE handling __builtin_isnan due to a
failing assertion:
674 gcc_assert (lhs_ec_id != rhs_ec_id);
with op=UNORDERED_EXPR.
when attempting to add an UNORDERED_EXPR constraint.
This is an overzealous assertion, but underlying it are various forms of
sloppiness regarding NaN within the analyzer:
(a) the assumption in the constraint_manager that equivalence classes
are reflexive (X == X), which isn't the case for NaN.
(b) Hardcoding the "honor_nans" param to false when calling
invert_tree_comparison throughout the analyzer.
(c) Ignoring ORDERED_EXPR, UNORDERED_EXPR, and the UN-prefixed
comparison codes.
I wrote a patch for this which tracks the NaN-ness of floating-point
values and uses this to address all of the above.
However, to minimize changes in gcc 10 stage 4, here's a simpler patch
which rejects attempts to query or add constraints on floating-point
values, instead treating any floating-point comparison as "unknown", and
silently dropping the constraints at edges.
gcc/analyzer/ChangeLog:
PR analyzer/93356
* region-model.cc (region_model::eval_condition): In both
overloads, bail out immediately on floating-point types.
(region_model::eval_condition_without_cm): Likewise.
(region_model::add_constraint): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/93356
* gcc.dg/analyzer/conditionals-notrans.c (test_float_selfcmp):
Add.
* gcc.dg/analyzer/conditionals-trans.c: Mark floating point
comparison test as failing.
(test_float_selfcmp): Add.
* gcc.dg/analyzer/data-model-1.c: Mark floating point comparison
tests as failing.
* gcc.dg/analyzer/torture/pr93356.c: New test.
gcc/ChangeLog:
PR analyzer/93356
* doc/analyzer.texi (Limitations): Note that constraints on
floating-point values are currently ignored.
Jeff Law [Thu, 30 Jan 2020 21:09:41 +0000 (14:09 -0700)]
Mark switch expression as used to avoid bogus warning
PR c/88660
* c-parser.c (c_parser_switch_statement): Make sure to request
marking the switch expr as used.
PR c/88660
* gcc.dg/pr88660.c: New test.
Jakub Jelinek [Thu, 30 Jan 2020 20:32:36 +0000 (21:32 +0100)]
cgraph: Avoid creating multiple *.localalias aliases with the same name [PR93384]
The following testcase FAILs on powerpc64le-linux with assembler errors, as we
emit a call to bar.localalias, then .set bar.localalias, bar twice and then
another call to bar.localalias. The problem is that bar.localalias can be created
at various stages and e.g. ipa-pure-const can slightly adjust the original decl,
so that the existing bar.localalias isn't considered usable (different
flags_from_decl_or_type). In that case, we'd create another bar.localalias, which
clashes with the existing name.
Fixed by retrying with another name if it is already present. The various localalias
aliases shouldn't be that many, from different partitions they would be lto_priv
suffixed and in most cases they would already have the same type/flags/attributes.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR lto/93384
* symtab.c (symtab_node::noninterposable_alias): If localalias
already exists, but is not usable, append numbers after it until
a unique name is found. Formatting fix.
* gcc.dg/lto/pr93384_0.c: New test.
* gcc.dg/lto/pr93384_1.c: New file.
Jakub Jelinek [Thu, 30 Jan 2020 20:28:17 +0000 (21:28 +0100)]
combine: Punt on out of range rotate counts [PR93505]
What happens on this testcase is with the out of bounds rotate we get:
Trying 13 -> 16:
13: r129:SI=r132:DI#0<-<0x20
REG_DEAD r132:DI
16: r123:DI=r129:SI<0
REG_DEAD r129:SI
Successfully matched this instruction:
(set (reg/v:DI 123 [ <retval> ])
(const_int 0 [0]))
during combine. So, perhaps we could also change simplify-rtx.c to punt
if it is out of bounds rather than trying to optimize anything.
Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
have out of bounds counts or introduce targetm.rotate_truncation_mask,
we should truncate the argument instead of punting.
Punting is better for backports though.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93505
* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
rotate counts.
* gcc.c-torture/compile/pr93505.c: New test.
Jason Merrill [Thu, 30 Jan 2020 18:12:05 +0000 (13:12 -0500)]
c++: Fix -Wtype-limits in templates.
When instantiating a template tsubst_copy_and_build suppresses -Wtype-limits
warnings about e.g. == always being false because it might not always be
false for an instantiation with other template arguments. But we should
warn if the operands don't depend on template arguments.
PR c++/82521
* pt.c (tsubst_copy_and_build) [EQ_EXPR]: Only suppress warnings if
the expression was dependent before substitution.
Andrew Benson [Thu, 30 Jan 2020 17:47:00 +0000 (17:47 +0000)]
Remove check for maximum symbol name length.
PR fortran/87103
* expr.c (gfc_check_conformance): Check vsnprintf for truncation.
* iresolve.c (gfc_get_string): Likewise.
* symbol.c (gfc_new_symbol): Remove check for maximum symbol
name length. Remove redundant 0 setting of new calloc()ed
gfc_symbol.
Andrew Stubbs [Wed, 29 Jan 2020 11:35:07 +0000 (11:35 +0000)]
Add LTGT operator support for amdgcn
Fixes ICE in testcase gcc.dg/pr81228.c
2020-01-30 Andrew Stubbs <ams@codesourcery.com>
gcc/
* config/gcn/gcn.c (print_operand): Handle LTGT.
* config/gcn/predicates.md (gcn_fp_compare_operator): Allow ltgt.
Jeff Law [Thu, 30 Jan 2020 16:39:57 +0000 (09:39 -0700)]
Fix "regression" reported by c6x testing.
* gcc.dg/tree-ssa/ssa-dse-26.c: Make existing dg-final scan
conditional on !c6x. Add dg-final scan pattern for c6x.
Martin Sebor [Thu, 30 Jan 2020 15:46:23 +0000 (08:46 -0700)]
PR middle-end/92323 - bogus -Warray-bounds after unrolling despite __builtin_unreachable
gcc/testsuite/ChangeLog:
* gcc.dg/Warray-bounds-57.c: New test.
Richard Biener [Thu, 30 Jan 2020 14:43:09 +0000 (15:43 +0100)]
dump CTORs properly wrapped with _Literal with -gimple
This wraps { ... } in _Literal (type) for consumption by the GIMPLE FE.
2020-01-30 Richard Biener <rguenther@suse.de>
* tree-pretty-print.c (dump_generic_node): Wrap VECTOR_CST
and CONSTRUCTOR in _Literal (type) with TDF_GIMPLE.
David Malcolm [Thu, 30 Jan 2020 01:24:42 +0000 (20:24 -0500)]
analyzer: avoid comparisons between uncomparable types (PR 93450)
PR analyzer/93450 reports an ICE trying to fold an EQ_EXPR comparison
of (int)0 with (float)Inf.
Most comparisons inside the analyzer come from gimple conditions, for
which the necessary casts have already been added.
This one is done inside constant_svalue::eval_condition as part of
purging sm-state for an unknown function call, and fails to check
the types being compared, leading to the ICE.
sm_state_map::set_state calls region_model::eval_condition_without_cm in
order to handle pointer equality (so that e.g. (void *)&r and (foo *)&r
transition together), which leads to this code generating a bogus query
to see if the two constants are equal.
This patch fixes the ICE in two ways:
- It avoids generating comparisons within
constant_svalue::eval_condition unless the types are equal (thus for
constants, but not for pointer values, which are handled by
region_svalue).
- It updates sm_state_map::set_state to bail immediately if the new
state is the same as the old one, thus avoiding the above for the
common case where an svalue_id has no sm-state (such as for the int
and float constants in the reproducer), for which the above becomes a
no-op.
gcc/analyzer/ChangeLog:
PR analyzer/93450
* program-state.cc (sm_state_map::set_state): For the overload
taking an svalue_id, bail out if the set_state on the ec does
nothing. Convert the latter's return type from void to bool,
returning true if anything changed.
(sm_state_map::impl_set_state): Convert the return type from void
to bool, returning true if the state changed.
* program-state.h (sm_state_map::set_state): Convert return type
from void to bool.
(sm_state_map::impl_set_state): Likewise.
* region-model.cc (constant_svalue::eval_condition): Only call
fold_build2 if the types are the same.
gcc/testsuite/ChangeLog:
PR analyzer/93450
* gcc.dg/analyzer/torture/pr93450.c: New test.
John David Anglin [Thu, 30 Jan 2020 12:26:58 +0000 (07:26 -0500)]
Fix ICE in pa_elf_select_rtx_section.
2020-01-30 John David Anglin <danglin@gcc.gnu.org>
* config/pa/pa.c (pa_elf_select_rtx_section): Place function pointers
without a DECL in .data.rel.ro.local.
Jakub Jelinek [Thu, 30 Jan 2020 11:58:20 +0000 (12:58 +0100)]
arm: Fix uaddvdi4 expander [PR93494]
uaddvdi4 expander has an optimization for the low 32-bits of the 2nd input
operand known to be 0. Unfortunately, in that case it only emits copying of
the low 32 bits to the low 32 bits of the destination, but doesn't emit the
addition with overflow detection for the high 64 bits.
Well, to be precise, it emits it, but into an RTL sequence returned by
gen_uaddvsi4, but that sequence isn't emitted anywhere.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR target/93494
* config/arm/arm.md (uaddvdi4): Actually emit what gen_uaddvsi4
returned.
* gcc.c-torture/execute/pr93494.c: New test.
Tobias Burnus [Thu, 30 Jan 2020 11:27:17 +0000 (12:27 +0100)]
Skip plugin-{gcn,hsa} for (-m)x32 (PR bootstrap/93409)
PR bootstrap/93409
* plugin/configfrag.ac (enable_offload_targets): Skip
HSA and GCN plugin besides -m32 also for -mx32.
* configure: Regenerate.
Paolo Carlini [Thu, 30 Jan 2020 10:39:04 +0000 (11:39 +0100)]
Add testcase of PR c++/90338, already fixed in trunk.
PR c++/90338
* g++.dg/pr90338.C: New.
Jakub Jelinek [Thu, 30 Jan 2020 08:41:00 +0000 (09:41 +0100)]
i386: Optimize {,v}{,p}movmsk{b,ps,pd} followed by sign extension [PR91824]
Some time ago, patterns were added to optimize move mask followed by zero
extension from 32 bits to 64 bit. As the testcase shows, the intrinsics
actually return int, not unsigned int, so it will happen quite often that
one actually needs sign extension instead of zero extension. Except for
vpmovmskb with 256-bit operand, sign vs. zero extension doesn't make a
difference, as we know the bit 31 will not be set (the source will have 2 or
4 doubles, 4 or 8 floats or 16 or 32 chars).
So, for the floating point patterns, this patch just uses a code iterator
so that we handle both zero extend and sign extend, and for the byte one
adds a separate pattern for the 128-bit operand.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR target/91824
* config/i386/sse.md
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext): Renamed to ...
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): ... this. Use
any_extend code iterator instead of always zero_extend.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext_lt): Renamed to ...
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): ... this.
Use any_extend code iterator instead of always zero_extend.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext_shift): Renamed to ...
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): ... this.
Use any_extend code iterator instead of always zero_extend.
(*sse2_pmovmskb_ext): New define_insn.
(*sse2_pmovmskb_ext_lt): New define_insn_and_split.
* gcc.target/i386/pr91824-2.c: New test.
Jakub Jelinek [Thu, 30 Jan 2020 08:39:05 +0000 (09:39 +0100)]
i386: Optimize popcnt followed by zero/sign extension [PR91824]
Like any other instruction with 32-bit GPR destination operand in 64-bit
mode, popcntl also clears the upper 32 bits of the register (and other bits
too, it can return only 0 to 32 inclusive).
During combine, the zero or sign extensions of it show up as paradoxical
subreg of the popcount & 63, there 63 is the smallest power of two - 1 mask
that can represent all the 0 to 32 inclusive values.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR target/91824
* config/i386/i386.md (*popcountsi2_zext): New define_insn_and_split.
(*popcountsi2_zext_falsedep): New define_insn.
* gcc.target/i386/pr91824-1.c: New test.
Jakub Jelinek [Thu, 30 Jan 2020 08:35:03 +0000 (09:35 +0100)]
fortran: Fix up ISO_Fortran_binding_15.f90 failures [PR92123]
This is something that has been discussed already a few months ago, but
seems to have stalled. Here is Paul's patch from the PR except for the
TREE_STATIC hunk which is wrong, and does the most conservative fn spec
tweak for the problematic two builtins we are aware of (to repeat what is in
the PR, both .wR and .ww are wrong for these builtins that transform one
layout of an descriptor to another one; while the first pointer is properly
marked that we only store to what it points to, from the second pointer
we copy and reshuffle the content and store into the first one; if there
wouldn't be any pointers, ".wr" would be just fine, but as there is a
pointer and that pointer is copied to the area pointed by first argument,
the pointer effectively leaks that way, so we e.g. can't optimize stores
into what the data pointer in the descriptor points to). I haven't
analyzed other fn spec attributes in the FE, but think it is better to
fix at least this one we have analyzed.
2020-01-30 Paul Thomas  <pault@gcc.gnu.org>
Jakub Jelinek <jakub@redhat.com>
PR fortran/92123
* trans-decl.c (gfc_get_symbol_decl): Call gfc_defer_symbol_init for
CFI descs.
(gfc_build_builtin_function_decls): Use ".w." instead of ".ww" or ".wR"
for gfor_fndecl_{cfi_to_gfc,gfc_to_cfi}.
(convert_CFI_desc): Handle references to CFI descriptors.
Co-authored-by: Paul Thomas <pault@gcc.gnu.org>
Dragan Mladjenovic [Thu, 30 Jan 2020 07:19:49 +0000 (08:19 +0100)]
Regenerate configure for 54b3d52
Commit 54b3d52 ("Emit .note.GNU-stack for hard-float linux targets.")
was missing generated files. Add them now.
gcc/ChangeLog:
2020-01-30 Dragan Mladjenovic <dmladjenovic@wavecomp.com>
* config.in: Regenerated.
* configure: Regenerated.
Bin Cheng [Thu, 30 Jan 2020 04:33:47 +0000 (12:33 +0800)]
Use promise in coroutine frame in actor function.
By standard, coroutine body should be encapsulated in try-catch block
as following:
try {
// coroutine body
} catch(...) {
promise.unhandled_exception();
}
Given above try-catch block is implemented in the coroutine actor
function called by coroutine ramp function, so the promise should
be accessed via actor function's coroutine frame pointer argument,
rather than the ramp function's coroutine frame variable.
This patch cleans code a bit to make fix easy.
gcc/cp
* coroutines.cc (act_des_fn): New.
(morph_fn_to_coro): Call act_des_fn to build actor/destroy decls.
Access promise via actor function's frame pointer argument.
(build_actor_fn, build_destroy_fn): Use frame pointer argument.
Bin Cheng [Thu, 30 Jan 2020 04:10:36 +0000 (12:10 +0800)]
Handle CO_AWAIT_EXPR in conversion in co_await_expander.
Function co_await_expander expands CO_AWAIT_EXPR and inserts expanded
code before result of co_await is used, however, it doesn't cover the
type conversion case and leads to gimplify ICE. This patch fixes it.
gcc/cp
* coroutines.cc (co_await_expander): Handle type conversion case.
gcc/testsuite
* g++.dg/coroutines/co-await-syntax-09-convert.C: New test.
Ian Lance Taylor [Thu, 30 Jan 2020 00:42:01 +0000 (16:42 -0800)]
runtime, syscall: add a couple of hurd build tags
Patch by Svante Signell.
Updates PR go/93468
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/216959
Ian Lance Taylor [Thu, 30 Jan 2020 00:36:25 +0000 (16:36 -0800)]
runtime: update netpoll_hurd.go for go1.14beta1 changes
Patch from Svante Signell.
Updates PR go/93468
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/216958
Jason Merrill [Wed, 29 Jan 2020 22:16:12 +0000 (17:16 -0500)]
c++: Drop alignas restriction for stack variables.
Since expand_stack_vars and such know how to deal with variables aligned
beyond MAX_SUPPORTED_STACK_ALIGNMENT, we shouldn't reject alignas of large
alignments. And if we don't do that, there's no point in having
check_cxx_fundamental_alignment_constraints at all, since
check_user_alignment already enforces MAX_OFILE_ALIGNMENT.
PR c++/89357
* c-attribs.c (check_cxx_fundamental_alignment_constraints): Remove.
GCC Administrator [Thu, 30 Jan 2020 00:16:31 +0000 (00:16 +0000)]
Daily bump.
Jakub Jelinek [Thu, 30 Jan 2020 00:01:55 +0000 (01:01 +0100)]
testsuite: Fix up tree-ssa/pr92706-1.c on 32-bit targets.
The test uses __int128_t, so won't work on targets that don't support it.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/92706
* gcc.dg/tree-ssa/pr92706-1.c: Require int128 effective target.
Jason Merrill [Tue, 28 Jan 2020 22:41:05 +0000 (17:41 -0500)]
c++: Fix attributes with lambda and trailing return type.
My fix for 60503 fixed handling of C++11 attributes following the
lambda-declarator. My patch for 89640 re-added support for GNU attributes,
but attributes after the trailing return type were parsed as applying to the
return type rather than to the function. This patch adjusts parsing of a
trailing-return-type to ignore GNU attributes at the end of the declaration
so that they will be applied to the declaration as a whole.
I also considered parsing the attributes between the closing paren and the
trailing-return-type, and tried a variety of approaches to implementing
that, but I think it's better to stick with the documented rule that "An
attribute specifier list may appear immediately before the comma, '=' or
semicolon terminating the declaration of an identifier...." Anyone
disagree?
Meanwhile, C++ committee discussion about the lack of any way to apply
attributes to a lambda op() seems to have concluded that they should go
between the introducer and declarator, so I've implemented that as well.
PR c++/90333
PR c++/89640
PR c++/60503
* parser.c (cp_parser_type_specifier_seq): Don't parse attributes in
a trailing return type.
(cp_parser_lambda_declarator_opt): Parse C++11 attributes before
parens.
Tobias Burnus [Wed, 29 Jan 2020 22:45:55 +0000 (23:45 +0100)]
GCN – call assembler with -mattr=-code-object-v3 (PR93409)
PR bootstrap/93409
* config/gcn/gcn-hsa.h (ASM_SPEC): Add -mattr=-code-object-v3 as
LLVM's assembler changed the default in version 9.
Marek Polacek [Wed, 29 Jan 2020 20:12:46 +0000 (15:12 -0500)]
c++: Add new test [PR88092]
This test got fixed by r10-1976-gdaaa6fcc70ffe66bd56f5819ad4ee78fecd54bb6
so let's add it to the testsuite.
PR c++/88092
* g++.dg/cpp2a/nontype-class31.C: New test.
Jeff Law [Wed, 29 Jan 2020 19:23:53 +0000 (12:23 -0700)]
Improve DSE which in turn eliminates the need for jump threading and block duplication for the original testcase in pr89689 which in turn eliminates the false positive -Warray-bounds warning for the original testcase.
PR tree-optimization/89689
* builtins.def (BUILT_IN_OBJECT_SIZE): Make it const rather than pure.
PR tree-optimization/89689
* gcc.dg/pr89689.c: New test.
Richard Sandiford [Wed, 29 Jan 2020 18:56:35 +0000 (18:56 +0000)]
Revert g-
465c7c89e92a6d6d582173e505cb16dcb9873034
The patch caused regressions in gcc.target/sh/pr64345-1.c on
sh3-linux-gnu and gcc.target/m68k/pr39726.c on m68k-linux-gnu.
It didn't look like they would be fixable in an acceptably
non-invasive and unhacky way, so punting till future releases.
2020-01-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Revert:
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
PR rtl-optimization/87763
* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
simplification to handle subregs as well as bare regs.
* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
Marek Polacek [Wed, 29 Jan 2020 17:02:14 +0000 (12:02 -0500)]
c++: Fix template arguments comparison with class NTTP [PR91754]
Here we fail to compile the attached test, stating that the use of
T<s> in T<s>::T() {} is "invalid use of incomplete type". It is a
function definition so grokdeclarator checks that the qualifying type
is complete.
When we parsed the class T, finish_struct gave the class a non-null
TYPE_SIZE, making it COMPLETE_TYPE_P. But then we're parsing T<s>,
a TEMPLATE_ID, in
T<s>::T() {}
so try to lookup_template_class T. This failed because we couldn't
find such a class: comp_template_args told us that the argument lists
don't match, because one of the args was wrapped in a VIEW_CONVERT_EXPR
to make it look const. It seems to me that we should see through
these artificial wrappers and consider the args same.
2020-01-29 Marek Polacek <polacek@redhat.com>
PR c++/91754 - Fix template arguments comparison with class NTTP.
* pt.c (class_nttp_const_wrapper_p): New.
(template_args_equal): See through class_nttp_const_wrapper_p
arguments.
* g++.dg/cpp2a/nontype-class30.C: New test.
Marek Polacek [Tue, 28 Jan 2020 22:44:21 +0000 (17:44 -0500)]
c++: Fix class NTTP with template arguments [PR92948]
This PR points out an ICE with an alias template and class NTTP, but I
found that there are more issues. Trouble arise when we use a
(non-type) template parameter as an argument to the template arg list of
a template that accepts a class NTTP and a conversion to a class is
involved, e.g.
struct A { A(int) { } };
template<A a> struct B { };
template<int X> void fn () {
B<X> b;
}
Normally for such a conversion we create a TARGET_EXPR with an
AGGR_INIT_EXPR that calls a __ct_comp with some arguments, but not in
this case: when converting X to type A in convert_nontype_argument we
are in a template and the template parameter 'X' is value-dependent, and
AGGR_INIT_EXPR don't work in templates. So it just creates a TARGET_EXPR
that contains "A::A(*this, X)", but with that overload resolution fails:
error: no matching function for call to 'A::A(A*, int)'
That happens because finish_call_expr for a BASELINK creates a dummy
object, so we have 'this' twice. I thought for the value-dependent case
we could use IMPLICIT_CONV_EXPR, as in the patch below. Note that I
only do this when we convert to a different type than the type of the
expr. The point is to avoid the call to a converting ctor.
The second issue was an ICE in tsubst_copy: when there's a conversion
like the above involved then
/* Wrapper to make a C++20 template parameter object const. */
op = tsubst_copy (op, args, complain, in_decl);
might not produce a _ZT... VAR_DECL with const type, so the assert
gcc_assert (CP_TYPE_CONST_P (TREE_TYPE (op)));
fails. Allowing IMPLICIT_CONV_EXPR there probably makes sense.
And since convert_nontype_argument uses value_dependent_expression_p
a lot, I used a dedicated bool to speed things up.
2020-01-29 Marek Polacek <polacek@redhat.com>
PR c++/92948 - Fix class NTTP with template arguments.
* pt.c (convert_nontype_argument): Use IMPLICIT_CONV_EXPR when
converting a value-dependent expression to a class type.
(tsubst_copy) <case VIEW_CONVERT_EXPR>: Allow IMPLICIT_CONV_EXPR
as the result of the tsubst_copy call.
* g++.dg/cpp2a/nontype-class28.C: New test.
* g++.dg/cpp2a/nontype-class29.C: New test.
Jonathan Wakely [Thu, 23 Jan 2020 16:46:17 +0000 (16:46 +0000)]
libstdc++: Fix conformance issues in <stop_token> (PR92895)
Fix synchronization issues in <stop_token>. Replace shared_ptr with
_Stop_state_ref and a reference count embedded in the shared state.
Replace std::mutex with spinlock using one bit of a std::atomic<> that
also tracks whether a stop request has been made and how many
stop_source objects share ownership of the state.
PR libstdc++/92895
* include/std/stop_token (stop_token::stop_possible()): Call new
_M_stop_possible() function.
(stop_token::stop_requested()): Do not use stop_possible().
(stop_token::binary_semaphore): New class, as temporary stand-in for
std::binary_semaphore.
(stop_token::_Stop_cb::_M_callback): Add noexcept to type.
(stop_token::_Stop_cb::_M_destroyed, stop_token::_Stop_cb::_M_done):
New data members for symchronization with stop_callback destruction.
(stop_token::_Stop_cb::_Stop_cb): Make non-template.
(stop_token::_Stop_cb::_M_linked, stop_token::_Stop_cb::_S_execute):
Remove.
(stop_token::_Stop_cb::_M_run): New member function.
(stop_token::_Stop_state::_M_stopped, stop_token::_Stop_state::_M_mtx):
Remove.
(stop_token::_Stop_state::_M_owners): New data member to track
reference count for ownership.
(stop_token::_Stop_state::_M_value): New data member combining a
spinlock, the stop requested flag, and the reference count for
associated stop_source objects.
(stop_token::_Stop_state::_M_requester): New data member for
synchronization with stop_callback destruction.
(stop_token::_Stop_state::_M_stop_possible()): New member function.
(stop_token::_Stop_state::_M_stop_requested()): Inspect relevant bit
of _M_value.
(stop_token::_Stop_state::_M_add_owner)
(stop_token::_Stop_state::_M_release_ownership)
(stop_token::_Stop_state::_M_add_ssrc)
(stop_token::_Stop_state::_M_sub_ssrc): New member functions for
updating reference counts.
(stop_token::_Stop_state::_M_lock, stop_token::_Stop_state::_M_unlock)
(stop_token::_Stop_state::_M_lock, stop_token::_Stop_state::_M_unlock)
(stop_token::_Stop_state::_M_try_lock)
(stop_token::_Stop_state::_M_try_lock_and_stop)
(stop_token::_Stop_state::_M_do_try_lock): New member functions for
managing spinlock.
(stop_token::_Stop_state::_M_request_stop): Use atomic operations to
read and update state. Release lock while running callbacks. Use new
data members to synchronize with callback destruction.
(stop_token::_Stop_state::_M_remove_callback): Likewise.
(stop_token::_Stop_state::_M_register_callback): Use atomic operations
to read and update state.
(stop_token::_Stop_state_ref): Handle type to manage _Stop_state,
replacing shared_ptr.
(stop_source::stop_source(const stop_source&)): Update reference count.
(stop_source::operator=(const stop_source&)): Likewise.
(stop_source::~stop_source()): Likewise.
(stop_source::stop_source(stop_source&&)): Define as defaulted.
(stop_source::operator=(stop_source&&)): Establish postcondition on
parameter.
(stop_callback): Enforce preconditions on template parameter. Replace
base class with data member of new _Cb_impl type.
(stop_callback::stop_callback(const stop_token&, Cb&&))
(stop_callback::stop_callback(stop_token&&, Cb&&)): Fix TOCTTOU race.
(stop_callback::_Cb_impl): New type wrapping _Callback member and
defining the _S_execute member function.
* testsuite/30_threads/stop_token/stop_callback/deadlock-mt.cc: New
test.
* testsuite/30_threads/stop_token/stop_callback/deadlock.cc: New test.
* testsuite/30_threads/stop_token/stop_callback/destroy.cc: New test.
* testsuite/30_threads/stop_token/stop_callback/destructible_neg.cc:
New test.
* testsuite/30_threads/stop_token/stop_callback/invocable_neg.cc: New
test.
* testsuite/30_threads/stop_token/stop_callback/invoke.cc: New test.
* testsuite/30_threads/stop_token/stop_source/assign.cc: New test.
* testsuite/30_threads/stop_token/stop_token/stop_possible.cc: New
test.
Frederik Harwath [Wed, 29 Jan 2020 14:51:44 +0000 (15:51 +0100)]
Add acc_device_radeon to name_of_acc_device_t function
libgomp/
* oacc-init.c (name_of_acc_device_t): Handle acc_device_radeon.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Andre Vieira [Wed, 29 Jan 2020 14:23:22 +0000 (14:23 +0000)]
IRA: Revert
11b8091fb to fix PR 93221
A previous change to simplify LRA introduced in 11b809 (From-SVN: r279550)
disabled hard register splitting for -O0. This causes a problem on aarch64 in
cases where parameters are passed in multiple registers (in the bug report an OI
passed in 2 V4SI registers). This is mandated by the AAPCS.
gcc/ChangeLog:
2020-01-29 Joel Hutton <Joel.Hutton@arm.com>
PR target/93221
* ira.c (ira): Revert use of simplified LRA algorithm.
gcc/testsuite/ChangeLog:
2020-01-29 Joel Hutton <Joel.Hutton@arm.com>
PR target/93221
* gcc.target/aarch64/pr93221.c: New test.
Jonathan Wakely [Wed, 29 Jan 2020 13:56:49 +0000 (13:56 +0000)]
libstdc++: Simplify constraints on std::compare_three_way
The __3way_builtin_ptr_cmp concept can use three_way_comparable_with to
check whether <=> is valid. Doing that makes it obvious that the
disjunction on compare_three_way::operator() is redundant, because
the second constraint subsumes the first.
The workaround for PR c++/91073 can also be removed as that bug is fixed
now.
* libsupc++/compare (__detail::__3way_builtin_ptr_cmp): Use
three_way_comparable_with.
(__detail::__3way_cmp_with): Remove workaround for fixed bug.
(compare_three_way::operator()): Remove redundant constraint from
requires-clause.
(__detail::_Synth3way::operator()): Use three_way_comparable_with
instead of workaround.
* testsuite/18_support/comparisons/object/93479.cc: Prune extra
output due to simplified constraints on compare_three_way::operator().
Jonathan Wakely [Wed, 29 Jan 2020 13:36:15 +0000 (13:36 +0000)]
libstdc++: Make std::compare_three_way check if <=> is valid (PR 93479)
Currently types that cannot be compared using <=> but which are
convertible to pointers will be compared by converting to pointers
first. They should not be comparable.
PR libstdc++/93479
* libsupc++/compare (__3way_builtin_ptr_cmp): Require <=> to be valid.
* testsuite/18_support/comparisons/object/93479.cc: New test.
Jonathan Wakely [Wed, 29 Jan 2020 13:36:15 +0000 (13:36 +0000)]
libstdc++: Make tests for std::ranges access functions more robust
* testsuite/std/ranges/access/end.cc: Do not assume test_range::end()
returns the same type as test_range::begin(). Add comments.
* testsuite/std/ranges/access/rbegin.cc: Likewise.
* testsuite/std/ranges/access/rend.cc: Likewise.
* testsuite/std/ranges/range.cc: Do not assume the sentinel for
test_range is the same as its iterator type.
* testsuite/util/testsuite_iterators.h (test_range::sentinel): Add
operator- overloads to satisfy sized_sentinel_for when the iterator
satisfies random_access_iterator.
Martin Jambor [Wed, 29 Jan 2020 12:13:13 +0000 (13:13 +0100)]
SRA: Also propagate accesses from LHS to RHS [PR92706]
2020-01-29 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/92706
* tree-sra.c (struct access): Fields first_link, last_link,
next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
next_rhs_queued and grp_rhs_queued respectively, new fields
first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
(struct assign_link): Field next renamed to next_rhs, new field
next_lhs. Updated comment.
(work_queue_head): Renamed to rhs_work_queue_head.
(lhs_work_queue_head): New variable.
(add_link_to_lhs): New function.
(relink_to_new_repr): Also relink LHS lists.
(add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
(add_access_to_lhs_work_queue): New function.
(pop_access_from_work_queue): Renamed to
pop_access_from_rhs_work_queue.
(pop_access_from_lhs_work_queue): New function.
(build_accesses_from_assign): Also add links to LHS lists and to LHS
work_queue.
(child_would_conflict_in_lacc): Renamed to
child_would_conflict_in_acc. Adjusted parameter names.
(create_artificial_child_access): New parameter set_grp_read, use it.
(subtree_mark_written_and_enqueue): Renamed to
subtree_mark_written_and_rhs_enqueue.
(propagate_subaccesses_across_link): Renamed to
propagate_subaccesses_from_rhs.
(propagate_subaccesses_from_lhs): New function.
(propagate_all_subaccesses): Also propagate subaccesses from LHSs to
RHSs.
testsuite/
* gcc.dg/tree-ssa/pr92706-1.c: New test.
Martin Jambor [Wed, 29 Jan 2020 12:13:13 +0000 (13:13 +0100)]
SRA: Total scalarization after access propagation [PR92706]
2020-01-29 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/92706
* tree-sra.c (struct access): Adjust comment of
grp_total_scalarization.
(find_access_in_subtree): Look for single children spanning an entire
access.
(scalarizable_type_p): Allow register accesses, adjust callers.
(completely_scalarize): Remove function.
(scalarize_elem): Likewise.
(create_total_scalarization_access): Likewise.
(sort_and_splice_var_accesses): Do not track total scalarization
flags.
(analyze_access_subtree): New parameter totally, adjust to new meaning
of grp_total_scalarization.
(analyze_access_trees): Pass new parameter to analyze_access_subtree.
(can_totally_scalarize_forest_p): New function.
(create_total_scalarization_access): Likewise.
(create_total_access_and_reshape): Likewise.
(total_should_skip_creating_access): Likewise.
(totally_scalarize_subtree): Likewise.
(analyze_all_variable_accesses): Perform total scalarization after
subaccess propagation using the new functions above.
(initialize_constant_pool_replacements): Output initializers by
traversing the access tree.
testsuite/
* gcc.dg/tree-ssa/pr92706-2.c: New test.
* gcc.dg/guality/pr59776.c: Xfail tests for s2.g.
Martin Jambor [Wed, 29 Jan 2020 12:13:12 +0000 (13:13 +0100)]
SRA: Add verification of accesses
2020-01-29 Martin Jambor <mjambor@suse.cz>
* tree-sra.c (verify_sra_access_forest): New function.
(verify_all_sra_access_forests): Likewise.
(create_artificial_child_access): Set parent.
(analyze_all_variable_accesses): Call the verifier.
Jan Hubicka [Wed, 29 Jan 2020 11:43:10 +0000 (12:43 +0100)]
ipa: Fix removal of multi-target speculation.
* cgraph.c (cgraph_edge::resolve_speculation): Only lookup direct edge
if called on indirect edge.
(cgraph_edge::redirect_call_stmt_to_callee): Lookup indirect edge of
speculative call if needed.
* gcc.dg/tree-prof/indir-call-prof-2.c: New testcase.
Frederik Harwath [Wed, 29 Jan 2020 09:21:18 +0000 (10:21 +0100)]
Adjust formatting of acc_get_property tests
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
Adjust to GNU coding style.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: Likewise.
Frederik Harwath [Wed, 29 Jan 2020 09:19:50 +0000 (10:19 +0100)]
Add OpenACC acc_get_property support for AMD GCN
Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.
libgomp/
* plugin-gcn.c (struct agent_info): Add fields "name" and
"vendor_name" ...
(GOMP_OFFLOAD_init_device): ... and init from here.
(struct hsa_context_info): Add field "driver_version_s" ...
(init_hsa_contest): ... and init from here.
(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
implementation.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
Enable test execution for amdgcn and host offloading targets.
* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
(expect_device_properties): Split function into ...
(expect_device_string_properties): ... this new function ...
(expect_device_memory): ... and this new function.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
Add test.
Richard Sandiford [Tue, 28 Jan 2020 10:05:29 +0000 (10:05 +0000)]
testsuite: XFAIL gcc.dg/torture/pr93133.c for powerpc*-*-* [PR93393]
2020-01-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
PR testsuite/93393
* gcc.dg/torture/pr93133.c: XFAIL for powerpc*-*-*.
Jakub Jelinek [Wed, 29 Jan 2020 08:41:42 +0000 (09:41 +0100)]
openmp: c++: Consider typeinfo decls to be predetermined shared [PR91118]
If the typeinfo decls appear in OpenMP default(none) regions, as we no longer
predetermine const with no mutable members, they are diagnosed as errors,
but it isn't something the users can actually provide explicit sharing for in
the clauses.
2020-01-29 Jakub Jelinek <jakub@redhat.com>
PR c++/91118
* cp-gimplify.c (cxx_omp_predetermined_sharing): Return
OMP_CLAUSE_DEFAULT_SHARED for typeinfo decls.
* g++.dg/gomp/pr91118-1.C: New test.
* g++.dg/gomp/pr91118-2.C: New test.
Jakub Jelinek [Wed, 29 Jan 2020 08:39:16 +0000 (09:39 +0100)]
openmp: Handle rest of EXEC_OACC_* in oacc_code_to_statement [PR93463]
As the testcase shows, some EXEC_OACC_* codes weren't handled in
oacc_code_to_statement. Fixed thusly.
2020-01-29 Jakub Jelinek <jakub@redhat.com>
PR fortran/93463
* openmp.c (oacc_code_to_statement): Handle
EXEC_OACC_{ROUTINE,UPDATE,WAIT,CACHE,{ENTER,EXIT}_DATA,DECLARE}.
* gfortran.dg/goacc/pr93463.f90: New test.
Jakub Jelinek [Wed, 29 Jan 2020 08:36:19 +0000 (09:36 +0100)]
analyzer: fix build with gcc 4.4 (PR 93276)
All that is really needed is make sure you #include "diagnostic-core.h"
before including pretty-print.h. By including
diagnostic-core.h first, you do:
and then pretty-print.h will do:
If instead pretty-print.h is included first, then it will use __gcc_diag__
instead of __gcc_tdiag__ and thus will assume %E/%D etc. can't be handled.
2020-01-29 Jakub Jelinek <jakub@redhat.com>
* analyzer.h (PUSH_IGNORE_WFORMAT, POP_IGNORE_WFORMAT): Remove.
* constraint-manager.cc: Include diagnostic-core.h before graphviz.h.
(range::dump, equiv_class::print): Don't use PUSH_IGNORE_WFORMAT or
POP_IGNORE_WFORMAT.
* state-purge.cc: Include diagnostic-core.h before
gimple-pretty-print.h.
(state_purge_annotator::add_node_annotations, print_vec_of_names):
Don't use PUSH_IGNORE_WFORMAT or POP_IGNORE_WFORMAT.
* region-model.cc: Move diagnostic-core.h include before graphviz.h.
(path_var::dump, svalue::print, constant_svalue::print_details,
region::dump_to_pp, region::dump_child_label, region::print_fields,
map_region::print_fields, map_region::dump_dot_to_pp,
map_region::dump_child_label, array_region::print_fields,
array_region::dump_dot_to_pp): Don't use PUSH_IGNORE_WFORMAT or
POP_IGNORE_WFORMAT.
Richard Biener [Wed, 29 Jan 2020 08:05:05 +0000 (09:05 +0100)]
tree-optimization/93428 - avoid load permutation vector clobbering
With SLP now being a graph with shared nodes across instances we have
to make sure to compute the load permutation of nodes once, not
overwriting the result of earlier analysis.
2020-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/93428
* tree-vect-slp.c (vect_build_slp_tree_2): Compute the load
permutation when the load node is created.
(vect_analyze_slp_instance): Re-use it here.
* gcc.dg/torture/pr93428.c: New testcase.
Jan Hubicka [Wed, 29 Jan 2020 07:15:52 +0000 (08:15 +0100)]
Fix bogus Changelog entry.
* ipa-prop.c (update_indirect_edges_after_inlining): Fix warning.
GCC Administrator [Wed, 29 Jan 2020 00:16:37 +0000 (00:16 +0000)]
Daily bump.
Martin Sebor [Tue, 28 Jan 2020 21:48:52 +0000 (14:48 -0700)]
PR middle-end/93437 - bogus -Warray-bounds on protobuf generated code
gcc/testsuite/ChangeLog:
PR middle-end/93437
* g++.dg/warn/Wstringop-overflow-5.C: New test.
Jason Merrill [Tue, 28 Jan 2020 21:06:33 +0000 (16:06 -0500)]
c++: Fix return deduction of lambda in discarded stmt.
A return statement in a discarded statement is not used for return type
deduction, but we still want to do deduction for a return statement in a
lambda in a discarded statement.
PR c++/93442
* parser.c (cp_parser_lambda_expression): Clear in_discarded_stmt.
Jason Merrill [Tue, 28 Jan 2020 20:15:20 +0000 (15:15 -0500)]
c++: Fix guard variable and attribute weak.
My patch for PR 91476 worked for decls that are implicitly comdat/weak due
to C++ linkage rules, but broke variables explicitly marked weak.
PR c++/93477
PR c++/91476
* decl2.c (copy_linkage): Do copy DECL_ONE_ONLY and DECL_WEAK.
Jan Hubicka [Tue, 28 Jan 2020 21:44:36 +0000 (22:44 +0100)]
ipa: fix warning in ipa-prop.c
* ipa-prop.c (update_indirect_edges_after_inlining):
David Malcolm [Tue, 28 Jan 2020 18:06:24 +0000 (10:06 -0800)]
analyzer: fix ICE when longjmp isn't marked 'noreturn' (PR 93316)
Comments 11-16 within PR analyzer/93316 discuss an ICE in some setjmp
tests seen on AIX and powerpc-darwin9.
The issue turned out to be an implicit assumption that longjmp is
marked "noreturn". There are two places in engine.cc where the code
attempted to locate the longjmp GIMPLE_CALL by finding the final stmt
within its supernode, in one place casting it via "as_a <gcall *>",
in the other using it as the location_t of the
"rewinding from longjmp..." event.
When longjmp isn't marked noreturn, its basic block and hence supernode
can have additional stmts after the longjmp; in the setjmp-3.c case
this was a GIMPLE_RETURN, leading to a ICE when casting this to a
GIMPLE_CALL.
This patch fixes the two places in question to use the
rewind_info_t::get_longjmp_call
member function introduced by
342e14ffa30e9163a1a75e0a4fb21b6883d58dbe,
fixing the ICE (and ensuring the correct location_t is used for events).
gcc/analyzer/ChangeLog:
PR analyzer/93316
* engine.cc (rewind_info_t::update_model): Get the longjmp call
stmt via get_longjmp_call () rather than assuming it is the last
stmt in the longjmp's supernode.
(rewind_info_t::add_events_to_path): Get the location_t for the
rewind_from_longjmp_event via get_longjmp_call () rather than from
the supernode's get_end_location ().
Vladimir N. Makarov [Tue, 28 Jan 2020 20:43:44 +0000 (15:43 -0500)]
Fix for PR93272 - LRA: EH reg allocated to hold local variable
2020-01-28 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/93272
* ira-lives.c (process_out_of_region_eh_regs): New function.
(process_bb_node_lives): Call it.
Jan Hubicka [Tue, 28 Jan 2020 20:34:43 +0000 (21:34 +0100)]
diagnostics: make error message lowercase.
* coverage.c (read_counts_file): Make error message lowercase.
Jan Hubicka [Tue, 28 Jan 2020 20:31:26 +0000 (21:31 +0100)]
* profile-count.c (profile_quality_display_names): Fix ordering.
Jan Hubicka [Tue, 28 Jan 2020 20:30:14 +0000 (21:30 +0100)]
ipa: fix handling of multiple speculations (PR93318)
This patch started as work to resole Richard's comment on quadratic lookups
in resolve_speculation. While doing it I however noticed multiple problems
in the new speuclative call code which made the patch quite big. In
particular:
1) Before applying speculation we consider only targets with at lest
probability 1/2.
If profile is sane at most two targets can have probability greater or
equal to 1/2. So the new multi-target speculation code got enabled only
in very special scenario when there ae precisely two target with precise
probability 1/2 (which is tested by the single testcase).
As a conseuqence the multiple target logic got minimal test coverage and
this made us to miss several ICEs.
2) Profile updating in profile merging, tree-inline and indirect call
expansion was wrong which led to inconsistent profiles (as already seen
on the testcase).
3) Code responsible to turn speculative call to direct call was broken for
anything with more than one target.
4) There were multiple cases where call_site_hash went out of sync which
eventually leads to an ICE..
5) Some code expects that all speculative call targets forms a sequence in
the callee linked list but there is no code to maintain that invariant
nor a verifier.
Fixing this it became obvious that the current API of speculative_call_info is
not useful because it really builds on fact tht there are precisely three
components (direct call, ref and indirect call) in every speculative call
sequence. I ended up replacing it with iterator API for direct call
(first_speculative_call_target, next_speculative_call_target) and accessors for
the other coponents updating comment in cgraph.h.
Finally I made the work with call site hash more effetive by updating edge
manipulation to keep them in sequence. So first one can be looked up from the
hash and then they can be iterated by callee.
There are other things that can be improved (for example the speculation should
start with most common target first), but I will try to keep that for next
stage1. This patch is mostly about getting rid of ICE and profile corruption
which is a regression from GCC 9.
Honza
gcc/ChangeLog:
2020-01-28 Jan Hubicka <hubicka@ucw.cz>
PR lto/93318
* cgraph.c (cgraph_add_edge_to_call_site_hash): Update call site
hash only when edge is first within the sequence.
(cgraph_edge::set_call_stmt): Update handling of speculative calls.
(symbol_table::create_edge): Do not set target_prob.
(cgraph_edge::remove_caller): Watch for speculative calls when updating
the call site hash.
(cgraph_edge::make_speculative): Drop target_prob parameter.
(cgraph_edge::speculative_call_info): Remove.
(cgraph_edge::first_speculative_call_target): New member function.
(update_call_stmt_hash_for_removing_direct_edge): New function.
(cgraph_edge::resolve_speculation): Rewrite to new API.
(cgraph_edge::speculative_call_for_target): New member function.
(cgraph_edge::make_direct): Rewrite to new API; fix handling of
multiple speculation targets.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise; fix updating
of profile.
(verify_speculative_call): Verify that targets form an interval.
* cgraph.h (cgraph_edge::speculative_call_info): Remove.
(cgraph_edge::first_speculative_call_target): New member function.
(cgraph_edge::next_speculative_call_target): New member function.
(cgraph_edge::speculative_call_target_ref): New member function.
(cgraph_edge;:speculative_call_indirect_edge): New member funtion.
(cgraph_edge): Remove target_prob.
* cgraphclones.c (cgraph_node::set_call_stmt_including_clones):
Fix handling of speculative calls.
* ipa-devirt.c (ipa_devirt): Fix handling of speculative cals.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-inline.c (speculation_useful_p): Use new speculative call API.
* ipa-profile.c (dump_histogram): Fix formating.
(ipa_profile_generate_summary): Watch for overflows.
(ipa_profile): Do not require probablity to be 1/2; update to new API.
* ipa-prop.c (ipa_make_edge_direct_to_target): Update to new API.
(update_indirect_edges_after_inlining): Update to new API.
* ipa-utils.c (ipa_merge_profiles): Rewrite merging of speculative call
profiles.
* profile-count.h: (profile_probability::adjusted): New.
* tree-inline.c (copy_bb): Update to new speculative call API; fix
updating of profile.
* value-prof.c (gimple_ic_transform): Rename to ...
(dump_ic_profile): ... this one; update dumping.
(stream_in_histogram_value): Fix formating.
(gimple_value_profile_transformations): Update.
gcc/testsuite/ChangeLog:
2020-01-28 Jan Hubicka <hubicka@ucw.cz>
* g++.dg/tree-prof/indir-call-prof.C: Update template.
* gcc.dg/tree-prof/crossmodule-indircall-1.c: Add more targets.
* gcc.dg/tree-prof/crossmodule-indircall-1a.c: Add more targets.
* gcc.dg/tree-prof/indir-call-prof.c: Update template.
Jason Merrill [Tue, 28 Jan 2020 17:26:10 +0000 (12:26 -0500)]
c++: Allow template rvalue-ref conv to bind to lvalue ref.
When I implemented the [over.match.ref] rule that a reference conversion
function needs to match l/rvalue of the target reference type it changed our
handling of this testcase. It seems to me that our current behavior is what
the standard says, but it doesn't seem desirable, and all the other
compilers have our old behavior. So let's limit the change to non-templates
until there's some clarification from the committee.
PR c++/90546
* call.c (build_user_type_conversion_1): Allow a template conversion
returning an rvalue reference to bind directly to an lvalue.
Jan Hubicka [Tue, 28 Jan 2020 19:34:56 +0000 (20:34 +0100)]
ipa: fix handling of multiple speculations (PR93318)
This patch started as work to resole Richard's comment on quadratic lookups
in resolve_speculation. While doing it I however noticed multiple problems
in the new speuclative call code which made the patch quite big. In
particular:
1) Before applying speculation we consider only targets with at lest
probability 1/2.
If profile is sane at most two targets can have probability greater or
equal to 1/2. So the new multi-target speculation code got enabled only
in very special scenario when there ae precisely two target with precise
probability 1/2 (which is tested by the single testcase).
As a conseuqence the multiple target logic got minimal test coverage and
this made us to miss several ICEs.
2) Profile updating in profile merging, tree-inline and indirect call
expansion was wrong which led to inconsistent profiles (as already seen
on the testcase).
3) Code responsible to turn speculative call to direct call was broken for
anything with more than one target.
4) There were multiple cases where call_site_hash went out of sync which
eventually leads to an ICE..
5) Some code expects that all speculative call targets forms a sequence in
the callee linked list but there is no code to maintain that invariant
nor a verifier.
Fixing this it became obvious that the current API of speculative_call_info is
not useful because it really builds on fact tht there are precisely three
components (direct call, ref and indirect call) in every speculative call
sequence. I ended up replacing it with iterator API for direct call
(first_speculative_call_target, next_speculative_call_target) and accessors for
the other coponents updating comment in cgraph.h.
Finally I made the work with call site hash more effetive by updating edge
manipulation to keep them in sequence. So first one can be looked up from the
hash and then they can be iterated by callee.
There are other things that can be improved (for example the speculation should
start with most common target first), but I will try to keep that for next
stage1. This patch is mostly about getting rid of ICE and profile corruption
which is a regression from GCC 9.
gcc/ChangeLog:
PR lto/93318
* cgraph.c (cgraph_add_edge_to_call_site_hash): Update call site
hash only when edge is first within the sequence.
(cgraph_edge::set_call_stmt): Update handling of speculative calls.
(symbol_table::create_edge): Do not set target_prob.
(cgraph_edge::remove_caller): Watch for speculative calls when updating
the call site hash.
(cgraph_edge::make_speculative): Drop target_prob parameter.
(cgraph_edge::speculative_call_info): Remove.
(cgraph_edge::first_speculative_call_target): New member function.
(update_call_stmt_hash_for_removing_direct_edge): New function.
(cgraph_edge::resolve_speculation): Rewrite to new API.
(cgraph_edge::speculative_call_for_target): New member function.
(cgraph_edge::make_direct): Rewrite to new API; fix handling of
multiple speculation targets.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise; fix updating
of profile.
(verify_speculative_call): Verify that targets form an interval.
* cgraph.h (cgraph_edge::speculative_call_info): Remove.
(cgraph_edge::first_speculative_call_target): New member function.
(cgraph_edge::next_speculative_call_target): New member function.
(cgraph_edge::speculative_call_target_ref): New member function.
(cgraph_edge;:speculative_call_indirect_edge): New member funtion.
(cgraph_edge): Remove target_prob.
* cgraphclones.c (cgraph_node::set_call_stmt_including_clones):
Fix handling of speculative calls.
* ipa-devirt.c (ipa_devirt): Fix handling of speculative cals.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-inline.c (speculation_useful_p): Use new speculative call API.
* ipa-profile.c (dump_histogram): Fix formating.
(ipa_profile_generate_summary): Watch for overflows.
(ipa_profile): Do not require probablity to be 1/2; update to new API.
* ipa-prop.c (ipa_make_edge_direct_to_target): Update to new API.
(update_indirect_edges_after_inlining): Update to new API.
* ipa-utils.c (ipa_merge_profiles): Rewrite merging of speculative call
profiles.
* profile-count.h: (profile_probability::adjusted): New.
* tree-inline.c (copy_bb): Update to new speculative call API; fix
updating of profile.
* value-prof.c (gimple_ic_transform): Rename to ...
(dump_ic_profile): ... this one; update dumping.
(stream_in_histogram_value): Fix formating.
(gimple_value_profile_transformations): Update.
gcc/testsuite/ChangeLog:
* g++.dg/tree-prof/indir-call-prof.C: Update template.
* gcc.dg/tree-prof/crossmodule-indircall-1.c: Add more targets.
* gcc.dg/tree-prof/crossmodule-indircall-1a.c: Add more targets.
* gcc.dg/tree-prof/indir-call-prof.c: Update template.
H.J. Lu [Tue, 28 Jan 2020 19:32:56 +0000 (11:32 -0800)]
i386: Prefer TARGET_AVX over TARGET_SSE_TYPELESS_STORES
movaps/movups is one byte shorter than movdqa/movdqu. But it isn't the
case for AVX nor AVX512. This patch prefers TARGET_AVX over
TARGET_SSE_TYPELESS_STORES and adjust vmovups checks in assembly ouputs.
gcc/
PR target/91461
* config/i386/i386.md (*movoi_internal_avx): Remove
TARGET_SSE_TYPELESS_STORES check.
(*movti_internal): Prefer TARGET_AVX over
TARGET_SSE_TYPELESS_STORES.
(*movtf_internal): Likewise.
* config/i386/sse.md (mov<mode>_internal): Prefer TARGET_AVX over
TARGET_SSE_TYPELESS_STORES. Remove "<MODE_SIZE> == 16" check
from TARGET_SSE_TYPELESS_STORES.
gcc/testsuite/
PR target/91461
* gcc.target/i386/avx256-unaligned-store-2.c: Don't check
vmovups.
* gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
* gcc.target/i386/pieces-memcpy-4.c: Likewise.
* gcc.target/i386/pieces-memcpy-5.c: Likewise.
* gcc.target/i386/pieces-memcpy-6.c: Likewise.
* gcc.target/i386/pieces-strcpy-2.c: Likewise.
* gcc.target/i386/pr90980-1.c: Likewise.
* gcc.target/i386/pr87317-4.c: Check "\tvmovd\t" instead of
"vmovd" to avoid matching "vmovdqu".
* gcc.target/i386/pr87317-5.c: Likewise.
* gcc.target/i386/pr87317-7.c: Likewise.
* gcc.target/i386/pr91461-1.c: New test.
* gcc.target/i386/pr91461-2.c: Likewise.
* gcc.target/i386/pr91461-3.c: Likewise.
* gcc.target/i386/pr91461-4.c: Likewise.
* gcc.target/i386/pr91461-5.c: Likewise.
David Malcolm [Tue, 28 Jan 2020 14:08:25 +0000 (09:08 -0500)]
diagnostic_metadata: unbreak xgettext (v2)
Changed in v2:
- rename from warning_with_metadata_at to warning_meta
- fix test plugins
While C++ can have overloads, xgettext can't deal with overloads that have
different argument positions, leading to two failures in "make gcc.pot":
emit_diagnostic_valist used incompatibly as both
--keyword=emit_diagnostic_valist:4
--flag=emit_diagnostic_valist:4:gcc-internal-format and
--keyword=emit_diagnostic_valist:5
--flag=emit_diagnostic_valist:5:gcc-internal-format
warning_at used incompatibly as both
--keyword=warning_at:3 --flag=warning_at:3:gcc-internal-format and
--keyword=warning_at:4 --flag=warning_at:4:gcc-internal-format
The emit_diagnostic_valist overload isn't used anywhere (I think it's
a leftover from an earlier iteration of the analyzer patch kit).
The warning_at overload is used throughout the analyzer but nowhere else.
Ideally I'd like to consolidate this argument with something
constructable in various ways:
- from a metadata and an int or
- from an int (or, better an "enum opt_code"),
so that the overload happens when implicitly choosing the ctor, but
that feels like stage 1 material.
In the meantime, fix xgettext by deleting the unused overload and
renaming the used one.
gcc/analyzer/ChangeLog:
* region-model.cc (poisoned_value_diagnostic::emit): Update for
renaming of warning_at overload to warning_meta.
* sm-file.cc (file_leak::emit): Likewise.
* sm-malloc.cc (double_free::emit): Likewise.
(possible_null_deref::emit): Likewise.
(possible_null_arg::emit): Likewise.
(null_deref::emit): Likewise.
(null_arg::emit): Likewise.
(use_after_free::emit): Likewise.
(malloc_leak::emit): Likewise.
(free_of_non_heap::emit): Likewise.
* sm-sensitive.cc (exposure_through_output_file::emit): Likewise.
* sm-signal.cc (signal_unsafe_call::emit): Likewise.
* sm-taint.cc (tainted_array_index::emit): Likewise.
gcc/ChangeLog:
* diagnostic-core.h (warning_at): Rename overload to...
(warning_meta): ...this.
(emit_diagnostic_valist): Delete decl of overload taking
diagnostic_metadata.
* diagnostic.c (emit_diagnostic_valist): Likewise for defn.
(warning_at): Rename overload taking diagnostic_metadata to...
(warning_meta): ...this.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_metadata.c: Update for
renaming of warning_at overload to warning_meta.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c: Likewise.
Andrew Benson [Tue, 28 Jan 2020 18:12:23 +0000 (18:12 +0000)]
Increase GFC_MAX_MANGLED_SYMBOL_LEN to handle submodule names.
PR fortran/93461
* trans.h: Increase GFC_MAX_MANGLED_SYMBOL_LEN to
GFC_MAX_SYMBOL_LEN*3+5 to allow for inclusion of submodule name,
plus the "." between module and submodule names.
* gfortran.dg/pr93461.f90: New test.
Andrew Benson [Tue, 28 Jan 2020 17:58:40 +0000 (17:58 +0000)]
Allow concatenated module+submodule names.
Increase length of char variables "parent1" and "parent2" in
set_syms_host_assoc() to allow them to hold concatenated module +
submodule names.
PR fortran/93473
* parse.c: Increase length of char variables to allow them to hold
a concatenated module + submodule name.
* gfortran.dg/pr93473.f90: New test.
Nathan Sidwell [Tue, 28 Jan 2020 15:58:29 +0000 (07:58 -0800)]
preprocessor: Make __has_include a builtin macro [PR93452]
The clever hack of '#define __has_include __has_include' breaks -dD
and -fdirectives-only, because that emits definitions. This turns
__has_include into a proper builtin macro. Thus it's never emitted
via -dD, and because use outside of directive processing is undefined,
we can just expand it anywhere.
PR preprocessor/93452
* internal.h (struct spec_nodes): Drop n__has_include{,_next}.
* directives.c (lex_macro_node): Don't check __has_include redef.
* expr.c (eval_token): Drop __has_include eval.
(parse_has_include): Move to ...
* macro.c (builtin_has_include): ... here.
(_cpp_builtin_macro_text): Eval __has_include{,_next}.
* include/cpplib.h (enum cpp_builtin_type): Add BT_HAS_INCLUDE{,_NEXT}.
* init.c (builtin_array): Add them.
(cpp_init_builtins): Drop __has_include{,_next} init here ...
* pch.c (cpp_read_state): ... and here.
* traditional.c (enum ls): Drop has_include states ...
(_cpp_scan_out_logical_line): ... and here.
Jonathan Wakely [Tue, 28 Jan 2020 15:20:23 +0000 (15:20 +0000)]
libstdc++: Fix order of changelog entries
Rebasing my last two commits put the changelog entries at the wrong
place in the file. Fixed by this change.
Jason Merrill [Mon, 27 Jan 2020 22:55:14 +0000 (17:55 -0500)]
c++: Function declared with typedef with eh-specification.
We just need to handle the exception specification like other properties of
a function typedef.
PR c++/90731
* decl.c (grokdeclarator): Propagate eh spec from typedef.
Julian Brown [Sat, 4 Jan 2020 01:39:56 +0000 (17:39 -0800)]
Check array contiguity for OpenACC/Fortran
PR fortran/93025
gcc/fortran/
* openmp.c (resolve_omp_clauses): Check array references for contiguity.
gcc/testsuite/
* gfortran.dg/goacc/mapping-tests-2.f90: New test.
* gfortran.dg/goacc/subarrays.f95: Expect rejection of non-contiguous
array.
Julian Brown [Fri, 3 Jan 2020 18:06:15 +0000 (10:06 -0800)]
Don't allow mixed component and non-component accesses for OpenACC/Fortran
gcc/fortran/
* gfortran.h (gfc_symbol): Add comp_mark bitfield.
* openmp.c (resolve_omp_clauses): Disallow mixed component and
full-derived-type accesses to the same variable within a single
directive.
libgomp/
* testsuite/libgomp.oacc-fortran/deep-copy-2.f90: Remove test from here.
* testsuite/libgomp.oacc-fortran/deep-copy-3.f90: Don't use mixed
component/non-component variable refs in a single directive.
* testsuite/libgomp.oacc-fortran/classtypes-1.f95: Likewise.
gcc/testsuite/
* gfortran.dg/goacc/deep-copy-2.f90: Move test here (from libgomp
testsuite). Make a compilation test, and expect rejection of mixed
component/non-component accesses.
* gfortran.dg/goacc/mapping-tests-1.f90: New test.
Julian Brown [Sat, 4 Jan 2020 01:45:24 +0000 (17:45 -0800)]
Add OpenACC test for sub-references being pointer or allocatable variables
gcc/testsuite/
* gfortran.dg/goacc/strided-alloc-ptr.f90: New test.
Jonathan Wakely [Tue, 28 Jan 2020 13:24:09 +0000 (13:24 +0000)]
libstdc++: Avoid using sizeof with function types (PR 93470)
PR libstdc++/93470
* include/bits/refwrap.h (reference_wrapper::operator()): Restrict
static assertion to object types.
Jonathan Wakely [Tue, 28 Jan 2020 13:24:09 +0000 (13:24 +0000)]
libstdc++: Replace glibc-specific check for clock_gettime (PR 93325)
It's wrong to assume that clock_gettime is unavailable on any *-*-linux*
target that doesn't have glibc 2.17 or later. Use a generic test instead
of using __GLIBC_PREREQ. Only do that test when is_hosted=yes so that we
don't get an error for cross targets without a working linker.
This ensures that C library's clock_gettime will be used on non-glibc
targets, instead of an incorrect syscall to SYS_clock_gettime.
PR libstdc++/93325
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Use AC_SEARCH_LIBS for
clock_gettime instead of explicit glibc version check.
* configure: Regenerate.
Richard Biener [Tue, 28 Jan 2020 13:09:12 +0000 (14:09 +0100)]
tree-optimization/93439 move clique bookkeeping to OMP expansion
Autopar was doing clique bookkeeping too early when creating destination
functions but then later introducing new cliques via versioning loops.
The following moves the bookkeeping to the actual outlining process.
2020-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/93439
* tree-parloops.c (create_loop_fn): Move clique bookkeeping...
* tree-cfg.c (move_sese_region_to_fn): ... here.
(verify_types_in_gimple_reference): Verify used cliques are
tracked.
* gfortran.dg/graphite/pr93439.f90: New testcase.
H.J. Lu [Tue, 28 Jan 2020 12:43:34 +0000 (04:43 -0800)]
i386: Don't use ix86_tune_ctrl_string in parse_mtune_ctrl_str
There are
static void
parse_mtune_ctrl_str (bool dump)
{
if (!ix86_tune_ctrl_string)
return;
parse_mtune_ctrl_str is only called from set_ix86_tune_features, which
is only called from ix86_function_specific_restore and
ix86_option_override_internal. parse_mtune_ctrl_str shouldn't use
ix86_tune_ctrl_string which is defined with global_options. Instead,
opts should be passed to parse_mtune_ctrl_str.
PR target/91399
* config/i386/i386-options.c (set_ix86_tune_features): Add an
argument of a pointer to struct gcc_options and pass it to
parse_mtune_ctrl_str.
(ix86_function_specific_restore): Pass opts to
set_ix86_tune_features.
(ix86_option_override_internal): Likewise.
(parse_mtune_ctrl_str): Add an argument of a pointer to struct
gcc_options and use it for x_ix86_tune_ctrl_string.
Claudiu Zissulescu [Tue, 28 Jan 2020 11:59:34 +0000 (13:59 +0200)]
[ARC] Pass along -mcode-density flag to the assembler.
This change makes sure that if the driver is invoked with
"-mcode-density" flag, then the assembler will receive it too.
Note Claudiu Zissulescu:
This is an old patch of which I forgot to add the test.
testsuite/
2019-09-03 Sahahb Vahedi <shahab@synopsys.com>
* gcc.target/arc/code-density-flag.c: New test.
Richard Sandiford [Sun, 26 Jan 2020 13:01:48 +0000 (13:01 +0000)]
simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
Failed to match this instruction:
(set (reg:SI 95)
(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
If we perform the natural simplification to:
(set (reg:SI 95)
(ashift:SI (sign_extract:SI (reg:SI 97)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
then the pattern matches. And it turns out that we do have a
simplification like that already, but it would only kick in for
extractions from a reg, not a subreg. E.g.:
(set (reg:SI 95)
(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
would simplify to:
(set (reg:SI 95)
(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
IMO the subreg case is even more obviously a simplification
than the bare reg case, since the net effect is to remove
either one or two subregs, rather than simply change the
position of a subreg/truncation.
However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
for -m32 on x86_64-linux-gnu, because we could then simplify
a :HI zero_extract to a :QI one. The associated *testqi_ext_3
pattern did already seem to want to handle QImode extractions:
"ix86_match_ccmode (insn, CCNOmode)
&& ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
|| GET_MODE (operands[2]) == SImode
|| GET_MODE (operands[2]) == HImode
|| GET_MODE (operands[2]) == QImode)
but I'm not sure how often the QI case would trigger in practice,
since the zero_extract mode was restricted to HI and above. I checked
the other x86 patterns and couldn't see any other instances of this.
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR rtl-optimization/87763
* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
simplification to handle subregs as well as bare regs.
* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
Richard Sandiford [Wed, 22 Jan 2020 18:21:05 +0000 (18:21 +0000)]
vect: Pattern-matched calls in reduction chains
gcc.dg/pr56350.c started ICEing for SVE in GCC 10 because we
pattern-matched a division reduction:
a /= 8;
into a signed shift with division semantics:
... = IFN_SDIV_POW2 (..., 3);
whereas the reduction code expected it still to be a gassign.
One fix would be to check for a reduction in the pattern matcher
(but current patterns don't generally do that). Another would be
to fail gracefully for reductions involving calls. Since we can't
vectorise the reduction either way, and probably have a better shot
with the shift form, this patch goes for the "fail gracefully" approach.
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Fail gracefully
for reduction chains that (now) include a call.