Jakub Jelinek [Tue, 19 Nov 2019 21:28:22 +0000 (22:28 +0100)]
re PR c++/92414 (internal compiler error: tree check: expected constructor, have error_mark in cxx_eval_store_expression, at cp/constexpr.c:4009)
PR c++/92414
* constexpr.c (cxx_eval_outermost_constant_expr): If DECL_INITIAL
on object is erroneous, return t without trying to evaluate
a constexpr dtor.
* g++.dg/cpp2a/constexpr-dtor4.C: New test.
From-SVN: r278468
Jason Merrill [Tue, 19 Nov 2019 20:22:12 +0000 (15:22 -0500)]
Consider parm types equivalence for operator rewrite tiebreaker.
The C++ committee continues to discuss how best to avoid breaking existing
code with the new rules for reversed operators. A recent suggestion was to
base the tie-breaker on the parameter types of the candidates, which made a
lot of sense to me, so this patch implements that.
This patch also mentions that a candidate was reversed or rewritten when
printing the list of candidates, and warns about a comparison that becomes
recursive under the new rules. There is no flag for this warning; people
can silence it by swapping the operands.
* call.c (same_fn_or_template): Change to cand_parms_match.
(joust): Adjust.
(print_z_candidate): Mark rewritten/reversed candidates.
(build_new_op_1): Warn about recursive call with reversed arguments.
From-SVN: r278465
Pat Haugen [Tue, 19 Nov 2019 19:49:37 +0000 (19:49 +0000)]
rs6000.c (move_to_end_of_ready): New, factored out from common code.
* config/rs6000/rs6000.c (move_to_end_of_ready): New, factored out
from common code.
(power6_sched_reorder2): Factored out from rs6000_sched_reorder2,
call new function.
(power9_sched_reorder2): Call new function.
(rs6000_sched_reorder2): Likewise.
From-SVN: r278463
Richard Sandiford [Tue, 19 Nov 2019 18:58:44 +0000 (18:58 +0000)]
Move ChangeLog entry to correct file
From-SVN: r278461
Jan Hubicka [Tue, 19 Nov 2019 18:57:50 +0000 (19:57 +0100)]
Remove unused parameter PROB in ipa-fnsummary.c
* ipa-fnsummary.c (estimate_edge_size_and_time): Drop parameter PROB.
(estimate_calls_size_and_time): Update.
From-SVN: r278460
Jan Hubicka [Tue, 19 Nov 2019 18:56:26 +0000 (19:56 +0100)]
Avoid redundant computations in edge_badness.
* ipa-inline.c (inlining_speedup): New function.
(edge_badness): Use it.
From-SVN: r278459
Dragan Mladjenovic [Tue, 19 Nov 2019 18:14:32 +0000 (18:14 +0000)]
[MIPS] Prevent MSA branches from being put into delay slots
This patch tightens the instruction definitions to make sure
that MSA branch instructions cannot be put into delay slots and have their
delay slots eligible for being filled. Also, MSA *div*3 patterns use MSA
branches for zero checks but are not marked as being multi instruction and
thus could be put into delay slots. This patch fixes that.
gcc/ChangeLog:
2019-11-19 Zoran Jovanovic <zoran.jovanovic@mips.com>
Dragan Mladjenovic <dmladjenovic@wavecomp.com>
* config/mips/mips-msa.md (msa_<msabr>_<msafmt_f>, msa_<msabr>_v_<msafmt_f>):
Mark as not having "likely" version.
* config/mips/mips.md (insn_count): The simd_div instruction with
TARGET_CHECK_ZERO_DIV consists of 3 instructions.
(can_delay): Exclude simd_branch.
(defile_delay *): Add simd_branch instructions.
They have one regular delay slot.
gcc/testsuite/ChangeLog:
2019-11-19 Dragan Mladjenovic <dmladjenovic@wavecomp.com>
* gcc.target/mips/msa-ds.c: New test.
From-SVN: r278458
Richard Sandiford [Tue, 19 Nov 2019 16:31:17 +0000 (16:31 +0000)]
Revert r278441
To restore powerpc bootstrap.
2019-11-19 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Revert:
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
* cse.c (cse_insn): Delete no-op register moves too.
* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
Take a second comparison to control the value for NE.
(mask_to_comparison): Handle unsigned comparisons.
(simplify_logical_relational_operation): Likewise. Update call
to comparison_to_mask. Handle AND if !HONOR_NANs.
(simplify_binary_operation_1): Call the above for AND too.
gcc/testsuite/
Revert:
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
From-SVN: r278455
Wilco Dijkstra [Tue, 19 Nov 2019 15:57:54 +0000 (15:57 +0000)]
[AArch64] PR79262: Adjust vector cost
PR79262 has been fixed for almost all AArch64 cpus, however the example is still
vectorized in a few cases, resulting in lower performance. Adjust the vector
cost slightly so that so that -mcpu=cortex-a53 now has identical performance as
-mcpu=cortex-a57 on libquantum.
gcc/
PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust
vec_to_scalar_cost.
From-SVN: r278452
Andrew Sutton [Tue, 19 Nov 2019 15:26:16 +0000 (15:26 +0000)]
re PR c++/89913 (ICE with invalid using declaration)
PR c++/89913
gcc/cp/
* pt.c (get_underlying_template): Exit loop if the original type
of the alias is null.
gcc/testsuite/
* g++.dg/cpp2a/pr89913.C: New test.
From-SVN: r278451
Andrew Sutton [Tue, 19 Nov 2019 15:18:50 +0000 (15:18 +0000)]
re PR c++/92078 (error: 'struct std::ptr<Iter>' redeclared with different access)
PR c++/92078
gcc/cp/
* pt.c (maybe_new_partial_specialization): Apply access to newly
created partial specializations. Update comment style.
gcc/testsuite/
* g++.dg/cpp2a/concepts-pr92078.C: New.
* g++.dg/cpp2a/concepts-requires18.C: Update diagnostics.
From-SVN: r278450
Andrew Sutton [Tue, 19 Nov 2019 15:11:14 +0000 (15:11 +0000)]
Suppress diagnostics substituting into a requires-expression (PR c++/92403).
gcc/cp/
* pt.c (tsubst_copy_and_build): Perform the first substitution without
diagnostics and a second only if tsubst_requries_expr returns an error.
From-SVN: r278449
Martin Liska [Tue, 19 Nov 2019 15:07:26 +0000 (16:07 +0100)]
Restore init_ggc_heuristics.
2019-11-19 Martin Liska <mliska@suse.cz>
* toplev.c (general_init): Move the call...
(toplev::main): ... here as we need init_options_struct
being called.
From-SVN: r278448
Wilco Dijkstra [Tue, 19 Nov 2019 14:20:12 +0000 (14:20 +0000)]
[Arm] Set Armv7-A tune to Cortex-A53
By default Armv7-A tunes for Cortex-A8. This is an ancient core
today and the settings are no longer useful for newer cores. So
switch to Cortex-A53 tuning since it works well across a wide range
of modern cores.
On SPECINT2006 the performance gain is 0.7% compared to Cortex-A8 tuning,
and codesize reduces by 0.2%.
gcc/
* config/arm/arm-cpus.in (armv7): Set tune to Cortex-A53.
(armv7-a): Likewise.
(armv7ve): Likewise.
From-SVN: r278447
Andrew Stubbs [Tue, 19 Nov 2019 14:04:27 +0000 (14:04 +0000)]
Update loop-1.c test for amdgcn
2019-11-19 Andrew Stubbs <ams@codesourcery.com>
gcc/testsuite/
* gcc.dg/tree-ssa/loop-1.c: Change amdgcn assembler scan.
From-SVN: r278446
Richard Biener [Tue, 19 Nov 2019 14:00:46 +0000 (14:00 +0000)]
re PR tree-optimization/92581 (condition chains vectorized wrongly)
2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92581
* tree-vect-loop.c (vect_create_epilog_for_reduction): For
condition reduction chains gather all conditions involved
for computing the index reduction vector.
* gcc.dg/vect/vect-cond-reduc-5.c: New testcase.
From-SVN: r278445
Dennis Zhang [Tue, 19 Nov 2019 13:43:39 +0000 (13:43 +0000)]
[AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics
2019-11-19 Dennis Zhang <dennis.zhang@arm.com>
* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (UNSPEC_GEN_TAG): New unspec.
(UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE): Likewise.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_ptrdiff): Likewise.
(__arm_mte_increment_tag, __arm_mte_set_tag): Likewise.
(__arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.
2019-11-19 Dennis Zhang <dennis.zhang@arm.com>
* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.
From-SVN: r278444
Richard Henderson [Tue, 19 Nov 2019 13:14:20 +0000 (13:14 +0000)]
arm: Fixes for asm-flags vs thumb1 and ilp32
Thumb1 cannot support asm-flags currently, because we don't expose the
flags register to the compiler. Disable the support for that case.
Adjust the asm-flag-6 test for aarch64 ilp32 correctness.
gcc/
* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
to define __GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (thumb1_md_asm_adjust): New function.
(arm_option_params_internal): Swap out targetm.md_asm_adjust
depending on TARGET_THUMB1.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.
gcc/testsuite/
* testsuite/gcc.target/arm/asm-flag-3.c: Skip for thumb1.
* testsuite/gcc.target/arm/asm-flag-5.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-6.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-4.c: New test.
* testsuite/gcc.target/aarch64/asm-flag-6.c: Use %w for
asm inputs to cmp instruction for ILP32.
From-SVN: r278443
Jonathan Wakely [Tue, 19 Nov 2019 09:34:59 +0000 (09:34 +0000)]
libstdc++: Fix declarations of variable templates
This code is invalid and rejected by other compilers (see PR 92576).
* include/bits/regex.h (ranges::__detail::__enable_view_impl): Fix
declaration.
* include/bits/stl_multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/stl_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/bits/unordered_set.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/multiset.h (ranges::__detail::__enable_view_impl):
Likewise.
* include/debug/set.h (ranges::__detail::__enable_view_impl): Likewise.
* include/debug/unordered_set (ranges::__detail::__enable_view_impl):
Likewise.
From-SVN: r278440
Jakub Jelinek [Tue, 19 Nov 2019 09:31:59 +0000 (10:31 +0100)]
re PR target/92549 (Use x86 xchg instruction more)
PR target/92549
* config/i386/i386.md (peephole2 for *swap<mode>): New peephole2.
* gcc.target/i386/pr92549.c: New test.
From-SVN: r278439
Jakub Jelinek [Tue, 19 Nov 2019 09:15:53 +0000 (10:15 +0100)]
re PR middle-end/91450 (__builtin_mul_overflow(A,B,R) wrong code if product < 0, *R is unsigned, and !(A&B))
PR middle-end/91450
* internal-fn.c (expand_mul_overflow): For s1 * s2 -> ur, if one
operand is negative and one non-negative, compare the non-negative
one against 0 rather than comparing s1 & s2 against 0. Otherwise,
don't compare (s1 & s2) == 0, but compare separately both s1 == 0
and s2 == 0, unless one of them is known to be negative. Remove
tem2 variable, use tem where tem2 has been used before.
* gcc.c-torture/execute/pr91450-1.c: New test.
* gcc.c-torture/execute/pr91450-2.c: New test.
From-SVN: r278437
Eric Botcazou [Tue, 19 Nov 2019 09:12:09 +0000 (09:12 +0000)]
* doc/invoke.texi (-gno-internal-reset-location-views): Fix typo.
From-SVN: r278434
Jakub Jelinek [Tue, 19 Nov 2019 08:52:31 +0000 (09:52 +0100)]
re PR c++/92504 (ICE on gcc-9 -fopenmp: internal compiler error: tree check: expected tree that contains 'decl common' structure, have 'baselink' in get_inner_reference, at expr.c:7238)
PR c++/92504
* semantics.c (handle_omp_for_class_iterator): Don't call
cp_fully_fold on cond.
* g++.dg/gomp/pr92504.C: New test.
From-SVN: r278433
Jakub Jelinek [Tue, 19 Nov 2019 08:51:31 +0000 (09:51 +0100)]
re PR tree-optimization/92557 (ICE in omp_clause_aligned_alignment, at omp-low.c:4090)
PR tree-optimization/92557
* omp-low.c (omp_clause_aligned_alignment): Punt if TYPE_MODE is not
vmode rather than asserting it always is.
* gcc.dg/gomp/pr92557.c: New test.
From-SVN: r278432
Richard Biener [Tue, 19 Nov 2019 07:33:58 +0000 (07:33 +0000)]
re PR tree-optimization/92554 (ICE in vect_create_epilog_for_reduction, at tree-vect-loop.c:4325)
2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92554
* tree-vect-loop.c (vect_create_epilog_for_reduction): Look
for the actual condition stmt and deal with sign-changes.
* gcc.dg/vect/pr92554.c: New testcase.
From-SVN: r278431
Richard Biener [Tue, 19 Nov 2019 07:31:28 +0000 (07:31 +0000)]
re PR tree-optimization/92555 (ICE in exact_div, at poly-int.h:2162)
2019-09-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92555
* tree-vect-loop.c (vect_update_vf_for_slp): Also scan PHIs
for non-SLP stmts.
* gcc.dg/vect/pr92555.c: New testcase.
From-SVN: r278430
Martin Liska [Tue, 19 Nov 2019 07:22:21 +0000 (08:22 +0100)]
Initialize a variable due to -Wmaybe-uninitialized.
2019-11-19 Martin Liska <mliska@suse.cz>
PR bootstrap/92540
* config/riscv/riscv.c (riscv_address_insns): Initialize
addr in order to remove boostrap -Wmaybe-uninitialized
error.
From-SVN: r278429
Joseph Myers [Tue, 19 Nov 2019 00:21:49 +0000 (00:21 +0000)]
Change some bad uses of C2x attributes into pedwarns.
Certain bad uses of C2x standard attributes (that is, attributes
inside [[]] with only a name but no namespace specified) are
constraint violations, and so should be diagnosed with a pedwarn (or
error) where GCC currently uses a warning. This patch implements this
in some cases (not yet for attributes used on types, nor for some bad
uses of fallthrough attributes). Specifically, this applies to
unknown standard attributes (taking care not to pedwarn for nodiscard,
which is known but not implemented for C), and to all currently
implemented standard attributes in attribute declarations (including
when mixed with fallthrough) and on statements.
Bootstrapped with no regressions on x86_64-pc-linux-gnu.
gcc/c:
* c-decl.c (c_warn_unused_attributes): Use pedwarn not warning for
standard attributes.
* c-parser.c (c_parser_std_attribute): Take argument for_tm. Use
pedwarn for unknown standard attributes and return error_mark_node
for them.
gcc/c-family:
* c-common.c (attribute_fallthrough_p): In C, use pedwarn not
warning for standard attributes mixed with fallthrough attributes.
gcc/testsuite:
* gcc.dg/c2x-attr-fallthrough-5.c, gcc.dg/c2x-attr-syntax-5.c: New
tests.
* gcc.dg/c2x-attr-deprecated-2.c, gcc.dg/c2x-attr-deprecated-4.c,
gcc.dg/c2x-attr-fallthrough-2.c, gcc.dg/c2x-attr-maybe_unused-2.c,
gcc.dg/c2x-attr-maybe_unused-4.c: Expect errors in place of some
warnings.
From-SVN: r278428
GCC Administrator [Tue, 19 Nov 2019 00:16:23 +0000 (00:16 +0000)]
Daily bump.
From-SVN: r278427
Paolo Carlini [Mon, 18 Nov 2019 23:02:22 +0000 (23:02 +0000)]
typeck.c (cp_build_addr_expr_1): Use cp_expr_loc_or_input_loc in three places.
/cp
2019-11-18 Paolo Carlini <paolo.carlini@oracle.com>
* typeck.c (cp_build_addr_expr_1): Use cp_expr_loc_or_input_loc
in three places.
(cxx_sizeof_expr): Use it in one additional place.
(cxx_alignof_expr): Likewise.
(lvalue_or_else): Likewise.
/testsuite
2019-11-18 Paolo Carlini <paolo.carlini@oracle.com>
* g++.dg/cpp0x/addressof2.C: Test locations too.
* g++.dg/cpp0x/rv-lvalue-req.C: Likewise.
* g++.dg/expr/crash2.C: Likewise.
* g++.dg/expr/lval1.C: Likewise.
* g++.dg/expr/unary2.C: Likewise.
* g++.dg/ext/lvaddr.C: Likewise.
* g++.dg/ext/lvalue1.C: Likewise.
* g++.dg/tree-ssa/pr20280.C: Likewise.
* g++.dg/warn/Wplacement-new-size.C: Likewise.
* g++.old-deja/g++.brendan/alignof.C: Likewise.
* g++.old-deja/g++.brendan/sizeof2.C: Likewise.
* g++.old-deja/g++.law/temps1.C: Likewise.
From-SVN: r278424
Martin Sebor [Mon, 18 Nov 2019 22:14:16 +0000 (22:14 +0000)]
PR middle-end/92493 - ICE in get_origin_and_offset at gimple-ssa-sprintf.c
gcc/ChangeLog:
PR tree-optimization/92493
* gimple-ssa-sprintf.c (get_origin_and_offset): Remove spurious
assignment.
gcc/testsuite/ChangeLog:
PR tree-optimization/92493
* gcc.dg/pr92493.c: New test.
From-SVN: r278423
Giuliano Belinassi [Mon, 18 Nov 2019 20:05:16 +0000 (20:05 +0000)]
Refactor tree-loop-distribution.c for thread safety
This patch refactors tree-loop-distribution.c for thread safety without
use of C11 __thread feature. All global variables were moved to
`class loop_distribution` which is initialized at ::execute time.
From-SVN: r278421
Jan Hubicka [Mon, 18 Nov 2019 19:28:53 +0000 (20:28 +0100)]
re PR ipa/92508 (ICE in do_estimate_edge_time, at ipa-inline-analysis.c:223 since r278159)
PR ipa/92508
* ipa-inline.c (inline_small_functions): Add new edges after reseting
caches.
* ipa-inline-analysis.c (do_estimate_edge_time): Fix sanity check.
From-SVN: r278419
Joseph Myers [Mon, 18 Nov 2019 17:41:40 +0000 (17:41 +0000)]
Add more C2x attributes tests.
This patch adds more tests of C2x attributes, where I found cases that
were handled correctly by my patches but missing from the original
tests. Tests are added for -std=c11 -pedantic handling of C2x
attribute syntax and corresponding -Wc11-c2x-compat handling; for
struct [[deprecated]]; and for the [[__fallthrough__]] spelling of
[[fallthrough]] in the case of valid fallthrough attributes.
Tested for x86_64-pc-linux-gnu.
* gcc.dg/c11-attr-syntax-1.c, gcc.dg/c11-attr-syntax-2.c,
gcc.dg/c11-attr-syntax-3.c, gcc.dg/c2x-attr-syntax-4.c: New tests.
* gcc.dg/c2x-attr-deprecated-1.c: Also test struct [[deprecated]].
* gcc.dg/c2x-attr-fallthrough-1.c: Also test [[__fallthrough__]].
From-SVN: r278418
Marek Polacek [Mon, 18 Nov 2019 16:39:24 +0000 (16:39 +0000)]
PR c++/91962 - ICE with reference binding and qualification conversion.
When fixing c++/91889 (r276251) I was assuming that we couldn't have a ck_qual
under a ck_ref_bind, and I was introducing it in the patch and so this
+ if (next_conversion (convs)->kind == ck_qual)
+ {
+ gcc_assert (same_type_p (TREE_TYPE (expr),
+ next_conversion (convs)->type));
+ /* Strip the cast created by the ck_qual; cp_build_addr_expr
+ below expects an lvalue. */
+ STRIP_NOPS (expr);
+ }
in convert_like_real was supposed to handle it. But that assumption was wrong
as this test shows; here we have "(int *)f" where f is of type long int, and
we're converting it to "const int *const &", so we have both ck_ref_bind and
ck_qual. That means that the new STRIP_NOPS strips an expression it shouldn't
have, and that then breaks when creating a TARGET_EXPR. So we want to limit
the stripping to the new case only. This I do by checking need_temporary_p,
which will be 0 in the new case. Yes, we can set need_temporary_p when
binding a reference directly, but then we won't have a qualification
conversion. It is possible to have a bit-field, convert it to a pointer,
and then convert that pointer to a more-qualified pointer, but in that case
we're not dealing with an lvalue, so gl_kind is 0, so we won't enter this
block in reference_binding:
1747 if ((related_p || compatible_p) && gl_kind)
* call.c (convert_like_real) <case ck_ref_bind>: Check need_temporary_p.
* g++.dg/cpp0x/ref-bind7.C: New test.
From-SVN: r278416
Martin Jambor [Mon, 18 Nov 2019 15:50:06 +0000 (16:50 +0100)]
Add testcase for already fixed PR ipa/92528
2019-11-18 Martin Jambor <mjambor@suse.cz>
PR ipa/92528
* g++.dg/ipa/pr92528.C: New test.
From-SVN: r278415
Richard Sandiford [Mon, 18 Nov 2019 15:36:10 +0000 (15:36 +0000)]
Add optabs for accelerating RAW and WAR alias checks
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/sourcebuild.texi (vect_check_ptrs): Document.
* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
* doc/md.texi: Document them.
* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
internal functions.
* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
* internal-fn.c (check_ptrs_direct): New macro.
(expand_check_ptrs_optab_fn): Likewise.
(direct_check_ptrs_optab_supported_p): Likewise.
(internal_check_ptrs_fn_supported_p): New fuction.
* tree-data-ref.c: Include internal-fn.h.
(create_ifn_alias_checks): New function.
(create_intersect_range_checks): Use it.
* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
(optab, cmp_op): Handle it.
(raw_war, unspec): New int attributes.
* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
constants.
* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
New predicate.
* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
expander.
(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
pattern.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
New procedure.
* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
used, if available.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
* gcc.target/aarch64/sve2/whilerw_1.c: New test.
* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
From-SVN: r278414
Richard Sandiford [Mon, 18 Nov 2019 15:29:53 +0000 (15:29 +0000)]
Add an empty constructor shortcut to build_vector_from_ctor
Empty vector constructors are equivalent to zero vectors. If we handle
that case directly, we can support it for variable-length vectors and
can hopefully make things more efficient for fixed-length vectors.
This is needed by a later C++ patch.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.c (build_vector_from_ctor): Directly return a zero vector for
empty constructors.
From-SVN: r278413
Richard Sandiford [Mon, 18 Nov 2019 15:29:03 +0000 (15:29 +0000)]
Two RTL CC tweaks for SVE pmore/plast conditions
SVE has two composite conditions:
pmore == at least one bit set && last bit clear
plast == no bits set || last bit set
So in general we generate them from:
A: CC = test bits
B: reg1 = first condition
C: CC = test bits
D: reg2 = second condition
E: result = (reg1 op reg2) where op is || or &&
To fold all this into a single test, we need to be able to remove
the redundant C (the cse.c patch) and then fold B, D and E down to
a single condition (the simplify-rtx.c patch).
The underlying conditions are unsigned, so the simplify-rtx.c part needs
to support both unsigned comparisons and AND. However, to avoid opening
the can of worms that is ANDing FP comparisons for unordered inputs,
I've restricted the new AND handling to cases in which NaNs can be
ignored. I think this is still a strict extension of what we have now,
it just doesn't go as far as it could. Going further would need an
entirely different set of testcases so I think would make more sense
as separate work.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* cse.c (cse_insn): Delete no-op register moves too.
* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
Take a second comparison to control the value for NE.
(mask_to_comparison): Handle unsigned comparisons.
(simplify_logical_relational_operation): Likewise. Update call
to comparison_to_mask. Handle AND if !HONOR_NANs.
(simplify_binary_operation_1): Call the above for AND too.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
From-SVN: r278411
Richard Sandiford [Mon, 18 Nov 2019 15:27:56 +0000 (15:27 +0000)]
Handle VIEW_CONVERT_EXPR for variable-length vectors
This patch handles VIEW_CONVERT_EXPRs of variable-length VECTOR_CSTs
by adding tree-level versions of native_decode_vector_rtx and
simplify_const_vector_subreg. It uses the same code for fixed-length
vectors, both to get more coverage and because operating directly on
the compressed encoding should be more efficient for longer vectors
with a regular pattern.
The structure and comments are very similar between the tree and
rtx routines.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* fold-const.c (native_encode_vector): Turn into a wrapper function,
splitting the main code out into...
(native_encode_vector_part): ...this new function.
(native_decode_vector_tree): New function.
(fold_view_convert_vector_encoding): Likewise.
(fold_view_convert_expr): Use it for converting VECTOR_CSTs
to VECTOR_TYPEs.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/temporaries_1.c: New test.
From-SVN: r278410
Richard Sandiford [Mon, 18 Nov 2019 15:26:55 +0000 (15:26 +0000)]
Optimise WAR and WAW alias checks
For:
void
f1 (int *x, int *y)
{
for (int i = 0; i < 32; ++i)
x[i] += y[i];
}
we checked at runtime whether one vector at x would overlap one vector
at y. But in cases like this, the vector code would handle x <= y just
fine, since any write to address A still happens after any read from
address A. The only problem is if x is ahead of y by less than a
vector.
The same is true for two writes:
void
f2 (int *x, int *y)
{
for (int i = 0; i < 32; ++i)
{
x[i] = i;
y[i] = 2;
}
}
if y <= x then a vector write at y after a vector write at x would
have the same net effect as the original scalar writes.
This patch optimises the alias checks for these two cases. E.g.,
before the patch, f1 used:
add x2, x0, 15
sub x2, x2, x1
cmp x2, 30
bls .L2
whereas after the patch it uses:
add x2, x1, 4
sub x2, x0, x2
cmp x2, 8
bls .L2
Read-after-write cases like:
int
f3 (int *x, int *y)
{
int res = 0;
for (int i = 0; i < 32; ++i)
{
x[i] = i;
res += y[i];
}
return res;
}
can cope with x == y, but otherwise don't allow overlap in either
direction. Since checking for x == y at runtime would require extra
code, we're probably better off sticking with the current overlap test.
An overlap test is also needed if the scalar or vector accesses covered
by the alias check are mixed together, rather than all statements for
the second access following all statements for the first access.
The new code for gcc.target/aarch64/sve/var_strict_[135].c is slightly
better than before.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (create_intersect_range_checks_index): If the
alias pair describes simple WAW and WAR dependencies, just check
whether the first B access overlaps later A accesses.
(create_waw_or_war_checks): New function that performs the same
optimization on addresses.
(create_intersect_range_checks): Call it.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
* gcc.dg/vect/vect-alias-check-14.c: Likewise.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-18.c: Likewise.
* gcc.dg/vect/vect-alias-check-19.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
From-SVN: r278409
Richard Sandiford [Mon, 18 Nov 2019 15:26:07 +0000 (15:26 +0000)]
LRA: handle memory constraints that accept more than "m"
LRA allows address constraints that are more relaxed than "p":
/* Target hooks sometimes don't treat extra-constraint addresses as
legitimate address_operands, so handle them specially. */
if (insn_extra_address_constraint (cn)
&& satisfies_address_constraint_p (&ad, cn))
return change_p;
For SVE it's useful to allow the same thing for memory constraints.
The particular use case is LD1RQ, which is an SVE instruction that
addresses Advanced SIMD vector modes and that accepts some addresses
that normal Advanced SIMD moves don't.
Normally we require every memory to satisfy at least "m", which is
defined to be a memory "with any kind of address that the machine
supports in general". However, LD1RQ is very much special-purpose:
it doesn't really have any relation to normal operations on these
modes. Adding its addressing modes to "m" would lead to bad Advanced
SIMD optimisation decisions in passes like ivopts. LD1RQ therefore
has a memory constraint that accepts things "m" doesn't.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* lra-constraints.c (valid_address_p): Take the operand and a
constraint as argument. If the operand is a MEM and the constraint
is a memory constraint, check whether the eliminated form of the
MEM already satisfies the constraint.
(process_address_1): Update calls accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/ld1rq_f16.c: Remove XFAIL.
* gcc.target/aarch64/sve/acle/asm/ld1rq_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u64.c: Likewise.
From-SVN: r278408
Tom Tromey [Mon, 18 Nov 2019 14:22:57 +0000 (14:22 +0000)]
Remove vestiges of MODIFY_JNI_METHOD_CALL
I happened to notice that MODIFY_JNI_METHOD_CALL was defined in
cygming.h and documented in tm.texi. However, because it was only
needed for gcj, it is obsolete. This patch removes the vestiges.
Tested by grep, and rebuilding the documentation.
gcc/ChangeLog
2019-11-18 Tom Tromey <tromey@adacore.com>
* doc/tm.texi: Rebuild.
* doc/tm.texi.in (Misc): Don't document MODIFY_JNI_METHOD_CALL.
* config/i386/cygming.h (MODIFY_JNI_METHOD_CALL): Don't define.
From-SVN: r278407
Richard Biener [Mon, 18 Nov 2019 14:07:11 +0000 (14:07 +0000)]
re PR tree-optimization/92516 (ICE in vect_schedule_slp_instance, at tree-vect-slp.c:4095 since r278246)
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92516
* tree-vect-slp.c (vect_analyze_slp_instance): Add bst_map
argument, hoist bst_map creation/destruction to ...
(vect_analyze_slp): ... here, forming a true graph with
SLP instances being the entries.
(vect_detect_hybrid_slp_stmts): Remove wrapper.
(vect_detect_hybrid_slp): Use one visited set for all
graph entries.
(vect_slp_analyze_node_operations): Simplify visited/lvisited
to hash-sets of slp_tree.
(vect_slp_analyze_operations): Likewise.
(vect_bb_slp_scalar_cost): Remove wrapper.
(vect_bb_vectorization_profitable_p): Use one visited set for
all graph entries.
(vect_schedule_slp_instance): Elide bst_map use.
(vect_schedule_slp): Likewise.
* g++.dg/vect/slp-pr92516.cc: New testcase.
2019-11-18 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): When a CTOR
was vectorized with just external refs fail.
* gcc.dg/vect/vect-ctor-1.c: New testcase.
From-SVN: r278406
Martin Liska [Mon, 18 Nov 2019 13:04:57 +0000 (14:04 +0100)]
Unset m_checker in sem_function::init.
2019-11-18 Martin Liska <mliska@suse.cz>
PR ipa/92525
* ipa-icf.c (sem_function::init): Unset m_checker
at the end of the function.
From-SVN: r278405
Martin Liska [Mon, 18 Nov 2019 12:54:11 +0000 (13:54 +0100)]
Remove strange dump suboptions in testsuite.
2019-11-18 Martin Liska <mliska@suse.cz>
* gcc.dg/ipa/ipa-icf-36.c: Remove 'all-all-all'.
* gcc.dg/ipa/ipa-icf-37.c: Likewise.
From-SVN: r278404
Szabolcs Nagy [Mon, 18 Nov 2019 12:46:56 +0000 (12:46 +0000)]
fix ChangeLog to reference the PR
From-SVN: r278403
Jonathan Wakely [Mon, 18 Nov 2019 12:46:08 +0000 (12:46 +0000)]
libstdc++: Fix std::jthread bugs
The std::jthread::get_id() function was missing a return statement.
The is_invocable check needs to be done using decayed types, as they'll
be forwarded to std::invoke as rvalues.
Also reduce header dependencies for the <thread> header. We don't need
to include <functional> for std::jthread because <bits/invoke.h> is
already included, which defines std::__invoke. We can also remove
<bits/functexcept.h> which isn't used at all. Finally, when
_GLIBCXX_HAS_GTHREADS is not defined there's no point including any
other headers, since we're not going to define anything in <thread>
anyway.
* include/std/thread: Reduce header dependencies.
(jthread::get_id()): Add missing return.
(jthread::get_stop_token()): Avoid unnecessary stop_source temporary.
(jthread::_S_create): Check is_invocable using decayed types. Add
static assertion.
* testsuite/30_threads/jthread/1.cc: Add dg-require-gthreads.
* testsuite/30_threads/jthread/2.cc: Likewise.
* testsuite/30_threads/jthread/3.cc: New test.
* testsuite/30_threads/jthread/jthread.cc: Add missing directives for
pthread and gthread support. Use VERIFY instead of assert.
From-SVN: r278402
Jonathan Wakely [Mon, 18 Nov 2019 12:46:02 +0000 (12:46 +0000)]
libstdc++: Fix some -Wsystem-headers warnings
* include/bits/alloc_traits.h (allocator_traits::construct)
(allocator_traits::destroy, allocator_traits::max_size): Add unused
attributes to parameters that are not used in C++20.
* include/std/bit (__ceil2): Add braces around assertion to avoid
-Wmissing-braces warning.
From-SVN: r278401
Richard Biener [Mon, 18 Nov 2019 12:41:11 +0000 (12:41 +0000)]
re PR tree-optimization/92558 (Miscompare of 554.roms_r with -Ofast -march=znver2 -flto since r278289)
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92558
* tree-vect-loop.c (vect_create_epilog_for_reduction): When
reducting the width of a reduction vector def update new_phis.
* gcc.dg/vect/pr92558.c: New testcase.
From-SVN: r278400
Szabolcs Nagy [Mon, 18 Nov 2019 12:08:18 +0000 (12:08 +0000)]
musl: Don't use gthr weak refs in libgcc PR91737
The gthr weak reference based single thread detection is unsafe with
static linking and in case of dynamic linking it's ineffective on musl
since pthread symbols are defined in libc.so.
(Ideally this should be fixed for all targets, since glibc plans to move
libpthread.so into libc.so too and users want to static link to pthread
without --whole-archive: PR87189.)
For now we have to explicitly opt out from the broken behaviour in the
config machinery of each target lib and libgcc was previously missed.
libgcc/ChangeLog:
2019-11-18 Szabolcs Nagy <szabolcs.nagy@arm.com>
* config.host: Add t-gthr-noweak on *-*-musl*.
* config/t-gthr-noweak: New file.
From-SVN: r278399
Szabolcs Nagy [Mon, 18 Nov 2019 12:03:18 +0000 (12:03 +0000)]
musl: use correct long double abi by default
On powerpc and s390x the musl ABI requires 64 bit and 128 bit long
double respectively, so adjust the default.
gcc/ChangeLog:
2019-11-18 Szabolcs Nagy <szabolcs.nagy@arm.com>
* configure.ac (gcc_cv_target_ldbl128): Set for powerpc*-*-linux-musl*
and s390*-*-linux-musl* targets.
* configure: Regenerate.
From-SVN: r278398
Szabolcs Nagy [Mon, 18 Nov 2019 12:00:45 +0000 (12:00 +0000)]
s390: add musl support
Add the musl dynamic linker names.
gcc/ChangeLog:
2019-11-18 Szabolcs Nagy <szabolcs.nagy@arm.com>
* config/s390/linux.h (MUSL_DYNAMIC_LINKER32): Define.
(MUSL_DYNAMIC_LINKER64): Define.
From-SVN: r278397
Martin Liska [Mon, 18 Nov 2019 11:51:20 +0000 (12:51 +0100)]
Improve -dbg-cnt error message and support :0.
2019-11-18 Martin Liska <mliska@suse.cz>
* dbgcnt.c (dbg_cnt_set_limit_by_name): Provide error
message for an unknown counter.
(dbg_cnt_process_single_pair): Support 0 as minimum value.
(dbg_cnt_process_opt): Remove unreachable code.
From-SVN: r278396
Martin Liska [Mon, 18 Nov 2019 11:51:05 +0000 (12:51 +0100)]
Verify NOP_EXPR LHS type in IPA ICF.
2019-11-18 Martin Liska <mliska@suse.cz>
PR ipa/92529
* ipa-icf-gimple.c (func_checker::compare_gimple_assign):
Compare LHS types of NOP_EXPR.
2019-11-18 Martin Liska <mliska@suse.cz>
PR ipa/92529
* gcc.dg/ipa/pr92529.c: New test.
From-SVN: r278395
Matthew Malcomson [Mon, 18 Nov 2019 11:16:46 +0000 (11:16 +0000)]
[mid-end][__RTL] Clean state despite unspecified __RTL startwith passes
Hi there,
When compiling an __RTL function that has an unspecified "startwith"
pass we currently don't run the cleanup pass, this means that we ICE on
the next function (if it's a basic function).
This change ensures that the clean_state pass is run even if the
startwith pass is unspecified.
We also ensure the name of the startwith pass is always freed correctly.
As an example, before this change the following code would ICE when compiling
the function `foo_a`.
When compiled with
./aarch64-none-linux-gnu-gcc -O0 -S unspecified-pass-error.c -o test.s
```
int __RTL () badfoo ()
{
(function "badfoo"
(insn-chain
(block 2
(edge-from entry (flags "FALLTHRU"))
(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
(cinsn 101 (set (reg:DI x19) (reg:DI x0)))
(cinsn 10 (use (reg/i:SI x19)))
(edge-to exit (flags "FALLTHRU"))
) ;; block 2
) ;; insn-chain
) ;; function "foo2"
}
int
foo_a ()
{
return 200;
}
```
Now it silently ignores the __RTL function and successfully compiles foo_a.
regtest done on aarch64
regtest done on x86_64
OK for trunk?
gcc/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* run-rtl-passes.c (run_rtl_passes): Accept and handle empty
"initial_pass_name" argument -- by running "*clean_state" pass.
Also free the "initial_pass_name" when done.
gcc/c/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* c-parser.c (c_parser_parse_rtl_body): Always call
run_rtl_passes, even if startwith pass is not provided.
gcc/testsuite/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* gcc.dg/rtl/aarch64/unspecified-pass-error.c: New test.
From-SVN: r278393
Richard Biener [Mon, 18 Nov 2019 09:44:52 +0000 (09:44 +0000)]
re PR target/92462 ([arm32] -ftree-pre makes a variable to be wrongly hoisted out)
2019-11-18 Richard Biener <rguenther@suse.de>
PR rtl-optimization/92462
* alias.c (find_base_term): Restrict the look through ANDs.
(find_base_value): Likewise.
From-SVN: r278391
Christophe Lyon [Mon, 18 Nov 2019 09:20:18 +0000 (09:20 +0000)]
[testsuite][ARM] check_effective_target_arm_vfp_ok_nocache: Fix typo in option name
2019-11-18 Christophe Lyon <christophe.lyon@linaro.org>
* lib/target-supports.exp
(check_effective_target_arm_vfp_ok_nocache): Fix typo in option
name.
From-SVN: r278390
Georg-Johann Lay [Mon, 18 Nov 2019 08:19:08 +0000 (08:19 +0000)]
re PR target/92545 (avr: support ATmega devices from the 0-series)
PR target/92545
* config/avr/gen-avr-mmcu-specs.c (print_mcu)
[link_pm_base_address]: Symbol name is __RODATA_PM_OFFSET__.
From-SVN: r278389
Georg-Johann Lay [Mon, 18 Nov 2019 07:54:30 +0000 (07:54 +0000)]
re PR target/92545 (avr: support ATmega devices from the 0-series)
PR target/92545
* doc/avr-mmcu.texi: Regenerate.
From-SVN: r278388
Georg-Johann Lay [Mon, 18 Nov 2019 07:52:55 +0000 (07:52 +0000)]
Add support for AVR devices from the 0-series.
PR target/92545
* config/avr/avr-arch.h (avr_mcu_t) <flash_pm_offset>: New field.
* config/avr/avr-devices.c (avr_mcu_types): Adjust initializers.
* config/avr/avr-mcus.def (AVR_MCU): Add respective field.
* config/avr/specs.h (LINK_SPEC) <%(link_pm_base_address)>: Add.
* config/avr/gen-avr-mmcu-specs.c (print_mcu)
<*cpp, *cpp_mcu, *cpp_avrlibc, *link_pm_base_address>: Emit code
for spec definitions.
* doc/avr-mmcu.texi: Regenerate.
From-SVN: r278387
Hongtao Liu [Mon, 18 Nov 2019 02:22:55 +0000 (02:22 +0000)]
Split X86_TUNE_AVX128_OPTIMAL into X86_TUNE_AVX256_SPLIT_REGS
and X86_TUNE_AVX128_OPTIMAL.
Changelog
gcc/
PR target/92448
* config/i386/i386-expand.c (ix86_expand_set_or_cpymem):
Replace TARGET_AVX128_OPTIMAL with TARGET_AVX256_SPLIT_REGS.
* config/i386/i386-option.c (ix86_vec_cost): Ditto.
(ix86_reassociation_width): Ditto.
* config/i386/i386-options.c (ix86_option_override_internal):
Replace TARGET_AVX128_OPTIAML with
ix86_tune_features[X86_TUNE_AVX128_OPTIMAL]
* config/i386/i386.h (TARGET_AVX256_SPLIT_REGS): New macro.
(TARGET_AVX128_OPTIMAL): Deleted.
* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): New
DEF_TUNE.
From-SVN: r278385
Maciej W. Rozycki [Mon, 18 Nov 2019 00:33:37 +0000 (00:33 +0000)]
libgomp: Regenerate `testsuite/Makefile.in' for GCC_HEADER_STDINT removal
Commit r276389 ("configure.ac: Remove GCC_HEADER_STDINT(gstdint.h)") has
not regenerated `testsuite/Makefile.in'. Fix it.
libgomp/
* testsuite/Makefile.in: Regenerate.
From-SVN: r278384
Maciej W. Rozycki [Mon, 18 Nov 2019 00:21:45 +0000 (00:21 +0000)]
libgfortran: Regenerate `Makefile.in' for `runstatedir' removal
A change made with r271340 ("libfortran/90038: Use posix_spawn instead
of fork") accidentally brought the obsolete `runstatedir' setting back
in. Fix it.
libgfortran/
* Makefile.in: Regenerate.
From-SVN: r278383
GCC Administrator [Mon, 18 Nov 2019 00:16:13 +0000 (00:16 +0000)]
Daily bump.
From-SVN: r278382
John David Anglin [Sun, 17 Nov 2019 23:11:52 +0000 (23:11 +0000)]
linux-atomic.c (__kernel_cmpxchg): Change argument 1 to volatile void *.
* config/pa/linux-atomic.c (__kernel_cmpxchg): Change argument 1 to
volatile void *. Remove trap check.
(__kernel_cmpxchg2): Likewise.
(FETCH_AND_OP_2): Adjust operand types.
(OP_AND_FETCH_2): Likewise.
(FETCH_AND_OP_WORD): Likewise.
(OP_AND_FETCH_WORD): Likewise.
(COMPARE_AND_SWAP_2): Likewise.
(__sync_val_compare_and_swap_4): Likewise.
(__sync_bool_compare_and_swap_4): Likewise.
(SYNC_LOCK_TEST_AND_SET_2): Likewise.
(__sync_lock_test_and_set_4): Likewise.
(SYNC_LOCK_RELEASE_1): Likewise. Use __kernel_cmpxchg2 for release.
(__sync_lock_release_4): Adjust operand types. Use __kernel_cmpxchg
for release.
(__sync_lock_release_8): Remove.
From-SVN: r278377
Jeff Law [Sun, 17 Nov 2019 16:31:32 +0000 (09:31 -0700)]
* gcc.dg/complex-6.c: Do not run dump scan tests for rx target.
From-SVN: r278376
Jakub Jelinek [Sun, 17 Nov 2019 06:12:01 +0000 (07:12 +0100)]
method.c (lookup_comparison_result): Use %qD instead of %<%T::%D%> to print the decl.
* method.c (lookup_comparison_result): Use %qD instead of %<%T::%D%>
to print the decl.
(lookup_comparison_category): Use %qD instead of %<std::%D%> to print
the decl.
* g++.dg/cpp2a/spaceship-err3.C: New test.
From-SVN: r278375
Edward Smith-Rowland [Sun, 17 Nov 2019 03:31:15 +0000 (03:31 +0000)]
Repair the <tuple> part of C++20 p1032 Misc constexpr bits.
2019-11-16 Edward Smith-Rowland <3dw4rd@verizon.net>
Repair the <tuple> part of C++20 p1032 Misc constexpr bits.
* include/bits/uses_allocator.h (__uses_alloc0::_Sink::operaror=)
(__use_alloc(const _Alloc&)) : Constexpr.
From-SVN: r278373
Jonathan Wakely [Sun, 17 Nov 2019 01:32:55 +0000 (01:32 +0000)]
libstdc++: add range constructor for std::string_view (P1391R4)
* include/std/string_view (basic_string_view(It, End)): Add range
constructor and deduction guide from P1391R4.
* testsuite/21_strings/basic_string_view/cons/char/range.cc: New test.
From-SVN: r278371
Jonathan Wakely [Sun, 17 Nov 2019 01:07:54 +0000 (01:07 +0000)]
libstdc++: Define C++20 range utilities and range factories
This adds another chunk of the <ranges> header.
The changes from P1456R1 (Move-only views) and P1862R1 (Range adaptors
for non-copyable iterators) are included, but not the changes from
P1870R1 (forwarding-range<T> is too subtle).
The tests for subrange and iota_view are poor and should be improved.
* include/bits/regex.h (match_results): Specialize __enable_view_impl.
* include/bits/stl_set.h (set): Likewise.
* include/bits/unordered_set.h (unordered_set, unordered_multiset):
Likewise.
* include/debug/multiset.h (__debug::multiset): Likewise.
* include/debug/set.h (__debug::set): Likewise.
* include/debug/unordered_set (__debug::unordered_set)
(__debug::unordered_multiset): Likewise.
* include/std/ranges (ranges::view, ranges::enable_view)
(ranges::view_interface, ranges::subrange, ranges::empty_view)
(ranges::single_view, ranges::views::single, ranges::iota_view)
(ranges::views::iota): Define for C++20.
* testsuite/std/ranges/empty_view.cc: New test.
* testsuite/std/ranges/iota_view.cc: New test.
* testsuite/std/ranges/single_view.cc: New test.
* testsuite/std/ranges/view.cc: New test.
From-SVN: r278370
GCC Administrator [Sun, 17 Nov 2019 00:16:14 +0000 (00:16 +0000)]
Daily bump.
From-SVN: r278369
Segher Boessenkool [Sat, 16 Nov 2019 23:31:19 +0000 (00:31 +0100)]
rs6000: Allow mode GPR in cceq_{ior,rev}_compare
Also make it a parmeterized name: @cceq_{ior,rev}_compare_<mode>.
* config/rs6000/rs6000.md (cceq_ior_compare): Rename to...
(@cceq_ior_compare_<mode> for GPR): ... this. Allow GPR instead of
just SI.
(cceq_rev_compare): Rename to...
(@cceq_rev_compare_<mode> for GPR): ... this. Allow GPR instead of
just SI.
(define_split for <bd>tf_<mode>): Add SImode first argument to
gen_cceq_ior_compare.
From-SVN: r278366
Jonathan Wakely [Sat, 16 Nov 2019 22:00:23 +0000 (22:00 +0000)]
Revert r278363 "Start work on <ranges> header"
This was not meant to be on the branch I committed r278364 from, as it
is not ready to commit yet.
* include/std/ranges: Revert accidentally committed changes.
From-SVN: r278365
Jonathan Wakely [Sat, 16 Nov 2019 21:47:28 +0000 (21:47 +0000)]
libstdc++: Optimize std::jthread construction
This change avoids storing a copy of a stop_token object that isn't
needed and won't be passed to the callable object. This slightly reduces
memory usage when the callable doesn't use a stop_token. It also removes
indirection in the invocation of the callable in the new thread, as
there is no lambda and no additional calls to std::invoke.
It also adds some missing [[nodiscard]] attributes, and the non-member
swap overload for std::jthread.
* include/std/thread (jthread::jthread()): Use nostopstate constant.
(jthread::jthread(Callable&&, Args&&...)): Use helper function to
create std::thread instead of indirection through a lambda. Use
remove_cvref_t instead of decay_t.
(jthread::joinable(), jthread::get_id(), jthread::native_handle())
(jthread::hardware_concurrency()): Add nodiscard attribute.
(swap(jthread&. jthread&)): Define hidden friend.
(jthread::_S_create): New helper function for constructor.
From-SVN: r278364
Jonathan Wakely [Sat, 16 Nov 2019 21:47:22 +0000 (21:47 +0000)]
Start work on <ranges> header
From-SVN: r278363
Segher Boessenkool [Sat, 16 Nov 2019 19:32:12 +0000 (20:32 +0100)]
Delete common/config/powerpcspe
I missed this part in r266961. Various people have been editing it
since; I finally noticed.
* common/config/powerpcspe: Delete.
From-SVN: r278361
Jeff Law [Sat, 16 Nov 2019 17:14:14 +0000 (10:14 -0700)]
[PATCH] Fix slowness in demangler
* cp-demangle.c (d_print_init): Remove const from 4th param.
(cplus_demangle_fill_name): Initialize d->d_counting.
(cplus_demangle_fill_extended_operator): Likewise.
(cplus_demangle_fill_ctor): Likewise.
(cplus_demangle_fill_dtor): Likewise.
(d_make_empty): Likewise.
(d_count_templates_scopes): Remobe const from 3rd param,
Return on dc->d_counting > 1,
Increment dc->d_counting.
* cp-demint.c (cplus_demangle_fill_component): Initialize d->d_counting.
(cplus_demangle_fill_builtin_type): Likewise.
(cplus_demangle_fill_operator): Likewise.
* demangle.h (struct demangle_component): Add member
d_counting.
From-SVN: r278359
Eduard-Mihai Burtescu [Sat, 16 Nov 2019 15:32:50 +0000 (16:32 +0100)]
[PATCH] Refactor rust-demangle to be independent of C++ demangling.
* demangle.h (rust_demangle_callback): Add.
* cplus-dem.c (cplus_demangle): Use rust_demangle directly.
(rust_demangle): Remove.
* rust-demangle.c (is_prefixed_hash): Rename to is_legacy_prefixed_hash.
(parse_lower_hex_nibble): Rename to decode_lower_hex_nibble.
(parse_legacy_escape): Rename to decode_legacy_escape.
(rust_is_mangled): Remove.
(struct rust_demangler): Add.
(peek): Add.
(next): Add.
(struct rust_mangled_ident): Add.
(parse_ident): Add.
(rust_demangle_sym): Remove.
(print_str): Add.
(PRINT): Add.
(print_ident): Add.
(rust_demangle_callback): Add.
(struct str_buf): Add.
(str_buf_reserve): Add.
(str_buf_append): Add.
(str_buf_demangle_callback): Add.
(rust_demangle): Add.
* rust-demangle.h: Remove.
From-SVN: r278358
Miguel Saldivar [Sat, 16 Nov 2019 14:45:30 +0000 (14:45 +0000)]
* testsuite/demangle-expected: Fix test.
From-SVN: r278357
Richard Sandiford [Sat, 16 Nov 2019 13:31:28 +0000 (13:31 +0000)]
[AArch64] Robustify aarch64_wrffr
This patch uses distinct values for the FFR and FFRT outputs of
aarch64_wrffr, so that a following aarch64_copy_ffr_to_ffrt has
an effect. This is needed to avoid regressions with later patches.
The block comment at the head of the file already described
the pattern this way, and there was already an unspec for it.
Not sure what made me change it...
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-sve.md (aarch64_wrffr): Wrap the FFRT
output in UNSPEC_WRFFR.
From-SVN: r278356
Richard Sandiford [Sat, 16 Nov 2019 11:43:31 +0000 (11:43 +0000)]
Use a single comparison for index-based alias checks
This patch rewrites the index-based alias checks to use conditions
of the form:
(unsigned T) (a - b + bias) <= limit
E.g. before the patch:
struct s { int x[100]; };
void
f1 (struct s *s1, int a, int b)
{
for (int i = 0; i < 32; ++i)
s1->x[i + a] += s1->x[i + b];
}
used:
add w3, w1, 3
cmp w3, w2
add w3, w2, 3
ccmp w1, w3, 0, ge
ble .L2
whereas after the patch it uses:
sub w3, w1, w2
add w3, w3, 3
cmp w3, 6
bls .L2
The patch also fixes the seg_len1 and seg_len2 negation for cases in
which seg_len is a "negative unsigned" value narrower than 64 bits,
like it is for 32-bit targets. Previously we'd end up with values
like 0xffffffff000000001 instead of 1.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (create_intersect_range_checks_index): Rewrite
the index tests to have the form (unsigned T) (B - A + bias) <= limit.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-18.c: New test.
* gcc.dg/vect/vect-alias-check-19.c: Likewise.
* gcc.dg/vect/vect-alias-check-20.c: Likewise.
From-SVN: r278354
Richard Sandiford [Sat, 16 Nov 2019 11:42:53 +0000 (11:42 +0000)]
Print the type of alias check in a dump message
This patch prints a message to say how an alias check is being
implemented.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (create_intersect_range_checks_index)
(create_intersect_range_checks): Print dump messages.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-1.c: Test for the type of alias check.
* gcc.dg/vect/vect-alias-check-8.c: Likewise.
* gcc.dg/vect/vect-alias-check-9.c: Likewise.
* gcc.dg/vect/vect-alias-check-10.c: Likewise.
* gcc.dg/vect/vect-alias-check-11.c: Likewise.
* gcc.dg/vect/vect-alias-check-12.c: Likewise.
* gcc.dg/vect/vect-alias-check-13.c: Likewise.
* gcc.dg/vect/vect-alias-check-14.c: Likewise.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise.
* gcc.dg/vect/vect-alias-check-17.c: Likewise.
From-SVN: r278353
Richard Sandiford [Sat, 16 Nov 2019 11:42:02 +0000 (11:42 +0000)]
Dump the list of merged alias pairs
This patch dumps the final (merged) list of alias pairs. It also adds:
- WAW and RAW versions of vect-alias-check-8.c
- a "well-ordered" version of vect-alias-check-9.c (i.e. all reads
before any writes)
- a test with mixed steps in the same alias pair
I also tweaked the test value in vect-alias-check-9.c so that the
result was less likely to be accidentally correct if the alias
isn't honoured.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (dump_alias_pair): New function.
(prune_runtime_alias_test_list): Use it to dump each merged alias pair.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-8.c: Test for the RAW flag.
* gcc.dg/vect/vect-alias-check-9.c: Test for the ARBITRARY flag.
(TEST_VALUE): Use a higher value for early iterations.
* gcc.dg/vect/vect-alias-check-14.c: New test.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise.
* gcc.dg/vect/vect-alias-check-17.c: Likewise.
From-SVN: r278352
Richard Sandiford [Sat, 16 Nov 2019 11:41:16 +0000 (11:41 +0000)]
Record whether a dr_with_seg_len contains mixed steps
prune_runtime_alias_test_list can merge dr_with_seg_len_pair_ts that
have different steps for the first reference or different steps for the
second reference. This patch adds a flag to record that.
I don't know whether the change to create_intersect_range_checks_index
fixes anything in practice. It would have to be a corner case if so,
since at present we only merge two alias pairs if either the first or
the second references are identical and only the other references differ.
And the vectoriser uses VF-based segment lengths only if both references
in a pair have the same step. Either way, it still seems wrong to use
DR_STEP when it doesn't represent all checks that have been merged into
the pair.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.h (DR_ALIAS_MIXED_STEPS): New flag.
* tree-data-ref.c (prune_runtime_alias_test_list): Set it when
merging data references with different steps.
(create_intersect_range_checks_index): Take a
dr_with_seg_len_pair_t instead of two dr_with_seg_lens.
Bail out if DR_ALIAS_MIXED_STEPS is set.
(create_intersect_range_checks): Take a dr_with_seg_len_pair_t
instead of two dr_with_seg_lens. Update call to
create_intersect_range_checks_index.
(create_runtime_alias_checks): Update call accordingly.
From-SVN: r278351
Richard Sandiford [Sat, 16 Nov 2019 11:40:22 +0000 (11:40 +0000)]
Add flags to dr_with_seg_len_pair_t
This patch adds a bunch of flags to dr_with_seg_len_pair_t,
for use by later patches. The update to tree-loop-distribution.c
is conservatively correct, but might be tweakable later.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.h (DR_ALIAS_RAW, DR_ALIAS_WAR, DR_ALIAS_WAW)
(DR_ALIAS_ARBITRARY, DR_ALIAS_SWAPPED, DR_ALIAS_UNSWAPPED): New flags.
(dr_with_seg_len_pair_t::sequencing): New enum.
(dr_with_seg_len_pair_t::flags): New member variable.
(dr_with_seg_len_pair_t::dr_with_seg_len_pair_t): Take a sequencing
parameter and initialize the flags member variable.
* tree-loop-distribution.c (compute_alias_check_pairs): Update
call accordingly.
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.
Ensure the two data references in an alias pair are in statement
order, if there is a defined order.
* tree-data-ref.c (prune_runtime_alias_test_list): Use
DR_ALIAS_SWAPPED and DR_ALIAS_UNSWAPPED to record whether we've
swapped the references in a dr_with_seg_len_pair_t. OR together
the flags when merging two dr_with_seg_len_pair_ts. After merging,
try to restore the original dr_with_seg_len order, updating the
flags if that fails.
From-SVN: r278350
Richard Sandiford [Sat, 16 Nov 2019 11:35:56 +0000 (11:35 +0000)]
Delay swapping data refs in prune_runtime_alias_test_list
prune_runtime_alias_test_list swapped dr_as between two dr_with_seg_len
pairs before finally deciding whether to merge them. Bailing out later
would therefore leave the pairs in an incorrect state.
IMO a better fix would be to split this out into a subroutine that
produces a temporary dr_with_seg_len on success, rather than changing
an existing one in-place. It would then be easy to merge both the dr_as
and dr_bs if we wanted to, rather than requiring one of them to be equal.
But here I tried to do something that could be backported if necessary.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (prune_runtime_alias_test_list): Delay
swapping the dr_as based on init values until we've decided
whether to merge them.
From-SVN: r278349
Richard Sandiford [Sat, 16 Nov 2019 11:35:08 +0000 (11:35 +0000)]
Move canonicalisation of dr_with_seg_len_pair_ts
The two users of tree-data-ref's runtime alias checks both canonicalise
the order of the dr_with_seg_lens in a pair before passing them to
prune_runtime_alias_test_list. It's more convenient for later patches
if prune_runtime_alias_test_list does that itself.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (prune_runtime_alias_test_list): Sort the
two accesses in each dr_with_seg_len_pair_t before trying to
combine separate dr_with_seg_len_pair_ts.
* tree-loop-distribution.c (compute_alias_check_pairs): Don't do
that here.
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.
From-SVN: r278348
Richard Sandiford [Sat, 16 Nov 2019 11:30:46 +0000 (11:30 +0000)]
[AArch64] Add scatter stores for partial SVE modes
This patch adds support for scatter stores of partial vectors,
where the vector base or offset elements can be wider than the
elements being stored.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-sve.md
(scatter_store<SVE_FULL_SD:mode><v_int_equiv>): Extend to...
(scatter_store<SVE_24:mode><v_int_container>): ...this.
(mask_scatter_store<SVE_FULL_S:mode><v_int_equiv>): Extend to...
(mask_scatter_store<SVE_4:mode><v_int_equiv>): ...this.
(mask_scatter_store<SVE_FULL_D:mode><v_int_equiv>): Extend to...
(mask_scatter_store<SVE_2:mode><v_int_equiv>): ...this.
(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked): New
pattern.
(*mask_scatter_store<SVE_FULL_D:mode><v_int_equiv>_sxtw): Extend to...
(*mask_scatter_store<SVE_2:mode><v_int_equiv>_sxtw): ...this.
(*mask_scatter_store<SVE_FULL_D:mode><v_int_equiv>_uxtw): Extend to...
(*mask_scatter_store<SVE_2:mode><v_int_equiv>_uxtw): ...this.
gcc/testsuite/
* gcc.target/aarch64/sve/scatter_store_1.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit and 16-bit elements.
* gcc.target/aarch64/sve/scatter_store_2.c: Update accordingly.
* gcc.target/aarch64/sve/scatter_store_3.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit and 16-bit elements.
* gcc.target/aarch64/sve/scatter_store_4.c: Update accordingly.
* gcc.target/aarch64/sve/scatter_store_5.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit, 16-bit and 32-bit elements.
* gcc.target/aarch64/sve/scatter_store_8.c: New test.
* gcc.target/aarch64/sve/scatter_store_9.c: Likewise.
From-SVN: r278347
Richard Sandiford [Sat, 16 Nov 2019 11:26:11 +0000 (11:26 +0000)]
[AArch64] Pattern-match SVE extending gather loads
This patch pattern-matches a partial gather load followed by a sign or
zero extension into an extending gather load. (The partial gather load
is already an extending load; we just don't rely on the upper bits of
the elements.)
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/iterators.md (SVE_2BHSI, SVE_2HSDI, SVE_4BHI)
(SVE_4HSI): New mode iterators.
(ANY_EXTEND2): New code iterator.
* config/aarch64/aarch64-sve.md
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>):
Extend to...
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
...this, handling extension to partial modes as well as full modes.
Describe the extension as a predicated rather than unpredicated
extension.
(@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
Likewise extend to...
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
Likewise extend to...
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_sxtw)
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
Likewise extend to...
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_uxtw)
...this, making the same adjustments.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked):
New pattern.
(*aarch64_ldff1_gather<mode>_sxtw): Canonicalize to a constant
extension predicate.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
Describe the extension as a predicated rather than unpredicated
extension.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
Likewise. Canonicalize to a constant extension predicate.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_extend_impl::expand): Add an extra predicate for
the extension.
(svldff1_gather_extend_impl::expand): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_extend_1.c: New test.
* gcc.target/aarch64/sve/gather_load_extend_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_7.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_8.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_9.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_10.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_11.c: Likewise.
* gcc.target/aarch64/sve/gather_load_extend_12.c: Likewise.
From-SVN: r278346
Richard Sandiford [Sat, 16 Nov 2019 11:20:30 +0000 (11:20 +0000)]
[AArch64] Add gather loads for partial SVE modes
This patch adds support for gather loads of partial vectors,
where the vector base or offset elements can be wider than the
elements being loaded.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/iterators.md (SVE_24, SVE_2, SVE_4): New mode
iterators.
* config/aarch64/aarch64-sve.md
(gather_load<SVE_FULL_SD:mode><v_int_equiv>): Extend to...
(gather_load<SVE_24:mode><v_int_container>): ...this.
(mask_gather_load<SVE_FULL_S:mode><v_int_equiv>): Extend to...
(mask_gather_load<SVE_4:mode><v_int_container>): ...this.
(mask_gather_load<SVE_FULL_D:mode><v_int_equiv>): Extend to...
(mask_gather_load<SVE_2:mode><v_int_container>): ...this.
(*mask_gather_load<SVE_2:mode><v_int_container>_<su>xtw_unpacked):
New pattern.
(*mask_gather_load<SVE_FULL_D:mode><v_int_equiv>_sxtw): Extend to...
(*mask_gather_load<SVE_2:mode><v_int_equiv>_sxtw): ...this.
Allow the nominal extension predicate to be different from the
load predicate.
(*mask_gather_load<SVE_FULL_D:mode><v_int_equiv>_uxtw): Extend to...
(*mask_gather_load<SVE_2:mode><v_int_equiv>_uxtw): ...this.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit and 16-bit elements.
* gcc.target/aarch64/sve/gather_load_2.c: Update accordingly.
* gcc.target/aarch64/sve/gather_load_3.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit and 16-bit elements.
* gcc.target/aarch64/sve/gather_load_4.c: Update accordingly.
* gcc.target/aarch64/sve/gather_load_5.c (TEST_LOOP): Start at 0.
(TEST_ALL): Add tests for 8-bit, 16-bit and 32-bit elements.
* gcc.target/aarch64/sve/gather_load_6.c: Add
--param aarch64-sve-compare-costs=0.
(TEST_LOOP): Start at 0.
* gcc.target/aarch64/sve/gather_load_7.c: Add
--param aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/gather_load_8.c: New test.
* gcc.target/aarch64/sve/gather_load_9.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Add
--param aarch64-sve-compare-costs=0.
From-SVN: r278345
Richard Sandiford [Sat, 16 Nov 2019 11:14:51 +0000 (11:14 +0000)]
[AArch64] Add truncation for partial SVE modes
This patch adds support for "truncating" to a partial SVE vector from
either a full SVE vector or a wider partial vector. This truncation is
actually a no-op and so should have zero cost in the vector cost model.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-sve.md
(trunc<SVE_HSDI:mode><SVE_PARTIAL_I:mode>2): New pattern.
* config/aarch64/aarch64.c (aarch64_integer_truncation_p): New
function.
(aarch64_sve_adjust_stmt_cost): Call it.
gcc/testsuite/
* gcc.target/aarch64/sve/mask_struct_load_1.c: Add
--param aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/pack_1.c: Likewise.
* gcc.target/aarch64/sve/truncate_1.c: New test.
From-SVN: r278344
Richard Sandiford [Sat, 16 Nov 2019 11:11:47 +0000 (11:11 +0000)]
[AArch64] Pattern-match SVE extending loads
This patch pattern-matches a partial SVE load followed by a sign or zero
extension into an extending load. (The partial load is already an
extending load; we just don't rely on the upper bits of the elements.)
Nothing yet uses the extra LDFF1 and LDNF1 combinations, but it seemed
more consistent to provide them, since I needed to update the pattern
to use a predicated extension anyway.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-sve.md
(@aarch64_load_<ANY_EXTEND:optab><VNx8_WIDE:mode><VNx8_NARROW:mode>):
(@aarch64_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
Combine into...
(@aarch64_load_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>):
...this new pattern, handling extension to partial modes as well
as full modes. Describe the extension as a predicated rather than
unpredicated extension.
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx8_WIDE:mode><VNx8_NARROW:mode>)
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
Combine into...
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>):
...this new pattern, handling extension to partial modes as well
as full modes. Describe the extension as a predicated rather than
unpredicated extension.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::use_contiguous_load_insn): Add an extra
predicate for extending loads.
* config/aarch64/aarch64.c (aarch64_extending_load_p): New function.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Use aarch64_sve_adjust_stmt_cost to adjust
the cost of SVE vector stmts.
gcc/testsuite/
* gcc.target/aarch64/sve/load_extend_1.c: New test.
* gcc.target/aarch64/sve/load_extend_2.c: Likewise.
* gcc.target/aarch64/sve/load_extend_3.c: Likewise.
* gcc.target/aarch64/sve/load_extend_4.c: Likewise.
* gcc.target/aarch64/sve/load_extend_5.c: Likewise.
* gcc.target/aarch64/sve/load_extend_6.c: Likewise.
* gcc.target/aarch64/sve/load_extend_7.c: Likewise.
* gcc.target/aarch64/sve/load_extend_8.c: Likewise.
* gcc.target/aarch64/sve/load_extend_9.c: Likewise.
* gcc.target/aarch64/sve/load_extend_10.c: Likewise.
* gcc.target/aarch64/sve/reduc_4.c: Add
--param aarch64-sve-compare-costs=0.
From-SVN: r278343
Richard Sandiford [Sat, 16 Nov 2019 11:07:23 +0000 (11:07 +0000)]
[AArch64] Add sign and zero extension for partial SVE modes
This patch adds support for extending from partial SVE modes
to both full vector modes and wider partial modes.
Some tests now need --param aarch64-sve-compare-costs=0 to force
the original full-vector code.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/iterators.md (SVE_HSDI): New mode iterator.
(narrower_mask): Handle VNx4HI, VNx2HI and VNx2SI.
* config/aarch64/aarch64-sve.md
(<ANY_EXTEND:optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2): New pattern.
(*<ANY_EXTEND:optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2): Likewise.
(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Update
comment. Avoid new narrower_mask ambiguity.
(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Likewise.
(*cond_uxt<mode>_2): Update comment.
(*cond_uxt<mode>_any): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_1.c: Expect the loop to be
vectorized with bytes stored in 32-bit containers.
* gcc.target/aarch64/sve/extend_1.c: New test.
* gcc.target/aarch64/sve/extend_2.c: New test.
* gcc.target/aarch64/sve/extend_3.c: New test.
* gcc.target/aarch64/sve/extend_4.c: New test.
* gcc.target/aarch64/sve/load_const_offset_3.c: Add
--param aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
From-SVN: r278342
Richard Sandiford [Sat, 16 Nov 2019 11:02:09 +0000 (11:02 +0000)]
[AArch64] Add autovec support for partial SVE vectors
This patch adds the bare minimum needed to support autovectorisation of
partial SVE vectors, namely moves and integer addition. Later patches
add more interesting cases.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-modes.def: Define partial SVE vector
float modes.
* config/aarch64/aarch64-protos.h (aarch64_sve_pred_mode): New
function.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle the
new vector float modes.
(aarch64_sve_container_bits): New function.
(aarch64_sve_pred_mode): Likewise.
(aarch64_get_mask_mode): Use it.
(aarch64_sve_element_int_mode): Handle structure modes and partial
modes.
(aarch64_sve_container_int_mode): New function.
(aarch64_vectorize_related_mode): Return SVE modes when given
SVE modes. Handle partial modes, taking the preferred number
of units from the size of the given mode.
(aarch64_hard_regno_mode_ok): Allow partial modes to be stored
in registers.
(aarch64_expand_sve_ld1rq): Use the mode form of aarch64_sve_pred_mode.
(aarch64_expand_sve_const_vector): Handle partial SVE vectors.
(aarch64_split_sve_subreg_move): Use the mode form of
aarch64_sve_pred_mode.
(aarch64_secondary_reload): Handle partial modes in the same way
as full big-endian vectors.
(aarch64_vector_mode_supported_p): Allow partial SVE vectors.
(aarch64_autovectorize_vector_modes): Try unpacked SVE vectors,
merging with the Advanced SIMD modes. If two modes have the
same size, try the Advanced SIMD mode first.
(aarch64_simd_valid_immediate): Use the container rather than
the element mode for INDEX constants.
(aarch64_simd_vector_alignment): Make the alignment of partial
SVE vector modes the same as their minimum size.
(aarch64_evpc_sel): Use the mode form of aarch64_sve_pred_mode.
* config/aarch64/aarch64-sve.md (mov<SVE_FULL:mode>): Extend to...
(mov<SVE_ALL:mode>): ...this.
(movmisalign<SVE_FULL:mode>): Extend to...
(movmisalign<SVE_ALL:mode>): ...this.
(*aarch64_sve_mov<mode>_le): Rename to...
(*aarch64_sve_mov<mode>_ldr_str): ...this.
(*aarch64_sve_mov<SVE_FULL:mode>_be): Rename and extend to...
(*aarch64_sve_mov<SVE_ALL:mode>_no_ldr_str): ...this. Handle
partial modes regardless of endianness.
(aarch64_sve_reload_be): Rename to...
(aarch64_sve_reload_mem): ...this and enable for little-endian.
Use aarch64_sve_pred_mode to get the appropriate predicate mode.
(@aarch64_pred_mov<SVE_FULL:mode>): Extend to...
(@aarch64_pred_mov<SVE_ALL:mode>): ...this.
(*aarch64_sve_mov<SVE_FULL:mode>_subreg_be): Extend to...
(*aarch64_sve_mov<SVE_ALL:mode>_subreg_be): ...this.
(@aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
(@aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
(*aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
(*aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
(maskload<SVE_FULL:mode><vpred>): Extend to...
(maskload<SVE_ALL:mode><vpred>): ...this.
(maskstore<SVE_FULL:mode><vpred>): Extend to...
(maskstore<SVE_ALL:mode><vpred>): ...this.
(vec_duplicate<SVE_FULL:mode>): Extend to...
(vec_duplicate<SVE_ALL:mode>): ...this.
(*vec_duplicate<SVE_FULL:mode>_reg): Extend to...
(*vec_duplicate<SVE_ALL:mode>_reg): ...this.
(sve_ld1r<SVE_FULL:mode>): Extend to...
(sve_ld1r<SVE_ALL:mode>): ...this.
(vec_series<SVE_FULL_I:mode>): Extend to...
(vec_series<SVE_I:mode>): ...this.
(*vec_series<SVE_FULL_I:mode>_plus): Extend to...
(*vec_series<SVE_I:mode>_plus): ...this.
(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Avoid
new VPRED ambiguity.
(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Likewise.
(add<SVE_FULL_I:mode>3): Extend to...
(add<SVE_I:mode>3): ...this.
* config/aarch64/iterators.md (SVE_ALL, SVE_I): New mode iterators.
(Vetype, Vesize, VEL, Vel, vwcore): Handle partial SVE vector modes.
(VPRED, vpred): Likewise.
(Vctype): New iterator.
(vw): Remove SVE modes.
gcc/testsuite/
* gcc.target/aarch64/sve/mixed_size_1.c: New test.
* gcc.target/aarch64/sve/mixed_size_2.c: Likewise.
* gcc.target/aarch64/sve/mixed_size_3.c: Likewise.
* gcc.target/aarch64/sve/mixed_size_4.c: Likewise.
* gcc.target/aarch64/sve/mixed_size_5.c: Likewise.
From-SVN: r278341
Richard Sandiford [Sat, 16 Nov 2019 10:57:55 +0000 (10:57 +0000)]
[AArch64] Tweak gcc.target/aarch64/sve/clastb_8.c
clastb_8.c was using scan-tree-dump-times to check for fully-masked
loops, which made it sensitive to the number of times we try to
vectorize.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve/clastb_8.c: Use assembly tests to
check for fully-masked loops.
From-SVN: r278340
Richard Sandiford [Sat, 16 Nov 2019 10:55:40 +0000 (10:55 +0000)]
[AArch64] Replace SVE_PARTIAL with SVE_PARTIAL_I
Another renaming, this time to make way for partial/unpacked
float modes.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/iterators.md (SVE_PARTIAL): Rename to...
(SVE_PARTIAL_I): ...this.
* config/aarch64/aarch64-sve.md: Apply the above renaming throughout.
From-SVN: r278339
Richard Sandiford [Sat, 16 Nov 2019 10:50:42 +0000 (10:50 +0000)]
[AArch64] Add "FULL" to SVE mode iterator names
An upcoming patch will make more use of partial/unpacked SVE vectors.
We then need a distinction between mode iterators that include partial
modes and those that only include "full" modes. This patch prepares
for that by adding "FULL" to the names of iterators that only select
full modes. There should be no change in behaviour.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/iterators.md (SVE_ALL): Rename to...
(SVE_FULL): ...this.
(SVE_I): Rename to...
(SVE_FULL_I): ...this.
(SVE_F): Rename to...
(SVE_FULL_F): ...this.
(SVE_BHSI): Rename to...
(SVE_FULL_BHSI): ...this.
(SVE_HSD): Rename to...
(SVE_FULL_HSD): ...this.
(SVE_HSDI): Rename to...
(SVE_FULL_HSDI): ...this.
(SVE_HSF): Rename to...
(SVE_FULL_HSF): ...this.
(SVE_SD): Rename to...
(SVE_FULL_SD): ...this.
(SVE_SDI): Rename to...
(SVE_FULL_SDI): ...this.
(SVE_SDF): Rename to...
(SVE_FULL_SDF): ...this.
(SVE_S): Rename to...
(SVE_FULL_S): ...this.
(SVE_D): Rename to...
(SVE_FULL_D): ...this.
* config/aarch64/aarch64-sve.md: Apply the above renaming throughout.
* config/aarch64/aarch64-sve2.md: Likewise.
From-SVN: r278338
Richard Sandiford [Sat, 16 Nov 2019 10:43:52 +0000 (10:43 +0000)]
[AArch64] Enable VECT_COMPARE_COSTS by default for SVE
This patch enables VECT_COMPARE_COSTS by default for SVE, both so
that we can compare SVE against Advanced SIMD and so that (with future
patches) we can compare multiple SVE vectorisation approaches against
each other. It also adds a target-specific --param to control this.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.opt (--param=aarch64-sve-compare-costs):
New option.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
By default, return VECT_COMPARE_COSTS for SVE.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_3.c: Split multi-vector cases out
into...
* gcc.target/aarch64/sve/reduc_3_costly.c: ...this new test,
passing -fno-vect-cost-model for them.
* gcc.target/aarch64/sve/slp_6.c: Add -fno-vect-cost-model.
* gcc.target/aarch64/sve/slp_7.c,
* gcc.target/aarch64/sve/slp_7_run.c: Split multi-vector cases out
into...
* gcc.target/aarch64/sve/slp_7_costly.c,
* gcc.target/aarch64/sve/slp_7_costly_run.c: ...these new tests,
passing -fno-vect-cost-model for them.
* gcc.target/aarch64/sve/while_7.c: Add -fno-vect-cost-model.
* gcc.target/aarch64/sve/while_9.c: Likewise.
From-SVN: r278337
Richard Sandiford [Sat, 16 Nov 2019 10:40:23 +0000 (10:40 +0000)]
Optionally pick the cheapest loop_vec_info
This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost. The new
behaviour is selected by autovectorize_vector_modes.
The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and picking the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.
From-SVN: r278336