Jakub Jelinek [Thu, 1 Apr 2021 09:04:12 +0000 (11:04 +0200)]
doc: Fix up symver attribute documentation
When looking at the symver documentation, I've noticed a couple of
syntax errors in it.
2021-04-01 Jakub Jelinek <jakub@redhat.com>
* doc/extend.texi (symver attribute): Fix up syntax errors
in the examples.
Jakub Jelinek [Thu, 1 Apr 2021 08:51:03 +0000 (10:51 +0200)]
bswap: Handle bswapping of pointers [PR96573]
In GCC8/9 we used to optimize this into a bswap, but we no longer do.
Handling byteswapping of pointers is easy, all we need is to allow them,
for the __builtin_bswap* we already use TYPE_PRECISION to determine
the precision and we cast the operand and result to the correct type
if they aren't uselessly convertible to what the builtin expects.
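
As an illustration of the kind of source this affects, here is a minimal sketch of a byte-by-byte pointer reversal that a byte-swap recognizer could in principle turn into a single bswap. The helper names are hypothetical and this is not the pr96573.c testcase; whether a given compiler version actually emits a bswap for it is not asserted here.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the byte order of a pointer value, one byte at a time.
   Hypothetical helper for illustration only.  */
void *reverse_pointer_bytes(void *p)
{
    const unsigned char *src = (const unsigned char *)&p;
    void *result;
    unsigned char *dst = (unsigned char *)&result;

    /* Copy byte i of the input to the mirrored position of the output.  */
    for (size_t i = 0; i < sizeof(p); i++)
        dst[sizeof(p) - 1 - i] = src[i];
    return result;
}

/* Reversing twice must give back the original pointer.  */
void check_round_trip(void *p)
{
    assert(reverse_pointer_bytes(reverse_pointer_bytes(p)) == p);
}
```

The reversed value is only meaningful as a bit pattern; it should not be dereferenced.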
2021-04-01 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96573
* gimple-ssa-store-merging.c (init_symbolic_number): Handle
also pointer types.
* gcc.dg/pr96573.c: New test.
Richard Biener [Thu, 1 Apr 2021 07:29:14 +0000 (09:29 +0200)]
tree-optimization/99856 - fix overwidening pattern creation
This fixes an omission of promoting a bit-precision required precision
to a vector element precision.
2021-04-01 Richard Biener <rguenther@suse.de>
PR tree-optimization/99856
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Promote
precision to vector element precision.
* gcc.dg/vect/pr99856.c: New testcase.
Martin Jambor [Thu, 1 Apr 2021 08:12:23 +0000 (10:12 +0200)]
sra: Fix bug in grp_write propagation (PR 97009)
SRA represents parts of aggregates which are arrays accessed with
unknown index as "unscalarizable regions." When there are two such
regions one within another and the outer is only read whereas the
inner is written to, SRA fails to propagate that write information
across assignments. This means that a second aggregate can contain
data while SRA thinks it does not and the pass can wrongly eliminate
big chunks of assignment from that second aggregate into a third
aggregate, which is what happens in PR 97009.
Fixed by checking all children of unscalarizable accesses for the
grp_write flag.
gcc/ChangeLog:
2021-03-31 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/97009
* tree-sra.c (access_or_its_child_written): New function.
(propagate_subaccesses_from_rhs): Use it instead of a simple grp_write
test.
gcc/testsuite/ChangeLog:
2021-03-31 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/97009
* gcc.dg/tree-ssa/pr97009.c: New test.
Harald Anlauf [Thu, 1 Apr 2021 05:49:32 +0000 (07:49 +0200)]
PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777
The simplification of the transposition of a constant array shall properly
initialize and set the shape of the result.
gcc/fortran/ChangeLog:
PR fortran/99840
* simplify.c (gfc_simplify_transpose): Properly initialize
resulting shape.
gcc/testsuite/ChangeLog:
PR fortran/99840
* gfortran.dg/transpose_5.f90: New test.
GCC Administrator [Thu, 1 Apr 2021 00:16:39 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Mon, 29 Mar 2021 20:13:32 +0000 (16:13 -0400)]
analyzer: avoid printing '<unknown>' for SSA names [PR99771]
We don't want to print '<unknown>' in our diagnostics, but
PR analyzer/99771 lists various cases where -fanalyzer does, due to
using the SSA_NAME for a temporary when determining the best tree to
use.
This can happen in two ways:
(a) ...when a better expression than the SSA_NAME could be built, but
finding it requires traversing the relationships in the region_model
in a graph-like way, rather than by considering individual svalues and
regions.
(b) ...when the only remaining user of the underlying svalue is the
SSA_NAME, typically due to the diagnostic referring to a temporary.
I've been experimenting with fixing (a), but don't have a good fix yet.
In the meantime, this patch addresses (b) by detecting if we have
the SSA_NAME for a temporary, and, for the cases where it's possible,
reconstructing a tree by walking the def-stmts. This fixes various
cases of (b) and ameliorates some cases of (a).
gcc/analyzer/ChangeLog:
PR analyzer/99771
* analyzer.cc (maybe_reconstruct_from_def_stmt): New.
(fixup_tree_for_diagnostic_1): New.
(fixup_tree_for_diagnostic): New.
* analyzer.h (fixup_tree_for_diagnostic): New decl.
* checker-path.cc (call_event::get_desc): Call
fixup_tree_for_diagnostic and use it for the call_with_state call.
(warning_event::get_desc): Likewise for the final_event and
make_label_text calls.
* engine.cc (impl_region_model_context::on_state_leak): Likewise
for the on_leak and add_diagnostic calls.
* region-model.cc (region_model::get_representative_tree):
Likewise for the result.
gcc/testsuite/ChangeLog:
PR analyzer/99771
* gcc.dg/analyzer/data-model-10.c: Update expected output.
* gcc.dg/analyzer/malloc-ipa-13.c: Likewise.
* gcc.dg/analyzer/malloc-ipa-13a.c: New test.
* gcc.dg/analyzer/pr99771-1.c: New test.
Jan Hubicka [Wed, 31 Mar 2021 20:44:20 +0000 (22:44 +0200)]
Make USES_COMDAT_LOCAL CIF_FINAL_NORMAL
USES_COMDAT_LOCAL is incorrectly defined as CIF_FINAL_ERROR, which makes the
inliner miss some inlines of functions in a comdat section that was previously split.
2021-03-31 Jan Hubicka <hubicka@ucw.cz>
PR ipa/98265
* cif-code.def (USES_COMDAT_LOCAL): Make CIF_FINAL_NORMAL.
Pat Haugen [Wed, 31 Mar 2021 19:37:24 +0000 (14:37 -0500)]
Update prefixed attribute for Power10.
This patch creates a new attribute, "maybe_prefixed", which is used to mark
those instructions that may have a prefixed form. The existing "prefixed"
attribute is now used to mark all instructions that are prefixed form.
2021-03-31 Pat Haugen <pthaugen@linux.ibm.com>
gcc/
PR target/99133
* config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
xxblend_<mode>, xxpermx_inst, xxeval): Mark prefixed.
* config/rs6000/mma.md (mma_<vvi4i4i8>, mma_<avvi4i4i8>,
mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>,
mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>):
Likewise.
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test.
* config/rs6000/rs6000.md (define_attr "maybe_prefixed"): New.
(define_attr "prefixed"): Update initializer.
Jakub Jelinek [Wed, 31 Mar 2021 19:25:58 +0000 (21:25 +0200)]
dwarf2out: Fix up ranges for -gdwarf-5 -gsplit-dwarf [PR99490]
For -gdwarf-4 -gsplit-dwarf we used to emit .debug_ranges section
(so in the binaries/shared libraries) with DW_AT_ranges from skeleton
units as well as .debug_info.dwo pointing to it through DW_FORM_sec_offset
(and DW_AT_GNU_ranges_base pointing into section, not sure for what
reason exactly).
When DWARF5 support was being added, we started using .debug_rnglists
section, added DW_AT_rnglists_base to the DW_TAG_skeleton_unit, kept
DW_AT_ranges with DW_FORM_sec_offset in the skeleton and switched
over to DW_FORM_rnglistx for DW_AT_ranges in .debug_info.dwo.
But the DWARF5 spec actually means for the ranges section (at least
everything for those DW_AT_ranges in .debug_info.dwo) to sit
in .debug_rnglists.dwo section next to the .debug_info.dwo, rather than
having consumers look it up in the binary/shared library instead.
Based on some discussions in the DWARF discuss mailing list:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2021-March/thread.html#4765
this patch mostly follows what LLVM emits for that right now:
1) small .debug_rnglists section (when needed) just to cover the
skeleton DW_AT_ranges (if present); the content of the section
uses the Split DWARFy DW_RLE_* codes with addrx encodings where
possible
2) DW_AT_ranges in the skeleton uses DW_FORM_sec_offset (difference
from LLVM which uses DW_FORM_rnglistx, which makes it larger
and ambiguous)
3) DW_AT_rnglists_base attribute is gone from the skeleton (again,
unlike LLVM where it is just confusing what exactly it means because
it is inherited; it would make sense if we emitted DW_FORM_rnglistx
in non-split DWARF, but unless ranges are shared, I'm afraid we'd
make DWARF larger with fewer relocations by that)
4) usually big .debug_rnglists.dwo section again with using DW_RLE_*x*
where possible
5) DW_AT_ranges with DW_FORM_rnglistx from .debug_info.dwo referring to
that .debug_rnglists.dwo ranges
2021-03-31 Jakub Jelinek <jakub@redhat.com>
PR debug/99490
* dwarf2out.c (debug_ranges_dwo_section): New variable.
(DW_RANGES_IDX_SKELETON): Define.
(struct dw_ranges): Add begin_entry and end_entry members.
(DEBUG_DWO_RNGLISTS_SECTION): Define.
(add_ranges_num): Adjust r initializer for addition of *_entry
members.
(add_ranges_by_labels): For -gsplit-dwarf and force_direct,
set idx to DW_RANGES_IDX_SKELETON.
(use_distinct_base_address_for_range): New function.
(index_rnglists): Don't set r->idx if it is equal to
DW_RANGES_IDX_SKELETON. Initialize r->begin_entry and
r->end_entry for -gsplit-dwarf if those will be needed by
output_rnglists.
(output_rnglists): Add DWO argument. If true, switch to
debug_ranges_dwo_section rather than debug_ranges_section.
Adjust l1/l2 label indexes. Only output the offset table when
dwo is true and don't include in there the skeleton range
entry if present. For -gsplit-dwarf, skip ranges that belong
to the other rnglists section. Change return type from void
to bool and return true if there are any range entries for
the other section. For dwarf_split_debug_info use
DW_RLE_startx_endx, DW_RLE_startx_length and DW_RLE_base_addressx
entries instead of DW_RLE_start_end, DW_RLE_start_length and
DW_RLE_base_address. Use use_distinct_base_address_for_range.
(init_sections_and_labels): Initialize debug_ranges_dwo_section
if -gsplit-dwarf and DWARF >= 5. Adjust ranges_section_label
and range_base_label indexes.
(dwarf2out_finish): Call index_rnglists earlier before finalizing
.debug_addr. Never emit DW_AT_rnglists_base attribute. For
-gsplit-dwarf and DWARF >= 5 call output_rnglists up to twice
with different dwo arguments.
(dwarf2out_c_finalize): Clear debug_ranges_dwo_section.
Alexandre Oliva [Wed, 31 Mar 2021 18:34:47 +0000 (15:34 -0300)]
improve future::poll calibration loop
The calibration loop I've recently added to the libstdc++
future/members/poll.cc tests could still select iteration counts that
might yield zero-time measurements for the wait_for when ready loop.
Waiting for a future that has already had a value set is presumably
uniformly faster than a zero-timed wait for a result, so I've changed
the calibration loop to use the former.
We might still be unlucky and get nonzero from the initial loop, so
that the calibration is skipped altogether, but then get zero from the
later when-ready loop. I'm not dealing with this case in this patch.
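
The general idea of such a calibration loop can be sketched in C as follows. This is not the code in poll.cc: the callback type, the doubling growth factor and the stand-in timer are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical timer callback: returns the elapsed time, in whole clock
   ticks, of running the measured operation `iterations` times.  */
typedef unsigned long (*measure_fn)(unsigned long iterations);

/* Grow the iteration count until the measurement is nonzero or the
   limit is reached.  Doubling is an assumed growth policy.  */
unsigned long calibrate_iterations(measure_fn measure,
                                   unsigned long start,
                                   unsigned long limit)
{
    assert(start > 0);
    unsigned long iterations = start;
    while (iterations < limit && measure(iterations) == 0)
        iterations *= 2;
    return iterations;
}

/* Stand-in for a coarse clock: one tick per 1000 iterations.  */
unsigned long coarse_timer(unsigned long iterations)
{
    return iterations / 1000;
}
```

Calibrating against the fastest operation matters because a count that gives a nonzero reading for a slower operation can still read as zero for the faster one.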
for libstdc++-v3/ChangeLog
* testsuite/30_threads/future/members/poll.cc: Use faster
after-ready call in the calibration loop.
Richard Sandiford [Wed, 31 Mar 2021 18:34:01 +0000 (19:34 +0100)]
gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268]
The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not
marked TREE_CONSTANT even though the address was in fact invariant.
This came from folding a &TARGET_MEM_REF with constant operands to
a &MEM_REF; &TARGET_MEM_REF is never treated as TREE_CONSTANT
but &MEM_REF can be.
gcc/
PR tree-optimization/98268
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call
recompute_tree_invariant_for_addr_expr after successfully
folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR.
gcc/testsuite/
PR tree-optimization/98268
* gcc.target/aarch64/sve/pr98268-1.c: New test.
* gcc.target/aarch64/sve/pr98268-2.c: Likewise.
Richard Sandiford [Wed, 31 Mar 2021 18:34:01 +0000 (19:34 +0100)]
data-ref: Tighten index-based alias checks [PR99726]
create_intersect_range_checks_index tries to create a runtime
alias check based on index comparisons. It looks through the
access functions for the two DRs to find a SCEV for the loop
that is being versioned and converts a DR_STEP-based check
into an index-based check.
However, there isn't any reliable sign information in the types,
so the code expects the value of the IV step (when interpreted as
signed) to be negative iff the DR_STEP (when interpreted as signed)
is negative.
r10-4762 added another assert related to this assumption and the
assert fired for the testcase in the PR. The sign of the IV step
didn't match the sign of the DR_STEP.
I think this is actually showing what was previously a wrong-code bug.
The signs didn't match because the DRs contained *two* access function
SCEVs for the loop being versioned. It doesn't look like the code
is set up to deal with this, since it checks each access function
independently and treats it as the sole source of DR_STEP.
The patch therefore moves the main condition out of the loop.
This also has the advantage of not building a tree for one access
function only to throw it away if we find an inner function that
makes the comparison invalid.
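
For readers unfamiliar with index-based alias checks, the following is a much simplified sketch of the underlying test, written by hand in C. It assumes two accesses to the same array with unit step and a single access function each; the function name is hypothetical and this is not what tree-data-ref.c generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the index ranges [start_a, start_a + niters) and
   [start_b, start_b + niters) share at least one index.  A versioned
   loop would run its vector copy only when this returns false.  */
bool index_ranges_overlap(size_t start_a, size_t start_b, size_t niters)
{
    /* Precondition: the range ends must not wrap around.  */
    assert(start_a + niters >= start_a && start_b + niters >= start_b);

    if (niters == 0)
        return false;
    return start_a < start_b + niters && start_b < start_a + niters;
}
```

The check compares indices only, which is why it is valid only when one access function fully describes how the address moves in the versioned loop.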
gcc/
PR tree-optimization/99726
* tree-data-ref.c (create_intersect_range_checks_index): Bail
out if there is more than one access function SCEV for the loop
being versioned.
gcc/testsuite/
PR tree-optimization/99726
* gcc.target/i386/pr99726.c: New test.
Richard Sandiford [Wed, 31 Mar 2021 18:34:00 +0000 (19:34 +0100)]
Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726]
This PR is caused by POLY_INT_CSTs being (necessarily) valid
in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid
in RTL CONST_VECTORs. I can't tell/remember how deliberate
that was, but I'm guessing not very. In particular,
valid_for_const_vector_p was added to guard against symbolic
constants rather than CONST_POLY_INTs.
I did briefly consider whether we should maintain the current
status anyway. However, that would then require a way of
constructing variable-length vectors from individual elements
if, say, we have:
{ [2, 2], [3, 2], [4, 2], … }
So I'm chalking this up to an oversight. I think the intention
(and certainly the natural thing) is to have the same rules for
both trees and RTL.
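
To make the notation concrete, here is a small C sketch that assumes the usual poly_int convention in which [a, b] denotes a + b * x for a runtime indeterminate x. The struct and function names are hypothetical and are not GCC's own types.

```c
#include <assert.h>

/* Hypothetical two-coefficient value: c0 + c1 * x.  */
struct poly_value {
    long c0;
    long c1;
};

/* Element `index` of the series { [2, 2], [3, 2], [4, 2], ... }.  */
struct poly_value series_element(unsigned index)
{
    struct poly_value v = { 2 + (long)index, 2 };
    return v;
}

/* Evaluate the element once the runtime indeterminate x is known.  */
long evaluate(struct poly_value v, long x)
{
    assert(x >= 0);
    return v.c0 + v.c1 * x;
}
```

Each element has a different constant term but the same dependence on x, so the vector cannot be described by ordinary integer constants alone.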
The SVE CONST_VECTOR code should already be set up to handle
CONST_POLY_INTs. However, we need to add support for Advanced SIMD
CONST_VECTORs that happen to contain SVE-based values. The patch does
that by expanding such CONST_VECTORs in the same way as variable vectors.
gcc/
PR rtl-optimization/97141
PR rtl-optimization/98726
* emit-rtl.c (valid_for_const_vector_p): Return true for
CONST_POLY_INT_P.
* rtx-vector-builder.h (rtx_vector_builder::step): Return a
poly_wide_int instead of a wide_int.
(rtx_vector_builder::apply_set): Take a poly_wide_int instead
of a wide_int.
* rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise.
* config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return
false for CONST_VECTORs that cannot be forced to memory.
* config/aarch64/aarch64-simd.md (mov<mode>): If a CONST_VECTOR
is too complex to force to memory, build it up from individual
elements instead.
gcc/testsuite/
PR rtl-optimization/97141
PR rtl-optimization/98726
* gcc.c-torture/compile/pr97141.c: New test.
* gcc.c-torture/compile/pr98726.c: Likewise.
* gcc.target/aarch64/sve/pr97141.c: Likewise.
* gcc.target/aarch64/sve/pr98726.c: Likewise.
Jan Hubicka [Wed, 31 Mar 2021 18:10:31 +0000 (20:10 +0200)]
Fix overactive check in cgraph_node::release_body
gcc/ChangeLog:
PR lto/99447
* cgraph.c (cgraph_node::release_body): Fix overactive check.
Martin Sebor [Wed, 31 Mar 2021 16:39:24 +0000 (10:39 -0600)]
PR middle-end/65182 - -Wuninitialized fails when pointer to variable later passed to function
gcc/testsuite:
PR middle-end/65182
* gcc.dg/uninit-pr65182.c: New test.
Jason Merrill [Wed, 31 Mar 2021 00:31:18 +0000 (20:31 -0400)]
c++: Alias template in pack expansion [PR99445]
In this testcase, iterative_hash_template_arg checks
alias_template_specialization_p to determine whether to treat a type as a
dependent alias, and structural_comptypes checks
dependent_alias_template_spec_p. Normally that difference isn't a problem
because canonicalizing template arguments strips non-dependent aliases, but
that wasn't happening for the pack expansion. Fixed thus.
gcc/cp/ChangeLog:
PR c++/99445
* tree.c (strip_typedefs): Handle TYPE_PACK_EXPANSION.
gcc/testsuite/ChangeLog:
PR c++/99445
* g++.dg/cpp0x/alias-decl-variadic1.C: New test.
Christophe Lyon [Mon, 29 Mar 2021 11:36:41 +0000 (11:36 +0000)]
testsuite/aarch64: Skip SLP diagnostic under ILP32 (PR target/96974)
The vectorizer has a very different effect with -mabi=ilp32, and
doesn't emit the expected diagnostic, so this patch expects it only
under lp64.
2021-03-29 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
PR target/96974
* g++.target/aarch64/sve/pr96974.C: Expect SLP diagnostic only
under lp64.
Christophe Lyon [Mon, 29 Mar 2021 12:41:08 +0000 (12:41 +0000)]
arm: Fix mult autovectorization pattern for iwmmxt (PR target/99786)
Similarly to other recently-added autovectorization patterns, mult has
been erroneously enabled for iwmmxt. However, V4HI and V2SI modes are
supported, so we make an exception for them.
The new testcase is derived from gcc.dg/ubsan/pr79904.c, with
additional modes added.
I kept dg-do compile because 'assemble' results in error messages from
the assembler, which are not related to this PR:
Error: selected processor does not support `tmcrr wr0,r4,r5' in ARM mode
Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode
Error: selected processor does not support `wldrd wr2,.L5' in ARM mode
Error: selected processor does not support `wmulul wr0,wr0,wr2' in ARM mode
Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode
Error: selected processor does not support `wldrd wr2,.L8' in ARM mode
Error: selected processor does not support `wmulwl wr0,wr0,wr2' in ARM mode
Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
2021-03-29 Christophe Lyon <christophe.lyon@linaro.org>
PR target/99786
gcc/
* config/arm/vec-common.md (mul<mode>3): Disable on iwMMXT, except
for V4HI and V2SI.
gcc/testsuite/
* gcc.target/arm/pr99786.c: New test.
H.J. Lu [Fri, 22 Jan 2021 02:51:35 +0000 (18:51 -0800)]
x86: Update memcpy/memset inline strategies for Ice Lake
Simplify memcpy and memset inline strategies to avoid branches for
-mtune=icelake:
1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
load and store for up to 16 * 16 (256) bytes when the data size is
fixed and known.
2. Inline only if data size is known to be <= 256.
a. Use "rep movsb/stosb" with simple code sequence if the data size
is a constant.
b. Use loop if data size is not a constant.
3. Use memcpy/memset library function if data size is unknown or > 256.
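
The three situations in the list above correspond roughly to the following C call sites. This is a hedged sketch with hypothetical function names; which expansion the compiler picks for each one depends on the tuning and options in effect and is not asserted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INLINE_LIMIT 256  /* threshold taken from the list above */

struct block {
    unsigned char bytes[INLINE_LIMIT];
};

/* Size is a compile-time constant no larger than the limit.  */
void copy_constant_size(struct block *dst, const struct block *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Size is not a constant, but the early return bounds it to the limit
   at the memcpy call.  Returns false if the request was rejected.  */
bool copy_bounded_size(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n > INLINE_LIMIT)
        return false;
    memcpy(dst, src, n);
    return true;
}

/* Size is unknown at the call, so a library call is the likely result.  */
void copy_unknown_size(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}
```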
On Ice Lake processor with -march=native -Ofast -flto,
1. Performance impacts of SPEC CPU 2017 rate are:
500.perlbench_r -0.93%
502.gcc_r 0.36%
505.mcf_r 0.31%
520.omnetpp_r -0.07%
523.xalancbmk_r -0.53%
525.x264_r -0.09%
531.deepsjeng_r -0.19%
541.leela_r 0.16%
548.exchange2_r 0.22%
557.xz_r -1.64%
Geomean -0.24%
503.bwaves_r -0.01%
507.cactuBSSN_r 0.00%
508.namd_r 0.12%
510.parest_r 0.07%
511.povray_r 0.29%
519.lbm_r 0.00%
521.wrf_r -0.38%
526.blender_r 0.16%
527.cam4_r 0.18%
538.imagick_r 0.76%
544.nab_r -0.84%
549.fotonik3d_r -0.07%
554.roms_r -0.01%
Geomean 0.02%
2. Significant impacts on eembc benchmarks are:
eembc/nnet_test 9.90%
eembc/mp2decoddata2 16.42%
eembc/textv2data3 -4.86%
eembc/qos 12.90%
gcc/
* config/i386/i386-expand.c (expand_set_or_cpymem_via_rep):
For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode
to SImode.
(decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use
"rep movsb/stosb" only for known sizes.
* config/i386/i386-options.c (processor_cost_table): Use Ice
Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire
Rapids and Alder Lake.
* config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New.
* config/i386/x86-tune-costs.h (icelake_memcpy): New.
(icelake_memset): Likewise.
(icelake_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
New.
gcc/testsuite/
* gcc.target/i386/memcpy-strategy-5.c: New test.
* gcc.target/i386/memcpy-strategy-6.c: Likewise.
* gcc.target/i386/memcpy-strategy-7.c: Likewise.
* gcc.target/i386/memcpy-strategy-8.c: Likewise.
* gcc.target/i386/memset-strategy-3.c: Likewise.
* gcc.target/i386/memset-strategy-4.c: Likewise.
* gcc.target/i386/memset-strategy-5.c: Likewise.
* gcc.target/i386/memset-strategy-6.c: Likewise.
Richard Sandiford [Wed, 31 Mar 2021 10:26:06 +0000 (11:26 +0100)]
aarch64: Fix target alignment for SVE [PR98119]
The vectoriser supports peeling for alignment using predication:
we move back to the previous aligned boundary and make the skipped
elements inactive in the first loop iteration. As it happens,
the costs for existing CPUs give an equal cost to aligned and
unaligned accesses, so this feature is rarely used.
However, the PR shows that when the feature was forced on, we were
still trying to align to a full-vector boundary even when using
partial vectors.
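
The predicated peeling described above can be modelled in scalar C as follows. This is a sketch under stated assumptions: LANES is a hypothetical vector length, alignment is expressed in element indices rather than byte addresses, and the inner loop stands in for one predicated vector operation.

```c
#include <assert.h>
#include <stddef.h>

#define LANES ((size_t)4)  /* hypothetical vector length in elements */

/* Sum data[first .. end) by walking LANES-aligned chunks.  The first
   chunk starts at the previous aligned index, and lanes outside the
   requested range are treated as inactive.  */
long sum_range_masked(const int *data, size_t first, size_t end)
{
    assert(first <= end);
    long total = 0;

    /* Step back to the previous LANES-aligned index.  */
    size_t chunk = first - first % LANES;

    for (; chunk < end; chunk += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            size_t i = chunk + lane;
            /* Predicate: only lanes inside [first, end) are active.  */
            if (i >= first && i < end)
                total += data[i];
        }
    }
    return total;
}
```

With partial vectors the chunk size is smaller than a full vector, so the boundary to step back to should be derived from the size of the vector actually in use.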
gcc/
PR target/98119
* config/aarch64/aarch64.c
(aarch64_vectorize_preferred_vector_alignment): Query the size
of the provided SVE vector; do not assume that all SVE vectors
have the same size.
gcc/testsuite/
PR target/98119
* gcc.target/aarch64/sve/pr98119.c: New test.
Jan Hubicka [Wed, 31 Mar 2021 09:35:29 +0000 (11:35 +0200)]
Small refactoring of cgraph_node::release_body
PR lto/99447
* cgraph.c (cgraph_node::release_body): Remove all callers and
references.
* cgraphclones.c (cgraph_node::materialize_clone): Do not do it here.
* cgraphunit.c (cgraph_node::expand): And here.
Martin Liska [Wed, 31 Mar 2021 08:51:11 +0000 (10:51 +0200)]
Fix coding style in IPA modref.
gcc/ChangeLog:
* ipa-modref.c (analyze_ssa_name_flags): Fix coding style
and one negated condition.
Jakub Jelinek [Wed, 31 Mar 2021 08:46:01 +0000 (10:46 +0200)]
aarch64: Fix up *add<mode>3_poly_1 [PR99813]
As mentioned in the PR, Uai constraint stands for
aarch64_sve_scalar_inc_dec_immediate
while Uav for
aarch64_sve_addvl_addpl_immediate.
Both *add<mode>3_aarch64 and *add<mode>3_poly_1 patterns use
* return aarch64_output_sve_scalar_inc_dec (operands[2]);
* return aarch64_output_sve_addvl_addpl (operands[2]);
in that order, but the former with Uai,Uav order, while the
latter with Uav,Uai instead. This patch swaps the constraints
so that they match the output.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2021-03-31 Jakub Jelinek <jakub@redhat.com>
Richard Sandiford <richard.sandiford@arm.com>
PR target/99813
* config/aarch64/aarch64.md (*add<mode>3_poly_1): Swap Uai and Uav
constraints on operands[2] and similarly 0 and rk constraints
on operands[1] corresponding to that.
* g++.target/aarch64/sve/pr99813.C: New test.
Jakub Jelinek [Wed, 31 Mar 2021 07:11:29 +0000 (09:11 +0200)]
i386, debug: Default to -gdwarf-4 on Windows targets with broken ld.bfd [PR98860]
As mentioned in the PR, before the
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba6eb62ff0ea9843a018cfd7cd06777bd66ae0a0
fix from March 1st, PECOFF ld.bfd didn't know about .debug_loclists,
.debug_rnglists and other debug sections new in DWARF 5. Unfortunately,
unlike for ELF linkers, that means the sections were placed in wrong
ordering with wrong VMA/LMA, so the resulting executables are apparently
unusable.
As that is a pretty new change, newer than the 2.35.2 or 2.36 binutils releases,
the following patch adds a workaround that turns -gdwarf-4 by default
instead of -gdwarf-5 if a broken linker is found at configure time.
Users can still explicitly play with -gdwarf-5 and either use a non-broken
linker or use custom linker scripts for the broken one, but at least
by default it should work.
2021-03-31 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/98860
* configure.ac (HAVE_LD_BROKEN_PE_DWARF5): New AC_DEFINE if PECOFF
linker doesn't support DWARF sections new in DWARF5.
* config/i386/i386-options.c (ix86_option_override_internal): Default
to dwarf_version 4 if HAVE_LD_BROKEN_PE_DWARF5 for TARGET_PECOFF
targets.
* config.in: Regenerated.
* configure: Regenerated.
Jakub Jelinek [Wed, 31 Mar 2021 06:55:38 +0000 (08:55 +0200)]
testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on all but ... [PR97680]
Seems the target hook is only defined on
config/i386/i386.c:#undef TARGET_ZERO_CALL_USED_REGS
config/i386/i386.c:#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
config/sparc/sparc.c:#undef TARGET_ZERO_CALL_USED_REGS
config/sparc/sparc.c:#define TARGET_ZERO_CALL_USED_REGS sparc_zero_call_used_regs
but apparently many of the tests actually succeed on various targets that
don't define those hooks. E.g. I haven't seen them fail on aarch64,
on arm only the -10.c fails, on powerpc*/s390* all {8,9,10,11} fail (plus
5 is skipped on power*-aix*).
On ia64 according to testresults {6,7,8,9,10,11} fail, some with ICEs.
On mipsel according to testresults {9,10,11} fail, some with ICEs.
On nvptx at least 1-9 succeed, 10-11 don't know, don't have assert.h around.
I've kept {5,6,7} with aix,ia64,ia64 skipped because those seem like
outliers, it works pretty much everywhere but on those.
The rest have known good targets.
2021-03-31 Jakub Jelinek <jakub@redhat.com>
PR testsuite/97680
* c-c++-common/zero-scratch-regs-6.c: Skip on ia64.
* c-c++-common/zero-scratch-regs-7.c: Likewise.
* c-c++-common/zero-scratch-regs-8.c: Change from dg-skip-if of
selected unsupported triplets to all targets but selected triplets
of supported targets.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
* c-c++-common/zero-scratch-regs-10.c: Likewise.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
Patrick Palka [Wed, 31 Mar 2021 02:57:11 +0000 (22:57 -0400)]
c++: Adjust mangling of __alignof__ [PR88115]
r11-4926 made __alignof__ get mangled differently from alignof,
encoding __alignof__ as a vendor extended operator. But this
mangling is problematic for the reasons mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88115#c6.
This patch changes our mangling of __alignof__ to instead use the
new "vendor extended expression" syntax that's proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/112. Clang does
the same thing already, so after this patch Clang and GCC agree
about the mangling of __alignof__(type) and __alignof__(expr).
gcc/cp/ChangeLog:
PR c++/88115
* mangle.c (write_expression): Adjust the mangling of
__alignof__.
include/ChangeLog:
PR c++/88115
* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_VENDOR_EXPR.
libiberty/ChangeLog:
PR c++/88115
* cp-demangle.c (d_dump, d_make_comp, d_expression_1)
(d_count_templates_scopes): Handle DEMANGLE_COMPONENT_VENDOR_EXPR.
(d_print_comp_inner): Likewise.
<case DEMANGLE_COMPONENT_EXTENDED_OPERATOR>: Revert r11-4926
change.
<case DEMANGLE_COMPONENT_UNARY>: Likewise.
* testsuite/demangle-expected: Adjust __alignof__ tests.
gcc/testsuite/ChangeLog:
PR c++/88115
* g++.dg/cpp0x/alignof7.C: Adjust expected mangling.
Patrick Palka [Wed, 31 Mar 2021 02:54:37 +0000 (22:54 -0400)]
c++: placeholder type constraint and argument pack [PR99815]
When checking dependence of a placeholder type constraint, if the first
template argument of the constraint is an argument pack, we need to
expand it in order to properly separate the implicit 'auto' argument
from the rest.
gcc/cp/ChangeLog:
PR c++/99815
* pt.c (placeholder_type_constraint_dependent_p): Expand
argument packs to separate the first non-pack argument
from the rest.
gcc/testsuite/ChangeLog:
PR c++/99815
* g++.dg/cpp2a/concepts-placeholder5.C: New test.
GCC Administrator [Wed, 31 Mar 2021 00:16:31 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Fri, 26 Mar 2021 22:54:18 +0000 (18:54 -0400)]
analyzer: remove old decl of region::dump_to_pp
This was made redundant in the GCC 11 rewrite of state
(808f4dfeb3a95f50f15e71148e5c1067f90a126d).
gcc/analyzer/ChangeLog:
* region.h (region::dump_to_pp): Remove old decl.
David Malcolm [Fri, 26 Mar 2021 17:26:15 +0000 (13:26 -0400)]
analyzer: only call get_diagnostic_tree when it's needed
impl_sm_context::get_diagnostic_tree could be expensive, and
I find myself needing to put a breakpoint on it to debug
PR analyzer/99771, so only call it if we're about to use
the result.
gcc/analyzer/ChangeLog:
* sm-file.cc (fileptr_state_machine::on_stmt): Only call
get_diagnostic_tree if the result will be used.
* sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.
(malloc_state_machine::on_deallocator_call): Likewise.
(malloc_state_machine::on_realloc_call): Likewise.
(malloc_state_machine::on_realloc_call): Likewise.
* sm-sensitive.cc
(sensitive_state_machine::warn_for_any_exposure): Likewise.
* sm-taint.cc (taint_state_machine::on_stmt): Likewise.
David Malcolm [Thu, 25 Mar 2021 01:08:04 +0000 (21:08 -0400)]
analyzer testsuite: fix typo
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/symbolic-1.c: Fix typo.
Nathan Sidwell [Tue, 30 Mar 2021 16:45:59 +0000 (09:45 -0700)]
c++: duplicate const static members [PR 99283]
This is the bug that keeps on giving. Reducing it has been successful
at hitting other defects. In this case, some more specialization hash
table fun, plus an issue with reading in a definition of a duplicated
declaration. At least I discovered a null context check is no longer
needed.
PR c++/99283
gcc/cp/
* module.cc (dumper::operator): Make less brittle.
(trees_out::core_bools): VAR_DECLs always have a context.
(trees_out::key_mergeable): Use same_type_p for asserting.
(trees_in::read_var_def): Propagate
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P.
gcc/testsuite/
* g++.dg/modules/pr99283-5.h: New.
* g++.dg/modules/pr99283-5_a.H: New.
* g++.dg/modules/pr99283-5_b.H: New.
* g++.dg/modules/pr99283-5_c.C: New.
Jakub Jelinek [Tue, 30 Mar 2021 16:15:32 +0000 (18:15 +0200)]
c++: Fix ICE on PTRMEM_CST in lambda in inline var initializer [PR99790]
The following testcase ICEs (since the addition of inline var support),
because the lambda contains PTRMEM_CST but finish_function is called for the
lambda quite early during parsing it (from finish_lambda_function) when
the containing class is still incomplete. That means that during
genericization cplus_expand_constant keeps the PTRMEM_CST unmodified, but
later nothing lowers it when the class is finalized.
Using sizeof etc. on the class in such contexts is rejected by both g++ and
clang++, and when the PTRMEM_CST appears e.g. in static var initializers
rather than in functions, we handle it correctly because c_parse_final_cleanups
-> lower_var_init will handle those cplus_expand_constant when all classes
are already finalized.
The following patch fixes it by calling cplus_expand_constant again during
gimplification, as we are now unconditionally unit at a time, I'd think
everything that could be completed will be before we start gimplification.
2021-03-30 Jakub Jelinek <jakub@redhat.com>
PR c++/99790
* cp-gimplify.c (cp_gimplify_expr): Handle PTRMEM_CST.
* g++.dg/cpp1z/pr99790.C: New test.
Kyrylo Tkachov [Tue, 30 Mar 2021 15:42:17 +0000 (16:42 +0100)]
aarch64: PR target/99820: Guard on available SVE issue info before using
This fixes a simple segfault ICE when using the use_new_vector_costs tunable with a CPU tuning that it wasn't intended for.
I'm not adding a testcase here as we intend to remove the tunable for GCC 12 anyway (the new costing logic will remain and will benefit
from this extra check, but the -moverride option will no longer exist).
gcc/ChangeLog:
PR target/99820
* config/aarch64/aarch64.c (aarch64_analyze_loop_vinfo): Check for
available issue_info before using it.
Kyrylo Tkachov [Tue, 30 Mar 2021 14:43:36 +0000 (15:43 +0100)]
aarch64: PR target/99822 Don't allow zero register in first operand of SUBS/ADDS-immediate
In this PR we end up generating an invalid instruction:
adds x1,xzr,#2
because the pattern accepts zero as an operand in the comparison, but the instruction doesn't.
Fix it by adjusting the predicate and constraints.
gcc/ChangeLog:
PR target/99822
* config/aarch64/aarch64.md (sub<mode>3_compare1_imm): Do not allow zero
in operand 1.
gcc/testsuite/ChangeLog:
PR target/99822
* gcc.c-torture/compile/pr99822.c: New test.
luoxhu@cn.ibm.com [Sat, 27 Mar 2021 03:26:57 +0000 (22:26 -0500)]
rs6000: Enable 32bit variable vec_insert [PR99718]
32bit and P7 VSX could also benefit a lot from the variable vec_insert
implementation with shift/insert/shift back method.
2021-03-29 Xionghu Luo <luoxhu@linux.ibm.com>
PR target/99718
* config/rs6000/altivec.md (altivec_lvsl_reg): Change to ...
(altivec_lvsl_reg_<mode>): ... this.
(altivec_lvsr_reg): Change to ...
(altivec_lvsr_reg_<mode>): ... this.
* config/rs6000/predicates.md (vec_set_index_operand): New.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Enable 32bit variable vec_insert for all TARGET_VSX.
* config/rs6000/rs6000.c (rs6000_expand_vector_set_var_p9):
Enable 32bit variable vec_insert for p9 and above.
(rs6000_expand_vector_set_var_p8): Rename to ...
(rs6000_expand_vector_set_var_p7): ... this.
(rs6000_expand_vector_set): Use TARGET_VSX and adjust assert
position.
* config/rs6000/vector.md (vec_set<mode>): Use vec_set_index_operand.
* config/rs6000/vsx.md (xl_len_r): Use gen_altivec_lvsl_reg_di and
gen_altivec_lvsr_reg_di.
gcc/testsuite/
PR target/99718
* gcc.target/powerpc/fold-vec-insert-char-p8.c: Update
instruction counts.
* gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
* gcc.target/powerpc/pr79251.p8.c: Likewise.
* gcc.target/powerpc/pr79251.p9.c: Likewise.
* gcc.target/powerpc/vsx-builtin-7.c: Likewise.
* gcc.target/powerpc/pr79251-run.p7.c: New test.
* gcc.target/powerpc/pr79251.p7.c: New test.
H.J. Lu [Wed, 24 Mar 2021 03:04:58 +0000 (20:04 -0700)]
x86: Define __rdtsc and __rdtscp as macros
Define __rdtsc and __rdtscp as macros for callers with general-regs-only
target attribute to avoid inline failure with always_inline attribute.
gcc/
PR target/99744
* config/i386/ia32intrin.h (__rdtsc): Defined as macro.
(__rdtscp): Likewise.
gcc/testsuite/
PR target/99744
* gcc.target/i386/pr99744-1.c: New test.
Tamar Christina [Tue, 30 Mar 2021 13:16:03 +0000 (14:16 +0100)]
slp: reject non-multiple of 2 laned SLP trees (PR99825)
TWO_OPERANDS allows any order or number of combinations of + and - operations
but the pattern matcher only supports pairs of operations.
This patch has the pattern matcher for complex numbers reject SLP trees where
the lanes are not a multiple of 2.
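As a rough illustration of that rejection rule (a hypothetical C sketch, not the code in tree-vect-slp-patterns.c; the name lanes_form_pairs is invented here), the check amounts to asking whether the lanes can be split into pairs:

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC function: the complex-number pattern
   matcher consumes operations two at a time, so a lane count that is
   not a multiple of 2 cannot be split into pairs and is rejected.  */
int lanes_form_pairs(size_t lanes)
{
    return lanes % 2 == 0;
}
```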
gcc/ChangeLog:
PR tree-optimization/99825
* tree-vect-slp-patterns.c (vect_check_evenodd_blend):
Reject non-mult 2 lanes.
gcc/testsuite/ChangeLog:
PR tree-optimization/99825
* gfortran.dg/vect/pr99825.f90: New test.
Christophe Lyon [Tue, 30 Mar 2021 12:26:33 +0000 (12:26 +0000)]
arm: Fix emission of Tag_ABI_VFP_args with MVE and -mfloat-abi=hard (PR target/99773)
When compiling with -mfloat-abi=hard -march=armv8.1-m.main+mve, we
want to emit Tag_ABI_VFP_args even though we are not emitting
floating-point instructions (we need "+mve.fp" for that), because we
use MVE registers to pass FP arguments.
This patch removes the condition on (! TARGET_SOFT_FLOAT) because this
is a case where TARGET_SOFT_FLOAT is true, and TARGET_HARD_FLOAT_ABI
is true too.
2021-03-30 Richard Earnshaw <rearnsha@arm.com>
gcc/
PR target/99773
* config/arm/arm.c (arm_file_start): Fix emission of
Tag_ABI_VFP_args attribute.
Kyrylo Tkachov [Tue, 30 Mar 2021 13:07:50 +0000 (14:07 +0100)]
aarch64: Fix gcc.target/aarch64/pr99808.c for ILP32
Fix test for -mabi=ilp32
gcc/testsuite/ChangeLog:
PR target/99808
* gcc.target/aarch64/pr99808.c: Use ULL constant suffix.
Richard Biener [Tue, 30 Mar 2021 09:22:52 +0000 (11:22 +0200)]
tree-optimization/99824 - avoid excessive integer type precision in VN
VN sometimes builds new integer types to handle accesses where precision
of the access type does not match the access size. The way
ao_ref_init_from_vn_reference is computing the access size ignores
the access type in case the ref operands have an outermost
COMPONENT_REF which, in case it is an array for example, can be
way larger than the access size. This can cause us to try
building an integer type with precision larger than WIDE_INT_MAX_PRECISION
eventually leading to memory corruption.
The following adjusts ao_ref_init_from_vn_reference to only lower
access sizes via the outermost COMPONENT_REF but otherwise honor
the access size as specified by the access type.
It also places an assert in integer type building that we remain
in the limits of WIDE_INT_MAX_PRECISION. I chose the shared code
where we set TYPE_MIN/MAX_VALUE because that will immediately
cross the wide_ints capacity otherwise.
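The two parts of the fix can be sketched in plain C as follows. This is only a model under stated assumptions: MAX_PRECISION_BITS is a made-up stand-in for WIDE_INT_MAX_PRECISION (whose real value depends on the configuration), and both function names are invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for WIDE_INT_MAX_PRECISION.  */
#define MAX_PRECISION_BITS 512

/* Start from the size given by the access type and let the outermost
   COMPONENT_REF only lower it, never raise it.  */
unsigned access_size_bits(unsigned access_type_bits,
                          unsigned component_ref_bits)
{
    return component_ref_bits < access_type_bits
           ? component_ref_bits : access_type_bits;
}

/* Model of the new assert: refuse to build an integer type whose
   precision exceeds the limit.  */
unsigned checked_precision(unsigned precision_bits)
{
    assert(precision_bits <= MAX_PRECISION_BITS);
    return precision_bits;
}
```

With this shape, an array-sized COMPONENT_REF can no longer inflate a 32-bit access into a huge precision, because the access type caps the result.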
2021-03-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/99824
* stor-layout.c (set_min_and_max_values_for_integral_type):
Assert the precision is within the bounds of
WIDE_INT_MAX_PRECISION.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Use
the outermost component ref only to lower the access size
and initialize that from the access type.
* gcc.dg/torture/pr99824.c: New testcase.
Richard Sandiford [Tue, 30 Mar 2021 10:42:50 +0000 (11:42 +0100)]
aarch64: Tweak post-RA handling of CONST_INT moves [PR98136]
This PR is a regression caused by r8-5967, where we replaced
a call to aarch64_internal_mov_immediate in aarch64_add_offset
with a call to aarch64_force_temporary, which in turn uses the
normal emit_move_insn{,_1} routines.
The problem is that aarch64_add_offset can be called while
outputting a thunk, where we require all instructions to be
valid without splitting. However, the move expanders were
not splitting CONST_INT moves themselves.
I think the right fix is to make the move expanders work
even in this scenario, rather than require callers to handle
it as a special case.
gcc/
PR target/98136
* config/aarch64/aarch64.md (mov<mode>): Pass multi-instruction
CONST_INTs to aarch64_expand_mov_immediate when called after RA.
gcc/testsuite/
PR target/98136
* g++.dg/pr98136.C: New test.
Mihailo Stojanovic [Tue, 30 Mar 2021 10:42:49 +0000 (11:42 +0100)]
aarch64: Prevent use of SIMD fcvtz[su] instruction variant with "nosimd"
Currently, SF->SI and DF->DI conversions on AArch64 with the "nosimd"
flag provided sometimes cause the emitting of a vector variant of the
fcvtz[su] instruction (e.g. fcvtzu s0, s0).
This modifies the corresponding pattern to only select the vector
variant of the instruction when generating code with SIMD enabled.
gcc/ChangeLog:
* config/aarch64/aarch64.md
(<optab>_trunc<fcvt_target><GPI:mode>2): Set the "arch"
attribute to disambiguate between SIMD and FP variants of the
instruction.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/fcvt_nosimd.c: New test.
GCC Administrator [Tue, 30 Mar 2021 00:16:29 +0000 (00:16 +0000)]
Daily bump.
Joseph Myers [Mon, 29 Mar 2021 22:53:22 +0000 (22:53 +0000)]
Update cpplib sr.po.
* sr.po: Update.
Joseph Myers [Mon, 29 Mar 2021 22:51:16 +0000 (22:51 +0000)]
Update gcc sv.po.
* sv.po: Update.
Eric Botcazou [Mon, 29 Mar 2021 22:41:46 +0000 (00:41 +0200)]
Fix wrong assignment of aggregate to full-access component
This is a regression present on the mainline: the compiler (front-end) fails
to assign an aggregate to a full-access component (i.e. Atomic or VFA) as a
whole if the type of the component is not full access itself.
gcc/ada/
PR ada/99802
* freeze.adb (Is_Full_Access_Aggregate): Call Is_Full_Access_Object
on the name of an N_Assignment_Statement to spot full access.
Martin Sebor [Mon, 29 Mar 2021 21:58:01 +0000 (15:58 -0600)]
PR tree-optimization/61869 - Spurious uninitialized warning
gcc/testsuite/ChangeLog:
PR tree-optimization/61869
* gcc.dg/uninit-pr61869.c: New test.
Martin Sebor [Mon, 29 Mar 2021 21:21:32 +0000 (15:21 -0600)]
PR tree-optimization/61677 - False positive with -Wmaybe-uninitialized
gcc/testsuite/ChangeLog:
PR tree-optimization/61677
* gcc.dg/uninit-pr61677.c: New test.
Michael Meissner [Mon, 29 Mar 2021 20:43:14 +0000 (16:43 -0400)]
Require GLIBC 2.32 for Decimal/_Float128 conversions.
In the patch that I applied on March 2nd, I had code to provide support for
Decimal/_Float128 conversions if the user did not use at least GLIBC 2.32. It
did this by using __ibm128 as an intermediate type. The trouble is __ibm128
cannot represent all of the numbers that _Float128 can, and you lose if you do
this conversion.
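The loss from routing a conversion through an intermediate type that cannot hold every source value can be shown with a portable analogy. This sketch does not use __ibm128 or _Float128 (which need a PowerPC target); float merely plays the role of the intermediate type, and the function name is invented.

```c
/* Analogy only: convert a double through a float intermediate.  Any
   value that float cannot represent exactly comes back changed.  */
double via_narrow_intermediate(double x)
{
    /* volatile forces the value to really be rounded to float.  */
    volatile float narrow = (float)x;
    return (double)narrow;
}
```

A value such as 0.5 survives the round trip, while 0.1 does not, which is the same kind of loss the __ibm128 detour introduced.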
This patch removes this support. The dfp-bit.c functions now call the
__sprintfieee128 and __strtoieee128 functions to do the conversion. If the
user does not have GLIBC 2.32 or later, they will get a linker error that
these functions do not exist.
The float128 support functions are only built into the static libgcc, so there
isn't an issue with having references to __strtoieee128 and __sprintfieee128
with older GLIBC libraries.
As an added bonus, this patch eliminates the __sprintfkf function which
included stdio.h to get a definition for the sprintf library function. This
allows for building cross compilers without having to have a target stdio.h
available.
libgcc/
2021-03-29 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/t-float128 (fp128_decstr_funcs): Delete.
(fp128_ppc_funcs): Do not add $(fp128_decstr_funcs).
(fp128_decstr_objs): Delete.
* dfp-bit.h: Call __sprintfieee128 to do conversions from
_Float128 to a Decimal type. Call __strtoieee128 to do
conversions from a Decimal type to _Float128.
* config/rs6000/_sprintfkf.c: Delete file.
* config/rs6000/_sprintfkf.h: Delete file.
* config/rs6000/_strtokf.c: Delete file.
* config/rs6000/_strtokf.h: Delete file.
Martin Sebor [Mon, 29 Mar 2021 19:52:53 +0000 (13:52 -0600)]
PR tree-optimization/61112 - repeated conditional triggers false positive -Wmaybe-uninitialized
gcc/testsuite/ChangeLog:
PR tree-optimization/61112
* gcc.dg/uninit-pr61112.c: New test.
Jan Hubicka [Mon, 29 Mar 2021 18:59:42 +0000 (20:59 +0200)]
Fix pr99751.c testcase
PR ipa/99751
* gcc.c-torture/compile/pr99751.c: Rename from ...
* gcc.c-torture/execute/pr99751.c: ... to this.
Jan Hubicka [Mon, 29 Mar 2021 18:09:35 +0000 (20:09 +0200)]
Fix typo in merge_call_lhs_flags
gcc/ChangeLog:
2021-03-29 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (merge_call_lhs_flags): Correct handling of deref.
(analyze_ssa_name_flags): Fix typo in comment.
gcc/testsuite/ChangeLog:
2021-03-29 Jan Hubicka <hubicka@ucw.cz>
* gcc.c-torture/compile/pr99751.c: New test.
Jonathan Wakely [Mon, 29 Mar 2021 16:08:32 +0000 (17:08 +0100)]
Fix PR number in ChangeLog
Jakub Jelinek [Mon, 29 Mar 2021 15:05:47 +0000 (17:05 +0200)]
testsuite: Expect a warning on aarch64 for declare-simd-coarray-lib.f90 [PR93660]
aarch64 currently doesn't support declare simd where the return value and arguments
have different sizes and warns about that case. This change adds a dg-warning
for that case like various other tests have already.
2021-03-29 Jakub Jelinek <jakub@redhat.com>
PR fortran/93660
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Expect a mixed size
declare simd warning on aarch64.
Jonathan Wakely [Mon, 29 Mar 2021 13:13:01 +0000 (14:13 +0100)]
libstdc++: Adjust link to PSTL upstream (again)
The LLVM project renamed their default branch to 'main'.
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2017.xml: Adjust link for PSTL.
* doc/html/manual/status.html: Regenerate.
Alex Coplan [Mon, 29 Mar 2021 11:18:19 +0000 (12:18 +0100)]
aarch64: Fix SVE ACLE builtins with LTO [PR99216]
As discussed in the PR, we currently have two different numbering
schemes for SVE builtins: one for C, and one for C++. This is
problematic for LTO, where we end up getting confused about which
intrinsic we're talking about. This patch inserts placeholders into the
registered_functions vector to ensure that there is a consistent
numbering scheme for both C and C++.
We use integer_zero_node as a placeholder node instead of building a
function decl. This is safe because the node is only returned by the
TARGET_BUILTIN_DECL hook, which (on AArch64) is only used for validation
when builtin decls are streamed into lto1.
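A simplified C model of the placeholder idea is sketched below. It is not the aarch64-sve-builtins.cc implementation: the registry type, the fixed table size and the entry names are all assumptions made for illustration, and a NULL entry stands in for the integer_zero_node placeholder.

```c
#include <stddef.h>

#define MAX_FUNCTIONS 8

struct registry {
    const char *decl[MAX_FUNCTIONS];  /* NULL marks a placeholder slot */
    size_t count;
};

/* Always consumes one slot, so the returned function code does not
   depend on whether a real declaration or a placeholder is stored.
   Returns (size_t)-1 if the table is full.  */
size_t add_function(struct registry *r, const char *name, int placeholder_p)
{
    if (r->count == MAX_FUNCTIONS)
        return (size_t)-1;
    r->decl[r->count] = placeholder_p ? NULL : name;
    return r->count++;
}

/* Registers the same two entries for either language; only which of
   them is a placeholder differs.  Returns the code of the direct form.  */
size_t register_all(struct registry *r, int cxx_overloads_p)
{
    add_function(r, "overloaded_form", cxx_overloads_p);
    return add_function(r, "direct_form", !cxx_overloads_p);
}
```

Because every registration takes a slot in both modes, a function code written by a C front end and read back in lto1 refers to the same position as it would for C++.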
gcc/ChangeLog:
PR target/99216
* config/aarch64/aarch64-sve-builtins.cc
(function_builder::add_function): Add placeholder_p argument, use
placeholder decls if this is set.
(function_builder::add_unique_function): Instead of conditionally adding
direct overloads, unconditionally add either a direct overload or a
placeholder.
(function_builder::add_overloaded_function): Set placeholder_p if we're
using C++ overloads. Use the obstack for string storage instead
of relying on the tree nodes.
(function_builder::add_overloaded_functions): Don't return early for
m_direct_overloads: we need to add placeholders.
* config/aarch64/aarch64-sve-builtins.h
(function_builder::add_function): Add placeholder_p argument.
gcc/testsuite/ChangeLog:
PR target/99216
* g++.target/aarch64/sve/pr99216.C: New test.
Richard Biener [Mon, 29 Mar 2021 11:10:37 +0000 (13:10 +0200)]
tree-optimization/99807 - avoid bogus assert with permute SLP node
This avoids asserting anything on the SLP_TREE_REPRESENTATIVE of
an SLP permute node (which shouldn't be there).
2021-03-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/99807
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Move
assert below VEC_PERM handling.
* gfortran.dg/vect/pr99807.f90: New testcase.
Kyrylo Tkachov [Mon, 29 Mar 2021 10:52:24 +0000 (11:52 +0100)]
aarch64: PR target/99037 Fix RTL representation in move_lo_quad patterns
This patch fixes the RTL representation of the move_lo_quad patterns to use aarch64_simd_or_scalar_imm_zero
for the zero part rather than a vec_duplicate of zero or a const_int 0.
The expander that generates them is also adjusted so that we use and match the correct const_vector forms throughout.
Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog:
PR target/99037
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): Use
aarch64_simd_or_scalar_imm_zero to match zeroes. Remove pattern
matching const_int 0.
(move_lo_quad_internal_be_<mode>): Likewise.
(move_lo_quad_<mode>): Update for the above.
* config/aarch64/iterators.md (VQ_2E): Delete.
gcc/testsuite/ChangeLog:
PR target/99808
* gcc.target/aarch64/pr99808.c: New test.
Jakub Jelinek [Mon, 29 Mar 2021 10:35:32 +0000 (12:35 +0200)]
fold-const: Fix ICE in extract_muldiv_1 [PR99777]
extract_muldiv{,_1} is apparently only prepared to handle scalar integer
operations, the callers ensure it by only calling it if the divisor or
one of the multiplicands is INTEGER_CST and because neither multiplication
nor division nor modulo are really supported e.g. for pointer types, nullptr
type etc. But the CASE_CONVERT handling doesn't really check if it isn't
a cast from some other type kind, so on the testcase we end up trying to
build MULT_EXPR in POINTER_TYPE which ICEs. A few years ago Marek has
added ANY_INTEGRAL_TYPE_P checks to two spots, but the code uses
TYPE_PRECISION which means something completely different for vector types,
etc.
So IMNSHO we should just punt on conversions from non-integrals or
non-scalar integrals.
2021-03-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/99777
* fold-const.c (extract_muldiv_1): For conversions, punt on casts from
types other than scalar integral types.
* g++.dg/torture/pr99777.C: New test.
Tobias Burnus [Mon, 29 Mar 2021 08:38:39 +0000 (10:38 +0200)]
libgomp: Fix on_device_arch.c aux-file handling [PR99555]
libgomp/ChangeLog:
PR target/99555
* testsuite/lib/on_device_arch.c: Move to ...
* testsuite/libgomp.c-c++-common/on_device_arch.h: ... here.
* testsuite/libgomp.fortran/on_device_arch.c: New file;
#include on_device_arch.h.
* testsuite/libgomp.c-c++-common/task-detach-6.c: #include
on_device_arch.h instead of using dg-additional-source.
* testsuite/libgomp.c/pr99555-1.c: Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90: Update to use
on_device_arch.c without relative paths.
GCC Administrator [Mon, 29 Mar 2021 00:16:20 +0000 (00:16 +0000)]
Daily bump.
David Edelsohn [Sun, 28 Mar 2021 17:11:50 +0000 (13:11 -0400)]
aix: TLS DWARF symbol decorations.
GCC currently emits TLS relocation decorations on symbols in DWARF sections.
Recent changes to the AIX linker cause it to reject such symbols.
This patch removes the decorations (@ie, @le, @m) and emits only the
qualified symbol name.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_output_dwarf_dtprel): Do not add
XCOFF TLS reloc decorations.
Gerald Pfeifer [Sun, 28 Mar 2021 21:34:35 +0000 (23:34 +0200)]
doc: Update link to "Memory Model" paper
gcc/ChangeLog:
* doc/analyzer.texi (Analyzer Internals): Update link to
"A Memory Model for Static Analysis of C Programs".
François Dumont [Fri, 26 Mar 2021 20:22:52 +0000 (21:22 +0100)]
libstdc++: _GLIBCXX_DEBUG Fix allocator-extended move constructor
libstdc++-v3/ChangeLog:
* include/debug/forward_list
(forward_list(forward_list&&, const allocator_type&)): Add noexcept qualification.
* include/debug/list (list(list&&, const allocator_type&)): Likewise and add
call to safe container allocator aware move constructor.
* include/debug/vector (vector(vector&&, const allocator_type&)):
Fix noexcept qualification.
* testsuite/23_containers/forward_list/cons/noexcept_move_construct.cc:
Add allocator-extended move constructor noexcept qualification check.
* testsuite/23_containers/list/cons/noexcept_move_construct.cc: Likewise.
Christophe Lyon [Sun, 28 Mar 2021 18:59:06 +0000 (18:59 +0000)]
testsuite/arm: Improve scan-assembler in pr96770.c
I'm seeing random scan-assembler-times failures in pr96770.c when LTO is used.
I suspect this is because the \\+4 string matches the LTO sections, sometimes.
This small patch avoids the issue, by matching arr\\+4 instead of \\+4.
2021-03-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
PR target/96770
* gcc.target/arm/pure-code/pr96770.c: Improve scan-assembler-times.
Paul Thomas [Sun, 28 Mar 2021 15:48:27 +0000 (16:48 +0100)]
Fortran: Fix problem with runtime pointer check [PR99602].
2021-03-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/ChangeLog
PR fortran/99602
* trans-expr.c (gfc_conv_procedure_call): Use the _data attrs
for class expressions and detect proc pointer evaluations by
the non-null actual argument list.
gcc/testsuite/ChangeLog
PR fortran/99602
* gfortran.dg/pr99602.f90: New test.
* gfortran.dg/pr99602a.f90: New test.
* gfortran.dg/pr99602b.f90: New test.
* gfortran.dg/pr99602c.f90: New test.
* gfortran.dg/pr99602d.f90: New test.
Iain Buclaw [Sun, 28 Mar 2021 14:40:23 +0000 (16:40 +0200)]
d: Predefine the D_PIE version condition when flag_pie is set.
Same as the D_PIC version condition, which is set by flag_pic.
gcc/d/ChangeLog:
* d-builtins.cc (d_init_versions): Predefine D_PIE if flag_pie is set.
Iain Buclaw [Thu, 25 Mar 2021 23:57:54 +0000 (00:57 +0100)]
d: Don't create gdc.test symlink in the gdc testsuite directory
Instead, tests are copied from the source tree (i.e: $srcdir/compilable)
into the test base directory ($base_dir/compilable). A dejagnu test
file with all translated test directives is created in a path that
follows DejaGnu naming conventions ($base_dir/gdc.test/compilable),
which is then passed to `dg-test'.
Before invoking the compiler, the gdc.test prefix is trimmed from the
test program in `gdc-dg-test' so that all copied test files are picked
up with the correct path names.
gcc/testsuite/ChangeLog:
* lib/gdc-utils.exp (gdc-copy-extra): Rename to...
(gdc-copy-file): ... this. Use file copy instead of open/close.
(gdc-convert-test): Save translated dejagnu test to gdc.test
directory, only write dejagnu directives to the test file.
(gdc-do-test): Don't create gdc.test symlink.
Iain Buclaw [Sat, 27 Mar 2021 01:31:45 +0000 (02:31 +0100)]
d: Define language hook for LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE
The underlying base type for enumerals is always present in TREE_TYPE.
gcc/d/ChangeLog:
* d-lang.cc (d_enum_underlying_base_type): New function.
(LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Set as
d_enum_underlying_base_type.
Iain Buclaw [Sun, 16 Jun 2019 16:12:47 +0000 (18:12 +0200)]
d: Use COMPILER_FOR_BUILD to build all D front-end generator programs
This means the correct config headers are included when building the
D front-end in a Canadian cross configuration.
gcc/d/ChangeLog:
* Make-lang.in (DMDGEN_COMPILE): Remove.
(d/%.dmdgen.o): Use COMPILER_FOR_BUILD and BUILD_COMPILERFLAGS to
build all D generator programs.
(D_SYSTEM_H): New macro.
(d/idgen.dmdgen.o): Add dependencies to build.
(d/impcnvgen.dmdgen.o): Likewise.
* d-system.h: Include bconfig.h if GENERATOR_FILE is defined.
Iain Buclaw [Sun, 14 Mar 2021 17:11:14 +0000 (18:11 +0100)]
d: Don't generate per-module wrapper for calling DSO constructor/destructor.
The static constructor/destructor list only ever has one function to
call in it, so mark the gdc.dso_ctor and gdc.dso_dtor functions as
static ctor/dtor directly instead.
gcc/d/ChangeLog:
* config-lang.in (gtfiles): Remove modules.cc.
* modules.cc (struct module_info): Remove GTY marker.
(static_ctor_list): Remove variable.
(static_dtor_list): Remove variable.
(register_moduleinfo): Directly set DECL_STATIC_CONSTRUCTOR on
dso_ctor, and DECL_STATIC_DESTRUCTOR on dso_dtor.
(d_finish_compilation): Remove static ctor/dtor handling.
gcc/testsuite/ChangeLog:
* gdc.dg/gdc270a.d: Removed.
* gdc.dg/gdc270b.d: Removed.
GCC Administrator [Sun, 28 Mar 2021 00:16:17 +0000 (00:16 +0000)]
Daily bump.
Steve Kargl [Sat, 27 Mar 2021 22:02:16 +0000 (15:02 -0700)]
fortran: Fix off-by-one in buffer sizes.
gcc/fortran/ChangeLog:
* misc.c (gfc_typename): Fix off-by-one in buffer sizes.
GCC Administrator [Sat, 27 Mar 2021 00:16:27 +0000 (00:16 +0000)]
Daily bump.
David Edelsohn [Sun, 14 Mar 2021 19:09:21 +0000 (15:09 -0400)]
aix: ABI struct alignment (PR99557)
The AIX power alignment rules apply the natural alignment of the
"first member" if it is of a floating-point data type (or is an aggregate
whose recursively "first" member or element is such a type). The alignment
associated with these types for subsequent members use an alignment value
where the floating-point data type is considered to have 4-byte alignment.
GCC had been stripping array type but had not recursively looked
within structs and unions. This also applies to classes and
subclasses and, therefore, becomes more prominent with C++.
For example,
struct A {
double x[2];
int y;
};
struct B {
int i;
struct A a;
};
struct A has double-word alignment for the bare type, but
word alignment and offset within struct B despite the alignment of
struct A. If struct A were the first member of struct B, struct B
would have double-word alignment. One must search for the innermost
first member to increase the alignment if double and then search for
the innermost first member to reduce the alignment if the TYPE had
double-word alignment solely because the innermost first member was
double.
This patch recursively looks through the first member to apply the
double-word alignment to the struct / union as a whole and to apply
the word alignment to the struct or union as a member within a struct
or union.
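The recursive "innermost first member" search can be modelled with a toy type description in C. This is a sketch under loud assumptions, not the rs6000.c code: the type_desc structure and function names are invented, and the model only knows int and double scalars with a 4-byte word.

```c
#include <stddef.h>

enum kind { K_INT, K_DOUBLE, K_ARRAY, K_STRUCT };

/* Toy type description: for K_ARRAY 'first' is the element type,
   for K_STRUCT it is the type of the first member.  */
struct type_desc {
    enum kind kind;
    const struct type_desc *first;
};

/* Strip arrays and descend through first members until a scalar
   is reached.  */
int innermost_first_is_double(const struct type_desc *t)
{
    while (t->kind == K_ARRAY || t->kind == K_STRUCT)
        t = t->first;
    return t->kind == K_DOUBLE;
}

/* Double-word alignment applies to the type as a whole (or as a first
   member); as a later member the same type is only word aligned.  */
unsigned align_bytes(const struct type_desc *t, int whole_or_first_p)
{
    if (whole_or_first_p && innermost_first_is_double(t))
        return 8;
    return 4;
}
```

Applied to the example above, struct A gets 8 as a bare type and 4 as the second member of struct B, while a struct whose first member is struct A gets 8.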
This is an ABI change for GCC on AIX, but GCC on AIX had not correctly
implemented the AIX ABI and had not been compatible with the IBM XL
compiler.
Bootstrapped on powerpc-ibm-aix7.2.3.0.
gcc/ChangeLog:
* config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Call function.
* config/rs6000/rs6000-protos.h (rs6000_special_adjust_field_align):
Declare.
* config/rs6000/rs6000.c (rs6000_special_adjust_field_align): New.
(rs6000_special_round_type_align): Recursively check innermost first
field.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr99557.c: New.
Jakub Jelinek [Fri, 26 Mar 2021 23:20:42 +0000 (00:20 +0100)]
dwarf2cfi: Defer queued register saves some more [PR99334]
On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
9: 41 ff 72 f8 pushq -0x8(%r10)
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 41 52 push %r10
1b: 53 push %rbx
1c: 48 83 ec 20 sub $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 13 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that. The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct. So, either we'd need to temporarily
say:
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
DW_CFA_advance_loc: 3 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
from the backend, but that failed miserably as explained in the PR,
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
dynamic stack realignment using drap register that only the i386 backend
does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
can't emulate those rules. The following patch instead does the deferring
of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
emits it on the (set hfp sp) assignment that must appear shortly after it
and adds assertion that it is the case.
The difference before/after the patch on the assembly is:
--- pr99334.s~ 2021-03-26 15:42:40.881749380 +0100
+++ pr99334.s 2021-03-26 17:38:05.729161910 +0100
@@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
andq $-16, %rsp
pushq -8(%r10)
pushq %rbp
- .cfi_escape 0x10,0x6,0x2,0x76,0
movq %rsp, %rbp
+ .cfi_escape 0x10,0x6,0x2,0x76,0
pushq %r15
pushq %r14
pushq %r13
i.e. does just what we IMHO need: after pushq %rbp, %rbp
still contains parent's frame value and so the save rule doesn't
need to be overridden there, ditto at the start of the next insn
before the side-effect took effect, and we override it only after
it when %rbp already has the right value.
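For readers unfamiliar with the raw bytes in the .cfi_escape line, the following C sketch decodes them. The opcode values (DW_CFA_expression is 0x10, DW_OP_breg0 is 0x70) come from the DWARF specification; the structure, function name and the restriction to single-byte LEB128 fields are simplifications assumed here.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_expression 0x10
#define DW_OP_breg0       0x70

struct cfa_expr_rule {
    unsigned reg;       /* register whose save location is described */
    unsigned base_reg;  /* register used by DW_OP_breg<n> */
    int offset;         /* offset added to base_reg */
};

/* Decodes "DW_CFA_expression reg, len 2, DW_OP_breg<n> offset".
   Handles only single-byte LEB128 fields; returns 1 on success.  */
int decode_cfi_escape(const uint8_t *b, size_t n, struct cfa_expr_rule *out)
{
    if (n != 5 || b[0] != DW_CFA_expression)
        return 0;
    if ((b[1] | b[2] | b[4]) & 0x80)   /* multi-byte LEB128 not handled */
        return 0;
    if (b[2] != 2)                     /* expression block length */
        return 0;
    if (b[3] < DW_OP_breg0 || b[3] > DW_OP_breg0 + 31)
        return 0;
    out->reg = b[1];
    out->base_reg = b[3] - DW_OP_breg0;
    /* Sign-extend the 7-bit SLEB128 payload.  */
    out->offset = (b[4] & 0x40) ? (int)b[4] - 0x80 : (int)b[4];
    return 1;
}
```

Feeding it 0x10,0x6,0x2,0x76,0 yields register 6 (rbp) saved at offset 0 from register 6, i.e. the DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) rule whose position the patch moves.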
If some other target adds dynamic stack realignment in the future and
the offset 0 case wouldn't be true there, the code can be adjusted so that
it works on all the drap architectures, I'm pretty sure the code would
need other adjustments too.
For the rule 18 and for the (set hfp sp) after it we already have asserts
for the drap cases that check whether the code looks the way i?86/x86_64
emit it currently.
2021-03-26 Jakub Jelinek <jakub@redhat.com>
PR debug/99334
* dwarf2out.h (struct dw_fde_node): Add rule18 member.
* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
assignment with drap_reg active, queue reg save for hfp with offset 0
and flush queued reg saves. When handling a push with rule18,
defer queueing reg save for hfp and just assert the offset is 0.
(scan_trace): Assert that fde->rule18 is false.
Martin Sebor [Fri, 26 Mar 2021 22:37:34 +0000 (16:37 -0600)]
PR tree-optimization/59970 - Bogus -Wmaybe-uninitialized at low optimization levels
PR tree-optimization/59970
* gcc.dg/uninit-pr59970.c: New test.
Marek Polacek [Fri, 26 Mar 2021 15:20:03 +0000 (11:20 -0400)]
c++: ICE on invalid with NSDMI in C++98 [PR98352]
NSDMIs are a C++11 thing, and here we ICE with them on the non-C++11
path. Fortunately all we need is a small tweak to my recent r11-7835
patch.
gcc/cp/ChangeLog:
PR c++/98352
* method.c (implicitly_declare_fn): Pass &raises to
synthesized_method_walk.
gcc/testsuite/ChangeLog:
PR c++/98352
* g++.dg/cpp0x/inh-ctor37.C: Remove dg-error.
* g++.dg/cpp0x/nsdmi17.C: New test.
Jonathan Wakely [Fri, 26 Mar 2021 18:39:49 +0000 (18:39 +0000)]
libstdc++: Add PRNG fallback to std::random_device
This makes std::random_device usable on VxWorks when running on older
x86 hardware. Since the r10-728 fix for PR libstdc++/85494 the library
will use the new code unconditionally on x86, but the cpuid checks for
RDSEED and RDRAND can fail at runtime, depending on the hardware where
the code is executing. If the OS does not provide /dev/urandom then this
means the std::random_device constructor always fails. In previous
releases if /dev/urandom is unavailable then std::mt19937 was used
unconditionally.
This patch adds a fallback for the case where the runtime cpuid checks
for x86 hardware instructions fail, and no /dev/urandom is available.
When this happens a std::linear_congruential_engine object will be used,
with a seed based on hashing the engine's address and the current time.
Distinct std::random_device objects will use different seeds, unless an
object is created and destroyed and a new object created at the same
memory location within the clock tick. This is not great, but is better
than always throwing from the constructor, and better than always using
std::mt19937 with the same seed (as GCC 9 and earlier do).
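The seeding idea can be sketched in C as below. This is not the libstdc++ code (which is C++ and uses std::linear_congruential_engine): the multiplier 16807 and modulus 2^31-1 are the well-known minstd_rand0 parameters chosen here as an assumption, and the address/time mixing is an invented stand-in for the hash the patch uses.

```c
#include <stdint.h>
#include <time.h>

struct lcg {
    uint32_t state;  /* must stay in [1, 2147483646] */
};

/* One step of x' = 16807 * x mod (2^31 - 1).  */
uint32_t lcg_next(struct lcg *g)
{
    g->state = (uint32_t)(((uint64_t)g->state * 16807u) % 2147483647u);
    return g->state;
}

/* Derive a seed from the object's own address and the current time, so
   distinct objects get distinct seeds unless one is recreated at the
   same address within the same clock tick.  */
void lcg_seed_from_address_and_time(struct lcg *g)
{
    uint64_t h = (uint64_t)(uintptr_t)g
                 ^ ((uint64_t)time(NULL) * 0x9E3779B97F4A7C15ull);
    g->state = (uint32_t)(h % 2147483646u) + 1;  /* never zero */
}
```

The caveat from the text carries over directly: two objects seeded at the same address in the same second would produce the same sequence in this sketch.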
libstdc++-v3/ChangeLog:
* src/c++11/random.cc (USE_LCG): Define when a pseudo-random
fallback is needed.
[USE_LCG] (bad_seed, construct_lcg_at, destroy_lcg_at, __lcg):
New helper functions and callback.
(random_device::_M_init): Add 'prng' and 'all' enumerators.
Replace switch with fallthrough with a series of 'if' statements.
[USE_LCG]: Construct an lcg_type engine and use __lcg when cpuid
checks fail.
(random_device::_M_init_pretr1) [USE_MT19937]: Accept "prng"
token.
(random_device::_M_getval): Check for callback unconditionally
and always pass _M_file pointer.
* testsuite/26_numerics/random/random_device/85494.cc: Remove
effective-target check. Use new random_device_available helper.
* testsuite/26_numerics/random/random_device/94087.cc: Likewise.
* testsuite/26_numerics/random/random_device/cons/default-cow.cc:
Remove effective-target check.
* testsuite/26_numerics/random/random_device/cons/default.cc:
Likewise.
* testsuite/26_numerics/random/random_device/cons/token.cc: Use
new random_device_available helper. Test "prng" token.
* testsuite/util/testsuite_random.h (random_device_available):
New helper function.
Nathan Sidwell [Fri, 26 Mar 2021 17:46:31 +0000 (10:46 -0700)]
c++: imported templates and alias-template changes [PR 99283]
During development of modules, I had difficulty deciding whether the
module flags of a template should live on the decl_template_result,
the template_decl, or both. I chose the latter, and require them to
be consistent. This and a few other defects show how hard that
consistency is. Hence this patch moves to holding the flags on the
template-decl-result decl. That's the entity various bits of the
parser have at the appropriate time. One needs STRIP_TEMPLATE in a
bunch of places, which this patch adds. Also a check that we never
give a TEMPLATE_DECL to the module flag accessors.
This left a problem with how I was handling template aliases. These
were in two parts -- separating the TEMPLATE_DECL from the TYPE_DECL.
That seemed somewhat funky, but development showed it necessary. Of
course, that causes problems if the TEMPLATE_DECL cannot contain 'am
imported' information. Investigating now shows that we do not need to
treat them separately. By reverting a bit of template instantiation
machinery that caused the problem, we're back on course. I think what
has happened is that between then and now, other typedef fixes have
corrected the underlying problem this separation was working around.
It allows a bunch of cleanup in the decl streamer, as we no longer
have to handle a null TEMPLATE_DECL_RESULT.
PR c++/99283
gcc/cp/
* cp-tree.h (DECL_MODULE_CHECK): Ban TEMPLATE_DECL.
(SET_TYPE_TEMPLATE_INFO): Restore Alias template setting.
* decl.c (duplicate_decls): Remove template_decl module flag
propagation.
* module.cc (merge_kind_name): Add alias tmpl spec as a thing.
(dumper::impl::nested_name): Adjust for template-decl module flag
change.
(trees_in::assert_definition): Likewise.
(trees_in::install_entity): Likewise.
(trees_out::decl_value): Likewise. Remove alias template
separation of template and type_decl.
(trees_in::decl_value): Likewise.
(trees_out::key_mergeable): Likewise.
(trees_in::key_mergeable): Likewise.
(trees_out::decl_node): Adjust for template-decl module flag
change.
(depset::hash::make_dependency): Likewise.
(get_originating_module, module_may_redeclare): Likewise.
(set_instantiating_module, set_defining_module): Likewise.
* name-lookup.c (name_lookup::search_adl): Likewise.
(do_pushdecl): Likewise.
* pt.c (build_template_decl): Likewise.
(lookup_template_class_1): Remove special alias_template handling
of DECL_TI_TEMPLATE.
(tsubst_template_decl): Likewise.
gcc/testsuite/
* g++.dg/modules/pr99283-2_a.H: New.
* g++.dg/modules/pr99283-2_b.H: New.
* g++.dg/modules/pr99283-2_c.H: New.
* g++.dg/modules/pr99283-3_a.H: New.
* g++.dg/modules/pr99283-3_b.H: New.
* g++.dg/modules/pr99283-4.H: New.
* g++.dg/modules/tpl-alias-1_a.H: Adjust scans.
* g++.dg/modules/tpl-alias-1_b.C: Adjust scans.
Dimitar Dimitrov [Fri, 26 Mar 2021 17:00:55 +0000 (19:00 +0200)]
MAINTAINERS: Add myself as pru port maintainer
ChangeLog:
* MAINTAINERS: Add myself as pru port maintainer.
Vladimir Makarov [Fri, 26 Mar 2021 17:09:24 +0000 (17:09 +0000)]
[PR99766] Consider relaxed memory associated more with memory instead of special memory.
Relaxed memory should be considered more like memory than special memory.
gcc/ChangeLog:
PR target/99766
* ira-costs.c (record_reg_classes): Put case with
CT_RELAXED_MEMORY adjacent to one with CT_MEMORY.
* ira.c (ira_setup_alts): Ditto.
* lra-constraints.c (process_alt_operands): Ditto.
* recog.c (asm_operand_ok): Ditto.
* reload.c (find_reloads): Ditto.
gcc/testsuite/ChangeLog:
PR target/99766
* g++.target/aarch64/sve/pr99766.C: New.
Richard Sandiford [Fri, 26 Mar 2021 16:08:38 +0000 (16:08 +0000)]
aarch64: Add costs for LD[34] and ST[34] postincrements
Most postincrements are cheap on Neoverse V1, but it's
generally better to avoid them on LD[34] and ST[34] instructions.
This patch adds separate address costs fields for these cases.
Other CPUs continue to use the same costs for all postincrements.
gcc/
* config/aarch64/aarch64-protos.h
(cpu_addrcost_table::post_modify_ld3_st3): New member variable.
(cpu_addrcost_table::post_modify_ld4_st4): Likewise.
* config/aarch64/aarch64.c (generic_addrcost_table): Update
accordingly, using the same costs as for post_modify.
(exynosm1_addrcost_table, xgene1_addrcost_table): Likewise.
(thunderx2t99_addrcost_table, thunderx3t110_addrcost_table)
(tsv110_addrcost_table, qdf24xx_addrcost_table): Likewise.
(a64fx_addrcost_table): Likewise.
(neoversev1_addrcost_table): New.
(neoversev1_tunings): Use neoversev1_addrcost_table.
(aarch64_address_cost): Use the new post_modify costs for CImode
and XImode.
Richard Sandiford [Fri, 26 Mar 2021 16:08:38 +0000 (16:08 +0000)]
aarch64: Take issue rate into account for vector loop costs
When SVE is enabled, GCC needs to do a three-way comparison
between scalar, Advanced SIMD and SVE code. The normal costs
tend to be latency-based, which is well-suited to SLP. However,
comparing sums of latency costs means that we effectively treat
the code as executing sequentially. This can hide the effect of
pipeline bubbles or resource contention that in practice are quite
important for loop vectorisation. This is particularly true for
loops that involve reductions.
This patch therefore tries to estimate how quickly each piece
of code could issue, using a very (very) simplistic model.
It then uses this to adjust the loop vector costs up or down as
appropriate. Part of the Advanced SIMD vs. SVE adjustment is
opt-in and is not enabled by default even for use_new_vector_costs.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs. The code also mostly ignores
CPUs that have no issue information, even if use_new_vector_costs
is enabled for some reason.
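To make the idea of an issue-rate estimate concrete, here is a minimal sketch of a simplistic model of that kind. It is not the code added by this patch: the structure names, fields and the comparison rule are assumptions chosen for illustration. The estimate is the largest number of cycles that any one issue limit would impose on a single iteration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-CPU issue limits (illustrative names only).
struct IssueInfo {
  unsigned loads_stores_per_cycle;  // combined load/store issue limit
  unsigned stores_per_cycle;        // store-only issue limit
  unsigned general_ops_per_cycle;   // limit for all other operations
};

// Hypothetical operation counts for one iteration of a loop body.
struct OpCount {
  unsigned loads;
  unsigned stores;
  unsigned general_ops;
};

// Round-up division; the divisor must be non-zero.
static unsigned ceil_div(unsigned a, unsigned b)
{
  assert(b > 0);
  return (a + b - 1) / b;
}

// Lower bound on the cycles needed to issue one iteration: the slowest
// of the three resource limits determines the result.
unsigned min_cycles_per_iter(const OpCount& ops, const IssueInfo& info)
{
  unsigned cycles = ceil_div(ops.loads + ops.stores,
                             info.loads_stores_per_cycle);
  cycles = std::max(cycles, ceil_div(ops.stores, info.stores_per_cycle));
  cycles = std::max(cycles, ceil_div(ops.general_ops,
                                     info.general_ops_per_cycle));
  return cycles;
}

// A vector iteration covers `vf` scalar iterations, so the vector loop
// issues faster only if it needs fewer cycles than `vf` scalar iterations.
bool vector_issues_faster(unsigned scalar_cycles, unsigned vector_cycles,
                          unsigned vf)
{
  return static_cast<unsigned long long>(vector_cycles)
         < static_cast<unsigned long long>(scalar_cycles) * vf;
}
```

Under these assumptions, a cost model could raise the vector body cost when vector_issues_faster returns false and lower it otherwise.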
gcc/
* config/aarch64/aarch64.opt
(-param=aarch64-loop-vect-issue-rate-niters=): New parameter.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64-protos.h (aarch64_base_vec_issue_info)
(aarch64_scalar_vec_issue_info, aarch64_simd_vec_issue_info)
(aarch64_advsimd_vec_issue_info, aarch64_sve_vec_issue_info)
(aarch64_vec_issue_info): New structures.
(cpu_vector_cost): Write comments above the variables rather
than to the side.
(cpu_vector_cost::issue_info): New member variable.
* config/aarch64/aarch64.c: Include gimple-pretty-print.h
and tree-ssa-loop-niter.h.
(generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost)
(thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost)
(exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost)
(thunderx3t110_vector_cost): Initialize issue_info to null.
(neoversev1_scalar_issue_info, neoversev1_advsimd_issue_info)
(neoversev1_sve_issue_info, neoversev1_vec_issue_info): New structures.
(neoversev1_vector_cost): Use them.
(aarch64_vec_op_count, aarch64_sve_op_count): New structures.
(aarch64_vector_costs::saw_sve_only_op): New member variable.
(aarch64_vector_costs::num_vector_iterations): Likewise.
(aarch64_vector_costs::scalar_ops): Likewise.
(aarch64_vector_costs::advsimd_ops): Likewise.
(aarch64_vector_costs::sve_ops): Likewise.
(aarch64_vector_costs::seen_loads): Likewise.
(aarch64_simd_vec_costs_for_flags): New function.
(aarch64_analyze_loop_vinfo): Initialize num_vector_iterations.
Count the number of predicate operations required by SVE WHILE
instructions.
(aarch64_comparison_type, aarch64_multiply_add_p): New functions.
(aarch64_sve_only_stmt_p, aarch64_in_loop_reduction_latency): Likewise.
(aarch64_count_ops): Likewise.
(aarch64_add_stmt_cost): Record whether we see an SVE operation
that cannot currently be implemented using Advanced SIMD.
Record issue information about the scalar, Advanced SIMD
and (where relevant) SVE versions of a loop.
(aarch64_vec_op_count::dump): New function.
(aarch64_sve_op_count::dump): Likewise.
(aarch64_estimate_min_cycles_per_iter): Likewise.
(aarch64_adjust_body_cost): If issue information is available,
try to compare the issue rates of the various loop implementations
and increase or decrease the vector body cost accordingly.
Richard Sandiford [Fri, 26 Mar 2021 16:08:37 +0000 (16:08 +0000)]
aarch64: Ignore inductions when costing vector code
In practice it seems to be better not to cost a vector induction.
The scalar code generally needs the same induction but doesn't
cost it, making an apples-for-apples comparison harder. Most
inductions also have a low latency and their cost usually gets
hidden by other operations.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64.c (aarch64_detect_vector_stmt_subtype):
Assume a zero cost for induction phis.
Richard Sandiford [Fri, 26 Mar 2021 16:08:36 +0000 (16:08 +0000)]
aarch64: Cost comparisons embedded in COND_EXPRs
So far the costing of COND_EXPRs hasn't distinguished between
cases in which the condition is calculated separately or is
built into the COND_EXPR itself. This patch adds the cost
of any embedded comparison.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64.c (aarch64_embedded_comparison_type): New
function.
(aarch64_adjust_stmt_cost): Add the costs of embedded scalar and
vector comparisons.
Richard Sandiford [Fri, 26 Mar 2021 16:08:35 +0000 (16:08 +0000)]
aarch64: Detect scalar extending loads
If the scalar code does an integer load followed by an integer
extension, we've tended to cost that as two separate operations,
even though the extension is probably going to be free in practice.
This patch treats the extension as having zero cost, like we already
do for extending SVE loads.
Like with previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64.c (aarch64_detect_scalar_stmt_subtype):
New function.
(aarch64_add_stmt_cost): Call it.
Richard Sandiford [Fri, 26 Mar 2021 16:08:35 +0000 (16:08 +0000)]
aarch64: Try to detect when Advanced SIMD code would be completely unrolled
GCC usually costs the SVE and Advanced SIMD versions of a loop
and picks the one with the lowest cost. By default it will choose
SVE over Advanced SIMD in the event of a tie.
This is normally the correct behaviour, not least because SVE can
handle every scalar iteration count whereas Advanced SIMD can only
handle full vectors. However, there is one important exception
that GCC failed to consider: we can completely unroll Advanced SIMD
code at compile time, but we can't do the same for SVE.
This patch therefore adds an opt-in heuristic to guess whether
the Advanced SIMD version of a loop is likely to be unrolled.
This will only be suitable for some CPUs, so it is not enabled
by default and is controlled separately from use_new_vector_costs.
Like with previous patches, this one only becomes active if a
CPU selects both of the new tuning parameters. It should therefore
have a very low impact on other CPUs.
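The kind of guess described above could be sketched as follows. This is only an illustration under stated assumptions: the LoopInfo structure, the statement limit and the tie-breaking rule are hypothetical and are not taken from the GCC sources.

```cpp
#include <cassert>

// Hypothetical description of a loop (none of these names come from GCC).
struct LoopInfo {
  bool niters_known;     // iteration count is a compile-time constant
  unsigned long niters;  // scalar iteration count, if known
  unsigned long stmts;   // vector statements per Advanced SIMD iteration
};

// Guess whether the Advanced SIMD loop would be completely unrolled:
// the iteration count must be known, must fill whole vectors, and the
// unrolled body must stay within an assumed statement limit.
bool advsimd_likely_unrolled(const LoopInfo& loop, unsigned long lanes,
                             unsigned long max_unrolled_stmts)
{
  assert(lanes > 0);
  if (!loop.niters_known || loop.niters % lanes != 0)
    return false;
  unsigned long vector_iters = loop.niters / lanes;
  return vector_iters * loop.stmts <= max_unrolled_stmts;
}

enum class Choice { SVE, AdvSIMD };

// Pick the cheaper version; on a tie prefer SVE unless the Advanced SIMD
// version is expected to be unrolled (an assumed tie-breaking rule).
Choice choose(unsigned sve_cost, unsigned advsimd_cost, bool unrolled)
{
  if (advsimd_cost < sve_cost)
    return Choice::AdvSIMD;
  if (advsimd_cost == sve_cost && unrolled)
    return Choice::AdvSIMD;
  return Choice::SVE;
}
```

For example, a loop of 16 known iterations with 4 lanes and 3 statements per vector iteration unrolls to 12 statements, which fits an assumed limit of 16.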
gcc/
* config/aarch64/aarch64-tuning-flags.def (matched_vector_throughput):
New tuning parameter.
* config/aarch64/aarch64.c (neoversev1_tunings): Use it.
(aarch64_estimated_sve_vq): New function.
(aarch64_vector_costs::analyzed_vinfo): New member variable.
(aarch64_vector_costs::is_loop): Likewise.
(aarch64_vector_costs::unrolled_advsimd_niters): Likewise.
(aarch64_vector_costs::unrolled_advsimd_stmts): Likewise.
(aarch64_record_potential_advsimd_unrolling): New function.
(aarch64_analyze_loop_vinfo, aarch64_analyze_bb_vinfo): Likewise.
(aarch64_add_stmt_cost): Call aarch64_analyze_loop_vinfo or
aarch64_analyze_bb_vinfo on the first use of a costs structure.
Detect whether we're vectorizing a loop for SVE that might be
completely unrolled if it used Advanced SIMD instead.
(aarch64_adjust_body_cost_for_latency): New function.
(aarch64_finish_cost): Call it.
Richard Sandiford [Fri, 26 Mar 2021 16:08:34 +0000 (16:08 +0000)]
aarch64: Use an aarch64-specific structure for vector costing
This patch makes the AArch64 vector code use its own vector
costs structure, rather than just using the default unsigned[3].
Unfortunately, it's not easy to make this change specific to
use_new_vector_costs, so this part is one that affects all CPUs.
The change is relatively mechanical though.
gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs): New structure.
(aarch64_init_cost): New function.
(aarch64_add_stmt_cost): Use aarch64_vector_costs instead of
the default unsigned[3].
(aarch64_finish_cost, aarch64_destroy_cost_data): New functions.
(TARGET_VECTORIZE_INIT_COST): Override.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
Richard Sandiford [Fri, 26 Mar 2021 16:08:33 +0000 (16:08 +0000)]
aarch64: Add a CPU-specific cost table for Neoverse V1
This patch adds dedicated vector costs for Neoverse V1.
Previously we just used the Cortex-A57 costs, which isn't
ideal given that Cortex-A57 doesn't support SVE.
gcc/
* config/aarch64/aarch64.c (neoversev1_advsimd_vector_cost)
(neoversev1_sve_vector_cost): New cost structures.
(neoversev1_vector_cost): Likewise.
(neoversev1_tunings): Use them. Enable use_new_vector_costs.
Richard Sandiford [Fri, 26 Mar 2021 16:08:32 +0000 (16:08 +0000)]
aarch64: Add costs for one element of a scatter store
Currently each element in a gather load is costed as a scalar_load
and each element in a scatter store is costed as a scalar_store.
The load side seems to work pretty well in practice, since many
CPU-specific costs give loads quite a high cost relative to
arithmetic operations. However, stores usually have a cost
of just 1, which means that scatters tend to appear too cheap.
This patch adds a separate cost for one element in a scatter store.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
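A small sketch of why a separate per-element cost matters is given below. The SveCosts structure and the numbers used with it are assumptions for illustration. Only the field name scatter_store_elt_cost is taken from the ChangeLog that follows.

```cpp
#include <cassert>

// Hypothetical subset of an SVE cost table.
struct SveCosts {
  unsigned scalar_store_cost;       // cost of an ordinary scalar store
  unsigned scatter_store_elt_cost;  // cost of one element of a scatter store
};

// Cost of a scatter store of `nelts` elements. Without the new costs each
// element is charged as a scalar store, which tends to look too cheap.
unsigned scatter_cost(const SveCosts& costs, unsigned nelts,
                      bool use_new_vector_costs)
{
  unsigned per_elt = use_new_vector_costs ? costs.scatter_store_elt_cost
                                          : costs.scalar_store_cost;
  return nelts * per_elt;
}
```

With a scalar store cost of 1 and an assumed per-element scatter cost of 3, a four-element scatter is costed at 4 under the old scheme and 12 under the new one.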
gcc/
* config/aarch64/aarch64-protos.h
(sve_vec_cost::scatter_store_elt_cost): New member variable.
* config/aarch64/aarch64.c (generic_sve_vector_cost): Update
accordingly, taking the cost from the cost of a scalar_store.
(a64fx_sve_vector_cost): Likewise.
(aarch64_detect_vector_stmt_subtype): Detect scatter stores.
Richard Sandiford [Fri, 26 Mar 2021 16:08:31 +0000 (16:08 +0000)]
aarch64: Add costs for storing one element of a vector
Storing one element of a vector is costed as a vec_to_scalar
followed by a scalar_store. However, vec_to_scalar is also
used for reductions and for vector-to-GPR moves, which makes
it difficult to pick one cost for them all.
This patch therefore adds a cost for extracting one element
of a vector in preparation for storing it out. The store
itself is still costed separately.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64-protos.h
(simd_vec_cost::store_elt_extra_cost): New member variable.
* config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
accordingly, using the vec_to_scalar cost for the new field.
(generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
(a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
(thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost): Likewise.
(cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
(xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
(thunderx3t110_advsimd_vector_cost): Likewise.
(aarch64_detect_vector_stmt_subtype): Detect single-element stores.
Richard Sandiford [Fri, 26 Mar 2021 16:08:31 +0000 (16:08 +0000)]
aarch64: Add costs for LD[234]/ST[234] permutes
At the moment, we cost LD[234] and ST[234] as N vector loads
or stores, which effectively treats the implied permute as free.
This patch adds additional costs for the permutes, which apply on
top of the costs for the loads and stores.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
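The effect on the cost of an LDn instruction can be sketched as below. The permute field names follow the ChangeLog entry. The SimdCosts structure, the vector_load_cost field and the function are hypothetical simplifications, not the actual implementation.

```cpp
#include <cassert>

// Hypothetical subset of an Advanced SIMD cost table.
struct SimdCosts {
  unsigned vector_load_cost;      // cost of one plain vector load
  unsigned ld2_st2_permute_cost;  // extra permute cost for LD2/ST2
  unsigned ld3_st3_permute_cost;  // extra permute cost for LD3/ST3
  unsigned ld4_st4_permute_cost;  // extra permute cost for LD4/ST4
};

// Cost of an LDn (n in 2..4): n vector loads, plus the implied permute
// when the new costs are in use.
unsigned ldn_cost(const SimdCosts& costs, unsigned n,
                  bool use_new_vector_costs)
{
  assert(n >= 2 && n <= 4);
  unsigned cost = n * costs.vector_load_cost;
  if (use_new_vector_costs) {
    switch (n) {
    case 2: cost += costs.ld2_st2_permute_cost; break;
    case 3: cost += costs.ld3_st3_permute_cost; break;
    case 4: cost += costs.ld4_st4_permute_cost; break;
    }
  }
  return cost;
}
```

Setting all three permute costs to zero, as the generic tables do, reproduces the old behaviour.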
gcc/
* config/aarch64/aarch64-protos.h (simd_vec_cost::ld2_st2_permute_cost)
(simd_vec_cost::ld3_st3_permute_cost): New member variables.
(simd_vec_cost::ld4_st4_permute_cost): Likewise.
* config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
accordingly, using zero for the new costs.
(generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
(a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
(thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost): Likewise.
(cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
(xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
(thunderx3t110_advsimd_vector_cost): Likewise.
(aarch64_ld234_st234_vectors): New function.
(aarch64_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Call aarch64_adjust_stmt_cost if using
the new vector costs.
Richard Sandiford [Fri, 26 Mar 2021 16:08:30 +0000 (16:08 +0000)]
aarch64: Add vector costs for SVE CLAST[AB] and FADDA
Following on from the previous reduction costs patch, this one
adds costs for the SVE CLAST[AB] and FADDA instructions.
These instructions occur within the loop body, whereas the
reductions handled by the previous patch occur outside.
Like with the previous patch, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64-protos.h (sve_vec_cost): Turn into a
derived class of simd_vec_cost. Add information about CLAST[AB]
and FADDA instructions.
* config/aarch64/aarch64.c (generic_sve_vector_cost): Update
accordingly, using the vec_to_scalar costs for the new fields.
(a64fx_sve_vector_cost): Likewise.
(aarch64_reduc_type): New function.
(aarch64_sve_in_loop_reduction_latency): Likewise.
(aarch64_detect_vector_stmt_subtype): Take a vinfo parameter.
Use aarch64_sve_in_loop_reduction_latency to handle SVE reductions
that occur in the loop body.
(aarch64_add_stmt_cost): Update call accordingly.
Richard Sandiford [Fri, 26 Mar 2021 16:08:29 +0000 (16:08 +0000)]
aarch64: Add reduction costs to simd_vec_costs
This patch is part of a series that makes opt-in tweaks to the
AArch64 vector cost model.
At the moment, all reductions are costed as vec_to_scalar, which
also includes things like extracting a single element from a vector.
This is a bit too coarse in practice, since the cost of a reduction
depends very much on the type of value that it's processing.
This patch therefore adds separate costs for each case. To start with,
all the new costs are copied from the associated vec_to_scalar ones.
Due to the extreme lateness of this patch in the GCC 11 cycle, I've added
a new tuning flag (use_new_vector_costs) that selects the new behaviour.
This should help to ensure that the risk of the new code is only borne
by the CPUs that need it. Generic tuning is not affected.
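As an illustration of per-type reduction costs guarded by an opt-in flag, consider the sketch below. The reduc_*_cost and vec_to_scalar_cost field names come from the ChangeLog that follows. The ElemType enumeration, the SimdVecCost layout and the selection function are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical element-type tag for the value being reduced.
enum class ElemType { I8, I16, I32, I64, F16, F32, F64 };

// Simplified stand-in for a SIMD cost table.
struct SimdVecCost {
  unsigned vec_to_scalar_cost;
  unsigned reduc_i8_cost;
  unsigned reduc_i16_cost;
  unsigned reduc_i32_cost;
  unsigned reduc_i64_cost;
  unsigned reduc_f16_cost;
  unsigned reduc_f32_cost;
  unsigned reduc_f64_cost;
};

// Return the cost of a reduction. CPUs that have not opted in keep
// using the coarse vec_to_scalar cost for every element type.
unsigned reduction_cost(const SimdVecCost& costs, ElemType type,
                        bool use_new_vector_costs)
{
  if (!use_new_vector_costs)
    return costs.vec_to_scalar_cost;
  switch (type) {
  case ElemType::I8:  return costs.reduc_i8_cost;
  case ElemType::I16: return costs.reduc_i16_cost;
  case ElemType::I32: return costs.reduc_i32_cost;
  case ElemType::I64: return costs.reduc_i64_cost;
  case ElemType::F16: return costs.reduc_f16_cost;
  case ElemType::F32: return costs.reduc_f32_cost;
  case ElemType::F64: return costs.reduc_f64_cost;
  }
  return costs.vec_to_scalar_cost;  // not reached for valid enumerators
}
```

Copying vec_to_scalar_cost into every reduc_* field, as the patch does initially, means that opting in changes nothing until a CPU supplies its own values.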
gcc/
* config/aarch64/aarch64-tuning-flags.def (use_new_vector_costs):
New tuning flag.
* config/aarch64/aarch64-protos.h (simd_vec_cost): Put comments
above the fields rather than to the right.
(simd_vec_cost::reduc_i8_cost): New member variable.
(simd_vec_cost::reduc_i16_cost): Likewise.
(simd_vec_cost::reduc_i32_cost): Likewise.
(simd_vec_cost::reduc_i64_cost): Likewise.
(simd_vec_cost::reduc_f16_cost): Likewise.
(simd_vec_cost::reduc_f32_cost): Likewise.
(simd_vec_cost::reduc_f64_cost): Likewise.
* config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
accordingly, using the vec_to_scalar_cost for the new fields.
(generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
(a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
(thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost): Likewise.
(cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
(xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
(thunderx3t110_advsimd_vector_cost): Likewise.
(aarch64_use_new_vector_costs_p): New function.
(aarch64_simd_vec_costs): New function, split out from...
(aarch64_builtin_vectorization_cost): ...here.
(aarch64_is_reduction): New function.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_add_stmt_cost): Call aarch64_detect_vector_stmt_subtype if
using the new vector costs.
Iain Buclaw [Fri, 26 Mar 2021 14:46:24 +0000 (15:46 +0100)]
libphobos: Build all modules with -fversion=Shared when configured with --enable-shared
The libgdruntime_convenience library was built with `-fversion=Shared',
but the libphobos part wasn't when creating the static library.
As there are no issues compiling in Shared code into the static library,
to avoid mismatches the flag is now always present when --enable-shared
is turned on. Libtool's compiler PIC D flag is now the combination of
compiler PIC and D Shared flags, and AM_DFLAGS passes `-prefer-pic' to
libtool unless --enable-shared is turned off.
libphobos/ChangeLog:
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Substitute enable_shared, enable_static, and
phobos_lt_pic_flag.
* libdruntime/Makefile.am (AM_DFLAGS): Replace
phobos_compiler_pic_flag with phobos_lt_pic_flag, and
phobos_compiler_shared_flag.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.am (AM_DFLAGS): Replace phobos_compiler_pic_flag
with phobos_lt_pic_flag, and phobos_compiler_shared_flag.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* testsuite/libphobos.druntime_shared/druntime_shared.exp: Remove
-fversion=Shared and -fno-moduleinfo from default extra test flags.
* testsuite/libphobos.phobos_shared/phobos_shared.exp: Likewise.
* testsuite/testsuite_flags.in: Add phobos_compiler_shared_flag to
--gdcflags.
Iain Buclaw [Sat, 13 Mar 2021 16:05:52 +0000 (17:05 +0100)]
Fix ICE: in function_and_variable_visibility, at ipa-visibility.c:795 [PR99466]
In get_emutls_init_templ_addr, only thread-local declarations that were
DECL_ONE_ONLY would have a public initializer symbol, ignoring variables
that were declared with __attribute__((weak)).
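The kind of declaration affected looks roughly like the sketch below. It is written for illustration only and is not the testcase added by this commit. The variable and function names are hypothetical, and the code relies on the GNU weak attribute, so it assumes GCC or a compatible compiler on a target that supports weak symbols.

```cpp
#include <cassert>

// A weak, thread-local variable with an initializer. On targets where
// TLS is emulated, the compiler also emits an initializer template for
// it, which is the symbol the fix marks as public.
__attribute__((weak)) thread_local int tls_counter = 3;

// Increment the calling thread's copy and return the new value.
int bump()
{
  return ++tls_counter;
}
```

Each thread starts from the initial value 3, so the first call to bump in any thread returns 4.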
gcc/ChangeLog:
PR ipa/99466
* tree-emutls.c (get_emutls_init_templ_addr): Mark initializer of weak
TLS declarations as public.
gcc/testsuite/ChangeLog:
PR ipa/99466
* gcc.dg/tls/pr99466-1.c: New test.
* gcc.dg/tls/pr99466-2.c: New test.
Iain Buclaw [Fri, 26 Mar 2021 12:12:59 +0000 (13:12 +0100)]
d: Define IN_TARGET_CODE in all machine-specific D language files.
This is to be consistent with the rest of the back-end.
gcc/ChangeLog:
* config/aarch64/aarch64-d.c (IN_TARGET_CODE): Define.
* config/arm/arm-d.c (IN_TARGET_CODE): Likewise.
* config/i386/i386-d.c (IN_TARGET_CODE): Likewise.
* config/mips/mips-d.c (IN_TARGET_CODE): Likewise.
* config/pa/pa-d.c (IN_TARGET_CODE): Likewise.
* config/riscv/riscv-d.c (IN_TARGET_CODE): Likewise.
* config/rs6000/rs6000-d.c (IN_TARGET_CODE): Likewise.
* config/s390/s390-d.c (IN_TARGET_CODE): Likewise.
* config/sparc/sparc-d.c (IN_TARGET_CODE): Likewise.