Ian Lance Taylor [Wed, 16 Feb 2022 19:35:00 +0000 (11:35 -0800)]
libbacktrace: initialize DWARF 5 fields of unit
When I added the fields on 2019-12-13 I forgot to initialize them.
* dwarf.c (build_address_map): Initialize DWARF 5 fields of unit.
Andrew MacLeod [Wed, 16 Feb 2022 14:01:47 +0000 (09:01 -0500)]
Use range_compatible_p in condexpr_adjust
* gimple-range-gori.cc (gori_compute::condexpr_adjust): Use
range_compatible_p instead of direct type comparison.
Patrick Palka [Wed, 16 Feb 2022 17:41:35 +0000 (12:41 -0500)]
c++: treat NON_DEPENDENT_EXPR as not potentially constant [PR104507]
Here we're crashing from potential_constant_expression because it tries
to perform trial evaluation of the first operand '(bool)__r' of the
conjunction (which is overall wrapped in a NON_DEPENDENT_EXPR), but
cxx_eval_constant_expression ICEs on unsupported trees (of which CAST_EXPR
is one). The sequence of events is:
1. build_non_dependent_expr for the array subscript yields
NON_DEPENDENT_EXPR<<<(bool)__r && __s>>> ? 1 : 2
2. cp_build_array_ref calls fold_non_dependent_expr on this subscript
(after this point, processing_template_decl is cleared)
3. during which, the COND_EXPR case of tsubst_copy_and_build calls
fold_non_dependent_expr on the first operand
4. during which, we crash from p_c_e_1 because it attempts trial
evaluation of the CAST_EXPR '(bool)__r'.
Note that even if this crash didn't happen, fold_non_dependent_expr
from cp_build_array_ref would still ultimately be one big no-op here
since neither constexpr evaluation nor tsubst handle NON_DEPENDENT_EXPR.
In light of this and of the observation that we should never see
NON_DEPENDENT_EXPR in a context where a constant expression is needed
(it's used primarily in the build_x_* family of functions), it seems
futile for p_c_e_1 to ever return true for NON_DEPENDENT_EXPR. And the
otherwise inconsistent handling of NON_DEPENDENT_EXPR between p_c_e_1,
cxx_eval_constant_expression and tsubst apparently leads to weird
bugs such as this one.
PR c++/104507
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1)
<case NON_DEPENDENT_EXPR>: Return false instead of recursing.
Assert tf_error isn't set.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent21.C: New test.
Jakub Jelinek [Wed, 16 Feb 2022 16:03:58 +0000 (17:03 +0100)]
testsuite: Add testcase for already fixed PR [PR104448]
This PR has been fixed with r12-7147-g2f9ab267e725ddf2.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR target/104448
* gcc.target/i386/pr104448.c: New test.
Jakub Jelinek [Wed, 16 Feb 2022 13:48:30 +0000 (14:48 +0100)]
combine: Fix up -fcompare-debug issue in the combiner [PR104544]
On the following testcase on aarch64-linux, we behave differently
with -g and -g0.
The problem is that on:
(insn 10011 10010 10012 2 (set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi}
(expr_list:REG_DEAD (reg:DI 105)
(nil)))
(insn 10012 10011 10013 2 (set (reg:SI 109)
(eq:SI (reg:CC 66 cc)
(const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi}
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
(insn 10013 10012 10016 2 (set (reg:DI 110)
(zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64}
(expr_list:REG_DEAD (reg:SI 109)
(nil)))
(insn 10016 10013 10017 2 (parallel [
(set (reg:CC 66 cc)
(compare:CC (const_int 0 [0])
(reg:DI 110)))
(set (reg:DI 111)
(neg:DI (reg:DI 110)))
]) "pr104544.c":18:3 281 {negdi_carryout}
(expr_list:REG_DEAD (reg:DI 110)
(nil)))
...
(debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1
(nil))
(debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1
(nil))
(insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ])
(ior:DI (reg:DI 111)
(reg:DI 112))) "pr104544.c":11:6 496 {iordi3}
(expr_list:REG_DEAD (reg:DI 112)
(expr_list:REG_DEAD (reg:DI 111)
(nil))))
we successfully split 3 insns into two:
Trying 10011, 10013 -> 10016:
10011: cc:CC=cmp(r105:DI,0)
REG_DEAD r105:DI
10013: r110:DI=cc:CC==0
REG_DEAD cc:CC
10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;}
REG_DEAD r110:DI
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Failed to match this instruction:
(parallel [
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
])
Successfully matched this instruction:
(set (reg:DI 111)
(neg:DI (eq:DI (reg:DI 105)
(const_int 0 [0]))))
Successfully matched this instruction:
(set (reg:CC 66 cc)
(compare:CC (reg:DI 105)
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:DI 112)
(neg:DI (eq:DI (reg:CC 66 cc)
(const_int 0 [0]))))
allowing combination of insns 10011, 10013 and 10016
original costs 4 + 4 + 4 = 16
replacement costs 4 + 4 = 12
deferring deletion of insn with uid = 10011.
but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between. So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.
The following patch fixes that by instead ignoring debug insns during the
searching. We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.
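The forward search described above can be modelled on a toy insn list. This is only a sketch under stated assumptions: `struct toy_insn`, its fields and `find_link_target` are hypothetical stand-ins invented for illustration, not the real combine.cc data structures or the code of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an RTL insn.  */
struct toy_insn
{
  int uid;
  int block;      /* index of the containing basic block */
  bool is_debug;  /* true for debug insns, which only exist with -g */
  bool uses_reg;  /* true if the insn uses the register of interest */
  struct toy_insn *next;
};

/* Search forward from START, within the same basic block, for the
   first non-debug insn that uses the register.  Debug insns are
   skipped instead of ending the search, so the result is the same
   whether or not they are present.  Returns NULL if nothing is
   found before the block ends.  */
static struct toy_insn *
find_link_target (struct toy_insn *start)
{
  for (struct toy_insn *insn = start->next; insn; insn = insn->next)
    {
      if (insn->block != start->block)
        break;
      if (insn->is_debug)
        continue;
      if (insn->uses_reg)
        return insn;
    }
  return NULL;
}
```

With a list shaped like the dump above (10016, two debug insns, then 10033), the search reaches 10033 both with and without the debug insns in between.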
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104544
* combine.cc (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.
* gcc.dg/pr104544.c: New test.
Richard Sandiford [Wed, 16 Feb 2022 10:21:14 +0000 (10:21 +0000)]
aarch64: Tweak atomic-inst-cas.c options
atomic-inst-cas.c has code to skip __atomic_compare_exchange_n
calls for invalid memory orderings, but -Winvalid-memory-model
applies before the dead code is removed (which is the right
behaviour IMO). This patch therefore suppresses the warning
for this test.
gcc/testsuite/
* gcc.target/aarch64/atomic-inst-cas.c: Add
-Wno-invalid-memory-model.
Richard Sandiford [Wed, 16 Feb 2022 10:21:14 +0000 (10:21 +0000)]
aarch64: Remove XFAIL for bic-bitmask-1.c
bic-bitmask-1.c is now passing, so remove the XFAIL.
gcc/testsuite/
* gcc.target/aarch64/bic-bitmask-1.c: Remove XFAIL.
Richard Sandiford [Wed, 16 Feb 2022 10:21:13 +0000 (10:21 +0000)]
aarch64: Extend PR100056 patterns to +
pr100056.c contains things like:
int
or_shift_u3a (unsigned i)
{
i &= 7;
return i | (i << 11);
}
After g:96146e61cd7aee62c21c2845916ec42152918ab7, the preferred
gimple representation of this is a multiplication:
i_2 = i_1(D) & 7;
_5 = i_2 * 2049;
Expand then open-codes the multiplication back to individual shifts,
but (of course) it uses + rather than | to combine the shifts.
This means that we end up with the RTL equivalent of:
i + (i << 11)
I wondered about canonicalising the + to | (*back* to | in this case)
when the operands have no set bits in common and when one of the
operands is &, | or ^, but that didn't seem to be a popular idea when
I asked on IRC. The feeling seemed to be that + is inherently simpler
than |, so we shouldn't be “simplifying” the other way.
This patch therefore adjusts the PR100056 patterns to handle +
as well as |, in cases where the operands are provably disjoint.
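As a quick sanity check of the reasoning above, the three forms of the first example can be compared directly. This is a minimal sketch; the helper names `or_form`, `plus_form`, `mul_form` and `disjoint_p` are made up for illustration and do not appear in the patch.

```c
#include <stdbool.h>

/* The source form from pr100056.c: OR the masked value with a
   shifted copy of itself.  */
static unsigned
or_form (unsigned i)
{
  i &= 7;
  return i | (i << 11);
}

/* What expand effectively produces when it open-codes the
   multiplication by 2049, since 2049 == 1 + (1 << 11).  */
static unsigned
plus_form (unsigned i)
{
  i &= 7;
  return i + (i << 11);
}

/* The multiplication that gimple prefers.  */
static unsigned
mul_form (unsigned i)
{
  i &= 7;
  return i * 2049;
}

/* + and | agree whenever the operands have no set bits in common,
   because the addition can never produce a carry.  */
static bool
disjoint_p (unsigned a, unsigned b)
{
  return (a & b) == 0;
}
```

Since `i & 7` occupies bits 0-2 and the shifted copy occupies bits 11-13, `disjoint_p` holds for every input and all three functions return the same value.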
For:
int
or_shift_u8 (unsigned char i)
{
return i | (i << 11);
}
the instructions:
2: r95:SI=zero_extend(x0:QI)
REG_DEAD x0:QI
7: r98:SI=r95:SI<<0xb
are combined into:
(parallel [
(set (reg:SI 98)
(and:SI (ashift:SI (reg:SI 0 x0 [ i ])
(const_int 11 [0xb]))
(const_int 522240 [0x7f800])))
(set (reg/v:SI 95 [ i ])
(zero_extend:SI (reg:QI 0 x0 [ i ])))
])
which fails to match, but which is then split into its individual
(independent) sets. Later the zero_extend is combined with the add
to get an ADD UXTB:
(set (reg:SI 99)
(plus:SI (zero_extend:SI (reg:QI 0 x0 [ i ]))
(reg:SI 98)))
This means that there is never a 3-insn combo to match the split
against. The end result is therefore:
ubfiz w1, w0, 11, 8
add w0, w1, w0, uxtb
This is a bit redundant, since it's doing the zero_extend twice.
It is at least 2 instructions though, rather than the 3 that we
had before the original patch for PR100056. or_shift_u8_asm is
affected similarly.
The net effect is that we do still have 2 UBFIZs, but we're at
least back down to 2 instructions per function, as for GCC 11.
I think that's good enough for now.
There are probably other instructions that should be extended
to support + as well as | (e.g. the EXTR ones), but those aren't
regressions and so are GCC 13 material.
gcc/
PR target/100056
* config/aarch64/iterators.md (LOGICAL_OR_PLUS): New iterator.
* config/aarch64/aarch64.md: Extend the PR100056 patterns
to handle plus in the same way as ior, if the operands have
no set bits in common.
gcc/testsuite/
PR target/100056
* gcc.target/aarch64/pr100056.c: XFAIL the original UBFIZ test
and instead expect two UBFIZs + two ADD UXTBs.
Iain Buclaw [Sun, 13 Feb 2022 19:17:53 +0000 (20:17 +0100)]
d: Merge upstream dmd 52844d4b1, druntime dbd0c874, phobos 896b1d0e1.
D front-end changes:
- Parsing and compiling C code is now possible using `import'.
- `throw' statements can now be used as an expression.
- Improvements to the D template emission strategy when compiling
with `-funittest'.
D Runtime changes:
- New core.int128 module for implementing intrinsics to support
128-bit integer types.
- C bindings for the kernel and C runtime have been better separated
to allow compiling for hybrid targets, such as kFreeBSD.
Phobos changes:
- The std.experimental.checkedint module has been renamed to
std.checkedint.
gcc/d/ChangeLog:
* d-builtins.cc (d_build_builtins_module): Set purity of DECL_PURE_P
functions to PURE::const_.
* d-gimplify.cc (bit_field_ref): New function.
(d_gimplify_modify_expr): Handle implicit casting for assignments to
bit-fields.
(d_gimplify_unary_expr): New function.
(d_gimplify_binary_expr): New function.
(d_gimplify_expr): Handle UNARY_CLASS_P and BINARY_CLASS_P.
* d-target.cc (Target::_init): Initialize bitFieldStyle.
(TargetCPP::parameterType): Update signature.
(Target::supportsLinkerDirective): New function.
* dmd/MERGE: Merge upstream dmd 52844d4b1.
* expr.cc (ExprVisitor::visit (ThrowExp *)): New function.
* types.cc (d_build_bitfield_integer_type): New function.
(insert_aggregate_bitfield): New function.
(layout_aggregate_members): Handle inserting bit-fields into an
aggregate type.
libphobos/ChangeLog:
* Makefile.in: Regenerate.
* libdruntime/MERGE: Merge upstream druntime dbd0c874.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add core/int128.d.
(DRUNTIME_DISOURCES): Add __builtins.di.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 896b1d0e1.
* src/Makefile.am (PHOBOS_DSOURCES): Add std/checkedint.d.
* src/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Add -fall-instantiations to
--gdcflags.
Jakub Jelinek [Wed, 16 Feb 2022 08:27:11 +0000 (09:27 +0100)]
openmp: For min/max omp atomic compare forms verify arg types with build_binary_op [PR104531]
The MIN_EXPR/MAX_EXPR handling in *build_binary_op is minimal (especially
for C FE), because min/max aren't expressions the languages contain directly.
I'm using those for the
#pragma omp atomic
x = x < y ? y : x;
forms, but e.g. for the attached testcase, where we normally reject _Complex int vs. int
comparisons, in C++ due to MIN/MAX_EXPR we were diagnosing it as invalid types
for <unknown>, while in C we accepted it and ICEd later on.
The following patch will try build_binary_op with LT_EXPR on the operands first
to get needed diagnostics and fail if it returns error_mark_node.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104531
* c-omp.cc (c_finish_omp_atomic): For MIN_EXPR/MAX_EXPR, try first
build_binary_op with LT_EXPR and only if that doesn't return
error_mark_node call build_modify_expr.
* c-c++-common/gomp/atomic-31.c: New test.
Jakub Jelinek [Wed, 16 Feb 2022 08:25:55 +0000 (09:25 +0100)]
c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]
The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104510
* c-common.cc (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.
* gcc.dg/dfp/pr104510.c: New test.
GCC Administrator [Wed, 16 Feb 2022 00:16:26 +0000 (00:16 +0000)]
Daily bump.
Peter Bergner [Tue, 15 Feb 2022 22:51:32 +0000 (16:51 -0600)]
rs6000: Retry tbegin. instructions that can fail intermittently
The HTM tbegin. instruction can fail intermittently for many reasons.
This can lead to htm-1.c FAILing from time to time. The solution is to
allow retrying the instruction a few times before aborting.
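The retry idea can be sketched in portable C as a bounded loop around a "begin" operation. Everything here is an assumption made for illustration: `try_begin_fn`, `begin_with_retry` and the simulated `flaky_begin` are hypothetical names, and the real test uses the HTM builtins rather than a callback.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns true if the transaction
   started.  On real hardware this would wrap the HTM begin
   instruction.  */
typedef bool (*try_begin_fn) (void *ctx);

/* Call TRY_BEGIN up to MAX_ATTEMPTS times, stopping at the first
   success.  Returns the number of attempts used, or 0 if every
   attempt failed.  */
static int
begin_with_retry (try_begin_fn try_begin, void *ctx, int max_attempts)
{
  for (int attempt = 1; attempt <= max_attempts; attempt++)
    if (try_begin (ctx))
      return attempt;
  return 0;
}

/* Simulated begin operation: fails until the counter that CTX
   points to reaches zero.  */
static bool
flaky_begin (void *ctx)
{
  int *failures_left = ctx;
  if (*failures_left > 0)
    {
      (*failures_left)--;
      return false;
    }
  return true;
}
```

A caller would abort only when `begin_with_retry` returns 0, so a couple of transient failures no longer fail the test.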
2022-02-15 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/htm-1.c: Retry intermittent failing tbegins.
Andrew MacLeod [Tue, 15 Feb 2022 00:43:40 +0000 (19:43 -0500)]
Use GORI to evaluate arguments of a COND_EXPR.
Provide an API into gori to perform a basic evaluation of the arguments of a
COND_EXPR if they are in the dependency chain of the condition.
PR tree-optimization/104526
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_cond_expr): Call
new routine.
* gimple-range-gori.cc (range_def_chain::get_def_chain): Force a build
of dependency chain if there isn't one.
(gori_compute::condexpr_adjust): New.
* gimple-range-gori.h (class gori_compute): New prototype.
gcc/testsuite/
* gcc.dg/pr104526.c: New.
David Malcolm [Mon, 14 Feb 2022 18:27:45 +0000 (13:27 -0500)]
analyzer: fix ICE on cast to NULL type [PR104524]
gcc/analyzer/ChangeLog:
PR analyzer/104524
* region-model-manager.cc
(region_model_manager::maybe_fold_sub_svalue): Only call
get_or_create_cast if type is non-NULL.
gcc/testsuite/ChangeLog:
PR analyzer/104524
* gcc.dg/analyzer/pr104524.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 11 Feb 2022 21:43:21 +0000 (16:43 -0500)]
analyzer: fix uninit false +ve due to optimized conditionals [PR102692]
There is a false positive from -Wanalyzer-use-of-uninitialized-value on
gcc.dg/analyzer/pr102692.c here:
‘fix_overlays_before’: events 1-3
|
| 75 | while (tail
| | ~~~~
| 76 | && (tem = make_lisp_ptr (tail, 5),
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (1) following ‘false’ branch (when ‘tail’ is NULL)...
| 77 | (end = marker_position (XOVERLAY (tem)->end)) >= pos))
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|......
| 82 | if (!tail || end < prev || !tail->next)
| | ~~~~~ ~~~~~~~~~~
| | | |
| | | (3) use of uninitialized value ‘end’ here
| | (2) ...to here
|
The issue is that the inner || of the conditionals has been folded within the
frontend from a chain of control flow:
if (tail == 0B) goto <D.1986>; else goto <D.1988>;
<D.1988>:
if (end < prev) goto <D.1986>; else goto <D.1989>;
<D.1989>:
_1 = tail->next;
if (_1 == 0B) goto <D.1986>; else goto <D.1987>;
<D.1986>:
to an OR expr (and then to a bitwise-or by the gimplifier):
_1 = tail == 0B;
_2 = end < prev;
_3 = _1 | _2;
if (_3 != 0) goto <D.1986>; else goto <D.1988>;
<D.1988>:
_4 = tail->next;
if (_4 == 0B) goto <D.1986>; else goto <D.1987>;
This happens for sufficiently simple conditionals in fold_truth_andor.
In particular, the (end < prev) is short-circuited without optimization,
but is evaluated with optimization, leading to the false positive.
Given how early this folding occurs, it seems the simplest fix is to
try to detect places where this optimization appears to have happened,
and suppress uninit warnings within the statement that would have
been short-circuited.
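The difference between the two forms can be written out in plain C. This is a simplified sketch, not the analyzer's logic: `short_circuit_form` and `folded_form` are hypothetical names, and `end` is passed by value so that it is always initialized here, whereas in the PR it is only assigned when `tail` is non-null.

```c
#include <stdbool.h>
#include <stddef.h>

/* Source-level form: END is only read when TAIL is non-null,
   because || short-circuits.  */
static bool
short_circuit_form (const void *tail, int end, int prev)
{
  if (tail == NULL)
    return true;
  if (end < prev)
    return true;
  return false;
}

/* Folded form: both comparisons are evaluated unconditionally and
   combined with a bitwise OR, so END is read even when TAIL is
   null.  */
static bool
folded_form (const void *tail, int end, int prev)
{
  bool t1 = tail == NULL;
  bool t2 = end < prev;
  return t1 | t2;
}
```

Both functions return the same result for every input, which is why the fold is valid, but only the second one touches `end` on the null path, and that read is what the analyzer was reporting.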
gcc/analyzer/ChangeLog:
PR analyzer/102692
* exploded-graph.h (impl_region_model_context::get_stmt): New.
* region-model.cc: Include "gimple-ssa.h", "tree-phinodes.h",
"tree-ssa-operands.h", and "ssa-iterators.h".
(within_short_circuited_stmt_p): New.
(region_model::check_for_poison): Don't warn about uninit values
if within_short_circuited_stmt_p.
* region-model.h (region_model_context::get_stmt): New vfunc.
(noop_region_model_context::get_stmt): New.
gcc/testsuite/ChangeLog:
PR analyzer/102692
* gcc.dg/analyzer/pr102692-2.c: New test.
* gcc.dg/analyzer/pr102692.c: Remove xfail. Remove -O2 from
options and move to...
* gcc.dg/analyzer/torture/pr102692.c: ...here.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Tobias Burnus [Tue, 15 Feb 2022 20:42:33 +0000 (21:42 +0100)]
Fortran/OpenMP: Fix depend-clause handling for c_ptr
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_depobj): Fix to alloc/ptr dummy
and for c_ptr.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: Add VALUE test, update scan test.
* gfortran.dg/gomp/depend-5.f90: Fix scan tree for -m32.
* gfortran.dg/gomp/depend-6.f90: New test.
Richard Sandiford [Tue, 15 Feb 2022 18:09:35 +0000 (18:09 +0000)]
aarch64: Fix subs_compare_2.c regression [PR100874]
subs_compare_2.c tests that we can use a SUBS+CSEL sequence for:
unsigned int
foo (unsigned int a, unsigned int b)
{
unsigned int x = a - 4;
if (a < 4)
return x;
else
return 0;
}
As Andrew notes in the PR, this is effectively MIN (a, 4) - 4,
and it is now recognised as such by phiopt. Previously it was
if-converted in RTL instead.
I tried to look for ways to generalise this to other situations
and to other ?:-style operations, not just max and min. However,
for general ?: we tend to push an outer “- CST” into the arms of
the ?: -- at least if one of them simplifies -- so I didn't find
any useful abstraction.
This patch therefore adds a pattern specifically for
max/min(a,cst)-cst. I'm not thrilled at having to do this,
but it seems like the least worst fix in the circumstances.
Also, max(a,cst)-cst for unsigned a is a useful saturating
subtraction idiom and so is arguably worth its own code
for that reason.
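The two identities mentioned above can be spelled out in C. This is a minimal sketch: `foo_like` mirrors the function from the test with the unused parameter dropped, while `min_minus_cst` and `sat_sub` are hypothetical helper names introduced here.

```c
/* The function from subs_compare_2.c, without the unused second
   parameter.  */
static unsigned int
foo_like (unsigned int a)
{
  unsigned int x = a - 4;
  if (a < 4)
    return x;
  else
    return 0;
}

/* The same value written as MIN (a, 4) - 4.  */
static unsigned int
min_minus_cst (unsigned int a)
{
  unsigned int m = a < 4 ? a : 4;
  return m - 4;
}

/* Saturating subtraction: MAX (a, cst) - cst is a - cst when
   a >= cst and 0 otherwise, so it never wraps around.  */
static unsigned int
sat_sub (unsigned int a, unsigned int cst)
{
  unsigned int m = a > cst ? a : cst;
  return m - cst;
}
```

`foo_like` and `min_minus_cst` agree for every `a`; for `a < 4` both wrap around in unsigned arithmetic, which is what the original test expects.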
gcc/
PR target/100874
* config/aarch64/aarch64-protos.h (aarch64_maxmin_plus_const):
Declare.
* config/aarch64/aarch64.cc (aarch64_maxmin_plus_const): New function.
* config/aarch64/aarch64.md (*aarch64_minmax_plus): New pattern.
gcc/testsuite/
* gcc.target/aarch64/max_plus_1.c: New test.
* gcc.target/aarch64/max_plus_2.c: Likewise.
* gcc.target/aarch64/max_plus_3.c: Likewise.
* gcc.target/aarch64/max_plus_4.c: Likewise.
* gcc.target/aarch64/max_plus_5.c: Likewise.
* gcc.target/aarch64/max_plus_6.c: Likewise.
* gcc.target/aarch64/max_plus_7.c: Likewise.
* gcc.target/aarch64/min_plus_1.c: Likewise.
* gcc.target/aarch64/min_plus_2.c: Likewise.
* gcc.target/aarch64/min_plus_3.c: Likewise.
* gcc.target/aarch64/min_plus_4.c: Likewise.
* gcc.target/aarch64/min_plus_5.c: Likewise.
* gcc.target/aarch64/min_plus_6.c: Likewise.
* gcc.target/aarch64/min_plus_7.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:34 +0000 (18:09 +0000)]
aarch64: Fix store_v2vec_lanes.c failure
store_v2vec_lanes.c started failing after SLP was enabled at -O2.
The test is specifically checking what happens for unvectorised code,
with the vectors being constructed from individual addition results.
gcc/testsuite/
* gcc.target/aarch64/store_v2vec_lanes.c: Add -fno-tree-vectorize.
Richard Sandiford [Tue, 15 Feb 2022 18:09:34 +0000 (18:09 +0000)]
aarch64: Add +nosve to tests
This patch adds +nosve to various Advanced SIMD-only tests.
gcc/testsuite/
* gcc.target/aarch64/shl-combine-2.c: New test.
* gcc.target/aarch64/shl-combine-3.c: Likewise.
* gcc.target/aarch64/shl-combine-4.c: Likewise.
* gcc.target/aarch64/shl-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-1.c: Likewise.
* gcc.target/aarch64/xtn-combine-2.c: Likewise.
* gcc.target/aarch64/xtn-combine-3.c: Likewise.
* gcc.target/aarch64/xtn-combine-4.c: Likewise.
* gcc.target/aarch64/xtn-combine-5.c: Likewise.
* gcc.target/aarch64/xtn-combine-6.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:33 +0000 (18:09 +0000)]
vect+aarch64: Fix ldp_stp_* regressions
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2. In all three cases SLP is
generating vector code when scalar code would be better.
The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off. It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.
This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores. It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs. (That is, the vector code has to win on
both counts rather than on one count individually.)
However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node. Fixing the STP costs
without fixing that would regress other cases, tested in the patch.
The patch therefore makes the SLP costing code check for duplicates
within a node. Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amortised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.
Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores. However, that didn't work
too well in practice; see the comments for details. The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.
ldp_stp_5.c was affected by the same thing. The test would be
worth vectorising if we generated better vector code, but:
(1) We do a bad job of moving the { -1, 1 } constant, given that
we have { -1, -1 } and { 1, 1 } to hand.
(2) The vector code has 6 pairable stores to misaligned offsets.
We have peephole patterns to handle such misalignment for
4 pairable stores, but not 6.
So the SLP decision isn't wrong as such. It's just being let
down by later codegen.
The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).
gcc/
* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
(vect_scalar_ops_slice_hash): Likewise.
(vect_scalar_ops_slice::op): New function.
* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
(vect_scalar_ops_slice_hash::hash): Likewise.
(vect_scalar_ops_slice_hash::equal): Likewise.
(vect_prologue_cost_for_slp): Check for duplicate vectors.
* config/aarch64/aarch64.cc
(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
(aarch64_aligned_constant_offset_p): New function.
(aarch64_stp_sequence_cost): Likewise.
(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
(aarch64_vector_costs::finish_cost): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
* gcc.target/aarch64/ldp_stp_14.h,
* gcc.target/aarch64/ldp_stp_14.c: New test.
* gcc.target/aarch64/ldp_stp_15.c: Likewise.
* gcc.target/aarch64/ldp_stp_16.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_18.c: Likewise.
* gcc.target/aarch64/ldp_stp_19.c: Likewise.
Richard Sandiford [Tue, 15 Feb 2022 18:09:33 +0000 (18:09 +0000)]
vect: Fix early free
When updating the target costs interface, I failed to move the
free of the scalar costs beyond the new last use.
gcc/
* tree-vect-slp.cc (vect_bb_vectorization_profitable_p): Fix
use after free.
Jonathan Wakely [Tue, 15 Feb 2022 12:47:39 +0000 (12:47 +0000)]
libstdc++: Add missing constexpr to uses-allocator construction utilities [PR104542]
libstdc++-v3/ChangeLog:
PR libstdc++/104542
* include/bits/uses_allocator_args.h (make_obj_using_allocator)
(uninitialized_construct_using_allocator): Add constexpr.
* testsuite/20_util/uses_allocator/make_obj.cc: Check constexpr.
* testsuite/20_util/uses_allocator/uninitialized_construct.cc: New test.
Richard Biener [Tue, 15 Feb 2022 11:27:14 +0000 (12:27 +0100)]
tree-optimization/104543 - fix unroll-and-jam precondition
We have to make sure that outer loop exits come after the inner
loop since we otherwise will put it into the fused loop body.
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104543
* gimple-loop-jam.cc (unroll_jam_possible_p): Check outer loop exits
come after the inner loop.
* gcc.dg/torture/pr104543.c: New testcase.
Tobias Burnus [Tue, 15 Feb 2022 11:26:48 +0000 (12:26 +0100)]
Fortran/OpenMP: Fix depend-clause handling
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
Depend on the proper addr, for ptr/alloc depend on pointee.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/depend-4.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: New test.
* gfortran.dg/gomp/depend-5.f90: New test.
Jakub Jelinek [Tue, 15 Feb 2022 11:17:41 +0000 (12:17 +0100)]
cygwin: Fix up -Werror=format-diag errors [PR104536]
As the testcase reports, cygwin has 3 can%'t contractions in diagnostics,
we use cannot everywhere else instead and -Wformat-diag enforces that.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR target/104536
* config/i386/host-cygwin.cc (cygwin_gt_pch_get_address): Use
cannot instead of can%'t in diagnostics. Formatting fixes.
Jakub Jelinek [Tue, 15 Feb 2022 11:11:31 +0000 (12:11 +0100)]
fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]
For IBM double double I've added in PR95450 and PR99648 verification that
when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
or CONST_DOUBLE constant, we try to encode it back to target bytes and
verify it is the same.
This is because our real.c support isn't able to represent all valid values
of IBM double double which has variable precision.
In PR104522, it has been noted that we have a similar problem with the
Intel/Motorola extended XFmode formats, our internal representation isn't
able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
values.
So, the following patch is an attempt to extend that verification to all
floats.
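The interpret-then-re-encode check can be illustrated on a toy format. This is a sketch under loud assumptions: the 8-bit layout below is invented for illustration (it only borrows the explicit integer bit idea from the x87 extended format), and `toy_interpret`, `toy_encode` and `toy_representable_p` are hypothetical names, not functions from fold-const.cc or real.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 8-bit format: bits 7-5 exponent, bit 4 explicit integer bit,
   bits 3-0 fraction.  The internal form does not store the integer
   bit; it assumes the bit is set exactly when the exponent is
   non-zero.  */
struct toy_real
{
  unsigned exp;   /* 0..7 */
  unsigned frac;  /* 0..15 */
};

/* Decode target bits into the internal form, dropping the explicit
   integer bit.  */
static struct toy_real
toy_interpret (uint8_t bits)
{
  struct toy_real r = { (bits >> 5) & 7u, bits & 15u };
  return r;
}

/* Encode the internal form back to target bits.  */
static uint8_t
toy_encode (struct toy_real r)
{
  unsigned int_bit = r.exp != 0;
  return (uint8_t) ((r.exp << 5) | (int_bit << 4) | r.frac);
}

/* True if BITS survives an interpret/encode round trip, i.e. the
   internal form represents it exactly.  */
static bool
toy_representable_p (uint8_t bits)
{
  return toy_encode (toy_interpret (bits)) == bits;
}
```

In this toy model a pattern with a non-zero exponent but a clear integer bit (an "unnormal"), or a zero exponent with the integer bit set (a "pseudo denormal"), fails the round trip, so a folder using this check would punt on it rather than silently change the bits.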
Unfortunately, it wasn't that straightforward, because the
__builtin_clear_padding code exactly for the XFmode long doubles needs to
discover what bits are padding and does that by interpreting memory of
all 1s. That is actually a valid supported value, a qNaN with negative
sign with all mantissa bits set, but the verification includes also the
padding bits (exactly what __builtin_clear_padding wants to figure out)
and so fails the comparison check and so we ICE.
The patch fixes that case by moving that verification from
native_interpret_real to its caller, so that clear_padding_type can
call native_interpret_real and avoid that extra check.
With this, the only thing that regresses in the testsuite is
+FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
because it decides to use a pattern that has non-zero bits in the padding
bits of the long double, so the simplify-rtx.cc change prevents folding
a SUBREG into a constant. We emit (the testcase is -O0 but we emit worse
code at all opt levels) something like:
movabsq $-72340172838076674, %rax
movabsq $-72340172838076674, %rdx
movq %rax, -48(%rbp)
movq %rdx, -40(%rbp)
fldt -48(%rbp)
fstpt -32(%rbp)
instead of
fldt .LC2(%rip)
fstpt -32(%rbp)
...
.LC2:
.long -16843010
.long -16843010
.long 65278
.long 0
Note, neither of those sequences actually stores the padding bits, fstpt
simply doesn't touch them.
For vars with clear_padding_real_needs_padding_p types that are allocated
to memory at expansion time, I'd say much better would be to do the stores
using integral modes rather than XFmode, so do that:
movabsq $-72340172838076674, %rax
movq %rax, -32(%rbp)
movq %rax, -24(%rbp)
directly. That is the only way to ensure the padding bits are initialized
(or expand __builtin_clear_padding, but then you initialize separately the
value bits and padding bits).
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104522
* fold-const.h (native_interpret_real): Declare.
* fold-const.cc (native_interpret_real): No longer static. Don't
perform MODE_COMPOSITE_P verification here.
(native_interpret_expr) <case REAL_TYPE>: But perform it here instead
for all modes.
* gimple-fold.cc (clear_padding_type): Call native_interpret_real
instead of native_interpret_expr.
* simplify-rtx.cc (simplify_immed_subreg): Perform the native_encode_rtx
and comparison verification for all FLOAT_MODE_P modes, not just
MODE_COMPOSITE_P.
* gcc.dg/pr104522.c: New test.
Richard Biener [Tue, 15 Feb 2022 08:40:59 +0000 (09:40 +0100)]
tree-optimization/104519 - adjust PR100499 niter fix
The following adjusts the PR100499 niter fix to use the appropriate
types when checking whether the difference between the final and base
values of the IV are a multiple of the step. It also gets rid of
an always false condition in multiple_of_p which led me to a
wrong solution first.
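Why the type matters for a "difference is a multiple of the step" check can be seen with a small example. This is not the PR104519 testcase and not the code in tree-ssa-loop-niter.cc; `multiple_signed_p` and `multiple_unsigned_p` are hypothetical helpers that only show how the answer can change with the type used for the subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is FINAL - BASE a multiple of STEP when the difference is
   computed exactly, in a signed type wide enough to hold it?  */
static bool
multiple_signed_p (int32_t base, int32_t final, int32_t step)
{
  int64_t diff = (int64_t) final - (int64_t) base;
  return diff % step == 0;
}

/* The same question asked in a 32-bit unsigned type, where the
   subtraction wraps modulo 2^32.  */
static bool
multiple_unsigned_p (uint32_t base, uint32_t final, uint32_t step)
{
  uint32_t diff = final - base;
  return diff % step == 0;
}
```

For base 1, final 0 and step 3 the exact difference is -1, which is not a multiple of 3, but the wrapped unsigned difference is 0xffffffff, which is (3 * 0x55555555), so the two checks disagree.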
2022-02-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/104519
* fold-const.cc (multiple_of_p): Remove never true condition.
* tree-ssa-loop-niter.cc (number_of_iterations_ne): Use
the appropriate types for determining whether the difference
of final and base is a multiple of the step.
* gcc.dg/torture/pr104519.c: New testcase.
Jakub Jelinek [Tue, 15 Feb 2022 10:18:56 +0000 (11:18 +0100)]
sanitizer: Use glibc _thread_db_sizeof_pthread symbol if present
I've cherry-picked the following fix from llvm-project. Recent glibcs
have _thread_db_sizeof_pthread symbol variable which contains the
size of struct pthread, so that sanitizers don't need to guess that
and risk that it will change again.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
* sanitizer_common/sanitizer_linux_libcdep.cpp: Cherry-pick
llvm-project revision
ef14b78d9a144ba81ba02083fe21eb286a88732b.
Jakub Jelinek [Tue, 15 Feb 2022 09:22:30 +0000 (10:22 +0100)]
openmp: Make finalize_task_copyfn order reproducible [PR104517]
The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR debug/104517
* omp-low.cc (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.
* gcc.dg/gomp/pr104517.c: New test.
Jason Merrill [Thu, 10 Feb 2022 22:57:38 +0000 (17:57 -0500)]
c++: TTP in member alias template [PR104107]
In the first testcase, coerce_template_template_parms was adding too much of
outer_args when coercing to match P's template parameters, so that when
substituting into the 'const T&' parameter we got an unrelated template
argument for T. We should only add outer_args when the argument template is
a nested template.
PR c++/104107
PR c++/95036
gcc/cp/ChangeLog:
* pt.cc (coerce_template_template_parms): Take full parms.
Avoid adding too much of outer_args.
(coerce_template_template_parm): Adjust.
(template_template_parm_bindings_ok_p): Adjust.
(convert_template_argument): Adjust.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/alias-decl-ttp2.C: New test.
* g++.dg/cpp1z/ttp2.C: New test.
GCC Administrator [Tue, 15 Feb 2022 00:16:41 +0000 (00:16 +0000)]
Daily bump.
Martin Sebor [Mon, 14 Feb 2022 22:40:25 +0000 (15:40 -0700)]
Update -Warray-bounds documentation [PR104355].
Resolves:
PR middle-end/104355 - Misleading and outdated -Warray-bounds documentation
gcc/ChangeLog:
PR middle-end/104355
* doc/invoke.texi (-Warray-bounds): Update documentation.
Michael Meissner [Mon, 14 Feb 2022 22:42:14 +0000 (17:42 -0500)]
Use correct names for __ibm128 if long double is IEEE 128-bit.
If you are on a PowerPC system where the default long double is IEEE
128-bit (either through the compiler option -mabi=ieeelongdouble or via
the configure option --with-long-double-format=ieee), GCC used the wrong
names for some of the conversion functions for the __ibm128 type.
Internally, GCC uses IFmode for __ibm128 if long double is IEEE 128-bit,
instead of TFmode when long double is IBM 128-bit. This patch adds the
missing conversions to prevent the 'if' name from being used.
In particular, before the patch, the conversions used were:
IFmode to DImode signed: __fixifdi instead of __fixtfdi
IFmode to DImode unsigned: __fixunsifti instead of __fixunstfti
DImode to IFmode signed: __floatdiif instead of __floatditf
DImode to IFmode unsigned: __floatundiif instead of __floatunditf
2022-02-14 Michael Meissner <meissner@the-meissners.org>
gcc/
PR target/104253
* config/rs6000/rs6000.cc (init_float128_ibm): Update the
conversion functions used to convert IFmode types.
gcc/testsuite/
PR target/104253
* gcc.target/powerpc/pr104253.c: New test.
Harald Anlauf [Thu, 10 Feb 2022 20:22:48 +0000 (21:22 +0100)]
Fortran: improve error recovery on bad array section
gcc/fortran/ChangeLog:
PR fortran/104211
* expr.cc (find_array_section): Replace assertion by error
recovery when encountering bad array constructor.
gcc/testsuite/ChangeLog:
PR fortran/104211
* gfortran.dg/pr104211.f90: New test.
Jonathan Wakely [Mon, 14 Feb 2022 16:46:55 +0000 (16:46 +0000)]
libstdc++: Fix stream extraction of IEEE128 long double [PR100912]
The std::__convert_from_v helper that formats double and long double
values into a char buffer was not being duplicated for the two long
double ABIs. This resulted in an ODR violation inside the library, where
some callers needed it to use snprintf to format __ibm128 values and
other callers needed it to use __snprintfieee128 to format __ieee128
values. The linker discarded one of the definitions, leaving one set of
callers using the wrong code.
This puts __convert_from_v in the __gnu_cxx_ieee128 inline namespace
when long double is __ieee128, so that there are two different
definitions of the function.
The std::money_put::__do_put overload for __ibm128 values needs a
different fix, because that is defined when long double is __ieee128 and
so would call the one in the inline namespace. That can be fixed by just
inlining the code directly into the function and using an asm alias to
call the right version of snprintf for the __ibm128 format. The code to
do that can be simpler than __convert_from_v because if we're defining
the ALT128_COMPAT symbols we know that we have a recent glibc and so we
can assume that uselocale and snprintf are supported.
libstdc++-v3/ChangeLog:
PR libstdc++/100912
* config/locale/gnu/c_locale.h (__convert_from_v): Use inline
namespace for IEEE128 long double mode.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: Add new symbol
version and export __gnu_cxx_ieee128::__convert_from_v.
* include/bits/locale_facets_nonio.tcc (money_put::__do_put):
Make __ibm128 overload use snprintf directly.
* testsuite/util/testsuite_abi.cc: Add new symbol version.
Remove stable IEEE128/LDBL versions.
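A minimal sketch of the inline namespace technique, under stated assumptions: the macro DEMO_IEEE128, the namespace demo::ieee128 and the function format_tag are hypothetical stand-ins (the commit itself uses __gnu_cxx_ieee128 and __convert_from_v). Callers spell the name the same way in both configurations, but the mangled symbol differs, so the linker can no longer merge two incompatible definitions.

```cpp
#include <cassert>
#include <string>

namespace demo {
#ifdef DEMO_IEEE128  // hypothetical macro standing in for the IEEE128 ABI
inline namespace ieee128 {
#endif

// Callers write demo::format_tag() in both configurations. The mangled
// name contains "ieee128" only when the inline namespace is open.
inline std::string format_tag() {
#ifdef DEMO_IEEE128
  return "ieee128";
#else
  return "ibm128";
#endif
}

#ifdef DEMO_IEEE128
}  // inline namespace ieee128
#endif
}  // namespace demo
```

Building one translation unit with -DDEMO_IEEE128 and another without gives two distinct symbols for what is textually the same function.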
Jakub Jelinek [Mon, 14 Feb 2022 15:56:15 +0000 (16:56 +0100)]
c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]
A return in ctors on targetm.cxx.cdtor_returns_this () targets like arm
is emitted as GOTO_EXPR cdtor_label, where at cdtor_label it emits
RETURN_EXPR with this.
In all dtors, regardless of targetm.cxx.cdtor_returns_this (),
a return is emitted the same way.
potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine. I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label. For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.
I must say I don't see much point in cdtor_labels at all; I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.
2022-02-14 Jakub Jelinek <jakub@redhat.com>
PR c++/104513
* constexpr.cc (potential_constant_expression_1) <case GOTO_EXPR>:
Don't punt if returns (target).
* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.
Andrew Stubbs [Sat, 12 Feb 2022 23:44:48 +0000 (23:44 +0000)]
amdgcn: Allow vector reductions on constants
Obviously it would be better if these reductions could be evaluated at compile
time, but this will avoid an ICE.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Use force_reg.
Richard Biener [Mon, 14 Feb 2022 12:37:54 +0000 (13:37 +0100)]
tree-optimization/104528 - free niter estimates after DSE
When DSE removes a trivially dead def we have to reset niter information
on loops since that might refer to it. The patch also adds verification
to make sure this does not happen.
2022-02-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/104528
* tree-ssa.h (find_released_ssa_name): Declare.
* tree-ssa.cc (find_released_ssa_name): Export.
* cfgloop.cc (verify_loop_structure): Look for released
SSA names in loops nb_iterations.
* tree-ssa-dse.cc (pass_dse::execute): Release number of iteration
estimates.
* gfortran.dg/pr104528.f: New testcase.
Jonathan Wakely [Mon, 14 Feb 2022 12:46:10 +0000 (12:46 +0000)]
libstdc++: Use __cpp_concepts instead of custom macro [PR103891]
With the new value of __cpp_concepts required by P2493, we can test
whether the compiler supports conditionally trivial special members.
This allows us to remove the workaround that disables fully-constexpr
std::variant for Clang. Now it should work for non-GCC compilers (such
as future releases of Clang) that support conditionally trivial
destructors and define the new value of __cpp_concepts.
libstdc++-v3/ChangeLog:
PR libstdc++/103891
* include/bits/c++config (_GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS):
Remove.
* include/std/variant: Check feature test macros instead.
* include/std/version: Likewise.
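A sketch of the kind of feature test described above, not the libstdc++ code. It assumes the new __cpp_concepts value is 202002L; Box and have_cond_trivial are hypothetical names. When the macro is high enough, the destructor is trivial exactly when T's destructor is trivial; otherwise a user-provided destructor is used.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__cpp_concepts) && __cpp_concepts >= 202002L
// The compiler advertises conditionally trivial special members, so the
// destructor of Box<T> is trivial exactly when T's destructor is trivial.
template <typename T>
struct Box {
  T value;
  ~Box() requires std::is_trivially_destructible_v<T> = default;
  ~Box() {}
};
constexpr bool have_cond_trivial = true;
#else
// Fallback: a user-provided destructor, which is never trivial.
template <typename T>
struct Box {
  T value;
  ~Box() {}
};
constexpr bool have_cond_trivial = false;
#endif
```

In either branch, std::is_trivially_destructible<Box<int>>::value matches have_cond_trivial, which is the property a fully constexpr variant depends on.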
Jonathan Wakely [Mon, 14 Feb 2022 10:05:47 +0000 (10:05 +0000)]
libstdc++: Fix typo in pragma
libstdc++-v3/ChangeLog:
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Fix typo.
Jonathan Wakely [Thu, 10 Feb 2022 14:06:27 +0000 (14:06 +0000)]
libstdc++: Fix std::to_chars for IEEE128 long double
The preprocessor check for _GLIBCXX_USE_FLOAT128 is the wrong condition,
because when the compiler is built with --with-long-double-format=ieee
configure determines that __float128 is the same as long double, and so
should not be used. But we do want the std::to_chars overloads for
__float128 in that case, because the floating_to_chars.cc file is built
with -mabi=ibmlongdouble and so the __float128 overloads are actually
the 'long double' ones for -mabi=ieeelongdouble code.
This fixes missing definitions of the __float128 overloads of
std::to_chars for --with-long-double-format=ieee builds. Without this,
there are symbols present in the --with-long-double-format=ibm build which
are missing from the --with-long-double-format=ieee build.
libstdc++-v3/ChangeLog:
* src/c++17/floating_to_chars.cc (FLOAT128_TO_CHARS): Depend on
LONG_DOUBLE_ALT128_COMPAT instead of USE_FLOAT128.
Richard Biener [Mon, 14 Feb 2022 09:09:10 +0000 (10:09 +0100)]
tree-optimization/104511 - avoid FP to DFP conversion for VEC_PACK_TRUNC
This avoids forwprop from matching DFP <-> FP vector conversions
using VEC_[UN]PACK{_TRUNC,_LO,_HI}. Maybe DFP vectors shouldn't be
a thing, but they apparently are. Re-using CONVERT/NOP_EXPR for
DFP <-> FP conversions was probably a mistake.
2022-02-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/104511
* tree-ssa-forwprop.cc (simplify_vector_constructor): Avoid
touching DFP <-> FP conversions.
* gcc.dg/pr104511.c: New testcase.
Richard Biener [Mon, 14 Feb 2022 08:29:20 +0000 (09:29 +0100)]
c/104505 - ICE with internal function call in diagnostic expression
The following handles internal function calls similar to how the
C++ frontend does, avoiding ICEing on those.
2022-02-14 Richard Biener <rguenther@suse.de>
PR c/104505
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::postfix_expression): Handle
internal function calls.
gcc/testsuite/
* c-c++-common/pr104505.c: New testcase.
Richard Biener [Fri, 11 Feb 2022 10:08:57 +0000 (11:08 +0100)]
middle-end/104497 - gimplification of vector indexing
The following attempts to address gimplification of
... = VIEW_CONVERT_EXPR<int[4]>((i & 1) != 0 ? inv : src)[i];
which is problematic since gimplifying the base object
? inv : src produces a register temporary but GIMPLE does not
really support a register as a base for an ARRAY_REF (even
though that's not strictly validated it seems as can be seen
at -O0). Interestingly the C++ frontend avoids this issue
by emitting the following GENERIC instead:
... = (i & 1) != 0 ? VIEW_CONVERT_EXPR<int[4]>(inv)[i] : VIEW_CONVERT_EXPR<int[4]>(src)[i];
The proposed patch below fixes things up when using an rvalue
as the base is OK by emitting a copy from a register base to a
non-register one. The ?: as lvalue extension seems to be gone
for C, C++ again unwraps the COND_EXPR in that case.
2022-02-11 Richard Biener <rguenther@suse.de>
PR middle-end/104497
* gimplify.cc (gimplify_compound_lval): Make sure the
base is a non-register if needed and possible.
* c-c++-common/torture/pr104497.c: New testcase.
GCC Administrator [Mon, 14 Feb 2022 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Maciej W. Rozycki [Sun, 13 Feb 2022 22:57:21 +0000 (22:57 +0000)]
[Ada] PR ada/98724: Alpha/Linux/libada: Use wraplf for Aux_Long_Long_Float
Use the Long Long Float wrapper in terms of Long Float for Alpha/Linux
targets as well, fixing gnatlib compilation errors:
a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on result [enabled by default]
a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on parameter 1 [enabled by default]
a-nallfl.ads:48:13: warning: profile of "Sin" doesn't match the builtin it binds [enabled by default]
etc. with the `alpha-linux-gnu' target.
gcc/ada/
PR ada/98724
PR ada/97504
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) <alpha*-*-linux*>: Use
wraplf version of Aux_Long_Long_Float.
Ian Lance Taylor [Sun, 13 Feb 2022 01:12:41 +0000 (17:12 -0800)]
runtime: call timer functions via syscall
It turns out to be painful to require linking against -lrt on
GNU/Linux, as that makes it harder to link Go code into C programs.
Instead just call the timer syscalls directly. That is what the
upstream library does anyhow.
gcc/go/
* gospec.cc: Revert 2022-02-09 change:
(RTLIB, RT_LIBRARY): Don't define.
(lang_specific_driver): Don't add -lrt if linking statically
on GNU/Linux.
gotools/
* configure.ac: Revert 2022-02-09 change:
(RT_LIBS): Don't define.
* Makefile.am (check-runtime): Don't set GOLIBS to $(RT_LIBS).
* configure, Makefile.in: Regenerate.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/385475
Ian Lance Taylor [Sat, 12 Feb 2022 22:54:21 +0000 (14:54 -0800)]
compiler: don't set ptrmask bit for pointer to notinheap type
Test case is https://go.dev/cl/385454.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/385474
Mikael Morin [Fri, 28 Jan 2022 21:00:57 +0000 (22:00 +0100)]
fortran: Unshare associate var charlen [PR104228]
PR104228 showed that character lengths were shared between associate
variable and associate targets. This is problematic when the associate
target is itself a variable and gets a variable to hold the length, as
the length variable is added (and all the variables following it in the chain)
to both the associate variable scope and the target variable scope.
This caused an ICE when compiling with -O0 -fsanitize=address.
This change forces the creation of a separate character length for the
associate variable. It also forces the initialization of the character
length variable to avoid regressing associate_32 and associate_47 tests.
PR fortran/104228
gcc/fortran/ChangeLog:
* resolve.cc (resolve_assoc_var): Also create a new character
length for non-dummy associate targets.
* trans-stmt.cc (trans_associate_var): Initialize character length
even if no temporary is used for the associate variable.
gcc/testsuite/ChangeLog:
* gfortran.dg/asan/associate_58.f90: New test.
* gfortran.dg/asan/associate_59.f90: New test.
liuhongt [Mon, 24 Jan 2022 03:05:47 +0000 (11:05 +0800)]
Add vect_recog_cond_expr_convert_pattern.
The pattern converts (cond (cmp a b) (convert c) (convert d))
to (convert (cond (cmp a b) c d)) when
1) types_match (c, d)
2) single_use for (convert c) and (convert d)
3) TYPE_PRECISION (TREE_TYPE (c)) == TYPE_PRECISION (TREE_TYPE (a))
4) INTEGRAL_TYPE_P (TREE_TYPE (c))
The pattern can save packing of mask and data (partial for data, 2 vs
1).
gcc/ChangeLog:
PR target/103771
* match.pd (cond_expr_convert_p): New match.
* tree-vect-patterns.cc (gimple_cond_expr_convert_p): Declare.
(vect_recog_cond_expr_convert_pattern): New.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr103771-2.c: New test.
* gcc.target/i386/pr103771-3.c: New test.
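On scalars, the rewrite described above amounts to the following. This is only an illustration of the source-level shape, with hypothetical function names; the pattern itself works on vectorizer statements. Here a, b, c and d are all 32-bit integers, so the precision and type conditions hold.

```cpp
#include <cassert>
#include <cstdint>

// Before: each arm is converted to the wide type, then one is selected.
std::int64_t widen_then_select(std::int32_t a, std::int32_t b,
                               std::int32_t c, std::int32_t d) {
  return a < b ? static_cast<std::int64_t>(c) : static_cast<std::int64_t>(d);
}

// After: select in the narrow type, then convert once.
std::int64_t select_then_widen(std::int32_t a, std::int32_t b,
                               std::int32_t c, std::int32_t d) {
  return static_cast<std::int64_t>(a < b ? c : d);
}
```

Both functions return the same value for all inputs; the second form keeps the comparison and the selected data at the same width.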
GCC Administrator [Sun, 13 Feb 2022 00:16:20 +0000 (00:16 +0000)]
Daily bump.
Jakub Jelinek [Sat, 12 Feb 2022 18:17:44 +0000 (19:17 +0100)]
asan: Fix up address sanitizer instrumentation of __builtin_alloca* if it can throw [PR104449]
With -fstack-check* __builtin_alloca* can throw and the asan
instrumentation of this builtin wasn't prepared for that case.
The following patch fixes that by replacing the builtin with the
replacement builtin and emitting any further insns on the fallthru
edge.
I haven't touched the hwasan code which most likely suffers from the
same problem.
2022-02-12 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/104449
* asan.cc: Include tree-eh.h.
(handle_builtin_alloca): Handle the case when __builtin_alloca or
__builtin_alloca_with_align can throw.
* gcc.dg/asan/pr104449.c: New test.
* g++.dg/asan/pr104449.C: New test.
H.J. Lu [Thu, 10 Feb 2022 13:42:49 +0000 (05:42 -0800)]
x86: Update PR 35513 tests
1. Require linker with GNU_PROPERTY_1_NEEDED support for PR 35513
run-time tests.
2. Compile pr35513-8.c to scan assembly code.
PR testsuite/104481
* g++.target/i386/pr35513-1.C: Require property_1_needed target.
* g++.target/i386/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-8.c: Change to compile.
* lib/target-supports.exp (check_compile): Support assembly code.
(check_effective_target_property_1_needed): New proc.
Jakub Jelinek [Sat, 12 Feb 2022 10:17:41 +0000 (11:17 +0100)]
i386: Fix up cvtsd2ss splitter [PR104502]
The following testcase ICEs, because AVX512F is enabled, AVX512VL is not,
and the cvtsd2ss insn has %xmm0-15 as output operand and %xmm16-31 as
input operand. For output operand %xmm16+ the splitter just gives up
in such case, but for such input it just emits vmovddup which requires
AVX512VL if either operand is EXT_REX_SSE_REG_P (when it is 128-bit).
The following patch fixes it by treating that case like the pre-SSE3
output != input case - move the input to output and do everything on
the output reg which is known to be < %xmm16.
2022-02-12 Jakub Jelinek <jakub@redhat.com>
PR target/104502
* config/i386/i386.md (cvtsd2ss splitter): If operands[1] is xmm16+
and AVX512VL isn't available, move operands[1] to operands[0] first.
* gcc.target/i386/pr104502.c: New test.
Uros Bizjak [Sat, 12 Feb 2022 09:53:49 +0000 (10:53 +0100)]
i386: Skip decimal float vector modes in type_natural_mode [PR79754]
2022-02-12 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/79754
* config/i386/i386.cc (type_natural_mode):
Skip decimal float vector modes.
gcc/testsuite/ChangeLog:
PR target/79754
* gcc.target/i386/pr79754.c: New test.
GCC Administrator [Sat, 12 Feb 2022 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Iain Sandoe [Mon, 7 Feb 2022 15:36:35 +0000 (15:36 +0000)]
LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].
Two issues resulted in this PR, which manifests when we force a constant into
memory in LRA (in PIC code on Darwin). The presence of such forced constants
is quite dependent on other RTL optimisations, and it is easy for the issue to
become latent for a specific case.
First, in the Darwin-specific rs6000 backend code, we were not being careful
enough in rejecting invalid symbolic addresses. Specifically, when generating
PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
Second, LRA was attempting to load a register using an invalid lo_sum address.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
PR target/104117
gcc/ChangeLog:
* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
emitting PIC code.
(legitimate_lo_sum_address_p): Likewise.
* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
load from an invalid lo_sum address.
Joseph Myers [Fri, 11 Feb 2022 23:23:48 +0000 (23:23 +0000)]
Regenerate .pot files.
gcc/po/
* gcc.pot: Regenerate.
libcpp/po/
* cpplib.pot: Regenerate.
Joseph Myers [Fri, 11 Feb 2022 23:22:07 +0000 (23:22 +0000)]
preprocessor: Extract messages from cpp_*_at calls for translation
The logic in libcpp/Makefile.in listing diagnostic functions in a call
to xgettext was missing cpp_warning_at, cpp_pedwarning_at and
cpp_error_at, resulting in some messages not being extracted for
translation; add those functions to those for which messages are
extracted.
Tested with "make cpplib.pot".
* Makefile.in (po/$(PACKAGE).pot): Also handle cpp_warning_at,
cpp_pedwarning_at and cpp_error_at.
Joseph Myers [Fri, 11 Feb 2022 23:16:33 +0000 (23:16 +0000)]
i18n: fix exgettext handling of C++ sources
The move of source files to .cc names broke most message extraction by
exgettext because it processed .c files with --language=GCC-source but
didn't process .cc files that way. Fix to process files identified as
C++ that way as well.
Tested with "make gcc.pot".
* exgettext: Also process C++ sources with --language=GCC-source.
Ian Lance Taylor [Fri, 11 Feb 2022 22:53:56 +0000 (14:53 -0800)]
libgo: update to Go1.18beta2
gotools/
* Makefile.am (go_cmd_cgo_files): Add ast_go118.go
(check-go-tool): Copy golang.org/x/tools directories.
* Makefile.in: Regenerate.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384695
Jonathan Wakely [Thu, 10 Feb 2022 17:34:58 +0000 (17:34 +0000)]
libstdc++: Fix FAIL: 20_util/temporary_buffer.cc for C++14
The std::get_temporary_buffer function is deprecated since C++17, but
the test was expecting a warning for C++14 as well.
libstdc++-v3/ChangeLog:
* testsuite/20_util/temporary_buffer.cc: Fix dg-warning target
selector.
Jonathan Wakely [Fri, 11 Feb 2022 21:17:05 +0000 (21:17 +0000)]
libstdc++: Fix test failures at -O0
libstdc++-v3/ChangeLog:
* testsuite/20_util/monotonic_buffer_resource/allocate.cc:
Ignore -Walloc-larger-than warning.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Likewise.
* testsuite/29_atomics/atomic/cons/user_pod.cc: Compile with -O1
to avoid linker error for __atomic_is_lock_free.
Jakub Jelinek [Fri, 11 Feb 2022 19:27:23 +0000 (20:27 +0100)]
match.pd: Fix up (X & Y) CMP 0 -> X CMP2 ~Y simplifications [PR104499]
The following testcase ICEs on x86_64-linux, because match.pd emits
there a NOP_EXPR cast from int*8 vector type with BLKmode to
unsigned*8 vector type with BLKmode and vec-lowering isn't prepared
to handle such casts.
Fixed by using VIEW_CONVERT_EXPR instead.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104499
* match.pd ((X & Y) CMP 0 -> X CMP2 ~Y): Use view_convert instead
of convert.
* gcc.c-torture/compile/pr104499.c: New test.
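A scalar sketch of the arithmetic identity behind the simplification named in the title. It assumes Y is a mask of the form ~(2^n - 1), that CMP is == and that CMP2 is an unsigned <=; the function names are hypothetical. The ICE itself concerns the vector form, where the signed-to-unsigned cast has to be a VIEW_CONVERT_EXPR; that part is not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (X & Y) == 0, with Y assumed to be ~(2^n - 1).
bool masked_is_zero(std::uint32_t x, std::uint32_t y) {
  return (x & y) == 0;
}

// Simplified form: X <= ~Y, compared as unsigned.
bool compare_form(std::uint32_t x, std::uint32_t y) {
  return x <= ~y;
}
```

For y = ~7u, both forms are true exactly for x in 0..7.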
Jakub Jelinek [Fri, 11 Feb 2022 18:47:14 +0000 (19:47 +0100)]
middle-end: Small __builtin_clear_padding improvements
When looking at __builtin_clear_padding today, I've noticed that
it is quite wasteful to extend the user's single original argument to 3;
2 is enough. We need to encode the original type of the first argument
because pointer conversions are useless in GIMPLE, and we need to record
a boolean whether it is for -ftrivial-auto-var-init=* or not.
But for recording the type we don't need the value (we've always used
zero) and for recording the boolean we don't need the type (we've always
used integer_type_node).
So, this patch merges the two into one.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
* tree.cc (build_common_builtin_nodes): Fix up formatting in
__builtin_clear_padding decl creation.
* gimplify.cc (gimple_add_padding_init_for_auto_var): Encode
for_auto_init in the value of 2nd BUILT_IN_CLEAR_PADDING
argument rather than in 3rd argument.
(gimplify_call_expr): Likewise. Fix up comment formatting.
* gimple-fold.cc (gimple_fold_builtin_clear_padding): Expect
2 arguments instead of 3, take for_auto_init from the value
of 2nd argument.
Vladimir N. Makarov [Fri, 11 Feb 2022 14:52:14 +0000 (09:52 -0500)]
[PR104400] LRA: Modify exclude start hard register calculation for insn alternative
The v850 target has an interesting insn alternative constraint 'e!r' where e
denotes even general regs and e is a subset of r. We cannot just take the
union of exclude start hard registers for e and r and should use only the
exclude start hard registers of r. The following patch implements this.
gcc/ChangeLog:
PR rtl-optimization/104400
* lra-constraints.cc (process_alt_operands): Don't make union of
this_alternative_exclude_start_hard_regs when reg class in insn
alternative covers other reg classes in the same alternative.
gcc/testsuite/ChangeLog:
PR rtl-optimization/104400
* gcc.target/v850/pr104400.c: New.
* gcc.target/v850/v850.exp: New.
David Malcolm [Fri, 11 Feb 2022 00:01:30 +0000 (19:01 -0500)]
analyzer: ignore uninitialized uses of empty types [PR104274]
PR analyzer/104274 reports a false positive from
-Wanalyzer-use-of-uninitialized-value on hppa when passing
an empty struct as a function parameter.
pa_pass_by_reference returns true for empty structs, so the
call is turned into:
struct empty arg.0;
arg.0 = arg;
called_function (arg.0);
by gimplify_parameters.
However, gimplify_modify_expr discards assignment statements
of empty types, so that we end up with:
struct empty arg.0;
called_function (arg.0);
which the analyzer considers to be a use of uninitialized "arg.0".
Given that gimplify_modify_expr will discard any assignments to
such types, it seems simplest for -Wanalyzer-use-of-uninitialized-value
to ignore values of empty types.
gcc/analyzer/ChangeLog:
PR analyzer/104274
* region-model.cc (region_model::check_for_poison): Ignore
uninitialized uses of empty types.
gcc/testsuite/ChangeLog:
PR analyzer/104274
* gcc.dg/analyzer/torture/empty-struct-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Biener [Fri, 11 Feb 2022 11:43:22 +0000 (12:43 +0100)]
[gimplefe] Add vector_mask attribute to get access to vector bools
The following adds __attribute__((vector_mask)) to get access to
the corresponding mask type for a vector type. The implementation
simply uses truth_type_for so creating a mask type that's not
what the target would choose as canonical, say an AVX2 style one
when AVX512VL is enabled, is not possible. It might be possible
to provide access to that with an optional argument specifying
the precision of the bool element. The syntax is as simple as
typedef vector_type mask_type __attribute__((vector_mask));
In theory this allows creating unit testcases for vector
lowering and ISEL.
2022-02-11 Richard Biener <rguenther@suse.de>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add entry for
vector_mask.
(handle_vector_mask_attribute): New.
gcc/c/
* gimple-parser.cc (c_parser_gimple_statement): Properly parse
VEC_COND_EXPRs.
gcc/testsuite/
* gcc.dg/gimplefe-48.c: New testcase.
Jakub Jelinek [Fri, 11 Feb 2022 12:52:44 +0000 (13:52 +0100)]
c++: Fix up constant expression __builtin_convertvector folding [PR104472]
The following testcase ICEs, because due to the -frounding-math
fold_const_call fails, that is, it returns NULL, and returning NULL from
cxx_eval* is wrong, all the callers rely on them to either return folded
value or original with *non_constant_p = true.
The following patch does that, and additionally falls through into the
default case where there are diagnostics for the !ctx->quiet case too.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
PR c++/104472
* constexpr.cc (cxx_eval_internal_function) <case IFN_VEC_CONVERT>:
Only return fold_const_call result if it is non-NULL. Otherwise
fall through into the default: case to return t, set *non_constant_p
and emit diagnostics if needed.
* g++.dg/cpp0x/constexpr-104472.C: New test.
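The contract described above can be modeled with a small sketch. Node, try_fold and eval are hypothetical stand-ins, not the constexpr.cc code: the evaluator returns either a folded result or the original node with the non-constant flag set, and never a null pointer.

```cpp
#include <cassert>

struct Node {
  int value;
  bool can_fold;  // hypothetical: whether the folder succeeds on this node
};

// Stand-in for a folder that may fail: returns nullptr when it cannot fold.
const Node *try_fold(const Node *t, Node *storage) {
  if (!t->can_fold)
    return nullptr;
  storage->value = t->value * 2;  // pretend this is the folded result
  storage->can_fold = true;
  return storage;
}

// Evaluator honoring the contract: return the folded node, or the original
// node with *non_constant_p set. It never returns nullptr.
const Node *eval(const Node *t, Node *storage, bool *non_constant_p) {
  if (const Node *folded = try_fold(t, storage))
    return folded;
  *non_constant_p = true;
  return t;
}
```

Callers can then dereference the result unconditionally and consult the flag to decide whether it is a constant.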
Jakub Jelinek [Fri, 11 Feb 2022 10:34:46 +0000 (11:34 +0100)]
combine: Fix ICE with substitution of CONST_INT into PRE_DEC argument [PR104446]
The following testcase ICEs, because combine substitutes
(insn 10 9 11 2 (set (reg/v:SI 7 sp [ a ])
(const_int 0 [0])) "pr104446.c":9:5 81 {*movsi_internal}
(nil))
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [0 S4 A32])
(reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
(expr_list:REG_DEAD (reg:SI 85)
(expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
(nil))))
forming
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (const_int 0 [0])) [0 S4 A32])
(reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
(expr_list:REG_DEAD (reg:SI 85)
(expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
(nil))))
which is invalid RTL (pre_dec's argument must be a REG).
I know substitution creates various forms of invalid RTL and hopes that
invalid RTL just won't recog.
But unfortunately in this case we ICE before we get to recog, as
try_combine does:
if (n_auto_inc)
{
int new_n_auto_inc = 0;
for_each_inc_dec (newpat, count_auto_inc, &new_n_auto_inc);
if (n_auto_inc != new_n_auto_inc)
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Number of auto_inc expressions changed\n");
undo_all ();
return 0;
}
}
and for_each_inc_dec under the hood will do e.g. for the PRE_DEC case:
case PRE_DEC:
case POST_DEC:
{
poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
rtx r1 = XEXP (x, 0);
rtx c = gen_int_mode (-size, GET_MODE (r1));
return fn (mem, x, r1, r1, c, data);
}
and that code rightfully expects that the PRE_DEC operand has non-VOIDmode
(as it needs to be a REG) - gen_int_mode for VOIDmode results in ICE.
I think it is better not to emit the clearly invalid RTL during substitution
like we do for other cases, than to add workarounds for invalid IL
created by combine to rtlanal.cc and perhaps elsewhere.
As for the testcase, of course it is UB at runtime to modify sp that way,
but if such code is never reached, we must compile it, not ICE on it.
And I don't see why on other targets which use the autoinc rtxes much more
it couldn't happen with other registers.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104446
* combine.cc (subst): Don't substitute CONST_INTs into RTX_AUTOINC
operands.
* gcc.target/i386/pr104446.c: New test.
Richard Biener [Fri, 11 Feb 2022 09:27:20 +0000 (10:27 +0100)]
middle-end/104496 - fix vectorized_internal_fn_supported_p
This fixes vectorized_internal_fn_supported_p behavior when
facing vector types with an integer mode.
2022-02-11 Richard Biener <rguenther@suse.de>
PR middle-end/104496
* internal-fn.cc (vectorized_internal_fn_supported_p):
Bail out for integer mode vector types.
* gcc.target/i386/pr104496.c: New testcase.
Jakub Jelinek [Fri, 11 Feb 2022 10:21:24 +0000 (11:21 +0100)]
df: Don't set bbs dirty because of debug insn moves [PR104459]
As mentioned in the PR, we get -fcompare-debug failure, which is caused by
cfg_layout_merge_blocks successfully merging two bbs where both bbs
contained just CODE_LABEL, NOTE_INSN_BASIC_BLOCK and in the -g case both
some debug insns at the end. cfg_layout_merge_blocks calls
update_bb_for_insn_chain which for the post-label insns in the second block
(except for BARRIERs) calls df_insn_change_bb. This function changes
the bb of the insns and for notes just punts, but for other insns calls
df_set_bb_dirty. Now the problem is that because there were only debug
insns and notes in the second block, df_set_bb_dirty is called on both
only in the -g case and not with -g0. df_set_bb_dirty these days
sets both the BB_MODIFIED flag and marks the bb as dirty, and the former
is what 6 spots in cfgcleanup.cc use in code-generation decisions,
in this case
may_thread |= (target->flags & BB_MODIFIED) != 0;
in particular. So, with -g may_thread is true while with -g0 it is not
and we diverge from that point onwards.
I've thought about introducing df_set_bb_dirty_nondebug that wouldn't
set BB_MODIFIED but would mark the bb dirty, but then I went through
history and found changes like:
https://gcc.gnu.org/legacy-ml/gcc-patches/2010-10/msg00059.html
so I've also tried just not calling df_set_bb_dirty for debug insns
at all and it passed x86_64-linux and i686-linux
--enable-checking=yes,rtl,extra,df bootstraps/regtests, so perhaps
that works too.
Now that I look at it again, if we don't need those from %d to %d messages
for debug insns in the dump files, another way to fix it would be just to
change the very first line in the hunk from
if (!INSN_P (insn))
to
if (!NONDEBUG_INSN_P (insn))
Though, df_set_bb_dirty_nondebug which will do everything but
set bb->flags |= BB_MODIFIED is yet another option I can test.
Perhaps even that PR42889 was solely about those 6 decisions in cfgcleanup
(at that point it used df_get_bb_dirty) and not about actually the
recomputation of some of the problems causing different code generations.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104459
* df-scan.cc (df_insn_change_bb): Don't call df_set_bb_dirty when
moving DEBUG_INSNs between bbs.
* gcc.dg/pr104459.c: New test.
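A toy model of the fix, with hypothetical names (InsnKind, Block, change_bb, move_chain) that only mirror the description above and are not the df-scan.cc code. Moving notes or debug insns leaves the destination block's modified flag alone, so a chain that differs only in debug insns yields the same flag with and without -g.

```cpp
#include <cassert>
#include <vector>

enum class InsnKind { Note, Debug, Real };  // hypothetical classification

struct Block {
  bool modified = false;  // stand-in for the BB_MODIFIED flag
};

// Move one insn into new_bb. Notes and debug insns do not mark the
// block as modified; only real insns do.
void change_bb(InsnKind kind, Block &new_bb) {
  if (kind == InsnKind::Note || kind == InsnKind::Debug)
    return;
  new_bb.modified = true;
}

// Move a whole chain and report whether the destination ended up modified.
bool move_chain(const std::vector<InsnKind> &chain) {
  Block bb;
  for (InsnKind kind : chain)
    change_bb(kind, bb);
  return bb.modified;
}
```

A chain of {Note, Debug, Debug} and a chain of {Note} both leave the block unmodified, which is the invariant -fcompare-debug relies on.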
liuhongt [Thu, 10 Feb 2022 07:42:13 +0000 (15:42 +0800)]
Add single_use to simplification (uncond_op + vec_cond -> cond_op).
gcc/ChangeLog:
PR tree-optimization/104479
* match.pd (uncond_op + vec_cond -> cond_op): Add single_use
for the dest of uncond_op.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104479.c: New test.
* gcc.target/i386/cond_op_shift_w-1.c: Adjust testcase.
Tom de Vries [Wed, 9 Feb 2022 15:48:27 +0000 (16:48 +0100)]
[testsuite] Require non_strict_prototype in a few tests
Require effective target non_strict_prototype in a few test-cases.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-10 Tom de Vries <tdevries@suse.de>
* gcc.c-torture/compile/pr100576.c: Require effective target
non_strict_prototype.
* gcc.c-torture/compile/pr97576.c: Same.
Tom de Vries [Wed, 9 Feb 2022 15:49:38 +0000 (16:49 +0100)]
[testsuite] Require alloca support in a few tests
Require effective target alloca in a few test-cases.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-10 Tom de Vries <tdevries@suse.de>
* c-c++-common/Walloca-larger-than.c: Require effective target alloca.
* c-c++-common/Warray-bounds-9.c: Same.
* c-c++-common/Wdangling-pointer-2.c: Same.
* c-c++-common/Wdangling-pointer-4.c: Same.
* c-c++-common/Wdangling-pointer-5.c: Same.
* c-c++-common/Wdangling-pointer.c: Same.
* c-c++-common/auto-init-11.c: Same.
* c-c++-common/auto-init-12.c: Same.
* c-c++-common/auto-init-15.c: Same.
* c-c++-common/auto-init-16.c: Same.
* c-c++-common/torture/builtin-clear-padding-4.c: Same.
* gcc.c-torture/compile/pr99787-1.c: Same.
* gcc.dg/Walloca-larger-than-4.c: Same.
* gcc.dg/Wdangling-pointer.c: Same.
* gcc.dg/Wfree-nonheap-object-2.c: Same.
* gcc.dg/Wfree-nonheap-object.c: Same.
* gcc.dg/Wstringop-overflow-56.c: Same.
* gcc.dg/Wstringop-overflow-57.c: Same.
* gcc.dg/Wstringop-overflow-67.c: Same.
* gcc.dg/Wstringop-overflow-71.c: Same.
* gcc.dg/Wvla-larger-than-5.c: Same.
* gcc.dg/analyzer/taint-alloc-1.c: Same.
* gcc.dg/analyzer/torture/ubsan-1.c: Same.
* gcc.dg/graphite/pr99085.c: Same.
* gcc.dg/pr100225.c: Same.
* gcc.dg/pr98721-1.c: Same.
* gcc.dg/pr99122-2.c: Same.
* gcc.dg/sso-14.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-25.c: Same.
* gcc.dg/uninit-38.c: Same.
* gcc.dg/uninit-39.c: Same.
* gcc.dg/uninit-41.c: Same.
* gcc.dg/uninit-pr100250.c: Same.
* gcc.dg/uninit-pr101300.c: Same.
* gcc.dg/uninit-pr101494.c: Same.
* gcc.dg/uninit-pr98578.c: Same.
* gcc.dg/uninit-pr98583.c: Same.
* gcc.dg/vla-stexp-1.c: Same.
* gcc.dg/vla-stexp-2.c: Same.
* gcc.dg/vla-stexp-4.c: Same.
* gcc.dg/vla-stexp-5.c: Same.
Tom de Vries [Thu, 10 Feb 2022 10:26:16 +0000 (11:26 +0100)]
[nvptx] Handle asm insn in prevent_branch_around_nothing
With GOMP_NVPTX_JIT=-O0 and -mptx=3.1, I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_prof-version-1.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
execution test
...
The problem is that we're generating a diverging branch around nothing:
...
{
.reg.u32 %x;
mov.u32 %x, %tid.x;
setp.ne.u32 %r23, %x, 0;
}
@%r23 bra $L2;
$L2:
...
which the driver JIT has problems with at -O0, so consequently we run into the
nvptx_uniform_warp_check.
Fix this by handling asm ("") and the like in prevent_branch_around_nothing.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-10 Tom de Vries <tdevries@suse.de>
PR target/104456
* config/nvptx/nvptx.cc (prevent_branch_around_nothing): Handle asm
insn.
GCC Administrator [Fri, 11 Feb 2022 00:16:25 +0000 (00:16 +0000)]
Daily bump.
Jakub Jelinek [Thu, 10 Feb 2022 23:27:11 +0000 (00:27 +0100)]
testsuite: Fix up g++.dg/warn/Wuninitialized-32.C test for ilp32 [PR104373]
The testcase FAILs whenever size_t is not unsigned long:
FAIL: g++.dg/warn/Wuninitialized-32.C -std=c++98 (test for excess errors)
Excess errors:
.../gcc/testsuite/g++.dg/warn/Wuninitialized-32.C:4:7: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter [-fpermissive]
Fixed by using __SIZE_TYPE__ instead of unsigned long.
2022-02-11 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104373
* g++.dg/warn/Wuninitialized-32.C (operator new[]): Use __SIZE_TYPE__
as type of the first argument instead of unsigned long.
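As a rough illustration of why __SIZE_TYPE__ is the portable choice here, the sketch below checks type identity rather than just width. It is written in C rather than the C++ of the testcase, and the names my_size_t, size_type_is_size_t and unsigned_long_is_size_t are hypothetical; only the predefined macro __SIZE_TYPE__ (GCC and Clang) comes from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* __SIZE_TYPE__ is predefined by GCC and Clang and expands to the
   target's size_t type.  my_size_t is a hypothetical name.  */
typedef __SIZE_TYPE__ my_size_t;

/* The widths always agree, whatever the target.  */
static_assert (sizeof (my_size_t) == sizeof (size_t),
               "__SIZE_TYPE__ has the width of size_t");

/* Returns 1 when my_size_t is exactly the type size_t.  */
int size_type_is_size_t (void)
{
  return _Generic ((my_size_t) 0, size_t: 1, default: 0);
}

/* Returns 1 only on targets where size_t happens to be unsigned long.
   On targets where size_t is unsigned int this returns 0 even though
   both types may have the same width, which is the situation the
   testcase ran into.  */
int unsigned_long_is_size_t (void)
{
  return _Generic ((size_t) 0, unsigned long: 1, default: 0);
}
```

The first function returns 1 on any target, while the second depends on the ABI, so a declaration written with unsigned long is only correct where the two types coincide.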
Jason Merrill [Thu, 10 Feb 2022 19:59:49 +0000 (14:59 -0500)]
c++: ICE on xtreme-header_a.H
This test regressed after my PR103752 patch with -march=cascadelake. I
don't understand why that flag makes a difference, but this patch is correct
in any case.
gcc/cp/ChangeLog:
* module.cc (depset::hash::add_specializations): Use
STRIP_TEMPLATE.
Thomas Rodgers [Thu, 10 Feb 2022 18:12:36 +0000 (10:12 -0800)]
libstdc++: Strengthen memory order for atomic<T>::wait/notify
This changes the memory order used in the spin wait code to match
that of libc++.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__waiter_base::_S_do_spin,
__waiter_base::_S_do_spin_v): Change memory order from relaxed
to acquire.
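A minimal sketch of what an acquire load in a bounded spin loop looks like, written with C11 <stdatomic.h>. This is not the libstdc++ implementation; spin_until_changed and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: poll *flag up to max_spins times and report
   whether a value different from old was observed.  */
bool spin_until_changed (atomic_int *flag, int old, int max_spins)
{
  assert (max_spins >= 0);
  for (int i = 0; i < max_spins; i++)
    {
      /* Acquire load: once the new value is seen, writes the notifying
         thread made before its release store are visible here too.
         A relaxed load would not give that guarantee.  */
      if (atomic_load_explicit (flag, memory_order_acquire) != old)
        return true;
    }
  return false;
}
```

A notifier would pair this with atomic_store_explicit (flag, new_value, memory_order_release) before waking the waiter.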
Tobias Burnus [Thu, 10 Feb 2022 17:57:37 +0000 (18:57 +0100)]
OpenMP/C++: Permit mapping classes with virtual members [PR102204]
PR c++/102204
gcc/cp/ChangeLog:
* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
members as those are permitted since OpenMP 5.0.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-virtual-1.C: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
David Malcolm [Thu, 10 Feb 2022 00:06:15 +0000 (19:06 -0500)]
analyzer: handle more casts of string literals [PR98797]
gcc/analyzer/ChangeLog:
PR analyzer/98797
* region-model-manager.cc
(region_model_manager::maybe_fold_sub_svalue): Generalize getting
individual chars of a STRING_CST from element_region to any
subregion which is a concrete access of a single byte from its
parent region.
* region.cc (region::get_relative_concrete_byte_range): New.
* region.h (region::get_relative_concrete_byte_range): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/98797
* gcc.dg/analyzer/casts-1.c: Mark xfails as fixed; add further
test coverage for casts of string literals.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Qing Zhao [Thu, 10 Feb 2022 16:40:39 +0000 (16:40 +0000)]
middle-end: updating the reg use in exit block for -fzero-call-used-regs [PR100775]
In the pass_zero_call_used_regs, when updating dataflow info after adding
the register zeroing sequence in the epilogue of the function, we should
call "df_update_exit_block_uses" to update the register use information in
the exit block to include all the registers that have been zeroed.
2022-02-10 Qing Zhao <qing.zhao@oracle.com>
gcc/ChangeLog:
PR middle-end/100775
* function.cc (gen_call_used_regs_seq): Call
df_update_exit_block_uses when updating df.
gcc/testsuite/ChangeLog:
PR middle-end/100775
* gcc.target/arm/pr100775.c: New test.
Uros Bizjak [Thu, 10 Feb 2022 16:23:17 +0000 (17:23 +0100)]
i386: Fix vec_unpacks_float_lo_v4si operand constraint [PR104469]
2022-02-10 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/104469
* config/i386/sse.md (vec_unpacks_float_lo_v4si):
Change operand 1 constraint to register_operand.
gcc/testsuite/ChangeLog:
PR target/104469
* gcc.target/i386/pr104469.c: New test.
H.J. Lu [Thu, 10 Feb 2022 14:26:23 +0000 (06:26 -0800)]
pr104458.c: Replace long with long long for -mx32
PR target/104458
* gcc.target/i386/pr104458.c: Replace long with long long.
David Malcolm [Wed, 9 Feb 2022 22:55:55 +0000 (17:55 -0500)]
analyzer: fix testsuite issues seen with mingw [PR102052]
gcc/testsuite/ChangeLog:
PR analyzer/102052
* gcc.dg/analyzer/fields.c (size_t): Use __SIZE_TYPE__ rather than
hardcoding long unsigned int.
* gcc.dg/analyzer/gzio-3.c (size_t): Likewise.
* gcc.dg/analyzer/gzio-3a.c (size_t): Likewise.
* gcc.dg/analyzer/pr98969.c (test_1): Use __UINTPTR_TYPE__ rather
than long int.
(test_2): Likewise.
* gcc.dg/analyzer/pr99716-2.c (test_mountpoint): Use "rand" rather
than "random".
* gcc.dg/analyzer/pr99774-1.c (size_t): Use __SIZE_TYPE__ rather
than hardcoding long unsigned int.
* gcc.dg/analyzer/strndup-1.c: Add MinGW to targets that don't
implement strndup.
* gcc.dg/analyzer/zlib-5.c (size_t): Use __SIZE_TYPE__ rather
than hardcoding long unsigned int.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Patrick Palka [Thu, 10 Feb 2022 13:54:07 +0000 (08:54 -0500)]
c++: memfn lookup consistency and dependent using-decls
Rather than not doing any filtering when filter_memfn_lookup encounters
a dependent using-decl, handle this case less imprecisely by holding on
to the members in the new lookup set that come from a base, i.e. that
could plausibly have been introduced by that using-decl, and filtering
the rest as usual. This is still imperfect, but it's closer to the
correct answer than the previous behavior was.
gcc/cp/ChangeLog:
* pt.cc (filter_memfn_lookup): Handle dependent USING_DECL
better.
Roger Sayle [Thu, 10 Feb 2022 13:32:07 +0000 (13:32 +0000)]
gfortran: Respect target's NO_DOT_IN_LABEL in trans-common.cc
This patch fixes 9 unexpected failures in the gfortran testsuite on
nvptx-none. The issue is that gfortran's EQUIVALENCE internally uses
symbols such as "equiv.0" even on platforms that define NO_DOT_IN_LABEL.
On nvptx-none, this then results in the following error message(s):
ptxas application ptx input, fatal: Parsing error near '.0': syntax error
ptxas fatal: Ptx assembly aborted due to errors
The fix is to tweak trans-common.cc to respect the target's NO_DOT_IN_LABEL
(and NO_DOLLAR_IN_LABEL) when generating internal equiv.%d symbols.
Only the nvptx, mmix and xtensa backends define NO_DOT_IN_LABEL which
explains why no-one has spotted/fixed this issue since the problematic
code was last changed back in 2005(!).
2022-02-10 Roger Sayle <roger@nextmovesoftware.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/fortran/ChangeLog
* trans-common.cc (GFC_EQUIV_FMT): New macro respecting the
target's NO_DOT_IN_LABEL and NO_DOLLAR_IN_LABEL preferences.
(build_equiv_decl): Use GFC_EQUIV_FMT here.
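The selection described above can be sketched as a preprocessor cascade. EQUIV_FMT_SKETCH and make_equiv_name are hypothetical names, and the separator characters chosen for the two fallback cases are assumptions; only the "equiv.%d" form and the two NO_*_IN_LABEL macros come from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the real macro.  The '$' and '_' fallback
   spellings are assumptions made for this sketch.  */
#if !defined (NO_DOT_IN_LABEL)
# define EQUIV_FMT_SKETCH "equiv.%d"
#elif !defined (NO_DOLLAR_IN_LABEL)
# define EQUIV_FMT_SKETCH "equiv$%d"
#else
# define EQUIV_FMT_SKETCH "equiv_%d"
#endif

/* Format the internal symbol name for the given serial number into buf
   and return the length snprintf reports.  */
int make_equiv_name (char *buf, size_t size, int serial)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, EQUIV_FMT_SKETCH, serial);
}
```

Compiled without either macro defined, make_equiv_name produces the dotted form; defining NO_DOT_IN_LABEL on the command line switches to a separator the target assembler accepts.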
Jonathan Wakely [Wed, 9 Feb 2022 13:38:33 +0000 (13:38 +0000)]
libstdc++: Add atomic_fetch_xor to <stdatomic.h>
This function (and the explicit memory order version) are present in both
C++ <atomic> and C <stdatomic.h>, so should be in C++ <stdatomic.h> too.
There is a library issue incoming for this, but the resolution is
obvious.
libstdc++-v3/ChangeLog:
* include/c_compatibility/stdatomic.h (atomic_fetch_xor): Add
using-declaration.
(atomic_fetch_xor_explicit): Likewise.
* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc: Check
arithmetic and logical operations for atomic_int.
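For reference, this is how the two functions are used from C <stdatomic.h>; the commit makes the same names usable through C++ <stdatomic.h>. The wrappers toggle_bits and toggle_bits_relaxed are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically flip the bits selected by mask and return the value the
   object held before the operation (sequentially consistent).  */
int toggle_bits (atomic_int *word, int mask)
{
  assert (word);
  return atomic_fetch_xor (word, mask);
}

/* Same operation with an explicitly chosen memory order.  */
int toggle_bits_relaxed (atomic_int *word, int mask)
{
  assert (word);
  return atomic_fetch_xor_explicit (word, mask, memory_order_relaxed);
}
```

Both calls return the old value, so starting from 0x0F a toggle with mask 0xFF returns 0x0F and leaves 0xF0 in the object.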
Jonathan Wakely [Tue, 8 Feb 2022 21:05:30 +0000 (21:05 +0000)]
libstdc++: Fix directory iterator build for newlib
When building for newlib HAVE_OPENAT and HAVE_UNLINKAT are (sometimes?)
defined, but <fcntl.h> is only included when HAVE_DIRENT_H is defined.
Since directory iterators are completely useless without <dirent.h>,
just override the HAVE_OPENAT and HAVE_UNLINKAT detection when we don't
have <dirent.h>.
libstdc++-v3/ChangeLog:
* src/filesystem/dir-common.h (_GLIBCXX_HAVE_DIRFD): Undefine
when <dirent.h> is not available.
(_GLIBCXX_HAVE_UNLINKAT): Likewise.
Richard Biener [Fri, 4 Feb 2022 08:46:43 +0000 (09:46 +0100)]
tree-optimization/104373 - early diagnostic on unreachable code
The following improves early uninit diagnostics by computing edge
reachability using VN and ignoring unreachable blocks when looking
for uninitialized uses. To not ICE with -fdump-tree-all the
early uninit pass needs a dumpfile since VN tries to dump statistics.
2022-02-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/104373
* tree-ssa-sccvn.h (do_rpo_vn): New export exposing the
walk kind.
* tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default
walk kind as argument.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-ssa-uninit.cc (warn_uninitialized_vars): Skip
blocks not reachable.
(execute_late_warn_uninitialized): Mark all edges as
executable.
(execute_early_warn_uninitialized): Use VN to compute
executable edges.
(pass_data_early_warn_uninitialized): Enable a dump file,
change dump name to warn_uninit.
* g++.dg/warn/Wuninitialized-32.C: New testcase.
* gcc.dg/uninit-pr20644-O0.c: Remove XFAIL.
Richard Biener [Thu, 10 Feb 2022 09:01:20 +0000 (10:01 +0100)]
middle-end/104467 - fix vector extract simplification
This fixes a bogus vector type used for a CTOR build as part of
vector extract simplification. The code failed to consider a
CTOR of vector elements.
2022-02-10 Richard Biener <rguenther@suse.de>
PR middle-end/104467
* match.pd (vector extract simplification): Multiply the
number of CTOR elements by the number of elements in each CTOR element.
* gcc.dg/torture/pr104467.c: New testcase.
Richard Biener [Thu, 10 Feb 2022 08:03:48 +0000 (09:03 +0100)]
tree-optimization/104466 - fix cut&paste error preventing alias disambiguation
The following fixes a cut&paste error in disambiguating using restrict
info. Instead of using rbase1/rbase2, which are computed for this purpose
and preserve MEM_REF bases even when they are based on a decl, the
code performed the check on the bases that drop that info.
2022-02-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/104466
* tree-ssa-alias.cc (refs_may_alias_p_2): Use rbase1/rbase2
for the MR_DEPENDENCE checks as intended.
* gfortran.dg/pr104466.f90: New testcase.
Tom de Vries [Wed, 2 Feb 2022 15:23:37 +0000 (16:23 +0100)]
[nvptx] Handle sm_7x shared atomic store more optimally
For sm_7x atomic stores we fall back on expand_atomic_store, but this
results in using membar.sys for shared stores.
Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a
shared store.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"): New
define_insn.
(define_expand "atomic_store<mode>"): Use nvptx_atomic_store<mode> for
TARGET_SM70.
(define_c_enum "unspecv"): Add UNSPECV_ST.
gcc/testsuite/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-2.c: New test.
Tom de Vries [Thu, 13 Jan 2022 12:13:44 +0000 (13:13 +0100)]
[nvptx] Handle pre-sm_7x shared atomic store using atomic exchange
The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store instructions
to the same address.
This can be fixed by:
- inserting barriers between normal stores and atomic operations to a common
address
- using atom.exch to store to locations accessed by other atomic operations.
It's not clearly spelled out which barriers are needed, and a barrier seems more
expensive than atomic exchange.
Implement the pre-sm_7x shared atomic store using atomic exchange.
That includes stores using generic addressing, since those may also point to
shared memory.
Tested on x86-64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-protos.h (nvptx_mem_maybe_shared_p): Declare.
* config/nvptx/nvptx.cc (nvptx_mem_data_area): New static function.
(nvptx_mem_maybe_shared_p): New function.
* config/nvptx/nvptx.md (define_expand "atomic_store<mode>"): New
define_expand.
gcc/testsuite/ChangeLog:
2022-02-02 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-1.c: New test.
* gcc.target/nvptx/atomic-store-3.c: New test.
* gcc.target/nvptx/stack-atomics-run.c: Update.
Tom de Vries [Mon, 7 Feb 2022 13:12:34 +0000 (14:12 +0100)]
[nvptx] Workaround sub.u16 driver JIT bug
There's a nvidia driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
signed char r;
unsigned char y = (unsigned char) 0x80;
if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
__builtin_abort ();
return 0;
}
...
which at ptx level minimizes to:
...
mov.u16 r22, 0x0080;
st.local.u16 [frame_var],r22;
ld.local.u16 r32,[frame_var];
sub.u16 r33,0x0000,r32;
cvt.u32.u16 r35,r33;
...
where we expect r35 == 0x0000ff80 but get instead 0xffffff80, and where using
nvptx-none-run -O0 fixes the problem. [ See also
https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ]
Try to work around the bug by using sub.s16 instead of sub.u16.
Tested on nvptx.
gcc/ChangeLog:
2022-02-07 Tom de Vries <tdevries@suse.de>
PR target/97005
* config/nvptx/nvptx.md (define_insn "sub<mode>3"): Workaround
driver JIT bug by using sub.s16 instead of sub.u16.
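The expected and observed values of r35 quoted above can be reproduced with a small host-side model of the sub.u16 / cvt.u32.u16 pair. This is only an arithmetic sketch with hypothetical function names; that the wrong value matches a sign extension is an observation about the numbers, not a statement about what the driver JIT does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sub.u16 followed by cvt.u32.u16: subtract modulo 2^16,
   then zero-extend to 32 bits.  */
uint32_t sub_u16_then_zext (uint16_t a, uint16_t b)
{
  uint16_t diff = (uint16_t) (a - b);
  return (uint32_t) diff;
}

/* Same subtraction, but widened with sign extension.  The conversion
   to int16_t is implementation-defined in C; GCC wraps modulo 2^16.  */
uint32_t sub_u16_then_sext (uint16_t a, uint16_t b)
{
  int16_t diff = (int16_t) (uint16_t) (a - b);
  return (uint32_t) (int32_t) diff;
}

/* Check both models against the values from the commit message.  */
void check_sub_u16_model (void)
{
  assert (sub_u16_then_zext (0, 0x80) == 0x0000ff80u);
  assert (sub_u16_then_sext (0, 0x80) == 0xffffff80u);
}
```

Since sub.s16 and sub.u16 compute the same 16-bit result in two's complement, replacing one with the other leaves the low 16 bits unchanged.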
Tobias Burnus [Thu, 10 Feb 2022 08:30:19 +0000 (09:30 +0100)]
Fortran/OpenMP: Avoid ICE for invalid char array in omp atomic [PR104329]
PR fortran/104329
gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_atomic): Defer extra-code assert after
other diagnostics.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/atomic-28.f90: New test.
Roger Sayle [Tue, 8 Feb 2022 19:56:55 +0000 (20:56 +0100)]
nvptx: Tweak constraints on copysign instructions
Many thanks to Thomas Schwinge for confirming my hypothesis that the register
usage regression, PR target/104345, is solely due to libgcc's _muldc3 function.
In addition to the isinf functionality in the previously proposed nvptx patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which
significantly reduces the number of instructions in _muldc3, the patch below
further reduces both the number of instructions and the number of explicitly
declared registers, by permitting floating point constant immediate operands
in nvptx's copysign instruction.
Fingers-crossed, the combination with all of the previous proposed nvptx
patches improves things. Ultimately, increasing register usage from 50 to
51 registers, reducing the number of concurrent threads by ~2%, can easily
be countered if we're now executing significantly fewer instructions in each
kernel, for a net performance win.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.
gcc/ChangeLog:
* config/nvptx/nvptx.md (copysign<mode>3): Allow immediate
floating point constants as operands 1 and/or 2.
Roger Sayle [Fri, 4 Feb 2022 03:13:53 +0000 (04:13 +0100)]
PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target. This improved code generation for the
more common case of producing 0/1 Boolean values, but unfortunately
made things marginally worse when a 0/-1 mask value is desired.
Unfortunately, nvptx kernels are extremely sensitive to changes in
register usage, which was observable in the reported PR.
This patch provides optimizations for -(cond ? 1 : 0), effectively
simplifying this into cond ? -1 : 0, where these ternary operators are
provided by nvptx's selp instruction, and for the specific case of
SImode, using (restoring) nvptx's "set" instruction (which avoids
the need for a predicate register).
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures. Unfortunately,
the exact register usage of a nvptx kernel depends upon the version of
the Cuda drivers being used (and the hardware), but I believe this
change should resolve the PR (for Thomas) by improving code generation
for the cases that regressed.
gcc/ChangeLog:
PR target/104345
* config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
(sel_false<mode>): Likewise.
(define_code_iterator eqne): New code iterator for EQ and NE.
(*selp<mode>_neg_<code>): New define_insn_and_split to optimize
the negation of a selp instruction.
(*selp<mode>_not_<code>): New define_insn_and_split to optimize
the bitwise not of a selp instruction.
(*setcc_int<mode>): Use set instruction for neg:SI of a selp.
gcc/testsuite/ChangeLog:
PR target/104345
* gcc.target/nvptx/neg-selp.c: New test case.
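The source-level identities behind the new patterns can be checked with a few lines of C. The function names are hypothetical, and the form shown for the bitwise-not case (pushing the ~ into both arms of the select) is an assumption about what the *selp<mode>_not_<code> pattern does; the negation case follows the commit message directly.

```c
#include <assert.h>

/* Negation of a 0/1 Boolean, as written in source code.  */
int neg_of_bool (int cond) { return -(cond ? 1 : 0); }

/* The equivalent 0/-1 mask produced by a single select.  */
int mask_select (int cond) { return cond ? -1 : 0; }

/* Bitwise not of a select ...  */
int not_of_select (int cond, int a, int b) { return ~(cond ? a : b); }

/* ... and the same value with the not pushed into both arms.  */
int select_of_not (int cond, int a, int b) { return cond ? ~a : ~b; }

/* Verify that each pair agrees for true and false conditions.  */
void check_selp_identities (void)
{
  for (int cond = 0; cond <= 1; cond++)
    {
      assert (neg_of_bool (cond) == mask_select (cond));
      assert (not_of_select (cond, 1, 0) == select_of_not (cond, 1, 0));
    }
}
```

When both arms are constants, the negated or inverted constants can be folded at compile time, so only the select itself remains.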