platform/upstream/gcc.git
Richard Sandiford [Tue, 30 Nov 2021 09:52:28 +0000 (09:52 +0000)]
vect: Use code_helper when building SLP nodes

This patch uses code_helper to represent the common (and
alternative) operations when building an SLP node.  It's not
much of a saving on its own, but it helps with later patches.

gcc/
* tree-vect-slp.c (vect_build_slp_tree_1): Use code_helper
to record the operations performed by statements, only using
CALL_EXPR for things that don't map to built-in or internal
functions.  For shifts, require all shift amounts to be equal
if optab_vector is not supported but optab_scalar is.

Richard Sandiford [Tue, 30 Nov 2021 09:52:28 +0000 (09:52 +0000)]
vect: Fix SVE mask_gather_load/store_store tests

If-conversion now applies rewrite_to_defined_overflow to the
address calculation in an IFN_MASK_LOAD.  This means that we
end up with:

    cast_base = (uintptr_t) base;
    uncast_sum = cast_base + offset;
    sum = (orig_type *) uncast_sum;

If the target supports IFN_MASK_GATHER_LOAD with pointer-sized
offsets for the given vectype, we wouldn't look through the sum
cast and so would needlessly vectorise the uncast_sum addition.

This showed up as several failures in gcc.target/aarch64/sve.

gcc/
* tree-vect-data-refs.c (vect_check_gather_scatter): Continue
processing conversions if the current offset is a pointer.

Richard Sandiford [Tue, 30 Nov 2021 09:52:27 +0000 (09:52 +0000)]
vect: Fix vect_is_reduction

The current definition of vect_is_reduction (provided for target
costing) misses some pattern statements.

gcc/
* tree-vectorizer.h (vect_is_reduction): Use STMT_VINFO_REDUC_IDX.

gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_13.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:27 +0000 (09:52 +0000)]
vect: Pass mode to gather/scatter tests

vect_check_gather_scatter had a binary “does this target support
internal gather/scatter functions” test.  This dates from the time when
we only handled gathers and scatters via direct target support, with
x86_64 using built-in functions and aarch64 using IFNs.  But now that we
can emulate gathers, we need to check whether the gather for a particular
mode is going to be emulated or not.

Without this, enabling SVE regresses emulated Advanced SIMD gather
sequences in cases where SVE isn't used.

Livermore kernel 15 can now be vectorised with Advanced SIMD when
SVE is enabled.
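
A minimal sketch of the per-mode query described above, not GCC's actual code. Every name here (demo_mode, demo_target_has_gather, demo_supports_gather_p) is hypothetical, and the meaning given to the signed char values (0 = not yet computed, 1 = supported, -1 = unsupported) is an assumption about how such a cache could work:

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins; GCC's machine_mode enum and optab queries differ.
enum demo_mode { DEMO_VOIDmode, DEMO_V4SImode, DEMO_VNx4SImode, DEMO_NUM_MODES };

// Pretend target query: only the SVE-like mode has a native gather.
static bool demo_target_has_gather(demo_mode mode) {
  return mode == DEMO_VNx4SImode;
}

// Assumed encoding: 0 = not computed yet, 1 = supported, -1 = unsupported.
static std::array<signed char, DEMO_NUM_MODES> demo_gather_cache{};

// With DEMO_VOIDmode, answer "does any mode support gathers?";
// otherwise answer for the given mode only.
bool demo_supports_gather_p(demo_mode mode) {
  if (mode == DEMO_VOIDmode) {
    for (int m = DEMO_VOIDmode + 1; m < DEMO_NUM_MODES; ++m)
      if (demo_supports_gather_p(static_cast<demo_mode>(m)))
        return true;
    return false;
  }
  signed char &entry = demo_gather_cache[mode];
  if (entry == 0)
    entry = demo_target_has_gather(mode) ? 1 : -1;
  return entry > 0;
}
```

In this model the Advanced SIMD-like mode reports "unsupported" even though another mode on the same target supports gathers, which is the distinction the binary test could not make.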

gcc/
* genopinit.c (main): Turn supports_vec_gather_load and
supports_vec_scatter_store into signed char arrays and remove
supports_vec_gather_load_cached and supports_vec_scatter_store_cached.
* optabs-query.c (supports_vec_convert_optab_p): Add a mode parameter.
If the mode is not VOIDmode, test only for that mode.
(supports_vec_gather_load_p): Likewise.
(supports_vec_scatter_store_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Likewise.
(supports_vec_scatter_store_p): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Pass the
vector mode to supports_vec_gather_load_p and
supports_vec_scatter_store_p.

gcc/testsuite/
* gfortran.dg/vect/vect-8.f90: Bump number of vectorized loops
to 25 for SVE.
* gcc.target/aarch64/sve/gather_load_10.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:27 +0000 (09:52 +0000)]
Mark IFN_ADD/MUL_OVERFLOW as commutative

gcc/
* internal-fn.c (commutative_binary_fn_p): Handle IFN_ADD_OVERFLOW
and IFN_MUL_OVERFLOW.

gcc/testsuite/
* gcc.dg/add-mul-overflow-1.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:26 +0000 (09:52 +0000)]
Mark IFN_UBSAN_CHECK_ADD/MUL as commutative

gcc/
* internal-fn.c (commutative_binary_fn_p): Handle IFN_UBSAN_CHECK_ADD
and IFN_UBSAN_CHECK_MUL.

gcc/testsuite/
* gcc.dg/ubsan/commutative-1.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:26 +0000 (09:52 +0000)]
Mark IFN_COMPLEX_MUL as commutative

gcc/
* internal-fn.c (commutative_binary_fn_p): Handle IFN_COMPLEX_MUL.

gcc/testsuite/
* gcc.target/aarch64/sve/complex_mul_1.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:25 +0000 (09:52 +0000)]
Canonicalize argument order for commutative functions

This patch uses information about internal functions to canonicalize
the argument order of calls.
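
As an illustration only (the function names are made up for this sketch, and it is not the new testcase), these two calls differ just in argument order. Once the order is canonicalized, the compiler can treat them as the same computation:

```cpp
#include <cassert>
#include <cmath>

// Same operation, arguments written in opposite orders.
double demo_max_ab(double a, double b) { return std::fmax(a, b); }
double demo_max_ba(double a, double b) { return std::fmax(b, a); }

// fmax returns the non-NaN operand when exactly one operand is NaN,
// whichever position it is in.
bool demo_nan_symmetric(double x) {
  double nan = std::nan("");
  return std::fmax(nan, x) == x && std::fmax(x, nan) == x;
}
```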

gcc/
* gimple-fold.c: Include internal-fn.h.
(fold_stmt_1): If a function maps to an internal one, use
first_commutative_argument to canonicalize the order of
commutative arguments.
* gimple-match-head.c (gimple_resimplify2, gimple_resimplify3)
(gimple_resimplify4, gimple_resimplify5): Extend commutativity
checks to functions.

gcc/testsuite/
* gcc.dg/fmax-fmin-1.c: New test.

Richard Sandiford [Tue, 30 Nov 2021 09:52:25 +0000 (09:52 +0000)]
vect: Add support for fmax and fmin reductions

This patch adds support for reductions involving calls to fmax*()
and fmin*(), without the -ffast-math flags that allow them to be
converted to MAX_EXPR and MIN_EXPR.
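
A sketch of the kind of loop this covers, written for illustration rather than copied from the new tests (fmax_reduce and demo_fmax_reduce are hypothetical names):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// A reduction whose accumulator is updated through a call to fmax
// rather than through a MAX_EXPR-style comparison.
double fmax_reduce(const double *x, std::size_t n, double init) {
  double m = init;
  for (std::size_t i = 0; i < n; ++i)
    m = std::fmax(m, x[i]);
  return m;
}

// Small usage example with fixed data.
double demo_fmax_reduce() {
  const double v[] = {1.0, 5.0, -2.0, 3.0};
  return fmax_reduce(v, 4, v[0]);
}
```

Without -ffast-math the call has to keep fmax's NaN handling, so it cannot simply be rewritten as a comparison and select.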

gcc/
* doc/md.texi (reduc_fmin_scal_@var{m}): Document.
(reduc_fmax_scal_@var{m}): Likewise.
* optabs.def (reduc_fmax_scal_optab): New optab.
(reduc_fmin_scal_optab): Likewise.
* internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Handle
CASE_CFN_FMAX and CASE_CFN_FMIN.
(neutral_op_for_reduction): Likewise.
(needs_fold_left_reduction_p): Likewise.
* config/aarch64/iterators.md (FMAXMINV): New iterator.
(fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV.
* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix
unspec mode.
(reduc_<fmaxmin>_scal_<mode>): New pattern.
* config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>):
Likewise.

gcc/testsuite/
* gcc.dg/vect/vect-fmax-1.c: New test.
* gcc.dg/vect/vect-fmax-2.c: Likewise.
* gcc.dg/vect/vect-fmax-3.c: Likewise.
* gcc.dg/vect/vect-fmin-1.c: New test.
* gcc.dg/vect/vect-fmin-2.c: Likewise.
* gcc.dg/vect/vect-fmin-3.c: Likewise.
* gcc.target/aarch64/fmaxnm_1.c: Likewise.
* gcc.target/aarch64/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/fminnm_1.c: Likewise.
* gcc.target/aarch64/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fminnm_3.c: Likewise.

Richard Sandiford [Tue, 30 Nov 2021 09:52:24 +0000 (09:52 +0000)]
vect: Make reduction code handle calls

This patch extends the reduction code to handle calls.  So far
it's a structural change only; a later patch adds support for
specific function reductions.

Most of the patch consists of using code_helper and gimple_match_op
to describe the reduction operations.  The other main change is that
vectorizable_call now needs to handle fully-predicated reductions.

There are some new functions that are provided for ABI completeness
and aren't currently used:

  first_commutative_argument
  commutative_ternary_op_p
  1- and 3-argument forms of gimple_build

gcc/
* builtins.h (associated_internal_fn): Declare overload that
takes a (combined_cfn, return type) pair.
* builtins.c (associated_internal_fn): Split new overload out
of original fndecl version.  Also provide an overload that takes
a (combined_cfn, return type) pair.
* internal-fn.h (commutative_binary_fn_p): Declare.
(commutative_ternary_fn_p): Likewise.
(associative_binary_fn_p): Likewise.
* internal-fn.c (commutative_binary_fn_p, commutative_ternary_fn_p):
New functions, split out from...
(first_commutative_argument): ...here.
(associative_binary_fn_p): New function.
* gimple-match.h (code_helper): Add a constructor that takes
internal functions.
(commutative_binary_op_p): Declare.
(commutative_ternary_op_p): Likewise.
(first_commutative_argument): Likewise.
(associative_binary_op_p): Likewise.
(canonicalize_code): Likewise.
(directly_supported_p): Likewise.
(get_conditional_internal_fn): Likewise.
(gimple_build): New overloads that take a code_helper.
* gimple-fold.c (gimple_build): Likewise.
* gimple-match-head.c (commutative_binary_op_p): New function.
(commutative_ternary_op_p): Likewise.
(first_commutative_argument): Likewise.
(associative_binary_op_p): Likewise.
(canonicalize_code): Likewise.
(directly_supported_p): Likewise.
(get_conditional_internal_fn): Likewise.
* tree-vectorizer.h: Include gimple-match.h.
(neutral_op_for_reduction): Take a code_helper instead of a tree_code.
(needs_fold_left_reduction_p): Likewise.
(reduction_fn_for_scalar_code): Likewise.
(vect_can_vectorize_without_simd_p): Declare a new overload that takes
a code_helper.
* tree-vect-loop.c: Include case-cfn-macros.h.
(fold_left_reduction_fn): Take a code_helper instead of a tree_code.
(reduction_fn_for_scalar_code): Likewise.
(neutral_op_for_reduction): Likewise.
(needs_fold_left_reduction_p): Likewise.
(use_mask_by_cond_expr_p): Likewise.
(build_vect_cond_expr): Likewise.
(vect_create_partial_epilog): Likewise.  Use gimple_build rather
than gimple_build_assign.
(check_reduction_path): Handle calls and operate on code_helpers
rather than tree_codes.
(vect_is_simple_reduction): Likewise.
(vect_model_reduction_cost): Likewise.
(vect_find_reusable_accumulator): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_reduction): Likewise.  Make more use of
lane_reduc_code_p.
(vect_transform_reduction): Use gimple_extract_op but expect
a tree_code for now.
(vect_can_vectorize_without_simd_p): New overload that takes
a code_helper.
* tree-vect-stmts.c (vectorizable_call): Handle reductions in
fully-masked loops.
* tree-vect-patterns.c (vect_mark_pattern_stmts): Use
gimple_extract_op when updating STMT_VINFO_REDUC_IDX.

Richard Sandiford [Tue, 30 Nov 2021 09:52:24 +0000 (09:52 +0000)]
gimple-match: Make code_helper conversions explicit

code_helper provides conversions to tree_code and combined_fn.
Now that the codebase is C++11, we can mark these conversions as
explicit.  This avoids accidentally using code_helpers with
functions that take tree_codes, which would previously entail
a hidden unchecked conversion.
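
The effect of explicit conversions can be sketched with a simplified stand-in class. This is not GCC's code_helper: demo_code_helper, the demo_ enums, and the sign-based representation are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for tree_code and combined_fn.
enum demo_tree_code { DEMO_PLUS_EXPR = 1, DEMO_MULT_EXPR = 2 };
enum class demo_combined_fn { DEMO_CFN_FMAX = 1, DEMO_CFN_FMIN = 2 };

class demo_code_helper {
public:
  demo_code_helper(demo_tree_code code) : rep_(static_cast<int>(code)) {}
  demo_code_helper(demo_combined_fn fn) : rep_(-static_cast<int>(fn)) {}

  bool is_tree_code() const { return rep_ > 0; }
  bool is_fn_code() const { return rep_ < 0; }

  // Explicit: callers must spell out the conversion, and it is checked.
  explicit operator demo_tree_code() const {
    assert(is_tree_code());
    return static_cast<demo_tree_code>(rep_);
  }
  explicit operator demo_combined_fn() const {
    assert(is_fn_code());
    return static_cast<demo_combined_fn>(-rep_);
  }

  // Comparisons against a plain code still work without any conversion.
  bool operator==(demo_tree_code code) const {
    return rep_ == static_cast<int>(code);
  }
  bool operator!=(demo_tree_code code) const { return !(*this == code); }

private:
  int rep_;
};

// A function that only understands tree codes.
bool demo_is_plus(demo_tree_code code) { return code == DEMO_PLUS_EXPR; }

bool demo_check(demo_code_helper op) {
  // demo_is_plus(op);  // would not compile: the conversion is explicit
  return op.is_tree_code() && demo_is_plus(static_cast<demo_tree_code>(op));
}
```

Because the conversion has to be written out, passing a helper that holds a function code to demo_is_plus becomes a compile-time error instead of a silent reinterpretation.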

gcc/
* gimple-match.h (code_helper): Provide == and != overloads.
(code_helper::operator tree_code): Make explicit.
(code_helper::operator combined_fn): Likewise.
* gimple-match-head.c (convert_conditional_op): Use explicit
conversions where necessary.
(gimple_resimplify1, gimple_resimplify2, gimple_resimplify3): Likewise.
(maybe_push_res_to_seq, gimple_simplify): Likewise.
* gimple-fold.c (replace_stmt_with_simplification): Likewise.

Richard Sandiford [Tue, 30 Nov 2021 09:52:24 +0000 (09:52 +0000)]
gimple-match: Add a gimple_extract_op function

code_helper and gimple_match_op seem like generally useful ways
of summing up a gimple_assign or gimple_call (or gimple_cond).
This patch adds a gimple_extract_op function that can be used
for that.

gcc/
* gimple-match.h (code_helper): Add functions for querying whether
the code represents an internal_fn or a built_in_function.
Provide explicit conversion operators for both cases.
(gimple_extract_op): Declare.
* gimple-match-head.c (gimple_extract): New function, extracted from...
(gimple_simplify): ...here.
(gimple_extract_op): New function.

Eric Botcazou [Tue, 30 Nov 2021 09:17:09 +0000 (10:17 +0100)]
Fix -freorder-blocks-and-partition glitch with Windows SEH (continued)

This fixes a thinko in the fix for the -freorder-blocks-and-partition
glitch with SEH on 64-bit Windows:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565208.html

Even if no exceptions are active, e.g. in C, we need to consider calls.

gcc/
PR target/103274
* config/i386/i386.c (ix86_output_call_insn): Beef up comment about
nops emitted with SEH.
* config/i386/winnt.c (i386_pe_seh_unwind_emit): When switching to
the cold section, emit a nop before the directive if the previous
active instruction is a call.

Jakub Jelinek [Tue, 30 Nov 2021 08:50:52 +0000 (09:50 +0100)]
libcpp: Enable P1949R7 for C++11 and up as it was a DR [PR100977]

Jonathan mentioned on IRC that:
"Accept P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) as
a Defect Report and apply the changes therein to the C++ working paper."
while I've actually implemented it only for -std={gnu,c}++{23,2b}.
As the C++98 rules were significantly different, I'm not trying to change
anything for C++98.

2021-11-30  Jakub Jelinek  <jakub@redhat.com>

PR c++/100977
* init.c (lang_defaults): Enable cxx23_identifiers for
-std={gnu,c}++{11,14,17,20} too.

* c-c++-common/cpp/ucnid-2011-1-utf8.c: Expect errors in C++.
* c-c++-common/cpp/ucnid-2011-1.c: Likewise.
* g++.dg/cpp/ucnid-4-utf8.C: Add missing space to dg-options.
* g++.dg/cpp23/normalize3.C: Enable for c++11 rather than just c++23.
* g++.dg/cpp23/normalize4.C: Likewise.
* g++.dg/cpp23/normalize5.C: Likewise.
* g++.dg/cpp23/normalize7.C: Expect errors rather than just warnings
for c++11 and up rather than just c++23.
* g++.dg/cpp23/ucnid-2-utf8.C: Expect errors even for c++11 .. c++20.

Jakub Jelinek [Tue, 30 Nov 2021 08:48:59 +0000 (09:48 +0100)]
c++: Small incremental tweak to source_location::current() folding

I've already committed the patch, but perhaps we shouldn't do it in cp_fold
where it will be folded even for warnings etc. and the locations might not
be the final yet.  This patch moves it to cp_fold_r so that it is done just
once for each function and just once for each static initializer.

2021-11-30  Jakub Jelinek  <jakub@redhat.com>

* cp-gimplify.c (cp_fold_r): Perform folding of
std::source_location::current() calls here...
(cp_fold): ... rather than here.

Roger Sayle [Tue, 30 Nov 2021 08:35:39 +0000 (08:35 +0000)]
x86_64: PR target/100711: Splitters for pandn

This patch addresses PR target/100711 by introducing define_split
patterns so that not/broadcast/pand may be simplified (by combine)
to broadcast/pandn.  This introduces two splitters: one for optimizing
pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on
TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI.  Each splitter
has its own new testcase.

I've also confirmed that not/broadcast/pandn is already getting
simplified to broadcast/pand by the middle-end optimizers.

2021-11-30  Roger Sayle  <roger@nextmovesoftware.com>
    Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
PR target/100711
* config/i386/sse.md (define_split): New splitters to simplify
not;vec_duplicate;and as vec_duplicate;andn.

gcc/testsuite/ChangeLog
PR target/100711
* gcc.target/i386/pr100711-1.c: New test case.
* gcc.target/i386/pr100711-2.c: New test case.

Richard Biener [Mon, 29 Nov 2021 11:26:39 +0000 (12:26 +0100)]
Only return after resetting type_param_spec_list

This fixes an apparent mistake in gfc_insert_parameter_exprs.

2021-11-29  Richard Biener  <rguenther@suse.de>

gcc/fortran/
* decl.c (gfc_insert_parameter_exprs): Only return after
resetting type_param_spec_list.

Richard Biener [Tue, 30 Nov 2021 07:19:24 +0000 (08:19 +0100)]
middle-end/103485 - fix conversion kind for vectors

This makes sure to use a VIEW_CONVERT_EXPR for converting
vector signedness in the -((int)x >> (prec - 1)) to (unsigned)x >> (prec - 1)
simplification.
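
For reference, the scalar form of the identity behind this simplification can be checked as below for prec = 32. This is only an illustration of the arithmetic (the function names are made up); the patch itself concerns which conversion is used when x is a vector. The sketch assumes that right-shifting a negative signed value is an arithmetic shift, which C++20 guarantees and earlier standards leave implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// -((int)x >> (prec - 1)) with prec = 32.
std::uint32_t sign_via_negated_shift(std::uint32_t x) {
  std::int32_t sx = static_cast<std::int32_t>(x);
  // sx >> 31 is 0 for non-negative sx and -1 for negative sx.
  return static_cast<std::uint32_t>(-(sx >> 31));
}

// (unsigned)x >> (prec - 1) with prec = 32.
std::uint32_t sign_via_logical_shift(std::uint32_t x) { return x >> 31; }

bool demo_forms_agree(std::uint32_t x) {
  return sign_via_negated_shift(x) == sign_via_logical_shift(x);
}
```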

2021-11-30  Richard Biener  <rguenther@suse.de>

PR middle-end/103485
* match.pd (-((int)x >> (prec - 1)) to (unsigned)x >> (prec - 1)):
Use VIEW_CONVERT_EXPR for vectors.

* gcc.dg/pr103485.c: New testcase.

Rasmus Villemoes [Fri, 29 Oct 2021 08:10:12 +0000 (10:10 +0200)]
libgcc: vxcrtstuff.c: add a few undefs

When vxcrtstuff.c was created, the set of #includes was copied from
crtstuff.c. But crtstuff.c also has a bunch of #undefs after the first
#include, because, as the comment says, including auto-host.h when
building objects that are meant for target is technically not
correct.

This manifests when I try to do a Canadian cross, with build=linux,
host=windows and target=vxworks, in that we pick up a

  #define caddr_t char *

from auto-host.h, which then of course creates a problem when we later
include a target header that has

  typedef char * caddr_t;

I assume that the #undefs in crtstuff.c have been added for similar
reasons.

These potentially problematic #defines all seem to be guarded by
#ifndef USED_FOR_TARGET, which tconfig.h defines before including
auto-host.h. So at first, it seems that one could avoid the problem
by simply removing the initial include of auto-host.h. Unfortunately,
we do need some of the things defined in auto-host.h within such an
ifndef USED_FOR_TARGET, namely the define of
HAVE_INITFINI_ARRAY_SUPPORT, which is what later causes
initfini-array.h to define USE_INITFINI_ARRAY. So as the next best
fix, just copy the #undefs from crtstuff.c.
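
The clash and the fix can be reproduced in miniature. The sketch below uses the hypothetical name demo_caddr_t instead of caddr_t so that it cannot collide with a real system header, and it compresses the three files involved into one translation unit:

```cpp
#include <cassert>

// Stands in for the host-side definition picked up from auto-host.h.
#define demo_caddr_t char *

// The fix: drop the host macro before any target header is seen.
// Without this line the typedef below would expand to
// "typedef char * char *;" and fail to compile.
#undef demo_caddr_t

// Stands in for the declaration in the target header.
typedef char *demo_caddr_t;

// Uses the typedef to show that it is a normal type name again.
bool demo_typedef_usable() {
  char buf[] = "ok";
  demo_caddr_t p = buf;
  return p[0] == 'o' && p[1] == 'k';
}
```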

libgcc/
* config/vxcrtstuff.c: Undefine caddr_t, pid_t, rlim_t,
ssize_t and vfork after including auto-host.h.

Richard Biener [Mon, 29 Nov 2021 14:20:38 +0000 (15:20 +0100)]
Avoid some -Wunreachable-code-ctrl

This cleans up unreachable code diagnosed by -Wunreachable-code-ctrl.
It largely follows the previous series but discovers a few extra
cases, namely dead code after break or continue or loops without
exits.

2021-11-29  Richard Biener  <rguenther@suse.de>

gcc/c/
* gimple-parser.c (c_parser_gimple_postfix_expression):
avoid unreachable code after break.

gcc/
* cfgrtl.c (skip_insns_after_block): Refactor code to
be more easily readable.
* expr.c (op_by_pieces_d::run): Remove unreachable
assert.
* sched-deps.c (sched_analyze): Remove unreachable
gcc_unreachable.
* sel-sched-ir.c (in_same_ebb_p): Likewise.
* tree-ssa-alias.c (nonoverlapping_refs_since_match_p):
Remove unreachable code.
* tree-vect-slp.c (vectorize_slp_instance_root_stmt):
Refactor to avoid unreachable loop iteration.
* tree.c (walk_tree_1): Remove unreachable break.
* vec-perm-indices.c (vec_perm_indices::series_p): Remove
unreachable return.

gcc/cp/
* parser.c (cp_parser_postfix_expression): Remove
unreachable code.
* pt.c (tsubst_expr): Remove unreachable breaks.

gcc/fortran/
* frontend-passes.c (gfc_expr_walker): Remove unreachable
break.
* scanner.c (skip_fixed_comments): Remove unreachable
gcc_unreachable.
* trans-expr.c (gfc_expr_is_variable): Refactor to make
control flow more obvious.

Kewen Lin [Tue, 30 Nov 2021 03:22:32 +0000 (21:22 -0600)]
rs6000: Remove builtin mask check from builtin_decl [PR102347]

As discussed in PR102347, builtin_decl is currently invoked very
early, while the function_decl for builtin functions is being
created.  At that time rs6000_builtin_mask can be wrong for
builtins used in #pragma/attribute target functions, though it
is updated properly later when LTO processes all nodes.

This patch aligns with the practice adopted by the i386 port, and
with r10-7462, by relaxing builtin mask checking in some places.

gcc/ChangeLog:

PR target/102347
* config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin mask
check.

gcc/testsuite/ChangeLog:

PR target/102347
* gcc.target/powerpc/pr102347.c: New test.

Kewen Lin [Tue, 30 Nov 2021 03:22:27 +0000 (21:22 -0600)]
rs6000: Modify the way for extra penalized cost

This patch follows the discussions here[1][2], where Segher
pointed out that the existing way of guarding the extra penalized
cost for strided/elementwise loads with a magic bound does
not scale.

Computing the penalty as nunits * stmt_cost can give a greatly
exaggerated cost; for V16QI on P8, for example, it is
16 * 20 = 320, which is why a bound was needed.  To make this
better and more readable, the penalized cost is simplified
to:

    unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
    unsigned extra_cost = nunits * adjusted_cost;

For V2DI/V2DF, it uses a penalized cost of 2 for each scalar load,
while for the other modes it uses 1.  This choice is mainly drawn
from the performance evaluations.  One possibly related factor:
the more units a vector is constructed from, the more
instructions are used, and the more chances there are to schedule
them well (or even run them in parallel when enough units are
available at that time), so it seems reasonable not to penalize
them more.
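
The arithmetic can be spelled out as two small functions. The function names are hypothetical; the formulas and the stmt_cost value of 20 come from the description above:

```cpp
#include <cassert>

// Previous scheme as described above: nunits * stmt_cost, which then
// needed a bound to stay reasonable.
unsigned old_extra_cost(unsigned nunits, unsigned stmt_cost) {
  return nunits * stmt_cost;
}

// Simplified scheme from this patch.
unsigned new_extra_cost(unsigned nunits) {
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}
```

For V16QI (nunits = 16) with a stmt_cost of 20, the old formula gives 320 and the new one gives 16; for V2DI/V2DF (nunits = 2) the new formula gives 4.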

The SPEC2017 evaluations on Power8/Power9/Power10 at option
sets O2-vect and Ofast-unroll show this change is neutral.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html

gcc/ChangeLog:

* config/rs6000/rs6000.c
(rs6000_cost_data::update_target_cost_per_stmt): Adjust the way to
compute extra penalized cost.  Remove useless parameter.
(rs6000_cost_data::rs6000_add_stmt_cost): Adjust the call to function
update_target_cost_per_stmt.

Kewen Lin [Tue, 30 Nov 2021 01:26:20 +0000 (19:26 -0600)]
visium: Revert commit r12-5332

This reverts commit b8ce19bb1a0592051e8f9a4c3252d12ae605b256
(r12-5332) "visium: Fix non-robust split condition in
define_insn_and_split".

Jeff found that newlib failed to build for the visium port since
r12-5332.  As Eric confirmed, the split conditions in the
related define_insn_and_splits are intentionally not joined
with the insn condition (&&), since the insn condition won't hold
after reload and the proposed concatenation would wrongly prevent
the splitting from ever happening.

Andrew MacLeod [Tue, 30 Nov 2021 00:53:50 +0000 (19:53 -0500)]
Don't reuse reference after potential resize.

When a new def chain is requested, any existing reference may no longer
be valid, so just use the object directly.
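
The general hazard can be sketched with a std::vector-backed table. This is an analogy under assumed names (demo_chain_table, demo_entry), not the code in gimple-range-gori.cc:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct demo_entry { int dep = 0; };

class demo_chain_table {
public:
  // Grows the table on demand; growing may reallocate the storage and
  // invalidate every reference handed out earlier.
  demo_entry &get(std::size_t index) {
    if (index >= entries_.size())
      entries_.resize(index + 1);
    return entries_[index];
  }

  // Unsafe pattern (not used here):
  //   demo_entry &src = get(name);
  //   get(dep);        // may resize, leaving src dangling
  //   src.dep = ...;   // undefined behaviour
  //
  // Safe pattern: look the entry up again after anything that can resize.
  void register_dependency(std::size_t name, std::size_t dep) {
    get(dep);
    get(name).dep = static_cast<int>(dep);
  }

  int dep_of(std::size_t name) { return get(name).dep; }

private:
  std::vector<demo_entry> entries_;
};

// Usage example: entry 1 exists first, then entry 500 forces a resize.
int demo_register() {
  demo_chain_table table;
  table.get(1);
  table.register_dependency(1, 500);
  return table.dep_of(1);
}
```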

PR tree-optimization/103467
* gimple-range-gori.cc (range_def_chain::register_dependency): Don't
use an object reference after a potential resize.

GCC Administrator [Tue, 30 Nov 2021 00:16:44 +0000 (00:16 +0000)]
Daily bump.

David Malcolm [Mon, 29 Nov 2021 16:47:47 +0000 (11:47 -0500)]
analyzer: further false leak fixes due to overzealous state merging [PR103217]

Commit r12-5424-gf573d35147ca8433c102e1721d8c99fc432cb44b fixed a false
positive from -Wanalyzer-malloc-leak due to overzealous state merging,
erroneously merging two different svalues bound to a particular part
of the store when one has sm-state.

A further case was discovered by the reporter of PR analyzer/103217,
which this patch fixes.  In this variant, different states have set
different fields of a struct, and on attempting to merge them, the
states have a different set of binding keys, leading to one state
having an svalue with sm-state, and its peer state having a NULL value
for that binding key.  The state merger code was erroneously treating
them as mergeable to "UNKNOWN".  This followup patch fixes things by
rejecting such mergers if the non-NULL svalue is not mergeable with
"UNKNOWN".

gcc/analyzer/ChangeLog:
PR analyzer/103217
* store.cc (binding_cluster::can_merge_p): For the "key is bound"
vs "key is not bound" merger case, check that the bound svalue
is mergeable before merging it to "unknown", rejecting the merger
otherwise.

gcc/testsuite/ChangeLog:
PR analyzer/103217
* gcc.dg/analyzer/pr103217-2.c: New test.
* gcc.dg/analyzer/pr103217-3.c: New test.
* gcc.dg/analyzer/pr103217-4.c: New test.
* gcc.dg/analyzer/pr103217-5.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Uros Bizjak [Mon, 29 Nov 2021 21:16:12 +0000 (22:16 +0100)]
i386: Fix and improve movhi_internal and movhf_internal some more.

An (*v,C) alternative can be added to movhi_internal to directly load
HImode constant 0 to xmm register. Also, V4SFmode moves can be used
for xmm->xmm moves instead of TImode moves when optimizing for size.
Fix invalid %vpinsrw insn template, which needs to duplicate %xmm
register for AVX targets.

Optimize GPR moves in movhf_internal in the same way as in movhi_internal.
Fix pinsrw and pextrw templates for AVX targets. Use sselog1
instead of sselog type.  Also, handle TARGET_SSE_PARTIAL_REG_DEPENDENCY
and TARGET_SSE_SPLIT_REGS targets.

2021-11-29  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

PR target/102811
* config/i386/i386.md (*movhi_internal): Introduce (*v,C) alternative.
Do not allocate non-GPR registers.  Optimize xmm->xmm moves when
optimizing for size.  Fix vpinsrw insn template.
(*movhf_internal): Fix pinsrw and pextrw insn templates for
AVX targets. Use sselog1 type instead of sselog.  Optimize GPR moves.
Optimize xmm->xmm moves for TARGET_SSE_PARTIAL_REG_DEPENDENCY
and TARGET_SSE_SPLIT_REGS targets.

Martin Sebor [Mon, 29 Nov 2021 20:13:30 +0000 (13:13 -0700)]
Prune out valid -Winfinite-recursion [PR103469].

gcc/testsuite/ChangeLog:
PR testsuite/103469
* c-c++-common/attr-retain-5.c: Prune out valid warning.
* c-c++-common/attr-retain-6.c: Same.
* c-c++-common/attr-retain-9.c: Same.

Eric Gallager [Mon, 29 Nov 2021 19:50:02 +0000 (14:50 -0500)]
Fix autoconf regeneration slip-up.

A stray _AC_FINALIZE somehow snuck into g:909b30a; this should fix it.

gcc/ChangeLog:

* configure: Re-regenerate.

Eric Gallager [Mon, 29 Nov 2021 18:24:12 +0000 (13:24 -0500)]
Make etags path used by build system configurable

This commit allows users to specify a path to their "etags"
executable for use when doing "make tags".
I based this patch off of this one from upstream automake:
https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29
This means that I just supplied variables that the user can override
for the tags programs, rather than having the configure scripts
actually check for them. I handle etags and ctags separately because
the intl subdirectory has separate targets for them. This commit
only affects the subdirectories that use handwritten Makefiles; the
ones that use automake will have to wait until we update the version
of automake used to be 1.16.4 or newer before they'll be fixed.

Addresses #103021

gcc/ChangeLog:

PR other/103021
* Makefile.in: Substitute CTAGS, ETAGS, and CSCOPE
variables. Use ETAGS variable in TAGS target.
* configure: Regenerate.
* configure.ac: Allow CTAGS, ETAGS, and CSCOPE
variables to be overridden.

gcc/ada/ChangeLog:

PR other/103021
* gcc-interface/Make-lang.in: Use ETAGS variable in
TAGS target.

gcc/c/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/cp/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/d/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/fortran/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/go/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/objc/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

gcc/objcp/ChangeLog:

PR other/103021
* Make-lang.in: Use ETAGS variable in TAGS target.

intl/ChangeLog:

PR other/103021
* Makefile.in: Use ETAGS variable in TAGS target,
CTAGS variable in CTAGS target, and MKID variable
in ID target.
* configure: Regenerate.
* configure.ac: Allow CTAGS, ETAGS, and MKID
variables to be overridden.

libcpp/ChangeLog:

PR other/103021
* Makefile.in: Use ETAGS variable in TAGS target.
* configure: Regenerate.
* configure.ac: Allow ETAGS variable to be overridden.

libiberty/ChangeLog:

PR other/103021
* Makefile.in: Use ETAGS variable in TAGS target.
* configure: Regenerate.
* configure.ac: Allow ETAGS variable to be overridden.

Paul A. Clarke [Thu, 21 Oct 2021 16:21:01 +0000 (11:21 -0500)]
rs6000: Add Power10 optimization for most _mm_movemask*

Power10 ISA added `vextract*` instructions which are realized in the
`vec_extractm` intrinsic.

Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
`_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
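
As a reminder of what these intrinsics compute, here is a portable scalar model of the `_mm_movemask_ps` result, in which bit i of the mask is the sign bit of lane i. It does not use `vec_extractm` or any Power-specific code, and the names movemask_ps_model and demo_movemask are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Collects the sign bits of four float lanes into bits 0..3.
int movemask_ps_model(const float lanes[4]) {
  int mask = 0;
  for (int i = 0; i < 4; ++i)
    if (std::signbit(lanes[i]))
      mask |= 1 << i;
  return mask;
}

// Lanes 0 and 2 have their sign bit set (note that -0.0f counts).
int demo_movemask() {
  const float v[4] = {-1.0f, 2.0f, -0.0f, 4.0f};
  return movemask_ps_model(v);
}
```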

2021-11-29  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
when _ARCH_PWR10.
* config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
(_mm_movemask_epi8): Likewise.

Richard Biener [Mon, 29 Nov 2021 11:24:30 +0000 (12:24 +0100)]
Fix RTL FE issue with premature return

This fixes an issue discovered by -Wunreachable-code-return.

2021-11-29  Richard Biener  <rguenther@suse.de>

* read-rtl-function.c (function_reader::read_rtx_operand):
Return only after resetting m_in_call_function_usage.

Patrick Palka [Mon, 29 Nov 2021 12:52:47 +0000 (07:52 -0500)]
c++: redundant explicit 'this' capture before C++20 [PR100493]

As described in detail in the PR, in C++20 implicitly capturing 'this'
via a '=' capture default is deprecated, and in C++17 adding an explicit
'this' capture alongside a '=' capture default is diagnosed as redundant
(and is strictly speaking ill-formed).  This means it's impossible to
write, in a forward-compatible way, a C++17 lambda that has a '=' capture
default and that also captures 'this' (implicitly or explicitly):

  [=] { this; }      // #1 deprecated in C++20, OK in C++17
     // GCC issues a -Wdeprecated warning in C++20 mode

  [=, this] { }      // #2 ill-formed in C++17, OK in C++20
     // GCC issues an unconditional warning in C++17 mode

This patch resolves this dilemma by downgrading the warning for #2 into
a -pedantic one.  In passing, move it into the -Wc++20-extensions class
of warnings and adjust its wording accordingly.
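
A complete example of form #2 in context, written for illustration (demo_counter and demo_lambda are hypothetical names, and this is not one of the new tests). It is valid C++20, and with this change GCC in C++17 mode is expected to accept it without a diagnostic unless -pedantic is given:

```cpp
#include <cassert>

struct demo_counter {
  int base = 10;

  int add(int n) {
    // '=' captures n by copy; 'this' is named explicitly so the lambda
    // does not rely on the implicit capture that C++20 deprecates.
    auto f = [=, this] { return base + n; };
    return f();
  }
};

int demo_lambda() {
  demo_counter c;
  return c.add(5);
}
```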

PR c++/100493

gcc/cp/ChangeLog:

* parser.c (cp_parser_lambda_introducer): In C++17, don't
diagnose a redundant 'this' capture alongside a by-copy
capture default unless -pedantic.  Move the diagnostic into
-Wc++20-extensions and adjust wording accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
* g++.dg/cpp1z/lambda-this8.C: New test.
* g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
to continue to diagnose redundant 'this' captures.

Roger Sayle [Mon, 29 Nov 2021 10:45:11 +0000 (10:45 +0000)]
x86_64: Improved V1TImode rotations by non-constant amounts.

This patch builds on the recent improvements to TImode rotations (and
Jakub's fixes to shldq/shrdq patterns).  Now that expanding a TImode
rotation can never fail, it is safe to allow general_operand constraints
on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
vec_extract via V2DImode, which avoids using memory and takes advantage
of vpextrq on recent hardware.

For the following test case:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }

GCC with -O2 -mavx2 would previously generate:

rotr:   vmovdqa %xmm0, -24(%rsp)
        movq    -16(%rsp), %rdx
        movl    %edi, %ecx
        xorl    %esi, %esi
        movq    -24(%rsp), %rax
        shrdq   %rdx, %rax
        shrq    %cl, %rdx
        testb   $64, %dil
        cmovne  %rdx, %rax
        cmovne  %rsi, %rdx
        negl    %ecx
        xorl    %edi, %edi
        andl    $127, %ecx
        vmovq   %rax, %xmm2
        movq    -24(%rsp), %rax
        vpinsrq $1, %rdx, %xmm2, %xmm1
        movq    -16(%rsp), %rdx
        shldq   %rax, %rdx
        salq    %cl, %rax
        testb   $64, %cl
        cmovne  %rax, %rdx
        cmovne  %rdi, %rax
        vmovq   %rax, %xmm3
        vpinsrq $1, %rdx, %xmm3, %xmm0
        vpor    %xmm1, %xmm0, %xmm0
        ret

with this patch, we now generate:

rotr: movl    %edi, %ecx
        vpextrq $1, %xmm0, %rax
        vmovq   %xmm0, %rdx
        shrdq   %rax, %rdx
        vmovq   %xmm0, %rsi
        shrdq   %rsi, %rax
        andl    $64, %ecx
        movq    %rdx, %rsi
        cmovne  %rax, %rsi
        cmove   %rax, %rdx
        vmovq   %rsi, %xmm0
        vpinsrq $1, %rdx, %xmm0, %xmm0
        ret

2021-11-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
* config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
on QImode shift amounts from const_int_operand to general_operand.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-rotate.c: New test case.

2 years agoRemove unreachable gcc_unreachable () at the end of functions
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Remove unreachable gcc_unreachable () at the end of functions

It seems to be a style to place gcc_unreachable () after a
switch that handles all cases with every case returning.
Those are unreachable (well, yes!), so they will be elided
at CFG construction time and the middle-end will place
another __builtin_unreachable "after" them to note the
path doesn't lead to a return when the function is not declared
void.

So IMHO those explicit gcc_unreachable () serve no purpose; if anything,
they could be replaced by a comment.  And since all of these cases involve
switches, not handling a case or not returning will likely cause some
diagnostic to be emitted, which is better than running into an ICE only
at runtime.
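
The pattern being cleaned up looks roughly like the sketch below. The Op
enumeration and the evaluate function are hypothetical and only stand in for
the real GCC functions listed in the ChangeLog.

```cpp
// Hypothetical stand-in for a GCC function that switches over a code.
enum class Op { Add, Sub };

int evaluate(Op op, int a, int b)
{
  switch (op)
    {
    case Op::Add:
      return a + b;
    case Op::Sub:
      return a - b;
    default:
      return 0;
    }
  // Every path through the switch has already returned, so a trailing
  // gcc_unreachable () placed here could never be executed.
}
```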

2021-11-24  Richard Biener  <rguenther@suse.de>

* tree.h (reverse_storage_order_for_component_p): Remove
spurious gcc_unreachable.
* cfganal.c (dfs_find_deadend): Likewise.
* fold-const-call.c (fold_const_logb): Likewise.
(fold_const_significand): Likewise.
* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
Likewise.

gcc/c-family/
* c-format.c (check_format_string): Remove spurious
gcc_unreachable.

2 years agoRemove unreachable returns
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Remove unreachable returns

This removes unreachable return statements as diagnosed by
the -Wunreachable-code patch.  Some cases are more obviously
an improvement than others - in fact some may get you the idea
to replace them with gcc_unreachable () instead, leading to
cases of the 'Remove unreachable gcc_unreachable () at the end
of functions' patch.

2021-11-25  Richard Biener  <rguenther@suse.de>

* vec.c (qsort_chk): Do not return the void return value
from the noreturn qsort_chk_error.
* ccmp.c (expand_ccmp_expr_1): Remove unreachable return.
* df-scan.c (df_ref_equal_p): Likewise.
* dwarf2out.c (is_base_type): Likewise.
(add_const_value_attribute): Likewise.
* fixed-value.c (fixed_arithmetic): Likewise.
* gimple-fold.c (gimple_fold_builtin_fputs): Likewise.
* gimple-ssa-strength-reduction.c (stmt_cost): Likewise.
* graphite-isl-ast-to-gimple.c
(gcc_expression_from_isl_expr_op): Likewise.
(gcc_expression_from_isl_expression): Likewise.
* ipa-fnsummary.c (will_be_nonconstant_expr_predicate):
Likewise.
* lto-streamer-in.c (lto_input_mode_table): Likewise.

gcc/c-family/
* c-opts.c (c_common_post_options): Remove unreachable return.
* c-pragma.c (handle_pragma_target): Likewise.
(handle_pragma_optimize): Likewise.

gcc/c/
* c-typeck.c (c_tree_equal): Remove unreachable return.
* c-parser.c (get_matching_symbol): Likewise.

libgomp/
* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable
return.

2 years agoOptimize _Float16 usage for non AVX512FP16.
liuhongt [Mon, 29 Nov 2021 02:01:42 +0000 (10:01 +0800)]
Optimize _Float16 usage for non AVX512FP16.

1. No memory is needed to move HI/HFmode between GPR and SSE registers
under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
initialization could be eliminated.

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (inline_secondary_memory_needed): HImode
move between GPR and SSE registers is supported under
TARGET_SSE2 and above.
* config/i386/i386.md (extendhfsf2): Optimize expander.
(truncsfhf2): Ditto.
* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
align with V8HImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102811-2.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
scan-assembler-times.

2 years agoFix regression introduced by r12-5536.
liuhongt [Fri, 26 Nov 2021 15:24:20 +0000 (23:24 +0800)]
Fix regression introduced by r12-5536.

There're several failures:
1.  unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
%vpextrw should be used in output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives
are marked as TYPE_SSELOG.
use TYPE_SSELOG1 instead.

Also this patch fixes a typo and some latent bugs which are related to
moving HImode from/to sse register w/o TARGET_AVX512FP16.

gcc/ChangeLog:

PR target/102811
PR target/103463
* config/i386/i386.c (ix86_secondary_reload): Without
TARGET_SSE4_1, General register is needed to move HImode from
sse register to memory.
* config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
pextrw in output templates.
* config/i386/i386.md (movhi_internal): Ditto, also fix typo of
MEM_P (operands[1]) and adjust mode/prefix/type attribute for
alternatives related to sse register.

2 years agotree-optimization/103458 - avoid creating new loops in CD-DCE
Richard Biener [Mon, 29 Nov 2021 08:15:47 +0000 (09:15 +0100)]
tree-optimization/103458 - avoid creating new loops in CD-DCE

When creating forwarders in CD-DCE we have to avoid creating loops
where we formerly did not consider those because of abnormal
predecessors.  At this point simply excuse us when there are any
abnormal predecessors.

2021-11-29  Richard Biener  <rguenther@suse.de>

PR tree-optimization/103458
* tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not
create forwarders for blocks with abnormal predecessors.

* gcc.dg/torture/pr103458.c: New testcase.

2 years agoRestore can_be_invalidated_p semantics to before refactoring
Richard Biener [Fri, 26 Nov 2021 07:50:24 +0000 (08:50 +0100)]
Restore can_be_invalidated_p semantics to before refactoring

This restores the semantics of can_be_invalidated_p to the original
semantics of the function this was split out from tree-ssa-uninit.c.
The current semantics only ever look at the first predicate which
cannot be correct.

2021-11-26  Richard Biener  <rguenther@suse.de>

* gimple-predicate-analysis.cc (can_be_invalidated_p):
Restore semantics to the one before the split from
tree-ssa-uninit.c.

2 years agolibgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target
Rasmus Villemoes [Mon, 11 Oct 2021 12:37:05 +0000 (14:37 +0200)]
libgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target

Since commit 78e49fb1bc (Introduce vxworks specific crtstuff support),
the generic crtbegin.o/crtend.o have been unnecessary to build. So
remove them from extra_parts.

This is effectively a revert of commit 9a5b8df70 (libgcc: add
crt{begin,end} for powerpc-wrs-vxworks target).

libgcc/
* config.host (powerpc-wrs-vxworks): Do not add crtbegin.o and
crtend.o to extra_parts.

2 years agors6000/test: Add emulated gather test case
Kewen Lin [Mon, 29 Nov 2021 01:59:59 +0000 (19:59 -0600)]
rs6000/test: Add emulated gather test case

As verified, the emulated gather capability of vectorizer
(r12-2733) can help to speed up SPEC2017 510.parest_r on
Power8/9/10 by 5% ~ 9% with option sets Ofast unroll and
Ofast lto.

This patch is to add a test case similar to the one in i386
to add testing coverage for 510.parest_r hotspots.

btw, different from the one in i386, this uses unsigned int
as INDEXTYPE since the unpack support for unsigned int
(r12-3134) also matters for the hotspots vectorization.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vect-gather-1.c: New test.

2 years agoFix PR 19089: Environment variable TMP may yield gcc: abort
Andrew Pinski [Sun, 28 Nov 2021 02:16:50 +0000 (18:16 -0800)]
Fix PR 19089: Environment variable TMP may yield gcc: abort

Even though I cannot reproduce the ICE any more, this is still
a bug. We check already to see if we can access the directory
but never check to see if the path is actually a directory.

This adds the check and now we reject the file as not usable
as a tmp directory.
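
A minimal sketch of the combined check, assuming a POSIX system.  The
function name usable_tmp_dir is hypothetical and this is not the actual
libiberty code; it only shows the access check followed by the directory
check described above.

```cpp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns true only if PATH is accessible
// and also names a directory.
bool usable_tmp_dir(const char *path)
{
  if (path == nullptr)
    return false;

  // The pre-existing kind of check: can we read, write and search it?
  if (access(path, R_OK | W_OK | X_OK) != 0)
    return false;

  // The additional check: is it really a directory?
  struct stat st;
  return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```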

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

libiberty/ChangeLog:

* make-temp-file.c (try_dir): Check to see if the dir
is actually a directory.

2 years agoDaily bump.
GCC Administrator [Mon, 29 Nov 2021 00:16:16 +0000 (00:16 +0000)]
Daily bump.

2 years agoFix PR 62157: distclean in libsanitizer not working
Andrew Pinski [Sun, 28 Nov 2021 01:14:59 +0000 (01:14 +0000)]
Fix PR 62157: distclean in libsanitizer not working

So what is happening is DIST_SUBDIRS contains the conditional
directories which is wrong, so we need to force DIST_SUBDIRS
to be the same as SUBDIRS as recommended by the automake manual.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Also now make distclean works inside libsanitizer directory.

libsanitizer/ChangeLog:

PR sanitizer/62157
* Makefile.am: Force DIST_SUBDIRS to be SUBDIRS.
* Makefile.in: Regenerate.
* asan/Makefile.in: Likewise.
* hwasan/Makefile.in: Likewise.
* interception/Makefile.in: Likewise.
* libbacktrace/Makefile.in: Likewise.
* lsan/Makefile.in: Likewise.
* sanitizer_common/Makefile.in: Likewise.
* tsan/Makefile.in: Likewise.
* ubsan/Makefile.in: Likewise.

2 years agoCompare guessed and feedback frequencies during profile feedback stream-in
Jan Hubicka [Sun, 28 Nov 2021 18:42:45 +0000 (19:42 +0100)]
Compare guessed and feedback frequencies during profile feedback stream-in

This patch adds simple code to dump and compare frequencies of basic blocks
read from the profile feedback and frequencies guessed statically.
It dumps basic blocks in the order of decreasing frequencies from feedback
along with guessed frequencies and histograms.

It makes it possible to spot basic blocks in hot regions that are considered
cold by guessed profile or vice versa.

I am trying to figure out how realistic our profile estimate is compared to
the read one on exchange2 (looking again into PR98782).  There IRA now places
spills into hot regions of code while with older (and worse) profile it did not.
Catch is that the function is very large and has 9 nested loops, so it is hard
to figure out how to improve the profile estimate and/or IRA.

gcc/ChangeLog:

2021-11-28  Jan Hubicka  <hubicka@ucw.cz>

* profile.c: Include sreal.h
(struct bb_stats): New.
(cmp_stats): New function.
(compute_branch_probabilities): Output bb stats.

2 years agoImprove -fprofile-report
Jan Hubicka [Sun, 28 Nov 2021 18:25:33 +0000 (19:25 +0100)]
Improve -fprofile-report

Profile-report was never properly updated after switch to new profile
representation.  This patch fixes the way profile mismatches are calculated:
we used to collect separately count and freq mismatches, while now we have
only counts & probabilities.  So we verify
 - in count: that total count of incoming edges is close to actual count of
   the BB
 - out prob: that total sum of outgoing edge probabilities is close
   to 1 (except for BB containing noreturn calls or EH).

Moreover I added dumping of absolute data which is useful to plot them: with
Martin Liska we plan to set up regular testing so we keep the optimizers'
profile updates a bit under control.

Finally I added both static and dynamic stats about mismatches - static one is
simply number of inconsistencies in the cfg while dynamic is scaled by the
profile - I think in order to keep an eye on optimizers the first number is
quite relevant, while when tracking why code quality regressed the second
number matters more.

2021-11-28  Jan Hubicka  <hubicka@ucw.cz>

* cfghooks.c: Include sreal.h, profile.h.
(profile_record_check_consistency): Fix checking of count consistency;
record also dynamic mismatches.
* cfgrtl.c (rtl_account_profile_record): Similarly.
* tree-cfg.c (gimple_account_profile_record): Likewise.
* cfghooks.h (struct profile_record): Remove num_mismatched_freq_in,
num_mismatched_freq_out, turn time to double, add
dyn_mismatched_prob_out, dyn_mismatched_count_in,
num_mismatched_prob_out; remove num_mismatched_count_out.
* passes.c (account_profile_1): New function.
(account_profile_in_list): New function.
(pass_manager::dump_profile_report): Rewrite.
(execute_one_ipa_transform_pass): Check profile consistency after
running all passes.
(execute_all_ipa_transforms): Remove cfun test; record all transform
methods.
(execute_one_pass): Fix collecting of profile stats.

2 years agolibstdc++: Implement std::byteswap for C++23
Jakub Jelinek [Sun, 28 Nov 2021 15:32:24 +0000 (16:32 +0100)]
libstdc++: Implement std::byteswap for C++23

This patch attempts to implement P1272R4 (except for the std::bit_cast
changes in there which seem quite unrelated to this and will need to be
fixed on the compiler side).
While at least for GCC __builtin_bswap{16,32,64,128} should work fine
in constant expressions, I wonder about other compilers, so I'm using
a fallback implementation for constexpr evaluation always.
If you think that is unnecessary, I can drop the
__cpp_if_consteval >= 202106L &&
if !consteval
  {
and
  }
and reformat.
The fallback implementation is an attempt to make it work even for integral
types that don't have number of bytes divisible by 2 or when __CHAR_BIT__
is e.g. 16.
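
The idea of such a fallback can be sketched as a byte-by-byte loop that does
not depend on the size being a power of two or on CHAR_BIT being 8.  This is
only an illustration under those assumptions: byteswap_fallback is a
hypothetical name, the code is not the libstdc++ implementation, and it does
not handle bool.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical generic fallback: reverse the bytes of an integral value
// one byte at a time.
template <typename T>
constexpr T byteswap_fallback(T value) noexcept
{
  static_assert(std::is_integral<T>::value, "integral types only");
  using U = std::make_unsigned_t<T>;

  U in = static_cast<U>(value);
  U out = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    {
      // Move the lowest remaining byte of IN to the bottom of OUT.
      out = static_cast<U>((out << CHAR_BIT) | (in & UCHAR_MAX));
      in = static_cast<U>(in >> CHAR_BIT);
    }
  return static_cast<T>(out);
}

// The loop is usable in constant expressions.
static_assert(byteswap_fallback<std::uint32_t>(0x12345678u) == 0x78563412u,
              "32-bit swap");
static_assert(byteswap_fallback<std::uint16_t>(0x1234) == 0x3412,
              "16-bit swap");
```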

2021-11-28  Jakub Jelinek  <jakub@redhat.com>

* include/std/bit (__cpp_lib_byteswap, byteswap): Define.
* include/std/version (__cpp_lib_byteswap): Define.
* testsuite/26_numerics/bit/bit.byteswap/byteswap.cc: New test.
* testsuite/26_numerics/bit/bit.byteswap/version.cc: New test.

2 years agod: fix thinko in optimize attr parsing
Martin Liska [Sun, 28 Nov 2021 08:39:40 +0000 (09:39 +0100)]
d: fix thinko in optimize attr parsing

gcc/d/ChangeLog:

* d-attribs.cc (parse_optimize_options): Fix thinko.

2 years agoDaily bump.
GCC Administrator [Sun, 28 Nov 2021 00:16:20 +0000 (00:16 +0000)]
Daily bump.

2 years agoFix typo in t-dimode
John David Anglin [Sat, 27 Nov 2021 21:47:47 +0000 (21:47 +0000)]
Fix typo in t-dimode

2021-11-27  John David Anglin  <danglin@gcc.gnu.org>

libgcc/ChangeLog:

* config/pa/t-dimode (lib2difuncs): Fix typo.

2 years agojit: Change printf specifiers for size_t to %zu
Petter Tomner [Sat, 27 Nov 2021 14:52:15 +0000 (15:52 +0100)]
jit: Change printf specifiers for size_t to %zu

Change four occurrences of %ld specifier for size_t to %zu for clean 32bit builds.
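
As an illustration of why this matters: size_t is not long on every target,
so %ld can mismatch, while %zu is the specifier defined for size_t.  The
helper below is a hypothetical sketch, not code from libgccjit.c.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a size_t portably with %zu.
std::string format_count(std::size_t n)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "%zu", n);
  return buf;
}
```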

Signed-off-by
2021-11-27 Petter Tomner <tomner@kth.se>

gcc/jit/
* libgccjit.c: %ld -> %zu

2 years agox86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]
Jakub Jelinek [Sat, 27 Nov 2021 12:02:06 +0000 (13:02 +0100)]
x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]

The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
patterns don't properly describe what the instructions do.  One thing
is left out, in particular that there is initial count &= 63 for
sh{l,r}dq and initial count &= 31 for sh{l,r}d{l,w}.  And another thing
not described properly, in particular the behavior when count (after the
masking) is 0.  The pattern says it is e.g.
res = (op0 << op2) | (op1 >> (64 - op2))
but that triggers UB on op1 >> 64.  For op2 0 we actually want
res = (op0 << op2) | 0
When constants are propagated to these patterns during RTL optimizations,
both such problems trigger wrong-code issues.
This patch represents the patterns as e.g.
res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63)))
so there is both the initial masking and op2 == 0 behavior results in
zero being ored.
The patch introduces alternate patterns for constant op2 where
simplify-rtx.c will fold those expressions into simple numbers,
and define_insn_and_split pre-reload splitter for how the patterns
looked before into the new form, so that it can pattern match during
combine even computations that assumed the shift amount will be in
the range of 1 .. bitsize-1.
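
The corrected semantics for the 64-bit shld case can be modelled in C++ as
follows.  This is a sketch of the formula quoted above, not the RTL pattern
itself; shld64_model is a hypothetical name, and unsigned __int128 is assumed
to be available (a GCC/Clang extension on 64-bit targets).

```cpp
#include <cstdint>

// Hypothetical model of shldq: OP0 is the destination, OP1 supplies the
// bits that are shifted in from the right.
std::uint64_t shld64_model(std::uint64_t op0, std::uint64_t op1,
                           unsigned count)
{
  unsigned n = count & 63;  // the initial masking done by the instruction

  // Widening OP1 to 128 bits makes the shift by 64 (when n == 0) well
  // defined; it then contributes zero, so the result is just OP0.
  unsigned __int128 wide = op1;
  return (op0 << n) | static_cast<std::uint64_t>(wide >> (64 - n));
}
```

With a count of 0 or 64 the model returns op0 unchanged, which is the
behaviour the old pattern failed to describe.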

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/103431
* config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd):
Change insn pattern to accurately describe the instructions.
(*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New
define_insn patterns.
(*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New
define_insn_and_split patterns.
(*ashl<dwi>3_doubleword_mask, *ashl<dwi>3_doubleword_mask_1,
*<insn><dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask_1,
ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust
splitters for x86_{,64_}sh{l,r}d pattern changes.

* gcc.dg/pr103431.c: New test.

2 years agobswap: Fix UB in find_bswap_or_nop_finalize [PR103435]
Jakub Jelinek [Sat, 27 Nov 2021 12:00:55 +0000 (13:00 +0100)]
bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]

On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
because it is a bswap sequence with load and known to be 0:
  /* Find real size of result (highest non-zero byte).  */
  if (n->base_addr)
    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
  else
    rsize = n->range;
The shifts then shift uint64_t by 64 bits.  For this case mask is 0
and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
both nop and bswap and callers will prefer nop.
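
The general hazard is that shifting a 64-bit value by 64 is undefined in C
and C++.  A guarded shift along the following lines avoids it; shr_or_zero
is a hypothetical helper used only for illustration and is not the code in
find_bswap_or_nop_finalize.

```cpp
#include <cstdint>

// Hypothetical helper: right shift that yields 0 instead of invoking
// undefined behaviour when the shift count reaches the type width.
constexpr std::uint64_t shr_or_zero(std::uint64_t value, unsigned bits)
{
  return bits >= 64 ? 0 : value >> bits;
}

static_assert(shr_or_zero(0xFF00, 8) == 0xFF, "ordinary shift");
static_assert(shr_or_zero(~std::uint64_t{0}, 64) == 0, "full-width shift");
```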

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103435
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
case.

2 years ago[Committed] Fix new ivopts-[89].c test cases for -m32.
Roger Sayle [Sat, 27 Nov 2021 10:13:31 +0000 (10:13 +0000)]
[Committed] Fix new ivopts-[89].c test cases for -m32.

2021-11-27  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32.
* gcc.dg/tree-ssa/ivopts-9.c: Likewise.

2 years agoDaily bump.
GCC Administrator [Sat, 27 Nov 2021 00:16:19 +0000 (00:16 +0000)]
Daily bump.

2 years agoipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441)
Martin Jambor [Sat, 27 Nov 2021 00:00:56 +0000 (01:00 +0100)]
ipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441)

I forgot that IPA passes before ipa-inline must not return
TODO_cleanup_cfg from their transformation function because ordinary
CFG cleanup does not remove call graph edges associated with removed
call statements but must use
delete_unreachable_blocks_update_callgraph instead.  This patch fixes
that error.

gcc/ChangeLog:

2021-11-26  Martin Jambor  <mjambor@suse.cz>

PR ipa/103441
* ipa-prop.c (ipcp_transform_function): Call
delete_unreachable_blocks_update_callgraph instead of returning
TODO_cleanup_cfg.

2 years agolibstdc++: Fix test that fails in C++20 mode
Jonathan Wakely [Fri, 26 Nov 2021 22:53:02 +0000 (22:53 +0000)]
libstdc++: Fix test that fails in C++20 mode

This test was written to verify that the LWG 3265 changes work. But
those changes were superseded by LWG 3435, and the test is now incorrect
according to the current draft. The assignment operator is now
constrained to also require convertibility, which makes the test fail.

Change the Iter type to be convertible from int*, but make it throw an
exception if that conversion is used. Change the test from compile-only
to run, so we verify that the exception isn't thrown.

libstdc++-v3/ChangeLog:

* testsuite/24_iterators/move_iterator/dr3265.cc: Fix test to
account for LWG 3435 resolution.

2 years agolibstdc++: Fix trivial relocation for constexpr std::vector
Jonathan Wakely [Fri, 26 Nov 2021 21:34:17 +0000 (21:34 +0000)]
libstdc++: Fix trivial relocation for constexpr std::vector

When implementing constexpr std::vector I added a check for constant
evaluation in vector::_S_use_relocate(), so that we would not try to relocate
trivial objects by using memmove. But I put it in the constexpr function
that decides whether to relocate or not, and calls to that function are
always constant evaluated. This had the effect of disabling relocation
entirely, even in non-constexpr vectors.

This removes the check in _S_use_relocate() and modifies the actual
relocation algorithm, __relocate_a_1, to use the non-trivial
implementation instead of memmove when called during constant
evaluation.

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (__relocate_a_1): Do not use
memmove during constant evaluation.
* include/bits/stl_vector.h (vector::_S_use_relocate()): Do not
check is_constant_evaluated in always-constexpr function.

2 years agolibstdc++: Remove workaround for FE bug in std::tuple [PR96592]
Jonathan Wakely [Fri, 26 Nov 2021 17:46:47 +0000 (17:46 +0000)]
libstdc++: Remove workaround for FE bug in std::tuple [PR96592]

The FE bug was fixed, so we don't need this workaround now.

libstdc++-v3/ChangeLog:

PR libstdc++/96592
* include/std/tuple (tuple::is_constructible): Remove.

2 years agoFortran: improve check of arguments to the RESHAPE intrinsic
Harald Anlauf [Fri, 26 Nov 2021 20:00:35 +0000 (21:00 +0100)]
Fortran: improve check of arguments to the RESHAPE intrinsic

gcc/fortran/ChangeLog:

PR fortran/103411
* check.c (gfc_check_reshape): Improve check of size of source
array for the RESHAPE intrinsic against the given shape when pad
is not given, and shape is a parameter.  Try other simplifications
of shape.

gcc/testsuite/ChangeLog:

PR fortran/103411
* gfortran.dg/pr68153.f90: Adjust test to improved check.
* gfortran.dg/reshape_7.f90: Likewise.
* gfortran.dg/reshape_9.f90: New test.

2 years agolibitm: Fix bootstrap for targets without HAVE_ELF_STYLE_WEAKREF.
Iain Sandoe [Sun, 21 Nov 2021 10:49:29 +0000 (10:49 +0000)]
libitm: Fix bootstrap for targets without HAVE_ELF_STYLE_WEAKREF.

Recent improvements to null address warnings notice that for
targets that do not support HAVE_ELF_STYLE_WEAKREF the dummy stub
implementation of __cxa_get_globals() means that the address can
never be null.

Fixed by removing the test for such targets.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libitm/ChangeLog:

* eh_cpp.cc (GTM::gtm_thread::init_cpp_exceptions): If the
target does not support HAVE_ELF_STYLE_WEAKREF then do not
try to test the __cxa_get_globals against NULL.

2 years agotree-object-size: Abstract object_sizes array
Siddhesh Poyarekar [Fri, 26 Nov 2021 03:14:58 +0000 (08:44 +0530)]
tree-object-size: Abstract object_sizes array

Put all accesses to object_sizes behind functions so that we can add
dynamic capability more easily.

gcc/ChangeLog:

* tree-object-size.c (object_sizes_grow, object_sizes_release,
object_sizes_unknown_p, object_sizes_get, object_size_set_force,
object_sizes_set): New functions.
(addr_object_size, compute_builtin_object_size,
expr_object_size, call_object_size, unknown_object_size,
merge_object_sizes, plus_stmt_object_size,
cond_expr_object_size, collect_object_sizes_for,
check_for_plus_in_loops_1, init_object_sizes,
fini_object_sizes): Adjust.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2 years agotree-object-size: Replace magic numbers with enums
Siddhesh Poyarekar [Fri, 26 Nov 2021 03:14:21 +0000 (08:44 +0530)]
tree-object-size: Replace magic numbers with enums

A simple cleanup to allow inserting dynamic size code more easily.

gcc/ChangeLog:

* tree-object-size.c: New enum.
(object_sizes, computed, addr_object_size,
compute_builtin_object_size, expr_object_size, call_object_size,
merge_object_sizes, plus_stmt_object_size,
collect_object_sizes_for, init_object_sizes, fini_object_sizes,
object_sizes_execute): Replace magic numbers with enums.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2 years agoivopts: Improve code generated for very simple loops.
Roger Sayle [Fri, 26 Nov 2021 17:22:10 +0000 (17:22 +0000)]
ivopts: Improve code generated for very simple loops.

This patch tidies up the code that GCC generates for simple loops,
by selecting/generating a simpler loop bound expression in ivopts.
The original motivation came from looking at the following loop (from
gcc.target/i386/pr90178.c)

int *find_ptr (int* mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];
  return 0;
}

which GCC currently compiles to:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        leal    -1(%rsi), %ecx
        leaq    4(%rdi,%rcx,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

Notice the relatively complex leal/leaq instructions, that result
from ivopts using the following expression for the loop bound:
inv_expr 2:     ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
* 4 + (unsigned long) mem_9(D)) + 4

which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
i.e. (sz - 1), and the logic in cand_value_at determining the bound
as BASE + NITERS*STEP at the start of the final iteration and as
BASE + NITERS*STEP + STEP at the end of the final iteration.

Ideally, we'd like the middle-end optimizers to simplify
BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially
when NITERS already has the form BOUND-1, but with type conversions
and possible overflow to worry about, the above "inv_expr 2" is the
best that can be done by fold (without additional context information).
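
The following sketch shows why the two forms cannot be folded into each
other without context.  It evaluates the offset part of "inv_expr 2" (the
part added to mem) in both forms, assuming a 64-bit unsigned long and a
4-byte int; the function names are hypothetical.

```cpp
#include <cstdint>

// Offset from the old bound:
//   (unsigned long) ((unsigned int) sz + 4294967295) * 4 + 4
std::uint64_t old_bound_offset(int sz)
{
  // The addition is done in 32 bits and wraps modulo 2^32.
  std::uint32_t niters = static_cast<std::uint32_t>(sz) + 4294967295u;
  return static_cast<std::uint64_t>(niters) * 4 + 4;
}

// Offset from the simplified bound: (unsigned long) sz * 4
std::uint64_t new_bound_offset(int sz)
{
  return static_cast<std::uint64_t>(sz) * 4;
}
```

For sz >= 1 both functions agree (for example, both give 20 for sz == 5),
but for sz == 0 the old form yields 0x400000000 while the new form yields 0.
The two are therefore only interchangeable when it is known that the loop
body runs at least once.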

This patch improves ivopts' cand_value_at by instead of using just
the tree expression for NITERS, passing the data structure that
explains how that expression was derived.  This allows us to peek
under the surface to check that NITERS+1 doesn't overflow, and in
this patch to use the SSA_NAME already holding the required value.

In the motivating loop above, inv_expr 2 now becomes:
(unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)

And as a result, on x86_64 we now generate:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

This improvement required one minor tweak to GCC's testsuite for
gcc.dg/wrapped-binop-simplify.c, where we again generate better
code, and therefore no longer find as many optimization opportunities
in later passes (vrp2).

Previously:

void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++) {
    out[i] = in[i];
  }
}

on x86_64 generated:
v1: testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
.L3: movq    (%rdi,%rax,8), %rcx
        movq    %rcx, (%rsi,%rax,8)
        addq    $1, %rax
        cmpq    %rax, %rdx
        jne     .L3
.L1: ret

and now instead generates:
v1: testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
        leaq    0(,%rdx,8), %rcx
.L3: movq    (%rdi,%rax), %rdx
        movq    %rdx, (%rsi,%rax)
        addq    $8, %rax
        cmpq    %rax, %rcx
        jne     .L3
.L1: ret

2021-11-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
tree_niter_desc* argument instead of just a tree for NITER.
If we require the iv candidate value at the end of the final
loop iteration, try using the original loop bound as the
NITER for sufficiently simple loops.
(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite/ChangeLog
* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
* gcc.dg/tree-ssa/ivopts-5.c: New test case.
* gcc.dg/tree-ssa/ivopts-6.c: New test case.
* gcc.dg/tree-ssa/ivopts-7.c: New test case.
* gcc.dg/tree-ssa/ivopts-8.c: New test case.
* gcc.dg/tree-ssa/ivopts-9.c: New test case.

2 years agolibstdc++: Ensure dg-add-options comes after dg-options
Jonathan Wakely [Fri, 26 Nov 2021 15:10:43 +0000 (15:10 +0000)]
libstdc++: Ensure dg-add-options comes after dg-options

This is what the docs say is required.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic_float/1.cc: Reorder directives.

2 years agolibstdc++: Fix dg-do directive for tests supposed to be run
Jonathan Wakely [Fri, 26 Nov 2021 13:58:07 +0000 (13:58 +0000)]
libstdc++: Fix dg-do directive for tests supposed to be run

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_map/modifiers/move_assign.cc:
Change dg-do compile to run.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
Likewise.

2 years agolibstdc++: Remove redundant xfail selectors in dg-do compile tests
Jonathan Wakely [Fri, 26 Nov 2021 13:55:32 +0000 (13:55 +0000)]
libstdc++: Remove redundant xfail selectors in dg-do compile tests

An 'xfail' selector means the test is expected to fail at runtime, so is
ignored for a compile-only test. The way to mark a compile-only test as
failing is with dg-error (which these already do).

libstdc++-v3/ChangeLog:

* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
Remove xfail selector.
* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
Likewise.
Likewise.
* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
Likewise.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
Likewise.
* testsuite/23_containers/span/101411.cc: Likewise.
* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Likewise.
* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc:
Likewise.
* testsuite/25_algorithms/equal/constexpr_neg.cc: Likewise.
* testsuite/25_algorithms/equal/debug/constexpr_neg.cc: Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc:
Likewise.

2 years agod: fix ASAN in option processing
Martin Liska [Thu, 25 Nov 2021 13:41:50 +0000 (14:41 +0100)]
d: fix ASAN in option processing

Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

* d-attribs.cc (parse_optimize_options): Check index before
accessing cl_options.
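
The kind of check described above can be sketched as follows.  This is a
minimal illustration, not the code from d-attribs.cc: option_entry,
option_table, kOptionCount and lookup_flags are hypothetical stand-ins
for cl_options and the code that indexes it.

```cpp
#include <cstddef>

// Hypothetical stand-in for an option table such as cl_options.
struct option_entry { unsigned flags; };

constexpr std::size_t kOptionCount = 3;
constexpr option_entry option_table[kOptionCount] = {{1u}, {2u}, {4u}};

// Return the flags for INDEX, or 0 when INDEX is outside the table.
// The index is validated before the table is read, so an out-of-range
// value never causes a read past the end of the array.
constexpr unsigned lookup_flags(std::size_t index)
{
  return index < kOptionCount ? option_table[index].flags : 0u;
}

static_assert(lookup_flags(1) == 2u, "in-range index reads the table");
static_assert(lookup_flags(kOptionCount) == 0u, "out-of-range index is rejected");
```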

2 years agoMinor ipa-modref tweaks
Jan Hubicka [Fri, 26 Nov 2021 12:54:41 +0000 (13:54 +0100)]
Minor ipa-modref tweaks

To make dumps easier to read, modref now dumps the cgraph_node name rather
than the cfun name of the function being analysed.  I also fixed a minor
issue with ECF flags merging when updating the inline summary.

gcc/ChangeLog:

2021-11-26  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref.c (analyze_function): Drop parameter F and dump
cgraph node name rather than cfun name.
(modref_generate): Update.
(modref_summaries::insert): Update.
(modref_summaries_lto::insert): Update.
(pass_modref::execute): Update.
(ipa_merge_modref_summary_after_inlining): Improve combining of
ECF_FLAGS.

2 years agoFix failure in inline-9.c testcase
Jan Hubicka [Fri, 26 Nov 2021 12:48:29 +0000 (13:48 +0100)]
Fix failure in inline-9.c testcase

gcc/testsuite/ChangeLog:

2021-11-26  Jan Hubicka  <hubicka@ucw.cz>

* gcc.dg/ipa/inline-9.c: Update template.c

2 years agolibstdc++: Move std::to_address tests to more appropriate place
Jonathan Wakely [Fri, 26 Nov 2021 11:03:06 +0000 (11:03 +0000)]
libstdc++: Move std::to_address tests to more appropriate place

Some of the checks in 20_util/pointer_traits/lwg3545.cc really belong in
20_util/to_address/lwg3545 instead.

This also fixes the ordering of the dg-options and dg-do directives.

libstdc++-v3/ChangeLog:

* testsuite/20_util/pointer_traits/lwg3545.cc: Move to_address
tests to ...
* testsuite/20_util/to_address/lwg3545.cc: ... here. Add -std
option before checking effective target.

2 years agoFix handling of in_flags in update_escape_summary_1
Jan Hubicka [Fri, 26 Nov 2021 12:36:35 +0000 (13:36 +0100)]
Fix handling of in_flags in update_escape_summary_1

update_escape_summary_1 has a thinko where it computes the proper min_flags
but then stores the original value (ignoring whether there was a dereference
in the escape point).

PR ipa/102943
* ipa-modref.c (update_escape_summary_1): Fix handling of min_flags.

2 years agoc++: Fix up taking address of an immediate function diagnostics [PR102753]
Jakub Jelinek [Fri, 26 Nov 2021 09:11:13 +0000 (10:11 +0100)]
c++: Fix up taking address of an immediate function diagnostics [PR102753]

On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote:
> or an unevaluated operand, or a subexpression of an immediate invocation.
>
> Hmm...that suggests that in consteval23.C, bar(foo) should also be OK,

The following patch handles that by removing the diagnostics about taking
address of immediate function from cp_build_addr_expr_1, and instead diagnoses
it in cp_fold_r.  To do that with proper locations, the patch attempts to
ensure that ADDR_EXPRs of immediate functions get EXPR_LOCATION set and
adds a PTRMEM_CST_LOCATION for PTRMEM_CSTs.  Also, evaluation of
std::source_location::current() is moved from genericization to cp_fold.

2021-11-26  Jakub Jelinek  <jakub@redhat.com>

PR c++/102753
* cp-tree.h (struct ptrmem_cst): Add locus member.
(PTRMEM_CST_LOCATION): Define.
* tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location.
(cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST.
* typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for
PTRMEM_CST instead of calling maybe_wrap_with_location.
(cp_build_addr_expr_1): Don't diagnose taking address of
immediate functions here.  Instead when taking their address make
sure the returned ADDR_EXPR has EXPR_LOCATION set.
(expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR's
EXPR_LOCATION.
(convert_for_assignment): Use cp_expr_loc_or_input_loc instead of
EXPR_LOC_OR_LOC.
* pt.c (tsubst_copy): Use build1_loc instead of build1.  Ensure
ADDR_EXPR of immediate function has EXPR_LOCATION set.
* cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate
functions here.  For consteval if don't walk THEN_CLAUSE.
(cp_genericize_r): Move evaluation of calls to
std::source_location::current from here to...
(cp_fold): ... here.  Don't assert calls to immediate functions must
be source_location_current_p, instead only constant evaluate
calls to source_location_current_p.

* g++.dg/cpp2a/consteval20.C: Add some extra tests.
* g++.dg/cpp2a/consteval23.C: Likewise.
* g++.dg/cpp2a/consteval25.C: New test.
* g++.dg/cpp2a/srcloc20.C: New test.

2 years agoi386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with ...
konglin1 [Wed, 10 Nov 2021 01:37:32 +0000 (09:37 +0800)]
i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

Add define_insn extendhfsf2 and truncsfhf2 for target_f16c.

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in XMM register
for TARGET_SSE2.
* config/i386/i386.md (extendhfsf2): Add extendhfsf2 for TARGET_F16C.
(extendhfdf2): Restrict extendhfdf2 to TARGET_AVX512FP16 only.
(*extendhf<mode>2): Rename from extendhf<mode>2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc<mode>2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.

2 years agoFix typo in r12-5486.
liuhongt [Thu, 25 Nov 2021 05:51:57 +0000 (13:51 +0800)]
Fix typo in r12-5486.

gcc/ChangeLog:

PR middle-end/103419
* match.pd: Fix typo, use the type of second parameter, not
first one.

2 years agoDaily bump.
GCC Administrator [Fri, 26 Nov 2021 00:16:26 +0000 (00:16 +0000)]
Daily bump.

2 years agolibstdc++: Remove dg-error that no longer happens
Jonathan Wakely [Thu, 25 Nov 2021 22:54:07 +0000 (22:54 +0000)]
libstdc++: Remove dg-error that no longer happens

There was a c++11_only dg-error in this testcase, for a "body of
constexpr function is not a return statement" diagnostic that was bogus,
but happened because the return statement was ill-formed. A change to
G++ earlier this month means that diagnostic is no longer emitted, so
remove the dg-error.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Remove dg-error for C++11_only error.

2 years agolibstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]
Jonathan Wakely [Thu, 25 Nov 2021 16:49:45 +0000 (16:49 +0000)]
libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]

This implements the resolution I'm proposing for LWG 3545, to avoid hard
errors when using std::to_address for types that make pointer_traits
ill-formed.

Consistent with std::iterator_traits, instantiating std::pointer_traits
for a non-pointer type will be well-formed, but give an empty type with
no member types. This avoids the problematic cases for std::to_address.
Additionally, the pointer_to member is now only declared when the
element type is not cv void (and for C++20, when the function body would
be well-formed). The rebind member was already SFINAE-friendly in our
implementation.
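
The general technique can be sketched as below.  This is only an
illustration of a SFINAE-friendly traits class, not the libstdc++
implementation: sketch_pointer_traits, has_element_type and fancy_ptr are
hypothetical names, and only element_type is modelled.

```cpp
#include <type_traits>

// C++11-compatible equivalent of std::void_t.
template<typename...> struct make_void { using type = void; };
template<typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: empty, so naming it for an unsuitable type is
// well-formed and merely lacks the member types.
template<typename Ptr, typename = void>
struct sketch_pointer_traits { };

// Selected only when Ptr::element_type is a valid type.
template<typename Ptr>
struct sketch_pointer_traits<Ptr, void_t<typename Ptr::element_type>>
{
  using pointer = Ptr;
  using element_type = typename Ptr::element_type;
};

// Raw pointers always have an element type.
template<typename T>
struct sketch_pointer_traits<T*, void>
{
  using pointer = T*;
  using element_type = T;
};

// Detects whether the traits class provides element_type.
template<typename T, typename = void>
struct has_element_type : std::false_type { };

template<typename T>
struct has_element_type<T, void_t<typename sketch_pointer_traits<T>::element_type>>
  : std::true_type { };

// Hypothetical fancy pointer used only for the checks below.
struct fancy_ptr { using element_type = int; };

static_assert(has_element_type<int*>::value, "raw pointer");
static_assert(has_element_type<fancy_ptr>::value, "fancy pointer");
static_assert(!has_element_type<int>::value, "non-pointer: empty traits, no hard error");
```

Because the primary template is empty rather than ill-formed, code that
probes the traits for a non-pointer type gets a substitution failure
instead of a hard error.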

libstdc++-v3/ChangeLog:

PR libstdc++/96416
* include/bits/ptr_traits.h (pointer_traits): Reimplement to be
SFINAE-friendly (LWG 3545).
* testsuite/20_util/pointer_traits/lwg3545.cc: New test.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line.
* testsuite/20_util/to_address/lwg3545.cc: New test.

2 years agoRemove forgotten early return in ipa_value_range_from_jfunc
Jan Hubicka [Thu, 25 Nov 2021 22:58:48 +0000 (23:58 +0100)]
Remove forgotten early return in ipa_value_range_from_jfunc

gcc/ChangeLog:

* ipa-cp.c (ipa_value_range_from_jfunc): Remove forgotten early return.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/inline10.c: New test.

2 years agolibstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]
Jonathan Wakely [Wed, 24 Nov 2021 13:17:54 +0000 (13:17 +0000)]
libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]

libstdc++-v3/ChangeLog:

PR libstdc++/101608
* include/bits/ranges_algobase.h (__fill_n_fn): Check for
constant evaluation before using memset.
* testsuite/25_algorithms/fill_n/constrained.cc: Check
byte-sized values as well.
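
A minimal sketch of the pattern, assuming a compiler that provides the
GCC/Clang builtin __builtin_is_constant_evaluated and C++14 or later.
sketch_fill_bytes and sum_after_fill are hypothetical names; this is not
the code in ranges_algobase.h.

```cpp
#include <cstddef>
#include <cstring>

// Fill N bytes at DST with VALUE.  memset cannot be used during constant
// evaluation, so a plain loop is used in that case and memset otherwise.
constexpr void sketch_fill_bytes(unsigned char* dst, std::size_t n,
                                 unsigned char value)
{
  if (__builtin_is_constant_evaluated())
    {
      for (std::size_t i = 0; i < n; ++i)
        dst[i] = value;
    }
  else
    std::memset(dst, value, n);
}

// Exercises the function in a constant expression.
constexpr unsigned sum_after_fill()
{
  unsigned char buf[4] = {};
  sketch_fill_bytes(buf, 4, 7);
  return buf[0] + buf[1] + buf[2] + buf[3];
}

static_assert(sum_after_fill() == 28, "the loop path is usable at compile time");
```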

2 years agoPR middle-end/103406: Check for Inf before simplifying x-x.
Roger Sayle [Thu, 25 Nov 2021 19:02:06 +0000 (19:02 +0000)]
PR middle-end/103406: Check for Inf before simplifying x-x.

This is a simple one line fix to the regression PR middle-end/103406,
where x - x is being folded to 0.0 even when x is +Inf or -Inf.
In GCC 11 and previously, we'd check whether the type honored NaNs
(which implicitly covered the case where the type honors infinities),
but my patch to test whether the operand could potentially be NaN
failed to also check whether the operand could potentially be Inf.
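
For reference, the IEEE 754 behaviour that the fold has to preserve can be
observed with a small sketch like the one below.  The helper names are
hypothetical, and the example assumes IEEE 754 arithmetic without
-ffast-math.

```cpp
#include <cmath>

// Compute x - x at run time.  The volatile copy keeps the compiler from
// folding the subtraction, so the result reflects the actual arithmetic.
inline double self_difference(double x)
{
  volatile double v = x;
  return v - v;
}

// Hypothetical convenience wrapper: true when x - x is NaN.
inline bool self_difference_is_nan(double x)
{
  return std::isnan(self_difference(x));
}
```

For a finite x such as 1.5 the difference is 0.0, while for +Inf or -Inf
it is NaN, which is why folding x - x to 0.0 is only valid when x cannot
be infinite (or NaN).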

2021-11-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/103406
* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.

gcc/testsuite/ChangeLog
PR middle-end/103406
* gcc.dg/pr103406.c: New test case.

2 years agolibgcc: Split FDE search code from PT_GNU_EH_FRAME lookup
Florian Weimer [Thu, 25 Nov 2021 17:40:51 +0000 (18:40 +0100)]
libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup

This allows switching to a different implementation for
PT_GNU_EH_FRAME lookup in a subsequent commit.

This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader
lock that is implied by dl_iterate_phdr.  However, the FDE is already
parsed outside the lock before this change, so this does not introduce
additional crashes in case of a concurrent dlclose.

libgcc/ChangeLog:

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr.
Remove func, ret.
(find_fde_tail): New function.  Split from
_Unwind_IteratePhdrCallback.  Move the result initialization
from _Unwind_Find_FDE.
(_Unwind_Find_FDE): Updated to call find_fde_tail.

2 years agoipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
Martin Jambor [Thu, 25 Nov 2021 16:58:12 +0000 (17:58 +0100)]
ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)

PR 103227 exposed an issue with ordering of transformations of IPA
passes.  IPA-CP can create clones for constants passed by reference
and at the same time IPA-SRA can also decide that the parameter does
not need to be a pointer (or an aggregate) and plan to convert it
into (a) simple scalar(s).  Because no intermediate clone is created
just for the purpose of ordering the transformations and because
IPA-SRA transformation is implemented as part of clone
materialization, the IPA-CP transformation happens only afterwards,
reversing the order of the transformations compared to the ordering of
analyses.

IPA-CP transformation looks at planned substitutions for values passed
by reference or in aggregates but finds that all the relevant
parameters no longer exist.  Currently it subsequently simply gives
up, leading to clones created for no good purpose (and a huge regression
of 548.exchange2_r).  This patch teaches it to recognize the situation,
look up the new scalarized parameter and perform value substitution on
it.  On my desktop this has recovered the lost exchange2 run-time (and
some more).

I have disabled IPA-SRA in a Fortran testcase so that the dumping from
the transformation phase can still be matched in order to verify that
IPA-CP understands the IL after verifying that it does the right thing
also with IPA-SRA.

gcc/ChangeLog:

2021-11-23  Martin Jambor  <mjambor@suse.cz>

PR ipa/103227
* ipa-prop.h (ipa_get_param): New overload.  Move bits of the existing
one to the new one.
* ipa-param-manipulation.h (ipa_param_adjustments): New member
function get_updated_index_or_split.
* ipa-param-manipulation.c
(ipa_param_adjustments::get_updated_index_or_split): New function.
* ipa-prop.c (adjust_agg_replacement_values): Reimplement, add
capability to identify scalarized parameters and perform substitution
on them.
(ipcp_transform_function): Create descriptors earlier, handle new
return values of adjust_agg_replacement_values.

gcc/testsuite/ChangeLog:

2021-11-23  Martin Jambor  <mjambor@suse.cz>

PR ipa/103227
* gcc.dg/ipa/pr103227-1.c: New test.
* gcc.dg/ipa/pr103227-3.c: Likewise.
* gcc.dg/ipa/pr103227-2.c: Likewise.
* gfortran.dg/pr53787.f90: Disable IPA-SRA.

2 years agopath solver: Revert computation of ranges in gimple order.
Aldy Hernandez [Thu, 25 Nov 2021 16:30:07 +0000 (17:30 +0100)]
path solver: Revert computation of ranges in gimple order.

Revert the patch below, as it may slow down compilation with large CFGs.

commit 8acbd7bef6edbf537e3037174907029b530212f6
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Nov 24 09:43:36 2021 +0100

    path solver: Compute ranges in path in gimple order.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::compute_ranges_defined): Remove.
(path_range_query::compute_ranges_in_block): Revert to bitmap order.
* gimple-range-path.h: Remove compute_ranges_defined.

2 years agoamdgcn: Fix ICE generating CFI [PR103396]
Andrew Stubbs [Thu, 25 Nov 2021 15:59:20 +0000 (15:59 +0000)]
amdgcn: Fix ICE generating CFI [PR103396]

gcc/ChangeLog:

PR target/103396
* config/gcn/gcn.c (move_callee_saved_registers): Ensure that the
number of spilled registers is counted correctly.

2 years agoAdd the testcase for this PR to the testsuite.
Andrew MacLeod [Thu, 25 Nov 2021 13:58:19 +0000 (08:58 -0500)]
Add the testcase for this PR to the testsuite.

Various ranger-enabled patches like threading and VRP2 can do this now, so add the testcase for posterity.

gcc/testsuite/
PR tree-optimization/102648
* gcc.dg/pr102648.c: New.

2 years agoInitialize node_is_self_scc in ipa_node_params::ipa_node_params
Jan Hubicka [Thu, 25 Nov 2021 13:48:14 +0000 (14:48 +0100)]
Initialize node_is_self_scc in ipa_node_params::ipa_node_params

gcc/ChangeLog:

2021-11-25  Jan Hubicka  <hubicka@ucw.cz>

* ipa-prop.h (ipa_node_params::ipa_node_params): Initialize
node_is_self_scc.

2 years agoCheck for equivalences between PHI argument and def.
Andrew MacLeod [Tue, 23 Nov 2021 19:12:29 +0000 (14:12 -0500)]
Check for equivalences between PHI argument and def.

If a PHI argument on an edge is equivalent with the DEF, then it doesn't
provide any new information.  Defer processing it unless all of the
arguments are equivalences.

PR tree-optimization/103359
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): If arg is
equivalent to def, don't initially include its range.

gcc/testsuite/
* gcc.dg/pr103359.c: New.

2 years agoDo not check gimple_static_chain in ref_maybe_used_by_call_p_1
Jan Hubicka [Thu, 25 Nov 2021 13:42:47 +0000 (14:42 +0100)]
Do not check gimple_static_chain in ref_maybe_used_by_call_p_1

gcc/ChangeLog:

2021-11-25  Jan Hubicka  <hubicka@ucw.cz>

* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
gimple_static_chain.

2 years agoRemove dead code and function
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Remove dead code and function

The only use of get_alias_symbol is gated by a gcc_unreachable (),
so the following patch gets rid of it.

2021-11-24  Richard Biener  <rguenther@suse.de>

* cgraphunit.c (symbol_table::output_weakrefs): Remove
unreachable init.
(get_alias_symbol): Remove now unused function.

2 years agoContinue RTL verifying in rtl_verify_fallthru
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Continue RTL verifying in rtl_verify_fallthru

One case used fatal_insn, which does not return.  That isn't intended,
as can be seen from the err = 1 that follows it.  The following
change refactors this to inline the relevant parts of fatal_insn
instead and continue validating the RTL IL.

2021-11-25  Richard Biener  <rguenther@suse.de>

* cfgrtl.c (rtl_verify_fallthru): Do not stop verifying
with fatal_insn.
(skip_insns_after_block): Remove unreachable break and continue.

2 years agoRemove never looping loop in label_rtx_for_bb
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Remove never looping loop in label_rtx_for_bb

This refactors the IL "walk" in a way to avoid the loop which will
never iterate.

2021-11-25  Richard Biener  <rguenther@suse.de>

* cfgexpand.c (label_rtx_for_bb): Remove dead loop construct.

2 years agoIntroduce REG_SET_EMPTY_P
Richard Biener [Wed, 24 Nov 2021 14:57:03 +0000 (15:57 +0100)]
Introduce REG_SET_EMPTY_P

This avoids a -Wunreachable-code diagnostic with EXECUTE_IF_*
in case the first iteration will exit the loop.  For the case
in thread_jump using bitmap_empty_p looks preferable so this
adds REG_SET_EMPTY_P to make that available for register sets.

2021-11-25  Richard Biener  <rguenther@suse.de>

* regset.h (REG_SET_EMPTY_P): New macro.
* cfgcleanup.c (thread_jump): Use REG_SET_EMPTY_P.

2 years agodocs: Add missing @option keyword.
Martin Liska [Thu, 25 Nov 2021 11:13:59 +0000 (12:13 +0100)]
docs: Add missing @option keyword.

gcc/ChangeLog:

* doc/invoke.texi: Use @option for -Wuninitialized.

2 years agopath solver: Move boolean import code to compute_imports.
Aldy Hernandez [Wed, 24 Nov 2021 16:58:43 +0000 (17:58 +0100)]
path solver: Move boolean import code to compute_imports.

In a follow-up patch I will be pruning the set of exported ranges
within blocks to avoid unnecessary work.  In order to do this, all the
interesting SSA names must be in the internal import bitmap ahead of
time.  I had already abstracted them out into compute_imports, but I
missed the boolean code.  This fixes the oversight.

There's a net gain of 25 threadable paths, which is unexpected but
welcome.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103254
* gimple-range-path.cc (path_range_query::compute_ranges): Move
exported boolean code...
(path_range_query::compute_imports): ...here.

2 years agopath solver: Compute ranges in path in gimple order.
Aldy Hernandez [Wed, 24 Nov 2021 08:43:36 +0000 (09:43 +0100)]
path solver: Compute ranges in path in gimple order.

Andrew's patch for PR103254 papered over some underlying
performance issues in the path solver that I'd like to address.

We are currently solving the SSA's defined in the current block in
bitmap order, which amounts to random order for all purposes.  This is
causing unnecessary recursion in gori.  This patch changes the order
to gimple order, thus solving dependencies before uses.

There is no change in threadable paths with this change.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103254
* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
(path_range_query::compute_ranges_in_block): Move to
compute_ranges_defined.
* gimple-range-path.h (compute_ranges_defined): New.

2 years agomatch.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
Jakub Jelinek [Thu, 25 Nov 2021 09:47:24 +0000 (10:47 +0100)]
match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]

The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
changes.
The simplification triggers on
(x & 4294967040U) >= 0U
and turns it into:
x <= 255U
which is incorrect, it should fold to 1 because unsigned >= 0U is always
true and normally the
/* Non-equality compare simplifications from fold_binary  */
     (if (wi::to_wide (cst) == min)
       (if (cmp == GE_EXPR)
        { constant_boolean_node (true, type); })
simplification folds that, but this simplification was done earlier.

The simplification correctly doesn't include lt, which shouldn't be
handled for the same reason; we'll fold it to 0 elsewhere.

But, IMNSHO while it isn't incorrect to handle le and gt there, it is
unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
never appear, again in
/* Non-equality compare simplifications from fold_binary  */
we have a simplification for it:
       (if (cmp == LE_EXPR)
        (eq @2 @1))
       (if (cmp == GT_EXPR)
        (ne @2 @1))))
This is done for
  (cmp (convert?@2 @0) uniform_integer_cst_p@1)
and so should be done for both integers and vectors.
As the bitmask_inv_cst_vector_p simplification only handles
eq and ne for signed types, I think it can be simplified to just
the following patch.
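
The difference between the two comparisons can be checked with a short
sketch.  The function names are hypothetical; the constants are the ones
from the example above.

```cpp
#include <cstdint>

// The comparison from the testcase: an unsigned value is never below
// zero, so this is true for every x.
constexpr bool masked_ge_zero(std::uint32_t x)
{
  return (x & 4294967040U) >= 0U;
}

// The eq form, for which a range test is a valid replacement.
constexpr bool masked_eq_zero(std::uint32_t x)
{
  return (x & 4294967040U) == 0U;
}

// The range test that the simplification produces.
constexpr bool range_test(std::uint32_t x)
{
  return x <= 255U;
}

static_assert(masked_eq_zero(255U) == range_test(255U), "eq agrees at 255");
static_assert(masked_eq_zero(256U) == range_test(256U), "eq agrees at 256");
static_assert(masked_ge_zero(256U) && !range_test(256U),
              "ge is always true, so the range test is wrong for x > 255");
```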

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103417
* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
common tests.

* gcc.c-torture/execute/pr103417.c: New test.

2 years agobswap: Improve perform_symbolic_merge [PR103376]
Jakub Jelinek [Thu, 25 Nov 2021 09:38:33 +0000 (10:38 +0100)]
bswap: Improve perform_symbolic_merge [PR103376]

Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
We could allow masked1 == masked2 case for it, but would need to
do something different than the
  n->n = n1->n | n2->n;
we do on all the bytes together.
In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
both variants are the same) and masked1 != 0xff we would need to
clear corresponding n->n byte instead of setting it to the input
as x ^ x = 0 (but if we don't know what x and y are, the result is
also unknown).  Now, for plus it is much harder, because not only
for non-zero operands we don't know what the result is, but it can
modify upper bytes as well.  So perhaps only if current's byte
masked1 && masked2 set the resulting byte to 0xff (unknown) iff
the byte above it is 0 and 0, and set that resulting byte to 0xff too.
Also, even for | we could instead of return NULL just set the resulting
byte to 0xff if it is different, perhaps it will be masked off later on.

This patch just punts on plus if both corresponding bytes are non-zero,
otherwise implements the above.
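
The per-byte rule for | and ^ can be sketched as below.  This is an
illustration only, not the code in gimple-ssa-store-merging.c: it assumes
8-bit markers with 0 meaning "known zero" and 0xff meaning "unknown",
the names kMarkerUnknown, merge_op, merge_marker and merge_symbolic are
hypothetical, and plus is not modelled.

```cpp
#include <cstdint>

constexpr std::uint64_t kMarkerUnknown = 0xff;  // byte contents unknown

enum class merge_op { bit_ior, bit_xor };

// Merge one byte marker from each operand.  0 means "known zero",
// kMarkerUnknown means "unknown", any other value names a source byte.
constexpr std::uint64_t merge_marker(merge_op op, std::uint64_t m1,
                                     std::uint64_t m2)
{
  return (m1 == 0 || m2 == 0) ? (m1 | m2)      // one side is zero
       : (m1 != m2) ? kMarkerUnknown           // two different non-zero bytes
       : (op == merge_op::bit_xor && m1 != kMarkerUnknown)
           ? 0                                 // x ^ x == 0
           : m1;                               // x | x == x, or still unknown
}

// Apply the per-byte rule to all eight markers of a 64-bit symbolic number.
constexpr std::uint64_t merge_symbolic(merge_op op, std::uint64_t n1,
                                       std::uint64_t n2)
{
  std::uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 8)
    result |= merge_marker(op, (n1 >> shift) & 0xff, (n2 >> shift) & 0xff)
              << shift;
  return result;
}

static_assert(merge_symbolic(merge_op::bit_ior, 0x0201, 0x0300) == 0xff01,
              "differing non-zero bytes become unknown instead of punting");
static_assert(merge_symbolic(merge_op::bit_xor, 0x0201, 0x0201) == 0,
              "identical known bytes cancel under xor");
static_assert(merge_symbolic(merge_op::bit_xor, 0xff, 0xff) == 0xff,
              "unknown ^ unknown stays unknown");
```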

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103376
* gimple-ssa-store-merging.c (perform_symbolic_merge): For
BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
0.

* gcc.dg/optimize-bswapsi-7.c: New test.

2 years agoc++: Return early in apply_late_template_attributes if there are no late attribs...
Jakub Jelinek [Thu, 25 Nov 2021 07:39:35 +0000 (08:39 +0100)]
c++: Return early in apply_late_template_attributes if there are no late attribs [PR101180]

The r12-299-ga0fdff3cf33f7284 change can result in cplus_decl_attributes being called
even if there are no late attributes (but at least one early attribute) in
apply_late_template_attributes.  This patch fixes that, so that we return early
if there are no late attrs, only arrange for TYPE_ATTRIBUTES to get the early
attribute list.

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

PR c++/101180
* pt.c (apply_late_template_attributes): Return early if there are no
dependent attributes.