David Malcolm [Tue, 9 Aug 2022 23:58:54 +0000 (19:58 -0400)]
analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573]
We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).
Fixed thusly.
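A minimal sketch of the kind of call this affects, assuming close() is among the functions the analyzer special-cases (the name of the new fd-uninit-1.c test suggests file-descriptor functions are). The function names are hypothetical and are not taken from the new tests.

```c
#include <unistd.h>

/* Hypothetical reproducer: 'fd' is never initialized before the call.
   Do not call this function; reading 'fd' is undefined behaviour.
   With -fanalyzer the call is the sort of site where
   -Wanalyzer-use-of-uninitialized-value would be expected.  */
int close_uninitialized(void)
{
  int fd;
  return close(fd);
}

/* Well-defined counterpart: the argument has a known, invalid value,
   so close() fails with EBADF and returns -1.  */
int close_invalid(void)
{
  int fd = -1;
  return close(fd);
}
```

Only close_invalid() is safe to run; close_uninitialized() exists purely as input for the analyzer.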
gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Ensure that we call
get_arg_svalue on all arguments.
gcc/testsuite/ChangeLog:
PR analyzer/106573
* gcc.dg/analyzer/error-uninit.c: New test.
* gcc.dg/analyzer/fd-uninit-1.c: New test.
* gcc.dg/analyzer/file-uninit-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Roger Sayle [Tue, 9 Aug 2022 17:59:55 +0000 (18:59 +0100)]
Use PTEST to perform AND in TImode STV of (A & B) != 0 on x86_64.
This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation. Previously
PTEST was (mostly) used for comparison against zero, by using the same
operands. The benefits are demonstrated by the new test case:
__int128 a,b;
int foo()
{
  return (a & b) != 0;
}
Currently with -O2 -msse4 we generate:
movdqa a(%rip), %xmm0
pand b(%rip), %xmm0
xorl %eax, %eax
ptest %xmm0, %xmm0
setne %al
ret
with this patch we now generate:
movdqa a(%rip), %xmm0
xorl %eax, %eax
ptest b(%rip), %xmm0
setne %al
ret
Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to be performed independently
of whether TImode STV is run before or after combine. The one tricky
case is that immediate constant operands of the AND behave slightly
differently between TImode and V1TImode: All V1TImode immediate operands
become loads, but for TImode only values that are not hilo_operands
need to be loaded. Hence the new *testti_doubleword accepts any
general_operand, but internally during split calls force_reg whenever
the second operand is not x86_64_hilo_general_operand. This required
(benefits from) some tweaks to TImode STV to support CONST_WIDE_INT in
more places, using CONST_SCALAR_INT_P instead of just CONST_INT_P.
2022-08-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Create new pseudos only when/if needed. Add support for TEST,
i.e. (COMPARE (AND x y) (const_int 0)), using UNSPEC_PTEST.
When broadcasting V2DImode and V4SImode use new pseudo register.
(timode_scalar_chain::convert_op): Do nothing if operand is
already V1TImode. Avoid generating useless SUBREG conversions,
i.e. (SUBREG:V1TImode (REG:V1TImode) 0). Handle CONST_WIDE_INT
in addition to CONST_INT by using CONST_SCALAR_INT_P.
(convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
CONST_WIDE_INT and CONST_INT. Recognize new *testti_doubleword
pattern as an STV candidate.
(timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
operands in binary logic operations.
* config/i386/i386.cc (ix86_rtx_costs) <case UNSPEC>: Add costs
for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
as regular PTEST, i.e. cost->sse_op.
* config/i386/i386.md (*testti_doubleword): New pre-reload
define_insn_and_split that recognizes comparison of TI mode AND
against zero.
* config/i386/sse.md (*ptest<mode>_and): New pre-reload
define_insn_and_split that recognizes UNSPEC_PTEST of identical
AND operands.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-8.c: New test case.
Roger Sayle [Tue, 9 Aug 2022 17:54:43 +0000 (18:54 +0100)]
middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.
Following my middle-end patch for PR tree-optimization/94026, I'd promised
Jeff Law that I'd clean up the dead-code in fold-const.cc now that these
optimizations are handled in match.pd. Alas, I discovered things aren't
quite that simple, as the transformations I'd added avoided cases where
C2 overlapped with the new bits introduced by the shift, but the original
code handled any value of C2 provided that it had a single-bit set (under
the condition that C3 was always zero).
This patch upgrades the transformations supported by match.pd to cover
any values of C2 and C3, provided that C1 is a valid bit shift constant,
for all three shift types (logical right, arithmetic right and left).
This then makes the code in fold-const.cc fully redundant, and adds
support for some new (corner) cases not previously handled. If the
constant C1 is valid for the type's precision, the shift is now always
eliminated (with C2 and C3 possibly updated to test the sign bit).
Interestingly, the fold-const.cc code that I'm now deleting was originally
added by me back in 2006 to resolve PR middle-end/21137. I've confirmed
that those testcase(s) remain resolved with this patch (and I'll close
21137 in Bugzilla). This patch also implements most (but not all) of the
examples mentioned in PR tree-optimization/98954, for which I have some
follow-up patches.
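As a rough illustration of the logical right shift case, the two functions below test the same condition with and without the shift. The constants (C1 = 4, C2 = 0x0f, C3 = 0x03) and the function names are chosen for this sketch only; it shows the source-level equivalence, not the match.pd implementation.

```c
#include <stdint.h>

/* Original form: shift right by C1 = 4, mask with C2 = 0x0f and
   compare against C3 = 0x03.  */
int test_with_shift(uint32_t x)
{
  return ((x >> 4) & 0x0fu) != 0x03u;
}

/* Shift eliminated: C2 and C3 are moved left by C1 instead, so the
   same four bits of x are compared in place.  */
int test_without_shift(uint32_t x)
{
  return (x & 0xf0u) != 0x30u;
}
```

Both functions inspect bits 4 to 7 of x, so they agree for every input.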
2022-08-09 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* fold-const.cc (fold_binary_loc): Remove optimizations to
optimize ((X >> C1) & C2) ==/!= 0.
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz
check, and handle all values of INTEGER_CSTs @2 and @3.
(cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz
checks, and handle all values of INTEGER_CSTs @2 and @3.
gcc/testsuite/ChangeLog
PR middle-end/21137
PR tree-optimization/98954
* gcc.dg/fold-eqandshift-4.c: New test case.
Vibhav Pant [Tue, 9 Aug 2022 15:30:18 +0000 (11:30 -0400)]
libgccjit.h: Uncomment macro definition for testing gcc_jit_context_new_bitcast support
The macro definition for LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast
was earlier located in the documentation comment for
gcc_jit_context_new_bitcast, making it unavailable to code that
consumed libgccjit.h. This commit moves the definition out of the
comment, making it effective.
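The effect can be reproduced in isolation. The sketch below uses two hypothetical macro names rather than libgccjit.h itself: a #define written inside a comment is never seen by the preprocessor, so a feature test on it fails.

```c
/* Hypothetical header excerpt, before the fix:

   #define HAVE_FEATURE_IN_COMMENT

   The line above is part of this comment, so it defines nothing.  */

/* After the fix the definition is ordinary preprocessor input.  */
#define HAVE_FEATURE_OUTSIDE_COMMENT

/* Returns 1 if the macro written inside the comment is defined.  */
int feature_in_comment_visible(void)
{
#ifdef HAVE_FEATURE_IN_COMMENT
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 if the macro written outside the comment is defined.  */
int feature_outside_comment_visible(void)
{
#ifdef HAVE_FEATURE_OUTSIDE_COMMENT
  return 1;
#else
  return 0;
#endif
}
```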
gcc/jit/ChangeLog:
* libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast): Move
definition out of comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 9 Aug 2022 15:30:18 +0000 (11:30 -0400)]
docs: add notes on which functions -fanalyzer has hardcoded knowledge of
gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add notes on which
functions the analyzer has hardcoded knowledge of.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Iain Buclaw [Tue, 9 Aug 2022 10:48:14 +0000 (12:48 +0200)]
d: Fix undefined reference to pragma(inline) symbol (PR106563)
Functions that are declared `pragma(inline)' should be treated as if
they are defined in every translation unit they are referenced from,
regardless of visibility protection. Ensure they always get
DECL_ONE_ONLY linkage, and start emitting them into other modules that
import them.
PR d/106563
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun
before generating its symbol.
(function_defined_in_root_p): New function.
(function_needs_inline_definition_p): New function.
(maybe_build_decl_tree): New function.
(get_symbol_decl): Call maybe_build_decl_tree before returning symbol.
(start_function): Use function_defined_in_root_p instead of inline
test for locally defined symbols.
(set_linkage_for_decl): Check for inline functions before private or
protected symbols.
gcc/testsuite/ChangeLog:
* gdc.dg/torture/torture.exp (srcdir): New proc.
* gdc.dg/torture/imports/pr106563math.d: New test.
* gdc.dg/torture/imports/pr106563regex.d: New test.
* gdc.dg/torture/imports/pr106563uni.d: New test.
* gdc.dg/torture/pr106563.d: New test.
Andrew Stubbs [Fri, 15 Jul 2022 08:47:36 +0000 (09:47 +0100)]
amdgcn: Vector procedure call ABI
Adjust the (unofficial) procedure calling ABI such that vector arguments are
passed in vector registers, not on the stack. Scalar arguments continue to
be passed in scalar registers, making a total of 12 argument registers.
The return value is also moved to a vector register (even for scalars; it
would be possible to retain the scalar location, using untyped_call, but
there's no obvious advantage in doing so).
After this change the ABI is as follows:
s0-s13 : Reserved for kernel launch parameters.
s14-s15 : Frame pointer.
s16-s17 : Stack pointer.
s18-s19 : Link register.
s20-s21 : Exec Save.
s22-s23 : CC Save.
s24-s25 : Scalar arguments. NO LONGER RETURN VALUE.
s26-s29 : Additional scalar arguments (makes 6 total).
s30-s31 : Static Chain.
v0 : Prologue/epilogue scratch.
v1 : Constant 0, 1, 2, 3, 4, ... 63.
v2-v7 : Prologue/epilogue scratch.
v8-v9 : Return value & vector arguments. NEW.
v10-v13 : Additional vector arguments (makes 6 total). NEW.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_function_value): Allow vector return values.
(num_arg_regs): Allow vector arguments.
(gcn_function_arg): Likewise.
(gcn_function_arg_advance): Likewise.
(gcn_arg_partial_bytes): Likewise.
(gcn_return_in_memory): Likewise.
(gcn_expand_epilogue): Get return value from v8.
* config/gcn/gcn.h (RETURN_VALUE_REG): Set to v8.
(FIRST_PARM_REG): Use FIRST_SGPR_REG for clarity.
(FIRST_VPARM_REG): New.
(FUNCTION_ARG_REGNO_P): Allow vector parameters.
(struct gcn_args): Add vnum field.
(LIBCALL_VALUE): Allow vector return values.
* config/gcn/gcn.md (gcn_call_value): Add vector constraints.
(gcn_call_value_indirect): Likewise.
Richard Biener [Tue, 2 Aug 2022 11:46:28 +0000 (13:46 +0200)]
autopar TLC
The following removes all excessive update_ssa calls from OMP
expansion, thereby rewriting the atomic load and store cases to
GIMPLE code generation. I don't think autopar ever exercises the
atomics code though.
There's not much test coverage overall so I've built SPEC 2k17
with -floop-parallelize-all -ftree-parallelize-loops=2 with and
without LTO (and otherwise -Ofast plus -march=haswell) without
fallout.
If there's any fallout it's not OK to update SSA form for
each and every OMP stmt lowered.
* omp-expand.cc (expand_omp_atomic_load): Emit GIMPLE
directly. Avoid update_ssa when in SSA form.
(expand_omp_atomic_store): Likewise.
(expand_omp_atomic_fetch_op): Avoid update_ssa when in SSA
form.
(expand_omp_atomic_pipeline): Likewise.
(expand_omp_atomic_mutex): Likewise.
* tree-parloops.cc (gen_parallel_loop): Use
TODO_update_ssa_no_phi after loop_version.
Richard Biener [Mon, 8 Aug 2022 12:04:43 +0000 (14:04 +0200)]
Remove --param max-fsm-thread-length
This removes max-fsm-thread-length which is obsoleted by
max-jump-thread-paths.
* doc/invoke.texi (max-fsm-thread-length): Remove.
* params.opt (max-fsm-thread-length): Likewise.
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Do not
check max-fsm-thread-length.
Richard Biener [Mon, 8 Aug 2022 10:20:04 +0000 (12:20 +0200)]
tree-optimization/106514 - add --param max-jump-thread-paths
The following adds a limit for the exponential greedy search of
the backwards jump threader. The idea is to limit the search
space in a way that the paths considered are the same if the search
were in BFS order rather than DFS. In particular it stops considering
incoming edges into a block if the product of the in-degrees of
blocks on the path exceeds the specified limit.
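The pruning rule can be modelled in a few lines. This is a simplified sketch under the assumption that a candidate path is just a list of block in-degrees; the function and parameter names are hypothetical and do not correspond to the code in tree-ssa-threadbackward.cc.

```c
#include <stddef.h>

/* Hypothetical model of the limit: walk the blocks of one candidate
   path and multiply their in-degrees.  Returns the number of blocks
   whose incoming edges are still explored before the running product
   exceeds 'limit'.  */
size_t blocks_explored(const unsigned *in_degree, size_t n_blocks,
                       unsigned long limit)
{
  unsigned long overall_paths = 1;
  size_t i;

  for (i = 0; i < n_blocks; i++)
    {
      overall_paths *= in_degree[i];
      if (overall_paths > limit)
        break;
    }
  return i;
}
```

With an in-degree of 2 at every block and a limit of 64, the product reaches 128 at the seventh block, so only the first six are explored in this model.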
When considering the low stmt copying limit of 7 (or 1 in the size
optimize case) this means the degenerate case with maximum search
space is a sequence of conditions with no actual code
B1
|\
| empty
|/
B2
|\
...
Bn
|\
GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so
we reach 7 already with 3 middle conditions (B1 and Bn do not count).
The search space would be 2^4 == 16 to reach this. FSM threading
historically allowed for a thread length of 10 but is really looking
for a single multiway branch threaded across the backedge. I've
chosen a default of 64 for the new parameter, which effectively
limits the outdegree of the switch statement (the cases reaching the
backedge) to that number (divided by 2 until I add some special
pruning for FSM threads due to the loop header indegree). The
testcase ssa-dom-thread-7.c requires 56 at the moment (as said,
some special FSM thread pruning of considered edges would bring
it down to half of that), but we now get one more threading
and quite some more in later threadfull. This testcase seems to
be difficult to check for expected transforms.
The new testcases add the degenerate case we currently thread
(without deciding whether that's a good idea ...) plus one with
an appropriate limit that should prevent the threading.
This obsoletes the mentioned --param max-fsm-thread-length but
I am not removing it as part of this patch. When the search
space is limited the thread stmt size limit effectively provides
max-fsm-thread-length.
The param with its default does not help PR106514 enough to unleash
path searching with the higher FSM stmt count limit.
PR tree-optimization/106514
* params.opt (max-jump-thread-paths): New.
* doc/invoke.texi (max-jump-thread-paths): Document.
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Honor max-jump-thread-paths, take overall_path argument.
(back_threader::find_paths): Pass 1 as initial overall_path.
* gcc.dg/tree-ssa/ssa-thread-16.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-17.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
Tobias Burnus [Tue, 9 Aug 2022 05:57:40 +0000 (07:57 +0200)]
OpenMP: Fix folding with simd's linear clause [PR106492]
gcc/ChangeLog:
PR middle-end/106492
* omp-low.cc (lower_rec_input_clauses): Add missing folding
to data type of linear-clause list item.
gcc/testsuite/ChangeLog:
PR middle-end/106492
* g++.dg/gomp/pr106492.C: New test.
GCC Administrator [Tue, 9 Aug 2022 00:16:47 +0000 (00:16 +0000)]
Daily bump.
Andrew MacLeod [Mon, 8 Aug 2022 19:13:51 +0000 (15:13 -0400)]
Evaluate condition arguments with the correct type.
Processing of a cond_expr requires that a range of the correct type for the
operands of the cond_expr is passed in.
PR tree-optimization/106556
gcc/
* gimple-range-gori.cc (gori_compute::condexpr_adjust): Use the
type of the cond_expr operands being evaluated.
gcc/testsuite/
* gfortran.dg/pr106556.f90: New.
Tom Honermann [Tue, 2 Aug 2022 18:36:02 +0000 (14:36 -0400)]
preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
This patch corrects handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t enabled
C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
UTF-8 character literals were always treated as having the same type as
ordinary character literals (signed or unsigned dependent on target or use
of the -fsigned-char or -funsigned-char options).
PR preprocessor/106426
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.
gcc/testsuite/ChangeLog:
* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.
libcpp/ChangeLog:
* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
literals based on unsigned_utf8char.
* include/cpplib.h (cpp_options): Add unsigned_utf8char.
* init.cc (cpp_create_reader): Initialize unsigned_utf8char.
Tom Honermann [Tue, 2 Aug 2022 18:36:01 +0000 (14:36 -0400)]
C: Implement C2X N2653 char8_t and UTF-8 string literal changes
This patch implements the core language and compiler dependent library
changes adopted for C2X via WG14 N2653. The changes include:
- Change of type for UTF-8 string literals from array of const char to
array of const char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
__GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.
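One way to observe the string literal change from user code is a _Generic selection on an element of a UTF-8 string literal. This is a sketch with a hypothetical function name; it needs C11 or later, and which branch is taken depends on the language mode the compiler is run in.

```c
/* Returns 1 if the elements of a UTF-8 string literal have type
   unsigned char (the char8_t of N2653), 0 if they have type char,
   and -1 for anything else.  */
int utf8_literal_element_kind(void)
{
  return _Generic(u8"x"[0],
                  unsigned char: 1,
                  char: 0,
                  default: -1);
}
```

In C11 and C17 modes the result is expected to be 0; with this patch and C2X enabled it is expected to be 1.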
gcc/ChangeLog:
* ginclude/stdatomic.h (atomic_char8_t,
ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_string_literal): Use char8_t as the type
of CPP_UTF8STRING when char8_t support is enabled.
* c-typeck.cc (digest_init): Allow initialization of an array
of character type by a string literal with type array of
char8_t.
gcc/c-family/ChangeLog:
* c-lex.cc (lex_string, lex_charconst): Use char8_t as the type
of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is
enabled.
* c-opts.cc (c_common_post_options): Set flag_char8_t if
targeting C2x.
gcc/testsuite/ChangeLog:
* gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: New test.
* gcc.dg/c11-utf8str-type.c: New test.
* gcc.dg/c17-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str-type.c: New test.
* gcc.dg/c2x-utf8str.c: New test.
* gcc.dg/gnu2x-utf8str-type.c: New test.
* gcc.dg/gnu2x-utf8str.c: New test.
Iain Buclaw [Mon, 8 Aug 2022 13:17:47 +0000 (15:17 +0200)]
d: Fix ICE in add_stack_var, at cfgexpand.cc:476
The type that triggers the ICE never got completed by the semantic
analysis pass. Checking for size forces it to be done, or issue a
compile-time error.
PR d/106555
gcc/d/ChangeLog:
* d-target.cc (Target::isReturnOnStack): Check for return type size.
gcc/testsuite/ChangeLog:
* gdc.dg/imports/pr106555.d: New test.
* gdc.dg/pr106555.d: New test.
François Dumont [Thu, 28 Jan 2021 21:23:28 +0000 (22:23 +0100)]
libstdc++: [_GLIBCXX_DEBUG] Do not consider detached iterators as value-initialized
An attached iterator has its _M_version set to something != 0, the container version. This
value shall be preserved when detaching it so that the iterator does not look like a
value-initialized one.
libstdc++-v3/ChangeLog:
* include/debug/formatter.h (__singular_value_init): New _Iterator_state enum entry.
(_Parameter<>(const _Safe_iterator<>&, const char*, _Is_iterator)): Check if iterator
parameter is value-initialized.
(_Parameter<>(const _Safe_local_iterator<>&, const char*, _Is_iterator)): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator<>::_M_value_initialized()): New. Adapt
checks.
* include/debug/safe_local_iterator.h (_Safe_local_iterator<>::_M_value_initialized()): New.
Adapt checks.
* src/c++11/debug.cc (_Safe_iterator_base::_M_reset): Do not reset _M_version.
(print_field(PrintContext&, const _Parameter&, const char*)): Adapt state_names.
* testsuite/23_containers/deque/debug/iterator1_neg.cc: New test.
* testsuite/23_containers/deque/debug/iterator2_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator1_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator2_neg.cc: New test.
* testsuite/23_containers/forward_list/debug/iterator3_neg.cc: New test.
Andrew Pinski [Tue, 21 Dec 2021 04:27:33 +0000 (20:27 -0800)]
Fix middle-end/103645: empty struct store not removed when using compound literal
For compound literals empty struct stores are not removed as they go down a
different path of the gimplifier; trying to optimize the init constructor.
This fixes the problem by not adding the gimple assignment at the end
of gimplify_init_constructor if it was an empty type.
Note this updates gcc.dg/pr87052.c where we had:
const char d[0] = { };
And was expecting a store to d but after this, there is no store
as the decl's type is zero in size.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/103645
* gimplify.cc (gimplify_init_constructor): Don't build/add
gimple assignment of an empty type.
gcc/testsuite/ChangeLog:
* gcc.dg/pr87052.c: Update d var to expect nothing.
Tamar Christina [Mon, 8 Aug 2022 13:37:42 +0000 (14:37 +0100)]
AArch32: Fix 128-bit sequential consistency atomic operations.
Similar to AArch64 the Arm implementation of 128-bit atomics is broken.
For 128-bit atomics we rely on pthread barriers to correctly guard the address
in the pointer to get correct memory ordering. However for 128-bit atomics the
address under the lock is different from the original pointer.
This means that one of the values under the atomic operation is not protected
properly and so we fail when the user has requested sequential
consistency as there's no barrier to enforce this requirement.
As such users have resorted to adding an
#ifdef GCC
<emit barrier>
#endif
around the use of these atomics.
This corrects the issue by issuing a barrier only when __ATOMIC_SEQ_CST was
requested. I have hand verified that the barriers are inserted
for atomic seq cst.
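The idea of a barrier that is conditional on the requested memory model can be sketched with the GCC atomic builtins. The helper names below are hypothetical and this is not the code in config/arm/host-config.h; it only shows the shape of the check.

```c
/* Hypothetical helper: a full barrier is only wanted when the caller
   asked for sequential consistency.  */
int barrier_needed(int model)
{
  return model == __ATOMIC_SEQ_CST;
}

/* Emit a full fence for __ATOMIC_SEQ_CST and nothing for weaker
   memory models.  */
void maybe_barrier(int model)
{
  if (barrier_needed(model))
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}
```

Weaker models such as __ATOMIC_RELAXED or __ATOMIC_ACQUIRE skip the fence, so only sequentially consistent operations pay for it.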
libatomic/ChangeLog:
PR target/102218
* config/arm/host-config.h (pre_seq_barrier, post_seq_barrier,
pre_post_seq_barrier): Require barrier on __ATOMIC_SEQ_CST.
Tamar Christina [Mon, 8 Aug 2022 13:37:00 +0000 (14:37 +0100)]
AArch64: Fix 128-bit sequential consistency atomic operations.
The AArch64 implementation of 128-bit atomics is broken.
For 128-bit atomics we rely on pthread barriers to correctly guard the address
in the pointer to get correct memory ordering. However for 128-bit atomics the
address under the lock is different from the original pointer.
This means that one of the values under the atomic operation is not protected
properly and so we fail when the user has requested sequential
consistency as there's no barrier to enforce this requirement.
As such users have resorted to adding an
#ifdef GCC
<emit barrier>
#endif
around the use of these atomics.
This corrects the issue by issuing a barrier only when __ATOMIC_SEQ_CST was
requested. To remedy this performance hit I think we should revisit using a
similar approach to out-line-atomics for the 128-bit atomics.
Note that I believe I need the empty file due to the include_next chain but
I am not entirely sure. I have hand verified that the barriers are inserted
for atomic seq cst.
libatomic/ChangeLog:
PR target/102218
* config/aarch64/aarch64-config.h: New file.
* config/aarch64/host-config.h: New file.
Richard Biener [Mon, 8 Aug 2022 07:07:23 +0000 (09:07 +0200)]
lto/106540 - fix LTO tree input wrt dwarf2out_register_external_die
I've revisited the earlier two workarounds for dwarf2out_register_external_die
getting duplicate entries. It turns out that r11-525-g03d90a20a1afcb
added dref_queue pruning to lto_input_tree but decl reading uses that
to stream in DECL_INITIAL even when in the middle of SCC streaming.
When that SCC then gets thrown away we can end up with debug nodes
registered which isn't supposed to happen. The following adjusts
the DECL_INITIAL streaming to go the in-SCC way, using lto_input_tree_1,
since no SCCs are expected at this point, just refs.
PR lto/106540
PR lto/106334
* dwarf2out.cc (dwarf2out_register_external_die): Restore
original assert.
* lto-streamer-in.cc (lto_read_tree_1): Use lto_input_tree_1
to input DECL_INITIAL, avoiding to commit drefs.
Andrew Pinski [Sun, 7 Aug 2022 20:51:43 +0000 (13:51 -0700)]
Move testcase gcc.dg/tree-ssa/pr93776.c to gcc.c-torture/compile/pr93776.c
Since this testcase is not exactly SSA specific and it would
be a good idea to compile this at more than just at -O1, moving
it to gcc.c-torture/compile would do that.
Committed as obvious after a test on x86_64-linux-gnu.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr93776.c: Moved to...
* gcc.c-torture/compile/pr93776.c: ...here.
GCC Administrator [Mon, 8 Aug 2022 00:16:22 +0000 (00:16 +0000)]
Daily bump.
Roger Sayle [Sun, 7 Aug 2022 21:19:24 +0000 (22:19 +0100)]
[Committed] Add -mno-stv to new gcc.target/i386/cmpti2.c test case.
Adding -march=cascadelake to the command line options of the new cmpti2.c
testcase triggers TImode STV and produces vector code that doesn't match
the scalar implementation that this test was intended to check. Adding
-mno-stv to the options fixes this. Committed as obvious.
2022-08-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
* gcc.target/i386/cmpti2.c: Add -mno-stv to dg-options.
Jakub Jelinek [Sun, 7 Aug 2022 08:07:38 +0000 (10:07 +0200)]
c++: Add support for __real__/__imag__ modifications in constant expressions [PR88174]
We claim we support P0415R1 (constexpr complex), but e.g.
#include <complex>

constexpr bool
foo ()
{
  std::complex<double> a (1.0, 2.0);
  a += 3.0;
  a.real (6.0);
  return a.real () == 6.0 && a.imag () == 2.0;
}

static_assert (foo ());
fails with
test.C:12:20: error: non-constant condition for static assertion
12 | static_assert (foo ());
| ~~~~^~
test.C:12:20: in ‘constexpr’ expansion of ‘foo()’
test.C:8:10: in ‘constexpr’ expansion of ‘a.std::complex<double>::real(6.0e+0)’
test.C:12:20: error: modification of ‘__real__ a.std::complex<double>::_M_value’ is not a constant expression
The problem is we don't handle REALPART_EXPR and IMAGPART_EXPR
in cxx_eval_store_expression.
The following patch attempts to support it (with a requirement
that those are the outermost expressions, ARRAY_REF/COMPONENT_REF
etc. are just not possible on the result of these, BIT_FIELD_REF
would be theoretically possible if trying to extract some bits
from one part of a complex int, but I don't see how it could appear
in the FE trees).
For these references, the code handles value being COMPLEX_CST,
COMPLEX_EXPR or CONSTRUCTOR_NO_CLEARING empty CONSTRUCTOR (what we use
to represent uninitialized values for C++20 and later) and the
code starts by rewriting it to COMPLEX_EXPR, so that we can freely
adjust the individual parts and later on possibly optimize it back
to COMPLEX_CST if both halves are constant.
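The kind of store involved can be seen outside constant evaluation in plain GNU C, where __real__ and __imag__ yield lvalues for the two parts of a complex value. The sketch below uses hypothetical helper names and relies on that GNU extension; it illustrates the partial store, not the constexpr machinery.

```c
/* Build a complex value by storing to each part separately.  */
double _Complex make_complex(double re, double im)
{
  double _Complex z;
  __real__ z = re;
  __imag__ z = im;
  return z;
}

/* Replace only the real part; the imaginary part is left untouched.
   This is the REALPART_EXPR style of store described above.  */
double _Complex with_real(double _Complex z, double re)
{
  __real__ z = re;
  return z;
}

double real_part(double _Complex z) { return __real__ z; }
double imag_part(double _Complex z) { return __imag__ z; }
```

Storing a new real part into (1.0, 2.0) leaves the imaginary part at 2.0, which is the behaviour the constant evaluator has to reproduce.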
2022-08-07 Jakub Jelinek <jakub@redhat.com>
PR c++/88174
* constexpr.cc (cxx_eval_store_expression): Handle REALPART_EXPR
and IMAGPART_EXPR. Change ctors from releasing_vec to
auto_vec<tree *>, adjust all uses. For !preeval, update ctors
vector.
* g++.dg/cpp1y/constexpr-complex1.C: New test.
Roger Sayle [Sun, 7 Aug 2022 07:49:48 +0000 (08:49 +0100)]
Allow any immediate constant in *cmp<dwi>_doubleword splitter on x86_64.
This patch tweaks i386.md's *cmp<dwi>_doubleword splitter's predicate to
allow general_operand, not just x86_64_hilo_general_operand, to improve
code generation. As a general rule, i386.md's _doubleword splitters should
be post-reload splitters that require integer immediate operands to be
x86_64_hilo_int_operand, so that each part is a valid word mode immediate
constant. As an exception to this rule, doubleword patterns that must be
split before reload, because they require additional scratch registers,
can take advantage of this ability to create new pseudos, to accept
any immediate constant, and call force_reg on the high and/or low parts
if they are not suitable immediate operands in word mode.
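The split into word-sized parts can be sketched as follows, under the simplifying assumption that a 64-bit word is a valid immediate exactly when it is a sign-extended 32-bit value. The function names are hypothetical and the real predicates in i386.md are more detailed; the sketch also needs a target that provides __int128.

```c
#include <stdint.h>

/* Assumption for this sketch: a word is usable as an immediate when
   it equals a sign-extended 32-bit value.  */
int word_is_simm32(uint64_t w)
{
  return w <= (uint64_t)INT32_MAX
         || w >= (uint64_t)(int64_t)INT32_MIN;
}

/* Returns a bit mask of the halves of a 128-bit constant that would
   have to be forced into a register: bit 0 for the low word, bit 1
   for the high word.  */
int halves_needing_register(unsigned __int128 c)
{
  uint64_t lo = (uint64_t)c;
  uint64_t hi = (uint64_t)(c >> 64);

  return (word_is_simm32(lo) ? 0 : 1) | (word_is_simm32(hi) ? 0 : 2);
}
```

For the constant 0x1234567890abcdef only the low word fails the check in this model, while the zero high word is a valid immediate.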
The benefit is shown in the new cmpti3.c test case below.
__int128 x;
int foo()
{
  __int128 t = 0x1234567890abcdefLL;
  return x == t;
}
where GCC with -O2 currently generates:
movabsq $1311768467294899695, %rax
xorl %edx, %edx
xorq x(%rip), %rax
xorq x+8(%rip), %rdx
orq %rdx, %rax
sete %al
movzbl %al, %eax
ret
but with this patch now generates:
movabsq $1311768467294899695, %rax
xorq x(%rip), %rax
orq x+8(%rip), %rax
sete %al
movzbl %al, %eax
ret
2022-08-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (*cmp<dwi>_doubleword): Change predicate
from x86_64_hilo_general_operand to general_operand. Call
force_reg on parts that are not x86_64_immediate_operand.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmpti1.c: New test case.
* gcc.target/i386/cmpti2.c: Likewise.
* gcc.target/i386/cmpti3.c: Likewise.
GCC Administrator [Sun, 7 Aug 2022 00:16:36 +0000 (00:16 +0000)]
Daily bump.
GCC Administrator [Sat, 6 Aug 2022 00:16:27 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Fri, 5 Aug 2022 23:45:41 +0000 (19:45 -0400)]
New warning: -Wanalyzer-jump-through-null [PR105947]
This patch adds a new warning to -fanalyzer for jumps through NULL
function pointers.
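A minimal sketch of code the new warning is aimed at, next to a guarded variant. The type and function names are hypothetical and are not taken from function-ptr-5.c.

```c
#include <stddef.h>

typedef int (*callback_t)(int);

/* Do not call: this jumps through a function pointer that is known
   to be NULL, which is the pattern -Wanalyzer-jump-through-null is
   meant to report under -fanalyzer.  */
int call_unchecked(int v)
{
  callback_t cb = NULL;
  return cb(v);
}

/* Guarded variant: the pointer is tested before the call and -1 is
   returned when there is nothing to call.  */
int call_checked(callback_t cb, int v)
{
  return cb ? cb(v) : -1;
}

/* Sample callback for exercising call_checked().  */
int twice(int v)
{
  return 2 * v;
}
```

Only call_checked() is safe to run; call_unchecked() is there as input for the analyzer.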
gcc/analyzer/ChangeLog:
PR analyzer/105947
* analyzer.opt (Wanalyzer-jump-through-null): New option.
* engine.cc (class jump_through_null): New.
(exploded_graph::process_node): Complain about jumps through NULL
function pointers.
gcc/ChangeLog:
PR analyzer/105947
* doc/invoke.texi: Add -Wanalyzer-jump-through-null.
gcc/testsuite/ChangeLog:
PR analyzer/105947
* gcc.dg/analyzer/function-ptr-5.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Roger Sayle [Fri, 5 Aug 2022 20:05:35 +0000 (21:05 +0100)]
middle-end: Allow backend to expand/split double word compare to 0/-1.
This patch to the middle-end's RTL expansion reorders the code in
emit_store_flag_1 so that the backend has more control over how best
to expand/split double word equality/inequality comparisons against
zero or minus one. With the current implementation, the middle-end
always decides to lower this idiom during RTL expansion using SUBREGs
and word mode instructions, without ever consulting the backend's
machine description. Hence on x86_64, a TImode comparison against zero
is always expanded as:
(parallel [
(set (reg:DI 91)
(ior:DI (subreg:DI (reg:TI 88) 0)
(subreg:DI (reg:TI 88) 8)))
(clobber (reg:CC 17 flags))])
(set (reg:CCZ 17 flags)
(compare:CCZ (reg:DI 91)
(const_int 0 [0])))
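In C terms, the word mode lowering in the RTL above corresponds to ORing the two halves and testing the result. The sketch below uses hypothetical function names and only covers the comparison against zero; it needs a target that provides __int128.

```c
#include <stdint.h>

/* Word mode view: OR the low and high 64-bit halves and compare the
   result with zero, as the IOR on the two SUBREGs does.  */
int is_zero_wordwise(unsigned __int128 x)
{
  uint64_t lo = (uint64_t)x;
  uint64_t hi = (uint64_t)(x >> 64);

  return (lo | hi) == 0;
}

/* Doubleword view: a single 128-bit comparison.  */
int is_zero_doubleword(unsigned __int128 x)
{
  return x == 0;
}
```

The two functions always agree; the difference is only in which form reaches later RTL passes.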
This patch, which makes no changes to the code itself, simply reorders
the clauses in emit_store_flag_1 so that the middle-end first attempts
expansion using the target's doubleword mode cstore optab/expander,
and only if this fails, falls back to lowering to word mode operations.
On x86_64, this allows the expander to produce:
(set (reg:CCZ 17 flags)
(compare:CCZ (reg:TI 88)
(const_int 0 [0])))
which is a candidate for scalar-to-vector transformations (and
combine simplifications etc.). On targets that don't define a cstore
pattern for doubleword integer modes, there should be no change in
behaviour. For those that do, the current behaviour can be restored
(if desired) by restricting the expander/insn to not apply when the
comparison is EQ or NE, and operand[2] is either const0_rtx or
constm1_rtx.
This change just keeps RTL expansion more consistent (in philosophy).
For other doubleword comparisons, such as with operators LT and GT,
or with constants other than zero or -1, the wishes of the backend
are respected, and only if the optab expansion fails are the default
fall-back implementations using narrower integer mode operations
(and conditional jumps) used.
2022-08-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* expmed.cc (emit_store_flag_1): Move code to expand double word
equality and inequality against zero or -1, using word operations,
to after trying to use the backend's cstore<mode>4 optab/expander.
Jonathan Wakely [Fri, 5 Aug 2022 14:17:20 +0000 (15:17 +0100)]
libstdc++: Add feature test macro for <experimental/scope>
libstdc++-v3/ChangeLog:
* include/experimental/scope (__cpp_lib_experimental_scope):
Define.
* testsuite/experimental/scopeguard/uniqueres.cc: Check macro.
Jonathan Wakely [Wed, 13 Jul 2022 10:54:36 +0000 (11:54 +0100)]
libstdc++: Implement <experimental/scope> from LFTSv3
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/experimental/scope: New file.
* testsuite/experimental/scopeguard/uniqueres.cc: New test.
* testsuite/experimental/scopeguard/exit.cc: New test.
Tamar Christina [Fri, 5 Aug 2022 13:53:28 +0000 (14:53 +0100)]
middle-end: Guard value_replacement and store_elim from seeing diamonds.
This excludes value_replacement and store_elim from diamonds as they don't
handle the form properly.
gcc/ChangeLog:
PR middle-end/106534
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Guard the
value_replacement and store_elim from diamonds.
Richard Biener [Fri, 5 Aug 2022 10:51:43 +0000 (12:51 +0200)]
backthreader dump fix
This fixes odd SUCCEEDED dumps from the backthreader registry that
can happen even though register_jump_thread cancelled the thread
as invalid.
* tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
Check whether the registry register_path rejected the path.
(back_threader_registry::register_path): Return whether
register_jump_thread succeeded.
Aldy Hernandez [Fri, 5 Aug 2022 06:04:10 +0000 (08:04 +0200)]
Inline unsupported_range constructor.
An unsupported_range temporary is instantiated in every Value_Range
for completeness' sake and should be mostly a NOP. However, it's
showing up in the callgrind stats, because it's not inline. This
fixes the oversight.
PR tree-optimization/106514
gcc/ChangeLog:
* value-range.cc (unsupported_range::unsupported_range): Move...
* value-range.h (unsupported_range::unsupported_range): ...here.
(unsupported_range::set_undefined): New.
Richard Biener [Fri, 5 Aug 2022 08:40:18 +0000 (10:40 +0200)]
tree-optimization/106533 - loop distribution of inner loop of nest
Loop distribution currently gives up if the outer loop of a loop
nest it analyzes contains a stmt with side-effects instead of
continuing to analyze the innermost loop. The following fixes that
by continuing anyway.
PR tree-optimization/106533
* tree-loop-distribution.cc (loop_distribution::execute): Continue
analyzing the inner loops when find_seed_stmts_for_distribution
fails.
* gcc.dg/tree-ssa/ldist-39.c: New testcase.
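For illustration only, a loop nest of the shape described above might look like the following sketch. It is not the contents of ldist-39.c; the names clear_rows, progress, ROWS and COLS are hypothetical. The outer loop has a statement with side effects (a volatile store), while the innermost loop is a plain zeroing loop of the kind loop distribution can typically recognize once it is analyzed.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical sizes, chosen only for the sketch */
#define COLS 8

/* A volatile store counts as a side effect in the outer loop. */
volatile int progress;

void clear_rows(int grid[ROWS][COLS])
{
  for (int i = 0; i < ROWS; i++) {
    progress = i;                    /* side effect in the outer loop */
    for (int j = 0; j < COLS; j++)   /* innermost loop: simple zeroing */
      grid[i][j] = 0;
  }
}
```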
Haochen Gui [Fri, 5 Aug 2022 02:44:18 +0000 (10:44 +0800)]
rs6000: Correct return value of check_p9modulo_hw_available.
Set the return value to 0 when modulo is supported, and to 1 when not supported.
gcc/testsuite/
* lib/target-supports.exp (check_p9modulo_hw_available): Correct return
value.
Andrew Pinski [Fri, 5 Aug 2022 02:34:55 +0000 (19:34 -0700)]
[RISC-V] Fix 32bit riscv with zbs extension enabled
The problem here was a disconnect between splittable_const_int_operand
predicate and the function riscv_build_integer_1 for 32bits with zbs enabled.
The splittable_const_int_operand predicate had a check for TARGET_64BIT which
was not needed so this patch removed it.
Committed as obvious after a build for risc32-elf configured with --with-arch=rv32imac_zba_zbb_zbc_zbs.
Thanks,
Andrew Pinski
gcc/ChangeLog:
* config/riscv/predicates.md (splittable_const_int_operand):
Remove the check for TARGET_64BIT for single bit const values.
GCC Administrator [Fri, 5 Aug 2022 00:16:24 +0000 (00:16 +0000)]
Daily bump.
Eugene Rozenfeld [Thu, 4 Aug 2022 20:34:22 +0000 (13:34 -0700)]
Add myself as AutoFDO maintainer
ChangeLog:
* MAINTAINERS: Add myself as AutoFDO maintainer.
Jonathan Wakely [Thu, 4 Aug 2022 12:08:00 +0000 (13:08 +0100)]
libstdc++: Make std::string_view(Range&&) constructor explicit
The P2499R0 paper was recently approved for C++23.
libstdc++-v3/ChangeLog:
* include/std/string_view (basic_string_view(Range&&)): Add
explicit as per P2499R0.
* testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc:
Adjust implicit conversions. Check implicit conversions fail.
* testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc:
Likewise.
Jonathan Wakely [Thu, 4 Aug 2022 11:48:22 +0000 (12:48 +0100)]
libstdc++: Add comparisons to std::default_sentinel_t (LWG 3719)
This library defect was recently approved for C++23.
libstdc++-v3/ChangeLog:
* include/bits/fs_dir.h (directory_iterator): Add comparison
with std::default_sentinel_t. Remove redundant operator!= for
C++20.
* (recursive_directory_iterator): Likewise.
* include/bits/iterator_concepts.h [!__cpp_lib_concepts]
(default_sentinel_t, default_sentinel): Define even if concepts
are not supported.
* include/bits/regex.h (regex_iterator): Add comparison with
std::default_sentinel_t. Remove redundant operator!= for C++20.
(regex_token_iterator): Likewise.
(regex_token_iterator::_M_end_of_seq()): Add noexcept.
* testsuite/27_io/filesystem/iterators/lwg3719.cc: New test.
* testsuite/28_regex/iterators/regex_iterator/lwg3719.cc:
New test.
* testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc:
New test.
Andrew MacLeod [Thu, 4 Aug 2022 16:22:59 +0000 (12:22 -0400)]
Loop over intersected bitmaps.
compute_ranges_in_block loops over the import list and then checks the
same bit in exports. It is more efficient to loop over the intersection
of the 2 bitmaps.
PR tree-optimization/106514
* gimple-range-path.cc (path_range_query::compute_ranges_in_block):
Use EXECUTE_IF_AND_IN_BITMAP to loop over 2 bitmaps.
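As a rough sketch of the idea, with 64-bit masks standing in for GCC bitmaps (the function names are hypothetical and not part of GCC), the two iteration styles compare as follows. Both visit the same bits; the second never touches bits that are set in only one mask.

```c
#include <assert.h>
#include <stdint.h>

/* Before: walk every bit set in imports, then test the same bit in exports. */
int count_by_testing(uint64_t imports, uint64_t exports)
{
  int n = 0;
  for (int bit = 0; bit < 64; bit++)
    if ((imports >> bit) & 1)
      if ((exports >> bit) & 1)
        n++;
  return n;
}

/* After: walk only the bits that are set in both masks. */
int count_by_intersection(uint64_t imports, uint64_t exports)
{
  int n = 0;
  /* both &= both - 1 clears the lowest set bit on each iteration. */
  for (uint64_t both = imports & exports; both != 0; both &= both - 1)
    n++;
  return n;
}
```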
Tamar Christina [Thu, 4 Aug 2022 15:37:25 +0000 (16:37 +0100)]
middle-end: Simplify subtract where both arguments are being bitwise inverted.
This adds a match.pd rule that drops the bitwise nots when both arguments to a
subtract are inverted. i.e. for:
float g(float a, float b)
{
return ~(int)a - ~(int)b;
}
we instead generate
float g(float a, float b)
{
return (int)b - (int)a;
}
We already do a limited version of this from the fold_binary fold functions but
this makes a more general version in match.pd that applies more often.
gcc/ChangeLog:
* match.pd: New bit_not rule.
gcc/testsuite/ChangeLog:
* gcc.dg/subnot.c: New test.
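The identity behind the rule follows from ~x == -x - 1, so ~a - ~b == (-a - 1) - (-b - 1) == b - a. A small sketch, using unsigned arithmetic so that wraparound is well defined (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* The form before simplification: subtract of two bitwise nots. */
uint32_t sub_of_nots(uint32_t a, uint32_t b)
{
  return ~a - ~b;
}

/* The simplified form: operands swapped, nots dropped. */
uint32_t swapped_sub(uint32_t a, uint32_t b)
{
  return b - a;
}
```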
Tamar Christina [Thu, 4 Aug 2022 15:35:31 +0000 (16:35 +0100)]
middle-end: Fix phi-ssa assertion triggers. [PR106519]
For the diamond PHI form in tree_ssa_phiopt_worker we need to
extract edge e2 sooner. This changes it so we extract it at the
same time we determine we have a diamond shape.
gcc/ChangeLog:
PR middle-end/106519
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Check final phi edge for
diamond shapes.
gcc/testsuite/ChangeLog:
PR middle-end/106519
* gcc.dg/pr106519.c: New test.
Sam Feifer [Wed, 3 Aug 2022 14:31:03 +0000 (10:31 -0400)]
match.pd: Add bitwise and pattern [PR106243]
This patch adds a new optimization to match.pd. The pattern, -x & 1,
now gets simplified to x & 1, reducing the number of instructions
produced.
This patch also adds tests for the optimization rule.
Bootstrapped/regtested on x86_64-pc-linux-gnu.
PR tree-optimization/106243
gcc/ChangeLog:
* match.pd (-x & 1): New simplification.
gcc/testsuite/ChangeLog:
* gcc.dg/pr106243-1.c: New test.
* gcc.dg/pr106243.c: New test.
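Negation never changes the lowest bit: -x is ~x + 1, the not flips bit 0 and the increment flips it back. A minimal sketch of the equivalence, in unsigned arithmetic to keep negation well defined (function names are hypothetical):

```c
#include <assert.h>

/* The pattern before simplification. */
unsigned low_bit_of_negation(unsigned x)
{
  return -x & 1u;
}

/* The simplified pattern. */
unsigned low_bit(unsigned x)
{
  return x & 1u;
}
```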
Richard Biener [Thu, 4 Aug 2022 09:55:15 +0000 (11:55 +0200)]
tree-optimization/106521 - unroll-and-jam LC SSA rewrite
The LC SSA rewrite performs SSA verification at start but the VN
run performed on the unrolled-and-jammed body can leave us with
invalid SSA form until CFG cleanup is run. So make sure we do that
before rewriting into LC SSA.
PR tree-optimization/106521
* gimple-loop-jam.cc (tree_loop_unroll_and_jam): Perform
CFG cleanup manually before rewriting into LC SSA.
* gcc.dg/torture/pr106521.c: New testcase.
Richard Biener [Thu, 4 Aug 2022 07:21:24 +0000 (09:21 +0200)]
Backwards threader greedy search TLC
I've tried to understand how the greedy search works seeing the
bitmap dances and the split into resolve_phi. I've summarized
the intent of the algorithm as
// For further greedy searching we want to remove interesting
// names defined in BB but add ones on the PHI edges for the
// respective edges.
but the implementation differs in detail. In particular when
there is more than one interesting PHI in BB it seems to only consider
the first for translating defs across edges. It also only applies
the loop crossing restriction when there is an interesting PHI.
The following preserves the loop crossing restriction to the case
of interesting PHIs but merges resolve_phi back, changing interesting
as outlined with the intent above. It should get more threading
cases when there are multiple interesting PHI defs in a block.
It might be a bit faster due to less bitmap operations but in the
end the main intent was to make what happens more obvious.
* tree-ssa-threadbackward.cc (populate_worklist): Remove.
(back_threader::resolve_phi): Likewise.
(back_threader::find_paths_to_names): Rewrite greedy search.
Jonathan Wakely [Thu, 4 Aug 2022 09:20:18 +0000 (10:20 +0100)]
libstdc++: Rename data members of std::unexpected and std::bad_expected_access
The P2549R1 paper was accepted for C++23. I already implemented it for
our <expected>, but I didn't rename the private data members, only the
public member functions. This renames the data members for consistency
with the working draft.
libstdc++-v3/ChangeLog:
* include/std/expected (unexpected::_M_val): Rename to _M_unex.
(bad_expected_access::_M_val): Likewise.
Jonathan Wakely [Thu, 4 Aug 2022 09:18:23 +0000 (10:18 +0100)]
libstdc++: Update value of __cpp_lib_ios_noreplace macro
My P2467R1 proposal was accepted for C++23 so there's an official value
for this macro now.
libstdc++-v3/ChangeLog:
* include/bits/ios_base.h (__cpp_lib_ios_noreplace): Update
value to 202207L.
* include/std/version (__cpp_lib_ios_noreplace): Likewise.
* testsuite/27_io/basic_ofstream/open/char/noreplace.cc: Check
for new value.
* testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc:
Likewise.
Jonathan Wakely [Thu, 28 Jul 2022 15:15:58 +0000 (16:15 +0100)]
libstdc++: Unblock atomic wait on non-futex platforms [PR106183]
When using a mutex and condition variable, the notifying thread needs to
increment _M_ver while holding the mutex lock, and the waiting thread
needs to re-check after locking the mutex. This avoids a missed
notification as described in the PR.
By moving the increment of _M_ver to the base _M_notify we can make the
use of the mutex local to the use of the condition variable, and
simplify the code a little. We can use a relaxed store because the mutex
already provides sequential consistency. Also we don't need to check
whether __addr == &_M_ver because we know that's always true for
platforms that use a condition variable, and so we also know that we
always need to use notify_all() not notify_one().
Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/106183
* include/bits/atomic_wait.h (__waiter_pool_base::_M_notify):
Move increment of _M_ver here.
[!_GLIBCXX_HAVE_PLATFORM_WAIT]: Lock mutex around increment.
Use relaxed memory order and always notify all waiters.
(__waiter_base::_M_do_wait) [!_GLIBCXX_HAVE_PLATFORM_WAIT]:
Check value again after locking mutex.
(__waiter_base::_M_notify): Remove increment of _M_ver.
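The protocol described above can be sketched with POSIX threads. This is not the libstdc++ code; the type and function names are hypothetical, and the sketch simply keeps the version counter entirely under the mutex. The notifier bumps the version while holding the lock, and the waiter re-checks the version after taking the lock, so a notification cannot slip in between the check and the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a waiter pool on a platform without futexes. */
typedef struct {
  pthread_mutex_t mtx;
  pthread_cond_t cv;
  unsigned ver;
} sketch_waiter_pool;

void sketch_init(sketch_waiter_pool *p)
{
  pthread_mutex_init(&p->mtx, NULL);
  pthread_cond_init(&p->cv, NULL);
  p->ver = 0;
}

/* Notifier: increment the version while holding the mutex, wake everyone. */
void sketch_notify(sketch_waiter_pool *p)
{
  pthread_mutex_lock(&p->mtx);
  p->ver++;
  pthread_mutex_unlock(&p->mtx);
  pthread_cond_broadcast(&p->cv);
}

/* Waiter: re-check the version after locking, and only then block. */
void sketch_wait(sketch_waiter_pool *p, unsigned old)
{
  pthread_mutex_lock(&p->mtx);
  while (p->ver == old)
    pthread_cond_wait(&p->cv, &p->mtx);
  pthread_mutex_unlock(&p->mtx);
}
```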
Ulrich Drepper [Thu, 4 Aug 2022 11:18:05 +0000 (13:18 +0200)]
Adjust index number of tuple pretty printer
The tuple pretty printer uses 1-based indices which is quite confusing
considering the access to the same values with the std::get functions
uses 0-based indices. This patch changes the pretty printer since
this is not a guaranteed API.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (class StdTuplePrinter): Use
zero-based indices just like std::get takes.
Ilya Leoshkevich [Fri, 29 Jul 2022 14:14:10 +0000 (16:14 +0200)]
PR106342 - IBM zSystems: Provide vsel for all vector modes
dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
produces an insn that vsel<mode> is supposed to recognize, but can't,
because it's not defined for V2SF. Fix by defining it for all vector
modes supported by copysign<mode>3.
gcc/ChangeLog:
* config/s390/vector.md (V_HW_FT): New iterator.
* config/s390/vx-builtins.md (vsel<mode>): Use V_HW_FT instead
of V_HW.
GCC Administrator [Thu, 4 Aug 2022 00:16:49 +0000 (00:16 +0000)]
Daily bump.
Michael Meissner [Wed, 3 Aug 2022 21:52:31 +0000 (17:52 -0400)]
Do not enable -mblock-ops-vector-pair.
Testing has shown that using the load vector pair and store vector pair
instructions for block moves has some performance issues on power10.
A patch on June 11th modified the code so that GCC would not set
-mblock-ops-vector-pair by default if we are tuning for power10, but it would
set the option if we were tuning for a different machine and have load and store
vector pair instructions enabled.
This patch eliminates the code setting -mblock-ops-vector-pair. If you want to
generate load vector pair and store vector pair instructions for block moves,
you must use -mblock-ops-vector-pair.
2022-08-03 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove code
setting -mblock-ops-vector-pair.
Andrew MacLeod [Wed, 3 Aug 2022 17:55:42 +0000 (13:55 -0400)]
Do not walk equivalence set in path_oracle::killing_def.
When killing a def in the path ranger, there is no need to walk the set
of existing equivalences clearing bits. An equivalence match requires
that both ssa-names have to be in each others set. As killing_def
creates a new empty set containing only the current def, it already
ensures false equivalences won't happen.
PR tree-optimization/106514
* value-relation.cc (path_oracle::killing_def): Do not walk the
equivalence set clearing bits.
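A toy model of why the walk is unnecessary, assuming at most 64 names so that one mask can hold a set. None of these names exist in GCC. Because a match needs both names in each other's set, resetting only the killed name's set is enough to break every old equivalence involving it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NAMES 64  /* hypothetical cap so one uint64_t can hold a set */

/* equiv_set[i] has bit j set when name j is recorded as equivalent to i. */
typedef struct {
  uint64_t equiv_set[MAX_NAMES];
} toy_oracle;

void toy_init(toy_oracle *o)
{
  for (int i = 0; i < MAX_NAMES; i++)
    o->equiv_set[i] = UINT64_C(1) << i;   /* each name equals itself */
}

void toy_register_equiv(toy_oracle *o, int a, int b)
{
  o->equiv_set[a] |= UINT64_C(1) << b;
  o->equiv_set[b] |= UINT64_C(1) << a;
}

/* An equivalence only counts when each name is in the other's set. */
bool toy_equiv_p(const toy_oracle *o, int a, int b)
{
  return ((o->equiv_set[a] >> b) & 1) && ((o->equiv_set[b] >> a) & 1);
}

/* Killing a def: give it a fresh set holding only itself.  The stale bit
   for it in other sets is harmless, since the match is no longer mutual. */
void toy_killing_def(toy_oracle *o, int a)
{
  o->equiv_set[a] = UINT64_C(1) << a;
}
```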
Jose E. Marchesi [Wed, 3 Aug 2022 16:50:05 +0000 (18:50 +0200)]
testsuite: btf: fix regexps in btf-int-1.c
The regexps in the test btf-int-1.c were not working properly with the
commenting style of at least one target: powerpc64le-linux-gnu. This
patch changes the test to use better regexps.
Tested in bpf-unknown-none, x86_64-linux-gnu and powerpc64le-linux-gnu.
Pushed to master as obvious.
gcc/testsuite/ChangeLog:
PR testsuite/106515
* gcc.dg/debug/btf/btf-int-1.c: Fix regexps in
scan-assembler-times.
Tamar Christina [Wed, 3 Aug 2022 15:00:39 +0000 (16:00 +0100)]
middle-end: Support recognition of three-way max/min.
This patch adds support for three-way min/max recognition in phi-opts.
Concretely for e.g.
#include <stdint.h>
uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) {
uint8_t xk;
if (xc < xm) {
xk = (uint8_t) (xc < xy ? xc : xy);
} else {
xk = (uint8_t) (xm < xy ? xm : xy);
}
return xk;
}
we generate:
<bb 2> [local count: 1073741824]:
_5 = MIN_EXPR <xc_1(D), xy_3(D)>;
_7 = MIN_EXPR <xm_2(D), _5>;
return _7;
instead of
<bb 2>:
if (xc_2(D) < xm_3(D))
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
xk_5 = MIN_EXPR <xc_2(D), xy_4(D)>;
goto <bb 5>;
<bb 4>:
xk_6 = MIN_EXPR <xm_3(D), xy_4(D)>;
<bb 5>:
# xk_1 = PHI <xk_5(3), xk_6(4)>
return xk_1;
The same function also immediately deals with turning a minimization problem
into a maximization one if the results are inverted. We do this here since
doing it in match.pd would end up changing the shape of the BBs and adding
additional instructions which would prevent various optimizations from working.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Optionally search for the phi
sequence of a three-way conditional.
(replace_phi_edge_with_variable): Support diamonds.
(tree_ssa_phiopt_worker): Detect diamond phi structure for three-way
min/max.
(strip_bit_not, invert_minmax_code): New.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't optimize
code away.
* gcc.dg/tree-ssa/minmax-10.c: New test.
* gcc.dg/tree-ssa/minmax-11.c: New test.
* gcc.dg/tree-ssa/minmax-12.c: New test.
* gcc.dg/tree-ssa/minmax-13.c: New test.
* gcc.dg/tree-ssa/minmax-14.c: New test.
* gcc.dg/tree-ssa/minmax-15.c: New test.
* gcc.dg/tree-ssa/minmax-16.c: New test.
* gcc.dg/tree-ssa/minmax-3.c: New test.
* gcc.dg/tree-ssa/minmax-4.c: New test.
* gcc.dg/tree-ssa/minmax-5.c: New test.
* gcc.dg/tree-ssa/minmax-6.c: New test.
* gcc.dg/tree-ssa/minmax-7.c: New test.
* gcc.dg/tree-ssa/minmax-8.c: New test.
* gcc.dg/tree-ssa/minmax-9.c: New test.
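The equivalence between the branchy source and the two chained MIN_EXPRs can be checked with a short sketch. The first function is the example from above; min_u8 and three_min_flat are hypothetical names that mirror the generated GIMPLE.

```c
#include <assert.h>
#include <stdint.h>

/* The source form: a conditional choosing between two two-way minimums. */
uint8_t three_min_branchy(uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t xk;
  if (xc < xm)
    xk = (uint8_t) (xc < xy ? xc : xy);
  else
    xk = (uint8_t) (xm < xy ? xm : xy);
  return xk;
}

uint8_t min_u8(uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* The flattened form: MIN (xm, MIN (xc, xy)), with no control flow. */
uint8_t three_min_flat(uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t t = min_u8(xc, xy);
  return min_u8(xm, t);
}
```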
Iain Buclaw [Tue, 26 Jul 2022 15:42:23 +0000 (17:42 +0200)]
d: Merge upstream dmd d7772a2369, phobos 5748ca43f.
In upstream dmd, the compiler front-end and run-time have been merged
together into one repository. Both dmd and libdruntime now track that.
D front-end changes:
- Deprecated `scope(failure)' blocks that contain `return' statements.
- Deprecated using integers for `version' or `debug' conditions.
- Deprecated returning a discarded void value from a function.
- `new' can now allocate an associative array.
D runtime changes:
- Added avx512f detection to core.cpuid module.
Phobos changes:
- Changed std.experimental.logger.core.sharedLog to return
shared(Logger).
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd d7772a2369.
* dmd/VERSION: Bump version to v2.100.1.
* d-codegen.cc (get_frameinfo): Check whether decision to generate
closure changed since semantic finished.
* d-lang.cc (d_handle_option): Remove handling of -fdebug=level and
-fversion=level.
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Generate evaluation
of noreturn variable initializers before throw.
* expr.cc (ExprVisitor::visit (AssignExp *)): Don't generate
assignment for noreturn types, only evaluate for side effects.
* lang.opt (fdebug=): Undocument -fdebug=level.
(fversion=): Undocument -fversion=level.
libphobos/ChangeLog:
* configure: Regenerate.
* configure.ac (libtool_VERSION): Update to 4:0:0.
* libdruntime/MERGE: Merge upstream druntime d7772a2369.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/array/duplication.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 5748ca43f.
* testsuite/libphobos.gc/nocollect.d:
Richard Earnshaw [Wed, 3 Aug 2022 09:01:51 +0000 (10:01 +0100)]
cselib: add function to check if SET is redundant [PR106187]
A SET operation that writes memory may have the same value as an
earlier store but if the alias sets of the new and earlier store do
not conflict then the set is not truly redundant. This can happen,
for example, if objects of different types share a stack slot.
To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.
The routine is used in two places elsewhere in the compiler:
cfgcleanup and postreload.
gcc/ChangeLog:
PR rtl-optimization/106187
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h.
(cselib_redundant_set_p): New function.
* cfgcleanup.cc (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.
Martin Liska [Mon, 1 Aug 2022 13:50:43 +0000 (15:50 +0200)]
gcov-dump: add --stable option
The option prints TOP N counters in a stable format
usable for comparison (diff).
gcc/ChangeLog:
* doc/gcov-dump.texi: Document the new option.
* gcov-dump.cc (main): Parse the new option.
(print_usage): Show the option.
(tag_counters): Sort key:value pairs of TOP N counter.
Martin Liska [Wed, 3 Aug 2022 08:53:22 +0000 (10:53 +0200)]
profile: do not collect stats unless TDF_DETAILS
gcc/ChangeLog:
* profile.cc (compute_branch_probabilities): Do not collect
stats unless TDF_DETAILS.
Roger Sayle [Wed, 3 Aug 2022 08:07:36 +0000 (09:07 +0100)]
PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86.
This patch adds a peephole2 to i386.md to implement the suggestion in
PR target/47949, of using xchg instead of mov for moving values to/from
the %rax/%eax register, controlled by -Oz, as the xchg instruction is
one byte shorter than the move it is replacing.
The new test case is taken from the PR:
int foo(int x) { return x; }
where previously we'd generate:
foo: mov %edi,%eax // 2 bytes
ret
but with this patch, using -Oz, we generate:
foo: xchg %eax,%edi // 1 byte
ret
On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).
Interestingly, some modern architectures (such as Zen 3) implement
xchg using zero latency register renaming (just like mov), so in theory
this transformation could be enabled when optimizing for speed, if
benchmarking shows the improved code density produces consistently
better performance. However, this is architecture dependent, and
there may be interactions using xchg (instead of a single_set) in the
late RTL passes (such as cprop_hardreg), so for now I've restricted
this to -Oz.
2022-08-03 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/47949
* config/i386/i386.md (peephole2): New peephole2 to convert
SWI48 moves to/from %rax/%eax where the src is dead to xchg,
when optimizing for minimal size with -Oz.
gcc/testsuite/ChangeLog
PR target/47949
* gcc.target/i386/pr47949.c: New test case.
Roger Sayle [Wed, 3 Aug 2022 08:03:17 +0000 (09:03 +0100)]
Improved pre-reload split of double word comparison against -1 on x86.
This patch adds an extra optimization to *cmp<dwi>_doubleword to improve
the code generated for comparisons against -1. Hypothetically, if a
comparison against -1 reached this splitter we'd currently generate code
that looks like:
notq %rdx ; 3 bytes
notq %rax ; 3 bytes
orq %rdx, %rax ; 3 bytes
setne %al
With this patch we would instead generate the superior:
andq %rdx, %rax ; 3 bytes
cmpq $-1, %rax ; 4 bytes
setne %al
which is both faster and smaller, and also what's currently generated
thanks to the middle-end splitting double word comparisons against
zero and minus one during RTL expansion. Should that change, this would
become a missed-optimization regression, but this patch also (potentially)
helps suitable comparisons created by CSE and combine.
2022-08-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case
to split comparisons against -1 using AND and CMP -1 instructions.
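Both sequences test the same condition: a double word value is -1 exactly when both of its words are all ones. A sketch of the two formulations on a pair of 64-bit words (function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NOT, NOT, OR: some bit is clear in either word. */
bool ne_minus_one_not_or(uint64_t hi, uint64_t lo)
{
  return (~hi | ~lo) != 0;
}

/* AND, CMP -1: the words ANDed together are not all ones. */
bool ne_minus_one_and_cmp(uint64_t hi, uint64_t lo)
{
  return (hi & lo) != UINT64_MAX;
}
```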
Roger Sayle [Wed, 3 Aug 2022 08:00:20 +0000 (09:00 +0100)]
Support logical shifts by (some) integer constants in TImode STV on x86_64.
This patch improves TImode STV by adding support for logical shifts by
integer constants that are multiples of 8. For the test case:
unsigned __int128 a, b;
void foo() { a = b << 16; }
on x86_64, gcc -O2 currently generates:
movq b(%rip), %rax
movq b+8(%rip), %rdx
shldq $16, %rax, %rdx
salq $16, %rax
movq %rax, a(%rip)
movq %rdx, a+8(%rip)
ret
with this patch we now generate:
movdqa b(%rip), %xmm0
pslldq $2, %xmm0
movaps %xmm0, a(%rip)
ret
2022-08-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Add gain
for converting suitable TImode shift to a V1TImode shift.
(timode_scalar_chain::convert_insn): Add support for converting
suitable ASHIFT and LSHIFTRT.
(timode_scalar_to_vector_candidate_p): Consider logical shifts
by integer constants that are multiples of 8 to be candidates.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-7.c: New test case.
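The restriction to multiples of 8 exists because pslldq shifts by whole bytes. The sketch below models a 128-bit value as two 64-bit halves and shows that a 16-bit left shift equals a 2-byte shift. The u128 type and both helpers are hypothetical; the byte shift works on an explicit little-endian byte image so host byte order does not matter.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;  /* hypothetical 128-bit pair */

/* Bit shift in the style of shld/sal; valid for 0 < bits < 64. */
u128 shl_bits(u128 v, unsigned bits)
{
  u128 r;
  r.hi = (v.hi << bits) | (v.lo >> (64 - bits));
  r.lo = v.lo << bits;
  return r;
}

/* Byte shift in the style of pslldq: bytes move toward the most
   significant end and zeros are shifted in; expects nbytes <= 16. */
u128 shl_bytes(u128 v, unsigned nbytes)
{
  unsigned char in[16], out[16] = {0};
  for (int i = 0; i < 8; i++) {
    in[i] = (unsigned char) (v.lo >> (8 * i));
    in[8 + i] = (unsigned char) (v.hi >> (8 * i));
  }
  for (unsigned i = nbytes; i < 16; i++)
    out[i] = in[i - nbytes];
  u128 r = {0, 0};
  for (int i = 0; i < 8; i++) {
    r.lo |= (uint64_t) out[i] << (8 * i);
    r.hi |= (uint64_t) out[8 + i] << (8 * i);
  }
  return r;
}
```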
Roger Sayle [Wed, 3 Aug 2022 07:55:35 +0000 (08:55 +0100)]
Some additional zero-extension related optimizations in simplify-rtx.
This patch implements some additional zero-extension and sign-extension
related optimizations in simplify-rtx.cc. The original motivation comes
from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
Failed to match this instruction:
(set (reg:DI 88 [ _1 ])
(sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
On many platforms the result of DImode CTZ is constrained to be a
small unsigned integer (between 0 and 64), hence the truncation to
32-bits (using a SUBREG) and the following sign extension back to
64-bits are effectively a no-op, so the above should ideally (often)
be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
To implement this, and some closely related transformations, we build
upon the existing val_signbit_known_clear_p predicate. In the first
chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
can itself be simplified. The second transformation is that we can
canonicalize SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
when the operand's sign-bit is known to be clear. The final two chunks
are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
truncating SUBREG respectively. The nonzero_bits of a truncating
SUBREG pessimistically thinks that the upper bits may have an
arbitrary value (by taking the SUBREG), so we need to look deeper at the
SUBREG's operand to confirm that the high bits are known to be zero.
Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
default architecture options is undefined at zero, so we can't be sure
the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
nonzero_bits knows this, so the above transformations don't trigger,
but the transformations themselves are perfectly valid for other
operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
settings where CTZ is defined at zero.
2022-08-03 Roger Sayle <roger@nextmovesoftware.com>
Segher Boessenkool <segher@kernel.crashing.org>
Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog
* simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Add
optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT
that are all positive to complement the existing FFS and
idempotent ABS simplifications.
<SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
val_signbit_known_clear_p is true of the operand.
Simplify sign extensions of SUBREG truncations of operands
that are already suitably (zero) extended.
<ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
of operands that are already suitably zero extended.
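The no-op claim can be illustrated in C, with casts standing in for the RTL operations. The function names are hypothetical, and the narrowing cast to int32_t relies on the usual wrapping behaviour, which is implementation-defined in ISO C. For a value known to lie between 0 and 64, truncating to 32 bits and extending back gives the original value either way; the two extensions only differ once bit 31 is set.

```c
#include <assert.h>
#include <stdint.h>

/* Models (sign_extend:DI (subreg:SI x 0)). */
int64_t sext_of_trunc(uint64_t x)
{
  return (int64_t) (int32_t) (uint32_t) x;
}

/* Models (zero_extend:DI (subreg:SI x 0)). */
uint64_t zext_of_trunc(uint64_t x)
{
  return (uint64_t) (uint32_t) x;
}
```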
GCC Administrator [Wed, 3 Aug 2022 00:16:48 +0000 (00:16 +0000)]
Daily bump.
Andrew MacLeod [Tue, 2 Aug 2022 21:31:37 +0000 (17:31 -0400)]
Do not register edges for statements not understood.
Previously, all gimple_cond types were understood; with float values,
this is no longer true. We should gracefully do nothing if the
gcond type is not supported.
PR tree-optimization/106510
gcc/
* gimple-range-fold.cc (fur_source::register_outgoing_edges):
Check for unsupported statements early.
gcc/testsuite
* gcc.dg/pr106510.c: New.
Aldy Hernandez [Tue, 2 Aug 2022 18:56:49 +0000 (20:56 +0200)]
Adjust testsuite/gcc.dg/tree-ssa/vrp-float-1.c
I missed the -details dump flag, plus I wasn't checking the actual folding.
As a bonus I had flipped the dump file name and the count, so the test
was coming out as unresolved, which I missed because I was only checking
for failures and passes.
Whooops.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/vrp-float-1.c: Adjust test so it passes.
Andrew MacLeod [Fri, 29 Jul 2022 16:05:38 +0000 (12:05 -0400)]
Check equivalencies when calculating range on entry.
When propagating on-entry values in the cache, checking if any equivalence
has a known value can improve results. No new calculations are made.
Only queries via dominators which do not populate the cache are checked.
PR tree-optimization/106474
gcc/
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Query
range of equivalences that may contribute to the range.
gcc/testsuite/
* g++.dg/pr106474.C: New.
Jose E. Marchesi [Fri, 22 Jul 2022 10:40:50 +0000 (12:40 +0200)]
btf: do not use the CHAR `encoding' bit for BTF
Contrary to CTF and our previous expectations, as per [1], it turns out
that in BTF:
1) The `encoding' field in integer types shall not be treated as a
bitmap, but as an enumeration, i.e. these bits are mutually
exclusive.
2) The CHAR bit in `encoding' shall _not_ be set when emitting types
for `char' or `unsigned char'.
Consequently this patch clears the CHAR bit before emitting the
variable part of BTF integral types. It also updates the testsuite
accordingly, expanding it to check for BOOL bits.
[1] https://lore.kernel.org/bpf/a73586ad-f2dc-0401-1eba-2004357b7edf@fb.com/T/#t
gcc/ChangeLog:
* btfout.cc (output_asm_btf_vlen_bytes): Do not use the CHAR
encoding bit in BTF.
gcc/testsuite/ChangeLog:
* gcc.dg/debug/btf/btf-int-1.c: Do not check for char bits in
bti_encoding and check for bool bits.
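As a sketch of the effect, assuming the encoding values match the BTF_INT_* definitions in the Linux UAPI header linux/btf.h (they are redefined here under hypothetical SKETCH_ names so the example stands alone, and sketch_btf_encoding is not a function in btfout.cc):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror BTF_INT_SIGNED, BTF_INT_CHAR and BTF_INT_BOOL. */
#define SKETCH_BTF_INT_SIGNED (1u << 0)
#define SKETCH_BTF_INT_CHAR   (1u << 1)
#define SKETCH_BTF_INT_BOOL   (1u << 2)

/* Drop the CHAR bit from a CTF-style encoding before emitting BTF. */
uint32_t sketch_btf_encoding(uint32_t ctf_style_encoding)
{
  return ctf_style_encoding & ~SKETCH_BTF_INT_CHAR;
}
```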
Immad Mir [Tue, 2 Aug 2022 16:52:07 +0000 (22:22 +0530)]
analyzer: support for creat, dup, dup2 and dup3 [PR106298]
This patch extends the state machine in sm-fd.cc to support
creat, dup, dup2 and dup3 functions.
Lightly tested on x86_64 Linux.
gcc/analyzer/ChangeLog:
PR analyzer/106298
* sm-fd.cc (fd_state_machine::on_open): Add
creat, dup, dup2 and dup3 functions.
(enum dup): New.
(fd_state_machine::valid_to_unchecked_state): New.
(fd_state_machine::on_creat): New.
(fd_state_machine::on_dup): New.
gcc/testsuite/ChangeLog:
PR analyzer/106298
* gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
* gcc.dg/analyzer/fd-2.c: Likewise.
* gcc.dg/analyzer/fd-4.c: Likewise.
* gcc.dg/analyzer/fd-dup-1.c: New tests.
Signed-off-by: Immad Mir <mirimmad@outlook.com>
Aldy Hernandez [Tue, 2 Aug 2022 11:27:16 +0000 (13:27 +0200)]
Make range_of_ssa_name_with_loop_info type agnostic.
gcc/ChangeLog:
* gimple-range-fold.cc (fold_using_range::range_of_phi): Remove
irange check.
(tree_lower_bound): New.
(tree_upper_bound): New.
(fold_using_range::range_of_ssa_name_with_loop_info): Convert to
vrange.
* gimple-range-fold.h (range_of_ssa_name_with_loop_info): Change
argument to vrange.
Richard Biener [Tue, 2 Aug 2022 07:58:44 +0000 (09:58 +0200)]
Properly honor param_max_fsm_thread_path_insns in backwards threader
I am trying to make sense of back_threader_profitability::profitable_path_p
and the first thing I notice is that we do
/* Threading is profitable if the path duplicated is hot but also
in a case we separate cold path from hot path and permit optimization
of the hot path later. Be on the agressive side here. In some testcases,
as in PR 78407 this leads to noticeable improvements. */
if (m_speed_p
&& ((taken_edge && optimize_edge_for_speed_p (taken_edge))
|| contains_hot_bb))
{
if (n_insns >= param_max_fsm_thread_path_insns)
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " FAIL: Jump-thread path not considered: "
"the number of instructions on the path "
"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
return false;
}
...
}
else if (!m_speed_p && n_insns > 1)
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " FAIL: Jump-thread path not considered: "
"duplication of %i insns is needed and optimizing for size.\n",
n_insns);
return false;
}
...
return true;
thus we apply the n_insns >= param_max_fsm_thread_path_insns only
to "hot paths". The comment above this isn't entirely clear whether
this is by design ("Be on the aggressive side here ...") but I think
this is a mistake. In fact the "hot path" check seems entirely
useless since if the path is not hot we simply continue threading it.
This was caused by r12-324-g69e5544210e3c0 and the following simply
reverts the offending change.
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Apply
size constraints to all paths again.
Aldy Hernandez [Mon, 25 Jul 2022 14:47:48 +0000 (16:47 +0200)]
Implement basic range operators to enable floating point VRP.
Without further ado, here is the implementation for floating point
range operators, plus the switch to enable all ranger clients to
handle floats.
These are bare bone implementations good enough for relation operators
to work, while keeping the NAN bits up to date in the frange. There
is also minimal support for keeping track of +-INF when it is obvious.
Tested on x86-64 Linux.
gcc/ChangeLog:
* range-op-float.cc (finite_operands_p): New.
(frelop_early_resolve): New.
(default_frelop_fold_range): New.
(class foperator_equal): New.
(class foperator_not_equal): New.
(class foperator_lt): New.
(class foperator_le): New.
(class foperator_gt): New.
(class foperator_ge): New.
(class foperator_unordered): New.
(class foperator_ordered): New.
(class foperator_relop_unknown): New.
(floating_op_table::floating_op_table): Add above classes to
floating op table.
* value-range.h (frange::supports_p): Enable.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr94589-2.C: XFAIL.
* gcc.dg/tree-ssa/vrp-float-1.c: New test.
* gcc.dg/tree-ssa/vrp-float-11.c: New test.
* gcc.dg/tree-ssa/vrp-float-3.c: New test.
* gcc.dg/tree-ssa/vrp-float-4.c: New test.
* gcc.dg/tree-ssa/vrp-float-6.c: New test.
* gcc.dg/tree-ssa/vrp-float-7.c: New test.
* gcc.dg/tree-ssa/vrp-float-8.c: New test.
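The reason the NAN bits matter for relational operators is that comparisons involving a NaN behave differently from ordinary values, so a fold such as x == x to true is only valid once NaN has been ruled out. A small sketch of that behaviour under default (non -ffast-math) semantics; the function names are hypothetical:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* True for every value except NaN. */
bool eq_self(double x)
{
  return x == x;
}

/* Looks like a tautology, but is false when either operand is NaN. */
bool lt_or_ge(double a, double b)
{
  return a < b || a >= b;
}

/* True exactly when at least one operand is NaN. */
bool is_unordered(double a, double b)
{
  return isunordered(a, b);
}
```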
Aldy Hernandez [Mon, 25 Jul 2022 14:44:39 +0000 (16:44 +0200)]
Implement streamer for frange.
This patch allows us to export floating point ranges into the SSA name
(SSA_NAME_RANGE_INFO).
[Richi, in PR24021 you suggested that match.pd could use global float
ranges, because it would generally not invoke ranger. This patch
implements the boiler plate to save the frange globally.]
[Jeff, we've also been talking in parallel of using NAN knowledge
during expansion to RTL. This patch will provide the NAN bits in the
SSA name.]
Since frange's current implementation is just a shell, with no
actual endpoints, frange_storage_slot only contains frange_props which
fits inside a byte. When we have endpoints, y'all can decide if it's
worth saving them, or if the NAN/etc bits are good enough.
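To illustrate how a handful of property bits can fit inside a byte, here
is a small C sketch. It is not the frange_props layout from the patch:
the tri-state encoding, the bit positions and the names props_pack and
props_get are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical tri-state for one property: known false, known true,
   or unknown.  Two bits are enough to hold it.  */
enum prop_state { PROP_NO = 0, PROP_YES = 1, PROP_VARYING = 2 };

/* Hypothetical bit positions for the NAN, +INF and -INF properties.  */
enum { NAN_SHIFT = 0, INF_SHIFT = 2, NINF_SHIFT = 4 };

/* Pack three 2-bit properties into a single byte.  */
uint8_t
props_pack (enum prop_state nan, enum prop_state inf, enum prop_state ninf)
{
  return (uint8_t) ((nan << NAN_SHIFT) | (inf << INF_SHIFT)
                    | (ninf << NINF_SHIFT));
}

/* Read back the property stored at SHIFT.  */
enum prop_state
props_get (uint8_t props, int shift)
{
  return (enum prop_state) ((props >> shift) & 0x3);
}
```

Three 2-bit fields use six of the eight bits, so a storage slot holding
only such properties needs no more than one byte per SSA name.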
gcc/ChangeLog:
* tree-core.h (struct tree_ssa_name): Add frange_info and
reshuffle the rest.
* value-range-storage.cc (vrange_storage::alloc_slot): Add case
for frange.
(vrange_storage::set_vrange): Same.
(vrange_storage::get_vrange): Same.
(vrange_storage::fits_p): Same.
(frange_storage_slot::alloc_slot): New.
(frange_storage_slot::set_frange): New.
(frange_storage_slot::get_frange): New.
(frange_storage_slot::fits_p): New.
* value-range-storage.h (class frange_storage_slot): New.
Aldy Hernandez [Tue, 2 Aug 2022 10:14:22 +0000 (12:14 +0200)]
Limit ranger query in ipa-prop.cc to integrals.
ipa-* still works on legacy value_range's which only support
integrals. This patch limits the query to integrals, so as not to get a
floating point range that can't exist in an irange.
gcc/ChangeLog:
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Limit ranger
query to integrals.
Aldy Hernandez [Mon, 1 Aug 2022 18:19:49 +0000 (20:19 +0200)]
More frange::set cleanups.
gcc/ChangeLog:
* value-range.cc (frange::set): Initialize m_props and cleanup.
Richard Biener [Tue, 2 Aug 2022 10:19:25 +0000 (12:19 +0200)]
tree-optimization/106497 - more forward threader can-copy-bb
This adds EDGE_COPY_SRC_JOINER_BLOCK sources to the set of blocks
we need to check we can duplicate.
PR tree-optimization/106497
* tree-ssa-threadupdate.cc (fwd_jt_path_registry::update_cfg):
Also verify we can copy EDGE_COPY_SRC_JOINER_BLOCK.
* gcc.dg/torture/pr106497.c: New testcase.
Martin Liska [Tue, 2 Aug 2022 07:58:43 +0000 (09:58 +0200)]
IPA: reduce what we dump in normal mode
gcc/ChangeLog:
* profile.cc (compute_branch_probabilities): Dump details only
if TDF_DETAILS.
* symtab.cc (symtab_node::dump_base): Do not dump pointer unless
TDF_ADDRESS is used, it makes comparison harder.
Martin Liska [Tue, 2 Aug 2022 08:50:07 +0000 (10:50 +0200)]
gcc-changelog: do not run extra deduction
Do not deduce changelog for root ChangeLog ('').
contrib/ChangeLog:
* gcc-changelog/git_commit.py: Do not deduce changelog for root ChangeLog.
Richard Biener [Tue, 2 Aug 2022 06:37:16 +0000 (08:37 +0200)]
tree-optimization/106498 - reduce SSA updates in autopar
The following reduces the number of SSA updates done during autopar
OMP expansion, specifically avoiding the cases that just add virtual
operands (where maybe none have been before) in dead regions of the CFG.
Instead virtual SSA update is delayed until after the pass. There's
much more TLC needed here, but test coverage makes it really difficult.
PR tree-optimization/106498
* omp-expand.cc (expand_omp_taskreg): Do not perform virtual
SSA update here.
(expand_omp_for): Or here.
(execute_expand_omp): Instead schedule it here together
with CFG cleanup via TODO.
Richard Biener [Mon, 1 Aug 2022 08:06:49 +0000 (10:06 +0200)]
lto/106334 - fix previous fix wrt -flto-partition=none
This adjusts the assert guard to include -flto-partition=none which
behaves as WPA.
PR lto/106334
* dwarf2out.cc (dwarf2out_register_external_die): Adjust
assert.
Richard Biener [Mon, 1 Aug 2022 12:59:08 +0000 (14:59 +0200)]
tree-optimization/106495 - avoid threading to possibly never executed edge
The following builds upon the logic of the PR105679 fix by avoiding
threading to a known edge that is predicted as probably never executed.
PR tree-optimization/106495
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): If known_edge
is probably never executed avoid threading.
GCC Administrator [Tue, 2 Aug 2022 00:16:51 +0000 (00:16 +0000)]
Daily bump.
David Malcolm [Mon, 1 Aug 2022 23:30:15 +0000 (19:30 -0400)]
c: improvements to address space diagnostics
This adds a clarifying "note" to address space mismatch diagnostics.
For example, it improves the diagnostic for
gcc.target/i386/addr-space-typeck-2.c from:
addr-space-typeck-2.c: In function 'test_bad_call':
addr-space-typeck-2.c:12:22: error: passing argument 2 of 'expects_seg_gs'
from pointer to non-enclosed address space
12 | expects_seg_gs (0, ptr, 1);
| ^~~
to:
addr-space-typeck-2.c: In function 'test_bad_call':
addr-space-typeck-2.c:12:22: error: passing argument 2 of 'expects_seg_gs'
from pointer to non-enclosed address space
12 | expects_seg_gs (0, ptr, 1);
| ^~~
addr-space-typeck-2.c:7:51: note: expected '__seg_gs void *' but argument
is of type 'void *'
7 | extern void expects_seg_gs (int i, void __seg_gs *param, int j);
| ~~~~~~~~~~~~~~~^~~~~
I took the liberty of adding the test coverage to i386 since we need
a specific target to test this on.
gcc/c/ChangeLog:
* c-typeck.cc (build_c_cast): Quote names of address spaces in
diagnostics.
(convert_for_assignment): Add a note to address space mismatch
diagnostics, specifying the expected and actual types.
gcc/testsuite/ChangeLog:
* gcc.target/i386/addr-space-typeck-1.c: New test.
* gcc.target/i386/addr-space-typeck-2.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 1 Aug 2022 23:30:15 +0000 (19:30 -0400)]
docs: fix copy&paste error in -Wanalyzer-putenv-of-auto-var
gcc/ChangeLog:
* doc/invoke.texi (-Wanalyzer-putenv-of-auto-var): Fix copy&paste
error.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Roger Sayle [Mon, 1 Aug 2022 22:08:23 +0000 (23:08 +0100)]
PR target/106481: Handle CONST_WIDE_INT in REG_EQUAL during STV on x86_64.
This patch resolves PR target/106481, and is an oversight in my recent
battles with REG_EQUAL notes during TImode STV (see PR target/106278
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598416.html).
The current behaviour, as of the patch above, is that we check that the mode of
the REG_EQUAL note is TImode before using PUT_MODE to set it to V1TImode.
However, the new test case reveals that this doesn't consider REG_EQUAL
notes that are CONST_INT or CONST_WIDE_INT, i.e. that are VOIDmode,
and so STV produces:
(insn 85 84 86 2 (set (reg:V1TI 113)
(reg:V1TI 84)) "pr106481.c":13:3 1766 {movv1ti_internal}
(expr_list:REG_EQUAL (const_wide_int 0x0ffffffff00000004)
(nil)))
which causes problems as the const_wide_int isn't a valid immediate
constant for V1TImode. With this patch, we now generate the correct:
(insn 85 84 86 2 (set (reg:V1TI 113)
(reg:V1TI 84)) "pr106481.c":13:3 1766 {movv1ti_internal}
(expr_list:REG_EQUAL (const_vector:V1TI [
(const_wide_int 0x0ffffffff00000004)
])
(nil)))
2022-08-01 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/106481
* config/i386/i386-features.cc (timode_scalar_chain::convert_insn):
Convert a CONST_SCALAR_INT_P in a REG_EQUAL note into a V1TImode
CONST_VECTOR.
gcc/testsuite/ChangeLog
PR target/106481
* gcc.target/i386/pr106481.c: New test case.
H.J. Lu [Wed, 20 Jul 2022 23:57:32 +0000 (16:57 -0700)]
x86: Add ix86_ifunc_ref_local_ok
We can't always use the PLT entry as the function address for local IFUNC
functions. When the PIC register is needed for PLT call, indirect call
via the PLT entry will fail since the PIC register may not be set up
properly for indirect call. Add ix86_ifunc_ref_local_ok to return false
when the PLT entry can't be used as local IFUNC function pointers.
gcc/
PR target/83782
* config/i386/i386.cc (ix86_ifunc_ref_local_ok): New.
(TARGET_IFUNC_REF_LOCAL_OK): Use it.
gcc/testsuite/
PR target/83782
* gcc.target/i386/pr83782-1.c: Require non-ia32.
* gcc.target/i386/pr83782-2.c: Likewise.
* gcc.target/i386/pr83782-3.c: New test.
Jose E. Marchesi [Fri, 8 Jul 2022 16:32:02 +0000 (18:32 +0200)]
btf: emit linkage information in BTF_KIND_FUNC entries
The kernel bpftool expects BTF_KIND_FUNC entries in BTF to include an
annotation reflecting the linkage of functions (static, global). For
whatever reason they abuse the `vlen' field of the BTF_KIND_FUNC entry
instead of adding a variable-part to the record like it is done with
other entry kinds.
This patch makes GCC include this linkage info in BTF_KIND_FUNC
entries.
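The encoding described above can be sketched in C as follows. This is
not code from btfout.cc. The constants are copied by hand from the
kernel's BTF UAPI header as I recall them, so treat them as assumptions
and check them against your kernel headers. The helper names
btf_func_info, btf_info_kind and btf_info_vlen are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirrored from the kernel's BTF UAPI header.  */
#define BTF_KIND_FUNC 12
enum btf_func_linkage
{
  BTF_FUNC_STATIC = 0,
  BTF_FUNC_GLOBAL = 1,
  BTF_FUNC_EXTERN = 2
};

/* Hypothetical helper: build the info word of a BTF_KIND_FUNC record,
   with the kind in bits 24-28 and the linkage stored in the low 16
   bits that other kinds use for vlen.  */
uint32_t
btf_func_info (unsigned linkage)
{
  assert (linkage <= BTF_FUNC_EXTERN);
  return ((uint32_t) BTF_KIND_FUNC << 24) | (linkage & 0xffff);
}

/* Extract the kind from an info word.  */
unsigned
btf_info_kind (uint32_t info)
{
  return (info >> 24) & 0x1f;
}

/* Extract the vlen field, which holds the linkage for functions.  */
unsigned
btf_info_vlen (uint32_t info)
{
  return info & 0xffff;
}
```

A consumer such as bpftool would then read the linkage back with the
same mask it uses for vlen on other record kinds.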
Tested in bpf-unknown-none target.
gcc/ChangeLog:
PR debug/106263
* ctfc.h (struct ctf_dtdef): Add field linkage.
* ctfc.cc (ctf_add_function): Set ctti_linkage.
* dwarf2ctf.cc (gen_ctf_function_type): Pass a linkage for
function types and subprograms.
* btfout.cc (btf_asm_func_type): Emit linkage information for the
function.
(btf_dtd_emit_preprocess_cb): Propagate the linkage information
for functions.
gcc/testsuite/ChangeLog:
PR debug/106263
* gcc.dg/debug/btf/btf-function-4.c: New test.
* gcc.dg/debug/btf/btf-function-5.c: Likewise.
Andrew Stubbs [Tue, 19 Jul 2022 10:16:09 +0000 (11:16 +0100)]
openmp-simd-clone: Match shift types
Ensure that both parameters to vector shifts use the same mode. This is most
important for amdgcn where the masks are DImode.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
the mask type.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
Sam Feifer [Fri, 29 Jul 2022 13:44:48 +0000 (09:44 -0400)]
match.pd: Add new division pattern [PR104992]
This patch fixes a missed optimization in match.pd. It takes the pattern,
x / y * y == x, and optimizes it to x % y == 0. This produces fewer
instructions. This simplification does not happen for complex types.
This patch also adds tests for the optimization rule.
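A minimal C sketch of the two forms, for illustration only. The function
names are hypothetical and this is not one of the new tests. Both forms
require y != 0, and both are undefined for INT_MIN / -1, so the example
leaves those inputs to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* The form matched by the new pattern.  */
bool
multiple_via_div_mul (int x, int y)
{
  assert (y != 0);
  return x / y * y == x;
}

/* The form it is simplified to.  */
bool
multiple_via_mod (int x, int y)
{
  assert (y != 0);
  return x % y == 0;
}
```

Since C integer division truncates toward zero and x == (x / y) * y +
x % y holds whenever the quotient is representable, the two functions
agree for every valid input.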
Bootstrapped/regtested on x86_64-pc-linux-gnu.
PR tree-optimization/104992
gcc/ChangeLog:
* match.pd (x / y * y == x): New simplification.
gcc/testsuite/ChangeLog:
* g++.dg/pr104992-1.C: New test.
* gcc.dg/pr104992.c: New test.
Roger Sayle [Mon, 1 Aug 2022 10:36:23 +0000 (11:36 +0100)]
Update configure to check for a recent gnat Ada compiler.
GCC fails to bootstrap when configured with --enable-languages=all on
machines that have older versions of GNAT installed as the system Ada
compiler. In configure, it's not sufficient to check whether gnat is
available, but whether a sufficiently recent version of GNAT is
installed. This patch tweaks config/acx.m4 so that conftest.adb also
contains a reference to System.CRTL.int64 as required by the current
version of gcc/ada/osint.adb. This fixes the build when the system
Ada is GNAT v4.8.5 (on Red Hat 7) by disabling ada, but continues to
work fine when the system Ada is GNAT v11.3.1.
2022-08-01 Roger Sayle <roger@nextmovesoftware.com>
Arnaud Charlet <charlet@adacore.com>
config/ChangeLog
* acx.m4 (AC_PROG_GNAT): Update conftest.adb to include
features required of the host gnat compiler.
ChangeLog
* configure: Regenerate.
Martin Liska [Mon, 1 Aug 2022 08:32:00 +0000 (10:32 +0200)]
lto: replace $target with $host in configure.ac [PR106170]
PR lto/106170
lto-plugin/ChangeLog:
* configure.ac: Replace $target with $host.
* configure: Regenerate.
Jakub Jelinek [Mon, 1 Aug 2022 06:26:03 +0000 (08:26 +0200)]
libfortran: Fix up boz_15.f90 on powerpc64le with -mabi=ieeelongdouble [PR106079]
The boz_15.f90 test FAILs on powerpc64le-linux when -mabi=ieeelongdouble
is used (either default through --with-long-double-format=ieee or
when used explicitly).
The problem is that the read/write transfer routines are called with
BT_REAL (or BT_COMPLEX) type and kind 17 which is magic we use to say
it is the IEEE quad real(kind=16) rather than the IBM double double
real(kind=16). For the floating point input/output we then handle kind
17 specially, but for B/O/Z we just treat the bytes of the floating point
value as a binary blob and using 17 in that case results in unexpected
behavior, for write it means we don't estimate right how many chars we'll
need and print ******************** etc. rather than what we should, and
even with explicit size we'd print one further byte than intended.
For read it would even mean overwriting some unrelated byte after the
floating point object.
Fixed by using 16 instead of 17 in the read_radix and write_{b,o,z} calls.
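The idea behind the fix can be sketched in C as below. This is not the
code from io/transfer.c. The helper names boz_byte_count and z_digits
are hypothetical, and the two-digits-per-byte rule is the usual width
needed to print a byte in hexadecimal, stated here as an assumption
about how the width estimate works.

```c
#include <stddef.h>

/* Hypothetical helper: kind 17 is only a marker for IEEE quad; the
   value still occupies 16 bytes, so B/O/Z editing must use 16.  */
int
boz_byte_count (int kind)
{
  return kind == 17 ? 16 : kind;
}

/* Hex digits needed to print NBYTES bytes with Z editing, assuming
   two digits per byte.  */
size_t
z_digits (int nbytes)
{
  return (size_t) nbytes * 2;
}
```

With the marker value passed through unchanged the estimate would be 34
digits for a 16-byte object, one byte too many, which matches the
symptoms described above.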
2022-08-01 Jakub Jelinek <jakub@redhat.com>
PR libfortran/106079
* io/transfer.c (formatted_transfer_scalar_read,
formatted_transfer_scalar_write): For type BT_REAL with kind 17
change kind to 16 before calling read_radix or write_{b,o,z}.
Aldy Hernandez [Sun, 31 Jul 2022 11:43:36 +0000 (13:43 +0200)]
Cleanups to frange.
These are some assorted cleanups to the frange class to make it easier
to drop in an implementation with FP endpoints:
* frange::set() had some asserts limiting the type of arguments
passed. There's no reason why we can't handle all the variants.
If worse comes to worst, we can always return a VARYING, which is
conservative and correct.
* frange::normalize_kind() now returns a boolean that can be used in
union and intersection to indicate that the range changed.
* Implement vrp_val_max and vrp_val_min for floats. Also, move them
earlier in the header file so frange can use them.
Tested on x86-64 Linux.
gcc/ChangeLog:
* value-range.cc (tree_compare): New.
(frange::set): Make more general.
(frange::normalize_kind): Cleanup and return bool.
(frange::union_): Use normalize_kind return value.
(frange::intersect): Same.
(frange::verify_range): Remove unnecessary else.
* value-range.h (vrp_val_max): Move before frange class.
(vrp_val_min): Same.
(frange::frange): Remove set to m_type.
Aldy Hernandez [Sun, 31 Jul 2022 11:36:59 +0000 (13:36 +0200)]
const_tree conversion of vrange::supports_*
Make all vrange::supports_*_p methods const_tree as they can end up
being called from functions that are const_tree.
Tested on x86-64 Linux.
gcc/ChangeLog:
* value-range.cc (vrange::supports_type_p): Use const_tree.
(irange::supports_type_p): Same.
(frange::supports_type_p): Same.
* value-range.h (Value_Range::supports_type_p): Same.
(irange::supports_p): Same.
Aldy Hernandez [Sun, 31 Jul 2022 21:02:14 +0000 (23:02 +0200)]
Make irange dependency explicit for range_of_ssa_name_with_loop_info.
Even though ranger is type agnostic, SCEV seems to only work with
integers. This patch removes some FIXME notes making it explicit that
bounds_of_var_in_loop only works with iranges.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-fold.cc (fold_using_range::range_of_phi): Only
query SCEV for integers.
(fold_using_range::range_of_ssa_name_with_loop_info): Remove
irange check.
Dimitrije Milošević [Fri, 29 Jul 2022 06:36:06 +0000 (08:36 +0200)]
libsanitizer: Cherry-pick
2bfb0fcb51510f22723c8cdfefe from upstream
2bfb0fcb51510f22723c8cdfefe [Sanitizer][MIPS] Fix stat struct size for the O32 ABI.
Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>.
GCC Administrator [Mon, 1 Aug 2022 00:16:31 +0000 (00:16 +0000)]
Daily bump.