Richard Biener [Mon, 11 Jul 2022 07:23:50 +0000 (09:23 +0200)]
tree-optimization/106228 - fix vect_setup_realignment virtual SSA handling
The following adds missing assignment of a virtual use operand to a
created load to vect_setup_realignment which shows as bootstrap
failure on powerpc64-linux and extra testsuite fails for targets
when misaligned loads are not supported or not optimal.
PR tree-optimization/106228
* tree-vect-data-refs.cc (vect_setup_realignment): Properly
set a VUSE operand on the emitted load.
Aldy Hernandez [Fri, 10 Jun 2022 13:11:06 +0000 (15:11 +0200)]
Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).
Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses
half the precision in the process because of its use of legacy
value_range's. This patch rewrites all the SSA_NAME_RANGE_INFO
(nonzero bits included) to use the recently contributed
vrange_storage. With it, we'll be able to efficiently save any ranges
supported by ranger in GC memory. Presently this will only be
irange's, but shortly we'll add floating ranges and others to the mix.
As per the discussion with the trailing_wide_ints adjustments and
vrange_storage, we'll be able to save integer ranges with a maximum of
5 sub-ranges. This could be adjusted later if more sub-ranges are
needed (unlikely).
Since this is a behavior changing patch, I would like to take a few
days for discussion, and commit early next week if all goes well.
A few notes.
First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME
since we store full resolution ranges. Perhaps it could be re-used
for something else.
The range_info_def struct is gone in favor of an opaque type handled
by vrange_storage. It currently supports irange, but will support
frange, prange, etc, in due time.
From the looks of it, set_range_info was an update operation despite
its name, as we improved the nonzero bits with each call, even though
we clobbered the ranges. Presumably this was because doing a proper
intersect of ranges lost information with the anti-range hack. We no
longer have this limitation so now we formalize both set_range_info
and set_nonzero_bits to an update operation. After all, we should
never be losing information, but enhancing it whenever possible. This
means, that if folks' finger-memory is not offended, as a follow-up,
I'd like to rename set_nonzero_bits and set_range_info to update_*.
I have kept the same global API we had in tree-ssanames.h, with the
caveat that all set operations are now update as discussed above.
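The "set is now update" semantics described above can be illustrated with a toy model. This is only a sketch: toy_range and update_range_info are hypothetical names, a single [lo, hi] pair stands in for a multi-sub-range irange, and none of this is GCC's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a global range: one [lo, hi] pair plus a
// mask of the bits that may be nonzero.  Real iranges hold several
// sub-ranges; this sketch keeps just one.
struct toy_range
{
  int64_t lo;
  int64_t hi;
  uint64_t nonzero;
};

// "Update" rather than "set": the stored information is intersected
// with the incoming information, so a later, weaker range never throws
// away what was already known.  Returns true if the stored range changed.
bool
update_range_info (toy_range &stored, const toy_range &incoming)
{
  toy_range r;
  r.lo = std::max (stored.lo, incoming.lo);
  r.hi = std::min (stored.hi, incoming.hi);
  r.nonzero = stored.nonzero & incoming.nonzero;
  // Assumption of the sketch: the two ranges overlap.
  assert (r.lo <= r.hi);
  bool changed = (r.lo != stored.lo || r.hi != stored.hi
                  || r.nonzero != stored.nonzero);
  stored = r;
  return changed;
}
```

Updating a stored [0, 100] with [10, 200] narrows it to [10, 100]; a later update with [0, 1000] leaves it untouched, which is the "never lose information" property the message argues for.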
There is a 2% performance penalty for evrp and a 3% penalty for VRP
that is coincidentally in line with a previous improvement of the same
amount in the vrange abstraction patchset. Interestingly, this
penalty is mostly due to the wide int to tree dance we keep doing with
irange and legacy. In a first draft of this patch where I was
streaming trees directly, there was actually a small improvement
instead. I hope to get some of the gain back when we move irange's to
wide-ints, though I'm not in a hurry ;-).
Tested and benchmarked on x86-64 Linux. Tested on ppc64le Linux.
Comments welcome.
gcc/ChangeLog:
* gimple-range.cc (gimple_ranger::export_global_ranges): Remove
verification against legacy value_range.
(gimple_ranger::register_inferred_ranges): Same.
(gimple_ranger::export_global_ranges): Rename update_global_range
to set_range_info.
* tree-core.h (struct range_info_def): Remove.
(struct irange_storage_slot): New.
(struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation.
(struct tree_ssa_name): Add vrange_storage support.
* tree-ssanames.cc (range_info_p): New.
(range_info_fits_p): New.
(range_info_alloc): New.
(range_info_free): New.
(range_info_get_range): New.
(range_info_set_range): New.
(set_range_info_raw): Remove.
(set_range_info): Adjust to use vrange_storage.
(set_nonzero_bits): Same.
(get_nonzero_bits): Same.
(duplicate_ssa_name_range_info): Remove overload taking
value_range_kind.
Rewrite tree overload to use vrange_storage.
(duplicate_ssa_name_fn): Adjust to use vrange_storage.
* tree-ssanames.h (struct range_info_def): Remove.
(set_range_info): Adjust prototype to take vrange.
* tree-vrp.cc (vrp_asserts::remove_range_assertions): Call
duplicate_ssa_name_range_info.
* tree.h (SSA_NAME_ANTI_RANGE_P): Remove.
(SSA_NAME_RANGE_TYPE): Remove.
* value-query.cc (get_ssa_name_range_info): Adjust to use
vrange_storage.
(update_global_range): Remove.
(get_range_global): Remove as_a<irange>.
* value-query.h (update_global_range): Remove.
* tree-ssa-dom.cc (set_global_ranges_from_unreachable_edges):
Rename update_global_range to set_range_info.
* value-range-storage.cc (vrange_storage::alloc_slot): Remove
gcc_unreachable.
GCC Administrator [Mon, 11 Jul 2022 00:16:25 +0000 (00:16 +0000)]
Daily bump.
Lewis Hyatt [Sat, 9 Jul 2022 20:12:21 +0000 (16:12 -0400)]
c: Fix location for _Pragma tokens [PR97498]
The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:
a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.
With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.
This is just a two-line fix (one for each of a) and b)); the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.
gcc/c/ChangeLog:
PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.
libcpp/ChangeLog:
PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.
gcc/testsuite/ChangeLog:
PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations.
* c-c++-common/gomp/pragma-5.c: Likewise.
* gcc.dg/pragma-message.c: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for
improved warning locations.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.
Dimitar Dimitrov [Sun, 10 Jul 2022 08:15:39 +0000 (11:15 +0300)]
testsuite: Require int128 for gcc.dg/pr106063.c
Require effective target int128 for gcc.dg/pr106063.c.
PR tree-optimization/106063
gcc/testsuite/ChangeLog:
* gcc.dg/pr106063.c: Require effective target int128.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
Aldy Hernandez [Sat, 9 Jul 2022 15:43:47 +0000 (17:43 +0200)]
Cleanups to irange::nonzero bit code.
In discussions with Andrew we realized varying_p() was returning true
for a range of the entire domain with a non-empty nonzero mask. This
is confusing as varying_p() should only return true when absolutely no
information is available. A nonzero mask that has any cleared bits is
extra information and must return false for varying_p(). This patch
fixes this oversight. Now a range of the entire domain with nonzero
bits is internally set to VR_RANGE (with the appropriate end points
set). VR_VARYING ranges must have a null nonzero mask.
Also, the union and intersect code were not quite right in the presence of
nonzero masks. Sometimes we would drop masks to -1 unnecessarily. I
was trying to be too smart in avoiding extra work when the mask was
NULL, but there's also an implicit mask in the range that must be
taken into account. For example, [0,0] may have no nonzero bits set
explicitly, but the mask is really 0x0. This will all be simpler when
we drop trees, because the nonzero bits will always be set, even if
-1.
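The two points above (an implicit mask carried by the end points, and VARYING meaning "no information at all") can be sketched with an 8-bit unsigned toy range. The type toy_irange and the helper names are hypothetical and only loosely modelled on the description; they are not the irange implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit unsigned range with an explicit nonzero-bits mask.
// A set bit means "this bit may be 1"; 0xff means no explicit mask.
struct toy_irange
{
  uint8_t lo;
  uint8_t hi;
  uint8_t nonzero;
};

// Mask implied by the end points alone: for an unsigned range, no bit
// above the highest set bit of HI can ever be 1.
uint8_t
implied_mask (const toy_irange &r)
{
  assert (r.lo <= r.hi);
  uint8_t m = r.hi;
  m |= m >> 1;
  m |= m >> 2;
  m |= m >> 4;
  return m;
}

// The mask that really applies combines the explicit and implicit ones.
uint8_t
effective_mask (const toy_irange &r)
{
  return r.nonzero & implied_mask (r);
}

// VARYING only when nothing is known: full domain and no cleared bits.
bool
varying_p (const toy_irange &r)
{
  return r.lo == 0 && r.hi == 0xff && r.nonzero == 0xff;
}
```

In this model [0,0] with no explicit mask has an effective mask of 0x0, and a full-domain range whose mask clears bit 0 is not VARYING, matching the behavior the message describes.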
Finally, I've added unit tests to the nonzero mask code. This should
help us maintain sanity going forward.
There should be no visible changes, as the main consumer of this code
is the SSA_NAME_RANGE_INFO patchset which has yet to be committed.
Tested on x86-64 Linux.
gcc/ChangeLog:
* value-range.cc (irange::operator=): Call verify_range.
(irange::irange_set): Normalize kind after everything else has
been set.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Disallow nonzero masks for VARYING.
(irange::irange_union): Call verify_range.
Handle nonzero masks better.
(irange::irange_intersect): Same.
(irange::set_nonzero_bits): Calculate mask if either range has an
explicit mask.
(irange::intersect_nonzero_bits): Same.
(irange::union_nonzero_bits): Same.
(range_tests_nonzero_bits): New.
(range_tests): Call range_tests_nonzero_bits.
* value-range.h (class irange): Remove set_nonzero_bits method
with trees.
(irange::varying_compatible_p): Set nonzero mask.
Xi Ruoyao [Wed, 6 Jul 2022 15:22:29 +0000 (23:22 +0800)]
loongarch: avoid unnecessary sign-extend after 32-bit division
Like add.w/sub.w/mul.w, div.w/mod.w/div.wu/mod.wu also sign-extend the
output on LA64. But LoongArch v1.00 mandates that the inputs of 32-bit
division be sign-extended, so we have to expand 32-bit division into
RTL sequences.
We defined div.w/mod.w/div.wu/mod.wu as a (DI, DI) -> SI instruction.
This definition does not indicate the fact that these instructions will
store the result as sign-extended value in a 64-bit GR. Then the
compiler would emit unnecessary sign-extend operations. For example:
int div(int a, int b) { return a / b; }
was compiled to:
div.w $r4, $r4, $r5
slli.w $r4, $r4, 0 # this is unnecessary
jr $r1
To remove this unnecessary operation, we change the division
instructions to (DI, DI) -> DI and describe the sign-extend behavior
explicitly in the RTL template. In the expander for 32-bit division we
then use simplify_gen_subreg to extract the lower 32 bits.
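Why the extra slli.w is redundant can be seen in a small register-level model. The functions sext32 and model_div_w below are hypothetical helpers written for this note; they model only the behavior stated above (sign-extended inputs, sign-extended 32-bit result) and assume the usual two's-complement conversion.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of a 64-bit register value, which is the
// effect of "slli.w rd, rj, 0" in the listing above.
int64_t
sext32 (int64_t reg)
{
  return static_cast<int32_t> (static_cast<uint32_t> (reg));
}

// Toy model of div.w: both inputs are 64-bit registers that already
// hold sign-extended 32-bit values, and the 32-bit quotient is written
// back sign-extended.
int64_t
model_div_w (int64_t rj, int64_t rk)
{
  assert (rj == sext32 (rj) && rk == sext32 (rk));
  assert (rk != 0);
  // Divide in 64 bits, then keep only the low 32 bits of the quotient.
  return sext32 (rj / rk);
}
```

Because the model's result is already sign-extended, applying sext32 to it again changes nothing; describing that property in the RTL template is what lets the compiler drop the second instruction.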
gcc/ChangeLog:
* config/loongarch/loongarch.md (<any_div>di3_fake): Describe
the sign-extend of result in the RTL template.
(<any_div><mode>3): Adjust for <any_div>di3_fake change.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/div-4.c: New test.
Xi Ruoyao [Wed, 6 Jul 2022 05:45:55 +0000 (13:45 +0800)]
loongarch: add alternatives for idiv insns to improve code generation
Currently in the description of LoongArch integer division instructions,
the output is marked as earlyclobbered ('&'). This is necessary when
loongarch_check_zero_div_p() is true, because clobbering operand 2 (the
divisor) would make the check for a zero divisor impossible.
But, for -mno-check-zero-division (the default of GCC >= 12.2 for
optimized code), the output is not earlyclobbered at all. And, the
read of operand 1 only occurs before clobbering the output. So we make
three alternatives for an idiv instruction:
* (=r,r,r): For -mno-check-zero-division.
* (=&r,r,r): For -mcheck-zero-division.
* (=&r,0,r): For -mcheck-zero-division, to explicitly allow patterns
like "div.d $a0, $a0, $a1".
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_check_zero_div_p):
Remove static, for use in the machine description file.
* config/loongarch/loongarch-protos.h:
(loongarch_check_zero_div_p): Add prototype.
* config/loongarch/loongarch.md (enabled): New attr.
(*<optab><mode>3): Add (=r,r,r) and (=&r,0,r) alternatives for
idiv. Conditionally enable the alternatives using
loongarch_check_zero_div_p.
(<optab>di3_fake): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/div-1.c: New test.
* gcc.target/loongarch/div-2.c: New test.
* gcc.target/loongarch/div-3.c: New test.
Xi Ruoyao [Fri, 8 Jul 2022 13:09:25 +0000 (21:09 +0800)]
loongarch: fix mulsidi3_64bit instruction
(mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be
"mulw.d.w", not "mul.d".
gcc/ChangeLog:
* config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w
instead of mul.d.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/mulw_d_w.c: New test.
* gcc.c-torture/execute/mul-sext.c: New test.
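The difference between the two multiplies in the mulsidi3_64bit fix can be sketched as follows. This is an illustration under an assumption the commit message does not spell out: that the upper halves of the source registers need not be sign-extended copies of bit 31. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)): widen both
// 32-bit operands first, then multiply in 64 bits.
int64_t
widening_mul (int32_t rj, int32_t rk)
{
  return static_cast<int64_t> (rj) * static_cast<int64_t> (rk);
}

// Low 32 bits of a register, reinterpreted as a signed 32-bit value.
int32_t
low32 (uint64_t reg)
{
  return static_cast<int32_t> (static_cast<uint32_t> (reg));
}

// True when a plain 64-bit multiply of the raw register contents gives
// the same answer as the widening multiply of their low halves.
bool
mul_d_matches (uint64_t rj, uint64_t rk)
{
  return static_cast<int64_t> (rj * rk)
         == widening_mul (low32 (rj), low32 (rk));
}
```

With a properly sign-extended -1 in the first register the two agree, but if that register holds 0x00000000ffffffff (low half still -1) the plain 64-bit product is 0x1fffffffe rather than -2.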
GCC Administrator [Sun, 10 Jul 2022 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Aldy Hernandez [Sat, 9 Jul 2022 15:43:44 +0000 (17:43 +0200)]
Set VR_VARYING in irange::irange_single_pair_union.
The fast union operation is sometimes setting a range of the entire
domain, but leaving the kind bit as VR_RANGE instead of downgrading it
to VR_VARYING.
Tested on x86-64 Linux.
gcc/ChangeLog:
* value-range.cc (irange::irange_single_pair_union): Set
VR_VARYING when appropriate.
Vit Kabele [Sat, 9 Jul 2022 17:06:43 +0000 (13:06 -0400)]
[PATCH v3] c: Extend the -Wpadded message with actual padding size
gcc/ChangeLog:
* stor-layout.cc (finalize_record_size): Extend warning message.
gcc/testsuite/ChangeLog:
* c-c++-common/Wpadded.c: New test.
Sam Feifer [Sat, 9 Jul 2022 16:08:01 +0000 (12:08 -0400)]
[PATCH] match.pd: Add new bitwise arithmetic pattern [PR98304]
PR tree-optimization/98304
gcc:
* match.pd (n - (((n > C1) ? n : C1) & -C2)): New simplification.
gcc/testsuite:
* gcc.c-torture/execute/pr98304-2.c: New test.
* gcc.dg/pr98304-1.c: New test.
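The commit message does not state what the new pattern simplifies to. As an unconfirmed reading, for unsigned n with C2 a power of two and C1 == C2 - 1, the expression equals n & C1. The sketch below checks that arithmetic identity only; lhs, rhs and identity_holds are hypothetical names, and the conditions are assumptions of this note rather than the conditions used in match.pd.

```cpp
#include <cassert>
#include <cstdint>

// The matched expression: n - (((n > C1) ? n : C1) & -C2).
// Assumed here: C2 is a power of two and C1 == C2 - 1.
uint32_t
lhs (uint32_t n, uint32_t c1, uint32_t c2)
{
  assert (c2 != 0 && (c2 & (c2 - 1)) == 0 && c1 == c2 - 1);
  uint32_t m = n > c1 ? n : c1;
  return n - (m & -c2);
}

// Candidate simplified form for the unsigned case.
uint32_t
rhs (uint32_t n, uint32_t c1)
{
  return n & c1;
}

// Brute-force comparison of both forms for all n in [0, limit].
bool
identity_holds (uint32_t c2, uint32_t limit)
{
  for (uint32_t n = 0; n <= limit; n++)
    if (lhs (n, c2 - 1, c2) != rhs (n, c2 - 1))
      return false;
  return true;
}
```

The reasoning: when n <= C1 the masked term is C1 & -C2 == 0, so the result is n itself; when n > C1 the masked term is n with its low bits cleared, so the subtraction leaves exactly those low bits.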
Jeff Law [Sat, 9 Jul 2022 15:11:00 +0000 (11:11 -0400)]
[RFA] Improve initialization of objects when the initializer has trailing zeros.
gcc/
* expr.cc (store_expr): Identify trailing NULs in a STRING_CST
initializer and use clear_storage rather than copying the
NULs to the destination array.
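The idea in the entry above (clear the tail instead of copying NULs) can be sketched in ordinary user-level code. This is not the store_expr implementation; count_trailing_nuls and init_from_string are hypothetical helpers, with memset standing in for clear_storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of trailing NUL bytes in the LEN-byte initializer STR.
std::size_t
count_trailing_nuls (const char *str, std::size_t len)
{
  std::size_t n = 0;
  while (n < len && str[len - 1 - n] == '\0')
    n++;
  return n;
}

// Initialize DEST (DEST_SIZE bytes) from INIT (INIT_LEN bytes): copy
// only the prefix that carries data, then clear everything after it
// with a single memset instead of copying the NULs byte for byte.
void
init_from_string (char *dest, std::size_t dest_size,
                  const char *init, std::size_t init_len)
{
  assert (init_len <= dest_size);
  std::size_t copy_len = init_len - count_trailing_nuls (init, init_len);
  std::memcpy (dest, init, copy_len);
  std::memset (dest + copy_len, 0, dest_size - copy_len);
}
```

For an initializer such as "abc" padded with NULs, only three bytes are copied and the rest of the destination is cleared in one operation.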
François Dumont [Sat, 9 Jul 2022 12:18:15 +0000 (14:18 +0200)]
libstdc++: Remove obsolete comment in <string> header
The comment is obsolete because char_traits.h does not include stl_algobase.h
anymore and stl_algobase.h is included directly from <string> a few lines
below.
libstdc++-v3/ChangeLog:
* include/std/string: Remove obsolete comment about char_traits.h including
stl_algobase.h.
Roger Sayle [Sat, 9 Jul 2022 08:07:18 +0000 (09:07 +0100)]
Improve preservation of FLAGS_REG mode in i386.md's peephole2s.
The patch tweaks several peephole2s in i386.md that propagate the flags
register, but take its mode from the SET_SRC rather than preserve the
mode of the original SET_DEST. This encounters problems when the
SET_SRC is a VOIDmode CONST_INT. Fixed by using match_operand with a
flags_reg_operand predicate.
2022-07-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (define_peephole2): Use match_operand of
flags_reg_operand to capture and preserve the mode of FLAGS_REG.
(define_peephole2): Likewise.
(define_peephole2): Likewise...
Roger Sayle [Sat, 9 Jul 2022 08:02:14 +0000 (09:02 +0100)]
Support *testdi_not_doubleword during STV pass on x86.
This patch fixes the current two FAILs of pr65105-5.c on x86 when
compiled with -m32. These (temporary) breakages were fallout from my
patches to improve/upgrade (scalar) double word comparisons.
On mainline, the i386 backend currently represents a critical comparison
using (compare (and (not reg1) reg2) (const_int 0)) which isn't/wasn't
recognized by the STV pass' convertible_comparison_p. This simple STV
patch adds support for this pattern (*testdi_not_doubleword) and
generates the vector pandn and ptest instructions expected in the
existing (failing) test case.
2022-07-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (convert_compare): Add support
for *testdi_not_doubleword pattern, "(compare (and (not ...)))"
by generating a pandn followed by ptest.
(convertible_comparison_p): Recognize both *cmpdi_doubleword and
recent *testdi_not_doubleword comparison patterns.
Tamar Christina [Sat, 9 Jul 2022 01:54:44 +0000 (21:54 -0400)]
[PATCH][s390]: Fix the usage of store_bit_field in the backend.
Hi All,
I seem to have broken the s390 bootstrap because I added a new parameter to the
store_bit_field function to indicate whether the value whose field is being set
is currently undefined.
If it is undefined we use a subreg instead. In this case passing false
restores the old behavior.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* config/s390/s390.cc (s390_expand_atomic): Pass false to store_bit_field to
indicate that the value is not undefined.
Andrew Pinski [Thu, 7 Jul 2022 22:06:19 +0000 (22:06 +0000)]
Fix tree-opt/PR106087: ICE with inline-asm with multiple output and assigned only static vars
The problem here is that when we mark the ssa name that was referenced in the now removed
dead store (to a write only static variable), the inline-asm would also be removed
even though it was defining another ssa name. This fixes the problem by checking
to make sure that the statement was only defining one ssa name.
Committed as approved after bootstrapping and testing on x86_64 with no regressions.
PR tree-optimization/106087
gcc/ChangeLog:
* tree-ssa-dce.cc (simple_dce_from_worklist): Check
to make sure the statement is only defining one operand.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/inline-asm-1.c: New test.
GCC Administrator [Sat, 9 Jul 2022 00:16:54 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Fri, 8 Jul 2022 17:28:24 +0000 (10:28 -0700)]
libbacktrace: check for sys/link.h
QNX uses sys/link.h rather than link.h for dl_iterate_phdr
Fixes https://github.com/ianlancetaylor/libbacktrace/issues/86
* configure.ac: Check for sys/link.h. Use either link.h or
sys/link.h when checking for dl_iterate_phdr.
* elf.c: Include sys/link.h if available.
* configure, config.h.in: Regenerate.
Martin Jambor [Fri, 8 Jul 2022 16:12:26 +0000 (18:12 +0200)]
testsuite: Fix tree-ssa/alias-access-path-13.c on 32bit platforms (PR 106216)
For gcc.dg/tree-ssa/alias-access-path-13.c to work, SRA must think of
accesses to foo.inn.val and to foo itself as different ones, i.e. they
need to have different offset and size, which on 32bit platforms they
do not. Fixed by replacing a dummy long int field of the union with a
struct of two integers.
Tested by:
make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=alias-access-path-13.c" and
make -k check-gcc RUNTESTFLAGS="--target_board=unix'{-m32}' tree-ssa.exp=alias-access-path-13.c"
on an x86_64-linux, also with patched SRA to verify it still tests the
original intent.
gcc/testsuite/ChangeLog:
2022-07-08 Martin Jambor <mjambor@suse.cz>
PR testsuite/106216
* gcc.dg/tree-ssa/alias-access-path-13.c (union foo): Replace a long
int field with a struct that is larger than an int also on 32bit
platforms.
Lewis Hyatt [Thu, 7 Jul 2022 17:59:27 +0000 (13:59 -0400)]
diagnostics: Make line-ending logic consistent with libcpp [PR91733]
libcpp recognizes a lone \r as a valid line ending, so the infrastructure
for retrieving source lines to be output in diagnostics needs to do the
same. This patch fixes file_cache_slot::get_next_line() accordingly so that
diagnostics display the correct part of the source when \r line endings are in
use.
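The line-ending rule described above can be sketched as a small splitter that accepts "\n", "\r\n" and a lone "\r". This is a standalone illustration, not the code in input.cc; split_lines is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split TEXT into lines, treating "\n", "\r\n" and a lone "\r" as line
// endings.  The terminators themselves are not kept.
std::vector<std::string>
split_lines (const std::string &text)
{
  std::vector<std::string> lines;
  std::size_t start = 0;
  while (start < text.size ())
    {
      std::size_t end = text.find_first_of ("\r\n", start);
      if (end == std::string::npos)
        {
          // Last line has no terminator.
          lines.push_back (text.substr (start));
          break;
        }
      lines.push_back (text.substr (start, end - start));
      // "\r\n" counts as one terminator, not two.
      if (text[end] == '\r' && end + 1 < text.size ()
          && text[end + 1] == '\n')
        end++;
      start = end + 1;
    }
  return lines;
}
```

With this rule, a file that uses only "\r" endings yields one entry per source line, so a diagnostic that points at line N shows the text of line N rather than the whole file.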
gcc/ChangeLog:
PR preprocessor/91733
* input.cc (find_end_of_line): New helper function.
(file_cache_slot::get_next_line): Recognize \r as a line ending.
* diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest
since \r will now be interpreted as a line-ending.
gcc/testsuite/ChangeLog:
PR preprocessor/91733
* c-c++-common/pr91733.c: New test.
Martin Liska [Wed, 29 Jun 2022 08:26:52 +0000 (10:26 +0200)]
sanitizer: Fix hwasan related option conflicts [PR106132]
Split report_conflicting_sanitizer_options(..., SANITIZE_ADDRESS | SANITIZE_HWADDRESS)
call into 2 calls as we don't have any option that would be
address+hwaddress (that conflicts) as well.
PR sanitizer/106132
gcc/ChangeLog:
* opts.cc (finish_options): Use 2 calls to
report_conflicting_sanitizer_options.
gcc/testsuite/ChangeLog:
* c-c++-common/hwasan/arguments-3.c: Cover new ICE.
Richard Biener [Fri, 8 Jul 2022 08:41:59 +0000 (10:41 +0200)]
tree-optimization/106226 - move vectorizer virtual SSA update
When we knowingly have broken virtual SSA form we need to update
it before we eventually perform slpeel manual updating which will
call delete_update_ssa. Currently that's done on-demand but
communicating whether it's a known unavoidable case is broken
there. The following makes that a synchronous operation, but
rather than actually performing the update we record the
need, clear the update SSA sub-state and force virtual renaming
at the very end of the vectorization pass.
PR tree-optimization/106226
* tree-vect-loop-manip.cc (vect_do_peeling): Assert that
no SSA update is needed. Move virtual SSA update ...
* tree-vectorizer.cc (pass_vectorize::execute): ... here,
via forced virtual renaming when TODO_update_ssa_only_virtuals
is queued.
(vect_transform_loops): Return TODO_update_ssa_only_virtuals
when virtual SSA update is required.
(try_vectorize_loop_1): Adjust.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Allow
virtual renaming if the ABI forces an aggregate return
but the original call did not have a virtual definition.
* gfortran.dg/pr106226.f: New testcase.
Martin Liska [Mon, 4 Jul 2022 14:32:51 +0000 (16:32 +0200)]
lto-dump: Do not print output file
Right now the following is printed:
lto-dump
.file "<artificial>"
.ident "GCC: (GNU) 13.0.0 20220707 (experimental)"
.section .note.GNU-stack,"",@progbits
After the patch we print -help and do not emit any assembly output:
lto-dump
Usage: lto-dump [OPTION]... SUB_COMMAND [OPTION]...
LTO dump tool command line options.
-list [options] Dump the symbol list.
-demangle Dump the demangled output.
-defined-only Dump only the defined symbols.
...
gcc/lto/ChangeLog:
* lto-dump.cc (lto_main): Exit in the function
as we don't want any LTO bytecode processing.
gcc/ChangeLog:
* toplev.cc (init_asm_output): Do not init asm_out_file.
Tamar Christina [Fri, 8 Jul 2022 07:30:22 +0000 (08:30 +0100)]
middle-end: don't lower past veclower [PR106063]
Hi All,
My previous patch can cause a problem if the pattern matches after veclower
as it may replace the construct with a vector sequence which the target may not
directly support.
As such, don't perform the rewriting after veclower unless the target supports
the operation. Before veclower, also do the rewriting if the target didn't
support the original operation either.
gcc/ChangeLog:
PR tree-optimization/106063
* match.pd: Do not apply pattern after veclower if the operation is not
supported.
gcc/testsuite/ChangeLog:
PR tree-optimization/106063
* gcc.dg/pr106063.c: New test.
Thomas Schwinge [Thu, 7 Jul 2022 13:11:03 +0000 (15:11 +0200)]
Fix one issue in OpenMP 'requires' directive diagnostics
Fix-up for recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp".
gcc/
* lto-cgraph.cc (input_offload_tables) <LTO_symtab_edge>: Correct
'fn2' computation.
libgomp/
* testsuite/libgomp.c-c++-common/requires-1.c: Add 'dg-note's.
* testsuite/libgomp.c-c++-common/requires-2.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
* testsuite/libgomp.fortran/requires-1.f90: Likewise.
Tamar Christina [Fri, 8 Jul 2022 06:37:20 +0000 (07:37 +0100)]
middle-end: Use subregs to expand COMPLEX_EXPR to set the lowpart.
When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs. One for the
lowpart and one for the highpart.
The problem with this is that in RTL the lvalue of the RTX is the only thing
tying the two instructions together.
This means that e.g. combine is unable to try to combine the two instructions
for setting the lowpart and highpart.
For ISAs that have bit extract instructions we can eliminate one of the extracts
if, and only if we're setting the entire complex number.
This change changes the expand code when we're setting the entire complex number
to generate a subreg for the lowpart instead of a vec_extract.
This allows us to optimize sequences such as:
_Complex int f(int a, int b) {
_Complex int t = a + b * 1i;
return t;
}
from:
f:
bfi x2, x0, 0, 32
bfi x2, x1, 32, 32
mov x0, x2
ret
into:
f:
bfi x0, x1, 32, 32
ret
I have also confirmed the codegen for x86_64 did not change.
gcc/ChangeLog:
* expmed.cc (store_bit_field_1): Add parameter that indicates if value is
still undefined and if so emit a subreg move instead.
(store_integral_bit_field): Likewise.
(store_bit_field): Likewise.
* expr.h (write_complex_part): Likewise.
* expmed.h (store_bit_field): Add new parameter.
* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Use new
parameter.
(expand_ifn_atomic_compare_exchange): Likewise.
* calls.cc (store_unaligned_arguments_into_pseudos): Likewise.
* emit-rtl.cc (validate_subreg): Likewise.
* expr.cc (emit_group_store): Likewise.
(copy_blkmode_from_reg): Likewise.
(copy_blkmode_to_reg): Likewise.
(clear_storage_hints): Likewise.
(write_complex_part): Likewise.
(emit_move_complex_parts): Likewise.
(expand_assignment): Likewise.
(store_expr): Likewise.
(store_field): Likewise.
(expand_expr_real_2): Likewise.
* ifcvt.cc (noce_emit_move_insn): Likewise.
* internal-fn.cc (expand_arith_set_overflow): Likewise.
(expand_arith_overflow_result_store): Likewise.
(expand_addsub_overflow): Likewise.
(expand_neg_overflow): Likewise.
(expand_mul_overflow): Likewise.
(expand_arith_overflow): Likewise.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/complex-init.C: New test.
Haochen Jiang [Tue, 5 Jul 2022 06:12:18 +0000 (14:12 +0800)]
i386: Handle memory operand for direct call to cvtps2pd in unpack
gcc/ChangeLog:
PR target/106180
* config/i386/sse.md (sse2_cvtps2pd<mask_name>_1):
Rename from *sse2_cvtps2pd<mask_name>_1.
(vec_unpacks_lo_v4sf): Add handler for memory operand.
gcc/testsuite/ChangeLog:
PR target/106180
* g++.target/i386/pr106180-1.C: New test.
Lulu Cheng [Thu, 7 Jul 2022 10:07:28 +0000 (18:07 +0800)]
LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Under the LA architecture, when the stack is dropped too far, the process
of dropping the stack is divided into two steps.
step1: After dropping the stack, save callee saved registers on the stack.
step2: The rest of it.
The stack drop operation is optimized when frame->total_size minus
frame->sp_fp_offset is an integer multiple of 4096, which can reduce the number
of instructions required to drop the stack. However, this optimization is
not effective because of the original calculation method.
The following case:
int main()
{
char buf[1024 * 12];
printf ("%p\n", buf);
return 0;
}
As you can see from the generated assembler, the old GCC has two more
instructions than the new GCC: lines 14 and 24 of the old output.
new                         | old
10 main:                    | 11 main:
11 addi.d   $r3,$r3,-16     | 12 lu12i.w  $r13,-12288>>12
12 lu12i.w  $r13,-12288>>12 | 13 addi.d   $r3,$r3,-2032
13 lu12i.w  $r5,-12288>>12  | 14 ori      $r13,$r13,2016
14 lu12i.w  $r12,12288>>12  | 15 lu12i.w  $r5,-12288>>12
15 st.d     $r1,$r3,8       | 16 lu12i.w  $r12,12288>>12
16 add.d    $r12,$r12,$r5   | 17 st.d     $r1,$r3,2024
17 add.d    $r3,$r3,$r13    | 18 add.d    $r12,$r12,$r5
18 add.d    $r5,$r12,$r3    | 19 add.d    $r3,$r3,$r13
19 la.local $r4,.LC0        | 20 add.d    $r5,$r12,$r3
20 bl       %plt(printf)    | 21 la.local $r4,.LC0
21 lu12i.w  $r13,12288>>12  | 22 bl       %plt(printf)
22 add.d    $r3,$r3,$r13    | 23 lu12i.w  $r13,8192>>12
23 ld.d     $r1,$r3,8       | 24 ori      $r13,$r13,2080
24 or       $r4,$r0,$r0     | 25 add.d    $r3,$r3,$r13
25 addi.d   $r3,$r3,16      | 26 ld.d     $r1,$r3,2024
26 jr       $r1             | 27 or       $r4,$r0,$r0
                            | 28 addi.d   $r3,$r3,2032
                            | 29 jr       $r1
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_compute_frame_info):
Modify fp_sp_offset and gp_sp_offset's calculation method,
when frame->mask or frame->fmask is zero, don't minus UNITS_PER_WORD
or UNITS_PER_FP_REG.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/prolog-opt.c: New test.
GCC Administrator [Fri, 8 Jul 2022 00:16:22 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Thu, 7 Jul 2022 23:54:05 +0000 (16:54 -0700)]
libbacktrace: don't exit Mach-O dyld library loop on one failure
* macho.c (backtrace_initialize) [HAVE_MACH_O_DYLD_H]: Don't exit
loop if we can't find debug info for one shared library.
Ian Lance Taylor [Thu, 7 Jul 2022 23:13:57 +0000 (16:13 -0700)]
libbacktrace: don't let "make clean" remove allocfail.sh
For https://github.com/ianlancetaylor/libbacktrace/issues/81
* Makefile.am (MAKETESTS): New variable split out of TESTS.
(CLEANFILES): Replace TESTS with BUILDTESTS and MAKETESTS.
* Makefile.in: Regenerate.
Patrick Palka [Thu, 7 Jul 2022 20:46:29 +0000 (16:46 -0400)]
c++: generic targs and identity substitution [PR105956]
In r13-1045-gcb7fd1ea85feea I assumed that substitution into generic
DECL_TI_ARGS corresponds to an identity mapping of the given arguments,
and hence it's safe to always elide such substitution. But this PR
demonstrates that such a substitution isn't always the identity mapping,
in particular when there's an ARGUMENT_PACK_SELECT argument, which gets
handled specially during substitution:
* when substituting an APS into a template parameter, we strip the
APS to its underlying argument;
* and when substituting an APS into a pack expansion, we strip the
APS to its underlying argument pack.
In this testcase, when expanding the pack expansion pattern (idx + Ns)...
with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> and
then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic template
arguments of the enclosing class template impl, so before r13-1045,
we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as desired.
But after r13-1045, we elide this substitution and end up attempting to
hash the original Ns argument, an APS, which ICEs.
So this patch reverts that part of r13-1045. I considered using
preserve_args in this case instead, but that'd break the static_assert
in the testcase because preserve_args always strips APS to its
underlying argument, but here we want to strip it to its underlying
argument pack, so we'd incorrectly end up forming the specializations
impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx.
Although we can't elide the substitution into DECL_TI_ARGS in light of
ARGUMENT_PACK_SELECT, it should still be safe to elide template argument
coercion in the case of a non-template decl, which this patch preserves.
It's unfortunate that we need to remove this optimization just because
it doesn't hold for one special tree code. So this patch implements a
heuristic in tsubst_template_args to avoid allocating a new TREE_VEC if
the substituted elements are identical to those of a level from ARGS, as
well as a similar heuristic for tsubst_argument_pack. It turns out that
about 40% of all calls to tsubst_template_args benefit from this, and it
reduces memory usage by about 4% for e.g. range-v3's zip.cpp (relative to
r13-1045) which more than makes up for the reversion.
PR c++/105956
gcc/cp/ChangeLog:
* pt.cc (template_arg_to_parm): Define.
(tsubst_argument_pack): Try to reuse the corresponding
ARGUMENT_PACK from 'args' when substituting into a generic
ARGUMENT_PACK for a variadic template parameter.
(tsubst_template_args): Move variable declarations closer to
their first use. Replace 'orig_t' with 'r'. Rename 'need_new'
to 'const_subst_p'. Heuristically detect if the substituted
elements are identical to that of a level from 'args' and avoid
allocating a new TREE_VEC if so. Add sanity check for the
length of the new TREE_VEC, and remove dead ARGUMENT_PACK_P test.
(tsubst_decl) <case TYPE_DECL, case VAR_DECL>: Revert
r13-1045-gcb7fd1ea85feea change for avoiding substitution into
DECL_TI_ARGS, but still avoid coercion in this case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/variadic183.C: New test.
David Malcolm [Thu, 7 Jul 2022 19:50:26 +0000 (15:50 -0400)]
analyzer: use label_text for superedge::get_description
gcc/analyzer/ChangeLog:
* checker-path.cc (start_cfg_edge_event::get_desc): Update for
superedge::get_description returning a label_text.
* engine.cc (feasibility_state::maybe_update_for_edge): Likewise.
* supergraph.cc (superedge::dump): Likewise.
(superedge::get_description): Convert return type from char * to
label_text.
* supergraph.h (superedge::get_description): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 7 Jul 2022 19:50:26 +0000 (15:50 -0400)]
Convert label_text to C++11 move semantics
libcpp's class label_text stores a char * for a string and a flag saying
whether it owns the buffer. I added this class before we could use
C++11, and so to avoid lots of copying it required an explicit call
to label_text::maybe_free to potentially free the buffer.
Now that we can use C++11, this patch removes label_text::maybe_free in
favor of doing the cleanup in the destructor, and using C++ move
semantics to avoid any copying. This allows lots of messy cleanup code
to be eliminated in favor of implicit destruction (mostly in the
analyzer).
No functional change intended.
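As a rough illustration of the ownership pattern described above, here is a
minimal sketch of a move-only string holder with an ownership flag. It is not
the libcpp class: owned_text and its members borrow, copy_of, get and
owns_buffer are hypothetical names chosen for this example, and malloc/free
are assumed as the allocator.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical simplification: a char buffer plus a flag saying whether
// this object is responsible for freeing it.
class owned_text
{
public:
  owned_text () : m_buffer (nullptr), m_owned (false) {}
  ~owned_text () { if (m_owned) std::free (m_buffer); }

  // Copying is disabled so that two objects can never both own a buffer.
  owned_text (const owned_text &) = delete;
  owned_text &operator= (const owned_text &) = delete;

  owned_text (owned_text &&other) noexcept
    : m_buffer (other.m_buffer), m_owned (other.m_owned)
  { other.release (); }

  owned_text &operator= (owned_text &&other) noexcept
  {
    if (this != &other)
      {
        if (m_owned) std::free (m_buffer);
        m_buffer = other.m_buffer;
        m_owned = other.m_owned;
        other.release ();
      }
    return *this;
  }

  // Refer to BUFFER without taking ownership of it.
  static owned_text borrow (const char *buffer)
  { return owned_text (const_cast<char *> (buffer), false); }

  // Own a malloc-allocated copy of STR.
  static owned_text copy_of (const char *str)
  {
    std::size_t len = std::strlen (str) + 1;
    char *buf = static_cast<char *> (std::malloc (len));
    if (buf) std::memcpy (buf, str, len);
    return owned_text (buf, buf != nullptr);
  }

  const char *get () const { return m_buffer; }
  bool owns_buffer () const { return m_owned; }

private:
  owned_text (char *buffer, bool owned) : m_buffer (buffer), m_owned (owned) {}
  void release () { m_buffer = nullptr; m_owned = false; }

  char *m_buffer;
  bool m_owned;
};
```

With this shape, callers no longer need an explicit cleanup call: a value
returned from a function is moved into place and freed, if owned, when it
goes out of scope.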
gcc/analyzer/ChangeLog:
* call-info.cc (call_info::print): Update for removal of
label_text::maybe_free in favor of automatic memory management.
* checker-path.cc (checker_event::dump): Likewise.
(checker_event::prepare_for_emission): Likewise.
(state_change_event::get_desc): Likewise.
(superedge_event::should_filter_p): Likewise.
(start_cfg_edge_event::get_desc): Likewise.
(warning_event::get_desc): Likewise.
(checker_path::dump): Likewise.
(checker_path::debug): Likewise.
* diagnostic-manager.cc
(diagnostic_manager::prune_for_sm_diagnostic): Likewise.
(diagnostic_manager::prune_interproc_events): Likewise.
* program-state.cc (sm_state_map::to_json): Likewise.
* region.cc (region::to_json): Likewise.
* sm-malloc.cc (inform_nonnull_attribute): Likewise.
* store.cc (binding_map::to_json): Likewise.
(store::to_json): Likewise.
* svalue.cc (svalue::to_json): Likewise.
gcc/c-family/ChangeLog:
* c-format.cc (range_label_for_format_type_mismatch::get_text):
Update for removal of label_text::maybe_free in favor of automatic
memory management.
gcc/ChangeLog:
* diagnostic-format-json.cc (json_from_location_range): Update for
removal of label_text::maybe_free in favor of automatic memory
management.
* diagnostic-format-sarif.cc
(sarif_builder::make_location_object): Likewise.
* diagnostic-show-locus.cc (struct pod_label_text): New.
(class line_label): Convert m_text from label_text to pod_label_text.
(layout::print_any_labels): Move "text" to the line_label.
* tree-diagnostic-path.cc (path_label::get_text): Update for
removal of label_text::maybe_free in favor of automatic memory
management.
(event_range::print): Likewise.
(default_tree_diagnostic_path_printer): Likewise.
(default_tree_make_json_for_path): Likewise.
libcpp/ChangeLog:
* include/line-map.h: Include <utility>.
(class label_text): Delete maybe_free method in favor of a
destructor. Add move ctor and assignment operator. Add deletion
of the copy ctor and copy-assignment operator. Rename field
m_caller_owned to m_owned. Add std::move where necessary; add
moved_from member function.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 7 Jul 2022 19:50:26 +0000 (15:50 -0400)]
analyzer: fix false positives from -Wanalyzer-tainted-divisor [PR106225]
gcc/analyzer/ChangeLog:
PR analyzer/106225
* sm-taint.cc (taint_state_machine::on_stmt): Move handling of
assignments from division to...
(taint_state_machine::check_for_tainted_divisor): ...this new
function. Reject warning when the divisor is known to be non-zero.
* sm.cc: Include "analyzer/program-state.h".
(sm_context::get_old_region_model): New.
* sm.h (sm_context::get_old_region_model): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/106225
* gcc.dg/analyzer/taint-divisor-1.c: Add test coverage for various
correct and incorrect checks against zero.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Thu, 7 Jul 2022 12:40:20 +0000 (13:40 +0100)]
libstdc++: Remove workaround in __gnu_cxx::char_traits::move [PR89074]
The front-end bug that prevented this constexpr loop from working has
been fixed since GCC 12.1 so we can remove the workaround.
libstdc++-v3/ChangeLog:
PR c++/89074
* include/bits/char_traits.h (__gnu_cxx::char_traits::move):
Remove workaround for front-end bug.
Prathamesh Kulkarni [Thu, 7 Jul 2022 16:33:35 +0000 (22:03 +0530)]
statistics.cc: Add check to see if fn is not NULL in get_function_name.
gcc/ChangeLog:
* statistics.cc (get_function_name): Add check to see if fn is not NULL.
Jason Merrill [Thu, 7 Jul 2022 14:12:04 +0000 (10:12 -0400)]
c++: -Woverloaded-virtual and dtors [PR87729]
My earlier patch broke out of the loop over base members when we found a
match, but that caused trouble for dtors, which can have multiple for which
same_signature_p is true. But as the function comment says, we know this
doesn't apply to [cd]tors, so skip them.
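For illustration, one plausible shape of such a hierarchy is sketched below;
it is an assumption for this note, not the new testcase, and the names A, B,
C and call_f are hypothetical. The destructor of C overrides the virtual
destructors of both bases, so more than one base member matches it.

```cpp
#include <cassert>

struct A
{
  virtual ~A () {}
  virtual int f (int) { return 1; }
};

struct B
{
  virtual ~B () {}
};

// ~C overrides both ~A and ~B, so a walk over base members finds more
// than one function with a matching signature.
struct C : A, B
{
  ~C () override {}
  int f (int) override { return 2; }
};

// Calls through the base still dispatch to C::f.
int call_f (A &a) { return a.f (0); }
```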
PR c++/87729
gcc/cp/ChangeLog:
* class.cc (warn_hidden): Ignore [cd]tors.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Woverloaded-virt3.C: New test.
Martin Liska [Thu, 7 Jul 2022 10:15:28 +0000 (12:15 +0200)]
lto-plugin: use locking only for selected targets
For now, support locking only for linux targets that are different from
riscv* where the target depends on libatomic (and fails during
bootstrap).
PR lto/106170
lto-plugin/ChangeLog:
* configure.ac: Configure HAVE_PTHREAD_LOCKING.
* lto-plugin.c (LOCK_SECTION): New.
(UNLOCK_SECTION): New.
(claim_file_handler): Use the newly added macros.
(onload): Likewise.
* config.h.in: Regenerate.
* configure: Regenerate.
Richard Biener [Thu, 7 Jul 2022 10:59:47 +0000 (12:59 +0200)]
Speedup update-ssa some more
The following avoids copying an sbitmap and one traversal by not
re-allocating old_ssa_names when that is unnecessary. In addition, this
actually checks what the comment before the PHI insertion iteration promises:
that the old_ssa_names set does not grow.
* tree-into-ssa.cc (iterating_old_ssa_names): New.
(add_new_name_mapping): Grow {new,old}_ssa_names separately
and only when actually needed. Assert we are not growing
the old_ssa_names set when iterating over it.
(update_ssa): Remove old_ssa_names copying and empty_p
query, note we are iterating over it and expect no set changes.
Thomas Schwinge [Tue, 5 Jul 2022 10:21:33 +0000 (12:21 +0200)]
Fix Intel MIC 'mkoffload' for OpenMP 'requires'
Similar to how the other 'mkoffload's got changed in
recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp".
This also means finally switching Intel MIC 'mkoffload' to
'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver',
making 'GOMP_offload_register', 'GOMP_offload_unregister'
legacy entry points.
gcc/
* config/i386/intelmic-mkoffload.cc (generate_host_descr_file)
(prepare_target_image, main): Handle OpenMP 'requires'.
(generate_host_descr_file): Switch to 'GOMP_offload_register_ver',
'GOMP_offload_unregister_ver'.
libgomp/
* target.c (GOMP_offload_register, GOMP_offload_unregister):
Denote as legacy entry points.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_any): New proc.
* testsuite/libgomp.c-c++-common/requires-1.c: Enable for
'offload_target_any'.
* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
* testsuite/libgomp.fortran/requires-1.f90: Likewise.
Thomas Schwinge [Thu, 7 Jul 2022 07:45:42 +0000 (09:45 +0200)]
Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing
These should compile and link and execute in all configurations; host-fallback
execution, which we may actually verify.
Follow-up to recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp".
libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing.
* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
Thomas Schwinge [Thu, 7 Jul 2022 07:59:45 +0000 (09:59 +0200)]
Adjust 'libgomp.c-c++-common/requires-3.c'
As documented, this one does "Check diagnostic by device-compiler's lto1".
Indeed there are none when compiling with '-foffload=disable' with an
offloading-enabled compiler, so we should use 'offload_target_[...]', as
used in other similar test cases.
Follow-up to recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp".
libgomp/
* testsuite/libgomp.c-c++-common/requires-3.c: Adjust.
Richard Biener [Thu, 7 Jul 2022 08:46:01 +0000 (10:46 +0200)]
target/106219 - properly mark builtins pure via ix86_add_new_builtins
The target optimize pragma path to initialize extra target specific
builtins missed handling of the pure_p flag which in turn causes
extra clobber side-effects of gather builtins leading to unexpected
issues downhill.
PR target/106219
* config/i386/i386-builtins.cc (ix86_add_new_builtins): Properly
set DECL_PURE_P.
* g++.dg/pr106219.C: New testcase.
Jonathan Wakely [Wed, 6 Jul 2022 18:11:05 +0000 (19:11 +0100)]
testsuite: Fix incorrect -mfloat128-type option
This test currently fails with an error about -mfloat128-type being an
invalid option, which is not what it's supposed to be testing:
XFAIL: gcc.target/powerpc/ppc-fortran/pr80108-1.f90 -O (test for excess errors)
Excess errors:
xgcc: error: unrecognized command-line option '-mfloat128-type'; did you mean '-mfloat128'?
With this change we get the error that the comment says it expects:
XFAIL: gcc.target/powerpc/ppc-fortran/pr80108-1.f90 -O (test for excess errors)
Excess errors:
f951: Error: power9 target option is incompatible with '-mcpu=<xxx>' for <xxx> less than power9
f951: Error: '-mfloat128' requires VSX support
f951: Error: '-m64' requires a PowerPC64 cpu
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: Change
-mfloat128-type to -mfloat128.
Richard Biener [Thu, 7 Jul 2022 07:29:55 +0000 (09:29 +0200)]
Speed up LC SSA rewrite more
In many cases loops have only one exit or a variable is only live
across one of the exits. In this case we know that all uses
outside of the loop will be dominated by the single LC PHI node
we insert. If that holds for all variables requiring LC SSA PHIs
then we can simplify the update_ssa process, avoiding the
(iterated) dominance frontier computations.
* tree-ssa-loop-manip.cc (add_exit_phis_var): Return the
number of LC PHIs inserted.
(add_exit_phis): Return whether any variable required
multiple LC PHI nodes.
(rewrite_into_loop_closed_ssa_1): Use TODO_update_ssa_no_phi
when possible.
Richard Biener [Thu, 7 Jul 2022 07:00:00 +0000 (09:00 +0200)]
Speed up LC SSA rewrite
The following avoids collecting all loops exit blocks into bitmaps
and computing the union of those up the loop tree possibly repeatedly.
Instead we make sure to do this only once for each loop with a
definition possibly requiring a LC phi node plus make sure to
leverage recorded exits to avoid the intermediate bitmap allocation.
* tree-ssa-loop-manip.cc (compute_live_loop_exits): Take
the def loop exit block bitmap as argument instead of
re-computing it here.
(add_exit_phis_var): Adjust.
(loop_name_cmp): New function.
(add_exit_phis): Sort variables to insert LC PHI nodes
after definition loop, for each definition loop compute
the exit block bitmap once.
(get_loops_exit): Remove.
(rewrite_into_loop_closed_ssa_1): Do not pre-record
all loop exit blocks into bitmaps. Record loop exits
if required.
Dimitrije Milosevic [Wed, 6 Jul 2022 17:58:20 +0000 (01:58 +0800)]
Mips: Fix the ASAN shadow offset hook for the n32 ABI
gcc/ChangeLog:
* config/mips/mips.cc (mips_asan_shadow_offset): Reformat
to handle the N32 ABI.
* config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Remove
the macro, as it is not needed anymore.
Dimitrije Milosevic [Wed, 6 Jul 2022 17:55:23 +0000 (01:55 +0800)]
libsanitizer: Cherry-pick
5d8077565e41 from upstream
5d8077565e41: [MIPS][AddressSanitizer] Resolve build issues for the n32 ABI
GCC Administrator [Thu, 7 Jul 2022 00:16:46 +0000 (00:16 +0000)]
Daily bump.
Thomas Schwinge [Tue, 5 Jul 2022 16:23:15 +0000 (18:23 +0200)]
Restore 'GOMP_offload_unregister_ver' functionality
The recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp" changed the
'GOMP_offload_register_ver' interface but didn't change
'GOMP_offload_unregister_ver' accordingly, so we're no longer
actually unregistering.
gcc/
* config/gcn/mkoffload.cc (process_obj): Clarify 'target_data' ->
'[...]_data'.
* config/nvptx/mkoffload.cc (process): Likewise.
libgomp/
* target.c (GOMP_offload_register_ver): Clarify 'target_data' ->
'data'.
(GOMP_offload_unregister_ver): Likewise. Fix up 'target_data'.
Thomas Schwinge [Tue, 5 Jul 2022 09:04:46 +0000 (11:04 +0200)]
Define 'OMP_REQUIRES_[...]', 'GOMP_REQUIRES_[...]' in a single place
Clean up for recent commit
683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp".
gcc/
* omp-general.h (enum omp_requires): Use 'GOMP_REQUIRES_[...]'.
include/
* gomp-constants.h (OMP_REQUIRES_[...]): Update comment.
Lewis Hyatt [Tue, 5 Jul 2022 21:15:28 +0000 (17:15 -0400)]
diagnostics: Honor #pragma GCC diagnostic in the preprocessor [PR53431]
As discussed on PR c++/53431, currently, "#pragma GCC diagnostic" does
not always take effect for diagnostics generated by libcpp. The reason
is that libcpp itself does not interpret this pragma and only sends it on
to the frontend, hence the pragma is only honored if the frontend
arranges for it. The C frontend does process the pragma immediately
(more or less) after seeing the token, so things work fine there. The PR
points out that it doesn't work for C++, because the C++ frontend
doesn't handle anything until it has read all the tokens from
libcpp. The underlying problem is not C++-specific, though, and for
instance, gcc -E has the same issue.
This commit fixes the PR by adding the concept of an early pragma handler that
can be registered by frontends, which gives them a chance to process
diagnostic pragmas from libcpp before it is too late for them to take
effect. The C++ and preprocess-only frontends are modified to use early
pragmas and correct the behavior.
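A small sketch of the kind of source this affects, assuming -Wundef as the
example of a warning that is issued by libcpp rather than by the parser. The
macro and function names are hypothetical, and this is not one of the new
testcases.

```cpp
#include <cassert>

// -Wundef is diagnosed while the preprocessor evaluates #if, so the pragma
// has to be in effect before the front end has parsed anything.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wundef"
#if HYPOTHETICAL_UNDEFINED_MACRO  /* Evaluates to 0; would warn under -Wundef.  */
#define FEATURE_ENABLED 1
#else
#define FEATURE_ENABLED 0
#endif
#pragma GCC diagnostic pop

int feature_enabled () { return FEATURE_ENABLED; }
```

Compiling this with -Wundef is expected to be silent inside the push/pop
region once the pragma is honored for preprocessor diagnostics.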
gcc/c-family/ChangeLog:
PR preprocessor/53920
PR c++/53431
* c-common.cc (c_option_is_from_cpp_diagnostics): New function.
* c-common.h (c_option_is_from_cpp_diagnostics): Declare.
(c_pp_stream_token): Declare.
* c-ppoutput.cc (init_pp_output): Refactor logic about skipping
pragmas to...
(should_output_pragmas): ...here. New function.
(token_streamer::stream): Support handling early pragmas.
(do_line_change): Likewise.
(c_pp_stream_token): New function.
* c-pragma.cc (struct pragma_diagnostic_data): New helper class.
(pragma_diagnostic_lex_normal): New function. Moved logic for
interpreting GCC diagnostic pragmas here.
(pragma_diagnostic_lex_pp): New function for parsing diagnostic pragmas
directly from libcpp.
(handle_pragma_diagnostic): Refactor into helper function...
(handle_pragma_diagnostic_impl): ...here. New function.
(handle_pragma_diagnostic_early): New function.
(handle_pragma_diagnostic_early_pp): New function.
(struct pragma_ns_name): Renamed to...
(struct pragma_pp_data): ...this. Add new "early_handler" member.
(c_register_pragma_1): Support early pragmas in the preprocessor.
(c_register_pragma_with_early_handler): New function.
(c_register_pragma): Support the new early handlers in struct
internal_pragma_handler.
(c_register_pragma_with_data): Likewise.
(c_register_pragma_with_expansion): Likewise.
(c_register_pragma_with_expansion_and_data): Likewise.
(c_invoke_early_pragma_handler): New function.
(c_pp_invoke_early_pragma_handler): New function.
(init_pragma): Add early pragma support for diagnostic pragmas.
* c-pragma.h (struct internal_pragma_handler): Add new early handler
members.
(c_register_pragma_with_early_handler): Declare.
(c_invoke_early_pragma_handler): Declare.
(c_pp_invoke_early_pragma_handler): Declare.
gcc/cp/ChangeLog:
PR c++/53431
* parser.cc (cp_parser_pragma_kind): Move earlier in the file.
(cp_lexer_handle_early_pragma): New function.
(cp_lexer_new_main): Support parsing and handling early pragmas.
(c_parse_file): Adapt to changes in cp_lexer_new_main.
gcc/testsuite/ChangeLog:
PR preprocessor/53920
PR c++/53431
* c-c++-common/pragma-diag-11.c: New test.
* c-c++-common/pragma-diag-12.c: New test.
* c-c++-common/pragma-diag-13.c: New test.
Iain Buclaw [Wed, 6 Jul 2022 17:45:28 +0000 (19:45 +0200)]
d: Merge upstream dmd
56589f0f4, druntime
651389b5, phobos
1516ecad9.
D front-end changes:
- Import latest bug fixes to mainline.
D runtime changes:
- Import latest bug fixes to mainline.
Phobos changes:
- Import latest bug fixes to mainline.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd
56589f0f4.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime
651389b5.
* src/MERGE: Merge upstream phobos
1516ecad9.
Iain Buclaw [Fri, 1 Jul 2022 15:40:18 +0000 (17:40 +0200)]
d: Build the D sources in the front-end with -fno-exceptions
The D front-end does not use exceptions, but it still requires RTTI for
some lowerings of convenience language features. Enforce this by
building with `-fno-exceptions'.
gcc/d/ChangeLog:
* Make-lang.in (NOEXCEPTION_DFLAGS): Define.
(ALL_DFLAGS): Add NO_EXCEPTION_DFLAGS.
Immad Mir [Wed, 6 Jul 2022 16:08:27 +0000 (21:38 +0530)]
analyzer: add testcase of using closed fd without warning.
This patch adds a testcase for passing a closed fd to a function
that does not emit any warning.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-4.c: Add a new testcase to demonstrate
passing of a closed file descriptor to a function that does
not emit any warning.
Signed-off-by: Immad Mir <mirimmad@outlook.com>
Immad Mir [Wed, 6 Jul 2022 16:07:14 +0000 (21:37 +0530)]
analyzer: reorder initialization of state m_invalid in sm-fd.cc [PR106184]
This patch reorders the initialization of state m_invalid in sm-fd.cc
so that the order of initializers is same as the ordering of the fields
in the class decl.
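As background, C++ initializes non-static data members in declaration order,
whatever order the constructor's initializer list is written in, and GCC can
warn about a mismatch with -Wreorder. A minimal sketch with hypothetical
names (state_holder, m_first, m_second), unrelated to the actual sm-fd.cc
fields:

```cpp
#include <cassert>

struct state_holder
{
  // The initializers are listed in the same order as the fields below, so
  // m_first is initialized before m_second reads it.
  explicit state_holder (int base)
    : m_first (base), m_second (m_first + 1)
  {}

  int m_first;
  int m_second;
};
```

Had m_second been declared before m_first, it would be initialized first and
would read m_first before that member had a value, regardless of how the
initializer list was written.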
gcc/analyzer/ChangeLog:
PR analyzer/106184
* sm-fd.cc (fd_state_machine): Change ordering of initialization
of state m_invalid so that the order of initializers is same as
the ordering of the fields in the class decl.
Signed-off-by: Immad Mir <mirimmad@outlook.com>
Immad Mir [Wed, 6 Jul 2022 16:05:53 +0000 (21:35 +0530)]
analyzer: show close event for use_after_close diagnostic
This patch saves the "close" event in use_after_close diagnostic
and shows it where possible.
gcc/analyzer/ChangeLog:
* sm-fd.cc (use_after_close): save the "close" event and
show it where possible.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-4.c (test_3): change the message note to conform to the
changes in analyzer/sm-fd.cc
(test_4): Likewise.
Signed-off-by: Immad Mir <mirimmad@outlook.com>
Piotr Trojanek [Thu, 9 Jun 2022 10:05:21 +0000 (12:05 +0200)]
[Ada] Simplify regular expression that matches 8 consecutive digits
Makefile cleanup; behaviour is unaffected.
gcc/ada/
* gcc-interface/Make-lang.in (ada/generated/gnatvsn.ads):
Simplify regular expression. The "interval expression",
i.e. \{8\} is part of the POSIX regular expressions, so it
should not be a problem for modern implementations of sed.
Eric Botcazou [Thu, 9 Jun 2022 16:49:10 +0000 (18:49 +0200)]
[Ada] Update comment after recent changes wrt. secondary stack & tagged types
gcc/ada/
* gcc-interface/trans.cc (gnat_to_gnu): Update comment.
Eric Botcazou [Tue, 7 Jun 2022 19:46:04 +0000 (21:46 +0200)]
[Ada] Improve code generated for aggregates of VFA type
This avoids using a full access for constants internally generated from
assignments of aggregates with a Volatile_Full_Access type.
gcc/ada/
* gcc-interface/gigi.h (simple_constant_p): Declare.
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Strip
the qualifiers from the type of a simple constant.
(simple_constant_p): New predicate.
* gcc-interface/trans.cc (node_is_atomic): Return true for objects
with atomic type except for simple constants.
(node_is_volatile_full_access): Return false for simple constants
with VFA type.
Eric Botcazou [Fri, 3 Jun 2022 08:02:56 +0000 (10:02 +0200)]
[Ada] Fix crash on aliased renaming of unconstrained array
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Create a
local constant holding the underlying GNAT type of the object. Do
not fiddle with the object size for an unconstrained array.
Eric Botcazou [Tue, 24 May 2022 08:01:13 +0000 (10:01 +0200)]
[Ada] Small tweak to gnat_to_gnu_subprog_type
No functional changes.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Constify a
local variable and move a couple of others around.
Eric Botcazou [Mon, 23 May 2022 23:39:08 +0000 (01:39 +0200)]
[Ada] Do not give warnings for compiler-generated entities by default
gcc/ada/
* gcc-interface/trans.cc (gnat_gimplify_expr) <SAVE_EXPR>: New case.
Eric Botcazou [Thu, 16 Jun 2022 11:22:18 +0000 (13:22 +0200)]
[Ada] Document the various function return mechanisms
gcc/ada/
* exp_ch6.adb (Function return mechanisms): New paragraph.
Yannick Moy [Thu, 16 Jun 2022 12:14:56 +0000 (14:14 +0200)]
[Ada] Deferred constant considered as not preelaborable
Fix detection of non-preelaborable constructs for checking SPARK
elaboration rules, which was tagging deferred constant declarations as
not preelaborable.
gcc/ada/
* sem_util.adb (Is_Non_Preelaborable_Construct): Fix for
deferred constants.
Justin Squirek [Wed, 15 Jun 2022 02:03:48 +0000 (02:03 +0000)]
[Ada] Indexing error when calling GNAT.Regpat.Match
This patch corrects an error in the compiler whereby a buffer sizing
error fails to get raised when compiling a regex expression with an
insufficiently sized Pattern_Matcher as the documentation indicated.
This, in turn, could lead to indexing errors when attempting to call
Match with the malformed regex program buffer.
gcc/ada/
* libgnat/s-regpat.adb, libgnat/s-regpat.ads (Compile): Add a
new defaulted parameter Error_When_Too_Small to trigger an
error, if specified true, when Matcher is too small to hold the
compiled regex program.
Justin Squirek [Wed, 15 Jun 2022 01:14:31 +0000 (01:14 +0000)]
[Ada] Spurious non-callable warning on prefixed call in class condition
This patch corrects an error in the compiler whereby a function call in
prefix notation within a class condition causes a spurious error
claiming the name in the call is a non-callable entity when there exists
a type extension in the same unit extended with a component featuring
the same name as the function in question.
gcc/ada/
* sem_ch4.adb (Analyze_Selected_Component): Add condition to
avoid interpreting derived type components as candidates for
selected components in preanalysis of inherited class
conditions.
Yannick Moy [Fri, 10 Jun 2022 15:18:23 +0000 (17:18 +0200)]
[Ada] Support ghost generic formal parameters
This adds support in GNAT for ghost generic formal parameters, as
included in SPARK RM 6.9.
gcc/ada/
* ghost.adb (Check_Ghost_Context): Delay checking for generic
associations.
(Check_Ghost_Context_In_Generic_Association): Perform ghost
checking in analyzed generic associations.
(Check_Ghost_Formal_Procedure_Or_Package): Check SPARK RM
6.9(13-14) for formal procedures and packages.
(Check_Ghost_Formal_Variable): Check SPARK RM 6.9(13-14) for
variables.
* ghost.ads: Declarations for the above.
* sem_ch12.adb (Analyze_Associations): Apply delayed checking
for generic associations.
(Analyze_Formal_Object_Declaration): Same.
(Analyze_Formal_Subprogram_Declaration): Same.
(Instantiate_Formal_Package): Same.
(Instantiate_Formal_Subprogram): Same.
(Instantiate_Object): Same. Copy ghost aspect to newly declared
object for actual for IN formal object. Use new function
Get_Enclosing_Deep_Object to retrieve root object.
(Instantiate_Type): Copy ghost aspect to declared subtype for
actual for formal type.
* sem_prag.adb (Analyze_Pragma): Recognize new allowed
declarations.
* sem_util.adb (Copy_Ghost_Aspect): Copy the ghost aspect
between nodes.
(Get_Enclosing_Deep_Object): New function to return enclosing
deep object (or root for reachable part).
* sem_util.ads (Copy_Ghost_Aspect): Same.
(Get_Enclosing_Deep_Object): Same.
* libgnat/s-imageu.ads: Declare formal subprograms as ghost.
* libgnat/s-valuei.ads: Same.
* libgnat/s-valuti.ads: Same.
Javier Miranda [Tue, 10 May 2022 17:18:30 +0000 (17:18 +0000)]
[Ada] Missing error on tagged type conversion
The compiler does not report an error on a type conversion to/from a
tagged type whose parent type is an interface type and there is no
relationship between the source and target types. This bug has been
dormant since January/2016.
This patch also improves the text of errors reported on interface type
conversions suggesting how to fix these errors.
gcc/ada/
* sem_res.adb (Resolve_Type_Conversion): Code cleanup since the
previous static check has been moved to Valid_Tagged_Conversion.
(Valid_Tagged_Conversion): Fix the code checking conversion
to/from interface types since it incorrectly returns True when the
parent type of the operand type (or the target type) is an
interface type; add missing static checks on interface type
conversions.
Marc Poulhiès [Thu, 2 Jun 2022 07:52:21 +0000 (09:52 +0200)]
[Ada] Handle secondary stack memory allocations alignment
To accommodate cases where objects allocated on the secondary stack
needed a more constrained alignment than Standard'Maximum_Alignment,
all allocations in the full runtime were forced to be aligned on
Standard'Maximum_Alignment*2. This change removes this workaround and
correctly handles the over-alignment in all runtimes.
This change modifies the SS_Allocate procedure to accept a new Alignment
parameter and to dynamically realign the pointer returned by the memory
allocation (Allocate_* functions or dedicated stack allocations for
zfp/cert).
It also simplifies the 0-sized allocations by not allocating any memory
if pointer is already correctly aligned (already the case in cert and
zfp runtimes).
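The realignment step amounts to rounding an address up to a multiple of the
requested alignment. The sketch below shows that arithmetic in C++ rather
than Ada; align_addr is a hypothetical name for this illustration, not the
Align_Addr routine in s-secsta.adb, and it assumes the alignment is a power
of two.

```cpp
#include <cassert>
#include <cstdint>

// Round ADDR up to the next multiple of ALIGNMENT, which must be a
// nonzero power of two.  An already aligned address is returned as is.
inline std::uintptr_t align_addr (std::uintptr_t addr,
                                  std::uintptr_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr + alignment - 1) & ~(alignment - 1);
}
```

An allocator using this approach has to reserve up to alignment - 1 extra
bytes so that the realigned pointer still leaves room for the object. When
the pointer is already aligned and the requested size is zero, nothing needs
to be reserved at all.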
gcc/ada/
* libgnat/s-secsta.ads (SS_Allocate): Add new Alignment
parameter.
(Memory_Alignment): Remove.
* libgnat/s-secsta.adb (Align_Addr): New.
(SS_Allocate): Add new Alignment parameter. Realign pointer if
needed. Don't allocate anything for 0-sized allocations.
* gcc-interface/utils2.cc (build_call_alloc_dealloc_proc): Add
allocated object's alignment as last parameter to allocation
invocation.
Piotr Trojanek [Tue, 14 Jun 2022 11:47:27 +0000 (13:47 +0200)]
[Ada] Cleanup use of local scalars in GNAT.Socket.Get_Address_Info
A cleanup opportunity spotted while working on improved detection of
uninitialised local scalar objects.
gcc/ada/
* libgnat/g-socket.adb (Get_Address_Info): Reduce scope of the
Found variable; avoid repeated assignment inside the loop.
Doug Rupp [Wed, 8 Jun 2022 20:32:51 +0000 (13:32 -0700)]
[Ada] Vxworks7* - Makefile.rtl rtp vs rtp-smp cleanup
Only smp runtimes are built for vxworks7*, even though the -smp suffix
is removed during install. Therefore, in general, the build macros for
the non-smp runtimes are superfluous except on the legacy ppc-vxworks6
target where both the smp and non-smp runtime are built. Lastly, an
error message is added if a runtime build is commanded that doesn't
exist, rather than letting the build mysteriously fail.
gcc/ada/
* Makefile.rtl [arm,aarch64 vxworks7]: Remove rtp and kernel
build macros and set an error variable if needed.
[x86,x86_vxworks7]: Likewise.
[ppc,ppc64]: Set an error variable if needed.
(rts-err): New phony Makefile target.
(setup-rts): Depend on rts-err.
Eric Botcazou [Sat, 11 Jun 2022 11:05:39 +0000 (13:05 +0200)]
[Ada] Fix incorrect itype sharing for case expression in limited type return
The compiler aborts with an internal error in gigi, but the problem is an
itype incorrectly shared between several branches of an if_statement that
has been created for a Build-In-Place return.
Three branches of this if_statement contain an allocator statement and
the latter two have been obtained as the result of calling New_Copy_Tree
on the first; now the initialization expression of the first had also been
obtained as the result of calling New_Copy_Tree on the original tree, and
these chained calls to New_Copy_Tree run afoul of an issue with the copy
of itypes after the rewrite of an aggregate as an expression with actions.
Fixing this issue looks quite delicate, so this fixes the incorrect sharing
by replacing the chained calls to New_Copy_Tree with repeated calls on the
original expression, which is more elegant in any case.
gcc/ada/
* exp_ch3.adb (Make_Allocator_For_BIP_Return): New local function.
(Expand_N_Object_Declaration): Use it to build the three allocators
for a Build-In-Place return with an unconstrained type. Update the
head comment after other recent changes.
Doug Rupp [Thu, 2 Jun 2022 15:38:50 +0000 (08:38 -0700)]
[Ada] Remove old vxworks from Makefile.rtl - e500 port.
The powerpc e500 port has been LTS'd
gcc/ada/
* libgnat/system-vxworks7-e500-kernel.ads: Remove.
* libgnat/system-vxworks7-e500-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-e500-rtp.ads: Likewise.
Justin Squirek [Fri, 10 Jun 2022 12:16:17 +0000 (12:16 +0000)]
[Ada] Incorrect emptying of CUDA global subprograms
This patch corrects an error in the compiler whereby no
Corresponding_Spec was set for emptied CUDA global subprograms - leading
to a malformed tree.
gcc/ada/
* gnat_cuda.adb (Empty_CUDA_Global_Subprogram): Set
Specification and Corresponding_Spec to match the original
Kernel_Body.
Piotr Trojanek [Thu, 9 Jun 2022 20:01:06 +0000 (22:01 +0200)]
[Ada] Remove explicit call to Make_Unchecked_Type_Conversion
Respect a comment in sinfo.ads, which says: "Unchecked type conversion
nodes should be created by calling Tbuild.Unchecked_Convert_To, rather
than by directly calling Nmake.Make_Unchecked_Type_Conversion."
No test appears to be affected by this change, so this is just a
cleanup.
gcc/ada/
* exp_ch6.adb (Build_Static_Check_Helper_Call): Replace explicit
call to Make_Unchecked_Type_Conversion with a call to
Unchecked_Convert_To.
* tbuild.adb (Unchecked_Convert_To): Fix whitespace.
Piotr Trojanek [Thu, 9 Jun 2022 21:23:46 +0000 (23:23 +0200)]
[Ada] Restore accidentally removed part of a comment about unset references
Fix an unintentionally removed comment.
gcc/ada/
* sem_res.adb (Resolve_Actuals): Restore first sentence of a
comment.
Eric Botcazou [Wed, 8 Jun 2022 11:14:46 +0000 (13:14 +0200)]
[Ada] Fix spurious error for aggregate with box component choice
It comes from the Volatile_Full_Access (or Atomic) aspect: the aggregate is
effectively analyzed/resolved twice and this does not work. It is fixed by
calling Is_Full_Access_Aggregate before resolution.
gcc/ada/
* exp_aggr.adb (Expand_Record_Aggregate): Do not call
Is_Full_Access_Aggregate here.
* freeze.ads (Is_Full_Access_Aggregate): Delete.
* freeze.adb (Is_Full_Access_Aggregate): Move to...
(Freeze_Entity): Do not call Is_Full_Access_Aggregate here.
* sem_aggr.adb (Is_Full_Access_Aggregate): ...here
(Resolve_Aggregate): Call Is_Full_Access_Aggregate here.
David Malcolm [Wed, 6 Jul 2022 11:27:45 +0000 (07:27 -0400)]
analyzer: fix uninit false positive with -ftrivial-auto-var-init= [PR106204]
-fanalyzer handles -ftrivial-auto-var-init= by special-casing
IFN_DEFERRED_INIT to be a no-op, so that e.g.:
len_2 = .DEFERRED_INIT (4, 2, &"len"[0]);
is treated as a no-op, so that len_2 is still uninitialized after the
stmt.
PR analyzer/106204 reports that -fanalyzer gives false positives from
-Wanalyzer-use-of-uninitialized-value on locals that have their address
taken, due to e.g.:
_1 = .DEFERRED_INIT (4, 2, &"len"[0]);
len = _1;
where -fanalyzer leaves _1 uninitialized, and then complains about
the assignment to "len".
Fixed thusly by suppressing the warning when assigning from such SSA
names.
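For context, the source pattern involved is a local whose address is taken
before it is read. The following sketch is an assumption about that pattern,
not the new testcase; read_length and get_length are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper that stores a value through OUT and reports success.
static bool read_length (unsigned *out)
{
  *out = 4;
  return true;
}

unsigned get_length ()
{
  // The address of "len" is taken below.  With -ftrivial-auto-var-init=
  // such a local is initialized through a temporary SSA name, as in the
  // "_1 = .DEFERRED_INIT (...); len = _1;" sequence quoted above.
  unsigned len;
  if (!read_length (&len))
    return 0;
  return len;
}
```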
gcc/analyzer/ChangeLog:
PR analyzer/106204
* region-model.cc (within_short_circuited_stmt_p): Move extraction
of assign_stmt to caller.
(due_to_ifn_deferred_init_p): New.
(region_model::check_for_poison): Move extraction of assign_stmt
from within_short_circuited_stmt_p to here. Share logic with
call to due_to_ifn_deferred_init_p.
gcc/testsuite/ChangeLog:
PR analyzer/106204
* gcc.dg/analyzer/torture/uninit-pr106204.c: New test.
* gcc.dg/analyzer/uninit-pr106204.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jason Merrill [Tue, 5 Jul 2022 21:05:47 +0000 (17:05 -0400)]
c++: dependent conversion operator lookup [PR106179]
This testcase demonstrates that my assumption that we would only be
interested in a class template lookup if the template-id is followed by ::
was wrong.
PR c++/106179
PR c++/106024
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lookup_name): Remove :: requirement
for using unqualified lookup result.
gcc/testsuite/ChangeLog:
* g++.dg/template/operator16.C: New test.
GCC Administrator [Wed, 6 Jul 2022 00:16:33 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Sun, 3 Jul 2022 21:37:23 +0000 (14:37 -0700)]
compiler: propagate array length error marker farther
Fixes golang/go#53639
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/415936
Ian Lance Taylor [Mon, 4 Jul 2022 19:20:36 +0000 (12:20 -0700)]
compiler: better error message for unknown package name
Fixes golang/go#51237
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/415994
Alexandre Oliva [Tue, 5 Jul 2022 22:07:32 +0000 (19:07 -0300)]
libstdc++: testsuite: why cast getpid result
Add a comment next to the getpid call to explain why the typecast is
needed.
for libstdc++-v3/ChangeLog
* testsuite/util/testsuite_fs.h (nonexistent_path): Explain
why we need the typecast.
Marek Polacek [Tue, 5 Jul 2022 18:22:26 +0000 (14:22 -0400)]
c-family: Prevent -Wformat warnings with u8 strings [PR105626]
The <https://gcc.gnu.org/pipermail/gcc/2022-May/238679.html> thread
seems to have concluded that -Wformat shouldn't warn about
printf((const char*) u8"test %d\n", 1);
saying "format string is not an array of type 'char'". This code
is not an aliasing violation, and there are no I/O functions for u8
strings, so the const char * cast is OK and shouldn't be disregarded.
PR c++/105626
gcc/c-family/ChangeLog:
* c-format.cc (check_format_arg): Don't emit -Wformat warnings with
u8 strings.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wformat-char8_t-1.C: New test.
Andrew MacLeod [Tue, 5 Jul 2022 14:54:26 +0000 (10:54 -0400)]
Provide a relation verification mechanism.
Provide a relation oracle API which validates a relation between 2 ranges.
This allows relation queries that are symbolically true to be overridden
by range-specific information, e.g. x == x is true symbolically, but for
floating point a NaN may invalidate this assumption.
* value-relation.cc (relation_to_code): New vector.
(relation_oracle::validate_relation): New.
(set_relation): Allow ssa1 == ssa2 to be registered.
* value-relation.h (validate_relation): New prototype.
(query_relation): Make internal variant protected.
Roger Sayle [Tue, 5 Jul 2022 17:06:13 +0000 (18:06 +0100)]
Doubleword version of and;cmp to not;test optimization on x86.
This patch extends the earlier and;cmp to not;test optimization to also
perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
One motivation for this is that it's a step to fixing the current failure
of gcc.target/i386/pr65105-5.c on -m32.
A more direct benefit for x86_64 is that the following code:
int foo(__int128 x, __int128 y)
{
return (x & y) == y;
}
improves with -O2 from 15 instructions:
movq %rdi, %r8
movq %rsi, %rax
movq %rax, %rdi
movq %r8, %rsi
movq %rdx, %r8
andq %rdx, %rsi
andq %rcx, %rdi
movq %rsi, %rax
movq %rdi, %rdx
xorq %r8, %rax
xorq %rcx, %rdx
orq %rdx, %rax
sete %al
movzbl %al, %eax
ret
to the slightly better 13 instructions:
movq %rdi, %r8
movq %rsi, %rax
movq %r8, %rsi
movq %rax, %rdi
notq %rsi
notq %rdi
andq %rdx, %rsi
andq %rcx, %rdi
movq %rsi, %rax
orq %rdi, %rax
sete %al
movzbl %al, %eax
ret
2022-07-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) <COMPARE>: Provide costs
for double word comparisons and tests (comparisons against zero).
* config/i386/i386.md (*test<mode>_not_doubleword): Split DWI
and;cmp into andn;cmp $0 as a pre-reload splitter.
(*andn<dwi>3_doubleword_bmi): Use <dwi> instead of <mode> in name.
(*<any_or><dwi>3_doubleword): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/testnot-3.c: New test case.
Roger Sayle [Tue, 5 Jul 2022 17:00:00 +0000 (18:00 +0100)]
UNSPEC_PALIGNR optimizations and clean-ups on x86.
This patch is a follow-up to Hongtao's fix for PR target/105854. That
fix is perfectly correct, but the thing that caught my eye was why is
the compiler generating a shift by zero at all. Digging deeper it
turns out that we can easily optimize __builtin_ia32_palignr for
alignments of 0 and 64, which may be simplified to moves of the
highpart and lowpart respectively.
After adding optimizations to simplify the 64-bit DImode palignr, I
started to add the corresponding optimizations for vpalignr (i.e.
128-bit). The first oddity is that sse.md uses TImode and a special
SSESCALARMODE iterator, rather than V1TImode, and indeed the comment
above SSESCALARMODE hints that this should be "dropped in favor of
VIMAX_AVX2_AVX512BW". Hence this patch includes the migration of
<ssse3_avx2>_palignr<mode> to use VIMAX_AVX2_AVX512BW, basically
using V1TImode instead of TImode for 128-bit palignr.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-,32},
with no new failures. Ok for mainline?
2022-07-05 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
* config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
* config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
and gen_ssse3_palignrv1ti instead of TImode.
* config/i386/sse.md (SSESCALARMODE): Delete.
(define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
(<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
iterator instead of SSESCALARMODE.
(ssse3_palignrdi): Optimize cases where operands[3] is 0 or 64,
using a single move instruction (if required).
gcc/testsuite/ChangeLog
* gcc.target/i386/ssse3-palignr-2.c: New test case.
Roger Sayle [Tue, 5 Jul 2022 16:55:53 +0000 (17:55 +0100)]
PR rtl-optimization/96692: ((A|B)^C)^A using andn with -mbmi on x86.
This patch addresses PR rtl-optimization/96692 on x86_64, by providing
a set of combine splitters to convert the three operation ((A|B)^C)^D
into a two operation sequence using andn when either A or B is the same
register as C or D. This is essentially a reassociation problem that's
only a win if the target supports an and-not instruction (as with -mbmi).
Hence for the new test case:
int f(int a, int b, int c)
{
return (a ^ b) ^ (a | c);
}
GCC on x86_64-pc-linux-gnu with -O2 -mbmi would previously generate:
xorl %edi, %esi
orl %edx, %edi
movl %esi, %eax
xorl %edi, %eax
ret
but with this patch now generates:
andn %edx, %edi, %eax
xorl %esi, %eax
ret
2022-07-05 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR rtl-optimization/96692
* config/i386/i386.md (define_split): Split ((A | B) ^ C) ^ D
as (X & ~Y) ^ Z on target BMI when either C or D is A or B.
gcc/testsuite/ChangeLog
PR rtl-optimization/96692
* gcc.target/i386/bmi-andn-4.c: New test case.
Nathan Sidwell [Fri, 24 Jun 2022 12:57:42 +0000 (05:57 -0700)]
c++: Prune ordinary locations
Like macro locations, we only need to emit ordinary location
information for locations emitted into the CMI. This adds a hash table
noting which ordinary lines are needed. These are then sorted and
(sufficiently) adjacent lines are coalesced to a single map. There is
a tradeoff here: allowing greater separation reduces the number of
line maps, but increases the number of locations. It appears allowing
2 or 3 intervening lines is the sweet spot, and this patch chooses 2.
Compiling a hello-world that #includes <iostream> in its GMF gives a
5-fold reduction in the number of locations, but about a 4-fold increase
in the number of maps. In one of the xtreme-header tests we halve the
number of locations and increase the number of maps 9-fold.
Module interfaces that emit no entities (or macros, if a header-unit),
will now have no location tables.
gcc/cp/
* module.cc
(struct ord_loc_info, ord_loc_traits): New.
(ord_loc_table, ord_loc_remap): New globals.
(struct location_map_info): Delete.
(struct module_state_config): Rename ordinary_loc_align to
loc_range_bits.
(module_for_ordinary_loc): Adjust.
(module_state::note_location): Note ordinary locations,
return bool.
(module_state::write_location): Adjust ordinary location
streaming.
(module_state::read_location): Likewise.
(module_state::write_init_maps): Allocate ord_loc_table.
(module_state::write_prepare_maps): Reimplement ordinary
map preparation.
(module_state::read_prepare_maps): Adjust.
(module_state::write_ordinary_maps): Reimplement.
(module_state::write_macro_maps): Adjust.
(module_state::read_ordinary_maps): Reimplement.
(module_state::write_macros): Adjust.
(module_state::write_config): Adjust.
(module_state::read_config): Adjust.
(module_state::write_begin): Adjust.
(module_state::read_initial): Adjust.
gcc/testsuite/
* g++.dg/modules/loc-prune-1.C: Adjust.
* g++.dg/modules/loc-prune-4.C: New.
* g++.dg/modules/pr98718_a.C: Adjust.
* g++.dg/modules/pr98718_b.C: Adjust.
* g++.dg/modules/pr99072.H: Adjust.
Richard Biener [Tue, 5 Jul 2022 12:14:49 +0000 (14:14 +0200)]
tree-optimization/106198 - CFG cleanup vs LC SSA
This is another case like PR106182 where for the 2nd testcase in
the bug there are no removed or discovered loops but still changing
loop exits invalidates LC SSA and it is not enough to just scan for
uses in the blocks that changed loop depth. One might argue that
if we'd include former exit destinations we'd pick up the original
LC SSA use but for virtuals on block merging we'd have propagated
those out (while for regular uses we insert copies). CFG cleanup
can also be entered with loops needing fixup so any heuristics
based on loop structure are bound to fail.
PR tree-optimization/106198
* tree-cfgcleanup.cc (repair_loop_structures): Always do a
full LC SSA rewrite but only if any blocks changed loop
depth.
* gcc.dg/pr106198.c: New testcase.
Richard Biener [Tue, 5 Jul 2022 12:09:36 +0000 (14:09 +0200)]
Remove dead loop-based LC SSA rewrite
The following removes the now unused per-loop path in LC SSA rewrite.
* tree-ssa-loop-manip.cc (find_uses_to_rename_def): Remove.
(find_uses_to_rename_in_loop): Likewise.
(rewrite_into_loop_closed_ssa_1): Remove loop parameter and
uses.
(rewrite_into_loop_closed_ssa): Adjust.
Richard Biener [Tue, 5 Jul 2022 09:38:52 +0000 (11:38 +0200)]
tree-optimization/106186 - propagate out virtual LC PHI nodes properly
The code to remove LC PHI nodes in clean_up_loop_closed_phi does not handle
virtual operands because may_propagate_copy generally returns false
for them. The following copies the merge_blocks variant for
dealing with them.
This fixes a missed jump threading in gcc.dg/auto-init-uninit-4.c
which manifests in bogus uninit diagnostics.
PR tree-optimization/106186
* tree-ssa-propagate.cc (clean_up_loop_closed_phi):
Properly handle virtual PHI nodes.
Richard Biener [Tue, 5 Jul 2022 08:43:42 +0000 (10:43 +0200)]
tree-optimization/106196 - properly update virtual SSA for vector stores
The following properly handles aggregate returns of the const marked
STORE_LANES internal function to update virtual SSA form on-the-fly
rather than relying on a costly virtual SSA rewrite.
PR tree-optimization/106196
* tree-vect-stmts.cc (vect_finish_stmt_generation): Properly
handle aggregate returns of calls for VDEF updates.
* gcc.dg/torture/pr106196.c: New testcase.
Richard Biener [Mon, 4 Jul 2022 12:58:41 +0000 (14:58 +0200)]
Maintain LC SSA when doing SVE vectorization
The final loop IV use emitted after the loop is not in LC SSA form
(and unsimplified _2 = _3 - 0 stmts are inserted). In particular,
since the exit edge is split when there's a virtual PHI in the
destination, this breaks virtual LC SSA form (and likely also
non-virtual).
The following properly inserts LC PHIs instead.
2022-07-04 Richard Biener <rguenther@suse.de>
* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
Maintain LC SSA.
Alexandre Oliva [Tue, 5 Jul 2022 09:12:28 +0000 (06:12 -0300)]
testsuite: fix array type in two_plus_gigs test
The array element type for the two_plus_gigs test was mistakenly put in
as int rather than char.
for gcc/testsuite/ChangeLog
* lib/target-supports.exp (check_effective_target_two_plus_gigs):
Fix array element type. Reported by Hans-Peter Nilsson.