GCC Administrator [Tue, 28 Dec 2021 00:16:37 +0000 (00:16 +0000)]
Daily bump.
Francois-Xavier Coudert [Mon, 27 Dec 2021 20:32:08 +0000 (21:32 +0100)]
Fortran: fix use of static_assert() to conform to C11
libgfortran/ChangeLog:
PR libfortran/98076
* runtime/string.c (gfc_itoa): Use two args for static_assert().
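As an illustration of the conformance point: C11 only defines the two-argument form of static_assert(), so a portable assertion carries a message string. The sketch below is a minimal example; the asserted condition and the helper bits_in_ull() are hypothetical and are not taken from runtime/string.c.

```c
#include <assert.h>   /* C11: provides the static_assert macro */
#include <limits.h>

/* C11 requires both the condition and the message string.  */
static_assert (sizeof (unsigned long long) * CHAR_BIT >= 64,
               "unsigned long long must hold at least 64 bits");

/* Hypothetical helper, present only so the example has something to call.  */
int
bits_in_ull (void)
{
  return (int) (sizeof (unsigned long long) * CHAR_BIT);
}
```

If the condition were false, compilation would stop with the given message instead of producing a broken library.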
John David Anglin [Mon, 27 Dec 2021 17:56:19 +0000 (17:56 +0000)]
Improve atomic store implementation on hppa-linux.
2021-12-27 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa-protos.h: Delete
pa_maybe_emit_compare_and_swap_exchange_loop() declaration.
* config/pa/pa.c (pa_expand_compare_and_swap_loop): Delete.
(pa_maybe_emit_compare_and_swap_exchange_loop): Delete.
* config/pa/pa.md (atomic_storeq): Use __sync_lock_test_and_set
instead of pa_maybe_emit_compare_and_swap_exchange_loop.
(atomic_storehi, atomic_storesi, atomic_storedi): Likewise.
Patrick Palka [Mon, 27 Dec 2021 15:01:42 +0000 (10:01 -0500)]
c++: Add testcase for SFINAE w/ p[N] and incomplete type [PR101239]
The r12-6123 fix for SFINAE with p+N and incomplete type also fixed
the analogous issue with p[N].
PR c++/101239
gcc/testsuite/ChangeLog:
* g++.dg/template/sfinae32a.C: New test.
Patrick Palka [Mon, 27 Dec 2021 14:05:17 +0000 (09:05 -0500)]
c++: hard error w/ ptr+N and incomplete type [PR103700]
In pointer_int_sum when called from a SFINAE context, we need to avoid
calling size_in_bytes_loc on an incomplete pointed-to type since this
latter function isn't SFINAE-enabled and always emits an error on such
input.
PR c++/103700
gcc/c-family/ChangeLog:
* c-common.c (pointer_int_sum): When quiet, return
error_mark_node for an incomplete pointed-to type and don't
call size_in_bytes_loc.
gcc/testsuite/ChangeLog:
* g++.dg/template/sfinae32.C: New test.
H.J. Lu [Sun, 19 Dec 2021 16:47:03 +0000 (08:47 -0800)]
ix86: Don't use the 'm' constraint for x86_64_general_operand
The 'm' constraint is defined with define_memory_constraint which allows
LRA to convert the operand to the form '(mem (reg X))', where X is a
base register. To prevent LRA from generating '(mem (reg X))' from a
register:
1. Add a 'BM' constraint which is similar to the 'm' constraint, but
is defined with define_constraint.
2. Add a 'm' mode attribute which is mapped to the 'm' constraint for
general_operand and the 'BM' constraint for x86_64_general_operand.
3. Replace the 'm' constraint on <general_operand> with the '<m>'
constraint.
4. Replace the 'm' constraint on x86_64_general_operand with the 'BM'
constraint.
gcc/
PR target/103762
* config/i386/constraints.md (BM): New constraint.
* config/i386/i386.md (m): New mode attribute.
Replace the 'm' constraint on <general_operand> with the '<m>'
constraint.
Replace the 'm' constraint on x86_64_general_operand with the
'BM' constraint.
gcc/testsuite/
* gcc.target/i386/pr103762-1a.c: New test.
* gcc.target/i386/pr103762-1b.c: Likewise.
* gcc.target/i386/pr103762-1c.c: Likewise.
Uros Bizjak [Mon, 27 Dec 2021 09:00:28 +0000 (10:00 +0100)]
testsuite: Avoid unwanted vectorization [PR95046]
2021-12-27 Uroš Bizjak <ubizjak@gmail.com>
gcc/testsuite/ChangeLog:
PR target/95046
* gfortran.dg/extract_recip_1.f: Adjust testcase.
LiaoShihua [Mon, 27 Dec 2021 04:03:08 +0000 (12:03 +0800)]
RISC-V: fixed testcase riscv/pr103302.c
riscv32 does not support __int128, so skip the test if -march=rv32*.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr103302.c: Skip if -march=rv32*.
GCC Administrator [Mon, 27 Dec 2021 00:16:20 +0000 (00:16 +0000)]
Daily bump.
H.J. Lu [Fri, 24 Dec 2021 20:50:21 +0000 (12:50 -0800)]
i386: Check AX input in any_mul_highpart peepholes
When applying peephole optimization to transform
mov imm, %reg0
mov %reg1, %AX_REG
imul %reg0
to
mov imm, %AX_REG
imul %reg1
disable peephole optimization if reg1 == AX_REG.
gcc/
PR target/103785
* config/i386/i386.md: Swap operand order in comments and check
AX input in any_mul_highpart peepholes.
gcc/testsuite/
PR target/103785
* gcc.target/i386/pr103785.c: New test.
Francois-Xavier Coudert [Sun, 26 Dec 2021 10:59:14 +0000 (11:59 +0100)]
Fortran: speed up decimal output of integers
libgfortran/ChangeLog:
PR libfortran/98076
* runtime/string.c (itoa64, itoa64_pad19): New helper functions.
(gfc_itoa): On targets with 128-bit integers, call fast
64-bit functions to avoid many slow divisions.
gcc/testsuite/ChangeLog:
PR libfortran/98076
* gfortran.dg/pr98076.f90: New test.
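The idea of replacing many 128-bit divisions with 64-bit ones can be sketched as follows. This is not the libgfortran code: the names put_u64, put_u64_pad19 and u128_to_decimal are hypothetical, and the sketch assumes a target where unsigned __int128 is available. The value is cut into chunks below 10^19, and each chunk is converted with 64-bit arithmetic only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 10^19 is the largest power of ten that fits in 64 bits.  */
#define TEN19 UINT64_C(10000000000000000000)

/* Write N in decimal, ending just before END; return the first digit.  */
static char *
put_u64 (uint64_t n, char *end)
{
  do
    *--end = (char) ('0' + n % 10);
  while ((n /= 10) != 0);
  return end;
}

/* Same, but always emit exactly 19 digits, zero-padded on the left.  */
static char *
put_u64_pad19 (uint64_t n, char *end)
{
  for (int i = 0; i < 19; i++, n /= 10)
    *--end = (char) ('0' + n % 10);
  return end;
}

/* Store the decimal form of N in OUT (NUL-terminated); return its length.  */
size_t
u128_to_decimal (unsigned __int128 n, char out[40])
{
  char tmp[40];                       /* 39 digits plus the NUL */
  char *end = tmp + sizeof tmp - 1;
  char *p = end;
  *end = '\0';
  if (n <= UINT64_MAX)
    p = put_u64 ((uint64_t) n, p);    /* common case: no 128-bit division */
  else
    {
      /* Only a few 128-bit divisions; the digit loops are all 64-bit.  */
      unsigned __int128 hi = n / TEN19;
      p = put_u64_pad19 ((uint64_t) (n % TEN19), p);
      if (hi > UINT64_MAX)
        {
          p = put_u64_pad19 ((uint64_t) (hi % TEN19), p);
          hi /= TEN19;
        }
      p = put_u64 ((uint64_t) hi, p);
    }
  size_t len = (size_t) (end - p);
  assert (len <= 39);
  memcpy (out, p, len + 1);
  return len;
}
```

Digits are produced right to left in a scratch buffer and copied out once, so the caller always gets a string that starts at out[0].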
GCC Administrator [Sun, 26 Dec 2021 00:16:17 +0000 (00:16 +0000)]
Daily bump.
Francois-Xavier Coudert [Sat, 25 Dec 2021 14:07:12 +0000 (15:07 +0100)]
Fortran: simplify library code for integer-to-decimal conversion
libgfortran/ChangeLog:
PR libfortran/81986
PR libfortran/99191
* libgfortran.h: Remove gfc_xtoa(), adjust gfc_itoa() and
GFC_ITOA_BUF_SIZE.
* io/write.c (write_decimal): conversion parameter is always
gfc_itoa(), so remove it. Protect from overflow.
(xtoa): Move gfc_xtoa and update its name.
(xtoa_big): Renamed from ztoa_big for consistency.
(write_z): Adjust to new function names.
(write_i, write_integer): Remove last arg of write_decimal.
* runtime/backtrace.c (error_callback): Comment on the use of
gfc_itoa().
* runtime/error.c (gfc_xtoa): Move to io/write.c.
* runtime/string.c (gfc_itoa): Take an unsigned argument,
remove the handling of negative values.
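Once the conversion routine takes an unsigned argument, the caller has to split a signed value into a sign and a magnitude without overflowing. A minimal sketch of that step, assuming nothing about the actual write_decimal() code (the helper name magnitude is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical caller-side helper: split N into sign and magnitude.  */
uintmax_t
magnitude (intmax_t n, int *negative)
{
  assert (negative != 0);
  *negative = n < 0;
  /* Negate in the unsigned type: -(uintmax_t) n is well defined even for
     INTMAX_MIN, whereas -n on the signed value would overflow.  */
  return n < 0 ? -(uintmax_t) n : (uintmax_t) n;
}
```

The caller can then emit the '-' itself and pass the magnitude to the unsigned conversion.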
GCC Administrator [Sat, 25 Dec 2021 00:16:18 +0000 (00:16 +0000)]
Daily bump.
Uros Bizjak [Fri, 24 Dec 2021 16:09:36 +0000 (17:09 +0100)]
i386: Add V2SFmode DIV insn pattern [PR95046, PR103797]
Use V4SFmode "DIVPS X,Y" with [y0, y1, 1.0f, 1.0f] as a divisor
to avoid division by zero.
2021-12-24 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/95046
PR target/103797
* config/i386/mmx.md (divv2sf3): New instruction pattern.
gcc/testsuite/ChangeLog:
PR target/95046
PR target/103797
* gcc.target/i386/pr95046-1.c (test_div): Add.
(dg-options): Add -mno-recip.
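The padding trick can be modelled in plain C. In the sketch below, v4sf_model and divps_model are hypothetical stand-ins for a 128-bit register and the DIVPS instruction, and the zero upper lanes of the dividend are an assumption of this example, not something the commit states.

```c
#include <assert.h>

/* Stand-in for a 128-bit V4SF register; not GCC's representation.  */
typedef struct { float lane[4]; } v4sf_model;

/* Models DIVPS: all four lanes are divided, wanted or not.  */
static v4sf_model
divps_model (v4sf_model x, v4sf_model y)
{
  v4sf_model q;
  for (int i = 0; i < 4; i++)
    q.lane[i] = x.lane[i] / y.lane[i];
  return q;
}

/* Two-lane division built on the four-lane operation.  */
void
div_v2sf_model (const float x[2], const float y[2], float out[2])
{
  assert (x != 0 && y != 0 && out != 0);
  /* Upper dividend lanes are assumed to be zero in this sketch.  */
  v4sf_model xx = { { x[0], x[1], 0.0f, 0.0f } };
  /* Pad the divisor with 1.0f so the unused lanes never divide by zero.  */
  v4sf_model yy = { { y[0], y[1], 1.0f, 1.0f } };
  v4sf_model q = divps_model (xx, yy);
  out[0] = q.lane[0];
  out[1] = q.lane[1];
}
```

Without the 1.0f padding, the two unused lanes could raise a spurious division-by-zero exception even though their results are discarded.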
Iain Sandoe [Fri, 24 Dec 2021 10:59:35 +0000 (10:59 +0000)]
Darwin: Amend a comment to be more inclusive [NFC].
As per title.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.c (darwin_override_options): Make a comment
more inclusive.
Iain Sandoe [Mon, 20 Dec 2021 15:19:50 +0000 (15:19 +0000)]
Darwin: Update rules for handling alignment of globals.
The current rule was too strict and has not been required since Darwin11.
This relaxes the constraint to allow up to 2^28 alignment for non-common
entities. Common is still restricted to a maximum alignment of 2^15.
When the host is an older version of Darwin (earlier than 11) then the
existing constraint is still applied. Note that this is a host constraint
not a target one (so that a compilation on 10.7 targeting 10.6 is allowed
to use a greater alignment than the tools on 10.6 support). This matches
the behaviour of clang.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config.gcc: Emit L2_MAX_OFILE_ALIGNMENT with suitable
values for the host.
* config/darwin.c (darwin_emit_common): Error for alignment
values > 32768.
* config/darwin.h (MAX_OFILE_ALIGNMENT): Rework to use the
configured L2_MAX_OFILE_ALIGNMENT.
gcc/testsuite/ChangeLog:
* gcc.dg/darwin-aligned-globals.c: New test.
* gcc.dg/darwin-comm-1.c: New test.
* gcc.dg/attr-aligned.c: Amend for new alignment values on
Darwin.
* gcc.target/i386/pr89261.c: Likewise.
Iain Sandoe [Fri, 10 Dec 2021 23:55:49 +0000 (23:55 +0000)]
Darwin: Check that flag-reorder-and-partition is set on.
We were checking whether the flag had been set by the user, but not whether
it was set to true, which means that the check fails in its intent when
the user passes -fno-reorder-and-partition.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.c (darwin_override_options): When checking for the
flag-reorder-and-partition case, also check that it is set on.
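The difference between "the user mentioned the flag" and "the user turned the flag on" can be sketched like this. The struct and function names are hypothetical and only model the logic; they are not the option-handling interfaces used in config/darwin.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one command-line flag.  */
struct flag_state
{
  bool set_by_user;   /* the option appeared on the command line */
  bool value;         /* true for -fX, false for -fno-X */
};

/* Old check: treats -fno-X the same as -fX.  */
bool
user_enabled_old (struct flag_state f)
{
  return f.set_by_user;
}

/* Fixed check: the flag must be both explicit and on.  */
bool
user_enabled_fixed (struct flag_state f)
{
  return f.set_by_user && f.value;
}

/* Usage: the -fno-X case is where the two checks disagree.  */
void
flag_check_demo (void)
{
  struct flag_state no_x = { true, false };
  assert (user_enabled_old (no_x));
  assert (!user_enabled_fixed (no_x));
}
```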
Iain Sandoe [Wed, 22 Dec 2021 14:00:25 +0000 (14:00 +0000)]
Darwin: Define OBJECT_FORMAT_MACHO.
There are places where we need to generate different code depending
on the object format rather than on the arch. We already have
definitions for ELF, COFF etc.; this adds one for MACHO.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.h (OBJECT_FORMAT_MACHO): New.
GCC Administrator [Fri, 24 Dec 2021 00:16:27 +0000 (00:16 +0000)]
Daily bump.
H.J. Lu [Thu, 23 Dec 2021 18:07:25 +0000 (10:07 -0800)]
smuldi3_highpart.c: Replace long with long long for -mx32
* gcc.target/i386/smuldi3_highpart.c: Replace long with long long.
Roger Sayle [Thu, 23 Dec 2021 12:33:07 +0000 (12:33 +0000)]
x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
This is a fix to PR target/103773 where -Oz shouldn't use push/pop
on x86 to shrink writing small integer constants to memory.
Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
when writing -1 to memory when using -Oz. This patch implements this
via peephole2 where we can confirm that it's OK to clobber the flags.
2021-12-23 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/103773
* config/i386/i386.md (*mov<mode>_and): New define_insn for
writing a zero to memory using AND.
(*mov<mode>_or): Extend to allow memory destination and HImode.
(*movdi_internal): Remove -Oz push/pop optimization from here.
(*movsi_internal): Likewise.
(peephole2): Perform -Oz push/pop optimization here, only for
register destinations, values other than zero, and in functions
that don't use the red zone.
(peephole2): With -Oz, convert writes of 0 or -1 to memory into
their clobber forms, i.e. *mov<mode>_and and *mov<mode>_or resp.
gcc/testsuite/ChangeLog
PR target/103773
* gcc.target/i386/pr103773-2.c: New test case.
* gcc.target/i386/pr103773.c: New test case.
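The two replacement forms rely on simple identities: x & 0 is always 0 and x | -1 is always -1, so a read-modify-write with a small immediate can stand in for a plain store. The C sketch below only demonstrates the identities; the function names are hypothetical, and the compiler is free to emit any instruction for them.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to *mem = 0, expressed as the AND form.  */
void
store_zero_via_and (int32_t *mem)
{
  assert (mem != 0);
  *mem &= 0;
}

/* Equivalent to *mem = -1, expressed as the OR form.  */
void
store_minus_one_via_or (int32_t *mem)
{
  assert (mem != 0);
  *mem |= -1;
}
```

Unlike a plain move, both forms modify the flags register, which is why the transformation is done where the flags are known to be dead.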
konglin1 [Tue, 7 Dec 2021 09:08:23 +0000 (17:08 +0800)]
i386: Enable intrinsics that convert float and bf16 data to each other.
gcc/ChangeLog:
* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic.
(_mm512_cvtpbh_ps): Likewise.
(_mm512_maskz_cvtpbh_ps): Likewise.
(_mm512_mask_cvtpbh_ps): Likewise.
* config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise.
(_mm_cvtpbh_ps): Likewise.
(_mm256_cvtpbh_ps): Likewise.
(_mm_maskz_cvtpbh_ps): Likewise.
(_mm256_maskz_cvtpbh_ps): Likewise.
(_mm_mask_cvtpbh_ps): Likewise.
(_mm256_mask_cvtpbh_ps): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test.
* gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto.
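For readers unfamiliar with the format: a bf16 value is the upper 16 bits of an IEEE binary32 float, so widening is a 16-bit shift. The sketch below models that on raw bit patterns. The function names are hypothetical, float is assumed to be binary32, and the narrowing direction simply truncates, which is a simplification and may differ from the rounding the instructions perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Widen: place the 16 bf16 bits in the top half of a binary32.  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t u = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Narrow by truncation (simplified; no rounding is modelled).  */
uint16_t
float_to_bf16_bits_truncate (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}
```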
Feng Xue [Tue, 21 Dec 2021 08:48:16 +0000 (09:48 +0100)]
Fix typo in type verification.
PR ipa/103786
gcc/ChangeLog:
* tree.c (verify_type): Fix typo.
liuhongt [Wed, 22 Dec 2021 08:48:54 +0000 (16:48 +0800)]
Combine vpcmpuw + zero_extend to vpcmpuw.
vcmp{ps,ph,pd} and vpcmp{,u}{b,w,d,q} implicitly clear the upper bits
of dest.
gcc/ChangeLog:
PR target/103750
* config/i386/sse.md
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
New pre_reload define_insn_and_split.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr103750-1.c: New test.
* gcc.target/i386/avx512bw-pr103750-2.c: New test.
* gcc.target/i386/avx512f-pr103750-1.c: New test.
* gcc.target/i386/avx512f-pr103750-2.c: New test.
* gcc.target/i386/avx512fp16-pr103750-1.c: New test.
* gcc.target/i386/avx512fp16-pr103750-2.c: New test.
GCC Administrator [Thu, 23 Dec 2021 00:16:29 +0000 (00:16 +0000)]
Daily bump.
Harald Anlauf [Mon, 20 Dec 2021 21:12:33 +0000 (22:12 +0100)]
Fortran: BOZ literal constants are not interoperable
gcc/fortran/ChangeLog:
PR fortran/103778
* check.c (is_c_interoperable): A BOZ literal constant is not
interoperable.
gcc/testsuite/ChangeLog:
PR fortran/103778
* gfortran.dg/illegal_boz_arg_3.f90: New test.
Harald Anlauf [Mon, 20 Dec 2021 21:01:05 +0000 (22:01 +0100)]
Fortran: CASE selector expressions must be scalar
gcc/fortran/ChangeLog:
PR fortran/103776
* match.c (match_case_selector): Reject expressions in CASE
selector which are not scalar.
gcc/testsuite/ChangeLog:
PR fortran/103776
* gfortran.dg/select_10.f90: New test.
Murray Steele [Wed, 22 Dec 2021 14:55:07 +0000 (14:55 +0000)]
arm: Declare MVE types internally via pragma
Move the implementation of MVE ACLE types from arm_mve_types.h to
inside GCC via a new pragma, which replaces the prior type
definitions. This allows for the types to be used internally for
intrinsic function definitions.
gcc/ChangeLog:
* config.gcc (arm*-*-*): Add arm-mve-builtins.o to extra_objs.
* config/arm/arm-c.c (arm_pragma_arm): Handle "#pragma GCC arm".
(arm_register_target_pragmas): Register it.
* config/arm/arm-protos.h: (arm_mve::arm_handle_mve_types_h): New
prototype.
* config/arm/arm_mve_types.h: Replace MVE type definitions with
new pragma.
* config/arm/t-arm: (arm-mve-builtins.o): New target rule.
* config/arm/arm-mve-builtins.cc: New file.
* config/arm/arm-mve-builtins.def: New file.
* config/arm/arm-mve-builtins.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve.exp: Add new subdirectories.
* gcc.target/arm/mve/general-c/type_redef_1.c: New test.
* gcc.target/arm/mve/general/double_pragmas_1.c: New test.
* gcc.target/arm/mve/general/nomve_1.c: New test.
Murray Steele [Wed, 22 Dec 2021 14:50:00 +0000 (14:50 +0000)]
arm: Move arm_simd_info array declaration into header
Move the arm_simd_type and arm_type_qualifiers enums, and
arm_simd_info struct from arm-builtins.c into arm-builtins.h header.
This is a first step towards internalising the type definitions for
MVE predicate, vector, and tuple types. By moving arm_simd_types into
a header, we allow future patches to use these type trees externally
to arm-builtins.c, which is a crucial step towards developing an MVE
intrinsics framework similar to the current SVE implementation.
gcc/ChangeLog:
* config/arm/arm-builtins.c (enum arm_type_qualifiers): Move to
arm-builtins.h.
(enum arm_simd_type): Move to arm-builtins.h.
(struct arm_simd_type_info): Move to arm-builtins.h.
* config/arm/arm-builtins.h (enum arm_simd_type): Move from
arm-builtins.c.
(enum arm_type_qualifiers): Move from arm-builtins.c.
(struct arm_simd_type_info): Move from arm-builtins.c.
Francois-Xavier Coudert [Wed, 22 Dec 2021 11:46:07 +0000 (12:46 +0100)]
Fortran: allow __float128 on targets where long double is not REAL(KIND=10)
The logic for detection of REAL(KIND=16) in kinds-override.h made
assumptions:
-- if real(kind=10) exists, i.e. if HAVE_GFC_REAL_10 is defined,
then it is necessarily the "long double" type
-- if real(kind=16) exists, then:
* if HAVE_GFC_REAL_10, real(kind=16) is "__float128"
* otherwise, real(kind=16) is "long double"
This may not always be true. Take the aarch64-apple-darwin port,
it has double == long double == binary64, and __float128 == binary128.
We already have more fine-grained logic in the mk-kinds-h.sh script,
where we actually check the Fortran kind corresponding to C’s long
double. So let's use it, and emit the GFC_REAL_16_IS_FLOAT128 /
GFC_REAL_16_IS_LONG_DOUBLE macros there.
libgfortran/ChangeLog:
* kinds-override.h: Move GFC_REAL_16_IS_* macros...
* mk-kinds-h.sh: ... here.
Martin Liska [Wed, 22 Dec 2021 11:16:47 +0000 (12:16 +0100)]
docs: use ';' for function declarations. (part 3)
gcc/ChangeLog:
* doc/extend.texi: Unify all function declarations in examples
where some miss trailing ';'.
Martin Liska [Wed, 22 Dec 2021 11:06:53 +0000 (12:06 +0100)]
docs: use ';' for function declarations. (part 2)
gcc/ChangeLog:
* doc/extend.texi: Unify all function declarations in examples
where some miss trailing ';'.
Martin Liska [Wed, 22 Dec 2021 10:59:28 +0000 (11:59 +0100)]
docs: use ';' for function declarations.
gcc/ChangeLog:
* doc/extend.texi: Unify all function declarations in examples
where some miss trailing ';'.
Martin Liska [Wed, 22 Dec 2021 10:20:42 +0000 (11:20 +0100)]
docs: Unify instruction set name.
gcc/ChangeLog:
* doc/extend.texi: Use uppercase letters for SSEx.
GCC Administrator [Wed, 22 Dec 2021 00:16:30 +0000 (00:16 +0000)]
Daily bump.
Iain Buclaw [Thu, 16 Dec 2021 22:56:16 +0000 (23:56 +0100)]
config: Add check whether D compiler works (PR103528)
As well as checking for the existence of a GDC compiler, validate
that it has also been built with libphobos; otherwise warn or fail with
the message that GDC is required to build d.
config/ChangeLog:
PR d/103528
* acx.m4 (ACX_PROG_GDC): Add check whether D compiler works.
ChangeLog:
* configure: Regenerate.
Iain Buclaw [Tue, 21 Dec 2021 14:03:47 +0000 (15:03 +0100)]
libphobos: Add power*-*-freebsd* as supported target
This has been tested on powerpc64-freebsd13 and powerpc64le-freebsd13,
and used to build dub, along with some D tools from ports.
libphobos/ChangeLog:
* configure.tgt: Add power*-*-freebsd* as a supported target.
Jiang Haochen [Tue, 21 Dec 2021 08:12:02 +0000 (16:12 +0800)]
i386: Add missing BMI intrinsic to align with clang
gcc/ChangeLog:
* config/i386/bmiintrin.h (_tzcnt_u16): New intrinsic.
(_andn_u32): Ditto.
(_andn_u64): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/bmi-1.c: Add test for new intrinsic.
* gcc.target/i386/bmi-2.c: Ditto.
* gcc.target/i386/bmi-3.c: Ditto.
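The semantics of the added intrinsics can be written as portable reference functions. The _ref names below are hypothetical and are not the bmiintrin.h definitions; they only describe what ANDN and a 16-bit TZCNT compute.

```c
#include <assert.h>
#include <stdint.h>

/* ANDN: invert the first operand, then AND with the second.  */
uint32_t
andn_u32_ref (uint32_t a, uint32_t b)
{
  return ~a & b;
}

/* TZCNT on 16 bits: count trailing zero bits; zero input yields 16.  */
unsigned
tzcnt_u16_ref (uint16_t x)
{
  unsigned n = 0;
  if (x == 0)
    return 16;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  assert (n < 16);
  return n;
}
```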
Martin Liska [Tue, 21 Dec 2021 08:10:43 +0000 (09:10 +0100)]
config.sub: change mode to 755.
ChangeLog:
* config.sub: Change mode back to 755.
Xionghu Luo [Wed, 8 Dec 2021 01:24:35 +0000 (19:24 -0600)]
Don't move cold code out of loop by checking bb count
v8 changes:
1. Use hotter_than_inner_loop instead of colder to store a hotter loop
nearest to loop.
2. Update the logic in fill_coldest_and_hotter_out_loop and
get_coldest_out_loop to make common case O(1).
3. Update function argument bb_colder_than_loop_preheader.
4. Make cached array to vec<class *loop> for index checking.
v7 changes:
1. Refine get_coldest_out_loop to replace loop with checking
pre-computed coldest_outermost_loop and colder_than_inner_loop.
2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
colder_than_inner_loop recursively without loop.
v6 changes:
1. Add function fill_coldest_out_loop to pre compute the coldest
outermost loop for each loop.
2. Rename find_coldest_out_loop to get_coldest_out_loop.
3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
v5 changes:
1. Refine comments for new functions.
2. Use basic_block instead of count in bb_colder_than_loop_preheader
to align with function name.
3. Refine with simpler implementation for get_coldest_out_loop and
ref_in_loop_hot_body::operator for better understanding.
v4 changes:
1. Sort out profile_count comparison to function bb_cold_than_loop_preheader.
2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
3. Split RTL invariant motion part out.
4. Remove aux changes.
v3 changes:
1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
2. Remove unnecessary changes.
3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
infinite loop when implementing v1 and the iteration is missed to be
updated actually.
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
...
v8: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586209.html
There was a patch trying to avoid moving cold blocks out of loops:
https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".
In gimple LIM analysis, add get_coldest_out_loop to move invariants to
the expected target loop; if the profile count of the loop bb is colder
than the target loop preheader, the invariant won't be hoisted out of the loop.
Likewise for store motion, if all locations of the REF in the loop are cold,
don't do store motion of it.
SPEC2017 performance evaluation shows a 1% performance improvement for
intrate GEOMEAN and no obvious regression for others. In particular,
500.perlbench_r improves by 7.52% (perf shows function S_regtry of perlbench is
largely improved), 548.exchange2_r by 1.98%, and 526.blender_r by 1.00%
on P8LE.
gcc/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
* tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
function.
(get_coldest_out_loop): New function.
(determine_max_movement): Use get_coldest_out_loop.
(move_computations_worker): Adjust and fix iteration update.
(class ref_in_loop_hot_body): New functor.
(ref_in_loop_hot_body::operator): New.
(can_sm_ref_p): Use for_all_locs_in_loop.
(fill_coldest_and_hotter_out_loop): New.
(tree_ssa_lim_finalize): Free coldest_outermost_loop and
hotter_than_inner_loop.
(loop_invariant_motion_in_fun): Call fill_coldest_and_hotter_out_loop.
gcc/testsuite/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
* gcc.dg/tree-ssa/recip-3.c: Adjust.
* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
* gcc.dg/tree-ssa/ssa-lim-20.c: New test.
* gcc.dg/tree-ssa/ssa-lim-21.c: New test.
* gcc.dg/tree-ssa/ssa-lim-22.c: New test.
* gcc.dg/tree-ssa/ssa-lim-23.c: New test.
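The kind of code this change is aimed at looks roughly like the loop below: an invariant computation sits in a rarely executed block, so hoisting it to the preheader would run it on every call even when the cold path is never taken. The function is a hypothetical illustration, not one of the new testcases, and whether a given GCC build hoists here depends on its profile estimates.

```c
#include <assert.h>

long
sum_with_rare_fixup (const int *a, int n, int k)
{
  long sum = 0;
  assert (a != 0 || n == 0);
  for (int i = 0; i < n; i++)
    {
      if (__builtin_expect (a[i] < 0, 0))
        /* Loop-invariant, but in a block hinted as cold.  */
        sum += (long) k * k * k;
      else
        sum += a[i];
    }
  return sum;
}
```

Here __builtin_expect is the GCC builtin used to hint that the negative-element branch is unlikely.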
Xionghu Luo [Wed, 8 Dec 2021 05:17:51 +0000 (23:17 -0600)]
Fix loop split incorrect count and probability
In tree-ssa-loop-split.c, split_loop and split_loop_on_cond perform two
kinds of split. split_loop only works for a single loop and inserts an edge at
the exit when splitting, while split_loop_on_cond is not limited to a single loop
and inserts an edge at the latch when splitting. Both split behaviors should
update loop count and probability. For split_loop, the loop split condition
is moved in front of loop1 and loop2, whereas split_loop_on_cond moves the
condition between loop1 and loop2. This patch does:
1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
2) probability update in the two loops and between the two loops.
Regression tests pass.
Changes diff for split_loop and split_loop_on_cond cases:
1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
<bb 2> [local count: 118111600]:
if (beg_5(D) < end_8(D))
goto <bb 14>; [89.00%]
else
goto <bb 6>; [11.00%]
<bb 14> [local count: 105119324]:
if (beg2_6(D) < c_9(D))
- goto <bb 15>; [100.00%]
+ goto <bb 15>; [33.00%]
else
- goto <bb 16>; [100.00%]
+ goto <bb 16>; [67.00%]
- <bb 15> [local count: 105119324]:
+ <bb 15> [local count: 34689377]:
_25 = beg_5(D) + 1;
_26 = end_8(D) - beg_5(D);
_27 = beg2_6(D) + _26;
_28 = MIN_EXPR <c_9(D), _27>;
- <bb 3> [local count: 955630225]:
+ <bb 3> [local count: 315357973]:
# i_16 = PHI <i_11(8), beg_5(D)(15)>
# j_17 = PHI <j_12(8), beg2_6(D)(15)>
printf ("a: %d %d\n", i_16, j_17);
i_11 = i_16 + 1;
j_12 = j_17 + 1;
if (j_12 < _28)
- goto <bb 8>; [89.00%]
+ goto <bb 8>; [29.37%]
else
- goto <bb 17>; [11.00%]
+ goto <bb 17>; [70.63%]
- <bb 8> [local count: 850510901]:
+ <bb 8> [local count: 280668596]:
goto <bb 3>; [100.00%]
- <bb 16> [local count: 105119324]:
+ <bb 16> [local count: 70429947]:
# i_22 = PHI <beg_5(D)(14), i_29(17)>
# j_23 = PHI <beg2_6(D)(14), j_30(17)>
<bb 10> [local count: 955630225]:
# i_2 = PHI <i_22(16), i_20(13)>
# j_1 = PHI <j_23(16), j_21(13)>
i_20 = i_2 + 1;
j_21 = j_1 + 1;
if (end_8(D) > i_20)
- goto <bb 13>; [89.00%]
+ goto <bb 13>; [59.63%]
else
- goto <bb 9>; [11.00%]
+ goto <bb 9>; [40.37%]
- <bb 13> [local count: 850510901]:
+ <bb 13> [local count: 569842305]:
goto <bb 10>; [100.00%]
<bb 17> [local count: 105119324]:
# i_29 = PHI <i_11(3)>
# j_30 = PHI <j_12(3)>
if (end_8(D) > i_29)
goto <bb 16>; [80.00%]
else
goto <bb 9>; [20.00%]
<bb 9> [local count: 105119324]:
<bb 6> [local count: 118111600]:
return 0;
}
<bb 2> [local count: 118111600]:
- if (beg_5(D) < end_8(D))
+ _1 = end_6(D) - beg_7(D);
+ j_9 = _1 + beg2_8(D);
+ if (end_6(D) > beg_7(D))
goto <bb 14>; [89.00%]
else
goto <bb 6>; [11.00%]
<bb 14> [local count: 105119324]:
- if (beg2_6(D) < c_9(D))
- goto <bb 15>; [100.00%]
+ if (j_9 >= c_11(D))
+ goto <bb 15>; [33.00%]
else
- goto <bb 16>; [100.00%]
+ goto <bb 16>; [67.00%]
- <bb 15> [local count: 105119324]:
- _25 = beg_5(D) + 1;
- _26 = end_8(D) - beg_5(D);
- _27 = beg2_6(D) + _26;
- _28 = MIN_EXPR <c_9(D), _27>;
-
- <bb 3> [local count: 955630225]:
- # i_16 = PHI <i_11(8), beg_5(D)(15)>
- # j_17 = PHI <j_12(8), beg2_6(D)(15)>
- printf ("a: %d %d\n", i_16, j_17);
- i_11 = i_16 + 1;
- j_12 = j_17 + 1;
- if (j_12 < _28)
- goto <bb 8>; [89.00%]
+ <bb 15> [local count: 34689377]:
+ _27 = end_6(D) + -1;
+ _28 = beg_7(D) - end_6(D);
+ _29 = j_9 + _28;
+ _30 = _29 + 1;
+ _31 = MAX_EXPR <c_11(D), _30>;
+
+ <bb 3> [local count: 315357973]:
+ # i_18 = PHI <i_13(8), end_6(D)(15)>
+ # j_19 = PHI <j_14(8), j_9(15)>
+ printf ("a: %d %d\n", i_18, j_19);
+ i_13 = i_18 + -1;
+ j_14 = j_19 + -1;
+ if (j_14 >= _31)
+ goto <bb 8>; [29.37%]
else
- goto <bb 17>; [11.00%]
+ goto <bb 17>; [70.63%]
- <bb 8> [local count: 850510901]:
+ <bb 8> [local count: 280668596]:
goto <bb 3>; [100.00%]
- <bb 16> [local count: 105119324]:
- # i_22 = PHI <beg_5(D)(14), i_29(17)>
- # j_23 = PHI <beg2_6(D)(14), j_30(17)>
+ <bb 16> [local count: 70429947]:
+ # i_24 = PHI <end_6(D)(14), i_32(17)>
+ # j_25 = PHI <j_9(14), j_33(17)>
<bb 10> [local count: 955630225]:
- # i_2 = PHI <i_22(16), i_20(13)>
- # j_1 = PHI <j_23(16), j_21(13)>
- i_20 = i_2 + 1;
- j_21 = j_1 + 1;
- if (end_8(D) > i_20)
+ # i_3 = PHI <i_24(16), i_22(13)>
+ # j_2 = PHI <j_25(16), j_23(13)>
+ i_22 = i_3 + -1;
+ j_23 = j_2 + -1;
+ if (beg_7(D) < i_22)
goto <bb 13>; [89.00%]
else
goto <bb 9>; [11.00%]
- <bb 13> [local count: 850510901]:
+ <bb 13> [local count: 569842305]:
goto <bb 10>; [100.00%]
<bb 17> [local count: 105119324]:
- # i_29 = PHI <i_11(3)>
- # j_30 = PHI <j_12(3)>
- if (end_8(D) > i_29)
+ # i_32 = PHI <i_13(3)>
+ # j_33 = PHI <j_14(3)>
+ if (beg_7(D) < i_32)
goto <bb 16>; [80.00%]
else
goto <bb 9>; [20.00%]
<bb 9> [local count: 105119324]:
<bb 6> [local count: 118111600]:
return 0;
}
2) diff base/loop-cond-split-1.c.151t.lsplit patched/loop-cond-split-1.c.151t.lsplit:
...
<bb 2> [local count: 118111600]:
if (n_7(D) > 0)
goto <bb 4>; [89.00%]
else
goto <bb 3>; [11.00%]
<bb 3> [local count: 118111600]:
return;
<bb 4> [local count: 105119324]:
pretmp_3 = ga;
- <bb 5> [local count: 955630225]:
+ <bb 5> [local count: 315357973]:
# i_13 = PHI <i_10(20), 0(4)>
# prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
if (prephitmp_12 != 0)
goto <bb 6>; [33.00%]
else
goto <bb 7>; [67.00%]
<bb 6> [local count: 315357972]:
_2 = do_something ();
ga = _2;
- <bb 7> [local count: 955630225]:
+ <bb 7> [local count: 315357973]:
# prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
i_10 = inc (i_13);
if (n_7(D) > i_10)
goto <bb 21>; [89.00%]
else
goto <bb 11>; [11.00%]
<bb 11> [local count: 105119324]:
goto <bb 3>; [100.00%]
- <bb 21> [local count: 850510901]:
+ <bb 21> [local count: 280668596]:
if (prephitmp_12 != 0)
- goto <bb 20>; [100.00%]
+ goto <bb 20>; [33.00%]
else
- goto <bb 19>; [INV]
+ goto <bb 19>; [67.00%]
- <bb 20> [local count: 850510901]:
+ <bb 20> [local count: 280668596]:
goto <bb 5>; [100.00%]
- <bb 19> [count: 0]:
+ <bb 19> [local count: 70429947]:
# i_23 = PHI <i_10(21)>
# prephitmp_25 = PHI <prephitmp_5(21)>
- <bb 12> [local count: 955630225]:
+ <bb 12> [local count: 640272252]:
# i_15 = PHI <i_23(19), i_22(16)>
# prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
i_22 = inc (i_15);
if (n_7(D) > i_22)
goto <bb 16>; [89.00%]
else
goto <bb 11>; [11.00%]
- <bb 16> [local count: 850510901]:
+ <bb 16> [local count: 569842305]:
goto <bb 12>; [100.00%]
}
gcc/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
* tree-ssa-loop-split.c (split_loop): Fix incorrect
profile_count and probability.
(do_split_loop_on_cond): Likewise.
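The count updates in the dumps above amount to apportioning a block's count between the two sides of the split condition. The arithmetic can be sketched as below, using plain integer percentages as an assumption; GCC's profile_count and profile_probability use their own fixed-point representation, so results can differ by one in the last digit for some blocks. The name scale_count is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scale COUNT by PERCENT/100, rounding to nearest.  */
uint64_t
scale_count (uint64_t count, unsigned percent)
{
  assert (percent <= 100);
  return (count * percent + 50) / 100;
}
```

For example, the count of 105119324 at the split condition, divided 33%/67%, gives 34689377 and 70429947, matching the patched counts of bb 15 and bb 16, and the two parts sum back to the original count.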
Xionghu Luo [Tue, 21 Dec 2021 03:10:09 +0000 (21:10 -0600)]
Fix incorrect loop exit edge probability [PR103270]
r12-4526 cancelled jump thread paths that rotate loops. It exposes an issue in
profile-estimate: in predict_extra_loop_exits, an outer loop's exit edge
is marked as the inner loop's extra loop exit and given an incorrect
prediction, so a hot inner loop eventually becomes a cold loop through
optimizations. This patch adds a loop check when searching for extra exit edges
to avoid an unexpected predict_edge from predict_paths_for_bb.
Regression tested on P8LE.
gcc/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
PR middle-end/103270
* predict.c (predict_extra_loop_exits): Add loop parameter.
(predict_loops): Call with loop argument.
gcc/testsuite/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
PR middle-end/103270
* gcc.dg/pr103270.c: New test.
Xionghu Luo [Tue, 21 Dec 2021 03:02:50 +0000 (21:02 -0600)]
rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus
It seems these four UNSPECs could be replaced with native RTL.
For
"(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))":
Quoting David's explanation:
"The design came from the early implementation of Altivec:
https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
If one later checks for saturation (reads VSCR), one needs a
corresponding SET of the value. It's set in an architecture-specific
manner that isn't described to GCC, but it's set, not just clobbered
and in an undefined state.
The RTL does not describe that VSCR is set to the value 0. The
(const_int 0) is not the value set. You can think of the (const_int
0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at
least one argument and the pattern doesn't try to express the
argument, so it uses a dummy RTL constant. It's part of a PARALLEL
and the plus or minus already expresses the data dependency of the
pattern on the input operands."
gcc/ChangeLog:
2021-12-21 Xionghu Luo <luoxhu@linux.ibm.com>
* config/rs6000/altivec.md (altivec_vaddu<VI_char>s): Replace
UNSPEC_VADDU with us_plus.
(altivec_vadds<VI_char>s): Replace UNSPEC_VADDS with ss_plus.
(altivec_vsubu<VI_char>s): Replace UNSPEC_VSUBU with us_minus.
(altivec_vsubs<VI_char>s): Replace UNSPEC_VSUBS with ss_minus.
(altivec_abss_<mode>): Likewise.
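For reference, the saturating semantics that ss_plus and us_plus express can be written out for a single 8-bit element. This is a scalar sketch with hypothetical function names; it does not model the VSCR saturation bit that the AltiVec instructions also set.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add: clamp at UINT8_MAX instead of wrapping.  */
uint8_t
us_plus8 (uint8_t a, uint8_t b)
{
  unsigned s = (unsigned) a + b;
  return s > UINT8_MAX ? UINT8_MAX : (uint8_t) s;
}

/* Signed saturating add: clamp to [INT8_MIN, INT8_MAX].  */
int8_t
ss_plus8 (int8_t a, int8_t b)
{
  int s = a + b;
  if (s > INT8_MAX)
    return INT8_MAX;
  if (s < INT8_MIN)
    return INT8_MIN;
  return (int8_t) s;
}

/* Usage: results stick at the type's limits.  */
void
saturation_demo (void)
{
  assert (us_plus8 (200, 100) == 255);
  assert (ss_plus8 (-100, -100) == -128);
}
```

The matching minus forms clamp in the same way, with subtraction in place of addition.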
GCC Administrator [Tue, 21 Dec 2021 00:16:24 +0000 (00:16 +0000)]
Daily bump.
Joseph Myers [Mon, 20 Dec 2021 23:09:37 +0000 (23:09 +0000)]
Update cpplib es.po
* es.po: Update.
Uros Bizjak [Mon, 20 Dec 2021 20:15:50 +0000 (21:15 +0100)]
i386: Fix <sse2p4_1>_pinsr<ssemodesuffix> and its splitters [PR103772]
The clever trick to duplicate the value of the input operand into itself
proved not so clever after all. The splitter should not clobber the input
operand in any case, since the register can hold the value outside the HImode
lowpart when accessed as subreg. Use the standard earlyclobber approach
instead.
The testcase fails with the avx2 ISA, but I was not able to create a testcase
that wouldn't require the -mavx512fp16 compile flag.
2021-12-20 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/103772
* config/i386/sse.md (<sse2p4_1>_pinsr<ssemodesuffix>): Add
earlyclobber to (x,x,x,i) alternative.
(<sse2p4_1>_pinsr<ssemodesuffix> peephole2): Remove.
(<sse2p4_1>_pinsr<ssemodesuffix> splitter): Use output
operand as a temporary register. Split after reload_completed.
Patrick Palka [Mon, 20 Dec 2021 20:02:40 +0000 (15:02 -0500)]
c++: memfn lookup consistency in incomplete-class ctx
When instantiating a call to a member function of a class template, we
repeat the member function lookup in order to obtain the corresponding
partially instantiated functions. Within an incomplete-class context
however, we need to be more careful when repeating the lookup because we
don't want to introduce later-declared member functions that weren't
visible at template definition time. We're currently not careful enough
in this respect, which causes us to reject memfn1.C below.
This patch fixes this issue by making tsubst_baselink filter out from
the instantiation-time lookup those member functions that were invisible
at template definition time. This is really only necessary within an
incomplete-class context, so this patch adds a heuristic flag to BASELINK
to help us avoid needlessly performing this filtering step (which would
be a no-op) in complete-class contexts.
This is also necessary for the ahead-of-time overload set pruning
implemented in r12-6075 to be effective for member functions within
class templates.
gcc/cp/ChangeLog:
* call.c (build_new_method_call): Set
BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P on the pruned baselink.
* cp-tree.h (BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P): Define.
* pt.c (filter_memfn_lookup): New subroutine of tsubst_baselink.
(tsubst_baselink): Use filter_memfn_lookup on the new lookup
result when BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P is set on the
old baselink. Remove redundant BASELINK_P check.
* search.c (build_baselink): Set
BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P appropriately.
gcc/testsuite/ChangeLog:
* g++.dg/lookup/memfn1.C: New test.
* g++.dg/template/non-dependent16b.C: New test.
Iain Buclaw [Mon, 20 Dec 2021 18:25:32 +0000 (19:25 +0100)]
d: Merge upstream dmd ad8412530, druntime fd9a4544, phobos 495e835c2.
D front-end changes:
- Import dmd v2.098.1
- Remove calling of _d_delstruct from code generator.
Druntime changes:
- Import druntime v2.098.1
Phobos changes:
- Import phobos v2.098.1
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd ad8412530.
* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove code generation
of _d_delstruct.
* runtime.def (DELSTRUCT): Remove.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime fd9a4544.
* src/MERGE: Merge upstream phobos 495e835c2.
Olivier Hainque [Wed, 3 Nov 2021 14:18:16 +0000 (14:18 +0000)]
Fix static array size in gcc.dg/vect/vect-simd-20.c
10000 / 78 is strictly greater than 128 so we will
actually do 128+1 strides in foo() for s == 78 and p[]
needs to be dimensioned accordingly.
2021-12-20 Olivier Hainque <hainque@adacore.com>
gcc/testsuite/
* gcc.dg/vect/vect-simd-20.c: Fix size of p[]
to accommodate the number of strides performed
by foo() for s == 78.
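The arithmetic can be checked with a short C sketch. It assumes that foo() walks the indices 0, s, 2*s, ... below 10000, which is an assumption about the testcase rather than something stated in this log; strides_needed is a hypothetical helper.

```c
#include <assert.h>

/* Number of strides of length s needed to cover n elements, i.e. how many
   of the indices 0, s, 2*s, ... are below n (the ceiling of n / s). */
unsigned strides_needed(unsigned n, unsigned s)
{
    assert(s > 0);
    return (n + s - 1) / s;
}
```

For n = 10000 and s = 78 the integer quotient is 128 with remainder 16, so a 129th stride is needed and p[] must have room for it.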
Roger Sayle [Mon, 20 Dec 2021 15:22:18 +0000 (15:22 +0000)]
x86_64: Improve code expanded for highpart multiplications.
While working on a middle-end patch to more aggressively use highpart
multiplications on targets that support them, I noticed that the RTL
expanded by the x86 backend interacts poorly with register allocation
leading to suboptimal code.
For the testcase,
typedef int __attribute ((mode(TI))) ti_t;
long foo(long x)
{
return ((ti_t)x * 19065) >> 64;
}
we'd like to avoid:
foo: movq %rdi, %rax
movl $19065, %edx
imulq %rdx
movq %rdx, %rax
ret
and would prefer:
foo: movl $19065, %eax
imulq %rdi
movq %rdx, %rax
ret
This patch provides a pair of peephole2 transformations to tweak the
spills generated by reload, and at the same time replaces the current
define_expand with a define_insn pattern using the new [su]mul_highpart
RTX codes.
2021-12-20 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (any_mul_highpart): New code iterator.
(sgnprefix, s): Add attribute support for [su]mul_highpart.
(<s>mul<mode>3_highpart): Delete expander.
(<s>mul<mode>3_highpart, <s>mulsi32_highpart_zext):
New define_insn patterns.
(define_peephole2): Tweak the register allocation for the above
instructions after reload.
gcc/testsuite/ChangeLog
* gcc.target/i386/smuldi3_highpart.c: New test case.
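As a point of reference, the value computed by a highpart multiplication can be written in C as below. This is a minimal sketch that relies on the __int128 GCC extension (64-bit targets only) and on GCC's arithmetic right shift of negative values; the function names are hypothetical and merely echo the [su]mul_highpart RTX codes.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the signed 128-bit product x * y.  Right-shifting a
   negative value is implementation-defined in ISO C; GCC shifts
   arithmetically, which is what this sketch relies on. */
int64_t smul_highpart(int64_t x, int64_t y)
{
    return (int64_t)(((__int128)x * y) >> 64);
}

/* High 64 bits of the unsigned 128-bit product x * y. */
uint64_t umul_highpart(uint64_t x, uint64_t y)
{
    return (uint64_t)(((unsigned __int128)x * y) >> 64);
}

/* Usage example: 2^62 * 4 == 2^64, whose high part is 1. */
void check_highpart(void)
{
    assert(smul_highpart(INT64_C(1) << 62, 4) == 1);
    assert(umul_highpart(UINT64_MAX, UINT64_MAX) == UINT64_MAX - 1);
}
```

With y fixed at 19065, smul_highpart computes the same value as foo() in the testcase above on an LP64 target.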
Joel Sherrill [Fri, 17 Dec 2021 16:10:10 +0000 (10:10 -0600)]
Obsolete m32c-rtems target
2021-12-20 Joel Sherrill <joel@rtems.org>
gcc/
* config.gcc: Obsolete m32c-*-rtems* target.
Patrick Palka [Mon, 20 Dec 2021 14:28:20 +0000 (09:28 -0500)]
c++: ahead-of-time overload set pruning for non-dep calls
This patch makes us remember the function selected by overload resolution
during ahead of time processing of a non-dependent call expression, so
that at instantiation time we avoid repeating some of the work of overload
resolution for the call. Note that we already do this for non-dependent
operator expressions via build_min_non_dep_op_overload.
Some caveats:
* When processing ahead of time a non-dependent call to a member
function template of a currently open class template (as in
g++.dg/template/deduce4.C), we end up generating an "inside-out"
partial instantiation such as S<T>::foo<int, int>(), the likes of
which we're apparently not prepared to fully instantiate. So in this
situation, we prune to the selected template instead of the
specialization.
* This change triggered a latent FUNCTION_DECL pretty printing issue
in cpp0x/error2.C -- since we now resolve the call to foo<0> ahead
of time, the error now looks like:
error: expansion pattern ‘foo()()=0’ contains no parameter pack
where the FUNCTION_DECL for foo<0> is clearly misprinted. But this
pretty-printing issue could be reproduced without this patch if
we define foo as a non-template function. Since this testcase was
added to verify pretty printing of TEMPLATE_ID_EXPR, I work around
this test failure by making the call to foo type-dependent and thus
immune to this ahead of time pruning.
* We now reject parts of cpp0x/fntmp-equiv1.C because we notice that
the non-dependent call d(f, b) in
int d(int, int);
template <unsigned long f, unsigned b, typename> e<d(f, b)> d();
is non-constexpr. Since this testcase is about equivalency of
dependent names in the context of declaration matching, it seems the
best fix here is to make the calls to d, d2 and d3 within the
function signatures dependent.
gcc/cp/ChangeLog:
* call.c (build_new_method_call): For a non-dependent call
expression inside a template, return a templated tree
whose overload set contains just the selected function.
* semantics.c (finish_call_expr): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/error2.C: Make the call to foo type-dependent in
order to avoid latent pretty-printing issue for FUNCTION_DECL
inside MODOP_EXPR.
* g++.dg/cpp0x/fntmp-equiv1.C: Make the calls to d, d2 and d3
within the function signatures dependent.
* g++.dg/template/non-dependent16.C: New test.
* g++.dg/template/non-dependent16a.C: New test.
* g++.dg/template/non-dependent17.C: New test.
Martin Liska [Mon, 20 Dec 2021 10:31:50 +0000 (11:31 +0100)]
jit: Fix -Wodr warning
gcc/jit/libgccjit.c:3957:8: warning: type 'struct version_info' violates the C++ One Definition Rule [-Wodr]
../../gcc/jit/libgccjit.c:3957:8: warning: type 'struct version_info' violates the C++ One Definition Rule [-Wodr]
3957 | struct version_info
../../gcc/tree-ssa-loop-ivopts.c:181: note: a different type is defined in another translation unit
181 | struct version_info
gcc/jit/ChangeLog:
* libgccjit.c (struct version_info): Rename to jit_version_info.
(struct jit_version_info): Likewise.
(gcc_jit_version_major): Likewise.
(gcc_jit_version_minor): Likewise.
(gcc_jit_version_patchlevel): Likewise.
Martin Liska [Mon, 20 Dec 2021 09:55:50 +0000 (10:55 +0100)]
opts: Support -Oz in -Ox option hints.
gcc/ChangeLog:
* opts.c (default_options_optimization): Support -Oz in -Ox option hints.
Jan Hubicka [Mon, 20 Dec 2021 07:43:13 +0000 (08:43 +0100)]
Fix handling of deferred SSA names in modref dataflow
In the testcase we fail to analyze an SSA name because the flag do_dataflow is
set and thus triggers an early exit in analyze_ssa_name. Fixed by disabling
early exits when handling deferred names.
gcc/ChangeLog:
2021-12-20 Jan Hubicka <hubicka@ucw.cz>
PR ipa/103669
* ipa-modref.c (modref_eaf_analysis::analyze_ssa_name): Add deferred
parameter.
(modref_eaf_analysis::propagate): Use it.
gcc/testsuite/ChangeLog:
2021-12-20 Jan Hubicka <hubicka@ucw.cz>
PR ipa/103669
* g++.dg/torture/pr103669.C: New test.
liuhongt [Wed, 15 Dec 2021 05:07:30 +0000 (13:07 +0800)]
Optimize bit_and op1 float_vector_all_ones_operands to op1.
gcc/ChangeLog:
PR target/98468
* config/i386/sse.md (*bit_and_float_vector_all_ones): New
pre-reload splitter.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr98468.c: New test.
GCC Administrator [Mon, 20 Dec 2021 00:16:21 +0000 (00:16 +0000)]
Daily bump.
Francois-Xavier Coudert [Sun, 19 Dec 2021 23:45:31 +0000 (00:45 +0100)]
Fortran: add support for IEEE intrinsics on aarch64 non-glibc targets
This enables IEEE support on the upcoming aarch64-apple-darwin target,
and has been tested for some time in an external port.
libgfortran/ChangeLog:
* configure.host: Add aarch64-apple-darwin support.
* config/fpu-aarch64.h: New file.
Andrew Pinski [Sun, 19 Dec 2021 22:26:41 +0000 (22:26 +0000)]
Change the xfail in gcc.dg/uninit-pr89230-1.c
With the recent PHI-OPT patch for line numbers, I had missed this
testcase was now failing. The uninitialized warning was there
before my recent patch, just was on the wrong line. The testcase
had added an xfail in r12-4698-gf6d012338 (though a bug report was
not filed to record it).
This patch changes the dg-bogus messages around to catch both locations
and xfail both of them.
At least there is now a patch for the correct line numbers for the
phi-opt.
Committed after testing the testcase.
gcc/testsuite/ChangeLog:
* gcc.dg/uninit-pr89230-1.c: Change the dg-bogus messages
around and xfail both of them.
Jan Hubicka [Sun, 19 Dec 2021 21:28:40 +0000 (22:28 +0100)]
Fix early exit in modref_merge_call_site_flags
When adding support for static chain and return slot flags I forgot to update
the early exit condition in modref_merge_call_site_flags. This leads to wrong
code as demonstrated by the Fortran testcase attached to the PR (which I hope
someone will help me to turn into a testsuite one).
gcc/ChangeLog:
2021-12-19 Jan Hubicka <hubicka@ucw.cz>
PR ipa/103766
* ipa-modref.c (modref_merge_call_site_flags): Fix early exit condition
Matthias Kretz [Wed, 15 Dec 2021 08:45:06 +0000 (09:45 +0100)]
c++: don't ICE on NAMESPACE_DECL inside FUNCTION_DECL
Code like
void swap() {
namespace __variant = __detail::__variant;
...
}
creates a NAMESPACE_DECL where the CP_DECL_CONTEXT is a FUNCTION_DECL.
DECL_TEMPLATE_INFO fails on NAMESPACE_DECL and therefore must be handled
first in the assertion.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
gcc/cp/ChangeLog:
* module.cc (trees_out::get_merge_kind): NAMESPACE_DECLs also
cannot have a DECL_TEMPLATE_INFO.
Patrick Palka [Sun, 19 Dec 2021 19:42:14 +0000 (14:42 -0500)]
c++: nested lambda capturing a capture proxy, cont [PR94376]
The r12-5403 fix apparently doesn't handle the case where the inner
lambda explicitly rather than implicitly captures the capture proxy from
the outer lambda, which causes us to reject the first example in the
testcase below.
This is because compared to an implicit capture, the effective initializer
for an explicit capture is wrapped in a location wrapper (pointing to within
the capture list), and this wrapper foils the is_capture_proxy check added
in r12-5403.
The simplest fix appears to be to strip location wrappers accordingly
before checking is_capture_proxy. And to help guard against this kind
of bug, this patch also makes is_capture_proxy assert it doesn't see a
location wrapper.
PR c++/94376
gcc/cp/ChangeLog:
* lambda.c (lambda_capture_field_type): Strip location wrappers
before checking for a capture proxy.
(is_capture_proxy): Assert that we don't see a location wrapper.
(mark_const_cap_r): Don't call is_constant_capture_proxy on a
location wrapper.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-nested9a.C: New test.
Patrick Palka [Sun, 19 Dec 2021 18:49:04 +0000 (13:49 -0500)]
print-tree: dump DECL_LANG_FLAG_8
gcc/ChangeLog:
* print-tree.c (print_node) <case tcc_declaration>: Dump
DECL_LANG_FLAG_8.
Patrick Palka [Sun, 19 Dec 2021 17:10:16 +0000 (12:10 -0500)]
c++: local_specializations and recursive constrained fn [PR103714]
Here during constraint checking for the inner call to A<0>::f<0>,
substitution into the PARM_DECL d in the atomic constraint yields the
wrong local specialization because local_specializations at this point
is nonempty, and contains specializations for the caller A<0>::f<1>.
This patch makes us call push_to_top_level during satisfaction, which'll
temporarily clear local_specializations for us.
PR c++/103714
gcc/cp/ChangeLog:
* constraint.cc (satisfy_declaration_constraints): Do
push_to_top_level and pop_from_top_level around the call to
satisfy_normalized_constraints.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-uneval5.C: New test.
Francois-Xavier Coudert [Sun, 19 Dec 2021 11:23:32 +0000 (12:23 +0100)]
testsuite: mark tests that require alias
gcc/testsuite/ChangeLog:
* gcc.dg/pr100509.c: Needs alias.
* gcc.dg/pragma-diag-10.c: Needs alias.
Andrew Pinski [Sat, 18 Dec 2021 11:52:37 +0000 (11:52 +0000)]
Improve location for new statements in match-and-simplify phiopt
Before match-and-simplify was used in phiopt, the location of the
new statements was that of the conditional. This adds that
back, as I did not realize gimple_simplify didn't do that for you.
OK? Bootstrapped and tested on x86_64 with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.c (gimple_simplify_phiopt): Annotate the
new sequence with the location of the conditional statement.
GCC Administrator [Sun, 19 Dec 2021 00:16:17 +0000 (00:16 +0000)]
Daily bump.
Harald Anlauf [Sat, 18 Dec 2021 22:21:35 +0000 (23:21 +0100)]
Fortran: reject BOZ type argument to SIZEOF().
gcc/fortran/ChangeLog:
PR fortran/103412
* check.c (gfc_check_sizeof): Reject BOZ type argument.
gcc/testsuite/ChangeLog:
PR fortran/103412
* gfortran.dg/illegal_boz_arg_2.f90: New test.
Roger Sayle [Sat, 18 Dec 2021 13:51:56 +0000 (13:51 +0000)]
x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.
A common idiom is to create a DImode value from the "concat" of two SImode
values, using "(long long)hi << 32 | (long long)lo", where the operation
may be ior, xor or plus. On x86, with -m32, the high and low parts of
a DImode register are actually different SImode registers (typically %edx
and %eax) so ideally this idiom should reduce to two move instructions
(or optimally, just clever register allocation).
Unfortunately, GCC currently performs the IOR operation above on -m32,
and worse allocates DImode registers (split to SImode register pairs)
for both the zero extended HI and LO values.
Hence, for test1 from the new test case below:
typedef int __v4si __attribute__ ((__vector_size__ (16)));
long long test1(__v4si v) {
unsigned int loVal = (unsigned int)v[0];
unsigned int hiVal = (unsigned int)v[1];
return (long long)(loVal) | ((long long)(hiVal) << 32);
}
we currently generate (with -m32 -O2 -msse4.1):
test1: subl $28, %esp
pextrd $1, %xmm0, %eax
pmovzxdq %xmm0, %xmm1
movq %xmm1, 8(%esp)
movl %eax, %edx
movl 8(%esp), %eax
orl 12(%esp), %edx
addl $28, %esp
orb $0, %ah
ret
with this patch we now generate:
test1: pextrd $1, %xmm0, %edx
movd %xmm0, %eax
ret
The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior
to register allocation on !TARGET_64BIT, simplifying this sequence to
"highpart(dst) = hi; lowpart(dst) = lo".
The one minor complication is that sse.md's define_insn for
*vec_extractv4si_0_zext_sse4 can sometimes interfere with this
optimization. It turns out that on !TARGET_64BIT, the zero_extend:DI
following vec_select:SI isn't free, and this insn gets split back
into multiple instructions during later passes, but too late to
be optimized away by this patch/reload. Hence the last hunk of
this patch is to restrict *vec_extractv4si_0_zext_sse4 to TARGET_64BIT.
Checking PR target/80286, where *vec_extractv4si_0_zext_sse4 was
first added, this seems reasonable.
2021-12-18 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/103611
* config/i386/i386.md (any_or_plus): New code iterator.
(define_split): Split (HI<<32)|zext(LO) into piece-wise
move instructions on !TARGET_64BIT.
* config/i386/sse.md (*vec_extractv4si_0_zext_sse4):
Restrict to TARGET_64BIT.
gcc/testsuite/ChangeLog
PR target/103611
* gcc.target/i386/pr103611-2.c: New test case.
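The reason ior, xor and plus are interchangeable in this idiom can be seen from a small C sketch: the shifted high part has all-zero low bits and the zero-extended low part has all-zero high bits, so no bit position is set in both operands and no carry can occur. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Three spellings of "concatenate two 32-bit halves into 64 bits". */
uint64_t concat_ior(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

uint64_t concat_xor(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) ^ (uint64_t)lo;
}

uint64_t concat_plus(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) + (uint64_t)lo;
}

/* Usage example: all three produce the same 64-bit value. */
void check_concat(void)
{
    uint64_t expect = UINT64_C(0x123456789abcdef0);
    assert(concat_ior(0x12345678u, 0x9abcdef0u) == expect);
    assert(concat_xor(0x12345678u, 0x9abcdef0u) == expect);
    assert(concat_plus(0x12345678u, 0x9abcdef0u) == expect);
}
```

Since the two halves never interact, the result can be formed by placing hi in the high word and lo in the low word, which is what the splitter described above arranges on !TARGET_64BIT.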
Roger Sayle [Sat, 18 Dec 2021 13:47:52 +0000 (13:47 +0000)]
PR target/32803: Add -Oz option for improved clang compatibility.
This patch adds support for an -Oz command line option, aggressively
optimizing for size at the expense of performance. GCC's current -Os
provides a reasonable balance of size and performance, whereas -Oz is
probably only useful for code size benchmarks such as CSiBE. Or so I
thought until I read in https://news.ycombinator.com/item?id=25408853
that clang's -Oz sometimes outperforms -O[23s]; I suspect modern instruction
decode stages can treat "pushq $1; popq %rax" as a short uop encoding.
Instead of introducing a new global variable, this patch simply abuses
the existing optimize_size by setting its value to 2. The only change
in behaviour is the tweak to the i386 backend implementing the suggestion
in PR target/32803 to use a short push/pop sequence for loading small
immediate values (-128..127) on x86, matching the behaviour of LLVM.
On x86_64, the simple function:
int foo() { return 25; }
currently generates with -Os:
foo: movl $25, %eax // 5 bytes
ret
With the proposed -Oz, it generates:
foo: pushq $25 // 2 bytes
popq %rax // 1 byte
ret
On CSiBE, this results in a 0.94% improvement (3703513 bytes total
down to 3668516 bytes).
2021-12-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/32803
* common.opt (Oz): New command line option.
* doc/invoke.texi: Document the new -Oz option.
* lto-wrapper.c (merge_and_complain, append_compiler_options):
Treat OPT_Oz as synonymous with OPT_Os.
* optc-save-gen.awk: Increase maximum value of optimize_size to 2.
* opts.c (default_options_optimization) [OPT_Oz]: Handle OPT_Oz
just like OPT_Os, except set opt->x_optimize_size to 2.
(common_handle_option): Skip OPT_Oz just like OPT_Os.
* config/i386/i386.md (*movdi_internal): Use a push/pop sequence
for suitable SImode TYPE_IMOV moves when optimize_size > 1.
(*movsi_internal): Likewise.
gcc/testsuite/ChangeLog
PR target/32803
* gcc.target/i386/pr32803.c: New test case.
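The size trade-off described above can be sketched as a small C model. It only covers the two sequences quoted in this message (a 5-byte movl versus a 2-byte pushq plus a 1-byte popq) and ignores every other way the backend may load a constant; fits_imm8, load_const_bytes and check_oz_sizes are hypothetical names, not functions from the patch.

```c
#include <assert.h>

/* Does v fit in a sign-extended 8-bit immediate (-128..127)? */
int fits_imm8(long v)
{
    return v >= -128 && v <= 127;
}

/* Bytes needed to load the constant v into %eax in this simplified model.
   optimize_size is 1 for -Os and 2 for -Oz, as described above. */
unsigned load_const_bytes(long v, int optimize_size)
{
    if (optimize_size > 1 && fits_imm8(v))
        return 2 + 1;   /* pushq $imm8 (2 bytes) ; popq %rax (1 byte) */
    return 5;           /* movl $imm32, %eax */
}

/* Usage example: the foo() case above, returning 25. */
void check_oz_sizes(void)
{
    assert(load_const_bytes(25, 1) == 5);   /* -Os */
    assert(load_const_bytes(25, 2) == 3);   /* -Oz */
}
```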
Siddhesh Poyarekar [Sat, 18 Dec 2021 11:16:43 +0000 (16:46 +0530)]
tree-optimization/103759: Use sizetype everywhere for object sizes
Since all computations in tree-object-size are now done in sizetype and
not HOST_WIDE_INT, comparisons with HOST_WIDE_INT based unknown and
initval would be incorrect. Instead, use the sizetype trees directly to
generate and evaluate initval and unknown size values.
gcc/ChangeLog:
PR tree-optimization/103759
* tree-object-size.c (unknown, initval): Remove functions.
(size_unknown, size_initval, size_unknown_p): Operate directly
on trees.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
François-Xavier Coudert [Thu, 16 Dec 2021 17:38:30 +0000 (18:38 +0100)]
Fortran: Cast arguments of <ctype.h> functions to unsigned char
Functions from <ctype.h> should only be called on values that can be
represented by unsigned char. On targets where char is a signed type,
some libgfortran calls have undefined behaviour.
The solution is to cast the argument to unsigned char type. I’ve defined
macros in libgfortran.h to do so, to retain legibility of the library
code.
PR libfortran/95177
libgfortran/ChangeLog
* libgfortran.h: include ctype.h, provide safe macros.
* io/format.c: use safe macros.
* io/list_read.c: use safe macros.
* io/read.c: use safe macros.
* io/write.c: use safe macros.
* runtime/environ.c: use safe macros.
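The casting technique looks roughly like the sketch below. The macro names SAFE_ISDIGIT and SAFE_TOUPPER and the helper count_digits are hypothetical illustrations; the names actually added to libgfortran.h are not given in this log.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrappers: convert to unsigned char first so that a negative
   plain char never reaches a <ctype.h> function. */
#define SAFE_ISDIGIT(c) isdigit((unsigned char)(c))
#define SAFE_TOUPPER(c) toupper((unsigned char)(c))

/* Count the decimal digits in a string that may contain bytes >= 0x80. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (SAFE_ISDIGIT(*s))
            n++;
    return n;
}
```

Without the cast, a byte such as 0xE9 becomes a negative int on targets where char is signed, which is outside the domain the <ctype.h> functions accept.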
François-Xavier Coudert [Fri, 17 Dec 2021 18:30:36 +0000 (19:30 +0100)]
Darwin: Future-proof and homogenize detection of darwin versions
The current GCC branch will become 12.1.0, which will be the stable
version of GCC when the next macOS version is released. There are some
places in GCC that don’t handle darwin22 as a version, so we need to
future-proof it (gcc/config.gcc and gcc/config/darwin-driver.c). We
align that code with what Apple clang does, i.e. accept all potential
major macOS versions until 99.
This patch also homogenises the handling of darwin version numbers,
where the majority of places use darwin2*, but some used darwin2[0-9]*.
Since there never was a darwin2.x version, the two are equivalent, and
we prefer the simpler darwin2*.
gcc/ChangeLog:
* config/darwin-driver.c: Make version code more future-proof.
* config.gcc: Homogenize darwin versions.
* configure.ac: Homogenize darwin versions.
* configure: Regenerate.
gcc/testsuite/ChangeLog:
* gcc.dg/darwin-minversion-link.c: Test darwin21.
* obj-c++.dg/cxx-ivars-3.mm: Homogenize darwin versions.
* obj-c++.dg/objc-gc-3.mm: Homogenize darwin versions.
* objc.dg/objc-gc-4.m: Homogenize darwin versions.
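The equivalence of the two patterns can be illustrated with POSIX fnmatch(), which implements the same shell-style matching. This is only a sketch: the real patterns are matched by shell case statements and the testsuite's target selectors against full target triplets, and glob_matches is a hypothetical helper.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if the shell-style pattern matches name, 0 otherwise. */
int glob_matches(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    return fnmatch(pattern, name, 0) == 0;
}
```

Both "darwin2*" and "darwin2[0-9]*" match darwin20 through darwin29. They only differ on names such as "darwin2.1", which the first pattern accepts and the second rejects, and no such version was ever released.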
GCC Administrator [Sat, 18 Dec 2021 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Marek Polacek [Thu, 16 Dec 2021 21:29:41 +0000 (16:29 -0500)]
attribs: Fix wrong error with -Wno-attributes=A::b [PR103649]
My patch to implement -Wno-attributes=A::b caused a bogus error when
parsing
[[foo::bar(1, 2)]];
when -Wno-attributes=foo::bar was specified on the command line, because
when we create a fake foo::bar attribute and insert it into our attribute
table, it is created with max_length == 0 which doesn't allow any args.
That is wrong -- we know nothing about the attribute, so we shouldn't
require any specific number of arguments. And since unknown attributes
can be rather complex (see for example omp::{directive,sequence}), we
must skip parsing their arguments. To that end, I'm using max_length
with value -2.
Also let's not warn about things like
[[vendor::assume(true)]];
because they may have some meaning (this is reminiscent of C++ Portable
Assumptions).
PR c/103649
gcc/ChangeLog:
* attribs.c (handle_ignored_attributes_option): Create the fake
attribute with max_length == -2.
(attribute_ignored_p): New overloads.
* attribs.h (attribute_ignored_p): Declare them.
* tree-core.h (struct attribute_spec): Document that max_length
can be -2.
gcc/c/ChangeLog:
* c-decl.c (c_warn_unused_attributes): Don't warn for
attribute_ignored_p.
* c-parser.c (c_parser_std_attribute): Skip parsing of the attribute
arguments when the attribute is ignored.
gcc/cp/ChangeLog:
* parser.c (cp_parser_declaration): Don't warn for attribute_ignored_p.
(cp_parser_std_attribute): Skip parsing of the attribute
arguments when the attribute is ignored.
gcc/testsuite/ChangeLog:
* c-c++-common/Wno-attributes-6.c: New test.
David Edelsohn [Fri, 17 Dec 2021 22:16:19 +0000 (17:16 -0500)]
testsuite: update expected results for ilp32.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/fold-vec-insert-float-p9.c
Olivier Hainque [Sat, 9 Oct 2021 13:22:12 +0000 (13:22 +0000)]
Add -mdejagnu-cpu=power7 to dg-options for pr97142.c
This matches the test's expectations on toolchains
configured to default to less capable CPUs.
2021-12-17 Olivier Hainque <hainque@adacore.com>
gcc/testsuite/
* gcc.target/powerpc/pr97142.c: Add -mdejagnu-cpu=power7
to the dg-options.
Marek Polacek [Thu, 16 Dec 2021 19:57:07 +0000 (14:57 -0500)]
c++: Improve diagnostic for class tmpl/class redecl [PR103749]
For code like
template<typename>
struct bar;
struct bar {
int baz;
};
bar var;
we emit a fairly misleading and unwieldy diagnostic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ g++ -c u.cc
u.cc:6:8: error: template argument required for 'struct bar'
6 | struct bar {
| ^~~
u.cc:10:5: error: class template argument deduction failed:
10 | bar var;
| ^~~
u.cc:10:5: error: no matching function for call to 'bar()'
u.cc:3:17: note: candidate: 'template<class> bar()-> bar< <template-parameter-1-1> >'
3 | friend struct bar;
| ^~~
u.cc:3:17: note: template argument deduction/substitution failed:
u.cc:10:5: note: couldn't deduce template parameter '<template-parameter-1-1>'
10 | bar var;
| ^~~
u.cc:3:17: note: candidate: 'template<class> bar(bar< <template-parameter-1-1> >)-> bar< <template-parameter-1-1> >'
3 | friend struct bar;
| ^~~
u.cc:3:17: note: template argument deduction/substitution failed:
u.cc:10:5: note: candidate expects 1 argument, 0 provided
10 | bar var;
| ^~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
but with this patch we get:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
z.C:4:10: error: class template 'bar' redeclared as non-template
4 | struct bar {
| ^~~
z.C:2:10: note: previous declaration here
2 | struct bar;
| ^~~
z.C:8:7: error: 'bar<...auto...> var' has incomplete type
8 | bar var;
| ^~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which is clearer about what the problem is.
I thought it'd be nice to avoid printing the messages about failed CTAD,
too. To that end, I'm using CLASSTYPE_ERRONEOUS to suppress CTAD. Not
sure if that's entirely kosher.
The other direction (first a non-template class declaration followed by
a class template definition) we handle quite well:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
z.C:11:8: error: 'bar' is not a template
11 | struct bar {};
| ^~~
z.C:8:8: note: previous declaration here
8 | struct bar;
| ^~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PR c++/103749
gcc/cp/ChangeLog:
* decl.c (lookup_and_check_tag): Give an error when a class was
declared as template but no template header has been provided.
* pt.c (do_class_deduction): Don't deduce CLASSTYPE_ERRONEOUS
types.
gcc/testsuite/ChangeLog:
* g++.dg/template/redecl4.C: Adjust dg-error.
* g++.dg/diagnostic/redeclaration-2.C: New test.
Segher Boessenkool [Fri, 17 Dec 2021 17:01:16 +0000 (17:01 +0000)]
rs6000: Update darn testcases
Make the darn testcases work (and be tested) in 32-bit mode as well.
They used to ICE, but they no longer do.
2021-12-17 Segher Boessenkool <segher@kernel.crashing.org>
gcc/testsuite/
PR target/103624
* gcc.target/powerpc/darn-0.c: Remove target clause.
* gcc.target/powerpc/darn-1.c: Remove target clause. Remove lp64
requirement. Change return type to long.
* gcc.target/powerpc/darn-2.c: Ditto.
* gcc.target/powerpc/darn-3.c: Remove target clause.
Segher Boessenkool [Fri, 17 Dec 2021 16:59:55 +0000 (16:59 +0000)]
rs6000: Redo darn (PR103624)
The builtins now all return "long". The patterns have :GPR as the
output mode, so they can be 32-bit as well (the instruction makes sense
in 32 bit just fine). The builtins expand to the DImode version
normally, but to the SImode if {32bit} is true.
2021-12-17 Segher Boessenkool <segher@kernel.crashing.org>
PR target/103624
* config/rs6000/rs6000-builtins.def (__builtin_darn): Expand to
darn_64_di. Add {32bit} attribute. Return long.
(__builtin_darn_32): Expand to darn_32_di. Add {32bit} attribute.
Return long.
(__builtin_darn_raw): Expand to darn_raw_di. Add {32bit} attribute.
Return long.
* config/rs6000/rs6000-call.c (rs6000_expand_builtin): Expand the darn
builtins to the _si variants for -m32.
* config/rs6000/rs6000.md (UNSPECV_DARN_32, UNSPECV_DARN_RAW): Delete.
(UNSPECV_DARN): Update comment.
(darn_32, darn_raw, darn): Delete.
(darn_32_<mode>, darn_64_<mode>, darn_raw_<mode> for GPR): New.
(@darn<mode> for GPR): New.
Iain Sandoe [Sat, 2 Oct 2021 16:20:08 +0000 (17:20 +0100)]
coroutines: Handle initial awaiters with non-void returns [PR 100127].
The way in which a C++20 coroutine is specified discards any value
that might be returned from the initial or final await expressions.
This ICE was caused by an initial await expression with an
await_resume () returning a reference; the function rewrite code
was not set up to expect this.
Fixed by looking through any indirection present and by explicitly
discarding the value, if any, returned by await_resume().
It does not seem useful to make a diagnostic for this, since
the user could define a generic awaiter that usefully returns
values when used in a different position from the initial (or
final) await expressions.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/100127
gcc/cp/ChangeLog:
* coroutines.cc (coro_rewrite_function_body): Handle initial
await expressions that try to produce a reference value.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr100127.C: New test.
Iain Sandoe [Sun, 3 Oct 2021 18:46:09 +0000 (19:46 +0100)]
coroutines: Pass lvalues to user-defined operator new [PR 100772].
The wording of the standard has been clarified to be explicit that
the parameters to any user-defined operator-new in the promise
class should be lvalues.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/100772
gcc/cp/ChangeLog:
* coroutines.cc (morph_fn_to_coro): Convert function parms
from reference before constructing any operator-new args
list.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr100772-a.C: New test.
* g++.dg/coroutines/pr100772-b.C: New test.
Iain Sandoe [Fri, 15 Oct 2021 08:42:25 +0000 (09:42 +0100)]
coroutines, c++: Add test for PR 96517.
This PR was fixed by r12-5255-gdaa9c6b015; this adds
the testcase.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:
PR c++/96517
* g++.dg/coroutines/pr96517.C: New test.
Bill Schmidt [Fri, 17 Dec 2021 16:39:00 +0000 (10:39 -0600)]
rs6000: Fix fake vec_promote overload
rs6000-overload.def defines one instance of vec_promote so that it can be
registered with the front end. Actual expansion of the vec_promote overload
is done with special-case code in rs6000-c.c. During another cleanup, I
observed that the fake instance has the wrong number of arguments. Fix that.
2021-12-17 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-overload.def (__builtin_vec_promote): Add second
argument.
David Edelsohn [Fri, 17 Dec 2021 14:45:20 +0000 (09:45 -0500)]
testsuite: pragma-optimize.c requires ifunc.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pragma-optimize.c: Require ifunc support.
Richard Sandiford [Fri, 17 Dec 2021 14:18:39 +0000 (14:18 +0000)]
vect: Fix multi-vector SLP gather loads [PR103744]
This PR shows that I didn't properly test the multi-vector case when
adding support for SLP gather loads. The patch fixes that case using
the same approach as we do for non-SLP cases: keep the scalar base
the same, but iterate through the (also multi-vector) vector offsets.
“vec_num * j + i” is already used elsewhere as a way of handling both
the multi-vector SLP case and the multi-vector non-SLP case.
gcc/
PR tree-optimization/103744
* tree-vect-stmts.c (vectorizable_load): Handle multi-vector
SLP gather loads.
gcc/testsuite/
PR tree-optimization/103744
* gcc.dg/vect/pr103744-1.c: New test.
* gcc.dg/vect/pr103744-2.c: Likewise.
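The "vec_num * j + i" indexing mentioned above can be sketched in C as follows. The sketch assumes that j counts vector copies and i counts the vectors within one copy, which is a reading of the message rather than something spelled out in it; flat_vector_index and number_slots are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Flatten (copy j, vector i) into a single index when every copy holds
   vec_num vectors. */
size_t flat_vector_index(size_t vec_num, size_t j, size_t i)
{
    assert(i < vec_num);
    return vec_num * j + i;
}

/* Number all ncopies * vec_num slots in visiting order. */
void number_slots(size_t ncopies, size_t vec_num, size_t *slots)
{
    size_t next = 0;

    for (size_t j = 0; j < ncopies; j++)
        for (size_t i = 0; i < vec_num; i++)
            slots[flat_vector_index(vec_num, j, i)] = next++;
}
```

Under that assumption the formula reduces to i when there is a single copy and to j when vec_num is 1, so one expression serves both the SLP and the non-SLP multi-vector cases.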
Martin Liska [Fri, 17 Dec 2021 13:56:52 +0000 (14:56 +0100)]
docs: fix option name reference
gcc/ChangeLog:
* doc/invoke.texi: Rename to -fstack-protector.
Martin Liska [Fri, 17 Dec 2021 13:33:35 +0000 (14:33 +0100)]
docs: Fix spelling issues in -fipa-strict-aliasing.
gcc/ChangeLog:
* doc/invoke.texi: Fix spelling issues.
Tamar Christina [Fri, 17 Dec 2021 10:59:25 +0000 (10:59 +0000)]
slp: check that the operation we're combining is a boolean operation [PR103741]
It seems I forgot to check that the operations we're combining when masking the
predicates together are actually of predicate type.
Without it we end up accidentally trying to combine a value and a mask.
gcc/ChangeLog:
PR tree-optimization/103741
* tree-vect-stmts.c (vectorizable_operation): Check for boolean.
gcc/testsuite/ChangeLog:
PR tree-optimization/103741
* gcc.target/aarch64/pr103741.c: New test.
Iain Sandoe [Wed, 15 Dec 2021 14:11:58 +0000 (14:11 +0000)]
libgcc, Darwin: Add missing build dependencies.
There was a race condition where the link for the new shared EH library
(only used on earlier Darwin) could fail because the new crts had not been
copied to the gcc directory. This can cause a build failure (although
currently only seen on powerpc-darwin).
Fixed by adding specific dependency on the crts and on the multi target.
We also add the declaration header for the Darwin10 unwinder shim to the
powerpc cases, since we build that there for Rosetta use.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:
* config.host: Add shim declaration header to powerpc*-darwin builds.
* config/rs6000/t-darwin-ehs: Remove dependency on the powerpc end
file.
* config/t-darwin-ehs: Add dependencies to the shared unwinder
objects.
* config/t-slibgcc-darwin: Add extra_parts to the dependencies for
the shared EH lib. Add all-multi to the dependencies for the
libgcc_s.1.dylib redirections.
Martin Liska [Fri, 17 Dec 2021 08:56:21 +0000 (09:56 +0100)]
Sync config.sub: 2021-10-27
ChangeLog:
* config.sub: Sync from master.
Iain Sandoe [Wed, 15 Dec 2021 20:25:27 +0000 (20:25 +0000)]
Darwin, Driver: Avoid a link line for empty commands.
We were pushing a spec value for weak_reference_mismatches unconditionally
which is not needed (the value was the default) and the side-effect of
this was that we appeared to need to drive a link command, leading to
unexpected diagnostics for cases where gcc was invoked with an empty
command line.
Also we were pushing flags for sysroot, os minimum version and controls
even if the command line was empty.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin-driver.c (darwin_driver_init): Exit from the
option handling early if the command line is definitely empty.
* config/darwin.h (SUBTARGET_DRIVER_SELF_SPECS): Remove
setting for the default content of weak_reference_mismatches.
Iain Sandoe [Fri, 17 Dec 2021 09:29:02 +0000 (09:29 +0000)]
Darwin, ppc: Additional change for r12-5974.
This adds a missed change from r12-5974-g926d64906af.
The builtin_decls array has been renamed to drop the trailing
_x that was used during the main changes to the builtins.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/rs6000/darwin.h: Drop trailing _x from the
builtin_decls array name.
Martin Liska [Fri, 17 Dec 2021 08:54:53 +0000 (09:54 +0100)]
Revert "Fixed typo"
This reverts commit
06cd44b4387a9f6ab46f377f42ee5be9cf11bf15.
Haochen Jiang [Thu, 2 Dec 2021 07:30:17 +0000 (15:30 +0800)]
Add combine splitter to transform vpternlogd/vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0
gcc/ChangeLog:
PR target/100738
* config/i386/sse.md (*avx_cmp<mode>3_lt, *avx_cmp<mode>3_ltint):
Remove MEM_P restriction and add force_reg for operands[2].
(*avx_cmp<mode>3_ltint_not): Add new define_insn_and_split.
gcc/testsuite/ChangeLog:
PR target/100738
* g++.target/i386/avx512vl-pr100738-1.C: New test.
Siddhesh Poyarekar [Fri, 17 Dec 2021 04:04:44 +0000 (09:34 +0530)]
__builtin_dynamic_object_size: Recognize builtin
Recognize the __builtin_dynamic_object_size builtin and add paths in the
object size path to deal with it, but treat it like
__builtin_object_size for now. Also add tests to provide the same
testing coverage for the new builtin name.
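A minimal usage sketch of the builtin follows, assuming a compiler that may or may not provide it. The function name buffer_object_size and the 16-byte buffer are illustrative and are not taken from the new tests. The fallback to __builtin_object_size reflects the note above that the two are treated alike for now.

```cpp
#include <cstddef>

#ifndef __has_builtin
# define __has_builtin(x) 0   // older compilers: assume the builtin is absent
#endif

// Illustrative only: ask the compiler how many bytes are available in a
// 16-byte buffer.  Uses the dynamic builtin when the compiler provides it
// and falls back to __builtin_object_size otherwise.
inline std::size_t
buffer_object_size ()
{
  static char buf[16];
#if __has_builtin (__builtin_dynamic_object_size)
  return __builtin_dynamic_object_size (buf, 0);
#else
  return __builtin_object_size (buf, 0);
#endif
}
```

With object size type 0 the result is the size of the object when the compiler can determine it (16 here) and (size_t) -1 when it cannot.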
gcc/ChangeLog:
* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
* tree-object-size.h: Move object size type bits enum from
tree-object-size.c and add new value OST_DYNAMIC.
* builtins.c (expand_builtin, fold_builtin_2): Handle it.
(fold_builtin_object_size): Handle new builtin and adjust for
change to compute_builtin_object_size.
* tree-object-size.c: Include builtins.h.
(compute_builtin_object_size): Adjust.
(early_object_sizes_execute_one,
dynamic_object_sizes_execute_one): New functions.
(object_sizes_execute): Rename insert_min_max_p argument to
early. Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
functions.
* doc/extend.texi (__builtin_dynamic_object_size): Document new
builtin.
gcc/testsuite/ChangeLog:
* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
* gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion
from builtin-dynamic-object-size-16.c.
* gcc.dg/builtin-object-size-17.c: Likewise.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
Siddhesh Poyarekar [Fri, 17 Dec 2021 01:37:18 +0000 (07:07 +0530)]
tree-object-size: Use trees and support negative offsets
Transform tree-object-size to operate on tree objects instead of host
wide integers. This makes it easier to extend to dynamic expressions
for object sizes.
The compute_builtin_object_size interface also now returns a tree
expression instead of HOST_WIDE_INT, so callers have been adjusted to
account for that.
The trees in object_sizes are each an object_size object with members
size (the bytes from the pointer to the end of the object) and wholesize
(the size of the whole object). This allows analysis of negative
offsets, which can now be allowed to the extent of the object bounds.
Tests have been added to verify that it actually works.
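The size/wholesize pairing described above can be sketched as a small stand-alone model. This is not GCC's implementation, which works on trees. The names obj_size and apply_offset are hypothetical, and returning zero bytes for an out-of-bounds offset is an assumption of this sketch.

```cpp
#include <cstddef>

// Hypothetical model, not GCC's data structure.
struct obj_size
{
  std::size_t size;       // bytes from the pointer to the end of the object
  std::size_t wholesize;  // size of the whole object
};

// Apply a signed byte offset to the pointer described by S.
// Assumes s.size <= s.wholesize.  An offset that would leave the object
// yields size 0 (an assumption of this sketch).
inline obj_size
apply_offset (obj_size s, std::ptrdiff_t off)
{
  // Distance of the pointer from the start of the object.
  const std::size_t pos = s.wholesize - s.size;
  obj_size r = { 0, s.wholesize };
  if (off < 0)
    {
      // Magnitude of the negative offset, computed without overflow.
      const std::size_t back = static_cast<std::size_t> (-(off + 1)) + 1;
      if (back <= pos)
        r.size = s.size + back;
    }
  else if (static_cast<std::size_t> (off) <= s.size)
    r.size = s.size - static_cast<std::size_t> (off);
  return r;
}
```

For a pointer 4 bytes into a 16-byte object (size 12, wholesize 16), an offset of -4 moves back to the start and gives size 16, while -5 would move before the object and gives 0 in this model. Knowing wholesize is what makes the negative case answerable at all.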
gcc/ChangeLog:
* tree-object-size.h (compute_builtin_object_size): Return tree
instead of HOST_WIDE_INT.
* builtins.c (fold_builtin_object_size): Adjust.
* gimple-fold.c (gimple_fold_builtin_strncat): Likewise.
* ubsan.c (instrument_object_size): Likewise.
* tree-object-size.c (object_size): New structure.
(object_sizes): Change type to vec<object_size>.
(initval): New function.
(unknown): Use it.
(size_unknown_p, size_initval, size_unknown): New functions.
(object_sizes_unknown_p): Use it.
(object_sizes_get): Return tree.
(object_sizes_initialize): Rename from object_sizes_set_force
and set VAL parameter type as tree. Add new parameter WHOLEVAL.
(object_sizes_set): Set VAL parameter type as tree and adjust
implementation. Add new parameter WHOLEVAL.
(size_for_offset): New function.
(decl_init_size): Adjust comment.
(addr_object_size): Change PSIZE parameter to tree and adjust
implementation. Add new parameter PWHOLESIZE.
(alloc_object_size): Return tree.
(compute_builtin_object_size): Return tree in PSIZE.
(expr_object_size, call_object_size, unknown_object_size):
Adjust for object_sizes_set change.
(merge_object_sizes): Drop OFFSET parameter and adjust
implementation for tree change.
(plus_stmt_object_size): Call collect_object_sizes_for directly
instead of merge_object_size and call size_for_offset to get net
size.
(cond_expr_object_size, collect_object_sizes_for,
object_sizes_execute): Adjust for change of type from
HOST_WIDE_INT to tree.
(check_for_plus_in_loops_1): Likewise and skip non-positive
offsets.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-1.c (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c (test8): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c (test8): New test.
(main): Call it.
* gcc.dg/builtin-object-size-5.c (test5, test6, test7): New
tests.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
Jason Merrill [Wed, 15 Dec 2021 22:14:54 +0000 (17:14 -0500)]
c++: tweak comment
The comment documented a parameter that no longer exists.
gcc/cp/ChangeLog:
* constraint.cc (deduce_concept_introduction): Adjust comment.
Jason Merrill [Tue, 14 Dec 2021 22:00:40 +0000 (17:00 -0500)]
c++: layout of aggregate base with DMI [PR103681]
C++14 changed the definition of 'aggregate' to allow default member
initializers, but such classes still need to be considered "non-POD for the
purpose of layout" for ABI compatibility with C++11 code. It seems rare to
derive from such a class, as evidenced by how long this bug has
survived (since r216750 in 2014), but it's certainly worth fixing.
We only warn when we were failing to allocate another field into the
tail padding of the newly aggregate class; this is the only ABI impact.
This also changes end_of_class to consider all data members, not just empty
data members; that used to be an additional flag, removed in r9-5710, but I
don't see any reason not to always include them. This makes the result of
the function correspond to the ABI nvsize term and its nameless counterpart
that does include virtual bases.
When looking closely at other users of end_of_class, I realized that we were
assuming that the latter corresponded to the ABI dsize term, but it doesn't
if the class ends with an empty virtual base (in the rare case that the
empty base can't be assigned offset 0), and this matters for layout of
[[no_unique_address]]. So I added another mode that returns the desired
value for that case. I'm not adding a warning for this ABI fix because it's
a C++20 feature.
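The kind of hierarchy affected can be sketched as follows. The class names are illustrative and are not taken from the new tests, and the result depends on the target and on the selected -fabi-version, so no particular size is claimed here.

```cpp
#include <cstddef>

// An aggregate since C++14 because of the default member initializer.
// On typical targets it ends with tail padding after `c`.
struct Base { int i; char c = 0; };

// A derived class whose only member could fit in Base's tail padding.
struct Derived : Base { char d; };

// True if the compiler placed Derived::d inside Base's tail padding,
// i.e. if Base was treated as non-POD for the purpose of layout.
constexpr bool
reuses_tail_padding ()
{
  return sizeof (Derived) == sizeof (Base);
}
```

Per the description above, a compiler using the fixed ABI would be expected to reuse the tail padding, while one using the earlier ABI in C++14 or later mode would not. Comparing reuses_tail_padding () across -fabi-version settings is one way to observe the difference.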
PR c++/103681
gcc/ChangeLog:
* common.opt (fabi-version): Add v17.
gcc/cp/ChangeLog:
* cp-tree.h (struct lang_type): Add non_pod_aggregate.
(CLASSTYPE_NON_POD_AGGREGATE): New.
* class.c (check_field_decls): Set it.
(check_bases_and_members): Check it.
(check_non_pod_aggregate): New.
(enum eoc_mode): New.
(end_of_class): Always include non-empty fields.
Add eoc_nv_or_dsize mode.
(include_empty_classes, layout_class_type): Adjust.
gcc/c-family/ChangeLog:
* c-opts.c (c_common_post_options): Update defaults.
gcc/testsuite/ChangeLog:
* g++.dg/abi/macro0.C: Update value.
* g++.dg/abi/no_unique_address6.C: New test.
* g++.dg/abi/nsdmi-aggr1.C: New test.
* g++.dg/abi/nsdmi-aggr1a.C: New test.