Ian Lance Taylor [Sat, 21 Aug 2021 19:42:19 +0000 (12:42 -0700)]
compiler: correct condition for calling memclrHasPointers
When compiling append(s, make([]typ, ln)...), where typ has a pointer,
and the append fits within the existing capacity of s, the condition
used to clear out the new elements was reversed.
Fixes golang/go#47771
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/344189
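The invariant the corrected condition protects can be modelled in C. This is a sketch only: `ptr_slice` and `append_zeroed` are hypothetical names, not gofrontend code, and the growth path is left out. When the append fits in the existing capacity, the newly exposed elements reuse old backing storage, so any stale pointers there have to be cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Go slice whose element type holds a pointer. */
typedef struct {
    void **data;  /* backing array */
    size_t len;   /* elements in use */
    size_t cap;   /* capacity of the backing array */
} ptr_slice;

/* Models append(s, make([]T, n)...) when len + n <= cap.
   Returns 0 on success, -1 if the append would need to grow the array. */
int append_zeroed(ptr_slice *s, size_t n)
{
    if (n > s->cap - s->len)
        return -1;  /* growth path not modelled here */
    /* Clear exactly the newly exposed region [len, len + n). */
    for (size_t i = s->len; i < s->len + n; i++)
        s->data[i] = NULL;
    s->len += n;
    return 0;
}
```

Elements outside `[len, len + n)` are left alone; only the region that becomes visible through the appended slice needs clearing.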
Aldy Hernandez [Thu, 9 Sep 2021 18:30:28 +0000 (20:30 +0200)]
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
gcc/ChangeLog:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz <matz@suse.de>
David Faust [Wed, 8 Sep 2021 17:31:03 +0000 (10:31 -0700)]
doc: document BPF -mcpu and related options
This commit adds documentation for the new BPF options -mcpu, -mjmpext,
-mjmp32, and -malu32.
gcc/ChangeLog:
* doc/invoke.texi: Document BPF -mcpu, -mjmpext, -mjmp32 and -malu32
options.
David Faust [Wed, 8 Sep 2021 17:28:59 +0000 (10:28 -0700)]
bpf testsuite: add tests for new feature options
This commit adds tests for the new -mjmpext, -mjmp32 and -malu32 feature
options in the BPF backend.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/alu-1.c: New test.
* gcc.target/bpf/jmp-1.c: New test.
David Faust [Wed, 8 Sep 2021 17:26:15 +0000 (10:26 -0700)]
bpf: add -mcpu and related feature options
New instructions have been added over time to the eBPF ISA, but
previously there has been no good method to select which version to
target in GCC.
This patch adds the following options to the BPF backend:
-mcpu={v1, v2, v3}
Select which version of the eBPF ISA to target. This enables or
disables generation of certain instructions. The default is v3.
-mjmpext
Enable extra conditional branch instructions.
Enabled for CPU v2 and above.
-mjmp32
Enable 32-bit jump/branch instructions.
Enabled for CPU v3 and above.
-malu32
Enable 32-bit ALU instructions.
Enabled for CPU v3 and above.
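The default feature set implied by each -mcpu level, as described in the option list above, can be summarized as a small lookup. This is a sketch: the real enum in config/bpf/bpf-opts.h (bpf_isa_version) may use different enumerator names, and the interaction with explicitly passed feature options is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the -mcpu levels. */
enum isa_level { ISA_V1 = 1, ISA_V2 = 2, ISA_V3 = 3 };

struct bpf_features {
    bool jmpext;  /* -mjmpext: extra conditional branch instructions */
    bool jmp32;   /* -mjmp32: 32-bit jump/branch instructions */
    bool alu32;   /* -malu32: 32-bit ALU instructions */
};

/* Features enabled by default for -mcpu=vN. */
struct bpf_features features_for_cpu(enum isa_level cpu)
{
    struct bpf_features f;
    f.jmpext = cpu >= ISA_V2;
    f.jmp32 = cpu >= ISA_V3;
    f.alu32 = cpu >= ISA_V3;
    return f;
}
```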
gcc/ChangeLog:
* config/bpf/bpf-opts.h (bpf_isa_version): New enum.
* config/bpf/bpf-protos.h (bpf_expand_cbranch): New.
* config/bpf/bpf.c (bpf_option_override): Handle -mcpu option.
(bpf_expand_cbranch): New function.
* config/bpf/bpf.md (AM mode iterator): Conditionalize support for SI
mode.
(zero_extendsidi2): Only use mov32 instruction if it is available.
(SIM mode iterator): Conditionalize support for SI mode.
(JM mode iterator): New.
(cbranchdi4): Update name, use new JM iterator. Use bpf_expand_cbranch.
(*branch_on_di): Update name, use new JM iterator.
* config/bpf/bpf.opt: (mjmpext): New option.
(malu32): Likewise.
(mjmp32): Likewise.
(mcpu): Likewise.
(bpf_isa): New enum.
David Faust [Fri, 20 Aug 2021 21:54:42 +0000 (14:54 -0700)]
bpf: correct zero_extend output templates
The output templates for zero_extendhidi2 and zero_extendqidi2 could
lead to incorrect code generation when zero-extending one register into
another. This patch adds a new output template to the define_insns to
handle such cases and produce correct asm.
gcc/ChangeLog:
* config/bpf/bpf.md (zero_extendhidi2): Add new output template
for register-to-register extensions.
(zero_extendqidi2): Likewise.
Jonathan Wakely [Fri, 10 Sep 2021 14:08:27 +0000 (15:08 +0100)]
libstdc++: Use "test.invalid." for invalid hostname
This avoids test.invalid.some.domain being successfully resolved.
libstdc++-v3/ChangeLog:
* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Fix invalid hostname to only match the .invalid TLD.
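The trailing dot makes the name fully qualified, so the resolver should not append search domains to it, and .invalid is a TLD reserved by RFC 2606 that should never resolve. A minimal POSIX sketch of the distinction; `is_absolute_name` and `resolves` are hypothetical helpers, not part of the libstdc++ test.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* True if NAME ends in '.', i.e. is an absolute (fully qualified) name
   to which the resolver will not append search domains. */
bool is_absolute_name(const char *name)
{
    size_t n = strlen(name);
    return n > 0 && name[n - 1] == '.';
}

/* True if NAME resolves to at least one address via getaddrinfo. */
bool resolves(const char *name)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    int rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc == 0;
}
```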
Richard Biener [Fri, 10 Sep 2021 10:28:09 +0000 (12:28 +0200)]
middle-end/102273 - avoid ICE with auto-init and nested functions
This refactors expansion to consider non-decl LHS. I suspect
the is_val argument is not needed.
2021-09-10 Richard Biener <rguenther@suse.de>
PR middle-end/102273
* internal-fn.c (expand_DEFERRED_INIT): Always expand non-SSA vars.
* gcc.dg/pr102273.c: New testcase.
Thomas Schwinge [Fri, 10 Sep 2021 09:26:50 +0000 (11:26 +0200)]
Fix 'dg-do run' syntax in 'c-c++-common/auto-init-padding-{2,3}.c'
Fix-up for recent commit
a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
"Add -ftrivial-auto-var-init option and uninitialized variable attribute".
gcc/testsuite/
* c-c++-common/auto-init-padding-2.c: Fix 'dg-do run' syntax.
* c-c++-common/auto-init-padding-3.c: Likewise.
Richard Biener [Fri, 10 Sep 2021 08:17:24 +0000 (10:17 +0200)]
middle-end/102269 - avoid auto-init of empty types
This avoids initializing empty types for which we'll eventually
leave a .DEFERRED_INIT call without a LHS.
2021-09-10 Richard Biener <rguenther@suse.de>
PR middle-end/102269
* gimplify.c (is_var_need_auto_init): Empty types do not need
initialization.
* gcc.dg/pr102269.c: New testcase.
Richard Biener [Fri, 10 Sep 2021 06:04:57 +0000 (08:04 +0200)]
Remove vestiges of --with-stabs
This removes the --with-stabs configure option, which has had no
effect for quite some time.
2021-09-10 Richard Biener <rguenther@suse.de>
* configure.ac (--with-stabs): Remove.
* configure: Regenerate.
* doc/install.texi: Remove --with-stabs documentation.
liuhongt [Mon, 2 Mar 2020 08:57:07 +0000 (16:57 +0800)]
AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-helper.h
(check_results_mask): New check_function.
* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
liuhongt [Tue, 19 Feb 2019 02:04:02 +0000 (18:04 -0800)]
AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h: (_mm512_cmp_ph_mask):
New intrinsic.
(_mm512_mask_cmp_ph_mask): Likewise.
(_mm512_cmp_round_ph_mask): Likewise.
(_mm512_mask_cmp_round_ph_mask): Likewise.
(_mm_cmp_sh_mask): Likewise.
(_mm_mask_cmp_sh_mask): Likewise.
(_mm_cmp_round_sh_mask): Likewise.
(_mm_mask_cmp_round_sh_mask): Likewise.
(_mm_comieq_sh): Likewise.
(_mm_comilt_sh): Likewise.
(_mm_comile_sh): Likewise.
(_mm_comigt_sh): Likewise.
(_mm_comige_sh): Likewise.
(_mm_comineq_sh): Likewise.
(_mm_ucomieq_sh): Likewise.
(_mm_ucomilt_sh): Likewise.
(_mm_ucomile_sh): Likewise.
(_mm_ucomigt_sh): Likewise.
(_mm_ucomige_sh): Likewise.
(_mm_ucomineq_sh): Likewise.
(_mm_comi_round_sh): Likewise.
(_mm_comi_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cmp_ph_mask): New intrinsic.
(_mm_mask_cmp_ph_mask): Likewise.
(_mm256_cmp_ph_mask): Likewise.
(_mm256_mask_cmp_ph_mask): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/i386.md (ssevecmode): Add HF mode.
(MODEFH): New mode iterator.
* config/i386/sse.md
(V48H_AVX512VL): New mode iterator to support HF vector modes.
Adjust corresponding description.
(ssecmpintprefix): New.
(VI12_AVX512VL): Adjust to support HF vector modes.
(cmp_imm_predicate): Likewise.
(<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>):
Likewise.
(avx512f_vmcmp<mode>3<round_saeonly_name>): Likewise.
(avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Likewise.
(<sse>_<unord>comi<round_saeonly_name>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
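The compare intrinsics return a bit mask (one bit per lane) rather than a vector. A scalar reference model makes the lane-to-bit mapping concrete. This sketch assumes the less-than ordered predicate (as with _CMP_LT_OS), uses float as a stand-in for _Float16, and uses hypothetical function names; it is not the check_results_mask helper from avx512fp16-helper.h.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Bit i of the result is set when a[i] < b[i]. At most 32 lanes
   (a 512-bit vector of half-precision values). */
uint32_t cmp_lt_mask(const float *a, const float *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 32);
    uint32_t k = 0;
    for (int i = 0; i < lanes; i++)
        if (a[i] < b[i])  /* ordered compare: false if either is NaN */
            k |= UINT32_C(1) << i;
    return k;
}

/* Masked form (as in the _mask_cmp_ intrinsics): bits whose
   corresponding bit in k1 is clear are forced to zero. */
uint32_t cmp_lt_mask_masked(uint32_t k1, const float *a, const float *b,
                            int lanes)
{
    return cmp_lt_mask(a, b, lanes) & k1;
}
```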
liuhongt [Mon, 2 Mar 2020 08:46:43 +0000 (16:46 +0800)]
AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vmaxph-1a.c: New test.
* gcc.target/i386/avx512fp16-vmaxph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmaxsh-1.c: Ditto.
* gcc.target/i386/avx512fp16-vmaxsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vminph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vminph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vminsh-1.c: Ditto.
* gcc.target/i386/avx512fp16-vminsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmaxph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vminph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
liuhongt [Thu, 24 Jan 2019 00:06:48 +0000 (16:06 -0800)]
AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h: (_mm512_max_ph): New intrinsic.
(_mm512_mask_max_ph): Likewise.
(_mm512_maskz_max_ph): Likewise.
(_mm512_min_ph): Likewise.
(_mm512_mask_min_ph): Likewise.
(_mm512_maskz_min_ph): Likewise.
(_mm512_max_round_ph): Likewise.
(_mm512_mask_max_round_ph): Likewise.
(_mm512_maskz_max_round_ph): Likewise.
(_mm512_min_round_ph): Likewise.
(_mm512_mask_min_round_ph): Likewise.
(_mm512_maskz_min_round_ph): Likewise.
(_mm_max_sh): Likewise.
(_mm_mask_max_sh): Likewise.
(_mm_maskz_max_sh): Likewise.
(_mm_min_sh): Likewise.
(_mm_mask_min_sh): Likewise.
(_mm_maskz_min_sh): Likewise.
(_mm_max_round_sh): Likewise.
(_mm_mask_max_round_sh): Likewise.
(_mm_maskz_max_round_sh): Likewise.
(_mm_min_round_sh): Likewise.
(_mm_mask_min_round_sh): Likewise.
(_mm_maskz_min_round_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_max_ph): New intrinsic.
(_mm256_max_ph): Likewise.
(_mm_mask_max_ph): Likewise.
(_mm256_mask_max_ph): Likewise.
(_mm_maskz_max_ph): Likewise.
(_mm256_maskz_max_ph): Likewise.
(_mm_min_ph): Likewise.
(_mm256_min_ph): Likewise.
(_mm_mask_min_ph): Likewise.
(_mm256_mask_min_ph): Likewise.
(_mm_maskz_min_ph): Likewise.
(_mm256_maskz_min_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
* config/i386/sse.md
(<code><mode>3<mask_name><round_saeonly_name>): Adjust to
support HF vector modes.
(*<code><mode>3<mask_name><round_saeonly_name>): Likewise.
(ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>):
Likewise.
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Likewise.
* config/i386/subst.md (round_saeonly_mode512bit_condition):
Adjust for HF vector modes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
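Unlike fmax/fmin, the x86 MAX/MIN instructions are not symmetric, which is why the patterns above distinguish the IEEE variants. A scalar sketch, assuming VMAXPH/VMINPH follow the same rules as MAXPS/MINPS (when either input is NaN, or both are zeros of any sign, the second operand is returned) and using float in place of _Float16:

```c
#include <assert.h>
#include <math.h>

/* Returns b unless a is strictly greater (so NaN or equal inputs give b). */
float x86_max(float a, float b) { return a > b ? a : b; }

/* Returns b unless a is strictly smaller. */
float x86_min(float a, float b) { return a < b ? a : b; }
```

Swapping the operands therefore changes the result for NaN and signed-zero inputs, so these operations cannot be treated as commutative.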
liuhongt [Mon, 2 Mar 2020 08:43:10 +0000 (16:43 +0800)]
AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vaddsh-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto.
* gcc.target/i386/pr54855-11.c: Ditto.
Liu, Hongtao [Mon, 28 Jan 2019 08:05:04 +0000 (00:05 -0800)]
AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic.
(_mm_mask_add_sh): Likewise.
(_mm_maskz_add_sh): Likewise.
(_mm_sub_sh): Likewise.
(_mm_mask_sub_sh): Likewise.
(_mm_maskz_sub_sh): Likewise.
(_mm_mul_sh): Likewise.
(_mm_mask_mul_sh): Likewise.
(_mm_maskz_mul_sh): Likewise.
(_mm_div_sh): Likewise.
(_mm_mask_div_sh): Likewise.
(_mm_maskz_div_sh): Likewise.
(_mm_add_round_sh): Likewise.
(_mm_mask_add_round_sh): Likewise.
(_mm_maskz_add_round_sh): Likewise.
(_mm_sub_round_sh): Likewise.
(_mm_mask_sub_round_sh): Likewise.
(_mm_maskz_sub_round_sh): Likewise.
(_mm_mul_round_sh): Likewise.
(_mm_mask_mul_round_sh): Likewise.
(_mm_maskz_mul_round_sh): Likewise.
(_mm_div_round_sh): Likewise.
(_mm_mask_div_round_sh): Likewise.
(_mm_maskz_div_round_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_round_builtin): Handle new builtins.
* config/i386/sse.md (VF_128): Change description.
(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>):
Adjust to support HF vector modes.
(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>):
Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
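The scalar (sh) forms operate only on lane 0 and pass the upper lanes through from the first source. A sketch of the merge-masked and zero-masked forms, modelled on the documented behavior of _mm_mask_add_sh(src, k, a, b) and _mm_maskz_add_sh(k, a, b); the function names are hypothetical and float stands in for _Float16.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* a 128-bit vector holds eight 16-bit halves */

/* Lane 0 = a[0] + b[0] if bit 0 of k is set, else src[0];
   lanes 1..7 are copied from a. */
void mask_add_sh(float dst[LANES], const float src[LANES], uint8_t k,
                 const float a[LANES], const float b[LANES])
{
    dst[0] = (k & 1) ? a[0] + b[0] : src[0];
    for (int i = 1; i < LANES; i++)
        dst[i] = a[i];
}

/* Zero-masking variant: lane 0 becomes 0 when bit 0 of k is clear. */
void maskz_add_sh(float dst[LANES], uint8_t k,
                  const float a[LANES], const float b[LANES])
{
    dst[0] = (k & 1) ? a[0] + b[0] : 0.0f;
    for (int i = 1; i < LANES; i++)
        dst[i] = a[i];
}
```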
H.J. Lu [Mon, 28 Jan 2019 03:38:02 +0000 (19:38 -0800)]
AVX512FP16: Enable _Float16 autovectorization
gcc/ChangeLog:
* config/i386/i386-expand.c
(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
* config/i386/i386.c
(ix86_preferred_simd_mode): Handle HF mode.
* config/i386/sse.md (V_256H): New mode iterator.
(avx_vextractf128<mode>): Use it.
(VEC_INIT_MODE): Align vector HFmode condition to vector
HImodes, since no real HF instructions are used.
(VEC_INIT_HALF_MODE): Ditto.
(VIHF): Ditto.
(VIHF_AVX512BW): Ditto.
(*vec_extracthf): Ditto.
(VEC_EXTRACT_MODE): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-float16-1.c: New test.
* gcc.target/i386/vect-float16-10.c: Ditto.
* gcc.target/i386/vect-float16-11.c: Ditto.
* gcc.target/i386/vect-float16-12.c: Ditto.
* gcc.target/i386/vect-float16-2.c: Ditto.
* gcc.target/i386/vect-float16-3.c: Ditto.
* gcc.target/i386/vect-float16-4.c: Ditto.
* gcc.target/i386/vect-float16-5.c: Ditto.
* gcc.target/i386/vect-float16-6.c: Ditto.
* gcc.target/i386/vect-float16-7.c: Ditto.
* gcc.target/i386/vect-float16-8.c: Ditto.
* gcc.target/i386/vect-float16-9.c: Ditto.
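For illustration, the kind of element-wise _Float16 loop the vectorizer can now consider looks like the sketch below. Whether it is actually vectorized depends on the options (something like -O3 -mavx512fp16) and the compiler version; the `half` typedef is a hypothetical fallback to float for compilers without _Float16, which GCC signals by defining __FLT16_MAX__.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __FLT16_MAX__
typedef _Float16 half;
#else
typedef float half;  /* fallback so the sketch still builds */
#endif

/* Simple element-wise loop; restrict tells the compiler the arrays
   do not overlap, which makes vectorization easier to justify. */
void add_halves(half *restrict out, const half *restrict x,
                const half *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}
```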
Richard Biener [Thu, 9 Sep 2021 13:08:22 +0000 (15:08 +0200)]
Remove dbx.h, do not set PREFERRED_DEBUGGING_TYPE from dbxcoff.h, lynx.h
The following removes the unused config/dbx.h file and removes the
setting of PREFERRED_DEBUGGING_TYPE from dbxcoff.h which is
overridden by all users (djgpp/mingw/cygwin) by including either
config/i386/djgpp.h or config/i386/cygming.h.
There are still circumstances where mingw and cygwin default to
STABS, namely when HAVE_GAS_PE_SECREL32_RELOC is not defined and
the target defaults to 32bit code generation.
The new style of handling DBX_DEBUGGING_INFO is in line with
dbxelf.h, which does not define PREFERRED_DEBUGGING_TYPE either.
The patch also removes the PREFERRED_DEBUGGING_TYPE define from
lynx.h, which always follows elfos.h (already defaulting to DWARF),
so the comment about STABS being the default is misleading and
outdated.
2021-09-09 Richard Biener <rguenther@suse.de>
PR target/102255
* config/dbx.h: Remove.
* config/dbxcoff.h: Do not define PREFERRED_DEBUGGING_TYPE.
* config/lynx.h: Likewise.
liuhongt [Thu, 9 Sep 2021 06:49:16 +0000 (14:49 +0800)]
Remove copysign post_reload splitter for scalar modes.
Removing it lets the expander generate better code, as
avx512dq-abs-copysign-1.c shows.
gcc/ChangeLog:
* config/i386/i386-expand.c (ix86_expand_copysign): Expand
right into ANDNOT + AND + IOR, using paradoxical subregs.
(ix86_split_copysign_const): Remove.
(ix86_split_copysign_var): Ditto.
* config/i386/i386-protos.h (ix86_split_copysign_const): Ditto.
(ix86_split_copysign_var): Ditto.
* config/i386/i386.md (@copysign<mode>3_const): Ditto.
(@copysign<mode>3_var): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512dq-abs-copysign-1.c: Adjust testcase.
* gcc.target/i386/avx512vl-abs-copysign-1.c: Adjust testcase.
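The ANDNOT + AND + IOR sequence is the bitwise definition of copysign: keep every bit of the magnitude except the sign, take only the sign bit of the second operand, and combine them. A scalar C sketch of the same computation (not the expander code itself):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

float copysign_bits(float x, float y)
{
    const uint32_t sign = UINT32_C(0x80000000);
    uint32_t ux, uy, r;
    memcpy(&ux, &x, sizeof ux);  /* type-pun without aliasing UB */
    memcpy(&uy, &y, sizeof uy);
    r = (ux & ~sign)             /* ANDNOT: clear x's sign bit */
        | (uy & sign);           /* AND keeps y's sign; IOR merges */
    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```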
GCC Administrator [Fri, 10 Sep 2021 00:16:31 +0000 (00:16 +0000)]
Daily bump.
qing zhao [Thu, 9 Sep 2021 22:44:49 +0000 (15:44 -0700)]
Add -ftrivial-auto-var-init option and uninitialized variable attribute.
Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized, so -Wuninitialized will still report warnings
on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
To limit the runtime overhead, you can exclude a specific variable from this
behavior with the variable attribute "uninitialized".
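A sketch of where the opt-out is useful: a large scratch buffer that is always written before it is read gains nothing from automatic initialization under, e.g., -ftrivial-auto-var-init=zero. The function `checksum_copy` is hypothetical; compilers that do not know the attribute will warn and ignore it.

```c
#include <assert.h>
#include <string.h>

unsigned checksum_copy(const unsigned char *src, size_t n)
{
    /* Fully written by memcpy before any read, so skip auto-init. */
    unsigned char scratch[256] __attribute__((uninitialized));
    unsigned sum = 0;
    if (n > sizeof scratch)
        n = sizeof scratch;
    memcpy(scratch, src, n);
    for (size_t i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}
```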
gcc/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* builtins.c (expand_builtin_memset): Make externally visible.
* builtins.h (expand_builtin_memset): Declare extern.
* common.opt (ftrivial-auto-var-init=): New option.
* doc/extend.texi: Document the uninitialized attribute.
* doc/invoke.texi: Document -ftrivial-auto-var-init.
* flag-types.h (enum auto_init_type): New enumerated type
auto_init_type.
* gimple-fold.c (clear_padding_type): Add one new parameter.
(clear_padding_union): Likewise.
(clear_padding_emit_loop): Likewise.
(clear_type_padding_in_mask): Likewise.
(gimple_fold_builtin_clear_padding): Handle this new parameter.
* gimplify.c (gimple_add_init_for_auto_var): New function.
(gimple_add_padding_init_for_auto_var): New function.
(is_var_need_auto_init): New function.
(gimplify_decl_expr): Add initialization to automatic variables per
users' requests.
(gimplify_call_expr): Add one new parameter for call to
__builtin_clear_padding.
(gimplify_init_constructor): Add padding initialization in the end.
* internal-fn.c (INIT_PATTERN_VALUE): New macro.
(expand_DEFERRED_INIT): New function.
* internal-fn.def (DEFERRED_INIT): New internal function.
* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
* tree-sra.c (generate_subtree_deferred_init): New function.
(scan_function): Avoid setting cannot_scalarize_away_bitmap for
calls to .DEFERRED_INIT.
(sra_modify_deferred_init): New function.
(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
specially.
(check_defs): Likewise.
(warn_uninitialized_vars): Likewise.
* tree-ssa.c (ssa_undefined_value_p): Likewise.
* tree.c (build_common_builtin_nodes): Build tree node for
BUILT_IN_CLEAR_PADDING when needed.
gcc/c-family/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* c-attribs.c (handle_uninitialized_attribute): New function.
(c_common_attribute_table): Add "uninitialized" attribute.
gcc/testsuite/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* c-c++-common/auto-init-1.c: New test.
* c-c++-common/auto-init-10.c: New test.
* c-c++-common/auto-init-11.c: New test.
* c-c++-common/auto-init-12.c: New test.
* c-c++-common/auto-init-13.c: New test.
* c-c++-common/auto-init-14.c: New test.
* c-c++-common/auto-init-15.c: New test.
* c-c++-common/auto-init-16.c: New test.
* c-c++-common/auto-init-2.c: New test.
* c-c++-common/auto-init-3.c: New test.
* c-c++-common/auto-init-4.c: New test.
* c-c++-common/auto-init-5.c: New test.
* c-c++-common/auto-init-6.c: New test.
* c-c++-common/auto-init-7.c: New test.
* c-c++-common/auto-init-8.c: New test.
* c-c++-common/auto-init-9.c: New test.
* c-c++-common/auto-init-esra.c: New test.
* c-c++-common/auto-init-padding-1.c: New test.
* c-c++-common/auto-init-padding-2.c: New test.
* c-c++-common/auto-init-padding-3.c: New test.
* g++.dg/auto-init-uninit-pred-1_a.C: New test.
* g++.dg/auto-init-uninit-pred-2_a.C: New test.
* g++.dg/auto-init-uninit-pred-3_a.C: New test.
* g++.dg/auto-init-uninit-pred-4.C: New test.
* gcc.dg/auto-init-sra-1.c: New test.
* gcc.dg/auto-init-sra-2.c: New test.
* gcc.dg/auto-init-uninit-1.c: New test.
* gcc.dg/auto-init-uninit-12.c: New test.
* gcc.dg/auto-init-uninit-13.c: New test.
* gcc.dg/auto-init-uninit-14.c: New test.
* gcc.dg/auto-init-uninit-15.c: New test.
* gcc.dg/auto-init-uninit-16.c: New test.
* gcc.dg/auto-init-uninit-17.c: New test.
* gcc.dg/auto-init-uninit-18.c: New test.
* gcc.dg/auto-init-uninit-19.c: New test.
* gcc.dg/auto-init-uninit-2.c: New test.
* gcc.dg/auto-init-uninit-20.c: New test.
* gcc.dg/auto-init-uninit-21.c: New test.
* gcc.dg/auto-init-uninit-22.c: New test.
* gcc.dg/auto-init-uninit-23.c: New test.
* gcc.dg/auto-init-uninit-24.c: New test.
* gcc.dg/auto-init-uninit-25.c: New test.
* gcc.dg/auto-init-uninit-26.c: New test.
* gcc.dg/auto-init-uninit-3.c: New test.
* gcc.dg/auto-init-uninit-34.c: New test.
* gcc.dg/auto-init-uninit-36.c: New test.
* gcc.dg/auto-init-uninit-37.c: New test.
* gcc.dg/auto-init-uninit-4.c: New test.
* gcc.dg/auto-init-uninit-5.c: New test.
* gcc.dg/auto-init-uninit-6.c: New test.
* gcc.dg/auto-init-uninit-8.c: New test.
* gcc.dg/auto-init-uninit-9.c: New test.
* gcc.dg/auto-init-uninit-A.c: New test.
* gcc.dg/auto-init-uninit-B.c: New test.
* gcc.dg/auto-init-uninit-C.c: New test.
* gcc.dg/auto-init-uninit-H.c: New test.
* gcc.dg/auto-init-uninit-I.c: New test.
* gcc.target/aarch64/auto-init-1.c: New test.
* gcc.target/aarch64/auto-init-2.c: New test.
* gcc.target/aarch64/auto-init-3.c: New test.
* gcc.target/aarch64/auto-init-4.c: New test.
* gcc.target/aarch64/auto-init-5.c: New test.
* gcc.target/aarch64/auto-init-6.c: New test.
* gcc.target/aarch64/auto-init-7.c: New test.
* gcc.target/aarch64/auto-init-8.c: New test.
* gcc.target/aarch64/auto-init-padding-1.c: New test.
* gcc.target/aarch64/auto-init-padding-10.c: New test.
* gcc.target/aarch64/auto-init-padding-11.c: New test.
* gcc.target/aarch64/auto-init-padding-12.c: New test.
* gcc.target/aarch64/auto-init-padding-2.c: New test.
* gcc.target/aarch64/auto-init-padding-3.c: New test.
* gcc.target/aarch64/auto-init-padding-4.c: New test.
* gcc.target/aarch64/auto-init-padding-5.c: New test.
* gcc.target/aarch64/auto-init-padding-6.c: New test.
* gcc.target/aarch64/auto-init-padding-7.c: New test.
* gcc.target/aarch64/auto-init-padding-8.c: New test.
* gcc.target/aarch64/auto-init-padding-9.c: New test.
* gcc.target/i386/auto-init-1.c: New test.
* gcc.target/i386/auto-init-2.c: New test.
* gcc.target/i386/auto-init-21.c: New test.
* gcc.target/i386/auto-init-22.c: New test.
* gcc.target/i386/auto-init-23.c: New test.
* gcc.target/i386/auto-init-24.c: New test.
* gcc.target/i386/auto-init-3.c: New test.
* gcc.target/i386/auto-init-4.c: New test.
* gcc.target/i386/auto-init-5.c: New test.
* gcc.target/i386/auto-init-6.c: New test.
* gcc.target/i386/auto-init-7.c: New test.
* gcc.target/i386/auto-init-8.c: New test.
* gcc.target/i386/auto-init-padding-1.c: New test.
* gcc.target/i386/auto-init-padding-10.c: New test.
* gcc.target/i386/auto-init-padding-11.c: New test.
* gcc.target/i386/auto-init-padding-12.c: New test.
* gcc.target/i386/auto-init-padding-2.c: New test.
* gcc.target/i386/auto-init-padding-3.c: New test.
* gcc.target/i386/auto-init-padding-4.c: New test.
* gcc.target/i386/auto-init-padding-5.c: New test.
* gcc.target/i386/auto-init-padding-6.c: New test.
* gcc.target/i386/auto-init-padding-7.c: New test.
* gcc.target/i386/auto-init-padding-8.c: New test.
* gcc.target/i386/auto-init-padding-9.c: New test.
Harald Anlauf [Thu, 9 Sep 2021 19:34:01 +0000 (21:34 +0200)]
Fortran - out of bounds in array constructor with implied do loop
gcc/fortran/ChangeLog:
PR fortran/98490
* trans-expr.c (gfc_conv_substring): Do not generate substring
bounds check for implied do loop index variable before it actually
becomes defined.
gcc/testsuite/ChangeLog:
PR fortran/98490
* gfortran.dg/bounds_check_23.f90: New test.
H.J. Lu [Thu, 9 Sep 2021 14:23:16 +0000 (07:23 -0700)]
x86-64: Update AVX512FP16 ABI tests for x32
On x32, long is the same size as int and pointers are 32 bits. Update AVX512FP16
ABI tests:
1. Replace long with long long for 64-bit integers.
2. Update type and alignment for long and pointer.
3. Skip tests for long on x32.
* gcc.target/x86_64/abi/avx512fp16/args.h: Replace long with
long long.
(XMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/defines.h (TYPE_SIZE_LONG):
Define to 4 if __ILP32__ is defined.
(TYPE_SIZE_POINTER): Likewise.
(TYPE_ALIGN_LONG): Likewise.
(TYPE_ALIGN_POINTER): Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
(main): Skip test for long if __ILP32__ is defined.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
(do_test): Replace _long with _longlong.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c:
(check_300): Replace _ulong with _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: Replace long
with long long.
(YMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Replace long
with long long.
(ZMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
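The defines.h change follows a common pattern: select size and alignment constants based on __ILP32__, which x32 defines and LP64 x86-64 does not. A sketch of that pattern, assuming a SysV x86 target (it is not the literal contents of defines.h, and LLP64 targets such as 64-bit Windows would need different values):

```c
#include <assert.h>

#ifdef __ILP32__
# define TYPE_SIZE_LONG     4
# define TYPE_SIZE_POINTER  4
# define TYPE_ALIGN_LONG    4
# define TYPE_ALIGN_POINTER 4
#else
# define TYPE_SIZE_LONG     8
# define TYPE_SIZE_POINTER  8
# define TYPE_ALIGN_LONG    8
# define TYPE_ALIGN_POINTER 8
#endif

/* Nonzero if the compiler's data model matches the constants above. */
int data_model_matches(void)
{
    return sizeof(long) == TYPE_SIZE_LONG
        && sizeof(void *) == TYPE_SIZE_POINTER
        && _Alignof(long) == TYPE_ALIGN_LONG
        && _Alignof(void *) == TYPE_ALIGN_POINTER;
}
```

Using long long for values that must be 64 bits sidesteps the question entirely, since it is 8 bytes under both models.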
Richard Biener [Thu, 9 Sep 2021 09:50:20 +0000 (11:50 +0200)]
Improve LIM fill_always_executed_in computation
Currently the DOM walk over a loop body does not walk into subloops
that are not always executed, to avoid scalability issues, since doing
so makes the walk quadratic in the loop depth. It turns out this
is not an issue in practice: even with a loop depth of 1800
this function is way off the radar.
So the following patch removes the limitation, replacing it with
a comment.
2021-09-09 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-im.c (fill_always_executed_in_1): Walk
into all subloops.
* gcc.dg/tree-ssa/ssa-lim-17.c: New testcase.
Richard Biener [Thu, 9 Sep 2021 08:52:12 +0000 (10:52 +0200)]
Avoid full DOM walk in LIM fill_always_executed_in
This avoids a full DOM walk via get_loop_body_in_dom_order in the
loop body walk of fill_always_executed_in, which often terminates
the walk of a loop body early, by integrating the DOM walk of
get_loop_body_in_dom_order with the actual processing done by
fill_always_executed_in. This trades the fully populated loop
body array for a worklist allocation of the same size and thus
should be a strict improvement over the recursive approach of
get_loop_body_in_dom_order.
2021-09-09 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-im.c (fill_always_executed_in_1): Integrate
DOM walk from get_loop_body_in_dom_order using a worklist
approach.
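The general shape of an early-terminating DOM walk driven by a worklist can be sketched on a dominator tree stored as first-child / next-sibling arrays. Everything here (`struct domtree`, `dom_walk`, `stop_at`) is hypothetical and much simpler than tree-ssa-loop-im.c. The point is that blocks past the stopping point are never enumerated, and the worklist needs at most one slot per node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Dominator tree: children of node v are first_child[v], then
   next_sibling[...] repeatedly; -1 means "none". */
struct domtree {
    int n;
    const int *first_child;
    const int *next_sibling;
};

/* Preorder walk (each node before the nodes it dominates) using an
   explicit stack. Stops as soon as VISIT returns false. Returns the
   number of nodes visited, or -1 on allocation failure. */
int dom_walk(const struct domtree *t, int root,
             bool (*visit)(int node, void *ctx), void *ctx)
{
    int *stack = malloc((size_t)t->n * sizeof *stack);
    int sp = 0, count = 0;
    if (!stack)
        return -1;
    stack[sp++] = root;
    while (sp > 0) {
        int bb = stack[--sp];
        count++;
        if (!visit(bb, ctx))
            break;
        for (int c = t->first_child[bb]; c != -1; c = t->next_sibling[c])
            stack[sp++] = c;  /* each node is pushed at most once */
    }
    free(stack);
    return count;
}

/* Example callback: stop when reaching the node stored in *ctx. */
bool stop_at(int node, void *ctx)
{
    return node != *(const int *)ctx;
}
```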
liuhongt [Mon, 2 Mar 2020 08:35:58 +0000 (16:35 +0800)]
AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-helper.h: New header file for
FP16 runtime test.
* gcc.target/i386/avx512fp16-vaddph-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.
liuhongt [Fri, 18 Jan 2019 22:09:24 +0000 (14:09 -0800)]
AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.
gcc/ChangeLog:
* config.gcc: Add avx512fp16vlintrin.h.
* config/i386/avx512fp16intrin.h: (_mm512_add_ph): New intrinsic.
(_mm512_mask_add_ph): Likewise.
(_mm512_maskz_add_ph): Likewise.
(_mm512_sub_ph): Likewise.
(_mm512_mask_sub_ph): Likewise.
(_mm512_maskz_sub_ph): Likewise.
(_mm512_mul_ph): Likewise.
(_mm512_mask_mul_ph): Likewise.
(_mm512_maskz_mul_ph): Likewise.
(_mm512_div_ph): Likewise.
(_mm512_mask_div_ph): Likewise.
(_mm512_maskz_div_ph): Likewise.
(_mm512_add_round_ph): Likewise.
(_mm512_mask_add_round_ph): Likewise.
(_mm512_maskz_add_round_ph): Likewise.
(_mm512_sub_round_ph): Likewise.
(_mm512_mask_sub_round_ph): Likewise.
(_mm512_maskz_sub_round_ph): Likewise.
(_mm512_mul_round_ph): Likewise.
(_mm512_mask_mul_round_ph): Likewise.
(_mm512_maskz_mul_round_ph): Likewise.
(_mm512_div_round_ph): Likewise.
(_mm512_mask_div_round_ph): Likewise.
(_mm512_maskz_div_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h: New header.
* config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
Add new builtin types.
* config/i386/i386-builtin.def: Add corresponding builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Likewise.
* config/i386/immintrin.h: Include avx512fp16vlintrin.h
* config/i386/sse.md (VFH): New mode_iterator.
(VF2H): Likewise.
(avx512fmaskmode): Add HF vector modes.
(avx512fmaskhalfmode): Likewise.
(<plusminus_insn><mode>3<mask_name><round_name>): Adjust for
HF vector modes.
(*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
(mul<mode>3<mask_name><round_name>): Likewise.
(*mul<mode>3<mask_name><round_name>): Likewise.
(div<mode>3): Likewise.
(<sse>_div<mode>3<mask_name><round_name>): Likewise.
* config/i386/subst.md (SUBST_V): Add HF vector modes.
(SUBST_A): Likewise.
(round_mode512bit_condition): Adjust for V32HFmode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
* gcc.target/i386/avx-2.c: Add -mavx512vl.
* gcc.target/i386/avx512fp16-11a.c: New test.
* gcc.target/i386/avx512fp16-11b.c: Ditto.
* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
* gcc.target/i386/sse-13.c: Add test for new builtins.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
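The packed _mask_ and _maskz_ intrinsics differ only in what an inactive lane receives. A per-lane sketch modelled on the documented behavior of _mm512_mask_add_ph(src, k, a, b); the function name is hypothetical, float stands in for _Float16, and passing src == NULL is this sketch's own convention for zero-masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lane i = a[i] + b[i] if bit i of k is set; otherwise src[i]
   (merge-masking) or 0 when src is NULL (zero-masking).
   A 512-bit vector has 32 half-precision lanes. */
void mask_add_ph(float *dst, const float *src, uint32_t k,
                 const float *a, const float *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 32);
    for (int i = 0; i < lanes; i++) {
        if ((k >> i) & 1)
            dst[i] = a[i] + b[i];
        else
            dst[i] = src ? src[i] : 0.0f;
    }
}
```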
liuhongt [Tue, 7 Sep 2021 04:39:04 +0000 (12:39 +0800)]
Optimize v4sf reduction.
gcc/ChangeLog:
PR target/101059
* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
(reduc_plus_scal_v4sf): .. this, New define_expand.
(reduc_plus_scal_v2df): .. and this, New define_expand.
gcc/testsuite/ChangeLog:
PR target/101059
* gcc.target/i386/sse2-pr101059.c: New test.
* gcc.target/i386/sse3-pr101059.c: New test.
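For reference, a common SSE1-only way to sum the four lanes of a V4SF folds the high half onto the low half and then folds lane 1 onto lane 0. This is a generic sketch for x86 targets, not necessarily the sequence the new reduc_plus_scal_v4sf expander emits. Note that it computes (a0 + a2) + (a1 + a3), which can round differently from a left-to-right loop; this is why vectorized floating-point reductions normally require reassociation to be permitted (for example with -ffast-math).

```c
#include <assert.h>
#include <xmmintrin.h>

float hsum_v4sf(__m128 v)
{
    __m128 hi = _mm_movehl_ps(v, v);        /* [a2, a3, a2, a3] */
    __m128 sum2 = _mm_add_ps(v, hi);        /* [a0+a2, a1+a3, ...] */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 sum1 = _mm_add_ss(sum2, lane1);  /* lane 0: (a0+a2)+(a1+a3) */
    return _mm_cvtss_f32(sum1);
}
```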
liuhongt [Wed, 8 Sep 2021 08:19:37 +0000 (16:19 +0800)]
Optimize vec_extract for 256/512-bit vector when index exceeds the lower 128 bits.
- vextracti32x8 $0x1, %zmm0, %ymm0
- vmovd %xmm0, %eax
+ valignd $8, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax
- vextracti32x8 $0x1, %zmm0, %ymm0
- vextracti128 $0x1, %ymm0, %xmm0
- vpextrd $3, %xmm0, %eax
+ valignd $15, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax
- vextractf64x2 $0x1, %ymm0, %xmm0
+ valignq $2, %ymm0, %ymm0, %ymm0
- vextractf64x4 $0x1, %zmm0, %ymm0
- vextractf64x2 $0x1, %ymm0, %xmm0
- vunpckhpd %xmm0, %xmm0, %xmm0
+ valignq $7, %zmm0, %zmm0, %zmm0
gcc/ChangeLog:
PR target/91103
* config/i386/sse.md (*vec_extract<mode><ssescalarmodelower>_valign):
New define_insn.
gcc/testsuite/ChangeLog:
PR target/91103
* gcc.target/i386/pr91103-1.c: New test.
* gcc.target/i386/pr91103-2.c: New test.
GCC Administrator [Thu, 9 Sep 2021 00:16:32 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Tue, 31 Aug 2021 08:46:41 +0000 (09:46 +0100)]
c++: Fix docs on assignment of virtual bases [PR60318]
The description of behaviour is incorrect: the virtual base gets
assigned before entering the bodies of A::operator= and B::operator=,
not after.
The example is also ill-formed (passing a string literal to char*) and
undefined (missing return from Base::operator=).
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
gcc/ChangeLog:
PR c++/60318
* doc/trouble.texi (Copy Assignment): Fix description of
behaviour and fix code in example.
David Malcolm [Wed, 8 Sep 2021 18:37:19 +0000 (14:37 -0400)]
analyzer: fix ICE when discarding result of realloc [PR102225]
gcc/analyzer/ChangeLog:
PR analyzer/102225
* analyzer.h (compat_types_p): New decl.
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Guard against NULL
type when checking for pointer types.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Guard against NULL lhs type/region. Guard against the size value
not being of a compatible type for dynamic extents.
* region-model.cc (compat_types_p): Make non-static.
gcc/testsuite/ChangeLog:
PR analyzer/102225
* gcc.dg/analyzer/realloc-1.c (test_10): New.
* gcc.dg/analyzer/torture/pr102225.c: New test.
Richard Biener [Wed, 8 Sep 2021 08:39:27 +0000 (10:39 +0200)]
c++/102228 - make lookup_anon_field O(1)
For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309. The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested. There, for each such access, we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them. That's inefficient since, as I believe,
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.
The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate type's
lang-specific data, re-using the RTTI typeinfo_var field. That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.
I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much). It also leaves
a hole for the case when the C++ FE itself duplicates such a type
and places it at a semantically different position. I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.
2021-09-08 Richard Biener <rguenther@suse.de>
PR c++/102228
gcc/cp/
* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
place on invalid code.
* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
access.
* module.cc (trees_in::read_class_def): Likewise. Reconstruct
ANON_AGGR_TYPE_FIELD.
* semantics.c (finish_member_declaration): Populate
ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
* typeck.c (lookup_anon_field): Remove DFS search and return
ANON_AGGR_TYPE_FIELD directly.
Joseph Myers [Wed, 8 Sep 2021 15:38:18 +0000 (15:38 +0000)]
testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c
When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
Jakub's commit
0b34dbc0a24864b1674bff7a92fa3cf0f1cbcea1 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux. I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).
Tested with no regressions with cross to nios2-elf.
* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.
Joseph Myers [Wed, 8 Sep 2021 14:57:20 +0000 (14:57 +0000)]
testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details
When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.
cselim is enabled conditionally by code in toplev.c:
  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
        flag_tree_cselim = 1;
      else
        flag_tree_cselim = 0;
    }
Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.
Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.
* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.
Segher Boessenkool [Wed, 8 Sep 2021 13:10:30 +0000 (13:10 +0000)]
rs6000: Fix ELFv2 r12 use in epilogue
We cannot use r12 here: it is already in use as the GEP (for sibling
calls).
2021-09-08 Segher Boessenkool <segher@kernel.crashing.org>
PR target/102107
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
r11 instead of r12 for restoring CR.
Jakub Jelinek [Wed, 8 Sep 2021 12:06:10 +0000 (14:06 +0200)]
i386: Fix up xorsign for AVX [PR89984]
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally. Because for AVX the new code
destructively modifies op1. If that is different from dest, say on:
  float
  foo (float x, float y)
  {
    return x * __builtin_copysignf (1.0f, y) + y;
  }
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.
Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).
2021-09-08 Jakub Jelinek <jakub@redhat.com>
liuhongt <hongtao.liu@intel.com>
PR target/89984
* config/i386/i386.md (@xorsign<mode>3_1): Remove.
* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
into AND with mask and XOR, using paradoxical subregs.
(ix86_split_xorsign): Remove.
* config/i386/i386-protos.h (ix86_split_xorsign): Remove.
* gcc.target/i386/avx-pr102224.c: Fix up PR number.
* gcc.dg/pr89984.c: New test.
* gcc.target/i386/avx-pr89984.c: New test.
liuhongt [Wed, 8 Sep 2021 01:49:54 +0000 (09:49 +0800)]
Compile __{mul,div}hc3 into libgcc_s.so.1.
libgcc/ChangeLog:
* config/i386/t-softfp: Compile __{mul,div}hc3 into
libgcc_s.so.1.
Di Zhao [Wed, 8 Sep 2021 07:34:27 +0000 (15:34 +0800)]
tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into
If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one. That can cause extra
vn_pvals to be inserted.
gcc/ChangeLog:
* tree-ssa-sccvn.c (vn_nary_op_insert_into): Fix result compare.
Jakub Jelinek [Wed, 8 Sep 2021 09:34:45 +0000 (11:34 +0200)]
libgcc, i386: Export *hf* and *hc* from libgcc_s.so.1
The following patch exports them for Linux from config/i386/*.ver, where
they IMNSHO belong; aarch64 already exports some of those at GCC_11*, and
other targets might add them at completely different gcc versions.
2021-09-08 Jakub Jelinek <jakub@redhat.com>
Iain Sandoe <iain@sandoe.co.uk>
* config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
and export *hf* and *hc* functions at GCC_12.0.0.
Jakub Jelinek [Wed, 8 Sep 2021 09:25:31 +0000 (11:25 +0200)]
i386: Fix up @xorsign<mode>3_1 [PR102224]
As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
with op1 & mask before using op0.
For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
The expander change is an optimization: if we know at expansion time
that it is xorsign op0, op0, we can emit abs right away and get better
code through that.
The @xorsign<mode>3_1 change fixes the case where xorsign isn't known
to have the same operands during expansion, but they become the same
during RTL optimizations. For non-AVX we need to use an earlyclobber:
we require dest and op1 to be the same, but op0 must be different
because we overwrite op1 first. For AVX the constraints ensure that at
most 2 of the 3 operands may be the same register, and the splitter
handles the case where both inputs are the same.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally. Because for AVX the new code
destructively modifies op1. If that is different from dest, say on:
  float
  foo (float x, float y)
  {
    return x * __builtin_copysignf (1.0f, y) + y;
  }
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
2021-09-08 Jakub Jelinek <jakub@redhat.com>
PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
first alternative even for avx, add another alternative with
=&Yv <- 0, Yv, Yvm constraints.
* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
to op1, emit vpandn instead.
* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.
liuhongt [Thu, 5 Mar 2020 01:57:25 +0000 (09:57 +0800)]
AVX512FP16: Add abi test for zmm
gcc/testsuite/ChangeLog:
* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
New file.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
Likewise.
liuhongt [Thu, 5 Mar 2020 01:57:10 +0000 (09:57 +0800)]
AVX512FP16: Add ABI test for ymm.
gcc/testsuite/ChangeLog:
* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
New exp file.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.
H.J. Lu [Fri, 28 Dec 2018 15:46:19 +0000 (07:46 -0800)]
AVX512FP16: Add ABI tests for xmm.
Copied from regular XMM ABI tests. Only run AVX512FP16 ABI tests for ELF
targets.
gcc/testsuite/ChangeLog:
* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
file for abi test.
* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.
H.J. Lu [Wed, 22 May 2019 15:23:52 +0000 (08:23 -0700)]
AVX512FP16: Add tests for vector passing in variable arguments.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vararg-1.c: New test.
* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
liuhongt [Mon, 2 Mar 2020 10:07:36 +0000 (18:07 +0800)]
AVX512FP16: Add testcase for vector init and broadcast intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
* gcc.target/i386/avx512fp16-10a.c: New test.
* gcc.target/i386/avx512fp16-10b.c: Ditto.
* gcc.target/i386/avx512fp16-1a.c: Ditto.
* gcc.target/i386/avx512fp16-1b.c: Ditto.
* gcc.target/i386/avx512fp16-1c.c: Ditto.
* gcc.target/i386/avx512fp16-1d.c: Ditto.
* gcc.target/i386/avx512fp16-1e.c: Ditto.
* gcc.target/i386/avx512fp16-2a.c: Ditto.
* gcc.target/i386/avx512fp16-2b.c: Ditto.
* gcc.target/i386/avx512fp16-2c.c: Ditto.
* gcc.target/i386/avx512fp16-3a.c: Ditto.
* gcc.target/i386/avx512fp16-3b.c: Ditto.
* gcc.target/i386/avx512fp16-3c.c: Ditto.
* gcc.target/i386/avx512fp16-4.c: Ditto.
* gcc.target/i386/avx512fp16-5.c: Ditto.
* gcc.target/i386/avx512fp16-6.c: Ditto.
* gcc.target/i386/avx512fp16-7.c: Ditto.
* gcc.target/i386/avx512fp16-8.c: Ditto.
* gcc.target/i386/avx512fp16-9a.c: Ditto.
* gcc.target/i386/avx512fp16-9b.c: Ditto.
* gcc.target/i386/pr54855-13.c: Ditto.
* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.
liuhongt [Fri, 9 Jul 2021 03:24:45 +0000 (11:24 +0800)]
AVX512FP16: Support vector init/broadcast/set/extract for FP16.
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
(_mm256_set1_ph): Likewise.
(_mm512_set1_ph): Likewise.
(_mm_setzero_ph): Likewise.
(_mm256_setzero_ph): Likewise.
(_mm512_setzero_ph): Likewise.
(_mm_set_sh): Likewise.
(_mm_load_sh): Likewise.
(_mm_store_sh): Likewise.
* config/i386/i386-builtin-types.def (V8HF): New type.
(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type.
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Support vector HFmodes.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_interleave): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
(ix86_expand_vector_init_concat): Likewise.
(ix86_expand_sse_movcc): Handle vector HFmodes.
(ix86_expand_vector_set_var): Ditto.
* config/i386/i386-modes.def: Add HF vector modes in comment.
* config/i386/i386.c (classify_argument): Add HF vector modes.
(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
(ix86_vector_mode_supported_p): Likewise.
(ix86_set_reg_reg_cost): Handle vector HFmode.
(ix86_get_ssemov): Handle vector HFmode.
(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
by stack.
(function_arg_advance_32): Pass V8HF/V16HF/V32HF by sse reg for 32bit
mode.
(function_arg_advance_32): Ditto.
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
(VALID_SSE2_REG_VHF_MODE): New.
(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
(SSE_REG_MODE_P): Add vector HFmode.
* config/i386/i386.md (mode): Add HF vector modes.
(MODE_SIZE): Likewise.
(ssemodesuffix): Add ph suffix for HF vector modes.
* config/i386/sse.md (VFH_128): New mode iterator.
(VMOVE): Adjust for HF vector modes.
(V): Likewise.
(V_256_512): Likewise.
(avx512): Likewise.
(avx512fmaskmode): Likewise.
(shuffletype): Likewise.
(sseinsnmode): Likewise.
(ssedoublevecmode): Likewise.
(ssehalfvecmode): Likewise.
(ssehalfvecmodelower): Likewise.
(ssePScmode): Likewise.
(ssescalarmode): Likewise.
(ssescalarmodelower): Likewise.
(sseintprefix): Likewise.
(i128): Likewise.
(bcstscalarsuff): Likewise.
(xtg_mode): Likewise.
(VI12HF_AVX512VL): New mode_iterator.
(VF_AVX512FP16): Likewise.
(VIHF): Likewise.
(VIHF_256): Likewise.
(VIHF_AVX512BW): Likewise.
(V16_256): Likewise.
(V32_512): Likewise.
(sseintmodesuffix): New mode_attr.
(sse): Add scalar and vector HFmodes.
(ssescalarmode): Add vector HFmode mapping.
(ssescalarmodesuffix): Add sh suffix for HFmode.
(*<sse>_vm<insn><mode>3): Use VFH_128.
(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
(*ieee_<ieee_maxmin><mode>3): Likewise.
(<avx512>_blendm<mode>): New define_insn.
(vec_setv8hf): New define_expand.
(vec_set<mode>_0): New define_insn for HF vector set.
(*avx512fp16_movsh): Likewise.
(avx512fp16_movsh): Likewise.
(vec_extract_lo_v32hi): Rename to ...
(vec_extract_lo_<mode>): ... this, and adjust to allow HF
vector modes.
(vec_extract_hi_v32hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_extract_lo_v16hi): Likewise.
(vec_extract_lo_<mode>): Likewise.
(vec_extract_hi_v16hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_set_hi_v16hi): Likewise.
(vec_set_hi_<mode>): Likewise.
(vec_set_lo_v16hi): Likewise.
(vec_set_lo_<mode>): Likewise.
(*vec_extract<mode>_0): New define_insn_and_split for HF
vector extract.
(*vec_extracthf): New define_insn.
(VEC_EXTRACT_MODE): Add HF vector modes.
(PINSR_MODE): Add V8HF.
(sse2p4_1): Likewise.
(pinsr_evex_isa): Likewise.
(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
insert for V8HFmode.
(pbroadcast_evex_isa): Add HF vector modes.
(AVX2_VEC_DUP_MODE): Likewise.
(VEC_INIT_MODE): Likewise.
(VEC_INIT_HALF_MODE): Likewise.
(avx2_pbroadcast<mode>): Adjust to support HF vector mode
broadcast.
(avx2_pbroadcast<mode>_1): Likewise.
(<avx512>_vec_dup<mode>_1): Likewise.
(<avx512>_vec_dup<mode><mask_name>): Likewise.
(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
Likewise.
Guo, Xuepeng [Tue, 25 Dec 2018 03:39:26 +0000 (19:39 -0800)]
AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP16_SET,
OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
(OPTION_MASK_ISA2_AVX512BW_UNSET,
OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
(ix86_handle_option): Handle -mavx512fp16.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX512FP16.
* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
* config.gcc: Add avx512fp16intrin.h.
* config/i386/avx512fp16intrin.h: New intrinsic header.
* config/i386/cpuid.h: Add bit_AVX512FP16.
* config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
* config/i386/i386-builtins.c: Support _Float16 type for i386
backend.
(ix86_register_float16_builtin_type): New function.
(ix86_float16_type_node): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512FP16__.
* config/i386/i386-expand.c (ix86_expand_branch): Support
HFmode.
(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_expand_fp_movcc): Ditto.
* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
* config/i386/i386.c (ix86_get_ssemov): Use
vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
(ix86_get_excess_precision): Use
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
is enabled.
(sse_store_index): Use SFmode cost for HFmode cost.
(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
GPR cost for HFmode.
(ix86_hard_regno_mode_ok): Allow HImode in sse register.
(ix86_mangle_type): Add mangling for _Float16 type.
(inline_secondary_memory_needed): No memory is needed for
16bit movement between gpr and sse reg under
TARGET_AVX512FP16.
(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_add_stmt_cost): Ditto.
(ix86_optab_supported_p): Ditto.
* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
* config/i386/i386.md (mode): Add HFmode.
(MODE_SIZE): Add HFmode.
(isa): Add avx512fp16.
(enabled): Handle avx512fp16.
(ssemodesuffix): Add sh suffix for HFmode.
(comm): Add mult, div.
(plusminusmultdiv): New code iterator.
(insn): Add mult, div.
(*movhf_internal): Adjust for avx512fp16 instruction.
(*movhi_internal): Ditto.
(*cmpi<unord>hf): New define_insn for HFmode.
(*ieee_s<ieee_maxmin>hf3): Likewise.
(extendhf<mode>2): Likewise.
(trunc<mode>hf2): Likewise.
(float<floatunssuffix><mode>hf2): Likewise.
(*<insn>hf): Likewise.
(cbranchhf4): New expander.
(movhfcc): Likewise.
(<insn>hf3): Likewise.
(mulhf3): Likewise.
(divhf3): Likewise.
* config/i386/i386.opt: Add mavx512fp16.
* config/i386/immintrin.h: Include avx512fp16intrin.h.
* doc/invoke.texi: Add mavx512fp16.
* doc/extend.texi: Add avx512fp16 Usage Notes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
* gcc.target/i386/sse-13.c: Add -mavx512fp16.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: (check_effective_target_avx512fp16): New.
* g++.target/i386/float16-1.C: New test.
* g++.target/i386/float16-2.C: Ditto.
* g++.target/i386/float16-3.C: Ditto.
* gcc.target/i386/avx512fp16-12a.c: Ditto.
* gcc.target/i386/avx512fp16-12b.c: Ditto.
* gcc.target/i386/float16-3a.c: Ditto.
* gcc.target/i386/float16-3b.c: Ditto.
* gcc.target/i386/float16-4a.c: Ditto.
* gcc.target/i386/float16-4b.c: Ditto.
* gcc.target/i386/pr54855-12.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>
liuhongt [Mon, 2 Aug 2021 02:56:45 +0000 (10:56 +0800)]
Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
gcc/ada/ChangeLog:
* gcc-interface/misc.c (gnat_post_options): Issue an error for
-fexcess-precision=16.
gcc/c-family/ChangeLog:
* c-common.c (excess_precision_mode_join): Update below comments.
(c_ts18661_flt_eval_method): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
gcc/ChangeLog:
* common.opt: Support -fexcess-precision=16.
* config/aarch64/aarch64.c (aarch64_excess_precision): Return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
EXCESS_PRECISION_TYPE_FLOAT16.
* config/arm/arm.c (arm_excess_precision): Ditto.
* config/i386/i386.c (ix86_get_excess_precision): Ditto.
* config/m68k/m68k.c (m68k_excess_precision): Issue an error
when EXCESS_PRECISION_TYPE_FLOAT16.
* config/s390/s390.c (s390_excess_precision): Ditto.
* coretypes.h (enum excess_precision_type): Add
EXCESS_PRECISION_TYPE_FLOAT16.
* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documentation.
* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
* doc/extend.texi (Half-Precision): Document
-fexcess-precision=16.
* flag-types.h (enum excess_precision): Add
EXCESS_PRECISION_FLOAT16.
* target.def (excess_precision): Update documentation.
* tree.c (excess_precision_type): Set excess_precision_type to
EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.
gcc/fortran/ChangeLog:
* options.c (gfc_post_options): Issue an error for
-fexcess-precision=16.
gcc/testsuite/ChangeLog:
* gcc.target/i386/float16-6.c: New test.
* gcc.target/i386/float16-7.c: New test.
liuhongt [Mon, 6 Sep 2021 01:54:42 +0000 (09:54 +0800)]
Adjust the wording for x86 _Float16 type.
gcc/ChangeLog:
* doc/extend.texi: (@node Floating Types): Adjust the wording.
(@node Half-Precision): Ditto.
GCC Administrator [Wed, 8 Sep 2021 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Max Filippov [Tue, 7 Sep 2021 22:40:00 +0000 (15:40 -0700)]
gcc: xtensa: fix PR target/102115
2021-09-07 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
PR target/102115
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
'CONST_INT_P (src)' to the condition of the block that tries to
eliminate a literal when loading an integer constant.
Ian Lance Taylor [Tue, 7 Sep 2021 21:37:55 +0000 (14:37 -0700)]
runtime: use hash32, not hash64, for amd64p32, mips64p32, mips64p32le
Fixes PR go/102102
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/348015
David Faust [Tue, 3 Aug 2021 17:33:03 +0000 (10:33 -0700)]
doc: BPF CO-RE documentation
Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.
gcc/ChangeLog:
* doc/extend.texi (BPF Type Attributes): New node.
Document new preserve_access_index attribute.
Document new preserve_access_index builtin.
* doc/invoke.texi: Document -mco-re and -mno-co-re options.
David Faust [Tue, 3 Aug 2021 17:28:53 +0000 (10:28 -0700)]
bpf testsuite: Add BPF CO-RE tests
This commit adds several tests for the new BPF CO-RE functionality to
the BPF target testsuite.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-1.c: New test.
* gcc.target/bpf/core-attr-2.c: Likewise.
* gcc.target/bpf/core-attr-3.c: Likewise.
* gcc.target/bpf/core-attr-4.c: Likewise.
* gcc.target/bpf/core-builtin-1.c: Likewise.
* gcc.target/bpf/core-builtin-2.c: Likewise.
* gcc.target/bpf/core-builtin-3.c: Likewise.
* gcc.target/bpf/core-section-1.c: Likewise.
David Faust [Tue, 3 Aug 2021 17:27:44 +0000 (10:27 -0700)]
bpf: BPF CO-RE support
This commit introduces support for BPF Compile Once - Run
Everywhere (CO-RE) in GCC.
gcc/ChangeLog:
* config/bpf/bpf.c: Adjust includes.
(bpf_handle_preserve_access_index_attribute): New function.
(bpf_attribute_table): Use it here.
(bpf_builtins): Add BPF_BUILTIN_PRESERVE_ACCESS_INDEX.
(bpf_option_override): Handle "-mco-re" option.
(bpf_asm_init_sections): New.
(TARGET_ASM_INIT_SECTIONS): Redefine.
(bpf_file_end): New.
(TARGET_ASM_FILE_END): Redefine.
(bpf_init_builtins): Add "__builtin_preserve_access_index".
(bpf_core_compute, bpf_core_get_index): New.
(is_attr_preserve_access): New.
(bpf_expand_builtin): Handle new builtins.
(bpf_core_newdecl, bpf_core_is_maybe_aggregate_access): New.
(bpf_core_walk): New.
(bpf_resolve_overloaded_builtin): New.
(TARGET_RESOLVE_OVERLOADED_BUILTIN): Redefine.
(handle_attr): New.
(pass_bpf_core_attr): New RTL pass.
* config/bpf/bpf-passes.def: New file.
* config/bpf/bpf-protos.h (make_pass_bpf_core_attr): New.
* config/bpf/coreout.c: New file.
* config/bpf/coreout.h: Likewise.
* config/bpf/t-bpf (TM_H): Add $(srcdir)/config/bpf/coreout.h.
(coreout.o): New rule.
(PASSES_EXTRA): Add $(srcdir)/config/bpf/bpf-passes.def.
* config.gcc (bpf): Add coreout.h to extra_headers.
Add coreout.o to extra_objs.
Add $(srcdir)/config/bpf/coreout.c to target_gtfiles.
David Faust [Tue, 3 Aug 2021 17:04:10 +0000 (10:04 -0700)]
btf: expose get_btf_id
Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
look up BTF type IDs, in order to create CO-RE relocation records.
A prototype is added in ctfc.h.
gcc/ChangeLog:
* btfout.c (get_btf_id): Function is no longer static.
* ctfc.h: Expose it here.
David Faust [Tue, 3 Aug 2021 17:01:31 +0000 (10:01 -0700)]
ctfc: add function to lookup CTF ID of a TREE type
Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its TREE node. The function is exposed via
a prototype in ctfc.h.
gcc/ChangeLog:
* ctfc.c (ctf_lookup_tree_type): New function.
* ctfc.h: Likewise.
David Faust [Tue, 3 Aug 2021 17:00:42 +0000 (10:00 -0700)]
ctfc: externalize ctf_dtd_lookup
Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.
gcc/ChangeLog:
* ctfc.c (ctf_dtd_lookup): Function is no longer static.
* ctfc.h: Analogous change.
David Faust [Tue, 3 Aug 2021 16:58:48 +0000 (09:58 -0700)]
dwarf: externalize lookup_type_die
Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.
gcc/ChangeLog:
* dwarf2out.c (lookup_type_die): Function is no longer static.
* dwarf2out.h: Expose it here.
Hans-Peter Nilsson [Tue, 7 Sep 2021 20:08:49 +0000 (22:08 +0200)]
Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c
Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions. At the very least, always grep for ^ERROR: in
the stdout log!
With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):
...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {\(\)"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
while executing
"::tcl_unknown scan-tree-dump-not\" = foo {\(\)"} optimized"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"
=== gcc Summary ===
# of expected passes 63740
# of unexpected failures 38
# of unexpected successes 2
# of expected failures 351
# of unresolved testcases 3
# of unsupported tests 662
x/cris-elf/gccobj/gcc/xgcc version 12.0.0 20210907 (experimental) [master r12-3391-g849d5f5929fc] (GCC)
testsuite:
* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
typo in last change.
Harald Anlauf [Tue, 7 Sep 2021 18:51:49 +0000 (20:51 +0200)]
Fortran - improve error recovery determining array element from constructor
gcc/fortran/ChangeLog:
PR fortran/101327
* expr.c (find_array_element): When bounds cannot be determined as
constant, return error instead of aborting.
gcc/testsuite/ChangeLog:
PR fortran/101327
* gfortran.dg/pr101327.f90: New test.
Indu Bhagat [Tue, 7 Sep 2021 18:18:54 +0000 (11:18 -0700)]
dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase
DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation, especially if late debug info
generation is desirable, as is the case for BPF CO-RE.
The approach taken here in this patch is:
1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise a way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.
2. The BPF backend updates write_symbols with BTF_WITH_CORE_DEBUG to convey
the case that BTF with CO-RE support needs to be generated. This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.
So, in other words,
dwarf2out_early_finish
- Always emit CTF here.
- if (BTF && !BTF_WITH_CORE), emit BTF now.
dwarf2out_finish
- if (BTF_WITH_CORE) emit BTF now.
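The early/late split above can be sketched in C as a pair of phase functions that only record what they would emit. The FMT_* bits and the early_finish/late_finish/record helpers are hypothetical stand-ins, not GCC's write_symbols values or the ctf_debug_* routines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical debug-format bits, standing in for the BTF and
   BTF_WITH_CORE_DEBUG settings of write_symbols described above.  */
enum { FMT_BTF = 1u << 0, FMT_BTF_WITH_CORE = 1u << 1 };

/* What each phase emitted, as a space-separated list.  */
static char emitted_early[32];
static char emitted_late[32];

static void
record (char *log, const char *what)
{
  if (log[0] != '\0')
    strcat (log, " ");
  strcat (log, what);
}

/* Early phase (cf. dwarf2out_early_finish): CTF always; BTF only when
   CO-RE relocations are not needed.  */
static void
early_finish (unsigned fmt)
{
  record (emitted_early, "CTF");
  if ((fmt & FMT_BTF) && !(fmt & FMT_BTF_WITH_CORE))
    record (emitted_early, "BTF");
}

/* Late phase (cf. dwarf2out_finish): BTF with CO-RE is deferred to here,
   so it can describe the code as it is after optimization.  */
static void
late_finish (unsigned fmt)
{
  if (fmt & FMT_BTF_WITH_CORE)
    record (emitted_late, "BTF+CO-RE");
}
```

With plain BTF, both CTF and BTF come out of the early phase; with BTF_WITH_CORE, only CTF is emitted early and BTF waits for the late phase.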
gcc/ChangeLog:
* dwarf2ctf.c (ctf_debug_finalize): Make it static.
(ctf_debug_early_finish): New definition.
(ctf_debug_finish): Likewise.
* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
(ctf_debug_early_finish): New declaration.
(ctf_debug_finish): Likewise.
* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
(dwarf2out_early_finish): Invoke ctf_debug_early_finish.
Indu Bhagat [Tue, 7 Sep 2021 18:17:55 +0000 (11:17 -0700)]
bpf: Add new -mco-re option for BPF CO-RE
-mco-re in the BPF backend enables code generation for the CO-RE use case. LTO is
disabled for CO-RE compilations.
gcc/ChangeLog:
* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mco-re.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-lto-1.c: New test.
Indu Bhagat [Tue, 7 Sep 2021 18:16:53 +0000 (11:16 -0700)]
debug: Add BTF_WITH_CORE_DEBUG debug format
To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added. This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.
gcc/ChangeLog:
* flag-types.h (enum debug_info_type): Add new enum
DINFO_TYPE_BTF_WITH_CORE.
(BTF_WITH_CORE_DEBUG): New bitmask.
* flags.h (btf_with_core_debuginfo_p): New declaration.
* opts.c (btf_with_core_debuginfo_p): New definition.
Jason Merrill [Tue, 31 Aug 2021 16:54:37 +0000 (12:54 -0400)]
tree: Change error_operand_p to an inline function
I've thought for a while that many of the macros in tree.h and such should
become inline functions. This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.
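The difference between the two forms can be sketched with a simplified stand-in for the tree type. The struct, error_mark_node and TREE_TYPE below are hypothetical minimal versions, not GCC's definitions, and the macro body is assumed to have the shape of the old tree.h macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for GCC's tree node.  */
struct tree_node { struct tree_node *type; };
typedef struct tree_node *tree;
typedef const struct tree_node *const_tree;

static struct tree_node error_mark_node_storage;
#define error_mark_node (&error_mark_node_storage)
#define TREE_TYPE(NODE) ((NODE)->type)

/* Old style: a macro.  NODE is expanded (and evaluated) up to three
   times, and the "(NODE) &&" null check is pasted into every caller.  */
#define error_operand_p_macro(NODE)                                   \
  ((NODE) == error_mark_node                                          \
   || ((NODE) && TREE_TYPE ((NODE)) == error_mark_node))

/* New style: an inline function.  The argument is evaluated once, and
   the null check stays inside this function's body.  */
static inline bool
error_operand_p (const_tree t)
{
  return t == error_mark_node || (t && TREE_TYPE (t) == error_mark_node);
}
```

Besides keeping the null check out of the caller, the function form evaluates its argument once: error_operand_p_macro (next_node ()) could call a hypothetical next_node up to three times.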
gcc/ChangeLog:
* tree.h (error_operand_p): Change to inline function.
Jakub Jelinek [Tue, 7 Sep 2021 17:33:28 +0000 (19:33 +0200)]
c++: Fix up constexpr evaluation of deleting dtors [PR100495]
We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors, because complete vs. base
ctors differ only in classes that have virtual bases, and such constructors
aren't constexpr; similarly for complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine:
deleting dtors, while marked as clones, are in fact just artificial functions
with a synthesized body which calls the user destructor and the deallocation.
So either we'd need to evaluate the destructor and afterwards synthesize
and evaluate the deallocation, or we can just save and use the deleting
dtors' bodies. The latter seems much easier to me.
2021-09-07 Jakub Jelinek <jakub@redhat.com>
PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.
* g++.dg/cpp2a/constexpr-new21.C: New test.
Tobias Burnus [Tue, 7 Sep 2021 16:29:46 +0000 (18:29 +0200)]
libgomp.texi: Extend OpenMP 5.0 Implementation Status
libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
OpenMP 5.0 section.
(OpenACC Profiling Interface): Fix typo.
Aldy Hernandez [Tue, 7 Sep 2021 13:20:23 +0000 (15:20 +0200)]
Rename forwarder_block_p in threading code to empty_block_with_phis_p.
gcc/ChangeLog:
* tree-ssa-threadedge.c (forwarder_block_p): Rename to...
(empty_block_with_phis_p): ...this.
(potentially_threadable_block): Same.
(jump_threader::thread_through_normal_block): Same.
Tobias Burnus [Tue, 7 Sep 2021 15:46:05 +0000 (17:46 +0200)]
libgfortran: Makefile fix for ISO_Fortran_binding.h
libgfortran/ChangeLog:
* Makefile.am (gfor_built_src): Depend on
include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
(ISO_Fortran_binding.h): Rename make target to ...
(include/ISO_Fortran_binding.h): ... this.
* Makefile.in: Regenerate.
Eric Botcazou [Tue, 7 Sep 2021 13:41:49 +0000 (15:41 +0200)]
Fix PR debug/101947
This is the recent LTO bootstrap failure with Ada enabled. The compiler now
generates DW_OP_deref_type for a unit of the Ada front-end, which means that
the offset of base types in the CU must be computed during early DWARF too.
gcc/
PR debug/101947
* dwarf2out.c (mark_base_types): New overloaded function.
(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
as the compilation unit, and call move_marked_base_types afterward.
H.J. Lu [Sat, 4 Sep 2021 14:48:43 +0000 (07:48 -0700)]
x86: Enable FMA in unsigned SI to SF expanders
Enable FMA in scalar/vector unsigned SI to SF expanders. Don't check
TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.
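To make the expander change concrete, here is a plain-C sketch of one common way to convert an unsigned 32-bit integer to float when only signed conversions are available. The value is split into 16-bit halves, each half is converted exactly, and the halves are recombined with a multiply-add, which is the operation FMA fuses into one instruction. This assumes the 16-bit split; it is not the exact sequence ix86_expand_convert_uns_sisf_sse emits, and uns_si_to_sf is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an unsigned 32-bit integer to float using only signed
   conversions.  Both 16-bit halves convert to float exactly, and
   hi * 65536.0f is also exact, so the only rounding happens in the
   final addition.  That multiply-add is the step an FMA instruction
   can perform in one go.  */
static float
uns_si_to_sf (uint32_t u)
{
  float hi = (float) (int32_t) (u >> 16);     /* 0 .. 65535, exact */
  float lo = (float) (int32_t) (u & 0xffffu); /* 0 .. 65535, exact */
  return hi * 65536.0f + lo;                  /* single rounding step */
}
```

Because hi * 65536.0f is exact, fusing the multiply and the add does not change the result; it only saves an instruction.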
gcc/
PR target/85819
* config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
Enable FMA.
(ix86_expand_vector_convert_uns_vsivsf): Likewise.
gcc/testsuite/
PR target/85819
* gcc.target/i386/pr85819-1a.c: New test.
* gcc.target/i386/pr85819-1b.c: Likewise.
* gcc.target/i386/pr85819-2a.c: Likewise.
* gcc.target/i386/pr85819-2b.c: Likewise.
* gcc.target/i386/pr85819-2c.c: Likewise.
* gcc.target/i386/pr85819-3.c: Likewise.
Richard Biener [Tue, 7 Sep 2021 09:46:00 +0000 (11:46 +0200)]
tree-optimization/102226 - fix epilogue vector re-use
This fixes re-use of the reduction value in epilogue vectorization
when a conversion from/to variable-length vectors is required.
2021-09-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/102226
* tree-vect-loop.c (vect_transform_cycle_phi): Record
the converted value for the epilogue PHI use.
* g++.dg/vect/pr102226.cc: New testcase.
Marcel Vollweiler [Tue, 7 Sep 2021 10:46:28 +0000 (03:46 -0700)]
C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct.
This patch adds support for the 'seq_cst' memory order clause on the 'flush'
directive which was introduced in OpenMP 5.1.
gcc/c-family/ChangeLog:
* c-omp.c (c_finish_omp_flush): Handle MEMMODEL_SEQ_CST.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* semantics.c (finish_omp_flush): Handle MEMMODEL_SEQ_CST.
gcc/fortran/ChangeLog:
* openmp.c (gfc_match_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* trans-openmp.c (gfc_trans_omp_flush): Handle OMP_MEMORDER_SEQ_CST.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
* c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
* g++.dg/gomp/attrs-1.C: Adapt test to handle all flush clauses.
* g++.dg/gomp/attrs-2.C: Adapt test to handle all flush clauses.
* gfortran.dg/gomp/flush-1.f90: Add test case for 'seq_cst'.
* gfortran.dg/gomp/flush-2.f90: Add test case for 'seq_cst'.
Martin Liska [Tue, 22 Jun 2021 08:09:01 +0000 (10:09 +0200)]
inline: do not einline when no_profile_instrument_function is different
PR gcov-profile/80223
gcc/ChangeLog:
* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
options, do not inline when no_profile_instrument_function
attributes are different in early inliner. It's fine to inline
it after PGO instrumentation.
gcc/testsuite/ChangeLog:
* gcc.dg/no_profile_instrument_function-attr-2.c: New test.
Richard Biener [Tue, 7 Sep 2021 08:35:42 +0000 (10:35 +0200)]
tree-optimization/101555 - avoid redundant alias queries in PRE
This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the block's
virtual PHI node. All the invalidation work is already done by
prune_clobbered_mems.
This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.
2021-09-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/101555
* tree-ssa-pre.c (translate_vuse_through_block): Do not
perform an alias walk to determine the validity of the
mem at the start of the block which is already guaranteed
by means of prune_clobbered_mems.
(phi_translate_1): Pass edge to translate_vuse_through_block.
Tobias Burnus [Tue, 7 Sep 2021 09:01:38 +0000 (11:01 +0200)]
libgomp.texi: Add OpenMP Implementation Status
libgomp/
* libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
not to 4.5; link to new section.
(OpenMP Implementation Status): New.
Sandra Loosemore [Tue, 7 Sep 2021 04:25:11 +0000 (21:25 -0700)]
Fortran: Revert to non-multilib-specific ISO_Fortran_binding.h
Commit
fef67987cf502fe322e92ddce22eea7ac46b4d75 changed the
libgfortran build process to generate multilib-specific versions of
ISO_Fortran_binding.h from a template, by running gfortran to identify
the values of the Fortran kind constants C_LONG_DOUBLE, C_FLOAT128,
and C_INT128_T. This caused multiple problems with search paths, both
for build-tree testing and installed-tree use, not all of which have
been fixed.
This patch reverts to a non-multilib-specific .h file that uses GCC's
predefined preprocessor symbols to detect the supported types and map
them to kind values in the same way as the Fortran front end.
2021-09-06 Sandra Loosemore <sandra@codesourcery.com>
libgfortran/
* ISO_Fortran_binding-1-tmpl.h: Deleted.
* ISO_Fortran_binding-2-tmpl.h: Deleted.
* ISO_Fortran_binding-3-tmpl.h: Deleted.
* ISO_Fortran_binding.h: New file to replace the above.
* Makefile.am (gfor_cdir): Remove MULTISUBDIR.
(ISO_Fortran_binding.h): Simplify to just copy the file.
* Makefile.in: Regenerated.
* mk-kinds-h.sh: Revert pieces no longer needed for
ISO_Fortran_binding.h.
Xionghu Luo [Tue, 7 Sep 2021 01:22:50 +0000 (20:22 -0500)]
rs6000: Expand fmod and remainder when built with fast-math [PR97142]
When building with -ffast-math, fmod/fmodf and remainder/remainderf can be
expanded inline instead of calling the library, which is much faster.
fmodf:
fdivs f0,f1,f2
friz f0,f0
fnmsubs f1,f2,f0,f1
remainderf:
fdivs f0,f1,f2
frin f0,f0
fnmsubs f1,f2,f0,f1
SPEC2017 Ofast P8LE: 511.povray_r +1.14%, 526.blender_r +1.72%
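The fmodf sequence above can be mirrored in C as follows. fast_fmodf is a hypothetical name, and the cast through long long stands in for friz under the assumption that the quotient fits in that type; this is a sketch of the expansion, not a general fmodf.

```c
#include <assert.h>

/* Sketch of the fast-math fmodf expansion:
     fdivs   q = x / y
     friz    t = trunc (q)      (round toward zero)
     fnmsubs r = x - y * t
   The cast through long long stands in for friz and assumes the
   quotient fits in that type.  */
static float
fast_fmodf (float x, float y)
{
  float q = x / y;
  float t = (float) (long long) q;  /* truncate toward zero, like friz */
  return x - y * t;                 /* like fnmsubs: x - y * t */
}
```

remainderf differs only in the rounding step: frin rounds to nearest instead of toward zero. Because x / y is itself rounded, the result can differ from an exact fmodf for some inputs, which is presumably why the expansion is limited to fast-math.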
gcc/ChangeLog:
2021-09-07 Xionghu Luo <luoxhu@linux.ibm.com>
PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.
gcc/testsuite/ChangeLog:
2021-09-07 Xionghu Luo <luoxhu@linux.ibm.com>
PR target/97142
* gcc.target/powerpc/pr97142.c: New test.
YunQiang Su [Fri, 3 Sep 2021 07:32:26 +0000 (03:32 -0400)]
MIPS: add .module arch and ase to all output asm
Currently, the asm output file for MIPS has no ISA revision information.
This can cause trouble. For example, the assembler is mips1 by default,
while gcc is fpxx by default, so to assemble the output of gcc -S we
have to pass -mips2 to the assembler.
The same situation occurs for CPUs that have extension instructions,
Octeon being an example. In that case we can just add ".set arch=octeon".
If an ASE is enabled, .module ase will also be used.
gcc/ChangeLog:
* config/mips/mips.c (mips_file_start): add .module for
arch and ase.
GCC Administrator [Tue, 7 Sep 2021 00:16:34 +0000 (00:16 +0000)]
Daily bump.
Roger Sayle [Mon, 6 Sep 2021 21:48:53 +0000 (22:48 +0100)]
Correct implementation of wi::clz
As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT. The fix is
simply to reorder/shuffle the existing tests.
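The intended semantics can be sketched for precisions up to 64 bits: test the sign bit first, so any negative value gives 0 whatever the precision, and only then count leading zeros within the precision. clz_prec is a hypothetical stand-alone helper, not code from wide-int.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of VAL viewed as a signed PREC-bit integer
   (1 <= PREC <= 64).  A negative value (sign bit set) yields 0
   regardless of PREC.  */
static int
clz_prec (uint64_t val, int prec)
{
  assert (prec >= 1 && prec <= 64);
  /* Check the sign bit first: negative values have no leading zeros.  */
  if ((val >> (prec - 1)) & 1)
    return 0;
  int count = 0;
  for (int bit = prec - 1; bit >= 0 && !((val >> bit) & 1); bit--)
    count++;
  return count;
}
```

For example, clz_prec (0x40, 7) returns 0 because bit 6 is the sign bit of a 7-bit value, a precision that is not a multiple of any host word size.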
2021-09-06 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::clz): Reorder tests to ensure the result
is zero for all negative values.
Tobias Burnus [Mon, 6 Sep 2021 16:49:08 +0000 (18:49 +0200)]
invoke.texi: Fix @opindex for -foffload-options
gcc/
* doc/invoke.texi (-foffload-options): Fix @opindex.
Serge Belyshev [Thu, 15 Jul 2021 17:19:18 +0000 (20:19 +0300)]
gcc_update: use human readable name for revision string in gcc/REVISION
contrib/ChangeLog:
* gcc_update: Derive human readable name for HEAD using git describe
like "git gcc-descr" with short commit hash. Drop "revision" from
gcc/REVISION.
H.J. Lu [Sat, 4 Sep 2021 15:28:00 +0000 (08:28 -0700)]
x86: Add non-destructive source to @xorsign<mode>3_1
Add non-destructive source alternative to @xorsign<mode>3_1 for AVX.
gcc/
PR target/89984
* config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2].
* config/i386/i386.md (@xorsign<mode>3_1): Add non-destructive
source alternative for AVX.
gcc/testsuite/
PR target/89984
* gcc.target/i386/pr89984-1.c: New test.
* gcc.target/i386/pr89984-2.c: Likewise.
* gcc.target/i386/xorsign-avx.c: Likewise.
liuhongt [Mon, 6 Sep 2021 09:09:38 +0000 (17:09 +0800)]
Avoid FROM being overwritten in expand_fix.
For the conversion from _Float16 to int, if the corresponding optab
does not exist, the compiler will try the wider mode (SFmode here),
but when floatsfsi exists but FAILs, FROM is overwritten, which
leads to the runtime error reported in the PR.
gcc/ChangeLog:
PR middle-end/102182
* optabs.c (expand_fix): Add from1 to avoid from being
overwritten.
gcc/testsuite/ChangeLog:
PR middle-end/102182
* gcc.target/i386/pr101282.c: New test.
Thomas Schwinge [Mon, 6 Sep 2021 09:42:03 +0000 (11:42 +0200)]
'libgomp.c/target-43.c': '-latomic' for nvptx offloading
... to avoid a regression with recent
commit
090f0d78f194e3cda23fe904016db77ea36c38fa
"openmp: Improve expand_omp_atomic_pipeline":
unresolved symbol __atomic_compare_exchange_1
collect2: error: ld returned 1 exit status
mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
libgomp/
* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.
Eric Botcazou [Mon, 6 Sep 2021 09:16:08 +0000 (11:16 +0200)]
Fix debug info for packed array types in Ada
Packed array types are sometimes represented with integer types under the
hood in Ada, but we nevertheless need to emit them as array types in the
debug info, which is what the types.get_array_descr_info langhook is for;
but it is not invoked from modified_type_die, which causes:
FAIL: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all
FAIL: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all(3)
in the GDB testsuite.
gcc/
* dwarf2out.c (modified_type_die): Deal with all array types earlier
and use local variable consistently throughout the function.
Jakub Jelinek [Mon, 6 Sep 2021 08:08:16 +0000 (10:08 +0200)]
match.pd: Fix up __builtin_*_overflow arg demotion [PR102207]
My earlier patch to demote arguments of __builtin_*_overflow unfortunately
caused a wrong-code regression. The builtins operate on infinite precision
arguments. With outer_prec > inner_prec, signed -> signed and
unsigned -> unsigned promotions just repeat the sign bit or 0s and can be
demoted, and so can unsigned -> signed, which also repeats 0s. But as the
testcase shows, signed -> unsigned promotions need to be preserved (unless
we'd know the inner arguments can't be negative), because for negative
numbers such a promotion sets the bits from inner_prec up to outer_prec to 1
but the bits above that to 0 in the infinite precision.
So, the following patch avoids the demotions for the signed -> unsigned
promotions.
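A small C illustration of why signed -> unsigned promotions must be kept, using GCC's __builtin_add_overflow, which computes the result in infinite precision. The two helper names are hypothetical, and the sketch assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdbool.h>

/* Promoting a negative signed char to unsigned int changes its
   infinite-precision value from -1 to 4294967295, so the overflow
   result depends on whether the promotion is kept.  */
static bool
add_overflows_promoted (signed char c, int *res)
{
  unsigned int u = c;   /* signed -> wider unsigned: -1 becomes 0xffffffff */
  return __builtin_add_overflow (u, 0, res);
}

/* The same addition on the demoted (original) operand.  */
static bool
add_overflows_original (signed char c, int *res)
{
  return __builtin_add_overflow (c, 0, res);
}
```

For c == -1, the promoted form overflows (4294967295 does not fit in int) while the demoted form does not, so replacing one with the other would change the result.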
2021-09-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/102207
* match.pd: Don't demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
were promoted from signed to wider unsigned type.
* gcc.dg/pr102207.c: New test.
Andrew Pinski [Mon, 6 Sep 2021 00:52:18 +0000 (00:52 +0000)]
Fix PR tree-optimization/63184: add simplification of (& + A) != (& + B)
These two testcases have been failing since GCC 5 but things
have improved such that adding a simplification to match.pd
for this case is easier than before.
In the end we have the following IR:
....
_5 = &a[1] + _4;
_7 = &a + _13;
if (_5 != _7)
So we can fold the _5 != _7 into:
(&a[1] - &a) + _4 != _13
The subtraction is folded into constant by ptr_difference_const.
In this case, the full expression gets folded into a constant
and we are able to remove the if statement.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
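The fold can be checked in plain C. Byte offsets are used because the POINTER_PLUS_EXPRs in the IR above add byte offsets; ptrs_differ and offsets_differ are hypothetical helpers, and the offsets are kept within the bounds of a so that the pointer arithmetic stays defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int a[10];

/* The comparison as written: (&a[1] + i) != (&a + j), with byte
   offsets as in the GIMPLE above.  */
static bool
ptrs_differ (size_t i, size_t j)
{
  const char *p = (const char *) &a[1] + i;
  const char *q = (const char *) &a + j;
  return p != q;
}

/* The folded form: (&a[1] - &a) + i != j, where the pointer
   difference is the constant sizeof (int).  */
static bool
offsets_differ (size_t i, size_t j)
{
  return sizeof (int) + i != j;
}
```

Once the pointer difference is a known constant, the comparison only involves integers, which is what lets the testcases fold the if statement away.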
gcc/ChangeLog:
PR tree-optimization/63184
* match.pd: Add simplification of pointer_diff of two pointer_plus
with addr_expr in the first operand of each pointer_plus.
Add simplification of ne/eq of two pointer_plus with addr_expr
in the first operand of each pointer_plus.
gcc/testsuite/ChangeLog:
PR tree-optimization/63184
* c-c++-common/pr19807-2.c: Enable for all targets and remove the xfail.
* c-c++-common/pr19807-3.c: Likewise.
liuhongt [Fri, 3 Sep 2021 05:06:57 +0000 (13:06 +0800)]
Explicitly add -msse2 to compile HF related libgcc source file.
For a 32-bit libgcc configured without SSE2, there would be an error, since
GCC only supports _Float16 with SSE2. Explicitly add -msse2 for those
HF-related libgcc functions, so users can still link them with the
above configuration.
libgcc/ChangeLog:
* Makefile.in: Adjust to support specific CFLAGS for each
libgcc source file.
* config/i386/64/t-softfp: Explicitly add -msse2 for HF
related libgcc source files.
* config/i386/t-softfp: Ditto.
* config/i386/_divhc3.c: New file.
* config/i386/_mulhc3.c: New file.
Richard Biener [Thu, 2 Sep 2021 12:48:10 +0000 (14:48 +0200)]
tree-optimization/102176 - locally compute participating SLP stmts
This performs local re-computation of participating scalar stmts
in BB vectorization subgraphs to allow precise computation of
liveness of scalar stmts after vectorization and thus precise
costing. This treats all extern defs as live but continues
to optimistically handle scalar defs that we think we can handle
by lane-extraction even though that can still fail late during
code-generation.
2021-09-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/102176
* tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
New function.
(vect_bb_slp_scalar_cost): Use the computed set of
vectorized scalar stmts instead of relying on the out-of-date
and not accurate PURE_SLP_STMT.
(vect_bb_vectorization_profitable_p): Compute the set
of vectorized scalar stmts.
GCC Administrator [Mon, 6 Sep 2021 00:16:18 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Thu, 19 Aug 2021 19:29:54 +0000 (12:29 -0700)]
libgo: update to final Go 1.17 release
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/343729
Aldy Hernandez [Sun, 5 Sep 2021 14:53:31 +0000 (16:53 +0200)]
Make the path solver's range_of_stmt() handle all statements.
The path solver's range_of_stmt() was handcuffed to only fold
GIMPLE_COND statements, since those were the only statements the
backward threader needed to resolve. However, there is no need for this
restriction, as the folding code is perfectly capable of folding any
statement.
This can be the case when trying to fold other statements in the final
block of a path (for instance, in the forward threader as it tries to
fold candidate statements along a path).
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::range_of_stmt): Remove
GIMPLE_COND special casing.
(path_range_query::range_defined_in_block): Use range_of_stmt
instead of calling fold_range directly.
Aldy Hernandez [Sun, 5 Sep 2021 10:44:41 +0000 (12:44 +0200)]
Add an unreachable_path_p method to path_range_query.
Keeping track of unreachable calculations while traversing a path is
useful to determine edge reachability, among other things. We've been
doing this ad-hoc in the backwards threader, so this provides a cleaner
way of accessing the information.
This patch also makes it easier to compare different threading
implementations, in some upcoming work. For example, it's currently
difficult to gague how good we're doing compared to the forward threader,
because it can thread paths that are obviously unreachable. This
provides a way of discarding those paths.
Note that I've opted to keep unreachable_path_p() out-of-line, because I
have local changes that will enhance this method.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::range_of_expr): Set
m_undefined_path when appropriate.
(path_range_query::internal_range_of_expr): Copy from range_of_expr.
(path_range_query::unreachable_path_p): New.
(path_range_query::precompute_ranges): Set m_undefined_path.
* gimple-range-path.h (path_range_query::unreachable_path_p): New.
(path_range_query::internal_range_of_expr): New.
* tree-ssa-threadbackward.c (back_threader::find_taken_edge_cond):
Use unreachable_path_p.
Aldy Hernandez [Sun, 5 Sep 2021 07:41:50 +0000 (09:41 +0200)]
Clean up registering of paths in backwards threader.
All callers to maybe_register_path() call find_taken_edge() beforehand
and pass the edge as an argument. There's no reason to repeat this
at each call site.
This is a clean-up in preparation for some other enhancements to the
backwards threader.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
Remove argument and call find_taken_edge.
(back_threader::resolve_phi): Do not calculate taken edge before
calling maybe_register_path.
(back_threader::find_paths_to_names): Same.
Jeff Law [Sun, 5 Sep 2021 04:08:34 +0000 (00:08 -0400)]
Improve handling of C bit for setcc insns
gcc/
* config/h8300/h8300.md (QHSI2 mode iterator): New mode iterator.
* config/h8300/testcompare.md (store_c): Update name, use new
QHSI2 iterator.
(store_neg_c, store_shifted_c): New patterns.
GCC Administrator [Sun, 5 Sep 2021 00:16:17 +0000 (00:16 +0000)]
Daily bump.