review.tizen.org Git - platform/upstream/gcc.git/log

AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vaddsh-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto.
* gcc.target/i386/pr54855-11.c: Ditto.

AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic.
(_mm_mask_add_sh): Likewise.
(_mm_maskz_add_sh): Likewise.
(_mm_sub_sh): Likewise.
(_mm_mask_sub_sh): Likewise.
(_mm_maskz_sub_sh): Likewise.
(_mm_mul_sh): Likewise.
(_mm_mask_mul_sh): Likewise.
(_mm_maskz_mul_sh): Likewise.
(_mm_div_sh): Likewise.
(_mm_mask_div_sh): Likewise.
(_mm_maskz_div_sh): Likewise.
(_mm_add_round_sh): Likewise.
(_mm_mask_add_round_sh): Likewise.
(_mm_maskz_add_round_sh): Likewise.
(_mm_sub_round_sh): Likewise.
(_mm_mask_sub_round_sh): Likewise.
(_mm_maskz_sub_round_sh): Likewise.
(_mm_mul_round_sh): Likewise.
(_mm_mask_mul_round_sh): Likewise.
(_mm_maskz_mul_round_sh): Likewise.
(_mm_div_round_sh): Likewise.
(_mm_mask_div_round_sh): Likewise.
(_mm_maskz_div_round_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_round_builtin): Handle new builtins.
* config/i386/sse.md (VF_128): Change description.
(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>):
Adjust to support HF vector modes.
(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>):
Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Enable _Float16 autovectorization

gcc/ChangeLog:

* config/i386/i386-expand.c
(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
* config/i386/i386.c
(ix86_preferred_simd_mode): Handle HF mode.
* config/i386/sse.md (V_256H): New mode iterator.
(avx_vextractf128<mode>): Use it.
(VEC_INIT_MODE): Align vector HFmode condition to vector
HImodes since no real HF instructions are used.
(VEC_INIT_HALF_MODE): Ditto.
(VIHF): Ditto.
(VIHF_AVX512BW): Ditto.
(*vec_extracthf): Ditto.
(VEC_EXTRACT_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vect-float16-1.c: New test.
* gcc.target/i386/vect-float16-10.c: Ditto.
* gcc.target/i386/vect-float16-11.c: Ditto.
* gcc.target/i386/vect-float16-12.c: Ditto.
* gcc.target/i386/vect-float16-2.c: Ditto.
* gcc.target/i386/vect-float16-3.c: Ditto.
* gcc.target/i386/vect-float16-4.c: Ditto.
* gcc.target/i386/vect-float16-5.c: Ditto.
* gcc.target/i386/vect-float16-6.c: Ditto.
* gcc.target/i386/vect-float16-7.c: Ditto.
* gcc.target/i386/vect-float16-8.c: Ditto.
* gcc.target/i386/vect-float16-9.c: Ditto.

Remove dbx.h, do not set PREFERRED_DEBUGGING_TYPE from dbxcoff.h, lynx.h

The following removes the unused config/dbx.h file and removes the
setting of PREFERRED_DEBUGGING_TYPE from dbxcoff.h which is
overridden by all users (djgpp/mingw/cygwin) by including either
config/i386/djgpp.h or config/i386/cygming.h.

There are still circumstances where mingw and cygwin default to
STABS, namely when HAVE_GAS_PE_SECREL32_RELOC is not defined and
the target defaults to 32bit code generation.

The new style handling DBX_DEBUGGING_INFO is in line with
dbxelf.h which does not define PREFERRED_DEBUGGING_TYPE either.

The patch also removes the PREFERRED_DEBUGGING_TYPE define from
lynx.h which always follows elfos.h already defaulting to DWARF,
so the comment about STABS being the default is misleading and
outdated.

2021-09-09 Richard Biener <rguenther@suse.de>

PR target/102255
* config/dbx.h: Remove.
* config/dbxcoff.h: Do not define PREFERRED_DEBUGGING_TYPE.
* config/lynx.h: Likewise.

Remove copysign post_reload splitter for scalar modes.

Expanding directly can generate better code, as
avx512dq-abs-copysign-1.c shows.

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_copysign): Expand
right into ANDNOT + AND + IOR, using paradoxical subregs.
(ix86_split_copysign_const): Remove.
(ix86_split_copysign_var): Ditto.
* config/i386/i386-protos.h (ix86_split_copysign_const): Ditto.
(ix86_split_copysign_var): Ditto.
* config/i386/i386.md (@copysign<mode>3_const): Ditto.
(@copysign<mode>3_var): Ditto.
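The ANDNOT + AND + IOR expansion named above can be illustrated on the raw bits of a float. This is a minimal sketch in portable C, not the RTL that ix86_expand_copysign emits; the helper names (float_to_bits, bits_to_float, copysign_bitwise) are hypothetical and a 32-bit IEEE float is assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: view the bits of a float as an integer. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: view an integer bit pattern as a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* copysign (x, y): magnitude of x combined with the sign of y. */
float copysign_bitwise(float x, float y)
{
    const uint32_t sign_mask = UINT32_C(0x80000000);
    uint32_t magnitude = float_to_bits(x) & ~sign_mask;  /* ANDNOT */
    uint32_t sign = float_to_bits(y) & sign_mask;        /* AND */
    return bits_to_float(magnitude | sign);              /* IOR */
}
```

Because each step is a plain bitwise operation on the full register, no post-reload split is needed to model it.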

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512dq-abs-copysign-1.c: Adjust testcase.
* gcc.target/i386/avx512vl-abs-copysign-1.c: Adjust testcase.

Daily bump.

Add -ftrivial-auto-var-init option and uninitialized variable attribute.

Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized, so -Wuninitialized will still report warning
messages on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
You can opt a specific variable out of this initialization by using the
variable attribute "uninitialized", which limits runtime overhead.
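A short sketch of how the attribute might be used, assuming a build with -ftrivial-auto-var-init=zero or -ftrivial-auto-var-init=pattern. The macro AUTO_INIT_OPT_OUT and the function sum_of_squares are hypothetical names for illustration; the macro expands to nothing on compilers that lack the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* AUTO_INIT_OPT_OUT is a hypothetical wrapper around the new attribute. */
#if defined(__has_attribute)
# if __has_attribute(uninitialized)
#  define AUTO_INIT_OPT_OUT __attribute__((uninitialized))
# endif
#endif
#ifndef AUTO_INIT_OPT_OUT
# define AUTO_INIT_OPT_OUT
#endif

#define SCRATCH_LEN 64

/* Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1) through a scratch
   buffer.  Every element that is read has been written first, so the
   automatic initialization would be pure overhead here.  */
unsigned sum_of_squares(unsigned n)
{
    unsigned scratch[SCRATCH_LEN] AUTO_INIT_OPT_OUT;
    unsigned total = 0;

    assert(n <= SCRATCH_LEN);
    for (unsigned i = 0; i < n; i++)
        scratch[i] = i * i;
    for (unsigned i = 0; i < n; i++)
        total += scratch[i];
    return total;
}
```

Other automatic variables in the same translation unit would still be initialized according to the option.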

gcc/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* builtins.c (expand_builtin_memset): Make externally visible.
* builtins.h (expand_builtin_memset): Declare extern.
* common.opt (ftrivial-auto-var-init=): New option.
* doc/extend.texi: Document the uninitialized attribute.
* doc/invoke.texi: Document -ftrivial-auto-var-init.
* flag-types.h (enum auto_init_type): New enumerated type
auto_init_type.
* gimple-fold.c (clear_padding_type): Add one new parameter.
(clear_padding_union): Likewise.
(clear_padding_emit_loop): Likewise.
(clear_type_padding_in_mask): Likewise.
(gimple_fold_builtin_clear_padding): Handle this new parameter.
* gimplify.c (gimple_add_init_for_auto_var): New function.
(gimple_add_padding_init_for_auto_var): New function.
(is_var_need_auto_init): New function.
(gimplify_decl_expr): Add initialization to automatic variables per
users' requests.
(gimplify_call_expr): Add one new parameter for call to
__builtin_clear_padding.
(gimplify_init_constructor): Add padding initialization in the end.
* internal-fn.c (INIT_PATTERN_VALUE): New macro.
(expand_DEFERRED_INIT): New function.
* internal-fn.def (DEFERRED_INIT): New internal function.
* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
* tree-sra.c (generate_subtree_deferred_init): New function.
(scan_function): Avoid setting cannot_scalarize_away_bitmap for
calls to .DEFERRED_INIT.
(sra_modify_deferred_init): New function.
(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
specially.
(check_defs): Likewise.
(warn_uninitialized_vars): Likewise.
* tree-ssa.c (ssa_undefined_value_p): Likewise.
* tree.c (build_common_builtin_nodes): Build tree node for
BUILT_IN_CLEAR_PADDING when needed.

gcc/c-family/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* c-attribs.c (handle_uninitialized_attribute): New function.
(c_common_attribute_table): Add "uninitialized" attribute.

gcc/testsuite/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* c-c++-common/auto-init-1.c: New test.
* c-c++-common/auto-init-10.c: New test.
* c-c++-common/auto-init-11.c: New test.
* c-c++-common/auto-init-12.c: New test.
* c-c++-common/auto-init-13.c: New test.
* c-c++-common/auto-init-14.c: New test.
* c-c++-common/auto-init-15.c: New test.
* c-c++-common/auto-init-16.c: New test.
* c-c++-common/auto-init-2.c: New test.
* c-c++-common/auto-init-3.c: New test.
* c-c++-common/auto-init-4.c: New test.
* c-c++-common/auto-init-5.c: New test.
* c-c++-common/auto-init-6.c: New test.
* c-c++-common/auto-init-7.c: New test.
* c-c++-common/auto-init-8.c: New test.
* c-c++-common/auto-init-9.c: New test.
* c-c++-common/auto-init-esra.c: New test.
* c-c++-common/auto-init-padding-1.c: New test.
* c-c++-common/auto-init-padding-2.c: New test.
* c-c++-common/auto-init-padding-3.c: New test.
* g++.dg/auto-init-uninit-pred-1_a.C: New test.
* g++.dg/auto-init-uninit-pred-2_a.C: New test.
* g++.dg/auto-init-uninit-pred-3_a.C: New test.
* g++.dg/auto-init-uninit-pred-4.C: New test.
* gcc.dg/auto-init-sra-1.c: New test.
* gcc.dg/auto-init-sra-2.c: New test.
* gcc.dg/auto-init-uninit-1.c: New test.
* gcc.dg/auto-init-uninit-12.c: New test.
* gcc.dg/auto-init-uninit-13.c: New test.
* gcc.dg/auto-init-uninit-14.c: New test.
* gcc.dg/auto-init-uninit-15.c: New test.
* gcc.dg/auto-init-uninit-16.c: New test.
* gcc.dg/auto-init-uninit-17.c: New test.
* gcc.dg/auto-init-uninit-18.c: New test.
* gcc.dg/auto-init-uninit-19.c: New test.
* gcc.dg/auto-init-uninit-2.c: New test.
* gcc.dg/auto-init-uninit-20.c: New test.
* gcc.dg/auto-init-uninit-21.c: New test.
* gcc.dg/auto-init-uninit-22.c: New test.
* gcc.dg/auto-init-uninit-23.c: New test.
* gcc.dg/auto-init-uninit-24.c: New test.
* gcc.dg/auto-init-uninit-25.c: New test.
* gcc.dg/auto-init-uninit-26.c: New test.
* gcc.dg/auto-init-uninit-3.c: New test.
* gcc.dg/auto-init-uninit-34.c: New test.
* gcc.dg/auto-init-uninit-36.c: New test.
* gcc.dg/auto-init-uninit-37.c: New test.
* gcc.dg/auto-init-uninit-4.c: New test.
* gcc.dg/auto-init-uninit-5.c: New test.
* gcc.dg/auto-init-uninit-6.c: New test.
* gcc.dg/auto-init-uninit-8.c: New test.
* gcc.dg/auto-init-uninit-9.c: New test.
* gcc.dg/auto-init-uninit-A.c: New test.
* gcc.dg/auto-init-uninit-B.c: New test.
* gcc.dg/auto-init-uninit-C.c: New test.
* gcc.dg/auto-init-uninit-H.c: New test.
* gcc.dg/auto-init-uninit-I.c: New test.
* gcc.target/aarch64/auto-init-1.c: New test.
* gcc.target/aarch64/auto-init-2.c: New test.
* gcc.target/aarch64/auto-init-3.c: New test.
* gcc.target/aarch64/auto-init-4.c: New test.
* gcc.target/aarch64/auto-init-5.c: New test.
* gcc.target/aarch64/auto-init-6.c: New test.
* gcc.target/aarch64/auto-init-7.c: New test.
* gcc.target/aarch64/auto-init-8.c: New test.
* gcc.target/aarch64/auto-init-padding-1.c: New test.
* gcc.target/aarch64/auto-init-padding-10.c: New test.
* gcc.target/aarch64/auto-init-padding-11.c: New test.
* gcc.target/aarch64/auto-init-padding-12.c: New test.
* gcc.target/aarch64/auto-init-padding-2.c: New test.
* gcc.target/aarch64/auto-init-padding-3.c: New test.
* gcc.target/aarch64/auto-init-padding-4.c: New test.
* gcc.target/aarch64/auto-init-padding-5.c: New test.
* gcc.target/aarch64/auto-init-padding-6.c: New test.
* gcc.target/aarch64/auto-init-padding-7.c: New test.
* gcc.target/aarch64/auto-init-padding-8.c: New test.
* gcc.target/aarch64/auto-init-padding-9.c: New test.
* gcc.target/i386/auto-init-1.c: New test.
* gcc.target/i386/auto-init-2.c: New test.
* gcc.target/i386/auto-init-21.c: New test.
* gcc.target/i386/auto-init-22.c: New test.
* gcc.target/i386/auto-init-23.c: New test.
* gcc.target/i386/auto-init-24.c: New test.
* gcc.target/i386/auto-init-3.c: New test.
* gcc.target/i386/auto-init-4.c: New test.
* gcc.target/i386/auto-init-5.c: New test.
* gcc.target/i386/auto-init-6.c: New test.
* gcc.target/i386/auto-init-7.c: New test.
* gcc.target/i386/auto-init-8.c: New test.
* gcc.target/i386/auto-init-padding-1.c: New test.
* gcc.target/i386/auto-init-padding-10.c: New test.
* gcc.target/i386/auto-init-padding-11.c: New test.
* gcc.target/i386/auto-init-padding-12.c: New test.
* gcc.target/i386/auto-init-padding-2.c: New test.
* gcc.target/i386/auto-init-padding-3.c: New test.
* gcc.target/i386/auto-init-padding-4.c: New test.
* gcc.target/i386/auto-init-padding-5.c: New test.
* gcc.target/i386/auto-init-padding-6.c: New test.
* gcc.target/i386/auto-init-padding-7.c: New test.
* gcc.target/i386/auto-init-padding-8.c: New test.
* gcc.target/i386/auto-init-padding-9.c: New test.

Fortran - out of bounds in array constructor with implied do loop

gcc/fortran/ChangeLog:

PR fortran/98490
* trans-expr.c (gfc_conv_substring): Do not generate substring
bounds check for implied do loop index variable before it actually
becomes defined.

gcc/testsuite/ChangeLog:

PR fortran/98490
* gfortran.dg/bounds_check_23.f90: New test.

x86-64: Update AVX512FP16 ABI tests for x32

On x32, long is the same as int and pointer is 32 bits. Update AVX512FP16
ABI tests:

1. Replace long with long long for 64-bit integers.
2. Update type and alignment for long and pointer.
3. Skip tests for long on x32.
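The three points above can be sketched in plain C. This is an illustration only, not the contents of defines.h or args.h: the names abi_int64, expected_pointer_size, expected_long_size and should_test_long_as_64bit are hypothetical, and the non-x32 branch assumes an LP64 target.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stddef.h>

/* Point 1: long long holds 64 bits on both ABIs, unlike long. */
typedef long long abi_int64;   /* hypothetical name */
static_assert(sizeof(abi_int64) * CHAR_BIT >= 64,
              "long long must hold 64 bits");

/* Point 2: size expectations depend on __ILP32__. */
size_t expected_pointer_size(void)
{
#ifdef __ILP32__
    return 4;
#else
    return 8;   /* assumption: 64-bit pointers otherwise */
#endif
}

size_t expected_long_size(void)
{
#ifdef __ILP32__
    return 4;
#else
    return 8;   /* assumption: LP64, where long is 64 bits */
#endif
}

/* Point 3: tests that treat long as a 64-bit integer are skipped on x32. */
int should_test_long_as_64bit(void)
{
#ifdef __ILP32__
    return 0;
#else
    return 1;
#endif
}
```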

* gcc.target/x86_64/abi/avx512fp16/args.h: Replace long with
long long.
(XMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/defines.h (TYPE_SIZE_LONG):
Define to 4 if __ILP32__ is defined.
(TYPE_SIZE_POINTER): Likewise.
(TYPE_ALIGN_LONG): Likewise.
(TYPE_ALIGN_POINTER): Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
(main): Skip test for long if __ILP32__ is defined.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
(do_test): Replace _long with _longlong.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
(check_300): Replace _ulong with _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: Replace long
with long long.
(YMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Replace long
with long long.
(ZMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.

Improve LIM fill_always_executed_in computation

Currently the DOM walk over a loop body does not walk into not
always executed subloops to avoid scalability issues since doing
so makes the walk quadratic in the loop depth. It turns out this
is not an issue in practice and even with a loop depth of 1800
this function is way off the radar.

So the following patch removes the limitation, replacing it with
a comment.

2021-09-09 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-im.c (fill_always_executed_in_1): Walk
into all subloops.

* gcc.dg/tree-ssa/ssa-lim-17.c: New testcase.

Avoid full DOM walk in LIM fill_always_executed_in

fill_always_executed_in often terminates its walk of a loop body
early, so this avoids the full DOM walk via get_loop_body_in_dom_order
by integrating that DOM walk with the actual processing done by
fill_always_executed_in. This trades the fully populated loop
body array for a worklist allocation of the same size and thus
should be a strict improvement over the recursive approach of
get_loop_body_in_dom_order.

2021-09-09 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-im.c (fill_always_executed_in_1): Integrate
DOM walk from get_loop_body_in_dom_order using a worklist
approach.
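A toy model of the worklist idea, not the code in tree-ssa-loop-im.c: blocks are visited in dominator-tree preorder directly from a worklist, so a walk that stops early never pays for enumerating the rest of the body. The struct dom_tree layout, the stop_here flag and the function name walk_until_stop are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Hypothetical dominator tree over block indices; -1 means "none". */
struct dom_tree {
    int first_child[MAX_BLOCKS];
    int next_sibling[MAX_BLOCKS];
    int stop_here[MAX_BLOCKS];   /* nonzero: the walk ends at this block */
};

/* Visits blocks from ROOT in dominator preorder, recording up to CAP of
   them in VISITED, and stops at the first block flagged stop_here.
   Returns the number of blocks visited.  Each block is pushed at most
   once, so the worklist needs no more than MAX_BLOCKS slots.  */
size_t walk_until_stop(const struct dom_tree *t, int root,
                       int *visited, size_t cap)
{
    int worklist[MAX_BLOCKS];
    size_t top = 0;
    size_t count = 0;

    worklist[top++] = root;
    while (top > 0) {
        int bb = worklist[--top];

        if (count < cap)
            visited[count] = bb;
        count++;

        if (t->stop_here[bb])
            break;   /* early exit: remaining blocks are never touched */

        for (int c = t->first_child[bb]; c != -1; c = t->next_sibling[c]) {
            assert(top < MAX_BLOCKS);
            worklist[top++] = c;
        }
    }
    return count;
}
```

The worklist has the same capacity as a fully populated body array would, which mirrors the trade described in the message.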

AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h: New header file for
FP16 runtime test.
* gcc.target/i386/avx512fp16-vaddph-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.

AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.

gcc/ChangeLog:

* config.gcc: Add avx512fp16vlintrin.h.
* config/i386/avx512fp16intrin.h (_mm512_add_ph): New intrinsic.
(_mm512_mask_add_ph): Likewise.
(_mm512_maskz_add_ph): Likewise.
(_mm512_sub_ph): Likewise.
(_mm512_mask_sub_ph): Likewise.
(_mm512_maskz_sub_ph): Likewise.
(_mm512_mul_ph): Likewise.
(_mm512_mask_mul_ph): Likewise.
(_mm512_maskz_mul_ph): Likewise.
(_mm512_div_ph): Likewise.
(_mm512_mask_div_ph): Likewise.
(_mm512_maskz_div_ph): Likewise.
(_mm512_add_round_ph): Likewise.
(_mm512_mask_add_round_ph): Likewise.
(_mm512_maskz_add_round_ph): Likewise.
(_mm512_sub_round_ph): Likewise.
(_mm512_mask_sub_round_ph): Likewise.
(_mm512_maskz_sub_round_ph): Likewise.
(_mm512_mul_round_ph): Likewise.
(_mm512_mask_mul_round_ph): Likewise.
(_mm512_maskz_mul_round_ph): Likewise.
(_mm512_div_round_ph): Likewise.
(_mm512_mask_div_round_ph): Likewise.
(_mm512_maskz_div_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h: New header.
* config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
Add new builtin types.
* config/i386/i386-builtin.def: Add corresponding builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Likewise.
* config/i386/immintrin.h: Include avx512fp16vlintrin.h
* config/i386/sse.md (VFH): New mode_iterator.
(VF2H): Likewise.
(avx512fmaskmode): Add HF vector modes.
(avx512fmaskhalfmode): Likewise.
(<plusminus_insn><mode>3<mask_name><round_name>): Adjust for
HF vector modes.
(*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
(mul<mode>3<mask_name><round_name>): Likewise.
(*mul<mode>3<mask_name><round_name>): Likewise.
(div<mode>3): Likewise.
(<sse>_div<mode>3<mask_name><round_name>): Likewise.
* config/i386/subst.md (SUBST_V): Add HF vector modes.
(SUBST_A): Likewise.
(round_mode512bit_condition): Adjust for V32HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
* gcc.target/i386/avx-2.c: Add -mavx512vl.
* gcc.target/i386/avx512fp16-11a.c: New test.
* gcc.target/i386/avx512fp16-11b.c: Ditto.
* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
* gcc.target/i386/sse-13.c: Add test for new builtins.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

Optimize v4sf reduction.

gcc/ChangeLog:

PR target/101059
* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
(reduc_plus_scal_v4sf): .. this, New define_expand.
(reduc_plus_scal_v2df): .. and this, New define_expand.

gcc/testsuite/ChangeLog:

PR target/101059
* gcc.target/i386/sse2-pr101059.c: New test.
* gcc.target/i386/sse3-pr101059.c: New test.

Optimize vec_extract for 256/512-bit vector when index exceeds the lower 128 bits.

- vextracti32x8 $0x1, %zmm0, %ymm0
- vmovd %xmm0, %eax
+ valignd $8, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax

- vextracti32x8 $0x1, %zmm0, %ymm0
- vextracti128 $0x1, %ymm0, %xmm0
- vpextrd $3, %xmm0, %eax
+ valignd $15, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax

- vextractf64x2 $0x1, %ymm0, %xmm0
+ valignq $2, %ymm0, %ymm0, %ymm0

- vextractf64x4 $0x1, %zmm0, %ymm0
- vextractf64x2 $0x1, %ymm0, %xmm0
- vunpckhpd %xmm0, %xmm0, %xmm0
+ valignq $7, %zmm0, %zmm0, %zmm0
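The valign sequences above rely on the fact that, when both sources are the same register, shifting the concatenated pair right by N elements brings element N into lane 0. A scalar model of that idea for a 16-lane vector of 32-bit elements follows; rotate_lanes and extract_lane are hypothetical names, and the code models the data movement only, not the instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Models valignd with both sources equal: dst[k] = src[(k + imm) % LANES]. */
void rotate_lanes(const int32_t src[LANES], unsigned imm, int32_t dst[LANES])
{
    for (unsigned k = 0; k < LANES; k++)
        dst[k] = src[(k + imm) % LANES];
}

/* Extracts element INDEX by rotating it into lane 0 and reading lane 0,
   which corresponds to the vmovd from the low part of the result.  */
int32_t extract_lane(const int32_t v[LANES], unsigned index)
{
    int32_t rotated[LANES];

    assert(index < LANES);
    rotate_lanes(v, index, rotated);
    return rotated[0];
}
```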

gcc/ChangeLog:

PR target/91103
* config/i386/sse.md (*vec_extract<mode><ssescalarmodelower>_valign):
New define_insn.

gcc/testsuite/ChangeLog:

PR target/91103
* gcc.target/i386/pr91103-1.c: New test.
* gcc.target/i386/pr91103-2.c: New test.

Daily bump.

c++: Fix docs on assignment of virtual bases [PR60318]

The description of behaviour is incorrect: the virtual base gets
assigned before entering the bodies of A::operator= and B::operator=,
not after.

The example is also ill-formed (passing a string literal to char*) and
undefined (missing return from Base::operator=).

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
gcc/ChangeLog:

PR c++/60318
* doc/trouble.texi (Copy Assignment): Fix description of
behaviour and fix code in example.

analyzer: fix ICE when discarding result of realloc [PR102225]

gcc/analyzer/ChangeLog:
PR analyzer/102225
* analyzer.h (compat_types_p): New decl.
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Guard against NULL
type when checking for pointer types.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Guard against NULL lhs type/region. Guard against the size value
not being of a compatible type for dynamic extents.
* region-model.cc (compat_types_p): Make non-static.

gcc/testsuite/ChangeLog:
PR analyzer/102225
* gcc.dg/analyzer/realloc-1.c (test_10): New.
* gcc.dg/analyzer/torture/pr102225.c: New test.

c++/102228 - make lookup_anon_field O(1)

For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309.  The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested.  There for each such access we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them.  That's inefficient since, I believe,
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.

The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate type's
lang-specific data, re-using the RTTI typeinfo_var field.  That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.

I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much).  It also leaves
a hole for the case when the C++ FE itself duplicates such type
and places it at a semantically different position.  I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.

2021-09-08  Richard Biener  <rguenther@suse.de>

PR c++/102228
gcc/cp/
* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
place on invalid code.
* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
access.
* module.cc (trees_in::read_class_def): Likewise.  Reconstruct
ANON_AGGR_TYPE_FIELD.
* semantics.c (finish_member_declaration): Populate
ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
* typeck.c (lookup_anon_field): Remove DFS search and return
ANON_AGGR_TYPE_FIELD directly.

testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c

When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.

FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)

Jakub's commit 0b34dbc0a24864b1674bff7a92fa3cf0f1cbcea1 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux. I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).

Tested with no regressions with cross to nios2-elf.

* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.

testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details

When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.

cselim is enabled conditionally by code in toplev.c:

  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
        flag_tree_cselim = 1;
      else
        flag_tree_cselim = 0;
    }

Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.

Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.

* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.

rs6000: Fix ELFv2 r12 use in epilogue

We cannot use r12 here: it is already in use as the GEP (for sibling
calls).

2021-09-08 Segher Boessenkool <segher@kernel.crashing.org>
PR target/102107
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
r11 instead of r12 for restoring CR.

i386: Fix up xorsign for AVX [PR89984]

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
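The miscompile described above can be reproduced with a scalar model of the three instructions. This is only a sketch: the variables xmm0 and xmm1 stand in for the registers, the function names foo_expected and foo_as_split are hypothetical, and a 32-bit IEEE float is assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

#define SIGN_MASK UINT32_C(0x80000000)

uint32_t to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Intended result of x * __builtin_copysignf (1.0f, y) + y, computed
   with an xorsign that leaves y intact.  */
float foo_expected(float x, float y)
{
    float xorsign = from_bits(to_bits(x) ^ (to_bits(y) & SIGN_MASK));
    return xorsign + y;
}

/* Models the split sequence with xmm0 = x and xmm1 = y.  */
float foo_as_split(float x, float y)
{
    uint32_t xmm0 = to_bits(x);
    uint32_t xmm1 = to_bits(y);

    xmm1 &= SIGN_MASK;   /* vandps .LC0(%rip), %xmm1, %xmm1: y is lost */
    xmm0 ^= xmm1;        /* vxorps %xmm0, %xmm1, %xmm0 */
    return from_bits(xmm0) + from_bits(xmm1);   /* vaddss %xmm1, %xmm0, %xmm0 */
}
```

For x = 2 and y = -3 the intended result is -5, while the modelled split sequence adds -0.0 instead of y and yields -2.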

On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.

Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
    liuhongt  <hongtao.liu@intel.com>

PR target/89984
* config/i386/i386.md (@xorsign<mode>3_1): Remove.
* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
into AND with mask and XOR, using paradoxical subregs.
(ix86_split_xorsign): Remove.
* config/i386/i386-protos.h (ix86_split_xorsign): Remove.

* gcc.target/i386/avx-pr102224.c: Fix up PR number.
* gcc.dg/pr89984.c: New test.
* gcc.target/i386/avx-pr89984.c: New test.

Compile __{mul,div}hc3 into libgcc_s.so.1.

libgcc/ChangeLog:

* config/i386/t-softfp: Compile __{mul,div}hc3 into
libgcc_s.so.1.

tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into

If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one. That can cause inserting extra
vn_pvals.

gcc/ChangeLog:

* tree-ssa-sccvn.c (vn_nary_op_insert_into): Fix result compare.

libgcc, i386: Export *hf* and *hc* from libgcc_s.so.1

The following patch exports it for Linux from config/i386/*.ver where it
IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
targets might add them at completely different gcc versions.

2021-09-08 Jakub Jelinek <jakub@redhat.com>
Iain Sandoe <iain@sandoe.co.uk>

* config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
and export *hf* and *hc* functions at GCC_12.0.0.

i386: Fix up @xorsign<mode>3_1 [PR102224]

As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
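The simplification can be checked directly on bit patterns. A small sketch, with hypothetical helper names (xorsign_bits, abs_bits, check_identity) and the sign bit of a 32-bit float taken as the mask:

```c
#include <assert.h>
#include <stdint.h>

#define SIGN_MASK UINT32_C(0x80000000)

/* dest = (op1 & mask) ^ op0 */
uint32_t xorsign_bits(uint32_t op0, uint32_t op1)
{
    return (op1 & SIGN_MASK) ^ op0;
}

/* dest = op0 & ~mask, i.e. abs */
uint32_t abs_bits(uint32_t op0)
{
    return op0 & ~SIGN_MASK;
}

/* With identical inputs the sign bit is XORed with itself and clears,
   while every other bit passes through unchanged.  */
void check_identity(uint32_t op0)
{
    assert(xorsign_bits(op0, op0) == abs_bits(op0));
}
```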

The expander change is an optimization improvement: if we know at
expansion time that it is xorsign op0, op0, we can emit abs right away
and get better code through that.

The @xorsign<mode>3_1 change is a fix for the case where xorsign wouldn't
be known to have same operands during expansion, but during RTL
optimizations they would appear.  For non-AVX we need to use earlyclobber:
we require dest and op1 to be the same but op0 must be different because
we overwrite op1 first.  For AVX the constraints ensure that at most 2 of
the 3 operands may be the same register, and the splitter handles the case
where both inputs are the same.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
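
The hazard described above can be modelled at the source level.  The
following is only a sketch: the helper names (f2u, u2f, xorsign_clobbering,
xorsign_scratch, foo_buggy, foo_fixed) are hypothetical and not part of GCC,
and the real splitter works on RTL, not on C values.  It assumes a 32-bit
IEEE float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_MASK 0x80000000u

/* Hypothetical helpers: view a float as its bit pattern and back.  */
static uint32_t
f2u (float f)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
u2f (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Models the problematic split: op1 is overwritten with op1 & mask
   (the vandps), then xored into op0 (the vxorps).  */
static float
xorsign_clobbering (const float *op0, float *op1)
{
  *op1 = u2f (f2u (*op1) & SIGN_MASK);
  return u2f (f2u (*op0) ^ f2u (*op1));
}

/* Models a safe split: op1 & mask is kept in a scratch, op1 survives.  */
static float
xorsign_scratch (const float *op0, const float *op1)
{
  uint32_t scratch = f2u (*op1) & SIGN_MASK;
  return u2f (f2u (*op0) ^ scratch);
}

/* x * copysignf (1.0f, y) + y, with y clobbered before the addition.  */
float
foo_buggy (float x, float y)
{
  float t = xorsign_clobbering (&x, &y);
  return t + y;			/* y is now +0.0 or -0.0.  */
}

/* The same computation with y left intact.  */
float
foo_fixed (float x, float y)
{
  float t = xorsign_scratch (&x, &y);
  return t + y;
}
```

With x = 3.0f and y = -2.0f the clobbering variant adds -0.0f instead of y,
which is the miscompilation discussed above.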

2021-09-08  Jakub Jelinek  <jakub@redhat.com>

PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
first alternative even for avx, add another alternative with
=&Yv <- 0, Yv, Yvm constraints.
* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
to op1, emit vpandn instead.

* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.

AVX512FP16: Add abi test for zmm

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
New file.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
Likewise.

AVX512FP16: Add ABI test for ymm.

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
New exp file.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.

AVX512FP16: Add ABI tests for xmm.

Copied from regular XMM ABI tests. Only run AVX512FP16 ABI tests for ELF
targets.

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
file for abi test.
* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.

AVX512FP16: Add tests for vector passing in variable arguments.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vararg-1.c: New test.
* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.

AVX512FP16: Add testcase for vector init and broadcast intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
* gcc.target/i386/avx512fp16-10a.c: New test.
* gcc.target/i386/avx512fp16-10b.c: Ditto.
* gcc.target/i386/avx512fp16-1a.c: Ditto.
* gcc.target/i386/avx512fp16-1b.c: Ditto.
* gcc.target/i386/avx512fp16-1c.c: Ditto.
* gcc.target/i386/avx512fp16-1d.c: Ditto.
* gcc.target/i386/avx512fp16-1e.c: Ditto.
* gcc.target/i386/avx512fp16-2a.c: Ditto.
* gcc.target/i386/avx512fp16-2b.c: Ditto.
* gcc.target/i386/avx512fp16-2c.c: Ditto.
* gcc.target/i386/avx512fp16-3a.c: Ditto.
* gcc.target/i386/avx512fp16-3b.c: Ditto.
* gcc.target/i386/avx512fp16-3c.c: Ditto.
* gcc.target/i386/avx512fp16-4.c: Ditto.
* gcc.target/i386/avx512fp16-5.c: Ditto.
* gcc.target/i386/avx512fp16-6.c: Ditto.
* gcc.target/i386/avx512fp16-7.c: Ditto.
* gcc.target/i386/avx512fp16-8.c: Ditto.
* gcc.target/i386/avx512fp16-9a.c: Ditto.
* gcc.target/i386/avx512fp16-9b.c: Ditto.
* gcc.target/i386/pr54855-13.c: Ditto.
* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.

AVX512FP16: Support vector init/broadcast/set/extract for FP16.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
(_mm256_set1_ph): Likewise.
(_mm512_set1_ph): Likewise.
(_mm_setzero_ph): Likewise.
(_mm256_setzero_ph): Likewise.
(_mm512_setzero_ph): Likewise.
(_mm_set_sh): Likewise.
(_mm_load_sh): Likewise.
(_mm_store_sh): Likewise.
* config/i386/i386-builtin-types.def (V8HF): New type.
(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Support vector HFmodes.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_interleave): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
(ix86_expand_vector_init_concat): Likewise.
(ix86_expand_sse_movcc): Handle vector HFmodes.
(ix86_expand_vector_set_var): Ditto.
* config/i386/i386-modes.def: Add HF vector modes in comment.
* config/i386/i386.c (classify_argument): Add HF vector modes.
(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
(ix86_vector_mode_supported_p): Likewise.
(ix86_set_reg_reg_cost): Handle vector HFmode.
(ix86_get_ssemov): Handle vector HFmode.
(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
by stack.
(function_arg_advance_32): Pass V8HF/V16HF/V32HF by sse reg for 32bit
mode.
(function_arg_advance_32): Ditto.
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
(VALID_SSE2_REG_VHF_MODE): New.
(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
(SSE_REG_MODE_P): Add vector HFmode.
* config/i386/i386.md (mode): Add HF vector modes.
(MODE_SIZE): Likewise.
(ssemodesuffix): Add ph suffix for HF vector modes.
* config/i386/sse.md (VFH_128): New mode iterator.
(VMOVE): Adjust for HF vector modes.
(V): Likewise.
(V_256_512): Likewise.
(avx512): Likewise.
(avx512fmaskmode): Likewise.
(shuffletype): Likewise.
(sseinsnmode): Likewise.
(ssedoublevecmode): Likewise.
(ssehalfvecmode): Likewise.
(ssehalfvecmodelower): Likewise.
(ssePScmode): Likewise.
(ssescalarmode): Likewise.
(ssescalarmodelower): Likewise.
(sseintprefix): Likewise.
(i128): Likewise.
(bcstscalarsuff): Likewise.
(xtg_mode): Likewise.
(VI12HF_AVX512VL): New mode_iterator.
(VF_AVX512FP16): Likewise.
(VIHF): Likewise.
(VIHF_256): Likewise.
(VIHF_AVX512BW): Likewise.
(V16_256): Likewise.
(V32_512): Likewise.
(sseintmodesuffix): New mode_attr.
(sse): Add scalar and vector HFmodes.
(ssescalarmode): Add vector HFmode mapping.
(ssescalarmodesuffix): Add sh suffix for HFmode.
(*<sse>_vm<insn><mode>3): Use VFH_128.
(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
(*ieee_<ieee_maxmin><mode>3): Likewise.
(<avx512>_blendm<mode>): New define_insn.
(vec_setv8hf): New define_expand.
(vec_set<mode>_0): New define_insn for HF vector set.
(*avx512fp16_movsh): Likewise.
(avx512fp16_movsh): Likewise.
(vec_extract_lo_v32hi): Rename to ...
(vec_extract_lo_<mode>): ... this, and adjust to allow HF
vector modes.
(vec_extract_hi_v32hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_extract_lo_v16hi): Likewise.
(vec_extract_lo_<mode>): Likewise.
(vec_extract_hi_v16hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_set_hi_v16hi): Likewise.
(vec_set_hi_<mode>): Likewise.
(vec_set_lo_v16hi): Likewise.
(vec_set_lo_<mode>): Likewise.
(*vec_extract<mode>_0): New define_insn_and_split for HF
vector extract.
(*vec_extracthf): New define_insn.
(VEC_EXTRACT_MODE): Add HF vector modes.
(PINSR_MODE): Add V8HF.
(sse2p4_1): Likewise.
(pinsr_evex_isa): Likewise.
(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
insert for V8HFmode.
(pbroadcast_evex_isa): Add HF vector modes.
(AVX2_VEC_DUP_MODE): Likewise.
(VEC_INIT_MODE): Likewise.
(VEC_INIT_HALF_MODE): Likewise.
(avx2_pbroadcast<mode>): Adjust to support HF vector mode
broadcast.
(avx2_pbroadcast<mode>_1): Likewise.
(<avx512>_vec_dup<mode>_1): Likewise.
(<avx512>_vec_dup<mode><mask_name>): Likewise.
(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
Likewise.

AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP16_SET,
OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
(OPTION_MASK_ISA2_AVX512BW_UNSET,
OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
(ix86_handle_option): Handle -mavx512fp16.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX512FP16.
* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
* config.gcc: Add avx512fp16intrin.h.
* config/i386/avx512fp16intrin.h: New intrinsic header.
* config/i386/cpuid.h: Add bit_AVX512FP16.
* config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
* config/i386/i386-builtins.c: Support _Float16 type for i386
backend.
(ix86_register_float16_builtin_type): New function.
(ix86_float16_type_node): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512FP16__.
* config/i386/i386-expand.c (ix86_expand_branch): Support
HFmode.
(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_expand_fp_movcc): Ditto.
* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
* config/i386/i386.c (ix86_get_ssemov): Use
vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
(ix86_get_excess_precision): Use
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16 is
enabled.
(sse_store_index): Use SFmode cost for HFmode cost.
(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
GPR cost for HFmode.
(ix86_hard_regno_mode_ok): Allow HImode in sse register.
(ix86_mangle_type): Add mangling for _Float16 type.
(inline_secondary_memory_needed): No memory is needed for
16bit movement between gpr and sse reg under
TARGET_AVX512FP16.
(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_add_stmt_cost): Ditto.
(ix86_optab_supported_p): Ditto.
* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
* config/i386/i386.md (mode): Add HFmode.
(MODE_SIZE): Add HFmode.
(isa): Add avx512fp16.
(enabled): Handle avx512fp16.
(ssemodesuffix): Add sh suffix for HFmode.
(comm): Add mult, div.
(plusminusmultdiv): New code iterator.
(insn): Add mult, div.
(*movhf_internal): Adjust for avx512fp16 instruction.
(*movhi_internal): Ditto.
(*cmpi<unord>hf): New define_insn for HFmode.
(*ieee_s<ieee_maxmin>hf3): Likewise.
(extendhf<mode>2): Likewise.
(trunc<mode>hf2): Likewise.
(float<floatunssuffix><mode>hf2): Likewise.
(*<insn>hf): Likewise.
(cbranchhf4): New expander.
(movhfcc): Likewise.
(<insn>hf3): Likewise.
(mulhf3): Likewise.
(divhf3): Likewise.
* config/i386/i386.opt: Add mavx512fp16.
* config/i386/immintrin.h: Include avx512fp16intrin.h.
* doc/invoke.texi: Add mavx512fp16.
* doc/extend.texi: Add avx512fp16 Usage Notes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
* gcc.target/i386/sse-13.c: Add -mavx512fp16.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: (check_effective_target_avx512fp16): New.
* g++.target/i386/float16-1.C: New test.
* g++.target/i386/float16-2.C: Ditto.
* g++.target/i386/float16-3.C: Ditto.
* gcc.target/i386/avx512fp16-12a.c: Ditto.
* gcc.target/i386/avx512fp16-12b.c: Ditto.
* gcc.target/i386/float16-3a.c: Ditto.
* gcc.target/i386/float16-3b.c: Ditto.
* gcc.target/i386/float16-4a.c: Ditto.
* gcc.target/i386/float16-4b.c: Ditto.
* gcc.target/i386/pr54855-12.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>

Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

gcc/ada/ChangeLog:

* gcc-interface/misc.c (gnat_post_options): Issue an error for
-fexcess-precision=16.

gcc/c-family/ChangeLog:

* c-common.c (excess_precision_mode_join): Update below comments.
(c_ts18661_flt_eval_method): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

* common.opt: Support -fexcess-precision=16.
* config/aarch64/aarch64.c (aarch64_excess_precision): Return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
EXCESS_PRECISION_TYPE_FLOAT16.
* config/arm/arm.c (arm_excess_precision): Ditto.
* config/i386/i386.c (ix86_get_excess_precision): Ditto.
* config/m68k/m68k.c (m68k_excess_precision): Issue an error
when EXCESS_PRECISION_TYPE_FLOAT16.
* config/s390/s390.c (s390_excess_precision): Ditto.
* coretypes.h (enum excess_precision_type): Add
EXCESS_PRECISION_TYPE_FLOAT16.
* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
* doc/extend.texi (Half-Precision): Document
-fexcess-precision=16.
* flag-types.h (enum excess_precision): Add
EXCESS_PRECISION_FLOAT16.
* target.def (excess_precision): Update document.
* tree.c (excess_precision_type): Set excess_precision_type to
EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

* options.c (gfc_post_options): Issue an error for
-fexcess-precision=16.

gcc/testsuite/ChangeLog:

* gcc.target/i386/float16-6.c: New test.
* gcc.target/i386/float16-7.c: New test.

Adjust the wording for x86 _Float16 type.

gcc/ChangeLog:

* doc/extend.texi: (@node Floating Types): Adjust the wording.
(@node Half-Precision): Ditto.

Daily bump.

gcc: xtensa: fix PR target/102115

2021-09-07 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
PR target/102115
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
'CONST_INT_P (src)' to the condition of the block that tries to
eliminate literal when loading integer constant.

runtime: use hash32, not hash64, for amd64p32, mips64p32, mips64p32le

Fixes PR go/102102

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/348015

doc: BPF CO-RE documentation

Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.

gcc/ChangeLog:

* doc/extend.texi (BPF Type Attributes): New node.
Document new preserve_access_index attribute.
Document new preserve_access_index builtin.
* doc/invoke.texi: Document -mco-re and -mno-co-re options.

bpf testsuite: Add BPF CO-RE tests

This commit adds several tests for the new BPF CO-RE functionality to
the BPF target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-attr-1.c: New test.
* gcc.target/bpf/core-attr-2.c: Likewise.
* gcc.target/bpf/core-attr-3.c: Likewise.
* gcc.target/bpf/core-attr-4.c: Likewise.
* gcc.target/bpf/core-builtin-1.c: Likewise.
* gcc.target/bpf/core-builtin-2.c: Likewise.
* gcc.target/bpf/core-builtin-3.c: Likewise.
* gcc.target/bpf/core-section-1.c: Likewise.

bpf: BPF CO-RE support

This commit introduces support for BPF Compile Once - Run
Everywhere (CO-RE) in GCC.

gcc/ChangeLog:

* config/bpf/bpf.c: Adjust includes.
(bpf_handle_preserve_access_index_attribute): New function.
(bpf_attribute_table): Use it here.
(bpf_builtins): Add BPF_BUILTIN_PRESERVE_ACCESS_INDEX.
(bpf_option_override): Handle "-mco-re" option.
(bpf_asm_init_sections): New.
(TARGET_ASM_INIT_SECTIONS): Redefine.
(bpf_file_end): New.
(TARGET_ASM_FILE_END): Redefine.
(bpf_init_builtins): Add "__builtin_preserve_access_index".
(bpf_core_compute, bpf_core_get_index): New.
(is_attr_preserve_access): New.
(bpf_expand_builtin): Handle new builtins.
(bpf_core_newdecl, bpf_core_is_maybe_aggregate_access): New.
(bpf_core_walk): New.
(bpf_resolve_overloaded_builtin): New.
(TARGET_RESOLVE_OVERLOADED_BUILTIN): Redefine.
(handle_attr): New.
(pass_bpf_core_attr): New RTL pass.
* config/bpf/bpf-passes.def: New file.
* config/bpf/bpf-protos.h (make_pass_bpf_core_attr): New.
* config/bpf/coreout.c: New file.
* config/bpf/coreout.h: Likewise.
* config/bpf/t-bpf (TM_H): Add $(srcdir)/config/bpf/coreout.h.
(coreout.o): New rule.
(PASSES_EXTRA): Add $(srcdir)/config/bpf/bpf-passes.def.
* config.gcc (bpf): Add coreout.h to extra_headers.
Add coreout.o to extra_objs.
Add $(srcdir)/config/bpf/coreout.c to target_gtfiles.

btf: expose get_btf_id

Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
look up BTF type IDs, in order to create CO-RE relocation records.

A prototype is added in ctfc.h

gcc/ChangeLog:

* btfout.c (get_btf_id): Function is no longer static.
* ctfc.h: Expose it here.

ctfc: add function to lookup CTF ID of a TREE type

Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its TREE node. The function is exposed via
a prototype in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_lookup_tree_type): New function.
* ctfc.h: Likewise.

ctfc: externalize ctf_dtd_lookup

Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_dtd_lookup): Function is no longer static.
* ctfc.h: Analogous change.

dwarf: externalize lookup_type_die

Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.

gcc/ChangeLog:

* dwarf2out.c (lookup_type_die): Function is no longer static.
* dwarf2out.h: Expose it here.

Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c

Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions.  At the very least, always grep for ^ERROR: in
the stdout log!

With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):

...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
    while executing
"::tcl_unknown scan-tree-dump-not\" = foo {"} optimized"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

=== gcc Summary ===

# of expected passes 63740
# of unexpected failures 38
# of unexpected successes 2
# of expected failures 351
# of unresolved testcases 3
# of unsupported tests 662
x/cris-elf/gccobj/gcc/xgcc  version 12.0.0 20210907 (experimental)\
[master r12-3391-g849d5f5929fc] (GCC)

testsuite:
* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
typo in last change.

Fortran - improve error recovery determining array element from constructor

gcc/fortran/ChangeLog:

PR fortran/101327
* expr.c (find_array_element): When bounds cannot be determined as
constant, return error instead of aborting.

gcc/testsuite/ChangeLog:

PR fortran/101327
* gfortran.dg/pr101327.f90: New test.

dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.

The approach taken here in this patch is:

1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise a way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.

2. The BPF backend updates the write_symbols with BPF_WITH_CORE_DEBUG to convey
the case that BTF with CO-RE support needs to be generated.  This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.

So, in other words,

dwarf2out_early_finish
  - Always emit CTF here.
  - if (BTF && !BTF_WITH_CORE), emit BTF now.

dwarf2out_finish
  - if (BTF_WITH_CORE) emit BTF now.
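
The split above can be modelled as a small decision function.  This is only
a sketch: struct debug_request, enum phase, emits_ctf and emits_btf are
hypothetical names for illustration and do not exist in GCC, and it assumes
that btf_with_core is only ever set together with btf.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the debug formats requested by the user.  */
struct debug_request
{
  bool ctf;
  bool btf;
  bool btf_with_core;	/* Assumed to imply btf.  */
};

/* The two emission points named in the description above.  */
enum phase
{
  PHASE_EARLY_FINISH,	/* dwarf2out_early_finish.  */
  PHASE_FINISH		/* dwarf2out_finish.  */
};

/* CTF, when requested, is always emitted in the early phase.  */
bool
emits_ctf (const struct debug_request *req, enum phase p)
{
  assert (req);
  return req->ctf && p == PHASE_EARLY_FINISH;
}

/* Plain BTF is emitted early; BTF with CO-RE is deferred to the
   late phase so that CO-RE relocations can be included.  */
bool
emits_btf (const struct debug_request *req, enum phase p)
{
  assert (req);
  if (!req->btf)
    return false;
  if (req->btf_with_core)
    return p == PHASE_FINISH;
  return p == PHASE_EARLY_FINISH;
}
```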

gcc/ChangeLog:

* dwarf2ctf.c (ctf_debug_finalize): Make it static.
(ctf_debug_early_finish): New definition.
(ctf_debug_finish): Likewise.
* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
(ctf_debug_early_finish): New declaration.
(ctf_debug_finish): Likewise.
* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
(dwarf2out_early_finish): Invoke ctf_debug_early_finish.

bpf: Add new -mco-re option for BPF CO-RE

-mco-re in the BPF backend enables code generation for the CO-RE usecase. LTO is
disabled for CO-RE compilations.

gcc/ChangeLog:

* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mco-re.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-lto-1.c: New test.

debug: Add BTF_WITH_CORE_DEBUG debug format

To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added. This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.

gcc/ChangeLog:

* flag-types.h (enum debug_info_type): Add new enum
DINFO_TYPE_BTF_WITH_CORE.
(BTF_WITH_CORE_DEBUG): New bitmask.
* flags.h (btf_with_core_debuginfo_p): New declaration.
* opts.c (btf_with_core_debuginfo_p): New definition.

tree: Change error_operand_p to an inline function

I've thought for a while that many of the macros in tree.h and such should
become inline functions. This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.
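
A rough illustration of the difference, using a heavily simplified and
hypothetical node type (struct node, error_mark, ERROR_OPERAND_P_MACRO and
error_operand_p_fn are stand-ins written for this sketch, not GCC's actual
definitions).  With the macro, the null test is expanded textually at every
use site, which is what a static analyzer sees; with the inline function it
stays inside the callee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a tree node.  */
struct node
{
  struct node *type;
};

/* Sentinel playing the role of an error marker in this sketch.  */
struct node error_mark;

/* Macro form: the argument is duplicated and the null test becomes
   part of the caller's own code at each expansion.  */
#define ERROR_OPERAND_P_MACRO(n) \
  ((n) == &error_mark || ((n) != NULL && (n)->type == &error_mark))

/* Inline-function form: same logic, but the null test is no longer
   visible in the caller.  */
static inline bool
error_operand_p_fn (const struct node *n)
{
  return n == &error_mark || (n != NULL && n->type == &error_mark);
}

/* Returns the common answer, checking that both forms agree.  */
bool
checked_error_operand_p (const struct node *n)
{
  bool via_macro = ERROR_OPERAND_P_MACRO (n);
  bool via_fn = error_operand_p_fn (n);
  assert (via_macro == via_fn);
  return via_fn;
}
```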

gcc/ChangeLog:

* tree.h (error_operand_p): Change to inline function.

c++: Fix up constexpr evaluation of deleting dtors [PR100495]

We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr, similarly complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine,
deleting dtors while marked as clones in fact are just artificial functions
with synthetized body which calls the user destructor and deallocation.

So, either we'd need to evaluate the destructor and afterwards synthetize
and evaluate the deallocation, or we can just save and use the deleting
dtors bodies. The latter seems much easier to me.

2021-09-07 Jakub Jelinek <jakub@redhat.com>

PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.

* g++.dg/cpp2a/constexpr-new21.C: New test.

libgomp.texi: Extend OpenMP 5.0 Implementation Status

libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
OpenMP 5.0 section.
(OpenACC Profiling Interface): Fix typo.

Rename forwarder_block_p in threading code to empty_block_with_phis_p.

gcc/ChangeLog:

* tree-ssa-threadedge.c (forwarder_block_p): Rename to...
(empty_block_with_phis_p): ...this.
(potentially_threadable_block): Same.
(jump_threader::thread_through_normal_block): Same.

libgfortran: Makefile fix for ISO_Fortran_binding.h

libgfortran/ChangeLog:

* Makefile.am (gfor_built_src): Depend on
include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
(ISO_Fortran_binding.h): Rename make target to ...
(include/ISO_Fortran_binding.h): ... this.
* Makefile.in: Regenerate.

Fix PR debug/101947

This is the recent LTO bootstrap failure with Ada enabled. The compiler now
generates DW_OP_deref_type for a unit of the Ada front-end, which means that
the offset of base types in the CU must be computed during early DWARF too.

gcc/
PR debug/101947
* dwarf2out.c (mark_base_types): New overloaded function.
(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
as the compilation unit, and call move_marked_base_types afterward.

x86: Enable FMA in unsigned SI to SF expanders

Enable FMA in scalar/vector unsigned SI to SF expanders. Don't check
TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.

gcc/

PR target/85819
* config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
Enable FMA.
(ix86_expand_vector_convert_uns_vsivsf): Likewise.

gcc/testsuite/

PR target/85819
* gcc.target/i386/pr85819-1a.c: New test.
* gcc.target/i386/pr85819-1b.c: Likewise.
* gcc.target/i386/pr85819-2a.c: Likewise.
* gcc.target/i386/pr85819-2b.c: Likewise.
* gcc.target/i386/pr85819-2c.c: Likewise.
* gcc.target/i386/pr85819-3.c: Likewise.

tree-optimization/102226 - fix epilogue vector re-use

This fixes re-use of the reduction value in epilogue vectorization
when a conversion from/to variable-length vectors is required.

2021-09-07 Richard Biener <rguenther@suse.de>

PR tree-optimization/102226
* tree-vect-loop.c (vect_transform_cycle_phi): Record
the converted value for the epilogue PHI use.

* g++.dg/vect/pr102226.cc: New testcase.

C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct.

This patch adds support for the 'seq_cst' memory order clause on the 'flush'
directive which was introduced in OpenMP 5.1.

gcc/c-family/ChangeLog:

* c-omp.c (c_finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* semantics.c (finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* trans-openmp.c (gfc_trans_omp_flush): Handle OMP_MEMORDER_SEQ_CST.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
* c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
* g++.dg/gomp/attrs-1.C: Adapt test to handle all flush clauses.
* g++.dg/gomp/attrs-2.C: Adapt test to handle all flush clauses.
* gfortran.dg/gomp/flush-1.f90: Add test case for 'seq_cst'.
* gfortran.dg/gomp/flush-2.f90: Add test case for 'seq_cst'.

inline: do not einline when no_profile_instrument_function is different

PR gcov-profile/80223

gcc/ChangeLog:

* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
options, do not inline when no_profile_instrument_function
attributes are different in early inliner. It's fine to inline
it after PGO instrumentation.

gcc/testsuite/ChangeLog:

* gcc.dg/no_profile_instrument_function-attr-2.c: New test.

tree-optimization/101555 - avoid redundant alias queries in PRE

This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the blocks
virtual PHI node. All the invalidation work is already done by
prune_clobbered_mems.

This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.

2021-09-07 Richard Biener <rguenther@suse.de>

PR tree-optimization/101555
* tree-ssa-pre.c (translate_vuse_through_block): Do not
perform an alias walk to determine the validity of the
mem at the start of the block which is already guaranteed
by means of prune_clobbered_mems.
(phi_translate_1): Pass edge to translate_vuse_through_block.

libgomp.texi: Add OpenMP Implementation Status

libgomp/
* libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
not to 4.5; link to new section.
(OpenMP Implementation Status): New.

Fortran: Revert to non-multilib-specific ISO_Fortran_binding.h

Commit fef67987cf502fe322e92ddce22eea7ac46b4d75 changed the
libgfortran build process to generate multilib-specific versions of
ISO_Fortran_binding.h from a template, by running gfortran to identify
the values of the Fortran kind constants C_LONG_DOUBLE, C_FLOAT128,
and C_INT128_T. This caused multiple problems with search paths, both
for build-tree testing and installed-tree use, not all of which have
been fixed.

This patch reverts to a non-multilib-specific .h file that uses GCC's
predefined preprocessor symbols to detect the supported types and map
them to kind values in the same way as the Fortran front end.

2021-09-06 Sandra Loosemore <sandra@codesourcery.com>

libgfortran/
* ISO_Fortran_binding-1-tmpl.h: Deleted.
* ISO_Fortran_binding-2-tmpl.h: Deleted.
* ISO_Fortran_binding-3-tmpl.h: Deleted.
* ISO_Fortran_binding.h: New file to replace the above.
* Makefile.am (gfor_cdir): Remove MULTISUBDIR.
(ISO_Fortran_binding.h): Simplify to just copy the file.
* Makefile.in: Regenerated.
* mk-kinds-h.sh: Revert pieces no longer needed for
ISO_Fortran_binding.h.

rs6000: Expand fmod and remainder when built with fast-math [PR97142]

fmod/fmodf and remainder/remainderf can be expanded inline instead of
calling the library when building with fast-math, which is much faster.

fmodf:
     fdivs   f0,f1,f2
     friz    f0,f0
     fnmsubs f1,f2,f0,f1

remainderf:
     fdivs   f0,f1,f2
     frin    f0,f0
     fnmsubs f1,f2,f0,f1
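
In C terms the two sequences amount to roughly the following.  This is a
sketch only: fmodf_fast, remainderf_fast and the two rounding helpers are
hypothetical names, the rounding is done through an int cast so it is only
valid for small quotients, and it assumes that friz rounds toward zero and
frin rounds to nearest with ties away from zero.  Like the fast-math
expansion, it ignores NaNs, infinities and the exactness guarantees of the
library functions.

```c
#include <assert.h>

/* Round Q toward zero (models friz); valid only while Q fits an int.  */
static float
round_toward_zero (float q)
{
  assert (q > -1.0e9f && q < 1.0e9f);
  return (float) (int) q;
}

/* Round Q to nearest, ties away from zero (models frin); same range
   restriction, and simplified for values very close to a tie.  */
static float
round_to_nearest (float q)
{
  assert (q > -1.0e9f && q < 1.0e9f);
  return (float) (int) (q < 0.0f ? q - 0.5f : q + 0.5f);
}

/* fdivs + friz + fnmsubs: x - y * trunc (x / y).  */
float
fmodf_fast (float x, float y)
{
  float q = round_toward_zero (x / y);
  return x - y * q;
}

/* fdivs + frin + fnmsubs: x - y * round (x / y).  */
float
remainderf_fast (float x, float y)
{
  float q = round_to_nearest (x / y);
  return x - y * q;
}
```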

SPEC2017 Ofast P8LE: 511.povray_r +1.14%,  526.blender_r +1.72%

gcc/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/97142
* gcc.target/powerpc/pr97142.c: New test.

MIPS: add .module arch and ase to all output asm

Currently, the asm output file for MIPS carries no arch revision info.
This can cause trouble, for example:

  assembler is mips1 by default,
  gcc is fpxx by default.

To assemble the output of gcc -S, we have to pass -mips2
to the assembler.

The same applies to CPUs that have extension insns.
Octeon is an example.
So we can just add ".set arch=octeon".

If an ASE is enabled, .module ase will also be used.

gcc/ChangeLog:
* config/mips/mips.c (mips_file_start): add .module for
  arch and ase.

Daily bump.

Correct implementation of wi::clz

As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT. The fix is
simply to reorder/shuffle the existing tests.
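
As a rough single-word model of the intended behaviour (not the
wide-int.cc code; clz_model and its representation are assumptions made
for this sketch), checking for a negative value first gives zero
whatever the precision is:

```c
#include <stdint.h>

/* Hypothetical model: `val` holds a `precision`-bit value sign-extended
   to 64 bits, with 1 <= precision <= 64. */
static int clz_model(int64_t val, unsigned precision)
{
    if (val < 0)
        return 0;               /* top bit of the precision is set */
    if (val == 0)
        return (int)precision;  /* every bit is zero */
    /* Count in 64 bits, then drop the bits above the precision. */
    return __builtin_clzll((unsigned long long)val) - (int)(64 - precision);
}
```

Here clz_model (-1, 8) is 0 even though 8 is not a multiple of 64,
because the sign test comes before any counting.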

2021-09-06 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* wide-int.cc (wi::clz): Reorder tests to ensure the result
is zero for all negative values.

invoke.texi: Fix @opindex for -foffload-options

gcc/
* doc/invoke.texi (-foffload-options): Fix @opindex.

gcc_update: use human readable name for revision string in gcc/REVISION

contrib/ChangeLog:

* gcc_update: Derive human readable name for HEAD using git describe
like "git gcc-descr" with short commit hash. Drop "revision" from
gcc/REVISION.

x86: Add non-destructive source to @xorsign<mode>3_1

Add non-destructive source alternative to @xorsign<mode>3_1 for AVX.

gcc/

PR target/89984
* config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2].
* config/i386/i386.md (@xorsign<mode>3_1): Add non-destructive
source alternative for AVX.

gcc/testsuite/

PR target/89984
* gcc.target/i386/pr89984-1.c: New test.
* gcc.target/i386/pr89984-2.c: Likewise.
* gcc.target/i386/xorsign-avx.c: Likewise.

Avoid FROM being overwritten in expand_fix.

For the conversion from _Float16 to int, if the corresponding optab
does not exist, the compiler will try the wider mode (SFmode here),
but when floatsfsi exists but FAILs, FROM will be overwritten, which
leads to the runtime error in the PR.

gcc/ChangeLog:

PR middle-end/102182
* optabs.c (expand_fix): Add from1 to avoid from being
overwritten.

gcc/testsuite/ChangeLog:

PR middle-end/102182
* gcc.target/i386/pr101282.c: New test.

'libgomp.c/target-43.c': '-latomic' for nvptx offloading

... to avoid a regression with recent
commit 090f0d78f194e3cda23fe904016db77ea36c38fa
"openmp: Improve expand_omp_atomic_pipeline":

    unresolved symbol __atomic_compare_exchange_1
    collect2: error: ld returned 1 exit status
    mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status

libgomp/
* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.

Fix debug info for packed array types in Ada

Packed array types are sometimes represented with integer types under the
hood in Ada, but we nevertheless need to emit them as array types in the
debug info so we have the types.get_array_descr_info langhook for this
purpose; but it is not invoked from modified_type_die, which causes:

FAIL: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all
FAIL: gdb.ada/arrayptr.exp: scenario=minimal: print pa_ptr.all(3)

in the GDB testsuite.

gcc/
* dwarf2out.c (modified_type_die): Deal with all array types earlier
and use local variable consistently throughout the function.

match.pd: Fix up __builtin_*_overflow arg demotion [PR102207]

My earlier patch to demote arguments of __builtin_*_overflow unfortunately
caused a wrong-code regression. The builtins operate on infinite precision
arguments, outer_prec > inner_prec signed -> signed, unsigned -> unsigned
promotions there are just repeating the sign or 0s and can be demoted,
similarly unsigned -> signed which also is repeating 0s, but as the
testcase shows, signed -> unsigned promotions need to be preserved (unless
we'd know the inner arguments can't be negative), because for negative
numbers such promotion sets the outer_prec -> inner_prec bits to 1 but the
bits above that to 0 in the infinite precision.

So, the following patch avoids the demotions for the signed -> unsigned
promotions.
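
A small C illustration of why the signed -> unsigned promotion has to
stay (this is not the testcase from the PR; the function names are made
up, and int is assumed to be narrower than unsigned long long):

```c
#include <stdbool.h>

/* The argument is promoted from signed int to a wider unsigned type. */
static bool add_promoted(int a, unsigned long long *res)
{
    return __builtin_add_overflow((unsigned long long)a, 1ULL, res);
}

/* What an incorrect demotion would compute: the cast is dropped. */
static bool add_demoted(int a, unsigned long long *res)
{
    return __builtin_add_overflow(a, 1, res);
}
```

For a = -1 the promoted operand is the largest unsigned long long value,
so adding 1 overflows the result type and add_promoted returns true.
With the cast dropped, the infinite precision sum is 0, which fits, so
add_demoted returns false.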

2021-09-06 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/102207
* match.pd: Don't demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
were promoted from signed to wider unsigned type.

* gcc.dg/pr102207.c: New test.

Fix PR tree-optimization/63184: add simplification of (& + A) != (& + B)

These two testcases have been failing since GCC 5 but things
have improved such that adding a simplification to match.pd
for this case is easier than before.
In the end we have the following IR:
....
  _5 = &a[1] + _4;
  _7 = &a + _13;
  if (_5 != _7)

So we can fold the _5 != _7 into:
(&a[1] - &a) + _4 != _13

The subtraction is folded into constant by ptr_difference_const.
In this case, the full expression gets folded into a constant
and we are able to remove the if statement.
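
The equivalence being used can be sketched in C as follows.  The array,
the byte offsets and both function names are assumptions for the sketch;
the real transformation is done on GIMPLE in match.pd, not on source
code.

```c
#include <stddef.h>

static int a[8];

/* Original form: compare two addresses built from &a[1] and &a. */
static int ne_pointers(size_t off1, size_t off2)
{
    return ((char *)&a[1] + off1) != ((char *)a + off2);
}

/* Folded form: &a[1] - &a is the constant sizeof (int), so only the
   offsets remain to be compared. */
static int ne_offsets(size_t off1, size_t off2)
{
    return sizeof(int) + off1 != off2;
}
```

The two functions agree for all offsets that keep both pointers inside
the array (or one past its end).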

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/63184
* match.pd: Add simplification of pointer_diff of two pointer_plus
with addr_expr in the first operand of each pointer_plus.
Add simplification of ne/eq of two pointer_plus with addr_expr
in the first operand of each pointer_plus.

gcc/testsuite/ChangeLog:

PR tree-optimization/63184
* c-c++-common/pr19807-2.c: Enable for all targets and remove the xfail.
* c-c++-common/pr19807-3.c: Likewise.

Explicitly add -msse2 to compile HF related libgcc source file.

For a 32-bit libgcc configured w/o sse2, there would be an error since
GCC only supports _Float16 under sse2. Explicitly add -msse2 for those
HF related libgcc functions, so users can still link them w/ the
above configuration.

libgcc/ChangeLog:

* Makefile.in: Adjust to support specific CFLAGS for each
libgcc source file.
* config/i386/64/t-softfp: Explicitly add -msse2 for HF
related libgcc source files.
* config/i386/t-softfp: Ditto.
* config/i386/_divhc3.c: New file.
* config/i386/_mulhc3.c: New file.

tree-optimization/102176 - locally compute participating SLP stmts

This performs local re-computation of participating scalar stmts
in BB vectorization subgraphs to allow precise computation of
liveness of scalar stmts after vectorization and thus precise
costing. This treats all extern defs as live but continues
to optimistically handle scalar defs that we think we can handle
by lane-extraction even though that can still fail late during
code-generation.

2021-09-02 Richard Biener <rguenther@suse.de>

PR tree-optimization/102176
* tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
New function.
(vect_bb_slp_scalar_cost): Use the computed set of
vectorized scalar stmts instead of relying on the out-of-date
and not accurate PURE_SLP_STMT.
(vect_bb_vectorization_profitable_p): Compute the set
of vectorized scalar stmts.

Daily bump.

libgo: update to final Go 1.17 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/343729

Make the path solver's range_of_stmt() handle all statements.

The path solver's range_of_stmt() was handcuffed to only fold
GIMPLE_COND statements, since those were the only statements the
backward threader needed to resolve. However, there is no need for this
restriction, as the folding code is perfectly capable of folding any
statement.

This can be the case when trying to fold other statements in the final
block of a path (for instance, in the forward threader as it tries to
fold candidate statements along a path).

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::range_of_stmt): Remove
GIMPLE_COND special casing.
(path_range_query::range_defined_in_block): Use range_of_stmt
instead of calling fold_range directly.

Add an unreachable_path_p method to path_range_query.

Keeping track of unreachable calculations while traversing a path is
useful to determine edge reachability, among other things.  We've been
doing this ad-hoc in the backwards threader, so this provides a cleaner
way of accessing the information.

This patch also makes it easier to compare different threading
implementations, in some upcoming work.  For example, it's currently
difficult to gauge how well we're doing compared to the forward threader,
because it can thread paths that are obviously unreachable.  This
provides a way of discarding those paths.

Note that I've opted to keep unreachable_path_p() out-of-line, because I
have local changes that will enhance this method.

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::range_of_expr): Set
m_undefined_path when appropriate.
(path_range_query::internal_range_of_expr): Copy from range_of_expr.
(path_range_query::unreachable_path_p): New.
(path_range_query::precompute_ranges): Set m_undefined_path.
* gimple-range-path.h (path_range_query::unreachable_path_p): New.
(path_range_query::internal_range_of_expr): New.
* tree-ssa-threadbackward.c (back_threader::find_taken_edge_cond):
Use unreachable_path_p.

Clean up registering of paths in backwards threader.

All callers to maybe_register_path() call find_taken_edge() beforehand
and pass the edge as an argument. There's no reason to repeat this
at each call site.

This is a clean-up in preparation for some other enhancements to the
backwards threader.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
Remove argument and call find_taken_edge.
(back_threader::resolve_phi): Do not calculate taken edge before
calling maybe_register_path.
(back_threader::find_paths_to_names): Same.

Improve handling of C bit for setcc insns

gcc/
* config/h8300/h8300.md (QHSI2 mode iterator): New mode iterator.
* config/h8300/testcompare.md (store_c): Update name, use new
QHSI2 iterator.
(store_neg_c, store_shifted_c): New patterns.

Daily bump.

rs6000: Don't use r12 for CR save on ELFv2 (PR102107)

CR is saved and/or restored on some paths where GPR12 is already live
since it has a meaning in the calling convention in the ELFv2 ABI.

It is not completely clear to me that we can always use r11 here, but
it does seem safe, there is checking code (to detect conflicts here),
and it is stage 1. So here goes.

2021-09-03 Segher Boessenkool <segher@kernel.crashing.org>

PR target/102107
* config/rs6000/rs6000-logue.c (rs6000_emit_prologue): On ELFv2 use r11
instead of r12 for CR save, in all cases.

coroutines: Support for debugging implementation state.

Some of the state that is associated with the implementation
is of interest to a user debugging a coroutine.  In particular
items such as the suspend point, promise object, and current
suspend point.

These variables live in the coroutine frame, but we can inject
proxies for them into the outermost bind expression of the
coroutine.  Such variables are automatically moved into the
coroutine frame (if they need to persist across a suspend
expression).  Placing the proxies thus allows the user to
inspect them by name in the debugger.

To implement this, we ensure that (at the outermost scope) the
frame entries are not mangled (coroutine frame variables are
usually mangled with scope nesting information so that they do
not clash).  We can safely avoid doing this for the outermost
scope so that we can map frame entries directly to the variables.

This is partial contribution to debug support (PR 99215).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (register_local_var_uses): Do not mangle
frame entries for the outermost scope.  Record the outer
scope as nesting depth 0.

coroutines: Add a helper for creating local vars.

This is primarily code factoring, but we take this opportunity
to rename some of the implementation variables (which we intend
to expose to debugging) so that they are in the implementation
namespace.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (coro_build_artificial_var): New.
(build_actor_fn): Use var builder, rename vars to use
implementation namespace.
(coro_rewrite_function_body): Likewise.
(morph_fn_to_coro): Likewise.

coroutines: Use DECL_VALUE_EXPR instead of rewriting vars.

Variables that need to persist over suspension expressions
must be preserved by being copied into the coroutine frame.

The initial implementations do this manually in the transform
code. However, that has various disadvantages - including
that the debug connections are lost between the original var
and the frame copy.

The revised implementation makes use of DECL_VALUE_EXPRs to
contain the frame offset expressions, so that the original
var names are preserved in the code.

This process is also applied to the function parms which are
always copied to the frame. In this case the decls need to be
copied since they are used in two different contexts during
the re-write (in the building of the ramp function, and in
the actor function itself).

This will assist in improvement of debugging (PR 99215).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (transform_local_var_uses): Record
frame offset expressions as DECL_VALUE_EXPRs instead of
rewriting them.

Fix target/102173 ICE after error recovery

After the recent r12-3278-823685221de986a change, the testcase
gcc.target/aarch64/sve/acle/general-c/type_redef_1.c started
to ICE as the code was not ready for error_mark_node in the
type. This fixes that and the testcase now passes.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_vector_type):
Handle error_mark_node as the type of the type_decl.

Fix some GC issues in the aarch64 back-end.

I got some ICEs in my latest testing while running the libstdc++ testsuite.
I had noticed the problem was connected to types and had just touched the
builtins code but nothing which could have caused this and I looked for
some types/variables that were not being marked with GTY.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (struct aarch64_simd_type_info):
Mark with GTY.
(aarch64_simd_types): Likewise.
(aarch64_simd_intOI_type_node): Likewise.
(aarch64_simd_intCI_type_node): Likewise.
(aarch64_simd_intXI_type_node): Likewise.
* config/aarch64/aarch64.h (aarch64_fp16_type_node): Likewise.
(aarch64_fp16_ptr_type_node): Likewise.
(aarch64_bf16_type_node): Likewise.
(aarch64_bf16_ptr_type_node): Likewise.

Implement POINTER_DIFF_EXPR entry in range-op.

I've seen cases in the upcoming jump threader enhancements where we see
a difference of two pointers that are known to be equivalent, and yet we
fail to return 0 for the range. This is because we have no working
range-op entry for POINTER_DIFF_EXPR. The entry we currently have is
a mere placeholder to avoid ignoring POINTER_DIFF_EXPR's so
adjust_pointer_diff_expr() could get a whack at it here:

// def = __builtin_memchr (arg, 0, sz)
// n = def - arg
//
// The range for N can be narrowed to [0, PTRDIFF_MAX - 1].

This patch adds the relational magic to range-op, which we can just
steal from the minus_expr code.
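
Two source-level shapes that end up as a POINTER_DIFF_EXPR, written as
a hedged C sketch (the function names are invented for illustration and
do not come from the patch):

```c
#include <stddef.h>
#include <string.h>

/* Offset of the first NUL byte in the first sz bytes of arg, or -1. */
static ptrdiff_t nul_offset(const char *arg, size_t sz)
{
    const char *def = memchr(arg, 0, sz);
    if (def == NULL)
        return -1;
    return def - arg;   /* pointer difference, never negative here */
}

/* Both operands are known to be the same pointer, so the difference
   is always 0; this is the case the relational code can now fold. */
static ptrdiff_t same_pointer_diff(const char *p)
{
    const char *q = p;
    return q - p;
}
```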

gcc/ChangeLog:

* range-op.cc (operator_minus::op1_op2_relation_effect): Abstract
out to...
(minus_op1_op2_relation_effect): ...here.
(class operator_pointer_diff): New.
(operator_pointer_diff::op1_op2_relation_effect): Call
minus_op1_op2_relation_effect.
(integral_table::integral_table): Add entry for POINTER_DIFF_EXPR.

c++: shortcut bad convs during overload resolution [PR101904]

In the context of overload resolution we have the notion of a "bad"
argument conversion, which is a conversion that "would be permitted
with a bending of the language standards", and we handle such bad
conversions specially.  In particular, we rank a bad conversion as
better than no conversion but worse than a good conversion, and a bad
conversion doesn't necessarily make a candidate unviable.  With the
flag -fpermissive, we permit the situation where overload resolution
selects a candidate that contains a bad conversion (which we call a
non-strictly viable candidate).  And without the flag, the caller
of overload resolution usually issues a distinct permerror in this
situation instead.

One consequence of this de facto behavior is that in order to distinguish
a non-strictly viable candidate from an unviable candidate, if we
encounter a bad argument conversion during overload resolution we must
keep converting subsequent arguments because a subsequent conversion
could render the candidate unviable instead of just non-strictly viable.
But checking subsequent arguments can force template instantiations and
result in otherwise avoidable hard errors.  And in particular, all
'this' conversions are at worst bad, so this means the const/ref-qualifiers
of a member function can't be used to prune a candidate quickly, which
is the subject of the mentioned PR.

This patch tries to improve the situation without changing the de facto
output of add_candidates.  Specifically, when considering a candidate
during overload resolution this patch makes us shortcut argument
conversion checking upon encountering the first bad conversion
(tentatively marking the candidate as non-strictly viable, though it
could ultimately be unviable) under the assumption that we'll eventually
find a strictly viable candidate anyway (which renders moot the
distinction between non-strictly viable and unviable, since both are
worse than a strictly viable candidate).  If this assumption turns out
to be false, we'll fully reconsider the candidate under the de facto
behavior (without the shortcutting) so that all its conversions are
computed.

So in the best case (there's a strictly viable candidate), we avoid
some argument conversions and/or template argument deduction that may
cause a hard error.  In the worst case (there's no such candidate), we
have to redundantly consider some candidates twice.  (In a previous
version of the patch, to avoid this redundant checking I created a new
"deferred" conversion type that represents a conversion that is yet to
be computed, and instead of reconsidering a candidate I just realized
its deferred conversions.  But it doesn't seem this redundancy is a
significant performance issue to justify the added complexity of this
other approach.)

PR c++/101904

gcc/cp/ChangeLog:

* call.c (build_this_conversion): New function, split out from
add_function_candidate.
(add_function_candidate): New parameter shortcut_bad_convs.
Document it.  Use build_this_conversion.  Stop at the first bad
argument conversion when shortcut_bad_convs is true.
(add_template_candidate_real): New parameter shortcut_bad_convs.
Use build_this_conversion to check the 'this' conversion before
attempting deduction.  When the rejection reason code is
rr_bad_arg_conversion, pass -1 instead of 0 as the viable
parameter to add_candidate.  Pass 'convs' to add_candidate.
(add_template_candidate): New parameter shortcut_bad_convs.
(add_template_conv_candidate): Pass false as shortcut_bad_convs
to add_template_candidate_real.
(add_candidates): Prefer to shortcut bad conversions during
overload resolution under the assumption that we'll eventually
see a strictly viable candidate.  If this assumption turns out
to be false, re-process the non-strictly viable candidates
without shortcutting those bad conversions.

gcc/testsuite/ChangeLog:

* g++.dg/template/conv17.C: New test.

libgcc, soft-float: Fix strong_alias macro use for Darwin.

Darwin does not support strong symbol aliases and a work-
around is provided in sfp-machine.h where a second function
is created that simply calls the original. However this
needs the arguments to the synthesized function to track
the mode of the original function.

So the fix here is to match known floating point modes from
the incoming function and apply the one found to the new
function args.

The matching is highly specific to the current set of modes
and will need adjusting should more cases be added.
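
The idea can be sketched as below.  This is not the sfp-machine.h code:
cmp_t, ARG_SELECTOR, WRAPPER_ALIAS and the lt_* functions are
hypothetical names, and only float and double are handled here.  The
sketch relies on the GNU __typeof__ extension together with C11
_Generic.

```c
typedef int cmp_t;

/* Map the type of a comparison function to a value of its argument
   type.  Only the modes listed here are recognised. */
#define ARG_SELECTOR \
    cmp_t (*)(float, float): (float)0, \
    cmp_t (*)(double, double): (double)0

/* Emit `aliasname` as a second function that simply calls `name`,
   with parameters whose type tracks that of `name`. */
#define WRAPPER_ALIAS(name, aliasname) \
    cmp_t aliasname(__typeof__(_Generic(name, ARG_SELECTOR)) x, \
                    __typeof__(_Generic(name, ARG_SELECTOR)) y) \
    { return name(x, y); }

static cmp_t lt_sf(float x, float y) { return x < y; }
static cmp_t lt_df(double x, double y) { return x < y; }

WRAPPER_ALIAS(lt_sf, lt_sf_alias)
WRAPPER_ALIAS(lt_df, lt_df_alias)
```

If the wrapper for lt_df took float arguments instead, values that
differ only beyond float precision would compare as equal, which is the
kind of mismatch the mode matching avoids.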

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/i386/sfp-machine.h (alias_HFtype, alias_SFtype,
alias_DFtype, alias_TFtype): New.
(ALIAS_SELECTOR): New.
(strong_alias): Use __typeof and a _Generic selector to
provide the type to the synthesized function.

Do not assume loop header threading in backward threader.

The registry's thread_through_all_blocks() has a may_peel_loop_headers
argument.  When refactoring the backward threader code, I removed this
argument for the local passthru method because it was always TRUE.  This
may not necessarily be true in the future, if the backward threader is
called from another context.  This patch removes the default definition,
in favor of an argument that is exactly the same as the identically
named function in tree-ssa-threadupdate.c.  I think this also makes it
less confusing when looking at both methods across the source base.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::thread_through_all_blocks):
Add may_peel_loop_headers.
(back_threader_registry::thread_through_all_blocks): Same.
(try_thread_blocks): Pass may_peel_loop_headers argument.
(pass_early_thread_jumps::execute): Same.

Abstract PHI and forwarder block checks in jump threader.

This patch abstracts out a couple common idioms in the forward
threader that I found useful while navigating the code base.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadedge.c (has_phis_p): New.
(forwarder_block_p): New.
(potentially_threadable_block): Call forwarder_block_p.
(jump_threader::thread_around_empty_blocks): Call has_phis_p.
(jump_threader::thread_through_normal_block): Call
forwarder_block_p.

Improve backwards threader debugging dumps.

This patch adds debugging helpers to the backwards threader.  I have
also noticed that profitable_path_p() can bail early on paths that
cross loops and leave the dump of blocks incomplete.  Fixed as
well.

Unfortunately the new methods cannot be marked const, because we call
the solver's dump which is not const.  I believe this was because the
ranger dump calls m_cache.block_range().  This could probably use a
cleanup at a later time.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::dump): New.
(back_threader::debug): New.
(back_threader_profitability::profitable_path_p): Dump blocks
even if we are bailing early.

Dump reason why threads are being cancelled and abstract code.

We are inconsistent on dumping out reasons why a thread was canceled.
This makes debugging jump threading problems harder because paths can be
canceled with no reason given. This patch abstracts out the thread
canceling code and adds a reason for every cancellation.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadupdate.c (cancel_thread): New.
(jump_thread_path_registry::thread_block_1): Use cancel_thread.
(jump_thread_path_registry::mark_threaded_blocks): Same.
(jump_thread_path_registry::register_jump_thread): Same.

c++: Avoid bogus -Wunused with recent change

My change to make limit_bad_template_recursion avoid instantiating members
of erroneous classes produced a bogus "used but not defined" warning for
23_containers/unordered_set/instantiation_neg.cc; it's not defined because
we decided not to instantiate it. So we need to suppress that warning.

gcc/cp/ChangeLog:

* pt.c (limit_bad_template_recursion): Suppress -Wunused for decls
we decide not to instantiate.