review.tizen.org Git - platform/upstream/gcc.git/log

Output vextract{i,f}{32x4,64x2} for (vec_select:(reg:Vmode) idx) when byte_offset of idx % 16 == 0.

2020-09-13 Hongtao Liu <hongtao.liu@intel.com>
Peter Cordes <peter@cordes.ca>
gcc/ChangeLog:

PR target/91103
* config/i386/sse.md (extract_suf): Add V8SF/V8SI/V4DF/V4DI.
(*vec_extract<mode><ssescalarmodelower>_valign): Output
vextract{i,f}{32x4,64x2} instruction when byte_offset % 16 ==
0.

gcc/testsuite/ChangeLog:

PR target/91103
* gcc.target/i386/pr91103-1.c: Add extract tests.
* gcc.target/i386/pr91103-2.c: Ditto.

Add OpenACC 'host_data' testing to 'gfortran.dg/goacc/unexpected-end.f90'

Use underscore instead of space in 'host_data'.

Follow-up to recent commit 33fdbbe4ce6055eb858096d01720ccf94aa854ec
"Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]".

gcc/testsuite/
* gfortran.dg/goacc/unexpected-end.f90: Add OpenACC 'host_data'
testing.

Remove support for vax-openbsd

This removes the support for vax-openbsd which has been discontinued
after the OpenBSD 5.9 release and which has no supported gas or GNU ld
configuration [anymore]. In particular this target does only support
STABS debuginfo generation.

2021-09-13 Richard Biener <rguenther@suse.de>

* config.gcc: Remove vax-*-openbsd* configuration.

contrib/
* config-list.mk: Remove vax-openbsd.

Remove m68k-openbsd support

This removes m68k-openbsd as a valid configuration, according
to openbsd.org m68k-openbsd [on the mac] was discontinued after
the 5.1 release.  The configuration is also not (or no longer)
supported by gas and GNU ld so I could not figure whether it is still
a.out (I suspect it is).  But first and foremost the target only supports
STABS as a debugging format.

2021-09-13  Richard Biener  <rguenther@suse.de>

* config.gcc: Remove m68k-openbsd.

contrib/
* config-list.mk: Remove m68k-openbsd.

c++: don't predeclare std::type_info [PR48396]

We've always predeclared std::type_info, which has been wrong for a while,
but now with modules it becomes more of a practical problem, if we want to
declare it in the purview of a module. So don't predeclare it. For
building up the type_info information to write out with the vtable, we can
use void* instead of type_info*, since they already aren't the real types.

PR c++/48396

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Remove CPTI_TYPE_INFO_PTR_TYPE.
(type_info_ptr_type): Remove.
* rtti.c (init_rtti_processing): Don't predeclare std::type_info.
(typeid_ok_p): Check for null const_type_info_type_node.
(type_info_ptr_type, get_void_tinfo_ptr): New fns.
(get_tinfo_decl_dynamic, get_tinfo_ptr): Use them.
(ptr_initializer, ptm_initializer, get_pseudo_ti_init): Use them.
(get_tinfo_desc): Use const_ptr_type_node.

gcc/testsuite/ChangeLog:

* g++.dg/rtti/undeclared1.C: New test.

c++: correct object scope handling

The way cp_parser_lookup_name handles object scope (i.e. the scope on the
RHS of a . or -> expression) is a bit subtle: before the lookup it's in
parser->context->object type, and after the lookup it's in
parser->object_scope. But a couple of places that elide lookups were
failing to do the same transform.

I'm not aware of this breaking anything currently.

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_name): Move object type.
(cp_parser_pre_parsed_nested_name_specifier): Likewise.

c++: tweak C++20 destructor template-id rule

While working on a larger change to destructor lookup I noticed that this
rule talks about declarators, but we weren't limiting the error to the case
where we're parsing a declarator. I don't know if this actually broke
anything, since a CPP_TEMPLATE_ID would have to have been parsed once
before, but it's more correct this way.

gcc/cp/ChangeLog:

* parser.c (cp_parser_unqualified_id): Only complain about ~A<T> in
a declarator.

gcc: xtensa: fix PR target/102336

2021-09-14 Max Filippov <jcmvbkbc@gmail.com>
gcc/
PR target/102336
* config/xtensa/t-xtensa (TM_H): Add include/xtensa-config.h.

Daily bump.

Fortran - fix ICE during error recovery checking entry characteristics

gcc/fortran/ChangeLog:

PR fortran/102311
* resolve.c (resolve_entries): Attempt to recover cleanly after
rejecting mismatched function entries.

gcc/testsuite/ChangeLog:

PR fortran/102311
* gfortran.dg/entry_25.f90: New test.

c++tools : Add a simple handler for ModuleCompiledRequest.

This just replies with "OK".

c++tools/ChangeLog:

* resolver.cc (module_resolver::ModuleCompiledRequest):
Add a simple handler.
* resolver.h: Declare handler for ModuleCompiledRequest.

rs6000: Disable optimizing multiple xxsetaccz instructions into one xxsetaccz

Fwprop will happily optimize two xxsetaccz instructions into one xxsetaccz
by propagating the results of the first to the uses of the second.
We really don't want that to happen given the late priming/depriming of
accumulators.  I fixed this by making the xxsetaccz source operand an
unspec volatile.  I also removed the mma_xxsetaccz define_expand and
define_insn_and_split and replaced it with a simple define_insn.
The expand and splitter patterns were leftovers from the pre opaque mode
code when the xxsetaccz code was part of the movpxi pattern, and we don't
need them now.

Rather than a new test case, I was able to just modify the current test case
to add another __builtin_mma_xxsetaccz call which shows the bad code gen
with unpatched compilers.

2021-09-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
* config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_XXSETACCZ.
(unspecv): Add UNSPECV_MMA_XXSETACCZ.
(*mma_xxsetaccz): Delete.
(mma_xxsetaccz): Change to define_insn.  Remove operand 1.
Use UNSPECV_MMA_XXSETACCZ.  Update comment.
* config/rs6000/rs6000.c (rs6000_rtx_costs): Use UNSPECV_MMA_XXSETACCZ.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-6.c: Add second call to xxsetacc
built-in.  Update instruction counts.

configure: Avoid unnecessary constraints on executables for $build.

The executables for GCC's c-family compilers must be built with no-PIE
because they use PCH and the current model for this requires that the
exe is always lauched at the same address.  Since the other language
compilers share code with the c-family this constraint is also applied
to them.

However, the executables that run on $build (generators, and parsers
for md and def files) need not have any such constraint they do not
consume PCH files.

This change simplifies the configuration and Makefile content by
removing the code enforcing no-PIE on these exes.  This also fixes a
bootstrap issue with some Darwin versions and clang as the bootstrap
compiler,  where -no-PIE causes the correct relocation model to be
switched off leading to invalid user-space code.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* Makefile.in: Remove variables related to applying no-PIE
to the exes on $build.
* configure: Regenerate.
* configure.ac: Remove configuration related to applying
no-PIE to the exes on $build.

coroutines: Make proxy vars for the function arg copies.

This adds top level proxy variables for the coroutine frame
copies of the original function args. These are then available
in the debugger to refer to the frame copies. We rewrite the
function body to use the copies, since the original parms will
no longer be in scope when the coroutine is running.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (struct param_info): Add copy_var.
(build_actor_fn): Use simplified param references.
(register_param_uses): Likewise.
(rewrite_param_uses): Likewise.
(analyze_fn_parms): New function.
(coro_rewrite_function_body): Add proxies for the fn
parameters to the outer bind scope of the rewritten code.
(morph_fn_to_coro): Use simplified version of param ref.

coroutines: Expose implementation state to the debugger.

In the process of transforming a coroutine into the separate representation
as the ramp function and a state machine, we generate some variables that
are of interest to a user during debugging.  Any variable that is persistent
for the execution of the coroutine is placed into the coroutine frame.

In particular:
  The promise object.
  The function pointers for the resumer and destroyer.
  The current resume index (suspend point).
  The handle that represents this coroutine 'self handle'.
  Any handle provided for a continuation coroutine.
  Whether the coroutine frame is allocated and needs to be freed.

Visibility of some of these has already been requested by end users.

This patch ensures that such variables have names that are usable in a
debugger, but are in the reserved namespace for the implementation (they
all begin with _Coro_).  The identifiers are generated lazily when the
first coroutine is encountered.

We place the variables into the outermost bind expression and then add a
DECL_VALUE_EXPR to each that points to the frame entry.

These changes simplify the handling of the variables in the body of the
function (in particular, the use of the DECL_VALUE_EXPR means that we now
no longer need to rewrite proxies for the promise and coroutine handles into
the frame->offset form).

Partial improvement to debugging (PR c++/99215).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (coro_resume_fn_id, coro_destroy_fn_id,
coro_promise_id, coro_frame_needs_free_id, coro_resume_index_id,
coro_self_handle_id, coro_actor_continue_id,
coro_frame_i_a_r_c_id): New.
(coro_init_identifiers): Initialize new name identifiers.
(coro_promise_type_found_p): Use pre-built identifiers.
(struct await_xform_data): Remove unused fields.
(transform_await_expr): Delete code that is now unused.
(build_actor_fn): Simplify interface, use pre-built identifiers and
remove transforms that are no longer needed.
(build_destroy_fn): Use revised field names.
(register_local_var_uses): Use pre-built identifiers.
(coro_rewrite_function_body): Simplify interface, use pre-built
identifiers.  Generate proxy vars in the outer bind expr scope for the
implementation state that we wish to expose.
(morph_fn_to_coro): Adjust comments for new variable names, use pre-
built identifiers.  Remove unused code to generate frame entries for
the implementation state.  Adjust call for build_actor_fn.

c++: empty union member activation during constexpr [PR102163]

Here, the union's constructor is defined to activate its empty data
member _M_rest, but during constexpr evaluation of this constructor the
subobject constructor call O::O(&_M_rest, 42) doesn't produce a side
effect that actually activates the member, so the union still appears
uninitialized after its constructor has run. This patch fixes this by
using a dummy MODIFY_EXPR in this situation, whose evaluation ensures
the member gets activated.

PR c++/102163

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_call_expression): After evaluating a
subobject constructor call for an empty union member, produce a
side effect that makes sure the member gets activated.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty17.C: New test.

c++: Update DECL_*SIZE for objects with flexible array members with initializers [PR102295]

The C FE updates DECL_*SIZE for vars which have initializers for flexible
array members for many years, but C++ FE kept DECL_*SIZE the same as the
type size (i.e. as if there were zero elements in the flexible array
member). This results e.g. in ELF symbol sizes being too small.

Note, if the flexible array member is initialized only with non-constant
initializers, we have a worse bug that this patch doesn't solve, the
splitting of initializers into constant and dynamic initialization removes
the initializer and we don't have just wrong DECL_*SIZE, but nothing is
emitted when emitting those vars into assembly either and so the dynamic
initialization clobbers other vars that may overlap the variable.
I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
flexible array member in that case.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102295
* decl.c (layout_var_decl): For aggregates ending with a flexible
array member, add the size of the initializer for that member to
DECL_SIZE and DECL_SIZE_UNIT.

* g++.target/i386/pr102295.C: New test.

c++: Fix __is_*constructible/assignable for templates [PR102305]

is_xible_helper returns error_mark_node (i.e. false from the traits)
for abstract classes by testing ABSTRACT_CLASS_TYPE_P (to) early.
Unfortunately, as the testcase shows, that doesn't work on class templates
that haven't been instantiated yet, ABSTRACT_CLASS_TYPE_P for them is false
until it is instantiated, which is done when the routine later constructs
a dummy object with that type.

The following patch fixes this by calling complete_type first, so that
ABSTRACT_CLASS_TYPE_P test will work properly, while keeping the handling
of arrays with unknown bounds, or incomplete types where it is done
currently.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102305
* method.c (is_xible_helper): Call complete_type on to.

* g++.dg/cpp0x/pr102305.C: New test.

Fortran: Add missing ST_OMP_END_SCOPE handling [PR102313]

PR fortran/102313

gcc/fortran/ChangeLog:

* parse.c (gfc_ascii_statement): Add missing ST_OMP_END_SCOPE.

gcc/testsuite/ChangeLog:

* gfortran.dg/goacc/unexpected-end.f90: New test.
* gfortran.dg/gomp/unexpected-end.f90: New test.

testsuite: fix failing pytest tests

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov.py: Fix failing pytests as gcov.json.gz
filename was changed in b777f228b481ae881a7fbb09de367a053740932c.

Fix PR ada/101970

This is a regression present on the mainline and 11 branch in the form of an
ICE for an enumeration type with a full signed representation for its size.

gcc/ada/
PR ada/101970
* exp_attr.adb (Expand_N_Attribute_Reference) <Attribute_Enum_Rep>:
Use an unchecked conversion instead of a regular conversion in the
enumeration case and remove Conversion_OK flag in the integer case.
<Attribute_Pos>: Remove superfluous test.

gcc/testsuite/
* gnat.dg/enum_rep2.adb: New test.

arc: Update ZOL pattern.

The ZOL pattern is missing modes which may lead to errors during
var_tracking. Add them.

gcc/
* config/arc/arc.md (doloop_end): Add missing mode.
(loop_end): Likewise.

Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>

Do not issue size error for too large array type

The error is to be issued when objects of the type are declared instead.

gcc/ada/
* gcc-interface/decl.c (validate_size): Do not issue an error if the
old size has overflowed.

Fix inaccurate bounds in debug info for vector array types

They should not be 0-based, unless the array type itself is.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): For vector types, make
the representative array the debug type.

Fix internal error on broken import of vector intrinsics

The change also makes small adjustments to warning messages for intrinsics.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Turn variable
into constant. Capitalize GCC in warning message.
(intrin_arglists_compatible_p): Change parameter to pointer-to-const
Adjust warning messages. Turn warning into error for vector types.
(intrin_return_compatible_p): Likewise.
(intrin_profiles_compatible_p): Change parameter to pointer-to-const

Strengthen compatibility warning for GCC builtins

This is necessary for vector builtins, which are picky about the
signedness of the element type.

gcc/ada/
* libgnat/s-atopri.ads (bool): Delete.
(Atomic_Test_And_Set): Replace bool with Boolean.
(Atomic_Always_Lock_Free): Likewise.
* libgnat/s-aoinar.adb (Is_Lock_Free): Adjust.
* libgnat/s-aomoar.adb (Is_Lock_Free): Likewise.
* libgnat/s-aotase.adb (Atomic_Test_And_Set): Likewise.
* libgnat/s-atopex.adb (Atomic_Compare_And_Exchange): Likewise.
* gcc-interface/decl.c: Include gimple-expr.h.
(intrin_types_incompatible_p): Delete.
(intrin_arglists_compatible_p): Call types_compatible_p.
(intrin_return_compatible_p): Likewise.

Fix internal error on pointer-to-pointer binding in LTO mode

gcc/ada/
* gcc-interface/utils.c (update_pointer_to): Set TYPE_CANONICAL on
pointer and reference types.

testsuite: Use sync_long_long instead of sync_int_long for atomic-29.c test

As discussed, the test tests atomics on doubles which are 64-bit and so we
should use sync_long_long effective target instead of sync_int_long that
covers 64-bit atomics only on 64-bit arches.  I've added -march=pentium
to follow what is documented for sync_long_long, I guess -march=zarch should
be added for s390* too, but haven't tested that.

And using sync_long_long found a syntax error in that effective target
implementation, so I've fixed that too.

2021-09-14  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/gomp/atomic-29.c: Add -march=pentium
dg-additional-options for ia32.  Use sync_long_long effective target
instead of sync_int_long.
* lib/target-supports.exp (check_effective_target_sync_long_long): Fix
a syntax error.

openmp: Add testing checks (whether lhs appears in operands at all) to more trees

This patch adds testing checks (goa_stabilize_expr with NULL pre_p) for more
tree codes, so that we don't gimplify their operands individually unless lhs
appears in them. Also, so that we don't have exponential compile time complexity
with the added checks, I've added a depth computation, we don't expect lhs
to be found in depth 8 or above as all the atomic forms must have x expression
in specific places in the expressions.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

* gimplify.c (goa_stabilize_expr): Add depth argument, propagate
it to recursive calls, for depth above 7 just gimplify or return.
Perform a test even for MODIFY_EXPR, ADDR_EXPR, COMPOUND_EXPR with
__builtin_clear_padding and TARGET_EXPR.
(gimplify_omp_atomic): Adjust goa_stabilize_expr callers.

Implement PR ada/101385

For consistency's sake with -Wall & -w, this makes -Werror imply -gnatwe.

gcc/ada/
PR ada/101385
* doc/gnat_ugn/building_executable_programs_with_gnat.rst
(-Wall): Minor fixes.
(-w): Likewise.
(-Werror): Document that it also sets -gnatwe by default.
* gcc-interface/lang-specs.h (ada): Expand -gnatwe if -Werror is
passed and move expansion of -gnatw switches to before -gnatez.

Remove superfluous call to UI_Is_In_Int_Range

gcc/ada/
* gcc-interface/utils.c (can_materialize_object_renaming_p): Do not
call UI_Is_In_Int_Range on the result of Normalized_First_Bit.

Give more informative error message for by-reference types

Recent compilers enforce more strictly the RM C.6(18) clause, which says
that volatile record types are by-reference types. This changes the typical
error message now given in these cases.

gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <is_type>: Declare new
constant. Adjust error message issued by validate_size in the case
of by-reference types.
(validate_size): Always use the error strings passed by the caller.

AVX512FP16: Add testcase for fpclass/getmant/getexp instructions.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512):
Add xmm component.
* gcc.target/i386/avx512fp16-vfpclassph-1a.c: New test.
* gcc.target/i386/avx512fp16-vfpclassph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfpclasssh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfpclasssh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vgetexpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vgetexpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vgetexpsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vgetexpsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vgetmantph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vgetmantph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vgetmantsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vgetmantsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfpclassph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfpclassph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetexpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetexpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetmantph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vgetmantph-1b.c: Ditto.

AVX512FP16: Add fpclass/getexp/getmant instructions.

Add vfpclassph/vfpclasssh/vgetexpph/vgetexpsh/vgetmantph/vgetmantsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_fpclass_sh_mask):
New intrinsic.
(_mm_mask_fpclass_sh_mask): Likewise.
(_mm512_mask_fpclass_ph_mask): Likewise.
(_mm512_fpclass_ph_mask): Likewise.
(_mm_getexp_sh): Likewise.
(_mm_mask_getexp_sh): Likewise.
(_mm_maskz_getexp_sh): Likewise.
(_mm512_getexp_ph): Likewise.
(_mm512_mask_getexp_ph): Likewise.
(_mm512_maskz_getexp_ph): Likewise.
(_mm_getexp_round_sh): Likewise.
(_mm_mask_getexp_round_sh): Likewise.
(_mm_maskz_getexp_round_sh): Likewise.
(_mm512_getexp_round_ph): Likewise.
(_mm512_mask_getexp_round_ph): Likewise.
(_mm512_maskz_getexp_round_ph): Likewise.
(_mm_getmant_sh): Likewise.
(_mm_mask_getmant_sh): Likewise.
(_mm_maskz_getmant_sh): Likewise.
(_mm512_getmant_ph): Likewise.
(_mm512_mask_getmant_ph): Likewise.
(_mm512_maskz_getmant_ph): Likewise.
(_mm_getmant_round_sh): Likewise.
(_mm_mask_getmant_round_sh): Likewise.
(_mm_maskz_getmant_round_sh): Likewise.
(_mm512_getmant_round_ph): Likewise.
(_mm512_mask_getmant_round_ph): Likewise.
(_mm512_maskz_getmant_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_mask_fpclass_ph_mask):
New intrinsic.
(_mm_fpclass_ph_mask): Likewise.
(_mm256_mask_fpclass_ph_mask): Likewise.
(_mm256_fpclass_ph_mask): Likewise.
(_mm256_getexp_ph): Likewise.
(_mm256_mask_getexp_ph): Likewise.
(_mm256_maskz_getexp_ph): Likewise.
(_mm_getexp_ph): Likewise.
(_mm_mask_getexp_ph): Likewise.
(_mm_maskz_getexp_ph): Likewise.
(_mm256_getmant_ph): Likewise.
(_mm256_mask_getmant_ph): Likewise.
(_mm256_maskz_getmant_ph): Likewise.
(_mm_getmant_ph): Likewise.
(_mm_mask_getmant_ph): Likewise.
(_mm_maskz_getmant_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/sse.md (vecmemsuffix): Add HF vector modes.
(<avx512>_getexp<mode><mask_name><round_saeonly_name>): Adjust
to support HF vector modes.
(avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name):
Ditto.
(avx512dq_fpclass<mode><mask_scalar_merge_name>): Ditto.
(avx512dq_vmfpclass<mode><mask_scalar_merge_name>): Ditto.
(<avx512>_getmant<mode><mask_name><round_saeonly_name>): Ditto.
(avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (_ROUND_CUR): New macro.
* gcc.target/i386/avx512fp16-vreduceph-1a.c: New test.
* gcc.target/i386/avx512fp16-vreduceph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vreducesh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vreducesh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vrndscaleph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vrndscaleph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vrndscalesh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vrndscalesh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vreduceph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vreduceph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c: Ditto.

AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_reduce_ph):
New intrinsic.
(_mm512_mask_reduce_ph): Likewise.
(_mm512_maskz_reduce_ph): Likewise.
(_mm512_reduce_round_ph): Likewise.
(_mm512_mask_reduce_round_ph): Likewise.
(_mm512_maskz_reduce_round_ph): Likewise.
(_mm_reduce_sh): Likewise.
(_mm_mask_reduce_sh): Likewise.
(_mm_maskz_reduce_sh): Likewise.
(_mm_reduce_round_sh): Likewise.
(_mm_mask_reduce_round_sh): Likewise.
(_mm_maskz_reduce_round_sh): Likewise.
(_mm512_roundscale_ph): Likewise.
(_mm512_mask_roundscale_ph): Likewise.
(_mm512_maskz_roundscale_ph): Likewise.
(_mm512_roundscale_round_ph): Likewise.
(_mm512_mask_roundscale_round_ph): Likewise.
(_mm512_maskz_roundscale_round_ph): Likewise.
(_mm_roundscale_sh): Likewise.
(_mm_mask_roundscale_sh): Likewise.
(_mm_maskz_roundscale_sh): Likewise.
(_mm_roundscale_round_sh): Likewise.
(_mm_mask_roundscale_round_sh): Likewise.
(_mm_maskz_roundscale_round_sh): Likewise.
* config/i386/avx512fp16vlintrin.h: (_mm_reduce_ph):
New intrinsic.
(_mm_mask_reduce_ph): Likewise.
(_mm_maskz_reduce_ph): Likewise.
(_mm256_reduce_ph): Likewise.
(_mm256_mask_reduce_ph): Likewise.
(_mm256_maskz_reduce_ph): Likewise.
(_mm_roundscale_ph): Likewise.
(_mm_mask_roundscale_ph): Likewise.
(_mm_maskz_roundscale_ph): Likewise.
(_mm256_roundscale_ph): Likewise.
(_mm256_mask_roundscale_ph): Likewise.
(_mm256_maskz_roundscale_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/sse.md (<mask_codefor>reducep<mode><mask_name>):
Renamed to ...
(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
... this, and adjust for round operands.
(reduces<mode><mask_scalar_name>): Likewise, with ...
(reduces<mode><mask_scalar_name><round_saeonly_scalar_name):
... this.
(<avx512>_rndscale<mode><mask_name><round_saeonly_name>):
Adjust for HF vector modes.
(avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>):
Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vrcpph-1a.c: New test.
* gcc.target/i386/avx512fp16-vrcpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vrcpsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vrcpsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vscalefph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vscalefph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vscalefsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vscalefsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrcpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrcpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vscalefph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vscalefph-1b.c: Ditto.

AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: (_mm512_rcp_ph):
New intrinsic.
(_mm512_mask_rcp_ph): Likewise.
(_mm512_maskz_rcp_ph): Likewise.
(_mm_rcp_sh): Likewise.
(_mm_mask_rcp_sh): Likewise.
(_mm_maskz_rcp_sh): Likewise.
(_mm512_scalef_ph): Likewise.
(_mm512_mask_scalef_ph): Likewise.
(_mm512_maskz_scalef_ph): Likewise.
(_mm512_scalef_round_ph): Likewise.
(_mm512_mask_scalef_round_ph): Likewise.
(_mm512_maskz_scalef_round_ph): Likewise.
(_mm_scalef_sh): Likewise.
(_mm_mask_scalef_sh): Likewise.
(_mm_maskz_scalef_sh): Likewise.
(_mm_scalef_round_sh): Likewise.
(_mm_mask_scalef_round_sh): Likewise.
(_mm_maskz_scalef_round_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_rcp_ph):
New intrinsic.
(_mm256_rcp_ph): Likewise.
(_mm_mask_rcp_ph): Likewise.
(_mm256_mask_rcp_ph): Likewise.
(_mm_maskz_rcp_ph): Likewise.
(_mm256_maskz_rcp_ph): Likewise.
(_mm_scalef_ph): Likewise.
(_mm256_scalef_ph): Likewise.
(_mm_mask_scalef_ph): Likewise.
(_mm256_mask_scalef_ph): Likewise.
(_mm_maskz_scalef_ph): Likewise.
(_mm256_maskz_scalef_ph): Likewise.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/sse.md (VFH_AVX512VL): New.
(avx512fp16_rcp<mode>2<mask_name>): Ditto.
(avx512fp16_vmrcpv8hf2<mask_scalar_name>): Ditto.
(avx512f_vmscalef<mode><mask_scalar_name><round_scalar_name>):
Adjust to support HF vector modes.
(<avx512>_scalef<mode><mask_name><round_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vrsqrtph-1a.c: New test.
* gcc.target/i386/avx512fp16-vrsqrtph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vrsqrtsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vrsqrtsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsqrtph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsqrtph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsqrtsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsqrtsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsqrtph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsqrtph-1b.c: Ditto.

AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: (_mm512_sqrt_ph):
New intrinsic.
(_mm512_mask_sqrt_ph): Likewise.
(_mm512_maskz_sqrt_ph): Likewise.
(_mm512_sqrt_round_ph): Likewise.
(_mm512_mask_sqrt_round_ph): Likewise.
(_mm512_maskz_sqrt_round_ph): Likewise.
(_mm512_rsqrt_ph): Likewise.
(_mm512_mask_rsqrt_ph): Likewise.
(_mm512_maskz_rsqrt_ph): Likewise.
(_mm_rsqrt_sh): Likewise.
(_mm_mask_rsqrt_sh): Likewise.
(_mm_maskz_rsqrt_sh): Likewise.
(_mm_sqrt_sh): Likewise.
(_mm_mask_sqrt_sh): Likewise.
(_mm_maskz_sqrt_sh): Likewise.
(_mm_sqrt_round_sh): Likewise.
(_mm_mask_sqrt_round_sh): Likewise.
(_mm_maskz_sqrt_round_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_sqrt_ph): New intrinsic.
(_mm256_sqrt_ph): Likewise.
(_mm_mask_sqrt_ph): Likewise.
(_mm256_mask_sqrt_ph): Likewise.
(_mm_maskz_sqrt_ph): Likewise.
(_mm256_maskz_sqrt_ph): Likewise.
(_mm_rsqrt_ph): Likewise.
(_mm256_rsqrt_ph): Likewise.
(_mm_mask_rsqrt_ph): Likewise.
(_mm256_mask_rsqrt_ph): Likewise.
(_mm_maskz_rsqrt_ph): Likewise.
(_mm256_maskz_rsqrt_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtins.
(ix86_expand_round_builtin): Ditto.
* config/i386/sse.md (VF_AVX512FP16VL): New.
(sqrt<mode>2): Adjust for HF vector modes.
(<sse>_sqrt<mode>2<mask_name><round_name>): Likewise.
(<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>):
Likewise.
(<sse>_rsqrt<mode>2<mask_name>): New.
(avx512fp16_vmrsqrtv8hf2<mask_scalar_name>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

c++: Fix warning on 32-bit x86

My C++17 hardware interference sizes patch caused a bogus warning on 32-bit
x86, where we have a default L1 cache line size of 0, and the front end
complained that the default constructive interference size of 64 was larger
than that.

gcc/cp/ChangeLog:

* decl.c (cxx_init_decl_processing): Don't warn if L1 cache line
size is smaller than maxalign.

Daily bump.

Fortran - ensure simplification of bounds of array-valued named constants

gcc/fortran/ChangeLog:

PR fortran/82314
* decl.c (add_init_expr_to_sym): For proper initialization of
array-valued named constants the array bounds need to be
simplified before adding the initializer.

gcc/testsuite/ChangeLog:

PR fortran/82314
* gfortran.dg/pr82314.f90: New test.

Fortran - fix handling of substring start and end indices

gcc/fortran/ChangeLog:

PR fortran/85130
* expr.c (find_substring_ref): Handle given substring start and
end indices as signed integers, not unsigned.

gcc/testsuite/ChangeLog:

PR fortran/85130
* gfortran.dg/substr_6.f90: Revert commit r8-7574, adding again
test that was erroneously considered as illegal.

Don't maintain a warning spec for 'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' [PR101574]

This resolves PR101574 "gcc/sparseset.h:215:20: error: suggest parentheses
around assignment used as truth value [-Werror=parentheses]", as (bogusly)
reported at commit a61f6afbee370785cf091fe46e2e022748528307:

    In file included from [...]/source-gcc/gcc/lra-lives.c:43:
    [...]/source-gcc/gcc/lra-lives.c: In function ‘void make_hard_regno_dead(int)’:
    [...]/source-gcc/gcc/sparseset.h:215:20: error: suggest parentheses around assignment used as truth value [-Werror=parentheses]
      215 |        && (((ITER) = sparseset_iter_elm (SPARSESET)) || 1);             \
          |            ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [...]/source-gcc/gcc/lra-lives.c:304:3: note: in expansion of macro ‘EXECUTE_IF_SET_IN_SPARSESET’
      304 |   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
          |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~

gcc/
PR bootstrap/101574
* diagnostic-spec.c (warning_suppressed_at, copy_warning): Handle
'RESERVED_LOCATION_P' locations.
* warning-control.cc (get_nowarn_spec, suppress_warning)
(copy_warning): Likewise.

Clarify 'key_type_t' to 'location_t' as used for 'gcc/diagnostic-spec.h:nowarn_map'

To make it obvious what exactly the key type is.  No change in behavior.

gcc/
* diagnostic-spec.h (typedef xint_hash_t): Use 'location_t' instead of...
(typedef key_type_t): ... this.  Remove.
(nowarn_map): Document.
* diagnostic-spec.c (nowarn_map): Likewise.
* warning-control.cc (convert_to_key): Evolve functions into...
(get_location): ... these.  Adjust all users.

Simplify 'gcc/diagnostic-spec.h:nowarn_map' setup

If we've just read something from the map, we can be sure that it exists.

gcc/
* warning-control.cc (copy_warning): Remove 'nowarn_map' setup.

c++: implement C++17 hardware interference size

The last missing piece of the C++17 standard library is the hardware
intereference size constants.  Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size.  When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.

From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.

To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.

A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable.  The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes.  Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.

64 bytes still seems correct for all x86.

I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.

He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.

Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.

With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.

gcc/ChangeLog:

* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.

gcc/c-family/ChangeLog:

* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.

libstdc++-v3/ChangeLog:

* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.

i386: support micro-levels in target{,_clone} attrs [PR101696]

As mentioned in the PR, we do miss supports target micro-architectures
in target and target_clone attribute. While the levels
x86-64 x86-64-v2 x86-64-v3 x86-64-v4 are supported values by -march
option, they are actually only aliases for k8 CPU. That said, they are more
closer to __builtin_cpu_supports function and we decided to implement
it there.

PR target/101696

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (cpu_indicator_init): Add support
for x86-64 micro levels for __builtin_cpu_supports.
* common/config/i386/i386-cpuinfo.h (enum feature_priority):
Add priorities for the micro-arch levels.
(enum processor_features): Add new features.
* common/config/i386/i386-isas.h: Add micro-arch features.
* config/i386/i386-builtins.c (get_builtin_code_for_version):
Support the micro-arch levels by callsing
__builtin_cpu_supports.
* doc/extend.texi: Document that the levels are support by
__builtin_cpu_supports.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv30.C: New test.
* gcc.target/i386/mvc16.c: New test.
* gcc.target/i386/builtin_target.c (CHECK___builtin_cpu_supports):
New.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>

[aarch64] Fix target/95969: __builtin_aarch64_im_lane_boundsi interferes with gimple

This patch adds simple folding of __builtin_aarch64_im_lane_boundsi where
we are not going to error out. It fixes the problem by the removal
of the function from the IR.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR target/95969
* config/aarch64/aarch64-builtins.c (aarch64_fold_builtin_lane_check):
New function.
(aarch64_general_fold_builtin): Handle AARCH64_SIMD_BUILTIN_LANE_CHECK.
(aarch64_general_gimple_fold_builtin): Likewise.

gcc/testsuite/ChangeLog:

PR target/95969
* gcc.target/aarch64/lane-bound-1.c: New test.
* gcc.target/aarch64/lane-bound-2.c: New test.

Remove m32r{,le}-*-linux* support from GCC

m32r support never made it to glibc and the support for the Linux kernel
was removed with 4.18. It does not remove much but no reason to keep
around a port which never worked or one which the support in other
projects is gone.

OK? Checked to make sure m32r-linux and m32rle-linux were rejected
when building.

contrib/ChangeLog:

* config-list.mk: Remove m32r-linux and m32rle-linux
from the list.

gcc/ChangeLog:

* config.gcc: Add m32r-*-linux* and m32rle-*-linux*
to the Unsupported targets list.
Remove support for m32r-*-linux* and m32rle-*-linux*.
* config/m32r/linux.h: Removed.
* config/m32r/t-linux: Removed.

libgcc/ChangeLog:

* config.host: Remove m32r-*-linux* and m32rle-*-linux*.
* config/m32r/libgcc-glibc.ver: Removed.
* config/m32r/t-linux: Removed.

Fix PR lto/49664: liblto_plugin.so exports too many symbols

So right now liblto_plugin.so exports many libiberty symbols and
simple_object file symbols but really it just needs to export onload.

This fixes the problem by using "-export-symbols-regex onload" on
the libtool link line.

lto-plugin/ChangeLog:

PR lto/49664
* Makefile.am: Export only onload.
* Makefile.in: Regenerate.

aarch64: PR target/102252 Invalid addressing mode for SVE load predicate

In the testcase we generate invalid assembly for an SVE load predicate instruction.
The RTL for the insn is:
(insn 9 8 10 (set (reg:VNx16BI 68 p0)
        (mem:VNx16BI (plus:DI (mult:DI (reg:DI 1 x1 [93])
                    (const_int 8 [0x8]))
                (reg/f:DI 0 x0 [92])) [2 work_3(D)->array[offset_4(D)]+0 S8 A16]))

That addressing mode is not valid for the instruction [1] as it only accepts the addressing mode:
[<Xn|SP>{, #<imm>, MUL VL}]

This patch rejects the register index form for SVE predicate modes.

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions/LDR--predicate---Load-predicate-register-

gcc/ChangeLog:

PR target/102252
* config/aarch64/aarch64.c (aarch64_classify_address): Don't allow
register index for SVE predicate modes.

gcc/testsuite/ChangeLog:

PR target/102252
* g++.target/aarch64/sve/pr102252.C: New test.

Remove references to FSM threads.

Now that the jump thread back registry has been split into the generic
copier and the custom (old) copier, it becomes trivial to remove the
FSM bits from the jump threaders.

First, there's no need for an EDGE_FSM_THREAD type.  The only reason
we were looking at the threading type was to determine what type of
copier to use, and now that the copier has been split, there's no need
to even look.  However, there is one check in register_jump_thread
where we verify that only the generic copier can thread through
back-edges.  I've removed that check in favor of a flag passed to the
constructor.

I've also removed all the FSM references from the code and tests.
Interestingly, some tests weren't even testing the right thing.  They
were testing for "FSM" which would catch jump thread paths as well as
the backward threader *failing* on registering a path.  *big eye roll*

The only remaining code that was actually checking for EDGE_FSM_THREAD
was adjust_paths_after_duplication, and the checks could be written
without looking at the edge type at all.  For the record, the code
there is horrible: it's convoluted, hard to read, and doesn't have any
tests.  I'd smack myself if I could go back in time.

All that remains are the FSM references in the --param's themselves.
I think we should s/fsm/threader/, since I envision a day when we can
share the cost basis code between the threaders.  However, I don't
know what the proper procedure is for renaming existing compiler
options.

By the way, param_fsm_maximum_phi_arguments is no longer relevant
after the rewrite.  We can nuke that one right away.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove FSM
references.
(back_threader_registry::register_path): Same.
* tree-ssa-threadedge.c
(jump_threader::simplify_control_stmt_condition): Same.
* tree-ssa-threadupdate.c (jt_path_registry::jt_path_registry):
Add backedge_threads argument.
(fwd_jt_path_registry::fwd_jt_path_registry): Pass
backedge_threads argument.
(back_jt_path_registry::back_jt_path_registry):  Same.
(dump_jump_thread_path): Adjust for FSM removal.
(back_jt_path_registry::rewire_first_differing_edge): Same.
(back_jt_path_registry::adjust_paths_after_duplication): Same.
(back_jt_path_registry::update_cfg): Same.
(jt_path_registry::register_jump_thread): Same.
* tree-ssa-threadupdate.h (enum jump_thread_edge_type): Remove
EDGE_FSM_THREAD.
(class back_jt_path_registry): Add backedge_threads to
constructor.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr21417.c: Adjust for FSM removal.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr69196-1.c: Same.
* gcc.dg/tree-ssa/pr70232.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-thread-13.c: Same.

c++: parameter pack inside constexpr if [PR101764]

Here when partially instantiating the first pack expansion, substitution
into the condition of the constexpr if yields a still-dependent tree, so
tsubst_expr returns an IF_STMT with an unsubstituted IF_COND and with
IF_STMT_EXTRA_ARGS added to.  Hence after partial instantiation the pack
expansion pattern still refers to the unlowered parameter pack 'ts' of
level 2, and it's thusly recorded in the new PACK_EXPANSION_PARAMETER_PACKS.
During the subsequent final instantiation of the regenerated lambda we
crash in tsubst_pack_expansion because it can't find an argument pack
for this unlowered 'ts', due to the level mismatch.  (Likewise when the
constexpr if is replaced by a requires-expr, which also uses the extra
args mechanism for avoiding partial instantiation.)

So essentially, a pack expansion pattern that contains an "extra args"
tree doesn't play well with partial instantiation.  This patch fixes
this by forcing such pack expansions to use the extra args mechanism as
well.

PR c++/101764

gcc/cp/ChangeLog:

* cp-tree.h (PACK_EXPANSION_FORCE_EXTRA_ARGS_P): New accessor
macro.
* pt.c (has_extra_args_mechanism_p): New function.
(find_parameter_pack_data::found_extra_args_tree_p): New data
member.
(find_parameter_packs_r): Set ppd->found_extra_args_tree_p
appropriately.
(make_pack_expansion): Set PACK_EXPANSION_FORCE_EXTRA_ARGS_P if
ppd.found_extra_args_tree_p.
(use_pack_expansion_extra_args_p): Return true if there were
unsubstituted packs and PACK_EXPANSION_FORCE_EXTRA_ARGS_P.
(tsubst_pack_expansion): Pass the pack expansion to
use_pack_expansion_extra_args_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if35.C: New test.

c++: fix -fsanitize-coverage=trace-pc ICE [PR101331]

PR c++/101331

gcc/ChangeLog:

* asan.h (sanitize_coverage_p): Handle when fn == NULL.

gcc/testsuite/ChangeLog:

* g++.dg/pr101331.C: New test.

Adjust ssa-dom-thread-7.c on aarch64.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust for aarch64.

x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES

1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation
with 256-bit AVX instructions.
2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces
operations with 256-bit AVX instructions.

They are enabled only for Intel Alder Lake and Intel processors with
AVX512.

gcc/

PR target/101935
* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): New.
(TARGET_AVX256_STORE_BY_PIECES): Likewise.
(MOVE_MAX): Check TARGET_AVX256_MOVE_BY_PIECES and
TARGET_AVX256_STORE_BY_PIECES instead of
TARGET_AVX256_SPLIT_UNALIGNED_LOAD and
TARGET_AVX256_SPLIT_UNALIGNED_STORE.
(STORE_MAX_PIECES): Check TARGET_AVX256_STORE_BY_PIECES instead
of TARGET_AVX256_SPLIT_UNALIGNED_STORE.
* config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): New.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.

gcc/testsuite/

PR target/101935
* g++.target/i386/pr80566-1.C: Add
-mtune-ctrl=avx256_store_by_pieces.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr90773-20.c: Likewise.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* g++.target/i386/pr80566-2.C: Add
-mtune-ctrl=avx256_move_by_pieces.
* gcc.target/i386/eh_return-1.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Replace -mtune=haswell
with -mtune-ctrl=avx256_move_by_pieces.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Replace -mtune=haswell
with -mtune-ctrl=avx256_store_by_pieces.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Replace -mtune=generic
with -mtune-ctrl=avx256_store_by_pieces.

Use gen_lowpart_if_possible instead of gen_lowpart to avoid ICE.

gcc/ChangeLog:

PR bootstrap/102302
* expmed.c (extract_bit_field_using_extv): Use
gen_lowpart_if_possible instead of gen_lowpart to avoid ICE.

Move pointer_equiv_analyzer to new file.

We need to use the pointer equivalence tracking from evrp in the jump
threader. Instead of moving it to some *evrp.h header, it's cleaner for
it to live in its own file, since it's completely independent and not
evrp specific.

Tested on x86-64 Linux.

gcc/ChangeLog:

* Makefile.in (OBJS): Add value-pointer-equiv.o.
* gimple-ssa-evrp.c (class ssa_equiv_stack): Move to
value-pointer-equiv.*.
(ssa_equiv_stack::ssa_equiv_stack): Same.
(ssa_equiv_stack::enter): Same.
(ssa_equiv_stack::leave): Same.
(ssa_equiv_stack::push_replacement): Same.
(ssa_equiv_stack::get_replacement): Same.
(is_pointer_ssa): Same.
(class pointer_equiv_analyzer): Same.
(pointer_equiv_analyzer::pointer_equiv_analyzer): Same.
(pointer_equiv_analyzer::~pointer_equiv_analyzer): Same.
(pointer_equiv_analyzer::set_global_equiv): Same.
(pointer_equiv_analyzer::set_cond_equiv): Same.
(pointer_equiv_analyzer::get_equiv): Same.
(pointer_equiv_analyzer::enter): Same.
(pointer_equiv_analyzer::leave): Same.
(pointer_equiv_analyzer::get_equiv_expr): Same.
(pta_valueize): Same.
(pointer_equiv_analyzer::visit_stmt): Same.
(pointer_equiv_analyzer::visit_edge): Same.
(hybrid_folder::value_of_expr): Same.
(hybrid_folder::value_on_edge): Same.
* value-pointer-equiv.cc: New file.
* value-pointer-equiv.h: New file.

gimple: allow more folding of memcpy [PR102125]

The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations.  So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.

Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.

On Arm this now permits optimizing

uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

from
        ldr     r2, [r0]        @ unaligned
        sub     sp, sp, #8
        ldr     r3, [r0, #4]    @ unaligned
        strd    r2, [sp]
        ldrd    r0, [sp]
        add     sp, sp, #8

to
        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned

PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations

gcc/ChangeLog:

PR target/102125
* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.

arm: expand handling of movmisalign for DImode [PR102125]

DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.

gcc/ChangeLog:

PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.

rtl: directly handle MEM in gen_highpart [PR102125]

gen_lowpart_general handles forming a lowpart of a MEM by using
adjust_address to rework and validate a new version of the MEM.
Do the same for gen_highpart rather than calling simplify_gen_subreg
for this case.

gcc/ChangeLog:

PR target/102125
* emit-rtl.c (gen_highpart): Use adjust_address to handle
MEM rather than calling simplify_gen_subreg.

cr16-elf is now obsoleted

As we are still building it for ./contrib/config-list.mk, let's add
--enable-obsolete so this has a chance to work.

contrib/ChangeLog:

* config-list.mk (LIST): --enable-obsolete for cr16-elf.

Fix multi-statment macro

INIT_CUMULATIVE_ARGS() expands to multiple statements, which will break right
after an `if` statement. Wrap it into a block.

gcc/ChangeLog:

* config/alpha/vms.h (INIT_CUMULATIVE_ARGS): Wrap multi-statment
define into a block.

Remove DARWIN_PREFER_DWARF and dead code

This removes the always defined DARWIN_PREFER_DWARF and the code
guarded by it being not defined, removing the possibility to
default some i386 darwin configurations to STABS when it would
not be defined.

2021-09-10 Richard Biener <rguenther@suse.de>

* config/darwin.h (DARWIN_PREFER_DWARF): Do not define.
* config/i386/darwin.h (PREFERRED_DEBUGGING_TYPE): Do not
change based on DARWIN_PREFER_DWARF not being defined.

Fix i686-lynx build breakage

With the last adjustment I failed to remove a stray undef of
PREFERRED_DEBUGGING_TYPE from config/i386/lynx.h

2021-09-13 Richard Biener <rguenther@suse.de>

* config/i386/lynx.h: Remove undef of PREFERRED_DEBUGGING_TYPE
to inherit from elfos.h

Add cr16-*-* to the list of obsoleted targets

This adds cr16-*-* to the list of obsoleted targets in config.gcc

2021-09-13 Richard Biener <rguenther@suse.de>

* config.gcc: Add cr16-*-* to the list of obsoleted targets.

Default AVR to DWARF2 debug

This switches the AVR port to generate DWARF2 debugging info by
default since the support for STABS is going to be deprecated for
GCC 12.

2021-09-10 Richard Biener <rguenther@suse.de>

* config/avr/elf.h (PREFERRED_DEBUGGING_TYPE): Remove
override, pick up DWARF2_DEBUG define from elfos.h

Always default to DWARF2 debugging for RX, even with -mas100-syntax

The RX port defaults to STABS when -mas100-syntax is used because
the AS100 assembler does not support some of the pseudo-ops used
by DWARF2 debug emission.  Since STABS is going to be deprecated
that has to change.  The following simply always uses DWARF2,
likely leaving -mas100-syntax broken when debug info is generated.

Can the RX port maintainer please sort out the situation?

2021-09-10  Richard Biener  <rguenther@suse.de>

* config/rx/rx.h (PREFERRED_DEBUGGING_TYPE): Always define to
DWARF2_DEBUG.

Default Alpha/VMS to DWARF2 debugging only

This changes the default debug format for Alpha/VMS to DWARF2 only,
skipping emission of VMS debug info which is going do be deprecated
for GCC 12 alongside the support for STABS.

2021-09-10 Richard Biener <rguenther@suse.de>

* config/alpha/vms.h (PREFERRED_DEBUGGING_TYPE): Define to
DWARF2_DEBUG.

Always default to DWARF2 debug for cygwin and mingw

This removes the fallback to STABS as default for cygwin and mingw
when the assembler does not support .secrel32 and the default is
to emit 32bit code. Support for .secrel32 was added to binutils 2.16
released in 2005 so instead document that as requirement.

I left the now unused check for .secrel32 in configure around
in case somebody wants to turn that into an error or warning.

2021-09-10 Richard Biener <rguenther@suse.de>

* config/i386/cygming.h: Always default to DWARF2 debugging.
Do not define DBX_DEBUGGING_INFO, that's done via dbxcoff.h
already.
* doc/install.texi: Document binutils 2.16 as minimum
requirement for mingw.

libgfortran: Handle m68k extended real format in ISO_Fortran_binding.h

libgfortran/
* ISO_Fortran_binding.h (CFI_type_long_double)
(CFI_type_long_double_Complex) [LDBL_MANT_DIG == 64 &&
LDBL_MIN_EXP == -16382 && LDBL_MAX_EXP == 16384]: Define.

rs6000: Add load density heuristic

We noticed that SPEC2017 503.bwaves_r run time degrades by
about 8% on P8 and P9 if we enabled vectorization at O2
fast-math (with cheap vect cost model).  Comparing to Ofast,
compiler doesn't do the loop interchange on the innermost
loop, it's not profitable to vectorize it then.

As Richi's comments [1], this follows the similar idea to
over price the vector construction fed by VMAT_ELEMENTWISE
or VMAT_STRIDED_SLP.  Instead of adding the extra cost on
vector construction costing immediately, it firstly records
how many loads and vectorized statements in the given loop,
later in rs6000_density_test (called by finish_cost) it
computes the load density ratio against all vectorized
statements, and check with the corresponding thresholds
DENSITY_LOAD_NUM_THRESHOLD and DENSITY_LOAD_PCT_THRESHOLD,
do the actual extra pricing if both thresholds are exceeded.

Note that this new load density heuristic check is based on
some fields in target cost which are updated as needed when
scanning each add_stmt_cost entry, it's independent of the
current function rs6000_density_test which requires to scan
non_vect stmts.  Since it's checking the load stmts count
vs. all vectorized stmts, it's kind of density, so I put
it in function rs6000_density_test.  With the same reason to
keep it independent, I didn't put it as an else arm of the
current existing density threshold check hunk or before this
hunk.

In the investigation of -1.04% degradation from 526.blender_r
on Power8, I noticed that the extra penalized cost 320 on one
single vector construction for mode V16QI is much exaggerated,
which makes the final body cost unreliable, so this patch adds
one maximum bound for the extra penalized cost for each vector
construction statement.

Full SPEC2017 performance evaluation on Power8/Power9 with
option combinations:
  * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap}
    {,-ffast-math}
  * {-O3, -Ofast} {,-funroll-loops}
bwaves_r degradations on P8/P9 have been fixed, nothing else
remarkable was observed.  Power10 -Ofast -funroll-loops run
shows it's neutral, while -O2 -ftree-vectorize run shows the
bwaves_r degradation is fixed expectedly.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html

gcc/ChangeLog:

* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
nstmts, nloads and extra_ctor_cost.
(rs6000_density_test): Add load density related heuristics.  Do
extra costing on vector construction statements if need.
(rs6000_init_cost): Init new members.
(rs6000_update_target_cost_per_stmt): New function.
(rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
rs6000_update_target_cost_per_stmt and call it.

rs6000: Remove typedef for struct rs6000_cost_data

As Segher pointed out, to typedef struct _rs6000_cost_data as
rs6000_cost_data is useless, so rewrite it without typedef.

gcc/ChangeLog:

* config/rs6000/rs6000.c (struct rs6000_cost_data): Remove typedef.
(rs6000_init_cost): Adjust.

[i386] Remove UNSPEC_{COPYSIGN,XORSIGN}.

gcc/ChangeLog:

* config/i386/i386.md: (UNSPEC_COPYSIGN): Remove.
(UNSPEC_XORSIGN): Ditto.

Daily bump.

d: Don't include terminating null pointer in string expression conversion (PR102185)

This gets re-added by the ExprVisitor when lowering StringExp back into a
STRING_CST during the code generator pass.

PR d/102185

gcc/d/ChangeLog:

* d-builtins.cc (d_eval_constant_expression): Don't include
terminating null pointer in string expression conversion.

gcc/testsuite/ChangeLog:

* gdc.dg/pr102185.d: New test.

Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

This patch catches another place in the middle-end where it's possible
to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the
benefit of later RTL optimizations.  This adds the same logic to
expr.c's convert_move as recently added to convert_modes.

On nvptx-none, the simple test program:

short foo (char c) { return c; }

currently generates three instructions:

mov.u32 %r23, %ar0;
cvt.u16.u32     %r24, %r23;
cvt.s32.s16     %value, %r24;

with this patch, we now generate just one:

mov.u32 %value, %ar0;

This patch should look familiar, it's almost identical to the recent patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with
the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html

2021-09-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.

Daily bump.

compiler: don't pad zero-sized trailing field in results struct

Nothing can take the address of that field anyhow.

Fixes PR go/101994

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/343873

Refactor jump_thread_path_registry.

In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.

The fact that both types of threads cannot coexist, simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Diving things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.

The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
  jt_path_registry ();
  virtual ~jt_path_registry ();
  bool register_jump_thread (vec<jump_thread_edge *> *);
  bool thread_through_all_blocks (bool peel_loop_headers);
  jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
  vec<jump_thread_edge *> *allocate_thread_path ();
protected:
  vec<vec<jump_thread_edge *> *> m_paths;
  unsigned long m_num_threaded_edges;
private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
  fwd_jt_path_registry ();
  ~fwd_jt_path_registry ();
  void remove_jump_threads_including (edge);
private:
  bool update_cfg (bool peel_loop_headers) override;
  void mark_threaded_blocks (bitmap threaded_blocks);
  bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
  bool thread_block (basic_block, bool noloop_only);
  bool thread_through_loop_header (class loop *loop,
                                   bool may_peel_loop_headers);
  class redirection_data *lookup_redirection_data (edge e, enum insert_option);
  hash_table<struct removed_edges> *m_removed_edges;
  hash_table<redirection_data> *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override;
  void adjust_paths_after_duplication (unsigned curr_path_num);
  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                              unsigned n_region, unsigned current_path_no);
  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad-nauseum.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader_registry): Use
back_jt_path_registry.
* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
fwd_jt_path_registry.
* tree-ssa-threadedge.h (class jump_threader): Same..
* tree-ssa-threadupdate.c
(jump_thread_path_registry::jump_thread_path_registry): Rename...
(jt_path_registry::jt_path_registry): ...to this.
(jump_thread_path_registry::~jump_thread_path_registry): Rename...
(jt_path_registry::~jt_path_registry): ...this.
(fwd_jt_path_registry::fwd_jt_path_registry): New.
(fwd_jt_path_registry::~fwd_jt_path_registry): New.
(jump_thread_path_registry::allocate_thread_edge): Rename...
(jt_path_registry::allocate_thread_edge): ...to this.
(jump_thread_path_registry::allocate_thread_path): Rename...
(jt_path_registry::allocate_thread_path): ...to this.
(jump_thread_path_registry::lookup_redirection_data): Rename...
(fwd_jt_path_registry::lookup_redirection_data): ...to this.
(jump_thread_path_registry::thread_block_1): Rename...
(fwd_jt_path_registry::thread_block_1): ...to this.
(jump_thread_path_registry::thread_block): Rename...
(fwd_jt_path_registry::thread_block): ...to this.
(jt_path_registry::thread_through_loop_header): Rename...
(fwd_jt_path_registry::thread_through_loop_header): ...to this.
(jump_thread_path_registry::mark_threaded_blocks): Rename...
(fwd_jt_path_registry::mark_threaded_blocks): ...to this.
(jump_thread_path_registry::debug_path): Rename...
(jt_path_registry::debug_path): ...to this.
(jump_thread_path_registry::dump): Rename...
(jt_path_registry::debug): ...to this.
(jump_thread_path_registry::rewire_first_differing_edge): Rename...
(back_jt_path_registry::rewire_first_differing_edge): ...to this.
(jump_thread_path_registry::adjust_paths_after_duplication): Rename...
(back_jt_path_registry::adjust_paths_after_duplication): ...to this.
(jump_thread_path_registry::duplicate_thread_path): Rename...
(back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
drop ill-formed candidates.
(jump_thread_path_registry::remove_jump_threads_including): Rename...
(fwd_jt_path_registry::remove_jump_threads_including): ...to this.
(jt_path_registry::thread_through_all_blocks): New.
(back_jt_path_registry::update_cfg): New.
(fwd_jt_path_registry::update_cfg): New.
(jump_thread_path_registry::register_jump_thread): Rename...
(jt_path_registry::register_jump_thread): ...to this.
* tree-ssa-threadupdate.h (class jump_thread_path_registry):
Abstract to...
(class jt_path_registry): ...here.
(class fwd_jt_path_registry): New.
(class back_jt_path_registry): New.

testsuite: Fix c-c++-common/auto-init-* tests

> > 2021-08-20  qing zhao  <qing.zhao@oracle.com>
> >
> >        * c-c++-common/auto-init-1.c: New test.
> >        * c-c++-common/auto-init-10.c: New test.
> >        * c-c++-common/auto-init-11.c: New test.
> >        * c-c++-common/auto-init-12.c: New test.
> >        * c-c++-common/auto-init-13.c: New test.
> >        * c-c++-common/auto-init-14.c: New test.
> >        * c-c++-common/auto-init-15.c: New test.
> >        * c-c++-common/auto-init-16.c: New test.
> >        * c-c++-common/auto-init-2.c: New test.
> >        * c-c++-common/auto-init-3.c: New test.
> >        * c-c++-common/auto-init-4.c: New test.
> >        * c-c++-common/auto-init-5.c: New test.
> >        * c-c++-common/auto-init-6.c: New test.
> >        * c-c++-common/auto-init-7.c: New test.
> >        * c-c++-common/auto-init-8.c: New test.
> >        * c-c++-common/auto-init-9.c: New test.
> >        * c-c++-common/auto-init-esra.c: New test.
> >        * c-c++-common/auto-init-padding-1.c: New test.
> >        * c-c++-common/auto-init-padding-2.c: New test.
> >        * c-c++-common/auto-init-padding-3.c: New test.

This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.

The main problem is hardcoding type sizes and structure layout expectations
that are valid only on some lp64 targets.
On ilp32 long and pointer are 32-bit, and there are targets that are neither
ilp32 nor lp64 and there even other sizes can't be taken for granted.
Also, long double depending on target and options is either 8, 12 or 16 byte
(the first one when it is the same as double, the second e.g. for ia32
extended long double (which is under the hood 10 byte), the last either
the same hw type on x86_64 or IBM double double or IEEE quad).
In the last test, one problem is that unsigned long is on ilp32 32-bit
instead of 64-bit, but even just changing to long long is not enough,
as long long in structures on ia32 is only 4 byte aligned instead of 8.

Tested on x86_64-linux -m32/-m64, ok for trunk?

Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
deal with that right now, just try
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*'
Guess some of those tests should be restricted to lp64 in there, others
where it might be easier to check all of lp64, x32 and ia32 code generation
could have different matches.  Wonder also about the aarch64 tests, there is
also -mabi=ilp32...
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfffffffffefefefe" 2
+FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t\\\\%xmm0, \\\\%xmm0 3
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\\\.long\\t0 14
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\)\\\\)" 3
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 2
+FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-2.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\\t\\\\\$16,
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
+FAIL: gcc.target/i386/auto-init-padding-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-5.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq

2021-09-11  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/auto-init-1.c: Enable test only on ilp32 or lp64
targets, expect different long and pointer sizes between ilp32 and
lp64.
* c-c++-common/auto-init-2.c: Likewise.
* c-c++-common/auto-init-3.c: Expect one of the common long double
sizes (8/12/16 bytes) instead of hardcoding 16 bytes.
* c-c++-common/auto-init-4.c: Likewise.
* c-c++-common/auto-init-5.c: Expect one of the common
_Complex long double sizes (16/24/32 bytes) instead of hardcoding 32
bytes.
* c-c++-common/auto-init-6.c: Likewise.
* c-c++-common/auto-init-padding-1.c: Enable test only on ilp32 or lp64
targets.
(struct test_small_hole): Change type of four to unsigned long long
and add aligned attribute.

Daily bump.

libgccjit: Generate debug info for variables

Finalize declares via available helpers after location is set. Set
TYPE_NAME of primitives and friends to "int" etc. Debug info is now
set properly for variables.

Signed-off-by:
2021-09-09 Petter Tomner <tomner@kth.se>

gcc/jit/
* jit-playback.c: Moved global var processing to after loc handling.
  Setting TYPE_NAME for fundamental types.
  Using common functions for finalizing globals.
* jit-playback.h: New method init_types().
  Changed get_tree_node_for_type() to method.

gcc/testsuite/
* jit.dg/test-error-array-bounds.c: Array is not unsigned
* jit.dg/jit.exp: Helper function
* jit.dg/test-debuginfo.c: New testcase

Revert "Get rid of all float-int special cases in validate_subreg."

This reverts commit d2874d905647a1d146dafa60199d440e837adc4d.

PR target/102254
PR target/102154
PR target/102211

MAINTAINERS: Adding myself to to DCO and write after approval

2020-09-10 Petter Tomner <tomner@kth.se>

ChangeLog:
* MAINTAINERS: Me added to DCO and write after approval

openmp: Implement OpenMP 5.1 atomics, so far for C only

This patch implements OpenMP 5.1 atomics (with clarifications from upcoming 5.2).
The most important changes are that it is now possible to write (for C/C++,
for Fortran it was possible before already) min/max atomics and more importantly
compare and exchange in various forms.
Also, acq_rel is now allowed on read/write and acq_rel/acquire are allowed on
update, and there are new compare, weak and fail clauses.

2021-09-10  Jakub Jelinek  <jakub@redhat.com>

gcc/
* tree-core.h (enum omp_memory_order): Add OMP_MEMORY_ORDER_MASK,
OMP_FAIL_MEMORY_ORDER_UNSPECIFIED, OMP_FAIL_MEMORY_ORDER_RELAXED,
OMP_FAIL_MEMORY_ORDER_ACQUIRE, OMP_FAIL_MEMORY_ORDER_RELEASE,
OMP_FAIL_MEMORY_ORDER_ACQ_REL, OMP_FAIL_MEMORY_ORDER_SEQ_CST and
OMP_FAIL_MEMORY_ORDER_MASK enumerators.
(OMP_FAIL_MEMORY_ORDER_SHIFT): Define.
* gimple-pretty-print.c (dump_gimple_omp_atomic_load,
dump_gimple_omp_atomic_store): Print [weak] for weak atomic
load/store.
* gimple.h (enum gf_mask): Change GF_OMP_ATOMIC_MEMORY_ORDER
to 6-bit mask, adjust GF_OMP_ATOMIC_NEED_VALUE value and add
GF_OMP_ATOMIC_WEAK.
(gimple_omp_atomic_weak_p, gimple_omp_atomic_set_weak): New inline
functions.
* tree.h (OMP_ATOMIC_WEAK): Define.
* tree-pretty-print.c (dump_omp_atomic_memory_order): Adjust for
fail memory order being encoded in the same enum and also print
fail clause if present.
(dump_generic_node): Print weak clause if OMP_ATOMIC_WEAK.
* gimplify.c (goa_stabilize_expr): Add target_expr and rhs arguments,
handle pre_p == NULL case as a test mode that only returns value
but doesn't change gimplify nor change anything otherwise, adjust
recursive calls, add MODIFY_EXPR, ADDR_EXPR, COND_EXPR, TARGET_EXPR
and CALL_EXPR handling, adjust COMPOUND_EXPR handling for
__builtin_clear_padding calls, for !rhs gimplify as lvalue rather
than rvalue.
(gimplify_omp_atomic): Adjust goa_stabilize_expr caller.  Handle
COND_EXPR rhs.  Set weak flag on gimple load/store for
OMP_ATOMIC_WEAK.
* omp-expand.c (omp_memory_order_to_fail_memmodel): New function.
(omp_memory_order_to_memmodel): Adjust for fail clause encoded
in the same enum.
(expand_omp_atomic_cas): New function.
(expand_omp_atomic_pipeline): Use omp_memory_order_to_fail_memmodel
function.
(expand_omp_atomic): Attempt to optimize atomic compare and exchange
using expand_omp_atomic_cas.
gcc/c-family/
* c-common.h (c_finish_omp_atomic): Add r and weak arguments.
* c-omp.c: Include gimple-fold.h.
(c_finish_omp_atomic): Add r and weak arguments.  Add support for
OpenMP 5.1 atomics.
gcc/c/
* c-parser.c (c_parser_conditional_expression): If omp_atomic_lhs and
cond.value is >, < or == with omp_atomic_lhs as one of the operands,
don't call build_conditional_expr, instead build a COND_EXPR directly.
(c_parser_binary_expression): Avoid calling parser_build_binary_op
if omp_atomic_lhs even in more cases for >, < or ==.
(c_parser_omp_atomic): Update function comment for OpenMP 5.1 atomics,
parse OpenMP 5.1 atomics and fail, compare and weak clauses, allow
acq_rel on atomic read/write and acq_rel/acquire clauses on update.
* c-typeck.c (build_binary_op): For flag_openmp only handle
MIN_EXPR/MAX_EXPR.
gcc/cp/
* parser.c (cp_parser_omp_atomic): Allow acq_rel on atomic read/write
and acq_rel/acquire clauses on update.
* semantics.c (finish_omp_atomic): Adjust c_finish_omp_atomic caller.
gcc/testsuite/
* c-c++-common/gomp/atomic-17.c (foo): Add tests for atomic read,
write or update with acq_rel clause and atomic update with acquire clause.
* c-c++-common/gomp/atomic-18.c (foo): Adjust expected diagnostics
wording, remove tests moved to atomic-17.c.
* c-c++-common/gomp/atomic-21.c: Expect only 2 omp atomic release and
2 omp atomic acq_rel directives instead of 4 omp atomic release.
* c-c++-common/gomp/atomic-25.c: New test.
* c-c++-common/gomp/atomic-26.c: New test.
* c-c++-common/gomp/atomic-27.c: New test.
* c-c++-common/gomp/atomic-28.c: New test.
* c-c++-common/gomp/atomic-29.c: New test.
* c-c++-common/gomp/atomic-30.c: New test.
* c-c++-common/goacc-gomp/atomic.c: Expect 1 omp atomic release and
1 omp atomic_acq_rel instead of 2 omp atomic release directives.
* gcc.dg/gomp/atomic-5.c: Adjust expected error diagnostic wording.
* g++.dg/gomp/atomic-18.C:Expect 4 omp atomic release and
1 omp atomic_acq_rel instead of 5 omp atomic release directives.
libgomp/
* testsuite/libgomp.c-c++-common/atomic-19.c: New test.
* testsuite/libgomp.c-c++-common/atomic-20.c: New test.
* testsuite/libgomp.c-c++-common/atomic-21.c: New test.

compiler: correct condition for calling memclrHasPointers

When compiling append(s, make([]typ, ln)...), where typ has a pointer,
and the append fits within the existing capacity of s, the condition
used to clear out the new elements was reversed.

Fixes golang/go#47771

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/344189

Disable threading through latches until after loop optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops being
destroyed which made subsequent loop optimizations to fail.
Consequently, this patch's mail goal is to disable jump threading
involving the latch until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12] to the late threaders
thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>

doc: document BPF -mcpu and related options

This commit adds documentation for the new BPF options -mcpu, -mjmpext,
-mjmp32, and -malu32.

gcc/ChangeLog:
* doc/invoke.texi: Document BPF -mcpu, -mjmpext, -mjmp32 and -malu32
options.

bpf testsuite: add tests for new feature options

This commit adds tests for the new -mjmpext, -mjmp32 and -malu32 feature
options in the BPF backend.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/alu-1.c: New test.
* gcc.target/bpf/jmp-1.c: New test.

bpf: add -mcpu and related feature options

New instructions have been added over time to the eBPF ISA, but
previously there has been no good method to select which version to
target in GCC.

This patch adds the following options to the BPF backend:

  -mcpu={v1, v2, v3}
    Select which version of the eBPF ISA to target. This enables or
    disables generation of certain instructions. The default is v3.

  -mjmpext
    Enable extra conditional branch instructions.
    Enabled for CPU v2 and above.

  -mjmp32
    Enable 32-bit jump/branch instructions.
    Enabled for CPU v3 and above.

  -malu32
    Enable 32-bit ALU instructions.
    Enabled for CPU v3 and above.

gcc/ChangeLog:
* config/bpf/bpf-opts.h (bpf_isa_version): New enum.
* config/bpf/bpf-protos.h (bpf_expand_cbranch): New.
* config/bpf/bpf.c (bpf_option_override): Handle -mcpu option.
(bpf_expand_cbranch): New function.
* config/bpf/bpf.md (AM mode iterator): Conditionalize support for SI
mode.
(zero_extendsidi2): Only use mov32 instruction if it is available.
(SIM mode iterator): Conditionalize support for SI mode.
(JM mode iterator): New.
(cbranchdi4): Update name, use new JM iterator. Use bpf_expand_cbranch.
(*branch_on_di): Update name, use new JM iterator.
* config/bpf/bpf.opt: (mjmpext): New option.
(malu32): Likewise.
(mjmp32): Likewise.
(mcpu): Likewise.
(bpf_isa): New enum.

bpf: correct zero_extend output templates

The output templates for zero_extendhidi2 and zero_extendqidi2 could
lead to incorrect code generation when zero-extending one register into
another. This patch adds a new output template to the define_insns to
handle such cases and produce correct asm.

gcc/ChangeLog:
* config/bpf/bpf.md (zero_extendhidi2): Add new output template
for register-to-register extensions.
(zero_extendqidi2): Likewise.

libstdc++: Use "test.invalid." for invalid hostname

This avoids test.invalid.some.domain being successfully resolved.

libstdc++-v3/ChangeLog:

* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Fix invalid hostname to only match the .invalid TLD.

middle-end/102273 - avoid ICE with auto-init and nested functions

This refactors expansion to consider non-decl LHS. I suspect
the is_val argument is not needed.

2021-09-10 Richard Biener <rguenther@suse.de>

PR middle-end/102273
* internal-fn.c (expand_DEFERRED_INIT): Always expand non-SSA vars.

* gcc.dg/pr102273.c: New testcase.

Fix 'dg-do run' syntax in 'c-c++-common/auto-init-padding-{2,3}.c'

Fix-up for recent commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
"Add -ftrivial-auto-var-init option and uninitialized variable attribute".

gcc/testsuite/
* c-c++-common/auto-init-padding-2.c: Fix 'dg-do run' syntax.
* c-c++-common/auto-init-padding-3.c: Likewise.

middle-end/102269 - avoid auto-init of empty types

This avoids initializing empty types for which we'll eventually
leave a .DEFERRED_INIT call without a LHS.

2021-09-10 Richard Biener <rguenther@suse.de>

PR middle-end/102269
* gimplify.c (is_var_need_auto_init): Empty types do not need
initialization.

* gcc.dg/pr102269.c: New testcase.

Remove vestiges of --with-stabs

This removes the --with-stabs configure option which had no effect
since quite some time.

2021-09-10 Richard Biener <rguenther@suse.de>

* configure.ac (--with-stabs): Remove.
* configure: Regenerate.
* doc/install.texi: Remove --with-stabs documentation.

AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h
(check_results_mask): New check_function.
* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.