review.tizen.org Git - platform/upstream/gcc.git/log

rs6000: Add load density heuristic

We noticed that SPEC2017 503.bwaves_r run time degrades by
about 8% on P8 and P9 if we enabled vectorization at O2
fast-math (with cheap vect cost model).  Comparing to Ofast,
compiler doesn't do the loop interchange on the innermost
loop, it's not profitable to vectorize it then.

As Richi's comments [1], this follows the similar idea to
over price the vector construction fed by VMAT_ELEMENTWISE
or VMAT_STRIDED_SLP.  Instead of adding the extra cost on
vector construction costing immediately, it firstly records
how many loads and vectorized statements in the given loop,
later in rs6000_density_test (called by finish_cost) it
computes the load density ratio against all vectorized
statements, and check with the corresponding thresholds
DENSITY_LOAD_NUM_THRESHOLD and DENSITY_LOAD_PCT_THRESHOLD,
do the actual extra pricing if both thresholds are exceeded.

Note that this new load density heuristic check is based on
some fields in target cost which are updated as needed when
scanning each add_stmt_cost entry, it's independent of the
current function rs6000_density_test which requires to scan
non_vect stmts.  Since it's checking the load stmts count
vs. all vectorized stmts, it's kind of density, so I put
it in function rs6000_density_test.  With the same reason to
keep it independent, I didn't put it as an else arm of the
current existing density threshold check hunk or before this
hunk.

In the investigation of -1.04% degradation from 526.blender_r
on Power8, I noticed that the extra penalized cost 320 on one
single vector construction for mode V16QI is much exaggerated,
which makes the final body cost unreliable, so this patch adds
one maximum bound for the extra penalized cost for each vector
construction statement.

Full SPEC2017 performance evaluation on Power8/Power9 with
option combinations:
  * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap}
    {,-ffast-math}
  * {-O3, -Ofast} {,-funroll-loops}
bwaves_r degradations on P8/P9 have been fixed, nothing else
remarkable was observed.  Power10 -Ofast -funroll-loops run
shows it's neutral, while -O2 -ftree-vectorize run shows the
bwaves_r degradation is fixed expectedly.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html

gcc/ChangeLog:

* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
nstmts, nloads and extra_ctor_cost.
(rs6000_density_test): Add load density related heuristics.  Do
extra costing on vector construction statements if need.
(rs6000_init_cost): Init new members.
(rs6000_update_target_cost_per_stmt): New function.
(rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
rs6000_update_target_cost_per_stmt and call it.

rs6000: Remove typedef for struct rs6000_cost_data

As Segher pointed out, to typedef struct _rs6000_cost_data as
rs6000_cost_data is useless, so rewrite it without typedef.

gcc/ChangeLog:

* config/rs6000/rs6000.c (struct rs6000_cost_data): Remove typedef.
(rs6000_init_cost): Adjust.

[i386] Remove UNSPEC_{COPYSIGN,XORSIGN}.

gcc/ChangeLog:

* config/i386/i386.md: (UNSPEC_COPYSIGN): Remove.
(UNSPEC_XORSIGN): Ditto.

Daily bump.

d: Don't include terminating null pointer in string expression conversion (PR102185)

This gets re-added by the ExprVisitor when lowering StringExp back into a
STRING_CST during the code generator pass.

PR d/102185

gcc/d/ChangeLog:

* d-builtins.cc (d_eval_constant_expression): Don't include
terminating null pointer in string expression conversion.

gcc/testsuite/ChangeLog:

* gdc.dg/pr102185.d: New test.

Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

This patch catches another place in the middle-end where it's possible
to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the
benefit of later RTL optimizations.  This adds the same logic to
expr.c's convert_move as recently added to convert_modes.

On nvptx-none, the simple test program:

short foo (char c) { return c; }

currently generates three instructions:

mov.u32 %r23, %ar0;
cvt.u16.u32     %r24, %r23;
cvt.s32.s16     %value, %r24;

with this patch, we now generate just one:

mov.u32 %value, %ar0;

This patch should look familiar, it's almost identical to the recent patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with
the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html

2021-09-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.

Daily bump.

compiler: don't pad zero-sized trailing field in results struct

Nothing can take the address of that field anyhow.

Fixes PR go/101994

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/343873

Refactor jump_thread_path_registry.

In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.

The fact that both types of threads cannot coexist, simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Diving things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.

The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
  jt_path_registry ();
  virtual ~jt_path_registry ();
  bool register_jump_thread (vec<jump_thread_edge *> *);
  bool thread_through_all_blocks (bool peel_loop_headers);
  jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
  vec<jump_thread_edge *> *allocate_thread_path ();
protected:
  vec<vec<jump_thread_edge *> *> m_paths;
  unsigned long m_num_threaded_edges;
private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
  fwd_jt_path_registry ();
  ~fwd_jt_path_registry ();
  void remove_jump_threads_including (edge);
private:
  bool update_cfg (bool peel_loop_headers) override;
  void mark_threaded_blocks (bitmap threaded_blocks);
  bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
  bool thread_block (basic_block, bool noloop_only);
  bool thread_through_loop_header (class loop *loop,
                                   bool may_peel_loop_headers);
  class redirection_data *lookup_redirection_data (edge e, enum insert_option);
  hash_table<struct removed_edges> *m_removed_edges;
  hash_table<redirection_data> *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override;
  void adjust_paths_after_duplication (unsigned curr_path_num);
  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
                              unsigned n_region, unsigned current_path_no);
  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad-nauseum.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader_registry): Use
back_jt_path_registry.
* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
fwd_jt_path_registry.
* tree-ssa-threadedge.h (class jump_threader): Same..
* tree-ssa-threadupdate.c
(jump_thread_path_registry::jump_thread_path_registry): Rename...
(jt_path_registry::jt_path_registry): ...to this.
(jump_thread_path_registry::~jump_thread_path_registry): Rename...
(jt_path_registry::~jt_path_registry): ...this.
(fwd_jt_path_registry::fwd_jt_path_registry): New.
(fwd_jt_path_registry::~fwd_jt_path_registry): New.
(jump_thread_path_registry::allocate_thread_edge): Rename...
(jt_path_registry::allocate_thread_edge): ...to this.
(jump_thread_path_registry::allocate_thread_path): Rename...
(jt_path_registry::allocate_thread_path): ...to this.
(jump_thread_path_registry::lookup_redirection_data): Rename...
(fwd_jt_path_registry::lookup_redirection_data): ...to this.
(jump_thread_path_registry::thread_block_1): Rename...
(fwd_jt_path_registry::thread_block_1): ...to this.
(jump_thread_path_registry::thread_block): Rename...
(fwd_jt_path_registry::thread_block): ...to this.
(jt_path_registry::thread_through_loop_header): Rename...
(fwd_jt_path_registry::thread_through_loop_header): ...to this.
(jump_thread_path_registry::mark_threaded_blocks): Rename...
(fwd_jt_path_registry::mark_threaded_blocks): ...to this.
(jump_thread_path_registry::debug_path): Rename...
(jt_path_registry::debug_path): ...to this.
(jump_thread_path_registry::dump): Rename...
(jt_path_registry::debug): ...to this.
(jump_thread_path_registry::rewire_first_differing_edge): Rename...
(back_jt_path_registry::rewire_first_differing_edge): ...to this.
(jump_thread_path_registry::adjust_paths_after_duplication): Rename...
(back_jt_path_registry::adjust_paths_after_duplication): ...to this.
(jump_thread_path_registry::duplicate_thread_path): Rename...
(back_jt_path_registry::duplicate_thread_path): ...to this.  Also,
drop ill-formed candidates.
(jump_thread_path_registry::remove_jump_threads_including): Rename...
(fwd_jt_path_registry::remove_jump_threads_including): ...to this.
(jt_path_registry::thread_through_all_blocks): New.
(back_jt_path_registry::update_cfg): New.
(fwd_jt_path_registry::update_cfg): New.
(jump_thread_path_registry::register_jump_thread): Rename...
(jt_path_registry::register_jump_thread): ...to this.
* tree-ssa-threadupdate.h (class jump_thread_path_registry):
Abstract to...
(class jt_path_registry): ...here.
(class fwd_jt_path_registry): New.
(class back_jt_path_registry): New.

testsuite: Fix c-c++-common/auto-init-* tests

> > 2021-08-20  qing zhao  <qing.zhao@oracle.com>
> >
> >        * c-c++-common/auto-init-1.c: New test.
> >        * c-c++-common/auto-init-10.c: New test.
> >        * c-c++-common/auto-init-11.c: New test.
> >        * c-c++-common/auto-init-12.c: New test.
> >        * c-c++-common/auto-init-13.c: New test.
> >        * c-c++-common/auto-init-14.c: New test.
> >        * c-c++-common/auto-init-15.c: New test.
> >        * c-c++-common/auto-init-16.c: New test.
> >        * c-c++-common/auto-init-2.c: New test.
> >        * c-c++-common/auto-init-3.c: New test.
> >        * c-c++-common/auto-init-4.c: New test.
> >        * c-c++-common/auto-init-5.c: New test.
> >        * c-c++-common/auto-init-6.c: New test.
> >        * c-c++-common/auto-init-7.c: New test.
> >        * c-c++-common/auto-init-8.c: New test.
> >        * c-c++-common/auto-init-9.c: New test.
> >        * c-c++-common/auto-init-esra.c: New test.
> >        * c-c++-common/auto-init-padding-1.c: New test.
> >        * c-c++-common/auto-init-padding-2.c: New test.
> >        * c-c++-common/auto-init-padding-3.c: New test.

This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.

The main problem is hardcoding type sizes and structure layout expectations
that are valid only on some lp64 targets.
On ilp32 long and pointer are 32-bit, and there are targets that are neither
ilp32 nor lp64 and there even other sizes can't be taken for granted.
Also, long double depending on target and options is either 8, 12 or 16 byte
(the first one when it is the same as double, the second e.g. for ia32
extended long double (which is under the hood 10 byte), the last either
the same hw type on x86_64 or IBM double double or IEEE quad).
In the last test, one problem is that unsigned long is on ilp32 32-bit
instead of 64-bit, but even just changing to long long is not enough,
as long long in structures on ia32 is only 4 byte aligned instead of 8.

Tested on x86_64-linux -m32/-m64, ok for trunk?

Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
deal with that right now, just try
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*'
Guess some of those tests should be restricted to lp64 in there, others
where it might be easier to check all of lp64, x32 and ia32 code generation
could have different matches.  Wonder also about the aarch64 tests, there is
also -mabi=ilp32...
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfffffffffefefefe" 2
+FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t\\\\%xmm0, \\\\%xmm0 3
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\\\.long\\t0 14
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 1
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\)\\\\)" 3
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffefefefe" 1
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand "\\\\[0xfefefefefefefefe\\\\]" 2
+FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-2.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\\t\\\\\$16,
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
+FAIL: gcc.target/i386/auto-init-padding-4.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-5.c scan-rtl-dump-times expand "const_int 0 \\\\[0\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-6.c scan-rtl-dump-times expand "0xfffffffffffffffe\\\\]\\\\) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\\t\\\\\$0, 2
+FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq

2021-09-11  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/auto-init-1.c: Enable test only on ilp32 or lp64
targets, expect different long and pointer sizes between ilp32 and
lp64.
* c-c++-common/auto-init-2.c: Likewise.
* c-c++-common/auto-init-3.c: Expect one of the common long double
sizes (8/12/16 bytes) instead of hardcoding 16 bytes.
* c-c++-common/auto-init-4.c: Likewise.
* c-c++-common/auto-init-5.c: Expect one of the common
_Complex long double sizes (16/24/32 bytes) instead of hardcoding 32
bytes.
* c-c++-common/auto-init-6.c: Likewise.
* c-c++-common/auto-init-padding-1.c: Enable test only on ilp32 or lp64
targets.
(struct test_small_hole): Change type of four to unsigned long long
and add aligned attribute.

Daily bump.

libgccjit: Generate debug info for variables

Finalize declares via available helpers after location is set. Set
TYPE_NAME of primitives and friends to "int" etc. Debug info is now
set properly for variables.

Signed-off-by:
2021-09-09 Petter Tomner <tomner@kth.se>

gcc/jit/
* jit-playback.c: Moved global var processing to after loc handling.
  Setting TYPE_NAME for fundamental types.
  Using common functions for finalizing globals.
* jit-playback.h: New method init_types().
  Changed get_tree_node_for_type() to method.

gcc/testsuite/
* jit.dg/test-error-array-bounds.c: Array is not unsigned
* jit.dg/jit.exp: Helper function
* jit.dg/test-debuginfo.c: New testcase

Revert "Get rid of all float-int special cases in validate_subreg."

This reverts commit d2874d905647a1d146dafa60199d440e837adc4d.

PR target/102254
PR target/102154
PR target/102211

MAINTAINERS: Adding myself to to DCO and write after approval

2020-09-10 Petter Tomner <tomner@kth.se>

ChangeLog:
* MAINTAINERS: Me added to DCO and write after approval

openmp: Implement OpenMP 5.1 atomics, so far for C only

This patch implements OpenMP 5.1 atomics (with clarifications from upcoming 5.2).
The most important changes are that it is now possible to write (for C/C++,
for Fortran it was possible before already) min/max atomics and more importantly
compare and exchange in various forms.
Also, acq_rel is now allowed on read/write and acq_rel/acquire are allowed on
update, and there are new compare, weak and fail clauses.

2021-09-10  Jakub Jelinek  <jakub@redhat.com>

gcc/
* tree-core.h (enum omp_memory_order): Add OMP_MEMORY_ORDER_MASK,
OMP_FAIL_MEMORY_ORDER_UNSPECIFIED, OMP_FAIL_MEMORY_ORDER_RELAXED,
OMP_FAIL_MEMORY_ORDER_ACQUIRE, OMP_FAIL_MEMORY_ORDER_RELEASE,
OMP_FAIL_MEMORY_ORDER_ACQ_REL, OMP_FAIL_MEMORY_ORDER_SEQ_CST and
OMP_FAIL_MEMORY_ORDER_MASK enumerators.
(OMP_FAIL_MEMORY_ORDER_SHIFT): Define.
* gimple-pretty-print.c (dump_gimple_omp_atomic_load,
dump_gimple_omp_atomic_store): Print [weak] for weak atomic
load/store.
* gimple.h (enum gf_mask): Change GF_OMP_ATOMIC_MEMORY_ORDER
to 6-bit mask, adjust GF_OMP_ATOMIC_NEED_VALUE value and add
GF_OMP_ATOMIC_WEAK.
(gimple_omp_atomic_weak_p, gimple_omp_atomic_set_weak): New inline
functions.
* tree.h (OMP_ATOMIC_WEAK): Define.
* tree-pretty-print.c (dump_omp_atomic_memory_order): Adjust for
fail memory order being encoded in the same enum and also print
fail clause if present.
(dump_generic_node): Print weak clause if OMP_ATOMIC_WEAK.
* gimplify.c (goa_stabilize_expr): Add target_expr and rhs arguments,
handle pre_p == NULL case as a test mode that only returns value
but doesn't change gimplify nor change anything otherwise, adjust
recursive calls, add MODIFY_EXPR, ADDR_EXPR, COND_EXPR, TARGET_EXPR
and CALL_EXPR handling, adjust COMPOUND_EXPR handling for
__builtin_clear_padding calls, for !rhs gimplify as lvalue rather
than rvalue.
(gimplify_omp_atomic): Adjust goa_stabilize_expr caller.  Handle
COND_EXPR rhs.  Set weak flag on gimple load/store for
OMP_ATOMIC_WEAK.
* omp-expand.c (omp_memory_order_to_fail_memmodel): New function.
(omp_memory_order_to_memmodel): Adjust for fail clause encoded
in the same enum.
(expand_omp_atomic_cas): New function.
(expand_omp_atomic_pipeline): Use omp_memory_order_to_fail_memmodel
function.
(expand_omp_atomic): Attempt to optimize atomic compare and exchange
using expand_omp_atomic_cas.
gcc/c-family/
* c-common.h (c_finish_omp_atomic): Add r and weak arguments.
* c-omp.c: Include gimple-fold.h.
(c_finish_omp_atomic): Add r and weak arguments.  Add support for
OpenMP 5.1 atomics.
gcc/c/
* c-parser.c (c_parser_conditional_expression): If omp_atomic_lhs and
cond.value is >, < or == with omp_atomic_lhs as one of the operands,
don't call build_conditional_expr, instead build a COND_EXPR directly.
(c_parser_binary_expression): Avoid calling parser_build_binary_op
if omp_atomic_lhs even in more cases for >, < or ==.
(c_parser_omp_atomic): Update function comment for OpenMP 5.1 atomics,
parse OpenMP 5.1 atomics and fail, compare and weak clauses, allow
acq_rel on atomic read/write and acq_rel/acquire clauses on update.
* c-typeck.c (build_binary_op): For flag_openmp only handle
MIN_EXPR/MAX_EXPR.
gcc/cp/
* parser.c (cp_parser_omp_atomic): Allow acq_rel on atomic read/write
and acq_rel/acquire clauses on update.
* semantics.c (finish_omp_atomic): Adjust c_finish_omp_atomic caller.
gcc/testsuite/
* c-c++-common/gomp/atomic-17.c (foo): Add tests for atomic read,
write or update with acq_rel clause and atomic update with acquire clause.
* c-c++-common/gomp/atomic-18.c (foo): Adjust expected diagnostics
wording, remove tests moved to atomic-17.c.
* c-c++-common/gomp/atomic-21.c: Expect only 2 omp atomic release and
2 omp atomic acq_rel directives instead of 4 omp atomic release.
* c-c++-common/gomp/atomic-25.c: New test.
* c-c++-common/gomp/atomic-26.c: New test.
* c-c++-common/gomp/atomic-27.c: New test.
* c-c++-common/gomp/atomic-28.c: New test.
* c-c++-common/gomp/atomic-29.c: New test.
* c-c++-common/gomp/atomic-30.c: New test.
* c-c++-common/goacc-gomp/atomic.c: Expect 1 omp atomic release and
1 omp atomic_acq_rel instead of 2 omp atomic release directives.
* gcc.dg/gomp/atomic-5.c: Adjust expected error diagnostic wording.
* g++.dg/gomp/atomic-18.C:Expect 4 omp atomic release and
1 omp atomic_acq_rel instead of 5 omp atomic release directives.
libgomp/
* testsuite/libgomp.c-c++-common/atomic-19.c: New test.
* testsuite/libgomp.c-c++-common/atomic-20.c: New test.
* testsuite/libgomp.c-c++-common/atomic-21.c: New test.

compiler: correct condition for calling memclrHasPointers

When compiling append(s, make([]typ, ln)...), where typ has a pointer,
and the append fits within the existing capacity of s, the condition
used to clear out the new elements was reversed.

Fixes golang/go#47771

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/344189

Disable threading through latches until after loop optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops being
destroyed which made subsequent loop optimizations to fail.
Consequently, this patch's mail goal is to disable jump threading
involving the latch until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12] to the late threaders
thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>

doc: document BPF -mcpu and related options

This commit adds documentation for the new BPF options -mcpu, -mjmpext,
-mjmp32, and -malu32.

gcc/ChangeLog:
* doc/invoke.texi: Document BPF -mcpu, -mjmpext, -mjmp32 and -malu32
options.

bpf testsuite: add tests for new feature options

This commit adds tests for the new -mjmpext, -mjmp32 and -malu32 feature
options in the BPF backend.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/alu-1.c: New test.
* gcc.target/bpf/jmp-1.c: New test.

bpf: add -mcpu and related feature options

New instructions have been added over time to the eBPF ISA, but
previously there has been no good method to select which version to
target in GCC.

This patch adds the following options to the BPF backend:

  -mcpu={v1, v2, v3}
    Select which version of the eBPF ISA to target. This enables or
    disables generation of certain instructions. The default is v3.

  -mjmpext
    Enable extra conditional branch instructions.
    Enabled for CPU v2 and above.

  -mjmp32
    Enable 32-bit jump/branch instructions.
    Enabled for CPU v3 and above.

  -malu32
    Enable 32-bit ALU instructions.
    Enabled for CPU v3 and above.

gcc/ChangeLog:
* config/bpf/bpf-opts.h (bpf_isa_version): New enum.
* config/bpf/bpf-protos.h (bpf_expand_cbranch): New.
* config/bpf/bpf.c (bpf_option_override): Handle -mcpu option.
(bpf_expand_cbranch): New function.
* config/bpf/bpf.md (AM mode iterator): Conditionalize support for SI
mode.
(zero_extendsidi2): Only use mov32 instruction if it is available.
(SIM mode iterator): Conditionalize support for SI mode.
(JM mode iterator): New.
(cbranchdi4): Update name, use new JM iterator. Use bpf_expand_cbranch.
(*branch_on_di): Update name, use new JM iterator.
* config/bpf/bpf.opt: (mjmpext): New option.
(malu32): Likewise.
(mjmp32): Likewise.
(mcpu): Likewise.
(bpf_isa): New enum.

bpf: correct zero_extend output templates

The output templates for zero_extendhidi2 and zero_extendqidi2 could
lead to incorrect code generation when zero-extending one register into
another. This patch adds a new output template to the define_insns to
handle such cases and produce correct asm.

gcc/ChangeLog:
* config/bpf/bpf.md (zero_extendhidi2): Add new output template
for register-to-register extensions.
(zero_extendqidi2): Likewise.

libstdc++: Use "test.invalid." for invalid hostname

This avoids test.invalid.some.domain being successfully resolved.

libstdc++-v3/ChangeLog:

* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Fix invalid hostname to only match the .invalid TLD.

middle-end/102273 - avoid ICE with auto-init and nested functions

This refactors expansion to consider non-decl LHS. I suspect
the is_val argument is not needed.

2021-09-10 Richard Biener <rguenther@suse.de>

PR middle-end/102273
* internal-fn.c (expand_DEFERRED_INIT): Always expand non-SSA vars.

* gcc.dg/pr102273.c: New testcase.

Fix 'dg-do run' syntax in 'c-c++-common/auto-init-padding-{2,3}.c'

Fix-up for recent commit a25e0b5e6ac8a77a71c229e0a7b744603365b0e9
"Add -ftrivial-auto-var-init option and uninitialized variable attribute".

gcc/testsuite/
* c-c++-common/auto-init-padding-2.c: Fix 'dg-do run' syntax.
* c-c++-common/auto-init-padding-3.c: Likewise.

middle-end/102269 - avoid auto-init of empty types

This avoids initializing empty types for which we'll eventually
leave a .DEFERRED_INIT call without a LHS.

2021-09-10 Richard Biener <rguenther@suse.de>

PR middle-end/102269
* gimplify.c (is_var_need_auto_init): Empty types do not need
initialization.

* gcc.dg/pr102269.c: New testcase.

Remove vestiges of --with-stabs

This removes the --with-stabs configure option which had no effect
since quite some time.

2021-09-10 Richard Biener <rguenther@suse.de>

* configure.ac (--with-stabs): Remove.
* configure: Regenerate.
* doc/install.texi: Remove --with-stabs documentation.

AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h
(check_results_mask): New check_function.
* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.

AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: (_mm512_cmp_ph_mask):
New intrinsic.
(_mm512_mask_cmp_ph_mask): Likewise.
(_mm512_cmp_round_ph_mask): Likewise.
(_mm512_mask_cmp_round_ph_mask): Likewise.
(_mm_cmp_sh_mask): Likewise.
(_mm_mask_cmp_sh_mask): Likewise.
(_mm_cmp_round_sh_mask): Likewise.
(_mm_mask_cmp_round_sh_mask): Likewise.
(_mm_comieq_sh): Likewise.
(_mm_comilt_sh): Likewise.
(_mm_comile_sh): Likewise.
(_mm_comigt_sh): Likewise.
(_mm_comige_sh): Likewise.
(_mm_comineq_sh): Likewise.
(_mm_ucomieq_sh): Likewise.
(_mm_ucomilt_sh): Likewise.
(_mm_ucomile_sh): Likewise.
(_mm_ucomigt_sh): Likewise.
(_mm_ucomige_sh): Likewise.
(_mm_ucomineq_sh): Likewise.
(_mm_comi_round_sh): Likewise.
(_mm_comi_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cmp_ph_mask): New intrinsic.
(_mm_mask_cmp_ph_mask): Likewise.
(_mm256_cmp_ph_mask): Likewise.
(_mm256_mask_cmp_ph_mask): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Ditto.
* config/i386/i386.md (ssevecmode): Add HF mode.
(MODEFH): New mode iterator.
* config/i386/sse.md
(V48H_AVX512VL): New mode iterator to support HF vector modes.
Ajdust corresponding description.
(ssecmpintprefix): New.
(VI12_AVX512VL): Adjust to support HF vector modes.
(cmp_imm_predicate): Likewise.
(<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>):
Likewise.
(avx512f_vmcmp<mode>3<round_saeonly_name>): Likewise.
(avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Likewise.
(<sse>_<unord>comi<round_saeonly_name>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vmaxph-1a.c: New test.
* gcc.target/i386/avx512fp16-vmaxph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmaxsh-1.c: Ditto.
* gcc.target/i386/avx512fp16-vmaxsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vminph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vminph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vminsh-1.c: Ditto.
* gcc.target/i386/avx512fp16-vminsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmaxph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vminph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.

AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: (_mm512_max_ph): New intrinsic.
(_mm512_mask_max_ph): Likewise.
(_mm512_maskz_max_ph): Likewise.
(_mm512_min_ph): Likewise.
(_mm512_mask_min_ph): Likewise.
(_mm512_maskz_min_ph): Likewise.
(_mm512_max_round_ph): Likewise.
(_mm512_mask_max_round_ph): Likewise.
(_mm512_maskz_max_round_ph): Likewise.
(_mm512_min_round_ph): Likewise.
(_mm512_mask_min_round_ph): Likewise.
(_mm512_maskz_min_round_ph): Likewise.
(_mm_max_sh): Likewise.
(_mm_mask_max_sh): Likewise.
(_mm_maskz_max_sh): Likewise.
(_mm_min_sh): Likewise.
(_mm_mask_min_sh): Likewise.
(_mm_maskz_min_sh): Likewise.
(_mm_max_round_sh): Likewise.
(_mm_mask_max_round_sh): Likewise.
(_mm_maskz_max_round_sh): Likewise.
(_mm_min_round_sh): Likewise.
(_mm_mask_min_round_sh): Likewise.
(_mm_maskz_min_round_sh): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_max_ph): New intrinsic.
(_mm256_max_ph): Likewise.
(_mm_mask_max_ph): Likewise.
(_mm256_mask_max_ph): Likewise.
(_mm_maskz_max_ph): Likewise.
(_mm256_maskz_max_ph): Likewise.
(_mm_min_ph): Likewise.
(_mm256_min_ph): Likewise.
(_mm_mask_min_ph): Likewise.
(_mm256_mask_min_ph): Likewise.
(_mm_maskz_min_ph): Likewise.
(_mm256_maskz_min_ph): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
* config/i386/sse.md
(<code><mode>3<mask_name><round_saeonly_name>): Adjust to
support HF vector modes.
(*<code><mode>3<mask_name><round_saeonly_name>): Likewise.
(ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>):
Likewise.
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Likewise.
* config/i386/subst.md (round_saeonly_mode512bit_condition):
Adjust for HF vector modes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vaddsh-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto.
* gcc.target/i386/pr54855-11.c: Ditto.

AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic.
(_mm_mask_add_sh): Likewise.
(_mm_maskz_add_sh): Likewise.
(_mm_sub_sh): Likewise.
(_mm_mask_sub_sh): Likewise.
(_mm_maskz_sub_sh): Likewise.
(_mm_mul_sh): Likewise.
(_mm_mask_mul_sh): Likewise.
(_mm_maskz_mul_sh): Likewise.
(_mm_div_sh): Likewise.
(_mm_mask_div_sh): Likewise.
(_mm_maskz_div_sh): Likewise.
(_mm_add_round_sh): Likewise.
(_mm_mask_add_round_sh): Likewise.
(_mm_maskz_add_round_sh): Likewise.
(_mm_sub_round_sh): Likewise.
(_mm_mask_sub_round_sh): Likewise.
(_mm_maskz_sub_round_sh): Likewise.
(_mm_mul_round_sh): Likewise.
(_mm_mask_mul_round_sh): Likewise.
(_mm_maskz_mul_round_sh): Likewise.
(_mm_div_round_sh): Likewise.
(_mm_mask_div_round_sh): Likewise.
(_mm_maskz_div_round_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c
(ix86_expand_round_builtin): Handle new builtins.
* config/i386/sse.md (VF_128): Change description.
(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>):
Adjust to support HF vector modes.
(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>):
Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Enable _Float16 autovectorization

gcc/ChangeLog:

* config/i386/i386-expand.c
(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
* config/i386/i386.c
(ix86_preferred_simd_mode): Handle HF mode.
* config/i386/sse.md (V_256H): New mode iterator.
(avx_vextractf128<mode>): Use it.
(VEC_INIT_MODE): Align vector HFmode condition to vector
HImodes since there're no real HF instruction used.
(VEC_INIT_HALF_MODE): Ditto.
(VIHF): Ditto.
(VIHF_AVX512BW): Ditto.
(*vec_extracthf): Ditto.
(VEC_EXTRACT_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vect-float16-1.c: New test.
* gcc.target/i386/vect-float16-10.c: Ditto.
* gcc.target/i386/vect-float16-11.c: Ditto.
* gcc.target/i386/vect-float16-12.c: Ditto.
* gcc.target/i386/vect-float16-2.c: Ditto.
* gcc.target/i386/vect-float16-3.c: Ditto.
* gcc.target/i386/vect-float16-4.c: Ditto.
* gcc.target/i386/vect-float16-5.c: Ditto.
* gcc.target/i386/vect-float16-6.c: Ditto.
* gcc.target/i386/vect-float16-7.c: Ditto.
* gcc.target/i386/vect-float16-8.c: Ditto.
* gcc.target/i386/vect-float16-9.c: Ditto.

Remove dbx.h, do not set PREFERRED_DEBUGGING_TYPE from dbxcoff.h, lynx.h

The following removes the unused config/dbx.h file and removes the
setting of PREFERRED_DEBUGGING_TYPE from dbxcoff.h which is
overridden by all users (djgpp/mingw/cygwin) via either including
config/i386/djgpp.h or config/i386/cygming.h

There are still circumstances where mingw and cygwin default to
STABS, namely when HAVE_GAS_PE_SECREL32_RELOC is not defined and
the target defaults to 32bit code generation.

The new style handling DBX_DEBUGGING_INFO is in line with
dbxelf.h which does not define PREFERRED_DEBUGGING_TYPE either.

The patch also removes the PREFERRED_DEBUGGING_TYPE define from
lynx.h which always follows elfos.h already defaulting to DWARF,
so the comment about STABS being the default is misleading and
outdated.

2021-09-09 Richard Biener <rguenther@suse.de>

PR target/102255
* config/dbx.h: Remove.
* config/dbxcoff.h: Do not define PREFERRED_DEBUGGING_TYPE.
* config/lynx.h: Likewise.

Remove copysign post_reload splitter for scalar modes.

It can generate better code just like avx512dq-abs-copysign-1.c
shows.

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_copysign): Expand
right into ANDNOT + AND + IOR, using paradoxical subregs.
(ix86_split_copysign_const): Remove.
(ix86_split_copysign_var): Ditto.
* config/i386/i386-protos.h (ix86_split_copysign_const): Dotto.
(ix86_split_copysign_var): Ditto.
* config/i386/i386.md (@copysign<mode>3_const): Ditto.
(@copysign<mode>3_var): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512dq-abs-copysign-1.c: Adjust testcase.
* gcc.target/i386/avx512vl-abs-copysign-1.c: Adjust testcase.

Daily bump.

Add -ftrivial-auto-var-init option and uninitialized variable attribute.

Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized, -Wuninitialized will still report warning messages
on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
You can control this behavior for a specific variable by using the variable
attribute "uninitialized" to control runtime overhead.

gcc/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* builtins.c (expand_builtin_memset): Make external visible.
* builtins.h (expand_builtin_memset): Declare extern.
* common.opt (ftrivial-auto-var-init=): New option.
* doc/extend.texi: Document the uninitialized attribute.
* doc/invoke.texi: Document -ftrivial-auto-var-init.
* flag-types.h (enum auto_init_type): New enumerated type
auto_init_type.
* gimple-fold.c (clear_padding_type): Add one new parameter.
(clear_padding_union): Likewise.
(clear_padding_emit_loop): Likewise.
(clear_type_padding_in_mask): Likewise.
(gimple_fold_builtin_clear_padding): Handle this new parameter.
* gimplify.c (gimple_add_init_for_auto_var): New function.
(gimple_add_padding_init_for_auto_var): New function.
(is_var_need_auto_init): New function.
(gimplify_decl_expr): Add initialization to automatic variables per
users' requests.
(gimplify_call_expr): Add one new parameter for call to
__builtin_clear_padding.
(gimplify_init_constructor): Add padding initialization in the end.
* internal-fn.c (INIT_PATTERN_VALUE): New macro.
(expand_DEFERRED_INIT): New function.
* internal-fn.def (DEFERRED_INIT): New internal function.
* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
* tree-sra.c (generate_subtree_deferred_init): New function.
(scan_function): Avoid setting cannot_scalarize_away_bitmap for
calls to .DEFERRED_INIT.
(sra_modify_deferred_init): New function.
(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
specially.
(check_defs): Likewise.
(warn_uninitialized_vars): Likewise.
* tree-ssa.c (ssa_undefined_value_p): Likewise.
* tree.c (build_common_builtin_nodes): Build tree node for
BUILT_IN_CLEAR_PADDING when needed.

gcc/c-family/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* c-attribs.c (handle_uninitialized_attribute): New function.
(c_common_attribute_table): Add "uninitialized" attribute.

gcc/testsuite/ChangeLog:

2021-09-09  qing zhao  <qing.zhao@oracle.com>

* c-c++-common/auto-init-1.c: New test.
* c-c++-common/auto-init-10.c: New test.
* c-c++-common/auto-init-11.c: New test.
* c-c++-common/auto-init-12.c: New test.
* c-c++-common/auto-init-13.c: New test.
* c-c++-common/auto-init-14.c: New test.
* c-c++-common/auto-init-15.c: New test.
* c-c++-common/auto-init-16.c: New test.
* c-c++-common/auto-init-2.c: New test.
* c-c++-common/auto-init-3.c: New test.
* c-c++-common/auto-init-4.c: New test.
* c-c++-common/auto-init-5.c: New test.
* c-c++-common/auto-init-6.c: New test.
* c-c++-common/auto-init-7.c: New test.
* c-c++-common/auto-init-8.c: New test.
* c-c++-common/auto-init-9.c: New test.
* c-c++-common/auto-init-esra.c: New test.
* c-c++-common/auto-init-padding-1.c: New test.
* c-c++-common/auto-init-padding-2.c: New test.
* c-c++-common/auto-init-padding-3.c: New test.
* g++.dg/auto-init-uninit-pred-1_a.C: New test.
* g++.dg/auto-init-uninit-pred-2_a.C: New test.
* g++.dg/auto-init-uninit-pred-3_a.C: New test.
* g++.dg/auto-init-uninit-pred-4.C: New test.
* gcc.dg/auto-init-sra-1.c: New test.
* gcc.dg/auto-init-sra-2.c: New test.
* gcc.dg/auto-init-uninit-1.c: New test.
* gcc.dg/auto-init-uninit-12.c: New test.
* gcc.dg/auto-init-uninit-13.c: New test.
* gcc.dg/auto-init-uninit-14.c: New test.
* gcc.dg/auto-init-uninit-15.c: New test.
* gcc.dg/auto-init-uninit-16.c: New test.
* gcc.dg/auto-init-uninit-17.c: New test.
* gcc.dg/auto-init-uninit-18.c: New test.
* gcc.dg/auto-init-uninit-19.c: New test.
* gcc.dg/auto-init-uninit-2.c: New test.
* gcc.dg/auto-init-uninit-20.c: New test.
* gcc.dg/auto-init-uninit-21.c: New test.
* gcc.dg/auto-init-uninit-22.c: New test.
* gcc.dg/auto-init-uninit-23.c: New test.
* gcc.dg/auto-init-uninit-24.c: New test.
* gcc.dg/auto-init-uninit-25.c: New test.
* gcc.dg/auto-init-uninit-26.c: New test.
* gcc.dg/auto-init-uninit-3.c: New test.
* gcc.dg/auto-init-uninit-34.c: New test.
* gcc.dg/auto-init-uninit-36.c: New test.
* gcc.dg/auto-init-uninit-37.c: New test.
* gcc.dg/auto-init-uninit-4.c: New test.
* gcc.dg/auto-init-uninit-5.c: New test.
* gcc.dg/auto-init-uninit-6.c: New test.
* gcc.dg/auto-init-uninit-8.c: New test.
* gcc.dg/auto-init-uninit-9.c: New test.
* gcc.dg/auto-init-uninit-A.c: New test.
* gcc.dg/auto-init-uninit-B.c: New test.
* gcc.dg/auto-init-uninit-C.c: New test.
* gcc.dg/auto-init-uninit-H.c: New test.
* gcc.dg/auto-init-uninit-I.c: New test.
* gcc.target/aarch64/auto-init-1.c: New test.
* gcc.target/aarch64/auto-init-2.c: New test.
* gcc.target/aarch64/auto-init-3.c: New test.
* gcc.target/aarch64/auto-init-4.c: New test.
* gcc.target/aarch64/auto-init-5.c: New test.
* gcc.target/aarch64/auto-init-6.c: New test.
* gcc.target/aarch64/auto-init-7.c: New test.
* gcc.target/aarch64/auto-init-8.c: New test.
* gcc.target/aarch64/auto-init-padding-1.c: New test.
* gcc.target/aarch64/auto-init-padding-10.c: New test.
* gcc.target/aarch64/auto-init-padding-11.c: New test.
* gcc.target/aarch64/auto-init-padding-12.c: New test.
* gcc.target/aarch64/auto-init-padding-2.c: New test.
* gcc.target/aarch64/auto-init-padding-3.c: New test.
* gcc.target/aarch64/auto-init-padding-4.c: New test.
* gcc.target/aarch64/auto-init-padding-5.c: New test.
* gcc.target/aarch64/auto-init-padding-6.c: New test.
* gcc.target/aarch64/auto-init-padding-7.c: New test.
* gcc.target/aarch64/auto-init-padding-8.c: New test.
* gcc.target/aarch64/auto-init-padding-9.c: New test.
* gcc.target/i386/auto-init-1.c: New test.
* gcc.target/i386/auto-init-2.c: New test.
* gcc.target/i386/auto-init-21.c: New test.
* gcc.target/i386/auto-init-22.c: New test.
* gcc.target/i386/auto-init-23.c: New test.
* gcc.target/i386/auto-init-24.c: New test.
* gcc.target/i386/auto-init-3.c: New test.
* gcc.target/i386/auto-init-4.c: New test.
* gcc.target/i386/auto-init-5.c: New test.
* gcc.target/i386/auto-init-6.c: New test.
* gcc.target/i386/auto-init-7.c: New test.
* gcc.target/i386/auto-init-8.c: New test.
* gcc.target/i386/auto-init-padding-1.c: New test.
* gcc.target/i386/auto-init-padding-10.c: New test.
* gcc.target/i386/auto-init-padding-11.c: New test.
* gcc.target/i386/auto-init-padding-12.c: New test.
* gcc.target/i386/auto-init-padding-2.c: New test.
* gcc.target/i386/auto-init-padding-3.c: New test.
* gcc.target/i386/auto-init-padding-4.c: New test.
* gcc.target/i386/auto-init-padding-5.c: New test.
* gcc.target/i386/auto-init-padding-6.c: New test.
* gcc.target/i386/auto-init-padding-7.c: New test.
* gcc.target/i386/auto-init-padding-8.c: New test.
* gcc.target/i386/auto-init-padding-9.c: New test.

Fortran - out of bounds in array constructor with implied do loop

gcc/fortran/ChangeLog:

PR fortran/98490
* trans-expr.c (gfc_conv_substring): Do not generate substring
bounds check for implied do loop index variable before it actually
becomes defined.

gcc/testsuite/ChangeLog:

PR fortran/98490
* gfortran.dg/bounds_check_23.f90: New test.

x86-64: Update AVX512FP16 ABI tests for x32

On x32, long is the same as int and pointer is 32 bits. Update AVX512FP16
ABI tests:

1. Replace long with long long for 64-bit integers.
2. Update type and alignment for long and pointer.
3. Skip tests for long on x32.

* gcc.target/x86_64/abi/avx512fp16/args.h: Replace long with
long long.
(XMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/defines.h (TYPE_SIZE_LONG):
Define to 4 if __ILP32__ is defined.
(TYPE_SIZE_POINTER): Likewise.
(TYPE_ALIGN_LONG): Likewise.
(TYPE_ALIGN_POINTER): Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
(main): Skip test for long if __ILP32__ is defined.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
(do_test): Replace _long with _longlong.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c:
(check_300): Replace _ulong with _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: Replace long
with long long.
(YMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Replace long
with long long.
(ZMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.

Improve LIM fill_always_executed_in computation

Currently the DOM walk over a loop body does not walk into not
always executed subloops to avoid scalability issues since doing
so makes the walk quadratic in the loop depth. It turns out this
is not an issue in practice and even with a loop depth of 1800
this function is way off the radar.

So the following patch removes the limitation, replacing it with
a comment.

2021-09-09 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-im.c (fill_always_executed_in_1): Walk
into all subloops.

* gcc.dg/tree-ssa/ssa-lim-17.c: New testcase.

Avoid full DOM walk in LIM fill_always_executed_in

This avoids a full DOM walk via get_loop_body_in_dom_order in the
loop body walk of fill_always_executed_in which is often terminating
the walk of a loop body early by integrating the DOM walk of
get_loop_body_in_dom_order with the actual processing done by
fill_always_executed_in. This trades the fully populated loop
body array with a worklist allocation of the same size and thus
should be a strict improvement over the recursive approach of
get_loop_body_in_dom_order.

2021-09-09 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-im.c (fill_always_executed_in_1): Integrate
DOM walk from get_loop_body_in_dom_order using a worklist
approach.

AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h: New header file for
FP16 runtime test.
* gcc.target/i386/avx512fp16-vaddph-1a.c: New test.
* gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.

AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.

gcc/ChangeLog:

* config.gcc: Add avx512fp16vlintrin.h.
* config/i386/avx512fp16intrin.h: (_mm512_add_ph): New intrinsic.
(_mm512_mask_add_ph): Likewise.
(_mm512_maskz_add_ph): Likewise.
(_mm512_sub_ph): Likewise.
(_mm512_mask_sub_ph): Likewise.
(_mm512_maskz_sub_ph): Likewise.
(_mm512_mul_ph): Likewise.
(_mm512_mask_mul_ph): Likewise.
(_mm512_maskz_mul_ph): Likewise.
(_mm512_div_ph): Likewise.
(_mm512_mask_div_ph): Likewise.
(_mm512_maskz_div_ph): Likewise.
(_mm512_add_round_ph): Likewise.
(_mm512_mask_add_round_ph): Likewise.
(_mm512_maskz_add_round_ph): Likewise.
(_mm512_sub_round_ph): Likewise.
(_mm512_mask_sub_round_ph): Likewise.
(_mm512_maskz_sub_round_ph): Likewise.
(_mm512_mul_round_ph): Likewise.
(_mm512_mask_mul_round_ph): Likewise.
(_mm512_maskz_mul_round_ph): Likewise.
(_mm512_div_round_ph): Likewise.
(_mm512_mask_div_round_ph): Likewise.
(_mm512_maskz_div_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h: New header.
* config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
Add new builtin types.
* config/i386/i386-builtin.def: Add corresponding builtins.
* config/i386/i386-expand.c
(ix86_expand_args_builtin): Handle new builtin types.
(ix86_expand_round_builtin): Likewise.
* config/i386/immintrin.h: Include avx512fp16vlintrin.h
* config/i386/sse.md (VFH): New mode_iterator.
(VF2H): Likewise.
(avx512fmaskmode): Add HF vector modes.
(avx512fmaskhalfmode): Likewise.
(<plusminus_insn><mode>3<mask_name><round_name>): Adjust to for
HF vector modes.
(*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
(mul<mode>3<mask_name><round_name>): Likewise.
(*mul<mode>3<mask_name><round_name>): Likewise.
(div<mode>3): Likewise.
(<sse>_div<mode>3<mask_name><round_name>): Likewise.
* config/i386/subst.md (SUBST_V): Add HF vector modes.
(SUBST_A): Likewise.
(round_mode512bit_condition): Adjust for V32HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
* gcc.target/i386/avx-2.c: Add -mavx512vl.
* gcc.target/i386/avx512fp16-11a.c: New test.
* gcc.target/i386/avx512fp16-11b.c: Ditto.
* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
* gcc.target/i386/sse-13.c: Add test for new builtins.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

Optimize v4sf reduction.

gcc/ChangeLog:

PR target/101059
* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
(reduc_plus_scal_v4sf): .. this, New define_expand.
(reduc_plus_scal_v2df): .. and this, New define_expand.

gcc/testsuite/ChangeLog:

PR target/101059
* gcc.target/i386/sse2-pr101059.c: New test.
* gcc.target/i386/sse3-pr101059.c: New test.

Optimize vec_extract for 256/512-bit vector when index exceeds the lower 128 bits.

- vextracti32x8 $0x1, %zmm0, %ymm0
- vmovd %xmm0, %eax
+ valignd $8, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax

- vextracti32x8 $0x1, %zmm0, %ymm0
- vextracti128 $0x1, %ymm0, %xmm0
- vpextrd $3, %xmm0, %eax
+ valignd $15, %zmm0, %zmm0, %zmm1
+ vmovd %xmm1, %eax

- vextractf64x2 $0x1, %ymm0, %xmm0
+ valignq $2, %ymm0, %ymm0, %ymm0

- vextractf64x4 $0x1, %zmm0, %ymm0
- vextractf64x2 $0x1, %ymm0, %xmm0
- vunpckhpd %xmm0, %xmm0, %xmm0
+ valignq $7, %zmm0, %zmm0, %zmm0

gcc/ChangeLog:

PR target/91103
* config/i386/sse.md (*vec_extract<mode><ssescalarmodelower>_valign):
New define_insn.

gcc/testsuite/ChangeLog:

PR target/91103
* gcc.target/i386/pr91103-1.c: New test.
* gcc.target/i386/pr91103-2.c: New test.

Daily bump.

c++: Fix docs on assignment of virtual bases [PR60318]

The description of behaviour is incorrect, the virtual base gets
assigned before entering the bodies of A::operator= and B::operator=,
not after.

The example is also ill-formed (passing a string literal to char*) and
undefined (missing return from Base::operator=).

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
gcc/ChangeLog:

PR c++/60318
* doc/trouble.texi (Copy Assignment): Fix description of
behaviour and fix code in example.

analyzer: fix ICE when discarding result of realloc [PR102225]

gcc/analyzer/ChangeLog:
PR analyzer/102225
* analyzer.h (compat_types_p): New decl.
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Guard against NULL
type when checking for pointer types.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Guard against NULL lhs type/region. Guard against the size value
not being of a compatible type for dynamic extents.
* region-model.cc (compat_types_p): Make non-static.

gcc/testsuite/ChangeLog:
PR analyzer/102225
* gcc.dg/analyzer/realloc-1.c (test_10): New.
* gcc.dg/analyzer/torture/pr102225.c: New test.

c++/102228 - make lookup_anon_field O(1)

For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309.  The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested.  There for each such access we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them.  That's inefficient since as I believe
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.

The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate types
lang-specific data, re-using the RTTI typeinfo_var field.  That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.

I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much).  It also leaves
a hole for the case when the C++ FE itself duplicates such type
and places it at a semantically different position.  I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.

2021-09-08  Richard Biener  <rguenther@suse.de>

PR c++/102228
gcc/cp/
* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
place on invalid code.
* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
access.
* module.cc (trees_in::read_class_def): Likewise.  Reconstruct
ANON_AGGR_TYPE_FIELD.
* semantics.c (finish_member_declaration): Populate
ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
* typeck.c (lookup_anon_field): Remove DFS search and return
ANON_AGGR_TYPE_FIELD directly.

testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c

When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.

FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)

Jakub's commit 0b34dbc0a24864b1674bff7a92fa3cf0f1cbcea1 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux. I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).

Tested with no regressions with cross to nios2-elf.

* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.

testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details

When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.

cselim is enabled conditionally by code in toplev.c:

  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
flag_tree_cselim = 1;
      else
flag_tree_cselim = 0;
    }

Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.

Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.

* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.

rs6000: Fix ELFv2 r12 use in epilogue

We cannot use r12 here, it is already in use as the GEP (for sibling
calls).

2021-09-08 Segher Boessenkool <segher@kernel.crashing.org>
PR target/102107
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
r11 instead of r12 for restoring CR.

i386: Fix up xorsign for AVX [PR89984]

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.

Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
    liuhongt  <hongtao.liu@intel.com>

PR target/89984
* config/i386/i386.md (@xorsign<mode>3_1): Remove.
* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
into AND with mask and XOR, using paradoxical subregs.
(ix86_split_xorsign): Remove.
* config/i386/i386-protos.h (ix86_split_xorsign): Remove.

* gcc.target/i386/avx-pr102224.c: Fix up PR number.
* gcc.dg/pr89984.c: New test.
* gcc.target/i386/avx-pr89984.c: New test.

Compile __{mul,div}hc3 into libgcc_s.so.1.

libgcc/ChangeLog:

* config/i386/t-softfp: Compile __{mul,div}hc3 into
libgcc_s.so.1.

tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into

If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one. That can cause inserting extra
vn_pvals.

gcc/ChangeLog:

* tree-ssa-sccvn.c (vn_nary_op_insert_into): fix result compare

libgcc, i386: Export *hf* and *hc* from libgcc_s.so.1

The following patch exports it for Linux from config/i386/*.ver where it
IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
targets might add them at completely different gcc versions.

2021-09-08 Jakub Jelinek <jakub@redhat.com>
Iain Sandoe <iain@sandoe.co.uk>

* config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
and export *hf* and *hc* functions at GCC_12.0.0.

i386: Fix up @xorsign<mode>3_1 [PR102224]

As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
before with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).

The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear.  For non-AVX we need to use earlyclobber, we require
dest and op1 to be the same but op0 must be different because we overwrite
op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
may be the same register and if both inputs are the same, handles that case.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

2021-09-08  Jakub Jelinek  <jakub@redhat.com>

PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
first alternative even for avx, add another alternative with
=&Yv <- 0, Yv, Yvm constraints.
* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
to op1, emit vpandn instead.

* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.

AVX512FP16: Add abi test for zmm

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
New file.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
Likewise.

AVX512FP16: Add ABI test for ymm.

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
New exp file.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.

AVX512FP16: Add ABI tests for xmm.

Copied from regular XMM ABI tests. Only run AVX512FP16 ABI tests for ELF
targets.

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
file for abi test.
* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.

AVX512FP16: Add tests for vector passing in variable arguments.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vararg-1.c: New test.
* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.

AVX512FP16: Add testcase for vector init and broadcast intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
* gcc.target/i386/avx512fp16-10a.c: New test.
* gcc.target/i386/avx512fp16-10b.c: Ditto.
* gcc.target/i386/avx512fp16-1a.c: Ditto.
* gcc.target/i386/avx512fp16-1b.c: Ditto.
* gcc.target/i386/avx512fp16-1c.c: Ditto.
* gcc.target/i386/avx512fp16-1d.c: Ditto.
* gcc.target/i386/avx512fp16-1e.c: Ditto.
* gcc.target/i386/avx512fp16-2a.c: Ditto.
* gcc.target/i386/avx512fp16-2b.c: Ditto.
* gcc.target/i386/avx512fp16-2c.c: Ditto.
* gcc.target/i386/avx512fp16-3a.c: Ditto.
* gcc.target/i386/avx512fp16-3b.c: Ditto.
* gcc.target/i386/avx512fp16-3c.c: Ditto.
* gcc.target/i386/avx512fp16-4.c: Ditto.
* gcc.target/i386/avx512fp16-5.c: Ditto.
* gcc.target/i386/avx512fp16-6.c: Ditto.
* gcc.target/i386/avx512fp16-7.c: Ditto.
* gcc.target/i386/avx512fp16-8.c: Ditto.
* gcc.target/i386/avx512fp16-9a.c: Ditto.
* gcc.target/i386/avx512fp16-9b.c: Ditto.
* gcc.target/i386/pr54855-13.c: Ditto.
* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.

AVX512FP16: Support vector init/broadcast/set/extract for FP16.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
(_mm256_set1_ph): Likewise.
(_mm512_set1_ph): Likewise.
(_mm_setzero_ph): Likewise.
(_mm256_setzero_ph): Likewise.
(_mm512_setzero_ph): Likewise.
(_mm_set_sh): Likewise.
(_mm_load_sh): Likewise.
(_mm_store_sh): Likewise.
* config/i386/i386-builtin-types.def (V8HF): New type.
(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Support vector HFmodes.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_interleave): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
(ix86_expand_vector_init_concat): Likewise.
(ix86_expand_sse_movcc): Handle vector HFmodes.
(ix86_expand_vector_set_var): Ditto.
* config/i386/i386-modes.def: Add HF vector modes in comment.
* config/i386/i386.c (classify_argument): Add HF vector modes.
(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
(ix86_vector_mode_supported_p): Likewise.
(ix86_set_reg_reg_cost): Handle vector HFmode.
(ix86_get_ssemov): Handle vector HFmode.
(function_arg_advance_64): Pass unamed V16HFmode and V32HFmode
by stack.
(function_arg_advance_32): Pass V8HF/V16HF/V32HF by sse reg for 32bit
mode.
(function_arg_advance_32): Ditto.
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
(VALID_SSE2_REG_VHF_MODE): New.
(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
(SSE_REG_MODE_P): Add vector HFmode.
* config/i386/i386.md (mode): Add HF vector modes.
(MODE_SIZE): Likewise.
(ssemodesuffix): Add ph suffix for HF vector modes.
* config/i386/sse.md (VFH_128): New mode iterator.
(VMOVE): Adjust for HF vector modes.
(V): Likewise.
(V_256_512): Likewise.
(avx512): Likewise.
(avx512fmaskmode): Likewise.
(shuffletype): Likewise.
(sseinsnmode): Likewise.
(ssedoublevecmode): Likewise.
(ssehalfvecmode): Likewise.
(ssehalfvecmodelower): Likewise.
(ssePScmode): Likewise.
(ssescalarmode): Likewise.
(ssescalarmodelower): Likewise.
(sseintprefix): Likewise.
(i128): Likewise.
(bcstscalarsuff): Likewise.
(xtg_mode): Likewise.
(VI12HF_AVX512VL): New mode_iterator.
(VF_AVX512FP16): Likewise.
(VIHF): Likewise.
(VIHF_256): Likewise.
(VIHF_AVX512BW): Likewise.
(V16_256): Likewise.
(V32_512): Likewise.
(sseintmodesuffix): New mode_attr.
(sse): Add scalar and vector HFmodes.
(ssescalarmode): Add vector HFmode mapping.
(ssescalarmodesuffix): Add sh suffix for HFmode.
(*<sse>_vm<insn><mode>3): Use VFH_128.
(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
(*ieee_<ieee_maxmin><mode>3): Likewise.
(<avx512>_blendm<mode>): New define_insn.
(vec_setv8hf): New define_expand.
(vec_set<mode>_0): New define_insn for HF vector set.
(*avx512fp16_movsh): Likewise.
(avx512fp16_movsh): Likewise.
(vec_extract_lo_v32hi): Rename to ...
(vec_extract_lo_<mode>): ... this, and adjust to allow HF
vector modes.
(vec_extract_hi_v32hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_extract_lo_v16hi): Likewise.
(vec_extract_lo_<mode>): Likewise.
(vec_extract_hi_v16hi): Likewise.
(vec_extract_hi_<mode>): Likewise.
(vec_set_hi_v16hi): Likewise.
(vec_set_hi_<mode>): Likewise.
(vec_set_lo_v16hi): Likewise.
(vec_set_lo_<mode>): Likewise.
(*vec_extract<mode>_0): New define_insn_and_split for HF
vector extract.
(*vec_extracthf): New define_insn.
(VEC_EXTRACT_MODE): Add HF vector modes.
(PINSR_MODE): Add V8HF.
(sse2p4_1): Likewise.
(pinsr_evex_isa): Likewise.
(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
insert for V8HFmode.
(pbroadcast_evex_isa): Add HF vector modes.
(AVX2_VEC_DUP_MODE): Likewise.
(VEC_INIT_MODE): Likewise.
(VEC_INIT_HALF_MODE): Likewise.
(avx2_pbroadcast<mode>): Adjust to support HF vector mode
broadcast.
(avx2_pbroadcast<mode>_1): Likewise.
(<avx512>_vec_dup<mode>_1): Likewise.
(<avx512>_vec_dup<mode><mask_name>): Likewise.
(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
Likewise.

AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP16_SET,
OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
(OPTION_MASK_ISA2_AVX512BW_UNSET,
OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
(ix86_handle_option): Handle -mavx512fp16.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX512FP16.
* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
* config.gcc: Add avx512fp16intrin.h.
* config/i386/avx512fp16intrin.h: New intrinsic header.
* config/i386/cpuid.h: Add bit_AVX512FP16.
* config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
* config/i386/i386-builtins.c: Support _Float16 type for i386
backend.
(ix86_register_float16_builtin_type): New function.
(ix86_float16_type_node): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512FP16__.
* config/i386/i386-expand.c (ix86_expand_branch): Support
HFmode.
(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_expand_fp_movcc): Ditto.
* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
* config/i386/i386.c (ix86_get_ssemov): Use
vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
(ix86_get_excess_precision): Use
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
existed.
(sse_store_index): Use SFmode cost for HFmode cost.
(inline_memory_move_cost): Add HFmode, and perfer SSE cost over
GPR cost for HFmode.
(ix86_hard_regno_mode_ok): Allow HImode in sse register.
(ix86_mangle_type): Add manlging for _Float16 type.
(inline_secondary_memory_needed): No memory is needed for
16bit movement between gpr and sse reg under
TARGET_AVX512FP16.
(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_add_stmt_cost): Ditto.
(ix86_optab_supported_p): Ditto.
* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
* config/i386/i386.md (mode): Add HFmode.
(MODE_SIZE): Add HFmode.
(isa): Add avx512fp16.
(enabled): Handle avx512fp16.
(ssemodesuffix): Add sh suffix for HFmode.
(comm): Add mult, div.
(plusminusmultdiv): New code iterator.
(insn): Add mult, div.
(*movhf_internal): Adjust for avx512fp16 instruction.
(*movhi_internal): Ditto.
(*cmpi<unord>hf): New define_insn for HFmode.
(*ieee_s<ieee_maxmin>hf3): Likewise.
(extendhf<mode>2): Likewise.
(trunc<mode>hf2): Likewise.
(float<floatunssuffix><mode>hf2): Likewise.
(*<insn>hf): Likewise.
(cbranchhf4): New expander.
(movhfcc): Likewise.
(<insn>hf3): Likewise.
(mulhf3): Likewise.
(divhf3): Likewise.
* config/i386/i386.opt: Add mavx512fp16.
* config/i386/immintrin.h: Include avx512fp16intrin.h.
* doc/invoke.texi: Add mavx512fp16.
* doc/extend.texi: Add avx512fp16 Usage Notes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
* gcc.target/i386/sse-13.c: Add -mavx512fp16.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: (check_effective_target_avx512fp16): New.
* g++.target/i386/float16-1.C: New test.
* g++.target/i386/float16-2.C: Ditto.
* g++.target/i386/float16-3.C: Ditto.
* gcc.target/i386/avx512fp16-12a.c: Ditto.
* gcc.target/i386/avx512fp16-12b.c: Ditto.
* gcc.target/i386/float16-3a.c: Ditto.
* gcc.target/i386/float16-3b.c: Ditto.
* gcc.target/i386/float16-4a.c: Ditto.
* gcc.target/i386/float16-4b.c: Ditto.
* gcc.target/i386/pr54855-12.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>

Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

gcc/ada/ChangeLog:

* gcc-interface/misc.c (gnat_post_options): Issue an error for
-fexcess-precision=16.

gcc/c-family/ChangeLog:

* c-common.c (excess_precision_mode_join): Update below comments.
(c_ts18661_flt_eval_method): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

* common.opt: Support -fexcess-precision=16.
* config/aarch64/aarch64.c (aarch64_excess_precision): Return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
EXCESS_PRECISION_TYPE_FLOAT16.
* config/arm/arm.c (arm_excess_precision): Ditto.
* config/i386/i386.c (ix86_get_excess_precision): Ditto.
* config/m68k/m68k.c (m68k_excess_precision): Issue an error
when EXCESS_PRECISION_TYPE_FLOAT16.
* config/s390/s390.c (s390_excess_precision): Ditto.
* coretypes.h (enum excess_precision_type): Add
EXCESS_PRECISION_TYPE_FLOAT16.
* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
* doc/extend.texi (Half-Precision): Document
-fexcess-precision=16.
* flag-types.h (enum excess_precision): Add
EXCESS_PRECISION_FLOAT16.
* target.def (excess_precision): Update document.
* tree.c (excess_precision_type): Set excess_precision_type to
EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

* options.c (gfc_post_options): Issue an error for
-fexcess-precision=16.

gcc/testsuite/ChangeLog:

* gcc.target/i386/float16-6.c: New test.
* gcc.target/i386/float16-7.c: New test.

Adjust the wording for x86 _Float16 type.

gcc/ChangeLog:

* doc/extend.texi: (@node Floating Types): Adjust the wording.
(@node Half-Precision): Ditto.

Daily bump.

gcc: xtensa: fix PR target/102115

2021-09-07 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
PR target/102115
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
'CONST_INT_P (src)' to the condition of the block that tries to
eliminate literal when loading integer contant.

runtime: use hash32, not hash64, for amd64p32, mips64p32, mips64p32le

Fixes PR go/102102

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/348015

doc: BPF CO-RE documentation

Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.

gcc/ChangeLog:

* doc/extend.texi (BPF Type Attributes) New node.
Document new preserve_access_index attribute.
Document new preserve_access_index builtin.
* doc/invoke.texi: Document -mco-re and -mno-co-re options.

bpf testsuite: Add BPF CO-RE tests

This commit adds several tests for the new BPF CO-RE functionality to
the BPF target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-attr-1.c: New test.
* gcc.target/bpf/core-attr-2.c: Likewise.
* gcc.target/bpf/core-attr-3.c: Likewise.
* gcc.target/bpf/core-attr-4.c: Likewise
* gcc.target/bpf/core-builtin-1.c: Likewise
* gcc.target/bpf/core-builtin-2.c: Likewise.
* gcc.target/bpf/core-builtin-3.c: Likewise.
* gcc.target/bpf/core-section-1.c: Likewise.

bpf: BPF CO-RE support

This commit introduces support for BPF Compile Once - Run
Everywhere (CO-RE) in GCC.

gcc/ChangeLog:

* config/bpf/bpf.c: Adjust includes.
(bpf_handle_preserve_access_index_attribute): New function.
(bpf_attribute_table): Use it here.
(bpf_builtins): Add BPF_BUILTIN_PRESERVE_ACCESS_INDEX.
(bpf_option_override): Handle "-mco-re" option.
(bpf_asm_init_sections): New.
(TARGET_ASM_INIT_SECTIONS): Redefine.
(bpf_file_end): New.
(TARGET_ASM_FILE_END): Redefine.
(bpf_init_builtins): Add "__builtin_preserve_access_index".
(bpf_core_compute, bpf_core_get_index): New.
(is_attr_preserve_access): New.
(bpf_expand_builtin): Handle new builtins.
(bpf_core_newdecl, bpf_core_is_maybe_aggregate_access): New.
(bpf_core_walk): New.
(bpf_resolve_overloaded_builtin): New.
(TARGET_RESOLVE_OVERLOADED_BUILTIN): Redefine.
(handle_attr): New.
(pass_bpf_core_attr): New RTL pass.
* config/bpf/bpf-passes.def: New file.
* config/bpf/bpf-protos.h (make_pass_bpf_core_attr): New.
* config/bpf/coreout.c: New file.
* config/bpf/coreout.h: Likewise.
* config/bpf/t-bpf (TM_H): Add $(srcdir)/config/bpf/coreout.h.
(coreout.o): New rule.
(PASSES_EXTRA): Add $(srcdir)/config/bpf/bpf-passes.def.
* config.gcc (bpf): Add coreout.h to extra_headers.
Add coreout.o to extra_objs.
Add $(srcdir)/config/bpf/coreout.c to target_gtfiles.

btf: expose get_btf_id

Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
lookup BTF type IDs, in order to create CO-RE relocation records.

A prototype is added in ctfc.h

gcc/ChangeLog:

* btfout.c (get_btf_id): Function is no longer static.
* ctfc.h: Expose it here.

ctfc: add function to lookup CTF ID of a TREE type

Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its is TREE node. The function is exposed via
a prototype in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_lookup_tree_type): New function.
* ctfc.h: Likewise.

ctfc: externalize ctf_dtd_lookup

Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_dtd_lookup): Function is no longer static.
* ctfc.h: Analogous change.

dwarf: externalize lookup_type_die

Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.

gcc/ChangeLog:

* dwarf2out.c (lookup_type_die): Function is no longer static.
* dwarf2out.h: Expose it here.

Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c

Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions.  At the very least, always grep for ^ERROR: in
the stdout log!

With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):

...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
    while executing
"::tcl_unknown scan-tree-dump-not\" = foo {"} optimized"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

=== gcc Summary ===

# of expected passes 63740
# of unexpected failures 38
# of unexpected successes 2
# of expected failures 351
# of unresolved testcases 3
# of unsupported tests 662
x/cris-elf/gccobj/gcc/xgcc  version 12.0.0 20210907 (experimental)\
[master r12-3391-g849d5f5929fc] (GCC)

testsuite:
* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
typo in last change.

Fortran - improve error recovery determining array element from constructor

gcc/fortran/ChangeLog:

PR fortran/101327
* expr.c (find_array_element): When bounds cannot be determined as
constant, return error instead of aborting.

gcc/testsuite/ChangeLog:

PR fortran/101327
* gfortran.dg/pr101327.f90: New test.

dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase

DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.

The approach taken here in this patch is:

1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.

2. The BPF backend updates the write_symbols with BPF_WITH_CORE_DEBUG to convey
the case that BTF with CO-RE support needs to be generated.  This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.

So, in other words,

dwarf2out_early_finish
  - Always emit CTF here.
  - if (BTF && !BTF_WITH_CORE), emit BTF now.

dwarf2out_finish
  - if (BTF_WITH_CORE) emit BTF now.

gcc/ChangeLog:

* dwarf2ctf.c (ctf_debug_finalize): Make it static.
(ctf_debug_early_finish): New definition.
(ctf_debug_finish): Likewise.
* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
(ctf_debug_early_finish): New declaration.
(ctf_debug_finish): Likewise.
* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
(dwarf2out_early_finish): Invoke ctf_debug_early_finish.

bpf: Add new -mco-re option for BPF CO-RE

-mco-re in the BPF backend enables code generation for the CO-RE usecase. LTO is
disabled for CO-RE compilations.

gcc/ChangeLog:

* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mco-re.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-lto-1.c: New test.

debug: Add BTF_WITH_CORE_DEBUG debug format

To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added. This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.

gcc/ChangeLog:

* flag-types.h (enum debug_info_type): Add new enum
DINFO_TYPE_BTF_WITH_CORE.
(BTF_WITH_CORE_DEBUG): New bitmask.
* flags.h (btf_with_core_debuginfo_p): New declaration.
* opts.c (btf_with_core_debuginfo_p): New definition.

tree: Change error_operand_p to an inline function

I've thought for a while that many of the macros in tree.h and such should
become inline functions. This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.

gcc/ChangeLog:

* tree.h (error_operand_p): Change to inline function.

c++: Fix up constexpr evaluation of deleting dtors [PR100495]

We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr, similarly complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine,
deleting dtors while marked as clones in fact are just artificial functions
with synthetized body which calls the user destructor and deallocation.

So, either we'd need to evaluate the destructor and afterwards synthetize
and evaluate the deallocation, or we can just save and use the deleting
dtors bodies. The latter seems much easier to me.

2021-09-07 Jakub Jelinek <jakub@redhat.com>

PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.

* g++.dg/cpp2a/constexpr-new21.C: New test.

libgomp.texi: Extend OpenMP 5.0 Implementation Status

libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
OpenMP 5.0 section.
(OpenACC Profiling Interface): Fix typo.

Rename forwarder_block_p in treading code to empty_block_with_phis_p.

gcc/ChangeLog:

* tree-ssa-threadedge.c (forwarder_block_p): Rename to...
(empty_block_with_phis_p): ...this.
(potentially_threadable_block): Same.
(jump_threader::thread_through_normal_block): Same.

libgfortran: Makefile fix for ISO_Fortran_binding.h

libgfortran/ChangeLog:

* Makefile.am (gfor_built_src): Depend on
include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
(ISO_Fortran_binding.h): Rename make target to ...
(include/ISO_Fortran_binding.h): ... this.
* Makefile.in: Regenerate.

Fix PR debug/101947

This is the recent LTO bootstrap failure with Ada enabled. The compiler now
generates DW_OP_deref_type for a unit of the Ada front-end, which means that
the offset of base types in the CU must be computed during early DWARF too.

gcc/
PR debug/101947
* dwarf2out.c (mark_base_types): New overloaded function.
(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
as the compilation unit, and call move_marked_base_types afterward.

x86: Enable FMA in unsigned SI to SF expanders

Enable FMA in scalar/vector unsigned SI to SF expanders. Don't check
TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.

gcc/

PR target/85819
* config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
Enable FMA.
(ix86_expand_vector_convert_uns_vsivsf): Likewise.

gcc/testsuite/

PR target/85819
* gcc.target/i386/pr85819-1a.c: New test.
* gcc.target/i386/pr85819-1b.c: Likewise.
* gcc.target/i386/pr85819-2a.c: Likewise.
* gcc.target/i386/pr85819-2b.c: Likewise.
* gcc.target/i386/pr85819-2c.c: Likewise.
* gcc.target/i386/pr85819-3.c: Likewise.

tree-optimization/102226 - fix epilogue vector re-use

This fixes re-use of the reduction value in epilogue vectorization
when a conversion from/to variable lenght vectors is required.

2021-09-07 Richard Biener <rguenther@suse.de>

PR tree-optimization/102226
* tree-vect-loop.c (vect_transform_cycle_phi): Record
the converted value for the epilogue PHI use.

* g++.dg/vect/pr102226.cc: New testcase.

C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct.

This patch adds support for the 'seq_cst' memory order clause on the 'flush'
directive which was introduced in OpenMP 5.1.

gcc/c-family/ChangeLog:

* c-omp.c (c_finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* semantics.c (finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_flush): Parse 'seq_cst' clause on 'flush'
directive.
* trans-openmp.c (gfc_trans_omp_flush): Handle OMP_MEMORDER_SEQ_CST.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
* c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
* g++.dg/gomp/attrs-1.C: Adapt test to handle all flush clauses.
* g++.dg/gomp/attrs-2.C: Adapt test to handle all flush clauses.
* gfortran.dg/gomp/flush-1.f90: Add test case for 'seq_cst'.
* gfortran.dg/gomp/flush-2.f90: Add test case for 'seq_cst'.

inline: do not einline when no_profile_instrument_function is different

PR gcov-profile/80223

gcc/ChangeLog:

* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
options, do not inline when no_profile_instrument_function
attributes are different in early inliner. It's fine to inline
it after PGO instrumentation.

gcc/testsuite/ChangeLog:

* gcc.dg/no_profile_instrument_function-attr-2.c: New test.

tree-optimization/101555 - avoid redundant alias queries in PRE

This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the blocks
virtual PHI node. All the invalidation work is already done by
prune_clobbered_mems.

This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.

2021-09-07 Richard Biener <rguenther@suse.de>

PR tree-optimization/101555
* tree-ssa-pre.c (translate_vuse_through_block): Do not
perform an alias walk to determine the validity of the
mem at the start of the block which is already guaranteed
by means of prune_clobbered_mems.
(phi_translate_1): Pass edge to translate_vuse_through_block.

libgomp.texi: Add OpenMP Implementation Status

libgomp/
* libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
not to 4.5; link to new section.
(OpenMP Implementation Status): New.

Fortran: Revert to non-multilib-specific ISO_Fortran_binding.h

Commit fef67987cf502fe322e92ddce22eea7ac46b4d75 changed the
libgfortran build process to generate multilib-specific versions of
ISO_Fortran_binding.h from a template, by running gfortran to identify
the values of the Fortran kind constants C_LONG_DOUBLE, C_FLOAT128,
and C_INT128_T. This caused multiple problems with search paths, both
for build-tree testing and installed-tree use, not all of which have
been fixed.

This patch reverts to a non-multilib-specific .h file that uses GCC's
predefined preprocessor symbols to detect the supported types and map
them to kind values in the same way as the Fortran front end.

2021-09-06 Sandra Loosemore <sandra@codesourcery.com>

libgfortran/
* ISO_Fortran_binding-1-tmpl.h: Deleted.
* ISO_Fortran_binding-2-tmpl.h: Deleted.
* ISO_Fortran_binding-3-tmpl.h: Deleted.
* ISO_Fortran_binding.h: New file to replace the above.
* Makefile.am (gfor_cdir): Remove MULTISUBDIR.
(ISO_Fortran_binding.h): Simplify to just copy the file.
* Makefile.in: Regenerated.
* mk-kinds-h.sh: Revert pieces no longer needed for
ISO_Fortran_binding.h.

rs6000: Expand fmod and remainder when built with fast-math [PR97142]

fmod/fmodf and remainder/remainderf could be expanded instead of library
call when fast-math build, which is much faster.

fmodf:
     fdivs   f0,f1,f2
     friz    f0,f0
     fnmsubs f1,f2,f0,f1

remainderf:
     fdivs   f0,f1,f2
     frin    f0,f0
     fnmsubs f1,f2,f0,f1

SPEC2017 Ofast P8LE: 511.povray_r +1.14%,  526.blender_r +1.72%

gcc/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/97142
* gcc.target/powerpc/pr97142.c: New test.

MIPS: add .module arch and ase to all output asm

Currently, the asm output file for MIPS has no rev info.
It can make some trouble, for example:

  assembler is mips1 by default,
  gcc is fpxx by default.

To assemble the output of gcc -S, we have to pass -mips2
to assembler.

The same situation is for some CPU has extension insn.
Octeon is an example.
So we can just add ".set arch=octeon".

If an ASE is enabled, .module ase will also be used.

gcc/ChangeLog:
* config/mips/mips.c (mips_file_start): add .module for
  arch and ase.

Daily bump.

Correct implementation of wi::clz

As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT. The fix is
simply to reorder/shuffle the existing tests.

2021-09-06 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* wide-int.cc (wi::clz): Reorder tests to ensure the result
is zero for all negative values.

invoke.texi: Fix @opindex for -foffload-options

gcc/
* doc/invoke.texi (-foffload-options): Fix @opindex.

gcc_update: use human readable name for revision string in gcc/REVISION

contrib/Changelog:

* gcc_update: Derive human readable name for HEAD using git describe
like "git gcc-descr" with short commit hash. Drop "revision" from
gcc/REVISION.