review.tizen.org Git - test

[nvptx] Add nvptx-gen.h and nvptx-gen.opt

Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and:
- include nvptx-gen.h in nvptx.h, and
- add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters).

Tested on nvptx.

gcc/ChangeLog:

2022-02-25  Tom de Vries  <tdevries@suse.de>

* config.gcc (nvptx*-*-*): Add nvptx/nvptx-gen.opt to extra_options.
* config/nvptx/gen-copyright.sh: New file.
* config/nvptx/gen-h.sh: New file.
* config/nvptx/gen-opt.sh: New file.
* config/nvptx/nvptx.h (TARGET_SM35, TARGET_SM53, TARGET_SM70)
(TARGET_SM75, TARGET_SM80): Move ...
* config/nvptx/nvptx-gen.h: ... here.  New file, generate.
* config/nvptx/nvptx.opt (Enum ptx_isa): Move ...
* config/nvptx/nvptx-gen.opt: ... here.  New file, generate.
* config/nvptx/t-nvptx ($(srcdir)/config/nvptx/nvptx-gen.h)
($(srcdir)/config/nvptx/nvptx-gen.opt): New make target.

[nvptx] Use nvptx-sm.def for t-omp-device

Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate
omp-device-properties-nvptx.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-02-25 Tom de Vries <tdevries@suse.de>

* config/nvptx/gen-omp-device-properties.sh: New file.
* config/nvptx/t-omp-device: Use gen-omp-device-properties.sh.

[nvptx] Add nvptx-sm.def

Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in
the port, like so:
...
NVPTX_SM(30, NVPTX_SM_SEP)
NVPTX_SM(35, NVPTX_SM_SEP)
NVPTX_SM(53, NVPTX_SM_SEP)
NVPTX_SM(70, NVPTX_SM_SEP)
NVPTX_SM(75, NVPTX_SM_SEP)
NVPTX_SM(80,)
...
and use it in various places using a pattern:
...
  #define NVPTX_SM(XX, SEP) { ... }
  #include "nvptx-sm.def"
  #undef NVPTX_SM
...

Tested on nvptx.

gcc/ChangeLog:

2022-02-25  Tom de Vries  <tdevries@suse.de>

* config/nvptx/nvptx-sm.def: New file.
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Use nvptx-sm.def.
* config/nvptx/nvptx-opts.h (enum ptx_isa): Same.
* config/nvptx/nvptx.cc (sm_version_to_string)
(nvptx_omp_device_kind_arch_isa): Same.

[nvptx, testsuite] Add gcc.target/nvptx/sm*.c

Add a few test-cases that test passing each -misa=sm_xx version and verify that
the proper __PTX_SM__ is defined.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-25 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/sm30.c: New test.
* gcc.target/nvptx/sm35.c: New test.
* gcc.target/nvptx/sm53.c: New test.
* gcc.target/nvptx/sm70.c: New test.
* gcc.target/nvptx/sm75.c: New test.
* gcc.target/nvptx/sm80.c: New test.

arc: Fix for new ifcvt behavior [PR104154]

ifcvt now passes a CC-mode "comparison" to backends. This patch
simply returns from gen_compare_reg () in that case since nothing
needs to be prepared anymore.

gcc/ChangeLog:

PR rtl-optimization/104154
* config/arc/arc.cc (gen_compare_reg): Return the CC-mode
comparison ifcvt passed us.

i386: Fix V8HF vector init under -mno-avx [PR 104664]

For V8HFmode vector init with HFmode, do not directly emits V8HF move
with subreg, which may cause reload to assign general register to move
src.

gcc/ChangeLog:

PR target/104664
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Use vec_setv8hf_0 for HF to V8HFmode move instead of subreg.

gcc/testsuite/ChangeLog:

PR target/104664
* gcc.target/i386/pr104664.c: New test.

Daily bump.

PR tree-optimization/91384: peephole2 to eliminate testl after negl.

This patch is my proposed solution to PR tree-optimization/91384 which is
a missed-optimization/code quality regression on x86_64.  The problematic
idiom is "if (r = -a)" which is equivalent to both "r = -a; if (r != 0)"
and alternatively "r = -a; if (a != 0)".  In this particular case, on
x86_64, we prefer to use the condition codes from the negation, rather
than require an explicit testl instruction.

Unfortunately, combine can't help, as it doesn't attempt to merge pairs
of instructions that share the same operand(s), only pairs/triples of
instructions where the result of each instruction feeds the next.  But
I doubt there's sufficient benefit to attempt this kind of "combination"
(that wouldn't already be caught by the tree-ssa passes).

Fortunately, it's relatively easy to fix this up (addressing the
regression) during peephole2 to eliminate the unnecessary testl in:

        movl    %edi, %ebx
        negl    %ebx
        testl   %edi, %edi
        je      .L2

2022-02-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR tree-optimization/91384
* config/i386/i386.md (peephole2): Eliminate final testl insn
from the sequence *movsi_internal, *negsi_1, *cmpsi_ccno_1 by
transforming using *negsi_2 for the negation.

gcc/testsuite/ChangeLog
PR tree-optimization/91384
* gcc.target/i386/pr91384.c: New test case.

PR middle-end/80270: ICE in extract_bit_field_1

This patch fixes PR middle-end/80270, an ICE-on-valid regression, where
performing a bitfield extraction on a variable explicitly stored in a
hard register by the user causes a segmentation fault during RTL
expansion.  Nearly identical source code without the "asm" qualifier
compiles fine.  The point of divergence is in simplify_gen_subreg
which tries to avoid creating non-trivial SUBREGs of hard registers,
to avoid problems during register allocation.  This suggests the
simple solution proposed here, to copy hard registers to a new pseudo
in extract_integral_bit_field, just before calling simplify_gen_subreg.

2022-02-28  Roger Sayle  <roger@nextmovesoftware.com>
    Eric Botcazou  <ebotcazou@adacore.com>

gcc/ChangeLog
PR middle-end/80270
* expmed.cc (extract_integral_bit_field): If OP0 is a hard
register, copy it to a pseudo before calling simplify_gen_subreg.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr80270.c: New test case.

[PR104637] LRA: Split hard regs as many as possible on one subpass

LRA hard reg split subpass is a small subpass used as the last
resort for LRA when it can not assign a hard reg to a reload
pseudo by other ways (e.g. by spilling non-reload pseudos).  For
simplicity the subpass works on one split base (as each split
changes pseudo live range info).  In this case it results in
reaching maximal possible number of subpasses.  The patch
implements as many non-overlapping hard reg splits
splits as possible on each subpass.

gcc/ChangeLog:

PR rtl-optimization/104637
* lra-assigns.cc (lra_split_hard_reg_for): Split hard regs as many
as possible on one subpass.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: New.

d: Merge upstream dmd cf63dd8e5, druntime caf14b0f, phobos 41aaf8c26.

D front-end changes:

    - Import dmd v2.099.0-rc.1.
    - The `main' can now return type `noreturn' and supports return
      inference.

D Runtime changes:

    - Import druntime v2.099.0-rc.1.
    - C bindings for stat_t on powerpc-linux has been fixed.

Phobos changes:

    - Import phobos v2.099.0-rc.1.

gcc/d/ChangeLog:

* d-target.cc (Target::_init): Initialize C type size fields.
* dmd/MERGE: Merge upstream dmd cf63dd8e5.
* dmd/VERSION: Update version to v2.099.0-rc.1.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime caf14b0f.
* src/MERGE: Merge upstream phobos 41aaf8c26.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/simd7413a.d: Update.
* gdc.dg/ubsan/pr88957.d: Update.
* gdc.dg/simd18489.d: New test.
* gdc.dg/torture/simd21727.d: New test.

c++: Lost deprecated/unavailable attr in class tmpl [PR104682]

When looking into the other PR I noticed that we fail to give a warning
for a deprecated enumerator when the enum is in a class template.  This
only happens when the attribute doesn't have an argument.  The reason is
that when we tsubst_enum, we create a new enumerator:

      build_enumerator (DECL_NAME (decl), value, newtag,
           DECL_ATTRIBUTES (decl), DECL_SOURCE_LOCATION (decl));

but DECL_ATTRIBUTES (decl) is null when the attribute was provided
without an argument -- in that case it simply melts into a tree flag.
handle_deprecated_attribute has:

      if (!args)
         *no_add_attrs = true;

so the attribute isn't retained and we lose it when tsubsting.  Same
thing when the attribute is on the enum itself.

Attribute unavailable is a similar case, but it's different in that
it can be a late attribute whereas "deprecated" can't:
is_late_template_attribute has

                /* But some attributes specifically apply to templates.  */
                && !is_attribute_p ("abi_tag", name)
                && !is_attribute_p ("deprecated", name)
                && !is_attribute_p ("visibility", name))
         return true;
       else
         return false;

which looks strange, but attr-unavailable-9.C tests that we don't error when
the attribute is applied on a template.

PR c++/104682

gcc/cp/ChangeLog:

* cp-tree.h (build_enumerator): Adjust.
* decl.cc (finish_enum): Make it return the new decl.
* pt.cc (tsubst_enum): Propagate TREE_DEPRECATED and TREE_UNAVAILABLE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-10.C: New test.
* g++.dg/ext/attr-unavailable-11.C: New test.
* g++.dg/warn/deprecated-17.C: New test.
* g++.dg/warn/deprecated-18.C: New test.

c++: ICE with attribute on enumerator [PR104667]

When processing a template, the enumerators we build don't have a type
yet.  But is_late_template_attribute is not prepared to see a _DECL
without a type, so we crash on

  enum tree_code code = TREE_CODE (type);

(I found that we don't give the "is deprecated" warning for the enumerator
'f' in the test.  Reported as PR104682.)

PR c++/104667

gcc/cp/ChangeLog:

* decl2.cc (is_late_template_attribute): Cope with a decl without
a type.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attrib64.C: New test.

Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding [PR104550]

__builtin_clear_padding(&object) will clear all the padding bits of the object.
actually, it doesn't involve any use of an user variable. Therefore, users do
not expect any uninitialized warning from it. It's reasonable to suppress
uninitialized warnings for all new created uses from __builtin_clear_padding
folding.

PR middle-end/104550

gcc/ChangeLog:

* gimple-fold.cc (clear_padding_flush): Suppress warnings for new
created uses.

gcc/testsuite/ChangeLog:

* gcc.dg/auto-init-pr104550-1.c: New test.
* gcc.dg/auto-init-pr104550-2.c: New test.
* gcc.dg/auto-init-pr104550-3.c: New test.

Fix error recovery in toplev::finalize.

PR ipa/104648

gcc/ChangeLog:

* main.cc (main): Use flag_checking instead of CHECKING_P
and run toplev::finalize only if there is not error seen.

gcc/testsuite/ChangeLog:

* g++.dg/pr104648.C: New test.

Simplify PRE fix

The following reverts a part of the PR103037 fix which is no longer necessary
after the fix for PR104700. That makes the possible cummulative backport
smaller.

2022-02-28 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.cc (compute_avail): Revert part of last change.

tree-optimization/104700 - adjust constant handling in PRE

The following refactors find_or_generate_expression to more properly
handle constant valued SSA names thereby simplifying the code and
avoiding ICEing after the last change to NARY processing.

2022-02-28 Richard Biener <rguenther@suse.de>

PR tree-optimization/104700
* tree-ssa-pre.cc (get_or_alloc_expr_for): Remove and inline
into ...
(find_or_generate_expression): ... here, simplifying code.

* gcc.dg/pr104700-2.c: New testcase.
* gcc.dg/torture/pr104700-1.c: Likewise.

[libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c

When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
  1 exit status^M
compilation terminated.^M
  ...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...

Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-28  Tom de Vries  <tdevries@suse.de>

* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.

[nvptx, testsuite] Add -mptx=_ in nvptx.exp test-cases

When running with target board nvptx-none-run/-mptx=3.1, I run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
-misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/atomic-store-1.c (test for excess errors)
...

Fix this and similar cases by adding an explicit -mptx=_ setting.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-28 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/atomic-store-1.c: Add -mptx=_.
* gcc.target/nvptx/atomic-store-2.c: Same.
* gcc.target/nvptx/float16-1.c: Same.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.
* gcc.target/nvptx/tanh-1.c: Same.
* gcc.target/nvptx/uniform-simt-1.c: Same.
* gcc.target/nvptx/uniform-simt-3.c: Same.

[nvptx] Add -mptx=_

Add an -mptx=_ value, that indicates the default ptx version.

It can be used to undo an explicit -mptx setting, so this:
...
$ gcc test.c -mptx=3.1 -mptx=_
...
has the same effect as:
...
$ gcc test.c
...

Tested on nvptx.

gcc/ChangeLog:

2022-02-28 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx-opts.h (enum ptx_version): Add
PTX_VERSION_default.
* config/nvptx/nvptx.cc (handle_ptx_version_option): Handle
PTX_VERSION_default.
* config/nvptx/nvptx.opt: Add EnumValue "_" / PTX_VERSION_default.

[nvptx, testsuite] Add -misa=sm_30 in nvptx/atomic-store-3.c

When running with target board nvptx-none-run/-misa=sm_70 I run into:
...
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u32 1
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u64 1
...

Fix this by adding an explicit -misa=sm_30 in the test-case.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-28 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/atomic-store-3.c: Add -misa=sm_30.

[nvptx, testsuite] Add -misa=sm_30 in nvptx/uniform-simt-2.c

When running with target board nvptx-none-run/-misa=sm_53 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
-misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/uniform-simt-2.c (test for excess errors)
...

Fix this by adding an explicit -misa=sm_30 in the test-case.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-28 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/uniform-simt-2.c: Add -misa=sm_30.

[nvptx, testsuite] Add -misa=sm_35 in nvptx/rotate.c

When running with target board nvptx-none-run/-misa=sm_30 we run into:
...
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.l.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.r.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-not and.b32
...

Fix this by adding an explicit -misa=sm_35 in the test-case.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-28 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/rotate.c: Add -misa=sm_35.

rtl-optimization/104686 - speed up conflict iteration

The following replaces

       /* Skip bits that are zero.  */
       for (; (word & 1) == 0; word >>= 1)
         bit_num++;

idioms in ira-int.h in the attempt to speedup update_conflict_hard_regno_costs
which we're bound on in PR104686.  The trick is to use ctz_hwi here
which should pay off even with dense bitmaps on architectures that
have HW support for this.

For the PR in question this speeds up compile-time from 31s to 24s for
me.

2022-02-25  Richard Biener  <rguenther@suse.de>

PR rtl-optimization/104686
* ira-int.h (minmax_set_iter_cond): Use ctz_hwi to elide loop
skipping bits that are zero.
(ira_object_conflict_iter_cond): Likewise.

AVX512F: Add helper enumeration for ternary logic intrinsics.

Sync with llvm change in https://reviews.llvm.org/D120307 to
add enumeration and truncate imm to unsigned char, so users could
use ~ on immediates.

gcc/ChangeLog:

* config/i386/avx512fintrin.h (_MM_TERNLOG_ENUM): New enum.
(_mm512_ternarylogic_epi64): Truncate imm to unsigned
char to avoid error when using ~enum as parameter.
(_mm512_mask_ternarylogic_epi64): Likewise.
(_mm512_maskz_ternarylogic_epi64): Likewise.
(_mm512_ternarylogic_epi32): Likewise.
(_mm512_mask_ternarylogic_epi32): Likewise.
(_mm512_maskz_ternarylogic_epi32): Likewise.
* config/i386/avx512vlintrin.h (_mm256_ternarylogic_epi64):
Adjust imm param type to unsigned char.
(_mm256_mask_ternarylogic_epi64): Likewise.
(_mm256_maskz_ternarylogic_epi64): Likewise.
(_mm256_ternarylogic_epi32): Likewise.
(_mm256_mask_ternarylogic_epi32): Likewise.
(_mm256_maskz_ternarylogic_epi32): Likewise.
(_mm_ternarylogic_epi64): Likewise.
(_mm_mask_ternarylogic_epi64): Likewise.
(_mm_maskz_ternarylogic_epi64): Likewise.
(_mm_ternarylogic_epi32): Likewise.
(_mm_mask_ternarylogic_epi32): Likewise.
(_mm_maskz_ternarylogic_epi32): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-vpternlogd-1.c: Use new enum.
* gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
* gcc.target/i386/testimm-10.c: Remove imm check for vpternlog
insns since the imm has been truncated in intrinsic.

Daily bump.

c++: (*(fn))() [PR104618]

The patch for PR90451 deferred marking to the point of actual use; we missed
this one because of the parens.

PR c++/104618

gcc/cp/ChangeLog:

* typeck.cc (cp_build_addr_expr_1): Also
maybe_undo_parenthesized_ref.

gcc/testsuite/ChangeLog:

* g++.dg/overload/paren1.C: New test.

Fix declarations of _DINFINITY, _SINFINITY and _SQNAN

The declarations of _DINFINITY, _SINFINITY and _SQNAN need to be constant
expressions.

2022-02-27 John David Anglin <danglin@gcc.gnu.org>

fixincludes/ChangeLog:
* inclhack.def (hpux_math_constexpr): New hack.
* fixincl.x: Regenerate.
* tests/base/math.h: Update.

Daily bump.

match.pd: Further complex simplification fixes [PR104675]

Mark mentioned in the PR further 2 simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 materials) we could support it
for vector types if it would be uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.

2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>

PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.

* gcc.dg/pr104675-3.c : New test.

rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]

The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.

* g++.dg/opt/pr104681.C: New test.

testsuite: Move pr104540.C test to g++.target/i386/

Both -mforce-drap and -mstackrealign options are x86 specific.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

* g++.dg/pr104540.C: Move to ...
* g++.target/i386/pr104540.C: ... here.

testsuite: Fix ASAN error [PR104687]

PR testsuite/104687

gcc/testsuite/ChangeLog:

* gcc.dg/lto/20090717_0.c: Fix asan error.

arc: Fail conditional move expand patterns

If the movcc comparison is not valid it triggers an assert in the
current implementation. This behavior is not needed as we can FAIL
the movcc expand pattern.

gcc/
* config/arc/arc.cc (gen_compare_reg): Return NULL_RTX if the
comparison is not valid.
* config/arc/arc.md (movsicc): Fail if comparison is not valid.
(movdicc): Likewise.
(movsfcc): Likewise.
(movdfcc): Likewise.

Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>

tree-optimization/103037 - PRE simplifying valueized expressions

This fixes a long-standing issue in PRE where we track valueized
expressions in our expression sets that we use for PHI translation,
code insertion but also feed into match-and-simplify via
vn_nary_simplify.  But that's not what is expected from vn_nary_simplify
or match-and-simplify which assume we are simplifying with operands
available at the point of the expression so they can use contextual
information on the SSA names like ranges.  While the VN side was
updated to ensure this with the rewrite to RPO VN, thereby removing
all workarounds that nullified such contextual info on all SSA names,
the PRE side still suffers from this.

The following patch tries to apply minimal surgery at this point
and makes PRE track un-valueized expressions in the expression sets
but only for the NARY kind (both NAME and CONSTANT do not suffer
from this issue), leaving the REFERENCE kind alone.  The REFERENCE
kind is important when trying to remove the workarounds still in
place in compute_avail for code hoisting, but that's a separate issue
and we have a working workaround in place.

Doing this comes at the cost of duplicating the VN IL on the PRE side
for NARY and eventually some extra overhead for translated expressions
that is difficult to assess.

2022-02-25  Richard Biener  <rguenther@suse.de>

PR tree-optimization/103037
* tree-ssa-sccvn.h (alloc_vn_nary_op_noinit): Declare.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
(vn_nary_op_compute_hash): Likewise.
* tree-ssa-sccvn.cc (alloc_vn_nary_op_noinit): Export.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
(vn_nary_op_compute_hash): Likewise.
* tree-ssa-pre.cc (pre_expr_obstack): New obstack.
(get_or_alloc_expr_for_nary): Pass in the value-id to use,
(re-)compute the hash value and if the expression is not
found allocate it from pre_expr_obstack.
(phi_translate_1): Do not insert the NARY found in the
VN tables but build a PRE expression from the valueized
NARY with the value-id we eventually found.
(find_or_generate_expression): Assert we have an entry
for constant values.
(compute_avail): Insert not valueized expressions into
EXP_GEN using the value-id from the VN tables.
(init_pre): Allocate pre_expr_obstack.
(fini_pre): Free pre_expr_obstack.

* gcc.dg/torture/pr103037.c: New testcase.

i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]

As mentioned in the PR, the following testcase is miscompiled for similar
reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
places during expansion and during expansion we can guarantee that the
lifetime of those temporary slot doesn't overlap. But the following
splitter uses SLOT_TEMP too and in between expansion and split1 there is
a possibility that something extends the lifetime of SLOT_TEMP created
slots across an instruction that will be split by this splitter.

The following patch fixes it by using a new temp slot kind to make sure
it doesn't reuse a SLOT_TEMP that could be live across the instruction.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR target/104674
* config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387.
* config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use
SLOT_FLOATxFDI_387 rather than SLOT_TEMP.

* gcc.target/i386/pr104674.c: New test.

warning-control: Comment spelling fix

This fixes a spelling mistake I found while looking at warning-control
implementation.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

* warning-control.cc (get_nowarn_spec): Comment spelling fix.

internal-fn: Call do_pending_stack_adjust in expand_SPACESHIP [PR104679]

The following testcase is miscompiled on ia32 at -O2, because
when expand_SPACESHIP is called, we have pending stack adjustment
from the foo call right before it.
Now, ix86_expand_fp_spaceship uses emit_jump_insn several times
but then emit_jump also several times.  While emit_jump_insn doesn't
do do_pending_stack_adjust (), emit_jump does, so we end up with:
...
    8: call [`_Z3foodl'] argc:0x10
      REG_CALL_DECL `_Z3foodl'
    9: r88:DF=[`a']
   10: r89:HI=unspec[cmp(r88:DF,0.0)] 25
   11: flags:CC=unspec[r89:HI] 26
   12: pc={(unordered(flags:CCFP,0))?L27:pc}
      REG_BR_PROB 536868
   66: NOTE_INSN_BASIC_BLOCK 4
   13: pc={(uneq(flags:CCFP,0))?L19:pc}
      REG_BR_PROB 214748364
   67: NOTE_INSN_BASIC_BLOCK 5
   14: pc={(flags:CCFP>0)?L23:pc}
      REG_BR_PROB 536870916
   68: NOTE_INSN_BASIC_BLOCK 6
   15: r86:SI=0xffffffffffffffff
   16: {sp:SI=sp:SI+0x10;clobber flags:CC;}
      REG_ARGS_SIZE 0
   17: pc=L29
   18: barrier
   19: L19:
   69: NOTE_INSN_BASIC_BLOCK 7
...
The sp += 16 pending stuck adjust was emitted in the middle of the
sequence and is effective only for the single case of the 4 possibilities
where .SPACESHIP returns -1, in all other cases the stack isn't adjusted
and so we ICE during dwarf2cfi.

Now, we could either call do_pending_stack_adjust in
ix86_expand_fp_spaceship, or use there calls that actually don't call
do_pending_stack_adjust (but having the stack adjustment across branches is
generally undesirable), or we can call it in expand_SPACESHIP for all
targets (note, just i386 currently implements it).
I chose the generic code because e.g. expand_{addsub,neg,mul}_overflow
in the same file also call do_pending_stack_adjust in internal-fn.cc for the
same reasons, that it is expected that most if not all targets will expand
those through jumps and we don't want all of the targets to need to deal
with that.

2022-02-25  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/104679
* internal-fn.cc (expand_SPACESHIP): Call do_pending_stack_adjust.

* g++.dg/torture/pr104679.C: New test.

match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]

We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.

* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.

Revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbb

The patch for PR103302 caused PR104121, and extended the live ranges
of LRA reloads.

for gcc/ChangeLog

PR target/104121
PR target/103302
* expr.cc (emit_move_multi_word): Restore clobbers during LRA.

Add testcase from PR103845

This problem was already fixed as part of PR104263: the abnormal edge
that remained from before inlining didn't make sense after inlining.
So this patch adds only the testcase.

for gcc/testsuite/ChangeLog

PR tree-optimization/103845
PR tree-optimization/104263
* gcc.dg/pr103845.c: New.

Cope with NULL dw_cfi_cfa_loc

In def_cfa_0, we may set the 2nd operand's dw_cfi_cfa_loc to NULL, but
then cfi_oprnd_equal_p calls cfa_equal_p with a NULL dw_cfa_location*.
This patch aranges for us to tolerate NULL dw_cfi_cfa_loc.

for gcc/ChangeLog

PR middle-end/104540
* dwarf2cfi.cc (cfi_oprnd_equal_p): Cope with NULL
dw_cfi_cfa_loc.

for gcc/testsuite/ChangeLog

PR middle-end/104540
* g++.dg/pr104540.C: New.

Copy EH phi args for throwing hardened compares

When we duplicate a throwing compare for hardening, the EH edge from
the original compare gets duplicated for the inverted compare, but we
failed to adjust any PHI nodes in the EH block.  This patch adds the
needed adjustment, copying the PHI args from those of the preexisting
edge.

for  gcc/ChangeLog

PR tree-optimization/103856
* gimple-harden-conditionals.cc (non_eh_succ_edge): Enable the
eh edge to be requested through an extra parameter.
(pass_harden_compares::execute): Copy PHI args in the EH dest
block for the new EH edge added for the inverted compare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/103856
* g++.dg/pr103856.C: New.

Daily bump.

libstdc++: Fix cast in source_location::current() [PR104602]

This fixes a problem for Clang, which is going to return a non-void
pointer from __builtin_source_location(). The current definition of
std::source_location::current() converts that to void* and then has to
cast it back again in the body (which makes it invalid in a constant
expression). By using the actual type of the returned pointer, we avoid
the problematic cast for Clang.

libstdc++-v3/ChangeLog:

PR libstdc++/104602
* include/std/source_location (source_location::current): Use
deduced type of __builtin_source_location().

Fix attr-retain-* tescases for 32-bit PowerPC.

PR testsuite/100407

gcc/testsuite/
* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
* gcc.c-torture/compile/attr-retain-2.c: Likewise.

Fortran: frontend code for F2018 QUIET specifier to STOP and ERROR STOP

Fortran 2018 allows for a QUIET specifier to the STOP and ERROR STOP
statements. Whilst the gfortran library code provides support for this
specifier for quite some time, the frontend implementation was missing.

gcc/fortran/ChangeLog:

PR fortran/84519
* dump-parse-tree.cc (show_code_node): Dump QUIET specifier when
present.
* match.cc (gfc_match_stopcode): Implement parsing of F2018 QUIET
specifier. F2018 stopcodes may have non-default integer kind.
* resolve.cc (gfc_resolve_code): Add checks for QUIET argument.
* trans-stmt.cc (gfc_trans_stop): Pass QUIET specifier to call of
library function.

gcc/testsuite/ChangeLog:

PR fortran/84519
* gfortran.dg/stop_1.f90: New test.
* gfortran.dg/stop_2.f: New test.
* gfortran.dg/stop_3.f90: New test.
* gfortran.dg/stop_4.f90: New test.

RISC-V: Document the degree of position independence that medany affords

The code generated by -mcmodel=medany is defined to be
position-independent, but is not guaranteed to function correctly when
linked into position-independent executables or libraries. See the
recent discussion at the psABI specification [1] for more details.

It would be better to reject these invalid sequences when linking, but
as pointed out in a recent LD bug [2] there may be some compatibility
issues related to the PCREL_HI20 relocations used to initialize GP.
Given the complexity here it's unlikely we'll be able to reject these
sequences any time soon, so instead just document that these may not
work.

[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=28789

gcc/ChangeLog:

* doc/invoke.texi (RISC-V -mcmodel=medany): Document the degree
of position independence that -mcmodel=medany affords.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

libgcc: fix a warning calling find_fde_tail

The third parameter of find_fde_tail is an _Unwind_Ptr (which is an
integer type instead of a pointer), but we are passing NULL to it. This
causes a -Wint-conversion warning.

libgcc/

* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call find_fde_tail
with 0 instead of NULL.

Fix clang warning in pt.cc

Fixes:

gcc/cp/pt.cc:13755:23: warning: suggest braces around initialization of subobject [-Wmissing-braces]
tree_vec_map in = { fn, nullptr };

gcc/cp/ChangeLog:

* pt.cc (defarg_insts_for): Use braces for subobject.

bpf: do not --enable-gcov for bpf-*-* targets

This patch changes the build machinery in order to disable the build
of GCOV (both compiler and libgcc) in bpf-*-* targets. The reason for
this change is that BPF is (currently) too restricted in order to
support the coverage instrumentalization.

Tested in bpf-unknown-none and x86_64-linux-gnu targets.

2022-02-23 Jose E. Marchesi <jose.marchesi@oracle.com>

gcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targetting bpf-*.
* configure: Regenerate.

libgcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targetting bpf-*.
* configure: Regenerate.

tree-optimization/104676 - free nb_iterations after loop distribution

Loop distribution can release SSA names used in nb_iterations, make
sure to release those.

2022-02-24 Richard Biener <rguenther@suse.de>

PR tree-optimization/104676
* tree-loop-distribution.cc (loop_distribution::execute):
Do a full scev_reset.

* gcc.dg/torture/pr104676.c: New testcase.

sccvn: Fix visit_reference_op_call value numbering of vdefs [PR104601]

The following testcase is miscompiled, because -fipa-pure-const discovers
that bar is const, but when sccvn during fre3 sees
  # .MEM_140 = VDEF <.MEM_96>
  *__pred$__d_43 = _50 (_49);
where _50 value numbers to &bar, it value numbers .MEM_140 to
vuse_ssa_val (gimple_vuse (stmt)).  For const/pure calls that return
a SSA_NAME (or don't have lhs) that is fine, those calls don't store
anything, but if the lhs is present and not an SSA_NAME, value numbering
the vdef to anything but itself means that e.g. walk_non_aliased_vuses
won't consider the call, but the call acts as a store to its lhs.
When it is ignored, sccvn will return whatever has been stored to the
lhs earlier.

I've bootstrapped/regtested an earlier version of this patch, which did the
if (!lhs && gimple_call_lhs (stmt))
  changed |= set_ssa_val_to (vdef, vdef);
part before else if (vnresult->result_vdef), and that regressed
+FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo \\\\(" 1
+FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo2 \\\\(" 1
so this updated patch uses result_vdef there as before and only otherwise
(which I think must be the const/pure case) decides based on whether the
lhs is non-SSA_NAME.

2022-02-24  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/104601
* tree-ssa-sccvn.cc (visit_reference_op_call): For calls with
non-SSA_NAME lhs value number vdef to itself instead of e.g. the
vuse value number.

* g++.dg/torture/pr104601.C: New test.

[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c

Add openmp test-cases that test the omp declare variant construct:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24 Tom de Vries <tdevries@suse.de>

* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
* testsuite/libgomp.c/declare-variant-3.h: New header file.

[nvptx] Add missing t-omp-device isas

In t-omp-device we list isas that can be used in omp declare variant like so:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
and in nvptx_omp_device_kind_arch_isa we handle them.

Update both to reflect the current list of isas.

Tested on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2022-02-23 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Handle
sm_70, sm_75 and sm_80.
* config/nvptx/t-omp-device: Add sm_53, sm_70, sm_75 and sm_80.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>

[nvptx] Add shf.{l,r}.wrap insn

Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be
used to implement 32-bit left or right rotate.

Add define_insns rotlsi3 and rotrsi3.

Tested on nvptx.

gcc/ChangeLog:

2022-02-23 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn
"rotrsi3"): New define_insn.

gcc/testsuite/ChangeLog:

2022-02-23 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/rotate-run.c: New test.
* gcc.target/nvptx/rotate.c: New test.

[nvptx] Fix dummy location in gen_comment

I committed "[nvptx] Add -mptx-comment", but tested it in combination with the
proposed "[final] Handle compiler-generated asm insn" (
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so
by itself the commit introduced some regressions:
...
FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr80764.c -O2 (internal compiler error: Segmentation fault)
...

There are due to cfun->function_start_locus == 0.

Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead.

Tested on nvptx.

gcc/ChangeLog:

2022-02-23 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.cc (gen_comment): Use
DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.

Fix typo in <code>v1ti3.

For evex encoding vp{xor,or,and}, suffix is needed.

Or there would be an error for
vpxor %xmm0, %xmm31, %xmm1

Error: unsupported instruction `vpxor'

gcc/ChangeLog:

* config/i386/sse.md (<code>v1ti3): Add suffix and replace
isa attr of alternative 2 from avx to avx512vl.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-logicsuffix-1.c: New test.

Daily bump.

analyzer: handle __attribute__((const)) [PR104434]

When testing -fanalyzer on openblas-0.3, I noticed slightly over 2000
false positives from -Wanalyzer-malloc-leak on code like this:

        if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) {
            pt_t = (lapack_complex_float*)
                LAPACKE_malloc( sizeof(lapack_complex_float) *
                                ldpt_t * MAX(1,n) );
            [...snip...]
        }

        [...snip lots of code...]

        if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'q' ) ) {
            LAPACKE_free( pt_t );
        }

where LAPACKE_lsame is a char-comparison function implemented in a
different TU.
The analyzer naively considers the execution path where:
  LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' )
is true at the malloc guard, but then false at the free guard, which
is thus a memory leak.

This patch makes -fanalyer respect __attribute__((const)), so that the
analyzer treats such functions as returning the same value when given
the same inputs.

I've filed https://github.com/xianyi/OpenBLAS/issues/3543 suggesting that
LAPACKE_lsame be annotated with __attribute__((const)); with that, and
with this patch, the false positives seem to be fixed.

gcc/analyzer/ChangeLog:
PR analyzer/104434
* analyzer.h (class const_fn_result_svalue): New decl.
* region-model-impl-calls.cc (call_details::get_manager): New.
* region-model-manager.cc
(region_model_manager::get_or_create_const_fn_result_svalue): New.
(region_model_manager::log_stats): Log
m_const_fn_result_values_map.
* region-model.cc (const_fn_p): New.
(maybe_get_const_fn_result): New.
(region_model::on_call_pre): Handle fndecls with
__attribute__((const)) by calling the above rather than making
a conjured_svalue.
* region-model.h (visitor::visit_const_fn_result_svalue): New.
(region_model_manager::get_or_create_const_fn_result_svalue): New
decl.
(region_model_manager::const_fn_result_values_map_t): New typedef.
(region_model_manager::m_const_fn_result_values_map): New field.
(call_details::get_manager): New decl.
* svalue.cc (svalue::cmp_ptr): Handle SK_CONST_FN_RESULT.
(const_fn_result_svalue::dump_to_pp): New.
(const_fn_result_svalue::dump_input): New.
(const_fn_result_svalue::accept): New.
* svalue.h (enum svalue_kind): Add SK_CONST_FN_RESULT.
(svalue::dyn_cast_const_fn_result_svalue): New.
(class const_fn_result_svalue): New.
(is_a_helper <const const_fn_result_svalue *>::test): New.
(template <> struct default_hash_traits<const_fn_result_svalue::key_t>):
New.

gcc/testsuite/ChangeLog:
PR analyzer/104434
* gcc.dg/analyzer/attr-const-1.c: New test.
* gcc.dg/analyzer/attr-const-2.c: New test.
* gcc.dg/analyzer/attr-const-3.c: New test.
* gcc.dg/analyzer/pr104434-const.c: New test.
* gcc.dg/analyzer/pr104434-nonconst.c: New test.
* gcc.dg/analyzer/pr104434.h: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: Add new test [PR79493]

A nice side effect of r12-1822 was improving the diagnostic
we emit for the following test.

PR c++/79493

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/undeclared1.C: New test.

c++: Add fixed test [PR70077]

Fixed with r10-1280.

PR c++/70077

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept76.C: New test.

middle-end/104644 - recursion with bswap match.pd pattern

The following patch avoids infinite recursion during generic folding.
The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
(bswap @1) actually being simplified, if it is not simplified, we just
move the bswap from one operand to the other and if @0 is also INTEGER_CST,
we apply the same rule next.

The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
such cases:
static inline bool
integer_cst_p (tree t)
{
return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
}
The patch uses ! modifier to ensure the bswap is simplified and
extends support to GENERIC by means of requiring !EXPR_P which
is not perfect but a conservative approximation.

2022-02-22 Richard Biener <rguenther@suse.de>

PR tree-optimization/104644
* doc/match-and-simplify.texi: Amend ! documentation.
* genmatch.cc (expr::gen_transform): Code-generate ! support
for GENERIC.
(parser::parse_expr): Allow ! for GENERIC.
* match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on
bswap.

* gcc.dg/pr104644.c: New test.

Co-Authored-by: Jakub Jelinek <jakub@redhat.com>

Support SSA name declarations with pointer type

Currently we fail to parse

  int * _3;

as SSA name and instead get a VAR_DECL because of the way the C
frontends declarator specs work.  That causes havoc if those
supposed SSA names are used in PHIs or in other places where
VAR_DECLs are not allowed.  The following fixes the pointer case
in an ad-hoc way - for more complex type declarators we probably
have to find a way to re-use the C frontend grokdeclarator without
actually creating a VAR_DECL there (or maybe make it create an
SSA name).

Pointers appear too often to be neglected though, thus the following
ad-hoc fix for this.  This also adds verification that we do not
end up with SSA names without definitions as can happen when
reducing a GIMPLE testcase.  Instead of working through segfaults
one-by-one we emit errors for all of those at once now.

2022-02-23  Richard Biener  <rguenther@suse.de>

gcc/c
* gimple-parser.cc (c_parser_parse_gimple_body): Diagnose
SSA names without definition.
(c_parser_gimple_declaration): Handle pointer typed SSA names.

gcc/testsuite/
* gcc.dg/gimplefe-49.c: New testcase.
* gcc.dg/gimplefe-error-13.c: Likewise.

tree-optimization/101636 - CTOR vectorization ICE

The following fixes an ICE when vectorizing the defs of a CTOR
results in a different vector type than expected. That can happen
with AARCH64 SVE and a fixed vector length as noted in r10-5979
and on x86 with AVX512 mask CTORs and trying to re-vectorize
using SSE as shown in this bug.

The fix is simply to reject the vectorization when it didn't
produce the desired type.

2022-02-23 Richard Biener <rguenther@suse.de>

PR tree-optimization/101636
* tree-vect-slp.cc (vect_print_slp_tree): Dump the
vector type of the node.
(vect_slp_analyze_operations): Make sure the CTOR
is vectorized with an expected type.
(vectorize_slp_instance_root_stmt): Revert r10-5979 fix.

* gcc.target/i386/pr101636.c: New testcase.
* c-c++-common/torture/pr101636.c: Likewise.

warn-recursion: Don't warn for __builtin_calls in gnu_inline extern inline functions [PR104633]

The first two testcases show different ways how e.g. the glibc
_FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c
the new -Winfinite-recursion warning emits a false positive warning.

It is a false positive because when a builtin with 2 names is called
through the __builtin_ name (but not all builtins have a name prefixed
exactly like that) from extern inline function with gnu_inline semantics,
it doesn't mean the compiler will ever attempt to use the user inline
wrapper for the call, the __builtin_ just does what the builtin function
is expected to do and either expands into some compiler generated code,
or if the compiler decides to emit a call it will use an actual definition
of the function, but that is not the extern inline gnu_inline function
which is never emitted out of line.
Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline
wrapper calls the builtin by the same name as the function's name and in
that case it is infinite recursion, we actuall try to inline the recursive
call and also error because the recursion is infinite during inlining;
without always_inline we wouldn't error but it is still infinite recursion,
the user has no control on how many recursive calls we actually inline.

2022-02-22 Jakub Jelinek <jakub@redhat.com>

PR c/104633
* gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit):
Don't warn about calls to corresponding builtin from extern inline
gnu_inline wrappers.

* gcc.dg/Winfinite-recursion-3.c: New test.
* gcc.dg/Winfinite-recursion-4.c: New test.
* gcc.dg/Winfinite-recursion-5.c: New test.

nvptx: Back-end portion of a fix for PR target/104489.

This one line fix/tweak is the back-end specific change for a fix for
PR target/104489, that allows the ISA for GCC's nvptx backend to be bumped
to sm_53. The machine-independent middle-end pieces were posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

2022-02-23 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/104489
* config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.

arm: Fix typo in auto-vectorized MVE comparisons

I made a last minute renaming of mve_const_bool_vec_to_hi () into
mve_bool_vec_to_const () and forgot to update the call sites in vfp.md
accordingly.

Committed as obvious.

2022-02-23 Christophe Lyon <christophe.lyon@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Fix
typo.

x86: Update Intel architectures ISA support in documentation.

Since the ISA supported by Intel architectures in the documentation
are inconsistent with the actual, modify them all.

gcc/Changelog:

* doc/invoke.texi: Update documents for Intel architectures.

Daily bump.

libgo: update README.gcc

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387514

rs6000: Move g++.dg/ext powerpc tests to g++.target

Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2021-02-22 Paul A. Clarke <pc@us.ibm.com>

gcc/testsuite
* g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg
directives.
* g++.dg/ext/altivec-2.C: Likewise.
* g++.dg/ext/altivec-3.C: Likewise.
* g++.dg/ext/altivec-4.C: Likewise.
* g++.dg/ext/altivec-5.C: Likewise.
* g++.dg/ext/altivec-6.C: Likewise.
* g++.dg/ext/altivec-7.C: Likewise.
* g++.dg/ext/altivec-8.C: Likewise.
* g++.dg/ext/altivec-9.C: Likewise.
* g++.dg/ext/altivec-10.C: Likewise.
* g++.dg/ext/altivec-11.C: Likewise.
* g++.dg/ext/altivec-12.C: Likewise.
* g++.dg/ext/altivec-13.C: Likewise.
* g++.dg/ext/altivec-14.C: Likewise.
* g++.dg/ext/altivec-15.C: Likewise.
* g++.dg/ext/altivec-16.C: Likewise.
* g++.dg/ext/altivec-17.C: Likewise.
* g++.dg/ext/altivec-18.C: Likewise.
* g++.dg/ext/altivec-cell-1.C: Likewise.
* g++.dg/ext/altivec-cell-2.C: Likewise.
* g++.dg/ext/altivec-cell-3.C: Likewise.
* g++.dg/ext/altivec-cell-4.C: Likewise.
* g++.dg/ext/altivec-cell-5.C: Likewise.
* g++.dg/ext/altivec-types-1.C: Likewise.
* g++.dg/ext/altivec-types-2.C: Likewise.
* g++.dg/ext/altivec-types-3.C: Likewise.
* g++.dg/ext/altivec-types-4.C: Likewise.
* g++.dg/ext/undef-bool-1.C: Likewise.

Fortran: skip compile-time shape check if constructor shape is not known

gcc/fortran/ChangeLog:

PR fortran/104619
* resolve.cc (resolve_structure_cons): Skip shape check if shape
of constructor cannot be determined at compile time.

gcc/testsuite/ChangeLog:

PR fortran/104619
* gfortran.dg/derived_constructor_comps_7.f90: New test.

Restore bootstrap on x86_64-pc-linux-gnu

This patch resolves the bootstrap failure on x86_64-pc-linux-gnu.

2022-02-22 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore
bootstrap.

Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref'

Clean-up for commit e2a58ed6dc5293602d0d168475109caa81ad0f0d
"openacc: Middle-end worker-partitioning support":
as of commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2
"openacc: Shared memory layout optimisation", we're no longer
running into the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'.

gcc/
* omp-low.cc (omp_build_component_ref): Move function...
* omp-general.cc (omp_build_component_ref): ... here. Remove
'static'.
* omp-general.h (omp_build_component_ref): Declare function.
* omp-oacc-neuter-broadcast.cc (oacc_build_component_ref): Remove
function.
(build_receiver_ref, build_sender_ref): Call
'omp_build_component_ref' instead.

Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'

Now that I've resolved GCC 'hash_map' issues (a while ago already), we may
further simplify this after commit 049eda8274b7394523238b17ab12c3e2889f253e
"Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as
'hash_map' Value, directly store 'field_map_t' objects, not pointers to
manually allocated 'field_map_t' objects.

gcc/
* omp-oacc-neuter-broadcast.cc (record_field_map_t): Further
simplify. Adjust all users.

Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'

This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

{+WARNING: program timed out.+}
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
gang-redundant execution.

rs6000: Fix GC on rs6000.c decls for atomic handling (PR88134)

In PR88134 it is pointed out that we do not have GTY markup for some
variables we use for atomic. So, let's add that.

2022-02-22 Segher Boessenkool <segher@kernel.crashing.org>

PR target/88134
* config/rs6000/rs6000.cc (atomic_hold_decl, atomic_clear_decl,
atomic_update_decl): Add GTY markup.

arm: Add VPR_REG to ALL_REGS

VPR_REG should be part of ALL_REGS, this patch fixes this omission.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

arm: Convert more MVE/CDE builtins to predicate qualifiers

This patch covers a few non-load/store builtins where we do not use
the <mode> iterator and thus we cannot use <MVE_vpred>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

arm: Convert more load/store MVE builtins to predicate qualifiers

This patch covers a few builtins where we do not use the <mode>
iterator and thus we cannot use <MVE_vpred>.

For v2di instructions, we keep the HI mode for predicates.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

arm: Convert more MVE builtins to predicate qualifiers

This patch covers all builtins that have an HI operand and use the
<mode> iterator, thus we can replace HI whe <MVE_vpred>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New.
(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(STRS_P_QUALIFIERS): Use predicate qualifier.
(STRU_P_QUALIFIERS): Likewise.
(STRSU_P_QUALIFIERS): Likewise.
(STRSS_P_QUALIFIERS): Likewise.
(LDRGS_Z_QUALIFIERS): Likewise.
(LDRGU_Z_QUALIFIERS): Likewise.
(LDRS_Z_QUALIFIERS): Likewise.
(LDRU_Z_QUALIFIERS): Likewise.
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(BINOP_NONE_NONE_PRED_QUALIFIERS): New.
(BINOP_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def: Use new predicated qualifiers.
* config/arm/mve.md: Use MVE_VPRED instead of HI.

arm: Convert remaining MVE vcmp builtins to predicate qualifiers

This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>)
(mve_vcmp*q_m_f<mode>): Use MVE_VPRED instead of HI.

arm: Fix vcond_mask expander for MVE (PR target/100757)

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
ldr     r3, .L3+48
vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32        q0, [r3]     // q0=a(0..3)
adds    r3, r3, #16
vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32        q1, [r3]     // q1=a(4..7)
vmrs     r3, P0
vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs    r2, P0  @ movhi
ands    r3, r3, r2           // r3=select(a(0..3]) & select(a(4..7))
vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr     P0, r3
vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1]            // keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32        s11, s12
vmov    s15, r2
vmov    s14, r3
vmaxnm.f32      s15, s11, s15
vmaxnm.f32      s15, s15, s14
vmov    s14, r0
vmaxnm.f32      s15, s15, s14
vmov    r0, s15
bx      lr
.L4:
.align  3
.L3:
.word   1073741824 // 2.0f
.word   1073741824
.word   1073741824
.word   1073741824
.word   1084227584 // 5.0f
.word   1084227584
.word   1084227584
.word   1084227584
.word   1082130432 // 4.0f
.word   1082130432
.word   1082130432
.word   1082130432

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
(vec_cmpu<mode><MVE_vpred>): New.
(vcond_mask_<mode><MVE_vpred>): New.
* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): Move to ...
* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.
* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
* lib/target-supports.exp (check_effective_target_arm_mve): New.

arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

This patch also adds tests derived from the one provided in PR
target/101325: there is a compile-only test because I did not have
access to anything that could execute MVE code until recently.  I have
been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon <christophe.lyon@arm.com>
    Richard Sandiford  <richard.sandiford@arm.com>

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm-protos.h (mve_bool_vec_to_const): New.
* config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
(mve_bool_vec_to_const): New.
(neon_make_constant): Call mve_bool_vec_to_const when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/constraints.md (DB): New.
* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
(@mve_vpselq_<supf>v2di): Define separately.
(mov<mode>): New expander for VxBI modes.
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
MVE_7_HI iterator and add support for DB constraint.

gcc/testsuite/
PR target/100757
PR target/101325
* gcc.dg/rtl/arm/mve-vxbi.c: New test.
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

arm: Implement MVE predicates as vectors of booleans

This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>
    Richard Sandiford  <richard.sandiford@arm.com>

gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* emit-rtl.cc (init_emit_once): Handle all boolean modes.
* genmodes.cc (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def: (VECTOR_BOOL_MODE): Document new COMPONENT
parameter.  Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
define BImode.
* rtx-vector-builder.cc (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.cc (output_constant_pool_2): Likewise.

arm: Fix mve_vmvnq_n_<supf><mode> argument mode

The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
<V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
for operand 1.

arm: Add support for VPR_REG in arm_class_likely_spilled_p

VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p. No test fails without this patch, but
it seems it should be implemented.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.

arm: Add GENERAL_AND_VPR_REGS regclass

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS). The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(CLASS_MAX_NREGS): Handle VPR.
* config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.

arm: Add new tests for comparison vectorization with Neon and MVE

This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22 Christophe Lyon <christophe.lyon@arm.com>

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

MAINTAINERS: Update my email address.

* MAINTAINERS (Write After Approval): Update my e-mail address.

[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.

nvptx: Add -misa=sm_70

Add -misa=sm_70, and use it to specify the misa value in test-case
gcc.target/nvptx/atomic-store-2.c.

Tested on nvptx.

gcc/ChangeLog:

* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm):
Likewise.
* config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70.

gcc/testsuite/ChangeLog:

2022-02-22 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70.
* gcc.target/nvptx/uniform-simt-3.c: Same.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

libstdc++: Implement P2415R2 changes to viewable_range / views::all

This implements the wording changes in P2415R2 "What is a view?", which
is a DR for C++20.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (__detail::__is_initializer_list):
Define.
(viewable_range): Adjust as per P2415R2.
* include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value.
* include/std/ranges (owning_view): Define as per P2415R2.
(enable_borrowed_range<owning_view>): Likewise.
(views::__detail::__can_subrange): Replace with ...
(views::__detail::__can_owning_view): ... this.
(views::_All::_S_noexcept): Sync with operator().
(views::_All::operator()): Use owning_view instead of subrange
as per P2415R2.
* include/std/version (__cpp_lib_ranges): Adjust value.
* testsuite/std/ranges/adaptors/all.cc (test06): Adjust now that
views::all uses owning_view instead of subrange.
(test08): New test.
* testsuite/std/ranges/adaptors/lazy_split.cc (test09): Adjust
now that rvalue non-view non-borrowed ranges are viewable.
* testsuite/std/ranges/adaptors/split.cc (test06): Likewise.

nvptx: Add -mptx=6.0

Currently supported internally are 3.1, 6.0, 6.3 and 7.0.

However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0.

Add -mptx=6.0 for consistency.

Tested on nvptx.

gcc/ChangeLog:

* config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0.
* doc/invoke.texi (-mptx): Update for new values and defaults.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

[nvptx] Add -mptx-comment

Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
        // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
        // Start: Added by -minit-regs=3:
        // #NO_APP
                mov.u32 %r26, 0;
        // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
        // End: Added by -minit-regs=3:
        // #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.

gcc/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

* config/nvptx/nvptx.cc (gen_comment): New function.
(workaround_uninit_method_1, workaround_uninit_method_2)
(workaround_uninit_method_3): : Use gen_comment.
* config/nvptx/nvptx.opt (mptx-comment): New option.

Dump def that we use for a splat

This makes the SLP vectorizer dump the def we use for a splat to
aid debugging.

2022-02-22 Richard Biener <rguenther@suse.de>

* tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used
for a splat.

Implement constant-folding simplifications of reductions.

This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd.  The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:

float f (float x)
{
  vector(4) float vect_x_14.11;
  vector(4) float _2;
  float _32;

  _2 = {x_9(D), 0.0, 0.0, 0.0};
  vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
  _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
  return _32;
}

With these proposed new transformations, we can simplify the
above code even further.

float f (float x)
{
  float _32;
  _32 = x_9(D) + 1.36e+2;
  return _32;
}

[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].

2022-02-22  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) ->  (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.

gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.

libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).