review.tizen.org Git - platform/upstream/gcc.git/log

Adjust testcase for O2 vect.

Adjust code in check_vect_slp_store_usage to make it an exact
pattern match of the corresponding testcases.
These new target/xfail selectors are added as a temporary solution,
and should be removed after the real issue is fixed for Wstringop-overflow.

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_slp_v4qi_store_unalign,
vect_slp_v2hi_store_unalign, vect_slp_v4hi_store_unalign,
vect_slp_v4si_store_unalign): Document efficient target.
(vect_slp_v4qi_store_unalign_1, vect_slp_v8qi_store_unalign_1,
vect_slp_v16qi_store_unalign_1): Ditto.
(vect_slp_v2hi_store_align, vect_slp_v2qi_store_align,
vect_slp_v2si_store_align, vect_slp_v4qi_store_align): Ditto.
(struct_4char_block_move, struct_8char_block_move,
struct_16char_block_move): Ditto.

gcc/testsuite/ChangeLog:

PR testsuite/102944
* c-c++-common/Wstringop-overflow-2.c: Adjust target/xfail
selector.
* gcc.dg/Warray-bounds-48.c: Ditto.
* gcc.dg/Warray-bounds-51.c: Ditto.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-14.c: Ditto.
* gcc.dg/Wstringop-overflow-21.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto.
* gcc.dg/Wstringop-overflow-76.c: Ditto.
* gcc.dg/Wzero-length-array-bounds-2.c: Ditto.
* lib/target-supports.exp (vect_slp_v4qi_store_unalign): New
efficient target.
(vect_slp_v4qi_store_unalign_1): Ditto.
(struct_4char_block_move): Ditto.
(struct_8char_block_move): Ditto.
(struct_16char_block_move): Ditto.
(vect_slp_v2hi_store_align): Ditto.
(vect_slp_v2qi_store): Rename to ..
(vect_slp_v2qi_store_align): .. this.
(vect_slp_v4qi_store): Rename to ..
(vect_slp_v4qi_store_align): .. This.
(vect_slp_v8qi_store): Rename to ..
(vect_slp_v8qi_store_unalign_1): .. This.
(vect_slp_v16qi_store): Rename to ..
(vect_slp_v16qi_store_unalign_1): .. This.
(vect_slp_v2hi_store): Rename to ..
(vect_slp_v2hi_store_unalign): .. This.
(vect_slp_v4hi_store): Rename to ..
(vect_slp_v4hi_store_unalign): .. This.
(vect_slp_v2si_store): Rename to ..
(vect_slp_v2si_store_align): .. This.
(vect_slp_v4si_store): Rename to ..
(vect_slp_v4si_store_unalign): .. This.
(check_vect_slp_aligned_store_usage): Rename to ..
(check_vect_slp_store_usage): .. this and adjust code to make
it an exact pattern match of corresponding testcase.

x86_64: Expand ashrv1ti (and PR target/102986)

This patch was originally intended to implement 128-bit arithmetic right
shifts by constants of vector registers (V1TImode), but while working on
it I discovered my recent ICE-on-valid regression, now known as
PR target/102986.

As diagnosed by Jakub, expanders for shifts are not allowed to fail, and
so any backend that provides a shift optab needs to handle variable amount
shifts as well as constant shifts [even though the middle-end knows how
to synthesize these for vector modes].  This constraint could be relaxed
in the middle-end, but it makes sense to fix this in my erroneous code.

The solution is to change the constraints on the recently added
shift expanders from SImode const_int_operand to QImode general_operand,
matching the TImode expanders' constraints, and then simply check for
!CONST_INT_P at the start of the ix86_expand_v1ti_* functions, converting
the operands from V1TImode to TImode, performing the TImode operation
and converting the result back to V1TImode.

One nice benefit of this strategy is that it allows us to implement
Uros' recent suggestion, that we should be more efficiently converting
between these modes, avoiding the use of memory and using the same idiom
as LLVM or using pextrq/pinsrq where available.  The new helper functions
ix86_expand_v1ti_to_ti and ix86_expand_ti_to_v1ti are sufficient to take
care of this.  Interestingly partial support for this is already present,
but x86_64's generic tuning prefers memory transfers to avoid penalizing
microarchitectures with significant interunit delays.  With these changes
we now generate both pextrq and pinsrq for -mtune=native.

The main body of the patch is to implement arithmetic right shift in
addition to the logical right shift and left shift implemented in the
previous patch.  This expander provides no fewer than 13 different code
sequences, special casing the different constant shifts, including
variants taking advantage of TARGET_AVX2 and TARGET_SSE4_1.  The code
is structured with the faster/shorter sequences at the start, and
the generic implementations at the end.

For the record, the implementations are:

ashr_127: // Shift by 127, 2 operations, 10 bytes
        pshufd  $255, %xmm0, %xmm0
        psrad   $31, %xmm0
        ret

ashr_64: // Shift by 64, 3 operations, 14 bytes
        pshufd  $255, %xmm0, %xmm1
        psrad   $31, %xmm1
        punpckhqdq      %xmm1, %xmm0
        ret

ashr_96: // Shift by 96, 3 operations, 18 bytes
        movdqa  %xmm0, %xmm1
        psrad   $31, %xmm1
        punpckhqdq      %xmm1, %xmm0
        pshufd  $253, %xmm0, %xmm0
        ret

ashr_8: // Shift by 8/16/24/32 on AVX2, 3 operations, 16 bytes
        vpsrad  $8, %xmm0, %xmm1
        vpsrldq $1, %xmm0, %xmm0
        vpblendd        $7, %xmm0, %xmm1, %xmm0
        ret

ashr_8: // Shift by 8/16/24/32 on SSE4.1, 3 operations, 24 bytes
        movdqa  %xmm0, %xmm1
        psrldq  $1, %xmm0
        psrad   $8, %xmm1
        pblendw $63, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        ret

ashr_97: // Shifts by 97..126, 4 operations, 23 bytes
        movdqa  %xmm0, %xmm1
        psrad   $31, %xmm0
        psrad   $1, %xmm1
        punpckhqdq      %xmm0, %xmm1
        pshufd  $253, %xmm1, %xmm0
        ret

ashr_48: // Shifts by 48/80 on SSE4.1, 4 operations, 25 bytes
        movdqa  %xmm0, %xmm1
        pshufd  $255, %xmm0, %xmm0
        psrldq  $6, %xmm1
        psrad   $31, %xmm0
        pblendw $31, %xmm1, %xmm0
        ret

ashr_8: // Shifts by multiples of 8, 5 operations, 28 bytes
        movdqa  %xmm0, %xmm1
        pshufd  $255, %xmm0, %xmm0
        psrad   $31, %xmm0
        psrldq  $1, %xmm1
        pslldq  $15, %xmm0
        por     %xmm1, %xmm0
        ret

ashr_1: // Shifts by 1..31 on AVX2, 6 operations, 30 bytes
        vpsrldq $8, %xmm0, %xmm2
        vpsrad  $1, %xmm0, %xmm1
        vpsllq  $63, %xmm2, %xmm2
        vpsrlq  $1, %xmm0, %xmm0
        vpor    %xmm2, %xmm0, %xmm0
        vpblendd        $7, %xmm0, %xmm1, %xmm0
        ret

ashr_1: // Shifts by 1..15 on SSE4.1, 6 operations, 42 bytes
        movdqa  %xmm0, %xmm2
        movdqa  %xmm0, %xmm1
        psrldq  $8, %xmm2
        psrlq   $1, %xmm0
        psllq   $63, %xmm2
        psrad   $1, %xmm1
        por     %xmm2, %xmm0
        pblendw $63, %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        ret

ashr_1: // Shift by 1, 8 operations, 46 bytes
        movdqa  %xmm0, %xmm1
        movdqa  %xmm0, %xmm2
        psrldq  $8, %xmm2
        psrlq   $63, %xmm1
        psllq   $63, %xmm2
        psrlq   $1, %xmm0
        pshufd  $191, %xmm1, %xmm1
        por     %xmm2, %xmm0
        psllq   $31, %xmm1
        por     %xmm1, %xmm0
        ret

ashr_65: // Shifts by 65..95, 8 operations, 42 bytes
        pshufd  $255, %xmm0, %xmm1
        psrldq  $8, %xmm0
        psrad   $31, %xmm1
        psrlq   $1, %xmm0
        movdqa  %xmm1, %xmm2
        psllq   $63, %xmm1
        pslldq  $8, %xmm2
        por     %xmm2, %xmm1
        por     %xmm1, %xmm0
        ret

ashr_2: // Shifts from 2..63, 9 operations, 47 bytes
        pshufd  $255, %xmm0, %xmm1
        movdqa  %xmm0, %xmm2
        psrad   $31, %xmm1
        psrldq  $8, %xmm2
        psllq   $62, %xmm2
        psrlq   $2, %xmm0
        pslldq  $8, %xmm1
        por     %xmm2, %xmm0
        psllq   $62, %xmm1
        por     %xmm1, %xmm0
        ret
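
As a reference for what the sequences above compute, here is a minimal
scalar sketch in C of a 128-bit arithmetic right shift on two 64-bit halves.
It is not the expander code: u128_halves and ashr128 are hypothetical names
used only for illustration, and the sketch assumes that >> on a negative
int64_t is an arithmetic shift (which GCC documents, but ISO C leaves
implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value held as two 64-bit halves.  */
typedef struct { uint64_t lo; uint64_t hi; } u128_halves;

/* Arithmetic right shift of a 128-bit value by n bits, 0 <= n <= 127.
   Assumes >> on a negative int64_t shifts in copies of the sign bit.  */
static u128_halves ashr128(u128_halves x, unsigned n)
{
  /* All ones if the value is negative, zero otherwise.  This is the
     scalar analogue of the pshufd $255 / psrad $31 pair above.  */
  uint64_t sign = (uint64_t) ((int64_t) x.hi >> 63);
  u128_halves r;

  assert(n < 128);
  if (n == 0)
    r = x;
  else if (n < 64)
    {
      /* Bits shifted out of the high half move into the low half.  */
      r.lo = (x.lo >> n) | (x.hi << (64 - n));
      r.hi = (uint64_t) ((int64_t) x.hi >> n);
    }
  else if (n == 64)
    {
      /* The high half becomes the low half; the rest is sign.  */
      r.lo = x.hi;
      r.hi = sign;
    }
  else
    {
      /* Shifts by 65..127 only keep (part of) the high half.  */
      r.lo = (uint64_t) ((int64_t) x.hi >> (n - 64));
      r.hi = sign;
    }
  return r;
}
```

For a shift by 127 every result bit is a copy of the sign bit, which is why
that case needs only two operations.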

To test these changes there are several new test cases.  sse2-v1ti-shift-2.c
is a compile-test designed to spot/catch PR target/102986 [for all shifts
and rotates by variable amounts], and sse2-v1ti-shift-3.c is an execution
test to confirm shifts/rotates by variable amounts produce the same results
for TImode and V1TImode.  sse2-v1ti-ashiftrt-1.c is a (similar) execution
test to confirm arithmetic right shifts by different constants produce
identical results between TImode and V1TImode.  sse2-v1ti-ashiftrt-[23].c are
duplicates of this file as compilation tests specifying -mavx2 and -msse4.1
respectively to trigger all the paths through the new expander.

2021-11-02  Roger Sayle  <roger@nextmovesoftware.com>
    Jakub Jelinek  <jakub@redhat.com>

gcc/ChangeLog
PR target/102986
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
ix86_expand_ti_to_v1ti): New helper functions.
(ix86_expand_v1ti_shift): Check if the amount operand is an
integer constant, and expand as a TImode shift if it isn't.
(ix86_expand_v1ti_rotate): Check if the amount operand is an
integer constant, and expand as a TImode rotate if it isn't.
(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
right shifts of V1TImode quantities.
* config/i386/i386-protos.h (ix86_expand_v1ti_ashiftrt): Prototype.
* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
to QImode general_operand, and let the helper functions lower
shifts by non-constant operands, as TImode shifts.  Make
conditional on TARGET_64BIT.
(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
Make conditional on TARGET_64BIT.

gcc/testsuite/ChangeLog
PR target/102986
* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.

IBM Z: Fix address of operands will never be NULL warnings

Since a recent enhancement of -Waddress, a couple of warnings are emitted
and turned into errors during bootstrap:

gcc/config/s390/s390.md:12087:25: error: the address of 'operands' will never be NULL [-Werror=address]
12087 |   "TARGET_HTM && operands != NULL
build/gencondmd.c:59:12: note: 'operands' declared here
   59 | extern rtx operands[];
      |            ^~~~~~~~

Fixed by removing those non-null checks.
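
To illustrate why the check is redundant, here is a minimal sketch in C.
The names operands_stub, pointer_is_set, condition_fixed and
check_equivalence are hypothetical stand-ins, not code from s390.md or
gencondmd.c.  The removed form is shown only in a comment so that the
sketch itself compiles cleanly with -Werror=address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the `operands' array seen by the condition.  */
static int operands_stub[4];

/* Comparing a pointer parameter against NULL is a meaningful check.  */
static int pointer_is_set(const int *p)
{
  return p != NULL;
}

/* The removed form was equivalent to
     return target_htm && operands_stub != NULL;
   An array always has a non-null address, so the second operand is
   always true.  The fixed condition simply drops it.  */
static int condition_fixed(int target_htm)
{
  return target_htm != 0;
}

/* Both forms agree for every input, so dropping the check is safe.  */
static void check_equivalence(void)
{
  for (int htm = 0; htm <= 1; htm++)
    assert(condition_fixed(htm) == (htm && pointer_is_set(operands_stub)));
}
```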

gcc/ChangeLog:

* config/s390/s390.md ("*cc_to_int", "tabort", "*tabort_1",
"*tabort_1_plus"): Remove operands non-null check.

openmp: Add testcase for threadprivate random access class iterators

This adds a testcase for random access class iterators.  The diagnostics
can differ between templates and non-templates, because for some
threadprivate vars finish_id_expression replaces them with a call to their
corresponding wrapper, but I don't think that is a big deal, as we reject
it in either case.

2021-11-02 Jakub Jelinek <jakub@redhat.com>

* g++.dg/gomp/loop-8.C: New test.

Daily bump.

libstdc++: Missing constexpr for __gnu_debug::__valid_range etc

The new 25_algorithms/move/constexpr.cc test fails in debug mode,
because the debug assertions use the non-constexpr overloads in
<debug/stl_iterator.h>.

libstdc++-v3/ChangeLog:

* include/debug/stl_iterator.h (__valid_range): Add constexpr
for C++20. Qualify call to avoid ADL.
(__get_distance, __can_advance, __unsafe, __base): Likewise.
* testsuite/25_algorithms/move/constexpr.cc: Also check with
std::reverse_iterator arguments.

libstdc++: Reorder constraints on std::span::span(Range&&) constructor.

In PR libstdc++/103013 Tim Song pointed out that we could reorder the
constraints of this constructor. That's worth doing just to reduce the
work the compiler has to do during overload resolution, even if it isn't
needed to make the code in the PR work.

libstdc++-v3/ChangeLog:

* include/std/span (span(Range&&)): Reorder constraints.

Fix negative integer range for UInteger.

gcc/ChangeLog:

* opt-functions.awk: Add new sanity checking.
* optc-gen.awk: Add new argument to integer_range_info.
* params.opt: Update 2 params which have negative IntegerRange.

Fix test-suite pattern scanning.

Fixes:

UNRESOLVED: g++.dg/ipa/modref-1.C scan-ipa-dump local-pure-const1 "Function found to be const: {anonymous}::B::genB"
UNRESOLVED: g++.dg/ipa/modref-1.C scan-ipa-dump modref1 "Retslot flags: direct noescape nodirectescape not_returned noread"

gcc/testsuite/ChangeLog:

* g++.dg/ipa/modref-1.C: Fix test-suite pattern scanning.

contrib: add unicode/utf8-dump.py

This script may be useful when debugging issues relating to Unicode
encoding (e.g. when investigating source files with bidirectional control
characters).

It dumps a UTF-8 file as a list of numbered lines (mimicking GCC's
diagnostic output format), interleaved with lines per character showing
the Unicode codepoints, the UTF-8 encoding bytes, the name of the
character, and, where printable, the characters themselves.
The lines are printed in logical order, which may help the reader to grok
the relationship between visual and logical ordering in bi-di files.

For example:

$ cat test.c
int གྷ;
const char *אבג = "ALEF-BET-GIMEL";

$ ./contrib/unicode/utf8-dump.py test.c
   1 | int གྷ;
     |   U+0069            0x69                     LATIN SMALL LETTER I i
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0F43  0xe0 0xbd 0x83                       TIBETAN LETTER GHA གྷ
     |   U+003B            0x3b                                SEMICOLON ;
     |   U+000A            0x0a                           LINE FEED (LF) (control character)
   2 | const char *אבג = "ALEF-BET-GIMEL";
     |   U+0063            0x63                     LATIN SMALL LETTER C c
     |   U+006F            0x6f                     LATIN SMALL LETTER O o
     |   U+006E            0x6e                     LATIN SMALL LETTER N n
     |   U+0073            0x73                     LATIN SMALL LETTER S s
     |   U+0074            0x74                     LATIN SMALL LETTER T t
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0063            0x63                     LATIN SMALL LETTER C c
     |   U+0068            0x68                     LATIN SMALL LETTER H h
     |   U+0061            0x61                     LATIN SMALL LETTER A a
     |   U+0072            0x72                     LATIN SMALL LETTER R r
     |   U+0020            0x20                                    SPACE (separator)
     |   U+002A            0x2a                                 ASTERISK *
     |   U+05D0       0xd7 0x90                       HEBREW LETTER ALEF א
     |   U+05D1       0xd7 0x91                        HEBREW LETTER BET ב
     |   U+05D2       0xd7 0x92                      HEBREW LETTER GIMEL ג
     |   U+0020            0x20                                    SPACE (separator)
     |   U+003D            0x3d                              EQUALS SIGN =
     |   U+0020            0x20                                    SPACE (separator)
     |   U+0022            0x22                           QUOTATION MARK "
     |   U+0041            0x41                   LATIN CAPITAL LETTER A A
     |   U+004C            0x4c                   LATIN CAPITAL LETTER L L
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+0046            0x46                   LATIN CAPITAL LETTER F F
     |   U+002D            0x2d                             HYPHEN-MINUS -
     |   U+0042            0x42                   LATIN CAPITAL LETTER B B
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+0054            0x54                   LATIN CAPITAL LETTER T T
     |   U+002D            0x2d                             HYPHEN-MINUS -
     |   U+0047            0x47                   LATIN CAPITAL LETTER G G
     |   U+0049            0x49                   LATIN CAPITAL LETTER I I
     |   U+004D            0x4d                   LATIN CAPITAL LETTER M M
     |   U+0045            0x45                   LATIN CAPITAL LETTER E E
     |   U+004C            0x4c                   LATIN CAPITAL LETTER L L
     |   U+0022            0x22                           QUOTATION MARK "
     |   U+003B            0x3b                                SEMICOLON ;
     |   U+000A            0x0a                           LINE FEED (LF) (control character)

Tested with Python 3.8

contrib/ChangeLog:
* unicode/utf8-dump.py: New file.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

PR 102281 (-ftrivial-auto-var-init=zero causes ice)

Do not add call to __builtin_clear_padding when a variable is a gimple
register or it might not have padding.
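
For readers unfamiliar with padding, here is a minimal sketch in C of what
clearing padding means for one fixed type.  It is only an illustration:
struct sample, clear_padding_sample and padding_bytes_sample are
hypothetical names, and the sketch does not show what gimplify.c emits.
A scalar such as a plain int has no padding bytes, so there would be
nothing to clear for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example type: on common ABIs there are padding bytes
   between `c' and `i', because `i' must be aligned.  */
struct sample { char c; int i; };

/* Zero only the padding bytes of *s, leaving the members untouched.  */
static void clear_padding_sample(struct sample *s)
{
  unsigned char *bytes = (unsigned char *) s;
  size_t gap_start = offsetof(struct sample, c) + sizeof s->c;
  size_t gap_end = offsetof(struct sample, i);
  size_t tail_start = gap_end + sizeof s->i;

  assert(s != NULL);
  /* Padding between the two members.  */
  memset(bytes + gap_start, 0, gap_end - gap_start);
  /* Trailing padding after the last member, if any.  */
  memset(bytes + tail_start, 0, sizeof *s - tail_start);
}

/* Number of bytes in struct sample that belong to no member.  */
static size_t padding_bytes_sample(void)
{
  return sizeof(struct sample) - sizeof(char) - sizeof(int);
}
```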

gcc/ChangeLog:

2021-11-01 qing zhao <qing.zhao@oracle.com>

* gimplify.c (gimplify_decl_expr): Do not add call to
__builtin_clear_padding when a variable is a gimple register
or it might not have padding.
(gimplify_init_constructor): Likewise.

gcc/testsuite/ChangeLog:

2021-11-01 qing zhao <qing.zhao@oracle.com>

* c-c++-common/pr102281.c: New test.
* gcc.target/i386/auto-init-2.c: Adjust testing case.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-6.c: Likewise.
* gcc.target/aarch64/auto-init-6.c: Likewise.

AArch64: Add better costing for vector constants and operations

This patch adds extended costing to cost the creation of constants and the
manipulation of constants.  The default values provided are based on
architectural expectations and each cost model can be individually tweaked as
needed.

The changes in this patch cover:

* Construction of PARALLEL or CONST_VECTOR:
  Adds better costing for vector of constants which is based on the constant
  being created and the instruction that can be used to create it, e.g. a movi
  is cheaper than a literal load.
* Construction of a vector through a vec_dup.

gcc/ChangeLog:

* config/arm/aarch-common-protos.h (struct vector_cost_table): Add
movi, dup and extract costing fields.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs, thunderx2t99_extra_costs,
thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use
them.
* config/arm/aarch-cost-tables.h (generic_extra_costs,
cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
exynosm1_extra_costs, xgene1_extra_costs): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>): Add r->w dup.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Add extra costs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-cse-codegen.c: New test.

middle-end: Teach CSE to be able to do vector extracts.

This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
  uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
  uint64_t res = a | arr[0];
  uint64x2_t val = vld1q_u64 (arr);
  *rt = vaddq_u64 (val, b);
  return res;
}

The actual behavior is inconsequential; however, notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
        adrp    x2, .LC0
        sub     sp, sp, #16
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x2, 16502
        movk    x2, 0x1023, lsl 16
        movk    x2, 0x4308, lsl 32
        add     v1.2d, v1.2d, v0.2d
        movk    x2, 0x942, lsl 48
        orr     x0, x0, x2
        str     q1, [x1]
        add     sp, sp, 16
        ret
.LC0:
        .xword  667169396713799798
        .xword  667169396713799798

Essentially we materialize the same constant twice.  This is because the
front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

  <bb 2> :
  arr[0] = 667169396713799798;
  arr[1] = 667169396713799798;
  res_7 = a_6(D) | 667169396713799798;
  _16 = __builtin_aarch64_ld1v2di (&arr);
  _17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
  _11 = b_10(D) + _17;
  *rt_12(D) = _11;
  arr ={v} {CLOBBER};
  return res_7;

Which makes sense for further optimization.  However, come expand time, if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
        (const_vector:V2DI [
                (const_int 667169396713799798 [0x942430810234076]) repeated x2
            ])) "cse.c":7:12 -1
     (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
        (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
     (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
        (ior:DI (reg/v:DI 96 [ a ])
            (reg:DI 103))) "cse.c":8:12 -1
     (nil))

And since it's out of the immediate range of the scalar instruction used,
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this, this patch extends CSE to be able to generate an extract for a
constant from another vector, or to make a vector for a constant by duplicating
another constant.
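
At the source level, the effect is roughly as if the scalar were read back
out of the vector.  The following is a minimal sketch in C using the
GCC/Clang generic vector extension rather than arm_neon.h, so that it is not
tied to AArch64.  The names v2di, K, test_rematerialized and test_extracted
are hypothetical, and the real transformation happens on RTL in cse.c, not
on source code.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang generic vector extension: two 64-bit lanes.  */
typedef uint64_t v2di __attribute__ ((vector_size (16)));

/* The constant from the example above.  */
#define K 0x0942430810234076ULL

/* Shape of the original code: the constant appears once as a vector
   and once more as a separate scalar.  */
static uint64_t test_rematerialized(uint64_t a, v2di b, v2di *rt)
{
  v2di val = { K, K };

  assert(rt != 0);
  *rt = val + b;
  return a | K;
}

/* Shape after the transformation: the scalar is extracted from a lane
   of the vector that already holds the constant.  */
static uint64_t test_extracted(uint64_t a, v2di b, v2di *rt)
{
  v2di val = { K, K };

  assert(rt != 0);
  *rt = val + b;
  return a | val[0];
}
```

Both functions compute the same results; only the way the scalar constant
is obtained differs.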

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.

I initially also investigated doing this in PRE, but PRE requires at least 2 BB
to work and does not currently have any way to remove redundancies within a
single BB and it did not look easy to support.

gcc/ChangeLog:

* cse.c (add_to_set): New.
(find_sets_in_insn): Register constants in sets.
(canonicalize_insn): Use auto_vec instead.
(cse_insn): Try materializing using vec_dup.
* rtl.h (simplify_context::simplify_gen_vec_select,
simplify_gen_vec_select): New.
* simplify-rtx.c (simplify_context::simplify_gen_vec_select): New.

testsuite: fix failing complex add testcases PR103000

Some targets have overridden the default unroll factor and so do not have enough
data to succeed for SLP vectorization if loop vect is turned off.

To fix this just always unroll in these testcases.

gcc/testsuite/ChangeLog:

PR testsuite/103000
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
Force unroll.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.

diagnostics: escape non-ASCII source bytes for certain diagnostics

This patch adds support to GCC's diagnostic subsystem for escaping certain
bytes and Unicode characters when quoting source code.

Specifically, this patch adds a new flag rich_location::m_escape_on_output
which is a hint from a diagnostic that non-ASCII bytes in the pertinent
lines of the user's source code should be escaped when printed.

The patch sets this for the following diagnostics:
- when complaining about stray bytes in the program (when these
  are non-printable)
- when complaining about "null character(s) ignored"
- for -Wnormalized= (and generate source ranges for such warnings)

The escaping is controlled by a new option:
  -fdiagnostics-escape-format=[unicode|bytes]

For example, consider a diagnostic involving a source line containing the
string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
(a stray UTF-8 trailing byte), followed by the string "after", where the
diagnostic highlights the U+03C0 character.

By default, this line will be printed verbatim to the user when
reporting a diagnostic at it, as:

  beforeπXafter
        ^

(using X for the stray byte to avoid putting invalid UTF-8 in this
commit message)

If the diagnostic sets the "escape" flag, it will be printed as:

  before<U+03C0><BF>after
        ^~~~~~~~

with -fdiagnostics-escape-format=unicode (the default), or as:

  before<CF><80><BF>after
        ^~~~~~~~

if the user supplies -fdiagnostics-escape-format=bytes.

This only affects how the source is printed; it does not affect
how column numbers are printed (as per -fdiagnostics-column-unit=
and -fdiagnostics-column-origin=).
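
The two escape formats described above can be sketched in C as follows.
This is a simplified illustration and not the code in
diagnostic-show-locus.c: decode_utf8, append, escape_source and the
escape_format enum are hypothetical names, the decoder does not reject
overlong forms or surrogates, and ASCII bytes are always copied verbatim.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s.  Returns the number of bytes used and
   stores the code point in *cp, or returns 0 for a malformed start.  */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t n;

  if (len == 0)
    return 0;
  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
  else return 0;  /* Stray trailing byte or invalid lead byte.  */
  if (len < n)
    return 0;
  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return n;
}

enum escape_format { ESCAPE_UNICODE, ESCAPE_BYTES };

/* Append piece to dst if it fits; returns 0 when it does not.  */
static int append(char *dst, size_t cap, size_t *used, const char *piece)
{
  size_t n = strlen(piece);

  if (*used + n >= cap)
    return 0;
  memcpy(dst + *used, piece, n + 1);
  *used += n;
  return 1;
}

/* Write an escaped copy of src into dst (capacity cap).  The output is
   always NUL-terminated and is truncated if dst is too small.  */
static void escape_source(const char *src, enum escape_format fmt,
                          char *dst, size_t cap)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t len = strlen(src), used = 0;

  assert(cap > 0);
  dst[0] = '\0';
  while (len > 0)
    {
      uint32_t cp = 0;
      size_t n = decode_utf8(s, len, &cp);
      char piece[16];

      if (n == 1 || (n > 1 && fmt == ESCAPE_UNICODE))
        {
          /* ASCII is copied as-is; other characters become <U+XXXX>.  */
          if (n == 1)
            snprintf(piece, sizeof piece, "%c", (int) cp);
          else
            snprintf(piece, sizeof piece, "<U+%04X>", (unsigned) cp);
          if (!append(dst, cap, &used, piece))
            return;
        }
      else
        {
          /* Bytes format, or a malformed byte: escape byte by byte.  */
          if (n == 0)
            n = 1;
          for (size_t i = 0; i < n; i++)
            {
              snprintf(piece, sizeof piece, "<%02X>", (unsigned) s[i]);
              if (!append(dst, cap, &used, piece))
                return;
            }
        }
      s += n;
      len -= n;
    }
}
```

Note that a stray trailing byte such as 0xBF is escaped as a byte in both
formats, since it does not correspond to any Unicode character.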

gcc/c-family/ChangeLog:
* c-lex.c (c_lex_with_flags): When complaining about non-printable
CPP_OTHER tokens, set the "escape on output" flag.

gcc/ChangeLog:
* common.opt (fdiagnostics-escape-format=): New.
(diagnostics_escape_format): New enum.
(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
* diagnostic-format-json.cc (json_end_diagnostic): Add
"escape-source" attribute.
* diagnostic-show-locus.c
(exploc_with_display_col::exploc_with_display_col): Replace
"tabstop" param with a cpp_char_column_policy and add an "aspect"
param.  Use these to compute m_display_col accordingly.
(struct char_display_policy): New struct.
(layout::m_policy): New field.
(layout::m_escape_on_output): New field.
(def_policy): New function.
(make_range): Update for changes to exploc_with_display_col ctor.
(default_print_decoded_ch): New.
(width_per_escaped_byte): New.
(escape_as_bytes_width): New.
(escape_as_bytes_print): New.
(escape_as_unicode_width): New.
(escape_as_unicode_print): New.
(make_policy): New.
(layout::layout): Initialize new fields.  Update m_exploc ctor
call for above change to ctor.
(layout::maybe_add_location_range): Update for changes to
exploc_with_display_col ctor.
(layout::calculate_x_offset_display): Update for change to
cpp_display_width.
(layout::print_source_line): Pass policy
to cpp_display_width_computation. Capture cpp_decoded_char when
calling process_next_codepoint.  Move printing of source code to
m_policy.m_print_cb.
(line_label::line_label): Pass in policy rather than context.
(layout::print_any_labels): Update for change to line_label ctor.
(get_affected_range): Pass in policy rather than context, updating
calls to location_compute_display_column accordingly.
(get_printed_columns): Likewise, also for cpp_display_width.
(correction::correction): Pass in policy rather than tabstop.
(correction::compute_display_cols): Pass m_policy rather than
m_tabstop to cpp_display_width.
(correction::m_tabstop): Replace with...
(correction::m_policy): ...this.
(line_corrections::line_corrections): Pass in policy rather than
context.
(line_corrections::m_context): Replace with...
(line_corrections::m_policy): ...this.
(line_corrections::add_hint): Update to use m_policy rather than
m_context.
(line_corrections::add_hint): Likewise.
(layout::print_trailing_fixits): Likewise.
(selftest::test_display_widths): New.
(selftest::test_layout_x_offset_display_utf8): Update to use
policy rather than tabstop.
(selftest::test_one_liner_labels_utf8): Add test of escaping
source lines.
(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
use policy rather than tabstop.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): New.
(selftest::test_escaping_bytes_2): New.
(selftest::diagnostic_show_locus_c_tests): Call the new tests.
* diagnostic.c (diagnostic_initialize): Initialize
context->escape_format.
(convert_column_unit): Update to use default character width policy.
(selftest::test_diagnostic_get_location_text): Likewise.
* diagnostic.h (enum diagnostics_escape_format): New enum.
(diagnostic_context::escape_format): New field.
* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
(-fdiagnostics-format=): Add "escape-source" attribute to examples
of JSON output, and document it.
* input.c (location_compute_display_column): Pass in "policy"
rather than "tabstop", passing to
cpp_byte_column_to_display_column.
(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
* input.h (class cpp_char_column_policy): New forward decl.
(location_compute_display_column): Pass in "policy" rather than
"tabstop".
* opts.c (common_handle_option): Handle
OPT_fdiagnostics_escape_format_.
* selftest.c (temp_source_file::temp_source_file): New ctor
overload taking a size_t.
* selftest.h (temp_source_file::temp_source_file): Likewise.

gcc/testsuite/ChangeLog:
* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
"escape-source" attribute.
* c-c++-common/diagnostic-format-json-2.c: Likewise.
* c-c++-common/diagnostic-format-json-3.c: Likewise.
* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
* c-c++-common/diagnostic-format-json-5.c: Likewise.
* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
* gcc.dg/encoding-issues-bytes.c: New test.
* gcc.dg/encoding-issues-unicode.c: New test.
* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
"escape-source" attribute.
* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
* gfortran.dg/diagnostic-format-json-3.F90: Likewise.

libcpp/ChangeLog:
* charset.c (convert_escape): Use encoding_rich_location when
complaining about nonprintable unknown escape sequences.
(cpp_display_width_computation::cpp_display_width_computation):
Pass in policy rather than tabstop.
(cpp_display_width_computation::process_next_codepoint): Add "out"
param and populate *out if non-NULL.
(cpp_display_width_computation::advance_display_cols): Pass NULL
to process_next_codepoint.
(cpp_byte_column_to_display_column): Pass in policy rather than
tabstop.  Pass NULL to process_next_codepoint.
(cpp_display_column_to_byte_column): Pass in policy rather than
tabstop.
* errors.c (cpp_diagnostic_get_current_location): New function,
splitting out the logic from...
(cpp_diagnostic): ...here.
(cpp_warning_at): New function.
(cpp_pedwarning_at): New function.
* include/cpplib.h (cpp_warning_at): New decl for rich_location.
(cpp_pedwarning_at): Likewise.
(struct cpp_decoded_char): New.
(struct cpp_char_column_policy): New.
(cpp_display_width_computation::cpp_display_width_computation):
Replace "tabstop" param with "policy".
(cpp_display_width_computation::process_next_codepoint): Add "out"
param.
(cpp_display_width_computation::m_tabstop): Replace with...
(cpp_display_width_computation::m_policy): ...this.
(cpp_byte_column_to_display_column): Replace "tabstop" param with
"policy".
(cpp_display_width): Likewise.
(cpp_display_column_to_byte_column): Likewise.
* include/line-map.h (rich_location::escape_on_output_p): New.
(rich_location::set_escape_on_output): New.
(rich_location::m_escape_on_output): New.
* internal.h (cpp_diagnostic_get_current_location): New decl.
(class encoding_rich_location): New.
* lex.c (skip_whitespace): Use encoding_rich_location when
complaining about null characters.
(warn_about_normalization): Generate a source range when
complaining about improperly normalized tokens, rather than just a
point, and use encoding_rich_location so that the source code
is escaped on printing.
* line-map.c (rich_location::rich_location): Initialize
m_escape_on_output.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Fix range access for empty std::valarray [PR103022]

The std::begin and std::end overloads for std::valarray are defined in
terms of std::addressof(v[0]) which is undefined for an empty valarray.

libstdc++-v3/ChangeLog:

PR libstdc++/103022
* include/std/valarray (begin, end): Do not dereference an empty
valarray. Add noexcept and [[nodiscard]].
* testsuite/26_numerics/valarray/range_access.cc: Check empty
valarray. Check iterator properties. Run as well as compiling.
* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
* testsuite/26_numerics/valarray/103022.cc: New test.

Add debug counters to back threader.

Chasing down stage3 miscomparisons is never fun, and having no way to
distinguish between jump threads registered by a particular
pass makes it even harder.  This patch adds debug counters for the individual
back threading passes. I've left the ethread pass alone, as that one is
usually benign, but we could easily add it if needed.

The fact that we can only pass one boolean argument to the passes
infrastructure has us do all sorts of gymnastics to differentiate
between the various back threading passes.

Tested on x86-64 Linux.

gcc/ChangeLog:

* dbgcnt.def: Add debug counter for back_thread[12] and
back_threadfull[12].
* passes.def: Pass "first" argument to each back threading pass.
* tree-ssa-threadbackward.c (back_threader::back_threader): Add
first argument.
(back_threader::debug_counter): New.
(back_threader::maybe_register_path): Call debug_counter.

Move statics to threader pass class.

This patch moves all the static functions into the pass class, and
cleans up things a little. The goal is to shuffle things around such
that we can add debug counters that depend on different threading
passes, but it's a clean-up in its own right.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (BT_NONE): New.
(BT_SPEED): New.
(BT_RESOLVE): New.
(back_threader::back_threader): Add flags.
Move loop initialization here.
(back_threader::~back_threader): New.
(back_threader::find_taken_edge_switch): Change solver and ranger
to pointers.
(back_threader::find_taken_edge_cond): Same.
(back_threader::find_paths_to_names): Same.
(back_threader::find_paths): Same.
(back_threader::dump): Same.
(try_thread_blocks): Merge into thread_blocks.
(back_threader::thread_blocks): New.
(do_early_thread_jumps): Merge into thread_blocks.
(do_thread_jumps): Merge into thread_blocks.
(back_threader::thread_through_all_blocks): Remove.

Don't register nonsensical relations.

gcc/
PR tree-optimization/103003
* value-relation.cc (dom_oracle::register_relation): If the 2
ssa names are the same, don't register any relation.

gcc/testsuite/
* gcc.dg/pr103003.c: New.

aarch64: Fix redundant check in aut insn generation

During the generation of the aarch64 epilogue (aarch64_expand_epilogue),
the value of crtl->calls_eh_return does not need to be checked again.
This value has been checked during aarch64_return_address_signing_enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_expand_epilogue): Remove
redundant check for calls_eh_return.
* config/aarch64/aarch64.md (*do_return): Likewise.

Signed-off-by: Dan Li <ashimida@linux.alibaba.com>

Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

gcc/ChangeLog:

2021-11-01 Xionghu Luo <luoxhu@linux.ibm.com>

* cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
duplicate_loop_to_header_edge to
duplicate_loop_body_to_header_edge.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfghooks.h (struct cfg_hooks): Likewise.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
(clone_loop_to_header_edge): Likewise.
* cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
* cfgrtl.c (struct cfg_hooks): Likewise.
* doc/loop.texi: Likewise.
* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
(unroll_loop_runtime_iterations): Likewise.
(unroll_loop_stupid): Likewise.
(apply_opt_in_copies): Likewise.
* tree-cfg.c (struct cfg_hooks): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
(try_peel_loop): Likewise.
* tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
(gimple_duplicate_loop_body_to_header_edge): Likewise.
(tree_transform_and_unroll_loop): Likewise.
* tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
Likewise.

Refactor loop_version

loop_version currently does lv_adjust_loop_entry_edge
before it loopifies the copy inserted on the header.  This patch moves
the condition generation later, and thus we have four pieces that help
in understanding how the adjustment works:
1) Duplicate the loop on the entry edge.
2) Loopify the duplicated new loop.
3) Adjust the CFG to insert a condition branching to either loop
with lv_adjust_loop_entry_edge.
4) Extract the scale_loop_frequencies bits from loopify.

Also remove some pieces of code that seem obviously useless:
- redirect_all_edges, since it is false and loopify is only called once.
- extract_cond_bb_edges and lv_flush_pending_stmts (false_edge), as the
edge is not actually redirected.

gcc/ChangeLog:

2021-11-01  Xionghu Luo  <luoxhu@linux.ibm.com>

* cfgloopmanip.c (loop_version): Refactor loopify to
loop_version.  Move condition generation after loopify.
(loopify): Delete.
* cfgloopmanip.h (loopify): Delete.

libcody: add mostlyclean Makefile target

PR other/102657

libcody/ChangeLog:

* Makefile.in: Add mostlyclean Makefile target.

Daily bump.

Fortran: Revert explicit memcpy in gfc_get_typebound_proc

This reverts the hunk to gfc_get_typebound_proc from
7883a7f07c1ad9c8aaccc5bbd96e0ae1fa230c89

gcc/fortran/ChangeLog:

* symbol.c (gfc_get_typebound_proc): Revert memcpy.

Improve handling of return slot in ipa-pure-const and modref.

While preparing a testcase for return slot tracking I noticed that both
ipa-pure-const and modref treat return slot writes as non-local, which prevents
detecting functions as pure or not modifying global state.  Fixed by making
points_to_local_or_readonly_memory_p special-case the return slot.  This is a
bit of a side case, but presently at all uses of
points_to_local_or_readonly_memory_p we want to handle the return slot this way.

I also noticed that we handle gimple copy unnecessarily pessimistically.  This
does not make a difference right now since we do not track non-scalars, but
I fixed it anyway.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* ipa-fnsummary.c: Include tree-dfa.h.
(points_to_local_or_readonly_memory_p): Return true on return
slot writes.
* ipa-modref.c (analyze_ssa_name_flags): Fix handling of copy
statement.

gcc/testsuite/ChangeLog:

* g++.dg/ipa/modref-1.C: New test.
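
As a rough illustration of the kind of function this concerns (a sketch only; Big, make_big and last_element are hypothetical names and this is not the contents of modref-1.C), consider a function whose only stores go to the object it returns by value. Whether the result is really built in a caller-provided return slot depends on the ABI and on the optimizations applied.

```cpp
#include <cassert>

// Hypothetical aggregate, large enough that common ABIs return it in
// memory rather than in registers.
struct Big {
  int data[64];
};

// The only stores in this function go to the returned object.  When the
// result is constructed directly in the return slot, those stores do not
// touch any global state.
Big make_big(int seed) {
  Big b;
  for (int i = 0; i < 64; ++i)
    b.data[i] = seed + i;
  return b;
}

// A caller that provides the storage for the result.
int last_element(int seed) {
  Big b = make_big(seed);
  return b.data[63];
}
```

Here last_element(1) evaluates to 64; the point of the sketch is only that make_big writes nothing except its own result.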

d: Fix regressing test failures on ix86-solaris2.11

The _Unwind_Exception struct had its alignment adjusted to 16-bytes,
however malloc() on Solaris X86 is not guaranteed to allocate memory
aligned to 16-bytes as well.

PR d/102837

libphobos/ChangeLog:

* libdruntime/gcc/deh.d (ExceptionHeader.free): Use memset to reset
contents of internal EH storage.

d: Fix pr96435.d failing on SPARC and HPPA

The value used to initialize the integer field in the union didn't
account for BigEndian targets running this code.

PR d/102959

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr96435.d: Adjust for BigEndian.
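
The underlying effect can be sketched in C++ (the test itself is written in D and its exact adjustment is not shown in this log; lowest_address_byte and is_little_endian are hypothetical helpers). Reading the first byte of an integer gives a different answer depending on byte order, unless the value is chosen so that both ends are set.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the byte stored at the lowest address of a 32-bit integer.
// memcpy is used rather than reading an inactive union member, which
// C++ does not define.
unsigned char lowest_address_byte(std::uint32_t value) {
  unsigned char byte;
  std::memcpy(&byte, &value, 1);
  return byte;
}

// True when the least significant byte is stored first.
bool is_little_endian() {
  return lowest_address_byte(1u) == 1;
}
```

A value such as 1 yields 1 on a little-endian target and 0 on a big-endian one, whereas 0x01000001 yields 1 on both.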

Fortran: Silence -Wmaybe-uninitialized warning

gcc/fortran/ChangeLog:

* resolve.c (resolve_fl_procedure): Initialize
allocatable_or_pointer.

Daily bump.

OpenMP: Add strictly nested API call check [PR102972]

The teams construct only permits omp_get_num_teams and omp_get_team_num
as API calls in strictly nested regions - check for it.

Additionally, for Fortran, using DECL_NAME does not show the mangled
name; hence, DECL_ASSEMBLER_NAME had to be used instead.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

PR middle-end/102972
gcc/ChangeLog:

* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
internal Fortran name; new permit_num_teams arg to permit
omp_get_num_teams and omp_get_team_num.
(scan_omp_1_stmt): Update call to it, add missing call for
reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
routine test.
* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
* c-c++-common/gomp/teams-3.c: New test.
* gfortran.dg/gomp/teams-3.f90: New test.
* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
parallel construct.
* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
* testsuite/libgomp.c/target-3.c: Likewise.
* testsuite/libgomp.c/target-5.c: Likewise.
* testsuite/libgomp.c/target-6.c: Likewise.
* testsuite/libgomp.c/target-teams-1.c: Likewise.
* testsuite/libgomp.c/teams-1.c: Likewise.
* testsuite/libgomp.c/thread-limit-2.c: Likewise.
* testsuite/libgomp.c/thread-limit-3.c: Likewise.
* testsuite/libgomp.c/thread-limit-4.c: Likewise.
* testsuite/libgomp.c/thread-limit-5.c: Likewise.
* testsuite/libgomp.fortran/icv-3.f90: Likewise.
* testsuite/libgomp.fortran/icv-4.f90: Likewise.
* testsuite/libgomp.fortran/teams1.f90: Likewise.

Fortran: remove descriptions of SHORT and LONG in intrinsic.texi

2021-10-30 Manfred Schwarb <manfred99@gmx.ch>

gcc/fortran/ChangeLog:

* intrinsic.texi: Remove entries for SHORT and LONG intrinsics.

Fortran: non-standard intrinsics SHORT and LONG have been removed

2021-10-30 Manfred Schwarb <manfred99@gmx.ch>

gcc/fortran/ChangeLog:

* check.c (gfc_check_intconv): Change error message.

gcc/testsuite/ChangeLog:

* gfortran.dg/intrinsic_short-long.f90: New test.

Fortran: fix descriptions in intrinsic.texi

2021-10-30 Manfred Schwarb <manfred99@gmx.ch>

gcc/fortran/ChangeLog:

* intrinsic.texi (REAL): Fix entries in Specific names table.

Fortran: improve formatting of tables in intrinsic.texi

2021-10-30 Manfred Schwarb <manfred99@gmx.ch>

gcc/fortran/ChangeLog:

* intrinsic.texi: Adjust @columnfractions commands to improve
appearance for narrow 80 character terminals.

Fix memory leak of gsymbol

We did not free global symbols. For a simplified abstract_type_3.f90
valgrind reports:

96 bytes in 1 blocks are still reachable in loss record 461 of 602
   at 0x48377D5: calloc (vg_replace_malloc.c:711)
   by 0x21257C3: xcalloc (xmalloc.c:162)
   by 0x98611B: gfc_get_gsymbol(char const*) (symbol.c:4341)
   by 0x932C58: parse_module() (parse.c:5912)
   by 0x9336F8: gfc_parse_file() (parse.c:6236)
   by 0x991449: gfc_be_parse_file() (f95-lang.c:204)
   by 0x11D8EDE: compile_file() (toplev.c:455)
   by 0x11DB9C3: do_compile() (toplev.c:2170)
   by 0x11DBCAF: toplev::main(int, char**) (toplev.c:2305)
   by 0x2045D37: main (main.c:39)

This patch reduces this to

LEAK SUMMARY:
    definitely lost: 344 bytes in 1 blocks
    indirectly lost: 3,024 bytes in 4 blocks
      possibly lost: 0 bytes in 0 blocks
-   still reachable: 1,576,174 bytes in 2,277 blocks
+   still reachable: 1,576,078 bytes in 2,276 blocks
         suppressed: 0 bytes in 0 blocks

gcc/fortran/ChangeLog:

2018-10-21  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>

* parse.c (clean_up_modules): Free gsym.

Fortran: update gfortran.texi list of frequent reporters

gcc/fortran/ChangeLog:

* gfortran.texi (bug reports): credit Gerhard Steinmetz for
numerous bug reports.

Fortran: generate regular error on invalid conversions of CASE expressions

gcc/fortran/ChangeLog:

PR fortran/99853
* resolve.c (resolve_select): Generate regular gfc_error on
invalid conversions instead of a gfc_internal_error.

gcc/testsuite/ChangeLog:

PR fortran/99853
* gfortran.dg/pr99853.f90: New test.

Implied compares in Ada Hardened Conditionals documentation

Improve the wording on optimizations that prevent compare hardening,
so as to also cover cases in which explicit compares get combined into
operations with implied compares.

for gcc/ada/ChangeLog

* doc/gnat_rm/security_hardening_features.rst: Mention
optimization to operations with implied compares.

openmp: Diagnose threadprivate OpenMP loop iterators

We weren't diagnosing the restriction
  "The loop iteration variable may not appear in a threadprivate directive."
which in 5.0 was listed only among the Worksharing-Loop restrictions, but in
5.1 is among the Canonical Loop Nest Form restrictions.

This patch diagnoses those.

2021-10-30 Jakub Jelinek <jakub@redhat.com>

* gimplify.c (gimplify_omp_for): Diagnose threadprivate iterators.

* c-c++-common/gomp/loop-10.c: New test.

Daily bump.

testsuite: Don't expect a complex FMA

The sharing of the COMPLEX_MUL node makes it more
efficient not to generate both a MUL and an FMA
in this node.

Because the shape for a normal FMA is not different,
the FMA is no longer detected here, which results in
better codegen, so update the testcase.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr99149.cc: Update case.

libcpp: Fix _Pragma expansion [PR102409]

Both #pragma and _Pragma ended up as CPP_PRAGMA. Presumably since
r131819 (2008, GCC 4.3) for PR34692, pragmas are not expanded in
macro arguments but are output as-is beforehand. From the old bug report,
that was to fix usage like
  FOO (
    #pragma GCC diagnostic
  )
However, that change also affected _Pragma such that
  BAR (
    "1";
    _Pragma("omp ..."); )
yielded
  #pragma omp ...
followed by what BAR expanded to, possibly including '"1";'.

This commit adds a flag, PRAGMA_OP, to tokens to make the two
distinguishable, and again includes _Pragma in the expanded arguments.

libcpp/ChangeLog:

PR c++/102409
* directives.c (destringize_and_run): Add PRAGMA_OP to the
CPP_PRAGMA token's flags to mark it as coming from _Pragma.
* include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
* macro.c (collect_args): Only handle CPP_PRAGMA special if PRAGMA_OP
is set.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pragma-1.c: New test.
* c-c++-common/gomp/pragma-2.c: New test.

assert_streq: add newlines to failure message

Adding newlines so that the two strings line up makes string equality
failures considerably easier to read.

gcc/ChangeLog:
* selftest.c (assert_streq): Add newlines when emitting non-equal
non-NULL strings.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

gcc/Makefile.in: fix bug in gengtype link rule

gcc/ChangeLog:
* Makefile.in: Fix syntax for reference to LIBDEPS in
gengtype link rule.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Fix typo in std::stack test

libstdc++-v3/ChangeLog:

* testsuite/23_containers/stack/deduction.cc: Fix typo.

Fortran: Free type-bound procedure structs

Compiling gfortran.dg/typebound_proc_31.f90 leaked the type-bound
structs:

56 bytes in 1 blocks are definitely lost.
  at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
  by 0x151EA90: xcalloc (xmalloc.c:162)
  by 0x8E3E4F: gfc_get_typebound_proc(gfc_typebound_proc*) (symbol.c:4945)
  by 0x84C095: match_procedure_in_type (decl.c:10486)
  by 0x84C095: gfc_match_procedure() (decl.c:6696)
...

gcc/fortran/ChangeLog:

2017-12-06  Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>

* symbol.c (free_tb_tree): Free type-bound procedure struct.
(gfc_get_typebound_proc): Use explicit memcpy for clarity.

doc: Bump required minimum DejaGnu version to 1.5.3

Bump required DejaGnu version to 1.5.3 (or later).
Ok for trunk?

gcc/ChangeLog:

* doc/install.texi: Bump required minimum DejaGnu version.

path oracle: Do not look back to the root oracle for killing defs.

Since registering a kill means removing all references to it from the
path oracle list, make sure we don't look back to the root oracle
either.

Tested on x86-64 Linux.

Co-authored-by: Andrew MacLeod <amacleod@redhat.com>
gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Add a
self-equivalence so we don't look to the root oracle.

Remove VRP threader passes in exchange for better threading pre-VRP.

This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The numbers are as follows:

prev: # threads in backward + vrp-threaders = 92624
now:  # threads in backward threaders = 94275
Gain: +1.78%

prev: # total threads: 189495
now:  # total threads: 193714
Gain: +2.22%

The numbers are not as great as my initial proposal, but I've
recently pushed all the work that got us to this point ;-).

And... the compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solvable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

gcc/ChangeLog:

* passes.def: Replace the pass_thread_jumps before VRP* with
pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
* tree-ssa-threadbackward.c (pass_data_thread_jumps_full):
Remove hyphen from "thread-full" name.

libgomp/ChangeLog:

* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
* gcc.dg/tree-ssa/pr20701.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21559.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr59597.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp08.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
* gcc.dg/uninit-pred-9_b.c: Same.
* gcc.dg/uninit-pred-7_a.c: xfail.

Avoid overly-greedy match in dejagnu regexp.

Occasionally I've been seeing failures with the multi-line diagnostics.  It's never been clear what's causing the spurious failures, though I have long suspected a greedy regexp match.

It happened again yesterday with a local change that in no way should affect diagnostics, so I finally went searching and found that sure enough the multi-line diagnostics had a ".*" in their regexp.  According to the comments, the .* is primarily to catch any dg directives that may appear, i.e. it should eat to EOL, but not multiple lines.  But a .* can indeed match a newline and cause it to eat multiple lines.

The fix is simple.  [^\r\n]* will eat to EOL, but not further.

Regression tested on x86_64 and on our internal target.

gcc/testsuite

* lib/multiline.exp (_build_multiline_regex): Use a better
regexp than .* to match up to EOL.

Perform on-entry propagation after range_of_stmt on a gcond.

Propagation is automatically done by the temporal cache when defs are
out of date from the names on the RHS, but a gcond has no LHS, and any
updates on the RHS are never propagated. Always propagate them.

gcc/
PR tree-optimization/102983
* gimple-range-cache.h (propagate_updated_value): Make public.
* gimple-range.cc (gimple_ranger::range_of_stmt): Propagate exports
when processing gcond stmts.

gcc/testsuite/
* gcc.dg/pr102983.c: New.

handle retslot in modref

Extend modref and tree-ssa-structalias to handle retslot flags.
Since retslot is essentially a hidden argument that is known to be write-only
we can do pretty much the same stuff as we do for regular parameters.
I plan to add static chain handling in a similar way.

We do not handle IPA propagation of retslot flags (where return slot is
initialized via return slot of other function). For this ipa-prop needs
to be extended to understand retslot as well.

Bootstrapped/regtested x86_64-linux, OK for the gimple bits?

Honza

gcc/ChangeLog:

* gimple.c (gimple_call_retslot_flags): New function.
* gimple.h (gimple_call_retslot_flags): Declare.
* ipa-modref.c: Include tree-cfg.h.
(struct escape_entry): Turn parm_index to signed.
(modref_summary_lto::modref_summary_lto): Add retslot_flags.
(modref_summary::modref_summary): Initialize retslot_flags.
(struct modref_summary_lto): Likewise.
(modref_summary::useful_p): Check retslot_flags.
(modref_summary_lto::useful_p): Likewise.
(modref_summary::dump): Dump retslot_flags.
(modref_summary_lto::dump): Likewise.
(struct escape_point): Add hidden_args enum.
(analyze_ssa_name_flags): Ignore return slot return;
use gimple_call_retslot_flags.
(record_escape_points): Break out from ...
(analyze_parms): ... here; handle retslot_flags.
(modref_summaries::duplicate): Duplicate retslot_flags.
(modref_summaries_lto::duplicate): Likewise.
(modref_write_escape_summary): Stream parm_index as signed.
(modref_read_escape_summary): Likewise.
(modref_write): Stream retslot_flags.
(read_section): Likewise.
(struct escape_map): Fix typo in comment.
(update_escape_summary_1): Fix whitespace.
(ipa_merge_modref_summary_after_inlining): Drop retslot_flags.
(modref_merge_call_site_flags): Merge retslot_flags.
* ipa-modref.h (struct modref_summary): Add retslot_flags.
* tree-ssa-structalias.c (handle_rhs_call): Handle retslot_flags.

middle-end: Add target independent tests for Arm complex numbers vectorization.

This beefs up the complex numbers vectorization testsuite
and adds target independent checks next to the target
dependent ones.

This allows regressions to the detection code to be found
when running on any target, not just aarch64.

gcc/testsuite/ChangeLog:

PR tree-optimization/102977
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: Removed.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c:
Removed.

middle-end: Update the Arm complex numbers auto-vec detection to the new format of the SLP tree.

The layout of the SLP tree has changed in GCC 12 which
broke the detection of complex FMA and FMS.

This patch updates the detection to the new tree shape
and by necessity merges the complex MUL and FMA detection
into one.

This does not yet address the wrong code-gen PR which I
will fix in a different patch as that needs backporting.

gcc/ChangeLog:

PR tree-optimization/102977
* tree-vect-slp-patterns.c (vect_match_call_p): Remove.
(vect_detect_pair_op): Add crosslane check.
(vect_match_call_complex_mla): Remove.
(class complex_mul_pattern): Update comment.
(complex_mul_pattern::matches): Update detection.
(class complex_fma_pattern): Remove.
(complex_fma_pattern::matches): Remove.
(complex_fma_pattern::recognize): Remove.
(complex_fma_pattern::build): Remove.
(class complex_fms_pattern): Update comment.
(complex_fms_pattern::matches): Remove.
(complex_operations_pattern::recognize): Remove complex_fma_pattern

gimple-fold: Preserve location in gimple_fold_builtin_memset

As mentioned yesterday, gimple_fold_builtin_memset doesn't preserve
locus which means e.g. the -Wstringop-overflow warnings are emitted as:
In function 'test_max':
cc1: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
The function emits up to 2 new statements, but the latter (asgn) is added
through gsi_replace and therefore the locus is copied over from the call.
But the store is emitted before the call, and the call is optionally removed
afterwards, so the locus needs to be copied over manually.

2021-10-29 Jakub Jelinek <jakub@redhat.com>

* gimple-fold.c (gimple_fold_builtin_memset): Copy over location from
call to store.

* gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.

Force -fexcess-precision=standard for fp-uint64-convert-double-1.c

This forces -fexcess-precision=standard since the testcase is
otherwise prone to fail with x87 math.

2021-10-29 Richard Biener <rguenther@suse.de>

* gcc.dg/torture/fp-uint64-convert-double-1.c: Add
-fexcess-precision=standard.

c++: Implement DR2351 - void{} [PR102820]

Here is an implementation of DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().
For templates, if CONSTRUCTOR_NELTS is 0, the CONSTRUCTOR is not dependent
and we can return void_node right away; if it is dependent and contains
only packs, then it potentially has zero elements and so we need to build
a CONSTRUCTOR_IS_DEPENDENT CONSTRUCTOR, while if it contains any non-pack
elts, we can diagnose it right away.

2021-10-29 Jakub Jelinek <jakub@redhat.com>

PR c++/102820
* semantics.c (maybe_zero_constructor_nelts): New function.
(finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node. If type is cv void and compound_literal might have no
elements after expansion, handle it like other dependent compound
literals.

* g++.dg/cpp0x/dr2351.C: New test.

rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

If the second operand of __builtin_shuffle is a constant zero vector and the
mask has a specific form, it can be optimized to vspltisw+xxpermdi instead of lxv.

gcc/ChangeLog:

PR target/102868
* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
patterns match and emit for VSX xxpermdi.

gcc/testsuite/ChangeLog:

PR target/102868
* gcc.target/powerpc/pr102868.c: New test.

Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

gcc/ChangeLog:

PR target/102464
* config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
function type.
(V16HF_FTYPE_V16HF): Ditto.
(V32HF_FTYPE_V32HF): Ditto.
(V8HF_FTYPE_V8HF_ROUND): Ditto.
(V16HF_FTYPE_V16HF_ROUND): Ditto.
(V32HF_FTYPE_V32HF_ROUND): Ditto.
* config/i386/i386-builtin.def ( IX86_BUILTIN_FLOORPH,
IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH,
IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256,
IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512,
IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
* config/i386/i386-builtins.c
(ix86_builtin_vectorized_function): Enable vectorization for
HFmode FLOOR/CEIL/TRUNC operation.
* config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
new builtins.
* config/i386/sse.md (rint<mode>2, nearbyint<mode>2): Extend
to vector HFmodes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102464-vrndscaleph.c: New test.

Daily bump.

path relation oracle: Remove SSA's being killed from the equivalence list.

Same thing as the relational change.  Walk any equivalences that have
been registered on the path, and remove the name being killed.  The
only reason we had added the equivalence with itself earlier is so we
wouldn't search any further in the equivalency list.  So if we are
removing all references to it, then we no longer need to add a "kill"
record.

Will push pending tests on x86-64 Linux.

Co-authored-by: Andrew MacLeod <amacleod@redhat.com>
gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Walk the
equivalency list and remove SSA from any equivalencies.

or1k: Add return address argument to _mcount call

This fixes an issue in the glibc port I am working on where the build
fails due to the warning:

error: calling ‘__builtin_return_address’ with a nonzero argument is unsafe [-Werror=frame-address]

This is due to how the current implementation of _mcount in glibc uses
__builtin_return_address with a count argument of 1.

Fix that by passing the value of LR_REGNUM to the _mcount function,
effectively providing the value _mcount is after.

This is an ABI change, but I think it's OK because the glibc port for
or1k is not yet upstreamed. Also, I think just adding an argument
should not break anything anyway.

gcc/ChangeLog:

* config/or1k/or1k.h (PROFILE_HOOK): Add return address argument
to _mcount.

match.pd: Optimize MIN_EXPR <addr1, addr2> etc. if addr1 < addr2 would be simplified [PR102951]

This patch outlines the decision whether address comparison can be folded
or not from the match.pd simple comparison simplification and uses it
both there and in a new minmax simplification, such that we fold e.g.
MAX (&a[2], &a[1]) etc.
Some of the Wstringop-overflow-62.c changes might look weird, but that
seems to be mainly due to gimple_fold_builtin_memset not bothering to
copy over location, will fix that incrementally.

2021-10-28 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/102951
* fold-const.h (address_compare): Declare.
* fold-const.c (address_compare): New function.
* match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)): Use
address_compare helper.
(minmax cmp (convert1?@2 addr@0) (convert2?@3 addr@1)): New
simplification.

* gcc.dg/tree-ssa/pr102951.c: New test.
* gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.

Fix ifcvt-4.c to not depend on VRP2 asserts.

The testcase fails if VRP2 is replaced with a non-assert based VRP because it
accidentally depends on specific IL changes when the asserts are removed. This
removes that dependency.

gcc/testsuite/
* gcc.dg/ifcvt-4.c: Adjust.

Unify EVRP and VRP folding predicate message.

EVRP issues a message for folding predicates in a different format than
VRP does; this patch unifies the messaging.

gcc/
* vr-values.c (simplify_using_ranges::fold_cond): Change fold message.

gcc/testsuite/
* gcc.dg/tree-ssa/evrp9.c: Adjust message scanned for.
* gcc.dg/tree-ssa/pr21458-2.c: Ditto.

Reset scev before invoking array_checker.

Before invoking the array_checker, we need to reset scev so it will not try to
access any ssa_names that the substitute and fold engine has freed.

PR tree-optimization/102940
* tree-vrp.c (execute_ranger_vrp): Reset scev.

c++: CTAD within template argument [PR102933]

Here when checking for erroneous occurrences of 'auto' inside a template
argument (which is allowed by the concepts TS for class templates),
extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes
check_auto_in_tmpl_args to reject this valid template argument.  This
patch fixes this by making extract_autos_r ignore CTAD placeholders.

However, it seems we don't need to call check_auto_in_tmpl_args at all
outside of the concepts TS since using 'auto' as a type-id is otherwise
rejected more generally at parse time.  So this patch makes the function
just exit early if !flag_concepts_ts.

Similarly, I think the concepts code paths in do_auto_deduction and
type_uses_auto are only necessary for the concepts TS, so this patch
also restricts these code paths accordingly.

PR c++/102933

gcc/cp/ChangeLog:

* parser.c (cp_parser_simple_type_specifier): Adjust diagnostic
for using auto in parameter declaration.
* pt.c (extract_autos_r): Ignore CTAD placeholders.
(extract_autos): Use range-based for.
(do_auto_deduction): Use extract_autos only for the concepts TS
and not also for standard concepts.
(type_uses_auto): Likewise with for_each_template_parm.
(check_auto_in_tmpl_args): Just return false outside of the
concepts TS.  Simplify.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class50.C: New test.
* g++.dg/cpp2a/nontype-class50a.C: New test.

[PATCH 4/5] gcc/nios2: Define the musl linker

Add a definition of the musl linker used on the nios2 platform.

2021-10-26 Richard Purdie <richard.purdie@linuxfoundation.org>

gcc/ChangeLog:

* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add musl linker

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

[PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets

During cross compiling, CPP is being set to the target compiler even for
build targets. As an example, when building a cross compiler targeting
mingw, the config.log for libiberty in
build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
shows:

configure:3786: checking how to run the C preprocessor
configure:3856: result: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
configure:3876: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
configure:3876: $? = 0

This is libiberty being built for the build environment, not the target one
(i.e. in build-x86_64-linux). As such it should be using the build environment's
gcc and not the target one. In the mingw case the system headers are quite
different leading to build failures related to not being able to include a
process.h file for pem-unix.c.

Further analysis shows the same issue occurring for CPPFLAGS too.

Fix this by adding support for CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD, which,
for example, avoids mixing the mingw headers for host binaries on linux
systems.

2021-10-27 Richard Purdie <richard.purdie@linuxfoundation.org>

ChangeLog:

* Makefile.tpl: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use CPPFLAGS_FOR_BUILD for GMPINC

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

In the testcase below the two left fold expressions each expand into a
constant logical expression with 1024 terms, for which potential_const_expr
takes more than a minute to return true.  This happens because p_c_e_1
performs trial evaluation of the first operand of a &&/|| in order to
determine whether to consider the potentiality of the second operand.
And because the expanded expression is left-associated, this trial
evaluation causes p_c_e_1 to be quadratic in the number of terms of the
expression.

This patch fixes this quadratic behavior by making p_c_e_1 preemptively
compute potentiality of the second operand of a &&/||, and perform trial
evaluation of the first operand only if the second operand isn't
potentially constant.  We must be careful to avoid emitting bogus
diagnostics during the preemptive computation; to that end, we perform
this shortcut only when tf_error is cleared, and when tf_error is set we
now first check potentiality of the whole expression quietly and replay
the check noisily for diagnostics.

Apart from fixing the quadraticness for left-associated logical exprs,
this change also reduces compile time for the libstdc++ testcase
20_util/variant/87619.cc by about 15% even though our <variant> uses
right folds instead of left folds.  Likewise for the testcase in the PR,
for which compile time is reduced by 30%.  The reason for these speedups
is that p_c_e_1 no longer performs expensive trial evaluation of each term
of large constant logical expressions when determining their potentiality.
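
The quadratic growth can be seen with a simplified cost model.  This is
a sketch, not GCC's implementation: it assumes a left-associated chain
((t1 && t2) && t3) ... && tn, charges one unit per term visited, and the
function names are invented for this example.

```c
#include <stdint.h>

/* Old scheme: every term is checked once, and in addition each && node
   trial-evaluates its first operand.  In a left-associated chain the
   first operand of the k-th && node spans k terms.  */
uint64_t visits_trial_first(uint64_t n)
{
    uint64_t visits = n;              /* each term checked once */
    for (uint64_t k = 1; k < n; k++)
        visits += k;                  /* re-walk of a k-term first operand */
    return visits;
}

/* New scheme when every second operand is potentially constant: no trial
   evaluation is needed, so each term is visited exactly once.  */
uint64_t visits_second_operand_first(uint64_t n)
{
    return n;
}
```

Under these assumptions, 1024 terms cost 524800 visits with the old
scheme and 1024 with the new one.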

PR c++/102780

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1) <case TRUTH_*_EXPR>:
When tf_error isn't set, preemptively check potentiality of the
second operand before performing trial evaluation of the first
operand.
(potential_constant_expression_1): When tf_error is set, first check
potentiality quietly and return true if successful, otherwise
proceed noisily to give errors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/fold13.C: New test.

Update documentation of %X spec

%X
Output the accumulated linker options specified by -Wl or a ‘%x’ spec string

The part about -Wl has been obsolete for 27 years, since this change:

Author: Torbjorn Granlund <tege@gnu.org>
Date:   Thu Oct 27 18:04:25 1994 +0000

    (process_command): Handle -Wl, and -Xlinker similar to -l,

    i.e., preserve their order with respect to linker input files.

Technically speaking, the arguments of -l, -Wl and -Xlinker are input files.

gcc/
* doc/invoke.texi (%X): Remove obsolete reference to -Wl.

middle-end/84407 - honor -frounding-math for int to float conversion

This makes us honor -frounding-math for integer to float conversions
and avoid constant folding when such conversion is not exact.
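
As a rough illustration of what "not exact" means here, the following
sketch tests whether a 64-bit unsigned value converts to double without
rounding.  It assumes double is IEEE binary64 with a 53-bit significand;
the helper name is hypothetical and is not part of fold-const.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if x is representable exactly as an IEEE binary64 double,
   i.e. the conversion result does not depend on the rounding mode.  */
bool u64_to_double_is_exact(uint64_t x)
{
    if (x == 0)
        return true;
    /* Trailing zero bits only affect the exponent, so drop them.  */
    while ((x & 1) == 0)
        x >>= 1;
    /* What remains must fit in the 53-bit significand.  */
    return x < (UINT64_C(1) << 53);
}
```

Values for which this returns false are the ones where the rounding mode
in effect at run time can change the result.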

2021-10-28 Richard Biener <rguenther@suse.de>

PR middle-end/84407
* fold-const.c (fold_convert_const): Avoid int to float
constant folding with -frounding-math and inexact result.
* simplify-rtx.c (simplify_const_unary_operation): Likewise
for both float and unsigned_float.

* gcc.dg/torture/fp-uint64-convert-double-1.c: New testcase.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Likewise.

Improve backward threading with switches.

We've been essentially using find_taken_edge_switch_expr() in the
backward threader, but this is suboptimal because said function only
works with singletons. VRP has a much smarter find_case_label_range
that works with ranges.
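
The difference between the two lookups can be sketched as follows.  The
types and names are hypothetical and do not mirror the signature of
find_case_label_range; the point is only that a whole value range, not
just a single constant, can select a case label.

```c
#include <stddef.h>

/* Hypothetical case label covering [low, high], inclusive.  A singleton
   case has low == high.  */
struct case_label {
    long low;
    long high;
};

/* Returns the index of the label that contains all of [lo, hi], or -1
   if no single label covers the whole range.  */
int find_label_for_range(const struct case_label *labels, size_t n,
                         long lo, long hi)
{
    for (size_t i = 0; i < n; i++)
        if (labels[i].low <= lo && hi <= labels[i].high)
            return (int) i;
    return -1;
}
```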

Tested on x86-64 Linux with:

a) Bootstrap & regtests.

b) Verifying we get more threads than before.

c) Asserting that the new code catches everything the old
code caught (over a set of bootstrap .ii files).

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader::find_taken_edge_switch): Use find_case_label_range
instead of find_taken_edge.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp106.c: Adjust for threading.
* gcc.dg/tree-ssa/vrp113.c: Same.

Make back_threader_registry inherit from back_jt_path_registry.

When a class's only purpose is to expose the methods of its only
member, it's really a derived class ;-).

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader_registry):
Inherit from back_jt_path_registry.
(back_threader_registry::thread_through_all_blocks): Remove.
(back_threader_registry::register_path): Remove
m_lowlevel_registry prefix.

middle-end/57245 - honor -frounding-math in real truncation

The following honors -frounding-math when converting a FP constant
to another FP type.
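
A small sketch of the exactness question for double to float, assuming
IEEE binary32/binary64 and a value within float range; the helper name
is invented for this example.

```c
#include <stdbool.h>

/* Returns true if narrowing d to float and widening it back yields the
   same value, i.e. the truncation does not round.  The volatile store
   forces the value to actually be narrowed to float precision.  */
bool double_to_float_is_exact(double d)
{
    volatile float narrowed = (float) d;
    return (double) narrowed == d;
}
```

Constants for which this returns false are the ones whose converted
value depends on the rounding mode.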

2021-10-27 Richard Biener <rguenther@suse.de>

PR middle-end/57245
* fold-const.c (fold_convert_const_real_from_real): Honor
-frounding-math if the conversion is not exact.
* simplify-rtx.c (simplify_const_unary_operation): Do not
simplify FLOAT_TRUNCATE with sign dependent rounding.

* gcc.dg/torture/fp-double-convert-float-1.c: New testcase.

tree-optimization/102949 - fix base object alignment

This fixes fallout of g:4703182a06b831a9 where we now silently fail
to force alignment of a base object. The fix is to look at the
dr_info of the group leader to be consistent with alignment analysis.

2021-10-28 Richard Biener <rguenther@suse.de>

PR tree-optimization/102949
* tree-vect-stmts.c (ensure_base_align): Look at the
dr_info of a group leader and assert we are looking at
one with analyzed alignment.

rs6000: Fix ICE of vect cost related to V1TI [PR102767]

As PR102767 shows, the commit r12-3482 exposed one ICE in function
rs6000_builtin_vectorization_cost.  We claim that V1TI supports
movmisalign on rs6000 (see define_expand "movmisalign<mode>"), so
rs6000_builtin_support_vector_misalignment returns true for
misalign 8.  Later, in the cost querying function
rs6000_builtin_vectorization_cost, we don't have the arms to handle
the V1TI input under (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN).

The proposed fix is to add handling for V1TI, simply using the
cost for doubleword, which is apparently bigger than the cost of
scalar, so the vectorization won't happen; this is just to
keep consistency and avoid the ICE.  Another thought is to not support
movmisalign for V1TI, but it sounds like a bad idea since it doesn't
match the reality.

Note that this patch also fixes up the wrong indentations around.

gcc/ChangeLog:

PR target/102767
* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider
V1TI mode for unaligned load and store.

gcc/testsuite/ChangeLog:

PR target/102767
* gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.

RISC-V: Fix wrong predicator for zero_extendsidi2_internal pattern

We wrongly guard the zero_extendsidi2_internal pattern with both ZBA and ZBB;
only ZBA provides a zero_extendsidi2 instruction.

gcc/ChangeLog

* config/riscv/riscv.md (zero_extendsidi2_internal): Allow ZBB to
use this pattern.

RISC-V: Handle zi* extension correctly for arch-canonicalize script

The canonical order for z-prefixed extensions relies on the canonical order of
single-letter extensions; however, we didn't put i into the list before,
so putting zicsr or zifencei raised an exception.
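
The failure mode can be sketched in C as below.  The order strings are
hypothetical and shortened for illustration; they are not the actual
CANONICAL_ORDER value from the arch-canonicalize script.

```c
#include <string.h>

/* Hypothetical single-letter orders, for illustration only.  */
const char ORDER_WITHOUT_I[] = "mafdc";
const char ORDER_WITH_I[] = "imafdc";

/* Rank of a z-prefixed extension: the position of its second letter in
   the single-letter order, or -1 if that letter is not listed.  */
int z_ext_rank(const char *order, const char *ext)
{
    if (ext[0] != 'z' || ext[1] == '\0')
        return -1;
    const char *pos = strchr(order, ext[1]);
    return pos ? (int) (pos - order) : -1;
}
```

With the first order string, zicsr and zifencei have no rank at all,
which corresponds to the lookup failure described above.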

gcc/ChangeLog:

* config/riscv/arch-canonicalize (CANONICAL_ORDER): Add `i` to
CANONICAL_ORDER.

hardened conditionals

This patch introduces optional passes to harden conditionals used in
branches, and in computing boolean expressions, by adding redundant
tests of the reversed conditions, and trapping in case of unexpected
results.  Though in abstract machines the redundant tests should never
fail, CPUs may be led to misbehave under certain kinds of attacks,
such as power deprivation, and these tests reduce the likelihood of
going too far down an unexpected execution path.
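
Conceptually, the transformation resembles the hand-written sketch
below.  This is an illustration only: the function name is invented, and
the volatile copies are an assumption of this example to keep the
compiler from proving the second test redundant, not a description of
how the passes do it.

```c
/* Returns 1 if a < b, else 0, re-checking the reversed condition on each
   path and trapping if the two tests disagree.  */
int hardened_less(int a, int b)
{
    volatile int va = a, vb = b;

    if (va < vb) {
        if (va >= vb)          /* redundant reversed test */
            __builtin_trap();
        return 1;
    }
    if (va < vb)               /* redundant test on the other path */
        __builtin_trap();
    return 0;
}
```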

for  gcc/ChangeLog

* common.opt (fharden-compares): New.
(fharden-conditional-branches): New.
* doc/invoke.texi: Document new options.
* gimple-harden-conditionals.cc: New.
* Makefile.in (OBJS): Build it.
* passes.def: Add new passes.
* tree-pass.h (make_pass_harden_compares): Declare.
(make_pass_harden_conditional_branches): Declare.

for  gcc/ada/ChangeLog

* doc/gnat_rm/security_hardening_features.rst
(Hardened Conditionals): New.

for  gcc/testsuite/ChangeLog

* c-c++-common/torture/harden-comp.c: New.
* c-c++-common/torture/harden-cond.c: New.

rs6000: Fold xxsel to vsel since they have same semantics

Fold xxsel to vsel like xxperm/vperm to avoid duplicate code.

gcc/ChangeLog:

2021-10-28 Xionghu Luo <luoxhu@linux.ibm.com>

PR target/94613
* config/rs6000/altivec.md: Add vsx register constraints.
* config/rs6000/vsx.md (vsx_xxsel<mode>): Delete.
(vsx_xxsel<mode>2): Likewise.
(vsx_xxsel<mode>3): Likewise.
(vsx_xxsel<mode>4): Likewise.

gcc/testsuite/ChangeLog:

2021-10-28 Xionghu Luo <luoxhu@linux.ibm.com>

* gcc.target/powerpc/builtins-1.c: Adjust.

rs6000: Fix wrong code generation for vec_sel [PR94613]

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per-element selection is a
subset of bit-wise selection; with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3&op1) | (op3&op2),
(~op3&op1) | (op2&op3),
(op3&op2) | (~op3&op1),
(op2&op3) | (~op3&op1),
(op1&~op3) | (op3&op2),
(op1&~op3) | (op2&op3),
(op3&op2) | (op1&~op3),
(op2&op3) | (op1&~op3),

The latter 4 cases do not follow canonicalisation rules, and non-canonical
RTL is invalid RTL in the vregs pass.  Secondly, the combine pass will swap
(op1&~op3) to (~op3&op1) by commutative canonicalisation, which reduces
them to the first 4 patterns, but it won't swap (op2&op3) | (~op3&op1) to
(~op3&op1) | (op2&op3), so this patch handles it with 4 patterns with
different NOT op3 positions and checks equality inside them.
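
The difference between the two semantics can be sketched on one 32-bit
element.  The per-element function is one plausible reading of a
whole-element condition and is an assumption of this example, not the
exact RTL that was replaced.

```c
#include <stdint.h>

/* Bit-wise select: each result bit comes from op2 where the mask bit is
   set and from op1 where it is clear.  */
uint32_t bit_select(uint32_t op1, uint32_t op2, uint32_t op3)
{
    return (op1 & ~op3) | (op2 & op3);
}

/* Per-element select: the whole element is taken from op2 if the mask
   is nonzero, otherwise from op1.  */
uint32_t element_select(uint32_t op1, uint32_t op2, uint32_t op3)
{
    return op3 ? op2 : op1;
}
```

The two agree when the mask is all zeros or all ones and differ for a
mixed mask such as 0x0000FFFF.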

Tested pass on P7, P8 and P9.

gcc/ChangeLog:

2021-10-28  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/94613
* config/rs6000/altivec.md (*altivec_vsel<mode>): Change to ...
(altivec_vsel<mode>): ... this and update define.
(*altivec_vsel<mode>_uns): Delete.
(altivec_vsel<mode>2): New define_insn.
(altivec_vsel<mode>3): Likewise.
(altivec_vsel<mode>4): Likewise.
* config/rs6000/rs6000-call.c (altivec_expand_vec_sel_builtin): New.
(altivec_expand_builtin): Call altivec_expand_vec_sel_builtin to expand
vec_sel.
* config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Use bit-wise
selection instead of per element.
* config/rs6000/vector.md:
* config/rs6000/vsx.md (*vsx_xxsel<mode>): Change to ...
(vsx_xxsel<mode>): ... this and update define.
(*vsx_xxsel<mode>_uns): Delete.
(vsx_xxsel<mode>2): New define_insn.
(vsx_xxsel<mode>3): Likewise.
(vsx_xxsel<mode>4): Likewise.

gcc/testsuite/ChangeLog:

2021-10-28  Xionghu Luo  <luoxhu@linux.ibm.com>

PR target/94613
* gcc.target/powerpc/pr94613.c: New test.

AVX512FP16: Optimize _Float16 reciprocal for div and sqrt

For _Float16 type, add insn and expanders to optimize x / y to
x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
As half float has only a minor precision difference between div and
mul * rcp, there is no need for Newton-Raphson approximation.
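
For reference, the refinement step that is skipped for HFmode looks like
this in scalar form.  It is a generic sketch in single precision with an
invented function name, not code from the i386 backend.

```c
/* One Newton-Raphson step improving an estimate r of 1/y:
   r' = r * (2 - y * r).  */
float refine_reciprocal(float y, float r)
{
    return r * (2.0f - y * r);
}
```

Starting from the estimate 0.3 for 1/3, one step gives roughly 0.33.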

gcc/ChangeLog:

* config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable
HFmode rsqrt without TARGET_SSE_MATH.
(ix86_optab_supported_p): Refactor rint, adjust floor, ceil,
btrunc condition to be restricted by -ftrapping-math, adjust
use_rsqrt_p function call.
* config/i386/i386.md (rcphf2): New define_insn.
(rsqrthf2): Likewise.
* config/i386/sse.md (div<mode>3): Change VF2H to VF2.
(div<mode>3): New expander for HF mode.
(rsqrt<mode>2): Likewise.
(*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass.
(*avx512fp16_vmrsqrtv8hf2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-recip-1.c: New test.
* gcc.target/i386/avx512fp16-recip-2.c: Ditto.
* gcc.target/i386/pr102464.c: Add -fno-trapping-math.

Daily bump.

Fortran: Delete unused decl in intrinsic.h

gcc/fortran/ChangeLog:

* intrinsic.h (gfc_check_sum, gfc_resolve_atan2d, gfc_resolve_kill,
gfc_resolve_kill_sub): Delete declaration.

Fortran: Delete unused decl in trans-types.h

gcc/fortran/ChangeLog:

* trans-types.h (gfc_convert_function_code): Delete.

Fortran: Delete unused decl in trans-stmt.h

gcc/fortran/ChangeLog:

* trans-stmt.h (gfc_trans_deallocate_array): Delete.

Fortran: make some trans-array functions static

gcc/fortran/ChangeLog:

* trans-array.c (gfc_trans_scalarized_loop_end): Make static.
* trans-array.h (gfc_trans_scalarized_loop_end,
gfc_conv_tmp_ref, gfc_conv_array_transpose): Delete declaration.

Fortran: make some constructor* functions static

gfc_constructor_expr_foreach and gfc_constructor_swap were just stubs.

gcc/fortran/ChangeLog:

* constructor.c (gfc_constructor_get_base): Make static.
(gfc_constructor_expr_foreach, gfc_constructor_swap): Delete.
* constructor.h (gfc_constructor_get_base): Remove declaration.
(gfc_constructor_expr_foreach, gfc_constructor_swap): Delete.

Fortran: make some match* functions static

gfc_match_small_int_expr was unused, delete it.
gfc_match_gcc_unroll should use gfc_match_small_literal_int and then
gfc_match_small_int can be deleted since it will be unused.

gcc/fortran/ChangeLog:

* decl.c (gfc_match_old_kind_spec, set_com_block_bind_c,
set_verify_bind_c_sym, set_verify_bind_c_com_block,
get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec,
check_extended_derived_type): Make static.
(gfc_match_gcc_unroll): Add comment.
* match.c (gfc_match_small_int_expr): Delete definition.
* match.h (gfc_match_small_int_expr): Delete declaration.
(gfc_match_name_C, gfc_match_old_kind_spec, set_com_block_bind_c,
set_verify_bind_c_sym, set_verify_bind_c_com_block,
get_bind_c_idents, gfc_match_suffix,
gfc_get_type_attr_spec): Delete declaration.

Fortran: make some trans* functions static

This makes some trans* functions static and deletes declarations of
functions that either do not exist anymore like gfc_get_function_decl
or that are unused like gfc_check_any_c_kind.

gcc/fortran/ChangeLog:

* expr.c (is_non_empty_structure_constructor): Make static.
* gfortran.h (gfc_check_any_c_kind): Delete.
* match.c (gfc_match_label): Make static.
* match.h (gfc_match_label): Delete declaration.
* scanner.c (file_changes_cur, file_changes_count,
file_changes_allocated): Make static.
* trans-expr.c (gfc_get_character_len): Make static.
(gfc_class_len_or_zero_get): Make static.
(VTAB_GET_FIELD_GEN): Undefine.
(gfc_get_class_array_ref): Make static.
(gfc_finish_interface_mapping): Make static.
* trans-types.c (gfc_check_any_c_kind): Delete.
(pfunc_type_node, dtype_type_node, gfc_get_ppc_type): Make static.
* trans-types.h (gfc_get_ppc_type): Delete declaration.
* trans.c (gfc_msg_wrong_return): Delete.
* trans.h (gfc_class_len_or_zero_get, gfc_class_vtab_extends_get,
gfc_vptr_extends_get, gfc_get_class_array_ref, gfc_get_character_len,
gfc_finish_interface_mapping, gfc_msg_wrong_return,
gfc_get_function_decl): Delete declaration.

libffi: Update LOCAL_PATCHES

Add

commit 90205f67e465ae7dfcf733c2b2b177ca7ff68da0
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Mon Oct 25 23:29:26 2021 +0000

    rs6000: Fix bootstrap (libffi)

    This fixes bootstrap for the current problems building libffi.

to LOCAL_PATCHES.

* LOCAL_PATCHES: Add commit 90454a90082.

Darwin, config: Amend for Darwin 21 / macOS 12.

It seems that the OS major version is now tracking the kernel
major version - 9. Minor version has been set to kernel
min - 1.
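
The mapping described above can be written down as a small sketch.  The
struct and function names are hypothetical, and clamping the minor
version at zero is an assumption of this example rather than something
stated in the commit.

```c
/* Hypothetical OS version pair.  */
struct os_version {
    int major;
    int minor;
};

/* Derive the macOS version from the Darwin kernel version using the rule
   stated for Darwin 21: major = kernel major - 9, minor = kernel minor - 1.  */
struct os_version macos_from_darwin_kernel(int kernel_major, int kernel_minor)
{
    struct os_version v;
    v.major = kernel_major - 9;
    v.minor = kernel_minor - 1;
    if (v.minor < 0)          /* assumption: never report a negative minor */
        v.minor = 0;
    return v;
}
```

Under this rule, kernel 21.1 maps to macOS 12.0.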

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Signed-off-by: Saagar Jha <saagar@saagarjha.com>
gcc/ChangeLog:

* config.gcc: Adjust for Darwin21.
* config/darwin-c.c (macosx_version_as_macro): Likewise.
* config/darwin-driver.c (validate_macosx_version_min):
Likewise.
(darwin_find_version_from_kernel): Likewise.

Kill known equivalences before a new assignment in the path solver.

Every time we have a killing statement, we must also kill the relations
seen so far. This is similar to what we did for the equivs inherent in
PHIs along a path.

Tested on x86-64 and ppc64le Linux.

gcc/ChangeLog:

* gimple-range-path.cc
(path_range_query::range_defined_in_block): Call killing_def.

Reorder relation calculating code in the path solver.

Enabling the fully resolving threader triggers various relation
ordering issues that have previously been dormant because the VRP
hybrid threader (forward threader based) never gives us long enough
paths for this to matter.  The new threader pulls no punches in
finding non-obvious paths, so getting the relations right is
paramount.

This patch fixes a couple oversights that have gone undetected.

First, some background.  There are 3 types of relations along a path:

a) Relations inherent in a PHI.
b) Relations as a side-effect of evaluating a statement.
c) Outgoing relations between blocks in a path.

We must calculate these in their proper order, otherwise we can run
into ordering issues.  The current ordering is wrong, as we
precalculate PHIs for _all_ blocks before anything else, and then
proceed to register the relations throughout the path.  Also, we fail
to realize that a PHI whose argument is also defined in the PHI's block
cannot be registered as an equivalence without causing more ordering
issues.

This patch fixes all the problems described above.  With it we get a
handful more net threads, but most importantly, we disallow some
threads that were wrong.

Tested on x86-64 and ppc64le Linux on the usual regstrap, plus by
comparing the different thread counts before and after this patch.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_range_op): Dump
operands as well as relation.
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Compute PHI relations
first.  Compute outgoing relations at the end.
(path_range_query::compute_ranges): Remove call to compute_relations.
(path_range_query::compute_relations): Remove.
(path_range_query::maybe_register_phi_relation): New.
(path_range_query::compute_phi_relations): Abstract out
registering one PHI relation to...
(path_range_query::compute_outgoing_relations): ...here.
* gimple-range-path.h (class path_range_query): Remove
compute_relations.
Add maybe_register_phi_relation.

Kill second order relations in the path solver.

My upcoming work replacing the VRP threaders with a fully resolving
backward threader has tripped over various corner cases in the path
sensitive relation oracle. This patch kills second order relations when
we kill a relation.

Tested on x86-64 and ppc64le Linux.

Co-authored-by: Andrew MacLeod <amacleod@redhat.com>
gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Kill second
order relations.

Fix warnings building linux-atomic.c and fptr.c on hppa64-linux

The file fptr.c is specific to 32-bit hppa-linux and should not be
included in LIB2ADD on hppa64-linux.

There is a builtin type mismatch in linux-atomic.c using the type
long long unsigned int for 64-bit atomic operations on hppa64-linux.

2021-10-27 John David Anglin <danglin@gcc.gnu.org>

libgcc/ChangeLog:

* config.host (hppa*64*-*-linux*): Don't add pa/t-linux to
tmake_file.
* config/pa/linux-atomic.c: Define u8, u16 and u64 types.
Use them in FETCH_AND_OP_2, OP_AND_FETCH_2, COMPARE_AND_SWAP_2,
SYNC_LOCK_TEST_AND_SET_2 and SYNC_LOCK_RELEASE_1 macros.
* config/pa/t-linux64 (LIB1ASMSRC): New define.
(LIB1ASMFUNCS): Revise.
(HOST_LIBGCC2_CFLAGS): Add "-DLINUX=1".

Fix a typo.

gcc/testsuite/ChangeLog:
* gcc.dg/Warray-bounds-90.c: Fix a typo.

ipa-cp: Use profile counters (or not) based on local availability

This is a follow-up small patch to address Honza's review of my
previous patch to select saner profile count to base heuristics on.
Currently the IPA-CP heuristics switch to PGO-mode only if there are
PGO counters available for any part of the call graph.  This change
makes it switch to the PGO mode only if any of the incoming edges
bringing in the constant in question had any ipa-quality counts on
them.  Consequently, if a part of the program is built with
-fprofile-use and another part without, IPA-CP will use
estimated-frequency-based heuristics for the latter.

I still wonder whether this should only happen with
flag_profile_partial_training on.  It seems like we're behaving as if
it was always on.

gcc/ChangeLog:

2021-10-18  Martin Jambor  <mjambor@suse.cz>

* ipa-cp.c (good_cloning_opportunity_p): Decide whether to use
profile feedback depending on their local availability.