review.tizen.org Git - test

bswap: Improve perform_symbolic_merge [PR103376]

Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
We could allow masked1 == masked2 case for it, but would need to
do something different than the
  n->n = n1->n | n2->n;
we do on all the bytes together.
In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
both variants are the same) and masked1 != 0xff we would need to
clear corresponding n->n byte instead of setting it to the input
as x ^ x = 0 (but if we don't know what x and y are, the result is
also unknown).  Now, for plus it is much harder, because not only
for non-zero operands we don't know what the result is, but it can
modify upper bytes as well.  So perhaps only if current's byte
masked1 && masked2 set the resulting byte to 0xff (unknown) iff
the byte above it is 0 and 0, and set that resulting byte to 0xff too.
Also, even for | we could instead of return NULL just set the resulting
byte to 0xff if it is different, perhaps it will be masked off later on.

This patch just punts on plus if both corresponding bytes are non-zero,
otherwise implements the above.
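
The per-byte rules above can be sketched roughly as follows.  This is an
illustrative model only, not the code in gimple-ssa-store-merging.c:
MergeOp, kUnknown and merge_markers are names made up for the sketch, and it
assumes every byte of a 64-bit symbolic number holds a marker where 0 means
a known-zero byte and 0xff means unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical names for this sketch; they are not the GCC definitions.
enum class MergeOp { Ior, Xor, Plus };
constexpr std::uint64_t kUnknown = 0xff;   // stands in for MARKER_BYTE_UNKNOWN

// Merge two symbolic numbers marker by marker.
// Returns std::nullopt where the merge has to punt.
std::optional<std::uint64_t> merge_markers(MergeOp op, std::uint64_t n1,
                                           std::uint64_t n2)
{
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 8) {
        const std::uint64_t m1 = (n1 >> shift) & 0xff;
        const std::uint64_t m2 = (n2 >> shift) & 0xff;
        std::uint64_t out = 0;
        if (m1 == 0 || m2 == 0) {
            out = m1 | m2;                  // x op 0 == x for |, ^ and +
        } else if (op == MergeOp::Plus) {
            return std::nullopt;            // both non-zero: punt for plus
        } else if (m1 != m2) {
            out = kUnknown;                 // different inputs: unknown byte
        } else if (op == MergeOp::Xor && m1 != kUnknown) {
            out = 0;                        // x ^ x == 0
        } else {
            out = m1;                       // x | x == x; unknown stays unknown
        }
        assert(out <= 0xff);
        result |= out << shift;
    }
    return result;
}
```

An unknown byte in the result does not have to be fatal: as the message
notes, it may still be masked off by a later operation.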

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103376
* gimple-ssa-store-merging.c (perform_symbolic_merge): For
BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
0.

* gcc.dg/optimize-bswapsi-7.c: New test.

c++: Return early in apply_late_template_attributes if there are no late attribs [PR101180]

The r12-299-ga0fdff3cf33f7284 change can result in cplus_decl_attributes being called
even if there are no late attributes (but at least one early attribute) in
apply_late_template_attributes. This patch fixes that, so that we return early
if there are no late attrs, only arrange for TYPE_ATTRIBUTES to get the early
attribute list.

2021-11-25 Jakub Jelinek <jakub@redhat.com>

PR c++/101180
* pt.c (apply_late_template_attributes): Return early if there are no
dependent attributes.

c++: Implement C++23 P2128R6 - Multidimensional subscript operator [PR102611]

The following patch implements the C++23 Multidimensional subscript operator
P2128R6 paper.
C++20 and older only allow a single expression in between []s (albeit
for C++20 with a deprecation warning if it is a comma expression).  Even
in C++23 and for the coming years I think the vast majority of subscript
expressions will still have a single expression, and that case remains
quite special in C++23, as e.g. the builtin operator requires exactly one
assignment expression.  The patch therefore attempts to optimize for that
case and, if possible, not to slow it down (or use more memory for it).
So, already during parsing it differentiates between that (uses a single
index_exp tree in that case) and the new cases (zero or two+ expressions
in the list), for which it sets index_exp to NULL_TREE and uses a
releasing_vec instead similarly to how e.g. finish_call_expr uses it.
In call.c it introduces new functions build_op_subscript{,_1} which are
something in between build_new_op{,_1} and build_op_call{,_1}.
The former requires fixed number of arguments (and the patch still uses
it for the common case of subscript with exactly one index expression),
the latter handles variable number of arguments but is too CALL_EXPR specific
and handles various cases that are unnecessary for the subscript.
Right now the subscript for 0 or 2+ expressions doesn't need to deal with
builtin candidates and so is quite simple.

As discussed in the paper, for backwards compatibility, if for 2+ index
expressions build_op_subscript fails (called with tf_none) and the
expressions together form a valid comma expression (again checked with
tf_none), the subscript is handled the C++20-ish way with a pedwarn about
it; if even that fails, build_op_subscript is called again with standard
complain flags to diagnose it in the new way.  And similarly for the
builtin case.

The -Wcomma-subscript warning used to be enabled by default unless
-Wno-deprecated.  Since the C/C++98..20 behavior is no longer deprecated,
but ill-formed or changed meaning, it is now for C++23 enabled by
default regardless of -Wno-deprecated and controls the pedwarn (but not the
errors emitted if something wasn't valid before and isn't valid in C++23
either).
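
As a rough illustration of what the feature looks like from user code, the
sketch below defines a two-argument operator[] only when the compiler
predefines __cpp_multidimensional_subscript (the macro named in the
ChangeLog below) and falls back to a named accessor otherwise.  Grid, at
and set_cell are hypothetical names and are not taken from the patch or its
tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example type for this sketch.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : cols_(cols), cells_(rows * cols, 0) {}

    // Two-index accessor that works in every language version.
    int& at(std::size_t r, std::size_t c)
    {
        assert(c < cols_ && r * cols_ + c < cells_.size());
        return cells_[r * cols_ + c];
    }

#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
    // C++23 (P2128R6): operator[] may take more than one argument.
    int& operator[](std::size_t r, std::size_t c) { return at(r, c); }
#endif

private:
    std::size_t cols_;
    std::vector<int> cells_;
};

// Writes through g[r, c] where the compiler supports it, else through at().
inline void set_cell(Grid& g, std::size_t r, std::size_t c, int value)
{
#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
    g[r, c] = value;
#else
    g.at(r, c) = value;
#endif
}
```

Guarding on the feature-test macro keeps the same source compiling both
before and after C++23, where g[r, c] would otherwise be a comma expression
or an error.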

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

PR c++/102611
gcc/
* doc/invoke.texi (-Wcomma-subscript): Document that for
-std=c++20 the option isn't enabled by default with -Wno-deprecated
but for -std=c++23 it is.
gcc/c-family/
* c-opts.c (c_common_post_options): Enable -Wcomma-subscript by
default for C++23 regardless of warn_deprecated.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_multidimensional_subscript=202110L for C++23.
gcc/cp/
* cp-tree.h (build_op_subscript): Implement P2128R6
- Multidimensional subscript operator.  Declare.
(class releasing_vec): Add release method.
(grok_array_decl): Remove bool argument, add vec<tree, va_gc> **
and tsubst_flags_t arguments.
(build_min_non_dep_op_overload): Declare another overload.
* parser.c (cp_parser_parenthesized_expression_list_elt): New function.
(cp_parser_postfix_open_square_expression): Mention C++23 syntax in
function comment.  For C++23 parse zero or more than one initializer
clauses in expression list, adjust grok_array_decl caller.
(cp_parser_parenthesized_expression_list): Use
cp_parser_parenthesized_expression_list_elt.
(cp_parser_builtin_offsetof): Adjust grok_array_decl caller.
* decl.c (grok_op_properties): For C++23 don't check number
of arguments of operator[].
* decl2.c (grok_array_decl): Remove decltype_p argument, add
index_exp_list and complain arguments.  If index_exp is NULL,
handle *index_exp_list as the subscript expression list.
* tree.c (build_min_non_dep_op_overload): New overload.
* call.c (add_operator_candidates, build_over_call): Adjust comments
for removal of build_new_op_1.
(build_op_subscript): New function.
* pt.c (tsubst_copy_and_build_call_args): New function.
(tsubst_copy_and_build) <case ARRAY_REF>: If second
operand is magic CALL_EXPR with ovl_op_identifier (ARRAY_REF)
as CALL_EXPR_FN, tsubst CALL_EXPR arguments including expanding
pack expressions in it and call grok_array_decl instead of
build_x_array_ref.
<case CALL_EXPR>: Use tsubst_copy_and_build_call_args.
* semantics.c (handle_omp_array_sections_1): Adjust grok_array_decl
caller.
gcc/testsuite/
* g++.dg/cpp2a/comma1.C: Expect different diagnostics for C++23.
* g++.dg/cpp2a/comma3.C: Likewise.
* g++.dg/cpp2a/comma4.C: Expect diagnostics for C++23.
* g++.dg/cpp2a/comma5.C: Expect different diagnostics for C++23.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_multidimensional_subscript
predefined macro.
* g++.dg/cpp23/subscript1.C: New test.
* g++.dg/cpp23/subscript2.C: New test.
* g++.dg/cpp23/subscript3.C: New test.
* g++.dg/cpp23/subscript4.C: New test.
* g++.dg/cpp23/subscript5.C: New test.
* g++.dg/cpp23/subscript6.C: New test.

pr103194-5.c: Replace long with int64_t

Replace long with int64_t to work with -mx32.

* gcc.target/i386/pr103194-5.c: Include <stdint.h>.
Replace long with int64_t.

Daily bump.

Fix handling of static chain in ipa_merge_modref_summary_after_inlining

gcc/ChangeLog:

2021-11-24 Jan Hubicka <hubicka@ucw.cz>

* ipa-modref.c (implicit_eaf_flags_for_edge_and_arg): Break out from...
(modref_merge_call_site_flags): ... here.
(ipa_merge_modref_summary_after_inlining): Use it.

gcc/testsuite/ChangeLog:

2021-11-24 Jan Hubicka <hubicka@ucw.cz>

* gcc.c-torture/execute/pr103405.c: New test.

Reduce scope of a few 'class loop *loop' variables

Further clean-up after commit e41ba804ba5f5ca433e09238d561b1b4c8b10985
"Use range-based for loops for traversing loops". No functional change.

gcc/
* cfgloop.c (verify_loop_structure): Reduce scope of
'class loop *loop' variable.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* loop-init.c (fix_loop_structure): Likewise.
* loop-invariant.c (calculate_loop_reg_pressure): Likewise.
* predict.c (predict_loops): Likewise.
* tree-loop-distribution.c (loop_distribution::execute): Likewise.
* tree-vectorizer.c (pass_vectorize::execute): Likewise.

Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with. This can cause very deep call chains when
satisfied via the normal API. Instead, push any dependencies onto a stack
and evaluate them in a depth first manner, mirroring what would have happened
via the normal API calls.
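
The idea of replacing deep call chains with an explicit stack can be
sketched as below.  This is a generic model and not the ranger code:
DepGraph and evaluate are made-up names, the "value" of a node is simply
1 plus the sum of its dependencies, the map stands in for the global cache,
and the dependency graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical model: node i depends on the nodes listed in deps[i].
struct DepGraph {
    std::vector<std::vector<int>> deps;
    std::unordered_map<int, long> cache;   // plays the role of the global cache
};

// Computes the value of `root` without recursion.  Unresolved dependencies
// are pushed onto a stack and resolved depth first; a node is evaluated only
// once all of its dependencies have a cache entry.
long evaluate(DepGraph& g, int root)
{
    assert(root >= 0 && static_cast<std::size_t>(root) < g.deps.size());
    std::vector<int> stack{root};
    while (!stack.empty()) {
        const int node = stack.back();
        if (g.cache.count(node)) {          // already resolved earlier
            stack.pop_back();
            continue;
        }
        bool ready = true;
        for (int d : g.deps[node]) {
            if (!g.cache.count(d)) {
                stack.push_back(d);         // resolve this dependency first
                ready = false;
            }
        }
        if (!ready)
            continue;
        long value = 1;
        for (int d : g.deps[node])
            value += g.cache[d];
        g.cache[node] = value;
        stack.pop_back();
    }
    return g.cache[root];
}
```

With a chain of many thousands of nodes a recursive version would need one
call frame per node, while the loop above only grows a heap-allocated
vector.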

PR tree-optimization/103231
gcc/
* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
(gimple_ranger::gimple_ranger): Delete stmt stack.
(gimple_ranger::range_of_stmt): Process dependencies if they have no
global cache entry.
(gimple_ranger::prefill_name): New.
(gimple_ranger::prefill_stmt_dependencies): New.
* gimple-range.h (class gimple_ranger): Add prototypes.

Split return functionality of get_non_stale_global_range.

get_non_stale_global_range returns true only when there is a cache entry that
is not out of date. Change it so that it returns true if there was a cache
value, but returns the temporal comparison result in an auxiliary flag.

* gimple-range-cache.cc (ranger_cache::get_global_range): Always
return a range, return if it came from the cache or not.
(get_non_stale_global_range): Rename to get_global_range, and return
the temporal state in a flag.
* gimple-range-cache.h (get_non_stale_global_range): Rename and adjust.
* gimple-range.cc (gimple_ranger::range_of_expr): No need to query
get_global_range.
(gimple_ranger::range_of_stmt): Adjust for global cache temporal state
returned in a flag.

Range-on-edge trace tweak.

Trace formatting gets out of sync when range on edge is called with a constant.

* gimple-range.cc (gimple_ranger::range_on_edge): Call trailer when
a constant is encountered to terminate the trace.

libstdc++: Add xfail to some printer tests for debug mode

The type printers are not substituting std::string for
std::basic_string<char> in debug mode, mark some tests as xfail.

libstdc++-v3/ChangeLog:

* testsuite/libstdc++-prettyprinters/80276.cc: Add xfail for
debug mode.
* testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.

libstdc++: Replace hyphens in effective target keywords

An effective target like foo-bar-baz will match a target selector of
*-*-* and cause problems in the testsuite. Several libstdc++ et keywords
are of the form foo-bar, which could still be a problem for *-*
selectors.
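
To see why hyphens are a problem, the sketch below uses a minimal glob
matcher that understands only '*'.  It is a stand-in written for this
illustration, not the matching code the testsuite actually uses, and
glob_match is a made-up name.

```cpp
#include <cassert>
#include <string>

// Minimal glob matcher supporting only '*' (hypothetical helper).
bool glob_match(const std::string& pattern, const std::string& text,
                std::size_t p = 0, std::size_t t = 0)
{
    assert(p <= pattern.size() && t <= text.size());
    if (p == pattern.size())
        return t == text.size();
    if (pattern[p] == '*') {
        // '*' matches any run of characters, including an empty one.
        for (std::size_t k = t; k <= text.size(); ++k)
            if (glob_match(pattern, text, p + 1, k))
                return true;
        return false;
    }
    return t < text.size() && pattern[p] == text[t]
           && glob_match(pattern, text, p + 1, t + 1);
}
```

Under this model "foo-bar-baz" matches "*-*-*" and "debug-mode" matches
"*-*", whereas the underscore spellings match neither pattern.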

Replace hyphens with underscores in the et keywords "debug-mode",
"cxx11-abi", etc.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp: Rename effective target keywords
to avoid dashes in the name.
* testsuite/*: Update effective target keywords.

PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

Correct a `vax-netbsdelf' target regression ultimately caused by commit
c605a8bf9270 ("VAX: Accept ASHIFT in address expressions") (needed for
LRA) and as of commit 4a960d548b7d ("Avoid invalid loop transformations
in jump threading registry.") causing a build error in libgcc:

.../libgcc/libgcov-driver.c: In function 'gcov_do_dump':
.../libgcc/libgcov-driver.c:686:1: error: insn does not satisfy its constraints:
  686 | }
      | ^
(insn 2051 2050 2052 185 (set (reg/f:SI 0 %r0 [555])
        (plus:SI (ashift:SI (mem/c:SI (plus:SI (reg/f:SI 13 %fp)
                        (const_int -28 [0xffffffffffffffe4])) [40 %sfp+-28 S4 A32])
                (const_int 3 [0x3]))
            (plus:SI (reg/v/f:SI 9 %r9 [orig:176 fn_buffer ] [176])
                (const_int 24 [0x18])))) ".../libgcc/libgcov-driver.c":172:40 614 {movaddrdi}
     (nil))
during RTL pass: postreload
.../libgcc/libgcov-driver.c:686:1: internal compiler error: in extract_constrain_insn, at recog.c:2670
0x1122a5ff _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
.../gcc/rtl-error.c:108
0x1122a697 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
.../gcc/rtl-error.c:118
0x111b5f2f extract_constrain_insn(rtx_insn*)
.../gcc/recog.c:2670
0x11143eef reload_cse_simplify_operands
.../gcc/postreload.c:407
0x11142fdb reload_cse_simplify
.../gcc/postreload.c:132
0x11143533 reload_cse_regs_1
.../gcc/postreload.c:238
0x11142ce7 reload_cse_regs
.../gcc/postreload.c:66
0x1114af33 execute
.../gcc/postreload.c:2355
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

This is because reload does not recognize the ASHIFT form of scaled
indexed addressing that the offending commit enabled the backend to
produce, and as seen in the RTL above lets the pseudo holding the
index part become the original memory reference rather than reloading it
into a hard register.

Fix it then by adding said form to reload, removing the build failure
and numerous similar regressions throughout `vax-netbsdelf' test suites
run with the source as at right before the build regression.

Cf. <https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567256.html>,
and commit 6b3034eaba83 ("lra: Canonicalize mult to shift in address
reloads").

gcc/
PR middle-end/103059
* reload.c (find_reloads_address_1): Also accept the ASHIFT form
of indexed addressing.
(find_reloads): Adjust accordingly.

tree-optimization/103168 - Improve VN of pure function calls

This improves value-numbering of calls that read memory, calls
to const functions with aggregate arguments and calls to
pure functions where the latter include const functions we
demoted to pure for the fear of interposing with a less
optimized version.  Note that for pure functions we do not
handle functions that access global memory.

2021-11-24  Richard Biener  <rguenther@suse.de>
    Jan Hubicka  <jh@suse.cz>

PR tree-optimization/103168
* ipa-modref.h (struct modref_summary): Add load_accesses.
* ipa-modref.c (modref_summary::finalize): Initialize load_accesses.
* tree-ssa-sccvn.c (visit_reference_op_call): Use modref
info to walk the virtual use->def chain to CSE const/pure
function calls possibly reading from memory.

* g++.dg/tree-ssa/pr103168.C: New testcase.

Restore previous OpenACC implicit data clauses ordering [PR103244]

Follow-up for recent commit b7e20480630e3eeb9eed8b3941da3b3f0c22c969
"openmp: Relax handling of implicit map vs. existing device mappings".

As discussed, we likely also for OpenACC ought to use
'OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P' and do the appropriate implicit clauses
ordering -- but that's for a separate step.

gcc/
PR middle-end/103244
* gimplify.c (gimplify_adjust_omp_clauses): Restore previous
OpenACC behavior.
gcc/testsuite/
PR middle-end/103244
* c-c++-common/goacc/combined-reduction.c: Revert/expect previous
OpenACC behavior.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

jit: Initialize function::m_blocks in ctor

This resolves the problem reported here:
https://mail.gnu.org/archive/html/bug-gnu-emacs/2021-11/msg00606.html
https://bugzilla.opensuse.org/show_bug.cgi?id=1192951

gcc/jit/ChangeLog:

* jit-playback.c (function): Initialize m_blocks vector.

Update GMP/MPFR/MPC/ISL version in contrib/download_prerequisites

contrib/
* download_prerequisites: Update to gmp-6.2.1, mpfr-4.1.0,
mpc-1.2.1 and isl-0.24.
* prerequisites.md5: Update hash.
* prerequisites.sha512: Likewise.

middle-end/103193 - avoid canonicalizing <= and >= to == for floats

This avoids doing the aforementioned canonicalization when -ftrapping-math
is in effect and we honor NaNs.

2021-11-15 Richard Biener <rguenther@suse.de>

PR middle-end/103193
* match.pd: Avoid canonicalizing (le/ge @0 @0) to (eq @0 @0)
with NaNs and -ftrapping-math.

openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs [PR103384]

As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs.  The code written in there is valid for the host
compiler only: if we are maybe offloaded, we defer resolution until after IPA,
otherwise we return 0 for kind(nohost) and accept kind(host).  Note,
omp_maybe_offloaded is false after IPA.  If ACCEL_COMPILER is defined, it is
the other way around, but we also know we are after IPA.

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/103384
gcc/
* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
return 0 for kind(host) and continue for kind(nohost).
libgomp/
* testsuite/libgomp.c/declare-variant-2.c: New test.

attribs: Fix ICEs on attributes starting with _ [PR103365]

As the patch shows, we have quite a few asserts that we don't call
lookup_attribute etc. with attr_name that starts with an underscore,
to make sure nobody is trying to call it with non-canonicalized
attribute name like "__cold__" instead of "cold".
We canonicalize only attributes that start with 2 underscores and end
with 2 underscores though.
Before Marek's patch, that wasn't an issue, we had no attributes like
"_foo" or "__bar_" etc., so lookup_scoped_attribute_spec would
always return NULL for those and we wouldn't try to register them,
look them up etc., just with -Wattributes would warn about them.
But now, as the new testcases show, users can actually request such
attributes to be ignored, and we ICE for those during
register_scoped_attribute and when that is fixed, ICE later on when
somebody uses those attributes because they will be looked up
to find out that they should be ignored.

So, instead of, or in addition to (depending on how performance sensitive
a particular spot is), checking that the attribute doesn't start with
an underscore, the following patch allows attribute names that start
with an underscore as long as they don't canonicalize
(i.e. don't start and end with 2 underscores).
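
The rule can be sketched as below.  The helpers are hypothetical and only
modelled on the description above, not copied from attribs.h; in particular
the requirement that something is left after stripping (length greater than
4) is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Strips the leading and trailing "__" when the name has both.
// Returns true if the name was changed (hypothetical helper).
bool canonicalize_name(std::string& name)
{
    const std::size_t n = name.size();
    if (n > 4 && name.compare(0, 2, "__") == 0
        && name.compare(n - 2, 2, "__") == 0) {
        name = name.substr(2, n - 4);
        assert(!name.empty());
        return true;
    }
    return false;
}

// A lookup may accept a leading underscore as long as the name is already
// canonical, i.e. canonicalizing it would leave it unchanged.
bool acceptable_lookup_name(const std::string& name)
{
    std::string copy = name;
    return !canonicalize_name(copy);
}
```

With this check "_foo" and "__bar_" are fine to look up as they are, while
"__cold__" is rejected because the caller should have passed "cold".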
In addition to that, I've noticed lookup_attribute_by_prefix
was calling get_attribute_name twice unnecessarily, and 2 tests
were running in c++98 mode with -std=c++98 -std=c++11 which IMHO
isn't useful because -std=c++11 testing is done too when testing
all language versions.

2021-11-24 Jakub Jelinek <jakub@redhat.com>

PR middle-end/103365
* attribs.h (lookup_attribute): Allow attr_name to start with
underscore, as long as canonicalize_attr_name returns false.
(lookup_attribute_by_prefix): Don't call get_attribute_name twice.
* attribs.c (extract_attribute_substring): Reimplement using
canonicalize_attr_name.
(register_scoped_attribute): Change gcc_assert into
gcc_checking_assert, verify !canonicalize_attr_name rather than
that str.str doesn't start with '_'.

* c-c++-common/Wno-attributes-1.c: Require effective target
c || c++11 and drop dg-additional-options.
* c-c++-common/Wno-attributes-2.c: Likewise.
* c-c++-common/Wno-attributes-4.c: New test.
* c-c++-common/Wno-attributes-5.c: New test.

bswap: Fix up symbolic merging for xor and plus [PR103376]

On Mon, Nov 22, 2021 at 08:39:42AM -0000, Roger Sayle wrote:
> This patch implements PR tree-optimization/103345 to merge adjacent
> loads when combined with addition or bitwise xor.  The current code
> in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
> so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
> the same way as BIT_IOR_EXPR.

Unfortunately they aren't exactly the same.  They work the same if always
at least one operand (or corresponding byte in it) is known to be 0,
0 | 0 = 0 ^ 0 = 0 + 0 = 0.  But for | also x | x = x for any other x,
so perform_symbolic_merge has been accepting either that at least one
of the bytes is 0 or that both are the same, but that is wrong for ^
and +.

The following patch fixes that by passing through the code of binary
operation and allowing non-zero masked1 == masked2 through only
for BIT_IOR_EXPR.
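
The difference between the three operations can be checked exhaustively on
single bytes.  The sketch below is only an illustration of the identities
quoted above; ByteOp, combine, zero_is_neutral and same_input_is_preserved
are names invented for it.

```cpp
#include <cassert>
#include <cstdint>

enum class ByteOp { Ior, Xor, Plus };   // hypothetical, for this sketch only

std::uint8_t combine(ByteOp op, std::uint8_t a, std::uint8_t b)
{
    switch (op) {
    case ByteOp::Ior:  return static_cast<std::uint8_t>(a | b);
    case ByteOp::Xor:  return static_cast<std::uint8_t>(a ^ b);
    case ByteOp::Plus: return static_cast<std::uint8_t>(a + b);  // carry dropped
    }
    assert(false && "unhandled ByteOp");
    return 0;
}

// True if x op 0 == 0 op x == x for every byte value x.
bool zero_is_neutral(ByteOp op)
{
    for (int x = 0; x < 256; ++x) {
        const auto b = static_cast<std::uint8_t>(x);
        if (combine(op, b, 0) != b || combine(op, 0, b) != b)
            return false;
    }
    return true;
}

// True if x op x == x for every byte value x.
bool same_input_is_preserved(ByteOp op)
{
    for (int x = 0; x < 256; ++x) {
        const auto b = static_cast<std::uint8_t>(x);
        if (combine(op, b, b) != b)
            return false;
    }
    return true;
}
```

Zero is neutral for all three operations, but only | preserves a byte
combined with itself, which is why equal non-zero bytes are allowed through
for BIT_IOR_EXPR only.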

Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
We could allow masked1 == masked2 case for it, but would need to
do something different than the
  n->n = n1->n | n2->n;
we do on all the bytes together.
In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
both variants are the same) and masked1 != 0xff we would need to
clear corresponding n->n byte instead of setting it to the input
as x ^ x = 0 (but if we don't know what x and y are, the result is
also unknown).  Now, for plus it is much harder, because not only
for non-zero operands we don't know what the result is, but it can
modify upper bytes as well.  So perhaps only if current's byte
masked1 && masked2 set the resulting byte to 0xff (unknown) iff
the byte above it is 0 and 0, and set that resulting byte to 0xff too.
Also, even for | we could instead of return NULL just set the resulting
byte to 0xff if it is different, perhaps it will be masked off later on.

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103376
* gimple-ssa-store-merging.c (perform_symbolic_merge): Add CODE
argument.  If CODE is not BIT_IOR_EXPR, ensure that one of masked1
or masked2 is 0.
(find_bswap_or_nop_1, find_bswap_or_nop,
imm_store_chain_info::try_coalesce_bswap): Adjust
perform_symbolic_merge callers.

* gcc.c-torture/execute/pr103376.c: New test.

Avoid redundant get_loop_body calls in IVOPTs

This removes redundant get_loop_body calls in IVOPTs by passing
around the body we're gathering early.

2021-11-23 Richard Biener <rguenther@suse.de>

* tree-ssa-loop-ivopts.c (find_givs): Take loop body as
argument instead of re-computing it.
(find_interesting_uses): Likewise.
(find_induction_variables): Pass through loop body.
(tree_ssa_iv_optimize_loop): Pass down loop body.

middle-end: Fix failures with bitclear patterns on signed values

During testing after rebasing to commit I noticed a failing testcase with the
bitmask compare patch.

Consider the following C++ testcase:

#include <compare>

#define A __attribute__((noipa))
A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; }

This turns into a comparison against chars; on systems where chars are signed,
the pattern inserts an unsigned convert so that it is able to do the
transformation.

i.e.:

  # RANGE [-1, 2]
  # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
  # RANGE ~[3, 254]
  _11 = (unsigned char) c$_M_value_22;
  _19 = _11 <= 1;
  # .MEM_24 = VDEF <.MEM_6(D)>
  D.10434 ={v} {CLOBBER};
  # .MEM_14 = VDEF <.MEM_24>
  D.10407 ={v} {CLOBBER};
  # VUSE <.MEM_14>
  return _19;

instead of:

  # RANGE [-1, 2]
  # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
  # RANGE [-2, 2]
  _3 = c$_M_value_5 & -2;
  _19 = _3 == 0;
  # .MEM_24 = VDEF <.MEM_6(D)>
  D.10440 ={v} {CLOBBER};
  # .MEM_14 = VDEF <.MEM_24>
  D.10413 ={v} {CLOBBER};
  # VUSE <.MEM_14>
  return _19;

This causes much worse codegen under -ffast-math because phiopts no longer
recognizes the pattern.  It turns out that the phiopts spaceship_replacement
function is looking for the exact form that was just changed.

The comments seem to suggest this code only checks for (res & ~1) == 0 but the
implementation seems to suggest it's broader.

As such I added a case that checks whether the value comparison we found is a
type cast, strips away the type cast and continues.

In match.pd the typecasts are only added for signed comparisons to == 0 and != 0
which are then rewritten into comparisons with 1.

As such I only check for 1 and LE and GT, which is what match.pd would have
rewritten it to.
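
That the two forms in the dumps above compute the same thing can be checked
with a small sketch.  The function names are made up for the illustration;
it compares (unsigned char) c <= 1 with (c & ~1) == 0 over a range of
signed char values, including the four values -1, 0, 1 and 2 listed in the
PHI.

```cpp
#include <cassert>

// (unsigned char) c <= 1, the form with the inserted unsigned convert.
bool unsigned_compare_form(signed char c)
{
    return static_cast<unsigned char>(c) <= 1;
}

// (c & ~1) == 0, the bit-clear form.
bool bit_clear_form(signed char c)
{
    return (c & ~1) == 0;
}

// True if both forms agree for every value in [lo, hi].
bool forms_agree(int lo, int hi)
{
    assert(-128 <= lo && lo <= hi && hi <= 127);
    for (int v = lo; v <= hi; ++v) {
        const signed char c = static_cast<signed char>(v);
        if (unsigned_compare_form(c) != bit_clear_form(c))
            return false;
    }
    return true;
}
```

Both forms are true exactly for 0 and 1, so a matcher written against one
form has to be taught about the other explicitly.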

This fixes the regression but this is not code I 100% understand, since I don't
really know the semantics of the spaceship operator so would appreciate an extra
look.

gcc/ChangeLog:

* tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical
codegen.

middle-end: Convert bitclear <imm> + cmp<cc> #0 into cm<cc2> <imm2>

This optimizes the case where a mask Y which fulfills ~Y + 1 == pow2 is used to
clear some bits and the result is then compared against 0 into one without the
masking and a compare against a different bit immediate.

We can do this for all unsigned compares and for signed we can do it for
comparisons of EQ and NE:

(x & (~255)) == 0 becomes x <= 255, which leaves it to the target to
optimally deal with the comparison.
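
For unsigned values the equivalence can be checked directly.  The sketch
below is an illustration with invented helper names, assuming 32-bit
unsigned operands: for a mask y with ~y + 1 a power of two, (x & y) == 0
holds exactly when x <= ~y.

```cpp
#include <cassert>
#include <cstdint>

// True when ~y + 1 is a power of two, i.e. y has only its high bits set.
bool mask_qualifies(std::uint32_t y)
{
    const std::uint32_t p = ~y + 1u;
    return p != 0 && (p & (p - 1)) == 0;
}

// The original form: clear bits with the mask, then compare against 0.
bool masked_form(std::uint32_t x, std::uint32_t y)
{
    return (x & y) == 0;
}

// The rewritten form: a single unsigned compare against ~y.
bool compare_form(std::uint32_t x, std::uint32_t y)
{
    assert(mask_qualifies(y));
    return x <= ~y;
}

// True if both forms agree for every x in [0, limit].
bool forms_agree_up_to(std::uint32_t limit, std::uint32_t y)
{
    for (std::uint32_t x = 0; x <= limit; ++x) {
        if (masked_form(x, y) != compare_form(x, y))
            return false;
        if (x == limit)
            break;                      // avoid wrap-around when limit is max
    }
    return true;
}
```

With y = ~255 the compare form is x <= 255, matching the example in the
message.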

This transformation has to be done in the mid-end because in RTL you don't have
the signs of the comparison operands and if the target needs an immediate this
should be floated outside of the loop.

The RTL loop invariant hoisting is done before split1.

i.e.

void fun1(int32_t *x, int n)
{
    for (int i = 0; i < (n & -16); i++)
      x[i] = (x[i]&(~255)) == 0;
}

now generates:

.L3:
        ldr     q0, [x0]
        cmhs    v0.4s, v2.4s, v0.4s
        and     v0.16b, v1.16b, v0.16b
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L3

and floats the immediate out of the loop.

instead of:

.L3:
        ldr     q0, [x0]
        bic     v0.4s, #255
        cmeq    v0.4s, v0.4s, #0
        and     v0.16b, v1.16b, v0.16b
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L3

In order to not break IVopts and CSE I have added a
requirement for the scalar version to be single use.

gcc/ChangeLog:

* tree.c (bitmask_inv_cst_vector_p): New.
* tree.h (bitmask_inv_cst_vector_p): New.
* match.pd: Use it in new bitmask compare pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/bic-bitmask-10.c: New test.
* gcc.dg/bic-bitmask-11.c: New test.
* gcc.dg/bic-bitmask-12.c: New test.
* gcc.dg/bic-bitmask-13.c: New test.
* gcc.dg/bic-bitmask-14.c: New test.
* gcc.dg/bic-bitmask-15.c: New test.
* gcc.dg/bic-bitmask-16.c: New test.
* gcc.dg/bic-bitmask-17.c: New test.
* gcc.dg/bic-bitmask-18.c: New test.
* gcc.dg/bic-bitmask-19.c: New test.
* gcc.dg/bic-bitmask-2.c: New test.
* gcc.dg/bic-bitmask-20.c: New test.
* gcc.dg/bic-bitmask-21.c: New test.
* gcc.dg/bic-bitmask-22.c: New test.
* gcc.dg/bic-bitmask-23.c: New test.
* gcc.dg/bic-bitmask-3.c: New test.
* gcc.dg/bic-bitmask-4.c: New test.
* gcc.dg/bic-bitmask-5.c: New test.
* gcc.dg/bic-bitmask-6.c: New test.
* gcc.dg/bic-bitmask-7.c: New test.
* gcc.dg/bic-bitmask-8.c: New test.
* gcc.dg/bic-bitmask-9.c: New test.
* gcc.dg/bic-bitmask.h: New test.
* gcc.target/aarch64/bic-bitmask-1.c: New test.

c++: Fix missing NSDMI diagnostic in C++98 [PR103347]

Here the problem is that we aren't detecting a NSDMI in C++98:

struct A {
  void *x = NULL;
};

because maybe_warn_cpp0x uses input_location and that happens to point
to NULL which comes from a system header.  Jakub suggested changing the
location to the '=', thereby avoiding the system header problem.  To
that end, I've added a new location_t member into cp_declarator.  This
member is used when this declarator is part of an init-declarator.  The
rest of the changes is obvious.  I've also taken the liberty of adding
loc_or_input_loc, since I want to avoid checking for UNKNOWN_LOCATION.

PR c++/103347

gcc/cp/ChangeLog:

* cp-tree.h (struct cp_declarator): Add a location_t member.
(maybe_warn_cpp0x): Add a location_t parameter with a default argument.
(loc_or_input_loc): New.
* decl.c (grokdeclarator): Use loc_or_input_loc.  Pass init_loc down
to maybe_warn_cpp0x.
* error.c (maybe_warn_cpp0x): Add a location_t parameter.  Use it.
* parser.c (make_declarator): Initialize init_loc.
(cp_parser_member_declaration): Set init_loc.
(cp_parser_condition): Likewise.
(cp_parser_init_declarator): Likewise.
(cp_parser_parameter_declaration): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-warn1.C: New test.
* g++.dg/cpp0x/nsdmi-warn1.h: New file.

timevar: Add auto_cond_timevar class

The auto_timevar sentinel class for starting and stopping timevars was added
in 2014, but doesn't work for the many uses of timevar_cond_start/stop in
the C++ front end. So let's add one that does.

This allows us to remove a lot of wrapper functions that were just used to
call timevar_cond_stop on all exits from the function.
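
The pattern can be sketched with a small RAII class.  Everything here is a
stand-in written for illustration: Timer, cond_start, cond_stop and
ScopedCondTimer are hypothetical and only mimic the idea of a conditional
start that reports whether the timer was already running, paired with a
stop that does nothing in that case.

```cpp
#include <cassert>

// Hypothetical timer with "conditional" start/stop semantics.
struct Timer {
    bool running = false;
    int starts = 0;
    int stops = 0;
};

// Starts the timer unless it is already running; reports the prior state.
bool cond_start(Timer& t)
{
    const bool was_running = t.running;
    if (!was_running) {
        t.running = true;
        ++t.starts;
    }
    return was_running;
}

// Stops the timer only if the matching cond_start actually started it.
void cond_stop(Timer& t, bool was_running)
{
    if (!was_running) {
        assert(t.running);
        t.running = false;
        ++t.stops;
    }
}

// Sentinel: start in the constructor, stop on every exit from the scope.
class ScopedCondTimer {
public:
    explicit ScopedCondTimer(Timer& t)
        : timer_(t), was_running_(cond_start(t)) {}
    ~ScopedCondTimer() { cond_stop(timer_, was_running_); }
    ScopedCondTimer(const ScopedCondTimer&) = delete;
    ScopedCondTimer& operator=(const ScopedCondTimer&) = delete;

private:
    Timer& timer_;
    bool was_running_;
};

// A function with several exits and a nested call; no wrapper is needed.
int lookup(Timer& t, int key)
{
    ScopedCondTimer guard(t);
    if (key < 0)
        return -1;                      // early exit still stops the timer
    if (key == 0)
        return 0;
    return lookup(t, key - 1) + 1;      // nested call leaves the timer alone
}
```

Because the destructor runs on every return path, the function body no
longer has to be split into an outer wrapper that stops the timer and an
inner function that does the work.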

gcc/ChangeLog:

* timevar.h (class auto_cond_timevar): New.

gcc/cp/ChangeLog:

* call.c
* decl.c
* name-lookup.c:
Use auto_cond_timevar instead of timevar_cond_start/stop.
Remove wrapper functions.

Enhance optimize_atomic_bit_test_and to handle truncation.

r12-5102-gfb161782545224f5 improves integer bit test on
__atomic_fetch_[or|and]_* returns only for nop_convert, i.e.

transform

  mask_5 = 1 << bit_4(D);
  mask.0_1 = (unsigned int) mask_5;
  _2 = __atomic_fetch_or_4 (a_7(D), mask.0_1, 0);
  t1_9 = (int) _2;
  t2_10 = mask_5 & t1_9;

to

  mask_5 = 1 << n_4(D);
  mask.1_1 = (unsigned int) mask_5;
  _11 = .ATOMIC_BIT_TEST_AND_SET (&pscc_a_1_4, n_4(D), 0);
  _8 = (int) _11;

This patch extends the original patch to handle truncation,
i.e.

transform

  long int mask;
  mask_8 = 1 << n_7(D);
  mask.0_1 = (long unsigned int) mask_8;
  _2 = __sync_fetch_and_or_8 (&pscc_a_2_3, mask.0_1);
  _3 = (unsigned int) _2;
  _4 = (unsigned int) mask_8;
  _5 = _3 & _4;
  _6 = (int) _5;

to

  long int mask;
  mask_8 = 1 << n_7(D);
  mask.0_1 = (long unsigned int) mask_8;
  _14 = .ATOMIC_BIT_TEST_AND_SET (&pscc_a_2_3, n_7(D), 0);
  _5 = (unsigned int) _14;
  _6 = (int) _5;
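
At the source level the truncating pattern looks roughly like the sketch
below.  It uses std::atomic rather than the __sync builtin from the dump,
truncated_test_and_set is a made-up name, and whether a given compiler and
target turn it into a single bit-test-and-set instruction is not something
the sketch can promise.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sets bit `bit` in a 64-bit word and returns the old value of that bit,
// computed on the value truncated to unsigned int as in the dump above.
// The sketch restricts `bit` to the low 32 bits so truncation keeps it.
unsigned truncated_test_and_set(std::atomic<std::uint64_t>& word, unsigned bit)
{
    assert(bit < 32);
    const std::uint64_t mask = std::uint64_t{1} << bit;
    const unsigned old_low = static_cast<unsigned>(word.fetch_or(mask));
    return old_low & static_cast<unsigned>(mask);
}
```

The result is non-zero exactly when the bit was already set before the
call.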

2021-11-17  Hongtao Liu  <hongtao.liu@intel.com>
    H.J. Lu  <hongjiu.lu@intel.com>

gcc/ChangeLog:

PR tree-optimization/103194
* match.pd (gimple_nop_atomic_bit_test_and_p): Extended to
match truncation.
* tree-ssa-ccp.c (gimple_nop_convert): Declare.
(optimize_atomic_bit_test_and): Enhance
optimize_atomic_bit_test_and to handle truncation.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103194-2.c: New test.
* gcc.target/i386/pr103194-3.c: New test.
* gcc.target/i386/pr103194-4.c: New test.
* gcc.target/i386/pr103194-5.c: New test.
* gcc.target/i386/pr103194.c: New test.

Daily bump.

Issue -Waddress also for reference members [PR96507].

Resolves:
PR c++/96507 - missing -Waddress for member references

gcc/cp/ChangeLog:

PR c++/96507
* typeck.c (warn_for_null_address): Handle reference members.

gcc/testsuite/ChangeLog:

PR c++/96507
* g++.dg/warn/Waddress-8.C: New test.

Implement -Winfinite-recursion [PR88232].

Resolves:
PR middle-end/88232 - Please implement -Winfinite-recursion

gcc/ChangeLog:

PR middle-end/88232
* Makefile.in (OBJS): Add gimple-warn-recursion.o.
* common.opt: Add -Winfinite-recursion.
* doc/invoke.texi (-Winfinite-recursion): Document.
* passes.def (pass_warn_recursion): Schedule a new pass.
* tree-pass.h (make_pass_warn_recursion): Declare.
* gimple-warn-recursion.c: New file.

gcc/c-family/ChangeLog:

PR middle-end/88232
* c.opt: Add -Winfinite-recursion.

gcc/testsuite/ChangeLog:

PR middle-end/88232
* c-c++-common/attr-used-5.c: Suppress valid warning.
* c-c++-common/attr-used-6.c: Same.
* c-c++-common/attr-used-9.c: Same.
* g++.dg/warn/Winfinite-recursion-2.C: New test.
* g++.dg/warn/Winfinite-recursion-3.C: New test.
* g++.dg/warn/Winfinite-recursion.C: New test.
* gcc.dg/Winfinite-recursion-2.c: New test.
* gcc.dg/Winfinite-recursion.c: New test.

libstdc++: Add another testcase for std::unique_ptr printer [PR103086]

libstdc++-v3/ChangeLog:

PR libstdc++/103086
* testsuite/libstdc++-prettyprinters/cxx11.cc: Check unique_ptr
with non-empty pointer and non-empty deleter.

libstdc++: Add effective-target for std::allocator implementation

This allows tests to be skipped if the std::allocator implementation is
not __gnu_cxx::new_allocator.

The 20_util/allocator/overaligned.cc test requires either C++17 or
new_allocator, otherwise we can't guarantee to return overaligned
memory.

libstdc++-v3/ChangeLog:

* testsuite/18_support/50594.cc: Check effective target.
* testsuite/20_util/allocator/1.cc: Likewise.
* testsuite/20_util/allocator/overaligned.cc: Likewise.
* testsuite/23_containers/unordered_map/96088.cc: Likewise.
* testsuite/23_containers/unordered_multimap/96088.cc: Likewise.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/ext/throw_allocator/check_delete.cc: Likewise.
* testsuite/ext/throw_allocator/check_new.cc: Likewise.
* testsuite/lib/libstdc++.exp (check_effective_target_std_allocator_new):
Define new proc.

Fortran: do not attempt simplification of [LU]BOUND for pointer/allocatable

gcc/fortran/ChangeLog:

PR fortran/103392
* simplify.c (simplify_bound): Do not try to simplify
LBOUND/UBOUND for arrays with POINTER or ALLOCATABLE attribute.

gcc/testsuite/ChangeLog:

PR fortran/103392
* gfortran.dg/bound_simplification_7.f90: New test.

c++: -Wuninitialized for mem-inits and empty classes [PR19808]

This fixes a bogus -Wuninitialized warning: there's nothing to initialize
in empty classes, so don't add them into our uninitialized set.

PR c++/19808

gcc/cp/ChangeLog:

* init.c (emit_mem_initializers): Don't add is_really_empty_class
members into uninitialized.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-28.C: Make a class nonempty.
* g++.dg/warn/Wuninitialized-29.C: Likewise.
* g++.dg/warn/Wuninitialized-31.C: New test.

c++: Add static in g++.dg/warn/Waddress-5.C

While reviewing some other changes I noticed that this test talks
about 'sf' being static, but it wasn't actually marked as such.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Waddress-5.C: Make sf static.

fixincludes: don't abort() on access failure [PR103306]

Some distros may ship dangling symlinks in include directories, which
triggers the access failure. Skip such a file and continue to the next
header instead of aborting.
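
As an illustration only (this is not the fixincl.c code, and the helper
names are made up for the example), skipping an unreadable header rather
than aborting could look roughly like this:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if the header at PATH can be read,
   0 if it should be skipped.  access () follows symlinks, so a
   dangling link fails here.  */
static int
header_is_usable (const char *path)
{
  if (access (path, R_OK) != 0)
    {
      fprintf (stderr, "skipping unreadable header %s\n", path);
      return 0;
    }
  return 1;
}

/* Hypothetical driver: walk the whole list and count the usable
   entries instead of stopping at the first failure.  */
static int
count_usable_headers (const char *const *list, int n)
{
  int usable = 0;
  for (int i = 0; i < n; i++)
    if (header_is_usable (list[i]))
      usable++;
  return usable;
}
```

The point of the sketch is only the control flow: a failed access check
is reported and the loop moves on to the next entry.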

Restore to old behavior before r12-5234 but without resurrecting the
problematic getcwd() call, by using the environment variable "INPUT"
exported by fixinc.sh.

Tested on x86_64-linux-gnu, with a dangling symlink intentionally
injected into /usr/include.

fixincludes/

PR bootstrap/103306
* fixincl.c (process): Don't call abort().

rs6000: Better error messages for power8/9 vector builtins

2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
error messages for ENB_P8V and ENB_P9V.

rs6000: Add [power6-64] stanza to new builtin support

2021-11-23  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin-new.def: Add power6-64 stanza.  Move
CMPB to power6-64 stanza.
* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Handle
ENB_P6_64 case.
(rs6000_new_builtin_is_supported): Likewise.
(rs6000_expand_new_builtin): Likewise.  Clean up formatting.
(rs6000_init_builtins): Handle ENB_P6_64 case.
* config/rs6000/rs6000-gen-builtins.c (bif_stanza): Add BSTZ_P6_64.
(stanza_map): Add entry mapping power6-64 to BSTZ_P6_64.
(enable_string): Add "ENB_P6_64".
(write_decls): Add ENB_P6_64 to bif_enable enum.

rs6000: Fix test_mffsl.c effective target check

Paul Clarke pointed out to me that I had wrongly used a compile-time check
instead of a run-time check in this executable test.  This patch fixes
that.  I also fixed a typo in a string that caught my eye.

2021-11-23  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/testsuite/
* gcc.target/powerpc/test_mffsl.c: Change effective target to
a run-time check.  Fix a typo in a debug print statement.

Fortran: fix scalarization for intrinsic LEN_TRIM with present KIND argument

gcc/fortran/ChangeLog:

PR fortran/87711
PR fortran/87851
* trans-array.c (arg_evaluated_for_scalarization): Add LEN_TRIM to
list of intrinsics for which an optional KIND argument needs to be
removed before scalarization.

gcc/testsuite/ChangeLog:

PR fortran/87711
PR fortran/87851
* gfortran.dg/len_trim.f90: New test.

libcpp: Fix ATTR_LIKELY definition PR preprocessor/103355

Fix the definition of ATTR_LIKELY when __has_cpp_attribute is not
defined, as is the case with old compilers such as gcc-4.8.5.
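
A minimal sketch of a guarded definition follows.  It is not a copy of
system.h; the __cplusplus guard and the is_positive function are
assumptions added so that the example also builds as plain C.  The
important part is that __has_cpp_attribute (likely) is only evaluated
inside a block that is skipped when the operator is not defined.

```c
/* Only evaluate __has_cpp_attribute when it exists; an old compiler
   never parses the inner #if because the outer group is skipped.  */
#if defined (__cplusplus) && defined (__has_cpp_attribute)
# if __has_cpp_attribute (likely)
#  define ATTR_LIKELY [[likely]]
# endif
#endif

/* Fallback: expand to nothing when [[likely]] is unavailable.  */
#ifndef ATTR_LIKELY
# define ATTR_LIKELY
#endif

/* Hypothetical user of the macro.  */
static int
is_positive (int x)
{
  if (x > 0)
    ATTR_LIKELY return 1;
  return 0;
}
```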

libcpp/:
PR preprocessor/103355
* system.h (ATTR_LIKELY): Fix definition.

Remove duplicated param value in modref tree

Modref tree template stores its own copy of param_modref_max_bases, *_max_refs
and *_max_accesses values. This was done before we had per-function limits and
even back then it was a bit dubious, so this patch removes it.

gcc/ChangeLog:

* ipa-modref-tree.h (struct modref_tree): Remove max_bases, max_refs
and max_accesses.
(modref_tree::modref_tree): Remove parameter.
(modref_tree::insert_base): Add max_bases parameter.
(modref_tree::insert): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
(modref_tree::merge): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref.c (dump_records): Don't dump max_refs and max_bases.
(dump_lto_records): Likewise.
(modref_summary::finalize): Fix whitespace.
(get_modref_function_summary): Likewise.
(modref_access_analysis::record_access): Update.
(modref_access_analysis::record_access_lto): Update.
(modref_access_analysis::process_fnspec): Update.
(analyze_function): Update.
(modref_summaries::duplicate): Update.
(modref_summaries_lto::duplicate): Update.
(write_modref_records): Update.
(read_modref_records): Update.
(read_section): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(ipa_merge_modref_summary_after_inlining): Update.

libstdc++: Fix circular dependency for bitmap_allocator [PR103381]

<ext/bitmap_allocator.h> includes <functional>, and since C++17 that
includes <unordered_map>. If std::allocator is defined in terms of
__gnu_cxx::bitmap_allocator then you get a circular reference and
bootstrap fails when compiling src/c++17/*.cc.

libstdc++-v3/ChangeLog:

PR libstdc++/103381
* include/ext/bitmap_allocator.h: Include <bits/stl_function.h>
instead of <functional>.

docs: Remove 2 more duplicate param descriptions.

gcc/ChangeLog:

* doc/invoke.texi: Remove 2 more duplicate param descriptions.

tree-optimization/103361 - fix unroll-and-jam direction vector handling

This properly uses lambda_int instead of truncating the direction
vector to int, which leads to spurious negative values.
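
To illustrate the kind of truncation involved, here is a small sketch.
It assumes lambda_int is a 64-bit signed type and int is 32 bits; the
typedef and function names are invented for the example.

```c
#include <stdint.h>

/* Assumption: model lambda_int as a 64-bit signed integer.  */
typedef int64_t lambda_int_model;

/* Return 1 if storing DIST in a plain int changes its value.  */
static int
truncation_changes_value (lambda_int_model dist)
{
  /* Out-of-range conversion is implementation-defined; on common
     targets it wraps, so a large positive distance can turn
     negative.  */
  int narrowed = (int) dist;
  return (lambda_int_model) narrowed != dist;
}
```

A distance such as 0x80000000 does not fit in a 32-bit int, so keeping
the value in the wider type avoids misreading it as negative.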

2021-11-23 Richard Biener <rguenther@suse.de>

PR tree-optimization/103361
* gimple-loop-jam.c (adjust_unroll_factor): Use lambda_int
for the dependence distance.
* tree-data-ref.c (print_lambda_vector): Properly print a lambda_int.

* g++.dg/torture/pr103361.C: New testcase.

inliner: Remove unused transform_lang_insert_block hook

This hook in struct copy_body_data has always been NULL since the merge
of the tuples branch.  Before that it was used briefly by the C++
FE during ctor/dtor cloning to chain the remapped blocks; earlier still,
transform_lang_insert_block was a bool and the call to insert_block
was done through a langhook.
I'd say that for something that hasn't been used since 4.4 there is
zero chance we'll want to use it again in the near future.

2021-11-23 Jakub Jelinek <jakub@redhat.com>

gcc/
* tree-inline.h (struct copy_body_data): Remove
transform_lang_insert_block member.
* tree-inline.c (remap_block): Don't call
id->transform_lang_insert_block.
(optimize_inline_calls, copy_gimple_seq_and_replace_locals,
tree_function_versioning, maybe_inline_call_in_expr,
copy_fn): Don't initialize id.transform_lang_insert_block.
* gimplify.c (gimplify_omp_loop): Likewise.
gcc/c/
* c-typeck.c (c_clone_omp_udr): Don't initialize
id.transform_lang_insert_block.
gcc/cp/
* semantics.c (clone_omp_udr): Don't initialize
id.transform_lang_insert_block.
* optimize.c (clone_body): Likewise.

Improve bytewise DSE

Testcases modref-dse-4.c and modref-dse-5.c fail on some targets because they
depend on store merging.  What really happens is that without store merging
we produce for kill_me a combined write that is an ao_ref with offset=0, size=32
and max_size=96.  We have size != max_size because we do not track the info that
all 3 writes must happen in a group, and so consider the case where only some
of them are done.

This disables byte-wise DSE, which checks that size == max_size.  This check is
completely unnecessary for a store being proved dead or a load being checked
to not read live bytes.  It is only necessary for the kill store that is used to
prove that a given store is dead.

While looking into this I also noticed that we check that everything is byte
aligned.  This is also unnecessary and with access merging in modref may more
commonly fire on accesses that we could otherwise handle.

This patch fixes both and also changes the interface to normalize_ref, which I
found confusing since it modifies the ref.  Instead we have get_byte_range,
which computes the range in bytes (since that is what we need to maintain the
bitmap) and has an additional parameter specifying whether the store in question
should be turned into a sub-range or a super-range, depending on whether we
compute the range for a kill or a load.
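
The sub-range versus super-range distinction can be sketched as follows.
This is a simplified model and not the tree-ssa-dse.c code; the function
name, the use of long and the BITS_PER_BYTE macro are assumptions made for
the example.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_BYTE 8

/* Hypothetical model: convert the bit range [BIT_OFF, BIT_OFF + BIT_SIZE)
   into a byte range.  For a kill only bytes that are fully covered may
   be used (sub-range); for a load every touched byte counts
   (super-range).  Return false if no byte range results.  */
static bool
byte_range_for_bits (long bit_off, long bit_size, bool kill,
                     long *byte_off, long *byte_size)
{
  long bit_end = bit_off + bit_size;
  long first, last;

  assert (byte_off && byte_size);
  if (bit_off < 0 || bit_size <= 0)
    return false;

  if (kill)
    {
      /* Round the start up and the end down.  */
      first = (bit_off + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
      last = bit_end / BITS_PER_BYTE;
    }
  else
    {
      /* Round the start down and the end up.  */
      first = bit_off / BITS_PER_BYTE;
      last = (bit_end + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
    }

  if (last <= first)
    return false;
  *byte_off = first;
  *byte_size = last - first;
  return true;
}
```

For bits 4..27, the kill variant yields bytes 1 and 2 only, while the load
variant yields bytes 0 through 3.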

gcc/ChangeLog:

2021-11-23  Jan Hubicka  <hubicka@ucw.cz>

PR tree-optimization/103335
* tree-ssa-dse.c (valid_ao_ref_for_dse): Rename to ...
(valid_ao_ref_kill_for_dse): ... this; do not check that boundaries
are divisible by BITS_PER_UNIT.
(get_byte_aligned_range_containing_ref): New function.
(get_byte_aligned_range_contained_in_ref): New function.
(normalize_ref): Rename to ...
(get_byte_range): ... this one; handle accesses not aligned to byte
boundary; return range in bytes rather than updating ao_ref.
(clear_live_bytes_for_ref): Take write ref by reference; simplify using
get_byte_access.
(setup_live_bytes_from_ref): Likewise.
(clear_bytes_written_by): Update.
(live_bytes_read): Update.
(dse_classify_store): Simplify tech before live_bytes_read checks.

gcc/testsuite/ChangeLog:

2021-11-23  Jan Hubicka  <hubicka@ucw.cz>

* gcc.dg/tree-ssa/modref-dse-4.c: Update template.
* gcc.dg/tree-ssa/modref-dse-5.c: Update template.

Canonicalize &MEM[ssa_n, CST] to ssa_n p+ CST in fold_stmt_1

This is a new version of the patch to fix PR 102216.
Instead of doing the canonicalization inside forwprop, Richi
mentioned we should do it inside fold_stmt_1 and that is what
this patch does.
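
At the source level the two forms denote the same address.  The following
sketch is only an analogy in C, not GIMPLE; the struct and function names
are invented for the example.

```c
#include <stddef.h>

struct pair_model
{
  int first;
  int second;
};

/* Return 1 if the address of a member at a constant offset equals
   the base pointer plus that constant, i.e. the "&MEM[p, CST]" and
   "p p+ CST" views agree.  */
static int
same_address (struct pair_model *p)
{
  char *via_mem = (char *) &p->second;
  char *via_plus = (char *) p + offsetof (struct pair_model, second);
  return via_mem == via_plus;
}
```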

PR tree-optimization/102216

gcc/ChangeLog:

* gimple-fold.c (fold_stmt_1): Add canonicalization
of "&MEM[ssa_n, CST]" to "ssa_n p+ CST", note this
can only be done if !in_place.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr102216-1.C: New test.
* g++.dg/tree-ssa/pr102216-2.C: New test.

openmp: Fix up handling of reduction clauses on the loop construct [PR102431]

We were using unshare_expr and walk_tree_without_duplicate replacement
of the placeholder vars.  The OMP_CLAUSE_REDUCTION_{INIT,MERGE} can contain
other trees that need to be duplicated though, e.g. BLOCKs referenced in
BIND_EXPR(s), or local VAR_DECLs.  This patch uses the inliner code to copy
all of that.  There is a slight complication that those local VAR_DECLs or
placeholders don't have DECL_CONTEXT set, they will get that only when
they are gimplified later on, so this patch sets DECL_CONTEXT for those
temporarily and resets it afterwards.

2021-11-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102431
* gimplify.c (replace_reduction_placeholders): Remove.
(note_no_context_vars): New function.
(gimplify_omp_loop): For OMP_PARALLEL's BIND_EXPR create a new
BLOCK.  Use copy_tree_body_r with walk_tree instead of unshare_expr
and replace_reduction_placeholders for duplication of
OMP_CLAUSE_REDUCTION_{INIT,MERGE} expressions.  Ensure all mentioned
automatic vars have DECL_CONTEXT set to non-NULL before doing so
and reset it afterwards for those vars and their corresponding
vars.

* c-c++-common/gomp/pr102431.c: New test.
* g++.dg/gomp/pr102431.C: New test.
* gfortran.dg/gomp/pr102431.f90: New test.

rs6000: Optimize code generation of vec_reve [PR100868]

gcc/
PR target/100868
* config/rs6000/altivec.md (altivec_vreve<mode>2 for VEC_K): Use
xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si
or v4sf when p9_vector is set.
(altivec_vreve<mode>2 for VEC_64): Defined. Implemented by xxswapd.

gcc/testsuite/
PR target/100868
* gcc.target/powerpc/vec_reve_1.c: New test.
* gcc.target/powerpc/vec_reve_2.c: Likewise.

contrib: filter out -Wc++20-extensions

contrib/ChangeLog:

* filter-clang-warnings.py: Filter -Wc++20-extensions as it does
not respect proper attribute detection.

contrib: Support itemx in check-params-in-docs.py.

contrib/ChangeLog:

* check-params-in-docs.py: Support @itemx in param documentation
and support multi-line documentation for parameters.

Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd

PR tree-optimization/102232

gcc/
* match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization.

gcc/testsuite/

* gcc.dg/tree-ssa/pr102232.c: Testcase for this optimization.
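
The identity behind the pattern follows from y == x * (y / x) + y % x,
so x * (1 + y / x) - y == x + x * (y / x) - y == x - y % x.  A small
brute-force check, with function names invented for the example and
operands kept small enough to avoid overflow, could look like this:

```c
/* Left-hand side of the pattern; X must be non-zero.  */
static int
lhs_form (int x, int y)
{
  return x * (1 + y / x) - y;
}

/* Simplified right-hand side.  */
static int
rhs_form (int x, int y)
{
  return x - y % x;
}

/* Return 1 if both forms agree for all non-zero X and all Y in
   [-LIMIT, LIMIT].  */
static int
forms_agree (int limit)
{
  for (int x = -limit; x <= limit; x++)
    {
      if (x == 0)
        continue;
      for (int y = -limit; y <= limit; y++)
        if (lhs_form (x, y) != rhs_form (x, y))
          return 0;
    }
  return 1;
}
```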

libcpp: Use [[likely]] conditionally

Let's hide [[likely]] behind a macro, to suppress warnings if the
compiler doesn't support it.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
PR preprocessor/103355

libcpp/ChangeLog:

* lex.c: Use ATTR_LIKELY instead of [[likely]].
* system.h (ATTR_LIKELY): Define.

Re: [PATCH] PR tree-optimization/96779 Adding a missing pattern to match.pd

PR tree-optimization/96779
gcc/
* match.pd (-x == x) -> (x == 0): New optimization.

gcc/testsuite
* gcc.dg/tree-ssa/pr96779.c: Testcase for this optimization.
* gcc.dg/tree-ssa/pr96779-disabled.c: Testcase for this optimization
when -fwrapv passed.
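
The reason wrapping arithmetic matters can be shown with unsigned
arithmetic, which always wraps.  This is only an illustration; the
function name is made up for the example.

```c
#include <limits.h>

/* Model "-x == x" under wrapping arithmetic by using unsigned.  */
static int
negation_equals_self_wrapping (unsigned x)
{
  return (0u - x) == x;
}
```

With wrapping, the comparison holds not only for 0 but also for the value
with just the top bit set (UINT_MAX / 2 + 1), so it cannot be replaced by
x == 0 there.  That is presumably why the second testcase checks the
-fwrapv case separately.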

Daily bump.

c++: remember pointer-to-member location

Jakub recently mentioned that a PTRMEM_CST has no location; let's give it a
location wrapper.

gcc/cp/ChangeLog:

* typeck.c (build_x_unary_op): Set address location.
(convert_member_func_to_ptr): Handle location wrapper.
* pt.c (convert_nontype_argument): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash106.C: Adjust.
* g++.dg/diagnostic/ptrtomem3.C: New test.

c++: improved return expression location

Stripping the location wrapper from retval meant we didn't have the
necessary location information for any conversion diagnostics. We only need
the stripping for the named return value optimization, let's use the
unstripped expression for everything else.

gcc/cp/ChangeLog:

* typeck.c (check_return_expr): Only strip location wrapper during
NRV handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/pr65327.C: Adjust location.
* g++.dg/cpp23/constexpr-nonlit4.C: Likewise.
* g++.dg/cpp23/constexpr-nonlit5.C: Likewise.
* g++.dg/cpp2a/constexpr-init1.C: Likewise.

libcpp: Fix _Pragma stringification [PR103165]

As the testcase shows, sometimes _Pragma is turned into CPP_PRAGMA
.. CPP_PRAGMA_EOL tokens, even when it might still need to be
stringized later on.  We are then ICEing because we don't handle
stringification of CPP_PRAGMA or CPP_PRAGMA_EOL, but trying to
reconstruct the exact tokens with exact spacing after it has been
lowered is very hard.  So, instead this patch ensures we don't
lower _Pragma during expand_arg calls, but only later when
cpp_get_token_1 is called outside of expand_arg.

2021-11-22  Jakub Jelinek  <jakub@redhat.com>
    Tobias Burnus  <tobias@codesourcery.com>

PR preprocessor/103165
libcpp/
* internal.h (struct lexer_state): Add ignore__Pragma field.
* macro.c (builtin_macro): Don't interpret _Pragma if
pfile->state.ignore__Pragma.
(expand_arg): Temporarily set pfile->state.ignore__Pragma to 1.
gcc/testsuite/
* c-c++-common/gomp/pragma-3.c: New test.
* c-c++-common/gomp/pragma-4.c: New test.
* c-c++-common/gomp/pragma-5.c: New test.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>

tree-optimization/103345: Improved load merging.

This patch implements PR tree-optimization/103345 to merge adjacent
loads when combined with addition or bitwise xor.  The current code
in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
the same way as BIT_IOR_EXPR.  Many thanks to Andrew Pinski for
pointing out that this also resolves PR target/98953.
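
As an illustration of why the three operators are interchangeable here:
when each loaded byte is shifted into its own lane, no bits overlap, so
ior, addition and xor all produce the same value.  The functions below
are an example written for this note, not the new testcases.

```c
#include <stdint.h>

/* Little-endian 32-bit load assembled with bitwise ior.  */
static uint32_t
load_le_ior (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Same load assembled with addition; the lanes are disjoint, so no
   carries occur.  */
static uint32_t
load_le_plus (const unsigned char *p)
{
  return (uint32_t) p[0] + ((uint32_t) p[1] << 8)
         + ((uint32_t) p[2] << 16) + ((uint32_t) p[3] << 24);
}

/* Same load assembled with xor.  */
static uint32_t
load_le_xor (const unsigned char *p)
{
  return (uint32_t) p[0] ^ ((uint32_t) p[1] << 8)
         ^ ((uint32_t) p[2] << 16) ^ ((uint32_t) p[3] << 24);
}
```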

2021-11-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR tree-optimization/98953
PR tree-optimization/103345
* gimple-ssa-store-merging.c (find_bswap_or_nop_1): Handle
BIT_XOR_EXPR and PLUS_EXPR the same as BIT_IOR_EXPR.
(pass_optimize_bswap::execute): Likewise.

gcc/testsuite/ChangeLog
PR tree-optimization/98953
PR tree-optimization/103345
* gcc.dg/tree-ssa/pr98953.c: New test case.
* gcc.dg/tree-ssa/pr103345.c: New test case.

docs: remove duplicate param documentation

gcc/ChangeLog:

* doc/invoke.texi: Remove duplicate documentation for 3 params.

openacc: Fix up C++ #pragma acc routine handling [PR101731]

The following testcase ICEs because two function declarations are nested in
each other and the acc routine handling code isn't prepared to put the
pragma on both.

The fix is similar to what #pragma omp declare {simd,variant} does,
in particular set the fndecl_seen flag already in cp_parser_late_parsing*
when we encounter it rather than only after we finalize it.

In cp_finalize_oacc_routine I had to move the fndecl_seen diagnostics to
non-FUNCTION_DECL block, because for FUNCTION_DECLs the flag is already
known to be set from cp_parser_late_parsing_oacc_routine, but can't be
removed altogether, because that regresses quality of 2 goacc/routine-5.c
diagnostics - we drop "a single " from the
'#pragma acc routine' not immediately followed by a single function declaration or definition
diagnostic say on
#pragma acc routine
int foo (), b;
if we drop it altogether.

2021-11-22 Jakub Jelinek <jakub@redhat.com>

PR c++/101731
* parser.c (cp_parser_late_parsing_oacc_routine): Set
parser->oacc_routine->fndecl_seen here, rather than ...
(cp_finalize_oacc_routine): ... here. Don't error if
parser->oacc_routine->fndecl_seen is set for FUNCTION_DECLs.

* c-c++-common/goacc/routine-6.c: New test.

libstdc++: Fix condition for definition of _GLIBCXX14_DEPRECATED

The check for C++14 was using the wrong date.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX14_DEPRECATED): Fix condition
checking for C++14.

libgcc: Remove dbase member from struct unw_eh_callback_data if NULL

Only bfin, frv, i386 and nios2 need this member at present.

libgcc/ChangeLog

* unwind-dw2-fde-dip.c (NEED_DBASE_MEMBER): Define.
(struct unw_eh_callback_data): Make dbase member conditional.
(unw_eh_callback_data_dbase): New function.
(base_from_cb_data): Simplify for the non-dbase case.
(_Unwind_IteratePhdrCallback): Adjust.
(_Unwind_Find_FDE): Likewise.

libgcc: Remove tbase member from struct unw_eh_callback_data

It is always a null pointer.

libgcc/ChangeLog

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Remove
tbase member.
(base_from_cb_data): Adjust.
(_Unwind_IteratePhdrCallback): Likewise.
(_Unwind_Find_FDE): Likewise.

tree-optimization/103351 - avoid compare-debug issue wrt CD-DCE change

This avoids differences in the split edge of a cluster due to different
order of same key PHI args when sorting by sorting after the edge
destination index as second key.
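
The general technique, a comparator that falls back to a second key so
that equal primary keys still get a deterministic order, can be sketched
as below.  The struct and its fields are a simplified model invented for
the example, not the data sort_phi_args actually works on.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a PHI argument: a primary sort key plus the
   destination index of its edge.  */
struct phi_arg_model
{
  int key;
  unsigned dest_idx;
};

/* Compare by KEY first and by DEST_IDX second, so elements with the
   same key never compare equal unless they are the same edge.  */
static int
cmp_phi_arg_model (const void *a, const void *b)
{
  const struct phi_arg_model *x = (const struct phi_arg_model *) a;
  const struct phi_arg_model *y = (const struct phi_arg_model *) b;

  if (x->key != y->key)
    return x->key < y->key ? -1 : 1;
  if (x->dest_idx != y->dest_idx)
    return x->dest_idx < y->dest_idx ? -1 : 1;
  return 0;
}

static void
sort_phi_args_model (struct phi_arg_model *v, size_t n)
{
  assert (v != NULL || n == 0);
  qsort (v, n, sizeof *v, cmp_phi_arg_model);
}
```

Since qsort is not stable, relying on the primary key alone would leave
the relative order of equal keys unspecified; the second key removes
that freedom.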

2021-11-22 Richard Biener <rguenther@suse.de>

PR tree-optimization/103351
* tree-ssa-dce.c (sort_phi_args): Sort after e->dest_idx as
second key.

* g++.dg/torture/pr103351.C: New testcase.

openmp: Handle OMP_MASKED in potential_constant_expression_1 [PR103349]

When adding OMP_MASKED, I apparently forgot to handle it in
potential_constant_expression_1, which means we can ICE on it.

2021-11-22 Jakub Jelinek <jakub@redhat.com>

PR c++/103349
* constexpr.c (potential_constant_expression_1): Punt on OMP_MASKED.

* g++.dg/gomp/masked-1.C: New test.

Don't allow mask/sse/mmx mov in TLS code sequences.

Following a change in the assembler (see [1]), this patch disallows
mask/sse/mmx mov in TLS code sequences, which require integer MOV instructions.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=patch;h=d7e3e627027fcf37d63e284144fe27ff4eba36b5

gcc/ChangeLog:

PR target/103275
* config/i386/constraints.md (Bk): New
define_memory_constraint.
* config/i386/i386-protos.h (ix86_gpr_tls_address_pattern_p):
Declare.
* config/i386/i386.c (ix86_gpr_tls_address_pattern_p): New
function.
* config/i386/i386.md (*movsi_internal): Don't allow
mask/sse/mmx move in TLS code sequences.
(*movdi_internal): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103275.c: New test.

xtensa: Fix non-robust split condition in define_insn_and_split

This patch fixes some non-robust split conditions in some
define_insn_and_splits, so that each of them is applied on top of
the corresponding condition for the define_insn part; otherwise the
splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/xtensa/xtensa.md (movdi_internal, movdf_internal): Fix split
condition.

Daily bump.

fortran, debug: Fix up DW_AT_rank [PR103315]

For DW_AT_rank we were emitting
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x6     # DW_OP_deref
on 64-bit and
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x6     # DW_OP_deref
on 32-bit.  I think this is wrong, as dtype.rank field in the descriptor
has unsigned char type, not pointer type nor pointer sized integral.
E.g. if we have a
    REAL :: a(..)
dummy argument, which is passed as a reference to the function descriptor,
we want to evaluate a->dtype.rank.  The above DWARF expressions perform
*(uintptr_t *)(a + 0x1c)
and
*(uintptr_t *)(a + 0x10)
respectively.  The following patch changes those to:
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
and
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
which perform
*(unsigned char *)(a + 0x1c)
and
*(unsigned char *)(a + 0x10)
respectively.
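
The difference between the two reads can be reproduced in C.  The
descriptor below is a made-up stand-in (the layout and names are
assumptions, not the gfortran descriptor); it only places a one-byte rank
next to unrelated bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a descriptor: a one-byte rank followed
   by unrelated data.  */
struct descr_model
{
  void *base;
  unsigned char rank;
  unsigned char tail[sizeof (uintptr_t)];
};

/* Pointer-sized read at the rank offset, like DW_OP_deref.  */
static uintptr_t
read_rank_wide (const struct descr_model *d)
{
  uintptr_t v;
  memcpy (&v, (const char *) d + offsetof (struct descr_model, rank),
          sizeof v);
  return v;
}

/* One-byte read at the rank offset, like DW_OP_deref_size 1.  */
static unsigned
read_rank_byte (const struct descr_model *d)
{
  return *((const unsigned char *) d
           + offsetof (struct descr_model, rank));
}
```

Whenever the bytes following the rank are non-zero, the wide read picks
them up as part of the value, whereas the one-byte read returns the rank
alone.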

2021-11-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/103315
* trans-types.c (gfc_get_array_descr_info): Use DW_OP_deref_size 1
instead of DW_OP_deref for DW_AT_rank.

i386: Fix up handling of target attribute [PR101180]

As shown in the testcase below, if a function has multiple target attributes
(rather than a single one with one or more arguments) or if a function
gets one target attribute on one declaration and another one on another
declaration, on x86 their effect is not combined into
DECL_FUNCTION_SPECIFIC_TARGET, but instead only the last processed target
attribute wins. aarch64 handles this right, the following patch follows
what it does, i.e. only start with target_option_default_node if
DECL_FUNCTION_SPECIFIC_TARGET is previously NULL (i.e. the first target
attribute being processed on a function) and otherwise start from the
previous DECL_FUNCTION_SPECIFIC_TARGET.
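
The intended accumulation can be modelled with a few lines of C.
Everything here (the flag values, the struct and the function) is a
hypothetical model and not the i386-options.c code.

```c
/* Hypothetical option bits.  */
#define MODEL_DEFAULT_FLAGS 0x1u
#define MODEL_FLAG_AVX      0x2u
#define MODEL_FLAG_POPCNT   0x4u

/* Hypothetical function declaration with an optional
   function-specific target setting.  */
struct fndecl_model
{
  int has_specific_target;
  unsigned specific_target;
};

/* Apply one target attribute: start from the defaults only for the
   first attribute, otherwise build on what is already recorded.  */
static void
apply_target_attribute (struct fndecl_model *fn, unsigned flags)
{
  unsigned base = fn->has_specific_target
                  ? fn->specific_target : MODEL_DEFAULT_FLAGS;
  fn->specific_target = base | flags;
  fn->has_specific_target = 1;
}
```

Always starting from the defaults would make the second call overwrite
the first, which corresponds to the "last attribute wins" behavior
described above.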

2021-11-21 Jakub Jelinek <jakub@redhat.com>

PR c++/101180
* config/i386/i386-options.c (ix86_valid_target_attribute_p): If
fndecl already has DECL_FUNCTION_SPECIFIC_TARGET, use that as base
instead of target_option_default_node.

* gcc.target/i386/pr101180.c: New test.

Fortran: fix lookup for gfortran builtin math intrinsics used by DEC extensions

gcc/fortran/ChangeLog:

PR fortran/99061
* trans-intrinsic.c (gfc_lookup_intrinsic): Helper function for
looking up gfortran builtin intrinsics.
(gfc_conv_intrinsic_atrigd): Use it.
(gfc_conv_intrinsic_cotan): Likewise.
(gfc_conv_intrinsic_cotand): Likewise.
(gfc_conv_intrinsic_atan2d): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/99061
* gfortran.dg/dec_math_5.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>

Improve base tracking in ipa-modref

On the exchange2 benchmark we miss some useful propagation because modref gives
up very early on analyzing accesses through pointers.  For example in
int test (int *a)
{
  int i;
  for (i=0; a[i];i++);
  return i+a[i];
}

We are not able to determine that a[i] accesses are relative to a.
This is because get_access requires the SSA name that is in MEM_REF to be
PARM_DECL while in other places we use the ipa-prop helper to work out the
proper base pointers.

This patch commonizes the code in get_access and parm_map_for_arg so both
use the check properly and extends it to also figure out that newly allocated
memory is not a side effect to caller.

gcc/ChangeLog:

2021-11-21  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103227
* ipa-modref.c (parm_map_for_arg): Rename to ...
(parm_map_for_ptr): ... this one; handle static chain and calls to
malloc functions.
(modref_access_analysis::get_access): Use parm_map_for_ptr.
(modref_access_analysis::process_fnspec): Update.
(modref_access_analysis::analyze_load): Update.
(modref_access_analysis::analyze_store): Update.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103227
* gcc.dg/tree-ssa/modref-15.c: New test.

Fix failure of merge_block.c testcase

gcc/testsuite/ChangeLog:

2021-11-21 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103264
* gcc.dg/tree-prof/merge_block.c: Add -fno-ipa-modref

Refactor load/store/kill analysis in ipa-modref

Refactor load/store/kill analysis in ipa-modref to a class
modref_access_analysis. This is done in order to avoid some code duplication
and early exits that have turned out to be hard to maintain; there were
multiple bugs we noticed recently.

gcc/ChangeLog:

2021-11-21 Jan Hubicka <hubicka@ucw.cz>

* ipa-modref.c (ignore_nondeterminism_p): Move earlier in source
code.
(ignore_retval_p): Likewise.
(ignore_stores_p): Likewise.
(parm_map_for_arg): Likewise.
(class modref_access_analysis): New class.
(modref_access_analysis::set_side_effects): New member function.
(modref_access_analysis::set_nondeterministic): New member function.
(get_access): Turn to ...
(modref_access_analysis::get_access): ... this one.
(record_access): Turn to ...
(modref_access_analysis::record_access): ... this one.
(record_access_lto): Turn to ...
(modref_access_analysis::record_access_lto): ... This one.
(record_access_p): Turn to ...
(modref_access_analysis::record_access_p): ... This one
(modref_access_analysis::record_unknown_load): New member function.
(modref_access_analysis::record_unknown_store): New member function.
(get_access_for_fnspec): Turn to ...
(modref_access_analysis::get_access_for_fnspec): ... this one.
(merge_call_side_effects): Turn to ...
(modref_access_analysis::merge_call_side_effects): ... this one.
(collapse_loads): Move later in source code.
(collapse_stores): Move later in source code.
(process_fnspec): Turn to ...
(modref_access_analysis::process_fnspec): ... this one.
(analyze_call): Turn to ...
(modref_access_analysis::analyze_call): ... this one.
(struct summary_ptrs): Remove.
(analyze_load): Turn to ...
(modref_access_analysis::analyze_load): ... this one.
(analyze_store): Turn to ...
(modref_access_analysis::analyze_store): ... this one.
(analyze_stmt): Turn to ...
(modref_access_analysis::analyze_stmt): ... This one.
(remove_summary): Remove.
(modref_access_analysis::propagate): Break out from ...
(modref_access_analysis::analyze): Break out from ...
(analyze_function): ... here.

Tweak tree-ssa-math-opts.c to solve PR target/102117.

This patch resolves PR target/102117 on s390.  The problem is that
some of the functionality of GCC's RTL expanders is no longer triggered
following the transition to tree SSA form.  On s390, unsigned widening
multiplications are converted into WIDEN_MULT_EXPR (aka w* in tree dumps),
but signed widening multiplies are left in their original form, which
alas doesn't benefit from the clever logic in expand_widening_mult.

The fix is to teach convert_mult_to_widen, that RTL expansion can
synthesize a signed widening multiplication if the target provides
a suitable umul_widen_optab.

On s390-linux-gnu with -O2 -m64, the code in the bugzilla PR currently
generates:

imul128:
        stmg    %r12,%r13,96(%r15)
        srag    %r0,%r4,63
        srag    %r1,%r3,63
        lgr     %r13,%r3
        mlgr    %r12,%r4
        msgr    %r1,%r4
        msgr    %r0,%r3
        lgr     %r4,%r12
        agr     %r1,%r0
        lgr     %r5,%r13
        agr     %r4,%r1
        stmg    %r4,%r5,0(%r2)
        lmg     %r12,%r13,96(%r15)
        br      %r14

but with this patch should now generate the more efficient:

imul128:
        lgr     %r1,%r3
        mlgr    %r0,%r4
        srag    %r5,%r3,63
        ngr     %r5,%r4
        srag    %r4,%r4,63
        sgr     %r0,%r5
        ngr     %r4,%r3
        sgr     %r0,%r4
        stmg    %r0,%r1,0(%r2)
        br      %r14
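
The arithmetic behind synthesizing a signed widening multiply from an
unsigned one can be sketched in C at a smaller width (32x32->64 instead of
64x64->128).  This is an illustration of the identity, not the code in
expand_widening_mult; the function name is invented.  Writing ua and ub
for the operands reinterpreted as unsigned, the signed product equals
ua * ub with ub subtracted from the high half when a is negative and ua
subtracted when b is negative.

```c
#include <stdint.h>
#include <string.h>

/* Signed 32x32->64 multiply built from an unsigned widening multiply
   plus two masked corrections of the high half.  */
static int64_t
smul_wide_via_umul (int32_t a, int32_t b)
{
  uint32_t ua = (uint32_t) a, ub = (uint32_t) b;
  uint64_t prod = (uint64_t) ua * ub;
  uint32_t hi = (uint32_t) (prod >> 32);
  uint32_t lo = (uint32_t) prod;

  /* All-ones masks when the operand is negative, like an arithmetic
     shift right by the word size minus one.  */
  uint32_t a_neg = a < 0 ? UINT32_MAX : 0;
  uint32_t b_neg = b < 0 ? UINT32_MAX : 0;

  hi -= a_neg & ub;
  hi -= b_neg & ua;

  uint64_t bits = ((uint64_t) hi << 32) | lo;
  int64_t result;
  memcpy (&result, &bits, sizeof result);   /* reinterpret the bits */
  return result;
}
```

The shift, and and subtract steps in the shorter assembly sequence above
appear to follow the same shape: two sign masks, two ands, two subtracts
from the high part.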

2021-11-21  Roger Sayle  <roger@nextmovesoftware.com>
    Robin Dapp  <rdapp@linux.ibm.com>

gcc/ChangeLog
PR target/102117
* tree-ssa-math-opts.c (convert_mult_to_widen): Recognize
signed WIDEN_MULT_EXPR if the target supports umul_widen_optab.

gcc/testsuite/ChangeLog
PR target/102117
* gcc.target/s390/mul-wide.c: New test case.
* gcc.target/s390/umul-wide.c: New test case.

Daily bump.

Fix ignore_nondeterminism_p in ipa-modref

Improve debug output in ipa-modref and fix ignore_nondeterminism predicate:
looping pure and const functions are still deterministic.

gcc/ChangeLog:

2021-11-21 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103052
* ipa-modref.c (ignore_nondeterminism_p): Allow looping pure/const.
(merge_call_side_effects): Improve debug output.

Fix looping flag discovery in ipa-pure-const

The testcase shows a situation where there is a non-trivial cycle in the
callgraph involving a noreturn call.  This cycle is important for const
function discovery but not important for pure.  IPA pure const uses the same
strongly connected components for both propagations, which makes it get a
suboptimal result (it does not detect the pure flag).  However, local pure
const gets the situation right because it processes functions in the right
order.  This hits rarely executed code in propagate_pure_const that merges
results with the previously known state and has a long-standing bug that
makes it throw away the looping flag.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:

2021-11-21  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103052
* ipa-pure-const.c (propagate_pure_const): Fix merging of looping flag.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103052
* gcc.c-torture/execute/pr103052.c: New test.

Clobber the condition code in the bfin doloop patterns

Per Aldy's excellent, but tough to follow analysis in PR 103226, this patch
fixes the bfin-elf regression.

In simplest terms the doloop patterns on this port may clobber the condition
code register, but they do not expose that until after register allocation.
That would be fine, except that other patterns have exposed CC earlier.  As
a result the dataflow, particularly for CC, is incorrect.

This leads the register allocators to assume that a value in CC outside the
loop is still valid inside the loop when in fact, the value has been
clobbered.  This is what caused pr80974 to start failing.

With this fix, not only do we fix the pr80974 regression, but we fix ~20
other execution failures in the port.  It also reduces test time for the
port from ~90 minutes to ~60 minutes.

PR tree-optimization/103226
gcc/
* config/bfin/bfin.md (doloop pattern, splitter and expander): Clobber
CC.

libstdc++: [_GLIBCXX_DEBUG] Reduce performance impact on std::erase_if

Bypass the additional _GLIBCXX_DEBUG checks in std::__detail::__erase_nodes_if
used by all implementations of std::erase_if for node-based containers.

libstdc++-v3/ChangeLog:

* include/bits/erase_if.h (__erase_nodes_if): Add _UnsafeContainer template
parameter. Use it to get iterators to work with.
* include/debug/macros.h (__glibcxx_check_erase2): New.
* include/debug/map.h (map<>::erase(_Base_const_iterator)): New.
(map<>::erase(const_iterator)): Use latter.
* include/debug/multimap.h (multimap<>::erase(_Base_const_iterator)): New.
(multimap<>::erase(const_iterator)): Use latter.
* include/debug/multiset.h (multiset<>::erase(_Base_const_iterator)): New.
(multiset<>::erase(const_iterator)): Use latter.
* include/debug/set.h (set<>::erase(_Base_const_iterator)): New.
(set<>::erase(const_iterator)): Use latter.
* include/debug/unordered_map (unordered_map<>::erase(_Base_const_iterator)): New.
(unordered_multimap<>::erase(const_iterator)): New.
* include/debug/unordered_set (unordered_set<>::erase(_Base_const_iterator)): New.
(unordered_multiset<>::erase(const_iterator)): New.
* include/experimental/map (erase_if): Adapt.
* include/experimental/set (erase_if): Adapt.
* include/experimental/unordered_map (erase_if): Adapt.
* include/experimental/unordered_set (erase_if): Adapt.
* include/std/map (erase_if): Adapt.
* include/std/set (erase_if): Adapt.
* include/std/unordered_map (erase_if): Adapt.
* include/std/unordered_set (erase_if): Adapt.

Fix tree-optimization/103220: Another missing folding of (type) X op CST where type is a nop convert

The problem here is that int_fits_type_p will return false if we just
change the sign of things like -2 (or 254) so we should accept the case
where we just change the sign (and not the precision) of the type.
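
As a rough illustration (assuming a 32-bit two's-complement int; the function
names below are made up for this sketch), the constant -2 does not fit in an
unsigned type as a value, yet it has the same bit pattern as 4294967294, so a
bitwise operation gives the same result on either side of a sign-only
conversion:

```cpp
#include <cassert>
#include <cstdint>

// Convert first, then mask with the unsigned constant 4294967294 (0xfffffffe).
inline std::uint32_t
convert_then_and(std::int32_t x)
{
  return static_cast<std::uint32_t>(x) & 4294967294u;
}

// Mask with -2 in the signed type, then convert.
inline std::uint32_t
and_then_convert(std::int32_t x)
{
  return static_cast<std::uint32_t>(x & -2);
}

// Both orders agree because only the sign, not the precision, changes.
inline void
check_sign_change_equivalence()
{
  assert(convert_then_and(7) == and_then_convert(7));
  assert(convert_then_and(-1) == and_then_convert(-1));
}
```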

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/103220

gcc/ChangeLog:

* match.pd ((type) X bitop CST): Don't check if CST
fits into the type if only the sign changes.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr103220-1.c: New test.
* gcc.dg/tree-ssa/pr103220-2.c: New test.
* gcc.dg/pr25530.c: Update test to check for
4294967294 in the case -2 is not matched.

harden conds: detach without decls

When we create copies of SSA_NAMEs to hold "detached" copies of the
values for the hardening tests, we end up with assignments to
SSA_NAMEs that refer to the same decls.  That would be generally
desirable, since it enables the variable to be recognized in dumps,
and makes coalescing more likely if the original variable dies at that
point.  When the decl is a DECL_BY_REFERENCE, the SSA_NAME holds the
address of a parm or result, and it's read-only, so we shouldn't
create assignments to it.  Gimple checkers flag at least the case of
results.

This patch arranges for us to avoid referencing the same decls, which
cures the problem, but retaining the visible association between the
SSA_NAMEs, by using the same identifier for the copy.

for  gcc/ChangeLog

PR tree-optimization/102988
* gimple-harden-conditionals.cc (detach_value): Copy SSA_NAME
without decl sharing.

for  gcc/testsuite/ChangeLog

PR tree-optimization/102988
* g++.dg/pr102988.C: New.

libgccjit: Add some reflection functions [PR96889]

2021-11-19 Antoni Boucher <bouanto@zoho.com>

gcc/jit/
PR target/96889
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_16): New ABI tag.
* docs/topics/functions.rst: Add documentation for the
functions gcc_jit_function_get_return_type and
gcc_jit_function_get_param_count
* docs/topics/types.rst: Add documentation for the functions
gcc_jit_function_type_get_return_type,
gcc_jit_function_type_get_param_count,
gcc_jit_function_type_get_param_type,
gcc_jit_type_unqualified, gcc_jit_type_dyncast_array,
gcc_jit_type_is_bool,
gcc_jit_type_dyncast_function_ptr_type,
gcc_jit_type_is_integral, gcc_jit_type_is_pointer,
gcc_jit_type_dyncast_vector,
gcc_jit_vector_type_get_element_type,
gcc_jit_vector_type_get_num_units,
gcc_jit_struct_get_field, gcc_jit_type_is_struct,
and gcc_jit_struct_get_field_count
* libgccjit.c:
(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
gcc_jit_function_type_get_return_type,
gcc_jit_function_type_get_param_count,
gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
gcc_jit_vector_type_get_element_type,
gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
functions.
(struct gcc_jit_function_type, struct gcc_jit_vector_type):
New types.
* libgccjit.h:
(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
gcc_jit_function_type_get_return_type,
gcc_jit_function_type_get_param_count,
gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
gcc_jit_vector_type_get_element_type,
gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
function declarations.
(struct gcc_jit_function_type, struct gcc_jit_vector_type):
New types.
* jit-recording.h: New functions (is_struct and is_vector)
* libgccjit.map (LIBGCCJIT_ABI_16): New ABI tag.

gcc/testsuite/
PR target/96889
* jit.dg/all-non-failing-tests.h: Add test-reflection.c.
* jit.dg/test-reflection.c: New test.

Daily bump.

c++: Avoid adding implicit attributes during apply_late_template_attributes [PR101180]

decl_attributes and its caller cplus_decl_attributes sometimes add
implicit attributes, e.g. optimize attribute if #pragma GCC optimize
is active, target attribute if #pragma GCC target is active, or
e.g. omp declare target attribute if in between #pragma omp declare target
and #pragma omp end declare target.

For templates that seems highly undesirable to me though, they should
get those implicit attributes from the spot the templates were parsed
(and they do get that), then tsubst through copy_node copies those
attributes, but then apply_late_template_attributes can or does add
a new set from the spot where they are instantiated, which can be pretty
random point of first use of the template.

Consider e.g.
#pragma GCC push_options
#pragma GCC target "avx"
template <int N>
inline void foo ()
{
}
#pragma GCC pop_options
#pragma GCC push_options
#pragma GCC target "crc32"
void
bar ()
{
   foo<0> ();
}
#pragma GCC pop_options
testcase where the intention is that foo has avx target attribute
and bar has crc32 target attribute, but we end up with
__attribute__((target ("crc32"), target ("avx")))
on foo<0> (and due to yet another bug actually don't enable avx
in foo<0>).  In this particular case it is a regression caused
by r12-299-ga0fdff3cf33f7284 which apparently calls
cplus_decl_attributes even if attributes != NULL but late_attrs
is NULL, before those changes we didn't call it in those cases.
But, if there is at least one unrelated dependent attribute this
would happen already in older releases.

The following patch fixes that by temporarily overriding the variables
that control the addition of the implicit attributes.

Shall we also change the function so that it doesn't call
cplus_decl_attributes if late_attrs is NULL, or was that change
intentional?

2021-11-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/101180
* pt.c (apply_late_template_attributes): Temporarily override
current_optimize_pragma, optimization_current_node,
current_target_pragma and scope_chain->omp_declare_target_attribute,
so that cplus_decl_attributes doesn't add implicit attributes.

* g++.target/i386/pr101180.C: New test.

gcc, doc: Fix Darwin bootstrap: Amend an @option command to elide a space.

At least some version(s) of makeinfo (4.8) do not like @option {-xxxx};
the brace has to follow the @option without any whitespace.

makeinfo 4.8 is installed on Darwin systems and this breaks bootstrap.
The amendment follows the style of the surrounding code.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* doc/invoke.texi: Remove whitespace after an @option.

analyzer: fix false leak due to overeager state merging [PR103217]

PR analyzer/103217 reports a false positive from -Wanalyzer-malloc-leak.

The root cause is due to overzealous state merger, where the
state-merging code decided to merge these two states by merging
the stores:

state A:
  clusters within frame: ‘main’@1
    cluster for: one_3: CONJURED(val_4 = strdup (src_2(D));, val_4)
    cluster for: two_4: UNKNOWN(char *)
    cluster for: one_21: CONJURED(val_4 = strdup (src_2(D));, val_4)

state B:
  clusters within frame: ‘main’@1
    cluster for: one_3: UNKNOWN(char *)
    cluster for: two_4: CONJURED(val_4 = strdup (src_2(D));, val_4)
    cluster for: two_18: CONJURED(val_4 = strdup (src_2(D));, val_4)

into:
  clusters within frame: ‘main’@1
    cluster for: one_3: UNKNOWN(char *)
    cluster for: two_4: UNKNOWN(char *)
    cluster for: one_21: UNKNOWN(char *)
    cluster for: two_18: UNKNOWN(char *)

despite "CONJURED(val_4 = strdup (src_2(D));, val_4)" having sm-state,
in this case malloc:nonnull ({free}), thus leading to both references
to the conjured svalue being lost at merger.

This patch tweaks the state merger code so that it will not consider
merging two different svalues for the value of a region if either svalue
has non-purgable sm-state (in the above example, malloc:nonnull).  This
fixes the false leak report above.

Doing so uncovered an issue with explode-2a.c in which the warnings
moved from the correct location to the "while" stmt.  This turned out
to be a missing call to detect_leaks in phi-handling, which the patch
also fixes (in the PK_BEFORE_SUPERNODE case in
exploded_graph::process_node).  Doing this fixed the regression in
explode-2a.c and also fixed the location of the leak warning in
explode-1.c.

The other side effect of the change is that pr94858-1.c now emits
a -Wanalyzer-too-complex warning, since pertinent state is no longer
being thrown away.  There doesn't seem to be a good way of avoiding
this, so the patch also adds -Wno-analyzer-too-complex to that test
case (restoring the default).

gcc/analyzer/ChangeLog:
PR analyzer/103217
* engine.cc (exploded_graph::get_or_create_node): Pass in
m_ext_state to program_state::can_merge_with_p.
(exploded_graph::process_worklist): Likewise.
(exploded_graph::maybe_process_run_of_before_supernode_enodes):
Likewise.
(exploded_graph::process_node): Add missing call to detect_leaks
when handling phi nodes.
* program-state.cc (program_state::can_merge_with_p): Add
"ext_state" param.  Pass it and state ptrs to
region_model::can_merge_with_p.
(selftest::test_program_state_merging): Update for new ext_state
param of program_state::can_merge_with_p.
(selftest::test_program_state_merging_2): Likewise.
* program-state.h (program_state::can_purge_p): Make const.
(program_state::can_merge_with_p): Add "ext_state" param.
* region-model.cc: Include "analyzer/program-state.h".
(region_model::can_merge_with_p): Add params "ext_state",
"state_a", and "state_b", use them when creating model_merger
object.
(model_merger::mergeable_svalue_p): New.
* region-model.h (region_model::can_merge_with_p): Add params
"ext_state", "state_a", and "state_b".
(model_merger::model_merger): Likewise, initializing new fields.
(model_merger::mergeable_svalue_p): New decl.
(model_merger::m_ext_state): New field.
(model_merger::m_state_a): New field.
(model_merger::m_state_b): New field.
* svalue.cc (svalue::can_merge_p): Call
model_merger::mergeable_svalue_p on both states and reject the
merger accordingly.

gcc/testsuite/ChangeLog:
PR analyzer/103217
* gcc.dg/analyzer/explode-1.c: Update for improvement to location
of leak warning.
* gcc.dg/analyzer/pr103217.c: New test.
* gcc.dg/analyzer/pr94858-1.c: Add -Wno-analyzer-too-complex.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Improve tests for stringstream constructors in C++20

This ensures all constructors are checked.

libstdc++-v3/ChangeLog:

* testsuite/27_io/basic_istringstream/cons/char/1.cc: Check all
constructors.
* testsuite/27_io/basic_istringstream/cons/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/cons/char/1.cc: Likewise.
* testsuite/27_io/basic_ostringstream/cons/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/cons/char/1.cc: Likewise.
* testsuite/27_io/basic_stringstream/cons/wchar_t/1.cc:
Likewise.

libstdc++: Use __is_single_threaded in locale initialization

This replaces a __gthread_active_p() check with __is_single_threaded()
so that std::locale initialization doesn't use __gthread_once if it
happens before the first thread is created.

This means that _S_initialize_once() might now be called twice instead
of only once, because if __is_single_threaded() changes to false then we
will do the __gthread_once call even if _S_initialize_once() was already
called. Add a check to _S_initialize_once() and return immediately if
it is the second call.

Also use __builtin_expect in _S_initialize, as the branch will be taken
at most once in the lifetime of the program.
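
The control flow described above can be modelled with the simplified sketch
below.  Everything in namespace locale_init_sketch is a hypothetical stand-in
(plain flags instead of __is_single_threaded() and __gthread_once), so it only
shows why the extra check in the once-function is needed; it is not the code
in locale_init.cc.  __builtin_expect is a GCC/Clang builtin.

```cpp
#include <cassert>

namespace locale_init_sketch
{
  // All names here are hypothetical stand-ins, not libstdc++ internals.
  bool single_threaded = true;  // models the result of __is_single_threaded()
  bool once_flag_used = false;  // models the __gthread_once control variable
  bool initialized = false;     // the guard checked by the once-function
  int init_runs = 0;            // counts how often real work was done

  void
  initialize_once()
  {
    if (initialized)            // second call: return immediately
      return;
    ++init_runs;
    initialized = true;
  }

  void
  run_once(void (*fn)())        // simplified model of __gthread_once
  {
    if (!once_flag_used)
      {
        once_flag_used = true;
        fn();
      }
  }

  void
  initialize()
  {
    if (!single_threaded)
      run_once(initialize_once);
    // Taken at most once in the lifetime of the program.
    if (__builtin_expect(!initialized, 0))
      initialize_once();
  }
}
```

If initialize() first runs while single-threaded and again after a thread has
been created, run_once still invokes initialize_once, and the guard keeps
init_runs at 1.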

libstdc++-v3/ChangeLog:

* src/c++98/locale_init.cc (_S_initialize_once): Check if
initialization has already been done.
(_S_initialize): Replace __gthread_active_p with
__is_single_threaded. Use __builtin_expect.

libstdc++: One more change for Clang to support constexpr std::string [PR103295]

All writes into the allocated buffer need to be via traits_type::assign
to begin lifetimes.

libstdc++-v3/ChangeLog:

PR libstdc++/103295
* include/bits/basic_string.tcc (_M_construct): Use the
traits assign member to write into allocated memory.

rs6000: Add optimizations for _mm_sad_epu8

Power9 ISA added the `vabsdub` instruction, which is realized in the
`vec_absd` intrinsic.

Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
`_ARCH_PWR9`.

Also, the realization of `vec_sum2s` on little-endian includes
two rotates in order to position the input and output to match
the semantics of `vec_sum2s`:
- Rotate the second input vector left 12 bytes. In the current usage,
  that vector is `{0}`, so this shift is unnecessary, but is currently
  not eliminated under optimization.
- Rotate the vector produced by the `vsum2sws` instruction left 4 bytes.
  The two words within each doubleword of this (rotated) result must then
  be explicitly swapped to match the semantics of `_mm_sad_epu8`,
  effectively reversing this rotate.  So, this rotate (and a subsequent
  swap) are unnecessary, but not currently removed under optimization.

Using `__builtin_altivec_vsum2sws` retains both rotates, so is not an
option for removing the rotates.

For little-endian, use the `vsum2sws` instruction directly, and
eliminate the explicit rotate (swap).
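
For reference, the result that `_mm_sad_epu8` has to produce can be written as
a scalar sketch: for each 8-byte half, sum the absolute differences of the
unsigned bytes and place the sum in the corresponding 64-bit lane.  The name
sad_epu8_reference is hypothetical; this models the x86 semantics only and is
not the emmintrin.h code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: one 64-bit sum of absolute byte differences per 8-byte half.
inline std::array<std::uint64_t, 2>
sad_epu8_reference(const std::array<std::uint8_t, 16>& a,
                   const std::array<std::uint8_t, 16>& b)
{
  std::array<std::uint64_t, 2> result{{0, 0}};
  for (std::size_t i = 0; i < 16; ++i)
    {
      // This is the per-byte operation that vabsdub provides in hardware.
      unsigned diff = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
      result[i / 8] += diff;
    }
  return result;
}

inline void
sad_epu8_example()
{
  std::array<std::uint8_t, 16> a{}, b{};
  for (std::size_t i = 0; i < 16; ++i)
    a[i] = static_cast<std::uint8_t>(i);
  std::array<std::uint64_t, 2> r = sad_epu8_reference(a, b);
  assert(r[0] == 28);  // 0 + 1 + ... + 7
  assert(r[1] == 92);  // 8 + 9 + ... + 15
}
```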

2021-11-19  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd when
_ARCH_PWR9, optimize vec_sum2s when LE.

c++: Fix cpp0x/lambda/lambda-nested9.C with C++11

Unfortunately dejagnu doesn't honor #if/#endif, so this test was failing
with -std=c++11:

FAIL: g++.dg/cpp0x/lambda/lambda-nested9.C -std=c++11 (test for errors, line 37)

Fixed thus.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-nested9.C: Adjust dg-error.

Darwin: Rework handling for unwinder code in libgcc_s and specs [PR80556].

This addresses a long-standing problem where a work-around for an unwinder
issue (also a regression) regresses other functionality.  The patch replaces
several work-arounds with a fix for PR80556 and a work-around for PR88590.

* The fix for PR80556 requires a bump to the SO name for libgcc_s, since we
need to remove the unwinder symbols from it.  This would trigger PR88590
hence the work-around for that.

* We weaken the symbols for emulated TLS support so that it is possible
for a DSO linked with static-libgcc to interoperate with a DSO linked with
libgcc_s.  Likewise main exes.

* We remove all the gcc-4.2.1 era stubs machinery and workarounds.

* libgcc is now always linked ahead of libc, which avoids failures where the
libc (libSystem) builtins implementations are not up to date.

* The unwinder now always comes from the system
- for Darwin9 from /usr/lib/libgcc_s.1.dylib
- for Darwin10 from /usr/lib/libSystem.dylib
- for Darwin11+ from /usr/lib/system/libunwind.dylib.

We still insert a shim on Darwin10 to fix an omitted unwind function, but
the underlying unwinder remains the system one.

* The work-around for PR88590 has two parts (1) we always link libgcc from
its convenience lib on affected system versions (avoiding the need to find
the DSO path); (2) we add and export the emutls functions from DSOs - this
makes a relatively small (20k) addition to a DSO.  These can be backed out
when a proper fix for PR88590 is committed.

Distributions that wish to install a libgcc_s.1.dylib to satisfy linkage
from exes that linked against the stubs can use a reexported libgcc_s.1.1
(since that contains all the symbols that were previously exported via the
stubs).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

PR target/80556
* config/darwin-driver.c (darwin_driver_init): Handle exported
symbols and symbol lists (suppress automatic export of the TLS
symbols).
* config/darwin.c (darwin_rename_builtins): Remove workaround.
* config/darwin.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
(REAL_LIBGCC_SPEC): Handle revised library uses.
* config/darwin.opt (nodefaultexport): New.
* config/i386/darwin.h (PR80556_WORKAROUND): Remove.
* config/i386/darwin32-biarch.h (PR80556_WORKAROUND): Likewise.
* config/i386/darwin64-biarch.h (PR80556_WORKAROUND): Likewise.

libgcc/ChangeLog:

* config.host: Add weak emutls crt to the extra_parts.
* config/i386/darwin-lib.h (DECLARE_LIBRARY_RENAMES): Remove
workaround.
* config/libgcc-libsystem.ver: Add exclude list for the system-
provided unwinder.
* config/t-slibgcc-darwin: Bump SO version, remove stubs code.
* config/i386/libgcc-darwin.10.4.ver: Removed.
* config/i386/libgcc-darwin.10.5.ver: Removed.
* config/rs6000/libgcc-darwin.10.4.ver: Removed.
* config/rs6000/libgcc-darwin.10.5.ver: Removed.
* config/t-darwin-noeh: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/fp-int-convert-timode-3.c: Remove XFAIL.
* gcc.dg/torture/fp-int-convert-timode-4.c: Likewise.

libgcc, emutls: Allow building weak definitions of the emutls functions.

In order to better support use of the emulated TLS between objects with
DSO dependencies and static-linked libgcc, allow a target to make weak
definitions.
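
A minimal sketch of how an optional attribute macro can make a definition weak
follows.  The default-to-empty pattern, the -D example and the function
emutls_sketch_answer are assumptions for illustration; they are not copied
from emutls.c or t-darwin.

```cpp
#include <cassert>

// Assumed pattern: EMUTLS_ATTR expands to nothing unless the target supplies
// it, for example with -DEMUTLS_ATTR='__attribute__((weak))'.
#ifndef EMUTLS_ATTR
# define EMUTLS_ATTR
#endif

// Hypothetical function standing in for an emutls entry point.
extern "C" EMUTLS_ATTR int
emutls_sketch_answer()
{
  return 42;
}

inline void
emutls_sketch_example()
{
  // Callers are unaffected by whether the definition is weak or strong.
  assert(emutls_sketch_answer() == 42);
}
```

With the attribute supplied, several objects can each carry a copy of the
definition and the linker keeps one instead of reporting a duplicate symbol.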

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/t-darwin: Build weak-defined emutls objects.
* emutls.c (__emutls_get_address): Add optional attributes.
(__emutls_register_common): Likewise.
(EMUTLS_ATTR): New.

libstdc++, testsuite: Add a prune expression for external tool bug.

Depending on the permutation of CPU, OS version and shared/non-shared
library inclusion, we can get warnings from the external tools (ld64,
dsymutil) which are not actually libstdc++ issues but relate to the
external tools themselves. This is already pruned in the main testsuite;
this adds it to the library.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libstdc++-v3/ChangeLog:

* testsuite/lib/prune.exp: Prune dsymutil (ld64) warning.

libphobos, testsuite: Add prune clauses for two Darwin cases.

Depending on the permutation of CPU, OS version and shared/non-shared
library inclusion, we can get two warnings from the external tools (ld64,
dsymutil) which are not actually GCC issues but relate to the external
tools. These are already pruned in the main testsuite; this adds them to
the library.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libphobos/ChangeLog:

* testsuite/lib/libphobos.exp: Prune warnings from external
tool bugs.

libstdc++: Suppress -Wstringop warnings [PR103332]

libstdc++-v3/ChangeLog:

PR libstdc++/103332
PR libstdc++/102958
* testsuite/21_strings/basic_string/capacity/char/1.cc: Add
-Wno-stringop-overflow.
* testsuite/21_strings/basic_string/operators/char/1.cc:
Likewise.
* testsuite/experimental/filesystem/path/factory/u8path-char8_t.cc:
Add -Wno-stringop-overread.

libstdc++: Begin lifetime of chars in constexpr std::string [PR103295]

Clang gives errors for constexpr std::string because the memory returned
by std::allocator<T>::allocate does not contain any objects yet, and
attempting to set them using char_traits::assign or char_traits::copy
fails with:

assignment to object outside its lifetime is not allowed in a constant expression
*__result = *__first;
^
This adds code to std::char_traits to use std::construct_at to begin
lifetimes when called during constant evaluation. To support
specializations of std::basic_string that don't use std::char_traits
there is now another layer of wrapper around the allocator_traits, so
that the lifetime of characters is begun as soon as the memory is
allocated. By doing it in the char traits and allocator traits, the rest
of basic_string can ignore the problem.

While modifying char_traits::copy and char_traits::assign to begin
lifetimes for the constexpr cases, I also replaced their uses of
std::copy and std::fill_n respectively. That means we don't need
<bits/stl_algobase.h> for char_traits.

libstdc++-v3/ChangeLog:

PR libstdc++/103295
* include/bits/basic_string.h (_Alloc_traits): Replace typedef
with struct for C++20 mode.
* include/bits/basic_string.tcc (_M_replace): Use _Alloc_traits
for allocation.
* include/bits/char_traits.h (__gnu_cxx::char_traits::assign):
Use std::_Construct during constant evaluation.
(__gnu_cxx::char_traits::assign(CharT*, const CharT*, size_t)):
Likewise. Replace std::fill_n with memset or manual loop.
(__gnu_cxx::char_traits::copy): Likewise, replacing std::copy
with memcpy.
* include/ext/vstring.h: Include <bits/stl_algobase.h> for
std::min.
* include/std/string_view: Likewise.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Add constexpr test.