Tobias Burnus [Mon, 15 Nov 2021 14:44:11 +0000 (15:44 +0100)]
Fortran: openmp: Add support for thread_limit clause on target
gcc/fortran/ChangeLog:
* openmp.c (OMP_TARGET_CLAUSES): Add thread_limit.
* trans-openmp.c (gfc_split_omp_clauses): Add thread_limit also to
teams.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/thread-limit-1.f90: New test.
Jakub Jelinek [Mon, 15 Nov 2021 13:47:44 +0000 (14:47 +0100)]
testsuite: Add testcase for already fixed PR [PR100469]
This bug, introduced in r11-7448-gff92ede8d269375f800e1b347a48f4698874b4a3,
has already been fixed by r12-1354-g2d2ed777b23ab6503027039e0adbfe1162f52b2f,
aka the PR100852 fix.
2021-11-15 Jakub Jelinek <jakub@redhat.com>
PR debug/100469
* g++.dg/opt/pr100469.C: New test.
H.J. Lu [Mon, 15 Nov 2021 13:17:55 +0000 (05:17 -0800)]
x86: Add gcc.target/i386/pr103205-2.c
PR target/103205
* gcc.target/i386/pr103205-2.c: New test.
H.J. Lu [Mon, 15 Nov 2021 12:56:05 +0000 (04:56 -0800)]
libffi: Update LOCAL_PATCHES
Add
commit
a91f844ef449d0dd1cf2e0e47b0ade0d8a6304e1
Author: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Date: Mon Nov 15 10:24:27 2021 +0100
libffi: Use #define instead of .macro in src/x86/win64.S [PR102874]
to LOCAL_PATCHES.
* LOCAL_PATCHES: Add commit
a91f844ef44.
Jakub Jelinek [Mon, 15 Nov 2021 12:20:53 +0000 (13:20 +0100)]
openmp: Add support for thread_limit clause on target
OpenMP 5.1 says that thread_limit clause can also appear on target,
and similarly to teams should affect the thread-limit-var ICV.
On combined target teams, the clause goes to both.
We actually passed thread_limit internally on target already before,
but only used it for gcn/ptx offloading to hint how many threads should be
created and for ptx didn't set thread_limit_var in that case.
Similarly for host fallback.
Also, I found that we weren't copying the args array that contains encoded
thread_limit and num_teams clause for target (etc.) for async target.
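A minimal sketch of how the clause looks from user code (an illustration only, not the committed thread-limit-1.c test): the function below puts thread_limit on a bare target construct and reads the resulting ICV back with omp_get_thread_limit. The name observed_thread_limit is hypothetical. The _OPENMP guards are an assumption made so that the file also builds without -fopenmp, in which case the pragma is ignored and 0 is returned.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the thread limit seen inside the target
   region, or 0 when the file is built without OpenMP support.  */
static int observed_thread_limit(void)
{
    int limit = 0;
#pragma omp target thread_limit(4) map(from: limit)
    {
#ifdef _OPENMP
        /* Reads thread-limit-var, which the clause is meant to affect.  */
        limit = omp_get_thread_limit();
#endif
    }
    return limit;
}
```

With a compiler that accepts the clause on target, the returned value should not exceed the requested limit of 4.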
2021-11-15 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT
to OMP_TARGET_CLAUSES if it isn't there already.
gcc/c-family/
* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
* parser.c (OMP_TARGET_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
* task.c (gomp_create_target_task): Copy args array as well.
* target.c (gomp_target_fallback): Add args argument.
Set gomp_icv (true)->thread_limit_var if thread_limit is present.
(GOMP_target): Adjust gomp_target_fallback caller.
(GOMP_target_ext): Likewise.
(gomp_target_task_fn): Likewise.
* config/nvptx/team.c (gomp_nvptx_main): Set
gomp_global_icv.thread_limit_var.
* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
Aldy Hernandez [Mon, 15 Nov 2021 08:56:56 +0000 (09:56 +0100)]
Fix PHI ordering problems in the path solver.
After auditing the PHI range calculations, I'm not convinced we've
caught all the corner cases. They haven't shown up in the wild (yet),
but better safe than sorry.
We shouldn't write anything to the cache or trigger additional
lookups while calculating a PHI, as this may cause ordering problems.
We should resolve the PHI with either the cache as it stands, or by
asking for ranges on entry to the path. I've documented this.
There was one dubious case where we called fold_range in
ssa_range_in_phi, which mostly by luck wasn't triggering lookups,
because fold_range solves a PHI by calling range_on_edge, which is set
to pick up global ranges by default in path_range_query. This is
fragile, so I've rewritten the call to explicitly use cached or global
ranges.
Also, the cache should be avoided in ssa_range_in_phi when the arg is
defined in the PHI's block, as not doing so could create an ordering
problem. We have a similar check when calculating relations in PHIs.
Tested on x86-64 & ppc64le Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Remove useless code.
(path_range_query::ssa_defined_in_bb): New.
(path_range_query::ssa_range_in_phi): Avoid fold_range call that
could trigger additional lookups.
Do not use the cache for ARGs defined in this block.
(path_range_query::compute_ranges_in_block): Use ssa_defined_in_bb.
(path_range_query::maybe_register_phi_relation): Same.
(path_range_query::range_of_stmt): Adjust comment.
* gimple-range-path.h (ssa_defined_in_bb): New.
Aldy Hernandez [Mon, 15 Nov 2021 08:56:48 +0000 (09:56 +0100)]
path solver: Default to global range if nothing found.
This has been a long time coming, but we weren't able to make the
change because of some unrelated regressions.
Tested on x86-64 & ppc64le Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Default to global range if nothing found.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/pr31146-2.C: Add -fno-thread-jumps.
Richard Biener [Mon, 15 Nov 2021 10:37:56 +0000 (11:37 +0100)]
tree-optimization/103237 - avoid vectorizing unhandled double reductions
Double reductions which have multiple LC PHIs in the inner loop
are not handled correctly during transformation since those PHIs
are not properly classified as reduction. The following disables
vectorizing them.
2021-11-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/103237
* tree-vect-loop.c (vect_is_simple_reduction): Fail for
double reductions with multiple inner loop LC PHI nodes.
* gcc.dg/torture/pr103237.c: New testcase.
Hongyu Wang [Fri, 12 Nov 2021 02:50:46 +0000 (10:50 +0800)]
PR target/103069: Relax cmpxchg loop for x86 target
From the CPU's point of view, getting a cache line for writing is more
expensive than reading. See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusively and cause
excessive cache line bouncing.
The atomic_fetch_{or,xor,and,nand} builtins generate a cmpxchg loop under
-march=x86-64 like:
movl v(%rip), %eax
.L2:
movl %eax, %ecx
movl %eax, %edx
orl $1, %ecx
lock cmpxchgl %ecx, v(%rip)
jne .L2
movl %edx, %eax
andl $1, %eax
ret
To relax the above loop, GCC should first emit a normal load, then check and
jump to .L2 if cmpxchgl may fail. Before jumping to .L2, PAUSE should be
inserted to yield the CPU to another hyperthread and to save power, so the
code looks like
.L84:
movl (%rdi), %ecx
movl %eax, %edx
orl %esi, %edx
cmpl %eax, %ecx
jne .L82
lock cmpxchgl %edx, (%rdi)
jne .L84
.L82:
rep nop
jmp .L84
This patch adds corresponding atomic_fetch_op expanders to insert load,
compare and pause for all the atomic logic fetch builtins. It also adds the
-mrelax-cmpxchg-loop flag to control whether to generate the relaxed loop.
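The same idea can be sketched at the C level with C11 atomics. This is only an illustration of the load, compare, pause, then cmpxchg structure, not what the new expander emits. The names relaxed_fetch_or and cpu_pause are hypothetical, and __builtin_ia32_pause is used only when compiling for x86.

```c
#include <stdatomic.h>

/* Hypothetical helper: spin-wait hint, a no-op on non-x86 targets.  */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Sketch of a fetch-or that avoids the locked cmpxchg while the
   location is known to hold a different value than expected.  */
static unsigned relaxed_fetch_or(_Atomic unsigned *p, unsigned mask)
{
    unsigned expected = atomic_load_explicit(p, memory_order_relaxed);
    for (;;) {
        /* Plain load first: it does not need the line exclusively.  */
        unsigned cur = atomic_load_explicit(p, memory_order_relaxed);
        if (cur != expected) {
            /* The cmpxchg would fail, so pause and retry instead.  */
            expected = cur;
            cpu_pause();
            continue;
        }
        /* On failure, expected is refreshed with the current value.  */
        if (atomic_compare_exchange_weak(p, &expected, expected | mask))
            return expected;
    }
}
```

Calling relaxed_fetch_or on a location holding 4 with mask 1 returns 4 and leaves 5 in the location.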
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
New expand function.
* config/i386/i386-options.c (ix86_target_string): Add
-mrelax-cmpxchg-loop flag.
(ix86_valid_target_attribute_inner_p): Likewise.
* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
New expand function prototype.
* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
for SI,HI,QI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
* doc/invoke.texi: Document -mrelax-cmpxchg-loop.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-1.c: New test.
* gcc.target/i386/pr103069-2.c: Ditto.
Richard Biener [Mon, 15 Nov 2021 10:07:55 +0000 (11:07 +0100)]
tree-optimization/103219 - avoid ICE in unroll-and-jam
For no particularly good reason unroll-and-jam uses single_dom_exit
to determine the exit for the region it wants to run VN on. That
happens to ICE because of the dominance restriction. Use single_exit
instead.
2021-11-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/103219
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Use single_exit
to determine the exit for the VN region.
* gcc.dg/torture/pr103219.c: New testcase.
Prathamesh Kulkarni [Mon, 15 Nov 2021 10:07:36 +0000 (15:37 +0530)]
[tree-vectorizer.c] Merge pass_vectorize::execute with vectorize_loops and replace occurrences of cfun with function param.
gcc/ChangeLog:
* tree-ssa-loop.c (pass_vectorize): Move to tree-vectorizer.c.
(pass_data_vectorize): Likewise.
(make_pass_vectorize): Likewise.
* tree-vectorizer.c (vectorize_loops): Merge with
pass_vectorize::execute and replace cfun occurrences with fun param.
(adjust_simduid_builtins): Add fun param, replace cfun occurrences with
fun, and adjust callers appropriately.
(note_simd_array_uses): Likewise.
(vect_loop_dist_alias_call): Likewise.
(set_uid_loop_bbs): Likewise.
(vect_transform_loops): Likewise.
(try_vectorize_loop_1): Likewise.
(try_vectorize_loop): Likewise.
Rainer Orth [Mon, 15 Nov 2021 09:24:27 +0000 (10:24 +0100)]
libffi: Use #define instead of .macro in src/x86/win64.S [PR102874]
The libffi 3.4.2 import badly broke Solaris/x86 bootstrap with the native
assembler:
Assembler:
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 88 :
Illegal mnemonic
Near line: ".macro epilogue"
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 88 : Syntax
error
Near line: ".macro epilogue"
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 95 :
Illegal mnemonic
Near line: ".endm"
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 95 : Syntax
error
Near line: ".endm"
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 100 :
Illegal mnemonic
Near line: " epilogue"
"/vol/gcc/src/hg/master/local/libffi/src/x86/win64.S", line 100 :
Syntax error
Near line: "epilogue"
Solaris as doesn't support .macro/.endm.
Fixed by using #define instead of the unportable .macro.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
The bug has been reported upstream
(https://github.com/libffi/libffi/issues/665); a corresponding pull
request is also pending (https://github.com/libffi/libffi/pull/669).
2021-10-21 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libffi:
PR libffi/102874
* src/x86/win64.S (epilogue): Use #define instead of .macro.
Rainer Orth [Mon, 15 Nov 2021 09:00:14 +0000 (10:00 +0100)]
testsuite: i386: Require dfp in gcc.target/i386/pr101346.c
gcc.target/i386/pr101346.c currently FAILs on Solaris/x86:
FAIL: gcc.target/i386/pr101346.c (test for excess errors)
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:6:1:
error: decimal floating-point not supported for this target
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:7:6:
error: decimal floating-point not supported for this target
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:9:12:
warning: implicit declaration of function '__builtin_fabsd128'; did you
mean '__builtin_fabsf128'? [-Wimplicit-function-declaration]
Fixed by requiring dfp support. Tested on i386-pc-solaris2.11 and
x86_64-pc-linux-gnu.
2021-10-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.target/i386/pr101346.c: Require dfp support.
Jakub Jelinek [Mon, 15 Nov 2021 08:30:08 +0000 (09:30 +0100)]
i386: Fix up x86 atomic_bit_test* expanders for !TARGET_HIMODE_MATH [PR103205]
With !TARGET_HIMODE_MATH, the OPTAB_DIRECT expand_simple_binop fails and so
we ICE. We don't really care if the operations are done promoted in SImode
instead.
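For context, the kind of source that reaches these expanders is an atomic fetch-or on a 16-bit object whose result is masked with the same bit. The sketch below is an illustration rather than the committed pr103205.c test. The name test_and_set_bit is hypothetical, while __atomic_fetch_or and __ATOMIC_SEQ_CST are the documented GCC builtin and memory-order macro.

```c
/* Hypothetical helper: atomically sets one bit of a 16-bit value and
   reports whether that bit was already set.  */
static int test_and_set_bit(unsigned short *p, int bit)
{
    unsigned short mask = (unsigned short) (1u << bit);
    /* fetch_or returns the old value; masking it tests the old bit.  */
    return (__atomic_fetch_or(p, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
```

On a zero-initialized value, the first call for bit 3 returns 0 and stores 8, and a second call for the same bit returns 1.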
2021-11-15 Jakub Jelinek <jakub@redhat.com>
PR target/103205
* config/i386/sync.md (atomic_bit_test_and_set<mode>,
atomic_bit_test_and_complement<mode>,
atomic_bit_test_and_reset<mode>): Use OPTAB_WIDEN instead of
OPTAB_DIRECT.
* gcc.target/i386/pr103205.c: New test.
Jakub Jelinek [Mon, 15 Nov 2021 08:20:52 +0000 (09:20 +0100)]
libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound
Here is a PTX implementation of what I was talking about. For
num_teams_upper 0, or whenever num_teams_lower <= num_blocks, the current
implementation is fine. But if the user explicitly asks for more
teams than we can provide in hardware, we need to stop assuming that
omp_get_team_num () is equal to the hw team id. Instead we need to use some
team-specific memory (it is .shared for PTX) or, if none is
provided, an array indexed by the hw team id, and run some teams serially
within the same hw thread.
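The serial stepping can be modelled on the host in plain C. This is a hypothetical sketch, not the libgomp code: next_team and teams_run_by_block are invented names, and the stride of num_blocks between successive teams on one hardware block is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of one hardware block picking its next logical
   team.  On the first call the team number starts at the hw id; later
   calls bump it by the number of blocks (assumed stride).  Returns
   false once the block has run out of teams.  */
static bool next_team(unsigned *team_num, bool first, unsigned hw_id,
                      unsigned num_blocks, unsigned num_teams)
{
    if (first)
        *team_num = hw_id;
    else
        *team_num += num_blocks;
    return *team_num < num_teams;
}

/* Counts how many logical teams a given hardware block runs serially.  */
static unsigned teams_run_by_block(unsigned hw_id, unsigned num_blocks,
                                   unsigned num_teams)
{
    unsigned team_num = 0, count = 0;
    for (bool first = true;
         next_team(&team_num, first, hw_id, num_blocks, num_teams);
         first = false)
        count++;
    return count;
}
```

With 4 blocks and 10 requested teams, block 0 runs teams 0, 4 and 8, while block 3 runs teams 3 and 7, so every team number from 0 to 9 is covered exactly once.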
2021-11-15 Jakub Jelinek <jakub@redhat.com>
* config/nvptx/team.c (__gomp_team_num): Define as
__attribute__((shared)) var.
(gomp_nvptx_main): Initialize __gomp_team_num to 0.
* config/nvptx/target.c (__gomp_team_num): Declare as
extern __attribute__((shared)) var.
(GOMP_teams4): Use __gomp_team_num as the team number instead of
%ctaid.x. If first, initialize it to %ctaid.x. If num_teams_lower
is bigger than num_blocks, use num_teams_lower teams and arrange for
bumping of __gomp_team_num if !first and returning false once we run
out of teams.
* config/nvptx/teams.c (__gomp_team_num): Declare as
extern __attribute__((shared)) var.
(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.
Jakub Jelinek [Mon, 15 Nov 2021 07:54:52 +0000 (08:54 +0100)]
libgomp: Add a testcase for omp_get_num_teams inside of target inside of host teams
This is https://github.com/OpenMP/spec/issues/3183
There is an agreement that we should return 1 team inside of target,
even if that target is inside of host teams. We were doing that
when offloading and not during host fallback, r12-5151 should fix that
even for host fallback.
2021-11-15 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.c/teams-5.c: New test.
Jason Merrill [Mon, 15 Nov 2021 04:18:19 +0000 (23:18 -0500)]
c++: location of lambda object and conversion call
Two things that had poor location info: we weren't giving the TARGET_EXPR
for a lambda object any location, and the call to a conversion function was
getting whatever input_location happened to be.
gcc/cp/ChangeLog:
* call.c (perform_implicit_conversion_flags): Use the location of
the argument.
* lambda.c (build_lambda_object): Set location on the TARGET_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-switch.C: Adjust expected location.
Jason Merrill [Sat, 13 Nov 2021 21:59:31 +0000 (16:59 -0500)]
c++: check constexpr constructor body
The implicit constexpr patch revealed that our checks for constexpr
constructors that could possibly produce a constant value (which
otherwise are IFNDR) were failing to look at most of the function body.
Fixing that required some library tweaks.
gcc/cp/ChangeLog:
* constexpr.c (maybe_save_constexpr_fundef): Also check whether the
body of a constructor is potentially constant.
libstdc++-v3/ChangeLog:
* src/c++17/memory_resource.cc: Add missing constexpr.
* include/experimental/internet: Only mark copy constructor
as constexpr with __cpp_constexpr_dynamic_alloc.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-89285-2.C: Expect error.
* g++.dg/cpp1y/constexpr-89285.C: Adjust error.
Jason Merrill [Sat, 13 Nov 2021 22:16:46 +0000 (17:16 -0500)]
c++: is_this_parameter and coroutines proxies
Compiling coroutines/pr95736.C with the implicit constexpr patch broke
because is_this_parameter didn't recognize the coroutines proxy for 'this'.
gcc/cp/ChangeLog:
* semantics.c (is_this_parameter): Check DECL_HAS_VALUE_EXPR_P
instead of is_capture_proxy.
Jason Merrill [Fri, 12 Nov 2021 03:03:53 +0000 (22:03 -0500)]
c++: c++20 constexpr default ctor and array init
The implicit constexpr patch revealed that marking the constructor in the
PR70690 testcase as constexpr made the bug reappear, because build_vec_init
assumed that a constexpr default constructor initialized the whole object,
so it was equivalent to value-initialization. But this is no longer true in
C++20.
PR c++/70690
gcc/cp/ChangeLog:
* init.c (build_vec_init): Check default_init_uninitialized_part in
C++20.
gcc/testsuite/ChangeLog:
* g++.dg/init/array41a.C: New test.
Jason Merrill [Fri, 5 Nov 2021 04:08:53 +0000 (00:08 -0400)]
c++: don't do constexpr folding in unevaluated context
The implicit constexpr patch revealed that we were doing constant evaluation
of arbitrary expressions in unevaluated contexts, leading to failure when we
tried to evaluate e.g. a call to declval. This is wrong more generally;
only manifestly-constant-evaluated expressions should be evaluated within
an unevaluated operand.
Making this change revealed a case we were failing to mark as manifestly
constant-evaluated.
gcc/cp/ChangeLog:
* constexpr.c (maybe_constant_value): Don't evaluate
in an unevaluated operand unless manifestly const-evaluated.
(fold_non_dependent_expr_template): Likewise.
* decl.c (compute_array_index_type_loc): This context is
manifestly constant-evaluated.
Jason Merrill [Wed, 10 Nov 2021 21:42:04 +0000 (16:42 -0500)]
c++: constexpr virtual and vbase thunk
C++20 allows virtual functions to be constexpr. I don't think that calling
through a pointer to a vbase subobject is supposed to work in a constant
expression, since an object with virtual bases can't be constant, but the
call shouldn't ICE.
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_thunk_call): Error instead of ICE
on vbase thunk to constexpr function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-virtual20.C: New test.
Hans-Peter Nilsson [Mon, 15 Nov 2021 06:50:44 +0000 (07:50 +0100)]
gcc.dg/uninit-pred-9_b.c: Correct last adjustment for cris-elf
The change at r12-4790 should have done the same change for
CRIS as was done for powerpc64*-*-*. (Probably MMIX too but
that may have to wait until the next weekend.)
gcc/testsuite:
* gcc.dg/uninit-pred-9_b.c: Correct last adjustment, for CRIS.
Maciej W. Rozycki [Mon, 15 Nov 2021 03:14:31 +0000 (03:14 +0000)]
VAX: Implement the `-mlra' command-line option
Add the `-mlra' command-line option for the VAX target, with the
usual semantics of enabling Local Register Allocation, off by default.
LRA remains unstable with the VAX target, with numerous ICEs throughout
the testsuite and worse code produced overall where successful, however
the presence of a command line option to enable it makes it easier to
experiment with it as the compiler does not have to be rebuilt to flip
between the old reload and LRA.
gcc/
* config/vax/vax.c (vax_lra_p): New prototype and function.
(TARGET_LRA_P): Wire it.
* config/vax/vax.opt (mlra): New option.
* doc/invoke.texi (Option Summary, VAX Options): Document the
new option.
GCC Administrator [Mon, 15 Nov 2021 00:16:20 +0000 (00:16 +0000)]
Daily bump.
Andrew Pinski [Sun, 14 Nov 2021 23:54:32 +0000 (23:54 +0000)]
[Committed] Move some testcases to torture from tree-ssa
While writing up some testcases, I noticed some newer testcases
just had "dg-do compile/run" on them with dg-options of either -O1
or -O2. Since it is always better to run them over all optimization
levels I put them in gcc.c-torture/compile or gcc.c-torture/execute.
Committed after testing to make sure the testcases pass.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr100278.c: Move to ...
* gcc.c-torture/compile/pr100278.c: Here.
Remove dg-do and dg-options.
* gcc.dg/tree-ssa/pr101189.c: Move to ...
* gcc.c-torture/compile/pr101189.c: Here.
Remove dg-do and dg-options.
* gcc.dg/tree-ssa/pr100453.c: Move to ...
* gcc.c-torture/execute/pr100453.c: Here.
Remove dg-do and dg-options.
* gcc.dg/tree-ssa/pr101335.c: Move to ...
* gcc.c-torture/execute/pr101335.c: Here.
Remove dg-do and dg-options.
Jan Hubicka [Sun, 14 Nov 2021 23:10:06 +0000 (00:10 +0100)]
Track nondeterminism and interposable calls in ipa-modref
Adds tracking of two new flags in ipa-modref: nondeterministic and
calls_interposable. The first is set when a function does something that is not
guaranteed to be the same if run again (volatile memory access, volatile asm or
external function call). The second is set if a function calls something that
does not bind to the current def.
nondeterministic enables ipa-modref to discover looping pure/const functions
and it now discovers 138 of them during cc1plus link (which about doubles
the number of such functions detected late). However, we can do more:
1) We can extend FRE to eliminate redundant calls.
I filed PR103168 for that.
A common case is inline functions that are not autodetected as ECF_CONST
just because they do not bind to the local def; these can be easily handled.
More tricky is to use modref summary to check what memory locations are
read.
2) DSE can eliminate redundant stores
The calls_interposable flag currently also improves tree-ssa-structalias
on functions that are not binds_to_current_def since reads_global_memory
is now not cleared by interposable functions.
gcc/ChangeLog:
* ipa-modref.h (struct modref_summary): Add nondeterministic
and calls_interposable flags.
* ipa-modref.c (modref_summary::modref_summary): Initialize new flags.
(modref_summary::useful_p): Check new flags.
(struct modref_summary_lto): Add nondeterministic and
calls_interposable flags.
(modref_summary_lto::modref_summary_lto): Initialize new flags.
(modref_summary_lto::useful_p): Check new flags.
(modref_summary::dump): Dump new flags.
(modref_summary_lto::dump): Dump new flags.
(ignore_nondeterminism_p): New function.
(merge_call_side_effects): Merge new flags.
(process_fnspec): Likewise.
(analyze_load): Volatile access is nondeterministic.
(analyze_store): Likewise.
(analyze_stmt): Volatile ASM is nondeterministic.
(analyze_function): Clear new flags.
(modref_summaries::duplicate): Duplicate new flags.
(modref_summaries_lto::duplicate): Duplicate new flags.
(modref_write): Stream new flags.
(read_section): Stream new flags.
(propagate_unknown_call): Update new flags.
(modref_propagate_in_scc): Propagate new flags.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Check
calls_interposable.
* tree-ssa-structalias.c (determine_global_memory_access):
Likewise.
Maciej W. Rozycki [Sun, 14 Nov 2021 21:01:51 +0000 (21:01 +0000)]
VAX: Add the `setmemhi' instruction
The MOVC5 machine instruction has `memset' semantics if encoded with a
zero source length[1]:
"4. MOVC5 with a zero source length operand is the preferred way
to fill a block of memory with the fill character."
Use that instruction to implement the `setmemhi' instruction then. Use
the AP register in the register deferred mode for the source address to
yield the shortest possible encoding of the otherwise unused operand,
observing that the address is never dereferenced if the source length is
zero.
The use of this instruction yields steadily better performance, at least
with the Mariah VAX implementation, for a variable-length `memset' call
expanded inline as a single MOVC5 operation compared to an equivalent
libcall invocation:
Length: 1, time elapsed: 0.971789 (builtin), 2.847303 (libcall)
Length: 2, time elapsed: 0.907904 (builtin), 2.728259 (libcall)
Length: 3, time elapsed: 1.038311 (builtin), 2.917245 (libcall)
Length: 4, time elapsed: 0.775305 (builtin), 2.686088 (libcall)
Length: 7, time elapsed: 1.112331 (builtin), 2.992968 (libcall)
Length: 8, time elapsed: 0.856882 (builtin), 2.764885 (libcall)
Length: 15, time elapsed: 1.256086 (builtin), 3.096660 (libcall)
Length: 16, time elapsed: 1.001962 (builtin), 2.888131 (libcall)
Length: 31, time elapsed: 1.590456 (builtin), 3.774164 (libcall)
Length: 32, time elapsed: 1.288909 (builtin), 3.629622 (libcall)
Length: 63, time elapsed: 3.430285 (builtin), 5.269789 (libcall)
Length: 64, time elapsed: 3.265147 (builtin), 5.113156 (libcall)
Length: 127, time elapsed: 6.438772 (builtin), 8.268305 (libcall)
Length: 128, time elapsed: 6.268991 (builtin), 8.114557 (libcall)
Length: 255, time elapsed: 12.417338 (builtin), 14.259678 (libcall)
(times given in seconds per 1000000 `memset' invocations for the given
length made in a loop). It is clear from these figures that hardware
does data coalescence for consecutive bytes rather than naively copying
them one by one, as for lengths that are powers of 2 the figures are
consistently lower than ones for their respective next lower lengths.
The use of MOVC5 also requires at least 4 bytes less in terms of machine
code as it avoids encoding the address of `memset' needed for the CALLS
instruction used to make a libcall, as well as extra PUSHL instructions
needed to pass arguments to the call as those can be encoded directly as
the respective operands of the MOVC5 instruction.
It is perhaps worth noting too that for constant lengths we prefer to
emit up to 5 individual MOVx instructions rather than a single MOVC5
instruction to clear memory and for consistency we copy this behavior
here for filling memory with another value too, even though there may be
a performance advantage with a string copy in comparison to a piecemeal
copy, e.g.:
Length: 40, time elapsed: 2.183192 (string), 2.638878 (piecemeal)
But this is something for another change as it will have to be carefully
evaluated.
[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
3.10 "Character-String Instructions", p. 3-163
gcc/
* config/vax/vax.h (SET_RATIO): New macro.
* config/vax/vax.md (UNSPEC_SETMEM_FILL): New constant.
(setmemhi): New expander.
(setmemhi1): New insn and splitter.
(*setmemhi1): New insn.
gcc/testsuite/
* gcc.target/vax/setmem.c: New test.
François Dumont [Fri, 12 Nov 2021 06:26:33 +0000 (07:26 +0100)]
libstdc++: [_GLIBCXX_DEBUG] Remove _Safe_container<>::_M_safe()
_GLIBCXX_DEBUG container code cleanup to get rid of _Safe_container<>::_M_safe() and just
use _Safe:: calls which use normal inheritance. Also remove several usages of _M_base()
which can most of the time be omitted and are sometimes replaced with explicit _Base::
calls.
libstdc++-v3/ChangeLog:
* include/debug/safe_container.h (_Safe_container<>::_M_safe): Remove.
* include/debug/deque (deque::operator=(initializer_list<>)): Replace
_M_base() call with _Base:: call.
(deque::operator[](size_type)): Likewise.
* include/debug/forward_list (forward_list(forward_list&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(forward_list::operator=(initializer_list<>)): Remove _M_base() calls.
(forward_list::splice_after, forward_list::merge): Likewise.
* include/debug/list (list(list&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(list::operator=(initializer_list<>)): Remove _M_base() calls.
(list::splice, list::merge): Likewise.
* include/debug/map.h (map(map&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(map::operator=(initializer_list<>)): Remove _M_base() calls.
* include/debug/multimap.h (multimap(multimap&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(multimap::operator=(initializer_list<>)): Remove _M_base() calls.
* include/debug/set.h (set(set&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(set::operator=(initializer_list<>)): Remove _M_base() calls.
* include/debug/multiset.h (multiset(multiset&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(multiset::operator=(initializer_list<>)): Remove _M_base() calls.
* include/debug/string (basic_string(basic_string&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(basic_string::operator=(initializer_list<>)): Remove _M_base() call.
(basic_string::operator=(const _CharT*), basic_string::operator=(_CharT)): Likewise.
(basic_string::operator[](size_type), basic_string::operator+=(const basic_string&)):
Likewise.
(basic_string::operator+=(const _CharT*), basic_string::operator+=(_CharT)): Likewise.
* include/debug/unordered_map (unordered_map(unordered_map&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(unordered_map::operator=(initializer_list<>), unordered_map::merge):
Remove _M_base() calls.
(unordered_multimap(unordered_multimap&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(unordered_multimap::operator=(initializer_list<>), unordered_multimap::merge):
Remove _M_base() calls.
* include/debug/unordered_set (unordered_set(unordered_set&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(unordered_set::operator=(initializer_list<>), unordered_set::merge):
Remove _M_base() calls.
(unordered_multiset(unordered_multiset&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(unordered_multiset::operator=(initializer_list<>), unordered_multiset::merge):
Remove _M_base() calls.
* include/debug/vector (vector(vector&&, const allocator_type&)):
Remove _M_safe() and _M_base() calls.
(vector::operator=(initializer_list<>)): Remove _M_base() calls.
(vector::operator[](size_type)): Likewise.
Jan Hubicka [Sun, 14 Nov 2021 17:49:15 +0000 (18:49 +0100)]
Extend modref to track kills
This patch adds kill tracking to ipa-modref. This is represented by an array
of accesses to memory locations that are known to be overwritten by the
function.
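A small C illustration of what a kill summary can enable (not the committed modref-dse-3.c test): init_pair always overwrites both elements, so a store made by the caller just before the call is a candidate for dead store elimination. The function names are hypothetical, and whether the store is actually removed depends on the optimization level and the summary computed.

```c
/* noinline keeps the call visible, so only a summary of its stores
   can tell the caller which locations are overwritten.  */
__attribute__((noinline))
static void init_pair(int *p)
{
    p[0] = 0;   /* always executed: kills p[0] */
    p[1] = 0;   /* always executed: kills p[1] */
}

static int use_pair(void)
{
    int a[2];
    a[0] = 5;        /* candidate dead store: init_pair overwrites it */
    init_pair(a);
    return a[0] + a[1];
}
```

The result is 0 whether or not the first store is eliminated, which is what makes the elimination safe.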
gcc/ChangeLog:
2021-11-14 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref-tree.c (modref_access_node::update_for_kills): New
member function.
(modref_access_node::merge_for_kills): Likewise.
(modref_access_node::insert_kill): Likewise.
* ipa-modref-tree.h (modref_access_node::update_for_kills,
modref_access_node::merge_for_kills, modref_access_node::insert_kill):
Declare.
(modref_access_node::useful_for_kill): New member function.
* ipa-modref.c (modref_summary::useful_p): Release useless kills.
(lto_modref_summary): Add kills.
(modref_summary::dump): Dump kills.
(record_access): Add modref_access_node parameter.
(record_access_lto): Likewise.
(merge_call_side_effects): Merge kills.
(analyze_call): Add ALWAYS_EXECUTED param and pass it around.
(struct summary_ptrs): Add always_executed field.
(analyze_load): Update.
(analyze_store): Update; record kills.
(analyze_stmt): Add always_executed; record kills in clobbers.
(analyze_function): Track always_executed.
(modref_summaries::duplicate): Duplicate kills.
(update_signature): Release kills.
* ipa-modref.h (struct modref_summary): Add kills.
* tree-ssa-alias.c (alias_stats): Add kill stats.
(dump_alias_stats): Dump kill stats.
(store_kills_ref_p): Break out from ...
(stmt_kills_ref_p): Use it; handle modref info based kills.
gcc/testsuite/ChangeLog:
2021-11-14 Jan Hubicka <hubicka@ucw.cz>
* gcc.dg/tree-ssa/modref-dse-3.c: New test.
Aldy Hernandez [Sun, 14 Nov 2021 15:17:36 +0000 (16:17 +0100)]
Remove gcc.dg/pr103229.c
gcc/testsuite/ChangeLog:
* gcc.dg/pr103229.c: Removed.
Aldy Hernandez [Sun, 14 Nov 2021 10:27:32 +0000 (11:27 +0100)]
Do not pass NULL to memset in ssa_global_cache.
The code computing ranges in PHIs in the path solver reuses the
temporary ssa_global_cache by calling its clear method. Calling it on
an empty cache causes us to call memset with NULL.
Tested on x86-64 Linux.
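The shape of the fix can be sketched in plain C. The struct and function names below are hypothetical stand-ins, not the ssa_global_cache code. The point is that memset's pointer argument must be valid even when the length is zero, so an empty table has to be skipped explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache backed by a possibly empty table.  */
struct cache {
    unsigned *tab;   /* NULL while the cache is empty */
    size_t len;      /* number of entries in tab */
};

static void cache_clear(struct cache *c)
{
    /* Skip the call entirely for an empty cache rather than passing
       a null pointer with a zero length.  */
    if (c->tab != NULL)
        memset(c->tab, 0, c->len * sizeof *c->tab);
}
```

Clearing an empty cache is then a no-op, while a populated one has every entry zeroed.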
gcc/ChangeLog:
PR tree-optimization/103229
* gimple-range-cache.cc (ssa_global_cache::clear): Do not pass
null value to memset.
gcc/testsuite/ChangeLog:
* gcc.dg/pr103229.c: New test.
Martin Liska [Sun, 14 Nov 2021 12:54:32 +0000 (13:54 +0100)]
tsan: remove not needed -ldl in options
gcc/testsuite/ChangeLog:
* c-c++-common/tsan/free_race.c: Remove unnecessary -ldl.
* c-c++-common/tsan/free_race2.c: Likewise.
Jan Hubicka [Sun, 14 Nov 2021 11:01:41 +0000 (12:01 +0100)]
Cleanup tree-ssa-alias and tree-ssa-dse use of modref summary
Move code getting tree op from access_node and stmt to a common place. I also
commonized logic to build ao_ref. While I was at it I also replaced FOR_EACH_*
by range for since it reads better.
gcc/ChangeLog:
2021-11-14 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref-tree.c (modref_access_node::get_call_arg): New member
function.
(modref_access_node::get_ao_ref): Likewise.
* ipa-modref-tree.h (modref_access_node::get_call_arg): Declare.
(modref_access_node::get_ao_ref): Declare.
* tree-ssa-alias.c (modref_may_conflict): Use new accessors.
* tree-ssa-dse.c (dse_optimize_call): Use new accessors.
gcc/testsuite/ChangeLog:
2021-11-14 Jan Hubicka <hubicka@ucw.cz>
* c-c++-common/asan/null-deref-1.c: Update template.
* c-c++-common/tsan/free_race.c: Update template.
* c-c++-common/tsan/free_race2.c: Update template.
* gcc.dg/ipa/ipa-sra-4.c: Update template.
GCC Administrator [Sun, 14 Nov 2021 00:16:23 +0000 (00:16 +0000)]
Daily bump.
Jan Hubicka [Sat, 13 Nov 2021 23:48:32 +0000 (00:48 +0100)]
Fix bug in ipa-pure-const and add debug counters
gcc/ChangeLog:
PR lto/103211
* dbgcnt.def (ipa_attr): New counters.
* ipa-pure-const.c: Include dbgcnt.h.
(ipa_make_function_const): Use debug counter.
(ipa_make_function_pure): Likewise.
(propagate_pure_const): Fix bug in my previous change.
Jan Hubicka [Sat, 13 Nov 2021 22:18:38 +0000 (23:18 +0100)]
More ipa-modref-tree.h cleanups
Move access dumping to a member function and clean up formatting.
gcc/ChangeLog:
2021-11-13 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref-tree.c (modref_access_node::range_info_useful_p):
Offline from ipa-modref-tree.h.
(modref_access_node::dump): Move from ipa-modref.c; make member
function.
* ipa-modref-tree.h (modref_access_node::range_info_useful_p,
modref_access_node::dump): Declare.
* ipa-modref.c (dump_access): Remove.
(dump_records): Update.
(dump_lto_records): Update.
(record_access): Update.
(record_access_lto): Update.
Jan Hubicka [Sat, 13 Nov 2021 21:25:23 +0000 (22:25 +0100)]
Implement DSE of dead function calls storing memory.
gcc/ChangeLog:
2021-11-13 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (modref_summary::modref_summary): Clear new flags.
(modref_summary::dump): Dump try_dse.
(modref_summary::finalize): Add FUN attribute; compute try-dse.
(analyze_function): Update.
(read_section): Update.
(update_signature): Update.
(pass_ipa_modref::execute): Update.
* ipa-modref.h (struct modref_summary):
* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Export.
* tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
* tree-ssa-dse.c (dse_optimize_call): New function.
(dse_optimize_stmt): Use it.
gcc/testsuite/ChangeLog:
2021-11-13 Jan Hubicka <hubicka@ucw.cz>
* g++.dg/cpp1z/inh-ctor23.C: Fix template
* g++.dg/ipa/ipa-icf-4.C: Fix template
* gcc.dg/tree-ssa/modref-dse-1.c: New test.
* gcc.dg/tree-ssa/modref-dse-2.c: New test.
Jan Hubicka [Sat, 13 Nov 2021 19:43:55 +0000 (20:43 +0100)]
Fix checking disabled build.
gcc/ChangeLog:
2021-11-13 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref-tree.c: Move #if CHECKING_P to proper place.
Xi Ruoyao [Tue, 9 Nov 2021 13:40:04 +0000 (21:40 +0800)]
fixincludes: simplify handling for access() failure [PR21283, PR80047]
POSIX says:
On some implementations, if buf is a null pointer, getcwd() may obtain
size bytes of memory using malloc(). In this case, the pointer returned
by getcwd() may be used as the argument in a subsequent call to free().
Invoking getcwd() with buf as a null pointer is not recommended in
conforming applications.
This produces an error building GCC with --enable-werror-always:
../../../fixincludes/fixincl.c: In function ‘process’:
../../../fixincludes/fixincl.c:1356:7: error: argument 1 is null but
the corresponding size argument 2 value is 4096 [-Werror=nonnull]
It's suggested by POSIX to call getcwd() with progressively larger
buffers until it does not give an [ERANGE] error. However, it's highly
unlikely that this error-handling route is ever used.
So we can simplify it instead of writing too much code. We give up on
using getcwd(), because `make` will output a `Leaving directory ...` message
containing the path to cwd when we call abort().
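For reference, the POSIX-suggested approach that this patch deliberately does not implement can be sketched as follows. The function name current_directory and the starting buffer size are assumptions made for illustration; this is not code from fixincl.c.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <unistd.h>

// Return the current working directory, or an empty string on failure.
// The buffer is always caller-provided, so this never relies on the
// non-standard getcwd (NULL, size) behaviour.
std::string current_directory ()
{
  std::vector<char> buf (64);  // Arbitrary starting size for this sketch.
  for (;;)
    {
      if (getcwd (buf.data (), buf.size ()) != nullptr)
        return std::string (buf.data ());
      // ERANGE means the buffer was too small; anything else is fatal.
      if (errno != ERANGE)
        return std::string ();
      buf.resize (buf.size () * 2);
    }
}
```

The loop is short, but it only serves an error path that is highly unlikely to run, which is why dropping getcwd() altogether is the simpler choice here.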
fixincludes/ChangeLog:
PR other/21823
PR bootstrap/80047
* fixincl.c (process): Simplify the handling for highly
unlikely access() failure, to avoid using non-standard
extensions.
Jan Hubicka [Sat, 13 Nov 2021 17:27:18 +0000 (18:27 +0100)]
modref_access_node cleanup
Move member functions of modref_access_node from ipa-modref-tree.h to
ipa-modref-tree.c, since they have become long and are not suitable for inlining
anyway. I also cleaned up the interface by adding a static insert method (which
handles inserting accesses into a vector and optimizing them); this makes it
possible to make most of the interval-merging interface private.
Honza
gcc/ChangeLog:
* ipa-modref-tree.h
(struct modref_access_node): Move longer member functions to
ipa-modref-tree.c
(modref_ref_node::try_merge_with): Turn into modref_access_node member
function.
* ipa-modref-tree.c (modref_access_node::contains): Move here
from ipa-modref-tree.h.
(modref_access_node::update): Likewise.
(modref_access_node::merge): Likewise.
(modref_access_node::closer_pair_p): Likewise.
(modref_access_node::forced_merge): Likewise.
(modref_access_node::update2): Likewise.
(modref_access_node::combined_offsets): Likewise.
(modref_access_node::try_merge_with): Likewise.
(modref_access_node::insert): Likewise.
Jan Hubicka [Sat, 13 Nov 2021 17:21:12 +0000 (18:21 +0100)]
Add finalize method to modref summary.
gcc/ChangeLog:
* ipa-modref.c (modref_summary::global_memory_read_p): Remove.
(modref_summary::global_memory_written_p): Remove.
(modref_summary::dump): Dump new flags.
(modref_summary::finalize): New member function.
(analyze_function): Call it.
(read_section): Call it.
(update_signature): Call it.
(pass_ipa_modref::execute): Call it.
* ipa-modref.h (struct modref_summary): Remove
global_memory_read_p and global_memory_written_p.
Add global_memory_read, global_memory_written.
* tree-ssa-structalias.c (determine_global_memory_access):
Update.
Jan Hubicka [Sat, 13 Nov 2021 14:46:57 +0000 (15:46 +0100)]
Whitelist type attributes for function signature change
gcc/ChangeLog:
* ipa-fnsummary.c (compute_fn_summary): Use type_attribute_allowed_p.
* ipa-param-manipulation.c
(ipa_param_adjustments::type_attribute_allowed_p):
New member function.
(drop_type_attribute_if_params_changed_p): New function.
(build_adjusted_function_type): Use it.
* ipa-param-manipulation.h: Add type_attribute_allowed_p.
David Malcolm [Wed, 26 May 2021 19:47:23 +0000 (15:47 -0400)]
analyzer: add four new taint-based warnings
The initial commit of the analyzer in GCC 10 had a single warning,
-Wanalyzer-tainted-array-index
and required manually enabling the taint checker with
-fanalyzer-checker=taint (due to scaling issues).
This patch extends the taint detection to add four new taint-based
warnings:
-Wanalyzer-tainted-allocation-size
for e.g. attacker-controlled malloc/alloca
-Wanalyzer-tainted-divisor
for detecting where an attacker can inject a divide-by-zero
-Wanalyzer-tainted-offset
for attacker-controlled pointer offsets
-Wanalyzer-tainted-size
for e.g. attacker-controlled memset
and rewords all the warnings to talk about "attacker-controlled" values
rather than "tainted" values.
Unfortunately I haven't yet addressed the scaling issues, so all of
these still require -fanalyzer-checker=taint (in addition to -fanalyzer).
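As a rough illustration of the four categories, here is a sketch of the checks that code handling attacker-controlled values would typically need. The function names and the max_request bound are hypothetical, and the commit message does not say which input sources the checker treats as attacker-controlled, so this is not a statement about what the analyzer will or will not report.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Upper bound chosen for this sketch only.
constexpr std::size_t max_request = 4096;

// Allocation size: reject an untrusted size outside a fixed range.
void *checked_alloc (std::size_t untrusted_size)
{
  if (untrusted_size == 0 || untrusted_size > max_request)
    return nullptr;
  return std::malloc (untrusted_size);
}

// Divisor: refuse an untrusted zero (and the INT_MIN / -1 overflow).
bool checked_divide (int numerator, int untrusted_divisor, int *result)
{
  if (untrusted_divisor == 0)
    return false;
  if (untrusted_divisor == -1 && numerator == INT_MIN)
    return false;
  *result = numerator / untrusted_divisor;
  return true;
}

// Offset and size: both must stay inside the destination buffer.
bool checked_fill (char *buf, std::size_t buf_len,
                   std::size_t untrusted_offset,
                   std::size_t untrusted_size, char value)
{
  if (untrusted_offset > buf_len
      || untrusted_size > buf_len - untrusted_offset)
    return false;
  std::memset (buf + untrusted_offset, value, untrusted_size);
  return true;
}
```

Removing any of the range checks leaves the allocation, division, pointer offset, or memset size under the control of whoever supplies the value, which is the situation the new warnings are named after.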
gcc/analyzer/ChangeLog:
* analyzer.opt (Wanalyzer-tainted-allocation-size): New.
(Wanalyzer-tainted-divisor): New.
(Wanalyzer-tainted-offset): New.
(Wanalyzer-tainted-size): New.
* engine.cc (impl_region_model_context::get_taint_map): New.
* exploded-graph.h (impl_region_model_context::get_taint_map):
New decl.
* program-state.cc (sm_state_map::get_state): Call
alt_get_inherited_state.
(sm_state_map::impl_set_state): Modify states within
compound svalues.
(program_state::impl_call_analyzer_dump_state): Undo casts.
(selftest::test_program_state_1): Update for new context param of
create_region_for_heap_alloc.
(selftest::test_program_state_merging): Likewise.
* region-model-impl-calls.cc (region_model::impl_call_alloca):
Likewise.
(region_model::impl_call_calloc): Likewise.
(region_model::impl_call_malloc): Likewise.
(region_model::impl_call_operator_new): Likewise.
(region_model::impl_call_realloc): Likewise.
* region-model.cc (region_model::check_region_access): Call
check_region_for_taint.
(region_model::get_representative_path_var_1): Handle binops.
(region_model::create_region_for_heap_alloc): Add "ctxt" param and
pass it to set_dynamic_extents.
(region_model::create_region_for_alloca): Likewise.
(region_model::set_dynamic_extents): Add "ctxt" param and use it
to call check_dynamic_size_for_taint.
(selftest::test_state_merging): Update for new context param of
create_region_for_heap_alloc.
(selftest::test_malloc_constraints): Likewise.
(selftest::test_malloc): Likewise.
(selftest::test_alloca): Likewise for create_region_for_alloca.
* region-model.h (region_model::create_region_for_heap_alloc): Add
"ctxt" param.
(region_model::create_region_for_alloca): Likewise.
(region_model::set_dynamic_extents): Likewise.
(region_model::check_dynamic_size_for_taint): New decl.
(region_model::check_region_for_taint): New decl.
(region_model_context::get_taint_map): New vfunc.
(noop_region_model_context::get_taint_map): New.
* sm-taint.cc: Remove include of "diagnostic-event-id.h"; add
includes of "gimple-iterator.h", "tristate.h", "selftest.h",
"ordered-hash-map.h", "cgraph.h", "cfg.h", "digraph.h",
"analyzer/supergraph.h", "analyzer/call-string.h",
"analyzer/program-point.h", "analyzer/store.h",
"analyzer/region-model.h", and "analyzer/program-state.h".
(enum bounds): Move to top of file.
(class taint_diagnostic): New.
(class tainted_array_index): Convert to subclass of taint_diagnostic.
(tainted_array_index::emit): Add CWE-129. Reword warning to use
"attacker-controlled" rather than "tainted".
(tainted_array_index::describe_state_change): Move to
taint_diagnostic::describe_state_change.
(tainted_array_index::describe_final_event): Reword to use
"attacker-controlled" rather than "tainted".
(class tainted_offset): New.
(class tainted_size): New.
(class tainted_divisor): New.
(class tainted_allocation_size): New.
(taint_state_machine::alt_get_inherited_state): New.
(taint_state_machine::on_stmt): In assignment handling, remove
ARRAY_REF handling in favor of check_region_for_taint. Add
detection of tainted divisors.
(taint_state_machine::get_taint): New.
(taint_state_machine::combine_states): New.
(region_model::check_region_for_taint): New.
(region_model::check_dynamic_size_for_taint): New.
* sm.h (state_machine::alt_get_inherited_state): New.
gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add
-Wno-analyzer-tainted-allocation-size,
-Wno-analyzer-tainted-divisor, -Wno-analyzer-tainted-offset, and
-Wno-analyzer-tainted-size to list. Add
-Wanalyzer-tainted-allocation-size, -Wanalyzer-tainted-divisor,
-Wanalyzer-tainted-offset, and -Wanalyzer-tainted-size to list
of options effectively enabled by -fanalyzer.
(-Wanalyzer-tainted-allocation-size): New.
(-Wanalyzer-tainted-array-index): Tweak wording; add link to CWE.
(-Wanalyzer-tainted-divisor): New.
(-Wanalyzer-tainted-offset): New.
(-Wanalyzer-tainted-size): New.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/pr93382.c: Tweak expected wording.
* gcc.dg/analyzer/taint-alloc-1.c: New test.
* gcc.dg/analyzer/taint-alloc-2.c: New test.
* gcc.dg/analyzer/taint-divisor-1.c: New test.
* gcc.dg/analyzer/taint-1.c: Rename to...
* gcc.dg/analyzer/taint-read-index-1.c: ...this. Tweak expected
wording. Mark some events as xfail.
* gcc.dg/analyzer/taint-read-offset-1.c: New test.
* gcc.dg/analyzer/taint-size-1.c: New test.
* gcc.dg/analyzer/taint-write-index-1.c: New test.
* gcc.dg/analyzer/taint-write-offset-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jan Hubicka [Sat, 13 Nov 2021 14:20:00 +0000 (15:20 +0100)]
Remember fnspec based EAF flags in modref summary.
gcc/ChangeLog:
* attr-fnspec.h (attr_fnspec::arg_eaf_flags): Break out from ...
* gimple.c (gimple_call_arg_flags): ... here.
* ipa-modref.c (analyze_parms): Record flags known from fnspec.
(modref_merge_call_site_flags): Use arg_eaf_flags.
Aldy Hernandez [Sat, 13 Nov 2021 11:37:25 +0000 (12:37 +0100)]
path solver: Compute all PHI ranges simultaneously.
PHIs must be resolved simultaneously, otherwise we may not pick up the
ranges incoming to the block.
For example, if we put p3_7 in the cache before all PHIs have been
computed, we will pick up the wrong p3_7 value for p2_17:
# p3_7 = PHI <1(2), 0(5)>
# p2_17 = PHI <1(2), p3_7(5)>
This patch delays updating the cache until all PHIs have been
analyzed.
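The difference between the two update orders can be sketched with a toy model. Everything here is hypothetical: single integers stand in for ranges, a std::map stands in for the cache, and the incoming p3_7 value of 1 is assumed for the sake of the example. None of it is the path solver's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A PHI argument on the edge being followed: a constant or an SSA name.
struct phi_arg
{
  bool is_name;
  int constant;
  std::string name;
};

using cache_t = std::map<std::string, int>;
using phi_t = std::pair<std::string, phi_arg>;  // Result name, argument.

phi_arg constant_arg (int value) { return phi_arg{false, value, ""}; }
phi_arg name_arg (const std::string &name) { return phi_arg{true, 0, name}; }

static int eval (const phi_arg &arg, const cache_t &cache)
{
  return arg.is_name ? cache.at (arg.name) : arg.constant;
}

// Problematic order: each result is cached before later PHIs are read.
cache_t resolve_sequentially (cache_t cache, const std::vector<phi_t> &phis)
{
  for (const phi_t &phi : phis)
    cache[phi.first] = eval (phi.second, cache);
  return cache;
}

// Delayed order: evaluate every PHI against the incoming cache, then
// commit all results at once.
cache_t resolve_simultaneously (cache_t cache, const std::vector<phi_t> &phis)
{
  std::vector<std::pair<std::string, int>> pending;
  for (const phi_t &phi : phis)
    pending.emplace_back (phi.first, eval (phi.second, cache));
  for (const auto &result : pending)
    cache[result.first] = result.second;
  return cache;
}
```

With the two PHIs above taken along edge 5 and an incoming p3_7 of 1, the sequential version gives p2_17 the freshly written 0, while the delayed version gives it the incoming 1.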
gcc/ChangeLog:
PR tree-optimization/103222
* gimple-range-path.cc (path_range_query::compute_ranges_in_phis):
New.
(path_range_query::compute_ranges_in_block): Call
compute_ranges_in_phis.
* gimple-range-path.h (path_range_query::compute_ranges_in_phis):
New.
gcc/testsuite/ChangeLog:
* gcc.dg/pr103222.c: New test.
H.J. Lu [Sat, 13 Nov 2021 13:17:14 +0000 (05:17 -0800)]
libsanitizer: Update LOCAL_PATCHES
* LOCAL_PATCHES: Update to the corresponding revision.
H.J. Lu [Tue, 20 Jul 2021 17:46:51 +0000 (10:46 -0700)]
libsanitizer: Apply local patches
H.J. Lu [Sat, 13 Nov 2021 06:23:45 +0000 (22:23 -0800)]
libsanitizer: Merge with upstream
Merged revision:
82bc6a094e85014f1891ef9407496f44af8fe442
with the fix for PR sanitizer/102911
Jonathan Wakely [Fri, 12 Nov 2021 18:45:32 +0000 (18:45 +0000)]
libstdc++: Implement std::spanstream for C++23
This implements the <spanstream> header, as proposed for C++23 by P0448R4.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add spanstream header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Add spanstream header.
* include/std/version (__cpp_lib_spanstream): Define.
* include/std/spanstream: New file.
* testsuite/27_io/spanstream/1.cc: New test.
* testsuite/27_io/spanstream/version.cc: New test.
Jan Hubicka [Sat, 13 Nov 2021 11:13:42 +0000 (12:13 +0100)]
Enable ipa-sra with fnspec attributes
Enable some ipa-sra on Fortran by allowing signature changes on functions
with the "fn spec" attribute when ipa-modref is enabled. This is possible since ipa-modref
knows how to preserve the things we track in fnspec, and the fnspecs generated by the Fortran
frontend are quite simple and can now be analysed automatically. To be sure, I will also add
code that merges fnspec into parameters.
This unfortunately hits a bug in ipa-param-manipulation when we remove a parameter
that specifies the size of a variable-length parameter. For this reason I added a hack
that prevents signature changes on such functions and will handle it incrementally.
I tried creating a C testcase, but it is blocked by another problem: we punt on ipa-sra
for the access attribute. This is an optimization regression we ought to fix, so I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223.
As a followup I will add code classifying the type attributes (we have just a few) and
get stats on the access attribute.
gcc/ChangeLog:
* ipa-fnsummary.c (compute_fn_summary): Do not give up on signature
changes on "fn spec" attribute; give up on variadic types.
* ipa-param-manipulation.c: Include attribs.h.
(build_adjusted_function_type): New parameter ARG_MODIFIED; if it is
true remove "fn spec" attribute.
(ipa_param_adjustments::build_new_function_type): Update.
(ipa_param_body_adjustments::modify_formal_parameters): Update.
* ipa-sra.c: Include attribs.h.
(ipa_sra_preliminary_function_checks): Do not check for TYPE_ATTRIBUTES.
Aldy Hernandez [Fri, 12 Nov 2021 08:06:55 +0000 (09:06 +0100)]
path solver: Merge path_range_query constructors.
There's no need for two constructors, when we can do it all with one
that defaults to the common behavior:
path_range_query (bool resolve = true, gimple_ranger *ranger = NULL);
Tested on x86-64 Linux.
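A minimal sketch of the pattern, using hypothetical toy_query and toy_ranger types rather than the real classes. The ownership flag is an assumption about why the destructor needs adjusting; it is not taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for gimple_ranger.
struct toy_ranger {};

class toy_query
{
public:
  // One constructor with defaulted arguments replaces two overloads.
  // If no ranger is supplied, allocate one and remember that we own it.
  toy_query (bool resolve = true, toy_ranger *ranger = nullptr)
    : m_resolve (resolve),
      m_ranger (ranger ? ranger : new toy_ranger),
      m_owns_ranger (ranger == nullptr)
  {
  }

  toy_query (const toy_query &) = delete;
  toy_query &operator= (const toy_query &) = delete;

  // Only free the ranger when the constructor allocated it.
  ~toy_query ()
  {
    if (m_owns_ranger)
      delete m_ranger;
  }

  bool resolve () const { return m_resolve; }
  bool owns_ranger () const { return m_owns_ranger; }
  toy_ranger *ranger () const { return m_ranger; }

private:
  bool m_resolve;
  toy_ranger *m_ranger;
  bool m_owns_ranger;
};
```

Callers that want the common behaviour write `toy_query q;`, and callers that already have a ranger pass it explicitly, so both former call styles go through the same code path.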
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::path_range_query): Merge
ctors.
(path_range_query::import_p): Move from header file.
(path_range_query::~path_range_query): Adjust for combined ctors.
* gimple-range-path.h: Merge ctors.
(path_range_query::import_p): Move to .cc file.
Jan Hubicka [Sat, 13 Nov 2021 00:51:25 +0000 (01:51 +0100)]
Fix wrong code with modref and some builtins.
ipa-modref gets confused by the EAF flags of memcpy because parameter 1 is
escaping but used only directly. In modref we do not track values saved to
memory and thus we clear all other flags on each store. This also needs to
happen when the called function escapes a parameter.
gcc/ChangeLog:
PR tree-optimization/103182
* ipa-modref.c (callee_to_caller_flags): Fix merging of flags.
(modref_eaf_analysis::analyze_ssa_name): Fix merging of flags.
Hans-Peter Nilsson [Fri, 12 Nov 2021 17:04:43 +0000 (18:04 +0100)]
libstdc++: Use GCC_TRY_COMPILE_OR_LINK for getentropy, arc4random
Since r12-5056-g3439657b0286, there has been a regression in
test results; an additional 100 FAILs running the g++ and
libstdc++ testsuite on cris-elf, a newlib target. The
failures are linker errors, not finding a definition for
getentropy. It appears newlib has since 2017-12-03
declarations of getentropy and arc4random, and provides an
implementation of arc4random using getentropy, but provides no
definition of getentropy, not even a stub yielding ENOSYS.
This is similar to what it does for many other functions too.
While fixing newlib (like adding said stub) would likely help,
it still leaves older newlib releases hanging. Thankfully,
the libstdc++ configury test can be improved to try linking
where possible; using the bespoke GCC_TRY_COMPILE_OR_LINK
instead of AC_TRY_COMPILE. BTW, I see a lack of consistency;
some tests use AC_TRY_COMPILE and some GCC_TRY_COMPILE_OR_LINK
for no apparent reason, but this commit just amends
r12-5056-g3439657b0286.
libstdc++-v3:
PR libstdc++/103166
* acinclude.m4 (GLIBCXX_CHECK_GETENTROPY, GLIBCXX_CHECK_ARC4RANDOM):
Use GCC_TRY_COMPILE_OR_LINK instead of AC_TRY_COMPILE.
* configure: Regenerate.
GCC Administrator [Sat, 13 Nov 2021 00:16:39 +0000 (00:16 +0000)]
Daily bump.
Stafford Horne [Mon, 8 Nov 2021 22:10:12 +0000 (07:10 +0900)]
or1k: Fix clobbering of _mcount argument if fPIC is enabled
Recently we changed the PROFILE_HOOK _mcount call to pass in the link
register as an argument. This actually does not work when the _mcount
call uses a PLT because the GOT register setup code ends up getting
inserted before the PROFILE_HOOK and clobbers the link register
argument.
These glibc tests are failing:
gmon/tst-gmon-pie-gprof
gmon/tst-gmon-static-gprof
This patch fixes this by saving the instruction that stores the Link
Register to the _mcount argument and then inserts the GOT register setup
instructions after that.
For example:
main.c:
extern int e;
int f2(int a) {
return a + e;
}
int f1(int a) {
return f2 (a + a);
}
int main(int argc, char ** argv) {
return f1 (argc);
}
Compiled:
or1k-smh-linux-gnu-gcc -Wall -c -O2 -fPIC -pg -S main.c
Before Fix:
main:
l.addi r1, r1, -16
l.sw 8(r1), r2
l.sw 0(r1), r16
l.addi r2, r1, 16 # Keeping FP, but not needed
l.sw 4(r1), r18
l.sw 12(r1), r9
l.jal 8 # GOT Setup clobbers r9 (Link Register)
l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
l.ori r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
l.add r16, r16, r9
l.or r18, r3, r3
l.or r3, r9, r9 # This is not the original LR
l.jal plt(_mcount)
l.nop
l.jal plt(f1)
l.or r3, r18, r18
l.lwz r9, 12(r1)
l.lwz r16, 0(r1)
l.lwz r18, 4(r1)
l.lwz r2, 8(r1)
l.jr r9
l.addi r1, r1, 16
After the fix:
main:
l.addi r1, r1, -12
l.sw 0(r1), r16
l.sw 4(r1), r18
l.sw 8(r1), r9
l.or r18, r3, r3
l.or r3, r9, r9 # We now have r9 (LR) set early
l.jal 8 # Clobbers r9 (Link Register)
l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
l.ori r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
l.add r16, r16, r9
l.jal plt(_mcount)
l.nop
l.jal plt(f1)
l.or r3, r18, r18
l.lwz r9, 8(r1)
l.lwz r16, 0(r1)
l.lwz r18, 4(r1)
l.jr r9
l.addi r1, r1, 12
Fixes:
308531d148a ("or1k: Add return address argument to _mcount call")
gcc/ChangeLog:
* config/or1k/or1k-protos.h (or1k_profile_hook): New function.
* config/or1k/or1k.h (PROFILE_HOOK): Change macro to reference
new function or1k_profile_hook.
* config/or1k/or1k.c (struct machine_function): Add new field
set_mcount_arg_insn.
(or1k_profile_hook): New function.
(or1k_init_pic_reg): Update to inject pic rtx after _mcount arg
when profiling.
(or1k_frame_pointer_required): Frame pointer no longer needed
when profiling.
Jan Hubicka [Fri, 12 Nov 2021 22:55:50 +0000 (23:55 +0100)]
Fix wrong code with pure functions
I introduced a bug into find_func_aliases_for_call in the handling of pure
functions. Instead of reading global memory, pure functions were believed to
write global memory. This results in misoptimization of the testcase at -O1.
The change to pta-callused.c updates the template for new behaviour of the
constraint generation. We copy nonlocal memory to calluse which is correct but
also not strictly necessary because later we take care to add nonlocal_p flag
manually.
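The testcase itself is not reproduced in this message. As a hypothetical illustration of the property at stake (not the contents of pr103209.c), a pure function may read global memory, so stores that precede a call to it must remain visible to that call:

```cpp
#include <cassert>

int counter;

// A pure function may read global memory but must not modify it.
// The attribute syntax is a GCC extension; noinline keeps the call
// visible to alias analysis in this sketch.
__attribute__ ((pure, noinline)) int read_counter ()
{
  return counter;
}

int sum_two_reads ()
{
  counter = 1;
  int first = read_counter ();   // Must observe the store of 1.
  counter = 2;
  int second = read_counter ();  // Must observe the store of 2.
  return first + second;
}
```

If calls to pure functions are modelled as writing rather than reading global memory, stores like the ones to counter can be mishandled by later passes.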
gcc/ChangeLog:
PR tree-optimization/103209
* tree-ssa-structalias.c (find_func_aliases_for_call): Fix
use of handle_rhs_call
gcc/testsuite/ChangeLog:
PR tree-optimization/103209
* gcc.dg/tree-ssa/pta-callused.c: Update template.
* gcc.c-torture/execute/pr103209.c: New test.
Aldy Hernandez [Fri, 12 Nov 2021 15:08:01 +0000 (16:08 +0100)]
path solver: Solve PHI imports first for ranges.
PHIs must be resolved first while solving ranges in a block,
regardless of where they appear in the import bitmap. We went through
a similar exercise for the relational code, but missed these.
Tested on x86-64 & ppc64le Linux.
gcc/ChangeLog:
PR tree-optimization/103202
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Solve PHI imports first.
Jan Hubicka [Fri, 12 Nov 2021 19:15:48 +0000 (20:15 +0100)]
Fix ipa-pure-const
gcc/ChangeLog:
* ipa-pure-const.c (propagate_pure_const): Remove redundant check;
fix call of ipa_make_function_const and ipa_make_function_pure.
David Malcolm [Fri, 12 Nov 2021 15:14:35 +0000 (10:14 -0500)]
analyzer: "__analyzer_dump_state" has no side-effects
gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::on_stmt_pre): Return when handling
"__analyzer_dump_state".
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Sandiford [Fri, 12 Nov 2021 17:33:03 +0000 (17:33 +0000)]
aarch64: Remove redundant costing code
Previous patches made some of the complex parts of the issue rate
code redundant.
gcc/
* config/aarch64/aarch64.c (aarch64_vector_op::n_advsimd_ops): Delete.
(aarch64_vector_op::m_seen_loads): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Don't push to
m_advsimd_ops.
(aarch64_vector_op::count_ops): Remove vectype and factor parameters.
Remove code that tries to predict different vec_flags from the
current loop's.
(aarch64_vector_costs::add_stmt_cost): Update accordingly.
Remove m_advsimd_ops handling.
Richard Sandiford [Fri, 12 Nov 2021 17:33:03 +0000 (17:33 +0000)]
aarch64: Use new hooks for vector comparisons
Previously we tried to account for the different issue rates of
the various vector modes by guessing what the Advanced SIMD version
of an SVE loop would look like and what its issue rate was likely to be.
We'd then increase the cost of the SVE loop if the Advanced SIMD loop
might issue more quickly.
This patch moves that logic to better_main_loop_than_p, so that we
can compare loops side-by-side rather than having to guess. This also
means we can apply the issue rate heuristics to *any* vector loop
comparison, rather than just weighting SVE vs. Advanced SIMD.
The actual heuristics are otherwise unchanged. We're just
applying them in a different place.
gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_saw_sve_only_op)
(aarch64_sve_only_stmt_p): Delete.
(aarch64_vector_costs::prefer_unrolled_loop): New function,
extracted from adjust_body_cost.
(aarch64_vector_costs::better_main_loop_than_p): New function,
using heuristics extracted from adjust_body_cost and
adjust_body_cost_sve.
(aarch64_vector_costs::adjust_body_cost_sve): Remove
advsimd_cycles_per_iter and could_use_advsimd parameters.
Update after changes above.
(aarch64_vector_costs::adjust_body_cost): Update after changes above.
Richard Sandiford [Fri, 12 Nov 2021 17:33:02 +0000 (17:33 +0000)]
aarch64: Add vf_factor to aarch64_vec_op_count
-mtune=neoverse-512tvb sets the likely SVE vector length to 128 bits,
but it also takes into account Neoverse V1, which is a 256-bit target.
This patch adds this VF (VL) factor to aarch64_vec_op_count.
gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count::m_vf_factor):
New member variable.
(aarch64_vec_op_count::aarch64_vec_op_count): Add a parameter for it.
(aarch64_vec_op_count::vf_factor): New function.
(aarch64_vector_costs::aarch64_vector_costs): When costing for
neoverse-512tvb, pass a vf_factor of 2 for the Neoverse V1 version
of an SVE loop.
(aarch64_vector_costs::adjust_body_cost): Read the vf factor
instead of hard-coding 2.
Richard Sandiford [Fri, 12 Nov 2021 17:33:02 +0000 (17:33 +0000)]
aarch64: Move cycle estimation into aarch64_vec_op_count
This patch just moves the main cycle estimation routines
into aarch64_vec_op_count.
gcc/
* config/aarch64/aarch64.c
(aarch64_vec_op_count::rename_cycles_per_iter): New function.
(aarch64_vec_op_count::min_nonpred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_pred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_cycles_per_iter): Likewise.
(aarch64_vec_op_count::dump): Move earlier in file. Dump the
above properties too.
(aarch64_estimate_min_cycles_per_iter): Delete.
(adjust_body_cost): Use aarch64_vec_op_count::min_cycles_per_iter
instead of aarch64_estimate_min_cycles_per_iter. Rely on the dump
routine to print CPI estimates.
(adjust_body_cost_sve): Likewise. Use the other functions above
instead of doing the work inline.
Richard Sandiford [Fri, 12 Nov 2021 17:33:02 +0000 (17:33 +0000)]
aarch64: Use an array of aarch64_vec_op_counts
-mtune=neoverse-512tvb uses two issue rates, one for Neoverse V1
and one with more generic parameters. We use both rates when
making a choice between scalar, Advanced SIMD and SVE code.
Previously we calculated the Neoverse V1 issue rates from the
more generic issue rates, but by removing m_scalar_ops and
(later) m_advsimd_ops, it becomes easier to track multiple
issue rates directly.
This patch therefore converts m_ops and (temporarily) m_advsimd_ops
into arrays.
gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count): Allow default
initialization.
(aarch64_vec_op_count::base_issue_info): Remove handling of null
issue_infos.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vector_costs::m_ops): Turn into a vector.
(aarch64_vector_costs::m_advsimd_ops): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Add entries to
the vectors based on aarch64_tune_params.
(aarch64_vector_costs::analyze_loop_vinfo): Update the pred_ops
of all entries in m_ops.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for all
entries in m_ops.
(aarch64_estimate_min_cycles_per_iter): Remove issue_info
parameter and get the information from the ops instead.
(aarch64_vector_costs::adjust_body_cost_sve): Take an
aarch64_vec_issue_info instead of an aarch64_vec_op_count.
(aarch64_vector_costs::adjust_body_cost): Update call accordingly.
Exit earlier if m_ops is empty for either cost structure.
Richard Sandiford [Fri, 12 Nov 2021 17:33:01 +0000 (17:33 +0000)]
aarch64: Use real scalar op counts
Now that vector finish_costs is passed the associated scalar costs,
we can record the scalar issue information while computing the scalar
costs, rather than trying to estimate it while computing the vector
costs.
This simplifies things a little, but the main motivation is to improve
accuracy.
gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_scalar_ops)
(aarch64_vector_costs::m_sve_ops): Replace with...
(aarch64_vector_costs::m_ops): ...this.
(aarch64_vector_costs::analyze_loop_vinfo): Update accordingly.
(aarch64_vector_costs::adjust_body_cost_sve): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Likewise.
Initialize m_vec_flags here rather than in add_stmt_cost.
(aarch64_vector_costs::count_ops): Test for scalar reductions too.
Allow vectype to be null.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for scalar
code too. Don't require vectype to be nonnull.
(aarch64_vector_costs::adjust_body_cost): Take the loop_vec_info
and scalar costs as parameters. Use the scalar costs to determine
the cycles per iteration of the scalar loop, then multiply it
by the estimated VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.
Richard Sandiford [Fri, 12 Nov 2021 17:33:01 +0000 (17:33 +0000)]
aarch64: Get floatness from stmt_info
This patch gets the floatness of a memory access from the data
reference rather than the vectype. This makes it more suitable
for use in scalar costing code.
gcc/
* config/aarch64/aarch64.c (aarch64_dr_type): New function.
(aarch64_vector_costs::count_ops): Use it rather than the
vectype to determine floatness.
Richard Sandiford [Fri, 12 Nov 2021 17:33:00 +0000 (17:33 +0000)]
aarch64: Remove vectype from latency tests
This patch gets the scalar mode of a reduction operation from the
gimple stmt rather than the vectype. This makes it more suitable
for use in scalar costs.
gcc/
* config/aarch64/aarch64.c (aarch64_sve_in_loop_reduction_latency):
Remove vectype parameter and get floatness from the type of the
stmt lhs instead.
(aarch64_in_loop_reduction_latency): Likewise.
(aarch64_detect_vector_stmt_subtype): Update caller.
(aarch64_vector_costs::count_ops): Likewise.
Richard Sandiford [Fri, 12 Nov 2021 17:33:00 +0000 (17:33 +0000)]
aarch64: Fold aarch64_sve_op_count into aarch64_vec_op_count
Later patches make aarch64 use the new vector hooks. We then
only need to track one set of ops for each aarch64_vector_costs
structure. This in turn means that it's more convenient to merge
aarch64_sve_op_count and aarch64_vec_op_count.
The patch also adds issue info and vec flags to aarch64_vec_op_count,
so that the structure is more self-descriptive. This simplifies some
things later.
gcc/
* config/aarch64/aarch64.c (aarch64_sve_op_count): Fold into...
(aarch64_vec_op_count): ...this. Add a constructor.
(aarch64_vec_op_count::vec_flags): New function.
(aarch64_vec_op_count::base_issue_info): Likewise.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vec_op_count::m_issue_info): New member variable.
(aarch64_vec_op_count::m_vec_flags): Likewise.
(aarch64_vector_costs): Add a constructor.
(aarch64_vector_costs::m_sve_ops): Change type to aarch64_vec_op_count.
(aarch64_vector_costs::aarch64_vector_costs): New function.
Initialize m_scalar_ops, m_advsimd_ops and m_sve_ops.
(aarch64_vector_costs::count_ops): Remove vec_flags and
issue_info parameters, using the new aarch64_vec_op_count
functions instead.
(aarch64_vector_costs::add_stmt_cost): Update call accordingly.
(aarch64_sve_op_count::dump): Fold into...
(aarch64_vec_op_count::dump): ...here.
Richard Sandiford [Fri, 12 Nov 2021 17:33:00 +0000 (17:33 +0000)]
aarch64: Detect more consecutive MEMs
For tests like:
int res[2];
void
f1 (int x, int y)
{
res[0] = res[1] = x + y;
}
we generated:
add w0, w0, w1
adrp x1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
str w0, [x1, #:lo12:.LANCHOR0]
str w0, [x2, 4]
ret
Using [x1, #:lo12:.LANCHOR0] for the first store prevented the
two stores being recognised as a pair. However, the MEM_EXPR
and MEM_OFFSET information tell us that the MEMs really are
consecutive. The peephole2 context then guarantees that the
first address is equivalent to [x2, 0].
While there: the reg_mentioned_p tests for loads were probably correct,
but seemed a bit indirect. We're matching two consecutive loads,
so the thing we need to test is that the second MEM in the original
sequence doesn't depend on the result of the first load in the
original sequence.
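The idea of recognising adjacency from the underlying object and offset, rather than from the textual form of the address, can be sketched as follows. The mem_ref struct and both predicates are hypothetical; they loosely model MEM_EXPR as an object pointer and MEM_OFFSET as a byte offset and do not reflect how aarch64_check_consecutive_mems is written.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a memory reference: the object it refers to,
// the byte offset within that object, and the access size in bytes.
struct mem_ref
{
  const void *object;
  std::int64_t offset;
  std::int64_t size;
};

// True if B starts exactly where A ends, within the same object and
// with the same access size.
bool follows_p (const mem_ref &a, const mem_ref &b)
{
  return (a.object == b.object
          && a.size == b.size
          && b.offset == a.offset + a.size);
}

// True if the two references are adjacent in either order.
bool consecutive_p (const mem_ref &a, const mem_ref &b)
{
  return follows_p (a, b) || follows_p (b, a);
}
```

For the res[0] and res[1] stores above, both references name the same object with offsets 0 and 4, so they qualify as a pair even though one address is written relative to x1 and the other relative to x2.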
gcc/
* config/aarch64/aarch64.c: Include tree-dfa.h.
(aarch64_check_consecutive_mems): New function that takes MEM_EXPR
and MEM_OFFSET into account.
(aarch64_swap_ldrstr_operands): Use it.
(aarch64_operands_ok_for_ldpstp): Likewise. Check that the
address of the second memory doesn't depend on the result of
the first load.
gcc/testsuite/
* gcc.target/aarch64/stp_1.c: New test.
Tobias Burnus [Fri, 12 Nov 2021 16:58:21 +0000 (17:58 +0100)]
Fortran/openmp: Fix '!$omp end'
gcc/fortran/ChangeLog:
* parse.c (decode_omp_directive): Fix permitting 'nowait' for some
combined directives, add missing 'omp end ... loop'.
(gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
* openmp.c (resolve_omp_clauses): Add missing combined loop constructs
case values to the 'if(directive-name: ...)' check.
* trans-openmp.c (gfc_split_omp_clauses): Put nowait on target if
first leaf construct accepting it.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
* gfortran.dg/gomp/clauses-1.f90: New test.
* gfortran.dg/gomp/nowait-2.f90: New test.
* gfortran.dg/gomp/nowait-3.f90: New test.
Jan Hubicka [Fri, 12 Nov 2021 15:54:29 +0000 (16:54 +0100)]
Fix exit condition in ipa_make_function_pure
gcc/ChangeLog:
* ipa-pure-const.c (ipa_make_function_pure): Fix exit condition.
Jan Hubicka [Fri, 12 Nov 2021 15:34:03 +0000 (16:34 +0100)]
Fix ICE in tree-ssa-structalias.c
PR tree-optimization/103175
* ipa-modref.c (modref_lattice::merge): Add sanity check.
(callee_to_caller_flags): Make flags adjustment sane.
(modref_eaf_analysis::analyze_ssa_name): Likewise.
Jakub Jelinek [Fri, 12 Nov 2021 15:11:02 +0000 (16:11 +0100)]
libgomp: Unbreak gcn offload build
My recent libgomp change apparently broke libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override teams.c source file
and the patch I've committed assumed all the non-LIBGOMP_USE_PTHREADS targets
do not use it. My understanding is that gcn included omp_get_num_teams
and omp_get_team_num definitions in both icv-device.o and teams.o,
with the definitions only in the former working correctly.
This patch brings gcn into sync with how nvptx does it: teams.c is
overridden and provides a dummy GOMP_teams_reg and the
omp_get_{num_teams,team_num} definitions, and icv-device.c no longer
provides those.
2021-11-12 Jakub Jelinek <jakub@redhat.com>
PR target/103201
* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
to ...
* config/gcn/teams.c: ... here. New file.
Martin Jambor [Thu, 11 Nov 2021 16:17:30 +0000 (17:17 +0100)]
Fortran: Use build_debug_expr_decl to create DEBUG_EXPR_DECLs
This patch converts one more open coded construction of a
DEBUG_EXPR_DECL to a call of build_debug_expr_decl that I missed in my
previous patch because it happens to be in the Fortran front-end.
gcc/fortran/ChangeLog:
2021-11-11 Martin Jambor <mjambor@suse.cz>
* trans-types.c (gfc_get_array_descr_info): Use build_debug_expr_decl
instead of building DEBUG_EXPR_DECL manually.
Martin Liska [Thu, 11 Nov 2021 16:31:56 +0000 (17:31 +0100)]
testsuite: Filter out TSVC test on Power [PR103051]
PR testsuite/103051
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/vect-tsvc-s112.c: Skip test for old Power
CPUs.
Martin Liska [Fri, 12 Nov 2021 13:50:57 +0000 (14:50 +0100)]
libbacktrace: fix UBSAN issues
Fix issues mentioned in the PR.
PR libbacktrace/103167
libbacktrace/ChangeLog:
* elf.c (elf_uncompress_lzma_block): Cast to unsigned int.
(elf_uncompress_lzma): Likewise.
* xztest.c (test_samples): memcpy only if v > 0.
David Malcolm [Thu, 11 Nov 2021 23:32:21 +0000 (18:32 -0500)]
jit: fix -Werror=format-overflow= in testsuite [PR103199]
gcc/jit/ChangeLog:
PR jit/103199
* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile):
Increase size of buffer.
* docs/examples/tut04-toyvm/toyvm.cc
(compilation_state::create_function): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jan Hubicka [Fri, 12 Nov 2021 13:00:47 +0000 (14:00 +0100)]
Fix ipa-modref pure/const discovery
PR ipa/103200
* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
not mark pure/const function if there are side-effects.
Chung-Lin Tang [Fri, 12 Nov 2021 12:29:00 +0000 (20:29 +0800)]
openmp: Relax handling of implicit map vs. existing device mappings
This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):
"If a single contiguous part of the original storage of a list item with an
implicit data-mapping attribute has corresponding storage in the device data
environment prior to a task encountering the construct that is associated with
the map clause, only that part of the original storage will have corresponding
storage in the device data environment as a result of the map clause."
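A small illustration of the situation the quoted text covers. This is a
sketch written for this note, not the new testcase; implicit_map_demo and
N are hypothetical names. Only a section of the array is mapped
explicitly, and the following target construct then maps the whole array
implicitly, which overlaps the existing smaller mapping.

```c
#define N 100

/* Returns the sum of a[10] .. a[29] after the target region has run.  */
int
implicit_map_demo (void)
{
  int a[N];
  for (int i = 0; i < N; i++)
    a[i] = i;

  /* Explicitly map only a[10] .. a[29].  */
  #pragma omp target enter data map(to: a[10:20])

  /* No map clause for 'a' here, so the whole array gets an implicit
     data-mapping attribute.  Only the part that is already present on
     the device is accessed.  */
  #pragma omp target
  for (int i = 10; i < 30; i++)
    a[i] += 1;

  #pragma omp target exit data map(from: a[10:20])

  int sum = 0;
  for (int i = 10; i < 30; i++)
    sum += a[i];
  return sum;
}
```

Built without -fopenmp the pragmas are ignored and the function computes
the same result on the host.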
2021-11-12 Chung-Lin Tang <cltang@codesourcery.com>
include/ChangeLog:
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.
gcc/ChangeLog:
* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
'implicit' bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
implicit map handling to allow a "superset" existing map as valid case.
(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
(get_implicit): New function to extract implicit status.
(gomp_map_fields_existing): Adjust arguments in calls to
gomp_map_vars_existing, and add uses of get_implicit.
(gomp_map_vars_internal): Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
Jonathan Wakely [Wed, 3 Nov 2021 16:06:29 +0000 (16:06 +0000)]
libstdc++: Print assertion messages to stderr [PR59675]
This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.
To avoid including <stdio.h> the assert function is moved into the
library. To avoid programs using a vague linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with the newer GCC will call the new function and write to stderr.
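The shape of the change, sketched in plain C. The function names and the
message format below are assumptions made for illustration; they are not
the libstdc++ definitions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the failure report to STREAM and return
   the number of characters written, as fprintf does.  */
int
report_assert_failure (FILE *stream, const char *file, int line,
                       const char *function, const char *condition)
{
  return fprintf (stream, "%s:%d: %s: Assertion '%s' failed.\n",
                  file, line, function, condition);
}

/* Hypothetical out-of-line failure function: report on stderr rather
   than stdout, then abort.  */
void
assert_fail_sketch (const char *file, int line, const char *function,
                    const char *condition)
{
  report_assert_failure (stderr, file, line, function, condition);
  abort ();
}
```

Keeping the function out of line means only the library needs
<stdio.h>; headers that use the assertion only need a declaration.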
libstdc++-v3/ChangeLog:
PR libstdc++/59675
* acinclude.m4 (libtool_VERSION): Bump version.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
export new symbol.
* configure: Regenerate.
* include/bits/c++config (__replacement_assert): Remove, declare
__glibcxx_assert_fail instead.
* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
replace __replacement_assert, writing to stderr instead of
stdout.
* testsuite/util/testsuite_abi.cc: Update latest version.
Mikael Morin [Sun, 7 Nov 2021 13:39:18 +0000 (14:39 +0100)]
fortran: Ignore unused args in scalarization [PR97896]
The KIND argument of the INDEX intrinsic is a compile time constant
that is used at compile time only to resolve to a kind-specific library
function. That argument is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer which expects to see every argument
of elemental functions used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b965a2c2cda063827ce362d4c9c86f2 except for its testcase).
PR fortran/97896
gcc/fortran/ChangeLog:
* intrinsic.c (add_sym_4ind): Remove.
(add_functions): Use add_sym_4 instead of add_sym_4ind.
Don’t special case the index intrinsic.
* iresolve.c (gfc_resolve_index_func): Use the individual arguments
directly instead of the full argument list.
* intrinsic.h (gfc_resolve_index_func): Update the declaration
accordingly.
* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
list of arguments in the case of the index intrinsic.
* trans-array.h (gfc_get_intrinsic_for_expr,
gfc_get_proc_ifc_for_expr): New.
* trans-array.c (gfc_get_intrinsic_for_expr,
arg_evaluated_for_scalarization): New.
(gfc_walk_elemental_function_args): Add intrinsic procedure
as argument. Count arguments. Check arg_evaluated_for_scalarization.
* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
* trans-stmt.c (get_intrinsic_for_code): New.
(gfc_trans_call): Update call.
gcc/testsuite/ChangeLog:
* gfortran.dg/index_5.f90: New.
Jakub Jelinek [Fri, 12 Nov 2021 11:41:22 +0000 (12:41 +0100)]
openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.
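A host-side simulation of the scheme outlined above, as a rough sketch
only. teams4_sketch, count_teams and struct team_state are hypothetical
names, and "above num_teams_lower" is read here as team number >=
num_teams_lower; the eventual device implementations may differ.

```c
#include <stdbool.h>

/* Hypothetical per-block state; on a device this would be the .shared
   variable mentioned above.  */
struct team_state
{
  unsigned team_num;
};

/* BLOCK_ID stands in for %ctaid.x or __builtin_gcn_dim_pos (0) and
   NUM_BLOCKS for num_blocks or num_workgroups.  Returns true while this
   block still has a team to run.  */
bool
teams4_sketch (struct team_state *st, unsigned block_id, unsigned num_blocks,
               unsigned num_teams_lower, bool first)
{
  if (first)
    st->team_num = block_id;
  else
    st->team_num += num_blocks;
  return st->team_num < num_teams_lower;
}

/* Count how many teams NUM_BLOCKS blocks execute in total.  */
unsigned
count_teams (unsigned num_blocks, unsigned num_teams_lower)
{
  unsigned total = 0;
  for (unsigned b = 0; b < num_blocks; b++)
    {
      struct team_state st;
      for (bool first = true;
           teams4_sketch (&st, b, num_blocks, num_teams_lower, first);
           first = false)
        total++;
    }
  return total;
}
```

With 4 blocks and a lower bound of 10, blocks 0 and 1 each run three
teams and blocks 2 and 3 each run two, so exactly 10 teams execute.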
2021-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.
Martin Liska [Fri, 12 Nov 2021 11:37:26 +0000 (12:37 +0100)]
Remove unused function.
PR tree-optimization/102497
gcc/ChangeLog:
* gimple-predicate-analysis.cc (add_pred): Remove unused
function.
Richard Biener [Fri, 12 Nov 2021 08:09:29 +0000 (09:09 +0100)]
tree-optimization/103204 - fix missed valueization in VN
The following fixes a missed valueization when simplifying
a MEM[&...] combination during valueization.
2021-11-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/103204
* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the
top operand after folding in an address.
* gcc.dg/torture/pr103204.c: New testcase.
Alan Modra [Fri, 12 Nov 2021 07:58:45 +0000 (18:28 +1030)]
Make opcodes configure depend on bfd configure
The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit. A lot of --enable-targets=all libopcodes is wasted space
if bfd can't load 64-bit target object files.
* Makefile.def (configure-opcodes): Depend on configure-bfd.
* Makefile.in: Regenerate.
Jonathan Wakely [Thu, 11 Nov 2021 14:05:35 +0000 (14:05 +0000)]
libstdc++: Implement constexpr std::vector for C++20
This implements P1004R2 ("Making std::vector constexpr") for C++20.
For now, debug mode vectors are not supported in constant expressions.
To make that work we might need to disable all attaching/detaching of
safe iterators. That can be fixed later.
Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com>
libstdc++-v3/ChangeLog:
* include/bits/alloc_traits.h (_Destroy): Make constexpr for
C++20 mode.
* include/bits/allocator.h (__shrink_to_fit::_S_do_it):
Likewise.
* include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator
overload constexpr for C++20.
* include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out
of inline namespace.
(_Bit_reference, _Bit_iterator_base, _Bit_iterator)
(_Bit_const_iterator, _Bvector_impl_data, _Bvector_base)
(vector<bool, A>): Add constexpr to every member function.
(_Bvector_base::_M_allocate): Initialize storage during constant
evaluation.
(vector<bool, A>::_M_initialize_value): Use __fill_bvector_n
instead of memset.
(__fill_bvector_n): New helper function to replace memset during
constant evaluation.
* include/bits/stl_uninitialized.h (__uninitialized_copy<false>):
Move logic to ...
(__do_uninit_copy): New function.
(__uninitialized_fill<false>): Move logic to ...
(__do_uninit_fill): New function.
(__uninitialized_fill_n<false>): Move logic to ...
(__do_uninit_fill_n): New function.
(__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy.
(__uninitialized_move_a, __uninitialized_move_if_noexcept_a):
Add constexpr.
(__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill.
(__uninitialized_fill_n_a): Add constexpr. Use
__do_uninit_fill_n.
(__uninitialized_default_n, __uninitialized_default_n_a)
(__relocate_a_1, __relocate_a): Add constexpr.
* include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl)
(_Vector_base, vector): Add constexpr to every member function.
(_Vector_impl::_S_adjust): Disable ASan annotation during
constant evaluation.
(_Vector_base::_S_use_relocate): Disable bitwise-relocation
during constant evaluation.
(vector::_Temporary_value): Use a union for storage.
* include/bits/vector.tcc (vector, vector<bool>): Add constexpr
to every member function.
* include/std/vector (erase_if, erase): Add constexpr.
* testsuite/23_containers/headers/vector/synopsis.cc: Add
constexpr for C++20 mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to
compile-only test using constant expressions.
* testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust
namespace for _S_word_bit.
* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Adjust errors
for C++20 and move C++17 test to ...
* testsuite/23_containers/vector/cons/89164_c++17.cc: ... here.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test.
* testsuite/23_containers/vector/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/data_access/constexpr.cc: New test.
* testsuite/23_containers/vector/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.
GCC Administrator [Fri, 12 Nov 2021 00:16:32 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Thu, 11 Nov 2021 20:23:48 +0000 (20:23 +0000)]
libstdc++: Fix debug containers for C++98 mode
Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).
The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.
libstdc++-v3/ChangeLog:
* include/debug/deque (deque::operator=(const deque&)): Remove
definition.
* include/debug/list (list::operator=(const list&)): Likewise.
* include/debug/map.h (map::operator=(const map&)): Likewise.
* include/debug/multimap.h (multimap::operator=(const multimap&)):
Likewise.
* include/debug/multiset.h (multiset::operator=(const multiset&)):
Likewise.
* include/debug/set.h (set::operator=(const set&)): Likewise.
* include/debug/string (basic_string::operator=(const basic_string&)):
Likewise.
* include/debug/vector (vector::operator=(const vector&)):
Likewise.
(_Safe_vector::operator=(const _Safe_vector&)): Define for
C++98 as well.
Aldy Hernandez [Thu, 11 Nov 2021 17:06:50 +0000 (18:06 +0100)]
Make ranger optional in path_range_query.
All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object. It's tidier to just do it from
path_range_query if no ranger was passed.
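The ownership pattern, sketched in C with hypothetical stand-in types
(struct ranger and struct path_query are not the GCC classes): the query
uses a caller-supplied ranger if there is one, otherwise it allocates its
own and remembers to free it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for gimple_ranger and path_range_query.  */
struct ranger
{
  int queries;
};

struct path_query
{
  struct ranger *ranger;  /* Ranger used for lookups.  */
  bool owns_ranger;       /* True if we allocated it ourselves.  */
};

/* Initialise Q.  If RANGER is NULL, allocate a private one.
   Returns false on allocation failure.  */
bool
path_query_init (struct path_query *q, struct ranger *ranger)
{
  q->owns_ranger = (ranger == NULL);
  if (q->owns_ranger)
    ranger = calloc (1, sizeof *ranger);
  q->ranger = ranger;
  return ranger != NULL;
}

/* Release Q, freeing the ranger only if Q allocated it.  */
void
path_query_destroy (struct path_query *q)
{
  if (q->owns_ranger)
    free (q->ranger);
  q->ranger = NULL;
}
```

Callers that already have a ranger keep passing it in; the rest pass
NULL and no longer manage one themselves.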
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::path_range_query): New
ctor without a ranger.
(path_range_query::~path_range_query): Free ranger if necessary.
(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::compute_imports): Same.
(path_range_query::compute_ranges): Same.
(path_range_query::range_of_stmt): Same.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (class path_range_query): New ctor.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
as path_range_query allocates one.
* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
(back_threader::~back_threader): Same.
Aldy Hernandez [Thu, 11 Nov 2021 15:12:32 +0000 (16:12 +0100)]
Remove loop crossing restriction from the backward threader.
We have much more thorough restrictions, that are shared between both
threader implementations, in the registry. I've been meaning to
remove the backward threader one, since its only purpose was reducing
the search space. Previously there was a small time penalty for its
removal, but with the various patches in the past month, it looks like
the removal is a wash performance wise.
This catches 8 more jump threads in the backward threader in my suite.
Presumably, because we disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc).
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove loop
crossing restriction.
Bill Schmidt [Thu, 11 Nov 2021 20:36:04 +0000 (14:36 -0600)]
rs6000: Fix test_mffsl.c to require Power9 support
2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/test_mffsl.c: Require Power9.
Ian Lance Taylor [Thu, 11 Nov 2021 02:15:12 +0000 (18:15 -0800)]
compiler: traverse func subexprs when creating func descriptors
Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression. There are no subexpressions in the
normal case of calling a function or a method directly, but there are
subexpressions in code like F().M() when F returns an interface type.
Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend. However, when
the thunks were created there, they did not go through the
order_evaluations pass. This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called. Avoid hiding errors like this by
changing those methods to only lookup the previously created thunk,
rather than creating it if it was not already created.
The test case for this is https://golang.org/cl/363156.
Fixes https://golang.org/issue/49512
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
Jonathan Wakely [Wed, 10 Nov 2021 16:59:29 +0000 (16:59 +0000)]
libstdc++: Make pmr::memory_resource::allocate implicitly create objects
Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
libstdc++-v3/ChangeLog:
* include/std/memory_resource (memory_resource::allocate):
Implicitly create objects in the returned storage.
Jonathan Wakely [Thu, 11 Nov 2021 13:02:16 +0000 (13:02 +0000)]
libstdc++: Remove public std::vector<bool>::data() member
This function only exists to avoid an error in the debug mode vector, so
doesn't need to be public.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector<bool>::data()): Give
protected access, and delete for C++11 and later.
Jan Hubicka [Thu, 11 Nov 2021 17:51:35 +0000 (18:51 +0100)]
Fix gfortran.dg/inline_matmul_17.f90 template.
As discussed on the mailing list the template actually tests for missed
optimization where we fail to propagate size of an array. We no longer miss this
after modref improvements.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* gfortran.dg/inline_matmul_17.f90: Fix template.
Jan Hubicka [Thu, 11 Nov 2021 17:14:45 +0000 (18:14 +0100)]
Enable pure-const discovery in modref.
We can now handle some extra cases, for example:
struct a {int a,b,c;};
__attribute__ ((noinline))
int init (struct a *a)
{
a->a=1;
a->b=2;
a->c=3;
}
int const_fn ()
{
struct a a;
init (&a);
return a.a + a.b + a.c;
}
Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.
I ended up reordering passes so early modref is done after early pure-const
mostly to avoid the need to change the testsuite, which greps for const
functions being detected in pure-const. Still some testsuite compensation
is needed.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.
David Malcolm [Wed, 10 Nov 2021 22:37:11 +0000 (17:37 -0500)]
diagnostic: fix unused variable 'def_tabstop' [PR103129]
gcc/ChangeLog:
PR other/103129
* diagnostic-show-locus.c (def_policy): Use def_tabstop.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Tobias Burnus [Thu, 11 Nov 2021 16:27:00 +0000 (17:27 +0100)]
Fortran/openmp: Add support for 2 argument num_teams clause
Fortran part to commit r12-5146-g48d7327f2aaf65
gcc/fortran/ChangeLog:
* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
num_teams_upper, add num_teams_lower.
* dump-parse-tree.c (show_omp_clauses): Update to handle
lower-bound num_teams clause.
* frontend-passes.c (gfc_code_walker): Likewise.
* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
resolve_omp_clauses): Likewise.
* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
gfc_trans_omp_target): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/teams-1.f90: New test.
Jonathan Wright [Wed, 10 Nov 2021 15:16:24 +0000 (15:16 +0000)]
aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics
Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
gcc/ChangeLog:
2021-11-10 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
(TYPES_COMBINEP): Delete.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vcombine_* intrinsics.
* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
cast.
(vcombine_s16): Likewise.
(vcombine_s32): Likewise.
(vcombine_f32): Likewise.
(vcombine_u8): Use type-qualified builtin and remove casts.
(vcombine_u16): Likewise.
(vcombine_u32): Likewise.
(vcombine_u64): Likewise.
(vcombine_p8): Likewise.
(vcombine_p16): Likewise.
(vcombine_p64): Likewise.
(vcombine_bf16): Remove unnecessary cast.
* config/aarch64/iterators.md (VD_I): New mode iterator.
(VDC_P): New mode iterator.