David Malcolm [Tue, 6 Oct 2020 20:58:00 +0000 (16:58 -0400)]
Add -fdiagnostics-path-format=separate-events to -fdiagnostics-plain-output
The path-printing default of -fdiagnostics-path-format=inline-events
interacted poorly with -fdiagnostics-plain-output, so it makes most
sense to add -fdiagnostics-path-format=separate-events to
-fdiagnostics-plain-output.
Seen when adding an experimental analyzer plugin to gcc.dg/plugin.exp.
gcc/ChangeLog:
* doc/invoke.texi (-fdiagnostics-plain-output): Add
-fdiagnostics-path-format=separate-events to list of
options injected by -fdiagnostics-plain-output.
* opts-common.c (decode_cmdline_options_to_array): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/analyzer.exp (DEFAULT_CXXFLAGS): Remove
-fdiagnostics-path-format=separate-events.
* gcc.dg/analyzer/analyzer.exp (DEFAULT_CFLAGS): Likewise.
* gcc.dg/plugin/diagnostic-path-format-default.c: Rename to...
* gcc.dg/plugin/diagnostic-path-format-plain.c: ...this. Remove
dg-options directive. Copy remainder of test from
diagnostic-path-format-separate-events.c.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Add
-fdiagnostics-path-format=inline-events to options.
Fix expected output for location of conditional within "for" loop.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Update for
renaming.
* gfortran.dg/analyzer/analyzer.exp (DEFAULT_FFLAGS): Remove
-fdiagnostics-path-format=separate-events.
Nathan Sidwell [Wed, 7 Oct 2020 12:46:24 +0000 (05:46 -0700)]
c++: block-scope externs get an alias [PR95677,PR31775,PR95677]
This patch improves block-scope extern handling by always injecting a
hidden copy into the enclosing namespace (or using a match already
there). This hidden copy will be revealed if the user explicitly
declares it later. We can get from the DECL_LOCAL_DECL_P local extern
to the alias via DECL_LOCAL_DECL_ALIAS. This fixes several bugs and
removes the kludgy per-function extern_decl_map. We only do this
pushing for non-dependent local externs -- dependent ones will be
pushed during instantiation.
User code that expected to be able to handle incompatible local
externs in different block-scopes will no longer work. That code is
ill-formed (and always was, despite what 31775 claimed). I had to
adjust a number of testcases that fell into this.
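As an illustration of the rule above, here is a minimal sketch. It is not taken from the testsuite, and the names counter, bump and peek are made up for the example. Two block-scope externs in different functions name the same namespace-scope entity, so they have to agree on its type:

```cpp
#include <cassert>

int
bump ()
{
  extern int counter;	// block-scope extern, names ::counter
  return ++counter;
}

int
peek ()
{
  extern int counter;	// compatible redeclaration in another block scope
  // Writing 'extern double counter;' here instead would be ill-formed,
  // because it conflicts with the declaration in bump ().
  return counter;
}

int counter = 40;	// the single namespace-scope definition
```

Calling bump () and then peek () observes the same object, which is the behaviour the hidden namespace-scope copy is meant to model.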
I tried using DECL_VALUE_EXPR, but that didn't work out. Due to
constexpr requirements we have to do the replacement very late (it
happens in the gimplifier). Consider:
extern int l[]; // #1
constexpr bool foo ()
{
extern int l[3]; // this does not complete the type of decl #1
constexpr int *p = &l[2]; // ok
return !p;
}
This requirement, coupled with our use of the common folding machinery
makes pr97306 hard to fix, as we end up with an expression containing
the two different decls for 'l', and only the c++ FE knows how to
reconcile those. I punted on this.
gcc/cp/
* cp-tree.h (struct language_function): Delete extern_decl_map.
(DECL_LOCAL_DECL_ALIAS): New.
* name-lookup.h (is_local_extern): Delete.
* name-lookup.c (set_local_extern_decl_linkage): Replace with ...
(push_local_extern_decl): ... this new function.
(do_pushdecl): Call new function after pushing new decl. Unhide
hidden non-functions.
(is_local_extern): Delete.
* decl.c (layout_var_decl): Do not allow VLA local externs.
* decl2.c (mark_used): Also mark DECL_LOCAL_DECL_ALIAS. Drop old
local-extern treatment.
* parser.c (cp_parser_oacc_declare): Deal with local extern aliases.
* pt.c (tsubst_expr): Adjust local extern instantiation.
* cp-gimplify.c (cp_genericize_r): Remap DECL_LOCAL_DECLs.
gcc/testsuite/
* g++.dg/cpp0x/lambda/lambda-sfinae1.C: Avoid ill-formed local extern
* g++.dg/init/pr42844.C: Add expected error.
* g++.dg/lookup/extern-redecl1.C: Likewise.
* g++.dg/lookup/koenig15.C: Avoid ill-formed.
* g++.dg/lto/pr95677.C: New.
* g++.dg/other/nested-extern-1.C: Correct expected behaviour.
* g++.dg/other/nested-extern-2.C: Likewise.
* g++.dg/other/nested-extern.cc: Split ...
* g++.dg/other/nested-extern-1.cc: ... here ...
* g++.dg/other/nested-extern-2.cc: ... here.
* g++.dg/template/scope5.C: Avoid ill-formed
* g++.old-deja/g++.law/missed-error2.C: Allow extension.
* g++.old-deja/g++.pt/crash3.C: Add expected error.
Martin Jambor [Wed, 7 Oct 2020 12:12:49 +0000 (14:12 +0200)]
ipa-prop: Fix multiple-target speculation resolution
As the FIXME which this patch removes states, the current code does
not work when a call with multiple speculative targets gets resolved
through parameter tracking during inlining - it feeds the inliner an
edge it has already dealt with. The patch makes the code that should
prevent this aware that a speculation can now have more than one
target.
gcc/ChangeLog:
2020-09-30 Martin Jambor <mjambor@suse.cz>
PR ipa/96394
* ipa-prop.c (update_indirect_edges_after_inlining): Do not add
resolved speculation edges to vector of new direct edges even in
presence of multiple speculative direct edges for a single call.
gcc/testsuite/ChangeLog:
2020-09-30 Martin Jambor <mjambor@suse.cz>
PR ipa/96394
* gcc.dg/tree-prof/pr96394.c: New test.
Nathan Sidwell [Wed, 7 Oct 2020 12:02:34 +0000 (05:02 -0700)]
c++: Rename DECL_BUILTIN_P to DECL_UNDECLARED_BUILTIN_P
I realized I'd misnamed DECL_BUILTIN_P: it's only true of compiler
builtins unless and until the user declares them -- at that point
they're real decls, and will have a location in the user's source.
(BUILT_IN_FN and friends still work though). This renames them so
future-me is not confused as to why the predicate becomes false.
gcc/cp/
* cp-tree.h (DECL_BUILTIN_P): Rename to ...
(DECL_UNDECLARED_BUILTIN_P): ... here.
* decl.c (duplicate_decls): Adjust.
* name-lookup.c (anticipated_builtin_p): Adjust.
(do_nonmember_using_decl): Likewise.
libcc1/
* libcp1plugin.cc (supplement_binding): Rename
DECL_BUILTIN_P.
Nathan Sidwell [Wed, 7 Oct 2020 11:56:41 +0000 (04:56 -0700)]
c++: Adding exception specs can change dependentness
Making an exception variant can cause a non-dependent function type to
become dependent (since c++17 eh-specs are part of the type). The
same is (possibly?) true for adding a late return type. Fixed thusly.
My upcoming local extern-decl changes have a test case that covers
this (which was how I found it).
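The language rule behind this can be checked with a small sketch. The declarations plain and quiet are hypothetical and are never called; the helper only compares their types:

```cpp
#include <cassert>
#include <type_traits>

void plain ();			// hypothetical, declaration only
void quiet () noexcept;		// same signature plus an exception spec

// True when the exception specification contributes to the function type,
// which is the case from C++17 onwards.
constexpr bool
eh_spec_is_part_of_type ()
{
  return !std::is_same<decltype (plain), decltype (quiet)>::value;
}
```

Before C++17 the two declarations have the same type, so the helper's result depends on the language mode in use.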
gcc/cp/
* tree.c (build_cp_fntype_variant): Clear
TYPE_DEPENDENT_P_VALID if necessary.
Andrew Stubbs [Wed, 13 May 2020 15:24:12 +0000 (16:24 +0100)]
amdgcn: Use scalar instructions for addptrdi3
Allow addptr to use SGPRs as well as VGPRs for pointers. This ought to
prevent some unnecessary copying back and forth.
gcc/ChangeLog:
* config/gcn/gcn.md (unspec): Add UNSPEC_ADDPTR.
(addptrdi3): Add SGPR alternative.
Mark Wielaard [Fri, 18 Sep 2020 15:07:03 +0000 (17:07 +0200)]
Output filepath strings in .debug_line_str for DWARF5
DWARF5 has a new string table specially for file paths. .debug_line
file and dir tables reference strings in .debug_line_str. If a
.debug_line_str section is emitted then also place CU DIE file
names and comp dirs there.
gcc/ChangeLog:
* dwarf2out.c (add_filepath_AT_string): New function.
(asm_outputs_debug_line_str): Likewise.
(add_filename_attribute): Likewise.
(add_comp_dir_attribute): Call add_filepath_AT_string.
(gen_compile_unit_die): Call add_filename_attribute for name.
(init_sections_and_labels): Init debug_line_str_section when
asm_outputs_debug_line_str returns true.
(dwarf2out_early_finish): Remove DW_AT_name and DW_AT_comp_dir
hack and call add_filename_attribute for the remap_debug_filename.
Jakub Jelinek [Wed, 7 Oct 2020 08:55:35 +0000 (10:55 +0200)]
debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure
> > As for the test assembly, I'd say we should take
> > #define F void foo (void) {}
> > F
> > compile it with
> > gcc -S -O2 -g1 -dA -gno-as-loc-support -fno-merge-debug-strings
> > remove .cfi_* directives, remove the ret instruction, change @function
> > and @progbits to %function and %progbits, change .uleb128 to just .byte,
> > I think all the values should be small enough, maybe change .value to
> > .2byte and .long to .4byte (whatever is most portable across different
> > arches and gas versions), simplify (shorten) strings and adjust
> > sizes, and do something with the .quad directives, that is dependent on
> > the address size, perhaps just take those attributes out and adjust
> > .debug_abbrev? Finally, remove all comments (emit them in the first case
> > just to better understand the debug info).
>
> I'm afraid it is hard to avoid the .quad or .8byte.
> Here is a 64-bit address version that assembles fine by both x86_64 and
> aarch64 as.
> Unfortunately doesn't fail with broken gas versions with -gdwarf-2 without
> the nop, so we'll need at least a nop in there.
> Fortunately gcc/configure.ac already determines the right nop insn for the
> target, in $insn.
> So I guess what we want next is have the 32-bit version of this with .4byte
> instead of .8byte and just let's try to assemble both versions, first
> without -gdwarf-2 and the one that succeeds assemble again with -gdwarf-2
> and check for the duplicate .debug_line sections error.
Ok, here it is in patch form.
I've briefly tested it, with the older binutils I have around (no --gdwarf-N
support), with latest gas (--gdwarf-N that can be passed to as even when
compiling C/C++ etc. code and emitting .debug_line) and latest gas with Mark's fix
reverted (--gdwarf-N support, but can only pass it to as when assembling
user .s/.S files, not when compiling C/C++ etc.).
2020-10-07 Jakub Jelinek <jakub@redhat.com>
* configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
* gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
(ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
"--gdwarf2". Use %{cond:opt1;:opt2} style.
(ASM_DEBUG_OPTION_DWARF_OPT): Define.
(ASM_DEBUG_OPTION_SPEC): Define.
(asm_debug_option): New variable.
(asm_options): Add "%(asm_debug_option)".
(static_specs): Add asm_debug_option entry.
(static_spec_functions): Add dwarf-version-gt.
(debug_level_greater_than_spec_func): New function.
* config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
* config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
* config.in: Regenerated.
* configure: Regenerated.
Jakub Jelinek [Wed, 7 Oct 2020 08:52:47 +0000 (10:52 +0200)]
options: Avoid unused variable mask warning [PR97305]
> options-save.c: In function 'void cl_target_option_save(cl_target_option*, gcc_options*, gcc_options*)':
> options-save.c:8526:26: error: unused variable 'mask' [-Werror=unused-variable]
> 8526 | unsigned HOST_WIDE_INT mask = 0;
> | ^~~~
> options-save.c: In function 'void cl_target_option_restore(gcc_options*, gcc_options*, cl_target_option*)':
> options-save.c:8537:26: error: unused variable 'mask' [-Werror=unused-variable]
> 8537 | unsigned HOST_WIDE_INT mask;
> | ^~~~
Oops, missed that, sorry.
The following patch should fix that. Tested with make options-save.c on
x86_64-linux (same file as before) and with make options-save.o on an
x86_64-linux -> ia64-linux cross (no warning anymore, just the unwanted
declarations gone).
2020-10-07 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/97305
* optc-save-gen.awk: Don't declare mask variable if explicit_mask
array is not present.
Jakub Jelinek [Wed, 7 Oct 2020 08:49:37 +0000 (10:49 +0200)]
openmp: Improve composite simd vectorization
> > I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> > and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> > together with that inner loop another loop, but apparently loop discovery
> > thinks it is just one loop.
> > Any ideas what I'm doing wrong or is there any way how to make it two loops
> > (that would also survive all the cfg cleanups until vectorization)?
>
> The early CFG looks like we have a common header with two latches
> so it boils down to how we disambiguate those in the end (we seem
> to unify the latches via a forwarder). IIRC OMP lowering builds
> loops itself, could it not do the appropriate disambiguation itself?
I realized I emit the same stmts on both paths (before goto doit; and before
falling through it), at least the MIN_EXPR and PLUS_EXPR, so by forcing
there an extra bb which does those two and having the "doit" label before
that the innermost loop doesn't have multiple latches anymore and so is
vectorized fine.
2020-10-07 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_simd): Don't emit MIN_EXPR and PLUS_EXPR
at the end of entry_bb and innermost init_bb, instead force arguments
for MIN_EXPR into temporaries in both cases and jump to a new bb that
performs MIN_EXPR and PLUS_EXPR.
* gcc.dg/gomp/simd-2.c: New test.
* gcc.dg/gomp/simd-3.c: New test.
Tom de Vries [Wed, 7 Oct 2020 05:22:43 +0000 (07:22 +0200)]
[tree-ssa-loop-ch] Add missing NULL test for dump_file
If we change gimple_can_duplicate_bb_p to return false instead of true, we run
into a segfault in ch_base::copy_headers due to using dump_file while it's
NULL:
...
if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
true))
{
fprintf (dump_file, "Duplication failed.\n");
continue;
}
...
Fix this by adding the missing dump_file != NULL test.
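The guard pattern can be sketched in isolation as follows. This is not the actual patch: toy_dump_file and note_duplication_failed are stand-ins invented for the example, and the real code may test additional dump flags.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for the global dump stream; null when no dump was requested.
FILE *toy_dump_file = nullptr;

// Report a failed duplication only if a dump stream is open.
// Returns true if a message was written.
bool
note_duplication_failed ()
{
  if (toy_dump_file == nullptr)
    return false;
  std::fprintf (toy_dump_file, "Duplication failed.\n");
  return true;
}
```

With the check in place the failure path is a no-op when dumping is disabled, instead of passing a null stream to fprintf.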
Tested by rebuilding lto1 and rerunning the failing test-case.
gcc/ChangeLog:
2020-10-07 Tom de Vries <tdevries@suse.de>
* tree-ssa-loop-ch.c (ch_base::copy_headers): Add missing NULL test
for dump_file.
GCC Administrator [Wed, 7 Oct 2020 00:16:35 +0000 (00:16 +0000)]
Daily bump.
Marek Polacek [Mon, 5 Oct 2020 21:48:19 +0000 (17:48 -0400)]
c++: typename in out-of-class member function definitions [PR97297]
I was notified that our P0634R3 (Down with typename) implementation has
a flaw: when we have an out-of-class member function definition, we
still required 'typename' for its parameters. For example here:
template <typename T> struct S {
int simple(T::type);
};
template <typename T>
int S<T>::simple(/* typename */T::type) { return 0; }
the 'typename' isn't necessary per [temp.res]/5.2.4. We have a qualified
name here ("S<T>::simple") so we know it's already been declared so we
can look it up to see if it's a function template or a variable
template.
In this case, the P0634R3 code in cp_parser_direct_declarator wasn't
looking into uninstantiated templates and didn't find the member
function 'simple' -- cp_parser_lookup_name returned a SCOPE_REF which
means that the qualifying scope was dependent. With this fix, we find
the BASELINK for 'simple', don't clear CP_PARSER_FLAGS_TYPENAME_OPTIONAL
from the flags, and the typename is implicitly assumed.
gcc/cp/ChangeLog:
PR c++/97297
* parser.c (cp_parser_direct_declarator): When checking if a
name is a function template declaration for the P0634R3 case,
look in uninstantiated templates too.
gcc/testsuite/ChangeLog:
PR c++/97297
* g++.dg/cpp2a/typename18.C: New test.
Tobias Burnus [Tue, 6 Oct 2020 21:34:21 +0000 (23:34 +0200)]
c-c++-common/goacc/declare-pr90861.c: Remove xfail
gcc/testsuite/ChangeLog
PR middle-end/90861
* c-c++-common/goacc/declare-pr90861.c: Remove xfail.
Nikhil Benesch [Mon, 5 Oct 2020 03:40:40 +0000 (23:40 -0400)]
compiler: avoid undefined behavior in Import::read
For some implementations of Stream, advancing the stream will invalidate
the previously-returned peek buffer. Copy the peek buffer before
advancing in Import::read to avoid this undefined behavior.
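A minimal sketch of the copy-before-advance idea, assuming a toy stream whose advance () clobbers the buffer handed out by peek (). Toy_stream and read_copy_first are hypothetical names and do not reproduce the gofrontend classes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stream: advance () overwrites the buffer returned by peek ().
class Toy_stream
{
 public:
  explicit Toy_stream (const std::string &data) : data_ (data), pos_ (0) {}

  // Expose LENGTH bytes through *BYTES; return false if too few remain.
  bool
  peek (std::size_t length, const char **bytes)
  {
    if (pos_ + length > data_.size ())
      return false;
    buf_.assign (data_, pos_, length);
    *bytes = buf_.data ();
    return true;
  }

  // Consume LENGTH bytes and clobber the previously returned peek buffer.
  void
  advance (std::size_t length)
  {
    pos_ += length;
    buf_.assign (buf_.size (), '\0');
  }

 private:
  std::string data_;
  std::string buf_;
  std::size_t pos_;
};

// Copy the peeked bytes first, then advance.
std::string
read_copy_first (Toy_stream &s, std::size_t length)
{
  const char *bytes;
  if (!s.peek (length, &bytes))
    return std::string ();
  std::string ret (bytes, length);	// copy while the buffer is still valid
  s.advance (length);
  return ret;
}
```

Advancing before the copy would read bytes that this toy stream has already overwritten, which is the kind of hazard the change avoids.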
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/259438
Andrew MacLeod [Tue, 6 Oct 2020 16:53:09 +0000 (12:53 -0400)]
Hybrid EVRP and testcases
Provide a hybrid EVRP pass which uses legacy EVRP and adds additional
enhancements from the new ranger infrastructure.
A new option, -fevrp-mode=, is also provided.
Adjust testcases accordingly.
gcc/ChangeLog:
2020-10-06 Andrew MacLeod <amacleod@redhat.com>
* flag-types.h (enum evrp_mode): New enumerated type EVRP_MODE_*.
* common.opt (fevrp-mode): New undocumented flag.
* gimple-ssa-evrp.c: Include gimple-range.h
(class rvrp_folder): EVRP folding using ranger exclusively.
(rvrp_folder::rvrp_folder): New.
(rvrp_folder::~rvrp_folder): New.
(rvrp_folder::value_of_expr): New. Use ranger's value_of_expr.
(rvrp_folder::value_on_edge): New. Use ranger's value_on_edge.
(rvrp_folder::value_of_stmt): New. Use ranger's value_of_stmt.
(rvrp_folder::fold_stmt): New. Call the simplifier.
(class hybrid_folder): EVRP folding using both engines.
(hybrid_folder::hybrid_folder): New.
(hybrid_folder::~hybrid_folder): New.
(hybrid_folder::fold_stmt): New. Simplify with one engine, then the
other.
(hybrid_folder::value_of_expr): New. Use both value routines.
(hybrid_folder::value_on_edge): New. Use both value routines.
(hybrid_folder::value_of_stmt): New. Use both value routines.
(hybrid_folder::choose_value): New. Choose between range_analyzer and
ranger's values.
(execute_early_vrp): Choose a folder based on flag_evrp_mode.
* vr-values.c (simplify_using_ranges::fold_cond): Try range_of_stmt
first to see if it returns a value.
(simplify_using_ranges::simplify_switch_using_ranges): Return true if
any changes were made to the switch.
gcc/testsuite/ChangeLog:
2020-10-06 Andrew MacLeod <amacleod@redhat.com>
* gcc.dg/pr81192.c: Disable EVRP pass.
* gcc.dg/tree-ssa/pr77445-2.c: Ditto.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Adjust.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Ditto.
Andrew MacLeod [Tue, 6 Oct 2020 16:12:53 +0000 (12:12 -0400)]
Ranger classes.
Add the 8 ranger files and the Makefile changes to build it.
2020-10-06 Andrew MacLeod <amacleod@redhat.com>
* Makefile.in (OBJS): Add gimple-range*.o.
* gimple-range.h: New file.
* gimple-range.cc: New file.
* gimple-range-cache.h: New file.
* gimple-range-cache.cc: New file.
* gimple-range-edge.h: New file.
* gimple-range-edge.cc: New file.
* gimple-range-gori.h: New file.
* gimple-range-gori.cc: New file.
Tom de Vries [Tue, 6 Oct 2020 16:12:52 +0000 (18:12 +0200)]
[openacc, libgomp, testsuite] Xfail declare-5.f90
We're currently running into:
...
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O1 execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O3 -fomit-frame-pointer \
-funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -O3 -g execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 \
-DACC_MEM_SHARED=0 -foffload=nvptx-none -Os execution test
...
A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"
Xfail the fails.
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2020-10-06 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.
Jonathan Wakely [Tue, 6 Oct 2020 15:55:06 +0000 (16:55 +0100)]
libstdc++: Inline std::exception_ptr members [PR 90295]
This inlines most members of std::exception_ptr so that all operations
on a null exception_ptr can be optimized away. This benefits code like
std::future and coroutines where an exception_ptr object is present to
cope with exceptional cases, but is usually not used and remains null.
Since those functions were previously non-inline we have to continue to
export them from the library, for objects that were compiled against the
old headers and expect to find definitions in the library.
In order to inline the copy constructor and destructor we need to export
the _M_addref() and _M_release() members that increment/decrement the
reference count when copying/destroying a non-null exception_ptr. The
copy ctor and dtor check for null and don't call _M_addref and
_M_release unless they need to. The checks for null pointers in
_M_addref and _M_release are still needed because old code might call
them without checking for null first. But we can use __builtin_expect to
predict that they are usually called for the non-null case.
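The division of labour described above can be sketched with a toy reference-counted handle. Toy_counted and Toy_handle are invented for the example; unlike the real exception_ptr, everything here is defined in the class, and __builtin_expect is assumed to be available as a GCC-compatible builtin:

```cpp
#include <cassert>

struct Toy_counted { int refs = 0; };

class Toy_handle
{
  Toy_counted *obj_;

  // Still null-safe for callers that do not check first, but hinted as
  // usually being called with a non-null object.
  void
  addref () noexcept
  {
    if (__builtin_expect (obj_ != nullptr, true))
      ++obj_->refs;
  }

  void
  release () noexcept
  {
    if (__builtin_expect (obj_ != nullptr, true))
      {
	--obj_->refs;
	obj_ = nullptr;
      }
  }

public:
  Toy_handle () noexcept : obj_ (nullptr) {}
  explicit Toy_handle (Toy_counted *p) noexcept : obj_ (p)
  { if (obj_) addref (); }
  // The inline copy ctor and dtor test for null themselves, so operations
  // on a null handle never reach addref () or release ().
  Toy_handle (const Toy_handle &o) noexcept : obj_ (o.obj_)
  { if (obj_) addref (); }
  Toy_handle &operator= (const Toy_handle &) = delete;
  ~Toy_handle () { if (obj_) release (); }
  explicit operator bool () const noexcept { return obj_ != nullptr; }
};
```

Because the null test sits in the inline members, an optimizer that can see a handle is null has nothing left to call.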
libstdc++-v3/ChangeLog:
PR libstdc++/90295
* config/abi/pre/gnu.ver (CXXABI_1.3.13): New symbol version.
(exception_ptr::_M_addref(), exception_ptr::_M_release()):
Export symbols.
* libsupc++/eh_ptr.cc (exception_ptr::exception_ptr()):
Remove out-of-line definition.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
(exception_ptr::_M_addref()): Add branch prediction.
* libsupc++/exception_ptr.h (exception_ptr::operator bool):
Add noexcept.
[!_GLIBCXX_EH_PTR_COMPAT] (operator==, operator!=): Define
inline as hidden friends. Remove declarations at namespace
scope.
(exception_ptr::exception_ptr()): Define inline.
(exception_ptr::exception_ptr(const exception_ptr&)):
Likewise.
(exception_ptr::~exception_ptr()): Likewise.
(exception_ptr::operator=(const exception_ptr&)):
Likewise.
(exception_ptr::swap(exception_ptr&)): Likewise.
* testsuite/util/testsuite_abi.cc: Add CXXABI_1.3.13.
* testsuite/18_support/exception_ptr/90295.cc: New test.
Dennis Zhang [Tue, 6 Oct 2020 15:53:46 +0000 (16:53 +0100)]
arm: Enable MVE SIMD modes for vectorization
This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
MVE or MVE_FLOAT is enabled. Then the expanders for auto-vectorization
can be used for generating MVE SIMD code.
This patch also fixes bugs in the MVE vreinterpretq_*.c tests, which are
revealed by the enabled MVE SIMD modes.
The tests check the MVE reinterpret intrinsics.
Each test contains two functions with identical code patterns, so the
icf pass folds them. Because of icf, the instruction count covers only
one function, which is 8. However, when the SIMD modes are enabled, the
estimated code size becomes smaller, so inlining is applied after icf
and the instruction count becomes 16, which makes the tests fail.
Because icf is not the pattern being tested but causes the above
issues, -fno-ipa-icf is applied to the tests to avoid an unstable
instruction count.
gcc/ChangeLog:
2020-10-05 Dennis Zhang <dennis.zhang@arm.com>
* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
gcc/testsuite/ChangeLog:
2020-10-05 Dennis Zhang <dennis.zhang@arm.com>
* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
Tom de Vries [Tue, 6 Oct 2020 11:07:25 +0000 (13:07 +0200)]
[openacc] Fix acc declare for VLAs
Consider test-case test.c, with VLA A:
...
int main (void) {
int N = 1000;
int A[N];
#pragma acc declare copy(A)
return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...
At original, we have:
...
#pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
int[0:D.2074] * A.1;
{
int A[0:D.2074] [value-expr: *A.1];
saved_stack.2 = __builtin_stack_save ();
try
{
A.1 = __builtin_alloca_with_align (D.2078, 32);
#pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
}
finally
{
__builtin_stack_restore (saved_stack.2);
}
}
...
This is caused by the following incompatibility. When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471 tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...
Fix this by extracting the 'A.1' lookup key from 'A' using the decl-expr.
In addition, unshare the looked up value, to avoid running into
an "incorrect sharing of tree nodes" error.
Using these two fixes, we get our desired:
...
finally
{
+ #pragma omp target oacc_declare map(from:(*A.1))
__builtin_stack_restore (saved_stack.2);
}
...
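The key mismatch and its fix can be modelled with a toy map. This sketch does not use GCC's tree or hash_map types: Toy_decl, value_expr_base and lookup_return_clause are hypothetical, and a plain pointer stands in for the value-expr of the VLA decl.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy declaration; value_expr_base points at the decl this one is
// expressed through ('A.1' for the VLA 'A'), or is null.
struct Toy_decl
{
  std::string name;
  const Toy_decl *value_expr_base;
};

typedef std::map<const Toy_decl *, std::string> Returns_map;

// Translate T to the key that was used when storing, then look it up.
const std::string *
lookup_return_clause (const Returns_map &m, const Toy_decl *t)
{
  const Toy_decl *key = t->value_expr_base ? t->value_expr_base : t;
  Returns_map::const_iterator it = m.find (key);
  return it == m.end () ? nullptr : &it->second;
}
```

A direct lookup with 'A' misses the entry stored under 'A.1'; going through the value-expr first finds it.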
Built on x86_64-linux with nvptx accelerator, tested libgomp.
gcc/ChangeLog:
2020-10-06 Tom de Vries <tdevries@suse.de>
PR middle-end/90861
* gimplify.c (gimplify_bind_expr): Handle lookup in
oacc_declare_returns using key with decl-expr.
libgomp/ChangeLog:
2020-10-06 Tom de Vries <tdevries@suse.de>
PR middle-end/90861
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.
Martin Liska [Mon, 5 Oct 2020 16:03:08 +0000 (18:03 +0200)]
lto: fix LTO debug sections copying.
readelf -S prints:
There are 81999 section headers, starting at offset 0x1f488060:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL
0000000000000000 000000 01404f 00 81998 0 0
[ 1] .group GROUP
0000000000000000 000040 000008 04 81995 105027 4
...
[81995] .symtab SYMTAB
0000000000000000 d5d9298 2db310 18 81997 105026 8
[81996] .symtab_shndx SYMTAB SECTION INDICES
0000000000000000 d8b45a8 079dd8 04 81995 0 4
[81997] .strtab STRTAB
0000000000000000 d92e380 80460c 00 0 0 1
...
Looking at the documentation:
Table 7–15 ELF sh_link and sh_info Interpretation
sh_type - sh_link
SHT_SYMTAB - The section header index of the associated string table.
SHT_SYMTAB_SHNDX - The section header index of the associated symbol table.
As seen, sh_link of a SHT_SYMTAB always points to a .strtab and readelf
confirms that.
So we need to use reverse mapping taken from
[81996] .symtab_shndx SYMTAB SECTION INDICES
0000000000000000 d8b45a8 079dd8 04 81995 0 4
where sh_link points to 81995.
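The reverse mapping can be sketched over a toy section header table. Toy_shdr and find_shndx_for_symtab are made up for the example and are not the libiberty code; the type constants carry the standard ELF values for SHT_SYMTAB, SHT_STRTAB and SHT_SYMTAB_SHNDX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Only the two header fields that matter for this sketch.
struct Toy_shdr
{
  std::uint32_t sh_type;
  std::uint32_t sh_link;
};

const std::uint32_t TOY_SHT_SYMTAB = 2;
const std::uint32_t TOY_SHT_STRTAB = 3;
const std::uint32_t TOY_SHT_SYMTAB_SHNDX = 18;

// Return the index of the SHT_SYMTAB_SHNDX section whose sh_link names
// SYMTAB, or 0 if there is none.
std::size_t
find_shndx_for_symtab (const std::vector<Toy_shdr> &shdrs, std::size_t symtab)
{
  for (std::size_t i = 1; i < shdrs.size (); ++i)
    if (shdrs[i].sh_type == TOY_SHT_SYMTAB_SHNDX
	&& shdrs[i].sh_link == symtab)
      return i;
  return 0;
}
```

The symbol table's own sh_link only leads to its string table, so the association has to be read from the .symtab_shndx side.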
libiberty/ChangeLog:
PR lto/97290
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Use sh_link of a .symtab_shndx section.
Srinath Parvathaneni [Tue, 6 Oct 2020 13:58:13 +0000 (14:58 +0100)]
[PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain consistency.
To maintain consistency with the other Arm architecture backends, iterators and iterator attributes are moved
from the mve.md file to iterators.md. Enumerators for MVE unspecs are also moved from the mve.md file to unspecs.md.
gcc/ChangeLog:
2020-10-06 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.
(VORRQ_M_N): Likewise.
(VREV32Q_M): Likewise.
(VREV16Q_M): Likewise.
(VQRSHRNTQ_N): Likewise.
(VMOVNTQ_M): Likewise.
(VMOVLBQ_M): Likewise.
(VMLALDAVAQ): Likewise.
(VQSHRNBQ_N): Likewise.
(VSHRNBQ_N): Likewise.
(VRSHRNBQ_N): Likewise.
(VMLALDAVXQ_P): Likewise.
(VQMOVNTQ_M): Likewise.
(VMVNQ_M_N): Likewise.
(VQSHRNTQ_N): Likewise.
(VMLALDAVAXQ): Likewise.
(VSHRNTQ_N): Likewise.
(VCVTMQ_M): Likewise.
(VCVTNQ_M): Likewise.
(VCVTPQ_M): Likewise.
(VCVTQ_M_N_FROM_F): Likewise.
(VCVTQ_M_FROM_F): Likewise.
(VRMLALDAVHQ_P): Likewise.
(VADDLVAQ_P): Likewise.
(VABAVQ_P): Likewise.
(VSHLQ_M): Likewise.
(VSRIQ_M_N): Likewise.
(VSUBQ_M): Likewise.
(VCVTQ_M_N_TO_F): Likewise.
(VHSUBQ_M): Likewise.
(VSLIQ_M_N): Likewise.
(VRSHLQ_M): Likewise.
(VMINQ_M): Likewise.
(VMULLBQ_INT_M): Likewise.
(VMULHQ_M): Likewise.
(VMULQ_M): Likewise.
(VHSUBQ_M_N): Likewise.
(VHADDQ_M_N): Likewise.
(VORRQ_M): Likewise.
(VRMULHQ_M): Likewise.
(VQADDQ_M): Likewise.
(VRSHRQ_M_N): Likewise.
(VQSUBQ_M_N): Likewise.
(VADDQ_M): Likewise.
(VORNQ_M): Likewise.
(VRHADDQ_M): Likewise.
(VQSHLQ_M): Likewise.
(VANDQ_M): Likewise.
(VBICQ_M): Likewise.
(VSHLQ_M_N): Likewise.
(VCADDQ_ROT270_M): Likewise.
(VQRSHLQ_M): Likewise.
(VQADDQ_M_N): Likewise.
(VADDQ_M_N): Likewise.
(VMAXQ_M): Likewise.
(VQSUBQ_M): Likewise.
(VMLASQ_M_N): Likewise.
(VMLADAVAQ_P): Likewise.
(VBRSRQ_M_N): Likewise.
(VMULQ_M_N): Likewise.
(VCADDQ_ROT90_M): Likewise.
(VMULLTQ_INT_M): Likewise.
(VEORQ_M): Likewise.
(VSHRQ_M_N): Likewise.
(VSUBQ_M_N): Likewise.
(VHADDQ_M): Likewise.
(VABDQ_M): Likewise.
(VMLAQ_M_N): Likewise.
(VQSHLQ_M_N): Likewise.
(VMLALDAVAQ_P): Likewise.
(VMLALDAVAXQ_P): Likewise.
(VQRSHRNBQ_M_N): Likewise.
(VQRSHRNTQ_M_N): Likewise.
(VQSHRNBQ_M_N): Likewise.
(VQSHRNTQ_M_N): Likewise.
(VRSHRNBQ_M_N): Likewise.
(VRSHRNTQ_M_N): Likewise.
(VSHLLBQ_M_N): Likewise.
(VSHLLTQ_M_N): Likewise.
(VSHRNBQ_M_N): Likewise.
(VSHRNTQ_M_N): Likewise.
(VSTRWSBQ): Likewise.
(VSTRBSOQ): Likewise.
(VSTRBQ): Likewise.
(VLDRBGOQ): Likewise.
(VLDRBQ): Likewise.
(VLDRWGBQ): Likewise.
(VLD1Q): Likewise.
(VLDRHGOQ): Likewise.
(VLDRHGSOQ): Likewise.
(VLDRHQ): Likewise.
(VLDRWQ): Likewise.
(VLDRDGBQ): Likewise.
(VLDRDGOQ): Likewise.
(VLDRDGSOQ): Likewise.
(VLDRWGOQ): Likewise.
(VLDRWGSOQ): Likewise.
(VST1Q): Likewise.
(VSTRHSOQ): Likewise.
(VSTRHSSOQ): Likewise.
(VSTRHQ): Likewise.
(VSTRWQ): Likewise.
(VSTRDSBQ): Likewise.
(VSTRDSOQ): Likewise.
(VSTRDSSOQ): Likewise.
(VSTRWSOQ): Likewise.
(VSTRWSSOQ): Likewise.
(VSTRWSBWBQ): Likewise.
(VLDRWGBWBQ): Likewise.
(VSTRDSBWBQ): Likewise.
(VLDRDGBWBQ): Likewise.
(VADCIQ): Likewise.
(VADCIQ_M): Likewise.
(VSBCQ): Likewise.
(VSBCQ_M): Likewise.
(VSBCIQ): Likewise.
(VSBCIQ_M): Likewise.
(VADCQ): Likewise.
(VADCQ_M): Likewise.
(UQRSHLLQ): Likewise.
(SQRSHRLQ): Likewise.
(VSHLCQ_M): Likewise.
* config/arm/mve.md (MVE_types): Move mode iterator to iterators.md from mve.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator to iterators.md from mve.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute to iterators.md from mve.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator to iterators.md from mve.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.
(VORRQ_M_N): Likewise.
(VREV32Q_M): Likewise.
(VREV16Q_M): Likewise.
(VQRSHRNTQ_N): Likewise.
(VMOVNTQ_M): Likewise.
(VMOVLBQ_M): Likewise.
(VMLALDAVAQ): Likewise.
(VQSHRNBQ_N): Likewise.
(VSHRNBQ_N): Likewise.
(VRSHRNBQ_N): Likewise.
(VMLALDAVXQ_P): Likewise.
(VQMOVNTQ_M): Likewise.
(VMVNQ_M_N): Likewise.
(VQSHRNTQ_N): Likewise.
(VMLALDAVAXQ): Likewise.
(VSHRNTQ_N): Likewise.
(VCVTMQ_M): Likewise.
(VCVTNQ_M): Likewise.
(VCVTPQ_M): Likewise.
(VCVTQ_M_N_FROM_F): Likewise.
(VCVTQ_M_FROM_F): Likewise.
(VRMLALDAVHQ_P): Likewise.
(VADDLVAQ_P): Likewise.
(VABAVQ_P): Likewise.
(VSHLQ_M): Likewise.
(VSRIQ_M_N): Likewise.
(VSUBQ_M): Likewise.
(VCVTQ_M_N_TO_F): Likewise.
(VHSUBQ_M): Likewise.
(VSLIQ_M_N): Likewise.
(VRSHLQ_M): Likewise.
(VMINQ_M): Likewise.
(VMULLBQ_INT_M): Likewise.
(VMULHQ_M): Likewise.
(VMULQ_M): Likewise.
(VHSUBQ_M_N): Likewise.
(VHADDQ_M_N): Likewise.
(VORRQ_M): Likewise.
(VRMULHQ_M): Likewise.
(VQADDQ_M): Likewise.
(VRSHRQ_M_N): Likewise.
(VQSUBQ_M_N): Likewise.
(VADDQ_M): Likewise.
(VORNQ_M): Likewise.
(VRHADDQ_M): Likewise.
(VQSHLQ_M): Likewise.
(VANDQ_M): Likewise.
(VBICQ_M): Likewise.
(VSHLQ_M_N): Likewise.
(VCADDQ_ROT270_M): Likewise.
(VQRSHLQ_M): Likewise.
(VQADDQ_M_N): Likewise.
(VADDQ_M_N): Likewise.
(VMAXQ_M): Likewise.
(VQSUBQ_M): Likewise.
(VMLASQ_M_N): Likewise.
(VMLADAVAQ_P): Likewise.
(VBRSRQ_M_N): Likewise.
(VMULQ_M_N): Likewise.
(VCADDQ_ROT90_M): Likewise.
(VMULLTQ_INT_M): Likewise.
(VEORQ_M): Likewise.
(VSHRQ_M_N): Likewise.
(VSUBQ_M_N): Likewise.
(VHADDQ_M): Likewise.
(VABDQ_M): Likewise.
(VMLAQ_M_N): Likewise.
(VQSHLQ_M_N): Likewise.
(VMLALDAVAQ_P): Likewise.
(VMLALDAVAXQ_P): Likewise.
(VQRSHRNBQ_M_N): Likewise.
(VQRSHRNTQ_M_N): Likewise.
(VQSHRNBQ_M_N): Likewise.
(VQSHRNTQ_M_N): Likewise.
(VRSHRNBQ_M_N): Likewise.
(VRSHRNTQ_M_N): Likewise.
(VSHLLBQ_M_N): Likewise.
(VSHLLTQ_M_N): Likewise.
(VSHRNBQ_M_N): Likewise.
(VSHRNTQ_M_N): Likewise.
(VSTRWSBQ): Likewise.
(VSTRBSOQ): Likewise.
(VSTRBQ): Likewise.
(VLDRBGOQ): Likewise.
(VLDRBQ): Likewise.
(VLDRWGBQ): Likewise.
(VLD1Q): Likewise.
(VLDRHGOQ): Likewise.
(VLDRHGSOQ): Likewise.
(VLDRHQ): Likewise.
(VLDRWQ): Likewise.
(VLDRDGBQ): Likewise.
(VLDRDGOQ): Likewise.
(VLDRDGSOQ): Likewise.
(VLDRWGOQ): Likewise.
(VLDRWGSOQ): Likewise.
(VST1Q): Likewise.
(VSTRHSOQ): Likewise.
(VSTRHSSOQ): Likewise.
(VSTRHQ): Likewise.
(VSTRWQ): Likewise.
(VSTRDSBQ): Likewise.
(VSTRDSOQ): Likewise.
(VSTRDSSOQ): Likewise.
(VSTRWSOQ): Likewise.
(VSTRWSSOQ): Likewise.
(VSTRWSBWBQ): Likewise.
(VLDRWGBWBQ): Likewise.
(VSTRDSBWBQ): Likewise.
(VLDRDGBWBQ): Likewise.
(VADCIQ): Likewise.
(VADCIQ_M): Likewise.
(VSBCQ): Likewise.
(VSBCQ_M): Likewise.
(VSBCIQ): Likewise.
(VSBCIQ_M): Likewise.
(VADCQ): Likewise.
(VADCQ_M): Likewise.
(UQRSHLLQ): Likewise.
(SQRSHRLQ): Likewise.
(VSHLCQ_M): Likewise.
(define_c_enum "unspec"): Move MVE enumerator to unspecs.md from mve.md.
* config/arm/unspecs.md (define_c_enum "unspec"): Move MVE enumerator from
mve.md to unspecs.md.
Martin Liska [Tue, 6 Oct 2020 09:18:55 +0000 (11:18 +0200)]
dbgcnt: print list after compilation
gcc/ChangeLog:
* common.opt: Remove -fdbg-cnt-list from deferred options.
* dbgcnt.c (dbg_cnt_set_limit_by_index): Make a copy
to original_limits.
(dbg_cnt_list_all_counters): Print also current counter value
and print to stderr.
* opts-global.c (handle_common_deferred_options): Do not handle
-fdbg-cnt-list.
* opts.c (common_handle_option): Likewise.
* toplev.c (finalize): Handle it after compilation here.
Martin Liska [Tue, 6 Oct 2020 08:49:47 +0000 (10:49 +0200)]
dbgcnt: report upper limit when lower == upper
gcc/ChangeLog:
* dbgcnt.c (dbg_cnt): Report also upper limit.
Tobias Burnus [Tue, 6 Oct 2020 09:49:34 +0000 (11:49 +0200)]
configure: Fix in-tree building of GMP on BSD [PR97302]
ChangeLog:
PR target/97302
* configure.ac: Only set with_gmp to /usr/local
if not building in tree.
* configure: Regenerate.
Tom de Vries [Sun, 4 Oct 2020 10:01:34 +0000 (12:01 +0200)]
[ftracer] Add caching of can_duplicate_bb_p
The fix "[omp, ftracer] Don't duplicate blocks in SIMT region" adds iteration
over insns in ignore_bb_p, which makes it more expensive.
Counteract this by piggybacking the computation of can_duplicate_bb_p onto
count_insns, which is called at the start of ftracer.
Bootstrapped and reg-tested on x86_64-linux.
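As an illustration only (not the tracer.c code), the caching idea can be sketched as follows. Block, BlockInfo, analyze_block and analyze_all are hypothetical names, and a negative entry standing for a non-duplicable instruction is an assumption made purely for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block: one int per instruction.
// A negative value marks an instruction that must not be duplicated
// (an assumption made for this sketch only).
struct Block {
  std::vector<int> insns;
};

// Per-block facts gathered in a single walk over the instructions.
struct BlockInfo {
  std::size_t insn_count = 0;
  bool can_duplicate = true;
};

// One pass fills in both the instruction count and the cached predicate,
// so later queries do not need to iterate over the instructions again.
BlockInfo analyze_block(const Block &bb) {
  BlockInfo info;
  for (int insn : bb.insns) {
    ++info.insn_count;
    if (insn < 0)
      info.can_duplicate = false;
  }
  return info;
}

// Run the analysis once for every block, up front.
std::vector<BlockInfo> analyze_all(const std::vector<Block> &blocks) {
  std::vector<BlockInfo> cache;
  cache.reserve(blocks.size());
  for (const Block &bb : blocks)
    cache.push_back(analyze_block(bb));
  return cache;
}
```

A later query then reads cache[i].can_duplicate instead of re-walking block i.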
gcc/ChangeLog:
2020-10-05 Tom de Vries <tdevries@suse.de>
* tracer.c (count_insns): Rename to ...
(analyze_bb): ... this.
(cache_can_duplicate_bb_p, cached_can_duplicate_bb_p): New functions.
(ignore_bb_p): Use cached_can_duplicate_bb_p.
(tail_duplicate): Call cache_can_duplicate_bb_p.
Tom de Vries [Sun, 4 Oct 2020 11:23:37 +0000 (13:23 +0200)]
[ftracer] Factor out can_duplicate_bb_p
Factor can_duplicate_bb_p out of ignore_bb_p.
Also factor out can_duplicate_insn_p and can_duplicate_bb_no_insn_iter_p to
expose the parts of can_duplicate_bb_p that are per-bb and per-insn.
Bootstrapped and reg-tested on x86_64-linux.
gcc/ChangeLog:
2020-10-05 Tom de Vries <tdevries@suse.de>
* tracer.c (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p)
(can_duplicate_bb_p): New function, factored out of ...
(ignore_bb_p): ... here.
Jonathan Wakely [Tue, 6 Oct 2020 08:41:16 +0000 (09:41 +0100)]
libstdc++: Avoid CTAD for std::ranges::join_view [LWG 3474]
In commit ef275d1f2083f8a1fa1b59a3cd07fd3e8431023e I implemented the
wrong resolution of LWG 3474. This removes the deduction guide and
alters the views::join factory to create the right type explicitly.
libstdc++-v3/ChangeLog:
* include/std/ranges (join_view): Remove deduction guide.
(views::join): Add explicit template argument list to prevent
deducing the wrong type.
* testsuite/std/ranges/adaptors/join.cc: Move test for LWG 3474
here, from ...
* testsuite/std/ranges/adaptors/join_lwg3474.cc: Removed.
Jakub Jelinek [Tue, 6 Oct 2020 08:32:22 +0000 (10:32 +0200)]
divmod: Match and expand DIVMOD even in some cases of constant divisor [PR97282]
As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
ifn call for division + modulo by constant for the fear that during
expansion we could generate better code for those cases.
If the divisor is a power of two, that is certainly the case always,
but otherwise expand_divmod can punt in many cases, e.g. if the division
type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
choose_multiplier, because it works on HOST_WIDE_INTs (true, something
we should fix eventually now that we have wide_ints), or if pre/post shift
is larger than BITS_PER_WORD.
So, the following patch recognizes DIVMOD with constant last argument even
when it is unclear if expand_divmod will be able to optimize it, and then
during DIVMOD expansion if the divisor is constant attempts to expand it as
division + modulo and if they actually don't contain any libcalls or
division/modulo, they are kept as is, otherwise that sequence is thrown away
and divmod optab or libcall is used.
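For illustration, this is the kind of source pattern the description is about: a division and a modulo that share the same operands, with a constant divisor. DivMod and split_by_ten are hypothetical names; whether a given compiler turns the pair into a single divmod operation or into multiply-and-shift sequences depends on the target and options, which this sketch does not try to show.

```cpp
#include <cstdint>

// Quotient and remainder of the same division, returned together.
struct DivMod {
  std::uint64_t quot;
  std::uint64_t rem;
};

// Both operations use the same dividend and the same constant divisor,
// so a compiler may treat them as one combined division + modulo.
DivMod split_by_ten(std::uint64_t x) {
  return DivMod{x / 10, x % 10};
}
```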
2020-10-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/97282
* tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
constant op2 if it is not a power of two and the type has precision
larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
* internal-fn.c (contains_call_div_mod): New function.
(expand_DIVMOD): If last argument is a constant, try to expand it as
TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
divmod optab or divmod libfunc.
* gcc.target/i386/pr97282.c: New test.
Aldy Hernandez [Tue, 6 Oct 2020 06:21:56 +0000 (08:21 +0200)]
Fix off-by-one storage problem in irange_allocator.
gcc/ChangeLog:
* value-range.h (irange_allocator::allocate): Increase
newir storage by one.
Jakub Jelinek [Tue, 6 Oct 2020 07:25:00 +0000 (09:25 +0200)]
openmp: Fix ICE in omp_discover_declare_target_tgt_fn_r
This ICEs because node->alias_target is (not yet) a FUNCTION_DECL, but
IDENTIFIER_NODE.
I guess we should retry the discovery before LTO streaming out; the reason
to do it this early is that it can affect the gimplification and omp lowering.
2020-10-06 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97289
* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Only follow
node->alias_target if it is a FUNCTION_DECL.
* c-c++-common/gomp/pr97289.c: New test.
Joe Ramsay [Tue, 6 Oct 2020 06:33:52 +0000 (07:33 +0100)]
arm: Add +nomve and +nomve.fp options to -mcpu=cortex-m55
This patch rearranges feature bits for MVE and FP to implement the
following flags for -mcpu=cortex-m55.
- +nomve: equivalent to armv8.1-m.main+fp.dp+dsp.
- +nomve.fp: equivalent to armv8.1-m.main+mve+fp.dp (+dsp is implied by +mve).
- +nofp: equivalent to armv8.1-m.main+mve (+dsp is implied by +mve).
- +nodsp: equivalent to armv8.1-m.main+fp.dp.
Combinations of the above:
- +nomve+nofp: equivalent to armv8.1-m.main+dsp.
- +nodsp+nofp: equivalent to armv8.1-m.main.
Due to MVE and FP sharing vfp_base, some new syntax was required in the CPU
description to implement the concept of 'implied bits'. These are non-named
features added to the ISA late, depending on whether one or more features which
depend on them are present. This means vfp_base can be present when only one of
MVE and FP is removed, but absent when both are removed.
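A minimal sketch of the 'implied bits' idea, assuming a plain bitmask representation. The constant names and add_implied_bits are hypothetical and are not the arm-cpus.in syntax or the arm.c code; the "+dsp is implied by +mve" rule is deliberately left out to keep the example small.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
constexpr std::uint32_t FEAT_MVE      = 1u << 0;
constexpr std::uint32_t FEAT_FP       = 1u << 1;
constexpr std::uint32_t FEAT_DSP      = 1u << 2;
constexpr std::uint32_t FEAT_VFP_BASE = 1u << 3;  // implied, never named by the user

// Applied late, after all +ext/+noext flags have been processed:
// vfp_base is present exactly when at least one feature that depends
// on it (MVE or FP) is still present.
std::uint32_t add_implied_bits(std::uint32_t isa) {
  isa &= ~FEAT_VFP_BASE;
  if (isa & (FEAT_MVE | FEAT_FP))
    isa |= FEAT_VFP_BASE;
  return isa;
}
```

With this shape, removing only MVE or only FP keeps vfp_base, and removing both drops it, matching the behaviour described above.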
gcc/ChangeLog:
2020-07-31 Joe Ramsay <joe.ramsay@arm.com>
* config/arm/arm-cpus.in:
(ALL_FPU_INTERNAL): Remove vfp_base.
(VFPv2): Remove vfp_base.
(MVE): Remove vfp_base.
(vfp_base): Redefine as implied bit dependent on MVE or FP
(cortex-m55): Add flags to disable MVE, MVE FP, FP and DSP extensions.
* config/arm/arm.c (arm_configure_build_target): Add implied bits to ISA.
* config/arm/parsecpu.awk:
(gen_isa): Print implied bits and their dependencies to ISA header.
(gen_data): Add parsing for implied feature bits.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cortex-m55-nodsp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nodsp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve-flag-softfp.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c: New test.
* gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c: New test.
* gcc.target/arm/multilib.exp: Add tests for -mcpu=cortex-m55.
Andreas Krebbel [Tue, 6 Oct 2020 05:56:51 +0000 (07:56 +0200)]
IBM Z: Doc: Add z15/arch13 to the list of -march/-mtune options
gcc/ChangeLog:
* doc/invoke.texi: Add z15/arch13 to the list of documented
-march/-mtune options.
Nikhil Benesch [Sun, 4 Oct 2020 06:03:36 +0000 (02:03 -0400)]
gofrontend: correct file reading logic in Stream_from_file
The implementation of Stream_from_file mishandled several cases:
* It reversed the check for whether bytes were already available in
the peek buffer.
* It considered positive return values from lseek to be an error, when
only a -1 return value indicates an error.
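The second point can be sketched like this (seek_ok is a hypothetical helper, not the gofrontend code, and a POSIX system is assumed): lseek returns the resulting offset on success, which is usually positive, so only the value -1 should be treated as a failure.

```cpp
#include <cstdio>
#include <unistd.h>

// Returns true when the seek succeeded.  A successful lseek returns the
// new file offset (zero or positive); only (off_t)-1 signals an error.
bool seek_ok(int fd, off_t offset) {
  return lseek(fd, offset, SEEK_SET) != static_cast<off_t>(-1);
}
```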
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/259437
GCC Administrator [Tue, 6 Oct 2020 00:16:25 +0000 (00:16 +0000)]
Daily bump.
Jonathan Wakely [Mon, 5 Oct 2020 23:05:11 +0000 (00:05 +0100)]
libstdc++: Reduce uses of std::numeric_limits
This avoids unnecessary instantiations of std::numeric_limits or
inclusion of <limits> when a more lightweight alternative would work.
Some uses can be replaced with __gnu_cxx::__int_traits and some can just
use size_t(-1) directly where SIZE_MAX is needed.
libstdc++-v3/ChangeLog:
* include/bits/regex.h: Use __int_traits<int> instead of
std::numeric_limits<int>.
* include/bits/uniform_int_dist.h: Use __int_traits<T>::__max
instead of std::numeric_limits<T>::max().
* include/bits/hashtable_policy.h: Use size_t(-1) instead of
std::numeric_limits<size_t>::max().
* include/std/regex: Include <ext/numeric_traits.h>.
* include/std/string_view: Use typedef for __int_traits<int>.
* src/c++11/hashtable_c++0x.cc: Use size_t(-1) instead of
std::numeric_limits<size_t>::max().
* testsuite/std/ranges/iota/96042.cc: Include <limits>.
* testsuite/std/ranges/iota/difference_type.cc: Likewise.
* testsuite/std/ranges/subrange/96042.cc: Likewise.
Marek Polacek [Mon, 5 Oct 2020 22:06:19 +0000 (18:06 -0400)]
c++: Fix typo in NON_UNION_CLASS_TYPE_P.
gcc/cp/ChangeLog:
* cp-tree.h (NON_UNION_CLASS_TYPE_P): Fix typo in a comment.
Jonathan Wakely [Mon, 5 Oct 2020 21:45:27 +0000 (22:45 +0100)]
libstdc++: Minor header cleanup in <numeric>
When adding new features to <numeric> I included the required headers
adjacent to the new code. This cleans it up by moving all the includes
to the start of the file.
libstdc++-v3/ChangeLog:
* include/std/numeric: Move all #include directives to the top
of the header.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error line
numbers.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
Aldy Hernandez [Mon, 5 Oct 2020 15:36:13 +0000 (17:36 +0200)]
Cleanup legacy_union and legacy intersect in value_range.
These are cleanups so that multi-range union/intersect doesn't
have to deal with legacy code. Instead, these should be done in
legacy mode.
gcc/ChangeLog:
* value-range.cc (irange::legacy_intersect): Only handle
legacy ranges.
(irange::legacy_union): Same.
(irange::union_): When unioning legacy with non-legacy,
first convert to legacy and do everything in legacy mode.
(irange::intersect): Same, but for intersect.
* range-op.cc (range_tests): Adjust for above changes.
Aldy Hernandez [Mon, 5 Oct 2020 15:08:11 +0000 (17:08 +0200)]
Import various range-op fixes from ranger branch.
This patch imports three fixes from the ranger branch:
1. Fold division by zero into varying instead of undefined.
This provides compatibility with existing stuff on trunk.
2. Solver changes for lshift.
This should not affect anything on trunk, as it only involves
the GORI solver which is yet to be contributed.
3. Preserve existing behavior for ABS([-MIN,-MIN]).
This is actually unrepresentable, but trunk has traditionally
treated this as [-MIN,-MIN] so this patch just syncs range-ops
with the rest of trunk.
gcc/ChangeLog:
* range-op.cc (operator_div::wi_fold): Return varying for
division by zero.
(class operator_rshift): Move class up.
(operator_abs::wi_fold): Return [-MIN,-MIN] for ABS([-MIN,-MIN]).
(operator_tests): Adjust tests.
Jakub Jelinek [Mon, 5 Oct 2020 16:33:17 +0000 (18:33 +0200)]
support TARGET_MEM_REF in C/C++ error pretty-printing [PR97197]
> See my comment above for Martins attempts to improve things. I don't
> really want to try decide what to do with those late diagnostic IL
> printing but my commit was blamed for showing target-mem-ref unsupported.
>
> I don't have much time to spend to think what to best print and what not,
> but yes, printing only the MEM_REF part is certainly imprecise.
Here is an updated version of the patch that prints TARGET_MEM_REF the way
it should be printed - as C representation of what it actually means.
Of course it would be better to have the original expressions, but with the
late diagnostics we no longer have them.
2020-10-05 Richard Biener <rguenther@suse.de>
Jakub Jelinek <jakub@redhat.com>
PR c++/97197
gcc/cp/
* error.c (dump_expr): Handle TARGET_MEM_REF.
gcc/c-family/
* c-pretty-print.c: Include langhooks.h.
(c_pretty_printer::postfix_expression): Handle TARGET_MEM_REF as
expression.
(c_pretty_printer::expression): Handle TARGET_MEM_REF as
unary_expression.
(c_pretty_printer::unary_expression): Handle TARGET_MEM_REF.
Jonathan Wakely [Mon, 5 Oct 2020 14:16:58 +0000 (15:16 +0100)]
libstdc++: Make allocators throw bad_array_new_length on overflow [LWG 3190]
std::allocator and std::pmr::polymorphic_allocator should throw
std::bad_array_new_length from their allocate member functions if the
number of bytes required cannot be represented in std::size_t.
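A sketch of the kind of check this requires, assuming a free-standing helper. checked_byte_count is a hypothetical name and is not the libstdc++ implementation.

```cpp
#include <cstddef>
#include <new>

// Returns n * sizeof(T), or throws std::bad_array_new_length when that
// product cannot be represented in std::size_t.
template <typename T>
std::size_t checked_byte_count(std::size_t n) {
  if (n > std::size_t(-1) / sizeof(T))
    throw std::bad_array_new_length();
  return n * sizeof(T);
}
```

Since std::bad_array_new_length derives from std::bad_alloc, callers that already catch std::bad_alloc keep working.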
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Add new symbol.
* include/bits/functexcept.h (__throw_bad_array_new_length):
Declare new function.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Throw bad_array_new_length for impossible sizes (LWG 3190).
* include/ext/new_allocator.h (new_allocator::allocate):
Likewise.
* include/std/memory_resource (polymorphic_allocator::allocate)
(polymorphic_allocator::allocate_object): Use new function,
__throw_bad_array_new_length.
* src/c++11/functexcept.cc (__throw_bad_array_new_length):
Define.
* testsuite/20_util/allocator/lwg3190.cc: New test.
Tom de Vries [Mon, 5 Oct 2020 12:26:04 +0000 (14:26 +0200)]
[omp, ftracer] Ignore IFN_GOMP_SIMT_XCHG_* in ignore_bb_p
As IFN_GOMP_SIMT_XCHG_* are part of the group marked by
IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT, handle them conservatively
in ignore_bb_p.
Built on x86_64-linux with nvptx accelerator, tested with libgomp.
gcc/ChangeLog:
2020-10-05 Tom de Vries <tdevries@suse.de>
* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_XCHG_*.
Nathan Sidwell [Mon, 5 Oct 2020 13:36:38 +0000 (06:36 -0700)]
c++: Make spell corrections consistent
My change to namespace-scope spell corrections ignored the issue that
different targets might have different builtins, and therefore perturb
iteration order. This fixes it by using an intermediate array of
identifiers, which we sort before considering.
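The idea can be sketched as follows. This is not the name-lookup.c code: edit_distance and best_suggestion are hypothetical, and sorting alphabetically is an assumption chosen for the example. Sorting the candidates first means that ties in the distance metric are always broken the same way, whatever order the candidates were discovered in.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance using a single rolling row.
std::size_t edit_distance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> row(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    row[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    std::size_t diag = row[0];  // cell (i-1, j-1)
    row[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t up = row[j];  // cell (i-1, j)
      std::size_t subst = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
      row[j] = std::min({subst, up + 1, row[j - 1] + 1});
      diag = up;
    }
  }
  return row[b.size()];
}

// Sort the collected candidates before scanning them, so that the first
// candidate with the smallest distance does not depend on input order.
std::string best_suggestion(std::vector<std::string> candidates,
                            const std::string &typo) {
  std::sort(candidates.begin(), candidates.end());
  std::string best;
  std::size_t best_dist = std::size_t(-1);
  for (const std::string &c : candidates) {
    std::size_t d = edit_distance(c, typo);
    if (d < best_dist) {
      best_dist = d;
      best = c;
    }
  }
  return best;
}
```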
gcc/cp/
* name-lookup.c (maybe_add_fuzzy_decl): New.
(maybe_add_fuzzy_binding): New.
(consider_binding_level): Use intermediate sortable vector for
namespace bindings.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Restore diagnostic.
Alex Coplan [Mon, 5 Oct 2020 12:45:24 +0000 (13:45 +0100)]
arm: Add missing part number for Neoverse V1
This patch adds vendor and part numbers which were missing from the
initial entry for Neoverse V1 in AArch32 GCC.
gcc/ChangeLog:
* config/arm/arm-cpus.in (neoverse-v1): Add missing vendor and
part numbers.
Tom de Vries [Mon, 5 Oct 2020 12:03:34 +0000 (14:03 +0200)]
[omp, ftracer] Remove incorrect suggestion in ignore_bb_p
In commit ab3f4b27abe "[omp, ftracer] Don't duplicate blocks in SIMT region" I
added a comment in ignore_bb_p suggesting a reordering of SIMT_VOTE_ANY and
SIMT_EXIT, which is not possible since VOTE_ANY may have data dependencies to
storage that is deallocated by SIMT_EXIT.
I've now opened a PR (PR97291) to describe the problem the reordering was
intended to fix.
Remove the incorrect suggestion.
gcc/ChangeLog:
2020-10-05 Tom de Vries <tdevries@suse.de>
* tracer.c (ignore_bb_p): Remove incorrect suggestion.
Mike Crowe [Mon, 5 Oct 2020 10:12:38 +0000 (11:12 +0100)]
libstdc++: Use correct duration for atomic_futex wait on custom clock [PR 91486]
As Jonathan Wakely pointed out[1], my change in commit
f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
the target clock duration type rather than the input clock duration type
in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
condition_variable does.
As well as fixing this, let's create a rather contrived test that fails
with the previous code, but unfortunately only when run on a machine
with an uptime of over 208.5 days, and even then not always.
[1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html
libstdc++-v3/ChangeLog:
PR libstdc++/91486
* include/bits/atomic_futex.h:
(__atomic_futex_unsigned::_M_load_when_equal_until): Use target
clock duration type when rounding.
* testsuite/30_threads/async/async.cc (test_pr91486_wait_for):
Rename from test_pr91486.
(float_steady_clock): New class for test.
(test_pr91486_wait_until): New test.
Mike Crowe [Mon, 5 Oct 2020 10:07:55 +0000 (11:07 +0100)]
libstdc++: Test C++11 implementation of std::chrono::__detail::ceil
Commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147 split the implementation
of std::chrono::__detail::ceil so that when compiling for C++17 and
later std::chrono::ceil is used but when compiling for earlier versions
a separate implementation is used to comply with C++11's limited
constexpr rules. Let's run the equivalent of the existing
std::chrono::ceil test cases on std::chrono::__detail::ceil too to make
sure that it doesn't get broken.
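For illustration, a C++11-compatible ceil for durations can be written as a single return expression, which is what C++11's constexpr rules require. ceil_cxx11 is a hypothetical name and this sketch is not the libstdc++ implementation.

```cpp
#include <chrono>

// Round a duration up to the next value representable in ToDur.
// duration_cast truncates toward zero, so one tick is added whenever the
// truncated result is smaller than the input.
template <typename ToDur, typename Rep, typename Period>
constexpr ToDur ceil_cxx11(const std::chrono::duration<Rep, Period> &d) {
  return std::chrono::duration_cast<ToDur>(d) < d
             ? std::chrono::duration_cast<ToDur>(d) + ToDur(1)
             : std::chrono::duration_cast<ToDur>(d);
}
```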
libstdc++-v3/ChangeLog:
* testsuite/20_util/duration_cast/rounding_c++11.cc: Copy
rounding.cc and alter to support compilation for C++11 and to
test std::chrono::__detail::ceil.
Jonathan Wakely [Mon, 5 Oct 2020 09:46:11 +0000 (10:46 +0100)]
libstdc++: Add missing bugzilla PR numbers to ChangeLog
We missed these out of the git commit messages.
Jakub Jelinek [Mon, 5 Oct 2020 07:34:42 +0000 (09:34 +0200)]
options: Save and restore opts_set for Optimization and Target options fallout
> This breaks ia64:
>
> In file included from ./tm.h:23,
> from ../../gcc/gencheck.c:23:
> ./options.h:7816:40: error: ISO C++ forbids zero-size array 'explicit_mask' [-Werror=pedantic]
> 7816 | unsigned HOST_WIDE_INT explicit_mask[0];
> | ^
> ./options.h:7816:26: error: zero-size array member 'cl_target_option::explicit_mask' not at end of 'struct cl_target_option' [-Werror=pedantic]
> 7816 | unsigned HOST_WIDE_INT explicit_mask[0];
> | ^~~~~~~~~~~~~
> ./options.h:7812:16: note: in the definition of 'struct cl_target_option'
> 7812 | struct GTY(()) cl_target_option
> | ^~~~~~~~~~~~~~~~
Oops, sorry.
The following patch should fix that and should also fix streaming of the
new explicit_mask_* members.
2020-10-05 Jakub Jelinek <jakub@redhat.com>
* opth-gen.awk: Don't emit explicit_mask array if n_target_explicit
is equal to n_target_explicit_mask.
* optc-save-gen.awk: Compute has_target_explicit_mask and if false,
don't emit code iterating over explicit_mask array elements. Stream
also explicit_mask_* target members.
Jakub Jelinek [Mon, 5 Oct 2020 07:09:41 +0000 (09:09 +0200)]
store-merging: Fix up -Wnarrowing warning
I've noticed a -Wnarrowing warning on gimple-ssa-store-merging.c, this
change fixes that up.
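A small illustration of the difference (all_ones is a hypothetical array, not the one in gimple-ssa-store-merging.c): ~0 has type int and value -1, which cannot be represented in unsigned int, so using it in a braced initializer is a narrowing conversion; ~0U is already unsigned.

```cpp
#include <climits>

// ~0U has type unsigned int, so no narrowing conversion is involved.
// Writing { ~0 } here would try to narrow the int value -1 instead.
constexpr unsigned int all_ones[] = { ~0U };

static_assert(all_ones[0] == UINT_MAX, "all bits set");
```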
2020-10-05 Jakub Jelinek <jakub@redhat.com>
* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Use ~0U instead of ~0 in
unsigned int array initializer.
Tom de Vries [Tue, 22 Sep 2020 14:38:07 +0000 (16:38 +0200)]
[omp, ftracer] Don't duplicate blocks in SIMT region
When running the libgomp testsuite on x86_64-linux with nvptx accelerator on
the test-case included in this patch, we run into:
...
FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
-fpeel-loops -ftracer -finline-functions execution test
...
The test-case is a minimal version of this FAIL:
...
FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
-fpeel-loops -ftracer -finline-functions execution test
...
but that one has stopped failing at commit c2ebf4f10de "openmp: Add support
for non-rect simd and improve collapsed simd support".
The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.
That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             v
|             *
v             bb8
*<------------*
bb5(VOTE_ANY)
*-------------+
|             |
|             |
|             |
|             |
|             v
|             *
v             bb7(XCHG_IDX)
*<------------*
bb6(EXIT)
...
The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operator which has the requirement that the warp executing
the operator is convergent. The warp diverges at bb4, and
reconverges at bb5, and does not diverge by going to bb7, so the shfl is
indeed executed by a convergent warp.
After ftracer, we have:
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             \
|              \
v               v
*               *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*               *
|\             /|
| \  +--------+ |
|  \/           |
|  /\           |
| /  +----------v
|/              *
v               bb7(XCHG_IDX)
*<--------------*
bb6(EXIT)
...
The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.
Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY and
EXIT, effectively treating the SIMT region conservatively.
An argument can be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test. But that's a discussion with a much
broader scope, so I'm leaving that for another patch.
Bootstrapped and reg-tested on x86_64-linux.
Built on x86_64-linux with nvptx accelerator, tested with libgomp.
gcc/ChangeLog:
PR fortran/95654
* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.
libgomp/ChangeLog:
2020-10-05 Tom de Vries <tdevries@suse.de>
PR fortran/95654
* testsuite/libgomp.fortran/pr95654.f90: New test.
GCC Administrator [Mon, 5 Oct 2020 00:16:18 +0000 (00:16 +0000)]
Daily bump.
Harald Anlauf [Sun, 4 Oct 2020 18:24:29 +0000 (20:24 +0200)]
PR fortran/97272 - Wrong answer from MAXLOC with character arg
The optional KIND argument to the MINLOC/MAXLOC intrinsic must not be
passed to the library function, as the kind conversion of the result
is treated explicitly elsewhere.
gcc/fortran/ChangeLog:
PR fortran/97272
* trans-intrinsic.c (strip_kind_from_actual): Helper function for
removal of KIND argument.
(gfc_conv_intrinsic_minmaxloc): Ignore KIND argument here, as it
is treated elsewhere.
gcc/testsuite/ChangeLog:
PR fortran/97272
* gfortran.dg/pr97272.f90: New test.
GCC Administrator [Sun, 4 Oct 2020 00:16:21 +0000 (00:16 +0000)]
Daily bump.
Clément Chigot [Fri, 25 Sep 2020 07:48:22 +0000 (09:48 +0200)]
aix: apply aix_malloc more narrowly.
In recent Technology Levels of AIX 7.2, new "#ifdef __cplusplus" have been
added. Thus, the aix_malloc fix was applied in wrong locations. This patch
increases the context to avoid this.
fixincludes/ChangeLog:
2020-10-03 Clément Chigot <clement.chigot@atos.net>
* inclhack.def (aix_malloc): Add more context to select.
* fixincl.x: Regenerate.
* tests/base/malloc.h: Update expected results.
Jakub Jelinek [Sat, 3 Oct 2020 19:22:03 +0000 (21:22 +0200)]
options: Fix up opts_set saving/restoring for underlying vars of Mask/InverseMask options
Seems I've missed that set_option has special treatment for
CLVC_BIT_CLEAR/CLVC_BIT_SET.
Which means I'll need to change the generic handling, so that for
global_options_set elements mentioned in CLVC_BIT_* options are treated
differently, instead of using the accumulated bitmasks they'll need to use
their specific bitmask variables during the option saving/restoring.
Here is a patch that implements that.
2020-10-03 Jakub Jelinek <jakub@redhat.com>
* opth-gen.awk: For variables referenced in Mask and InverseMask,
don't use the explicit_mask bitmask array, but add separate
explicit_mask_* members with the same types as the variables.
* optc-save-gen.awk: Save, restore, compare and hash the separate
explicit_mask_* members.
Jan Hubicka [Sat, 3 Oct 2020 15:20:54 +0000 (17:20 +0200)]
Add gcc.dg/tree-ssa/modref-3.c testcase
* gcc.dg/tree-ssa/modref-3.c: New test.
Jan Hubicka [Sat, 3 Oct 2020 15:20:16 +0000 (17:20 +0200)]
Track access ranges in ipa-modref
This patch implements tracking of access ranges. This is only applied when
the base pointer is an argument. Incrementally I will extend it to also track
the TBAA basetype so we can disambiguate ranges for accesses to the same basetype
(which makes it quite a bit more effective). For this reason I track the access
offset separately from the parameter offset (the second tracks combined
adjustments to the parameter). This is, I think, the last feature I would like
to add to the memory access summary this stage1.
Further work will be needed to optimize the summary and merge adjacent
ranges/make collapsing more intelligent (so we do not lose track that often),
but I wanted to keep the basic patch simple.
According to the cc1plus stats:
Alias oracle query stats:
refs_may_alias_p:
64108082 disambiguations,
74386675 queries
ref_maybe_used_by_call_p: 142319 disambiguations,
65004781 queries
call_may_clobber_ref_p: 23587 disambiguations, 29420 queries
nonoverlapping_component_refs_p: 0 disambiguations, 38117 queries
nonoverlapping_refs_since_match_p: 19489 disambiguations, 55748 must overlaps, 76044 queries
aliasing_component_refs_p: 54763 disambiguations, 755876 queries
TBAA oracle:
24184658 disambiguations
56823187 queries
16260329 are in alias set 0
10617146 queries asked about the same object
125 queries asked about the same alias set
0 access volatile
3960555 are dependent in the DAG
1800374 are aritificially in conflict with void *
Modref stats:
modref use: 10656 disambiguations, 47037 queries
modref clobber: 1473322 disambiguations, 1961464 queries
5027242 tbaa queries (2.563005 per modref query)
649087 base compares (0.330920 per modref query)
PTA query stats:
pt_solution_includes: 977385 disambiguations,
13609749 queries
pt_solutions_intersect: 1032703 disambiguations,
13187507 queries
Which should still compare with
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554930.html
There are about 2% more load disambiguations and 3.6% more store
disambiguations, which is not great, but the TBAA part helps noticeably more
and this should also help with -fno-strict-aliasing.
I plan to work on improving param tracking too.
Bootstrapped/regtested x86_64-linux with the other changes, OK?
2020-10-02 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref-tree.c (test_insert_search_collapse): Update handling
of accesses.
(test_merge): Likewise.
* ipa-modref-tree.h (struct modref_access_node): Add offset, size,
max_size, parm_offset and parm_offset_known.
(modref_access_node::useful_p): Constify.
(modref_access_node::range_info_useful_p): New predicate.
(modref_access_node::operator==): New.
(struct modref_parm_map): New structure.
(modref_tree::merge): Update for tracking parameters.
* ipa-modref.c (dump_access): Dump new fields.
(get_access): Fill in new fields.
(merge_call_side_effects): Update handling of parm map.
(write_modref_records): Stream new fields.
(read_modref_records): Stream new fields.
(compute_parm_map): Update for new parm map.
(ipa_merge_modref_summary_after_inlining): Update.
(modref_propagate_in_scc): Update.
* tree-ssa-alias.c (modref_may_conflict): Handle known ranges.
H.J. Lu [Sat, 3 Oct 2020 14:20:48 +0000 (07:20 -0700)]
doc: Replace roudnevenl with roundevenl
PR other/97280
* doc/extend.texi: Replace roudnevenl with roundevenl
GCC Administrator [Sat, 3 Oct 2020 00:16:25 +0000 (00:16 +0000)]
Daily bump.
David Edelsohn [Fri, 2 Oct 2020 16:09:52 +0000 (12:09 -0400)]
rs6000: clean up headers in rs6000.c and rs6000-call.c
When Andrew Macleod investigated the recent rs6000 bootstrap failure,
he suggested a clean up of the headers in rs6000.c and rs6000-call.c.
It now is recommended to include ssa.h instead of the individual headers.
This also ensures that value-range.h is included and in the correct order
so that the tree-ssa-propagate.h inclusion of value-query.h and its
dependencies are satisfied.
Bootstrapped on powerpc-ibm-aix7.2.0.0 and powerpc64le-linux.
gcc/ChangeLog:
2020-10-02 David Edelsohn <dje.gcc@gmail.com>
Andrew MacLeod <amacleod@redhat.com>
* config/rs6000/rs6000.c: Include ssa.h. Reorder some headers.
* config/rs6000/rs6000-call.c: Same.
Marek Polacek [Thu, 1 Oct 2020 20:40:17 +0000 (16:40 -0400)]
c++: Fix printing of C++20 template parameter object [PR97014]
No one is interested in the mangled name of the C++20 template parameter
object for a class NTTP. So instead of printing
required for the satisfaction of ‘positive<T::ratio>’ [with T = X<::_ZTAXtl5ratioLin1ELi2EEE>]
let's print
required for the satisfaction of ‘positive<T::ratio>’ [with T = X<{-1, 2}>]
I don't think adding a test is necessary for this.
gcc/cp/ChangeLog:
PR c++/97014
* cxx-pretty-print.c (pp_cxx_template_argument_list): If the
argument is template_parm_object_p, print its DECL_INITIAL.
Jonathan Wakely [Fri, 2 Oct 2020 21:14:06 +0000 (22:14 +0100)]
libstdc++: Change test to work without 64-bit atomics
This fixes a linker error for older ARM cores without 64-bit atomics.
I think the { dg-add-options libatomic } is no longer needed, but it's
harmless to keep it there.
libstdc++-v3/ChangeLog:
* testsuite/29_atomics/atomic_float/value_init.cc: Use float
instead of double so that __atomic_load_8 isn't needed.
Jonathan Wakely [Fri, 2 Oct 2020 20:10:55 +0000 (21:10 +0100)]
libstdc++: Fix testcase by using terminate handler
This test was supposed to verify that when __libc_single_threaded is
available we successfully detect recursive static initialization even
when linked to libpthread. But I forgot that when recursive init is
detected, we terminate, and so the test fails.
This adds a terminate handler that exits cleanly, so the test passes
when recursive init is detected.
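A minimal sketch of such a handler, assuming nothing about the actual test
beyond what the ChangeLog entry states (a terminate handler that calls
_Exit(0)). The function names exit_cleanly and install_clean_exit_handler
are made up for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>

// Handler that ends the process with a success status instead of aborting.
[[noreturn]] void
exit_cleanly ()
{
  std::_Exit (0);
}

// Install the handler and return the one that was installed before.
std::terminate_handler
install_clean_exit_handler ()
{
  return std::set_terminate (exit_cleanly);
}

// Usage: after installation, std::terminate would call exit_cleanly.
void
example ()
{
  install_clean_exit_handler ();
  assert (std::get_terminate () == exit_cleanly);
}
```

With the handler installed, a call to std::terminate makes the process exit
with status 0, so a test harness sees the expected termination as a pass.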
libstdc++-v3/ChangeLog:
* testsuite/18_support/96817.cc: Use terminate handler that
calls _Exit(0).
Nathan Sidwell [Fri, 2 Oct 2020 19:21:08 +0000 (12:21 -0700)]
c++: Kill DECL_ANTICIPATED
Here's the patch to remove DECL_ANTICIPATED, and with it hiddenness is
managed entirely in the symbol table. Sadly I couldn't get rid of the
actual field without more investigation -- it's repurposed for
OMP_PRIVATIZED_MEMBER. It looks like the VAR-related flags in
lang_decl_base are not completely orthogonal, so perhaps some can be
turned into an enumeration or something. But that's more than I want
to do right now.
DECL_FRIEND_P is still slightly suspect as it appears to mean more
than just in-class definition. However, I'm leaving that for now.
gcc/cp/
* cp-tree.h (lang_decl_base): anticipated_p is not used for
anticipatedness.
(DECL_ANTICIPATED): Delete.
* decl.c (duplicate_decls): Delete DECL_ANTICIPATED management,
use was_hidden.
(cxx_builtin_function): Drop DECL_ANTICIPATED setting.
(xref_tag_1): Drop DECL_ANTICIPATED assert.
* name-lookup.c (name_lookup::adl_class_only): Drop
DECL_ANTICIPATED check.
(name_lookup::search_adl): Always dedup.
(anticipated_builtin_p): Reimplement.
(do_pushdecl): Drop DECL_ANTICIPATED asserts & update.
(lookup_elaborated_type_1): Drop DECL_ANTICIPATED update.
(do_pushtag): Drop DECL_ANTICIPATED setting.
* pt.c (push_template_decl): Likewise.
(tsubst_friend_class): Likewise.
libcc1/
* libcp1plugin.cc (libcp1plugin.cc): Drop DECL_ANTICIPATED test.
Nathan Sidwell [Fri, 2 Oct 2020 18:13:26 +0000 (11:13 -0700)]
c++: Hash table iteration for namespace-member spelling suggestions
For 'no such binding' errors, we iterate over binding levels to find a
close match. At the namespace level we were using DECL_ANTICIPATED to
skip undeclared builtins. But (a) there are other unnameable things
there and (b) decl-anticipated is about to go away. This changes the
namespace scanning to iterate over the hash table, and look at
non-hidden bindings. This does mean we look at fewer strings
(hurrah), but the order we meet them is somewhat 'random'. Our
distance measure is not very fine grained, and a couple of testcases
change their suggestion. I notice for the c/c++ common one, we now
match the output of the C compiler. For the other one we think 'int'
and 'int64_t' have the same distance from 'int64', and now meet the
former first. That's a little unfortunate. If it's too problematic I
suppose we could sort the strings via an intermediate array before
measuring distance.
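To see why the two candidates tie, here is a sketch using plain Levenshtein
distance with unit costs. This is an assumption made for illustration: the
compiler's own distance measure may weight edits differently, but the tie
arises the same way, since 'int' needs two deletions from 'int64' and
'int64_t' needs two insertions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance with unit cost for insertion, deletion and
// substitution, computed with a single rolling row.
std::size_t
edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> row (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); ++j)
    row[j] = j;
  for (std::size_t i = 1; i <= a.size (); ++i)
    {
      std::size_t diag = row[0];  // value of cell (i-1, j-1)
      row[0] = i;
      for (std::size_t j = 1; j <= b.size (); ++j)
        {
          std::size_t above = row[j];  // value of cell (i-1, j)
          std::size_t subst = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
          row[j] = std::min ({ above + 1, row[j - 1] + 1, subst });
          diag = above;
        }
    }
  return row[b.size ()];
}

// Usage: both candidates are equally far from the misspelt name.
void
example ()
{
  assert (edit_distance ("int64", "int") == 2);
  assert (edit_distance ("int64", "int64_t") == 2);
}
```

Because the distances are equal, whichever candidate is met first wins,
which is why the iteration order now affects the suggestion.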
gcc/cp/
* name-lookup.c (consider_decl): New, broken out of ...
(consider_binding_level): ... here. Iterate the hash table for
namespace bindings.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Adjust diagnostic.
* g++.dg/spellcheck-typenames.C: Adjust diagnostic.
Nathan Sidwell [Fri, 2 Oct 2020 16:47:00 +0000 (09:47 -0700)]
c++: cleanup ctor_omit_inherited_parms [PR97268]
ctor_omit_inherited_parms was being somewhat abused. What I'd missed
is that it checks for a base-dtor name, before proceeding with the
check. But we ended up passing it that during cloning before we'd
completed the cloning. It was also using DECL_ORIGIN to get to the
in-charge ctor, but we sometimes zap DECL_ABSTRACT_ORIGIN, and it ends
up processing the incoming function -- which happens to work. So,
this breaks out a predicate that expects to get the in-charge ctor, and
will tell you whether its base ctor will need to omit the parms. We
call that directly during cloning.
Then the original fn is essentially just a wrapper, but uses
DECL_CLONED_FUNCTION to get to the in-charge ctor. That uncovered
abuse in add_method, which was happily passing TEMPLATE_DECLs to it.
Let's not do that. add_method itself contained a loop mostly
containing an 'if (nomatch) continue' idiom, except for a final 'if
(match) {...}' check, which itself contained instances of the former
idiom. I refactored that to use the former idiom throughout. In
doing that I found a place where we'd issue an error, but then not
actually reject the new member.
gcc/cp/
* cp-tree.h (base_ctor_omit_inherited_parms): Declare.
* class.c (add_method): Refactor main loop, only pass fns to
ctor_omit_inherited_parms.
(build_cdtor_clones): Rename bool parms.
(clone_cdtor): Call base_ctor_omit_inherited_parms.
* method.c (base_ctor_omit_inherited_parms): New, broken out of
...
(ctor_omit_inherited_parms): ... here, call it with
DECL_CLONED_FUNCTION.
gcc/testsuite/
* g++.dg/inherit/pr97268.C: New.
Martin Jambor [Fri, 2 Oct 2020 16:41:35 +0000 (18:41 +0200)]
ipa-cp: Separate and increase the large-unit parameter
A previous patch in the series has taught IPA-CP to identify the
important cloning opportunities in 548.exchange2_r as worthwhile on
their own, but the optimization is still prevented from taking place
because of the overall unit-growth limit. This patch raises that
limit so that it takes place and the benchmark runs 30% faster (on AMD
Zen2 CPU at least).
Before this patch, IPA-CP uses the following formulae to arrive at the
overall_size limit:
base = MAX(orig_size, param_large_unit_insns)
unit_growth_limit = base + base * param_ipa_cp_unit_growth / 100
where param_ipa_cp_unit_growth has default value 10 and
param_large_unit_insns has default value 10000.
The problem with exchange2 (at least on zen2 but I have had a quick
look on aarch64 too) is that the original estimated unit size is 10513
and so param_large_unit_insns does not apply and the default limit is
therefore 11564 which is good enough only for one of the ideal 8
clonings; we need the limit to be at least 16291.
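The arithmetic can be checked with a small sketch. The function names are
hypothetical and integer (truncating) division is an assumption, chosen
because it reproduces the 11564 figure quoted above.

```cpp
#include <cassert>

// Limit computed from the original unit size and the two parameters,
// following the formula quoted in the text.
long
unit_growth_limit (long orig_size, long large_unit_insns, long growth_percent)
{
  long base = orig_size > large_unit_insns ? orig_size : large_unit_insns;
  return base + base * growth_percent / 100;
}

// Smallest growth percentage for which the limit reaches TARGET.
long
min_growth_percent (long orig_size, long large_unit_insns, long target)
{
  long percent = 0;
  while (unit_growth_limit (orig_size, large_unit_insns, percent) < target)
    ++percent;
  return percent;
}

// Usage: the numbers from the exchange2 discussion.
void
example ()
{
  // Default parameters: 10513 + 10513 * 10 / 100 = 11564.
  assert (unit_growth_limit (10513, 10000, 10) == 11564);
  // Reaching 16291 through the growth parameter alone needs 55 percent.
  assert (min_growth_percent (10513, 10000, 16291) == 55);
}
```

Raising the large-unit value instead moves the base: with an illustrative
value of 16000 (an assumption, not taken from the text) and the default
growth of 10, the limit becomes 17600, which is above the required 16291.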
I would like to raise param_ipa_cp_unit_growth a little bit more soon
too, but most certainly not to 55. Therefore, the large_unit must be
increased. In this patch, I decided to decouple the inlining and
ipa-cp large-unit parameters. It also makes sense because IPA-CP uses
it only at -O3 while inlining also at -O2 (IIUC). But if we agree we
can try raising param_large_unit_insns to 13-14 thousand
"instructions," perhaps it is not necessary. But then again, it may
make sense to actually increase the IPA-CP limit further.
I plan to experiment with IPA-CP tuning on a larger set of programs.
Meanwhile, mainly to address the 548.exchange2_r regression, I'm
suggesting this simple change.
gcc/ChangeLog:
2020-09-07 Martin Jambor <mjambor@suse.cz>
* params.opt (ipa-cp-large-unit-insns): New parameter.
* ipa-cp.c (get_max_overall_size): Use the new parameter.
Martin Jambor [Fri, 2 Oct 2020 16:41:35 +0000 (18:41 +0200)]
ipa-cp: Add dumping of overall_size after cloning
When experimenting with IPA-CP parameters, especially when looking
into exchange2_r, it has been very useful to know what the value of
overall_size is at different stages of the decision process. This
patch therefore adds it to the generated dumps.
gcc/ChangeLog:
2020-09-07 Martin Jambor <mjambor@suse.cz>
* ipa-cp.c (estimate_local_effects): Add overall_size to dumped
string.
(decide_about_value): Add dumping new overall_size.
Martin Jambor [Fri, 2 Oct 2020 16:41:35 +0000 (18:41 +0200)]
ipa: Multiple predicates for loop properties, with frequencies
This patch enhances the ability of IPA to reason under what conditions
loops in a function have known iteration counts or strides because it
replaces single predicates which currently hold conjunction of
predicates for all loops with vectors capable of holding multiple
predicates, each with a cumulative frequency of loops with the
property.
This second property is then used by IPA-CP to much more aggressively
boost its heuristic score for cloning opportunities which make
iteration counts or strides of frequent loops compile time constant.
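A heavily simplified sketch of the idea, not of the actual data structures:
each predicate is reduced here to "parameter N is a known constant", paired
with the cumulative frequency of the loops it describes. All names below
(freq_predicate, known_loop_frequency, loop_hint_bonus) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop property predicate: it holds when the
// given parameter is known to be a compile-time constant.
struct freq_predicate
{
  std::size_t param_index;  // parameter the loop bound or stride depends on
  double freq;              // cumulative frequency of loops with the property
};

// Sum the frequencies of all predicates satisfied by the known parameters.
double
known_loop_frequency (const std::vector<freq_predicate> &preds,
                      const std::vector<bool> &param_known)
{
  double total = 0;
  for (const freq_predicate &p : preds)
    if (p.param_index < param_known.size () && param_known[p.param_index])
      total += p.freq;
  return total;
}

// Scale a base bonus by the frequency of loops that become analyzable.
double
loop_hint_bonus (double base_bonus, double frequency)
{
  return base_bonus * frequency;
}

// Usage: only the loops depending on parameter 0 become known.
void
example ()
{
  std::vector<freq_predicate> preds = { { 0, 2.0 }, { 1, 0.5 } };
  std::vector<bool> known = { true, false };
  assert (known_loop_frequency (preds, known) == 2.0);
}
```

With a single conjoined predicate, knowing only parameter 0 would yield no
hint at all; keeping the predicates separate lets the frequent loops count.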
gcc/ChangeLog:
2020-09-03 Martin Jambor <mjambor@suse.cz>
* ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
(ipa_fn_summary): Change the type of loop_iterations and loop_strides
to vectors of ipa_freqcounting_predicate.
(ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
(ipa_call_estimates): New fields loops_with_known_iterations and
loops_with_known_strides.
* ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
with the expected frequencies of loops with known iteration count or
stride.
* ipa-fnsummary.c (add_freqcounting_predicate): New function.
(ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
just two predicates.
(remap_hint_predicate_after_duplication): Replace with function
remap_freqcounting_preds_after_dup.
(ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
(ipa_dump_fn_summary): Dump the new vectors.
(analyze_function_body): Compute the loop property vectors.
(ipa_call_context::estimate_size_and_time): Calculate also
loops_with_known_iterations and loops_with_known_strides. Adjusted
dumping accordingly.
(remap_hint_predicate): Replace with function
remap_freqcounting_predicate.
(ipa_merge_fn_summary_after_inlining): Use it.
(inline_read_section): Stream loopcounting vectors instead of two
simple predicates.
(ipa_fn_summary_write): Likewise.
* params.opt (ipa-max-loop-predicates): New parameter.
* doc/invoke.texi (ipa-max-loop-predicates): Document new param.
gcc/testsuite/ChangeLog:
2020-09-03 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/ipcp-loophint-1.c: New test.
Martin Jambor [Fri, 2 Oct 2020 16:41:34 +0000 (18:41 +0200)]
ipa: Bundle estimates of ipa_call_context::estimate_size_and_time
A subsequent patch adds another two estimates that the code in
ipa_call_context::estimate_size_and_time computes, and the fact that
the function has a special output parameter for each thing it computes
would make it have just too many. Therefore, this patch collapses all
those output parameters into one output structure.
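The shape of the refactoring, shown on a toy function. The struct fields,
function names and the computations are placeholders invented for this
sketch and do not reflect what the real estimation code computes.

```cpp
#include <cassert>

// Before: one output pointer per computed value; every new estimate adds
// another parameter to the signature and to each caller.
void
estimate_separate (int insns, int *size, int *min_size, double *time)
{
  if (size)
    *size = insns;
  if (min_size)
    *min_size = insns / 2;
  if (time)
    *time = insns * 1.5;
}

// After: all results travel in one structure (hypothetical fields).
struct estimates
{
  int size;
  int min_size;
  double time;
};

// A new estimate now only needs a new field, not a new parameter.
void
estimate_bundled (int insns, estimates *out)
{
  out->size = insns;
  out->min_size = insns / 2;
  out->time = insns * 1.5;
}

// Usage: both variants compute the same values.
void
example ()
{
  estimates e;
  estimate_bundled (10, &e);
  assert (e.size == 10 && e.min_size == 5 && e.time == 15.0);
}
```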
gcc/ChangeLog:
2020-09-02 Martin Jambor <mjambor@suse.cz>
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to use
ipa_call_estimates.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-fnsummary.h (struct ipa_call_estimates): New type.
(ipa_call_context::estimate_size_and_time): Adjusted declaration.
(estimate_ipcp_clone_size_and_time): Likewise.
* ipa-cp.c (hint_time_bonus): Changed the type of the second argument
to ipa_call_estimates.
(perform_estimation_of_a_value): Adjusted to use ipa_call_estimates.
(estimate_local_effects): Likewise.
* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Adjusted
to return estimates in a single ipa_call_estimates parameter.
(estimate_ipcp_clone_size_and_time): Likewise.
Martin Jambor [Fri, 2 Oct 2020 16:41:34 +0000 (18:41 +0200)]
ipa: Introduce ipa_cached_call_context
Hi,
as we discussed with Honza on the mailing list last week, making
cached call context structure distinct from the normal one may make it
clearer that the cached data need to be explicitly deallocated.
This patch does that division. It is not mandatory for the overall
main goals of the patch set and can be dropped if deemed superfluous.
gcc/ChangeLog:
2020-09-02 Martin Jambor <mjambor@suse.cz>
* ipa-fnsummary.h (ipa_cached_call_context): New forward declaration
and class.
(class ipa_call_context): Make friend ipa_cached_call_context. Moved
methods duplicate_from and release to it too.
* ipa-fnsummary.c (ipa_call_context::duplicate_from): Moved to class
ipa_cached_call_context.
(ipa_call_context::release): Likewise, removed the parameter.
* ipa-inline-analysis.c (node_context_cache_entry): Change the type of
ctx to ipa_cached_call_context.
(do_estimate_edge_time): Remove parameter from the call to
ipa_cached_call_context::release.
Martin Jambor [Fri, 2 Oct 2020 16:41:34 +0000 (18:41 +0200)]
ipa: Bundle vectors describing argument values
Hi,
this large patch is mostly mechanical change which aims to replace
uses of separate vectors about known scalar values (usually called
known_vals or known_csts), known aggregate values (known_aggs), known
virtual call contexts (known_contexts) and known value
ranges (known_value_ranges) with uses of either new type
ipa_call_arg_values or ipa_auto_call_arg_values, both of which simply
contain these vectors inside them.
The need for two distinct types comes from the fact that when the vectors
are constructed from jump functions or lattices, we really should use
auto_vecs with embedded storage allocated on stack. On the other hand,
the bundle in ipa_call_context can be allocated on heap when in cache,
one time for each call_graph node.
ipa_call_context is constructible from ipa_auto_call_arg_values but
then its vectors must not be resized, otherwise the vectors will stop
pointing to the stack ones. Unfortunately, I don't think the
structure embedded in ipa_call_context can be made constant because we
need to manipulate and deallocate it when in cache.
gcc/ChangeLog:
2020-09-01 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_auto_call_arg_values): New type.
(class ipa_call_arg_values): Likewise.
(ipa_get_indirect_edge_target): Replaced vector arguments with
ipa_call_arg_values in declaration. Added an overload for
ipa_auto_call_arg_values.
* ipa-fnsummary.h (ipa_call_context): Removed members m_known_vals,
m_known_contexts, m_known_aggs, duplicate_from, release and equal_to,
new members m_avals, store_to_cache and equivalent_to_p. Adjusted
constructor arguments.
(estimate_ipcp_clone_size_and_time): Replaced vector arguments
with ipa_auto_call_arg_values in declaration.
(evaluate_properties_for_edge): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target): Adjusted to work on
ipa_call_arg_values rather than on separate vectors. Added an
overload for ipa_auto_call_arg_values.
(devirtualization_time_bonus): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(gather_context_independent_values): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(perform_estimation_of_a_value): Likewise.
(estimate_local_effects): Likewise.
(modify_known_vectors_with_val): Adjusted both variants to work on
ipa_auto_call_arg_values and rename them to
copy_known_vectors_add_val.
(decide_about_value): Adjusted to work on ipa_call_arg_values rather
than on separate vectors.
(decide_whether_version_node): Likewise.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(estimate_edge_devirt_benefit): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time_1): Likewise.
(summarize_calls_size_and_time): Adjusted calls to
estimate_edge_size_and_time.
(estimate_calls_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(ipa_call_context::ipa_call_context): Construct from a pointer to
ipa_auto_call_arg_values instead of individual vectors.
(ipa_call_context::duplicate_from): Adjusted to access vectors within
m_avals.
(ipa_call_context::release): Likewise.
(ipa_call_context::equal_to): Likewise.
(ipa_call_context::estimate_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_ipcp_clone_size_and_time): Adjusted to work with
ipa_auto_call_arg_values rather than on separate vectors.
(ipa_merge_fn_summary_after_inlining): Likewise. Adjusted call to
estimate_edge_size_and_time.
(ipa_update_overall_fn_summary): Adjusted call to
estimate_edge_size_and_time.
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to work with
ipa_auto_call_arg_values rather than with separate vectors.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-prop.c (ipa_auto_call_arg_values::~ipa_auto_call_arg_values):
New destructor.
Patrick Palka [Fri, 2 Oct 2020 14:51:31 +0000 (10:51 -0400)]
libstdc++: Add missing P0896 changes to <iterator>
I noticed that the following changes from this paper were not yet
implemented.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (reverse_iterator::iter_move):
Define for C++20 as per P0896.
(reverse_iterator::iter_swap): Likewise.
(move_iterator::operator*): Apply P0896 changes for C++20.
(move_iterator::operator[]): Likewise.
* testsuite/24_iterators/reverse_iterator/cust.cc: New test.
Joe Ramsay [Fri, 2 Oct 2020 14:28:29 +0000 (15:28 +0100)]
arm: Remove coercion from scalar argument to vmin & vmax intrinsics
This patch fixes an issue with vmin* and vmax* intrinsics which accept
a scalar argument. Previously when the scalar was of different width
to the vector elements this would generate __ARM_undef. This change
allows the scalar argument to be implicitly converted to the correct
width. Also tidied up the relevant unit tests, some of which would
have passed even if only one of two or three intrinsic calls had
compiled correctly.
Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
testsuites are clean. OK for trunk?
Thanks,
Joe
gcc/ChangeLog:
2020-08-10 Joe Ramsay <joe.ramsay@arm.com>
* config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of scalar
argument.
(__arm_vmaxnmvq): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmaxavq): Likewise.
(__arm_vmaxavq_p): Likewise.
(__arm_vmaxvq): Likewise.
(__arm_vmaxvq_p): Likewise.
(__arm_vminavq): Likewise.
(__arm_vminavq_p): Likewise.
(__arm_vminvq): Likewise.
(__arm_vminvq_p): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for mismatched
width of scalar argument.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.
Kyrylo Tkachov [Fri, 2 Oct 2020 14:23:19 +0000 (15:23 +0100)]
AArch64: Add neoversev1_tunings struct
This patch adds a Neoverse V1-specific tuning struct that currently is
just a copy of the N1 struct it was using before, with the SVE width
specified.
This will allow us to tweak Neoverse V1 things in the future as needed.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64.c (neoversev1_tunings): Define.
* config/aarch64/aarch64-cores.def (zeus): Use it.
(neoverse-v1): Likewise.
Jan Hubicka [Fri, 2 Oct 2020 13:56:12 +0000 (15:56 +0200)]
Perforate fnspec strings
gcc/ChangeLog:
2020-10-02 Jan Hubicka <hubicka@ucw.cz>
* attr-fnspec.h: Update documentation.
(attr_fnspec::return_desc_size): Set to 2.
(attr_fnspec::arg_desc_size): Set to 2.
* builtin-attrs.def (STR1): Update fnspec.
* internal-fn.def (UBSAN_NULL): Update fnspec.
(UBSAN_VPTR): Update fnspec.
(UBSAN_PTR): Update fnspec.
(ASAN_CHECK): Update fnspec.
(GOACC_DIM_SIZE): Remove fnspec.
(GOACC_DIM_POS): Remove fnspec.
* tree-ssa-alias.c (attr_fnspec::verify): Update verification.
gcc/fortran/ChangeLog:
2020-10-02 Jan Hubicka <hubicka@ucw.cz>
* trans-decl.c (gfc_build_library_function_decl_with_spec): Verify
fnspec.
(gfc_build_intrinsic_function_decls): Update fnspecs.
(gfc_build_builtin_function_decls): Update fnspecs.
* trans-io.c (gfc_build_io_library_fndecls): Update fnspecs.
* trans-types.c (create_fn_spec): Update fnspecs.
Nathan Sidwell [Fri, 2 Oct 2020 11:58:57 +0000 (04:58 -0700)]
c++: Simplify __FUNCTION__ creation
I had reason to wander into cp_make_fname, and noticed it's the only
caller of cp_fname_init. Folding it in makes the code simpler.
gcc/cp/
* cp-tree.h (cp_fname_init): Delete declaration.
* decl.c (cp_fname_init): Merge into only caller ...
(cp_make_fname): ... here & refactor.
Jan Hubicka [Fri, 2 Oct 2020 11:31:05 +0000 (13:31 +0200)]
Commonize handling of attr-fnspec
* attr-fnspec.h: New file.
* calls.c (decl_return_flags): Use attr_fnspec.
* gimple.c (gimple_call_arg_flags): Use attr_fnspec.
(gimple_call_return_flags): Use attr_fnspec.
* tree-into-ssa.c (pass_build_ssa::execute): Use attr_fnspec.
* tree-ssa-alias.c (attr_fnspec::verify): New member function.
Jan Hubicka [Fri, 2 Oct 2020 11:14:57 +0000 (13:14 +0200)]
Break out ao_ref_init_from_ptr_and_range from ao_ref_init_from_ptr_and_size
* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Break out from ...
(ao_ref_init_from_ptr_and_size): ... here.
Jan Hubicka [Fri, 2 Oct 2020 11:01:01 +0000 (13:01 +0200)]
Add poly_int64 streaming support
2020-10-02 Jan Hubicka <hubicka@ucw.cz>
* data-streamer-in.c (streamer_read_poly_int64): New function.
* data-streamer-out.c (streamer_write_poly_int64): New function.
* data-streamer.h (streamer_write_poly_int64): Declare.
(streamer_read_poly_int64): Declare.
Richard Sandiford [Fri, 2 Oct 2020 10:53:06 +0000 (11:53 +0100)]
aarch64: Remove aarch64_sve_pred_dominates_p
In r11-2922, Przemek fixed a post-RA instruction match failure
caused by the SVE FP subtraction patterns. This patch applies
the same fix to the other patterns.
To recap, the issue is around the handling of predication.
We want to do two things:
- Optimise cases in which a predicate is known to be all-true.
- Differentiate cases in which the predicate on an _x ACLE function has
to be kept as-is from cases in which we can make more lanes active.
The former is true by default, the latter is true for certain
combinations of flags in the -ffast-math group.
This is handled by a boolean flag in the unspecs to say whether the
predicate is “strict” or “relaxed”. When combining multiple strict
operations, the predicates used in the operations generally need to
match. When combining multiple relaxed operations, we can ignore the
predicates on nested operations and just use the predicate on the
“outermost” operation.
Originally I'd tried to reduce combinatorial explosion by using
aarch64_sve_pred_dominates_p. This required matching predicates
for strict operations but allowed more combinations for relaxed
operations.
The problem (as I should have remembered) is that C conditions on
insn patterns can't reliably enforce matching operands. If the
same register is used in two different input operands, the RA is
allowed to use different hard registers for those input operands
(and sometimes it has to). So operands that match before RA
might not match afterwards. The only sure way to force a match
is via match_dup.
This patch splits the cases into two. I cry bitter tears at having
to do this, but I think it's the only backportable fix. There might
be some way of using define_subst to generate the cond_* patterns from
the pred_* patterns, with some alternatives strategically disabled in
each case, but that's future work and might not be an improvement.
Since so many patterns now do this, I moved the comments from the
subtraction pattern to a new banner comment at the head of the file.
gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_pred_dominates_p):
Delete.
* config/aarch64/aarch64.c (aarch64_sve_pred_dominates_p): Likewise.
* config/aarch64/aarch64-sve.md: Add banner comment describing
how merging predicated FP operations are represented.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_strict): ...this.
(*cond_add<mode>_2_const): Split into...
(*cond_add<mode>_2_const_relaxed): ...this and...
(*cond_add<mode>_2_const_strict): ...this.
(*cond_add<mode>_any_const): Split into...
(*cond_add<mode>_any_const_relaxed): ...this and...
(*cond_add<mode>_any_const_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_2): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_any_strict): ...this.
(*cond_sub<mode>_3_const): Split into...
(*cond_sub<mode>_3_const_relaxed): ...this and...
(*cond_sub<mode>_3_const_strict): ...this.
(*aarch64_pred_abd<mode>): Split into...
(*aarch64_pred_abd<mode>_relaxed): ...this and...
(*aarch64_pred_abd<mode>_strict): ...this.
(*aarch64_cond_abd<mode>_2): Split into...
(*aarch64_cond_abd<mode>_2_relaxed): ...this and...
(*aarch64_cond_abd<mode>_2_strict): ...this.
(*aarch64_cond_abd<mode>_3): Split into...
(*aarch64_cond_abd<mode>_3_relaxed): ...this and...
(*aarch64_cond_abd<mode>_3_strict): ...this.
(*aarch64_cond_abd<mode>_any): Split into...
(*aarch64_cond_abd<mode>_any_relaxed): ...this and...
(*aarch64_cond_abd<mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_4): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_strict): ...this.
(*aarch64_pred_fac<cmp_op><mode>): Split into...
(*aarch64_pred_fac<cmp_op><mode>_relaxed): ...this and...
(*aarch64_pred_fac<cmp_op><mode>_strict): ...this.
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>): Split
into...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed):
...this and...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict):
...this.
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>): Split
into...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed):
...this and...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict):
...this.
* config/aarch64/aarch64-sve2.md
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>): Split into...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_strict): ...this.
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any): Split into...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_strict): ...this.
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>): Split into...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_strict): ...this.
Richard Sandiford [Fri, 2 Oct 2020 10:53:05 +0000 (11:53 +0100)]
arm: Make more use of the new mode macros
As Christophe pointed out, my r11-3522 patch didn't in fact fix
all of the armv8_2-fp16-arith-2.c failures introduced by allowing
FP16 vectorisation without -funsafe-math-optimizations. I must have
only tested the final patch on my usual arm-linux-gnueabihf bootstrap,
which it turns out treats the test as unsupported.
The focus of the original patch was to use mode macros for
patterns that are shared between Advanced SIMD, iwMMXt and MVE.
This patch uses the mode macros for general neon.md patterns too.
gcc/
* config/arm/neon.md (*sub<VDQ:mode>3_neon): Use the new mode macros
for the insn condition.
(sub<VH:mode>3, *mul<VDQW:mode>3_neon): Likewise.
(mul<VDQW:mode>3add<VDQW:mode>_neon): Likewise.
(mul<VH:mode>3add<VH:mode>_neon): Likewise.
(mul<VDQW:mode>3neg<VDQW:mode>add<VDQW:mode>_neon): Likewise.
(fma<VCVTF:mode>4, fma<VH:mode>4, *fmsub<VCVTF:mode>4): Likewise.
(quad_halves_<code>v4sf, reduc_plus_scal_<VD:mode>): Likewise.
(reduc_plus_scal_<VQ:mode>, reduc_smin_scal_<VD:mode>): Likewise.
(reduc_smin_scal_<VQ:mode>, reduc_smax_scal_<VD:mode>): Likewise.
(reduc_smax_scal_<VQ:mode>, mul<VH:mode>3): Likewise.
(neon_vabd<VF:mode>_2, neon_vabd<VF:mode>_3): Likewise.
(fma<VH:mode>4_intrinsic): Delete.
(neon_vadd<VCVTF:mode>): Use the new mode macros to decide which
form of instruction to generate.
(neon_vmla<VDQW:mode>, neon_vmls<VDQW:mode>): Likewise.
(neon_vsub<VCVTF:mode>): Likewise.
(neon_vfma<VH:mode>): Generate the main fma<mode>4 form instead
of using fma<mode>4_intrinsic.
gcc/testsuite/
* gcc.target/arm/armv8_2-fp16-arith-2.c (float16_t): Use _Float16_t
rather than __fp16.
(float16x4_t, float16x8_t): Likewise.
(fp16_abs): Use __builtin_fabsf16.
Alex Coplan [Fri, 2 Oct 2020 10:16:31 +0000 (11:16 +0100)]
aarch64: ilp32 testsuite fixes
This fixes test failures on ilp32 introduced in
r11-3032-gd4febc75e8dfab23bd3132d5747eded918f85107.
The assembler checks in extend-syntax.c simply needed adjusting for
32-bit pointers.
It appears the subsp.c test has never passed on ILP32 due to a missed
optimisation there. Since this isn't a code quality regression, disable
that check on ILP32.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/extend-syntax.c: Fix assembler checks for
ilp32, disable check-function-bodies on ilp32.
* gcc.target/aarch64/subsp.c: Only check second scan-assembler
on lp64 since the code on ilp32 is missing the optimization
needed for this test to pass.
Martin Liska [Fri, 25 Sep 2020 08:53:26 +0000 (10:53 +0200)]
GCOV: do not mangle .gcno files.
gcc/ChangeLog:
PR gcov-profile/97193
* coverage.c (coverage_init): .gcno note files should not be
mangled and should end up in the output directory.
Tobias Burnus [Fri, 2 Oct 2020 10:07:57 +0000 (12:07 +0200)]
libgomp: Regenerate configure files with automake 1.15.1
libgomp/ChangeLog:
* Makefile.in: Regenerate with automake 1.15.1.
* aclocal.m4: Likewise.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
Jason Merrill [Fri, 2 Oct 2020 07:00:49 +0000 (09:00 +0200)]
c++: Set CALL_FROM_NEW_OR_DELETE_P on more calls.
We were failing to set the flag on a delete call in a new expression, in a
deleting destructor, and in a coroutine. Fixed by setting it in the
function that builds the call.
2020-10-02 Jason Merrill <jason@redhat.com>
gcc/cp/ChangeLog:
* call.c (build_operator_new_call): Set CALL_FROM_NEW_OR_DELETE_P.
(build_op_delete_call): Likewise.
* init.c (build_new_1, build_vec_delete_1, build_delete): Not here.
(build_delete):
gcc/ChangeLog:
* gimple.h (gimple_call_operator_delete_p): Rename from
gimple_call_replaceable_operator_delete_p.
* gimple.c (gimple_call_operator_delete_p): Likewise.
* tree.h (DECL_IS_REPLACEABLE_OPERATOR_DELETE_P): Remove.
* tree-ssa-dce.c (mark_all_reaching_defs_necessary_1): Adjust.
(propagate_necessity): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/pr94314.C: new/delete no longer omitted.
Richard Biener [Thu, 1 Oct 2020 08:44:27 +0000 (10:44 +0200)]
make use of CALL_FROM_NEW_OR_DELETE_P
This fixes points-to analysis and DCE to only consider new/delete
operator calls from new or delete expressions and not direct calls.
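The distinction can be illustrated from the user's side. The following is a minimal sketch with hypothetical function names, not code from the patch: the first function allocates through new/delete expressions, which are the calls the flag is meant to mark; the second calls the allocation functions directly, which the message above says should no longer be considered by points-to analysis and DCE.

```cpp
#include <new>

// Allocation through new/delete *expressions*. These are the calls the
// flag described above is meant to identify.
int via_expression()
{
    int *p = new int(42);
    int v = *p;
    delete p;
    return v;
}

// Direct calls to the allocation functions. As far as the language is
// concerned these are ordinary function calls.
int via_direct_call()
{
    void *raw = ::operator new(sizeof(int));
    int *p = ::new (raw) int(42);   // placement new constructs in place
    int v = *p;
    ::operator delete(raw);
    return v;
}
```

Both functions return the same value; the difference lies only in how the allocation and deallocation calls were written in the source.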
2020-10-01 Richard Biener <rguenther@suse.de>
* gimple.h (GF_CALL_FROM_NEW_OR_DELETE): New call flag.
(gimple_call_set_from_new_or_delete): New.
(gimple_call_from_new_or_delete): Likewise.
* gimple.c (gimple_build_call_from_tree): Set
GF_CALL_FROM_NEW_OR_DELETE appropriately.
* ipa-icf-gimple.c (func_checker::compare_gimple_call):
Compare gimple_call_from_new_or_delete.
* tree-ssa-dce.c (mark_all_reaching_defs_necessary_1): Make
sure to only consider new/delete calls from new or delete
expressions.
(propagate_necessity): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree-ssa-structalias.c (find_func_aliases_for_call):
Likewise.
* g++.dg/tree-ssa/pta-delete-1.C: New testcase.
Jason Merrill [Thu, 1 Oct 2020 08:08:58 +0000 (10:08 +0200)]
c++: Move CALL_FROM_NEW_OR_DELETE_P to tree.h
As discussed with richi, we should be able to use TREE_PROTECTED for this
flag, since CALL_FROM_THUNK_P will never be set on a call to an operator new
or delete.
2020-10-01 Jason Merrill <jason@redhat.com>
gcc/cp/ChangeLog:
* lambda.c (call_from_lambda_thunk_p): New.
* cp-gimplify.c (cp_genericize_r): Use it.
* pt.c (tsubst_copy_and_build): Use it.
* typeck.c (check_return_expr): Use it.
* cp-tree.h: Declare it.
(CALL_FROM_NEW_OR_DELETE_P): Move to gcc/tree.h.
gcc/ChangeLog:
* tree.h (CALL_FROM_NEW_OR_DELETE_P): Move from cp-tree.h.
* tree-core.h: Document new usage of protected_flag.
Aldy Hernandez [Fri, 2 Oct 2020 08:36:17 +0000 (10:36 +0200)]
Implement irange::fits_p.
This should have been included in the irange_allocator patch, as
a method to see if the current object can hold a passed range
without truncation.
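As a rough illustration of the idea, here is a sketch using a hypothetical toy class rather than GCC's actual irange. It assumes a range object has a fixed number of slots for [lower, upper] sub-range pairs, so another range fits without truncation only if it has no more pairs than there are slots. The names toy_range, add_pair and num_pairs are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a range object with a fixed capacity of
// [lower, upper] sub-range pairs. Not GCC's irange.
class toy_range
{
public:
    explicit toy_range(std::size_t max_pairs) : m_max_pairs(max_pairs) {}

    // Add a sub-range; returns false if the capacity is exhausted.
    bool add_pair(long lo, long hi)
    {
        if (m_pairs.size() >= m_max_pairs)
            return false;
        m_pairs.emplace_back(lo, hi);
        return true;
    }

    std::size_t num_pairs() const { return m_pairs.size(); }

    // True if this object has room for every sub-range of OTHER.
    bool fits_p(const toy_range &other) const
    {
        return m_max_pairs >= other.num_pairs();
    }

private:
    std::size_t m_max_pairs;
    std::vector<std::pair<long, long>> m_pairs;
};
```

A caller would check fits_p before copying, and fall back to a larger object when it returns false.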
gcc/ChangeLog:
* value-range.h (irange::fits_p): New.
GCC Administrator [Fri, 2 Oct 2020 00:16:27 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Thu, 1 Oct 2020 22:11:22 +0000 (15:11 -0700)]
compiler: set varargs correctly for type of method expression
Fixes golang/go#41737
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/258977
Alan Modra [Thu, 1 Oct 2020 09:44:09 +0000 (19:14 +0930)]
[RS6000] ICE in decompose, at rtl.h:2282
during RTL pass: fwprop1
gcc.dg/pr82596.c: In function 'test_cststring':
gcc.dg/pr82596.c:27:1: internal compiler error: in decompose, at rtl.h:2282
-m32 gcc/testsuite/gcc.dg/pr82596.c fails along with other tests after
applying rtx_cost patches, which exposed a backend bug.
legitimize_address when presented with the following address
(plus (reg) (const_int 0x7fffffff))
attempts to rewrite it as a high/low sum. The low part is 0xffff, which
sign-extends to -1, making the high part 0x80000000. But that value is
not canonical for SImode, where constants must be sign-extended from
32 bits.
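The arithmetic can be checked with a small worked example for the 32-bit offset 0x7fffffff. This is only a sketch of the high/low split, with hypothetical helper names; it does not reproduce rs6000_legitimize_address or gen_int_mode.

```cpp
#include <cstdint>

// Split an address offset into a sign-extended 16-bit low part and the
// high part that must be added to it to reproduce the offset.
struct high_low
{
    int64_t high;
    int64_t low;
};

high_low split_offset(int64_t offset)
{
    // Sign-extend the low 16 bits: 0xffff becomes -1.
    int64_t low = ((offset & 0xffff) ^ 0x8000) - 0x8000;
    int64_t high = offset - low;
    return {high, low};
}

// Sign-extend a value from 32 bits, the form a 32-bit constant is
// expected to take in this sketch.
int64_t sign_extend_32(int64_t value)
{
    return ((value & 0xffffffffLL) ^ 0x80000000LL) - 0x80000000LL;
}
```

For 0x7fffffff, split_offset gives a low part of -1 and a high part of 0x80000000. Held in a wider integer, that high part is positive, whereas its 32-bit sign-extended form is -0x80000000, which is the kind of mismatch the fix addresses.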
* config/rs6000/rs6000.c (rs6000_legitimize_address): Use
gen_int_mode for high part of address constant.
Alan Modra [Thu, 1 Oct 2020 09:11:37 +0000 (18:41 +0930)]
[RS6000] rs6000_linux64_override_options fix
Commit c6be439b37 wrongly left a block of code inside an "else" block,
which accidentally changed the power10 default for TARGET_NO_FP_IN_TOC.
We don't want FP constants in the TOC when -mcmodel=medium can address
them just as efficiently outside the TOC.
* config/rs6000/rs6000.c (rs6000_linux64_override_options):
Formatting. Correct setting of TARGET_NO_FP_IN_TOC and
TARGET_NO_SUM_IN_TOC.
Alan Modra [Thu, 1 Oct 2020 02:22:38 +0000 (11:52 +0930)]
[RS6000] function for linux64 SUBSUBTARGET_OVERRIDE_OPTIONS
* config/rs6000/freebsd64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Use
rs6000_linux64_override_options.
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Break
out to..
* config/rs6000/rs6000.c (rs6000_linux64_override_options): ..this,
new function. Tweak non-biarch test and clearing of
profile_kernel to work with freebsd64.h.
Nathan Sidwell [Thu, 1 Oct 2020 19:36:46 +0000 (12:36 -0700)]
c++: Kill DECL_HIDDEN_P
There are only a couple of asserts remaining using this macro, and
nothing using TYPE_HIDDEN_P. Killed thusly.
gcc/cp/
* cp-tree.h (DECL_ANTICIPATED): Adjust comment.
(DECL_HIDDEN_P, TYPE_HIDDEN_P): Delete.
* tree.c (ovl_insert): Delete DECL_HIDDEN_P assert.
(ovl_skip_hidden): Likewise.
Martin Liska [Thu, 1 Oct 2020 18:57:48 +0000 (20:57 +0200)]
Fix build of ppc64 target.
Since a889e06ac68 the following fails.
In file included from ../../gcc/tree-ssa-propagate.h:25:0,
from ../../gcc/config/rs6000/rs6000.c:78:
../../gcc/value-query.h:90:31: error: ‘irange’ has not been declared
virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
^~~~~~
../../gcc/value-query.h:91:31: error: ‘irange’ has not been declared
virtual bool range_on_edge (irange &r, edge, tree name);
^~~~~~
../../gcc/value-query.h:92:31: error: ‘irange’ has not been declared
virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
^~~~~~
In file included from ../../gcc/tree-ssa-propagate.h:25:0,
from ../../gcc/config/rs6000/rs6000-call.c:67:
../../gcc/value-query.h:90:31: error: ‘irange’ has not been declared
virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
^~~~~~
../../gcc/value-query.h:91:31: error: ‘irange’ has not been declared
virtual bool range_on_edge (irange &r, edge, tree name);
^~~~~~
../../gcc/value-query.h:92:31: error: ‘irange’ has not been declared
virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
gcc/ChangeLog:
* config/rs6000/rs6000-call.c: Include value-range.h.
* config/rs6000/rs6000.c: Likewise.
Tom de Vries [Thu, 1 Oct 2020 09:07:20 +0000 (11:07 +0200)]
[nvptx] Emit mov.u32 instead of cvt.u32.u32 for truncsiqi2
When running:
...
$ gcc.sh src/gcc/testsuite/gcc.target/nvptx/abi-complex-arg.c -S -dP
...
we have in abi-complex-arg.s:
...
//(insn 3 5 4 2
// (set
// (reg:QI 23)
// (truncate:QI (reg:SI 22))) "abi-complex-arg.c":38:1 29 {truncsiqi2}
// (nil))
cvt.u32.u32 %r23, %r22; // 3 [c=4] truncsiqi2/0
...
The cvt.u32.u32 can be written more concisely and clearly as mov.u32.
Fix this in define_insn "truncsi<QHIM>2".
Tested on nvptx.
gcc/ChangeLog:
2020-10-01 Tom de Vries <tdevries@suse.de>
PR target/80845
* config/nvptx/nvptx.md (define_insn "truncsi<QHIM>2"): Emit mov.u32
instead of cvt.u32.u32.