Richard Sandiford [Wed, 15 Jun 2022 16:40:09 +0000 (17:40 +0100)]
Revert recent internal-fn changes [PR105975]
The recent internal-fn “clean-ups” triggered problems on nvptx
because some of the omp_simt_* patterns had modeless operands.
I wondered about adapting expand_fn_using_insn to cope with that,
but then the problem becomes: what should the mode of operand 0
be when there is no lhs? The answer depends on the target insn.
For GOMP_SIMT_ENTER_ALLOC the answer was: use Pmode.
For GOMP_SIMT_ORDERED_PRED and others the answer was: elide the call.
(However, GOMP_SIMT_ORDERED_PRED doesn't seem to have ECF_* flags
that would normally allow it to be dropped at the gimple level.)
So these instructions seem to be special enough that they need
their own code after all. This patch reverts the second patch
and most of the first. The only part retained from the first
is splitting expand_fn_using_insn out of expand_direct_optab_fn,
since I think expand_fn_using_insn could still be useful in future.
gcc/
PR middle-end/105975
Revert everything apart from the expand_fn_using_insn and
expand_direct_optab_fn changes from:
* internal-fn.def (DEF_INTERNAL_INSN_FN): New macro.
(GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE)
(GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY)
(GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it.
* internal-fn.h (direct_internal_fn_info::directly_mapped): New
member variable.
(direct_internal_fn_info::vectorizable): Reduce to 1 bit.
(direct_internal_fn_p): Also return true for internal functions
that map directly to instructions defined in target-insns.def.
(direct_internal_fn): Adjust comment accordingly.
* internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1)
(vectorizable_optab2): New local macros.
(not_direct): Initialize directly_mapped.
(mask_load_direct, load_lanes_direct, mask_load_lanes_direct)
(gather_load_direct, len_load_direct, mask_store_direct)
(store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct)
(vec_cond_direct, scatter_store_direct, len_store_direct)
(vec_set_direct, unary_direct, binary_direct, ternary_direct)
(cond_unary_direct, cond_binary_direct, cond_ternary_direct)
(while_direct, fold_extract_direct, fold_left_direct)
(mask_fold_left_direct, check_ptrs_direct): Use the macros above.
(expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete.
(expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise.
(direct_internal_fn_types): Handle functions that map to instructions
defined in target-insns.def.
(direct_internal_fn_types): Likewise.
(direct_internal_fn_supported_p): Likewise.
(internal_fn_expanders): Likewise.
(expand_fn_using_insn): New function,
split out and adapted from...
(expand_direct_optab_fn): ...here.
(expand_GOMP_SIMT_ENTER_ALLOC): Use it.
(expand_GOMP_SIMT_EXIT): Likewise.
(expand_GOMP_SIMT_LANE): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
Richard Earnshaw [Wed, 15 Jun 2022 15:07:20 +0000 (16:07 +0100)]
arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
since r230663. The problem is that we use gen_lowpart, etc. to split
the 64-bit quantity, but fail to account for the fact that these
routines are really dealing with 64-bit /values/ and in big-endian the
ordering of the sub-registers changes.
To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
as first_reg and second_reg, and then used different logic for
big-endian targets to initialize these values. This makes the logic
clearer than trying to think about high bits and low bits.
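The first/second naming can be illustrated with a small standalone sketch. This is not the arm.cc code; the helper name split_in_memory_order and its big_endian flag are hypothetical, and the sketch only shows why "first word in memory" is a clearer notion than "low word" once endianness varies.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: return {first, second}, the two 32-bit words of
// VALUE in increasing memory-address order for the given endianness.
std::pair<std::uint32_t, std::uint32_t>
split_in_memory_order (std::uint64_t value, bool big_endian)
{
  std::uint32_t low = static_cast<std::uint32_t> (value);
  std::uint32_t high = static_cast<std::uint32_t> (value >> 32);
  // On big-endian the most significant word sits at the lower address.
  if (big_endian)
    return {high, low};
  return {low, high};
}
```

The caller never has to reason about high or low bits; it just stores "first" at the lower address and "second" at the higher one.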
gcc/ChangeLog:
PR target/105981
* config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
to first_reg and second_reg respectively. Initialize them correctly
when generating big-endian code.
Nathan Sidwell [Fri, 10 Jun 2022 18:57:38 +0000 (11:57 -0700)]
c++: Use better module partition naming
It turns out that 'implementation partition' is not a term used in the
std, and is confusing to users. Let's use the better term 'internal
partition'. While there, adjust header unit naming.
gcc/cp/
* module.cc (module_state::write_readme): Use less confusing
importable unit names.
Richard Earnshaw [Wed, 15 Jun 2022 12:42:23 +0000 (13:42 +0100)]
arm: fix thinko in arm_bfi_1_p() [PR105974]
I clearly wasn't thinking straight when I wrote the arm_bfi_1_p
function and used XUINT rather than UINTVAL when extracting CONST_INT
values. It seemed to work in testing, but was incorrect and failed
RTL checking.
Fixed thusly:
gcc/ChangeLog:
PR target/105974
* config/arm/arm.cc (arm_bfi_1_p): Use UINTVAL instead of XUINT.
Iain Buclaw [Wed, 15 Jun 2022 11:20:15 +0000 (13:20 +0200)]
d: Set TYPE_ARTIFICIAL on internal TypeInfo types
Prevents them from triggering warnings when compiling with `-Wpadded'.
gcc/d/ChangeLog:
* typeinfo.cc (make_internal_typeinfo): Set TYPE_ARTIFICIAL.
gcc/testsuite/ChangeLog:
* gdc.dg/Wpadded.d: New test.
Richard Biener [Wed, 15 Jun 2022 09:27:31 +0000 (11:27 +0200)]
tree-optimization/105971 - less surprising refs_may_alias_p_2
When DSE asks whether __real a is using __imag a it gets a surprising
result when a is a FUNCTION_DECL. The following makes sure this case
is less surprising to callers but keeping the bail-out for the
non-decl case where it is true that PTA doesn't track aliases to code
correctly.
2022-06-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/105971
* tree-ssa-alias.cc (refs_may_alias_p_2): Put bail-out for
FUNCTION_DECL and LABEL_DECL refs after decl-decl disambiguation
to leak less surprising alias results.
* gcc.dg/torture/pr106971.c: New testcase.
Richard Biener [Wed, 15 Jun 2022 08:54:48 +0000 (10:54 +0200)]
tree-optimization/105969 - FPE with array diagnostics
For a [0][0] array we have to be careful when dividing by the element
size which is zero for the outermost dimension. Luckily the division
is only for an overflow check which is pointless for array size zero.
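A minimal sketch of the guard described above, assuming a check of the usual "index > max / size" shape. The function name and parameters are hypothetical and are not taken from gimple-ssa-sprintf.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical overflow check: would INDEX * ELT_SIZE exceed MAX?
// A zero element size (as for the outer dimension of a [0][0] array)
// cannot overflow, so return early instead of dividing by zero.
bool
index_times_size_overflows (std::uint64_t index, std::uint64_t elt_size,
                            std::uint64_t max)
{
  if (elt_size == 0)
    return false;
  return index > max / elt_size;
}
```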
2022-06-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/105969
* gimple-ssa-sprintf.cc (get_origin_and_offset_r): Avoid division
by zero in overflow check.
* gcc.dg/pr105969.c: New testcase.
Iain Buclaw [Tue, 14 Jun 2022 13:56:59 +0000 (15:56 +0200)]
d: Delay completing aggregate and enum types until after attributes have been applied.
Because of forward/recursive references, the TYPE_SIZE, TYPE_ALIGN, and
TYPE_MODE of structs and enums were set before laying out its members.
This adds a new macro TYPE_FORWARD_REFERENCES for storing those forward
references against the incomplete type, laying them out after the type
has been completed. Construction of the TYPE_DECL has also been moved
on earlier in the type generation pass, which will allow the possibility
of adding gdc-specific type attributes to the D front-end in the future.
gcc/d/ChangeLog:
* d-attribs.cc (apply_user_attributes): Set ATTR_FLAG_TYPE_IN_PLACE
only on incomplete types.
* d-codegen.cc (copy_aggregate_type): Set TYPE_STUB_DECL after copy.
* d-compiler.cc (Compiler::onParseModule): Adjust.
* d-tree.h (AGGREGATE_OR_ENUM_TYPE_CHECK): Define.
(TYPE_FORWARD_REFERENCES): Define.
* decl.cc (gcc_attribute_p): Update documentation.
(DeclVisitor::visit (StructDeclaration *)): Exit before building type
node if gcc.attributes symbol.
(DeclVisitor::visit (ClassDeclaration *)): Build type node and add
TYPE_NAME to current binding level before emitting anything else.
(DeclVisitor::visit (InterfaceDeclaration *)): Likewise.
(DeclVisitor::visit (EnumDeclaration *)): Likewise.
(build_type_decl): Move rest_of_decl_compilation() call to
finish_aggregate_type().
* types.cc (insert_aggregate_field): Move layout_decl() call to
finish_aggregate_type().
(insert_aggregate_bitfield): Likewise.
(layout_aggregate_members): Adjust.
(finish_incomplete_fields): New function.
(finish_aggregate_type): Handle forward referenced field types. Call
rest_of_type_compilation() after completing the aggregate.
(TypeVisitor::visit (TypeEnum *)): Don't set size and alignment until
after apply_user_attributes(). Call rest_of_type_compilation() after
completing the enumeral.
(TypeVisitor::visit (TypeStruct *)): Call build_type_decl() before
apply_user_attributes(). Don't set size, alignment, and mode until
after apply_user_attributes().
(TypeVisitor::visit (TypeClass *)): Call build_type_decl() before
apply_user_attributes().
Richard Sandiford [Wed, 15 Jun 2022 10:12:51 +0000 (11:12 +0100)]
aarch64: Revert bogus fix for PR105254
In f2ebf2d98efe0ac2314b58cf474f44cb8ebd5244 I'd forced the
chosen unroll factor to be a factor of the VF, in order to
work around an exact_div ICE in PR105254. This was completely
bogus -- clearly I didn't look in enough detail at why we ended
up with an unrolled VF that wasn't a multiple of the UF.
Kewen has since fixed the bug properly for PR105940, so this
patch reverts my earlier attempt. Sorry for the stupidity.
gcc/
PR tree-optimization/105254
PR tree-optimization/105940
Revert:
* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Take a
loop_vec_info as argument. Restrict the unroll factor to values
that divide the VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_14.c: New test.
Richard Sandiford [Wed, 15 Jun 2022 10:12:51 +0000 (11:12 +0100)]
gen: Allow unspec numbers in .md attributes
Tamar pointed out that:
(unspec:M ... <FOO>)
didn't work when a value of attribute FOO was defined by
define_constant, such as in:
(define_int_attribute FOO [(UNSPEC_A "UNSPEC_B") ...])
This is because symbolic constants are substituted during lexing
and only apply to bare symbol names, not strings.
One option would have been to extend this lexing substitution
to define_*_attribute values as well. However, that would replace
symbolic names with integer constants in the generated .cc code,
making it less readable.
This patch goes for the more localised approach of only
applying define_constants when we want their integer value.
I don't think any changes to the docs are needed. This isn't
adding a new feature, it's just making an existing one work in
the expected way.
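The localised approach can be sketched as follows. This is only an illustration under stated assumptions: find_int_sketch, sample_constants and the numeric values are hypothetical, and the real find_int in read-rtl.cc works on the generator's own data structures rather than a std::map.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for the table built from define_constants.
std::map<std::string, long>
sample_constants ()
{
  return {{"UNSPEC_A", 3}, {"UNSPEC_B", 7}};
}

// Substitute a symbolic constant, if TEXT names one, and only then
// convert to an integer.  The symbolic name itself is left untouched
// everywhere else, so generated code can keep printing it.
long
find_int_sketch (const std::map<std::string, long> &constants,
                 const std::string &text)
{
  auto it = constants.find (text);
  if (it != constants.end ())
    return it->second;
  return std::strtol (text.c_str (), nullptr, 0);
}
```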
gcc/
* read-rtl.cc (find_int): Substitute symbolic constants
before converting the string to an integer.
Jakub Jelinek [Wed, 15 Jun 2022 08:45:04 +0000 (10:45 +0200)]
openmp: Fix up get-mapped-ptr-1.{c,f90} tests
On Tue, Jun 14, 2022 at 06:41:37PM +0200, Thomas Schwinge wrote:
> In an offloading configuration, I'm seeing:
>
> PASS: libgomp.fortran/get-mapped-ptr-1.f90 -O (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/get-mapped-ptr-1.f90 -O execution test
>
> Does that one need similar treatment?
I assume not just that but libgomp.c-c++-common/get-mapped-ptr-1.c too?
Both need the same treatment, and in the get-mapped-ptr-1.c
case there is even UB: while the Fortran version was using c_loc (q)
as the host pointer, the C/C++ version was using q, which was the value
of an uninitialized pointer.
2022-06-15 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c (main): Initialize
q to the address of an automatic variable. Use -5 instead of -1 in
omp_get_mapped_ptr call. Add test with omp_initial_device.
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90 (main): Use -5 instead
of -1 in omp_get_mapped_ptr call. Add test with omp_initial_device.
Renumber stop arguments afterwards.
Roger Sayle [Wed, 15 Jun 2022 07:31:13 +0000 (09:31 +0200)]
Fold truncations of left shifts in match.pd
Whilst investigating PR 55278, I noticed that the tree-ssa optimizers
aren't eliminating the promotions of shifts to "int" as inserted by the
c-family front-ends, instead leaving this simplification to
the RTL optimizers. This patch allows match.pd to do this itself earlier,
narrowing (T)(X << C) to (T)X << C when the constant C is known to be
valid for the (narrower) type T.
Hence for this simple test case:
short foo(short x) { return x << 5; }
the .optimized dump currently looks like:
short int foo (short int x)
{
int _1;
int _2;
short int _4;
<bb 2> [local count: 1073741824]:
_1 = (int) x_3(D);
_2 = _1 << 5;
_4 = (short int) _2;
return _4;
}
but with this patch, now becomes:
short int foo (short int x)
{
short int _2;
<bb 2> [local count: 1073741824]:
_2 = x_1(D) << 5;
return _2;
}
This is always reasonable as RTL expansion knows how to use
widening optabs if it makes sense at the RTL level to perform
this shift in a wider mode.
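The identity behind the narrowing can be checked in isolation. The sketch below uses unsigned types so that the shifts are well defined in C++, and models T as a 16-bit type; the function names are hypothetical. It shows that the bits discarded by truncating first never influence the low 16 bits of the shifted result when the shift count is below 16.

```cpp
#include <cassert>
#include <cstdint>

// (T)(X << C): shift in the wide type, then truncate to 16 bits.
std::uint16_t
shift_then_truncate (std::uint32_t x, unsigned c)
{
  return static_cast<std::uint16_t> (x << c);
}

// (T)X << C: truncate first, then shift and keep the low 16 bits.
std::uint16_t
truncate_then_shift (std::uint32_t x, unsigned c)
{
  std::uint16_t narrow = static_cast<std::uint16_t> (x);
  return static_cast<std::uint16_t> (static_cast<std::uint32_t> (narrow) << c);
}

// Compare both forms for every shift count valid in the narrow type.
bool
forms_agree (std::uint32_t x)
{
  for (unsigned c = 0; c < 16; ++c)
    if (shift_then_truncate (x, c) != truncate_then_shift (x, c))
      return false;
  return true;
}
```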
Of course, there's often a catch. The above simplification not only
reduces the number of statements in gimple, but also allows further
optimizations, for example including the perception of rotate idioms
and bswap16. Alas, optimizing things earlier than anticipated
requires several testsuite changes [though all these tests have
been confirmed to generate identical assembly code on x86_64].
The only significant change is that the vectorization pass wouldn't
previously lower rotations of signed integer types. Hence this
patch includes a refinement to tree-vect-patterns to allow signed
types, by using the equivalent unsigned shifts.
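As a scalar analogy for that lowering (the vectorizer itself works on vector shifts, which this sketch does not model), a rotate of a signed value can be done entirely in the corresponding unsigned type. The name rotl_signed is hypothetical, and the final conversion back to int32_t assumes the usual two's complement wrap-around, which GCC provides and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a signed 32-bit value left by N bits using unsigned shifts,
// avoiding the undefined or sign-propagating behaviour of signed shifts.
std::int32_t
rotl_signed (std::int32_t x, unsigned n)
{
  std::uint32_t u = static_cast<std::uint32_t> (x);
  n &= 31;
  // The "& 31" on the right shift keeps the count in range when n == 0.
  std::uint32_t r = (u << n) | (u >> ((32 - n) & 31));
  return static_cast<std::int32_t> (r);
}
```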
2022-06-15 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
left shifts by a constant when the result is truncated, and the
shift constant is well-defined.
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Add
support for rotations of signed integer types, by lowering
using unsigned vector shifts.
gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-4.c: New test case.
* gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
* gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
liuhongt [Tue, 14 Jun 2022 08:27:04 +0000 (16:27 +0800)]
Fix ICE in extract_insn, at recog.cc:2791
(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
> [(set (match_operand:<ssebytemode> 0 "register_operand" "=Yr,*x,x")
> (unspec:<ssebytemode>
> [(match_operand:<ssebytemode> 1 "register_operand" "0,0,x")
> (match_operand:<ssebytemode> 2 "vector_operand" "YrBm,*xBm,xm")
> (subreg:<ssebytemode>
> (lt:VI48_AVX
> (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
> (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp<mode>3_ltint_not"
> [(set (match_operand:VI48_AVX 0 "register_operand")
> (vec_merge:VI48_AVX
> (match_operand:VI48_AVX 1 "vector_operand")
> (match_operand:VI48_AVX 2 "vector_operand")
> (unspec:<avx512fmaskmode>
> [(subreg:VI48_AVX
> (not:<ssebytemode>
> (match_operand:<ssebytemode> 3 "vector_operand")) 0)
> (match_operand:VI48_AVX 4 "const0_operand")
> (match_operand:SI 5 "const_0_to_7_operand")]
> UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
> if (!MEM_P (operands[3]))
> operands[3] = force_reg (<ssebytemode>mode, operands[3]);
> operands[3] = lowpart_subreg (<MODE>mode, operands[3], <ssebytemode>mode);
>
> Which won't fly when operand 3 is memory operand...
>
gcc/ChangeLog:
PR target/105953
* config/i386/sse.md (*avx_cmp<mode>3_ltint_not): Force_reg
operands[3].
gcc/testsuite/ChangeLog:
* g++.target/i386/pr105953.C: New test.
GCC Administrator [Wed, 15 Jun 2022 00:16:24 +0000 (00:16 +0000)]
Daily bump.
Ian Lance Taylor [Tue, 14 Jun 2022 04:32:25 +0000 (21:32 -0700)]
syscall: gofmt
Add blank lines after //sys comments where needed, and then run gofmt
on the syscall package with the new formatter.
This is the libgo version of CL 407136.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/412074
Jonathan Wakely [Tue, 14 Jun 2022 15:19:32 +0000 (16:19 +0100)]
libstdc++: Check lengths first in operator== for basic_string [PR62187]
As confirmed by LWG 2852, the calls to traits_type::compare do not need
to be observable, so we can make operator== compare string lengths
first and return immediately for non-equal lengths. This avoids doing a
slow string comparison for "abc...xyz" == "abc...xy". Previously we only
did this optimization for std::char_traits<char>, but we can enable it
unconditionally thanks to LWG 2852.
For comparisons with a const char* we can call traits_type::length right
away to do the same optimization. That strlen call can be folded away
for constant arguments, making it very efficient.
For the pre-C++20 operator== and operator!= overloads we can swap the
order of the arguments to take advantage of the operator== improvements.
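The shape of the optimization can be sketched outside the library. These free functions are hypothetical illustrations written against std::string; they are not the libstdc++ implementation, which is a template over the character and traits types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compare lengths first, and only compare contents when they match.
bool
equal_lengths_first (const std::string &lhs, const std::string &rhs)
{
  using traits = std::string::traits_type;
  return lhs.size () == rhs.size ()
         && traits::compare (lhs.data (), rhs.data (), lhs.size ()) == 0;
}

// For a C string, traits::length supplies the length up front; for a
// constant argument the compiler may be able to fold that call away.
bool
equal_to_cstr (const std::string &lhs, const char *rhs)
{
  using traits = std::string::traits_type;
  const std::size_t rhs_len = traits::length (rhs);
  return lhs.size () == rhs_len
         && traits::compare (lhs.data (), rhs, rhs_len) == 0;
}
```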
libstdc++-v3/ChangeLog:
PR libstdc++/62187
* include/bits/basic_string.h (operator==): Always compare
lengths before checking string contents.
[!__cpp_lib_three_way_comparison] (operator==, operator!=):
Reorder arguments.
Jonathan Wakely [Tue, 14 Jun 2022 13:54:27 +0000 (14:54 +0100)]
libstdc++: Inline all basic_string::compare overloads [PR59048]
Defining the compare member functions inline allows calls to
traits_type::length and std::min to be inlined, taking advantage of
constant expression arguments. When not inline, the compiler prefers to
use the explicit instantiation definitions in libstdc++.so and can't
take advantage of constant arguments.
libstdc++-v3/ChangeLog:
PR libstdc++/59048
* include/bits/basic_string.h (compare): Define inline.
* include/bits/basic_string.tcc (compare): Remove out-of-line
definitions.
* include/bits/cow_string.h (compare): Define inline.
* testsuite/21_strings/basic_string/operations/compare/char/3.cc:
New test.
Jonathan Wakely [Tue, 14 Jun 2022 13:50:49 +0000 (14:50 +0100)]
libstdc++: Fix indentation in allocator base classes
libstdc++-v3/ChangeLog:
* include/bits/new_allocator.h: Fix indentation.
* include/ext/malloc_allocator.h: Likewise.
Jonathan Wakely [Tue, 14 Jun 2022 13:37:25 +0000 (14:37 +0100)]
libstdc++: Check for size overflow in constexpr allocation [PR105957]
libstdc++-v3/ChangeLog:
PR libstdc++/105957
* include/bits/allocator.h (allocator::allocate): Check for
overflow in constexpr allocation.
* testsuite/20_util/allocator/105975.cc: New test.
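The kind of check involved can be sketched as below. This is an assumption about the general technique (test the element count against the maximum before multiplying by the element size), not a copy of what allocator::allocate does; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Would N objects of type T need more bytes than std::size_t can hold?
// Dividing the limit avoids performing the overflowing multiplication.
template<typename T>
constexpr bool
allocation_size_overflows (std::size_t n)
{
  return n > std::numeric_limits<std::size_t>::max () / sizeof (T);
}
```

Because the function is constexpr, the same test can run during constant evaluation, where a silently wrapped size would otherwise go unnoticed.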
Surya Kumari Jangala [Fri, 10 Jun 2022 14:22:57 +0000 (19:52 +0530)]
regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]
In check_new_reg_p, the nregs of a du chain is computed by obtaining the
MODE of the first element in the chain, and then calling
hard_regno_nregs() with the MODE. But the first element of the chain can
be a DEBUG_INSN whose mode need not be the same as the rest of the
elements in the du chain. This was resulting in fcompare-debug failure
as check_new_reg_p was returning a different result with -g for the same
candidate register. We can instead obtain nregs from the du chain
itself.
2022-06-10 Surya Kumari Jangala <jskumari@linux.ibm.com>
gcc/
PR rtl-optimization/105041
* regrename.cc (check_new_reg_p): Use nregs value from du chain.
gcc/testsuite/
PR rtl-optimization/105041
* gcc.target/powerpc/pr105041.c: New test.
Segher Boessenkool [Thu, 9 Jun 2022 19:45:03 +0000 (19:45 +0000)]
rs6000: Delete VS_scalar
It is just the same as VEC_base, which is a more generic name.
2022-06-14 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/vsx.md (VS_scalar): Delete.
(rest of file): Adjust.
Nathan Sidwell [Thu, 9 Jun 2022 18:18:19 +0000 (11:18 -0700)]
c++: Elide calls to NOP module initializers
gcc/cp
* cp-tree.h (fini_modules): Add has_inits parm.
* decl2.cc (c_parse_final_cleanups): Check for
inits, adjust fini_modules flags.
* module.cc (module_state): Rename call_init_p to
active_init_p.
(module_state::write_config): Write active_init.
(module_state::read_config): Read it.
(module_determine_import_inits): Clear active_init_p
of covered inits.
(late_finish_module): Add has_init parm. Record it.
(fini_modules): Adjust.
gcc/testsuite/
* g++.dg/modules/init-2_a.C: Adjust.
* g++.dg/modules/init-2_c.C: Adjust.
* g++.dg/modules/init-2_d.C: New.
Jan Hubicka [Tue, 14 Jun 2022 12:05:53 +0000 (14:05 +0200)]
Fix ipa-cp wrt volatile loads
Check for volatile flag to ipa_load_from_parm_agg.
gcc/ChangeLog:
2022-06-10 Jan Hubicka <hubicka@ucw.cz>
PR ipa/105739
* ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.
gcc/testsuite/ChangeLog:
2022-06-10 Jan Hubicka <hubicka@ucw.cz>
* gcc.dg/ipa/pr105739.c: New test.
Philipp Tomsich [Wed, 11 May 2022 10:12:57 +0000 (12:12 +0200)]
RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w
When encountering a prescaled (biased) value as a candidate for
sh[123]add.uw, the combine pass will present this as shifted by the
aggregate amount (prescale + shift-amount) with an appropriately
adjusted mask constant that has fewer than 32 bits set.
E.g., here's the failing expression seen in combine for a prescale of
1 and a shift of 2 (note how 0x3fffffff8 >> 3 is 0x7fffffff).
Trying 7, 8 -> 10:
7: r78:SI=r81:DI#0<<0x1
REG_DEAD r81:DI
8: r79:DI=zero_extend(r78:SI)
REG_DEAD r78:SI
10: r80:DI=r79:DI<<0x2+r82:DI
REG_DEAD r79:DI
REG_DEAD r82:DI
Failed to match this instruction:
(set (reg:DI 80 [ cD.1491 ])
(plus:DI (and:DI (ashift:DI (reg:DI 81)
(const_int 3 [0x3]))
(const_int 17179869176 [0x3fffffff8]))
(reg:DI 82)))
To address this, we introduce a splitter handling these cases.
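The arithmetic behind the split can be checked with a small model. The sketch assumes the sh2add.uw semantics of zero-extending the low 32 bits of the first source, shifting left by 2 and adding the second source; the function names are hypothetical and the code models values only, not RTL.

```cpp
#include <cassert>
#include <cstdint>

// The expression combine produces: shift by the aggregate amount (3)
// and mask with 0x3fffffff8, then add the base.
std::uint64_t
combined_form (std::uint64_t x, std::uint64_t base)
{
  return ((x << 3) & 0x3fffffff8ULL) + base;
}

// Model of sh2add.uw rd, rs1, rs2 (assumed semantics, see lead-in).
std::uint64_t
sh2add_uw (std::uint64_t rs1, std::uint64_t rs2)
{
  return (static_cast<std::uint64_t> (static_cast<std::uint32_t> (rs1)) << 2)
         + rs2;
}

// The split form: slli by the prescale (1), then sh2add.uw.
std::uint64_t
split_form (std::uint64_t x, std::uint64_t base)
{
  return sh2add_uw (x << 1, base);
}
```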
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-developed-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
gcc/ChangeLog:
* config/riscv/bitmanip.md: Add split to handle opportunities
for slli + sh[123]add.uw
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zba-shadd.c: New test.
Philipp Tomsich [Tue, 24 May 2022 13:03:47 +0000 (15:03 +0200)]
RISC-V: add consecutive_bits_operand predicate
Provide an easy way to constrain for constants that are a single,
consecutive run of ones.
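One way to test for such a constant is sketched below. The function name is hypothetical, and treating zero as not matching is an assumption of this sketch rather than something stated for the predicate.

```cpp
#include <cassert>
#include <cstdint>

// True if V consists of exactly one contiguous run of one bits.
bool
is_single_run_of_ones (std::uint64_t v)
{
  if (v == 0)
    return false;
  // Fill the trailing zeros with ones; a single run then becomes a
  // value of the form 2^k - 1, or all ones.
  std::uint64_t filled = v | (v - 1);
  // Such a value shares no bits with its successor.
  return (filled & (filled + 1)) == 0;
}
```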
gcc/ChangeLog:
* config/riscv/predicates.md (consecutive_bits_operand):
Implement new predicate.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Richard Biener [Tue, 14 Jun 2022 09:10:13 +0000 (11:10 +0200)]
tree-optimization/105946 - avoid accessing excess args from uninit diag
uninit diagnostics uses passing via reference and access attributes
but that iterates over function type arguments which can in some
cases apparently outrun the actual arguments, leading to ICEs.
The following simply ignores not present arguments.
2022-06-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/105946
* tree-ssa-uninit.cc (maybe_warn_pass_by_reference):
Do not look at arguments not specified in the function call.
Richard Biener [Tue, 14 Jun 2022 08:59:49 +0000 (10:59 +0200)]
middle-end/105965 - add missing v_c_e <{ el }> simplification
When we got the simplification of bit-field-ref to view-convert
we lost the ability to detect FMAs since we cannot look through
_1 = {_10};
_11 = VIEW_CONVERT_EXPR<float>(_1);
the following amends the (view_convert CONSTRUCTOR) pattern
to handle this case.
2022-06-14 Richard Biener <rguenther@suse.de>
PR middle-end/105965
* match.pd (view_convert CONSTRUCTOR): Handle single-element
CTOR case.
* gcc.target/i386/pr105965.c: New testcase.
Eric Botcazou [Tue, 14 Jun 2022 10:28:24 +0000 (12:28 +0200)]
Restore bootstrap on ARM
The -Wuse-after-free warning is explicitly disabled for destructors on ARM
because of the special ABI and the previous change to the warning machinery
uncovered another case where the warning data would be incorrectly erased.
gcc/
* warning-control.cc (copy_warning) [generic version]: Do not erase
the warning data of the destination location when the no-warning
bit is not set on the source.
(copy_warning) [tree version]: Return early if TO is equal to FROM.
(copy_warning) [gimple version]: Likewise.
gcc/testsuite/
* g++.dg/warn/Wuse-after-free5.C: New test.
Kewen Lin [Tue, 14 Jun 2022 05:57:01 +0000 (00:57 -0500)]
vect: Move suggested_unroll_factor applying [PR105940]
As PR105940 shows, when the rs6000 port tries to set
m_suggested_unroll_factor to 4 or so, there will be an ICE on:
exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
loop_vinfo->suggested_unroll_factor);
In function vect_analyze_loop_2, the current place of
suggested_unroll_factor applying can't guarantee it's
applied for all cases. As the case shows, vectorizer
could retry with SLP forced off, the vf is reset by
saved_vectorization_factor which isn't applied with
suggested_unroll_factor before. It means it can end
up with one vf which neglects suggested_unroll_factor.
I think this is contrary to the design; we should move the applying
of suggested_unroll_factor after start_over.
PR tree-optimization/105940
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
applying suggested_unroll_factor after start_over.
Takayuki 'January June' Suwa [Mon, 13 Jun 2022 16:28:43 +0000 (01:28 +0900)]
xtensa: Optimize bitwise AND operation with some specific forms of constants
This patch offers several insn-and-split patterns for bitwise AND with
register and constant that can be represented as:
i. 1's least significant N bits and the others 0's (17 <= N <= 31)
ii. 1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
"MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)
And also offers shortcuts for conditional branch if each of the abovementioned
operations is (not) equal to zero.
gcc/ChangeLog:
* config/xtensa/predicates.md (shifted_mask_operand):
New predicate.
* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
New insn-and-split pattern.
(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
*masktrue_const_shifted_mask): Ditto.
Takayuki 'January June' Suwa [Thu, 27 May 2021 10:04:12 +0000 (19:04 +0900)]
xtensa: Make use of BALL/BNALL instructions
In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation, but a few similar fused instructions exist:
"BALL Ax, Ay, label" // if ((~Ax & Ay) == 0) goto label;
"BNALL Ax, Ay, label" // if ((~Ax & Ay) != 0) goto label;
These instructions have never been emitted before, but there seems to be
no reason not to make use of them.
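The branch conditions quoted above can be written as plain predicates, which makes it easy to see what source-level tests could map onto them. The function names are hypothetical; the conditions are copied from the comments in this message.

```cpp
#include <cassert>
#include <cstdint>

// BALL Ax, Ay, label: taken when every bit set in Ay is also set in Ax.
bool
ball_taken (std::uint32_t ax, std::uint32_t ay)
{
  return (~ax & ay) == 0;
}

// BNALL Ax, Ay, label: taken when some bit set in Ay is clear in Ax.
bool
bnall_taken (std::uint32_t ax, std::uint32_t ay)
{
  return (~ax & ay) != 0;
}
```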
gcc/ChangeLog:
* config/xtensa/xtensa.md (*masktrue_bitcmpl): New insn pattern.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/BALL-BNALL.c: New.
Takayuki 'January June' Suwa [Mon, 31 Jan 2022 00:56:21 +0000 (09:56 +0900)]
xtensa: Simplify conditional branch/move insn patterns
No need to describe the "false side" conditional insn patterns anymore.
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h (xtensa_emit_branch):
Remove the first argument.
(xtensa_emit_bit_branch): Remove it because now called only from the
output statement of *bittrue insn pattern.
* config/xtensa/xtensa.cc (gen_int_relational): Remove the last
argument 'p_invert', and make so that the condition is reversed by
itself as needed.
(xtensa_expand_conditional_branch): Share the common path, and remove
condition inversion code.
(xtensa_emit_branch, xtensa_emit_movcc): Simplify by removing the
"false side" pattern.
(xtensa_emit_bit_branch): Remove it because of the abovementioned
reason, and move the function body to *bittrue insn pattern.
* config/xtensa/xtensa.md (*bittrue): Transplant the output
statement from removed xtensa_emit_bit_branch().
(*bfalse, *ubfalse, *bitfalse, *maskfalse): Remove the "false side"
insn patterns.
Takayuki 'January June' Suwa [Mon, 13 Jun 2022 16:38:31 +0000 (01:38 +0900)]
xtensa: Improve shift operations more
This patch introduces funnel shifter utilization, and rearranges existing
"per-byte shift" insn patterns.
gcc/ChangeLog:
* config/xtensa/predicates.md (logical_shift_operator,
xtensa_shift_per_byte_operator): New predicates.
* config/xtensa/xtensa-protos.h (xtensa_shlrd_which_direction):
New prototype.
* config/xtensa/xtensa.cc (xtensa_shlrd_which_direction):
New helper function for funnel shift patterns.
* config/xtensa/xtensa.md (ior_op): New code iterator.
(*ashlsi3_1): Replace with new split pattern.
(*shift_per_byte): Unify *ashlsi3_3x, *ashrsi3_3x and *lshrsi3_3x.
(*shift_per_byte_omit_AND_0, *shift_per_byte_omit_AND_1):
New insn-and-split patterns that redirect to *xtensa_shift_per_byte,
in order to omit unnecessary bitwise AND operation.
(*shlrd_reg_<code>, *shlrd_const_<code>, *shlrd_per_byte_<code>,
*shlrd_per_byte_<code>_omit_AND):
New insn patterns for funnel shifts.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/funnel_shifter.c: New.
GCC Administrator [Tue, 14 Jun 2022 00:16:39 +0000 (00:16 +0000)]
Daily bump.
Iain Buclaw [Mon, 13 Jun 2022 22:05:35 +0000 (00:05 +0200)]
libphobos: Check in missing core.sync package module
This was meant to be part of r13-1062 in the merge with upstream
druntime 454471d8.
Jason Merrill [Fri, 10 Jun 2022 19:26:36 +0000 (15:26 -0400)]
ubsan: -Wreturn-type and ubsan trap-on-error
I noticed that -fsanitize=undefined -fsanitize-undefined-trap-on-error was
omitting the usual -Wreturn-type warning for control flowing off the end of
a function. This was because the warning code was looking for calls either
to __builtin_unreachable or the UBSan function, but these flags produce a
call to __builtin_trap instead.
gcc/c-family/ChangeLog:
* c-ubsan.cc (ubsan_instrument_return): Use BUILTINS_LOCATION.
gcc/ChangeLog:
* tree-cfg.cc (pass_warn_function_return::execute): Also check
BUILT_IN_TRAP.
gcc/testsuite/ChangeLog:
* g++.dg/ubsan/return-8.C: New test.
Maciej W. Rozycki [Mon, 13 Jun 2022 21:29:45 +0000 (22:29 +0100)]
RISC-V: Reset the length to the default of 4 for FP comparisons
The default length for floating-point compare operations is overridden
to 8, however the FEQ.fmt, FLT.fmt, FLE.fmt machine instructions and
FGE.fmt, FGT.fmt assembly idioms the relevant RTL insns produce are all
4 bytes long each. And all the floating-point compare RTL insns that
produce multiple machine instructions explicitly set their lengths.
Remove the override then, letting the default of 4 apply for the single
instruction case.
gcc/
* config/riscv/riscv.md (length): Remove the explicit setting
for "fcmp".
H.J. Lu [Fri, 10 Jun 2022 18:22:00 +0000 (11:22 -0700)]
x86: Require AVX for F16C and VAES
Since F16C and VAES are only usable with AVX, require AVX for F16C and
VAES.
libgcc/105920
* common/config/i386/cpuinfo.h (get_available_features): Require
AVX for F16C and VAES.
Mark Mentovai [Mon, 13 Jun 2022 15:40:19 +0000 (16:40 +0100)]
libstdc++: Rename __null_terminated to avoid collision with Apple SDK
The macOS 13 SDK (and equivalent-version iOS and other Apple OS SDKs)
contain this definition in <sys/cdefs.h>:
#define __null_terminated
This collides with the use of __null_terminated in libstdc++'s
experimental fs_path.h.
As libstdc++'s use of this token is entirely internal to fs_path.h, the
simplest workaround, renaming it, is most appropriate. Here, it's
renamed to __nul_terminated, referencing the NUL ('\0') value that is
used to terminate the strings in the context in which this tag structure
is used.
libstdc++-v3/ChangeLog:
* include/experimental/bits/fs_path.h (__detail::__null_terminated):
Rename to __nul_terminated to avoid colliding with a macro in
Apple's SDK.
Signed-off-by: Mark Mentovai <mark@mentovai.com>
Jonathan Wakely [Mon, 13 Jun 2022 15:36:14 +0000 (16:36 +0100)]
libstdc++: Use type_identity_t for non-deducible std::atomic_xxx args
This is LWG 3220 which is about to become Tentatively Ready.
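The effect of a non-deduced parameter can be shown with a self-contained sketch. The identity template, the cell type and the store function are hypothetical stand-ins; they are not std::atomic or the libstdc++ __type_identity_t, and only illustrate why the value argument should not take part in deduction.

```cpp
#include <cassert>

// Minimal type-identity wrapper: T inside identity_t<T> is a
// non-deduced context, so it never takes part in template deduction.
template<typename T>
struct identity { using type = T; };

template<typename T>
using identity_t = typename identity<T>::type;

template<typename T>
struct cell { T value; };

// T is deduced from the cell only; DESIRED is converted to T.
template<typename T>
void
store (cell<T> &c, identity_t<T> desired)
{
  c.value = desired;
}

// With a plain "T desired" parameter this call would fail to compile,
// because T would be deduced as long from the cell and int from 42.
long
store_int_literal_into_long_cell ()
{
  cell<long> c{0};
  store (c, 42);
  return c.value;
}
```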
libstdc++-v3/ChangeLog:
* include/std/atomic (__atomic_val_t): Use __type_identity_t
instead of atomic<T>::value_type, as per LWG 3220.
* testsuite/29_atomics/atomic/lwg3220.cc: New test.
Uros Bizjak [Mon, 13 Jun 2022 15:08:18 +0000 (17:08 +0200)]
i386: Return true for (SUBREG (MEM....)) in register_no_elim_operand [PR105927]
Under certain conditions register_operand predicate also allows
subregs of memory operands. When RTL checking is enabled, these
will fail with REGNO (op).
Allow subregs of memory operands, these are guaranteed
to be reloaded to a register.
2022-06-13 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105927
* config/i386/predicates.md (register_no_elim_operand):
Return true for subreg of a memory operand.
gcc/testsuite/ChangeLog:
PR target/105927
* gcc.target/i386/pr105927.c: New test.
Iain Buclaw [Sat, 11 Jun 2022 10:40:00 +0000 (12:40 +0200)]
d: Match function declarations of gcc built-ins from any module.
Declarations of recognised gcc built-in functions are now matched from
any module. Previously, only the `core.stdc' package was scanned.
In addition to matching of the symbol, any user-applied `@attributes' or
`pragma(mangle)' name will be applied to the built-in decl as well.
Because there would now be no control over where built-in declarations
are coming from, the warning option `-Wbuiltin-declaration-mismatch' has
been implemented in the D front-end too.
gcc/d/ChangeLog:
* d-builtins.cc: Include builtins.h.
(gcc_builtins_libfuncs): Remove.
(strip_type_modifiers): New function.
(matches_builtin_type): New function.
(covariant_with_builtin_type_p): New function.
(maybe_set_builtin_1): Set front-end built-in if identifier matches
gcc built-in name. Apply user-specified attributes and assembler name
overrides to the built-in. Warn about built-in declaration mismatches.
(d_builtin_function): Set IDENTIFIER_DECL_TREE of built-in functions.
* d-compiler.cc (Compiler::onParseModule): Scan all modules for any
identifiers that match built-in function names.
* lang.opt (Wbuiltin-declaration-mismatch): New option.
gcc/testsuite/ChangeLog:
* gdc.dg/Wbuiltin_declaration_mismatch.d: New test.
* gdc.dg/builtins.d: New test.
Richard Sandiford [Mon, 13 Jun 2022 14:24:34 +0000 (15:24 +0100)]
Add a general mapping from internal fns to target insns
Several existing internal functions map directly to an instruction
defined in target-insns.def. This patch makes it easier to define
more such functions in future.
This should help to reduce cut-&-paste, but more importantly, it allows
the difference between optab functions and target-insns.def functions
to be abstracted away; both are now treated as “directly-mapped”.
gcc/
* internal-fn.def (DEF_INTERNAL_INSN_FN): New macro.
(GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE)
(GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY)
(GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it.
* internal-fn.h (direct_internal_fn_info::directly_mapped): New
member variable.
(direct_internal_fn_info::vectorizable): Reduce to 1 bit.
(direct_internal_fn_p): Also return true for internal functions
that map directly to instructions defined target-insns.def.
(direct_internal_fn): Adjust comment accordingly.
* internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1)
(vectorizable_optab2): New local macros.
(not_direct): Initialize directly_mapped.
(mask_load_direct, load_lanes_direct, mask_load_lanes_direct)
(gather_load_direct, len_load_direct, mask_store_direct)
(store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct)
(vec_cond_direct, scatter_store_direct, len_store_direct)
(vec_set_direct, unary_direct, binary_direct, ternary_direct)
(cond_unary_direct, cond_binary_direct, cond_ternary_direct)
(while_direct, fold_extract_direct, fold_left_direct)
(mask_fold_left_direct, check_ptrs_direct): Use the macros above.
(expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete.
(expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise.
(direct_internal_fn_types): Handle functions that map to instructions
defined in target-insns.def.
(direct_internal_fn_types): Likewise.
(direct_internal_fn_supported_p): Likewise.
(internal_fn_expanders): Likewise.
Richard Sandiford [Mon, 13 Jun 2022 14:24:34 +0000 (15:24 +0100)]
Factor out common internal-fn idiom
internal-fn.c has quite a few functions that simply map the result
of the call to an instruction's output operand (if any) and map
each argument to an instruction's input operand, in order.
This patch adds a single function for doing that. It's really
just a generalisation of expand_direct_optab_fn, but with the
output operand being optional.
Unfortunately, it isn't possible to do this for vcond_mask
because the internal function has a different argument order
from the optab.
gcc/
* internal-fn.cc (expand_fn_using_insn): New function,
split out and adapted from...
(expand_direct_optab_fn): ...here.
(expand_GOMP_SIMT_ENTER_ALLOC): Use it.
(expand_GOMP_SIMT_EXIT): Likewise.
(expand_GOMP_SIMT_LANE): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
Iain Buclaw [Mon, 13 Jun 2022 12:35:38 +0000 (14:35 +0200)]
d: Improve TypeInfo errors when compiling in -fno-rtti mode
The existing TypeInfo errors can be cryptic. This alters the diagnostic
to include which expression is requiring `object.TypeInfo'.
gcc/d/ChangeLog:
* d-tree.h (check_typeinfo_type): Add Expression* parameter.
(build_typeinfo): Likewise. Declare new override.
* expr.cc (ExprVisitor): Call build_typeinfo with Expression*.
* typeinfo.cc (check_typeinfo_type): Include expression in the
diagnostic message.
(build_typeinfo): New override.
gcc/testsuite/ChangeLog:
* gdc.dg/rtti1.d: New test.
Jakub Jelinek [Mon, 13 Jun 2022 11:42:59 +0000 (13:42 +0200)]
openmp: Conforming device numbers and omp_{initial,invalid}_device
OpenMP 5.2 changed once more what device numbers are allowed.
In 5.1, valid device numbers were [0, omp_get_num_devices()].
5.2 also makes -1 valid (calling it omp_initial_device), which is equivalent
in behavior to the omp_get_num_devices() number but has the advantage that it
is a constant. It also introduces omp_invalid_device, which is
also a constant, with an implementation-defined value < -1. That value should
act like sNaN: any time a device construct (GOMP_target*) or OpenMP runtime
API routine is asked for such a device, the program is terminated.
And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which
is all but [-1, omp_get_num_devices()] other than omp_invalid_device)
must be treated like omp_invalid_device.
For device constructs, we have a compatibility problem, we've historically
used 2 magic negative values to mean something special.
GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the
omp_get_default_device () number
GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for
#pragma omp target if (cond)
where if cond is false, we pass -2
But 5.2 requires that omp_initial_device is -1 (there were discussions
about it; the advantage of -1 is that one can, say, iterate over the
[-1, omp_get_num_devices()-1] range to get all devices starting with
the host/initial one).
And also, if user passes -2, unless it is omp_invalid_device, we need to
treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory.
So, the patch does on the compiler side some number remapping,
user_device_num >= -2U ? user_device_num - 1 : user_device_num.
This remapping is done at compile time if device clause has constant
argument, otherwise at runtime, and means that for user -1 (omp_initial_device)
we pass -2 to GOMP_* in the runtime library where it treats it like host
fallback, while -2 is remapped to -3 (one of the non-conforming device numbers,
for those it doesn't matter which one is which).
omp_invalid_device is then -4.
For the OpenMP device runtime APIs, no remapping is done.
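The remapping described above can be sketched in a few lines of C++. This is only an illustration of the quoted expression, not the code in omp-expand.cc; the function name remap_user_device_num is hypothetical, and the comparison is assumed to be carried out on the unsigned value, as the -2U literal suggests.

```cpp
#include <cassert>

// Hypothetical helper mirroring the expression quoted in the message:
//   user_device_num >= -2U ? user_device_num - 1 : user_device_num
// The comparison is unsigned, so only the user values -1 and -2 match.
constexpr int remap_user_device_num(int user_device_num)
{
  return static_cast<unsigned>(user_device_num) >= -2U
           ? user_device_num - 1
           : user_device_num;
}

// Small self-check of the cases named in the message.
inline void check_remap_examples()
{
  assert(remap_user_device_num(-1) == -2);  // omp_initial_device -> host fallback
  assert(remap_user_device_num(-2) == -3);  // -2 -> a non-conforming number
  assert(remap_user_device_num(-4) == -4);  // omp_invalid_device is unchanged
  assert(remap_user_device_num(3) == 3);    // ordinary device numbers are unchanged
}
```

Note that both -2 and -3 end up as -3, which matches the remark that it does not matter which non-conforming number is which.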
This patch doesn't deal with the initial default-device-var for
OMP_TARGET_OFFLOAD=mandatory; the spec says that the initial ICV value
for that should in that case depend on whether there are any offloading
devices or not (if not, should be omp_invalid_device), but that means
we can't determine the number of devices lazily (and let libraries have the
possibility to register their offloading data etc.).
2022-06-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-expand.cc (expand_omp_target): Remap user provided
device clause arguments, -1 to -2 and -2 to -3, either
at compile time if constant, or at runtime.
include/
* gomp-constants.h (GOMP_DEVICE_INVALID): Define.
libgomp/
* omp.h.in (omp_initial_device, omp_invalid_device): New enumerators.
* omp_lib.f90.in (omp_initial_device, omp_invalid_device): New
parameters.
* omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise.
* target.c (resolve_device): Add remapped argument, handle
GOMP_DEVICE_ICV only if remapped is true (and clear remapped),
for negative values, treat GOMP_DEVICE_FALLBACK as fallback only
if remapped, otherwise treat omp_initial_device that way. For
omp_invalid_device, always emit gomp_fatal, even when
OMP_TARGET_OFFLOAD isn't mandatory.
(GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext,
GOMP_target_update, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Pass true as remapped argument to
resolve_device.
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy_check, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_get_mapped_ptr,
omp_target_is_accessible): Pass false as remapped argument to
resolve_device. Treat omp_initial_device the same as
gomp_get_num_devices (). Don't bypass resolve_device calls if
device_num is negative.
(omp_pause_resource): Treat omp_initial_device the same as
gomp_get_num_devices (). Call resolve_device.
* icv-device.c (omp_set_default_device): Always set to device_num
even when it is negative.
* libgomp.texi: Document that Conforming device numbers,
omp_initial_device and omp_invalid_device are implemented.
* testsuite/libgomp.c/target-41.c (main): Add test with
omp_initial_device.
* testsuite/libgomp.c/target-45.c: New test.
* testsuite/libgomp.c/target-46.c: New test.
* testsuite/libgomp.c/target-47.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
test with omp_initial_device. Use -5 instead of -1 for negative value
test.
* testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
Likewise. Reorder stop numbers.
Eric Botcazou [Mon, 13 Jun 2022 11:32:53 +0000 (13:32 +0200)]
Introduce -finstrument-functions-once
The goal is to make it possible to use it in (large) production binaries
to do function-level coverage, so the overhead must be minimal and, in
particular, there is no protection against data races so the "once"
moniker is imprecise.
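To illustrate why "once" is imprecise, here is a rough C++ model of a call-once guard built on a plain flag. It is an assumption about the general shape of such instrumentation, not the code GCC emits; instrumented_function, on_first_entry and entry_hook_calls are hypothetical names.

```cpp
#include <cassert>

// Hypothetical counter standing in for a profiling hook.
static int entry_hook_calls = 0;
static void on_first_entry() { ++entry_hook_calls; }

void instrumented_function()
{
  // Plain, non-atomic flag: cheap to test, but two threads entering at
  // the same time could both read false and both call the hook.
  static bool already_reported = false;
  if (!already_reported)
    {
      already_reported = true;
      on_first_entry();
    }
  // ... original function body would follow here ...
}
```

In a single thread the hook runs exactly once however often the function is called; the lack of synchronization is what keeps the overhead low.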
gcc/
* common.opt (finstrument-functions): Set explicit value.
(-finstrument-functions-once): New option.
* doc/invoke.texi (Program Instrumentation Options): Document it.
* gimplify.cc (build_instrumentation_call): New static function.
(gimplify_function_tree): Call it to emit the instrumentation calls
if -finstrument-functions[-once] is specified.
gcc/testsuite/
* gcc.dg/instrument-4.c: New test.
Eric Botcazou [Mon, 13 Jun 2022 08:03:36 +0000 (10:03 +0200)]
Do not erase warning data in gimple_set_location
gimple_set_location is mostly invoked on newly built GIMPLE statements, so
their location is UNKNOWN_LOCATION and setting it will clobber the warning
data of the passed location, if any.
gcc/
* dwarf2out.cc (output_one_line_info_table): Initialize prev_addr.
* gimple.h (gimple_set_location): Do not copy warning data from
the previous location when it is UNKNOWN_LOCATION.
* optabs.cc (expand_widen_pattern_expr): Always set oprnd{1,2}.
gcc/testsuite/
* c-c++-common/nonnull-1.c: Remove XFAIL for C++.
Nathan Sidwell [Thu, 9 Jun 2022 15:48:25 +0000 (08:48 -0700)]
c++: Separate late stage module writing
This moves some module writing into a newly added write_end function,
which is called after writing initializers.
gcc/cp/
* module.cc (module_state::write): Separate to ...
(module_state::write_begin, module_state::write_end): ...
these.
(module_state::write_readme): Drop extensions parameter.
(struct module_processing_cookie): Add more fields.
(finish_module_processing): Adjust state writing call.
(late_finish_module): Call write_end.
Iain Buclaw [Mon, 13 Jun 2022 08:41:57 +0000 (10:41 +0200)]
d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f.
D front-end changes:
- Import latest bug fixes to mainline.
D runtime changes:
- Fix duplicate Elf64_Dyn definitions on Solaris.
- _d_newThrowable has been converted to a template.
Phobos changes:
- Import latest bug fixes to mainline.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 821ed393d.
* expr.cc (ExprVisitor::visit (NewExp *)): Remove handling of
allocating `@nogc' throwable object.
* runtime.def (NEWTHROW): Remove.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime 454471d8.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/sync/package.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 1206fc94f.
Jakub Jelinek [Mon, 13 Jun 2022 08:53:33 +0000 (10:53 +0200)]
i386: Fix up *<dwi>3_doubleword_mask [PR105911]
Another regression caused by my recent patch.
This time because define_insn_and_split only requires that the
constant mask is const_int_operand. When it was only SImode,
that wasn't a problem, HImode neither, but for DImode, if we need
to AND the shift count, we might run into the problem that it isn't
a representable signed 32-bit immediate.
But, we don't really care about the upper bits of the mask, so
we can just mask the CONST_INT with the mode mask.
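As a rough illustration of the fix, the sketch below masks a count mask with (mode size * bits per unit) - 1 and checks whether the result fits a signed 32-bit immediate. The helper names, the 8-byte mode size and the sample constant are assumptions chosen for the example, not values taken from i386.md.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: 8 bits per addressable unit.
constexpr int bits_per_unit = 8;

// Keep only the low bits of the mask that matter for a shift count in a
// mode of the given size (in bytes).
constexpr std::int64_t mask_with_mode_mask(std::int64_t count_mask, int mode_size)
{
  return count_mask
         & (static_cast<std::int64_t>(mode_size) * bits_per_unit - 1);
}

// True if the value is representable as a signed 32-bit immediate.
constexpr bool fits_signed_32(std::int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}

inline void check_mask_example()
{
  const std::int64_t original = 0xFFFFFFBF;  // hypothetical const_int mask
  assert(!fits_signed_32(original));
  assert(mask_with_mode_mask(original, 8) == 63);
  assert(fits_signed_32(mask_with_mode_mask(original, 8)));
}
```

The upper bits are dropped because only the low bits of the count influence the shift, so the masked constant is always small enough to encode.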
2022-06-13 Jakub Jelinek <jakub@redhat.com>
PR target/105911
* config/i386/i386.md (*ashl<dwi>3_doubleword_mask,
*<insn><dwi>3_doubleword_mask): Use operands[3] masked with
(<MODE_SIZE> * BITS_PER_UNIT) - 1 as AND operand instead of
operands[3] unmodified.
* gcc.dg/pr105911.c: New test.
Cui,Lili [Fri, 10 Jun 2022 07:31:21 +0000 (15:31 +0800)]
testsuite: Add -mtune=generic to dg-options for two testcases.
Use -mtune=generic to limit these two test cases, because configuring them with
-mtune=cascadelake or znver3 will vectorize them.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Add
-mtune=generic to dg-options.
* gcc.target/i386/pr84101.c: Likewise.
GCC Administrator [Mon, 13 Jun 2022 00:16:18 +0000 (00:16 +0000)]
Daily bump.
Simon Wright [Sun, 12 Jun 2022 16:01:22 +0000 (17:01 +0100)]
Darwin: Truncate kernel-provided version to OS major for Darwin >= 20.
In common with system tools, GCC uses a version obtained from the kernel as
the prevailing macOS target, when that is not overridden by command line or
environment versions (i.e. -mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET).
Presently, GCC assumes that if the OS version is >= 20, the value used should
include both major and minor version identifiers. However the system tools
(for those versions) truncate the value to the major version - this leads to
link errors when combining objects built with clang and GCC for example:
ld: warning: object file (null.o) was built for newer macOS version (12.2)
than being linked (12.0)
The change here truncates the values GCC uses to the major version.
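A minimal sketch of the truncation, assuming the version is already expressed as a macOS major/minor pair; truncate_macos_version is a hypothetical helper and not the code in darwin_find_version_from_kernel.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: for macOS 11 and later (Darwin 20 and later), drop
// the minor component so the result matches what the system tools use.
std::string truncate_macos_version(int major, int minor)
{
  if (major >= 11)
    minor = 0;
  return std::to_string(major) + "." + std::to_string(minor);
}

inline void check_truncate_examples()
{
  assert(truncate_macos_version(12, 2) == "12.0");    // the case in the warning
  assert(truncate_macos_version(10, 15) == "10.15");  // older scheme is untouched
}
```

With this, the 12.2 from the linker warning quoted above becomes 12.0, matching what the system tools use.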
gcc/ChangeLog:
PR target/104871
* config/darwin-driver.cc (darwin_find_version_from_kernel): If the OS
version is darwin20 (macOS 11) or greater, truncate the version to the
major number.
Mark Mentovai [Fri, 10 Jun 2022 14:56:42 +0000 (15:56 +0100)]
Darwin: Future-proof -mmacosx-version-min
f18cbc1ee1f4 (2021-12-18) updated various parts of gcc to not impose a
Darwin or macOS version maximum of the current known release. Different
parts of gcc accept, variously, Darwin version numbers matching
darwin2*, and macOS major version numbers up to 99. The current released
version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected
for public release later this year. With one major OS release per year,
this strategy is expected to provide another 8 years of headroom.
However, f18cbc1ee1f4 missed config/darwin-c.c (now .cc), which
continued to impose a maximum of macOS 12 on the -mmacosx-version-min
compiler driver argument. This was last updated from 11 to 12 in
11b967577483 (2021-10-27), but kicking the can down the road one year at
a time is not a viable strategy, and is not in line with the more recent
technique from f18cbc1ee1f4.
Prior to 556ab5125912 (2020-11-06), config/darwin-c.c did not impose a
maximum that needed annual maintenance, as at that point, all macOS
releases had used a major version of 10. The stricter approach imposed
since then was valuable for a time until the particulars of the new
versioning scheme were established and understood, but now that they
are, it's prudent to restore a more permissive approach.
gcc/ChangeLog:
* config/darwin-c.cc: Make -mmacosx-version-min more future-proof.
Signed-off-by: Mark Mentovai <mark@mentovai.com>
Max Filippov [Sun, 12 Jun 2022 01:51:44 +0000 (18:51 -0700)]
gcc: xtensa: fix pr95571 test for call0 ABI
gcc/testsuite/
* g++.target/xtensa/pr95571.C (__xtensa_libgcc_window_spill):
New definition.
Prathamesh Kulkarni [Sun, 12 Jun 2022 03:20:16 +0000 (08:50 +0530)]
PR96463: Optimise svld1rq from vectors for little endian AArch64 targets.
The patch folds:
lhs = svld1rq({-1, -1, ...}, rhs)
into:
tmp = mem_ref<vectype> [(elem_type * {ref-all}) rhs]
lhs = vec_perm_expr<tmp, tmp, {0, 1, 2, 3 ...}>.
which is then expanded using aarch64_expand_sve_dupq.
Example:
svint32_t
foo (int32x4_t x)
{
return svld1rq (svptrue_b8 (), &x[0]);
}
code-gen:
foo:
.LFB4350:
dup z0.q, z0.q[0]
ret
The patch relaxes type-checking for VEC_PERM_EXPR by allowing different
vector types for lhs and rhs provided:
(1) rhs3 is constant and has integer type element.
(2) len(lhs) == len(rhs3) and len(rhs1) == len(rhs2)
(3) lhs and rhs have same element type.
gcc/ChangeLog:
PR target/96463
* config/aarch64/aarch64-sve-builtins-base.cc: Include ssa.h.
(svld1rq_impl::fold): Define.
* config/aarch64/aarch64.cc (expand_vec_perm_d): Define new members
op_mode and op_vec_flags.
(aarch64_evpc_reencode): Initialize newd.op_mode and
newd.op_vec_flags.
(aarch64_evpc_sve_dup): New function.
(aarch64_expand_vec_perm_const_1): Gate existing calls to
aarch64_evpc_* functions under d->vmode == d->op_mode,
and call aarch64_evpc_sve_dup.
(aarch64_vectorize_vec_perm_const): Remove assert
d->vmode != d->op_mode, and initialize d.op_mode and d.op_vec_flags.
* tree-cfg.cc (verify_gimple_assign_ternary): Allow different
vector types for lhs and rhs in VEC_PERM_EXPR if rhs3 is
constant.
gcc/testsuite/ChangeLog:
PR target/96463
* gcc.target/aarch64/sve/acle/general/pr96463-1.c: New test.
* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Likewise.
GCC Administrator [Sun, 12 Jun 2022 00:16:26 +0000 (00:16 +0000)]
Daily bump.
Takayuki 'January June' Suwa [Fri, 10 Jun 2022 15:26:17 +0000 (00:26 +0900)]
xtensa: Improve constant synthesis for both integer and floating-point
This patch revises the previous implementation of constant synthesis.
First, changed to use define_split machine description pattern and to run
after reload pass, in order not to interfere with some optimizations such as
the loop invariant motion.
Second, not only integer but floating-point is subject to processing.
Third, several new synthesis patterns - when the constant cannot fit into
a "MOVI Ax, simm12" instruction, but:
I. can be represented as a power of two minus one (eg. 32767, 65535 or
0x7fffffffUL)
=> "MOVI(.N) Ax, -1" + "SRLI Ax, Ax, 1 ... 31" (or "EXTUI")
II. is between -34816 and 34559
=> "MOVI(.N) Ax, -2048 ... 2047" + "ADDMI Ax, Ax, -32768 ... 32512"
III. (existing case) can fit into a signed 12-bit if the trailing zero bits
are stripped
=> "MOVI(.N) Ax, -2048 ... 2047" + "SLLI Ax, Ax, 1 ... 31"
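The three two-instruction cases above can be read as a classification of the constant. The sketch below checks them in the order listed; it is an illustration of the stated ranges only, with hypothetical names (Synth, classify_constant), and does not reproduce the logic in xtensa.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the cases listed in the message.
enum class Synth { Movi, MoviSrli, MoviAddmi, MoviSlli, None };

// Range of the "MOVI Ax, simm12" immediate.
constexpr bool fits_simm12(std::int64_t v) { return v >= -2048 && v <= 2047; }

inline Synth classify_constant(std::int32_t value)
{
  if (fits_simm12(value))
    return Synth::Movi;  // a single MOVI is enough

  // I. Power of two minus one: all set bits are contiguous from bit 0.
  const std::uint32_t u = static_cast<std::uint32_t>(value);
  if ((u & (u + 1)) == 0)
    return Synth::MoviSrli;

  // II. MOVI (-2048 ... 2047) plus ADDMI (-32768 ... 32512).
  if (value >= -34816 && value <= 34559)
    return Synth::MoviAddmi;

  // III. Fits in simm12 once trailing zero bits are stripped.
  // value is non-zero here, because zero fits simm12.
  std::int32_t stripped = value;
  while (stripped % 2 == 0)
    stripped /= 2;
  if (fits_simm12(stripped))
    return Synth::MoviSlli;

  return Synth::None;  // none of cases I to III applies
}
```

For example, 65535 falls under case I, 34559 under case II, and 0x100000 under case III, while 0x12345678 matches none of them.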
The above sequences consist of 5 or 6 bytes and have latency of 2 clock cycles,
in contrast with "L32R Ax, <litpool>" (3 bytes and one clock latency, but may
suffer additional one clock pipeline stall and implementation-specific
InstRAM/ROM access penalty) plus 4 bytes of constant value.
In addition, 3-instructions synthesis patterns (8 or 9 bytes, 3 clock latency)
are also provided when optimizing for speed and L32R instruction has
considerable access penalty:
IV. 2-instructions synthesis (any of I ... III) followed by
"SLLI Ax, Ax, 1 ... 31"
V. 2-instructions synthesis followed by either "ADDX[248] Ax, Ax, Ax"
or "SUBX8 Ax, Ax, Ax" (multiplying by 3, 5, 7 or 9)
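The multiplications in case V follow from the arithmetic of the ADDX and SUBX8 instructions when all three operands are the same register. A small model, with hypothetical function names and 32-bit wrapping arithmetic assumed:

```cpp
#include <cassert>
#include <cstdint>

// Model of ADDX2/ADDX4/ADDX8: (as << shift) + at, with shift = 1, 2 or 3.
constexpr std::uint32_t addx(std::uint32_t as, std::uint32_t at, unsigned shift)
{
  return (as << shift) + at;
}

// Model of SUBX8: (as << 3) - at.
constexpr std::uint32_t subx8(std::uint32_t as, std::uint32_t at)
{
  return (as << 3) - at;
}

inline void check_multiply_examples()
{
  const std::uint32_t a = 7;
  assert(addx(a, a, 1) == 3 * a);  // ADDX2 Ax, Ax, Ax
  assert(addx(a, a, 2) == 5 * a);  // ADDX4 Ax, Ax, Ax
  assert(addx(a, a, 3) == 9 * a);  // ADDX8 Ax, Ax, Ax
  assert(subx8(a, a) == 7 * a);    // SUBX8 Ax, Ax, Ax
}
```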
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h (xtensa_constantsynth):
New prototype.
* config/xtensa/xtensa.cc (xtensa_emit_constantsynth,
xtensa_constantsynth_2insn, xtensa_constantsynth_rtx_SLLI,
xtensa_constantsynth_rtx_ADDSUBX, xtensa_constantsynth):
New backend functions that process the abovementioned logic.
(xtensa_emit_move_sequence): Revert the previous changes.
* config/xtensa/xtensa.md: New split patterns for integer
and floating-point, as the frontend part.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/constsynth_2insns.c: New.
* gcc.target/xtensa/constsynth_3insns.c: Ditto.
* gcc.target/xtensa/constsynth_double.c: Ditto.
Takayuki 'January June' Suwa [Fri, 10 Jun 2022 04:19:32 +0000 (13:19 +0900)]
xtensa: Improve instruction cost estimation and suggestion
This patch implements a new target-specific relative RTL insn cost function
because of suboptimal cost estimation by default, and fixes several "length"
insn attributes (related to the cost estimation).
It also introduces a new machine-dependent option "-mextra-l32r-costs="
that tells implementation-specific InstRAM/ROM access penalty for L32R
instruction to the compiler (in clock-cycle units, 0 by default).
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_rtx_costs): Correct wrong case
for ABS and NEG, add missing case for BSWAP and CLRSB, and
double the costs for integer divisions using libfuncs if
optimizing for speed, in order to take advantage of fast constant
division by multiplication.
(TARGET_INSN_COST): New macro definition.
(xtensa_is_insn_L32R_p, xtensa_insn_cost): New functions for
calculating relative costs of RTL insns, for both speed and
size.
* config/xtensa/xtensa.md (return, nop, trap): Correct values of
the attribute "length" that depends on TARGET_DENSITY.
(define_asm_attributes, blockage, frame_blockage): Add missing
attributes.
* config/xtensa/xtensa.opt (-mextra-l32r-costs=): New machine-
dependent option, however, preparatory work for now.
Takayuki 'January June' Suwa [Fri, 10 Jun 2022 04:18:24 +0000 (13:18 +0900)]
xtensa: Consider the Loop Option when setmemsi is expanded to small loop
Now apply to almost any size of aligned block under such circumstances.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_block_set_small_loop):
Pass through the block length / loop count conditions if
zero-overhead looping is configured and active.
Takayuki 'January June' Suwa [Fri, 10 Jun 2022 04:17:40 +0000 (13:17 +0900)]
xtensa: Tweak some widen multiplications
umulsidi3 is faster than umuldi3 even as a library call, and is also a
prerequisite for fast constant division by multiplication.
gcc/ChangeLog:
* config/xtensa/xtensa.md (mulsidi3, umulsidi3):
Split into individual signedness, in order to use libcall
"__umulsidi3" but not the other.
(<u>mulhisi3): Merge into one by using code iterator.
(<u>mulsidi3, mulhisi3, umulhisi3): Remove.
Michael Meissner [Sat, 11 Jun 2022 04:40:16 +0000 (00:40 -0400)]
Disable generating load/store vector pairs for block copies.
Testing has found that using load and store vector pair for block copies
can result in a slow down on power10. This patch disables using the
vector pair instructions for block copies if we are tuning for power10.
2022-06-11 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
not generate block copies with vector pair instructions if we are
tuning for power10.
GCC Administrator [Sat, 11 Jun 2022 00:16:21 +0000 (00:16 +0000)]
Daily bump.
Patrick Palka [Fri, 10 Jun 2022 20:10:02 +0000 (16:10 -0400)]
c++: improve TYPENAME_TYPE hashing [PR65328]
For the testcase in this PR, compilation takes very long ultimately due
to our poor hashing of TYPENAME_TYPE causing a huge number of collisions
in the spec_hasher and typename_hasher tables.
In spec_hasher, we don't hash the components of TYPENAME_TYPE, which
means most TYPENAME_TYPE arguments end up contributing the same hash.
This is the safe thing to do uniformly since structural_comptypes may
try resolving a TYPENAME_TYPE via the current instantiation. But this
behavior of structural_comptypes is suppressed from spec_hasher::equal
via the comparing_specializations flag, which means spec_hasher::hash
can assume it's disabled too. To that end, this patch makes
spec_hasher::hash set the flag, and teaches iterative_hash_template_arg
to hash the relevant components of TYPENAME_TYPE when the flag is set.
And in typename_hasher, the hash function considers TYPE_IDENTIFIER
instead of the more informative TYPENAME_TYPE_FULLNAME, which this patch
fixes accordingly.
After this patch, compile time for the testcase in the PR falls to
around 30 seconds on my machine (down from dozens of minutes).
PR c++/65328
gcc/cp/ChangeLog:
* decl.cc (typename_hasher::hash): Add extra overloads.
Use iterative_hash_object instead of htab_hash_pointer.
Hash TYPENAME_TYPE_FULLNAME instead of TYPE_IDENTIFIER.
(build_typename_type): Use typename_hasher::hash.
* pt.cc (spec_hasher::hash): Add two-parameter overload.
Set comparing_specializations around the call to
hash_tmpl_and_args.
(iterative_hash_template_arg) <case TYPENAME_TYPE>:
When comparing_specializations, hash the TYPE_CONTEXT
and TYPENAME_TYPE_FULLNAME.
(tsubst_function_decl): Use spec_hasher::hash instead of
hash_tmpl_and_args.
(tsubst_template_decl): Likewise.
(tsubst_decl): Likewise.
Patrick Palka [Fri, 10 Jun 2022 20:09:58 +0000 (16:09 -0400)]
c++: optimize specialization of templated member functions
This applies one of the lookup_template_class optimizations from the
previous patch to instantiate_template as well.
gcc/cp/ChangeLog:
* pt.cc (instantiate_template): Don't substitute the context
of the most general template if that of the partially
instantiated template is already non-dependent.
Patrick Palka [Fri, 10 Jun 2022 20:09:48 +0000 (16:09 -0400)]
c++: optimize specialization of nested templated classes
When substituting a class template specialization, tsubst_aggr_type
substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
This appears to be unnecessary, however, because the initial value
of lookup_template_class's context parameter is unused outside of the
IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
context, anyway. So this patch removes the redundant substitution in
tsubst_aggr_type. Doing so causes us to ICE on template/nested5.C
because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
we substitute and complete the context A<T>::C with T=E, which in turn
registers the desired dependent specialization of D for us which we end
up trying to register twice. This patch fixes this by checking the
specializations table again after completion of the context.
This patch also implements a couple of other optimizations:
* In lookup_template_class, if the context of the partially
instantiated template is already non-dependent, then we could
reuse that instead of substituting the context of the most
general template.
* During tsubst_decl for the TYPE_DECL for an injected-class-name,
we can avoid substituting its TREE_TYPE. We can also avoid
template argument substitution/coercion for this TYPE_DECL, and
for class-scope non-template VAR_/TYPE_DECLs more generally.
Together these optimizations improve memory usage for the range-v3
file test/view/zip.cc by about 5%.
gcc/cp/ChangeLog:
* pt.cc (lookup_template_class): Remove dead stores to
context parameter. Don't substitute the context of the
most general template if that of the partially instantiated
template is already non-dependent. Check the specializations
table again after completing the context of a nested dependent
specialization.
(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
TYPE_CONTEXT or pass it to lookup_template_class.
(tsubst_decl) <case TYPE_DECL, case TYPE_DECL>: Avoid substituting
the TREE_TYPE for DECL_SELF_REFERENCE_P. Avoid template argument
substitution or coercion in some cases.
Nathan Sidwell [Thu, 9 Jun 2022 15:14:31 +0000 (08:14 -0700)]
c++: Add a late-writing step for modules
To add a module initializer optimization, we need to defer finishing writing
out the module file until the end of determining the dynamic initializers.
This is achieved by passing some saved-state from the main module writing
to a new function that completes it.
This patch merely adds the skeleton of that state and moves things around,
allowing the finalization of the ELF file to be postponed. None of the
contents writing is moved, or the init optimization added.
gcc/cp/
* cp-tree.h (fini_modules): Add some parameters.
(finish_module_processing): Return an opaque pointer.
* decl2.cc (c_parse_final_cleanups): Propagate a cookie from
finish_module_processing to fini_modules.
* module.cc (struct module_processing_cookie): New.
(finish_module_processing): Return a heap-allocated cookie.
(late_finish_module): New. Finish out the module writing.
(fini_modules): Adjust.
Jakub Jelinek [Fri, 10 Jun 2022 19:19:51 +0000 (21:19 +0200)]
openmp: Call dlopen with "libmemkind.so.0" rather than "libmemkind.so"
On Thu, Jun 09, 2022 at 12:11:28PM +0200, Thomas Schwinge wrote:
> > This patch adds support for dlopening libmemkind.so
>
> Instead of 'dlopen'ing literally 'libmemkind.so':
> ..., shouldn't this instead 'dlopen' 'libmemkind.so.0'? At least for
> Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
> package:
I agree and I've actually noticed it too right before committing, but I thought
I'll investigate and tweak incrementally because "libmemkind.so"
is what I've actually tested (it is what llvm libomp uses).
Here is the now tested incremental fix.
2022-06-10 Jakub Jelinek <jakub@redhat.com>
* allocator.c (gomp_init_memkind): Call dlopen with "libmemkind.so.0"
rather than "libmemkind.so".
Nathan Sidwell [Fri, 10 Jun 2022 12:22:21 +0000 (05:22 -0700)]
c++: Adjust module initializer calling emission
We special-case emitting the calls of module initializer functions. It's
simpler to just emit a static fn to do that, and add it onto the front of
the global init fn chain. We can also move the calculation of the set of
initializers to call to the point of use.
gcc/cp/
* cp-tree.h (module_has_import_init): Rename to ...
(module_determined_import_inits): ... here.
* decl2.cc (start_objects): Do not handle module initializers
here.
(c_parse_final_cleanups): Generate a separate module
initializer calling function and add it to the list. Shrink
the c-lang region.
* module.cc (num_init_calls_needed): Delete.
(module_has_import_init): Rename to ...
(module_determined_import_inits): ... here. Do the
calculation here ...
(finish_module_processing): ... rather than here.
(module_add_import_initializers): Reformat.
gcc/testsuite/
* g++.dg/modules/init-3_a.C: New.
* g++.dg/modules/init-3_b.C: New.
* g++.dg/modules/init-3_c.C: New.
Thomas Schwinge [Thu, 12 May 2022 20:46:40 +0000 (22:46 +0200)]
libgomp nvptx plugin: Remove '--with-cuda-driver=[...]' etc. configuration option
That means, exposing to the user only the '--without-cuda-driver' behavior:
including the GCC-shipped 'include/cuda/cuda.h' (not system <cuda.h>), and
'dlopen'ing the CUDA Driver library (not linking it).
For development purposes, the libgomp nvptx plugin developer may still manually
override that, to get the previous '--with-cuda-driver' behavior.
libgomp/
* plugin/Makefrag.am: Evaluate 'if PLUGIN_NVPTX_DYNAMIC' to true.
* plugin/configfrag.ac (--with-cuda-driver)
(--with-cuda-driver-include, --with-cuda-driver-lib)
(CUDA_DRIVER_INCLUDE, CUDA_DRIVER_LIB, PLUGIN_NVPTX_CPPFLAGS)
(PLUGIN_NVPTX_LDFLAGS, PLUGIN_NVPTX_LIBS, PLUGIN_NVPTX_DYNAMIC):
Remove.
* testsuite/libgomp-test-support.exp.in (cuda_driver_include)
(cuda_driver_lib): Remove.
* testsuite/lib/libgomp.exp (libgomp_init): Don't consider these.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
Jonathan Wakely [Fri, 10 Jun 2022 13:39:13 +0000 (14:39 +0100)]
libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]
When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.
To fix 92978 I made std::gcd and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.
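The explicit range check described above can be sketched as follows. This is
only an illustration, not the libstdc++ code: checked_lcm is a hypothetical
helper, it is restricted to unsigned for brevity, and it assumes a compiler
that provides __builtin_mul_overflow (GCC or Clang) and a 32-bit unsigned.

```cpp
#include <numeric>
#include <optional>

// Hypothetical helper (not part of libstdc++): returns the LCM of two
// unsigned values, or std::nullopt if the result does not fit in unsigned.
inline std::optional<unsigned> checked_lcm(unsigned m, unsigned n)
{
  if (m == 0 || n == 0)
    return 0u;
  unsigned result;
  // m / gcd(m, n) is exact, so only the final multiplication can wrap.
  if (__builtin_mul_overflow(m / std::gcd(m, n), n, &result))
    return std::nullopt;
  return result;
}
```

With these assumptions, checked_lcm(500000u, 499999u) reports the overflow
that plain unsigned arithmetic would silently wrap.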
libstdc++-v3/ChangeLog:
PR libstdc++/105844
* include/experimental/numeric (experimental::gcd): Simplify
assertions. Use __abs_r instead of __absu.
(experimental::lcm): Likewise. Remove use of __detail::__lcm so
overflow can be detected.
* include/std/numeric (__detail::__absu): Rename to __abs_r and
change to allow signed result type, so overflow can be detected.
(__detail::__lcm): Remove.
(gcd): Simplify assertions. Use __abs_r instead of __absu.
(lcm): Likewise. Remove use of __detail::__lcm so overflow can
be detected.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/105844.cc: New test.
* testsuite/26_numerics/lcm/105844.cc: New test.
Jonathan Wakely [Wed, 8 Jun 2022 09:43:57 +0000 (10:43 +0100)]
libstdc++: Fix lifetime bugs for non-TLS eh_globals [PR105880]
This ensures that the single-threaded fallback buffer eh_globals is not
destroyed during program termination, using the same immortalization
technique used for error category objects.
Also ensure that init._M_init can still be read after init has been
destroyed, by making it a static data member.
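The immortalization technique mentioned above can be sketched roughly like
this. The names Tracked, Immortal and destructor_calls_after_scope are
hypothetical and exist only for this illustration; the real eh_globals code
differs. The idea is that a union with a user-provided empty destructor never
runs the destructor of its member.

```cpp
// Counts destructor calls so the effect is observable.
int tracked_destroyed = 0;

struct Tracked
{
  int value;
  constexpr Tracked() : value(42) { }
  ~Tracked() { ++tracked_destroyed; }
};

// Hypothetical wrapper: the constexpr constructor allows constant
// initialization, and the empty destructor means obj is never destroyed.
template<typename T>
union Immortal
{
  T obj;
  constexpr Immortal() : obj() { }
  ~Immortal() { /* intentionally does not destroy obj */ }
};

// Creates and leaves a scope holding an Immortal<Tracked>, then reports
// how many Tracked destructors have run so far.
int destructor_calls_after_scope()
{
  {
    Immortal<Tracked> t;
    (void) t.obj.value;
  }
  return tracked_destroyed;
}
```

For an object with static storage duration the same wrapper keeps the object
usable while other destructors run during program termination.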
libstdc++-v3/ChangeLog:
PR libstdc++/105880
* libsupc++/eh_globals.cc (eh_globals): Ensure constant init and
prevent destruction during termination.
(__eh_globals_init::_M_init): Replace with static member _S_init.
(__cxxabiv1::__cxa_get_globals_fast): Update.
(__cxxabiv1::__cxa_get_globals): Likewise.
Roger Sayle [Fri, 10 Jun 2022 14:14:23 +0000 (15:14 +0100)]
PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.
For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }
GCC with -O2 currently generates:
re: movq %xmm0, -8(%rsp)
movss -8(%rsp), %xmm0
ret
im: movq %xmm0, -8(%rsp)
movss -4(%rsp), %xmm0
ret
with this patch we now generate:
re: ret
im: movq %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm0
ret
[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that, the
important bit is that the SCmode argument isn't written to the
stack].
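As a rough model of what the new code for im does, the following sketch
extracts the two parts from a 64-bit integer without going through a complex
object in memory. The helper names are hypothetical, and the layout (real part
in the low 32 bits, imaginary part in the high 32 bits) is an assumption that
matches a little-endian target such as x86_64.

```cpp
#include <cstdint>
#include <cstring>

// Packs (re, im) the way an SCmode value sits in a 64-bit integer on a
// little-endian target: real part in bits 0-31, imaginary in bits 32-63.
inline std::uint64_t pack_complex(float re, float im)
{
  std::uint32_t lo, hi;
  std::memcpy(&lo, &re, sizeof lo);
  std::memcpy(&hi, &im, sizeof hi);
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Real part: reinterpret the low 32 bits as a float.
inline float real_part(std::uint64_t packed)
{
  std::uint32_t lo = static_cast<std::uint32_t>(packed);
  float re;
  std::memcpy(&re, &lo, sizeof re);
  return re;
}

// Imaginary part: shift right by 32 (the shrq above), then reinterpret
// the remaining 32 bits as a float (the movd above).
inline float imag_part(std::uint64_t packed)
{
  std::uint32_t hi = static_cast<std::uint32_t>(packed >> 32);
  float im;
  std::memcpy(&im, &hi, sizeof im);
  return im;
}
```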
The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers. If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.
An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes. This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this. This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
2022-06-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/7061
* expr.cc (emit_group_store): For groups that consist of a single
scalar integer register that holds a complex mode value, use
gen_lowpart to generate a SUBREG to "view_convert" to the complex
mode. For modes of different sizes, first convert to an integer
mode of the appropriate size.
gcc/testsuite/ChangeLog
PR rtl-optimization/7061
* gcc.target/i386/pr7061-1.c: New test case.
* gcc.target/i386/pr7061-2.c: New test case.
Jonathan Wakely [Thu, 9 Jun 2022 11:07:15 +0000 (12:07 +0100)]
libstdc++: Make std::hash<basic_string<>> allocator-agnostic (LWG 3705)
This new library issue was recently moved to Tentatively Ready by an LWG
poll, so I'm making the change on trunk.
As noted in PR libstdc++/105907 the std::hash specializations for PMR
strings were not treated as slow hashes by the unordered containers, so
this change preserves that. The new specializations for custom
allocators are also not treated as slow, for the same reason. For the
versioned namespace (i.e. unstable ABI) we don't have to worry about
that, so can enable hash code caching for all basic_string
specializations.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (__hash_str_base): New class
template.
(hash<basic_string<C, char_traits<C>, A>>): Define partial
specialization for each of the standard character types.
(hash<string>, hash<wstring>, hash<u8string>, hash<u16string>)
(hash<u32string>): Remove explicit specializations.
* include/std/string (__hash_string_base): Remove class
template.
(hash<pmr::string>, hash<pmr::wstring>, hash<pmr::u8string>)
(hash<pmr::u16string>, hash<pmr::u32string>): Remove explicit
specializations.
* testsuite/21_strings/basic_string/hash/hash.cc: Test with
custom allocators.
* testsuite/21_strings/basic_string/hash/hash_char8_t.cc:
Likewise.
Antoni Boucher [Fri, 10 Jun 2022 01:37:23 +0000 (21:37 -0400)]
libgccjit: Support getting the size of a float [PR105829]
2022-06-09 Antoni Boucher <bouanto@zoho.com>
gcc/jit/
PR jit/105829
* libgccjit.cc: Add support for floating-point types in
gcc_jit_type_get_size.
gcc/testsuite/
PR jit/105829
* jit.dg/test-types.c: Add tests for gcc_jit_type_get_size.
GCC Administrator [Fri, 10 Jun 2022 00:16:43 +0000 (00:16 +0000)]
Daily bump.
Takayuki 'January June' Suwa [Sun, 29 May 2022 10:57:35 +0000 (19:57 +0900)]
xtensa: Add clrsbsi2 insn pattern
> (clrsb:m x)
> Represents the number of redundant leading sign bits in x, represented
> as an integer of mode m, starting at the most significant bit position.
This is exactly what the NSA instruction (never emitted before)
calculates in the Xtensa ISA.
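For reference, the quoted definition can be written out in portable code. The
function below is a hypothetical reference implementation for 32-bit values,
not the Xtensa pattern or the libgcc routine; it should agree with
__builtin_clrsb on targets where int is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical reference implementation: number of redundant leading
// sign bits, i.e. bits below the sign bit that are copies of it.
inline int clrsb32(std::int32_t x)
{
  std::uint32_t bits = static_cast<std::uint32_t>(x);
  // Flip all bits of a negative value so that copies of the sign bit
  // show up as leading zeros in either case.
  if (x < 0)
    bits = ~bits;
  int count = 0;
  // Bit 31 is the sign bit itself; count the zeros directly below it.
  for (int bit = 30; bit >= 0 && ((bits >> bit) & 1u) == 0; --bit)
    ++count;
  return count;
}
```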
gcc/ChangeLog:
* config/xtensa/xtensa.md (clrsbsi2): New insn pattern.
libgcc/ChangeLog:
* config/xtensa/lib1funcs.S (__clrsbsi2): New function.
* config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _clrsbsi2.
Takayuki 'January June' Suwa [Sun, 29 May 2022 10:55:44 +0000 (19:55 +0900)]
xtensa: Optimize '(~x & y)' to '((x & y) ^ y)'
In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.
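The identity behind the rewrite in the subject line can be checked directly.
The two function names are made up for this illustration; the second form
needs no bitwise negation.

```cpp
#include <cstdint>

// Direct form: requires a bitwise negation of x.
constexpr std::uint32_t andnot_direct(std::uint32_t x, std::uint32_t y)
{
  return ~x & y;
}

// Rewritten form: clears from y exactly the bits that are also set in x.
constexpr std::uint32_t andnot_rewritten(std::uint32_t x, std::uint32_t y)
{
  return (x & y) ^ y;
}

static_assert(andnot_direct(0xF0F0F0F0u, 0x12345678u)
              == andnot_rewritten(0xF0F0F0F0u, 0x12345678u),
              "both forms compute the same value");
```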
gcc/ChangeLog:
* config/xtensa/xtensa.md (*andsi3_bitcmpl):
New insn_and_split pattern.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/check_zero_byte.c: New.
Takayuki 'January June' Suwa [Sun, 29 May 2022 10:46:16 +0000 (19:46 +0900)]
xtensa: Make one_cmplsi2 optimizer-friendly
In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation. But a few optimizers assume that bitwise negation can be
done by a single insn.
As a result, '((x < 0) ? ~x : x)' could never be optimized to
'(x ^ (x >> 31))', for example.
This patch relaxes that limitation by deferring the insn expansion until
the split pass.
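The equivalence of the two forms quoted above can be sketched as follows. The
function names are hypothetical, and the branchless form assumes that right
shifts of negative values are arithmetic (as they are with GCC).

```cpp
#include <cstdint>

// Branchy form: complement x only when it is negative.
constexpr std::int32_t cmpl_abs_branchy(std::int32_t x)
{
  return x < 0 ? ~x : x;
}

// Branchless form: x >> 31 is all ones for negative x (assuming an
// arithmetic shift) and zero otherwise, so the XOR complements x only
// when it is negative.
constexpr std::int32_t cmpl_abs_branchless(std::int32_t x)
{
  return x ^ (x >> 31);
}
```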
gcc/ChangeLog:
* config/xtensa/xtensa.md (one_cmplsi2):
Rearrange as an insn_and_split pattern.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/one_cmpl_abs.c: New.
Takayuki 'January June' Suwa [Sun, 29 May 2022 10:44:32 +0000 (19:44 +0900)]
xtensa: Implement bswaphi2 insn pattern
This patch adds bswaphi2 insn pattern that is one instruction less than the
default expansion.
gcc/ChangeLog:
* config/xtensa/xtensa.md (bswaphi2): New insn pattern.
Joseph Myers [Thu, 9 Jun 2022 22:04:25 +0000 (22:04 +0000)]
Update gcc sv.po
* sv.po: Update.
Segher Boessenkool [Wed, 11 May 2022 18:43:57 +0000 (18:43 +0000)]
rs6000: Delete FP_ISA3
FP_ISA3 is exactly the same as SFDF, just a less obvious name. So,
let's delete it.
2022-06-09 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.md (FP_ISA3): Delete.
(float<QHI:mode><FP_ISA3:mode>2): Rename to...
(float<QHI:mode><SFDF:mode>2): ... this. Adjust.
(*float<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
(*float<QHI:mode><SFDF:mode>2_internal): ... this. Adjust.
(floatuns<QHI:mode><FP_ISA3:mode>2): Rename to...
(floatuns<QHI:mode><SFDF:mode>2): ... this. Adjust.
(*floatuns<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
(*floatuns<QHI:mode><SFDF:mode>2_internal): ... this. Adjust.
Jakub Jelinek [Thu, 9 Jun 2022 17:44:50 +0000 (19:44 +0200)]
openmp: Fix up include of the generic allocator.c
As reported by Richard Sandiford, #include "../../../allocator.c"
has one too many ../s, dunno why it worked for me when using
../configure (VPATH = ../../../libgomp)
2022-06-09 Jakub Jelinek <jakub@redhat.com>
* config/linux/allocator.c: Fix up #include directive.
Jakub Jelinek [Thu, 9 Jun 2022 15:42:31 +0000 (17:42 +0200)]
c++: Fix up ICE on __builtin_shufflevector constexpr evaluation [PR105871]
As the following testcase shows, BIT_FIELD_REF result doesn't have to have
just integral type, it can also have vector type. And in that case
cxx_eval_bit_field_ref just ICEs on it because it is unprepared for that
case, creates the initial value with build_int_cst (sure, that one could be
easily replaced with build_zero_cst) and then expects it can through shifts,
ands and ors come up with the final value, but that doesn't work for
vectors.
We already call fold_ternary if whole is a VECTOR_CST, this patch does the
same if the result doesn't have integral type. And, there is no guarantee
fold_ternary will succeed and the callers certainly don't expect NULL
being returned, so it also diagnoses those as non-constant and returns
original t in that case.
2022-06-09 Jakub Jelinek <jakub@redhat.com>
PR c++/105871
* constexpr.cc (cxx_eval_bit_field_ref): For BIT_FIELD_REF with
non-integral result type use fold_ternary too like for BIT_FIELD_REFs
from VECTOR_CST. If fold_ternary returns NULL, diagnose non-constant
expression, set *non_constant_p and return t, instead of returning
NULL.
* g++.dg/pr105871.C: New test.
Maciej W. Rozycki [Thu, 9 Jun 2022 13:34:34 +0000 (14:34 +0100)]
RISC-V: Use a tab rather than space with FSFLAGS
Consistently use a tab rather than a space as the separator between the
assembly instruction mnemonic and its operand with FSFLAGS instructions
produced with the unordered FP comparison RTL insns.
gcc/
* config/riscv/riscv.md
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Emit a tab
rather than space with FSFLAGS.
Nathan Sidwell [Wed, 8 Jun 2022 18:25:14 +0000 (11:25 -0700)]
c++: Better module initializer code
Every module interface needs to emit a global initializer, but it
might have nothing to init. In those cases, there's no need for any
idempotency boolean to be emitted.
gcc/cp
* cp-tree.h (module_initializer_kind): Replace with ...
(module_global_init_needed, module_has_import_inits): ...
these.
* decl2.cc (start_objects): Add has_body parm. Reorganize
module initializer creation.
(generate_ctor_or_dtor_function): Adjust.
(c_parse_final_cleanups): Adjust.
(vtv_start_verification_constructor_init_function): Adjust.
* module.cc (module_initializer_kind): Replace with ...
(module_global_init_needed, module_has_import_inits): ...
these.
gcc/testsuite/
* g++.dg/modules/init-2_a.C: Check no idempotency.
* g++.dg/modules/init-2_b.C: Check idempotency.
Tobias Burnus [Thu, 9 Jun 2022 12:48:24 +0000 (14:48 +0200)]
OpenMP: Handle ancestor:1 with discover_declare_target
gcc/
* omp-offload.cc (omp_discover_declare_target_tgt_fn_r,
omp_discover_declare_target_fn_r): Don't walk reverse-offload
target regions.
gcc/testsuite/
* c-c++-common/gomp/reverse-offload-1.c: New.
Jakub Jelinek [Thu, 9 Jun 2022 08:19:53 +0000 (10:19 +0200)]
doc: Fix up -Waddress documentation
When looking up the -Waddress documentation due to some PR that mentioned it,
I've noticed some typos and thus I'm fixing them.
2022-06-09 Jakub Jelinek <jakub@redhat.com>
* doc/invoke.texi (-Waddress): Fix a typo in small example.
Fix typos inptr_t -> intptr_t and uinptr_t -> uintptr_t.
Jakub Jelinek [Thu, 9 Jun 2022 08:14:42 +0000 (10:14 +0200)]
openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library
This patch adds support for dlopening libmemkind.so on Linux and uses it
for some kinds of allocations (but not yet e.g. pinned memory).
2022-06-09 Jakub Jelinek <jakub@redhat.com>
* allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined.
(enum gomp_memkind_kind): New type.
(struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND
is defined.
(struct gomp_memkind_data): New type.
(memkind_data, memkind_data_once): New variables.
(gomp_init_memkind, gomp_get_memkind): New functions.
(omp_init_allocator): Initialize data.memkind, don't fail for
omp_high_bw_mem_space if libmemkind supports it.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
memkind support if LIBGOMP_USE_MEMKIND is defined.
* config/linux/allocator.c: New file.
Cui,Lili [Wed, 8 Jun 2022 03:25:57 +0000 (11:25 +0800)]
Update {skylake,icelake,alderlake}_cost to add a bit preference to vector store.
Since the integer vector construction cost has changed, we need to adjust the
load and store costs for Intel processors.
With the patch applied
538.imagick_r: gets ~6% improvement on ADL for multicopy.
525.x264_r: gets ~2% improvement on ADL and ICX for multicopy.
with no measurable changes for other benchmarks.
gcc/ChangeLog
PR target/105493
* config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load cost
from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and
unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}.
(icelake_cost): Ditto.
(alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE loads,
stores and unaligned stores cost from {6, 6, 6, 10, 15} to
{8, 8, 8, 10, 15}.
gcc/testsuite/
PR target/105493
* gcc.target/i386/pr91446.c: Adjust to expect vectorization.
* gcc.target/i386/pr99881.c: XFAIL.
* gcc.target/i386/pr105493.c: New.
* g++.target/i386/pr105638.C: Use other sequence checks
instead of vpxor, because code generation changed.
Haochen Gui [Thu, 9 Jun 2022 05:24:15 +0000 (13:24 +0800)]
This patch replaces shift and ior insns with one rotate and mask insn for the split patterns which are for DI byte swap on Power6.
gcc/
* config/rs6000/rs6000.md (define_split for bswapdi load): Merge shift
and ior insns to one rotate and mask insn.
(define_split for bswapdi register): Likewise.
gcc/testsuite/
* gcc.target/powerpc/pr93453-1.c: New.
GCC Administrator [Thu, 9 Jun 2022 00:16:26 +0000 (00:16 +0000)]
Daily bump.
Jason Merrill [Tue, 7 Jun 2022 19:52:30 +0000 (15:52 -0400)]
c++: non-templated friends [PR105852]
The previous patch for 105852 avoids copying DECL_TEMPLATE_INFO from a
non-templated friend, but it really shouldn't have it in the first place.
PR c++/105852
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Change non-templated friend
check to an assert.
* pt.cc (tsubst_function_decl): Don't set DECL_TEMPLATE_INFO
on non-templated friends.
(tsubst_friend_function): Adjust.
Jason Merrill [Tue, 7 Jun 2022 01:49:06 +0000 (21:49 -0400)]
c++: redeclared hidden friend take 2 [PR105852]
My previous patch for 105761 avoided copying DECL_TEMPLATE_INFO from a
friend to a later definition, but in this testcase we have first a
non-friend declaration and then a definition, and we need to avoid copying
in that case as well. But we do still want to set new_template_info to
avoid GC trouble.
With this change, the modules dump correctly identifies ::foo as a
non-template function in tpl-friend-2_a.C.
Along the way I noticed that the duplicate_decls handling of
DECL_UNIQUE_FRIEND_P was backwards for templates, where we don't clobber
DECL_LANG_SPECIFIC (olddecl) with DECL_LANG_SPECIFIC (newdecl) like we do
for non-templates.
PR c++/105852
PR c++/105761
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Avoid copying template info
from non-templated friend even if newdecl isn't a definition.
Correct handling of DECL_UNIQUE_FRIEND_P on templates.
* pt.cc (non_templated_friend_p): New.
* cp-tree.h (non_templated_friend_p): Declare it.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-2_a.C: Adjust expected dump.
* g++.dg/template/friend74.C: New test.
Roger Sayle [Wed, 8 Jun 2022 19:43:03 +0000 (20:43 +0100)]
PR middle-end/105874: Use EXPAND_MEMORY to fix ada bootstrap.
Many thanks to Tamar Christina for filing PR middle-end/105874 indicating
that SPECcpu 2017's Leela is failing on x86_64 due to a miscompilation
of FastBoard::is_eye. This function is much smaller and easier to work
with than my previous hunt for the cause of the Ada bootstrap failures
due to miscompilation somewhere in GCC (or one of the 131 places that
the problematic form of optimization triggers during an ada bootstrap).
It turns out the source of the miscompilation introduced by my recent
patch is the distinction (during RTL expansion) of l-values and r-values.
According to the documentation above expand_modifier, EXPAND_MEMORY
should be used for lvalues (when a memory is required), and EXPAND_NORMAL
for rvalues when a constant is permissible. In what I'd like to consider
a latent bug, the recursive call to expand_expr_real on line 11188 of
expr.cc, in the case handling ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF
and ARRAY_RANGE_REF was passing EXPAND_NORMAL when it really required
(the semantics of) EXPAND_MEMORY. All the time that VAR_DECLs were
being returned as memory this was fine, but as soon as we're able to
optimize short arrays into immediate constants, bad things happen.
In the test case from Leela, we notice that the array s_eyemask
always has DImode constant value { 4, 64 }, which is useful as
an rvalue, but not when we need to index it as an lvalue, as in
s_eyemask[color]. This also explains why everything being accepted
by immediate_const_ctor_p (during an ada bootstrap) looks reasonable,
what's incorrect is that we don't know how these structs/arrays are
to be used.
The fix is to ensure that we call expand_expr with EXPAND_MEMORY
when processing the VAR_DECLs returned by get_inner_reference.
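The kind of source that exposes the difference looks roughly like the sketch
below. This is a reduced illustration with hypothetical function names, not
the Leela source: two 32-bit ints together form one 64-bit constant, which is
fine for the constant-index use but not for the runtime-indexed one.

```cpp
// Small constant array: its two ints fit in a single 64-bit immediate.
static const int s_eyemask[2] = { 4, 64 };

// The array must live in memory here, because the index is only known
// at run time (the case that needs EXPAND_MEMORY semantics).
int eyemask_for(int color)
{
  return s_eyemask[color];
}

// Here the elements are only read as values with constant indices, so
// treating the array as an immediate constant is harmless.
int eyemask_sum()
{
  return s_eyemask[0] + s_eyemask[1];
}
```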
2022-06-08 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/105874
* expr.cc (expand_expr_real_1) <normal_inner_ref>: New local
variable tem_modifier for calculating the expand_modifier enum to
use for expanding tem. If tem is a VAR_DECL, use EXPAND_MEMORY.
gcc/testsuite/ChangeLog
PR middle-end/105874
* g++.dg/opt/pr105874.C: New test case.
Max Filippov [Wed, 8 Jun 2022 04:01:01 +0000 (21:01 -0700)]
gcc: xtensa: fix PR target/105879
split_double operates with the 'word that comes first in memory in the
target' terminology, while gen_lowpart operates with the 'value
representing some low-order bits of X' terminology. They are not
equivalent and must be dealt with differently on little- and big-endian
targets.
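The difference between the two terminologies can be shown outside the
compiler with a small sketch. The helper names are hypothetical and merely
mirror the wording above; they are not GCC functions.

```cpp
#include <cstdint>
#include <cstring>

// Detects the byte order of the machine running this code.
inline bool is_little_endian()
{
  const std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// "Value representing some low-order bits of X": independent of endianness.
inline std::uint32_t lowpart(std::uint64_t v)
{
  return static_cast<std::uint32_t>(v);
}

// "Word that comes first in memory": depends on endianness.
inline std::uint32_t first_word_in_memory(std::uint64_t v)
{
  std::uint32_t w;
  std::memcpy(&w, &v, sizeof w);
  return w;
}
```

On a little-endian machine both helpers return the same word; on a big-endian
machine the first word in memory is the high part instead.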
gcc/
PR target/105879
* config/xtensa/xtensa.md (movdi): Rename 'first' and 'second'
to 'lowpart' and 'highpart' so that they match 'gen_lowpart' and
'gen_highpart' bitwise semantics and fix order of highpart and
lowpart depending on target endianness.
Nathan Sidwell [Tue, 31 May 2022 17:42:35 +0000 (10:42 -0700)]
c++: Reimplement static init/fini generation
Currently we generate static init/fini code by generating a set of
functions taking an 'initp' bool and an unsigned priority. (There can
be more than one, as we repeat the end-of-compile loop.) We then
generate a set of real init or fini functions for each needed
priority, calling the previous set of functions. This is of course
very tangled, but excitingly the value-range-propagator is clever
enough to unentangle it. However, the current arrangement makes
generation awkward, particularly as to how to optimize the
module-global-init generation.
This reimplements the generation to generate a set of separate
init/fini functions for each needed priority, and then call them from
the real inits previously mentioned. This replaces a splay tree,
recording which priority/init combos we needed, with a pair of hash
tables, mapping priority to init functions. Much simpler.
While there, rename several of the functions as they are only dealing
with part of the init/fini generation, not the whole set.
gcc/cp/
* decl2.cc (struct priority_info_s, priority_info): Delete.
(priority_map_traits, priority_map_t): New.
(static_init_fini_fns): New.
(INITIALIZE_P_IDENTIFIER, PRIORITY_IDENTIFIER): Delete.
(initialize_p_decl, priority_decl): Delete.
(ssdf_decls, priority_info_map): Delete.
(start_static_storage_duration_function): Rename to ...
(start_partial_init_fini_fn): ... here. Create a void arg fn.
Add it to the slot in the appropriate static_init_fini_fns
hash table.
(finish_static_storage_duration_function): Rename to ...
(finish_partial_init_fini_fn): ... here.
(get_priority_info): Delete.
(one_static_initialization_or_destruction): Assert not
trivial dtor.
(do_static_initialization_or_destruction): Rename to ...
(emit_partial_init_fini_fn) ... here. Start & finish the fn.
Simply init/fini each var.
(partition_vars_for_init_fini): Partition vars according to
priority and add to init and/or fini list.
(generate_ctor_or_dtor_function): Start and finish the function.
Do sanitizer calls here.
(generate_ctor_and_dtor_functions_for_priority): Delete.
(c_parse_final_cleanups): Reimplement global init/fini
processing.
gcc/testsuite/
* g++.dg/init/static-cdtor1.C: New.
Roger Sayle [Wed, 8 Jun 2022 09:06:23 +0000 (10:06 +0100)]
[Committed] Add -mno-avx2 to recent gcc.target/i386/xop-vpcmov3.c
Adding -march=cascadelake to the command line options of the recently
added xop-vpcmov3.c test case causes problems as GCC then prefers to
use AVX512's vpternlogd instruction, instead of the XOP vpcmov that
the test is checking for. This is easily solved by adding an explicit
-mno-avx512vl to the command line options.
Committed to mainline as obvious (in hindsight).
2022-06-08 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
* gcc.target/i386/xop-pcmov3.c: Add -mno-avx512vl to dg-options.
Tobias Burnus [Wed, 8 Jun 2022 08:06:57 +0000 (10:06 +0200)]
OpenMP: Fortran - fix ancestor's requires reverse_offload check
gcc/fortran/
* openmp.cc (gfc_match_omp_clauses): Check also parent namespace
for 'requires reverse_offload'.
gcc/testsuite/
* gfortran.dg/gomp/target-device-ancestor-5.f90: New test.