platform/upstream/gcc.git
Jan Hubicka [Sat, 13 Jan 2018 19:32:04 +0000 (20:32 +0100)]
predict.c (determine_unlikely_bbs): Handle correctly BBs which appear in the queue multiple times.

* predict.c (determine_unlikely_bbs): Handle correctly BBs
which appear in the queue multiple times.

From-SVN: r256648

Thomas Koenig [Sat, 13 Jan 2018 18:22:36 +0000 (18:22 +0000)]
re PR fortran/83744 (ICE in ../../gcc/gcc/fortran/dump-parse-tree.c:3093 while using -fc-prototypes)

2018-01-13  Thomas Koenig <tkoenig@gcc.gnu.org>

PR fortran/83744
* dump-parse-tree.c (get_c_type_name): Remove extra line.
Change for loop to use declaration in for loop. Handle BT_LOGICAL
and BT_CHARACTER.
(write_decl): Add where argument. Fix indentation. Replace
assert with error message. Add typename to warning
in comment.
(write_type): Adjust locus to call of write_decl.
(write_variable): Likewise.
(write_proc): Likewise. Replace assert with error message.

From-SVN: r256645

Richard Sandiford [Sat, 13 Jan 2018 18:02:10 +0000 (18:02 +0000)]
Support for aliasing with variable strides

This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:

1) For accesses like:

     x[i * n] += 1;

   we need to check whether n (and thus the DR_STEP) is nonzero.
   vect_analyze_data_ref_dependence records values that need to be
   checked in this way, then prune_runtime_alias_test_list records a
   bounds check on DR_STEP being outside the range [0, 0].

2) For accesses like:

     x[i * n] = x[i * n + 1] + 1;

   we simply need to test whether abs (n) >= 2.
   prune_runtime_alias_test_list looks for cases like this and tries
   to guess whether it is better to use this kind of check or a check
   for non-overlapping ranges.  (We could do an OR of the two conditions
   at runtime, but that isn't implemented yet.)

3) Checks for overlapping ranges need to cope with variable strides.
   At present the "length" of each segment in a range check is
   represented as an offset from the base that lies outside the
   touched range, in the same direction as DR_STEP.  The length
   can therefore be negative and is sometimes conservative.

   With variable steps it's easier to reason about if we split
   this into two:

     seg_len:
       distance travelled from the first iteration of interest
       to the last, e.g. DR_STEP * (VF - 1)

     access_size:
       the number of bytes accessed in each iteration

   with access_size always being a positive constant and seg_len
   possibly being variable.  We can then combine alias checks
   for two accesses that are a constant number of bytes apart by
   adjusting the access size to account for the gap.  This leaves
   the segment length unchanged, which allows the check to be combined
   with further accesses.

   When seg_len is positive, the runtime alias check has the form:

        base_a >= base_b + seg_len_b + access_size_b
     || base_b >= base_a + seg_len_a + access_size_a

   In many accesses the base will be aligned to the access size, which
   allows us to skip the addition:

        base_a > base_b + seg_len_b
     || base_b > base_a + seg_len_a

   A similar saving is possible with "negative" lengths.

   The patch therefore tracks the alignment in addition to seg_len
   and access_size.
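
For illustration, here is a sketch of the kind of loop this covers and
the checks it produces (vf, seg_len_*, access_size_* and the guard
itself are only illustrative names, not the exact variables used in
the code):

   void f (int *x, int *y, int n, int m)
   {
     for (int i = 0; i < m; ++i)
       x[i * n] += y[i * n];      /* DR_STEP is n * sizeof (int) */
   }

   /* Conceptual versioning condition for the vector loop
      (addresses compared in bytes):

        n != 0                                      -- case (1) above
        && (x >= y + seg_len_y + access_size_y      -- case (3) above
            || y >= x + seg_len_x + access_size_x)

      where seg_len is DR_STEP * (vf - 1) and access_size is
      sizeof (int); the scalar loop runs when the condition fails.  */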

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vectorizer.h (vec_lower_bound): New structure.
(_loop_vec_info): Add check_nonzero and lower_bounds.
(LOOP_VINFO_CHECK_NONZERO): New macro.
(LOOP_VINFO_LOWER_BOUNDS): Likewise.
(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
* tree-data-ref.h (dr_with_seg_len): Add access_size and align
fields.  Make seg_len the distance travelled, not including the
access size.
(dr_direction_indicator): Declare.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
tree-ssanames.h.
(runtime_alias_check_p): Allow runtime alias checks with
variable strides.
(operator ==): Compare access_size and align.
(prune_runtime_alias_test_list): Rework for new distinction between
the access_size and seg_len.
(create_intersect_range_checks_index): Likewise.  Cope with polynomial
segment lengths.
(get_segment_min_max): New function.
(create_intersect_range_checks): Use it.
(dr_step_indicator): New function.
(dr_direction_indicator): Likewise.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-loop-distribution.c (data_ref_segment_size): Return
DR_STEP * (niters - 1).
(compute_alias_check_pairs): Update call to the dr_with_seg_len
constructor.
* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
(vect_preserves_scalar_order_p): New function, split out from...
(vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
(vect_vfa_access_size): New function.
(vect_vfa_align): Likewise.
(vect_compile_time_alias): Take access_size_a and access_b arguments.
(dump_lower_bound): New function.
(vect_check_lower_bound): Likewise.
(vect_small_gap_p): Likewise.
(vectorizable_with_step_bound_p): Likewise.
(vect_prune_runtime_alias_test_list): Ignore cross-iteration
dependencies if the vectorization factor is 1.  Convert the checks
for nonzero steps into checks on the bounds of DR_STEP.  Try using
a bounds check for variable steps if the minimum required step is
relatively small. Update calls to the dr_with_seg_len
constructor and to vect_compile_time_alias.
* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
function.
(vect_loop_versioning): Call it.
* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
when retrying.
(vect_estimate_min_profitable_iters): Account for any bounds checks.

gcc/testsuite/
* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
than SLP vectorization.
* gcc.dg/vect/vect-alias-check-10.c: New test.
* gcc.dg/vect/vect-alias-check-11.c: Likewise.
* gcc.dg/vect/vect-alias-check-12.c: Likewise.
* gcc.dg/vect/vect-alias-check-8.c: Likewise.
* gcc.dg/vect/vect-alias-check-9.c: Likewise.
* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256644

Richard Sandiford [Sat, 13 Jan 2018 18:01:59 +0000 (18:01 +0000)]
Add support for SVE scatter stores

This is mostly a mechanical extension of the previous gather load
support to scatter stores.  The internal functions in this case are:

  IFN_SCATTER_STORE (base, offsets, scale, values)
  IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)
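
For example, a loop like the following (a sketch; idx and val are just
illustrative names) can now be vectorised as a scatter store:

  void f (float *restrict x, int *restrict idx,
          float *restrict val, int n)
  {
    for (int i = 0; i < n; ++i)
      x[idx[i]] = val[i];    /* store address varies per element */
  }

Conceptually each vector iteration becomes
IFN_SCATTER_STORE (x, <vector of idx[i]>, sizeof (float),
<vector of val[i]>), or the MASK_ variant inside a fully-masked loop.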

However, one nonobvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0.  There's not
really much else it could do, since the whole point is that the
step isn't predictable from iteration to iteration.  We then
went into this code in vect_analyze_data_ref_access:

  /* Allow loads with zero step in inner-loop vectorization.  */
  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
        return DR_IS_READ (dr);

I.e. we'd take the step literally and assume that this is a load
or store to an invariant address.  Loads from invariant addresses
are supported but stores to them aren't.

The code therefore had the effect of disabling all scatter stores.
AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
test for the correctness of a scatter-like loop, they don't seem to
check whether a scatter instruction is actually used.

The patch therefore makes vect_analyze_data_ref_access return true
for scatters.  We do seem to handle the aliasing correctly;
that's tested by other functions, and is symmetrical to the
already-working gather case.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/sourcebuild.texi (vect_scatter_store): Document.
* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
optabs.
* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
Document.
* genopinit.c (main): Add supports_vec_scatter_store and
supports_vec_scatter_store_cached to target_optabs.
* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
functions.
* internal-fn.h (internal_store_fn_p): Declare.
(internal_fn_stored_value_index): Likewise.
* internal-fn.c (scatter_store_direct): New macro.
(expand_scatter_store_optab_fn): New function.
(direct_scatter_store_optab_supported_p): New macro.
(internal_store_fn_p): New function.
(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
(internal_fn_mask_index): Likewise.
(internal_fn_stored_value_index): New function.
(internal_gather_scatter_fn_supported_p): Adjust operand numbers
for scatter stores.
* optabs-query.h (supports_vec_scatter_store_p): Declare.
* optabs-query.c (supports_vec_scatter_store_p): New function.
* tree-vectorizer.h (vect_get_store_rhs): Declare.
* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
true for scatter stores.
(vect_gather_scatter_fn_p): Handle scatter stores too.
(vect_check_gather_scatter): Consider using scatter stores if
supports_vec_scatter_store_p.
* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
scatter stores too.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_stored_value_index.
(check_load_store_masking): Handle scatter stores too.
(vect_get_store_rhs): Make public.
(vectorizable_call): Use internal_store_fn_p.
(vectorizable_store): Handle scatter store internal functions.
(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
when deciding whether the end of the group has been reached.
* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
(mask_scatter_store<mode>): New insns.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_scatter_store):
New proc.
* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
targets with scatter stores.
* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
stores.
* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
* gcc.target/aarch64/sve/strided_store_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256643

Richard Sandiford [Sat, 13 Jan 2018 18:01:49 +0000 (18:01 +0000)]
Allow gather loads to be used for grouped accesses

Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE.  However, as the comment
says, this is restricted to single-element groups for now:

 ??? Although the code can handle all group sizes correctly,
 it probably isn't a win to use separate strided accesses based
 on nearby locations.  Or, even if it's a win over scalar code,
 it might not be a win over vectorizing at a lower VF, if that
 allows us to use contiguous accesses.

Single-element groups are an important special case though,
and this means that code is less sensitive to GCC's classification
of single accesses with constant steps as "grouped" and ones with
variable steps as "strided".

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
function.
(vect_use_strided_gather_scatters_p): Take a masked_p argument.
Use vect_truncate_gather_scatter_offset if we can't treat the
operation as a normal gather load or scatter store.
(get_group_load_store_type): Take the gather_scatter_info
as argument.  Try using a gather load or scatter store for
single-element groups.
(get_load_store_type): Update calls to get_group_load_store_type
and vect_use_strided_gather_scatters_p.

gcc/testsuite/
* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
for double_reduc1.
* gcc.target/aarch64/sve/strided_load_4.c: New test.
* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
* gcc.target/aarch64/sve/strided_load_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256642

Richard Sandiford [Sat, 13 Jan 2018 18:01:42 +0000 (18:01 +0000)]
Use gather loads for strided accesses

This patch tries to use gather loads for strided accesses,
rather than falling back to VMAT_ELEMENTWISE.
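
For example, a loop such as the following (a sketch with illustrative
names):

  void f (double *restrict dst, double *restrict src, int stride, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = src[i * stride];    /* load with a variable stride */
  }

was previously vectorised element by element; with this patch the src
access can instead use a gather load whose per-lane offsets are
conceptually { 0, stride, 2 * stride, ... }.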

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
optional tree argument.
* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
null target hooks.
(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
but continue to use the current value as a fallback.
(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
to compare the updates.
* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
(get_load_store_type): Use it when handling a strided access.
(vect_get_strided_load_store_ops): New function.
(vect_get_data_ptr_increment): Likewise.
(vectorizable_load): Handle strided gather loads.  Always pass
a step to vect_create_data_ref_ptr and bump_vector_ptr.

gcc/testsuite/
* gcc.target/aarch64/sve/strided_load_1.c: New test.
* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
* gcc.target/aarch64/sve/strided_load_3.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256641

Richard Sandiford [Sat, 13 Jan 2018 18:01:34 +0000 (18:01 +0000)]
Add support for SVE gather loads

This patch adds support for SVE gather loads.  It uses basically
the same analysis code as the AVX gather support, but after that
there are two major differences:

- It uses new internal functions rather than target built-ins.
  The interface is:

     IFN_GATHER_LOAD (base, offsets, scale)
     IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)

  which should be reasonably generic.  One of the advantages of
  using internal functions is that other passes can understand what
  the functions do, but a more immediate advantage is that we can
  query the underlying target pattern to see which scales it supports.

- It uses pattern recognition to convert the offset to the right width,
  if it was originally narrower than that.  This avoids having to do
  a widening operation as part of the gather expansion itself.
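
As a sketch (with illustrative names), a loop like:

  void f (float *restrict y, float *restrict x,
          unsigned short *restrict idx, int n)
  {
    for (int i = 0; i < n; ++i)
      y[i] = x[idx[i]];   /* 16-bit offsets, widened by the new pattern */
  }

conceptually turns each vector iteration into
IFN_MASK_GATHER_LOAD (x, <widened offset vector>, sizeof (float), mask)
in a fully-masked loop, or IFN_GATHER_LOAD without the mask otherwise.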

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (gather_load@var{m}): Document.
(mask_gather_load@var{m}): Likewise.
* genopinit.c (main): Add supports_vec_gather_load and
supports_vec_gather_load_cached to target_optabs.
* optabs-tree.c (init_tree_optimization_optabs): Use
ggc_cleared_alloc to allocate target_optabs.
* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
functions.
* internal-fn.h (internal_load_fn_p): Declare.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* internal-fn.c (gather_load_direct): New macro.
(expand_gather_load_optab_fn): New function.
(direct_gather_load_optab_supported_p): New macro.
(direct_internal_fn_optab): New function.
(internal_load_fn_p): Likewise.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function.
(supports_vec_gather_load_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Declare.
* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
and memory_type field.
(NUM_PATTERNS): Bump to 15.
* tree-vect-data-refs.c: Include internal-fn.h.
(vect_gather_scatter_fn_p): New function.
(vect_describe_gather_scatter_call): Likewise.
(vect_check_gather_scatter): Try using internal functions for
gather loads.  Recognize existing calls to a gather load function.
(vect_analyze_data_refs): Consider using gather loads if
supports_vec_gather_load_p.
* tree-vect-patterns.c (vect_get_load_store_mask): New function.
(vect_get_gather_scatter_offset_type): Likewise.
(vect_convert_mask_for_vectype): Likewise.
(vect_add_conversion_to_patterm): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_gather_scatter_pattern): New pattern recognizer.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_mask_index and internal_gather_scatter_fn_p.
(check_load_store_masking): Take the gather_scatter_info as an
argument and handle gather loads.
(vect_get_gather_scatter_ops): New function.
(vectorizable_call): Check internal_load_fn_p.
(vectorizable_load): Likewise.  Handle gather load internal
functions.
(vectorizable_store): Update call to check_load_store_masking.
* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
(aarch64_gather_scale_operand_d): New predicates.
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
(mask_gather_load<mode>): New insns.

gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c: New test.
* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256640

Richard Sandiford [Sat, 13 Jan 2018 18:01:24 +0000 (18:01 +0000)]
Add support for in-order addition reduction using SVE FADDA

This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.

Previously vect_is_simple_reduction would reject any cases that forbid
reassociation.  The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
for them.  Although this patch only handles the particular case of plus
and minus on floating-point types, there's no reason in principle why
we couldn't handle other cases.

The reductions use a new fold_left_plus_optab if available, otherwise
they fall back to elementwise additions or subtractions.

The vect_force_simple_reduction change makes it easier for parloops
to read the type of reduction.
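
For example, without -ffast-math the additions in:

  double f (double *x, int n)
  {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
      sum += x[i];     /* must be evaluated in order under strict IEEE */
    return sum;
  }

cannot be reassociated, so the loop used to be rejected.  It is now
accepted as a FOLD_LEFT_REDUCTION and, on SVE, implemented with FADDA;
targets without fold_left_plus fall back to elementwise additions.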

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* optabs.def (fold_left_plus_optab): New optab.
* doc/md.texi (fold_left_plus_@var{m}): Document.
* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
* internal-fn.c (fold_left_direct): Define.
(expand_fold_left_optab_fn): Likewise.
(direct_fold_left_optab_supported_p): Likewise.
* fold-const-call.c (fold_const_fold_left): New function.
(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
* tree-parloops.c (valid_reduction_p): New function.
(gather_scalar_reductions): Use it.
* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
(vect_finish_replace_stmt): Declare.
* tree-vect-loop.c (fold_left_reduction_fn): New function.
(needs_fold_left_reduction_p): New function, split out from...
(vect_is_simple_reduction): ...here.  Accept reductions that
forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
(vect_force_simple_reduction): Also store the reduction type in
the assignment's STMT_VINFO_REDUC_TYPE.
(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
(merge_with_identity): New function.
(vect_expand_fold_left): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
scalar phi in place for it.  Check for target support and reject
cases that would reassociate the operation.  Defer the transform
phase to vectorize_fold_left_reduction.
* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.

gcc/testsuite/
* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
check for a message about using in-order reductions.
* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
vectorized and check for a message about using in-order reductions.
Expect targets with variable-length vectors to fall back to the
fixed-length minimum.
* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
vect_fold_left_plus.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256639

Richard Sandiford [Sat, 13 Jan 2018 18:01:14 +0000 (18:01 +0000)]
Remove unnecessary temporary in tree-if-conv.c

The call to ifc_temp_var in predicate_mem_writes became redundant
in r230099.  Before that point the mask was calculated using
fold_build_*s, but now it's calculated by gimple_build and so
is already a valid gimple value.

As it stands, the call forces an SSA_NAME-to-SSA_NAME copy
to be created, whereas SLP expects that such redundant copies
have already been eliminated.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* tree-if-conv.c (predicate_mem_writes): Remove redundant
call to ifc_temp_var.

From-SVN: r256638

Richard Sandiford [Sat, 13 Jan 2018 18:00:59 +0000 (18:00 +0000)]
Rework the legitimize_address_displacement hook

This patch:

- tweaks the handling of legitimize_address_displacement
  so that it gets called before rather than after the address has
  been expanded.  This means that we're no longer at the mercy
  of LRA being able to interpret the expanded instructions.

- passes the original offset to legitimize_address_displacement.

- adds SVE support to the AArch64 implementation of
  legitimize_address_displacement.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* target.def (legitimize_address_displacement): Take the original
offset as a poly_int.
* targhooks.h (default_legitimize_address_displacement): Update
accordingly.
* targhooks.c (default_legitimize_address_displacement): Likewise.
* doc/tm.texi: Regenerate.
* lra-constraints.c (base_plus_disp_to_reg): Take the displacement
as an argument, moving assert of ad->disp == ad->disp_term to...
(process_address_1): ...here.  Update calls to base_plus_disp_to_reg.
Try calling targetm.legitimize_address_displacement before expanding
the address rather than afterwards, and adjust for the new interface.
* config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
Match the new hook interface.  Handle SVE addresses.
* config/sh/sh.c (sh_legitimize_address_displacement): Match the
new hook interface.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256637

Richard Sandiford [Sat, 13 Jan 2018 18:00:51 +0000 (18:00 +0000)]
Add an "early rematerialisation" pass

This patch looks for pseudo registers that are live across a call
and for which no call-preserved hard registers exist.  It then
recomputes the pseudos as necessary to ensure that they are no
longer live across a call.  The comment at the head of the file
describes the approach.

A new target hook selects which modes should be treated in this way.
By default none are, in which case the pass is skipped very early.

It might also be worth looking for cases like:

   C1: R1 := f (...)
   ...
   C2: R2 := f (...)
   C3: R1 := C2

and giving the same value number to C1 and C3, effectively treating
it like:

   C1: R1 := f (...)
   ...
   C2: R2 := f (...)
   C3: R1 := f (...)

Another (much more expensive) enhancement would be to apply value
numbering to all pseudo registers (not just rematerialisation
candidates), so that we can handle things like:

  C1: R1 := f (...R2...)
  ...
  C2: R1 := f (...R3...)

where R2 and R3 hold the same value.  But the current pass seems
to catch the vast majority of cases.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* Makefile.in (OBJS): Add early-remat.o.
* target.def (select_early_remat_modes): New hook.
* doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_select_early_remat_modes): Declare.
* targhooks.c (default_select_early_remat_modes): New function.
* timevar.def (TV_EARLY_REMAT): New timevar.
* passes.def (pass_early_remat): New pass.
* tree-pass.h (make_pass_early_remat): Declare.
* early-remat.c: New file.
* config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
function.
(TARGET_SELECT_EARLY_REMAT_MODES): Define.

gcc/testsuite/
* gcc.target/aarch64/sve/spill_1.c: Also test that no predicates
are spilled.
* gcc.target/aarch64/sve/spill_2.c: New test.
* gcc.target/aarch64/sve/spill_3.c: Likewise.
* gcc.target/aarch64/sve/spill_4.c: Likewise.
* gcc.target/aarch64/sve/spill_5.c: Likewise.
* gcc.target/aarch64/sve/spill_6.c: Likewise.
* gcc.target/aarch64/sve/spill_7.c: Likewise.

From-SVN: r256636

Richard Sandiford [Sat, 13 Jan 2018 18:00:41 +0000 (18:00 +0000)]
Use single-iteration epilogues when peeling for gaps

This patch adds support for fully-masking loops that require peeling
for gaps.  It peels exactly one scalar iteration and uses the masked
loop to handle the rest.  Previously we would fall back on using a
standard unmasked loop instead.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace
vfm1 with a bound_epilog parameter.
(vect_do_peeling): Update calls accordingly, and move the prologue
call earlier in the function.  Treat the base bound_epilog as 0 for
fully-masked loops and retain vf - 1 for other loops.  Add 1 to
this base when peeling for gaps.
* tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps
with fully-masked loops.
(vect_estimate_min_profitable_iters): Handle the single peeled
iteration in that case.

gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: Check the number
of branches.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_20.c: New test.
* gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256635

Richard Sandiford [Sat, 13 Jan 2018 18:00:31 +0000 (18:00 +0000)]
Allow single-element interleaving for non-power-of-2 strides

This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.
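
For example (a sketch):

  void f (int *restrict dst, int *restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dst[i] = src[i * 3];   /* group size 3, only one element used */
  }

The src access can now be treated as single-element interleaving and
implemented with LD3 (keeping one of the three loaded vectors), rather
than being rejected because 3 is not a power of 2.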

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
single-element interleaving even if the size is not a power of 2.
* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
accesses for single-element interleaving if the group size is
not a power of 2.

gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: New test.
* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256634

Richard Sandiford [Sat, 13 Jan 2018 17:59:59 +0000 (17:59 +0000)]
Add support for conditional reductions using SVE CLASTB

This patch uses SVE CLASTB to optimise conditional reductions.  It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
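
For example, in a conditional reduction such as (a sketch):

  int f (int *a, int *b, int n)
  {
    int last = -1;
    for (int i = 0; i < n; ++i)
      if (a[i] < b[i])
        last = a[i];   /* result comes from the last matching iteration */
    return last;
  }

the old scheme kept a separate vector of iteration indices to work out
which match was most recent; with CLASTB the accumulator is simply
updated from the last active lane on each vector iteration.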

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (fold_extract_last_@var{m}): Document.
* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
* optabs.def (fold_extract_last_optab): New optab.
* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
* internal-fn.c (fold_extract_direct): New macro.
(expand_fold_extract_optab_fn): Likewise.
(direct_fold_extract_optab_supported_p): Likewise.
* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
* tree-vect-loop.c (vect_model_reduction_cost): Handle
EXTRACT_LAST_REDUCTION.
(get_initial_def_for_reduction): Do not create an initial vector
for EXTRACT_LAST_REDUCTION reductions.
(vectorizable_reduction): Leave the scalar phi in place for
EXTRACT_LAST_REDUCTIONs.  Try using EXTRACT_LAST_REDUCTION
ahead of INTEGER_INDUC_COND_REDUCTION.  Do not check for an
epilogue code for EXTRACT_LAST_REDUCTION and defer the
transform phase to vectorizable_condition.
* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
split out from...
(vect_finish_stmt_generation): ...here.
(vect_finish_replace_stmt): New function.
(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
pattern.
* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_vect_fold_extract_last): New proc.
* gcc.dg/vect/pr65947-1.c: Update dump messages.  Add markup
for fold_extract_last.
* gcc.dg/vect/pr65947-2.c: Likewise.
* gcc.dg/vect/pr65947-3.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr65947-5.c: Likewise.
* gcc.dg/vect/pr65947-6.c: Likewise.
* gcc.dg/vect/pr65947-9.c: Likewise.
* gcc.dg/vect/pr65947-10.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-14.c: Likewise.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.target/aarch64/sve/clastb_1.c: New test.
* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256633

Richard Sandiford [Sat, 13 Jan 2018 17:59:50 +0000 (17:59 +0000)]
Add support for vectorising live-out values using SVE LASTB

This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop.  Previously this situation would stop us from
using a fully-masked loop.
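
For example (a sketch):

  int f (int *a, int *b, int n)
  {
    int x = 0;
    for (int i = 0; i < n; ++i)
      {
        x = a[i] + 1;
        b[i] = x;
      }
    return x;          /* x is live after the loop */
  }

In a fully-masked loop the final value of x is now taken from the last
active lane using the new extract_last optab (LASTB on SVE), so the
live-out value no longer forces an unmasked loop with a scalar tail.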

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (extract_last_@var{m}): Document.
* optabs.def (extract_last_optab): New optab.
* internal-fn.def (EXTRACT_LAST): New internal function.
* internal-fn.c (cond_unary_direct): New macro.
(expand_cond_unary_optab_fn): Likewise.
(direct_cond_unary_optab_supported_p): Likewise.
* tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked
loops using EXTRACT_LAST.
* config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to...
(extract_last_<mode>): ...this optab.
(vec_extract<mode><Vel>): Update accordingly.

gcc/testsuite/
* gcc.target/aarch64/sve/live_1.c: New test.
* gcc.target/aarch64/sve/live_1_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256632

Richard Sandiford [Sat, 13 Jan 2018 17:59:40 +0000 (17:59 +0000)]
Add an empty_mask_is_expensive hook

This patch adds a hook to control whether we avoid executing masked
(predicated) stores when the mask is all false.  We don't want to do
that by default for SVE.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* target.def (empty_mask_is_expensive): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_empty_mask_is_expensive): Declare.
* targhooks.c (default_empty_mask_is_expensive): New function.
* tree-vectorizer.c (vectorize_loops): Only call optimize_mask_stores
if the target says that empty masks are expensive.
* config/aarch64/aarch64.c (aarch64_empty_mask_is_expensive):
New function.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Redefine.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256631

Richard Sandiford [Sat, 13 Jan 2018 17:59:32 +0000 (17:59 +0000)]
Handle peeling for alignment with masking

This patch adds support for aligning vectors by using a partial
first iteration.  E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.

On SVE, the optimisation is only useful for vector-length-specific
code.  Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
(vect_use_loop_mask_for_alignment_p): New function.
(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
niters_skip argument.  Make sure that the first niters_skip elements
of the first iteration are inactive.
(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
Update call to vect_set_loop_masks_directly.
(get_misalign_in_elems): New function, split out from...
(vect_gen_prolog_loop_niters): ...here.
(vect_update_init_of_dr): Take a code argument that specifies whether
the adjustment should be added or subtracted.
(vect_update_init_of_drs): Likewise.
(vect_prepare_for_masked_peels): New function.
(vect_do_peeling): Skip prologue peeling if we're using a mask
instead.  Update call to vect_update_inits_of_drs.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_skip_niters.
(vect_analyze_loop_2): Allow fully-masked loops with peeling for
alignment.  Do not include the number of peeled iterations in
the minimum threshold in that case.
(vectorizable_induction): Adjust the start value down by
LOOP_VINFO_MASK_SKIP_NITERS iterations.
(vect_transform_loop): Call vect_prepare_for_masked_peels.
Take the number of skipped iterations into account when calculating
the loop bounds.
* tree-vect-stmts.c (vect_gen_while_not): New function.

gcc/testsuite/
* gcc.target/aarch64/sve/nopeel_1.c: New test.
* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256630

Richard Sandiford [Sat, 13 Jan 2018 17:59:23 +0000 (17:59 +0000)]
Allow the number of iterations to be smaller than VF

Fully-masked loops can be profitable even if the iteration
count is smaller than the vectorisation factor.  In this case
we're effectively doing a complete unroll followed by SLP.

The documentation for min-vect-loop-bound says that the
default value was 0, but actually the default and minimum
were 1.  We need it to be 0 for this case since the parameter
counts a whole number of vector iterations.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/sourcebuild.texi (vect_fully_masked): Document.
* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
default value to 0.
* tree-vect-loop.c (vect_analyze_loop_costing): New function,
split out from...
(vect_analyze_loop_2): ...here. Don't check the vectorization
factor against the number of loop iterations if the loop is
fully-masked.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_fully_masked):
New proc.
* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
vect_fully_masked.
* gcc.target/aarch64/sve/loop_add_4.c: New test.
* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
* gcc.target/aarch64/sve/miniloop_2.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256629

Richard Sandiford [Sat, 13 Jan 2018 17:59:15 +0000 (17:59 +0000)]
Make ivopts handle calls to internal functions

ivopts previously treated pointer arguments to internal functions
like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
This patch makes it treat them as addresses instead.  This makes
a significant difference to the code quality for SVE loops,
since we can then use loads and stores with scaled indices.
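
A rough sketch of the difference, with the gimple shown schematically
rather than as an exact dump:

  ptr_1 = &base[i_2];
  val_3 = .MASK_LOAD (ptr_1, align, mask_4);

Previously ivopts treated ptr_1 as an ordinary value, so the address
computation could not be folded into the load; treating it as an
address use lets ivopts choose candidates of the form base + i * scale,
which map directly onto SVE's scaled-index addressing modes.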

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-ssa-loop-ivopts.c (USE_ADDRESS): Split into...
(USE_REF_ADDRESS, USE_PTR_ADDRESS): ...these new use types.
(dump_groups): Update accordingly.
(iv_use::mem_type): New member variable.
(address_p): New function.
(record_use): Add a mem_type argument and initialize the new
mem_type field.
(record_group_use): Add a mem_type argument.  Use address_p.
Remove obsolete null checks of base_object.  Update call to record_use.
(find_interesting_uses_op): Update call to record_group_use.
(find_interesting_uses_cond): Likewise.
(find_interesting_uses_address): Likewise.
(get_mem_type_for_internal_fn): New function.
(find_address_like_use): Likewise.
(find_interesting_uses_stmt): Try find_address_like_use before
calling find_interesting_uses_op.
(addr_offset_valid_p): Use the iv mem_type field as the type
of the addressed memory.
(add_autoinc_candidates): Likewise.
(get_address_cost): Likewise.
(split_small_address_groups_p): Use address_p.
(split_address_groups): Likewise.
(add_iv_candidate_for_use): Likewise.
(autoinc_possible_for_pair): Likewise.
(rewrite_groups): Likewise.
(get_use_type): Check for USE_REF_ADDRESS instead of USE_ADDRESS.
(determine_group_iv_cost): Update after split of USE_ADDRESS.
(get_alias_ptr_type_for_ptr_address): New function.
(rewrite_use_address): Rewrite address uses in calls that were
identified by find_address_like_use.

gcc/testsuite/
* gcc.dg/tree-ssa/scev-9.c: Expect REFERENCE ADDRESS
instead of just ADDRESS.
* gcc.dg/tree-ssa/scev-10.c: Likewise.
* gcc.dg/tree-ssa/scev-11.c: Likewise.
* gcc.dg/tree-ssa/scev-12.c: Likewise.
* gcc.target/aarch64/sve/index_offset_1.c: New test.
* gcc.target/aarch64/sve/index_offset_1_run.c: Likewise.
* gcc.target/aarch64/sve/loop_add_2.c: Likewise.
* gcc.target/aarch64/sve/loop_add_3.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Check for indexed addressing modes.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256628

Richard Sandiford [Sat, 13 Jan 2018 17:59:08 +0000 (17:59 +0000)]
Allow ADDR_EXPRs of TARGET_MEM_REFs

This patch allows ADDR_EXPR <TARGET_MEM_REF ...>, which is useful
when calling internal functions that take pointers to memory that
is conditionally loaded or stored.  This is a prerequisite to the
following ivopts patch.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* expr.c (expand_expr_addr_expr_1): Handle ADDR_EXPRs of
TARGET_MEM_REFs.
* gimple-expr.h (is_gimple_addressable): Likewise.
* gimple-expr.c (is_gimple_address): Likewise.
* internal-fn.c (expand_call_mem_ref): New function.
(expand_mask_load_optab_fn): Use it.
(expand_mask_store_optab_fn): Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256627

Richard Sandiford [Sat, 13 Jan 2018 17:59:00 +0000 (17:59 +0000)]
Add support for reductions in fully-masked loops

This patch removes the restriction that fully-masked loops cannot
have reductions.  The key thing here is to make sure that the
reduction accumulator doesn't include any values associated with
inactive lanes; the patch adds a bunch of conditional binary
operations for doing that.
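
The per-lane behaviour these operations provide is conceptually (scalar
pseudocode, not the actual internal-function signature):

  for (int lane = 0; lane < lanes_per_vector; ++lane)
    if (mask[lane])
      acc[lane] += val[lane];   /* inactive lanes keep their old value */

which is what allows the reduction accumulator to ignore lanes that the
loop mask has switched off, e.g. in the final iteration.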

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
(cond_umax@var{mode}): Document.
* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
(cond_umin_optab, cond_umax_optab): New optabs.
* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
(COND_IOR, COND_XOR): New internal functions.
* internal-fn.h (get_conditional_internal_fn): Declare.
* internal-fn.c (cond_binary_direct): New macro.
(expand_cond_binary_optab_fn): Likewise.
(direct_cond_binary_optab_supported_p): Likewise.
(get_conditional_internal_fn): New function.
* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
Cope with reduction statements that are vectorized as calls rather
than assignments.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
(UNSPEC_COND_EOR): New unspecs.
(optab): Add mappings for them.
(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
(sve_int_op, sve_fp_op): New int attributes.

gcc/testsuite/
* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
to be predicated.
* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/reduc_5.c: New test.
* gcc.target/aarch64/sve/slp_13.c: Likewise.
* gcc.target/aarch64/sve/slp_13_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256626

Richard Sandiford [Sat, 13 Jan 2018 17:58:52 +0000 (17:58 +0000)]
Add support for fully-predicated loops

This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound.  This operation is wrapped up in a new internal
function called WHILE_ULT.  E.g.:

   WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 }
   WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }

The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.

Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.

This patch just handles the simple cases, punting for things like
reductions and live-out values.  Later patches remove most of these
restrictions.
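
As a sketch, a simple loop such as:

  void f (int *a, int *b, int n)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[i] + 1;
  }

can now be vectorised as a single loop whose every iteration computes
mask = WHILE_ULT (i, n, ...) and uses masked loads and stores, so no
scalar tail is needed even when n is not a multiple of the number of
elements per vector.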

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function.
* internal-fn.h (direct_internal_fn_supported_p): New override
that takes two types as argument.
* internal-fn.c (while_direct): New macro.
(expand_while_optab_fn): New function.
(convert_optab_supported_p): Likewise.
(direct_while_optab_supported_p): New macro.
* wide-int.h (wi::udiv_ceil): New function.
* tree-vectorizer.h (rgroup_masks): New structure.
(vec_loop_masks): New typedef.
(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
and fully_masked_p.
(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
(vect_max_vf): New function.
(slpeel_make_loop_iterate_ntimes): Delete.
(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
internal-fn.h, stor-layout.h and optabs-query.h.
(vect_set_loop_mask): New function.
(add_preheader_seq): Likewise.
(add_header_seq): Likewise.
(interleave_supported_p): Likewise.
(vect_maybe_permute_loop_masks): Likewise.
(vect_set_loop_masks_directly): Likewise.
(vect_set_loop_condition_masked): Likewise.
(vect_set_loop_condition_unmasked): New function, split out from
slpeel_make_loop_iterate_ntimes.
(slpeel_make_loop_iterate_ntimes): Rename to...
(vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
(vect_do_peeling): Update call accordingly.
(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
loops.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_compare_type, can_fully_mask_p and fully_masked_p.
(release_vec_loop_masks): New function.
(_loop_vec_info): Use it to free the loop masks.
(can_produce_all_loop_masks_p): New function.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_verify_full_masking): Likewise.
(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
retries, and free the mask rgroups before retrying.  Check loop-wide
reasons for disallowing fully-masked loops.  Make the final decision
about whether use a fully-masked loop or not.
(vect_estimate_min_profitable_iters): Do not assume that peeling
for the number of iterations will be needed for fully-masked loops.
(vectorizable_reduction): Disable fully-masked loops.
(vectorizable_live_operation): Likewise.
(vect_halve_mask_nunits): New function.
(vect_double_mask_nunits): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_transform_loop): Handle the case in which the final loop
iteration might handle a partial vector.  Call vect_set_loop_condition
instead of slpeel_make_loop_iterate_ntimes.
* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
(check_load_store_masking): New function.
(prepare_load_store_mask): Likewise.
(vectorizable_store): Handle fully-masked loops.
(vectorizable_load): Likewise.
(supportable_widening_operation): Use vect_halve_mask_nunits for
booleans.
(supportable_narrowing_operation): Likewise vect_double_mask_nunits.
(vect_gen_while): New function.
* config/aarch64/aarch64.md (umax<mode>3): New expander.
(aarch64_uqdec<mode>): New insn.

gcc/testsuite/
* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
* gcc.dg/tree-ssa/peel1.c: Likewise.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
variable-length vectors.
* gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND.
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT.
* gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_8.c: New test.
* gcc.target/aarch64/sve/slp_8_run.c: Likewise.
* gcc.target/aarch64/sve/slp_9.c: Likewise.
* gcc.target/aarch64/sve/slp_9_run.c: Likewise.
* gcc.target/aarch64/sve/slp_10.c: Likewise.
* gcc.target/aarch64/sve/slp_10_run.c: Likewise.
* gcc.target/aarch64/sve/slp_11.c: Likewise.
* gcc.target/aarch64/sve/slp_11_run.c: Likewise.
* gcc.target/aarch64/sve/slp_12.c: Likewise.
* gcc.target/aarch64/sve/slp_12_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2_run.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Likewise.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256625

Richard Sandiford [Sat, 13 Jan 2018 17:58:42 +0000 (17:58 +0000)]
Add support for bitwise reductions

This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
REDUC_* operators.
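
For example (a sketch):

  unsigned f (unsigned *a, int n)
  {
    unsigned acc = 0;
    for (int i = 0; i < n; ++i)
      acc |= a[i];
    return acc;
  }

Reductions like this (and the & and ^ equivalents) can now be
vectorised via IFN_REDUC_IOR, IFN_REDUC_AND and IFN_REDUC_XOR, which on
SVE map onto ORV, ANDV and EORV.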

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
(reduc_xor_scal_optab): New optabs.
* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
(reduc_xor_scal_@var{m}): Document.
* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
internal functions.
* fold-const-call.c (fold_const_call): Handle them.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>):
(*reduc_<bit_reduc>_scal_<mode>): New patterns.
* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
(UNSPEC_XORV): New unspecs.
(optab): Add entries for them.
(BITWISEV): New int iterator.
(bit_reduc_op): New int attributes.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
New proc.
* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
and add an associated scan-dump test.  Prevent vectorization
of the first two loops.
* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
(INIT_VECTOR): Tweak initial value so that some bits are always set.
* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256624

6 years agoSLP reductions with variable-length vectors
Richard Sandiford [Sat, 13 Jan 2018 17:58:33 +0000 (17:58 +0000)]
SLP reductions with variable-length vectors

Two things stopped us using SLP reductions with variable-length vectors:

(1) We didn't have a way of constructing the initial vector.
    This patch does it by creating a vector full of the neutral
    identity value and then using a shift-and-insert function
    to insert any non-identity inputs into the low-numbered elements.
    (The non-identity values are needed for double reductions.)
    Alternatively, for unchained MIN/MAX reductions that have no neutral
    value, we instead use the same duplicate-and-interleave approach as
    for SLP constant and external definitions (added by a previous
    patch).

(2) The epilogue for constant-length vectors would extract the vector
    elements associated with each SLP statement and do scalar arithmetic
    on these individual elements.  For variable-length vectors, the patch
    instead creates a reduction vector for each SLP statement, replacing
    the elements for other SLP statements with the identity value.
    It then uses a hardware reduction instruction on each vector.
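
A minimal sketch of the kind of loop this enables (hypothetical code,
not taken from the patch): the two accumulators form one SLP node, so
the vectoriser needs the initial-vector and epilogue handling described
above when the vector length is variable.

    void
    slp_sums (int *restrict a, int n, int *restrict out)
    {
      int sum0 = 0, sum1 = 0;
      for (int i = 0; i < n; i++)
        {
          sum0 += a[i * 2];      /* SLP statement 1 */
          sum1 += a[i * 2 + 1];  /* SLP statement 2 */
        }
      out[0] = sum0;
      out[1] = sum1;
    }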

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (vec_shl_insert_@var{m}): New optab.
* internal-fn.def (VEC_SHL_INSERT): New internal function.
* optabs.def (vec_shl_insert_optab): New optab.
* tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
(duplicate_and_interleave): Likewise.
* tree-vect-loop.c: Include internal-fn.h.
(neutral_op_for_slp_reduction): New function, split out from
get_initial_defs_for_reduction.
(get_initial_def_for_reduction): Handle option 2 for variable-length
vectors by loading the neutral value into a vector and then shifting
the initial value into element 0.
(get_initial_defs_for_reduction): Replace the code argument with
the neutral value calculated by neutral_op_for_slp_reduction.
Use gimple_build_vector for constant-length vectors.
Use IFN_VEC_SHL_INSERT for variable-length vectors if all
but the first group_size elements have a neutral value.
Use duplicate_and_interleave otherwise.
(vect_create_epilog_for_reduction): Take a neutral_op parameter.
Update call to get_initial_defs_for_reduction.  Handle SLP
reductions for variable-length vectors by creating one vector
result for each scalar result, with the elements associated
with other scalar results stubbed out with the neutral value.
(vectorizable_reduction): Call neutral_op_for_slp_reduction.
Require IFN_VEC_SHL_INSERT for double reductions on
variable-length vectors, or SLP reductions that have
a neutral value.  Require can_duplicate_and_interleave_p
support for variable-length unchained SLP reductions if there
is no neutral value, such as for MIN/MAX reductions.  Also require
the number of vector elements to be a multiple of the number of
SLP statements when doing variable-length unchained SLP reductions.
Update call to vect_create_epilog_for_reduction.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
and remove initial values.
(duplicate_and_interleave): Make public.
* config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
* config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.

gcc/testsuite/
* gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.target/aarch64/sve/slp_5.c: New test.
* gcc.target/aarch64/sve/slp_5_run.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_6_run.c: Likewise.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/slp_7_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256623

6 years agoHandle more SLP constant and extern definitions for variable VF
Richard Sandiford [Sat, 13 Jan 2018 17:58:14 +0000 (17:58 +0000)]
Handle more SLP constant and extern definitions for variable VF

This patch adds support for vectorising SLP definitions that are
constant or external (i.e. from outside the loop) when the vectorisation
factor isn't known at compile time.  It can only handle cases where the
number of SLP statements is a power of 2.
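
A minimal sketch (hypothetical code, not from the patch) of an SLP
group with constant definitions: the group size is 2, a power of 2,
so it can now be vectorised even when the VF is only known at run time.

    void
    fill_pairs (int *restrict a, int n)
    {
      for (int i = 0; i < n; i++)
        {
          a[i * 2] = 1;      /* constant SLP definition */
          a[i * 2 + 1] = 2;  /* constant SLP definition */
        }
    }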

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
(can_duplicate_and_interleave_p): New function.
(vect_get_and_check_slp_defs): Take the vector of statements
rather than just the current one.  Remove excess parentheses.
Restrict rejection of vect_constant_def and vect_external_def
for variable-length vectors to boolean types, or types for which
can_duplicate_and_interleave_p is false.
(vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
(duplicate_and_interleave): New function.
(vect_get_constant_vectors): Use gimple_build_vector for
constant-length vectors and suitable variable-length constant
vectors.  Use duplicate_and_interleave for other variable-length
vectors.  Don't defer the update when inserting new statements.

gcc/testsuite/
* gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
&& vect_load_lanes.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-20.c: Likewise.
* gcc.dg/vect/slp-21.c: Likewise.
* gcc.dg/vect/slp-22.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-24.c: Likewise.
* gcc.dg/vect/slp-28.c: Likewise.
* gcc.dg/vect/slp-39.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-9.c: Likewise.
* gcc.dg/vect/slp-multitypes-10.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-14.c: Likewise.
* gcc.dg/vect/slp-15.c: Likewise.
* gcc.dg/vect/slp-multitypes-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-4.c: Likewise.
* gcc.dg/vect/slp-multitypes-5.c: Likewise.
* gcc.target/aarch64/sve/slp_1.c: New test.
* gcc.target/aarch64/sve/slp_1_run.c: Likewise.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_2_run.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_3_run.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256622

6 years agoProtect against min_profitable_iters going negative
Richard Sandiford [Sat, 13 Jan 2018 17:58:06 +0000 (17:58 +0000)]
Protect against min_profitable_iters going negative

We had:

      if (vec_outside_cost <= 0)
        min_profitable_iters = 0;
      else
        {
          min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
                                  * assumed_vf
                                  - vec_inside_cost * peel_iters_prologue
                                  - vec_inside_cost * peel_iters_epilogue)
                                 / ((scalar_single_iter_cost * assumed_vf)
                                    - vec_inside_cost);

which can lead to negative min_profitable_iters when the *_outside_costs
are the same and peel_iters_epilogue is nonzero (e.g. if we're peeling
for gaps).
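
For example (illustrative numbers only, not from the patch): with equal
outside costs, peel_iters_prologue == 0, peel_iters_epilogue == 1 and
vec_inside_cost == 4, the numerator is 0 * assumed_vf - 0 - 4 = -4,
so the division yields a negative min_profitable_iters.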

This is tested as part of the patch that adds support for fully-predicated
loops.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure
min_profitable_iters doesn't go negative.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256621

6 years agoAdd support for masked load/store_lanes
Richard Sandiford [Sat, 13 Jan 2018 17:57:57 +0000 (17:57 +0000)]
Add support for masked load/store_lanes

This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.

The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement.  In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.

The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
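
A minimal sketch (hypothetical code, not from the patch) of a loop that
needs the new functions: the interleaved pair is only accessed when
cond[i] is nonzero, so the group must be loaded with
IFN_MASK_LOAD_LANES (a predicated LD2 on SVE) rather than a plain
load-lanes.

    void
    masked_pair_sum (int *restrict dest, int *restrict src,
                     int *restrict cond, int n)
    {
      for (int i = 0; i < n; i++)
        if (cond[i])
          dest[i] = src[i * 2] + src[i * 2 + 1];
    }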

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
(vec_mask_store_lanes@var{m}@var{n}): Likewise.
* optabs.def (vec_mask_load_lanes_optab): New optab.
(vec_mask_store_lanes_optab): Likewise.
* internal-fn.def (MASK_LOAD_LANES): New internal function.
(MASK_STORE_LANES): Likewise.
* internal-fn.c (mask_load_lanes_direct): New macro.
(mask_store_lanes_direct): Likewise.
(expand_mask_load_optab_fn): Handle masked operations.
(expand_mask_load_lanes_optab_fn): New macro.
(expand_mask_store_optab_fn): Handle masked operations.
(expand_mask_store_lanes_optab_fn): New macro.
(direct_mask_load_lanes_optab_supported_p): Likewise.
(direct_mask_store_lanes_optab_supported_p): Likewise.
* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-data-refs.c (strip_conversion): New function.
(can_group_stmts_p): Likewise.
(vect_analyze_data_ref_accesses): Use it instead of checking
for a pair of assignments.
(vect_store_lanes_supported): Take a masked_p parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
vect_store_lanes_supported and vect_load_lanes_supported.
* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
parameter.  Don't allow gaps for masked accesses.
Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
and vect_load_lanes_supported.
(get_load_store_type): Take a masked_p parameter and update
call to get_group_load_store_type.
(vectorizable_store): Update call to get_load_store_type.
Handle IFN_MASK_STORE_LANES.
(vectorizable_load): Update call to get_load_store_type.
Handle IFN_MASK_LOAD_LANES.

gcc/testsuite/
* gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256620

6 years ago[AArch64] Tests for SVE structure modes
Richard Sandiford [Sat, 13 Jan 2018 17:57:47 +0000 (17:57 +0000)]
[AArch64] Tests for SVE structure modes

This patch adds tests for the SVE structure mode move patterns
and for LD[234] and ST[234] vectorisation.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
* gcc.target/aarch64/sve/struct_move_1.c: New test.
* gcc.target/aarch64/sve/struct_move_2.c: Likewise.
* gcc.target/aarch64/sve/struct_move_3.c: Likewise.
* gcc.target/aarch64/sve/struct_move_4.c: Likewise.
* gcc.target/aarch64/sve/struct_move_5.c: Likewise.
* gcc.target/aarch64/sve/struct_move_6.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_1.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_1_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_5_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_6.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_6_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256619

6 years ago[AArch64] SVE load/store_lanes support
Richard Sandiford [Sat, 13 Jan 2018 17:57:36 +0000 (17:57 +0000)]
[AArch64] SVE load/store_lanes support

This patch adds support for SVE LD[234], ST[234] and associated
structure modes.  Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.
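
A minimal sketch (hypothetical code, not from the patch) of a loop that
these patterns target: the interleaved loads from "in" map to an SVE
LD2 using the new structure modes.

    void
    pair_sums (float *restrict out, float *restrict in, int n)
    {
      for (int i = 0; i < n; i++)
        out[i] = in[i * 2] + in[i * 2 + 1];
    }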

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
modes for SVE.
* config/aarch64/aarch64-protos.h
(aarch64_sve_struct_memory_operand_p): Declare.
* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
(VPRED, vpred): Handle SVE structure modes.
* config/aarch64/constraints.md (Utx): New constraint.
* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
(aarch64_sve_struct_nonimmediate_operand): New predicates.
* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
structure modes.  Split into pieces after RA.
(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
New patterns.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
SVE structure modes.
(aarch64_classify_address): Likewise.
(sizetochar): Move earlier in file.
(aarch64_print_operand): Handle SVE register lists.
(aarch64_array_mode): New function.
(aarch64_sve_struct_memory_operand_p): Likewise.
(TARGET_ARRAY_MODE): Redefine.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_load_lanes):
Return true for SVE too.
* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
if load/store lanes are supported.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Remove XFAIL for variable-length SVE.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
for variable-length vectors.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256618

6 years agoGive the target more control over ARRAY_TYPE modes
Richard Sandiford [Sat, 13 Jan 2018 17:57:25 +0000 (17:57 +0000)]
Give the target more control over ARRAY_TYPE modes

So far we've used integer modes for LD[234] and ST[234] arrays.
That doesn't scale well to SVE, since the sizes aren't fixed at
compile time (and even if they were, we wouldn't want integers
to be so wide).

This patch lets the target use double-, triple- and quadruple-length
vectors instead.
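
A minimal sketch of what an implementation of the new hook might look
like (simplified; the helper functions are placeholders and the real
aarch64 hook differs in detail): arrays of 2-4 data vectors get a
multi-vector mode instead of an integer mode.

    static opt_machine_mode
    example_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems)
    {
      /* Use an NELEMS-tuple vector mode for small arrays of vectors.  */
      if (example_vector_data_mode_p (mode) && IN_RANGE (nelems, 2, 4))
        return example_struct_vector_mode (mode, nelems);

      /* Otherwise fall back to the default (integer-mode) behaviour.  */
      return opt_machine_mode ();
    }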

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* target.def (array_mode): New target hook.
* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
* doc/tm.texi: Regenerate.
* hooks.h (hook_optmode_mode_uhwi_none): Declare.
* hooks.c (hook_optmode_mode_uhwi_none): New function.
* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
targetm.array_mode.
* stor-layout.c (mode_for_array): Likewise.  Support polynomial
type sizes.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256617

6 years agoFix folding of vector mask EQ/NE expressions
Richard Sandiford [Sat, 13 Jan 2018 17:57:17 +0000 (17:57 +0000)]
Fix folding of vector mask EQ/NE expressions

fold_binary_loc assumed that if the type of the result wasn't a vector,
the operands wouldn't be either.  This isn't necessarily true for
EQ_EXPR and NE_EXPR of vector masks, which can return a single scalar
for the mask as a whole.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* fold-const.c (fold_binary_loc): Check the argument types
rather than the result type when testing for a vector operation.

gcc/testsuite/
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: New test.
* gcc.target/aarch64/sve/vec_bool_cmp_1_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256616

6 years agoSVE unwinding
Richard Sandiford [Sat, 13 Jan 2018 17:56:52 +0000 (17:56 +0000)]
SVE unwinding

This patch adds support for unwinding frames that use the SVE
pseudo VG register.  We want this register to act like a normal
register if the CFI explicitly sets it, but want to provide a
default value otherwise.  Computing the default value requires
an SVE target, so we only want to compute it on demand.

aarch64_vg uses a hard-coded .inst in order to avoid a build
dependency on binutils 2.28 or later.
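
A minimal sketch (assumption: not the exact libgcc code) of a helper
that reads the number of 64-bit granules per vector, i.e. the value of
the VG pseudo register; the real aarch64_vg emits the equivalent
hard-coded .inst so that no SVE-aware assembler is needed.

    static inline unsigned long
    example_read_vg (void)
    {
      unsigned long vg;
      asm ("cntd %0" : "=r" (vg));  /* VG = vector length in 64-bit units.  */
      return vg;
    }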

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* doc/tm.texi.in (DWARF_LAZY_REGISTER_VALUE): Document.
* doc/tm.texi: Regenerate.

libgcc/
* config/aarch64/value-unwind.h (aarch64_vg): New function.
(DWARF_LAZY_REGISTER_VALUE): Define.
* unwind-dw2.c (_Unwind_GetGR): Use DWARF_LAZY_REGISTER_VALUE
to provide a fallback register value.

gcc/testsuite/
* g++.target/aarch64/sve/aarch64-sve.exp: New harness.
* g++.target/aarch64/sve/catch_1.C: New test.
* g++.target/aarch64/sve/catch_2.C: Likewise.
* g++.target/aarch64/sve/catch_3.C: Likewise.
* g++.target/aarch64/sve/catch_4.C: Likewise.
* g++.target/aarch64/sve/catch_5.C: Likewise.
* g++.target/aarch64/sve/catch_6.C: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
From-SVN: r256615

6 years ago[AArch64] SVE tests
Richard Sandiford [Sat, 13 Jan 2018 17:55:24 +0000 (17:55 +0000)]
[AArch64] SVE tests

This patch adds gcc.target/aarch64 tests for SVE, and forces some
existing Advanced SIMD tests to use -march=armv8-a.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_sve_ok):
New proc.
* gcc.target/aarch64/bic_imm_1.c: Use #pragma GCC target "+nosve".
* gcc.target/aarch64/fmaxmin.c: Likewise.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
* gcc.target/aarch64/orr_imm_1.c: Likewise.
* gcc.target/aarch64/pr62178.c: Likewise.
* gcc.target/aarch64/pr71727-2.c: Likewise.
* gcc.target/aarch64/saddw-1.c: Likewise.
* gcc.target/aarch64/saddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-1.c: Likewise.
* gcc.target/aarch64/uaddw-2.c: Likewise.
* gcc.target/aarch64/uaddw-3.c: Likewise.
* gcc.target/aarch64/vect-add-sub-cond.c: Likewise.
* gcc.target/aarch64/vect-compile.c: Likewise.
* gcc.target/aarch64/vect-faddv-compile.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
* gcc.target/aarch64/vect-fmax-fmin-compile.c: Likewise.
* gcc.target/aarch64/vect-fmaxv-fminv-compile.c: Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovd.c: Likewise.
* gcc.target/aarch64/vect-fmovf-zero.c: Likewise.
* gcc.target/aarch64/vect-fmovf.c: Likewise.
* gcc.target/aarch64/vect-fp-compile.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile-fp.c: Likewise.
* gcc.target/aarch64/vect-ld1r-compile.c: Likewise.
* gcc.target/aarch64/vect-movi.c: Likewise.
* gcc.target/aarch64/vect-mull-compile.c: Likewise.
* gcc.target/aarch64/vect-reduc-or_1.c: Likewise.
* gcc.target/aarch64/vect-vaddv.c: Likewise.
* gcc.target/aarch64/vect_saddl_1.c: Likewise.
* gcc.target/aarch64/vect_smlal_1.c: Likewise.
* gcc.target/aarch64/vector_initialization_nostack.c: XFAIL for
fixed-length SVE.
* gcc.target/aarch64/sve/aarch64-sve.exp: New file.
* gcc.target/aarch64/sve/arith_1.c: New test.
* gcc.target/aarch64/sve/const_pred_1.C: Likewise.
* gcc.target/aarch64/sve/const_pred_2.C: Likewise.
* gcc.target/aarch64/sve/const_pred_3.C: Likewise.
* gcc.target/aarch64/sve/const_pred_4.C: Likewise.
* gcc.target/aarch64/sve/cvtf_signed_1.c: Likewise.
* gcc.target/aarch64/sve/cvtf_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/cvtf_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/cvtf_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1_run.c: Likewise.
* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve/ext_1.c: Likewise.
* gcc.target/aarch64/sve/ext_2.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/fabs_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_signed_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/fcvtz_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/fdiv_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1_run.c: Likewise.
* gcc.target/aarch64/sve/fmad_1.c: Likewise.
* gcc.target/aarch64/sve/fmla_1.c: Likewise.
* gcc.target/aarch64/sve/fmls_1.c: Likewise.
* gcc.target/aarch64/sve/fmsb_1.c: Likewise.
* gcc.target/aarch64/sve/fmul_1.c: Likewise.
* gcc.target/aarch64/sve/fneg_1.c: Likewise.
* gcc.target/aarch64/sve/fnmad_1.c: Likewise.
* gcc.target/aarch64/sve/fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/fnmsb_1.c: Likewise.
* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
* gcc.target/aarch64/sve/frinta_1.c: Likewise.
* gcc.target/aarch64/sve/frinti_1.c: Likewise.
* gcc.target/aarch64/sve/frintm_1.c: Likewise.
* gcc.target/aarch64/sve/frintp_1.c: Likewise.
* gcc.target/aarch64/sve/frintx_1.c: Likewise.
* gcc.target/aarch64/sve/frintz_1.c: Likewise.
* gcc.target/aarch64/sve/fsqrt_1.c: Likewise.
* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/index_1.c: Likewise.
* gcc.target/aarch64/sve/index_1_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_1.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_1.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_3.c: Likewise.
* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/logical_1.c: Likewise.
* gcc.target/aarch64/sve/loop_add_1.c: Likewise.
* gcc.target/aarch64/sve/loop_add_1_run.c: Likewise.
* gcc.target/aarch64/sve/mad_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_1_run.c: Likewise.
* gcc.target/aarch64/sve/maxmin_strict_1.c: Likewise.
* gcc.target/aarch64/sve/maxmin_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve/mla_1.c: Likewise.
* gcc.target/aarch64/sve/mls_1.c: Likewise.
* gcc.target/aarch64/sve/mov_rr_1.c: Likewise.
* gcc.target/aarch64/sve/msb_1.c: Likewise.
* gcc.target/aarch64/sve/mul_1.c: Likewise.
* gcc.target/aarch64/sve/neg_1.c: Likewise.
* gcc.target/aarch64/sve/nlogical_1.c: Likewise.
* gcc.target/aarch64/sve/nlogical_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_1.c: Likewise.
* gcc.target/aarch64/sve/pack_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/pack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1.c: Likewise.
* gcc.target/aarch64/sve/pack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve/popcount_1.c: Likewise.
* gcc.target/aarch64/sve/popcount_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_1.c: Likewise.
* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_3.c: Likewise.
* gcc.target/aarch64/sve/rev_1.c: Likewise.
* gcc.target/aarch64/sve/revb_1.c: Likewise.
* gcc.target/aarch64/sve/revh_1.c: Likewise.
* gcc.target/aarch64/sve/revw_1.c: Likewise.
* gcc.target/aarch64/sve/shift_1.c: Likewise.
* gcc.target/aarch64/sve/single_1.c: Likewise.
* gcc.target/aarch64/sve/single_2.c: Likewise.
* gcc.target/aarch64/sve/single_3.c: Likewise.
* gcc.target/aarch64/sve/single_4.c: Likewise.
* gcc.target/aarch64/sve/spill_1.c: Likewise.
* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/subr_1.c: Likewise.
* gcc.target/aarch64/sve/trn1_1.c: Likewise.
* gcc.target/aarch64/sve/trn2_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_signed_1_run.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/uzp1_1.c: Likewise.
* gcc.target/aarch64/sve/uzp1_1_run.c: Likewise.
* gcc.target/aarch64/sve/uzp2_1.c: Likewise.
* gcc.target/aarch64/sve/uzp2_1_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_1.C: Likewise.
* gcc.target/aarch64/sve/vcond_1_run.C: Likewise.
* gcc.target/aarch64/sve/vcond_2.c: Likewise.
* gcc.target/aarch64/sve/vcond_2_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_3.c: Likewise.
* gcc.target/aarch64/sve/vcond_4.c: Likewise.
* gcc.target/aarch64/sve/vcond_4_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_5.c: Likewise.
* gcc.target/aarch64/sve/vcond_5_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_6.c: Likewise.
* gcc.target/aarch64/sve/vcond_6_run.c: Likewise.
* gcc.target/aarch64/sve/vec_init_1.c: Likewise.
* gcc.target/aarch64/sve/vec_init_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_init_2.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_1_overrange_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1_overrun.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_single_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_const_single_1_run.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_single_1.c: Likewise.
* gcc.target/aarch64/sve/vec_perm_single_1_run.c: Likewise.
* gcc.target/aarch64/sve/zip1_1.c: Likewise.
* gcc.target/aarch64/sve/zip2_1.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256614

6 years ago[AArch64] Testsuite markup for SVE
Richard Sandiford [Sat, 13 Jan 2018 17:50:45 +0000 (17:50 +0000)]
[AArch64] Testsuite markup for SVE

This patch adds new target selectors for SVE and updates existing
selectors accordingly.  It also XFAILs some tests that don't yet
work for some SVE modes; most of these go away with follow-on
vectorisation enhancements.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sve)
(aarch64_sve_bits, check_effective_target_aarch64_sve_hw)
(aarch64_sve_hw_bits, check_effective_target_aarch64_sve256_hw):
New procedures.
(check_effective_target_vect_perm): Handle SVE.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_element_align_preferred): Likewise.
(check_effective_target_vect_align_stack_vars): Likewise.
(check_effective_target_vect_load_lanes): Likewise.
(check_effective_target_vect_masked_store): Likewise.
(available_vector_sizes): Use aarch64_sve_bits for SVE.
* gcc.dg/vect/tree-vect.h (VECTOR_BITS): Define appropriately
for SVE.
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add SVE XFAIL.
* gcc.dg/vect/bb-slp-pr69907.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/vect-114.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-1.c: Likewise.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256613

6 years ago[AArch64] Add SVE support
Richard Sandiford [Sat, 13 Jan 2018 17:50:35 +0000 (17:50 +0000)]
[AArch64] Add SVE support

This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:

- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers

These are handled by later patches instead.

Some notes:

- The copyright years for aarch64-sve.md start at 2009 because some of
  the code is based on aarch64.md, which also starts from then.

- The patch inserts spaces between items in the AArch64 section
  of sourcebuild.texi.  This matches at least the surrounding
  architectures and looks a little nicer in the info output.

- aarch64-sve.md includes a pattern:

    while_ult<GPI:mode><PRED_ALL:mode>

  A later patch adds a matching "while_ult" optab, but the pattern
  is also needed by the predicate vec_duplicate expander.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes.  Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise.  Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA, VEC_STRUCT): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
(aarch64_get_mask_mode): New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS.  Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG.  Also handle the SVE
predicate modes and predicate registers.  Explicitly restrict
GPRs to modes of 16 bytes or smaller.  Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter.  Assert that temp1
does not overlap dest if the function is frame-related.  Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above.  Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise.  Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_sve_widened_duplicate): Likewise.
(aarch64_expand_sve_const_vector): Likewise.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants.  Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT.  Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses.  Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'.  Use "zN" rather than
"vN" for FP registers with SVE modes.  Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_address_internal): Handle SVE addresses.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes.  Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int.  Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly.  Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes.  Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes.  Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this.  Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev.  Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise.  Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file.  Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise.  Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHSD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256612

6 years agoMark SLP failures for vect_variable_length
Richard Sandiford [Sat, 13 Jan 2018 17:50:25 +0000 (17:50 +0000)]
Mark SLP failures for vect_variable_length

Until SLP support for variable-length vectors is added, many tests
fall back to non-SLP vectorisation with permutes.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/testsuite/
* gcc.dg/vect/no-scevccp-slp-30.c: XFAIL SLP test for
vect_variable_length, expecting the test to be vectorized
without SLP instead.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr37027.c: Likewise.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/pr68445.c: Likewise.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-13-big-array.c: Likewise.
* gcc.dg/vect/slp-13.c: Likewise.
* gcc.dg/vect/slp-14.c: Likewise.
* gcc.dg/vect/slp-15.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-2.c: Likewise.
* gcc.dg/vect/slp-20.c: Likewise.
* gcc.dg/vect/slp-21.c: Likewise.
* gcc.dg/vect/slp-22.c: Likewise.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-24.c: Likewise.
* gcc.dg/vect/slp-28.c: Likewise.
* gcc.dg/vect/slp-39.c: Likewise.
* gcc.dg/vect/slp-42.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-10.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-multitypes-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-4.c: Likewise.
* gcc.dg/vect/slp-multitypes-5.c: Likewise.
* gcc.dg/vect/slp-multitypes-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-9.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/vect-live-slp-3.c: Likewise.

From-SVN: r256611

6 years agoExtra subreg fold for variable-length CONST_VECTORs
Richard Sandiford [Sat, 13 Jan 2018 17:50:13 +0000 (17:50 +0000)]
Extra subreg fold for variable-length CONST_VECTORs

The SVE support for the new CONST_VECTOR encoding needs to be able
to extract the first N bits of the vector and duplicate it.  This patch
adds a simplify_subreg rule for that.

The code is covered by the gcc.target/aarch64/sve_slp_*.c tests.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
parameter and use it instead of GET_MODE_SIZE (innermode).  Use
inner_bytes * BITS_PER_UNIT instead of GET_MODE_BITSIZE (innermode).
Use CEIL (inner_bytes, GET_MODE_UNIT_SIZE (innermode)) instead of
GET_MODE_NUNITS (innermode).  Also add a first_elem parameter.
Change innermode from fixed_mode_size to machine_mode.
(simplify_subreg): Update call accordingly.  Handle a constant-sized
subreg of a variable-length CONST_VECTOR.

From-SVN: r256610

6 years agoImprove canonicalisation of TARGET_MEM_REFs
Richard Sandiford [Sat, 13 Jan 2018 17:50:01 +0000 (17:50 +0000)]
Improve canonicalisation of TARGET_MEM_REFs

A general TARGET_MEM_REF is:

    BASE + STEP * INDEX + INDEX2 + OFFSET

After classifying the address in this way, the code that builds
TARGET_MEM_REFs tries to simplify the address until it's valid
for the current target and for the mode of memory being addressed.
It does this in a fixed order:

(1) add SYMBOL to BASE
(2) add INDEX * STEP to the base, if STEP != 1
(3) add OFFSET to INDEX or BASE (reverted if unsuccessful)
(4) add INDEX to BASE
(5) add OFFSET to BASE

So suppose we had an address:

    &symbol + offset + index * 8

(e.g. a[i + 1] for a global "a") on a target that only allows an index or an
offset, not both.  Following the steps above, we'd first create:

    tmp = symbol
    tmp2 = tmp + index * 8

Then if the given offset value was valid for the mode being addressed,
we'd create:

    MEM[base:tmp2, offset:offset]

while if it was invalid we'd create:

    tmp3 = tmp2 + offset
    MEM[base:tmp3, offset:0]

The problem is that this could happen if ivopts had decided to use
a scaled index for an address that happens to have a constant base.
The old procedure failed to give an indexed TARGET_MEM_REF in that case,
and adding the offset last prevented later passes from being able to
fold the index back in.

The patch avoids this by checking at (2) whether the offset is the
only component that causes the address to be invalid, folding it
into the base if so.
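
A hypothetical illustration of the situation above: a global array
addressed through a scaled index chosen by ivopts, where the constant
offset from the "+ 1" should now be folded into the base rather than
added last.

    extern int a[];

    int
    sum_from_second (int n)
    {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i + 1];
      return s;
    }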

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* tree-ssa-address.c (mem_ref_valid_without_offset_p): New function.
(add_offset_to_base): New function, split out from...
(create_mem_ref): ...here.  When handling a scale other than 1,
check first whether the address is valid without the offset.
Add it into the base if so, leaving the index and scale as-is.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256609

6 years agore PR c/83801 ([avr] String constant in __flash not put into .progmem)
Jakub Jelinek [Sat, 13 Jan 2018 17:00:43 +0000 (18:00 +0100)]
re PR c/83801 ([avr] String constant in __flash not put into .progmem)

PR c/83801
* c-tree.h (decl_constant_value_1): Add a bool argument.
* c-typeck.c (decl_constant_value_1): Add IN_INIT argument, allow
returning a CONSTRUCTOR if it is true.  Use error_operand_p.
(decl_constant_value): Adjust caller.
* c-fold.c (c_fully_fold_internal): If in_init, pass true to
decl_constant_value_1 as IN_INIT.  Otherwise, punt if
decl_constant_value returns initializer that has BLKmode or
array type.
(c_fully_fold_internal) <case COMPONENT_REF>: Fold if !lval.

* gcc.dg/pr83801.c: New test.

From-SVN: r256608

6 years agore PR fortran/52162 (Bogus -fcheck=bounds with realloc on assignment to unallocated...
Paul Thomas [Sat, 13 Jan 2018 13:52:34 +0000 (13:52 +0000)]
re PR fortran/52162 (Bogus -fcheck=bounds with realloc on assignment to unallocated LHS)

2018-01-13  Paul Thomas  <pault@gcc.gnu.org>

PR fortran/52162
* trans-expr.c (gfc_trans_scalar_assign): Flag is_alloc_lhs if
the rhs expression is neither an elemental nor a conversion
function.

PR fortran/83622
* trans-array.c (is_pointer_array): Remove unconditional return
of false for -fopenmp.

2018-01-13  Paul Thomas  <pault@gcc.gnu.org>

PR fortran/52162
* gfortran.dg/bounds_check_19.f90 : New test.

From-SVN: r256607

6 years agore PR fortran/83803 (Using -fc-prototypes on modules with empty dummy arg lists does...
Thomas Koenig [Sat, 13 Jan 2018 13:23:31 +0000 (13:23 +0000)]
re PR fortran/83803 (Using -fc-prototypes on modules with empty dummy arg lists does not close paren.)

2018-01-13  Thomas Koenig <tkoenig@gcc.gnu.org>
<emsr@gcc.gnu.org>

PR fortran/83803
* dump-parse-tree.c (write_proc): Always emit closing parenthesis
for functions.

From-SVN: r256606

6 years agoDaily bump.
GCC Administrator [Sat, 13 Jan 2018 00:16:14 +0000 (00:16 +0000)]
Daily bump.

From-SVN: r256602

6 years agore PR c++/83778 (g++.dg/ext/altivec-cell-2.C fails starting with r256448)
Jakub Jelinek [Fri, 12 Jan 2018 21:41:19 +0000 (22:41 +0100)]
re PR c++/83778 (g++.dg/ext/altivec-cell-2.C fails starting with r256448)

PR c++/83778
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Call
fold_for_warn before checking if arg2 is INTEGER_CST.

From-SVN: r256599

6 years agors6000: Remove -mstring
Segher Boessenkool [Fri, 12 Jan 2018 20:50:52 +0000 (21:50 +0100)]
rs6000: Remove -mstring

-mstring is only enabled by default on 601, and with -Os on some
configurations.  It is almost always slower than not using it and
rarely leads to smaller code.

This patch disables it.  If a user uses -mstring they get a warning
(but not with -mno-string).  I left the target attribute in place; it
just doesn't do anything anymore.

The patch also deletes a whole bunch of code.  The 'N' and 'O' output
modifiers are now unused, but now is not the time to delete them.

* config/rs6000/predicates.md (load_multiple_operation): Delete.
(store_multiple_operation): Delete.
* config/rs6000/rs6000-cpus.def (601): Remove MASK_STRING.
* config/rs6000/rs6000-protos.h (rs6000_output_load_multiple): Delete.
* config/rs6000/rs6000-string.c (expand_block_move): Delete everything
guarded by TARGET_STRING.
(rs6000_output_load_multiple): Delete.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Delete
OPTION_MASK_STRING / TARGET_STRING handling.
(print_operand) <'N', 'O'>: Add comment that these are unused now.
(const rs6000_opt_masks) <"string">: Change mask to 0.
* config/rs6000/rs6000.h (TARGET_DEFAULT): Remove MASK_STRING.
(MASK_STRING): Delete.
* config/rs6000/rs6000.md (*mov<mode>_string): Delete TARGET_STRING
parts.  Simplify.
(load_multiple): Delete.
(*ldmsi8): Delete.
(*ldmsi7): Delete.
(*ldmsi6): Delete.
(*ldmsi5): Delete.
(*ldmsi4): Delete.
(*ldmsi3): Delete.
(store_multiple): Delete.
(*stmsi8): Delete.
(*stmsi7): Delete.
(*stmsi6): Delete.
(*stmsi5): Delete.
(*stmsi4): Delete.
(*stmsi3): Delete.
(movmemsi_8reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_6reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_4reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_2reg): Delete.
(corresponding unnamed define_insn): Delete.
(movmemsi_1reg): Delete.
(corresponding unnamed define_insn): Delete.
* config/rs6000/rs6000.opt (mno-string): New.
(mstring): Replace by deprecation warning stub.
* doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mstring.

From-SVN: r256598

6 years agofloat128-hw7.c: Use scan-assembler-times instead of scan-assembler-not for xsnabsqp.
Jakub Jelinek [Fri, 12 Jan 2018 20:19:48 +0000 (21:19 +0100)]
float128-hw7.c: Use scan-assembler-times instead of scan-assembler-not for xsnabsqp.

* gcc.target/powerpc/float128-hw7.c: Use scan-assembler-times
instead of scan-assembler-not for xsnabsqp.

From-SVN: r256597

6 years agoregrename.c (regrename_do_replace): If replacing the same reg multiple times, try...
Jakub Jelinek [Fri, 12 Jan 2018 20:04:14 +0000 (21:04 +0100)]
regrename.c (regrename_do_replace): If replacing the same reg multiple times, try to reuse last created gen_raw_REG.

* regrename.c (regrename_do_replace): If replacing the same
reg multiple times, try to reuse last created gen_raw_REG.

From-SVN: r256596

6 years agore PR fortran/83525 (open(newunit=funit, status="scratch") fails if an internal file...
Jerry DeLisle [Fri, 12 Jan 2018 20:01:47 +0000 (20:01 +0000)]
re PR fortran/83525 (open(newunit=funit, status="scratch") fails if an internal file (characters) was read previously.)

2018-01-12  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

        PR libgfortran/83525
        * gfortran.dg/newunit_5.f90: New test.

From-SVN: r256595

6 years agoPR c++/83186 - ICE with static_cast of list-initialized temporary.
Jason Merrill [Fri, 12 Jan 2018 19:40:11 +0000 (14:40 -0500)]
PR c++/83186 - ICE with static_cast of list-initialized temporary.

* typeck.c (build_static_cast): Use build_non_dependent_expr.

From-SVN: r256594

6 years ago[C++ PATCH] some reformatting
Nathan Sidwell [Fri, 12 Jan 2018 18:35:15 +0000 (18:35 +0000)]
[C++ PATCH] some reformatting

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01107.html
* cp-tree.h (mark_rvalue_use): Add parm name.
* expr.c (mark_lvalue_use, mark_lvalue_use_nonread): Move next to
mark_rvalue_use.
* call.c (convert_like_real): Fix formatting.

From-SVN: r256593

6 years agore PR debug/81155 (Debug make check regressions in GCC 8.0)
Jakub Jelinek [Fri, 12 Jan 2018 18:20:49 +0000 (19:20 +0100)]
re PR debug/81155 (Debug make check regressions in GCC 8.0)

PR debug/81155
* bb-reorder.c (pass_partition_blocks::gate): In lto don't partition
main to workaround a bug in GDB.

From-SVN: r256592

6 years agoSet use_gcc_stdint=wrap for nvptx
Tom de Vries [Fri, 12 Jan 2018 17:36:07 +0000 (17:36 +0000)]
Set use_gcc_stdint=wrap for nvptx

2018-01-12  Tom de Vries  <tom@codesourcery.com>

PR target/83737
* config.gcc (nvptx*-*-*): Set use_gcc_stdint=wrap.

From-SVN: r256591

6 years agore PR rtl-optimization/80481 (Unoptimal additional copy instructions)
Vladimir Makarov [Fri, 12 Jan 2018 17:00:36 +0000 (17:00 +0000)]
re PR rtl-optimization/80481 (Unoptimal additional copy instructions)

2018-01-12  Vladimir Makarov  <vmakarov@redhat.com>

PR rtl-optimization/80481
* ira-color.c (get_cap_member): New function.
(allocnos_conflict_by_live_ranges_p): Use it.
(slot_coalesced_allocno_live_ranges_intersect_p): Add assert.
(setup_slot_coalesced_allocno_live_ranges): Ditto.

2018-01-12  Vladimir Makarov  <vmakarov@redhat.com>

PR rtl-optimization/80481
* g++.dg/pr80481.C: New.

From-SVN: r256590

6 years agore PR rtl-optimization/83628 (performance regression when accessing arrays on alpha)
Uros Bizjak [Fri, 12 Jan 2018 16:47:45 +0000 (17:47 +0100)]
re PR rtl-optimization/83628 (performance regression when accessing arrays on alpha)

PR target/83628
* config/alpha/alpha.md (*saddsi_1): New insn_and_split pattern.
(*saddl_se_1): Ditto.
(*ssubsi_1): Ditto.
(*ssubl_se_1): Ditto.

testsuite/ChangeLog:

PR target/83628
* gcc.target/alpha/pr83628-3.c: New test.

From-SVN: r256589

6 years agoGuard against incomplete AVX512F support in Solaris as
Rainer Orth [Fri, 12 Jan 2018 15:57:40 +0000 (15:57 +0000)]
Guard against incomplete AVX512F support in Solaris as

* lib/target-supports.exp (check_effective_target_avx512f): Also
check for __builtin_ia32_addsd_round,
__builtin_ia32_getmantsd_round.
* gcc.target/i386/i386.exp (check_effective_target_avx512f):
Remove.

From-SVN: r256588

6 years agoHandle polynomial DR_INIT
Richard Sandiford [Fri, 12 Jan 2018 14:48:53 +0000 (14:48 +0000)]
Handle polynomial DR_INIT

The idea with the main 107-patch poly_int series (latterly 109-patch)
was to change the mode sizes and vector element counts to poly_int and
then propagate those changes as far as they needed to go to fix build
failures from incompatible types.  This means that DR_INIT is now
constructed as a poly_int64:

  poly_int64 pbytepos;
  if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "failed: bit offset alignment.\n");
      return false;
    }
  [...]
  init = ssize_int (pbytepos);

This patch adjusts other references to DR_INIT accordingly.  Unlike
the above, the adjustments weren't needed to avoid a build-time type
incompatibility, but they are needed to make the producer and consumers
of DR_INIT logically consistent.

2018-01-12  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* tree-predcom.c (aff_combination_dr_offset): Use wi::to_poly_widest
rather than wi::to_widest for DR_INITs.
* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
wi::to_poly_offset rather than wi::to_offset for DR_INIT.
(vect_analyze_data_ref_accesses): Require both DR_INITs to be
INTEGER_CSTs.
(vect_analyze_group_access_1): Note that here.

From-SVN: r256587

6 years agoHandle poly_int vector sizes in get_vec_alignment_for_array_type
Richard Sandiford [Fri, 12 Jan 2018 14:48:35 +0000 (14:48 +0000)]
Handle poly_int vector sizes in get_vec_alignment_for_array_type

get_vectype_for_scalar_type returns a variable-length vector type
for SVE, whereas get_vec_alignment_for_array_type assumed it would
always be an INTEGER_CST.

This is needed to build libstdc++-v3/src/closures.cc for SVE
(and probably many other places besides -- this was just the
first hit).

2018-01-12  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* tree-vectorizer.c (get_vec_alignment_for_array_type): Handle
polynomial type sizes.

From-SVN: r256586

6 years agoAllow variable-sized temporary variables in gimplify.c
Richard Sandiford [Fri, 12 Jan 2018 14:48:00 +0000 (14:48 +0000)]
Allow variable-sized temporary variables in gimplify.c

This is needed to build libgfortran for SVE.  The OpenMP code needs
to create temporary vector variables, and the variables will therefore
be variable-sized for SVE.  Earlier patches made such variables work.

2018-01-12  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* gimplify.c (gimple_add_tmp_var_fn): Allow variables to have a
poly_uint64 size, rather than requiring an unsigned HOST_WIDE_INT size.
(gimple_add_tmp_var): Likewise.

From-SVN: r256585

6 years agoFix integer overflow in stats of GIMPLE statements.
Martin Liska [Fri, 12 Jan 2018 14:47:26 +0000 (15:47 +0100)]
Fix integer overflow in stats of GIMPLE statements.

2018-01-12  Martin Liska  <mliska@suse.cz>

* gimple.c (gimple_alloc_counts): Use uint64_t instead of int.
(gimple_alloc_sizes): Likewise.
(dump_gimple_statistics): Use PRIu64 in printf format.
* gimple.h: Change int to uint64_t.

From-SVN: r256584

6 years agoFix integer overflow in stats of trees.
Martin Liska [Fri, 12 Jan 2018 14:45:35 +0000 (15:45 +0100)]
Fix integer overflow in stats of trees.

2018-01-12  Martin Liska  <mliska@suse.cz>

* tree-core.h: Use uint64_t instead of int.
* tree.c (tree_node_counts): Likewise.
(tree_node_sizes): Likewise.
(dump_tree_statistics): Use PRIu64 in printf format.

From-SVN: r256583

6 years agoFix --enable-gather-detailed-mem-stats build.
Martin Liska [Fri, 12 Jan 2018 14:43:58 +0000 (15:43 +0100)]
Fix --enable-gather-detailed-mem-stats build.

2018-01-12  Martin Liska  <mliska@suse.cz>

* Makefile.in: As qsort_chk is implemented in vec.c, add
vec.o to linkage of gencfn-macros.
* tree.c (build_new_poly_int_cst): Add CXX_MEM_STAT_INFO as it's
passing the info to record_node_allocation_statistics.
(test_vector_cst_patterns): Add CXX_MEM_STAT_INFO to declaration
and pass the info.
* ggc-common.c (struct ggc_usage): Add operator== and use
it in operator< and compare function.
* mem-stats.h (struct mem_usage): Likewise.
* vec.c (struct vec_usage): Remove operator< and compare
function. Can be simply inherited.

From-SVN: r256582

6 years agoDeferring FMA transformations in tight loops
Martin Jambor [Fri, 12 Jan 2018 14:06:10 +0000 (15:06 +0100)]
Deferring FMA transformations in tight loops

2018-01-12  Martin Jambor  <mjambor@suse.cz>

PR target/81616
* params.def: New parameter PARAM_AVOID_FMA_MAX_BITS.
* tree-ssa-math-opts.c: Include domwalk.h.
(convert_mult_to_fma_1): New function.
(fma_transformation_info): New type.
(fma_deferring_state): Likewise.
(cancel_fma_deferring): New function.
(result_of_phi): Likewise.
(last_fma_candidate_feeds_initial_phi): Likewise.
(convert_mult_to_fma): Added deferring logic, split actual
transformation to convert_mult_to_fma_1.
(math_opts_dom_walker): New type.
(math_opts_dom_walker::after_dom_children): New method, body moved
here from pass_optimize_widening_mul::execute, added deferring logic
bits.
(pass_optimize_widening_mul::execute): Moved most of code to
math_opts_dom_walker::after_dom_children.
* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): New.
* config/i386/i386.c (ix86_option_override_internal): Added
maybe_setting of PARAM_AVOID_FMA_MAX_BITS.

From-SVN: r256581

6 years agore PR debug/83157 (gcc.dg/guality/pr41616-1.c fail, inline instances refer to concret...
Richard Biener [Fri, 12 Jan 2018 13:19:23 +0000 (13:19 +0000)]
re PR debug/83157 (gcc.dg/guality/pr41616-1.c fail, inline instances refer to concrete instance as abstract origin)

2018-01-12  Richard Biener  <rguenther@suse.de>

PR debug/83157
* dwarf2out.c (gen_variable_die): Do not reset old_die for
inline instance vars.

From-SVN: r256580

6 years agore PR target/81819 ([RX] internal compiler error: in rx_is_restricted_memory_address...
Oleg Endo [Fri, 12 Jan 2018 12:10:56 +0000 (12:10 +0000)]
re PR target/81819 ([RX] internal compiler error: in rx_is_restricted_memory_address, at config/rx/rx.c:311)

gcc/
PR target/81819
* config/rx/rx.c (rx_is_restricted_memory_address):
Handle SUBREG case.

From-SVN: r256578

6 years agors6000: Tune new testcase (PR83629)
Segher Boessenkool [Fri, 12 Jan 2018 12:10:16 +0000 (13:10 +0100)]
rs6000: Tune new testcase (PR83629)

It has some problems running on some 64-bit configurations, and the
bug it is testing for is only on 32-bit, so let's not run it elsewhere.

gcc/testsuite/
PR target/83629
* gcc.target/powerpc/pr83629.c: Require ilp32.

From-SVN: r256577

6 years agore PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b right...
Richard Biener [Fri, 12 Jan 2018 11:43:13 +0000 (11:43 +0000)]
re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b right away, to be more efficient for Ryzen and Intel)

2018-01-12  Richard Biener  <rguenther@suse.de>

PR tree-optimization/80846
* target.def (split_reduction): New target hook.
* targhooks.c (default_split_reduction): New function.
* targhooks.h (default_split_reduction): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): If the
target requests first reduce vectors by combining low and high
parts.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
(get_vectype_for_scalar_type_and_size): Export.
* tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.

* doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
* doc/tm.texi: Regenerate.

i386/
* config/i386/i386.c (ix86_split_reduction): Implement
TARGET_VECTORIZE_SPLIT_REDUCTION.

* gcc.target/i386/pr80846-1.c: New testcase.
* gcc.target/i386/pr80846-2.c: Likewise.

From-SVN: r256576

6 years agore PR target/83368 (alloca after setjmp breaks PIC base reg)
Eric Botcazou [Fri, 12 Jan 2018 11:29:30 +0000 (11:29 +0000)]
re PR target/83368 (alloca after setjmp breaks PIC base reg)

PR target/83368
* config/sparc/sparc.h (PIC_OFFSET_TABLE_REGNUM): Set to INVALID_REGNUM
in PIC mode except for TARGET_VXWORKS_RTP.
* config/sparc/sparc.c: Include cfgrtl.h.
(TARGET_INIT_PIC_REG): Define.
(TARGET_USE_PSEUDO_PIC_REG): Likewise.
(sparc_pic_register_p): New predicate.
(sparc_legitimate_address_p): Use it.
(sparc_legitimize_pic_address): Likewise.
(sparc_delegitimize_address): Likewise.
(sparc_mode_dependent_address_p): Likewise.
(gen_load_pcrel_sym): Remove 4th parameter.
(load_got_register): Adjust call to above.  Remove obsolete stuff.
(sparc_expand_prologue): Do not call load_got_register here.
(sparc_flat_expand_prologue): Likewise.
(sparc_output_mi_thunk): Set the pic_offset_table_rtx object.
(sparc_use_pseudo_pic_reg): New function.
(sparc_init_pic_reg): Likewise.
* config/sparc/sparc.md (vxworks_load_got): Set the GOT register.
(builtin_setjmp_receiver): Enable only for TARGET_VXWORKS_RTP.

From-SVN: r256575

6 years agoAdd doc for branch_cost effective target.
Christophe Lyon [Fri, 12 Jan 2018 11:25:31 +0000 (11:25 +0000)]
Add doc for branch_cost effective target.

2018-01-12  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* doc/sourcebuild.texi (Effective-Target Keywords, Other attributes):
Add item for branch_cost.

From-SVN: r256574

6 years agore PR rtl-optimization/83565 (RTL combine pass yields wrong rotate result)
Eric Botcazou [Fri, 12 Jan 2018 10:18:24 +0000 (10:18 +0000)]
re PR rtl-optimization/83565 (RTL combine pass yields wrong rotate result)

PR rtl-optimization/83565
* rtlanal.c (nonzero_bits1): On WORD_REGISTER_OPERATIONS machines, do
not extend the result to a larger mode for rotate operations.
(num_sign_bit_copies1): Likewise.

From-SVN: r256572

6 years agoAdd dg-require-effective-target indirect_jumps for g++
Tom de Vries [Fri, 12 Jan 2018 10:03:00 +0000 (10:03 +0000)]
Add dg-require-effective-target indirect_jumps for g++

2018-01-12  Tom de Vries  <tom@codesourcery.com>

* g++.dg/ext/label13.C: Add dg-require-effective-target indirect_jumps.
* g++.dg/ext/label13a.C: Same.
* g++.dg/ext/label14.C: Same.
* g++.dg/ext/label2.C: Same.
* g++.dg/ext/label3.C: Same.
* g++.dg/torture/pr42462.C: Same.
* g++.dg/torture/pr42739.C: Same.
* g++.dg/warn/Wunused-label-3.C: Same.

From-SVN: r256571

6 years agoAdd dg-require-effective-target alloca for c++ test-cases
Tom de Vries [Fri, 12 Jan 2018 10:02:45 +0000 (10:02 +0000)]
Add dg-require-effective-target alloca for c++ test-cases

2018-01-12  Tom de Vries  <tom@codesourcery.com>

* c-c++-common/dwarf2/vla1.c: Add dg-require-effective-target alloca.
* g++.dg/Walloca1.C: Same.
* g++.dg/cpp0x/pr70338.C: Same.
* g++.dg/cpp1y/lambda-generic-vla1.C: Same.
* g++.dg/cpp1y/vla10.C: Same.
* g++.dg/cpp1y/vla2.C: Same.
* g++.dg/cpp1y/vla6.C: Same.
* g++.dg/cpp1y/vla8.C: Same.
* g++.dg/debug/debug5.C: Same.
* g++.dg/debug/debug6.C: Same.
* g++.dg/debug/pr54828.C: Same.
* g++.dg/diagnostic/pr70105.C: Same.
* g++.dg/eh/cleanup5.C: Same.
* g++.dg/eh/spbp.C: Same.
* g++.dg/ext/tmplattr9.C: Same.
* g++.dg/ext/vla10.C: Same.
* g++.dg/ext/vla11.C: Same.
* g++.dg/ext/vla12.C: Same.
* g++.dg/ext/vla15.C: Same.
* g++.dg/ext/vla16.C: Same.
* g++.dg/ext/vla17.C: Same.
* g++.dg/ext/vla3.C: Same.
* g++.dg/ext/vla6.C: Same.
* g++.dg/ext/vla7.C: Same.
* g++.dg/init/array24.C: Same.
* g++.dg/init/new47.C: Same.
* g++.dg/init/pr55497.C: Same.
* g++.dg/opt/pr78201.C: Same.
* g++.dg/template/vla2.C: Same.
* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
* g++.dg/torture/Wsizeof-pointer-memaccess2.C: Same.
* g++.dg/torture/pr62127.C: Same.
* g++.dg/torture/pr67055.C: Same.
* g++.dg/torture/stackalign/eh-alloca-1.C: Same.
* g++.dg/torture/stackalign/eh-inline-2.C: Same.
* g++.dg/torture/stackalign/eh-vararg-1.C: Same.
* g++.dg/torture/stackalign/eh-vararg-2.C: Same.
* g++.dg/warn/Wplacement-new-size-5.C: Same.
* g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Same.
* g++.dg/warn/Wvla-1.C: Same.
* g++.dg/warn/Wvla-3.C: Same.
* g++.old-deja/g++.ext/array2.C: Same.
* g++.old-deja/g++.ext/constructor.C: Same.
* g++.old-deja/g++.law/builtin1.C: Same.
* g++.old-deja/g++.other/crash12.C: Same.
* g++.old-deja/g++.other/eh3.C: Same.
* g++.old-deja/g++.pt/array6.C: Same.
* g++.old-deja/g++.pt/dynarray.C: Same.

From-SVN: r256570

6 years agoFix g++.dg/cpp0x/inh-ctor30.C
Rainer Orth [Fri, 12 Jan 2018 09:56:30 +0000 (09:56 +0000)]
Fix g++.dg/cpp0x/inh-ctor30.C

* g++.dg/cpp0x/inh-ctor30.C: Allow for alternate mangled form.

From-SVN: r256569

6 years agoLink with correct values-*.o files on Solaris (PR target/40411)
Rainer Orth [Fri, 12 Jan 2018 09:52:53 +0000 (09:52 +0000)]
Link with correct values-*.o files on Solaris (PR target/40411)

gcc/testsuite:
PR libfortran/67412
* gfortran.dg/execute_command_line_2.f90: Remove dg-xfail-run-if
on *-*-solaris2.10.

libstdc++-v3:
PR libstdc++/64054
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
Remove dg-xfail-run-if.

gcc:
PR target/40411
* config/sol2.h (STARTFILE_ARCH_SPEC): Don't use with -shared or
-symbolic.
Use values-Xc.o for -pedantic.
Link with values-xpg4.o for C90, values-xpg6.o otherwise.

From-SVN: r256568

6 years agoInclude all x86 targets in branch_cost effective target
Rainer Orth [Fri, 12 Jan 2018 09:18:07 +0000 (09:18 +0000)]
Include all x86 targets in branch_cost effective target

* lib/target-supports.exp (check_effective_target_branch_cost):
Accept all x86 targets.

From-SVN: r256567

6 years agoInitialize type_warnings::dyn_count with a default value (PR ipa/83054).
Martin Liska [Fri, 12 Jan 2018 08:59:52 +0000 (09:59 +0100)]
Initialize type_warnings::dyn_count with a default value (PR ipa/83054).

2018-01-12  Martin Liska  <mliska@suse.cz>

PR ipa/83054
* ipa-devirt.c (final_warning_record::grow_type_warnings):
New function.
(possible_polymorphic_call_targets): Use it.
(ipa_devirt): Likewise.
2018-01-12  Martin Liska  <mliska@suse.cz>

PR ipa/83054
* g++.dg/warn/pr83054.C: New test.

From-SVN: r256566

6 years agoAdd new verification for profile-count.h.
Martin Liska [Fri, 12 Jan 2018 08:58:58 +0000 (09:58 +0100)]
Add new verification for profile-count.h.

2018-01-12  Martin Liska  <mliska@suse.cz>

* profile-count.h (enum profile_quality): Use 0 as invalid
enum value of profile_quality.

From-SVN: r256565

6 years agoAdd new NDS32 options -mext-perf, -mext-perf2 and -mext-string in the documentation.
Chung-Ju Wu [Fri, 12 Jan 2018 08:44:21 +0000 (08:44 +0000)]
Add new NDS32 options -mext-perf, -mext-perf2 and -mext-string in the documentation.

gcc/
* doc/invoke.texi (NDS32 Options): Add -mext-perf, -mext-perf2 and
-mext-string options.

From-SVN: r256564

6 years agolto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR conditional...
Richard Biener [Fri, 12 Jan 2018 08:22:16 +0000 (08:22 +0000)]
lto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P.

2018-01-12  Richard Biener  <rguenther@suse.de>

* lto-streamer-out.c (DFS::DFS_write_tree_body): Process
DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P.
* tree-streamer-in.c (lto_input_ts_decl_common_tree_pointers):
Likewise.
* tree-streamer-out.c (write_ts_decl_common_tree_pointers): Likewise.

From-SVN: r256563

6 years agoDaily bump.
GCC Administrator [Fri, 12 Jan 2018 00:16:17 +0000 (00:16 +0000)]
Daily bump.

From-SVN: r256561

6 years agoconfigure.ac (--with-long-double-format): Add support for the configuration option...
Michael Meissner [Thu, 11 Jan 2018 23:31:09 +0000 (23:31 +0000)]
configure.ac (--with-long-double-format): Add support for the configuration option to change the default long double...

2018-01-11  Michael Meissner  <meissner@linux.vnet.ibm.com>

* configure.ac (--with-long-double-format): Add support for the
configuration option to change the default long double format on
PowerPC systems.
* config.gcc (powerpc*-linux*-*): Likewise.
* configure: Regenerate.
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long
double is IEEE, define __KC__ and __KF__ to allow floatn.h to be
used without modification.

From-SVN: r256558

6 years agors6000-builtin.def (BU_P7_MISC_X): New #define.
Bill Schmidt [Thu, 11 Jan 2018 22:32:41 +0000 (22:32 +0000)]
rs6000-builtin.def (BU_P7_MISC_X): New #define.

[gcc]

2018-01-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

* config/rs6000/rs6000-builtin.def (BU_P7_MISC_X): New #define.
(SPEC_BARRIER): New instantiation of BU_P7_MISC_X.
* config/rs6000/rs6000.c (rs6000_expand_builtin): Handle
MISC_BUILTIN_SPEC_BARRIER.
(rs6000_init_builtins): Likewise.
* config/rs6000/rs6000.md (UNSPECV_SPEC_BARRIER): New UNSPECV
enum value.
(speculation_barrier): New define_insn.
* doc/extend.texi: Document __builtin_speculation_barrier.

[gcc/testsuite]

2018-01-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

* gcc.target/powerpc/spec-barr-1.c: New file.

From-SVN: r256557

6 years agore PR target/83203 (Inefficient int to avx2 vector conversion)
Jakub Jelinek [Thu, 11 Jan 2018 20:49:40 +0000 (21:49 +0100)]
re PR target/83203 (Inefficient int to avx2 vector conversion)

PR target/83203
* config/i386/i386.c (ix86_expand_vector_init_one_nonzero): If one_var
is 0, for V{8,16}S[IF] and V[48]D[IF]mode use gen_vec_set<mode>_0.
* config/i386/sse.md (VI8_AVX_AVX512F, VI4F_256_512): New mode
iterators.
(ssescalarmodesuffix): Add 512-bit vectors.  Use "d" or "q" for
integral modes instead of "ss" and "sd".
(vec_set<mode>_0): New define_insns for 256-bit and 512-bit
vectors with 32-bit and 64-bit elements.
(vecdupssescalarmodesuffix): New mode attribute.
(vec_dup<mode>): Use it.

From-SVN: r256556

6 years agoi386: Align stack frame if argument is passed on stack
H.J. Lu [Thu, 11 Jan 2018 20:44:46 +0000 (20:44 +0000)]
i386: Align stack frame if argument is passed on stack

When a function call is removed, the caller may become a leaf function.
But if an argument may be passed on the stack, we need to align the
stack frame when there is no tail call.

Tested on Linux/i686 and Linux/x86-64.

gcc/

PR target/83330
* config/i386/i386.c (ix86_compute_frame_layout): Align stack
frame if argument is passed on stack.

gcc/testsuite/

PR target/83330
* gcc.target/i386/pr83330.c: New test.

From-SVN: r256555

6 years agore PR fortran/79383 (USE statement error)
Steven G. Kargl [Thu, 11 Jan 2018 20:24:36 +0000 (20:24 +0000)]
re PR fortran/79383 (USE statement error)

2018-01-11  Steven G. Kargl <kargl@gcc.gnu.org>

PR fortran/79383
* gfortran.dg/dtio_31.f03: New test.
* gfortran.dg/dtio_32.f03: New test.

From-SVN: r256554

6 years agore PR go/83794 (misc/cgo/test uses gigabytes of memory)
Ian Lance Taylor [Thu, 11 Jan 2018 19:58:55 +0000 (19:58 +0000)]
re PR go/83794 (misc/cgo/test uses gigabytes of memory)

PR go/83794
    misc/cgo/test: avoid endless loop when we can't parse notes

    Reviewed-on: https://go-review.googlesource.com/87416

From-SVN: r256553

6 years agoAdd some reproducers for issues found developing the location-wrappers patch
David Malcolm [Thu, 11 Jan 2018 19:38:52 +0000 (19:38 +0000)]
Add some reproducers for issues found developing the location-wrappers patch

gcc/testsuite/ChangeLog:
PR c++/43486
* g++.dg/wrappers: New subdirectory.
* g++.dg/wrappers/README: New file.
* g++.dg/wrappers/alloc.C: New test case.
* g++.dg/wrappers/cow-istream-string.C: New test case.
* g++.dg/wrappers/cp-stdlib.C: New test case.
* g++.dg/wrappers/sanitizer_coverage_libcdep_new.C: New test case.
* g++.dg/wrappers/wrapper-around-type-pack-expansion.C: New test
case.

From-SVN: r256552

6 years agore PR target/82682 (FAIL: gcc.target/i386/pr50038.c scan-assembler-times movzbl 2...
Jakub Jelinek [Thu, 11 Jan 2018 19:34:56 +0000 (20:34 +0100)]
re PR target/82682 (FAIL: gcc.target/i386/pr50038.c scan-assembler-times movzbl 2 (found 3 times) since r253958)

PR target/82682
* ree.c (combine_reaching_defs): Optimize also
reg2=exp; reg1=reg2; reg2=any_extend(reg1); into
reg2=any_extend(exp); reg1=reg2;, formatting fix.

From-SVN: r256551

6 years agoPR c++/82728 - wrong -Wunused-but-set-variable
Jason Merrill [Thu, 11 Jan 2018 19:08:41 +0000 (14:08 -0500)]
PR c++/82728 - wrong -Wunused-but-set-variable

PR c++/82799
PR c++/83690
* call.c (perform_implicit_conversion_flags): Call mark_rvalue_use.
* decl.c (case_conversion): Likewise.
* semantics.c (finish_static_assert): Call
perform_implicit_conversion_flags.

From-SVN: r256550

6 years agore PR tree-optimization/83189 (internal compiler error: in probability_in, at profile...
Jan Hubicka [Thu, 11 Jan 2018 17:47:20 +0000 (18:47 +0100)]
re PR tree-optimization/83189 (internal compiler error: in probability_in, at profile-count.h:1050)

PR middle-end/83189
* gimple-ssa-isolate-paths.c (isolate_path): Fix profile update.

From-SVN: r256545

6 years agore PR middle-end/83718 (ICE: Floating point exception in profile_count::apply_scale)
Jan Hubicka [Thu, 11 Jan 2018 17:46:01 +0000 (18:46 +0100)]
re PR middle-end/83718 (ICE: Floating point exception in profile_count::apply_scale)

PR middle-end/83718
* tree-inline.c (copy_cfg_body): Adjust num&den for scaling
after they are computed.
* g++.dg/torture/pr83718.C: New testcase.

From-SVN: r256544

6 years ago[C++ PATCH] kill unused enum
Nathan Sidwell [Thu, 11 Jan 2018 16:57:31 +0000 (16:57 +0000)]
[C++ PATCH] kill unused enum

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00923.html
* method.c (enum mangling_flags): Delete long-dead enum.

From-SVN: r256543

6 years agore PR ipa/83178 (g++.dg/ipa/devirt-22.C fail)
Martin Jambor [Thu, 11 Jan 2018 15:56:07 +0000 (16:56 +0100)]
re PR ipa/83178 (g++.dg/ipa/devirt-22.C fail)

PR ipa/83178
* g++.dg/ipa/devirt-22.C: Adjust scan-dump-times count.

From-SVN: r256542

6 years agore PR tree-optimization/83695 (ICE on valid code at -O3: Segmentation fault)
Bin Cheng [Thu, 11 Jan 2018 15:41:41 +0000 (15:41 +0000)]
re PR tree-optimization/83695 (ICE on valid code at -O3: Segmentation fault)

PR tree-optimization/83695
* gimple-loop-interchange.cc
(tree_loop_interchange::interchange_loops): Call scev_reset_htab to
reset cached scev information after interchange.
(pass_linterchange::execute): Remove call to scev_reset_htab.

gcc/testsuite
PR tree-optimization/83695
* gcc.dg/tree-ssa/pr83695.c: New test.

From-SVN: r256541

6 years ago[arm][3/3] Implement fp16fml lane intrinsics
Kyrylo Tkachov [Thu, 11 Jan 2018 15:24:26 +0000 (15:24 +0000)]
[arm][3/3] Implement fp16fml lane intrinsics

This patch implements the lane-wise fp16fml intrinsics.
There's quite a few of them so I've split them up from
the other simpler fp16fml intrinsics.

These ones expose instructions such as

vfmal.f16 Dd, Sn, Sm[<index>]  0 <= index <= 1
vfmal.f16 Qd, Dn, Dm[<index>]  0 <= index <= 3
vfmsl.f16 Dd, Sn, Sm[<index>]  0 <= index <= 1
vfmsl.f16 Qd, Dn, Dm[<index>]  0 <= index <= 3

These instructions extract a single half-precision
floating-point value from one of the source regs
and perform a vfmal/vfmsl operation as per the
normal variant with that value.

The nuance here is that some of the intrinsics want
to do things like:

float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index)

where the float16x8_t value of '__b' is held in a Q
register, so we need to be a bit smart about finding
the right D or S sub-register and translating the
lane number to a lane in that sub-register, instead
of just passing the language-level const-int down to
the assembly instruction.

That's where most of the complexity of this patch comes from
but hopefully it's orthogonal enough to make sense.

Bootstrapped and tested on arm-none-linux-gnueabihf as well as
armeb-none-eabi.
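
A minimal usage sketch (an assumption, not part of the patch) based on
the vfmlal_laneq_low_u32 signature quoted above; the compile flags and
the chosen lane are illustrative only:

    /* Assumes a toolchain where +fp16fml is enabled, e.g. something
       like -march=armv8.2-a+fp16fml with a suitable -mfpu/-mfloat-abi.  */
    #include <arm_neon.h>

    float32x2_t
    fma_low_by_lane3 (float32x2_t r, float16x4_t a, float16x8_t b)
    {
      /* Widen the low f16 elements of 'a', multiply them by element 3
         of 'b' (held in a Q register) and accumulate into 'r'.  The
         compiler picks the right D/S sub-register and lane for the
         vfmal.f16 instruction, as described above.  */
      return vfmlal_laneq_low_u32 (r, a, b, 3);
    }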

* config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32,
vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32,
vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32,
vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32,
vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32,
vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define.
* config/arm/arm_neon_builtins.def (vfmal_lane_low,
vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high,
vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low,
vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high,
vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins.
* config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes.
(V_lane_reg): Likewise.
* config/arm/neon.md (neon_vfm<vfml_op>l_lane_<vfml_half><VCVTF:mode>):
New define_expand.
(neon_vfm<vfml_op>l_lane_<vfml_half><vfmlsel2><mode>): Likewise.
(vfmal_lane_low<mode>_intrinsic,
vfmal_lane_low<vfmlsel2><mode>_intrinsic,
vfmal_lane_high<vfmlsel2><mode>_intrinsic,
vfmal_lane_high<mode>_intrinsic, vfmsl_lane_low<mode>_intrinsic,
vfmsl_lane_low<vfmlsel2><mode>_intrinsic,
vfmsl_lane_high<vfmlsel2><mode>_intrinsic,
vfmsl_lane_high<mode>_intrinsic): New define_insns.

* gcc.target/arm/simd/fp16fml_lane_high.c: New test.
* gcc.target/arm/simd/fp16fml_lane_low.c: New test.

From-SVN: r256540

6 years ago[arm][2/3] Implement fp16fml extension for ARMv8.4-A
Kyrylo Tkachov [Thu, 11 Jan 2018 15:21:26 +0000 (15:21 +0000)]
[arm][2/3] Implement fp16fml extension for ARMv8.4-A

This patch adds the +fp16fml extension that enables some
half-precision floating-point Advanced SIMD instructions,
available through arm_neon.h intrinsics.

This extension is on by default for armv8.4-a
if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16.

fp16fml is also available for armv8.2-a and armv8.3-a through the
+fp16fml option that is added for these architectures.

The new instructions that this patch adds support for are:
vfmal.f16 Dr, Sm, Sn
vfmal.f16 Qr, Dm, Dn
vfmsl.f16 Dr, Sm, Sn
vfmsl.f16 Qr, Dm, Dn

They interpret their input registers as a vector of half-precision
floating-point values, extend them to single-precision vectors
and perform a fused multiply-add or subtract of them with the
destination vector.

This patch exposes these instructions through arm_neon.h intrinsics.
The set of intrinsics allows us to do stuff such as perform
the multiply-add/subtract operation on the low or top half of
float16x4_t and float16x8_t values.  This maps naturally in aarch64
to the FMLAL and FMLAL2 instructions but on arm we have to use the
fact that consecutive NEON registers overlap the wider register
(i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means
we have to be careful to use the right subreg operand print code.

New arm-specific builtins are defined to expand to the new patterns.
I've managed to compress the define_expands using code, mode and int
iterators but the define_insns don't compress very well without two-tiered
iterators (iterator attributes expanding to iterators) which we
don't support.

Bootstrapped and tested on arm-none-linux-gnueabihf and also on
armeb-none-eabi.
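
As a rough scalar model (an illustration, not the implementation) of
what the "low" variant described above computes, assuming a compiler
that provides _Float16:

    /* Each low-half f16 element is widened to f32 and a multiply-add
       is accumulated into the f32 destination lanes; the real
       instruction fuses the multiply and add without an intermediate
       rounding step.  */
    void
    fmlal_low_model (float r[2], const _Float16 a[4], const _Float16 b[4])
    {
      for (int i = 0; i < 2; i++)
        r[i] += (float) a[i] * (float) b[i];
    }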

* config/arm/arm-cpus.in (fp16fml): New feature.
(ALL_SIMD): Add fp16fml.
(armv8.2-a): Add fp16fml as an option.
(armv8.3-a): Likewise.
(armv8.4-a): Add fp16fml as part of fp16.
* config/arm/arm.h (TARGET_FP16FML): Define.
* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML
when appropriate.
* config/arm/arm-modes.def (V2HF): Define.
* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32,
vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define.
* config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high,
vfmsl_low, vfmsl_high): New set of builtins.
* config/arm/iterators.md (PLUSMINUS): New code iterator.
(vfml_op): New code attribute.
(VFMLHALVES): New int iterator.
(VFML, VFMLSEL): New mode attributes.
(V_reg): Define mapping for V2HF.
(V_hi, V_lo): New mode attributes.
(VF_constraint): Likewise.
(vfml_half, vfml_half_selector): New int attributes.
* config/arm/neon.md (neon_vfm<vfml_op>l_<vfml_half><mode>): New
define_expand.
(vfmal_low<mode>_intrinsic, vfmsl_high<mode>_intrinsic,
vfmal_high<mode>_intrinsic, vfmsl_low<mode>_intrinsic):
New define_insn.
* config/arm/t-arm-elf (v8_fps): Add fp16fml.
* config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml.
* config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs.
* doc/invoke.texi (ARM Options): Document fp16fml.  Update armv8.4-a
documentation.
* doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon):
Document new effective target and option set.

* gcc.target/arm/multilib.exp: Add combination tests for fp16fml.
* gcc.target/arm/simd/fp16fml_high.c: New test.
* gcc.target/arm/simd/fp16fml_low.c: Likewise.
* lib/target-supports.exp
(check_effective_target_arm_fp16fml_neon_ok_nocache,
check_effective_target_arm_fp16fml_neon_ok,
add_options_for_arm_fp16fml_neon): New procedures.

From-SVN: r256539

6 years ago[arm][1/3] Add -march=armv8.4-a option
Kyrylo Tkachov [Thu, 11 Jan 2018 15:18:04 +0000 (15:18 +0000)]
[arm][1/3] Add -march=armv8.4-a option

This patch adds support for the Armv8.4-A architecture [1]
in the arm backend. This is done through the new
-march=armv8.4-a option.

With this patch armv8.4-a is recognised as an argument
and supports the extensions: simd, fp16, crypto, nocrypto,
nofp with the familiar meaning of these options.
It is worth noting that there is no dotprod option like in
armv8.2-a and armv8.3-a, because Dot Product support is
mandatory in Armv8.4-A when simd is available, so when using
+simd (or +fp16, which enables +simd), +dotprod is implied.

The various multilib selection makefile fragments are updated
too, and the multilib.exp test gets a few armv8.4-a combination
tests.

Bootstrapped and tested on arm-none-linux-gnueabihf.

From-SVN: r256537

6 years agore PR target/81821 ([RX] xchg_mem<mode> uses wrong memory operand size)
Oleg Endo [Thu, 11 Jan 2018 15:16:21 +0000 (15:16 +0000)]
re PR target/81821 ([RX] xchg_mem<mode> uses wrong memory operand size)

gcc/
PR target/81821
* config/rx/rx.md (BW): New mode attribute.
(sync_lock_test_and_setsi): Add mode suffix to insn output.

From-SVN: r256536

6 years agore PR tree-optimization/83435 (ICE in set_value_range, at tree-vrp.c:211)
Richard Biener [Thu, 11 Jan 2018 13:42:29 +0000 (13:42 +0000)]
re PR tree-optimization/83435 (ICE in set_value_range, at tree-vrp.c:211)

2018-01-11  Richard Biener  <rguenther@suse.de>

PR tree-optimization/83435
* graphite.c (canonicalize_loop_form): Ignore fake loop exit edges.
* graphite-scop-detection.c (scop_detection::get_sese): Likewise.
* tree-vrp.c (add_assert_info): Drop TREE_OVERFLOW if they appear.

* gcc.dg/graphite/pr83435.c: New testcase.

From-SVN: r256535

6 years ago[AArch64] Add const_offset field to aarch64_address_info
Richard Sandiford [Thu, 11 Jan 2018 13:18:23 +0000 (13:18 +0000)]
[AArch64] Add const_offset field to aarch64_address_info

This patch records the integer value of the address offset in
aarch64_address_info, so that it doesn't need to be re-extracted
from the rtx.  The SVE port will make more use of this.  The patch
also uses poly_int64 routines to manipulate the offset, rather than
just handling CONST_INTs.

2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset
field.
(aarch64_classify_address): Initialize it.  Track polynomial offsets.
(aarch64_print_address_internal): Use it to check for a zero offset.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256534

6 years ago[AArch64] Set NUM_POLY_INT_COEFFS to 2
Richard Sandiford [Thu, 11 Jan 2018 13:17:02 +0000 (13:17 +0000)]
[AArch64] Set NUM_POLY_INT_COEFFS to 2

This patch switches the AArch64 port to use 2 poly_int coefficients
and updates code as necessary to keep it compiling.

One potentially-significant change is to
aarch64_hard_regno_caller_save_mode.  The old implementation
was written in a pretty conservative way: it changed the default
behaviour for single-register values, but used the default handling
for multi-register values.

I don't think that's necessary, since the interesting cases for this
macro are usually the single-register ones.  Multi-register modes take
up the whole of the constituent registers and the move patterns for all
multi-register modes should be equally good.

Using the original mode for multi-register cases stops us from using
SVE modes to spill multi-register NEON values.  This was caught by
gcc.c-torture/execute/pr47538.c.

Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
(which are all scalars), and I think it's more obvious, since if we ever
do use this for elementwise shifts of vector modes, the mask will depend
on the number of bits in each element rather than the number of bits in
the whole vector.

2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
Return a poly_int64 rather than a HOST_WIDE_INT.
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* config/aarch64/aarch64.h (aarch64_frame): Protect with
HAVE_POLY_INT_H rather than HOST_WIDE_INT.  Change locals_offset,
hard_fp_offset, frame_size, initial_adjust, callee_offset and
final_offset from HOST_WIDE_INT to poly_int64.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
to_constant when getting the number of units in an Advanced SIMD
mode.
(aarch64_builtin_vectorized_function): Check for a constant number
of units.
* config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial
GET_MODE_SIZE.
(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits
attribute instead of GET_MODE_NUNITS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_class_max_nregs): Use the constant_lowest_bound of the
GET_MODE_SIZE for fixed-size registers.
(aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p.
(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
(aarch64_print_operand, aarch64_print_address_internal)
(aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost)
(aarch64_short_vector_p, aapcs_vfp_sub_candidate)
(aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp):
Handle polynomial GET_MODE_SIZE.
(aarch64_hard_regno_caller_save_mode): Likewise.  Return modes
wider than SImode without modification.
(tls_symbolic_operand_type): Use strip_offset instead of split_const.
(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
passing and returning SVE modes.
(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
rather than GEN_INT.
(aarch64_emit_probe_stack_range): Take the size as a poly_int64
rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_layout_frame): Cope with polynomial offsets.
(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
start_offset as a poly_int64 rather than a HOST_WIDE_INT.  Track
polynomial offsets.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(aarch64_get_separate_components, aarch64_process_components)
(aarch64_expand_prologue, aarch64_expand_epilogue)
(aarch64_use_return_insn_p): Handle polynomial frame offsets.
(aarch64_anchor_offset): New function, split out from...
(aarch64_legitimize_address): ...here.
(aarch64_builtin_vectorization_cost): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
GET_MODE_NUNITS.
(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
number of elements from the PARALLEL rather than the mode.
(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
rather than GET_MODE_BITSIZE.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_ext)
(aarch64_evpc_rev, aarch64_evpc_dup, aarch64_evpc_zip)
(aarch64_expand_vec_perm_const_1): Handle polynomial
d->perm.length () and d->perm elements.
(aarch64_evpc_tbl): Likewise.  Use nelt rather than GET_MODE_NUNITS.
Apply to_constant to d->perm elements.
(aarch64_simd_valid_immediate, aarch64_vec_fpconst_pow_of_2): Handle
polynomial CONST_VECTOR_NUNITS.
(aarch64_move_pointer): Take amount as a poly_int64 rather
than an int.
(aarch64_progress_pointer): Avoid temporary variable.
* config/aarch64/aarch64.md (aarch64_<crc_variant>): Use
the mode attribute instead of GET_MODE.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256533

6 years ago[AArch64] Rework interface to add constant/offset routines
Richard Sandiford [Thu, 11 Jan 2018 13:13:54 +0000 (13:13 +0000)]
[AArch64] Rework interface to add constant/offset routines

The port had aarch64_add_offset and aarch64_add_constant routines
that did similar things.  This patch replaces them with an expanded
version of aarch64_add_offset that takes separate source and
destination registers.  The new routine also takes a poly_int64 offset
instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
case to aarch64_add_offset_1, which is basically a repurposed
aarch64_add_constant_internal.  The SVE patch will put the handling
of VL-based constants in aarch64_add_offset, while still using
aarch64_add_offset_1 for the constant part.

The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
as well as temp1 once SVE is added.

A side-effect of the patch is that we now generate:

        mov     x29, sp

instead of:

        add     x29, sp, 0

in the pr70044.c test.

2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
    Alan Hayward  <alan.hayward@arm.com>
    David Sherwood  <david.sherwood@arm.com>

gcc/
* config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
x exists before using it.
(aarch64_add_constant_internal): Rename to...
(aarch64_add_offset_1): ...this.  Replace regnum with separate
src and dest rtxes.  Handle the case in which they're different,
including when the offset is zero.  Replace scratchreg with an rtx.
Use 2 additions if there is no spare register into which we can
move a 16-bit constant.
(aarch64_add_constant): Delete.
(aarch64_add_offset): Replace reg with separate src and dest
rtxes.  Take a poly_int64 offset instead of a HOST_WIDE_INT.
Use aarch64_add_offset_1.
(aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
an rtx rather than an int.  Take the delta as a poly_int64
rather than a HOST_WIDE_INT.  Use aarch64_add_offset.
(aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
(aarch64_expand_prologue): Update calls to aarch64_sub_sp,
aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
(aarch64_expand_epilogue): Update calls to aarch64_add_offset
and aarch64_add_sp.
(aarch64_output_mi_thunk): Use aarch64_add_offset rather than
aarch64_add_constant.

gcc/testsuite/
* gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256532