aarch64: Add costs for LD[234]/ST[234] permutes
At the moment, we cost LD[234] and ST[234] as N vector loads
or stores, which effectively treats the implied permute as free.
This patch adds additional costs for the permutes, which apply on
top of the costs for the loads and stores.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64-protos.h (simd_vec_cost::ld2_st2_permute_cost)
(simd_vec_cost::ld3_st3_permute_cost): New member variables.
(simd_vec_cost::ld4_st4_permute_cost): Likewise.
* config/aarch64/aarch64.c (generic_advsimd_vector_cost): Update
accordingly, using zero for the new costs.
(generic_sve_vector_cost, a64fx_advsimd_vector_cost): Likewise.
(a64fx_sve_vector_cost, qdf24xx_advsimd_vector_cost): Likewise.
(thunderx_advsimd_vector_cost, tsv110_advsimd_vector_cost): Likewise.
(cortexa57_advsimd_vector_cost, exynosm1_advsimd_vector_cost)
(xgene1_advsimd_vector_cost, thunderx2t99_advsimd_vector_cost)
(thunderx3t110_advsimd_vector_cost): Likewise.
(aarch64_ld234_st234_vectors): New function.
(aarch64_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Call aarch64_adjust_stmt_cost if using
the new vector costs.