review.tizen.org Git - platform/upstream/gcc.git/commit

aarch64: Tweak SVE load/store costs

We were seeing performance regressions on 256-bit SVE with code like:

  for (int i = 0; i < count; ++i)
  #pragma GCC unroll 128
    for (int j = 0; j < 128; ++j)
      *dst++ = 1;

(derived from lmbench).
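The loop can be wrapped in a standalone function so that it compiles on its own. This is a minimal sketch. It assumes `dst` is an `int *`, because the commit message does not show its declaration. The name `fill_ones` is hypothetical. Compiling it with something like `gcc -O3 -march=armv8.2-a+sve -msve-vector-bits=256 -S` lets you check whether the vectorizer chose Advanced SIMD STPs or SVE stores. The result depends on the GCC version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the loop from the commit message.
   Each outer iteration stores 128 ints (assumed element type).  */
void
fill_ones (int *dst, int count)
{
  assert (dst != NULL || count <= 0);
  for (int i = 0; i < count; ++i)
    {
#pragma GCC unroll 128
      for (int j = 0; j < 128; ++j)
	*dst++ = 1;
    }
}
```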

For 128-bit SVE, it's clearly better to use Advanced SIMD STPs here,
since they can store 256 bits at a time.  We already do this for
-msve-vector-bits=128 because in that case Advanced SIMD comes first
in autovectorize_vector_modes.
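The 128-bit case reduces to simple arithmetic. Assume 32-bit `int` elements, which the commit does not state. One outer iteration then stores 128 × 32 = 4096 bits. An Advanced SIMD STP of two Q registers writes 256 bits, while this sketch models SVE as one store per vector. The helper names below are hypothetical.

```c
#include <assert.h>

/* Bits stored per outer iteration: 128 elements of 32 bits (assumed).  */
enum { BITS_PER_OUTER_ITER = 128 * 32 };

/* Number of store instructions needed to write BITS bits when each
   instruction writes STORE_BITS bits.  */
unsigned int
store_insns (unsigned int bits, unsigned int store_bits)
{
  assert (store_bits > 0 && bits % store_bits == 0);
  return bits / store_bits;
}

/* Advanced SIMD: one STP of two 128-bit Q registers writes 256 bits.  */
unsigned int
advsimd_stp_insns (unsigned int bits)
{
  return store_insns (bits, 2 * 128);
}

/* SVE: modeled as one contiguous store per VL_BITS-bit vector.  */
unsigned int
sve_store_insns (unsigned int bits, unsigned int vl_bits)
{
  return store_insns (bits, vl_bits);
}
```

With these assumptions, 128-bit SVE needs 32 stores per outer iteration, while Advanced SIMD needs 16 STPs.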

If we handled full-loop predication well for this kind of loop,
the choice between Advanced SIMD and 256-bit SVE would be mostly
a wash, since both of them could store 256 bits at a time.  However,
SVE would still have the extra prologue overhead of setting up the
predicate, so Advanced SIMD would still be the natural choice.

As things stand though, we don't handle full-loop predication well
for this kind of loop, so the 256-bit SVE code is significantly worse.
Something to fix for GCC 11 (hopefully).  However, even though we
account for the overhead of predication in the cost model, the SVE
version (wrongly) appeared to need half as many stores as the Advanced
SIMD version, because each 128-bit Advanced SIMD store was counted
separately rather than as half of an STP.  That was enough to drown
out the predication overhead and meant that we'd pick the SVE code
over the Advanced SIMD code.
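The mis-costing can be modeled numerically. The sketch below assumes 32-bit elements and one cost unit per store instruction. It also uses a flat predication overhead `PRED_OVERHEAD`, whose value of 2 is hypothetical because the commit gives no figure. None of these names come from GCC.

```c
#include <assert.h>

/* Hypothetical cost of setting up the SVE loop predicate.  */
enum { PRED_OVERHEAD = 2 };

/* Bits stored per outer iteration: 128 elements of 32 bits (assumed).  */
enum { ITER_BITS = 128 * 32 };

/* What the model saw for Advanced SIMD: one unit per 128-bit store.  */
unsigned int
advsimd_costed_stores (unsigned int bits)
{
  return bits / 128;
}

/* What the generated code needs once stores are paired into STPs.  */
unsigned int
advsimd_actual_stores (unsigned int bits)
{
  return bits / 256;
}

/* SVE: one store per VL_BITS-bit vector plus the predication overhead.  */
unsigned int
sve_cost (unsigned int bits, unsigned int vl_bits)
{
  assert (vl_bits >= 128 && vl_bits % 128 == 0);
  return bits / vl_bits + PRED_OVERHEAD;
}
```

For 256-bit SVE this gives 18 against an apparent Advanced SIMD cost of 32, so SVE looks cheaper. The code that actually runs, however, uses 16 STPs against 16 SVE stores plus predicate setup.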

512-bit SVE has a clear advantage over Advanced SIMD, since each SVE
store writes 512 bits while an STP writes 256, so we should continue
using SVE there.

This patch tries to account for this in the cost model.  It's a bit
of a compromise; see the comment in the patch for more details.
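The ChangeLog entry summarizes the rule: double the cost of SVE load and store insns if one loop iteration has enough scalar elements to use an Advanced SIMD LDP or STP. The sketch below is a hypothetical model of that sentence and is not GCC's code. It reads "enough" as at least 2 × 128 bits per scalar iteration. It reuses the same assumptions as a plain store count: 128 ints per outer iteration and one cost unit per store.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reading of the ChangeLog: the scalar loop body accesses
   at least one full LDP/STP's worth (2 x 128 bits) of data.  */
bool
could_use_advsimd_pair (unsigned int elems_per_scalar_iter,
			unsigned int elem_bits)
{
  return elems_per_scalar_iter * elem_bits >= 2 * 128;
}

/* Adjusted cost of one SVE vector load or store.  */
unsigned int
sve_mem_cost (unsigned int base_cost, unsigned int elems_per_scalar_iter,
	      unsigned int elem_bits)
{
  assert (elem_bits > 0);
  if (could_use_advsimd_pair (elems_per_scalar_iter, elem_bits))
    return base_cost * 2;
  return base_cost;
}

/* Total SVE store cost per outer iteration for a given vector length,
   assuming 128 32-bit elements per iteration and unit base cost.  */
unsigned int
sve_loop_store_cost (unsigned int vl_bits, unsigned int pred_overhead)
{
  unsigned int elems = 128, elem_bits = 32;
  assert (vl_bits >= 128 && (elems * elem_bits) % vl_bits == 0);
  unsigned int stores = elems * elem_bits / vl_bits;
  return stores * sve_mem_cost (1, elems, elem_bits) + pred_overhead;
}
```

Suppose Advanced SIMD is still costed at 32 units (one per 128-bit store) and the predication overhead is a hypothetical 2. Then 256-bit SVE costs 34 and loses, while 512-bit SVE costs 18 and still wins. This matches the intent described above.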

2020-04-17  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_advsimd_ldp_stp_p): New function.
	(aarch64_sve_adjust_stmt_cost): Add a vectype parameter.  Double the
	cost of load and store insns if one loop iteration has enough scalar
	elements to use an Advanced SIMD LDP or STP.
	(aarch64_add_stmt_cost): Update call accordingly.

gcc/testsuite/
	* gcc.target/aarch64/sve/cost_model_2.c: New test.
	* gcc.target/aarch64/sve/cost_model_3.c: Likewise.
	* gcc.target/aarch64/sve/cost_model_4.c: Likewise.
	* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
	* gcc.target/aarch64/sve/cost_model_6.c: Likewise.
	* gcc.target/aarch64/sve/cost_model_7.c: Likewise.

author	Richard Sandiford <richard.sandiford@arm.com>
	Tue, 14 Apr 2020 20:04:03 +0000 (21:04 +0100)
committer	Richard Sandiford <richard.sandiford@arm.com>
	Fri, 17 Apr 2020 15:09:38 +0000 (16:09 +0100)
commit	8b50d7a47624030d87645237c60bd8f7ac78b2ec
tree	00f4b13286baea1366f15755050f0a7f8e6f7913
parent	2e3897490e0f99b22a2813cfb34d59a1ea71ff68

gcc/ChangeLog
gcc/config/aarch64/aarch64.c
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/cost_model_3.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/cost_model_4.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/cost_model_6.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/cost_model_7.c	[new file with mode: 0644]