From 38ec23fafb167ddfe840d7bb22b3e943d8a7d29e Mon Sep 17 00:00:00 2001 From: Richard Sandiford Date: Tue, 18 Jan 2022 12:20:00 +0000 Subject: [PATCH] aarch64: Fix overly optimistic LDP/STP matching [PR104005] In g:526e1639aa76b0a8496b0dc3a3ff2c450229544e I'd added support for finding more consecutive MEMs. However, the check was too eager, in that it matched MEM_REFs with the same base address even if that base address was an arbitrary SSA name. This can give wrong results if a MEM_REF from one loop iteration is compared with a MEM_REF from another (e.g. after rtl unrolling). In principle, we could still accept MEM_REFs based on the same incoming SSA name, but there seems to be no out-of-the-box API for doing that. Adding a new one at this stage in GCC 12 doesn't feel like a good risk/reward trade-off. This patch therefore restricts the MEM_EXPR comparison to base decls only, excluding all MEM_REFs. It means we lose all the new STPs in the PR testcase but keep the ones in the original stp_1.c testcase. gcc/ PR target/104005 * config/aarch64/aarch64.cc (aarch64_check_consecutive_mems): When using MEM_EXPR, require the base to be a decl. gcc/testsuite/ PR target/104005 * gcc.target/aarch64/pr104005.c: New test. --- gcc/config/aarch64/aarch64.cc | 1 + gcc/testsuite/gcc.target/aarch64/pr104005.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr104005.c diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index fdf0c9b..296145e 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24747,6 +24747,7 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed) &expr_offset2); if (!expr_base1 || !expr_base2 + || !DECL_P (expr_base1) || !operand_equal_p (expr_base1, expr_base2, OEP_ADDRESS_OF)) return false; diff --git a/gcc/testsuite/gcc.target/aarch64/pr104005.c b/gcc/testsuite/gcc.target/aarch64/pr104005.c new file mode 100644 index 0000000..09dd819 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr104005.c @@ -0,0 +1,17 @@ +/* { dg-options "-O2 -funroll-loops" } */ + +typedef int v2 __attribute__((vector_size(8))); + +void f(void) { + v2 v[1024]; + v2 *ptr = v; + for (int i = 0; i < 512; ++i) + { + ptr[0][0] = 0; + asm volatile ("":::"memory"); + ptr[0][1] = 1; + ptr += 2; + } +} + +/* { dg-final { scan-assembler-not {\tstp\t} } } */ -- 2.7.4