[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as depeden...
author Andrea Di Biagio <andrea.dibiagio@sony.com>
Mon, 4 May 2020 17:23:04 +0000 (18:23 +0100)
committer Andrea Di Biagio <andrea.dibiagio@sony.com>
Tue, 5 May 2020 09:25:36 +0000 (10:25 +0100)
commit 5578ec32f9c4fef46adce52a2e3d22bf409b3d2c
tree 7a1b4d270b9df9c12cd2eec593e8d471127ac86a
parent 08032e7192d4e0120adae95c4c0a6ced17583e64
[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793.

This fixes a regression introduced by a very old commit 280ac1fd1dc35 (was
llvm-svn 361950).

Commit 280ac1fd1dc35 redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).

The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations.  However,
that concept was used not only for alias dependencies but also for memory
"order" dependencies (enforced by the memory consistency model).

Instructions in the same memory group were considered "equivalent", as in:
independent operations that can potentially execute in parallel.  The problem
was that the cost of a dependency (in number of cycles) should have been
different for "order" dependencies. Instructions in an order dependency simply
have to wait until their predecessors are "issued" to an underlying pipeline
(rather than having to wait until their predecessors have been fully executed).
For simple "order" dependencies, the old logic was effectively introducing an
artificial delay on the "issue" of independent loads and stores.
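The difference between the two kinds of dependency can be illustrated with a small toy model. This is a hypothetical sketch, not the llvm::mca::LSUnit API: the names `DepKind`, `MemOp`, `issueCycles` and `exampleOps` are invented for illustration, and it assumes unlimited issue bandwidth and that an order-dependent operation may issue one cycle after its predecessor issues. Setting `TreatOrderAsData` costs order dependencies like data dependencies, mimicking the artificial delay described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, NOT the llvm::mca::LSUnit API.
//  - Order: enforced by the memory consistency model; the successor only
//    has to wait until its predecessor has been issued.
//  - Data:  an alias dependency; the successor has to wait until its
//    predecessor has fully executed.
enum class DepKind { Order, Data };

struct Dep {
  std::size_t Pred; // index of an earlier operation (program order)
  DepKind Kind;
};

struct MemOp {
  int Latency; // execution latency in cycles
  std::vector<Dep> Deps;
};

// Computes the earliest issue cycle of each operation. Assumptions: all ops
// are available at cycle 0, issue bandwidth is unlimited, and an
// order-dependent op may issue one cycle after its predecessor issues.
// TreatOrderAsData = true costs order dependencies like data dependencies.
std::vector<int> issueCycles(const std::vector<MemOp> &Ops,
                             bool TreatOrderAsData) {
  std::vector<int> Issue(Ops.size(), 0);
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    int Ready = 0;
    for (const Dep &D : Ops[I].Deps) {
      assert(D.Pred < I && "predecessor must come earlier in program order");
      const int PredIssue = Issue[D.Pred];
      const bool WaitForExecution =
          D.Kind == DepKind::Data || TreatOrderAsData;
      const int Constraint = WaitForExecution
                                 ? PredIssue + Ops[D.Pred].Latency
                                 : PredIssue + 1;
      Ready = std::max(Ready, Constraint);
    }
    Issue[I] = Ready;
  }
  return Issue;
}

// Example: a 5-cycle load followed by two 1-cycle stores, each only
// order-dependent on the previous memory operation (no aliasing).
std::vector<MemOp> exampleOps() {
  return {
      {5, {}},
      {1, {{0, DepKind::Order}}},
      {1, {{1, DepKind::Order}}},
  };
}
```

Under these assumptions, `issueCycles(exampleOps(), false)` yields issue cycles 0, 1 and 2, while `issueCycles(exampleOps(), true)` yields 0, 5 and 6: waiting for full execution of an order predecessor needlessly pushes back the issue of the independent stores.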

This patch fixes the issue and adds a new test named 'independent-load-stores.s'
for several x86 targets (BtVer2, Haswell, SkylakeClient and SkylakeServer). That
test contains the reproducer posted by Fabian Ritter on PR45793.

I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).

Some tests for the Barcelona processor are improved/fixed by this change and
now show better results.  In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue.  In one case in particular, we now
correctly see a store executed out of order.  That test was affected by the same
underlying issue reported as PR45793.

Reviewers: mattd

Differential Revision: https://reviews.llvm.org/D79351
18 files changed:
llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
llvm/lib/MCA/HardwareUnits/LSUnit.cpp
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s
llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s
llvm/test/tools/llvm-mca/AArch64/Exynos/store.s
llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s
llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s
llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
llvm/test/tools/llvm-mca/X86/BtVer2/independent-load-stores.s [new file with mode: 0644]
llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s
llvm/test/tools/llvm-mca/X86/Haswell/independent-load-stores.s [new file with mode: 0644]
llvm/test/tools/llvm-mca/X86/SkylakeClient/independent-load-stores.s [new file with mode: 0644]
llvm/test/tools/llvm-mca/X86/SkylakeServer/independent-load-stores.s [new file with mode: 0644]