[llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics...

author Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>

Thu, 30 Aug 2018 10:50:20 +0000 (10:50 +0000)

committer Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>

Thu, 30 Aug 2018 10:50:20 +0000 (10:50 +0000)
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index 43e64c3..100136e 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -479,13 +479,13 @@ sections.
    Dynamic Dispatch Stall Cycles:
    RAT     - Register unavailable:                      0
    RCU     - Retire tokens unavailable:                 0
-  SCHEDQ  - Scheduler full:                            272
+  SCHEDQ  - Scheduler full:                            272  (44.6%)
    LQ      - Load queue full:                           0
    SQ      - Store queue full:                          0
    GROUP   - Static restrictions on the dispatch group: 0
  
  
-  Dispatch Logic - number of cycles where we saw N instructions dispatched:
+  Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
    [# dispatched], [# cycles]
     0,              24  (3.9%)
     1,              272  (44.6%)
@@ -533,12 +533,11 @@ sections.
  
  If we look at the *Dynamic Dispatch Stall Cycles* table, we see the counter for
  SCHEDQ reports 272 cycles.  This counter is incremented every time the dispatch
-logic is unable to dispatch a group of two instructions because the scheduler's
-queue is full.
+logic is unable to dispatch a full group because the scheduler's queue is full.
  
  Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
-dispatch two instructions 51.5% of the time.  The dispatch group was limited to
-one instruction 44.6% of the cycles, which corresponds to 272 cycles.  The
+dispatch two micro opcodes 51.5% of the time.  The dispatch group was limited to
+one micro opcode 44.6% of the cycles, which corresponds to 272 cycles.  The
  dispatch statistics are displayed by either using the command option
  ``-all-stats`` or ``-dispatch-stats``.
  
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-1.s b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-1.s
index 0f95d5c..0319bd6 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-1.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-1.s
@@ -22,7 +22,7 @@ vmulps %xmm0, %xmm0, %xmm0
  # CHECK-NEXT: SQ      - Store queue full:                          0
  # CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# CHECK:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # CHECK-NEXT: [# dispatched], [# cycles]
  # CHECK-NEXT:  0,              23  (82.1%)
  # CHECK-NEXT:  2,              5  (17.9%)
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-2.s b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-2.s
index b68ed9c..5f3fe1e 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-2.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-2.s
@@ -15,14 +15,14 @@ vmulps %xmm0, %xmm0, %xmm0
  # CHECK-NEXT: Block RThroughput: 1.0
  
  # CHECK:      Dynamic Dispatch Stall Cycles:
-# CHECK-NEXT: RAT     - Register unavailable:                      13
+# CHECK-NEXT: RAT     - Register unavailable:                      13  (46.4%)
  # CHECK-NEXT: RCU     - Retire tokens unavailable:                 0
  # CHECK-NEXT: SCHEDQ  - Scheduler full:                            0
  # CHECK-NEXT: LQ      - Load queue full:                           0
  # CHECK-NEXT: SQ      - Store queue full:                          0
  # CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# CHECK:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # CHECK-NEXT: [# dispatched], [# cycles]
  # CHECK-NEXT:  0,              20  (71.4%)
  # CHECK-NEXT:  1,              6  (21.4%)
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-3.s b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-3.s
index 12aeed7..342f122 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-3.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-3.s
@@ -25,17 +25,17 @@ idiv %eax
  # CHECK-NEXT:  2      25    25.00                 U     idivl  %eax
  
  # CHECK:      Dynamic Dispatch Stall Cycles:
-# CHECK-NEXT: RAT     - Register unavailable:                      26
+# CHECK-NEXT: RAT     - Register unavailable:                      26  (47.3%)
  # CHECK-NEXT: RCU     - Retire tokens unavailable:                 0
  # CHECK-NEXT: SCHEDQ  - Scheduler full:                            0
  # CHECK-NEXT: LQ      - Load queue full:                           0
  # CHECK-NEXT: SQ      - Store queue full:                          0
  # CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# CHECK:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # CHECK-NEXT: [# dispatched], [# cycles]
  # CHECK-NEXT:  0,              53  (96.4%)
-# CHECK-NEXT:  1,              2  (3.6%)
+# CHECK-NEXT:  2,              2  (3.6%)
  
  # CHECK:      Register File statistics:
  # CHECK-NEXT: Total number of mappings created:    6
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-4.s b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-4.s
index d67d5e4..0291ef2 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-4.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-4.s
@@ -25,17 +25,17 @@ idiv %eax
  # CHECK-NEXT:  2      25    25.00                 U     idivl  %eax
  
  # CHECK:      Dynamic Dispatch Stall Cycles:
-# CHECK-NEXT: RAT     - Register unavailable:                      6
+# CHECK-NEXT: RAT     - Register unavailable:                      6  (1.1%)
  # CHECK-NEXT: RCU     - Retire tokens unavailable:                 0
  # CHECK-NEXT: SCHEDQ  - Scheduler full:                            0
  # CHECK-NEXT: LQ      - Load queue full:                           0
  # CHECK-NEXT: SQ      - Store queue full:                          0
  # CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# CHECK:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # CHECK-NEXT: [# dispatched], [# cycles]
  # CHECK-NEXT:  0,              531  (96.0%)
-# CHECK-NEXT:  1,              22  (4.0%)
+# CHECK-NEXT:  2,              22  (4.0%)
  
  # CHECK:      Register File statistics:
  # CHECK-NEXT: Total number of mappings created:    66
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-5.s b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-5.s
index 3d09bc7..f676e77 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-5.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/register-files-5.s
@@ -47,16 +47,16 @@
  
  # CHECK:      Dynamic Dispatch Stall Cycles:
  # CHECK-NEXT: RAT     - Register unavailable:                      0
-# CHECK-NEXT: RCU     - Retire tokens unavailable:                 8
+# CHECK-NEXT: RCU     - Retire tokens unavailable:                 8  (11.6%)
  # CHECK-NEXT: SCHEDQ  - Scheduler full:                            0
  # CHECK-NEXT: LQ      - Load queue full:                           0
  # CHECK-NEXT: SQ      - Store queue full:                          0
  # CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# CHECK:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # CHECK-NEXT: [# dispatched], [# cycles]
  # CHECK-NEXT:  0,              36  (52.2%)
-# CHECK-NEXT:  1,              33  (47.8%)
+# CHECK-NEXT:  2,              33  (47.8%)
  
  # CHECK:      Register File statistics:
  # CHECK-NEXT: Total number of mappings created:    66
diff --git a/llvm/test/tools/llvm-mca/X86/Haswell/cmpxchg16b.s b/llvm/test/tools/llvm-mca/X86/Haswell/cmpxchg16b.s
new file mode 100644
index 0000000..f0a0567
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/Haswell/cmpxchg16b.s
@@ -0,0 +1,76 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -timeline -timeline-max-iterations=3 -dispatch-stats < %s | FileCheck %s
+
+cmpxchg16b (%rsi)
+
+# CHECK:      Iterations:        100
+# CHECK-NEXT: Instructions:      100
+# CHECK-NEXT: Total Cycles:      2203
+# CHECK-NEXT: Total uOps:        1900
+
+# CHECK:      Dispatch Width:    4
+# CHECK-NEXT: uOps Per Cycle:    0.86
+# CHECK-NEXT: IPC:               0.05
+# CHECK-NEXT: Block RThroughput: 4.8
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  19     22    4.00    *      *            cmpxchg16b     (%rsi)
+
+# CHECK:      Dynamic Dispatch Stall Cycles:
+# CHECK-NEXT: RAT     - Register unavailable:                      0
+# CHECK-NEXT: RCU     - Retire tokens unavailable:                 1487  (67.5%)
+# CHECK-NEXT: SCHEDQ  - Scheduler full:                            0
+# CHECK-NEXT: LQ      - Load queue full:                           0
+# CHECK-NEXT: SQ      - Store queue full:                          0
+# CHECK-NEXT: GROUP   - Static restrictions on the dispatch group: 0
+
+# CHECK:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
+# CHECK-NEXT: [# dispatched], [# cycles]
+# CHECK-NEXT:  0,              1703  (77.3%)
+# CHECK-NEXT:  3,              100  (4.5%)
+# CHECK-NEXT:  4,              400  (18.2%)
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0]   - HWDivider
+# CHECK-NEXT: [1]   - HWFPDivider
+# CHECK-NEXT: [2]   - HWPort0
+# CHECK-NEXT: [3]   - HWPort1
+# CHECK-NEXT: [4]   - HWPort2
+# CHECK-NEXT: [5]   - HWPort3
+# CHECK-NEXT: [6]   - HWPort4
+# CHECK-NEXT: [7]   - HWPort5
+# CHECK-NEXT: [8]   - HWPort6
+# CHECK-NEXT: [9]   - HWPort7
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]
+# CHECK-NEXT:  -      -     2.00   6.00   0.66   0.67   1.00   4.00   4.00   0.67
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
+# CHECK-NEXT:  -      -     2.00   6.00   0.66   0.67   1.00   4.00   4.00   0.67   cmpxchg16b (%rsi)
+
+# CHECK:      Timeline view:
+# CHECK-NEXT:                     0123456789          0123456789          0123456789
+# CHECK-NEXT: Index     0123456789          0123456789          0123456789          012345678
+
+# CHECK:      [0,0]     DeeeeeeeeeeeeeeeeeeeeeeER.    .    .    .    .    .    .    .    .  .   cmpxchg16b     (%rsi)
+# CHECK-NEXT: [1,0]     .    D=================eeeeeeeeeeeeeeeeeeeeeeER   .    .    .    .  .   cmpxchg16b     (%rsi)
+# CHECK-NEXT: [2,0]     .    .    D==================================eeeeeeeeeeeeeeeeeeeeeeER   cmpxchg16b     (%rsi)
+
+# CHECK:      Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK:            [0]    [1]    [2]    [3]
+# CHECK-NEXT: 0.     3     18.0   0.3    0.0       cmpxchg16b  (%rsi)
diff --git a/llvm/test/tools/llvm-mca/X86/option-all-stats-1.s b/llvm/test/tools/llvm-mca/X86/option-all-stats-1.s
index 5763caa..60276c3 100644
--- a/llvm/test/tools/llvm-mca/X86/option-all-stats-1.s
+++ b/llvm/test/tools/llvm-mca/X86/option-all-stats-1.s
@@ -30,12 +30,12 @@ add %eax, %eax
  # FULLREPORT:      Dynamic Dispatch Stall Cycles:
  # FULLREPORT-NEXT: RAT     - Register unavailable:                      0
  # FULLREPORT-NEXT: RCU     - Retire tokens unavailable:                 0
-# FULLREPORT-NEXT: SCHEDQ  - Scheduler full:                            61
+# FULLREPORT-NEXT: SCHEDQ  - Scheduler full:                            61  (59.2%)
  # FULLREPORT-NEXT: LQ      - Load queue full:                           0
  # FULLREPORT-NEXT: SQ      - Store queue full:                          0
  # FULLREPORT-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# FULLREPORT:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# FULLREPORT:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # FULLREPORT-NEXT: [# dispatched], [# cycles]
  # FULLREPORT-NEXT:  0,              22  (21.4%)
  # FULLREPORT-NEXT:  1,              62  (60.2%)
diff --git a/llvm/test/tools/llvm-mca/X86/option-all-stats-2.s b/llvm/test/tools/llvm-mca/X86/option-all-stats-2.s
index 3e8c8be..f5cfdc3 100644
--- a/llvm/test/tools/llvm-mca/X86/option-all-stats-2.s
+++ b/llvm/test/tools/llvm-mca/X86/option-all-stats-2.s
@@ -31,12 +31,12 @@ add %eax, %eax
  # FULL:      Dynamic Dispatch Stall Cycles:
  # FULL-NEXT: RAT     - Register unavailable:                      0
  # FULL-NEXT: RCU     - Retire tokens unavailable:                 0
-# FULL-NEXT: SCHEDQ  - Scheduler full:                            61
+# FULL-NEXT: SCHEDQ  - Scheduler full:                            61  (59.2%)
  # FULL-NEXT: LQ      - Load queue full:                           0
  # FULL-NEXT: SQ      - Store queue full:                          0
  # FULL-NEXT: GROUP   - Static restrictions on the dispatch group: 0
  
-# FULL:      Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# FULL:      Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # FULL-NEXT: [# dispatched], [# cycles]
  # FULL-NEXT:  0,              22  (21.4%)
  # FULL-NEXT:  1,              62  (60.2%)
diff --git a/llvm/test/tools/llvm-mca/X86/option-all-views-1.s b/llvm/test/tools/llvm-mca/X86/option-all-views-1.s
index 8950014..e707a66 100644
--- a/llvm/test/tools/llvm-mca/X86/option-all-views-1.s
+++ b/llvm/test/tools/llvm-mca/X86/option-all-views-1.s
@@ -32,12 +32,12 @@ add %eax, %eax
  # FULLREPORT:         Dynamic Dispatch Stall Cycles:
  # FULLREPORT-NEXT:    RAT     - Register unavailable:                      0
  # FULLREPORT-NEXT:    RCU     - Retire tokens unavailable:                 0
-# FULLREPORT-NEXT:    SCHEDQ  - Scheduler full:                            61
+# FULLREPORT-NEXT:    SCHEDQ  - Scheduler full:                            61  (59.2%)
  # FULLREPORT-NEXT:    LQ      - Load queue full:                           0
  # FULLREPORT-NEXT:    SQ      - Store queue full:                          0
  # FULLREPORT-NEXT:    GROUP   - Static restrictions on the dispatch group: 0
  
-# FULLREPORT:         Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# FULLREPORT:         Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # FULLREPORT-NEXT:    [# dispatched], [# cycles]
  # FULLREPORT-NEXT:     0,              22  (21.4%)
  # FULLREPORT-NEXT:     1,              62  (60.2%)
diff --git a/llvm/test/tools/llvm-mca/X86/option-all-views-2.s b/llvm/test/tools/llvm-mca/X86/option-all-views-2.s
index 30c1947..b71ec2a 100644
--- a/llvm/test/tools/llvm-mca/X86/option-all-views-2.s
+++ b/llvm/test/tools/llvm-mca/X86/option-all-views-2.s
@@ -31,12 +31,12 @@ add %eax, %eax
  # ALL:             Dynamic Dispatch Stall Cycles:
  # ALL-NEXT:        RAT     - Register unavailable:                      0
  # ALL-NEXT:        RCU     - Retire tokens unavailable:                 0
-# ALL-NEXT:        SCHEDQ  - Scheduler full:                            61
+# ALL-NEXT:        SCHEDQ  - Scheduler full:                            61  (59.2%)
  # ALL-NEXT:        LQ      - Load queue full:                           0
  # ALL-NEXT:        SQ      - Store queue full:                          0
  # ALL-NEXT:        GROUP   - Static restrictions on the dispatch group: 0
  
-# ALL:             Dispatch Logic - number of cycles where we saw N instructions dispatched:
+# ALL:             Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  # ALL-NEXT:        [# dispatched], [# cycles]
  # ALL-NEXT:         0,              22  (21.4%)
  # ALL-NEXT:         1,              62  (60.2%)
diff --git a/llvm/tools/llvm-mca/Views/DispatchStatistics.cpp b/llvm/tools/llvm-mca/Views/DispatchStatistics.cpp
index 15cdbd3..cccb09a 100644
--- a/llvm/tools/llvm-mca/Views/DispatchStatistics.cpp
+++ b/llvm/tools/llvm-mca/Views/DispatchStatistics.cpp
@@ -26,20 +26,23 @@ void DispatchStatistics::onEvent(const HWStallEvent &Event) {
  }
  
  void DispatchStatistics::onEvent(const HWInstructionEvent &Event) {
-  if (Event.Type == HWInstructionEvent::Dispatched)
-    ++NumDispatched;
+  if (Event.Type != HWInstructionEvent::Dispatched)
+    return;
+
+  const auto &DE = static_cast<const HWInstructionDispatchedEvent &>(Event);
+  NumDispatched += DE.MicroOpcodes;
  }
  
  void DispatchStatistics::printDispatchHistogram(llvm::raw_ostream &OS) const {
    std::string Buffer;
    raw_string_ostream TempStream(Buffer);
    TempStream << "\n\nDispatch Logic - "
-             << "number of cycles where we saw N instructions dispatched:\n";
+             << "number of cycles where we saw N micro opcodes dispatched:\n";
    TempStream << "[# dispatched], [# cycles]\n";
    for (const std::pair<unsigned, unsigned> &Entry : DispatchGroupSizePerCycle) {
+    double Percentage = ((double)Entry.second / NumCycles) * 100.0;
      TempStream << " " << Entry.first << ",              " << Entry.second
-               << "  ("
-               << format("%.1f", ((double)Entry.second / NumCycles) * 100.0)
+               << "  (" << format("%.1f", floor((Percentage * 10) + 0.5) / 10)
                 << "%)\n";
    }
  
@@ -47,24 +50,36 @@ void DispatchStatistics::printDispatchHistogram(llvm::raw_ostream &OS) const {
    OS << Buffer;
  }
  
+static void printStalls(raw_ostream &OS, unsigned NumStalls,
+                        unsigned NumCycles) {
+  if (!NumStalls) {
+    OS << NumStalls;
+    return;
+  }
+
+  double Percentage = ((double)NumStalls / NumCycles) * 100.0;
+  OS << NumStalls << "  ("
+     << format("%.1f", floor((Percentage * 10) + 0.5) / 10) << "%)";
+}
+
  void DispatchStatistics::printDispatchStalls(raw_ostream &OS) const {
    std::string Buffer;
-  raw_string_ostream TempStream(Buffer);
-  TempStream << "\n\nDynamic Dispatch Stall Cycles:\n";
-  TempStream << "RAT     - Register unavailable:                      "
-             << HWStalls[HWStallEvent::RegisterFileStall];
-  TempStream << "\nRCU     - Retire tokens unavailable:                 "
-             << HWStalls[HWStallEvent::RetireControlUnitStall];
-  TempStream << "\nSCHEDQ  - Scheduler full:                            "
-             << HWStalls[HWStallEvent::SchedulerQueueFull];
-  TempStream << "\nLQ      - Load queue full:                           "
-             << HWStalls[HWStallEvent::LoadQueueFull];
-  TempStream << "\nSQ      - Store queue full:                          "
-             << HWStalls[HWStallEvent::StoreQueueFull];
-  TempStream << "\nGROUP   - Static restrictions on the dispatch group: "
-             << HWStalls[HWStallEvent::DispatchGroupStall];
-  TempStream << '\n';
-  TempStream.flush();
+  raw_string_ostream SS(Buffer);
+  SS << "\n\nDynamic Dispatch Stall Cycles:\n";
+  SS << "RAT     - Register unavailable:                      ";
+  printStalls(SS, HWStalls[HWStallEvent::RegisterFileStall], NumCycles);
+  SS << "\nRCU     - Retire tokens unavailable:                 ";
+  printStalls(SS, HWStalls[HWStallEvent::RetireControlUnitStall], NumCycles);
+  SS << "\nSCHEDQ  - Scheduler full:                            ";
+  printStalls(SS, HWStalls[HWStallEvent::SchedulerQueueFull], NumCycles);
+  SS << "\nLQ      - Load queue full:                           ";
+  printStalls(SS, HWStalls[HWStallEvent::LoadQueueFull], NumCycles);
+  SS << "\nSQ      - Store queue full:                          ";
+  printStalls(SS, HWStalls[HWStallEvent::StoreQueueFull], NumCycles);
+  SS << "\nGROUP   - Static restrictions on the dispatch group: ";
+  printStalls(SS, HWStalls[HWStallEvent::DispatchGroupStall], NumCycles);
+  SS << '\n';
+  SS.flush();
    OS << Buffer;
  }
  
diff --git a/llvm/tools/llvm-mca/Views/DispatchStatistics.h b/llvm/tools/llvm-mca/Views/DispatchStatistics.h
index 9c64c72..0f6f75e 100644
--- a/llvm/tools/llvm-mca/Views/DispatchStatistics.h
+++ b/llvm/tools/llvm-mca/Views/DispatchStatistics.h
@@ -24,7 +24,7 @@
  /// GROUP   - Static restrictions on the dispatch group: 0
  ///
  ///
-/// Dispatch Logic - number of cycles where we saw N instructions dispatched:
+/// Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
  /// [# dispatched], [# cycles]
  ///  0,              15  (11.5%)
  ///  2,              4  (3.1%)
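The percentages shown in both tables are rounded to one decimal place with `floor((Percentage * 10) + 0.5) / 10`. The following is a minimal stand-alone sketch of that formatting, not the llvm-mca code itself: `formatStalls` is a hypothetical helper, and `std::snprintf` is used here in place of `llvm::format` so the snippet has no LLVM dependency.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of llvm-mca): formats a stall counter in the
// same style as the patched printStalls(), using snprintf instead of
// llvm::format.
std::string formatStalls(unsigned NumStalls, unsigned NumCycles) {
  assert(NumCycles != 0 && "need at least one simulated cycle");
  if (!NumStalls)
    return "0"; // Zero counters are printed without a percentage.
  double Percentage = (static_cast<double>(NumStalls) / NumCycles) * 100.0;
  // Round half up to one decimal place, as in the patch.
  double Rounded = std::floor((Percentage * 10) + 0.5) / 10;
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "%u  (%.1f%%)", NumStalls, Rounded);
  return Buf;
}
```

With the figures from the new cmpxchg16b test (1487 RCU stall cycles out of 2203 total cycles), `formatStalls(1487, 2203)` yields `1487  (67.5%)`, while a zero counter is printed as a bare `0`.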
diff --git a/llvm/tools/llvm-mca/Views/SummaryView.cpp b/llvm/tools/llvm-mca/Views/SummaryView.cpp
index 026742a..eb4c50c 100644
--- a/llvm/tools/llvm-mca/Views/SummaryView.cpp
+++ b/llvm/tools/llvm-mca/Views/SummaryView.cpp
@@ -33,12 +33,10 @@ SummaryView::SummaryView(const llvm::MCSchedModel &Model, const SourceMgr &S,
  }
  
  void SummaryView::onEvent(const HWInstructionEvent &Event) {
-  // We are only interested in the "instruction dispatched" events generated by
-  // the dispatch stage for instructions that are part of iteration #0.
-  if (Event.Type != HWInstructionEvent::Dispatched)
-    return;
-
-  if (Event.IR.getSourceIndex() >= Source.size())
+  // We are only interested in the "instruction retired" events generated by
+  // the retire stage for instructions that are part of iteration #0.
+  if (Event.Type != HWInstructionEvent::Retired ||
+      Event.IR.getSourceIndex() >= Source.size())
      return;
  
    // Update the cumulative number of resource cycles based on the processor
diff --git a/llvm/tools/llvm-mca/Views/TimelineView.cpp b/llvm/tools/llvm-mca/Views/TimelineView.cpp
index 863d05f..5ba151f 100644
--- a/llvm/tools/llvm-mca/Views/TimelineView.cpp
+++ b/llvm/tools/llvm-mca/Views/TimelineView.cpp
@@ -29,6 +29,8 @@ TimelineView::TimelineView(const MCSubtargetInfo &sti, MCInstPrinter &Printer,
      MaxIterations = DEFAULT_ITERATIONS;
    NumInstructions *= std::min(MaxIterations, AsmSequence.getNumIterations());
    Timeline.resize(NumInstructions);
+  TimelineViewEntry InvalidTVEntry = {-1, 0, 0, 0};
+  std::fill(Timeline.begin(), Timeline.end(), InvalidTVEntry);
  
    WaitTimeEntry NullWTEntry = {0, 0, 0};
    std::fill(WaitTime.begin(), WaitTime.end(), NullWTEntry);
@@ -68,10 +70,13 @@ void TimelineView::onEvent(const HWInstructionEvent &Event) {
        TVEntry.CycleRetired = CurrentCycle;
  
      // Update the WaitTime entry which corresponds to this Index.
+    assert(TVEntry.CycleDispatched >= 0 && "Invalid TVEntry found!");
+    unsigned CycleDispatched = static_cast<unsigned>(TVEntry.CycleDispatched);
      WaitTimeEntry &WTEntry = WaitTime[Index % AsmSequence.size()];
      WTEntry.CyclesSpentInSchedulerQueue +=
-        TVEntry.CycleIssued - TVEntry.CycleDispatched;
-    assert(TVEntry.CycleDispatched <= TVEntry.CycleReady);
+        TVEntry.CycleIssued - CycleDispatched;
+    assert(CycleDispatched <= TVEntry.CycleReady &&
+           "Instruction cannot be ready if it hasn't been dispatched yet!");
      WTEntry.CyclesSpentInSQWhileReady +=
          TVEntry.CycleIssued - TVEntry.CycleReady;
      WTEntry.CyclesSpentAfterWBAndBeforeRetire +=
@@ -88,7 +93,11 @@ void TimelineView::onEvent(const HWInstructionEvent &Event) {
      Timeline[Index].CycleExecuted = CurrentCycle;
      break;
    case HWInstructionEvent::Dispatched:
-    Timeline[Index].CycleDispatched = CurrentCycle;
+    // There may be multiple dispatch events. Microcoded instructions that are
+    // expanded into multiple uOps may require multiple dispatch cycles. Here,
+    // we want to capture the first dispatch cycle.
+    if (Timeline[Index].CycleDispatched == -1)
+      Timeline[Index].CycleDispatched = static_cast<int>(CurrentCycle);
      break;
    default:
      return;
@@ -193,19 +202,20 @@ void TimelineView::printTimelineViewEntry(formatted_raw_ostream &OS,
      OS << '\n';
    OS << '[' << Iteration << ',' << SourceIndex << ']';
    OS.PadToColumn(10);
-  for (unsigned I = 0, E = Entry.CycleDispatched; I < E; ++I)
+  assert(Entry.CycleDispatched >= 0 && "Invalid TimelineViewEntry!");
+  unsigned CycleDispatched = static_cast<unsigned>(Entry.CycleDispatched);
+  for (unsigned I = 0, E = CycleDispatched; I < E; ++I)
      OS << ((I % 5 == 0) ? '.' : ' ');
    OS << TimelineView::DisplayChar::Dispatched;
-  if (Entry.CycleDispatched != Entry.CycleExecuted) {
+  if (CycleDispatched != Entry.CycleExecuted) {
      // Zero latency instructions have the same value for CycleDispatched,
      // CycleIssued and CycleExecuted.
-    for (unsigned I = Entry.CycleDispatched + 1, E = Entry.CycleIssued; I < E;
-         ++I)
+    for (unsigned I = CycleDispatched + 1, E = Entry.CycleIssued; I < E; ++I)
        OS << TimelineView::DisplayChar::Waiting;
      if (Entry.CycleIssued == Entry.CycleExecuted)
        OS << TimelineView::DisplayChar::DisplayChar::Executed;
      else {
-      if (Entry.CycleDispatched != Entry.CycleIssued)
+      if (CycleDispatched != Entry.CycleIssued)
          OS << TimelineView::DisplayChar::Executing;
        for (unsigned I = Entry.CycleIssued + 1, E = Entry.CycleExecuted; I < E;
             ++I)
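Because an instruction may now trigger several dispatch events, the timeline has to keep only the first dispatch cycle. A reduced sketch of that sentinel pattern follows; `Entry` and `onDispatched` are hypothetical names that model only the `CycleDispatched` field, not the real `TimelineViewEntry`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduced entry: only the field touched by this change.
struct Entry {
  int CycleDispatched = -1; // A negative value means "not dispatched yet".
};

// Records CurrentCycle only for the first dispatch event seen for Index;
// later dispatch events for the same entry leave the value untouched.
void onDispatched(std::vector<Entry> &Timeline, unsigned Index,
                  unsigned CurrentCycle) {
  assert(Index < Timeline.size() && "index out of range");
  if (Timeline[Index].CycleDispatched == -1)
    Timeline[Index].CycleDispatched = static_cast<int>(CurrentCycle);
}
```

Calling `onDispatched` for the same index at cycles 5 and then 6 leaves `CycleDispatched` at 5, which is what the timeline needs in order to draw the `D` marker at the start of a multi-cycle dispatch.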
diff --git a/llvm/tools/llvm-mca/Views/TimelineView.h b/llvm/tools/llvm-mca/Views/TimelineView.h
index 9f50c20..361e37a 100644
--- a/llvm/tools/llvm-mca/Views/TimelineView.h
+++ b/llvm/tools/llvm-mca/Views/TimelineView.h
@@ -126,7 +126,7 @@ class TimelineView : public View {
    unsigned LastCycle;
  
    struct TimelineViewEntry {
-    unsigned CycleDispatched;
+    int CycleDispatched;  // A negative value is an "invalid cycle".
      unsigned CycleReady;
      unsigned CycleIssued;
      unsigned CycleExecuted;
diff --git a/llvm/tools/llvm-mca/include/HWEventListener.h b/llvm/tools/llvm-mca/include/HWEventListener.h
index fa574c2..be56c5c 100644
--- a/llvm/tools/llvm-mca/include/HWEventListener.h
+++ b/llvm/tools/llvm-mca/include/HWEventListener.h
@@ -70,12 +70,23 @@ public:
  
  class HWInstructionDispatchedEvent : public HWInstructionEvent {
  public:
-  HWInstructionDispatchedEvent(const InstRef &IR, llvm::ArrayRef<unsigned> Regs)
+  HWInstructionDispatchedEvent(const InstRef &IR, llvm::ArrayRef<unsigned> Regs,
+                               unsigned UOps)
        : HWInstructionEvent(HWInstructionEvent::Dispatched, IR),
-        UsedPhysRegs(Regs) {}
+        UsedPhysRegs(Regs), MicroOpcodes(UOps) {}
    // Number of physical register allocated for this instruction. There is one
    // entry per register file.
    llvm::ArrayRef<unsigned> UsedPhysRegs;
+  // Number of micro opcodes dispatched.
+  // This field is often set to the total number of micro-opcodes specified by
+  // the instruction descriptor of IR.
+  // The only exception is when IR declares a number of micro opcodes
+  // which exceeds the processor DispatchWidth, and - by construction - it
+  // requires multiple cycles to be fully dispatched. In that particular case,
+  // the dispatch logic would generate more than one dispatch event (one per
+  // cycle), and each event would declare how many micro opcodes have
+  // effectively been dispatched to the schedulers.
+  unsigned MicroOpcodes;
  };
  
  class HWInstructionRetiredEvent : public HWInstructionEvent {
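A listener can read the new `MicroOpcodes` field by checking the event type and then downcasting, as `DispatchStatistics::onEvent` does in this patch. The sketch below shows the pattern with simplified stand-in types; `InstructionEvent`, `DispatchedEvent`, and `DispatchCounter` are hypothetical and are not the llvm-mca classes.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the llvm-mca event classes.
struct InstructionEvent {
  enum Kind { Dispatched, Retired };
  explicit InstructionEvent(Kind K) : Type(K) {}
  Kind Type;
};

struct DispatchedEvent : InstructionEvent {
  explicit DispatchedEvent(unsigned UOps)
      : InstructionEvent(Dispatched), MicroOpcodes(UOps) {}
  unsigned MicroOpcodes; // Micro opcodes dispatched by this event.
};

struct DispatchCounter {
  unsigned NumDispatched = 0;

  void onEvent(const InstructionEvent &Event) {
    if (Event.Type != InstructionEvent::Dispatched)
      return;
    // The downcast is valid only because Type identifies the dynamic type.
    const auto &DE = static_cast<const DispatchedEvent &>(Event);
    NumDispatched += DE.MicroOpcodes;
  }
};
```

Feeding this counter dispatch events of 4 and 3 micro opcodes plus one retire event leaves `NumDispatched` at 7, whereas counting events (the behavior before this patch) would have reported 2.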
diff --git a/llvm/tools/llvm-mca/include/Stages/DispatchStage.h b/llvm/tools/llvm-mca/include/Stages/DispatchStage.h
index 02d1de5..0d3b8d6 100644
--- a/llvm/tools/llvm-mca/include/Stages/DispatchStage.h
+++ b/llvm/tools/llvm-mca/include/Stages/DispatchStage.h
@@ -51,6 +51,7 @@ class DispatchStage final : public Stage {
    unsigned DispatchWidth;
    unsigned AvailableEntries;
    unsigned CarryOver;
+  InstRef CarriedOver;
    const llvm::MCSubtargetInfo &STI;
    RetireControlUnit &RCU;
    RegisterFile &PRF;
@@ -63,7 +64,8 @@ class DispatchStage final : public Stage {
    void updateRAWDependencies(ReadState &RS, const llvm::MCSubtargetInfo &STI);
  
    void notifyInstructionDispatched(const InstRef &IR,
-                                   llvm::ArrayRef<unsigned> UsedPhysRegs);
+                                   llvm::ArrayRef<unsigned> UsedPhysRegs,
+                                   unsigned uOps);
  
    void collectWrites(llvm::SmallVectorImpl<WriteRef> &Vec,
                       unsigned RegID) const {
@@ -75,7 +77,7 @@ public:
                  const llvm::MCRegisterInfo &MRI, unsigned MaxDispatchWidth,
                  RetireControlUnit &R, RegisterFile &F)
        : DispatchWidth(MaxDispatchWidth), AvailableEntries(MaxDispatchWidth),
-        CarryOver(0U), STI(Subtarget), RCU(R), PRF(F) {}
+        CarryOver(0U), CarriedOver(), STI(Subtarget), RCU(R), PRF(F) {}
  
    bool isAvailable(const InstRef &IR) const override;
  
diff --git a/llvm/tools/llvm-mca/lib/Stages/DispatchStage.cpp b/llvm/tools/llvm-mca/lib/Stages/DispatchStage.cpp
index 81098cb..e874988 100644
--- a/llvm/tools/llvm-mca/lib/Stages/DispatchStage.cpp
+++ b/llvm/tools/llvm-mca/lib/Stages/DispatchStage.cpp
@@ -28,9 +28,11 @@ using namespace llvm;
  namespace mca {
  
  void DispatchStage::notifyInstructionDispatched(const InstRef &IR,
-                                                ArrayRef<unsigned> UsedRegs) {
+                                                ArrayRef<unsigned> UsedRegs,
+                                                unsigned UOps) {
    LLVM_DEBUG(dbgs() << "[E] Instruction Dispatched: #" << IR << '\n');
-  notifyEvent<HWInstructionEvent>(HWInstructionDispatchedEvent(IR, UsedRegs));
+  notifyEvent<HWInstructionEvent>(
+      HWInstructionDispatchedEvent(IR, UsedRegs, UOps));
  }
  
  bool DispatchStage::checkPRF(const InstRef &IR) const {
@@ -92,6 +94,7 @@ llvm::Error DispatchStage::dispatch(InstRef IR) {
      assert(AvailableEntries == DispatchWidth);
      AvailableEntries = 0;
      CarryOver = NumMicroOps - DispatchWidth;
+    CarriedOver = IR;
    } else {
      assert(AvailableEntries >= NumMicroOps);
      AvailableEntries -= NumMicroOps;
@@ -125,13 +128,26 @@ llvm::Error DispatchStage::dispatch(InstRef IR) {
  
    // Notify listeners of the "instruction dispatched" event,
    // and move IR to the next stage.
-  notifyInstructionDispatched(IR, RegisterFiles);
+  notifyInstructionDispatched(IR, RegisterFiles,
+                              std::min(DispatchWidth, NumMicroOps));
    return moveToTheNextStage(IR);
  }
  
  llvm::Error DispatchStage::cycleStart() {
+  if (!CarryOver) {
+    AvailableEntries = DispatchWidth;
+    return llvm::ErrorSuccess();
+  }
+
    AvailableEntries = CarryOver >= DispatchWidth ? 0 : DispatchWidth - CarryOver;
-  CarryOver = CarryOver >= DispatchWidth ? CarryOver - DispatchWidth : 0U;
+  unsigned DispatchedOpcodes = DispatchWidth - AvailableEntries;
+  CarryOver -= DispatchedOpcodes;
+  assert(CarriedOver.isValid() && "Invalid dispatched instruction");
+  
+  SmallVector<unsigned, 8> RegisterFiles(PRF.getNumRegisterFiles(), 0U);
+  notifyInstructionDispatched(CarriedOver, RegisterFiles, DispatchedOpcodes);
+  if (!CarryOver)
+    CarriedOver = InstRef();
    return llvm::ErrorSuccess();
  }
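The carry-over bookkeeping in `dispatch()` and `cycleStart()` can be modelled in isolation to see how many micro opcodes each dispatch event reports. This is a rough sketch under simplifying assumptions: `dispatchEvents` is a hypothetical function, and it ignores the register file, retire control unit, and scheduler checks that the real `DispatchStage` performs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-alone model of the carry-over logic. Returns the number
// of micro opcodes reported by each dispatch event for one instruction.
std::vector<unsigned> dispatchEvents(unsigned NumMicroOps,
                                     unsigned DispatchWidth) {
  assert(DispatchWidth > 0 && "dispatch width must be positive");
  std::vector<unsigned> Events;

  // dispatch(): the first event reports at most DispatchWidth micro opcodes.
  Events.push_back(std::min(DispatchWidth, NumMicroOps));
  unsigned CarryOver =
      NumMicroOps > DispatchWidth ? NumMicroOps - DispatchWidth : 0U;

  // cycleStart(): one extra event per cycle until the carry-over is drained.
  while (CarryOver) {
    unsigned AvailableEntries =
        CarryOver >= DispatchWidth ? 0U : DispatchWidth - CarryOver;
    unsigned DispatchedOpcodes = DispatchWidth - AvailableEntries;
    CarryOver -= DispatchedOpcodes;
    Events.push_back(DispatchedOpcodes);
  }
  return Events;
}
```

For the 19-uOp `cmpxchg16b` on a dispatch width of 4, this model produces four events of 4 micro opcodes followed by one of 3. Over 100 iterations that is consistent with the histogram in the new Haswell test, which reports 400 cycles with 4 micro opcodes dispatched and 100 cycles with 3.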