This patch adds two new fields, `Total uOps` and `uOps Per Cycle`, to the perf
report generated by the SummaryView. Fields are now logically organized into two
small groups; only the second group contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700

Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
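As a rough cross-check of the example above, the two ratio fields can be
recomputed from the raw counters. This is only a sketch in Python, not the
SummaryView code: the `summarize` helper is hypothetical, and rounding to two
decimals is an assumption that happens to match the sample report.

```python
def summarize(instructions: int, total_cycles: int, total_uops: int) -> dict:
    """Derive the two ratio fields from the raw counters.

    Hypothetical helper for illustration; not part of llvm-mca.
    """
    return {
        # Total uOps divided by Total Cycles.
        "uOps Per Cycle": round(total_uops / total_cycles, 2),
        # Instructions divided by Total Cycles.
        "IPC": round(instructions / total_cycles, 2),
    }


# Counters taken from the example report in this message.
report = summarize(instructions=300, total_cycles=414, total_uops=700)
print(report)
```

With the counters from the example, both ratios agree with the values printed
in the report (1.69 and 0.72). Block RThroughput is not derivable from these
counters alone, so it is left out of the sketch.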
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected and had to be updated with the `update_mca_test_checks.py` script.
llvm-svn: 340946
Iterations: 300
Instructions: 900
Total Cycles: 610
+ Total uOps: 900
+
Dispatch Width: 2
+ uOps Per Cycle: 1.48
IPC: 1.48
Block RThroughput: 2.0
- - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm3, %xmm3, %xmm4
According to this report, the dot-product kernel has been executed 300 times,
-for a total of 900 dynamically executed instructions.
+for a total of 900 simulated instructions. The total number of simulated micro
+opcodes (uOps) is also 900.
The report is structured in three main sections. The first section collects a
few performance numbers; the goal of this section is to give a very quick
-overview of the performance throughput. In this example, the two important
-performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal
+overview of the performance throughput. Important performance indicators are
+**IPC**, **uOps Per Cycle**, and **Block RThroughput** (Block Reciprocal
Throughput).
IPC is computed dividing the total number of simulated instructions by the total
-number of cycles. A delta between Dispatch Width and IPC is an indicator of a
-performance issue. In the absence of loop-carried data dependencies, the
+number of cycles. In the absence of loop-carried data dependencies, the
observed IPC tends to a theoretical maximum which can be computed by dividing
the number of instructions of a single iteration by the *Block RThroughput*.
-IPC is bounded from above by the dispatch width. That is because the dispatch
-width limits the maximum size of a dispatch group. IPC is also limited by the
-amount of hardware parallelism. The availability of hardware resources affects
-the resource pressure distribution, and it limits the number of instructions
-that can be executed in parallel every cycle. A delta between Dispatch
-Width and the theoretical maximum IPC is an indicator of a performance
-bottleneck caused by the lack of hardware resources. In general, the lower the
-Block RThroughput, the better.
-
-In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since
-there are no loop-carried dependencies, the observed IPC is expected to approach
-1.50 when the number of iterations tends to infinity. The delta between the
-Dispatch Width (2.00), and the theoretical maximum IPC (1.50) is an indicator of
-a performance bottleneck caused by the lack of hardware resources, and the
-*Resource pressure view* can help to identify the problematic resource usage.
+Field 'uOps Per Cycle' is computed dividing the total number of simulated micro
+opcodes by the total number of cycles. A delta between Dispatch Width and this
+field is an indicator of a performance issue. In the absence of loop-carried
+data dependencies, the observed 'uOps Per Cycle' should tend to a theoretical
+maximum throughput which can be computed by dividing the number of uOps of a
+single iteration by the *Block RThroughput*.
+
+Field *uOps Per Cycle* is bounded from above by the dispatch width. That is
+because the dispatch width limits the maximum size of a dispatch group. Both IPC
+and 'uOps Per Cycle' are limited by the amount of hardware parallelism. The
+availability of hardware resources affects the resource pressure distribution,
+and it limits the number of instructions that can be executed in parallel every
+cycle. A delta between Dispatch Width and the theoretical maximum uOps per
+Cycle (computed by dividing the number of uOps of a single iteration by the
+*Block RThroughput*) is an indicator of a performance bottleneck caused by the
+lack of hardware resources.
+In general, the lower the Block RThroughput, the better.
+
+In this example, ``uOps per iteration/Block RThroughput`` is 1.50. Since there
+are no loop-carried dependencies, the observed *uOps Per Cycle* is expected to
+approach 1.50 when the number of iterations tends to infinity. The delta between
+the Dispatch Width (2.00), and the theoretical maximum uOp throughput (1.50) is
+an indicator of a performance bottleneck caused by the lack of hardware
+resources, and the *Resource pressure view* can help to identify the problematic
+resource usage.
The second section of the report shows the latency and reciprocal
throughput of every instruction in the sequence. That section also reports
# CHECK: Iterations: 600
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 603
-# CHECK-NEXT: Dispatch Width: 3
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 3
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 1.0
# ALL-NEXT: Instructions: 300
# M1-NEXT: Total Cycles: 76
-# M1-NEXT: Dispatch Width: 4
+# M3-NEXT: Total Cycles: 51
+
+# ALL-NEXT: Total uOps: 300
+
+# M1: Dispatch Width: 4
+# M1-NEXT: uOps Per Cycle: 3.95
# M1-NEXT: IPC: 3.95
# M1-NEXT: Block RThroughput: 0.3
-# M3-NEXT: Total Cycles: 51
-# M3-NEXT: Dispatch Width: 6
+# M3: Dispatch Width: 6
+# M3-NEXT: uOps Per Cycle: 5.88
# M3-NEXT: IPC: 5.88
# M3-NEXT: Block RThroughput: 0.2
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 100
# CHECK-NEXT: Total Cycles: 28
-# CHECK-NEXT: Dispatch Width: 6
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 6
+# CHECK-NEXT: uOps Per Cycle: 3.57
# CHECK-NEXT: IPC: 3.57
# CHECK-NEXT: Block RThroughput: 0.3
# ALL: Iterations: 1
# ALL-NEXT: Instructions: 1
# ALL-NEXT: Total Cycles: 2
+# ALL-NEXT: Total uOps: 1
-# M1-NEXT: Dispatch Width: 4
-# M3-NEXT: Dispatch Width: 6
-
-# ALL-NEXT: IPC: 0.50
-
+# M1: Dispatch Width: 4
+# M1-NEXT: uOps Per Cycle: 0.50
+# M1-NEXT: IPC: 0.50
# M1-NEXT: Block RThroughput: 0.3
+
+# M3: Dispatch Width: 6
+# M3-NEXT: uOps Per Cycle: 0.50
+# M3-NEXT: IPC: 0.50
# M3-NEXT: Block RThroughput: 0.2
# ALL: Schedulers - number of cycles where we saw N instructions issued:
# CHECK: Iterations: 2
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 8
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 8
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 0.50
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 100
# CHECK-NEXT: Total Cycles: 105
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.95
# CHECK-NEXT: IPC: 0.95
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1000
# CHECK-NEXT: Instructions: 3000
# CHECK-NEXT: Total Cycles: 1506
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 3000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.99
# CHECK-NEXT: IPC: 1.99
# CHECK-NEXT: Block RThroughput: 1.5
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
# CHECK-NEXT: Total Cycles: 704
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1200
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.70
# CHECK-NEXT: IPC: 0.57
# CHECK-NEXT: Block RThroughput: 6.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 1800
# CHECK-NEXT: Total Cycles: 3811
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 3400
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.89
# CHECK-NEXT: IPC: 0.47
# CHECK-NEXT: Block RThroughput: 38.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 3000
# CHECK-NEXT: Total Cycles: 1504
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 3000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.99
# CHECK-NEXT: IPC: 1.99
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 6000
# CHECK-NEXT: Total Cycles: 3003
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 6000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 2.00
# CHECK-NEXT: IPC: 2.00
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 6000
# CHECK-NEXT: Total Cycles: 3001
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 6000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 2.00
# CHECK-NEXT: IPC: 2.00
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 3000
# CHECK-NEXT: Total Cycles: 3003
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 3000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 3007
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 6000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 2.00
# CHECK-NEXT: IPC: 1.50
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 500
# CHECK-NEXT: Instructions: 1500
# CHECK-NEXT: Total Cycles: 1504
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1500
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 1.5
# CHECK: Iterations: 300
# CHECK-NEXT: Instructions: 900
# CHECK-NEXT: Total Cycles: 610
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 900
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.48
# CHECK-NEXT: IPC: 1.48
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 11
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 2
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.18
# CHECK-NEXT: IPC: 0.18
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 12
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 3
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.17
# CHECK-NEXT: Block RThroughput: 2.0
# ENABLED: Iterations: 100
# ENABLED-NEXT: Instructions: 300
# ENABLED-NEXT: Total Cycles: 209
-# ENABLED-NEXT: Dispatch Width: 2
+# ENABLED-NEXT: Total uOps: 300
+
+
+# ENABLED: Dispatch Width: 2
+# ENABLED-NEXT: uOps Per Cycle: 1.44
# ENABLED-NEXT: IPC: 1.44
# ENABLED-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 800
# CHECK-NEXT: Total Cycles: 2403
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 800
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.33
# CHECK-NEXT: IPC: 0.33
# CHECK-NEXT: Block RThroughput: 4.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 800
# CHECK-NEXT: Total Cycles: 408
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 800
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.96
# CHECK-NEXT: IPC: 1.96
# CHECK-NEXT: Block RThroughput: 4.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 1500
# CHECK-NEXT: Total Cycles: 753
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1500
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.99
# CHECK-NEXT: IPC: 1.99
# CHECK-NEXT: Block RThroughput: 7.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 11
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.36
# CHECK-NEXT: IPC: 0.27
# CHECK-NEXT: Block RThroughput: 4.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 4503
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4500
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 1.5
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 7503
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 6000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.80
# CHECK-NEXT: IPC: 0.60
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 1500
# CHECK-NEXT: Total Cycles: 1503
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1500
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 7504
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 6000
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.80
# CHECK-NEXT: IPC: 0.60
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 8
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.50
# CHECK-NEXT: IPC: 0.38
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 800
# CHECK-NEXT: Total Cycles: 6306
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1200
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.19
# CHECK-NEXT: IPC: 0.13
# CHECK-NEXT: Block RThroughput: 63.0
# CHECK: Iterations: 2
# CHECK-NEXT: Instructions: 4
# CHECK-NEXT: Total Cycles: 205
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.02
# CHECK-NEXT: IPC: 0.02
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 800
# CHECK-NEXT: Total Cycles: 503
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 800
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.59
# CHECK-NEXT: IPC: 1.59
# CHECK-NEXT: Block RThroughput: 4.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 16
# CHECK-NEXT: Total Cycles: 31
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 16
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.52
# CHECK-NEXT: IPC: 0.52
# CHECK-NEXT: Block RThroughput: 21.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 10
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 2
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.20
# CHECK-NEXT: IPC: 0.20
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 10
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.40
# CHECK-NEXT: IPC: 0.20
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 7
-# CHECK-NEXT: Dispatch Width: 3
+# CHECK-NEXT: Total uOps: 3
+
+# CHECK: Dispatch Width: 3
+# CHECK-NEXT: uOps Per Cycle: 0.43
# CHECK-NEXT: IPC: 0.43
# CHECK-NEXT: Block RThroughput: 1.5
# CHECK: Iterations: 5
# CHECK-NEXT: Instructions: 10
# CHECK-NEXT: Total Cycles: 28
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 10
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.36
# CHECK-NEXT: IPC: 0.36
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 5
# CHECK-NEXT: Instructions: 10
# CHECK-NEXT: Total Cycles: 28
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 10
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.36
# CHECK-NEXT: IPC: 0.36
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 2
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 55
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.07
# CHECK-NEXT: IPC: 0.04
# CHECK-NEXT: Block RThroughput: 25.0
# CHECK: Iterations: 22
# CHECK-NEXT: Instructions: 22
# CHECK-NEXT: Total Cycles: 553
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 44
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.08
# CHECK-NEXT: IPC: 0.04
# CHECK-NEXT: Block RThroughput: 25.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 33
# CHECK-NEXT: Total Cycles: 69
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 66
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.96
# CHECK-NEXT: IPC: 0.48
# CHECK-NEXT: Block RThroughput: 64.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 10
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 2
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.20
# CHECK-NEXT: IPC: 0.20
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 100
# CHECK-NEXT: Total Cycles: 103
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.97
# CHECK-NEXT: IPC: 0.97
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 9
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 2
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.22
# CHECK-NEXT: IPC: 0.22
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 2
# CHECK-NEXT: Total Cycles: 10
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.40
# CHECK-NEXT: IPC: 0.20
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 55
# CHECK-NEXT: Total Cycles: 29
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 55
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.90
# CHECK-NEXT: IPC: 1.90
# CHECK-NEXT: Block RThroughput: 27.5
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 318
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.89
# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 318
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.89
# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 318
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.89
# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 318
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.89
# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
# CHECK-NEXT: Total Cycles: 316
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 600
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.90
# CHECK-NEXT: IPC: 1.90
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 9
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 0.44
# CHECK-NEXT: IPC: 0.33
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 4503
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 4500
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 0.8
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 7503
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 4500
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 0.60
# CHECK-NEXT: IPC: 0.60
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 1500
# CHECK-NEXT: Total Cycles: 1504
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 1500
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 1.00
# CHECK-NEXT: IPC: 1.00
# CHECK-NEXT: Block RThroughput: 0.3
# CHECK: Iterations: 1500
# CHECK-NEXT: Instructions: 4500
# CHECK-NEXT: Total Cycles: 10503
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 7500
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 0.71
# CHECK-NEXT: IPC: 0.43
# CHECK-NEXT: Block RThroughput: 1.3
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 9
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 4
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 0.44
# CHECK-NEXT: IPC: 0.33
# CHECK-NEXT: Block RThroughput: 1.0
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 3
# CHECK-NEXT: Total Cycles: 8
-# CHECK-NEXT: Dispatch Width: 4
+# CHECK-NEXT: Total uOps: 3
+
+# CHECK: Dispatch Width: 4
+# CHECK-NEXT: uOps Per Cycle: 0.38
# CHECK-NEXT: IPC: 0.38
# CHECK-NEXT: Block RThroughput: 1.0
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 10
-# BDWELL-NEXT: Dispatch Width: 4
-# BDWELL-NEXT: IPC: 0.20
-# BDWELL-NEXT: Block RThroughput: 1.0
+# BDWELL-NEXT: Total uOps: 4
# BTVER2-NEXT: Total Cycles: 7
-# BTVER2-NEXT: Dispatch Width: 2
-# BTVER2-NEXT: IPC: 0.29
-# BTVER2-NEXT: Block RThroughput: 1.0
+# BTVER2-NEXT: Total uOps: 2
# HASWELL-NEXT: Total Cycles: 10
-# HASWELL-NEXT: Dispatch Width: 4
-# HASWELL-NEXT: IPC: 0.20
-# HASWELL-NEXT: Block RThroughput: 1.0
+# HASWELL-NEXT: Total uOps: 4
# SKYLAKE-NEXT: Total Cycles: 10
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.20
-# SKYLAKE-NEXT: Block RThroughput: 0.7
+# SKYLAKE-NEXT: Total uOps: 4
# ZNVER1-NEXT: Total Cycles: 8
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1-NEXT: Total uOps: 3
+
+# BTVER2: Dispatch Width: 2
+# BTVER2-NEXT: uOps Per Cycle: 0.29
+# BTVER2-NEXT: IPC: 0.29
+# BTVER2-NEXT: Block RThroughput: 1.0
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.38
# ZNVER1-NEXT: IPC: 0.25
# ZNVER1-NEXT: Block RThroughput: 0.8
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.40
+# BDWELL-NEXT: IPC: 0.20
+# BDWELL-NEXT: Block RThroughput: 1.0
+
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.40
+# HASWELL-NEXT: IPC: 0.20
+# HASWELL-NEXT: Block RThroughput: 1.0
+
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.40
+# SKYLAKE-NEXT: IPC: 0.20
+# SKYLAKE-NEXT: Block RThroughput: 0.7
+
# ALL: Instruction Info:
# ALL-NEXT: [1]: #uOps
# ALL-NEXT: [2]: Latency
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 9
-# BDWELL-NEXT: Dispatch Width: 4
+# HASWELL-NEXT: Total Cycles: 9
+# SKYLAKE-NEXT: Total Cycles: 9
+# ZNVER1-NEXT: Total Cycles: 8
+
+# ALL-NEXT: Total uOps: 3
+
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.33
# BDWELL-NEXT: IPC: 0.22
# BDWELL-NEXT: Block RThroughput: 0.8
-# HASWELL-NEXT: Total Cycles: 9
-# HASWELL-NEXT: Dispatch Width: 4
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.33
# HASWELL-NEXT: IPC: 0.22
# HASWELL-NEXT: Block RThroughput: 0.8
-# SKYLAKE-NEXT: Total Cycles: 9
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.22
-# SKYLAKE-NEXT: Block RThroughput: 0.5
-
-# ZNVER1-NEXT: Total Cycles: 8
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.38
# ZNVER1-NEXT: IPC: 0.25
# ZNVER1-NEXT: Block RThroughput: 0.8
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.33
+# SKYLAKE-NEXT: IPC: 0.22
+# SKYLAKE-NEXT: Block RThroughput: 0.5
+
# ALL: Instruction Info:
# ALL-NEXT: [1]: #uOps
# ALL-NEXT: [2]: Latency
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 100
# ALL-NEXT: Total Cycles: 103
+# ALL-NEXT: Total uOps: 100
-# BROADWELL-NEXT: Dispatch Width: 4
-# BTVER2-NEXT: Dispatch Width: 2
-# HASWELL-NEXT: Dispatch Width: 4
-# IVYBRIDGE-NEXT: Dispatch Width: 4
-# KNL-NEXT: Dispatch Width: 4
-# SANDYBRIDGE-NEXT: Dispatch Width: 4
-# SKX-NEXT: Dispatch Width: 6
-# SKX-AVX512-NEXT: Dispatch Width: 6
-# SLM-NEXT: Dispatch Width: 2
-# ZNVER1-NEXT: Dispatch Width: 4
+# BTVER2: Dispatch Width: 2
+# BTVER2-NEXT: uOps Per Cycle: 0.97
+# BTVER2-NEXT: IPC: 0.97
+# BTVER2-NEXT: Block RThroughput: 0.5
-# ALL-NEXT: IPC: 0.97
+# SLM: Dispatch Width: 2
+# SLM-NEXT: uOps Per Cycle: 0.97
+# SLM-NEXT: IPC: 0.97
+# SLM-NEXT: Block RThroughput: 0.5
+# BROADWELL: Dispatch Width: 4
+# BROADWELL-NEXT: uOps Per Cycle: 0.97
+# BROADWELL-NEXT: IPC: 0.97
# BROADWELL-NEXT: Block RThroughput: 0.3
-# BTVER2-NEXT: Block RThroughput: 0.5
+
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.97
+# HASWELL-NEXT: IPC: 0.97
# HASWELL-NEXT: Block RThroughput: 0.3
+
+# IVYBRIDGE: Dispatch Width: 4
+# IVYBRIDGE-NEXT: uOps Per Cycle: 0.97
+# IVYBRIDGE-NEXT: IPC: 0.97
# IVYBRIDGE-NEXT: Block RThroughput: 0.3
+
+# KNL: Dispatch Width: 4
+# KNL-NEXT: uOps Per Cycle: 0.97
+# KNL-NEXT: IPC: 0.97
# KNL-NEXT: Block RThroughput: 0.3
+
+# SANDYBRIDGE: Dispatch Width: 4
+# SANDYBRIDGE-NEXT: uOps Per Cycle: 0.97
+# SANDYBRIDGE-NEXT: IPC: 0.97
# SANDYBRIDGE-NEXT: Block RThroughput: 0.3
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.97
+# ZNVER1-NEXT: IPC: 0.97
+# ZNVER1-NEXT: Block RThroughput: 0.3
+
+# SKX: Dispatch Width: 6
+# SKX-NEXT: uOps Per Cycle: 0.97
+# SKX-NEXT: IPC: 0.97
# SKX-NEXT: Block RThroughput: 0.3
+
+# SKX-AVX512: Dispatch Width: 6
+# SKX-AVX512-NEXT: uOps Per Cycle: 0.97
+# SKX-AVX512-NEXT: IPC: 0.97
# SKX-AVX512-NEXT: Block RThroughput: 0.3
-# SLM-NEXT: Block RThroughput: 0.5
-# ZNVER1-NEXT: Block RThroughput: 0.3
# CUSTOM: Iterations: 1
# CUSTOM-NEXT: Instructions: 1
# CUSTOM-NEXT: Total Cycles: 4
-# CUSTOM-NEXT: Dispatch Width: 2
-# CUSTOM-NEXT: IPC: 0.25
-# CUSTOM-NEXT: Block RThroughput: 0.5
+# CUSTOM-NEXT: Total uOps: 1
# DEFAULT: Iterations: 100
# DEFAULT-NEXT: Instructions: 100
# DEFAULT-NEXT: Total Cycles: 103
-# DEFAULT-NEXT: Dispatch Width: 2
+# DEFAULT-NEXT: Total uOps: 100
+
+# ALL: Dispatch Width: 2
+
+# CUSTOM-NEXT: uOps Per Cycle: 0.25
+# CUSTOM-NEXT: IPC: 0.25
+
+# DEFAULT-NEXT: uOps Per Cycle: 0.97
# DEFAULT-NEXT: IPC: 0.97
-# DEFAULT-NEXT: Block RThroughput: 0.5
+
+# ALL-NEXT: Block RThroughput: 0.5
# ALL: Instruction Info:
# ALL-NEXT: [1]: #uOps
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 100
# ALL-NEXT: Total Cycles: 103
+# ALL-NEXT: Total uOps: 100
-# CUSTOM-NEXT: Dispatch Width: 1
-# DEFAULT-NEXT: Dispatch Width: 2
-
-# ALL-NEXT: IPC: 0.97
-
+# CUSTOM: Dispatch Width: 1
+# CUSTOM-NEXT: uOps Per Cycle: 0.97
+# CUSTOM-NEXT: IPC: 0.97
# CUSTOM-NEXT: Block RThroughput: 1.0
+
+# DEFAULT: Dispatch Width: 2
+# DEFAULT-NEXT: uOps Per Cycle: 0.97
+# DEFAULT-NEXT: IPC: 0.97
# DEFAULT-NEXT: Block RThroughput: 0.5
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 13
-# BDWELL-NEXT: Dispatch Width: 4
-# BDWELL-NEXT: IPC: 0.15
+# BDWELL-NEXT: Total uOps: 3
# HASWELL-NEXT: Total Cycles: 14
-# HASWELL-NEXT: Dispatch Width: 4
-# HASWELL-NEXT: IPC: 0.14
+# HASWELL-NEXT: Total uOps: 3
# SKYLAKE-NEXT: Total Cycles: 13
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.15
+# SKYLAKE-NEXT: Total uOps: 3
# ZNVER1-NEXT: Total Cycles: 15
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1-NEXT: Total uOps: 2
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.13
# ZNVER1-NEXT: IPC: 0.13
+# ZNVER1-NEXT: Block RThroughput: 1.0
-# ALL-NEXT: Block RThroughput: 1.0
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.21
+# HASWELL-NEXT: IPC: 0.14
+# HASWELL-NEXT: Block RThroughput: 1.0
+
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.23
+# BDWELL-NEXT: IPC: 0.15
+# BDWELL-NEXT: Block RThroughput: 1.0
+
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.23
+# SKYLAKE-NEXT: IPC: 0.15
+# SKYLAKE-NEXT: Block RThroughput: 1.0
# ALL: Timeline view:
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 13
-# BDWELL-NEXT: Dispatch Width: 4
-# BDWELL-NEXT: IPC: 0.15
+# BDWELL-NEXT: Total uOps: 3
# HASWELL-NEXT: Total Cycles: 14
-# HASWELL-NEXT: Dispatch Width: 4
-# HASWELL-NEXT: IPC: 0.14
+# HASWELL-NEXT: Total uOps: 3
# SKYLAKE-NEXT: Total Cycles: 13
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.15
+# SKYLAKE-NEXT: Total uOps: 3
# ZNVER1-NEXT: Total Cycles: 15
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1-NEXT: Total uOps: 2
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.13
# ZNVER1-NEXT: IPC: 0.13
+# ZNVER1-NEXT: Block RThroughput: 1.0
-# ALL-NEXT: Block RThroughput: 1.0
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.21
+# HASWELL-NEXT: IPC: 0.14
+# HASWELL-NEXT: Block RThroughput: 1.0
+
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.23
+# BDWELL-NEXT: IPC: 0.15
+# BDWELL-NEXT: Block RThroughput: 1.0
+
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.23
+# SKYLAKE-NEXT: IPC: 0.15
+# SKYLAKE-NEXT: Block RThroughput: 1.0
# ALL: Timeline view:
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 400
# ALL-NEXT: Total Cycles: 305
-# ALL-NEXT: Dispatch Width: 2
+# ALL-NEXT: Total uOps: 500
+
+# ALL: Dispatch Width: 2
+# ALL-NEXT: uOps Per Cycle: 1.64
# ALL-NEXT: IPC: 1.31
# ALL-NEXT: Block RThroughput: 2.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 1
# CHECK-NEXT: Instructions: 1
# CHECK-NEXT: Total Cycles: 4
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 1
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.25
# CHECK-NEXT: IPC: 0.25
# CHECK-NEXT: Block RThroughput: 0.5
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 100
# ALL-NEXT: Total Cycles: 103
-# ALL-NEXT: Dispatch Width: 2
+# ALL-NEXT: Total uOps: 100
+
+# ALL: Dispatch Width: 2
+# ALL-NEXT: uOps Per Cycle: 0.97
# ALL-NEXT: IPC: 0.97
# ALL-NEXT: Block RThroughput: 0.5
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 100
# ALL-NEXT: Total Cycles: 103
-# ALL-NEXT: Dispatch Width: 2
+# ALL-NEXT: Total uOps: 100
+
+# ALL: Dispatch Width: 2
+# ALL-NEXT: uOps Per Cycle: 0.97
# ALL-NEXT: IPC: 0.97
# ALL-NEXT: Block RThroughput: 0.5
# DEFAULTREPORT: Iterations: 100
# DEFAULTREPORT-NEXT: Instructions: 100
# DEFAULTREPORT-NEXT: Total Cycles: 103
-# DEFAULTREPORT-NEXT: Dispatch Width: 2
+# DEFAULTREPORT-NEXT: Total uOps: 100
+
+# DEFAULTREPORT: Dispatch Width: 2
+# DEFAULTREPORT-NEXT: uOps Per Cycle: 0.97
# DEFAULTREPORT-NEXT: IPC: 0.97
# DEFAULTREPORT-NEXT: Block RThroughput: 0.5
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 100
# ALL-NEXT: Total Cycles: 103
-# ALL-NEXT: Dispatch Width: 2
+# ALL-NEXT: Total uOps: 100
+
+# ALL: Dispatch Width: 2
+# ALL-NEXT: uOps Per Cycle: 0.97
# ALL-NEXT: IPC: 0.97
# ALL-NEXT: Block RThroughput: 0.5
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 100
# CHECK-NEXT: Total Cycles: 103
-# CHECK-NEXT: Dispatch Width: 2
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.97
# CHECK-NEXT: IPC: 0.97
# CHECK-NEXT: Block RThroughput: 0.5
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 10
-# BDWELL-NEXT: Dispatch Width: 4
-# BDWELL-NEXT: IPC: 0.20
-# BDWELL-NEXT: Block RThroughput: 2.0
+# BDWELL-NEXT: Total uOps: 4
# BTVER2-NEXT: Total Cycles: 11
-# BTVER2-NEXT: Dispatch Width: 2
-# BTVER2-NEXT: IPC: 0.18
-# BTVER2-NEXT: Block RThroughput: 2.0
+# BTVER2-NEXT: Total uOps: 4
# HASWELL-NEXT: Total Cycles: 11
-# HASWELL-NEXT: Dispatch Width: 4
-# HASWELL-NEXT: IPC: 0.18
-# HASWELL-NEXT: Block RThroughput: 2.0
+# HASWELL-NEXT: Total uOps: 4
# IVY-NEXT: Total Cycles: 11
-# IVY-NEXT: Dispatch Width: 4
-# IVY-NEXT: IPC: 0.18
-# IVY-NEXT: Block RThroughput: 1.0
+# IVY-NEXT: Total uOps: 4
# SANDY-NEXT: Total Cycles: 11
-# SANDY-NEXT: Dispatch Width: 4
-# SANDY-NEXT: IPC: 0.18
-# SANDY-NEXT: Block RThroughput: 1.0
+# SANDY-NEXT: Total uOps: 4
# SKYLAKE-NEXT: Total Cycles: 11
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.18
-# SKYLAKE-NEXT: Block RThroughput: 0.7
+# SKYLAKE-NEXT: Total uOps: 4
# ZNVER1-NEXT: Total Cycles: 11
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1-NEXT: Total uOps: 2
+
+# BTVER2: Dispatch Width: 2
+# BTVER2-NEXT: uOps Per Cycle: 0.36
+# BTVER2-NEXT: IPC: 0.18
+# BTVER2-NEXT: Block RThroughput: 2.0
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.18
# ZNVER1-NEXT: IPC: 0.18
# ZNVER1-NEXT: Block RThroughput: 1.0
+# IVY: Dispatch Width: 4
+# IVY-NEXT: uOps Per Cycle: 0.36
+# IVY-NEXT: IPC: 0.18
+# IVY-NEXT: Block RThroughput: 1.0
+
+# SANDY: Dispatch Width: 4
+# SANDY-NEXT: uOps Per Cycle: 0.36
+# SANDY-NEXT: IPC: 0.18
+# SANDY-NEXT: Block RThroughput: 1.0
+
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.36
+# HASWELL-NEXT: IPC: 0.18
+# HASWELL-NEXT: Block RThroughput: 2.0
+
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.40
+# BDWELL-NEXT: IPC: 0.20
+# BDWELL-NEXT: Block RThroughput: 2.0
+
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.36
+# SKYLAKE-NEXT: IPC: 0.18
+# SKYLAKE-NEXT: Block RThroughput: 0.7
+
# BTVER2: Timeline view:
# BTVER2-NEXT: 0
# BTVER2-NEXT: Index 0123456789
# ALL-NEXT: Instructions: 2
# BDWELL-NEXT: Total Cycles: 10
-# BDWELL-NEXT: Dispatch Width: 4
-# BDWELL-NEXT: IPC: 0.20
-# BDWELL-NEXT: Block RThroughput: 2.0
+# BDWELL-NEXT: Total uOps: 4
# BTVER2-NEXT: Total Cycles: 11
-# BTVER2-NEXT: Dispatch Width: 2
-# BTVER2-NEXT: IPC: 0.18
-# BTVER2-NEXT: Block RThroughput: 2.0
+# BTVER2-NEXT: Total uOps: 4
# HASWELL-NEXT: Total Cycles: 11
-# HASWELL-NEXT: Dispatch Width: 4
-# HASWELL-NEXT: IPC: 0.18
-# HASWELL-NEXT: Block RThroughput: 2.0
+# HASWELL-NEXT: Total uOps: 4
# IVY-NEXT: Total Cycles: 11
-# IVY-NEXT: Dispatch Width: 4
-# IVY-NEXT: IPC: 0.18
-# IVY-NEXT: Block RThroughput: 1.0
+# IVY-NEXT: Total uOps: 4
# SANDY-NEXT: Total Cycles: 11
-# SANDY-NEXT: Dispatch Width: 4
-# SANDY-NEXT: IPC: 0.18
-# SANDY-NEXT: Block RThroughput: 1.0
+# SANDY-NEXT: Total uOps: 4
# SKYLAKE-NEXT: Total Cycles: 11
-# SKYLAKE-NEXT: Dispatch Width: 6
-# SKYLAKE-NEXT: IPC: 0.18
-# SKYLAKE-NEXT: Block RThroughput: 0.7
+# SKYLAKE-NEXT: Total uOps: 4
# ZNVER1-NEXT: Total Cycles: 11
-# ZNVER1-NEXT: Dispatch Width: 4
+# ZNVER1-NEXT: Total uOps: 2
+
+# BTVER2: Dispatch Width: 2
+# BTVER2-NEXT: uOps Per Cycle: 0.36
+# BTVER2-NEXT: IPC: 0.18
+# BTVER2-NEXT: Block RThroughput: 2.0
+
+# ZNVER1: Dispatch Width: 4
+# ZNVER1-NEXT: uOps Per Cycle: 0.18
# ZNVER1-NEXT: IPC: 0.18
# ZNVER1-NEXT: Block RThroughput: 1.0
+# IVY: Dispatch Width: 4
+# IVY-NEXT: uOps Per Cycle: 0.36
+# IVY-NEXT: IPC: 0.18
+# IVY-NEXT: Block RThroughput: 1.0
+
+# SANDY: Dispatch Width: 4
+# SANDY-NEXT: uOps Per Cycle: 0.36
+# SANDY-NEXT: IPC: 0.18
+# SANDY-NEXT: Block RThroughput: 1.0
+
+# HASWELL: Dispatch Width: 4
+# HASWELL-NEXT: uOps Per Cycle: 0.36
+# HASWELL-NEXT: IPC: 0.18
+# HASWELL-NEXT: Block RThroughput: 2.0
+
+# BDWELL: Dispatch Width: 4
+# BDWELL-NEXT: uOps Per Cycle: 0.40
+# BDWELL-NEXT: IPC: 0.20
+# BDWELL-NEXT: Block RThroughput: 2.0
+
+# SKYLAKE: Dispatch Width: 6
+# SKYLAKE-NEXT: uOps Per Cycle: 0.36
+# SKYLAKE-NEXT: IPC: 0.18
+# SKYLAKE-NEXT: Block RThroughput: 0.7
+
# BTVER2: Timeline view:
# BTVER2-NEXT: 0
# BTVER2-NEXT: Index 0123456789
unsigned Iterations = Source.getNumIterations();
unsigned Instructions = Source.size();
unsigned TotalInstructions = Instructions * Iterations;
+ unsigned TotalUOps = NumMicroOps * Iterations;
double IPC = (double)TotalInstructions / TotalCycles;
+ double UOpsPerCycle = (double)TotalUOps / TotalCycles;
double BlockRThroughput = computeBlockRThroughput(
SM, DispatchWidth, NumMicroOps, ProcResourceUsage);
TempStream << "Iterations: " << Iterations;
TempStream << "\nInstructions: " << TotalInstructions;
TempStream << "\nTotal Cycles: " << TotalCycles;
+ TempStream << "\nTotal uOps: " << TotalUOps << '\n';
TempStream << "\nDispatch Width: " << DispatchWidth;
- TempStream << "\nIPC: " << format("%.2f", IPC);
-
- // Round to the block reciprocal throughput to the nearest tenth.
+ TempStream << "\nuOps Per Cycle: "
+ << format("%.2f", floor((UOpsPerCycle * 100) + 0.5) / 100);
+ TempStream << "\nIPC: "
+ << format("%.2f", floor((IPC * 100) + 0.5) / 100);
TempStream << "\nBlock RThroughput: "
<< format("%.1f", floor((BlockRThroughput * 10) + 0.5) / 10)
<< '\n';
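
The arithmetic behind the two new fields can be checked outside llvm-mca with a minimal standalone sketch. It assumes only what the hunk above shows: totals are per-iteration counts multiplied by the iteration count, and ratios are rounded with `floor(x * 100 + 0.5) / 100` before being printed with `%.2f`. The names `SummaryInputs`, `formatTwoDecimals`, `formatUOpsPerCycle`, and `formatIPC` are hypothetical and are not part of the llvm-mca sources.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the values the summary view gathers; this is
// not an llvm-mca type.
struct SummaryInputs {
  unsigned Iterations;
  unsigned InstructionsPerIteration;
  unsigned MicroOpsPerIteration;
  unsigned TotalCycles;
};

// Rounds to two decimals using the same expression as the patched code,
// then formats the result with "%.2f".
std::string formatTwoDecimals(double Value) {
  char Buffer[32];
  std::snprintf(Buffer, sizeof(Buffer), "%.2f",
                std::floor((Value * 100) + 0.5) / 100);
  return Buffer;
}

// uOps Per Cycle: total simulated micro opcodes divided by total cycles.
std::string formatUOpsPerCycle(const SummaryInputs &In) {
  unsigned TotalUOps = In.MicroOpsPerIteration * In.Iterations;
  return formatTwoDecimals(static_cast<double>(TotalUOps) / In.TotalCycles);
}

// IPC: total simulated instructions divided by total cycles.
std::string formatIPC(const SummaryInputs &In) {
  unsigned TotalInstructions = In.InstructionsPerIteration * In.Iterations;
  return formatTwoDecimals(static_cast<double>(TotalInstructions) /
                           In.TotalCycles);
}
```

Fed with the totals from the updated BTVER2 checks (2 instructions, 4 uOps, 11 cycles, assuming a single iteration), the helpers give `0.36` for uOps Per Cycle and `0.18` for IPC, matching the expected test output. The two values differ whenever an instruction decodes into more than one micro opcode, which is why the ZNVER1 checks (2 uOps) show `0.18` for both fields.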