[llvm] r338881 - [llvm-mca][docs] Improve the CommandLine documentation.
Andrea Di Biagio via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 3 05:44:56 PDT 2018
Author: adibiagio
Date: Fri Aug 3 05:44:56 2018
New Revision: 338881
URL: http://llvm.org/viewvc/llvm-project?rev=338881&view=rev
Log:
[llvm-mca][docs] Improve the CommandLine documentation.
This patch replaces all the remaining occurrences of string "MCA" with
":program:`llvm-mca`". Somehow I missed those strings when I committed r338394.
This patch also improves section "Instruction Dispatch".
Modified:
llvm/trunk/docs/CommandGuide/llvm-mca.rst
Modified: llvm/trunk/docs/CommandGuide/llvm-mca.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CommandGuide/llvm-mca.rst?rev=338881&r1=338880&r2=338881&view=diff
==============================================================================
--- llvm/trunk/docs/CommandGuide/llvm-mca.rst (original)
+++ llvm/trunk/docs/CommandGuide/llvm-mca.rst Fri Aug 3 05:44:56 2018
@@ -454,8 +454,8 @@ The ``-all-stats`` command line option e
counters for the dispatch logic, the reorder buffer, the retire control unit,
and the register file.
-Below is an example of ``-all-stats`` output generated by MCA for the
-dot-product example discussed in the previous sections.
+Below is an example of ``-all-stats`` output generated by :program:`llvm-mca`
+for the dot-product example discussed in the previous sections.
.. code-block:: none
@@ -514,17 +514,16 @@ SCHEDQ reports 272 cycles. This counter
logic is unable to dispatch a group of two instructions because the scheduler's
queue is full.
-Looking at the *Dispatch Logic* table, we see that the pipeline was only able
-to dispatch two instructions 51.5% of the time. The dispatch group was limited
-to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
+dispatch two instructions 51.5% of the time. The dispatch group was limited to
+one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
dispatch statistics are displayed by either using the command option
``-all-stats`` or ``-dispatch-stats``.
The next table, *Schedulers*, presents a histogram displaying a count,
representing the number of instructions issued on some number of cycles. In
-this case, of the 610 simulated cycles, single
-instructions were issued 306 times (50.2%) and there were 7 cycles where
-no instructions were issued.
+this case, of the 610 simulated cycles, single instructions were issued 306
+times (50.2%) and there were 7 cycles where no instructions were issued.
The *Scheduler's queue usage* table shows that the maximum number of buffer
entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
@@ -543,28 +542,28 @@ A full scheduler queue is either caused
sub-optimal usage of hardware resources. Sometimes, resource pressure can be
mitigated by rewriting the kernel using different instructions that consume
different scheduler resources. Schedulers with a small queue are less resilient
-to bottlenecks caused by the presence of long data dependencies.
-The scheduler statistics are displayed by
-using the command option ``-all-stats`` or ``-scheduler-stats``.
+to bottlenecks caused by the presence of long data dependencies. The scheduler
+statistics are displayed by using the command option ``-all-stats`` or
+``-scheduler-stats``.
The next table, *Retire Control Unit*, presents a histogram displaying a count,
representing the number of instructions retired on some number of cycles. In
-this case, of the 610 simulated cycles, two instructions were retired during
-the same cycle 399 times (65.4%) and there were 109 cycles where no
-instructions were retired. The retire statistics are displayed by using the
-command option ``-all-stats`` or ``-retire-stats``.
+this case, of the 610 simulated cycles, two instructions were retired during the
+same cycle 399 times (65.4%) and there were 109 cycles where no instructions
+were retired. The retire statistics are displayed by using the command option
+``-all-stats`` or ``-retire-stats``.
The last table presented is *Register File statistics*. Each physical register
file (PRF) used by the pipeline is presented in this table. In the case of AMD
-Jaguar, there are two register files, one for floating-point registers
-(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
-the 900 instructions processed, there were 900 mappings created. Since this
-dot-product example utilized only floating point registers, the JFPuPRF was
-responsible for creating the 900 mappings. However, we see that the pipeline
-only used a maximum of 35 of 72 available register slots at any given time. We
-can conclude that the floating point PRF was the only register file used for
-the example, and that it was never resource constrained. The register file
-statistics are displayed by using the command option ``-all-stats`` or
+Jaguar, there are two register files, one for floating-point registers (JFpuPRF)
+and one for integer registers (JIntegerPRF). The table shows that of the 900
+instructions processed, there were 900 mappings created. Since this dot-product
+example utilized only floating point registers, the JFPuPRF was responsible for
+creating the 900 mappings. However, we see that the pipeline only used a
+maximum of 35 of 72 available register slots at any given time. We can conclude
+that the floating point PRF was the only register file used for the example, and
+that it was never resource constrained. The register file statistics are
+displayed by using the command option ``-all-stats`` or
``-register-file-stats``.
In this example, we can conclude that the IPC is mostly limited by data
@@ -572,8 +571,8 @@ dependencies, and not by resource pressu
Instruction Flow
^^^^^^^^^^^^^^^^
-This section describes the instruction flow through MCA's default out-of-order
-pipeline, as well as the functional units involved in the process.
+This section describes the instruction flow through the default pipeline of
+:program:`llvm-mca`, as well as the functional units involved in the process.
The default pipeline implements the following sequence of stages used to
process instructions.
@@ -585,9 +584,9 @@ process instructions.
The default pipeline only models the out-of-order portion of a processor.
Therefore, the instruction fetch and decode stages are not modeled. Performance
-bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
-have all been decoded and placed into a queue. Also, MCA does not model branch
-prediction.
+bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
+instructions have all been decoded and placed into a queue before the simulation
+start. Also, :program:`llvm-mca` does not model branch prediction.
Instruction Dispatch
""""""""""""""""""""
@@ -607,19 +606,19 @@ An instruction can be dispatched if:
* The schedulers are not full.
Scheduling models can optionally specify which register files are available on
-the processor. MCA uses that information to initialize register file
-descriptors. Users can limit the number of physical registers that are
+the processor. :program:`llvm-mca` uses that information to initialize register
+file descriptors. Users can limit the number of physical registers that are
globally available for register renaming by using the command option
-``-register-file-size``. A value of zero for this option means *unbounded*.
-By knowing how many registers are available for renaming, MCA can predict
-dispatch stalls caused by the lack of registers.
+``-register-file-size``. A value of zero for this option means *unbounded*. By
+knowing how many registers are available for renaming, the tool can predict
+dispatch stalls caused by the lack of physical registers.
The number of reorder buffer entries consumed by an instruction depends on the
-number of micro-opcodes specified by the target scheduling model. MCA's
-reorder buffer's purpose is to track the progress of instructions that are
-"in-flight," and to retire instructions in program order. The number of
-entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
-the target scheduling model.
+number of micro-opcodes specified for that instruction by the target scheduling
+model. The reorder buffer is responsible for tracking the progress of
+instructions that are "in-flight", and retiring them in program order. The
+number of entries in the reorder buffer defaults to the value specified by field
+`MicroOpBufferSize` in the target scheduling model.
Instructions that are dispatched to the schedulers consume scheduler buffer
entries. :program:`llvm-mca` queries the scheduling model to determine the set
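
The dispatch conditions described in the "Instruction Dispatch" hunk above can be illustrated with a small toy model. This is only a sketch for readers of the patch, not the implementation used by :program:`llvm-mca`: the function name ``can_dispatch`` and all of its parameters are hypothetical, and the checks are simplified to one reorder buffer, one register file, and one scheduler queue. It assumes that an instruction needs one reorder buffer entry per micro-opcode, that a register file size of zero means *unbounded*, and that the scheduler must have at least one free entry.

```python
def can_dispatch(num_uops, rob_free, regs_needed, regs_in_use,
                 register_file_size, scheduler_free):
    """Toy dispatch check (hypothetical helper, not llvm-mca code).

    num_uops           -- micro-opcodes of the instruction (ROB entries needed)
    rob_free           -- free reorder buffer entries
    regs_needed        -- physical registers the instruction must rename
    regs_in_use        -- physical registers currently allocated
    register_file_size -- total physical registers; 0 means unbounded
    scheduler_free     -- free scheduler buffer entries
    """
    # One reorder buffer entry is consumed per micro-opcode.
    rob_ok = num_uops <= rob_free
    # A size of zero models an unbounded register file.
    regs_ok = (register_file_size == 0
               or regs_in_use + regs_needed <= register_file_size)
    # The scheduler queue must not be full.
    sched_ok = scheduler_free > 0
    return rob_ok and regs_ok and sched_ok


# A 72-entry register file with 71 registers in use can rename one more.
print(can_dispatch(2, 4, 1, 71, 72, 1))
# With all 72 registers in use, the same instruction stalls.
print(can_dispatch(2, 4, 1, 72, 72, 1))
```

Passing ``register_file_size=0`` in the second call would remove the register stall, which mirrors the documented meaning of a zero value for ``-register-file-size``.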