[PATCH] D49692: [llvm-mca][docs] Add instruction flow documentation. NFC.

Mon Jul 30 03:49:58 PDT 2018

andreadb accepted this revision.
andreadb added a comment.
This revision is now accepted and ready to land.

Hi Matt,

LGTM if you address the comments below.

Thanks,
-Andrea

================
Comment at: docs/CommandGuide/llvm-mca.rst:579-583
+hardware resources.  Initially, the dispatch width is chosen from LLVM's
+knowledge of instruction scheduling information.  LLVM maintains a set of
+architecture and instruction information for each architecture it supports.
+This information is called the scheduling model.  Which model to use is
+influenced by the ``-mtriple`` and ``-mcpu`` options.  For the case of dispatch
----------------
I don't think you should describe what a scheduling model is. We can assume that the reader knows about it.

You can replace most of this paragraph with a simple sentence like: `The processor Dispatch Width defaults to the value of field "IssueWidth" in the scheduling model`.
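
To illustrate the defaulting rule, here is a minimal sketch. The struct and function names are hypothetical and not taken from the llvm-mca sources; only the `IssueWidth` field name comes from the scheduling model. The sketch assumes that a user-provided width of zero means "use the default".

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant part of a scheduling model.
struct SchedModelSketch {
  unsigned IssueWidth; // maximum number of micro-ops issued per cycle
};

// Returns the dispatch width to simulate: the user override when it is
// non-zero (assumption), otherwise the model's IssueWidth.
inline unsigned effectiveDispatchWidth(const SchedModelSketch &SM,
                                       unsigned UserDispatchWidth) {
  assert(SM.IssueWidth > 0 && "sketch assumes a non-zero IssueWidth");
  return UserDispatchWidth != 0 ? UserDispatchWidth : SM.IssueWidth;
}
```

With a model whose `IssueWidth` is 4, the sketch returns 4 when no override is given and the override value otherwise.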

================
Comment at: docs/CommandGuide/llvm-mca.rst:589
+
+* The size of the dispatch group is smaller than pipeline's dispatch width.
+* There are enough entries in the reorder buffer.
----------------
s/pipeline/processor

================
Comment at: docs/CommandGuide/llvm-mca.rst:591
+* There are enough entries in the reorder buffer.
+* There are enough temporary registers to do register renaming.
+* The schedulers are not full.
----------------
s/temporary/physical
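
For reference, the dispatch conditions listed in this hunk can be sketched as a single predicate. This is only an illustration under simplifying assumptions: all type and field names are hypothetical, and the real dispatch logic tracks these resources in more detail.

```cpp
#include <cassert>

// Hypothetical snapshot of the resources checked at dispatch time.
struct DispatchStateSketch {
  unsigned DispatchWidth;      // processor dispatch width
  unsigned GroupSize;          // micro-ops already in the current dispatch group
  unsigned FreeROBEntries;     // free reorder buffer entries
  unsigned FreePhysRegs;       // free physical registers for renaming
  unsigned FreeSchedulerSlots; // free scheduler buffer entries
};

// Hypothetical per-instruction requirements.
struct InstrNeedsSketch {
  unsigned MicroOps; // reorder buffer entries consumed by the instruction
  unsigned RegDefs;  // physical registers needed for register renaming
};

// Returns true only if every dispatch condition holds.
inline bool canDispatch(const DispatchStateSketch &S,
                        const InstrNeedsSketch &I) {
  assert(S.DispatchWidth > 0 && "sketch assumes a non-zero dispatch width");
  return S.GroupSize < S.DispatchWidth &&  // room left in the dispatch group
         S.FreeROBEntries >= I.MicroOps && // enough reorder buffer entries
         S.FreePhysRegs >= I.RegDefs &&    // enough physical registers
         S.FreeSchedulerSlots > 0;         // schedulers are not full
}
```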

================
Comment at: docs/CommandGuide/llvm-mca.rst:595-596
+Scheduling models can optionally specify which register files are available on
+the processor. MCA uses that information to initialize register file
+descriptors.  By default, if the model does not describe register files, MCA
+(optimistically) assumes a single register file with an unbounded number of
----------------
This sentence is not needed. It is implied by the next sentence.

================
Comment at: docs/CommandGuide/llvm-mca.rst:598
+(optimistically) assumes a single register file with an unbounded number of
+temporary registers.  Users can limit the number of temporary registers that
+are globally available for register renaming by using the command option
----------------
For consistency, please replace every occurrence of `temporary register` with `physical register`. The original RFC used the word "temporary" to describe microarchitectural registers; in code comments, we always use "physical register" instead.

================
Comment at: docs/CommandGuide/llvm-mca.rst:609
+entries in the reorder buffer defaults to the value provided
+by the target scheduling model.
+
----------------
You can be more specific, and add something like "by field `MicroOpBufferSize` in the target scheduling model".

================
Comment at: docs/CommandGuide/llvm-mca.rst:616-618
+Zero latency instructions (for example NOP instructions) do not consume
+scheduler resources.  However, those instructions still reserve a number of
+slots in the reorder buffer.
----------------
This paragraph is probably not needed and can be removed.

================
Comment at: docs/CommandGuide/llvm-mca.rst:622
+"""""""""""""""""
+Each scheduler resource implements a queue of instructions.  An instruction has
+to wait in the scheduler's queue until input register operands become
----------------
Suggested rewording: `Each processor scheduler implements a buffer of instructions.`

================
Comment at: docs/CommandGuide/llvm-mca.rst:623
+Each scheduler resource implements a queue of instructions.  An instruction has
+to wait in the scheduler's queue until input register operands become
+available.  Only at that point, does the instruction becomes eligible for
----------------
`in a scheduler's buffer`

================
Comment at: docs/CommandGuide/llvm-mca.rst:631
+scheduler is responsible for tracking data dependencies, and dynamically
+selecting which processor resources are used by instructions.
+
----------------
s/used/consumed

================
Comment at: docs/CommandGuide/llvm-mca.rst:655-657
+MCA also simulates processor resources from data provided by the scheduling
+model.  This allows MCA to track the availability of every single resource
+unit.
----------------
This paragraph is not needed.

================
Comment at: docs/CommandGuide/llvm-mca.rst:674-676
+To simulate an out-of-order execution of memory operations, MCA utilizes a
+simulated load/store unit (LSUnit).  The LSUnit manages queues to simulate the
+speculative execution of loads and stores.
----------------
These two sentences can be joined together.
`MCA utilizes a simulated load/store unit (LSUnit) to track execution of loads and stores`

================
Comment at: docs/CommandGuide/llvm-mca.rst:694-695
+
+By default, the LSUnit conservatively (i.e., pessimistically) assumes that
+loads always may-alias store operations.  This LSUnit does not perform any
+alias analysis to rule out cases where loads and stores do not overlap with
----------------
This is no longer true.
By default, the LSUnit optimistically assumes that loads don't alias store operations.

Please have a look at `LSUnit.h`. You will see an updated version of this paragraph (you can probably copy/paste it here).

================
Comment at: docs/CommandGuide/llvm-mca.rst:699-700
+allowed to pass older stores.  To make it possible for a younger load to pass
+an older store, users can use the command line option ``-noalias``.  Under
+*noalias*, a younger load is always allowed to pass an older store.
+
----------------
`-noalias` defaults to `true` in the current llvm-mca.
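
A rough sketch of the load/store rule discussed here, assuming the no-alias flag defaults to `true`. The names below are hypothetical and the sketch ignores barriers and every other ordering rule; it only models whether a younger load may pass older stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class MemOp { Load, Store };

// Queue holds pending memory operations in program order (oldest first).
// Returns true if the load at LoadIdx may issue ahead of everything older.
// With NoAlias set, older stores never block the load; otherwise any older
// pending store blocks it.
inline bool youngerLoadMayIssue(const std::vector<MemOp> &Queue,
                                std::size_t LoadIdx, bool NoAlias = true) {
  assert(LoadIdx < Queue.size() && Queue[LoadIdx] == MemOp::Load &&
         "LoadIdx must refer to a load in the queue");
  if (NoAlias)
    return true;
  auto OlderEnd = Queue.begin() + static_cast<std::ptrdiff_t>(LoadIdx);
  return std::none_of(Queue.begin(), OlderEnd,
                      [](MemOp Op) { return Op == MemOp::Store; });
}
```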

================
Comment at: docs/CommandGuide/llvm-mca.rst:704-705
+allow reordering of non-aliasing store operations.  That being said, at the
+moment, there is no way to further relax the memory model (``-noalias`` is the
+only option).  Essentially, there is no option to specify a different memory
+type (e.g., write-back, write-combining, write-through; etc.) and consequently
----------------
This statement should only apply to the "default" LSUnit class.
In the future, we could (should) allow users to customize the LSUnit. That would enable support for different memory types (and therefore, different consistency models).

================
Comment at: docs/CommandGuide/llvm-mca.rst:716-719
+No assumption is made on the store buffer size.  As mentioned before, the
+LSUnit conservatively assumes a may-alias relation between loads and stores,
+and it does not attempt to identify cases where store-to-load forwarding would
+occur in practice.
----------------
This paragraph can be removed.

https://reviews.llvm.org/D49692