[llvm-bugs] [Bug 42353] New: [SchedModel][MCA] Add the ability to specify a different DispatchWidth and a different number of micro opcodes for the ROB/Dispatch Logic.
llvm-bugs at lists.llvm.org
Fri Jun 21 08:04:40 PDT 2019
Bug ID: 42353
Summary: [SchedModel][MCA] Add the ability to specify a
different DispatchWidth and a different number of
micro opcodes for the ROB/Dispatch Logic.
OS: Windows NT
Assignee: unassignedbugs at nondot.org
Reporter: andrea.dibiagio at gmail.com
CC: andrea.dibiagio at gmail.com, llvm-bugs at lists.llvm.org,
matthew.davis at sony.com
This problem came up while investigating the quality of llvm-mca reports on
modern Intel processors.
tl;dr: Two points:
1) The IssueWidth from the scheduling models cannot always be used by llvm-mca
to simulate the processor dispatch width. For llvm-mca, we need a way to
specify a different value to model the processor dispatch width.
2) We want to allow users to optionally define a different number of opcodes
for the purposes of dispatch and of ROB/Scheduler entries consumed. This
information could be used by Intel models to describe fused domains.
Scheduling models were originally introduced to help scheduling algorithms
identify optimal sequences of instructions.
Models didn't need to provide too much information about the target processor.
Models simply had to describe the out-of-order backend as a unified reservation
station which "sees" all the instructions from an input scheduling region.
There is basically no concept of "instruction dispatch" in the scheduling
model. Instructions from a code region are all immediately available in the
idealized reservation station that sees all the processor resources.
There is also no need to model the decoder's queue: instructions don't need to
be fetched from a decoder's queue; they simply exist in an ideal (potentially
unbounded) reservation station which internally classifies instructions as
either "pending" or "ready" (based on hazards/data dependencies).
So what is in practice the so-called IssueWidth?
The model needed a way to put an upper bound on the number of opcodes issued
per cycle. IssueWidth serves that specific purpose.
It can be seen in practice as a "magic number" (often empirically computed by
running several benchmarks) that ideally summarizes:
- throughput from the decoders
- availability of buffers in the out-of-order (notably, the ROB)
- dispatch throughput
- presence or absence of loop buffers, etc.
For most processors, that value (by luck) often matches what we call dispatch
throughput. However, things get complicated when the processor performs micro
So here is the idea:
In the processor model, we introduce a tablegen class named DispatchLogic which,
to start, declares a single field named 'DispatchWidth'.
If a model defines DispatchLogic, then llvm-mca uses the DispatchWidth field
instead of IssueWidth to model the processor dispatch rate.
Note that this would be opt-in for the targets. In the absence of DispatchLogic
definition, llvm-mca would fall back to using IssueWidth as the actual
dispatch rate. So, processors would not need to be changed if we decide to
implement it that way.
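
As a rough illustration of the opt-in fallback described above, here is a
minimal C++ sketch. The struct and function names (ProcModelSketch,
getEffectiveDispatchWidth) are hypothetical and are not part of the existing
LLVM scheduling model API; only IssueWidth, DispatchLogic and DispatchWidth
come from this proposal.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-processor data a scheduling model would
// expose. This is NOT the real llvm::MCSchedModel definition.
struct ProcModelSketch {
  unsigned IssueWidth;    // always present in a scheduling model
  bool HasDispatchLogic;  // true if the model opted in to DispatchLogic
  unsigned DispatchWidth; // only meaningful when HasDispatchLogic is true
};

// Returns the dispatch rate a simulator would use under this proposal:
// DispatchWidth if the model defines DispatchLogic, IssueWidth otherwise.
unsigned getEffectiveDispatchWidth(const ProcModelSketch &Model) {
  assert(Model.IssueWidth > 0 && "IssueWidth is expected to be positive");
  if (Model.HasDispatchLogic)
    return Model.DispatchWidth;
  return Model.IssueWidth;
}
```

A model that never defines DispatchLogic keeps behaving exactly as before,
which is why existing processors would not need to change.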
I suggest adding two extra (completely optional) fields to scheduling classes:
- NumDispatchEntriesConsumed (or NumMicroOpcodesForDispatch)
- NumROBEntriesConsumed (or NumSchedulerEntries)
Those two extra fields would default to NumMicroOpcodes.
So, users don't need to worry about changing their models if they are already
happy with NumMicroOpcodes.
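
The defaulting rule for the two optional fields could look roughly like the
sketch below. SchedClassSketch and the helper functions are hypothetical names
used only for illustration, and using -1 to mean "not specified" is an
assumption of this sketch, not something the proposal prescribes.

```cpp
#include <cassert>

// Hypothetical per-scheduling-class description. This is NOT the real
// llvm::MCSchedClassDesc; a negative value means "field not specified".
struct SchedClassSketch {
  unsigned NumMicroOpcodes;
  int NumDispatchEntriesConsumed; // optional, -1 if the model omits it
  int NumROBEntriesConsumed;      // optional, -1 if the model omits it
};

// Dispatch slots consumed; defaults to NumMicroOpcodes when unspecified.
unsigned getDispatchEntries(const SchedClassSketch &SC) {
  if (SC.NumDispatchEntriesConsumed < 0)
    return SC.NumMicroOpcodes;
  return static_cast<unsigned>(SC.NumDispatchEntriesConsumed);
}

// ROB entries consumed; defaults to NumMicroOpcodes when unspecified.
unsigned getROBEntries(const SchedClassSketch &SC) {
  if (SC.NumROBEntriesConsumed < 0)
    return SC.NumMicroOpcodes;
  return static_cast<unsigned>(SC.NumROBEntriesConsumed);
}
```

For example, a model could describe an instruction with two micro opcodes that
occupies a single dispatch slot and a single ROB entry (the numbers here are
made up) by setting both optional fields to 1, while leaving every other
scheduling class untouched.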
We can structure things in the subtarget emitter so that information about
these opcodes is only emitted as "extra processor information".
I think that this may be a good way to solve some issues with simulating Intel
processors. If people agree with this approach, I may start working on a patch
to address this.
What do you think?