[llvm-dev] [MachineScheduler] Question about IssueWidth / NumMicroOps
Jonas Paulsson via llvm-dev
llvm-dev at lists.llvm.org
Wed May 9 09:43:27 PDT 2018
Hi,
I would like to ask what IssueWidth and NumMicroOps refer to in
MachineScheduler, just to be 100% sure what the intent is.
Are we modeling the decoder phase or the execution stage?
Background:
First of all, there seem to be different meanings of "issue" depending
on which platform you're on:
https://stackoverflow.com/questions/23219685/what-is-the-meaning-of-instruction-dispatch:
"... "Dispatch in this sense means either the sending of an instruction
to a queue in preparation to be scheduled in an out-of-order
processor (IBM's use; Intel calls this issue) or sending the instruction
to the functional unit for execution (Intel's use; IBM calls this issue)..."
So "issue" could mean either of
(1) "the sending of an instruction to a queue in preparation to be
scheduled in an out-of-order processor"
(2) "sending the instruction to the functional unit for execution"
I would hope to be right in thinking that in sense (1), IssueWidth would
relate to the decoding capacity, while in sense (2) it would reflect the
execution capacity per cycle.
There is this comment in TargetSchedule.td:
// Use BufferSize = 0 for resources that force "dispatch/issue
// groups". (Different processors define dispath/issue
// differently. Here we refer to stage between decoding into micro-ops
// and moving them into a reservation station.) Normally NumMicroOps
// is sufficient to limit dispatch/issue groups. However, some
// processors can form groups of with only certain combinitions of
// instruction types. e.g. POWER7.
This seems to say that in MachineScheduler, (1) is in effect, right?
Furthermore, I see
def SkylakeServerModel : SchedMachineModel {
  // All x86 instructions are modeled as a single micro-op, and SKylake can
  // decode 6 instructions per cycle.
  let IssueWidth = 6;

def BroadwellModel : SchedMachineModel {
  // All x86 instructions are modeled as a single micro-op, and HW can
  // decode 4 instructions per cycle.
  let IssueWidth = 4;

def SandyBridgeModel : SchedMachineModel {
  // All x86 instructions are modeled as a single micro-op, and SB can
  // decode 4 instructions per cycle.
  // FIXME: Identify instructions that aren't a single fused micro-op.
  let IssueWidth = 4;
, which also seem to indicate (1).
What's more, I see that checkHazard() returns true if '(CurrMOps + uops
> SchedModel->getIssueWidth())'.
This means that the SU will be put in Pending instead of Available based
on the number of micro-ops it uses.
To me this seems like an in-order decoding hazard check: an OOO machine
will rearrange the micro-ops during execution, so there is not much use
in checking the combined execution capacity of the current SU candidate
and the immediately previously scheduled one. So again I would say (1).
(Checking for decoder groups pre-RA does BTW not make much sense on
SystemZ, but that's another question.)
checkHazard() also returns a hazard if

  (CurrMOps > 0 &&
   ((isTop() && SchedModel->mustBeginGroup(SU->getInstr())) ||
    (!isTop() && SchedModel->mustEndGroup(SU->getInstr()))))

, which along the same lines makes me think that this is intended for
the instruction-stream management, i.e. (1).
There is also the fact that
IsResourceLimited =
checkResourceLimit(SchedModel->getLatencyFactor(),
getCriticalCount(),
getScheduledLatency());
, which is admittedly hard for me to grasp, but it seems that the
scheduled latency (std::max(ExpectedLatency, CurrCycle))
affects the resource heuristic so that if the scheduled latency is low
enough, the heuristic becomes active. This then means that CurrCycle
actually affects when resource balancing goes into action, and CurrCycle
in turn is advanced when NumMicroOps reaches the
IssueWidth. So somehow it all depends on modeling the instructions to
fill up the IssueWidth with their micro-ops. This could
actually be either

* Decoder cycles: NumDecoderSlots(SU) => SU->NumMicroOps and
  DecoderCapacity => IssueWidth (1)

or

* Execution cycles: NumExecutedUOps(SU) => SU->NumMicroOps and
  ApproxMaxExecutedUOpsPerCycle => IssueWidth (2)

They would at least in this context be somewhat equivalent in driving
CurrCycle forward.
Please, let me know about (1) or (2) :-)
thanks
/Jonas