[LLVMdev] New machine model questions
Andrew Trick
atrick at apple.com
Fri Jan 24 13:52:21 PST 2014
On Jan 24, 2014, at 2:21 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
> Hi Andrew,
>
> I seem to be making good progress on the P5600 scheduler using the new machine model but I've got a few questions about it.
Hi Daniel,
These are really good questions. For future reference, I might provide better examples if you attach what you have so far for the model.
> How would you represent an instruction that splits into two micro-ops and is dispatched to two different reservation stations?
> For example, I have two reservation stations (AGQ and FPQ). An FPU load instruction is split into a load micro-op which is dispatched to AGQ and a writeback micro-op which is dispatched to FPQ.
> The AGQ micro-op is issued to a four-cycle latency pipeline called LDST. Three cycles after issue, the LDST pipeline wakes up the FPQ micro-op, which writes the result of the load back to the register file.
This question illustrates the primary difference between the per-operand machine model and the itinerary. The itinerary directly models the stages of each pipeline independently. Some backend maintainers may still want to use itineraries if that level of precision is critical [1]. Another option is extending the new model [2].
I will assume that each queue is fully pipelined (4 AGQ ops can be in flight).
Forcing all this information into a single SchedWriteRes def would look like this:
def P5600FLD : SchedWriteRes <[P5600UnitAGQ, P5600UnitFP]> {
let Latency = 5; // 4 cycle load + 1 cycle FP writeback
let NumMicroOps = 2;
}
This is bad (for an in-order processor) because it prevents FPLoad + FPx from being scheduled in the same cycle and fails to detect a conflict on FP ops 5 scheduled cycles ahead.
A better way to express it would be:
def P5600LD : SchedWriteRes<[P5600UnitAGQ]> { let Latency = 4; }
def P5600FP : SchedWriteRes<[P5600UnitFP]>;
def P5600FLD : WriteSequence<[P5600LD, P5600FP]>;
Unfortunately, the implementation currently aggregates the processor resources, ignoring the fact that they are used on different cycles. This is totally fixable [2]. However, I don't know why you would care, since an out-of-order processor doing its job will make the stalls unpredictable either way.
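To make the aggregation problem concrete, here is a small Python model (not LLVM code). The (unit, latency) pairs mirror the P5600LD/P5600FP defs above. The `aggregated` function is an assumption standing in for "charge every resource on the issue cycle", and `sequenced` starts each write after the previous write's latency. The model checks whether a hypothetical FP op collides with the load when issued in the same cycle and four cycles later.

```python
# Hypothetical model: each write in the sequence is (unit, latency), mirroring
# WriteSequence<[P5600LD, P5600FP]> with P5600LD latency 4 and P5600FP latency 1.
FLD = [("AGQ", 4), ("FP", 1)]


def aggregated(seq):
    """Charge every unit on the issue cycle (cycle 0), ignoring when it is really used."""
    return {(unit, 0) for unit, _ in seq}


def sequenced(seq):
    """Start each write when the previous write's latency has elapsed."""
    uses, cycle = set(), 0
    for unit, latency in seq:
        uses.add((unit, cycle))
        cycle += latency
    return uses


def conflicts(uses, unit, cycle):
    """Would another op using `unit` on `cycle` collide with the load issued on cycle 0?"""
    return (unit, cycle) in uses


agg, seq = aggregated(FLD), sequenced(FLD)
print(conflicts(agg, "FP", 0), conflicts(seq, "FP", 0))  # True False: false conflict
print(conflicts(agg, "FP", 4), conflicts(seq, "FP", 4))  # False True: missed conflict
```

With aggregation, the FP unit looks busy on the issue cycle and free when the writeback actually needs it, which is exactly the pair of mistakes described above.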
> Is it possible to use other instructions already scheduled for the same cycle as part of the evaluation of a SchedPredicate in a SchedVariant?
> I've got a class of instructions (mostly simple addition) that can dispatch to two different reservation stations (ALQ and AGQ), both of which have a suitable pipeline with the same latency. The dispatch stage can dispatch two instructions per cycle. When it has one instruction from this class it dispatches it to ALQ (this isn't strictly true but I'll come back to that), and when it has two it dispatches one to ALQ and the other to AGQ.
>
No. The machine model is used to form a scheduling DAG independent of the original schedule. If it's important to be this precise, then I suggest you plug in a new MachineSchedStrategy where you can model stalls for any special cases during scheduling.
You need a super-resource:
def P5600A : ProcResource<2>;
def P5600AGQ : ProcResource<1> { let Super = P5600A; }
def P5600ALQ : ProcResource<1> { let Super = P5600A; }
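A rough Python sketch of what the super-resource buys you. It assumes that using a sub-unit (P5600AGQ or P5600ALQ) also consumes one unit of its Super (P5600A) in the same cycle. The capacities and Super relation copy the defs above; the `charge` and `fits` helpers are hypothetical and much simpler than the real scheduler's accounting.

```python
# Capacities and Super links copied from the .td defs above.
CAPACITY = {"P5600A": 2, "P5600AGQ": 1, "P5600ALQ": 1}
SUPER = {"P5600AGQ": "P5600A", "P5600ALQ": "P5600A"}


def charge(resource):
    """Return the resource followed by all of its super-resources."""
    chain = [resource]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return chain


def fits(ops):
    """Can these ops (each naming the one resource it uses) issue in the same cycle?"""
    used = {}
    for resource in ops:
        for r in charge(resource):
            used[r] = used.get(r, 0) + 1
            if used[r] > CAPACITY[r]:
                return False
    return True


print(fits(["P5600A", "P5600A"]))            # True: two adds, one per queue
print(fits(["P5600A", "P5600A", "P5600A"]))  # False: only two A units
print(fits(["P5600AGQ", "P5600A"]))          # True: one A unit is left for the add
print(fits(["P5600AGQ", "P5600AGQ"]))        # False: AGQ has a single unit
```

Under this assumption, instructions that may dispatch to either queue would name P5600A in their SchedWriteRes, so two of them fit in a cycle while a third has to wait.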
> Is it possible to use historical scheduling decisions as part of the evaluation of a SchedPredicate in a SchedVariant?
> I'm fairly certain the answer to this one is 'no' (because scheduling can be performed in both directions) but I'll ask anyway. In the previous question, I said that when the dispatch stage has one instruction that can be dispatched to either ALQ or AGQ it always picks ALQ. The truth of the matter is that historical decisions are used to guess which one is most likely to stall and the dispatch stage picks the other one. I haven't established exactly what information it's using yet though so I can't give a good example.
SchedVariant is really just for opcodes that can use different resources/latency depending on the value of some immediate.
The kind of micro-architectural special rules/heuristics that you are describing are exactly why we have a pluggable MachineSchedStrategy.
> Is there an easy way to check I've covered every valid instruction? I'm thinking it would be helpful if I could get build warnings from tablegen about valid instructions with no scheduling information. This would also prevent someone adding an instruction later and forgetting to add it to the scheduler.
YES! Very good question.
When implementing a new model, it's important to run llvm-tblgen with -debug-only=subtarget-emitter.
You should be able to touch your .td, then grab the command via make TOOL_VERBOSE=1.
This is the line from ARM:
llvm-tblgen -I /s/fix/lib/Target/ARM -I /s/fix/include -I /s/fix/include -I /s/fix/lib/Target -gen-subtarget -o ARMGenSubtargetInfo.inc /s/fix/lib/Target/ARM/ARM.td -debug-only=subtarget-emitter
It will list all instructions and print "No machine model for <subtarget>".
You will also get an assert in the scheduler unless you add the following flag to your model:
let CompleteModel = 0;
>
> Thanks
>
> Daniel Sanders
> Leading Software Design Engineer, MIPS Processor IP
> Imagination Technologies Limited
> www.imgtec.com
[1] I added support for the itineraries into the new MI scheduler because I realized that some out-of-tree backend maintainers may still want that level of precision. I'm not sure yet whether you fall into that category. The new machine model was designed for out-of-order processors, but I also think it is sufficient for most in-order models. I would like to establish the new machine model as the preferred choice because it is simpler and more efficient, it will be easier for most backend developers to bring up a new subtarget, and we will then eventually have more consistency across targets. I also selfishly want more good in-tree examples of the new model so it will effectively be better documented and supported.
I believe it is possible to handle special cases requiring the itinerary's precision without using an itinerary, either by plugging custom logic into the MachineSchedStrategy or by extending the new machine model...
[2] To model in-order pipeline resources, we could:
- add a field to MCWriteProcResEntry
+ unsigned DelayCycles;
- modify the TableGen code in SubtargetEmitter to record the delay.
We already do this:
// If this resource is already used in this sequence, add the current
// entry's cycles so that the same resource appears to be used
// serially, rather than multiple parallel uses. This is important for
// in-order machine where the resource consumption is a hazard.
But we could also add a delay to the resource cycles when the
processor resource is unbuffered.
- The code in SchedBoundary::bumpNode and SchedBoundary::checkHazard
needs to be updated to increment the cycle accounting for DelayCycles.
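The proposal above can be modelled in a few lines of Python. `WriteProcResEntry`, `Scoreboard`, `check_hazard` and `bump_node` are hypothetical stand-ins named after MCWriteProcResEntry and SchedBoundary::checkHazard/bumpNode; they are not LLVM's API. The only point is that a DelayCycles offset shifts which cycles an unbuffered resource is reserved on, so the FP load from the first question no longer blocks the FP unit on its issue cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteProcResEntry:
    """Hypothetical stand-in for MCWriteProcResEntry plus the proposed field."""
    resource: str
    cycles: int
    delay_cycles: int = 0  # proposed: cycles after issue before the resource is used


class Scoreboard:
    """Toy in-order reservation table for unbuffered resources; not LLVM's SchedBoundary."""

    def __init__(self):
        self.reserved = set()  # (resource, absolute cycle)

    def _cycles(self, issue, entry):
        start = issue + entry.delay_cycles
        return range(start, start + entry.cycles)

    def check_hazard(self, issue, entries):
        """True if any entry needs a resource cycle that is already reserved."""
        return any((e.resource, c) in self.reserved
                   for e in entries for c in self._cycles(issue, e))

    def bump_node(self, issue, entries):
        """Reserve every resource cycle the instruction will occupy."""
        for e in entries:
            self.reserved.update((e.resource, c) for c in self._cycles(issue, e))


# FP load: AGQ on the issue cycle, FP unit delayed by the 4-cycle load.
FLD = [WriteProcResEntry("AGQ", 1), WriteProcResEntry("FP", 1, delay_cycles=4)]
FPX = [WriteProcResEntry("FP", 1)]

sb = Scoreboard()
sb.bump_node(0, FLD)
print(sb.check_hazard(0, FPX))  # False: FP op can issue alongside the load
print(sb.check_hazard(4, FPX))  # True: collides with the delayed writeback
```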
-Andy