[LLVMdev] New machine model questions

Andrew Trick atrick at apple.com
Tue Jan 28 15:10:27 PST 2014


On Jan 28, 2014, at 9:22 AM, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:

> You need a super-resource:
>  
> def P5600A : ProcResource<2>;
> def P5600AGQ : ProcResource<1> { let Super = P5600A; }
> def P5600ALQ : ProcResource<1> { let Super = P5600A; }
>  
> I'll take a look at MachineSchedStrategy. I don't know how important that precision is likely to be at the moment but I've generally found that the more accurate the machine description is, the harder it is to find one of the bad cases. That experience comes from a particular in-order scheduler in a proprietary compiler so I don't know if I can expect similar things from LLVM or not. I'm expecting out-of-order to help reduce the amount of precision that's needed for a good result but I don't know how much of a reduction I can expect at the moment.
>  
> I'm not sure I fully understand the super-resource suggestion. I've attached my WIP so you can take a look at the code in context but the relevant extracts are below.
> def P5600IssueALU : ProcResource<1>;
> def P5600IssueAL2 : ProcResource<1>;
> def P5600ALQ : ProcResGroup<[P5600IssueALU]> { let BufferSize = 16; }
> def P5600AGQ : ProcResGroup<[P5600IssueAL2, ...]> {
>   let BufferSize = 16;
> }
> def P5600WriteALU : SchedWriteRes<[P5600IssueALU]>;
> def P5600WriteAL2 : SchedWriteRes<[P5600IssueAL2]>;
> def P5600WriteEitherALU : SchedWriteVariant<
>   [SchedVar<SchedPredicate<[{1}]>, [P5600WriteALU]>, // FIXME: Predicate
>    SchedVar<SchedPredicate<[{0}]>, [P5600WriteAL2]>  // FIXME: Predicate
>   ]>;
>  
> I believe you are suggesting that I change this to:
> def P5600IssueEitherALU : ProcResource<2>;
> def P5600IssueALU : ProcResource<1> { let Super = P5600IssueEitherALU; }
> def P5600IssueAL2 : ProcResource<1> { let Super = P5600IssueEitherALU; }
> def P5600ALQ : ProcResGroup<[P5600IssueALU]> { let BufferSize = 16; }
> def P5600AGQ : ProcResGroup<[P5600IssueAL2, ...]> {
>   let BufferSize = 16;
> }
> def P5600WriteALU : SchedWriteRes<[P5600IssueALU]>;
> def P5600WriteAL2 : SchedWriteRes<[P5600IssueAL2]>;
> def P5600WriteEitherALU : SchedWriteRes<[P5600IssueEitherALU]>;
>  
> Instructions can then use P5600WriteEitherALU to pick between the two sub-resources at issue time. One curious consequence of this is that by allowing it to pick which pipeline the instruction is issued to, it effectively allows the instruction to pick which reservation station to be dispatched to at issue-time (which is backwards, normally dispatch determines the available subset of pipelines). That might not be a significant issue as far as the scheduler output is concerned but it seemed strange to me and it makes me doubt that I've fully understood it.

The scheduler does not model which dispatch queue (or is it issue queue?) the instructions reside in. For an OOO core, I think this is almost totally unpredictable anyway. We assume (hope) that the hardware can balance the queues.

With the itinerary-based hazard checker, I think the reservation station would force an instruction to use the first available resource. That has the opposite problem: it could unnecessarily prevent a later instruction from using that resource if the hardware can dynamically schedule.

I did not realize you were using processor resource groups. For many (relatively simple) cores the functional units can be expressed as a hierarchy. An instruction either needs a specific unit, or it can be issued to some broader class. You can do that without any groups. I added ProcResGroup for SandyBridge because instructions can issue to some subset of ports, and these subsets are overlapping. I think it is possible to use both groups and super resources in the same model, but it may cause confusion. I was simply suggesting something like this, for example:

def P5600UnitA : ProcResource<2> { let BufferSize=16; }
def P5600UnitAGQ : ProcResource<1> { let Super = P5600UnitA; }
def P5600UnitALQ : ProcResource<1> { let Super = P5600UnitA; }

def P5600WriteA : SchedWriteRes<[P5600UnitA]>;
def P5600WriteLd : SchedWriteRes<[P5600UnitAGQ, P5600UnitLdSt]>;

Here a load must issue on AGQ, which consumes one of the two ALU resources. A WriteA instruction simply uses one of the two ALU resources. We don’t model which one.

The relationship between ALU2 and AGQ is not clear to me yet, so I’m not sure what’s intended in your example.

Note that when an instruction uses a ProcResGroup it may use any of the named resources, but we don’t know which one. It does not use all resources in the group. If you want an instruction to use multiple resources, then you just list them in the SchedWriteRes entry. You can also compose SchedWrites using WriteSequence. (It would obviously be useful to work through a specific example here.)
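
As one rough sketch of those three cases (all Ex* names are hypothetical and not part of the P5600 model; in a real model these defs would sit inside a "let SchedModel = ... in { }" block, and the latency value is made up):

// Three hypothetical issue ports, one unit each.
def ExPort0 : ProcResource<1>;
def ExPort1 : ProcResource<1>;
def ExPort5 : ProcResource<1>;

// Overlapping subsets of ports, which is the case groups are meant for.
def ExPort01 : ProcResGroup<[ExPort0, ExPort1]>;
def ExPort15 : ProcResGroup<[ExPort1, ExPort5]>;

// Uses ONE of ExPort0/ExPort1; the model does not say which.
def ExWriteALU : SchedWriteRes<[ExPort01]>;

// Uses BOTH listed resources: ExPort0, plus one of ExPort1/ExPort5.
def ExWriteWide : SchedWriteRes<[ExPort0, ExPort15]> { let Latency = 3; }

// Composes existing SchedWrites: two ExWriteALU operations back to back.
def ExWriteALUPair : WriteSequence<[ExWriteALU, ExWriteALU]>;

The point of the sketch is only the distinction: naming a group means "any one of these", while listing several resources in SchedWriteRes means "all of these".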

FYI: BufferSize is a nice feature, but you can fairly safely omit it for an OOO core. The scheduler will by default assume an infinite dispatch queue and almost certainly generate the same schedule unless you have very large blocks! The scheduler does attempt to determine whether the OOO buffer will reach capacity across iterations of single-block loops, but it only looks at the model’s MicroOpBufferSize for this computation, not the per-resource buffer size.
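
For reference, MicroOpBufferSize is set on the machine model itself. A minimal sketch, with a hypothetical model name and made-up values that are not the P5600's real numbers:

def ExModel : SchedMachineModel {
  let IssueWidth = 2;          // made-up value
  let MicroOpBufferSize = 48;  // made-up value; anything > 1 is treated as out-of-order
}

The per-resource BufferSize on a ProcResource or ProcResGroup is a separate field, and it is not the one read by the loop computation described above.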

-Andy




