[PATCH] [AArch64] Refines the Cortex-A57 Machine Model

Wed Sep 17 09:25:10 PDT 2014

> On Sep 17, 2014, at 9:07 AM, Dave Estes <cestes at codeaurora.org> wrote:
> 
>> 
>> You could go even further and model the in-order stalls on functional units that are not fully pipelined by setting BufferSize=0.
>> Note that you can have a mix of in-order/out-of-order resources if you choose.
> 
> I figured there were some tradeoffs with modeling purely in-order, but the gains were so broadly beneficial that it was a no-brainer. I really want to do just this and model both the in-order and out-of-order portions of the pipelines for each instruction. It wasn't immediately obvious how to do it, so I temporarily shelved the idea. Might be a nice experiment for a proposed SchedMachineModel tutorial. :)

Agreed.
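
As a starting point for that experiment, here is a rough TableGen sketch of mixing an unbuffered (in-order) resource with a default buffered one, as suggested in the quoted text above. It is not taken from the Cortex-A57 patch: every Hyp* name and every number is a made-up placeholder, and the fragment assumes it sits in a target's scheduling .td file where the TargetSchedule.td classes are already in scope.

```tablegen
// Hypothetical sketch only; not the actual Cortex-A57 model.
// All Hyp* names, widths, and latencies are illustrative assumptions.
def HypModel : SchedMachineModel {
  let IssueWidth = 3;          // assumed micro-ops issued per cycle
  let MicroOpBufferSize = 128; // > 1 selects the out-of-order model
  let CompleteModel = 0;       // the sketch covers only two write types
}

// Hypothetical scheduling classes for this sketch.
def HypWriteALU : SchedWrite;
def HypWriteDiv : SchedWrite;

let SchedModel = HypModel in {
  // Default BufferSize (-1): the unit shares the unified micro-op buffer,
  // so contention feeds the heuristics instead of forcing a stall.
  def HypUnitALU : ProcResource<2>;

  // BufferSize = 0: unbuffered, so a busy unit is a hard in-order stall.
  def HypUnitDiv : ProcResource<1> { let BufferSize = 0; }

  def : WriteRes<HypWriteALU, [HypUnitALU]> { let Latency = 1; }

  // A divider that is not pipelined: the unit stays occupied for the
  // whole operation. The field is ResourceCycles in LLVM of this era;
  // newer releases rename it.
  def : WriteRes<HypWriteDiv, [HypUnitDiv]> {
    let Latency = 19;
    let ResourceCycles = [19];
  }
}
```

With a split like this, only the divider would behave as a hard stall; the ALU pipes would stay under the out-of-order heuristics.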

>> You can also model just a certain class of instructions as having in-order latency by boosting MicroOpBufferSize and setting BufferSize=1. You can have a class of instructions consume multiple resources so you could model both in-order resource contention and latency.
>> 
>> Note that the idea behind modeling out-of-order is that we don't want an instruction issue limitation to be modeled as a hard stall that preempts all other heuristics. There are thresholds and heuristics that then come into play to try to balance resources. However, the default heuristics are very conservative, in the sense that the schedule is preserved unless we suspect a real stall (first do no harm). Given that the scheduler only sees a single block, it often doesn't do anything to improve issue bandwidth on an aggressive OOO model. The scheduler could be improved by recognizing loops, inferring a steady CPU state, and adjusting heuristics. I've added some loop awareness to the heuristics but it could be much better.
> 
> I really like this idea of adjusting heuristics. Think this is something that PGO can also help with?

We only build a DAG for one block, so we can only analyze single-block loops. The scheduler could be improved to build a DAG for an extended basic block to handle loops with early exits. The only benefit I see from PGO would be distinguishing low trip count loops from high trip count loops, but I don’t think it’s a big benefit here.

MachineTraceMetrics determines critical path and resource height across a trace. It can handle complex loops and benefits from PGO. It might be interesting to feed that into scheduling heuristics but it is expensive and not currently available at the time of scheduling.

-Andy 