<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 17, 2014, at 9:07 AM, Dave Estes <<a href="mailto:cestes@codeaurora.org" class="">cestes@codeaurora.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class="">You could go even further and model the in-order stalls on functional units that are not fully pipelined by setting BufferSize=0.<br class="">Note that you can have a mix of in-order/out-of-order resources if you choose.<br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I figured there was some tradeoffs with modeling purely in-order, but the gains were so broadly beneficial that it was a no brainer. I really want to do just this and model both the in-order and out-of-order portions of the pipelines for each instructions. It wasn't immediately obvious how to do it, so I temporarily shelved the idea. Might be a nice experiment for a proposed SchedMachineModel tutorial. :)</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div>Agreed.</div><div><br class=""><blockquote type="cite" class=""><div class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">You can also model just a certain class of instructions as having in-order latency by boosting MicroOpBufferSize and setting BufferSize=1. You can have a class of instructions consume multiple resources so you could model both in-order resource contention and latency.<br class=""><br class="">Note that the idea behind modeling out-of-order is that we don't want an instruction issue limitation to be modeled as a hard stall that preempts all other heuristics. There are thresholds and heuristics that then come into play to try to balance resources. However, the default heuristics are very conservative, in the sense that the schedule is preserved unless we suspect a real stall (first do no harm). Given the scheduler only sees a single block, it often doesn't do anything to improve issue bandwidth on an aggressive OOO model. The scheduler could be improved by recognizing loops, inferring a steady cpu state and adjusting heuristics. I've added some loop awareness to the heuristics but it could be much better.<br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I really like this idea of adjusting heuristics. Think this is something that PGO can also help with?</span><br class=""></div></blockquote><br class=""></div><div>We only build a DAG for one block, so can only analyze single block loops. The scheduler could be improved to build a DAG for an extended basic block to handle loops with early exits. The only benefit I see from PGO would be distinguishing low trip count loops from high trip count loops, but I don’t think its a big benefit here.</div><div><br class=""></div><div>MachineTraceMetrics determines critical path and resource height across a trace. It can handle complex loops and benefits from PGO. It might be interesting to feed that into scheduling heuristics but it is expensive and not currently available at the time of scheduling.</div><div><br class=""></div><div>-Andy </div></body></html>