<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>On Apr 24, 2012, at 8:59 AM, <a href="mailto:dag@cray.com">dag@cray.com</a> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Andrew Trick &lt;<a href="mailto:atrick@apple.com">atrick@apple.com</a>&gt; writes:<br><br><blockquote type="cite">We plan to move to the MachineScheduler by 3.2. The structure is:<br></blockquote><br>How hard will this be to backport to 3.1? Has work on this started<br>yet?<br></blockquote><div><br></div>In my previous message I outlined the steps that I would take to bring up the new scheduler. I'm about to check in the register-pressure-reducing scheduler. The next step will be plugging in the target itinerary.</div><div><br><blockquote type="cite"><blockquote type="cite">ScheduleDAG: Abstract DAG of SUnits and SDeps<br> |<br> v<br>ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an MI<br> Delimit the current "region" of code being scheduled.<br> |<br> v<br>ScheduleDAGMI: Concrete implementation that supports both top-down and bottom-up scheduling<br> with live interval update. It divides the region into three zones:<br> Top-scheduled, bottom-scheduled, and unscheduled.<br></blockquote><br>So does this happen after SDNodes are converted to MachineInstrs? It's<br>not clear to me given your description of ScheduleDAGInstrs. I assume<br>it uses the current SDNode->SUnit mechanism but I just want to make<br>sure.<br></blockquote><div><br></div><div>Machine scheduling occurs in the vicinity of register allocation. It uses the existing MachineInstr->SUnit mechanism.</div><br><blockquote type="cite"><font color="#007316">...</font><br>I'm glad to hear the top-down scheduler will get some attention. 
We'll<br>be wanting to use that.<br></blockquote><div><br></div><div>Out of curiosity, what about top-down scheduling works better for your target?</div><br><blockquote type="cite"><br><blockquote type="cite">Start by composing your scheduler from the pieces that are available,<br>e.g. HazardChecker, RegisterPressure... (There's not much value<br>providing a scheduling queue abstraction on top of vector or<br>priority_queue).<br></blockquote><br>What do you mean by this last point? We absolutely want to be able to<br>swap out different queue implementations. There is a significant<br>compile time tradeoff to be made here.<br></blockquote><div><br></div><div>Use whatever data structure you like for your queue. I don't have plans to make a reusable one yet. They're not complicated.</div><br><blockquote type="cite"><blockquote type="cite">2. Division of the target-defined resources into "interlocked"<br>vs. "buffered". The HazardChecker will continue to handle the<br>interlocks for the resources that the hardware handles in<br>order. <br></blockquote><br>So by "interlocks" you mean hardware-implemented interlocks? So that<br>the scheduler will attempt to avoid them. Not that we have a machine<br>like this, but what about machines with no interlocks where software is<br>responsible for correctness (VLIW, early MIPS, etc.)? I suppose they<br>can be handled with a similar mechanism but the latter is much more<br>strict than the former.<br></blockquote><div><br></div><div>I'm not designing a mechanism for inserting nops to pad latency. If someone needs that, it's easy to add.</div><br><blockquote type="cite"><blockquote type="cite">Buffered resources, which the hardware can handle out of order. These<br>will be considered separately for scheduling priority. We will also<br>make a distinction between minimum and expected operation latency.</blockquote></blockquote><blockquote type="cite">Does this mean you want to model the various queues, hardware scheduling<br>windows, etc.? 
I'm just trying to get a clearer idea of what this<br>means.<br></blockquote><div><br></div><div>I don't intend to directly model microarchitectural features of OOO processors at the level of buffer sizes, although a highly motivated developer could do that.</div><div><br></div><div>I do intend to allow a target to identify arbitrary categories of resources, how many are available in each cycle on average, and indicate which resources are used by an operation. I'll initially piggyback on the existing functional units to avoid rewriting target itineraries.</div><br><blockquote type="cite">As I said, this is a time-critical thing for us. Is there any way I can<br>help to move this along?<br></blockquote></div><br><div>In general, fixing LLVM bugs and keeping build bots green is always helpful.</div><div><br></div><div>As far as the scheduler goes, pieces are starting to fall into place. People are starting to use those pieces and contribute. This is a pluggable pass, so there's no reason you can't develop your own machine scheduler in parallel and gradually take advantage of features as I add them. Please expect to do a little code copy-pasting into your target until the infrastructure is mature enough to expose more target interfaces. I'm not going to waste time redesigning APIs until we have working components.</div><div><br></div><div>It would be very useful to have people testing new features, finding bugs early, and hopefully fixing those bugs. I would also like people to give me interesting bits of code with performance issues that can work as test cases. That's hard if I can't run your target.</div><div><br></div><div>-Andy</div><div><br></div><div><br></div></body></html>