<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Sep 20, 2013, at 5:59 AM, Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><blockquote type="cite" style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; ">For SLM, you may not need the PostRA scheduler at all. I expect the<br>MachineScheduler to be a better fit. You can currently enable the<br>MachineScheduler and it will use your existing (old-style)<br>itineraries (X86SubTarget::enableMachineScheduler() { if (Atom)<br>return true; }). But we really don't want to support that. The<br>proper thing to do for SLM is define an out-of-order machine model.<br>See the SandyBridge/Haswell model.<br><br>For original Atom, you might still want a PostRA scheduler. Running<br>the new MachineScheduler a second time as a replacement postRA<br>scheduler is something I intended to do, but haven't had a client.<br>You could enable MachineScheduler and benchmark to determine if<br>PostRA sched is still really needed (-disable-post-ra). If you do<br>still need it, then I’d like to try replacing it with a second<br>MachineScheduler run. We do want to kill off the current PostRA<br>scheduler. 
It is a maintenance burden that doesn't serve a purpose<br>for targets that have migrated to MachineScheduler.<br></blockquote><br style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; display: inline !important; float: none; ">Andy,</span><br style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><br style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; display: inline !important; 
float: none; ">I'd certainly like to try this :) -- Post-RA scheduling still gives me ~10-15% speedup on the PPC A2 (just because of the relatively long pipeline, adjusting after the spill code is inserted is quite beneficial).</span><br style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "></blockquote><div><br></div><div>Ok. I'll see if I can enable a postRA MachineScheduler under a flag and have you run an experiment. If all goes well, we should be able to retire the old postRA scheduler.</div><div><br></div><div>One thing you might want to try is disabling the CriticalAntiDepBreaker to see how important it is. Supporting it complicates things.</div><br><blockquote type="cite"><span style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; display: inline !important; float: none; ">Also, regarding the new machine model, can it handle instructions with more than one output where the different outputs have different latencies? My pre-increment loads have this property.</span></blockquote><br><div>Yes, the new machine model is really just a list of SchedReadWrite entries, one for each operand. Operands that define a register are required to have a SchedWrite entry. SchedRead entries are optional. If an operand is both a def and a use, it takes two entries. 
The order of the read and write entries does not matter.</div><div><br></div><div>There are three ways to map these to an instruction:</div><div><br></div><div>(1) The proper way (see X86SchedSandyBridge.td)</div><div><br></div><div>def WriteA : SchedWrite;</div><div>def WriteB : SchedWrite;</div><div><br></div><div>def PPCInstXYZ : SomeFormat<...>, Sched<[WriteA, WriteB]>;</div><div><br></div><div>let SchedModel = PPCA2Model in {</div><div>def ProcUnitA : ProcResource<1>;</div><div>def ProcUnitB : ProcResource<1>;</div><div><br></div><div>def : WriteRes<WriteA, [ProcUnitA]>;</div><div>def : WriteRes<WriteB, [ProcUnitB]>;</div><div>}</div><div><br></div><div>(2) The itinerary class adapter (see ARMScheduleA9.td), for targets where the instruction definitions have itinerary classes but are too convoluted to retrofit with SchedReadWrite lists.</div><div><br></div><div>let SchedModel = PPCA2Model in {</div><div>def ProcUnitA : ProcResource<1>;</div><div>def ProcUnitB : ProcResource<1>;</div><div><br></div><div>def A2WriteA : SchedWriteRes<[ProcUnitA]>;</div><div>def A2WriteB : SchedWriteRes<[ProcUnitB]>;</div><div><br></div><div>def : ItinRW<[A2WriteA, A2WriteB], [ItinX, ItinY, ItinZ]>;</div><div>}</div><div><br></div><div>(3) The opcode match (see ARMScheduleSwift.td), for targets where the instruction definitions are too convoluted and we either don't have or don't want to continue using itinerary classes.</div><div><br></div><div>let SchedModel = PPCA2Model in {</div><div>def ProcUnitA : ProcResource<1>;</div><div>def ProcUnitB : ProcResource<1>;</div><div><br></div><div>def A2WriteA : SchedWriteRes<[ProcUnitA]>;</div><div>def A2WriteB : SchedWriteRes<[ProcUnitB]>;</div><div>def : InstRW<[A2WriteA, A2WriteB],</div><div>            (instregex "PPCInstX", "PPCInstY", "PPCInstZ")>;</div><div>}</div><div><br></div><div>(Yes, the opcode match is verified).</div><div><br></div><div>It's ok to use both (1) and (3) in the same subtarget. 
The idea is that (1) covers the common cases, while (3) lets a subtarget handle a few peculiar cases without complicating the target instruction classes.</div><div><br></div><div>If you migrate an in-order itinerary to the new model, please let me know. There's a small addition that I should make to the generic scheduler that you're using to enforce the strict pipeline hazards with the new model, but I'd like someone to test it when I commit.</div><div><br></div><div>-Andy</div></div><br></body></html>