<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Dec 20, 2011, at 10:29 AM, Hal Finkel wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>On Tue, 2011-12-20 at 10:35 -0600, Hal Finkel wrote:<br><blockquote type="cite">On Mon, 2011-12-19 at 23:20 -0800, Andrew Trick wrote:<br></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">On Dec 19, 2011, at 10:53 PM, Hal Finkel wrote:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Here's my "thought experiment" (from PR11589): I have a bunch of<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">load-fadd-store chains to schedule. A store takes two cycles to<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">clear<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">its last pipeline stage. 
The fadd takes longer to compute its result<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">(say 5 cycles), but can sustain a rate of 1 independent add per<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">cycle.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">As the scheduling is bottom-up, it will schedule a store, then it<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">has a<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">choice: it can schedule another store (at a 1 cycle penalty), or it<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">can<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">schedule the fadd associated with the store it just scheduled (with<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">a 4<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">cycle penalty due to operand latency). 
It seems that the current<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">hybrid<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">scheduler will choose the fadd; I want a scheduler that will make<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">the<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">opposite choice.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">That's just wrong. You may need to look at -debug-only=pre-RA-sched<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">and debug your itinerary.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Andy, I've already looked at the debug output quite a bit; please help<br></blockquote><blockquote type="cite">me understand what I'm missing...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">First, looking at the code does seem to confirm my suspicion. This is<br></blockquote><blockquote type="cite">certainly low-pressure mode, and so hybrid_ls_rr_sort::operator()<br></blockquote><blockquote type="cite">will return the result of BUCompareLatency. That function first checks<br></blockquote><blockquote type="cite">for stalls and returns 1 or -1. Only after that does it look at the<br></blockquote><blockquote type="cite">relative latencies.<br></blockquote><br>Looking at this more carefully, I think that I see the problem. 
The<br>heights are set to account for the latencies:<br>PredSU->setHeightToAtLeast(SU->getHeight() + PredEdge->getLatency());<br><br>but the latencies are considered only if the node has an ILP scheduling<br>preference (the default in TargetLowering.h is None):<br>  bool LStall = (!checkPref || left->SchedulingPref == Sched::ILP) &&<br>    BUHasStall(left, LHeight, SPQ);<br>...<br><br>and the PPC backend does not override getSchedulingPreference.</div></blockquote><blockquote type="cite"><div><br></div></blockquote><div><br></div><div>Right, even with sched=hybrid, the scheduler will fall back to register pressure scheduling unless the target implements</div><div>TargetLowering::getSchedulingPreference. I forgot that piece of the puzzle.</div><div><br></div><div>You could try simply returning Sched::ILP from PPCTargetLowering::getSchedulingPreference. If you later have regpressure problems, you can do something more complicated like ARMTargetLowering::getSchedulingPreference.</div><div><br></div><div>BTW - If you set HasReadyFilter, the fadd (105) would not even appear in the queue until the scheduler reached cycle [24]. So three additional stores would have been scheduled first. HasReadyFilter effectively treats operand latency stalls as strictly as pipeline hazards. It's not clear to me that you want to do that, though, if you fix getSchedulingPreference and do postRA scheduling later anyway.</div><div><br></div><div>So it should work to do "hybrid" scheduling biased toward ILP, vs. "ilp" scheduling which really does the opposite of what its name implies because it's initially biased toward regpressure.</div><div><br></div><div>-Andy</div><br><blockquote type="cite"><div><blockquote type="cite"><font class="Apple-style-span" color="#000000"><br></font></blockquote><blockquote type="cite">In addition, the stall computation is done using BUHasStall, and that<br></blockquote><blockquote type="cite">function only checks the current cycle. 
Without looking forward, I don't<br></blockquote><blockquote type="cite">understand how it could know how long the pipeline hazard will last.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">It looks like this may have something to do with the height. Can you<br></blockquote><blockquote type="cite">explain how that is supposed to work?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">For the specific example: We start with the initial store...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">GPRC: 4 / 31<br></blockquote><blockquote type="cite">F4RC: 1 / 31<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Examining Available:<br></blockquote><blockquote type="cite">Height 2: SU(102): 0x2c03f70: ch = STFSX 0x2c03c70, 0x2bf3910,<br></blockquote><blockquote type="cite">0x2c03870, 0x2c03e70<Mem:ST4[%arrayidx6.14](align=8)(tbaa=!"float")><br></blockquote><blockquote type="cite">[ORD=94] [ID=102]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(97): 0x2c03470: ch = STFSX 0x2c03170, 0x2bf3910, 0x2c02c60,<br></blockquote><blockquote type="cite">0x2c03370<Mem:ST4[%arrayidx6.13](tbaa=!"float")> [ORD=88] [ID=97]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(92): 0x2c02860: ch = STFSX 0x2c02560, 0x2bf3910, 0x2c02160,<br></blockquote><blockquote type="cite">0x2c02760<Mem:ST4[%arrayidx6.12](align=16)(tbaa=!"float")> [ORD=82]<br></blockquote><blockquote type="cite">[ID=92]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(90): 0x2c01c50: ch = STFSX 0x2c01950, 0x2bf3910, 0x2c01550,<br></blockquote><blockquote type="cite">0x2c01b50<Mem:ST4[%arrayidx6.11](tbaa=!"float")> [ORD=76] [ID=90]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 18: SU(85): 0x2c01150: ch = STFSX 0x2c00d40, 
0x2bf3910,<br></blockquote><blockquote type="cite">0x2c00940, 0x2c00f40<Mem:ST4[%arrayidx6.10](align=8)(tbaa=!"float")><br></blockquote><blockquote type="cite">[ORD=70] [ID=85]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">*** Scheduling [21]: SU(102): 0x2c03f70: ch = STFSX 0x2c03c70,<br></blockquote><blockquote type="cite">0x2bf3910, 0x2c03870, 0x2c03e70<Mem:ST4[%<br></blockquote><blockquote type="cite">arrayidx6.14](align=8)(tbaa=!"float")> [ORD=94] [ID=102]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">then it schedules a "token factor" that is attached to the address<br></blockquote><blockquote type="cite">computation required by the store (this is essentially a no-op,<br></blockquote><blockquote type="cite">right?)...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">GPRC: 5 / 31<br></blockquote><blockquote type="cite">F4RC: 2 / 31<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Examining Available:<br></blockquote><blockquote type="cite">Height 21: SU(5): 0x2c03e70: ch = TokenFactor 0x2c00c40:1, 0x2c03a70<br></blockquote><blockquote type="cite">[ORD=94] [ID=5]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 24: SU(105): 0x2c03c70: f32 = FADDS 0x2c03b70, 0x2bf3710 [ORD=92]<br></blockquote><blockquote type="cite">[ID=105]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(97): 0x2c03470: ch = STFSX 0x2c03170, 0x2bf3910, 0x2c02c60,<br></blockquote><blockquote type="cite">0x2c03370<Mem:ST4[%arrayidx6.13](tbaa=!"float")> [ORD=88] [ID=97]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(92): 0x2c02860: ch = STFSX 0x2c02560, 0x2bf3910, 0x2c02160,<br></blockquote><blockquote type="cite">0x2c02760<Mem:ST4[%arrayidx6.12](align=16)(tbaa=!"float")> [ORD=82]<br></blockquote><blockquote 
type="cite">[ID=92]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(90): 0x2c01c50: ch = STFSX 0x2c01950, 0x2bf3910, 0x2c01550,<br></blockquote><blockquote type="cite">0x2c01b50<Mem:ST4[%arrayidx6.11](tbaa=!"float")> [ORD=76] [ID=90]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 18: SU(85): 0x2c01150: ch = STFSX 0x2c00d40, 0x2bf3910,<br></blockquote><blockquote type="cite">0x2c00940, 0x2c00f40<Mem:ST4[%arrayidx6.10](align=8)(tbaa=!"float")><br></blockquote><blockquote type="cite">[ORD=70] [ID=85]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">*** Scheduling [21]: SU(5): 0x2c03e70: ch = TokenFactor 0x2c00c40:1,<br></blockquote><blockquote type="cite">0x2c03a70 [ORD=94] [ID=5]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">now here is the choice that we may want to be different...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">GPRC: 5 / 31<br></blockquote><blockquote type="cite">F4RC: 2 / 31<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Examining Available:<br></blockquote><blockquote type="cite">Height 24: SU(105): 0x2c03c70: f32 = FADDS 0x2c03b70, 0x2bf3710 [ORD=92]<br></blockquote><blockquote type="cite">[ID=105]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(97): 0x2c03470: ch = STFSX 0x2c03170, 0x2bf3910, 0x2c02c60,<br></blockquote><blockquote type="cite">0x2c03370<Mem:ST4[%arrayidx6.13](tbaa=!"float")> [ORD=88] [ID=97]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 2: SU(92): 0x2c02860: ch = STFSX 0x2c02560, 0x2bf3910, 0x2c02160,<br></blockquote><blockquote type="cite">0x2c02760<Mem:ST4[%arrayidx6.12](align=16)(tbaa=!"float")> [ORD=82]<br></blockquote><blockquote type="cite">[ID=92]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote 
type="cite">Height 2: SU(90): 0x2c01c50: ch = STFSX 0x2c01950, 0x2bf3910, 0x2c01550,<br></blockquote><blockquote type="cite">0x2c01b50<Mem:ST4[%arrayidx6.11](tbaa=!"float")> [ORD=76] [ID=90]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Height 18: SU(85): 0x2c01150: ch = STFSX 0x2c00d40, 0x2bf3910,<br></blockquote><blockquote type="cite">0x2c00940, 0x2c00f40<Mem:ST4[%arrayidx6.10](align=8)(tbaa=!"float")><br></blockquote><blockquote type="cite">[ORD=70] [ID=85]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">(with more debug turned on, I also see a bunch of messages like:<br></blockquote><blockquote type="cite">*** Hazard in cycle 3, SU(97): xxx: ch = STFSX ...<Mem:ST4[%<br></blockquote><blockquote type="cite">arrayidx6.13](tbaa=!"float")> [ORD=88] [ID=97]<br></blockquote><blockquote type="cite">one of these for each of the other possible stores).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">*** Scheduling [24]: SU(105): 0x2c03c70: f32 = FADDS 0x2c03b70,<br></blockquote><blockquote type="cite">0x2bf3710 [ORD=92] [ID=105]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">why did it choose this fadd over any of the other stores? 
the<br></blockquote><blockquote type="cite">corresponding unit descriptions are:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">SU(102): 0x2c03f70: ch = STFSX 0x2c03c70, 0x2bf3910, 0x2c03870,<br></blockquote><blockquote type="cite">0x2c03e70<Mem:ST4[%arrayidx6.14](align=8)(tbaa=!"float")> [ORD=94]<br></blockquote><blockquote type="cite">[ID=102]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">  # preds left       : 4<br></blockquote><blockquote type="cite">  # succs left       : 1<br></blockquote><blockquote type="cite">  # rdefs left       : 0<br></blockquote><blockquote type="cite">  Latency            : 7<br></blockquote><blockquote type="cite">  Depth              : 0<br></blockquote><blockquote type="cite">  Height             : 0<br></blockquote><blockquote type="cite">  Predecessors:<br></blockquote><blockquote type="cite">   val #0x2c11ff0 - SU(105): Latency=3<br></blockquote><blockquote type="cite">   val #0x2c0cdd0 - SU(32): Latency=1<br></blockquote><blockquote type="cite">   val #0x2c11db0 - SU(103): Latency=1<br></blockquote><blockquote type="cite">   ch  #0x2c0af70 - SU(5): Latency=0<br></blockquote><blockquote type="cite">  Successors:<br></blockquote><blockquote type="cite">   ch  #0x2c0ac10 - SU(2): Latency=1<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">SU(105): 0x2c03c70: f32 = FADDS 0x2c03b70, 0x2bf3710 [ORD=92] [ID=105]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">  # preds left       : 2<br></blockquote><blockquote type="cite">  # succs left       : 1<br></blockquote><blockquote type="cite">  # rdefs left       : 1<br></blockquote><blockquote type="cite">  Latency            : 11<br></blockquote><blockquote type="cite">  Depth              : 0<br></blockquote><blockquote type="cite">  Height             : 0<br></blockquote><blockquote type="cite">  Predecessors:<br></blockquote><blockquote type="cite">   val 
#0x2c12110 - SU(106): Latency=6<br></blockquote><blockquote type="cite">   val #0x2c0d130 - SU(35): Latency=6<br></blockquote><blockquote type="cite">  Successors:<br></blockquote><blockquote type="cite">   val #0x2c11c90 - SU(102): Latency=3<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Just from the debugging messages, it looks like what is happening is<br></blockquote><blockquote type="cite">that the scheduler is first rejecting the other stores because of<br></blockquote><blockquote type="cite">pipeline hazards and then picking the instruction with the lowest<br></blockquote><blockquote type="cite">latency. Looking at the code, it seems that this is exactly what it was<br></blockquote><blockquote type="cite">designed to do. If I'm wrong about that, please explain.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks in advance,<br></blockquote><blockquote type="cite">Hal<br></blockquote></div></blockquote></div><br></body></html>