<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 27, 2015, at 12:36 PM, Tom Stellard <<a href="mailto:tom@stellard.net" class="">tom@stellard.net</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On Fri, Mar 27, 2015 at 12:20:06PM -0700, Andrew Trick wrote:</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""><blockquote type="cite" class="">On Mar 27, 2015, at 7:52 AM, Tom Stellard <<a href="mailto:tom@stellard.net" class="">tom@stellard.net</a>> wrote:<br class=""><br class="">On Thu, Mar 26, 2015 at 11:50:20PM -0700, Andrew Trick wrote:<br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" class="">On Mar 26, 2015, at 7:36 PM, Tom Stellard <<a href="mailto:tom@stellard.net" class="">tom@stellard.net</a>> wrote:<br class=""><br class="">Hi,<br class=""><br class="">I have a program with over 100 loads (each with a 10 cycle latency)<br class="">at the beginning of the program, and I can't figure out how to get<br class="">the machine scheduler to intermix ALU instructions with the loads to<br class="">effectively hide the latency.<br class=""><br class="">It seems the issue is with load clustering. I restrict load clustering<br class="">to 4 at a time, but when I look at the debug output, the loads are<br class="">always being scheduled based on the fact that that are clustered. e.g.<br class=""><br class="">Pick Top CLUSTER<br class="">Scheduling SU(10) %vreg13<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 4; mem:LD4[<unknown>] SGPR_32:%vreg13 SReg_128:%vreg9<br class=""></blockquote><br class="">Well, only 4 loads in a sequence should have the “cluster” edges. You should be able to see that when the DAG is printed before scheduling.<br class=""><br class=""></blockquote><br class="">There are 4 consecutive 'Pick Top CLUSTER' then a 'Pick Top WEAK' and<br class="">then the pattern repeats itself. All of these are loads.<br class=""></blockquote><br class="">It sounds like someone has created weak edges to the other non-load instructions that are ready to schedule, preventing them from being prioritized before the loads. In the generic scheduler, I think that is only done for copy instructions to try eliminating interference so the copy can be removed. I don't know why that would apply to your case.<br class=""><br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">Even without that limit, stalls take precedence over load clustering. So when you run out of load resources (15?) the scheduler should choose something else.<br class=""><br class=""></blockquote><br class="">Is this the code that checks for stalls?<br class=""><br class="">if (tryLess(Zone.getLatencyStallCycles(TryCand.SU),<br class=""> Zone.getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))<br class=""><br class=""><br class="">It is disabled if (!SU->isUnbuffered)<br class=""></blockquote><br class="">Oh right, in your case the loads should not even be in the ready queue unless there are free resources. But according to your model that is 31 loads. At any rate, this is not really the issue since your model thinks it can only schedule one instruction per cycle anyway.<br class=""><br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I was able to get this working as expected by defining IssueWidth and</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">MicroOpBufferSize in the SIMachineModel:</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">class SIModel : SchedMachineModel {</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> let IssueWidth = 8;</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> let MicroOpBufferSize = 32;</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">}</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">This isn't quite correct, though, because the real IssueWidth depends on</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">what kinds of instructions are issued and there isn't a global buffer</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">for MicroOps, each resource has its own buffer.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>Giving these a maximum value should be fine. You can also try MicroOpBufferSize=1 which tries harder to avoid stalls.</div><div><br class=""></div><div>Andy</div><br class=""><blockquote type="cite" class=""><div class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">I have a feeling there is something wrong with my machine model in the<br class="">R600 backend, but I've experimented with a few variations of it and have<br class="">been unable to solve this problem. Does anyone have any idea what I<br class="">might be doing wrong?<br class=""></blockquote><br class="">Sorry, not without actually looking through the debug output. The output lists the cycle time at each instruction, so you can see where the scheduler thinks the stalls are.<br class=""><br class=""></blockquote><br class="">There are actually 31 resources defined for loads. However, there<br class="">aren't actually 31 load units in the hardware. There is 1 load unit<br class="">that can hold up to 31 loads waiting to be executed, but only 1 load<br class="">can be executed at a time.<br class=""></blockquote><br class="">It's not really clear how to model GPU hardware in terms of a simple single-cpu in-order machine model. I guess the question is whether you want 31 loads to be scheduled without counting any extra cycles for waiting for the busy load unit.<br class=""><br class="">It sounds like you're running the GenericScheduler with the SI machine model. We do want to make sure that works as advertised and fix bugs. On the other hand, I'm sure you know that the generic heuristics probably aren't very effective on your target. I thought you were using R600SchedStrategy?<br class=""><br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">R600SchedStrategy is only for older GPUs. SI and newer uses the generic</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">scheduler. The generic scheduler has worked pretty well so far for SI,</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">which is more similar to traditional CPUs than the older VLIW GPUs.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">-Tom</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 14px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" class="">Pick Top CLUSTER <br class="">Scheduling SU(43) %vreg46<def> = S_BUFFER_LOAD_DWORD_IMM %vreg9, 48; mem:LD4[<unknown>] SGPR_32:%vreg46 SReg_128:%vreg9<br class="">SReg_32: 45 > 44(+ 0 livethru)<br class="">VS_32: 51 > 18(+ 0 livethru)<br class="">Ready @46c<br class="">HWLGKM +1x105u<br class="">TopQ.A BotLatency SU(43) 78c<br class="">*** Max MOps 1 at cycle 46<br class="">Cycle: 47 TopQ.A<br class=""></blockquote><br class="">The scheduler thinks it will take 47 cycles to execute the instructions that have been scheduled so far from the top of the schedule.<br class=""><br class="">Andy<br class=""><br class=""><blockquote type="cite" class="">TopQ.A @47c<br class="">Retired: 47<br class="">Executed: 47c<br class="">Critical: 47c, 47 MOps<br class="">ExpectedLatency: 10c<br class="">- Latency limited.<br class="">BotQ.A RemLatency SU(1698) 99c<br class="">TopQ.A + Remain MOps: 1692<br class="">TopQ.A RemLatency SU(201) 97c<br class="">BotQ.A + Remain MOps: 1647<br class="">BotQ.A: 1698 1694 1695<br class=""><br class=""><br class="">Here is example debugging output which. Where is the cycle time<br class="">here?<br class=""><br class=""><br class=""><blockquote type="cite" class="">BTW- I just checked in a small fix for in-order scheduling that might make debugging this easier.<br class=""><br class=""></blockquote><br class="">I will take a look at this.<br class=""><br class="">Thanks,<br class="">Tom<br class=""><br class=""><blockquote type="cite" class="">Andy<br class=""><br class=""><blockquote type="cite" class="">Here are my resource definitions from lib/Target/R600/SISchedule.td<br class=""><br class="">// BufferSize = 0 means the processors are in-order.<br class="">let BufferSize = 0 in {<br class=""><br class="">// XXX: Are the resource counts correct?<br class="">def HWBranch : ProcResource<1>; <br class="">def HWExport : ProcResource<7>; // Taken from S_WAITCNT<br class="">def HWLGKM : ProcResource<31>; // Taken from S_WAITCNT<br class="">def HWSALU : ProcResource<1>; <br class="">def HWVMEM : ProcResource<15>; // Taken from S_WAITCNT<br class="">def HWVALU : ProcResource<1>;<br class=""><br class="">}<br class=""></blockquote><br class=""><blockquote type="cite" class=""><br class="">Thanks,<br class="">Tom<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" class="">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></blockquote></blockquote></blockquote></blockquote></div></blockquote></div><br class=""></body></html>