<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Jonas and Matthias,</p>
<p><br>
</p>
<p>Thank you for informative replies!</p>
<p><br>
</p>
<p>Implementing that naive and inefficient procedure for getting complete live-in/live-out info sounds like a suitable solution for us at this point. We are working a research project, in which we are experimenting with relatively slow scheduling algorithms
to explore the limits of register pressure (RP) reduction. I agree with Matthias that when our experimentation shows that these algorithms make a significant difference, we will worry about putting them in production. So, for now, we will try to implement
this naive approach.<br>
</p>
<br>
<p>We realize that the register allocator will always try to place spills in basic blocks with lower frequencies of execution, which implies that locally reducing RP within a basic block (or a few basic blocks) will not necessarily reduce the static spill count
in the whole function, which is the metric that we are currently using. However, with the large size of our data set (all CPU2006 benchmarks with about 900 thousand scheduling regions), we should statistically get a very strong (but not perfect) correlation
between local RP and the static spill count. The current correlation is not as strong as expected, and that's why we think that there is a problem with the cost function that we are minimizing. It does not seem to be fully capturing the register info.</p>
<p><br>
</p>
<p>After reading Matthias's comment though, I concluded that the weighted spill count (with each spill multiplied by the block's frequency of execution) would be a better metric for us to use. Of course, we do run the benchmarks and look at the actual execution
time once in a while, but we cannot afford doing this after every change that we make. We have started looking into calculating a weighted spill count, but we are having problems with this. We will address that in a separate email.
<br>
</p>
<br>
<p>Overall, we have reasons to believe that precise scheduling algorithms can significantly impact the performance of scientific programs (at least), because most FP2006 benchmarks have significant amounts of spilling in their hot functions, and significant
reductions in these hot spills always lead to significant performance gains. Our framework is general enough to support balancing RP and Instruction-level parallelism (ILP), but we are currently focusing on the RP part,<span>
</span>targeting processors with powerful out-of-order execution, such as Intel x86. Once we complete our exploration of the limits of RP reduction, we will start experimenting with the balance between RP and ILP on in-order processors.
</p>
<br>
<p>Thank you again for your insightful comments! <br>
</p>
<br>
<br>
<span style="background-color:white;"><font style="font-family: Calibri,Helvetica,sans-serif,serif,"EmojiFont";" face="Calibri,Helvetica,sans-serif" color="black" size="3"><span style="font-size:12pt;" dir="ltr" id="divtagdefaultwrapper">
<div style="margin-top:0;margin-bottom:0;"><font face="Calibri,Helvetica,sans-serif,EmojiFont,Apple Color Emoji,Segoe UI Emoji,NotoColorEmoji,Segoe UI Symbol,Android Emoji,EmojiSymbols" color="black" size="3"><span style="font-size:12pt;" id="divtagdefaultwrapper">
<div>
<div style="margin-top:0;margin-bottom:0;"><font face="Arial,sans-serif,serif,EmojiFont">Ghassan Shobaki</font><font face="Arial,sans-serif,serif,EmojiFont"><br>
Assistant Professor of Computer Science </font><font face="Arial,sans-serif,serif,EmojiFont"><br>
</font></div>
<div style="margin-top:0;margin-bottom:0;"><font face="Arial,sans-serif,serif,EmojiFont">California State University, Sacramento
</font></div>
</div>
</span></font></div>
</span></font></span><br>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Matthias Braun <matze@braunis.de><br>
<b>Sent:</b> Tuesday, September 12, 2017 9:50:22 AM<br>
<b>To:</b> Jonas Paulsson<br>
<b>Cc:</b> Shobaki, Ghassan; Andrew Trick; llvm-dev@lists.llvm.org; Kerbow, Austin Michael; ghassanshobaki@gmail.com<br>
<b>Subject:</b> Re: [llvm-dev] Register pressure calculation in the machine scheduler and live-through registers</font>
<div> </div>
</div>
<div><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Sep 12, 2017, at 6:44 AM, Jonas Paulsson via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<p class="">Hi Ghassan,<br class="">
</p>
<blockquote type="cite" cite="mid:MWHPR04MB09455C555208ACFB159BE0918C690@MWHPR04MB0945.namprd04.prod.outlook.com" class="">
<div id="divtagdefaultwrapper" style="font-size: 12pt; font-family: Calibri, Helvetica, sans-serif;" dir="ltr" class="">
<p class="">As for live-through information, we found that the machine scheduler does call initLiveThru() and here is a pointer to the code:</p>
<p class=""><br class="">
</p>
<div class=""><font style="font-family:
Calibri,Helvetica,sans-serif,serif,"EmojiFont";" face="Calibri,Helvetica,sans-serif" size="3" class=""><span style="font-size:12pt;" id="divtagdefaultwrapper" class=""></span></font><br class="webkit-block-placeholder">
</div>
<font style="font-family:
Calibri,Helvetica,sans-serif,serif,"EmojiFont";" face="Calibri,Helvetica,sans-serif" size="3" class="">
<div style="margin-top:0;margin-bottom:0;" class=""><a href="https://gitlab.com/CSUS_LLVM/LLVM_DRAGONEGG/blob/master/Generic/llvmTip/llvm-master/lib/CodeGen/MachineScheduler.cpp#L921" target="_blank" rel="noopener noreferrer" id="LPlnk344791" previewremoved="true" moz-do-not-send="true" class=""><span id="LPlnk344791" class="">https://gitlab.com/CSUS_LLVM/LLVM_DRAGONEGG/blob/master/Generic/llvmTip/llvm-master/lib/CodeGen/MachineScheduler.cpp#L921</span></a></div>
<div style="margin-top:0;margin-bottom:0;" class=""><br class="">
</div>
</font></div>
</blockquote>
<font size="3" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class="">The
fir<font face="Calibri,Helvetica,sans-serif" class="">st part of the comment <font face="Calibri,Helvetica,sans-serif" class="">
above initLiveThru() says "The register tracker is unaware of global liveness so ignores normal<font face="Calibri,Helvetica,sans-serif" class="">
</font>live-thru ranges.</font></font></font></font></font></font></font></font></font></font></font>.."<font face="Calibri,Helvetica,sans-serif" class="">. It is then of course confusing to see the<font face="Calibri,Helvetica,sans-serif" class="">se methods
like initLiveThru()...<br class="">
<br class="">
</font><font face="Calibri,Helvetica,sans-serif" class="">My understanding is that (please correct me if I'm wrong)<br class="">
<font face="Calibri,Helvetica,sans-serif" class="">1. All instructions are traversed bottom<font face="Calibri,Helvetica,sans-serif" class="">-up during DAG building.
<font face="Calibri,Helvetica,sans-serif" class="">While doing this <font face="Calibri,Helvetica,sans-serif" class="">
re<font face="Calibri,Helvetica,sans-serif" class="">g pressure is <font face="Calibri,Helvetica,sans-serif" class="">
tracked</font> based on <font face="Calibri,Helvetica,sans-serif" class="">just looking at
</font>those instructions. So if a def has no use in an mbb it is a "live-out" reg, and if there is a use with no def, it would become "live-in"<font face="Calibri,Helvetica,sans-serif" class="">. This
</font>is then a kind of local live-through concept, in co<font face="Calibri,Helvetica,sans-serif" class="">ntrast to a true live-through analysis<font face="Calibri,Helvetica,sans-serif" class=""> which would be aware o<font face="Calibri,Helvetica,sans-serif" class="">f
registers not used/defed in the region as well.</font></font></font><br class="">
</font></font></font></font></font></font></font></div>
</div>
</blockquote>
<div>Yes, the first pass during DAG construction determines the maximum register pressure and the list of live-out values. I think the code consults liveintervals to differentiate dead-defs from true live-outs or detect killing uses that aren't marked as such.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""><font face="Calibri,Helvetica,sans-serif" class=""></font></font></font></font></font></font></font>2.
We should ideally have an analysis <font face="Calibri,Helvetica,sans-serif" class="">
<font face="Calibri,Helvetica,sans-serif" class="">of global li<font face="Calibri,Helvetica,sans-serif" class="">veness
</font></font>so that t<font face="Calibri,Helvetica,sans-serif" class="">h<font face="Calibri,Helvetica,sans-serif" class="">e regpressure trackers could be properly initialized, but this is currently missing. Of course, one might hope that<font face="Calibri,Helvetica,sans-serif" class="">
it wouldn't be too hard to extend LiveInter<font face="Calibri,Helvetica,sans-serif" class="">vals to also provide this information...</font></font> It
<font face="Calibri,Helvetica,sans-serif" class="">would be interesting to merely try this and see how valuable it
<font face="Calibri,Helvetica,sans-serif" class="">wou<font face="Calibri,Helvetica,sans-serif" class="">ld</font></font>
<font face="Calibri,Helvetica,sans-serif" class="">be...<br class="">
</font></font></font></font></font></div>
</div>
</blockquote>
<div>Again a global minimum amount of spills/reloads is not necessarily better than a good schedule inside a loop with extra spills/reloads outside the loop. But would certainly be worth exploring.</div>
<div>Writing a naive function to determine all live values at a certain point is easy: Just iterate over all vregs and check for each whether the point in question is covered by a live interval. However to do this efficiently so it can be used in production
would probably require a concept such as keeping live-in/live-out lists on basic blocks up-to-date. To put that into production we should first demonstrate that it is worth it.</div>
<div><br class="">
</div>
<div>- Matthias</div>
<div><br class="">
</div>
</div>
</div>
</body>
</html>