<div dir="ltr">I was trying to add a IR level sinking pass for this. But for my simple testcase:<div><br></div><div><div>; int g;</div><div>; int foo(int p, int x) {</div><div>; for (int i = 0; i != x; i++)</div><div>; if (__builtin_expect(i == p, 0)) {</div><div>; x += g; x *= g;</div><div>; }</div><div>; return x;</div><div>; }</div></div><div><br></div><div>After LICM hoists the load of global variable g, "x+=g;x*=g" is also hoisted to form a select statement in header:</div><div><br></div><div><div>.<a href="http://lr.ph">lr.ph</a>: ; preds = %.<a href="http://lr.ph">lr.ph</a>, %.lr.ph.preheader</div><div> %.03 = phi i32 [ %8, %.<a href="http://lr.ph">lr.ph</a> ], [ 0, %.lr.ph.preheader ]</div><div> %.012 = phi i32 [ %.1, %.<a href="http://lr.ph">lr.ph</a> ], [ %1, %.lr.ph.preheader ]</div><div> %5 = icmp eq i32 %.03, %0</div><div><b> %6 = add nsw i32 %4, %.012</b></div><div><b> %7 = mul nsw i32 %6, %4</b></div><div><b> %.1 = select i1 %5, i32 %7, i32 %.012, !prof !1</b></div><div> %8 = add nuw nsw i32 %.03, 1</div><div> %9 = icmp eq i32 %8, %.1</div><div> br i1 %9, label %._crit_edge, label %.<a href="http://lr.ph">lr.ph</a></div></div><div><br></div><div>Later in CGP, it's expanded to:</div><div><br></div><div><div>.<a href="http://lr.ph">lr.ph</a>: ; preds = %select.end, %.lr.ph.preheader</div><div> %.03 = phi i32 [ %8, %select.end ], [ 0, %.lr.ph.preheader ]</div><div> %.012 = phi i32 [ %.1, %select.end ], [ %1, %.lr.ph.preheader ]</div><div> %5 = icmp eq i32 %0, %.03</div><div><b> %6 = add nsw i32 %.012, %4</b></div><div><b> %7 = mul nsw i32 %6, %4</b></div><div> br i1 %5, label %select.end, label %select.false</div></div><div><div><br></div><div>select.false: ; preds = %.<a href="http://lr.ph">lr.ph</a></div><div> br label %select.end</div><div><br></div><div>select.end: ; preds = %.<a href="http://lr.ph">lr.ph</a>, %select.false</div><div><b> %.1 = phi i32 [ %7, %.<a href="http://lr.ph">lr.ph</a> ], [ %.012, %select.false ]</b></div><div> %8 = add nuw nsw i32 %.03, 1</div><div> %9 = icmp eq i32 %8, %.1</div><div> br i1 %9, label %._crit_edge, label %.<a href="http://lr.ph">lr.ph</a></div></div><div><br></div><div>So if we want to sink the load from preheader to the branch, we need to:</div><div>1. create a select.true basic block</div><div>2. sink %6 and %7 to that basic block</div><div>3. sink load instruction to that basic block</div><div><br></div><div>This is doable, but seems doing a lot of redundant work of MachineSinking pass. And it could possibly affect optimizations between CGP and MachineSinking.</div><div><br></div><div>Alternatively, we can also do the sinking once in MachineSinking, and make it handle more general cases like this one. But from the MachineSink.cpp, I saw the comment:</div><div><br></div><div><div>// This pass is not intended to be a replacement or a complete alternative</div><div>// for an LLVM-IR-level sinking pass. It is only designed to sink simple</div><div>// constructs that are not exposed before lowering and instruction selection.</div></div><div><br></div><div>Which confuses me: looks like we don't even have a IR-level sinking pass. Why would we want IR-level sinking pass instead of one single sinking pass at machine level?</div><div><br></div><div>Thanks,</div><div>Dehao</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 17, 2016 at 11:03 AM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Thu, May 12, 2016 at 6:37 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><br><hr><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><span><b>From: </b>"Xinliang David Li" <<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>><br></span><b>To: </b>"Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br><b>Cc: </b>"Dehao Chen" <<a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a>>, <a href="mailto:reviews%2BD19950%2Bpublic%2B38ba22078c2035b8@reviews.llvm.org" target="_blank">reviews+D19950+public+38ba22078c2035b8@reviews.llvm.org</a>, "David Majnemer" <<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>>, "Junbum Lim" <<a href="mailto:junbuml@codeaurora.org" target="_blank">junbuml@codeaurora.org</a>>, <a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>, "llvm-commits" <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>, "amara emerson" <<a href="mailto:amara.emerson@arm.com" target="_blank">amara.emerson@arm.com</a>><br><b>Sent: </b>Wednesday, May 11, 2016 12:58:24 PM<span><br><b>Subject: </b>Re: [PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.<br><br></span><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Tue, May 10, 2016 at 3:14 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><hr><span><br>
> From: "Dehao Chen" <<a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a>><br>
> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>
> Cc: "Xinliang David Li" <<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>>, <a href="mailto:reviews%2BD19950%2Bpublic%2B38ba22078c2035b8@reviews.llvm.org" target="_blank">reviews+D19950+public+38ba22078c2035b8@reviews.llvm.org</a>, "David<br>
> Majnemer" <<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>>, "Junbum Lim" <<a href="mailto:junbuml@codeaurora.org" target="_blank">junbuml@codeaurora.org</a>>, <a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>, "llvm-commits"<br>
> <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>, "amara emerson" <<a href="mailto:amara.emerson@arm.com" target="_blank">amara.emerson@arm.com</a>><br>
</span></span><span><span>> Sent: Tuesday, May 10, 2016 4:10:49 PM<br>
> Subject: Re: [PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.<br>
><br>
><br>
</span><span>> Thanks for the comment. I spent quite a while to think, but still<br>
> cannot think of an optimization that could be unblocked by<br>
> speculatively hoisting an loop invariant from an unlikely executed<br>
> path. Can you give some hint (or an example) on what type of<br>
> optimization can benefit from this case?<br>
<br>
</span>I'm specifically thinking about this case (although I suspect there are others):<br>
<br>
for (...) {<br>
if (...) {<br>
hoistable<br>
cold_stuff<br>
}<br>
}<br>
<br>
for (...) {<br>
if (...) {<br>
hoistable<br>
hot_stuff<br>
}<br>
}<br>
<br>
I expect that 'hoistable' will be hoisted by LICM out of both loops, and then CSE'd by GVN. </span></blockquote><span><div><br></div><div><br></div><div>I think this case can be/should be handled by more general profile driver speculative PRE. The above case may not be profitable even after GVN CSEed two expressions.</div></span></div></div></div></blockquote>Why not? It is performance neutral and a code-size improvement.</div></div></blockquote><div><br></div><div><br></div></div></div><div>It may not be performance neutral for the above case -- it depends on how cold the if-block is.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><span><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>. </div></div></div></div></blockquote></span>Loop unswitching could certainly be smarter about how it estimates the cost of the loop body, but if there are nested conditions, I don't think that always helps. For example, if we have this:<br><br>for (...) {<br> if (c1) {<br> ...<br> if (c2) {<br> hoistable<br> cold_stuff<br> }<br> ...<br> }<br><br> ...<br>}<br><br>then the code-size cost of unswitching on condition c1 is rightly impacted by whether the hoistable code is still in the loop body.<span><br><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div></div></div></div></div></blockquote></span></div></div></blockquote><div><br></div></span><div>Ok.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><span><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Another potential issue is that the hoistable code might be cold, and relatively cheap to hoist, but expensive to vectorize. As a result, failing to hoist the code might block otherwise-profitable vectorization. Which reminds me, we need to fix the vectorizer's if-conversion heuristic to use profiling information too ;)<br></blockquote><div><br></div><div>SLP vectorize? Any example like this? Can vectorizor be enhanced so that it can be done in absence of the hoisting?</div></div></div></div></blockquote></span>I'm referring to the loop vectorizer. I'm not sure it is an issue of potential enhancement, the issue seems fundamental. The "cost" of if-conversion is proportional to the increased execution cost, and that's proportional to the number of instructions in the conditionally-executed block. Even more so if the hoistable operations would need to be scalarized under the targeted ISA. The more instructions need to be if-converted, and the colder the block, the lower the profitability of vectorizing (because, once vectorized, the instructions now execute on every iteration).<br><br></div></div></blockquote><div><br></div></span><div>This is a reasonable concern.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000">In any case, for the same reason we don't want to throttle LICM based on register pressure, I don't think we want to do it here either. There is value is having a canonical form where things execute as soon as possible, because it means that we can use dominance to determine the frontier of legal placement. Otherwise we need to teach a myriad of other passes how to hoist in combination with their core functionality. Maybe we'll come up with some fundamental reason why sinking late is insufficient, but I don't think I've seen one yet.<br></div></div></blockquote><div><br></div><div><br></div></span><div>Ok.</div><div><div class="h5"><div><br></div><div>thanks,</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:#000000"><br>Thanks again,<br>Hal<div><div><br><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div>thanks,</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Thanks again,<br>
Hal<br>
<span><br>
><br>
> Thanks,<br>
> Dehao<br>
><br>
><br>
> On Tue, May 10, 2016 at 1:58 PM, Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
</span><span>> From: "Xinliang David Li" < <a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a> ><br>
> To: "Dehao Chen" < <a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a> ><br>
</span>> Cc: <a href="mailto:reviews%2BD19950%2Bpublic%2B38ba22078c2035b8@reviews.llvm.org" target="_blank">reviews+D19950+public+38ba22078c2035b8@reviews.llvm.org</a> , "David<br>
<span>> Majnemer" < <a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a> >, "Hal Finkel" <<br>
> <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> >, "Junbum Lim" < <a href="mailto:junbuml@codeaurora.org" target="_blank">junbuml@codeaurora.org</a> >,<br>
</span>> <a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a> , "llvm-commits" <<br>
<span>> <a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a> >, "amara emerson" <<br>
> <a href="mailto:amara.emerson@arm.com" target="_blank">amara.emerson@arm.com</a> ><br>
> Sent: Tuesday, May 10, 2016 3:15:24 PM<br>
> Subject: Re: [PATCH] D19950: Use frequency info to guide Loop<br>
> Invariant Code Motion.<br>
><br>
><br>
><br>
><br>
><br>
><br>
> On Tue, May 10, 2016 at 1:03 PM, Dehao Chen < <a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a> ><br>
> wrote:<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <<br>
> <a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a> > wrote:<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> On Tue, May 10, 2016 at 11:01 AM, Dehao Chen < <a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a> ><br>
> wrote:<br>
><br>
><br>
> danielcdh added a comment.<br>
><br>
> In <a href="http://reviews.llvm.org/D19950#425287" rel="noreferrer" target="_blank">http://reviews.llvm.org/D19950#425287</a> , @hfinkel wrote:<br>
><br>
> > In <a href="http://reviews.llvm.org/D19950#425286" rel="noreferrer" target="_blank">http://reviews.llvm.org/D19950#425286</a> , @hfinkel wrote:<br>
> ><br>
</span><div><div>> > > In <a href="http://reviews.llvm.org/D19950#425285" rel="noreferrer" target="_blank">http://reviews.llvm.org/D19950#425285</a> , @davidxl wrote:<br>
> > ><br>
> > > > Static prediction has been conservative in estimating loop trip<br>
> > > > count -- it produces something like 30ish iterations. If the a<br>
> > > > very hot loop has a big if-then-else (or switch), it is very<br>
> > > > likely to mark many bbs' to be colder than the loop header.<br>
> > > > Turning on this for static prediction really depends on the<br>
> > > > false rate. It seems to be this can get wrong pretty easily<br>
> > > > for very hot loops (which is also the most important thing to<br>
> > > > optimize for).<br>
> > ><br>
> > ><br>
> > > This is a good point. There's no universal conservative choice<br>
> > > (assuming a small trip count is conservative in some cases, and<br>
> > > assuming a large trip count is conservative in other cases).<br>
> ><br>
> ><br>
> > Would it be better (and practical) if there were some way for the<br>
> > BFI client to specify which kind of 'conservative' is desired?<br>
> ><br>
> > Also, why are we doing this instead of sinking later (in CGP or<br>
> > similar)? LICM can expose optimization opportunities, plus<br>
> > represents a code pattern the user might input manually. Sinking<br>
> > later seems more robust.<br>
><br>
><br>
> I looked at CGP pass, looks like it's handling the sinking<br>
> case-by-case (e.g. there is separate routine to handle sinking of<br>
> load, gep, etc. I'm afraid this would miss opportunities.<br>
> Additionally, the file-level comment of CGP pass says "This works<br>
> around limitations in it's basic-block-at-a-time approach. It should<br>
> eventually be removed."<br>
> Yes, but it will be "removed" when the entire subsystem is replaced<br>
> by GlobalISel, and we'll certainly need to make GlobalISel<br>
> profiling-data aware, so I expect this is the right path forward<br>
> regardless. I agree, however, that we want a general sinking here<br>
> based on profiling data, not just the specific existing heuristics<br>
> for loads, GEPs, etc.<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> Perhaps you can do profile driven sinking CGP separately to handle<br>
> manually hoisted code situation mentioned by Hal.<br>
><br>
><br>
> Do you mean we still use frequency to decide whether to hoist code in<br>
> LICM, additionally use frequency info to check if we want to sink<br>
> instructions in CGP?<br>
><br>
><br>
><br>
><br>
> yes -- that is the suggestion. I'd prefer that we try to sink late<br>
> first, and only if there are use cases that we can't handle this<br>
> way, we consider throttling hoisting early. If we come across such<br>
> use cases, I'd like to understand them better. Hoisting can expose<br>
> other optimization opportunities, and you lose those opportunities<br>
> if you don't hoist in the first place.<br>
><br>
> -Hal<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> David<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> Dehao<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> David<br>
><br>
><br>
><br>
> I'm not quite clear why it helps to move code out of loop early and<br>
> later sink it inside. Could you give an example or some more<br>
> context?<br>
><br>
> Thanks,<br>
> Dehao<br>
><br>
><br>
> <a href="http://reviews.llvm.org/D19950" rel="noreferrer" target="_blank">http://reviews.llvm.org/D19950</a><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> --<br>
><br>
> Hal Finkel<br>
> Assistant Computational Scientist<br>
> Leadership Computing Facility<br>
> Argonne National Laboratory<br>
><br>
><br>
<br>
--<br>
Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</div></div></blockquote></div><br></div></div>
</blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></div></div></div></blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>