<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Aug 6, 2013, at 11:05 AM, Chad Rosier <<a href="mailto:chad.rosier@gmail.com">chad.rosier@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Thanks, Mark.  I will give the paper a look.<br><br><div class="gmail_quote">On Tue, Aug 6, 2013 at 1:42 PM, Mark Lacey <span dir="ltr"><<a href="mailto:mark.lacey@apple.com" target="_blank">mark.lacey@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">Hi Chad,<br>

<div class="im"><br>

On Aug 6, 2013, at 7:46 AM, Chad Rosier <<a href="mailto:chad.rosier@gmail.com">chad.rosier@gmail.com</a>> wrote:<br>

<br>

</div><div><div class="h5">> All,<br>

> I have some code that looks like the following:<br>

><br>

> {<br>

>   double a, b, c;<br>

>   for (...) {<br>

>     ...<br>

>     a = lots of FP math;<br>

>     b = lots of FP math;<br>

>     c = lots of FP math;<br>

>     if (cond) {<br>

>       a = 0.0;<br>

>       b = 0.1;<br>

>       c = 0.2;<br>

>     }<br>

>    ...<br>

>   }<br>

> }<br>

><br>

> Could we not convert the hammock into a diamond and move the initial computation of a, b, and c into the else block.  Something like this:<br>

><br>

> {<br>

>   double a, b, c;<br>

>   for (...) {<br>

>     ...<br>

>     if (cond) {<br>

>       a = 0.0;<br>

>       b = 0.1;<br>

>       c = 0.2;<br>

>     } else {<br>

>       a = lots of FP math;<br>

>       b = lots of FP math;<br>

>       c = lots of FP math;<br>

>     }<br>

>    ...<br>

>   }<br>

> }<br>

><br>

> Does a similar optimization exist?  If not, would the SimplifyCFG pass be an appropriate home for such an optimization?<br>

<br>

</div></div>I am not sure if something similar exists in LLVM today. The optimization you are looking for is called partial dead code elimination (PDE).<br>

<br>

I do not think SimplifyCFG would be a great place to put this as that is primarily about cleaning up control flow. I think it deserves its own pass.<br>

<br>

The most general formulation would result in sinking instructions, including loads (and possibly some other side-effecting instructions when safe to do so), along with potentially duplicating code when the results are dead on some paths below the definition, but not all.<br>


<br>

It seems like you could do a simpler formulation that just splits critical edges, and then moves instructions that have no side effects and have only a single use, to just before to the use (in this case it would be just prior to the phi in the new block created when splitting edges). You would also need to be careful not to sink code into loops - you would instead want to sink it below code that guards the loop, but not actually into the loop.<br>


<br>

Here is one paper I am aware of that appears to have an SSA-based formulation of PDE. I have not read this as it is behind a paywall:<br>

  <a href="http://rd.springer.com/chapter/10.1007/3-540-48294-6_12" target="_blank">http://rd.springer.com/chapter/10.1007%2F3-540-48294-6_12</a><br></blockquote></div></blockquote><div><br></div><div>I haven’t read that paper because I’m too cheap to pay for it. However, there is a very simple GCM algorithm that handles this case. All you need is a DomTree, no fancy value graph. It makes two passes over the SSA data-flow graph to find valid blocks.</div><div><br></div><a href="http://dl.acm.org/citation.cfm?id=207154">http://dl.acm.org/citation.cfm?id=207154</a></div><div><br></div><div>The difficult compiler-specific part is inferring control and memory dependencies. Those are not explicit in LLVM SSA. Our GVN pass has the infrastructure to deal with that.</div><div><br></div><div>Next you have to consider profitability. You could aggressively sink everything at IR level within a loop, but you would need to split a lot of critical edges. There probably needs to be a cost heuristic to avoid splitting to sink cheap operations. That really complicates the problem though.</div><div><br></div><div>For even better heuristiscs… you could potentially use BlockFrequency to ignore cold branches. If you care about latency hiding/critical path and register pressure, then it needs to be an MI-level pass. It would be nice to have such an MI pass to compensate for lack of a global scheduler.</div><div><br></div><div>-Andy</div><div><br><blockquote type="cite" dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto;">

<span class="HOEnZb"><font color="#888888"><br>

Mark<br>

<br>

</font></span></blockquote></div><br>

_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></body></html>