<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 8, 2015 at 3:33 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>

On 01/07/2015 05:33 PM, Chandler Carruth wrote:<br>

</span><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

How does this compare with classical approaches of loop peeling, partitioning, fission, or whatever you might call it?<br>

</blockquote></span>

I'm still developing a good sense for this, but let me lay out some observations.  Fair warning, this is going to be long and rambling.<br>

<br>

Let's start with a toy example:<br>

while(c) {<br>

  x = this->x;<br>

  y = this->y;<br>

  if (x == y) {<br>

    rare();<br>

  }<br>

}<br>

<br></blockquote><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

If we could tell x and y were loop invariant, we could unswitch this loop.  However, the rare call clobbers our view of memory, so LICM fails, and thus unswitch fails.<br>

<br></blockquote><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

We'd like to apply PRE here and push the reload into the loop preheader and the rare case.  This currently fails for two reasons: 1) We don't allow code duplication during PRE, </blockquote><div><br></div><div>?????</div><div>If we don't, we aren't doing real PRE. So i'm not entirely sure what you mean here.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">and 2) the canonical loop-simplify form of having a single latch means that PRE doesn't actually see the rare block at all, it sees the preheader and the join point after the if block.</blockquote><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

I think both of the previous are fixable:<br></blockquote><div><br></div><div><br></div><div>GCC's PRE already does the above.</div><div>It doesn't do profile guided duplication.</div><div>We aren't doing anything special with these blocks.</div><div><br>here is the code I used to test:<br><br></div><div>struct a{</div><div><span class="" style="white-space:pre"> </span>int x;</div><div><span class="" style="white-space:pre">     </span>int y;</div><div>};</div><div>extern void rare();</div><div>int mai(int c, struct a *this)</div><div>{</div><div><span class="" style="white-space:pre">     </span>int d = 0;</div><div><span class="" style="white-space:pre"> </span>while(c) {</div><div><span class="" style="white-space:pre">         </span>int x = this->x;</div><div><span class="" style="white-space:pre">                </span>int y = this->y;</div><div><span class="" style="white-space:pre">                </span>d += x + y;</div><div><span class="" style="white-space:pre">                </span>if (x == y) {</div><div><span class="" style="white-space:pre">                      </span>rare();</div><div><span class="" style="white-space:pre">            </span>}</div><div><span class="" style="white-space:pre">  </span>}</div><div><span class="" style="white-space:pre">  </span>return d;</div><div>} </div><div><br></div><div><br></div><div>It will do exactly what you expect, it is transformed into:<br></div><div><br></div><div><div>struct a{</div><div><span class="" style="white-space:pre">      </span>int x;</div><div><span class="" style="white-space:pre">     </span>int y;</div><div>};</div><div>extern void rare();</div><div>int mai(int c, struct a *this)</div><div>{</div><div><span class="" style="white-space:pre">     </span>int d = 0;</div><div>        int pretemp1 = this->x</div><div>        int pretemp2 = this->y</div><div>     </div><div><span class="" style="white-space:pre">  </span>while(c) {</div><div>                pretemp1phi = phi(rare block pretemp1, preheader pretemp1).</div><div>                pretemp2phi = phi(rare block pretemp2, preheader pretemp2)</div><div><br></div><div><span class="" style="white-space:pre">             </span>d += pretemp1phi + pretemp2phi</div><div><span class="" style="white-space:pre">             </span>if (x == y) {</div><div><span class="" style="white-space:pre">                      </span>rare();</div><div>                        pretemp1 = this->x;</div><div>                        pretemp2 = this->y;</div><div><br></div><div><span class="" style="white-space:pre">         </span>}</div><div><span class="" style="white-space:pre">  </span>}</div><div><span class="" style="white-space:pre">  </span>return d;</div><div>} </div></div><div>I don't see why profile guided duplication is necessary here. This is a basic load PRE case.  It is handled by the first version of GVN-based Load PRE I wrote for GCC.  It is always a win.</div><div><br></div><div>Looking at what LLVM does, the failing on the PRE side is that our PRE/GVN models are not strong enough to handle this. I'm not sure at all why we think anything else is necessary.  It's certainly not requiring special code duplication heuristics, etc.</div><div><br></div><div>So either you are thinking of a case that is different from the above, or I am seriously confused :)</div></div></div></div>