<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jan 22, 2014 at 12:09 PM, Andrew Trick <span dir="ltr"><<a href="mailto:atrick@apple.com" target="_blank">atrick@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><div><div class="h5"><div>On Jan 22, 2014, at 4:01 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:</div>

<br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 22, 2014 at 3:39 AM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I have a patch that does #1 already, but wanted to check that you're OK weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a single commit, or teach the LoopVectorizer and LSR to preserve LoopSimplify... Yuck.</blockquote>


</div><br>This patch appears to cause slight changes in three test cases:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">    LLVM :: Transforms/IndVarSimplify/lftr-reuse.ll</div>


<div class="gmail_extra">    LLVM :: Transforms/LoopSimplify/ashr-crash.ll</div><div class="gmail_extra">    LLVM :: Transforms/LoopStrengthReduce/2011-12-19-PostincQuadratic.ll</div><div class="gmail_extra"><br></div><div class="gmail_extra">


Looking at lftr-reuse.ll, we successfully hoist the 'icmp slt' into the 'outer:' block as the comment says would be nice (because the outer loop is simplified now, the test is checking for unsimplified). The LSR failure is just that the loop basic blocks have different names (loopexit instead of preheader).</div>


<div class="gmail_extra"><br></div><div class="gmail_extra">The ashr-crash.ll case is minutely interesting -- we fail to hoist the comparison that the test wants hoisted into the entry block. My suspicion is that getting this to hoist with the heavily reduced pipeline used is problematic, as the test seems more geared to tickle SCEV bugs than test important optimization invariants.</div>

</div></div></blockquote><div><br></div></div></div><div>This looks like an example of the kind of order-of-loop-transform problems that I spent a staggering amount of time debugging. So the underlying problem should go away.</div>

<div><br></div><div>That said, it also looks like an example where we benefit from applying multiple passes to the inner loop before optimizing the outer loop. If you want to file a PR and disable it for now that’s cool.</div>

</div></div></blockquote><div><br></div><div>Turns out it was just a bug. =]</div><div><br></div><div>See r199884 for gory details. This works, is landed, and I'm working on LCSSA now.</div><div><br></div><div>The key bit is that we didn't really need to do order-of-loop-transform stuff, we just really need to preserve LoopSimplify form because if we don't everything goes to crap. Notably, there are two loops that get completely unrolled in this test case. After the inner one is unrolled we will do a *significantly* poorer job of unrolling the outer loop unless we re-simplify it first. That's not surprising really as this is the canonical form the unroller expects.</div>

<div><br></div><div>It's looking like the loop *canonicalization* stuff really doesn't need the inside-out pipelining, what it needs is for each loop pass to ensure that the canonical form is preserved at each nesting of the loop.</div>

<div><br></div><div>But I still think that the loop *simplification* stuff really *does* need the pipelining. As you pointed out originally, full-unroll, rotate, and LICM can combine to dramatically simplify the structure iteratively at each level. I'm still pondering what the best long-term structure for managing this type of tightly coupled optimization is, but it doesn't look like it will get in the way of easing access to function analysis passes.</div>

</div></div></div>