[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?

Thu Jan 23 03:34:47 PST 2014

On Wed, Jan 22, 2014 at 12:09 PM, Andrew Trick <atrick at apple.com> wrote:

>
> On Jan 22, 2014, at 4:01 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>
> On Wed, Jan 22, 2014 at 3:39 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> I have a patch that does #1 already, but wanted to check that you're OK
>> weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a
>> single commit, or teach the LoopVectorizer and LSR to preserve
>> LoopSimplify... Yuck.
>
>
> This patch appears to cause slight changes in three test cases:
>
>     LLVM :: Transforms/IndVarSimplify/lftr-reuse.ll
>     LLVM :: Transforms/LoopSimplify/ashr-crash.ll
>     LLVM :: Transforms/LoopStrengthReduce/2011-12-19-PostincQuadratic.ll
>
> Looking at lftr-reuse.ll, we successfully hoist the 'icmp slt' into the
> 'outer:' block as the comment says would be nice (the outer loop is now
> simplified, whereas the test checks for the unsimplified form). The LSR
> failure is just that the loop basic blocks have different names (loopexit
> instead of preheader).
>
> The ashr-crash.ll case is minutely interesting -- we fail to hoist the
> comparison that the test wants hoisted into the entry block. My suspicion
> is that getting this to hoist with the heavily reduced pipeline used is
> problematic, as the test seems more geared to tickle SCEV bugs than test
> important optimization invariants.
>
>
> This looks like an example of the kind of order-of-loop-transform problems
> that I spent a staggering amount of time debugging. So the underlying
> problem should go away.
>
> That said, it also looks like an example where we benefit from applying
> multiple passes to the inner loop before optimizing the outer loop. If you
> want to file a PR and disable it for now that’s cool.
>

Turns out it was just a bug. =]

See r199884 for the gory details. The fix works and has landed, and I'm
working on LCSSA now.

The key bit is that we didn't really need to do order-of-loop-transform
stuff; we just need to preserve LoopSimplify form, because if we don't,
everything goes to crap. Notably, there are two loops that get completely
unrolled in this test case. After the inner one is unrolled, we will do a
*significantly* poorer job of unrolling the outer loop unless we
re-simplify it first. That's not really surprising, as this is the
canonical form the unroller expects.

It's looking like the loop *canonicalization* stuff really doesn't need the
inside-out pipelining; what it needs is for each loop pass to ensure that
the canonical form is preserved at each level of the loop nest.
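
As a rough illustration of that point, here is a toy Python model, not LLVM
code. ToyLoop, toy_full_unroll, toy_simplify and run_preserving_form are all
hypothetical names, and the `simplified` flag is only a stand-in for
LoopSimplify form. The sketch assumes that unrolling an inner loop can break
the parent's canonical form, and shows a wrapper that restores the form at
every enclosing level before anything else runs:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyLoop:
    """Toy loop-nest node; `simplified` stands in for LoopSimplify form."""
    name: str
    parent: Optional["ToyLoop"] = None
    simplified: bool = True


def toy_full_unroll(loop: ToyLoop) -> None:
    # Assumption for illustration: unrolling `loop` rewrites blocks in the
    # enclosing loop, so the parent is no longer in canonical form.
    if loop.parent is not None:
        loop.parent.simplified = False


def toy_simplify(loop: ToyLoop) -> None:
    # Stand-in for re-running loop simplification on a single loop.
    loop.simplified = True


def run_preserving_form(pass_fn: Callable[[ToyLoop], None],
                        loop: ToyLoop) -> None:
    """Run one pass, then restore canonical form at each nesting level."""
    pass_fn(loop)
    node: Optional[ToyLoop] = loop
    while node is not None:
        if not node.simplified:
            toy_simplify(node)
        node = node.parent
```

Running toy_full_unroll directly on an inner loop leaves its parent
unsimplified; running it through run_preserving_form leaves every enclosing
loop back in canonical form, which is what the outer loop's unroller wants
to see.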

But I still think that the loop *simplification* stuff really *does* need
the pipelining. As you pointed out originally, full-unroll, rotate, and
LICM can combine to dramatically simplify the structure iteratively at each
level. I'm still pondering what the best long-term structure for managing
this type of tightly coupled optimization is, but it doesn't look like it
will get in the way of easing access to function analysis passes.
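
To make the difference between the two scheduling styles concrete, here is
another toy Python sketch, again not LLVM code. Loop, post_order,
pipelined_schedule and pass_at_a_time_schedule are made-up names, and passes
are represented by plain strings. It only lists the order in which
(loop, pass) pairs would run under each scheme:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Loop:
    """Toy loop-nest node with its directly nested subloops."""
    name: str
    subloops: List["Loop"] = field(default_factory=list)


def post_order(loop: Loop) -> Iterator[Loop]:
    # Inside-out traversal: visit every subloop before the loop itself.
    for sub in loop.subloops:
        yield from post_order(sub)
    yield loop


def pipelined_schedule(root: Loop, passes: List[str]) -> List[Tuple[str, str]]:
    # Run the whole pass pipeline on a loop before moving outward.
    return [(loop.name, p) for loop in post_order(root) for p in passes]


def pass_at_a_time_schedule(root: Loop,
                            passes: List[str]) -> List[Tuple[str, str]]:
    # Run one pass over every loop in the nest, then the next pass.
    return [(loop.name, p) for p in passes for loop in post_order(root)]
```

For a two-level nest and the passes full-unroll, rotate, and LICM, the
pipelined schedule finishes all three passes on the inner loop before
touching the outer one, so the outer loop sees the fully simplified inner
structure. The pass-at-a-time schedule instead unrolls the outer loop right
after unrolling the inner one, before rotate or LICM have run on either.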