[LLVMdev] Loop unrolling analysis next steps

Thu May 28 21:43:53 PDT 2015

On May 26, 2015, at 10:47 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> So, I've spent some considerable time cleaning up and looking at the loop unrolling analysis that you started Michael, but I don't have time to look at it further so I'm writing down the current state and what I think should happen next.
> 
> Currently, in tree, we have a reasonably clean framework now that visits each instruction for each iteration and tries to simplify it. Currently, the *only* meaningful simplification it does is that of folding loaded constants. There is a patch out by Michael that tries to also do DCE, but I think it is fundamentally the wrong approach.

You’re right that it’s the wrong approach, but that doesn’t mean it is the wrong way to go.

The right approach for this (and many other transformations) is to actually do the transformation, do a suite of simplifications, then see if it was worth it.  There are many compilers that take that approach (because that is the only and best way to get an accurate cost model for a transformation like this) but it is extremely expensive in compile time.

> The primary problem is that the current system is trying to use an optimization technique that relies on *forward simulation* of each iteration, but then applying backwards analyses to fold specific instructions through complex transforms. This fundamentally doesn't scale and I suspect is the reason for the caching layers that were added on top of SCEV. I suspect those caching layers are more than just a minor code  smell -- they point to the flawed design.

As the guy who added the caching scheme to SCEV, I can assure you that it had nothing to do with loop unrolling.  I’m not arguing that caching ended up being a great thing in the end, just that it doesn’t have anything to do with unrolling.

> Anyways, I'm not a SCEV expert, and so I'm not the right person to re-cast the current visitor to directly build on top of a SCEV model and do substitution of the current iteration. But I think that is the design I'd like to see before any other work goes into the code here. Happy to review when there is a patch to this effect.

The problem here is that SCEV isn’t enough.  SCEV doesn’t handle things like DCE, doesn’t handle store->load forwarding, and doesn’t handle calls.  SCEV is only really appropriate for simple chains of integer operations, and real loops get more complicated than that.  For example, one of the major benefits of unrolling:

for (i) {
  a[i] = a[i-1]+1
}

is that you get trivial store->load forwarding without depending on our crazy memdep “PRE” stuff.  Unless we add something like MemorySSA and teach SCEV about it, I don’t see how this can be painted as a SCEV problem.

-Chris