[LLVMdev] Loop unrolling analysis next steps

Thu May 28 19:25:34 PDT 2015

Given how far you pushed this analysis pass I’m surprised by some of your concerns. Please see below.
> On May 26, 2015, at 10:47 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> 
> So, I've spent some considerable time cleaning up and looking at the loop unrolling analysis that you started Michael, but I don't have time to look at it further so I'm writing down the current state and what I think should happen next.
> 
> Currently, in tree, we have a reasonably clean framework now that visits each instruction for each iteration and tries to simplify it. Currently, the *only* meaningful simplification it does is that of folding loaded constants. There is a patch out by Michael that tries to also do DCE, but I think it is fundamentally the wrong approach.
DCE should not be thought as part of the framework. And it has to be a backward pass to account for (iteration-specific) dead instructions. Take for example: a = <some>; b = a * c[i]; When c[i] turns out to be zero zero, a becomes dead (assuming only one use of a). 
> 
> The primary problem is that the current system is trying to use an optimization technique that relies on *forward simulation* of each iteration, but then applying backwards analyses to fold specific instructions through complex transforms. This fundamentally doesn't scale and I suspect is the reason for the caching layers that were added on top of SCEV. I suspect those caching layers are more than just a minor code  smell -- they point to the flawed design.

I don’t see the scalability issue with DCE. 

Why can’t the current design integrate GEP naturally w/o the need for the SCEV cache? I think the code could do away with the extra cache. In that case the cache is a current implementation artifact rather than a flawed design signal.
> 
> I think the correct design needs to build a reasonably fast way of taking an instruction for which we have a SCEV model and constant folding it for a specific iteration of the loop. Then the visitor needs to be cast in terms of this: it should first try basic constant folding of operands, and then fall back on querying for a SCEV model and constant folding that if it has one for a particular iteration. The result should fold the GEPs that are currently analyzed ahead of time into constantexpr's. Finally, the load folding needs to be expressed in terms of folding the load through a constantexpr GEP into a constant array. If the existing constant folding isn't enough for this, I'd be rather surprised.
> 
> Without restructuring in this way, we'll always be patching things up. For example, I added logic to skip dead control flows when analyzing the loop for unrolling and it is completely ineffective because we don't analyze 'icmp' instructions against the induction variable. For cases where the dead control flow is predicated on the GEPs themselves (for example, any STL-style algorithm using pointers as iterators!) we can't get an accurate analysis without applying the GEP simplification independently of the load simplification.

Could you share your test case? I would be surprised if the current design couldn’t handle it by adding a visitICmp, but, yes, it requires additional logic to handle dead conditional control flow.
> 
> 
> Anyways, I'm not a SCEV expert, and so I'm not the right person to re-cast the current visitor to directly build on top of a SCEV model and do substitution of the current iteration. But I think that is the design I'd like to see before any other work goes into the code here. Happy to review when there is a patch to this effect.
> 
> -Chandler
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev