[LLVMdev] Loop unrolling analysis next steps

Chandler Carruth chandlerc at gmail.com
Tue May 26 10:47:40 PDT 2015


So, I've spent some considerable time cleaning up and looking at the loop
unrolling analysis that you started Michael, but I don't have time to look
at it further so I'm writing down the current state and what I think should
happen next.

Currently, in tree, we have a reasonably clean framework now that visits
each instruction for each iteration and tries to simplify it. Currently,
the *only* meaningful simplification it does is that of folding loaded
constants. There is a patch out by Michael that tries to also do DCE, but I
think it is fundamentally the wrong approach.

The primary problem is that the current system is trying to use an
optimization technique that relies on *forward simulation* of each
iteration, but then applying backwards analyses to fold specific
instructions through complex transforms. This fundamentally doesn't scale
and I suspect is the reason for the caching layers that were added on top
of SCEV. I suspect those caching layers are more than just a minor code
 smell -- they point to the flawed design.

I think the correct design needs to build a reasonably fast way of taking
an instruction for which we have a SCEV model and constant folding it for a
specific iteration of the loop. Then the visitor needs to be cast in terms
of this: it should first try basic constant folding of operands, and then
fall back on querying for a SCEV model and constant folding that if it has
one for a particular iteration. The result should fold the GEPs that are
currently analyzed ahead of time into constantexpr's. Finally, the load
folding needs to be expressed in terms of folding the load through a
constantexpr GEP into a constant array. If the existing constant folding
isn't enough for this, I'd be rather surprised.

Without restructuring in this way, we'll always be patching things up. For
example, I added logic to skip dead control flows when analyzing the loop
for unrolling and it is completely ineffective because we don't analyze
'icmp' instructions against the induction variable. For cases where the
dead control flow is predicated on the GEPs themselves (for example, any
STL-style algorithm using pointers as iterators!) we can't get an accurate
analysis without applying the GEP simplification independently of the load
simplification.


Anyways, I'm not a SCEV expert, and so I'm not the right person to re-cast
the current visitor to directly build on top of a SCEV model and do
substitution of the current iteration. But I think that is the design I'd
like to see before any other work goes into the code here. Happy to review
when there is a patch to this effect.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150526/af1e823c/attachment.html>


More information about the llvm-dev mailing list