[llvm] r228265 - Implement new heuristic for complete loop unrolling.

Fri Feb 13 11:52:35 PST 2015

Hi Chandler,

Thank you for the work and pointing out to the problems! It’s good that we found them early.

I think that we would solve these issues if we do the following:
1. When looking for simplified instructions, visit each instruction only once. That’ll be achieved by visiting blocks in pre-order.
2. When looking for dead-instructions, also visit them only once - here we would need to visit blocks in reverse-order.
3. SCEV indeed might be too slow for this, but we actually only interested in expressions that become constants. That means, we need SCEV for expressions that either feed a load, or have constant Start and Step SCEV-expressions. We can scan all insns once and cache results for such potentially-constant instructions, other instructions don’t need SCEV-expression. So, we won’t need to talk to SCEV after this point.
4. As for using threshold for early-exit - that’s a really good idea, and it should integrate here naturally.

What do you think about this? Am I missing something, or does it sound right?

And again, thank you for your feedback!

Michael

> On Feb 12, 2015, at 9:25 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> On Thu, Feb 12, 2015 at 4:39 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
> 
>> On Feb 12, 2015, at 4:25 PM, Chandler Carruth <chandlerc at google.com <mailto:chandlerc at google.com>> wrote:
>> 
>> 
>> On Thu, Feb 12, 2015 at 4:21 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
>>> On Feb 12, 2015, at 4:00 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
>>> 
>>> Hi Chandler,
>>> 
>>> Thanks for the info, I’ll take a look. Sorry for the inconveniences.
>> 
>> You can also use a easy workaround to simply turn off this feature if it blocks you: you can pass "-unroll-max-iteration-count-to-analyze=0” (there was one more bug in there, but I’ve fixed it).
>> 
>> I'm going to make an attempt to fix it. If I fail, I'll actually turn it off and you can re-enable it when things are looking better.
> Thank you!
> 
> I think I may have tried too hard to fix it.
> 
> I succeeded at fixing the compile time blow-ups I saw in benchmarks and lots of the code quality issues that jumped out at me... However, I just discovered that this approach seems much more flawed than I originally thought.
> 
> The thing that really worries me about this approach is that we're in many places relying on iteration over a dense map or set keyed by a pointer. This order of traversal isn't deterministic. Now, in *theory* this doesn't actually matter because we iterate to a fixed point and we are just computing a cost.... but I'm somewhat concerned that this isn't going to be perfectly deterministic in the way it is calling instsimplify.
> 
> 
> But even if we can make this robust and deterministic, I think this is just the wrong approach. It is *extremely* expensive to do this iterative process over the loop, and we're already burning the cost of iterating over every instruction in order to check the loads.
> 
> Instead, I think you should redesign this to be even more similar to the inline cost analysis. Here is a sketch of what I think would work.
> 
> First, I think it is *really* important to hand the threshold *into* the simplification so that we can early-exit the moment we would need to unroll enough unsimplified instructions to exceed the threshold.
> 
> Second, I think you should structure it to do a preorder traversal of the CFG of the loop body, maintaining the SSA-based simplification map as you go. This ensures you'll always visit defs before uses, and the operands of a use will be fully simplified when you consider them. It also directly feeds into the early-exit strategy. It's important to visit the instructions in the block in this order as well so that you see PHI nodes first. See #3. =]
> 
> Third, you should wire the SCEV based simplification directly into the simplification logic. We should just query SCEV the same way we query instsimplify to try to compute the simplification mappings. This will allow a really wide variety of other simplifications to kick in such as induction variable math simplification. SCEV can also simplify the PHI nodes in many cases. Now, SCEV will probably start to be the slow part of this. You'll either need to arrange the thresholds to make this OK or work to extract something from SCEV so that it computes the N iteration counts more efficiently.
> 
> 
> Unfortunately, I don't think there is a lot of code sharing you'll want to do with the existing implementation. I would probably suggest pulling the existing code out and then adding a new variant that does this. This is also similar enough to the inline cost analysis that you might be able to extract some part of it and re-use it.
> 
> Until you have time to re-work this, I'm turning it off by setting the unroll max iterations to 0 because I don't want to risk there being some other problem in this code that we haven't teased out yet, especially if its non-determinism.
> 
> 
> Sorry for all the noise. I had really hoped I could clean it up into a useful intermediate state.
> 
> -Chandler

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150213/48b8f1ad/attachment.html>