[llvm] r228265 - Implement new heuristic for complete loop unrolling.

Thu Feb 12 21:25:05 PST 2015

On Thu, Feb 12, 2015 at 4:39 PM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

>
> On Feb 12, 2015, at 4:25 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>
> On Thu, Feb 12, 2015 at 4:21 PM, Michael Zolotukhin <mzolotukhin at apple.com>
> wrote:
>
>> On Feb 12, 2015, at 4:00 PM, Michael Zolotukhin <mzolotukhin at apple.com>
>> wrote:
>>
>> Hi Chandler,
>>
>> Thanks for the info, I’ll take a look. Sorry for the inconvenience.
>>
>>
>> You can also use an easy workaround to simply turn off this feature if it
>> blocks you: you can pass "-unroll-max-iteration-count-to-analyze=0" (there
>> was one more bug in there, but I’ve fixed it).
>>
>
> I'm going to make an attempt to fix it. If I fail, I'll actually turn it
> off and you can re-enable it when things are looking better.
>
> Thank you!
>

I think I may have tried too hard to fix it.

I succeeded at fixing the compile time blow-ups I saw in benchmarks and
lots of the code quality issues that jumped out at me... However, I just
discovered that this approach seems much more flawed than I originally
thought.

The thing that really worries me about this approach is that we're in many
places relying on iteration over a dense map or set keyed by a pointer.
This order of traversal isn't deterministic. Now, in *theory* this doesn't
actually matter because we iterate to a fixed point and we are just
computing a cost... but I'm somewhat concerned that this isn't going to be
perfectly deterministic in the way it is calling instsimplify.
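
To make the determinism concern concrete, here is a minimal sketch of the
usual way out: keep a hash set for membership and a separate vector that
records insertion order, and only ever iterate the vector. This is the idea
behind llvm::SetVector and llvm::MapVector; the Inst type and the helper
names below are hypothetical and use only the standard library.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction; only its identity matters.
struct Inst {
  int Id;
};

// Minimal insertion-ordered set. The hash set answers "have we seen this
// pointer?", the vector fixes the traversal order. Iterating the vector
// never depends on pointer values, so the same sequence of inserts always
// yields the same visitation order.
class OrderedInstSet {
  std::unordered_set<const Inst *> Seen;
  std::vector<const Inst *> Order;

public:
  // Returns true if I was newly inserted, false if it was already present.
  bool insert(const Inst *I) {
    assert(I && "null instruction");
    if (!Seen.insert(I).second)
      return false;
    Order.push_back(I);
    return true;
  }

  const std::vector<const Inst *> &order() const { return Order; }
};

// Collect the ids in traversal order, e.g. to drive a worklist.
std::vector<int> visitIds(const OrderedInstSet &S) {
  std::vector<int> Ids;
  for (const Inst *I : S.order())
    Ids.push_back(I->Id);
  return Ids;
}
```

The cost is one extra vector per set, which is usually a fine trade for a
traversal order that does not change from run to run.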

But even if we can make this robust and deterministic, I think this is just
the wrong approach. It is *extremely* expensive to do this iterative
process over the loop, and we're already burning the cost of iterating over
every instruction in order to check the loads.

Instead, I think you should redesign this to be even more similar to the
inline cost analysis. Here is a sketch of what I think would work.

First, I think it is *really* important to hand the threshold *into* the
simplification so that we can early-exit the moment we would need to unroll
enough unsimplified instructions to exceed the threshold.
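
As a rough illustration of the early exit, assuming we already know per
instruction whether it simplifies in a given iteration, the cost loop can
take the threshold as a parameter and stop the moment it is exceeded. The
InstInfo and BudgetResult types and the function name are hypothetical, not
anything in the tree.

```cpp
#include <vector>

// Hypothetical per-instruction record for one unrolled iteration.
struct InstInfo {
  unsigned Cost;   // cost charged if the instruction survives unrolling
  bool Simplified; // true if it is known to fold away in this iteration
};

struct BudgetResult {
  bool WithinThreshold; // false means we bailed out early
  unsigned Cost;        // unsimplified cost accumulated so far
  unsigned Visited;     // instructions inspected before returning
};

// Walk the iterations in order and stop as soon as the unsimplified cost
// exceeds Threshold, instead of costing the whole unrolled loop first.
BudgetResult estimateUnrolledCost(
    const std::vector<std::vector<InstInfo>> &Iterations,
    unsigned Threshold) {
  BudgetResult R = {true, 0, 0};
  for (const std::vector<InstInfo> &Body : Iterations) {
    for (const InstInfo &I : Body) {
      ++R.Visited;
      if (I.Simplified)
        continue;
      R.Cost += I.Cost;
      if (R.Cost > Threshold) {
        R.WithinThreshold = false; // early exit: no point looking further
        return R;
      }
    }
  }
  return R;
}
```

With a 3-instruction body costing 3 per iteration and a threshold of 4,
this inspects 5 instructions and gives up, rather than walking every
iteration.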

Second, I think you should structure it to do a preorder traversal of the
CFG of the loop body, maintaining the SSA-based simplification map as you
go. This ensures you'll always visit defs before uses, and the operands of
a use will be fully simplified when you consider them. It also directly
feeds into the early-exit strategy. It's important to visit the
instructions in the block in this order as well so that you see PHI nodes
first. See #3. =]
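
A toy version of that traversal, under heavy assumptions: every instruction
is modeled as an add of two operands, PHI nodes are not modeled at all, and
the Operand, Inst, and Block types are hypothetical. The point is only the
shape: a preorder walk from the header that fills in a value-id to constant
map as it goes and never follows an edge back to a visited block.

```cpp
#include <map>
#include <set>
#include <vector>

struct Operand {
  bool IsConst;
  long Value; // the constant itself, or the id of the defining value
};
struct Inst { // models "%Id = LHS + RHS"
  int Id;
  Operand LHS, RHS;
};
struct Block {
  std::vector<Inst> Insts;
  std::vector<int> Succs; // indices of successor blocks
};

// Look an operand up in the simplification map; fails if it is unknown.
static bool resolve(const Operand &Op, const std::map<int, long> &Known,
                    long &Out) {
  if (Op.IsConst) {
    Out = Op.Value;
    return true;
  }
  std::map<int, long>::const_iterator It =
      Known.find(static_cast<int>(Op.Value));
  if (It == Known.end())
    return false;
  Out = It->second;
  return true;
}

// Preorder walk from Entry. Known is seeded by the caller (for example
// with the induction variable's value on this iteration) and grows as
// instructions fold to constants.
std::map<int, long> simplifyInPreorder(const std::vector<Block> &Blocks,
                                       int Entry,
                                       std::map<int, long> Known) {
  std::vector<int> Stack(1, Entry);
  std::set<int> Visited;
  while (!Stack.empty()) {
    int B = Stack.back();
    Stack.pop_back();
    if (!Visited.insert(B).second)
      continue; // already handled, which also skips the back edge
    for (const Inst &I : Blocks[B].Insts) { // in block order
      long L, R;
      if (resolve(I.LHS, Known, L) && resolve(I.RHS, Known, R))
        Known[I.Id] = L + R;
    }
    // Push successors in reverse so the first successor is visited first.
    const std::vector<int> &S = Blocks[B].Succs;
    for (std::vector<int>::const_reverse_iterator It = S.rbegin();
         It != S.rend(); ++It)
      if (!Visited.count(*It))
        Stack.push_back(*It);
  }
  return Known;
}
```

Because a block is only reached through blocks already visited, anything
that dominates it has been processed by the time its instructions are
looked at.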

Third, you should wire the SCEV based simplification directly into the
simplification logic. We should just query SCEV the same way we query
instsimplify to try to compute the simplification mappings. This will allow
a really wide variety of other simplifications to kick in such as induction
variable math simplification. SCEV can also simplify the PHI nodes in many
cases. Now, SCEV will probably start to be the slow part of this. You'll
either need to arrange the thresholds to make this OK or work to extract
something from SCEV so that it computes the N iteration counts more
efficiently.
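
To illustrate the kind of fact SCEV can hand the simplifier, here is a
sketch that does not use SCEV itself. It assumes the index is an affine
recurrence {Start,+,Step}, whose value on iteration N is Start + Step * N,
and uses that to fold a load from constant data for a specific iteration.
AffineRec and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical affine recurrence {Start,+,Step}.
struct AffineRec {
  long Start;
  long Step;
};

// Value of the recurrence on iteration N (N counts from 0).
long evaluateAt(const AffineRec &R, unsigned N) {
  return R.Start + R.Step * static_cast<long>(N);
}

// Try to fold the load ConstData[Index] on iteration N. Returns false if
// the computed index falls outside the constant data, in which case the
// load stays unsimplified and has to be charged to the cost.
bool foldLoadAtIteration(const std::vector<int> &ConstData,
                         const AffineRec &Index, unsigned N, int &Out) {
  long Idx = evaluateAt(Index, N);
  if (Idx < 0 || static_cast<std::size_t>(Idx) >= ConstData.size())
    return false;
  Out = ConstData[static_cast<std::size_t>(Idx)];
  return true;
}
```

Evaluating a closed form per iteration like this is cheap; the expensive
part in practice is getting SCEV to produce the recurrence in the first
place, which is where the thresholds come in.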

Unfortunately, I don't think there is a lot of code sharing you'll want to
do with the existing implementation. I would probably suggest pulling the
existing code out and then adding a new variant that does this. This is
also similar enough to the inline cost analysis that you might be able to
extract some part of it and re-use it.

Until you have time to re-work this, I'm turning it off by setting the
unroll max iterations to 0 because I don't want to risk there being some
other problem in this code that we haven't teased out yet, especially if
it's non-determinism.

Sorry for all the noise. I had really hoped I could clean it up into a
useful intermediate state.

-Chandler