<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Feb 12, 2015 at 4:39 PM, Michael Zolotukhin <span dir="ltr"><<a href="mailto:mzolotukhin@apple.com" target="_blank">mzolotukhin@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><div><div class="h5"><blockquote type="cite"><div>On Feb 12, 2015, at 4:25 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>> wrote:</div><br><div><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 12, 2015 at 4:21 PM, Michael Zolotukhin <span dir="ltr"><<a href="mailto:mzolotukhin@apple.com" target="_blank">mzolotukhin@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><blockquote type="cite"><div>On Feb 12, 2015, at 4:00 PM, Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" target="_blank"><span>mzolotukhin@apple.com</span></a>> wrote:</div><br><div><div style="word-wrap:break-word">Hi Chandler,<div><br></div><div>Thanks for the info, I’ll take a look. Sorry for the inconveniences.</div></div></div></blockquote><div><br></div></span><div>You can also use a easy workaround to simply turn off this feature if it blocks you: you can pass "-unroll-max-iteration-count-to-analyze=0” (there was one more bug in there, but I’ve fixed it).</div></blockquote></div><br>I'm going to make an attempt to fix it. If I fail, I'll actually turn it off and you can re-enable it when things are looking better.</div></div>

</div></blockquote></div></div>Thank you!</div></div></blockquote><div><br></div><div>I think I may have tried too hard to fix it.</div><div><br></div><div>I succeeded at fixing the compile time blow-ups I saw in benchmarks and lots of the code quality issues that jumped out at me... However, I just discovered that this approach seems much more flawed than I originally thought.</div><div><br></div><div>The thing that really worries me about this approach is that we're in many places relying on iteration over a dense map or set keyed by a pointer. This order of traversal isn't deterministic. Now, in *theory* this doesn't actually matter because we iterate to a fixed point and we are just computing a cost.... but I'm somewhat concerned that this isn't going to be perfectly deterministic in the way it is calling instsimplify.</div><div><br></div><div><br></div><div>But even if we can make this robust and deterministic, I think this is just the wrong approach. It is *extremely* expensive to do this iterative process over the loop, and we're already burning the cost of iterating over every instruction in order to check the loads.</div><div><br></div><div>Instead, I think you should redesign this to be even more similar to the inline cost analysis. Here is a sketch of what I think would work.</div><div><br></div><div>First, I think it is *really* important to hand the threshold *into* the simplification so that we can early-exit the moment we would need to unroll enough unsimplified instructions to exceed the threshold.</div><div><br></div><div>Second, I think you should structure it to do a preorder traversal of the CFG of the loop body, maintaining the SSA-based simplification map as you go. This ensures you'll always visit defs before uses, and the operands of a use will be fully simplified when you consider them. It also directly feeds into the early-exit strategy. It's important to visit the instructions in the block in this order as well so that you see PHI nodes first. See #3. =]</div><div><br></div><div>Third, you should wire the SCEV based simplification directly into the simplification logic. We should just query SCEV the same way we query instsimplify to try to compute the simplification mappings. This will allow a really wide variety of other simplifications to kick in such as induction variable math simplification. SCEV can also simplify the PHI nodes in many cases. Now, SCEV will probably start to be the slow part of this. You'll either need to arrange the thresholds to make this OK or work to extract something from SCEV so that it computes the N iteration counts more efficiently.</div><div><br></div><div><br></div></div>Unfortunately, I don't think there is a lot of code sharing you'll want to do with the existing implementation. I would probably suggest pulling the existing code out and then adding a new variant that does this. This is also similar enough to the inline cost analysis that you might be able to extract some part of it and re-use it.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Until you have time to re-work this, I'm turning it off by setting the unroll max iterations to 0 because I don't want to risk there being some other problem in this code that we haven't teased out yet, especially if its non-determinism.</div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Sorry for all the noise. I had really hoped I could clean it up into a useful intermediate state.</div><div class="gmail_extra"><br></div><div class="gmail_extra">-Chandler</div></div>