[PATCH] Enable unrolling of multi-exit loops
Mark Heffernan
meheff at google.com
Wed Oct 8 11:34:06 PDT 2014
Thanks for the review.
>>! In D5550#10, @atrick wrote:
> Sorry for the delay reviewing--I had to be careful with this one, and the diff is unclear. It does look like great improvement and it's about time to move on to full-fledge support for multi-exit loops.
No problem on the latency. It took me a long time to reason through all the code and the change ;-)
> With this change, ComputeExitLimit design loses some information. We cannot as precisely determine the trip count of a particular exit given knowledge of NSW. But I see now that information was probably not very useful. The idea of conditionally considering NW/NSW based on the loop structure makes the code simpler and the API safer. As a result, the loop unroller can actually be more aggressive without adding any complexity. So I think this is a great change.
One interesting thing about the NSW case where the condition is "missed": InstCombine catches some of these and transforms them to infinite loops. For example, the following gets transformed to an infinite loop:
;; run with: opt -instcombine
declare void @bar(...) #1
define void @test() {
entry:
br label %loop
loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
call void (...)* @bar()
%i.next = add nsw i32 %i, 2
%t = icmp ne i32 %i.next, 5
br i1 %t, label %loop, label %exit
exit:
ret void
}
InstCombine does this by looking at known bit values of the two sides of the icmp ne. However, if loop unrolling runs before instcombine the loop will be unrolled completely with a trip count of 2. If you change the increment value from 2 to 3, then instcombine can't reason about the bits and won't transform it to an infinite loop. Given this inconsistency (not surprising given that we're talking about undefined behavior), I'd also be happy just returning couldNotCompute in cases where the equality condition misses with nsw. This would simplify the code further by eliminating the ControlsExit parameter in the various functions. I wrote the code as is to match existing behavior. Let me know if you think this further simplification and change in behavior is a good idea.
> One thing I'm concerned about though is that I think you're using SCEVDivision for the first time in the standard LLVM pipeline (outside of the delinearizer). SCEVDivision looks like it recurses over the expression operands. Every time someone does this, we eventually have to track down some case that results in pathological compile time. Can you prove that this recursion will never visit the same expression twice? If not, I think you need to add a visited set as we do elsewhere, or find another way to check for exact division.
I'll dive in a get back shortly on this...
http://reviews.llvm.org/D5550
More information about the llvm-commits
mailing list