[PATCH] D102982: [LoopUnroll] Use smallest exact trip count from any exit

Sat Jun 19 11:46:09 PDT 2021

nikic added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp:1124
+  // Find the smallest exact trip count for any exit. An unroll by this trip
+  // count will eliminate at least one exit, but there may be additional exits
+  // beyond it.
----------------
reames wrote:
> Your comment here is wrong.  The code is correct, but the comment isn't.  :)
> 
> If we unroll by the exact trip count of *any* exit, we're guaranteed to break the backedge.  As such, there might be conditional exits left in *earlier* iterations, but there will be nothing in *later* iterations which is what your comment appears to say.  
> 
> It might also be worth stating explicitly that this is an upper bound on the actual trip count of the loop (since an earlier conditional exit we can't analyze might be taken), and draw the distinction with a maximum count (conservatism in analyzing each exit.)  Separately, I really think we should be allowing max trip counts here, but that's a separate step.  
What I really wanted to say here is that an unroll by this trip count eliminates all branches relating to one exit, but branches relating to other exits may have to be kept. This is opposed to the max trip count case where we're only guaranteed to break the backedge, but may not be able to remove any other branches.

The unroll code already handles max trip count, but it's only used if no exact trip count is known, and is controlled by a target option (which is disabled on X86...)

================
Comment at: llvm/test/Transforms/LoopUnroll/runtime-loop-known-exit.ll:9

 define void @test(i32 %s, i32 %n) {
 ; CHECK-LABEL: @test(
----------------
reames wrote:
> This test change looks concerning.  Have you explored why this is happening?  On the surface, this looks like a bad interaction between runtime unrolling and finding a more precise trip count for full unrolling we may need to explore.
When a trip count is known, we don't perform runtime unrolling and perform partial unrolling instead. In this case it doesn't happen because I specified `-unroll-runtime` but not `-unroll-allow-partial` so we get no unrolling. If we do allow partial unrolling, the result looks like this: https://gist.github.com/nikic/0541f7937f6db4867ef7fd4d7673a2b1 We don't perform the runtime unroll transform to enforce a certain trip multiple on the latch exit, and instead make use of the known trip count on the other exit to only check it once per iteration. I'd say the new result is better (same reduction in branches without the need to remainder loop), with the caveat that it's more aggressive, because it doesn't use the default runtime unroll count.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102982/new/

https://reviews.llvm.org/D102982