[PATCH] D24790: [LoopUnroll] Use the upper bound of the loop trip count to completely unroll loops

Tue Sep 27 18:08:04 PDT 2016

mzolotukhin added a comment.

Hi Haicheng,

Thanks for working on this, please find my answers below and some more remarks/nit-picks inline.

> One of the major reason was that I unrolled many loops with calls. As you may already know, the cost model of call is not that awesome.


Maybe we just fix the cost model for calls instead :-) But I know, it might not be actually possible at all.

> If using exact trip count to unroll, the unrolled loop usually becomes a giant basic block which is preferable. However, if using the upper bound to unroll, the unrolled loop usually become a sequence of small basic blocks because it is not safe to merge loop blocks belonging to different iterations. Some of these blocks may not be executed during runtime. This is another reason that I think we may need to be more conservative to use upper bound to unroll loops.


I see, this makes perfect sense to me. Indeed, having separate thresholds might be reasonable.

> I tried several different configurations and the patch I uploaded was the best I found.


Have you tested it on any other architecture except AArch64?

Thanks,
Michael


================
Comment at: lib/Analysis/ScalarEvolution.cpp:5316
@@ -5315,1 +5315,3 @@
 
+static unsigned TranslateToConstantTripCount(const SCEVConstant *ExitCount) {
+  if (!ExitCount)
----------------
The first letter shouldn't be capitalized. I think the routine name also might be improved, but I don't have any better suggestions at the moment.

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:766
@@ -751,2 +765,3 @@
   // 3rd priority is full unroll count.
-  // Full unroll make sense only when TripCount could be staticaly calculated.
+  // Full unroll make sense only when TripCount or its upper bound could be
+  // staticaly calculated.
----------------
s/make/makes/

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:767
@@ -753,1 +766,3 @@
+  // Full unroll make sense only when TripCount or its upper bound could be
+  // staticaly calculated.
   // Also we need to check if we exceed FullUnrollMaxCount.
----------------
s/staticaly/statically/

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:771
@@ +770,3 @@
+  // we do not know when loop may exit.
+  unsigned FullUnrollTripCount = TripCount ? TripCount : MaxTripCount;
+  if (FullUnrollTripCount && FullUnrollTripCount <= UP.FullUnrollMaxCount) {
----------------
This gets confusing. Can we somehow rename these variables so that it's clearer what's the difference between them?

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:779
@@ -759,1 +778,3 @@
                             UnrolledSize, UnrolledSize)) {
+      UP.UpperBound = (MaxTripCount == FullUnrollTripCount);
+      TripCount = FullUnrollTripCount;
----------------
What if `TripCount` matches `MaxTripCount`? Will we think that we're using upper bound instead of the exact trip count in this case?

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:1007
@@ +1006,3 @@
+    // upper bound to unroll meets the heuristics.
+    UP.UpperBound = false;
+  }
----------------
I'm not a big fan of reusing this variable. Initially it was supposed to come from `UP` and show if we're allowed to use upper-bound instead of exactly-known trip-count, but now we're also using it to communicate between various routines (and the interface was not specified). Can it be refactored somehow?

================
Comment at: lib/Transforms/Utils/LoopUnroll.cpp:206
@@ -205,4 +205,3 @@
                       bool AllowRuntime, bool AllowExpensiveTripCount,
-                      unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,
-                      DominatorTree *DT, AssumptionCache *AC,
-                      OptimizationRemarkEmitter *ORE, bool PreserveLCSSA) {
+                      bool UseUpperBound, unsigned TripMultiple, LoopInfo *LI,
+                      ScalarEvolution *SE, DominatorTree *DT,
----------------
Please add a comment about `UseUpperBound`. Actually, I don't think that's a good name for this argument, because this functions doesn't use upper bound for anything. What it needs is a flag indicating that conditional branches must be preserved - I'd suggest to reflect that in the argument name.

================
Comment at: lib/Transforms/Utils/LoopUnroll.cpp:543
@@ +542,3 @@
+      // after any iteration.
+      NeedConditional = (UseUpperBound && j);
+    } else if (j != BreakoutTrip && (TripMultiple == 0 || j % TripMultiple != 0)) {
----------------
Is it intentional that we can make `NeedConditional` true after it was set to false before that? From semantics it looks like this never happens (in this case we can assert it), but I'd like to make sure it was not overlooked. Did you intend to replace this `if` with `else if` as well?


================
Comment at: test/CodeGen/AMDGPU/tti-unroll-prefs.ll:14-16
@@ -13,5 +13,4 @@
 ;
 ; This test is meant to check that this loop isn't unrolled into more than
-; four iterations.  The loop unrolling preferences we currently use cause this
-; loop to not be unrolled at all, but that may change in the future.
 
----------------
We could've just passed `-unroll-threshold` to overcome this. It might make sense to do it even now, so that the test doesn't break if UnrollPreferences are changed.


Repository:
  rL LLVM

https://reviews.llvm.org/D24790