[PATCH] D22377: [SCEV] trip count calculation for loops with unknown stride

Sanjoy Das via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 5 17:53:44 PDT 2016


sanjoy added inline comments.

================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:8746
@@ +8745,3 @@
+    // propagates no wrap flags to the post-increment/decrement IV. The computed
+    // backedge taken count may be wrong in such cases. For example- 
+    //
----------------
Nit: blank before `-`

================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:8752
@@ +8751,3 @@
+    //
+    if (PredicatedIV || !NoWrap || isKnownNonPositive(Stride) || 
+        !loopHasNoSideEffects(L))
----------------
pankajchawla wrote:
> > I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove.  This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ).
> > 
> > I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that.
> 
> 
> The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of `start`, `end` and `stride`). So this smartness should then get reflected in either `isKnownPositive(stride)` or `isKnownNonPositive(stride)` which this change is guarded under. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. I cannot think of a foolproof way to prune the 'edge cases'. Also, just so you know, scalar evolution is successfully able to compute the trip count for the edge case mentioned in the comments by computing the trip count exhaustively before ever reaching this section of the code.
> 
> 
> > 
> > The other thing that's missing here is some clear justification of why what you're doing is correct.  This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct.
> 
> I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I have tried to document all the preconditions. I will take a crack at giving the proof in case you are not convinced. 
> Please bear with me as I am neither a mathematician nor a theoretical CS guy :)
> Please also keep in mind that I am excluding all edge cases in the proof.
> 
> First lets define the loop structure we are handling-
> 
>   i = start
>   do {
>     A[i] = i;
>     i += s;
>   } while (i < end);
> 
> The proposition is that the backedge taken count of this loop for any combination of start, stride and end is: 
> `(max(end, start + stride) - start - 1) /u stride`
> 
> Since I haven't changed the original backedge taken count formula, I assume that you agree that the formula is correct when stride is known to be positive (original behavior of the function).
> 
> Now, these are our preconditions-
> 
> a) Comparator is either ICMP_ULT or ICMP_SLT.
> b) IV is either nuw or nsw depending upon the comparator type in a).
> c) loop has single exit with no side effects.
> 
> 
> We can split all the possible scenarios for this loop into these three cases-
> 
> 1) Positive Stride - This case reverts to the original behavior of the function which we assumed to be correct.
> 
> 2) Zero stride - Precondition c) implies that stride cannot be zero as that would imply an infinite loop which is undefined behavior.
> 
> 3) Negative stride - Precondition a) and b) together imply that this is only possible for single trip loops, which means  ((start + stride) >= end) holds true. So the formula reduces like this-
> 
> `(max(end, start + stride) - start - 1) /u stride ==>`
> `(start + stride - start - 1) /u stride ==>`
> `(stride - 1) /u stride ==> 0.`
> 
> Hence proved :)
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride).

> So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under.

Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time -- i.e. an implementation that returns `false` unconditionally every 10th time these are called should be "okay" (terrible, but correct).  This isn't just a thought experiment either -- `isKnownPositive` (since it internally calls `getRange`) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove `isKnownPositive`.

> Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted.

The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons.  The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares.  However, I'm not sure if bailing out on unsigned compares will actually work for the case that motivated this patch.

> I am not sure whether you are not convinced that this is correct or you just want this to be documented properly.

I was asking for a short comment documenting on why your line of reasoning is correct.


https://reviews.llvm.org/D22377





More information about the llvm-commits mailing list