[PATCH] D22377: [SCEV] trip count calculation for loops with unknown stride

Mon Aug 8 23:27:36 PDT 2016

sanjoy accepted this revision.
sanjoy added a comment.
This revision is now accepted and ready to land.

lgtm as mentioned inline

================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:8752
@@ +8751,3 @@
+    //
+    if (PredicatedIV || !NoWrap || isKnownNonPositive(Stride) || 
+        !loopHasNoSideEffects(L))
----------------
pankajchawla wrote:
> sanjoy wrote:
> > pankajchawla wrote:
> > > > I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove.  This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ).
> > > > 
> > > > I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that.
> > > 
> > > 
> > > The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of `start`, `end` and `stride`). So this smartness should then get reflected in either `isKnownPositive(stride)` or `isKnownNonPositive(stride)` which this change is guarded under. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. I cannot think of a foolproof way to prune the 'edge cases'. Also, just so you know, scalar evolution is successfully able to compute the trip count for the edge case mentioned in the comments by computing the trip count exhaustively before ever reaching this section of the code.
> > > 
> > > 
> > > > 
> > > > The other thing that's missing here is some clear justification of why what you're doing is correct.  This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct.
> > > 
> > > I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I have tried to document all the preconditions. I will take a crack at giving the proof in case you are not convinced. 
> > > Please bear with me as I am neither a mathematician nor a theoretical CS guy :)
> > > Please also keep in mind that I am excluding all edge cases in the proof.
> > > 
> > > First lets define the loop structure we are handling-
> > > 
> > >   i = start
> > >   do {
> > >     A[i] = i;
> > >     i += s;
> > >   } while (i < end);
> > > 
> > > The proposition is that the backedge taken count of this loop for any combination of start, stride and end is: 
> > > `(max(end, start + stride) - start - 1) /u stride`
> > > 
> > > Since I haven't changed the original backedge taken count formula, I assume that you agree that the formula is correct when stride is known to be positive (original behavior of the function).
> > > 
> > > Now, these are our preconditions-
> > > 
> > > a) Comparator is either ICMP_ULT or ICMP_SLT.
> > > b) IV is either nuw or nsw depending upon the comparator type in a).
> > > c) loop has single exit with no side effects.
> > > 
> > > 
> > > We can split all the possible scenarios for this loop into these three cases-
> > > 
> > > 1) Positive Stride - This case reverts to the original behavior of the function which we assumed to be correct.
> > > 
> > > 2) Zero stride - Precondition c) implies that stride cannot be zero as that would imply an infinite loop which is undefined behavior.
> > > 
> > > 3) Negative stride - Precondition a) and b) together imply that this is only possible for single trip loops, which means  ((start + stride) >= end) holds true. So the formula reduces like this-
> > > 
> > > `(max(end, start + stride) - start - 1) /u stride ==>`
> > > `(start + stride - start - 1) /u stride ==>`
> > > `(stride - 1) /u stride ==> 0.`
> > > 
> > > Hence proved :)
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride).
> > 
> > > So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under.
> > 
> > Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time -- i.e. an implementation that returns `false` unconditionally every 10th time these are called should be "okay" (terrible, but correct).  This isn't just a thought experiment either -- `isKnownPositive` (since it internally calls `getRange`) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove `isKnownPositive`.
> > 
> > > Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted.
> > 
> > The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons.  The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares.  However, I'm not sure if bailing out on unsigned compares will actually work for the case that motivated this patch.
> > 
> > > I am not sure whether you are not convinced that this is correct or you just want this to be documented properly.
> > 
> > I was asking for a short comment documenting on why your line of reasoning is correct.
> > Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time i.e. an implementation that returns false unconditionally every 10th time these are called should be "okay" (terrible, but correct).  
> 
> I am not relying on their preciseness, I am relying on their correctness. They can (and should) conservatively return false which would be fine. Returning false unconditionally every 10th time is not okay, it is clearly incorrect. What if I call it on the same value 10 times? Are you suggesting that it is ok for them to return `true` the first 9 times and `false` the 10th time for the same value?
> 
> 
> > isKnownPositive (since it internally calls getRange) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove isKnownPositive.
> > 
> 
> If this is the case, I believe this is an issue with ScalarEvolution's implementation.  The cache for the stride cannot be invalidated independently of the add recurrence it occurs in. We should invalidate SCEV for all values dependent on stride like what we do in `forgetValue()` otherwise ScalarEvolution is internally inconsistent. If we do not do this, the wrap flags applied on the add recurrence can be wrong and can lead to miscompilation. If `isKnownPositive()` cannot be re-proven, then the previous trip count computation which was made on this assumption is now invalid and therefore should be invalidated as well.
> 
> 
> > The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. 
> 
> I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync. 
> 
> 
> 
> 
> 
> > The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares.
> >  
> > 
> 
> The example you provided invalidates both signed and unsigned IVs. This is the IR produced by LLVM for your example-
> 
> 
> ```
> define void @foo(i8* nocapture %A, i32 %n) local_unnamed_addr #0 {
> entry:
>   br label %for.body
> 
> for.body:                                         ; preds = %entry, %for.body
>   %i.07 = phi i8 [ 127, %entry ], [ %add, %for.body ]
>   %idxprom = zext i8 %i.07 to i64
>   %arrayidx = getelementptr inbounds i8, i8* %A, i64 %idxprom
>   store i8 %i.07, i8* %arrayidx, align 1
>   %add = add i8 %i.07, -127
>   %cmp = icmp sgt i8 %add, -1
>   br i1 %cmp, label %for.body, label %for.end
> 
> for.end:                                          ; preds = %for.body
>   ret void
> }
> 
> 
> ```
> This is the SCEV form of %add-
>  {0,+,-127}<nsw><%for.body> U: [-127,1) S: [-127,1)
> 
> There is already an `nsw` flag on it, and an `nuw` flag can be applied as well. So I cannot restrict it to signed IVs.
> 
> 
> > I was asking for a short comment documenting on why your line of reasoning is correct.
> 
> I will add a short version of the proof in the comments if we can agree that this patch should be committed.
> 
> I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync.

You're right, I think I'm getting carried away in some excessively rigid chain of reasoning which isn't really applicable here.  I'm fine with this change landing as-is (with a short comment informally describing why what you've changed is correct).

https://reviews.llvm.org/D22377