[llvm-commits] [PATCH] Teach SCEV about <= comparisons for trip count computation

Thu Sep 20 09:52:15 PDT 2012

On Thu, 20 Sep 2012 01:39:02 -0700
Nick Lewycky <nicholas at mxc.ca> wrote:

> Hal Finkel wrote:
> > On Wed, 19 Sep 2012 19:47:22 -0700
> > Nick Lewycky<nlewycky at google.com>  wrote:
> >
> >> On 30 August 2012 23:51, Nick Lewycky<nicholas at mxc.ca>  wrote:
> >>
> >>> Tobias von Koch wrote:
> >>>
> >>>> Dear all,
> >>>>
> >>>> SCEV currently has a weakness in the loop trip count computation
> >>>> code when it comes to loops whose bounds are determined by a<=
> >>>> comparison.
> >>>>
> >>>> Take the example from the attached test case:
> >>>>
> >>>> int Proc_8 (int arg1)
> >>>> {
> >>>>     int result = 0;
> >>>>     for (int i = arg1; i<= arg1 + 5; ++i)
> >>>>       result++;
> >>>>     return result;
> >>>> }
> >>>>
> >>>> GCC is able to determine the trip count exactly and translates
> >>>> this whole function into a single instruction (+ return) on PPC.
> >>>>
> >>>> LLVM fails on two fronts:
> >>>> - First of all, it can't deal with the<= comparison because the
> >>>> code to handle this isn't there. It was added in an earlier
> >>>> version of SCEV years ago but then reverted alongside a lot of
> >>>> other (buggy) code. The attached patch fixes this.
> >>>> - Secondly, even once the patch has been applied, LLVM can only
> >>>> derive a smax() expression for the trip count of this loop. While
> >>>> this is a lot better already, it's still not an exact number. I
> >>>> get the impression that this has something to do with the 'nsw'
> >>>> flag on the original addition not being propagated through the
> >>>> SCEV analysis, but I'm not entirely sure here.
> >>>>
> >>>
> >>> These two issues are related. You're right that we aren't getting
> >>> the 'nsw' information to where scev needs it, though we have to be
> >>> very careful when doing that.
> >>>
> >>> Because we aren't, your patch miscompiles this function:
> >>>    unsigned char test(unsigned char n) {
> >>>      unsigned char i;
> >>>      for (i = 0; i<= n; ++i) {
> >>>      }
> >>>      return i;
> >>>    }
> >>> where 'n' == 255, because the loop would be infinite and we make
> >>> it return zero. (Yes this may be a valid transform for certain
> >>> versions of C and C++ but it isn't valid in LLVM IR at this time.)
> >>>
> >>>
> >>>   The patch passes all regression tests and the nightly test
> >>> suite. I do
> >>>> have one worry about the line marked with a 'FIXME', though. Is
> >>>> it necessary to check for overflow here? Can this be done using
> >>>> the same method as in ScalarEvolution::getBECount, i.e.
> >>>> zero-extend to larger type, add, compare results?
> >>>>
> >>>
> >>> No, even a zero-extended computation would not correctly model the
> >>> infinite loop.
> >>>
> >>> I don't know how we ought to fix this.
> >>>
> >>
> >> Some thoughts! Consider the loop:
> >>
> >>    for (int i = 0; i<= n; ++i) {
> >>      x += 7;
> >>    }
> >>    use(x);
> >>
> >> What's the trip count? Right now it's CouldNotCompute, and that's
> >> the ideal answer given the SCEV expressions that exist today. What
> >> is it mathematically, ignoring no-signed-wrapping? It's an
> >> infinite loop if n == INT_MAX otherwise the trip count is n. Note
> >> that it isn't undefined when n == INT_MAX, it's defined to be an
> >> infinite loop (we have a language-specific pass to eliminate such
> >> infinite loops with no side-effects, but we won't do it here).
> >>
> >> Continuing to keep the no-signed-wrapping aside, what about x? To
> >> start, exit-value-of(x) = starting-value-of(x) + 7 * (infinite loop
> >> if n == INT_MAX, otherwise n), but that expression simplifies.
> >> Implicit in the question "exit-value-of(x)" is the assumption that
> >> the loop wasn't infinite after all, so it's exit-value-of(x) =
> >> starting-value-of(x) + 7 * n.
> >>
> >> So it seems like we don't really need nsw here, we could simplify
> >> this away by adding a new SCEV expression that is conditionally an
> >> infinite loop. We could teach the SCEV expander to emit an infinite
> >> loop and a branch to it. It's not clear to me how this works though
> >> -- what does it mean to have "X
> >> + 7 * (infinite loop)" in a scev expression? Consider the next
> >> loop:
> >>
> >>    for (int i = 0; i<= n; ++i) {
> >>      x += 7;
> >>      for (j = i; j<= n; ++j) {
> >>        y += x + 1;
> >>      }
> >>      use(y);
> >>    }
> >>
> >> How do we evaluate y at the point it's used? It hasn't lost its
> >> "potentially-unreachable"ness yet.
> >
> > In this degenerate case, both loops will be infinite under the same
> > circumstances, so if we reach the use(y), then we know that both
> > loops are finite.
> >
> > Were the inner loop to be conditioned on j<= m, then reaching use(y)
> > tells us that the inner loop must be finite, but nothing about the
> > outer loop. On the other hand, as you point out, hoisting the
> > infinite-loop case from the outer loop is straightforward.
> 
> Oops, yes I meant to change the inner loop condition as well. Thanks.
> 
> >> And a few more questions, like how does this expression simplify?
> >
> > I think this depends on where the use is. If the use is after the
> > loop, then the answer is yes. Inside the loop, and we may need to
> > split out the infinite case first.
> >
> >> Can
> >> we always hoist out the potentially-infinite part to the outer-most
> >> level?
> >
> > No, I don't think so.
> >
> > for (i = 0; i<= n; ++i) {
> >    m = external_func(i);
> >    for (j = 0; j<= m; ++j) { ... }
> > }
> 
> I should have clarified what I meant, as I was thinking of the tree 
> representation of a SCEV expression. We'll always rewrite an
> expression like add(mul(a, x), mul(b, x)) into mul(x, add(a, b)) to
> get the multiply to the outmost node, and similarly for trunc, zext,
> umax, addrec, etc. There's a pecking order for which type of
> expression gets to be the closest to the outside. It seems to me that
> mul(x, add(potentially-inf(C, a), b)) or similar can always transform
> into potentially-inf(C, mul(x, add(a, b))), and that this follows for
> all existing SCEV expressions.
> 
> In your example, I don't see the problem. We get 'm = SCEVUnknown'
> and then 'j = SCEVPotentiallyInfinite(m == INTMAX, m)'. I don't think
> that's a problem?

Thanks! An actual infinity could not be hoisted past a division, for
example, where it might make the result 0. However, we're not dealing
with an actual infinity here, instead, we seem to have a concept
more like 'Uncomputed', and that can be hoisted past anything because
an expression depending on an uncomputed value must itself be
uncomputed.

The trick is that can uncomputed value can't be used; so if we have a
choice then we need to split the control flow into one region where the
value is computed and the other where it is not.

> 
> >> Do folks think this new expression is sensible?
> >
> > To which new expression are you referring?
> 
> I suppose I didn't describe it formally. :) A new SCEV expression
> which takes the form SCEVPotentiallyInfinite(condition, expression),
> and where the condition must be loop invariant with respect to any
> recurrences in the expression.
> 
> Another way to formulate this would be to add a SCEVChoose(condition, 
> expr1, expr2) and a SCEVInfinite leaf node. I've been wanting a 
> SCEVInfinite for a different purpose -- to indicate that we did
> analyze the loop and found it infinite -- as opposed to our current
> reply of SCEVCouldNotCompute. Currently if SCEV finds a loop infinite 
> analytically, it will proceed to brute-force it for 200 rounds
> anyways. I've lacked motivation to fix it as infinite loops are
> relatively rare in real world code.
> 
> SCEVChoose is more interesting, and I seem to recall reading a paper 
> that proposed such a SCEV node. (Sebastian? Could you share your 
> experience with "pchrec"s?) I considered adding this to LLVM once
> before but didn't because I thought they would not be practical in
> real world code. How often would choose expressions with loop variant
> conditions occur, and what useful optimizations could we do on them?
> We could also rephrase umax/smax in terms of such a choose node, but
> it's not clear to me that's a step forward; a umax/smax expression
> takes n-any arguments, for starters.

This makes sense.

> 
> Oh and before I forget, I have a third way we could choose to define
> the new SCEV expression: as a flag on add. A new "infloop on
> [un]signed overflow" flag which indicates that if this add expression
> does overflow, then we go straight to an infinite loop. The lowering
> for that makes use of the nice LLVM IR overflow intrinsics.

This makes sense; and I think can also be layered on one of the
previous suggestions as well.

 -Hal

> 
> Nick

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory