[llvm-commits] [PATCH] Teach SCEV about <= comparisons for trip count computation

Wed Sep 19 20:33:24 PDT 2012

On Wed, 19 Sep 2012 19:47:22 -0700
Nick Lewycky <nlewycky at google.com> wrote:

> On 30 August 2012 23:51, Nick Lewycky <nicholas at mxc.ca> wrote:
> 
> > Tobias von Koch wrote:
> >
> >> Dear all,
> >>
> >> SCEV currently has a weakness in the loop trip count computation
> >> code when it comes to loops whose bounds are determined by a <=
> >> comparison.
> >>
> >> Take the example from the attached test case:
> >>
> >> int Proc_8 (int arg1)
> >> {
> >>    int result = 0;
> >>    for (int i = arg1; i <= arg1 + 5; ++i)
> >>      result++;
> >>    return result;
> >> }
> >>
> >> GCC is able to determine the trip count exactly and translates this
> >> whole function into a single instruction (+ return) on PPC.
> >>
> >> LLVM fails on two fronts:
> >> - First of all, it can't deal with the <= comparison because the
> >> code to handle this isn't there. It was added in an earlier
> >> version of SCEV years ago but then reverted alongside a lot of
> >> other (buggy) code. The attached patch fixes this.
> >> - Secondly, even once the patch has been applied, LLVM can only
> >> derive a smax() expression for the trip count of this loop. While
> >> this is a lot better already, it's still not an exact number. I
> >> get the impression that this has something to do with the 'nsw'
> >> flag on the original addition not being propagated through the
> >> SCEV analysis, but I'm not entirely sure here.
> >>
> >
> > These two issues are related. You're right that we aren't getting
> > the 'nsw' information to where scev needs it, though we have to be
> > very careful when doing that.
> >
> > Because we aren't, your patch miscompiles this function:
> >   unsigned char test(unsigned char n) {
> >     unsigned char i;
> >     for (i = 0; i <= n; ++i) {
> >     }
> >     return i;
> >   }
> > where 'n' == 255, because the loop would be infinite and we make it
> > return zero. (Yes this may be a valid transform for certain
> > versions of C and C++ but it isn't valid in LLVM IR at this time.)
> >
> >
> >  The patch passes all regression tests and the nightly test suite.
> > I do
> >> have one worry about the line marked with a 'FIXME', though. Is it
> >> necessary to check for overflow here? Can this be done using the
> >> same method as in ScalarEvolution::getBECount, i.e. zero-extend to
> >> larger type, add, compare results?
> >>
> >
> > No, even a zero-extended computation would not correctly model the
> > infinite loop.
> >
> > I don't know how we ought to fix this.
> >
> 
> Some thoughts! Consider the loop:
> 
>   for (int i = 0; i <= n; ++i) {
>     x += 7;
>   }
>   use(x);
> 
> What's the trip count? Right now it's CouldNotCompute, and that's the
> ideal answer given the SCEV expressions that exist today. What is it
> mathematically, ignoring no-signed-wrapping? It's an infinite loop if
> n == INT_MAX otherwise the trip count is n. Note that it isn't
> undefined when n == INT_MAX, it's defined to be an infinite loop (we
> have a language-specific pass to eliminate such infinite loops with no
> side-effects, but we won't do it here).
> 
> Continuing to keep the no-signed-wrapping aside, what about x? To
> start, exit-value-of(x) = starting-value-of(x) + 7 * (infinite loop
> if n == INT_MAX, otherwise n), but that expression simplifies.
> Implicit in the question "exit-value-of(x)" is the assumption that
> the loop wasn't infinite after all, so it's exit-value-of(x) =
> starting-value-of(x) + 7 * n.
> 
> So it seems like we don't really need nsw here, we could simplify
> this away by adding a new SCEV expression that is conditionally an
> infinite loop. We could teach the SCEV expander to emit an infinite
> loop and a branch to it. It's not clear to me how this works though
> -- what does it mean to have "X
> + 7 * (infinite loop)" in a scev expression? Consider the next loop:
> 
>   for (int i = 0; i <= n; ++i) {
>     x += 7;
>     for (j = i; j <= n; ++j) {
>       y += x + 1;
>     }
>     use(y);
>   }
> 
> How do we evaluate y at the point it's used? It hasn't lost its
> "potentially-unreachable"ness yet.

In this degenerate case, both loops will be infinite under the same
circumstances, so if we reach the use(y), then we know that both loops
are finite.

Were the inner loop to be conditioned on j <= m, then reaching use(y)
tells us that the inner loop must be finite, but nothing about the
outer loop. On the other hand, as you point out, hoisting the
infinite-loop case from the outer loop is straightforward.

> 
> And a few more questions, like how does this expression simplify? 

I think this depends on where the use is. If the use is after the loop,
then the answer is yes. Inside the loop, and we may need to split out
the infinite case first.

>Can
> we always hoist out the potentially-infinite part to the outer-most
> level?

No, I don't think so.

for (i = 0; i <= n; ++i) {
  m = external_func(i);
  for (j = 0; j <= m; ++j) { ... }
}

> Do folks think this new expression is sensible?

To which new expression are you referring?

 -Hal

> 
> Nick

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory