[llvm-commits] [PATCH] Teach SCEV about <= comparisons for trip count computation

Thu Sep 20 01:39:02 PDT 2012

Hal Finkel wrote:
> On Wed, 19 Sep 2012 19:47:22 -0700
> Nick Lewycky<nlewycky at google.com>  wrote:
>
>> On 30 August 2012 23:51, Nick Lewycky<nicholas at mxc.ca>  wrote:
>>
>>> Tobias von Koch wrote:
>>>
>>>> Dear all,
>>>>
>>>> SCEV currently has a weakness in the loop trip count computation
>>>> code when it comes to loops whose bounds are determined by a<=
>>>> comparison.
>>>>
>>>> Take the example from the attached test case:
>>>>
>>>> int Proc_8 (int arg1)
>>>> {
>>>>     int result = 0;
>>>>     for (int i = arg1; i<= arg1 + 5; ++i)
>>>>       result++;
>>>>     return result;
>>>> }
>>>>
>>>> GCC is able to determine the trip count exactly and translates this
>>>> whole function into a single instruction (+ return) on PPC.
>>>>
>>>> LLVM fails on two fronts:
>>>> - First of all, it can't deal with the<= comparison because the
>>>> code to handle this isn't there. It was added in an earlier
>>>> version of SCEV years ago but then reverted alongside a lot of
>>>> other (buggy) code. The attached patch fixes this.
>>>> - Secondly, even once the patch has been applied, LLVM can only
>>>> derive a smax() expression for the trip count of this loop. While
>>>> this is a lot better already, it's still not an exact number. I
>>>> get the impression that this has something to do with the 'nsw'
>>>> flag on the original addition not being propagated through the
>>>> SCEV analysis, but I'm not entirely sure here.
>>>>
>>>
>>> These two issues are related. You're right that we aren't getting
>>> the 'nsw' information to where scev needs it, though we have to be
>>> very careful when doing that.
>>>
>>> Because we aren't, your patch miscompiles this function:
>>>    unsigned char test(unsigned char n) {
>>>      unsigned char i;
>>>      for (i = 0; i<= n; ++i) {
>>>      }
>>>      return i;
>>>    }
>>> where 'n' == 255, because the loop would be infinite and we make it
>>> return zero. (Yes this may be a valid transform for certain
>>> versions of C and C++ but it isn't valid in LLVM IR at this time.)
>>>
>>>
>>>   The patch passes all regression tests and the nightly test suite.
>>> I do
>>>> have one worry about the line marked with a 'FIXME', though. Is it
>>>> necessary to check for overflow here? Can this be done using the
>>>> same method as in ScalarEvolution::getBECount, i.e. zero-extend to
>>>> larger type, add, compare results?
>>>>
>>>
>>> No, even a zero-extended computation would not correctly model the
>>> infinite loop.
>>>
>>> I don't know how we ought to fix this.
>>>
>>
>> Some thoughts! Consider the loop:
>>
>>    for (int i = 0; i<= n; ++i) {
>>      x += 7;
>>    }
>>    use(x);
>>
>> What's the trip count? Right now it's CouldNotCompute, and that's the
>> ideal answer given the SCEV expressions that exist today. What is it
>> mathematically, ignoring no-signed-wrapping? It's an infinite loop if
>> n == INT_MAX otherwise the trip count is n. Note that it isn't
>> undefined when n == INT_MAX, it's defined to be an infinite loop (we
>> have a language-specific pass to eliminate such infinite loops with no
>> side-effects, but we won't do it here).
>>
>> Continuing to keep the no-signed-wrapping aside, what about x? To
>> start, exit-value-of(x) = starting-value-of(x) + 7 * (infinite loop
>> if n == INT_MAX, otherwise n), but that expression simplifies.
>> Implicit in the question "exit-value-of(x)" is the assumption that
>> the loop wasn't infinite after all, so it's exit-value-of(x) =
>> starting-value-of(x) + 7 * n.
>>
>> So it seems like we don't really need nsw here, we could simplify
>> this away by adding a new SCEV expression that is conditionally an
>> infinite loop. We could teach the SCEV expander to emit an infinite
>> loop and a branch to it. It's not clear to me how this works though
>> -- what does it mean to have "X
>> + 7 * (infinite loop)" in a scev expression? Consider the next loop:
>>
>>    for (int i = 0; i<= n; ++i) {
>>      x += 7;
>>      for (j = i; j<= n; ++j) {
>>        y += x + 1;
>>      }
>>      use(y);
>>    }
>>
>> How do we evaluate y at the point it's used? It hasn't lost its
>> "potentially-unreachable"ness yet.
>
> In this degenerate case, both loops will be infinite under the same
> circumstances, so if we reach the use(y), then we know that both loops
> are finite.
>
> Were the inner loop to be conditioned on j<= m, then reaching use(y)
> tells us that the inner loop must be finite, but nothing about the
> outer loop. On the other hand, as you point out, hoisting the
> infinite-loop case from the outer loop is straightforward.

Oops, yes I meant to change the inner loop condition as well. Thanks.

>> And a few more questions, like how does this expression simplify?
>
> I think this depends on where the use is. If the use is after the loop,
> then the answer is yes. Inside the loop, and we may need to split out
> the infinite case first.
>
>> Can
>> we always hoist out the potentially-infinite part to the outer-most
>> level?
>
> No, I don't think so.
>
> for (i = 0; i<= n; ++i) {
>    m = external_func(i);
>    for (j = 0; j<= m; ++j) { ... }
> }

I should have clarified what I meant, as I was thinking of the tree 
representation of a SCEV expression. We'll always rewrite an expression 
like add(mul(a, x), mul(b, x)) into mul(x, add(a, b)) to get the 
multiply to the outmost node, and similarly for trunc, zext, umax, 
addrec, etc. There's a pecking order for which type of expression gets 
to be the closest to the outside. It seems to me that mul(x, 
add(potentially-inf(C, a), b)) or similar can always transform into 
potentially-inf(C, mul(x, add(a, b))), and that this follows for all 
existing SCEV expressions.

In your example, I don't see the problem. We get 'm = SCEVUnknown' and 
then 'j = SCEVPotentiallyInfinite(m == INTMAX, m)'. I don't think that's 
a problem?

>> Do folks think this new expression is sensible?
>
> To which new expression are you referring?

I suppose I didn't describe it formally. :) A new SCEV expression which 
takes the form SCEVPotentiallyInfinite(condition, expression), and where 
the condition must be loop invariant with respect to any recurrences in 
the expression.

Another way to formulate this would be to add a SCEVChoose(condition, 
expr1, expr2) and a SCEVInfinite leaf node. I've been wanting a 
SCEVInfinite for a different purpose -- to indicate that we did analyze 
the loop and found it infinite -- as opposed to our current reply of 
SCEVCouldNotCompute. Currently if SCEV finds a loop infinite 
analytically, it will proceed to brute-force it for 200 rounds anyways. 
I've lacked motivation to fix it as infinite loops are relatively rare 
in real world code.

SCEVChoose is more interesting, and I seem to recall reading a paper 
that proposed such a SCEV node. (Sebastian? Could you share your 
experience with "pchrec"s?) I considered adding this to LLVM once before 
but didn't because I thought they would not be practical in real world 
code. How often would choose expressions with loop variant conditions 
occur, and what useful optimizations could we do on them? We could also 
rephrase umax/smax in terms of such a choose node, but it's not clear to 
me that's a step forward; a umax/smax expression takes n-any arguments, 
for starters.

Oh and before I forget, I have a third way we could choose to define the 
new SCEV expression: as a flag on add. A new "infloop on [un]signed 
overflow" flag which indicates that if this add expression does 
overflow, then we go straight to an infinite loop. The lowering for that 
makes use of the nice LLVM IR overflow intrinsics.

Nick