[llvm-commits] [PATCH] Teach SCEV about <= comparisons for trip count computation
Nick Lewycky
nicholas at mxc.ca
Thu Sep 20 01:39:02 PDT 2012
Hal Finkel wrote:
> On Wed, 19 Sep 2012 19:47:22 -0700
> Nick Lewycky<nlewycky at google.com> wrote:
>
>> On 30 August 2012 23:51, Nick Lewycky<nicholas at mxc.ca> wrote:
>>
>>> Tobias von Koch wrote:
>>>
>>>> Dear all,
>>>>
>>>> SCEV currently has a weakness in the loop trip count computation
>>>> code when it comes to loops whose bounds are determined by a<=
>>>> comparison.
>>>>
>>>> Take the example from the attached test case:
>>>>
>>>> int Proc_8 (int arg1)
>>>> {
>>>> int result = 0;
>>>> for (int i = arg1; i<= arg1 + 5; ++i)
>>>> result++;
>>>> return result;
>>>> }
>>>>
>>>> GCC is able to determine the trip count exactly and translates this
>>>> whole function into a single instruction (+ return) on PPC.
>>>>
>>>> LLVM fails on two fronts:
>>>> - First of all, it can't deal with the<= comparison because the
>>>> code to handle this isn't there. It was added in an earlier
>>>> version of SCEV years ago but then reverted alongside a lot of
>>>> other (buggy) code. The attached patch fixes this.
>>>> - Secondly, even once the patch has been applied, LLVM can only
>>>> derive a smax() expression for the trip count of this loop. While
>>>> this is a lot better already, it's still not an exact number. I
>>>> get the impression that this has something to do with the 'nsw'
>>>> flag on the original addition not being propagated through the
>>>> SCEV analysis, but I'm not entirely sure here.
>>>>
>>>
>>> These two issues are related. You're right that we aren't getting
>>> the 'nsw' information to where scev needs it, though we have to be
>>> very careful when doing that.
>>>
>>> Because we aren't, your patch miscompiles this function:
>>> unsigned char test(unsigned char n) {
>>> unsigned char i;
>>> for (i = 0; i<= n; ++i) {
>>> }
>>> return i;
>>> }
>>> where 'n' == 255, because the loop would be infinite and we make it
>>> return zero. (Yes this may be a valid transform for certain
>>> versions of C and C++ but it isn't valid in LLVM IR at this time.)
>>>
>>>
>>> The patch passes all regression tests and the nightly test suite.
>>> I do
>>>> have one worry about the line marked with a 'FIXME', though. Is it
>>>> necessary to check for overflow here? Can this be done using the
>>>> same method as in ScalarEvolution::getBECount, i.e. zero-extend to
>>>> larger type, add, compare results?
>>>>
>>>
>>> No, even a zero-extended computation would not correctly model the
>>> infinite loop.
>>>
>>> I don't know how we ought to fix this.
>>>
>>
>> Some thoughts! Consider the loop:
>>
>> for (int i = 0; i<= n; ++i) {
>> x += 7;
>> }
>> use(x);
>>
>> What's the trip count? Right now it's CouldNotCompute, and that's the
>> ideal answer given the SCEV expressions that exist today. What is it
>> mathematically, ignoring no-signed-wrapping? It's an infinite loop if
>> n == INT_MAX otherwise the trip count is n. Note that it isn't
>> undefined when n == INT_MAX, it's defined to be an infinite loop (we
>> have a language-specific pass to eliminate such infinite loops with no
>> side-effects, but we won't do it here).
>>
>> Continuing to keep the no-signed-wrapping aside, what about x? To
>> start, exit-value-of(x) = starting-value-of(x) + 7 * (infinite loop
>> if n == INT_MAX, otherwise n), but that expression simplifies.
>> Implicit in the question "exit-value-of(x)" is the assumption that
>> the loop wasn't infinite after all, so it's exit-value-of(x) =
>> starting-value-of(x) + 7 * n.
>>
>> So it seems like we don't really need nsw here, we could simplify
>> this away by adding a new SCEV expression that is conditionally an
>> infinite loop. We could teach the SCEV expander to emit an infinite
>> loop and a branch to it. It's not clear to me how this works though
>> -- what does it mean to have "X
>> + 7 * (infinite loop)" in a scev expression? Consider the next loop:
>>
>> for (int i = 0; i<= n; ++i) {
>> x += 7;
>> for (j = i; j<= n; ++j) {
>> y += x + 1;
>> }
>> use(y);
>> }
>>
>> How do we evaluate y at the point it's used? It hasn't lost its
>> "potentially-unreachable"ness yet.
>
> In this degenerate case, both loops will be infinite under the same
> circumstances, so if we reach the use(y), then we know that both loops
> are finite.
>
> Were the inner loop to be conditioned on j<= m, then reaching use(y)
> tells us that the inner loop must be finite, but nothing about the
> outer loop. On the other hand, as you point out, hoisting the
> infinite-loop case from the outer loop is straightforward.
Oops, yes I meant to change the inner loop condition as well. Thanks.
>> And a few more questions, like how does this expression simplify?
>
> I think this depends on where the use is. If the use is after the loop,
> then the answer is yes. Inside the loop, and we may need to split out
> the infinite case first.
>
>> Can
>> we always hoist out the potentially-infinite part to the outer-most
>> level?
>
> No, I don't think so.
>
> for (i = 0; i<= n; ++i) {
> m = external_func(i);
> for (j = 0; j<= m; ++j) { ... }
> }
I should have clarified what I meant, as I was thinking of the tree
representation of a SCEV expression. We'll always rewrite an expression
like add(mul(a, x), mul(b, x)) into mul(x, add(a, b)) to get the
multiply to the outmost node, and similarly for trunc, zext, umax,
addrec, etc. There's a pecking order for which type of expression gets
to be the closest to the outside. It seems to me that mul(x,
add(potentially-inf(C, a), b)) or similar can always transform into
potentially-inf(C, mul(x, add(a, b))), and that this follows for all
existing SCEV expressions.
In your example, I don't see the problem. We get 'm = SCEVUnknown' and
then 'j = SCEVPotentiallyInfinite(m == INTMAX, m)'. I don't think that's
a problem?
>> Do folks think this new expression is sensible?
>
> To which new expression are you referring?
I suppose I didn't describe it formally. :) A new SCEV expression which
takes the form SCEVPotentiallyInfinite(condition, expression), and where
the condition must be loop invariant with respect to any recurrences in
the expression.
Another way to formulate this would be to add a SCEVChoose(condition,
expr1, expr2) and a SCEVInfinite leaf node. I've been wanting a
SCEVInfinite for a different purpose -- to indicate that we did analyze
the loop and found it infinite -- as opposed to our current reply of
SCEVCouldNotCompute. Currently if SCEV finds a loop infinite
analytically, it will proceed to brute-force it for 200 rounds anyways.
I've lacked motivation to fix it as infinite loops are relatively rare
in real world code.
SCEVChoose is more interesting, and I seem to recall reading a paper
that proposed such a SCEV node. (Sebastian? Could you share your
experience with "pchrec"s?) I considered adding this to LLVM once before
but didn't because I thought they would not be practical in real world
code. How often would choose expressions with loop variant conditions
occur, and what useful optimizations could we do on them? We could also
rephrase umax/smax in terms of such a choose node, but it's not clear to
me that's a step forward; a umax/smax expression takes n-any arguments,
for starters.
Oh and before I forget, I have a third way we could choose to define the
new SCEV expression: as a flag on add. A new "infloop on [un]signed
overflow" flag which indicates that if this add expression does
overflow, then we go straight to an infinite loop. The lowering for that
makes use of the nice LLVM IR overflow intrinsics.
Nick
More information about the llvm-commits
mailing list