[PATCH] Loop Rerolling Pass

Wed Oct 16 18:25:15 PDT 2013

On Oct 16, 2013, at 4:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> 
>> On Oct 16, 2013, at 3:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>>> ----- Original Message -----
>>>> 
>>>> On Oct 16, 2013, at 1:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>>> ----- Original Message -----
>>>>>> 
>>>>>> On Oct 16, 2013, at 9:18 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> 
>>>>>> On 15 October 2013 22:11, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>> 
>>>>>> I made use of SCEV everywhere that I could (I think). SCEV is used to
>>>>>> analyze the induction variables, and then at the end to help with
>>>>>> the rewriting. I don't think that I can use SCEV everywhere,
>>>>>> however. For one thing, I need to check for equivalence of
>>>>>> instructions, with certain substitutions, for instructions (like
>>>>>> function calls) that SCEV does not interpret.
>>>>>> 
>>>>>> Hi Hal,
>>>>>> 
>>>>>> This is probably more my lack of understanding of all that SCEV does
>>>>>> than anything else.
>>>>>> 
>>>>>> My comment was about the fact that you seem to be investigating
>>>>>> specific cases (multiply, adding, increment size near line 240), which
>>>>>> SCEV could give you as an expression, possibly making them slightly
>>>>>> easier to work with. I'll let other SCEV/LV experts chime in,
>>>>>> because I basically don't know what I'm talking about here. ;)
>>>>>> 
>>>>>> Okay, I see what you mean. The code in this block:
>>>>>> 
>>>>>>   if (Inc == 1) {
>>>>>>     // This is a special case: here we're looking for all uses (except for
>>>>>>     // the increment) to be multiplied by a common factor. The increment must
>>>>>>     // be by one.
>>>>>>     if (I->first->getNumUses() != 2)
>>>>>>       continue;
>>>>>> 
>>>>>> This code does not use SCEV because, IMHO, there is no need. It is
>>>>>> looking for a very particular instruction pattern where the
>>>>>> induction variable has only two uses: one which increments it by one
>>>>>> (SCEV has already been used to determine that the increment is 1),
>>>>>> and the other is a multiply by a small constant. It is to catch
>>>>>> cases like this:
>>>>>> 
>>>>>> for (int i = 0; i < 500; ++i) {
>>>>>>   foo(3*i);
>>>>>>   foo(3*i+1);
>>>>>>   foo(3*i+2);
>>>>>> }
>>>>>> 
>>>>>> And so, aside from the increment, all uses of the IV are via the
>>>>>> multiply. If we find this pattern, then instead of attempting to
>>>>>> classify all IV uses as functions of i, i+1, i+2, ... we attempt to
>>>>>> classify all uses of the multiplied IV that way.
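To make the target of this pattern concrete, here is a minimal sketch of what rerolling the loop above amounts to. It assumes a hypothetical `foo` that only records its argument; the names `Calls`, `unrolled`, and `rerolled` are illustrative and do not come from the patch. The three calls on 3*i, 3*i+1, and 3*i+2 over 500 iterations should be equivalent to one call per iteration over 1500 iterations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the opaque call in the example: it only
// records the argument it was given, so call sequences can be compared.
static std::vector<int> Calls;
static void foo(int X) { Calls.push_back(X); }

// The loop as written in the example: each iteration handles three
// consecutive values derived from the multiplied induction variable 3*i.
static std::vector<int> unrolled() {
  Calls.clear();
  for (int i = 0; i < 500; ++i) {
    foo(3 * i);
    foo(3 * i + 1);
    foo(3 * i + 2);
  }
  return Calls;
}

// The rerolled form: one call per iteration, with the trip count scaled
// by the common factor (3 * 500 = 1500).
static std::vector<int> rerolled() {
  Calls.clear();
  for (int j = 0; j < 1500; ++j)
    foo(j);
  return Calls;
}
```

Comparing `unrolled()` with `rerolled()` checks that both forms issue the same calls in the same order, which is the property a rerolling transformation has to preserve.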
>>>>>> 
>>>>>> I think the more general SCEV-based way to do this would be to
>>>>>> recursively walk the def-use chains starting at phis, looking past
>>>>>> simple arithmetic until reaching an IV user (see
>>>>>> IVUsers::AddUsersIfInteresting). Then you group the users by their
>>>>>> IV operand's SCEV expression. If the SCEVs advance by the same
>>>>>> constant, then you have your unrolled iterations and it doesn't
>>>>>> matter how the induction variable was computed.
>>>>>> LSRInstance::CollectChains does something similar.
>>>>> 
>>>>> Thanks! Collecting all IV users may be overkill here, but this is
>>>>> something that I should play with.
>>>>> 
>>>>> While I have your attention (hopefully), why does SCEV not have a
>>>>> signed division representation? I suspect that is why SCEV won't
>>>>> give me a backedge-taken count for a loop like:
>>>>> 
>>>>> for (int i = 0; i < n; i += 5) {
>>>>>  ...
>>>>> }
>>>> 
>>>> 
>>>> I don’t think division by a negative divisor lends itself to
>>>> algebraic simplification. Hacker’s guide might say something about
>>>> this.
>>>> 
>>>> The reason you don’t get a trip count is that ‘i’ might step beyond
>>>> ‘n’ and overflow. If ‘n’ is a constant less than INT_MAX-4 then you
>>>> get a trip count.
>>>> 
>>>> The NSW flags are supposed to handle this case. Did you lose them
>>>> during loop unrolling?
>>> 
>>> I think that this is the key point. No, I tried this with optimized
>>> code straight out of Clang and it did not work (I don't think that
>>> it has anything to do with the loop body).
>> 
>> Right. How could I forget. This is the infamous case where we ignore
>> NSW. The problem is that we don’t know how the backedge taken count
>> will be used. We do know that if the loop exits via the current
>> branch, it will be at iteration ‘n’. However, we don’t know if the
>> loop will continue iterating beyond that point.
> 
> But it seems, that being the case, we could still return 'n/5' for loops with only one exiting block? I thought that is what SE->hasLoopInvariantBackedgeTakenCount(L) was for. It would only say that there was a backedge-taken count if the loop structure was simple enough that there was one unambiguous answer. Is this just an implementation oversight, or are there additional complications?
> 
>> 
>> So you could get a minimum taken count for a particular loop back
>> edge in this case if we adapt the SCEV API to communicate properly.
> 
> I recall discussing this before, and so I apologize, but can you elaborate on what 'communicate properly' will entail?

I’m open to anything. For each branch exit we could distinguish between a min vs. exact backedge taken count. We just have to be careful how we present it to the public API and err on the side of caution. If a user simply asks for the loop trip count, I don’t think it’s correct to return ‘n’, since subsequent iterations may run before hitting undefined behavior. There have been bugs related to this in the past.

If the client either asks for a minimum trip count, or the iteration count at which we may observe a loop exit, then we can safely provide an answer. In your case, you’re asking for an equivalent loop test, so is it safe? I think it only works for you because you know the loop contains no calls, so the program has no way to terminate before hitting undefined behavior.
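As an illustration of why the answer depends on the question asked, here is a small sketch for the loop `for (int i = 0; i < n; i += 5)` discussed earlier. It uses 64-bit arithmetic, so the sketch itself never overflows, to compute the iteration count implied by assuming no overflow and to check whether the final increment would step past INT_MAX. The helper names are hypothetical and this is not how SCEV computes anything. The check is exact for a start value of 0, so it is less conservative than the constant bound mentioned above.

```cpp
#include <climits>
#include <cstdint>

// Iterations of `for (int i = 0; i < n; i += 5)` under the assumption
// that i never overflows: ceil(n / 5) for positive n, otherwise 0.
static int64_t assumedTripCount(int N) {
  return N <= 0 ? 0 : (static_cast<int64_t>(N) + 4) / 5;
}

// True if the increment following the last iteration would exceed
// INT_MAX, i.e. the real loop would reach signed overflow rather than
// leave through the `i < n` test.
static bool finalIncrementOverflows(int N) {
  int64_t Trips = assumedTripCount(N);
  if (Trips == 0)
    return false;
  int64_t LastI = (Trips - 1) * 5; // value of i in the final iteration
  return LastI + 5 > INT_MAX;
}
```

For small `n` the assumed count is also the iteration at which the exit is observed. For `n` equal to INT_MAX the exit test cannot become false without the increment overflowing first, which is the case where a single "trip count" answer needs the caveats described here.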

Maybe the high-level SCEV interface should take a loop-may-terminate parameter. The client can set this to false if it goes to the trouble of proving it.

-Andy