[PATCH] Loop Rerolling Pass

Wed Oct 16 15:58:02 PDT 2013

On Oct 16, 2013, at 3:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> 
>> On Oct 16, 2013, at 1:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>>> ----- Original Message -----
>>>> 
>>>> On Oct 16, 2013, at 9:18 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>> ----- Original Message -----
>>>> 
>>>> On 15 October 2013 22:11, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>> I made use of SCEV everywhere that I could (I think). SCEV is used to
>>>> analyze the induction variables, and then at the end to help with
>>>> the rewriting. I don't think that I can use SCEV everywhere,
>>>> however. For one thing, I need to check instructions for
>>>> equivalence, under certain substitutions, including instructions
>>>> (like function calls) that SCEV does not interpret.
>>>> 
>>>> Hi Hal,
>>>> 
>>>> This is probably more my lack of understanding of all that SCEV does
>>>> than anything else.
>>>> 
>>>> My comment was about the fact that you seem to be investigating
>>>> specific cases (multiply, add, increment size near line 240) that
>>>> SCEV could give you as an expression, possibly making them slightly
>>>> easier to work with. I'll let other SCEV/LV experts chime in,
>>>> because I basically don't know what I'm talking about here. ;)
>>>> 
>>>> Okay, I see what you mean. The code in this block:
>>>> if (Inc == 1) {
>>>>   // This is a special case: here we're looking for all uses (except for
>>>>   // the increment) to be multiplied by a common factor. The increment
>>>>   // must be by one.
>>>>   if (I->first->getNumUses() != 2)
>>>>     continue;
>>>>   ...
>>>> }
>>>> 
>>>> This code does not use SCEV because, IMHO, there is no need. It is
>>>> looking for a very particular instruction pattern where the
>>>> induction variable has only two uses: one that increments it by one
>>>> (SCEV has already been used to determine that the increment is 1),
>>>> and the other a multiply by a small constant. It is meant to catch
>>>> cases like this:
>>>> 
>>>> for (int i = 0; i < 500; ++i) {
>>>>   foo(3*i);
>>>>   foo(3*i+1);
>>>>   foo(3*i+2);
>>>> }
>>>> 
>>>> And so, aside from the increment, all uses of the IV are via the
>>>> multiply. If we find this pattern, then instead of attempting to
>>>> classify all IV uses as functions of i, i+1, i+2, ..., we attempt to
>>>> classify all uses of the multiplied IV that way.
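A minimal C++ sketch of what rerolling buys in the `foo(3*i)` example: the loop unrolled by three and a single-statement loop over the multiplied IV issue exactly the same sequence of calls. `foo` here is a hypothetical stand-in that only records its argument, and nothing below is the patch's actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the opaque call `foo`: it records its argument
// so that the two loop shapes can be compared call by call.
static void foo(std::vector<int> &trace, int x) { trace.push_back(x); }

// The loop as written: unrolled by 3, with every non-increment use of i
// going through the multiply 3*i.
std::vector<int> unrolledLoop() {
  std::vector<int> trace;
  for (int i = 0; i < 500; ++i) {
    foo(trace, 3 * i);
    foo(trace, 3 * i + 1);
    foo(trace, 3 * i + 2);
  }
  return trace;
}

// The rerolled form: the multiplied IV becomes the new induction variable j,
// the three copies of the body collapse into one, and the trip count grows
// from 500 to 3 * 500.
std::vector<int> rerolledLoop() {
  std::vector<int> trace;
  for (int j = 0; j < 3 * 500; ++j)
    foo(trace, j);
  return trace;
}
```

The comparison only holds because the three calls appear in increasing order of their offsets; a pass doing this for real also has to prove that reordering or merging the copies is safe.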
>>>> 
>>>> I think the more general SCEV-based way to do this would be to
>>>> recursively walk the def-use chains starting at phis, looking past
>>>> simple arithmetic until reaching IV users (see
>>>> IVUsers::AddUsersIfInteresting). Then you group the users by their
>>>> IV operand's SCEV expression. If the SCEVs advance by the same
>>>> constant, then you have your unrolled iterations, and it doesn't
>>>> matter how the induction variable was computed.
>>>> LSRInstance::CollectChains does something similar.
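The grouping check suggested above can be sketched outside LLVM, assuming each IV user's operand has already been reduced to an affine add recurrence {Start,+,Step}. `AffineRec` and `detectUnrollFactor` are hypothetical names, not SCEV or IVUsers APIs, and the def-use walk itself is not modeled; the sketch only tests whether a group of recurrences looks like N unrolled copies of one iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an affine SCEV add recurrence {Start,+,Step}:
// in loop iteration k its value is Start + k * Step.
struct AffineRec {
  std::int64_t Start;
  std::int64_t Step;
};

// Returns the number of unrolled copies the users represent, or 0 when they
// do not advance by a common constant.
unsigned detectUnrollFactor(std::vector<AffineRec> Users) {
  if (Users.size() < 2)
    return 0;
  std::sort(Users.begin(), Users.end(),
            [](const AffineRec &A, const AffineRec &B) {
              return A.Start < B.Start;
            });
  const std::int64_t Step = Users.front().Step;
  const std::int64_t Delta = Users[1].Start - Users[0].Start;
  if (Delta == 0)
    return 0;
  for (std::size_t K = 1; K < Users.size(); ++K) {
    // Every user must advance at the same rate, and adjacent users must be
    // exactly one original iteration (Delta) apart.
    if (Users[K].Step != Step || Users[K].Start - Users[K - 1].Start != Delta)
      return 0;
  }
  // After N copies the next loop iteration must continue where the last
  // copy stopped, so the per-iteration step must equal N * Delta.
  const std::int64_t N = static_cast<std::int64_t>(Users.size());
  if (Step != N * Delta)
    return 0;
  return static_cast<unsigned>(N);
}
```

For the `foo(3*i)`, `foo(3*i+1)`, `foo(3*i+2)` loop the operands are {0,+,3}, {1,+,3} and {2,+,3}, which yields a factor of 3 regardless of whether the multiply was written explicitly or produced by strength reduction.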
>>> 
>>> Thanks! Collecting all IV users may be overkill here, but this is
>>> something that I should play with.
>>> 
>>> While I have your attention (hopefully), why does SCEV not have a
>>> signed division representation? I suspect that is why SCEV won't
>>> give me a backedge-taken count for a loop like:
>>> 
>>> for (int i = 0; i < n; i += 5) {
>>>   ...
>>> }
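One way to see why signed division resists algebraic rewriting (an illustration only, not SCEV's actual reasoning): C's `/` on signed integers truncates toward zero, so an identity that holds for non-negative dividends fails once the dividend is negative. For context, SCEV's only division node is unsigned (`SCEVUDivExpr`). The helper names below are hypothetical.

```cpp
#include <cassert>

// With truncating signed division, (x + k*d) / d == x / d + k holds for
// non-negative x (absent overflow) but breaks when x is negative and
// x + k*d is not, because both sides round toward zero from opposite sides.
int sdivShifted(int x, int k, int d) { return (x + k * d) / d; }
int sdivSplit(int x, int k, int d) { return x / d + k; }

// Floor division (rounding toward negative infinity) satisfies the identity
// for either sign of x, which makes it easier to simplify algebraically.
int floorDiv(int x, int d) {
  assert(d > 0 && "sketch only handles positive divisors");
  int q = x / d;
  if (x % d != 0 && x < 0)
    --q;  // truncation rounded up toward zero; step down to the floor
  return q;
}
```

With x = -1, k = 1, d = 5, `sdivShifted` gives 4 / 5 == 0 while `sdivSplit` gives -1 / 5 + 1 == 1, whereas `floorDiv(-1 + 5, 5) == floorDiv(-1, 5) + 1` holds.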
>> 
>> 
>> I don’t think division by a negative divisor lends itself to
>> algebraic simplification. Hacker’s Delight might say something about
>> this.
>> 
>> The reason you don’t get a trip count is that ‘i’ might step beyond
>> ‘n’ and overflow. If ‘n’ is a constant less than INT_MAX-4, then you
>> get a trip count.
>> 
>> The NSW flags are supposed to handle this case. Did you lose them
>> during loop unrolling?
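The arithmetic behind the INT_MAX-4 bound can be checked with a small sketch, assuming the loop starts at 0 and steps by exactly 5: the last value of `i` that enters the body is at most n - 1, so `i += 5` stays in range whenever n + 4 <= INT_MAX. `tripCountStep5` and `simulateTripCount` are hypothetical helpers; the bound is deliberately conservative, so for some larger constants a count exists that this test cannot establish.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of times the body of
//   for (int i = 0; i < n; i += 5) { ... }
// executes, or -1 when the simple bound cannot rule out i + 5 wrapping
// past INT_MAX (i.e. when a no-signed-wrap argument would be needed).
std::int64_t tripCountStep5(int n) {
  if (n > INT_MAX - 4)
    return -1;  // i can reach n - 1, and n - 1 + 5 may exceed INT_MAX
  if (n <= 0)
    return 0;   // the body never runs
  return (static_cast<std::int64_t>(n) + 4) / 5;  // ceil(n / 5)
}

// Reference: simulate the loop with a 64-bit counter so that the
// simulation itself can never overflow.
std::int64_t simulateTripCount(int n) {
  std::int64_t count = 0;
  for (std::int64_t i = 0; i < n; i += 5)
    ++count;
  return count;
}
```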
> 
> I think that this is the key point. No, I tried this with optimized code straight out of Clang and it did not work (I don't think that it has anything to do with the loop body).

Right. How could I forget? This is the infamous case where we ignore NSW. The problem is that we don’t know how the backedge-taken count will be used. We do know that if the loop exits via the current branch, it will be at iteration ‘n’. However, we don’t know whether the loop will continue iterating beyond that point.

So you could get a minimum taken count for a particular loop backedge in this case if we adapt the SCEV API to communicate this properly.

The advantage of doing this within LoopVectorizer is that we can gather the loop preconditions and emit preheader checks when profitable.
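A rough sketch of such a preheader check, in the form of loop versioning: test the no-overflow condition once before the loop, run a counted loop whose trip count is known up front when it holds, and fall back to the original loop otherwise. The loop body (`sum += i`) and the function name are hypothetical, and this is not how LoopVectorizer is actually implemented.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Versioned form of the hypothetical loop
//   for (int i = 0; i < n; i += 5) sum += i;
std::int64_t sumVersioned(int n) {
  std::int64_t sum = 0;
  if (n <= INT_MAX - 4) {
    // Fast path: the preheader check guarantees i + 5 cannot wrap, so the
    // trip count is computed once and the loop runs as a counted loop,
    // the shape a vectorizer can transform.
    const std::int64_t trip =
        n > 0 ? (static_cast<std::int64_t>(n) + 4) / 5 : 0;
    for (std::int64_t k = 0; k < trip; ++k)
      sum += 5 * k;  // in iteration k, i == 5 * k
  } else {
    // Slow path: no trip count is assumed. A 64-bit counter keeps this
    // sketch itself free of signed overflow.
    for (std::int64_t i = 0; i < n; i += 5)
      sum += i;
  }
  return sum;
}
```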

-Andy