[PATCH] Loop Rerolling Pass

Thu Oct 17 06:25:09 PDT 2013

----- Original Message -----
> 
> On Oct 16, 2013, at 4:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > ----- Original Message -----
> >> 
> >> On Oct 16, 2013, at 3:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> 
> >>> ----- Original Message -----
> >>>> 
> >>>> On Oct 16, 2013, at 1:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>>> 
> >>>>> ----- Original Message -----
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> On Oct 16, 2013, at 9:18 AM, Hal Finkel < hfinkel at anl.gov >
> >>>>>> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> ----- Original Message -----
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> On 15 October 2013 22:11, Hal Finkel < hfinkel at anl.gov >
> >>>>>> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> I made use of SCEV everywhere that I could (I think). SCEV is used to
> >>>>>> analyze the induction variables, and then at the end to help with the
> >>>>>> rewriting. I don't think that I can use SCEV everywhere, however. For
> >>>>>> one thing, I need to check for equivalence of instructions, with
> >>>>>> certain substitutions, for instructions (like function calls) that
> >>>>>> SCEV does not interpret.
> >>>>>> 
> >>>>>> 
> >>>>>> Hi Hal,
> >>>>>> 
> >>>>>> 
> >>>>>> This is probably more my lack of understanding of all that SCEV does
> >>>>>> than anything else.
> >>>>>> 
> >>>>>> 
> >>>>>> My comment was about the fact that you seem to be investigating
> >>>>>> specific cases (multiply, adding, increment size near line 240), which
> >>>>>> SCEV could give you as an expression, possibly making it slightly
> >>>>>> easier to work with. I'll let other SCEV/LV experts chime in, because
> >>>>>> I basically don't know what I'm talking about here. ;)
> >>>>>> 
> >>>>>> Okay, I see what you mean. The code in this block:
> >>>>>>   if (Inc == 1) {
> >>>>>>     // This is a special case: here we're looking for all uses (except
> >>>>>>     // for the increment) to be multiplied by a common factor. The
> >>>>>>     // increment must be by one.
> >>>>>>     if (I->first->getNumUses() != 2)
> >>>>>>       continue;
> >>>>>> 
> >>>>>> This code does not use SCEV because, IMHO, there is no need. It is
> >>>>>> looking for a very particular instruction pattern where the induction
> >>>>>> variable has only two uses: one which increments it by one (SCEV has
> >>>>>> already been used to determine that the increment is 1), and the other
> >>>>>> is a multiply by a small constant. It is to catch cases like this:
> >>>>>> 
> >>>>>>   for (int i = 0; i < 500; ++i) {
> >>>>>>     foo(3*i);
> >>>>>>     foo(3*i+1);
> >>>>>>     foo(3*i+2);
> >>>>>>   }
> >>>>>> 
> >>>>>> And so, aside from the increment, all uses of the IV are via the
> >>>>>> multiply. If we find this pattern, then instead of attempting to
> >>>>>> classify all IV uses as functions of i, i+1, i+2, ... we attempt to
> >>>>>> classify all uses of the multiplied IV that way.
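
As a concrete check of the example above, here is a minimal sketch in plain
C++ (not code from the patch). It records the calls made by the loop as
written and by the rerolled form the pass is aiming for, and compares the two
sequences. The recording version of foo() and the helper names runUnrolled,
runRerolled and sameCallSequence are hypothetical, chosen only for this
illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the opaque call foo() in the example above:
// it only records its argument so that two call sequences can be compared.
static void foo(std::vector<int> &Trace, int X) { Trace.push_back(X); }

// The loop as written: three calls per iteration, 500 iterations.
std::vector<int> runUnrolled() {
  std::vector<int> Trace;
  for (int i = 0; i < 500; ++i) {
    foo(Trace, 3 * i);
    foo(Trace, 3 * i + 1);
    foo(Trace, 3 * i + 2);
  }
  return Trace;
}

// The rerolled equivalent: one call per iteration, 3 * 500 iterations.
std::vector<int> runRerolled() {
  std::vector<int> Trace;
  for (int j = 0; j < 3 * 500; ++j)
    foo(Trace, j);
  return Trace;
}

// Returns true when both forms make the same calls in the same order.
bool sameCallSequence() {
  std::vector<int> U = runUnrolled();
  std::vector<int> R = runRerolled();
  assert(U.size() == 1500 && "unrolled loop makes 3 calls per iteration");
  return U == R;
}
```

Calling sameCallSequence() from a small main() is enough to exercise it. The
sketch only shows that the two loops are observably equivalent for this
particular body; it does not model how the pass matches instructions.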
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> I think the more general SCEV-based way to do this would be to
> >>>>>> recursively walk the def-use chains starting at phis, looking past
> >>>>>> simple arithmetic until reaching an IV user (see
> >>>>>> IVUsers::AddUsersIfInteresting). Then you group the users by their IV
> >>>>>> operand's SCEV expression. If the SCEVs advance by the same constant,
> >>>>>> then you have your unrolled iterations and it doesn't matter how the
> >>>>>> induction variable was computed. LSRInstance::CollectChains does
> >>>>>> something similar.
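
One way to read the grouping idea above, sketched as a toy model rather than
with the real SCEV classes: treat each IV operand as an affine recurrence
{Start,+,Step}, sort the operands by Start, and accept the group when the
Starts advance by one common constant and together tile the Step. The AddRec
struct and findRerollFactor are hypothetical names for this illustration; they
are not LLVM's SCEVAddRecExpr or any existing API, and the tiling condition is
my assumption about what "advance by the same constant" has to mean for
rerolling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of an affine add-recurrence {Start,+,Step}. A hypothetical
// stand-in for illustration only, not LLVM's SCEVAddRecExpr.
struct AddRec {
  long Start;
  long Step;
};

// Returns the number of unrolled iterations if the operands look like one
// rolled body repeated, or 0 otherwise. The operands must share one Step,
// their Starts must advance by a common positive constant Delta, and
// Count * Delta must equal that Step.
unsigned findRerollFactor(std::vector<AddRec> Ops) {
  if (Ops.size() < 2)
    return 0;
  std::sort(Ops.begin(), Ops.end(),
            [](const AddRec &A, const AddRec &B) { return A.Start < B.Start; });
  const long Step = Ops.front().Step;
  const long Delta = Ops[1].Start - Ops[0].Start;
  if (Delta <= 0)
    return 0;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (Ops[I].Step != Step)
      return 0; // Operands advance at different rates.
    if (I > 0 && Ops[I].Start - Ops[I - 1].Start != Delta)
      return 0; // Starts are not evenly spaced.
  }
  if (static_cast<long>(Ops.size()) * Delta != Step)
    return 0; // The group does not tile one step of the recurrence.
  assert(Ops.size() >= 2 && "a reroll needs at least two copies");
  return static_cast<unsigned>(Ops.size());
}
```

For the foo(3*i), foo(3*i+1), foo(3*i+2) loop, the operands would be modeled
as {0,+,3}, {1,+,3} and {2,+,3}, and the sketch reports a factor of 3.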
> >>>>> 
> >>>>> Thanks! Collecting all IV users may be overkill here, but this is
> >>>>> something that I should play with.
> >>>>> 
> >>>>> While I have your attention (hopefully), why does SCEV not have a
> >>>>> signed division representation? I suspect that is why SCEV won't give
> >>>>> me a backedge-taken count for a loop like:
> >>>>> 
> >>>>> for (int i = 0; i < n; i += 5) {
> >>>>>  ...
> >>>>> }
> >>>> 
> >>>> 
> >>>> I don’t think division by a negative divisor lends itself to
> >>>> algebraic simplification. Hacker’s guide might say something about
> >>>> this.
> >>>> 
> >>>> The reason you don’t get a trip count is that 'i' might step beyond
> >>>> 'n' and overflow. If 'n' is a constant less than INT_MAX-4 then you
> >>>> get a trip count.
> >>>> 
> >>>> The NSW flags are supposed to handle this case. Did you lose them
> >>>> during loop unrolling?
> >>> 
> >>> I think that this is the key point. No, I tried this with optimized
> >>> code straight out of Clang and it did not work (I don't think that it
> >>> has anything to do with the loop body).
> >> 
> >> Right. How could I forget. This is the infamous case where we ignore
> >> NSW. The problem is that we don’t know how the backedge taken count
> >> will be used. We do know that if the loop exits via the current
> >> branch, it will be at iteration 'n'. However, we don’t know if the
> >> loop will continue iterating beyond that point.
> > 
> > But it seems, that being the case, we could still return 'n/5' for
> > loops with only one exiting block? I thought that is what
> > SE->hasLoopInvariantBackedgeTakenCount(L) was for. It would only
> > say that there was a backedge-taken count if the loop structure
> > was simple enough that there was one unambiguous answer. Is this
> > just an implementation oversight, or are there additional
> > complications?
> > 
> >> 
> >> So you could get a minimum taken count for a particular loop back
> >> edge in this case if we adapt the SCEV API to communicate
> >> properly.
> > 
> > I recall discussing this before, and so I apologize, but can you
> > elaborate on what 'communicate properly' will entail?
> 
> I’m open to anything. For each branch exit we could distinguish
> between a min vs. exact backedge taken count. We just have to be
> careful how we present it to the public API and err on the side of
> caution. If a user simply asks for the loop trip count, I don’t
> think it’s correct to return 'n', since subsequent iterations may
> run before hitting undefined behavior. There have been bugs related
> to this in the past.
> 
> If the client either asks for a minimum trip count, or the iteration
> count at which we may observe a loop exit, then we can safely
> provide an answer. In your case, you’re asking for an equivalent
> loop test, so is it safe? I think it only works for you because you
> know the loop contains no calls, so the program has no way to
> terminate before hitting undefined behavior.
> 
> Maybe the high level SCEV interface should take a loop-may-terminate
> parameter. The client can set this to false if it goes to the
> trouble of proving it.

This sounds like a good idea. However, do all of these concerns not equally apply for a constant 'n'?
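
To make the constant-'n' case concrete, here is a minimal sketch of the
arithmetic for the 'i += 5' loop quoted above. It is plain C++, not SCEV code,
and the helper names tripCount and incrementOverflows are made up for this
illustration. It computes the closed-form trip count under the assumption that
the increment never wraps, and separately checks whether the final increment
would exceed INT_MAX. It models only the overflow side of the question; it
says nothing about whether the loop body might terminate the program first.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of times the body of
//   for (int i = 0; i < N; i += Step) { ... }
// runs, assuming i += Step never overflows. Computed in 64 bits so that
// the helper itself cannot overflow.
int64_t tripCount(int N, int Step) {
  assert(Step > 0 && "this sketch only covers positive steps");
  if (N <= 0)
    return 0; // The exit test fails before the first iteration.
  return (static_cast<int64_t>(N) + Step - 1) / Step; // ceil(N / Step)
}

// True if the last i += Step would exceed INT_MAX, i.e. the loop would hit
// signed overflow before the exit test 'i < N' could fail.
bool incrementOverflows(int N, int Step) {
  int64_t FinalI = tripCount(N, Step) * Step; // Value of i at loop exit.
  return FinalI > INT_MAX;
}
```

With Step == 5, incrementOverflows(INT_MAX - 4, 5) is false, which matches the
INT_MAX-4 bound mentioned earlier in the thread, while
incrementOverflows(INT_MAX, 5) is true on a target with a 32-bit int.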

 -Hal

> 
> -Andy

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory