[PATCH] Loop Rerolling Pass

Thu Oct 17 10:30:16 PDT 2013

----- Original Message -----
> On Oct 17, 2013, at 6:25 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> On Oct 16, 2013, at 4:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> On Oct 16, 2013, at 3:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> On Oct 16, 2013, at 1:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> On Oct 16, 2013, at 9:18 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> On 15 October 2013 22:11, Hal Finkel <hfinkel at anl.gov> wrote:
>
> I made use of SCEV everywhere that I could (I think). SCEV is used to
> analyze the induction variables, and then at the end to help with the
> rewriting. I don't think that I can use SCEV everywhere, however. For
> one thing, I need to check for equivalence of instructions, with
> certain substitutions, for instructions (like function calls) that
> SCEV does not interpret.
>
> Hi Hal,
>
> This is probably my lack of understanding of all that SCEV does, more
> than anything else.
>
> My comment was about the fact that you seem to be investigating
> specific cases (multiply, add, increment size near line 240), which
> SCEV could capture as an expression, possibly making them slightly
> easier to work with. I'll let other SCEV/LV experts chime in, because
> I basically don't know what I'm talking about here. ;)
>
> Okay, I see what you mean. The code in this block:
>
>   if (Inc == 1) {
>     // This is a special case: here we're looking for all uses (except
>     // for the increment) to be multiplied by a common factor. The
>     // increment must be by one.
>     if (I->first->getNumUses() != 2)
>       continue;
> 
> This code does not use SCEV because, IMHO, there is no need. It is
> looking for a very particular instruction pattern where the induction
> variable has only two uses: one which increments it by one (SCEV has
> already been used to determine that the increment is 1), and the
> other a multiply by a small constant. It is meant to catch cases like
> this:
> 
>   for (int i = 0; i < 500; ++i) {
>     foo(3*i);
>     foo(3*i+1);
>     foo(3*i+2);
>   }
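The special case above can be modeled outside LLVM with a toy representation of the IV's uses. This is a minimal sketch: `UseKind`, `IVUse`, and `match_scaled_iv` are hypothetical names, not part of the patch or of LLVM's API, and "small constant" is approximated here as any constant greater than 1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not LLVM's API) of one use of the IV. */
typedef enum { USE_ADD, USE_MUL, USE_OTHER } UseKind;
typedef struct {
  UseKind kind;
  long constant; /* the constant operand of the add or multiply */
} IVUse;

/* Matches the special case described above: the IV has exactly two uses,
   one adding 1 (the increment) and one multiplying by a constant > 1.
   Returns that constant, or 0 if the pattern does not match. */
static long match_scaled_iv(const IVUse *uses, size_t num_uses) {
  if (num_uses != 2)
    return 0;
  assert(uses != NULL);
  long scale = 0;
  int saw_increment = 0;
  for (size_t k = 0; k < num_uses; ++k) {
    if (uses[k].kind == USE_ADD && uses[k].constant == 1)
      saw_increment = 1;
    else if (uses[k].kind == USE_MUL && uses[k].constant > 1)
      scale = uses[k].constant;
    else
      return 0; /* any other kind of use breaks the pattern */
  }
  return saw_increment ? scale : 0;
}
```

For the `foo(3*i)` loop, the two uses would be `{USE_ADD, 1}` and `{USE_MUL, 3}`, so the sketch returns 3; uses of `3*i` would then be classified instead of uses of `i`.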
> 
> And so, aside from the increment, all uses of the IV are via the
> multiply. If we find this pattern, then instead of attempting to
> classify all IV uses as functions of i, i+1, i+2, ... we attempt to
> classify all uses of the multiplied IV that way.
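To make the intended result concrete: rerolling the `foo(3*i)` loop should leave one copy of the body, with the multiply folded into the IV and the bound scaled by the unroll factor. The sketch below uses a hypothetical `CallLog` recorder in place of `foo()` and checks that both forms issue the same sequence of calls; it illustrates the end result, not how the pass rewrites IR.

```c
#include <assert.h>

/* Hypothetical stand-in for foo(): records each argument so that two
   loop forms can be compared call by call. */
enum { MAX_CALLS = 2000 };
typedef struct {
  int args[MAX_CALLS];
  int len;
} CallLog;

static void foo(CallLog *log, int v) {
  assert(log->len < MAX_CALLS);
  log->args[log->len++] = v;
}

/* The loop from the message, unrolled by hand three times. */
static void unrolled(CallLog *log) {
  for (int i = 0; i < 500; ++i) {
    foo(log, 3 * i);
    foo(log, 3 * i + 1);
    foo(log, 3 * i + 2);
  }
}

/* Rerolled: a single copy of the body, the multiply folded into the IV,
   and the bound scaled by the unroll factor (3 * 500). */
static void rerolled(CallLog *log) {
  for (int i = 0; i < 1500; ++i)
    foo(log, i);
}
```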
> 
> I think the more general SCEV-based way to do this would be to
> recursively walk the def-use chains starting at phis, looking past
> simple arithmetic until reaching IV users (see
> IVUsers::AddUsersIfInteresting). Then you group the users by their IV
> operand's SCEV expression. If the SCEVs advance by the same constant,
> then you have your unrolled iterations, and it doesn't matter how the
> induction variable was computed. LSRInstance::CollectChains does
> something similar.
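A simplified model of the grouping step described above: treat each IV user's operand as an affine recurrence {start,+,step}, sort the group by start, and require that the starts advance by a constant d and that the copies together cover exactly one step of the original IV. `AddRec` and `detect_unroll_factor` are hypothetical; real SCEV expressions are richer, and this sketch skips the def-use walk entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a SCEV add recurrence {start,+,step}. */
typedef struct {
  long start;
  long step;
} AddRec;

static int cmp_start(const void *a, const void *b) {
  long x = ((const AddRec *)a)->start;
  long y = ((const AddRec *)b)->start;
  return (x > y) - (x < y);
}

/* Given the recurrences of a group of IV users (sorted in place), returns
   the apparent unroll factor if their starts advance by a constant d and
   their shared step equals count * d; otherwise returns 0. */
static int detect_unroll_factor(AddRec *recs, int count) {
  assert(count >= 0);
  if (count < 2)
    return 0;
  qsort(recs, (size_t)count, sizeof *recs, cmp_start);
  long d = recs[1].start - recs[0].start;
  if (d == 0)
    return 0;
  for (int j = 0; j < count; ++j) {
    if (recs[j].step != recs[0].step)
      return 0; /* all copies must advance at the same rate */
    if (recs[j].start != recs[0].start + j * d)
      return 0; /* starts must be evenly spaced */
  }
  /* The copies together must cover exactly one step of the original IV. */
  return recs[0].step == count * d ? count : 0;
}
```

For `foo(3*i)`, `foo(3*i+1)`, `foo(3*i+2)` the operands are {0,+,3}, {1,+,3}, {2,+,3}, giving d = 1 and a factor of 3; the same recurrences arise if the IV is instead incremented by 3 directly, which is the point about not caring how the IV was computed.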
> 
> Thanks! Collecting all IV users may be overkill here, but this is
> something that I should play with.
> 
> While I have your attention (hopefully), why does SCEV not have a
> signed division representation? I suspect that is why SCEV won't give
> me a backedge-taken count for a loop like:
> 
>   for (int i = 0; i < n; i += 5) {
>     ...
>   }
>
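For reference, the count being asked about has a simple closed form if one assumes, as NSW would justify, that `i` never wraps. The sketch below (function names are illustrative) counts body executions for a positive stride, using `(n - 1) / stride + 1` rather than `(n + stride - 1) / stride` so the computation itself cannot overflow when `n` is near `INT_MAX`. The backedge-taken count is one less whenever the body runs at least once.

```c
#include <assert.h>
#include <limits.h>

/* Number of times the body of `for (int i = 0; i < n; i += stride)` runs,
   assuming stride > 0 and that i never wraps (what NSW would guarantee). */
static long long trip_count(int n, int stride) {
  assert(stride > 0);
  if (n <= 0)
    return 0;
  /* ceil(n / stride), written so it cannot overflow even for n == INT_MAX. */
  return (long long)((n - 1) / stride) + 1;
}

/* Reference implementation: run the loop in 64-bit arithmetic, where the
   extra headroom means the simulated IV never wraps. */
static long long simulate_trip_count(int n, int stride) {
  long long count = 0;
  for (long long i = 0; i < n; i += stride)
    ++count;
  return count;
}
```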
> I don't think division by a negative divisor lends itself to
> algebraic simplification. Hacker's Delight might say something about
> this.
> 
> The reason you don't get a trip count is that 'i' might step beyond
> 'n' and overflow. If 'n' is a constant less than INT_MAX-4, then you
> get a trip count.
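The overflow condition can be checked exactly with wider arithmetic. This sketch uses hypothetical helper names and assumes a 32-bit `int`, a wider `long long`, and a positive stride. It compares the exact test with the conservative bound `n <= INT_MAX - (stride - 1)`, which for a stride of 5 is `n <= INT_MAX - 4`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Exact check: does the increment that ends
   `for (int i = 0; i < n; i += stride)` push i past INT_MAX?
   Evaluated in long long so the check itself cannot overflow. */
static bool final_step_overflows(int n, int stride) {
  assert(stride > 0);
  if (n <= 0)
    return false; /* the body never runs, so i is never incremented */
  long long last_i = (long long)((n - 1) / stride) * stride; /* largest i < n */
  return last_i + stride > INT_MAX;
}

/* Conservative check: if n <= INT_MAX - (stride - 1), then every i < n
   satisfies i + stride <= INT_MAX, so no increment can overflow. */
static bool provably_no_overflow(int n, int stride) {
  assert(stride > 0);
  return n <= INT_MAX - (stride - 1);
}
```

With a stride of 5, `n = INT_MAX - 2` fails the conservative bound but does not actually overflow, because the last value of `i` is `INT_MAX - 7`; `n = INT_MAX - 1` does overflow.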
> 
> The NSW flags are supposed to handle this case. Did you lose them
> during loop unrolling?
> 
> I think that this is the key point. No, I tried this with optimized
> code straight out of Clang and it did not work (I don't think that it
> has anything to do with the loop body).
> 
> Right. How could I forget? This is the infamous case where we ignore
> NSW. The problem is that we don't know how the backedge-taken count
> will be used. We do know that if the loop exits via the current
> branch, it will be at iteration 'n'. However, we don't know whether
> the loop will continue iterating beyond that point.
> 
> But it seems that, this being the case, we could still return 'n/5'
> for loops with only one exiting block? I thought that is what
> SE->hasLoopInvariantBackedgeTakenCount(L) was for. It would only say
> that there was a backedge-taken count if the loop structure was
> simple enough that there was one unambiguous answer. Is this just an
> implementation oversight, or are there additional complications?
>
> So you could get a minimum taken count for a particular loop back
> edge in this case if we adapt the SCEV API to communicate properly.
> 
> I recall discussing this before, and so I apologize, but can you
> elaborate on what 'communicate properly' will entail?
> 
> I'm open to anything. For each branch exit we could distinguish
> between a minimum vs. an exact backedge-taken count. We just have to
> be careful how we present it in the public API and err on the side of
> caution. If a user simply asks for the loop trip count, I don't think
> it's correct to return 'n', since subsequent iterations may run before
> hitting undefined behavior. There have been bugs related to this in
> the past.
> 
> If the client asks either for a minimum trip count or for the
> iteration count at which we may observe a loop exit, then we can
> safely provide an answer. In your case, you're asking for an
> equivalent loop test, so is it safe? I think it only works for you
> because you know the loop contains no calls, so the program has no
> way to terminate before hitting undefined behavior.
> 
> Maybe the high-level SCEV interface should take a loop-may-terminate
> parameter. The client can set this to false if it goes to the trouble
> of proving it.
> 
> This sounds like a good idea. However, don't all of these concerns
> apply equally to a constant 'n'?
>
> If the loop is testing less-than against a constant 'n', I think we
> already handle it (knowing n < INT_MAX-stride). I'm not sure what we
> do for equals 'n'.

No, that does not currently work. Compiling the following with Clang:

void foo(int n, int *x) {
  for (int i = 0; i < n; i += 3) {
    x[i] = i;
    x[i+1] = i+1;
    x[i+2] = i+2;
  }
}

I find that SE->hasLoopInvariantBackedgeTakenCount(L) returns false. From what you're saying, this sounds like a bug, no?
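This example also shows why the rerolled loop needs a trip count. Rerolling it into a single `x[i] = i` loop bounded by `n` would be wrong: when `n` is not a multiple of 3, the original still performs all three stores in its last iteration, writing up to `x[3*ceil(n/3) - 1]`. A sketch of a correct rerolled form (hypothetical function names; it assumes `n <= INT_MAX - 2`, so the original IV cannot overflow) derives the new bound from the original trip count:

```c
#include <assert.h>
#include <limits.h>

/* The loop from the message. */
static void foo_unrolled(int n, int *x) {
  for (int i = 0; i < n; i += 3) {
    x[i] = i;
    x[i + 1] = i + 1;
    x[i + 2] = i + 2;
  }
}

/* Rerolled form: one store per iteration, with the bound set to
   3 * (trip count of the original loop) rather than n. */
static void foo_rerolled(int n, int *x) {
  assert(n <= INT_MAX - 2); /* assumption: original IV never overflows */
  int trips = n > 0 ? (n - 1) / 3 + 1 : 0;
  int bound = 3 * trips;
  for (int i = 0; i < bound; ++i)
    x[i] = i;
}
```

For `n = 4`, both versions write `x[0]` through `x[5]`, whereas a loop bounded by `n` would stop at `x[3]`.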

 -Hal

> 
> 
> -Andy

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory