[PATCH] Loop Rerolling Pass

Hal Finkel hfinkel at anl.gov
Wed Oct 16 13:00:22 PDT 2013


----- Original Message -----
> 
> On Oct 15, 2013, at 12:09 PM, hfinkel at anl.gov wrote:
> 
> > Hi nadav, rengolin, atrick,
> > 
> > I've created a loop rerolling pass. The transformation aims to take
> > loops like this:
> > 
> >  for (int i = 0; i < 3200; i += 5) {
> >    a[i] += alpha * b[i];
> >    a[i + 1] += alpha * b[i + 1];
> >    a[i + 2] += alpha * b[i + 2];
> >    a[i + 3] += alpha * b[i + 3];
> >    a[i + 4] += alpha * b[i + 4];
> >  }
> > 
> > and turn them into this:
> > 
> >  for (int i = 0; i < 3200; ++i) {
> >    a[i] += alpha * b[i];
> >  }
> > 
> > and loops like this:
> > 
> >  for (int i = 0; i < 500; ++i) {
> >    x[3*i] = foo(0);
> >    x[3*i+1] = foo(0);
> >    x[3*i+2] = foo(0);
> >  }
> > 
> > and turn them into this:
> > 
> >  for (int i = 0; i < 1500; ++i) {
> >    x[i] = foo(0);
> >  }
> > 
> > There are two motivations for this transformation:
> > 
> > 1. Code-size reduction (especially relevant, obviously, when
> > compiling for code size).
> > 
> > 2. Providing greater choice to the loop vectorizer (and generic
> > unroller) to choose the unrolling factor (and a better ability to
> > vectorize). The loop vectorizer can take vector lengths and
> > register pressure into account when choosing an unrolling factor,
> > for example, and a pre-unrolled loop limits that choice. This is
> > especially problematic if the manual unrolling was optimized for a
> > machine different from the current target.
> > 
> > The current implementation is limited to single basic-block loops
> > only. The rerolling recognition should work regardless of how the
> > loop iterations are intermixed within the loop body (subject to
> > dependency and side-effect constraints), but the significant
> > restriction is that the order of the instructions in each
> > iteration must be identical. This seems sufficient to capture all
> > of my current use cases.
> > 
> > The transformation triggers very rarely on the test suite (which I
> > think is good; programmers should be able to leave trivial
> > unrolling to the compiler). When I insert this pass just prior to
> > loop vectorization, and prior to SLP vectorization (so that we
> > prefer to reroll over SLP vectorizing), it helps:
> > 
> > On an Intel Xeon E5430:
> > MultiSource/Benchmarks/TSVC/LoopRerolling-flt: 36% speedup (loops
> > s351 and s353 are rerolled, s353's performance regresses by 9%,
> > but s351 exhibits a 76% speedup; all others are unchanged)
> > MultiSource/Benchmarks/TSVC/LoopRerolling-dbl: 13% speedup (loops
> > s351 and s353 are rerolled, s353's performance is essentially
> > unchanged, but s351 exhibits a 38% speedup; all others are
> > unchanged)
> > FreeBench/distray/distray: No significant change
> > 
> > Please review.
> 
> Thanks Hal. This looks useful.
> 
> Superficially the code looks ok as a first implementation. I can’t
> say I’ve reviewed it in depth. One question:
> 
> +      AU.addRequired<LoopInfo>();
> +      // Note: We don't preserve LoopInfo because we might add a canonical
> +      // induction variable where there was not one before.
> 
> I think adding a canonical IV is fine for any pass to do that needs
> it. But can you explain how that invalidates LoopInfo? That doesn't
> seem necessary.

This is a good point, thanks! The pass does not need to create a canonical induction variable, but does happen to create one in a lot of common cases (common among the uncommon cases in which we do anything at all). I think that what I'd like to do is add a method to LoopInfo so that I can add a canonical induction variable should I happen to create one. Then I can preserve LoopInfo and not cause any confusing downstream behavior.

Thanks again,
Hal

> 
> -Andy

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-commits mailing list