[PATCH] Add getUnrollingPreferences to TTI

Fri Aug 30 10:54:26 PDT 2013

----- Andrew Trick <atrick at apple.com> wrote:
> 
> On Aug 29, 2013, at 4:44 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> >> Second, I think essentially only the LoopVectorizer should use this
> >> to drive its partial unrolling. This is both because it has added
> >> information about when the cost is low (ILP) and when the cost is
> >> unusually high. Finally, it should happen at the same phase of the
> >> optimizer IMO (i.e., it shouldn't drastically influence the behavior
> >> of the inliner). While the LoopVectorizer is currently inside the
> >> CGSCC pass pipeline, Nadav is working on moving it to live after
> >> that completes.
> > 
> > The loop unroller's partial unrolling is different from the loop vectorizer's partial unrolling. The loop vectorizer specifically partially unrolls vectorizable loops for ILP (with loop iteration bodies maximally intermixed), while the loop unroller uses loop-body concatenation. Both are important:
> > 
> > - On cores with high loop-branch overhead, partial unrolling is important for all loops with small bodies, regardless of ILP considerations, in order to hide the backedge cost.
> > 
> > - Because the loop vectorizer does not partially unroll non-vectorizable loops, relying on it alone would miss a lot of important cases where loop-body concatenation can nevertheless help hide instruction latency and expose ILP near the end and beginning of the loop body. This is especially important for cores with deep pipelines (high instruction latency).
> > 
> > In short, the ILP unrolling that the loop vectorizer does is almost always preferable (so long as its register pressure estimates are good), but it is not always possible, and the concatenation-based unrolling is also important.
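
To make the distinction between the two unrolling styles concrete, here is a small hand-written, source-level sketch. It is only an illustration, not the output of either LLVM pass; the function names are made up, the unroll factor of 4 is arbitrary, and the code assumes x has at least as many elements as y and that the two vectors are distinct objects.

```cpp
#include <cstddef>
#include <vector>

// Original loop: one loop body per backedge.
void axpy_rolled(int a, const std::vector<int>& x, std::vector<int>& y) {
  for (std::size_t i = 0; i < y.size(); ++i)
    y[i] = a * x[i] + y[i];
}

// Concatenation-style unrolling (what the loop unroller does, roughly):
// four copies of the body placed back to back, so the backedge is taken
// once per four iterations.
void axpy_concat(int a, const std::vector<int>& x, std::vector<int>& y) {
  const std::size_t n = y.size();
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    y[i]     = a * x[i]     + y[i];
    y[i + 1] = a * x[i + 1] + y[i + 1];
    y[i + 2] = a * x[i + 2] + y[i + 2];
    y[i + 3] = a * x[i + 3] + y[i + 3];
  }
  for (; i < n; ++i)  // remainder iterations
    y[i] = a * x[i] + y[i];
}

// Interleaved (ILP-style) unrolling, in the spirit of what the vectorizer
// does: the same step from four iterations is grouped together, so
// independent operations sit next to each other.
void axpy_interleaved(int a, const std::vector<int>& x, std::vector<int>& y) {
  const std::size_t n = y.size();
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    // All loads first...
    const int x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
    const int y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];
    // ...then all multiplies...
    const int p0 = a * x0, p1 = a * x1, p2 = a * x2, p3 = a * x3;
    // ...then all adds and stores.
    y[i]     = p0 + y0;
    y[i + 1] = p1 + y1;
    y[i + 2] = p2 + y2;
    y[i + 3] = p3 + y3;
  }
  for (; i < n; ++i)  // remainder iterations
    y[i] = a * x[i] + y[i];
}
```

All three functions compute the same result. The concatenated form mainly amortizes the loop branch and leaves any overlap between neighboring bodies to the scheduler, while the interleaved form lays the independent work out explicitly.
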
> 
> Great points. I think you’re arguing for either having the LoopVectorizer pass call the normal unroller as a utility, or running a late partial unrolling pass after the vectorizer. I’m not immediately sure which is cleaner.
> 
> You don’t want to partially unroll before vectorization, right? Don’t you want to determine the vector/ILP unroll factor first, then concatenate iterations if it’s still profitable?

That's correct.
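
As a rough sketch of that ordering (pick the interleave factor first, then spend whatever size budget remains on concatenation), something like the following could work. Everything here is hypothetical: UnrollPlan, planUnroll, the power-of-two search, and the simple size-budget cost model are invented for illustration and are not part of the patch or of any existing LLVM interface.

```cpp
// Hypothetical result type: how many iterations to interleave for ILP, and
// how many copies of the (already interleaved) body to concatenate.
struct UnrollPlan {
  unsigned interleave;
  unsigned concat;
};

// Hypothetical planner. bodySize and sizeBudget are in arbitrary
// "instruction count" units; maxInterleave stands in for whatever limit a
// register-pressure estimate would impose.
UnrollPlan planUnroll(bool vectorizable, unsigned bodySize,
                      unsigned maxInterleave, unsigned sizeBudget) {
  if (bodySize == 0)
    return {1, 1};

  // Step 1: choose the ILP interleave factor, only for vectorizable loops.
  unsigned interleave = 1;
  if (vectorizable) {
    while (interleave * 2 <= maxInterleave &&
           bodySize * interleave * 2 <= sizeBudget)
      interleave *= 2;
  }

  // Step 2: concatenate the resulting body as long as the budget allows.
  unsigned concat = sizeBudget / (bodySize * interleave);
  if (concat == 0)
    concat = 1;  // never go below "no unrolling"

  return {interleave, concat};
}
```

With a body of 10 units, a limit of 4, and a budget of 100, a vectorizable loop gets interleaved by 4 and then concatenated twice, while a non-vectorizable loop gets no interleaving and falls back entirely on concatenation.
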

> 
> I don’t think this needs to be fixed now if it works for you. When the vectorizer moves out of CGSCC, we can split the unroll pass appropriately. Shouldn’t be too hard.

Sounds good. Thanks again,
Hal

> 
> -Andy
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory