[llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp LoopVectorize.h

Wed Jan 2 18:52:16 PST 2013

----- Original Message -----
> From: "Eli Friedman" <eli.friedman at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Nadav Rotem" <nrotem at apple.com>, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, January 2, 2013 8:39:59 PM
> Subject: Re: [llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> LoopVectorize.h
> 
> On Wed, Jan 2, 2013 at 6:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > ----- Original Message -----
> >> From: "Nadav Rotem" <nrotem at apple.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Sent: Wednesday, January 2, 2013 7:55:33 PM
> >> Subject: Re: [llvm-commits] [llvm] r171436 - in
> >> /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> >> LoopVectorize.h
> >>
> >> On Jan 2, 2013, at 5:12 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> >>
> >>
> >> Interesting. Can you please explain your motivation for doing
> >> this?
> >>
> >>
> >>
> >> Hi Hal!
> >>
> >>
> >> The loop vectorizer can now generate multiple vectors for each
> >> scalar
> >> instruction. You are right that we could have used the loop
> >> unroller
> >> for some cases. Basically we could have duplicated the loop basic
> >> block and added a new kind of alias analysis to tell the scheduler
> >> that memory operations from consecutive iterations do not alias.
> >
> > We might want to do this anyway to help the instruction scheduler,
> > but that's another story.
> >
> >> However, this approach would fail for code such as this one:
> >>
> >>
> >> for (int i = 0; i < n; ++i)
> >>   sum += A[i];
> >>
> >> The 'sum' variable is a reduction variable. In order to increase
> >> ILP
> >> we'd like to have two variables that accumulate the content of A.
> >> The LoopVectorizer has all of the information and infrastructure
> >> to
> >> allow the partial unrolling of loops.
> >> Maybe the name 'unrolling' is misleading. We can think of it as
> >> wider
> >> vectors that are somehow split to legal register sizes.
> >
> > Okay, I understand, thanks! The loop unroller would just create one
> > large dependency chain, but to increase ILP, we need several
> > chains. On the other hand, would it make more sense to teach the
> > unroller to split reduction dependency chains than to embed this
> > functionality in the vectorizer? It seems like this transformation
> > would be useful even in cases where we are not actually
> > vectorizing. Conversely, if the vectorizer is, for specialized
> > cases, a better unroller than the unroller, then maybe we should
> > specifically make sure it can be used that way.
> 
> This transformation is basically orthogonal to anything the current
> LLVM IR loop unroller pass knows how to do: unlike the vectorizer,
> the
> unroller always executes all the loop iterations in the same order
> they ran before the unrolling.

Agreed. Nevertheless, splitting the dependency chains does not really need to change the order in which the iterations are executed.
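
To make the reduction case concrete, here is a minimal source-level sketch of what I mean by splitting the dependency chain. It is only an illustration, not code from r171436: the function names sum_single_chain and sum_two_chains are hypothetical, and I am assuming an integer array so that regrouping the additions cannot change the result.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: every add depends on the result of the previous add,
// so the loop body forms a single serial dependency chain.
long sum_single_chain(const std::vector<int> &A) {
  long sum = 0;
  for (std::size_t i = 0; i < A.size(); ++i)
    sum += A[i];
  return sum;
}

// Two accumulators: the elements are still visited in the original order,
// but the adds now form two independent chains that can overlap in flight.
long sum_two_chains(const std::vector<int> &A) {
  long sum0 = 0, sum1 = 0;
  const std::size_t n = A.size();
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    sum0 += A[i];     // chain 0: even-indexed elements
    sum1 += A[i + 1]; // chain 1: odd-indexed elements
  }
  if (i < n)          // leftover element when n is odd
    sum0 += A[i];
  return sum0 + sum1; // combine the partial sums once, after the loop
}
```

Both functions return the same value for integer input. For floating-point data the second form regroups the additions, so the result can differ in the last bits, and a compiler would need permission to reassociate before doing this automatically.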

> 
> >>
> >>
> >> The next step would be to write code that calculates the register
> >> pressure in order to estimate the profitability of this
> >> transformation.
> >
> > Sounds good. We may need something like this for the regular
> > unroller as well.
> 
> Do we?  I mean, if we can't vectorize a loop, the only reason to
> unroll it at the IR level is if the IR subsequently simplifies, and
> that doesn't really depend on register pressure.  We can easily
> perform simple unrolling at the MachineFunction level, and we have
> much better information at that point.

Do we have anything that does that?

>  (I'm using the term
> "vectorize" loosely here to mean loops where we can perform
> vectorization-style unrolling, even if there aren't any vector
> instructions involved.)

Okay; we're on the same page here (that's why I said that we may want to make sure the vectorizer can be used to do this transformation even if it is not really vectorizing).

Thanks again,
Hal

> 
> -Eli
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory