[llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp LoopVectorize.h
hfinkel at anl.gov
Wed Jan 2 18:52:16 PST 2013
----- Original Message -----
> From: "Eli Friedman" <eli.friedman at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Nadav Rotem" <nrotem at apple.com>, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, January 2, 2013 8:39:59 PM
> Subject: Re: [llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> On Wed, Jan 2, 2013 at 6:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > ----- Original Message -----
> >> From: "Nadav Rotem" <nrotem at apple.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Sent: Wednesday, January 2, 2013 7:55:33 PM
> >> Subject: Re: [llvm-commits] [llvm] r171436 - in
> >> /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> >> LoopVectorize.h
> >> On Jan 2, 2013, at 5:12 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> >> Interesting. Can you please explain your motivation for doing
> >> this?
> >> Hi Hal!
> >> The loop vectorizer can now generate multiple vectors for each scalar
> >> instruction. You are right that we could have used the loop unroller
> >> for some cases. Basically we could have duplicated the loop basic
> >> block and added a new kind of alias analysis to tell the scheduler
> >> that memory operations from consecutive iterations do not alias.
> > We might want to do this anyway to help the instruction scheduler,
> > but that's another story.
> >> However, this approach would fail for code such as this one:
> >>   for (int i = 0; i < n; ++i)
> >>     sum += A[i];
> >> The 'sum' variable is a reduction variable. In order to increase ILP
> >> we'd like to have two variables that accumulate the content of A.
> >> The LoopVectorizer has all of the information and infrastructure to
> >> allow the partial unrolling of loops.
> >> Maybe the name 'unrolling' is misleading. We can think of it as wider
> >> vectors that are somehow split to legal register sizes.
> > Okay, I understand, thanks! The loop unroller would just create one
> > large dependency chain, but to increase ILP, we need several
> > chains. On the other hand, would it make more sense to teach the
> > unroller to split reduction dependency chains than to embed this
> > functionality in the vectorizer? It seems like this transformation
> > would be useful even in cases where we are not actually
> > vectorizing. Conversely, if the vectorizer is, for specialized
> > cases, a better unroller than the unroller, then maybe we should
> > specifically make sure it can be used that way.
> This transformation is basically orthogonal to anything the current
> LLVM IR loop unroller pass knows how to do: unlike the vectorizer,
> the unroller always executes all the loop iterations in the same order
> they ran before the unrolling.
Agreed. Nevertheless, splitting the dependency chains does not really need to change the order in which the iterations are executed.
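To make that concrete, here is a minimal hand-written C++ sketch of the reduction loop from Nadav's example with the dependency chain split in two. It is not taken from the patch; the function names sum_single and sum_split, the std::vector interface, and the 64-bit accumulator type are assumptions chosen for illustration. The elements are still read in the original order; only the association of the additions changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: a single accumulator, so every add depends on the previous add.
std::int64_t sum_single(const std::vector<int> &A) {
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < A.size(); ++i)
    sum += A[i];
  return sum;
}

// Hypothetical split version: two independent accumulators, so the two
// adds in each trip of the loop do not depend on each other.
std::int64_t sum_split(const std::vector<int> &A) {
  const std::size_t n = A.size();
  std::int64_t sum0 = 0, sum1 = 0;
  std::size_t i = 0;
  // Main loop: consume two consecutive elements per trip, in order.
  for (; i + 1 < n; i += 2) {
    sum0 += A[i];
    sum1 += A[i + 1];
  }
  // Tail: at most one leftover element when n is odd.
  if (i < n)
    sum0 += A[i];
  // Combine the partial sums once, after the loop.
  return sum0 + sum1;
}
```

For integer elements the two functions return the same value. For floating-point elements the reassociation can change rounding, so a compiler would need permission to reassociate before doing this on its own.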
> >> The next step would be to write code that calculates the register
> >> pressure in order to estimate the profitability of this
> >> transformation.
> > Sounds good. We may need something like this for the regular
> > unroller as well.
> Do we? I mean, if we can't vectorize a loop, the only reason to
> unroll it at the IR level is if the IR subsequently simplifies, and
> that doesn't really depend on register pressure. We can easily
> perform simple unrolling at the MachineFunction level, and we have
> much better information at that point.
Do we have anything that does that?
> (I'm using the term
> "vectorize" loosely here to mean loops where we can perform
> vectorization-style unrolling, even if there aren't any vector
> instructions involved.)
Okay; we're on the same page here (that's why I said that we may want to make sure the vectorizer can be used to do this transformation even if it is not really vectorizing).
Leadership Computing Facility
Argonne National Laboratory