[llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp LoopVectorize.h

Wed Jan 2 20:59:52 PST 2013

----- Original Message -----
> From: "Eli Friedman" <eli.friedman at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Nadav Rotem" <nrotem at apple.com>, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, January 2, 2013 9:21:50 PM
> Subject: Re: [llvm-commits] [llvm] r171436 - in /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> LoopVectorize.h
> 
> On Wed, Jan 2, 2013 at 6:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > ----- Original Message -----
> >> From: "Eli Friedman" <eli.friedman at gmail.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: "Nadav Rotem" <nrotem at apple.com>, llvm-commits at cs.uiuc.edu
> >> Sent: Wednesday, January 2, 2013 8:39:59 PM
> >> Subject: Re: [llvm-commits] [llvm] r171436 - in
> >> /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> >> LoopVectorize.h
> >>
> >> On Wed, Jan 2, 2013 at 6:20 PM, Hal Finkel <hfinkel at anl.gov>
> >> wrote:
> >> > ----- Original Message -----
> >> >> From: "Nadav Rotem" <nrotem at apple.com>
> >> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> >> Cc: llvm-commits at cs.uiuc.edu
> >> >> Sent: Wednesday, January 2, 2013 7:55:33 PM
> >> >> Subject: Re: [llvm-commits] [llvm] r171436 - in
> >> >> /llvm/trunk/lib/Transforms/Vectorize: LoopVectorize.cpp
> >> >> LoopVectorize.h
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Jan 2, 2013, at 5:12 PM, Hal Finkel < hfinkel at anl.gov >
> >> >> wrote:
> >> >>
> >> >>
> >> >> Interesting. Can you please explain your motivation for doing
> >> >> this?
> >> >>
> >> >>
> >> >>
> >> >> Hi Hal!
> >> >>
> >> >>
> >> >> The loop vectorizer can now generate multiple vectors for each
> >> >> scalar
> >> >> instruction. You are right that we could have used the loop
> >> >> unroller
> >> >> for some cases. Basically we could have duplicated the loop
> >> >> basic
> >> >> block and added a new kind of alias analysis to tell the
> >> >> scheduler
> >> >> that memory operations from consecutive iterations do not
> >> >> alias.
> >> >
> >> > We might want to do this anyway to help the instruction
> >> > scheduler,
> >> > but that's another story.
> >> >
> >> >> However, this approach would fail for code such as this one:
> >> >>
> >> >>
> >> >> for (int i = 0; i < n; ++i)
> >> >> sum += A[i];
> >> >>
> >> >> The 'sum' variable is a reduction variable. In order to
> >> >> increase
> >> >> ILP
> >> >> we'd like to have two variables that accumulate the content of
> >> >> A.
> >> >> The LoopVectorizer has all of the information and
> >> >> infrastructure
> >> >> to
> >> >> allow the partial unrolling of loops.
> >> >> Maybe the name 'unrolling' is misleading. We can think of it as
> >> >> wider
> >> >> vectors that are somehow split to legal register sizes.
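[Illustration of the reduction example quoted above. This is a hand-written sketch, not the vectorizer's actual output; the function names sum_single and sum_split2, the unroll factor of 2, and the use of unsigned elements are assumptions made for the example. It shows the single accumulator being split into two independent dependency chains that are combined after the loop.]

```cpp
#include <cstddef>
#include <vector>

// Baseline: one accumulator, so every add depends on the previous add.
unsigned sum_single(const std::vector<unsigned> &A) {
  unsigned sum = 0;
  for (std::size_t i = 0; i < A.size(); ++i)
    sum += A[i];
  return sum;
}

// Hypothetical hand-unrolled form (factor 2, chosen for illustration).
// sum0 and sum1 form two independent chains, so the two adds in each
// iteration do not depend on each other.
unsigned sum_split2(const std::vector<unsigned> &A) {
  unsigned sum0 = 0, sum1 = 0;
  const std::size_t n = A.size();
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    sum0 += A[i];
    sum1 += A[i + 1];
  }
  if (i < n)          // leftover element when n is odd
    sum0 += A[i];
  return sum0 + sum1; // combine the partial sums after the loop
}
```

[Unsigned arithmetic is used so that reassociating the adds gives exactly the same result. For floating-point sums the split form can round differently, so the same rewrite would only be valid where reassociation is permitted.]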
> >> >
> >> > Okay, I understand, thanks! The loop unroller would just create
> >> > one
> >> > large dependency chain, but to increase ILP, we need several
> >> > chains. On the other hand, would it make more sense to teach the
> >> > unroller to split reduction dependency chains than to embed this
> >> > functionality in the vectorizer? It seems like this
> >> > transformation
> >> > would be useful even in cases where we are not actually
> >> > vectorizing. Conversely, if the vectorizer is, for specialized
> >> > cases, a better unroller than the unroller, then maybe we should
> >> > specifically make sure it can be used that way.
> >>
> >> This transformation is basically orthogonal to anything the
> >> current
> >> LLVM IR loop unroller pass knows how to do: unlike the vectorizer,
> >> the
> >> unroller always executes all the loop iterations in the same order
> >> they ran before the unrolling.
> >
> > Agreed. Nevertheless, splitting the dependency chains does not
> > really need to change the order in which the iterations are
> > executed.
> 
> Sure.
> 
> >>
> >> >>
> >> >>
> >> >> The next step would be to write code that calculates the
> >> >> register
> >> >> pressure in order to estimate the profitability of this
> >> >> transformation.
> >> >
> >> > Sounds good. We may need something like this for the regular
> >> > unroller as well.
> >>
> >> Do we?  I mean, if we can't vectorize a loop, the only reason to
> >> unroll it at the IR level is if the IR subsequently simplifies,
> >> and
> >> that doesn't really depend on register pressure.  We can easily
> >> perform simple unrolling at the MachineFunction level, and we have
> >> much better information at that point.
> >
> > Do we have anything that does that?
> 
> All the analysis infrastructure is there, but there isn't an actual
> unroller at the moment as far as I know.

We have MachineLoopInfo, but that does not give us any way to determine trip counts, etc. Hexagon and PowerPC have "hardware loops" passes which optimize loop branching, and those passes need to use target-specific knowledge to analyze the comparison and increment instructions to extract the trip counts. Am I missing something?

Thanks again,
Hal

> 
> -Eli
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory