[llvm] r189281 - LoopVectorize: Implement partial loop unrolling when vectorization is not profitable.

Thu Aug 29 18:00:25 PDT 2013

----- Original Message -----
> On Wed, Aug 28, 2013 at 4:08 PM, Renato Golin
> <renato.golin at linaro.org> wrote:
> > On 28 August 2013 21:58, Eric Christopher <echristo at gmail.com>
> > wrote:
> >>
> >> Sure, it seems reasonable to me that this should be hoisted out to
> >> some analysis and then stuck in the general loop unrolling pass. What
> >> do you think?
> >
> >
> > Hi Eric,
> >
> > The difference here, I assume, is that the LoopVectorizer has more
> > information than the simple loop unrolling pass, and thus can know
> > that a transformation is profitable.
> 
> Right. I was wondering if that information was useful to the general
> partial unroller.
> 
> It seems like I'm the only one asking so... *shrug* :)

FWIW, if you look at my response to Chandler's review of my unrolling TTI patch, I highlight some of the differences between the two unrolling capabilities (as they affect me).

 -Hal

> 
> -eric
> 
> >
> > We had similar discussions before, even in the Polly era: where does
> > the analysis end and the implementation begin?
> >
> > There was some consensus that vectorizers should have three (not
> > necessarily distinct or unique) passes:
> >  1. The first pass, the annotation phase, where costs would be
> > calculated, transformations would be validated and metadata would be
> > written to loops, basic-blocks and, possibly, instructions. The
> > Legalizer and the CostTable do that job, but don't annotate anything.
> >  2. The second pass would then do the target-independent
> > transformation, based on the previous annotation. This is more or less
> > what the current vectorizers do, trusting that step 1 is sure that the
> > transformation is legal and worthwhile.
> >  3. A third pass would then do more target-specific changes, with
> > sub-target information, like this very case, if you know your CPU is
> > OOO. This is partially done by the cost tables and the TTI, but not
> > explicitly.
> >
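
A minimal sketch of the three-stage split described above, written as plain C++ rather than against the real LLVM classes. Every name here (`Loop`, `Target`, `LoopAnnotation`, `annotate`, `plan`, `refineForTarget`) is hypothetical, and the cost comparison is an assumed stand-in for whatever the Legalizer and cost tables actually compute. It only illustrates how an annotation produced once could be consumed by later, separate stages.

```cpp
// Hypothetical stand-ins; none of these are LLVM types.
struct Loop {
  bool hasLoopCarriedDependence;
  unsigned scalarCost;  // assumed cost of one scalar iteration
  unsigned vectorCost;  // assumed cost of one vector iteration
};

struct Target {
  bool outOfOrder;      // the "OOO" property from the discussion
  unsigned maxUnroll;   // assumed target-preferred unroll factor
};

// What stage 1 would write down for later stages to read.
struct LoopAnnotation {
  bool legal;
  bool profitable;
  unsigned width;   // 1 means "leave scalar"
  unsigned unroll;  // 1 means "do not unroll"
};

enum class Action { None, Vectorize, PartialUnroll };

// Stage 1: analysis only. Checks legality and cost, records the result.
LoopAnnotation annotate(const Loop &L, unsigned Width) {
  LoopAnnotation A{};
  A.legal = !L.hasLoopCarriedDependence;
  // One vector iteration covers Width scalar iterations.
  A.profitable = A.legal && L.vectorCost < L.scalarCost * Width;
  A.width = A.profitable ? Width : 1;
  A.unroll = 1;
  return A;
}

// Stage 2: target-independent choice that trusts the annotation.
Action plan(const LoopAnnotation &A) {
  if (A.width > 1)
    return Action::Vectorize;
  if (A.unroll > 1)
    return Action::PartialUnroll;
  return Action::None;
}

// Stage 3: target-specific refinement. Assumption for this sketch: a loop
// that is legal but not profitable to vectorize is partially unrolled
// only when the CPU is out-of-order.
LoopAnnotation refineForTarget(LoopAnnotation A, const Target &T) {
  if (A.legal && !A.profitable && T.outOfOrder)
    A.unroll = T.maxUnroll;
  return A;
}
```

With a loop such as `Loop{false, 4, 20}` and a width of 4, `annotate` reports the loop as legal but not profitable; `refineForTarget` then requests an unroll factor only for a `Target` with `outOfOrder` set, and `plan` picks `Action::PartialUnroll` in that case. Because the annotation is an ordinary value, a separate unrolling pass could read it too.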
> > Because step 1 is not annotating, that information can't be used
> > outside the vectorizers, and because the cost tables and the target
> > transform info are holding target-specific information, you don't
> > (yet) need a third stage.
> >
> > But things start to get grey with the example Nadav gave. That seems
> > more profitable on OOO CPUs, but probably not others, and since
> > non-OOO CPUs are still being designed today, that might be a
> > target-specific approach in a target-agnostic area. Also, since there
> > is no annotation, other passes cannot profit from the information that
> > the vectorizer calculated, throwing away precious cycles or
> > duplicating code into the vectorizer.
> >
> > So, I agree that we could do better, but we'll need some joint work
> > on the vectorizer if we are to make it more generic while still
> > maintaining its hard-earned performance boost on, at least, x86 and
> > ARM.
> >
> > On the other hand, maybe the loop-unrolling pass should be merged
> > into the loop vectorizer...
> >
> > cheers,
> > --renato

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory