[llvm] r189281 - LoopVectorize: Implement partial loop unrolling when vectorization is not profitable.

Thu Aug 29 19:21:05 PDT 2013

On Thu, Aug 29, 2013 at 6:00 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> On Wed, Aug 28, 2013 at 4:08 PM, Renato Golin
>> <renato.golin at linaro.org> wrote:
>> > On 28 August 2013 21:58, Eric Christopher <echristo at gmail.com>
>> > wrote:
>> >>
>> >> Sure, it seems reasonable to me that this should be hoisted out to
>> >> some analysis and then stuck in the general loop unrolling pass.
>> >> What do you think?
>> >
>> >
>> > Hi Eric,
>> >
>> > The difference here, I assume, is that the LoopVectorizer has more
>> > information than the simple loop unrolling pass, and thus can know
>> > that a transformation is profitable.
>>
>> Right. I was wondering if that information was useful to the general
>> partial unroller.
>>
>> It seems like I'm the only one asking so... *shrug* :)
>
> FWIW, if you look at my response to Chandler's review of my unrolling TTI patch, I highlight some of the differences between the two unrolling capabilities (as they affect me).
>

Ah yes. I'll move discussion to that thread :)

-eric

>  -Hal
>
>>
>> -eric
>>
>> >
>> > We had similar discussions before, even in the Polly era: where
>> > does the analysis end and the implementation begin?
>> >
>> > There was some consensus that vectorizers should have three (not
>> > necessarily distinct or unique) passes:
>> >  1. The first pass, the annotation phase, where costs would be
>> >     calculated, transformations would be validated and metadata
>> >     would be written to loops, basic-blocks and, possibly,
>> >     instructions. The Legalizer and the CostTable do that job, but
>> >     don't annotate anything.
>> >  2. The second pass would then do the target-independent
>> >     transformation, based on the previous annotation. This is more
>> >     or less what the current vectorizers do, trusting that step 1
>> >     is sure that the transformation is legal and worthwhile.
>> >  3. A third pass would then do more target-specific changes, with
>> >     sub-target information, like this very case, if you know your
>> >     CPU is out-of-order (OOO). This is partially done by the cost
>> >     tables and the TTI, but not explicitly.
>> >
>> > Because step 1 is not annotating, that information can't be used
>> > outside the vectorizers, and because the cost tables and the
>> > target transform info are holding target-specific information,
>> > you don't (yet) need a third stage.
>> >
>> > But things start to get grey with the example Nadav gave. That
>> > seems more profitable on OOO CPUs, but probably not on others, and
>> > since non-OOO CPUs are still being designed today, that might be a
>> > target-specific approach in a target-agnostic area. Also, since
>> > there is no annotation, other passes cannot profit from the
>> > information that the vectorizer calculated, throwing away precious
>> > cycles or duplicating code into the vectorizer.
>> >
>> > So, I agree that we could do better, but we'll need some joint
>> > work on the vectorizer if we are to make it more generic while
>> > still maintaining its hard-earned performance boost on, at least,
>> > x86 and ARM.
>> >
>> > On the other hand, maybe the loop-unrolling pass should be merged
>> > into the loop vectorizer...
>> >
>> > cheers,
>> > --renato
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory