[llvm-commits] [LLVMdev] [PATCH] BasicBlock Autovectorization Pass

Hal Finkel hfinkel at anl.gov
Wed Nov 30 15:14:19 PST 2011


On Tue, 2011-11-29 at 15:00 -0800, Evan Cheng wrote:
> On Nov 27, 2011, at 7:04 PM, Hal Finkel wrote:
> 
> > On Sun, 2011-11-27 at 18:34 -0800, Evan Cheng wrote:
> >> This is very interesting work. Thanks. Quick questions / comments.
> >> 
> >> 1. What's the algorithm complexity?
> > 
> > Roughly, it is O(N^2), where N is the number of instructions in a basic
> > block, but:
> > - The run time only *really* grows with the number of "pairable"
> > instructions; this is *generally* a much smaller number than the total
> > number of instructions (except in test cases). Also determining whether
> > two loads or stores are pairable is more expensive than for other kinds
> > of instructions because scalar-evolution analysis is used.
> > - There is a (configurable) cutoff, so N will not grow exceedingly
> > large even for large basic blocks (this was a suggestion by Tobias
> > Grosser).
> 
> N^2 is still very scary. Is there room for improvement? Can you share compile times for the LLVM test suite?

Agreed. There certainly might be room for improvement, and I have some
ideas, but I need to spend some quality time with the profiler. Even so,
the current performance is not that bad (sometimes, it is even good).

Over the entire test suite:

default vs. -vectorize:

The mean compile-time slowdown was 23%, the standard deviation was 107%
and the median was 0%. 185 of the tests exhibited a slowdown, 173
exhibited a speedup (the remainder showed no change).

The runtime showed a mean slowdown of 6% with a standard deviation of
68.5% and a median speedup of 3.5%. 226 tests exhibited a speedup, 91
exhibited a slowdown.

default vs. -unroll-allow-partial:

The mean compile-time slowdown was 19.5%, the standard deviation was 60%
and the median was 0%. 199 of the tests exhibited a slowdown, 172
exhibited a speedup.

The runtime showed a mean slowdown of 8.7% with a standard deviation of
64% and a median speedup of 0.9%. 107 tests exhibited a speedup, 213
exhibited a slowdown.

default vs. -unroll-allow-partial -vectorize:

The mean compile-time slowdown was 31%, the standard deviation was 122%
and the median was 2.6%. 213 of the tests exhibited a slowdown, 138
exhibited a speedup.

The runtime showed a mean slowdown of 4.3% with a standard deviation of
68% and a median speedup of 3.5%. 225 tests exhibited a speedup, 88
tests exhibited a slowdown.

In conclusion, the current vectorization patch has about the same
compile-time/runtime profile as does -unroll-allow-partial.

 -Hal

> 
> > 
> >> 2. Please rename -vectorize to -bb-vectorize since this is a specific kind of vectorization.
> > 
> > I have two objections:
> > 1. -vectorize is an option to the pass manager, and as I hope that
> > we'll soon have other vectorization passes in addition to this one,
> > -vectorize could then enable those in some default configuration as
> > well. You can use -bb-vectorize currently to enable just this pass in
> > the same way as any other transformation pass.
> > 2. I don't think that I can because it would cause the option to
> > conflict with the name of the pass. In addition to enabling the pass
> > itself, -vectorize adds instcombine and gvn passes to do some necessary
> > cleanup.
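(To illustrate: enabling the pass and its cleanup explicitly via opt would look roughly like the pipeline below. This is a sketch of the intended combination, assuming the pass registers under -bb-vectorize as in the patch.)

```shell
# Roughly what -vectorize is meant to set up via the pass manager:
# the vectorization pass itself, followed by instcombine and gvn
# to clean up after fusion.
opt -bb-vectorize -instcombine -gvn input.ll -S -o output.ll
```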
> 
> Ok!
> 
> > 
> >> 3. Have you tried non-x86 targets? Does it work? If not, what would it take to make it work?
> > 
> > I have not yet tried it, but I am planning to do that this week. My
> > primary motivation for working on autovectorization is to use it on the
> > IBM Blue Gene supercomputers (which are PPC-based). The pass is designed
> > to be platform independent, and it *should* work.
> > 
> >> 
> >> I am concerned that LLVM currently does not have a cost model for vectorization. That's necessary for your work to be suitable for general adoption.
> > 
> > In general I agree. That having been said, as-is, the pass produces
> > significant speedups in "real" applications (including some in the test
> > suite). Unfortunately, it also produces some real slow-downs, and so I
> > think that before vectorization could be enabled by default (at -O3 for
> > example), we would indeed need a cost model.
> 
> Agreed.
> 
> Evan
> 
> > 
> > Thank you for looking at this,
> > Hal
> > 
> >> 
> >> Evan
> >> 
> >> On Nov 23, 2011, at 8:52 AM, Hal Finkel wrote:
> >> 
> >>> On Mon, 2011-11-21 at 21:22 -0600, Hal Finkel wrote:
> >>>> On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote:
> >>>>> Tobias,
> >>>>> 
> >>>>> I've attached an updated patch. It contains a few bug fixes and many
> >>>>> (refactoring and coding-convention) changes inspired by your comments.
> >>>>> 
> >>>>> I'm currently trying to fix the bug responsible for causing a compile
> >>>>> failure when compiling
> >>>>> test-suite/MultiSource/Applications/obsequi/toggle_move.c; after the
> >>>>> pass begins to fuse instructions in a basic block in this file, the
> >>>>> aliasing analysis starts producing different (more pessimistic) query
> >>>>> answers. I've never seen it do this before, and I'm not sure why it
> >>>>> is happening. Also odd: at the same time, the numeric labels assigned
> >>>>> to otherwise-unnamed instructions change. I don't think I've seen
> >>>>> that happen before either (they generally stay fixed once assigned).
> >>>>> I don't know whether these two things are related, but if anyone
> >>>>> can provide some insight, I'd appreciate it.
> >>>> 
> >>>> I think I see what is happening in this case (please, someone tell
> >>>> me if this does not make sense). In the problematic basic block, there
> >>>> are some loads and stores that are independent. The default aliasing
> >>>> analysis can tell that these loads and stores don't reference the same
> >>>> memory region. Then, some of the inputs to the getelementptr
> >>>> instructions used for these loads and stores are fused by the
> >>>> vectorization. After this happens, the aliasing analysis loses its
> >>>> ability to tell that the loads and stores that make use of those
> >>>> vector-calculated indices are independent.
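As an illustration of the effect described above, consider the hypothetical IR below (invented names and offsets, not taken from the failing test case):

```llvm
; Before vectorization: two scalar index computations. BasicAA can follow
; each getelementptr chain and conclude the two addresses do not overlap.
%i1 = add i64 %i, 0
%i2 = add i64 %i, 4
%p1 = getelementptr i32* %a, i64 %i1
%p2 = getelementptr i32* %a, i64 %i2

; After the index computations are fused: the same indices now come out of
; a vector add, and the analysis does not look through extractelement, so
; accesses based on %q1 and %q2 degrade to MayAlias.
%iv = add <2 x i64> %ivec, <i64 0, i64 4>
%j1 = extractelement <2 x i64> %iv, i32 0
%j2 = extractelement <2 x i64> %iv, i32 1
%q1 = getelementptr i32* %a, i64 %j1
%q2 = getelementptr i32* %a, i64 %j2
```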
> >>> 
> >>> The attached patch works around this issue. Please review.
> >>> 
> >>> Thanks again,
> >>> Hal
> >>> 
> >>>> 
> >>>> -Hal
> >>>> 
> >>>>> 
> >>>>> In any case, this version of the patch should be much more suitable for
> >>>>> your (or anyone else's) further review.
> >>>>> 
> >>>>> Thanks again,
> >>>>> Hal
> >>>>> 
> >>>>> On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:
> >>>>>> On 11/17/2011 12:38 AM, Hal Finkel wrote:
> >>>>>>> Tobias, et al.,
> >>>>>>> 
> >>>>>>> Attached is the my autovectorization pass.
> >>>>>> 
> >>>>>> Very nice. Will you be at the developer summit? Maybe we could discuss 
> >>>>>> the integration there?
> >>>>>> 
> >>>>>> Here a first review of the source code.
> >>>>>> 
> >>>> 
> >>>> 
> >>> 
> >>> -- 
> >>> Hal Finkel
> >>> Postdoctoral Appointee
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>> <llvm_bb_vectorize-20111122.diff>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >> 
> > 
> > -- 
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



