[llvm-commits] [LLVMdev] [PATCH] BasicBlock Autovectorization Pass

Sun Nov 27 19:04:14 PST 2011

On Sun, 2011-11-27 at 18:34 -0800, Evan Cheng wrote:
> This is very interesting work. Thanks. Quick questions / comments.
> 
> 1. What's the algorithm complexity?

Roughly, it is O(N^2), where N is the number of instructions in a basic
block, but:
 - The run time only *really* grows with the number of "pairable"
instructions; this is *generally* a much smaller number than the total
number of instructions (except in test cases). Also determining whether
two loads or stores are pairable is more expensive than for other kinds
of instructions because scalar-evolution analysis is used.
 - There is a (configurable) cutoff, so N will not grow exceedingly
large even for large basic blocks (this was a suggestion by Tobias
Grosser).

> 2. Please rename -vectorize to -bb-vectorize since this is a specific kind of vectorization.

I have two objections:
 1. -vectorize is an option to the pass manager, and as I hope that
we'll soon have other vectorization passes in addition to this one,
-vectorize could then enable those in some default configuration as
well. You can use -bb-vectorize currently to enable just this pass in
the same way as any other transformation pass.
 2. I don't think that I can because it would cause the option to
conflict with the name of the pass. In addition to enabling the pass
itself, -vectorize adds instcombine and gvn passes to do some necessary
cleanup.

> 3. Have you tried non-x86 targets? Does it work? If not, what would it take to make it work?

I have not yet tried it, but I am planning to do that this week. My
primary motivation for working on autovectorization is to use it on the
IBM Blue Gene supercomputers (which are PPC-based). The pass is designed
to be platform independent, and it *should* work.

> 
> I am concerned that LLVM currently does not have a cost model for vectorization. That's necessary for your work to be applicable for general adoption.

In general I agree. That having been said, as-is, the pass produces
significant speedups in "real" applications (including some in the test
suite). Unfortunately, it also produces some real slow-downs, and so I
think that before vectorization could be enabled by default (at -O3 for
example), we would indeed need a cost model.

Thank you for looking at this,
Hal

> 
> Evan
> 
> On Nov 23, 2011, at 8:52 AM, Hal Finkel wrote:
> 
> > On Mon, 2011-11-21 at 21:22 -0600, Hal Finkel wrote:
> >> On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote:
> >>> Tobias,
> >>> 
> >>> I've attached an updated patch. It contains a few bug fixes and many
> >>> (refactoring and coding-convention) changes inspired by your comments.
> >>> 
> >>> I'm currently trying to fix the bug responsible for causing a compile
> >>> failure when compiling
> >>> test-suite/MultiSource/Applications/obsequi/toggle_move.c; after the
> >>> pass begins to fuse instructions in a basic block in this file, the
> >>> aliasing analysis starts producing different (more pessimistic) query
> >>> answers. I've never before seen it do this, and I'm not sure why it is
> >>> happening. Also odd, at the same time, the numeric labels that are
> >>> assigned to otherwise unnamed instructions, change. I don't think I've
> >>> seen this happen either (they generally seem to stay fixed once
> >>> assigned). I don't know if these two things are related, but if anyone
> >>> can provide some insight, I'd appreciate it.
> >> 
> >> I think that I see what is happening in this case (please someone tell
> >> me if this does not make sense). In the problematic basic block, there
> >> are some loads and stores that are independent. The default aliasing
> >> analysis can tell that these loads and stores don't reference the same
> >> memory region. Then, some of the inputs to the getelementptr
> >> instructions used for these loads and stores are fused by the
> >> vectorization. After this happens, the aliasing analysis loses its
> >> ability to tell that the loads and stores that make use of those
> >> vector-calculated indices are independent.
> > 
> > The attached patch works around this issue. Please review.
> > 
> > Thanks again,
> > Hal
> > 
> >> 
> >> -Hal
> >> 
> >>> 
> >>> In any case, this version of the patch should be much more suitable for
> >>> your (or anyone else's) further review.
> >>> 
> >>> Thanks again,
> >>> Hal
> >>> 
> >>> On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:
> >>>> On 11/17/2011 12:38 AM, Hal Finkel wrote:
> >>>>> Tobias, et al.,
> >>>>> 
> >>>>> Attached is the my autovectorization pass.
> >>>> 
> >>>> Very nice. Will you be at the developer summit? Maybe we could discuss 
> >>>> the integration there?
> >>>> 
> >>>> Here a first review of the source code.
> >>>> 
> >> 
> >> 
> > 
> > -- 
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> > <llvm_bb_vectorize-20111122.diff>_______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory