[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Wed Oct 26 13:49:35 PDT 2011

Hi Hal,

On Fri, Oct 21, 2011 at 7:04 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> I've attached an initial version of a basic-block autovectorization
> pass. It works by searching a basic block for pairable (independent)
> instructions, and, using a chain-seeking heuristic, selects pairings
> likely to provide an overall speedup (if such pairings can be found).
> The selected pairs are then fused and, if necessary, other instructions
> are moved in order to maintain data-flow consistency. This works only
> within one basic block, but can do loop vectorization in combination
> with (partial) unrolling. The basic idea was inspired by the Vienna MAP
> Vectorizer, which has been used to vectorize FFT kernels, but the
> algorithm used here is different.
>
> To try it, use -bb-vectorize with opt. There are a few options:
> -bb-vectorize-req-chain-depth: default: 3 -- The depth of the chain of
> instruction pairs necessary in order to consider the pairs that compose
> the chain worthy of vectorization.
> -bb-vectorize-vector-bits: default: 128 -- The size of the target vector
> registers
> -bb-vectorize-no-ints -- Don't consider integer instructions
> -bb-vectorize-no-floats -- Don't consider floating-point instructions
>
> The vectorizer generates a lot of insert_element/extract_element pairs;
> the assumption is that other passes will turn these into shuffles when
> possible (it looks like some work is necessary here). It will also
> vectorize vector instructions, and generates shuffles in this case
> (again, other passes should combine these as appropriate).
>
> Currently, it does not fuse load or store instructions, but that is a
> feature that I'd like to add. Of course, alignment information is an
> issue for load/store vectorization (or maybe I should just fuse them
> anyway and let isel deal with unaligned cases?).
>
> Also, support needs to be added for fusing known intrinsics (fma, etc.),
> and, as has been discussed on llvmdev, we should add some intrinsics to
> allow the generation of addsub-type instructions.
>
> I've included a few tests, but it needs more. Please review (I'll commit
> if and when everyone is happy).
>
> Thanks in advance,
> Hal
>
> P.S. There is another option (not so useful right now, but could be):
> -bb-vectorize-fast-dep -- Don't do a full inter-instruction dependency
> analysis; instead stop looking for instruction pairs after the first use
> of an instruction's value. [This makes the pass faster, but would
> require a data-dependence-based reordering pass in order to be
> effective].
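
To make the description above concrete, here is a minimal Python sketch of
the pairing idea: find independent instructions with the same opcode, link
pairs into chains, and keep only pairs that sit on a chain of the required
depth. It is not taken from the patch. The block representation, the helper
names (depends_on, candidate_pairs, feeds, longest, select_pairs) and the
exact definition of a "chain" are assumptions made for illustration, and
the sketch skips fusing, instruction motion, and resolving conflicts when
one instruction appears in several candidate pairs.

```python
from itertools import combinations

# Toy basic block: name -> (opcode, operand names). Names that are not keys
# (x0, k, ...) stand for values defined outside the block. Hypothetical format.
BLOCK = {
    "a0": ("fsub", ("x0", "y0")),
    "a1": ("fsub", ("x1", "y1")),
    "b0": ("fmul", ("a0", "k")),
    "b1": ("fmul", ("a1", "k")),
    "c0": ("fadd", ("b0", "a0")),
    "c1": ("fadd", ("b1", "a1")),
    "s": ("fadd", ("c0", "c1")),
}


def depends_on(block, user, producer):
    """True if `user` uses the value of `producer`, directly or transitively."""
    stack = list(block[user][1])
    seen = set()
    while stack:
        op = stack.pop()
        if op == producer:
            return True
        if op in block and op not in seen:
            seen.add(op)
            stack.extend(block[op][1])
    return False


def candidate_pairs(block):
    """Pairable instructions: same opcode and independent of each other."""
    return [
        (a, b)
        for a, b in combinations(block, 2)
        if block[a][0] == block[b][0]
        and not depends_on(block, a, b)
        and not depends_on(block, b, a)
    ]


def feeds(block, first, second):
    """True if each member of `second` uses a different member of `first`."""
    (a, b), (c, d) = first, second
    ops_c, ops_d = block[c][1], block[d][1]
    return (a in ops_c and b in ops_d) or (a in ops_d and b in ops_c)


def longest(block, pairs, pair, memo, forward):
    """Length of the longest chain of pairs starting (or ending) at `pair`."""
    if pair not in memo:
        best = 1
        for other in pairs:
            if forward:
                linked = feeds(block, pair, other)
            else:
                linked = feeds(block, other, pair)
            if linked:
                best = max(best, 1 + longest(block, pairs, other, memo, forward))
        memo[pair] = best
    return memo[pair]


def select_pairs(block, req_chain_depth=3):
    """Keep pairs lying on some chain of at least `req_chain_depth` pairs."""
    pairs = candidate_pairs(block)
    down, up = {}, {}
    selected = []
    for p in pairs:
        # Longest chain through p = chain above p + chain below p, counting p once.
        depth = longest(block, pairs, p, down, True) + longest(block, pairs, p, up, False) - 1
        if depth >= req_chain_depth:
            selected.append(p)
    return selected


print(select_pairs(BLOCK))
```

With the toy block above and a required depth of 3, the fsub, fmul and fadd
pairs are all selected, while the final fadd "s" is left scalar because it
depends on both c0 and c1. Raising the required depth to 4 selects nothing,
which mirrors the role described for -bb-vectorize-req-chain-depth.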

Cool! :)
Have you run this pass on any benchmarks or the LLVM test-suite? Does
it cause any regressions?
Do you have any performance results?
Cheers,

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc