[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Hal Finkel hfinkel at anl.gov
Tue Oct 25 14:23:02 PDT 2011


I've attached an improved version of the autovectorization pass. This
version will also vectorize loads and stores, casts, and some intrinsics
(fma and trig. functions).

There are, correspondingly, a few new options:
-bb-vectorize-no-casts -- Don't try to vectorize casting (conversion)
operations
-bb-vectorize-no-math -- Don't try to vectorize floating-point math
intrinsics (this is just the trig. functions right now)
-bb-vectorize-no-fma -- Don't try to vectorize the fused-multiply-add
intrinsic
-bb-vectorize-no-mem-ops -- Don't try to vectorize loads and stores
-bb-vectorize-aligned-only -- Only generate aligned loads and stores
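
As a rough illustration of the decision that the no-mem-ops and
aligned-only options gate, here is a minimal Python sketch of when two
scalar loads could be fused into one two-element vector load. The
function name, the byte-address model, and the rule that the fused
access must be naturally aligned are assumptions made for illustration;
none of this is code from the patch.

```python
def can_fuse_loads(addr_a, addr_b, elem_bytes,
                   vector_bits=128, aligned_only=False):
    """Hypothetical check: may two scalar loads become one vector load?

    addr_a, addr_b -- byte addresses of the two loads (assumed known)
    elem_bytes     -- size of each scalar element in bytes
    vector_bits    -- target vector register width (128 as in the
                      -bb-vectorize-vector-bits default)
    aligned_only   -- mimic the aligned-only option (assumed semantics)
    """
    # The second load must read the element directly after the first.
    if addr_b - addr_a != elem_bytes:
        return False
    fused_bytes = 2 * elem_bytes
    # The fused value has to fit in one vector register.
    if fused_bytes * 8 > vector_bits:
        return False
    # Assumed rule: "aligned" means naturally aligned to the fused size.
    if aligned_only and addr_a % fused_bytes != 0:
        return False
    return True


# Two adjacent doubles starting at offset 8 fuse only if unaligned
# accesses are allowed.
print(can_fuse_loads(8, 16, 8))
print(can_fuse_loads(8, 16, 8, aligned_only=True))
```

In this model, turning on aligned_only simply rejects candidate pairs
whose first address is not a multiple of the fused size; everything
else about pairing is unchanged.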

To make this really useful, InstCombine (and a few other things) still
needs some improvements. But the autovectorization process itself now
seems to work well. Please review this patch; adding the vectorization
pass itself should not affect any other code (although it does touch
some common files to add support for the pass to opt). If it looks
okay, please let me know, and I'll commit it.

Thanks in advance,
Hal

On Fri, 2011-10-21 at 16:04 -0500, Hal Finkel wrote:
> I've attached an initial version of a basic-block autovectorization
> pass. It works by searching a basic block for pairable (independent)
> instructions, and, using a chain-seeking heuristic, selects pairings
> likely to provide an overall speedup (if such pairings can be found).
> The selected pairs are then fused and, if necessary, other instructions
> are moved in order to maintain data-flow consistency. This works only
> within one basic block, but can do loop vectorization in combination
> with (partial) unrolling. The basic idea was inspired by the Vienna MAP
> Vectorizer, which has been used to vectorize FFT kernels, but the
> algorithm used here is different.
> 
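
The pairing and chain-depth ideas described above can be sketched in a
few lines of Python. This is only a toy model under stated assumptions:
the dictionary-based IR, the function names, and the "lane by lane"
chain rule are all hypothetical, and the sketch ignores conflicts
between candidate pairs and the instruction reordering that the real
pass performs.

```python
from itertools import combinations

# Hypothetical toy basic block: name -> (opcode, operand names),
# listed in program order. Names such as x0 are values defined
# outside the block.
BLOCK = {
    "a0": ("fmul", ("x0", "y0")),
    "a1": ("fmul", ("x1", "y1")),
    "b0": ("fadd", ("a0", "x0")),
    "b1": ("fadd", ("a1", "x1")),
    "c0": ("fsub", ("b0", "a0")),
    "c1": ("fsub", ("b1", "a1")),
    "d0": ("fdiv", ("c0", "c1")),
}


def depends_on(block, user, target):
    """True if `user` transitively uses the value of `target`."""
    stack, seen = list(block[user][1]), set()
    while stack:
        name = stack.pop()
        if name == target:
            return True
        if name in block and name not in seen:
            seen.add(name)
            stack.extend(block[name][1])
    return False


def candidate_pairs(block):
    """Pairable means: same opcode and independent of each other."""
    return [
        (x, y) for x, y in combinations(list(block), 2)
        if block[x][0] == block[y][0]
        and not depends_on(block, y, x)
        and not depends_on(block, x, y)
    ]


def follows(block, p, q):
    """Pair q extends pair p if q's members use p's members lane by lane."""
    return p[0] in block[q[0]][1] and p[1] in block[q[1]][1]


def select_pairs(block, req_chain_depth=3):
    """Keep pairs lying on a chain of at least req_chain_depth pairs."""
    pairs = candidate_pairs(block)

    def longest(p, forward, memo):
        # Longest chain starting at p (forward) or ending at p (backward).
        if p not in memo:
            nxt = [q for q in pairs
                   if (follows(block, p, q) if forward
                       else follows(block, q, p))]
            memo[p] = 1 + max((longest(q, forward, memo) for q in nxt),
                              default=0)
        return memo[p]

    up, down = {}, {}
    return [p for p in pairs
            if longest(p, True, up) + longest(p, False, down) - 1
            >= req_chain_depth]


print(select_pairs(BLOCK, req_chain_depth=3))
```

With the toy block above, the fmul, fadd and fsub pairs form one chain
of three pairs, so all of them survive a required depth of 3, while a
required depth of 4 selects nothing.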
> To try it, use -bb-vectorize with opt. There are a few options:
> -bb-vectorize-req-chain-depth: default: 3 -- The depth of the chain of
> instruction pairs necessary in order to consider the pairs that compose
> the chain worthy of vectorization.
> -bb-vectorize-vector-bits: default: 128 -- The size of the target vector
> registers
> -bb-vectorize-no-ints -- Don't consider integer instructions
> -bb-vectorize-no-floats -- Don't consider floating-point instructions  
> 
> The vectorizer generates a lot of insertelement/extractelement pairs;
> the assumption is that other passes will turn these into shuffles when
> possible (it looks like some work is necessary here). It will also
> vectorize vector instructions, and generates shuffles in this case
> (again, other passes should combine these as appropriate).
> 
> Currently, it does not fuse load or store instructions, but that is a
> feature that I'd like to add. Of course, alignment information is an
> issue for load/store vectorization (or maybe I should just fuse them
> anyway and let isel deal with unaligned cases?).
> 
> Also, support needs to be added for fusing known intrinsics (fma, etc.),
> and, as has been discussed on llvmdev, we should add some intrinsics to
> allow the generation of addsub-type instructions.
> 
> I've included a few tests, but more are needed. Please review (I'll
> commit if and when everyone is happy).
> 
> Thanks in advance,
> Hal
> 
> P.S. There is another option (not so useful right now, but could be):
> -bb-vectorize-fast-dep -- Don't do a full inter-instruction dependency
> analysis; instead stop looking for instruction pairs after the first use
> of an instruction's value. [This makes the pass faster, but would
> require a data-dependence-based reordering pass in order to be
> effective].
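
One way to read the fast-dep behaviour described above is as a limit on
the search window for each instruction. The following Python sketch
shows that reading; the toy block layout and the function name are
hypothetical, and whether the real pass stops exactly at the first use
is an assumption.

```python
# Hypothetical toy basic block: name -> (opcode, operand names),
# listed in program order.
BLOCK = {
    "a0": ("fmul", ("x0", "y0")),
    "a1": ("fmul", ("x1", "y1")),
    "b0": ("fadd", ("a0", "x0")),   # first use of a0
    "a2": ("fmul", ("x2", "y2")),
}


def pairing_window(block, start):
    """Instructions after `start` still considered as pairing partners.

    Instead of a full dependency analysis, the scan stops at the first
    instruction that uses the value of `start`.
    """
    names = list(block)
    window = []
    for name in names[names.index(start) + 1:]:
        if start in block[name][1]:
            break                    # first use reached: stop looking
        window.append(name)
    return window


print(pairing_window(BLOCK, "a0"))
print(pairing_window(BLOCK, "a1"))
```

In this model a2 is never considered as a partner for a0, even though
the two are independent, because the use in b0 ends the search. That is
the kind of missed pairing a dependence-based reordering pass would
have to recover.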
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_bb_vectorize-20111025.diff
Type: text/x-patch
Size: 58060 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111025/34b90001/attachment.bin>

