[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Hal Finkel hfinkel at anl.gov
Fri Oct 21 14:04:49 PDT 2011


I've attached an initial version of a basic-block autovectorization
pass. It works by searching a basic block for pairable (independent)
instructions, and, using a chain-seeking heuristic, selects pairings
likely to provide an overall speedup (if such pairings can be found).
The selected pairs are then fused and, if necessary, other instructions
are moved in order to maintain data-flow consistency. This works only
within one basic block, but can do loop vectorization in combination
with (partial) unrolling. The basic idea was inspired by the Vienna MAP
Vectorizer, which has been used to vectorize FFT kernels, but the
algorithm used here is different.
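
As a rough illustration of the idea above (not the code in the patch), the
following toy model treats a basic block as an ordered table of SSA
instructions, collects independent same-opcode pairs, and keeps the longest
def-use chain of pairs only if it reaches a required depth. The block contents
and the names `depends_on`, `candidate_pairs`, `chain_from`, and `select_pairs`
are hypothetical, and unlike a real pass the sketch does not check that each
instruction ends up in at most one pair.

```python
from itertools import combinations

# Toy basic block: name -> (opcode, operands), in program order.
# x0 and x1 are values defined outside the block.
BLOCK = {
    "b0": ("fmul", ("x0", "x0")),
    "b1": ("fmul", ("x1", "x1")),
    "c0": ("fadd", ("b0", "x0")),
    "c1": ("fadd", ("b1", "x1")),
    "e0": ("fmul", ("c0", "c0")),
    "e1": ("fmul", ("c1", "c1")),
    "r":  ("fadd", ("e0", "e1")),
}

def depends_on(block, user, producer):
    """True if `user` transitively uses the value of `producer`."""
    stack = list(block[user][1])
    seen = set()
    while stack:
        name = stack.pop()
        if name == producer:
            return True
        if name in block and name not in seen:
            seen.add(name)
            stack.extend(block[name][1])
    return False

def candidate_pairs(block):
    """Same-opcode pairs (earlier, later) where the later one does not
    depend on the earlier one."""
    return [(a, b) for a, b in combinations(list(block), 2)
            if block[a][0] == block[b][0] and not depends_on(block, b, a)]

def chain_from(block, pair, pairs):
    """Longest chain of pairs starting at `pair`: a pair (u, v) extends
    (p, q) when u uses p directly and v uses q directly."""
    p, q = pair
    best = []
    for u, v in pairs:
        if p in block[u][1] and q in block[v][1]:
            tail = chain_from(block, (u, v), pairs)
            if len(tail) > len(best):
                best = tail
    return [pair] + best

def select_pairs(block, req_chain_depth=3):
    """Keep the longest chain only if it is at least req_chain_depth deep."""
    pairs = candidate_pairs(block)
    chains = [chain_from(block, p, pairs) for p in pairs]
    best = max(chains, key=len, default=[])
    return best if len(best) >= req_chain_depth else []
```

In this toy block, `select_pairs(BLOCK)` keeps the three-deep chain
(b0,b1) -> (c0,c1) -> (e0,e1), while raising the required depth to 4 selects
nothing.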

To try it, use -bb-vectorize with opt. There are a few options:
  -bb-vectorize-req-chain-depth: default: 3 -- The depth that a chain of
    instruction pairs must reach before the pairs that compose the chain
    are considered worth vectorizing.
  -bb-vectorize-vector-bits: default: 128 -- The size, in bits, of the
    target vector registers.
  -bb-vectorize-no-ints -- Don't consider integer instructions.
  -bb-vectorize-no-floats -- Don't consider floating-point instructions.
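
For scripting experiments with these options, a small helper like the one
below could assemble the opt command line. This is only a sketch: the helper
name and the file names are hypothetical, and passing values as `-option=value`
is an assumption based on how opt options are usually spelled, not something
stated above.

```python
def bb_vectorize_argv(input_ll, output_ll, req_chain_depth=3, vector_bits=128,
                      no_ints=False, no_floats=False):
    """Build an argv list for opt with the basic-block vectorizer enabled.

    The '=value' spelling of the two numeric options is an assumption.
    """
    argv = [
        "opt",
        "-bb-vectorize",
        f"-bb-vectorize-req-chain-depth={req_chain_depth}",
        f"-bb-vectorize-vector-bits={vector_bits}",
    ]
    if no_ints:
        argv.append("-bb-vectorize-no-ints")
    if no_floats:
        argv.append("-bb-vectorize-no-floats")
    # -S asks opt for textual IR output; -o names the output file.
    return argv + ["-S", input_ll, "-o", output_ll]
```

The resulting list can be handed to `subprocess.run` once an opt binary built
with the patch is on the PATH.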

The vectorizer generates a lot of insertelement/extractelement pairs;
the assumption is that other passes will turn these into shuffles when
possible (it looks like some work is necessary here). It will also
vectorize vector instructions, and generates shuffles in this case
(again, other passes should combine these as appropriate).
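
To make the insert/extract traffic concrete, here is a small sketch that
prints the kind of textual IR that fusing one pair of scalar instructions
might produce. It is illustrative only: the function name `fuse_pair` and the
value names are hypothetical, and the exact sequence the patch emits may
differ.

```python
def fuse_pair(op, ty, lhs, rhs, out):
    """Return IR lines that fuse
         out[0] = op lhs[0], rhs[0]
         out[1] = op lhs[1], rhs[1]
    into one 2-element vector operation (a sketch, not the patch's output)."""
    vty = f"<2 x {ty}>"
    lines = []
    # Pack each pair of scalar operands into a vector.
    for tag, (e0, e1) in (("l", lhs), ("r", rhs)):
        lines.append(f"%{tag}.0 = insertelement {vty} undef, {ty} %{e0}, i32 0")
        lines.append(f"%{tag}.1 = insertelement {vty} %{tag}.0, {ty} %{e1}, i32 1")
    # One vector operation replaces the two scalar ones.
    lines.append(f"%v = {op} {vty} %l.1, %r.1")
    # Unpack the results for any remaining scalar users.
    lines.append(f"%{out[0]} = extractelement {vty} %v, i32 0")
    lines.append(f"%{out[1]} = extractelement {vty} %v, i32 1")
    return lines

if __name__ == "__main__":
    print("\n".join(fuse_pair("fadd", "double",
                              ("a0", "a1"), ("b0", "b1"), ("c0", "c1"))))
```

A single fused fadd therefore costs four insertelement and two extractelement
instructions in this naive form, which is why chains of fused pairs (where the
vector result feeds the next vector operation directly) and later cleanup
passes matter.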

Currently, it does not fuse load or store instructions, but that is a
feature that I'd like to add. Of course, alignment information is an
issue for load/store vectorization (or maybe I should just fuse them
anyway and let isel deal with unaligned cases?).

Also, support needs to be added for fusing known intrinsics (fma, etc.),
and, as has been discussed on llvmdev, we should add some intrinsics to
allow the generation of addsub-type instructions.

I've included a few tests, but more are needed. Please review (I'll commit
if and when everyone is happy).

Thanks in advance,
Hal

P.S. There is another option (not so useful right now, but could be):
-bb-vectorize-fast-dep -- Don't do a full inter-instruction dependency
analysis; instead stop looking for instruction pairs after the first use
of an instruction's value. [This makes the pass faster, but would
require a data-dependence-based reordering pass in order to be
effective].
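
The effect of stopping at the first use can be seen in a toy model (again a
sketch, with a hypothetical block layout and function name, not the patch's
code): the scan for partners of an instruction ends as soon as a later
instruction uses its value, so an independent partner that sits after that
first use is missed unless the block is reordered first.

```python
def fast_dep_candidates(block, i):
    """Partners for block[i] under a first-use cutoff: scan forward and
    stop at the first instruction that uses block[i]'s value."""
    name, opcode, _ = block[i]
    found = []
    for other, other_op, operands in block[i + 1:]:
        if name in operands:
            break  # first use of the value: stop looking
        if other_op == opcode:
            found.append(other)
    return found

# Each entry is (name, opcode, operands); x0 and x1 come from outside the block.
INTERLEAVED = [
    ("b0", "fmul", ("x0", "x0")),
    ("c0", "fadd", ("b0", "x0")),  # first use of b0
    ("b1", "fmul", ("x1", "x1")),  # independent of b0, but after its first use
    ("c1", "fadd", ("b1", "x1")),
]

# The same instructions after a dependence-preserving reordering.
REORDERED = [
    ("b0", "fmul", ("x0", "x0")),
    ("b1", "fmul", ("x1", "x1")),
    ("c0", "fadd", ("b0", "x0")),
    ("c1", "fadd", ("b1", "x1")),
]
```

Here `fast_dep_candidates(INTERLEAVED, 0)` finds no partner for b0, while
`fast_dep_candidates(REORDERED, 0)` finds b1.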

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_bb_vectorize-20111021-2.diff
Type: text/x-patch
Size: 46947 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111021/76f1c228/attachment.bin>

