[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Tue Oct 25 15:02:03 PDT 2011

Hal, 

As you mentioned, your patch implements only one kind of vectorization, and LLVM may gain other kinds later. As such, I suggest that you create a 'vectorize' directory and put your pass in it. You can name it bb-vectorization.

Thanks,
Nadav

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
Sent: Tuesday, October 25, 2011 23:23
To: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

I've attached an improved version of the autovectorization pass. This version will also vectorize loads and stores, casts, and some intrinsics (fma and trig. functions).

There are, correspondingly, a few new options:
bb-vectorize-no-casts -- Don't try to vectorize casting (conversion) operations
bb-vectorize-no-math -- Don't try to vectorize floating-point math intrinsics (this is just the trig. functions right now)
bb-vectorize-no-fma -- Don't try to vectorize the fused-multiply-add intrinsic
bb-vectorize-no-mem-ops -- Don't try to vectorize loads and stores
bb-vectorize-aligned-only -- Only generate aligned loads and stores

To make this really useful, there are some improvements necessary to InstCombine (and a few other things). But the autovectorization process itself now seems to work well. Please review this patch; adding the vectorization pass itself should not affect any other code (although it does touch some common files to add support for the pass into opt). If it looks okay, please let me know, and I'll commit it.

Thanks in advance,
Hal

On Fri, 2011-10-21 at 16:04 -0500, Hal Finkel wrote:
> I've attached an initial version of a basic-block autovectorization 
> pass. It works by searching a basic block for pairable (independent) 
> instructions, and, using a chain-seeking heuristic, selects pairings 
> likely to provide an overall speedup (if such pairings can be found).
> The selected pairs are then fused and, if necessary, other 
> instructions are moved in order to maintain data-flow consistency. 
> This works only within one basic block, but can do loop vectorization 
> in combination with (partial) unrolling. The basic idea was inspired 
> by the Vienna MAP Vectorizer, which has been used to vectorize FFT 
> kernels, but the algorithm used here is different.
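
A minimal sketch of the "pairable (independent) instructions" idea described above, written in Python rather than as LLVM C++. The `Instr` record, the helper names, and the toy block are assumptions made for illustration and are not taken from the patch; the chain-seeking selection and the actual fusing step are not modeled here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Instr:
    """Hypothetical stand-in for one instruction in a basic block."""
    name: str        # the value this instruction defines
    opcode: str      # e.g. "fadd", "fmul"
    operands: tuple  # names of the values it uses


def depends_on(block, later, earlier):
    """True if `later` uses the value of `earlier`, directly or transitively."""
    defs = {i.name: i for i in block}
    worklist, seen = list(later.operands), set()
    while worklist:
        name = worklist.pop()
        if name == earlier.name:
            return True
        if name in defs and name not in seen:
            seen.add(name)
            worklist.extend(defs[name].operands)
    return False


def candidate_pairs(block):
    """Same-opcode pairs (in block order) where the later one
    does not depend on the earlier one."""
    return [(a, b) for a, b in combinations(block, 2)
            if a.opcode == b.opcode and not depends_on(block, b, a)]


block = [
    Instr("a0", "fmul", ("x0", "y0")),
    Instr("a1", "fmul", ("x1", "y1")),
    Instr("b0", "fadd", ("a0", "z0")),
    Instr("b1", "fadd", ("a1", "z1")),
    Instr("c", "fadd", ("b0", "b1")),
]
pairs = [(a.name, b.name) for a, b in candidate_pairs(block)]
print(pairs)
```

In this toy block the two multiplies and the two dependent adds are pairable, while `c` is not, because it uses both `b0` and `b1`.
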
> 
> To try it, use -bb-vectorize with opt. There are a few options:
> -bb-vectorize-req-chain-depth: default: 3 -- The depth of the chain of 
> instruction pairs necessary in order to consider the pairs that 
> compose the chain worthy of vectorization.
> -bb-vectorize-vector-bits: default: 128 -- The size of the target 
> vector registers
> -bb-vectorize-no-ints -- Don't consider integer instructions
> -bb-vectorize-no-floats -- Don't consider floating-point instructions
> 
> The vectorizer generates a lot of insert_element/extract_element 
> pairs; the assumption is that other passes will turn these into 
> shuffles when possible (it looks like some work is necessary here). It 
> will also vectorize vector instructions, and generates shuffles in 
> this case (again, other passes should combine these as appropriate).
> 
> Currently, it does not fuse load or store instructions, but that is a 
> feature that I'd like to add. Of course, alignment information is an 
> issue for load/store vectorization (or maybe I should just fuse them 
> anyway and let isel deal with unaligned cases?).
> 
> Also, support needs to be added for fusing known intrinsics (fma, 
> etc.), and, as has been discussed on llvmdev, we should add some 
> intrinsics to allow the generation of addsub-type instructions.
> 
> I've included a few tests, but more are needed. Please review (I'll 
> commit if and when everyone is happy).
> 
> Thanks in advance,
> Hal
> 
> P.S. There is another option (not so useful right now, but could be):
> -bb-vectorize-fast-dep -- Don't do a full inter-instruction dependency 
> analysis; instead stop looking for instruction pairs after the first 
> use of an instruction's value. [This makes the pass faster, but would 
> require a data-dependence-based reordering pass in order to be 
> effective].
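
A small Python sketch of the difference between the two search modes described in the P.S., under stated assumptions: the `Instr` tuple, both function names, and the toy block are hypothetical, and "pairable" is reduced to "same opcode" for brevity. It only illustrates why stopping at the first use can miss a partner that a full dependency scan would find.

```python
from collections import namedtuple

# Hypothetical stand-in for an instruction: defined value, opcode, used values.
Instr = namedtuple("Instr", "name opcode operands")


def partners_fast(block, i):
    """Scan forward from block[i], stopping at the first use of its value."""
    first, out = block[i], []
    for later in block[i + 1:]:
        if first.name in later.operands:
            break  # first use reached: give up on anything further down
        if later.opcode == first.opcode:
            out.append(later.name)
    return out


def partners_full(block, i):
    """Scan the rest of the block, skipping anything that depends on block[i]."""
    first, tainted, out = block[i], {block[i].name}, []
    for later in block[i + 1:]:
        if tainted & set(later.operands):
            tainted.add(later.name)  # direct or transitive dependence
            continue
        if later.opcode == first.opcode:
            out.append(later.name)
    return out


block = [
    Instr("a0", "fmul", ("x0", "y0")),
    Instr("b0", "fadd", ("a0", "z0")),  # first use of a0
    Instr("a1", "fmul", ("x1", "y1")),  # independent of a0, but after its use
]
fast = partners_fast(block, 0)
full = partners_full(block, 0)
print(fast, full)
```

Here the fast scan finds no partner for `a0`, while the full scan finds `a1`. If an earlier pass had moved `a1` above `b0`, both scans would agree, which is the kind of reordering the P.S. says the fast mode would need.
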
> 

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory