[LLVMdev] GSoC 2009: Auto-vectorization
Vikram S. Adve
vadve at cs.uiuc.edu
Wed Apr 1 15:40:30 PDT 2009
Andreas,
I agree this would be a great project. One comment:
On Apr 1, 2009, at 2:50 PM, Andreas Bolka wrote:
> Hi Stefanus,
>
> On Wed Apr 01 16:08:45 +0200 2009, Stefanus Du Toit wrote:
>> On 31-Mar-09, at 8:27 PM, Andreas Bolka wrote:
>>> i.e. the core of the desired result would look like:
>>>
>>> %va = load <256 x i32>* %a
>>> %vb = load <256 x i32>* %b
>>> %vc = add <256 x i32> %va, %vb
>>> store <256 x i32> %vc, <256 x i32>* %c
>>
>> I think the biggest problem with this approach, apart from the fact
>> that it doesn't mirror how vectors are typically used today in LLVM,
>> is that vectors in LLVM are of fixed size. This is going to severely
>> limit the usefulness of this transformation. I think you may be better
>> off getting information about vector widths for the architecture (e.g.
>> from TargetData) and vectorizing directly with a particular width in
>> mind.
>
> Thanks for the remark. My initial thinking was that, independent of
> auto-vectorization, a general "wide vector strip-mining" transformation
> would be worthwhile to have anyway, i.e. a transformation that converts
> wide vector operations like the above into a loop over
> (target-dependent) smaller-width vectors. And as the auto-vectorizer I
> am aiming for would initially only be able to vectorize loops with
> statically known constant trip counts, I could squash some complexity
> by leaving this target-dependent aspect to such a hypothetical
> strip-mining pass.
>
> But I understand the limits of this, and based on the feedback I got so
> far, I'd rather generate a loop with fixed-size vectors instead. A
> command-line option to set the vectorization factor should do fine to
> enable quick bootstrapping, if querying the necessary register width
> from TargetData turns out to be problematic.
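For concreteness, a strip-mined version of the example above, using a
hypothetical vectorization factor of 4 and the statically known trip
count of 256, might look roughly like the following sketch (function and
value names are illustrative, and alignment is ignored):

  define void @vadd256(i32* %a, i32* %b, i32* %c) {
  entry:
    br label %loop

  loop:
    ; induction variable steps by the vectorization factor (4)
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %pa = getelementptr i32* %a, i32 %i
    %pb = getelementptr i32* %b, i32 %i
    %pc = getelementptr i32* %c, i32 %i
    %vpa = bitcast i32* %pa to <4 x i32>*
    %vpb = bitcast i32* %pb to <4 x i32>*
    %vpc = bitcast i32* %pc to <4 x i32>*
    %va = load <4 x i32>* %vpa
    %vb = load <4 x i32>* %vpb
    %vc = add <4 x i32> %va, %vb
    store <4 x i32> %vc, <4 x i32>* %vpc
    %i.next = add i32 %i, 4
    %done = icmp eq i32 %i.next, 256       ; 256 elements in total
    br i1 %done, label %exit, label %loop

  exit:
    ret void
  }

A command-line vectorization factor would then simply change the vector
width, the step of %i, and the loop bound.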
Even if you decide to generate loops with fixed-size vectors, I think
it could be worth separating your implementation into
(a) an auto-vectorization analysis that identifies vectorizable
statements and transforms the code to capture them, independent of
vector length;
(b) a trivial strip-mining pass that generates loops with fixed-size
vectors; and
(c) an *optional* loop alignment step that adjusts vector operations
so vector loads/stores are aligned (and creates the extra scalar loop
at the end).
Doing (a) in terms of arbitrary-length vectors will simplify the
problem by allowing you to isolate two relatively unrelated tasks,
viz., auto-vectorization analysis (#a) and target-dependent code
generation (#b and #c). It also lets you leave the third step as a
separate pass, which might be difficult otherwise. Note that (c) is
optional only in the sense that you don't need it for simple cases
like you described; in general, it is not optional.
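To illustrate how (b) and (c) fit together when the trip count is not a
compile-time multiple of the vector width, the generated code might look
roughly like the sketch below (the width of 4, the runtime trip count %n,
and all names are assumptions for illustration; a real pass would also
have to handle alignment, which this sketch ignores):

  define void @vadd(i32* %a, i32* %b, i32* %c, i32 %n) {
  entry:
    %n.vec = and i32 %n, -4               ; round %n down to a multiple of 4
    %has.vec = icmp sgt i32 %n.vec, 0
    br i1 %has.vec, label %vec.body, label %scalar.pre

  vec.body:                                ; (b) fixed-width vector loop
    %i = phi i32 [ 0, %entry ], [ %i.next, %vec.body ]
    %pa = getelementptr i32* %a, i32 %i
    %pb = getelementptr i32* %b, i32 %i
    %pc = getelementptr i32* %c, i32 %i
    %vpa = bitcast i32* %pa to <4 x i32>*
    %vpb = bitcast i32* %pb to <4 x i32>*
    %vpc = bitcast i32* %pc to <4 x i32>*
    %va = load <4 x i32>* %vpa
    %vb = load <4 x i32>* %vpb
    %vc = add <4 x i32> %va, %vb
    store <4 x i32> %vc, <4 x i32>* %vpc
    %i.next = add i32 %i, 4
    %more.vec = icmp slt i32 %i.next, %n.vec
    br i1 %more.vec, label %vec.body, label %scalar.pre

  scalar.pre:
    %j.start = phi i32 [ 0, %entry ], [ %i.next, %vec.body ]
    %has.rem = icmp slt i32 %j.start, %n
    br i1 %has.rem, label %scalar.body, label %exit

  scalar.body:                             ; (c) scalar loop for leftover elements
    %j = phi i32 [ %j.start, %scalar.pre ], [ %j.next, %scalar.body ]
    %qa = getelementptr i32* %a, i32 %j
    %qb = getelementptr i32* %b, i32 %j
    %qc = getelementptr i32* %c, i32 %j
    %sa = load i32* %qa
    %sb = load i32* %qb
    %sc = add i32 %sa, %sb
    store i32 %sc, i32* %qc
    %j.next = add i32 %j, 1
    %more.rem = icmp slt i32 %j.next, %n
    br i1 %more.rem, label %scalar.body, label %exit

  exit:
    ret void
  }

The scalar loop here only handles the leftover iterations; aligning the
vector loads/stores as described in (c) would typically also require
peeling iterations at the front or using unaligned memory operations.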
On a related note, the following paper describes such a back end, and a
reasonable IR for interfacing a front end like (a)-(c) above with the
back end of such a compiler:
http://llvm.cs.uiuc.edu/pubs/2006-06-15-VEE-VectorLLVA.html
>
>
>> Overall, this would be a great GSoC project, and I guarantee you
>> would get a lot of interest in this :). Good luck!
>
> Thanks!
>
> --
> Andreas
--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve