[LLVMdev] GSoC 2009: Auto-vectorization

Vikram S. Adve vadve at cs.uiuc.edu
Wed Apr 1 15:40:30 PDT 2009


Andreas,

I agree this would be a great project.  One comment:


On Apr 1, 2009, at 2:50 PM, Andreas Bolka wrote:

> Hi Stefanus,
>
> On Wed Apr 01 16:08:45 +0200 2009, Stefanus Du Toit wrote:
>> On 31-Mar-09, at 8:27 PM, Andreas Bolka wrote:
>>> i.e. the core of the desired result would look like:
>>>
>>>   %va = load <256 x i32>* %a
>>>   %vb = load <256 x i32>* %b
>>>   %vc = add <256 x i32> %va, %vb
>>>   store <256 x i32> %vc, <256 x i32>* %c
>>
>> I think the biggest problem with this approach, apart from the fact
>> that it doesn't mirror how vectors are typically used today in LLVM,
>> is that vectors in LLVM are of fixed size. This is going to severely
>> limit the usefulness of this transformation. I think you may be better
>> off getting information about vector widths for the architecture (e.g.
>> from TargetData) and vectorizing directly with a particular width in
>> mind.
>
> Thanks for the remark. My initial thinking was that, independent of
> auto-vectorization, a general "wide vector strip-mining" transformation
> would be worthwhile to have anyway. I.e. a transformation which would
> convert such wide vector operations as above to a loop over
> (target-dependent) smaller-width vectors. And as the auto-vectorizer I
> am aiming for would initially only be able to vectorize loops with
> statically known constant trip counts, I could squash some complexity
> by leaving this target-dependent aspect to such a hypothetical
> strip-mining pass.
>
> But I understand the limits of this, and based on the feedback I got
> so far, I'd rather generate a loop with fixed-size vectors instead. A
> command-line option to determine the vectorization factor should do
> fine to enable quick bootstrapping, if querying the necessary register
> width from TargetData turns out to be problematic.


Even if you decide to generate loops with fixed-size vectors, I think  
it could be worth separating your implementation into
(a) an auto-vectorization analysis that identifies vectorizable  
statements and transforms the code to capture them, independent of  
vector length;
(b) a trivial strip-mining pass that generates loops with fixed-size  
vectors; and
(c) an *optional* loop alignment step that adjusts vector operations  
so vector loads/stores are aligned (and creates the extra scalar loop  
at the end).
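
For Andreas's 256 x i32 example, the output of (a) and (b) might look roughly like the following. This is only a sketch: the <4 x i32> width, the trip count of 64, and all value names are illustrative assumptions, not required choices.

```llvm
; after (a): the vectorizable statement captured as one wide vector
; operation, independent of the target's vector length
%va = load <256 x i32>* %a
%vb = load <256 x i32>* %b
%vc = add <256 x i32> %va, %vb
store <256 x i32> %vc, <256 x i32>* %c

; after (b): the same operation strip-mined into 64 iterations over
; target-sized <4 x i32> chunks (%a4, %b4, %c4 are the same pointers
; bitcast to <4 x i32>*)
loop:
  %i      = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %pa     = getelementptr <4 x i32>* %a4, i32 %i
  %pb     = getelementptr <4 x i32>* %b4, i32 %i
  %pc     = getelementptr <4 x i32>* %c4, i32 %i
  %va     = load <4 x i32>* %pa
  %vb     = load <4 x i32>* %pb
  %vc     = add <4 x i32> %va, %vb
  store <4 x i32> %vc, <4 x i32>* %pc
  %i.next = add i32 %i, 1
  %done   = icmp eq i32 %i.next, 64
  br i1 %done, label %exit, label %loop
```

Note that only (b) needs to know the target width; the IR after (a) is already in the form a later pass can strip-mine mechanically.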

Doing (a) in terms of arbitrary-length vectors will simplify the
problem by allowing you to isolate two relatively unrelated tasks,
viz., auto-vectorization analysis (#a) and target-dependent code
generation (#b and #c).  It also lets you leave the third step as a
separate pass, which might be difficult otherwise.  Note that (c) is
optional only in the sense that you don't need it for simple cases
like the one you described; in general, it is not optional.
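
To make (c) concrete: when the base pointers are not known to be aligned to the vector register width, the alignment step would peel scalar iterations up front and emit a scalar cleanup loop at the end, so the vector body can use aligned loads/stores. A structural sketch only, assuming 16-byte vector registers and that all three arrays happen to share the same misalignment (otherwise peeling can align at most one of them, and the others need unaligned accesses):

```llvm
entry:
  ; how many bytes %a is off from a 16-byte boundary
  %addr  = ptrtoint i32* %a to i32
  %misal = and i32 %addr, 15
  %ok    = icmp eq i32 %misal, 0
  br i1 %ok, label %vector.body, label %scalar.peel

scalar.peel:      ; scalar iterations until %a reaches a 16-byte boundary
  ...
vector.body:      ; the <4 x i32> loop from step (b), now using
  ...             ; "load ..., align 16" / "store ..., align 16"
scalar.epilogue:  ; the leftover (trip count mod 4) iterations
  ...
```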

On a related note, the following paper describes such a back end, and  
a reasonable IR for interfacing the front end (a-c above) with the  
back end of such a compiler:
	http://llvm.cs.uiuc.edu/pubs/2006-06-15-VEE-VectorLLVA.html


>
>
>> Overall, this would be a great GSOC project, and I guarantee you  
>> would
>> get a lot of interest in this :). Good luck!
>
> Thanks!
>
> -- 
> Andreas
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve