[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Tue Nov 29 15:00:51 PST 2011

On Nov 27, 2011, at 7:04 PM, Hal Finkel wrote:

> On Sun, 2011-11-27 at 18:34 -0800, Evan Cheng wrote:
>> This is very interesting work. Thanks. Quick questions / comments.
>> 
>> 1. What's the algorithm complexity?
> 
> Roughly, it is O(N^2), where N is the number of instructions in a basic
> block, but:
> - The run time only *really* grows with the number of "pairable"
> instructions; this is *generally* a much smaller number than the total
> number of instructions (except in test cases). Also determining whether
> two loads or stores are pairable is more expensive than for other kinds
> of instructions because scalar-evolution analysis is used.
> - There is a (configurable) cutoff, so N will not grow exceedingly
> large even for large basic blocks (this was a suggestion by Tobias
> Grosser).

N^2 is still very scary. Is there room for improvement? Can you share the compile time of LLVM test suites?

> 
>> 2. Please rename -vectorize to -bb-vectorize since this is a specific kind of vectorization.
> 
> I have two objections:
> 1. -vectorize is an option to the pass manager, and as I hope that
> we'll soon have other vectorization passes in addition to this one,
> -vectorize could then enable those in some default configuration as
> well. You can use -bb-vectorize currently to enable just this pass in
> the same way as any other transformation pass.
> 2. I don't think that I can because it would cause the option to
> conflict with the name of the pass. In addition to enabling the pass
> itself, -vectorize adds instcombine and gvn passes to do some necessary
> cleanup.

Ok!

> 
>> 3. Have you tried non-x86 targets? Does it work? If not, what would it take to make it work?
> 
> I have not yet tried it, but I am planning to do that this week. My
> primary motivation for working on autovectorization is to use it on the
> IBM Blue Gene supercomputers (which are PPC-based). The pass is designed
> to be platform independent, and it *should* work.
> 
>> 
>> I am concerned that LLVM currently does not have a cost model for vectorization. That's necessary for your work to be applicable for general adoption.
> 
> In general I agree. That having been said, as-is, the pass produces
> significant speedups in "real" applications (including some in the test
> suite). Unfortunately, it also produces some real slow-downs, and so I
> think that before vectorization could be enabled by default (at -O3 for
> example), we would indeed need a cost model.

Agreed.

Evan

> 
> Thank you for looking at this,
> Hal
> 
>> 
>> Evan
>> 
>> On Nov 23, 2011, at 8:52 AM, Hal Finkel wrote:
>> 
>>> On Mon, 2011-11-21 at 21:22 -0600, Hal Finkel wrote:
>>>> On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote:
>>>>> Tobias,
>>>>> 
>>>>> I've attached an updated patch. It contains a few bug fixes and many
>>>>> (refactoring and coding-convention) changes inspired by your comments.
>>>>> 
>>>>> I'm currently trying to fix the bug responsible for causing a compile
>>>>> failure when compiling
>>>>> test-suite/MultiSource/Applications/obsequi/toggle_move.c; after the
>>>>> pass begins to fuse instructions in a basic block in this file, the
>>>>> aliasing analysis starts producing different (more pessimistic) query
>>>>> answers. I've never before seen it do this, and I'm not sure why it is
>>>>> happening. Also odd, at the same time, the numeric labels that are
>>>>> assigned to otherwise unnamed instructions, change. I don't think I've
>>>>> seen this happen either (they generally seem to stay fixed once
>>>>> assigned). I don't know if these two things are related, but if anyone
>>>>> can provide some insight, I'd appreciate it.
>>>> 
>>>> I think that I see what is happening in this case (please someone tell
>>>> me if this does not make sense). In the problematic basic block, there
>>>> are some loads and stores that are independent. The default aliasing
>>>> analysis can tell that these loads and stores don't reference the same
>>>> memory region. Then, some of the inputs to the getelementptr
>>>> instructions used for these loads and stores are fused by the
>>>> vectorization. After this happens, the aliasing analysis loses its
>>>> ability to tell that the loads and stores that make use of those
>>>> vector-calculated indices are independent.
>>> 
>>> The attached patch works around this issue. Please review.
>>> 
>>> Thanks again,
>>> Hal
>>> 
>>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> In any case, this version of the patch should be much more suitable for
>>>>> your (or anyone else's) further review.
>>>>> 
>>>>> Thanks again,
>>>>> Hal
>>>>> 
>>>>> On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:
>>>>>> On 11/17/2011 12:38 AM, Hal Finkel wrote:
>>>>>>> Tobias, et al.,
>>>>>>> 
>>>>>>> Attached is the my autovectorization pass.
>>>>>> 
>>>>>> Very nice. Will you be at the developer summit? Maybe we could discuss 
>>>>>> the integration there?
>>>>>> 
>>>>>> Here a first review of the source code.
>>>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> Hal Finkel
>>> Postdoctoral Appointee
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> <llvm_bb_vectorize-20111122.diff>_______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
> 
> -- 
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
>