[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Thu Nov 17 11:11:46 PST 2011

On 11/17/2011 10:01 AM, Hal Finkel wrote:
> On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:
>> On 11/17/2011 12:38 AM, Hal Finkel wrote:
>>> Tobias, et al.,
>>>
>>> Attached is the my autovectorization pass.
>>
>> Very nice. Will you be at the developer summit? Maybe we could discuss
>> the integration there?
>
> Unfortunately, not this year.
>
>>
>> Here a first review of the source code.
>
> Thanks! This is great! There are a few inline comments; but for the most
> part I'll makes the changes as suggested.

>>> +      if (const SCEVConstant *ConstOffSCEV =
>>> +            dyn_cast<SCEVConstant>(RelOffSCEV)) {
>>> +        ConstantInt *IntOff = ConstOffSCEV->getValue();
>>> +        int64_t off = IntOff->getSExtValue();
>>> +
>>> +        Type *VTy = cast<PointerType>(Iptr->getType())->getElementType();
>>
>> You should assert that the types of both pointers are indentical.
>
> This is done implicitly in all calling code. This, however, should be
> documented as a precondition of the function.

Yes. And just put an assert in the function that checks this. This does 
not cost a lot and may save someone who did not read the comment.

>> Also you should document that this function only works for vector types.
>> I actually expected it to just work for scalar types. Vector types look
>> a little bit wired here and I am actually not even sure if it is correct
>> for vector types.
>
> What do you mean by working only for vector types? I don't think that's
> true. As I recall, it seemed to work find for both loads and stores
> using both scalar pointers and vector pointers.

Oh sorry. I thought getElementType() returns the element of a vector. 
However, it returns the type pointed to by the pointer.

>>> +        int64_t VTy_tss = (int64_t) TD.getTypeStoreSize(VTy);
>>> +
>>> +        if (off == VTy_tss) {
>>> +          return 1;
>>> +        } else if (-off == VTy_tss) {
>>> +          return -1;
>>> +        }
>> Braces not needed.
>>
>>> +      }
>> Did you think of using SE.getSizeOfExpr()?
>
> Yes. ;)
>
> I did not for two reasons. First, if we have target data, which we need
> for load/store vectorization anyway, then it is unnecessary because the
> sizeof gets constant folded. Thus, that division is just an integer
> division in the relevant cases anyway.
True.

>
> Second, correct me if I'm wrong, but I would need a signed division, not
> an unsigned division.

At least I cannot convince myself that UDiv is always correct. Just 
leave it like this now.

>>> +        if (FastDep) {
>>> +          // Note: For this heuristic to be effective, independent operations
>>> +          // must tend to be intermixed. This is likely to be true from some
>>> +          // kinds of loop unrolling (but not the generic LLVM pass),
>>> +          // but otherwise may require some kind of reordering pass.
>>
>> What does it mean that independent operations must be intermixed? Can
>> you give an example?
>
> It means that if you group variables into independent groups, say A and
> B, such that things in B don't use results from A and vice-verse, and
> then have: A B A B A B A B... then FastDep will be able to, if the
> instruction are otherwise compatible, vectorize to make<A B>  <A B>...
> On the other hand, if you have A A A A B B B B, then you need a full
> search and dependency analysis to determine that you can move some of
> the Bs ahead of the preceding As in order to vectorize them.

Basically this means FastDep would be sufficient, if you do grouped 
unrolling?

> Will do, and thanks again. I'll send an updated patch soon.

Great!

Tobi