[PATCH] support llvm.vectorization.vector_width metadata

Mon May 27 15:39:44 PDT 2013

On 05/27/2013 03:12 PM, Arnold Schwaighofer wrote:
> Hi Tobias,
>
>
> On May 27, 2013, at 3:26 PM, Tobias Grosser <tobias at grosser.es> wrote:
>
>> On 05/27/2013 12:42 PM, Redmond, Paul wrote:
>>> +
>>> +'``llvm.vectorizer.width``' Metadata
>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>>> +This metadata instructs the loop vectorizer to vectorize the specified
>>> +loop using a SIMD width of ``N``.
>>> +
>>> +The first operand is the string ``llvm.vectorizer.width`` and the second
>>> +operand is an integer specifying the width. For example:
>>> +
>>> +.. code-block:: llvm
>>>
>>
>> I am not fully sure about the exact meaning of this.
>>
>> Does this MD say anything about legality or is it supposed to only overwrite the cost heuristics of the vectorizer?
>>
>
> No, the width parameter says nothing about legality; it is only meant to control the vectorizer's heuristics. Legality follows from the loop being marked as fully “parallel” (or from an analysis proving it).
>> For now it seems to me that this is just about the cost function and the way to vectorize. To prove that vectorization is legal, it seems that all the mem.parallel_loop_access metadata must still be present, which according to the current specification is enough to ensure vectorization is legal. Is this right?
>>
>
> Yes.

Thanks for clarifying this. It would be great if something similar could 
be added to the documentation.
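
The split Arnold describes (legality is settled first, and the width only steers the heuristic) can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `LoopFacts`, `choose_vf` and the cost-model input are hypothetical names used for illustration, not LLVM's actual API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the vectorizer knows about a loop.
 * None of these names are LLVM APIs. */
typedef struct {
    bool all_accesses_parallel; /* every access carries mem.parallel_loop_access */
    bool analysis_proves_legal; /* dependence analysis found no blocking dependence */
    unsigned width_hint;        /* operand of llvm.vectorizer.width, 0 if absent */
} LoopFacts;

/* Returns the vectorization factor to use; 1 means "leave the loop scalar". */
unsigned choose_vf(const LoopFacts *loop, unsigned cost_model_vf) {
    assert(cost_model_vf >= 1);
    bool legal = loop->all_accesses_parallel || loop->analysis_proves_legal;
    if (!legal)
        return 1;                /* a width hint never makes an illegal loop legal */
    if (loop->width_hint > 0)
        return loop->width_hint; /* the hint only replaces the cost model's choice */
    return cost_model_vf;
}
```

For example, a loop whose accesses are all marked parallel and that carries a hint of 8 gets a factor of 8 whatever the cost model suggests, while the same hint on a loop with no legality proof still yields 1.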

>> Regarding this I have now two comments:
>>
>> 1) What happens with non-parallel, but vectorizable loops
>>
>> When vectorizing with a width of 4, the loop need not be fully parallel; it suffices that all dependences carried by this loop have a dependence distance of at least 4. Do you have any plans to model this in the metadata?
>>
>
>> 2) Vectorization only legal if statements are executed in lock-step
>>
>> For some loops, vectorization is legal if the statements are executed in lock-step, but a thread-parallel execution of the very same loop is not legal. Do you have any plans to model this in your metadata?
>>
>
> Yes, we cannot express this with the current metadata. I don’t think we have immediate plans to add support for this.

OK, I also don't see an urgent need for this.
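
To make point 1 concrete, the sketch below simulates a loop `a[i] = a[i - d] + 1` executed with vectors of `vf` lanes and compares the result against scalar execution. It assumes each vector chunk loads all of its lanes before storing any of them; `run_scalar`, `run_vectorized` and `vf_is_safe` are hypothetical helpers. Under that model a carried dependence distance equal to the width is already enough: `vf_is_safe(4, 4)` holds, while `vf_is_safe(3, 4)` does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N 32

/* Scalar semantics of: for (i = d; i < N; ++i) a[i] = a[i - d] + 1; */
void run_scalar(int *a, int d) {
    for (int i = d; i < N; ++i)
        a[i] = a[i - d] + 1;
}

/* Simulated vector execution with factor vf: each chunk loads all of its
 * lanes before storing any, like one vector load followed by one vector
 * store. Leftover iterations run in a scalar epilogue. */
void run_vectorized(int *a, int d, int vf) {
    assert(vf >= 1 && vf <= N);
    int i = d;
    for (; i + vf <= N; i += vf) {
        int lanes[N];
        for (int l = 0; l < vf; ++l)
            lanes[l] = a[i + l - d] + 1;   /* vector load + add */
        for (int l = 0; l < vf; ++l)
            a[i + l] = lanes[l];           /* vector store */
    }
    for (; i < N; ++i)                     /* scalar remainder */
        a[i] = a[i - d] + 1;
}

/* True if vectorizing with factor vf preserves the scalar result. */
bool vf_is_safe(int d, int vf) {
    assert(d >= 1 && d < N && vf >= 1 && vf <= N);
    int s[N], v[N];
    for (int k = 0; k < N; ++k)
        s[k] = v[k] = k * 7;
    run_scalar(s, d);
    run_vectorized(v, d, vf);
    return memcmp(s, v, sizeof s) == 0;
}
```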

> If we decide we want to make the distinction “vectorizable” vs “parallel” we can certainly add metadata.
>
> However, we would have to make sure that no scalar optimization moves “vectorizable_loop_accesses” around:
>
> for (i = 1; i < n; ++i) {
>    a[i] = ...;
>    ... = a[i-1];  // These two don't alias, so a scalar opt could exchange them. If they are
>                   // swapped, vectorization is wrong with respect to the original loop.
> }
>
> So this is not a trivial extension. (Maybe we could number the “vectorizable_loop_accesses” and check their order on entry to the vectorizer; could this break down? ...)

Yes, something would be necessary in that direction. I don't yet see all the implications.
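
The reordering hazard in Arnold's example can be reproduced with a small simulation. In the sketch below (hypothetical helpers `run` and `same_result`, with fixed `LEN` and `VF` chosen for illustration), both statement orders give the same scalar result, which is exactly why a scalar optimization is free to swap them, but only the store-first order survives vectorization with 4 lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LEN 16
#define VF 4

/* Runs: for (i = 1; i < LEN; ++i) { a[i] = x[i]; out[i] = a[i - 1]; }
 * with the two statements in either order, either scalar or with simulated
 * VF-lane vectors (one vector store and one vector load per chunk). */
static void run(bool store_first, bool vectorize, const int *x, int *out) {
    assert(x != NULL && out != NULL);
    int a[LEN];
    for (int k = 0; k < LEN; ++k) {
        a[k] = -1;              /* stale contents that differ from every x[k] */
        out[k] = 0;
    }
    int step = vectorize ? VF : 1;
    int i = 1;
    for (; i + step <= LEN; i += step) {
        if (store_first) {
            for (int l = 0; l < step; ++l) a[i + l] = x[i + l];       /* store */
            for (int l = 0; l < step; ++l) out[i + l] = a[i + l - 1]; /* load */
        } else {
            for (int l = 0; l < step; ++l) out[i + l] = a[i + l - 1]; /* load */
            for (int l = 0; l < step; ++l) a[i + l] = x[i + l];       /* store */
        }
    }
    for (; i < LEN; ++i) {      /* scalar remainder, same statement order */
        if (store_first) { a[i] = x[i]; out[i] = a[i - 1]; }
        else             { out[i] = a[i - 1]; a[i] = x[i]; }
    }
}

/* True if the two configurations produce identical out[] arrays. */
bool same_result(bool sf1, bool v1, bool sf2, bool v2) {
    int x[LEN], o1[LEN], o2[LEN];
    for (int k = 0; k < LEN; ++k)
        x[k] = 10 + k;
    run(sf1, v1, x, o1);
    run(sf2, v2, x, o2);
    return memcmp(o1, o2, sizeof o1) == 0;
}
```

Here `same_result(true, false, false, false)` and `same_result(true, false, true, true)` both hold, whereas `same_result(true, false, false, true)` does not: after the swap, the vector load reads `a[i..i+2]` before the vector store has written them.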

>> I am not saying all this should be implemented in this patch, but it would be good to document whether or not this is implied.
>>
>
> No, this is not implied. Yes, this should be made clear.

Great.

>> And a last question to help my understanding:
>>
>> Why does the vectorizer unroll the loop rather than leaving that to the loop unroller? In other words, what is the reason this is implemented in the vectorizer?
>>
>
> The kind of unrolling performed is different. In the loop vectorizer we know that iterations are independent, so we can replicate the loop body (vector) instruction by (vector) instruction. This helps exploit ILP without having to rely too much on the scheduler to get it right. It is also cheap to do there.
>
> LLVM’s standard loop unroller unrolls whole loop iterations, so you rely on the scheduler (and its analysis of memory accesses) to get ILP right.

Thanks for explaining.
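
The difference between the two kinds of unrolling shows up clearly on a reduction. The sketch below is illustrative only: `VF`, `UF` and the three functions are hypothetical, and vector registers are modelled as plain arrays. Unrolling whole iterations copies the body but, without further reassociation, keeps one accumulator and thus one serial chain of adds; the vectorizer-style version replicates each vector add `UF` times with its own accumulator, giving `UF * VF` independent chains.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4   /* lanes per simulated vector register */
#define UF 2   /* interleave ("unroll") factor */

/* Plain scalar sum: one accumulator, so every add depends on the previous one. */
long sum_scalar(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Whole-iteration unrolling by UF: the body is copied, but all copies
 * still feed the same accumulator. */
long sum_unrolled(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + UF <= n; i += UF)
        for (size_t u = 0; u < UF; ++u)
            s += a[i + u];                   /* still one serial dependence chain */
    for (; i < n; ++i)
        s += a[i];
    return s;
}

/* Vectorized by VF and interleaved by UF: each vector add is replicated UF
 * times with its own accumulator, so the partial sums are independent. */
long sum_vectorized_interleaved(const int *a, size_t n) {
    long acc[UF][VF] = {{0}};
    size_t i = 0;
    for (; i + UF * VF <= n; i += UF * VF)
        for (size_t u = 0; u < UF; ++u)      /* replicated vector add */
            for (size_t l = 0; l < VF; ++l)  /* lanes of one vector add */
                acc[u][l] += a[i + u * VF + l];
    long s = 0;
    for (size_t u = 0; u < UF; ++u)          /* final horizontal reduction */
        for (size_t l = 0; l < VF; ++l)
            s += acc[u][l];
    for (; i < n; ++i)                       /* scalar remainder */
        s += a[i];
    return s;
}
```

All three return the same sum for integer input; with floating point, the interleaved version reassociates the additions and can round differently.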

Cheers,
Tobias