[llvm-dev] Invoke loop vectorizer

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 12 13:21:50 PDT 2016


Right, and if you are not running it on the target, it's also not going to
detect the target features correctly, I believe?
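You can check from C which vector features the compiler assumes for the selected target. Michael's point below is that the vectorizer gives up when the target has no vector registers. This is a minimal sketch, assuming clang or GCC, which predefine `__AVX2__`, `__SSE2__`, or `__ARM_NEON` when the target has those features. `assumed_simd_isa` is a hypothetical helper.

```c
/* Hypothetical helper: reports the SIMD instruction set that clang/GCC
 * assume for the current compilation target, based on predefined macros.
 * The answer comes from the target settings, not from the build machine. */
const char *assumed_simd_isa(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "none detected";
#endif
}
```

Building the same file for a different target (for example with clang's `-target` or `-march` options) can change the reported value. That target description is roughly what the vectorizer's cost queries rely on.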


On Fri, Aug 12, 2016 at 12:46 PM, Michael Kuperstein <mkuper at google.com>
wrote:

> The loop vectorizer is not independent of the target, since it queries the
> target for cost estimates to make the vectorization profitability decision.
>
> Your code has a pragma explicitly requesting vectorization, so
> profitability should not come into play, but there may be other
> target-related issues. One example I can think of is that we will never
> vectorize if the target has no vector registers.
>
> On Fri, Aug 12, 2016 at 12:20 PM, Xiaochu Liu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I'm not compiling it for x86. Shouldn't the loop optimizer be independent
>> of the target? If so, shouldn't the vectorized code show up at the IR level?
>>
>> On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>>
>>> cat > test.c
>>>
>>> #define SIZE 128
>>>
>>> void bar(int *restrict A, int *restrict B, int K) {
>>>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>   for (int i = 0; i < SIZE; ++i)
>>>     A[i] += B[i] + K;
>>> }
>>>
>>> [dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3  test.c -c -save-temps
>>> [dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
>>>         pushq   %rbp
>>>         pshufd  $68, %xmm0, %xmm0       ## xmm0 = xmm0[0,1,0,1]
>>>         pslldq  $8, %xmm1               ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
>>>         pshufd  $68, %xmm3, %xmm3       ## xmm3 = xmm3[0,1,0,1]
>>>         paddq   %xmm1, %xmm3
>>>         pshufd  $78, %xmm3, %xmm4       ## xmm4 = xmm3[2,3,0,1]
>>>         punpckldq       %xmm5, %xmm4    ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
>>>         pshufd  $212, %xmm4, %xmm4      ## xmm4 = xmm4[0,1,1,3]
>>>
>>>
>>>
>>> Note:
>>> It also vectorizes at SIZE=8.
>>>
>>> Not sure what the exact translation of options from clang-cl to clang is.
>>> Maybe try adding /O3?
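The sketch below makes `vectorize_width(2)` concrete by writing out the transformation it requests: two elements per iteration instead of one. It assumes the GCC/clang `vector_size` extension. `v2si`, `bar_scalar`, and `bar_width2` are hypothetical names, and this is not the code LLVM actually emits.

```c
#include <string.h>

#define SIZE 128

/* Hypothetical 2-lane int vector, via the GCC/clang vector_size extension. */
typedef int v2si __attribute__((vector_size(2 * sizeof(int))));

/* Scalar reference: the loop from test.c, without the pragma. */
void bar_scalar(int *restrict A, int *restrict B, int K) {
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}

/* Hand-written width-2 version. SIZE is a multiple of 2,
 * so no scalar epilogue loop is needed here. */
void bar_width2(int *restrict A, int *restrict B, int K) {
    v2si vk = {K, K};                     /* broadcast K into both lanes */
    for (int i = 0; i < SIZE; i += 2) {
        v2si va, vb;
        memcpy(&va, &A[i], sizeof va);    /* load A[i], A[i+1] */
        memcpy(&vb, &B[i], sizeof vb);    /* load B[i], B[i+1] */
        va += vb + vk;                    /* lane-wise add */
        memcpy(&A[i], &va, sizeof va);    /* store both lanes back */
    }
}
```

The `memcpy` loads and stores avoid any alignment assumption. Compilers typically lower them to ordinary (possibly unaligned) vector moves.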
>>>
>>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com>
>>> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I increased the size in your test to 128, but -stats still shows no
>>>> loops optimized...
>>>>
>>>> Xiaochu
>>>>
>>>> On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>>>>
>>>>> It's not possible to know that A and B don't alias in this example.
>>>>> It's almost certainly not profitable to add a runtime check given the size
>>>>> of the loop.
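Aliasing blocks vectorization because doing two iterations at once reads an element before the previous iteration has written it. The result only stays the same when the arrays don't overlap. This is a minimal sketch in plain C (no intrinsics). `add_scalar`, `add_width2`, and `overlap_changes_result` are hypothetical helpers for illustration.

```c
/* Scalar semantics of the loop body, with no restrict qualifiers. */
void add_scalar(int *A, const int *B, int K, int n) {
    for (int i = 0; i < n; ++i)
        A[i] += B[i] + K;
}

/* Width-2 version: both lanes of A and B are loaded before either store.
 * Equivalent to add_scalar only when A and B do not overlap. */
void add_width2(int *A, const int *B, int K, int n) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        int a0 = A[i], a1 = A[i + 1];
        int b0 = B[i], b1 = B[i + 1];
        A[i] = a0 + b0 + K;
        A[i + 1] = a1 + b1 + K;
    }
    for (; i < n; ++i)      /* scalar epilogue for odd n */
        A[i] += B[i] + K;
}

/* Returns 1 if the two versions disagree when B trails A by one element. */
int overlap_changes_result(void) {
    int s[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int v[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    add_scalar(s + 1, s, 0, 8);   /* A = s + 1, B = s: B[i] is A[i - 1] */
    add_width2(v + 1, v, 0, 8);
    for (int i = 0; i < 9; ++i)
        if (s[i] != v[i])
            return 1;
    return 0;
}
```

With `A = s + 1` and `B = s`, the scalar loop computes running sums (1, 3, 6, 10, ...). The two-wide version writes 3 and 5 for the first pair, where the scalar loop gives 3 and 6. `restrict` is the programmer's promise that no such overlap happens.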
>>>>>
>>>>>
>>>>> try
>>>>>
>>>>> #define SIZE 8
>>>>>
>>>>> void bar(int *restrict A, int *restrict B, int K) {
>>>>>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>>>   for (int i = 0; i < SIZE; ++i)
>>>>>     A[i] += B[i] + K;
>>>>> }
>>>>>
>>>>> (I don't remember whether LLVM also does runtime alias checks, but if it
>>>>> does, you'd probably need to increase the size to get it to vectorize.)
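LLVM's loop vectorizer can emit runtime alias checks. It versions the loop behind a pointer-overlap test, which only pays off when the loop is long enough. This is a minimal C sketch of that idea, not LLVM's implementation. `ranges_overlap` and `bar_versioned` are hypothetical names.

```c
#include <stdint.h>

/* Does [A, A + n) overlap [B, B + n)? Comparing through uintptr_t avoids
 * relational comparison of pointers into different objects, which C
 * leaves undefined; the integer test assumes a flat address space. */
static int ranges_overlap(const int *A, const int *B, int n) {
    uintptr_t a = (uintptr_t)A, b = (uintptr_t)B;
    uintptr_t bytes = (uintptr_t)n * sizeof(int);
    return a < b + bytes && b < a + bytes;
}

/* Loop versioning: take the two-wide path only when the check proves
 * the arrays are disjoint, otherwise fall back to the scalar loop. */
void bar_versioned(int *A, const int *B, int K, int n) {
    int i = 0;
    if (!ranges_overlap(A, B, n)) {
        for (; i + 1 < n; i += 2) {       /* "vector" path */
            int b0 = B[i], b1 = B[i + 1];
            A[i] += b0 + K;
            A[i + 1] += b1 + K;
        }
    }
    for (; i < n; ++i)                    /* scalar path and epilogue */
        A[i] += B[i] + K;
}
```

The check costs a few compares and a branch per call. For an 8-iteration loop, that overhead can eat most of the gain, which is why a larger size makes vectorizing more attractive.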
>>>>>
>>>>> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> Thanks. I found that even when the loop vectorizer and the SLP vectorizer
>>>>>> are enabled, my simple test still does not get optimized. I also tried a
>>>>>> clang pragma in my test to force vectorization. What do you think is the problem?
>>>>>>
>>>>>> Test:
>>>>>>
>>>>>> #define SIZE 8
>>>>>>
>>>>>> void bar(int *A, int *B, int K) {
>>>>>>   #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>>>>   for (int i = 0; i < SIZE; ++i)
>>>>>>     A[i] += B[i] + K;
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Xiaochu
>>>>>>
>>>>>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Xiaochu,
>>>>>>>
>>>>>>> Clang uses -O0 by default, which doesn't run any optimizations. Try
>>>>>>> supplying -O1 or higher.
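You can confirm that a given driver invocation actually enabled optimization from inside the source. This is a sketch that assumes the `__OPTIMIZE__` macro, which clang and GCC predefine at -O1 and above. It also assumes that clang-cl's `/O` options map onto those same levels. `built_with_optimization` is a hypothetical helper.

```c
/* Hypothetical helper: returns 1 if the translation unit was compiled
 * with optimization enabled (__OPTIMIZE__ defined), 0 at -O0. */
int built_with_optimization(void) {
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

If this returns 0, no loop pass has run. In that case, a missing vectorization remark says nothing about the loop itself.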
>>>>>>>
>>>>>>> Yours,
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I use clang-cl /Qvec test.c to compile the code, but the
>>>>>>>> LoopVectorizer pass is never invoked.
>>>>>>>>
>>>>>>>> I was wondering whether this is sufficient to enable the auto-vectorizer.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Xiaochu
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>