[llvm-dev] Invoke loop vectorizer
Xiaochu Liu via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 12 13:26:29 PDT 2016
Thanks, guys!
I found that my target is missing the getNumberOfRegisters function. The loop
vectorizer is invoked, but no loop was examined...
My back end is still under construction... Sorry about that.
Thanks,
Xiaochu
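
To make the failure mode concrete, here is a toy C model of the kind of gate described above. It is only a sketch: `toy_target` and `toy_should_examine_loops` are made-up names, and the condition (no vector registers reported and no benefit from interleaving) is an assumption about how the vectorizer decides to skip a function, so it should be checked against the LLVM sources in use.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the answers a target gives the vectorizer. */
struct toy_target {
    unsigned num_vector_registers;  /* what a getNumberOfRegisters-style hook reports */
    unsigned max_interleave_factor; /* 1 means interleaving cannot help */
};

/* Toy gate (an assumption, not LLVM code): skip every loop in the function
   when the target reports no vector registers and interleaving would not
   help either. */
bool toy_should_examine_loops(const struct toy_target *t) {
    if (t->num_vector_registers == 0 && t->max_interleave_factor < 2)
        return false;
    return true;
}
```

Calling `toy_should_examine_loops` on a target described as `{0, 1}` returns false, which matches the symptom of the pass running without examining any loop.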
On Aug 12, 2016 1:21 PM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
Right, and if you are not running it on the target, it's also not going to
detect the target features correctly, I believe?
On Fri, Aug 12, 2016 at 12:46 PM, Michael Kuperstein <mkuper at google.com>
wrote:
> The loop vectorizer is not independent of the target, since it queries the
> target for cost estimates to make the vectorization profitability decision.
>
> Your code has a pragma explicitly requesting vectorization, so
> profitability should not come into play, but there may be other
> target-related issues. One example I can think of is that we will never
> vectorize if the target has no vector registers.
>
> On Fri, Aug 12, 2016 at 12:20 PM, Xiaochu Liu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I'm not compiling it for x86. Should the loop optimizer be something
>> independent of the target? If so, should the vectorized code appear at the
>> IR level?
>>
>> On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>>
>>> cat > test.c
>>>
>>> #define SIZE 128
>>>
>>> void bar(int *restrict A, int *restrict B, int K) {
>>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>   for (int i = 0; i < SIZE; ++i)
>>>     A[i] += B[i] + K;
>>> }
>>>
>>> [dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
>>> [dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
>>> pushq %rbp
>>> pshufd $68, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,0,1]
>>> pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
>>> pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
>>> paddq %xmm1, %xmm3
>>> pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
>>> punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
>>> pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]
>>>
>>>
>>>
>>> Note:
>>> It also vectorizes at SIZE=8.
>>>
>>> Not sure what the exact translation of options from clang-cl to clang is.
>>> Maybe try adding /O3?
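
One way to double-check a build like the one above is to compare the pragma-annotated kernel against a plain reference loop. The sketch below assumes the same `bar` as in test.c; `bar_reference` and `count_mismatches` are hypothetical helper names. The build line in the comment uses clang's `-Rpass` family of remark flags, which should report whether the loop was vectorized, assuming a clang that supports them.

```c
/* One possible build line for inspecting the vectorizer's decisions:
     clang -O3 -c -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
           -Rpass-analysis=loop-vectorize thisfile.c                   */
#define SIZE 128

/* Same kernel as test.c in the thread. */
void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}

/* Plain reference loop without the pragma (hypothetical helper). */
void bar_reference(int *A, const int *B, int K) {
    for (int i = 0; i < SIZE; ++i)
        A[i] = A[i] + B[i] + K;
}

/* Runs both versions on identical inputs and returns the number of
   elements that differ; 0 means the two versions agree. */
int count_mismatches(int K) {
    int a[SIZE], a_ref[SIZE], b[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        a[i] = a_ref[i] = i;
        b[i] = 2 * i + 1;
    }
    bar(a, b, K);
    bar_reference(a_ref, b, K);

    int bad = 0;
    for (int i = 0; i < SIZE; ++i)
        bad += (a[i] != a_ref[i]);
    return bad;
}
```

A result of 0 from `count_mismatches` only shows that the two versions agree on this input; whether the loop was actually vectorized still has to be read from the remarks or the assembly.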
>>>
>>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com>
>>> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I increased the size of your test to 128, but -stats still shows no
>>>> loops optimized...
>>>>
>>>> Xiaochu
>>>>
>>>> On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>>>>
>>>>> It's not possible to know that A and B don't alias in this example.
>>>>> It's almost certainly not profitable to add a runtime check given the size
>>>>> of the loop.
>>>>>
>>>>>
>>>>> try
>>>>>
>>>>> #define SIZE 8
>>>>>
>>>>> void bar(int *restrict A, int *restrict B, int K) {
>>>>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>>>   for (int i = 0; i < SIZE; ++i)
>>>>>     A[i] += B[i] + K;
>>>>> }
>>>>>
>>>>> (I don't remember if LLVM also does runtime alias checks, but if it
>>>>> does, you'd probably need to increase SIZE to get it to vectorize)
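
The aliasing concern can be illustrated with a small C sketch. It emulates a width-2 vector loop by hand (both loads happen before either store) and compares it with the scalar loop when B points one element before A. All function names here are made up for illustration, and `ranges_overlap` is only a simplified picture of what a runtime alias check tests, not LLVM's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar loop, executed in source order. */
void add_scalar(int *A, const int *B, int K, int n) {
    for (int i = 0; i < n; ++i)
        A[i] += B[i] + K;
}

/* Hand-written emulation of a width-2 vector loop: both elements are
   loaded before either result is stored. Assumes n is even. */
void add_width2(int *A, const int *B, int K, int n) {
    for (int i = 0; i + 1 < n; i += 2) {
        int b0 = B[i], b1 = B[i + 1]; /* "vector load" of B */
        int a0 = A[i], a1 = A[i + 1]; /* "vector load" of A */
        A[i] = a0 + b0 + K;           /* "vector store" to A */
        A[i + 1] = a1 + b1 + K;
    }
}

/* Simplified overlap test on the byte ranges [A, A+n) and [B, B+n). */
bool ranges_overlap(const int *A, const int *B, int n) {
    uintptr_t a = (uintptr_t)A, b = (uintptr_t)B;
    uintptr_t bytes = (uintptr_t)n * sizeof(int);
    return a < b + bytes && b < a + bytes;
}

/* Returns true when the scalar and width-2 orders give different results
   for A = buffer + 1, B = buffer (so A[i] depends on the previous store). */
bool orders_disagree_when_aliased(void) {
    int x[9], y[9];
    for (int i = 0; i < 9; ++i)
        x[i] = y[i] = 1;
    add_scalar(x + 1, x, 0, 8);
    add_width2(y + 1, y, 0, 8);
    for (int i = 0; i < 9; ++i)
        if (x[i] != y[i])
            return true;
    return false;
}
```

With `restrict`, the programmer promises that this kind of overlap cannot happen, so the compiler does not need to prove it or guard the vector loop with a check.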
>>>>>
>>>>> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> Thanks. I found that even when the loop vectorizer and SLP vectorizer
>>>>>> are enabled, my simple test still does not get optimized. I also tried a
>>>>>> clang pragma in my test to force vectorization. What do you think is the
>>>>>> problem?
>>>>>>
>>>>>> Test:
>>>>>>
>>>>>> #define SIZE 8
>>>>>>
>>>>>> void bar(int *A, int *B, int K) {
>>>>>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>>>>>   for (int i = 0; i < SIZE; ++i)
>>>>>>     A[i] += B[i] + K;
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Xiaochu
>>>>>>
>>>>>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Xiaochu,
>>>>>>>
>>>>>>> Clang uses -O0 by default, which doesn't run any optimizations. Try
>>>>>>> supplying -O1 or higher.
>>>>>>>
>>>>>>> Yours,
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I use clang-cl /Qvec test.c to compile the code. But the pass
>>>>>>>> LoopVectorizer is never invoked.
>>>>>>>>
>>>>>>>> I was wondering if this is sufficient to enable the auto-vectorizer?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Xiaochu
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>