[llvm-dev] Invoke loop vectorizer
Daniel Berlin via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 12 11:39:37 PDT 2016
cat > test.c
#define SIZE 128
void bar(int *restrict A, int* restrict B,int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
for (int i = 0; i < SIZE; ++i)
A[i] += B[i] + K;
}
[dannyb at dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps
[dannyb at dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
pushq %rbp
pshufd $68, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,0,1]
pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
paddq %xmm1, %xmm3
pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]
Note:
It also vectorizes at SIZE=8.
Not sure what the exact translation of options from clang-cl to clang is.
Maybe try adding /O3?
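
For reference, the aliasing point in the quoted reply below can be made concrete with a hand-written version of the "runtime check" idea. This is only a sketch under stated assumptions: the names bar_scalar, bar_noalias, ranges_overlap and bar_checked are made up for illustration, the overlap test assumes a flat address space where pointers can be compared as integers, and it does not claim to show what LLVM itself emits. The idea is to test once whether the two arrays overlap, then run either a safe scalar loop or a loop whose restrict qualifiers let the compiler vectorize.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 128

/* Plain loop with no aliasing promise: correct even if A and B overlap. */
void bar_scalar(int *A, const int *B, int K) {
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}

/* Same loop, but restrict promises the compiler that A and B do not
   overlap, so it is free to vectorize without any check. */
void bar_noalias(int *restrict A, const int *restrict B, int K) {
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}

/* Returns 1 if [A, A+SIZE) and [B, B+SIZE) share any bytes.
   Assumption: a flat address space, so integer comparison is meaningful. */
int ranges_overlap(const int *A, const int *B) {
    uintptr_t a = (uintptr_t)A;
    uintptr_t b = (uintptr_t)B;
    uintptr_t len = SIZE * sizeof(int);
    return a < b + len && b < a + len;
}

/* Hypothetical hand-written "loop versioning": one overlap test up front,
   then dispatch to the safe loop or the no-alias loop. */
void bar_checked(int *A, int *B, int K) {
    if (ranges_overlap(A, B))
        bar_scalar(A, B, K);
    else
        bar_noalias(A, B, K);
}
```

The check costs a few compares and a branch per call, which is why it may not pay off for a very short loop and why marking the parameters restrict, as in test.c above, avoids the question entirely.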
On Fri, Aug 12, 2016 at 11:23 AM, Xiaochu Liu <xiaochu1122 at gmail.com> wrote:
> Hi Daniel,
>
> I increased the size of your test to be 128 but -stats still shows no loop
> optimized...
>
> Xiaochu
>
> On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote:
>
>> It's not possible to know that A and B don't alias in this example. It's
>> almost certainly not profitable to add a runtime check given the size of
>> the loop.
>>
>>
>> try
>>
>> #define SIZE 8
>> void bar(int *restrict A, int* restrict B,int K) {
>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>> for (int i = 0; i < SIZE; ++i)
>> A[i] += B[i] + K;
>> }
>>
>> (I don't remember whether LLVM also does runtime alias checks, but if it
>> does, you'd probably need to increase SIZE to get it to vectorize.)
>>
>> On Fri, Aug 12, 2016 at 11:08 AM, Xiaochu Liu via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi Andrey,
>>>
>>> Thanks. I found that even when the loop vectorizer and SLP vectorizer are
>>> enabled, my simple test still does not get optimized. I also tried a clang
>>> pragma in my test to force vectorization. What do you think is the problem?
>>>
>>> Test:
>>>
>>> #define SIZE 8
>>> void bar(int *A, int* B,int K) {
>>> #pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
>>> for (int i = 0; i < SIZE; ++i)
>>> A[i] += B[i] + K;
>>> }
>>>
>>> Thanks,
>>> Xiaochu
>>>
>>> On Aug 12, 2016 4:06 AM, "Andrey Bokhanko" <andreybokhanko at gmail.com>
>>> wrote:
>>>
>>>> Hi Xiaochu,
>>>>
>>>> Clang uses -O0 by default, which doesn't run any optimizations. Try
>>>> supplying -O1 or higher.
>>>>
>>>> Yours,
>>>> Andrey
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 1:04 AM, Xiaochu Liu via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I use clang-cl /Qvec test.c to compile the code, but the
>>>>> LoopVectorizer pass is never invoked.
>>>>>
>>>>> Is this sufficient to enable the auto-vectorizer?
>>>>>
>>>>> Thanks,
>>>>> Xiaochu
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>