[LLVMdev] Autovectorization questions

Wed Mar 12 17:01:28 PDT 2014

Zinovy,

To clarify: the code is vectorizable, but LLVM currently fails to prove that it is.
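
For anyone who wants to reproduce this, here is a minimal sketch that puts the three loops from the thread below into one file so their vectorizer output can be compared. The function names are hypothetical (the originals are all called foo), and I declare them void because the originals never return a value. Whether a given loop is vectorized depends on the LLVM version, so treat this as a starting point rather than a statement of what any particular compiler will do.

```c
/* Hypothetical names; each body is one of the loops quoted in this thread. */

/* Strided store and strided load, as in the original question.
   Per the discussion, LLVM could not prove that A[i*7] does not wrap. */
void strided_store(int *A, int *B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i * 7] += B[i * k];
}

/* Unit-stride store, strided load, constant trip count of 1024. */
void unit_store_const_trip(int *A, int *B, int k) {
  for (int i = 0; i < 1024; ++i)
    A[i] += B[i * k];
}

/* Unit-stride store, strided load, with restrict so A and B cannot alias. */
void unit_store_restrict(int *restrict A, int *restrict B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i] += B[i * k];
}
```

Assuming the file is saved as loops.c (a made-up name) and the clang in use is recent enough to support optimization remarks, something like the following should report which loops were vectorized and which were not:

$> clang -O3 -std=c99 -c loops.c -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4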

On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> In order to vectorize code like this, LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so, and therefore LLVM doesn’t vectorize this loop even if we try to force it.
> 
> The following loop will be vectorized if we force it:
> 
> int foo(int * A, int * B, int n, int k) {
>  for (int i = 0; i < 1024; ++i)
>    A[i] += B[i*k];
> }
> 
> So will this loop:
> 
> int foo(int * restrict A, int * restrict B, int n, int k) {
>  for (int i = 0; i < n; ++i)
>    A[i] += B[i*k];
> }
> 
> I will update the example.
> 
> Thanks,
> Arnold
> 
> On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
> 
>> Hi Zinovy, 
>> 
>> The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold. 
>> 
>> Thanks,
>> Nadav
>> 
>> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I'm reading "http://llvm.org/docs/Vectorizers.html" and have a few questions. I hope someone can answer them.
>>> 
>>> 
>>> The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather)
>>> 
>>> int foo(int *A, int *B, int n, int k) {
>>>  for (int i = 0; i < n; ++i)
>>>    A[i*7] += B[i*k];
>>> }
>>> 
>>> I replaced "int *A"/"int *B" with "double *A"/"double *B" and then compiled the sample with
>>> 
>>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S  -fslp-vectorize-aggressive
>>> 
>>> and the loop body looks like:
>>> 
>>> .LBB1_2:                                # %for.body
>>>                                        # =>This Inner Loop Header: Depth=1
>>>        cltq
>>>        vmovsd  (%rsi,%rax,8), %xmm0
>>>        movq    %r9, %r10
>>>        sarq    $32, %r10
>>>        vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
>>>        vmovsd  %xmm0, (%rdi,%r10,8)
>>>        addq    %r8, %r9
>>>        addl    %ecx, %eax
>>>        decl    %edx
>>>        jne     .LBB1_2
>>> 
>>> So scalar forms of the vector instructions (vaddsd, vmovsd) were used in the loop, and no real gather/scatter was emitted.
>>> 
>>> The question is: why was this loop not vectorized? Is it a typo in the docs?
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 