[LLVMdev] Autovectorization questions
Arnold Schwaighofer
aschwaighofer at apple.com
Wed Mar 12 17:01:28 PDT 2014
Zinovy,
To clarify: the code is vectorizable, but LLVM currently fails to prove that it is.
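
As a rough illustration of what "does not wrap in the address space" means for a strided access such as A[i*7], here is a minimal sketch. It is not LLVM's analysis. The helper name strided_access_wraps and its parameters are hypothetical, and addresses are modelled as plain uintptr_t values. It only checks the start address of the last element touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM API): returns true if the address
 * base + (n - 1) * stride * elem_size overflows uintptr_t, i.e. the
 * sequence of strided accesses would wrap around the address space.
 * Precondition: elem_size is nonzero. */
bool strided_access_wraps(uintptr_t base, uintptr_t n, uintptr_t stride,
                          uintptr_t elem_size)
{
    assert(elem_size > 0);

    /* With at most one iteration, or a zero stride, only `base` is touched. */
    if (n <= 1 || stride == 0)
        return false;

    /* Byte distance between consecutive accesses; check the multiply. */
    if (stride > UINTPTR_MAX / elem_size)
        return true;
    uintptr_t step = stride * elem_size;

    /* Byte offset of the last access; check the multiply. */
    uintptr_t last = n - 1;
    if (last > UINTPTR_MAX / step)
        return true;
    uintptr_t offset = last * step;

    /* Finally, check the addition base + offset. */
    return offset > UINTPTR_MAX - base;
}
```

A compiler cannot run such a check on concrete values at compile time. It has to prove the property symbolically for every possible n and base, or fall back to a runtime check, which is where the proof can fail.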
On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> In order to vectorize code like this, LLVM needs to prove that “A[i*7]” does not wrap in the address space. It currently fails to prove this, so it does not vectorize the loop even if we try to force it.
>
> The following loop will be vectorized if we force it:
>
> int foo(int * A, int * B, int n, int k) {
>   for (int i = 0; i < 1024; ++i)
>     A[i] += B[i*k];
> }
>
> So will this loop:
>
> int foo(int * restrict A, int * restrict B, int n, int k) {
>   for (int i = 0; i < n; ++i)
>     A[i] += B[i*k];
> }
>
> I will update the example.
>
> Thanks,
> Arnold
>
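
For anyone who wants to try the three loops from this thread side by side, here is a small sketch that puts them in one file and checks that they compute what a plain scalar loop would. The function names, the array sizes, the value of K, and the variants_agree helper are assumptions made for illustration. The functions return void rather than int because nothing is returned, and the unused n parameter is dropped from the constant-trip-count variant.

```c
#include <assert.h>
#include <string.h>

enum { N = 1024, K = 3 };

/* Loop from the original question: strided store and strided load. */
void foo_strided(int *A, int *B, int n, int k)
{
    for (int i = 0; i < n; ++i)
        A[i * 7] += B[i * k];
}

/* First variant from the reply: unit-stride store, constant trip count. */
void foo_const_trip(int *A, int *B, int k)
{
    for (int i = 0; i < 1024; ++i)
        A[i] += B[i * k];
}

/* Second variant from the reply: restrict-qualified pointers. */
void foo_restrict(int *restrict A, int *restrict B, int n, int k)
{
    for (int i = 0; i < n; ++i)
        A[i] += B[i * k];
}

/* Hypothetical self-check: returns 1 if every variant matches the
 * expected scalar result, 0 otherwise. */
int variants_agree(void)
{
    static int A0[N * 7], A1[N], A2[N], B[N * K];

    /* The largest index read from B must stay in bounds. */
    assert((N - 1) * K < (int)(sizeof B / sizeof B[0]));

    memset(A0, 0, sizeof A0);
    for (int i = 0; i < N * K; ++i)
        B[i] = i % 97;
    for (int i = 0; i < N; ++i)
        A1[i] = A2[i] = i;

    foo_strided(A0, B, N, K);
    foo_const_trip(A1, B, K);
    foo_restrict(A2, B, N, K);

    for (int i = 0; i < N; ++i) {
        if (A0[i * 7] != B[i * K])
            return 0;
        if (A1[i] != i + B[i * K] || A2[i] != i + B[i * K])
            return 0;
    }
    return 1;
}
```

In recent clang releases, compiling with -O2 and the remark flags -Rpass=loop-vectorize, -Rpass-missed=loop-vectorize and -Rpass-analysis=loop-vectorize reports per loop whether it was vectorized. Which of these loops vectorize depends on the compiler version and target.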
> On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
>> Hi Zinovy,
>>
>> The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold.
>>
>> Thanks,
>> Nadav
>>
>> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm reading "http://llvm.org/docs/Vectorizers.html" and have a few questions. I hope someone can answer them.
>>>
>>>
>>> The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather)
>>>
>>> int foo(int *A, int *B, int n, int k) {
>>>   for (int i = 0; i < n; ++i)
>>>     A[i*7] += B[i*k];
>>> }
>>>
>>> I replaced "int *A"/"int *B" with "double *A"/"double *B" and then compiled the sample with
>>>
>>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
>>>
>>> and the loop body looks like:
>>>
>>> .LBB1_2:                        # %for.body
>>>                                 # =>This Inner Loop Header: Depth=1
>>>     cltq
>>>     vmovsd (%rsi,%rax,8), %xmm0
>>>     movq %r9, %r10
>>>     sarq $32, %r10
>>>     vaddsd (%rdi,%r10,8), %xmm0, %xmm0
>>>     vmovsd %xmm0, (%rdi,%r10,8)
>>>     addq %r8, %r9
>>>     addl %ecx, %eax
>>>     decl %edx
>>>     jne .LBB1_2
>>>
>>> So the loop uses only the scalar forms of the vector instructions (vaddsd, vmovsd), and no real gather/scatter was emitted.
>>>
>>> The question is: why was this loop not vectorized? Is there a typo in the docs?
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>