<div dir="ltr">Hi,<div><br></div><div>I'm reading "<a href="http://llvm.org/docs/Vectorizers.html">http://llvm.org/docs/Vectorizers.html</a>" and have a few questions. I hope someone can answer them.</div><br><br>
The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (<a href="http://llvm.org/docs/Vectorizers.html#scatter-gather">http://llvm.org/docs/Vectorizers.html#scatter-gather</a>)<br>
<br>int foo(int *A, int *B, int n, int k) {<br> for (int i = 0; i < n; ++i)<br> A[i*7] += B[i*k];<br>}<div><br></div><div>I changed "int *A"/"int *B" to "double *A"/"double *B" and then compiled the sample with:</div>
<div><br><div>$> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive<br></div><div><br></div><div>and the loop body looks like this:</div><div><br></div><div><div>.LBB1_2: # %for.body</div>
<div> # =>This Inner Loop Header: Depth=1</div><div> cltq</div><div> vmovsd (%rsi,%rax,8), %xmm0</div><div> movq %r9, %r10</div><div> sarq $32, %r10</div>
<div> vaddsd (%rdi,%r10,8), %xmm0, %xmm0</div><div> vmovsd %xmm0, (%rdi,%r10,8)</div><div> addq %r8, %r9</div><div> addl %ecx, %eax</div><div> decl %edx</div><div> jne .LBB1_2</div>
</div><div><br></div><div>So only the scalar forms of the AVX instructions (vaddsd, vmovsd) were used in the loop, and no real gather/scatter instructions were emitted.</div><div><br></div><div>My question is: why was this loop not vectorized? Or is this a mistake in the docs?</div>
<div><br></div></div></div>
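<div>For reference, here is a minimal sketch of the modified test.c, assuming the only change from the docs example is the element type. The void return type is an adjustment made here (the docs version is declared int but returns nothing), so it may differ slightly from the file that was actually compiled.</div>

```c
/* Sketch of the modified docs example: int* replaced by double*.
   Assumption: return type changed to void, since nothing is returned. */
void foo(double *A, double *B, int n, int k) {
    for (int i = 0; i < n; ++i)
        A[i * 7] += B[i * k];  /* store with stride 7, load with stride k */
}
```

<div>Since the command above only emits assembly (-S), no main function is needed to reproduce the loop body.</div>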