[LLVMdev] loop vectorizer

Wed Oct 30 10:40:15 PDT 2013

I ran the BB vectorizer, as I guess this is the SLP vectorizer.

BBV: using target information
BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
BBV: found 2 instructions with candidate pairs
BBV: found 0 pair connections.
BBV: done!

However, this was run on the unrolled loop (I guess).

Here is the IR printed by 'opt':

entry:
   %cmp9 = icmp ult i64 %start, %end
   br i1 %cmp9, label %for.body, label %for.end

for.body:                                         ; preds = %entry, %for.body
   %storemerge10 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
   %div = lshr i64 %storemerge10, 2
   %mul1 = shl i64 %div, 3
   %rem = and i64 %storemerge10, 3
   %add2 = or i64 %mul1, %rem
   %0 = lshr i64 %storemerge10, 1
   %add51 = shl i64 %0, 2
   %mul6 = or i64 %rem, %add51
   %add8 = or i64 %mul6, 4
   %arrayidx = getelementptr inbounds float* %a, i64 %add2
   %1 = load float* %arrayidx, align 4
   %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
   %2 = load float* %arrayidx9, align 4
   %add10 = fadd float %1, %2
   %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
   store float %add10, float* %arrayidx11, align 4
   %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
   %3 = load float* %arrayidx12, align 4
   %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
   %4 = load float* %arrayidx13, align 4
   %add14 = fadd float %3, %4
   %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
   store float %add14, float* %arrayidx15, align 4
   %inc = add i64 %storemerge10, 1
   %exitcond = icmp eq i64 %inc, %end
   br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
   ret void
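For reference, the loop in the IR above corresponds roughly to the C++ below. This is a reconstruction from the IR, not the original source. The signature comes from the mangled name _Z3barmmPfS_S_, which is bar(unsigned long, unsigned long, float*, float*, float*). The names ir0/ir1 are borrowed from Nadav's question further down, on the assumption that they refer to these two indices. The sketch shows that %add8 always equals %add2 + 4, and that %add2 runs 0,1,2,3,8,9,10,11,... as i increases.

```cpp
// Hypothetical reconstruction of bar() (mangled _Z3barmmPfS_S_).
// Only the index values are recovered from the IR; the original
// source expressions are unknown.

// %add2 in the IR: ((i >> 2) << 3) | (i & 3), i.e. 8*(i/4) + i%4.
constexpr unsigned long ir0(unsigned long i) {
  return ((i >> 2) << 3) | (i & 3);
}

// %add8 in the IR: (i & 3) | ((i >> 1) << 2) | 4. Forcing bit 2 to 1
// makes this equal to ir0(i) | 4, and since bit 2 of ir0(i) is always
// 0, that is the same as ir0(i) + 4.
constexpr unsigned long ir1(unsigned long i) {
  return ((i & 3) | ((i >> 1) << 2)) | 4;
}

// Stride 1 inside each group of four iterations, a jump of +5 between groups.
static_assert(ir0(3) == 3 && ir0(4) == 8, "jump between groups of four");
static_assert(ir1(6) == ir0(6) + 4, "second index is the first plus 4");

void bar(unsigned long start, unsigned long end,
         float *a, float *b, float *c) {
  for (unsigned long i = start; i < end; ++i) {
    unsigned long j0 = ir0(i), j1 = ir1(i);
    c[j0] = a[j0] + b[j0];  // first load/load/fadd/store group
    c[j1] = a[j1] + b[j1];  // second group, 4 elements further on
  }
}
```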

Are you saying that I should first unroll the loop by a given factor
and then run the SLP vectorizer again? How would I do that for, say, a
factor of 2?
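One way to picture an unroll by 2 is to do it by hand at the source level. The sketch below is hypothetical: bar_unrolled2 and ir0 are illustrative names, and it assumes start and end are both even. It is not what opt's loop unroller emits. Whether the SLP vectorizer then forms vectors from the adjacent pairs depends on its cost model, and that has not been checked here. The statements keep the original execution order, so aliasing between a, b and c behaves exactly as in the rolled loop.

```cpp
#include <cassert>

// Index computed as %add2 in the IR: 8*(i/4) + i%4.
static inline unsigned long ir0(unsigned long i) {
  return ((i >> 2) << 3) | (i & 3);
}

// Hypothetical hand-unrolled-by-2 loop. With i even, i % 4 is 0 or 2,
// so i + 1 stays in the same group of four and ir0(i + 1) == ir0(i) + 1.
// Each body therefore touches the adjacent pairs {j, j+1} and {j+4, j+5}.
void bar_unrolled2(unsigned long start, unsigned long end,
                   float *a, float *b, float *c) {
  assert(start % 2 == 0 && end % 2 == 0);  // assumed precondition
  for (unsigned long i = start; i < end; i += 2) {
    unsigned long j = ir0(i);
    c[j]     = a[j]     + b[j];      // iteration i,     first access
    c[j + 4] = a[j + 4] + b[j + 4];  // iteration i,     second access
    c[j + 1] = a[j + 1] + b[j + 1];  // iteration i + 1, first access
    c[j + 5] = a[j + 5] + b[j + 5];  // iteration i + 1, second access
  }
}
```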

Frank

On 30/10/13 13:28, Renato Golin wrote:
> On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote:
>
>     The access pattern to arrays a and b is non-linear. Unrolled
>     loops are usually handled by the SLP vectorizer. Are ir0 and ir1
>     consecutive for all values of i?
>
>
> Based on his list of values, it seems that the induction stride is
> linear within each block of 4 iterations, but the relationship
> between blocks is not obvious.
>
> As you say, it should be possible to spot that once the loop is
> unrolled, and to get the SLP vectorizer to vectorize it if the
> relationship becomes clear.
>
> Maybe I'm wrong, but this looks like a missed optimization
> opportunity rather than something technically hard to implement.
>
> --renato
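Renato's observation about blocks of 4 iterations can be checked directly. Take a group of four iterations starting at a multiple of 4. Its two accesses per iteration (the %add2 and %add8 indices in the IR) cover eight consecutive elements, so an unroll by 4 turns the body into a plain contiguous c[j] = a[j] + b[j] run. This is a sketch under stated assumptions. start and end must be multiples of 4. c must not partially overlap a or b, because the unrolled loop performs the stores in a different order. group_is_contiguous and bar_unrolled4 are hypothetical names.

```cpp
#include <cassert>

// Index computed as %add2 in the IR: 8*(i/4) + i%4.
static inline unsigned long ir0(unsigned long i) {
  return ((i >> 2) << 3) | (i & 3);
}

// For the group of iterations i = 4k .. 4k+3, returns true if the eight
// indices touched (ir0(i) and ir0(i) + 4 per iteration) are exactly
// 8k, 8k+1, ..., 8k+7: eight distinct values inside an 8-wide window.
bool group_is_contiguous(unsigned long k) {
  bool seen[8] = {};
  for (unsigned long i = 4 * k; i < 4 * k + 4; ++i) {
    unsigned long idx[2] = {ir0(i), ir0(i) + 4};
    for (unsigned long x : idx) {
      if (x < 8 * k || x >= 8 * k + 8 || seen[x - 8 * k]) return false;
      seen[x - 8 * k] = true;
    }
  }
  return true;
}

// Hypothetical unroll-by-4: each group of four iterations becomes one
// contiguous run of eight elements starting at ir0(i) == 2*i.
void bar_unrolled4(unsigned long start, unsigned long end,
                   float *a, float *b, float *c) {
  assert(start % 4 == 0 && end % 4 == 0);  // assumed precondition
  for (unsigned long i = start; i < end; i += 4) {
    unsigned long base = 2 * i;  // equals ir0(i) when i % 4 == 0
    for (unsigned long d = 0; d < 8; ++d)
      c[base + d] = a[base + d] + b[base + d];
  }
}
```

If this holds, the rolled loop over an aligned range is simply an element-wise add over a contiguous range of indices, which is the shape both vectorizers handle best.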
