[LLVMdev] loop vectorizer

Wed Oct 30 10:50:17 PDT 2013

----- Original Message -----
> 
> 
> I ran the BB vectorizer, since I assume this is the SLP vectorizer.

No. While the BB vectorizer does perform a form of SLP vectorization, there is a separate SLP vectorization pass that uses a different algorithm. To run it, pass -vectorize-slp to opt.

 -Hal

> 
> BBV: using target information
> BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
> BBV: found 2 instructions with candidate pairs
> BBV: found 0 pair connections.
> BBV: done!
> 
> However, this was run on the unrolled loop (I guess).
> 
> Here is the IR printed by 'opt':
> 
> entry:
>   %cmp9 = icmp ult i64 %start, %end
>   br i1 %cmp9, label %for.body, label %for.end
> 
> for.body:                                         ; preds = %entry, %for.body
>   %storemerge10 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
>   %div = lshr i64 %storemerge10, 2
>   %mul1 = shl i64 %div, 3
>   %rem = and i64 %storemerge10, 3
>   %add2 = or i64 %mul1, %rem
>   %0 = lshr i64 %storemerge10, 1
>   %add51 = shl i64 %0, 2
>   %mul6 = or i64 %rem, %add51
>   %add8 = or i64 %mul6, 4
>   %arrayidx = getelementptr inbounds float* %a, i64 %add2
>   %1 = load float* %arrayidx, align 4
>   %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
>   %2 = load float* %arrayidx9, align 4
>   %add10 = fadd float %1, %2
>   %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
>   store float %add10, float* %arrayidx11, align 4
>   %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
>   %3 = load float* %arrayidx12, align 4
>   %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
>   %4 = load float* %arrayidx13, align 4
>   %add14 = fadd float %3, %4
>   %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
>   store float %add14, float* %arrayidx15, align 4
>   %inc = add i64 %storemerge10, 1
>   %exitcond = icmp eq i64 %inc, %end
>   br i1 %exitcond, label %for.end, label %for.body
> 
> for.end:                                          ; preds = %for.body, %entry
>   ret void
> 
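To make the access pattern concrete, here is a minimal C++ sketch that restates the index arithmetic from the IR above. The helper names index_lo, index_hi, and block_is_contiguous are made up for illustration; only the shift/mask expressions are taken from %add2 and %add8.

```cpp
#include <cassert>
#include <cstdint>

// Restates %add2 from the quoted IR: ((i >> 2) << 3) | (i & 3).
inline std::uint64_t index_lo(std::uint64_t i) {
    return ((i >> 2) << 3) | (i & 3);
}

// Restates %add8 from the quoted IR: (((i >> 1) << 2) | (i & 3)) | 4.
inline std::uint64_t index_hi(std::uint64_t i) {
    return (((i >> 1) << 2) | (i & 3)) | 4;
}

// Hypothetical helper: for the four iterations starting at `base`
// (which must be a multiple of 4), checks that both index streams are
// consecutive and that index_hi is exactly 4 elements past index_lo.
inline bool block_is_contiguous(std::uint64_t base) {
    assert(base % 4 == 0);
    const std::uint64_t first = index_lo(base);
    for (std::uint64_t k = 0; k < 4; ++k) {
        if (index_lo(base + k) != first + k) return false;
        if (index_hi(base + k) != first + 4 + k) return false;
    }
    return true;
}
```

For i = 0..7 this gives 0, 1, 2, 3, 8, 9, 10, 11 for %add2 and 4, 5, 6, 7, 12, 13, 14, 15 for %add8, so each aligned group of four iterations touches eight contiguous floats.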
> 
> Are you saying that I should first unroll the loop by a given
> factor and then run SLP again? How would I do that for, say, a
> factor of 2?
> 
> Frank
> 
> 
> 
> On 30/10/13 13:28, Renato Golin wrote:
> 
> 
> 
> 
> On 30 October 2013 09:25, Nadav Rotem < nrotem at apple.com > wrote:
> 
> 
> The access pattern to arrays a and b is non-linear. Unrolled loops
> are usually handled by the SLP-vectorizer. Are ir0 and ir1
> consecutive for all values of i?
> 
> 
> Based on his list of values, it seems that the induction stride is
> linear within each block of 4 iterations, but it's not a clear
> relationship.
> 
> 
> As you say, it should be possible to spot that once the loop is
> unrolled, and get the SLP to vectorize if the relationship becomes
> clear.
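
One way to picture the unrolling discussed above is the hand-written C++ sketch below. It is an illustration only, not the output of any LLVM pass: bar_scalar restates the loop from the quoted IR, and bar_blocked is a hypothetical rewrite that assumes start and end are multiples of 4 and that c does not alias a or b.

```cpp
#include <cstdint>

// Scalar loop as it appears in the quoted IR; the two index
// expressions restate %add2 and %add8.
inline void bar_scalar(std::uint64_t start, std::uint64_t end,
                       const float* a, const float* b, float* c) {
    for (std::uint64_t i = start; i < end; ++i) {
        const std::uint64_t lo = ((i >> 2) << 3) | (i & 3);
        const std::uint64_t hi = (((i >> 1) << 2) | (i & 3)) | 4;
        c[lo] = a[lo] + b[lo];
        c[hi] = a[hi] + b[hi];
    }
}

// Hypothetical hand-unrolled version: four scalar iterations are
// folded into one pass over eight contiguous elements.
// Assumption: start and end are multiples of 4, and c does not
// overlap a or b.
inline void bar_blocked(std::uint64_t start, std::uint64_t end,
                        const float* a, const float* b, float* c) {
    for (std::uint64_t i = start; i < end; i += 4) {
        const std::uint64_t base = (i >> 2) << 3;  // 8 * (i / 4)
        for (std::uint64_t k = 0; k < 8; ++k) {
            c[base + k] = a[base + k] + b[base + k];
        }
    }
}
```

Written this way, the contiguity within each block of 4 iterations is explicit in the source. Whether a particular vectorizer recovers the same structure from the original loop is a separate question that this sketch does not answer.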
> 
> 
> Maybe I'm wrong, but this looks like a problem of missed
> opportunities, not technically hard to implement.
> 
> 
> --renato
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory