[LLVMdev] loop vectorizer

Wed Oct 30 09:25:09 PDT 2013

Hi Frank, 

The access pattern to arrays a and b is non-linear. Unrolled loops are usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all values of i?
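
One way to check both points is a small sketch like the one below. It copies the two index expressions from bar() with inner fixed at 4; the helper names ir0_step and has_constant_stride are made up for illustration and are not part of LLVM or of the original code.

```cpp
#include <cstdint>

// Index expressions copied from bar(), with inner fixed at 4.
inline std::uint64_t ir0(std::uint64_t i) { return ((i / 4) * 2 + 0) * 4 + i % 4; }
inline std::uint64_t ir1(std::uint64_t i) { return ((i / 4) * 2 + 1) * 4 + i % 4; }

// Hypothetical helper: step of ir0 between iteration i and iteration i + 1.
// A linear (affine) access pattern would give the same step for every i.
inline std::uint64_t ir0_step(std::uint64_t i) { return ir0(i + 1) - ir0(i); }

// Hypothetical helper: true when every step of ir0 inside [start, end)
// equals the first step, i.e. the access has a single constant stride.
inline bool has_constant_stride(std::uint64_t start, std::uint64_t end) {
  for (std::uint64_t i = start; i + 1 < end; ++i)
    if (ir0_step(i) != ir0_step(start)) return false;
  return true;
}
```

Within a group of four iterations the step is 1, but it jumps to 5 at each group boundary, so there is no single stride over the whole loop. Within one iteration ir1 is always ir0 + 4, so the two accesses are not adjacent either.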

Thanks,
Nadav

On Oct 30, 2013, at 9:05 AM, Frank Winter <fwinter at jlab.org> wrote:

> The loop vectorizer seems unable to vectorize the following code:
> 
> #include <cstdint>
> 
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
> {
>   const std::uint64_t inner = 4;
>   for (std::uint64_t i = start; i < end; ++i) {
>     // Each iteration updates one element in each of two adjacent blocks of `inner` elements.
>     const std::uint64_t ir0 = ((i / inner) * 2 + 0) * inner + i % inner;
>     const std::uint64_t ir1 = ((i / inner) * 2 + 1) * inner + i % inner;
>     c[ir0] = a[ir0] + b[ir0];
>     c[ir1] = a[ir1] + b[ir1];
>   }
> }
> 
> LV: Found a loop: for.body
> LV: Found an induction variable.
> LV: We need to do 0 pointer comparisons.
> LV: Checking memory dependencies
> LV: Bad stride - Not an AddRecExpr pointer   %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw>
> LV: Bad stride - Not an AddRecExpr pointer   %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw>
> LV: Src Scev: ((4 * %add2)<nsw> + %c)<nsw>Sink Scev: ((4 * %add8)<nsw> + %c)<nsw>(Induction step: 0)
> LV: Distance for   store float %add10, float* %arrayidx11, align 4 to   store float %add14, float* %arrayidx15, align 4: ((4 * %add8)<nsw> + (-4 * %add2))
> Non-consecutive pointer access
> LV: We don't need a runtime memory check.
> LV: Can't vectorize due to memory conflicts
> LV: Not vectorizing.
> 
> Here is the code (LLVM IR):
> 
> entry:
>  %cmp14 = icmp ult i64 %start, %end
>  br i1 %cmp14, label %for.body, label %for.end
> 
> for.body:                                         ; preds = %entry, %for.body
>  %i.015 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
>  %div = lshr i64 %i.015, 2
>  %mul = shl i64 %div, 3
>  %rem = and i64 %i.015, 3
>  %add2 = or i64 %mul, %rem
>  %add8 = or i64 %add2, 4
>  %arrayidx = getelementptr inbounds float* %a, i64 %add2
>  %0 = load float* %arrayidx, align 4
>  %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
>  %1 = load float* %arrayidx9, align 4
>  %add10 = fadd float %0, %1
>  %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
>  store float %add10, float* %arrayidx11, align 4
>  %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
>  %2 = load float* %arrayidx12, align 4
>  %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
>  %3 = load float* %arrayidx13, align 4
>  %add14 = fadd float %2, %3
>  %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
>  store float %add14, float* %arrayidx15, align 4
>  %inc = add i64 %i.015, 1
>  %exitcond = icmp eq i64 %inc, %end
>  br i1 %exitcond, label %for.end, label %for.body
> 
> for.end:                                          ; preds = %for.body, %entry
>  ret void
> 
> Why does it say "Bad stride"? Are the 'rem' and 'div' instructions asking too much of it?
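
On the 'rem' and 'div' question: in the IR above they no longer appear as such. The division is an lshr, the remainder an and, the multiply a shl, and both additions are or instructions. The sketch below mirrors that IR line by line and compares it with the source expressions. The function names are made up for illustration. If the two forms agree, the div and rem themselves are probably not the obstacle; the log instead reports that the pointer is "Not an AddRecExpr".

```cpp
#include <cstdint>

// Division/remainder form, as written in bar() with inner == 4.
inline std::uint64_t ir0_divrem(std::uint64_t i) { return ((i / 4) * 2 + 0) * 4 + i % 4; }
inline std::uint64_t ir1_divrem(std::uint64_t i) { return ((i / 4) * 2 + 1) * 4 + i % 4; }

// Shift/mask form mirroring the IR values %div, %mul, %rem and %add2.
inline std::uint64_t ir0_bits(std::uint64_t i) {
  const std::uint64_t div = i >> 2;    // %div  = lshr i64 %i.015, 2
  const std::uint64_t mul = div << 3;  // %mul  = shl i64 %div, 3
  const std::uint64_t rem = i & 3;     // %rem  = and i64 %i.015, 3
  return mul | rem;                    // %add2 = or i64 %mul, %rem
}

// %add8 = or i64 %add2, 4. Bit 2 of %add2 is always clear, so `or` acts as `add`.
inline std::uint64_t ir1_bits(std::uint64_t i) { return ir0_bits(i) | 4; }

// Hypothetical helper: true when both forms give the same indices on [start, end).
inline bool forms_agree(std::uint64_t start, std::uint64_t end) {
  for (std::uint64_t i = start; i < end; ++i)
    if (ir0_divrem(i) != ir0_bits(i) || ir1_divrem(i) != ir1_bits(i)) return false;
  return true;
}
```

The or is safe in both places because the low three bits of %mul are zero and %rem is at most 3.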
> 
> The code should be vectorizable. Here are the indices accessed for start=0, end=16:
> 
> loop count i = 0   index_0 = 0   index_1 = 4
> loop count i = 1   index_0 = 1   index_1 = 5
> loop count i = 2   index_0 = 2   index_1 = 6
> loop count i = 3   index_0 = 3   index_1 = 7
> loop count i = 4   index_0 = 8   index_1 = 12
> loop count i = 5   index_0 = 9   index_1 = 13
> loop count i = 6   index_0 = 10   index_1 = 14
> loop count i = 7   index_0 = 11   index_1 = 15
> loop count i = 8   index_0 = 16   index_1 = 20
> loop count i = 9   index_0 = 17   index_1 = 21
> loop count i = 10   index_0 = 18   index_1 = 22
> loop count i = 11   index_0 = 19   index_1 = 23
> loop count i = 12   index_0 = 24   index_1 = 28
> loop count i = 13   index_0 = 25   index_1 = 29
> loop count i = 14   index_0 = 26   index_1 = 30
> loop count i = 15   index_0 = 27   index_1 = 31
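
The table above suggests that each group of four iterations covers eight consecutive elements (0..7, 8..15, and so on). Under the assumption that start and end are both multiples of 4, the loop can then be written with a unit stride, which is the form the loop vectorizer normally expects. This is only a sketch: the names bar_reference and bar_linear are made up here, and the multiple-of-4 precondition is an assumption that the original post does not state.

```cpp
#include <cassert>
#include <cstdint>

// Original loop from the post, kept as the reference implementation.
void bar_reference(std::uint64_t start, std::uint64_t end,
                   float* __restrict__ c, float* __restrict__ a, float* __restrict__ b) {
  const std::uint64_t inner = 4;
  for (std::uint64_t i = start; i < end; ++i) {
    const std::uint64_t ir0 = ((i / inner) * 2 + 0) * inner + i % inner;
    const std::uint64_t ir1 = ((i / inner) * 2 + 1) * inner + i % inner;
    c[ir0] = a[ir0] + b[ir0];
    c[ir1] = a[ir1] + b[ir1];
  }
}

// Sketch of an equivalent unit-stride loop.
// ASSUMPTION: start and end are multiples of 4, so the iterations
// [start, end) touch exactly the elements [2 * start, 2 * end).
void bar_linear(std::uint64_t start, std::uint64_t end,
                float* __restrict__ c, float* __restrict__ a, float* __restrict__ b) {
  assert(start % 4 == 0 && end % 4 == 0);
  for (std::uint64_t j = 2 * start; j < 2 * end; ++j)
    c[j] = a[j] + b[j];
}
```

If start or end is not a multiple of 4, the first and last groups are only partly covered and this rewrite no longer matches the original loop.
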
> 
> Any ideas?
> 
> Frank
> 