[LLVMdev] Why is the loop vectorizer not working on my function?

Sat Oct 26 10:03:53 PDT 2013

Hi Frank,

> On Oct 26, 2013, at 10:03 AM, Frank Winter <fwinter at jlab.org> wrote:
> 
> My function implements a simple loop:
> 
> void bar( int start, int end, float* A, float* B, float* C)
> {
>    for (int i=start; i<end;++i)
>       A[i] = B[i] * C[i];
> }
> 
> This looks pretty much like the standard example. However, I built the function
> with the IRBuilder rather than generating it from C with clang. I also changed
> the function's signature slightly:
> 
> define void @bar([8 x i8]* %arg_ptr) {
> entrypoint:
>  %0 = bitcast [8 x i8]* %arg_ptr to i32*
>  %1 = load i32* %0
>  %2 = getelementptr [8 x i8]* %arg_ptr, i32 1
>  %3 = bitcast [8 x i8]* %2 to i32*
>  %4 = load i32* %3
>  %5 = getelementptr [8 x i8]* %arg_ptr, i32 2
>  %6 = bitcast [8 x i8]* %5 to float**
>  %7 = load float** %6
>  %8 = getelementptr [8 x i8]* %arg_ptr, i32 3
>  %9 = bitcast [8 x i8]* %8 to float**
>  %10 = load float** %9
>  %11 = getelementptr [8 x i8]* %arg_ptr, i32 4
>  %12 = bitcast [8 x i8]* %11 to float**
>  %13 = load float** %12
>  br label %L0
> 
> L0:                                               ; preds = %L0, %entrypoint
>  %14 = phi i32 [ %21, %L0 ], [ %1, %entrypoint ]
>  %15 = getelementptr float* %10, i32 %14
>  %16 = load float* %15
>  %17 = getelementptr float* %13, i32 %14
>  %18 = load float* %17
>  %19 = fmul float %18, %16
>  %20 = getelementptr float* %7, i32 %14
>  store float %19, float* %20
>  %21 = add i32 %14, 1
Try
%21 = add nsw i32 %14, 1
instead, so that the increment uses no-signed-wrap (nsw) arithmetic.
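
Why nsw can matter here, as a sketch. Assume a 64-bit target, where the i32 index %14 has to be sign-extended before it is used in the address computation. Without nsw, add i32 %14, 1 is allowed to wrap from INT32_MAX to INT32_MIN, so the vectorizer may not be able to prove that consecutive iterations touch consecutive floats. The C below emulates the wrapping i32 add; add_wrap_i32 and next_offset_is_consecutive are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 32-bit add that wraps like LLVM `add i32` without nsw.
   The unsigned add wraps modulo 2^32; the result is mapped back to int32_t
   without relying on implementation-defined conversions. */
static int32_t add_wrap_i32(int32_t x, int32_t y)
{
    uint32_t u = (uint32_t)x + (uint32_t)y;
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 2147483648u) - INT32_MAX - 1;
}

/* True if the next index, sign-extended to 64 bits (as an address offset),
   is exactly one element past the current one, i.e. accesses stay consecutive. */
static bool next_offset_is_consecutive(int32_t i)
{
    int64_t cur  = (int64_t)i;
    int64_t next = (int64_t)add_wrap_i32(i, 1);
    return next == cur + 1;
}
```

For example, next_offset_is_consecutive(INT32_MAX) is false, while it is true for ordinary values such as 0. With nsw, signed overflow of the add yields a poison value, so the optimizer may assume it never happens and treat the sign-extended index as a plain stride-1 sequence.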

If that does not work, please post the output of opt ... -debug-only=loop-vectorize ...

>  %22 = icmp sge i32 %21, %4
>  br i1 %22, label %L1, label %L0
> 
> L1:                                               ; preds = %L0
>  ret void
> }
> 
> 
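
For reference, a C sketch of what @bar computes once its argument block is unpacked. It assumes the layout the getelementptr indices imply (each index steps over one [8 x i8], so five 8-byte slots holding start, end, A, B and C) and a target where a float* fits in 8 bytes. bar_packed and SLOT are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: @bar reads slot k at byte offset k * 8 ([8 x i8] stride). */
enum { SLOT = 8 };

/* Sketch of @bar in C: unpack start, end, A, B, C, then run the original loop. */
void bar_packed(const unsigned char *arg)
{
    int32_t start, end;
    float *A, *B, *C;

    assert(sizeof(float *) <= SLOT);              /* a pointer must fit in one slot */
    memcpy(&start, arg + 0 * SLOT, sizeof start); /* %1  */
    memcpy(&end,   arg + 1 * SLOT, sizeof end);   /* %4  */
    memcpy(&A,     arg + 2 * SLOT, sizeof A);     /* %7  (stored to) */
    memcpy(&B,     arg + 3 * SLOT, sizeof B);     /* %10 */
    memcpy(&C,     arg + 4 * SLOT, sizeof C);     /* %13 */

    for (int32_t i = start; i < end; ++i)
        A[i] = B[i] * C[i];
}
```

One difference worth checking: the IR tests icmp sge only at the bottom of %L0, so it executes the body once even when start >= end, while this C loop (like the original bar) does not.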
> As you can see, I use a phi instruction for the loop index. I notice
> that clang prefers stack allocation instead. So I am not sure why the
> loop vectorizer is not working here.
> I tried many things, such as specifying an architecture with vector
> units and forcing the vector width, without success.
> 
> opt -march=x86-64 -loop-vectorize -force-vector-width=8 -S loop.ll
> 
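
To make -force-vector-width=8 concrete, a hand-written C approximation of the shape of code the loop vectorizer aims for: a runtime check that A does not overlap B or C (the pointers are loaded from memory, so nothing proves they are distinct), a body that handles 8 elements per iteration, and a scalar remainder loop. This is only an illustration under those assumptions; bar_vf8 and ranges_disjoint are hypothetical names, and the real pass works on IR and emits actual vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 8 }; /* corresponds to -force-vector-width=8 */

/* Hypothetical helper: true if the n-float ranges starting at p and q do not
   overlap. Compares addresses as integers, similar in spirit to a runtime check. */
static int ranges_disjoint(const float *p, const float *q, size_t n)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a + bytes <= b || b + bytes <= a;
}

/* Same result as bar(start, end, A, B, C), written in strip-mined form. */
void bar_vf8(int start, int end, float *A, const float *B, const float *C)
{
    if (start >= end)
        return;
    size_t n = (size_t)((int64_t)end - start);
    float *a = A + start;
    const float *b = B + start, *c = C + start;
    size_t i = 0;

    if (ranges_disjoint(a, b, n) && ranges_disjoint(a, c, n)) {
        /* "Vector" body: VF independent lanes per iteration. In this C version
           the lanes still run in order; the check marks where executing them
           simultaneously would be legal. */
        for (; i + VF <= n; i += VF)
            for (size_t lane = 0; lane < VF; ++lane)
                a[i + lane] = b[i + lane] * c[i + lane];
    }
    /* Scalar remainder, or the whole loop if the overlap check failed. */
    for (; i < n; ++i)
        a[i] = b[i] * c[i];
}
```

With 17 elements, for example, the body runs twice (16 elements) and the remainder loop handles the last one.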
> The only explanation I have is the use of the phi instruction. Is this
> preventing the loop from being vectorized?
> 
> Frank
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev