[LLVMdev] Why is the loop vectorizer not working on my function?
Arnold
aschwaighofer at apple.com
Sat Oct 26 10:03:53 PDT 2013
Hi Frank,
> On Oct 26, 2013, at 10:03 AM, Frank Winter <fwinter at jlab.org> wrote:
>
> My function implements a simple loop:
>
> void bar( int start, int end, float* A, float* B, float* C)
> {
> for (int i=start; i<end;++i)
> A[i] = B[i] * C[i];
> }
>
> This looks pretty much like the standard example. However, I built the function
> with the IRBuilder, so it does not come from C and clang. I also changed
> the function's signature slightly:
>
> define void @bar([8 x i8]* %arg_ptr) {
> entrypoint:
> %0 = bitcast [8 x i8]* %arg_ptr to i32*
> %1 = load i32* %0
> %2 = getelementptr [8 x i8]* %arg_ptr, i32 1
> %3 = bitcast [8 x i8]* %2 to i32*
> %4 = load i32* %3
> %5 = getelementptr [8 x i8]* %arg_ptr, i32 2
> %6 = bitcast [8 x i8]* %5 to float**
> %7 = load float** %6
> %8 = getelementptr [8 x i8]* %arg_ptr, i32 3
> %9 = bitcast [8 x i8]* %8 to float**
> %10 = load float** %9
> %11 = getelementptr [8 x i8]* %arg_ptr, i32 4
> %12 = bitcast [8 x i8]* %11 to float**
> %13 = load float** %12
> br label %L0
>
> L0: ; preds = %L0, %entrypoint
> %14 = phi i32 [ %21, %L0 ], [ %1, %entrypoint ]
> %15 = getelementptr float* %10, i32 %14
> %16 = load float* %15
> %17 = getelementptr float* %13, i32 %14
> %18 = load float* %17
> %19 = fmul float %18, %16
> %20 = getelementptr float* %7, i32 %14
> store float %19, float* %20
> %21 = add i32 %14, 1
Try
%21 = add nsw i32 %14, 1
instead, so that the increment is marked as having no signed wrap.
If that does not work, please post the output of opt ... -debug-only=loop-vectorize ...
> %22 = icmp sge i32 %21, %4
> br i1 %22, label %L1, label %L0
>
> L1: ; preds = %L0
> ret void
> }
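
To illustrate the nsw suggestion above: a plain `add i32` wraps modulo 2^32 on overflow, whereas with `nsw` a signed overflow yields a poison value, so the optimizer may assume it does not happen. The C sketch below is only a model of the wrapping case, not anything taken from LLVM itself. The helper names `wrapping_add_i32` and `count_iterations` and the `limit` parameter are hypothetical, introduced just for this illustration. It mimics the loop shape in the IR (body first, then the `i + 1 >= end` exit test) and shows that, without a no-wrap guarantee, a start value of INT32_MAX makes the index wrap to INT32_MIN and keep going.

```c
#include <stdint.h>
#include <string.h>

/* Models a plain LLVM `add i32` (no nsw): the result wraps modulo 2^32.
   The sum is computed in unsigned arithmetic because signed overflow is
   undefined behavior in C. */
static int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    memcpy(&result, &sum, sizeof result); /* reinterpret the bits as signed */
    return result;
}

/* Hypothetical helper: counts iterations of a loop shaped like the IR
   (the body runs first, then the loop exits once i + 1 >= end).
   It gives up after `limit` iterations so that the wrapping case
   terminates quickly. */
static long count_iterations(int32_t start, int32_t end, long limit)
{
    int32_t i = start;
    long n = 0;
    for (;;) {
        ++n;                          /* stands in for A[i] = B[i] * C[i] */
        i = wrapping_add_i32(i, 1);   /* %21 = add i32 %14, 1 */
        if (i >= end || n >= limit)   /* icmp sge, plus the safety limit */
            break;
    }
    return n;
}
```

With these definitions, `count_iterations(0, 8, 1000)` returns 8, while `count_iterations(INT32_MAX, 0, 1000)` runs into the limit of 1000 because the index wraps to INT32_MIN after the first iteration. Marking the add as `nsw` tells the optimizer it need not account for that second case, which may make it easier for the analysis to work out a trip count.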
>
>
> As you can see, I use the phi instruction for the loop index. I notice
> that clang prefers stack allocation. So I am not sure why the loop
> vectorizer is not working here.
> I tried many things, like specifying an architecture with vector
> units, enforcing the vector width. No success.
>
> opt -march=x64-64 -loop-vectorize -force-vector-width=8 -S loop.ll
>
> The only explanation I have is the use of the phi instruction. Is this
> preventing the loop from being vectorized?
>
> Frank
>
>