[LLVMdev] Why is the loop vectorizer not working on my function?
Frank Winter
fwinter at jlab.org
Sat Oct 26 08:03:29 PDT 2013
My function implements a simple loop:
void bar(int start, int end, float* A, float* B, float* C)
{
    for (int i = start; i < end; ++i)
        A[i] = B[i] * C[i];
}
This looks pretty much like the standard example. However, I built the
function with the IRBuilder, so it does not come from C via clang. I
also changed the function's signature slightly:
define void @bar([8 x i8]* %arg_ptr) {
entrypoint:
  %0 = bitcast [8 x i8]* %arg_ptr to i32*
  %1 = load i32* %0
  %2 = getelementptr [8 x i8]* %arg_ptr, i32 1
  %3 = bitcast [8 x i8]* %2 to i32*
  %4 = load i32* %3
  %5 = getelementptr [8 x i8]* %arg_ptr, i32 2
  %6 = bitcast [8 x i8]* %5 to float**
  %7 = load float** %6
  %8 = getelementptr [8 x i8]* %arg_ptr, i32 3
  %9 = bitcast [8 x i8]* %8 to float**
  %10 = load float** %9
  %11 = getelementptr [8 x i8]* %arg_ptr, i32 4
  %12 = bitcast [8 x i8]* %11 to float**
  %13 = load float** %12
  br label %L0

L0:                                               ; preds = %L0, %entrypoint
  %14 = phi i32 [ %21, %L0 ], [ %1, %entrypoint ]
  %15 = getelementptr float* %10, i32 %14
  %16 = load float* %15
  %17 = getelementptr float* %13, i32 %14
  %18 = load float* %17
  %19 = fmul float %18, %16
  %20 = getelementptr float* %7, i32 %14
  store float %19, float* %20
  %21 = add i32 %14, 1
  %22 = icmp sge i32 %21, %4
  br i1 %22, label %L1, label %L0

L1:                                               ; preds = %L0
  ret void
}
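For readers who prefer C, the IR above corresponds roughly to the sketch
below. The names slot_t and bar_packed are hypothetical and only serve the
illustration, and the sketch assumes a pointer fits into one 8-byte slot.
Note that the IR tests the exit condition after the loop body, so unlike
the C loop at the top, the body runs once even when start >= end.

```c
#include <string.h>

/* Hypothetical name: one 8-byte argument slot, mirroring [8 x i8]. */
typedef unsigned char slot_t[8];

/* Rough C rendering of @bar: every argument is unpacked from its own slot.
   Assumes sizeof(float *) <= 8. */
void bar_packed(slot_t *arg_ptr)
{
    int start, end;
    float *A, *B, *C;

    memcpy(&start, arg_ptr[0], sizeof start);  /* %1  */
    memcpy(&end,   arg_ptr[1], sizeof end);    /* %4  */
    memcpy(&A,     arg_ptr[2], sizeof A);      /* %7  */
    memcpy(&B,     arg_ptr[3], sizeof B);      /* %10 */
    memcpy(&C,     arg_ptr[4], sizeof C);      /* %13 */

    /* The phi node %14 plays the role of i; the exit test follows the body. */
    int i = start;
    do {
        A[i] = C[i] * B[i];
        ++i;
    } while (i < end);
}
```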
As you can see, I use a phi instruction for the loop index, whereas I
notice that clang prefers stack allocation. I am not sure why the loop
vectorizer is not working here.
I tried many things, such as specifying an architecture with vector
units and forcing the vector width, without success:
opt -march=x64-64 -loop-vectorize -force-vector-width=8 -S loop.ll
The only explanation I have is the use of the phi instruction. Does it
prevent the loop from being vectorized?
Frank