[LLVMdev] Replacing a repetitive sequence of code with a loop

Wed Jun 3 10:37:24 PDT 2015

Hey guys, in an HPC project I am working on I am given an LLVM program 
consisting of a linear sequence of repetitive chunks of code with a 
uniform memory access pattern. Each code chunk does the following: 1) 
loads some memory, 2) performs some arithmetic operations, 3) stores the 
result back to memory. The memory stride between consecutive chunks is 
constant over the whole program, so the whole program could be 
replaced with a single loop whose body contains a generic version of 
said code chunk. Here's an example (a short one; the real-world program 
would be much longer):

define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1, 
float* noalias %arg2) {
entrypoint:
   %0 = bitcast float* %arg1 to <4 x float>*
   %1 = bitcast float* %arg2 to <4 x float>*
   %2 = load <4 x float>* %0, align 16
   %3 = load <4 x float>* %1, align 16
   %4 = fadd <4 x float> %3, %2
   %5 = bitcast float* %arg0 to <4 x float>*
   store <4 x float> %4, <4 x float>* %5, align 16
   %6 = getelementptr float* %arg1, i64 4
   %7 = getelementptr float* %arg2, i64 4
   %8 = getelementptr float* %arg0, i64 4
   %9 = bitcast float* %6 to <4 x float>*
   %10 = bitcast float* %7 to <4 x float>*
   %11 = load <4 x float>* %9, align 16
   %12 = load <4 x float>* %10, align 16
   %13 = fadd <4 x float> %12, %11
   %14 = bitcast float* %8 to <4 x float>*
   store <4 x float> %13, <4 x float>* %14, align 16
   %15 = getelementptr float* %arg1, i64 8
   %16 = getelementptr float* %arg2, i64 8
   %17 = getelementptr float* %arg0, i64 8
   %18 = bitcast float* %15 to <4 x float>*
   %19 = bitcast float* %16 to <4 x float>*
   %20 = load <4 x float>* %18, align 16
   %21 = load <4 x float>* %19, align 16
   %22 = fadd <4 x float> %21, %20
   %23 = bitcast float* %17 to <4 x float>*
   store <4 x float> %22, <4 x float>* %23, align 16
   %24 = getelementptr float* %arg1, i64 12
   %25 = getelementptr float* %arg2, i64 12
   %26 = getelementptr float* %arg0, i64 12
   %27 = bitcast float* %24 to <4 x float>*
   %28 = bitcast float* %25 to <4 x float>*
   %29 = load <4 x float>* %27, align 16
   %30 = load <4 x float>* %28, align 16
   %31 = fadd <4 x float> %30, %29
   %32 = bitcast float* %26 to <4 x float>*
   store <4 x float> %31, <4 x float>* %32, align 16
   ret void
}
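
As a rough sketch of the rolled form, written in C rather than IR for 
brevity: the function name vec_plus_vec_rolled and the n_chunks 
parameter are illustrative assumptions (n_chunks would be 4 for the 
listing above), and the inner loop only stands in for one 
<4 x float> fadd.

```c
#include <stddef.h>

/* Hypothetical rolled equivalent of @vec_plus_vec.
   out, a, b correspond to %arg0, %arg1, %arg2; n_chunks is assumed. */
void vec_plus_vec_rolled(float *restrict out, const float *restrict a,
                         const float *restrict b, size_t n_chunks)
{
    for (size_t chunk = 0; chunk < n_chunks; ++chunk) {
        size_t base = chunk * 4;  /* constant stride of 4 floats */
        /* One <4 x float> load/fadd/store, spelled out lane by lane. */
        for (size_t lane = 0; lane < 4; ++lane)
            out[base + lane] = b[base + lane] + a[base + lane];
    }
}
```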

The stride between two consecutive chunks is 4*sizeof(float), so the 
four chunks could be condensed into a single loop. (The scenario with a 
constant stride between repetitive code chunks is the first, simplest 
version. Later I will have to deal with arbitrary strides between 
chunks, which I guess requires an index array stored in the code as 
static data.)
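
For the arbitrary-stride case, a minimal C sketch of what the index 
array might look like. Everything here is an assumption for 
illustration: the function name vec_plus_vec_indexed, the offsets 
table (one starting float index per chunk, which would live in static 
data), and the n_chunks parameter.

```c
#include <stddef.h>

/* Hypothetical variant where each chunk's base index comes from a table
   instead of being computed from a constant stride. */
void vec_plus_vec_indexed(float *restrict out, const float *restrict a,
                          const float *restrict b,
                          const size_t *offsets, size_t n_chunks)
{
    for (size_t chunk = 0; chunk < n_chunks; ++chunk) {
        size_t base = offsets[chunk];  /* per-chunk start, in floats */
        /* Same 4-wide add as before, applied at the looked-up base. */
        for (size_t lane = 0; lane < 4; ++lane)
            out[base + lane] = b[base + lane] + a[base + lane];
    }
}
```

With offsets = {0, 4, 8, 12} this would reduce to the constant-stride 
case.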

For this simple scenario, which existing LLVM pass would be closest to 
what I am trying to achieve?

Thanks,
Frank