[LLVMdev] Replacing a repetitive sequence of code with a loop
Frank Winter
fwinter at jlab.org
Wed Jun 3 10:37:24 PDT 2015
Hey guys, in an HPC project I am working on I am given an LLVM program
consisting of a linear sequence of repetitive junks of code with an
uniform memory access pattern. Each code junk does the following: 1)
loads some memory, 2) performs some arithmetic operations, 3) stores the
result back to memory. The memory stride between consecutive junks is
constant over the whole program, thus the whole program could be
replaced with a single loop with it's loop body containing a generic
version of said code junk.Here's an example (a short one, the real world
program would be much longer):
define void @vec_plus_vec(float* noalias %arg0, float* noalias %arg1,
float* noalias %arg2) {
entrypoint:
%0 = bitcast float* %arg1 to <4 x float>*
%1 = bitcast float* %arg2 to <4 x float>*
%2 = load <4 x float>* %0, align 16
%3 = load <4 x float>* %1, align 16
%4 = fadd <4 x float> %3, %2
%5 = bitcast float* %arg0 to <4 x float>*
store <4 x float> %4, <4 x float>* %5, align 16
%6 = getelementptr float* %arg1, i64 4
%7 = getelementptr float* %arg2, i64 4
%8 = getelementptr float* %arg0, i64 4
%9 = bitcast float* %6 to <4 x float>*
%10 = bitcast float* %7 to <4 x float>*
%11 = load <4 x float>* %9, align 16
%12 = load <4 x float>* %10, align 16
%13 = fadd <4 x float> %12, %11
%14 = bitcast float* %8 to <4 x float>*
store <4 x float> %13, <4 x float>* %14, align 16
%15 = getelementptr float* %arg1, i64 8
%16 = getelementptr float* %arg2, i64 8
%17 = getelementptr float* %arg0, i64 8
%18 = bitcast float* %15 to <4 x float>*
%19 = bitcast float* %16 to <4 x float>*
%20 = load <4 x float>* %18, align 16
%21 = load <4 x float>* %19, align 16
%22 = fadd <4 x float> %21, %20
%23 = bitcast float* %17 to <4 x float>*
store <4 x float> %22, <4 x float>* %23, align 16
%24 = getelementptr float* %arg1, i64 12
%25 = getelementptr float* %arg2, i64 12
%26 = getelementptr float* %arg0, i64 12
%27 = bitcast float* %24 to <4 x float>*
%28 = bitcast float* %25 to <4 x float>*
%29 = load <4 x float>* %27, align 16
%30 = load <4 x float>* %28, align 16
%31 = fadd <4 x float> %30, %29
%32 = bitcast float* %26 to <4 x float>*
store <4 x float> %31, <4 x float>* %32, align 16
ret void
}
The stride between two consecutive junks is 4*sizeof(float), thus could
be condensed into a single loop. (The scenario with the constant stride
between repetitive code junks is the first, most simple version. Later I
will have to deal with arbitrary strides between junks. Requires an
index array stored in the code as static data I guess)...
For this simple scenario what already existent LLVM pass would be
closest to what I am trying to achieve?
Thanks,
Frank
More information about the llvm-dev
mailing list