[LLVMdev] Modifications to SLP

Frank Winter fwinter at jlab.org
Tue Jul 7 11:31:19 PDT 2015


Hi all!

The current SLP vectorizer takes too long to vectorize my scalar code. 
I am talking about functions that consist of a single, huge basic block 
with O(10^6) instructions. Here's an example:

   %0 = getelementptr float* %arg1, i32 49
   %1 = load float* %0
   %2 = getelementptr float* %arg1, i32 4145
   %3 = load float* %2
   %4 = getelementptr float* %arg2, i32 49
   %5 = load float* %4
   %6 = getelementptr float* %arg2, i32 4145
   %7 = load float* %6
   %8 = fmul float %7, %1
   %9 = fmul float %5, %3
   %10 = fadd float %9, %8
   %11 = fmul float %7, %3
   %12 = fmul float %5, %1
   %13 = fsub float %12, %11
   %14 = getelementptr float* %arg3, i32 16
   %15 = load float* %14
   %16 = getelementptr float* %arg3, i32 4112
   %17 = load float* %16
   %18 = getelementptr float* %arg4, i32 0
   %19 = load float* %18
   %20 = getelementptr float* %arg4, i32 4096
   %21 = load float* %20
   %22 = fmul float %21, %15
   %23 = fmul float %19, %17
   %24 = fadd float %23, %22
   %25 = fmul float %21, %17
   %26 = fmul float %19, %15
   %27 = fsub float %26, %25
   %28 = fadd float %24, %10
   %29 = fadd float %27, %13
   %30 = getelementptr float* %arg0, i32 0
   store float %29, float* %30
   %31 = getelementptr float* %arg0, i32 4096
   store float %28, float* %31
... and so on ...
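In C++ terms, the fragment above computes one complex multiply-add. This is my reading of the IR, assuming a planar layout in which the real part of a value sits at index i and the imaginary part at index i + 4096 (that is what the offsets 49/4145, 16/4112 and 0/4096 suggest). The function `complex_mul_add` and its parameter list are hypothetical and only illustrate the computation:

```cpp
#include <cstddef>

// Assumption: complex values are stored planar, real part at index i and
// imaginary part at index i + kPlane. The IR offsets suggest kPlane = 4096.
constexpr std::size_t kPlane = 4096;

// Hypothetical helper: out[o] = a[oa] * b[ob] + c[oc] * d[od] (complex).
// The operations follow the same order as the scalar IR above.
void complex_mul_add(float* out, std::size_t o,
                     const float* a, std::size_t oa,
                     const float* b, std::size_t ob,
                     const float* c, std::size_t oc,
                     const float* d, std::size_t od) {
  const float a_re = a[oa], a_im = a[oa + kPlane];  // %1, %3
  const float b_re = b[ob], b_im = b[ob + kPlane];  // %5, %7
  const float c_re = c[oc], c_im = c[oc + kPlane];  // %15, %17
  const float d_re = d[od], d_im = d[od + kPlane];  // %19, %21

  const float ab_im = b_re * a_im + b_im * a_re;    // %10 = %9 + %8
  const float ab_re = b_re * a_re - b_im * a_im;    // %13 = %12 - %11
  const float cd_im = d_re * c_im + d_im * c_re;    // %24 = %23 + %22
  const float cd_re = d_re * c_re - d_im * c_im;    // %27 = %26 - %25

  out[o] = cd_re + ab_re;           // %29, stored to %arg0[0]
  out[o + kPlane] = cd_im + ab_im;  // %28, stored to %arg0[4096]
}
```

Called as `complex_mul_add(arg0, 0, arg1, 49, arg2, 49, arg3, 16, arg4, 0)` it mirrors instructions %0 to %31 and the two stores. With (1+2i)(3+4i) + (5+6i)(7+8i) as input, arg0[0] ends up as -18 and arg0[4096] as 92.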

The SLP vectorizer generates code like this:

   %219 = insertelement <4 x float> %218, float %185, i32 2
   %220 = insertelement <4 x float> %219, float %197, i32 3
   %221 = fmul <4 x float> %216, %220
   %222 = fadd <4 x float> %221, %212
   %223 = fmul <4 x float> %207, %220
...
   %234 = bitcast float* %165 to <4 x float>*
   store <4 x float> %233, <4 x float>* %234, align 4
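One of the starting points (seeds) of LLVM's SLP vectorizer is a chain of stores to consecutive addresses, which is what ends in the `store <4 x float>` above. As a rough illustration of that first step only, not of LLVM's actual implementation, here is a C++ sketch that groups stores by base pointer and looks for runs of consecutive element offsets. `StoreRef` and `findStoreBundles` are hypothetical names. The sketch ignores legality (intervening memory dependences, duplicate stores to one address), building the operand tree from each bundle, and the cost model, all of which the real pass has to handle:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a scalar store: the base pointer it
// writes through, its constant element offset, and its position in the block.
struct StoreRef {
  std::string base;     // e.g. "%arg0"
  std::int64_t offset;  // element index from the getelementptr, e.g. 4096
  int id;               // index of the store in the basic block
};

// Groups stores into bundles of `width` stores that write consecutive
// elements of the same base. No dependence or cost checks are done here.
std::vector<std::vector<int>> findStoreBundles(const std::vector<StoreRef>& stores,
                                               std::size_t width) {
  std::vector<std::vector<int>> bundles;
  if (width == 0) return bundles;

  // Bucket the stores by base pointer (std::map gives a deterministic order).
  std::map<std::string, std::vector<StoreRef>> byBase;
  for (const StoreRef& s : stores) byBase[s.base].push_back(s);

  for (auto& entry : byBase) {
    std::vector<StoreRef>& group = entry.second;
    // Sorting by offset puts candidates for a chain next to each other.
    std::sort(group.begin(), group.end(),
              [](const StoreRef& x, const StoreRef& y) { return x.offset < y.offset; });

    std::size_t i = 0;
    while (i + width <= group.size()) {
      bool consecutive = true;
      for (std::size_t k = 1; k < width; ++k) {
        if (group[i + k].offset != group[i].offset + static_cast<std::int64_t>(k)) {
          consecutive = false;
          break;
        }
      }
      if (consecutive) {
        std::vector<int> bundle;
        for (std::size_t k = 0; k < width; ++k) bundle.push_back(group[i + k].id);
        bundles.push_back(bundle);
        i += width;  // these stores are used; continue after the chain
      } else {
        ++i;  // slide the window by one store
      }
    }
  }
  return bundles;
}
```

In the example IR the real part goes to %arg0[0] and the imaginary part to %arg0[4096]. Assuming the elided code writes the next results to %arg0[1..3] and %arg0[4097..4099], the interleaved stores fall into two 4-wide chains, one for the real and one for the imaginary parts. Sorting each base's stores by offset keeps this step at O(n log n) instead of comparing every pair of stores, which matters at O(10^6) instructions; it says nothing about where the real pass actually spends its time.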


With the current SLP implementation, 99.5% of the compile time is spent 
in the SLP vectorizer, and I have the impression that this can be 
improved for my case. I believe the SLP vectorizer has far more 
capabilities than I need for these simple (but huge) functions. I was 
hoping that some of you might have an idea of which functionality could 
be removed from the SLP vectorizer so that it can still vectorize these 
simple functions...?

Thanks,
Frank



