[LLVMdev] Modifications to SLP

Tue Jul 7 11:31:19 PDT 2015

Hi all!

It takes the current SLP vectorizer too long to vectorize my scalar 
code. I am talking here about functions that have a single, huge basic 
block with O(10^6) instructions. Here's an example:

   %0 = getelementptr float* %arg1, i32 49
   %1 = load float* %0
   %2 = getelementptr float* %arg1, i32 4145
   %3 = load float* %2
   %4 = getelementptr float* %arg2, i32 49
   %5 = load float* %4
   %6 = getelementptr float* %arg2, i32 4145
   %7 = load float* %6
   %8 = fmul float %7, %1
   %9 = fmul float %5, %3
   %10 = fadd float %9, %8
   %11 = fmul float %7, %3
   %12 = fmul float %5, %1
   %13 = fsub float %12, %11
   %14 = getelementptr float* %arg3, i32 16
   %15 = load float* %14
   %16 = getelementptr float* %arg3, i32 4112
   %17 = load float* %16
   %18 = getelementptr float* %arg4, i32 0
   %19 = load float* %18
   %20 = getelementptr float* %arg4, i32 4096
   %21 = load float* %20
   %22 = fmul float %21, %15
   %23 = fmul float %19, %17
   %24 = fadd float %23, %22
   %25 = fmul float %21, %17
   %26 = fmul float %19, %15
   %27 = fsub float %26, %25
   %28 = fadd float %24, %10
   %29 = fadd float %27, %13
   %30 = getelementptr float* %arg0, i32 0
   store float %29, float* %30
   %31 = getelementptr float* %arg0, i32 4096
   store float %28, float* %31
... and so on ...

The SLP vectorizer would create some code like this:

   %219 = insertelement <4 x float> %218, float %185, i32 2
   %220 = insertelement <4 x float> %219, float %197, i32 3
   %221 = fmul <4 x float> %216, %220
   %222 = fadd <4 x float> %221, %212
   %223 = fmul <4 x float> %207, %220
..
   %234 = bitcast float* %165 to <4 x float>*
   store <4 x float> %233, <4 x float>* %234, align 4

With the current SLP implementation 99.5% of the time is spent in the 
SLP vectorizer and I have the impression that this can be improved for 
my case. I believe that the SLP vectorizer has far more capabilities 
than I would need for these simple (but huge) functions. And I was 
hoping that any of you have an idea how to remove functionality of the 
SLP vectorizer such that it still can vectorize those simple functions...?

Thanks,
Frank