[LLVMdev] loop vectorizer

Frank Winter fwinter at jlab.org
Wed Oct 30 18:10:32 PDT 2013


>> What needs to be done (on a high level) in order to have the auto vectorizer succeed on the test function as given earlier?
> Maybe you could rewrite the loop in a way that will expose contiguous memory accesses. Is this something you could do?
>

Hi Nadav,

the only option I see is to unroll the loop by hand. Since the array 
access is consecutive over 4 loop iterations, I gave it a try and 
unrolled the loop by a factor of 4, which gives the following array 
accesses:

loop iter 0:
index_0 = 0   index_1 = 4
index_0 = 1   index_1 = 5
index_0 = 2   index_1 = 6
index_0 = 3   index_1 = 7

loop iter 1:
index_0 = 8   index_1 = 12
index_0 = 9   index_1 = 13
index_0 = 10   index_1 = 14
index_0 = 11   index_1 = 15
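
The table above can be reproduced with a small helper. This is only a sketch: the names indices() and check_table() are made up for illustration, and the formula is the per-element form of the index computation used in the code in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns (index_0, index_1) for logical element i,
// using the same block/lane arithmetic as the loop body in this message.
std::pair<std::uint64_t, std::uint64_t> indices(std::uint64_t i) {
    const std::uint64_t inner = 4;
    const std::uint64_t block = i / inner;  // which group of 4 elements
    const std::uint64_t lane  = i % inner;  // position inside the group
    return { (block * 2 + 0) * inner + lane,
             (block * 2 + 1) * inner + lane };
}

// Checks the first and last rows listed for "loop iter 0" and "loop iter 1".
void check_table() {
    assert(indices(0).first == 0  && indices(0).second == 4);
    assert(indices(3).first == 3  && indices(3).second == 7);
    assert(indices(4).first == 8  && indices(4).second == 12);
    assert(indices(7).first == 11 && indices(7).second == 15);
}
```

Each group of 4 consecutive values of i touches two runs of 4 consecutive array elements, which is what the unrolling by 4 is meant to expose.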

For completeness, here is the code:

void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c,
         float * __restrict__ a, float * __restrict__ b)
{
   const std::uint64_t inner = 4;
   for (std::uint64_t i = start ; i < end ; i+=4 ) {
     {
       const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4;
       const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4;
       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
     }
     {
       const std::uint64_t ir0 = ( ((i+1)/inner) * 2 + 0 ) * inner + (i+1)%4;
       const std::uint64_t ir1 = ( ((i+1)/inner) * 2 + 1 ) * inner + (i+1)%4;
       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
     }
     {
       const std::uint64_t ir0 = ( ((i+2)/inner) * 2 + 0 ) * inner + (i+2)%4;
       const std::uint64_t ir1 = ( ((i+2)/inner) * 2 + 1 ) * inner + (i+2)%4;
       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
     }
     {
       const std::uint64_t ir0 = ( ((i+3)/inner) * 2 + 0 ) * inner + (i+3)%4;
       const std::uint64_t ir1 = ( ((i+3)/inner) * 2 + 1 ) * inner + (i+3)%4;
       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
     }
   }
}


This should be an ideal test case for the SLP vectorizer, right?
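
To spell out why it looks ideal: assuming start and end are both multiples of 4 (the code does not check this, so it is an assumption), one unrolled iteration touches the 8 consecutive elements 2*i .. 2*i+7, and the whole loop is equivalent to a unit-stride loop over [2*start, 2*end). The sketch below checks that equivalence. The names bar_reference, bar_contiguous and rewrite_matches are hypothetical, and __restrict__ is left out to keep the example standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-element form of the index computation used in this message.
void bar_reference(std::uint64_t start, std::uint64_t end,
                   float* c, const float* a, const float* b) {
    const std::uint64_t inner = 4;
    for (std::uint64_t i = start; i < end; ++i) {
        const std::uint64_t ir0 = ((i / inner) * 2 + 0) * inner + i % inner;
        const std::uint64_t ir1 = ((i / inner) * 2 + 1) * inner + i % inner;
        c[ir0] = a[ir0] + b[ir0];
        c[ir1] = a[ir1] + b[ir1];
    }
}

// Hypothetical rewrite with plain contiguous accesses.
// Only valid under the assumption that start and end are multiples of 4.
void bar_contiguous(std::uint64_t start, std::uint64_t end,
                    float* c, const float* a, const float* b) {
    assert(start % 4 == 0 && end % 4 == 0);
    for (std::uint64_t j = 2 * start; j < 2 * end; ++j)
        c[j] = a[j] + b[j];
}

// Runs both versions on the same input and compares the outputs.
bool rewrite_matches(std::uint64_t start, std::uint64_t end) {
    const std::size_t n = static_cast<std::size_t>(2 * end);
    std::vector<float> a(n), b(n), c1(n, -1.0f), c2(n, -1.0f);
    for (std::size_t k = 0; k < n; ++k) {
        a[k] = static_cast<float>(k);
        b[k] = static_cast<float>(2 * k);
    }
    bar_reference(start, end, c1.data(), a.data(), b.data());
    bar_contiguous(start, end, c2.data(), a.data(), b.data());
    return c1 == c2;
}
```

If the multiple-of-4 assumption does not hold for the real callers, the contiguous form is not a drop-in replacement.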

It seems I am out of luck:

opt -O3 -vectorize-slp -debug loop.ll -S

SLP: Analyzing blocks in _Z3barmmPfS_S_.
SLP: Found 8 stores to vectorize.
SLP: Analyzing a store chain of length 8.
SLP: Trying to vectorize starting at PHIs (1)
SLP: Vectorizing a list of length = 2.
SLP: Vectorizing a list of length = 2.
SLP: Vectorizing a list of length = 2.

But the resulting IR does not show any vector instructions. Maybe it's 
me; I have never gotten the SLP vectorizer to do anything useful. Any 
idea what might be going wrong?

I also tried the loop vectorizer:

opt -O3 -loop-vectorize -debug-only=loop-vectorize -debug loop.ll -S

LV: Checking a loop in "_Z3barmmPfS_S_"
LV: Found a loop: for.body
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing.

Hm.. This was better before unrolling the loop by hand. At least SCEV 
could find the loop exit count then. Any idea why it can't find it now?
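
One possible explanation, offered as an assumption and not something the debug output confirms: with an unsigned counter and a step of 4, i += 4 can wrap around before i < end becomes false when end is within 3 of the maximum value, so the loop has no simple closed-form trip count. With a step of 1 that cannot happen. The sketch below uses an 8-bit counter to make the wraparound easy to observe; count_iterations and its cap parameter are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Counts iterations of `for (i = start; i < end; i += step)` using an
// 8-bit unsigned counter. Stops and returns `cap` if the loop has not
// terminated after `cap` iterations.
unsigned count_iterations(std::uint8_t start, std::uint8_t end,
                          std::uint8_t step, unsigned cap) {
    unsigned n = 0;
    for (std::uint8_t i = start; i < end;
         i = static_cast<std::uint8_t>(i + step)) {  // wraps modulo 256
        if (++n == cap) break;
    }
    return n;
}
```

For example, count_iterations(0, 248, 4, 1000) terminates normally, while count_iterations(0, 254, 4, 1000) hits the cap because the counter wraps from 252 back to 0 and never reaches 254.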

Frank



