[LLVMdev] loop vectorizer

Frank Winter fwinter at jlab.org
Wed Oct 30 18:40:47 PDT 2013


I tried the following on the hand-unrolled loop:

       const std::uint64_t ir0 = i*8+0; // working

       const std::uint64_t ir0 = i%4+0; // working

       const std::uint64_t ir0 = (i+0)%4;  // not working

'+0' becomes +1, +2, +3 in the other unrolled blocks.

'Working' means the SLP vectorizer succeeded.

So as I work 'towards' the correct index function, auto-vectorization 
fails. However, using a simpler index function is not an option for me.

Is it possible to make the SCEV pass smarter? Or would you strongly 
advise against such an endeavor?
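
To make concrete what the analysis would have to prove, here is a minimal sketch comparing the index function from the quoted code with a closed form. The helper names (original_index, simplified_index, indices_agree) are hypothetical and not part of the original code. The closed form holds only under the assumption that i is a multiple of 4, which the loop guarantees only if 'start' is a multiple of 4. Nothing in the signature of bar() states that.

```cpp
#include <cstdint>

// Index function as written in the quoted code, for unrolled block k (0..3)
// and half h (0 selects ir0, 1 selects ir1).
constexpr std::uint64_t original_index(std::uint64_t i, std::uint64_t k,
                                       std::uint64_t h) {
    const std::uint64_t inner = 4;
    return (((i + k) / inner) * 2 + h) * inner + (i + k) % inner;
}

// Hypothetical closed form. ASSUMPTION: i % 4 == 0 and k < 4, so that
// (i + k) / 4 == i / 4 and (i + k) % 4 == k.
constexpr std::uint64_t simplified_index(std::uint64_t i, std::uint64_t k,
                                         std::uint64_t h) {
    return 2 * i + 4 * h + k;
}

// Returns true if both forms agree for every multiple of 4 below limit.
bool indices_agree(std::uint64_t limit) {
    for (std::uint64_t i = 0; i < limit; i += 4)
        for (std::uint64_t k = 0; k < 4; ++k)
            for (std::uint64_t h = 0; h < 2; ++h)
                if (original_index(i, k, h) != simplified_index(i, k, h))
                    return false;
    return true;
}
```

With h fixed, the four values of k give consecutive indices, which is the property the vectorizer needs. The check says nothing about whether the SLP vectorizer accepts the closed form; it only shows that the two expressions agree when i is a multiple of 4.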

Frank


On 30/10/13 21:16, Nadav Rotem wrote:
>
> On Oct 30, 2013, at 6:10 PM, Frank Winter <fwinter at jlab.org> wrote:
>
>> the only option I see is to unroll the loop by hand. Since the array 
>> access is consecutive over 4 loop iterations I gave it a try and 
>> unrolled the loop by a factor of 4.  Which gives the following array 
>> accesses:
>>
>> loop iter 0:
>> index_0 = 0   index_1 = 4
>> index_0 = 1   index_1 = 5
>> index_0 = 2   index_1 = 6
>> index_0 = 3   index_1 = 7
>>
>> loop iter 1:
>> index_0 = 8   index_1 = 12
>> index_0 = 9   index_1 = 13
>> index_0 = 10   index_1 = 14
>> index_0 = 11   index_1 = 15
>
> The SLP-vectorizer detects 8 stores, but it can’t prove that they are 
> consecutive, so it moves on.  Can you simplify the address expression?
> Can you write "index0 = i*8 + 0" and give it a try?
>
>>
>> For completeness, here is the code:
>>
>> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
>> {
>>  const std::uint64_t inner = 4;
>>  for (std::uint64_t i = start ; i < end ; i+=4 ) {
>>    {
>>      const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4;
>>      const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4;
>>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>>    }
>>    {
>>      const std::uint64_t ir0 = ( ((i+1)/inner) * 2 + 0 ) * inner + (i+1)%4;
>>      const std::uint64_t ir1 = ( ((i+1)/inner) * 2 + 1 ) * inner + (i+1)%4;
>>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>>    }
>>    {
>>      const std::uint64_t ir0 = ( ((i+2)/inner) * 2 + 0 ) * inner + (i+2)%4;
>>      const std::uint64_t ir1 = ( ((i+2)/inner) * 2 + 1 ) * inner + (i+2)%4;
>>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>>    }
>>    {
>>      const std::uint64_t ir0 = ( ((i+3)/inner) * 2 + 0 ) * inner + (i+3)%4;
>>      const std::uint64_t ir1 = ( ((i+3)/inner) * 2 + 1 ) * inner + (i+3)%4;
>>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>>    }
>>  }
>> }
>>
>>
>> This should be an ideal test case for the SLP vectorizer, right?
>>
>> It seems, I am out of luck:
>>
>> opt -O3 -vectorize-slp -debug loop.ll -S
>>
>> SLP: Analyzing blocks in _Z3barmmPfS_S_.
>> SLP: Found 8 stores to vectorize.
>> SLP: Analyzing a store chain of length 8.
>> SLP: Trying to vectorize starting at PHIs (1)
>> SLP: Vectorizing a list of length = 2.
>> SLP: Vectorizing a list of length = 2.
>> SLP: Vectorizing a list of length = 2.
>

