[LLVMdev] loop vectorizer

Nadav Rotem nrotem at apple.com
Wed Oct 30 18:16:39 PDT 2013


On Oct 30, 2013, at 6:10 PM, Frank Winter <fwinter at jlab.org> wrote:

> The only option I see is to unroll the loop by hand. Since the array accesses are consecutive over 4 loop iterations, I gave it a try and unrolled the loop by a factor of 4, which gives the following array accesses:
> 
> loop iter 0:
> index_0 = 0   index_1 = 4
> index_0 = 1   index_1 = 5
> index_0 = 2   index_1 = 6
> index_0 = 3   index_1 = 7
> 
> loop iter 1:
> index_0 = 8   index_1 = 12
> index_0 = 9   index_1 = 13
> index_0 = 10   index_1 = 14
> index_0 = 11   index_1 = 15

The SLP-vectorizer detects 8 stores, but it can’t prove that they are consecutive, so it moves on. Can you simplify the address expression? Can you write "index0 = i*8 + 0" and give it a try?
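
A minimal sketch of what such a simplification could look like, under one assumption that is not stated in the quoted code: start is a multiple of 4. In that case (i+k)/4 == i/4 and (i+k)%4 == k for k = 0..3, so the eight indices of one unrolled iteration collapse to base + 0 .. base + 7 with base = i*2 (equivalently j*8 if j = i/4 counts blocks, which is how "i*8 + 0" reads when i is a block counter). The names ir_original and bar_simplified are hypothetical and only serve the illustration. Whether the SLP vectorizer then proves the stores consecutive has not been checked here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index expression as written in the quoted code, for lane k (0..3) of one
// unrolled iteration; half selects ir0 (0) or ir1 (1).
static std::uint64_t ir_original(std::uint64_t i, std::uint64_t k, std::uint64_t half) {
    const std::uint64_t inner = 4;
    return (((i + k) / inner) * 2 + half) * inner + (i + k) % inner;
}

// Hypothetical simplified kernel. ASSUMPTION: start % 4 == 0, so the div/mod
// terms fold away and each iteration touches 8 consecutive elements.
void bar_simplified(std::uint64_t start, std::uint64_t end,
                    float* __restrict__ c, float* __restrict__ a, float* __restrict__ b) {
    assert(start % 4 == 0 && "simplification only holds for aligned start");
    for (std::uint64_t i = start; i < end; i += 4) {
        const std::uint64_t base = i * 2;  // same as (i / 4) * 8 when i % 4 == 0
        c[base + 0] = a[base + 0] + b[base + 0];
        c[base + 1] = a[base + 1] + b[base + 1];
        c[base + 2] = a[base + 2] + b[base + 2];
        c[base + 3] = a[base + 3] + b[base + 3];
        c[base + 4] = a[base + 4] + b[base + 4];
        c[base + 5] = a[base + 5] + b[base + 5];
        c[base + 6] = a[base + 6] + b[base + 6];
        c[base + 7] = a[base + 7] + b[base + 7];
    }
}
```

The first four statements correspond to the ir0 accesses of the four unrolled blocks and the last four to the ir1 accesses. The set of elements written per iteration is the same as in the quoted code, only the statement order differs, which is harmless here because the pointers are declared __restrict__.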

> 
> For completeness, here is the code:
> 
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__  c, float * __restrict__ a, float * __restrict__ b)
> {
>  const std::uint64_t inner = 4;
>  for (std::uint64_t i = start ; i < end ; i+=4 ) {
>    {
>      const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4;
>      const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4;
>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>    }
>    {
>      const std::uint64_t ir0 = ( ((i+1)/inner) * 2 + 0 ) * inner + (i+1)%4;
>      const std::uint64_t ir1 = ( ((i+1)/inner) * 2 + 1 ) * inner + (i+1)%4;
>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>    }
>    {
>      const std::uint64_t ir0 = ( ((i+2)/inner) * 2 + 0 ) * inner + (i+2)%4;
>      const std::uint64_t ir1 = ( ((i+2)/inner) * 2 + 1 ) * inner + (i+2)%4;
>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>    }
>    {
>      const std::uint64_t ir0 = ( ((i+3)/inner) * 2 + 0 ) * inner + (i+3)%4;
>      const std::uint64_t ir1 = ( ((i+3)/inner) * 2 + 1 ) * inner + (i+3)%4;
>      c[ ir0 ]         = a[ ir0 ]         + b[ ir0 ];
>      c[ ir1 ]         = a[ ir1 ]         + b[ ir1 ];
>    }
>  }
> }
> 
> 
> This should be an ideal test case for the SLP vectorizer, right?
> 
> It seems I am out of luck:
> 
> opt -O3 -vectorize-slp -debug loop.ll -S
> 
> SLP: Analyzing blocks in _Z3barmmPfS_S_.
> SLP: Found 8 stores to vectorize.
> SLP: Analyzing a store chain of length 8.
> SLP: Trying to vectorize starting at PHIs (1)
> SLP: Vectorizing a list of length = 2.
> SLP: Vectorizing a list of length = 2.
> SLP: Vectorizing a list of length = 2.
