[LLVMdev] loop vectorizer
Nadav Rotem
nrotem at apple.com
Wed Oct 30 18:16:39 PDT 2013
On Oct 30, 2013, at 6:10 PM, Frank Winter <fwinter at jlab.org> wrote:
> the only option I see is to unroll the loop by hand. Since the array access is consecutive over 4 loop iterations, I gave it a try and unrolled the loop by a factor of 4, which gives the following array accesses:
>
> loop iter 0:
> index_0 = 0 index_1 = 4
> index_0 = 1 index_1 = 5
> index_0 = 2 index_1 = 6
> index_0 = 3 index_1 = 7
>
> loop iter 1:
> index_0 = 8 index_1 = 12
> index_0 = 9 index_1 = 13
> index_0 = 10 index_1 = 14
> index_0 = 11 index_1 = 15
The SLP-vectorizer detects 8 stores, but it can’t prove that they are consecutive, so it moves on. Can you simplify the address expression? Can you write "index0 = i*8 + 0" and give it a try?
>
> For completeness, here is the code:
>
> #include <cstdint>
>
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
> {
>   const std::uint64_t inner = 4;
>   for (std::uint64_t i = start ; i < end ; i+=4 ) {
>     {
>       const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4;
>       const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4;
>       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
>     }
>     {
>       const std::uint64_t ir0 = ( ((i+1)/inner) * 2 + 0 ) * inner + (i+1)%4;
>       const std::uint64_t ir1 = ( ((i+1)/inner) * 2 + 1 ) * inner + (i+1)%4;
>       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
>     }
>     {
>       const std::uint64_t ir0 = ( ((i+2)/inner) * 2 + 0 ) * inner + (i+2)%4;
>       const std::uint64_t ir1 = ( ((i+2)/inner) * 2 + 1 ) * inner + (i+2)%4;
>       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
>     }
>     {
>       const std::uint64_t ir0 = ( ((i+3)/inner) * 2 + 0 ) * inner + (i+3)%4;
>       const std::uint64_t ir1 = ( ((i+3)/inner) * 2 + 1 ) * inner + (i+3)%4;
>       c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>       c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
>     }
>   }
> }
>
>
> This should be an ideal test case for the SLP vectorizer, right?
>
> It seems I am out of luck:
>
> opt -O3 -vectorize-slp -debug loop.ll -S
>
> SLP: Analyzing blocks in _Z3barmmPfS_S_.
> SLP: Found 8 stores to vectorize.
> SLP: Analyzing a store chain of length 8.
> SLP: Trying to vectorize starting at PHIs (1)
> SLP: Vectorizing a list of length = 2.
> SLP: Vectorizing a list of length = 2.
> SLP: Vectorizing a list of length = 2.