[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing

Michael Kruse via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 25 05:34:54 PDT 2017


$ opt -O3 stencil.ll -pass-remarks=loop-vectorize -enable-load-pre=0

remark: <unknown>:0:0: vectorized loop (vectorization width: 16,
interleaved count: 1)

Michael

2017-10-24 20:37 GMT+02:00 hameeza ahmed <hahmed2305 at gmail.com>:
> Thank you.
>
> But how to resolve it, my work depends on it.
>
> On Tue, Oct 24, 2017 at 6:30 PM, Michael Kruse <llvmdev at meinersbur.de>
> wrote:
>>
>> Your problem is due to GVN partial reduction elimination (PRE) which
>> introduces a PHI node the current loop vectorizer cannot handle:
>>
>> opt -O3 stencil.ll -pass-remarks=loop-vectorize
>> -pass-remarks-missed=loop-vectorize
>> -pass-remarks-analysis=loop-vectorize
>> remark: <unknown>:0:0: loop not vectorized: value that could not be
>> identified as reduction is used outside the loop
>> remark: <unknown>:0:0: loop not vectorized
>>
>> The message is not entirely accurate. The PHI is not used outside of
>> the loop, but is in the loop header, therefore cannot be if-converted
>> and is not a reduction. Polly also had difficulties with this
>> construction as well.
>>
>> As presented at the dev meeting, VPlan will be able to insert a
>> shuffle instruction when encountering this situation.
>>
>>
>> Michael
>>
>>
>>
>>
>>
>> 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev
>> <llvm-dev at lists.llvm.org>:
>> > stencil.ll is attached here.
>> >
>> > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru>
>> > wrote:
>> >>
>> >>
>> >>
>> >> Hello,
>> >>
>> >> To me this is an issue in llvm loop vectorizer (if N is large enough to
>> >> prevent complete unrolling of j-loop).
>> >>
>> >> Woud you mind to share stencil.ll than I would say more definitely what
>> >> the issue is.
>> >>
>> >> Regards,
>> >> Serge.
>> >>
>> >>
>> >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>:
>> >>
>> >> Hello,
>> >>
>> >> The following stencil function that you wrote few months ago (for
>> >> computing 2D-5 point Jacobi stencil) is vectorizable.
>> >>
>> >> void stencil(int a[][N], int b[N])
>> >> {
>> >>   int i, j, k;
>> >>     for (k = 0; k < N; k++) {
>> >>         for (i = 1; i <= N-2; i++) {
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
>> >> a[i][j-1] + a[i][j+1]);
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  a[i][j] = b[j];
>> >>         }
>> >>     }
>> >> }
>> >>
>> >> but when i write the above code in main i am getting error
>> >>
>> >> opt  -S -O3  stencil.ll -o stencil_o3.ll
>> >> remark: <unknown>:0:0: loop not vectorized: value that could not be
>> >> identified as reduction is used outside the loop
>> >>
>> >> Why is that so?
>> >>
>> >> my code is follows:
>> >> int a[N][N],  b[N];
>> >> int main(void)
>> >> {
>> >>   int i, j, k;
>> >>     for (k = 0; k < N; k++) {
>> >>         for (i = 1; i <= N-2; i++) {
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
>> >> a[i][j-1] + a[i][j+1]);
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  a[i][j] = b[j];
>> >>         }
>> >>     }
>> >> }
>> >>
>> >>
>> >> Please help.
>> >>
>> >> Thank You.
>> >>
>> >>
>> >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru>
>> >> wrote:
>> >>
>> >> Hello,
>> >>
>> >> This is not equivalent rewrite: your original code definitely shouldn't
>> >> vectorize because it has backward cross-iteration (loop-carried)
>> >> dependency:
>> >> your value on iteration j+1 depend on value from iteration j you've
>> >> just
>> >> written. In case of vectorization you need to do load-operation-store
>> >> on
>> >> multiple consecutive values at once and it is impossible in this case.
>> >>
>> >> In your new code all values of a[] in right-hand side of the main loop
>> >> are
>> >> from previous k iteration (because on current k iteration you're
>> >> writing to
>> >> 'b'). So there is no way to vectorize loop in its original form, but
>> >> new
>> >> form is definitely vectorizable.
>> >>
>> >> I am second to recommend you filing a bug over 'restrict' behavior. And
>> >> you may in fact save some memory by making 'b' 1D array (this is not
>> >> equivalent rewrite once again though)
>> >>
>> >> // This function computes 2D-5 point Jacobi stencil
>> >> void stencil(int a[][N], int b[N])
>> >> {
>> >>   int i, j, k;
>> >>     for (k = 0; k < N; k++) {
>> >>         for (i = 1; i <= N-2; i++) {
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
>> >> a[i][j-1] + a[i][j+1]);
>> >>             for (j = 1; j <= N-2; j++)
>> >>                  a[i][j] = b[j];
>> >>         }
>> >>     }
>> >> }
>> >>
>> >> There is a way to vectorize your code even more efficient. This will
>> >> look
>> >> like more low-level, but probably closest to what you would expect from
>> >> vectorization.
>> >>
>> >> void stencil(int a[][N])
>> >> {
>> >>     int i, j, k;
>> >>     for (k = 0; k < 100; k++) {
>> >>          for (i = 1; i <= N-2; i++) {
>> >>             for (j = 1; j <= N-2; j+=16) {
>> >>                int b[16];     // This should become a register on KNL
>> >>                for (v = 0; v < 16; v++) {
>> >>                    b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v]
>> >> +
>> >> a[i][j-1+v] + a[i][j+1+v]);
>> >>                }
>> >>                for (v = 0; v < 16; v++) {  // This will be a single
>> >> store
>> >> operation
>> >>                    a[i][j+v] = b[v];
>> >>                }
>> >>            }
>> >>            // You should explicitly take care about the tail of j-loop
>> >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED    // This is not an
>> >> actual
>> >> name, just a designator
>> >>            for (;j <= N-2; j++) {
>> >>                a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
>> >> a[i][j-1] + a[i][j+1]);
>> >>            }
>> >> #else
>> >>           for (v = 0; v < 16-(N-2-j); v++) {  // This would become
>> >> masked
>> >> non-loop
>> >>               b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
>> >> a[i][j-1+v] + a[i][j+1+v]);
>> >>           }
>> >>           for (v = 0; v <  16-(N-2-j); v++) {  // This will be a single
>> >> masked store operation
>> >>               a[i][j+v] = b[v];
>> >>           }
>> >> #endif
>> >>     }
>> >> }
>> >>
>> >> Unfortunalely compiler cannot do this for you: this is not equivalent
>> >> transformation of original code. I am also not aware of any way to
>> >> express
>> >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't
>> >> work in
>> >> this case).
>> >>
>> >> Minor note: You're using 'int' for data, than multiply by 0.25 (divide
>> >> by
>> >> 4) and than write it back to 'int'. This will cost you 2 conversion
>> >> to/from
>> >> double while you may just place (...) / 4  which should be optimized to
>> >> simple sequecnce with shifts (not to single shift due to signedness,
>> >> but
>> >> still better than conversions with changes of element size 4->8->4 and
>> >> data
>> >> size INT->FP->INT).
>> >>
>> >> And by the way why do you divide by 4, not by 5 as number of points
>> >> suggest?
>> >>
>> >> Serge Preis
>> >>
>> >>
>> >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev"
>> >> <llvm-dev at lists.llvm.org>:
>> >> > further i modified the code to the following;
>> >> >
>> >> > #include <stdio.h>
>> >> > #define N 100351
>> >> >
>> >> > // This function computes 2D-5 point Jacobi stencil
>> >> > void stencil(int a[restrict][N], int b[restrict][N])
>> >> > {
>> >> >    int i, j, k;
>> >> >    for (k = 0; k < N; k++) {
>> >> >        for (i = 1; i <= N-2; i++)
>> >> >            for (j = 1; j <= N-2; j++)
>> >> >         b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
>> >> > +
>> >> > a[i][j+1]);
>> >> >
>> >> > for (i = 1; i <= N-2; i++)
>> >> > for (j = 1; j <= N-2; j++)
>> >> >  a[i][j] = b[i][j];
>> >> >
>> >> > }
>> >> > }
>> >> >
>> >> >    but still no vectorization in IR. Also, when I set vector width
>> >> > explicitly to 64, it gives the following error:
>> >> >
>> >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot
>> >> > be
>> >> > vectorized
>> >> > remark: <unknown>:0:0: loop not vectorized: value that could not be
>> >> > identified as reduction is used outside the loop
>> >> >
>> >> > I need serious help on this. Please guide me.
>> >> >
>> >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
>> >> > wrote:
>> >> >> Does it happen due to loop carried dependence? if yes what is the
>> >> >> solution to vectorize such codes?
>> >> >>
>> >> >> please reply. i m waiting.
>> >> >>
>> >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com>
>> >> >> wrote:
>> >> >>> I even tried polly but still my llvm IR does not contain vector
>> >> >>> instructions. i used the following command;
>> >> >>>
>> >> >>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm
>> >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll
>> >> >>>
>> >> >>> Please specify what is wrong with my code?
>> >> >>>
>> >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed
>> >> >>> <hahmed2305 at gmail.com>
>> >> >>> wrote:
>> >> >>>> Hello,
>> >> >>>>
>> >> >>>> I am trying to vectorize following stencil code;
>> >> >>>>
>> >> >>>> #include <stdio.h>
>> >> >>>> #define N 100351
>> >> >>>>
>> >> >>>> // This function computes 2D-5 point Jacobi stencil
>> >> >>>> void stencil(int a[restrict][N])
>> >> >>>> {
>> >> >>>>    int i, j, k;
>> >> >>>>    for (k = 0; k < 100; k++)
>> >> >>>>      {  for (i = 1; i <= N-2; i++)
>> >> >>>>          {  for (j = 1; j <= N-2; j++)
>> >> >>>>         { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
>> >> >>>> a[i][j-1] + a[i][j+1]);
>> >> >>>>               }
>> >> >>>> }
>> >> >>>> }}
>> >> >>>>
>> >> >>>> I have used the following commands
>> >> >>>>
>> >> >>>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm
>> >> >>>> -disable-llvm-optzns -o stencil.ll
>> >> >>>>
>> >> >>>> opt  -S -O3 stencil.ll -o stencil_o3.ll
>> >> >>>>
>> >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
>> >> >>>>
>> >> >>>> But the code is not vectorized. It still uses the scalar
>> >> >>>> instructions;
>> >> >>>>
>> >> >>>> Please correct me.
>> >> >>>>
>> >> >>>> Thank You
>> >> > ,
>> >> >
>> >> > _______________________________________________
>> >> > LLVM Developers mailing list
>> >> > llvm-dev at lists.llvm.org
>> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>
>


More information about the llvm-dev mailing list