[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing

hameeza ahmed via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 25 12:49:12 PDT 2017


Thank you.

your command is able to vectorize the code

On Wed, Oct 25, 2017 at 5:34 PM, Michael Kruse <llvmdev at meinersbur.de>
wrote:

> $ opt -O3 stencil.ll -pass-remarks=loop-vectorize -enable-load-pre=0
>
> remark: <unknown>:0:0: vectorized loop (vectorization width: 16,
> interleaved count: 1)
>
> Michael
>
> 2017-10-24 20:37 GMT+02:00 hameeza ahmed <hahmed2305 at gmail.com>:
> > Thank you.
> >
> > But how to resolve it, my work depends on it.
> >
> > On Tue, Oct 24, 2017 at 6:30 PM, Michael Kruse <llvmdev at meinersbur.de>
> > wrote:
> >>
> >> Your problem is due to GVN partial reduction elimination (PRE) which
> >> introduces a PHI node the current loop vectorizer cannot handle:
> >>
> >> opt -O3 stencil.ll -pass-remarks=loop-vectorize
> >> -pass-remarks-missed=loop-vectorize
> >> -pass-remarks-analysis=loop-vectorize
> >> remark: <unknown>:0:0: loop not vectorized: value that could not be
> >> identified as reduction is used outside the loop
> >> remark: <unknown>:0:0: loop not vectorized
> >>
> >> The message is not entirely accurate. The PHI is not used outside of
> >> the loop, but is in the loop header, therefore cannot be if-converted
> >> and is not a reduction. Polly also had difficulties with this
> >> construction as well.
> >>
> >> As presented at the dev meeting, VPlan will be able to insert a
> >> shuffle instruction when encountering this situation.
> >>
> >>
> >> Michael
> >>
> >>
> >>
> >>
> >>
> >> 2017-10-23 8:58 GMT+02:00 hameeza ahmed via llvm-dev
> >> <llvm-dev at lists.llvm.org>:
> >> > stencil.ll is attached here.
> >> >
> >> > On Mon, Oct 23, 2017 at 11:37 AM, Serge Preis <spreis at yandex-team.ru>
> >> > wrote:
> >> >>
> >> >>
> >> >>
> >> >> Hello,
> >> >>
> >> >> To me this is an issue in llvm loop vectorizer (if N is large enough
> to
> >> >> prevent complete unrolling of j-loop).
> >> >>
> >> >> Woud you mind to share stencil.ll than I would say more definitely
> what
> >> >> the issue is.
> >> >>
> >> >> Regards,
> >> >> Serge.
> >> >>
> >> >>
> >> >> 22.10.2017, 03:21, "hameeza ahmed" <hahmed2305 at gmail.com>:
> >> >>
> >> >> Hello,
> >> >>
> >> >> The following stencil function that you wrote few months ago (for
> >> >> computing 2D-5 point Jacobi stencil) is vectorizable.
> >> >>
> >> >> void stencil(int a[][N], int b[N])
> >> >> {
> >> >>   int i, j, k;
> >> >>     for (k = 0; k < N; k++) {
> >> >>         for (i = 1; i <= N-2; i++) {
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> >> >> a[i][j-1] + a[i][j+1]);
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  a[i][j] = b[j];
> >> >>         }
> >> >>     }
> >> >> }
> >> >>
> >> >> but when i write the above code in main i am getting error
> >> >>
> >> >> opt  -S -O3  stencil.ll -o stencil_o3.ll
> >> >> remark: <unknown>:0:0: loop not vectorized: value that could not be
> >> >> identified as reduction is used outside the loop
> >> >>
> >> >> Why is that so?
> >> >>
> >> >> my code is follows:
> >> >> int a[N][N],  b[N];
> >> >> int main(void)
> >> >> {
> >> >>   int i, j, k;
> >> >>     for (k = 0; k < N; k++) {
> >> >>         for (i = 1; i <= N-2; i++) {
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> >> >> a[i][j-1] + a[i][j+1]);
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  a[i][j] = b[j];
> >> >>         }
> >> >>     }
> >> >> }
> >> >>
> >> >>
> >> >> Please help.
> >> >>
> >> >> Thank You.
> >> >>
> >> >>
> >> >> On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru>
> >> >> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> This is not equivalent rewrite: your original code definitely
> shouldn't
> >> >> vectorize because it has backward cross-iteration (loop-carried)
> >> >> dependency:
> >> >> your value on iteration j+1 depend on value from iteration j you've
> >> >> just
> >> >> written. In case of vectorization you need to do load-operation-store
> >> >> on
> >> >> multiple consecutive values at once and it is impossible in this
> case.
> >> >>
> >> >> In your new code all values of a[] in right-hand side of the main
> loop
> >> >> are
> >> >> from previous k iteration (because on current k iteration you're
> >> >> writing to
> >> >> 'b'). So there is no way to vectorize loop in its original form, but
> >> >> new
> >> >> form is definitely vectorizable.
> >> >>
> >> >> I am second to recommend you filing a bug over 'restrict' behavior.
> And
> >> >> you may in fact save some memory by making 'b' 1D array (this is not
> >> >> equivalent rewrite once again though)
> >> >>
> >> >> // This function computes 2D-5 point Jacobi stencil
> >> >> void stencil(int a[][N], int b[N])
> >> >> {
> >> >>   int i, j, k;
> >> >>     for (k = 0; k < N; k++) {
> >> >>         for (i = 1; i <= N-2; i++) {
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> >> >> a[i][j-1] + a[i][j+1]);
> >> >>             for (j = 1; j <= N-2; j++)
> >> >>                  a[i][j] = b[j];
> >> >>         }
> >> >>     }
> >> >> }
> >> >>
> >> >> There is a way to vectorize your code even more efficient. This will
> >> >> look
> >> >> like more low-level, but probably closest to what you would expect
> from
> >> >> vectorization.
> >> >>
> >> >> void stencil(int a[][N])
> >> >> {
> >> >>     int i, j, k;
> >> >>     for (k = 0; k < 100; k++) {
> >> >>          for (i = 1; i <= N-2; i++) {
> >> >>             for (j = 1; j <= N-2; j+=16) {
> >> >>                int b[16];     // This should become a register on KNL
> >> >>                for (v = 0; v < 16; v++) {
> >> >>                    b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] +
> a[i+1][j+v]
> >> >> +
> >> >> a[i][j-1+v] + a[i][j+1+v]);
> >> >>                }
> >> >>                for (v = 0; v < 16; v++) {  // This will be a single
> >> >> store
> >> >> operation
> >> >>                    a[i][j+v] = b[v];
> >> >>                }
> >> >>            }
> >> >>            // You should explicitly take care about the tail of
> j-loop
> >> >> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED    // This is not an
> >> >> actual
> >> >> name, just a designator
> >> >>            for (;j <= N-2; j++) {
> >> >>                a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> >> >> a[i][j-1] + a[i][j+1]);
> >> >>            }
> >> >> #else
> >> >>           for (v = 0; v < 16-(N-2-j); v++) {  // This would become
> >> >> masked
> >> >> non-loop
> >> >>               b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
> >> >> a[i][j-1+v] + a[i][j+1+v]);
> >> >>           }
> >> >>           for (v = 0; v <  16-(N-2-j); v++) {  // This will be a
> single
> >> >> masked store operation
> >> >>               a[i][j+v] = b[v];
> >> >>           }
> >> >> #endif
> >> >>     }
> >> >> }
> >> >>
> >> >> Unfortunalely compiler cannot do this for you: this is not equivalent
> >> >> transformation of original code. I am also not aware of any way to
> >> >> express
> >> >> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't
> >> >> work in
> >> >> this case).
> >> >>
> >> >> Minor note: You're using 'int' for data, than multiply by 0.25
> (divide
> >> >> by
> >> >> 4) and than write it back to 'int'. This will cost you 2 conversion
> >> >> to/from
> >> >> double while you may just place (...) / 4  which should be optimized
> to
> >> >> simple sequecnce with shifts (not to single shift due to signedness,
> >> >> but
> >> >> still better than conversions with changes of element size 4->8->4
> and
> >> >> data
> >> >> size INT->FP->INT).
> >> >>
> >> >> And by the way why do you divide by 4, not by 5 as number of points
> >> >> suggest?
> >> >>
> >> >> Serge Preis
> >> >>
> >> >>
> >> >> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev"
> >> >> <llvm-dev at lists.llvm.org>:
> >> >> > further i modified the code to the following;
> >> >> >
> >> >> > #include <stdio.h>
> >> >> > #define N 100351
> >> >> >
> >> >> > // This function computes 2D-5 point Jacobi stencil
> >> >> > void stencil(int a[restrict][N], int b[restrict][N])
> >> >> > {
> >> >> >    int i, j, k;
> >> >> >    for (k = 0; k < N; k++) {
> >> >> >        for (i = 1; i <= N-2; i++)
> >> >> >            for (j = 1; j <= N-2; j++)
> >> >> >         b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1]
> >> >> > +
> >> >> > a[i][j+1]);
> >> >> >
> >> >> > for (i = 1; i <= N-2; i++)
> >> >> > for (j = 1; j <= N-2; j++)
> >> >> >  a[i][j] = b[i][j];
> >> >> >
> >> >> > }
> >> >> > }
> >> >> >
> >> >> >    but still no vectorization in IR. Also, when I set vector width
> >> >> > explicitly to 64, it gives the following error:
> >> >> >
> >> >> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot
> >> >> > be
> >> >> > vectorized
> >> >> > remark: <unknown>:0:0: loop not vectorized: value that could not be
> >> >> > identified as reduction is used outside the loop
> >> >> >
> >> >> > I need serious help on this. Please guide me.
> >> >> >
> >> >> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <
> hahmed2305 at gmail.com>
> >> >> > wrote:
> >> >> >> Does it happen due to loop carried dependence? if yes what is the
> >> >> >> solution to vectorize such codes?
> >> >> >>
> >> >> >> please reply. i m waiting.
> >> >> >>
> >> >> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com>
> >> >> >> wrote:
> >> >> >>> I even tried polly but still my llvm IR does not contain vector
> >> >> >>> instructions. i used the following command;
> >> >> >>>
> >> >> >>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly
> -mllvm
> >> >> >>> -polly-vectorizer=stripmine -o stencil_poly.ll
> >> >> >>>
> >> >> >>> Please specify what is wrong with my code?
> >> >> >>>
> >> >> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed
> >> >> >>> <hahmed2305 at gmail.com>
> >> >> >>> wrote:
> >> >> >>>> Hello,
> >> >> >>>>
> >> >> >>>> I am trying to vectorize following stencil code;
> >> >> >>>>
> >> >> >>>> #include <stdio.h>
> >> >> >>>> #define N 100351
> >> >> >>>>
> >> >> >>>> // This function computes 2D-5 point Jacobi stencil
> >> >> >>>> void stencil(int a[restrict][N])
> >> >> >>>> {
> >> >> >>>>    int i, j, k;
> >> >> >>>>    for (k = 0; k < 100; k++)
> >> >> >>>>      {  for (i = 1; i <= N-2; i++)
> >> >> >>>>          {  for (j = 1; j <= N-2; j++)
> >> >> >>>>         { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> >> >> >>>> a[i][j-1] + a[i][j+1]);
> >> >> >>>>               }
> >> >> >>>> }
> >> >> >>>> }}
> >> >> >>>>
> >> >> >>>> I have used the following commands
> >> >> >>>>
> >> >> >>>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm
> >> >> >>>> -disable-llvm-optzns -o stencil.ll
> >> >> >>>>
> >> >> >>>> opt  -S -O3 stencil.ll -o stencil_o3.ll
> >> >> >>>>
> >> >> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
> >> >> >>>>
> >> >> >>>> But the code is not vectorized. It still uses the scalar
> >> >> >>>> instructions;
> >> >> >>>>
> >> >> >>>> Please correct me.
> >> >> >>>>
> >> >> >>>> Thank You
> >> >> > ,
> >> >> >
> >> >> > _______________________________________________
> >> >> > LLVM Developers mailing list
> >> >> > llvm-dev at lists.llvm.org
> >> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > llvm-dev at lists.llvm.org
> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171026/825eedbc/attachment.html>


More information about the llvm-dev mailing list