[llvm-dev] Jacobi 5 Point Stencil Code not Vectorizing

Sat Oct 21 13:21:25 PDT 2017

Hello,

The following stencil function that you wrote few months ago (for
computing 2D-5 point Jacobi stencil) is vectorizable.

void stencil(int a[][N], int b[N])
{
  int i, j, k;
    for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++) {
            for (j = 1; j <= N-2; j++)
                 b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
+ a[i][j+1]);
            for (j = 1; j <= N-2; j++)
                 a[i][j] = b[j];
        }
    }
}

but when i write the above code in main i am getting error

opt  -S -O3  stencil.ll -o stencil_o3.ll
remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop

Why is that so?

my code is follows:
int a[N][N],  b[N];
int main(void)
{
  int i, j, k;
    for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++) {
            for (j = 1; j <= N-2; j++)
                 b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
+ a[i][j+1]);
            for (j = 1; j <= N-2; j++)
                 a[i][j] = b[j];
        }
    }
}

Please help.

Thank You.

On Mon, Jul 3, 2017 at 10:19 AM, Serge Preis <spreis at yandex-team.ru> wrote:

> Hello,
>
> This is not equivalent rewrite: your original code definitely shouldn't
> vectorize because it has backward cross-iteration (loop-carried)
> dependency: your value on iteration j+1 depend on value from iteration j
> you've just written. In case of vectorization you need to do
> load-operation-store on multiple consecutive values at once and it is
> impossible in this case.
>
> In your new code all values of a[] in right-hand side of the main loop are
> from previous k iteration (because on current k iteration you're writing to
> 'b'). So there is no way to vectorize loop in its original form, but new
> form is definitely vectorizable.
>
> I am second to recommend you filing a bug over 'restrict' behavior. And
> you may in fact save some memory by making 'b' 1D array (this is not
> equivalent rewrite once again though)
>
> // This function computes 2D-5 point Jacobi stencil
> void stencil(int a[][N], int b[N])
> {
>   int i, j, k;
>     for (k = 0; k < N; k++) {
>         for (i = 1; i <= N-2; i++) {
>             for (j = 1; j <= N-2; j++)
>                  b[j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
>             for (j = 1; j <= N-2; j++)
>                  a[i][j] = b[j];
>         }
>     }
> }
>
> There is a way to vectorize your code even more efficient. This will look
> like more low-level, but probably closest to what you would expect from
> vectorization.
>
> void stencil(int a[][N])
> {
>     int i, j, k;
>     for (k = 0; k < 100; k++) {
>          for (i = 1; i <= N-2; i++) {
>             for (j = 1; j <= N-2; j+=16) {
>                int b[16];     // This should become a register on KNL
>                for (v = 0; v < 16; v++) {
>                    b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
> a[i][j-1+v] + a[i][j+1+v]);
>                }
>                for (v = 0; v < 16; v++) {  // This will be a single store
> operation
>                    a[i][j+v] = b[v];
>                }
>            }
>            // You should explicitly take care about the tail of j-loop
> #if !MASKED_SHORT_LOOP_VECTORIZATION_SUPPORTED    // This is not an
> actual name, just a designator
>            for (;j <= N-2; j++) {
>                a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
>            }
> #else
>           for (v = 0; v < 16-(N-2-j); v++) {  // This would become masked
> non-loop
>               b[v] = 0.25 * (a[i][j+v] + a[i-1][j+v] + a[i+1][j+v] +
> a[i][j-1+v] + a[i][j+1+v]);
>           }
>           for (v = 0; v <  16-(N-2-j); v++) {  // This will be a single
> masked store operation
>               a[i][j+v] = b[v];
>           }
> #endif
>     }
> }
>
> Unfortunalely compiler cannot do this for you: this is not equivalent
> transformation of original code. I am also not aware of any way to express
> this desired behavior less explicitly (e.g. OpenMP SIMD pragma won't work
> in this case).
>
> Minor note: You're using 'int' for data, than multiply by 0.25 (divide by
> 4) and than write it back to 'int'. This will cost you 2 conversion to/from
> double while you may just place (...) / 4  which should be optimized to
> simple sequecnce with shifts (not to single shift due to signedness, but
> still better than conversions with changes of element size 4->8->4 and data
> size INT->FP->INT).
>
> And by the way why do you divide by 4, not by 5 as number of points
> suggest?
>
> Serge Preis
>
>
> 02.07.2017, 05:11, "hameeza ahmed via llvm-dev" <llvm-dev at lists.llvm.org>:
> > further i modified the code to the following;
> >
> > #include <stdio.h>
> > #define N 100351
> >
> > // This function computes 2D-5 point Jacobi stencil
> > void stencil(int a[restrict][N], int b[restrict][N])
> > {
> >    int i, j, k;
> >    for (k = 0; k < N; k++) {
> >        for (i = 1; i <= N-2; i++)
> >            for (j = 1; j <= N-2; j++)
> >         b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
> a[i][j+1]);
> >
> > for (i = 1; i <= N-2; i++)
> > for (j = 1; j <= N-2; j++)
> >  a[i][j] = b[i][j];
> >
> > }
> > }
> >
> >    but still no vectorization in IR. Also, when I set vector width
> explicitly to 64, it gives the following error:
> >
> > remark: <unknown>:0:0: loop not vectorized: call instruction cannot be
> vectorized
> > remark: <unknown>:0:0: loop not vectorized: value that could not be
> identified as reduction is used outside the loop
> >
> > I need serious help on this. Please guide me.
> >
> > On Sun, Jul 2, 2017 at 1:54 AM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
> >> Does it happen due to loop carried dependence? if yes what is the
> solution to vectorize such codes?
> >>
> >> please reply. i m waiting.
> >>
> >> On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> >>> I even tried polly but still my llvm IR does not contain vector
> instructions. i used the following command;
> >>>
> >>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm -polly -mllvm
> -polly-vectorizer=stripmine -o stencil_poly.ll
> >>>
> >>> Please specify what is wrong with my code?
> >>>
> >>> On Sat, Jul 1, 2017 at 4:08 PM, hameeza ahmed <hahmed2305 at gmail.com>
> wrote:
> >>>> Hello,
> >>>>
> >>>> I am trying to vectorize following stencil code;
> >>>>
> >>>> #include <stdio.h>
> >>>> #define N 100351
> >>>>
> >>>> // This function computes 2D-5 point Jacobi stencil
> >>>> void stencil(int a[restrict][N])
> >>>> {
> >>>>    int i, j, k;
> >>>>    for (k = 0; k < 100; k++)
> >>>>      {  for (i = 1; i <= N-2; i++)
> >>>>          {  for (j = 1; j <= N-2; j++)
> >>>>         { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] +
> a[i][j-1] + a[i][j+1]);
> >>>>               }
> >>>> }
> >>>> }}
> >>>>
> >>>> I have used the following commands
> >>>>
> >>>> clang  -S -emit-llvm stencil.c -march=knl -O3 -mllvm
> -disable-llvm-optzns -o stencil.ll
> >>>>
> >>>> opt  -S -O3 stencil.ll -o stencil_o3.ll
> >>>>
> >>>> llc -x86-asm-syntax=intel stencil_o3.ll -o stencil.s
> >>>>
> >>>> But the code is not vectorized. It still uses the scalar instructions;
> >>>>
> >>>> Please correct me.
> >>>>
> >>>> Thank You
> > ,
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171022/c765aadb/attachment.html>