<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hello,</div><div>I'm trying to set the vector width using #pragma clang loop vectorize_width(32), but I'm getting a width of 8 for the following kernel:</div><div><br></div><div>#define M 128
<br>#define N 128
<br> <br>#define SQRT_FUN(x) sqrtf(x)
<br>int main(int argc, char** argv)
<br>{
<br> /* Variable declaration/allocation. */
<br> double float_n = (double)N;
<br> double data[N*M];
<br> double corr[M*M];
<br> double mean[M];
<br> double stddev[M];
<br> uint32_t i,j,k;
<br> <br> /*Initialize array(s). */
<br> #pragma clang loop vectorize_width(1) //no vectorize
<br> for (i = 0; i < N*M; i++)
<br> {
<br> data[i] = (50.0)*i;
<br> }
<br>kernel_1:
<br> #pragma clang loop vectorize_width(32)
<br> for (j = 0; j < M; j++)
<br> {
<br> mean[j] = 0.0;
<br> }
<br> for (i = 0; i < N; i++)
<br> {
<br> for (j = 0; j < M; j++)
<br> mean[j] += data[(i*M) + j];
<br> }
<br> for (j = 0; j < M; j++)
<br> {
<br> mean[j] /= float_n;
<br> }
<br>kernel_2:
<br> for (j = 0; j < M; j++)
<br> {
<br> stddev[j] = 0.0;
<br> }
<br> for (i = 0; i < N; i++)
<br> {
<br> for (j = 0; j < M; j++)
<br> {
<br> stddev[j] += (data[(i*M) + j] - mean[j]) * (data[(i*M)+j] - mean[j]);
<br> }
<br> }
<br> for (j = 0; j < M; j++)
<br> {
<br> stddev[j] /= float_n;
<br> }
<br> for (j = 0; j < M; j++)
<br> {
<br> stddev[j] = SQRT_FUN(stddev[j]);
<br> }
<br>kernel_3:
<br> for (i = 0; i < N; i++)
<br> {
<br> for (j = 0; j < M; j++)
<br> {
<br> data[(i*M) + j] -= mean[j];
<br> }
<br> }
<br> for (i = 0; i < N; i++)
<br> {
<br> for (j = 0; j < M; j++)
<br> {
<br> data[(i*M) + j] /= SQRT_FUN(float_n) * stddev[j];
<br> }
<br> }
<br>kernel_4:
<br>
<br> for (i = 0; i < M*M; i++)
<br> {
<br> corr[i] = 0.0;
<br> }
<br> for (k = 0; k < N; k++)
<br> {
<br> for (i = 0; i < M-1; i++)
<br> {
<br> for (j = i+1; j < M; j++)
<br> {
<br> corr[(i*M)+j] += (data[(k*M)+i] * data[(k*M)+j]);
<br> }
<br> }
<br> }
<br> printf("mean[0]: %lf\n",mean[0]);
<br> printf("mean[M-1]: %lf\n",mean[M-1]);
<br> printf("stddev[0]: %lf\n",stddev[0]);
<br> printf("stddev[M-1]: %lf\n",stddev[M-1]);
<br> printf("corr[0]: %lf\n",corr[0]);
<br> printf("corr[(M*M)-1]: %lf\n",corr[(M*M)-1]);
<br> printf("data[0]: %lf\n",data[0]);
<br> printf("data[(M*M)-1]: %lf\n",data[(M*M)-1]);
<br> return 0;
<br>}</div><div><b>I get the following output when I compile:</b></div><div><b><br></b></div><div><b>clang -O3 correlation.c -Rpass=loop-vectorize -emit-llvm -march=knl -S -o 1.ll<br>correlation.c:38:9: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++)<br> ^<br>correlation.c:41:5: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++)<br> ^<br>correlation.c:53:9: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++) <br> ^<br>correlation.c:58:5: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++)<br> ^<br>correlation.c:71:9: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++)<br> ^<br>correlation.c:78:9: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = 0; j < M; j++)<br> ^<br>correlation.c:98:13: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]<br> for (j = i+1; j < M; j++)<br></b></div><div><b><br></b></div><div><b>Why is that?</b></div><div><b><br></b></div><div><b>I am, however, able to set the width to 32 for the example code given on the site.</b></div><div><b><br></b></div><div><b>Why are the pragmas not setting the vector width correctly in my kernel?</b></div><div><b><br></b></div><div><b>What is the issue?</b></div><div><br></div><div><b>Please help.</b></div><div><b><br></b></div><div><b>Thank you.</b></div><div><b>Regards</b><br></div></div></div></div></div>