<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 9/20/2018 2:15 PM, hameeza ahmed

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAFMPKeY4pCj6duZm=K+AU23e6JX9unCM1EfQ3_DSH7Tgne2RMw@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div dir="ltr">

          <div dir="ltr">

            <div dir="ltr">

              <div>Hello,</div>

              <div>I'm trying to set the vector width using #pragma clang

                loop vectorize_width(32), but I'm getting width 8 for the

                following kernel:</div>

              <br>

              <div><b>I'm getting the following output when I compile:</b></div>

              <div><b><br>

                </b></div>

              <div><b>clang -O3  correlation.c   -Rpass=loop-vectorize 

                  -emit-llvm -march=knl    -S  -o 1.ll<br>

                  correlation.c:38:9: remark: vectorized loop

                  (vectorization width: 8, interleaved count: 4)

                  [-Rpass=loop-vectorize]<br>

                          for (j = 0; j &lt; M; j++)<br>

                          ^<br>

                </b></div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    With AVX-512, a vector register is 512 bits wide, so a single

    instruction can operate on at most 8 double-precision (64-bit)

    lanes.  The vectorizer recognizes that and interleaves the loop by

    4, so you get 8*4 == 32 scalar iterations per iteration of the

    vectorized loop.<br>
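    <br>
    The arithmetic above can be sketched in C. This is only an
    illustration, assuming 512-bit vector registers and 64-bit double
    elements; the helper names (lanes_per_vector,
    scalars_per_vector_iteration, scale_array) are hypothetical, and
    the kernel is a stand-in, not the loop from correlation.c. It asks
    for the same shape the remark reports by using the Clang loop hints
    vectorize_width and interleave_count together.<br>

```c
/* Lanes per vector register: register width in bits divided by
 * element width in bits. With the assumed 512-bit registers and
 * 64-bit doubles this is 8. */
unsigned lanes_per_vector(unsigned register_bits, unsigned element_bits)
{
    return register_bits / element_bits;
}

/* Scalar iterations covered by one trip of the vectorized loop:
 * vector width times interleave count. */
unsigned scalars_per_vector_iteration(unsigned width, unsigned interleave)
{
    return width * interleave;
}

/* Hypothetical stand-in kernel (not the original correlation.c loop).
 * The pragma requests width 8 and interleave count 4, matching the
 * remark in the quoted output. Compilers other than Clang may ignore
 * the pragma; the scalar result is the same either way. */
void scale_array(double *out, const double *in, unsigned long n, double factor)
{
#pragma clang loop vectorize_width(8) interleave_count(4)
    for (unsigned long j = 0; j != n; j++)
        out[j] = in[j] * factor;
}
```

    Building a file like this with the flags from the original message
    (-O3 -march=knl -Rpass=loop-vectorize) should print a remark with
    the width and interleave count the vectorizer actually chose, which
    you can compare against the requested values.<br>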

    <br>

    -Eli<br>

    <pre class="moz-signature" cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>

  </body>

</html>