<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 9/20/2018 2:15 PM, hameeza ahmed
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAFMPKeY4pCj6duZm=K+AU23e6JX9unCM1EfQ3_DSH7Tgne2RMw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hello,</div>
<div>I'm trying to set the vector width using #pragma clang
loop vectorize_width(32), but I'm getting width 8 for the
following kernel:</div>
<br>
<div><b>I'm getting the following output when I compile it:</b></div>
<div><b><br>
</b></div>
<div><pre>clang -O3 correlation.c -Rpass=loop-vectorize -emit-llvm -march=knl -S -o 1.ll
correlation.c:38:9: remark: vectorized loop (vectorization width: 8, interleaved count: 4) [-Rpass=loop-vectorize]
for (j = 0; j &lt; M; j++)
^</pre></div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
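With AVX-512, a vector register is 512 bits wide, so a single
instruction can operate on at most 8 double-precision (64-bit)
lanes. The vectorizer recognizes that and, instead of a width of 32,
uses a width of 8 and interleaves the loop 4 times. You still get
8*4 == 32 scalar iterations per iteration of the vectorized loop.<br>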
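<br>
To make the arithmetic concrete, here is a rough C sketch of the loop shape that the remark describes. It is an illustration only, not the IR clang actually emits. The kernel (axpy_scalar / axpy_vf8_ic4) is a hypothetical stand-in, because the original correlation kernel is not quoted here. VF and IC are assumptions hard-coded to the values reported in the remark. Each "part" models one 8-lane (512-bit) vector operation, and the four parts are the interleaved copies. One trip through the main loop therefore covers 32 elements, and a scalar remainder loop handles whatever is left.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Assumed factors, copied from the remark:
   "vectorization width: 8, interleaved count: 4". */
enum { VF = 8, IC = 4, STEP = VF * IC }; /* STEP == 32 */

/* Hypothetical stand-in kernel (scalar reference): y[j] += a * x[j]. */
void axpy_scalar(size_t n, double a, const double *x, double *y) {
    for (size_t j = 0; j < n; j++)
        y[j] += a * x[j];
}

/* Hand-written sketch of the vectorized + interleaved loop shape.
   Returns how many iterations of the main (vector) loop ran. */
size_t axpy_vf8_ic4(size_t n, double a, const double *x, double *y) {
    size_t j = 0, vec_iters = 0;
    /* Main loop: IC interleaved parts x VF lanes = 32 elements per trip. */
    for (; j + STEP <= n; j += STEP, vec_iters++) {
        for (size_t part = 0; part < IC; part++) {     /* interleaved copies */
            for (size_t lane = 0; lane < VF; lane++) { /* lanes of one 512-bit op */
                size_t k = j + part * VF + lane;
                y[k] += a * x[k];
            }
        }
    }
    /* Fewer than 32 elements remain for the scalar remainder loop. */
    assert(n - j < STEP);
    for (; j < n; j++)
        y[j] += a * x[j];
    return vec_iters;
}
```

With n = 100, for example, the main loop runs 3 times (96 elements) and the remainder loop handles the last 4. If you want to pin both factors explicitly rather than rely on the cost model, clang also accepts <code>#pragma clang loop vectorize_width(8) interleave_count(4)</code>.<br>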
<br>
-Eli<br>
<pre class="moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</body>
</html>