<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">>
</span>help me better understand how autovectorization of outer loops works.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">LLVM’s loop vectorizer currently handles innermost loops only.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">>
</span>I'm hoping for an autovectorization of the outer loop so that the inner loop operates on vectors.<br>
<br>
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">We share that hope and are working toward it, but it will take some time. See the llvm-dev thread at
<a href="http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html">http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html</a> and the reviews at
<a href="https://reviews.llvm.org/D28975">https://reviews.llvm.org/D28975</a> and
<a href="https://reviews.llvm.org/D32871">https://reviews.llvm.org/D32871</a>. Thanks for the use-case.<o:p></o:p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">></span> I was wondering if there are small ways in which I can change my code to help LLVM's autovectorizer to succeed.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">In the meantime, if the two loops of a doubly-nested loop can be interchanged so that the new inner loop becomes vectorizable, that may help.<o:p></o:p></span></p>
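<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">As a rough illustration of such an interchange for the posted cheby_eval, here is a minimal sketch. It assumes three temporary arrays of length m are affordable: the per-point scalars u0, u1, u2 have to be expanded into arrays before the loop over i can become the innermost one. The name cheby_eval_interchanged, the use of std::vector, and the added const qualifiers and assert are illustrative choices, not part of the original code. I have not checked what the vectorizer reports for it, so treat it as something to try rather than a guaranteed fix.<o:p></o:p></span></p>

```cpp
#include <cassert>
#include <vector>

// Sketch only: same recurrence as the posted cheby_eval, with the two loops
// interchanged so that the loop over the points i is the innermost one.
// coeffs holds n+1 coefficients; xs and ys hold m values each.
void cheby_eval_interchanged(const double *coeffs, int n,
                             const double *xs, double *ys, int m)
{
    assert(n >= 0 && m >= 0);

    // Scalar expansion: one (u0, u1, u2) triple per point instead of
    // three scalars per outer-loop iteration.
    std::vector<double> u0(m, 0.0), u1(m, 0.0), u2(m, 0.0);

    for (int k = n; k >= 0; k--) {
        const double c = coeffs[k];
        // Iterations for different i are independent of each other, so
        // this loop is a candidate for the innermost-loop vectorizer.
        for (int i = 0; i < m; i++) {
            const double prev1 = u0[i];
            const double prev2 = u1[i];
            u2[i] = prev2;
            u1[i] = prev1;
            u0[i] = 2 * xs[i] * prev1 - prev2 + c;
        }
    }

    // Final combination, unchanged from the original code.
    for (int i = 0; i < m; i++)
        ys[i] = 0.5 * (coeffs[0] + u0[i] - u2[i]);
}
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Compiling this with the same flags as before (including -Rpass-analysis=loop-vectorize) should show what the vectorizer makes of the new inner loop. The cost of this form is extra memory traffic for the temporaries, which the icc output you posted keeps in ymm registers.<o:p></o:p></span></p>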
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Ayal.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org]
<b>On Behalf Of </b>Jyotirmoy Bhattacharya via llvm-dev<br>
<b>Sent:</b> Wednesday, May 10, 2017 10:16<br>
<b>To:</b> llvm-dev@lists.llvm.org<br>
<b>Subject:</b> [llvm-dev] autovectorization of outer loop<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">I have the following C++ code that evaluates a Chebyshev polynomial using Clenshaw's algorithm<br>
<br>
void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)<br>
{<br>
#pragma omp simd<br>
for (int i=0;i&lt;m;i++){<br>
double x = xs[i];<br>
double u0=0,u1=0,u2=0;<br>
for (int k=n;k>=0;k--){<br>
u2 = u1;<br>
u1 = u0;<br>
u0 = 2*x*u1-u2+coeffs[k];<br>
}<br>
ys[i] = 0.5*(coeffs[0]+u0-u2);<br>
}<br>
}<br>
<br>
I'm hoping for an autovectorization of the outer loop so that the inner loop operates on vectors.<br>
<br>
When compiled with<br>
<br>
clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc<br>
<br>
using clang++ 3.8.1-23, no vectorization happens and I get the message<br>
<br>
chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds<br>
[-Rpass-analysis=loop-vectorize]<br>
ys[i] = 0.5*(coeffs[0]+u0-u2);<br>
^<br>
chebyshev.cc:21:1: remark: loop not vectorized: value that could not be<br>
identified as reduction is used outside the loop<br>
[-Rpass-analysis=loop-vectorize]<br>
<br>
<br>
On the same code icc vectorizes the outer loop as expected.<br>
<br>
I was wondering if there are small ways in which I can change my code to help LLVM's autovectorizer to succeed. I would also appreciate any pointers to documentation or LLVM source that can help me better understand how autovectorization of outer loops works.<br>
<br>
Regards,<br>
Jyotirmoy Bhattacharya<br>
<br>
PS. The interesting part of icc's assembler output is<br>
<br>
..B1.4: # Preds ..B1.8 ..B1.3<br>
xorl %r15d, %r15d #14.5<br>
xorl %ebx, %ebx #14.21<br>
testq %rsi, %rsi #14.21<br>
vmovupd (%rdx,%r9,8), %ymm3 #12.16<br>
vxorpd %ymm5, %ymm5, %ymm5 #13.14<br>
vmovdqa %ymm1, %ymm4 #13.19<br>
vmovdqa %ymm1, %ymm2 #13.24<br>
jl ..B1.8 # Prob 2% #14.21<br>
<br>
..B1.5: # Preds ..B1.4<br>
vaddpd %ymm3, %ymm3, %ymm3 #17.14<br>
<br>
..B1.6: # Preds ..B1.6 ..B1.5<br>
vmovapd %ymm4, %ymm2 #20.3<br>
incq %r15 #14.5<br>
vmovapd %ymm5, %ymm4 #20.3<br>
vfmsub213pd %ymm2, %ymm3, %ymm5 #17.19<br>
vbroadcastsd (%r11,%rbx,8), %ymm6 #17.22<br>
decq %rbx<br>
vaddpd %ymm5, %ymm6, %ymm5 #17.22<br>
cmpq %r10, %r15 #14.5<br>
jb ..B1.6 # Prob 82% #14.5<br>
<br>
..B1.8: # Preds ..B1.6 ..B1.4<br>
vbroadcastsd (%rdi), %ymm3 #19.18<br>
vaddpd %ymm3, %ymm5, %ymm4 #19.28<br>
vsubpd %ymm2, %ymm4, %ymm2 #19.31<br>
vmulpd %ymm2, %ymm0, %ymm5 #19.31<br>
vmovupd %ymm5, (%rcx,%r9,8) #19.5<br>
addq $4, %r9 #11.3<br>
cmpq %r8, %r9 #11.3<br>
jb ..B1.4 # Prob 82% #11<o:p></o:p></p>
</div>
</div>
</body>
</html>