<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">>
</span>help me better understand how autovectorization of outer loops works.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">LLVM’s loop vectorizer currently handles innermost loops only.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">>
</span>I'm hoping for an autovectorization of the outer loop so that the inner loop operates on vectors.<br>
<br>
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">We share that hope and are working to achieve it:
<a href="http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html">http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html</a>, but it will take some time. See
<a href="https://reviews.llvm.org/D28975">https://reviews.llvm.org/D28975</a> and
<a href="https://reviews.llvm.org/D32871">https://reviews.llvm.org/D32871</a>. Thanks for the use-case.<o:p></o:p></span></p>
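<p class="MsoNormal">Until that work lands, the transformation being asked for can be approximated by hand. The following is only a sketch, not what any compiler emits: the function name cheby_eval_blocked and the block width W are hypothetical, and W = 4 is chosen only because four doubles fill one 256-bit ymm register, as in the icc output quoted below. The outer loop is strip-mined into blocks of W iterations, and a short loop over the W lanes is made innermost, so that an innermost-loop vectorizer at least has a candidate loop with no cross-iteration dependence. Whether a given clang version actually vectorizes it has not been verified here.</p>

```cpp
// Hypothetical block width: 4 doubles fill one 256-bit AVX2 register.
constexpr int W = 4;

// Sketch: same arithmetic as cheby_eval, but W consecutive outer
// iterations are processed together and the lane loop is innermost.
void cheby_eval_blocked(const double *coeffs, int n,
                        const double *xs, double *ys, int m)
{
  int i = 0;
  for (; i + W <= m; i += W) {
    double u0[W] = {0}, u1[W] = {0}, u2[W] = {0};
    for (int k = n; k >= 0; k--) {
      const double c = coeffs[k];
      for (int l = 0; l < W; l++) {  // lanes: independent of each other
        u2[l] = u1[l];
        u1[l] = u0[l];
        u0[l] = 2 * xs[i + l] * u1[l] - u2[l] + c;
      }
    }
    for (int l = 0; l < W; l++)
      ys[i + l] = 0.5 * (coeffs[0] + u0[l] - u2[l]);
  }
  // Scalar remainder for the last m % W elements (original loop body).
  for (; i < m; i++) {
    const double x = xs[i];
    double u0 = 0, u1 = 0, u2 = 0;
    for (int k = n; k >= 0; k--) {
      u2 = u1;
      u1 = u0;
      u0 = 2 * x * u1 - u2 + coeffs[k];
    }
    ys[i] = 0.5 * (coeffs[0] + u0 - u2);
  }
}
```

<p class="MsoNormal">The recurrence over k is still sequential; only the W independent evaluations inside each block are candidates for vector lanes.</p>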
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">></span> I was wondering if there are small ways in which I can change my code to help LLVM's autovectorizer to succeed.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">If the two loops of a doubly nested loop can legally be interchanged so that the new inner loop is vectorizable, interchanging them by hand may help.<o:p></o:p></span></p>
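<p class="MsoNormal">For the loop nest quoted below, a manual interchange might look roughly like this. It is a sketch under assumptions: the name cheby_eval_interchanged is hypothetical, the scalars u0, u1, u2 are expanded into temporary arrays of length m so that the k loop can move outward, and the input pointers are assumed not to overlap ys. The loop over i is then innermost and has no dependence between its iterations, which is the shape the loop vectorizer looks for, though it may still need runtime alias checks.</p>

```cpp
#include <cstddef>
#include <vector>

// Sketch: cheby_eval with the k loop outermost and the i loop innermost.
// The per-point scalars u0, u1, u2 become arrays indexed by i.
void cheby_eval_interchanged(const double *coeffs, int n,
                             const double *xs, double *ys, int m)
{
  if (m <= 0)
    return;
  const std::size_t len = static_cast<std::size_t>(m);
  std::vector<double> u0(len, 0.0), u1(len, 0.0), u2(len, 0.0);
  double *p0 = u0.data(), *p1 = u1.data(), *p2 = u2.data();

  for (int k = n; k >= 0; k--) {
    const double c = coeffs[k];
    // Innermost loop: each i is independent of every other i.
    for (int i = 0; i < m; i++) {
      p2[i] = p1[i];
      p1[i] = p0[i];
      p0[i] = 2 * xs[i] * p1[i] - p2[i] + c;
    }
  }
  for (int i = 0; i < m; i++)
    ys[i] = 0.5 * (coeffs[0] + p0[i] - p2[i]);
}
```

<p class="MsoNormal">The cost is three temporary arrays of m doubles and an extra pass over memory per coefficient, so for large m this may be slower than a register-resident outer-loop vectorization even when the inner loop does vectorize.</p>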
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Ayal.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org]
<b>On Behalf Of </b>Jyotirmoy Bhattacharya via llvm-dev<br>
<b>Sent:</b> Wednesday, May 10, 2017 10:16<br>
<b>To:</b> llvm-dev@lists.llvm.org<br>
<b>Subject:</b> [llvm-dev] autovectorization of outer loop<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">I have the following C++ code that evaluates a Chebyshev polynomial using Clenshaw's algorithm<br>
<br>
void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)<br>
{<br>
  #pragma omp simd<br>
  for (int i=0;i<m;i++){<br>
    double x = xs[i];<br>
    double u0=0,u1=0,u2=0;<br>
    for (int k=n;k>=0;k--){<br>
      u2 = u1;<br>
      u1 = u0;<br>
      u0 = 2*x*u1-u2+coeffs[k];<br>
    }<br>
    ys[i] = 0.5*(coeffs[0]+u0-u2);<br>
  }<br>
}<br>
<br>
I'm hoping for an autovectorization of the outer loop so that the inner loop operates on vectors.<br>
<br>
When compiled with<br>
<br>
clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc<br>
<br>
using clang++ 3.8.1-23, no vectorization happens and I get the message<br>
<br>
chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds<br>
      [-Rpass-analysis=loop-vectorize]<br>
    ys[i] = 0.5*(coeffs[0]+u0-u2);<br>
                 ^<br>
chebyshev.cc:21:1: remark: loop not vectorized: value that could not be<br>
      identified as reduction is used outside the loop<br>
      [-Rpass-analysis=loop-vectorize]<br>
<br>
<br>
On the same code icc vectorizes the outer loop as expected.<br>
<br>
I was wondering if there are small ways in which I can change my code to help LLVM's autovectorizer to succeed. I would also appreciate any pointers to documentation or LLVM source that can help me better understand how autovectorization of outer loops works.<br>
<br>
Regards,<br>
Jyotirmoy Bhattacharya<br>
<br>
PS. The interesting part of icc's assembler output is<br>
<br>
..B1.4:                         # Preds ..B1.8 ..B1.3<br>
        xorl      %r15d, %r15d                                  #14.5<br>
        xorl      %ebx, %ebx                                    #14.21<br>
        testq     %rsi, %rsi                                    #14.21<br>
        vmovupd   (%rdx,%r9,8), %ymm3                           #12.16<br>
        vxorpd    %ymm5, %ymm5, %ymm5                           #13.14<br>
        vmovdqa   %ymm1, %ymm4                                  #13.19<br>
        vmovdqa   %ymm1, %ymm2                                  #13.24<br>
        jl        ..B1.8        # Prob 2%                       #14.21<br>
<br>
..B1.5:                         # Preds ..B1.4<br>
        vaddpd    %ymm3, %ymm3, %ymm3                           #17.14<br>
<br>
..B1.6:                         # Preds ..B1.6 ..B1.5<br>
        vmovapd   %ymm4, %ymm2                                  #20.3<br>
        incq      %r15                                          #14.5<br>
        vmovapd   %ymm5, %ymm4                                  #20.3<br>
        vfmsub213pd %ymm2, %ymm3, %ymm5                         #17.19<br>
        vbroadcastsd (%r11,%rbx,8), %ymm6                       #17.22<br>
        decq      %rbx<br>
        vaddpd    %ymm5, %ymm6, %ymm5                           #17.22<br>
        cmpq      %r10, %r15                                    #14.5<br>
        jb        ..B1.6        # Prob 82%                      #14.5<br>
<br>
..B1.8:                         # Preds ..B1.6 ..B1.4<br>
        vbroadcastsd (%rdi), %ymm3                              #19.18<br>
        vaddpd    %ymm3, %ymm5, %ymm4                           #19.28<br>
        vsubpd    %ymm2, %ymm4, %ymm2                           #19.31<br>
        vmulpd    %ymm2, %ymm0, %ymm5                           #19.31<br>
        vmovupd   %ymm5, (%rcx,%r9,8)                           #19.5<br>
        addq      $4, %r9                                       #11.3<br>
        cmpq      %r8, %r9                                      #11.3<br>
        jb        ..B1.4        # Prob 82%                      #11<o:p></o:p></p>
</div>
</div>
</body>
</html>