<div dir="ltr">I have the following C++ code that evaluates a Chebyshev polynomial using Clenshaw's algorithm:<br><br>void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)<br>{<br>  #pragma omp simd<br>  for (int i=0;i<m;i++){<br>    double x = xs[i];<br>    double u0=0,u1=0,u2=0;<br>    for (int k=n;k>=0;k--){<br>      u2 = u1;<br>      u1 = u0;<br>      u0 = 2*x*u1-u2+coeffs[k];<br>    }<br>    ys[i] = 0.5*(coeffs[0]+u0-u2);<br>  }<br>}<br><br>I'm hoping for autovectorization of the outer loop, so that the inner loop operates on vectors.<br><br>When compiled with<br><br>clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc<br><br>using clang++ 3.8.1-23, no vectorization happens and I get the messages:<br><br>chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds<br>      [-Rpass-analysis=loop-vectorize]<br>    ys[i] = 0.5*(coeffs[0]+u0-u2);<br>                 ^<br>chebyshev.cc:21:1: remark: loop not vectorized: value that could not be<br>      identified as reduction is used outside the loop<br>      [-Rpass-analysis=loop-vectorize]<br><br><br>On the same code, icc vectorizes the outer loop as expected.<br><br>Are there small changes I can make to my code to help LLVM's autovectorizer succeed? I would also appreciate any pointers to documentation or LLVM source that can help me better understand how autovectorization of outer loops works.<br><br>Regards,<br>Jyotirmoy Bhattacharya<br><br>PS. 
The interesting part of icc's assembler output is<br><br>..B1.4:                         # Preds ..B1.8 ..B1.3<br>        xorl      %r15d, %r15d                                  #14.5<br>        xorl      %ebx, %ebx                                    #14.21<br>        testq     %rsi, %rsi                                    #14.21<br>        vmovupd   (%rdx,%r9,8), %ymm3                           #12.16<br>        vxorpd    %ymm5, %ymm5, %ymm5                           #13.14<br>        vmovdqa   %ymm1, %ymm4                                  #13.19<br>        vmovdqa   %ymm1, %ymm2                                  #13.24<br>        jl        ..B1.8        # Prob 2%                       #14.21<br><br>..B1.5:                         # Preds ..B1.4<br>        vaddpd    %ymm3, %ymm3, %ymm3                           #17.14<br><br>..B1.6:                         # Preds ..B1.6 ..B1.5<br>        vmovapd   %ymm4, %ymm2                                  #20.3<br>        incq      %r15                                          #14.5<br>        vmovapd   %ymm5, %ymm4                                  #20.3<br>        vfmsub213pd %ymm2, %ymm3, %ymm5                         #17.19<br>        vbroadcastsd (%r11,%rbx,8), %ymm6                       #17.22<br>        decq      %rbx<br>        vaddpd    %ymm5, %ymm6, %ymm5                           #17.22<br>        cmpq      %r10, %r15                                    #14.5<br>        jb        ..B1.6        # Prob 82%                      #14.5<br><br>..B1.8:                         # Preds ..B1.6 ..B1.4<br>        vbroadcastsd (%rdi), %ymm3                              #19.18<br>        vaddpd    %ymm3, %ymm5, %ymm4                           #19.28<br>        vsubpd    %ymm2, %ymm4, %ymm2                           #19.31<br>        vmulpd    %ymm2, %ymm0, %ymm5                           #19.31<br>        vmovupd   %ymm5, (%rcx,%r9,8)                           #19.5<br>        addq      $4, %r9                                      
 #11.3<br>        cmpq      %r8, %r9                                      #11.3<br>        jb        ..B1.4        # Prob 82%                      #11</div>
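To make concrete the kind of "small change" being asked about, here is a minimal sketch of a hand-blocked variant that mirrors the structure of the icc output above: four points are processed together, so the innermost loop is a fixed-length loop over lanes rather than over coefficients. The function name `cheby_eval_blocked`, the block width `W = 4` (chosen on the assumption of 256-bit registers holding four doubles), and the added `const` qualifiers are all illustrative and not part of the original code. Whether clang 3.8 actually vectorizes this form has not been checked here; it is only a candidate to test with `-Rpass=loop-vectorize`.

```cpp
#include <cassert>

// Hypothetical block width: 4 doubles per 256-bit register (an assumption).
constexpr int W = 4;

// Illustrative blocked variant of cheby_eval; the name is hypothetical.
// coeffs must hold n+1 entries, xs and ys must hold m entries each.
void cheby_eval_blocked(const double *coeffs, int n,
                        const double *xs, double *ys, int m)
{
  assert(n >= 0 && m >= 0);
  int i = 0;

  // Main part: handle W evaluation points per iteration.
  for (; i + W <= m; i += W) {
    double x2[W], u0[W], u1[W], u2[W];
    for (int j = 0; j < W; j++) {
      x2[j] = 2 * xs[i + j];          // hoist the factor 2*x out of the recurrence
      u0[j] = u1[j] = u2[j] = 0;
    }
    for (int k = n; k >= 0; k--) {
      const double c = coeffs[k];     // one coefficient shared by all W lanes
      for (int j = 0; j < W; j++) {   // fixed-trip-count lane loop
        u2[j] = u1[j];
        u1[j] = u0[j];
        u0[j] = x2[j] * u1[j] - u2[j] + c;
      }
    }
    for (int j = 0; j < W; j++)
      ys[i + j] = 0.5 * (coeffs[0] + u0[j] - u2[j]);
  }

  // Remainder: the original scalar recurrence for the last m % W points.
  for (; i < m; i++) {
    const double x = xs[i];
    double u0 = 0, u1 = 0, u2 = 0;
    for (int k = n; k >= 0; k--) {
      u2 = u1;
      u1 = u0;
      u0 = 2 * x * u1 - u2 + coeffs[k];
    }
    ys[i] = 0.5 * (coeffs[0] + u0 - u2);
  }
}
```

The arithmetic is the same recurrence as in `cheby_eval`, so for coefficients {1, 2, 3} and n = 2 both routines evaluate 1 + 2x + 3(2x^2 - 1). The design trade-off is that the vector width is fixed in the source instead of being left to the compiler.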