[llvm-dev] autovectorization of outer loop

Jyotirmoy Bhattacharya via llvm-dev llvm-dev at lists.llvm.org
Wed May 10 00:16:01 PDT 2017


I have the following C++ code that evaluates a Chebyshev series at m points
using Clenshaw's algorithm (coeffs holds n+1 coefficients):

void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)
{
  #pragma omp simd
  for (int i=0;i<m;i++){
    double x = xs[i];
    double u0=0,u1=0,u2=0;
    for (int k=n;k>=0;k--){
      u2 = u1;
      u1 = u0;
      u0 = 2*x*u1-u2+coeffs[k];
    }
    ys[i] = 0.5*(coeffs[0]+u0-u2);
  }
}
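
For reference, the quantity computed for each x is the sum over k = 0..n of
coeffs[k]*T_k(x). A minimal sketch of a direct evaluation, useful only for
checking the Clenshaw version on small inputs, could look like the following.
The name cheby_direct is hypothetical and not part of the code above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference routine (illustration only): evaluates
// sum_{k=0}^{n} coeffs[k]*T_k(x) directly, using the three-term recurrence
// T_0 = 1, T_1 = x, T_{k+1} = 2*x*T_k - T_{k-1}.
double cheby_direct(const double *coeffs, int n, double x)
{
  assert(n >= 0);
  double t_prev = 1.0;              // T_0(x)
  double t_cur = x;                 // T_1(x)
  double sum = coeffs[0] * t_prev;  // k = 0 term
  for (int k = 1; k <= n; k++) {
    sum += coeffs[k] * t_cur;       // add coeffs[k]*T_k(x)
    const double t_next = 2 * x * t_cur - t_prev;
    t_prev = t_cur;
    t_cur = t_next;
  }
  return sum;
}
```

Comparing its result with ys[i] from cheby_eval, up to rounding, is a quick
sanity check before and after any restructuring of the loops.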

I'm hoping the outer loop gets autovectorized, so that each iteration of the
inner loop operates on a vector of x values instead of a single scalar.
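
Written out by hand, that transformation would look roughly like the sketch
below: the outer loop is processed in blocks of W points and the inner loop
updates all W lanes together. This is an illustration under assumptions, not a
tested fix. The block width W = 4 (four doubles per 256-bit AVX2 register) and
the names cheby_eval_one and cheby_eval_blocked are hypothetical, and whether
a given clang version turns the lane loops into vector code is not guaranteed.

```cpp
#include <cassert>
#include <cmath>

// Scalar Clenshaw evaluation for a single point x, identical to the body of
// the outer loop in cheby_eval above. coeffs must hold n+1 values.
static double cheby_eval_one(const double *coeffs, int n, double x)
{
  double u0 = 0, u1 = 0, u2 = 0;
  for (int k = n; k >= 0; k--) {
    u2 = u1;
    u1 = u0;
    u0 = 2 * x * u1 - u2 + coeffs[k];
  }
  return 0.5 * (coeffs[0] + u0 - u2);
}

// Assumed vector width: 4 doubles fit one 256-bit AVX2 register.
constexpr int W = 4;

// Hypothetical hand-blocked variant of cheby_eval (illustration only).
void cheby_eval_blocked(const double *coeffs, int n,
                        const double *xs, double *ys, int m)
{
  assert(n >= 0 && m >= 0);
  int i = 0;
  // Full blocks of W points: each short j-loop is a candidate for one
  // vector instruction sequence.
  for (; i + W <= m; i += W) {
    double x2[W], u0[W], u1[W], u2[W];
    for (int j = 0; j < W; j++) {
      x2[j] = 2 * xs[i + j];  // hoist 2*x out of the inner loop
      u0[j] = u1[j] = u2[j] = 0;
    }
    for (int k = n; k >= 0; k--) {
      const double c = coeffs[k];  // same coefficient for every lane
      for (int j = 0; j < W; j++) {
        u2[j] = u1[j];
        u1[j] = u0[j];
        u0[j] = x2[j] * u1[j] - u2[j] + c;
      }
    }
    for (int j = 0; j < W; j++)
      ys[i + j] = 0.5 * (coeffs[0] + u0[j] - u2[j]);
  }
  // Remainder: fewer than W points left, fall back to the scalar routine.
  for (; i < m; i++)
    ys[i] = cheby_eval_one(coeffs, n, xs[i]);
}
```

The fixed trip count W on the lane loops is the design choice here: it removes
the dependence on m from the innermost loops, at the cost of a scalar
remainder loop.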

When compiled with

clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc

using clang++ 3.8.1-23, no vectorization happens and I get the remarks

chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds
      [-Rpass-analysis=loop-vectorize]
    ys[i] = 0.5*(coeffs[0]+u0-u2);
                 ^
chebyshev.cc:21:1: remark: loop not vectorized: value that could not be
      identified as reduction is used outside the loop
      [-Rpass-analysis=loop-vectorize]


On the same code icc vectorizes the outer loop as expected.

I was wondering if there are small ways in which I can change my code to
help LLVM's autovectorizer to succeed. I would also appreciate any pointers
to documentation or LLVM source that can help me better understand how
autovectorization of outer loops works.

Regards,
Jyotirmoy Bhattacharya

PS. The interesting part of icc's assembly output is:

..B1.4:                         # Preds ..B1.8 ..B1.3
        xorl      %r15d, %r15d                                  #14.5
        xorl      %ebx, %ebx                                    #14.21
        testq     %rsi, %rsi                                    #14.21
        vmovupd   (%rdx,%r9,8), %ymm3                           #12.16
        vxorpd    %ymm5, %ymm5, %ymm5                           #13.14
        vmovdqa   %ymm1, %ymm4                                  #13.19
        vmovdqa   %ymm1, %ymm2                                  #13.24
        jl        ..B1.8        # Prob 2%                       #14.21

..B1.5:                         # Preds ..B1.4
        vaddpd    %ymm3, %ymm3, %ymm3                           #17.14

..B1.6:                         # Preds ..B1.6 ..B1.5
        vmovapd   %ymm4, %ymm2                                  #20.3
        incq      %r15                                          #14.5
        vmovapd   %ymm5, %ymm4                                  #20.3
        vfmsub213pd %ymm2, %ymm3, %ymm5                         #17.19
        vbroadcastsd (%r11,%rbx,8), %ymm6                       #17.22
        decq      %rbx
        vaddpd    %ymm5, %ymm6, %ymm5                           #17.22
        cmpq      %r10, %r15                                    #14.5
        jb        ..B1.6        # Prob 82%                      #14.5

..B1.8:                         # Preds ..B1.6 ..B1.4
        vbroadcastsd (%rdi), %ymm3                              #19.18
        vaddpd    %ymm3, %ymm5, %ymm4                           #19.28
        vsubpd    %ymm2, %ymm4, %ymm2                           #19.31
        vmulpd    %ymm2, %ymm0, %ymm5                           #19.31
        vmovupd   %ymm5, (%rcx,%r9,8)                           #19.5
        addq      $4, %r9                                       #11.3
        cmpq      %r8, %r9                                      #11.3
        jb        ..B1.4        # Prob 82%                      #11