[cfe-dev] clang fails to vectorise the product of a complex array

Mon Jan 16 03:03:16 PST 2017

Consider this simple piece of code which takes the product of an array
of complex numbers.

#include <complex.h>
complex float f(complex float x[]) {
  complex float p = 1.0;
  for (int i = 0; i < 32; i++)
    p *= x[i];
  return p;
}

If I compile it with -O3 -march=bdver2 -ffast-math using clang 3.9.1, I
get the following unvectorised assembly:

.LCPI0_0:
        .long   1065353216              # float 1
f:                                      # @f
        vxorps  xmm1, xmm1, xmm1
        vmovss  xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        xor     eax, eax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        vmovss  xmm2, dword ptr [rdi + 8*rax] # xmm2 = mem[0],zero,zero,zero
        vmovss  xmm3, dword ptr [rdi + 8*rax + 4] # xmm3 = mem[0],zero,zero,zero
        vmulss  xmm4, xmm2, xmm1
        vmulss  xmm5, xmm3, xmm1
        vfmaddss        xmm1, xmm3, xmm0, xmm4
        vfmsubss        xmm0, xmm2, xmm0, xmm5
        inc     rax
        cmp     rax, 32
        jne     .LBB0_1
        vinsertps       xmm0, xmm0, xmm1, 16 # xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
        ret
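
For reference, the loop body above appears to correspond to the plain
scalar expansion of the complex multiply, one element per iteration. A
minimal sketch of that expansion in C follows. The name f_scalar and the
temporaries are my own, purely for illustration, and it assumes finite
inputs (which -ffast-math assumes anyway).

```c
#include <complex.h>

/* Hypothetical scalar rewrite of f: real and imaginary parts of the
   running product are kept in separate floats, as in the assembly. */
complex float f_scalar(const complex float x[]) {
  float re = 1.0f, im = 0.0f;            /* running product p = re + im*i */
  for (int i = 0; i < 32; i++) {
    float xr = crealf(x[i]);             /* real part of x[i] */
    float xi = cimagf(x[i]);             /* imaginary part of x[i] */
    float new_re = xr * re - xi * im;    /* Re(p * x[i]) */
    float new_im = xi * re + xr * im;    /* Im(p * x[i]) */
    re = new_re;
    im = new_im;
  }
  return re + im * I;
}
```

Each iteration depends on the re and im of the previous one, so there is
a single serial dependency chain through the loop.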

Am I using the wrong flags, or is this simply a missing feature at the
moment?  The target CPU is the AMD FX-8350.

As a test I also tried icc (the Intel compiler), which does appear to
give vectorised code, so it is at least possible in principle.
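
To make "possible in principle" concrete: since -ffast-math permits
reassociation, the single dependency chain through p can be split into
independent partial products, which is one way a vectoriser could
proceed. The following is only a sketch under that assumption, not what
icc or clang actually generates. The name f_split and the choice of four
accumulators are hypothetical.

```c
#include <complex.h>

/* Hypothetical hand-reassociated variant of f: four independent
   accumulators, combined once after the loop. */
complex float f_split(const complex float x[]) {
  complex float p0 = 1.0f, p1 = 1.0f, p2 = 1.0f, p3 = 1.0f;
  for (int i = 0; i < 32; i += 4) {
    p0 *= x[i];        /* elements 0, 4, 8, ...  */
    p1 *= x[i + 1];    /* elements 1, 5, 9, ...  */
    p2 *= x[i + 2];    /* elements 2, 6, 10, ... */
    p3 *= x[i + 3];    /* elements 3, 7, 11, ... */
  }
  /* Combine the partial products; the order differs from the original
     loop, so the result may differ in the last bits. */
  return (p0 * p1) * (p2 * p3);
}
```

The four accumulators do not depend on each other inside the loop, so
they could in principle be held in separate vector lanes.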

Raphael