[cfe-dev] clang fails to vectorise the product of a complex array
Raphael C via cfe-dev
cfe-dev at lists.llvm.org
Mon Jan 16 03:03:16 PST 2017
Consider this simple piece of code which takes the product of an array
of complex numbers.
#include <complex.h>

complex float f(complex float x[]) {
    complex float p = 1.0;
    for (int i = 0; i < 32; i++)
        p *= x[i];
    return p;
}
If I compile it with -O3 -march=bdver2 -ffast-math using clang 3.9.1, I get
the following unvectorised assembly:
.LCPI0_0:
        .long 1065353216 # float 1
f: # @f
        vxorps xmm1, xmm1, xmm1
        vmovss xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
        vmovss xmm2, dword ptr [rdi + 8*rax] # xmm2 = mem[0],zero,zero,zero
        vmovss xmm3, dword ptr [rdi + 8*rax + 4] # xmm3 = mem[0],zero,zero,zero
        vmulss xmm4, xmm2, xmm1
        vmulss xmm5, xmm3, xmm1
        vfmaddss xmm1, xmm3, xmm0, xmm4
        vfmsubss xmm0, xmm2, xmm0, xmm5
        inc rax
        cmp rax, 32
        jne .LBB0_1
        vinsertps xmm0, xmm0, xmm1, 16 # xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
        ret
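
For reference, the loop body above handles one complex element per
iteration using scalar instructions only (vmulss, vfmaddss, vfmsubss). A
minimal C sketch of the same arithmetic on separate real and imaginary
parts follows. The names cf_parts, cmul_step and f_scalar are hypothetical
and only serve to illustrate what the generated code computes.

```c
#include <complex.h>

/* Hypothetical helper type: a complex value split into its two floats. */
typedef struct { float re, im; } cf_parts;

/* One loop iteration: p * x, written out on real and imaginary parts. */
static cf_parts cmul_step(cf_parts p, cf_parts x) {
    cf_parts r;
    r.re = x.re * p.re - x.im * p.im;  /* vmulss xmm5, then vfmsubss xmm0 */
    r.im = x.im * p.re + x.re * p.im;  /* vmulss xmm4, then vfmaddss xmm1 */
    return r;
}

/* Same product as f(), one element at a time, starting from 1 + 0i. */
static float complex f_scalar(const float complex x[]) {
    cf_parts p = {1.0f, 0.0f};
    for (int i = 0; i < 32; i++) {
        cf_parts xi = { crealf(x[i]), cimagf(x[i]) };
        p = cmul_step(p, xi);
    }
    return p.re + p.im * I;
}
```

Each iteration depends on the p produced by the previous one, so the loop
forms a single serial chain of scalar multiplies.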
Am I using the wrong flags or is this simply a missing feature
currently? The target CPU is the AMD FX-8350.
As a test I also tried icc (the Intel Compiler), which does appear to
give vectorised code, so it is at least possible in principle.
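
To illustrate why it is possible in principle, here is a rough sketch of
one way the reduction could be restructured by hand. It is an assumption
about a possible strategy, not a description of what icc actually emits.
The function name f_lanes and the choice of four partial products are
hypothetical. Because complex multiplication is associative only up to
rounding, this reordering is the kind of change a compiler may make on its
own only under something like -ffast-math.

```c
#include <complex.h>

/* Hypothetical hand-restructured variant of f(): four independent
 * partial products that are combined after the loop. */
float complex f_lanes(const float complex x[]) {
    float complex p[4] = {1.0f, 1.0f, 1.0f, 1.0f};

    /* Lane j multiplies x[j], x[j+4], x[j+8], ... so the four chains
     * do not depend on each other within an iteration. */
    for (int i = 0; i < 32; i += 4)
        for (int j = 0; j < 4; j++)
            p[j] *= x[i + j];

    /* Final reduction of the partial products. */
    return (p[0] * p[1]) * (p[2] * p[3]);
}
```

The result can differ from f() in the last bits because the
multiplications are performed in a different order.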
Raphael