[llvm-bugs] [Bug 31857] New: Single complex addition in a loop with one iteration gives inefficient code
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Feb 3 05:31:09 PST 2017
https://llvm.org/bugs/show_bug.cgi?id=31857
Bug ID: 31857
Summary: Single complex addition in a loop with one iteration gives inefficient code
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: drraph at gmail.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Consider
#include <complex.h>

complex float f(complex float x[]) {
    complex float p = 1.0;
    for (int i = 0; i < 1; i++)
        p += 2*x[i];
    return p;
}
This code simply doubles one complex float and adds 1.
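For reference, here is a minimal sketch of the arithmetic one might hope the optimizer reduces this to, assuming x[0] is finite. The names make_cf and f_expanded are made up for illustration and are not part of the report.

```c
#include <complex.h>
#include <string.h>

/* Function from the report, unchanged apart from indentation. */
complex float f(complex float x[]) {
    complex float p = 1.0;
    for (int i = 0; i < 1; i++)
        p += 2*x[i];
    return p;
}

/* Hypothetical helper: build a complex float from its two parts.
   C99 guarantees the same layout as float[2], real part first. */
complex float make_cf(float re, float im) {
    float parts[2] = { re, im };
    complex float z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Hypothetical hand-expanded version: one multiply per component
   and one add on the real part. Assumes x[0] is finite. */
complex float f_expanded(const complex float x[]) {
    return make_cf(1.0f + 2.0f * crealf(x[0]), 2.0f * cimagf(x[0]));
}
```

For an input such as 1.5 + 2.5i both functions should return 4 + 5i; the difference under discussion is only in the code generated to get there.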
With clang trunk and -O3 -march=core-avx2 you get:
f: # @f
vmovss xmm3, dword ptr [rdi + 4] # xmm3 = mem[0],zero,zero,zero
vbroadcastss xmm0, xmm3
vmulps xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vmovss xmm2, dword ptr [rdi] # xmm2 = mem[0],zero,zero,zero
vbroadcastss xmm1, xmm2
vmovss xmm4, dword ptr [rip + .LCPI0_1] # xmm4 = mem[0],zero,zero,zero
vmulps xmm1, xmm1, xmm4
vsubps xmm4, xmm1, xmm0
vaddps xmm1, xmm1, xmm0
vblendps xmm0, xmm4, xmm1, 2 # xmm0 = xmm4[0],xmm1[1],xmm4[2,3]
vucomiss xmm4, xmm4
jnp .LBB0_3
vmovshdup xmm1, xmm1 # xmm1 = xmm1[1,1,3,3]
vucomiss xmm1, xmm1
jp .LBB0_2
.LBB0_3:
vmovss xmm1, dword ptr [rip + .LCPI0_2] # xmm1 = mem[0],zero,zero,zero
vaddps xmm0, xmm0, xmm1
ret
.LBB0_2:
push rax
vmovss xmm0, dword ptr [rip + .LCPI0_1] # xmm0 = mem[0],zero,zero,zero
vxorps xmm1, xmm1, xmm1
call __mulsc3
add rsp, 8
jmp .LBB0_3
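Read as C, the listing above appears to compute the naive complex product of a constant with x[0], test the result for NaN, and call __mulsc3 only when both the real and imaginary parts are NaN, before adding 1. A rough sketch of that shape follows. It assumes the constant operand is 2 + 0i (the .LCPI0_* values are not shown in the listing), and make_cf, mul_checked and f_sketch are hypothetical names. The slow path here uses the built-in * operator as a stand-in for the direct __mulsc3 call.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: build a complex float from its two parts.
   C99 guarantees the same layout as float[2], real part first. */
complex float make_cf(float re, float im) {
    float parts[2] = { re, im };
    complex float z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Hypothetical sketch of (a + bi) * x with a NaN-guarded slow path. */
complex float mul_checked(float a, float b, complex float x) {
    float c = crealf(x), d = cimagf(x);
    float re = a * c - b * d;   /* naive real part */
    float im = a * d + b * c;   /* naive imaginary part */
    if (isnan(re) && isnan(im)) {
        /* Slow path: in the listing this is the call to __mulsc3. */
        return make_cf(a, b) * x;
    }
    return make_cf(re, im);     /* fast path: no library call */
}

/* Assumed constants: multiply by 2 + 0i, then add 1 to the real part. */
complex float f_sketch(const complex float x[]) {
    return mul_checked(2.0f, 0.0f, x[0]) + 1.0f;
}
```

The multiplications by the zero imaginary part and the NaN checks are the work that the Intel output avoids.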
Using the Intel Compiler with -O3 -march=core-avx2 -fp-model strict you get:
f:
vmovsd xmm0, QWORD PTR [rdi] #5.12
vmulps xmm2, xmm0, XMMWORD PTR .L_2il0floatpacket.1[rip] #5.12
vmovsd xmm1, QWORD PTR p.152.0.0.1[rip] #3.19
vaddps xmm0, xmm1, xmm2 #5.5
ret
as expected.
The -fp-model strict option tells the Intel compiler to use only value-safe
optimizations when implementing floating-point calculations and enables
floating-point exception semantics. It also turns off fused multiply-add
contraction, which might not be relevant here.
If you turn on -ffast-math in clang trunk you get much better, although still
not ideal, code:
f: # @f
vmovss xmm0, dword ptr [rdi] # xmm0 = mem[0],zero,zero,zero
vmovss xmm1, dword ptr [rdi + 4] # xmm1 = mem[0],zero,zero,zero
vaddss xmm1, xmm1, xmm1
vmovss xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = mem[0],zero,zero,zero
vfmadd213ss xmm2, xmm0, dword ptr [rip + .LCPI0_1]
vinsertps xmm0, xmm2, xmm1, 16 # xmm0 = xmm2[0],xmm1[0],xmm2[2,3]
ret