[llvm-bugs] [Bug 31857] New: Single complex addition in a loop with one iteration gives inefficient code

Fri Feb 3 05:31:09 PST 2017

https://llvm.org/bugs/show_bug.cgi?id=31857

            Bug ID: 31857
           Summary: Single complex addition in a loop with one iteration
                    gives inefficient code
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: drraph at gmail.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Consider

#include <complex.h>
complex float f(complex float x[]) {
  complex float p = 1.0;
  for (int i = 0; i < 1; i++)
    p += 2*x[i];
  return p;
}

This code simply doubles one complex float and adds 1.
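For reference, here is a minimal sketch of the straight-line code one would hope the loop reduces to. The names f_loop and f_direct are hypothetical and used only for this illustration; f_direct relies on the C99 guarantee that a complex float has the same layout as an array of two floats.

```c
#include <complex.h>
#include <string.h>

/* The loop from the report, unchanged apart from the name. */
float complex f_loop(float complex x[]) {
  float complex p = 1.0;
  for (int i = 0; i < 1; i++)
    p += 2*x[i];
  return p;
}

/* Hypothetical hand-written equivalent: one multiply and one add
   per component, with no NaN check and no library call. */
float complex f_direct(const float complex x[]) {
  float part[2];                      /* part[0] = real, part[1] = imaginary */
  memcpy(part, &x[0], sizeof part);
  part[0] = 1.0f + 2.0f * part[0];    /* real part of p is 1 */
  part[1] = 0.0f + 2.0f * part[1];    /* imaginary part of p is +0; kept so signed zeros match */
  float complex r;
  memcpy(&r, part, sizeof r);
  return r;
}
```

For finite inputs the two functions should return the same value, for example 7 + 8i for x[0] = 3 + 4i.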

With clang trunk and -O3 -march=core-avx2 you get:

f:                                      # @f
        vmovss  xmm3, dword ptr [rdi + 4] # xmm3 = mem[0],zero,zero,zero
        vbroadcastss    xmm0, xmm3
        vmulps  xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vmovss  xmm2, dword ptr [rdi]   # xmm2 = mem[0],zero,zero,zero
        vbroadcastss    xmm1, xmm2
        vmovss  xmm4, dword ptr [rip + .LCPI0_1] # xmm4 = mem[0],zero,zero,zero
        vmulps  xmm1, xmm1, xmm4
        vsubps  xmm4, xmm1, xmm0
        vaddps  xmm1, xmm1, xmm0
        vblendps        xmm0, xmm4, xmm1, 2 # xmm0 = xmm4[0],xmm1[1],xmm4[2,3]
        vucomiss        xmm4, xmm4
        jnp     .LBB0_3
        vmovshdup       xmm1, xmm1      # xmm1 = xmm1[1,1,3,3]
        vucomiss        xmm1, xmm1
        jp      .LBB0_2
.LBB0_3:
        vmovss  xmm1, dword ptr [rip + .LCPI0_2] # xmm1 = mem[0],zero,zero,zero
        vaddps  xmm0, xmm0, xmm1
        ret
.LBB0_2:
        push    rax
        vmovss  xmm0, dword ptr [rip + .LCPI0_1] # xmm0 = mem[0],zero,zero,zero
        vxorps  xmm1, xmm1, xmm1
        call    __mulsc3
        add     rsp, 8
        jmp     .LBB0_3
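My reading of the listing above (an interpretation, not something confirmed by the clang developers) is that 2 is promoted to the complex value 2 + 0i, a full complex multiplication is performed, and the result is checked for NaN in both components before falling back to the runtime routine __mulsc3. A rough C sketch of that shape follows; mul_with_nan_check and make_complex are hypothetical names, and the ordinary complex multiply in the slow path only stands in for the __mulsc3 call.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical helper: build re + im*i without doing any arithmetic. */
static float complex make_complex(float re, float im) {
  float part[2] = { re, im };
  float complex z;
  memcpy(&z, part, sizeof z);
  return z;
}

/* Sketch of (a + b*i) * (c + d*i) in the shape the listing suggests. */
float complex mul_with_nan_check(float a, float b, float c, float d) {
  float re = a * c - b * d;           /* the vsubps lane */
  float im = a * d + b * c;           /* the vaddps lane */
  /* x != x holds only for NaN; this mirrors vucomiss x, x + jp/jnp. */
  if (re != re && im != im) {
    /* Slow path: stand-in for the call to __mulsc3. */
    return make_complex(a, b) * make_complex(c, d);
  }
  return make_complex(re, im);
}
```

With a = 2 and b = 0, as in the report, the fast path is just a doubling of each component, which is why the extra multiplies, the NaN test and the call look avoidable here.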

Using the Intel Compiler with -O3 -march=core-avx2 -fp-model strict you get:

f:
        vmovsd    xmm0, QWORD PTR [rdi]                         #5.12
        vmulps    xmm2, xmm0, XMMWORD PTR .L_2il0floatpacket.1[rip] #5.12
        vmovsd    xmm1, QWORD PTR p.152.0.0.1[rip]              #3.19
        vaddps    xmm0, xmm1, xmm2                              #5.5
        ret    

as expected.

The -fp-model strict option tells the Intel compiler to restrict itself to
value-safe optimizations when implementing floating-point calculations and
enables floating-point exception semantics. It also turns off fused
multiply-add contraction, though that may not be relevant here.

If you turn on -ffast-math in clang trunk you get much better, although still
not ideal, code:

f:                                      # @f
        vmovss  xmm0, dword ptr [rdi]   # xmm0 = mem[0],zero,zero,zero
        vmovss  xmm1, dword ptr [rdi + 4] # xmm1 = mem[0],zero,zero,zero
        vaddss  xmm1, xmm1, xmm1
        vmovss  xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = mem[0],zero,zero,zero
        vfmadd213ss     xmm2, xmm0, dword ptr [rip + .LCPI0_1]
        vinsertps       xmm0, xmm2, xmm1, 16 # xmm0 = xmm2[0],xmm1[0],xmm2[2,3]
        ret
