[LLVMbugs] [Bug 21975] New: recognize dot products [x86/SSE: dpps, dppd]

Thu Dec 18 15:11:18 PST 2014

http://llvm.org/bugs/show_bug.cgi?id=21975

            Bug ID: 21975
           Summary: recognize dot products [x86/SSE: dpps, dppd]
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: spatel+llvm at rotateright.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

This probably qualifies as a Heroic / Stupid Compiler Trick, but we should
recognize dot product operations and generate the specialized SSE4.1
instructions ('dpps' / 'dppd', or their AVX forms 'vdpps' / 'vdppd') for them.

I just noticed this pattern in test-suite/MultiSource/Benchmarks/Bullet.

'dpps' probably doesn't execute any faster than a sequence using horizontal
vector adds on any recent hardware, but it is smaller code at least.

$ cat dpps.c 
float dpps(float *v1, float *v2) {
    float mul0 = v1[0] * v2[0];
    float mul1 = v1[1] * v2[1];
    float mul2 = v1[2] * v2[2];
    float mul3 = v1[3] * v2[3];
    return mul0 + mul1 + mul2 + mul3;
}

$ ./clang -O2 -ffast-math -march=corei7-avx dpps.c -S -o -
...
    vmovss    (%rdi), %xmm0
    vmovss    4(%rdi), %xmm1
    vmulss    (%rsi), %xmm0, %xmm0
    vmulss    4(%rsi), %xmm1, %xmm1
    vmovss    8(%rdi), %xmm2
    vmulss    8(%rsi), %xmm2, %xmm2
    vmovss    12(%rdi), %xmm3
    vmulss    12(%rsi), %xmm3, %xmm3
    vaddss    %xmm1, %xmm0, %xmm0
    vaddss    %xmm2, %xmm0, %xmm0
    vaddss    %xmm3, %xmm0, %xmm0
    popq    %rbp
    retq

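This could be (as icc 15 does), where the immediate $241 (0xF1) multiplies all
four lanes and writes the sum to the lowest lane only: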
        vmovups   (%rsi), %xmm0
        vmovups   (%rdi), %xmm1
        vdpps     $241, %xmm1, %xmm0, %xmm0
        ret
