<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - recognize dot products [x86/SSE: dpps, dppd]"

   href="http://llvm.org/bugs/show_bug.cgi?id=21975">21975</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>recognize dot products [x86/SSE: dpps, dppd]

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>spatel+llvm@rotateright.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

<pre>This probably qualifies as a Heroic / Stupid Compiler Trick, but we should
recognize dot-product operations and generate the specialized SSE4.1
instructions for them ('dpps' / 'dppd', or 'vdpps' / 'vdppd' with AVX).

I noticed this pattern in test-suite/MultiSource/Benchmarks/Bullet.

'dpps' probably doesn't execute any faster than a sequence of horizontal
vector adds on recent hardware, but it does produce smaller code.
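
To make the 'dpps' immediate concrete, here is a plain-C model of the
instruction, written from Intel's documented semantics: bits 7:4 of imm8
choose which lane products enter the sum, bits 3:0 choose which result lanes
receive it (the others become +0.0), and the sum is formed pairwise as
(p0 + p1) + (p2 + p3). The source in dpps.c adds left to right,
((mul0 + mul1) + mul2) + mul3, so the two orders can round differently;
that is presumably why the transform would need -ffast-math (reassociation).
dpps_model is a hypothetical helper for illustration, not a compiler or LLVM API.

```c
/* Hypothetical helper: models what SSE4.1 'dpps' computes for a given
 * 8-bit immediate, following Intel's documented semantics. */
static void dpps_model(const float a[4], const float b[4], unsigned imm8,
                       float out[4]) {
    float p[4];

    /* Bits 7:4 select which lane products are included; excluded lanes
     * contribute +0.0. */
    for (int i = 0; i != 4; i++)
        p[i] = (imm8 & (0x10u << i)) ? a[i] * b[i] : 0.0f;

    /* Pairwise summation order used by the instruction. */
    float sum = (p[0] + p[1]) + (p[2] + p[3]);

    /* Bits 3:0 select which result lanes receive the sum; the rest are +0.0. */
    for (int i = 0; i != 4; i++)
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}
```

With imm8 = 0xF1 (241), all four products are summed and only lane 0 is
written, which is exactly the scalar dot product dpps.c returns.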

$ cat dpps.c
float dpps(float *v1, float *v2) {
    float mul0 = v1[0] * v2[0];
    float mul1 = v1[1] * v2[1];
    float mul2 = v1[2] * v2[2];
    float mul3 = v1[3] * v2[3];
    return mul0 + mul1 + mul2 + mul3;
}
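
For comparison, here is a hand-written intrinsics sketch of the two lowerings
discussed above: a single 'dpps', and a vector multiply followed by two
horizontal adds. Both compute (p0 + p1) + (p2 + p3). This assumes GCC or
Clang on x86 (the `target` function attribute lets the SSE3/SSE4.1
intrinsics compile without -msse4.1); on other targets the block compiles to
nothing. dot4_dpps and dot4_hadd are hypothetical names, not part of any
library.

```c
#if defined(__x86_64__) || defined(__i386__)
#include "immintrin.h"

/* Sketch 1: one SSE4.1 dpps. imm8 0xF1 = use all four products,
 * write the sum to lane 0 only. */
__attribute__((target("sse4.1")))
static float dot4_dpps(const float *v1, const float *v2) {
    __m128 a = _mm_loadu_ps(v1);
    __m128 b = _mm_loadu_ps(v2);
    return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xF1));
}

/* Sketch 2: vector multiply plus two SSE3 horizontal adds. */
__attribute__((target("sse3")))
static float dot4_hadd(const float *v1, const float *v2) {
    __m128 p = _mm_mul_ps(_mm_loadu_ps(v1), _mm_loadu_ps(v2));
    p = _mm_hadd_ps(p, p);   /* {p0+p1, p2+p3, p0+p1, p2+p3} */
    p = _mm_hadd_ps(p, p);   /* every lane = (p0+p1)+(p2+p3) */
    return _mm_cvtss_f32(p);
}
#endif
```

Because the `target` attribute only affects code generation, callers should
still confirm CPU support at run time (for example with GCC/Clang's
__builtin_cpu_supports("sse4.1")) before calling these functions.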

$ ./clang -O2 -ffast-math -march=corei7-avx dpps.c -S -o -
...
    vmovss    (%rdi), %xmm0
    vmovss    4(%rdi), %xmm1
    vmulss    (%rsi), %xmm0, %xmm0
    vmulss    4(%rsi), %xmm1, %xmm1
    vmovss    8(%rdi), %xmm2
    vmulss    8(%rsi), %xmm2, %xmm2
    vmovss    12(%rdi), %xmm3
    vmulss    12(%rsi), %xmm3, %xmm3
    vaddss    %xmm1, %xmm0, %xmm0
    vaddss    %xmm2, %xmm0, %xmm0
    vaddss    %xmm3, %xmm0, %xmm0
    popq    %rbp
    retq

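This could instead be the following, as generated by icc 15. The immediate
$241 (0xF1) tells dpps to multiply all four lane pairs and write the sum to
lane 0 only:

        vmovups   (%rsi), %xmm0
        vmovups   (%rdi), %xmm1
        vdpps     $241, %xmm1, %xmm0, %xmm0
        ret</pre>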

        </div>

      </p>


    </body>

</html>