<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [x86, SSE] phaddw / phaddd wrongly generated"
   href="https://llvm.org/bugs/show_bug.cgi?id=26859">26859</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[x86, SSE] phaddw / phaddd wrongly generated
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>spatel+llvm@rotateright.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Packed horizontal add - phaddd / phaddw:
These are SSSE3 (yes, 3 Ss) instructions. They should probably be generated
only to save on code size, never for performance: they're just about
guaranteed to be slow because they add adjacent elements within a vector
instead of operating lane by lane.

Here we're not only generating them, but generating them for a target that
doesn't have SSSE3 (only -msse is passed):

$ cat accum.c 
int please_no_phaddd(int *x) {
  int sum = 0;
  for (int i=0; i<1024; ++i)
    sum += x[i];
  return sum;
}

short please_no_phaddw(short *x) {
  short sum = 0;
  for (int i=0; i<1024; ++i)
    sum += x[i];
  return sum;
}

bin $ ./clang -O2 -S -o - accum.c -msse -fno-unroll-loops | grep phadd
    .globl    _please_no_phaddd
_please_no_phaddd:                      ## @please_no_phaddd
    phaddd    %xmm1, %xmm1
    .globl    _please_no_phaddw
_please_no_phaddw:                      ## @please_no_phaddw
    phaddw    %xmm0, %xmm0</pre>
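        <pre>For comparison, here is a minimal sketch of the kind of final reduction that
needs no SSSE3 at all: two shuffle + add steps built from SSE2 intrinsics,
which should typically lower to pshufd / paddd. This is an illustration, not a
proposed patch. The function names are hypothetical, and the loop helper
assumes n is a multiple of 4.

```c
#include "emmintrin.h"  /* SSE2 intrinsics (also provides _MM_SHUFFLE) */

/* Sum the four 32-bit lanes of v using only SSE2 shuffles and adds. */
int hsum_epi32_sse2(__m128i v) {
  /* Swap the 64-bit halves: lanes [a b c d] -> [c d a b]. */
  __m128i hi = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
  __m128i sum2 = _mm_add_epi32(v, hi);      /* lane 0 = a+c, lane 1 = b+d */
  /* Swap adjacent lanes so lane 0 is paired with lane 1. */
  __m128i odd = _mm_shuffle_epi32(sum2, _MM_SHUFFLE(2, 3, 0, 1));
  __m128i sum4 = _mm_add_epi32(sum2, odd);  /* lane 0 = a+b+c+d */
  return _mm_cvtsi128_si32(sum4);           /* extract lane 0 */
}

/* Hypothetical helper: sum n ints; n is assumed to be a multiple of 4. */
int sum_ints_sse2(const int *x, int n) {
  __m128i acc = _mm_setzero_si128();
  for (int i = 0; i < n; i += 4)
    acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(x + i)));
  return hsum_epi32_sse2(acc);
}
```

With lanes holding 1, 2, 3, 4, hsum_epi32_sse2 returns 10. The same
halve-and-add pattern extends to 16-bit elements with one more step.</pre>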
        </div>
      </p>
    </body>
</html>