<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Compiler no longer optimizing 1 horizontal add/sub patterns after r310632"
   href="https://bugs.llvm.org/show_bug.cgi?id=34725">34725</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Compiler no longer optimizing 1 horizontal add/sub patterns after r310632
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>douglas_yung@playstation.sony.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>After upstream commit r310632, the compiler is no longer generating a
horizontal add/sub instruction for a particular add/sub + shuffle pattern.
There may be additional affected patterns that might be masked by the changes
made previously in r310260.

Consider this example (also fails if you use subtraction instead of addition):

__attribute__((noinline))
__m128 add_ps_002(__m128 a, __m128 b) {
  __m128 r = (__m128){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return __builtin_shufflevector(r, a, 0, -1, 2, 3);
}

When compiling with "-S -O2 -march=bdver2", prior to r310632, the compiler
would produce the following assembly for this function:

vmovaps (%rcx), %xmm0
vhaddps (%rdx), %xmm0, %xmm0
retq

But after r310632, the compiler is now producing the following assembly:

vmovaps (%rdx), %xmm0
vmovaps (%rcx), %xmm1
vinsertps       $28, %xmm0, %xmm1, %xmm2 # xmm2 = xmm1[0],xmm0[0],zero,zero
vinsertps       $76, %xmm1, %xmm0, %xmm1 # xmm1 = xmm1[1],xmm0[1],zero,zero
vaddps  %xmm1, %xmm2, %xmm1
vpermilpd       $1, %xmm0, %xmm2 # xmm2 = xmm0[1,0]
vpermilps       $231, %xmm0, %xmm0 # xmm0 = xmm0[3,1,2,3]
vaddss  %xmm0, %xmm2, %xmm0
vpermilps       $212, %xmm1, %xmm1 # xmm1 = xmm1[0,1,1,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,1,2],xmm0[0]
retq</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>