<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86][SSE] Memory fold scalar unary ops with zero register passthrough"
   href="https://bugs.llvm.org/show_bug.cgi?id=40359">40359</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86][SSE] Memory fold scalar unary ops with zero register passthrough
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>llvm-dev@redking.me.uk
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>andrea.dibiagio@gmail.com, craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre><a href="https://godbolt.org/z/uPs8BV">https://godbolt.org/z/uPs8BV</a>

We currently do this:

  vmovss (%rdi), %xmm0 # xmm0 = mem[0],zero,zero,zero
  vmovss (%rsi), %xmm1 # xmm1 = mem[0],zero,zero,zero
  vsqrtss %xmm0, %xmm0, %xmm0
  vsqrtss %xmm1, %xmm1, %xmm1
  vaddss %xmm1, %xmm0, %xmm0
  retq

but we could fold the loads into the sqrt ops and pass a zeroed register
through instead. This reduces register pressure when the zero register has
multiple uses, and even when the register isn't reused there's no regression:

  vxorps %xmm1, %xmm1, %xmm1
  vsqrtss (%rdi), %xmm1, %xmm0
  vsqrtss (%rsi), %xmm1, %xmm1
  vaddss %xmm1, %xmm0, %xmm0
  retq

This is mainly about AVX-encoded instructions, but I can't see any reason not
to do this for the older SSE instructions as well.</pre>
        </div>
      </p>


    </body>
</html>