<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86] FastSHLD optimization doesn't work well with TwoAddressInstruction pass"
   href="https://bugs.llvm.org/show_bug.cgi?id=41055">41055</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86] FastSHLD optimization doesn't work well with TwoAddressInstruction pass
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>craig.topper@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>We have a special case to use a SHLD for rotate by immediate on Sandybridge and
Ivybridge. This avoids a dependency on previous flag writers due to the strange
behavior of the flags for rotate.  The hardware on these CPUs has special
support for this idiom, but it requires that the 2 registers used by the SHLD
are the same physical register.
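
As a rough illustration of why SHLD can stand in for a rotate, here is a small
Python model. It is not LLVM code, and the function names are made up for this
sketch. It assumes 32-bit unsigned operands and an immediate count from 1 to
31, and it ignores flags entirely. With both operands equal, the double shift
yields the same value as a rotate left.

```python
WIDTH = 32
MOD = 2 ** WIDTH  # values are treated as unsigned 32-bit integers


def shld(dst, src, count):
    """Model SHLD dst, src, count: shift dst left, fill low bits from the top of src."""
    if count not in range(1, WIDTH):
        raise ValueError("this sketch only models counts 1..31")
    shifted = (dst * 2 ** count) % MOD    # dst shifted left, truncated to 32 bits
    filled = src // 2 ** (WIDTH - count)  # top `count` bits of src
    return shifted + filled               # the two parts never overlap


def rotl(value, count):
    """Reference rotate left, written independently of shld()."""
    bits = format(value, "032b")
    return int(bits[count:] + bits[:count], 2)
```

For example, shld(0x80000001, 0x80000001, 1) and rotl(0x80000001, 1) both
give 3.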

Unfortunately, the TwoAddressInstruction pass has to insert a COPY in front of
the SHLD to leave SSA form. This COPY will then become the input to the tied
src/dest pair. It will not be used for the other source. If the coalescer
doesn't remove this copy and the register allocator chooses a different
register for the input and output of the copy, then we will not have the same
physical registers for the SHLD instruction. So the hardware optimization won't
trigger.

The easiest way I can see to overcome this is to create a pseudo instruction in
X86 that will make the TwoAddressInstruction pass see only a single source. We can
expand it after register allocation to SHLD with the same source used twice.</pre>
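<pre>The problem and the proposed fix can be sketched with a toy Python model. This
is not LLVM code: the opcode names "SHLD" and "ROT_VIA_SHLD", the tuple layout,
the virtual register names, and all function names are hypothetical and exist
only for illustration.

```python
def lower_two_address_shld(src_vreg, dst_vreg, imm):
    """Toy version of the situation in the report: a COPY feeds only the tied operand."""
    return [
        ("COPY", dst_vreg, src_vreg),
        # operands: tied dest/src, other source, immediate
        ("SHLD", dst_vreg, src_vreg, imm),
    ]


def expand_rotate_pseudo(instr):
    """Expand (ROT_VIA_SHLD, reg, imm) after register allocation into SHLD reg, reg, imm."""
    opcode, reg, imm = instr
    if opcode != "ROT_VIA_SHLD":
        raise ValueError("not the hypothetical rotate pseudo")
    return ("SHLD", reg, reg, imm)


def uses_same_register(shld_instr, assignment):
    """True if both SHLD register operands map to one physical register."""
    _, tied, other, _ = shld_instr
    return assignment.get(tied, tied) == assignment.get(other, other)
```

If the allocator maps v0 to eax and v1 to ecx, uses_same_register() is False for
the SHLD produced by lower_two_address_shld("v0", "v1", 5), which is the case
where the hardware idiom is missed. The SHLD produced by expand_rotate_pseudo()
names one register twice by construction, so the check always holds.
</pre>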
        </div>
      </p>


    </body>
</html>