<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Combining unary integer shuffles to PSHUFB"
   href="https://llvm.org/bugs/show_bug.cgi?id=26183">26183</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Combining unary integer shuffles to PSHUFB
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>llvm-dev@redking.me.uk
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>As discussed on D14901.

Currently the threshold for combining separate unary integer shuffles (PSHUFD,
PSHUFLW, PSHUFHW, etc.) into a single PSHUFB is 3 instructions.

On many targets (even older ones like Wolfdale and Nehalem), PSHUFB throughput
is good enough that combining could be beneficial for just 2 instructions
(although the cost of loading the shuffle mask is an additional consideration).

However, other targets (Atom and many AMD targets) would be better served by
the current threshold or a higher one.

Are we happy with the current threshold of 3 instructions?

Would it make sense to either use a feature flag to indicate fast/slow PSHUFB
performance or attempt to make use of the scheduler model to determine when to
combine to PSHUFB?

Should we be looking to use the scheduler model for more combine decisions?</pre>
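        <pre>To make the trade-off concrete, here is a minimal sketch of a per-target
decision instead of a fixed threshold. It is written in Python for brevity and
is not LLVM code: TargetModel, should_combine_to_pshufb, min_chain_length and
all cost numbers are hypothetical placeholders, not measured latencies or
throughputs for any real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetModel:
    """Hypothetical per-target costs in arbitrary units (not measured data)."""
    unary_shuffle_cost: float  # one PSHUFD / PSHUFLW / PSHUFHW
    pshufb_cost: float         # one PSHUFB
    mask_load_cost: float      # loading the PSHUFB shuffle mask constant


def should_combine_to_pshufb(num_shuffles: int, target: TargetModel) -> bool:
    """Combine only if the shuffle chain costs more than PSHUFB plus its mask load."""
    chain_cost = num_shuffles * target.unary_shuffle_cost
    combined_cost = target.pshufb_cost + target.mask_load_cost
    return chain_cost > combined_cost


def min_chain_length(target: TargetModel, limit: int = 8):
    """Smallest chain length worth combining on this target, or None within the limit."""
    for n in range(1, limit + 1):
        if should_combine_to_pshufb(n, target):
            return n
    return None


# Made-up cost figures, chosen only to illustrate the two cases discussed above.
fast_pshufb = TargetModel(unary_shuffle_cost=1.0, pshufb_cost=1.0, mask_load_cost=0.5)
slow_pshufb = TargetModel(unary_shuffle_cost=1.0, pshufb_cost=4.0, mask_load_cost=0.5)

print(min_chain_length(fast_pshufb))  # 2
print(min_chain_length(slow_pshufb))  # 5
```

With these placeholder numbers the effective threshold drops to 2 on the
"fast PSHUFB" model and rises to 5 on the "slow PSHUFB" model. A feature flag
would amount to picking between two such fixed thresholds, whereas a scheduler
model would supply the cost fields per target.</pre>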
        </div>
      </p>
    </body>
</html>