<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [X86] Suboptimal code in vXi8 vector multiply reduction"
   href="https://bugs.llvm.org/show_bug.cgi?id=39709">39709</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[X86] Suboptimal code in vXi8 vector multiply reduction
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>craig.topper@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Multiplying vXi8 vectors requires widening the elements to 16 bits so we can
use the vXi16 pmullw, then shrinking the results back to i8. As of r347240 we use
punpcklbw/punpckhbw to do the expansion, which creates undef upper elements, and
we use an AND+PACKUS to merge the high and low unpacked values back together
after the two pmullw.
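
As a rough illustration of the sequence described above, here is a scalar
Python model of one v16i8 multiply. The helper names and the 0xAB filler that
stands in for the undef upper bytes are made up for this sketch; this is not
LLVM code, only a model of the lane arithmetic.

```python
JUNK = 0xAB  # arbitrary stand-in for the undef upper byte of each word


def unpack_lo(v):
    # punpcklbw-like: low 8 bytes become the low byte of 8 words
    return [b + JUNK * 256 for b in v[:8]]


def unpack_hi(v):
    # punpckhbw-like: high 8 bytes become the low byte of 8 words
    return [b + JUNK * 256 for b in v[8:]]


def pmullw(a, b):
    # 16-bit multiply keeping the low 16 bits of each product
    return [(x * y) % 65536 for x, y in zip(a, b)]


def mask_and_pack(lo, hi):
    # AND with 0x00FF, then PACKUS; after the mask, saturation never triggers
    return [w % 256 for w in lo + hi]


def mul_v16i8(a, b):
    lo = pmullw(unpack_lo(a), unpack_lo(b))
    hi = pmullw(unpack_hi(a), unpack_hi(b))
    return mask_and_pack(lo, hi)
```

The junk upper bytes do not matter because the low 8 bits of each 16-bit
product depend only on the low 8 bits of the inputs.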

When we're doing a horizontal reduction we end up packing after each step and
then unpacking at the start of the next step. It would be great if we could
combine these size changes away.

Some of the packs and unpacks are separated by shuffles that move data from
higher elements to lower elements to do the reduction. We should see if we can
handle widening those element-movement shuffles as well.
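
To make the idea concrete, here is a small Python model of the reduction in
both shapes: narrowing after every step, and widening once while shuffling on
the 16-bit lanes. All function names are hypothetical, the vector length is
assumed to be a power of two, and 0xAB again stands in for undef upper bytes.

```python
JUNK = 0xAB  # arbitrary stand-in for the undef upper byte of each word


def widen(v):
    # unpack-like: each byte becomes the low byte of a 16-bit lane
    return [b + JUNK * 256 for b in v]


def narrow(w):
    # AND+PACKUS-like: keep only the low byte of each 16-bit lane
    return [x % 256 for x in w]


def mul16(a, b):
    # pmullw-like: low 16 bits of each lane product
    return [(x * y) % 65536 for x, y in zip(a, b)]


def reduce_repacking(v):
    # Models the current shape: every step widens, multiplies, narrows.
    width = len(v)
    while width != 1:
        width //= 2
        shuffled = v[width:] + v[:width]  # move upper elements down
        v = narrow(mul16(widen(v), widen(shuffled)))
    return v[0]


def reduce_wide(v):
    # Models the proposed shape: widen once, shuffle the wide lanes,
    # narrow once at the end.
    w = widen(v)
    width = len(w)
    while width != 1:
        width //= 2
        shuffled = w[width:] + w[:width]  # same shuffle, on 16-bit lanes
        w = mul16(w, shuffled)
    return w[0] % 256
```

Both functions return the same i8 product, which is why the intermediate
packs and unpacks should be removable in principle.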

These things can be seen in vector-reduce-mul.ll</pre>
        </div>
      </p>


    </body>
</html>