<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - AArch32 and AArch64 fail to generate vsubl from intrinsics"
   href="https://bugs.llvm.org/show_bug.cgi?id=40025">40025</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>AArch32 and AArch64 fail to generate vsubl from intrinsics
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>resistor@mac.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>For certain quantized neural network kernels in the QNNPACK library
(<a href="https://github.com/pytorch/QNNPACK">https://github.com/pytorch/QNNPACK</a>), LLVM fails to generate vsubl instructions
from the corresponding vsubl intrinsics. This causes up to a 2x slowdown
compared with hand-written assembly.

The basic symptom is that each vsubl intrinsic is split into separate extends
plus a sub.  One of the extends is loop invariant, so LICM hoists it out of
the loop.  This is counterproductive.  Instruction selection works one basic
block at a time, so the hoisted extend can no longer be folded back into a
vsubl.  The other operand therefore has to be explicitly extended inside the
loop instead of being implicitly widened by the vsubl itself.
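
The Python model below illustrates the effect.  It assumes the unsigned variant vsubl_u8 (8 x uint8 in, 8 x uint16 out) and a loop-invariant right-hand operand such as a broadcast zero point.  It models only lane values and a made-up per-iteration operation count.  The helper names are hypothetical and do not correspond to LLVM IR or to QNNPACK code.  The model shows that the split is legal, because the values match, but that it costs more once the invariant extend is hoisted.

```python
# Scalar model of NEON vsubl_u8 (8 x uint8 in, 8 x uint16 out).
# Hypothetical helpers for illustration; values only, not LLVM IR or QNNPACK code.

LANES = 8
U16_MOD = 65536  # uint16 lanes wrap modulo 2**16


def vsubl_u8(a, b):
    """Fused widening subtract: widen both uint8 operands and subtract in one op."""
    return [(x - y) % U16_MOD for x, y in zip(a, b)]


def zext_u8_to_u16(v):
    """Explicit zero-extend: lane values are unchanged, only the lane width grows."""
    return list(v)


def sub_u16(a16, b16):
    """Plain 16-bit lane-wise subtract with wraparound."""
    return [(x - y) % U16_MOD for x, y in zip(a16, b16)]


def loop_fused(rows, b):
    """The loop as written with intrinsics: one vsubl per iteration."""
    out, ops = [], 0
    for a in rows:
        out.append(vsubl_u8(a, b))
        ops += 1  # one widening subtract
    return out, ops


def loop_split_hoisted(rows, b):
    """After splitting into extends + sub and hoisting the invariant extend."""
    b16 = zext_u8_to_u16(b)  # loop invariant, so LICM moves it out of the loop
    out, ops = [], 1  # the hoisted extend runs once
    for a in rows:
        a16 = zext_u8_to_u16(a)  # the other operand now needs its own extend
        out.append(sub_u16(a16, b16))
        ops += 2  # extend + plain subtract per iteration
    return out, ops


rows = [[10, 0, 255, 7, 1, 2, 3, 4], [0, 0, 0, 0, 9, 9, 9, 9]]
zero_point = [5] * LANES  # hypothetical loop-invariant RHS
fused, fused_ops = loop_fused(rows, zero_point)
split, split_ops = loop_split_hoisted(rows, zero_point)
print(fused == split, fused_ops, split_ops)  # True 2 5
```

Both loops compute the same values.  What changes is the per-iteration work: each iteration now runs an extend plus a subtract instead of one widening subtract.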

I'm attaching two reduced test cases.  singleuse.cpp is the simpler one, with
only a single vsubl inside the loop.  For it, a profitable CodeGenPrepare rule
that reverses the LICM is relatively easy to write using hasOneUse checks.
multiuser.cpp is a more realistic example.  Multiple vsubl calls use the same
RHS operand inside nested control flow within the loop, so the hoisted extend
has several users.  Handling this case properly isn't trivial.</pre>
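<pre>The sketch below models the single-use rule on a made-up toy IR in Python rather than LLVM's C++ API.  Inst, add_use, should_sink_extend and sink are all hypothetical names.  It assumes the rule looks for a hoisted zext/sext whose only user is a sub in another block, where the sub's other operand is a matching extend local to that block.  The real profitability checks in CodeGenPrepare may differ.  The two example graphs only roughly follow the shapes described for singleuse.cpp and multiuser.cpp.

```python
# Toy IR model of a hasOneUse-style "reverse the LICM" rule.
# Hypothetical names throughout; this is not LLVM's C++ API or CodeGenPrepare code.
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: instructions form a cyclic use graph
class Inst:
    name: str
    opcode: str  # "zext", "sext" or "sub" in this sketch
    block: str  # name of the basic block holding the instruction
    operands: list = field(default_factory=list)
    users: list = field(default_factory=list)


def add_use(user, operand):
    """Record that `user` reads `operand` (keeps def-use links in sync)."""
    user.operands.append(operand)
    operand.users.append(user)


def should_sink_extend(ext):
    """Return True only for the single-use shape.

    `ext` must have exactly one user: a sub in another block whose other
    operand is the same kind of extend defined in that block.  Sinking `ext`
    would then let per-block instruction selection see both extends next to
    the sub.
    """
    if ext.opcode not in ("zext", "sext") or len(ext.users) != 1:
        return False
    user = ext.users[0]
    if user.opcode != "sub" or user.block == ext.block:
        return False
    others = [op for op in user.operands if op is not ext]
    return (len(others) == 1
            and others[0].opcode == ext.opcode
            and others[0].block == user.block)


def sink(ext):
    """Move a single-use extend into its user's block (only valid for one use)."""
    ext.block = ext.users[0].block


# singleuse.cpp-like shape: one vsubl in the loop body.
b_ext = Inst("b16", "zext", "preheader")
a_ext = Inst("a16", "zext", "loop")
diff = Inst("d", "sub", "loop")
add_use(diff, a_ext)
add_use(diff, b_ext)
print(should_sink_extend(b_ext))  # True

# multiuser.cpp-like shape: the hoisted extend feeds subs in two blocks.
c_ext = Inst("c16", "zext", "preheader")
for blk in ("if.then", "if.else"):
    x_ext = Inst("x16." + blk, "zext", blk)
    d = Inst("d." + blk, "sub", blk)
    add_use(d, x_ext)
    add_use(d, c_ext)
print(should_sink_extend(c_ext))  # False
```

The sketch prints True for the single-use shape and False for the shared-operand shape.  In the second case, a rule would presumably have to duplicate the extend into each using block and decide whether that is profitable.  That is the part the hasOneUse check sidesteps.</pre>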
        </div>
      </p>


    </body>
</html>