[llvm-bugs] [Bug 40025] New: AArch32 and AArch64 fail to generate vsubl from intrinsics

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Dec 14 09:28:10 PST 2018


            Bug ID: 40025
           Summary: AArch32 and AArch64 fail to generate vsubl from intrinsics
           Product: new-bugs
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: resistor at mac.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

For certain quantized neural network kernels in the QNNPACK library
(https://github.com/pytorch/QNNPACK), LLVM fails to generate vsubl instructions
from the vsubl intrinsics, making the compiled code up to 2x slower than
hand-written assembly.

The basic symptom is that each vsubl intrinsic is split into separate extend
and subtract operations. One of the extends is determined to be loop invariant
and hoisted out of the loop. This is counterproductive: because instruction
selection works one basic block at a time, the subtract in the loop no longer
sees both extends, so the other operand must be extended explicitly inside the
loop rather than implicitly as part of a vsubl.
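
To make the symptom concrete, here is a portable C++ model of the lane
semantics of vsubl_u8 (used only as an example variant; the report does not
say which vsubl variants are affected), comparing the loop as written with the
intrinsic against its shape after splitting and LICM. All names (widen, sub16,
vsubl_model, sumFused, sumSplit) are hypothetical, and std::array stands in for
the arm_neon.h vector types so the sketch runs on any host. Both loops compute
the same result; what differs is only which instructions the backend can
select.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-ins for uint8x8_t / uint16x8_t (assumption: modeled, not arm_neon.h).
using U8x8 = std::array<uint8_t, 8>;
using U16x8 = std::array<uint16_t, 8>;

// Zero-extend every lane from 8 to 16 bits.
U16x8 widen(const U8x8& v) {
  U16x8 r{};
  for (std::size_t i = 0; i < 8; ++i) r[i] = v[i];
  return r;
}

// Lane-wise 16-bit subtract, wrapping modulo 2^16.
U16x8 sub16(const U16x8& a, const U16x8& b) {
  U16x8 r{};
  for (std::size_t i = 0; i < 8; ++i) r[i] = static_cast<uint16_t>(a[i] - b[i]);
  return r;
}

// What one vsubl_u8 computes: both extends are part of the instruction.
U16x8 vsubl_model(const U8x8& a, const U8x8& b) { return sub16(widen(a), widen(b)); }

// Loop as written: one widening subtract per iteration.
uint32_t sumFused(const U8x8* rows, std::size_t n, const U8x8& rhs) {
  uint32_t acc = 0;
  for (std::size_t r = 0; r < n; ++r) {
    U16x8 d = vsubl_model(rows[r], rhs);
    for (uint16_t x : d) acc += x;
  }
  return acc;
}

// Loop after the intrinsic is split and LICM runs: the extend of the
// loop-invariant rhs is hoisted, so rows[r] needs an explicit extend
// plus a plain subtract inside the loop.
uint32_t sumSplit(const U8x8* rows, std::size_t n, const U8x8& rhs) {
  const U16x8 rhsWide = widen(rhs);  // hoisted out of the loop
  uint32_t acc = 0;
  for (std::size_t r = 0; r < n; ++r) {
    U16x8 d = sub16(widen(rows[r]), rhsWide);  // explicit extend + sub
    for (uint16_t x : d) acc += x;
  }
  return acc;
}
```

The two functions are interchangeable semantically, which is why LICM is free
to produce the second form; the cost only shows up after instruction selection.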

I'm attaching two reduced test cases. singleuse.cpp is a simpler version with
only a single vsubl inside the loop. For it, it's relatively easy to write a
profitable CodeGenPrepare rule that reverses the LICM using hasOneUse checks.
multiuser.cpp is a more realistic example, with multiple vsubls using the same
RHS operand inside nested control flow within the loop. There the hoisted
extend has several users, so a hasOneUse check no longer applies, and handling
this case properly isn't trivial.
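
A minimal sketch of the single-use rule, written against a hypothetical toy IR
rather than LLVM's real CodeGenPrepare API (Inst, Function, numUses and
sinkSingleUseZexts are all invented for illustration). It moves a hoisted zext
back next to its one sub user when the sub's other operand is also a zext, and
it declines when the zext has several users, which is the multiuser.cpp
situation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's API). Block 0 is the loop preheader;
// blocks >= 1 are inside the loop. Operands index into the instruction list.
struct Inst {
  std::string op;                     // "load", "zext" or "sub"
  int block;                          // block the instruction lives in
  std::vector<std::size_t> operands;  // indices of defining instructions
};

using Function = std::vector<Inst>;

// Number of operand slots, across all instructions, that use `def`.
std::size_t numUses(const Function& f, std::size_t def) {
  std::size_t n = 0;
  for (const Inst& inst : f)
    for (std::size_t op : inst.operands)
      if (op == def) ++n;
  return n;
}

// Single-use rule: a zext in the preheader whose only user is an in-loop sub,
// where the sub's other operand is also a zext, is moved into the sub's block
// so that sub(zext, zext) could be matched as one vsubl. Returns how many
// zexts were sunk. Multi-use zexts are skipped by the numUses check.
std::size_t sinkSingleUseZexts(Function& f) {
  std::size_t sunk = 0;
  for (std::size_t i = 0; i < f.size(); ++i) {
    if (f[i].op != "zext" || f[i].block != 0 || numUses(f, i) != 1) continue;
    for (const Inst& user : f) {
      if (user.op != "sub" || user.block == 0 || user.operands.size() != 2) continue;
      std::size_t a = user.operands[0], b = user.operands[1];
      if (a != i && b != i) continue;
      std::size_t other = (a == i) ? b : a;
      if (f[other].op == "zext") {
        f[i].block = user.block;  // reverse the LICM for this single use
        ++sunk;
      }
      break;  // the one user has been found
    }
  }
  return sunk;
}
```

The toy ignores instruction order within a block and any cost model; it only
shows why a hasOneUse-based rule covers singleuse.cpp and stops at
multiuser.cpp.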
