[llvm-bugs] [Bug 40025] New: AArch32 and AArch64 fail to generate vsubl from intrinsics
llvm-bugs at lists.llvm.org
Fri Dec 14 09:28:10 PST 2018
Bug ID: 40025
Summary: AArch32 and AArch64 fail to generate vsubl from intrinsics
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: resistor at mac.com
CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org
For certain quantized neural network kernels in the QNNPACK library
(https://github.com/pytorch/QNNPACK) LLVM fails to generate vsubl instructions
from vsubl intrinsics, resulting in up to 2x performance degradation compared
to hand-written assembly.
The basic symptom is that the vsubl intrinsic is split into separate extends
and a subtract. One of the extends is found to be loop invariant and is hoisted
out of the loop. This is undesirable: the other operand then has to be extended
explicitly inside the loop, instead of both operands being extended implicitly
as part of a single vsubl.
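The two loop shapes can be modeled in portable C++. This is a scalar sketch, assuming the usual vsubl_u8 semantics: each uint8_t lane is zero-extended to 16 bits and the difference wraps modulo 2^16. The names subl_fused, zext, sub_wide, kernel_fused and kernel_hoisted, and the loop-invariant zero_point operand, are hypothetical. They mirror a QNNPACK-style case where the right-hand operand does not change across iterations. Both kernels compute the same values. The cost difference is that the first maps to one widening subtract per iteration, while the second needs an explicit extend plus a 16-bit subtract per iteration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-ins for 8-lane NEON vectors.
using U8x8 = std::array<uint8_t, 8>;
using U16x8 = std::array<uint16_t, 8>;

// Widening subtract (models one vsubl_u8): zero-extend both lanes, subtract,
// wrap the result modulo 2^16.
U16x8 subl_fused(const U8x8& a, const U8x8& b) {
  U16x8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = static_cast<uint16_t>(static_cast<uint16_t>(a[i]) -
                                 static_cast<uint16_t>(b[i]));
  return r;
}

// Standalone zero-extend (what remains after the intrinsic is split).
U16x8 zext(const U8x8& v) {
  U16x8 r{};
  for (std::size_t i = 0; i < 8; ++i) r[i] = v[i];
  return r;
}

// Plain 16-bit subtract on already-extended operands.
U16x8 sub_wide(const U16x8& a, const U16x8& b) {
  U16x8 r{};
  for (std::size_t i = 0; i < 8; ++i) r[i] = static_cast<uint16_t>(a[i] - b[i]);
  return r;
}

// Desired shape: one widening subtract per iteration.
void kernel_fused(const U8x8* in, std::size_t n, const U8x8& zero_point,
                  U16x8* out) {
  for (std::size_t k = 0; k < n; ++k) out[k] = subl_fused(in[k], zero_point);
}

// Shape after split + LICM: zext(zero_point) is hoisted, so every iteration
// needs a separate zext of in[k] followed by a 16-bit subtract.
void kernel_hoisted(const U8x8* in, std::size_t n, const U8x8& zero_point,
                    U16x8* out) {
  const U16x8 zp_wide = zext(zero_point);  // loop invariant, hoisted
  for (std::size_t k = 0; k < n; ++k) out[k] = sub_wide(zext(in[k]), zp_wide);
}
```

Because the two kernels are value-equivalent, hoisting is legal. It is simply unprofitable, since it prevents the sub and the in-loop extend from being selected together as one vsubl.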
I'm attaching two reduced test cases. singleuse.cpp presents a simpler version
with only a single vsubl inside the loop. It's relatively easy to write a
profitable CodeGenPrepare rule that reverses the LICM using hasOneUse checks.
multiuser.cpp is a more realistic example with multiple vsubl calls that use the
same RHS operand, inside nested control flow within the loop. Handling this
case properly isn't trivial.
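The difference between the two test cases can be sketched on a toy IR. Everything here is hypothetical and is not LLVM's CodeGenPrepare API: the Inst struct, users_of, is_widening_sub and sink_extend are illustrative names only. With a single user, a hasOneUse-style check lets the hoisted extend simply move next to its sub. With several users in different blocks, one possible approach is to give each foldable user its own copy of the extend. The sketch deliberately ignores profitability, dominance checks and instruction ordering within a block, which are among the things that make the real multi-use case hard.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, for illustration only (not LLVM's API).
struct Inst {
  std::string op;        // e.g. "load", "zext", "sub"
  int block;             // id of the basic block holding the instruction
  std::vector<int> ops;  // indices of operand instructions
};

// Indices of the instructions that use `def` as an operand.
std::vector<int> users_of(const std::vector<Inst>& f, int def) {
  std::vector<int> us;
  for (std::size_t i = 0; i < f.size(); ++i)
    for (int o : f[i].ops)
      if (o == def) { us.push_back(static_cast<int>(i)); break; }
  return us;
}

// A sub whose two operands are both zexts could be selected as one vsubl.
bool is_widening_sub(const std::vector<Inst>& f, int u) {
  const Inst& s = f[u];
  return s.op == "sub" && s.ops.size() == 2 &&
         f[s.ops[0]].op == "zext" && f[s.ops[1]].op == "zext";
}

// Undo LICM for a hoisted zext. Returns the number of copies created.
int sink_extend(std::vector<Inst>& f, int ext) {
  std::vector<int> us = users_of(f, ext);
  // Single-use case (as in singleuse.cpp): just move the extend.
  if (us.size() == 1) {
    if (is_widening_sub(f, us[0])) f[ext].block = f[us[0]].block;
    return 0;
  }
  // Multi-use case (as in multiuser.cpp): clone per foldable user in another
  // block; the original stays for any users that were not rewritten.
  int copies = 0;
  for (int u : us) {
    if (!is_widening_sub(f, u) || f[u].block == f[ext].block) continue;
    Inst clone{"zext", f[u].block, f[ext].ops};
    f.push_back(clone);
    const int c = static_cast<int>(f.size()) - 1;
    for (int& o : f[u].ops)
      if (o == ext) o = c;  // point this user at its private copy
    ++copies;
  }
  return copies;
}
```

Moving the extend is only valid when it has exactly one user. Once it feeds several subs in different blocks, any single placement leaves some sub without a nearby extend. That is why the sketch falls back to duplication.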