[llvm-bugs] [Bug 52394] New: [aarch64] Inappropriate optimization: vtstq NEON intrinsic compiled as a sequence of instructions
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Nov 3 18:23:24 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=52394
Bug ID: 52394
Summary: [aarch64] Inappropriate optimization: vtstq NEON
intrinsic compiled as a sequence of instructions
Product: new-bugs
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: roman.zelenyi at gmail.com
CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org
In some cases clang compiles the vtstq intrinsic to a sequence of and/cmeq
instructions instead of a single cmtst.
For example:
#include <arm_neon.h>
uint32x4_t foo(uint32x4_t v1, uint32x4_t v2, uint32x4_t v3, uint32x4_t v4)
{
return vbslq_u32(vtstq_u32(v1, v2), v3, v4);
}
compiles (with -O2 or -Os or even -Oz) to:
and v0.16b, v1.16b, v0.16b
cmeq v0.4s, v0.4s, #0
bsl v0.16b, v3.16b, v2.16b
ret
The reason for this creativity is unclear - AFAIK, cmtst throughput/latency is
similar to cmeq.
Anyway, my benchmarks indicate significant performance degradation for this
reason. (The benchmarked case is an unrolled loop mostly comprised of vbslq and
vtstq.)
Both GCC and MSVC compile the code above as expected:
cmtst v0.4s, v0.4s, v1.4s
bsl v0.16b, v2.16b, v3.16b
ret
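To make the comparison concrete, here is a minimal scalar sketch of one 32-bit lane under each sequence. It only models the lane semantics described above (cmtst sets a lane to all ones when the AND is non-zero, cmeq #0 sets it when the value is zero, bsl picks bits by mask); all function names are hypothetical helpers, not NEON intrinsics, and it says nothing about timing. It shows that clang's output is the same selection with an inverted mask and swapped bsl operands, so the difference is instruction count rather than correctness.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a single 32-bit lane. Every name below is an
   illustrative helper (an assumption), not part of arm_neon.h. */

/* cmtst: all ones if (a & b) is non-zero, otherwise zero. */
uint32_t lane_cmtst(uint32_t a, uint32_t b) {
    return (a & b) != 0 ? UINT32_MAX : 0u;
}

/* cmeq ..., #0: all ones if x is zero, otherwise zero. */
uint32_t lane_cmeq_zero(uint32_t x) {
    return x == 0 ? UINT32_MAX : 0u;
}

/* bsl: take bits from t where mask is set, from f elsewhere. */
uint32_t lane_bsl(uint32_t mask, uint32_t t, uint32_t f) {
    return (mask & t) | (~mask & f);
}

/* Expected sequence: cmtst, then bsl with (v3, v4). */
uint32_t select_cmtst(uint32_t v1, uint32_t v2, uint32_t v3, uint32_t v4) {
    return lane_bsl(lane_cmtst(v1, v2), v3, v4);
}

/* Clang's sequence: and, cmeq #0 (inverted mask), bsl with (v4, v3). */
uint32_t select_and_cmeq(uint32_t v1, uint32_t v2, uint32_t v3, uint32_t v4) {
    uint32_t inverted = lane_cmeq_zero(v1 & v2);
    return lane_bsl(inverted, v4, v3);
}

/* Aborts if the two sequences disagree for the given lane values. */
void check_lane(uint32_t v1, uint32_t v2, uint32_t v3, uint32_t v4) {
    assert(select_cmtst(v1, v2, v3, v4) == select_and_cmeq(v1, v2, v3, v4));
}
```

Calling check_lane with any lane values should pass, since inverting the mask and swapping the two bsl inputs cancel out; the extra and instruction is the only cost modelled here.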