[llvm-bugs] [Bug 51122] New: Suboptimal codegen for llvm.vector.reduce of <N x i1>
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Jul 16 21:29:52 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=51122
Bug ID: 51122
Summary: Suboptimal codegen for llvm.vector.reduce of <N x i1>
Product: libraries
Version: 12.0
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: AArch64
Assignee: unassignedbugs at nondot.org
Reporter: caleb.zulawski at gmail.com
CC: arnaud.degrandmaison at arm.com,
llvm-bugs at lists.llvm.org, smithp352 at googlemail.com,
Ties.Stuij at arm.com
The bitwise (and/or) reduction intrinsics on AArch64 (and ARM) produce
suboptimal code for vectors of i1. This issue is similar to
https://bugs.llvm.org/show_bug.cgi?id=38840.
    declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1>)

    define i1 @mask_reduce_or(<8 x i8> %mask) {
      %mask1 = trunc <8 x i8> %mask to <8 x i1>
      %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)
      ret i1 %reduced
    }
produces
    mask_reduce_or:                         // @mask_reduce_or
            umov    w14, v0.b[1]
            umov    w15, v0.b[0]
            umov    w13, v0.b[2]
            orr     w14, w15, w14
            umov    w12, v0.b[3]
            orr     w13, w14, w13
            umov    w11, v0.b[4]
            orr     w12, w13, w12
            umov    w10, v0.b[5]
            orr     w11, w12, w11
            umov    w9, v0.b[6]
            orr     w10, w11, w10
            umov    w8, v0.b[7]
            orr     w9, w10, w9
            orr     w8, w9, w8
            and     w0, w8, #0x1
            ret
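when it could instead use a single horizontal maximum such as vmaxv/vmaxvq
(UMAXV), or vpmax on ARM.
The same goes for vector.reduce.and, which could use a horizontal minimum
such as vminv/vminvq (UMINV), or vpmin on ARM.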
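
As a rough sketch of the suggested lowering (not what LLVM emits, and not
necessarily how a fix would be implemented), the C below expresses both
reductions as a horizontal max/min over 8 byte lanes. The function names are
hypothetical. It assumes each lane's i1 value is bit 0 of the byte, as the
trunc to i1 implies, so the lanes are masked with 1 first; that step could be
dropped only if every lane were known to be 0 or all-ones. On AArch64 it uses
the vmaxv_u8/vminv_u8 NEON intrinsics; elsewhere it falls back to a scalar
model of the same computation.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helpers. Each lane's i1 value is taken to be bit 0 of the
   byte, mirroring `trunc ... to <8 x i1>`. */

/* OR-reduction: true if any lane has bit 0 set. */
bool mask_reduce_or(const uint8_t lanes[8]) {
#if defined(__aarch64__)
    /* Keep only bit 0 of each lane, then take the horizontal max (UMAXV). */
    uint8x8_t v = vand_u8(vld1_u8(lanes), vdup_n_u8(1));
    return vmaxv_u8(v) != 0;
#else
    /* Portable model: max over lanes of (lane & 1). */
    uint8_t max = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t bit = (uint8_t)(lanes[i] & 1u);
        if (bit > max) max = bit;
    }
    return max != 0;
#endif
}

/* AND-reduction: true only if every lane has bit 0 set. */
bool mask_reduce_and(const uint8_t lanes[8]) {
#if defined(__aarch64__)
    /* Keep only bit 0 of each lane, then take the horizontal min (UMINV). */
    uint8x8_t v = vand_u8(vld1_u8(lanes), vdup_n_u8(1));
    return vminv_u8(v) != 0;
#else
    /* Portable model: min over lanes of (lane & 1). */
    uint8_t min = 1;
    for (int i = 0; i < 8; ++i) {
        uint8_t bit = (uint8_t)(lanes[i] & 1u);
        if (bit < min) min = bit;
    }
    return min != 0;
#endif
}
```

The max of the masked lanes is nonzero exactly when at least one i1 is set,
and the min is nonzero exactly when all of them are, which is why a single
horizontal instruction can replace the umov/orr chain above.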