[llvm-bugs] [Bug 51122] New: Suboptimal codegen for llvm.vector.reduce of <N x i1>

Fri Jul 16 21:29:52 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=51122

            Bug ID: 51122
           Summary: Suboptimal codegen for llvm.vector.reduce of <N x i1>
           Product: libraries
           Version: 12.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: AArch64
          Assignee: unassignedbugs at nondot.org
          Reporter: caleb.zulawski at gmail.com
                CC: arnaud.degrandmaison at arm.com,
                    llvm-bugs at lists.llvm.org, smithp352 at googlemail.com,
                    Ties.Stuij at arm.com

The reduction intrinsics llvm.vector.reduce.or and llvm.vector.reduce.and
generate suboptimal code on AArch64 (and ARM) for vectors of i1.  This issue
is similar to https://bugs.llvm.org/show_bug.cgi?id=38840.

declare i1 @llvm.vector.reduce.or.v16i1(<16 x i1> %a);

define i1 @mask_reduce_or(<16 x i8> %mask) {
    %mask1 = trunc <16 x i8> %mask to <16 x i1>
    %reduced = call i1 @llvm.vector.reduce.or.v16i1(<16 x i1> %mask1)
    ret i1 %reduced
}

produces

mask_reduce_or:                         // @mask_reduce_or
        umov    w14, v0.b[1]
        umov    w15, v0.b[0]
        umov    w13, v0.b[2]
        orr     w14, w15, w14
        umov    w12, v0.b[3]
        orr     w13, w14, w13
        umov    w11, v0.b[4]
        orr     w12, w13, w12
        umov    w10, v0.b[5]
        orr     w11, w12, w11
        umov    w9, v0.b[6]
        orr     w10, w11, w10
        umov    w8, v0.b[7]
        orr     w9, w10, w9
        orr     w8, w9, w8
        and     w0, w8, #0x1
        ret

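when it could instead use vmaxvq (or vpmax on ARM): once each lane is masked
to 0 or 1, the OR of the lanes equals their unsigned maximum.

The same goes for vector.reduce.and with vminvq (or vpmin on ARM), since the
AND of such lanes equals their unsigned minimum.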
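
For illustration only, here is a rough C sketch of the lowering suggested
above, written with the arm_neon.h intrinsics vmaxvq_u8 and vminvq_u8.  It is
a sketch under assumptions, not what the backend emits: mask_reduce_and is a
hypothetical name, each lane is masked to bit 0 to mirror the trunc to i1, and
the scalar fallback exists only so the code also builds on non-AArch64 hosts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* OR-reduce 16 mask lanes.  Only bit 0 of each byte is significant,
 * mirroring the trunc from <16 x i8> to <16 x i1> in the IR above. */
bool mask_reduce_or(const uint8_t mask[16]) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    uint8x16_t bits = vandq_u8(vld1q_u8(mask), vdupq_n_u8(1));
    /* Any lane set <=> the unsigned maximum across lanes is 1. */
    return vmaxvq_u8(bits) != 0;
#else
    /* Assumed portable fallback: same max-based formulation, scalar. */
    uint8_t max = 0;
    for (size_t i = 0; i < 16; ++i) {
        uint8_t bit = mask[i] & 1u;
        if (bit > max) max = bit;
    }
    return max != 0;
#endif
}

/* AND-reduce 16 mask lanes (hypothetical counterpart of the above). */
bool mask_reduce_and(const uint8_t mask[16]) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    uint8x16_t bits = vandq_u8(vld1q_u8(mask), vdupq_n_u8(1));
    /* All lanes set <=> the unsigned minimum across lanes is 1. */
    return vminvq_u8(bits) != 0;
#else
    /* Assumed portable fallback: same min-based formulation, scalar. */
    uint8_t min = 1;
    for (size_t i = 0; i < 16; ++i) {
        uint8_t bit = mask[i] & 1u;
        if (bit < min) min = bit;
    }
    return min != 0;
#endif
}
```

On AArch64 this should need only a handful of instructions (a mask, one
across-lanes reduction, and a compare) rather than one umov/orr pair per lane.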
