[llvm-bugs] [Bug 49961] New: Bad codegen for vbslq_u32() intrinsic

Wed Apr 14 10:40:26 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=49961

            Bug ID: 49961
           Summary: Bad codegen for vbslq_u32() intrinsic
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: AArch64
          Assignee: unassignedbugs at nondot.org
          Reporter: mkuper at google.com
                CC: arnaud.degrandmaison at arm.com,
                    llvm-bugs at lists.llvm.org, smithp352 at googlemail.com,
                    Ties.Stuij at arm.com

Consider:

#include <arm_neon.h>

int foo(uint32x4x2_t reg, uint32x4_t mask, int index) {
  return vbslq_u32(mask, reg.val[0], reg.val[1])[index];
}

clang vs gcc: https://gcc.godbolt.org/z/YPe3TK79P

clang trunk:
foo(uint32x4x2_t, __Uint32x4_t, int):    // @foo(uint32x4x2_t, __Uint32x4_t, int)
        sub     sp, sp, #48                     // =48
        and     x8, x0, #0x3
        add     x10, sp, #32                    // =32
        str     q1, [sp, #32]
        mov     x9, sp
        add     x11, sp, #16                    // =16
        bfi     x10, x8, #2, #2
        and     v0.16b, v0.16b, v2.16b
        bfi     x9, x8, #2, #2
        bfi     x11, x8, #2, #2
        ldr     w8, [x10]
        str     q2, [sp, #16]
        ldr     w10, [x11]
        str     q0, [sp]
        ldr     w9, [x9]
        bic     w8, w8, w10
        orr     w0, w8, w9
        add     sp, sp, #48                     // =48
        ret

gcc trunk:
foo(uint32x4x2_t, __Uint32x4_t, int):
        bsl     v2.16b, v0.16b, v1.16b
        sub     sp, sp, #16
        str     q2, [sp]
        ldr     w0, [sp, w0, sxtw 2]
        add     sp, sp, 16
        ret

From a cursory examination of what's going on, clang lowers vbslq_u32(mask, a,
b) to a vector "or(and(a, mask), and(b, ~mask))", which the backend is expected
to match back into a single bsl. However, in this case, something in the midend
decides it's best to first extract the elements at "index" from the vectors, and
then do the or(and(), and()) song-and-dance in the scalar domain, so the full
vector pattern never reaches the backend.
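
To make the two orderings concrete, here is a minimal sketch in portable C. It
is not the code clang or gcc generate: vec4_u32 is a hypothetical stand-in for
uint32x4_t so the example builds without <arm_neon.h>, and the function names
are made up for illustration. It shows only that selecting on whole vectors and
then extracting a lane gives the same value as extracting the lanes first and
selecting on scalars, which is why the midend rewrite is legal even though it
hides the bsl pattern. The index is masked with 3 purely to keep the sketch
well-defined for any input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint32x4_t (assumption, not the NEON type). */
typedef struct { uint32_t lane[4]; } vec4_u32;

/* Bitwise select: bits of a where mask is 1, bits of b where mask is 0. */
static uint32_t bsl_scalar(uint32_t mask, uint32_t a, uint32_t b) {
    return (a & mask) | (b & ~mask);
}

/* Vector-first order: select on all four lanes, then extract one lane. */
static uint32_t select_then_extract(vec4_u32 mask, vec4_u32 a, vec4_u32 b,
                                    int index) {
    vec4_u32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = bsl_scalar(mask.lane[i], a.lane[i], b.lane[i]);
    return r.lane[index & 3];
}

/* Scalarized order: extract the lanes first, then select on scalars. */
static uint32_t extract_then_select(vec4_u32 mask, vec4_u32 a, vec4_u32 b,
                                    int index) {
    int i = index & 3;
    return bsl_scalar(mask.lane[i], a.lane[i], b.lane[i]);
}

/* Asserts that both orders agree on every lane of the given inputs. */
static void check_all_lanes(vec4_u32 mask, vec4_u32 a, vec4_u32 b) {
    for (int i = 0; i < 4; ++i)
        assert(select_then_extract(mask, a, b, i) ==
               extract_then_select(mask, a, b, i));
}
```

Both functions return the same value for every index, so the difference between
the two compilers' output above is purely one of instruction count, not of
semantics.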
