[llvm-bugs] [Bug 50807] New: [SIMD] __builtin_shufflevector to 64-bit vector then extending not vectorized

Tue Jun 22 13:31:34 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=50807

            Bug ID: 50807
           Summary: [SIMD] __builtin_shufflevector to 64-bit vector then
                    extending not vectorized
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: WebAssembly
          Assignee: unassignedbugs at nondot.org
          Reporter: clang at evan.coeusgroup.com
                CC: llvm-bugs at lists.llvm.org

With -msimd128 -O3, I would expect a __builtin_shufflevector that keeps half
the elements, followed by a __builtin_convertvector that extends each element
(resulting in a 128-bit vector), to generate an i8x16.shuffle and an
extend_low.  Instead, it generates a long series of extract_lane and
replace_lane instructions.

Here are a couple of quick examples (Compiler Explorer:
https://godbolt.org/z/EjbMqPhx1):

#include <wasm_simd128.h>

#pragma clang diagnostic ignored "-Wmissing-prototypes"

typedef   int8_t i8x16 __attribute__((__vector_size__(16)));
typedef  int16_t i16x8 __attribute__((__vector_size__(16)));
typedef  int32_t i32x4 __attribute__((__vector_size__(16)));
typedef  uint8_t u8x16 __attribute__((__vector_size__(16)));
typedef uint16_t u16x8 __attribute__((__vector_size__(16)));
typedef uint32_t u32x4 __attribute__((__vector_size__(16)));

i16x8
foo(i8x16 a) {
    return __builtin_convertvector(
        __builtin_shufflevector(a, a,
            0, 2, 4, 6, 8, 10, 12, 14
        ),
        i16x8
    );
}

v128_t
foo_intrin(v128_t a) {
    return
        wasm_i16x8_extend_low_i8x16(
            wasm_i8x16_shuffle(a, a,
                0, 2, 4, 6, 8, 10, 12, 14,
                1, 3, 5, 7, 9, 11, 13, 15)
        );
}

i16x8
bar(i8x16 a) {
    return
        __builtin_convertvector(
            __builtin_shufflevector(
                a, a,
                0, 2, 4, 6, 8, 10, 12, 14
            ),
            i16x8
        )
        -
        __builtin_convertvector(
            __builtin_shufflevector(
                a, a,
                1, 3, 5, 7, 9, 11, 13, 15
            ),
            i16x8
        );
}

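v128_t
bar_intrin(v128_t a) {
    v128_t shuffled = wasm_i8x16_shuffle(
        a, a,
        0, 2, 4, 6, 8, 10, 12, 14,
        1, 3, 5, 7, 9, 11, 13, 15
    );
    /* v128_t is a vector of 32-bit lanes, so a plain `-` would subtract
       as i32x4; wasm_i16x8_sub subtracts per 16-bit lane. */
    return
        wasm_i16x8_sub(
            wasm_i16x8_extend_low_i8x16(shuffled),
            wasm_i16x8_extend_high_i8x16(shuffled)
        );
}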

I think it's pretty reasonable to expect that foo and foo_intrin should
generate roughly the same code (extend_low only reads the lower half of the
shuffle result, so the upper eight shuffle indices don't matter and could be
all zeros or something).
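
To make the "upper half doesn't matter" point concrete, here is a small scalar
model in portable C. It is only a sketch: shuffle_i8x16, extend_low_i8x16,
foo_ref and mask_matches_foo are hypothetical helper names invented for this
illustration, not LLVM or wasm_simd128.h APIs, and the model assumes the signed
extend (which is what converting int8_t lanes to int16_t lanes implies).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-lane byte shuffle: indices 0..15 pick from a,
   indices 16..31 pick from b. */
static void shuffle_i8x16(const int8_t a[16], const int8_t b[16],
                          const uint8_t idx[16], int8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        assert(idx[i] < 32);
        out[i] = idx[i] < 16 ? a[idx[i]] : b[idx[i] - 16];
    }
}

/* Scalar model of a signed extend_low: sign-extend lanes 0..7 only. */
static void extend_low_i8x16(const int8_t v[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)v[i];
}

/* Scalar model of foo(): take the even lanes, then sign-extend them. */
static void foo_ref(const int8_t a[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)a[2 * i];
}

/* Returns 1 if shuffle + extend_low with this mask reproduces foo_ref. */
static int mask_matches_foo(const int8_t a[16], const uint8_t idx[16]) {
    int8_t shuffled[16];
    int16_t got[8], want[8];
    shuffle_i8x16(a, a, idx, shuffled);
    extend_low_i8x16(shuffled, got);
    foo_ref(a, want);
    return memcmp(got, want, sizeof want) == 0;
}
```

Under this model, any mask whose first eight indices are 0, 2, ..., 14 matches
foo, whatever the last eight indices are; only the lower half is observable.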

I'd be very impressed, OTOH, if bar and bar_intrin generated the same code. 
I'm not sure how feasible that is, though.
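
For reference, the equivalence being asked for in the bar case can be sketched
with a scalar model in portable C. This is an illustration under assumptions:
bar_ref, bar_one_shuffle and bar_models_agree are hypothetical names, the
extends are modeled as signed, and the subtraction is modeled per 16-bit lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of bar(): even lanes minus odd lanes, each sign-extended
   to 16 bits before the subtraction. */
static void bar_ref(const int8_t a[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)((int16_t)a[2 * i] - (int16_t)a[2 * i + 1]);
}

/* Scalar model of the single-shuffle strategy: deinterleave once
   (evens in lanes 0..7, odds in lanes 8..15), then compute
   extend_low - extend_high lane by lane. */
static void bar_one_shuffle(const int8_t a[16], int16_t out[8]) {
    static const uint8_t idx[16] = {0, 2, 4, 6, 8, 10, 12, 14,
                                    1, 3, 5, 7, 9, 11, 13, 15};
    int8_t s[16];
    for (int i = 0; i < 16; i++) {
        assert(idx[i] < 16);
        s[i] = a[idx[i]];
    }
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)((int16_t)s[i] - (int16_t)s[i + 8]);
}

/* Returns 1 if both models produce identical results for this input. */
static int bar_models_agree(const int8_t a[16]) {
    int16_t x[8], y[8];
    bar_ref(a, x);
    bar_one_shuffle(a, y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The two shuffles in bar share the same input and together cover every lane
exactly once, which is what would let a backend merge them into one
deinterleaving shuffle in this model.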
