[llvm-bugs] [Bug 50807] New: [SIMD] __builtin_shufflevector to 64-bit vector then extending not vectorized
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Jun 22 13:31:34 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=50807
Bug ID: 50807
Summary: [SIMD] __builtin_shufflevector to 64-bit vector then
extending not vectorized
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: WebAssembly
Assignee: unassignedbugs at nondot.org
Reporter: clang at evan.coeusgroup.com
CC: llvm-bugs at lists.llvm.org
With -msimd128 -O3, I would expect a __builtin_shufflevector that returns half
the elements, followed by a __builtin_convertvector that extends each element
(resulting in a 128-bit vector), to generate a single i8x16.shuffle and an
extend_low. Instead, it generates a series of extract_lane and replace_lane
instructions.
Here are a couple of quick examples (Compiler Explorer:
https://godbolt.org/z/EjbMqPhx1):
#include <wasm_simd128.h>
#pragma clang diagnostic ignored "-Wmissing-prototypes"
typedef int8_t i8x16 __attribute__((__vector_size__(16)));
typedef int16_t i16x8 __attribute__((__vector_size__(16)));
typedef int32_t i32x4 __attribute__((__vector_size__(16)));
typedef uint8_t u8x16 __attribute__((__vector_size__(16)));
typedef uint16_t u16x8 __attribute__((__vector_size__(16)));
typedef uint32_t u32x4 __attribute__((__vector_size__(16)));
i16x8
foo(i8x16 a) {
  return __builtin_convertvector(
    __builtin_shufflevector(a, a,
      0, 2, 4, 6, 8, 10, 12, 14
    ),
    i16x8
  );
}

v128_t
foo_intrin(v128_t a) {
  return
    wasm_i16x8_extend_low_i8x16(
      wasm_i8x16_shuffle(a, a,
        0, 2, 4, 6, 8, 10, 12, 14,
        1, 3, 5, 7, 9, 11, 13, 15)
    );
}

i16x8
bar(i8x16 a) {
  return
    __builtin_convertvector(
      __builtin_shufflevector(
        a, a,
        0, 2, 4, 6, 8, 10, 12, 14
      ),
      i16x8
    )
    -
    __builtin_convertvector(
      __builtin_shufflevector(
        a, a,
        1, 3, 5, 7, 9, 11, 13, 15
      ),
      i16x8
    );
}

i16x8
bar_intrin(v128_t a) {
  v128_t shuffled = wasm_i8x16_shuffle(
    a, a,
    0, 2, 4, 6, 8, 10, 12, 14,
    1, 3, 5, 7, 9, 11, 13, 15
  );
  /* v128_t has 32-bit lanes, so cast to i16x8 to subtract per 16-bit lane. */
  return
    (i16x8)wasm_i16x8_extend_low_i8x16(shuffled) -
    (i16x8)wasm_i16x8_extend_high_i8x16(shuffled);
}

I think it's pretty reasonable to expect that foo and foo_intrin should
generate roughly the same code (extend_low never reads the upper half of the
shuffle result, so those indices could be anything, e.g. all zeros).
I'd be very impressed, OTOH, if bar and bar_intrin generated the same code.
I'm not sure how feasible that is, though.
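
For reference, here is a minimal scalar sketch of what the two forms are meant to compute, in plain C with no SIMD. The names foo_ref, bar_ref, deinterleave_ref and bar_intrin_ref are hypothetical helpers for illustration only, not part of any header, and bar_intrin_ref models the intended per-16-bit-lane subtraction.

```c
#include <stdint.h>

/* Scalar model of foo: keep the even-indexed bytes and sign-extend each
   one to 16 bits. */
static void foo_ref(const int8_t a[16], int16_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)a[2 * i];
}

/* Scalar model of bar: (even byte) - (odd byte), per 16-bit lane. */
static void bar_ref(const int8_t a[16], int16_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)(a[2 * i] - a[2 * i + 1]);
}

/* Scalar model of the single 16-lane shuffle used by the intrinsic
   versions: even bytes go to lanes 0..7, odd bytes to lanes 8..15. */
static void deinterleave_ref(const int8_t a[16], int8_t out[16]) {
    for (int i = 0; i < 8; ++i) {
        out[i] = a[2 * i];
        out[8 + i] = a[2 * i + 1];
    }
}

/* Scalar model of the intended bar_intrin: sign-extend the low and high
   halves of the shuffled vector and subtract them per 16-bit lane. */
static void bar_intrin_ref(const int8_t a[16], int16_t out[8]) {
    int8_t s[16];
    deinterleave_ref(a, s);
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)(s[i] - s[8 + i]);
}
```

Under this model bar_ref and bar_intrin_ref agree lane for lane on any input, which is the equivalence the backend would need to recognize to turn bar into one shuffle plus extend_low/extend_high.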