<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [SIMD] __builtin_shufflevector to 64-bit vector then extending not vectorized"
href="https://bugs.llvm.org/show_bug.cgi?id=50807">50807</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[SIMD] __builtin_shufflevector to 64-bit vector then extending not vectorized
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: WebAssembly
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>clang@evan.coeusgroup.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>With -msimd128 -O3, I would expect a __builtin_shufflevector that selects half
of the elements (producing a 64-bit vector), followed by a __builtin_convertvector
that widens each element (producing a 128-bit vector), to compile to an
i8x16.shuffle followed by an i16x8.extend_low_i8x16_s. Instead, it generates a
long sequence of extract_lane and replace_lane instructions.
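To make the expected equivalence concrete, here is a scalar C model. It is a
sketch: the *_model types and functions are hypothetical and only mirror the
lane semantics of the instructions, not any LLVM code. It checks that selecting
the even lanes (0, 2, ..., 14) and sign-extending them to 16 bits matches
i8x16.shuffle followed by i16x8.extend_low_i8x16_s. The check holds whatever
the upper eight shuffle indices are.

```c
/* Hypothetical model types: plain arrays standing in for SIMD registers. */
typedef struct { signed char lane[16]; } i8x16_model;
typedef struct { short lane[8]; } i16x8_model;

/* Generic form: take even lanes and sign-extend each to 16 bits, mirroring
   __builtin_shufflevector(a, a, 0, 2, ..., 14) + __builtin_convertvector. */
static i16x8_model even_widen_model(i8x16_model a) {
    i16x8_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = (short)a.lane[2 * i];
    return r;
}

/* i8x16.shuffle with both operands equal to a: an index k in 0..31 selects
   from the concatenation a:a, so it reads a.lane[k % 16]. */
static i8x16_model shuffle_model(i8x16_model a, const int idx[16]) {
    i8x16_model r;
    for (int i = 0; i < 16; i++)
        r.lane[i] = a.lane[idx[i] % 16];
    return r;
}

/* i16x8.extend_low_i8x16_s: sign-extend lanes 0..7; lanes 8..15 are ignored. */
static i16x8_model extend_low_model(i8x16_model v) {
    i16x8_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = (short)v.lane[i];
    return r;
}

/* Returns 1 if shuffle(idx) followed by extend_low matches the generic form. */
static int lowering_matches(i8x16_model a, const int idx[16]) {
    i16x8_model want = even_widen_model(a);
    i16x8_model got = extend_low_model(shuffle_model(a, idx));
    for (int i = 0; i < 8; i++)
        if (want.lane[i] != got.lane[i])
            return 0;
    return 1;
}
```

The upper indices only affect lanes that extend_low discards. That is why the
backend would be free to pick any convenient value for them.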
Here are a couple of quick examples (Compiler Explorer:
<a href="https://godbolt.org/z/EjbMqPhx1">https://godbolt.org/z/EjbMqPhx1</a>):
#include &lt;wasm_simd128.h&gt;

#pragma clang diagnostic ignored "-Wmissing-prototypes"

typedef int8_t i8x16 __attribute__((__vector_size__(16)));
typedef int16_t i16x8 __attribute__((__vector_size__(16)));
typedef int32_t i32x4 __attribute__((__vector_size__(16)));
typedef uint8_t u8x16 __attribute__((__vector_size__(16)));
typedef uint16_t u16x8 __attribute__((__vector_size__(16)));
typedef uint32_t u32x4 __attribute__((__vector_size__(16)));

i16x8
foo(i8x16 a) {
  return __builtin_convertvector(
    __builtin_shufflevector(a, a,
      0, 2, 4, 6, 8, 10, 12, 14
    ),
    i16x8
  );
}

v128_t
foo_intrin(v128_t a) {
  return
    wasm_i16x8_extend_low_i8x16(
      wasm_i8x16_shuffle(a, a,
        0, 2, 4, 6, 8, 10, 12, 14,
        1, 3, 5, 7, 9, 11, 13, 15)
    );
}

i16x8
bar(i8x16 a) {
  return
    __builtin_convertvector(
      __builtin_shufflevector(
        a, a,
        0, 2, 4, 6, 8, 10, 12, 14
      ),
      i16x8
    )
    -
    __builtin_convertvector(
      __builtin_shufflevector(
        a, a,
        1, 3, 5, 7, 9, 11, 13, 15
      ),
      i16x8
    );
}
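
v128_t
bar_intrin(v128_t a) {
  v128_t shuffled = wasm_i8x16_shuffle(
    a, a,
    0, 2, 4, 6, 8, 10, 12, 14,
    1, 3, 5, 7, 9, 11, 13, 15
  );
  /* Use a 16-bit lane-wise subtraction; `-` on two v128_t values would
     subtract 32-bit lanes instead. */
  return wasm_i16x8_sub(
    wasm_i16x8_extend_low_i8x16(shuffled),
    wasm_i16x8_extend_high_i8x16(shuffled));
}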
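bar and bar_intrin are meant to compute the same values. The single shuffle in
bar_intrin moves the even lanes into lanes 0-7 and the odd lanes into lanes
8-15. Taking extend_low minus extend_high, as a 16-bit lane-wise subtraction,
then yields bar's even-minus-odd result. Below is a scalar C sketch of both.
The *_model helpers are hypothetical and only mirror the lane semantics.

```c
/* Hypothetical model types: plain arrays standing in for SIMD registers. */
typedef struct { signed char lane[16]; } i8x16_model;
typedef struct { short lane[8]; } i16x8_model;

/* bar: sign-extend the even and odd lanes separately, then subtract. */
static i16x8_model bar_model(i8x16_model a) {
    i16x8_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = (short)((short)a.lane[2 * i] - (short)a.lane[2 * i + 1]);
    return r;
}

/* bar_intrin: one shuffle deinterleaves even lanes into 0..7 and odd lanes
   into 8..15; extend_low/extend_high sign-extend each half, and a 16-bit
   lane-wise subtraction combines them. */
static i16x8_model bar_intrin_model(i8x16_model a) {
    static const int idx[16] = {0, 2, 4, 6, 8, 10, 12, 14,
                                1, 3, 5, 7, 9, 11, 13, 15};
    signed char s[16];
    i16x8_model r;
    for (int i = 0; i < 16; i++)
        s[i] = a.lane[idx[i]];
    for (int i = 0; i < 8; i++) {
        short lo = (short)s[i];      /* extend_low lane i  */
        short hi = (short)s[i + 8];  /* extend_high lane i */
        r.lane[i] = (short)(lo - hi);
    }
    return r;
}

/* Returns 1 if both models produce identical lanes for input a. */
static int bar_models_agree(i8x16_model a) {
    i16x8_model x = bar_model(a);
    i16x8_model y = bar_intrin_model(a);
    for (int i = 0; i < 8; i++)
        if (x.lane[i] != y.lane[i])
            return 0;
    return 1;
}
```

The inputs are 8-bit, so each difference lies in -255..255 and always fits in
a 16-bit lane.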
I think it's reasonable to expect foo and foo_intrin to generate roughly the
same code. The upper half of the shuffle doesn't matter, because extend_low only
reads lanes 0-7, so it could be all zeros or whatever is cheapest.
I'd be very impressed, on the other hand, if bar and bar_intrin generated the
same code, though I'm not sure how feasible that is.</pre>
</div>
</p>
</body>
</html>