<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [WebAssembly][SIMD] Codegen for trunc <16 x i32> to <16 x i8> can be improved"
href="https://bugs.llvm.org/show_bug.cgi?id=51006">51006</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[WebAssembly][SIMD] Codegen for trunc <16 x i32> to <16 x i8> can be improved
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: WebAssembly
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>jing.bao@intel.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>When I compile a small test case at -O3 for both X86 (-m32 -msse2 -msse3
-msse4.1 -msse4.2) and Wasm (-msimd128), I found that the WebAssembly codegen is
noticeably worse than the X86 codegen.
```
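unsigned char buf[65536];

/* Wrapper function added so the snippet compiles; its name is illustrative. */
void fill_squares(void) {
#pragma clang loop vectorize_width(16) interleave_count(1)
  for (int i = 0; i < (int)sizeof(buf); i++) {
    /* Multiply as unsigned: i * i in int overflows (UB) once i > 46340. */
    buf[i] = (unsigned char)((unsigned)i * (unsigned)i);
  }
}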
```
After loop vectorization, the code above produces a vector trunc:
```
%26 = trunc <16 x i32> %25 to <16 x i8>
```
During X86 instruction selection, this trunc is combined into extract_subvector
and X86ISD::PACKUS nodes (see combineVectorTruncation in
<a href="https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp">https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp</a>).
WebAssembly instruction selection currently has no similar combine for trunc,
so the node is legalized into a long chain of extract_vector_elt and
insert_vector_elt nodes.
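Roughly, the X86 approach masks each i32 lane down to its low 8 bits so that
the unsigned-saturating PACKUS instructions can never clamp, then packs
i32 -> i16 -> i8. WebAssembly SIMD has analogous instructions (v128.and,
i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u), so a similar lowering might apply.
The C below is only a scalar model of that idea under those assumptions, not
LLVM code; the helper names are hypothetical.
```c
/* Scalar model of a mask-then-narrow truncation of 16 x i32 to 16 x i8.
   Helper names are hypothetical; this models Wasm SIMD lane semantics
   and is not LLVM code. */

/* i16x8.narrow_i32x4_u, one lane: signed i32 -> u16 with saturation. */
static unsigned short narrow_u16(int x) {
  if (x < 0) return 0;
  if (x > 65535) return 65535;
  return (unsigned short)x;
}

/* i8x16.narrow_i16x8_u, one lane: signed i16 -> u8 with saturation. */
static unsigned char narrow_u8(short x) {
  if (x < 0) return 0;
  if (x > 255) return 255;
  return (unsigned char)x;
}

/* in[0..15] are the lanes of four i32x4 vectors; out is one i8x16. */
static void trunc_16xi32_to_16xi8(const unsigned int in[16],
                                  unsigned char out[16]) {
  short mid[16]; /* two i16x8 vectors: lanes 0-7 and lanes 8-15 */
  for (int i = 0; i < 16; i++) {
    /* v128.and with splat(0xFF): keep only the low byte, so the
       saturating narrows below never clamp. */
    int masked = (int)(in[i] & 0xFFu);
    /* i16x8.narrow_i32x4_u applied to (v0, v1) and to (v2, v3). */
    mid[i] = (short)narrow_u16(masked);
  }
  /* i8x16.narrow_i16x8_u applied to the two i16x8 results. */
  for (int i = 0; i < 16; i++)
    out[i] = narrow_u8(mid[i]);
}
```
Without the initial mask, a lane holding 256 would saturate to 255 instead of
truncating to 0, which is why the mask step is needed for trunc semantics.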
The resulting Wasm code, in text format, looks like this:
```
(i8x16.replace_lane 15
(i8x16.replace_lane 14
(i8x16.replace_lane 13
(i8x16.replace_lane 12
(i8x16.replace_lane 11
(i8x16.replace_lane 10
(i8x16.replace_lane 9
(i8x16.replace_lane 8
(i8x16.replace_lane 7
(i8x16.replace_lane 6
(i8x16.replace_lane 5
(i8x16.replace_lane 4
(i8x16.replace_lane 3
(i8x16.replace_lane 2
(i8x16.replace_lane 1
(i8x16.splat
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 7)
(local.get 7)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 6)
(local.get 6)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 5)
(local.get 5)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 4)
(local.get 4)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3))))
```
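For comparison with the scalarized output, the same i8x16 result could in
principle be assembled from three i8x16.shuffle operations: Wasm SIMD lanes are
little-endian, so byte 4k of an i32x4 vector is the low byte of lane k. The C
below is only a scalar model of that alternative under those assumptions; the
type and helper names are hypothetical, and which alternative is cheaper on
real engines is left open.
```c
/* Scalar model of building the i8x16 result with three i8x16.shuffle
   operations instead of per-lane extract/replace. Names are hypothetical;
   the byte layout is little-endian, as in Wasm SIMD. */

typedef struct { unsigned char b[16]; } v128_bytes;

/* i8x16.shuffle: result lane i takes byte idx[i] of the 32-byte
   concatenation a:b (indices 0-15 from a, 16-31 from b). */
static v128_bytes shuffle(v128_bytes a, v128_bytes b,
                          const unsigned char idx[16]) {
  v128_bytes r;
  for (int i = 0; i < 16; i++)
    r.b[i] = idx[i] < 16 ? a.b[idx[i]] : b.b[idx[i] - 16];
  return r;
}

/* Byte image of an i32x4 vector: lane k occupies bytes 4k..4k+3. */
static v128_bytes from_i32x4(const unsigned int lanes[4]) {
  v128_bytes r;
  for (int k = 0; k < 4; k++)
    for (int j = 0; j < 4; j++)
      r.b[4 * k + j] = (unsigned char)(lanes[k] >> (8 * j));
  return r;
}

static v128_bytes trunc_via_shuffles(const unsigned int in[16]) {
  /* Low byte of each i32 lane of a (lanes 0-3) and b (lanes 4-7);
     result lanes 8-15 are unused here and just pick byte 0. */
  static const unsigned char low_bytes[16] =
      {0, 4, 8, 12, 16, 20, 24, 28, 0, 0, 0, 0, 0, 0, 0, 0};
  /* Lower half of a followed by lower half of b. */
  static const unsigned char halves[16] =
      {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23};
  v128_bytes t0 = shuffle(from_i32x4(in), from_i32x4(in + 4), low_bytes);
  v128_bytes t1 = shuffle(from_i32x4(in + 8), from_i32x4(in + 12), low_bytes);
  return shuffle(t0, t1, halves);
}
```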
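The scalarized sequence of 16 i32x4.extract_lane and 15 i8x16.replace_lane
operations (plus an i8x16.splat) seems like it could be improved.</pre>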
</div>
</p>
</body>
</html>