[llvm-bugs] [Bug 51006] New: [WebAssembly][SIMD] Codegen for trunc <16 x i32> to <16 x i8> can be improved
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Jul 7 02:07:04 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=51006
Bug ID: 51006
Summary: [WebAssembly][SIMD] Codegen for trunc <16 x i32> to <16 x i8> can be improved
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: WebAssembly
Assignee: unassignedbugs at nondot.org
Reporter: jing.bao at intel.com
CC: llvm-bugs at lists.llvm.org
When I build a small test case at -O3 for both X86 (-m32 -msse2 -msse3 -msse4.1
-msse4.2) and Wasm (-msimd128), the code generated for WebAssembly is noticeably
worse than the X86 code.
```
unsigned char buf[65536];
#pragma clang loop vectorize_width(16) interleave_count(1)
for (int i = 0; i < sizeof(buf); i++) {
  buf[i] = (char)(i * i);
}
```
The code above generates a trunc after loop vectorization:
```
%26 = trunc <16 x i32> %25 to <16 x i8>
```
During X86 instruction selection, this trunc is optimized to extract_subvector
and X86ISD::PACKUS (see the function combineVectorTruncation in
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp).
Wasm instruction selection currently has no similar optimization for trunc, so
the node is legalized to a long sequence of extract_vector_elt and
insert_vector_elt operations.
The final Wasm code looks like this:
```
(i8x16.replace_lane 15
(i8x16.replace_lane 14
(i8x16.replace_lane 13
(i8x16.replace_lane 12
(i8x16.replace_lane 11
(i8x16.replace_lane 10
(i8x16.replace_lane 9
(i8x16.replace_lane 8
(i8x16.replace_lane 7
(i8x16.replace_lane 6
(i8x16.replace_lane 5
(i8x16.replace_lane 4
(i8x16.replace_lane 3
(i8x16.replace_lane 2
(i8x16.replace_lane 1
(i8x16.splat
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 7)
(local.get 7)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 6)
(local.get 6)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 5)
(local.get 5)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3)))
(i32x4.extract_lane 0
(local.tee 3
(i32x4.mul
(local.get 4)
(local.get 4)))))
(i32x4.extract_lane 1
(local.get 3)))
(i32x4.extract_lane 2
(local.get 3)))
(i32x4.extract_lane 3
(local.get 3))))
```
It seems this can be improved.
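One possible direction, offered as an untested sketch rather than a patch: mask
each i32 lane to its low 8 bits, then use the saturating narrow instructions
(i16x8.narrow_i32x4_u followed by i8x16.narrow_i16x8_u). Once the lanes are
masked, saturation can never trigger, so the result should equal a plain trunc.
The scalar C model below checks that reasoning one lane at a time. The helper
names are hypothetical and only model the per-lane behavior of the Wasm
instructions; they are not LLVM or intrinsic APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-lane model of i16x8.narrow_i32x4_u: the input is read as a signed
   i32 and saturated to the unsigned 16-bit range. (Hypothetical helper.) */
static uint16_t narrow_i32_to_u16(int32_t x) {
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* Per-lane model of i8x16.narrow_i16x8_u: the input is read as a signed
   i16 and saturated to the unsigned 8-bit range. (Hypothetical helper.) */
static uint8_t narrow_i16_to_u8(int16_t x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return (uint8_t)x;
}

/* Model of trunc <16 x i32> to <16 x i8> built from mask + two narrows.
   After the AND, every lane is in [0, 255], so neither narrow saturates. */
static void trunc_16xi32_to_16xi8(const int32_t in[16], uint8_t out[16]) {
    for (size_t i = 0; i < 16; i++) {
        int32_t masked = in[i] & 0xFF;            /* v128.and with splat(0xFF) */
        uint16_t half = narrow_i32_to_u16(masked); /* i32 lane -> i16 lane */
        out[i] = narrow_i16_to_u8((int16_t)half);  /* i16 lane -> i8 lane */
    }
}
```

In real vector code this would amount to four v128.and operations, two
i16x8.narrow_i32x4_u and one i8x16.narrow_i16x8_u for the four input vectors,
instead of sixteen lane extractions and insertions.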