[PATCH] D134257: [WebAssembly] Improve codegen for loading scalars from memory to v128
Fanchen Kong via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 21 20:41:36 PDT 2022
fanchenkong1 added a comment.
In D134257#3805936 <https://reviews.llvm.org/D134257#3805936>, @tlively wrote:
> Thanks! Do you need me to land this?
Yes, would you please help me land this change? The author name and email can be "Fanchen Kong <fanchen.kong at intel.com>". Thanks!
================
Comment at: llvm/test/CodeGen/WebAssembly/simd-offset.ll:1196-1199
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shl
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shr_s
----------------
tlively wrote:
> fanchenkong1 wrote:
> > tlively wrote:
> > > It looks like there is some room for improvement here. These shifts aren't necessary, are they? It would be good to at least add a TODO about cleaning them up.
> > Yes, a TODO can be added if further changes are needed. But I'm not sure I fully understand what would be needed to remove the shifts. Do you mean using two sign extends? e.g.
> > i16x8.extend_low_i8x16_s
> > i32x4.extend_low_i16x8_s
> Oh, I see, we need the shifts because they implement the sign extend part. Using the sequence of `extend_low` instructions is also a good idea. How would the native code from that solution compare?
On x64, for the shuffle + shifts solution, the native code may be:

shuffle byte (or packed zero extend, if the zero vector is detectable)
packed shift left
packed arithmetic shift right

For the sequence of `extend_low` instructions, the expected code is:

packed byte-to-dword sign extend

The current solution doesn't seem bad if the byte shuffle can be eliminated by the VM. The `extend_low` sequence seems a little better on x64, but I'm not sure whether that's the case for all platforms.
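For reference, the shl/shr_s pair in the generated code implements per-lane sign extension. A minimal sketch of the same trick on a single 32-bit lane (the function name and the Python modeling of 32-bit arithmetic are illustrative, not part of the patch):

```python
def sign_extend_i8_via_shifts(x: int) -> int:
    """Emulate the i32x4.shl 24 / i32x4.shr_s 24 pair on one lane:
    shifting left by 24 then arithmetic-shifting right by 24
    sign-extends the low byte of x to a full 32-bit value."""
    x = (x << 24) & 0xFFFFFFFF  # i32x4.shl 24 (lane wraps at 32 bits)
    if x & 0x80000000:          # model two's-complement sign for shr_s
        x -= 1 << 32
    return x >> 24              # i32x4.shr_s 24 (arithmetic shift)
```

For example, a lane holding the byte 0xFF becomes -1, while 0x7F stays 127; an `i32x4.extend_low_i16x8_s`-style instruction would produce the same lane values directly, without the two shifts.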
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134257/new/
https://reviews.llvm.org/D134257