[PATCH] D134257: [WebAssembly] Improve codegen for loading scalars from memory to v128
Fanchen Kong via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 21 20:41:36 PDT 2022
fanchenkong1 added a comment.
In D134257#3805936 <https://reviews.llvm.org/D134257#3805936>, @tlively wrote:
> Thanks! Do you need me to land this?
Yes, would you please help me land this change? The author name and email can be "Fanchen Kong <fanchen.kong at intel.com>". Thanks!
================
Comment at: llvm/test/CodeGen/WebAssembly/simd-offset.ll:1196-1199
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shl
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shr_s
----------------
tlively wrote:
> fanchenkong1 wrote:
> > tlively wrote:
> > > It looks like there is some room for improvement here. These shifts aren't necessary, are they? It would be good to at least add a TODO about cleaning them up.
> > Yes, a TODO can be added if further changes are needed. But I'm not sure I fully understand what would be needed to remove the shifts. Do you mean using two sign extends? e.g.
> > i16x8.extend_low_i8x16_s
> > i32x4.extend_low_i16x8_s
> Oh, I see, we need the shifts because they implement the sign extend part. Using the sequence of `extend_low` instructions is also a good idea. How would the native code from that solution compare?
On x64, for the shuffle + shifts solution, the native code may be:

shuffle byte (or packed zero extend, if the zero vector is detectable)
packed shift left
packed arithmetic shift right

For the sequence of `extend_low` instructions, the expected code is:

packed byte-to-dword sign extend

The current solution doesn't seem bad if the byte shuffle can be eliminated by the VM. The `extend_low` sequence seems a little better on x64, but I'm not sure whether that's the case for all platforms.
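For reference, the shl/shr_s pair in the generated code implements per-lane sign extension. A minimal sketch of the same trick on a single 32-bit lane (the function name and the Python modeling of 32-bit arithmetic are illustrative, not part of the patch):

```python
def sign_extend_i8_via_shifts(x: int) -> int:
    """Emulate the i32x4.shl 24 / i32x4.shr_s 24 pair on one lane:
    shifting left by 24 then arithmetic-shifting right by 24
    sign-extends the low byte of x to a full 32-bit value."""
    x = (x << 24) & 0xFFFFFFFF  # i32x4.shl 24 (lane wraps at 32 bits)
    if x & 0x80000000:          # model two's-complement sign for shr_s
        x -= 1 << 32
    return x >> 24              # i32x4.shr_s 24 (arithmetic shift)
```

For example, a lane holding the byte 0xFF becomes -1, while 0x7F stays 127; an `i32x4.extend_low_i16x8_s`-style instruction would produce the same lane values directly, without the two shifts.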
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134257/new/
https://reviews.llvm.org/D134257