[PATCH] D134257: [WebAssembly] Improve codegen for loading scalars from memory to v128
Thomas Lively via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 21 20:56:56 PDT 2022
tlively added inline comments.
================
Comment at: llvm/test/CodeGen/WebAssembly/simd-offset.ll:1196-1199
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shl
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shr_s
----------------
fanchenkong1 wrote:
> tlively wrote:
> > fanchenkong1 wrote:
> > > tlively wrote:
> > > > It looks like there is some room for improvement here. These shifts aren't necessary, are they? It would be good to at least add a TODO about cleaning them up.
> > > Yes, a TODO can be added if a further change is needed. But I'm not sure I fully understand what to do to remove the shifts. Do you mean using two sign extends? e.g.
> > > i16x8.extend_low_i8x16_s
> > > i32x4.extend_low_i16x8_s
> > Oh, I see, we need the shifts because they implement the sign extend part. Using the sequence of `extend_low` instructions is also a good idea. How would the native code from that solution compare?
> On x64, for the shuffle + shifts solution, the native code may be:
> shuffle byte (or a packed zero extend, if the zero vector is detectable)
> packed shift left
> packed arithmetic shift right
>
> For the sequence of extend_low instructions, the expected code would be:
> packed byte-to-dword sign extend
>
> The current solution seems not bad if the byte shuffle can be eliminated by the VM. The extend_low sequence seems a little better on x64, but I'm not sure that's the case for all platforms.
Thanks for the details. I'll land this as-is because it is an improvement over the status quo, but we should keep that other possibility in mind.
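For reference, the two lowerings discussed above should agree lane-by-lane. The sketch below is not from the patch; it is a hypothetical scalar model checking that the shift-based sequence (i32x4.shl 24 followed by i32x4.shr_s 24) sign-extends the low byte of each 32-bit lane exactly as a direct i8-to-i32 sign extension (the extend_low path) would:

```python
# Hypothetical model (not part of the patch): verify that shl 24 / shr_s 24
# on a 32-bit lane equals direct sign extension of the low byte.

def shl_shr_s(lane: int) -> int:
    """Model i32x4.shl 24 then i32x4.shr_s 24 on one lane."""
    shifted = (lane << 24) & 0xFFFFFFFF  # shift left, wrap to 32 bits
    # Reinterpret as signed i32, then arithmetic shift right by 24.
    signed = shifted - 0x100000000 if shifted & 0x80000000 else shifted
    return (signed >> 24) & 0xFFFFFFFF

def extend_low_s(byte: int) -> int:
    """Model direct i8 -> i32 sign extension (the extend_low path)."""
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

# Both lowerings agree for every possible byte value.
for b in range(256):
    assert shl_shr_s(b) == extend_low_s(b)
```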
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134257/new/
https://reviews.llvm.org/D134257