[PATCH] D134257: [WebAssembly] Improve codegen for loading scalars from memory to v128
Thomas Lively via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 21 20:56:56 PDT 2022
tlively added inline comments.
================
Comment at: llvm/test/CodeGen/WebAssembly/simd-offset.ll:1196-1199
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shl
+; CHECK-NEXT: i32.const 24
+; CHECK-NEXT: i32x4.shr_s
----------------
fanchenkong1 wrote:
> tlively wrote:
> > fanchenkong1 wrote:
> > > tlively wrote:
> > > > It looks like there is some room for improvement here. These shifts aren't necessary, are they? It would be good to at least add a TODO about cleaning them up.
> > > Yes, a TODO can be added if a further change is needed. But I'm not sure I fully understand what to do to remove the shifts. Do you mean using two sign extends? e.g.
> > > i16x8.extend_low_i8x16_s
> > > i32x4.extend_low_i16x8_s
> > Oh, I see, we need the shifts because they implement the sign extend part. Using the sequence of `extend_low` instructions is also a good idea. How would the native code from that solution compare?
> On x64, for the shuffle + shifts solution, the native code may be:
> shuffle byte (or a packed zero extend, if the zero vector is detectable)
> packed shift left
> packed arithmetic shift right
>
> For the sequence of extend_low instructions, the expected code would be:
> packed byte-to-dword sign extend
>
> The current solution seems not bad if the byte shuffle can be eliminated by the VM. The extend_low sequence seems a little better on x64, but I'm not sure that's the case for all platforms.
Thanks for the details. I'll land this as-is because it is an improvement over the status quo, but we should keep that other possibility in mind.
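For reference, the two lowerings discussed above should agree lane-by-lane. The sketch below is not from the patch; it is a hypothetical scalar model checking that the shift-based sequence (i32x4.shl 24 followed by i32x4.shr_s 24) sign-extends the low byte of each 32-bit lane exactly as a direct i8-to-i32 sign extension (the extend_low path) would:

```python
# Hypothetical model (not part of the patch): verify that shl 24 / shr_s 24
# on a 32-bit lane equals direct sign extension of the low byte.

def shl_shr_s(lane: int) -> int:
    """Model i32x4.shl 24 then i32x4.shr_s 24 on one lane."""
    shifted = (lane << 24) & 0xFFFFFFFF  # shift left, wrap to 32 bits
    # Reinterpret as signed i32, then arithmetic shift right by 24.
    signed = shifted - 0x100000000 if shifted & 0x80000000 else shifted
    return (signed >> 24) & 0xFFFFFFFF

def extend_low_s(byte: int) -> int:
    """Model direct i8 -> i32 sign extension (the extend_low path)."""
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

# Both lowerings agree for every possible byte value.
for b in range(256):
    assert shl_shr_s(b) == extend_low_s(b)
```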
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134257/new/
https://reviews.llvm.org/D134257