[llvm] [WebAssembly] Mask undef shuffle lanes (PR #149084)

Thu Jul 17 01:35:28 PDT 2025

sparker-arm wrote:

I haven't tested with any VMs yet, as I doubt any of them would take advantage of this at the moment.

The main advantage of this change is that it makes it possible to identify 'narrow' shuffles that can be mapped to target instructions. Even though Wasm SIMD vectors are 128-bit, that doesn't always mean we're operating on the full width. Imagine that we're operating on a 4 x 16-bit vector and we want the result to be the even lanes: 0, 2, 4, 6. Because every lane index of a wasm shuffle has to be encoded, the lanes we don't care about still get a concrete value, so the wasm shuffle will be 0, 2, 4, 6, 0, 0, 0, 0.
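To make the example concrete: `i8x16.shuffle` always takes 16 byte indices, so a mask over 16-bit lanes is expanded two bytes per lane, and every byte must name a real source byte. The sketch below is only an illustration, not LLVM's lowering code. It assumes undef lanes are represented as `None` and, following the example above, that they are materialized as lane 0. `materialize_undef` and `lanes_to_bytes` are hypothetical helpers.

```python
from typing import List, Optional


def materialize_undef(lane_mask: List[Optional[int]], fill: int = 0) -> List[int]:
    # The binary encoding has no "undef" index, so every undef lane
    # (None here, by assumption) is replaced with a concrete lane.
    return [fill if lane is None else lane for lane in lane_mask]


def lanes_to_bytes(lane_mask: List[Optional[int]], lane_bytes: int) -> List[Optional[int]]:
    # Expand a lane-level mask into the 16 byte indices of i8x16.shuffle.
    # An undef lane stays undef (None) in every byte it covers.
    assert len(lane_mask) * lane_bytes == 16, "wasm shuffles produce 16 bytes"
    out: List[Optional[int]] = []
    for lane in lane_mask:
        for b in range(lane_bytes):
            out.append(None if lane is None else lane * lane_bytes + b)
    return out


even_lanes = [0, 2, 4, 6, None, None, None, None]  # 4 x 16-bit result we want
encoded = lanes_to_bytes(materialize_undef(even_lanes), 2)
print(encoded)  # [0, 1, 4, 5, 8, 9, 12, 13, 0, 1, 0, 1, 0, 1, 0, 1]
```

Once encoded, the upper eight bytes look like a deliberate request for bytes 0 and 1. A consumer of the module has no way to tell that they were never needed.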

I've optimised the AArch64 backend in V8 so that these cases are often handled by splatting lane zero first, but this is still far from optimal.

With the undef mask, during isel and with very little overhead, the backend can recognize this as an 'unzip' operation instead of an arbitrary lane shuffle.
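As a rough illustration of why the mask helps at isel time, here is a minimal matcher for an 'unzip even' pattern, in the style of AArch64 `UZP1`, which takes the even elements of the concatenation of its two operands. This is a sketch under assumptions, not V8's or LLVM's matcher. It works on lane indices where indices at or above the lane count refer to the second operand, and it treats undef lanes (`None`) as wildcards. `is_unzip_even` is a hypothetical name.

```python
from typing import List, Optional


def is_unzip_even(mask: List[Optional[int]]) -> bool:
    # Result lane i must come from lane 2*i of (a ++ b); with 8 lanes,
    # indices 8..15 select lanes of the second operand b.
    # Undef lanes (None) are "don't care" and match anything.
    return all(m is None or m == 2 * i for i, m in enumerate(mask))


print(is_unzip_even([0, 2, 4, 6, None, None, None, None]))  # True: undef lanes match
print(is_unzip_even([0, 2, 4, 6, 0, 0, 0, 0]))              # False: lane 4 must be 8
print(is_unzip_even([0, 2, 4, 6, 8, 10, 12, 14]))           # True: full two-input unzip
```

The check is a single linear pass over the mask. Once the undef lanes have been concretized to 0, however, the same narrow shuffle no longer matches, and the backend has to fall back to a general lane shuffle.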

The extend_low operations also provide the same information as this mask but, if the shuffle has multiple users, recovering it is unlikely to be such a simple optimisation during isel. I've created an [optimisation](https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/wasm-shuffle-reducer.h) in V8 specifically for figuring out undef lanes, and it's non-trivial. This undef mask change would make it much simpler for other runtimes to generate good shuffle code.
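Here is a much-simplified sketch of the kind of analysis a runtime has to do without the mask. It is not the V8 reducer. A shuffle lane can be treated as undef only if none of the shuffle's users reads it, so the demanded lanes are the union over all users. The opcode names are real Wasm instructions, but the `DEMANDED_BY_USER` table and the helper functions are hypothetical. Any user not in the table is conservatively assumed to read every lane.

```python
from typing import Dict, Iterable, List, Optional, Set

LANES = 8  # i16x8 shuffle result

# Hypothetical table: which i16x8 lanes each user instruction reads.
DEMANDED_BY_USER: Dict[str, Set[int]] = {
    "i32x4.extend_low_i16x8_s": set(range(0, 4)),
    "i32x4.extend_low_i16x8_u": set(range(0, 4)),
    "i32x4.extend_high_i16x8_s": set(range(4, 8)),
    "i32x4.extend_high_i16x8_u": set(range(4, 8)),
}


def demanded_lanes(users: Iterable[str]) -> Set[int]:
    # Union over all users; unknown users demand every lane.
    demanded: Set[int] = set()
    for user in users:
        demanded |= DEMANDED_BY_USER.get(user, set(range(LANES)))
    return demanded


def mask_undemanded(mask: List[int], users: Iterable[str]) -> List[Optional[int]]:
    # Replace lanes that no user reads with None (undef).
    demanded = demanded_lanes(users)
    return [m if i in demanded else None for i, m in enumerate(mask)]


concrete = [0, 2, 4, 6, 0, 0, 0, 0]
print(mask_undemanded(concrete, ["i32x4.extend_low_i16x8_s"]))
# [0, 2, 4, 6, None, None, None, None]
print(mask_undemanded(concrete, ["i32x4.extend_low_i16x8_s", "i16x8.add"]))
# [0, 2, 4, 6, 0, 0, 0, 0]
```

The second call shows the multiple-users problem. A single user that reads the whole vector hides the undef lanes from every other user, and a real implementation must also follow users across blocks and through other operations. An explicit undef mask would remove the need for this inference.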

As you may have noticed, I've found WebAssembly shuffles to be a real pain! I would really like to see a revision to the spec so that these undef lanes/bytes can be explicitly encoded :)

https://github.com/llvm/llvm-project/pull/149084