[llvm] [WebAssembly] Mask undef shuffle lanes (PR #149084)

Sam Parker via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 6 05:33:06 PDT 2025


sparker-arm wrote:

I've applied #146864 on top of wasi-sdk and used libyuv as a guide, because it can vectorize in weird and wonderful ways.

Simply by searching from the extend_low and extmul_low operations to determine whether the high half of a shuffle is required, it's possible to get significant speedups:

```
Benchmark                                     Speedup(%)
------------------------------------------  ------------
libyuv-ARGBScaleDownBy2_Bilinear-run_times        21.418
libyuv-ARGBScaleDownBy2_Box-run_times             21.705
libyuv-ARGBScaleDownBy2_None-run_times            15.899
libyuv-ARGBScaleDownBy4_Box-run_times             22.627
libyuv-ARGBScaleDownBy4_Linear-run_times          -0.084
libyuv-ColourI420-run_times                        1.825
libyuv-ColourI422-run_times                        1.991
libyuv-ColourJ420-run_times                        2.04
libyuv-ColourJ422-run_times                        1.972
libyuv-NV12ToI420-run_times                      151.189
libyuv-NV21ToI420-run_times                      152.743
libyuv-P010ToI010-run_times                        8.704
libyuv-P012ToI012-run_times                       10.693
libyuv-UVScaleDownBy3by4_Linear-run_times          0.036
libyuv-UVScaleDownBy3by4_None-run_times           -0.2
```
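For anyone following along, the check is roughly the following. This is only a minimal sketch over a toy dataflow graph, not the code in either PR: `Op`, `Node`, `Graph`, `readsLowHalfOnly` and `highHalfRequired` are hypothetical names made up for illustration, and it walks from the shuffle to its users rather than from the extend operations back to their operands, which should give the same answer for this simple case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of 128-bit SIMD operations.
enum class Op { Shuffle, ExtendLow, ExtMulLow, ExtendHigh, Add };

struct Node {
  Op op;
  std::vector<std::size_t> operands; // indices of input nodes in the graph
};

using Graph = std::vector<Node>;

// extend_low and extmul_low only read the low 64 bits of their inputs.
bool readsLowHalfOnly(Op op) {
  return op == Op::ExtendLow || op == Op::ExtMulLow;
}

// The high half of a shuffle is required unless every visible user
// reads only the low half of its inputs.
bool highHalfRequired(const Graph &g, std::size_t shuffle) {
  bool hasUser = false;
  for (const Node &user : g) {
    for (std::size_t operand : user.operands) {
      if (operand != shuffle)
        continue;
      hasUser = true;
      if (!readsLowHalfOnly(user.op))
        return true; // some user may read the high lanes
    }
  }
  // Stay conservative when no users are visible.
  return !hasUser;
}
```

When `highHalfRequired` returns false, the upper lanes of the shuffle mask are don't-care, so they can be chosen to make the shuffle as cheap as possible to lower.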

So I think that, with the revised approach to memory interleaving, this undef AND mask hack isn't really going to add anything apart from extra work for the runtimes.

It would still be really nice to have this information encoded in the instruction though.

https://github.com/llvm/llvm-project/pull/149084

