[llvm] [WebAssembly] Mask undef shuffle lanes (PR #149084)
Sam Parker via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 6 05:33:06 PDT 2025
sparker-arm wrote:
I've applied #146864 on top of wasi-sdk and used libyuv as a guide, because it can vectorize in weird and wonderful ways.
By simply searching from `extend_low` and `extmul_low` operations to determine whether the high half of a shuffle is required, it's possible to get significant speedups:
```
Benchmark                                     Speedup(%)
------------------------------------------  ------------
libyuv-ARGBScaleDownBy2_Bilinear-run_times        21.418
libyuv-ARGBScaleDownBy2_Box-run_times             21.705
libyuv-ARGBScaleDownBy2_None-run_times            15.899
libyuv-ARGBScaleDownBy4_Box-run_times             22.627
libyuv-ARGBScaleDownBy4_Linear-run_times          -0.084
libyuv-ColourI420-run_times                        1.825
libyuv-ColourI422-run_times                        1.991
libyuv-ColourJ420-run_times                         2.04
libyuv-ColourJ422-run_times                        1.972
libyuv-NV12ToI420-run_times                      151.189
libyuv-NV21ToI420-run_times                      152.743
libyuv-P010ToI010-run_times                        8.704
libyuv-P012ToI012-run_times                       10.693
libyuv-UVScaleDownBy3by4_Linear-run_times          0.036
libyuv-UVScaleDownBy3by4_None-run_times             -0.2
```
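The check described earlier rests on one fact from the WebAssembly SIMD spec. Every `extend_low` and `extmul_low` variant reads only the low 64 bits (bytes 0..7) of its `v128` operand(s). If those are a shuffle's only users, its high eight byte lanes are dead.

Below is a minimal sketch of that check. It is not the code from #146864, and it makes the following simplifying assumptions:
* The shuffle is modelled as a 16-entry byte mask, as in `i8x16.shuffle`, with `None` standing for an undef lane.
* The shuffle's users are given as a flat list of opcode names.
* Only direct users are inspected.

```python
from typing import List, Optional, Sequence

# Wasm SIMD opcodes that read only the low 8 bytes of their v128 operand(s).
# extmul_low reads the low half of *both* operands, so it does not matter
# which operand the shuffle feeds.
LOW_HALF_USERS = frozenset({
    "i16x8.extend_low_i8x16_s", "i16x8.extend_low_i8x16_u",
    "i32x4.extend_low_i16x8_s", "i32x4.extend_low_i16x8_u",
    "i64x2.extend_low_i32x4_s", "i64x2.extend_low_i32x4_u",
    "i16x8.extmul_low_i8x16_s", "i16x8.extmul_low_i8x16_u",
    "i32x4.extmul_low_i16x8_s", "i32x4.extmul_low_i16x8_u",
    "i64x2.extmul_low_i32x4_s", "i64x2.extmul_low_i32x4_u",
})


def high_half_required(user_opcodes: Sequence[str]) -> bool:
    """True unless every direct user reads only the low 8 bytes."""
    return any(op not in LOW_HALF_USERS for op in user_opcodes)


def mask_undemanded_lanes(mask: Sequence[Optional[int]],
                          user_opcodes: Sequence[str]) -> List[Optional[int]]:
    """Copy a 16-lane byte shuffle mask, marking lanes 8..15 undef (None)
    when no user needs the high half of the shuffle result."""
    if len(mask) != 16:
        raise ValueError("i8x16.shuffle masks have exactly 16 lanes")
    if high_half_required(user_opcodes):
        return list(mask)
    return list(mask[:8]) + [None] * 8


if __name__ == "__main__":
    # Byte interleave of the low halves of two vectors (lanes 16..31 = 2nd input).
    interleave = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
    print(mask_undemanded_lanes(interleave, ["i16x8.extend_low_i8x16_u"]))
    print(mask_undemanded_lanes(interleave, ["i16x8.extend_low_i8x16_u", "i8x16.add"]))
```

In the first call the only user is an `extend_low`, so lanes 8..15 come back as `None`. In the second call an `i8x16.add` also consumes the shuffle, so the mask is returned unchanged. A real backend would additionally follow users through bitcasts and other lane-preserving operations, which this sketch deliberately omits.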
So, with the revised approach to memory interleaving, I don't think this undef-plus-AND-mask hack is going to add anything apart from extra work for the runtimes.
It would still be really nice to have this information encoded in the instruction, though.
https://github.com/llvm/llvm-project/pull/149084