[llvm] [LV] Enable considering higher VFs when data extend ops are present i… (PR #137593)

Tue Apr 29 04:24:36 PDT 2025

sushgokh wrote:

> Could you explain what the performance difference was and why it led to improvements? What did the two versions of the assembly look like? Thanks

For the benchmark, the cost model currently selects `VF=1`, i.e. the loop stays scalar. However, we know that the code is more profitable with `VF=vscale x 4`. The performance difference is ~14%.

Code that is somewhat similar to the benchmark can be found here: https://godbolt.org/z/Wfjhb8PPT
This is also getting scalarized, but with `vscale x 16` it is more profitable, as measured with llvm-mca.
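
To make the shape of the problem concrete without the link, here is a minimal sketch of a loop with data extend ops. It is a hypothetical illustration only, not the code behind the godbolt link and not the benchmark; the name `dot_u8` and the element types are assumptions. The loads are 8-bit while the arithmetic is 32-bit, so every element is zero-extended before use. On SVE, `vscale x 16` lanes of `i8` fill one vector register, while the same register holds only `vscale x 4` lanes of `i32`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (not from the PR or the benchmark):
// narrow 8-bit loads are zero-extended to 32 bits before the
// multiply-accumulate, so the loop mixes i8 and i32 element types.
uint32_t dot_u8(const uint8_t *a, const uint8_t *b, size_t n) {
  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    // Each static_cast becomes a zext in LLVM IR.
    sum += static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
  }
  return sum;
}
```

Whether a given VF actually wins for a loop like this depends on the target and the cost model, which is what the llvm-mca measurement above is meant to check.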

https://github.com/llvm/llvm-project/pull/137593