[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)

Fri Nov 7 05:27:42 PST 2025

skachkov-sc wrote:

> I think you can probably make this independent of #140721 by first just supporting cases where the compressed store does not alias any of the other memory accesses?

Yes, the changes in LAA are fully independent, so we can skip them for now.

> Curious if you already have any runtime performance numbers you could share?

We've benchmarked the following loop pattern:
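```cpp
#include <cstddef>

// benchmark() is run 32 times

template<typename T>
void benchmark(T *dst, const T *src) {
  size_t idx = 0;
  // Copy only the non-zero elements of src, packing them contiguously in dst.
  for (size_t i = 0; i < 1024; ++i) {
    T cur = src[i];
    if (cur != static_cast<T>(0))
      dst[idx++] = cur;
  }
  // Terminate the packed output with a zero.
  dst[idx] = static_cast<T>(0);
}
```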
On a SpacemiT X60 core (RISC-V CPU with VLEN=256), the results are as follows:

| Type    | cycles (scalar) | cycles (vector) | speedup |
| ------- | --------------- | --------------- | ------- |
| int16_t | 189151          | 56795           | 3.33x   |
| int32_t | 205712          | 87196           | 2.36x   |
| int64_t | 205757          | 150115          | 1.37x   |

There were no branch mispredictions for the `if (cur != static_cast<T>(0))` branch in the scalar case here (due to the specifics of the data in the `src` array), so I think the speedup could be even larger for more random inputs. We haven't observed any significant changes on the SPEC benchmarks, though.
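
For readers unfamiliar with the compressing pattern, here is a rough scalar model of what a vectorized version of the benchmarked loop does per vector iteration: compute a mask for the whole chunk, store the active lanes contiguously, then advance the output index by the number of active lanes. This is only an illustrative sketch, not the code the vectorizer generates. The name `compress_nonzero_chunked`, the chunk width `kVF`, the explicit `n` parameter, and the returned count are all assumptions made for the example.

```cpp
#include <cstddef>

// Hypothetical chunk width standing in for the vectorization factor.
constexpr std::size_t kVF = 8;

// Scalar model of a compressing loop processed kVF elements at a time.
// Returns the number of non-zero elements written (an addition for this
// sketch; the benchmarked function returns void).
template <typename T>
std::size_t compress_nonzero_chunked(T *dst, const T *src, std::size_t n) {
  std::size_t idx = 0;
  for (std::size_t i = 0; i < n; i += kVF) {
    // The last chunk may be shorter than kVF.
    std::size_t len = (n - i < kVF) ? (n - i) : kVF;

    // Step 1: evaluate the condition for every lane of the chunk.
    bool mask[kVF] = {};
    for (std::size_t l = 0; l < len; ++l)
      mask[l] = src[i + l] != static_cast<T>(0);

    // Step 2: "compressed store": active lanes are written back to back,
    // starting at dst + idx, preserving their original order.
    std::size_t active = 0;
    for (std::size_t l = 0; l < len; ++l)
      if (mask[l])
        dst[idx + active++] = src[i + l];

    // Step 3: advance the output index by the number of active lanes.
    idx += active;
  }
  // Terminate the packed output with a zero, as in the benchmark.
  dst[idx] = static_cast<T>(0);
  return idx;
}
```

In this model the data-dependent branch from the scalar loop becomes a mask, and the only value carried between chunks is `idx`, which grows by the mask's population count on each iteration.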

https://github.com/llvm/llvm-project/pull/140723