[llvm-branch-commits] [llvm] [LoopVectorize] Support vectorization of compressing patterns in VPlan (PR #140723)
Sergey Kachkov via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Fri Nov 7 05:27:42 PST 2025
skachkov-sc wrote:
> I think you can probably make this independent of #140721 by first just supporting cases where the compressed store does not alias any of the other memory accesses?
Yes, the changes in LAA are fully independent; we can skip them for now.
> Curious if you already have any runtime performance numbers you could share?
We've benchmarked the following loop pattern:
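```
#include <cstddef>

// benchmark() is run 32 times
// dst must have room for 1025 elements (up to 1024 copies plus the terminator)
template <typename T>
void benchmark(T *dst, const T *src) {
  std::size_t idx = 0;
  for (std::size_t i = 0; i < 1024; ++i) {
    T cur = src[i];
    // Compressing pattern: store only non-zero elements, packed contiguously
    if (cur != static_cast<T>(0))
      dst[idx++] = cur;
  }
  // Terminate the packed output with a zero
  dst[idx] = static_cast<T>(0);
}
```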
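For readers unfamiliar with the pattern, here is a rough scalar model of what a vectorized form of this loop does conceptually: build a per-lane mask, store the active lanes contiguously, then advance the output index by the number of active lanes. This is only a sketch under assumptions; `compress_nonzero` and `kLanes` are hypothetical names made up for illustration, and this is not the code that the PR generates.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical lane count, chosen only for illustration.
constexpr std::size_t kLanes = 8;

// Scalar model of a chunked "mask + compress store" loop.
// Returns the number of elements written to dst.
template <typename T>
std::size_t compress_nonzero(T *dst, const T *src, std::size_t n) {
  std::size_t idx = 0;
  for (std::size_t base = 0; base < n; base += kLanes) {
    // The last chunk may be shorter than kLanes.
    const std::size_t lanes = (n - base < kLanes) ? (n - base) : kLanes;

    // Step 1: compute the predicate for every lane of the chunk.
    bool mask[kLanes] = {};
    for (std::size_t l = 0; l < lanes; ++l)
      mask[l] = src[base + l] != static_cast<T>(0);

    // Step 2: "compress store": active lanes are written back to back,
    // preserving their original order.
    std::size_t active = 0;
    for (std::size_t l = 0; l < lanes; ++l)
      if (mask[l])
        dst[idx + active++] = src[base + l];

    // Step 3: advance the output index by the number of active lanes.
    idx += active;
  }
  return idx;
}
```

The result matches the scalar loop above element for element; only the iteration structure differs.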
On a SpacemiT-X60 core (a RISC-V CPU with VLEN=256), the results are as follows:
| Type | cycles (scalar) | cycles (vector) | speedup |
| ---------|---------------------|----------------------|-------------|
| int16_t | 189151 | 56795 | 3.33x |
| int32_t | 205712 | 87196 | 2.36x |
| int64_t | 205757 | 150115 | 1.37x |
There were no branch mispredicts for the `if (cur != static_cast<T>(0))` branch in the scalar case here (due to the specifics of the data in the `src` array), so I expect the speedup to be even larger for more random inputs. We haven't observed any significant changes on the SPEC benchmarks, though.
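To try less predictable inputs, one option is to fill `src` so that each element is zero with a chosen probability. The helper below is a minimal sketch, assuming `int32_t` data; `make_input`, `zero_fraction`, and `seed` are hypothetical names and were not part of the measurements above.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds n elements where each one is zero with probability zero_fraction
// and 1 otherwise. A fixed seed keeps runs reproducible.
std::vector<std::int32_t> make_input(std::size_t n, double zero_fraction,
                                     unsigned seed) {
  std::mt19937 gen(seed);
  std::bernoulli_distribution is_zero(zero_fraction);
  std::vector<std::int32_t> data(n);
  for (auto &v : data)
    v = is_zero(gen) ? 0 : 1;
  return data;
}
```

A `zero_fraction` near 0.5 should make the scalar branch hardest to predict, while values near 0 or 1 approach the well-predicted case.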
https://github.com/llvm/llvm-project/pull/140723