[llvm] [LoopPeel] Peel last iteration to enable load widening (PR #173420)
Guy David via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 24 05:48:23 PST 2025
guy-david wrote:
Peeling was the simpler choice because it has fewer constraints, but you raise a good point.
I was thinking about cases where 7 bytes are loaded per iteration, for example. Unrolling by two would then translate into three load instructions (i64 + i32 + i16 per two iterations), which is one more load than the peeled version needs (one i64 per iteration, so two per two iterations).
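To make the load counts concrete, here is a rough sketch of the arithmetic. `loads_needed` is a hypothetical helper, not part of the PR, and it assumes a simple cost model in which a contiguous byte range is covered greedily with power-of-two loads of at most 8 bytes:

```cpp
#include <cstddef>

// Hypothetical cost model (an assumption, not LLVM's): number of
// power-of-two loads (8, 4, 2, 1 bytes) needed to cover `bytes`
// contiguous bytes exactly, choosing the widest load first.
constexpr std::size_t loads_needed(std::size_t bytes) {
  std::size_t count = 0;
  for (std::size_t width = 8; width >= 1; width /= 2) {
    count += bytes / width; // how many loads of this width fit
    bytes %= width;         // bytes still left to cover
  }
  return count;
}

// Unrolled by two: 14 contiguous bytes -> i64 + i32 + i16.
static_assert(loads_needed(14) == 3, "three loads per two iterations");
// Peeled and widened: each 7-byte access becomes a single i64.
static_assert(2 * loads_needed(8) == 2, "two loads per two iterations");
```

Under this model the unrolled form pays three loads per two iterations while the widened form pays two, which is the one-load difference mentioned above.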
Here's an improvised example:
https://godbolt.org/z/G8s7ba88q
```
#include <cstdlib>
#include <cstdint>
uint64_t copy(uint8_t *src, size_t n) {
  uint64_t sum = 0; // Initialized so the first += does not read an indeterminate value.
  for (size_t i = 0; i < n; i++) {
    sum += src[0];
    sum += src[1];
    sum += src[2];
    sum /= 11; // <- Doesn't allow vectorization.
    src += 3;
  }
  return sum;
}
```
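For illustration, this is roughly what the peeled and widened form of the loop above could look like if written by hand. It is only a sketch: `sum_reference` and `sum_peeled` are hypothetical names, the byte extraction assumes a little-endian target, and the code the pass actually emits may differ. The point is that the 4-byte load reads one byte past the current 3-byte group, which is only known to be in bounds when another iteration follows, so the last iteration is peeled off and keeps its byte loads.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same computation as the example above, with `sum` zero-initialized.
uint64_t sum_reference(const uint8_t *src, size_t n) {
  uint64_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    sum += src[0];
    sum += src[1];
    sum += src[2];
    sum /= 11;
    src += 3;
  }
  return sum;
}

// Hand-written sketch of the peeled form (assumes little-endian).
uint64_t sum_peeled(const uint8_t *src, size_t n) {
  uint64_t sum = 0;
  if (n == 0)
    return sum;
  // All iterations except the last: one widened 4-byte load.
  // The extra byte, src[3], belongs to the next iteration, so it is in bounds.
  for (size_t i = 0; i + 1 < n; i++) {
    uint32_t w;
    std::memcpy(&w, src, sizeof(w));
    sum += w & 0xff;         // src[0]
    sum += (w >> 8) & 0xff;  // src[1]
    sum += (w >> 16) & 0xff; // src[2]
    sum /= 11;
    src += 3;
  }
  // Peeled last iteration: byte loads only, nothing is read past the end.
  sum += src[0];
  sum += src[1];
  sum += src[2];
  sum /= 11;
  return sum;
}
```

Both functions should return the same value for any buffer holding at least `3 * n` bytes.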
https://github.com/llvm/llvm-project/pull/173420