[llvm] [LoopPeel] Peel last iteration to enable load widening (PR #173420)

Guy David via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 24 05:48:23 PST 2025


guy-david wrote:

Peeling was the simpler choice because it has fewer constraints, but you raise a good point.
I was thinking about cases where, for example, 7 bytes are loaded per iteration. Unrolling by two would then translate into three load instructions (i64 + i32 + i16 per two iterations), which is one more load than the peeled version needs (one i64 per iteration, so two per two iterations).
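
To make the load arithmetic concrete, here is a minimal sketch that only counts loads; it does not model the pass itself. The helper names `exactLoads` and `widenedLoads` are hypothetical, and the sketch assumes power-of-two load widths of at most 8 bytes.

```cpp
#include <cstddef>

// Hypothetical helper: number of power-of-two loads (at most 8 bytes each)
// needed to cover exactly `bytes` contiguous bytes without over-reading.
constexpr std::size_t exactLoads(std::size_t bytes) {
    std::size_t loads = bytes / 8;              // full i64 loads
    for (std::size_t rem = bytes % 8; rem != 0; rem &= rem - 1)
        ++loads;                                // one load per set bit (i32/i16/i8)
    return loads;
}

// Hypothetical helper: loads needed when every iteration is allowed to
// over-read up to the next 8-byte multiple. This is only safe while the
// extra bytes belong to a later iteration, hence the peeled last iteration.
constexpr std::size_t widenedLoads(std::size_t bytesPerIter, std::size_t iters) {
    return ((bytesPerIter + 7) / 8) * iters;
}

// Two iterations of 7 bytes: unrolled by two = 14 bytes = i64 + i32 + i16.
static_assert(exactLoads(14) == 3, "unrolled by two: three loads");
// Same two iterations with one widened i64 load each.
static_assert(widenedLoads(7, 2) == 2, "peeled: two loads");
```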

Here's an improvised example:
https://godbolt.org/z/G8s7ba88q
```
#include <cstdlib>
#include <cstdint>

uint64_t copy(uint8_t *src, size_t n) {
    uint64_t sum = 0; // Initialized; reading an uninitialized value would be undefined behavior.
    for (size_t i = 0; i < n; i++) {
        sum += src[0];
        sum += src[1];
        sum += src[2];
        sum /= 11; // <- Doesn't allow vectorization.
        src += 3;
    }
    return sum;
}
```
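
For illustration, the loop above can be rewritten by hand at source level into roughly the shape the transformation is aiming for. This is a sketch under assumptions, not the output of the pass, which works on LLVM IR. The function names `sum_scalar` and `sum_peeled_widened` are hypothetical, and the accumulator is initialized to zero so that the two versions can be compared.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference version: three single-byte loads per iteration.
uint64_t sum_scalar(const uint8_t *src, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += src[0];
        sum += src[1];
        sum += src[2];
        sum /= 11; // Loop-carried dependency, as in the original example.
        src += 3;
    }
    return sum;
}

// Hand-peeled version: every iteration except the last reads 4 bytes at once.
// The 4th byte is the first byte of the next iteration, so it is in bounds.
uint64_t sum_peeled_widened(const uint8_t *src, size_t n) {
    uint64_t sum = 0;
    if (n == 0)
        return sum;
    for (size_t i = 0; i + 1 < n; i++) {
        uint8_t b[4];
        std::memcpy(b, src, sizeof b); // One 4-byte read; b[3] is unused.
        sum += b[0];
        sum += b[1];
        sum += b[2];
        sum /= 11;
        src += 3;
    }
    // Peeled last iteration: only the 3 bytes that are known to exist.
    sum += src[0];
    sum += src[1];
    sum += src[2];
    sum /= 11;
    return sum;
}
```

Copying into a byte array with `std::memcpy` keeps the sketch independent of endianness; whether a compiler turns it into a single 32-bit load depends on the target and optimization level.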

https://github.com/llvm/llvm-project/pull/173420

