[llvm] [IndVarSimplify] Add rewriting ptr-add phis with offset addressing (PR #171151)
John Brawn via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 07:30:20 PST 2025
john-brawn-arm wrote:
I'm somewhat hesitant about this approach: it deals with the vectorizer not handling certain kinds of inputs by modifying the input, but the rewrite will always apply, even when the vectorizer won't run (e.g. when the target doesn't have vector instructions). I'm also not sure it would be universally a good thing even when we know we're vectorizing.
Looking at the vectorizer, it appears it won't vectorize the example in https://discourse.llvm.org/t/vectorizing-matrix-transpose-with-runtime-stride-on-aarch64-vplan-vprecipe-questions/89009 because AllowStridedPointerIVs in LoopVectorizationLegality.cpp is false by default. Searching for issues related to strided accesses, I found https://github.com/llvm/llvm-project/issues/129474, which says that the opposite of this transformation (turning array-index addressing into pointer-increment addressing) is beneficial.
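To make the two terms concrete, here is a minimal sketch of both forms of the same strided store (my own illustration, not the code from that issue):
```
// Array-index (offset) addressing: the address is recomputed as base + 5*j.
void store_offset(double* a, int n)
{
  for (int j = 0; j < n; j++)
    a[5 * j] = 1;
}

// Pointer-increment addressing: a pointer IV is bumped by a constant stride.
void store_increment(double* a, int n)
{
  double *p = a;
  for (int j = 0; j < n; j++) {
    *p = 1;
    p += 5;
  }
}
```
This PR rewrites the second form into the first, whereas the issue argues that going the other way is beneficial.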
If I take the example in the above issue and convert it to pointer-increment addressing in an inner loop:
```
void func(double* a, int n)
{
  for (int i = 0; i < n; i++) {
    double *p = a + i;
    for (int j = 0; j < n; j++) {
      *p = 1;
      p += 5;
    }
  }
}
```
then with ``clang --target=aarch64-none-elf -O3 -march=armv8-a+sve -fno-unroll-loops -mllvm -sve-gather-overhead=1 -mllvm -sve-scatter-overhead=1`` the vector loop currently generated is
```
subs x17, x17, x9
st1d { z1.d }, p0, [x16, z0.d]
add x16, x16, x14
b.ne .LBB1_6
```
but with this PR what's generated is
```
add z4.d, z3.d, z1.d
and z3.d, z3.d, #0xffffffff
subs w16, w16, w9
mul z3.d, z3.d, #40
st1d { z2.d }, p0, [x15, z3.d]
mov z3.d, z4.d
b.ne .LBB1_6
```
which looks worse: the offset-addressing form needs an extra vector ``and``, ``mul`` and ``mov`` each iteration to materialize the offsets, where the pointer-increment form just adds a fixed amount to a scalar base register.
I think it would be worth looking into what happens if AllowStridedPointerIVs is enabled. In the example in https://discourse.llvm.org/t/vectorizing-matrix-transpose-with-runtime-stride-on-aarch64-vplan-vprecipe-questions/89009/2?u=john-brawn-arm we have a strided access, but the stride is applied outside of the pointer IV, so it isn't noticed and vectorization isn't prevented. So perhaps enabling it is fine, because other kinds of strided accesses are already being vectorized.
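To illustrate the distinction as I read it (my own sketch, not the code from the discourse thread), the difference is between a runtime stride expressed through index arithmetic and one carried by a pointer IV, which is what AllowStridedPointerIVs gates:
```
// Hedged sketch: both loops do the same runtime-strided loads, but only the
// second expresses the stride as a pointer IV increment.
void copy_column_index(double* dst, const double* src, int n, long stride)
{
  for (int j = 0; j < n; j++)
    dst[j] = src[j * stride];   // stride lives in the index computation
}

void copy_column_iv(double* dst, const double* src, int n, long stride)
{
  const double *p = src;
  for (int j = 0; j < n; j++) {
    dst[j] = *p;
    p += stride;                // strided pointer IV
  }
}
```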
https://github.com/llvm/llvm-project/pull/171151