[llvm] [IndVarSimplify] Add rewriting ptr-add phis with offset addressing (PR #171151)
John Brawn via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 07:30:20 PST 2025
john-brawn-arm wrote:
I'm somewhat hesitant about this approach: it deals with the vectorizer not handling certain kinds of inputs by modifying the input, but the rewrite will always apply, even when the vectorizer won't run (e.g. when the target doesn't have vector instructions). I'm also not sure it would be universally a good thing even when we know we're vectorizing.
Looking at the vectorizer, it appears it won't vectorize the example in https://discourse.llvm.org/t/vectorizing-matrix-transpose-with-runtime-stride-on-aarch64-vplan-vprecipe-questions/89009 because AllowStridedPointerIVs in LoopVectorizationLegality.cpp is false by default. Searching for issues related to strided accesses, I found https://github.com/llvm/llvm-project/issues/129474, which says that the opposite of this transformation (turning array-index addressing into pointer-increment addressing) is beneficial.
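To make the two terms concrete, here is a minimal sketch of both forms of the same strided store (my own illustration, not the code from that issue):
```
// Array-index (offset) addressing: the address is recomputed as base + 5*j.
void store_offset(double* a, int n)
{
  for (int j = 0; j < n; j++)
    a[5 * j] = 1;
}

// Pointer-increment addressing: a pointer IV is bumped by a constant stride.
void store_increment(double* a, int n)
{
  double *p = a;
  for (int j = 0; j < n; j++) {
    *p = 1;
    p += 5;
  }
}
```
This PR rewrites the second form into the first, whereas the issue argues that going the other way is beneficial.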
If I take the example in the above issue and convert it to pointer-increment addressing in an inner loop:
```
void func(double* a, int n)
{
  for (int i = 0; i < n; i++) {
    double *p = a + i;
    for (int j = 0; j < n; j++) {
      *p = 1;
      p += 5;
    }
  }
}
```
then with ``clang --target=aarch64-none-elf -O3 -march=armv8-a+sve -fno-unroll-loops -mllvm -sve-gather-overhead=1 -mllvm -sve-scatter-overhead=1`` the vector loop currently generated is
```
subs x17, x17, x9
st1d { z1.d }, p0, [x16, z0.d]
add x16, x16, x14
b.ne .LBB1_6
```
but with this PR what's generated is
```
add z4.d, z3.d, z1.d
and z3.d, z3.d, #0xffffffff
subs w16, w16, w9
mul z3.d, z3.d, #40
st1d { z2.d }, p0, [x15, z3.d]
mov z3.d, z4.d
b.ne .LBB1_6
```
which looks worse: the offset-addressing form needs an extra vector ``and``, ``mul`` and ``mov`` each iteration to materialize the offsets, where the pointer-increment form just adds a fixed amount to a scalar base register.
I think it would be worth looking into what happens if AllowStridedPointerIVs is enabled. In the example in https://discourse.llvm.org/t/vectorizing-matrix-transpose-with-runtime-stride-on-aarch64-vplan-vprecipe-questions/89009/2?u=john-brawn-arm we have a strided access, but the stride is applied outside of the pointer IV, so it isn't noticed and vectorization isn't prevented. So perhaps enabling it is fine, because other kinds of strided accesses are already being vectorized.
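To illustrate the distinction as I read it (my own sketch, not the code from the discourse thread), the difference is between a runtime stride expressed through index arithmetic and one carried by a pointer IV, which is what AllowStridedPointerIVs gates:
```
// Hedged sketch: both loops do the same runtime-strided loads, but only the
// second expresses the stride as a pointer IV increment.
void copy_column_index(double* dst, const double* src, int n, long stride)
{
  for (int j = 0; j < n; j++)
    dst[j] = src[j * stride];   // stride lives in the index computation
}

void copy_column_iv(double* dst, const double* src, int n, long stride)
{
  const double *p = src;
  for (int j = 0; j < n; j++) {
    dst[j] = *p;
    p += stride;                // strided pointer IV
  }
}
```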
https://github.com/llvm/llvm-project/pull/171151