[llvm] [InstCombine] Canonicalise SextADD + GEP (PR #69581)

Mon Apr 22 02:24:40 PDT 2024

brunodf-snps wrote:

We observe that the `add + gep -> gep + gep` rewrite introduced in [D155688](https://reviews.llvm.org/D155688) (by @d-smirnov and @paulwalker-arm) and continued here in #69581 breaks the LoopFlatten pass on loops with the following pattern [from coremark](https://github.com/eembc/coremark/blob/main/core_matrix.c#L285):

```
for (int i = 0; i < n; i++)
  for (int j = 0; j < n; j++)
     ... read/write A[i*n + j] ...
```
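To make the pattern concrete, here is a minimal sketch of such a nested loop next to the single-loop form that loop flattening is meant to produce. The function names and the scaling operation are hypothetical and chosen only for illustration; they are not taken from coremark or from the LLVM tests.

```c
#include <assert.h>

/* Nested form: the access pattern from the report, A[i*n + j]. */
void scale_nested(int *A, int n, int k) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            A[i * n + j] *= k;
}

/* Flattened form: one loop over all n*n elements.
   Assumes n*n does not overflow int. */
void scale_flat(int *A, int n, int k) {
    assert(n >= 0);
    int total = n * n;
    for (int idx = 0; idx < total; idx++)
        A[idx] *= k;
}
```

Both functions touch the same elements in the same order, which is why the nested form is a candidate for flattening in the first place.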

Godbolt link: https://godbolt.org/z/j3dKqj757, where clang/LLVM 18.1.0 does not flatten the loop on 32 bit (bottom right output).

(See the [similar report](https://github.com/llvm/llvm-project/issues/78214#issuecomment-2053804846) from @DragonDisciple, which explains that the rewrite requires dropping `inbounds` from the original gep, which in turn blocks the optimization of the loop.)
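As a rough source-level analogue of the two address forms, the sketch below computes the same element address either by adding the indices first and indexing once (`add + gep`) or by indexing twice (`gep + gep`). This is only an illustration in C with hypothetical function names; IR-level flags such as `inbounds` have no direct counterpart here, so it does not reproduce the problem itself.

```c
#include <assert.h>

/* add + gep analogue: combine the indices, then index once. */
int *addr_add_then_index(int *A, int i, int n, int j) {
    assert(i >= 0 && j >= 0 && n >= 0);
    return &A[i * n + j];
}

/* gep + gep analogue: step to the row start, then step by j. */
int *addr_index_twice(int *A, int i, int n, int j) {
    assert(i >= 0 && j >= 0 && n >= 0);
    int *row = A + i * n;
    return row + j;
}
```

For in-range indices both forms yield the same pointer; the difference discussed in this thread lies in what the optimizer can still prove about each form.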

It would seem from [bug 40581](https://bugs.llvm.org/show_bug.cgi?id=40581) and from [test pr40581.ll](https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopFlatten/pr40581.ll) that @LebedevRI @sjoerdmeijer explicitly intended LoopFlatten to support this loop pattern, but unfortunately, there is no test with a complete optimization pipeline for such a loop, except for [llvm/test/Transforms/PhaseOrdering/AArch64/loopflatten.ll](https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/PhaseOrdering/AArch64/loopflatten.ll).

But since the latter test only covers a 64 bit target (where a `sext` converts the `int` to the pointer index type), it did not fail at the time of D155688. Based on the discussion above, it seems that the test did fail here, but that this was fixed (inadvertently?) when @nikic [requested](https://github.com/llvm/llvm-project/pull/69581#pullrequestreview-1689395163) that the rewrite here be limited to the case where the RHS of the add is a constant (so the sext folds away).

That explains why loop flattening is not broken on 64 bit in the godbolt example above (`j` is not constant, so the rewrite does not trigger). D155688, however, did not have this restriction, so `add + gep` is now canonicalized to `gep + gep` only under a complex combination of conditions. At the very least, it is very hard to explain under what conditions LLVM can still flatten the loop above.

https://github.com/llvm/llvm-project/pull/69581