[llvm] [InstCombine] Retain inbounds when canonicalising add+gep (PR #72244)

Fri Dec 15 01:52:22 PST 2023

================
@@ -2450,10 +2450,51 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
       // as:
       //   %newptr = getelementptr i32, ptr %ptr, i64 %idx1
       //   %newgep = getelementptr i32, ptr %newptr, i64 %idx2
-      auto *NewPtr = Builder.CreateGEP(GEP.getResultElementType(),
-                                       GEP.getPointerOperand(), Idx1);
-      return GetElementPtrInst::Create(GEP.getResultElementType(), NewPtr,
-                                       Idx2);
+      // If %gep is inbounds then %newgep can be inbounds only if %newptr is as
+      // well, as an inbounds gep requires the base pointer to be inbounds. We
+      // can mark %newptr as inbounds if we have a loop like
+      //   for (i = 0; ...)
+      //     ptr[i+x]
+      // If x is the same in each loop iteration then we know that we have a
+      // series of geps starting with ptr[x], which means that ptr[x] must be
+      // inbounds.
----------------
nikic wrote:

This reasoning is not correct. Violating inbounds will result in a poison return value, not immediate UB. As such, you can't do inductive reasoning along the lines of "It was inbounds on the first iteration, and each iteration advances it by an inbounds amount". For example, you could have the case that the result of the GEP is only actually used on a single iteration, in which case it doesn't matter whether all the others are poison or not.

Take this variant of your first test: https://alive2.llvm.org/ce/z/fR9p5X It replaces the store to the pointer (which will convert poison into UB) with a call to a function. The alive2 counter-example is that the pointer is null, and k is -1. On the first iteration you have gep inbounds (null, -1) which is poison. On the second you have gep inbounds (null, 1-1) which is not. After the transform you have gep inbounds (gep inbounds (null, 1), -1) which is poison.

To salvage that approach, I think you would have to require that a) the GEP being poison implies UB and b) the GEP is executed on each loop iteration.

---

For your original motivating case for this patch, is the (non-IV) add operand known to be non-negative by chance? The usual way we would preserve inbounds in a transform like this is to check that both add operands are non-negatives. We can prove that for the IV, but I'm not sure whether the information exists for the other operand in cases that you care about.

https://github.com/llvm/llvm-project/pull/72244