[PATCH] D114988: [IR] `GetElementPtrInst`: per-index `inrange` support

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 2 13:17:41 PST 2021


lebedev.ri requested review of this revision.
lebedev.ri added a comment.

In D114988#3167963 <https://reviews.llvm.org/D114988#3167963>, @nikic wrote:

> While I support the general goal of exposing GEP offset restrictions to IR,
> I am quite strongly opposed to the implementation approach of extending `inrange`.
> The core issue is that this is strongly tied to LLVM struct types and structural GEP indexing.
> This will be a blow to opaque pointer usefulness and future offset canonicalization for GEPs.

While I'm certainly sympathetic to the opaque-pointer future,
I'd also like to point out that opaque pointers are just a tool.

Concretely, can you point to anything stating that, in the opaque-pointer future,
the only remaining GEP will be one that applies a byte offset to the pointer,
i.e. that there won't be GEPs into structs or GEPs with multiple indices?
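To make the distinction in this question concrete, here is a minimal C++ sketch. The helper names `structural_gep_offset` and `byte_offset_gep` are hypothetical, and the struct mirrors the example further down. It shows that a structural GEP such as `getelementptr %struct.S, ptr %s, i64 0, i32 1, i64 %i` addresses the same byte as a single byte-offset GEP; what the byte-offset form no longer records is which field the offset was aimed at.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct S in the example further down.
struct S {
    int a[3];
    int b[3];
    int c[3];
};

// Hypothetical helper: the byte offset computed by the structural GEP
//   getelementptr %struct.S, ptr %s, i64 0, i32 1, i64 %i
// i.e. the address &s->b[i] expressed as (char*)s + offset.
std::ptrdiff_t structural_gep_offset(std::ptrdiff_t i) {
    return static_cast<std::ptrdiff_t>(offsetof(S, b)) +
           i * static_cast<std::ptrdiff_t>(sizeof(int));
}

// Hypothetical helper modelling the flattened form
//   getelementptr i8, ptr %s, i64 %off
// The instruction itself no longer says which field (a, b or c) %off targets.
const int* byte_offset_gep(const S* s, std::ptrdiff_t off) {
    assert(off >= 0 && off <= static_cast<std::ptrdiff_t>(sizeof(S)));
    return reinterpret_cast<const int*>(reinterpret_cast<const char*>(s) + off);
}
```

Both forms yield the same address; the open question is whether the field-level information survives once only the byte form is left.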

> I think the correct approach to `inrange`-like information is to restrict the range of GEP indices
> without relying on the underlying structure type. Think `inrange(0, 4) i32 %x` for `%x` between 0 and 4.
> This naturally integrates in purely offset-based alias analysis, and can be more generally preserved under transformation.
> For example, if you have `gep %base, inrange (%x + 1)`, if this is transformed into `gep (gep %base, 1), %x`
> there is no way to preserve `inrange` information under the current proposal,
> while a proper offset-based approach could easily retain information under simple transformations.
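A toy model of the offset-based alternative described in the quote above, assuming a hypothetical annotation that bounds an index value to `[lo, hi)` (in the spirit of `inrange(0, 4) i32 %x`). Peeling a constant out of the index, i.e. rewriting `gep %base, (%x + 1)` into `gep (gep %base, 1), %x`, only needs the range to be shifted, and the bounding range of byte offsets reachable from `%base` is unchanged. The types and functions below are illustrative only, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical index annotation: the index value is known to lie in [lo, hi).
struct IndexRange {
    std::int64_t lo;
    std::int64_t hi;  // exclusive
};

// Bounding half-open range [lo, hi) of byte offsets from %base produced by
//   gep T, ptr %base, i64 (k + idx)
// for idx in r, where sizeof(T) == elem_size.
struct OffsetRange {
    std::int64_t lo;
    std::int64_t hi;  // exclusive
};

OffsetRange reachable_offsets(std::int64_t k, IndexRange r, std::int64_t elem_size) {
    assert(r.lo < r.hi && elem_size > 0);
    return {(k + r.lo) * elem_size, (k + r.hi) * elem_size};
}

// Rewriting  gep %base, (%x + k)  as  gep (gep %base, k), %x  only needs the
// annotation on the remaining index %x to be shifted by -k.
IndexRange peel_constant(IndexRange whole, std::int64_t k) {
    return {whole.lo - k, whole.hi - k};
}
```

For example, with `i32` elements and the whole index `%x + 1` known to be in `[0, 4)`, both `reachable_offsets(0, {0, 4}, 4)` and `reachable_offsets(1, peel_constant({0, 4}, 1), 4)` give byte offsets in `[0, 16)`.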

I think you are missing the whole point here. It is explicitly **NOT** the goal of this patch
to encode that some index must take values in the range [x, y). So if that is your proposal,
then, while it may be interesting, it is inferior for this purpose and does not solve the motivating case.

The reason is that it encodes *VERY* different semantics.
If we had encoded that, in

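  struct S {
      int a[3];
      int b[3];
      int c[3];
  };
  
  void bar(int*);
  
  void foo(S* s, int i) {
    int* p = &s->b[i]; // `s` is a pointer, so `->` rather than `.`
    bar(p);
    int* p2 = p + 4;   // UB! `p2` would point more than one past the end of `s->b`
    bar(p2);
  }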

the variable `i` of function `foo` must be in `[0, 3)`, that would only tell us
that `p` points somewhere in `(int*)s + 3 + [0, 3)`.
What it does not encode is that no pointer derived from `p` may go outside of that array,
i.e. that `int* p2 = p + 4;` is UB.
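The difference between the two encodings can be sketched with a small C++ model of what each one lets us infer. It assumes `S` has no padding (true on common ABIs), tracks pointers as element offsets from `(int*)s`, and uses hypothetical helpers: `allowed_with_index_range_only` knows only that `i` is in `[0, 3)`, so the only bound on derived pointers is the whole object, while `allowed_with_subobject_bound` additionally knows that pointers derived from `p` must stay within `s->b` (one past the end allowed).

```cpp
#include <cassert>
#include <cstddef>

struct S {
    int a[3];
    int b[3];
    int c[3];
};

// Positions are element offsets (in ints) from (int*)s. Assumes S has no
// padding, so the object spans 9 ints and s->b spans elements [3, 6).
constexpr std::ptrdiff_t kObjectEnd = sizeof(S) / sizeof(int);
constexpr std::ptrdiff_t kBBegin = offsetof(S, b) / sizeof(int);
constexpr std::ptrdiff_t kBEnd = kBBegin + sizeof(S::b) / sizeof(int);

// Position of &s->b[i]; an index-range encoding tells us i is in [0, 3).
std::ptrdiff_t pos_of_b(std::ptrdiff_t i) {
    assert(0 <= i && i < 3);
    return kBBegin + i;
}

// Forming a pointer is fine only inside [lo, hi], hi being one past the end.
bool may_form(std::ptrdiff_t pos, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    return lo <= pos && pos <= hi;
}

// Index range only: derived pointers are bounded just by the whole object S.
bool allowed_with_index_range_only(std::ptrdiff_t i, std::ptrdiff_t delta) {
    return may_form(pos_of_b(i) + delta, 0, kObjectEnd);
}

// Subobject bound: pointers derived from p must stay within s->b.
bool allowed_with_subobject_bound(std::ptrdiff_t i, std::ptrdiff_t delta) {
    return may_form(pos_of_b(i) + delta, kBBegin, kBEnd);
}
```

With only the index range, `p + 4` lands at element 7, 8 or 9 of `S`, which is still inside (or one past the end of) the whole object, so nothing can be proven wrong. With the subobject bound, every `i` in `[0, 3)` makes `p + 4` leave `s->b`, which is exactly the UB the example relies on.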


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D114988


