[clang] [llvm] [IR] Change representation of getelementptr inrange (PR #84341)

Fri Mar 8 02:57:58 PST 2024

nikic wrote:

> Have you thought about the implications for dynamic (non-constant) indices?

inrange is only supported on constant expressions, and I think the consensus is that it should not be extended to non-constant cases. For those, we would instead represent the information independently of the GEP, using an intrinsic.

(Technically a constant expression GEP can also have a dynamic index, but that case is not practically relevant.)

> Stuff like
> 
> ```
>   %gep = getelementptr [50 x {i32, [10 x i32]}], ptr %base, i32 0, i32 %outer_idx, i32 1, i32 %inner_idx
> ```
> 
> The current representation allows an `inrange` on the second-to-last index which to my understanding restricts the range to the `[10 x i32]` into which `%gep` falls structurally.
> 
> It seems like neither proposed representation is able to capture that in a single `ptradd` (it could be captured with a sequence of two `ptradd`s, but that's an awkward tradeoff).
>
> The first proposed representation at least allows handling the case when there is only a single dynamic index, and the `inrange` is to the right of it.

ptradd wouldn't support multi-index GEPs anyway, so the representation would be something like
```
%p1 = ptradd ptr %base, i32 %outer_idx * 44
%p2 = ptradd ptr %p1, i32 4
%p3 = ptradd ptr %p2, i32 %inner_idx * 4
```
assuming we support scaling inside ptradd (otherwise the scaling would use separate multiply/shift instructions). Here `%p1` would likely get LICMed and/or CSEd.
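The byte offsets in this decomposition can be sanity-checked with a small Python sketch. It models the element type `{i32, [10 x i32]}` with `ctypes`, assuming `i32` is 4 bytes with 4-byte alignment; the names `Elem`, `gep_offset`, and `ptradd_chain_offset` are made up for illustration and are not LLVM APIs.

```python
import ctypes

# Model of {i32, [10 x i32]}; assumes i32 is 4 bytes and 4-byte aligned.
class Elem(ctypes.Structure):
    _fields_ = [("head", ctypes.c_int32), ("inner", ctypes.c_int32 * 10)]

STRIDE = ctypes.sizeof(Elem)              # size of one outer array element
INNER_OFFSET = Elem.inner.offset          # offset of the [10 x i32] field
I32_SIZE = ctypes.sizeof(ctypes.c_int32)  # size of one inner array element

def gep_offset(outer_idx, inner_idx):
    """Byte offset implied by GEP indices (0, outer_idx, 1, inner_idx)."""
    return outer_idx * STRIDE + INNER_OFFSET + inner_idx * I32_SIZE

def ptradd_chain_offset(outer_idx, inner_idx):
    """Byte offset accumulated by the three ptradds, using the literal constants."""
    p1 = outer_idx * 44   # %p1 = ptradd %base, %outer_idx * 44
    p2 = p1 + 4           # %p2 = ptradd %p1, 4
    p3 = p2 + inner_idx * 4  # %p3 = ptradd %p2, %inner_idx * 4
    return p3

# The two computations agree for every valid index pair.
all_match = all(
    gep_offset(o, i) == ptradd_chain_offset(o, i)
    for o in range(50)
    for i in range(10)
)
```

Under the stated layout assumption, `STRIDE` and `INNER_OFFSET` correspond to the constants 44 and 4 used in the ptradd sequence.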

If we did support inrange on non-constant ptradd, we would encode the constraint that the result is in range of the inner array as
```
; relative to source
%p2 = ptradd ptr inrange(4, 44) %p1, i32 4
; relative to result
%p2 = ptradd ptr inrange(0, 40) %p1, i32 4
```
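The two encodings describe the same byte range and differ only in which pointer the bounds are measured from. A minimal Python sketch of the conversion follows; the helper names `source_to_result` and `result_to_source` are hypothetical and only illustrate the arithmetic.

```python
def source_to_result(rng, offset):
    """Re-express a (start, end) range measured from the source pointer
    as a range measured from the result pointer (source + offset)."""
    start, end = rng
    return (start - offset, end - offset)

def result_to_source(rng, offset):
    """Inverse conversion: from result-relative to source-relative bounds."""
    start, end = rng
    return (start + offset, end + offset)

# %p2 = ptradd %p1, 4, with the inner array occupying bytes [4, 44) of the struct.
source_relative = (4, 44)
result_relative = source_to_result(source_relative, 4)
```

With a constant offset the conversion is a fixed shift, which is why both forms work equally well for this example.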

Though if we consider a variant without that extra constant ptradd (i.e. drop the struct with the `i32` element, so the outer elements are plain `[10 x i32]` arrays of 40 bytes):
```
%p1 = ptradd ptr %base, i32 %outer_idx * 40
%p2 = ptradd ptr %p1, i32 %inner_idx * 4
```
If we then want to restrict inrange to the inner array, the source-relative case can be written as:
```
%p1 = ptradd ptr %base, i32 %outer_idx * 40
%p2 = ptradd ptr inrange(0, 40) %p1, i32 %inner_idx * 4
```
The result-relative case, by contrast, can't represent this without a dummy ptradd 0.
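To see why, note that bounds measured from the result pointer shift with the dynamic offset, so they are not constants. The Python sketch below illustrates this under the assumption of a 40-byte inner array of 4-byte elements; `result_relative_range` is a hypothetical helper, not part of any proposal.

```python
I32_SIZE = 4
INNER_BYTES = 10 * I32_SIZE  # the inner [10 x i32] array spans 40 bytes

def result_relative_range(inner_idx):
    """Bounds of the inner array measured from %p2 = %p1 + inner_idx * 4."""
    offset = inner_idx * I32_SIZE
    return (0 - offset, INNER_BYTES - offset)

# Every index yields different bounds, so no constant inrange(S, E)
# on the dynamic ptradd can describe the inner array.
distinct_ranges = {result_relative_range(i) for i in range(10)}

# A dummy "ptradd 0" has a constant offset of zero, so its result-relative
# bounds are fixed and match the source-relative ones.
dummy_range = result_relative_range(0)
```

The source-relative form avoids this because `%p1` always points at the start of the inner array, regardless of `%inner_idx`.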

> (I guess this is an extension to @aeubanks' point. It feels like `inrange` _should_ be useful for things like alias analysis. But I guess it's not used for that at the moment.)

If we supported inrange on instruction GEP, then yes, it would be useful for AA.

https://github.com/llvm/llvm-project/pull/84341