[llvm] [IR] Add llvm.structured.gep instruction (PR #167883)

Tue Nov 18 10:21:39 PST 2025

================
@@ -14841,6 +14841,180 @@ Semantics:
 
 See the description for :ref:`llvm.stacksave <int_stacksave>`.
 
+.. _i_structured_gep:
+
+'``llvm.structured.gep``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      <result> = call ptr @llvm.structured.gep(<basetype> poison, ptr <source> {, [i32/i64] <index> }*)
+
+Overview:
+"""""""""
+
+The '``llvm.structured.gep``' intrinsic (structured **G**\ et\ **E**\ lement\ **P**\ tr) computes a new pointer address
+resulting from logical indexing into the ``<source>`` pointer. The returned
+address depends on the indices and may depend on the layout of ``<basetype>``,
+which may only be known at runtime.
+
+Arguments:
+""""""""""
+
+``<ty> basetype``:
+The type of the element pointed to by ``<source>``. This type is used,
+together with the ``<source>`` pointer and the indices, to compute a new
+pointer to the element selected by logically indexing into the ``<basetype>``
+value stored at ``<source>``.
+The actual value passed is ignored and should be ``poison``.
+
+``ptr <source>``:
+A pointer to a valid memory location assumed to be large enough to hold a
+completely laid out value of type ``<basetype>``. The physical layout of
+``<basetype>`` is target-dependent and is not always known at compile
+time.
+
+``[i32/i64] index, ...``:
+Indices used to traverse into ``<basetype>`` and select the target element
+this intrinsic computes an address for. Indices can be 32-bit or 64-bit
+unsigned integers. Because indices are handled one at a time, both widths can
+be mixed in the same call. The precision used to compute the resulting pointer
+is target-dependent.
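To make the "indices are handled one at a time" rule concrete, here is a minimal C model of how a target that only knows the layout of `<basetype>` at runtime might turn the index list into a byte offset. Everything in it is hypothetical: the `type_desc` layout descriptor, the `sgep_index` operand type, `sgep_offset`, and the example layout values are made up for illustration. It assumes the indices step directly into `<basetype>` (as in the reviewer's `sgep [0 x i32] ..., i32 3` examples), zero-extends each index since the text calls them unsigned, and uses 64-bit arithmetic as a stand-in for the target-dependent precision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime description of a type's physical layout. */
typedef enum { KIND_SCALAR, KIND_ARRAY, KIND_STRUCT } type_kind;

typedef struct type_desc {
    type_kind kind;
    uint64_t size;                         /* total size in bytes */
    const struct type_desc *elem;          /* KIND_ARRAY: element type */
    const struct type_desc *const *fields; /* KIND_STRUCT: field types */
    const uint64_t *field_offsets;         /* KIND_STRUCT: byte offset of each field */
} type_desc;

/* One index operand: 32 or 64 bits wide, treated as unsigned. */
typedef struct { unsigned bits; uint64_t raw; } sgep_index;

static uint64_t zext_index(sgep_index i) {
    return i.bits == 32 ? (uint64_t)(uint32_t)i.raw : i.raw;
}

/* Walk the indices one at a time, descending into `base` and accumulating
   the byte offset of the selected element. Widths may differ per operand
   because each index is widened on its own before use. */
static uint64_t sgep_offset(const type_desc *base, const sgep_index *idx, size_t n) {
    uint64_t off = 0;
    const type_desc *t = base;
    for (size_t i = 0; i < n; ++i) {
        uint64_t k = zext_index(idx[i]);
        assert(t->kind != KIND_SCALAR); /* cannot index into a scalar */
        if (t->kind == KIND_ARRAY) {
            off += k * t->elem->size;
            t = t->elem;
        } else {
            off += t->field_offsets[k];
            t = t->fields[k];
        }
    }
    return off;
}

/* Example layout with made-up values: { i32, [4 x float] }, where a
   hypothetical target places the array field at byte 16. */
static const type_desc i32_t = { KIND_SCALAR, 4, NULL, NULL, NULL };
static const type_desc f32_t = { KIND_SCALAR, 4, NULL, NULL, NULL };
static const type_desc arr_t = { KIND_ARRAY, 16, &f32_t, NULL, NULL };
static const type_desc *const s_fields[] = { &i32_t, &arr_t };
static const uint64_t s_offsets[] = { 0, 16 };
static const type_desc s_t = { KIND_STRUCT, 32, NULL, s_fields, s_offsets };
```

With that layout, the indices `(i32 1, i64 2)` select the third float of the array field, at byte 16 + 2 * 4 = 24. A different target could pick another field offset and the same indices would yield a different address, which is why the layout is resolved late.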
----------------
Keenuts wrote:

Isn't this something that can be solved at the codegen level? If this language does pointer arithmetic (assuming a subset) but still needs to target an architecture where pointer arithmetic cannot be represented outside of logical addressing, I'd guess the FE should:

 - consider `int *p` to be `struct { int *array_base; int offset; }`, with `p += 1` as sugar for `p.offset += 1` and `p -= 1` as sugar for `p.offset -= 1`.
 Thus, the SGEP would be generated as: `sgep [0 x type], ptr p.array_base, i32 p.offset`.
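A minimal C sketch of that fat-pointer representation, assuming `int` elements. The type name `int_fat_ptr` and the helpers `fat_add` and `fat_deref` are hypothetical. Arithmetic only ever touches `offset`, so the base of the logical array is never lost, and dereferencing is the single place where the frontend would emit an `sgep` on `array_base`.

```c
#include <assert.h>

/* Hypothetical frontend-side "fat pointer": logical base plus element offset. */
typedef struct {
    int *array_base; /* start of the logical array; never adjusted */
    int offset;      /* element index relative to array_base */
} int_fat_ptr;

/* `p += n` maps to fat_add(p, n) and `p -= n` to fat_add(p, -n):
   only the offset changes, the base stays fixed. */
static int_fat_ptr fat_add(int_fat_ptr p, int n) {
    p.offset += n;
    return p;
}

/* `*p`: this is where `sgep [0 x i32], ptr p.array_base, i32 p.offset`
   would be emitted; in C it is plain array indexing. */
static int *fat_deref(int_fat_ptr p) {
    assert(p.offset >= 0); /* must stay inside the logical array */
    return &p.array_base[p.offset];
}
```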
 
Let's look at this code:

```c
void bar(int *ptr) { *ptr = 13; }

void foo(int *ptr) {
    *ptr = 12;
    bar(ptr - 1);
}

void entry(int *runtime_array) {
    foo(&runtime_array[3]);
}
```

This will be lowered to something like:

```llvm
define void @bar(ptr %ptr) {
  store i32 13, ptr %ptr
  ret void
}

define void @foo(ptr %ptr) {
  store i32 12, ptr %ptr
  %1 = sgep [0 x i32], ptr %ptr, i32 -1
  call void @bar(ptr %1)
  ret void
}

define void @main(ptr %runtime_array) {
  %1 = sgep [0 x i32], ptr %runtime_array, i32 3
  call void @foo(ptr %1)
  ret void
}
```

Once inlined:

```llvm
define void @main(ptr %runtime_array) {
  %1 = sgep [0 x i32], ptr %runtime_array, i32 3
  store i32 12, ptr %1
  %2 = sgep [0 x i32], ptr %1, i32 -1
  store i32 13, ptr %2
  ret void
}
```

Optimizations wouldn't be allowed to turn the second `sgep` into `sgep [0 x i32], ptr %runtime_array, i32 2`, because it indexes into a logical array that starts at the pointer `%1`. That may be equivalent on your architecture, but it is not always, so target-agnostic optimizations cannot perform this rewrite.

But if this is handled in the FE, and pointers are represented as `logical base + index`, then you have this:

```llvm
define void @bar(ptr %ptr, i32 %index) {
  %1 = sgep [0 x i32], ptr %ptr, i32 %index
  store i32 13, ptr %1
  ret void
}

define void @foo(ptr %ptr, i32 %index) {
  %1 = sgep [0 x i32], ptr %ptr, i32 %index
  store i32 12, ptr %1
  %2 = add i32 %index, -1
  call void @bar(ptr %ptr, i32 %2)
  ret void
}

define void @main(ptr %runtime_array, i32 %zero) {
  %1 = add i32 %zero, 3
  call void @foo(ptr %runtime_array, i32 %1)
  ret void
}
```

Once inlined:

```llvm
define void @main(ptr %runtime_array, i32 %zero) {
  %1 = add i32 %zero, 3
  %2 = sgep [0 x i32], ptr %runtime_array, i32 %1
  store i32 12, ptr %2
  %3 = add i32 %1, -1
  %4 = sgep [0 x i32], ptr %runtime_array, i32 %3
  store i32 13, ptr %4
  ret void
}
```

This can now be optimized: each index is plain integer arithmetic, so its constant part can be computed directly (`%zero + 3` and `%zero + 2`), removing the chained `add` in a target-agnostic manner.
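The base + index lowering can be mimicked in C to check the argument. This is a sketch under the assumption that `int` stands in for `i32` and plain array indexing stands in for `sgep`; `entry_folded` is a hypothetical hand-written version of what folding the index arithmetic should produce, not the output of any LLVM pass.

```c
#include <assert.h>

/* Base + index form: every pointer parameter becomes a (base, index) pair,
   mirroring the IR above. Array indexing stands in for `sgep`. */
static void bar(int *base, int index) {
    assert(index >= 0); /* must stay inside the logical array */
    base[index] = 13;
}

static void foo(int *base, int index) {
    base[index] = 12;
    bar(base, index - 1); /* `ptr - 1` only changes the index */
}

static void entry(int *runtime_array, int zero) {
    foo(runtime_array, zero + 3);
}

/* After inlining, (zero + 3) - 1 folds to zero + 2 by integer
   reassociation alone; the element layout is never consulted. */
static void entry_folded(int *runtime_array, int zero) {
    runtime_array[zero + 3] = 12;
    runtime_array[zero + 2] = 13;
}
```

With `zero` equal to 0, both versions write 12 to `runtime_array[3]` and 13 to `runtime_array[2]`; the fold relies only on integer arithmetic, which is what makes it target-agnostic.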

https://github.com/llvm/llvm-project/pull/167883