[llvm-branch-commits] [llvm] [DataLayout][LangRef] Split non-integral and unstable pointer properties (PR #105735)

Tue Jan 7 14:50:44 PST 2025

================
@@ -650,48 +650,127 @@ literal types are uniqued in recent versions of LLVM.
 
 .. _nointptrtype:
 
-Non-Integral Pointer Type
--------------------------
+Non-Integral and Unstable Pointer Types
+---------------------------------------
 
-Note: non-integral pointer types are a work in progress, and they should be
-considered experimental at this time.
+Note: non-integral/unstable pointer types are a work in progress, and they
+should be considered experimental at this time.
 
 LLVM IR optionally allows the frontend to denote pointers in certain address
-spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
-Non-integral pointer types represent pointers that have an *unspecified* bitwise
-representation; that is, the integral representation may be target dependent or
-unstable (not backed by a fixed integer).
+spaces as "non-integral" or "unstable" (or both "non-integral" and "unstable")
+via the :ref:`datalayout string<langref_datalayout>`.
+
+The exact implications of these properties are target-specific, but the
+following IR semantics and restrictions to optimization passes apply:
+
+Unstable pointer representation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers in this address space have an *unspecified* bitwise representation
+(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
+allowed to change in a target-specific way. For example, this could be a pointer
+type used with copying garbage collection where the garbage collector could
+update the pointer at any time in the collection sweep.
 
 ``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
 integral (i.e. normal) pointers in that they convert integers to and from
-corresponding pointer types, but there are additional implications to be
-aware of.  Because the bit-representation of a non-integral pointer may
-not be stable, two identical casts of the same operand may or may not
+corresponding pointer types, but there are additional implications to be aware
+of.
+
+For "unstable" pointer representations, the bit-representation of the pointer
+may not be stable, so two identical casts of the same operand may or may not
 return the same value.  Said differently, the conversion to or from the
-non-integral type depends on environmental state in an implementation
+"unstable" pointer type depends on environmental state in an implementation
 defined manner.
-
 If the frontend wishes to observe a *particular* value following a cast, the
 generated IR must fence with the underlying environment in an implementation
 defined manner. (In practice, this tends to require ``noinline`` routines for
 such operations.)
 
 From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
-non-integral types are analogous to ones on integral types with one
+"unstable" pointer types are analogous to ones on integral types with one
 key exception: the optimizer may not, in general, insert new dynamic
 occurrences of such casts.  If a new cast is inserted, the optimizer would
 need to either ensure that a) all possible values are valid, or b)
 appropriate fencing is inserted.  Since the appropriate fencing is
 implementation defined, the optimizer can't do the latter.  The former is
 challenging as many commonly expected properties, such as
-``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
+``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
 Similar restrictions apply to intrinsics that might examine the pointer bits,
 such as :ref:`llvm.ptrmask<int_ptrmask>`.
 
-The alignment information provided by the frontend for a non-integral pointer
+The alignment information provided by the frontend for an "unstable" pointer
 (typically using attributes or metadata) must be valid for every possible
 representation of the pointer.
 
+Non-integral pointer representation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers are not represented as just an address, but may instead include
+additional metadata such as bounds information or a temporal identifier.
+Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
+32-bit offset, or CHERI capabilities that contain bounds, permissions and a
+type field (as well as an out-of-band validity bit, see next section).
+In general, valid non-integral pointers cannot becreated from just an integer
+value: while ``inttoptr`` yields a deterministic bitwise pattern, the resulting
+value is not guaranteed to be a valid dereferenceable pointer.
+
+In most cases pointers with a non-integral representation behave exactly the
+same as an integral pointer, the only difference is that it is not possible to
+create a pointer just from an address.
+
+"Non-integral" pointers also impose restrictions on transformation passes, but
+in general these are less restrictive than for "unstable" pointers. The main
+difference compared to integral pointers is that ``inttoptr`` instructions
+should not be inserted by passes as they may not be able to create a valid
+pointer. This property also means that ``inttoptr(ptrtoint(x))`` cannot be
+folded to ``x`` as the ``ptrtoint`` operation may destroy the necessary metadata
+to reconstruct the pointer.
----------------
krzysz00 wrote:

My theory of `ptrtoint` is, roughly, "the thing that gets stored to memory when you write a pointer" - that is, some underlying bit representation so that you can hack on the address in an integer way. Than is, `ptrtoint` really could just be named `bitcast`.

And in the context of annotated pointers, I'd argue that, if the notion of "which object you're in" is part of the pointer, than one-past-the-end is *not* equal to the next object.

That is, given that I've done
```llvm
%x = ptr addrspace(1) ...
%y = ptr addrspace(1) [%x + 64 bytes, distinct object]
%x8 = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %x, 0, /*numRecords=*/64, ...)
%y8 = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %y, 0, N, ...)
%x7 = addrspacecast ptr addrspace(8) %x8 to ptr addrspace(7)
%y7 = addrspacecast ptr addrspace(8) %y8 to ptr addrspace(7)
%y.via.x = getelementptr i8, ptr addrspace(7) %x7, i32 64
%r = icmp ptr addrspace(7) eq %y.via.x, y7
```
should have `%r` false. For one thing, in this example, `load i32, ptr addrspace(7) %y.via.x` is an out-of-bounds load that probably returns 0, while a load from `%y7` (which is the "same" address) would actually target that memory.

Additionally, even if `%x8`'s `numRecords` was large enough to allow indexing into `%y`, that one-past-the-end pointer doesn't seem like it should be equal to the next object - they're different resources.

https://github.com/llvm/llvm-project/pull/105735