[llvm-branch-commits] [llvm] [DataLayout][LangRef] Split non-integral and unstable pointer properties (PR #105735)
Alexander Richardson via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Sun Jul 27 13:58:11 PDT 2025
https://github.com/arichardson updated https://github.com/llvm/llvm-project/pull/105735
From e4bd1181d160b8728e7d4158417a83e183bd1709 Mon Sep 17 00:00:00 2001
From: Alex Richardson <alexrichardson at google.com>
Date: Thu, 22 Aug 2024 14:36:04 -0700
Subject: [PATCH 1/5] fix indentation in langref
Created using spr 1.3.6-beta.1
---
llvm/docs/LangRef.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 200224c78be00..1a59fba65815c 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3103,19 +3103,19 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
-``p[<flags>][n]:<size>:<abi>[:<pref>][:<idx>]``
+``p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]``
This specifies the *size* of a pointer and its ``<abi>`` and
``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
index that used for address calculation, which must be less than or equal
to the pointer size. If not
specified, the default index size is equal to the pointer size. All sizes
- are in bits. The address space, ``n``, is optional, and if not specified,
- denotes the default address space 0. The value of ``n`` must be
- in the range [1,2^24).
+ are in bits. The ``<address space>``, is optional, and if not specified,
+ denotes the default address space 0. The value of ``<address space>`` must
+ be in the range [1,2^24).
The optional``<flags>`` are used to specify properties of pointers in this
-address space: the character ``u`` marks pointers as having an unstable
- representation and ```n`` marks pointers as non-integral (i.e. having
+ address space: the character ``u`` marks pointers as having an unstable
+ representation and ``n`` marks pointers as non-integral (i.e. having
additional metadata). See :ref:`Non-Integral Pointer Types <nointptrtype>`.
``i<size>:<abi>[:<pref>]``
From db97145d3a653f2999b5935f9b1cb4550230689d Mon Sep 17 00:00:00 2001
From: Alex Richardson <alexrichardson at google.com>
Date: Fri, 25 Oct 2024 12:51:11 -0700
Subject: [PATCH 2/5] include feedback
Created using spr 1.3.6-beta.1
---
llvm/docs/LangRef.rst | 30 +++++++++++++++++-------------
llvm/include/llvm/IR/DataLayout.h | 8 ++++----
2 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c137318af678b..3c3d0e0b4ab8e 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -659,7 +659,7 @@ LLVM IR optionally allows the frontend to denote pointers in certain address
spaces as "non-integral" or "unstable" (or both "non-integral" and "unstable")
via the :ref:`datalayout string<langref_datalayout>`.
-These exact implications of these properties are target-specific, but the
+The exact implications of these properties are target-specific, but the
following IR semantics and restrictions to optimization passes apply:
Unstable pointer representation
@@ -668,7 +668,7 @@ Unstable pointer representation
Pointers in this address space have an *unspecified* bitwise representation
(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
allowed to change in a target-specific way. For example, this could be a pointer
-type used for with copying garbage collection where the garbage collector could
+type used with copying garbage collection where the garbage collector could
update the pointer at any time in the collection sweep.
``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
@@ -705,10 +705,10 @@ representation of the pointer.
Non-integral pointer representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Pointers are not represented as an address, but may instead include
+Pointers are not represented as just an address, but may instead include
additional metadata such as bounds information or a temporal identifier.
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
-32-bit offset or CHERI capabilities that contain bounds, permissions and an
+32-bit offset, or CHERI capabilities that contain bounds, permissions and an
out-of-band validity bit. In general, these pointers cannot be re-created
from just an integer value.
@@ -716,23 +716,25 @@ In most cases pointers with a non-integral representation behave exactly the
same as an integral pointer, the only difference is that it is not possible to
create a pointer just from an address.
-"Non-integral" pointers also impose restrictions on the optimizer, but in
-general these are less restrictive than for "unstable" pointers. The main
+"Non-integral" pointers also impose restrictions on transformation passes, but
+in general these are less restrictive than for "unstable" pointers. The main
difference compared to integral pointers is that ``inttoptr`` instructions
should not be inserted by passes as they may not be able to create a valid
pointer. This property also means that ``inttoptr(ptrtoint(x))`` cannot be
folded to ``x`` as the ``ptrtoint`` operation may destroy the necessary metadata
to reconstruct the pointer.
-Additionaly, since there could be out-of-band state, it is also not legal to
+Additionally, since there could be out-of-band state, it is also not legal to
convert a load/store of a non-integral pointer type to a load/store of an
-integer type with same bitwidth as that may not copy all the state.
-However, it is legal to use appropriately aligned ``llvm.memcpy`` and
-``llvm.memmove`` for copies of non-integral pointers as long as these are not
-converted into integer operations.
+integer type with same bitwidth, as that may not copy all the state.
+However, it is legal to use appropriately-aligned ``llvm.memcpy`` and
+``llvm.memmove`` for copies of non-integral pointers.
+NOTE: Lowering of ``llvm.memcpy`` containing non-integral pointer types must use
+appropriately-aligned and sized types instead of smaller integer types.
Unlike "unstable" pointers, the bit-wise representation is stable and
-``ptrtoint(x)`` always yields a deterministic values.
-This means optimizer is still permitted to insert new ``ptrtoint`` instructions.
+``ptrtoint(x)`` always yields a deterministic value.
+This means transformation passes are still permitted to insert new ``ptrtoint``
+instructions.
However, it is important to note that ``ptrtoint`` may not yield the same value
as storing the pointer via memory and reading it back as an integer, even if the
bitwidth of the two types matches (since ptrtoint could involve some form of
@@ -12187,6 +12189,8 @@ If ``value`` is smaller than ``ty2`` then a zero extension is done. If
``value`` is larger than ``ty2`` then a truncation is done. If they are
the same size, then nothing is done (*no-op cast*) other than a type
change.
+For :ref:`non-integral pointers <_nointptrtype>` the ``ptrtoint`` instruction
+may involve additional transformations beyond truncations or extension.
Example:
""""""""
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index ca185bfec851a..206abcdbea0a3 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -357,8 +357,8 @@ class DataLayout {
/// instructions operating on pointers of this address space.
/// TODO: remove this function after migrating to finer-grained properties.
bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
- const PointerSpec &PS = getPointerSpec(AddrSpace);
- return PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation;
+ return hasUnstableRepresentation(AddrSpace) ||
+ hasNonIntegralRepresentation(AddrSpace);
}
/// Returns whether this address space has an "unstable" pointer
@@ -390,8 +390,8 @@ class DataLayout {
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
- const PointerSpec &PS = getPointerSpec(AddrSpace);
- return PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation;
+ return hasUnstableRepresentation(AddrSpace) ||
+ hasNonIntegralRepresentation(AddrSpace);
}
/// Returns whether passes should avoid introducing `ptrtoint` instructions
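For readers following the refactoring above, the relationship between the predicates can be sketched as a small standalone C++ model. It does not use LLVM headers: `ToyPointerSpec` and the free functions are hypothetical stand-ins named after the members in the diff. The `shouldAvoidPtrToInt` rule (unstable only) is an assumption inferred from the LangRef wording in this patch, which still permits passes to insert new `ptrtoint` instructions for non-integral pointers.

```cpp
#include <cassert>

// Hypothetical stand-in for DataLayout::PointerSpec; only the two flags
// discussed in this patch are modelled.
struct ToyPointerSpec {
  bool HasUnstableRepresentation = false;
  bool HasNonIntegralRepresentation = false;
};

inline bool hasUnstableRepresentation(const ToyPointerSpec &PS) {
  return PS.HasUnstableRepresentation;
}

inline bool hasNonIntegralRepresentation(const ToyPointerSpec &PS) {
  return PS.HasNonIntegralRepresentation;
}

// Legacy query: true if either finer-grained property is set.
inline bool isNonIntegralAddressSpace(const ToyPointerSpec &PS) {
  return hasUnstableRepresentation(PS) || hasNonIntegralRepresentation(PS);
}

// inttoptr may not yield a valid pointer if either property is set.
inline bool shouldAvoidIntToPtr(const ToyPointerSpec &PS) {
  return hasUnstableRepresentation(PS) || hasNonIntegralRepresentation(PS);
}

// Assumption: only an unstable representation makes ptrtoint problematic,
// because the result of ptrtoint on a non-integral pointer is deterministic.
inline bool shouldAvoidPtrToInt(const ToyPointerSpec &PS) {
  return hasUnstableRepresentation(PS);
}

// Small usage example for the model above.
inline void demoToyPredicates() {
  ToyPointerSpec NonIntegral;
  NonIntegral.HasNonIntegralRepresentation = true;
  assert(shouldAvoidIntToPtr(NonIntegral));
  assert(!shouldAvoidPtrToInt(NonIntegral));

  ToyPointerSpec Unstable;
  Unstable.HasUnstableRepresentation = true;
  assert(shouldAvoidIntToPtr(Unstable));
  assert(shouldAvoidPtrToInt(Unstable));
}
```

Expressing the coarse queries in terms of the two fine-grained ones, as the patch does, keeps a single source of truth for each property.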
From 94ecfa353dcf44087797594a8f77f9653c8b8e4a Mon Sep 17 00:00:00 2001
From: Alex Richardson <alexrichardson at google.com>
Date: Fri, 25 Oct 2024 14:54:59 -0700
Subject: [PATCH 3/5] address more feedback
Created using spr 1.3.6-beta.1
---
llvm/docs/LangRef.rst | 16 ++++++----
llvm/include/llvm/IR/DataLayout.h | 6 ++--
llvm/lib/IR/DataLayout.cpp | 5 +--
llvm/unittests/IR/DataLayoutTest.cpp | 46 ++++++++++++++++------------
4 files changed, 43 insertions(+), 30 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3c3d0e0b4ab8e..2313527afedd7 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -709,8 +709,10 @@ Pointers are not represented as just an address, but may instead include
additional metadata such as bounds information or a temporal identifier.
Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
32-bit offset, or CHERI capabilities that contain bounds, permissions and an
-out-of-band validity bit. In general, these pointers cannot be re-created
-from just an integer value.
+out-of-band validity bit. In general, valid non-integral pointers cannot be
+created from just an integer value: while ``inttoptr`` yields a deterministic
+bitwise pattern, the resulting value is not guaranteed to be a valid
+dereferenceable pointer.
In most cases pointers with a non-integral representation behave exactly the
same as an integral pointer, the only difference is that it is not possible to
@@ -3200,9 +3202,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
- This specifies pointer types with the specified address spaces
- as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
- address space cannot be specified as non-integral.
+ This marks pointer types with the specified address spaces
+ as :ref:`non-integral and unstable <nointptrtype>`.
+ The ``0`` address space cannot be specified as non-integral.
+ It is only supported for backwards compatibility, the flags of the ``p``
+ specifier should be used instead for new code.
On every specification that takes a ``<abi>:<pref>``, specifying the
``<pref>`` alignment is optional. If omitted, the preceding ``:``
@@ -12189,7 +12193,7 @@ If ``value`` is smaller than ``ty2`` then a zero extension is done. If
``value`` is larger than ``ty2`` then a truncation is done. If they are
the same size, then nothing is done (*no-op cast*) other than a type
change.
-For :ref:`non-integral pointers <_nointptrtype>` the ``ptrtoint`` instruction
+For :ref:`non-integral pointers <nointptrtype>` the ``ptrtoint`` instruction
may involve additional transformations beyond truncations or extension.
Example:
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 206abcdbea0a3..af9556feb724f 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -341,9 +341,9 @@ class DataLayout {
/// rounded up to a whole number of bytes.
unsigned getIndexSize(unsigned AS) const;
- /// Return the address spaces containing non-integral pointers. Pointers in
- /// this address space don't have a well-defined bitwise representation.
- SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
+ /// Return the address spaces with special pointer semantics (such as being
+ /// unstable or non-integral).
+ SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
SmallVector<unsigned, 8> AddrSpaces;
for (const PointerSpec &PS : PointerSpecs) {
if (PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation)
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index 722f7b57d160e..9de984175228f 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -209,7 +209,7 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
// Default pointer type specifications.
constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
// p0:64:64:64:64
- {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
+ {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false, false},
};
DataLayout::DataLayout()
@@ -437,7 +437,8 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
return Err;
}
if (AddrSpace == 0 && (NonIntegralRepr || UnstableRepr))
- return createStringError("address space 0 cannot be non-integral");
+ return createStringError(
+ "address space 0 cannot be non-integral or unstable");
// Size. Required, cannot be zero.
unsigned BitWidth;
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 056584badcf74..8b6616ce0fb16 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -412,7 +412,7 @@ TEST(DataLayout, ParsePointerSpec) {
"pn0:64:64", "pu0:64:64", "pun0:64:64", "pnu0:64:64"})
EXPECT_THAT_EXPECTED(
DataLayout::parse(Str),
- FailedWithMessage("address space 0 cannot be non-integral"));
+ FailedWithMessage("address space 0 cannot be non-integral or unstable"));
}
TEST(DataLayoutTest, ParseNativeIntegersSpec) {
@@ -569,12 +569,12 @@ TEST(DataLayout, GetPointerPrefAlignment) {
TEST(DataLayout, IsNonIntegralAddressSpace) {
DataLayout Default;
- EXPECT_THAT(Default.getNonIntegralAddressSpaces(), ::testing::SizeIs(0));
+ EXPECT_THAT(Default.getNonStandardAddressSpaces(), ::testing::SizeIs(0));
EXPECT_FALSE(Default.isNonIntegralAddressSpace(0));
EXPECT_FALSE(Default.isNonIntegralAddressSpace(1));
DataLayout Custom = cantFail(DataLayout::parse("ni:2:16777215"));
- EXPECT_THAT(Custom.getNonIntegralAddressSpaces(),
+ EXPECT_THAT(Custom.getNonStandardAddressSpaces(),
::testing::ElementsAreArray({2U, 16777215U}));
EXPECT_FALSE(Custom.isNonIntegralAddressSpace(0));
EXPECT_FALSE(Custom.isNonIntegralAddressSpace(1));
@@ -582,37 +582,45 @@ TEST(DataLayout, IsNonIntegralAddressSpace) {
EXPECT_TRUE(Custom.isNonIntegralAddressSpace(16777215));
// Pointers can be marked as non-integral using 'pn'
- DataLayout NonIntegral = cantFail(DataLayout::parse("pn2:64:64:64:32"));
- EXPECT_TRUE(NonIntegral.isNonIntegralAddressSpace(2));
- EXPECT_TRUE(NonIntegral.hasNonIntegralRepresentation(2));
- EXPECT_FALSE(NonIntegral.hasUnstableRepresentation(2));
- EXPECT_TRUE(NonIntegral.shouldAvoidIntToPtr(2));
- EXPECT_FALSE(NonIntegral.shouldAvoidPtrToInt(2));
+ Custom = cantFail(DataLayout::parse("pn2:64:64:64:32"));
+ EXPECT_TRUE(Custom.isNonIntegralAddressSpace(2));
+ EXPECT_TRUE(Custom.hasNonIntegralRepresentation(2));
+ EXPECT_FALSE(Custom.hasUnstableRepresentation(2));
+ EXPECT_TRUE(Custom.shouldAvoidIntToPtr(2));
+ EXPECT_FALSE(Custom.shouldAvoidPtrToInt(2));
+ EXPECT_THAT(Custom.getNonStandardAddressSpaces(),
+ ::testing::ElementsAreArray({2U}));
// Pointers can be marked as unstable using 'pu'
- DataLayout Unstable = cantFail(DataLayout::parse("pu2:64:64:64:32"));
- EXPECT_TRUE(Unstable.isNonIntegralAddressSpace(2));
- EXPECT_TRUE(Unstable.hasUnstableRepresentation(2));
- EXPECT_FALSE(Unstable.hasNonIntegralRepresentation(2));
- EXPECT_TRUE(Unstable.shouldAvoidPtrToInt(2));
- EXPECT_TRUE(Unstable.shouldAvoidIntToPtr(2));
+ Custom = cantFail(DataLayout::parse("pu2:64:64:64:32"));
+ EXPECT_TRUE(Custom.isNonIntegralAddressSpace(2));
+ EXPECT_TRUE(Custom.hasUnstableRepresentation(2));
+ EXPECT_FALSE(Custom.hasNonIntegralRepresentation(2));
+ EXPECT_TRUE(Custom.shouldAvoidPtrToInt(2));
+ EXPECT_TRUE(Custom.shouldAvoidIntToPtr(2));
+ EXPECT_THAT(Custom.getNonStandardAddressSpaces(),
+ ::testing::ElementsAreArray({2U}));
// Both properties can also be set using 'pnu'/'pun'
- for (auto Layout : {"pnu2:64:64:64:32", "pun2:64:64:64:32"}) {
+ for (const auto *Layout : {"pnu2:64:64:64:32", "pun2:64:64:64:32"}) {
DataLayout DL = cantFail(DataLayout::parse(Layout));
EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
EXPECT_TRUE(DL.hasNonIntegralRepresentation(2));
EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+ EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+ ::testing::ElementsAreArray({2U}));
}
// For backwards compatibility, the ni DataLayout part overrides any p[n][u].
- for (auto Layout : {"ni:2-pn2:64:64:64:32", "ni:2-pnu2:64:64:64:32",
- "ni:2-pu2:64:64:64:32", "pn2:64:64:64:32-ni:2",
- "pnu2:64:64:64:32-ni:2", "pu2:64:64:64:32-ni:2"}) {
+ for (const auto *Layout : {"ni:2-pn2:64:64:64:32", "ni:2-pnu2:64:64:64:32",
+ "ni:2-pu2:64:64:64:32", "pn2:64:64:64:32-ni:2",
+ "pnu2:64:64:64:32-ni:2", "pu2:64:64:64:32-ni:2"}) {
DataLayout DL = cantFail(DataLayout::parse(Layout));
EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
EXPECT_TRUE(DL.hasNonIntegralRepresentation(2));
EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+ EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+ ::testing::ElementsAreArray({2U}));
}
}
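The flag handling exercised by the tests above can be illustrated with a minimal standalone sketch. This is not the code in `DataLayout::parsePointerSpec`: `ToyPointerFlags` and `parseToyPointerHead` are hypothetical names, the function only looks at the `p[<flags>][<address space>]` component before the first `:`, and details such as rejecting duplicate flags are not modelled. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; not LLVM's PointerSpec.
struct ToyPointerFlags {
  unsigned AddrSpace = 0;
  bool NonIntegral = false;
  bool Unstable = false;
};

// Parses only the leading "p[<flags>][<address space>]" component, i.e. the
// text before the first ':'. Returns std::nullopt on error and stores a
// message in Err.
inline std::optional<ToyPointerFlags>
parseToyPointerHead(const std::string &Spec, std::string &Err) {
  const std::string Head = Spec.substr(0, Spec.find(':'));
  if (Head.empty() || Head[0] != 'p') {
    Err = "expected 'p' specifier";
    return std::nullopt;
  }

  ToyPointerFlags Result;
  std::size_t I = 1;
  // Flags may appear in either order ("pnu" and "pun" are both accepted).
  for (; I < Head.size() && (Head[I] == 'n' || Head[I] == 'u'); ++I) {
    if (Head[I] == 'n')
      Result.NonIntegral = true;
    else
      Result.Unstable = true;
  }

  // Any remaining characters must be the decimal address space number.
  for (; I < Head.size(); ++I) {
    if (!std::isdigit(static_cast<unsigned char>(Head[I]))) {
      Err = "address space must be a decimal number";
      return std::nullopt;
    }
    Result.AddrSpace = Result.AddrSpace * 10 + (Head[I] - '0');
    if (Result.AddrSpace >= (1u << 24)) {
      Err = "address space must be less than 2^24";
      return std::nullopt;
    }
  }

  // Same restriction (and message) as in the patch above.
  if (Result.AddrSpace == 0 && (Result.NonIntegral || Result.Unstable)) {
    Err = "address space 0 cannot be non-integral or unstable";
    return std::nullopt;
  }
  return Result;
}
```

With this sketch, `"pn2:64:64:64:32"` yields address space 2 with only the non-integral flag set, while `"pu0:64:64"` is rejected with the same message the unit test expects.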
From de449dd8e32953e59a8e5fc594acee2930e003f9 Mon Sep 17 00:00:00 2001
From: Alex Richardson <alexrichardson at google.com>
Date: Mon, 21 Jul 2025 13:16:53 -0700
Subject: [PATCH 4/5] clang-format
Created using spr 1.3.6-beta.1
---
llvm/include/llvm/IR/DataLayout.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 00302ec156126..6f4981e2b65b6 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -429,8 +429,7 @@ class DataLayout {
/// representations (hasUnstableRepresentation()) unless the pass knows it is
/// within a critical section that retains the current representation.
bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
- return hasUnstableRepresentation(AddrSpace) ||
- hasExternalState(AddrSpace);
+ return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
}
/// Returns whether passes should avoid introducing `ptrtoint` instructions
From 2c49735c0cfd83c731dffbee626e5b9ace29ef0d Mon Sep 17 00:00:00 2001
From: Alex Richardson <alexrichardson at google.com>
Date: Sun, 27 Jul 2025 13:57:55 -0700
Subject: [PATCH 5/5] typo fixes
Created using spr 1.3.6-beta.1
---
llvm/docs/LangRef.rst | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3f1a0bd2fdc41..ef3464e657031 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -749,22 +749,22 @@ The ``inttoptr`` instruction does not recreate the external state and therefore
it is target dependent whether it can be used to create a dereferenceable
pointer. In general passes should assume that the result of such an inttoptr
is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
-yield a capability the external state (the validity tag bit) set to zero,
+yield a capability with the external state (the validity tag bit) set to zero,
which will cause any dereference to trap.
-The ``ptrtotint`` instruction also only returns the "in-band" state and omit
+The ``ptrtoint`` instruction also only returns the "in-band" state and omits
all external state.
These two properties mean that ``inttoptr(ptrtoint(x))`` cannot be folded to
``x`` since the ``ptrtoint`` operation does not include the external state
needed to reconstruct the original pointer and ``inttoptr`` cannot set it.
-When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointers
-is performed, the external metadata is also stored to the implementation-defined
+When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
+is performed, the external metadata is also stored to an implementation-defined
location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
external metadata and make it available for all uses of ``%val``.
Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
-external state. This is essential to allow frontends to efficiently emit of
-copies of structures containing such pointers, since expanding all these copies
-as individual loads and stores would affect compilation speed and inhibit
+external state. This is essential to allow frontends to efficiently emit copies
+of structures containing such pointers, since expanding all these copies as
+individual loads and stores would affect compilation speed and inhibit
optimizations.
Notionally, these external bits are part of the pointer, but since
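The LangRef wording in the hunk above can be illustrated with a toy model, written as a standalone C++ sketch rather than LLVM IR. `ToyCapability` and the `toy*` functions are hypothetical: they loosely imitate a CHERI-style pointer whose validity tag is external state, and they are not a description of any real target's behaviour beyond what the text above states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CHERI-like pointer: Address is the in-band state, Tag models
// the external (out-of-band) validity bit.
struct ToyCapability {
  std::uint64_t Address = 0;
  bool Tag = false;
};

// ptrtoint only returns the in-band state and omits the external state.
inline std::uint64_t toyPtrToInt(const ToyCapability &P) { return P.Address; }

// inttoptr cannot recreate the external state, so the tag is cleared.
inline ToyCapability toyIntToPtr(std::uint64_t Address) {
  return ToyCapability{Address, false};
}

// A pointer-typed load/store (or memcpy/memmove) transfers the external
// state together with the in-band bits.
inline ToyCapability toyCopyViaMemory(const ToyCapability &P) { return P; }

inline bool toyIsDereferenceable(const ToyCapability &P) { return P.Tag; }

// Small usage example: the round trip through an integer loses the tag.
inline void demoToyRoundTrip() {
  const ToyCapability X{0x1000, true};
  const ToyCapability RoundTripped = toyIntToPtr(toyPtrToInt(X));
  assert(RoundTripped.Address == X.Address);
  assert(!toyIsDereferenceable(RoundTripped));
  assert(toyIsDereferenceable(toyCopyViaMemory(X)));
}
```

In this model, folding `inttoptr(ptrtoint(x))` to `x` would turn a pointer that traps on dereference into one that does not, which is why the fold is not legal for such address spaces.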