[clang] [llvm] [DataLayout] Introduce sentinel pointer value (PR #131557)

Shilei Tian via llvm-commits llvm-commits at lists.llvm.org
Sun Mar 16 20:51:29 PDT 2025


https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/131557

>From 86cd48c8f43b34d9fee97137db8abe6454d76268 Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Sun, 16 Mar 2025 23:51:02 -0400
Subject: [PATCH] [DataLayout] Introduce sentinel pointer value
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The value of a null pointer is not always `0`. For example, on AMDGPU, the null
pointer in address spaces 3 and 5 is `0xffffffff`. Currently, there is no
target-independent way to get this information, making it difficult and
error-prone to handle null pointers in target-agnostic code.

We do have `ConstantPointerNull`, but it might be a little confusing and
misleading. It represents a pointer with an all-zero value rather than
necessarily a real `nullptr`.

This PR introduces the concept of a *sentinel pointer value* to `DataLayout`,
representing the actual `nullptr` value for a given address space. The changes
include:

- A new interface function:
  ```
  APInt getSentinelPointerValue(unsigned AS)
  ```
  This function returns an `APInt` representing the sentinel pointer value for
  the given address space `AS`. An `APInt` is used instead of a literal integer
  to support cases where pointers are wider than 64 bits (e.g., AMDGPU’s address
  space 8).

- An extension to the data layout string format:
  ```
  p[n]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]
  ```
  The new `<sentinel>` component specifies the sentinel value for the
  corresponding pointer. It currently supports two values:
    - `0` for an all-zero value
    - `f` for a full-bit set value

  These two values are the most common representations of `nullptr`. It is
  unlikely that any target would define `nullptr` as a random value.

A follow-up patch series will introduce an equivalent of `ConstantPointerNull`
that represents the actual `nullptr`, built on top of this PR.
---
 clang/lib/Basic/Targets/AMDGPU.cpp   |  6 +--
 llvm/docs/LangRef.rst                | 11 ++++--
 llvm/include/llvm/IR/DataLayout.h    |  8 +++-
 llvm/lib/IR/DataLayout.cpp           | 55 +++++++++++++++++++++++-----
 llvm/unittests/IR/DataLayoutTest.cpp | 20 ++++++++--
 5 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index a42b4589fb5ac..1506387a41d96 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -32,9 +32,9 @@ static const char *const DataLayoutStringR600 =
     "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";
 
 static const char *const DataLayoutStringAMDGCN =
-    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
-    "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:"
-    "32-v48:64-v96:128"
+    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32:32:32:f-p4:64:64-p5:32:32:32:32:f"
+    "-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16"
+    "-v24:32-v32:32-v48:64-v96:128"
     "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1"
     "-ni:7:8:9";
 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c0567090fdd2a..a22020da38f83 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3134,7 +3134,7 @@ as follows:
 ``A<address space>``
     Specifies the address space of objects created by '``alloca``'.
     Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>][:<idx>]``
+``p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]``
     This specifies the *size* of a pointer and its ``<abi>`` and
     ``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
     and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
@@ -3143,7 +3143,10 @@ as follows:
     specified, the default index size is equal to the pointer size. All sizes
     are in bits. The address space, ``n``, is optional, and if not specified,
     denotes the default address space 0. The value of ``n`` must be
-    in the range [1,2^24).
+    in the range [1,2^24). The fifth parameter ``<sentinel>`` specifies the
+    sentinel value of the pointer for the corresponding address space. It
+    currently accepts two values: ``0`` for an all-zero value and ``f`` for a
+    full-bit set value. The default sentinel pointer value is all-zero.
 ``i<size>:<abi>[:<pref>]``
     This specifies the alignment for an integer type of a given bit
     ``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -13045,9 +13048,9 @@ This instruction requires several arguments:
    -  Caller and callee both have the calling convention ``fastcc`` or ``tailcc``.
    -  The call is in tail position (ret immediately follows call and ret
       uses value of call or is void).
-   -  Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is 
+   -  Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is
       ``true``, or the calling convention is ``tailcc``.
-   -  `Platform-specific constraints are met. 
+   -  `Platform-specific constraints are met.
       <CodeGenerator.html#tail-call-optimization>`_
 
 #. The optional ``notail`` marker indicates that the optimizers should not add
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 2ad080e6d0cd2..1257ffb32a4df 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -78,6 +78,7 @@ class DataLayout {
     Align ABIAlign;
     Align PrefAlign;
     uint32_t IndexBitWidth;
+    APInt SentinelValue;
     /// Pointers in this address space don't have a well-defined bitwise
     /// representation (e.g. may be relocated by a copying garbage collector).
     /// Additionally, they may also be non-integral (i.e. containing additional
@@ -148,7 +149,7 @@ class DataLayout {
   /// Sets or updates the specification for pointer in the given address space.
   void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
                       Align PrefAlign, uint32_t IndexBitWidth,
-                      bool IsNonIntegral);
+                      APInt SentinelValue, bool IsNonIntegral);
 
   /// Internal helper to get alignment for integer of given bitwidth.
   Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -552,6 +553,11 @@ class DataLayout {
   ///
   /// This includes an explicitly requested alignment (if the global has one).
   Align getPreferredAlign(const GlobalVariable *GV) const;
+
+  /// Returns the sentinel pointer value for a given address space. If the
+  /// address space is invalid, it defaults to the sentinel pointer value of
+  /// address space 0, aligning with the behavior of \p getPointerSpec.
+  APInt getSentinelPointerValue(unsigned AS) const;
 };
 
 inline DataLayout *unwrap(LLVMTargetDataRef P) {
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index 0cf0bfc9702d3..1215aabefe69a 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -152,6 +152,7 @@ bool DataLayout::PointerSpec::operator==(const PointerSpec &Other) const {
   return AddrSpace == Other.AddrSpace && BitWidth == Other.BitWidth &&
          ABIAlign == Other.ABIAlign && PrefAlign == Other.PrefAlign &&
          IndexBitWidth == Other.IndexBitWidth &&
+         SentinelValue == Other.SentinelValue &&
          IsNonIntegral == Other.IsNonIntegral;
 }
 
@@ -206,9 +207,10 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
 };
 
 // Default pointer type specifications.
-constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
-    // p0:64:64:64:64
-    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
+const DataLayout::PointerSpec DefaultPointerSpecs[] = {
+    // p0:64:64:64:64:0
+    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, APInt(64, 0),
+     false},
 };
 
 DataLayout::DataLayout()
@@ -296,6 +298,22 @@ static Error parseSize(StringRef Str, unsigned &BitWidth,
   return Error::success();
 }
 
+static Error parseSentinelValue(StringRef Str, APInt &V) {
+  if (Str.empty())
+    return createStringError("sentinel value component cannot be empty");
+  if (Str.size() != 1)
+    return createStringError("sentinel value component must be a '0' or 'f'");
+  if (Str[0] == '0') {
+    V.clearAllBits();
+    return Error::success();
+  }
+  if (Str[0] == 'f') {
+    V.setAllBits();
+    return Error::success();
+  }
+  return createStringError("sentinel value component must be a '0' or 'f'");
+}
+
 /// Attempts to parse an alignment component of a specification.
 ///
 /// On success, returns the value converted to byte amount in \p Alignment.
@@ -409,13 +427,14 @@ Error DataLayout::parseAggregateSpec(StringRef Spec) {
 }
 
 Error DataLayout::parsePointerSpec(StringRef Spec) {
-  // p[<n>]:<size>:<abi>[:<pref>[:<idx>]]
+  // p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]
   SmallVector<StringRef, 5> Components;
   assert(Spec.front() == 'p');
   Spec.drop_front().split(Components, ':');
 
-  if (Components.size() < 3 || Components.size() > 5)
-    return createSpecFormatError("p[<n>]:<size>:<abi>[:<pref>[:<idx>]]");
+  if (Components.size() < 3 || Components.size() > 6)
+    return createSpecFormatError(
+        "p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]");
 
   // Address space. Optional, defaults to 0.
   unsigned AddrSpace = 0;
@@ -454,8 +473,14 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
     return createStringError(
         "index size cannot be larger than the pointer size");
 
+  APInt SentinelValue(BitWidth, 0);
+  if (Components.size() > 5) {
+    if (Error Err = parseSentinelValue(Components[5], SentinelValue))
+      return Err;
+  }
+
   setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
-                 false);
+                 SentinelValue, /*IsNonIntegral=*/false);
   return Error::success();
 }
 
@@ -631,7 +656,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
     // the spec for AS0, and we then update that to mark it non-integral.
     const PointerSpec &PS = getPointerSpec(AS);
     setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
-                   true);
+                   PS.SentinelValue, /*IsNonIntegral=*/true);
   }
 
   return Error::success();
@@ -679,16 +704,19 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {
 
 void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
                                 Align ABIAlign, Align PrefAlign,
-                                uint32_t IndexBitWidth, bool IsNonIntegral) {
+                                uint32_t IndexBitWidth, APInt SentinelValue,
+                                bool IsNonIntegral) {
   auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
   if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
     PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
-                                       IndexBitWidth, IsNonIntegral});
+                                       IndexBitWidth, SentinelValue,
+                                       IsNonIntegral});
   } else {
     I->BitWidth = BitWidth;
     I->ABIAlign = ABIAlign;
     I->PrefAlign = PrefAlign;
     I->IndexBitWidth = IndexBitWidth;
+    I->SentinelValue = SentinelValue;
     I->IsNonIntegral = IsNonIntegral;
   }
 }
@@ -1020,3 +1048,10 @@ Align DataLayout::getPreferredAlign(const GlobalVariable *GV) const {
   }
   return Alignment;
 }
+
+APInt DataLayout::getSentinelPointerValue(unsigned AS) const {
+  auto I = lower_bound(PointerSpecs, AS, LessPointerAddrSpace());
+  if (I != PointerSpecs.end() || I->AddrSpace == AS)
+    return I->SentinelValue;
+  return PointerSpecs[0].SentinelValue;
+}
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 16a603ff6416f..79d46ebfd7d8e 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -313,11 +313,12 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(DataLayout::parse(Str), Succeeded());
 
   for (StringRef Str :
-       {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32", "p0:32:32:32:32:32"})
+       {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32:0", "p0:32:32:32:32:32:0"})
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
-        FailedWithMessage("malformed specification, must be of the form "
-                          "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>]]\""));
+        FailedWithMessage(
+            "malformed specification, must be of the form "
+            "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]\""));
 
   // address space
   for (StringRef Str : {"p0x0:32:32", "px:32:32:32", "p16777216:32:32:32:32"})
@@ -401,6 +402,19 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage("index size cannot be larger than the pointer size"));
+
+  // sentinel value
+  for (StringRef Str :
+       {"p:32:32:32:32:a", "p0:32:32:32:32:ab", "p42:32:32:32:32:123"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("sentinel value component must be a '0' or 'f'"));
+
+  for (StringRef Str :
+       {"p:32:32:32:32:", "p0:32:32:32:32:", "p42:32:32:32:32:"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("sentinel value component cannot be empty"));
 }
 
 TEST(DataLayoutTest, ParseNativeIntegersSpec) {



More information about the llvm-commits mailing list