[llvm-branch-commits] [DataLayout][LangRef] Split non-integral and unstable pointer properties (PR #105735)

via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Thu Aug 22 14:29:10 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-ir

Author: Alexander Richardson (arichardson)

<details>
<summary>Changes</summary>

Theses are finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType() where the current semantics prohibit
introduction of both ptrtoint and inttoptr instructions. These semantics
are too strict for some targets (e.g. AMDGPU/CHERI) where ptrtoint has
a stable value, but the pointer cannot be recreated just from the
address since there is additional metadata.
Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.

This change splits the two properties and allows for address spaces to
be marked as unstable or non-integral (or both) independently using
the 'p' part of the DataLayout string a 'u' following the p marks the
address space as unstable and a 'n' marks it as non-integral.

This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.shouldAvoid{IntToPtr,PtrToInt}() checks that allow for more
optimizations.


---
Full diff: https://github.com/llvm/llvm-project/pull/105735.diff


4 Files Affected:

- (modified) llvm/docs/LangRef.rst (+71-19) 
- (modified) llvm/include/llvm/IR/DataLayout.h (+70-7) 
- (modified) llvm/lib/IR/DataLayout.cpp (+27-8) 
- (modified) llvm/unittests/IR/DataLayoutTest.cpp (+46) 


``````````diff
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 445980a18e7e93..200224c78be004 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -649,48 +649,95 @@ literal types are uniqued in recent versions of LLVM.
 
 .. _nointptrtype:
 
-Non-Integral Pointer Type
--------------------------
+Non-Integral and Unstable Pointer Types
+---------------------------------------
 
-Note: non-integral pointer types are a work in progress, and they should be
-considered experimental at this time.
+Note: non-integral/unstable pointer types are a work in progress, and they
+should be considered experimental at this time.
 
 LLVM IR optionally allows the frontend to denote pointers in certain address
-spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
-Non-integral pointer types represent pointers that have an *unspecified* bitwise
-representation; that is, the integral representation may be target dependent or
-unstable (not backed by a fixed integer).
+spaces as "non-integral" or "unstable" (or both "non-integral" and "unstable")
+via the :ref:`datalayout string<langref_datalayout>`.
+
+These exact implications of these properties are target-specific, but the
+following IR semantics and restrictions to optimization passes apply:
+
+Unstable pointer representation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers in this address space have an *unspecified* bitwise representation
+(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
+allowed to change in a target-specific way. For example, this could be a pointer
+type used for with copying garbage collection where the garbage collector could
+update the pointer at any time in the collection sweep.
 
 ``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
 integral (i.e. normal) pointers in that they convert integers to and from
-corresponding pointer types, but there are additional implications to be
-aware of.  Because the bit-representation of a non-integral pointer may
-not be stable, two identical casts of the same operand may or may not
+corresponding pointer types, but there are additional implications to be aware
+of.
+
+For "unstable" pointer representations, the bit-representation of the pointer
+may not be stable, so two identical casts of the same operand may or may not
 return the same value.  Said differently, the conversion to or from the
-non-integral type depends on environmental state in an implementation
+"unstable" pointer type depends on environmental state in an implementation
 defined manner.
-
 If the frontend wishes to observe a *particular* value following a cast, the
 generated IR must fence with the underlying environment in an implementation
 defined manner. (In practice, this tends to require ``noinline`` routines for
 such operations.)
 
 From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
-non-integral types are analogous to ones on integral types with one
+"unstable" pointer types are analogous to ones on integral types with one
 key exception: the optimizer may not, in general, insert new dynamic
 occurrences of such casts.  If a new cast is inserted, the optimizer would
 need to either ensure that a) all possible values are valid, or b)
 appropriate fencing is inserted.  Since the appropriate fencing is
 implementation defined, the optimizer can't do the latter.  The former is
 challenging as many commonly expected properties, such as
-``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
+``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
 Similar restrictions apply to intrinsics that might examine the pointer bits,
-such as :ref:`llvm.ptrmask<int_ptrmask>`. 
+such as :ref:`llvm.ptrmask<int_ptrmask>`.
 
-The alignment information provided by the frontend for a non-integral pointer
-(typically using attributes or metadata) must be valid for every possible 
+The alignment information provided by the frontend for an "unstable" pointer
+(typically using attributes or metadata) must be valid for every possible
 representation of the pointer.
 
+Non-integral pointer representation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers are not represented as an address, but may instead include
+additional metadata such as bounds information or a temporal identifier.
+Examples include AMDGPU buffer descriptors with a 128-bit fat pointer and a
+32-bit offset or CHERI capabilities that contain bounds, permissions and an
+out-of-band validity bit. In general, these pointers cannot be re-created
+from just an integer value.
+
+In most cases pointers with a non-integral representation behave exactly the
+same as an integral pointer, the only difference is that it is not possible to
+create a pointer just from an address.
+
+"Non-integral" pointers also impose restrictions on the optimizer, but in
+general these are less restrictive than for "unstable" pointers. The main
+difference compared to integral pointers is that ``inttoptr`` instructions
+should not be inserted by passes as they may not be able to create a valid
+pointer. This property also means that ``inttoptr(ptrtoint(x))`` cannot be
+folded to ``x`` as the ``ptrtoint`` operation may destroy the necessary metadata
+to reconstruct the pointer.
+Additionaly, since there could be out-of-band state, it is also not legal to
+convert a load/store of a non-integral pointer type to a load/store of an
+integer type with same bitwidth as that may not copy all the state.
+However, it is legal to use appropriately aligned ``llvm.memcpy`` and
+``llvm.memmove`` for copies of non-integral pointers as long as these are not
+converted into integer operations.
+
+Unlike "unstable" pointers, the bit-wise representation is stable and
+``ptrtoint(x)`` always yields a deterministic values.
+This means optimizer is still permitted to insert new ``ptrtoint`` instructions.
+However, it is important to note that ``ptrtoint`` may not yield the same value
+as storing the pointer via memory and reading it back as an integer, even if the
+bitwidth of the two types matches (since ptrtoint could involve some form of
+arithmetic or strip parts of the non-integral pointer representation).
+
 .. _globalvars:
 
 Global Variables
@@ -3056,7 +3103,7 @@ as follows:
 ``A<address space>``
     Specifies the address space of objects created by '``alloca``'.
     Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>][:<idx>]``
+``p[<flags>][n]:<size>:<abi>[:<pref>][:<idx>]``
     This specifies the *size* of a pointer and its ``<abi>`` and
     ``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
     and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
@@ -3066,6 +3113,11 @@ as follows:
     are in bits. The address space, ``n``, is optional, and if not specified,
     denotes the default address space 0. The value of ``n`` must be
     in the range [1,2^24).
+    The optional``<flags>`` are used to specify properties of pointers in this
+address space: the character ``u`` marks pointers as having an unstable
+    representation and ```n`` marks pointers as non-integral (i.e. having
+    additional metadata). See :ref:`Non-Integral Pointer Types <nointptrtype>`.
+
 ``i<size>:<abi>[:<pref>]``
     This specifies the alignment for an integer type of a given bit
     ``<size>``. The value of ``<size>`` must be in the range [1,2^24).
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 953deb6653cc9a..9a89bad2654867 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -79,10 +79,14 @@ class DataLayout {
     Align PrefAlign;
     uint32_t IndexBitWidth;
     /// Pointers in this address space don't have a well-defined bitwise
-    /// representation (e.g. may be relocated by a copying garbage collector).
-    /// Additionally, they may also be non-integral (i.e. containing additional
-    /// metadata such as bounds information/permissions).
-    bool IsNonIntegral = false;
+    /// representation (e.g. they may be relocated by a copying garbage
+    /// collector and thus have different addresses at different times).
+    bool HasUnstableRepresentation = false;
+    /// Pointers in this address spacs are non-integral, i.e. don't have a
+    /// integer representation that simply maps to the address. An example of
+    /// this would be fat pointers with bounds information or CHERI capabilities
+    /// that include metadata as well as one out-of-band validity bit.
+    bool HasNonIntegralRepresentation = false;
     bool operator==(const PointerSpec &Other) const;
   };
 
@@ -148,7 +152,7 @@ class DataLayout {
   /// Sets or updates the specification for pointer in the given address space.
   void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
                       Align PrefAlign, uint32_t IndexBitWidth,
-                      bool IsNonIntegral);
+                      bool HasUnstableRepr, bool HasNonIntegralRepr);
 
   /// Internal helper to get alignment for integer of given bitwidth.
   Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -348,14 +352,63 @@ class DataLayout {
   SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
     SmallVector<unsigned, 8> AddrSpaces;
     for (const PointerSpec &PS : PointerSpecs) {
-      if (PS.IsNonIntegral)
+      if (PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation)
         AddrSpaces.push_back(PS.AddrSpace);
     }
     return AddrSpaces;
   }
 
+  /// Returns whether this address space is "non-integral" and "unstable".
+  /// This means that passes should not introduce inttoptr or ptrtoint
+  /// instructions operating on pointers of this address space.
+  /// TODO: remove this function after migrating to finer-grained properties.
   bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
-    return getPointerSpec(AddrSpace).IsNonIntegral;
+    const PointerSpec &PS = getPointerSpec(AddrSpace);
+    return PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation;
+  }
+
+  /// Returns whether this address space has an "unstable" pointer
+  /// representation. The bitwise pattern of such pointers is allowed to change
+  /// in a target-specific way. For example, this could be used for copying
+  /// garbage collection where the garbage collector could update the pointer
+  /// value as part of the collection sweep.
+  bool hasUnstableRepresentation(unsigned AddrSpace) const {
+    return getPointerSpec(AddrSpace).HasUnstableRepresentation;
+  }
+
+  /// Returns whether this address space has a non-integral pointer
+  /// representation, i.e. the pointer is not just an integer address but some
+  /// other bitwise representation. Examples include AMDGPU buffer descriptors
+  /// with a 128-bit fat pointer and a 32-bit offset or CHERI capabilities that
+  /// contain bounds, permissions and an out-of-band validity bit. In general,
+  /// these pointers cannot be re-created from just an integer value.
+  bool hasNonIntegralRepresentation(unsigned AddrSpace) const {
+    return getPointerSpec(AddrSpace).HasNonIntegralRepresentation;
+  }
+
+  /// Returns whether passes should avoid introducing `inttoptr` instructions
+  /// for this address space.
+  ///
+  /// This is currently the case "non-integral" pointer representations
+  /// (hasNonIntegralRepresentation()) since such pointers generally require
+  /// additional metadata beyond just an address.
+  /// New `inttoptr` instructions should also be avoided for "unstable" bitwise
+  /// representations (hasUnstableRepresentation()) unless the pass knows it is
+  /// within a critical section that retains the current representation.
+  bool shouldAvoidIntToPtr(unsigned AddrSpace) const {
+    const PointerSpec &PS = getPointerSpec(AddrSpace);
+    return PS.HasNonIntegralRepresentation || PS.HasUnstableRepresentation;
+  }
+
+  /// Returns whether passes should avoid introducing `ptrtoint` instructions
+  /// for this address space.
+  ///
+  /// This is currently the case for pointer address spaces that have an
+  /// "unstable" representation (hasUnstableRepresentation()) since the
+  /// bitwise pattern of such pointers could change unless the pass knows it is
+  /// within a critical section that retains the current representation.
+  bool shouldAvoidPtrToInt(unsigned AddrSpace) const {
+    return hasUnstableRepresentation(AddrSpace);
   }
 
   bool isNonIntegralPointerType(PointerType *PT) const {
@@ -367,6 +420,16 @@ class DataLayout {
     return PTy && isNonIntegralPointerType(PTy);
   }
 
+  bool shouldAvoidPtrToInt(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty);
+    return PTy && shouldAvoidPtrToInt(PTy->getPointerAddressSpace());
+  }
+
+  bool shouldAvoidIntToPtr(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty);
+    return PTy && shouldAvoidIntToPtr(PTy->getPointerAddressSpace());
+  }
+
   /// Layout pointer size, in bits
   /// FIXME: The defaults need to be removed once all of
   /// the backends/clients are updated.
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index a7a1fa8ef03ede..63ee730c4b5882 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -152,7 +152,8 @@ bool DataLayout::PointerSpec::operator==(const PointerSpec &Other) const {
   return AddrSpace == Other.AddrSpace && BitWidth == Other.BitWidth &&
          ABIAlign == Other.ABIAlign && PrefAlign == Other.PrefAlign &&
          IndexBitWidth == Other.IndexBitWidth &&
-         IsNonIntegral == Other.IsNonIntegral;
+         HasUnstableRepresentation == Other.HasUnstableRepresentation &&
+         HasNonIntegralRepresentation == Other.HasNonIntegralRepresentation;
 }
 
 namespace {
@@ -419,9 +420,24 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
 
   // Address space. Optional, defaults to 0.
   unsigned AddrSpace = 0;
-  if (!Components[0].empty())
-    if (Error Err = parseAddrSpace(Components[0], AddrSpace))
+  bool UnstableRepr = false;
+  bool NonIntegralRepr = false;
+  StringRef AddrSpaceStr = Components[0].drop_while([&](char C) {
+    if (C == 'n') {
+      NonIntegralRepr = true;
+      return true;
+    } else if (C == 'u') {
+      UnstableRepr = true;
+      return true;
+    }
+    return false;
+  });
+  if (!AddrSpaceStr.empty()) {
+    if (Error Err = parseAddrSpace(AddrSpaceStr, AddrSpace))
       return Err;
+  }
+  if (AddrSpace == 0 && (NonIntegralRepr || UnstableRepr))
+    return createStringError("address space 0 cannot be non-integral");
 
   // Size. Required, cannot be zero.
   unsigned BitWidth;
@@ -455,7 +471,7 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
         "index size cannot be larger than the pointer size");
 
   setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
-                 false);
+                 UnstableRepr, NonIntegralRepr);
   return Error::success();
 }
 
@@ -629,7 +645,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
   for (unsigned AS : NonIntegralAddressSpaces) {
     const PointerSpec &PS = getPointerSpec(AS);
     setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
-                   true);
+                   true, true);
   }
 
   return Error::success();
@@ -677,17 +693,20 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {
 
 void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
                                 Align ABIAlign, Align PrefAlign,
-                                uint32_t IndexBitWidth, bool IsNonIntegral) {
+                                uint32_t IndexBitWidth, bool HasUnstableRepr,
+                                bool HasNonIntegralRepr) {
   auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
   if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
     PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
-                                       IndexBitWidth, IsNonIntegral});
+                                       IndexBitWidth, HasUnstableRepr,
+                                       HasNonIntegralRepr});
   } else {
     I->BitWidth = BitWidth;
     I->ABIAlign = ABIAlign;
     I->PrefAlign = PrefAlign;
     I->IndexBitWidth = IndexBitWidth;
-    I->IsNonIntegral = IsNonIntegral;
+    I->HasUnstableRepresentation = HasUnstableRepr;
+    I->HasNonIntegralRepresentation = HasNonIntegralRepr;
   }
 }
 
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 396d44af19f53f..d26eb70509f515 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -319,6 +319,12 @@ TEST(DataLayout, ParsePointerSpec) {
         FailedWithMessage("malformed specification, must be of the form "
                           "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>]]\""));
 
+  // Only 'u' and 'n' flags are valid.
+  for (StringRef Str : {"pa:32:32", "px:32:32"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("address space must be a 24-bit integer"));
+
   // address space
   for (StringRef Str : {"p0x0:32:32", "px:32:32:32", "p16777216:32:32:32:32"})
     EXPECT_THAT_EXPECTED(
@@ -401,6 +407,12 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage("index size cannot be larger than the pointer size"));
+
+  for (StringRef Str : {"pn:64:64", "pu:64:64", "pun:64:64", "pnu:64:64",
+                        "pn0:64:64", "pu0:64:64", "pun0:64:64", "pnu0:64:64"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("address space 0 cannot be non-integral"));
 }
 
 TEST(DataLayoutTest, ParseNativeIntegersSpec) {
@@ -553,6 +565,40 @@ TEST(DataLayout, IsNonIntegralAddressSpace) {
   EXPECT_FALSE(Custom.isNonIntegralAddressSpace(1));
   EXPECT_TRUE(Custom.isNonIntegralAddressSpace(2));
   EXPECT_TRUE(Custom.isNonIntegralAddressSpace(16777215));
+
+  // Pointers can be marked as non-integral using 'pn'
+  DataLayout NonIntegral = cantFail(DataLayout::parse("pn2:64:64:64:32"));
+  EXPECT_TRUE(NonIntegral.isNonIntegralAddressSpace(2));
+  EXPECT_TRUE(NonIntegral.hasNonIntegralRepresentation(2));
+  EXPECT_FALSE(NonIntegral.hasUnstableRepresentation(2));
+  EXPECT_TRUE(NonIntegral.shouldAvoidIntToPtr(2));
+  EXPECT_FALSE(NonIntegral.shouldAvoidPtrToInt(2));
+
+  // Pointers can be marked as unstable using 'pu'
+  DataLayout Unstable = cantFail(DataLayout::parse("pu2:64:64:64:32"));
+  EXPECT_TRUE(Unstable.isNonIntegralAddressSpace(2));
+  EXPECT_TRUE(Unstable.hasUnstableRepresentation(2));
+  EXPECT_FALSE(Unstable.hasNonIntegralRepresentation(2));
+  EXPECT_TRUE(Unstable.shouldAvoidPtrToInt(2));
+  EXPECT_TRUE(Unstable.shouldAvoidIntToPtr(2));
+
+  // Both properties can also be set using 'pnu'/'pun'
+  for (auto Layout : {"pnu2:64:64:64:32", "pun2:64:64:64:32"}) {
+    DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasNonIntegralRepresentation(2));
+    EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+  }
+
+  // For backwards compatibility, the ni DataLayout part overrides any p[n][u].
+  for (auto Layout : {"ni:2-pn2:64:64:64:32", "ni:2-pnu2:64:64:64:32",
+                      "ni:2-pu2:64:64:64:32", "pn2:64:64:64:32-ni:2",
+                      "pnu2:64:64:64:32-ni:2", "pu2:64:64:64:32-ni:2"}) {
+    DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasNonIntegralRepresentation(2));
+    EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+  }
 }
 
 TEST(DataLayoutTest, CopyAssignmentInvalidatesStructLayout) {

``````````

</details>


https://github.com/llvm/llvm-project/pull/105735


More information about the llvm-branch-commits mailing list