[llvm] dde000a - [DataLayout][LangRef] Split non-integral and unstable pointer properties
    via llvm-commits llvm-commits at lists.llvm.org
    Tue Sep 23 11:16:52 PDT 2025

Author: Alexander Richardson
Date: 2025-09-23T11:16:47-07:00
New Revision: dde000a7d619ab2032f3e721edc850fb421e50cd
URL: https://github.com/llvm/llvm-project/commit/dde000a7d619ab2032f3e721edc850fb421e50cd
DIFF: https://github.com/llvm/llvm-project/commit/dde000a7d619ab2032f3e721edc850fb421e50cd.diff
LOG: [DataLayout][LangRef] Split non-integral and unstable pointer properties

This commit adds finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType() where the current semantics prohibit
introduction of both ptrtoint and inttoptr instructions. The current
semantics are too strict for some targets (e.g. AMDGPU/CHERI) where
ptrtoint has a stable value, but the pointer has additional metadata.

Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.

This change splits the properties of non-integral pointers and allows
for address spaces to be marked as unstable or non-integral (or both)
independently using the 'p' part of the DataLayout string.
A 'u' following the p marks the address space as unstable and specifying
an index width != representation width marks it as non-integral.
Finally, we also add an 'e' flag to mark pointers with external state
(such as the CHERI capability validity bit). These pointers require
special handling of loads and stores in addition to being non-integral.

This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more
optimizations.
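
As a rough illustration of how the three properties combine into the new
queries, the standalone C++ sketch below mirrors the predicate bodies from the
DataLayout.h hunk of this commit. PointerSpecSketch and the free functions are
hypothetical stand-ins written for this note and are not part of LLVM; the
expected values follow the unit tests added to DataLayoutTest.cpp.

```cpp
#include <cstdint>

// Hypothetical mirror of the per-address-space fields used by this commit.
// This is NOT llvm::DataLayout::PointerSpec, only a sketch of its logic.
struct PointerSpecSketch {
  uint32_t BitWidth;              // size of the bitwise representation
  uint32_t IndexBitWidth;         // address/index width
  bool HasUnstableRepresentation; // 'u' flag (or the legacy "ni" spec)
  bool HasExternalState;          // 'e' flag
};

// Any of the three properties makes the address space non-integral.
constexpr bool isNonIntegral(const PointerSpecSketch &PS) {
  return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation ||
         PS.HasExternalState;
}

// inttoptr cannot recreate external state and is unsafe for unstable pointers.
constexpr bool mustNotIntroduceIntToPtr(const PointerSpecSketch &PS) {
  return PS.HasUnstableRepresentation || PS.HasExternalState;
}

// ptrtoint is only problematic when the bit pattern itself may change.
constexpr bool mustNotIntroducePtrToInt(const PointerSpecSketch &PS) {
  return PS.HasUnstableRepresentation;
}

// "p2:64:64:64:32": non-address bits only, so both casts may be introduced.
static_assert(isNonIntegral(PointerSpecSketch{64, 32, false, false}),
              "index width != representation width is non-integral");
static_assert(!mustNotIntroduceIntToPtr(PointerSpecSketch{64, 32, false, false}),
              "inttoptr allowed");
static_assert(!mustNotIntroducePtrToInt(PointerSpecSketch{64, 32, false, false}),
              "ptrtoint allowed");

// "pe2:64:64:64:64": external state blocks new inttoptr but not ptrtoint.
static_assert(mustNotIntroduceIntToPtr(PointerSpecSketch{64, 64, false, true}),
              "inttoptr blocked");
static_assert(!mustNotIntroducePtrToInt(PointerSpecSketch{64, 64, false, true}),
              "ptrtoint allowed");

// "pu2:64:64:64:64": an unstable representation blocks both casts.
static_assert(mustNotIntroduceIntToPtr(PointerSpecSketch{64, 64, true, false}),
              "inttoptr blocked");
static_assert(mustNotIntroducePtrToInt(PointerSpecSketch{64, 64, true, false}),
              "ptrtoint blocked");
```

The asymmetry is the point of the split: a pass that only needs to read the
address of a pointer with external state can still do so, while the coarse
isNonIntegral() check would have rejected it.
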
RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176
Reviewed By: nikic, krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/105735
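
The flag syntax of the 'p' specifier described in the log message can be
sketched as a small standalone parser. This is a simplified approximation of
the loop added to DataLayout::parsePointerSpec() in this commit, assuming
C++17; PointerFlags and parsePointerPrefix are hypothetical names, and error
reporting is reduced to returning std::nullopt instead of an llvm::Error.

```cpp
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch; not an LLVM type.
struct PointerFlags {
  bool Unstable = false;      // set by 'u'
  bool ExternalState = false; // set by 'e'
  unsigned AddrSpace = 0;     // defaults to 0 when omitted
};

// Parses the text between the leading 'p' and the first ':' of a pointer
// specification, e.g. "eu2" for "peu2:64:64:64:32".
// Returns std::nullopt for malformed input.
inline std::optional<PointerFlags> parsePointerPrefix(std::string_view S) {
  PointerFlags F;
  // Consume flag characters; any other letter is rejected.
  while (!S.empty()) {
    char C = S.front();
    if (C == 'e')
      F.ExternalState = true;
    else if (C == 'u')
      F.Unstable = true;
    else if (std::isalpha(static_cast<unsigned char>(C)))
      return std::nullopt; // not a valid pointer specification flag
    else
      break; // the remainder must be the address space number
    S.remove_prefix(1);
  }
  // The remaining characters, if any, must be a decimal number below 2^24.
  unsigned long Value = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt; // e.g. "2n": flags must precede the number
    Value = Value * 10 + static_cast<unsigned long>(C - '0');
    if (Value >= (1ul << 24))
      return std::nullopt;
  }
  F.AddrSpace = static_cast<unsigned>(Value);
  // Address space 0 cannot be unstable or have external state.
  if (F.AddrSpace == 0 && (F.Unstable || F.ExternalState))
    return std::nullopt;
  return F;
}
```

With this sketch, parsePointerPrefix("eu2") yields both flags set for address
space 2, while "a", "2n", "u" and "e0" are rejected, matching the failure
cases exercised by the new unit tests.
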
Added: 
    
Modified: 
    llvm/docs/LangRef.rst
    llvm/include/llvm/IR/DataLayout.h
    llvm/lib/IR/DataLayout.cpp
    llvm/test/Transforms/InstSimplify/ConstProp/inttoptr-gep-index-width.ll
    llvm/test/Transforms/SimplifyCFG/switch_create-custom-dl.ll
    llvm/unittests/IR/DataLayoutTest.cpp
Removed: 
    
################################################################################
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index e6713c827d6ab..b32a27f9555fd 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -660,19 +660,60 @@ Non-Integral Pointer Type
 Note: non-integral pointer types are a work in progress, and they should be
 considered experimental at this time.
 
-LLVM IR optionally allows the frontend to denote pointers in certain address
-spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
-Non-integral pointer types represent pointers that have an *unspecified* bitwise
-representation; that is, the integral representation may be target dependent or
-unstable (not backed by a fixed integer).
+For most targets, the pointer representation is a direct mapping from the
+bitwise representation to the address of the underlying memory location.
+Such pointers are considered "integral", and any pointers where the
+representation is not just an integer address are called "non-integral".
+
+Non-integral pointers have at least one of the following three properties:
+
+* the pointer representation contains non-address bits
+* the pointer representation is unstable (may changed at any time in a
+  target-specific way)
+* the pointer representation has external state
+
+These properties (or combinations thereof) can be applied to pointers via the
+:ref:`datalayout string<langref_datalayout>`.
+
+The exact implications of these properties are target-specific. The following
+subsections describe the IR semantics and restrictions to optimization passes
+for each of these properties.
+
+Pointers with non-address bits
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers in this address space have a bitwise representation that not only
+has address bits, but also some other target-specific metadata.
+In most cases pointers with non-address bits behave exactly the same as
+integral pointers, the only difference is that it is not possible to create a
+pointer just from an address unless all the non-address bits are also recreated
+correctly in a target-specific way.
+
+An example of pointers with non-address bits are the AMDGPU buffer descriptors
+which are 160 bits: a 128-bit fat pointer and a 32-bit offset.
+Similarly, CHERI capabilities contain a 32 or 64 bit address as well as the
+same number of metadata bits, but unlike the AMDGPU buffer descriptors they have
+external state in addition to non-address bits.
+
+
+Unstable pointer representation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointers in this address space have an *unspecified* bitwise representation
+(i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
+allowed to change in a target-specific way. For example, this could be a pointer
+type used with copying garbage collection where the garbage collector could
+update the pointer at any time in the collection sweep.
 
 ``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
 integral (i.e., normal) pointers in that they convert integers to and from
-corresponding pointer types, but there are additional implications to be
-aware of.  Because the bit-representation of a non-integral pointer may
-not be stable, two identical casts of the same operand may or may not
+corresponding pointer types, but there are additional implications to be aware
+of.
+
+For "unstable" pointer representations, the bit-representation of the pointer
+may not be stable, so two identical casts of the same operand may or may not
 return the same value.  Said differently, the conversion to or from the
-non-integral type depends on environmental state in an implementation
+"unstable" pointer type depends on environmental state in an implementation
 defined manner.
 
 If the frontend wishes to observe a *particular* value following a cast, the
@@ -681,21 +722,72 @@ defined manner. (In practice, this tends to require ``noinline`` routines for
 such operations.)
 
 From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
-non-integral types are analogous to ones on integral types with one
+"unstable" pointer types are analogous to ones on integral types with one
 key exception: the optimizer may not, in general, insert new dynamic
 occurrences of such casts.  If a new cast is inserted, the optimizer would
 need to either ensure that a) all possible values are valid, or b)
 appropriate fencing is inserted.  Since the appropriate fencing is
 implementation defined, the optimizer can't do the latter.  The former is
 challenging as many commonly expected properties, such as
-``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
+``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
 Similar restrictions apply to intrinsics that might examine the pointer bits,
 such as :ref:`llvm.ptrmask<int_ptrmask>`.
 
-The alignment information provided by the frontend for a non-integral pointer
+The alignment information provided by the frontend for an "unstable" pointer
 (typically using attributes or metadata) must be valid for every possible
 representation of the pointer.
 
+Pointers with external state
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A further special case of non-integral pointers is ones that include external
+state (such as bounds information or a type tag) with a target-defined size.
+An example of such a type is a CHERI capability, where there is an additional
+validity bit that is part of all pointer-typed registers, but is located in
+memory at an implementation-defined address separate from the pointer itself.
+Another example would be a fat-pointer scheme where pointers remain plain
+integers, but the associated bounds are stored in an out-of-band table.
+
+Unless also marked as "unstable", the bit-wise representation of pointers with
+external state is stable and ``ptrtoint(x)`` always yields a deterministic
+value. This means transformation passes are still permitted to insert new
+``ptrtoint`` instructions.
+
+The following restrictions apply to IR level optimization passes:
+
+The ``inttoptr`` instruction does not recreate the external state and therefore
+it is target dependent whether it can be used to create a dereferenceable
+pointer. In general passes should assume that the result of such an inttoptr
+is not dereferenceable. For example, on CHERI targets an ``inttoptr`` will
+yield a capability with the external state (the validity tag bit) set to zero,
+which will cause any dereference to trap.
+The ``ptrtoint`` instruction also only returns the "in-band" state and omits
+all external  state.
+
+When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
+is performed, the external metadata is also stored to an implementation-defined
+location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
+external metadata and make it available for all uses of ``%val``.
+Similarly, the ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
+external state. This is essential to allow frontends to efficiently emit copies
+of structures containing such pointers, since expanding all these copies as
+individual loads and stores would affect compilation speed and inhibit
+optimizations.
+
+Notionally, these external bits are part of the pointer, but since
+``inttoptr`` / ``ptrtoint``` only operate on the "in-band" bits of the pointer
+and the external bits are not explicitly exposed, they are not included in the
+size specified in the :ref:`datalayout string<langref_datalayout>`.
+
+When a pointer type has external state, all roundtrips via memory must
+be performed as loads and stores of the correct type since stores of other
+types may not propagate the external data.
+Therefore it is not legal to convert an existing load/store (or a
+``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
+state to a load/store of an integer type with same bitwidth, as that may drop
+the external state.
+
+
 .. _globalvars:
 
 Global Variables
@@ -3179,8 +3271,8 @@ as follows:
 ``A<address space>``
     Specifies the address space of objects created by '``alloca``'.
     Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
-    This specifies the properties of a pointer in address space ``n``.
+``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
+    This specifies the properties of a pointer in address space ``as``.
     The ``<size>`` parameter specifies the size of the bitwise representation.
     For :ref:`non-integral pointers <nointptrtype>` the representation size may
     be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3285,13 @@ as follows:
     default index size is equal to the pointer size.
     The index size also specifies the width of addresses in this address space.
     All sizes are in bits.
-    The address space, ``n``, is optional, and if not specified,
-    denotes the default address space 0. The value of ``n`` must be
-    in the range [1,2^24).
+    The address space, ``<as>``, is optional, and if not specified, denotes the
+    default address space 0. The value of ``<as>`` must be in the range [1,2^24).
+    The optional ``<flags>`` are used to specify properties of pointers in this
+    address space: the character ``u`` marks pointers as having an unstable
+    representation, and ``e`` marks pointers having external state. See
+    :ref:`Non-Integral Pointer Types <nointptrtype>`.
+
 ``i<size>:<abi>[:<pref>]``
     This specifies the alignment for an integer type of a given bit
     ``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3344,11 @@ as follows:
     this set are considered to support most general arithmetic operations
     efficiently.
 ``ni:<address space0>:<address space1>:<address space2>...``
-    This specifies pointer types with the specified address spaces
-    as :ref:`Non-Integral Pointer Type <nointptrtype>` s.  The ``0``
-    address space cannot be specified as non-integral.
+    This marks pointer types with the specified address spaces
+    as :ref:`unstable <nointptrtype>`.
+    The ``0`` address space cannot be specified as non-integral.
+    It is only supported for backwards compatibility, the flags of the ``p``
+    specifier should be used instead for new code.
 
 ``<abi>`` is a lower bound on what is required for a type to be considered
 aligned. This is used in various places, such as:
@@ -31402,4 +31500,3 @@ Semantics:
 
 The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
 as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
-
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 5653ee7b6837d..56fc749838ef9 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -77,12 +77,21 @@ class DataLayout {
     uint32_t BitWidth;
     Align ABIAlign;
     Align PrefAlign;
+    /// The index bit width also defines the address size in this address space.
+    /// If the index width is less than the representation bit width, the
+    /// pointer is non-integral and bits beyond the index width could be used
+    /// for additional metadata (e.g. AMDGPU buffer fat pointers with bounds
+    /// and other flags or CHERI capabilities that contain bounds+permissions).
     uint32_t IndexBitWidth;
     /// Pointers in this address space don't have a well-defined bitwise
-    /// representation (e.g. may be relocated by a copying garbage collector).
-    /// Additionally, they may also be non-integral (i.e. containing additional
-    /// metadata such as bounds information/permissions).
-    bool IsNonIntegral;
+    /// representation (e.g. they may be relocated by a copying garbage
+    /// collector and thus have different addresses at different times).
+    bool HasUnstableRepresentation;
+    /// Pointers in this address space have additional state bits that are
+    /// located at a target-defined location when stored in memory. An example
+    /// of this would be CHERI capabilities where the validity bit is stored
+    /// separately from the pointer address+bounds information.
+    bool HasExternalState;
     LLVM_ABI bool operator==(const PointerSpec &Other) const;
   };
 
@@ -149,7 +158,7 @@ class DataLayout {
   /// Sets or updates the specification for pointer in the given address space.
   void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
                       Align PrefAlign, uint32_t IndexBitWidth,
-                      bool IsNonIntegral);
+                      bool HasUnstableRepr, bool HasExternalState);
 
   /// Internal helper to get alignment for integer of given bitwidth.
   LLVM_ABI Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -355,19 +364,91 @@ class DataLayout {
   /// \sa DataLayout::getAddressSizeInBits
   unsigned getAddressSize(unsigned AS) const { return getIndexSize(AS); }
 
-  /// Return the address spaces containing non-integral pointers.  Pointers in
-  /// this address space don't have a well-defined bitwise representation.
-  SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
+  /// Return the address spaces with special pointer semantics (such as being
+  /// unstable or non-integral).
+  SmallVector<unsigned, 8> getNonStandardAddressSpaces() const {
     SmallVector<unsigned, 8> AddrSpaces;
     for (const PointerSpec &PS : PointerSpecs) {
-      if (PS.IsNonIntegral)
+      if (PS.HasUnstableRepresentation || PS.HasExternalState ||
+          PS.BitWidth != PS.IndexBitWidth)
         AddrSpaces.push_back(PS.AddrSpace);
     }
     return AddrSpaces;
   }
 
+  /// Returns whether this address space has a non-integral pointer
+  /// representation, i.e. the pointer is not just an integer address but some
+  /// other bitwise representation. When true, passes cannot assume that all
+  /// bits of the representation map directly to the allocation address.
+  /// NOTE: This also returns true for "unstable" pointers where the
+  /// representation may be just an address, but this value can change at any
+  /// given time (e.g. due to copying garbage collection).
+  /// Examples include AMDGPU buffer descriptors with a 128-bit fat pointer
+  /// and a 32-bit offset or CHERI capabilities that contain bounds, permissions
+  /// and an out-of-band validity bit.
+  ///
+  /// In general, more specialized functions such as mustNotIntroduceIntToPtr(),
+  /// mustNotIntroducePtrToInt(), or hasExternalState() should be
+  /// preferred over this one when reasoning about the behavior of IR
+  /// analysis/transforms.
+  /// TODO: should remove/deprecate this once all uses have migrated.
   bool isNonIntegralAddressSpace(unsigned AddrSpace) const {
-    return getPointerSpec(AddrSpace).IsNonIntegral;
+    const auto &PS = getPointerSpec(AddrSpace);
+    return PS.BitWidth != PS.IndexBitWidth || PS.HasUnstableRepresentation ||
+           PS.HasExternalState;
+  }
+
+  /// Returns whether this address space has an "unstable" pointer
+  /// representation. The bitwise pattern of such pointers is allowed to change
+  /// in a target-specific way. For example, this could be used for copying
+  /// garbage collection where the garbage collector could update the pointer
+  /// value as part of the collection sweep.
+  bool hasUnstableRepresentation(unsigned AddrSpace) const {
+    return getPointerSpec(AddrSpace).HasUnstableRepresentation;
+  }
+  bool hasUnstableRepresentation(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
+    return PTy && hasUnstableRepresentation(PTy->getPointerAddressSpace());
+  }
+
+  /// Returns whether this address space has external state (implies having
+  /// a non-integral pointer representation).
+  /// These pointer types must be loaded and stored using appropriate
+  /// instructions and cannot use integer loads/stores as this would not
+  /// propagate the out-of-band state. An example of such a pointer type is a
+  /// CHERI capability that contain bounds, permissions and an out-of-band
+  /// validity bit that is invalidated whenever an integer/FP store is performed
+  /// to the associated memory location.
+  bool hasExternalState(unsigned AddrSpace) const {
+    return getPointerSpec(AddrSpace).HasExternalState;
+  }
+  bool hasExternalState(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
+    return PTy && hasExternalState(PTy->getPointerAddressSpace());
+  }
+
+  /// Returns whether passes must avoid introducing `inttoptr` instructions
+  /// for this address space (unless they have target-specific knowledge).
+  ///
+  /// This is currently the case for non-integral pointer representations with
+  /// external state (hasExternalState()) since `inttoptr` cannot recreate the
+  /// external state bits.
+  /// New `inttoptr` instructions should also be avoided for "unstable" bitwise
+  /// representations (hasUnstableRepresentation()) unless the pass knows it is
+  /// within a critical section that retains the current representation.
+  bool mustNotIntroduceIntToPtr(unsigned AddrSpace) const {
+    return hasUnstableRepresentation(AddrSpace) || hasExternalState(AddrSpace);
+  }
+
+  /// Returns whether passes must avoid introducing `ptrtoint` instructions
+  /// for this address space (unless they have target-specific knowledge).
+  ///
+  /// This is currently the case for pointer address spaces that have an
+  /// "unstable" representation (hasUnstableRepresentation()) since the
+  /// bitwise pattern of such pointers could change unless the pass knows it is
+  /// within a critical section that retains the current representation.
+  bool mustNotIntroducePtrToInt(unsigned AddrSpace) const {
+    return hasUnstableRepresentation(AddrSpace);
   }
 
   bool isNonIntegralPointerType(PointerType *PT) const {
@@ -375,10 +456,20 @@ class DataLayout {
   }
 
   bool isNonIntegralPointerType(Type *Ty) const {
-    auto *PTy = dyn_cast<PointerType>(Ty);
+    auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
     return PTy && isNonIntegralPointerType(PTy);
   }
 
+  bool mustNotIntroducePtrToInt(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
+    return PTy && mustNotIntroducePtrToInt(PTy->getPointerAddressSpace());
+  }
+
+  bool mustNotIntroduceIntToPtr(Type *Ty) const {
+    auto *PTy = dyn_cast<PointerType>(Ty->getScalarType());
+    return PTy && mustNotIntroduceIntToPtr(PTy->getPointerAddressSpace());
+  }
+
   /// The size in bits of the pointer representation in a given address space.
   /// This is not necessarily the same as the integer address of a pointer (e.g.
   /// for fat pointers).
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index 77f9b997a2ebf..49e1f898ca594 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -151,7 +151,8 @@ bool DataLayout::PointerSpec::operator==(const PointerSpec &Other) const {
   return AddrSpace == Other.AddrSpace && BitWidth == Other.BitWidth &&
          ABIAlign == Other.ABIAlign && PrefAlign == Other.PrefAlign &&
          IndexBitWidth == Other.IndexBitWidth &&
-         IsNonIntegral == Other.IsNonIntegral;
+         HasUnstableRepresentation == Other.HasUnstableRepresentation &&
+         HasExternalState == Other.HasExternalState;
 }
 
 namespace {
@@ -194,7 +195,7 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
 // Default pointer type specifications.
 constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
     // p0:64:64:64:64
-    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
+    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false, false},
 };
 
 DataLayout::DataLayout()
@@ -405,9 +406,29 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
 
   // Address space. Optional, defaults to 0.
   unsigned AddrSpace = 0;
-  if (!Components[0].empty())
-    if (Error Err = parseAddrSpace(Components[0], AddrSpace))
-      return Err;
+  bool ExternalState = false;
+  bool UnstableRepr = false;
+  StringRef AddrSpaceStr = Components[0];
+  while (!AddrSpaceStr.empty()) {
+    char C = AddrSpaceStr.front();
+    if (C == 'e') {
+      ExternalState = true;
+    } else if (C == 'u') {
+      UnstableRepr = true;
+    } else if (isAlpha(C)) {
+      return createStringError("'%c' is not a valid pointer specification flag",
+                               C);
+    } else {
+      break; // not a valid flag, remaining must be the address space number.
+    }
+    AddrSpaceStr = AddrSpaceStr.drop_front(1);
+  }
+  if (!AddrSpaceStr.empty())
+    if (Error Err = parseAddrSpace(AddrSpaceStr, AddrSpace))
+      return Err; // Failed to parse the remaining characters as a number
+  if (AddrSpace == 0 && (ExternalState || UnstableRepr))
+    return createStringError(
+        "address space 0 cannot be unstable or have external state");
 
   // Size. Required, cannot be zero.
   unsigned BitWidth;
@@ -441,7 +462,7 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
         "index size cannot be larger than the pointer size");
 
   setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
-                 false);
+                 UnstableRepr, ExternalState);
   return Error::success();
 }
 
@@ -617,7 +638,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
     // the spec for AS0, and we then update that to mark it non-integral.
     const PointerSpec &PS = getPointerSpec(AS);
     setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
-                   true);
+                   /*HasUnstableRepr=*/true, /*HasExternalState=*/false);
   }
 
   return Error::success();
@@ -665,17 +686,20 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {
 
 void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
                                 Align ABIAlign, Align PrefAlign,
-                                uint32_t IndexBitWidth, bool IsNonIntegral) {
+                                uint32_t IndexBitWidth, bool HasUnstableRepr,
+                                bool HasExternalState) {
   auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
   if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
     PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
-                                       IndexBitWidth, IsNonIntegral});
+                                       IndexBitWidth, HasUnstableRepr,
+                                       HasExternalState});
   } else {
     I->BitWidth = BitWidth;
     I->ABIAlign = ABIAlign;
     I->PrefAlign = PrefAlign;
     I->IndexBitWidth = IndexBitWidth;
-    I->IsNonIntegral = IsNonIntegral;
+    I->HasUnstableRepresentation = HasUnstableRepr;
+    I->HasExternalState = HasExternalState;
   }
 }
 
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/inttoptr-gep-index-width.ll b/llvm/test/Transforms/InstSimplify/ConstProp/inttoptr-gep-index-width.ll
index 03056e8361e21..864e129a91ec7 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/inttoptr-gep-index-width.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/inttoptr-gep-index-width.ll
@@ -4,9 +4,11 @@
 target datalayout = "p:16:16:16:8"
 
 ; The GEP should only modify the low 8 bits of the pointer.
+;; We need to use finer-grained DataLayout properties for non-integral pointers
+;; FIXME: Should be: ret ptr inttoptr (i16 -256 to ptr)
 define ptr @test() {
 ; CHECK-LABEL: define ptr @test() {
-; CHECK-NEXT:    ret ptr inttoptr (i16 -256 to ptr)
+; CHECK-NEXT:    ret ptr getelementptr (i8, ptr inttoptr (i16 -1 to ptr), i8 1)
 ;
   %base = inttoptr i16 -1 to ptr
   %gep = getelementptr i8, ptr %base, i8 1
diff --git a/llvm/test/Transforms/SimplifyCFG/switch_create-custom-dl.ll b/llvm/test/Transforms/SimplifyCFG/switch_create-custom-dl.ll
index 336fc5e14d758..ddf64591776dd 100644
--- a/llvm/test/Transforms/SimplifyCFG/switch_create-custom-dl.ll
+++ b/llvm/test/Transforms/SimplifyCFG/switch_create-custom-dl.ll
@@ -33,13 +33,14 @@ F:              ; preds = %0
   ret void
 }
 
+; We need to use finer-grained DataLayout properties for non-integral pointers
+; FIXME: Should be using a switch here
 define void @test1_ptr(ptr %V) {
 ; CHECK-LABEL: @test1_ptr(
-; CHECK-NEXT:    [[MAGICPTR:%.*]] = ptrtoint ptr [[V:%.*]] to i40
-; CHECK-NEXT:    switch i40 [[MAGICPTR]], label [[F:%.*]] [
-; CHECK-NEXT:      i40 17, label [[T:%.*]]
-; CHECK-NEXT:      i40 4, label [[T]]
-; CHECK-NEXT:    ]
+; CHECK-NEXT:    [[C1:%.*]] = icmp eq ptr [[V:%.*]], inttoptr (i32 4 to ptr)
+; CHECK-NEXT:    [[C2:%.*]] = icmp eq ptr [[V]], inttoptr (i32 17 to ptr)
+; CHECK-NEXT:    [[CN:%.*]] = or i1 [[C1]], [[C2]]
+; CHECK-NEXT:    br i1 [[CN]], label [[T:%.*]], label [[F:%.*]]
 ; CHECK:       common.ret:
 ; CHECK-NEXT:    ret void
 ; CHECK:       T:
@@ -63,11 +64,10 @@ F:              ; preds = %0
 
 define void @test1_ptr_as1(ptr addrspace(1) %V) {
 ; CHECK-LABEL: @test1_ptr_as1(
-; CHECK-NEXT:    [[MAGICPTR:%.*]] = ptrtoint ptr addrspace(1) [[V:%.*]] to i40
-; CHECK-NEXT:    switch i40 [[MAGICPTR]], label [[F:%.*]] [
-; CHECK-NEXT:      i40 17, label [[T:%.*]]
-; CHECK-NEXT:      i40 4, label [[T]]
-; CHECK-NEXT:    ]
+; CHECK-NEXT:    [[C1:%.*]] = icmp eq ptr addrspace(1) [[V:%.*]], inttoptr (i32 4 to ptr addrspace(1))
+; CHECK-NEXT:    [[C2:%.*]] = icmp eq ptr addrspace(1) [[V]], inttoptr (i32 17 to ptr addrspace(1))
+; CHECK-NEXT:    [[CN:%.*]] = or i1 [[C1]], [[C2]]
+; CHECK-NEXT:    br i1 [[CN]], label [[T:%.*]], label [[F:%.*]]
 ; CHECK:       common.ret:
 ; CHECK-NEXT:    ret void
 ; CHECK:       T:
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index e0c0f35847f07..9ca88141ca0eb 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -320,7 +320,8 @@ TEST(DataLayout, ParsePointerSpec) {
                           "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>]]\""));
 
   // address space
-  for (StringRef Str : {"p0x0:32:32", "px:32:32:32", "p16777216:32:32:32:32"})
+  for (StringRef Str :
+       {"p0x0:32:32", "p10_000:32:32:32", "p16777216:32:32:32:32"})
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage("address space must be a 24-bit integer"));
@@ -401,6 +402,26 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage("index size cannot be larger than the pointer size"));
+
+  // Only 'e', 'u', and 'n' flags are valid.
+  EXPECT_THAT_EXPECTED(
+      DataLayout::parse("pa:32:32"),
+      FailedWithMessage("'a' is not a valid pointer specification flag"));
+  EXPECT_THAT_EXPECTED(
+      DataLayout::parse("puX:32:32"),
+      FailedWithMessage("'X' is not a valid pointer specification flag"));
+  // Flags must be before the address space number.
+  EXPECT_THAT_EXPECTED(
+      DataLayout::parse("p2n:32:32"),
+      FailedWithMessage("address space must be a 24-bit integer"));
+
+  // AS0 cannot be non-integral.
+  for (StringRef Str : {"pe:64:64", "pu:64:64", "pue:64:64", "pe0:64:64",
+                        "pu0:64:64", "peu0:64:64"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage(
+            "address space 0 cannot be unstable or have external state"));
 }
 
 TEST(DataLayoutTest, ParseNativeIntegersSpec) {
@@ -556,18 +577,127 @@ TEST(DataLayout, GetPointerPrefAlignment) {
 }
 
 TEST(DataLayout, IsNonIntegralAddressSpace) {
-  DataLayout Default;
-  EXPECT_THAT(Default.getNonIntegralAddressSpaces(), ::testing::SizeIs(0));
+  const DataLayout Default;
+  EXPECT_THAT(Default.getNonStandardAddressSpaces(), ::testing::SizeIs(0));
   EXPECT_FALSE(Default.isNonIntegralAddressSpace(0));
   EXPECT_FALSE(Default.isNonIntegralAddressSpace(1));
 
-  DataLayout Custom = cantFail(DataLayout::parse("ni:2:16777215"));
-  EXPECT_THAT(Custom.getNonIntegralAddressSpaces(),
+  const DataLayout Custom = cantFail(DataLayout::parse("ni:2:16777215"));
+  EXPECT_THAT(Custom.getNonStandardAddressSpaces(),
               ::testing::ElementsAreArray({2U, 16777215U}));
   EXPECT_FALSE(Custom.isNonIntegralAddressSpace(0));
   EXPECT_FALSE(Custom.isNonIntegralAddressSpace(1));
   EXPECT_TRUE(Custom.isNonIntegralAddressSpace(2));
+  EXPECT_TRUE(Custom.mustNotIntroduceIntToPtr(2));
+  EXPECT_TRUE(Custom.mustNotIntroducePtrToInt(2));
   EXPECT_TRUE(Custom.isNonIntegralAddressSpace(16777215));
+  EXPECT_TRUE(Custom.mustNotIntroduceIntToPtr(16777215));
+  EXPECT_TRUE(Custom.mustNotIntroducePtrToInt(16777215));
+
+  // Pointers are marked as non-integral if the address size != total size
+  for (const auto *Layout : {"p2:64:64:64:32", "p2:128:64:64:64"}) {
+    const DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_FALSE(DL.hasUnstableRepresentation(2));
+    EXPECT_FALSE(DL.hasExternalState(2));
+    EXPECT_FALSE(DL.mustNotIntroduceIntToPtr(2));
+    EXPECT_FALSE(DL.mustNotIntroducePtrToInt(2));
+    EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+                ::testing::ElementsAreArray({2U}));
+  }
+  // Pointers can be marked as unstable using 'pu'
+  for (const auto *Layout : {"pu2:64:64:64:64", "pu2:64:64:64:32"}) {
+    const DataLayout DL = cantFail(DataLayout::parse(Layout));
+    // Note: isNonIntegralAddressSpace is true even if index width == size.
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+    EXPECT_FALSE(DL.hasExternalState(2));
+    EXPECT_TRUE(DL.mustNotIntroducePtrToInt(2));
+    EXPECT_TRUE(DL.mustNotIntroduceIntToPtr(2));
+    EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+                ::testing::ElementsAreArray({2U}));
+  }
+
+  // Non-integral pointers with external state ('e' flag).
+  for (const auto *Layout : {"pe2:64:64:64:32", "pe2:64:64:64:64"}) {
+    const DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasExternalState(2));
+    EXPECT_TRUE(DL.mustNotIntroduceIntToPtr(2));
+    EXPECT_FALSE(DL.mustNotIntroducePtrToInt(2));
+    EXPECT_FALSE(DL.hasUnstableRepresentation(2));
+    EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+                ::testing::ElementsAreArray({2U}));
+  }
+
+  // It is also possible to have both unstable representation and external state
+  for (const auto *Layout : {"peu2:64:64:64:32", "pue2:128:64:64:64"}) {
+    const DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasExternalState(2));
+    EXPECT_TRUE(DL.mustNotIntroduceIntToPtr(2));
+    EXPECT_TRUE(DL.mustNotIntroducePtrToInt(2));
+    EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+    EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+                ::testing::ElementsAreArray({2U}));
+  }
+
+  // For backwards compatibility, the ni DataLayout part overrides any
+  // p[e][u].
+  for (const auto *Layout :
+       {"ni:2-p2:64:64:64:32", "ni:2-pu2:64:64:64:32", "ni:2-pe2:64:64:64:32",
+        "p2:64:64:64:32-ni:2", "pu2:64:64:64:32-ni:2", "pe2:64:64:64:32-ni:2",
+        "peeee2:64:64:64:32-pu2:64:64:64:32-ni:2"}) {
+    DataLayout DL = cantFail(DataLayout::parse(Layout));
+    EXPECT_TRUE(DL.isNonIntegralAddressSpace(2));
+    EXPECT_TRUE(DL.hasUnstableRepresentation(2));
+    // The external state property is new and not expected for existing uses of
+    // non-integral pointers, so existing :ni data layouts should not set it.
+    EXPECT_FALSE(DL.hasExternalState(2));
+    EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+                ::testing::ElementsAreArray({2U}));
+  }
+}
+
+TEST(DataLayout, NonIntegralHelpers) {
+  DataLayout DL = cantFail(DataLayout::parse(
+      "p1:128:128:128:64-pu2:32:32:32:32-pu3:64:64:64:32-pe4:64:64:64:32"));
+  EXPECT_THAT(DL.getNonStandardAddressSpaces(),
+              ::testing::ElementsAreArray({1u, 2u, 3u, 4u}));
+  struct Result {
+    unsigned Addrspace;
+    bool NonIntegral;
+    bool Unstable;
+    bool ExternalState;
+    unsigned Size;
+  } ExpectedResults[] = {
+      {0, false, false, false, 64}, {1, true, false, false, 128},
+      {2, true, true, false, 32},   {3, true, true, false, 64},
+      {4, true, false, true, 64},
+  };
+  LLVMContext Ctx;
+  for (const auto &Exp : ExpectedResults) {
+    EXPECT_EQ(Exp.NonIntegral, DL.isNonIntegralAddressSpace(Exp.Addrspace));
+    EXPECT_EQ(Exp.Unstable, DL.hasUnstableRepresentation(Exp.Addrspace));
+    EXPECT_EQ(Exp.ExternalState, DL.hasExternalState(Exp.Addrspace));
+    bool AvoidIntToPtr = Exp.Unstable || Exp.ExternalState;
+    EXPECT_EQ(AvoidIntToPtr, DL.mustNotIntroduceIntToPtr(Exp.Addrspace));
+    bool AvoidPtrToInt = Exp.Unstable;
+    EXPECT_EQ(AvoidPtrToInt, DL.mustNotIntroducePtrToInt(Exp.Addrspace));
+    Type *PtrTy = PointerType::get(Ctx, Exp.Addrspace);
+    Type *PtrVecTy = VectorType::get(PtrTy, 2, /*Scalable=*/false);
+    Type *ScalablePtrVecTy = VectorType::get(PtrTy, 1, /*Scalable=*/true);
+    for (Type *Ty : {PtrTy, PtrVecTy, ScalablePtrVecTy}) {
+      EXPECT_EQ(AvoidPtrToInt, DL.mustNotIntroducePtrToInt(Ty));
+      EXPECT_EQ(AvoidIntToPtr, DL.mustNotIntroduceIntToPtr(Ty));
+      // The old API should return true for both unstable and non-integral.
+      EXPECT_EQ(Exp.Unstable || Exp.NonIntegral,
+                DL.isNonIntegralPointerType(Ty));
+    }
+    // Both helpers gracefully handle non-pointer, non-vector-of-pointers:
+    EXPECT_FALSE(DL.mustNotIntroducePtrToInt(IntegerType::getInt1Ty(Ctx)));
+    EXPECT_FALSE(DL.mustNotIntroduceIntToPtr(IntegerType::getInt1Ty(Ctx)));
+  }
 }
 
 TEST(DataLayoutTest, CopyAssignmentInvalidatesStructLayout) {
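
For readers skimming the test expectations above, the relationship between the 'p' flags and the helper predicates can be summarized in a small standalone sketch. This is only a model inferred from the assertions in these tests, not the actual llvm::DataLayout implementation: the struct PointerSpecModel and the free functions below are hypothetical names that borrow the spelling of the real member functions for readability. The legacy "ni" specifier is modeled simply as setting the unstable flag, which is what the backwards-compatibility test appears to expect.

```cpp
#include <cassert>

// Hypothetical model of one address space's pointer spec. It mirrors the
// fields that the "p[e][u]<n>:<size>:<abi>:<pref>:<idx>" tests exercise.
// This is NOT the llvm::DataLayout implementation.
struct PointerSpecModel {
  unsigned BitWidth;      // total representation width in bits
  unsigned IndexBitWidth; // index (address arithmetic) width in bits
  bool Unstable;          // 'u' flag, or a legacy "ni" entry
  bool ExternalState;     // 'e' flag
};

// Assumed rule: any flag, or index width != representation width, makes the
// address space non-integral.
constexpr bool isNonIntegral(const PointerSpecModel &S) {
  return S.Unstable || S.ExternalState || S.BitWidth != S.IndexBitWidth;
}

// Assumed rule: only an unstable representation forbids introducing ptrtoint.
constexpr bool mustNotIntroducePtrToInt(const PointerSpecModel &S) {
  return S.Unstable;
}

// Assumed rule: inttoptr is forbidden for unstable pointers and for pointers
// with external state, since an address alone cannot recreate that state.
constexpr bool mustNotIntroduceIntToPtr(const PointerSpecModel &S) {
  return S.Unstable || S.ExternalState;
}

// "p2:64:64:64:32": non-integral, but both casts may still be introduced.
static_assert(isNonIntegral(PointerSpecModel{64, 32, false, false}), "");
static_assert(!mustNotIntroducePtrToInt(PointerSpecModel{64, 32, false, false}),
              "");
static_assert(!mustNotIntroduceIntToPtr(PointerSpecModel{64, 32, false, false}),
              "");
// "pe2:64:64:64:64": external state blocks inttoptr only.
static_assert(mustNotIntroduceIntToPtr(PointerSpecModel{64, 64, false, true}),
              "");
static_assert(!mustNotIntroducePtrToInt(PointerSpecModel{64, 64, false, true}),
              "");
```

Under this model the expectations in NonIntegralHelpers follow directly: AvoidIntToPtr is Unstable || ExternalState and AvoidPtrToInt is Unstable, while a plain pointer with matching index and representation widths stays integral.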