[LLVMdev] Casting between address spaces and address space semantics

Mon Jul 21 08:05:28 PDT 2008

Hi all,

> If I read the standard correctly, the properties of these address spaces can
> be fully captured by defining the relationship between every pair of address
> spaces (disjoint, identical, subset/superset).
> 
> I think it would make sense to make these relationships backend/platform
> specific, but for clang and the optimization passes to properly work with
> address spaces, these relationships should be known. It seems to me that the
> proper way to handle this, is to include this information in the TargetData
> class (or whatever platform-specific class is appropriate, I'm not really into
> this currently).

To have something to actually talk about, I thought I'd create some code. I've
got a patch attached which consists roughly of two parts: it adds the concept
of address space relations to TargetData, and teaches InstructionCombining to
remove casts based on those relations.

The instcombine is pretty straightforward: If a bitcast changes the address
space of a pointer in a way that is redundant, and all of the users of the
bitcasts can be rewritten to use the original pointer as well (currently this
means loads, stores and geps only), this is done.

The TargetData part is a lot more tricky. I originally planned to simply add a
virtual method to TargetData and create a specific subclass for our compiler,
when I discovered that this is not how TargetData is supposed to work.
Currently, TargetData consists only of a TargetDescription, which is basically
only sizes and alignment info for different types.

I solved this by creating a new class AddrspacesDescriptor, which has a single
virtual method that defines the relationship between address spaces. This
class contains a default implementation and can be subclassed to get a
different implementation.

A reference to a AddrspaceDescriptor is passed to TargetData's constructor,
with a default. This means that all of the current code uses the default
implementation. I'm not so sure what would be the correct way to make the
various tool select other configurations, currently.

Also, I'm quite positive that TargetData is not really the right place to put
something like this. At least not in the way it is currently written, passing
around references to AddrspacesDescriptor is far from ideal. However, I do not
think we should be trying to define these relations between address spaces in
a text-based format similar to the current TargetDescription. In that way, we
would either loose a lot of flexibility, or end up with a huge
TargetDescription (with one element for each combination of address
spaces...).

So, any comments on the patch and/or hints on how to do things better? Any
thoughts on how selecting a address spaces configuration should work in the
different tools?

Gr.

Matthijs

Index: lib/Transforms/Scalar/InstructionCombining.cpp
===================================================================

--- lib/Transforms/Scalar/InstructionCombining.cpp	(revision 53716)
+++ lib/Transforms/Scalar/InstructionCombining.cpp	(working copy)
@@ -400,6 +400,8 @@
     unsigned GetOrEnforceKnownAlignment(Value *V,
                                         unsigned PrefAlign = 0);
 
+    /// Propagate address spaces through bitcasts
+    Instruction *PropagateAddressSpace(BitCastInst &CI);
   };
 }
 
@@ -7841,6 +7843,81 @@
   return 0;
 }
 
+/// Try to propagate the address space from the source of the bitcast into all
+/// of its uses. CI casts between two pointer types with different address spaces.
+Instruction *InstCombiner::PropagateAddressSpace(BitCastInst &CI) {
+  // Determine the new type. This is the old dest type with the address space
+  // changed. We can't simply use the source type, since the element type can be
+  // changed in the cast in addition to the address space.
+  const PointerType *DestTy = cast<PointerType>(CI.getDestTy());
+  const PointerType *SrcTy  = cast<PointerType>(CI.getSrcTy());
+  const PointerType *NTy = PointerType::get(DestTy->getElementType(), SrcTy->getAddressSpace());
+    
+  // Only propagate the address space when the source and dest address spaces
+  // are equivalent, or the source address space is a subset of the target
+  // address space.
+  AddrspacesDescriptor::AddrspaceRelation Rel;
+  Rel = TD->getAddrspaceRelation(SrcTy->getAddressSpace(),
+                                 DestTy->getAddressSpace());
+  if (Rel != AddrspacesDescriptor::Equivalent 
+      && Rel != AddrspacesDescriptor::Subset)
+    return 0;
+
+  // We can propagate the address space if we can modify the type of CI. This
+  // requires that any uses of it should be modifiable. For now, we'll just
+  // propagate if all uses are loads and stores.
+  for (User::use_iterator I = CI.use_begin(), E = CI.use_end(); I != E; ++I) {
+    if (isa<StoreInst>(*I)) {
+      if (I.getOperandNo() != StoreInst::getPointerOperandIndex())
+        return 0; // Can only handle store to this pointer
+    } else if (isa<GetElementPtrInst>(*I)) {
+      if (I.getOperandNo() != GetElementPtrInst::getPointerOperandIndex())
+        return 0;
+    } else if (!isa<LoadInst>(*I)) {
+      // Can't handle other instructions
+      return 0;
+    }
+  }
+  // Insert a new bitcast, if needed
+  Value *NewVal = InsertBitCastBefore(CI.getOperand(0), NTy, CI);
+  
+  for (User::use_iterator I = CI.use_begin(), E = CI.use_end(); I != E; ++I) {
+    Instruction *NewInst;
+    Instruction *OldInst = cast<Instruction>(*I);
+    // Declare this here, so we can reuse the dyn_cast
+    GetElementPtrInst *GEP = NULL;
+    // Create a new instruction
+    if (isa<StoreInst>(*I))
+      NewInst = new StoreInst(I->getOperand(0), NewVal);
+    else if ((GEP = dyn_cast<GetElementPtrInst>(*I))) {
+      // Insert a new GEP
+      std::vector<Value*> Indices(GEP->idx_begin(), GEP->idx_end());
+      NewInst = GetElementPtrInst::Create(NewVal, Indices.begin(), Indices.end());
+    } else // Must be LoadInst
+      NewInst = new LoadInst(NewVal);
+    
+    // Replace the old instruction
+    InsertNewInstBefore(NewInst, *OldInst);
+    NewInst->takeName(OldInst);
+    
+    // Put this in a Value*, since InsertBitCastBefore could return a Value*
+    // below.
+    Value *ReplaceWith = NewInst;
+    if (GEP) {
+      // Now insert a bitcast as well. We couldn't do this before, because the
+      // above two lines don't hold for the GEPs
+      const Type* GEPTy = PointerType::get(GEP->getType()->getElementType(), DestTy->getAddressSpace());
+      ReplaceWith = InsertBitCastBefore(NewInst, GEPTy, *GEP);
+    }
+    ReplaceInstUsesWith(*OldInst, ReplaceWith);
+    EraseInstFromFunction(*OldInst);
+
+  }
+  
+  // Return the old instruction, so it can be removed.
+  return &CI;
+}
+
 Instruction *InstCombiner::visitBitCast(BitCastInst &CI) {
   // If the operands are integer typed then apply the integer transforms,
   // otherwise just apply the common ones.
@@ -7873,7 +7950,9 @@
     // If the address spaces don't match, don't eliminate the bitcast, which is
     // required for changing types.
     if (SrcPTy->getAddressSpace() != DstPTy->getAddressSpace())
-      return 0;
+      // If we're casting to address space zero, see if we can't just propagate
+      // the address space into the uses of CI.
+      return PropagateAddressSpace(CI);
     
     // If we are casting a malloc or alloca to a pointer to a type of the same
     // size, rewrite the allocation instruction to allocate the "right" type.
Index: lib/Target/TargetData.cpp
===================================================================
--- lib/Target/TargetData.cpp	(revision 53716)
+++ lib/Target/TargetData.cpp	(working copy)
@@ -231,8 +231,8 @@
   }
 }
 
-TargetData::TargetData(const Module *M) 
-  : ImmutablePass((intptr_t)&ID) {
+TargetData::TargetData(const Module *M, const AddrspacesDescriptor &Addrspaces)
+  : ImmutablePass((intptr_t)&ID), Addrspaces(Addrspaces) {
   init(M->getDataLayout());
 }
 
@@ -604,3 +604,5 @@
 unsigned TargetData::getPreferredAlignmentLog(const GlobalVariable *GV) const {
   return Log2_32(getPreferredAlignment(GV));
 }
+
+const AddrspacesDescriptor TargetData::DefaultAddrspaces;
Index: include/llvm/Target/TargetData.h
===================================================================
--- include/llvm/Target/TargetData.h	(revision 53716)
+++ include/llvm/Target/TargetData.h	(working copy)
@@ -63,12 +63,41 @@
   std::ostream &dump(std::ostream &os) const;
 };
 
+/// This class is used to describe relations between different address spaces.
+/// The default implementation defines all different address spaces as
+/// equivalent.
+///
+/// Create a subclass and pass it to TargetData's constructors to define
+/// different behaviour.
+class AddrspacesDescriptor {
+public:
+  typedef enum {
+    Equivalent,
+    Disjoint,
+    Subset,
+    Superset
+  } AddrspaceRelation;
+
+  /// getAddrspaceRelation - Returns the relation between the given address
+  /// spaces. Should be read left to right, for example Subset means that
+  /// Addrspace1 is a subset of Addrspace2.
+  virtual AddrspaceRelation getAddrspaceRelation(
+    int Addrspace1,
+    int Addrspace2) const {
+      return Equivalent;
+  }
+
+  virtual ~AddrspacesDescriptor() {}
+};
+
 class TargetData : public ImmutablePass {
 private:
   bool          LittleEndian;          ///< Defaults to false
   unsigned char PointerMemSize;        ///< Pointer size in bytes
   unsigned char PointerABIAlign;       ///< Pointer ABI alignment
   unsigned char PointerPrefAlign;      ///< Pointer preferred alignment
+  const AddrspacesDescriptor &Addrspaces; ///< Address spaces descriptor
+  static const AddrspacesDescriptor DefaultAddrspaces; ///< Default address spaces descriptor
 
   //! Where the primitive type alignment data is stored.
   /*!
@@ -109,20 +138,23 @@
   ///
   /// @note This has to exist, because this is a pass, but it should never be
   /// used.
-  TargetData() : ImmutablePass(intptr_t(&ID)) {
+  TargetData() : ImmutablePass(intptr_t(&ID)), Addrspaces(DefaultAddrspaces) {
     assert(0 && "ERROR: Bad TargetData ctor used.  "
            "Tool did not specify a TargetData to use?");
     abort();
   }
     
   /// Constructs a TargetData from a specification string. See init().
-  explicit TargetData(const std::string &TargetDescription) 
-    : ImmutablePass(intptr_t(&ID)) {
+  explicit TargetData(const std::string &TargetDescription,
+                      const AddrspacesDescriptor &Addrspaces
+                      = DefaultAddrspaces)
+    : ImmutablePass(intptr_t(&ID)), Addrspaces(Addrspaces) {
     init(TargetDescription);
   }
 
   /// Initialize target data from properties stored in the module.
-  explicit TargetData(const Module *M);
+  explicit TargetData(const Module *M, const AddrspacesDescriptor &Addrspaces
+                                       = DefaultAddrspaces);
 
   TargetData(const TargetData &TD) : 
     ImmutablePass(intptr_t(&ID)),
@@ -130,6 +162,7 @@
     PointerMemSize(TD.PointerMemSize),
     PointerABIAlign(TD.PointerABIAlign),
     PointerPrefAlign(TD.PointerPrefAlign),
+    Addrspaces(TD.Addrspaces),
     Alignments(TD.Alignments)
   { }
 
@@ -240,6 +273,15 @@
   /// requested alignment (if the global has one).
   unsigned getPreferredAlignmentLog(const GlobalVariable *GV) const;
 
+  /// getAddrspaceRelation - Returns the relation between the given address
+  /// spaces. Should be read left to right, for example Subset means that
+  /// Addrspace1 is a subset of Addrspace2.
+  AddrspacesDescriptor::AddrspaceRelation getAddrspaceRelation(int Addrspace1,
+                                                               int Addrspace2) {
+                            
+    return Addrspaces.getAddrspaceRelation(Addrspace1, Addrspace2);
+  }
+
   static char ID; // Pass identification, replacement for typeid
 };
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080721/e5fb5120/attachment.sig>