[llvm-commits] metadata.h/cpp review stuff

Chris Lattner clattner at apple.com
Mon Oct 19 00:06:31 PDT 2009


Hi Devang,

I'm just now getting caught up on your work.  Here are some thoughts/ 
questions on Metadata.h/cpp:


The metadata hierarchy is:

User
   -> MetadataBase
      -> MDString
      -> MDNode
      -> NamedMDNode

Why not:

Value
   -> MetadataBase
      -> MDString
      -> MDNode
      -> NamedMDNode

This would shrink them significantly and reduce the games you have to  
play with hiding getNumOperands, overloading operator new, etc.  Also,  
MetadataBase doesn't have much common state, it isn't clear it needs  
to exist at all.

There is some seriously confused code circling around  
MetadataBase::resizeOperands.  It looks like half the code thinks that  
operands can happen, but half doesn't.  Please remove all this stuff.


In Metadata.h:


#include "llvm/User.h"
#include "llvm/Type.h"
#include "llvm/OperandTraits.h"
#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/ilist_node.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ValueHandle.h"

This is including way too much.  Please reduce this as aggressively as  
possible.



class MDString : public MetadataBase {
...
   explicit MDString(LLVMContext &C, const char *begin, unsigned l)

This ctor should just take a StringRef.


...
   static MDString *get(LLVMContext &Context, const StringRef &Str);

Please take StringRef's by value, not by const&.


class MDNode : public MetadataBase, public FoldingSetNode {
...
   friend class ElementVH;
   // Use CallbackVH to hold MDNOde elements.
   struct ElementVH : public CallbackVH {
     MDNode *Parent;
     ElementVH(Value *V, MDNode *P) : CallbackVH(V), Parent(P) {}
     ~ElementVH() {}

     virtual void deleted() {
       Parent->replaceElement(this->operator Value*(), 0);
     }

     virtual void allUsesReplacedWith(Value *NV) {
       Parent->replaceElement(this->operator Value*(), NV);
     }
   };
   // Replace each instance of F from the element list of this node  
with T.
   void replaceElement(Value *F, Value *T);

   SmallVector<ElementVH, 4> Node;


Please move all of this stuff out of line to Metadata.cpp.  If needbe,  
Node should be declared as a 'void*' in the header.  Also, there is no  
reason to use a SmallVector here: the size is fixed on node  
construction.  Just use a pointer to array and new[]/delete[] the  
array.  Keeping a SmallVector in NamedMDNode (but not MDNode) is  
completely fine.


Why does MDNode::dropAllReferences need to exist?  It seems *highly*  
unsafe because it clears 'Node' but doesn't update the FoldingSet.


   Value *getElement(unsigned i) const {
     assert(getNumElements() > i && "Invalid element number!");

This is fine, but please use "i < getNumElements()" because that is  
more idiomatic.  This happens multiple times in the file.


/// WeakMetadataVH - a weak value handle for metadata.
class WeakMetadataVH : public WeakVH {
public:

I think this should be removed and the use replaced with TrackingVH??



NamedMDNode::setParent shouldn't be public.  Does  
NamedMDNode::dropAllReferences need to exist?

I don't understand NamedMDNode::Create(const NamedMDNode *NMD, Module  
*M).  Is this a weird copy ctor?  Why does it need to exist?


   void addElement(MetadataBase *M) {
     resizeOperands(0);
     OperandList[NumOperands++] = M;
     Node.push_back(WeakMetadataVH(M));
   }

How can this work?  Isn't this an invalid access into OperandList?


/// MetadataContext -
/// MetadataContext handles uniquing and assignment of IDs for custom  
metadata
/// types. Custom metadata handler names do not contain spaces. And  
the name
/// must start with an alphabet. The regular expression used to check  
name

"start with an alphabet" is missing something.

class MetadataContext {
public:
   typedef std::pair<unsigned, WeakVH> MDPairTy;
   typedef SmallVector<MDPairTy, 2> MDMapTy;
   typedef DenseMap<const Instruction *, MDMapTy> MDStoreTy;
   friend class BitcodeReader;
private:

   /// MetadataStore - Collection of metadata used in this context.
   MDStoreTy MetadataStore;

   /// MDHandlerNames - Map to hold metadata handler names.
   StringMap<unsigned> MDHandlerNames;


These datastructures are all implementation details.  They should be  
hidden behind a void*, allowing you to avoid #include of DenseMap.h,  
StringMap.h etc.  MDPairTy should be replaced with TrackingVH<MDNode>.



   /// RegisterMDKind - Register a new metadata kind and return its ID.
   /// A metadata kind can be registered only once.
   unsigned RegisterMDKind(const char *Name);

   /// getMDKind - Return metadata kind. If the requested metadata kind
   /// is not registered then return 0.
   unsigned getMDKind(const char *Name);

   /// validName - Return true if Name is a valid custom metadata  
handler name.
   bool validName(const char *Name);

These methods should all take StringRef for consistency.  The second  
should be const, the third should be static.  validName ->  
isValidName.  RegisterMDKind -> registerMDKind.


   /// getMD - Get the metadata of given kind attached with an  
Instruction.
   /// If the metadata is not found then return 0.
   MDNode *getMD(unsigned Kind, const Instruction *Inst);

"with an instruction" -> "to an instruction".   This occurs several  
times in both files.

   /// removeMDs - Remove all metadata attached with an instruction.
   void removeMDs(const Instruction *Inst);

removeAllMetadata()?



   /// getHandlerNames - Get handler names. This is used by bitcode
   /// writer.
   const StringMap<unsigned> *getHandlerNames();

This is gross for several reasons: at the least, it should be const.   
It also exposes implementation details.  Finally, the bitcode writer  
is currently writing these out in iterator order which is not at all  
deterministic.  It would be better to have something like:

   void getHandlerNames(SmallVectorImpl<std::pair<unsigned, StringRef>  
 > &) const;

and have the implementation of getHandlerNames do the sort (with  
array_pod_sort).




   /// ValueIsDeleted - This handler is used to update metadata store
   /// when a value is deleted.
   void ValueIsDeleted(const Value *) {}
   void ValueIsDeleted(const Instruction *Inst) {
     removeMDs(Inst);
   }
   void ValueIsRAUWd(Value *V1, Value *V2);

   /// ValueIsCloned - This handler is used to update metadata store
   /// when In1 is cloned to create In2.
   void ValueIsCloned(const Instruction *In1, Instruction *In2);

These should not be public and/or should not be in the MetadataContext  
class.


Metadata.cpp:

void MDNode::replaceElement(Value *From, Value *To) {
...

You don't know that MDNode::replaceElement will be called before or  
after the VH operands uses are updated.  I don't think this method in  
general is safe because a bunch of stuff depends on pointer equality  
with the "from" value.


   // MDNode only lists metadata elements in operand list, because  
MDNode
   // used by MDNode is considered a valid use. However on the side,  
MDNode
   // using a non-metadata value is not considered a "use" of non- 
metadata
   // value.

Why do MD uses try to show up in the operand list?  This seems really  
complex, what benefit does it provide?

   if (MetadataBase *MDTo = dyn_cast_or_null<MetadataBase>(To)) {
     for (SmallVector<unsigned, 4>::iterator OI = OpIndexes.begin(),
            OE = OpIndexes.end(); OI != OE; ++OI)
       setOperand(*OI, MDTo);
   } else {
     for (SmallVector<unsigned, 4>::iterator OI = OpIndexes.begin(),
            OE = OpIndexes.end(); OI != OE; ++OI)
       setOperand(*OI, 0);
   }

This doesn't make sense to me.  If one MD operand is replaced, it sets  
them all to the new thing?

NamedMDNode::NamedMDNode(LLVMContext &C, const Twine &N,
...
   resizeOperands(NumMDs);

   for (unsigned i = 0; i != NumMDs; ++i) {
     if (MDs[i])
       OperandList[NumOperands++] = MDs[i];
     Node.push_back(WeakMetadataVH(MDs[i]));
   }

This is wrong unless resizeOperands does something really strange  
(likely).  Why does NamedMDNode put anything in the 'User' operands  
list at all?  I think this should just be removed.


unsigned MetadataContext::RegisterMDKind(const char *Name) {
..
   assert(MDHandlerNames.find(Name) == MDHandlerNames.end()
          && "Already registered MDKind!");

Just use: assert(MDHandlerNames.count(Name) == 0 && ...



   MDHandlerNames[Name] = Count + 1;
   return Count + 1;

How about: return MDHandlerNames[Name] = Count + 1;


unsigned MetadataContext::getMDKind(const char *Name) {
   assert(validName(Name) && "Invalid custome metadata name!");
   StringMap<unsigned>::iterator I = MDHandlerNames.find(Name);

Random microoptimization for assert builds: the check for a valid name  
can be sunk into the case where the query fails.  If the query  
succeeds (common case) the name was already validated.


void MetadataContext::addMD(unsigned MDKind, MDNode *Node, Instruction  
*Inst) {
...
   MDStoreTy::iterator I = MetadataStore.find(Inst);
   if (I == MetadataStore.end()) {
     MDMapTy Info;
     Info.push_back(std::make_pair(MDKind, Node));
     MetadataStore.insert(std::make_pair(Inst, Info));
     return;
   }

It is actually better to just rely on default construction in the  
map.  Just use:
   MDMapTy &Info = MetadataStore[Inst];
and delete the code above.


/// removeMD - Remove metadata of given kind attached with an  
instuction.
void MetadataContext::removeMD(unsigned Kind, Instruction *Inst) {
   MDStoreTy::iterator I = MetadataStore.find(Inst);
   if (I == MetadataStore.end())
     return;
   MDMapTy &Info = I->second;

This seems wrong.  The method should only be called if the  
'HasMetadata' bit is set on the instruction.  If it is set and there  
is no entry in MetaDataStore, then it is a logic bug (so it should  
assert).  Just do:

MDMapTy &Info = MetadataStore[Inst];
assert(!Info.empty() && ..

or whatever.



/// removeMDs - Remove all metadata attached with an instruction.
void MetadataContext::removeMDs(const Instruction *Inst) {
   // Find Metadata handles for this instruction.
   MDStoreTy::iterator I = MetadataStore.find(Inst);
   assert(I != MetadataStore.end() && "Invalid custom metadata info!");
   MDMapTy &Info = I->second;

   // FIXME : Give all metadata handlers a chance to adjust.

   // Remove the entries for this instruction.
   Info.clear();
   MetadataStore.erase(I);
}


This doesn't look fully implemented.   In addition to the FIXME (What  
does it mean?) this should clear the  HasMetadata bit on the  
instruction.  Also, don't use find here.  This entire method can  
currently be replaced with this, which is identical in behavior:

void MetadataContext::removeMDs(const Instruction *Inst) {
   MetadataStore.erase(Inst);
}


void MetadataContext::copyMD(Instruction *In1, Instruction *In2) {
   assert(In1 && In2 && "Invalid instruction!");
    MDStoreTy::iterator I = MetadataStore.find(In1);
   if (I == MetadataStore.end())
     return;

   MDMapTy &In1Info = I->second;
   MDMapTy In2Info;
   for (MDMapTy::iterator I = In1Info.begin(), E = In1Info.end(); I !=  
E; ++I)
     if (MDNode *MD = dyn_cast_or_null<MDNode>(I->second))
       addMD(I->first, MD, In2);


Similar comments to above.  Also, In2Info is dead here.   
dyn_cast_or_null should go away when you start using trackingvh  
instead of weakvh (not just here, check the rest of the file).



MDNode *MetadataContext::getMD(unsigned MDKind, const Instruction  
*Inst) {
   MDStoreTy::iterator I = MetadataStore.find(Inst);
   if (I == MetadataStore.end())
     return NULL;

Should be an assert as above.



const MetadataContext::MDMapTy *
MetadataContext::getMDs(const Instruction *Inst) {

I really don't like this method.  Exposing implementation details like  
this is bad.  IF this is for the bc writer, please follow something  
similar to the getHandlerNames approach above.

  void MetadataContext::ValueIsCloned(const Instruction *In1,  
Instruction *In2) {
   // Find Metadata handles for In1.
   MDStoreTy::iterator I = MetadataStore.find(In1);
   assert(I != MetadataStore.end() && "Invalid custom metadata info!");

   // FIXME : Give all metadata handlers a chance to adjust.

   MDMapTy &In1Info = I->second;
   MDMapTy In2Info;
   for (MDMapTy::iterator I = In1Info.begin(), E = In1Info.end(); I !=  
E; ++I)
     if (MDNode *MD = dyn_cast_or_null<MDNode>(I->second))
       addMD(I->first, MD, In2);
}

How is this different than copyMD?

/// ValueIsRAUWd - This handler is used when V1's all uses are  
replaced by
/// V2.
void MetadataContext::ValueIsRAUWd(Value *V1, Value *V2) {
   Instruction *I1 = dyn_cast<Instruction>(V1);
   Instruction *I2 = dyn_cast<Instruction>(V2);
   if (!I1 || !I2)

This seems very very wrong.  If I do:

X->RAUW(Y);

Then the properties on X are not necessarily true in Y's scope.  The  
debug info (for example) should just be discarded.  Value range info  
is another example: X may be defined lower in the dom tree where some  
property holds that doesn't hold on Y.

-Chris


















More information about the llvm-commits mailing list