[cfe-dev] AST XML dump

Tue May 19 10:09:22 PDT 2009

On May 19, 2009, at 5:14 AM, Olaf Krzikalla wrote:

> Hi @clang-community,
>
> so now the first version is ready for commiting. I've attached the  
> appropriate patch.
> It would be nice if someone could review it in the very near future  
> and comment on it.
> So far I'm going to keep the direction of the current work.

Thanks for working on this. I have a couple comments/questions:

1) Why do we need to build an in-memory representation of the XML  
document just to print it? The in-memory representation itself isn't  
likely to be useful for much, since we're not providing any way of  
directly manipulating the in-memory representation. Would it be  
possible, instead, to stream the XML representation to disk directly,  
rather than building it all in memory? Doing so would save a lot of  
memory and should improve performance, since we won't be doing so much  
string copying and manipulation. Also in the same vein: there are  
quite a few std::map's to strings. These should probably be  
llvm::DenseMaps, and is it possible to map to something more efficient  
than a std::string?

2) I see that you've added a dumpXML routine into Stmt. Is there some  
benefit to having this as a method on Stmt, or can we just separate  
the XML-dumping functionality completely, putting it all into an AST  
consumer?

3) I feel that it is very important that we have a proper XML schema  
before we claim to generate good XML. It's also important for testing:  
we'd like to be able to, e.g., generate XML from some translation unit  
and then verify that the XML meets the known schema.

4) I can't build with this patch because of some errors (mentioned  
below). Could you post some sample input/output so we can get a feel  
for the structure?

More detailed comments follow:

+//---------------------------------------------------------
+AttributeXML::AttributeXML(const char* pName, unsigned value) :
+  Name(pName)
+{
+  char buffer[32];
+  Value = _ultoa(value, buffer, 10);
+}

include/llvm/ADT/StringExtras.h has some integer-to-string facilities.  
I suggest using those instead of _ultoa et al.

+//---------------------------------------------------------
+PresumedLoc DocumentXML::addLocation(const SourceLocation& Loc)
+{
+  SourceManager& SM = Ctx->getSourceManager();
+  SourceLocation SpellingLoc = SM.getSpellingLoc(Loc);
+  PresumedLoc PLoc;
+  if (!SpellingLoc.isInvalid())
+  {
+    PLoc = SM.getPresumedLoc(SpellingLoc);
+    addSourceFileAttribute(PLoc.getFilename());
+    addAttribute("line", PLoc.getLine());
+    addAttribute("col", PLoc.getColumn());
+  }
+  // else there is no error in some cases (eg. CXXThisExpr)
+  return PLoc;
+}

Do we need to think about source locations that point into macro  
instantiations here?

+#ifndef mode_t
+typedef unsigned short mode_t;
+#endif

Why do we need this?

+//---------------------------------------------------------
+template<class T, class U, class V>
+bool addToMap(std::map<T, std::string, U>& idMap, const V& value,  
tIdType idType = ID_NORMAL)
+{
+  std::map<T, std::string, U>::iterator i = idMap.find(value);
+  bool toAdd = i == idMap.end();
+  if (toAdd)
+  {
+    idMap.insert(std::map<T, std::string, U>::value_type(value,  
getNewId(idType)));
+  }
+  return toAdd;
+}

You need "typename" before std::map<T, std::string, U>::iterator and  
std::map<T, std::string, U>::value_type.

+  for (std::map<const Type*, std::string>::const_iterator i =  
BasicTypes.begin(), e = BasicTypes.end(); i != e; ++i)
+  {
+    // don't use the get methods as they strip of typedef infos
+    if (const BuiltinType *BT = dyn_cast<BuiltinType>(i->first)) {
+      addSubNode("FundamentalType");
+      addAttribute("name", BT->getName());
+    }
+    else if (const PointerType *PT = dyn_cast<PointerType>(i->first)) {
+      addSubNode("PointerType");
+      addTypeAttribute(PT->getPointeeType());
+    }

It might be better to use the macro invocations in include/clang/AST/ 
TypeNodes.def to enumerate and switch on the various kinds of type  
nodes. It's generally cleaner, and protects this code better against  
changes in the type hierarchy.

	- Doug