[cfe-dev] libclang interface for comments AST

Tue Jul 17 09:54:56 PDT 2012

On Jul 16, 2012, at 12:48 PM, Dmitri Gribenko wrote:

> Hello,
> 
> I am designing a libclang interface to expose comments ASTs.  Attached
> is a patch for Index.h.
> 
> The API has node traversal APIs and APIs to get various properties of the nodes.
> 
> I am not sure about error handling.  For example,
> clang_InlineCommandComment_getCommandName is not applicable to a
> CXComment_Text AST node.  I think that this should abort() right away,
> but I don't see any other libclang APIs doing that -- they try to
> return some bogus value instead.

Yeah, we should return a safe-but-meaningless value from such mistakes. Clients of libclang aren't necessarily as rigorously tested as Clang itself, so flushing out bugs via assert()s or abort()s, the way we do in the LLVM code base, doesn't work well in practice.

> I just want to get some general feedback if this is the right
> direction and if there is something obviously missing.

 /**
+ * \brief Given a cursor that represents a declaration, return the associated
+ * \c CXComment_FullComment AST node for the whole comment.
+ */
+CINDEX_LINKAGE CXComment clang_Cursor_getFullComment(CXCursor C);
+
+/**
  * @}
  */

Presumably, this should eventually work for macro definitions as well.

I think the name "full" comment is not specific enough. How about calling this a "parsed" comment, because that's what is actually returned? (And it's a good counter to the "raw" comment).

+CINDEX_LINKAGE
+CXString clang_InlineCommandComment_getCommandName(CXComment Comment);
<snip>
+CINDEX_LINKAGE
+CXString clang_BlockCommandComment_getCommandName(CXComment Comment);

Why have both of these, rather than simply

CINDEX_LINKAGE
CXString clang_Comment_getCommandName(CXComment Comment);

?

Similar comment for clang_*Comment_getNumArgs/clang_*Comment_getArgText

Also, there are a number of clang_*Comment_getText() calls. Perhaps there should just be a single clang_Comment_getText() that works for the various comment kinds? The main outlier is

+/**
+ * \param Comment a \c CXComment_VerbatimBlockCommand AST node.
+ *
+ * \param LineIdx verbatim text line index (zero-based).
+ *
+ * \returns text contained in the specified line.
+ */
+CINDEX_LINKAGE
+CXString clang_VerbatimBlockComment_getText(CXComment Comment,
+                                            unsigned LineIdx);

and I'm wondering why one doesn't simply get the CXComment_VerbatimBlockLine via clang_Comment_getNumChildren/clang_Comment_getChild? 

This group of functions:

+CINDEX_LINKAGE CXString clang_HTMLTagComment_getTagName(CXComment Comment);
+CINDEX_LINKAGE
+unsigned clang_HTMLStartTagComment_isSelfClosing(CXComment Comment);
+CINDEX_LINKAGE unsigned clang_HTMLStartTag_getNumAttrs(CXComment Comment);
+CINDEX_LINKAGE
+CXString clang_HTMLStartTag_getAttrName(CXComment Comment, unsigned AttrIdx);
+CINDEX_LINKAGE
+CXString clang_HTMLStartTag_getAttrValue(CXComment Comment, unsigned AttrIdx);

implies that a client will have to reconstruct HTML tags on its own, which seems like it's going to cause a lot of redundant (and probably wrong) code in clients. Can the aforementioned clang_Comment_getText() do the right thing for CXComment_HTMLStartTag and CXComment_HTMLEndTag nodes, by formatting the HTML tag appropriately? That would simplify a number of clients.

	- Doug