[cfe-dev] libclang interface for comments AST

Tue Jul 17 10:19:49 PDT 2012

Hi Doug!

Thank you for looking at this!

On Tue, Jul 17, 2012 at 9:54 AM, Douglas Gregor <dgregor at apple.com> wrote:
> On Jul 16, 2012, at 12:48 PM, Dmitri Gribenko wrote:
>> I am not sure about error handling.  For example,
>> clang_InlineCommandComment_getCommandName is not applicable to a
>> CXComment_Text AST node.  I think that this should abort() right away,
>> but I don't see any other libclang APIs doing that -- they try to
>> return some bogus value instead.
>
> Yeah, we should return a safe-but-meaningless value from such mistakes. Clients of libclang aren't necessarily as rigorously tested as Clang itself, so flushing out bugs via assert()s or abort()s, the way we do in the LLVM code base, doesn't work well in practice.

OK.

>  /**
> + * \brief Given a cursor that represents a declaration, return the associated
> + * \c CXComment_FullComment AST node for the whole comment.
> + */
> +CINDEX_LINKAGE CXComment clang_Cursor_getFullComment(CXCursor C);
> +
> +/**
>   * @}
>   */
>
> Presumably, this should eventually work for macro definitions as well.

Changed comment to:

/**
 * \brief Given a cursor that represents a documentable entity (e.g.,
 * declaration), return the associated parsed comment as a
 * \c CXComment_FullComment AST node.
 */

> I think the name "full" comment is not specific enough. How about calling this a "parsed" comment, because that's what is actually returned? (And it's a good counter to the "raw" comment).

Renamed to clang_Cursor_getParsedComment.

> +CINDEX_LINKAGE
> +CXString clang_InlineCommandComment_getCommandName(CXComment Comment);
> <snip>
> +CINDEX_LINKAGE
> +CXString clang_BlockCommandComment_getCommandName(CXComment Comment);
>
> Why have both of these, rather than simply
>
> CINDEX_LINKAGE
> CXString clang_Comment_getCommandName(CXComment Comment);
>
> ?
>
> Similar comment for clang_*Comment_getNumArgs/clang_*Comment_getArgText

So that this function would internally determine the type of the AST
node and "just do the right thing"?  This, combined with ignoring AST
node type mismatches and making up "safe" values, might create some
issues with understanding the API and lead to API misuse.

> Also, there are a number of clang_*Comment_getText() calls.

For example, someone might apply this clang_Comment_getText() call to
a FullComment and expect it to return a comment represented as plain
text (somehow, with or without HTML and Doxygen markup, word-wrapped
to fit 80 cols or not -- expectations can be very different).  But,
unless we implement this AST traversal, we will just return an empty
value because we will call getText() only for AST nodes that have
getText() call themselves.

Or implementing getText() to work universally is the way to go?

> Perhaps there should just be a single clang_Comment_getText() that works for the various comment kinds? The main outlier is
>
> +/**
> + * \param Comment a \c CXComment_VerbatimBlockCommand AST node.
> + *
> + * \param LineIdx verbatim text line index (zero-based).
> + *
> + * \returns text contained in the specified line.
> + */
> +CINDEX_LINKAGE
> +CXString clang_VerbatimBlockComment_getText(CXComment Comment,
> +                                            unsigned LineIdx);
>
> and I'm wondering why one doesn't simply get the CXComment_VerbatimBlockLine via clang_Comment_getNumChildren/clang_Comment_getChild?

It is just a convenience function.  It can be easily removed.

> This group of functions:
>
> +CINDEX_LINKAGE CXString clang_HTMLTagComment_getTagName(CXComment Comment);
> +CINDEX_LINKAGE
> +unsigned clang_HTMLStartTagComment_isSelfClosing(CXComment Comment);
> +CINDEX_LINKAGE unsigned clang_HTMLStartTag_getNumAttrs(CXComment Comment);
> +CINDEX_LINKAGE
> +CXString clang_HTMLStartTag_getAttrName(CXComment Comment, unsigned AttrIdx);
> +CINDEX_LINKAGE
> +CXString clang_HTMLStartTag_getAttrValue(CXComment Comment, unsigned AttrIdx);
>
> implies that a client will have to reconstruct HTML tags on its own, which seems like it's going to cause a lot of redundant (and probably wrong) code in clients. Can the aforementioned clang_Comment_getText() do the right thing for CXComment_HTMLStartTag and CXComment_HTMLEndTag nodes, by formatting the HTML tag appropriately? That would simplify a number of clients.

It surely makes sense to have such API, but see my concerns about
general getText() API.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/