[cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Mon Aug 6 23:04:10 PDT 2012

On Aug 6, 2012, at 4:22 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> On Mon, Aug 6, 2012 at 8:13 AM, Douglas Gregor <dgregor at apple.com> wrote:
>> On Aug 3, 2012, at 1:34 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>> Index: test/Index/comment-xml-schema.c
>> ===================================================================
>> --- test/Index/comment-xml-schema.c     (revision 0)
>> +++ test/Index/comment-xml-schema.c     (revision 0)
>> @@ -0,0 +1,35 @@
>> +// RUN: xmllint --noout --relaxng %S/Inputs/CommentXML/comment-xml-schema.rng %S/Inputs/CommentXML/valid-function-01.xml
>> 
>> This is going to fail if 'xmllint' is unavailable or simply not in the path. I think you'll need to extend lit with some knowledge of whether 'xmllint' is available (and perhaps its path?), so that we only run this set of tests when xmllint is available (via "REQUIRES: xmllint").
> 
> Seems to be doable with a small change to lit.cfg for clang tests:
> 
> Index: test/lit.cfg
> ===================================================================
> --- test/lit.cfg        (revision 161322)
> +++ test/lit.cfg        (working copy)
> @@ -259,3 +259,7 @@
> 
> if llc_props['enable_assertions']:
>     config.available_features.add('asserts')
> +
> +if lit.util.which('xmllint'):
> +    config.available_features.add('xmllint')
> +

Great!

>> I think the comment XML format looks great; it's a good balance between simplicity for the client and capability for rendering documentation comments. Some nitpicks follow:
>> 
>> +  <start>
>> +    <choice>
>> +      <ref name="Function" />
>> +      <ref name="Class" />
>> +      <ref name="Variable" />
>> +      <ref name="Namespace" />
>> +      <ref name="Typedef" />
>> +    </choice>
>> +  </start>
>> 
>> I understand that the set of choices is intentionally limited, to simplify the output, but it looks like we're missing some necessary categories: enumeration types, (Objective-C) protocols, (Objective-C) categories and (Objective-C) properties, at a minimum. Should there still be an "other" category here?
> 
> Yes, I will add 'Other' for now, but at least 'Enum' should be added
> separately.  I think Protocols and categories can be handled as
> 'Class'es, properties -- as 'Variables'.  Of course, unless we want to
> store additional information for these.

Hrm, okay. We may need to separate out protocols later, because they are in a different namespace than (and, this this XML format, indistinguishable from) Objective-C classes.

>> I assume that parameters (both template and function) will always be documented as part of their class/function/etc?
> 
> Yes, since \param and \tparam are in the same FullComment.

So, how does a client deal with this? Do they have to look at the cursor kind to decide whether to ask for a comment from elsewhere? Can we make it simpler?

>> Similarly, enumerators will be documented as part of their enumeration type?
> 
> No, since enums are usually documented this way:
> /// Blah
> enum A {
>   A1, ///< Aaa
>   A2, ///< Bbb
> };
> So comments for enumerators end up in separate FullComments.

Will these be classified as "variables", then, for the XML representation?

>> What should happen if one asks for the XML representation of, e.g., a parameter's comment?
> 
> For ParmVarDecl, ASTContext::getRawCommentForDeclNoCache() will fail
> even to find a RawComment.

Okay. I guess that ties in with the question of "how should the client handle this?"

>> I assume that "Function" elements need something for "throws" or "exceptions", but that's not at all important now.
> 
> Maybe we will pretty-print it as a part of declaration?  Added a TODO
> note for now.

It turns out that exceptions documented in Doxygen can have actual documentation paragraphs associated with them, so those won't be pretty-printed as part of the declaration.

>> +  <define name="Class">
>> +    <element name="Class">
>> +      <optional>
>> +        <attribute name="isTemplate">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> +      <optional>
>> +        <attribute name="isTemplatePartialSpecialization">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> +      <optional>
>> +        <attribute name="isTemplateSpecialization">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> 
>> isTemplatePartialSpecialization == isTemplate && isTemplateSpecialization?
> 
> Even more, I would say that it is a bit confusing.  A set of boolean
> flags is usually not a good choice.  I changed that part of schema:
> 
>  <define name="Class">
>    <element name="Class">
>      <optional>
>        <attribute name="templateKind">
>          <choice>
>            <value>template</value>
>            <value>specialization</value>
>            <value>partialSpecialization</value>
>          </choice>
>        </attribute>
>      </optional>

Yes, I like this much better!

>> +  <define name="Name">
>> +    <element name="Name">
>> +      <!-- Non-empty text content. -->
>> +      <data type="string">
>> +        <param name="pattern">.*\S.*</param>
>> +      </data>
>> +    </element>
>> +  </define>
>> 
>> Presumably, the name is the unqualified name . One could imagine that it would be useful to have (in separate elements!) the qualifier of the name (e.g., the "std::" in "std::sort") and the template arguments of a specialization, so clients could render the full name if they want it. But, that's just grave.
> 
> Could we leave this to a follow-up?

Yes, absolutely.

>> More importantly,  it would be very useful to have the USR as a separate (optional) element, because USRs allow indexers based on libclang to identify entities across translation units.
> 
> Done, added <USR>...</USR> after <Name>...</Name>.

Thanks!

>> Similarly, for each kind of entity, could we add "location" information, e.g., the file, line, and column at which the declaration occurs?
> 
> Added "line" and "column" attributes to the root tag.

Could we get the full pathname as well?

> For example, for
>>>> 
> /// Aaa.
> int aaa;
> <<<
> we generate:
> 
> <Variable line="2"
> column="5"><Name>aaa</Name><USR>c:@aaa</USR><Abstract><Para>
> Aaa.</Para></Abstract></Variable>
> 
>> +void CommentASTToXMLConverter::visitFullComment(const FullComment *C) {
>> +  FullCommentParts Parts(C);
>> +
>> +  const DeclInfo *DI = C->getDeclInfo();
>> // …
>> +    bool FoundName = false;
>> +    if (const NamedDecl *ND = dyn_cast<NamedDecl>(DI->ThisDecl)) {
>> +      if (DeclarationName DeclName = ND->getDeclName()) {
>> +        Result << "<Name>";
>> +        std::string Name = DeclName.getAsString();
>> +        appendToResultWithXMLEscaping(Name);
>> +        FoundName = true;
>> +        Result << "</Name>";
>> +      }
>> 
>> Two comments here… first, this won't work for something like
>> 
>>  typedef struct { … } Foo;
>> 
>> where we almost surely want to use the name 'Foo'.
> 
> If we bound the comment to the struct decl, we have the RecordDecl*
> and I don't see how to get a TypedefDecl* from there.  To support
> this, we should be binding the comment to both.

Ah, I understand.

>> Second, for an Objective-C category, e.g.,
>> 
>> @interface NSString (NSCPlusPlusParser)
>> @endif
>> 
>> we probably want the name to be the full "NSString (NSCPlusPlusParser)".
> 
> Can we leave this to a follow-up?

Yes, of course.

>> +    if (!FoundName)
>> +      Result << "<Name>unknown</Name>";
>> 
>> How about "<anonymous>" or simply the empty string "" rather than "unknown"? Anonymous entities aren't that rare in C/C++.
> 
> Done.

Thanks, patch LGTM!

	- Doug