[cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Mon Aug 6 16:22:25 PDT 2012

On Mon, Aug 6, 2012 at 8:13 AM, Douglas Gregor <dgregor at apple.com> wrote:
> On Aug 3, 2012, at 1:34 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>> I am not sure that test/Index/Inputs/CommentXML is the best place to
>> put the XML schema.  Any suggestions?
>
> bindings/xml, perhaps?

OK, done.

> Index: test/Index/comment-xml-schema.c
> ===================================================================
> --- test/Index/comment-xml-schema.c     (revision 0)
> +++ test/Index/comment-xml-schema.c     (revision 0)
> @@ -0,0 +1,35 @@
> +// RUN: xmllint --noout --relaxng %S/Inputs/CommentXML/comment-xml-schema.rng %S/Inputs/CommentXML/valid-function-01.xml
>
> This is going to fail if 'xmllint' is unavailable or simply not in the path. I think you'll need to extend lit with some knowledge of whether 'xmllint' is available (and perhaps its path?), so that we only run this set of tests when xmllint is available (via "REQUIRES: xmllint").

Seems to be doable with a small change to lit.cfg for clang tests:

Index: test/lit.cfg
===================================================================

--- test/lit.cfg        (revision 161322)
+++ test/lit.cfg        (working copy)
@@ -259,3 +259,7 @@

 if llc_props['enable_assertions']:
     config.available_features.add('asserts')
+
+if lit.util.which('xmllint'):
+    config.available_features.add('xmllint')
+

> I think the comment XML format looks great; it's a good balance between simplicity for the client and capability for rendering documentation comments. Some nitpicks follow:
>
> +  <start>
> +    <choice>
> +      <ref name="Function" />
> +      <ref name="Class" />
> +      <ref name="Variable" />
> +      <ref name="Namespace" />
> +      <ref name="Typedef" />
> +    </choice>
> +  </start>
>
> I understand that the set of choices is intentionally limited, to simplify the output, but it looks like we're missing some necessary categories: enumeration types, (Objective-C) protocols, (Objective-C) categories and (Objective-C) properties, at a minimum. Should there still be an "other" category here?

Yes, I will add 'Other' for now, but at least 'Enum' should be added
separately.  I think Protocols and categories can be handled as
'Class'es, properties -- as 'Variables'.  Of course, unless we want to
store additional information for these.

> I assume that parameters (both template and function) will always be documented as part of their class/function/etc?

Yes, since \param and \tparam are in the same FullComment.

> Similarly, enumerators will be documented as part of their enumeration type?

No, since enums are usually documented this way:
/// Blah
enum A {
   A1, ///< Aaa
   A2, ///< Bbb
};
So comments for enumerators end up in separate FullComments.

> What should happen if one asks for the XML representation of, e.g., a parameter's comment?

For ParmVarDecl, ASTContext::getRawCommentForDeclNoCache() will fail
even to find a RawComment.

> +      <!-- XXX: Maybe use <interleave> since order of these elements should not
> +           be important? -->
>
> I like the idea of us putting the XML in a sane order (as you have it now), because it saves clients from each having to reconstruct a sane ordering themselves. Clients who want a different ordering are, of course, free to do that on their own.

OK, the stricter the better, since nobody is going to write this kind
of XML by hand.

> I assume that "Function" elements need something for "throws" or "exceptions", but that's not at all important now.

Maybe we will pretty-print it as a part of declaration?  Added a TODO
note for now.

> +  <define name="Class">
> +    <element name="Class">
> +      <optional>
> +        <attribute name="isTemplate">
> +          <data type="boolean" />
> +        </attribute>
> +      </optional>
> +      <optional>
> +        <attribute name="isTemplatePartialSpecialization">
> +          <data type="boolean" />
> +        </attribute>
> +      </optional>
> +      <optional>
> +        <attribute name="isTemplateSpecialization">
> +          <data type="boolean" />
> +        </attribute>
> +      </optional>
>
> isTemplatePartialSpecialization == isTemplate && isTemplateSpecialization?

Even more, I would say that it is a bit confusing.  A set of boolean
flags is usually not a good choice.  I changed that part of schema:

  <define name="Class">
    <element name="Class">
      <optional>
        <attribute name="templateKind">
          <choice>
            <value>template</value>
            <value>specialization</value>
            <value>partialSpecialization</value>
          </choice>
        </attribute>
      </optional>

> +  <define name="Name">
> +    <element name="Name">
> +      <!-- Non-empty text content. -->
> +      <data type="string">
> +        <param name="pattern">.*\S.*</param>
> +      </data>
> +    </element>
> +  </define>
>
> Presumably, the name is the unqualified name . One could imagine that it would be useful to have (in separate elements!) the qualifier of the name (e.g., the "std::" in "std::sort") and the template arguments of a specialization, so clients could render the full name if they want it. But, that's just grave.

Could we leave this to a follow-up?

> More importantly,  it would be very useful to have the USR as a separate (optional) element, because USRs allow indexers based on libclang to identify entities across translation units.

Done, added <USR>...</USR> after <Name>...</Name>.

> Similarly, for each kind of entity, could we add "location" information, e.g., the file, line, and column at which the declaration occurs?

Added "line" and "column" attributes to the root tag.

For example, for
>>>
/// Aaa.
int aaa;
<<<
we generate:

<Variable line="2"
column="5"><Name>aaa</Name><USR>c:@aaa</USR><Abstract><Para>
Aaa.</Para></Abstract></Variable>

> +void CommentASTToXMLConverter::visitFullComment(const FullComment *C) {
> +  FullCommentParts Parts(C);
> +
> +  const DeclInfo *DI = C->getDeclInfo();
> // …
> +    bool FoundName = false;
> +    if (const NamedDecl *ND = dyn_cast<NamedDecl>(DI->ThisDecl)) {
> +      if (DeclarationName DeclName = ND->getDeclName()) {
> +        Result << "<Name>";
> +        std::string Name = DeclName.getAsString();
> +        appendToResultWithXMLEscaping(Name);
> +        FoundName = true;
> +        Result << "</Name>";
> +      }
>
> Two comments here… first, this won't work for something like
>
>   typedef struct { … } Foo;
>
> where we almost surely want to use the name 'Foo'.

If we bound the comment to the struct decl, we have the RecordDecl*
and I don't see how to get a TypedefDecl* from there.  To support
this, we should be binding the comment to both.

> Second, for an Objective-C category, e.g.,
>
> @interface NSString (NSCPlusPlusParser)
> @endif
>
> we probably want the name to be the full "NSString (NSCPlusPlusParser)".

Can we leave this to a follow-up?

> +    if (!FoundName)
> +      Result << "<Name>unknown</Name>";
>
> How about "<anonymous>" or simply the empty string "" rather than "unknown"? Anonymous entities aren't that rare in C/C++.

Done.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comment-to-xml-v2.patch
Type: application/octet-stream
Size: 102800 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120806/0b60acd3/attachment.obj>