[cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Douglas Gregor dgregor at apple.com
Mon Aug 6 08:13:48 PDT 2012


On Aug 3, 2012, at 1:34 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> Hello,
> 
> The attached patch implements a libclang API for comment-to-xml
> conversion.  The implementation also includes a Relax NG schema and
> tests for the schema itself.  The schema is used in c-index-test to
> verify that XML documents we produce are valid.  In order to do the
> validation, we add an optional libxml2 dependency for c-index-test.

Very cool. Comments below.

> I am not sure that test/Index/Inputs/CommentXML is the best place to
> put the XML schema.  Any suggestions?

bindings/xml, perhaps?

> Current implementation of declaration name printer is not perfect --
> it prints 'operator _Bool' for C++ 'operator bool()', for example.
> But this can be fixed in a followup together with a complete
> declaration printer.


Index: test/Index/comment-xml-schema.c
===================================================================
--- test/Index/comment-xml-schema.c	(revision 0)
+++ test/Index/comment-xml-schema.c	(revision 0)
@@ -0,0 +1,35 @@
+// RUN: xmllint --noout --relaxng %S/Inputs/CommentXML/comment-xml-schema.rng %S/Inputs/CommentXML/valid-function-01.xml

This is going to fail if 'xmllint' is unavailable or simply not in the path. I think you'll need to extend lit with some knowledge of whether 'xmllint' is available (and perhaps its path?), so that we only run this set of tests when xmllint is available (via "REQUIRES: xmllint").

I think the comment XML format looks great; it's a good balance between simplicity for the client and capability for rendering documentation comments. Some nitpicks follow:

+  <start>
+    <choice>
+      <ref name="Function" />
+      <ref name="Class" />
+      <ref name="Variable" />
+      <ref name="Namespace" />
+      <ref name="Typedef" />
+    </choice>
+  </start>

I understand that the set of choices is intentionally limited, to simplify the output, but it looks like we're missing some necessary categories: enumeration types, (Objective-C) protocols, (Objective-C) categories and (Objective-C) properties, at a minimum. Should there still be an "other" category here?

I assume that parameters (both template and function) will always be documented as part of their class/function/etc? Similarly, enumerators will be documented as part of their enumeration type? What should happen if one asks for the XML representation of, e.g., a parameter's comment?

+      <!-- XXX: Maybe use <interleave> since order of these elements should not
+           be important? -->

I like the idea of us putting the XML in a sane order (as you have it now), because it saves clients from each having to reconstruct a sane ordering themselves. Clients who want a different ordering are, of course, free to do that on their own.

I assume that "Function" elements need something for "throws" or "exceptions", but that's not at all important now.

+  <define name="Class">
+    <element name="Class">
+      <optional>
+        <attribute name="isTemplate">
+          <data type="boolean" />
+        </attribute>
+      </optional>
+      <optional>
+        <attribute name="isTemplatePartialSpecialization">
+          <data type="boolean" />
+        </attribute>
+      </optional>
+      <optional>
+        <attribute name="isTemplateSpecialization">
+          <data type="boolean" />
+        </attribute>
+      </optional>

isTemplatePartialSpecialization == isTemplate && isTemplateSpecialization?

+  <define name="Name">
+    <element name="Name">
+      <!-- Non-empty text content. -->
+      <data type="string">
+        <param name="pattern">.*\S.*</param>
+      </data>
+    </element>
+  </define>

Presumably, the name is the unqualified name . One could imagine that it would be useful to have (in separate elements!) the qualifier of the name (e.g., the "std::" in "std::sort") and the template arguments of a specialization, so clients could render the full name if they want it. But, that's just grave.

More importantly,  it would be very useful to have the USR as a separate (optional) element, because USRs allow indexers based on libclang to identify entities across translation units.

Similarly, for each kind of entity, could we add "location" information, e.g., the file, line, and column at which the declaration occurs?

+void CommentASTToXMLConverter::visitFullComment(const FullComment *C) {
+  FullCommentParts Parts(C);
+
+  const DeclInfo *DI = C->getDeclInfo();
// …
+    bool FoundName = false;
+    if (const NamedDecl *ND = dyn_cast<NamedDecl>(DI->ThisDecl)) {
+      if (DeclarationName DeclName = ND->getDeclName()) {
+        Result << "<Name>";
+        std::string Name = DeclName.getAsString();
+        appendToResultWithXMLEscaping(Name);
+        FoundName = true;
+        Result << "</Name>";
+      }

Two comments here… first, this won't work for something like

  typedef struct { … } Foo;

where we almost surely want to use the name 'Foo'.

Second, for an Objective-C category, e.g.,

@interface NSString (NSCPlusPlusParser)
@endif

we probably want the name to be the full "NSString (NSCPlusPlusParser)".

+    if (!FoundName)
+      Result << "<Name>unknown</Name>";

How about "<anonymous>" or simply the empty string "" rather than "unknown"? Anonymous entities aren't that rare in C/C++. 

+  } else {
+    // No DeclInfo -- just emit some root tag and name tag.
+    RootEndTag = "</Function>";
+    Result << "<Function><Name>unknown</Name>";
+  }

As eluded to above, I think I'd rather that we have an "Other" in the schema. The "name" part should probably be factored out, since there are likely to be "other" declarations that have names.

	- Doug





More information about the cfe-commits mailing list