[cfe-commits] [PATCH] Structured comment parsing, retaining comments in AST

Douglas Gregor dgregor at apple.com
Tue Jun 19 17:03:32 PDT 2012


On Jun 19, 2012, at 4:36 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> Hi Doug,
> 
> Thank you for the review.  Responses inline.  Patch attached.
> 
> On Tue, Jun 19, 2012 at 10:47 AM, Douglas Gregor <dgregor at apple.com> wrote:
>> Index: include/clang/Basic/DiagnosticSemaKinds.td
>> ===================================================================
>> --- include/clang/Basic/DiagnosticSemaKinds.td  (revision 158558)
>> +++ include/clang/Basic/DiagnosticSemaKinds.td  (working copy)
>> @@ -5633,5 +5633,10 @@
>>   "definition of %0 must be imported before it is required">;
>>  }
>> 
>> +let CategoryName = "Comment Issue" in {
>> +def warn_not_a_doxygen_after_member_comment : Warning<
>> +  "not a Doxygen member comment">, InGroup<DiagGroup<"doxygen">>;
>> +} // end of comment issue category
>> +
>>  } // end of sema component.
>> 
>> Please define a warning group in DiagnosticGroups.td for Doxygen warnings; I suspect we'll have a number of these.
>> 
>> Is this simply a "Doxygen Issue" or "Documentation Issue", rather than a "Comment Issue"?
> 
> We can say that it is a "Documentation issue", thus side-stepping the
> issue of hardcoding any particular names into clang.

Okay!

>> +  SourceRange Range;
>> +  CommentKind Kind;
>> +  bool IsAfterMember;
>> +  bool IsAlmostAfterMemberComment;
>> +  StringRef RawText;
>> +
>> +  mutable unsigned BeginLine; ///< Cached line number
>> +  mutable unsigned EndLine; ///< Cached line number
>> +  mutable bool BeginLineValid; ///< True if BeginLine is valid
>> +  mutable bool EndLineValid; ///< True if EndLine is valid
>> 
>> Please pack the various integer/bool/enum data fields into bitfields, because the size of RawComment does have an impact on the size of the AST.
> 
> Packed bool and enum fields together, but didn't touch unsigned
> members because they are line numbers.

Unfortunately, this perfectly valid C++ code:

+
+  CommentKind Kind : 3;
+

will break MSVC in horrible ways. Please use "unsigned Kind : 3;" here, and add casts where needed.

>> +  /// \brief Return the Doxygen-style comment attached to a given declaration,
>> +  /// without looking into cache.
>> +  const RawComment *getRawCommentForDeclNoCache(const Decl *D) const;
>> 
>> It's just a documentation comment, not specifically a Doxygen comment, right?
>> 
>>  /**
>> + * \brief Given a cursor that represents a declaration, return the associated
>> + * comment's source range.
>> + */
>> +CINDEX_LINKAGE CXSourceRange clang_Cursor_getCommentRange(CXCursor C);
>> +
>> 
>> Should this comment say something about this range covering multiple adjacent comments?
> 
> Rephrased:
> /**
> * \brief Given a cursor that represents a declaration, return the associated
> * comment's source range.
> */
> CINDEX_LINKAGE CXSourceRange clang_Cursor_getCommentRange(CXCursor C);
> 
> /**
> * \brief Given a cursor that represents a declaration, return the associated
> * comment text, including comment markers.  The range may include multiple
> * consecutive comments with whitespace in between.
> */
> CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C);

The second sentence here should have been added to clang_Cursor_getCommentRange, right?

>> +  // First check whether we have a comment that comes after a member decl.
>> +  if (Comment != RawComments.end() &&
>> +      Comment->isDoxygen() && Comment->isAfterMember() &&
>> +      !isa<TagDecl>(D) && !isa<NamespaceDecl>(D)) {
>> 
>> Why the checks against tags and namespaces? Should this also/instead be checking isa<TagDecl>(D->getDeclContext()) || isa<ObjCContainerDecl>(D->getDeclContext())?
> 
> The reason for this check is that we don't allow TagDecls and
> NamespaceDecls to have "after member" ///< comments.  (Although --
> just checked -- Doxygen allows them for TagDecls.  But that's
> confusing to write in source, I don't think it is logical.)

Okay, that makes sense.

>> +        case COMMENTS_RAW_COMMENT: {
>> +          unsigned Idx = 0;
>> +          SourceRange SR = ReadSourceRange(F, Record, Idx);
>> +          bool Merged = Record[Idx++];
>> +          Context.addComment(RawComment(SourceMgr, SR, Merged));
>> +          break;
>> 
>> This is going to cause us to pull in the contents of each of the files where a raw comment occurred, as soon as we load the raw comments from the AST file. That's going to be rather expensive, I suspect. Alternatively, we could store more in the AST file (including the Kind, IsAfterMember, and IsAlmostAfterMemberComment values) and lazily compute RawText only when it's actually requested... which should be very rare.
> 
> Done.
> 
>> Also, I think you want to put all of the comments read from the AST file at the beginning of the RawCommentList, because right now they could end up getting mixed together with comments collecting while parsing the current translation unit. This way, the ordering will be correct for precompiled headers. (Modules are a separable issue)
> 
> Done.
> 
> Question:
> 
> * "After member comment" sounds awkward (but that's how it is called
> once in Doxygen manual).  Maybe "trailing comment" is better?

Yes, I like "trailing comment" better.

This looks great! Please go ahead and commit.

	- Doug




More information about the cfe-commits mailing list