[cfe-commits] [PATCH] Comment AST and parser

Douglas Gregor dgregor at apple.com
Thu Jul 5 14:30:58 PDT 2012


On Jul 3, 2012, at 3:14 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> Hello,
> 
> The proposed patch implements AST classes for comments, a real parser
> for Doxygen comments and a very simple semantic analysis that just
> builds the AST.  It also includes some minor changes for lexer to pick
> up source locations I didn't think about before.
> 
> I decided to model the comments AST along the ideas of HTML AST: block
> and inline content.
> 
> * Block content is a paragraph or a command that has a paragraph as an
> argument or verbatim command.
> * Inline content is placed within some block.  Inline content includes
> plain text, inline commands and HTML as tag soup.
> 
> Here are the AST nodes:
> 
> $ cat include/clang/Basic/CommentNodes.td
> class Comment<bit abstract = 0> {
>  bit Abstract = abstract;
> }
> 
> class DComment<Comment base, bit abstract = 0> : Comment<abstract> {
>  Comment Base = base;
> }
> 
> def InlineContentComment : Comment<1>;
>  def TextComment : DComment<InlineContentComment>;
>  def InlineCommandComment : DComment<InlineContentComment>;
>  def HTMLTagComment : DComment<InlineContentComment, 1>;
>    def HTMLOpenTagComment : DComment<HTMLTagComment>;
>    def HTMLCloseTagComment : DComment<HTMLTagComment>;
> 
> def BlockContentComment : Comment<1>;
>  def ParagraphComment : DComment<BlockContentComment>;
>  def BlockCommandComment : DComment<BlockContentComment>;
>    def ParamCommandComment : DComment<BlockCommandComment>;
>    def VerbatimBlockComment : DComment<BlockCommandComment>;
>    def VerbatimLineComment : DComment<BlockCommandComment>;
> 
> def VerbatimBlockLineComment : Comment;
> 
> def FullComment : Comment;
> 
> More example ASTs attached.
> 
> Not implemented in this patch:
> * a real iterator for Comment child nodes (but do we need one?)
> * diagnostics during parsing and semantic analysis;
> * finding and resolving declaration references in comment text;
> * TableGen-based generation of Doxygen command properties.
> 
> Please review.


This patch is looking great. I have a few comments below; feel free to commit after addressing those, and I'll take a look at the result as a post-commit review.

+/// Any part of the comment.
+/// Abstract class.
+class Comment {
+protected:
+  /// Preferred location to show caret.
+  SourceLocation Loc;
+
+  /// Source range of this AST node.
+  SourceRange Range;
+
+  class CommentBitfields {
+    friend class Comment;
+
+    /// Type of this AST node.
+    unsigned Kind : 8;
+  };
+  enum { NumCommentBits = 8 };
+
+  class InlineContentCommentBitfields {
+    friend class InlineContentComment;
+
+    unsigned : NumCommentBits;
+
+    /// True if there is a newline after this inline content node.
+    /// (There is no separate AST node for a newline.)
+    unsigned HasTrailingNewline : 1;
+  };
+  enum { NumInlineContentCommentBitfields = 9 };

Yeah, this is looks great!

+/// An opening HTML tag with attributes.
+class HTMLOpenTagComment : public HTMLTagComment {
+public:
+  class Attribute {
+  public:
+    StringRef Name;
+    SourceLocation NameLocBegin;
+
+    SourceLocation EqualsLoc;
+
+    StringRef Value;
+    SourceRange ValueRange;
+
+    Attribute(StringRef Name, SourceLocation NameLocBegin) :
+        Name(Name), NameLocBegin(NameLocBegin),
+        EqualsLoc(SourceLocation()),
+        Value(StringRef()), ValueRange(SourceRange())
+    { }
+
+    Attribute(StringRef Name, SourceLocation NameLocBegin,
+              SourceLocation EqualsLoc,
+              StringRef Value, SourceRange ValueRange) :
+        Name(Name), NameLocBegin(NameLocBegin),
+        EqualsLoc(EqualsLoc),
+        Value(Value), ValueRange(ValueRange)
+    { }
+
+    SourceLocation getNameLocEnd() const {
+      return NameLocBegin.getLocWithOffset(Name.size());
+    }
+
+    SourceRange getNameRange() const {
+      return SourceRange(NameLocBegin, getNameLocEnd());
+    }
+  };
+
+private:
+  SmallVector<Attribute, 1> Attributes;
+

We need to avoid malloc-allocated structures within AST nodes, since the destructors of AST nodes are never invoked. I suggest just putting an ArrayRef here, which points into ASTContext-allocated memory.

+  void appendAttr(StringRef Name, SourceLocation NameLocBegin) {
+    Attributes.push_back(Attribute(Name, NameLocBegin));
+    Range.setEnd(Attributes.back().getNameLocEnd());
+  }
+
+  void appendAttrValue(StringRef Name, SourceLocation NameLocBegin,
+                       SourceLocation EqualsLoc,
+                       StringRef Value, SourceRange ValueRange) {
+    Attributes.push_back(Attribute(Name, NameLocBegin,
+                                   EqualsLoc,
+                                   Value, ValueRange));
+    Range.setEnd(ValueRange.getEnd());
+  }

This logic will need to go into the parser, which will just have to set() all of the attributes in one go by providing an ArrayRef.

+class BlockCommandComment : public BlockContentComment {
+protected:
+  StringRef Name;
+
+  struct Argument {
+    StringRef Text;
+    SourceRange Range;
+
+    Argument(StringRef Text, SourceRange Range) : Text(Text), Range(Range) { }
+  };
+
+  /// Word-like arguments.
+  llvm::SmallVector<Argument, 2> Args;

Same comment about malloc-allocated data within AST nodes.

+  void appendArg(StringRef Text, SourceRange Range) {
+    Args.push_back(Argument(Text, Range));
+  }

Ditto.

+/// Doxygen \\param command.
+class ParamCommandComment : public BlockCommandComment {
+public:
+  enum PassDirection {
+    In,
+    Out,
+    InOut
+  };
+
+protected:
+  unsigned Direction : 2;
+  unsigned IsDirectionExplicit : 1;
+

Sink these bits down into Comment?

	- Doug



More information about the cfe-commits mailing list