[cfe-commits] [PATCH] Comment AST and parser
Douglas Gregor
dgregor at apple.com
Thu Jul 5 14:30:58 PDT 2012
On Jul 3, 2012, at 3:14 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> Hello,
>
> The proposed patch implements AST classes for comments, a real parser
> for Doxygen comments and a very simple semantic analysis that just
> builds the AST. It also includes some minor changes for lexer to pick
> up source locations I didn't think about before.
>
> I decided to model the comments AST along the ideas of HTML AST: block
> and inline content.
>
> * Block content is a paragraph or a command that has a paragraph as an
> argument or verbatim command.
> * Inline content is placed within some block. Inline content includes
> plain text, inline commands and HTML as tag soup.
>
> Here are the AST nodes:
>
> $ cat include/clang/Basic/CommentNodes.td
> class Comment<bit abstract = 0> {
> bit Abstract = abstract;
> }
>
> class DComment<Comment base, bit abstract = 0> : Comment<abstract> {
> Comment Base = base;
> }
>
> def InlineContentComment : Comment<1>;
> def TextComment : DComment<InlineContentComment>;
> def InlineCommandComment : DComment<InlineContentComment>;
> def HTMLTagComment : DComment<InlineContentComment, 1>;
> def HTMLOpenTagComment : DComment<HTMLTagComment>;
> def HTMLCloseTagComment : DComment<HTMLTagComment>;
>
> def BlockContentComment : Comment<1>;
> def ParagraphComment : DComment<BlockContentComment>;
> def BlockCommandComment : DComment<BlockContentComment>;
> def ParamCommandComment : DComment<BlockCommandComment>;
> def VerbatimBlockComment : DComment<BlockCommandComment>;
> def VerbatimLineComment : DComment<BlockCommandComment>;
>
> def VerbatimBlockLineComment : Comment;
>
> def FullComment : Comment;
>
> More example ASTs attached.
>
> Not implemented in this patch:
> * a real iterator for Comment child nodes (but do we need one?)
> * diagnostics during parsing and semantic analysis;
> * finding and resolving declaration references in comment text;
> * TableGen-based generation of Doxygen command properties.
>
> Please review.
This patch is looking great. I have a few comments below; feel free to commit after addressing those, and I'll take a look at the result as a post-commit review.
+/// Any part of the comment.
+/// Abstract class.
+class Comment {
+protected:
+ /// Preferred location to show caret.
+ SourceLocation Loc;
+
+ /// Source range of this AST node.
+ SourceRange Range;
+
+ class CommentBitfields {
+ friend class Comment;
+
+ /// Type of this AST node.
+ unsigned Kind : 8;
+ };
+ enum { NumCommentBits = 8 };
+
+ class InlineContentCommentBitfields {
+ friend class InlineContentComment;
+
+ unsigned : NumCommentBits;
+
+ /// True if there is a newline after this inline content node.
+ /// (There is no separate AST node for a newline.)
+ unsigned HasTrailingNewline : 1;
+ };
+ enum { NumInlineContentCommentBitfields = 9 };
Yeah, this is looks great!
+/// An opening HTML tag with attributes.
+class HTMLOpenTagComment : public HTMLTagComment {
+public:
+ class Attribute {
+ public:
+ StringRef Name;
+ SourceLocation NameLocBegin;
+
+ SourceLocation EqualsLoc;
+
+ StringRef Value;
+ SourceRange ValueRange;
+
+ Attribute(StringRef Name, SourceLocation NameLocBegin) :
+ Name(Name), NameLocBegin(NameLocBegin),
+ EqualsLoc(SourceLocation()),
+ Value(StringRef()), ValueRange(SourceRange())
+ { }
+
+ Attribute(StringRef Name, SourceLocation NameLocBegin,
+ SourceLocation EqualsLoc,
+ StringRef Value, SourceRange ValueRange) :
+ Name(Name), NameLocBegin(NameLocBegin),
+ EqualsLoc(EqualsLoc),
+ Value(Value), ValueRange(ValueRange)
+ { }
+
+ SourceLocation getNameLocEnd() const {
+ return NameLocBegin.getLocWithOffset(Name.size());
+ }
+
+ SourceRange getNameRange() const {
+ return SourceRange(NameLocBegin, getNameLocEnd());
+ }
+ };
+
+private:
+ SmallVector<Attribute, 1> Attributes;
+
We need to avoid malloc-allocated structures within AST nodes, since the destructors of AST nodes are never invoked. I suggest just putting an ArrayRef here, which points into ASTContext-allocated memory.
+ void appendAttr(StringRef Name, SourceLocation NameLocBegin) {
+ Attributes.push_back(Attribute(Name, NameLocBegin));
+ Range.setEnd(Attributes.back().getNameLocEnd());
+ }
+
+ void appendAttrValue(StringRef Name, SourceLocation NameLocBegin,
+ SourceLocation EqualsLoc,
+ StringRef Value, SourceRange ValueRange) {
+ Attributes.push_back(Attribute(Name, NameLocBegin,
+ EqualsLoc,
+ Value, ValueRange));
+ Range.setEnd(ValueRange.getEnd());
+ }
This logic will need to go into the parser, which will just have to set() all of the attributes in one go by providing an ArrayRef.
+class BlockCommandComment : public BlockContentComment {
+protected:
+ StringRef Name;
+
+ struct Argument {
+ StringRef Text;
+ SourceRange Range;
+
+ Argument(StringRef Text, SourceRange Range) : Text(Text), Range(Range) { }
+ };
+
+ /// Word-like arguments.
+ llvm::SmallVector<Argument, 2> Args;
Same comment about malloc-allocated data within AST nodes.
+ void appendArg(StringRef Text, SourceRange Range) {
+ Args.push_back(Argument(Text, Range));
+ }
Ditto.
+/// Doxygen \\param command.
+class ParamCommandComment : public BlockCommandComment {
+public:
+ enum PassDirection {
+ In,
+ Out,
+ InOut
+ };
+
+protected:
+ unsigned Direction : 2;
+ unsigned IsDirectionExplicit : 1;
+
Sink these bits down into Comment?
- Doug
More information about the cfe-commits
mailing list