[cfe-commits] Comment AST and parser
Douglas Gregor
dgregor at apple.com
Mon Jul 2 09:39:45 PDT 2012
On Jun 29, 2012, at 5:40 PM, Dmitri Gribenko wrote:
> Hello,
>
> I'm working on comment AST and real comment parser. Just wanted to
> gather opinions or alternative views or receive general feedback.
>
> I decided to model the AST along the ideas of HTML AST: block and
> inline content.
>
> * Block content is a paragraph or a command that has a paragraph as an
> argument or verbatim command.
> * Inline content is placed within some block. Inline content includes
> plain text, inline commands and HTML as tag soup.
>
> Here are the AST nodes:
>
> $ cat include/clang/Basic/CommentNodes.td
> class Comment<bit abstract = 0> {
> bit Abstract = abstract;
> }
>
> class DComment<Comment base, bit abstract = 0> : Comment<abstract> {
> Comment Base = base;
> }
>
> def InlineContentComment : Comment<1>;
> def TextComment : DComment<InlineContentComment>;
> def NewlineComment : DComment<InlineContentComment>;
> def InlineCommandComment : DComment<InlineContentComment>;
> def HTMLTagComment : DComment<InlineContentComment, 1>;
> def HTMLOpenTagComment : DComment<HTMLTagComment>;
> def HTMLCloseTagComment : DComment<HTMLTagComment>;
>
> def BlockContentComment : Comment<1>;
> def ParagraphComment : DComment<BlockContentComment>;
> def BlockCommandComment : DComment<BlockContentComment>;
> def ParamCommandComment : DComment<BlockCommandComment>;
> def VerbatimBlockComment : DComment<BlockCommandComment>;
> def VerbatimLineComment : DComment<BlockCommandComment>;
>
> def VerbatimBlockLineComment : Comment;
>
> def FullComment : Comment;
>
> Here is an example AST:
>
> === Source:
> // \brief Aaa
> //
> // Bbb
> === AST:
> (FullComment 0x7fe3c10061c0 <:1:3, :3:7>
> (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
> (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
> (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:14> Name="brief"
> (ParagraphComment 0x7fe3c1006130 <:1:10, :1:14>
> (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")))
> (ParagraphComment 0x7fe3c1006190 <:3:3, :3:7>
> (TextComment 0x7fe3c1006160 <:3:3, :3:7> Text=" Bbb")))
>
> See attached files for a patch (WIP) and more example ASTs.
>
> Some implementation details: I am not a big fan of
> TextTokenRetokenizer class I'm introducing with this patch, but this
> seems to be way to go about splitting existing text tokens.
This is looking great! I like this direction a lot. Some specific comments below.
@@ -427,12 +428,15 @@
/// \brief Mapping from declarations to their comments (stored within
/// Comments list), once we have already looked up the comment associated
/// with a given declaration.
- mutable llvm::DenseMap<const Decl *, const RawComment *> DeclComments;
+ mutable llvm::DenseMap<const Decl *, const RawComment *> DeclRawComments;
/// \brief Return the documentation comment attached to a given declaration,
/// without looking into cache.
const RawComment *getRawCommentForDeclNoCache(const Decl *D) const;
+ mutable llvm::DenseMap<const Decl *, comments::FullComment *>
+ DeclParsedComments;
+
public:
void addComment(const RawComment &RC) {
Comments.addComment(RC);
Should these two DenseMaps be collapsed into one
DenseMap<const Decl *, std::pair<const RawComment *, comments::FullComment *> >
to reduce the number of lookups we need to do?
+/// Any part of the comment.
+/// Abstract class.
+class Comment {
+protected:
+ // Type of this AST node.
+ unsigned Kind : 8;
+
+ // Preffered location to show caret.
+ SourceLocation Loc;
"Preffered" should be "Preferred".
+public:
+ enum CommentKind {
+ NoCommentKind = 0,
+#define COMMENT(CLASS, PARENT) CLASS##Kind,
+#define COMMENT_RANGE(BASE, FIRST, LAST) \
+ first##BASE##Constant=FIRST##Kind, last##BASE##Constant=LAST##Kind,
+#define LAST_COMMENT_RANGE(BASE, FIRST, LAST) \
+ first##BASE##Constant=FIRST##Kind, last##BASE##Constant=LAST##Kind
+#define ABSTRACT_COMMENT(COMMENT)
+#include "clang/AST/CommentNodes.inc"
+ };
"first" and "last" here should be uppercase, please.
+/// A newline.
+class NewlineComment : public InlineContentComment {
+public:
Why have a special AST node for newlines, rather than either embedding \n's or having a "there's a newline following this inline comment" bit? Abstractly, I guess having an AST node to cope with might make it easier for clients of the comment AST to translate it properly, or is there another reason?
+
+class HTMLOpenTagComment : public HTMLTagComment {
+public:
+ class Attribute {
+ StringRef Name;
+ StringRef Value;
+ SourceRange Range;
+ };
Please add a comment for this class. A reminder that open/close comments are *not* automatically paired by the representation would be wonderful, here.
+private:
+ ArrayRef<Attribute *> Attributes;
Might it be better to just have an ArrayRef<Attribute>, to avoid the extra indirections? Copying Attribute instances around is fairly fast.
+
+class HTMLCloseTagComment : public HTMLTagComment {
Comment please!
+/// Doxygen \\param command.
+class ParamCommandComment : public BlockCommandComment {
+public:
+ enum PassDirection {
+ DIR_IN,
+ DIR_OUT,
+ DIR_IN_OUT
+ };
+
Name these PD_In, PD_Out, PD_InOut, perhaps? Or even In, Out, InOut, since they're nested in ParamCommandComment.
+/// Verbatim block that (e. g., preformatted code).
+class VerbatimBlockComment : public BlockCommandComment {
+public:
Partial sentence in this comment.
+
+class VerbatimBlockLineComment : public Comment {
+public:
+ static bool classof(const Comment *C) {
+ return C->getCommentKind() == VerbatimBlockLineCommentKind;
+ }
+
+ static bool classof(const VerbatimBlockLineComment *) { return true; }
+
+ child_iterator child_begin() const { return NULL; }
+
+ child_iterator child_end() const { return NULL; }
+};
+
+class VerbatimLineComment : public BlockCommandComment {};
Please comment these classes; I'm not sure I understand the difference between them.
+ParamCommandComment *Sema::actOnParamCommandArg(ParamCommandComment *Command,
+ SourceLocation ArgLocBegin,
+ SourceLocation ArgLocEnd,
+ StringRef Arg,
+ bool IsDirection) {
+ if (IsDirection) {
+ ParamCommandComment::PassDirection Direction;
+ std::string ArgLower = Arg.lower();
+ // TODO: optimize: lower Name first (need an API in SmallString for that),
+ // after that StringSwitch.
+ if (ArgLower == "[in]")
+ Direction = ParamCommandComment::DIR_IN;
+ else if (ArgLower == "[out]")
+ Direction = ParamCommandComment::DIR_OUT;
+ else if (ArgLower == "[in,out]" || ArgLower == "[out,in]")
+ Direction = ParamCommandComment::DIR_IN_OUT;
+ else {
+ Direction = ParamCommandComment::DIR_IN;
+ // Diag() unrecognized parameter passing direction, valid directions are ...
+ }
I imagine that we'll want to deal with spaces around the ',', for people who get it wrong?
+bool Sema::isBlockCommand(StringRef Name) {
+ return llvm::StringSwitch<bool>(Name)
+ .Case("brief", true)
+ .Case("result", true)
+ .Case("return", true)
+ .Case("returns", true)
+ .Case("author", true)
+ .Case("authors", true)
+ .Case("pre", true)
+ .Case("post", true)
+ .Default(false) || isParamCommand(Name);
+}
+
+bool Sema::isParamCommand(StringRef Name) {
+ return llvm::StringSwitch<bool>(Name)
+ .Case("param", true)
+ .Case("arg", true)
+ .Default(false);
+}
To be TableGen'd at a later date, I'm sure ;)
Index: utils/TableGen/TableGen.cpp
===================================================================
--- utils/TableGen/TableGen.cpp (revision 159471)
+++ utils/TableGen/TableGen.cpp (working copy)
@@ -38,6 +38,7 @@
GenClangDiagsDefs,
GenClangDiagGroups,
GenClangDiagsIndexName,
+ GenClangCommentNodes,
GenClangDeclNodes,
GenClangStmtNodes,
GenClangSACheckers,
@@ -86,6 +87,8 @@
clEnumValN(GenClangDiagsIndexName,
"gen-clang-diags-index-name",
"Generate Clang diagnostic name index"),
+ clEnumValN(GenClangCommentNodes, "gen-clang-comment-nodes",
+ "Generate Clang AST comment nodes"),
clEnumValN(GenClangDeclNodes, "gen-clang-decl-nodes",
"Generate Clang AST declaration nodes"),
clEnumValN(GenClangStmtNodes, "gen-clang-stmt-nodes",
@@ -148,6 +151,9 @@
case GenClangDiagsIndexName:
EmitClangDiagsIndexName(Records, OS);
break;
+ case GenClangCommentNodes:
+ EmitClangASTNodes(Records, OS, "Comment", "");
+ break;
case GenClangDeclNodes:
EmitClangASTNodes(Records, OS, "Decl", "Decl");
EmitClangDeclContext(Records, OS);
Is there more that you intend to do with TableGen for the AST nodes themselves? It looks like all we're getting is the list of node names, but TableGen is a pretty heavyweight way to keep that up-to-date.
- Doug
More information about the cfe-commits
mailing list