[cfe-commits] Comment AST and parser
Dmitri Gribenko
gribozavr at gmail.com
Fri Jun 29 17:40:36 PDT 2012
Hello,
I'm working on a comment AST and a real comment parser. I wanted to
gather opinions, alternative views, or general feedback.
I decided to model the AST along the lines of an HTML AST: block and
inline content.
* Block content is a paragraph, a command that takes a paragraph as an
argument, or a verbatim command.
* Inline content is placed within some block. Inline content includes
plain text, inline commands and HTML as tag soup.
Here are the AST nodes:
$ cat include/clang/Basic/CommentNodes.td
class Comment<bit abstract = 0> {
  bit Abstract = abstract;
}
class DComment<Comment base, bit abstract = 0> : Comment<abstract> {
  Comment Base = base;
}
def InlineContentComment : Comment<1>;
def TextComment : DComment<InlineContentComment>;
def NewlineComment : DComment<InlineContentComment>;
def InlineCommandComment : DComment<InlineContentComment>;
def HTMLTagComment : DComment<InlineContentComment, 1>;
def HTMLOpenTagComment : DComment<HTMLTagComment>;
def HTMLCloseTagComment : DComment<HTMLTagComment>;
def BlockContentComment : Comment<1>;
def ParagraphComment : DComment<BlockContentComment>;
def BlockCommandComment : DComment<BlockContentComment>;
def ParamCommandComment : DComment<BlockCommandComment>;
def VerbatimBlockComment : DComment<BlockCommandComment>;
def VerbatimLineComment : DComment<BlockCommandComment>;
def VerbatimBlockLineComment : Comment;
def FullComment : Comment;
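To make the inline/block split in this hierarchy concrete, here is a minimal C++ sketch that classifies node kinds by range. The names CommentKind, isInlineContent, isBlockContent and isBlockCommand are illustrative assumptions, not identifiers from the patch; the sketch only mirrors the derivation structure of the .td listing, with abstract nodes represented as ranges of concrete kinds.

```cpp
// Hypothetical sketch; these are not the classes generated from
// CommentNodes.td. Abstract nodes (InlineContentComment, HTMLTagComment,
// BlockContentComment) get no kind of their own, only their leaves do.
enum class CommentKind {
  // InlineContentComment and its descendants.
  Text,
  Newline,
  InlineCommand,
  HTMLOpenTag,
  HTMLCloseTag,
  // BlockContentComment and its descendants.
  Paragraph,
  BlockCommand,
  ParamCommand,
  VerbatimBlock,
  VerbatimLine,
  // Nodes that derive directly from Comment.
  VerbatimBlockLine,
  FullComment
};

// True for every kind derived from InlineContentComment.
inline bool isInlineContent(CommentKind K) {
  return K >= CommentKind::Text && K <= CommentKind::HTMLCloseTag;
}

// True for every kind derived from BlockContentComment.
inline bool isBlockContent(CommentKind K) {
  return K >= CommentKind::Paragraph && K <= CommentKind::VerbatimLine;
}

// True for BlockCommandComment itself and the kinds derived from it.
inline bool isBlockCommand(CommentKind K) {
  return K >= CommentKind::BlockCommand && K <= CommentKind::VerbatimLine;
}
```

Ordering the enumerators so that each base class covers a contiguous range keeps these checks down to two comparisons; whether the patch does it this way is not stated in this message.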
Here is an example AST:
=== Source:
// \brief Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061c0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:14> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:14>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")))
  (ParagraphComment 0x7fe3c1006190 <:3:3, :3:7>
    (TextComment 0x7fe3c1006160 <:3:3, :3:7> Text=" Bbb")))
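The parenthesized dump format can be produced by a simple recursive walk. The following is a minimal sketch under stated assumptions: Node, dump and dumpToString are hypothetical names, the node is a generic stand-in rather than the patch's AST classes, and pointer values and source ranges are left out. Nesting is shown with two spaces per level.

```cpp
#include <string>
#include <vector>

// Hypothetical generic node; the real AST uses one class per node kind.
struct Node {
  std::string Kind;            // e.g. "ParagraphComment"
  std::string Attr;            // e.g. "Name=\"brief\""; empty if none
  std::vector<Node> Children;  // child nodes in source order
};

// Appends "(Kind Attr" followed by each child on its own indented line,
// then the closing parenthesis, to Out.
void dump(const Node &N, std::string &Out, unsigned Depth = 0) {
  Out.append(Depth * 2, ' ');
  Out += "(" + N.Kind;
  if (!N.Attr.empty())
    Out += " " + N.Attr;
  for (const Node &Child : N.Children) {
    Out += "\n";
    dump(Child, Out, Depth + 1);
  }
  Out += ")";
}

// Convenience wrapper that returns the dump as a string.
std::string dumpToString(const Node &N) {
  std::string Out;
  dump(N, Out);
  return Out;
}
```

For a FullComment holding one ParagraphComment with a single TextComment child, dumpToString yields three lines, each nested one level deeper than the previous, with all closing parentheses on the last line.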
See attached files for a patch (WIP) and more example ASTs.
Some implementation details: I am not a big fan of the
TextTokenRetokenizer class I'm introducing with this patch, but it
seems to be the way to go about splitting existing text tokens.
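As an illustration of what splitting an existing text token can involve, here is a minimal sketch that peels the first whitespace-delimited word off a token's text, which is the kind of step needed to pull a parameter name out of the text that follows \param. The function splitFirstWord is a hypothetical name, and operating on a plain std::string is an assumption; this is not the interface of TextTokenRetokenizer in the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. Skips leading spaces and tabs in Text, stores the
// next run of non-whitespace characters in Word and everything after it
// in Rest. Returns false (leaving Word and Rest untouched) if Text
// contains no word.
bool splitFirstWord(const std::string &Text, std::string &Word,
                    std::string &Rest) {
  const char *Whitespace = " \t";
  std::size_t Begin = Text.find_first_not_of(Whitespace);
  if (Begin == std::string::npos)
    return false;
  std::size_t End = Text.find_first_of(Whitespace, Begin);
  if (End == std::string::npos)
    End = Text.size();
  Word = Text.substr(Begin, End - Begin);
  Rest = Text.substr(End);
  return true;
}
```

The remainder keeps its leading whitespace, so it could be put back as a new text token without losing characters; a real retokenizer would also need to track source locations for both pieces.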
Dmitri
--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
[----------] 9 tests from CommentParserTest
[ RUN ] CommentParserTest.Test1
=== Source:
//
=== AST:
(FullComment 0x7fe3c1006410 <<invalid loc>>)
[ OK ] CommentParserTest.Test1 (0 ms)
[ RUN ] CommentParserTest.Test2
=== Source:
// Meow
=== AST:
(FullComment 0x7fe3c1006070 <:1:3, :1:8>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:8>
    (TextComment 0x7fe3c1006010 <:1:3, :1:8> Text=" Meow")))
[ OK ] CommentParserTest.Test2 (1 ms)
[ RUN ] CommentParserTest.Test3
=== Source:
// Aaa
// Bbb
=== AST:
(FullComment 0x7fe3c10060d0 <:1:3, :2:7>
  (ParagraphComment 0x7fe3c1006090 <:1:3, :2:7>
    (TextComment 0x7fe3c1006010 <:1:3, :1:7> Text=" Aaa")
    (NewlineComment 0x7fe3c1006040 <:1:7>)
    (TextComment 0x7fe3c1006060 <:2:3, :2:7> Text=" Bbb")))
[ OK ] CommentParserTest.Test3 (0 ms)
[ RUN ] CommentParserTest.Test4
=== Source:
// Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10060d0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:7>
    (TextComment 0x7fe3c1006010 <:1:3, :1:7> Text=" Aaa"))
  (ParagraphComment 0x7fe3c10060a0 <:3:3, :3:7>
    (TextComment 0x7fe3c1006070 <:3:3, :3:7> Text=" Bbb")))
=== Source:
// Aaa
//
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061d0 <:1:3, :4:7>
  (ParagraphComment 0x7fe3c1006140 <:1:3, :1:7>
    (TextComment 0x7fe3c1006110 <:1:3, :1:7> Text=" Aaa"))
  (ParagraphComment 0x7fe3c10061a0 <:4:3, :4:7>
    (TextComment 0x7fe3c1006170 <:4:3, :4:7> Text=" Bbb")))
[ OK ] CommentParserTest.Test4 (0 ms)
[ RUN ] CommentParserTest.Test5
=== Source:
// \brief Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061c0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:14> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:14>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")))
  (ParagraphComment 0x7fe3c1006190 <:3:3, :3:7>
    (TextComment 0x7fe3c1006160 <:3:3, :3:7> Text=" Bbb")))
[ OK ] CommentParserTest.Test5 (0 ms)
[ RUN ] CommentParserTest.Test6
=== Source:
// \brief \author
=== AST:
(FullComment 0x7fe3c1006220 <:1:3, <invalid loc>>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:11> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:11>
      (TextComment 0x7fe3c1006100 <:1:10, :1:11> Text=" ")))
  (BlockCommandComment 0x7fe3c1006160 <:1:11, <invalid loc>> Name="author"
    (ParagraphComment 0x7fe3c10061f0 <<invalid loc>>)))
[ OK ] CommentParserTest.Test6 (0 ms)
[ RUN ] CommentParserTest.Test7
=== Source:
// \brief Aaa
// Bbb \author
// Ccc
=== AST:
(FullComment 0x7fe3c10062e0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :2:8> Name="brief"
    (ParagraphComment 0x7fe3c1006180 <:1:10, :2:8>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")
      (NewlineComment 0x7fe3c1006130 <:1:14>)
      (TextComment 0x7fe3c1006150 <:2:3, :2:8> Text=" Bbb ")))
  (BlockCommandComment 0x7fe3c10061c0 <:2:8, :3:7> Name="author"
    (ParagraphComment 0x7fe3c10062a0 <:2:15, :3:7>
      (NewlineComment 0x7fe3c1006250 <:2:15>)
      (TextComment 0x7fe3c1006270 <:3:3, :3:7> Text=" Ccc"))))
[ OK ] CommentParserTest.Test7 (1 ms)
[ RUN ] CommentParserTest.Test8
=== Source:
// \param aaa
// \param [in] aaa
// \param [out] aaa
// \param [in,out] aaa
=== AST:
(FullComment 0x7fe3c10064e0 <:1:3, <invalid loc>>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (ParamCommandComment 0x7fe3c1006070 <:1:4, :2:4> [in] implicitly Param="aaa"
    (ParagraphComment 0x7fe3c1006160 <:1:14, :2:4>
      (NewlineComment 0x7fe3c1006110 <:1:14>)
      (TextComment 0x7fe3c1006130 <:2:3, :2:4> Text=" ")))
  (ParamCommandComment 0x7fe3c10061a0 <:2:4, :3:4> [in] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c1006290 <:2:19, :3:4>
      (NewlineComment 0x7fe3c1006240 <:2:19>)
      (TextComment 0x7fe3c1006260 <:3:3, :3:4> Text=" ")))
  (ParamCommandComment 0x7fe3c10062d0 <:3:4, :4:4> [out] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c10063c0 <:3:20, :4:4>
      (NewlineComment 0x7fe3c1006370 <:3:20>)
      (TextComment 0x7fe3c1006390 <:4:3, :4:4> Text=" ")))
  (ParamCommandComment 0x7fe3c1006400 <:4:4, <invalid loc>> [in,out] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c10064b0 <<invalid loc>>)))
[ OK ] CommentParserTest.Test8 (0 ms)
[ RUN ] CommentParserTest.Test9
=== Source:
// \unknown aaa
=== AST:
(FullComment 0x7fe3c10060e0 <:1:3, :1:16>
  (ParagraphComment 0x7fe3c10060a0 <:1:3, :1:16>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" ")
    (InlineCommandComment 0x7fe3c1006040 <:1:4, :1:12>)
    (TextComment 0x7fe3c1006070 <:1:12, :1:16> Text=" aaa")))
[ OK ] CommentParserTest.Test9 (0 ms)
[----------] 9 tests from CommentParserTest (2 ms total)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comments-parser-v0.patch
Type: application/octet-stream
Size: 65597 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120629/dd850b7a/attachment.obj>