[cfe-commits] Comment AST and parser

Dmitri Gribenko gribozavr at gmail.com
Fri Jun 29 17:40:36 PDT 2012


Hello,

I'm working on a comment AST and a real comment parser.  I would like
to gather opinions, alternative views, or general feedback.

I decided to model the AST along the lines of an HTML AST: block and
inline content.

* Block content is a paragraph, a command that takes a paragraph as an
argument, or a verbatim command.
* Inline content is placed within some block.  Inline content includes
plain text, inline commands, and HTML as tag soup.
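
To make the block/inline split concrete, here is a minimal sketch of the
model in C++.  The type and function names (Inline, Block, Comment,
makeExample, countInline) are hypothetical and are not the classes from
the patch; the example also leaves out the leading whitespace-only
paragraph that the real parser currently produces.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only;
// these are NOT the classes introduced by the patch.

// Inline content: always lives inside some block.
struct Inline {
  enum Kind { Text, InlineCommand, HTMLTag } K;
  std::string Value; // text, command name, or tag name
};

// Block content: a paragraph, a command whose argument is a paragraph,
// or a verbatim command.
struct Block {
  enum Kind { Paragraph, BlockCommand, Verbatim } K;
  std::string Name;             // command name; empty for a plain paragraph
  std::vector<Inline> Children; // inline content of the (argument) paragraph
};

// A full comment is modeled as a sequence of blocks.
using Comment = std::vector<Block>;

// Builds the model for "\brief Aaa", a blank line, then "Bbb".
inline Comment makeExample() {
  Comment C;
  C.push_back({Block::BlockCommand, "brief", {{Inline::Text, " Aaa"}}});
  C.push_back({Block::Paragraph, "", {{Inline::Text, " Bbb"}}});
  return C;
}

// Counts inline nodes of a given kind across all blocks.
inline int countInline(const Comment &C, Inline::Kind K) {
  int N = 0;
  for (const Block &B : C)
    for (const Inline &I : B.Children)
      if (I.K == K)
        ++N;
  return N;
}
```

Calling makeExample() yields two blocks, each holding one text node, so
inline content is never reachable except through a block.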

Here are the AST nodes:

$ cat include/clang/Basic/CommentNodes.td
class Comment<bit abstract = 0> {
  bit Abstract = abstract;
}

class DComment<Comment base, bit abstract = 0> : Comment<abstract> {
  Comment Base = base;
}

def InlineContentComment : Comment<1>;
  def TextComment : DComment<InlineContentComment>;
  def NewlineComment : DComment<InlineContentComment>;
  def InlineCommandComment : DComment<InlineContentComment>;
  def HTMLTagComment : DComment<InlineContentComment, 1>;
    def HTMLOpenTagComment : DComment<HTMLTagComment>;
    def HTMLCloseTagComment : DComment<HTMLTagComment>;

def BlockContentComment : Comment<1>;
  def ParagraphComment : DComment<BlockContentComment>;
  def BlockCommandComment : DComment<BlockContentComment>;
    def ParamCommandComment : DComment<BlockCommandComment>;
    def VerbatimBlockComment : DComment<BlockCommandComment>;
    def VerbatimLineComment : DComment<BlockCommandComment>;

def VerbatimBlockLineComment : Comment;

def FullComment : Comment;
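
One way a hierarchy like this could be consumed on the C++ side is a
kind enum in which every abstract node corresponds to a contiguous range
of concrete kinds.  The sketch below is an assumption about layout, not
the generated code from the patch; CommentKind, inRange, and the is*
helpers are hypothetical names.

```cpp
// Hypothetical sketch: the enumerator names mirror CommentNodes.td, but the
// ordering and the helper functions are assumptions made for illustration.
enum class CommentKind {
  // InlineContentComment (abstract)
  TextComment,
  NewlineComment,
  InlineCommandComment,
  // HTMLTagComment (abstract, derived from InlineContentComment)
  HTMLOpenTagComment,
  HTMLCloseTagComment,
  // BlockContentComment (abstract)
  ParagraphComment,
  BlockCommandComment, // concrete, and the base of the three nodes below
  ParamCommandComment,
  VerbatimBlockComment,
  VerbatimLineComment,
  // Nodes with no base other than Comment
  VerbatimBlockLineComment,
  FullComment
};

// True if K lies in the inclusive range [First, Last].
constexpr bool inRange(CommentKind K, CommentKind First, CommentKind Last) {
  return K >= First && K <= Last;
}

constexpr bool isInlineContent(CommentKind K) {
  return inRange(K, CommentKind::TextComment, CommentKind::HTMLCloseTagComment);
}

constexpr bool isHTMLTag(CommentKind K) {
  return inRange(K, CommentKind::HTMLOpenTagComment,
                 CommentKind::HTMLCloseTagComment);
}

constexpr bool isBlockContent(CommentKind K) {
  return inRange(K, CommentKind::ParagraphComment,
                 CommentKind::VerbatimLineComment);
}

constexpr bool isBlockCommand(CommentKind K) {
  return inRange(K, CommentKind::BlockCommandComment,
                 CommentKind::VerbatimLineComment);
}
```

With this layout a check such as isBlockCommand(Kind) is two integer
comparisons, and a ParamCommandComment answers true to both
isBlockCommand and isBlockContent.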

Here is an example AST:

=== Source:
// \brief Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061c0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:14> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:14>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")))
  (ParagraphComment 0x7fe3c1006190 <:3:3, :3:7>
    (TextComment 0x7fe3c1006160 <:3:3, :3:7> Text=" Bbb")))
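
In the example above, the empty "//" line ends the paragraph that belongs
to \brief, so " Bbb" becomes a separate top-level paragraph.  That
blank-line rule can be sketched on its own as follows.  This is a
simplified illustration, not the parser from the patch: isBlank and
splitParagraphs are hypothetical names, the input is assumed to have the
leading "//" already stripped, and commands are not interpreted at all.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if the line contains only spaces and tabs.
inline bool isBlank(const std::string &Line) {
  return Line.find_first_not_of(" \t") == std::string::npos;
}

// Groups comment lines (leading "//" already stripped) into paragraphs.
// One or more blank lines end the current paragraph; commands such as
// \brief are treated as ordinary text in this sketch.
inline std::vector<std::vector<std::string>>
splitParagraphs(const std::vector<std::string> &Lines) {
  std::vector<std::vector<std::string>> Paragraphs;
  bool InParagraph = false;
  for (const std::string &Line : Lines) {
    if (isBlank(Line)) {
      InParagraph = false; // a blank line closes the open paragraph, if any
      continue;
    }
    if (!InParagraph) {
      Paragraphs.emplace_back(); // first non-blank line opens a paragraph
      InParagraph = true;
    }
    Paragraphs.back().push_back(Line);
  }
  return Paragraphs;
}
```

For the three lines of the example this returns two groups, while two
adjacent non-blank lines stay in one group.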

See attached files for a patch (WIP) and more example ASTs.

Some implementation details: I am not a big fan of the
TextTokenRetokenizer class I'm introducing with this patch, but it
seems to be the way to go about splitting existing text tokens.
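
The kind of splitting meant here might look like the sketch below,
assuming the goal is to pull a single word (for example the parameter
name after \param) out of a text token and leave the remainder as text.
splitFirstWord is a hypothetical helper written for this illustration;
it is not TextTokenRetokenizer and does not track source locations.

```cpp
#include <string>
#include <utility>

// Hypothetical helper, not TextTokenRetokenizer itself: splits the text of
// one token into its first whitespace-delimited word and the remaining text.
// Leading whitespace before the word is dropped; the remainder keeps its
// leading whitespace.
inline std::pair<std::string, std::string>
splitFirstWord(const std::string &Text) {
  const char *Whitespace = " \t";
  std::string::size_type Begin = Text.find_first_not_of(Whitespace);
  if (Begin == std::string::npos)
    return {std::string(), std::string()}; // no word in this token
  std::string::size_type End = Text.find_first_of(Whitespace, Begin);
  if (End == std::string::npos)
    return {Text.substr(Begin), std::string()}; // word runs to end of token
  return {Text.substr(Begin, End - Begin), Text.substr(End)};
}
```

A real implementation would also need to split the token's source range
and to continue into the next text token when the current one holds only
whitespace.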

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
[----------] 9 tests from CommentParserTest
[ RUN      ] CommentParserTest.Test1
=== Source:
//
=== AST:
(FullComment 0x7fe3c1006410 <<invalid loc>>)
[       OK ] CommentParserTest.Test1 (0 ms)
[ RUN      ] CommentParserTest.Test2
=== Source:
// Meow
=== AST:
(FullComment 0x7fe3c1006070 <:1:3, :1:8>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:8>
    (TextComment 0x7fe3c1006010 <:1:3, :1:8> Text=" Meow")))
[       OK ] CommentParserTest.Test2 (1 ms)
[ RUN      ] CommentParserTest.Test3
=== Source:
// Aaa
// Bbb
=== AST:
(FullComment 0x7fe3c10060d0 <:1:3, :2:7>
  (ParagraphComment 0x7fe3c1006090 <:1:3, :2:7>
    (TextComment 0x7fe3c1006010 <:1:3, :1:7> Text=" Aaa")
    (NewlineComment 0x7fe3c1006040 <:1:7>)
    (TextComment 0x7fe3c1006060 <:2:3, :2:7> Text=" Bbb")))
[       OK ] CommentParserTest.Test3 (0 ms)
[ RUN      ] CommentParserTest.Test4
=== Source:
// Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10060d0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:7>
    (TextComment 0x7fe3c1006010 <:1:3, :1:7> Text=" Aaa"))
  (ParagraphComment 0x7fe3c10060a0 <:3:3, :3:7>
    (TextComment 0x7fe3c1006070 <:3:3, :3:7> Text=" Bbb")))
=== Source:
// Aaa
//
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061d0 <:1:3, :4:7>
  (ParagraphComment 0x7fe3c1006140 <:1:3, :1:7>
    (TextComment 0x7fe3c1006110 <:1:3, :1:7> Text=" Aaa"))
  (ParagraphComment 0x7fe3c10061a0 <:4:3, :4:7>
    (TextComment 0x7fe3c1006170 <:4:3, :4:7> Text=" Bbb")))
[       OK ] CommentParserTest.Test4 (0 ms)
[ RUN      ] CommentParserTest.Test5
=== Source:
// \brief Aaa
//
// Bbb
=== AST:
(FullComment 0x7fe3c10061c0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:14> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:14>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")))
  (ParagraphComment 0x7fe3c1006190 <:3:3, :3:7>
    (TextComment 0x7fe3c1006160 <:3:3, :3:7> Text=" Bbb")))
[       OK ] CommentParserTest.Test5 (0 ms)
[ RUN      ] CommentParserTest.Test6
=== Source:
// \brief \author
=== AST:
(FullComment 0x7fe3c1006220 <:1:3, <invalid loc>>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :1:11> Name="brief"
    (ParagraphComment 0x7fe3c1006130 <:1:10, :1:11>
      (TextComment 0x7fe3c1006100 <:1:10, :1:11> Text=" ")))
  (BlockCommandComment 0x7fe3c1006160 <:1:11, <invalid loc>> Name="author"
    (ParagraphComment 0x7fe3c10061f0 <<invalid loc>>)))
[       OK ] CommentParserTest.Test6 (0 ms)
[ RUN      ] CommentParserTest.Test7
=== Source:
// \brief Aaa
// Bbb \author
// Ccc
=== AST:
(FullComment 0x7fe3c10062e0 <:1:3, :3:7>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (BlockCommandComment 0x7fe3c1006070 <:1:4, :2:8> Name="brief"
    (ParagraphComment 0x7fe3c1006180 <:1:10, :2:8>
      (TextComment 0x7fe3c1006100 <:1:10, :1:14> Text=" Aaa")
      (NewlineComment 0x7fe3c1006130 <:1:14>)
      (TextComment 0x7fe3c1006150 <:2:3, :2:8> Text=" Bbb ")))
  (BlockCommandComment 0x7fe3c10061c0 <:2:8, :3:7> Name="author"
    (ParagraphComment 0x7fe3c10062a0 <:2:15, :3:7>
      (NewlineComment 0x7fe3c1006250 <:2:15>)
      (TextComment 0x7fe3c1006270 <:3:3, :3:7> Text=" Ccc"))))
[       OK ] CommentParserTest.Test7 (1 ms)
[ RUN      ] CommentParserTest.Test8
=== Source:
// \param aaa
// \param [in] aaa
// \param [out] aaa
// \param [in,out] aaa

=== AST:
(FullComment 0x7fe3c10064e0 <:1:3, <invalid loc>>
  (ParagraphComment 0x7fe3c1006040 <:1:3, :1:4>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" "))
  (ParamCommandComment 0x7fe3c1006070 <:1:4, :2:4> [in] implicitly Param="aaa"
    (ParagraphComment 0x7fe3c1006160 <:1:14, :2:4>
      (NewlineComment 0x7fe3c1006110 <:1:14>)
      (TextComment 0x7fe3c1006130 <:2:3, :2:4> Text=" ")))
  (ParamCommandComment 0x7fe3c10061a0 <:2:4, :3:4> [in] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c1006290 <:2:19, :3:4>
      (NewlineComment 0x7fe3c1006240 <:2:19>)
      (TextComment 0x7fe3c1006260 <:3:3, :3:4> Text=" ")))
  (ParamCommandComment 0x7fe3c10062d0 <:3:4, :4:4> [out] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c10063c0 <:3:20, :4:4>
      (NewlineComment 0x7fe3c1006370 <:3:20>)
      (TextComment 0x7fe3c1006390 <:4:3, :4:4> Text=" ")))
  (ParamCommandComment 0x7fe3c1006400 <:4:4, <invalid loc>> [in,out] explicitly Param="aaa"
    (ParagraphComment 0x7fe3c10064b0 <<invalid loc>>)))
[       OK ] CommentParserTest.Test8 (0 ms)
[ RUN      ] CommentParserTest.Test9
=== Source:
// \unknown aaa

=== AST:
(FullComment 0x7fe3c10060e0 <:1:3, :1:16>
  (ParagraphComment 0x7fe3c10060a0 <:1:3, :1:16>
    (TextComment 0x7fe3c1006010 <:1:3, :1:4> Text=" ")
    (InlineCommandComment 0x7fe3c1006040 <:1:4, :1:12>)
    (TextComment 0x7fe3c1006070 <:1:12, :1:16> Text=" aaa")))
[       OK ] CommentParserTest.Test9 (0 ms)
[----------] 9 tests from CommentParserTest (2 ms total)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comments-parser-v0.patch
Type: application/octet-stream
Size: 65597 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120629/dd850b7a/attachment.obj>

