[cfe-commits] [PATCH] Lexer for structured comments

Tue Jun 26 12:37:54 PDT 2012

On Tue, Jun 26, 2012 at 11:30 AM, Douglas Gregor <dgregor at apple.com> wrote:
> On Jun 25, 2012, at 2:37 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>> On Mon, Jun 25, 2012 at 10:45 AM, Douglas Gregor <dgregor at apple.com> wrote:
>>> +  /// Registered verbatim-like block commands.
>>> +  VerbatimBlockCommandVector VerbatimBlockCommands;
>>> <snip>
>>> +  /// Registered verbatim-like line commands.
>>> +  VerbatimLineCommandVector VerbatimLineCommands;
>>> <snip>
>>> +  /// \brief Register a new verbatim block command.
>>> +  void addVerbatimBlockCommand(StringRef BeginName, StringRef EndName);
>>> +
>>> +  /// \brief Register a new verbatim line command.
>>> +  void addVerbatimLineCommand(StringRef Name);
>>>
>>> Do we need this much generality? It seems like a StringSwitch for each case (verbatim block and verbatim line) would suffice for now, and we could optimize that later by TableGen'ing the string matcher for the various Doxygen/HeaderDoc/Qt/JavaDoc command names.
>>
>> Converted to StringSwitch, but retained the vector search.  I think we
>> need both: string matcher to handle predefined commands and a vector
>> to handle commands registered dynamically.
>
> Are we expecting that commands registered dynamically will have their own parsers that build custom AST nodes?

They might have their own parsers (nothing in the lexer makes this
impossible).  I don't think custom AST nodes are possible, though:
much of the logic in the AST relies on all AST node classes being
known in advance, and we might want to reuse that logic for the
comment AST.

Here is how I would classify Doxygen commands:
-- verbatim block commands: starting command, text, end command
-- verbatim line commands: starting command, text, newline or comment end
-- block commands (for example, \brief, \param): starting command,
args, text (with inline commands), newline-newline or comment end or
other block command
-- inline commands (for example, \c): starting command, args (number
of args depends on command)
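
To make the classification and the "string matcher plus vector" lookup
concrete, here is a minimal standard-library sketch (C++17).  It is not
the code in the patch: CommandKind, CommandTable and classify() are
hypothetical names, the chain of comparisons only stands in for a
StringSwitch, and the predefined names are a small illustrative subset.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical names; this only illustrates the four kinds listed above.
enum class CommandKind { VerbatimBlock, VerbatimLine, Block, Inline, Unknown };

class CommandTable {
  // Verbatim line commands registered at run time (the "vector" part).
  std::vector<std::string> ExtraVerbatimLine;

public:
  void addVerbatimLineCommand(std::string_view Name) {
    ExtraVerbatimLine.emplace_back(Name);
  }

  CommandKind classify(std::string_view Name) const {
    // Predefined commands: a stand-in for the StringSwitch.
    // Illustrative subset only, not the full Doxygen command list.
    if (Name == "verbatim" || Name == "code")
      return CommandKind::VerbatimBlock;
    if (Name == "brief" || Name == "param")
      return CommandKind::Block;
    if (Name == "c")
      return CommandKind::Inline;

    // Dynamically registered commands: linear search of the vector.
    bool Registered = std::any_of(
        ExtraVerbatimLine.begin(), ExtraVerbatimLine.end(),
        [&](const std::string &S) { return std::string_view(S) == Name; });
    return Registered ? CommandKind::VerbatimLine : CommandKind::Unknown;
  }
};
```

A name such as "mycmd" classifies as Unknown until
addVerbatimLineCommand("mycmd") is called; after that it is reported as
a verbatim line command.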

Adding verbatim commands is easy: just pass the names down to the lexer.
Adding block commands and inline commands is easy as long as:
(a) there are no optional arguments;
(b) no special argument parsing is required;
(c) there is no command nesting -- each argument should be a word, a
quoted string or any other atomic text node.

An exception is, for example, the \param command, which accepts an
optional direction argument:
\param [in] foo
\param foo

To distinguish between these, the parser has to know about \param.
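
A rough sketch of the command-specific knowledge this requires (C++17,
standard library only).  ParamArgs and parseParamArgs() are made-up
names for illustration, not part of the patch, and an unterminated '['
is deliberately left unhandled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the optional direction and the parameter name.
struct ParamArgs {
  std::optional<std::string> Direction; // e.g. "in"; empty if not given
  std::string Name;
};

// Text is whatever follows "\param" on the line.
inline ParamArgs parseParamArgs(std::string_view Text) {
  auto SkipSpaces = [](std::string_view &S) {
    while (!S.empty() && (S.front() == ' ' || S.front() == '\t'))
      S.remove_prefix(1);
  };

  ParamArgs Result;
  SkipSpaces(Text);

  // The optional argument is recognized only by its leading '['.  A
  // generic "word argument" parser would take "[in]" as the name.
  if (!Text.empty() && Text.front() == '[') {
    std::size_t Close = Text.find(']');
    if (Close != std::string_view::npos) {
      Result.Direction = std::string(Text.substr(1, Close - 1));
      Text.remove_prefix(Close + 1);
      SkipSpaces(Text);
    }
  }

  // The parameter name is the next whitespace-delimited word.
  std::size_t End = Text.find_first_of(" \t\n");
  Result.Name = std::string(Text.substr(0, End));
  return Result;
}
```

With this, parseParamArgs(" [in] foo") yields direction "in" and name
"foo", while parseParamArgs(" foo") yields no direction and name "foo".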

As long as we don't support adding commands with optional arguments,
registering commands dynamically should be easy.  Should someone need
a block or inline command with optional arguments, it can be
registered as a verbatim line command instead (in most cases this is
easy to do).  The complex parsing can then be done in CommentSema.
Not the cleanest approach, but the easiest for those who will
implement custom parsing.
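
To illustrate the verbatim line fallback: the lexer only has to hand
over the raw text up to the end of the line, and the custom parsing can
happen later.  This is a sketch under my own assumptions (C++17);
VerbatimLine and lexVerbatimLine() are hypothetical names, not the
patch's lexer interface.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical token for a verbatim line command.
struct VerbatimLine {
  std::string_view Name; // command name without the leading backslash
  std::string_view Text; // raw, unparsed text up to the end of the line
};

// Precondition: Input starts at the backslash of a command that is
// already known to be a verbatim line command.
inline VerbatimLine lexVerbatimLine(std::string_view Input) {
  Input.remove_prefix(1); // skip the backslash

  VerbatimLine Result;
  std::size_t NameEnd = Input.find_first_of(" \t\n");
  Result.Name = Input.substr(0, NameEnd);
  if (NameEnd == std::string_view::npos)
    return Result; // the command name is the last thing in the comment

  // Everything up to the newline (or comment end) is kept verbatim;
  // optional arguments are not interpreted at this stage.
  std::string_view Rest = Input.substr(NameEnd);
  Result.Text = Rest.substr(0, Rest.find('\n'));
  return Result;
}
```

For the input "\mycmd [opt] arg" followed by a newline, Name is "mycmd"
and Text is " [opt] arg"; a later stage can then parse "[opt]" however
the custom command requires.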

> Updated patch looks great!

Thanks for the review!  May I commit?

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/