[cfe-commits] [PATCH] Lexer for structured comments

Tue Jun 26 13:21:09 PDT 2012

On Jun 26, 2012, at 12:37 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> On Tue, Jun 26, 2012 at 11:30 AM, Douglas Gregor <dgregor at apple.com> wrote:
>> On Jun 25, 2012, at 2:37 PM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>>> On Mon, Jun 25, 2012 at 10:45 AM, Douglas Gregor <dgregor at apple.com> wrote:
>>>> +  /// Registered verbatim-like block commands.
>>>> +  VerbatimBlockCommandVector VerbatimBlockCommands;
>>>> <snip>
>>>> +  /// Registered verbatim-like line commands.
>>>> +  VerbatimLineCommandVector VerbatimLineCommands;
>>>> <snip>
>>>> +  /// \brief Register a new verbatim block command.
>>>> +  void addVerbatimBlockCommand(StringRef BeginName, StringRef EndName);
>>>> +
>>>> +  /// \brief Register a new verbatim line command.
>>>> +  void addVerbatimLineCommand(StringRef Name);
>>>> 
>>>> Do we need this much generality? It seems like a StringSwitch for each case (verbatim block and verbatim line) would suffice for now, and we could optimize that later by TableGen'ing the string matcher for the various Doxygen/HeaderDoc/Qt/JavaDoc command names.
>>> 
>>> Converted to StringSwitch, but retained the vector search.  I think we
>>> need both: string matcher to handle predefined commands and a vector
>>> to handle commands registered dynamically.
>> 
>> Are we expecting that commands registered dynamically will have their own parsers that build custom AST nodes?
> 
> They might have their own parsers (there is nothing in the lexer that
> makes this impossible).  I don't think custom AST nodes are possible
> because much of the logic in the AST relies on all AST node classes
> being known in advance, and we might want to copy that logic to the
> comment AST.

That doesn't prohibit having a single "Custom" AST node kind that carries some sort of dynamic discriminator. Clang itself would have to ignore such nodes, but if we're going to have custom parsers, they'll presumably need somewhere to record their results. Anyway, this is me speculating far into the potential future.
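
As a rough sketch of that idea: one fixed "Custom" kind in the closed set of node kinds, plus an ID chosen at run time by whoever registered the custom parser. Every name below (CommentKind, CustomComment, CustomID, getCustomPayload) is hypothetical and not taken from the patch.

```cpp
#include <string>
#include <utility>

// Hypothetical closed set of comment AST node kinds. The set stays known
// in advance; "Custom" is the single escape hatch.
enum class CommentKind { Text, BlockCommand, Custom };

struct Comment {
  CommentKind Kind;
  explicit Comment(CommentKind K) : Kind(K) {}
  virtual ~Comment() = default;
};

// The one custom node kind. CustomID is the dynamic discriminator: the core
// never interprets it, only the client that registered the parser does.
struct CustomComment : Comment {
  unsigned CustomID;
  std::string Payload; // whatever the custom parser chose to record
  CustomComment(unsigned ID, std::string P)
      : Comment(CommentKind::Custom), CustomID(ID), Payload(std::move(P)) {}
};

// Returns the payload if C is a custom node carrying the given ID,
// otherwise nullptr.
const std::string *getCustomPayload(const Comment &C, unsigned ID) {
  if (C.Kind != CommentKind::Custom)
    return nullptr;
  const auto &Custom = static_cast<const CustomComment &>(C);
  return Custom.CustomID == ID ? &Custom.Payload : nullptr;
}
```

Code that switches over CommentKind still sees a fixed list of kinds and can skip Custom nodes entirely.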

> Here is how I see a classification of Doxygen commands:
> -- verbatim block commands: starting command, text, end command
> -- verbatim line commands: starting command, text, newline or comment end
> -- block commands (for example, \brief, \param): starting command,
> args, text (with inline commands), newline-newline or comment end or
> other block command
> -- inline commands (for example, \c): starting command, args (number
> of args depends on command)
> 
> Adding verbatim commands is easy: just pass the names down to the lexer.
> Adding block commands and inline commands is easy as long as:
> (a) there are no optional arguments
> (b) no special argument parsing is required
> (c) there is no command nesting -- an argument should be a word, a
> quoted string or any other atomic text node.
> 
> An exception is, for example, the \param command, which accepts an
> optional direction argument:
> \param [in] foo
> \param foo
> 
> To distinguish between these the parser has to know about \param.

Sure. \param also happens to be one of the commands that is most important to bake into the parser anyway. Hopefully most custom commands will be simpler.
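
To make the \param case concrete, here is a minimal sketch of the special-casing it needs: check for a leading "[...]" before reading the parameter name. ParamArgs and parseParamArgs are hypothetical names for illustration, and this is not how the patch itself is structured.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type: Direction is empty when no "[dir]" was given.
struct ParamArgs {
  std::string Direction;
  std::string Name;
};

static std::string_view skipSpace(std::string_view S) {
  while (!S.empty() && std::isspace(static_cast<unsigned char>(S.front())))
    S.remove_prefix(1);
  return S;
}

// Text is whatever follows "\param" on the line.
ParamArgs parseParamArgs(std::string_view Text) {
  ParamArgs Result;
  Text = skipSpace(Text);
  // Optional direction argument, e.g. "[in]". An unterminated '[' is not
  // treated as a direction and ends up as part of the name.
  if (!Text.empty() && Text.front() == '[') {
    std::size_t Close = Text.find(']');
    if (Close != std::string_view::npos) {
      Result.Direction = std::string(Text.substr(1, Close - 1));
      Text = skipSpace(Text.substr(Close + 1));
    }
  }
  // The parameter name is the next whitespace-delimited word.
  std::size_t End = 0;
  while (End < Text.size() &&
         !std::isspace(static_cast<unsigned char>(Text[End])))
    ++End;
  Result.Name = std::string(Text.substr(0, End));
  return Result;
}
```

A generic "N word arguments" rule cannot express the optional bracket, which is why this logic has to know it is looking at \param.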

> As long as we don't support adding such commands with optional
> arguments, registering commands dynamically should be easy.  And
> should someone need a block or inline command with optional arguments,
> it should be converted to a verbatim line command.  (In most cases
> this is easily possible.)  Then the complex parsing can be done in
> CommentSema.  Not the cleanest approach, but the easiest for those who
> will implement custom parsing.

Sounds reasonable.
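
For reference, the two-tier lookup discussed earlier in the thread (a fixed matcher for predefined commands, plus a vector for dynamically registered ones) might look roughly like the sketch below. It uses plain comparisons and std::string_view in place of llvm::StringSwitch and StringRef so that it stands alone; the class name and the predefined command list are assumptions for illustration, and only addVerbatimLineCommand mirrors a name from the patch.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the lexer's verbatim line command table.
class VerbatimLineCommandTable {
  // Commands registered at run time.
  std::vector<std::string> Registered;

  // Fixed matcher for predefined commands. The names listed here are an
  // illustrative assumption, not the list from the patch.
  static bool isPredefined(std::string_view Name) {
    return Name == "fn" || Name == "var" || Name == "typedef";
  }

public:
  /// Register a new verbatim line command.
  void addVerbatimLineCommand(std::string_view Name) {
    Registered.push_back(std::string(Name));
  }

  /// True if Name is predefined or was registered dynamically.
  bool isVerbatimLineCommand(std::string_view Name) const {
    if (isPredefined(Name))
      return true;
    // Linear search; the registered list is expected to stay short.
    for (const std::string &R : Registered)
      if (std::string_view(R) == Name)
        return true;
    return false;
  }
};
```

A command with optional arguments could be registered this way, with its text handed to later custom parsing as a single verbatim line.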

>> Updated patch looks great!
> 
> Thanks for the review!  May I commit?

Yes, go ahead and commit. Sorry if I wasn't clear.

	- Doug