[cfe-dev] Patch to allow comment translators implementation

Mon Jan 11 16:54:50 PST 2010

On Jan 6, 2010, at 1:13 AM, Abramo Bagnara wrote:

> Il 29/12/2009 23:51, Abramo Bagnara ha scritto:
>> Il 29/12/2009 21:08, Chris Lattner ha scritto:
>>>
>>> On Dec 26, 2009, at 7:58 AM, Abramo Bagnara wrote:
>>>
>>>>
>>>> This small patch changes the comment handler in a simple way, so
>>>> that comment translators can be implemented quite easily.
>>>>
>>>> Once this patch is applied, a CommentHandler may build a first
>>>> token to be returned to the Lexer and push a TokenStream for the
>>>> others, which allows a generic comment -> tokens transformer.
>>>>
>>>> This can be useful for transforming comment-shaped program
>>>> annotations that should be translated to source code, and for
>>>> other interesting applications.
>>>
>>> This is an interesting approach.  The only major concern I have is
>>> that this only allows you to translate comments into exactly one
>>> token.  In the case of OpenMP pragmas (for example) this doesn't seem
>
> Have I been sufficiently clear in explaining that the comments may be
> translated to an arbitrary number of tokens by calling
> EnterTokenStream inside the CommentHandler?

Yes, but I find the protocol for introducing tokens via a comment  
handler to be very confusing. Could we instead eliminate the Token  
&token argument, and just make the protocol: to "parse" the contents  
of the comment, use EnterTokenStream and then return true?

Or, at the very least, the "token" argument should be named  
"firstToken", to indicate that it is possible to inject other tokens.  
Of course, HandleComment also needs documentation to describe what the  
parameters and return value actually mean, and how comment handlers  
can introduce tokens into the stream.
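
To make the simplified protocol concrete, here is a minimal sketch of
what I have in mind. It is a toy model and not Clang's actual code:
ToyPreprocessor, ToyCommentHandler, AnnotationHandler, lexAll, the
"@inject" marker, and the NumTranslated counter are all hypothetical
names made up for illustration, and tokens are plain strings. Only the
shape of the protocol (call EnterTokenStream, then return true) comes
from the discussion above.

```cpp
#include <cctype>
#include <cstddef>
#include <deque>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, NOT Clang's classes: a token is just its spelling.
using Token = std::string;
class ToyPreprocessor;

// Hypothetical simplified interface: no "Token &token" out-parameter.
struct ToyCommentHandler {
  virtual ~ToyCommentHandler() = default;
  // Returns true if tokens were pushed with EnterTokenStream.
  virtual bool HandleComment(ToyPreprocessor &PP, const std::string &Text) = 0;
};

class ToyPreprocessor {
  std::string Src;
  std::size_t Pos = 0;
  std::deque<Token> Pending;  // tokens injected by a handler
  ToyCommentHandler *Handler;

  bool atComment() const { return Src.compare(Pos, 2, "/*") == 0; }

public:
  unsigned NumTranslated = 0;  // comments a handler turned into tokens

  ToyPreprocessor(std::string S, ToyCommentHandler *H)
      : Src(std::move(S)), Handler(H) {}

  // Queue tokens so that Lex() returns them before reading more source.
  void EnterTokenStream(const std::vector<Token> &Toks) {
    Pending.insert(Pending.begin(), Toks.begin(), Toks.end());
  }

  // Produces the next token; returns false at end of input.
  bool Lex(Token &Result) {
    for (;;) {
      if (!Pending.empty()) {
        Result = Pending.front();
        Pending.pop_front();
        return true;
      }
      while (Pos < Src.size() && std::isspace((unsigned char)Src[Pos])) ++Pos;
      if (Pos >= Src.size()) return false;
      if (atComment()) {
        std::size_t End = Src.find("*/", Pos + 2);
        if (End == std::string::npos) End = Src.size();
        std::string Body = Src.substr(Pos + 2, End - (Pos + 2));
        Pos = End + 2 < Src.size() ? End + 2 : Src.size();
        if (Handler && Handler->HandleComment(*this, Body)) ++NumTranslated;
        continue;  // injected tokens, if any, are returned first
      }
      // Ordinary token: a run of non-space characters before any comment.
      std::size_t Start = Pos;
      while (Pos < Src.size() && !std::isspace((unsigned char)Src[Pos]) &&
             !atComment())
        ++Pos;
      Result = Src.substr(Start, Pos - Start);
      return true;
    }
  }
};

// Example handler: "/*@inject a b c*/" becomes the tokens a, b, c.
struct AnnotationHandler : ToyCommentHandler {
  bool HandleComment(ToyPreprocessor &PP, const std::string &Text) override {
    std::istringstream In(Text);
    std::string Word;
    if (!(In >> Word) || Word != "@inject") return false;  // plain comment
    std::vector<Token> Toks;
    while (In >> Word) Toks.push_back(Word);
    PP.EnterTokenStream(Toks);
    return true;
  }
};

// Convenience driver: lex a whole buffer with AnnotationHandler installed.
std::vector<Token> lexAll(const std::string &Source) {
  AnnotationHandler H;
  ToyPreprocessor PP(Source, &H);
  std::vector<Token> Out;
  Token T;
  while (PP.Lex(T)) Out.push_back(T);
  return Out;
}
```

With this shape the handler never has to treat the first token
specially: every token it produces goes through EnterTokenStream, and
the return value only says whether the comment was translated.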

>>> rich enough.  A different approach would be to allow the handler to
>>> push an arbitrary number of tokens into the parser's lookahead
>>> buffer.  Would this work for what you're trying to do?
>>
>> Yes, but perhaps this is not needed: as I wrote, the CommentHandler
>> could return a first token *and* produce the other tokens to be read,
>> pushing them onto the lexer stack using EnterTokenStream.
>>
>> I've already tried this with success in a sample implementation that
>> simply lexes the comment content without modifying it:
>
> I still haven't gotten any feedback: do you think that the patch in
> the original mail will be applied as is? Should I improve it in some
> way?

I think it's okay if the HandleComment protocol can be simplified a  
bit and if it is documented, although I'd like to hear from Chris. I'd  
feel much better if we actually had some kind of use of this code path  
within Clang itself. For example, would it be possible for the
keep-comments mode to be implemented outside of the lexer using your
changes to HandleComment? That might actually simplify the lexer while  
making it more general.
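
As a rough illustration of that idea, and only under heavy
assumptions: the sketch below is a toy scanner, not Clang's lexer.
Tok, CommentHandler, scan, and keepComments are hypothetical names,
only "//" comments are recognized, and the handler returns nothing
(unlike HandleComment) to keep the example short. The point is just
that the scanner has no keep-comments flag at all; keeping comments
is one particular handler.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy token, not Clang's Token class.
struct Tok {
  enum Kind { Word, Comment } K;
  std::string Text;
};

// A handler receives the raw comment text and may append tokens to Out.
using CommentHandler =
    std::function<void(const std::string &Raw, std::vector<Tok> &Out)>;

// The scanner has no "keep comments" mode; it always defers to the handler.
std::vector<Tok> scan(const std::string &Src, const CommentHandler &OnComment) {
  std::vector<Tok> Out;
  std::size_t I = 0;
  while (I < Src.size()) {
    if (std::isspace((unsigned char)Src[I])) {
      ++I;
      continue;
    }
    if (Src.compare(I, 2, "//") == 0) {
      std::size_t End = Src.find('\n', I);
      if (End == std::string::npos) End = Src.size();
      if (OnComment) OnComment(Src.substr(I, End - I), Out);
      I = End;
      continue;
    }
    // Ordinary token: a run of non-space characters before any comment.
    std::size_t Start = I;
    while (I < Src.size() && !std::isspace((unsigned char)Src[I]) &&
           Src.compare(I, 2, "//") != 0)
      ++I;
    Out.push_back({Tok::Word, Src.substr(Start, I - Start)});
  }
  return Out;
}

// "Keep comments" expressed as a handler that lives outside the scanner.
void keepComments(const std::string &Raw, std::vector<Tok> &Out) {
  Out.push_back({Tok::Comment, Raw});
}
```

Calling scan(Source, keepComments) retains each comment as a token,
while passing an empty handler drops comments as usual.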

	- Doug