[cfe-dev] Patch to allow comment translators implementation

Tue Jan 12 04:47:08 PST 2010

On 12/01/2010 01:54, Douglas Gregor wrote:
> 
> On Jan 6, 2010, at 1:13 AM, Abramo Bagnara wrote:
> 
>> On 29/12/2009 23:51, Abramo Bagnara wrote:
>>> On 29/12/2009 21:08, Chris Lattner wrote:
>>>>
>>>> On Dec 26, 2009, at 7:58 AM, Abramo Bagnara wrote:
>>>>
>>>>>
>>>>> This small patch changes the comment handlers in a simple way that
>>>>> makes it quite easy to implement comment translators.
>>>>>
>>>>> Once this patch is applied, a CommentHandler is allowed to build a
>>>>> first token to be returned to the Lexer and to push a TokenStream
>>>>> for the others, thus allowing a generic comment -> tokens
>>>>> transformer.
>>>>>
>>>>> This can be useful for transforming comment-shaped program
>>>>> annotations that should be translated to source code, and for other
>>>>> interesting applications.
>>>>
>>>> This is an interesting approach.  The only major concern I have is
>>>> that this only allows you to translate comments into exactly one
>>>> token.  In the case of openmp pragmas (for example) this doesn't seem
>>
>> Have I been sufficiently clear in explaining that the comments may be
>> translated to an arbitrary number of tokens by calling EnterTokenStream
>> inside the CommentHandler?
> 
> Yes, but I find the protocol for introducing tokens via a comment
> handler to be very confusing. Could we instead eliminate the Token
> &token argument, and just make the protocol: to "parse" the contents of
> the comment, use EnterTokenStream and then return true?

The only proposal I see that would not change a lot of things is to
remove the Token argument from CommentHandler and to specify that
handlers must always use EnterTokenStream, even when they need to insert
only one token.
Then, at the end of Preprocessor::HandleComment, if any of the comment
handlers has told the caller that it inserted some tokens, the first
one available is fetched and returned by Preprocessor::HandleComment.
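
A minimal sketch of this variant, written against toy stand-ins rather
than the real Clang classes. MiniPreprocessor, AnnotationHandler and the
"/*@" annotation marker are hypothetical names chosen for illustration,
and EnterTokenStream is modelled here as a simple FIFO queue, which is a
simplification of the real lexer stack:

```cpp
#include <cassert>
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a token; NOT clang::Token.
struct Token { std::string Text; };

class MiniPreprocessor;

// Hypothetical handler interface for the variant: no Token argument.
// A handler returns true if it injected tokens via EnterTokenStream.
struct CommentHandler {
  virtual ~CommentHandler() = default;
  virtual bool HandleComment(MiniPreprocessor &PP,
                             const std::string &Comment) = 0;
};

// Toy stand-in for the preprocessor; NOT clang::Preprocessor.
class MiniPreprocessor {
  std::deque<Token> Pending;               // simplified model of the lexer stack
  std::vector<CommentHandler *> Handlers;

public:
  void AddCommentHandler(CommentHandler *H) { Handlers.push_back(H); }

  // Simplified EnterTokenStream: queue tokens to be lexed next.
  void EnterTokenStream(const std::vector<Token> &Toks) {
    Pending.insert(Pending.end(), Toks.begin(), Toks.end());
  }

  // Fetch the next pending token, if any.
  bool Lex(Token &Result) {
    if (Pending.empty())
      return false;
    Result = Pending.front();
    Pending.pop_front();
    return true;
  }

  // Run every handler; if any reported injected tokens, fetch the first
  // available one and return it to the caller.
  bool HandleComment(Token &Result, const std::string &Comment) {
    bool AnyInjected = false;
    for (CommentHandler *H : Handlers)
      if (H->HandleComment(*this, Comment))
        AnyInjected = true;
    if (!AnyInjected)
      return false;
    bool Fetched = Lex(Result);
    assert(Fetched && "handler reported tokens but pushed none");
    (void)Fetched;
    return true;
  }
};

// Illustrative handler: turns "/*@ a b */" into the tokens "a" and "b".
struct AnnotationHandler : CommentHandler {
  bool HandleComment(MiniPreprocessor &PP,
                     const std::string &Comment) override {
    const std::string Open = "/*@", Close = "*/";
    if (Comment.size() < Open.size() + Close.size() ||
        Comment.compare(0, Open.size(), Open) != 0 ||
        Comment.compare(Comment.size() - Close.size(), Close.size(), Close) != 0)
      return false;                        // ordinary comment, nothing injected
    std::istringstream Body(Comment.substr(
        Open.size(), Comment.size() - Open.size() - Close.size()));
    std::vector<Token> Toks;
    for (std::string Word; Body >> Word;)
      Toks.push_back(Token{Word});
    if (Toks.empty())
      return false;
    PP.EnterTokenStream(Toks);             // used even for a single token
    return true;
  }
};
```

In this sketch a handler that needs only one token still goes through
EnterTokenStream, which is the small inefficiency of the variant.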

However, having an argument for the returned Token is perfectly
consistent with every other Handler.

> Or, at the very least, the "token" argument should be named
> "firstToken", to indicate that it is possible to inject other tokens. Of
> course, HandleComment also needs documentation to describe what the
> parameters and return value actually mean, and how comment handlers can
> introduce tokens into the stream.

Yes, I will do all that before submitting the final version of the
patch, once we have agreed on the chosen approach.

>>>> rich enough.  A different approach would be to allow the handler to
>>>> push an arbitrary number of tokens into the parser's lookahead
>>>> buffer.  Would this work for what you're trying to do?
>>>
>>> Yes, but perhaps this is not needed: as I wrote, the CommentHandler
>>> could return a first token *and* produce the other tokens to be read
>>> and push them onto the lexer stack using EnterTokenStream.
>>>
>>> I've already tried this with success in a sample implementation that
>>> simply lexes the comment content without modifying it:
>>
>> I still haven't received any feedback: do you think that the patch in
>> the original mail will be applied as is? Should I improve it in some way?
> 
> 
> I think it's okay if the HandleComment protocol can be simplified a bit
> and if it is documented, although I'd like to hear from Chris. I'd feel
> much better if we actually had some kind of use of this code path within
> Clang itself. For example, would it be possible for the keep-comments
> mode to be implemented outside of the lexer using your changes to
> HandleComment? That might actually simplify the lexer while making it
> more general.

... hmmm, it's probably feasible, but I'm not so sure it's a good idea,
because it means we would always need a CommentHandler loaded, and we
would have the problem of CommentHandler execution order, as only the
first CommentHandler that wants to return a first Token is allowed to do
so (unless we implement the variant above).

With the proposed patch, instead, a CommentHandler that wants to
transform comments always has priority over keep-comments mode.
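
To illustrate the ordering problem, here is a rough sketch of one
plausible reading of the protocol with a first-token argument. The types
FirstTokenHandler and FixedToken and the function DispatchComment are
hypothetical, and stopping at the first handler that returns true is an
assumption about the dispatch, not a copy of the patch:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a token; NOT clang::Token.
struct Token { std::string Text; };

// Hypothetical handler type: the handler fills FirstToken and returns
// true when it wants that token returned to the lexer.
struct FirstTokenHandler {
  virtual ~FirstTokenHandler() = default;
  virtual bool HandleComment(Token &FirstToken,
                             const std::string &Comment) = 0;
};

// Assumed dispatch: the first handler that claims the comment wins.
bool DispatchComment(const std::vector<FirstTokenHandler *> &Handlers,
                     Token &FirstToken, const std::string &Comment) {
  for (FirstTokenHandler *H : Handlers) {
    assert(H && "null handler registered");
    if (H->HandleComment(FirstToken, Comment))
      return true;   // handlers registered later cannot supply a first token
  }
  return false;      // nobody claimed it; keep-comments mode could apply here
}

// Illustrative handler that always answers with a fixed token.
struct FixedToken : FirstTokenHandler {
  std::string Text;
  explicit FixedToken(std::string T) : Text(std::move(T)) {}
  bool HandleComment(Token &FirstToken, const std::string &) override {
    FirstToken.Text = Text;
    return true;
  }
};
```

With two FixedToken handlers registered, the result depends only on the
registration order, which is why a keep-comments handler would compete
with any translating handler.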

That aside, I'd like to proceed in steps. What do you think about a
resubmission of the proposed patch with the Token argument renamed to
FirstToken and proper documentation of the HandleComment protocol?

Or would you prefer that I remove the Token argument, as described in
the variant above, although it's a bit less efficient for returning a
single token?

-- 
Abramo Bagnara

Opera Unica                          Phone: +39.0546.656023
Via Borghesi, 16
48014 Castel Bolognese (RA) - Italy