[cfe-dev] Confusing comment on LexTokenInternal
AlisdairM(public)
public at alisdairm.net
Wed Jul 8 11:36:41 PDT 2009
> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com]
> Sent: 07 July 2009 18:25
> To: AlisdairM (public)
> Cc: 'clang-dev Developers'
> Subject: Re: [cfe-dev] Confusing comment on LexTokenInternal
> > Oh, and it gets worse! I've not doubled this again to support user-
> > defined-string-literals, which will also compound the number of
> > character, floating point and integer literals we define. If I
> > follow the existing scheme we will go from 2 string literal token
> > types (tok::string_literal and tok::wide_string_literal) to 20!
>
> Ok, if this is the case, it is probably better to go from two token
> types to one (just string_literal) and have the literal parser stuff
> actually do the categorization. I think the interesting clients all
> use the literal parser anyway.
>
> > So what does this mean in practice?
> >
> > I want to kill tok::wide_string_literal and somehow stuff the
> > encoding into tok::string_literal (char, char16_t, char32_t, wchar_t
> > or u8 special. Options for other languages may be appropriate too).
> > Any advice on how to approach this would be appreciated.
>
> Makes sense to me! Do you actually need to encode this in the
> *Token*? Could you just have StringLiteralParser determine these
> properties?
OK, tried it, and time to scratch that plan already!
The problem is not that string_literal cannot handle the wide_string_literal cases; that was easy to fix up. However, there are a few places in the grammar that require a string literal to be exactly that: a narrow string literal. Examples are #include "myfile" and extern "C".
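The part that was easy to fix up amounts to deriving the encoding from the literal's spelling, which is the kind of categorization suggested above for StringLiteralParser. A minimal C++ sketch, assuming a hypothetical Encoding enum and classifyPrefix helper (neither is Clang's actual API), and ignoring raw strings and suffixes:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical encoding categories, for illustration only; these are not
// Clang's actual enumerators.
enum class Encoding { Narrow, Wide, UTF8, UTF16, UTF32 };

// Determine the encoding from the prefix in front of the opening quote,
// e.g. u8"x" -> UTF8, L"x" -> Wide, "x" -> Narrow.
// Raw-string prefixes and user-defined suffixes are ignored in this sketch.
Encoding classifyPrefix(std::string_view spelling) {
  if (spelling.substr(0, 3) == "u8\"") return Encoding::UTF8;
  if (spelling.substr(0, 2) == "u\"") return Encoding::UTF16;
  if (spelling.substr(0, 2) == "U\"") return Encoding::UTF32;
  if (spelling.substr(0, 2) == "L\"") return Encoding::Wide;
  return Encoding::Narrow;
}
```

The catch is that this only recovers the encoding. Contexts like #include and extern "C" still need to know that the token is an unadorned literal, and that is exactly what a separate token kind already tells them without re-inspecting the spelling.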
Now, I could try to stuff a flag into the token to indicate that it truly is a narrow string literal, but we already get that effect from the two separate token kinds. That scheme seems to be working and is quite well tested by now, so I think we should keep it in place.
The new plan is to repurpose wide_string_literal to cover any annotated string literal, i.e. one with any prefix or suffix. 'Annotated' seems to have other connotations, though, so I'm looking for a better term. In the meantime I'll put the foundation in place for wide_string_literal to handle the 19 other cases that string_literal does not.
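Under this plan the token kind alone separates an unadorned literal from everything else, so narrow-only contexts need no extra flag. A rough C++ sketch of that split, using a hypothetical TokKind enum that mirrors tok::string_literal and tok::wide_string_literal, plus hypothetical helpers kindForSpelling and acceptableAsPlainNarrow; it assumes the spelling is already a well-formed string literal:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins mirroring tok::string_literal and
// tok::wide_string_literal; not Clang's real token kinds.
enum class TokKind { string_literal, wide_string_literal };

// Assumes `spelling` is a well-formed string literal. Only an unprefixed,
// unsuffixed literal keeps string_literal; any prefix (L, u, U, u8, R, ...)
// or suffix (e.g. "abc"_x) selects the repurposed wide_string_literal kind,
// leaving the exact encoding for the literal parser to work out later.
TokKind kindForSpelling(std::string_view spelling) {
  bool hasPrefix = !spelling.empty() && spelling.front() != '"';
  bool hasSuffix = !spelling.empty() && spelling.back() != '"';
  return (hasPrefix || hasSuffix) ? TokKind::wide_string_literal
                                  : TokKind::string_literal;
}

// Contexts such as #include "myfile" and extern "C" only need to compare
// the token kind; no extra flag on the token is required.
bool acceptableAsPlainNarrow(TokKind kind) {
  return kind == TokKind::string_literal;
}
```

For example, "C" maps to string_literal and is accepted by acceptableAsPlainNarrow, while u8"C" or "C"_x map to wide_string_literal and are rejected.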
AlisdairM