[cfe-dev] Confusing comment on LexTokenInternal

Wed Jul 8 11:36:41 PDT 2009

> -----Original Message-----
> From: Chris Lattner [mailto:clattner at apple.com]
> Sent: 07 July 2009 18:25
> To: AlisdairM (public)
> Cc: 'clang-dev Developers'
> Subject: Re: [cfe-dev] Confusing comment on LexTokenInternal

> > Oh, and it gets worse!  I've not doubled this again to support user-
> > defined-string-literals, which will also compound the number of
> > character, floating point and integer literals we define.  If I
> > follow the existing scheme we will go from 2 string literal token
> > types (tok::string_literal and tok::wide_string_literal) to 20!
> 
> Ok, if this is the case, it is probably better to go from two token
> types to one (just string_literal) and have the literal parser stuff
> actually do the categorization.  I think the interesting clients are
> all using the literal parser anyway.
> 
> > So what does this mean in practice?
> >
> > I want to kill tok::wide_string_literal and somehow stuff the
> > encoding into tok::string_literal (char, char16_t, char32_t, wchar_t
> > or u8 special. Options for other languages may be appropriate too).
> > Any advice on how to approach this appreciated.
> 
> Makes sense to me!  Do you actually need to encode this in the
> *Token*?  Could you just have StringLiteralParser determine these
> properties?

OK, tried it, and time to scratch that plan already!
The problem is not that string_literal cannot handle the wide_string_literal cases; that was easy to fix up.  However, there are a few places in the grammar that require a string literal to be exactly that: a narrow string literal.  Examples are #include "myfile" and extern "C".
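
To make the restriction concrete, here is a minimal sketch of a grammar site that only accepts a narrow literal. TokKind, Tok and isValidLinkageSpec are hypothetical stand-ins invented for illustration, not Clang's actual declarations; the point is only that with a separate token kind the check is a single comparison.

```cpp
#include <string>

// Hypothetical stand-ins for the token kinds discussed in this thread;
// these are NOT Clang's real declarations.
enum class TokKind { string_literal, wide_string_literal, other };

struct Tok {
  TokKind kind;
  std::string spelling;  // source text of the token, including the quotes
};

// A linkage specification such as extern "C" must be a plain narrow
// string literal, so the token kind alone rejects L"C" and friends.
bool isValidLinkageSpec(const Tok &t) {
  return t.kind == TokKind::string_literal &&
         (t.spelling == "\"C\"" || t.spelling == "\"C++\"");
}
```

If everything were folded into one token kind, this check would have to re-inspect the spelling (or a flag on the token) to rule out prefixed forms.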

Now I could try to stuff a flag into the token to indicate it truly is a narrow string literal, but we already get that effect from the two separate tokens.  That seems to be working and is quite well tested by now, so I think we should keep it in place.

The new plan is to repurpose wide_string_literal to cover any annotated string literal, i.e. one with any prefix or suffix.  'Annotated' seems to have other connotations though, so I'm looking for a better term.  In the meantime I'll put the foundation in for wide_string_literal to handle the 19 other cases that string_literal does not.
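
As a rough illustration of the plain/annotated split, the sketch below classifies a literal from its spelling alone. It assumes the 20 cases break down as 5 encodings (none, u8, u, U, L) x raw or not x with or without a user-defined suffix; that breakdown is an assumption, not something spelled out above, and Encoding, LiteralShape and classify are hypothetical names rather than existing Clang code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical description of a string literal's spelling.
enum class Encoding { Narrow, UTF8, UTF16, UTF32, Wide };

struct LiteralShape {
  Encoding encoding = Encoding::Narrow;
  bool raw = false;        // has an R prefix
  bool hasSuffix = false;  // has text after the closing quote
  // Only the unadorned narrow form would remain a plain string_literal;
  // every other combination would go to the repurposed token kind.
  bool isPlain() const {
    return encoding == Encoding::Narrow && !raw && !hasSuffix;
  }
};

// Classify by spelling. Assumes `s` is exactly one well-formed literal token.
LiteralShape classify(const std::string &s) {
  LiteralShape shape;
  std::size_t i = 0;
  if (s.compare(0, 2, "u8") == 0) {
    shape.encoding = Encoding::UTF8;
    i = 2;
  } else if (!s.empty() && s[0] == 'u') {
    shape.encoding = Encoding::UTF16;
    i = 1;
  } else if (!s.empty() && s[0] == 'U') {
    shape.encoding = Encoding::UTF32;
    i = 1;
  } else if (!s.empty() && s[0] == 'L') {
    shape.encoding = Encoding::Wide;
    i = 1;
  }
  // An R directly after the optional encoding prefix marks a raw literal.
  if (i < s.size() && s[i] == 'R') {
    shape.raw = true;
  }
  // Anything following the final double quote is treated as a suffix.
  const std::size_t lastQuote = s.rfind('"');
  shape.hasSuffix =
      lastQuote != std::string::npos && lastQuote + 1 < s.size();
  return shape;
}
```

With this shape, the lexer would only need to ask isPlain() to pick between the two token kinds, and the literal parser could recover the finer details later.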

AlisdairM