[cfe-dev] unicode identifiers

Sean Hunt scshunt at csclub.uwaterloo.ca
Tue Jun 21 12:10:57 PDT 2011


On 11-06-21 09:07 AM, Jochen Wilhelmy wrote:
> Hi!
>
> I'd like to use Unicode (UTF-8) identifiers, and for this I simply
> patched the CharInfo table in Lexer.cpp to mark characters 128 to 255
> as CHAR_LETTER. Is this simple solution different from what the
> standard requires, and if so, what would be the correct solution for
> UCN (universal character name) identifiers?
>
> -Jochen

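Yes. The C++ standard lists the characters allowed in identifiers in
Annex E. Marking every byte from 128 to 255 as a letter accepts any
non-ASCII byte, so it also admits code points that are not in that list
as well as malformed UTF-8. We would need to decode the UTF-8 to check
each code point against the list, as well as ignore invalid UTF-8
sequences.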
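A minimal C++ sketch of that approach, assuming a standalone identifier check rather than Clang's actual lexer code. The names decodeUTF8, isAllowedIdentifierChar, isValidIdentifier and kAllowedRanges are hypothetical. The range table is only an abbreviated excerpt of the Annex E ranges and would have to be completed from the standard; the sketch also ignores the extra restriction on characters allowed at the start of an identifier, and it treats malformed UTF-8 as not part of an identifier.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Decode one UTF-8 sequence starting at s[i]. On success, returns the code
// point and advances i past the sequence. On malformed input (bad lead byte,
// truncated sequence, bad continuation byte, overlong encoding, surrogate,
// or a value above U+10FFFF) returns std::nullopt and leaves i unchanged.
std::optional<char32_t> decodeUTF8(std::string_view s, std::size_t &i) {
  if (i >= s.size()) return std::nullopt;
  unsigned char b0 = static_cast<unsigned char>(s[i]);
  std::size_t len;
  char32_t cp;
  if (b0 < 0x80)                { len = 1; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
  else return std::nullopt;                     // stray continuation byte or 0xF8..0xFF
  if (s.size() - i < len) return std::nullopt;  // truncated sequence
  for (std::size_t k = 1; k < len; ++k) {
    unsigned char b = static_cast<unsigned char>(s[i + k]);
    if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
    cp = (cp << 6) | (b & 0x3F);
  }
  static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
  if (cp < minForLen[len]) return std::nullopt;           // overlong encoding
  if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // UTF-16 surrogate
  if (cp > 0x10FFFF) return std::nullopt;                 // beyond Unicode range
  i += len;
  return cp;
}

struct Range { char32_t lo, hi; };

// ABBREVIATED EXCERPT ONLY (assumption): a real table needs every range
// listed in Annex E, kept sorted so it could be binary-searched.
constexpr Range kAllowedRanges[] = {
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x3040, 0xD7FF},
};

bool isAllowedIdentifierChar(char32_t cp) {
  for (const Range &r : kAllowedRanges)
    if (cp >= r.lo && cp <= r.hi) return true;
  return false;
}

// ASCII letters, '_' and (after the first character) digits are accepted
// directly; any non-ASCII byte must start a well-formed UTF-8 sequence
// whose code point is in the allowed table.
bool isValidIdentifier(std::string_view s) {
  if (s.empty()) return false;
  std::size_t i = 0;
  bool first = true;
  while (i < s.size()) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    if (b < 0x80) {
      bool ok = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || b == '_' ||
                (!first && b >= '0' && b <= '9');
      if (!ok) return false;
      ++i;
    } else {
      std::optional<char32_t> cp = decodeUTF8(s, i);  // advances i on success
      if (!cp || !isAllowedIdentifierChar(*cp)) return false;
    }
    first = false;
  }
  return true;
}
```

With this check, "caf\xC3\xA9" (U+00E9) is accepted, while U+00D7 (MULTIPLICATION SIGN), a lone 0xC3 byte, or the overlong pair 0xC0 0x80 are all rejected. A byte-wise CHAR_LETTER table would accept every one of them.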

Sean
