[cfe-commits] [PATCH] Support for universal character names in identifiers

Jordan Rose jordan_rose at apple.com
Thu Jan 17 11:31:32 PST 2013


How about this approach?
- LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method based on the first Unicode character in a token.
- UCNs are validated in readUCN (called by LexTokenInternal and LexIdentifier). The specific identifier restrictions are checked in LexUnicode and LexIdentifier.
- UCNs are recomputed in Preprocessor::LookUpIdentifierInfo because we start with the spelling info there, but all the validation has already happened.
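To make the validation step concrete, here is a minimal standalone sketch of what a readUCN-style helper might do. It is not the code in the attached patch: the name readUCNSketch, its signature, and the use of std::optional are all assumptions for illustration. The range checks follow the C99 6.4.3 constraints as I understand them (no surrogates, nothing below U+00A0 except $, @ and `); the C++ rules differ slightly.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not Clang's readUCN): tries to read a UCN of the form
// \uXXXX or \UXXXXXXXX starting at text[pos]. On success it returns the code
// point and advances pos past the escape; on failure it returns std::nullopt
// and leaves pos unchanged.
std::optional<std::uint32_t> readUCNSketch(std::string_view text,
                                           std::size_t &pos) {
  if (pos + 1 >= text.size() || text[pos] != '\\')
    return std::nullopt;

  std::size_t numDigits;
  if (text[pos + 1] == 'u')
    numDigits = 4;
  else if (text[pos + 1] == 'U')
    numDigits = 8;
  else
    return std::nullopt;

  // Not enough characters left for the required hex digits.
  if (text.size() - (pos + 2) < numDigits)
    return std::nullopt;

  std::uint32_t codePoint = 0;
  for (std::size_t i = 0; i < numDigits; ++i) {
    char c = text[pos + 2 + i];
    std::uint32_t value;
    if (c >= '0' && c <= '9')
      value = static_cast<std::uint32_t>(c - '0');
    else if (c >= 'a' && c <= 'f')
      value = static_cast<std::uint32_t>(c - 'a') + 10;
    else if (c >= 'A' && c <= 'F')
      value = static_cast<std::uint32_t>(c - 'A') + 10;
    else
      return std::nullopt;
    codePoint = (codePoint << 4) | value;
  }

  // Assumed validity rules (C99 6.4.3 as I read it): reject values beyond
  // U+10FFFF, surrogates, and anything below U+00A0 other than $, @ and `.
  if (codePoint > 0x10FFFF)
    return std::nullopt;
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
    return std::nullopt;
  if (codePoint < 0xA0 && codePoint != 0x24 && codePoint != 0x40 &&
      codePoint != 0x60)
    return std::nullopt;

  pos += 2 + numDigits;
  return codePoint;
}
```

A caller in the identifier path would then apply the identifier-specific restrictions to the returned code point separately, which matches the split described above.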

With these known flaws:
- the classification of characters in LexUnicode should be more efficient.
- poor recovery for a non-identifier UCN in an identifier. Right now I just treat it as the end of the identifier, which is the most pedantically correct thing to do, but it's probably not what the user intended.
- still needs more tests, of course

FWIW, though, I'm not sure unifying literal Unicode and UCNs is actually a great idea. The case where it matters most (validation of identifier characters) is pretty easy to separate out into a helper function (and indeed it already is). The other cases (accepting Unicode whitespace and fixits for accidental Unicode) only make sense for literal Unicode, not escaped Unicode.
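For reference, the kind of shared helper I mean looks roughly like the sketch below. Both the literal-Unicode path and the UCN path would decode to a code point and then ask the same question. The function and table names are hypothetical, and the tables are only a small subset of the C11 Annex D ranges as I recall them, so treat them as illustrative rather than complete.

```cpp
#include <cstddef>
#include <cstdint>

// Inclusive range of code points.
struct CodePointRange {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Illustrative subset of the ranges allowed in identifiers (assumed to match
// the start of C11 Annex D.1; NOT a complete table).
constexpr CodePointRange AllowedSubset[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F},
};

// Illustrative subset of ranges not allowed as the first character
// (combining marks; again NOT a complete table).
constexpr CodePointRange NotInitialSubset[] = {
    {0x0300, 0x036F},
};

// Linear scan over a small table; a real lexer would want something faster.
template <std::size_t N>
bool containsCodePoint(const CodePointRange (&ranges)[N], std::uint32_t cp) {
  for (const CodePointRange &r : ranges)
    if (cp >= r.lo && cp <= r.hi)
      return true;
  return false;
}

// Hypothetical shared check for non-ASCII identifier characters. ASCII
// characters are assumed to be handled by the normal lexer fast path.
bool isAllowedIdentifierCharSketch(std::uint32_t cp, bool isFirst) {
  if (!containsCodePoint(AllowedSubset, cp))
    return false;
  if (isFirst && containsCodePoint(NotInitialSubset, cp))
    return false;
  return true;
}
```

With a helper like this, the only thing the two paths share is the final check; whitespace handling and fixits stay on the literal-Unicode side.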

Anyway, what do you think?
Jordan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UCNs.patch
Type: application/octet-stream
Size: 25350 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130117/4322bc34/attachment.obj>

