[cfe-commits] [PATCH] Support for universal character names in identifiers

Jordan Rose jordan_rose at apple.com
Fri Jan 18 14:56:54 PST 2013


Hi rsmith,

This is a missing piece for C99 conformance.

This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
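
To make the "tentatively read a UCN" step concrete, here is a minimal standalone sketch. It is not the code from the patch: the names tryReadUCN and UCNResult are hypothetical, and it only checks syntactic well-formedness (a backslash, 'u' or 'U', then exactly 4 or 8 hex digits). It does not check whether the resulting code point is allowed in an identifier.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type: the decoded code point and how many
// characters of input the UCN occupied (including the backslash).
struct UCNResult {
  std::uint32_t CodePoint;
  std::size_t Length;
};

// Hypothetical helper: returns a result only if Input starts with a
// syntactically well-formed UCN (\uXXXX or \UXXXXXXXX). On failure the
// caller consumes nothing and can fall back to treating the backslash
// as an ordinary token followed by an identifier.
std::optional<UCNResult> tryReadUCN(std::string_view Input) {
  if (Input.size() < 2 || Input[0] != '\\')
    return std::nullopt;

  std::size_t NumDigits;
  if (Input[1] == 'u')
    NumDigits = 4;
  else if (Input[1] == 'U')
    NumDigits = 8;
  else
    return std::nullopt;

  if (Input.size() < 2 + NumDigits)
    return std::nullopt;

  std::uint32_t Value = 0;
  for (std::size_t I = 0; I < NumDigits; ++I) {
    char C = Input[2 + I];
    std::uint32_t Digit;
    if (C >= '0' && C <= '9')
      Digit = static_cast<std::uint32_t>(C - '0');
    else if (C >= 'a' && C <= 'f')
      Digit = static_cast<std::uint32_t>(C - 'a' + 10);
    else if (C >= 'A' && C <= 'F')
      Digit = static_cast<std::uint32_t>(C - 'A' + 10);
    else
      return std::nullopt; // not a hex digit: not a well-formed UCN
    Value = (Value << 4) | Digit;
  }
  return UCNResult{Value, 2 + NumDigits};
}
```

Returning an empty optional without consuming input is what makes the read tentative: the caller decides what to do with the backslash only after the whole UCN has been validated.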

Because the spelling of an identifier written with UCNs still contains the UCNs themselves, we need to convert them to UTF-8 in Preprocessor::LookUpIdentifierInfo.
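
As an illustration of that conversion, the sketch below rewrites a spelling that contains UCNs into UTF-8. It is an assumption-laden standalone example, not what Preprocessor::LookUpIdentifierInfo does in the patch: expandUCNs, appendUTF8 and hexValue are hypothetical names, and anything that is not a well-formed UCN naming a valid non-surrogate code point is copied through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: append the UTF-8 encoding of CP to Out.
// Assumes CP <= 0x10FFFF and CP is not a surrogate.
void appendUTF8(std::string &Out, std::uint32_t CP) {
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Hypothetical helper: value of a hex digit, or -1 if C is not one.
int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Hypothetical helper: replace each well-formed \uXXXX or \UXXXXXXXX
// in Spelling with the UTF-8 encoding of its code point.
std::string expandUCNs(std::string_view Spelling) {
  std::string Out;
  std::size_t I = 0;
  while (I < Spelling.size()) {
    if (Spelling[I] == '\\' && I + 1 < Spelling.size() &&
        (Spelling[I + 1] == 'u' || Spelling[I + 1] == 'U')) {
      std::size_t N = Spelling[I + 1] == 'u' ? 4 : 8;
      if (I + 2 + N <= Spelling.size()) {
        std::uint32_t CP = 0;
        bool OK = true;
        for (std::size_t J = 0; J < N; ++J) {
          int D = hexValue(Spelling[I + 2 + J]);
          if (D < 0) { OK = false; break; }
          CP = (CP << 4) | static_cast<std::uint32_t>(D);
        }
        bool IsSurrogate = CP >= 0xD800 && CP <= 0xDFFF;
        if (OK && CP <= 0x10FFFF && !IsSurrogate) {
          appendUTF8(Out, CP);
          I += 2 + N;
          continue;
        }
      }
    }
    Out += Spelling[I++]; // ordinary character, copied as-is
  }
  return Out;
}
```

With this normalization, an identifier spelled with a UCN and the same identifier spelled with the raw UTF-8 character produce the same lookup key.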

Valid code that does not use UCNs should see only a minimal performance hit. The added costs are a check after each identifier for non-ASCII characters, a check when converting raw_identifiers to identifiers that they do not contain UCNs, and a check when getting the spelling of an identifier that it does not contain a UCN.

This patch also adds basic support for actual UTF-8 in the source, including treating Unicode whitespace as whitespace.
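
For illustration, a whitespace test on decoded code points could look like the sketch below. This is an assumption, not the table used in the patch: the function name isUnicodeWhitespace is hypothetical, and the ranges listed are the code points carrying the Unicode White_Space property, which may differ from the set the lexer actually accepts.

```cpp
#include <cstdint>

// Hypothetical predicate: true if CP has the Unicode White_Space
// property. The set the patch treats as whitespace may be different.
bool isUnicodeWhitespace(std::uint32_t CP) {
  return (CP >= 0x0009 && CP <= 0x000D) || // tab, LF, VT, FF, CR
         CP == 0x0020 ||                   // space
         CP == 0x0085 ||                   // next line
         CP == 0x00A0 ||                   // no-break space
         CP == 0x1680 ||                   // ogham space mark
         (CP >= 0x2000 && CP <= 0x200A) || // en quad .. hair space
         CP == 0x2028 ||                   // line separator
         CP == 0x2029 ||                   // paragraph separator
         CP == 0x202F ||                   // narrow no-break space
         CP == 0x205F ||                   // medium mathematical space
         CP == 0x3000;                     // ideographic space
}
```

Note that U+200B (zero width space) is not in this set, so a lexer using this predicate would need to handle it separately.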

http://llvm-reviews.chandlerc.com/D312

Files:
  include/clang/Basic/ConvertUTF.h
  include/clang/Basic/DiagnosticLexKinds.td
  include/clang/Lex/Lexer.h
  include/clang/Lex/Token.h
  lib/Lex/Lexer.cpp
  lib/Lex/LiteralSupport.cpp
  lib/Lex/Preprocessor.cpp
  test/CXX/over/over.oper/over.literal/p8.cpp
  test/FixIt/fixit-unicode.c
  test/Preprocessor/ucn-pp-identifier.c
  test/Sema/ucn-cstring.c
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D312.1.patch
Type: text/x-patch
Size: 29500 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130118/80bd8102/attachment.bin>