r193067 - Lex: Don't restrict legal UCNs when preprocessing assembly

Justin Bogner mail at justinbogner.com
Sun Oct 20 22:02:28 PDT 2013


Author: bogner
Date: Mon Oct 21 00:02:28 2013
New Revision: 193067

URL: http://llvm.org/viewvc/llvm-project?rev=193067&view=rev
Log:
Lex: Don't restrict legal UCNs when preprocessing assembly

The C and C++ standards disallow using universal character names to
refer to some characters, such as basic ASCII and control characters,
so we reject these sequences in the lexer. However, when the
preprocessor isn't being used on C or C++, it doesn't make sense to
apply these restrictions.

Notably, accepting these characters avoids issues with Unicode escapes
when GHC uses the compiler as a preprocessor on Haskell sources.

Fixes rdar://problem/14742289
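
As a rough illustration of the behavior described in the log above, the
following standalone sketch models only the C99 6.4.3p2 rule quoted in the
Lexer.cpp hunk, plus the new early return for assembly mode. It is not the
code from Lexer::tryReadUCN: the names SketchLangOpts and filterUCN are
hypothetical, the use of std::optional (C++17) to signal rejection is an
assumption made for the example, and the real lexer also emits diagnostics
and has further language-mode-specific handling that is omitted here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the one language option the patch consults.
struct SketchLangOpts {
  bool AsmPreprocessor = false;
};

// Returns the code point if the UCN is acceptable under this simplified
// model, or std::nullopt if it would be rejected.
std::optional<uint32_t> filterUCN(uint32_t CodePoint,
                                  const SketchLangOpts &Opts) {
  // Assembly mode: skip the C-family restrictions entirely, mirroring the
  // early return added by this revision.
  if (Opts.AsmPreprocessor)
    return CodePoint;

  // C99 6.4.3p2: nothing below 00A0 except 0024 ($), 0040 (@), 0060 (`).
  if (CodePoint < 0xA0) {
    if (CodePoint == 0x24 || CodePoint == 0x40 || CodePoint == 0x60)
      return CodePoint;
    return std::nullopt;
  }

  // C99 6.4.3p2: nothing in the range D800 through DFFF inclusive.
  if (CodePoint >= 0xD800 && CodePoint <= 0xDFFF)
    return std::nullopt;

  return CodePoint;
}
```

Under this model, the four escapes used in the new test line (\u0020,
\u0030, \u0080, \u0000) are all rejected when AsmPreprocessor is false and
all pass through unchanged when it is true.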

Modified:
    cfe/trunk/lib/Lex/Lexer.cpp
    cfe/trunk/test/Preprocessor/assembler-with-cpp.c

Modified: cfe/trunk/lib/Lex/Lexer.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=193067&r1=193066&r2=193067&view=diff
==============================================================================
--- cfe/trunk/lib/Lex/Lexer.cpp (original)
+++ cfe/trunk/lib/Lex/Lexer.cpp Mon Oct 21 00:02:28 2013
@@ -2730,6 +2730,10 @@ uint32_t Lexer::tryReadUCN(const char *&
     StartPtr = CurPtr;
   }
 
+  // Don't apply C family restrictions to UCNs in assembly mode
+  if (LangOpts.AsmPreprocessor)
+    return CodePoint;
+
   // C99 6.4.3p2: A universal character name shall not specify a character whose
   //   short identifier is less than 00A0 other than 0024 ($), 0040 (@), or
   //   0060 (`), nor one in the range D800 through DFFF inclusive.)

Modified: cfe/trunk/test/Preprocessor/assembler-with-cpp.c
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Preprocessor/assembler-with-cpp.c?rev=193067&r1=193066&r2=193067&view=diff
==============================================================================
--- cfe/trunk/test/Preprocessor/assembler-with-cpp.c (original)
+++ cfe/trunk/test/Preprocessor/assembler-with-cpp.c Mon Oct 21 00:02:28 2013
@@ -72,6 +72,9 @@
 11: T11(b)
 // CHECK-Identifiers-True: 11: #0
 
+// Universal character names can specify basic ascii and control characters
+12: \u0020\u0030\u0080\u0000
+// CHECK-Identifiers-False: 12: \u0020\u0030\u0080\u0000
 
 // This should not crash
 // rdar://8823139




