[cfe-commits] [PATCH] Validate C99 and C++03's set of UCNs

Thu Jan 24 17:07:15 PST 2013


================
Comment at: lib/Lex/CharSets.inc:1
@@ +1,2 @@
+//===--- CharSets.inc - Contains important sets of Unicode characters -----===//
+//
----------------
Why .inc, not a normal self-contained header file?

================
Comment at: lib/Lex/CharSets.inc:208
@@ +207,3 @@
+  // Digits (1)
+  { 0x0660, 0x0669 }, { 0x06F0, 0x06F9 },
+
----------------
The first of these ranges is in the wrong place, and will confuse the binary search.

================
Comment at: lib/Lex/CharSets.inc:379-381
@@ +378,5 @@
+
+// C99 6.4.2.1p3: The initial character [of an identifier] shall not be a
+// universal character name designating a digit.
+// C99 Annex D defines these characters as "Digits".
+static const UnicodeCharRange C99DisallowedInitialIDChars[] = {
----------------
How sure are you about this? It might also be referring to the term /digit/ defined in 6.4.2.1/1 as 'one of 0 1 2 3 4 5 6 7 8 9', and that would be more consistent with C11's rules, in which all of these characters may start an identifier, and only combining characters are ruled out at the start of an identifier.

By the other interpretation of this rule, it would be redundant given that such UCNs would be ill-formed by 6.4.3/2, but I'm really not sure about the intent here.

================
Comment at: include/clang/Basic/DiagnosticLexKinds.td:124-126
@@ -123,1 +123,5 @@
 def err_ucn_escape_invalid : Error<"invalid universal character">;
+def warn_cxx11_ucn_escape_invalid : Warning<
+  "character cannot be specified by a universal character name in C++11">,
+  InGroup<CXX11Compat>, DefaultIgnore;
+
----------------
Maybe "surrogate character cannot be [...]" ?

================
Comment at: lib/Lex/Lexer.cpp:1524-1525
@@ -1554,1 +1523,4 @@
+
+template <size_t N>
+static bool isCharInSet(uint32_t c, const UnicodeCharRange (&Ranges)[N]) {
   unsigned LowPoint = 0;
----------------
Hmm, interesting, did you template this on the array size for convenience, or in the hope that the compiler will unroll the loop? If the latter, did it work? (And if so, did we manage to propagate the array into the unrolled loop, or is it still being passed as a parameter?)

================
Comment at: test/Preprocessor/ucn-allowed-chars.c:1
@@ +1,2 @@
+// RUN: %clang_cc1 %s -fsyntax-only -std=c99 -verify
+// RUN: %clang_cc1 %s -fsyntax-only -std=c11 -Wc99-compat -verify
----------------
Can we also get a copy of this test with UTF-8 characters?


http://llvm-reviews.chandlerc.com/D327