[PATCH] D153621: [Clang] Correctly handle $, @, and ` when represented as UCN

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 6 14:41:14 PDT 2023


tahonermann requested changes to this revision.
tahonermann added a comment.
This revision now requires changes to proceed.

Changes look good; I added a number of suggested edits for minor issues.



================
Comment at: clang/docs/ReleaseNotes.rst:203-204
 
+- Implemented `WG14 N3124 <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf>_`,
+  which allows any universal character name to appear in string literals.
+
----------------



================
Comment at: clang/docs/ReleaseNotes.rst:530-531
   (`#38717 <https://github.com/llvm/llvm-project/issues/38717>_`).
+- Fix an assertion when using ``\u0024`` as an identifier, by disallowing
+  that construct (`#62133 <https://github.com/llvm/llvm-project/issues/62133>_`).
 
----------------



================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:204-207
+def warn_c2x_compat_literal_ucn_control_character : Warning<
+  "universal character name referring to a control character "
+  "incompatible with C standards before C2x">,
+  InGroup<CPre2xCompat>, DefaultIgnore;
----------------



================
Comment at: clang/lib/Lex/Lexer.cpp:3487-3492
+  // C2x 6.4.3p2: A universal character name shall not designate a code point
+  //   where the hexadecimal value is in the range D800 through DFFF inclusive
+  //   or greater than 10FFFF. A universal-character-name outside the
+  //   c-char-sequence of a character constant, or the s-char-sequence
+  //   of a string-literal shall not designate a control character
+  //   or a character in the basic character set.
----------------
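For illustration only, the rule quoted in that comment can be sketched as a standalone check. This is not the code in Lexer.cpp: the names `UCNContext`, `isSurrogate`, `isControlCharacter`, `isBasicCharacterApprox`, and `isUCNAllowed` are hypothetical. The control-character ranges (0x00-0x1F and 0x7F-0x9F) are an assumption, and the basic character set is approximated by printable ASCII.

```cpp
#include <cstdint>

// Hypothetical names for illustration; these are not Clang's functions.
enum class UCNContext { InsideLiteral, OutsideLiteral };

// Surrogate code points: D800 through DFFF inclusive.
constexpr bool isSurrogate(std::uint32_t CP) {
  return CP >= 0xD800 && CP <= 0xDFFF;
}

// ASSUMPTION: control characters are 0x00-0x1F and 0x7F-0x9F, both inclusive.
constexpr bool isControlCharacter(std::uint32_t CP) {
  return CP <= 0x1F || (CP >= 0x7F && CP <= 0x9F);
}

// ASSUMPTION: the basic character set is approximated by printable ASCII
// (0x20-0x7E); the standard's actual definition is narrower.
constexpr bool isBasicCharacterApprox(std::uint32_t CP) {
  return CP >= 0x20 && CP <= 0x7E;
}

// Returns true if a UCN naming code point CP is acceptable in context Ctx.
// Surrogates and values above 10FFFF are rejected everywhere; control and
// basic characters are rejected only outside character and string literals.
constexpr bool isUCNAllowed(std::uint32_t CP, UCNContext Ctx) {
  return !isSurrogate(CP) && CP <= 0x10FFFF &&
         (Ctx == UCNContext::InsideLiteral ||
          !(isControlCharacter(CP) || isBasicCharacterApprox(CP)));
}
```

Under these assumptions, `isUCNAllowed(0x24, UCNContext::InsideLiteral)` holds while `isUCNAllowed(0x24, UCNContext::OutsideLiteral)` does not, which matches the release note about disallowing ``\u0024`` as an identifier. Whether a code point that passes this check is a valid identifier character is a separate question not covered here.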



================
Comment at: clang/lib/Lex/LiteralSupport.cpp:617-618
 
   // C++11 allows UCNs that refer to control characters and basic source
   // characters inside character and string literals
   if (UcnVal < 0xa0 &&
----------------



================
Comment at: clang/lib/Lex/LiteralSupport.cpp:630
+                       diag::warn_cxx98_compat_literal_ucn_escape_basic_scs
+                     : diag::warn_c2x_compat_literal_ucn_escape_basic_scs )
             << StringRef(&BasicSCSChar, 1);
----------------



================
Comment at: clang/test/Preprocessor/ucn-allowed-chars.c:16
 
-
 // Identifier initial characters
----------------
I assume this line was deleted to minimize the disruption to line numbers due to the additional RUN line?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153621/new/

https://reviews.llvm.org/D153621


