[PATCH] D123064: [Clang][C++23][WIP] P2071 Named universal character escapes

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 5 13:38:25 PDT 2022


tahonermann added a comment.

This looks pretty good to me. I added a few comments. I mostly just reviewed the lexer related code; I didn't dive into the name matching code.



================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:130-131
 
 def ext_delimited_escape_sequence : Extension<
-  "delimited escape sequences are a Clang extension">,
+  "%select{delimited|named}0 escape sequences are a Clang extension">,
   InGroup<DiagGroup<"delimited-escape-sequence-extension">>;
----------------
I don't see much value in combining these diagnostics since these are distinct features. The `ext_delimited_escape_sequence` name seems odd for named escape sequences too (even if both features use `{` and `}` as delimiters).


================
Comment at: clang/lib/Lex/Lexer.cpp:3255-3260
+    if (!isAlphanumeric(C) && C != '_' && C != '-' && C != ' ')
+      break;
+
+    if ((C < 'A' || C > 'Z') && !llvm::isDigit(C) && C != ' ' && C != '-') {
+      Invalid = true;
+    }
----------------
It isn't clear to me why there are two separate `if` statements here. I would expect the following to suffice. If I'm misunderstanding something, then comments might be helpful.
  if (!isUppercase(C) && !isDigit(C) && C != '-' && C != ' ') {
    Invalid = true;
    break;
  }

I'm not sure why there is a test for '_' since that character is not present in the grammar for `n-char` in P2071.

Is it intentional that there is no `break` statement in the second `if` statement?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:487-495
+  auto Res = llvm::sys::unicode::nameToCodepointLooseMatching(Name);
+  if (!Res)
+    return false;
+  Diag(Diags, Features, Loc, TokBegin, TokRangeBegin, TokRangeEnd,
+       diag::note_invalid_ucn_name_loose_matching)
+      << FixItHint::CreateReplacement(
+             MakeCharSourceRange(Features, Loc, TokBegin, TokRangeBegin,
----------------
I like this use of loose matching to generate a fixit!


================
Comment at: clang/test/Lexer/char-escapes-delimited.c:81
+  char d = '\N{}';          // expected-error {{delimited escape sequence cannot be empty}}
+  char e = '\N{';           // expected-error {{delimited escape sequence cannot be empty}}
+
----------------
Does this test imply that the `warn_delimited_ucn_incomplete` diagnostic cannot be issued for named escape sequences? I don't see a test that exercises a missing '}' otherwise. Perhaps add a test like:
  char x = '\N{LOTUS';


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064



More information about the cfe-commits mailing list