[PATCH] D123064: [Clang][C++23][WIP] P2071 Named universal character escapes
Tom Honermann via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Apr 5 13:38:25 PDT 2022
tahonermann added a comment.
This looks pretty good to me. I added a few comments. I mostly just reviewed the lexer related code; I didn't dive into the name matching code.
================
Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:130-131
def ext_delimited_escape_sequence : Extension<
- "delimited escape sequences are a Clang extension">,
+ "%select{delimited|named}0 escape sequences are a Clang extension">,
InGroup<DiagGroup<"delimited-escape-sequence-extension">>;
----------------
I don't see much value in combining these diagnostics since these are distinct features. The `ext_delimited_escape_sequence` name seems odd for named escape sequences too (even if both features use `{` and `}` as delimiters).
================
Comment at: clang/lib/Lex/Lexer.cpp:3255-3260
+ if (!isAlphanumeric(C) && C != '_' && C != '-' && C != ' ')
+ break;
+
+ if ((C < 'A' || C > 'Z') && !llvm::isDigit(C) && C != ' ' && C != '-') {
+ Invalid = true;
+ }
----------------
It isn't clear to me why there are two separate `if` statements here. I would expect the following to suffice. If I'm misunderstanding something, then comments might be helpful.
if (!isUppercase(C) && !isDigit(C) && C != '-' && C != ' ') {
Invalid = true;
break;
}
I'm not sure why there is a test for '_' since that character is not present in the grammar for `n-char` in P2071.
Is it intentional that there is no `break` statement in the second `if` statement?
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:487-495
+ auto Res = llvm::sys::unicode::nameToCodepointLooseMatching(Name);
+ if (!Res)
+ return false;
+ Diag(Diags, Features, Loc, TokBegin, TokRangeBegin, TokRangeEnd,
+ diag::note_invalid_ucn_name_loose_matching)
+ << FixItHint::CreateReplacement(
+ MakeCharSourceRange(Features, Loc, TokBegin, TokRangeBegin,
----------------
I like this use of loose matching to generate a fixit!
================
Comment at: clang/test/Lexer/char-escapes-delimited.c:81
+ char d = '\N{}'; // expected-error {{delimited escape sequence cannot be empty}}
+ char e = '\N{'; // expected-error {{delimited escape sequence cannot be empty}}
+
----------------
Does this test imply that the `warn_delimited_ucn_incomplete` diagnostic cannot be issued for named escape sequences? I don't see a test that exercises a missing '}' otherwise. Perhaps add a test like:
char x = '\N{LOTUS';
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123064/new/
https://reviews.llvm.org/D123064
More information about the cfe-commits
mailing list