[PATCH] D119221: [clang][lexer] Allow u8 character literal prefixes in C2x
Tom Honermann via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Feb 11 19:11:31 PST 2022
tahonermann added inline comments.
================
Comment at: clang/lib/Lex/Lexer.cpp:3462
- case 'u': // Identifier (uber) or C11/C++11 UTF-8 or UTF-16 string literal
+ case 'u': // Identifier (uber) or C11/C2x/C++11 UTF-8 or UTF-16 string literal
// Notify MIOpt that we read a non-whitespace/non-comment token.
----------------
The comment is slightly misleading both before and after this change. Assuming this level of detail is desired, I suggest:
// Identifier (e.g., uber), or
// UTF-8 (C2x/C++17) or UTF-16 (C11/C++11) character literal, or
// UTF-8 or UTF-16 string literal (C11/C++11).
case 'u':
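For reference, a minimal sketch (not part of the patch) of the token kinds beginning with `u` that this case has to tell apart, written as C++17 or later. The variable names are made up for illustration.

```cpp
#include <type_traits>

// Hypothetical names, for illustration only. Assumes C++17 or later,
// where u8 character literals are available.
int uber = 0;                  // plain identifier that starts with 'u'
auto utf16_char = u'x';        // UTF-16 character literal
auto utf8_char = u8'x';        // UTF-8 character literal
const auto *utf16_str = u"x";  // UTF-16 string literal
const auto *utf8_str = u8"x";  // UTF-8 string literal

static_assert(std::is_same<decltype(utf16_char), char16_t>::value,
              "u'x' has type char16_t");
static_assert(std::is_same<decltype(utf16_str), const char16_t *>::value,
              "u\"x\" decays to const char16_t *");
static_assert(sizeof(utf8_char) == 1, "a UTF-8 code unit occupies one byte");
```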
================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:23
+char f = u8'ab'; // expected-error {{Unicode character literals may not contain multiple characters}}
+char g = u8'\x80'; // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
#endif
----------------
aaron.ballman wrote:
> One more test I'd like to see added, just to make sure we're covering 6.4.4.4p9 properly:
> ```
> _Static_assert(
> _Generic(u8'a',
> default: 0,
> unsigned char : 1),
> "Surprise!");
> ```
> We expect the type of a u8 character literal to be `unsigned char` at the moment, which is different from a u8 string literal, which uses `char`.
>
> However, WG14 is also going to be considering http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our meeting next week.
Good suggestion. I believe the following update will be needed to `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`:
...
else if (Literal.isUTF8() && getLangOpts().C2x)
Ty = Context.UnsignedCharTy; // u8'x' -> unsigned char in C2x.
else if (Literal.isUTF8() && getLangOpts().Char8)
Ty = Context.Char8Ty; // u8'x' -> char8_t when it exists.
...
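For comparison, here is a rough sketch of the C++ side of the same question, assuming C++17 or later. Unlike the C2x rule discussed above, in C++ the type of a u8 character literal matches the element type of a u8 string literal: `char` in C++17 and `char8_t` once `char8_t` is enabled. The alias names are hypothetical.

```cpp
#include <type_traits>

// Hypothetical alias names. Assumes C++17 or later.
#if defined(__cpp_char8_t)
// C++20 (or -fchar8_t): u8'x' has the distinct type char8_t.
using U8CharLiteralType = char8_t;
#else
// C++17: u8'x' has type char.
using U8CharLiteralType = char;
#endif

// Element type of a u8 string literal, with const and the reference removed.
using U8StringElementType =
    std::remove_cv_t<std::remove_reference_t<decltype(u8"a"[0])>>;

static_assert(std::is_same<decltype(u8'a'), U8CharLiteralType>::value,
              "unexpected type for a u8 character literal");
static_assert(std::is_same<U8StringElementType, U8CharLiteralType>::value,
              "in C++ the literal and string element types agree");
```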
================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:24
+char g = u8'\x80'; // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
#endif
----------------
We should also exercise the preprocessor with something like this:
#if u8'\xff' != 0xff
#error uh oh
#endif
Hmm, this currently fails in C++20 mode with both Clang and GCC unless `-funsigned-char` is passed (https://godbolt.org/z/Tb7z85ToG). That seems wrong. MSVC gets this wrong too, but I think for a different reason; see the implementation impact section of P2029 (https://wg21.link/p2029) if curious.
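A small sketch of how the mismatch could be observed, assuming C++17 or later. It records which branch the preprocessor takes for `u8'\xff'` and, separately, the value the same literal has in an ordinary expression. The constant names are hypothetical, and no particular result is claimed here since it depends on the compiler version, language mode, and flags.

```cpp
// Hypothetical constant names. Assumes C++17 or later and 8-bit char.

// What the preprocessor computes for the literal in a #if condition.
#if u8'\xff' == 0xff
constexpr bool kPreprocessorSeesUnsigned = true;
#else
constexpr bool kPreprocessorSeesUnsigned = false;
#endif

// What the compiler proper computes when the literal is converted to int.
// With char8_t (C++20) this is 255; with a signed plain char (C++17) it is -1.
constexpr int kExpressionValue = u8'\xff';
constexpr bool kExpressionSeesUnsigned = (kExpressionValue == 0xff);

// True when the preprocessor and the expression evaluator agree.
constexpr bool kConsistent =
    (kPreprocessorSeesUnsigned == kExpressionSeesUnsigned);
```

If `kConsistent` is false while `kExpressionValue` is 0xff, the compiler shows the behaviour described above.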
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D119221/new/
https://reviews.llvm.org/D119221