[PATCH] D119221: [clang][lexer] Allow u8 character literal prefixes in C2x

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 13 08:54:13 PDT 2022


tahonermann accepted this revision.
tahonermann added a comment.

Looks good to me! Thank you for filing the separate issue.



================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:23
+char f = u8'ab';            // expected-error {{Unicode character literals may not contain multiple characters}}
+char g = u8'\x80';          // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
aaron.ballman wrote:
> tahonermann wrote:
> > aaron.ballman wrote:
> > > One more test I'd like to see added, just to make sure we're covering 6.4.4.4p9 properly:
> > > ```
> > > _Static_assert(
> > >   _Generic(u8'a',
> > >            default: 0,
> > >            unsigned char : 1),
> > >   "Surprise!");  
> > > ```
> > > We expect the type of a u8 character literal to be `unsigned char` at the moment, which is different from a u8 string literal, which uses `char`.
> > > 
> > > However, WG14 is also going to be considering http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our meeting next week.
> > Good suggestion. I believe the following update will be needed to `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`:
> >   ...
> >   else if (Literal.isUTF8() && getLangOpts().C2x)
> >     Ty = Context.UnsignedCharTy; // u8'x' -> unsigned char in C2x.
> >   else if (Literal.isUTF8() && getLangOpts().Char8)
> >     Ty = Context.Char8Ty; // u8'x' -> char8_t when it exists.
> >   ...
> > 
> > However, WG14 is also going to be considering http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our meeting next week.
> 
> I have an update on this. We discussed the paper and took a straw poll:
> ```
> Does WG14 wish to adopt N2653 in C23? 18/0/2 (consensus)
> ```
> So we should make sure that we all agree this patch is in line with the changes from that paper. I believe your changes agree, but it'd be nice for @tahonermann to confirm.
Confirmed. N2653 technically changes the type of `u8` character literals to `char8_t`, but since that is just a typedef of `unsigned char`, these changes still align with the semantic intent. Ideally, Clang would reflect the typedef in the literal's type, but 1) the typedef isn't necessarily available, 2) Clang doesn't do so for any of the other character (or string) literals, and 3) no one is likely to care anyway.
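
To illustrate the typedef point, here is a minimal C++ sketch. It assumes a hypothetical alias `c23_char8_t` standing in for the `char8_t` typedef that N2653 adds to C (the real name is a keyword in C++20, so it cannot be reused here). The `selectAssociation` overloads are also hypothetical; they only mimic the `_Generic` test suggested above and are not part of the patch.

```cpp
#include <type_traits>

// Hypothetical stand-in for the C typedef described in N2653.
typedef unsigned char c23_char8_t;

// A typedef introduces an alias, not a distinct type, so the two names
// denote the same type.
static_assert(std::is_same<c23_char8_t, unsigned char>::value,
              "the alias and unsigned char are the same type");

// Mimics the `unsigned char : 1` association of the _Generic test.
inline int selectAssociation(unsigned char) { return 1; }

// Mimics the `default: 0` association: chosen for any other argument type,
// because an exact template match beats a converting non-template match.
template <typename T>
int selectAssociation(T) { return 0; }
```

Under these assumptions, a value typed through the alias selects the `unsigned char` branch, while a plain `char` falls through to the default. That is why giving the literal type `unsigned char` directly should be indistinguishable from spelling it via the typedef.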


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119221/new/

https://reviews.llvm.org/D119221


