[PATCH] D127363: [Lex] Fix for char32_t literal truncation on 16 bit architectures

Sebastian Perta via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 13 12:13:00 PDT 2022


SebastianPerta added a comment.

>> Additionally, the type of a character constant in C is int.

This means that char32_t c4 = U'\U00064321'; is invalid in C. I know that clang is stricter with the standard than GCC; however, I would like to mention that in GCC the value is not truncated to 16 bits, which is how I found this problem originally. I suppose we want to stick with the standard in clang.

>> My reading of https://eel.is/c++draft/lex.ccon#2 is that a multi-char char literal with a L/u8/u/U prefix is not int but the respective character types

As @tahonermann explained, this applies only to C; in C++, prefixed character literals have their respective character types.
I checked char8_t, char16_t and char32_t with the u8, u and U prefixes respectively, and the following line of code suggested by @tahonermann works in all three cases:
  unsigned BitWidth = getCharWidth(Kind, PP.getTargetInfo());
This works because Kind will be utf8_char_constant, utf16_char_constant and utf32_char_constant, respectively.
Since the L prefix is not supported, I think all cases are accounted for.
Or am I missing something?
If not, should I go ahead and put together another patch with the suggested changes?


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D127363


