[PATCH] D127363: [Lex] Fix for char32_t literal truncation on 16 bit architectures

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 21 09:48:34 PDT 2022

tahonermann added inline comments.

Comment at: clang/lib/Lex/LiteralSupport.cpp:1600
-  llvm::APInt LitVal(PP.getTargetInfo().getIntWidth(), 0);
+  llvm::APInt LitVal(PP.getTargetInfo().getChar32Width(), 0);
sammccall wrote:
> tahonermann wrote:
> > I don't think this is quite right. For the code that follows this change to work as intended and issue the "Character constant too long for its type" diagnostic, the width needs to match that of `int`. This is required for multicharacter literals (they have type `int`) so that an appropriate diagnostic is issued for `'xxxxx'` for targets that have a 32-bit int (or for `'xxx'` for targets that have a 16-bit int)`.
> > 
> > Additionally, the type of a character constant in C is `int`.
> > 
> > I think what is needed is something like:
> >   unsigned BitWidth = getCharWidth(Kind, PP.getTargetInfo());
> >   if (IsMultiChar || !PP.getLangOpts().CPlusPlus)
> >     BitWidth = PP.getTargetInfo().getIntWidth();
> >   llvm::APInt LitVal(BitWidth, 0);
> Thanks for pointing this out.
> My reading of https://eel.is/c++draft/lex.ccon#2 is that a multi-char char literal with a L/u8/u/U prefix is not `int` but the respective character types, so the conditions here are even a little *more* complicated than you suggest :-(
That is partially correct. Per [[ https://eel.is/c++draft/lex.ccon#1.sentence-3 | (lex.ccon)p1 sentence 3 ]], multicharacter literals are not allowed to have a `u8`, `u`, or `U` prefix. A multicharacter literal with a `L` prefix does have a `wchar_t` type, but Clang does not currently support `L` prefixed multicharacter literals (multicharacter literals are conditionally-supported per [[ https://eel.is/c++draft/lex.ccon#1.sentence-4 | (lex.ccon)p1 sentence 4 ]])

All that being said, I would not be surprised if some additional conditions are needed. I don't recall where Clang diagnoses the ill-formed and unsupported cases.

  rG LLVM Github Monorepo



More information about the cfe-commits mailing list