[PATCH] D124996: [clang][preprocessor] Fix unsigned-ness of utf8 char literals

Timm Bäder via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu May 5 05:13:47 PDT 2022


tbaeder updated this revision to Diff 427283.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124996/new/

https://reviews.llvm.org/D124996

Files:
  clang/docs/ReleaseNotes.rst
  clang/lib/Lex/PPExpressions.cpp
  clang/test/Lexer/utf8-char-literal.cpp


Index: clang/test/Lexer/utf8-char-literal.cpp
===================================================================
--- clang/test/Lexer/utf8-char-literal.cpp
+++ clang/test/Lexer/utf8-char-literal.cpp
@@ -13,7 +13,7 @@
 char d = u8'\u1234'; // expected-error {{character too large for enclosing character literal type}}
 char e = u8'ሴ'; // expected-error {{character too large for enclosing character literal type}}
 char f = u8'ab'; // expected-error {{Unicode character literals may not contain multiple characters}}
-#elif __STDC_VERSION__ > 202000L
+#elif __STDC_VERSION__ >= 202000L
 char a = u8'ñ';      // expected-error {{character too large for enclosing character literal type}}
 char b = u8'\x80';   // ok
 char c = u8'\u0080'; // expected-error {{universal character name refers to a control character}}
@@ -26,3 +26,10 @@
              unsigned char : 1),
     "Surprise!");
 #endif
+
+/// Test u8 char literal preprocessor behavior
+#if __cplusplus > 201402L || __STDC_VERSION__ >= 202000L
+#if u8'\xff' != 0xff
+#error u8 char literal is not unsigned
+#endif
+#endif
Index: clang/lib/Lex/PPExpressions.cpp
===================================================================
--- clang/lib/Lex/PPExpressions.cpp
+++ clang/lib/Lex/PPExpressions.cpp
@@ -407,10 +407,10 @@
     llvm::APSInt Val(NumBits);
     // Set the value.
     Val = Literal.getValue();
-    // Set the signedness. UTF-16 and UTF-32 are always unsigned
+    // Set the signedness. UTF-8, UTF-16 and UTF-32 are always unsigned
     if (Literal.isWide())
       Val.setIsUnsigned(!TargetInfo::isTypeSigned(TI.getWCharType()));
-    else if (!Literal.isUTF16() && !Literal.isUTF32())
+    else if (!Literal.isUTF8() && !Literal.isUTF16() && !Literal.isUTF32())
       Val.setIsUnsigned(!PP.getLangOpts().CharIsSigned);
 
     if (Result.Val.getBitWidth() > Val.getBitWidth()) {
Index: clang/docs/ReleaseNotes.rst
===================================================================
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -290,6 +290,8 @@
   template parameter, to conform to the Itanium C++ ABI and be compatible with
   GCC. This breaks binary compatibility with code compiled with earlier versions
   of clang; use the ``-fclang-abi-compat=14`` option to get the old mangling.
+- Preprocessor character literals with a ``u8`` are now correctly treated as
+  unsigned character literals. This fixes `Issue 54886 <https://github.com/llvm/llvm-project/issues/54886>`_.
 
 C++20 Feature Support
 ^^^^^^^^^^^^^^^^^^^^^


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D124996.427283.patch
Type: text/x-patch
Size: 2524 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20220505/ea7f0f42/attachment-0001.bin>


More information about the cfe-commits mailing list