[PATCH] D124996: [clang][preprocessor] Fix unsigned-ness of utf8 char literals

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri May 6 09:27:19 PDT 2022


tahonermann added a comment.

Thanks for your continued work on this, Tim! I think this is close. I did spot one issue and added a few other comments.



================
Comment at: clang/lib/Lex/PPExpressions.cpp:417-418
+    else if (Literal.isUTF8())
+      Val.setIsUnsigned(PP.getLangOpts().CPlusPlus ? PP.getLangOpts().Char8
+                                                   : true);
+    else
----------------
Thanks for breaking the conditions out; that does make this simpler to understand.

I don't think this is right yet, though. In C++, if `PP.getLangOpts().Char8` is `false`, then the signedness is determined by `PP.getLangOpts().CharIsSigned`. Perhaps this:
  else if (Literal.isUTF8()) {
    if (PP.getLangOpts().CPlusPlus)
      Val.setIsUnsigned(PP.getLangOpts().Char8 ||
                        !PP.getLangOpts().CharIsSigned);
    else
      Val.setIsUnsigned(true);
  }
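
To make the cases explicit, the condition above can be modeled as a small truth table. This is a standalone sketch, not Clang code: `LangBits` and `utf8CharLiteralIsUnsigned` are hypothetical names that stand in for the three `LangOptions` flags used in the suggestion.

```cpp
// Hypothetical stand-in for the three clang::LangOptions flags involved.
struct LangBits {
  bool CPlusPlus;    // compiling as C++ rather than C
  bool Char8;        // char8_t is enabled (C++20 default, or -fchar8_t)
  bool CharIsSigned; // plain char is signed on this target
};

// Mirrors the suggested logic: true when a u8 character literal should be
// treated as unsigned in a preprocessor expression.
constexpr bool utf8CharLiteralIsUnsigned(LangBits O) {
  return !O.CPlusPlus || O.Char8 || !O.CharIsSigned;
}

// C: always unsigned.
static_assert(utf8CharLiteralIsUnsigned(LangBits{false, false, true}), "C");
// C++ with char8_t: always unsigned.
static_assert(utf8CharLiteralIsUnsigned(LangBits{true, true, true}), "char8_t");
// C++ without char8_t: follows plain char.
static_assert(!utf8CharLiteralIsUnsigned(LangBits{true, false, true}),
              "signed char");
static_assert(utf8CharLiteralIsUnsigned(LangBits{true, false, false}),
              "unsigned char");
```

The last case is the one the original patch would get wrong: C++ without `char8_t` on a target (or with a flag) that makes plain `char` unsigned.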

The test case didn't catch this because `char` is always a signed type for the variations that are exercised. We could add a variant that includes `-funsigned-char`, and then modify the test based on the presence of `__CHAR_UNSIGNED__`, but that might get pretty awkward.
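
As a rough idea of what keying the test on `__CHAR_UNSIGNED__` might look like, here is a minimal sketch. It assumes a GCC-compatible compiler that predefines `__CHAR_UNSIGNED__` when plain `char` is unsigned; `PLAIN_CHAR_IS_SIGNED` is a hypothetical helper macro, not something the test currently has.

```cpp
#include <type_traits>

// Hypothetical helper macro: 1 when plain char is signed, 0 otherwise.
// Assumes the compiler predefines __CHAR_UNSIGNED__ for unsigned plain char
// (GCC and Clang do, e.g. under -funsigned-char).
#if defined(__CHAR_UNSIGNED__)
#  define PLAIN_CHAR_IS_SIGNED 0
#else
#  define PLAIN_CHAR_IS_SIGNED 1
#endif

// Cross-check the macro against the language-level type.
static_assert((PLAIN_CHAR_IS_SIGNED == 1) == std::is_signed<char>::value,
              "macro should agree with the signedness of plain char");
```

Each `#if` expectation in the test would then need a second arm selected by this macro, which is where the awkwardness comes from.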


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:3-4
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -verify %s
-// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++1z -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -fchar8_t -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -fno-char8_t -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -fsyntax-only -verify %s
----------------
Does the `-fchar8_t` option have any effect in C at present?

GCC maintainers are currently not planning to acknowledge that option in C modes since WG14 did not want to add language dialect concerns for C. This is why [[https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm|N2653]] doesn't have wording that includes a feature test macro. The GCC maintainers pushed back on the `_CHAR8_T_SOURCE` macro mentioned in the "Implementation Experience" section.

I think Clang should follow suit: attempts to use `-fchar8_t` or `-fno-char8_t` in C modes should be diagnosed, which means that we don't have to exercise these options with C2x.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:7-9
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++17 -fsyntax-only -fchar8_t -DCHAR8_T -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -fno-char8_t -DNO_CHAR8_T -verify %s
----------------
Rather than adding your own `CHAR8_T` and `NO_CHAR8_T` macros, you can use the predefined `__cpp_char8_t` feature test macro.
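
For illustration, here is a minimal sketch of branching on `__cpp_char8_t` instead of a `-D` flag. The `U8CharLiteralIsChar8T` constant is a hypothetical name, and the `__cpp_unicode_characters` guard is only there so the sketch also compiles in modes without `u8` character literals.

```cpp
#include <type_traits>

// __cpp_char8_t is predefined whenever char8_t is enabled, whether by the
// language mode (C++20 and up) or by -fchar8_t, so no -D flag is needed.
#if defined(__cpp_char8_t)
constexpr bool U8CharLiteralIsChar8T = true;  // hypothetical name
#else
constexpr bool U8CharLiteralIsChar8T = false;
#endif

// u8 character literals exist from C++17 on; skip the check in older modes.
#if defined(__cpp_unicode_characters) && __cpp_unicode_characters >= 201411L
#  if defined(__cpp_char8_t)
static_assert(std::is_same<decltype(u8'a'), char8_t>::value,
              "u8 character literals have type char8_t");
#  else
static_assert(std::is_same<decltype(u8'a'), char>::value,
              "u8 character literals have type char");
#  endif
#endif
```

In the test itself, `#if defined(CHAR8_T)` would become `#if defined(__cpp_char8_t)`, and the `-DCHAR8_T` and `-DNO_CHAR8_T` arguments could be dropped from the RUN lines.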


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:37-47
+#if __cplusplus == 201703L
+#  if defined(CHAR8_T)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
aaron.ballman wrote:
> The equality operators seem backwards to what @tahonermann was saying -- I read his comment as:
> 
> C++17/14/11: u8'\xff' == '\xff'
> C++17/14/11, -fchar8_t: u8'\xff' != '\xff'
> C++20 and up: u8'\xff' != '\xff'
> C++20 and up, -fno-char8_t: u8'\xff' == '\xff'
> 
> Hopefully Tom can clarify if I misunderstood.
Yes, that looks right (as long as the target has a signed `char` type).
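
The signed `char` caveat can be seen by modeling how `#if` arithmetic widens the two literals. This is a sketch under stated assumptions (8-bit `char`, two's complement), and `widenFF` and `u8EqualsPlainInPP` are hypothetical names, not Clang functions.

```cpp
#include <cstdint>

// In #if arithmetic, signed operands act as intmax_t and unsigned operands
// as uintmax_t. This returns the bit pattern of a '\xff' literal after that
// widening: 255 if the literal is unsigned, -1 (all ones) if it is signed.
constexpr std::uintmax_t widenFF(bool IsUnsigned) {
  return IsUnsigned ? std::uintmax_t{0xff}
                    : static_cast<std::uintmax_t>(std::intmax_t{-1});
}

// Models `u8'\xff' == '\xff'` in a C++ #if. The u8 literal is unsigned when
// char8_t is enabled and otherwise follows plain char. A mixed comparison
// converts the signed side to uintmax_t, so comparing bit patterns is enough.
constexpr bool u8EqualsPlainInPP(bool Char8, bool CharIsSigned) {
  return widenFF(Char8 || !CharIsSigned) == widenFF(!CharIsSigned);
}

static_assert(u8EqualsPlainInPP(false, true), "no char8_t: equal");
static_assert(!u8EqualsPlainInPP(true, true), "char8_t, signed char: differ");
static_assert(u8EqualsPlainInPP(true, false), "char8_t, unsigned char: equal");
```

With an unsigned `char`, both literals widen to 255 even when `char8_t` is enabled, so the `!=` rows in the table above hold only for targets where `char` is signed.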


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124996/new/

https://reviews.llvm.org/D124996


