[PATCH] D124996: [clang][preprocessor] Fix unsigned-ness of utf8 char literals

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon May 9 09:52:38 PDT 2022


tahonermann added inline comments.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:5-7
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++17 -fsyntax-only -fchar8_t -DCHAR8_T -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -fno-char8_t -DNO_CHAR8_T -verify %s
----------------
The `-DCHAR8_T` and `-DNO_CHAR8_T` options can be removed now since the test is no longer dependent on them.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:35-60
+#if __cplusplus == 201703L
+#  if defined(__cpp_char8_t)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
Something is still not right here. `-std=c++17` should behave the same as `-std=c++20 -fno-char8_t`. Likewise, `-std=c++17 -fchar8_t` should behave the same as `-std=c++20`. Basically, if `__cpp_char8_t` is defined, then `u8'\xff'` should be unsigned and if that macro isn't defined, then the result should be signed (for an 8-bit signed char type). But the reversed conditions for the C++17 vs C++20 tests above conflict with that expectation. The code changes look right. Is the test actually passing like this?

Since the tests are now conditioned on `__cpp_char8_t`, I think they should be merged. I suggest:
  // UTF-8 character literals are enabled in C++17 and later. If `-fchar8_t` is not enabled
  // (as is the case in C++17), then UTF-8 character literals may produce signed or
  // unsigned values depending on whether char is a signed type. If `-fchar8_t` is enabled
  // (which is the default behavior for C++20), then UTF-8 character literals always
  // produce unsigned values. The tests below depend on the target having a signed
  // 8-bit char so that '\xff' produces a negative value.
  #if __cplusplus >= 201703L
  #  if !defined(__cpp_char8_t)
  #    if !(u8'\xff' == '\xff')
  #      error UTF-8 character value did not match ordinary character literal; this is unexpected
  #    endif
  #  else
  #    if u8'\xff' == '\xff' // expected-warning {{right side of operator converted from negative value to unsigned}}
  #      error UTF-8 character value matched ordinary character literal; this is unexpected
  #    endif
  #  endif
  #endif


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:37-47
+#if __cplusplus == 201703L
+#  if defined(CHAR8_T)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
tbaeder wrote:
> tahonermann wrote:
> > aaron.ballman wrote:
> > > The equality operators seem backwards to what @tahonermann was saying -- I read his comment as:
> > > 
> > > C++17/14/11: u8'\xff' == '\xff'
> > > C++17/14/11, -fchar8_t: u8'\xff' != '\xff'
> > > C++20 and up: u8'\xff' != '\xff'
> > > C++20 and up, -fno-char8_t: u8'\xff' == '\xff'
> > > 
> > > Hopefully Tom can clarify if I misunderstood.
> > Yes, that looks right (as long as the target has a signed `char` type).
> Are you listing the positive conditions (that should be true) or negative ones? The conditions in the test case need to be false in order for the test case to succeed.
Yes, these list the expected behavior (e.g., the assertable behavior).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124996/new/

https://reviews.llvm.org/D124996



More information about the cfe-commits mailing list