[PATCH] D155610: [Clang][Sema] Fix display of characters on static assertion failure

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Aug 12 01:27:05 PDT 2023


cor3ntin added inline comments.


================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16899
+          uint32_t CodePoint = static_cast<uint32_t>(V.getInt().getZExtValue());
+          PrintCharLiteralPrefix(BTy->getKind(), OS);
+          OS << '\'';
----------------
hubert.reinterpretcast wrote:
> hubert.reinterpretcast wrote:
> > cor3ntin wrote:
> > > Looking at the diagnostics, I don't think it makes sense to print a prefix here. You could just leave that part out.
> > Why is removing the prefix better? The types can matter (characters outside the basic character set are allowed to have negative `char` values). Also, moving forward, the value of a character need not be the same in the various encodings.
> Some fun with signedness (imagine a more realistic example with `ISO-8859-1` ordinary character encoding with a signed `char` type):
> ```
> $ clang -Xclang -fwchar-type=short -xc++ -<<<$'static_assert(L"\\uFF10"[0] == U\'\\uFF10\');'
> <stdin>:1:15: error: static assertion failed due to requirement 'L"\xFF10"[0] == U'\uff10''
>     1 | static_assert(L"\uFF10"[0] == U'\uFF10');
>       |               ^~~~~~~~~~~~~~~~~~~~~~~~~
> <stdin>:1:28: note: expression evaluates to ''0' (0xFF10) == '0' (0xFF10)'
>     1 | static_assert(L"\uFF10"[0] == U'\uFF10');
>       |               ~~~~~~~~~~~~~^~~~~~~~~~~~
> 1 error generated.
> Return:  0x01:1   Fri Aug 11 23:49:02 2023 EDT
> ```
Either we care about the actual character - ie `'a'`, or it's value (ie `42`). The motivation for the current patch is to add the value to the diagnostic message.
I'm also concerned about mixing things that are are are not lexical elements in the diagnostics


================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16930-16936
+        case BuiltinType::Char_S:
+        case BuiltinType::Char_U:
+        case BuiltinType::Char8:
+        case BuiltinType::Char16:
+        case BuiltinType::Char32:
+        case BuiltinType::WChar_S:
+        case BuiltinType::WChar_U: {
----------------
hubert.reinterpretcast wrote:
> Add FIXME for `char` and `wchar_t` cases that this assumes Unicode literal encodings.
If we wanted such fixme, it should be L1689.


================
Comment at: clang/test/SemaCXX/static-assert.cpp:287
+  static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error {{failed}} \
+                                                  // expected-note {{evaluates to ''ゆ' (0x3086) == '̵' (0x335)'}}
+  static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \
----------------
hubert.reinterpretcast wrote:
> The C++23 escaped string formatting facility would not generate a trailing combining character like this. I recommend following suit.
> 
> Info on U+0335: https://util.unicode.org/UnicodeJsps/character.jsp?a=0335
> 
This is way outside the scope of the patch. The diagnostic output facility has no understanding of combining characters or graphemes and do not attempt to match std::print. It probably would be an improvement but this patch is not trying to modify how all diagnostics are printed. (all of that logic is in Diagnostic.cpp)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610



More information about the cfe-commits mailing list