[PATCH] D155610: [Clang][Sema] Fix display of characters on static assertion failure

Fri Aug 4 04:55:51 PDT 2023

hazohelet added inline comments.

================
Comment at: clang/docs/ReleaseNotes.rst:103-137
+- When describing the failure of a static assertion of an `==` expression, clang prints the integer
+  representation of the value as well as its character representation when
+  the user-provided expression is of character type. If the character is
+  non-printable, clang now shows the escaped character.
+  Clang also prints multi-byte characters if the user-provided expression
+  is of multi-byte character type.
+
----------------
aaron.ballman wrote:
> cor3ntin wrote:
> > aaron.ballman wrote:
> > > cor3ntin wrote:
> > > > @aaron.ballman On one hand this is nice; on the other hand, maybe it's too detailed. What do you think?
> > > I'm happy with it -- better too much detail than too little, but this really helps users see what's been improved and why it matters.
> > > 
> > > That said, I think `0x0A` and `0x1F30D` would arguably be better than printing the values in decimal. For `\n`, perhaps folks remember that its decimal value is 10, but nobody is going to know what `127757` means compared to the hex representation (especially because the value is specified in hex with the prefix printed in the error message). WDYT?
> > For `wchar_t`, `charN_t` I think that makes sense.
> > For `char`, it's hard to know; I think this is mostly useful for people who treat `char` as some kind of integer. I could go either way. Using hex consistently seems reasonable.
> I don't insist on using hex, but I have a slight preference for using it consistently everywhere. CC @cjdb for more opinions since this relates to user experience of diagnostics.
I generally agree that hex codes would be better for characters, but I think a couple of points are still open:
1. Should we print the unsigned code point or the (possibly signed) integer? (e.g. `0xFF` vs `-0x01` for `(char)-1`, on targets where `char` is signed)
2. Should we print the hex code when the other operand of the `==` expression is not of a textual type? (e.g. `0x11` vs `17` for the LHS of `(char)17 == 11`)

For 1, I think we should always print the unsigned code point for all textual types, for consistency. We also don't want to print `-0x3` for `L'\xFFFD'` on targets where `wchar_t` is signed and 16 bits wide (I haven't checked whether such a target exists, though).
For 2, I want to see the decimal (possibly signed) integer if the other side of the expression is not of a textual type.
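For the following code, on a target where `char` is signed, displaying `expression evaluates to ''<FF>' (0xFF) == 255'` would be highly confusing: the assertion fails because `-1 != 255`, yet both sides would appear to hold the same value.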
```cpp
static_assert((char)-1 == (unsigned char)-1);
```
WDYT?


https://reviews.llvm.org/D155610