[PATCH] D155610: [Clang][Sema] Fix display of characters on static assertion failure

Fri Aug 4 06:05:02 PDT 2023

aaron.ballman added inline comments.

================
Comment at: clang/docs/ReleaseNotes.rst:103-137
+- When describing the failure of a static assertion on an `==` expression, clang prints the integer
+  representation of the value as well as its character representation when
+  the user-provided expression is of character type. If the character is
+  non-printable, clang now shows the escaped character.
+  Clang also prints multi-byte characters if the user-provided expression
+  is of multi-byte character type.
+
----------------
hazohelet wrote:
> aaron.ballman wrote:
> > cor3ntin wrote:
> > > aaron.ballman wrote:
> > > > cor3ntin wrote:
> > > > > @aaron.ballman On one hand this is nice; on the other hand it may be too detailed. What do you think?
> > > > I'm happy with it -- better too much detail than too little, but this really helps users see what's been improved and why it matters.
> > > > 
> > > > That said, I think `0x0A` and `0x1F30D` would arguably be better than printing the values in decimal. For `\n`, perhaps folks remember that it's decimal value 10, but nobody is going to know what `127757` means compared to the hex representation (esp because the value is specified in hex with the prefix printed in the error message). WDYT?
> > > For `wchar_t` and `charN_t`, I think that makes sense.
> > > For `char`, it's hard to know. I think this is mostly useful for people who treat `char` as some kind of integer. I could go either way. Using hex consistently seems reasonable.
> > I don't insist on using hex, but I have a slight preference for using it consistently everywhere. CC @cjdb for more opinions since this relates to user experience of diagnostics.
> I generally agree that hex code would be better for characters.
> I think we still have some arguable points.
> 1. Should we print the unsigned code point or the (possibly signed) integer? (e.g. `0xFF` vs `-0x01` for `(char)-1`, on targets where `char` is signed)
> 2. Should we print the hex code when the other subexpression of the `==` expression is not a textual type? (e.g. `0x11` vs `17` for LHS of `(char)17 == 11`)
> 
> For 1, I think we should always print the unsigned code point for all textual types for consistency. Also, we don't want to print `-0x3` for `L'\xFFFD'` on targets where `wchar_t` is signed and 16 bits wide (I haven't checked whether such a target exists, though).
> For 2, I want to see the decimal (possibly signed) integer if the other side of the expression is not a textual type.
> Displaying `expression evaluates to ''<FF>' (0xFF) == 255'` for the following code would be highly confusing.
> ```
> static_assert((char)-1 == (unsigned char)-1);
> ```
> WDYT?

> Should we print the unsigned code point or the (possibly signed) integer? (e.g. `0xFF` vs `-0x01` for `(char)-1`, on targets where `char` is signed)

Personally, I find -0x01 to be kind of weird and I slightly prefer 0xFF.
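
To make the two spellings concrete, here is a minimal sketch of the "always print the unsigned code point" option. It is not the patch's code: `unsignedCodeUnit` is a hypothetical helper, and the bit widths passed to it are assumptions about the target (8-bit `char`, 16-bit signed `wchar_t`, 32-bit `char32_t`).

```cpp
#include <cstdint>

// Hypothetical helper (not Clang's implementation): reinterpret a possibly
// negative character value as the unsigned code unit of the given bit width.
constexpr std::uint64_t unsignedCodeUnit(std::int64_t value, unsigned bitWidth) {
  return bitWidth >= 64
             ? static_cast<std::uint64_t>(value)
             : static_cast<std::uint64_t>(value) &
                   ((std::uint64_t{1} << bitWidth) - 1);
}

// (char)-1 where char is signed and 8 bits wide: shown as 0xFF, not -0x01.
static_assert(unsignedCodeUnit(static_cast<signed char>(-1), 8) == 0xFF, "");

// L'\xFFFD' where wchar_t is signed and 16 bits wide has the value -3;
// masking to 16 bits recovers 0xFFFD instead of -0x3.
static_assert(unsignedCodeUnit(static_cast<std::int16_t>(-3), 16) == 0xFFFD, "");

// Non-negative values are unchanged: '\n' is 0x0A and U+1F30D is 127757.
static_assert(unsignedCodeUnit('\n', 8) == 0x0A, "");
static_assert(unsignedCodeUnit(U'\U0001F30D', 32) == 0x1F30D, "");
```

These are compile-time checks only, so compiling the snippet is enough to confirm them.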

> Should we print the hex code when the other subexpression of the `==` expression is not a textual type? (e.g. `0x11` vs `17` for LHS of `(char)17 == 11`)

I don't have a strong opinion on this because I think we can come up with arguments for either approach. My intuition is that we should just use hex values everywhere, but others may have a different opinion.
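
For reference, a small sketch of why the quoted `(char)-1 == (unsigned char)-1` example is the awkward case for either choice. It uses `signed char` in place of `char` so that it does not depend on the target's `char` signedness; the variable names are purely illustrative.

```cpp
// Both operands of == are promoted to int before the comparison happens.
constexpr int promotedSignedChar = static_cast<signed char>(-1);     // -1
constexpr int promotedUnsignedChar = static_cast<unsigned char>(-1); // 255

// The promoted values differ, which is why the static assertion fails...
static_assert(promotedSignedChar == -1, "");
static_assert(promotedUnsignedChar == 255, "");
static_assert(promotedSignedChar != promotedUnsignedChar, "");

// ...even though both sides have the same 8-bit pattern, so a diagnostic that
// spells one side as 0xFF and the other as 255 reads as a true comparison.
static_assert(static_cast<unsigned char>(promotedSignedChar) ==
                  static_cast<unsigned char>(promotedUnsignedChar),
              "");
static_assert(0xFF == 255, "");

// The second question is only about spelling: 0x11 and 17 are the same value.
static_assert(0x11 == 17, "");
```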

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610