[PATCH] D155610: [Clang][ExprConstant] Print integer instead of character on static assertion failure

Wed Aug 2 06:08:39 PDT 2023

hazohelet added a comment.

In D155610#4550346 <https://reviews.llvm.org/D155610#4550346>, @cor3ntin wrote:

> I've been thinking about it and I think I have a cleaner design for the printing of characters:
>
> We need a `CharToString(unsigned, Qualtype) -> SmallString` method that takes a value and the type.
> for `char` and `char8_t` we can just return the value.
> For `wchar_t`, `char32_t` and `char16_t`, we can use something like `ConvertCodePointToUTF8` to convert to UTF-8. If that fails we can escape with `\x`
> If we pass the result of that to the diagnostic engine, escaping  of non printing character would happen automatically.
>
> That way we have a nice separation between converting an APValue to string and printing it, which should avoid code duplication quite a bit, and generally make the design nicer.
> Then maybe we need to consider whether we want to modify `CharacterLiteral::print` to be aligned with all of that. I don;t know if that's used, for example, for mangling.
>
> Given there are a bunch of different issues here, i would not mind separate PRs - having the numerical value showed in paren seems valuable on its own.

Thanks for the suggestion. Printing of multibyte character is a bit out of the scope of the original goal of this patch, but I'll give it a try.

================
Comment at: clang/lib/Basic/Diagnostic.cpp:838-858
+      if (UseUCN)
+        OutStream << "\\u"
+                  << llvm::format_hex_no_prefix(CodepointValue, /*Width=*/4,
+                                                /*Upper=*/false);
+      else
+        OutStream << "<U+"
+                  << llvm::format_hex_no_prefix(CodepointValue, /*Width=*/4,
----------------
cor3ntin wrote:
> hazohelet wrote:
> > cor3ntin wrote:
> > > The use UCN addition is probably not justified. we should consistent in how we print the value of non-printable code points.
> > My motivation here is to print a valid character literal. I think it justifies this change somewhat. I'd like to see what others think about this.
> Why? there is no expectation that diagnostics messages reproduce C++. Consistency between string literals and characters literals is a lot more important
The diagnostic meesage here looks like `expression evaluates to 'VALUE1 == VALUE2'`
I tend to expect that `VALUE1 == VALUE2` is a syntactically valid expression because of the syntactical element `==`. But if others do not feel the same way, I am okay with something like `'<U+0001>' == '<U+0002>'`

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610