[PATCH] D155610: [Clang][Sema] Fix display of characters on static assertion failure

Mon Aug 14 05:56:03 PDT 2023

cor3ntin added inline comments.

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16869
+/// The code point needs to be zero-extended to 32-bits.
+void ConvertCharToString(uint32_t CodePoint, const BuiltinType *BTy,
+                         unsigned TyWidth, llvm::raw_ostream &OS) {
----------------
hubert.reinterpretcast wrote:
> It does not seem that the first parameter expects a `CodePoint` argument in all cases. For `Char_S`, `Char_U`, and `Char8`, it seems the function wants to treat the input as a UTF-8 code unit.
> 
> I suggest changing the argument to be clearly a code unit (and potentially treat it as a code point value as appropriate later in the function).
> 
> Also: The function should probably be declared as having static linkage.
> Additionally: The function does not "convert" in the language semantic sense. `WriteCharacterValueDescriptionForDisplay` might be a better name.
Agreed. `CodeUnit` or `Value` would be more accurate, mostly because of numeric escape sequences.
But if we are going to rename the function, `WriteCharValueForDiagnostic` would be better; `Character` implies too much.

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16876
+  // other types.
+  if (CodePoint <= UCHAR_MAX) {
+    StringRef Escaped = escapeCStyle<EscapeChar::Single>(CodePoint);
----------------
hubert.reinterpretcast wrote:
> For types other than `Char_S`, `Char_U`, and `Char8`, this fails to treat the C1 Controls and Latin-1 Supplement characters as Unicode code points. It looks like test coverage for these cases is missing.
`escapeCStyle` is one of the things that assume ASCII / UTF, but yes, we might as well lower the bound from `UCHAR_MAX` to 0x7F, just to avoid unnecessary work.
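
To make the code unit / code point distinction concrete, here is a rough standalone sketch. It is not the patch's code: `encodeUTF8`, `describeCharValue`, and the `IsUTF8CodeUnit` flag are hypothetical names, and printing non-printable values (including the C1 controls) as hex is just one possible choice. The point is that 0xE9 is an incomplete UTF-8 sequence when it comes from a `char`, but is U+00E9 when it comes from a wider character type.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: encode a Unicode scalar value as UTF-8.
// Returns an empty string for surrogates and out-of-range values.
std::string encodeUTF8(uint32_t CP) {
  std::string Out;
  if (CP <= 0x7F) {
    Out += static_cast<char>(CP);
  } else if (CP <= 0x7FF) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP <= 0xFFFF) {
    if (CP >= 0xD800 && CP <= 0xDFFF)
      return Out; // surrogates are not scalar values
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP <= 0x10FFFF) {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
  return Out;
}

// Hypothetical stand-in for the function under review.
// IsUTF8CodeUnit is assumed true for char-like and char8_t-like types,
// where Value is a single UTF-8 code unit rather than a code point.
std::string describeCharValue(uint32_t Value, bool IsUTF8CodeUnit) {
  // Printable ASCII means the same thing in both interpretations.
  if (Value >= 0x20 && Value < 0x7F)
    return std::string("'") + static_cast<char>(Value) + "'";

  // Above ASCII, only a code point can be shown as a character.
  // C1 controls (0x80..0x9F) are skipped here and printed as hex.
  if (!IsUTF8CodeUnit && Value >= 0xA0) {
    std::string Encoded = encodeUTF8(Value);
    if (!Encoded.empty())
      return "'" + Encoded + "'";
  }

  // Everything else is shown numerically.
  char Buf[16];
  std::snprintf(Buf, sizeof(Buf), "0x%02X", static_cast<unsigned>(Value));
  return Buf;
}
```

With this sketch, `describeCharValue(0xE9, true)` gives a hex value while `describeCharValue(0xE9, false)` gives the two-byte UTF-8 encoding of U+00E9 in quotes, which is the Latin-1 Supplement case the current `<= UCHAR_MAX` check does not distinguish.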

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610