[PATCH] D108469: Improve handling of static assert messages.

Mon Jun 27 08:15:54 PDT 2022

aaron.ballman added reviewers: tahonermann, clang-language-wg.
aaron.ballman added a comment.

Thanks for picking this back up! I have mostly nits, a few questions, and am adding some extra reviewers who may have an opinion. Also, I think the summary needs to be updated (this is no longer waiting on the unevaluated strings stuff), and the changes need a release note.

================
Comment at: clang/lib/Basic/Diagnostic.cpp:809
+  OutStr.reserve(OutStr.size() + Str.size());
+  const unsigned char *Begin =
+      reinterpret_cast<const unsigned char *>(Str.data());
----------------

================
Comment at: clang/lib/Basic/Diagnostic.cpp:821
+      llvm::UTF32 CodepointValue;
+      llvm::UTF32 *CpPtr = &CodepointValue;
+      const unsigned char *CodepointBegin = Begin;
----------------
We don't really need this variable, we can use `&CodepointValue` in the two places it's needed.

================
Comment at: clang/lib/Basic/Diagnostic.cpp:825
+          Begin + llvm::getNumBytesForUTF8(*Begin);
+      llvm::ConversionResult res = llvm::ConvertUTF8toUTF32(
+          &Begin, CodepointEnd, &CpPtr, CpPtr + 1, llvm::strictConversion);
----------------

================
Comment at: clang/lib/Basic/Diagnostic.cpp:828
+      (void)res;
+      assert(llvm::conversionOK == res);
+      assert(Begin == CodepointEnd &&
----------------
It'd be worth adding a message here as well. Something like `&& "the sequence is legal UTF-8 but we couldn't convert it to UTF-32"`

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16591
+      if (AssertMessage) {
+        StringLiteral *MsgStr = cast<StringLiteral>(AssertMessage);
+        if (MsgStr->getCharByteWidth() == 1)
----------------

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16592
+        StringLiteral *MsgStr = cast<StringLiteral>(AssertMessage);
+        if (MsgStr->getCharByteWidth() == 1)
+          Msg << MsgStr->getString();
----------------
Do we want to use something like `isASCII()` here instead of just checking for a single-byte width? e.g., pascal strings are single byte width, but probably not something we want printed this way.

Another question is, do we want to do something slightly slower (since we know we're already on the failure path, the performance here shouldn't matter overly much) by checking `!containsNonAsciiOrNull()`?

================
Comment at: llvm/lib/Support/Unicode.cpp:300
 ///   * 1 for all remaining characters.
-static inline int charWidth(int UCS)
-{
+static inline int charWidth(int UCS) {
   if (!isPrintable(UCS))
----------------
Unrelated formatting change?

================
Comment at: llvm/lib/Support/Unicode.cpp:377
 } // namespace llvm
-
----------------
Unrelated formatting change?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108469/new/

https://reviews.llvm.org/D108469