[PATCH] D103426: Clang: Extend format string checking to wprintf/wscanf

Sat Jul 24 11:20:34 PDT 2021

efriedma added a comment.

This should split into three patches:

1. The ConvertUTFWrapper patch.
2. The patch to add format warnings for wprintf/wscanf/etc.
3. The patch to add the %l16/%l32 format modifiers.

================
Comment at: clang/lib/AST/Expr.cpp:1091
+      if (llvm::convertUTF32ToUTF8String(AR, Output)) {
+        CString = new char[Output.size() + 1];
+        memcpy(CString, Output.c_str(), Output.size());
----------------
This leaks memory.

This function should return either a StringRef to memory that's part of AST object, or an std::string.

================
Comment at: clang/lib/AST/OSLog.cpp:206
   const StringLiteral *Lit = cast<StringLiteral>(StringArg->IgnoreParenCasts());
-  assert(Lit && (Lit->isAscii() || Lit->isUTF8()));
-  StringRef Data = Lit->getString();
+  assert(Lit);
+  std::string String(Lit->getStrDataAsChar());
----------------
Why are you changing this code?

================
Comment at: clang/lib/AST/Type.cpp:1962

+bool Type::isType(const std::string TypeName) const {
+  QualType Desugar = this->getLocallyUnqualifiedSingleStepDesugaredType();
----------------
MarcusJohnson91 wrote:
> aaron.ballman wrote:
> > Oh, I see now that this is doing a name comparison against the type -- that's not a good API in general because it's *really* hard to guess at what the type will come out as textually. e.g., `class` and `struct` keywords are interchangeable in C++, C sometimes gets confused with `bool` vs `_Bool`, template arguments sometimes matter, nested name specifiers, etc. Callers of this API will have to guess at these details and the printing of the type may change over time (e.g., C may switch from `_Bool` to `bool` and then code calling `isType("_Bool")` may react poorly to the change).
> > 
> > I think we need to avoid this sort of API on `Type`.
> I see your point, I reverted the behavior back to doing the desugaring in just isChar16Type and isChar32Type
I'm not convinced we should be looking at sugar even in isChar16Type/isChar32Type/isAnyCharacterType.  That seems like a great way to end up with subtle bugs that only manifest when someone uses the wrong typedef.

Where is the distinction between the value `(uint32_t)1` vs. `(char32_t)1` relevant for C, anyway?

================
Comment at: clang/lib/Sema/SemaChecking.cpp:103
+#include <locale>
+#include <codecvt>

----------------
Stray includes?

================
Comment at: clang/test/Sema/format-strings-non-iso.c:6
+
+int wprintf(const char *restrict, ...);
+int wscanf(const char  *restrict, ...);
----------------
This declaration is wrong?  Does that not cause a warning somewhere?

================
Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:148
+  // Error out on an uneven byte count.
+  if (SrcBytes.size() % 2)
+    return false;
----------------
sizeof(UTF32)?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103426/new/

https://reviews.llvm.org/D103426