[PATCH] improve clang's static analysis of __builtin_sprintf_chk()

Richard Trieu rtrieu at google.com
Thu Jun 18 16:51:20 PDT 2015


These are comments about everything besides the actual checking code.  I will review that next.

1. Double warnings are coming up because the format strings are being checked twice.  Clang already knows that `__builtin___sprintf_chk` has a format string, so that gets checked with the old diagnostics.  Instead of calling the checks directly, modify the existing path to carry extra information along.
  - Add an argument `bool IsSprintf` at the end of Sema::CheckFormatArguments
  - When Sema::CheckFormatArguments is called in Sema::checkCall, set that argument to true if `FDecl` is a FunctionDecl that is the function `__builtin___sprintf_chk`
  - In Sema::CheckFormatArguments, if IsSprintf is true, set the type to FST_Sprintf, otherwise, use the current type.

2. From testing, your patch looks limited in a few areas.  The limitations of this warning should be documented with your CheckSprintfHandler class.  For instance, non-specifiers text in the format string isn't counted.  There's two ways to go from here:

3. See http://llvm.org/docs/Phabricator.html about how to add full context to your patches.  Full context makes reviewing patches easier.


================
Comment at: include/clang/Sema/Sema.h:8524-8527
@@ -8523,2 +8523,6 @@
 
+  void CheckSprintfMemAlloc(FunctionDecl *FDecl, CallExpr *TheCall,
+                             unsigned BufIdx, unsigned FmtIdx,
+                             unsigned DataIdx);
+
   bool CheckARMBuiltinExclusiveCall(unsigned BuiltinID, CallExpr *TheCall,
----------------
Remove.

================
Comment at: include/clang/Sema/Sema.h:8569
@@ -8564,2 +8568,3 @@
     FST_Printf,
+    FST_Sprintf,
     FST_NSString,
----------------
Add comment that this is not a format string attribute, but extra checking on top of a printf format string.

================
Comment at: lib/Sema/SemaChecking.cpp:483-485
@@ -482,2 +482,5 @@
     break;
+  case Builtin::BI__builtin___sprintf_chk:
+    CheckSprintfMemAlloc(FDecl, TheCall, 0, 3, 4);
+    break;
   case Builtin::BI__builtin___memccpy_chk:
----------------
Remove.

================
Comment at: lib/Sema/SemaChecking.cpp:1323-1356
@@ -1319,2 +1322,36 @@
 
+void Sema::CheckSprintfMemAlloc(FunctionDecl *FDecl, CallExpr *TheCall,
+                                unsigned BufIdx, unsigned FmtIdx,
+                                unsigned DataIdx) {
+  unsigned NumArgs = TheCall->getNumArgs();
+  if (NumArgs <= BufIdx ||
+      NumArgs <= FmtIdx ||
+      NumArgs <= DataIdx)
+    return;
+
+  const Expr *FmtArg = TheCall->getArg(FmtIdx);
+  if (!isa<StringLiteral>(FmtArg->IgnoreImplicit()))
+    return;
+
+  const FunctionProtoType *Proto =
+    FDecl->getType()->castAs<FunctionProtoType>();
+  VariadicCallType CallType = getVariadicCallType(FDecl, Proto,
+                                                    TheCall->getCallee());
+  llvm::ArrayRef<const Expr*> Args =
+    llvm::makeArrayRef(TheCall->getArgs(), NumArgs);
+  llvm::SmallBitVector CheckedVarArgs;
+  if (FDecl) {
+    for (const auto *I : FDecl->specific_attrs<FormatAttr>()) {
+      // Only create vector if there are format attributes.
+      CheckedVarArgs.resize(Args.size());
+      CheckFormatArguments(I, Args, /*IsMemberFunction=*/false, CallType,
+                           TheCall->getLocStart(), FDecl->getSourceRange(),
+                           CheckedVarArgs);
+    }
+  }
+  const StringLiteral *SL = cast<StringLiteral>(FmtArg->IgnoreImplicit());
+  CheckFormatString(SL, FmtArg, Args, true, FmtIdx,  DataIdx, FST_Sprintf,
+                    /*inFunctionCall=*/true, CallType, CheckedVarArgs);
+}
+
 bool Sema::CheckObjCMethodCall(ObjCMethodDecl *Method, SourceLocation lbrac, 
----------------
Remove.

================
Comment at: lib/Sema/SemaChecking.cpp:3395
@@ -3357,3 +3394,3 @@
 
-};  
+};
 }
----------------
Don't make unrelated whitespace changes in your patch.

================
Comment at: lib/Sema/SemaChecking.cpp:4091-4094
@@ +4090,6 @@
+static const ConstantArrayType* GetArrayType(Sema &S, const Expr *Array) {
+  if (isa<DeclRefExpr>(Array)) {
+    QualType ArrayTy = cast<DeclRefExpr>(Array)->getDecl()->getType();
+    if (ArrayTy->isConstantArrayType())
+      return S.Context.getAsConstantArrayType(Array->getType());
+  }
----------------
Do you care that the expression is exactly a DeclRefExpr?  It doens't seem important to this warning and you can remove everything except the return statement.

================
Comment at: lib/Sema/SemaChecking.cpp:4106
@@ +4105,3 @@
+                        const Expr *OrigFormatExpr, unsigned FirstDataArg,
+                        unsigned numDataArgs, bool isObjC,
+                        const char *Begin, bool hasVAListArg,
----------------
Remove isObjC parameter.

================
Comment at: lib/Sema/SemaChecking.cpp:4113
@@ +4112,3 @@
+      : CheckPrintfHandler(S, FExpr, OrigFormatExpr, FirstDataArg, numDataArgs,
+                           isObjC, Begin, hasVAListArg, Args, FormatIdx,
+                           inFunctionCall, CallType, CheckedVarArgs),
----------------
Replace `isObjC` with `/*isObjC=*/ false`.

================
Comment at: lib/Sema/SemaChecking.cpp:4124
@@ +4123,3 @@
+
+    bool Init() {
+      if (const ConstantArrayType *StorageBufTy =
----------------
What is this function indicating?  It needs a better name, a comment, or both.

================
Comment at: lib/Sema/SemaChecking.cpp:4136-4137
@@ +4135,4 @@
+    Sema &S;
+    llvm::APInt StorageBufSize;
+    llvm::APInt StorageRequired;
+  };
----------------
Why are APInt's used here instead of an integer type?

================
Comment at: lib/Sema/SemaChecking.cpp:4250-4251
@@ +4249,4 @@
+  if (StorageBufSize.ule(StorageRequired)) {
+    //S.Diag(Args[0]->getLocStart(), diag::warn_sprintf_chk_overflow)
+    S.Diag(Args[0]->getLocStart(), diag::warn_memcpy_chk_overflow)
+      << getSpecifierRange(startSpecifier, specifierLen)
----------------
A new diagnostic message should be created with more information than the current overflow diagnostic, and added to the format warning group.

================
Comment at: lib/Sema/SemaChecking.cpp:4512
@@ +4511,3 @@
+                          numDataArgs,
+                          (Type == FST_NSString || Type == FST_OSTrace),
+                          Str, HasVAListArg, Args, format_idx,
----------------
Remove this line and remove this parameter from the constructor, since the type is known at this point.

================
Comment at: lib/Sema/SemaChecking.cpp:4519
@@ +4518,3 @@
+                                                    Context.getTargetInfo(),
+                                                    Type == FST_FreeBSDKPrintf))
+        H.DoneProcessing();
----------------
Replace with `/*isFreeBSDKPrintf=*/false)`

================
Comment at: test/SemaCXX/sprintf-chk.cpp:1
@@ +1,2 @@
+// RUN: %clang_cc1 -fsyntax-only -verify -fsyntax-only %s
+void test() {
----------------
There is nothing C++ about this.  Move test to test/Sema/ directory.

================
Comment at: test/SemaCXX/sprintf-chk.cpp:13
@@ +12,3 @@
+
+  __builtin___sprintf_chk(bufA, 0, __builtin_object_size(bufA, 0), "%d%d", 10, 10);   // expected-warning {{will always overflow destination buffer}}
+  __builtin___sprintf_chk(bufA, 0, __builtin_object_size(bufA, 0), "%i%i", 10, 10);   // expected-warning {{will always overflow destination buffer}}
----------------
Move the expected-warning to the next line and use the @-1 to declare the warning comes from the previous line.

line_generating_warning();
// expected-warning at -1 {{warning message}}

Also, make sure that at least one of the expected-warning checks the whole warning message.  Right now, they all check truncated versions of the warning.

http://reviews.llvm.org/D9665

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the cfe-commits mailing list