[PATCH][Review request] unix.Malloc checker improvement: +handling new/delete, +memory.MismatchedFree, +memory.MismatchedDelete, +improved display names for allocators/deallocators

Jordan Rose jordan_rose at apple.com
Fri Feb 22 18:47:27 PST 2013


On Feb 22, 2013, at 7:24 , Anton Yartsev <anton.yartsev at gmail.com> wrote:

> On 22.02.2013 9:30, Anna Zaks wrote:
>> 
>> On Feb 21, 2013, at 6:26 PM, Anna Zaks <ganna at apple.com> wrote:
>> 
>>> 
>>> On Feb 21, 2013, at 6:00 PM, Anton Yartsev <anton.yartsev at gmail.com> wrote:
>>> 
>>>>> 
>>>>> On Feb 19, 2013, at 10:18 PM, Anton Yartsev <anton.yartsev at gmail.com> wrote:
>>>>> 
>>>>>> Hi, Jordan. Thanx for the review!
>>>>>> 
>>>>>> Attached is the new version of the patch with all the comments addressed. Also added support for directly called operator new()/new[]() and operator delete()
>>>>>> 
>>>>>> There is currently one problem with handling of operator delete(). The following test
>>>>>> 
>>>>>> void testDeleteOp1() {
>>>>>>  int *p = (int *)malloc(sizeof(int));
>>>>>>  operator delete(p); // FIXME: should complain "Argument to operator delete() was allocated by malloc(), not operator new"
>>>>>> }
>>>>>> 
>>>>>> produce no warnings as attached RefState seem to be missing at the point when checkPostStmt(const CallExpr *CE, CheckerContext &C) callback is called for operator delete(p).
>>>>>> I haven't investigated the problem deeply yet, intend to address it parallel with the review.
>>>>>> 
>>>>>>> +  if (NE->getNumPlacementArgs())
>>>>>>> +    return;
>>>>>>> +  // skip placement new operators as they may not allocate memory
>>>>>>> 
>>>>>>> Two comments here:
>>>>>>> - Please make sure all comments are complete, capitalized, and punctuated sentences. (This has the important one, "complete"....just missing capitalization and punctuation.)
>>>>>> I'll try. Unfortunately I am not as good in English as I want to be, so sorry for my grammar, syntax, and punctuation.
>>>>>> 
>>>>>> -- 
>>>>>> Anton
>>>>>> 
>>>>>> <MallocChecker_v2.patch>
>>>>> 
>>>> 
>>>> Hi Anna. Thanks for your comments! Attached is the new patch.
>>>> 
>>>>> Just adding another kind as extra enumeration values does not seem right. Another option is to make Kind be a pointer to a static array, which contains objects recording all necessary info about each kind (MacOSKeychainAPIChecker uses this approach). This is probably an overkill for now, but is another option.
>>>> Not sure that I have got an idea.
>>>> The memory and deallocator kind are both set for a RefState. Do you mean creating a static array with 'memory kinds' * 'deallocator kind' elements for all possible combinations? Also there is no necessary info other then the kind itself.
>>>> Left for now.
>>> 
>>> I think of ToBeReleasedWith* as being different types of Allocate; I don't think they should be separate values in the same enum. It's also unfortunate to have to copy the constant values in both places - DeallocatorKind and RefState::Kind. Maybe you could restructure this similarly to how this is done in SVals.h?
>>>> 
>>>>> +  const FunctionDecl *getCalleeDecl() const { return CalleeDecl; }
>>>>> Do we only store the name of the allocator declaration here?
>>> 
>>> If the Decl is always an allocator Decl, we should change the name of the method to be more descriptive.
>>>>> Do we need to store this in the state? (Growing states implies memory overhead.) Can't this be easily implied from the kind?
>>>> Kind can't give us information about the name of an allocator that can be malloc(), realloc(), a user function with ownership_takes attribute, etc.
>>>> One solution to avoid keeping a CalleeDecl in RefState is to rollback to CallExpr::getDirectCallee() from CheckerContext::getCalleeDec() and to print "malloc()" in case of indirect calls.
>>> 
>>> Ok, I see.
>> 
>> After thinking about it some more, I do not think we should add an extra pointer to the state to differentiate between few allocator functions. In the case when we do not have ownership attributes, we only have few possible allocators, whose names we know ahead of time. In case we support ownership attributes, we are likely to have few allocator functions whose names we could just store in the checker state on the first encounter (like we store the IdentifierInfo).  
>> 
>> In addition, we could change the ownership attributes in such a way that each allocator would have a corresponding deallocator; for example, if we wanted to check matching allocators and deallocators. Annotated deallocators won't necessarily be one of the functions you know at compile time, so the DeallocatorKind enum would not cover it. I think, it's best if we kept a table on a side that would store this info (allocation function name, deallocator) and refer to the entries in the table from the state. (Take a look at MacOSKeychainAPIChecker - it's very similar to malloc, but it handles different allocator/deallocator pairs. I think a similar solution could work in this case as well. Other solutions that address these issues are welcome as well!)
> Attached is the patch that uses approach with a dynamic table that holds both allocator name and expected deallocator kind. This approach allows to keep any allocator names, either known or new ones. The table could be easily extended to hold additional data such as info about special deallocators, etc.

Please do not do this. The analyzer is not thread-safe today, but that's no reason why we should make it harder to make it thread-safe tomorrow.

Stepping back, there are many problems with this, the main one being that just keeping track of the function name isn't really good enough. MacOSKeychainAPIChecker can get away with it because its functions are 

I think all Anna meant is to have a static table, like MacOSKeychainAPIChecker. If/when we support allocator families defined at runtime, we can do this properly, with a mutable table as a field of the checker, but for now this is both overkill and error-prone.

Separately, I don't understand why you chose to inherit from SmallVector, rather than just using a SmallVector member. AllocDeallocDataVec isn't conceptually a vector, it's a type that maps family IDs to names and DeallocatorKinds.


Other comments:

+enum DeallocatorKind {
+  D_free,
+  D_delete,
+  D_deleteArray,
+  D_unknown
+};

Nitpicking: single-character enum prefixes feel strange to me. DK_?


-  enum Kind { // Reference to allocated memory.
-              Allocated,
-              // Reference to released/freed memory.
-              Released,
-              // The responsibility for freeing resources has transfered from
-              // this reference. A relinquished symbol should not be freed.
-              Relinquished } K;
+  // First two bits of Kind represent memory kind
+  static const unsigned K_MASK = 0x3;
+  // The rest bits keep an index to the AllocDeallocData
+  // that hold information about the allocator and deallocator functions
+  static const unsigned I_BASE = 2;
+  static const unsigned I_MASK = ~0 << I_BASE;
+  // A storage for memory kind and index to the AllocDeallocData
+  enum KindPlusData { // Reference to allocated memory.
+                     Allocated,
+                     // Reference to released/freed memory.
+                     Released,
+                     // The responsibility for freeing resources has transfered
+                     // from this reference. A relinquished symbol should not be
+                     // freed.
+                     Relinquished
+  } K;
   const Stmt *S;

This should be implemented using bitfields, and the bitfields should go after the Stmt pointer so that the total object size is smaller.


+  /// Auxiliary functions that return kind and print names of 
+  /// allocators/deallocators
+  DeallocatorKind GetDeallocKind(CheckerContext &C, const Expr *E) const;
+  void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, 
+                             const Expr *E) const;
+  void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, 
+                             const RefState *RS) const;
+  void PrintExpectedAllocName(raw_ostream &os, CheckerContext &C, 
+                              const Expr *DeallocExpr) const;

Hm. Doxygen will attach the comment to the first declaration and ignore the others. You can use groups to sort of get the effect you want, but it might be worth just being clearer anyway.

Also, the convention for function/method names in LLVM is now lowerCamelCase.

Thank you for switching to raw_ostream. :-)


+  bool isDefaultNonPtrPlacementNewDelete(const FunctionDecl *FD,
+                                         CheckerContext &C) const;

Very confusing method name. isStandardNewDelete is probably fine. There's nothing about "pointers" or "placement" that's strange here -- what you want to know is that this is the global declaration of operator new, and that it's not provided by the user.


+  if (FD->isReservedGlobalPlacementOperator())
+    return false;

This is still the wrong check. You need to check if there are any placement arguments, and you need to check that the Decl is not part of a class. Both are easy:

if (FD->getNumParams() != 1 || FD->isVariadic())
	return false;
if (isa<CXXMethodDecl>(FD))
	return false;


+      // Process direct calls to operator new/new[]/delete/delete[] functions
+      // as distinct from new/new[]/delete/delete[] expressions that are 
+      // processed by "checkPostStmt(const CXXNewExpr *" and 
+      // "checkPostStmt(const CXXDeleteExpr *" respectively

Missing a period, and also looks rather odd with the unbalanced parens. Maybe something like "...processed by the checkPostStmt callbacks for CXXNewExpr and CXXDeleteExpr"? Also, you can't really use "respectively" here because you're matching two items up with four items, although it is pretty clear what you mean.

Good thought here, though—I wouldn't have remembered to do this. Hopefully someday it will be unimportant anyway when we use CallEvent to model the allocator and deallocator calls inside CXXNewExpr and CXXDeleteExpr.


+        assert(0 && "not a new/delete oparator");

This is better written llvm_unreachable("not a new/delete operator");


+  // the return value from operator new is already bound and we don't want to 
+  // break this binding so we call MallocUpdateRefState instead of MallocMemAux

Capitalization, period.


+  StringRef Name = "";
+  if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
+    if (const FunctionDecl *FD = C.getCalleeDecl(CE)) {
+      if (FD->getDeclName().isIdentifier()) {
+        Name = FD->getName();
+      }
+    }
+  }
+
+  unsigned Idx = AllocDeallocData.GetOrInsert(Name, dKind);
   // Set the symbol's state to Allocated.
-  return state->set<RegionState>(Sym, RefState::getAllocated(CE));
-
+  return state->set<RegionState>(Sym, RefState::getAllocated(E, Idx));

Yeah, honestly it makes a lot more sense to me to just store the dKind in the RefState, and lazily derive the name when we need to print it at the end. (And not do it so specifically.)


+  if (const CXXDeleteExpr *DE = dyn_cast_or_null<CXXDeleteExpr>(E))
+    return DE->isArrayForm() ? D_deleteArray : D_delete;

You already tested for null, so you can just use dyn_cast here.


+  if (isa<ObjCMessageExpr>(E))
+    return D_free;

There are no Objective-C messages that free memory immediately; just take this out.


+  // get the exact name of an allocation function

Capitalization, spelling.


+      if (FD->getKind() == Decl::Function) {

Please don't do this; it rules out C++ methods. Actually, how about this, for the whole method?

bool MallocChecker::PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, 
                                          const Expr *E) const {
  if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
    // FIXME: This doesn't handle indirect calls.
    const FunctionDecl *FD = CE->getDirectCallee();
    if (!FD)
      return false;
    
    os << *FD;
    if (!FD->isOverloadedOperator())
      os << "()";
    return true;
  }

  if (const ObjCMessageExpr *Msg = dyn_cast<ObjCMessageExpr>(E)) {
    if (Msg->isInstanceMessage())
      os << "-";
    else
      os << "+";
    os << Msg->getSelector().getAsString();
    return true;
  }

  if (const CXXNewExpr *NE = dyn_cast<CXXNewExpr>(E)) {
    os << *NE->getOperatorNew();
    return true;
  }

  if (const CXXDeleteExpr *DE = dyn_cast<CXXDeleteExpr>(E)) {
    os << *DE->getOperatorDelete();
    return true;
  }

  return false;
}

This actually seems generally useful and could even go on CheckerContext as a convenience helper. Note the bool return, so that rather than come up with some placeholder text like "unknown means", we can just rephrase the message to not mention the allocator.


+  switch(dKind) {
+    case D_free: os << "malloc()"; return;
+    case D_delete: os << "operator new"; return;
+    case D_deleteArray: os << "operator new[]"; return;
+    case D_unknown:
+    default: assert(0 && "unknown or unhandled DeallocatorKind");
+  }

LLVM style is to not include default cases for an enum-covered switch statement. Also llvm_unreachable again.

Thanks for your patience in iterating on this!
Jordan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130222/09f4cd26/attachment.html>


More information about the cfe-commits mailing list