[PATCH][Review request] unix.Malloc checker improvement: +handling new/delete, +memory.MismatchedFree, +memory.MismatchedDelete, +improved display names for allocators/deallocators

Anton Yartsev anton.yartsev at gmail.com
Fri Feb 22 22:01:52 PST 2013


On 23.02.2013 6:47, Jordan Rose wrote:
>
> On Feb 22, 2013, at 7:24 , Anton Yartsev <anton.yartsev at gmail.com 
> <mailto:anton.yartsev at gmail.com>> wrote:
>
>> On 22.02.2013 9:30, Anna Zaks wrote:
>>>
>>> On Feb 21, 2013, at 6:26 PM, Anna Zaks <ganna at apple.com 
>>> <mailto:ganna at apple.com>> wrote:
>>>
>>>>
>>>> On Feb 21, 2013, at 6:00 PM, Anton Yartsev <anton.yartsev at gmail.com 
>>>> <mailto:anton.yartsev at gmail.com>> wrote:
>>>>
>>>>>>
>>>>>> On Feb 19, 2013, at 10:18 PM, Anton Yartsev 
>>>>>> <anton.yartsev at gmail.com <mailto:anton.yartsev at gmail.com>> wrote:
>>>>>>
>>>>>>> Hi, Jordan. Thanx for the review!
>>>>>>>
>>>>>>> Attached is the new version of the patch with all the comments 
>>>>>>> addressed. Also added support for directly called operator 
>>>>>>> new()/new[]() and operator delete()
>>>>>>>
>>>>>>> There is currently one problem with handling of operator 
>>>>>>> delete(). The following test
>>>>>>>
>>>>>>> void testDeleteOp1() {
>>>>>>>  int *p = (int *)malloc(sizeof(int));
>>>>>>>  operator delete(p); // FIXME: should complain "Argument to 
>>>>>>> operator delete() was allocated by malloc(), not operator new"
>>>>>>> }
>>>>>>>
>>>>>>> produce no warnings as attached RefState seem to be missing at 
>>>>>>> the point when checkPostStmt(const CallExpr *CE, CheckerContext 
>>>>>>> &C) callback is called for operator delete(p).
>>>>>>> I haven't investigated the problem deeply yet, intend to address 
>>>>>>> it parallel with the review.
>>>>>>>
>>>>>>>> +  if (NE->getNumPlacementArgs())
>>>>>>>> +    return;
>>>>>>>> +  // skip placement new operators as they may not allocate memory
>>>>>>>>
>>>>>>>> Two comments here:
>>>>>>>> - Please make sure all comments are complete, capitalized, and 
>>>>>>>> punctuated sentences. (This has the important one, 
>>>>>>>> "complete"....just missing capitalization and punctuation.)
>>>>>>> I'll try. Unfortunately I am not as good in English as I want to 
>>>>>>> be, so sorry for my grammar, syntax, and punctuation.
>>>>>>>
>>>>>>> -- 
>>>>>>> Anton
>>>>>>>
>>>>>>> <MallocChecker_v2.patch>
>>>>>>
>>>>>
>>>>> Hi Anna. Thanks for your comments! Attached is the new patch.
>>>>>
>>>>>> Just adding another kind as extra enumeration values does not 
>>>>>> seem right. Another option is to make Kind be a pointer to a 
>>>>>> static array, which contains objects recording all necessary info 
>>>>>> about each kind (MacOSKeychainAPIChecker uses this approach). 
>>>>>> This is probably an overkill for now, but is another option.
>>>>> Not sure that I have got an idea.
>>>>> The memory and deallocator kind are both set for a RefState. Do 
>>>>> you mean creating a static array with 'memory kinds' * 
>>>>> 'deallocator kind' elements for all possible combinations? Also 
>>>>> there is no necessary info other then the kind itself.
>>>>> Left for now.
>>>>
>>>> I think of ToBeReleasedWith* as being different types of Allocate; 
>>>> I don't think they should be separate values in the same enum. It's 
>>>> also unfortunate to have to copy the constant values in both places 
>>>> - DeallocatorKind and RefState::Kind. Maybe you could restructure 
>>>> this similarly to how this is done in SVals.h?
>>>>>
>>>>>> +  const FunctionDecl *getCalleeDecl() const { return CalleeDecl; }
>>>>>> Do we only store the name of the allocator declaration here?
>>>>
>>>> If the Decl is always an allocator Decl, we should change the name 
>>>> of the method to be more descriptive.
>>>>>> Do we need to store this in the state? (Growing states implies 
>>>>>> memory overhead.) Can't this be easily implied from the kind?
>>>>> Kind can't give us information about the name of an allocator that 
>>>>> can be malloc(), realloc(), a user function with ownership_takes 
>>>>> attribute, etc.
>>>>> One solution to avoid keeping a CalleeDecl in RefState is to 
>>>>> rollback to CallExpr::getDirectCallee() from 
>>>>> CheckerContext::getCalleeDec() and to print "malloc()" in case of 
>>>>> indirect calls.
>>>>
>>>> Ok, I see.
>>>
>>> After thinking about it some more, I do not think we should add an 
>>> extra pointer to the state to differentiate between few allocator 
>>> functions. In the case when we do not have ownership attributes, we 
>>> only have few possible allocators, whose names we know ahead of 
>>> time. In case we support ownership attributes, we are likely to have 
>>> few allocator functions whose names we could just store in the 
>>> checker state on the first encounter (like we store the 
>>> IdentifierInfo).
>>>
>>> In addition, we could change the ownership attributes in such a way 
>>> that each allocator would have a corresponding deallocator; for 
>>> example, if we wanted to check matching allocators and deallocators. 
>>> Annotated deallocators won't necessarily be one of the functions you 
>>> know at compile time, so the DeallocatorKind enum would not cover 
>>> it. I think, it's best if we kept a table on a side that would store 
>>> this info (allocation function name, deallocator) and refer to the 
>>> entries in the table from the state. (Take a look at 
>>> MacOSKeychainAPIChecker - it's very similar to malloc, but it 
>>> handles different allocator/deallocator pairs. I think a similar 
>>> solution could work in this case as well. Other solutions that 
>>> address these issues are welcome as well!)
>> Attached is the patch that uses approach with a dynamic table that 
>> holds both allocator name and expected deallocator kind. This 
>> approach allows to keep any allocator names, either known or new 
>> ones. The table could be easily extended to hold additional data such 
>> as info about special deallocators, etc.
>
> Please do not do this. The analyzer is not thread-safe today, but 
> that's no reason why we should make it harder to make it thread-safe 
> tomorrow.
>
> Stepping back, there are many problems with this, the main one being 
> that just keeping track of the function name isn't really good enough. 
> MacOSKeychainAPIChecker can get away with it because its functions are
Could you, please, send the remainder of the sentence? Tried to guess, 
but failed.

>
> I think all Anna meant is to have a /static/ table, like 
> MacOSKeychainAPIChecker. If/when we support allocator families defined 
> at runtime, we can do this properly, with a mutable table as a field 
> of the checker, but for now this is both overkill and error-prone.
> Separately, I don't understand why you chose to /inherit/ from 
> SmallVector, rather than just using a SmallVector member. 
> AllocDeallocDataVec isn't conceptually a vector, it's a type that maps 
> family IDs to names and DeallocatorKinds.
>
>
> Other comments:
>
> +enum DeallocatorKind {
> +  D_free,
> +  D_delete,
> +  D_deleteArray,
> +  D_unknown
> +};
>
> Nitpicking: single-character enum prefixes feel strange to me. DK_?
>
>
> -  enum Kind { // Reference to allocated memory.
> -              Allocated,
> -              // Reference to released/freed memory.
> -              Released,
> -              // The responsibility for freeing resources has 
> transfered from
> -              // this reference. A relinquished symbol should not be 
> freed.
> -              Relinquished } K;
> +  // First two bits of Kind represent memory kind
> +  static const unsigned K_MASK = 0x3;
> +  // The rest bits keep an index to the AllocDeallocData
> +  // that hold information about the allocator and deallocator functions
> +  static const unsigned I_BASE = 2;
> +  static const unsigned I_MASK = ~0 << I_BASE;
> +  // A storage for memory kind and index to the AllocDeallocData
> +  enum KindPlusData { // Reference to allocated memory.
> +                     Allocated,
> +                     // Reference to released/freed memory.
> +                     Released,
> +                     // The responsibility for freeing resources has 
> transfered
> +                     // from this reference. A relinquished symbol 
> should not be
> +                     // freed.
> +                     Relinquished
> +  } K;
>    const Stmt *S;
>
> This should be implemented using bitfields, and the bitfields should 
> go after the Stmt pointer so that the total object size is smaller.
>
>
> +  /// Auxiliary functions that return kind and print names of
> +  /// allocators/deallocators
> +  DeallocatorKind GetDeallocKind(CheckerContext &C, const Expr *E) const;
> +  void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C,
> +                             const Expr *E) const;
> +  void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C,
> +                             const RefState *RS) const;
> +  void PrintExpectedAllocName(raw_ostream &os, CheckerContext &C,
> +                              const Expr *DeallocExpr) const;
>
> Hm. Doxygen will attach the comment to the first declaration and 
> ignore the others. You can use groups 
> <http://www.stack.nl/%7Edimitri/doxygen/manual/grouping.html#memgroup> to 
> sort of get the effect you want, but it might be worth just being 
> clearer anyway.
>
> Also, the convention for function/method names in LLVM is now 
> lowerCamelCase.
>
> Thank you for switching to raw_ostream. :-)
>
>
> +  bool isDefaultNonPtrPlacementNewDelete(const FunctionDecl *FD,
> +                                         CheckerContext &C) const;
>
> Very confusing method name. isStandardNewDelete is probably fine. 
> There's nothing about "pointers" or "placement" that's strange here -- 
> what you want to know is that this is the global declaration of 
> operator new, and that it's not provided by the user.
>
>
> +  if (FD->isReservedGlobalPlacementOperator())
> +    return false;
>
> This is /still/ the wrong check. You need to check if there are 
> /any/ placement arguments, and you need to check that the Decl is not 
> part of a class. Both are easy:
This check is done to throw out standard placement new/deletes with an 
additional pointer parameter but to handle standard placement nothrow 
new/deletes.
As an option we can detect nothrow new/deletes and skip all other 
placement versions.

> if (FD->getNumParams() != 1 || FD->isVariadic())
> return false;
> if (isa<CXXMethodDecl>(FD))
> return false;
>
>
> +      // Process direct calls to operator new/new[]/delete/delete[] 
> functions
> +      // as distinct from new/new[]/delete/delete[] expressions that are
> +      // processed by "checkPostStmt(const CXXNewExpr *" and
> +      // "checkPostStmt(const CXXDeleteExpr *" respectively
>
> Missing a period, and also looks rather odd with the unbalanced 
> parens. Maybe something like "...processed by the checkPostStmt 
> callbacks for CXXNewExpr and CXXDeleteExpr"? Also, you can't really 
> use "respectively" here because you're matching two items up with four 
> items, although it is pretty clear what you mean.
>
> Good thought here, though—I wouldn't have remembered to do this. 
> Hopefully someday it will be unimportant anyway when we use CallEvent 
> to model the allocator and deallocator calls inside CXXNewExpr and 
> CXXDeleteExpr.
>
>
> +        assert(0 && "not a new/delete oparator");
>
> This is better written llvm_unreachable("not a new/delete operator");
>
>
> +  // the return value from operator new is already bound and we don't 
> want to
> +  // break this binding so we call MallocUpdateRefState instead of 
> MallocMemAux
>
> Capitalization, period.
>
>
> +  StringRef Name = "";
> +  if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
> +    if (const FunctionDecl *FD = C.getCalleeDecl(CE)) {
> +      if (FD->getDeclName().isIdentifier()) {
> +        Name = FD->getName();
> +      }
> +    }
> +  }
> +
> +  unsigned Idx = AllocDeallocData.GetOrInsert(Name, dKind);
>    // Set the symbol's state to Allocated.
> -  return state->set<RegionState>(Sym, RefState::getAllocated(CE));
> -
> +  return state->set<RegionState>(Sym, RefState::getAllocated(E, Idx));
>
> Yeah, honestly it makes a lot more sense to me to just store the dKind 
> in the RefState, and lazily derive the name when we need to print it 
> at the end. (And not do it so specifically.)
>
>
> +  if (const CXXDeleteExpr *DE = dyn_cast_or_null<CXXDeleteExpr>(E))
> +    return DE->isArrayForm() ? D_deleteArray : D_delete;
>
> You already tested for null, so you can just use dyn_cast here.
>
>
> +  if (isa<ObjCMessageExpr>(E))
> +    return D_free;
>
> There are no Objective-C messages that free memory immediately; just 
> take this out.
Should be fixed somehow for situations when an ObjCMessageExpr is passed 
as a deallocator to FreeMemAux(). Just removing the above code will 
cause GetDeallocKind() to return D_unknown for an ObjCMessageExpr and we 
end up with assert(0 && "unknown or unhandled DeallocatorKind"); 
triggering in PrintExpectedAllocName().  malloc.mm test contain 
testcases illustrating this.

>
>
> +  // get the exact name of an allocation function
>
> Capitalization, spelling.
>
>
> +      if (FD->getKind() == Decl::Function) {
>
> Please don't do this; it rules out C++ methods. Actually, how about 
> this, for the whole method?
>
> bool MallocChecker::PrintAllocDeallocName(raw_ostream &os, 
> CheckerContext &C,
>     const Expr *E) const {
>   if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
>     // FIXME: This doesn't handle indirect calls.
>     const FunctionDecl *FD = CE->getDirectCallee();
>     if (!FD)
>       return false;
>     os << *FD;
>     if (!FD->isOverloadedOperator())
>       os << "()";
>     return true;
>   }
>
>   if (const ObjCMessageExpr *Msg = dyn_cast<ObjCMessageExpr>(E)) {
>     if (Msg->isInstanceMessage())
>       os << "-";
>     else
>       os << "+";
>     os << Msg->getSelector().getAsString();
>     return true;
>   }
>
>   if (const CXXNewExpr *NE = dyn_cast<CXXNewExpr>(E)) {
>     os << *NE->getOperatorNew();
>     return true;
>   }
>
>   if (const CXXDeleteExpr *DE = dyn_cast<CXXDeleteExpr>(E)) {
>     os << *DE->getOperatorDelete();
>     return true;
>   }
>
>   return false;
> }
>
> This actually seems generally useful and could even go on 
> CheckerContext as a convenience helper. Note the bool return, so that 
> rather than come up with some placeholder text like "unknown means", 
> we can just rephrase the message to not mention the allocator.
Great!

>
>
> +  switch(dKind) {
> +    case D_free: os << "malloc()"; return;
> +    case D_delete: os << "operator new"; return;
> +    case D_deleteArray: os << "operator new[]"; return;
> +    case D_unknown:
> +    default: assert(0 && "unknown or unhandled DeallocatorKind");
> +  }
>
> LLVM style is to not include default cases for an enum-covered switch 
> statement. Also llvm_unreachable again.
>
> Thanks for your patience in iterating on this!
> Jordan
So we rollback to the approach with a single DeallocatorKind implicitly 
stored inside RefState::Kind as subkinds are stored inside 
SVal::BaseKind in Sval.h, right?

-- 
Anton

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130223/6f21dc04/attachment.html>


More information about the cfe-commits mailing list