[PATCH][Review request] unix.Malloc checker improvement: +handling new/delete, +memory.MismatchedFree, +memory.MismatchedDelete, +improved display names for allocators/deallocators
Anton Yartsev
anton.yartsev at gmail.com
Fri Feb 22 22:01:52 PST 2013
On 23.02.2013 6:47, Jordan Rose wrote:
>
> On Feb 22, 2013, at 7:24 , Anton Yartsev <anton.yartsev at gmail.com
> <mailto:anton.yartsev at gmail.com>> wrote:
>
>> On 22.02.2013 9:30, Anna Zaks wrote:
>>>
>>> On Feb 21, 2013, at 6:26 PM, Anna Zaks <ganna at apple.com
>>> <mailto:ganna at apple.com>> wrote:
>>>
>>>>
>>>> On Feb 21, 2013, at 6:00 PM, Anton Yartsev <anton.yartsev at gmail.com
>>>> <mailto:anton.yartsev at gmail.com>> wrote:
>>>>
>>>>>>
>>>>>> On Feb 19, 2013, at 10:18 PM, Anton Yartsev
>>>>>> <anton.yartsev at gmail.com <mailto:anton.yartsev at gmail.com>> wrote:
>>>>>>
>>>>>>> Hi, Jordan. Thanx for the review!
>>>>>>>
>>>>>>> Attached is the new version of the patch with all the comments
>>>>>>> addressed. Also added support for directly called operator
>>>>>>> new()/new[]() and operator delete()
>>>>>>>
>>>>>>> There is currently one problem with handling of operator
>>>>>>> delete(). The following test
>>>>>>>
>>>>>>> void testDeleteOp1() {
>>>>>>> int *p = (int *)malloc(sizeof(int));
>>>>>>> operator delete(p); // FIXME: should complain "Argument to
>>>>>>> operator delete() was allocated by malloc(), not operator new"
>>>>>>> }
>>>>>>>
>>>>>>> produce no warnings as attached RefState seem to be missing at
>>>>>>> the point when checkPostStmt(const CallExpr *CE, CheckerContext
>>>>>>> &C) callback is called for operator delete(p).
>>>>>>> I haven't investigated the problem deeply yet, intend to address
>>>>>>> it parallel with the review.
>>>>>>>
>>>>>>>> + if (NE->getNumPlacementArgs())
>>>>>>>> + return;
>>>>>>>> + // skip placement new operators as they may not allocate memory
>>>>>>>>
>>>>>>>> Two comments here:
>>>>>>>> - Please make sure all comments are complete, capitalized, and
>>>>>>>> punctuated sentences. (This has the important one,
>>>>>>>> "complete"....just missing capitalization and punctuation.)
>>>>>>> I'll try. Unfortunately I am not as good in English as I want to
>>>>>>> be, so sorry for my grammar, syntax, and punctuation.
>>>>>>>
>>>>>>> --
>>>>>>> Anton
>>>>>>>
>>>>>>> <MallocChecker_v2.patch>
>>>>>>
>>>>>
>>>>> Hi Anna. Thanks for your comments! Attached is the new patch.
>>>>>
>>>>>> Just adding another kind as extra enumeration values does not
>>>>>> seem right. Another option is to make Kind be a pointer to a
>>>>>> static array, which contains objects recording all necessary info
>>>>>> about each kind (MacOSKeychainAPIChecker uses this approach).
>>>>>> This is probably an overkill for now, but is another option.
>>>>> Not sure that I have got an idea.
>>>>> The memory and deallocator kind are both set for a RefState. Do
>>>>> you mean creating a static array with 'memory kinds' *
>>>>> 'deallocator kind' elements for all possible combinations? Also
>>>>> there is no necessary info other then the kind itself.
>>>>> Left for now.
>>>>
>>>> I think of ToBeReleasedWith* as being different types of Allocate;
>>>> I don't think they should be separate values in the same enum. It's
>>>> also unfortunate to have to copy the constant values in both places
>>>> - DeallocatorKind and RefState::Kind. Maybe you could restructure
>>>> this similarly to how this is done in SVals.h?
>>>>>
>>>>>> + const FunctionDecl *getCalleeDecl() const { return CalleeDecl; }
>>>>>> Do we only store the name of the allocator declaration here?
>>>>
>>>> If the Decl is always an allocator Decl, we should change the name
>>>> of the method to be more descriptive.
>>>>>> Do we need to store this in the state? (Growing states implies
>>>>>> memory overhead.) Can't this be easily implied from the kind?
>>>>> Kind can't give us information about the name of an allocator that
>>>>> can be malloc(), realloc(), a user function with ownership_takes
>>>>> attribute, etc.
>>>>> One solution to avoid keeping a CalleeDecl in RefState is to
>>>>> rollback to CallExpr::getDirectCallee() from
>>>>> CheckerContext::getCalleeDec() and to print "malloc()" in case of
>>>>> indirect calls.
>>>>
>>>> Ok, I see.
>>>
>>> After thinking about it some more, I do not think we should add an
>>> extra pointer to the state to differentiate between few allocator
>>> functions. In the case when we do not have ownership attributes, we
>>> only have few possible allocators, whose names we know ahead of
>>> time. In case we support ownership attributes, we are likely to have
>>> few allocator functions whose names we could just store in the
>>> checker state on the first encounter (like we store the
>>> IdentifierInfo).
>>>
>>> In addition, we could change the ownership attributes in such a way
>>> that each allocator would have a corresponding deallocator; for
>>> example, if we wanted to check matching allocators and deallocators.
>>> Annotated deallocators won't necessarily be one of the functions you
>>> know at compile time, so the DeallocatorKind enum would not cover
>>> it. I think, it's best if we kept a table on a side that would store
>>> this info (allocation function name, deallocator) and refer to the
>>> entries in the table from the state. (Take a look at
>>> MacOSKeychainAPIChecker - it's very similar to malloc, but it
>>> handles different allocator/deallocator pairs. I think a similar
>>> solution could work in this case as well. Other solutions that
>>> address these issues are welcome as well!)
>> Attached is the patch that uses approach with a dynamic table that
>> holds both allocator name and expected deallocator kind. This
>> approach allows to keep any allocator names, either known or new
>> ones. The table could be easily extended to hold additional data such
>> as info about special deallocators, etc.
>
> Please do not do this. The analyzer is not thread-safe today, but
> that's no reason why we should make it harder to make it thread-safe
> tomorrow.
>
> Stepping back, there are many problems with this, the main one being
> that just keeping track of the function name isn't really good enough.
> MacOSKeychainAPIChecker can get away with it because its functions are
Could you, please, send the remainder of the sentence? Tried to guess,
but failed.
>
> I think all Anna meant is to have a /static/ table, like
> MacOSKeychainAPIChecker. If/when we support allocator families defined
> at runtime, we can do this properly, with a mutable table as a field
> of the checker, but for now this is both overkill and error-prone.
> Separately, I don't understand why you chose to /inherit/ from
> SmallVector, rather than just using a SmallVector member.
> AllocDeallocDataVec isn't conceptually a vector, it's a type that maps
> family IDs to names and DeallocatorKinds.
>
>
> Other comments:
>
> +enum DeallocatorKind {
> + D_free,
> + D_delete,
> + D_deleteArray,
> + D_unknown
> +};
>
> Nitpicking: single-character enum prefixes feel strange to me. DK_?
>
>
> - enum Kind { // Reference to allocated memory.
> - Allocated,
> - // Reference to released/freed memory.
> - Released,
> - // The responsibility for freeing resources has
> transfered from
> - // this reference. A relinquished symbol should not be
> freed.
> - Relinquished } K;
> + // First two bits of Kind represent memory kind
> + static const unsigned K_MASK = 0x3;
> + // The rest bits keep an index to the AllocDeallocData
> + // that hold information about the allocator and deallocator functions
> + static const unsigned I_BASE = 2;
> + static const unsigned I_MASK = ~0 << I_BASE;
> + // A storage for memory kind and index to the AllocDeallocData
> + enum KindPlusData { // Reference to allocated memory.
> + Allocated,
> + // Reference to released/freed memory.
> + Released,
> + // The responsibility for freeing resources has
> transfered
> + // from this reference. A relinquished symbol
> should not be
> + // freed.
> + Relinquished
> + } K;
> const Stmt *S;
>
> This should be implemented using bitfields, and the bitfields should
> go after the Stmt pointer so that the total object size is smaller.
>
>
> + /// Auxiliary functions that return kind and print names of
> + /// allocators/deallocators
> + DeallocatorKind GetDeallocKind(CheckerContext &C, const Expr *E) const;
> + void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C,
> + const Expr *E) const;
> + void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C,
> + const RefState *RS) const;
> + void PrintExpectedAllocName(raw_ostream &os, CheckerContext &C,
> + const Expr *DeallocExpr) const;
>
> Hm. Doxygen will attach the comment to the first declaration and
> ignore the others. You can use groups
> <http://www.stack.nl/%7Edimitri/doxygen/manual/grouping.html#memgroup> to
> sort of get the effect you want, but it might be worth just being
> clearer anyway.
>
> Also, the convention for function/method names in LLVM is now
> lowerCamelCase.
>
> Thank you for switching to raw_ostream. :-)
>
>
> + bool isDefaultNonPtrPlacementNewDelete(const FunctionDecl *FD,
> + CheckerContext &C) const;
>
> Very confusing method name. isStandardNewDelete is probably fine.
> There's nothing about "pointers" or "placement" that's strange here --
> what you want to know is that this is the global declaration of
> operator new, and that it's not provided by the user.
>
>
> + if (FD->isReservedGlobalPlacementOperator())
> + return false;
>
> This is /still/ the wrong check. You need to check if there are
> /any/ placement arguments, and you need to check that the Decl is not
> part of a class. Both are easy:
This check is done to throw out standard placement new/deletes with an
additional pointer parameter but to handle standard placement nothrow
new/deletes.
As an option we can detect nothrow new/deletes and skip all other
placement versions.
> if (FD->getNumParams() != 1 || FD->isVariadic())
> return false;
> if (isa<CXXMethodDecl>(FD))
> return false;
>
>
> + // Process direct calls to operator new/new[]/delete/delete[]
> functions
> + // as distinct from new/new[]/delete/delete[] expressions that are
> + // processed by "checkPostStmt(const CXXNewExpr *" and
> + // "checkPostStmt(const CXXDeleteExpr *" respectively
>
> Missing a period, and also looks rather odd with the unbalanced
> parens. Maybe something like "...processed by the checkPostStmt
> callbacks for CXXNewExpr and CXXDeleteExpr"? Also, you can't really
> use "respectively" here because you're matching two items up with four
> items, although it is pretty clear what you mean.
>
> Good thought here, though—I wouldn't have remembered to do this.
> Hopefully someday it will be unimportant anyway when we use CallEvent
> to model the allocator and deallocator calls inside CXXNewExpr and
> CXXDeleteExpr.
>
>
> + assert(0 && "not a new/delete oparator");
>
> This is better written llvm_unreachable("not a new/delete operator");
>
>
> + // the return value from operator new is already bound and we don't
> want to
> + // break this binding so we call MallocUpdateRefState instead of
> MallocMemAux
>
> Capitalization, period.
>
>
> + StringRef Name = "";
> + if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
> + if (const FunctionDecl *FD = C.getCalleeDecl(CE)) {
> + if (FD->getDeclName().isIdentifier()) {
> + Name = FD->getName();
> + }
> + }
> + }
> +
> + unsigned Idx = AllocDeallocData.GetOrInsert(Name, dKind);
> // Set the symbol's state to Allocated.
> - return state->set<RegionState>(Sym, RefState::getAllocated(CE));
> -
> + return state->set<RegionState>(Sym, RefState::getAllocated(E, Idx));
>
> Yeah, honestly it makes a lot more sense to me to just store the dKind
> in the RefState, and lazily derive the name when we need to print it
> at the end. (And not do it so specifically.)
>
>
> + if (const CXXDeleteExpr *DE = dyn_cast_or_null<CXXDeleteExpr>(E))
> + return DE->isArrayForm() ? D_deleteArray : D_delete;
>
> You already tested for null, so you can just use dyn_cast here.
>
>
> + if (isa<ObjCMessageExpr>(E))
> + return D_free;
>
> There are no Objective-C messages that free memory immediately; just
> take this out.
Should be fixed somehow for situations when an ObjCMessageExpr is passed
as a deallocator to FreeMemAux(). Just removing the above code will
cause GetDeallocKind() to return D_unknown for an ObjCMessageExpr and we
end up with assert(0 && "unknown or unhandled DeallocatorKind");
triggering in PrintExpectedAllocName(). malloc.mm test contain
testcases illustrating this.
>
>
> + // get the exact name of an allocation function
>
> Capitalization, spelling.
>
>
> + if (FD->getKind() == Decl::Function) {
>
> Please don't do this; it rules out C++ methods. Actually, how about
> this, for the whole method?
>
> bool MallocChecker::PrintAllocDeallocName(raw_ostream &os,
> CheckerContext &C,
> const Expr *E) const {
> if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
> // FIXME: This doesn't handle indirect calls.
> const FunctionDecl *FD = CE->getDirectCallee();
> if (!FD)
> return false;
> os << *FD;
> if (!FD->isOverloadedOperator())
> os << "()";
> return true;
> }
>
> if (const ObjCMessageExpr *Msg = dyn_cast<ObjCMessageExpr>(E)) {
> if (Msg->isInstanceMessage())
> os << "-";
> else
> os << "+";
> os << Msg->getSelector().getAsString();
> return true;
> }
>
> if (const CXXNewExpr *NE = dyn_cast<CXXNewExpr>(E)) {
> os << *NE->getOperatorNew();
> return true;
> }
>
> if (const CXXDeleteExpr *DE = dyn_cast<CXXDeleteExpr>(E)) {
> os << *DE->getOperatorDelete();
> return true;
> }
>
> return false;
> }
>
> This actually seems generally useful and could even go on
> CheckerContext as a convenience helper. Note the bool return, so that
> rather than come up with some placeholder text like "unknown means",
> we can just rephrase the message to not mention the allocator.
Great!
>
>
> + switch(dKind) {
> + case D_free: os << "malloc()"; return;
> + case D_delete: os << "operator new"; return;
> + case D_deleteArray: os << "operator new[]"; return;
> + case D_unknown:
> + default: assert(0 && "unknown or unhandled DeallocatorKind");
> + }
>
> LLVM style is to not include default cases for an enum-covered switch
> statement. Also llvm_unreachable again.
>
> Thanks for your patience in iterating on this!
> Jordan
So we rollback to the approach with a single DeallocatorKind implicitly
stored inside RefState::Kind as subkinds are stored inside
SVal::BaseKind in Sval.h, right?
--
Anton
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130223/6f21dc04/attachment.html>
More information about the cfe-commits
mailing list