<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Feb 22, 2013, at 7:24 , Anton Yartsev <<a href="mailto:anton.yartsev@gmail.com">anton.yartsev@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 22.02.2013 9:30, Anna Zaks wrote:<br>
</div>
<blockquote cite="mid:254908E4-011B-4100-BE1A-48A980CDD204@apple.com" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<br>
<div>
<div>On Feb 21, 2013, at 6:26 PM, Anna Zaks <<a moz-do-not-send="true" href="mailto:ganna@apple.com">ganna@apple.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;"><br>
<div>
<div>On Feb 21, 2013, at 6:00 PM, Anton Yartsev <<a moz-do-not-send="true" href="mailto:anton.yartsev@gmail.com">anton.yartsev@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div text="#000000" bgcolor="#FFFFFF">
<blockquote cite="mid:2CAEFC0E-8B7A-4301-9143-CC1EC37A582E@apple.com" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<br>
<div>
<div>On Feb 19, 2013, at 10:18 PM, Anton Yartsev
<<a moz-do-not-send="true" href="mailto:anton.yartsev@gmail.com">anton.yartsev@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">Hi, Jordan. Thanx for the
review!<br>
<br>
Attached is the new version of the patch with
all the comments addressed. Also added support
for directly called operator new()/new[]() and
operator delete()<br>
<br>
There is currently one problem with handling of
operator delete(). The following test<br>
<br>
void testDeleteOp1() {<br>
int *p = (int *)malloc(sizeof(int));<br>
operator delete(p); // FIXME: should complain
"Argument to operator delete() was allocated by
malloc(), not operator new"<br>
}<br>
<br>
produce no warnings as attached RefState seem to
be missing at the point when checkPostStmt(const
CallExpr *CE, CheckerContext &C) callback is
called for operator delete(p).<br>
I haven't investigated the problem deeply yet,
intend to address it parallel with the review.<br>
<br>
<blockquote type="cite">+ if
(NE->getNumPlacementArgs())<br>
+ return;<br>
+ // skip placement new operators as they may
not allocate memory<br>
<br>
Two comments here:<br>
- Please make sure all comments are complete,
capitalized, and punctuated sentences. (This
has the important one, "complete"....just
missing capitalization and punctuation.)<br>
</blockquote>
I'll try. Unfortunately I am not as good in
English as I want to be, so sorry for my
grammar, syntax, and punctuation.<br>
<br>
-- <br>
Anton<br>
<br>
<span><MallocChecker_v2.patch></span></blockquote>
<div><br>
</div>
</div>
</blockquote>
<br>
Hi Anna. Thanks for your comments! Attached is the new
patch.<br>
<br>
<blockquote cite="mid:2CAEFC0E-8B7A-4301-9143-CC1EC37A582E@apple.com" type="cite">
<div>Just adding another kind as extra enumeration
values does not seem right. Another option is to
make Kind be a pointer to a static array, which
contains objects recording all necessary info
about each kind (<span style="color: rgb(0, 97,
65); font-family: Monaco; font-size: 11px; ">MacOSKeychainAPIChecker</span> uses
this approach). This is probably an overkill for
now, but is another option.</div>
</blockquote>
Not sure that I have got an idea.<br>
The memory and deallocator kind are both set for a
RefState. Do you mean creating a static array with
'memory kinds' * 'deallocator kind' elements for all
possible combinations? Also there is no necessary info
other then the kind itself.<br>
Left for now.<br>
</div>
</blockquote>
<div><br>
</div>
<div>I think of ToBeReleasedWith* as being different types
of Allocate; I don't think they should be separate
values in the same enum. It's also unfortunate to have
to copy the constant values in both places -
DeallocatorKind and RefState::Kind. Maybe you could
restructure this similarly to how this is done in
SVals.h?<br>
<blockquote type="cite">
<div text="#000000" bgcolor="#FFFFFF"><br>
<blockquote cite="mid:2CAEFC0E-8B7A-4301-9143-CC1EC37A582E@apple.com" type="cite">
<div>+ const FunctionDecl *getCalleeDecl() const
{ return CalleeDecl; }</div>
<div>Do we only store the name of the allocator
declaration here? </div>
</blockquote>
</div>
</blockquote>
<div><br>
</div>
<div>If the Decl is always an allocator Decl, we should
change the name of the method to be more descriptive.</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
<blockquote cite="mid:254908E4-011B-4100-BE1A-48A980CDD204@apple.com" type="cite">
<div>
<blockquote type="cite">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space;">
<div>
<blockquote type="cite">
<div text="#000000" bgcolor="#FFFFFF">
<blockquote cite="mid:2CAEFC0E-8B7A-4301-9143-CC1EC37A582E@apple.com" type="cite">
<div>Do we need to store this in the state? (Growing
states implies memory overhead.) Can't this be
easily implied from the kind?</div>
</blockquote>
Kind can't give us information about the name of an
allocator that can be malloc(), realloc(), a user
function with ownership_takes attribute, etc.</div>
</blockquote>
<blockquote type="cite">
<div text="#000000" bgcolor="#FFFFFF"> One solution to
avoid keeping a CalleeDecl in RefState is to rollback
to CallExpr::getDirectCallee() from
CheckerContext::getCalleeDec() and to print "malloc()"
in case of indirect calls.<br>
</div>
</blockquote>
<div><br>
</div>
Ok, I see.<br>
</div>
</div>
</blockquote>
<div><br>
</div>
After thinking about it some more, I do not think we should add
an extra pointer to the state to differentiate between few
allocator functions. In the case when we do not have ownership
attributes, we only have few possible allocators, whose names we
know ahead of time. In case we support ownership attributes, we
are likely to have few allocator functions whose names we could
just store in the checker state on the first encounter (like we
store the IdentifierInfo). </div>
<div><br>
</div>
<div>In addition, we could change the ownership attributes in such
a way that each allocator would have a corresponding
deallocator; for example, if we wanted to check matching
allocators and deallocators. Annotated deallocators won't
necessarily be one of the functions you know at compile time, so
the DeallocatorKind enum would not cover it. I think, it's best
if we kept a table on a side that would store this info
(allocation function name, deallocator) and refer to the entries
in the table from the state. (Take a look at
MacOSKeychainAPIChecker - it's very similar to malloc, but it
handles different allocator/deallocator pairs. I think a similar
solution could work in this case as well. Other solutions that
address these issues are welcome as well!)</div>
</blockquote>
Attached is the patch that uses approach with a dynamic table that
holds both allocator name and expected deallocator kind. This
approach allows to keep any allocator names, either known or new
ones. The table could be easily extended to hold additional data
such as info about special deallocators, etc.<br></div></blockquote></div><br><div>Please do not do this. The analyzer is not thread-safe today, but that's no reason why we should make it harder to make it thread-safe tomorrow.</div><div><br></div><div>Stepping back, there are many problems with this, the main one being that just keeping track of the function name isn't really good enough. MacOSKeychainAPIChecker can get away with it because its functions are </div><div><br></div><div>I think all Anna meant is to have a <i>static</i> table, like MacOSKeychainAPIChecker. If/when we support allocator families defined at runtime, we can do this properly, with a mutable table as a field of the checker, but for now this is both overkill and error-prone.</div><div><br></div><div>Separately, I don't understand why you chose to <i>inherit</i> from SmallVector, rather than just using a SmallVector member. AllocDeallocDataVec isn't conceptually a vector, it's a type that maps family IDs to names and DeallocatorKinds.</div><div><br></div><div><br></div><div>Other comments:</div><div><br></div><div><div>+enum DeallocatorKind {</div><div>+ D_free,</div><div>+ D_delete,</div><div>+ D_deleteArray,</div><div>+ D_unknown</div><div>+};</div></div><div><br></div><div>Nitpicking: single-character enum prefixes feel strange to me. DK_?</div><div><br></div><div><br></div><div><div>- enum Kind { // Reference to allocated memory.</div><div>- Allocated,</div><div>- // Reference to released/freed memory.</div><div>- Released,</div><div>- // The responsibility for freeing resources has transfered from</div><div>- // this reference. A relinquished symbol should not be freed.</div><div>- Relinquished } K;</div><div>+ // First two bits of Kind represent memory kind</div><div>+ static const unsigned K_MASK = 0x3;</div><div>+ // The rest bits keep an index to the AllocDeallocData</div><div>+ // that hold information about the allocator and deallocator functions</div><div>+ static const unsigned I_BASE = 2;</div><div>+ static const unsigned I_MASK = ~0 << I_BASE;</div><div>+ // A storage for memory kind and index to the AllocDeallocData</div><div>+ enum KindPlusData { // Reference to allocated memory.</div><div>+ Allocated,</div><div>+ // Reference to released/freed memory.</div><div>+ Released,</div><div>+ // The responsibility for freeing resources has transfered</div><div>+ // from this reference. A relinquished symbol should not be</div><div>+ // freed.</div><div>+ Relinquished</div><div>+ } K;</div><div> const Stmt *S;</div></div><div><br></div><div>This should be implemented using bitfields, and the bitfields should go after the Stmt pointer so that the total object size is smaller.</div><div><br></div><div><br></div><div><div><div>+ /// Auxiliary functions that return kind and print names of </div><div>+ /// allocators/deallocators</div><div>+ DeallocatorKind GetDeallocKind(CheckerContext &C, const Expr *E) const;</div><div>+ void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, </div><div>+ const Expr *E) const;</div><div>+ void PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, </div><div>+ const RefState *RS) const;</div><div>+ void PrintExpectedAllocName(raw_ostream &os, CheckerContext &C, </div><div>+ const Expr *DeallocExpr) const;</div></div></div><div><br></div><div>Hm. Doxygen will attach the comment to the first declaration and ignore the others. You can use <a href="http://www.stack.nl/~dimitri/doxygen/manual/grouping.html#memgroup">groups</a> to sort of get the effect you want, but it might be worth just being clearer anyway.</div><div><br></div><div>Also, the convention for function/method names in LLVM is now lowerCamelCase.</div><div><br></div><div>Thank you for switching to raw_ostream. :-)</div><div><br></div><div><br></div><div><div>+ bool isDefaultNonPtrPlacementNewDelete(const FunctionDecl *FD,</div><div>+ CheckerContext &C) const;</div></div><div><br></div><div>Very confusing method name. isStandardNewDelete is probably fine. There's nothing about "pointers" or "placement" that's strange here -- what you want to know is that this is the global declaration of operator new, and that it's not provided by the user.</div><div><br></div><div><br></div><div><div>+ if (FD->isReservedGlobalPlacementOperator())</div></div><div><div>+ return false;</div></div><div><br></div><div>This is <i>still</i> the wrong check. You need to check if there are <i>any</i> placement arguments, and you need to check that the Decl is not part of a class. Both are easy:</div><div><br></div><div>if (FD->getNumParams() != 1 || FD->isVariadic())</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>return false;</div><div>if (isa<CXXMethodDecl>(FD))</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>return false;</div><div><br></div><div><br></div><div><div>+ // Process direct calls to operator new/new[]/delete/delete[] functions</div><div>+ // as distinct from new/new[]/delete/delete[] expressions that are </div><div>+ // processed by "checkPostStmt(const CXXNewExpr *" and </div><div>+ // "checkPostStmt(const CXXDeleteExpr *" respectively</div></div><div><br></div><div>Missing a period, and also looks rather odd with the unbalanced parens. Maybe something like "...processed by the checkPostStmt callbacks for CXXNewExpr and CXXDeleteExpr"? Also, you can't really use "respectively" here because you're matching two items up with four items, although it is pretty clear what you mean.</div><div><br></div><div>Good thought here, though—I wouldn't have remembered to do this. Hopefully someday it will be unimportant anyway when we use CallEvent to model the allocator and deallocator calls inside CXXNewExpr and CXXDeleteExpr.</div><div><br></div><div><br></div><div><div>+ assert(0 && "not a new/delete oparator");</div></div><div><br></div><div>This is better written llvm_unreachable("not a new/delete operator");</div><div><br></div><div><br></div><div><div>+ // the return value from operator new is already bound and we don't want to </div><div>+ // break this binding so we call MallocUpdateRefState instead of MallocMemAux</div></div><div><br></div><div>Capitalization, period.</div><div><br></div><div><br></div><div><div>+ StringRef Name = "";</div><div>+ if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {</div><div>+ if (const FunctionDecl *FD = C.getCalleeDecl(CE)) {</div><div>+ if (FD->getDeclName().isIdentifier()) {</div><div>+ Name = FD->getName();</div><div>+ }</div><div>+ }</div><div>+ }</div><div>+</div><div>+ unsigned Idx = AllocDeallocData.GetOrInsert(Name, dKind);</div><div> // Set the symbol's state to Allocated.</div><div>- return state->set<RegionState>(Sym, RefState::getAllocated(CE));</div><div>-</div><div>+ return state->set<RegionState>(Sym, RefState::getAllocated(E, Idx));</div></div><div><br></div><div>Yeah, honestly it makes a lot more sense to me to just store the dKind in the RefState, and lazily derive the name when we need to print it at the end. (And not do it so specifically.)</div><div><br></div><div><br></div><div><div>+ if (const CXXDeleteExpr *DE = dyn_cast_or_null<CXXDeleteExpr>(E))</div><div>+ return DE->isArrayForm() ? D_deleteArray : D_delete;</div></div><div><br></div><div>You already tested for null, so you can just use dyn_cast here.</div><div><br></div><div><br></div><div><div>+ if (isa<ObjCMessageExpr>(E))</div><div>+ return D_free;</div></div><div><br></div><div>There are no Objective-C messages that free memory immediately; just take this out.</div><div><br></div><div><br></div><div><div>+ // get the exact name of an allocation function</div></div><div><br></div><div>Capitalization, spelling.</div><div><br></div><div><br></div><div><div>+ if (FD->getKind() == Decl::Function) {</div></div><div><br></div><div>Please don't do this; it rules out C++ methods. Actually, how about this, for the whole method?</div><div><br></div><div><div><font face="Courier">bool MallocChecker::PrintAllocDeallocName(raw_ostream &os, CheckerContext &C, </font></div><div><font face="Courier"> const Expr *E) const {</font></div><div><font face="Courier"> if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {</font></div><div><font face="Courier"> // FIXME: This doesn't handle indirect calls.</font></div><div><font face="Courier"> const FunctionDecl *FD = CE->getDirectCallee();</font></div><div><font face="Courier"> if (!FD)</font></div><div><font face="Courier"> return false;</font></div><div><font face="Courier"> </font></div><div><font face="Courier"> os << *FD;</font></div><div><font face="Courier"> if (!FD->isOverloadedOperator())</font></div><div><font face="Courier"> os << "()";</font></div><div><font face="Courier"> return true;</font></div><div><font face="Courier"> }</font></div><div><font face="Courier"><br></font></div><div><font face="Courier"> if (const ObjCMessageExpr *Msg = dyn_cast<ObjCMessageExpr>(E)) {</font></div><div><font face="Courier"> if (Msg->isInstanceMessage())</font></div><div><font face="Courier"> os << "-";</font></div><div><font face="Courier"> else</font></div><div><font face="Courier"> os << "+";</font></div><div><font face="Courier"> os << Msg->getSelector().getAsString();</font></div><div><font face="Courier"> return true;</font></div><div><font face="Courier"> }</font></div><div><font face="Courier"><br></font></div><div><font face="Courier"> if (const CXXNewExpr *NE = dyn_cast<CXXNewExpr>(E)) {</font></div><div><font face="Courier"> os << *NE->getOperatorNew();</font></div><div><font face="Courier"> return true;</font></div><div><font face="Courier"> }</font></div><div><font face="Courier"><br></font></div><div><font face="Courier"> if (const CXXDeleteExpr *DE = dyn_cast<CXXDeleteExpr>(E)) {</font></div><div><font face="Courier"> os << *DE->getOperatorDelete();</font></div><div><font face="Courier"> return true;</font></div><div><font face="Courier"> }</font></div><div><font face="Courier"><br></font></div><div><font face="Courier"> return false;</font></div><div><font face="Courier">}</font></div></div><div><font face="Courier"><br></font></div><div>This actually seems generally useful and could even go on CheckerContext as a convenience helper. Note the bool return, so that rather than come up with some placeholder text like "unknown means", we can just rephrase the message to not mention the allocator.</div><div><br></div><div><br></div><div><div>+ switch(dKind) {</div><div>+ case D_free: os << "malloc()"; return;</div><div>+ case D_delete: os << "operator new"; return;</div><div>+ case D_deleteArray: os << "operator new[]"; return;</div><div>+ case D_unknown:</div><div>+ default: assert(0 && "unknown or unhandled DeallocatorKind");</div><div>+ }</div></div><div><br></div><div>LLVM style is to not include default cases for an enum-covered switch statement. Also llvm_unreachable again.</div><div><br></div><div>Thanks for your patience in iterating on this!</div><div>Jordan</div></body></html>