[cfe-dev] [Bug 7772] analyzer reports leaks when data is added to hash table or linked list

Thu Aug 12 16:33:04 PDT 2010

On 13/08/2010, at 9:59 AM, Michael Blumenkrantz wrote:

> On Thu, 12 Aug 2010 13:17:07 -0700
> Ted Kremenek <kremenek at apple.com> wrote:
> 
>> 
>> On Aug 2, 2010, at 2:50 PM, Michael Blumenkrantz wrote:
>> 
>>> Hi,
>>> 
>>> For some reason I can't reply on bugzilla so I'll just mail you
>>> directly.
>>> 
>>> In case 1 from above, the allocated data is set with
>>> evas_object_smart_data_set to part of obj's struct since it's an opaque
>>> struct.  The data will later be freed manually with
>>> evas_object_smart_data_get to retrieve it.
>>> 
>>> In case 2, the data is prepended to a list which is returned and
>>> inserted into an rbtree that is passed by reference as function param
>>> #2.
>>> 
>>> In case 3, the data is appended to the list that is returned.
>>> 
>>> In cases 2+3, it is up to the caller of the function to free the data
>>> at some point, or the caller of that caller, etc.
>> 
>> Hi Mike,
>> 
>> (Bugzilla is still not working for me on that PR)
>> 
>> I think in all of these cases the malloc/free's checker "local" checking
>> rules of how one uses malloc/free aren't being followed by your code (since
>> it has its own conventions).  This is a general problem with C code, which is
>> why the malloc/free checker is still an experimental check.
>> 
>> One solution we have looked at is providing "ownership annotations" that one
>> can use to annotate the arguments of functions.  This allows one to say
>> "passing this object to foo() means it will later be released" and so on.
>> I'm not convinced this is a full solution; not all users will want to use
>> such annotations, so a combination of smarter analysis and the use of these
>> annotations may be necessary.  The advantage of the annotations (which I
>> would like to hash out more with real users like yourself) is that they clear
>> document the memory management rules at a "contractual" level with the
>> declaration of functions.  This is really powerful, and will enable the
>> analyzer not only to emit fewer false positives but also find more real bugs
>> when those rules are violated.  That said, it isn't for everyone.  If you
>> have suggestions of what you'd like to see, I'd be very interested in
>> discussing it in the open on cfe-dev
>> 
>> Note that not running with experimental checks disabled will cause these
>> warnings to go away.
>> 
>> Ted
> Hi,
> 
> I am replying to this on the cfe-dev mailing list as requested to continue our
> discussion.
> 
> With regard to the annotations that you have mentioned, I do not believe they
> will be a full or workable solution to anyone with a reasonable sized api who
> wishes to have accurate results from the scanner.  In a case where there are
> hundreds of function calls such as the EFL, it would be extremely frustrating,
> not to mention annoying, to create and maintain such annotations with any
> degree of accuracy  That said, I think there would definitely be a place for it
> in smaller applications/libraries, or specialized cases with unusual behavior.

Having actually implemented this for the entire user space of an embedded Linux distribution, I can report that it does seem to be practical.  Annotating only two header files (glist.h and its counterpart in one large proprietary package) accounted for more than 80% of the false positives, and showed up significant number of real bugs that were not reported without the annotations.  Annotating another few headers (more glib container types) reduced the false positives from roughly 9000 to around 100 (compared to a real issue count of around 1200, this isn't bad).  The patch touches 82 lines of code, out of nearly 65 million (I'm not counting the kernel), so I think the cost is well worth it given the benefit. Incidentally, this exercise also shows the very high quality of most open-source code; fully 80% of the issues are in proprietary code, not all of it ours, that forms only 15% of the codebase.

That said, I'm working on the ability to deduce function annotations from the implementation. This has been shown feasible in a research compiler, and although it runs up against decidability problems in theory, real code should very seldom suffer from that issue in practice. It will require a two pass analysis build, however, as annotations for all functions will need to be available before any ownership checking will be reliable. For the time being I'm limiting this work to plain C, as I'm not enough of a C++ language lawyer to be at all confident of correctness.

> I think the best option here, though more difficult to implement, would be if
> there was a functionality which would do something like this:
> 
> 1) if an object is passed to a function and is not a primitive C type, it is
> tracked throughout the function. When allocated data is added to the object,
> the scope of the newly allocated data is set to equal to the object until it is
> directly referenced again, at which point the scope would split as the pointer
> is copied. This process would repeat until either the memory is freed or the
> pointer is lost.

This more or less describes what the ownership checker currently does, except that it currently only works for pointers allocated by malloc and annotated functions.

> 2) if memory is allocated in a function and then the allocated object is
> returned, the scope of the memory should be passed up the stack to the caller
> of the function and then tracked from there. As with #1, this would also repeat
> until free or leak.

Again, the ownership checker can do this too; however, at present functions need annotated that they do in fact allocate memory. There may be some possible API variants that can not at present be annotated, this is somewhat work in progress.

> This type of functionality would eliminate almost all of the false positives
> for leaks which I have seen, and would also allow for effective evaluation of
> NULL pointer dereferencing.

It certainly does help the NULL checker out; the two interact a great deal.

> Another unrelated feature that I would definitely enjoy seeing added to the
> analyzer (which I think is on the todo?) is the ability to see "report diffs",
> or just the bugs that have been fixed or created since the last report was
> run.  I imagine this would be done with something like scan-diff [file1]
> [file2] > outfile.

+1.  I may eventually write this; I certainly plan to contribute back a less site-specific version of my build analysis harness in due course, and this is functionality I may be adding to that.

> Hopefully though, we can at least find a solution to the bug I have reported :)
> 
> -- 
> Mike Blumenkrantz
> Zentific: Our boolean values are huge.
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev