[cfe-dev] [Bug 7772] analyzer reports leaks when data is added to hash table or linked list

Sun Aug 15 21:05:10 PDT 2010

On Fri, 13 Aug 2010 11:33:04 +1200
Andrew McGregor <andrewmcgr at gmail.com> wrote:

> 
> On 13/08/2010, at 9:59 AM, Michael Blumenkrantz wrote:
> 
> > On Thu, 12 Aug 2010 13:17:07 -0700
> > Ted Kremenek <kremenek at apple.com> wrote:
> > 
> >> 
> >> On Aug 2, 2010, at 2:50 PM, Michael Blumenkrantz wrote:
> >> 
> >>> Hi,
> >>> 
> >>> For some reason I can't reply on bugzilla so I'll just mail you
> >>> directly.
> >>> 
> >>> In case 1 from above, the allocated data is set with
> >>> evas_object_smart_data_set to part of obj's struct since it's an opaque
> >>> struct.  The data will later be freed manually with
> >>> evas_object_smart_data_get to retrieve it.
> >>> 
> >>> In case 2, the data is prepended to a list which is returned and
> >>> inserted into an rbtree that is passed by reference as function param
> >>> #2.
> >>> 
> >>> In case 3, the data is appended to the list that is returned.
> >>> 
> >>> In cases 2+3, it is up to the caller of the function to free the data
> >>> at some point, or the caller of that caller, etc.
> >> 
> >> Hi Mike,
> >> 
> >> (Bugzilla is still not working for me on that PR)
> >> 
> >> I think in all of these cases the malloc/free's checker "local" checking
> >> rules of how one uses malloc/free aren't being followed by your code (since
> >> it has its own conventions).  This is a general problem with C code, which
> >> is why the malloc/free checker is still an experimental check.
> >> 
> >> One solution we have looked at is providing "ownership annotations" that
> >> one can use to annotate the arguments of functions.  This allows one to say
> >> "passing this object to foo() means it will later be released" and so on.
> >> I'm not convinced this is a full solution; not all users will want to use
> >> such annotations, so a combination of smarter analysis and the use of these
> >> annotations may be necessary.  The advantage of the annotations (which I
> >> would like to hash out more with real users like yourself) is that they
> >> clear document the memory management rules at a "contractual" level with
> >> the declaration of functions.  This is really powerful, and will enable the
> >> analyzer not only to emit fewer false positives but also find more real
> >> bugs when those rules are violated.  That said, it isn't for everyone.  If
> >> you have suggestions of what you'd like to see, I'd be very interested in
> >> discussing it in the open on cfe-dev
> >> 
> >> Note that not running with experimental checks disabled will cause these
> >> warnings to go away.
> >> 
> >> Ted
> > Hi,
> > 
> > I am replying to this on the cfe-dev mailing list as requested to continue
> > our discussion.
> > 
> > With regard to the annotations that you have mentioned, I do not believe
> > they will be a full or workable solution to anyone with a reasonable sized
> > api who wishes to have accurate results from the scanner.  In a case where
> > there are hundreds of function calls such as the EFL, it would be extremely
> > frustrating, not to mention annoying, to create and maintain such
> > annotations with any degree of accuracy  That said, I think there would
> > definitely be a place for it in smaller applications/libraries, or
> > specialized cases with unusual behavior.
> 
> Having actually implemented this for the entire user space of an embedded
> Linux distribution, I can report that it does seem to be practical.
> Annotating only two header files (glist.h and its counterpart in one large
> proprietary package) accounted for more than 80% of the false positives, and
> showed up significant number of real bugs that were not reported without the
> annotations.  Annotating another few headers (more glib container types)
> reduced the false positives from roughly 9000 to around 100 (compared to a
> real issue count of around 1200, this isn't bad).  The patch touches 82 lines
> of code, out of nearly 65 million (I'm not counting the kernel), so I think
> the cost is well worth it given the benefit. Incidentally, this exercise also
> shows the very high quality of most open-source code; fully 80% of the issues
> are in proprietary code, not all of it ours, that forms only 15% of the
> codebase.
> 
> That said, I'm working on the ability to deduce function annotations from the
> implementation. This has been shown feasible in a research compiler, and
> although it runs up against decidability problems in theory, real code should
> very seldom suffer from that issue in practice. It will require a two pass
> analysis build, however, as annotations for all functions will need to be
> available before any ownership checking will be reliable. For the time being
> I'm limiting this work to plain C, as I'm not enough of a C++ language lawyer
> to be at all confident of correctness.
> 
> > I think the best option here, though more difficult to implement, would be
> > if there was a functionality which would do something like this:
> > 
> > 1) if an object is passed to a function and is not a primitive C type, it is
> > tracked throughout the function. When allocated data is added to the object,
> > the scope of the newly allocated data is set to equal to the object until
> > it is directly referenced again, at which point the scope would split as
> > the pointer is copied. This process would repeat until either the memory is
> > freed or the pointer is lost.
> 
> This more or less describes what the ownership checker currently does, except
> that it currently only works for pointers allocated by malloc and annotated
> functions.
> 
> > 2) if memory is allocated in a function and then the allocated object is
> > returned, the scope of the memory should be passed up the stack to the
> > caller of the function and then tracked from there. As with #1, this would
> > also repeat until free or leak.
> 
> Again, the ownership checker can do this too; however, at present functions
> need annotated that they do in fact allocate memory. There may be some
> possible API variants that can not at present be annotated, this is somewhat
> work in progress.
> 
> > This type of functionality would eliminate almost all of the false positives
> > for leaks which I have seen, and would also allow for effective evaluation
> > of NULL pointer dereferencing.
> 
> It certainly does help the NULL checker out; the two interact a great deal.
> 
> > Another unrelated feature that I would definitely enjoy seeing added to the
> > analyzer (which I think is on the todo?) is the ability to see "report
> > diffs", or just the bugs that have been fixed or created since the last
> > report was run.  I imagine this would be done with something like scan-diff
> > [file1] [file2] > outfile.
> 
> +1.  I may eventually write this; I certainly plan to contribute back a less
> site-specific version of my build analysis harness in due course, and this is
> functionality I may be adding to that.
> 
> > Hopefully though, we can at least find a solution to the bug I have
> > reported :)
> > 
> > -- 
> > Mike Blumenkrantz
> > Zentific: Our boolean values are huge.
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 
Hm, well I'd definitely be interested in seeing some testing in the analyzer
with annotations.  If it really is the case that only a few headers need to be
annotated to see this kind of change, then that would most likely be sufficient
for my needs.

-- 
Mike Blumenkrantz
Zentific: Our boolean values are huge.