[cfe-dev] RFC: static analysis, malloc annontation for return via pointer argument
Scott Parlane
scott at scottnz.com
Wed Jan 9 15:15:27 PST 2013
On Wed, 2013-01-09 at 14:32 -0800, Anna Zaks wrote:
> On Jan 9, 2013, at 1:34 PM, Scott Parlane <scott at scottnz.com> wrote:
>
> > Anna,
> >
> > To deal with binding return value, and byref value, I was suggesting to
> > add another argument to the byref annotation indicating that they are
> > bound (and always bind ret < 0, byref is invalid)
>
> I don't think that assuming that all functions that want to use this annotation would return a negative number on failure is generic enough. The annotation mechanism does not currently support inclusion of expressions. Do you have examples of functions that would get the new annotation and how the failure is communicated back. For example, if the pointer is set to NULL on failure to allocate in all of them, it might be acceptable to hardcode that in the checker (this would be analogous with malloc). Another possibility would be to add a separate annotation to represent the failure mode of a function (not sure if others would like this idea).
There are two examples from our code base, both of which will leave the
retptr alone, and return -1 in the failure case.
The example I have been using other places in asprintf, which actually
does realloc, and it also returns -1 in the failure case, and the retptr
is invalid. (I plan to make the correct annotation for this case, as
well as create the automatic annotation for this function)
Also in the asprintf, the size of the allocated block can be bound to
value returned by the function. (Nuno pointed out a similar annotation
for this (alloc_size) earlier, I assume we could make one for this case
too ?)
> It is extremely important to get the annotation design right and ensure that it will be generic since others might start relying on it and this is not something we can easily change in the future.
Yes, I agree.
> >
> > Can we merge (malloc and mallocwithannotations) such that we always do
> > the optimistic case ?
>
> It is very important to have the pessimistic checker by default - it is pessimistic in that it does not assume that user had annotated their code with ownership attributes. Unless you annotate all the functions which might free memory in your codebase, the optimistic checker will give false positives. It is very common to allocate a pointer in one function and pass it to another one which would, for example, use it and deallocate when done. Similarly, one might store a pointer in a global variable, and it will get freed by another function reading the same global. (Not sure what the optimistic checker should do in the second case, but most likely it should stop tracking the pointer. This case would get detected through the pointerEscape callback as well.) The analyzer does not reason cross-translation unit and cannot track the pointer perfectly. This is why the pessimistic checker assumes that the pointer will get properly handled when it "escapes".
>
> It is possible that what we need is a pessimistic checker (does not assume that all functions are annotated) with annotations for extra checking. For example, if you have a custom function that allocates memory, we just want to track it with the pessimistic checker. This might be more useful in general.
I would prefer this option, so that annotations can be used either way,
and you can decide what level of checking you want for malloc.
> > Also there is some heavily related code in SecKeychainAPI (some parts of
> > that API appear to be a specific case of my more generic byref checker)
> >
>
> Malloc and KeyChainAPI checkers are very similar and could be combined. We've kind of bypassed the annotation design issue by writing a new checker for that API, which is not a good precedent. (Also, the SecKeychainAPI checker was written before the pessimistic Malloc, and the plan was to merge them back together later if it made sense.)
I see in KeyChainAPI, it has to explicitly check for double allocations,
so I think I need to implement this as well (I'm not getting any
complaints when I allocate over the top)
> > Oh, in my test case I was assuming it would complain a line earlier than
> > it does (where the actual leak occurs).
>
> This is the byproduct of how the analyzer detects leaks. Falls out of performance/precision tradeoff.
>
> > Try with free(p) twice however.
> > (I get the doublely free'd error, but not the leaked memory error)
> >
> This is because leaks are considered to be "less severe" (non-sink) issues than the use-afer-frees(sinks). As soon as we discover the use-after-free, we stop exploring the path, so the analyzer never gets to discover the leak.
>
Ok, that explains some other places where there are multiple errors, but
you only get a report for one of them. (leak warning instead of
free(NULL) in the case I saw earlier)
> > Regards,
> > Scott
> >
> > On Wed, 2013-01-09 at 11:31 -0800, Anna Zaks wrote:
> >>
> >> On Jan 8, 2013, at 5:47 PM, Scott Parlane <scott at scottnz.com> wrote:
> >>
> >>> On Tue, 2013-01-08 at 16:13 -0800, Anna Zaks wrote:
> >>>> On Jan 8, 2013, at 1:43 PM, Scott Parlane <scott at scottnz.com>
> >>>> wrote:
> >>>>
> >>>>> Anna,
> >>>>>
> >>>>> Where is the normal place to document such things ?
> >>>>
> >>>> The checker itself would be a good place.
> >>>>
> >>>>> Currently it takes 2 arguments, the family name and the argument
> >>>>> number
> >>>>> that is the reference to return into.
> >>>>>
> >>>>> The binding thing is interesting.
> >>>>> I could add an argument that is true/false indicating whether
> >>>>> binding
> >>>>> should occur, and always bind for < 0 is invalid (I haven't
> >>>>> found any
> >>>>> APIs where this isn't the case, yet)
> >>>>
> >>>> I am not clear what you are talking about here. The return value
> >>>> on failure?
> >>>>
> >>> yes, the return value of the function on failure.
> >>
> >>
> >> What about capturing the failure with an annotation? You are proposing
> >> to hardcode it in the checker, right? (I don't fully understand the
> >> above.)
> >>
> >>>>>
> >>>>> The ownership_returns attribute already has an option argument
> >>>>> (which
> >>>>> argument is the size) that would conflict with trying to
> >>>>> overload it.
> >>>>> I would have to add an optional argument that triggers byref,
> >>>>> indicating
> >>>>> which arguments is the reference, of course there is no way to
> >>>>> tell
> >>>>> which of these two optional cases is intended, without changing
> >>>>> the api
> >>>>> for the ownership_returns attribute.
> >>>>>
> >>>> OK.
> >>>>
> >>>>> 1) Yes, just working on this now.
> >>>>> 2) oops, enabled the boundary marker in my editor now.
> >>>>> 3) Yes.
> >>>>>
> >>>>> I ran into a minor problem, if you call the returns_byref
> >>>>> function
> >>>>> twice, it doesn't notice that the memory leaks. Am I meant to
> >>>>> test for
> >>>>> this in the MallocMemAux function ? Or is some other function
> >>>>> meant to
> >>>>> notice ?
> >>>>>
> >>>>
> >>>>
> >>>> Each allocation should return a different symbol. Each symbol
> >>>> should be tracked separately, so you should see 2 calls to
> >>>> MallocMemAux.
> >>> I think this is happening
> >>>
> >>>> Maybe when you pass the pointer to the second call to the custom
> >>>> malloc function, the checker stops tracking the symbol as a result
> >>>> of pointerEscapes callback...
> >>> There is an assumption in doesNotFreeMemory that assumes all
> >>> user-defined functions can free memory. I would think only correctly
> >>> annotated functions can free memory ? (ownership_takes or
> >>> ownership_holds)
> >>
> >>
> >> The optimistic and pessimistic checkers behave differently here. For
> >> optimistic checker (checker with annotations), the PointerEscape
> >> callback should not do anything (as you point out above) - it should
> >> just return:
> >> if (Filter.CMallocOptimistic)
> >> return;
> >>
> >>
> >> My bad, since the optimistic checker is alpha we have not been
> >> properly maintaining it. There are probably more subtle issues like
> >> this one.
> >>
> >>
> >>
> >>>
> >>> I notice that this test doesn't work as expected:
> >>> void test() {
> >>> int *p;
> >>> int **q = &p;
> >>> q[0] = malloc(4);
> >>> q[0] = malloc(12);
> >>> free(p);
> >>> }
> >>>
> >>> (expected that the second malloc will cause a "memory never
> >>> released"
> >>> error, because the memory has most certainly leaked at this point)
> >>>
> >>
> >>
> >> What is the output you are getting here? I am getting a leak warning.
> >> It is the memory returned by the first call to malloc that is being
> >> leaked.
> >> $ /Volumes/Data/ws/llvmgit/build/Debug+Asserts/bin/clang --analyze
> >> ~/tmp/ex.c -Xanalyzer -analyzer-output=text
> >>
> >>
> >> /Users/zaks/tmp/ex.c:7:2: warning: Memory is never released; potential
> >> leak
> >> free(p);
> >> ^~~~~~~
> >> /Users/zaks/tmp/ex.c:5:9: note: Memory is allocated
> >> q[0] = malloc(4);
> >> ^
> >> /Users/zaks/tmp/ex.c:7:2: note: Memory is never released; potential
> >> leak
> >> free(p);
> >> ^
> >> 1 warning generated.
> >>
> >>
> >>
> >>> which is probably why my annotation also doesn't work correctly.
> >>> Ideas ?
> >>>
> >>>>> Cheers,
> >>>>> Scott
> >>>>>
> >>>>> On Tue, 2013-01-08 at 09:57 -0800, Anna Zaks wrote:
> >>>>>> Scott,
> >>>>>>
> >>>>>> I do not see the new attribute and it's arguments documented
> >>>>>> anywhere. (In one of the pervious emails, Ted asked for the
> >>>>>> semantics of the attribute.) What happens when the allocation
> >>>>>> function fails to allocate? This is important since the user
> >>>>>> code often does not free the pointer on the failure path. Not
> >>>>>> handling this case would lead to false positives.
> >>>>>>
> >>>>>> Also, have you tried to reuse the existing "ownership_returns"
> >>>>>> attribute? I think that would be cleaner. However, it would be
> >>>>>> good to see if it can be elegantly extended or not.
> >>>>>>
> >>>>>> Other minor comments:
> >>>>>> 1) A lot of MallocMemRetPtrAux is a copy and paste from
> >>>>>> MallocMemAux. Is it possible to factor so that there is more
> >>>>>> code reuse?
> >>>>>> 2) 80 column rule.
> >>>>>> 3) We probably need more than one test.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Anna.
> >>>>>>
> >>>>>>
> >>>>>> On Jan 6, 2013, at 10:00 PM, Scott Parlane <scott at scottnz.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have attached a new patch with the attribute name changed.
> >>>>>>>
> >>>>>>> I think that extensions for size, family != malloc, binding
> >>>>>>> return value
> >>>>>>> to pointer can be done separately.
> >>>>>>>
> >>>>>>> I need to implement the family support one as part of the
> >>>>>>> work I'm
> >>>>>>> currently doing. I intend to change it to name the free
> >>>>>>> function to use,
> >>>>>>> instead of the allocator, I think this will be simpler.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Scott
> >>>>>>>
> >>>>>>> On Mon, 2012-12-17 at 15:32 -0800, Ted Kremenek wrote:
> >>>>>>>> On Dec 17, 2012, at 1:55 PM, Scott Parlane
> >>>>>>>> <scott at scottnz.com> wrote:
> >>>>>>>>
> >>>>>>>>> This patch allows for APIs were memory is allocated and
> >>>>>>>>> placed in a
> >>>>>>>>> pointer given to them. (like asprintf, but without the
> >>>>>>>>> realloc
> >>>>>>>>> feature)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks Scott. I'm not such a fan of the name of the
> >>>>>>>> attribute. Since
> >>>>>>>> this is returning an object by reference, how
> >>>>>>>> about ownership_returns_byref? I'm sure others will have
> >>>>>>>> an opinion,
> >>>>>>>> but ownership_returns_pointer really doesn't tell the user
> >>>>>>>> what this
> >>>>>>>> attribute does.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> As for the implementation itself, it looks okay. The
> >>>>>>>> annotation
> >>>>>>>> support eventually needs to be migrated from the
> >>>>>>>> alpha.unix.MallocWithAnnotations checker to the
> >>>>>>>> unix.Malloc checker
> >>>>>>>> (same file, logic controlled with a flag), but that's a
> >>>>>>>> separate
> >>>>>>>> issue.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Currently, when using a pointer to a stack variable as
> >>>>>>>>> the input to
> >>>>>>>>> the
> >>>>>>>>> function, it declares the memory leak on the next source
> >>>>>>>>> line after
> >>>>>>>>> the
> >>>>>>>>> stack variable is used. I think it should be declaring
> >>>>>>>>> the leak on
> >>>>>>>>> the
> >>>>>>>>> last line of the current scope. Which is correct ?
> >>>>>>>>
> >>>>>>>> The current behavior is correct. The current scope could
> >>>>>>>> be very big,
> >>>>>>>> and may end long after the leak occurs. We have found
> >>>>>>>> that reporting
> >>>>>>>> leaks as close as possible to where the leak occurred is a
> >>>>>>>> much better
> >>>>>>>> experience for users.
> >>>>>>>
> >>>>>>>
> >>>>>>> <0001-Make-clang-static-analysis-support-allocation-into-a.patch>_______________________________________________
> >>>>>>> cfe-dev mailing list
> >>>>>>> cfe-dev at cs.uiuc.edu
> >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>
More information about the cfe-dev
mailing list