[cfe-dev] RFC: static analysis, malloc annontation for return via pointer argument

Tue Jan 15 19:06:55 PST 2013

On Jan 14, 2013, at 12:20 PM, Scott Parlane <scott at scottnz.com> wrote:

> On Fri, 2013-01-11 at 14:18 -0800, Anna Zaks wrote:
>> On Jan 10, 2013, at 1:36 PM, Scott Parlane <scott at scottnz.com> wrote:
>> 
>>> Anna,
>>> 
>>> I propose the following 2 new annotations:
>>> 1) ownership_returns_byref which takes 2/3 arguments
>>> - the family name
>>> - the argument number which will be allocated into
>>> - (optionally) the return value that indicates allocation failed
>> 
>> What would be the type of this argument?
> The same as the return type of the function.
>> 
>>> This annotation indicates that the function will attempt to store an
>>> allocated memory region into the argument identified. Additionally it
>>> allows for identifying when the function has failed, so the memory can
>>> be treated as invalid (I further suggest we allow expressions here,
>> 
>> I don't think we have the machinery to parse the C/C++ expressions inside the arguments. We would further need to evaluate those expressions with the static analyzer.. Even though this is the most expressive option, I am not sure how difficult this would be to implement and if the current annotation mechanism is suitable for it.
> 
>>> however I haven't found a case where it's strictly needed)
>> 
>> Often functions return a set of error codes on failure, not just a single value.
> Then I suggest we add an optional string after it, which indicates the
> expression (i.e. "<=",">=","==")
> Also, we could expand the return value to be an array, assuming it's
> possible to process it, and do the correct value binding ?
> 

How about just making the failure argument an enumeration, with possible values from the following set: {nonzero, 0, 1, negative}?

We should always assume that the allocation failed if the pointer was set to null.

Also, I was looking at the ownership attributes and they seem to abuse the Spellings member of the attribute to represent the different ownership arguments. The new attribute should be defined separately. We were also talking internally about creating an analyzer meta-attribute and placing all the other attributes "behind it". I will send out more details after the idea becomes more concrete.

Cheers,
Anna.

>>> 2) ownership_replaces_byref which takes 3/4 arguments
>>> - the family name
>>> - the argument number which will be free()'d/allocated into (as per
>>> realloc behavior)
>>> - true/false, it can/cant free without allocating
>>> - (optionally) the return value that indicates (re)allocation failed
>>> This annotation indicates that the function will attempt to resize
>>> the/or allocate a memory region into the argument. It requires
>>> indicating if the pointer can become invalid. Additionally it allows for
>>> identifying when the function has failed, so the memory can be treated
>>> as invalid (I further suggest we allow expressions here, however I
>>> haven't found a case where it's strictly needed)
>>> 
>>> I suggest that the size of the allocated region be undefined, and/or
>>> defined by another (set of) annotation(s). I think there are too many
>>> possible ways of defining the returned pointers size (sizeof(struct),
>>> sizeof(char) * return value, argX * argY, etc) to successfully define
>>> all of them with these annotations alone.
>>> 
>>> Regards,
>>> Scott
>>> 
>>> On Thu, 2013-01-10 at 10:39 -0800, Anna Zaks wrote:
>>>> Scott, 
>>>> 
>>>> We need to finalize/document the annotations before diving into the implementation. 
>>>> 
>>>> Thanks!
>>>> Anna.
>>>> 
>>>> On Jan 9, 2013, at 7:15 PM, Scott Parlane <scott at scottnz.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have updated the patch, it now correctly deals with the allocation
>>>>> happening twice (using the checkprestmt method).
>>>>> 
>>>>> I still haven't done anything about binding the return value and the
>>>>> allocated region's validness together.
>>>>> 
>>>>> I think I can safely copy this and implement the realloc version of this
>>>>> without much more effort (to properly handle the asprintf case).
>>>>> 
>>>>> Cheers,
>>>>> Scott
>>>>> 
>>>>> On Thu, 2013-01-10 at 12:15 +1300, Scott Parlane wrote:
>>>>>> On Wed, 2013-01-09 at 14:32 -0800, Anna Zaks wrote:
>>>>>>> On Jan 9, 2013, at 1:34 PM, Scott Parlane <scott at scottnz.com> wrote:
>>>>>>> 
>>>>>>>> Anna,
>>>>>>>> 
>>>>>>>> To deal with binding return value, and byref value, I was suggesting to
>>>>>>>> add another argument to the byref annotation indicating that they are
>>>>>>>> bound (and always bind ret < 0, byref is invalid)
>>>>>>> 
>>>>>>> I don't think that assuming that all functions that want to use this annotation would return a negative number on failure is generic enough. The annotation mechanism does not currently support inclusion of expressions. Do you have examples of functions that would get the new annotation and how the failure is communicated back. For example, if the pointer is set to NULL on failure to allocate in all of them, it might be acceptable to hardcode that in the checker (this would be analogous with malloc). Another possibility would be to add a separate annotation to represent the failure mode of a function (not sure if others would like this idea).
>>>>>> 
>>>>>> There are two examples from our code base, both of which will leave the
>>>>>> retptr alone, and return -1 in the failure case.
>>>>>> 
>>>>>> The example I have been using other places in asprintf, which actually
>>>>>> does realloc, and it also returns -1 in the failure case, and the retptr
>>>>>> is invalid. (I plan to make the correct annotation for this case, as
>>>>>> well as create the automatic annotation for this function)
>>>>>> 
>>>>>> Also in the asprintf, the size of the allocated block can be bound to
>>>>>> value returned by the function. (Nuno pointed out a similar annotation
>>>>>> for this (alloc_size) earlier, I assume we could make one for this case
>>>>>> too ?)
>>>>>> 
>>>>>>> It is extremely important to get the annotation design right and ensure that it will be generic since others might start relying on it and this is not something we can easily change in the future.
>>>>>> 
>>>>>> Yes, I agree.
>>>>>> 
>>>>>>>> 
>>>>>>>> Can we merge (malloc and mallocwithannotations) such that we always do
>>>>>>>> the optimistic case ?
>>>>>>> 
>>>>>>> It is very important to have the pessimistic checker by default - it is pessimistic in that it does not assume that user had annotated their code with ownership attributes. Unless you annotate all the functions which might free memory in your codebase, the optimistic checker will give false positives. It is very common to allocate a pointer in one function and pass it to another one which would, for example, use it and deallocate when done. Similarly, one might store a pointer in a global variable, and it will get freed by another function reading the same global. (Not sure what the optimistic checker should do in the second case, but most likely it should stop tracking the pointer. This case would get detected through the pointerEscape callback as well.) The analyzer does not reason cross-translation unit and cannot track the pointer perfectly. This is why the pessimistic checker assumes that the pointer will get properly handled when it "escapes". 
>>>>>>> 
>>>>>>> It is possible that what we need is a pessimistic checker (does not assume that all functions are annotated) with annotations for extra checking. For example, if you have a custom function that allocates memory, we just want to track it with the pessimistic checker. This might be more useful in general.
>>>>>> 
>>>>>> I would prefer this option, so that annotations can be used either way,
>>>>>> and you can decide what level of checking you want for malloc.
>>>>>> 
>>>>>>>> Also there is some heavily related code in SecKeychainAPI (some parts of
>>>>>>>> that API appear to be a specific case of my more generic byref checker)
>>>>>>>> 
>>>>>>> 
>>>>>>> Malloc and KeyChainAPI checkers are very similar and could be combined. We've kind of bypassed the annotation design issue by writing a new checker for that API, which is not a good precedent. (Also, the SecKeychainAPI checker was written before the pessimistic Malloc, and the plan was to merge them back together later if it made sense.)
>>>>>> 
>>>>>> I see in KeyChainAPI, it has to explicitly check for double allocations,
>>>>>> so I think I need to implement this as well (I'm not getting any
>>>>>> complaints when I allocate over the top)
>>>>>> 
>>>>>>>> Oh, in my test case I was assuming it would complain a line earlier than
>>>>>>>> it does (where the actual leak occurs).
>>>>>>> 
>>>>>>> This is the byproduct of how the analyzer detects leaks. Falls out of performance/precision tradeoff.
>>>>>>> 
>>>>>>>> Try with free(p) twice however.
>>>>>>>> (I get the doublely free'd error, but not the leaked memory error)
>>>>>>>> 
>>>>>>> This is because leaks are considered to be "less severe" (non-sink) issues than the use-afer-frees(sinks). As soon as we discover the use-after-free, we stop exploring the path, so the analyzer never gets to discover the leak.
>>>>>>> 
>>>>>> Ok, that explains some other places where there are multiple errors, but
>>>>>> you only get a report for one of them. (leak warning instead of
>>>>>> free(NULL) in the case I saw earlier)
>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Scott
>>>>>>>> 
>>>>>>>> On Wed, 2013-01-09 at 11:31 -0800, Anna Zaks wrote:
>>>>>>>>> 
>>>>>>>>> On Jan 8, 2013, at 5:47 PM, Scott Parlane <scott at scottnz.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On Tue, 2013-01-08 at 16:13 -0800, Anna Zaks wrote:
>>>>>>>>>>> On Jan 8, 2013, at 1:43 PM, Scott Parlane <scott at scottnz.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Anna,
>>>>>>>>>>>> 
>>>>>>>>>>>> Where is the normal place to document such things ?
>>>>>>>>>>> 
>>>>>>>>>>> The checker itself would be a good place.
>>>>>>>>>>> 
>>>>>>>>>>>> Currently it takes 2 arguments, the family name and the argument
>>>>>>>>>>>> number
>>>>>>>>>>>> that is the reference to return into.
>>>>>>>>>>>> 
>>>>>>>>>>>> The binding thing is interesting.
>>>>>>>>>>>> I could add an argument that is true/false indicating whether
>>>>>>>>>>>> binding
>>>>>>>>>>>> should occur, and always bind for < 0 is invalid (I haven't
>>>>>>>>>>>> found any
>>>>>>>>>>>> APIs where this isn't the case, yet)
>>>>>>>>>>> 
>>>>>>>>>>> I am not clear what you are talking about here. The return value
>>>>>>>>>>> on failure?
>>>>>>>>>>> 
>>>>>>>>>> yes, the return value of the function on failure.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What about capturing the failure with an annotation? You are proposing
>>>>>>>>> to hardcode it in the checker, right? (I don't fully understand the
>>>>>>>>> above.)
>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> The ownership_returns attribute already has an option argument
>>>>>>>>>>>> (which
>>>>>>>>>>>> argument is the size) that would conflict with trying to
>>>>>>>>>>>> overload it.
>>>>>>>>>>>> I would have to add an optional argument that triggers byref,
>>>>>>>>>>>> indicating
>>>>>>>>>>>> which arguments is the reference, of course there is no way to
>>>>>>>>>>>> tell
>>>>>>>>>>>> which of these two optional cases is intended, without changing
>>>>>>>>>>>> the api
>>>>>>>>>>>> for the ownership_returns attribute.
>>>>>>>>>>>> 
>>>>>>>>>>> OK.
>>>>>>>>>>> 
>>>>>>>>>>>> 1) Yes, just working on this now.
>>>>>>>>>>>> 2) oops, enabled the boundary marker in my editor now.
>>>>>>>>>>>> 3) Yes.
>>>>>>>>>>>> 
>>>>>>>>>>>> I ran into a minor problem, if you call the returns_byref
>>>>>>>>>>>> function
>>>>>>>>>>>> twice, it doesn't notice that the memory leaks. Am I meant to
>>>>>>>>>>>> test for
>>>>>>>>>>>> this in the MallocMemAux function ? Or is some other function
>>>>>>>>>>>> meant to
>>>>>>>>>>>> notice ?
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Each allocation should return a different symbol. Each symbol
>>>>>>>>>>> should be tracked separately, so you should see 2 calls to
>>>>>>>>>>> MallocMemAux.
>>>>>>>>>> I think this is happening
>>>>>>>>>> 
>>>>>>>>>>> Maybe when you pass the pointer to the second call to the custom
>>>>>>>>>>> malloc function, the checker stops tracking the symbol as a result
>>>>>>>>>>> of pointerEscapes callback...
>>>>>>>>>> There is an assumption in doesNotFreeMemory that assumes all
>>>>>>>>>> user-defined functions can free memory. I would think only correctly
>>>>>>>>>> annotated functions can free memory ? (ownership_takes or
>>>>>>>>>> ownership_holds)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The optimistic and pessimistic checkers behave differently here. For
>>>>>>>>> optimistic checker (checker with annotations), the PointerEscape
>>>>>>>>> callback should not do anything (as you point out above) - it should
>>>>>>>>> just return:
>>>>>>>>> if (Filter.CMallocOptimistic)
>>>>>>>>> return;
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> My bad, since the optimistic checker is alpha we have not been
>>>>>>>>> properly maintaining it. There are probably more subtle issues like
>>>>>>>>> this one.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I notice that this test doesn't work as expected:
>>>>>>>>>> void test() {
>>>>>>>>>> int *p;
>>>>>>>>>> int **q = &p;
>>>>>>>>>> q[0] = malloc(4);
>>>>>>>>>> q[0] = malloc(12);
>>>>>>>>>> free(p);
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> (expected that the second malloc will cause a "memory never
>>>>>>>>>> released"
>>>>>>>>>> error, because the memory has most certainly leaked at this point)
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What is the output you are getting here? I am getting a leak warning.
>>>>>>>>> It is the memory returned by the first call to malloc that is being
>>>>>>>>> leaked. 
>>>>>>>>> $ /Volumes/Data/ws/llvmgit/build/Debug+Asserts/bin/clang --analyze
>>>>>>>>> ~/tmp/ex.c -Xanalyzer -analyzer-output=text
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> /Users/zaks/tmp/ex.c:7:2: warning: Memory is never released; potential
>>>>>>>>> leak
>>>>>>>>> free(p);
>>>>>>>>> ^~~~~~~
>>>>>>>>> /Users/zaks/tmp/ex.c:5:9: note: Memory is allocated
>>>>>>>>> q[0] = malloc(4);
>>>>>>>>>     ^
>>>>>>>>> /Users/zaks/tmp/ex.c:7:2: note: Memory is never released; potential
>>>>>>>>> leak
>>>>>>>>> free(p);
>>>>>>>>> ^
>>>>>>>>> 1 warning generated.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> which is probably why my annotation also doesn't work correctly.
>>>>>>>>>> Ideas ?
>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Scott
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 2013-01-08 at 09:57 -0800, Anna Zaks wrote:
>>>>>>>>>>>>> Scott, 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I do not see the new attribute and it's arguments documented
>>>>>>>>>>>>> anywhere. (In one of the pervious emails, Ted asked for the
>>>>>>>>>>>>> semantics of the attribute.) What happens when the allocation
>>>>>>>>>>>>> function fails to allocate? This is important since the user
>>>>>>>>>>>>> code often does not free the pointer on the failure path. Not
>>>>>>>>>>>>> handling this case would lead to false positives.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, have you tried to reuse the existing "ownership_returns"
>>>>>>>>>>>>> attribute? I think that would be cleaner. However, it would be
>>>>>>>>>>>>> good to see if it can be elegantly extended or not.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Other minor comments:
>>>>>>>>>>>>> 1) A lot of MallocMemRetPtrAux is a copy and paste from
>>>>>>>>>>>>> MallocMemAux. Is it possible to factor so that there is more
>>>>>>>>>>>>> code reuse?
>>>>>>>>>>>>> 2) 80 column rule.
>>>>>>>>>>>>> 3) We probably need more than one test.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Anna.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jan 6, 2013, at 10:00 PM, Scott Parlane <scott at scottnz.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have attached a new patch with the attribute name changed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think that extensions for size, family != malloc, binding
>>>>>>>>>>>>>> return value
>>>>>>>>>>>>>> to pointer can be done separately.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I need to implement the family support one as part of the
>>>>>>>>>>>>>> work I'm
>>>>>>>>>>>>>> currently doing. I intend to change it to name the free
>>>>>>>>>>>>>> function to use,
>>>>>>>>>>>>>> instead of the allocator, I think this will be simpler.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, 2012-12-17 at 15:32 -0800, Ted Kremenek wrote:
>>>>>>>>>>>>>>> On Dec 17, 2012, at 1:55 PM, Scott Parlane
>>>>>>>>>>>>>>> <scott at scottnz.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This patch allows for APIs were memory is allocated and
>>>>>>>>>>>>>>>> placed in a
>>>>>>>>>>>>>>>> pointer given to them. (like asprintf, but without the
>>>>>>>>>>>>>>>> realloc
>>>>>>>>>>>>>>>> feature)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks Scott.  I'm not such a fan of the name of the
>>>>>>>>>>>>>>> attribute.  Since
>>>>>>>>>>>>>>> this is returning an object by reference, how
>>>>>>>>>>>>>>> about ownership_returns_byref?  I'm sure others will have
>>>>>>>>>>>>>>> an opinion,
>>>>>>>>>>>>>>> but ownership_returns_pointer really doesn't tell the user
>>>>>>>>>>>>>>> what this
>>>>>>>>>>>>>>> attribute does.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As for the implementation itself, it looks okay.  The
>>>>>>>>>>>>>>> annotation
>>>>>>>>>>>>>>> support eventually needs to be migrated from the
>>>>>>>>>>>>>>> alpha.unix.MallocWithAnnotations checker to the
>>>>>>>>>>>>>>> unix.Malloc checker
>>>>>>>>>>>>>>> (same file, logic controlled with a flag), but that's a
>>>>>>>>>>>>>>> separate
>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently, when using a pointer to a stack variable as
>>>>>>>>>>>>>>>> the input to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> function, it declares the memory leak on the next source
>>>>>>>>>>>>>>>> line after
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> stack variable is used. I think it should be declaring
>>>>>>>>>>>>>>>> the leak on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> last line of the current scope. Which is correct ?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The current behavior is correct.  The current scope could
>>>>>>>>>>>>>>> be very big,
>>>>>>>>>>>>>>> and may end long after the leak occurs.  We have found
>>>>>>>>>>>>>>> that reporting
>>>>>>>>>>>>>>> leaks as close as possible to where the leak occurred is a
>>>>>>>>>>>>>>> much better
>>>>>>>>>>>>>>> experience for users.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> <0001-Make-clang-static-analysis-support-allocation-into-a.patch>_______________________________________________
>>>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>>>> cfe-dev at cs.uiuc.edu
>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>> 
>>>>> <0001-Make-clang-static-analysis-support-allocation-into-a.patch>
>>>> 
>>> 
>>> 
>> 
> 
>