[cfe-commits] unix.Malloc static checker improvement: memory.LeakPtrValChanged

Thu Dec 27 14:31:36 PST 2012

Jordan,

Your explanation makes sense. It seems that the only piece missing from my
previous patch is checking if the data is actually data know to come from
malloc (or some other allocating function); if so and the region has an
offset, then report the problem.

I find that there is something in the malloc checker which removes the
symbol from a previous malloc from the RegionState map. Specifically, the
checkPointerEscape callback. This was not used when I submitted my original
patch, but was added since. It seems that this callback is invoked just
before an invalidated symbol is passed to free. For example, the following
results in a checkPointerEscape callback:

{
int * data = malloc(sizeof(int));
data += 1;
free(data);
}

However, If the correct pointer is given to free even if it is manipulated,
checkPointerEscape is not called:

{
int * array = malloc(sizeof(int)*5);
array += 1;
free(&array[-1]);
}

If checkPointerEscape removes the malloc information of an offending symbol
from the RegionState map, then there is nothing to check inside of the
FreeMemAux function called by the checkPostStmt callback. However, maybe it
would be best to post a BugReport when the symbol is invalided, as that
when the bug occurs. Do you know of anything the checkPointerEscape
callback may catch that should not result in a BugReport if a malloc'ed
symbol that is currently allocated is being invalidated?

One more question. I notice that checkPointerEscape callback does not pass
a CheckerContext, unlike the other checkers. Does this mean that a
BugReport cannot be posted from within checkPointerEscape?

- Branden

On Thu, Dec 20, 2012 at 8:32 PM, Jordan Rose <jordan_rose at apple.com> wrote:

> I'll try to answer you piece-by-piece.
>
>
> On Dec 19, 2012, at 20:39 , Branden Archer <b.m.archer4 at gmail.com> wrote:
>
> Jordan,
>
> Thanks for the feedback.
>
> I have a few questions about your comments. My first impression when I
> looked at your examples was 'why would that not work'? Originally I
> believed that some state regarding the allocate function would be
> remembered, and when it was invoked the static checking framework would
> mark that the pointer returned to the caller would always have an offset of
> 1. I am not sure that is true though.
>
>
> The easy case is when this is true, which is the case where "there is some
> malloc data associated with the region". However, the analyzer currently
> cannot do cross-translation-unit analysis, so that immediately presents one
> case where we might retrieve some data value and have no idea where it came
> from. Even if we performed whole-program analysis, however, external
> libraries will always provide an interface boundary that checkers will most
> likely not be able to perfectly model.
>
> (If I store something using pthread_setspecific, and retrieve it later
> using pthread_getspecific, does the analyzer know it's the same value? If I
> run something asynchronously, can I assume globals are still in the same
> state that they were before?)
>
>
> And further, even if it was, posting a warning at the free inside the
> deallocate function would not make sense, as the pointer passed to
> deallocate could have multiple possible offsets depending on what the
> program was doing. To be effective, the warning would need to be when
> deallocate was called, which would be much tricker.
>
>
> Well, this we don't have as much of an opportunity about, since any user
> could write their own deallocate(). But we could make sure that we show
> where the memory is allocated, which means we'd see the execution path
> (including the call stack) that led to the free inside deallocate().
>
>
> The more I think about this, the more I am getting lost in some of the
> details. When deallocate is analyzed, nothing is really known about what
> memoryBlock really represents. The caller could be passing malloc'd memory
> or something else. I like your suggestion about making sure that the
> memoryBlock actually has known malloc information. This would limit the
> checker to only using malloc and free, and only within the same function.
> However, I think that is what memory.LeakPtrValChanged is supposed to
> accomplish anyway.
>
>
> Yup.
>
>
> A final point I am not clear on is how to determine if a MemRegion has
> malloc information. I see where the ReportBadFree function in
> MallocChecker.cpp is finding the base MemRegion, and also determining if
> the region is generated from alloca. How would one know if it was generated
> from malloc instead? Maybe if the MemRegion was a HeapSpaceRegion?
>
>
> MemRegion has a function called getBaseRegion() that strips off as many
> kinds of subregion as the analyzer knows about. (That is, it could still
> turn out that the base region is a subobject of a larger region, but it's
> at least not *known* to be.) MallocChecker tracks which regions came from
> malloc in its RegionState map, so you can just check to see if the base
> region is in the map, and what its RefState is.
>
> Does that make sense?
>
>
> Also, thanks for pointing out the coding standards. I will follow them
> more closely in the future.
>
>
> Glad to have you working on this!
> Jordan
>
>
>
> On Mon, Dec 17, 2012 at 12:59 PM, Jordan Rose <jordan_rose at apple.com>wrote:
>
>> Hi, Brandon. It's a good idea, but unfortunately it has some problems. In
>> C, it's totally legal to do this:
>>
>> int *allocate(size_t size) {
>>
>>   int *memoryBlock = (int *)malloc(size + sizeof(int));
>>
>>   *memoryBlock = SECRET_CODE;
>>
>>   return &memoryBlock[1];
>>
>> }
>>
>>
>> void deallocate(int *memoryBlock) {
>>
>>   assert(memoryBlock[-1] == SECRET_CODE);
>>
>>   free(&memoryBlock[-1]);
>>
>> }
>>
>>
>> I'm not sure of the best way to solve this. IIRC, by default the region
>> '*memoryBlock' will be a SymbolicRegion backed by a RegionValueSymbol, but
>> if 'deallocate' has been inlined the backing symbol could be a
>> DerivedRegionSymbol or a ConjuredSymbol instead. So it'd be very hard to
>> differentiate these cases without actually seeing the call to allocate().
>>
>> What you could try is seeing if the base region already has malloc
>> information. That will miss some true bugs, but it should also drastically
>> lower the rate of false positives, since we'll only be warning about
>> regions we *know* can be freed.
>>
>> As far as the patch itself, your logic seems reasonable, but your style
>> doesn't match the rest of the file or the LLVM Coding Standards<http://llvm.org/docs/CodingStandards.html>.
>> In particular, please put a space after 'if', put operators at the end of
>> the previous line instead of the start of the next line, and fit your lines
>> in 80 columns. I'd also prefer '!offset.hasSymbolicOffset()' over
>> 'offset.hasSymbolicOffset() == false'.
>>
>> You'll probably want more test cases: cases where the input parameter
>> does *not* come from malloc but has an offset, and at least one case
>> where the input parameter comes from outside the function and has an offset.
>>
>> Thanks for working on this!
>> Jordan
>>
>>
>>
>> On Dec 15, 2012, at 21:58 , Branden Archer <b.m.archer4 at gmail.com> wrote:
>>
>> I have recently started looking into clang, and was interested in
>> participating. After taking a look at the potential projects, the static
>> checking functionality seemed interesting. Specifically, I have taken a
>> look at the checker "memory.LeakPtrValChanged" mentioned on the list of
>> potential checkers page.
>>
>> Warning: As this is my first attempt at hacking clang, I may have gone a
>> different route than someone with more experience in the project. If
>> something in my description or patch seems out of place, please let me know!
>>
>> From the description, the proposed memory.LeakPtrValChanged checker was
>> to only consider a pointer to newly allocated data losing its original
>> value. Through some investigation, I find that MemRegion objects which
>> track pointers to memory allocations can also maintain any offset currently
>> applied to the pointer. Using this information, the checker can reason
>> about invalidated pointers beyond being 'newly allocated'. For example, the
>> following case can be caught:
>>
>> int * x = malloc(sizeof(int));
>> x += 1;
>> free(x);
>>
>> However, the following is valid:
>>
>> int * x = malloc(sizeof(int));
>> x += 1;
>> free(x-1);
>>
>> The attached patch uses the RegionOffset of freed malloc allocations to
>> determine if the freed pointer has a non-zero offset, and post a warning in
>> this case. If the offset is symbolic (and thus not known to be non-zero),
>> no warning is posted. There are tests included to verify the proposed
>> changes.
>>
>> Note that memory.LeakPtrValChanged mentioned checking both malloc/free
>> and new/delete, but this patch only considers malloc/free.
>>
>> Please let me know if the attached patch is appropriate, or if it is
>> missing something or there is another solution which may be a better fit.
>>
>> - Branden
>>
>> <leakPtrValChanged.patch>_______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20121227/56f433d6/attachment.html>