[cfe-commits] unix.Malloc static checker improvement: memory.LeakPtrValChanged

Thu Dec 20 17:32:47 PST 2012

I'll try to answer you piece-by-piece.

On Dec 19, 2012, at 20:39 , Branden Archer <b.m.archer4 at gmail.com> wrote:

> Jordan,
> 
> Thanks for the feedback. 
> 
> I have a few questions about your comments. My first impression when I looked at your examples was, 'Why would that not work?' Originally I believed that some state regarding the allocate function would be remembered, and when it was invoked the static checking framework would mark that the pointer returned to the caller would always have an offset of 1. I am not sure that is true, though.

The easy case is when this is true, which is the case where "there is some malloc data associated with the region". However, the analyzer currently cannot do cross-translation-unit analysis, so that immediately presents one case where we might retrieve some data value and have no idea where it came from. Even if we performed whole-program analysis, however, external libraries will always provide an interface boundary that checkers will most likely not be able to perfectly model.

(If I store something using pthread_setspecific, and retrieve it later using pthread_getspecific, does the analyzer know it's the same value? If I run something asynchronously, can I assume globals are still in the same state that they were before?)

> And further, even if it was, posting a warning at the free inside the deallocate function would not make sense, as the pointer passed to deallocate could have multiple possible offsets depending on what the program was doing. To be effective, the warning would need to be issued when deallocate was called, which would be much trickier.

Well, there isn't much we can do about this one, since any user could write their own deallocate(). But we could make sure that we show where the memory was allocated, which means we'd see the execution path (including the call stack) that led to the free inside deallocate().

> The more I think about this, the more I am getting lost in some of the details. When deallocate is analyzed, nothing is known about what memoryBlock actually represents. The caller could be passing malloc'd memory or something else. I like your suggestion about making sure that memoryBlock has known malloc information. This would limit the checker to only using malloc and free, and only within the same function. However, I think that is what memory.LeakPtrValChanged is supposed to accomplish anyway.

Yup.

> A final point I am not clear on is how to determine if a MemRegion has malloc information. I see where the ReportBadFree function in MallocChecker.cpp is finding the base MemRegion, and also determining if the region is generated from alloca. How would one know if it was generated from malloc instead? Maybe if the MemRegion was a HeapSpaceRegion?

MemRegion has a function called getBaseRegion() that strips off as many kinds of subregion as the analyzer knows about. (That is, it could still turn out that the base region is a subobject of a larger region, but it's at least not known to be.) MallocChecker tracks which regions came from malloc in its RegionState map, so you can just check to see if the base region is in the map, and what its RefState is.
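A toy C model of that lookup, for illustration only: the `Region` struct, `get_base_region()`, `lookup_region_state()`, and the table type below are hypothetical stand-ins, not Clang's MemRegion API or MallocChecker's actual RegionState map. The point it shows is the order of operations: strip the known subregion layers first, then ask whether the resulting base region is one that malloc produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; NOT Clang's MemRegion API. */
typedef struct Region {
  const struct Region *super; /* enclosing region, or NULL for a base region */
  long offset;                /* byte offset within the enclosing region */
} Region;

typedef enum { REF_ALLOCATED, REF_RELEASED } RefState;

typedef struct {
  const Region *base; /* region that came from malloc */
  RefState state;
} RegionStateEntry;

/* Strip every known layer of subregion, in the spirit of getBaseRegion(). */
const Region *get_base_region(const Region *r) {
  assert(r != NULL);
  while (r->super != NULL)
    r = r->super;
  return r;
}

/* Return the tracked state for r's base region, or NULL if the base region
   is not in the table (i.e. we have no malloc information for it). */
const RegionStateEntry *lookup_region_state(const RegionStateEntry *table,
                                            size_t count, const Region *r) {
  const Region *base = get_base_region(r);
  for (size_t i = 0; i < count; ++i)
    if (table[i].base == base)
      return &table[i];
  return NULL;
}
```

A NULL result corresponds to "no malloc information", which is exactly the case where the checker would stay quiet.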

Does that make sense?

> Also, thanks for pointing out the coding standards. I will follow them more closely in the future.

Glad to have you working on this!
Jordan

> 
> On Mon, Dec 17, 2012 at 12:59 PM, Jordan Rose <jordan_rose at apple.com> wrote:
> Hi, Branden. It's a good idea, but unfortunately it has some problems. In C, it's totally legal to do this:
> 
> int *allocate(size_t size) {
>   int *memoryBlock = (int *)malloc(size + sizeof(int));
>   *memoryBlock = SECRET_CODE;
>   return &memoryBlock[1];
> }
> 
> void deallocate(int *memoryBlock) {
>   assert(memoryBlock[-1] == SECRET_CODE);
>   free(&memoryBlock[-1]);
> }
> 
> 
> I'm not sure of the best way to solve this. IIRC, by default the region '*memoryBlock' will be a SymbolicRegion backed by a RegionValueSymbol, but if 'deallocate' has been inlined the backing symbol could be a DerivedRegionSymbol or a ConjuredSymbol instead. So it'd be very hard to differentiate these cases without actually seeing the call to allocate().
> 
> What you could try is seeing if the base region already has malloc information. That will miss some true bugs, but it should also drastically lower the rate of false positives, since we'll only be warning about regions we know can be freed.
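The suggested rule can be written as a small C predicate. This is a sketch only: `FreedPointer` and `should_warn_on_free()` are hypothetical names, not code from the patch or from MallocChecker. It warns only when the base region is already known to come from malloc and the offset is concrete and non-zero; a symbolic offset, or a pointer with no malloc information (such as the parameter of deallocate() above), produces no warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the checker knows at a call to free(). */
typedef struct {
  bool has_malloc_info;     /* base region has malloc state recorded */
  bool has_symbolic_offset; /* offset is not a known concrete value */
  long offset;              /* concrete byte offset, meaningful only if known */
} FreedPointer;

/* Warn only about regions we know can be freed, at a known non-zero offset. */
bool should_warn_on_free(FreedPointer p) {
  if (!p.has_malloc_info)
    return false; /* pointer came from elsewhere, e.g. deallocate()'s param */
  if (p.has_symbolic_offset)
    return false; /* offset might be zero, so stay quiet */
  return p.offset != 0;
}
```

The trade-off is the one described above: some true bugs (frees of pointers whose origin the analyzer never saw) are missed in exchange for far fewer false positives.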
> 
> As far as the patch itself, your logic seems reasonable, but your style doesn't match the rest of the file or the LLVM Coding Standards. In particular, please put a space after 'if', put operators at the end of the previous line instead of the start of the next line, and fit your lines in 80 columns. I'd also prefer '!offset.hasSymbolicOffset()' over 'offset.hasSymbolicOffset() == false'.
> 
> You'll probably want more test cases: cases where the input parameter does not come from malloc but has an offset, and at least one case where the input parameter comes from outside the function and has an offset.
> 
> Thanks for working on this!
> Jordan
> 
> 
> 
> On Dec 15, 2012, at 21:58 , Branden Archer <b.m.archer4 at gmail.com> wrote:
> 
>> I have recently started looking into clang, and was interested in participating. After taking a look at the potential projects, the static checking functionality seemed interesting. Specifically, I have taken a look at the checker "memory.LeakPtrValChanged" mentioned on the list of potential checkers page.
>> 
>> Warning: As this is my first attempt at hacking clang, I may have gone a different route than someone with more experience in the project. If something in my description or patch seems out of place, please let me know!
>> 
>> From the description, the proposed memory.LeakPtrValChanged checker was only meant to consider a pointer to newly allocated data losing its original value. Through some investigation, I found that the MemRegion objects which track pointers to memory allocations can also record any offset currently applied to the pointer. Using this information, the checker can reason about invalidated pointers beyond the 'newly allocated' case. For example, the following case can be caught:
>> 
>> int * x = malloc(sizeof(int));
>> x += 1;
>> free(x);
>> 
>> However, the following is valid:
>> 
>> int * x = malloc(sizeof(int));
>> x += 1;
>> free(x-1);
>> 
>> The attached patch uses the RegionOffset of freed malloc allocations to determine whether the freed pointer has a non-zero offset, and posts a warning in that case. If the offset is symbolic (and thus not known to be non-zero), no warning is posted. Tests are included to verify the proposed changes.
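As a rough runtime analogue of that offset bookkeeping (the analyzer does it statically through RegionOffset; this is not how the patch works), the hypothetical `TrackedPtr` type below carries the original malloc result next to a byte offset. It mirrors the two examples above: after `x += 1`, freeing at the current position is refused, while stepping back by one int makes the free valid again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime analogue of a heap region plus its current offset. */
typedef struct {
  char *base;       /* value returned by malloc */
  ptrdiff_t offset; /* current byte offset from base */
} TrackedPtr;

TrackedPtr tracked_malloc(size_t size) {
  TrackedPtr p = {malloc(size), 0};
  return p;
}

/* Equivalent of `x += n` for an int pointer: the offset moves in int units. */
TrackedPtr tracked_add_ints(TrackedPtr p, ptrdiff_t n) {
  p.offset += n * (ptrdiff_t)sizeof(int);
  return p;
}

/* Frees only at offset zero; returns false (the case the checker would
   warn about) instead of handing an interior pointer to free(). */
bool tracked_free(TrackedPtr p) {
  if (p.offset != 0)
    return false;
  free(p.base);
  return true;
}
```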
>> 
>> Note that memory.LeakPtrValChanged mentioned checking both malloc/free and new/delete, but this patch only considers malloc/free.
>> 
>> Please let me know if the attached patch is appropriate, or if it is missing something or there is another solution which may be a better fit.
>> 
>> - Branden
>> 
>> <leakPtrValChanged.patch>
> 
> 
