[cfe-dev] Checking Model Used by 'ArrayBoundV2'

Tue Dec 6 15:32:28 PST 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/05/2011 06:36 PM, Ted Kremenek wrote:
> 
> On Dec 3, 2011, at 10:48 PM, Anna Zaks wrote:
> 
>>> E.g. I would expect that even in the purely intraprocedural case it
>>> could catch something like this:
>>>
>>> int *foo, *bar;
>>> int baz = something;
>>>
>>> foo = (int*) malloc(CONSTANT * sizeof(int));
>>> bar = (foo + CONSTANT + 1);
>>> *bar = baz;
>>>
>>> , which the present implementation of 'ArrayBoundV2' does not catch.
>>
>> This example is not handled because the checker does not reason about malloc. Note, the analyzer CAN find the bug in the following example:
>>
>> void m() {
>>    int foo[3];
>>    int *bar;
>>    bar = (foo + 3 + 1);
>>    *bar = 3;
>> }
> 
> The way I'd like to see this generally approached is that ArrayBoundV2 could consult some API to check the "known bound" for a region.  We have a mechanism, known as "extents".  Specific checkers could register their knowledge of extents (e.g., the Malloc checker could register an extent for malloc'ed memory) and ArrayBoundCheckerV2 just cues off that extent information.
> 
> I'm interested in a general, scalable solution for adding such information.  In the short term, that could be added to ArrayBoundCheckerV2 as a post-visit on CallExprs.  This would leave the current bounds checking logic to just consult the generic extent information in the ProgramState.
> 
> More generally, I'd like any checker be able to register information about extents, and then have the ArrayBoundCheckerV2 automatically get smarter by leveraging such domain-specific information.  Further, once IPA comes available, we may be able to infer this extent information by other means.
> 

One thing that springs immediately to my mind is that symbolic pointers
could be more closely associated to instances of MemRegion, e.g. added
as a class member, and that this would make a number of things easier.
I've looked through the definitions in
'~/lib/StaticAnalyzer/Core/MemRegion.cpp' and
'~/include/clang/StaticAnalyzer/Core/MemRegion.h', and it wasn't
immediately clear to me how the base region of a MemRegion relates back
to a symbolic value (i.e. SVal).  This appears to create some
difficulties for the current implementation, e.g. the stubbed check of
'computeExtentBegin()' in 'ArrayBoundV2'.  On first glance, it looks
like the checker has to ask for a lot more information from the
execution context than just the symbolic start of the region and its
expected extent.

Maybe I've misunderstood the code, but it seems like a symbolic pointer
could be associated to a memory region whenever it is declared or
mallocked,  and propagated through the execution as just another
reaching definition, with the extent of the associated region included
as part of that definition.  In particular, any pointer 'q' into the
region originally associated to 'p' would be expected to satisfy
something like '(q in bound) => (exists i such that (p + i in bound and
q = p + i))'.  Constraints along these lines could then be applied as
needed.

Is this something like what you're thinking of?  Have I missed something
in the code or misunderstood what I've seen so far?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJO3qYMAAoJECT8Sbi31ptpFpcH/3+eQxY9Og+oaNvrAAvtnvIY
T+t5OJy20PkByDV+K+zukHqHRaCLtSsvtmKIp00tUNN1IQ9AS2P7JR/HOKZXF5AI
Svq/C5YX5VfJjrN10lfUD2q8HV2NqbKOqiN1k7TNf5cC4mwktvoT3zz0hG13EOh+
EmM69v5y8bY39r5coBCJkFRZt07xijKtI7aFIM4UlgPo3tZcNVdeL/DEOIIHpTyE
oNkDLF4btuaOT27z8WxPH7Thiqbs7SDV6SavtahNxa3+I09dG0tdxNoGMTpBrlkw
Vi5FviY0w6vxcktV6PjVXyYzZjUwWawTdJBpYjyMTpUmPHqvh8N0nmcJuN7HQ18=
=cQBa
-----END PGP SIGNATURE-----