[cfe-dev] Checking Model Used by 'ArrayBoundV2'
Anna Zaks
ganna at apple.com
Wed Dec 7 16:09:01 PST 2011
On Dec 7, 2011, at 3:15 PM, Benjamin Schulz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Much thanks, Ted. The explanation below actually clears up a great deal
> of confusion about how the analyzer represents memory regions. I saw a
> message on the commits list suggesting the explanation had been put into
> the inline code comments, but for what it's worth, something similar
> would also look pretty good in the online documentation, if there's a
> place to put it.
I did add the explanation to the online documentation in r 146048 (www is in the same repository as the code itself).
Anna.
>
> At any rate, your patient explanations are greatly appreciated.
>
> On 12/06/2011 06:57 PM, Ted Kremenek wrote:
>> On Dec 6, 2011, at 3:32 PM, Benjamin Schulz wrote:
>>
>>> One thing that springs immediately to my mind is that symbolic pointers
>>> could be more closely associated to instances of MemRegion, e.g. added
>>> as a class member, and that this would make a number of things easier.
>>> I've looked through the definitions in
>>> '~/lib/StaticAnalyzer/Core/MemRegion.cpp' and
>>> '~/include/clang/StaticAnalyzer/Core/MemRegion.h', and it wasn't
>>> immediately clear to me how the base region of a MemRegion relates back
>>> to a symbolic value (i.e. SVal). This appears to create some
>>> difficulties for the current implementation, e.g. the stubbed check of
>>> 'computeExtentBegin()' in 'ArrayBoundV2'. On first glance, it looks
>>> like the checker has to ask for a lot more information from the
>>> execution context than just the symbolic start of the region and its
>>> expected extent.
>>>
>>> Maybe I've misunderstood the code, but it seems like a symbolic pointer
>>> could be associated to a memory region whenever it is declared or
>>> mallocked, and propagated through the execution as just another
>>> reaching definition, with the extent of the associated region included
>>> as part of that definition. In particular, any pointer 'q' into the
>>> region originally associated to 'p' would be expected to satisfy
>>> something like '(q in bound) => (exists i such that (p + i in bound and
>>> q = p + i))'. Constraints along these lines could then be applied as
>>> needed.
>>>
>>> Is this something like what you're thinking of? Have I missed something
>>> in the code or misunderstood what I've seen so far?
>>
>> I think you are missing a few critical pieces of the overall design.
>>
>> SVals are meant to represent the semantic evaluation of expressions. They can represent things like concrete integers, symbolic integers, or memory locations (which are memory regions). They are a discriminated union of "values", symbolic and otherwise. There is no meaning to associate a memory region back to its SVal, because that relationship doesn't exist. An SVal is just an utterance of a value; it's not the representative of the value itself.
>>
>> In contrast, symbols are meant to represent abstract, but named, symbolic values. Symbolic values can have constraints associated with them. Symbols represent an actual (immutable) value. We might not know what its specific value is, but we can associate constraints with that value as we analyze a path. To be more concrete, suppose we have:
>>
>> int foo(int x) {
>> int y = x * 2;
>> int z = x;
>> ...
>> }
>>
>> Let's look at how x*2 gets evaluated. When 'x' is evaluated, we first construct an SVal that represents the lvalue of 'x', in this case it is an SVal that references the MemoryRegion for 'x' (more on that later). Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value *currently bound* to 'x'. That value is symbolic; it's whatever 'x' was bound to at the start of the function. Let's call that symbol $0. Similarly, we evaluate the expression for '2', and get an SVal that references the concrete number '2'. When we evaluate x*2, we take the two SVals of the subexpressions, and create a new SVal that represents their multiplication (which in this case is a new symbolic value, which we might call $1). When we evaluate the assignment to 'y', we again compute its lvalue (a MemRegion), and then bind the SVal for the RHS (which references the symbolic value $1) to the MemRegion in the symbolic store.
>>
>> The second line is similar. When we evaluate 'x' again, we do the same dance, and have an SVal that references $0. That SVal isn't malloc'ed (with a persistent address), it is just a C++ value object. Think of it as just a smart pointer, referencing the real values. Two SVals might reference the same underlying values.
>>
>> To clarify, a MemRegion is similar to a symbol. It is used to provide a lexicon of how to describe abstract memory. Regions can layer on top of other regions, providing a layered description of how to describe memory. For example, a struct object on the stack might be represented by a VarRegion, but a FieldRegion which is a subregion of the VarRegion could be used to represent the memory associated with a specific field of that object.
>>
>> Stepping back, from this perspective, there is no inherit association between a MemRegion and an SVal. An SVal might reference a MemRegion, and an SVal might represent the value bound to a MemRegion in the symbolic store, but there is a many-to-one relationship from SVals (which are just value references) to any given MemRegion.
>>
>> So how do we represent symbolic memory regions? That's what SymbolicRegion is for. It is a MemRegion that has an associated symbol. Since the symbol is unique and has a unique name, that symbol names the region.
>>
>> To summarize, MemRegions are unique names for blocks of memory. Symbols are unique names for abstract symbolic values. Some MemRegions represents abstract symbolic chunks of memory, and thus are also based on symbols. SVals are just references to values, and can reference either MemRegions, Symbols, or concrete values (e.g., the number 1).
>>
>> So what about extents? Extents are just symbolic values as well. We also have various mechanisms to relate symbols to other symbols. Indeed, we already have SymbolExtent, which can be used to represent the associated length of a SymbolicRegion. By using symbols to represent extents of symbolic regions, we employ the same constraint management logic for extents that we do for any other symbolic value.
>>
>> We really do have all the pieces in place. We already have symbolic extents. The idea might need to be refined in its implementation, but everything we need is already basically there.
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJO3/N3AAoJECT8Sbi31ptppbAH/id70lhEweiUcrJcZJnmJ2hP
> PACMTT9xnFI9oc5Ep7A99PeT9h6sojctHyMPHYGFKEz3GIZikMGkNeEpK0kxeasl
> LS7hiKRFjuQiMvzRJOmrKMa0zH0FwsAt9LbVqT18E6wX921Vg5SX64c24I/jv9om
> FHD82YIpKU6Mas90SGiW5Gh+2NlMKEZYYqw9rWQvk7U5bYgUwgUkDdHg+4qhqPn5
> SJC2SN4K8CUvqrDS9WBLPS5gKFgRAuQmwcdbaPUtyKQGwt8ziDE2XbfO/+EXX/7m
> 1cD8OcY51CXjzdYq5nNIiuO/Zbbb8aNTPBPVLYhc4HGPNZ30pQN7d3YYNweV7og=
> =SkeQ
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list