<div dir="ltr"><br><br><div class="gmail_quote">On Tue, Oct 7, 2008 at 12:52 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div style=""><br><div><div class="Ih2E3d"><div>On Oct 6, 2008, at 7:16 PM, Zhongxing Xu wrote:</div><br><blockquote type="cite"><div dir="ltr">On Mon, Oct 6, 2008 at 11:18 PM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span> wrote:<br>
<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div><div><div></div><div><br><div><div>On Oct 5, 2008, at 6:28 PM, Zhongxing Xu wrote:</div>
<br><blockquote type="cite"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div>My earlier idea was to have regions encode minimal information that could be shared amongst different implementations of StoreManager. I'm not really certain why you wish to add a VarDecl* "pointedBy" field to AnonTypedRegion. I didn't get around to commenting this class, but AnonTypedRegion is meant to represent a typed chunk of memory; it doesn't have to be pointed to by a VarDecl. It also seems to me that you are using AnonTypedRegion exactly the way BasicStore uses VarRegion. Isn't it the same thing? The "Anon" means anonymous; it means there is no name associated with this region.</div>
</blockquote><div><br>I use VarDecl* to differentiate AnonTypedRegions. For example, for <br><br>void foo(char* a, char* b) {...}, <br></div></div><br>'a' and 'b' both point to an AnonTypedRegion with type 'char', so I just use the VarDecls of 'a' and 'b' to differentiate these two regions. This is definitely not an optimal design, because there might be AnonTypedRegions that are not pointed to by any variable. So it should be discussed. But one thing is sure: we need something to be associated with an AnonTypedRegion besides its type and super region.<br>
</div></blockquote></div><br></div></div><div>I understand, but this particular use of AnonTypedRegions is exactly the same as VarRegion. Why not just use VarRegion instead and save the extra QualType? It also avoids putting the VarDecl* in AnonTypedRegion, since at that point the region is not anonymous.</div>
</div></blockquote><div><br>In RegionStoreManager, I assume that pointer parameters point to some anonymous memory region at the beginning of the function. So this AnonTypedRegion is the region that the parameter points to, not the memory region associated with the parameter itself. For a function parameter 'char *a', I actually create two regions. One is a VarRegion with the VarDecl of 'a' (the region 'R' in my patch); this is the region of 'a' itself. The other is an AnonTypedRegion (the region 'PR' in my patch); this is the region that is pointed to by 'a' (by assumption). This region really is anonymous, because we don't know where 'a' points when we set up the initial store for the function. (With future interprocedural analysis, we might know.) To differentiate the AnonTypedRegions pointed to by different parameters, I associate with each of them the VarDecl of the parameter pointing to it. Maybe we can have a subclass of AnonTypedRegion to represent such memory regions pointed to by function parameters, and give them a VarDecl or other information to differentiate them from each other.</div>
</div></div></blockquote><div><br></div></div><div>Hi Zhongxing,</div><div><br></div><div>Thanks for clarifying. That makes a lot more sense to me.</div><div><br></div><div>I just saw your other email where you submitted an alternate patch, but I'll mention some initial thoughts I have here. First, I think having an AnonPointeeRegion makes more sense than overloading the purpose of AnonTypedRegion. What you are trying to represent is the "symbolic address" of the parameters and global variables that are pointers upon entry to the function. One question that comes to mind is whether this binding is done up front or lazily. For example:</div>
<div><br></div><div>void foo(int **p) { ... }</div><div><br></div><div>In your patch the value of "p" upon entry to the function would be an address binding to an AnonTypedRegion/AnonPointeeRegion. What about *p? How many levels deep does it go? I don't have a good answer, but binding "symbolic" addresses lazily might be more flexible, dynamic, and scalable.</div>
<div><br></div><div>I have seen in the implementations of some static analysis systems that they bound the "deepness" of the heap. For example:</div><div><br></div><div> q->x->y might have an explicit region for the field 'y'</div>
<div><br></div><div>but</div><div><br></div><div> q->x->y->w might just have an "unknown" for w, just to bound the abstraction.</div><div><br></div><div>This question becomes particularly important when one considers recursive data structures. There doesn't have to be a one-size-fits-all solution; it's just something to consider.</div>
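<div><br></div><div>As a rough illustration of lazy binding combined with a depth bound, here is a minimal C++ sketch. It is only a model under stated assumptions: the names Region, LazyHeap, root and pointee are hypothetical and do not come from the patch or from the analyzer, and "unknown" is represented simply as a null pointer.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model: a region is either a root (a named parameter) or the
// symbolic pointee of another region. None of these names come from Clang.
struct Region {
    std::string name;  // printable path, e.g. "*p"
    int depth;         // number of dereferences from the root
};

class LazyHeap {
public:
    explicit LazyHeap(int maxDepth) : maxDepth_(maxDepth) {}

    // Create the region that holds a parameter itself (depth 0).
    const Region* root(const std::string& param) { return make(param, 0); }

    // Return the region that `r` points to, creating it on first use only.
    // Past the depth bound the result is "unknown", modelled as nullptr.
    const Region* pointee(const Region* r) {
        assert(r != nullptr);
        if (r->depth >= maxDepth_) return nullptr;
        auto it = pointees_.find(r);
        if (it != pointees_.end()) return it->second;
        const Region* p = make("*" + r->name, r->depth + 1);
        pointees_[r] = p;
        return p;
    }

    std::size_t regionCount() const { return regions_.size(); }

private:
    const Region* make(const std::string& name, int depth) {
        regions_.push_back(std::unique_ptr<Region>(new Region{name, depth}));
        return regions_.back().get();
    }

    int maxDepth_;
    std::vector<std::unique_ptr<Region>> regions_;
    std::map<const Region*, const Region*> pointees_;
};
```

<div>With a bound of 2, a parameter 'p' gets regions for *p and **p only when they are first dereferenced, and ***p is reported as unknown. Nothing is allocated for levels the analysis never touches, which is the flexibility the lazy scheme is meant to buy.</div>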
<div><br></div><div>Here's another (random) thought. Consider:</div><div><br></div><div>int foo(int* p, int *q) {</div><div> ...</div><div> if (*p > 10) { ... }</div><div> ...</div><div> if (*q < 20) { ... }</div>
<div> ...</div><div> if (p == q) { ... }</div><div> ...</div><div>}</div><div><br></div><div>What happens to the store when 'p == q'? Do we do alpha-renaming + unification of the regions and bindings? I.e., do all the values and constraints for the region and the values they map to get combined? When we start reasoning about abstract memory, these are things we want to consider in the design. Again, we don't have to solve all problems at once, but this is something fairly fundamental.</div>
</div></div></blockquote><div><br>These are all good points to consider. My suggestion is that we add complexity step by step. In this first basic region store model (see the attached new patch), we assume simple cases: we assume no aliasing for parameters, we do 1-limiting analysis and assume nothing about the heap shape. So the only big thing compared to BasicStore is the field sensitivity.<br>
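<br>To make the simple model concrete, here is a minimal C++ sketch of a field-sensitive store under the assumptions above (no aliasing between pointer parameters, one anonymous pointee per parameter). It is an illustration only: RegionStore, Value, bindPointerParam and arrow are hypothetical names and are not the classes in the patch.<br>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using RegionId = int;
const RegionId kNoRegion = -1;

// A value is either a plain integer or a location (the id of a region).
struct Value {
    bool isLoc;
    long n;
};

class RegionStore {
public:
    // Intern a region: the same (super, name) pair always yields the same id.
    // Top-level regions (variables, anonymous pointees) use kNoRegion as super.
    RegionId region(const std::string& name, RegionId super = kNoRegion) {
        std::pair<RegionId, std::string> key(super, name);
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        RegionId id = static_cast<RegionId>(ids_.size());
        ids_.emplace(key, id);
        return id;
    }

    void bind(RegionId r, Value v) {
        assert(r != kNoRegion);
        bindings_[r] = v;
    }

    // Returns nullptr when nothing is bound to the region.
    const Value* lookup(RegionId r) const {
        auto it = bindings_.find(r);
        return it == bindings_.end() ? nullptr : &it->second;
    }

    // Initial store for a pointer parameter: the variable's own region is
    // bound to the location of a fresh anonymous pointee region.
    RegionId bindPointerParam(const std::string& param) {
        RegionId var = region(param);
        RegionId pointee = region("pointee(" + param + ")");
        bind(var, Value{true, pointee});
        return var;
    }

    // Location of `base->field`, computed on the fly: follow the pointer
    // stored in `base`, then compose the field with the pointee region.
    RegionId arrow(RegionId base, const std::string& field) {
        const Value* v = lookup(base);
        if (v == nullptr || !v->isLoc) return kNoRegion;
        return region(field, static_cast<RegionId>(v->n));
    }

private:
    std::map<std::pair<RegionId, std::string>, RegionId> ids_;
    std::map<RegionId, Value> bindings_;
};
```

<br>The only state kept is the region-to-value map. Two parameters 'a' and 'b' get distinct pointee regions, and a->x and a->y resolve to distinct field regions, which is the field sensitivity that BasicStore lacks.<br>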
<br>Then we can design more complex ones: heap store model, alias store model, ...<br><br>Another remedy that I'm looking forward to is inter-procedural analysis. Most of the complexities above are due to a lack of environment information. But when we do top-down inter-procedural analysis, we will have (at least part of) the function entry information. This can relieve us from assuming the worst case.<br>
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div><div></div><div class="Ih2E3d"><div><br></div><blockquote type="cite">
<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div><br></div><div>It seems to me that RegionStoreManager doesn't need to use just one kind of region and one ImmutableMap. For example, a map from VarDecl* -> VarRegion could be used for mappings from variables to regions, and then other maps could be used for other bindings. For example, a (MemRegion*, FieldDecl*) -> FieldRegion could be used for field bindings. A second set of mappings could then be used for region -> value bindings. In your prototype implementation you have AnonTypedRegion* -> RVal, but one could also just have MemRegion* -> RVal.</div>
</div></blockquote></div><br>My thought is that we don't need any mappings from Decl* to MemRegion*. We only need one mapping from MemRegion* to its stored value RVal. Because once we have the mapping MemRegion* -> RVal, we can calculate the location (MemRegion*) of any name on the fly. So we don't need to store them.</div>
</blockquote><div><br></div></div><div>Unfortunately I don't believe that's true (which counteracts something I said in an earlier private email).</div><div><br></div><div>Consider:</div><div><br></div><div>int *p = 0;</div>
<div><br></div><div>for (int i = 0; i < 10; i++) {</div><div> if (p) *p++;</div><div> int j = i + 1;</div><div> p = &j;</div><div>}</div><div><br></div><div>On each iteration of the loop the VarDecl for 'j' will conceptually bind to a different region (although by coincidence it may bind to the same physical memory, logically it binds to a different "object"). While memory bindings for globals and parameters stay fixed during the execution of a function (with globals staying always fixed), bindings for local variables do not. If we want to catch cases like the *p++ being a use of invalid memory, we actually have to consider the case where the same VarDecl can bind to different regions. Note that we do not capture enough scope information in the CFGs to do this analysis right now, but that is something we will probably add to the CFGs in the future (especially if we wish to model implicit calls to destructors for C++ objects).</div>
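<div><br></div><div>One way to picture this is the small C++ model below. It is a sketch under assumptions, not analyzer code: ScopedEnv, enter, leave and countDanglingUses are hypothetical names, and scope entry and exit are called by hand because the CFG does not provide them. Each entry into the scope of 'j' creates a fresh region, and leaving the scope ends that region's lifetime.</div>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using RegionId = int;

class ScopedEnv {
public:
    // Entering the declaration's scope binds the name to a fresh region.
    RegionId enter(const std::string& var) {
        RegionId r = next_++;
        env_[var] = r;
        live_.insert(r);
        return r;
    }

    // Leaving the scope ends the region's lifetime and unbinds the name.
    void leave(const std::string& var) {
        auto it = env_.find(var);
        assert(it != env_.end());
        live_.erase(it->second);
        env_.erase(it);
    }

    bool isLive(RegionId r) const { return live_.count(r) != 0; }

private:
    RegionId next_ = 0;
    std::map<std::string, RegionId> env_;
    std::set<RegionId> live_;
};

// Replays the loop above for `iterations` rounds and counts how many times
// `*p++` would touch a region whose lifetime has already ended.
int countDanglingUses(int iterations) {
    ScopedEnv env;
    const RegionId kNull = -1;
    RegionId p = kNull;                                // int *p = 0;
    int dangling = 0;
    for (int i = 0; i < iterations; ++i) {
        if (p != kNull && !env.isLive(p)) ++dangling;  // if (p) *p++;
        RegionId j = env.enter("j");                   // int j = i + 1;
        p = j;                                         // p = &j;
        env.leave("j");                                // end of loop body
    }
    return dangling;
}
```

<div>In this model every iteration after the first dereferences the region left over from the previous iteration, so the use is flagged. A store keyed only by a single fixed region per VarDecl could not tell these regions apart.</div>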
</div></div></blockquote><div><br>Why would 'j' bind to a different region on each iteration in this example? In the actual (physical) execution of this program, 'j' just occupies the same stack memory. It's the same as a variable declared outside of the loop, except that its scope is limited to the loop body. If we instead bind it to a different region on each iteration, we will get wrong results, because that is inconsistent with the semantics of C.<br>
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div><div></div><div class="Ih2E3d"><div><br></div><blockquote type="cite">
<div dir="ltr"><span>For a struct member expression 'p->data', we can first get the super region by following p's MemRegion in the store mapping, then get the final region by composing the FieldDecl of 'data' with the MemRegion that p points to.</span></div>
</blockquote><div><br></div></div>Would the idea be to represent p->data (the composition) with a FieldRegion (with its super region being the region for 'p'), or something else?</div></div></blockquote><div><br>
Yes, exactly.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div></div><div class="Ih2E3d"><div><br></div><div>
<blockquote type="cite"><div dir="ltr">In summary, we only need to store the Store ( mapping from MemRegion* to RVal), but not the Environment (mapping from names to MemRegion*).<br></div></blockquote></div><br></div><div>
Given the example I provided above, do you still think that is the case? It's certainly simpler if we don't have to model the mapping from names to MemRegion*.</div><div></div></div></blockquote><div><br>Could you show me an example where the same VarDecl should bind to different regions during one analysis path?<br>
</div></div><br></div>