<div dir="ltr"><div><div>Jordan, thanks for the comments.<br><br></div>Doubts (1) and (2) are now clear to me.<br><br></div><div>With regard to point (3), I didn't know <span style="font-family:courier new,monospace"><i>MemRegion</i></span>s can also be saved as custom ProgramState information, the same way as <span style="font-family:courier new,monospace"><i>SymbolRef</i></span>s. So to be able to keep track of which variables currently point to a String literal, if I haven't misunderstood you, I've tried saving as the checker's custom ProgramState information a MemRegion. In this scenario, I'm only interested in whether a <i>char *</i> variable points to a String literal or not. So instead of registering a Map with ProgramState, I think is enough to use a Set that contains variable which point to a String literal at any given time. Thus, if a given variable does not exist in the Set, the checker can assume it doesn't point to a String literal. I've done the registering with ProgramState as:<br>


<br><span style="font-family:courier new,monospace">typedef const MemRegion * MemRegionRef; <i><span style="font-family:arial,helvetica,sans-serif">// This typedef does not officially exist, so I've privately defined it to the checker as a helper<br>


</span></i>REGISTER_SET_WITH_PROGRAMSTATE(StrLit, MemRegionRef)<br></span><br>So far, so good. No run-time error for the time being. However, when I retrieve the current State and try to add/remove an <i>interested</i> variable (what I actually add/remove is its MemRegion), nothing happens. The method "contains" always returns false.</div>

<div> </div><div>I wonder if this approach (using a Set) is correct. I see that the contents of the ProgramState, are, for example, ElementRegions. Is it a matter of casting, as you pointed out in your previous comments?<br>


</div><div><div><div><div class="gmail_extra">Aitor.<br><br><div class="gmail_quote">2014-05-20 18:40 GMT+02:00 Jordan Rose <span dir="ltr"><<a href="mailto:jordan_rose@apple.com" target="_blank">jordan_rose@apple.com</a>></span>:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><div style><div>Hi, Aitor.</div><br><div><div><div>On May 18, 2014, at 9:11 , Aitor San Juan <<a href="mailto:aitor.sj@opendeusto.es" target="_blank">aitor.sj@opendeusto.es</a>> wrote:</div>


<br><blockquote type="cite"><div dir="ltr"><div><div><div><div><div><div>Hello,<br><br>Having reread the docs, I have several questions.<br><br></div>1) During the presentation you say that the order in which checker callbacks happen is not guaranteed by the analyzer as it explores the CFG. As far as I know, for example, PreCall will always be called before a PostCall event, and PreStmt always before a PostStmt. I don't really understand what you were referring to.<br>


</div></div></div></div></div></div></blockquote><div><br></div></div><div>It's more that the analyzer engine is most likely not exploring one path at a time. Instead, it could easily be performing a breadth-first search of the state space. Even if it did do a plain DFS, it would eventually have to jump back to try another path. Checkers can't tell the difference between any of these, so it's not safe to store any information in the checkers (with the one exception of lazily-initialized data that applies for the whole analysis).</div>


<div><br></div><div>You're right that <i>along a single path</i> you can guarantee that the PreCall check will always have happened by the time you reach a PostCall check, and that logically speaking that implies that the PreCall is processed "first". But a lot more could have happened in the middle before you get to the PostCall. Or you may not get to the PostCall at all.</div>


<div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div><div><div><div><div>


<br>2) Should a checker be interested only in the parameter being passed during a function call, I guess it wouldn't make any difference whether checking the parameter in a PreCall event or in a PostCall event, would it? However, in this case, is it better to only register for the PreCall callback/event because of performance reasons?<br>


</div></div></div></div></div></div></blockquote><div><br></div></div><div>It depends what your interest is. If you're looking at the contents of memory referred to by a pointer argument, it makes a big difference whether you want to do such a thing pre-call or post-call, because it may have changed. If you're establishing some kind of preconditions or postconditions for the call, it makes sense to do so as a pre-call or post-call check, respectively. Remember that there may be other pre-call checkers that run after yours, or other post-call checkers that run before yours.</div>


<div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div><div><div><div><div>


<br></div>3) When tracking the use of values (variables) between callbacks, if needed, checkers must use the ProgramState as a means of preserving custom information. This is clear. It's best to refer to those values by the underlying symbol (symbolic representation) created by the analyzer. In my case, I want to track the use of pointers to char (variables of type char *). In this case, the 1st argumento to checkBind callback will be a MemRegionVal. Reading the documentation, the counterpart of a symbol (SymbolRef in terms of the API) with regard to MemRegions is a SymbolicRegion.<br>


<br></div>Let's consider the following:<br></div><div></div><span style="font-family:courier new,monospace">char *s = "string literal";<br>char pwd[] = "password";<br>char *p;<br></span></div><span style="font-family:courier new,monospace"><b>p = s; (1)</b><br>


</span><span style="font-family:courier new,monospace"><b>p = pwd; (2)</b></span><br></div><br>To be able to track the use of "p" --as in (1) and (2) above-- I was thinking of obtaining a symbol that represents that variable (memory region) and save that symbol in the ProgramState in case there's a future reference to it in the program being analyzed, similar to the idea in the sample SimpleStreamChecker. Why in this case "p" is not a symbolic region? Reading the API, I thought the best way would be: getSymbolicBase() to be able to call getSymbol() on the result, but the former returns NULL. So, did I misunderstood and MemRegion is just the counterpart of a SymbolRef? If so, what would be the best way of saving a MemRegion's symbolic representation in the ProgramState?<br>


</div></blockquote><br></div></div><div>A symbol represents an unknown, constrainable value—you have a type, you can say "this symbol's value is not 0", but that's about it.</div><div><br></div><div>A MemRegion represents, well, a region in memory—it can have things stored in it. If you have a location-typed symbol (a pointer or reference), then it has a corresponding SymbolicRegion representing the memory at that unknown address. Not all symbols have MemRegions, though—an integer symbol does not represent a pointer. Similarly, not all MemRegions have symbols: a VarRegion has a known address and behavior (if abstracted) and can be represented much more concretly than the memory corresponding to an arbitrary returned pointer. A FieldRegion isn't a base region on its own; it's a subregion of some struct-typed super-region.</div>


<div><br></div><div>Though the video doesn't mention it, MemRegions are also persistent, meaning you can use them as keys in the ProgramState directly. However, you have to be careful—a cast is represented as an ElementRegion subregion with a different location type. Using a base region as a key should always be safe, though, if it makes sense for your particular use case.</div>


<div><br></div><div>In your particular example, 'p' represents a VarRegion whose location type is "char**" and whose value type is "char*". The contents of 'p' after 1 is the address of 's', which will be a loc::MemRegionVal referring to a StringRegion. The contents of 'p' after 2 is the address/start of 'pwd', which is a loc::MemRegionVal referring to a VarRegion of array type. None of the values are symbolic because the analyzer can model all of them fairly concretely.</div>


<div><br></div><div>Hope that helps,</div><div>Jordan</div><br></div></blockquote></div><br></div></div></div></div></div>