[cfe-dev] MemRegions as ProgramState info (was: checkBind: distinguish Re: checkBind: distinguish between MemRegionVal/ElementRegion)
Aitor San Juan
aitor.sj at opendeusto.es
Tue May 27 10:57:15 PDT 2014
Jordan, thanks for the comments.
Doubts (1) and (2) are now clear to me.
With regard to point (3), I didn't know *MemRegion*s can also be saved as
custom ProgramState information, the same way as *SymbolRef*s. So to be
able to keep track of which variables currently point to a String literal,
if I haven't misunderstood you, I've tried saving as the checker's custom
ProgramState information a MemRegion. In this scenario, I'm only interested
in whether a *char ** variable points to a String literal or not. So
instead of registering a Map with ProgramState, I think is enough to use a
Set that contains variable which point to a String literal at any given
time. Thus, if a given variable does not exist in the Set, the checker can
assume it doesn't point to a String literal. I've done the registering with
ProgramState as:
typedef const MemRegion * MemRegionRef;
*// This typedef does not officially exist, so I've privately defined it to
the checker as a helper *REGISTER_SET_WITH_PROGRAMSTATE(StrLit,
MemRegionRef)
So far, so good. No run-time error for the time being. However, when I
retrieve the current State and try to add/remove an *interested* variable
(what I actually add/remove is its MemRegion), nothing happens. The method
"contains" always returns false.
I wonder if this approach (using a Set) is correct. I see that the contents
of the ProgramState, are, for example, ElementRegions. Is it a matter of
casting, as you pointed out in your previous comments?
Aitor.
2014-05-20 18:40 GMT+02:00 Jordan Rose <jordan_rose at apple.com>:
> Hi, Aitor.
>
> On May 18, 2014, at 9:11 , Aitor San Juan <aitor.sj at opendeusto.es> wrote:
>
> Hello,
>
> Having reread the docs, I have several questions.
>
> 1) During the presentation you say that the order in which checker
> callbacks happen is not guaranteed by the analyzer as it explores the CFG.
> As far as I know, for example, PreCall will always be called before a
> PostCall event, and PreStmt always before a PostStmt. I don't really
> understand what you were referring to.
>
>
> It's more that the analyzer engine is most likely not exploring one path
> at a time. Instead, it could easily be performing a breadth-first search of
> the state space. Even if it did do a plain DFS, it would eventually have to
> jump back to try another path. Checkers can't tell the difference between
> any of these, so it's not safe to store any information in the checkers
> (with the one exception of lazily-initialized data that applies for the
> whole analysis).
>
> You're right that *along a single path* you can guarantee that the
> PreCall check will always have happened by the time you reach a PostCall
> check, and that logically speaking that implies that the PreCall is
> processed "first". But a lot more could have happened in the middle before
> you get to the PostCall. Or you may not get to the PostCall at all.
>
>
>
> 2) Should a checker be interested only in the parameter being passed
> during a function call, I guess it wouldn't make any difference whether
> checking the parameter in a PreCall event or in a PostCall event, would it?
> However, in this case, is it better to only register for the PreCall
> callback/event because of performance reasons?
>
>
> It depends what your interest is. If you're looking at the contents of
> memory referred to by a pointer argument, it makes a big difference whether
> you want to do such a thing pre-call or post-call, because it may have
> changed. If you're establishing some kind of preconditions or
> postconditions for the call, it makes sense to do so as a pre-call or
> post-call check, respectively. Remember that there may be other pre-call
> checkers that run after yours, or other post-call checkers that run before
> yours.
>
>
>
> 3) When tracking the use of values (variables) between callbacks, if
> needed, checkers must use the ProgramState as a means of preserving custom
> information. This is clear. It's best to refer to those values by the
> underlying symbol (symbolic representation) created by the analyzer. In my
> case, I want to track the use of pointers to char (variables of type char
> *). In this case, the 1st argumento to checkBind callback will be a
> MemRegionVal. Reading the documentation, the counterpart of a symbol
> (SymbolRef in terms of the API) with regard to MemRegions is a
> SymbolicRegion.
>
> Let's consider the following:
> char *s = "string literal";
> char pwd[] = "password";
> char *p;
> *p = s; (1)*
> *p = pwd; (2)*
>
> To be able to track the use of "p" --as in (1) and (2) above-- I was
> thinking of obtaining a symbol that represents that variable (memory
> region) and save that symbol in the ProgramState in case there's a future
> reference to it in the program being analyzed, similar to the idea in the
> sample SimpleStreamChecker. Why in this case "p" is not a symbolic region?
> Reading the API, I thought the best way would be: getSymbolicBase() to be
> able to call getSymbol() on the result, but the former returns NULL. So,
> did I misunderstood and MemRegion is just the counterpart of a SymbolRef?
> If so, what would be the best way of saving a MemRegion's symbolic
> representation in the ProgramState?
>
>
> A symbol represents an unknown, constrainable value—you have a type, you
> can say "this symbol's value is not 0", but that's about it.
>
> A MemRegion represents, well, a region in memory—it can have things stored
> in it. If you have a location-typed symbol (a pointer or reference), then
> it has a corresponding SymbolicRegion representing the memory at that
> unknown address. Not all symbols have MemRegions, though—an integer symbol
> does not represent a pointer. Similarly, not all MemRegions have symbols: a
> VarRegion has a known address and behavior (if abstracted) and can be
> represented much more concretly than the memory corresponding to an
> arbitrary returned pointer. A FieldRegion isn't a base region on its own;
> it's a subregion of some struct-typed super-region.
>
> Though the video doesn't mention it, MemRegions are also persistent,
> meaning you can use them as keys in the ProgramState directly. However, you
> have to be careful—a cast is represented as an ElementRegion subregion with
> a different location type. Using a base region as a key should always be
> safe, though, if it makes sense for your particular use case.
>
> In your particular example, 'p' represents a VarRegion whose location type
> is "char**" and whose value type is "char*". The contents of 'p' after 1 is
> the address of 's', which will be a loc::MemRegionVal referring to a
> StringRegion. The contents of 'p' after 2 is the address/start of 'pwd',
> which is a loc::MemRegionVal referring to a VarRegion of array type. None
> of the values are symbolic because the analyzer can model all of them
> fairly concretely.
>
> Hope that helps,
> Jordan
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140527/58559a29/attachment.html>
More information about the cfe-dev
mailing list