[cfe-dev] [StaticAnalyzer] Loc and NonLoc SVal

Paul Bert via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 7 07:51:30 PDT 2017


Thanks a lot for the very detailed answer.
I just recently saw the answer.
So to compare 2 SVal I have to use SValBuilder.
I think it is pretty clear how to do this if I want to assert equality with
svalbuilder::evalEQ . How can I check if a given memory region whose
symbolic expression is "decorated" with "Element" due to casting, is a
sub-region of another one? Is SubRegionOf() is not working. How can I do
this with SValBuilder? Any code example I can look at?


Le vendredi 2 juin 2017, Artem Dergachev <noqnoqneo at gmail.com> a écrit :

> Long story short: Loc ("location") is a single point in memory (normally a
> pointer, maybe a C++ reference value or a null pointer or a function
> pointer or even a goto label address that might be treated as a first-class
> value through a gcc language extension). Note that Loc represents a single
> pointer value; even though a loc::MemRegionVal value may wrap a whole
> memory region, which in turn is a segment in memory, Loc itself represents
> the start of that region. Everything else - such as numbers or bools, or
> maybe structure ("compound") values regardless of their contents - is a
> NonLoc. A compound value (nonloc::CompoundVal or nonloc::LazyCompoundVal)
> represents the whole contents of the structure regardless of where it is
> currently stored, hence it's a NonLoc.
>
> NonLoc is always an rvalue; Loc may be either an lvalue or a pointer-type
> rvalue. For example, in
>
> 01  int x = 4;
> 02  int y = 5;
> 02  x += y;
>
> on line 03 the left-hand side of oeprator+= is a Loc that represents a
> pointer to local variable x - that's where the result will be stored - and
> the right-hand side is a NonLoc that represents concrete integer 5. If you
> want to find the concrete number 4, you'd need to notice that left-hand
> side is an lvalue expression in the sense of Expr::isLValue(), and then do
> State::getSVal(Loc) from the left-hand side. Then you'd finally be able to
> add 4 and 5 and compute the result of the operator. It is not adviced,
> however, to determine if you need to do getSVal() by seeing if left-hand
> side is a Loc. It may be a Loc for a different reason, eg:
>
> 01 void *x = malloc(BUF);
> 02 int y = 5;
> 03 void *z = x + y;
>
> ^here the left-hand side of operator+ is a Loc rvalue that represents a
> pointer to the start of the buffer, not &x like in the example above. If it
> was operator+=, it would have been &x and you'd have to do the extra
> getSVal(). But here you may accidentally shoot yourself in the foot if you
> make a redundant getSVal().
>
> Because in C pointers can be casted to integers and vice versa, some
> subtleties arize. Consider:
>
> 01  void *x = malloc(2 * sizeof(int));
> 02  *((int *) x) = 1;
> 03  size_t i = foo();
> 04  intptr_t y = &((int *) x)[i];
>
> The analyzer doesn't know what value would be returned by malloc(). It
> construct a SymbolConjured `conj_$0<void *>' to represent the numeric value
> of the pointer. Being a SymExpr, it isn't Loc or NonLoc. Technically it's
> possible to construct nonloc::SymbolVal with `conj_$0<void *>' to represent
> such numeric value as a NonLoc, but the analyzer wouldn't normally do that
> for pointer symbols.
>
> Instead, the analyzer constructs a SymbolicRegion, which represents the
> segment of memory allocated by malloc(). The analyzer even knows the extent
> (size or length, in bytes) of this region (which would in our example be 8
> - let's assume a 32-bit architecture). Then the analyzer represents the
> return value of malloc() as a loc::MemRegionVal, which is a Loc, of form
> "&SymRegion{conj_$0<void *>}". It means the pointer that points to the
> beginning of the region that was allocated by malloc.
>
> SymbolicRegion is essentially a bridge between symbols and Locs. If the
> pointer points to a known value, eg. &x, it'd not be a symbolic region, and
> the analyzer wouldn't construct any symbol at all to represent its numeric
> value. But when a pointer appears from elsewhere, we have to denote its
> value with a symbol and represent the pointer value as a symbolic region's
> start.
>
> On the second line, the analyzer sees the cast of the pointer to "int*".
> The SVal of the expression would be "&element{SymRegion{conj_$0<void *>},
> 0 S32b, int}". It means that the analyzer interpreted the aforementioned
> symbolic region as an array of ints. The analyzer isn't sure if there are
> actually ints in this array (which is why SymbolicRegion doesn't inherit
> from TypedValueRegion). Then he took the first element of this array, which
> is a 4-byte chunk of the symbolic region. The SVal is a loc::MemRegionVal
> represents the pointer to this first element. This SVal is technically
> *equal* to the old one: "&SymRegion{conj_$0<void *>}" - both represent the
> same pointer value. They wouldn't be equal in terms of SVal::operator==(),
> but they'd be equal if you use SValBuilder to evaluate BO_EQ on them (which
> is the right way to compare SVals). However, the new SVal carries
> additional type information that you may want to extract from it by
> querying its .getAsRegion().
>
> On line 04 computation of the right-hand side of the expression starts
> with constructing a similar ElementRegion, just with symbolic offset:
> "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int}". It
> represents a pointer to i'th element of the array. It's not necessarily
> within the malloc'ed region - may accidentally go out of bounds, but we
> still say it's a sub-region. However, the code is interested in the numeric
> value of the pointer. So the analyzer uses another bridge class -
> nonloc::LocAsInteger, which represents pointed-to addresses as
> non-locations. The analyzer can cast the LocAsInteger back to the initial
> pointer. The SVal stored in `y' may look like this:
> "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int} (as 32-bit
> integer)".
>
> SVal hierarchy is a bit tricky, and it's generally very helpful to see
> hierarchies for SVal, SymExpr, MemRegion at clang's doxygen - they have
> examples of stuff that goes in each class. Also you can use SValExplainer
> to print out values in a human-friendly manner.
>
>
> On 6/1/17 12:59 PM, Paul Bert via cfe-dev wrote:
>
>> Hi,
>> I understand that a SVAL is a kind of union wrapping a symbolic value , a
>> mem region or a concrete value.  However I don't really understand the
>> meaning of Loc and NonLoc sub classes.
>> Can someone explain their purposes?
>>
>> Thanks
>>
>> Paul
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170607/1b8c345a/attachment.html>


More information about the cfe-dev mailing list