[cfe-dev] [StaticAnalyzer] Loc and NonLoc SVal
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 2 08:10:05 PDT 2017
Long story short: Loc ("location") is a single point in memory (normally
a pointer, maybe a C++ reference value or a null pointer or a function
pointer or even a goto label address that might be treated as a
first-class value through a gcc language extension). Note that Loc
represents a single pointer value; even though a loc::MemRegionVal value
may wrap a whole memory region, which in turn is a segment in memory,
Loc itself represents the start of that region. Everything else - such
as numbers or bools, or maybe structure ("compound") values regardless
of their contents - is a NonLoc. A compound value (nonloc::CompoundVal
or nonloc::LazyCompoundVal) represents the whole contents of the
structure regardless of where it is currently stored, hence it's a NonLoc.
NonLoc is always an rvalue; Loc may be either an lvalue or a
pointer-type rvalue. For example, in
01 int x = 4;
02 int y = 5;
03 x += y;
on line 03 the left-hand side of operator+= is a Loc that represents a
pointer to local variable x - that's where the result will be stored -
and the right-hand side is a NonLoc that represents concrete integer 5.
If you want to find the concrete number 4, you'd need to notice that the
left-hand side is an lvalue expression in the sense of Expr::isLValue(),
and then do State::getSVal(Loc) on the left-hand side. Then you'd
finally be able to add 4 and 5 and compute the result of the operator.
It is not advised, however, to decide whether you need to do getSVal()
by checking whether the left-hand side is a Loc. It may be a Loc for a
different reason, e.g.:
01 void *x = malloc(BUF);
02 int y = 5;
03 void *z = x + y;
^here the left-hand side of operator+ is a Loc rvalue that represents a
pointer to the start of the buffer, not &x like in the example above. If
it was operator+=, it would have been &x and you'd have to do the extra
getSVal(). But here you may accidentally shoot yourself in the foot if
you make a redundant getSVal().
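To make the difference between the two examples above concrete, here is a minimal toy model. It is only a sketch: ToyVal, ToyStore, load, lhsNeedsLoad and evalAddAssign are hypothetical names made up for illustration, not the analyzer's classes or API. load() merely plays the role that getSVal(Loc) plays in the text.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins for illustration only; these are NOT the analyzer's classes.
struct ToyVal {
  enum Kind { Loc, NonLoc } kind;
  std::string region; // meaningful when kind == Loc: the region pointed to
  long number;        // meaningful when kind == NonLoc: a concrete integer
};

ToyVal makeLoc(const std::string &region) {
  return ToyVal{ToyVal::Loc, region, 0};
}
ToyVal makeNonLoc(long n) { return ToyVal{ToyVal::NonLoc, std::string(), n}; }

// Toy store: maps a region name to the value currently bound to it.
typedef std::map<std::string, ToyVal> ToyStore;

// Plays the role that getSVal(Loc) plays in the text: read through a location.
ToyVal load(const ToyStore &store, const ToyVal &loc) {
  assert(loc.kind == ToyVal::Loc);
  return store.at(loc.region);
}

enum Op { Add, AddAssign };

// The decision comes from the operator (is the LHS an lvalue to be read?),
// not from whether the LHS value happens to be a Loc.
bool lhsNeedsLoad(Op op) { return op == AddAssign; }

// Evaluates `lhs += rhs` for an integer variable: load, add, store back.
long evalAddAssign(ToyStore &store, const ToyVal &lhs, const ToyVal &rhs) {
  long oldValue = load(store, lhs).number;
  long result = oldValue + rhs.number;
  store[lhs.region] = makeNonLoc(result);
  return result;
}
```

In this model both makeLoc("x") (the lvalue case) and makeLoc("heap_buffer") (the pointer rvalue case) have kind Loc, so the kind alone cannot tell you whether a load is needed; only the operator can. In a real checker the analogous question could probably be asked of the AST node, for instance with BinaryOperator::isCompoundAssignmentOp().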
Because in C pointers can be cast to integers and vice versa, some
subtleties arise. Consider:
01 void *x = malloc(2 * sizeof(int));
02 *((int *) x) = 1;
03 size_t i = foo();
04 intptr_t y = (intptr_t) &((int *) x)[i];
The analyzer doesn't know what value would be returned by malloc(). It
constructs a SymbolConjured `conj_$0<void *>' to represent the numeric
value of the pointer. Being a SymExpr, it isn't a Loc or a NonLoc.
Technically it's possible to construct nonloc::SymbolVal with
`conj_$0<void *>' to represent such numeric value as a NonLoc, but the
analyzer wouldn't normally do that for pointer symbols.
Instead, the analyzer constructs a SymbolicRegion, which represents the
segment of memory allocated by malloc(). The analyzer even knows the
extent (size or length, in bytes) of this region (which would in our
example be 8 - let's assume a 32-bit architecture). Then the analyzer
represents the return value of malloc() as a loc::MemRegionVal, which is
a Loc, of form "&SymRegion{conj_$0<void *>}". It means the pointer that
points to the beginning of the region that was allocated by malloc.
SymbolicRegion is essentially a bridge between symbols and Locs. If the
pointer points to a known value, eg. &x, it'd not be a symbolic region,
and the analyzer wouldn't construct any symbol at all to represent its
numeric value. But when a pointer appears from elsewhere, we have to
denote its value with a symbol and represent the pointer value as a
symbolic region's start.
On the second line, the analyzer sees the cast of the pointer to "int*".
The SVal of the expression would be "&element{SymRegion{conj_$0<void
*>}, 0 S32b, int}". It means that the analyzer interpreted the
aforementioned symbolic region as an array of ints. The analyzer isn't
sure if there are actually ints in this array (which is why
SymbolicRegion doesn't inherit from TypedValueRegion). It then takes the
first element of this array, which is a 4-byte chunk of the symbolic
region. The SVal is a loc::MemRegionVal that represents the pointer to
this first element. This SVal is technically *equal* to the old one:
"&SymRegion{conj_$0<void *>}" - both represent the same pointer value.
They wouldn't be equal in terms of SVal::operator==(), but they'd be
equal if you use SValBuilder to evaluate BO_EQ on them (which is the
right way to compare SVals). However, the new SVal carries additional
type information that you may want to extract from it by querying its
.getAsRegion().
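The gap between "same representation" and "same pointer value" can be sketched with a toy model. ToyPtr, sameRepresentation, byteOffset and sameAddress are hypothetical names used only for illustration; the analyzer's real comparison goes through SValBuilder as described above, and this sketch does not reproduce it.

```cpp
#include <cassert>
#include <string>

// Toy pointer value: a base symbolic region, optionally wrapped in one
// element{...} layer. For illustration only; NOT the analyzer's classes.
struct ToyPtr {
  std::string baseSymbol; // e.g. "conj_$0"
  bool hasElement;        // true if an element layer is present
  long index;             // element index (used when hasElement is true)
  long elemSize;          // element size in bytes (used when hasElement)
};

// Structural comparison: do the two values have the same shape and fields?
bool sameRepresentation(const ToyPtr &a, const ToyPtr &b) {
  return a.baseSymbol == b.baseSymbol && a.hasElement == b.hasElement &&
         a.index == b.index && a.elemSize == b.elemSize;
}

// Byte offset of the pointer from the start of its base region.
long byteOffset(const ToyPtr &p) {
  return p.hasElement ? p.index * p.elemSize : 0;
}

// Value comparison: same base region and same byte offset.
bool sameAddress(const ToyPtr &a, const ToyPtr &b) {
  return a.baseSymbol == b.baseSymbol && byteOffset(a) == byteOffset(b);
}
```

With a raw region pointer {"conj_$0", false, 0, 0} and its element-zero view {"conj_$0", true, 0, 4}, sameRepresentation() is false while sameAddress() is true, which mirrors the distinction drawn above for the two SVals.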
On line 04 computation of the right-hand side of the expression starts
with constructing a similar ElementRegion, just with symbolic offset:
"&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int}". It
represents a pointer to i'th element of the array. It's not necessarily
within the malloc'ed region - may accidentally go out of bounds, but we
still say it's a sub-region. However, the code is interested in the
numeric value of the pointer. So the analyzer uses another bridge class
- nonloc::LocAsInteger, which represents pointed-to addresses as
non-locations. The analyzer can cast the LocAsInteger back to the
initial pointer. The SVal stored in `y' may look like this:
"&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int} (as 32-bit
integer)".
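At the level of the analyzed program, the round trip that LocAsInteger keeps track of looks roughly like the sketch below. It is plain C++ with no analyzer API involved; roundTripsThroughInteger is a hypothetical helper name, and the buffer size simply mirrors the malloc() call in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Converts a pointer to an element of a malloc'ed buffer into an integer
// and back, and reports whether the original pointer value is recovered.
// The index i is expected to be 0 or 1 (the buffer holds two ints).
bool roundTripsThroughInteger(std::size_t i) {
  int *buf = static_cast<int *>(std::malloc(2 * sizeof(int)));
  if (!buf)
    return false; // allocation failed, nothing to demonstrate

  int *elem = buf + i;                                     // pointer (a Loc)
  std::intptr_t y = reinterpret_cast<std::intptr_t>(elem); // integer view
  int *back = reinterpret_cast<int *>(y);                  // cast back

  bool same = (back == elem);
  std::free(buf);
  return same;
}
```

The integer y carries the same information as the pointer it came from, which is why the analyzer can keep the original location inside the integer value and hand it back when the integer is cast to a pointer again.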
SVal hierarchy is a bit tricky, and it's generally very helpful to see
hierarchies for SVal, SymExpr, MemRegion at clang's doxygen - they have
examples of stuff that goes in each class. Also you can use
SValExplainer to print out values in a human-friendly manner.
On 6/1/17 12:59 PM, Paul Bert via cfe-dev wrote:
> Hi,
> I understand that a SVAL is a kind of union wrapping a symbolic value
> , a mem region or a concrete value. However I don't really understand
> the meaning of Loc and NonLoc sub classes.
> Can someone explain their purposes?
>
> Thanks
>
> Paul