[cfe-dev] [StaticAnalyzer] Loc and NonLoc SVal

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Fri Jun 2 08:10:05 PDT 2017


Long story short: Loc ("location") is a single point in memory (normally 
a pointer, maybe a C++ reference value or a null pointer or a function 
pointer or even a goto label address that might be treated as a 
first-class value through a gcc language extension). Note that Loc 
represents a single pointer value; even though a loc::MemRegionVal value 
may wrap a whole memory region, which in turn is a segment in memory, 
Loc itself represents the start of that region. Everything else - such 
as numbers or bools, or maybe structure ("compound") values regardless 
of their contents - is a NonLoc. A compound value (nonloc::CompoundVal 
or nonloc::LazyCompoundVal) represents the whole contents of the 
structure regardless of where it is currently stored, hence it's a NonLoc.

NonLoc is always an rvalue; Loc may be either an lvalue or a 
pointer-type rvalue. For example, in

01  int x = 4;
02  int y = 5;
02  x += y;

on line 03 the left-hand side of oeprator+= is a Loc that represents a 
pointer to local variable x - that's where the result will be stored - 
and the right-hand side is a NonLoc that represents concrete integer 5. 
If you want to find the concrete number 4, you'd need to notice that 
left-hand side is an lvalue expression in the sense of Expr::isLValue(), 
and then do State::getSVal(Loc) from the left-hand side. Then you'd 
finally be able to add 4 and 5 and compute the result of the operator. 
It is not adviced, however, to determine if you need to do getSVal() by 
seeing if left-hand side is a Loc. It may be a Loc for a different 
reason, eg:

01 void *x = malloc(BUF);
02 int y = 5;
03 void *z = x + y;

^here the left-hand side of operator+ is a Loc rvalue that represents a 
pointer to the start of the buffer, not &x like in the example above. If 
it was operator+=, it would have been &x and you'd have to do the extra 
getSVal(). But here you may accidentally shoot yourself in the foot if 
you make a redundant getSVal().

Because in C pointers can be casted to integers and vice versa, some 
subtleties arize. Consider:

01  void *x = malloc(2 * sizeof(int));
02  *((int *) x) = 1;
03  size_t i = foo();
04  intptr_t y = &((int *) x)[i];

The analyzer doesn't know what value would be returned by malloc(). It 
construct a SymbolConjured `conj_$0<void *>' to represent the numeric 
value of the pointer. Being a SymExpr, it isn't Loc or NonLoc. 
Technically it's possible to construct nonloc::SymbolVal with 
`conj_$0<void *>' to represent such numeric value as a NonLoc, but the 
analyzer wouldn't normally do that for pointer symbols.

Instead, the analyzer constructs a SymbolicRegion, which represents the 
segment of memory allocated by malloc(). The analyzer even knows the 
extent (size or length, in bytes) of this region (which would in our 
example be 8 - let's assume a 32-bit architecture). Then the analyzer 
represents the return value of malloc() as a loc::MemRegionVal, which is 
a Loc, of form "&SymRegion{conj_$0<void *>}". It means the pointer that 
points to the beginning of the region that was allocated by malloc.

SymbolicRegion is essentially a bridge between symbols and Locs. If the 
pointer points to a known value, eg. &x, it'd not be a symbolic region, 
and the analyzer wouldn't construct any symbol at all to represent its 
numeric value. But when a pointer appears from elsewhere, we have to 
denote its value with a symbol and represent the pointer value as a 
symbolic region's start.

On the second line, the analyzer sees the cast of the pointer to "int*". 
The SVal of the expression would be "&element{SymRegion{conj_$0<void 
*>}, 0 S32b, int}". It means that the analyzer interpreted the 
aforementioned symbolic region as an array of ints. The analyzer isn't 
sure if there are actually ints in this array (which is why 
SymbolicRegion doesn't inherit from TypedValueRegion). Then he took the 
first element of this array, which is a 4-byte chunk of the symbolic 
region. The SVal is a loc::MemRegionVal represents the pointer to this 
first element. This SVal is technically *equal* to the old one: 
"&SymRegion{conj_$0<void *>}" - both represent the same pointer value. 
They wouldn't be equal in terms of SVal::operator==(), but they'd be 
equal if you use SValBuilder to evaluate BO_EQ on them (which is the 
right way to compare SVals). However, the new SVal carries additional 
type information that you may want to extract from it by querying its 
.getAsRegion().

On line 04 computation of the right-hand side of the expression starts 
with constructing a similar ElementRegion, just with symbolic offset: 
"&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int}". It 
represents a pointer to i'th element of the array. It's not necessarily 
within the malloc'ed region - may accidentally go out of bounds, but we 
still say it's a sub-region. However, the code is interested in the 
numeric value of the pointer. So the analyzer uses another bridge class 
- nonloc::LocAsInteger, which represents pointed-to addresses as 
non-locations. The analyzer can cast the LocAsInteger back to the 
initial pointer. The SVal stored in `y' may look like this: 
"&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int} (as 32-bit 
integer)".

SVal hierarchy is a bit tricky, and it's generally very helpful to see 
hierarchies for SVal, SymExpr, MemRegion at clang's doxygen - they have 
examples of stuff that goes in each class. Also you can use 
SValExplainer to print out values in a human-friendly manner.


On 6/1/17 12:59 PM, Paul Bert via cfe-dev wrote:
> Hi,
> I understand that a SVAL is a kind of union wrapping a symbolic value 
> , a mem region or a concrete value.  However I don't really understand 
> the meaning of Loc and NonLoc sub classes.
> Can someone explain their purposes?
>
> Thanks
>
> Paul
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




More information about the cfe-dev mailing list