[cfe-dev] [StaticAnalyzer] Loc and NonLoc SVal

Vlad Tsyrklevich via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 7 08:22:43 PDT 2017


isSubRegionOf() should do the trick, it just walks up the chain of
super-regions to find if it's a child of the region. You can verify that
this is true or not for the cases you're looking at. I've hit issues where
the super region I was looking at had a cast applied, so it was an
ElementRegion{foo, 0, $cast_type} and was not a super-region of something
like FieldRegion{foo, bar}. You might try to use
SuperRegion->StripCasts()->isSubRegionOf() if that's the case. I think this
can especially be an issue with array references since you might have an
ElementRegion{arr, 0} when in reality it's a reference to the entire array.

On Wed, Jun 7, 2017 at 10:51 AM Paul Bert via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Thanks a lot for the very detailed answer.
> I just recently saw the answer.
> So to compare 2 SVal I have to use SValBuilder.
> I think it is pretty clear how to do this if I want to assert equality
> with svalbuilder::evalEQ . How can I check if a given memory region whose
> symbolic expression is "decorated" with "Element" due to casting, is a
> sub-region of another one? Is SubRegionOf() is not working. How can I do
> this with SValBuilder? Any code example I can look at?
>
>
> Le vendredi 2 juin 2017, Artem Dergachev <noqnoqneo at gmail.com> a écrit :
>
>> Long story short: Loc ("location") is a single point in memory (normally
>> a pointer, maybe a C++ reference value or a null pointer or a function
>> pointer or even a goto label address that might be treated as a first-class
>> value through a gcc language extension). Note that Loc represents a single
>> pointer value; even though a loc::MemRegionVal value may wrap a whole
>> memory region, which in turn is a segment in memory, Loc itself represents
>> the start of that region. Everything else - such as numbers or bools, or
>> maybe structure ("compound") values regardless of their contents - is a
>> NonLoc. A compound value (nonloc::CompoundVal or nonloc::LazyCompoundVal)
>> represents the whole contents of the structure regardless of where it is
>> currently stored, hence it's a NonLoc.
>>
>> NonLoc is always an rvalue; Loc may be either an lvalue or a pointer-type
>> rvalue. For example, in
>>
>> 01  int x = 4;
>> 02  int y = 5;
>> 02  x += y;
>>
>> on line 03 the left-hand side of oeprator+= is a Loc that represents a
>> pointer to local variable x - that's where the result will be stored - and
>> the right-hand side is a NonLoc that represents concrete integer 5. If you
>> want to find the concrete number 4, you'd need to notice that left-hand
>> side is an lvalue expression in the sense of Expr::isLValue(), and then do
>> State::getSVal(Loc) from the left-hand side. Then you'd finally be able to
>> add 4 and 5 and compute the result of the operator. It is not adviced,
>> however, to determine if you need to do getSVal() by seeing if left-hand
>> side is a Loc. It may be a Loc for a different reason, eg:
>>
>> 01 void *x = malloc(BUF);
>> 02 int y = 5;
>> 03 void *z = x + y;
>>
>> ^here the left-hand side of operator+ is a Loc rvalue that represents a
>> pointer to the start of the buffer, not &x like in the example above. If it
>> was operator+=, it would have been &x and you'd have to do the extra
>> getSVal(). But here you may accidentally shoot yourself in the foot if you
>> make a redundant getSVal().
>>
>> Because in C pointers can be casted to integers and vice versa, some
>> subtleties arize. Consider:
>>
>> 01  void *x = malloc(2 * sizeof(int));
>> 02  *((int *) x) = 1;
>> 03  size_t i = foo();
>> 04  intptr_t y = &((int *) x)[i];
>>
>> The analyzer doesn't know what value would be returned by malloc(). It
>> construct a SymbolConjured `conj_$0<void *>' to represent the numeric value
>> of the pointer. Being a SymExpr, it isn't Loc or NonLoc. Technically it's
>> possible to construct nonloc::SymbolVal with `conj_$0<void *>' to represent
>> such numeric value as a NonLoc, but the analyzer wouldn't normally do that
>> for pointer symbols.
>>
>> Instead, the analyzer constructs a SymbolicRegion, which represents the
>> segment of memory allocated by malloc(). The analyzer even knows the extent
>> (size or length, in bytes) of this region (which would in our example be 8
>> - let's assume a 32-bit architecture). Then the analyzer represents the
>> return value of malloc() as a loc::MemRegionVal, which is a Loc, of form
>> "&SymRegion{conj_$0<void *>}". It means the pointer that points to the
>> beginning of the region that was allocated by malloc.
>>
>> SymbolicRegion is essentially a bridge between symbols and Locs. If the
>> pointer points to a known value, eg. &x, it'd not be a symbolic region, and
>> the analyzer wouldn't construct any symbol at all to represent its numeric
>> value. But when a pointer appears from elsewhere, we have to denote its
>> value with a symbol and represent the pointer value as a symbolic region's
>> start.
>>
>> On the second line, the analyzer sees the cast of the pointer to "int*".
>> The SVal of the expression would be "&element{SymRegion{conj_$0<void *>}, 0
>> S32b, int}". It means that the analyzer interpreted the aforementioned
>> symbolic region as an array of ints. The analyzer isn't sure if there are
>> actually ints in this array (which is why SymbolicRegion doesn't inherit
>> from TypedValueRegion). Then he took the first element of this array, which
>> is a 4-byte chunk of the symbolic region. The SVal is a loc::MemRegionVal
>> represents the pointer to this first element. This SVal is technically
>> *equal* to the old one: "&SymRegion{conj_$0<void *>}" - both represent the
>> same pointer value. They wouldn't be equal in terms of SVal::operator==(),
>> but they'd be equal if you use SValBuilder to evaluate BO_EQ on them (which
>> is the right way to compare SVals). However, the new SVal carries
>> additional type information that you may want to extract from it by
>> querying its .getAsRegion().
>>
>> On line 04 computation of the right-hand side of the expression starts
>> with constructing a similar ElementRegion, just with symbolic offset:
>> "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int}". It represents
>> a pointer to i'th element of the array. It's not necessarily within the
>> malloc'ed region - may accidentally go out of bounds, but we still say it's
>> a sub-region. However, the code is interested in the numeric value of the
>> pointer. So the analyzer uses another bridge class - nonloc::LocAsInteger,
>> which represents pointed-to addresses as non-locations. The analyzer can
>> cast the LocAsInteger back to the initial pointer. The SVal stored in `y'
>> may look like this: "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>,
>> int} (as 32-bit integer)".
>>
>> SVal hierarchy is a bit tricky, and it's generally very helpful to see
>> hierarchies for SVal, SymExpr, MemRegion at clang's doxygen - they have
>> examples of stuff that goes in each class. Also you can use SValExplainer
>> to print out values in a human-friendly manner.
>>
>>
>> On 6/1/17 12:59 PM, Paul Bert via cfe-dev wrote:
>>
>>> Hi,
>>> I understand that a SVAL is a kind of union wrapping a symbolic value ,
>>> a mem region or a concrete value.  However I don't really understand the
>>> meaning of Loc and NonLoc sub classes.
>>> Can someone explain their purposes?
>>>
>>> Thanks
>>>
>>> Paul
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170607/7a563e09/attachment.html>


More information about the cfe-dev mailing list