[cfe-dev] How to extract a symbol stored in LazyCompoundVal?
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Wed Jun 26 17:31:21 PDT 2019
On 6/26/19 5:29 PM, Artem Dergachev wrote:
> You shouldn't do the "StoreMgr.getBinding(St,
> loc::MemRegionVal(Arg0Reg)) --> Undefined" part; it's not what i
> suggested and it doesn't work because Arg0Reg is already "dead" and
> its value has been garbage-collected (i assume that by Arg0Reg you
> mean the VarRegion for pos2).
>
> Like, if pos2 is never used later in the program, then we don't need
> to remember its value. So if move_to_pos(pos2) is the last use of
> pos2, we'll drop the binding and the Store would be empty.
>
> Accessing dead regions will produce unexpected results; the code for
> getSVal()/getBinding() assumes that the region is live (or that the
> expression is active when you try to retrieve a value of an expression).
>
> In this case it's like "hmm, the user is reading from a local
> variable, but it has no bindings, which means it has never been
> written to (otherwise i would have remembered it), which means that
> it's undefined behavior and contents of the variable are undefined".
> The real reason why Store doesn't remember any bindings is because it
> has correctly forgotten about them.
>
> This is why we include the (lazy) copy of the old Store with the
> LazyCompoundValue. In order to extract data from
> lazyCompoundVal{0x4182e58,pos2}, you need to load the variable from
> the Store 0x4182e58, *not* from the current Store. The region is still
> live in that old store, but in the current store it's no longer there.
>
>
> This is why in order to obtain
Whoops, didn't finish my email :) I mean, the thing with
`getDefaultBinding(LCV)` is that it knows that it needs to look up the
value in the correct Store, so that's what you should use.
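
To make the "old Store versus current Store" point concrete, here is a minimal toy model in plain C++. It does not use the analyzer's real classes: `Bindings`, `StoreRef`, `ToyLazyCompoundVal`, `lookup`, `removeDead` and `lookupThroughSnapshot` are all hypothetical names invented for this sketch, and a missing binding stands in for "Undefined".

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Toy stand-ins only; these are NOT the static analyzer's real types.
using Bindings = std::map<std::string, std::string>; // region name -> value
using StoreRef = std::shared_ptr<const Bindings>;    // immutable snapshot

// A lazy compound value remembers which store it was taken from.
struct ToyLazyCompoundVal {
  StoreRef Snapshot;  // the store that was current when the value was created
  std::string Region; // the region whose contents the value stands for
};

// Reads a region from one specific store. An empty optional models the
// "no binding, so the value looks undefined" outcome.
std::optional<std::string> lookup(const StoreRef &S, const std::string &Region) {
  auto It = S->find(Region);
  if (It == S->end())
    return std::nullopt;
  return It->second;
}

// Models dead-binding cleanup: builds a new store without the dead region.
// The old snapshot is left untouched.
StoreRef removeDead(const StoreRef &S, const std::string &DeadRegion) {
  Bindings Copy = *S;
  Copy.erase(DeadRegion);
  return std::make_shared<const Bindings>(std::move(Copy));
}

// Looks the value up in the snapshot carried by the lazy compound value,
// never in whatever store happens to be current.
std::optional<std::string> lookupThroughSnapshot(const ToyLazyCompoundVal &V) {
  return lookup(V.Snapshot, V.Region);
}
```

In this model, once `removeDead` has dropped `pos2` from the current store, `lookup(Current, "pos2")` comes back empty, while `lookupThroughSnapshot` still finds the value because the snapshot keeps the old bindings alive.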
>
> On 6/26/19 3:22 PM, Torry Chen wrote:
>> I tried State->dump() and it shows there is no default binding for a
>> struct variable that copies another. See below. I certainly can (and
>> should) use the field symbols for my work. I was curious about the
>> internals of the analyzer engine. Thank you for the detailed explanation!
>>
>> // Binding: pos1 -> conj_$3{struct XY, LC1, S45538, #1}
>> struct XY pos1 = next_pos(10, 20);
>> move_to_pos(pos1);
>> // evalCall, State->dump():
>> //
>> // Store (direct and default bindings), 0x4176b88 :
>> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
>> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
>> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
>> //
>> // Expressions by stack frame:
>> // #0 Calling main
>> // (LC1, S45573) move_to_pos : &code{move_to_pos}
>> // (LC1, S45581) pos1 : lazyCompoundVal{0x4176b88,pos1}
>> //
>> // Ranges are empty.
>> //
>> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) -->
>> // *conj_$3{struct XY, LC1, S45538, #1}*
>>
>> // Binding: pos2 -> lazyCompoundVal{0x4176b88,pos1}
>> struct XY pos2 = pos1;
>> move_to_pos(pos2);
>> // evalCall, State->dump():
>> //
>> // Store (direct and default bindings), 0x4176b88 :
>> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
>> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
>> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
>> //
>> // Expressions by stack frame:
>> // #0 Calling main
>> // (LC1, S45618) move_to_pos : &code{move_to_pos}
>> // (LC1, S45626) pos2 : lazyCompoundVal{0x4182e58,pos2}
>> //
>> // Ranges are empty.
>> //
>> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> *Undefined*
>>
>> On Wed, 26 Jun 2019 at 12:15, Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>> Hmm, weird.
>>
>> I suspect that assignment was handled with "small struct
>> optimization", i.e. field-by-field rather than lazily (cf.
>> RegionStoreManager::tryBindSmallStruct).
>>
>> Could you do a State->dump() to verify that? If it shows that
>> there's no default binding but instead there are two derived
>> symbols bound to two different offsets, then the information
>> about the "whole struct symbol" is already more or less lost: the
>> static analyzer no longer remembers that this whole structure is
>> the same as pos1, but it does remember that its fields,
>> separately, are exactly the same as they were in pos1, which is
>> what you see by looking at the fields separately.
>>
>> Generally we don't have many checkers that track structures as a
>> whole and we don't really know what the checker API *should* look
>> like in order to make such checkers easy to implement. The only
>> such checker that we have is IteratorChecker and it kinda tries
>> to do something but it's not very convenient. For C++ objects i'm
>> thinking of tracking a "whole structure symbol" artificially, so
>> that it didn't have anything to do with the actual contents of
>> the structure but more with its semantic meaning: it would be
>> preserved by const operations (even if they mutate memory
>> contents of mutable fields) or through copies/moves and
>> additionally you would be able to attach state traits to it
>> without thinking about manually modeling copies/moves.
>>
>> I guess in your case, which seems to be more like a C world, the
>> ad-hoc solution would be to do something like
>>
>> let's see...
>> pos2.x comes from pos1...
>> pos2.y also comes from pos1...
>> aha, got it!
>> the whole pos2 comes from pos1!
>>
>> You will *anyway* have to do this because the programmer is free
>> to copy the structure field-by-field manually instead of just
>> assigning the structure. This would also happen in C++ if the
>> structure has a non-trivial constructor. For the same reason it's
>> not enough to check only 'x' but skip 'y': the programmer can
>> easily overwrite one field but not the other field.
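
The "pos2.x comes from pos1, pos2.y also comes from pos1" reasoning above can be sketched as a small stand-alone check. This is only a toy model in plain C++, not analyzer code: `ToyFieldValue`, `ToyField` and `wholeStructParent` are hypothetical names, and the `IsDerived` / `ParentSymbol` / `ParentField` members merely imitate what a checker would learn from a derived symbol and its parent.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for the value bound to one field; NOT the analyzer's SVal.
struct ToyFieldValue {
  bool IsDerived;           // true if the value is derived from a parent symbol
  std::string ParentSymbol; // e.g. "conj_$3"; meaningful only if IsDerived
  std::string ParentField;  // field of the source struct it was derived from
};

struct ToyField {
  std::string Name;
  ToyFieldValue Value;
};

// Returns the common parent symbol when every field is derived from the same
// parent symbol and from the same-named field of the source struct.
// Returns nullopt as soon as one field breaks that pattern, for example
// because the programmer overwrote it after the copy.
std::optional<std::string>
wholeStructParent(const std::vector<ToyField> &Fields) {
  if (Fields.empty())
    return std::nullopt;
  std::optional<std::string> Parent;
  for (const ToyField &F : Fields) {
    if (!F.Value.IsDerived || F.Value.ParentField != F.Name)
      return std::nullopt;
    if (!Parent)
      Parent = F.Value.ParentSymbol;
    else if (*Parent != F.Value.ParentSymbol)
      return std::nullopt;
  }
  return Parent;
}
```

Checking every field, rather than only `X`, is what covers the case where one field was overwritten and the other was not.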
>>
>> Finally, i'm surprised that it returns an UndefinedVal (i.e., in
>> particular, it allows you to unwrap the Optional) instead of
>> None. This sounds like a bug. But it might be because the
>> structure does indeed have an undefined default binding (eg.,
>> this happens when it's allocated by malloc() or operator new).
>> It'd make sense because assigning every field wouldn't overwrite
>> the default binding. Which, in turn, should remind you that
>> relying on the "structure symbol" in order to figure out what the
>> contents of the structure are is not a good idea unless your
>> structure is immutable and completely opaque or you somehow know
>> that it's freshly created. But direct bindings to fields are
>> actually always trustworthy. That's how our memory model works.
>>
>>
>> On 6/25/19 9:10 PM, Torry Chen wrote:
>>> Thank you Artem! It seems StoreManager::getDefaultBinding()
>>> won't work if the struct variable is copied. As shown below,
>>> getDefaultBinding() returns an undefined SVal.
>>>
>>> I could go down into fields to get the derived symbols for X and
>>> Y respectively, and then use getParentSymbol() to get the symbol
>>> for the whole struct. This looks cumbersome though. Is there a
>>> more convenient way to get the symbol for the whole struct in
>>> this case?
>>>
>>> // checkBind: pos1 -> conj_$3{struct XY, LC1, S45418, #1}
>>> struct XY pos1 = next_pos(10, 20);
>>>
>>> // checkBind: pos2 -> lazyCompoundVal{0x5d4bb38,pos1}
>>> struct XY pos2 = pos1;
>>>
>>> move_to_pos(pos2);
>>>
>>> /** evalCall for move_to_pos():
>>> SVal Pos = C.getSVal(CE->getArg(0));
>>> ProgramStateRef State = C.getState();
>>> StoreManager &StoreMgr =
>>> State->getStateManager().getStoreManager();
>>> auto LCV = Pos.getAs<nonloc::LazyCompoundVal>();
>>> SVal LCSVal = *StoreMgr.getDefaultBinding(*LCV);
>>> LCSVal.dump(); // <- Undefined
>>> ...
>>> const Store St = LCV->getCVData()->getStore();
>>> const SVal FieldSVal = StoreMgr.getBinding(St,
>>> loc::MemRegionVal(FieldReg));
>>> FieldSVal.dump(); // <- derived_$4{conj_$3{struct XY, LC1,
>>> S45418, #1},pos1->X}
>>>
>>> const auto *SD = dyn_cast<SymbolDerived>(FieldSVal.getAsSymbol());
>>> const auto ParentSym = SD->getParentSymbol();
>>> ParentSym->dump(); // <- conj_$3{struct XY, LC1, S45418, #1}
>>> **/
>>>
>>> On Tue, 25 Jun 2019 at 14:06, Artem Dergachev
>>> <noqnoqneo at gmail.com> wrote:
>>>
>>> The "0x4aa1c58" part of "lazyCompoundVal{0x4aa1c58,pos1}" is
>>> a Store object. You can access it with getStore() and then
>>> read it with the help of a StoreManager.
>>>
>>> Hmm, we seem to already have a convenient API for that, you
>>> can do
>>> StoreManager::getDefaultBinding(nonloc::LazyCompoundVal)
>>> directly if all you need is a default-bound conjured symbol.
>>> But if you want to look up, say, specific fields in the
>>> structure (X and Y separately), you'll need to do
>>> getBinding() on manually constructed FieldRegions (in your
>>> case it doesn't look very useful because the whole structure
>>> is conjured anyway).
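
As a rough illustration of the difference between a default binding on the whole structure and a lookup of one specific field, here is a toy model in plain C++. It is an assumption-laden sketch rather than the analyzer's implementation: `ToyStore` and `readField` are hypothetical names, values are plain strings, and the `derived{...}` spelling only imitates the dump format seen elsewhere in this thread.

```cpp
#include <map>
#include <string>

// Toy stand-in for a store with two kinds of bindings; NOT RegionStore.
struct ToyStore {
  std::map<std::string, std::string> Default; // region -> default-bound symbol
  std::map<std::string, std::string> Direct;  // "region.field" -> value
};

// Reads one field of a struct region:
//  1. a direct binding on the field wins;
//  2. otherwise the field's value is derived from the region's default binding;
//  3. with no binding at all, the result is reported as "Undefined".
std::string readField(const ToyStore &S, const std::string &Region,
                      const std::string &Field) {
  const std::string Key = Region + "." + Field;
  if (auto It = S.Direct.find(Key); It != S.Direct.end())
    return It->second;
  if (auto It = S.Default.find(Region); It != S.Default.end())
    return "derived{" + It->second + "," + Region + "->" + Field + "}";
  return "Undefined";
}
```

In this model, writing to a single field adds a direct binding but leaves the default binding in place, so the "whole structure" symbol alone no longer describes the current contents of every field.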
>>>
>>> I guess at this point you might like the chapter 5 of my old
>>> workbook
>>> (https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf),
>>> as for now it seems to be the only place where different
>>> kinds of values are explained.
>>>
>>>
>>> On 6/25/19 2:35 AM, Torry Chen via cfe-dev wrote:
>>>> My project has a struct type as follows and I'm writing a
>>>> checker for some functions that take the struct value as an
>>>> argument. In the checkPreCall function I see the argument
>>>> is a LazyCompoundVal, not a symbol as it would be for a
>>>> primitive type. I tried a few ways to extract the symbol
>>>> from the LazyCompoundVal with no luck. Hope to get some
>>>> help here.
>>>>
>>>> struct XY {
>>>> uint64_t X;
>>>> uint64_t Y;
>>>> };
>>>>
>>>> ...
>>>> // checkBind: pos1 -> conj_$3{struct XY, LC1, S63346, #1}
>>>> struct XY pos1 = next_pos(...);
>>>>
>>>> // checkPreCall: Arg0: lazyCompoundVal{0x4aa1c58,pos1}
>>>> move_to_pos(pos1);
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>