[cfe-dev] How to extract a symbol stored in LazyCompoundVal?

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 26 17:31:21 PDT 2019


On 6/26/19 5:29 PM, Artem Dergachev wrote:
> You shouldn't do the "StoreMgr.getBinding(St,
> loc::MemRegionVal(Arg0Reg)) --> Undefined" part. It's not what I
> suggested, and it doesn't work because Arg0Reg is already "dead" and
> its value has been garbage-collected (I assume that by Arg0Reg you
> mean the VarRegion for pos2).
>
> Like, if pos2 is never used later in the program, then we don't need
> to remember its value. So if move_to_pos(pos2) is the last use of
> pos2, we'll drop the binding and the Store will have no binding for pos2.
>
> Accessing dead regions will produce unexpected results; the code for 
> getSVal()/getBinding() assumes that the region is live (or that the 
> expression is active when you try to retrieve a value of an expression).
>
> In this case it's like "hmm, the user is reading from a local
> variable, but it has no bindings, which means it has never been
> written to (otherwise I would have remembered it), which means that
> it's undefined behavior and the contents of the variable are undefined".
> The real reason the Store doesn't remember any bindings is that it
> has correctly forgotten them.
>
> This is why we include the (lazy) copy of the old Store with the
> LazyCompoundVal. In order to extract data from
> lazyCompoundVal{0x4182e58,pos2}, you need to load the variable from
> the Store 0x4182e58, *not* from the current Store. The region is still
> live in that old Store, but in the current Store it's no longer there.
>
>
> This is why in order to obtain


Whoops, didn't finish my email :) I mean, the thing with 
`getDefaultBinding(LCV)` is that it knows that it needs to look up the 
value in the correct Store, so that's what you should use.
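A toy C++ model may help show why reading through the LazyCompoundVal's own Store works while reading the current Store does not. Everything here (`Bindings`, `Store`, `LazyCompound`, `getBinding`, `removeDeadBinding`, `readLazy`) is a hypothetical stand-in, not the Clang API. It only assumes what is described above: stores behave like immutable snapshots, and dead bindings are dropped from the newer store while older snapshots keep them.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical toy model, NOT the Clang API: a Store is an immutable map from
// region names to printed symbolic values, shared by pointer so that old
// snapshots stay valid after newer stores are created.
using Bindings = std::map<std::string, std::string>;
using Store = std::shared_ptr<const Bindings>;

// Stand-in for nonloc::LazyCompoundVal: the Store as it was at the time of
// the copy, plus the region whose contents the value stands for.
struct LazyCompound {
  Store snapshot;
  std::string region;
};

// Look up a region in one particular Store; std::nullopt means "no binding",
// which the analyzer would end up reporting as an undefined value.
std::optional<std::string> getBinding(const Store &st, const std::string &region) {
  assert(st && "a store must exist");
  auto it = st->find(region);
  if (it == st->end())
    return std::nullopt;
  return it->second;
}

// Stand-in for dead-binding cleanup: produces a NEW Store without the dead
// region; the old Store, and every snapshot pointing to it, is untouched.
Store removeDeadBinding(const Store &st, const std::string &region) {
  auto next = std::make_shared<Bindings>(*st);
  next->erase(region);
  return next;
}

// Reads through the lazy value: consults the snapshot, never the current
// Store (the real getDefaultBinding(LCV) likewise uses the LCV's own Store).
std::optional<std::string> readLazy(const LazyCompound &lcv) {
  return getBinding(lcv.snapshot, lcv.region);
}
```

For example, after binding `pos2` in a store `s0`, taking `LazyCompound{s0, "pos2"}` and then creating `current = removeDeadBinding(s0, "pos2")`, a lookup of `pos2` in `current` finds nothing, while `readLazy` still finds the value in the snapshot.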


>
> On 6/26/19 3:22 PM, Torry Chen wrote:
>> I tried State->dump() and it shows there is no default binding for a 
>> struct variable that copies another. See below. I certainly can (and 
>> should) use the field symbols for my work. I was curious about the 
>> internals of the analyzer engine. Thank you for the detailed explanation!
>>
>> struct XY pos1 = next_pos(10, 20); // Binding: pos1 -> conj_$3{struct XY, LC1, S45538, #1}
>> move_to_pos(pos1);
>> // evalCall, State->dump():
>> //
>> // Store (direct and default bindings), 0x4176b88 :
>> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
>> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
>> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
>> //
>> // Expressions by stack frame:
>> // #0 Calling main
>> // (LC1, S45573) move_to_pos : &code{move_to_pos}
>> // (LC1, S45581) pos1 : lazyCompoundVal{0x4176b88,pos1}
>> //
>> // Ranges are empty.
>> //
>> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> *conj_$3{struct XY, LC1, S45538, #1}*
>>
>> struct XY pos2 = pos1; // Binding: pos2 -> lazyCompoundVal{0x4176b88,pos1}
>> move_to_pos(pos2);
>> // evalCall, State->dump():
>> //
>> // Store (direct and default bindings), 0x4176b88 :
>> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
>> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
>> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
>> //
>> // Expressions by stack frame:
>> // #0 Calling main
>> // (LC1, S45618) move_to_pos : &code{move_to_pos}
>> // (LC1, S45626) pos2 : lazyCompoundVal{0x4182e58,pos2}
>> //
>> // Ranges are empty.
>> //
>> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> *Undefined*
>>
>> On Wed, 26 Jun 2019 at 12:15, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>>
>>     Hmm, weird.
>>
>>     I suspect that assignment was handled with "small struct
>>     optimization", i.e. field-by-field rather than lazily (cf.
>>     RegionStoreManager::tryBindSmallStruct).
>>
>>     Could you do a State->dump() to verify that? If it shows that
>>     there's no default binding but instead there are two derived
>>     symbols bound to two different offsets, then the information
>>     about the "whole struct symbol" is already more or less lost: the
>>     static analyzer no longer remembers that this whole structure is
>>     the same as pos1, but it does remember that its fields,
>>     separately, are exactly the same as they were in pos1, which is
>>     what you see by looking at the fields separately.
>>
>>     Generally we don't have many checkers that track structures as a
>>     whole, and we don't really know what the checker API *should* look
>>     like in order to make such checkers easy to implement. The only
>>     such checker that we have is IteratorChecker, and it kinda tries
>>     to do something, but it's not very convenient. For C++ objects I'm
>>     thinking of tracking a "whole structure symbol" artificially, so
>>     that it wouldn't have anything to do with the actual contents of
>>     the structure but rather with its semantic meaning: it would be
>>     preserved by const operations (even if they mutate memory
>>     contents of mutable fields) or through copies/moves, and
>>     additionally you would be able to attach state traits to it
>>     without thinking about manually modeling copies/moves.
>>
>>     I guess in your case, which seems to be more like a C world, the
>>     ad-hoc solution would be to do something like
>>
>>         let's see...
>>         pos2.x comes from pos1...
>>         pos2.y also comes from pos1...
>>         aha, got it!
>>         the whole pos2 comes from pos1!
>>
>>     You will *anyway* have to do this because the programmer is free
>>     to copy the structure field-by-field manually instead of just
>>     assigning the structure. This would also happen in C++ if the
>>     structure has a non-trivial constructor. For the same reason it's
>>     not enough to check only 'x' but skip 'y': the programmer can
>>     easily overwrite one field but not the other field.
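The ad-hoc check described above ("pos2.x comes from pos1, pos2.y also comes from pos1, so the whole pos2 comes from pos1") can be sketched with another toy model. `Sym`, `FieldBindings` and `wholeStructOrigin` are hypothetical names, not analyzer classes. In a real checker the per-field values would come from `getBinding()` on FieldRegions and the parent from `SymbolDerived::getParentSymbol()`, as in Torry's code further down this thread.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy symbol, NOT Clang's SymExpr: a conjured symbol has an empty
// `parent`; a derived symbol such as derived_$4{conj_$3,pos1->X} records its
// parent symbol and the field of the parent it stands for.
struct Sym {
  std::string name;
  std::string parent;  // empty for non-derived symbols
  std::string field;   // field of the parent this symbol was derived for
};

// Direct bindings for the fields of one struct variable: field name -> value.
using FieldBindings = std::map<std::string, Sym>;

// Returns the common parent symbol if EVERY field in `allFields` is bound to a
// symbol derived from the same parent for that same field; otherwise nullopt.
// Checking all fields matters: the program may have overwritten just one.
std::optional<std::string> wholeStructOrigin(const FieldBindings &fields,
                                             const std::vector<std::string> &allFields) {
  assert(!allFields.empty() && "struct must have at least one field");
  std::optional<std::string> origin;
  for (const std::string &f : allFields) {
    auto it = fields.find(f);
    if (it == fields.end())
      return std::nullopt;  // no direct binding for this field
    const Sym &s = it->second;
    if (s.parent.empty() || s.field != f)
      return std::nullopt;  // not derived, or derived for a different field
    if (origin && *origin != s.parent)
      return std::nullopt;  // fields come from different source structs
    origin = s.parent;
  }
  return origin;
}
```

The field-name comparison is a design choice in this sketch: a struct whose X and Y were swapped during a manual copy is not treated as a copy of the original.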
>>
>>     Finally, I'm surprised that it returns an UndefinedVal (i.e., in
>>     particular, it allows you to unwrap the Optional) instead of
>>     None. This sounds like a bug. But it might be because the
>>     structure does indeed have an undefined default binding (e.g.,
>>     this happens when it's allocated by malloc() or operator new).
>>     That would make sense, because assigning every field wouldn't
>>     overwrite the default binding. This, in turn, should remind you
>>     that relying on the "structure symbol" to figure out the
>>     contents of the structure is not a good idea unless your
>>     structure is immutable and completely opaque or you somehow know
>>     that it's freshly created. Direct bindings to fields, however,
>>     are always trustworthy. That's how our memory model works.
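The interplay of default and direct bindings described above can be modelled as a lookup with a fallback. This is a simplified, hypothetical sketch (`StructBindings`, `readField` and `writeField` are invented names, not RegionStore code). In particular, the real store would turn a symbolic default binding into a derived symbol for the field instead of returning it unchanged.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy model of one struct region's bindings, NOT RegionStore:
// an optional default binding that covers the whole struct, plus direct
// bindings for individual fields.
struct StructBindings {
  std::optional<std::string> defaultBinding;  // e.g. "Undefined" after malloc()
  std::map<std::string, std::string> direct;  // field name -> value
};

// Writing a field adds or replaces a direct binding; the default binding is
// left alone, so assigning every field never erases it.
void writeField(StructBindings &b, const std::string &field, const std::string &value) {
  assert(!field.empty() && "field name required");
  b.direct[field] = value;
}

// Reading a field: a direct binding always wins; otherwise fall back to the
// default binding (simplified here: returned as-is).
std::optional<std::string> readField(const StructBindings &b, const std::string &field) {
  auto it = b.direct.find(field);
  if (it != b.direct.end())
    return it->second;
  return b.defaultBinding;
}
```

After writing both fields of a malloc()-style struct, the field reads are exact while the struct-level default binding still says "Undefined", which is the situation where the "structure symbol" is misleading.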
>>
>>
>>     On 6/25/19 9:10 PM, Torry Chen wrote:
>>>     Thank you Artem! It seems StoreManager::getDefaultBinding()
>>>     won't work if the struct variable is copied. As shown below,
>>>     getDefaultBinding() returns an undefined SVal.
>>>
>>>     I could go down into fields to get the derived symbols for X and
>>>     Y respectively, and then use getParentSymbol() to get the symbol
>>>     for the whole struct. This looks cumbersome though. Is there a
>>>     more convenient way to get the symbol for the whole struct in
>>>     this case?
>>>
>>>     // checkBind: pos1 -> conj_$3{struct XY, LC1, S45418, #1}
>>>     struct XY pos1 = next_pos(10, 20);
>>>
>>>     // checkBind: pos2 -> lazyCompoundVal{0x5d4bb38,pos1}
>>>     struct XY pos2 = pos1;
>>>
>>>     move_to_pos(pos2);
>>>
>>>     /** evalCall for move_to_pos():
>>>       SVal Pos = C.getSVal(CE->getArg(0));
>>>       ProgramStateRef State = C.getState();
>>>       StoreManager &StoreMgr =
>>>     State->getStateManager().getStoreManager();
>>>       auto LCV = Pos.getAs<nonloc::LazyCompoundVal>();
>>>       SVal LCSVal = *StoreMgr.getDefaultBinding(*LCV);
>>>       LCSVal.dump(); // <- Undefined
>>>       ...
>>>       const Store St = LCV->getCVData()->getStore();
>>>       const SVal FieldSVal = StoreMgr.getBinding(St,
>>>     loc::MemRegionVal(FieldReg));
>>>       FieldSVal.dump(); // <- derived_$4{conj_$3{struct XY, LC1, S45418, #1},pos1->X}
>>>
>>>       // The field value may not be a symbol at all, so guard the cast.
>>>       if (const auto *SD = dyn_cast_or_null<SymbolDerived>(FieldSVal.getAsSymbol())) {
>>>         SymbolRef ParentSym = SD->getParentSymbol();
>>>         ParentSym->dump(); // <- conj_$3{struct XY, LC1, S45418, #1}
>>>       }
>>>     **/
>>>
>>>     On Tue, 25 Jun 2019 at 14:06, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>>>
>>>         The "0x4aa1c58" part of "lazyCompoundVal{0x4aa1c58,pos1}" is
>>>         a Store object. You can access it with getStore() and then
>>>         read it with the help of a StoreManager.
>>>
>>>         Hmm, we seem to already have a convenient API for that, you
>>>         can do
>>>         StoreManager::getDefaultBinding(nonloc::LazyCompoundVal)
>>>         directly if all you need is a default-bound conjured symbol.
>>>         But if you want to lookup, say, specific fields in the
>>>         structure (X and Y separately), you'll need to do
>>>         getBinding() on manually constructed FieldRegions (in your
>>>         case it doesn't look very useful because the whole structure
>>>         is conjured anyway).
>>>
>>>         I guess at this point you might like chapter 5 of my old
>>>         workbook
>>>         (https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf),
>>>         as for now it seems to be the only place where different
>>>         kinds of values are explained.
>>>
>>>
>>>         On 6/25/19 2:35 AM, Torry Chen via cfe-dev wrote:
>>>>         My project has a struct type as follows, and I'm writing a
>>>>         checker for some functions that take the struct value as an
>>>>         argument. In the checkPreCall function I see that the argument
>>>>         is a LazyCompoundVal, not a symbol as it would be for a
>>>>         primitive type. I tried a few ways to extract the symbol
>>>>         from the LazyCompoundVal with no luck. I hope to get some
>>>>         help here.
>>>>
>>>>         struct XY {
>>>>           uint64_t X;
>>>>           uint64_t Y;
>>>>         };
>>>>
>>>>         ...
>>>>         // checkBind: pos1 -> conj_$3{struct XY, LC1, S63346, #1}
>>>>         struct XY pos1 = next_pos(...);
>>>>
>>>>         // checkPreCall: Arg0: lazyCompoundVal{0x4aa1c58,pos1}
>>>>         move_to_pos(pos1);
>>>>
>>>>         _______________________________________________
>>>>         cfe-dev mailing list
>>>>         cfe-dev at lists.llvm.org  <mailto:cfe-dev at lists.llvm.org>
>>>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>
