[cfe-dev] How to extract a symbol stored in LazyCompoundVal?

Torry Chen via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 26 18:21:41 PDT 2019


Sorry, my mistake: I was trying both methods but pasted the wrong line in
the previous email. The following is what I did with getDefaultBinding; I am
still getting an Undefined SVal when calling with pos2. Both pos1 and pos2 are
used after the calls, so they shouldn't be dead.

// evalCall for move_to_pos(struct XY pos):
ProgramStateRef State = C.getState();
const SVal Pos = C.getSVal(CE->getArg(0));
StoreManager &StoreMgr = State->getStateManager().getStoreManager();
auto LCV = Pos.getAs<nonloc::LazyCompoundVal>();
auto LCVal = StoreMgr.getDefaultBinding(*LCV);
LCVal->dump(); // --> conj_$3 when calling with pos1 but Undefined for pos2

// Code to check:
struct XY pos1 = next_pos(10, 20);
move_to_pos(pos1);
struct XY pos2 = pos1;
move_to_pos(pos2);
printf("X %ld Y %ld\n", pos1.X, pos1.Y);
printf("X %ld Y %ld\n", pos2.X, pos2.Y);

On Wed, 26 Jun 2019 at 17:31, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> On 6/26/19 5:29 PM, Artem Dergachev wrote:
>
> You shouldn't do the "StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg))
> --> Undefined" part; it's not what i suggested and it doesn't work because
> Arg0Reg is already "dead" and its value has been garbage-collected (i
> assume that by Arg0Reg you mean the VarRegion for pos2).
>
> Like, if pos2 is never used later in the program, then we don't need to
> remember its value. So if move_to_pos(pos2) is the last use of pos2, we'll
> drop the binding and the Store would be empty.
>
> Accessing dead regions will produce unexpected results; the code for
> getSVal()/getBinding() assumes that the region is live (or that the
> expression is active when you try to retrieve a value of an expression).
>
> In this case it's like "hmm, the user is reading from a local variable,
> but it has no bindings, which means it has never been written to (otherwise
> i would have remembered it), which means that it's undefined behavior and
> contents of the variable are undefined". The real reason why the Store doesn't
> remember any bindings is that it has correctly forgotten them.
>
> This is why we include the (lazy) copy of the old Store with the
> LazyCompoundVal. In order to extract data from
> lazyCompoundVal{0x4182e58,pos2}, you need to load the variable from the
> Store 0x4182e58, *not* from the current Store. The region is still live in
> that old store, but in the current store it's no longer there.
>
>
> This is why in order to obtain
>
>
>
> Whoops, didn't finish my email :) I mean, the thing with
> `getDefaultBinding(LCV)` is that it knows that it needs to look up the
> value in the correct Store, so that's what you should use.
>
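As a rough illustration of the explanation above, here is a toy C++ model
(standard library only) of why a read through the current Store yields
Undefined while a read through the Store saved in the lazy value does not.
Every name in it (ToyStore, ToyLazyCompoundVal, readRegion, removeDead,
readThroughSnapshot) is hypothetical and is not part of the Clang API; the
real RegionStore is considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model only: hypothetical stand-ins, not Clang classes.
using ToyStore = std::map<std::string, std::string>; // region -> bound value
using ToyStoreRef = std::shared_ptr<const ToyStore>; // immutable snapshot

// Same shape as lazyCompoundVal{store, region}: a region plus the snapshot
// of the store in which that region is meant to be read.
struct ToyLazyCompoundVal {
  ToyStoreRef Snapshot;
  std::string Region;
};

// A region with no binding is reported as "Undefined", which models the
// "never written to" conclusion described in the thread.
std::string readRegion(const ToyStore &S, const std::string &Region) {
  auto It = S.find(Region);
  return It == S.end() ? "Undefined" : It->second;
}

// Models garbage collection of a dead region. The old snapshot is left
// untouched; a new store without the binding is returned.
ToyStoreRef removeDead(const ToyStoreRef &S, const std::string &DeadRegion) {
  auto Copy = std::make_shared<ToyStore>(*S);
  Copy->erase(DeadRegion);
  return Copy;
}

// Models a lookup that goes through the store saved in the lazy value
// rather than through the current store.
std::string readThroughSnapshot(const ToyLazyCompoundVal &LCV) {
  assert(LCV.Snapshot && "lazy value must carry a store snapshot");
  return readRegion(*LCV.Snapshot, LCV.Region);
}
```

In this sketch, once removeDead() has dropped "pos2" from the current store,
readRegion() on that store reports Undefined, while readThroughSnapshot()
on a lazy value created beforehand still sees the old binding.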
>
>
> On 6/26/19 3:22 PM, Torry Chen wrote:
>
> I tried State->dump() and it shows there is no default binding for a
> struct variable that copies another. See below. I certainly can (and
> should) use the field symbols for my work. I was curious about the
> internals of the analyzer engine. Thank you for the detailed explanation!
>
> // Binding: pos1 -> conj_$3{struct XY, LC1, S45538, #1}
> struct XY pos1 = next_pos(10, 20);
> move_to_pos(pos1);
> // evalCall, State->dump():
> //
> // Store (direct and default bindings), 0x4176b88 :
> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
> //
> // Expressions by stack frame:
> // #0 Calling main
> // (LC1, S45573) move_to_pos : &code{move_to_pos}
> // (LC1, S45581) pos1 : lazyCompoundVal{0x4176b88,pos1}
> //
> // Ranges are empty.
> //
> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg))
> //   --> *conj_$3{struct XY, LC1, S45538, #1}*
>
> struct XY pos2 = pos1; // Binding: pos2 -> lazyCompoundVal{0x4176b88,pos1}
> move_to_pos(pos2);
> // evalCall, State->dump():
> //
> // Store (direct and default bindings), 0x4176b88 :
> // (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
> // (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
> // (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
> //
> // Expressions by stack frame:
> // #0 Calling main
> // (LC1, S45618) move_to_pos : &code{move_to_pos}
> // (LC1, S45626) pos2 : lazyCompoundVal{0x4182e58,pos2}
> //
> // Ranges are empty.
> //
> // StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> *Undefined*
>
> On Wed, 26 Jun 2019 at 12:15, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> Hmm, weird.
>>
>> I suspect that assignment was handled with "small struct optimization",
>> i.e. field-by-field rather than lazily (cf.
>> RegionStoreManager::tryBindSmallStruct).
>>
>> Could you do a State->dump() to verify that? If it shows that there's no
>> default binding but instead there are two derived symbols bound to two
>> different offsets, then the information about the "whole struct symbol" is
>> already more or less lost: the static analyzer no longer remembers that
>> this whole structure is the same as pos1, but it does remember that its
>> fields, separately, are exactly the same as they were in pos1, which is
>> what you see by looking at the fields separately.
>>
>> Generally we don't have many checkers that track structures as a whole
>> and we don't really know what the checker API *should* look like in order to
>> make such checkers easy to implement. The only such checker that we have is
>> IteratorChecker and it kinda tries to do something but it's not very
>> convenient. For C++ objects i'm thinking of tracking a "whole structure
>> symbol" artificially, so that it didn't have anything to do with the actual
>> contents of the structure but more with its semantic meaning: it would be
>> preserved by const operations (even if they mutate memory contents of
>> mutable fields) or through copies/moves and additionally you would be able
>> to attach state traits to it without thinking about manually modeling
>> copies/moves.
>>
>> I guess in your case, which seems to be more like a C world, the ad-hoc
>> solution would be to do something like
>>
>>     let's see...
>>     pos2.x comes from pos1...
>>     pos2.y also comes from pos1...
>>     aha, got it!
>>     the whole pos2 comes from pos1!
>>
>> You will *anyway* have to do this because the programmer is free to copy
>> the structure field-by-field manually instead of just assigning the
>> structure. This would also happen in C++ if the structure has a non-trivial
>> constructor. For the same reason it's not enough to check only 'x' but skip
>> 'y': the programmer can easily overwrite one field but not the other field.
>>
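The field-by-field reasoning above can be sketched as a toy C++ check
(standard library only, C++17 for std::optional). ToyDerivedSym,
ToyFieldBindings and wholeStructOrigin are hypothetical names invented for
this sketch; a real checker would walk FieldRegions and SymbolDerived
objects instead of strings.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy model only: stand-in for derived_$N{parent, region->field}.
struct ToyDerivedSym {
  std::string Parent;      // symbol of the whole source struct, e.g. "conj_$3"
  std::string SourceField; // field of the source struct it was read from
};

// Direct bindings of one struct variable: field name -> derived symbol.
// An empty optional means the field holds something else (or nothing).
using ToyFieldBindings = std::map<std::string, std::optional<ToyDerivedSym>>;

// Returns the common parent symbol only if *every* field holds a symbol
// derived from the same parent and from the field of the same name.
std::optional<std::string>
wholeStructOrigin(const ToyFieldBindings &Fields,
                  const std::vector<std::string> &AllFieldNames) {
  assert(!AllFieldNames.empty() && "expected at least one field name");
  std::optional<std::string> Parent;
  for (const std::string &Name : AllFieldNames) {
    auto It = Fields.find(Name);
    if (It == Fields.end() || !It->second)
      return std::nullopt; // field missing or overwritten with something else
    const ToyDerivedSym &Sym = *It->second;
    if (Sym.SourceField != Name)
      return std::nullopt; // e.g. X was copied from the source's Y
    if (Parent && *Parent != Sym.Parent)
      return std::nullopt; // fields come from two different structs
    Parent = Sym.Parent;
  }
  return Parent;
}
```

Checking all fields rather than just the first one is the point: if the
programmer overwrites Y after the copy, the sketch refuses to treat the
struct as "the same as pos1" even though X still matches.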
>> Finally, i'm surprised that it returns an UndefinedVal (i.e., in
>> particular, it allows you to unwrap the Optional) instead of None. This
>> sounds like a bug. But it might be because the structure does indeed have
>> an undefined default binding (e.g., this happens when it's allocated by
>> malloc() or operator new). It'd make sense because assigning every field
>> wouldn't overwrite the default binding. Which, in turn, should remind you
>> that relying on the "structure symbol" in order to figure out what the
>> contents of the structure are is not a good idea unless your structure is
>> immutable and completely opaque or you somehow know that it's freshly
>> created. But direct bindings to fields are actually always trustworthy.
>> That's how our memory model works.
>>
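To make the default-versus-direct distinction concrete, here is a toy C++
model (standard library only). ToyRegionStore, readField and writeField are
hypothetical names made up for this sketch, and the string encoding of
values is an assumption; it only mimics the behaviour described above, not
the actual RegionStore data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model only: hypothetical types, not the RegionStore implementation.
struct ToyRegionStore {
  // Default binding per base region, e.g. "pos1" -> "conj_$3",
  // or "Undefined" for freshly allocated memory.
  std::map<std::string, std::string> Default;
  // Direct bindings per (base region, field name).
  std::map<std::pair<std::string, std::string>, std::string> Direct;

  // A field read prefers the direct binding and only then falls back to
  // a value derived from the default binding of the whole region.
  std::string readField(const std::string &Base,
                        const std::string &Field) const {
    auto D = Direct.find(std::make_pair(Base, Field));
    if (D != Direct.end())
      return D->second;
    auto Def = Default.find(Base);
    if (Def == Default.end() || Def->second == "Undefined")
      return "Undefined";
    return "derived{" + Def->second + "," + Base + "->" + Field + "}";
  }

  // Writing a field adds a direct binding; the default binding of the
  // base region is deliberately left as it was.
  void writeField(const std::string &Base, const std::string &Field,
                  const std::string &Value) {
    assert(!Value.empty() && "toy values are non-empty strings");
    Direct[std::make_pair(Base, Field)] = Value;
  }
};
```

In this model, a region whose default binding is Undefined keeps that
default even after every field has been written, so the default binding
says nothing about the current contents, while readField() always reflects
the latest write.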
>>
>> On 6/25/19 9:10 PM, Torry Chen wrote:
>>
>> Thank you Artem! It seems StoreManager::getDefaultBinding() won't work if
>> the struct variable is copied. As shown below, getDefaultBinding() returns
>> an undefined SVal.
>>
>> I could go down into fields to get the derived symbols for X and Y
>> respectively, and then use getParentSymbol() to get the symbol for the
>> whole struct. This looks cumbersome though. Is there a more convenient way
>> to get the symbol for the whole struct in this case?
>>
>> // checkBind: pos1 -> conj_$3{struct XY, LC1, S45418, #1}
>> struct XY pos1 = next_pos(10, 20);
>>
>> // checkBind: pos2 -> lazyCompoundVal{0x5d4bb38,pos1}
>> struct XY pos2 = pos1;
>>
>> move_to_pos(pos2);
>>
>> /** evalCall for move_to_pos():
>>   SVal Pos = C.getSVal(CE->getArg(0));
>>   ProgramStateRef State = C.getState();
>>   StoreManager &StoreMgr = State->getStateManager().getStoreManager();
>>   auto LCV = Pos.getAs<nonloc::LazyCompoundVal>();
>>   SVal LCSVal = *StoreMgr.getDefaultBinding(*LCV);
>>   LCSVal.dump(); // <- Undefined
>>   ...
>>   const Store St = LCV->getCVData()->getStore();
>>   const SVal FieldSVal =
>>       StoreMgr.getBinding(St, loc::MemRegionVal(FieldReg));
>>   FieldSVal.dump();
>>   // <- derived_$4{conj_$3{struct XY, LC1, S45418, #1},pos1->X}
>>
>>   const auto *SD = dyn_cast<SymbolDerived>(FieldSVal.getAsSymbol());
>>   const auto ParentSym = SD->getParentSymbol();
>>   ParentSym->dump(); // <- conj_$3{struct XY, LC1, S45418, #1}
>> **/
>>
>> On Tue, 25 Jun 2019 at 14:06, Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> The "0x4aa1c58" part of "lazyCompoundVal{0x4aa1c58,pos1}" is a Store
>>> object. You can access it with getStore() and then read it with the help of
>>> a StoreManager.
>>>
>>> Hmm, we seem to already have a convenient API for that, you can do
>>> StoreManager::getDefaultBinding(nonloc::LazyCompoundVal) directly if all
>>> you need is a default-bound conjured symbol. But if you want to look up,
>>> say, specific fields in the structure (X and Y separately), you'll need to
>>> do getBinding() on manually constructed FieldRegions (in your case it
>>> doesn't look very useful because the whole structure is conjured anyway).
>>>
>>> I guess at this point you might like chapter 5 of my old workbook (
>>> https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf),
>>> as for now it seems to be the only place where different kinds of values
>>> are explained.
>>>
>>>
>>> On 6/25/19 2:35 AM, Torry Chen via cfe-dev wrote:
>>>
>>> My project has a struct type as follows and I'm writing a checker for
>>> some functions that take the struct value as an argument. In the
>>> checkPreCall function I see the argument is a LazyCompoundVal, not a
>>> symbol as it would be for a primitive type. I tried a few ways to extract
>>> the symbol from the LazyCompoundVal with no luck. Hope to get some help
>>> here.
>>>
>>> struct XY {
>>>   uint64_t X;
>>>   uint64_t Y;
>>> };
>>>
>>> ...
>>> // checkBind: pos1 -> conj_$3{struct XY, LC1, S63346, #1}
>>> struct XY pos1 = next_pos(...);
>>>
>>> // checkPreCall: Arg0: lazyCompoundVal{0x4aa1c58,pos1}
>>> move_to_pos(pos1);
>>>
>>>
>>>
>>>
>>
>
>