[cfe-dev] How to extract a symbol stored in LazyCompoundVal?

Torry Chen via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 26 15:22:32 PDT 2019


I tried State->dump() and it shows there is no default binding for a struct
variable that copies another. See below. I certainly can (and should) use
the field symbols for my work. I was curious about the internals of the
analyzer engine. Thank you for the detailed explanation!

struct XY pos1 = next_pos(10, 20); // Binding: pos1 -> conj_$3{struct XY, LC1, S45538, #1}
move_to_pos(pos1);
// evalCall, State->dump():
//
// Store (direct and default bindings), 0x4176b88 :
// (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
// (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
// (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
//
// Expressions by stack frame:
// #0 Calling main
// (LC1, S45573) move_to_pos : &code{move_to_pos}
// (LC1, S45581) pos1 : lazyCompoundVal{0x4176b88,pos1}
//
// Ranges are empty.
//
// StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> conj_$3{struct XY, LC1, S45538, #1}

struct XY pos2 = pos1; // Binding: pos2 -> lazyCompoundVal{0x4176b88,pos1}
move_to_pos(pos2);
// evalCall, State->dump():
//
// Store (direct and default bindings), 0x4176b88 :
// (GlobalInternalSpaceRegion,0,default) : conj_$1{int, LC1, S45538, #1}
// (GlobalSystemSpaceRegion,0,default) : conj_$2{int, LC1, S45538, #1}
// (pos1,0,default) : conj_$3{struct XY, LC1, S45538, #1}
//
// Expressions by stack frame:
// #0 Calling main
// (LC1, S45618) move_to_pos : &code{move_to_pos}
// (LC1, S45626) pos2 : lazyCompoundVal{0x4182e58,pos2}
//
// Ranges are empty.
//
// StoreMgr.getBinding(St, loc::MemRegionVal(Arg0Reg)) --> Undefined

On Wed, 26 Jun 2019 at 12:15, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> Hmm, weird.
>
> I suspect that assignment was handled with "small struct optimization",
> i.e. field-by-field rather than lazily (cf.
> RegionStoreManager::tryBindSmallStruct).
>
> Could you do a State->dump() to verify that? If it shows that there's no
> default binding but instead there are two derived symbols bound to two
> different offsets, then the information about the "whole struct symbol" is
> already more or less lost: the static analyzer no longer remembers that
> this whole structure is the same as pos1, but it does remember that its
> fields, separately, are exactly the same as they were in pos1, which is
> what you see by looking at the fields separately.
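To picture the difference, here is a toy model in C++. It is not clang's RegionStore; `ToyStore`, `lookup` and `copyFieldByField` are hypothetical names. It assumes, as suspected above, that a small-struct copy binds each field of the destination to a value derived from the source's default binding and leaves the destination without a default binding of its own.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT clang's RegionStore: maps (region, slot) to a symbolic
// value spelled as a string. Slot "default" stands for a default binding;
// any other slot is a field name with a direct binding.
using ToyStore = std::map<std::pair<std::string, std::string>, std::string>;

// Returns the value bound to (Region, Slot), or std::nullopt if unbound.
std::optional<std::string> lookup(const ToyStore &S, const std::string &Region,
                                  const std::string &Slot) {
  auto It = S.find({Region, Slot});
  if (It == S.end())
    return std::nullopt;
  return It->second;
}

// Hypothetical field-by-field ("small struct") copy: every field of Dst is
// bound to a value derived from Src's default binding, and Dst gets no
// default binding of its own.
void copyFieldByField(ToyStore &S, const std::string &Dst,
                      const std::string &Src,
                      const std::vector<std::string> &Fields) {
  std::string Parent = lookup(S, Src, "default").value_or("unknown");
  for (const std::string &F : Fields)
    S[{Dst, F}] = "derived{" + Parent + "," + Src + "->" + F + "}";
}
```

After `copyFieldByField(S, "pos2", "pos1", {"X", "Y"})`, looking up pos2's default binding finds nothing, while each field still names `conj_$3` as its parent. That mirrors the `derived_$4{conj_$3{...},pos1->X}` value seen in the evalCall trace in this thread.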
>
> Generally we don't have many checkers that track structures as a whole, and
> we don't really know what the checker API *should* look like in order to
> make such checkers easy to implement. The only such checker that we have is
> IteratorChecker, and it kinda tries to do something, but it's not very
> convenient. For C++ objects I'm thinking of tracking a "whole structure
> symbol" artificially, so that it wouldn't have anything to do with the actual
> contents of the structure but more with its semantic meaning: it would be
> preserved by const operations (even if they mutate memory contents of
> mutable fields) or through copies/moves and additionally you would be able
> to attach state traits to it without thinking about manually modeling
> copies/moves.
>
> I guess in your case, which seems to be more like a C world, the ad-hoc
> solution would be to do something like
>
>     let's see...
>     pos2.x comes from pos1...
>     pos2.y also comes from pos1...
>     aha, got it!
>     the whole pos2 comes from pos1!
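The "aha, got it!" step can be sketched as a check that every field value is derived from one and the same parent symbol. This is a toy version: `ToyDerivedSym` and `commonParent` are hypothetical stand-ins, and in a real checker the parent would presumably come from `SymbolDerived::getParentSymbol()`, as in the evalCall snippet quoted in this thread.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a derived symbol such as
// derived_$4{conj_$3{struct XY, ...},pos1->X}: the value of one field,
// derived from a parent symbol that described the whole source struct.
struct ToyDerivedSym {
  std::string Parent;      // e.g. "conj_$3"
  std::string SourceField; // e.g. "pos1->X"
};

// Returns the shared parent if every field value is derived from the same
// parent symbol; otherwise std::nullopt. A field whose value is not a
// derived symbol at all (e.g. overwritten with a constant) is modelled as
// std::nullopt in the input.
std::optional<std::string>
commonParent(const std::vector<std::optional<ToyDerivedSym>> &Fields) {
  std::optional<std::string> Result;
  for (const auto &F : Fields) {
    if (!F)
      return std::nullopt; // this field did not come from a copy
    if (Result && *Result != F->Parent)
      return std::nullopt; // fields came from different structs
    Result = F->Parent;
  }
  return Result; // std::nullopt for a struct with no fields
}
```

A stricter variant could also compare `SourceField`, so that a copy with X and Y swapped is not mistaken for a plain copy.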
>
> You will *anyway* have to do this because the programmer is free to copy
> the structure field-by-field manually instead of just assigning the
> structure. This would also happen in C++ if the structure has a non-trivial
> constructor. For the same reason it's not enough to check only 'x' but skip
> 'y': the programmer can easily overwrite one field but not the other field.
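For concreteness, the copy patterns mentioned above look like this in the program under analysis. It is written as C++ here, and `copyByAssignment`, `copyManually` and `copyThenOverwriteY` are illustrative names, not taken from the original code. A checker asking "does pos2 come from pos1?" should accept the first two and reject the third.

```cpp
#include <cassert>
#include <cstdint>

struct XY {
  std::uint64_t X;
  std::uint64_t Y;
};

// Whole-struct copy: the case the thread started with.
XY copyByAssignment(const XY &Pos1) {
  XY Pos2 = Pos1;
  return Pos2;
}

// Manual field-by-field copy: semantically the same as the above.
XY copyManually(const XY &Pos1) {
  XY Pos2;
  Pos2.X = Pos1.X;
  Pos2.Y = Pos1.Y;
  return Pos2;
}

// Copy, then overwrite one field: only X still comes from pos1, so
// checking X alone would wrongly conclude that pos2 is a copy of pos1.
XY copyThenOverwriteY(const XY &Pos1, std::uint64_t NewY) {
  XY Pos2 = Pos1;
  Pos2.Y = NewY;
  return Pos2;
}
```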
>
> Finally, I'm surprised that it returns an UndefinedVal (i.e., in
> particular, it allows you to unwrap the Optional) instead of None. This
> sounds like a bug. But it might be because the structure does indeed have
> an undefined default binding (eg., this happens when it's allocated by
> malloc() or operator new). It'd make sense because assigning every field
> wouldn't overwrite the default binding. Which, in turn, should remind you
> that relying on the "structure symbol" in order to figure out what the
> contents of the structure are is not a good idea unless your structure is
> immutable and completely opaque or you somehow know that it's freshly
> created. But direct bindings to fields are actually always trustworthy.
> That's how our memory model works.
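A toy model of the lookup order described above, assuming a single region with an optional default binding plus direct per-field bindings. `ToyRegion` and `readField` are hypothetical names, and this is not RegionStore's actual code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy model, NOT clang's RegionStore: one memory region with an optional
// default binding (covering the whole region) and direct field bindings.
struct ToyRegion {
  std::optional<std::string> Default;
  std::map<std::string, std::string> Direct; // field name -> value
};

// Assumed lookup order: a direct binding to the field is used if present;
// otherwise the region's default binding (if any) applies.
std::optional<std::string> readField(const ToyRegion &R,
                                     const std::string &Field) {
  auto It = R.Direct.find(Field);
  if (It != R.Direct.end())
    return It->second;
  return R.Default;
}
```

If the region starts with `Default = "Undefined"` (as for malloc()'d memory) and then both X and Y are written, `Default` still says "Undefined", so asking for the default binding alone would mislead, while `readField` returns the values actually stored.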
>
>
> On 6/25/19 9:10 PM, Torry Chen wrote:
>
> Thank you Artem! It seems StoreManager::getDefaultBinding() won't work if
> the struct variable is copied. As shown below, getDefaultBinding() returns
> an undefined SVal.
>
> I could go down into fields to get the derived symbols for X and Y
> respectively, and then use getParentSymbol() to get the symbol for the
> whole struct. This looks cumbersome though. Is there a more convenient way
> to get the symbol for the whole struct in this case?
>
> // checkBind: pos1 -> conj_$3{struct XY, LC1, S45418, #1}
> struct XY pos1 = next_pos(10, 20);
>
> // checkBind: pos2 -> lazyCompoundVal{0x5d4bb38,pos1}
> struct XY pos2 = pos1;
>
> move_to_pos(pos2);
>
> /** evalCall for move_to_pos():
>   SVal Pos = C.getSVal(CE->getArg(0));
>   ProgramStateRef State = C.getState();
>   StoreManager &StoreMgr = State->getStateManager().getStoreManager();
>   auto LCV = Pos.getAs<nonloc::LazyCompoundVal>();
>   SVal LCSVal = *StoreMgr.getDefaultBinding(*LCV);
>   LCSVal.dump(); // <- Undefined
>   ...
>   const Store St = LCV->getCVData()->getStore();
>   const SVal FieldSVal = StoreMgr.getBinding(St, loc::MemRegionVal(FieldReg));
>   FieldSVal.dump(); // <- derived_$4{conj_$3{struct XY, LC1, S45418, #1},pos1->X}
>
>   // getAsSymbol() may return null, hence dyn_cast_or_null.
>   if (const auto *SD = dyn_cast_or_null<SymbolDerived>(FieldSVal.getAsSymbol())) {
>     SymbolRef ParentSym = SD->getParentSymbol();
>     ParentSym->dump(); // <- conj_$3{struct XY, LC1, S45418, #1}
>   }
> **/
>
> On Tue, 25 Jun 2019 at 14:06, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> The "0x4aa1c58" part of "lazyCompoundVal{0x4aa1c58,pos1}" is a Store
>> object. You can access it with getStore() and then read it with the help of
>> a StoreManager.
>>
>> Hmm, we seem to already have a convenient API for that, you can do
>> StoreManager::getDefaultBinding(nonloc::LazyCompoundVal) directly if all
>> you need is a default-bound conjured symbol. But if you want to lookup,
>> say, specific fields in the structure (X and Y separately), you'll need to
>> do getBinding() on manually constructed FieldRegions (in your case it
>> doesn't look very useful because the whole structure is conjured anyway).
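The idea that a lazyCompoundVal is just "a store snapshot plus a region" can be modelled as below. This is a sketch with hypothetical names (`ToyStore`, `ToyLazyCompoundVal`, `getBinding`, `getDefaultBinding`); it only imitates the shape of the real `nonloc::LazyCompoundVal` and `StoreManager::getDefaultBinding()` pair, not their signatures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Toy model, NOT clang's classes: a store snapshot maps (region, slot) to a
// value; slot "default" stands for the region's default binding.
using ToyStore = std::map<std::pair<std::string, std::string>, std::string>;

// Like the "{0x4aa1c58,pos1}" in the dump: a pointer to an immutable store
// snapshot plus the region to read from that snapshot.
struct ToyLazyCompoundVal {
  std::shared_ptr<const ToyStore> Store;
  std::string Region;
};

// Reads (Region, Slot) from the snapshot captured by the lazy value.
std::optional<std::string> getBinding(const ToyLazyCompoundVal &LCV,
                                      const std::string &Slot) {
  auto It = LCV.Store->find({LCV.Region, Slot});
  if (It == LCV.Store->end())
    return std::nullopt;
  return It->second;
}

// Analogous in spirit to StoreManager::getDefaultBinding(LazyCompoundVal).
std::optional<std::string> getDefaultBinding(const ToyLazyCompoundVal &LCV) {
  return getBinding(LCV, "default");
}
```

Asking the same snapshot about a region that has no default binding of its own (for instance, a struct filled in field by field) yields nothing, which is the behavior reported for pos2 in this thread.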
>>
>> I guess at this point you might like chapter 5 of my old workbook (
>> https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf),
>> as for now it seems to be the only place where different kinds of values
>> are explained.
>>
>>
>> On 6/25/19 2:35 AM, Torry Chen via cfe-dev wrote:
>>
>> My project has a struct type as follows and I'm writing a checker for
>> some functions that take the struct value as an argument. In the
>> checkPreCall function I see the argument is a LazyCompoundVal, not a
>> symbol as it would be for a primitive type. I tried a few ways to extract
>> the symbol from the LazyCompoundVal with no luck. Hope to get some help
>> here.
>>
>> struct XY {
>>   uint64_t X;
>>   uint64_t Y;
>> };
>>
>> ...
>> // checkBind: pos1 -> conj_$3{struct XY, LC1, S63346, #1}
>> struct XY pos1 = next_pos(...);
>>
>> // checkPreCall: Arg0: lazyCompoundVal{0x4aa1c58,pos1}
>> move_to_pos(pos1);
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>