[cfe-dev] [PATCH] expand a symbolic pointer
Zhongxing Xu
xuzhongxing at gmail.com
Fri Dec 12 19:42:09 PST 2008
Hi Ted,
Your comments are inspiring. I feel that there are some more fundamental
problems that should be solved before this one. Or, this one is a special
case of how we treat assumptions on symbolic values (including pointer and
non-pointers) and lazy binding of locations.
On Sat, Dec 13, 2008 at 10:48 AM, Ted Kremenek <kremenek at apple.com> wrote:
>
> On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:
>
> Now we use a GDM to map a symbol to its concretized region. We no longer
>> need the base struct pointer to be an lvalue. It only need to be a symbolic
>> value.
>>
>
> Great! Many of my comments are still of design, and I'm not convinced we
> need an AnonPointeeRegion. We also may not need to have a separate GDM
> entry; just use the ConstraintManager to determine if a region is concrete
> or not if its symbol is not null.
>
> Comments inline!
>
> Known problem: we need a way to iterate over live symbols, otherwise the
>> values of AnonPointeeRegions will be swept.
>>
>
> I don't think this is a problem if we just use SymbolicRegions instead of
> AnonPointeeRegions. More comments below.
>
> ===================================================================
> --- lib/Analysis/GRExprEngine.cpp (revision 60870)
> +++ lib/Analysis/GRExprEngine.cpp (working copy)
> @@ -935,6 +935,24 @@
> // FIXME: Should we insert some assumption logic in here to determine
> // if "Base" is a valid piece of memory? Before we put this assumption
> // later when using FieldOffset lvals (which we no longer have).
> +
> + // BaseV is the struct base location.
> + SVal BaseV = GetSVal(St, Base);
> +
> + // Check if the base can be NonNull. FIXME: should also check for null
> here,
> + // because once we concretized the symbolic pointer, it's no longer
> viewed
> + // as null.
> + bool isFeasibleNotNull = false;
> + St = Assume(St, BaseV, true, isFeasibleNotNull);
>
> This isn't correct. This overrides the only state reference with the
> assumption that BaseV is not null. What if it is? We need two cases here:
> one where BaseV is assumed to be null and another where it isn't. We need
> to generate two states. Almost anytime in GRExprEngine where we make
> assumptions we need to have a case where the assumption is false and the
> other where the assumption is true.
>
> For example, this code won't handle the following correctly:
>
> int f(struct s x) {
> return x == 0 ? x->f1 : x->f2;
> }
>
> On the true branch what 'St' to we generate? I think it will be null.
>
> - case loc::SymbolValKind:
> - BaseR =
> MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)->getSymbol());
> + case loc::SymbolValKind: {
> + GRStateRef state(St, StateMgr);
> + loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
> +
> + // Check if this symbol has been concretized.
> + ConcretizedPointer::data_type*
> + T = state.get<ConcretizedPointer>(SV->getSymbol());
> +
> + if (T)
> + BaseR = *T;
> + else
> + BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
> break;
> + }
>
> This now looks wrong to me. Why have two regions for the same concept? We
> should just have a symbolic region, not return an AnonPointeeRegion because
> that region has been concretized.
>
> Here is how I see things:
>
> 1) loc::SymbolVal SV represents the "lvalue" of the symbolic region if the
> symbol unless the symbol is equal to null.
>
> 2) If the symbol is not null, SymbolicRegion* R =
> getSymbolicRegion(SV->getSymbol()) represents the actual region.
>
> It seems like AnonPointeeRegion is just duplicating SymbolicRegion. Why
> have a separate concept? It also keep the above code simple. Whether or
> not the symbol has a "concrete" region is a property of that symbol, but we
> shouldn't return a separate region. It also takes care of all the aliasing
> issues I mentioned in my previous email because you always using the same
> "name" for the same concept.
>
> ===================================================================
> --- lib/Analysis/RegionStore.cpp (revision 60870)
> +++ lib/Analysis/RegionStore.cpp (working copy)
> @@ -62,6 +62,16 @@
> };
> }
>
> +// ConcretizedPointer GDM stuff.
> +typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
> + ConcretizedPointer;
> +static int ConcretizedPointerIndex = 0;
> +namespace clang {
> + template<> struct GRStateTrait<ConcretizedPointer>
> + : public GRStatePartialTrait<ConcretizedPointer> {
> + static void* GDMIndex() { return &ConcretizedPointerIndex; }
> + };
> +}
>
> I don't think we need a map from SymbolRefs -> AnonPointeeRegion. Probably
> just keeping track of whether or not the symbol referred to by SymbolRef is
> enough to determine if the symbol is concrete. Otherwise, an ImmutableSet
> of concrete symbols can represent the same information.
>
>
> +std::pair<const GRState*, Store>
> +RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal BasePtr,
> + QualType T) {
> +
> + loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);
>
>
> Since this method expects that BasePtr is a loc::SymbolVal then we should
> just pass that in directly. Then then the compiler can type-check our
> assumptions rather than making this a runtime check. It also makes the
> interface much clearer.
>
> +
> + Store store = St->getStore();
> +
> + // Create an AnonPointeeRegion for the symbolic struct pointer.
> + AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV->getSymbol(), T);
> +
> + // Symbolize the struct region.
> + store = BindStructToSymVal(store, APR);
> +
> + // Map the symbolic pointer to its concretization region.
> + GRStateRef state(St, StateMgr);
> + const GRState* NewSt = state.set<ConcretizedPointer>(SV->getSymbol(),
> APR);
> +
> + return std::make_pair(NewSt, store);
> +}
>
> Looking at this again I actually don't think this method
> (ConcretizeStructBase) is needed, nor is the design as scalable as I think
> we have discussed. Just bind symbolic values to the FieldRegions lazily.
> Here's the reasoning:
>
> 1) If the symbol for SV->getSymbol() is not null we know it is concrete.
> It's region is SymbolicRegion* R = getSymbolicRegion(SV->getSymbol());
>
> 2) A field for the FieldDecl 'D' for this symbolic region will be
> represented as a FieldRegion 'FR' with the decl 'D' and the super region R.
>
> 3) During value lookup for the region 'FR' we can...
>
> a) If there is no value in our map, return a symbolic value. (we can create
> a new symbol type to associate with this region)
> b) If there is a value, return it.
> c) Assignments of 'UnknownVal' to this field should either be explicitly
> represented in the mapping or in a kill set.
>
> By binding things lazily everything becomes a lot more efficient. The
> approach in your patch also doesn't handle the case where a structure is
> already concretized and you attempt to concretize it again:
>
> x->f
> ....
> x->f // ConcretizeStructBase will be called again!
>
> We should actually aim to make all of RegionStore lazy in this way. This
> would address the scalability concerns with modeling arrays and structs as
> we only need to bind their values lazily as opposed to all at once.
>
> I also don't think there is a scenario where we need AnonPointeeRegion. I
> think that is modeled just fine using SymbolicRegion, and then just having
> different SymbolData types to encapsulate the information about specific
> symbols.
>
> + // For live symbols, if it is concretized, put the associated
> + // AnonPointeeRegion into RegionRoots.
> + // FIXME: We need a way to iterate over the live symbols!
> +// for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =
> LSymbols.end();
> +// I != E; ++I) {
> +// GRStateRef StRef(state, StateMgr);
> +// ConcretizedPointer::data_type* T =
> StRef.get<ConcretizedPointer>(*I);
> +// if (T)
> +// RegionRoots.push_back(*T);
> +// }
> +
>
> This isn't needed. For example:
>
> x->f = 10
>
> This will cause us to have a FieldRegion for 'f' whose super region is the
> symbolic region for the symbol referred to by 'x'. That FieldRegion will
> have a binding to 10, causing it to be in our bindings. This means that
> RemoveDeadBindings will add the FieldRegion in the backmap from super
> regions to subregions. RemoveDeadBindings nicely handles SymbolicRegion, so
> if we don't use AnonPointeeRegion at all the code in RemoveDeadBindings
> stays the same.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081213/c5cca253/attachment.html>
More information about the cfe-dev
mailing list