[cfe-dev] [PATCH] expand a symbolic pointer
Ted Kremenek
kremenek at apple.com
Sat Dec 13 13:53:11 PST 2008
For me things got clearer as we dove into the design. Modeling the
heap symbolically is a tricky concept, but I think we're making good
progress towards a clean formulation and implementation.
On Dec 12, 2008, at 7:42 PM, Zhongxing Xu wrote:
> Hi Ted,
>
> Your comments are inspiring. I feel that there are some more
> fundamental problems that should be solved before this one. Or, this
> one is a special case of how we treat assumptions on symbolic values
> (including pointer and non-pointers) and lazy binding of locations.
>
> On Sat, Dec 13, 2008 at 10:48 AM, Ted Kremenek <kremenek at apple.com>
> wrote:
>
> On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:
>
> Now we use a GDM to map a symbol to its concretized region. We no
> longer need the base struct pointer to be an lvalue. It only need to
> be a symbolic value.
>
> Great! Many of my comments are still of design, and I'm not
> convinced we need an AnonPointeeRegion. We also may not need to
> have a separate GDM entry; just use the ConstraintManager to
> determine if a region is concrete or not if its symbol is not null.
>
> Comments inline!
>
>
> Known problem: we need a way to iterate over live symbols, otherwise
> the values of AnonPointeeRegions will be swept.
>
> I don't think this is a problem if we just use SymbolicRegions
> instead of AnonPointeeRegions. More comments below.
>
> ===================================================================
> --- lib/Analysis/GRExprEngine.cpp (revision 60870)
>
> +++ lib/Analysis/GRExprEngine.cpp (working copy)
> @@ -935,6 +935,24 @@
>
> // FIXME: Should we insert some assumption logic in here to
> determine
> // if "Base" is a valid piece of memory? Before we put this
> assumption
> // later when using FieldOffset lvals (which we no longer have).
> +
> + // BaseV is the struct base location.
>
> + SVal BaseV = GetSVal(St, Base);
> +
> + // Check if the base can be NonNull. FIXME: should also check
> for null here,
> + // because once we concretized the symbolic pointer, it's no
> longer viewed
> + // as null.
>
> + bool isFeasibleNotNull = false;
> + St = Assume(St, BaseV, true, isFeasibleNotNull);
>
> This isn't correct. This overrides the only state reference with
> the assumption that BaseV is not null. What if it is? We need two
> cases here: one where BaseV is assumed to be null and another where
> it isn't. We need to generate two states. Almost anytime in
> GRExprEngine where we make assumptions we need to have a case where
> the assumption is false and the other where the assumption is true.
>
> For example, this code won't handle the following correctly:
>
> int f(struct s x) {
> return x == 0 ? x->f1 : x->f2;
> }
>
> On the true branch what 'St' to we generate? I think it will be null.
>
> - case loc::SymbolValKind:
> - BaseR = MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)-
> >getSymbol());
> + case loc::SymbolValKind: {
> + GRStateRef state(St, StateMgr);
> + loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
> +
> + // Check if this symbol has been concretized.
> + ConcretizedPointer::data_type*
> + T = state.get<ConcretizedPointer>(SV->getSymbol());
> +
> + if (T)
> + BaseR = *T;
> + else
> + BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
> break;
> + }
>
> This now looks wrong to me. Why have two regions for the same
> concept? We should just have a symbolic region, not return an
> AnonPointeeRegion because that region has been concretized.
>
> Here is how I see things:
>
> 1) loc::SymbolVal SV represents the "lvalue" of the symbolic region
> if the symbol unless the symbol is equal to null.
>
> 2) If the symbol is not null, SymbolicRegion* R =
> getSymbolicRegion(SV->getSymbol()) represents the actual region.
>
> It seems like AnonPointeeRegion is just duplicating SymbolicRegion.
> Why have a separate concept? It also keep the above code simple.
> Whether or not the symbol has a "concrete" region is a property of
> that symbol, but we shouldn't return a separate region. It also
> takes care of all the aliasing issues I mentioned in my previous
> email because you always using the same "name" for the same concept.
>
> ===================================================================
> --- lib/Analysis/RegionStore.cpp (revision 60870)
>
> +++ lib/Analysis/RegionStore.cpp (working copy)
> @@ -62,6 +62,16 @@
> };
> }
>
> +// ConcretizedPointer GDM stuff.
> +typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
> + ConcretizedPointer;
> +static int ConcretizedPointerIndex = 0;
> +namespace clang {
> + template<> struct GRStateTrait<ConcretizedPointer>
> + : public GRStatePartialTrait<ConcretizedPointer> {
> + static void* GDMIndex() { return &ConcretizedPointerIndex; }
> + };
> +}
>
> I don't think we need a map from SymbolRefs -> AnonPointeeRegion.
> Probably just keeping track of whether or not the symbol referred to
> by SymbolRef is enough to determine if the symbol is concrete.
> Otherwise, an ImmutableSet of concrete symbols can represent the
> same information.
>
>
> +std::pair<const GRState*, Store>
> +RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal
> BasePtr,
> + QualType T) {
> +
> + loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);
>
>
> Since this method expects that BasePtr is a loc::SymbolVal then we
> should just pass that in directly. Then then the compiler can type-
> check our assumptions rather than making this a runtime check. It
> also makes the interface much clearer.
>
> +
> + Store store = St->getStore();
> +
> + // Create an AnonPointeeRegion for the symbolic struct pointer.
> + AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV-
> >getSymbol(), T);
>
> +
> + // Symbolize the struct region.
> + store = BindStructToSymVal(store, APR);
> +
> + // Map the symbolic pointer to its concretization region.
> + GRStateRef state(St, StateMgr);
> + const GRState* NewSt = state.set<ConcretizedPointer>(SV-
> >getSymbol(), APR);
> +
> + return std::make_pair(NewSt, store);
> +}
>
> Looking at this again I actually don't think this method
> (ConcretizeStructBase) is needed, nor is the design as scalable as I
> think we have discussed. Just bind symbolic values to the
> FieldRegions lazily. Here's the reasoning:
>
> 1) If the symbol for SV->getSymbol() is not null we know it is
> concrete. It's region is SymbolicRegion* R = getSymbolicRegion(SV-
> >getSymbol());
>
> 2) A field for the FieldDecl 'D' for this symbolic region will be
> represented as a FieldRegion 'FR' with the decl 'D' and the super
> region R.
>
> 3) During value lookup for the region 'FR' we can...
>
> a) If there is no value in our map, return a symbolic value. (we can
> create a new symbol type to associate with this region)
> b) If there is a value, return it.
> c) Assignments of 'UnknownVal' to this field should either be
> explicitly represented in the mapping or in a kill set.
>
> By binding things lazily everything becomes a lot more efficient.
> The approach in your patch also doesn't handle the case where a
> structure is already concretized and you attempt to concretize it
> again:
>
> x->f
> ....
> x->f // ConcretizeStructBase will be called again!
>
> We should actually aim to make all of RegionStore lazy in this way.
> This would address the scalability concerns with modeling arrays and
> structs as we only need to bind their values lazily as opposed to
> all at once.
>
> I also don't think there is a scenario where we need
> AnonPointeeRegion. I think that is modeled just fine using
> SymbolicRegion, and then just having different SymbolData types to
> encapsulate the information about specific symbols.
>
> + // For live symbols, if it is concretized, put the associated
> + // AnonPointeeRegion into RegionRoots.
> + // FIXME: We need a way to iterate over the live symbols!
> +// for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =
> LSymbols.end();
> +// I != E; ++I) {
> +// GRStateRef StRef(state, StateMgr);
> +// ConcretizedPointer::data_type* T =
> StRef.get<ConcretizedPointer>(*I);
> +// if (T)
> +// RegionRoots.push_back(*T);
> +// }
> +
>
> This isn't needed. For example:
>
> x->f = 10
>
> This will cause us to have a FieldRegion for 'f' whose super region
> is the symbolic region for the symbol referred to by 'x'. That
> FieldRegion will have a binding to 10, causing it to be in our
> bindings. This means that RemoveDeadBindings will add the
> FieldRegion in the backmap from super regions to subregions.
> RemoveDeadBindings nicely handles SymbolicRegion, so if we don't use
> AnonPointeeRegion at all the code in RemoveDeadBindings stays the
> same.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081213/f83848a5/attachment.html>
More information about the cfe-dev
mailing list