[cfe-dev] [PATCH] expand a symbolic pointer
Ted Kremenek
kremenek at apple.com
Fri Dec 12 18:48:07 PST 2008
On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:
> Now we use a GDM to map a symbol to its concretized region. We no
> longer need the base struct pointer to be an lvalue. It only need to
> be a symbolic value.
Great! Many of my comments are still of design, and I'm not convinced
we need an AnonPointeeRegion. We also may not need to have a separate
GDM entry; just use the ConstraintManager to determine if a region is
concrete or not if its symbol is not null.
Comments inline!
> Known problem: we need a way to iterate over live symbols, otherwise
> the values of AnonPointeeRegions will be swept.
I don't think this is a problem if we just use SymbolicRegions instead
of AnonPointeeRegions. More comments below.
===================================================================
--- lib/Analysis/GRExprEngine.cpp (revision 60870)
+++ lib/Analysis/GRExprEngine.cpp (working copy)
@@ -935,6 +935,24 @@
// FIXME: Should we insert some assumption logic in here to
determine
// if "Base" is a valid piece of memory? Before we put this
assumption
// later when using FieldOffset lvals (which we no longer have).
+
+ // BaseV is the struct base location.
+ SVal BaseV = GetSVal(St, Base);
+
+ // Check if the base can be NonNull. FIXME: should also check for
null here,
+ // because once we concretized the symbolic pointer, it's no
longer viewed
+ // as null.
+ bool isFeasibleNotNull = false;
+ St = Assume(St, BaseV, true, isFeasibleNotNull);
This isn't correct. This overrides the only state reference with the
assumption that BaseV is not null. What if it is? We need two cases
here: one where BaseV is assumed to be null and another where it
isn't. We need to generate two states. Almost anytime in
GRExprEngine where we make assumptions we need to have a case where
the assumption is false and the other where the assumption is true.
For example, this code won't handle the following correctly:
int f(struct s x) {
return x == 0 ? x->f1 : x->f2;
}
On the true branch what 'St' to we generate? I think it will be null.
- case loc::SymbolValKind:
- BaseR = MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)-
>getSymbol());
+ case loc::SymbolValKind: {
+ GRStateRef state(St, StateMgr);
+ loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
+
+ // Check if this symbol has been concretized.
+ ConcretizedPointer::data_type*
+ T = state.get<ConcretizedPointer>(SV->getSymbol());
+
+ if (T)
+ BaseR = *T;
+ else
+ BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
break;
+ }
This now looks wrong to me. Why have two regions for the same
concept? We should just have a symbolic region, not return an
AnonPointeeRegion because that region has been concretized.
Here is how I see things:
1) loc::SymbolVal SV represents the "lvalue" of the symbolic region if
the symbol unless the symbol is equal to null.
2) If the symbol is not null, SymbolicRegion* R = getSymbolicRegion(SV-
>getSymbol()) represents the actual region.
It seems like AnonPointeeRegion is just duplicating SymbolicRegion.
Why have a separate concept? It also keep the above code simple.
Whether or not the symbol has a "concrete" region is a property of
that symbol, but we shouldn't return a separate region. It also takes
care of all the aliasing issues I mentioned in my previous email
because you always using the same "name" for the same concept.
===================================================================
--- lib/Analysis/RegionStore.cpp (revision 60870)
+++ lib/Analysis/RegionStore.cpp (working copy)
@@ -62,6 +62,16 @@
};
}
+// ConcretizedPointer GDM stuff.
+typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
+ ConcretizedPointer;
+static int ConcretizedPointerIndex = 0;
+namespace clang {
+ template<> struct GRStateTrait<ConcretizedPointer>
+ : public GRStatePartialTrait<ConcretizedPointer> {
+ static void* GDMIndex() { return &ConcretizedPointerIndex; }
+ };
+}
I don't think we need a map from SymbolRefs -> AnonPointeeRegion.
Probably just keeping track of whether or not the symbol referred to
by SymbolRef is enough to determine if the symbol is concrete.
Otherwise, an ImmutableSet of concrete symbols can represent the same
information.
+std::pair<const GRState*, Store>
+RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal
BasePtr,
+ QualType T) {
+
+ loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);
Since this method expects that BasePtr is a loc::SymbolVal then we
should just pass that in directly. Then then the compiler can type-
check our assumptions rather than making this a runtime check. It
also makes the interface much clearer.
+
+ Store store = St->getStore();
+
+ // Create an AnonPointeeRegion for the symbolic struct pointer.
+ AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV-
>getSymbol(), T);
+
+ // Symbolize the struct region.
+ store = BindStructToSymVal(store, APR);
+
+ // Map the symbolic pointer to its concretization region.
+ GRStateRef state(St, StateMgr);
+ const GRState* NewSt = state.set<ConcretizedPointer>(SV-
>getSymbol(), APR);
+
+ return std::make_pair(NewSt, store);
+}
Looking at this again I actually don't think this method
(ConcretizeStructBase) is needed, nor is the design as scalable as I
think we have discussed. Just bind symbolic values to the
FieldRegions lazily. Here's the reasoning:
1) If the symbol for SV->getSymbol() is not null we know it is
concrete. It's region is SymbolicRegion* R = getSymbolicRegion(SV-
>getSymbol());
2) A field for the FieldDecl 'D' for this symbolic region will be
represented as a FieldRegion 'FR' with the decl 'D' and the super
region R.
3) During value lookup for the region 'FR' we can...
a) If there is no value in our map, return a symbolic value. (we can
create a new symbol type to associate with this region)
b) If there is a value, return it.
c) Assignments of 'UnknownVal' to this field should either be
explicitly represented in the mapping or in a kill set.
By binding things lazily everything becomes a lot more efficient. The
approach in your patch also doesn't handle the case where a structure
is already concretized and you attempt to concretize it again:
x->f
....
x->f // ConcretizeStructBase will be called again!
We should actually aim to make all of RegionStore lazy in this way.
This would address the scalability concerns with modeling arrays and
structs as we only need to bind their values lazily as opposed to all
at once.
I also don't think there is a scenario where we need
AnonPointeeRegion. I think that is modeled just fine using
SymbolicRegion, and then just having different SymbolData types to
encapsulate the information about specific symbols.
+ // For live symbols, if it is concretized, put the associated
+ // AnonPointeeRegion into RegionRoots.
+ // FIXME: We need a way to iterate over the live symbols!
+// for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =
LSymbols.end();
+// I != E; ++I) {
+// GRStateRef StRef(state, StateMgr);
+// ConcretizedPointer::data_type* T =
StRef.get<ConcretizedPointer>(*I);
+// if (T)
+// RegionRoots.push_back(*T);
+// }
+
This isn't needed. For example:
x->f = 10
This will cause us to have a FieldRegion for 'f' whose super region is
the symbolic region for the symbol referred to by 'x'. That
FieldRegion will have a binding to 10, causing it to be in our
bindings. This means that RemoveDeadBindings will add the FieldRegion
in the backmap from super regions to subregions. RemoveDeadBindings
nicely handles SymbolicRegion, so if we don't use AnonPointeeRegion at
all the code in RemoveDeadBindings stays the same.
More information about the cfe-dev
mailing list