[cfe-dev] [PATCH] expand a symbolic pointer

Fri Dec 12 18:48:07 PST 2008

On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:

> Now we use a GDM to map a symbol to its concretized region. We no  
> longer need the base struct pointer to be an lvalue. It only need to  
> be a symbolic value.

Great!  Many of my comments are still of design, and I'm not convinced  
we need an AnonPointeeRegion.  We also may not need to have a separate  
GDM entry; just use the ConstraintManager to determine if a region is  
concrete or not if its symbol is not null.

Comments inline!

> Known problem: we need a way to iterate over live symbols, otherwise  
> the values of AnonPointeeRegions will be swept.

I don't think this is a problem if we just use SymbolicRegions instead  
of AnonPointeeRegions.  More comments below.

===================================================================

--- lib/Analysis/GRExprEngine.cpp	(revision 60870)
+++ lib/Analysis/GRExprEngine.cpp	(working copy)
@@ -935,6 +935,24 @@
      // FIXME: Should we insert some assumption logic in here to  
determine
      // if "Base" is a valid piece of memory?  Before we put this  
assumption
      // later when using FieldOffset lvals (which we no longer have).
+
+    // BaseV is the struct base location.
+    SVal BaseV = GetSVal(St, Base);
+
+    // Check if the base can be NonNull. FIXME: should also check for  
null here,
+    // because once we concretized the symbolic pointer, it's no  
longer viewed
+    // as null.
+    bool isFeasibleNotNull = false;
+    St = Assume(St, BaseV, true, isFeasibleNotNull);

This isn't correct.  This overrides the only state reference with the  
assumption that BaseV is not null.  What if it is?  We need two cases  
here: one where BaseV is assumed to be null and another where it  
isn't.  We need to generate two states.  Almost anytime in  
GRExprEngine where we make assumptions we need to have a case where  
the assumption is false and the other where the assumption is true.

For example, this code won't handle the following correctly:

int f(struct s x) {
   return x == 0 ? x->f1 : x->f2;
}

On the true branch what 'St' to we generate?  I think it will be null.

  -  case loc::SymbolValKind:
-    BaseR = MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)- 
 >getSymbol());
+  case loc::SymbolValKind: {
+    GRStateRef state(St, StateMgr);
+    loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
+
+    // Check if this symbol has been concretized.
+    ConcretizedPointer::data_type*
+      T = state.get<ConcretizedPointer>(SV->getSymbol());
+
+    if (T)
+      BaseR = *T;
+    else
+      BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
      break;
+  }

This now looks wrong to me.  Why have two regions for the same  
concept?  We should just have a symbolic region, not return an  
AnonPointeeRegion because that region has been concretized.

Here is how I see things:

1) loc::SymbolVal SV represents the "lvalue" of the symbolic region if  
the symbol unless the symbol is equal to null.

2) If the symbol is not null, SymbolicRegion* R = getSymbolicRegion(SV- 
 >getSymbol()) represents the actual region.

It seems like AnonPointeeRegion is just duplicating SymbolicRegion.   
Why have a separate concept?  It also keep the above code simple.   
Whether or not the symbol has a "concrete" region is a property of  
that symbol, but we shouldn't return a separate region.  It also takes  
care of all the aliasing issues I mentioned in my previous email  
because you always using the same "name" for the same concept.

===================================================================
--- lib/Analysis/RegionStore.cpp	(revision 60870)
+++ lib/Analysis/RegionStore.cpp	(working copy)
@@ -62,6 +62,16 @@
    };
  }

+// ConcretizedPointer GDM stuff.
+typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
+  ConcretizedPointer;
+static int ConcretizedPointerIndex = 0;
+namespace clang {
+  template<> struct GRStateTrait<ConcretizedPointer>
+    : public GRStatePartialTrait<ConcretizedPointer> {
+    static void* GDMIndex() { return &ConcretizedPointerIndex; }
+  };
+}

I don't think we need a map from SymbolRefs -> AnonPointeeRegion.   
Probably just keeping track of whether or not the symbol referred to  
by SymbolRef is enough to determine if the symbol is concrete.   
Otherwise, an ImmutableSet of concrete symbols can represent the same  
information.


+std::pair<const GRState*, Store>
+RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal  
BasePtr,
+                                         QualType T) {
+
+  loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);


Since this method expects that BasePtr is a loc::SymbolVal then we  
should just pass that in directly.  Then then the compiler can type- 
check our assumptions rather than making this a runtime check.  It  
also makes the interface much clearer.

+
+  Store store = St->getStore();
+
+  // Create an AnonPointeeRegion for the symbolic struct pointer.
+  AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV- 
 >getSymbol(), T);
+
+  // Symbolize the struct region.
+  store = BindStructToSymVal(store, APR);
+
+  // Map the symbolic pointer to its concretization region.
+  GRStateRef state(St, StateMgr);
+  const GRState* NewSt = state.set<ConcretizedPointer>(SV- 
 >getSymbol(), APR);
+
+  return std::make_pair(NewSt, store);
+}

Looking at this again I actually don't think this method  
(ConcretizeStructBase) is needed, nor is the design as scalable as I  
think we have discussed.  Just bind symbolic values to the  
FieldRegions lazily.  Here's the reasoning:

1) If the symbol for SV->getSymbol() is not null we know it is  
concrete.  It's region is SymbolicRegion* R = getSymbolicRegion(SV- 
 >getSymbol());

2) A field for the FieldDecl 'D' for this symbolic region will be  
represented as a FieldRegion 'FR' with the decl 'D' and the super  
region R.

3) During value lookup for the region 'FR' we can...

a) If there is no value in our map, return a symbolic value. (we can  
create a new symbol type to associate with this region)
b) If there is a value, return it.
c) Assignments of 'UnknownVal' to this field should either be  
explicitly represented in the mapping or in a kill set.

By binding things lazily everything becomes a lot more efficient.  The  
approach in your patch also doesn't handle the case where a structure  
is already concretized and you attempt to concretize it again:

   x->f
   ....
   x->f    // ConcretizeStructBase will be called again!

We should actually aim to make all of RegionStore lazy in this way.   
This would address the scalability concerns with modeling arrays and  
structs as we only need to bind their values lazily as opposed to all  
at once.

I also don't think there is a scenario where we need  
AnonPointeeRegion.  I think that is modeled just fine using  
SymbolicRegion, and then just having different SymbolData types to  
encapsulate the information about specific symbols.

+  // For live symbols, if it is concretized, put the associated
+  // AnonPointeeRegion into RegionRoots.
+  // FIXME: We need a way to iterate over the live symbols!
+//   for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =  
LSymbols.end();
+//        I != E; ++I) {
+//     GRStateRef StRef(state, StateMgr);
+//     ConcretizedPointer::data_type* T =  
StRef.get<ConcretizedPointer>(*I);
+//     if (T)
+//       RegionRoots.push_back(*T);
+//   }
+

This isn't needed.  For example:

   x->f = 10

This will cause us to have a FieldRegion for 'f' whose super region is  
the symbolic region for the symbol referred to by 'x'.  That  
FieldRegion will have a binding to 10, causing it to be in our  
bindings.  This means that RemoveDeadBindings will add the FieldRegion  
in the backmap from super regions to subregions.  RemoveDeadBindings  
nicely handles SymbolicRegion, so if we don't use AnonPointeeRegion at  
all the code in RemoveDeadBindings stays the same.