[cfe-dev] [PATCH] expand a symbolic pointer

Sat Dec 13 13:53:11 PST 2008

For me things got clearer as we dove into the design.  Modeling the  
heap symbolically is a tricky concept, but I think we're making good  
progress towards a clean formulation and implementation.

On Dec 12, 2008, at 7:42 PM, Zhongxing Xu wrote:

> Hi Ted,
>
> Your comments are inspiring. I feel that there are some more  
> fundamental problems that should be solved before this one. Or, this  
> one is a special case of how we treat assumptions on symbolic values  
> (including pointer and non-pointers) and lazy binding of locations.
>
> On Sat, Dec 13, 2008 at 10:48 AM, Ted Kremenek <kremenek at apple.com>  
> wrote:
>
> On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:
>
> Now we use a GDM to map a symbol to its concretized region. We no  
> longer need the base struct pointer to be an lvalue. It only need to  
> be a symbolic value.
>
> Great!  Many of my comments are still of design, and I'm not  
> convinced we need an AnonPointeeRegion.  We also may not need to  
> have a separate GDM entry; just use the ConstraintManager to  
> determine if a region is concrete or not if its symbol is not null.
>
> Comments inline!
>
>
> Known problem: we need a way to iterate over live symbols, otherwise  
> the values of AnonPointeeRegions will be swept.
>
> I don't think this is a problem if we just use SymbolicRegions  
> instead of AnonPointeeRegions.  More comments below.
>
> ===================================================================
> --- lib/Analysis/GRExprEngine.cpp       (revision 60870)
>
> +++ lib/Analysis/GRExprEngine.cpp       (working copy)
> @@ -935,6 +935,24 @@
>
>     // FIXME: Should we insert some assumption logic in here to  
> determine
>     // if "Base" is a valid piece of memory?  Before we put this  
> assumption
>     // later when using FieldOffset lvals (which we no longer have).
> +
> +    // BaseV is the struct base location.
>
> +    SVal BaseV = GetSVal(St, Base);
> +
> +    // Check if the base can be NonNull. FIXME: should also check  
> for null here,
> +    // because once we concretized the symbolic pointer, it's no  
> longer viewed
> +    // as null.
>
> +    bool isFeasibleNotNull = false;
> +    St = Assume(St, BaseV, true, isFeasibleNotNull);
>
> This isn't correct.  This overrides the only state reference with  
> the assumption that BaseV is not null.  What if it is?  We need two  
> cases here: one where BaseV is assumed to be null and another where  
> it isn't.  We need to generate two states.  Almost anytime in  
> GRExprEngine where we make assumptions we need to have a case where  
> the assumption is false and the other where the assumption is true.
>
> For example, this code won't handle the following correctly:
>
> int f(struct s x) {
>  return x == 0 ? x->f1 : x->f2;
> }
>
> On the true branch what 'St' to we generate?  I think it will be null.
>
>  -  case loc::SymbolValKind:
> -    BaseR = MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)- 
> >getSymbol());
> +  case loc::SymbolValKind: {
> +    GRStateRef state(St, StateMgr);
> +    loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
> +
> +    // Check if this symbol has been concretized.
> +    ConcretizedPointer::data_type*
> +      T = state.get<ConcretizedPointer>(SV->getSymbol());
> +
> +    if (T)
> +      BaseR = *T;
> +    else
> +      BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
>     break;
> +  }
>
> This now looks wrong to me.  Why have two regions for the same  
> concept?  We should just have a symbolic region, not return an  
> AnonPointeeRegion because that region has been concretized.
>
> Here is how I see things:
>
> 1) loc::SymbolVal SV represents the "lvalue" of the symbolic region  
> if the symbol unless the symbol is equal to null.
>
> 2) If the symbol is not null, SymbolicRegion* R =  
> getSymbolicRegion(SV->getSymbol()) represents the actual region.
>
> It seems like AnonPointeeRegion is just duplicating SymbolicRegion.   
> Why have a separate concept?  It also keep the above code simple.   
> Whether or not the symbol has a "concrete" region is a property of  
> that symbol, but we shouldn't return a separate region.  It also  
> takes care of all the aliasing issues I mentioned in my previous  
> email because you always using the same "name" for the same concept.
>
> ===================================================================
> --- lib/Analysis/RegionStore.cpp        (revision 60870)
>
> +++ lib/Analysis/RegionStore.cpp        (working copy)
> @@ -62,6 +62,16 @@
>   };
>  }
>
> +// ConcretizedPointer GDM stuff.
> +typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
> +  ConcretizedPointer;
> +static int ConcretizedPointerIndex = 0;
> +namespace clang {
> +  template<> struct GRStateTrait<ConcretizedPointer>
> +    : public GRStatePartialTrait<ConcretizedPointer> {
> +    static void* GDMIndex() { return &ConcretizedPointerIndex; }
> +  };
> +}
>
> I don't think we need a map from SymbolRefs -> AnonPointeeRegion.   
> Probably just keeping track of whether or not the symbol referred to  
> by SymbolRef is enough to determine if the symbol is concrete.   
> Otherwise, an ImmutableSet of concrete symbols can represent the  
> same information.
>
>
> +std::pair<const GRState*, Store>
> +RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal  
> BasePtr,
> +                                         QualType T) {
> +
> +  loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);
>
>
> Since this method expects that BasePtr is a loc::SymbolVal then we  
> should just pass that in directly.  Then then the compiler can type- 
> check our assumptions rather than making this a runtime check.  It  
> also makes the interface much clearer.
>
> +
> +  Store store = St->getStore();
> +
> +  // Create an AnonPointeeRegion for the symbolic struct pointer.
> +  AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV- 
> >getSymbol(), T);
>
> +
> +  // Symbolize the struct region.
> +  store = BindStructToSymVal(store, APR);
> +
> +  // Map the symbolic pointer to its concretization region.
> +  GRStateRef state(St, StateMgr);
> +  const GRState* NewSt = state.set<ConcretizedPointer>(SV- 
> >getSymbol(), APR);
> +
> +  return std::make_pair(NewSt, store);
> +}
>
> Looking at this again I actually don't think this method  
> (ConcretizeStructBase) is needed, nor is the design as scalable as I  
> think we have discussed.  Just bind symbolic values to the  
> FieldRegions lazily.  Here's the reasoning:
>
> 1) If the symbol for SV->getSymbol() is not null we know it is  
> concrete.  It's region is SymbolicRegion* R = getSymbolicRegion(SV- 
> >getSymbol());
>
> 2) A field for the FieldDecl 'D' for this symbolic region will be  
> represented as a FieldRegion 'FR' with the decl 'D' and the super  
> region R.
>
> 3) During value lookup for the region 'FR' we can...
>
> a) If there is no value in our map, return a symbolic value. (we can  
> create a new symbol type to associate with this region)
> b) If there is a value, return it.
> c) Assignments of 'UnknownVal' to this field should either be  
> explicitly represented in the mapping or in a kill set.
>
> By binding things lazily everything becomes a lot more efficient.   
> The approach in your patch also doesn't handle the case where a  
> structure is already concretized and you attempt to concretize it  
> again:
>
>  x->f
>  ....
>  x->f    // ConcretizeStructBase will be called again!
>
> We should actually aim to make all of RegionStore lazy in this way.   
> This would address the scalability concerns with modeling arrays and  
> structs as we only need to bind their values lazily as opposed to  
> all at once.
>
> I also don't think there is a scenario where we need  
> AnonPointeeRegion.  I think that is modeled just fine using  
> SymbolicRegion, and then just having different SymbolData types to  
> encapsulate the information about specific symbols.
>
> +  // For live symbols, if it is concretized, put the associated
> +  // AnonPointeeRegion into RegionRoots.
> +  // FIXME: We need a way to iterate over the live symbols!
> +//   for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =  
> LSymbols.end();
> +//        I != E; ++I) {
> +//     GRStateRef StRef(state, StateMgr);
> +//     ConcretizedPointer::data_type* T =  
> StRef.get<ConcretizedPointer>(*I);
> +//     if (T)
> +//       RegionRoots.push_back(*T);
> +//   }
> +
>
> This isn't needed.  For example:
>
>  x->f = 10
>
> This will cause us to have a FieldRegion for 'f' whose super region  
> is the symbolic region for the symbol referred to by 'x'.  That  
> FieldRegion will have a binding to 10, causing it to be in our  
> bindings.  This means that RemoveDeadBindings will add the  
> FieldRegion in the backmap from super regions to subregions.   
> RemoveDeadBindings nicely handles SymbolicRegion, so if we don't use  
> AnonPointeeRegion at all the code in RemoveDeadBindings stays the  
> same.
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081213/f83848a5/attachment.html>