<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">For me things got clearer as we dove into the design.  Modeling the heap symbolically is a tricky concept, but I think we're making good progress towards a clean formulation and implementation.<div><br><div><div>On Dec 12, 2008, at 7:42 PM, Zhongxing Xu wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi Ted,<br><br>Your comments are inspiring. I feel that there are some more fundamental problems that should be solved before this one. Or, this one is a special case of how we treat assumptions on symbolic values (including pointer and non-pointers) and lazy binding of locations.<br> <br><div class="gmail_quote">On Sat, Dec 13, 2008 at 10:48 AM, Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com">kremenek@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div class="Ih2E3d"><br> On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:<br> <br> </div><div class="Ih2E3d"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Now we use a GDM to map a symbol to its concretized region. We no longer need the base struct pointer to be an lvalue. It only need to be a symbolic value.<br> </blockquote> <br></div> Great!  Many of my comments are still of design, and I'm not convinced we need an AnonPointeeRegion.  We also may not need to have a separate GDM entry; just use the ConstraintManager to determine if a region is concrete or not if its symbol is not null.<br> <br> Comments inline!<div class="Ih2E3d"><br> <br> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Known problem: we need a way to iterate over live symbols, otherwise the values of AnonPointeeRegions will be swept.<br> </blockquote> <br></div> I don't think this is a problem if we just use SymbolicRegions instead of AnonPointeeRegions.  More comments below.<br> <br> ===================================================================<br> --- lib/Analysis/GRExprEngine.cpp       (revision 60870)<div class="Ih2E3d"><br> +++ lib/Analysis/GRExprEngine.cpp       (working copy)<br></div> @@ -935,6 +935,24 @@<div class="Ih2E3d"><br>     // FIXME: Should we insert some assumption logic in here to determine<br>     // if "Base" is a valid piece of memory?  Before we put this assumption<br>     // later when using FieldOffset lvals (which we no longer have).<br></div> +<br> +    // BaseV is the struct base location.<div class="Ih2E3d"><br> +    SVal BaseV = GetSVal(St, Base);<br></div> +<br> +    // Check if the base can be NonNull. FIXME: should also check for null here,<br> +    // because once we concretized the symbolic pointer, it's no longer viewed<br> +    // as null.<div class="Ih2E3d"><br> +    bool isFeasibleNotNull = false;<br> +    St = Assume(St, BaseV, true, isFeasibleNotNull);<br> <br></div> This isn't correct.  This overrides the only state reference with the assumption that BaseV is not null.  What if it is?  We need two cases here: one where BaseV is assumed to be null and another where it isn't.  We need to generate two states.  Almost anytime in GRExprEngine where we make assumptions we need to have a case where the assumption is false and the other where the assumption is true.<br> <br> For example, this code won't handle the following correctly:<br> <br> int f(struct s x) {<br>  return x == 0 ? x->f1 : x->f2;<br> }<br> <br> On the true branch what 'St' to we generate?  I think it will be null.<br> <br>  -  case loc::SymbolValKind:<br> -    BaseR = MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)->getSymbol());<br> +  case loc::SymbolValKind: {<br> +    GRStateRef state(St, StateMgr);<br> +    loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);<br> +<br> +    // Check if this symbol has been concretized.<br> +    ConcretizedPointer::data_type*<br> +      T = state.get<ConcretizedPointer>(SV->getSymbol());<br> +<br> +    if (T)<br> +      BaseR = *T;<br> +    else<br> +      BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());<br>     break;<br> +  }<br> <br> This now looks wrong to me.  Why have two regions for the same concept?  We should just have a symbolic region, not return an AnonPointeeRegion because that region has been concretized.<br> <br> Here is how I see things:<br> <br> 1) loc::SymbolVal SV represents the "lvalue" of the symbolic region if the symbol unless the symbol is equal to null.<br> <br> 2) If the symbol is not null, SymbolicRegion* R = getSymbolicRegion(SV->getSymbol()) represents the actual region.<br> <br> It seems like AnonPointeeRegion is just duplicating SymbolicRegion.  Why have a separate concept?  It also keep the above code simple.  Whether or not the symbol has a "concrete" region is a property of that symbol, but we shouldn't return a separate region.  It also takes care of all the aliasing issues I mentioned in my previous email because you always using the same "name" for the same concept.<br> <br> ===================================================================<br> --- lib/Analysis/RegionStore.cpp        (revision 60870)<div class="Ih2E3d"><br> +++ lib/Analysis/RegionStore.cpp        (working copy)<br></div> @@ -62,6 +62,16 @@<br>   };<br>  }<br> <br> +// ConcretizedPointer GDM stuff.<br> +typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*><br> +  ConcretizedPointer;<br> +static int ConcretizedPointerIndex = 0;<br> +namespace clang {<br> +  template<> struct GRStateTrait<ConcretizedPointer><br> +    : public GRStatePartialTrait<ConcretizedPointer> {<br> +    static void* GDMIndex() { return &ConcretizedPointerIndex; }<br> +  };<br> +}<br> <br> I don't think we need a map from SymbolRefs -> AnonPointeeRegion.  Probably just keeping track of whether or not the symbol referred to by SymbolRef is enough to determine if the symbol is concrete.  Otherwise, an ImmutableSet of concrete symbols can represent the same information.<br> <br> <br> +std::pair<const GRState*, Store><br> +RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal BasePtr,<br> +                                         QualType T) {<br> +<br> +  loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);<br> <br> <br> Since this method expects that BasePtr is a loc::SymbolVal then we should just pass that in directly.  Then then the compiler can type-check our assumptions rather than making this a runtime check.  It also makes the interface much clearer.<br> <br> +<br> +  Store store = St->getStore();<br> +<br> +  // Create an AnonPointeeRegion for the symbolic struct pointer.<br> +  AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV->getSymbol(), T);<div class="Ih2E3d"><br> +<br> +  // Symbolize the struct region.<br> +  store = BindStructToSymVal(store, APR);<br> +<br></div> +  // Map the symbolic pointer to its concretization region.<br> +  GRStateRef state(St, StateMgr);<br> +  const GRState* NewSt = state.set<ConcretizedPointer>(SV->getSymbol(), APR);<br> +<br> +  return std::make_pair(NewSt, store);<br> +}<br> <br> Looking at this again I actually don't think this method (ConcretizeStructBase) is needed, nor is the design as scalable as I think we have discussed.  Just bind symbolic values to the FieldRegions lazily.  Here's the reasoning:<br> <br> 1) If the symbol for SV->getSymbol() is not null we know it is concrete.  It's region is SymbolicRegion* R = getSymbolicRegion(SV->getSymbol());<br> <br> 2) A field for the FieldDecl 'D' for this symbolic region will be represented as a FieldRegion 'FR' with the decl 'D' and the super region R.<br> <br> 3) During value lookup for the region 'FR' we can...<br> <br> a) If there is no value in our map, return a symbolic value. (we can create a new symbol type to associate with this region)<br> b) If there is a value, return it.<br> c) Assignments of 'UnknownVal' to this field should either be explicitly represented in the mapping or in a kill set.<br> <br> By binding things lazily everything becomes a lot more efficient.  The approach in your patch also doesn't handle the case where a structure is already concretized and you attempt to concretize it again:<br> <br>  x->f<br>  ....<br>  x->f    // ConcretizeStructBase will be called again!<br> <br> We should actually aim to make all of RegionStore lazy in this way.  This would address the scalability concerns with modeling arrays and structs as we only need to bind their values lazily as opposed to all at once.<br> <br> I also don't think there is a scenario where we need AnonPointeeRegion.  I think that is modeled just fine using SymbolicRegion, and then just having different SymbolData types to encapsulate the information about specific symbols.<br> <br> +  // For live symbols, if it is concretized, put the associated<br> +  // AnonPointeeRegion into RegionRoots.<br> +  // FIXME: We need a way to iterate over the live symbols!<br> +//   for (LiveSymbolsTy::iterator I = LSymbols.begin(), E = LSymbols.end();<br> +//        I != E; ++I) {<br> +//     GRStateRef StRef(state, StateMgr);<br> +//     ConcretizedPointer::data_type* T = StRef.get<ConcretizedPointer>(*I);<br> +//     if (T)<br> +//       RegionRoots.push_back(*T);<br> +//   }<br> +<br> <br> This isn't needed.  For example:<br> <br>  x->f = 10<br> <br> This will cause us to have a FieldRegion for 'f' whose super region is the symbolic region for the symbol referred to by 'x'.  That FieldRegion will have a binding to 10, causing it to be in our bindings.  This means that RemoveDeadBindings will add the FieldRegion in the backmap from super regions to subregions.  RemoveDeadBindings nicely handles SymbolicRegion, so if we don't use AnonPointeeRegion at all the code in RemoveDeadBindings stays the same.<br> <br> </blockquote></div><br></blockquote></div><br></div></body></html>