[cfe-dev] [PATCH] expand a symbolic pointer

Fri Dec 12 19:42:09 PST 2008

Hi Ted,

Your comments are inspiring. I feel that there are some more fundamental
problems that should be solved before this one. Or, this one is a special
case of how we treat assumptions on symbolic values (including pointer and
non-pointers) and lazy binding of locations.

On Sat, Dec 13, 2008 at 10:48 AM, Ted Kremenek <kremenek at apple.com> wrote:

>
> On Dec 11, 2008, at 1:34 AM, Zhongxing Xu wrote:
>
>  Now we use a GDM to map a symbol to its concretized region. We no longer
>> need the base struct pointer to be an lvalue. It only need to be a symbolic
>> value.
>>
>
> Great!  Many of my comments are still of design, and I'm not convinced we
> need an AnonPointeeRegion.  We also may not need to have a separate GDM
> entry; just use the ConstraintManager to determine if a region is concrete
> or not if its symbol is not null.
>
> Comments inline!
>
>  Known problem: we need a way to iterate over live symbols, otherwise the
>> values of AnonPointeeRegions will be swept.
>>
>
> I don't think this is a problem if we just use SymbolicRegions instead of
> AnonPointeeRegions.  More comments below.
>
> ===================================================================
> --- lib/Analysis/GRExprEngine.cpp       (revision 60870)
> +++ lib/Analysis/GRExprEngine.cpp       (working copy)
> @@ -935,6 +935,24 @@
>     // FIXME: Should we insert some assumption logic in here to determine
>     // if "Base" is a valid piece of memory?  Before we put this assumption
>     // later when using FieldOffset lvals (which we no longer have).
> +
> +    // BaseV is the struct base location.
> +    SVal BaseV = GetSVal(St, Base);
> +
> +    // Check if the base can be NonNull. FIXME: should also check for null
> here,
> +    // because once we concretized the symbolic pointer, it's no longer
> viewed
> +    // as null.
> +    bool isFeasibleNotNull = false;
> +    St = Assume(St, BaseV, true, isFeasibleNotNull);
>
> This isn't correct.  This overrides the only state reference with the
> assumption that BaseV is not null.  What if it is?  We need two cases here:
> one where BaseV is assumed to be null and another where it isn't.  We need
> to generate two states.  Almost anytime in GRExprEngine where we make
> assumptions we need to have a case where the assumption is false and the
> other where the assumption is true.
>
> For example, this code won't handle the following correctly:
>
> int f(struct s x) {
>  return x == 0 ? x->f1 : x->f2;
> }
>
> On the true branch what 'St' to we generate?  I think it will be null.
>
>  -  case loc::SymbolValKind:
> -    BaseR =
> MRMgr.getSymbolicRegion(cast<loc::SymbolVal>(&BaseL)->getSymbol());
> +  case loc::SymbolValKind: {
> +    GRStateRef state(St, StateMgr);
> +    loc::SymbolVal* SV = cast<loc::SymbolVal>(&BaseL);
> +
> +    // Check if this symbol has been concretized.
> +    ConcretizedPointer::data_type*
> +      T = state.get<ConcretizedPointer>(SV->getSymbol());
> +
> +    if (T)
> +      BaseR = *T;
> +    else
> +      BaseR = MRMgr.getSymbolicRegion(SV->getSymbol());
>     break;
> +  }
>
> This now looks wrong to me.  Why have two regions for the same concept?  We
> should just have a symbolic region, not return an AnonPointeeRegion because
> that region has been concretized.
>
> Here is how I see things:
>
> 1) loc::SymbolVal SV represents the "lvalue" of the symbolic region if the
> symbol unless the symbol is equal to null.
>
> 2) If the symbol is not null, SymbolicRegion* R =
> getSymbolicRegion(SV->getSymbol()) represents the actual region.
>
> It seems like AnonPointeeRegion is just duplicating SymbolicRegion.  Why
> have a separate concept?  It also keep the above code simple.  Whether or
> not the symbol has a "concrete" region is a property of that symbol, but we
> shouldn't return a separate region.  It also takes care of all the aliasing
> issues I mentioned in my previous email because you always using the same
> "name" for the same concept.
>
> ===================================================================
> --- lib/Analysis/RegionStore.cpp        (revision 60870)
> +++ lib/Analysis/RegionStore.cpp        (working copy)
> @@ -62,6 +62,16 @@
>   };
>  }
>
> +// ConcretizedPointer GDM stuff.
> +typedef llvm::ImmutableMap<SymbolRef, const AnonPointeeRegion*>
> +  ConcretizedPointer;
> +static int ConcretizedPointerIndex = 0;
> +namespace clang {
> +  template<> struct GRStateTrait<ConcretizedPointer>
> +    : public GRStatePartialTrait<ConcretizedPointer> {
> +    static void* GDMIndex() { return &ConcretizedPointerIndex; }
> +  };
> +}
>
> I don't think we need a map from SymbolRefs -> AnonPointeeRegion.  Probably
> just keeping track of whether or not the symbol referred to by SymbolRef is
> enough to determine if the symbol is concrete.  Otherwise, an ImmutableSet
> of concrete symbols can represent the same information.
>
>
> +std::pair<const GRState*, Store>
> +RegionStoreManager::ConcretizeStructBase(const GRState* St, SVal BasePtr,
> +                                         QualType T) {
> +
> +  loc::SymbolVal* SV = cast<loc::SymbolVal>(&BasePtr);
>
>
> Since this method expects that BasePtr is a loc::SymbolVal then we should
> just pass that in directly.  Then then the compiler can type-check our
> assumptions rather than making this a runtime check.  It also makes the
> interface much clearer.
>
> +
> +  Store store = St->getStore();
> +
> +  // Create an AnonPointeeRegion for the symbolic struct pointer.
> +  AnonPointeeRegion* APR = MRMgr.getAnonPointeeRegion(SV->getSymbol(), T);
> +
> +  // Symbolize the struct region.
> +  store = BindStructToSymVal(store, APR);
> +
> +  // Map the symbolic pointer to its concretization region.
> +  GRStateRef state(St, StateMgr);
> +  const GRState* NewSt = state.set<ConcretizedPointer>(SV->getSymbol(),
> APR);
> +
> +  return std::make_pair(NewSt, store);
> +}
>
> Looking at this again I actually don't think this method
> (ConcretizeStructBase) is needed, nor is the design as scalable as I think
> we have discussed.  Just bind symbolic values to the FieldRegions lazily.
>  Here's the reasoning:
>
> 1) If the symbol for SV->getSymbol() is not null we know it is concrete.
>  It's region is SymbolicRegion* R = getSymbolicRegion(SV->getSymbol());
>
> 2) A field for the FieldDecl 'D' for this symbolic region will be
> represented as a FieldRegion 'FR' with the decl 'D' and the super region R.
>
> 3) During value lookup for the region 'FR' we can...
>
> a) If there is no value in our map, return a symbolic value. (we can create
> a new symbol type to associate with this region)
> b) If there is a value, return it.
> c) Assignments of 'UnknownVal' to this field should either be explicitly
> represented in the mapping or in a kill set.
>
> By binding things lazily everything becomes a lot more efficient.  The
> approach in your patch also doesn't handle the case where a structure is
> already concretized and you attempt to concretize it again:
>
>  x->f
>  ....
>  x->f    // ConcretizeStructBase will be called again!
>
> We should actually aim to make all of RegionStore lazy in this way.  This
> would address the scalability concerns with modeling arrays and structs as
> we only need to bind their values lazily as opposed to all at once.
>
> I also don't think there is a scenario where we need AnonPointeeRegion.  I
> think that is modeled just fine using SymbolicRegion, and then just having
> different SymbolData types to encapsulate the information about specific
> symbols.
>
> +  // For live symbols, if it is concretized, put the associated
> +  // AnonPointeeRegion into RegionRoots.
> +  // FIXME: We need a way to iterate over the live symbols!
> +//   for (LiveSymbolsTy::iterator I = LSymbols.begin(), E =
> LSymbols.end();
> +//        I != E; ++I) {
> +//     GRStateRef StRef(state, StateMgr);
> +//     ConcretizedPointer::data_type* T =
> StRef.get<ConcretizedPointer>(*I);
> +//     if (T)
> +//       RegionRoots.push_back(*T);
> +//   }
> +
>
> This isn't needed.  For example:
>
>  x->f = 10
>
> This will cause us to have a FieldRegion for 'f' whose super region is the
> symbolic region for the symbol referred to by 'x'.  That FieldRegion will
> have a binding to 10, causing it to be in our bindings.  This means that
> RemoveDeadBindings will add the FieldRegion in the backmap from super
> regions to subregions.  RemoveDeadBindings nicely handles SymbolicRegion, so
> if we don't use AnonPointeeRegion at all the code in RemoveDeadBindings
> stays the same.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081213/c5cca253/attachment.html>