[cfe-dev] Symbolic Extents

Ted Kremenek kremenek at apple.com
Mon Jun 28 16:34:12 PDT 2010


On Jun 24, 2010, at 2:18 PM, Jordy Rose wrote:

> The start of work on symbolic extents. I wanted to ask what the best way
> to handle this was. It seems we should have a general EvaluateSymExpr
> method somewhere that simplifies a SymExpr as much as possible, which is
> what the getKnownExtent() method (see patch) would eventually be doing. Or
> perhaps EvaluateSVal, to handle ConcreteInts as well.
> 
> It's also going to soon become helpful to ask for the minimum or maximum
> value of a SymExpr -- knowing the upper bound on an array size would
> certainly be better than no information at all. What should the API for
> that be?
> 
> (One goal of mine is to be able to model that the length of argv is argc.)

Hi Jordy,

I've made comments in some other emails, but here are some specific comments on your patch.  Comments inline:

Index: include/clang/Checker/PathSensitive/Store.h
===================================================================
--- include/clang/Checker/PathSensitive/Store.h	(revision 106771)
+++ include/clang/Checker/PathSensitive/Store.h	(working copy)
@@ -181,6 +181,11 @@
     return llvm::Optional<SVal>();
   }
 
+  virtual const llvm::APSInt *getKnownExtent(const GRState *state,
+                                             const MemRegion *R) {
+    return NULL;
+  }
+


Following my comment in another email, I think this should be a method in SValuator.   E.g:

  virtual SVal getKnownExtent(const GRState *state, const MemRegion *R);

Note that by using an SVal, extents just become like any other (possibly symbolic) expression.  That means we can establish constraints for them, as use all our existing SVal logic.  I actually don't think the StoreManager needs to be involved at all here.



Index: lib/Checker/RegionStore.cpp
===================================================================
--- lib/Checker/RegionStore.cpp	(revision 106771)
+++ lib/Checker/RegionStore.cpp	(working copy)
@@ -384,6 +384,21 @@
       return Optional<SVal>();
   }
 
+  const llvm::APSInt *getKnownExtent(const GRState *state, const MemRegion *R) {
+    Optional<SVal> V = getExtent(state, R);
+    if (!V)
+      return NULL;
+
+    if (nonloc::ConcreteInt *Int = dyn_cast<nonloc::ConcreteInt>(&V))
+      return &Int->getValue();
+
+    if (SymbolRef Sym = V->getAsSymbol())
+      return state->getSymVal(Sym);
+
+    // FIXME: This should eventually try to evaluate symbolic expressions.
+    return NULL;
+  }
+

I think all of this could easily go into SValuator, allowing us to return something far more generic.  I also think the 'getExtent' method only needs to operate on SymbolicRegions, and could possibly be a method in SymbolManager.  For all other cases where the extent is no symbolic, we should be able to evaluate it up front.

 
Index: lib/Checker/CastSizeChecker.cpp
===================================================================
--- lib/Checker/CastSizeChecker.cpp	(revision 106771)
+++ lib/Checker/CastSizeChecker.cpp	(working copy)
@@ -44,7 +44,8 @@
 
   QualType ToPointeeTy = ToPTy->getPointeeType();
 
-  const MemRegion *R = C.getState()->getSVal(E).getAsRegion();
+  const GRState *state = C.getState();
+  const MemRegion *R = state->getSVal(E).getAsRegion();
   if (R == 0)
     return;
 
@@ -52,17 +53,13 @@
   if (SR == 0)
     return;
 
-  llvm::Optional<SVal> V = 
-                    C.getEngine().getStoreManager().getExtent(C.getState(), SR);
-  if (!V)
+  StoreManager &StoreMgr = C.getEngine().getStoreManager();
+  const llvm::APSInt *Extent = StoreMgr.getKnownExtent(state, SR);
+  if (!Extent)
     return;
 
-  const nonloc::ConcreteInt *CI = dyn_cast<nonloc::ConcreteInt>(V);
-  if (!CI)
-    return;
-
-  CharUnits RegionSize = CharUnits::fromQuantity(CI->getValue().getSExtValue());
-  CharUnits TypeSize = C.getASTContext().getTypeSizeInChars(ToPointeeTy);
+  CharUnits RegionSize = CharUnits::fromQuantity(Extent->getSExtValue());
+  CharUnits TypeSize = Ctx.getTypeSizeInChars(ToPointeeTy);


I think that extents should remain as SVals, leaving most of this code unchanged.

Fundamentally, I don't think this is quite the right design for extents.  This actually seems to be making the assumption that extents can be reduced to specific APSInt values.  In the interesting cases, that won't be possible at all.  Instead of we treat extents just like any other SVal (expression), then we get arbitrary flexibility in what they represent and how they are represented.  It allows us then to establish symbolic constraints between extents and indices, which is critical for buffer overflow detection.

I don't have a complete design for extents, but here is my rough idea.  Maybe it is the right design and maybe it's not; I'm curious to see what you and Zhongxing think.  The goals I have in my mind are:

(1) Treat extents just like any other symbolic expression, allowing extents to capture formulations such as "(n+1) * sizeof(int)" and the like.

(2) Make reasoning about extents completely lazy, so we only incur a cost when we reason about them.

For constant-size arrays and scalars, the SVal representing the extent should be something we can compute up-front.  Thus SValuator::getExtent() should return a simple NonLoc SVal (such as a constant) for such extents.  We can add supporting methods to StoreManager or MemRegions if necessary to make computing such values easier.

Now let's consider symbolic extents.  For calls to function like foo():

  void *p = foo(...)

we have the analyzer return a SymbolicRegion.  That SymbolicRegion has an associated SymbolID.  RegionStore uses that SymbolID to create "derived symbols" that represent the initial values of that SymbolicRegion when the code has not created an explicit binding.  This approach allows RegionStore to lazily abstract memory as the code interrogates memory values.

My thought about extents is why not treat them the same way?  We can create a new type of derived symbol, e.g. SymbolicExtent, that subclasses SymbolDerived.  For all intensive purposes it is just another symbol, and we just use the same existing infrastructure to query that symbol and perform operations on it.

For example, consider:

  p = malloc(n * sizeof(int))

Let's say we represent the return value of malloc() with a SymbolicRegion whose SymbolID is $1.  We can provide an API to SymbolManager that queries the SymbolExtent* for a given SymbolID.  This would be lazily generated using a FoldingSet, just like we do with other derived symbols.  Now let's assume we have a SymbolExtent* for that extent, and we want to add the constraint that the symbol is equal to 'n * sizeof(int)'.  All we do construct a symbolic expression that equates that derived symbol with 'n * sizeof(int)'.  We could do this using SValuator to build the symbolic expression, and then use the Assume method in the ConstraintManager to assume that the constraint was true.  All of this could be done in a PostVistCallExpr method of some checker that explicitly understood the malloc() function.  Other checkers could then add constraints modeling other domain-specific knowledge about extents.

The nice thing about this approach is that as long as we make the ConstraintManager and SValuator smart enough, we should be able to handle extents (and their relation to other symbols) just like any other symbolic value.  This also allows us to remove the current GDM-based approach of 'RegionExtents' from RegionStore.  This not only simplifies RegionStore, but also gets the StoreManager out of the business of managing extents itself.  It also gives us a scalable approach to handling extents, and gradually making the analyzer smarter about reasoning about them.

That said, I don't claim that this is perfect, or even the absolute right answer.  I just don't think we should be moving in the direction where we add APIs that have using returning APSInt values explicitly.  If we are going to try and model extents well, I think we should be aiming for a more general design.

What do you guys think?



More information about the cfe-dev mailing list