[cfe-commits] Patch to do static analysis with range sets

Ted Kremenek kremenek at apple.com
Thu Feb 5 15:55:38 PST 2009


On Feb 1, 2009, at 12:06 AM, Ben Laurie wrote:

> Not complete - no self-tests and full of diagnostics, but I'd love to
> get comments about the approach. Also hints on implementing tests (Ted
> told me in IM, but I forgot), and how to preserve/add more debugging
> stuff in an approved way, since I find that this stuff hurts my head
> without it...
>
> Also, I have cases which fail because higher-level stuff seems to be
> doing the wrong thing, but not sure how to go about debugging the rest
> of static analysis...
> <clang.patch>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Hi Ben,

Sorry for the delay.  Comments inline:

Index: lib/Analysis/BasicConstraintManager.cpp
===================================================================
--- lib/Analysis/BasicConstraintManager.cpp	(revision 63488)
+++ lib/Analysis/BasicConstraintManager.cpp	(working copy)
@@ -17,29 +17,364 @@
  #include "clang/Analysis/PathSensitive/GRStateTrait.h"
  #include "clang/Analysis/PathSensitive/GRTransferFuncs.h"
  #include "llvm/Support/Compiler.h"
+#include "llvm/ADT/FoldingSet.h"
  #include "llvm/Support/raw_ostream.h"

+#include <iostream>

We try and avoid <iostream>.  For my reasons why, see the following in  
the LLVM coding standards:

http://llvm.org/docs/CodingStandards.html#ll_iostream

+#include <set>

You're probably going to want to use ImmutableSet, since its a  
functional data structure (which works nicely when you embed sets in  
state objects).  It also doesn't have the malloc traffic that  
std::set<> does.

+class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
+public:
+  Range(const llvm::APSInt &from, const llvm::APSInt &to)
+    : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
+    std::cerr << "Range(" << from.toString(10) << ", " <<  
to.toString(10)
+              << ')' << std::endl;
+    assert(from <= to);
+  }
+  bool Includes(const llvm::APSInt &v) const {
+    return first <= v && v <= second;
+  }
+  const llvm::APSInt &From() const {
+    return first;
+  }
+  const llvm::APSInt &To() const {
+    return second;
+  }

Cute.  Does a Range represent "[a, b]" or "(a, b)".  Please document  
this with comments.

I also take it that ranges are only over concrete values?  e.g., [0,  
2].  What about ranges where:

(1) one of the ends is "symbolic", e.g. [a, 2]
(2) one of the ends is Unknown, e.g. [4, unknown].

You don't have to model (1) now, but I think you probably will need to  
model (2).  Frequently we have a lower bound but no upper bound  
(although I imagine you could use maxint, minint, etc.).

Note that this approach will inherently not cover relationships  
between symbolic values, e.g., tracking if 'a > b' when the values of  
'a' and 'b' are symbolic.

+struct RangeCmp {
+  bool operator()(const Range &r1, const Range &r2) {
+    if(r1.From() < r2.From()) {
+      assert(!r1.Includes(r2.From()));
+      assert(!r2.Includes(r1.To()));
+      return true;
+    } else if(r1.From() > r2.From()) {
+      assert(!r1.Includes(r2.To()));
+      assert(!r2.Includes(r1.From()));
+      return false;
+    } else
+      assert(!"Ranges should never be equal in the same set");
+  }
+};

I don't this will work.  The 'else' branch can be easily triggered if  
ranges overlap.  Also, is there a reason you are using operator()  
instead of operator>() and operator<()?

+
+typedef std::set<Range, RangeCmp> PrimRangeSet;

I think you will want to use ImmutableSet.

+
+class RangeSet;
+std::ostream &operator<<(std::ostream &os, const RangeSet &r);

Don't use std::ostream.  Use llvm::raw_ostream&.

+
+class RangeSet {
+  PrimRangeSet ranges;
+  bool noValues;  // if true, no value is possible

My meta-level comment about this class is that isn't clear what it  
represents.  The code afterwards also seems not sure of what  
assumptions to make about RangeSet.  It seems that RangeSet represents  
a set of (non-overlapping) ranges, e.g. which represent the union of  
the possible values for some symbolic integer.  In other words, it  
represents a disjunction:

(x >= set1_lb && x <= set1_ub) || (x >= set2_lb && x <= set2_ub) || ...

With this formalism it should be much more straightforward to answer  
the implementation questions for the methods below.  It will also tell  
you what is the representational power of this constraint model.

Here "lb" and "ub" mean "lower bound" and "upper bound" respectively.

+
+  static const llvm::APSInt Max(const llvm::APSInt &v) {
+    return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
+  }
+  static const llvm::APSInt Min(const llvm::APSInt &v) {
+     return llvm::APSInt::getMinValue(v.getBitWidth(), v.isUnsigned());
+  }
+  static const llvm::APSInt One(const llvm::APSInt &v) {
+    return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),  
v.isUnsigned());
+  }
+
+public:
+  RangeSet() : noValues(false) {
+  }
+  void Profile(llvm::FoldingSetNodeID &ID) const {
+    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=  
ranges.end() ; ++i) {
+      i->Profile(ID);
+    }
+  }

This kind of Profile implementation won't be necessary if you use  
ImmutableSet.  It will do it for you. Moreover, with ImmutableSet, two  
sets (created using the same ImmutableSet::Factory) will always be  
referentially equivalent, meaning they are a quick pointer  
comparison.  ImmutableSets also use techniques similar to hash-consing  
to share data between sets.

+
+  bool CouldBeNE(const llvm::APSInt &ne) const {
+    std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<  
std::endl;

Please don't have debugging messages like this enabled by default.   
Either comment it out or #ifdef it.

+    // FIXME: do I mean this? empty means all values are possible,  
therefore != possible, right?
+    if(ranges.empty())
+      return false;

I think to answer the questions of these FIXMEs you just need to  
define what the invariants are for PrimRangeSet and what it represents.

+    // FIXME: if there are no possible values, then can't be !=,  
surely?
+    if(noValues)
+      return true;
+    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=  
ranges.end() ; ++i)
+      if(i->Includes(ne))
+        return false;
+    return true;
+  }

+
+  bool CouldBeEQ(const llvm::APSInt &eq) const {
+    std::cerr << "CouldBeEQ(" << eq.toString(10) << ") " << *this <<  
std::endl;
+    if(ranges.empty())
+      return true;
+    // FIXME: if there are no possible values, then can't be ==,  
surely?
+    if(noValues)
+      return true;
+    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=  
ranges.end() ; ++i)
+      if(i->Includes(eq))
+        return true;
+    return false;
+  }

Same comments with CouldBeNE, CouldBeLT, etc..

....

+  // Make all existing ranges fall within this new range
+  void Restrict(const llvm::APSInt &from, const llvm::APSInt &to) {

The comment for this method is a little too terse for me to understand  
your algorithm.  What is it trying to do?  It also looks like you are  
modifying the RangeSet object in place.  In the static analyzer, we  
use functional data structures that preserve old data so that each  
GRState object (which represents the analysis state at a particular  
point within a path) never have their values mutated after the GRState  
object is created.  This allows us to faithfully keep full path  
information.  This allows us to report full path diagnostics when  
reporting bugs but also (from a correctness standpoint) allows path  
caching/merging to work at all.

Without making comments on the rest of the patch (which looks similar  
to the other parts), I think there just needs to be a high-level  
comment of what is the algorithmic model for the constraints, how they  
are resolved, what is represented, etc.  Right now it is difficult for  
me to evaluate the approach itself since I don't really understand the  
algorithm.  Also, I think that the implementation itself should come  
in a straightforward manner once a few key methods (e.g., "CouldBeEQ")  
get fleshed out.  I'm more than happy to help with this process.

Ted



More information about the cfe-commits mailing list