[cfe-commits] Patch to do static analysis with range sets

Thu Feb 5 17:20:51 PST 2009

FWIW, it may be worth pointing out the existence of
llvm/include/llvm/Support/ConstantRange.h

Of course that isn't going to deal with symbolic values...

 - Daniel

On Thu, Feb 5, 2009 at 3:55 PM, Ted Kremenek <kremenek at apple.com> wrote:
>
> On Feb 1, 2009, at 12:06 AM, Ben Laurie wrote:
>
>> Not complete - no self-tests and full of diagnostics, but I'd love to
>> get comments about the approach. Also hints on implementing tests (Ted
>> told me in IM, but I forgot), and how to preserve/add more debugging
>> stuff in an approved way, since I find that this stuff hurts my head
>> without it...
>>
>> Also, I have cases which fail because higher-level stuff seems to be
>> doing the wrong thing, but not sure how to go about debugging the rest
>> of static analysis...
>> <clang.patch>_______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
> Hi Ben,
>
> Sorry for the delay.  Comments inline:
>
> Index: lib/Analysis/BasicConstraintManager.cpp
> ===================================================================
> --- lib/Analysis/BasicConstraintManager.cpp     (revision 63488)
> +++ lib/Analysis/BasicConstraintManager.cpp     (working copy)
> @@ -17,29 +17,364 @@
>  #include "clang/Analysis/PathSensitive/GRStateTrait.h"
>  #include "clang/Analysis/PathSensitive/GRTransferFuncs.h"
>  #include "llvm/Support/Compiler.h"
> +#include "llvm/ADT/FoldingSet.h"
>  #include "llvm/Support/raw_ostream.h"
>
> +#include <iostream>
>
> We try and avoid <iostream>.  For my reasons why, see the following in
> the LLVM coding standards:
>
> http://llvm.org/docs/CodingStandards.html#ll_iostream
>
> +#include <set>
>
> You're probably going to want to use ImmutableSet, since its a
> functional data structure (which works nicely when you embed sets in
> state objects).  It also doesn't have the malloc traffic that
> std::set<> does.
>
> +class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
> +public:
> +  Range(const llvm::APSInt &from, const llvm::APSInt &to)
> +    : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
> +    std::cerr << "Range(" << from.toString(10) << ", " <<
> to.toString(10)
> +              << ')' << std::endl;
> +    assert(from <= to);
> +  }
> +  bool Includes(const llvm::APSInt &v) const {
> +    return first <= v && v <= second;
> +  }
> +  const llvm::APSInt &From() const {
> +    return first;
> +  }
> +  const llvm::APSInt &To() const {
> +    return second;
> +  }
>
> Cute.  Does a Range represent "[a, b]" or "(a, b)".  Please document
> this with comments.
>
> I also take it that ranges are only over concrete values?  e.g., [0,
> 2].  What about ranges where:
>
> (1) one of the ends is "symbolic", e.g. [a, 2]
> (2) one of the ends is Unknown, e.g. [4, unknown].
>
> You don't have to model (1) now, but I think you probably will need to
> model (2).  Frequently we have a lower bound but no upper bound
> (although I imagine you could use maxint, minint, etc.).
>
> Note that this approach will inherently not cover relationships
> between symbolic values, e.g., tracking if 'a > b' when the values of
> 'a' and 'b' are symbolic.
>
> +struct RangeCmp {
> +  bool operator()(const Range &r1, const Range &r2) {
> +    if(r1.From() < r2.From()) {
> +      assert(!r1.Includes(r2.From()));
> +      assert(!r2.Includes(r1.To()));
> +      return true;
> +    } else if(r1.From() > r2.From()) {
> +      assert(!r1.Includes(r2.To()));
> +      assert(!r2.Includes(r1.From()));
> +      return false;
> +    } else
> +      assert(!"Ranges should never be equal in the same set");
> +  }
> +};
>
> I don't this will work.  The 'else' branch can be easily triggered if
> ranges overlap.  Also, is there a reason you are using operator()
> instead of operator>() and operator<()?
>
> +
> +typedef std::set<Range, RangeCmp> PrimRangeSet;
>
> I think you will want to use ImmutableSet.
>
> +
> +class RangeSet;
> +std::ostream &operator<<(std::ostream &os, const RangeSet &r);
>
> Don't use std::ostream.  Use llvm::raw_ostream&.
>
> +
> +class RangeSet {
> +  PrimRangeSet ranges;
> +  bool noValues;  // if true, no value is possible
>
> My meta-level comment about this class is that isn't clear what it
> represents.  The code afterwards also seems not sure of what
> assumptions to make about RangeSet.  It seems that RangeSet represents
> a set of (non-overlapping) ranges, e.g. which represent the union of
> the possible values for some symbolic integer.  In other words, it
> represents a disjunction:
>
> (x >= set1_lb && x <= set1_ub) || (x >= set2_lb && x <= set2_ub) || ...
>
> With this formalism it should be much more straightforward to answer
> the implementation questions for the methods below.  It will also tell
> you what is the representational power of this constraint model.
>
> Here "lb" and "ub" mean "lower bound" and "upper bound" respectively.
>
> +
> +  static const llvm::APSInt Max(const llvm::APSInt &v) {
> +    return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
> +  }
> +  static const llvm::APSInt Min(const llvm::APSInt &v) {
> +     return llvm::APSInt::getMinValue(v.getBitWidth(), v.isUnsigned());
> +  }
> +  static const llvm::APSInt One(const llvm::APSInt &v) {
> +    return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),
> v.isUnsigned());
> +  }
> +
> +public:
> +  RangeSet() : noValues(false) {
> +  }
> +  void Profile(llvm::FoldingSetNodeID &ID) const {
> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> +      i->Profile(ID);
> +    }
> +  }
>
> This kind of Profile implementation won't be necessary if you use
> ImmutableSet.  It will do it for you. Moreover, with ImmutableSet, two
> sets (created using the same ImmutableSet::Factory) will always be
> referentially equivalent, meaning they are a quick pointer
> comparison.  ImmutableSets also use techniques similar to hash-consing
> to share data between sets.
>
> +
> +  bool CouldBeNE(const llvm::APSInt &ne) const {
> +    std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<
> std::endl;
>
> Please don't have debugging messages like this enabled by default.
> Either comment it out or #ifdef it.
>
> +    // FIXME: do I mean this? empty means all values are possible,
> therefore != possible, right?
> +    if(ranges.empty())
> +      return false;
>
> I think to answer the questions of these FIXMEs you just need to
> define what the invariants are for PrimRangeSet and what it represents.
>
> +    // FIXME: if there are no possible values, then can't be !=,
> surely?
> +    if(noValues)
> +      return true;
> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i)
> +      if(i->Includes(ne))
> +        return false;
> +    return true;
> +  }
>
> +
> +  bool CouldBeEQ(const llvm::APSInt &eq) const {
> +    std::cerr << "CouldBeEQ(" << eq.toString(10) << ") " << *this <<
> std::endl;
> +    if(ranges.empty())
> +      return true;
> +    // FIXME: if there are no possible values, then can't be ==,
> surely?
> +    if(noValues)
> +      return true;
> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i)
> +      if(i->Includes(eq))
> +        return true;
> +    return false;
> +  }
>
> Same comments with CouldBeNE, CouldBeLT, etc..
>
> ....
>
> +  // Make all existing ranges fall within this new range
> +  void Restrict(const llvm::APSInt &from, const llvm::APSInt &to) {
>
> The comment for this method is a little too terse for me to understand
> your algorithm.  What is it trying to do?  It also looks like you are
> modifying the RangeSet object in place.  In the static analyzer, we
> use functional data structures that preserve old data so that each
> GRState object (which represents the analysis state at a particular
> point within a path) never have their values mutated after the GRState
> object is created.  This allows us to faithfully keep full path
> information.  This allows us to report full path diagnostics when
> reporting bugs but also (from a correctness standpoint) allows path
> caching/merging to work at all.
>
> Without making comments on the rest of the patch (which looks similar
> to the other parts), I think there just needs to be a high-level
> comment of what is the algorithmic model for the constraints, how they
> are resolved, what is represented, etc.  Right now it is difficult for
> me to evaluate the approach itself since I don't really understand the
> algorithm.  Also, I think that the implementation itself should come
> in a straightforward manner once a few key methods (e.g., "CouldBeEQ")
> get fleshed out.  I'm more than happy to help with this process.
>
> Ted
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>