[cfe-commits] Patch to do static analysis with range sets

Sun Feb 8 07:04:35 PST 2009

On Thu, Feb 5, 2009 at 8:20 PM, Daniel Dunbar <daniel at zuster.org> wrote:
> FWIW, it may be worth pointing out the existence of
> llvm/include/llvm/Support/ConstantRange.h
>
> Of course that isn't going to deal with symbolic values...

Nor signed values. Nor does it seem to work usefully for this
application (e.g. intersect doesn't actually produce an intersection,
for some reason).

>
>  - Daniel
>
> On Thu, Feb 5, 2009 at 3:55 PM, Ted Kremenek <kremenek at apple.com> wrote:
>>
>> On Feb 1, 2009, at 12:06 AM, Ben Laurie wrote:
>>
>>> Not complete - no self-tests and full of diagnostics, but I'd love to
>>> get comments about the approach. Also hints on implementing tests (Ted
>>> told me in IM, but I forgot), and how to preserve/add more debugging
>>> stuff in an approved way, since I find that this stuff hurts my head
>>> without it...
>>>
>>> Also, I have cases which fail because higher-level stuff seems to be
>>> doing the wrong thing, but not sure how to go about debugging the rest
>>> of static analysis...
>>> <clang.patch>_______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>> Hi Ben,
>>
>> Sorry for the delay.  Comments inline:
>>
>> Index: lib/Analysis/BasicConstraintManager.cpp
>> ===================================================================
>> --- lib/Analysis/BasicConstraintManager.cpp     (revision 63488)
>> +++ lib/Analysis/BasicConstraintManager.cpp     (working copy)
>> @@ -17,29 +17,364 @@
>>  #include "clang/Analysis/PathSensitive/GRStateTrait.h"
>>  #include "clang/Analysis/PathSensitive/GRTransferFuncs.h"
>>  #include "llvm/Support/Compiler.h"
>> +#include "llvm/ADT/FoldingSet.h"
>>  #include "llvm/Support/raw_ostream.h"
>>
>> +#include <iostream>
>>
>> We try and avoid <iostream>.  For my reasons why, see the following in
>> the LLVM coding standards:
>>
>> http://llvm.org/docs/CodingStandards.html#ll_iostream
>>
>> +#include <set>
>>
>> You're probably going to want to use ImmutableSet, since its a
>> functional data structure (which works nicely when you embed sets in
>> state objects).  It also doesn't have the malloc traffic that
>> std::set<> does.
>>
>> +class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
>> +public:
>> +  Range(const llvm::APSInt &from, const llvm::APSInt &to)
>> +    : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
>> +    std::cerr << "Range(" << from.toString(10) << ", " <<
>> to.toString(10)
>> +              << ')' << std::endl;
>> +    assert(from <= to);
>> +  }
>> +  bool Includes(const llvm::APSInt &v) const {
>> +    return first <= v && v <= second;
>> +  }
>> +  const llvm::APSInt &From() const {
>> +    return first;
>> +  }
>> +  const llvm::APSInt &To() const {
>> +    return second;
>> +  }
>>
>> Cute.  Does a Range represent "[a, b]" or "(a, b)".  Please document
>> this with comments.
>>
>> I also take it that ranges are only over concrete values?  e.g., [0,
>> 2].  What about ranges where:
>>
>> (1) one of the ends is "symbolic", e.g. [a, 2]
>> (2) one of the ends is Unknown, e.g. [4, unknown].
>>
>> You don't have to model (1) now, but I think you probably will need to
>> model (2).  Frequently we have a lower bound but no upper bound
>> (although I imagine you could use maxint, minint, etc.).
>>
>> Note that this approach will inherently not cover relationships
>> between symbolic values, e.g., tracking if 'a > b' when the values of
>> 'a' and 'b' are symbolic.
>>
>> +struct RangeCmp {
>> +  bool operator()(const Range &r1, const Range &r2) {
>> +    if(r1.From() < r2.From()) {
>> +      assert(!r1.Includes(r2.From()));
>> +      assert(!r2.Includes(r1.To()));
>> +      return true;
>> +    } else if(r1.From() > r2.From()) {
>> +      assert(!r1.Includes(r2.To()));
>> +      assert(!r2.Includes(r1.From()));
>> +      return false;
>> +    } else
>> +      assert(!"Ranges should never be equal in the same set");
>> +  }
>> +};
>>
>> I don't this will work.  The 'else' branch can be easily triggered if
>> ranges overlap.  Also, is there a reason you are using operator()
>> instead of operator>() and operator<()?
>>
>> +
>> +typedef std::set<Range, RangeCmp> PrimRangeSet;
>>
>> I think you will want to use ImmutableSet.
>>
>> +
>> +class RangeSet;
>> +std::ostream &operator<<(std::ostream &os, const RangeSet &r);
>>
>> Don't use std::ostream.  Use llvm::raw_ostream&.
>>
>> +
>> +class RangeSet {
>> +  PrimRangeSet ranges;
>> +  bool noValues;  // if true, no value is possible
>>
>> My meta-level comment about this class is that isn't clear what it
>> represents.  The code afterwards also seems not sure of what
>> assumptions to make about RangeSet.  It seems that RangeSet represents
>> a set of (non-overlapping) ranges, e.g. which represent the union of
>> the possible values for some symbolic integer.  In other words, it
>> represents a disjunction:
>>
>> (x >= set1_lb && x <= set1_ub) || (x >= set2_lb && x <= set2_ub) || ...
>>
>> With this formalism it should be much more straightforward to answer
>> the implementation questions for the methods below.  It will also tell
>> you what is the representational power of this constraint model.
>>
>> Here "lb" and "ub" mean "lower bound" and "upper bound" respectively.
>>
>> +
>> +  static const llvm::APSInt Max(const llvm::APSInt &v) {
>> +    return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
>> +  }
>> +  static const llvm::APSInt Min(const llvm::APSInt &v) {
>> +     return llvm::APSInt::getMinValue(v.getBitWidth(), v.isUnsigned());
>> +  }
>> +  static const llvm::APSInt One(const llvm::APSInt &v) {
>> +    return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),
>> v.isUnsigned());
>> +  }
>> +
>> +public:
>> +  RangeSet() : noValues(false) {
>> +  }
>> +  void Profile(llvm::FoldingSetNodeID &ID) const {
>> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
>> ranges.end() ; ++i) {
>> +      i->Profile(ID);
>> +    }
>> +  }
>>
>> This kind of Profile implementation won't be necessary if you use
>> ImmutableSet.  It will do it for you. Moreover, with ImmutableSet, two
>> sets (created using the same ImmutableSet::Factory) will always be
>> referentially equivalent, meaning they are a quick pointer
>> comparison.  ImmutableSets also use techniques similar to hash-consing
>> to share data between sets.
>>
>> +
>> +  bool CouldBeNE(const llvm::APSInt &ne) const {
>> +    std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<
>> std::endl;
>>
>> Please don't have debugging messages like this enabled by default.
>> Either comment it out or #ifdef it.
>>
>> +    // FIXME: do I mean this? empty means all values are possible,
>> therefore != possible, right?
>> +    if(ranges.empty())
>> +      return false;
>>
>> I think to answer the questions of these FIXMEs you just need to
>> define what the invariants are for PrimRangeSet and what it represents.
>>
>> +    // FIXME: if there are no possible values, then can't be !=,
>> surely?
>> +    if(noValues)
>> +      return true;
>> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
>> ranges.end() ; ++i)
>> +      if(i->Includes(ne))
>> +        return false;
>> +    return true;
>> +  }
>>
>> +
>> +  bool CouldBeEQ(const llvm::APSInt &eq) const {
>> +    std::cerr << "CouldBeEQ(" << eq.toString(10) << ") " << *this <<
>> std::endl;
>> +    if(ranges.empty())
>> +      return true;
>> +    // FIXME: if there are no possible values, then can't be ==,
>> surely?
>> +    if(noValues)
>> +      return true;
>> +    for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
>> ranges.end() ; ++i)
>> +      if(i->Includes(eq))
>> +        return true;
>> +    return false;
>> +  }
>>
>> Same comments with CouldBeNE, CouldBeLT, etc..
>>
>> ....
>>
>> +  // Make all existing ranges fall within this new range
>> +  void Restrict(const llvm::APSInt &from, const llvm::APSInt &to) {
>>
>> The comment for this method is a little too terse for me to understand
>> your algorithm.  What is it trying to do?  It also looks like you are
>> modifying the RangeSet object in place.  In the static analyzer, we
>> use functional data structures that preserve old data so that each
>> GRState object (which represents the analysis state at a particular
>> point within a path) never have their values mutated after the GRState
>> object is created.  This allows us to faithfully keep full path
>> information.  This allows us to report full path diagnostics when
>> reporting bugs but also (from a correctness standpoint) allows path
>> caching/merging to work at all.
>>
>> Without making comments on the rest of the patch (which looks similar
>> to the other parts), I think there just needs to be a high-level
>> comment of what is the algorithmic model for the constraints, how they
>> are resolved, what is represented, etc.  Right now it is difficult for
>> me to evaluate the approach itself since I don't really understand the
>> algorithm.  Also, I think that the implementation itself should come
>> in a straightforward manner once a few key methods (e.g., "CouldBeEQ")
>> get fleshed out.  I'm more than happy to help with this process.
>>
>> Ted
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>