[cfe-commits] Patch to do static analysis with range sets
Daniel Dunbar
daniel at zuster.org
Thu Feb 5 17:20:51 PST 2009
FWIW, it may be worth pointing out the existence of
llvm/include/llvm/Support/ConstantRange.h
Of course that isn't going to deal with symbolic values...
- Daniel
On Thu, Feb 5, 2009 at 3:55 PM, Ted Kremenek <kremenek at apple.com> wrote:
>
> On Feb 1, 2009, at 12:06 AM, Ben Laurie wrote:
>
>> Not complete - no self-tests and full of diagnostics, but I'd love to
>> get comments about the approach. Also hints on implementing tests (Ted
>> told me in IM, but I forgot), and how to preserve/add more debugging
>> stuff in an approved way, since I find that this stuff hurts my head
>> without it...
>>
>> Also, I have cases which fail because higher-level stuff seems to be
>> doing the wrong thing, but not sure how to go about debugging the rest
>> of static analysis...
>> <clang.patch>_______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
> Hi Ben,
>
> Sorry for the delay. Comments inline:
>
> Index: lib/Analysis/BasicConstraintManager.cpp
> ===================================================================
> --- lib/Analysis/BasicConstraintManager.cpp (revision 63488)
> +++ lib/Analysis/BasicConstraintManager.cpp (working copy)
> @@ -17,29 +17,364 @@
> #include "clang/Analysis/PathSensitive/GRStateTrait.h"
> #include "clang/Analysis/PathSensitive/GRTransferFuncs.h"
> #include "llvm/Support/Compiler.h"
> +#include "llvm/ADT/FoldingSet.h"
> #include "llvm/Support/raw_ostream.h"
>
> +#include <iostream>
>
> We try and avoid <iostream>. For my reasons why, see the following in
> the LLVM coding standards:
>
> http://llvm.org/docs/CodingStandards.html#ll_iostream
>
> +#include <set>
>
> You're probably going to want to use ImmutableSet, since its a
> functional data structure (which works nicely when you embed sets in
> state objects). It also doesn't have the malloc traffic that
> std::set<> does.
>
> +class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
> +public:
> + Range(const llvm::APSInt &from, const llvm::APSInt &to)
> + : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
> + std::cerr << "Range(" << from.toString(10) << ", " <<
> to.toString(10)
> + << ')' << std::endl;
> + assert(from <= to);
> + }
> + bool Includes(const llvm::APSInt &v) const {
> + return first <= v && v <= second;
> + }
> + const llvm::APSInt &From() const {
> + return first;
> + }
> + const llvm::APSInt &To() const {
> + return second;
> + }
>
> Cute. Does a Range represent "[a, b]" or "(a, b)". Please document
> this with comments.
>
> I also take it that ranges are only over concrete values? e.g., [0,
> 2]. What about ranges where:
>
> (1) one of the ends is "symbolic", e.g. [a, 2]
> (2) one of the ends is Unknown, e.g. [4, unknown].
>
> You don't have to model (1) now, but I think you probably will need to
> model (2). Frequently we have a lower bound but no upper bound
> (although I imagine you could use maxint, minint, etc.).
>
> Note that this approach will inherently not cover relationships
> between symbolic values, e.g., tracking if 'a > b' when the values of
> 'a' and 'b' are symbolic.
>
> +struct RangeCmp {
> + bool operator()(const Range &r1, const Range &r2) {
> + if(r1.From() < r2.From()) {
> + assert(!r1.Includes(r2.From()));
> + assert(!r2.Includes(r1.To()));
> + return true;
> + } else if(r1.From() > r2.From()) {
> + assert(!r1.Includes(r2.To()));
> + assert(!r2.Includes(r1.From()));
> + return false;
> + } else
> + assert(!"Ranges should never be equal in the same set");
> + }
> +};
>
> I don't this will work. The 'else' branch can be easily triggered if
> ranges overlap. Also, is there a reason you are using operator()
> instead of operator>() and operator<()?
>
> +
> +typedef std::set<Range, RangeCmp> PrimRangeSet;
>
> I think you will want to use ImmutableSet.
>
> +
> +class RangeSet;
> +std::ostream &operator<<(std::ostream &os, const RangeSet &r);
>
> Don't use std::ostream. Use llvm::raw_ostream&.
>
> +
> +class RangeSet {
> + PrimRangeSet ranges;
> + bool noValues; // if true, no value is possible
>
> My meta-level comment about this class is that isn't clear what it
> represents. The code afterwards also seems not sure of what
> assumptions to make about RangeSet. It seems that RangeSet represents
> a set of (non-overlapping) ranges, e.g. which represent the union of
> the possible values for some symbolic integer. In other words, it
> represents a disjunction:
>
> (x >= set1_lb && x <= set1_ub) || (x >= set2_lb && x <= set2_ub) || ...
>
> With this formalism it should be much more straightforward to answer
> the implementation questions for the methods below. It will also tell
> you what is the representational power of this constraint model.
>
> Here "lb" and "ub" mean "lower bound" and "upper bound" respectively.
>
> +
> + static const llvm::APSInt Max(const llvm::APSInt &v) {
> + return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
> + }
> + static const llvm::APSInt Min(const llvm::APSInt &v) {
> + return llvm::APSInt::getMinValue(v.getBitWidth(), v.isUnsigned());
> + }
> + static const llvm::APSInt One(const llvm::APSInt &v) {
> + return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),
> v.isUnsigned());
> + }
> +
> +public:
> + RangeSet() : noValues(false) {
> + }
> + void Profile(llvm::FoldingSetNodeID &ID) const {
> + for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> + i->Profile(ID);
> + }
> + }
>
> This kind of Profile implementation won't be necessary if you use
> ImmutableSet. It will do it for you. Moreover, with ImmutableSet, two
> sets (created using the same ImmutableSet::Factory) will always be
> referentially equivalent, meaning they are a quick pointer
> comparison. ImmutableSets also use techniques similar to hash-consing
> to share data between sets.
>
> +
> + bool CouldBeNE(const llvm::APSInt &ne) const {
> + std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<
> std::endl;
>
> Please don't have debugging messages like this enabled by default.
> Either comment it out or #ifdef it.
>
> + // FIXME: do I mean this? empty means all values are possible,
> therefore != possible, right?
> + if(ranges.empty())
> + return false;
>
> I think to answer the questions of these FIXMEs you just need to
> define what the invariants are for PrimRangeSet and what it represents.
>
> + // FIXME: if there are no possible values, then can't be !=,
> surely?
> + if(noValues)
> + return true;
> + for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i)
> + if(i->Includes(ne))
> + return false;
> + return true;
> + }
>
> +
> + bool CouldBeEQ(const llvm::APSInt &eq) const {
> + std::cerr << "CouldBeEQ(" << eq.toString(10) << ") " << *this <<
> std::endl;
> + if(ranges.empty())
> + return true;
> + // FIXME: if there are no possible values, then can't be ==,
> surely?
> + if(noValues)
> + return true;
> + for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i)
> + if(i->Includes(eq))
> + return true;
> + return false;
> + }
>
> Same comments with CouldBeNE, CouldBeLT, etc..
>
> ....
>
> + // Make all existing ranges fall within this new range
> + void Restrict(const llvm::APSInt &from, const llvm::APSInt &to) {
>
> The comment for this method is a little too terse for me to understand
> your algorithm. What is it trying to do? It also looks like you are
> modifying the RangeSet object in place. In the static analyzer, we
> use functional data structures that preserve old data so that each
> GRState object (which represents the analysis state at a particular
> point within a path) never have their values mutated after the GRState
> object is created. This allows us to faithfully keep full path
> information. This allows us to report full path diagnostics when
> reporting bugs but also (from a correctness standpoint) allows path
> caching/merging to work at all.
>
> Without making comments on the rest of the patch (which looks similar
> to the other parts), I think there just needs to be a high-level
> comment of what is the algorithmic model for the constraints, how they
> are resolved, what is represented, etc. Right now it is difficult for
> me to evaluate the approach itself since I don't really understand the
> algorithm. Also, I think that the implementation itself should come
> in a straightforward manner once a few key methods (e.g., "CouldBeEQ")
> get fleshed out. I'm more than happy to help with this process.
>
> Ted
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
More information about the cfe-commits
mailing list