[cfe-commits] Patch to do static analysis with range sets
Ted Kremenek
kremenek at apple.com
Thu Feb 5 15:55:38 PST 2009
On Feb 1, 2009, at 12:06 AM, Ben Laurie wrote:
> Not complete - no self-tests and full of diagnostics, but I'd love to
> get comments about the approach. Also hints on implementing tests (Ted
> told me in IM, but I forgot), and how to preserve/add more debugging
> stuff in an approved way, since I find that this stuff hurts my head
> without it...
>
> Also, I have cases which fail because higher-level stuff seems to be
> doing the wrong thing, but not sure how to go about debugging the rest
> of static analysis...
> <clang.patch>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
Hi Ben,
Sorry for the delay. Comments inline:
Index: lib/Analysis/BasicConstraintManager.cpp
===================================================================
--- lib/Analysis/BasicConstraintManager.cpp (revision 63488)
+++ lib/Analysis/BasicConstraintManager.cpp (working copy)
@@ -17,29 +17,364 @@
#include "clang/Analysis/PathSensitive/GRStateTrait.h"
#include "clang/Analysis/PathSensitive/GRTransferFuncs.h"
#include "llvm/Support/Compiler.h"
+#include "llvm/ADT/FoldingSet.h"
#include "llvm/Support/raw_ostream.h"
+#include <iostream>
We try and avoid <iostream>. For my reasons why, see the following in
the LLVM coding standards:
http://llvm.org/docs/CodingStandards.html#ll_iostream
+#include <set>
You're probably going to want to use ImmutableSet, since its a
functional data structure (which works nicely when you embed sets in
state objects). It also doesn't have the malloc traffic that
std::set<> does.
+class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
+public:
+ Range(const llvm::APSInt &from, const llvm::APSInt &to)
+ : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
+ std::cerr << "Range(" << from.toString(10) << ", " <<
to.toString(10)
+ << ')' << std::endl;
+ assert(from <= to);
+ }
+ bool Includes(const llvm::APSInt &v) const {
+ return first <= v && v <= second;
+ }
+ const llvm::APSInt &From() const {
+ return first;
+ }
+ const llvm::APSInt &To() const {
+ return second;
+ }
Cute. Does a Range represent "[a, b]" or "(a, b)". Please document
this with comments.
I also take it that ranges are only over concrete values? e.g., [0,
2]. What about ranges where:
(1) one of the ends is "symbolic", e.g. [a, 2]
(2) one of the ends is Unknown, e.g. [4, unknown].
You don't have to model (1) now, but I think you probably will need to
model (2). Frequently we have a lower bound but no upper bound
(although I imagine you could use maxint, minint, etc.).
Note that this approach will inherently not cover relationships
between symbolic values, e.g., tracking if 'a > b' when the values of
'a' and 'b' are symbolic.
+struct RangeCmp {
+ bool operator()(const Range &r1, const Range &r2) {
+ if(r1.From() < r2.From()) {
+ assert(!r1.Includes(r2.From()));
+ assert(!r2.Includes(r1.To()));
+ return true;
+ } else if(r1.From() > r2.From()) {
+ assert(!r1.Includes(r2.To()));
+ assert(!r2.Includes(r1.From()));
+ return false;
+ } else
+ assert(!"Ranges should never be equal in the same set");
+ }
+};
I don't this will work. The 'else' branch can be easily triggered if
ranges overlap. Also, is there a reason you are using operator()
instead of operator>() and operator<()?
+
+typedef std::set<Range, RangeCmp> PrimRangeSet;
I think you will want to use ImmutableSet.
+
+class RangeSet;
+std::ostream &operator<<(std::ostream &os, const RangeSet &r);
Don't use std::ostream. Use llvm::raw_ostream&.
+
+class RangeSet {
+ PrimRangeSet ranges;
+ bool noValues; // if true, no value is possible
My meta-level comment about this class is that isn't clear what it
represents. The code afterwards also seems not sure of what
assumptions to make about RangeSet. It seems that RangeSet represents
a set of (non-overlapping) ranges, e.g. which represent the union of
the possible values for some symbolic integer. In other words, it
represents a disjunction:
(x >= set1_lb && x <= set1_ub) || (x >= set2_lb && x <= set2_ub) || ...
With this formalism it should be much more straightforward to answer
the implementation questions for the methods below. It will also tell
you what is the representational power of this constraint model.
Here "lb" and "ub" mean "lower bound" and "upper bound" respectively.
+
+ static const llvm::APSInt Max(const llvm::APSInt &v) {
+ return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
+ }
+ static const llvm::APSInt Min(const llvm::APSInt &v) {
+ return llvm::APSInt::getMinValue(v.getBitWidth(), v.isUnsigned());
+ }
+ static const llvm::APSInt One(const llvm::APSInt &v) {
+ return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),
v.isUnsigned());
+ }
+
+public:
+ RangeSet() : noValues(false) {
+ }
+ void Profile(llvm::FoldingSetNodeID &ID) const {
+ for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
ranges.end() ; ++i) {
+ i->Profile(ID);
+ }
+ }
This kind of Profile implementation won't be necessary if you use
ImmutableSet. It will do it for you. Moreover, with ImmutableSet, two
sets (created using the same ImmutableSet::Factory) will always be
referentially equivalent, meaning they are a quick pointer
comparison. ImmutableSets also use techniques similar to hash-consing
to share data between sets.
+
+ bool CouldBeNE(const llvm::APSInt &ne) const {
+ std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<
std::endl;
Please don't have debugging messages like this enabled by default.
Either comment it out or #ifdef it.
+ // FIXME: do I mean this? empty means all values are possible,
therefore != possible, right?
+ if(ranges.empty())
+ return false;
I think to answer the questions of these FIXMEs you just need to
define what the invariants are for PrimRangeSet and what it represents.
+ // FIXME: if there are no possible values, then can't be !=,
surely?
+ if(noValues)
+ return true;
+ for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
ranges.end() ; ++i)
+ if(i->Includes(ne))
+ return false;
+ return true;
+ }
+
+ bool CouldBeEQ(const llvm::APSInt &eq) const {
+ std::cerr << "CouldBeEQ(" << eq.toString(10) << ") " << *this <<
std::endl;
+ if(ranges.empty())
+ return true;
+ // FIXME: if there are no possible values, then can't be ==,
surely?
+ if(noValues)
+ return true;
+ for(PrimRangeSet::const_iterator i = ranges.begin() ; i !=
ranges.end() ; ++i)
+ if(i->Includes(eq))
+ return true;
+ return false;
+ }
Same comments with CouldBeNE, CouldBeLT, etc..
....
+ // Make all existing ranges fall within this new range
+ void Restrict(const llvm::APSInt &from, const llvm::APSInt &to) {
The comment for this method is a little too terse for me to understand
your algorithm. What is it trying to do? It also looks like you are
modifying the RangeSet object in place. In the static analyzer, we
use functional data structures that preserve old data so that each
GRState object (which represents the analysis state at a particular
point within a path) never have their values mutated after the GRState
object is created. This allows us to faithfully keep full path
information. This allows us to report full path diagnostics when
reporting bugs but also (from a correctness standpoint) allows path
caching/merging to work at all.
Without making comments on the rest of the patch (which looks similar
to the other parts), I think there just needs to be a high-level
comment of what is the algorithmic model for the constraints, how they
are resolved, what is represented, etc. Right now it is difficult for
me to evaluate the approach itself since I don't really understand the
algorithm. Also, I think that the implementation itself should come
in a straightforward manner once a few key methods (e.g., "CouldBeEQ")
get fleshed out. I'm more than happy to help with this process.
Ted
More information about the cfe-commits
mailing list