[cfe-commits] RangeConstraintManager
Ted Kremenek
kremenek at apple.com
Thu Feb 12 14:29:03 PST 2009
On Feb 12, 2009, at 5:15 AM, Ben Laurie wrote:
> Once the other patches are applied, then this file should work just
> fine. Still has debugging messages in and still no switch to enable
> it. Will fix both soon.
> <
> RangeConstraintManager
> .cpp>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
Hi Ben,
This is looking good! One meta-level coding style comment I thought
I'd mention first concerns if/for statements. Please write:
if (condition)
instead of
if(condition)
It's a minor thing (it's an LLVM convention), but it's good for
consistency. Same thing for loops.
Other comments inline!
>
> #include <iostream>
Please don't use iostream (if you can help it). Not only does it's
performance suck, but it slows down startup time:
http://llvm.org/docs/CodingStandards.html#ll_iostream
> #include <set>
This #include is no longer needed.
> using namespace clang;
>
> namespace { class VISIBILITY_HIDDEN ConstRange {}; }
>
> static int ConstRangeIndex = 0;
>
> /* A Range represents the closed range [from, to].
> * The caller must guarantee that from <= to.
> */
Please use C++ style comments (preferably doxygen style comments)
instead of C-style comments.
> class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
> public:
> Range(const llvm::APSInt &from, const llvm::APSInt &to)
> : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
> assert(from <= to);
> }
> bool Includes(const llvm::APSInt &v) const {
> return first <= v && v <= second;
> }
> const llvm::APSInt &From() const {
> return first;
> }
> const llvm::APSInt &To() const {
> return second;
> }
> const llvm::APSInt *HasConcreteValue() const {
> return From() == To() ? &From() : NULL;
> }
>
> void Profile(llvm::FoldingSetNodeID &ID) const {
> From().Profile(ID);
> To().Profile(ID);
> }
> };
Looks good. I would also add a comment to 'Range' indicating that it
is immutable (just to make it super clear). Its great that all the
methods are const. You can also probably do private inheritance from
std::pair<> so that 'first' and 'second' are not publicly accessible.
>
> struct RangeCmp {
> bool operator()(const Range &r1, const Range &r2) {
> if(r1.From() < r2.From()) {
> assert(!r1.Includes(r2.From()));
> assert(!r2.Includes(r1.To()));
> return true;
> } else if(r1.From() > r2.From()) {
> assert(!r1.Includes(r2.To()));
> assert(!r2.Includes(r1.From()));
> return false;
> } else
> assert(!"Ranges should never be equal in the same set");
> }
> };
Nothing uses this anymore.
> typedef llvm::ImmutableSet<Range> PrimRangeSet;
Looks good. Note that ImmutableSet uses the trait class
'ImutContainerInfo' to compare items in the set. The default
implementation of ImutContainerInfo uses std::less to compare items.
More info in ImmutableSet.h (the default trait implementations are at
the bottom of the file, right before the declaration of 'class
ImmutableSet'. If things are working correctly, you might need to
migrate your logic from RangeCmp over a ImutContainerInfo<Range> class
>
> class RangeSet;
> std::ostream &operator<<(std::ostream &os, const RangeSet &r);
Please use llvm::cerr or llvm::raw_ostream instead. There are pieces
of the analysis library that still use ostream; these are getting
gradualyl replaced with llvm::raw_ostream.
>
> /*
> * A RangeSet contains a set of ranges. If the set is empty, then
> * noValues -> Nothing matches.
> * !noValues -> Everything (in range of the bit representation)
> matches.
> */
C++ style comments (see other code throughout the codebase for the
convention). The 'AST' library is particularly well document (e.g.,
'include/clang/AST/Stmt.h'). I'll admit that much of the analyzer
code isn't as well documented as it should be. I'm not trying to be a
hypocrite; that code needs to be fixed as well.
> class RangeSet {
> PrimRangeSet ranges; // no need to make const, since it is an
> // ImmutableSet - this allows default operator=
> // to work.
> bool noValues; // if true, no value is possible (should never
> happen)
> static PrimRangeSet::Factory factory;
Please don't use a static member variable. I see this as the same
vice as using a global variable.
The factory object's lifetime should be constrained by the lifetime of
the enclosing RangeConstraintManager object. Making it static also
means that multiple RangeConstraintManagers will see the same factory
object, which shouldn't be the case (at least at this time). I know
this means the code might need to be reorganized a little, but
probably a better spot for the factory object is in
RangeConstraintManager as a normal member variable.
If you want to bring the factory object around, consider a design
similar to GRState and GRStateRef. GRStateRef represents a "fat
reference" that brings around the GRStateManager object for use with
GRState methods, and GRState just represents the raw data.
>
> static const llvm::APSInt Max(const llvm::APSInt &v) {
> return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
> }
> static const llvm::APSInt Min(const llvm::APSInt &v) {
> return llvm::APSInt::getMinValue(v.getBitWidth(),
> v.isUnsigned());
> }
> static const llvm::APSInt One(const llvm::APSInt &v) {
> return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),
> v.isUnsigned());
> }
I suppose these methods could be declared as const. Actually, it
might be worth just putting these into APSInt directly. It seems like
something that other people could use.
>
> public:
> RangeSet() : ranges(factory.GetEmptySet()),
> noValues(false) {
> }
Please add a comment explaining the comments of the default constructor.
> // Note that if the empty set is passed, then there are no
> possible values.
> // To create a RangeSet that covers all values, use the other
> constructor.
Is the "other construct" the default constructor? There are three
constructors.
> RangeSet(const PrimRangeSet &r) : ranges(r), noValues(r.isEmpty()) {
> }
> // Allow an empty set to be passed without setting noValues
This comment is a little cryptic. Isn't 'noValues' an implementation
detail?
> RangeSet(const PrimRangeSet &r, bool n) : ranges(r), noValues(n) {
> assert(!n);
> }
> void Profile(llvm::FoldingSetNodeID &ID) const {
> ranges.Profile(ID);
> ID.AddBoolean(noValues);
> }
Looks great.
>
> const llvm::APSInt *HasConcreteValue() const {
> if(!ranges.isSingleton())
> return NULL;
> return ranges.begin()->HasConcreteValue();
> }
Looks great. I'm a little concerned about the assumptions clients can
make on the returned "llvm::APSInt*" object. It's lifetime is bound
to the lifetime of the Range object (which is bound to the
ImmutableSet). In many cases through the analyzer with use the
"BasicValueManager" to unique APSInt objects so that they can be
compared using direct pointer comparison. It also takes care of some
weird memory management issues.
By default, the destructors of the objects allocated by an
ImmutableSet::Factory are not called. APSInt actually can 'malloc()'
memory, meaning that if their destructors aren't called then this
memory will just get leaked. Using BasicValueManager solves this
problem since it manually calls the destructors of the uniqued APSInt
objects.
>
> bool CouldBeNE(const llvm::APSInt &ne) const {
> std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<
> std::endl;
> assert(!noValues);
> if(ranges.isEmpty())
> return true;
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i)
> if(i->Includes(ne))
> return false;
> return true;
> }
Just for my education, what is the purpose of 'noValues'? A small
comment here would also be useful. I gather the idea that this method
returns true if the set of values represented by the range set *may*
be less than 'ne'. If that is the case, please put that as a comment
above the function.
I see that most of the rest of the CouldBeXXX functions is just
boilerplate. Looks good for starters (i.e., some obvious FIXME's are
in place). You may also consider adding assertions (i.e., assert(0
&& "Not implemented") for the CouldBeXXX methods that are incomplete.'
>
> // Make all existing ranges fall within this new range
> RangeSet Restrict(const llvm::APSInt &from, const llvm::APSInt
> &to) {
> if(ranges.isEmpty())
> return factory.Add(ranges, Range(from, to));;
Looks good.
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> if(i->Includes(from)) {
> if(i->Includes(to)) {
> newRanges = factory.Add(newRanges, Range(from, to));
> } else {
> newRanges = factory.Add(newRanges, Range(from, i->To()));
> }
> } else if(i->Includes(to)) {
> newRanges = factory.Add(newRanges, Range(i->From(), to));
> }
> }
> return RangeSet(newRanges);
> }
Quick question: so the idea of 'Restrict' is to do a projection of
what ranges are possible (within the given range)? Looks great to me.
BTW, it looks like 'RangeSet' is basically POD and functional. Is it
possible to make 'Restrict' a const?
>
> RangeSet AddEQ(const llvm::APSInt &eq) {
> std::cerr << "AddEQ(" << eq.toString(10) << ") " << *this << " -
> > ";
> assert(CouldBeEQ(eq));
> RangeSet r(factory.Add(factory.GetEmptySet(), Range(eq, eq)));
> std::cerr << r << std::endl;
> return r;
> }
I see, this adds the constraint that the range is equal to the given
value. Could you add this as a comment to the method? BTW, is the
precondition for this method that 'CouldBeEQ' be checked *before*
calling this method?
An alternate design is to allow the client to just add constraints,
and have a special RangeSet representing "no feasible value" (I think
this is what your 'noValues' variable is for). Methods like 'AddEQ'
could then call 'CouldBeEq' first to determine if the condition is
feasible and if it isn't return the special RangeSet for "no feasible
value".
>
> RangeSet AddNE(const llvm::APSInt &ne) {
> std::cerr << "AddNE(" << ne.toString(10) << ") " << *this << " -
> > ";
>
> const llvm::APSInt max = Max(ne);
> const llvm::APSInt min = Min(ne);
> const llvm::APSInt one = One(ne);
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> if(ranges.isEmpty()) {
> if(ne != max)
> newRanges = factory.Add(newRanges, Range(ne + one, max));
> if(ne != min)
> newRanges = factory.Add(newRanges, Range(min, ne - one));
> RangeSet r(newRanges);
> std::cerr << r << std::endl;
> return r;
> }
Looks good.
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> if(i->Includes(ne)) {
> if(ne != i->From())
> newRanges = factory.Add(newRanges, Range(i->From(), ne -
> one));
> if(ne != i->To())
> newRanges = factory.Add(newRanges, Range(ne + one, i-
> >To()));
> } else {
> newRanges = factory.Add(newRanges, *i);
> }
> }
I'm wondering if the two cases of 'ranges.isEmpty()' and this loop
could be coalesced by representing the range of "all values" with the
explicit range of 'Range(min, max)'. It would probably simplify a
bunch of code.
>
> RangeSet AddLT(const llvm::APSInt <) {
> std::cerr << "AddLT(" << lt.toString(10) << ") " << *this << " -
> > ";
> const llvm::APSInt min = Min(lt);
> const llvm::APSInt one = One(lt);
>
> if(ranges.isEmpty()) {
> PrimRangeSet pr = factory.GetEmptySet();
> if(lt != min)
> pr = factory.Add(pr, Range(min, lt - one));
> RangeSet r(pr, false);
> std::cerr << r << std::endl;
> return r;
> }
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> if(i->Includes(lt) && i->From() < lt)
> newRanges = factory.Add(newRanges, Range(i->From(), lt -
> one));
> else if(i->To() < lt)
> newRanges = factory.Add(newRanges, *i);
> }
> RangeSet r(newRanges);
> std::cerr << r << std::endl;
> return r;
> }
Again, I'm thinking that the two cases of 'ranges.isEmpty()' and the
second case can be combined together. Come to think of it, maybe the
'isEmpty' case could represent the infeasible case? That is, if there
are no ranges in the RangeSet, then no value is feasible. In that
scenario, I don't think you need the CouldBeXXX methods. Just use the
'AddXXX' methods, and if the returned RangeSet is empty then you know
the constraint isn't feasible. That would remove a bunch of
potentially redundant logic.
>
> RangeSet AddLE(const llvm::APSInt &le) {
> std::cerr << "AddLE(" << le.toString(10) << ") " << *this << " -
> > ";
> const llvm::APSInt min = Min(le);
>
> if(ranges.isEmpty()) {
> RangeSet r(factory.Add(ranges, Range(min, le)));
> std::cerr << r << std::endl;
> return r;
> }
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> // Strictly we should test for includes le + 1, but no harm is
> // done by this formulation
> if(i->Includes(le))
> newRanges = factory.Add(newRanges, Range(i->From(), le));
> else if(i->To() <= le)
> newRanges = factory.Add(newRanges, *i);
> }
> RangeSet r(newRanges);
> std::cerr << r << std::endl;
> return r;
> }
>
> RangeSet AddGT(const llvm::APSInt >) {
> std::cerr << "AddGT(" << gt.toString(10) << ") " << *this << " -
> > ";
> const llvm::APSInt max = Max(gt);
> const llvm::APSInt one = One(gt);
>
> if(ranges.isEmpty()) {
> RangeSet r(factory.Add(ranges, Range(gt + one, max)));
> std::cerr << r << std::endl;
> return r;
> }
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> if(i->Includes(gt) && i->To() > gt)
> newRanges = factory.Add(newRanges, Range(gt + one, i->To()));
> else if(i->From() > gt)
> newRanges = factory.Add(newRanges, *i);
> }
> RangeSet r(newRanges);
> std::cerr << r << std::endl;
> return r;
> }
>
> RangeSet AddGE(const llvm::APSInt &ge) {
> std::cerr << "AddGE(" << ge.toString(10) << ") " << *this << " -
> > ";
> const llvm::APSInt max = Max(ge);
>
> if(ranges.isEmpty()) {
> RangeSet r(factory.Add(ranges, Range(ge, max)));
> std::cerr << r << std::endl;
> return r;
> }
>
> PrimRangeSet newRanges = factory.GetEmptySet();
>
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> // Strictly we should test for includes ge - 1, but no harm is
> // done by this formulation
> if(i->Includes(ge))
> newRanges = factory.Add(newRanges, Range(ge, i->To()));
> else if(i->From() >= ge)
> newRanges = factory.Add(newRanges, *i);
> }
>
> RangeSet r(newRanges);
> std::cerr << r << std::endl;
> return r;
> }
>
> void Print(std::ostream &os) const {
> os << "{ ";
> if(noValues) {
> os << "**no values** }";
> return;
> }
> for(PrimRangeSet::iterator i = ranges.begin() ; i !=
> ranges.end() ; ++i) {
> if(i != ranges.begin())
> os << ", ";
> os << '[' << i->From().toString(10) << ", " << i-
> >To().toString(10)
> << ']';
> }
> os << " }";
>
> }
> bool operator==(const RangeSet &other) const {
> return ranges == other.ranges;
> }
> };
>
> PrimRangeSet::Factory RangeSet::factory;
>
> std::ostream &operator<<(std::ostream &os, const RangeSet &r) {
> r.Print(os);
> return os;
> }
>
> typedef llvm::ImmutableMap<SymbolRef,RangeSet> ConstRangeTy;
>
> namespace clang {
> template<>
> struct GRStateTrait<ConstRange> : public
> GRStatePartialTrait<ConstRangeTy> {
> static inline void* GDMIndex() { return &ConstRangeIndex; }
> };
> }
>
> namespace {
> class VISIBILITY_HIDDEN RangeConstraintManager : public
> SimpleConstraintManager {
> public:
> RangeConstraintManager(GRStateManager& statemgr)
> : SimpleConstraintManager(statemgr) {}
>
> const GRState* AssumeSymNE(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool& isFeasible);
>
> const GRState* AssumeSymEQ(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool&
> isFeasible);
>
> const GRState* AssumeSymLT(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool&
> isFeasible);
>
> const GRState* AssumeSymGT(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool& isFeasible);
>
> const GRState* AssumeSymGE(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool& isFeasible);
>
> const GRState* AssumeSymLE(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool& isFeasible);
>
> const GRState* AssumeInBound(const GRState* St, SVal Idx, SVal
> UpperBound,
> bool Assumption, bool& isFeasible);
>
> const GRState* AddEQ(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
>
> const GRState* AddNE(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
>
> const GRState* AddLT(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
>
> const GRState* AddLE(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
>
> const GRState* AddGT(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
>
> const GRState* AddGE(const GRState* St, SymbolRef sym, const
> llvm::APSInt& V);
I'm wondering if methods like 'AddGT' could just be AssumeSymGE? They
basically have the same interface other than the 'isFeasible' flag. I
suppose they should be kept separate for cleanliness, although they
look practically identical.
One possibility is to have methods like 'AddGT' reason about RangeSets
(and only RangeSets) and have AssumeSymGT reason about states. This
would provide a nice clean separation of logic.
>
> // FIXME: these two are required because they are pure virtual, but
> // are they useful with ranges? Neither is used in this file.
> const llvm::APSInt* getSymVal(const GRState* St, SymbolRef sym)
> const;
> bool isEqual(const GRState* St, SymbolRef sym, const llvm::APSInt&
> V) const;
These methods are still meaningful on the context of ranges, although
the clients of these methods should potentially be thought out a
little bit more to determine if these queries are still the right ones.
>
> const GRState*
> RangeConstraintManager::AssumeSymNE(const GRState* St, SymbolRef sym,
> const llvm::APSInt& V, bool&
> isFeasible) {
> isFeasible = CouldBeNE(St, sym, V);
> if (isFeasible)
> return AddNE(St, sym, V);
> return St;
> }
Looks good, but the more I think about it the more I think that
CouldBeNE and AddNE (and friends) can be merged. It would probably be
far more efficient (no double looping through the ranges).
> /// Scan all symbols referenced by the constraints. If the symbol is
> not alive
> /// as marked in LSymbols, mark it as dead in DSymbols.
> const GRState*
> RangeConstraintManager::RemoveDeadBindings(const GRState* St,
> SymbolReaper& SymReaper) {
> GRStateRef state(St, StateMgr);
>
> ConstRangeTy CR = state.get<ConstRange>();
> ConstRangeTy::Factory& CRFactory = state.get_context<ConstRange>();
>
> for (ConstRangeTy::iterator I = CR.begin(), E = CR.end(); I != E; +
> +I) {
> SymbolRef sym = I.getKey();
> if (SymReaper.maybeDead(sym)) CR = CRFactory.Remove(CR, sym);
> }
>
> return state.set<ConstRange>(CR);
> }
Looks good.
>
> void RangeConstraintManager::print(const GRState* St, std::ostream&
> Out,
> const char* nl, const char *sep) {
> /*
> // Print equality constraints.
>
> ConstEqTy CE = St->get<ConstEq>();
>
> if (!CE.isEmpty()) {
> Out << nl << sep << "'==' constraints:";
>
> for (ConstEqTy::iterator I = CE.begin(), E = CE.end(); I!=E; +
> +I) {
> Out << nl << " $" << I.getKey();
> llvm::raw_os_ostream OS(Out);
> OS << " : " << *I.getData();
> }
> }
>
> // Print != constraints.
>
> ConstNotEqTy CNE = St->get<ConstNotEq>();
>
> if (!CNE.isEmpty()) {
> Out << nl << sep << "'!=' constraints:";
>
> for (ConstNotEqTy::iterator I = CNE.begin(), EI = CNE.end(); I!
> =EI; ++I) {
> Out << nl << " $" << I.getKey() << " : ";
> bool isFirst = true;
>
> GRState::IntSetTy::iterator J = I.getData().begin(),
> EJ = I.getData().end();
>
> for ( ; J != EJ; ++J) {
> if (isFirst) isFirst = false;
> else Out << ", ";
>
> Out << (*J)->getSExtValue(); // Hack: should print to
> raw_ostream.
> }
> }
> }
> */
> Out << nl << "Implement range printing";
> }
>
It's fine not to implement this for now. Please use '#if 0' instead
of a block comment so its more apparent that its disabled (that's the
convention). '#if 0' is also safer than nested comments.
More information about the cfe-commits
mailing list