[cfe-commits] RangeConstraintManager

Thu Feb 12 14:29:03 PST 2009

On Feb 12, 2009, at 5:15 AM, Ben Laurie wrote:

> Once the other patches are applied, then this file should work just
> fine. Still has debugging messages in and still no switch to enable
> it. Will fix both soon.
> <
> RangeConstraintManager
> .cpp>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Hi Ben,

This is looking good!  One meta-level coding style comment I thought  
I'd mention first concerns if/for statements.  Please write:

   if (condition)

instead of

   if(condition)

It's a minor thing (it's an LLVM convention), but it's good for  
consistency.  Same thing for loops.

Other comments inline!

>
> #include <iostream>

Please don't use iostream (if you can help it).  Not only does it's  
performance suck, but it slows down startup time:

http://llvm.org/docs/CodingStandards.html#ll_iostream

> #include <set>

This #include is no longer needed.

> using namespace clang;
>
> namespace { class VISIBILITY_HIDDEN ConstRange {}; }
>
> static int ConstRangeIndex = 0;
>
> /* A Range represents the closed range [from, to].
>  * The caller must guarantee that from <= to.
>  */

Please use C++ style comments (preferably doxygen style comments)  
instead of C-style comments.

> class Range : public std::pair<llvm::APSInt, llvm::APSInt> {
> public:
>   Range(const llvm::APSInt &from, const llvm::APSInt &to)
>     : std::pair<llvm::APSInt, llvm::APSInt>(from, to) {
>     assert(from <= to);
>   }
>   bool Includes(const llvm::APSInt &v) const {
>     return first <= v && v <= second;
>   }
>   const llvm::APSInt &From() const {
>     return first;
>   }
>   const llvm::APSInt &To() const {
>     return second;
>   }
>   const llvm::APSInt *HasConcreteValue() const {
>     return From() == To() ? &From() : NULL;
>   }
>
>   void Profile(llvm::FoldingSetNodeID &ID) const {
>     From().Profile(ID);
>     To().Profile(ID);
>   }
> };

Looks good.  I would also add a comment to 'Range' indicating that it  
is immutable (just to make it super clear).  Its great that all the  
methods are const.  You can also probably do private inheritance from  
std::pair<> so that 'first' and 'second' are not publicly accessible.

>
> struct RangeCmp {
>   bool operator()(const Range &r1, const Range &r2) {
>     if(r1.From() < r2.From()) {
>       assert(!r1.Includes(r2.From()));
>       assert(!r2.Includes(r1.To()));
>       return true;
>     } else if(r1.From() > r2.From()) {
>       assert(!r1.Includes(r2.To()));
>       assert(!r2.Includes(r1.From()));
>       return false;
>     } else
>       assert(!"Ranges should never be equal in the same set");
>   }
> };

Nothing uses this anymore.

> typedef llvm::ImmutableSet<Range> PrimRangeSet;

Looks good.  Note that ImmutableSet uses the trait class  
'ImutContainerInfo' to compare items in the set.  The default  
implementation of ImutContainerInfo uses std::less to compare items.   
More info in ImmutableSet.h (the default trait implementations are at  
the bottom of the file, right before the declaration of 'class  
ImmutableSet'.  If things are working correctly, you might need to  
migrate your logic from RangeCmp over a ImutContainerInfo<Range> class

>
> class RangeSet;
> std::ostream &operator<<(std::ostream &os, const RangeSet &r);

Please use llvm::cerr or llvm::raw_ostream instead.  There are pieces  
of the analysis library that still use ostream; these are getting  
gradualyl replaced with llvm::raw_ostream.

>
> /*
>  * A RangeSet contains a set of ranges. If the set is empty, then
>  *   noValues -> Nothing matches.
>  *  !noValues -> Everything (in range of the bit representation)  
> matches.
>  */

C++ style comments (see other code throughout the codebase for the  
convention).  The 'AST' library is particularly well document (e.g.,  
'include/clang/AST/Stmt.h').  I'll admit that much of the analyzer  
code isn't as well documented as it should be.  I'm not trying to be a  
hypocrite; that code needs to be fixed as well.

> class RangeSet {
>   PrimRangeSet ranges; // no need to make const, since it is an
>                        // ImmutableSet - this allows default operator=
>                        // to work.
>   bool noValues;  // if true, no value is possible (should never  
> happen)
>   static PrimRangeSet::Factory factory;

Please don't use a static member variable.  I see this as the same  
vice as using a global variable.

The factory object's lifetime should be constrained by the lifetime of  
the enclosing RangeConstraintManager object.  Making it static also  
means that multiple RangeConstraintManagers will see the same factory  
object, which shouldn't be the case (at least at this time).  I know  
this means the code might need to be reorganized a little, but  
probably a better spot for the factory object is in  
RangeConstraintManager as a normal member variable.

If you want to bring the factory object around, consider a design  
similar to GRState and GRStateRef.  GRStateRef represents a "fat  
reference" that brings around the GRStateManager object for use with  
GRState methods, and GRState just represents the raw data.

>
>   static const llvm::APSInt Max(const llvm::APSInt &v) {
>     return llvm::APSInt::getMaxValue(v.getBitWidth(), v.isUnsigned());
>   }
>   static const llvm::APSInt Min(const llvm::APSInt &v) {
>      return llvm::APSInt::getMinValue(v.getBitWidth(),  
> v.isUnsigned());
>   }
>   static const llvm::APSInt One(const llvm::APSInt &v) {
>     return llvm::APSInt(llvm::APInt(v.getBitWidth(), 1),  
> v.isUnsigned());
>   }

I suppose these methods could be declared as const.  Actually, it  
might be worth just putting these into APSInt directly.  It seems like  
something that other people could use.

>
> public:
>   RangeSet() : ranges(factory.GetEmptySet()),
>                noValues(false) {
>   }

Please add a comment explaining the comments of the default constructor.

>   // Note that if the empty set is passed, then there are no  
> possible values.
>   // To create a RangeSet that covers all values, use the other  
> constructor.

Is the "other construct" the default constructor?  There are three  
constructors.

>   RangeSet(const PrimRangeSet &r) : ranges(r), noValues(r.isEmpty()) {
>   }
>   // Allow an empty set to be passed without setting noValues

This comment is a little cryptic.  Isn't 'noValues' an implementation  
detail?

>   RangeSet(const PrimRangeSet &r, bool n) : ranges(r), noValues(n) {
>     assert(!n);
>   }

>   void Profile(llvm::FoldingSetNodeID &ID) const {
>     ranges.Profile(ID);
>     ID.AddBoolean(noValues);
>   }

Looks great.

>
>   const llvm::APSInt *HasConcreteValue() const {
>     if(!ranges.isSingleton())
>       return NULL;
>     return ranges.begin()->HasConcreteValue();
>   }

Looks great.  I'm a little concerned about the assumptions clients can  
make on the returned "llvm::APSInt*" object.  It's lifetime is bound  
to the lifetime of the Range object (which is bound to the  
ImmutableSet).  In many cases through the analyzer with use the  
"BasicValueManager" to unique APSInt objects so that they can be  
compared using direct pointer comparison.  It also takes care of some  
weird memory management issues.

By default, the destructors of the objects allocated by an  
ImmutableSet::Factory are not called.  APSInt actually can 'malloc()'  
memory, meaning that if their destructors aren't called then this  
memory will just get leaked.  Using BasicValueManager solves this  
problem since it manually calls the destructors of the uniqued APSInt  
objects.

>
>   bool CouldBeNE(const llvm::APSInt &ne) const {
>     std::cerr << "CouldBeNE(" << ne.toString(10) << ") " << *this <<  
> std::endl;
>     assert(!noValues);
>     if(ranges.isEmpty())
>       return true;
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i)
>       if(i->Includes(ne))
>         return false;
>     return true;
>   }

Just for my education, what is the purpose of 'noValues'?  A small  
comment here would also be useful.  I gather the idea that this method  
returns true if the set of values represented by the range set *may*  
be less than 'ne'.  If that is the case, please put that as a comment  
above the function.

I see that most of the rest of the CouldBeXXX functions is just  
boilerplate.  Looks good for starters (i.e., some obvious FIXME's are  
in place).   You may also consider adding assertions (i.e., assert(0  
&& "Not implemented") for the CouldBeXXX methods that are incomplete.'

>
>   // Make all existing ranges fall within this new range
>   RangeSet Restrict(const llvm::APSInt &from, const llvm::APSInt  
> &to) {
>     if(ranges.isEmpty())
>       return factory.Add(ranges, Range(from, to));;

Looks good.

>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       if(i->Includes(from)) {
>         if(i->Includes(to)) {
>           newRanges = factory.Add(newRanges, Range(from, to));
>         } else {
>           newRanges = factory.Add(newRanges, Range(from, i->To()));
>         }
>       } else if(i->Includes(to)) {
>         newRanges = factory.Add(newRanges, Range(i->From(), to));
>       }
>     }
>     return RangeSet(newRanges);
>   }

Quick question: so the idea of 'Restrict' is to do a projection of  
what ranges are possible (within the given range)?  Looks great to me.

BTW, it looks like 'RangeSet' is basically POD and functional.  Is it  
possible to make 'Restrict' a const?

>
>   RangeSet AddEQ(const llvm::APSInt &eq) {
>     std::cerr << "AddEQ(" << eq.toString(10) << ") " << *this << " - 
> > ";
>     assert(CouldBeEQ(eq));
>     RangeSet r(factory.Add(factory.GetEmptySet(), Range(eq, eq)));
>     std::cerr << r << std::endl;
>     return r;
>   }

I see, this adds the constraint that the range is equal to the given  
value.  Could you add this as a comment to the method?  BTW, is the  
precondition for this method that 'CouldBeEQ' be checked *before*  
calling this method?

An alternate design is to allow the client to just add constraints,  
and have a special RangeSet representing "no feasible value" (I think  
this is what your 'noValues' variable is for).  Methods like 'AddEQ'  
could then call 'CouldBeEq' first to determine if the condition is  
feasible and if it isn't return the special RangeSet for "no feasible  
value".

>
>   RangeSet AddNE(const llvm::APSInt &ne) {
>     std::cerr << "AddNE(" << ne.toString(10) << ") " << *this << " - 
> > ";
>
>     const llvm::APSInt max = Max(ne);
>     const llvm::APSInt min = Min(ne);
>     const llvm::APSInt one = One(ne);
>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     if(ranges.isEmpty()) {
>       if(ne != max)
>         newRanges = factory.Add(newRanges, Range(ne + one, max));
>       if(ne != min)
>         newRanges = factory.Add(newRanges, Range(min, ne - one));
>       RangeSet r(newRanges);
>       std::cerr << r << std::endl;
>       return r;
>     }

Looks good.

>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       if(i->Includes(ne)) {
>         if(ne != i->From())
>           newRanges = factory.Add(newRanges, Range(i->From(), ne -  
> one));
>         if(ne != i->To())
>           newRanges = factory.Add(newRanges, Range(ne + one, i- 
> >To()));
>       } else {
>         newRanges = factory.Add(newRanges, *i);
>       }
>     }

I'm wondering if the two cases of 'ranges.isEmpty()' and this loop  
could be coalesced by representing the range of "all values" with the  
explicit range of 'Range(min, max)'.  It would probably simplify a  
bunch of code.

>
>   RangeSet AddLT(const llvm::APSInt &lt) {
>     std::cerr << "AddLT(" << lt.toString(10) << ") " << *this << " - 
> > ";
>     const llvm::APSInt min = Min(lt);
>     const llvm::APSInt one = One(lt);
>
>     if(ranges.isEmpty()) {
>       PrimRangeSet pr = factory.GetEmptySet();
>       if(lt != min)
>         pr = factory.Add(pr, Range(min, lt - one));
>       RangeSet r(pr, false);
>       std::cerr << r << std::endl;
>       return r;
>     }
>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       if(i->Includes(lt) && i->From() < lt)
>         newRanges = factory.Add(newRanges, Range(i->From(), lt -  
> one));
>       else if(i->To() < lt)
>         newRanges = factory.Add(newRanges, *i);
>     }
>     RangeSet r(newRanges);
>     std::cerr << r << std::endl;
>     return r;
>   }

Again, I'm thinking that the two cases of 'ranges.isEmpty()' and the  
second case can be combined together.  Come to think of it, maybe the  
'isEmpty' case could represent the infeasible case?  That is, if there  
are no ranges in the RangeSet, then no value is feasible.  In that  
scenario, I don't think you need the CouldBeXXX methods.  Just use the  
'AddXXX' methods, and if the returned RangeSet is empty then you know  
the constraint isn't feasible.  That would remove a bunch of  
potentially redundant logic.

>
>   RangeSet AddLE(const llvm::APSInt &le) {
>     std::cerr << "AddLE(" << le.toString(10) << ") " << *this << " - 
> > ";
>     const llvm::APSInt min = Min(le);
>
>     if(ranges.isEmpty()) {
>       RangeSet r(factory.Add(ranges, Range(min, le)));
>       std::cerr << r << std::endl;
>       return r;
>     }
>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       // Strictly we should test for includes le + 1, but no harm is
>       // done by this formulation
>       if(i->Includes(le))
>         newRanges = factory.Add(newRanges, Range(i->From(), le));
>       else if(i->To() <= le)
>         newRanges = factory.Add(newRanges, *i);
>     }
>     RangeSet r(newRanges);
>     std::cerr << r << std::endl;
>     return r;
>   }
>
>   RangeSet AddGT(const llvm::APSInt &gt) {
>     std::cerr << "AddGT(" << gt.toString(10) << ") " << *this << " - 
> > ";
>     const llvm::APSInt max = Max(gt);
>     const llvm::APSInt one = One(gt);
>
>     if(ranges.isEmpty()) {
>       RangeSet r(factory.Add(ranges, Range(gt + one, max)));
>       std::cerr << r << std::endl;
>       return r;
>     }
>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       if(i->Includes(gt) && i->To() > gt)
>         newRanges = factory.Add(newRanges, Range(gt + one, i->To()));
>       else if(i->From() > gt)
>         newRanges = factory.Add(newRanges, *i);
>     }
>     RangeSet r(newRanges);
>     std::cerr << r << std::endl;
>     return r;
>   }
>
>   RangeSet AddGE(const llvm::APSInt &ge) {
>     std::cerr << "AddGE(" << ge.toString(10) << ") " << *this << " - 
> > ";
>     const llvm::APSInt max = Max(ge);
>
>     if(ranges.isEmpty()) {
>       RangeSet r(factory.Add(ranges, Range(ge, max)));
>       std::cerr << r << std::endl;
>       return r;
>     }
>
>     PrimRangeSet newRanges = factory.GetEmptySet();
>
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       // Strictly we should test for includes ge - 1, but no harm is
>       // done by this formulation
>       if(i->Includes(ge))
>         newRanges = factory.Add(newRanges, Range(ge, i->To()));
>       else if(i->From() >= ge)
>         newRanges = factory.Add(newRanges, *i);
>     }
>
>     RangeSet r(newRanges);
>     std::cerr << r << std::endl;
>     return r;
>   }
>
>   void Print(std::ostream &os) const {
>     os << "{ ";
>     if(noValues) {
>       os << "**no values** }";
>       return;
>     }
>     for(PrimRangeSet::iterator i = ranges.begin() ; i !=  
> ranges.end() ; ++i) {
>       if(i != ranges.begin())
>         os << ", ";
>       os << '[' << i->From().toString(10) << ", " << i- 
> >To().toString(10)
>          << ']';
>     }
>     os << " }";
>
> }
>   bool operator==(const RangeSet &other) const {
>     return ranges == other.ranges;
>   }
> };
>
> PrimRangeSet::Factory RangeSet::factory;
>
> std::ostream &operator<<(std::ostream &os, const RangeSet &r) {
>   r.Print(os);
>   return os;
> }
>
> typedef llvm::ImmutableMap<SymbolRef,RangeSet> ConstRangeTy;
>
> namespace clang {
> template<>
> struct GRStateTrait<ConstRange> : public  
> GRStatePartialTrait<ConstRangeTy> {
>   static inline void* GDMIndex() { return &ConstRangeIndex; }
> };
> }
>
> namespace {
> class VISIBILITY_HIDDEN RangeConstraintManager : public  
> SimpleConstraintManager {
> public:
>   RangeConstraintManager(GRStateManager& statemgr)
>       : SimpleConstraintManager(statemgr) {}
>
>   const GRState* AssumeSymNE(const GRState* St, SymbolRef sym,
>                              const llvm::APSInt& V, bool& isFeasible);
>
>   const GRState* AssumeSymEQ(const GRState* St, SymbolRef sym,
>                                 const llvm::APSInt& V, bool&  
> isFeasible);
>
>   const GRState* AssumeSymLT(const GRState* St, SymbolRef sym,
>                                     const llvm::APSInt& V, bool&  
> isFeasible);
>
>   const GRState* AssumeSymGT(const GRState* St, SymbolRef sym,
>                              const llvm::APSInt& V, bool& isFeasible);
>
>   const GRState* AssumeSymGE(const GRState* St, SymbolRef sym,
>                              const llvm::APSInt& V, bool& isFeasible);
>
>   const GRState* AssumeSymLE(const GRState* St, SymbolRef sym,
>                              const llvm::APSInt& V, bool& isFeasible);
>
>   const GRState* AssumeInBound(const GRState* St, SVal Idx, SVal  
> UpperBound,
>                                bool Assumption, bool& isFeasible);
>
>   const GRState* AddEQ(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);
>
>   const GRState* AddNE(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);
>
>   const GRState* AddLT(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);
>
>   const GRState* AddLE(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);
>
>   const GRState* AddGT(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);
>
>   const GRState* AddGE(const GRState* St, SymbolRef sym, const  
> llvm::APSInt& V);

I'm wondering if methods like 'AddGT' could just be AssumeSymGE?  They  
basically have the same interface other than the 'isFeasible' flag.  I  
suppose they should be kept separate for cleanliness, although they  
look practically identical.

One possibility is to have methods like 'AddGT' reason about RangeSets  
(and only RangeSets) and have AssumeSymGT reason about states.  This  
would provide a nice clean separation of logic.

>
>   // FIXME: these two are required because they are pure virtual, but
>   // are they useful with ranges? Neither is used in this file.
>   const llvm::APSInt* getSymVal(const GRState* St, SymbolRef sym)  
> const;
>   bool isEqual(const GRState* St, SymbolRef sym, const llvm::APSInt&  
> V) const;

These methods are still meaningful on the context of ranges, although  
the clients of these methods should potentially be thought out a  
little bit more to determine if these queries are still the right ones.

>
> const GRState*
> RangeConstraintManager::AssumeSymNE(const GRState* St, SymbolRef sym,
>                                     const llvm::APSInt& V, bool&  
> isFeasible) {
>   isFeasible = CouldBeNE(St, sym, V);
>   if (isFeasible)
>     return AddNE(St, sym, V);
>   return St;
> }

Looks good, but the more I think about it the more I think that  
CouldBeNE and AddNE (and friends) can be merged.  It would probably be  
far more efficient (no double looping through the ranges).

> /// Scan all symbols referenced by the constraints. If the symbol is  
> not alive
> /// as marked in LSymbols, mark it as dead in DSymbols.
> const GRState*
> RangeConstraintManager::RemoveDeadBindings(const GRState* St,
>                                            SymbolReaper& SymReaper) {
>   GRStateRef state(St, StateMgr);
>
>   ConstRangeTy CR = state.get<ConstRange>();
>   ConstRangeTy::Factory& CRFactory = state.get_context<ConstRange>();
>
>   for (ConstRangeTy::iterator I = CR.begin(), E = CR.end(); I != E; + 
> +I) {
>     SymbolRef sym = I.getKey();
>     if (SymReaper.maybeDead(sym)) CR = CRFactory.Remove(CR, sym);
>   }
>
>   return state.set<ConstRange>(CR);
> }

Looks good.

>
> void RangeConstraintManager::print(const GRState* St, std::ostream&  
> Out,
>                                    const char* nl, const char *sep) {
>   /*
>   // Print equality constraints.
>
>   ConstEqTy CE = St->get<ConstEq>();
>
>   if (!CE.isEmpty()) {
>     Out << nl << sep << "'==' constraints:";
>
>     for (ConstEqTy::iterator I = CE.begin(), E = CE.end(); I!=E; + 
> +I) {
>       Out << nl << " $" << I.getKey();
>       llvm::raw_os_ostream OS(Out);
>       OS << " : "   << *I.getData();
>     }
>   }
>
>   // Print != constraints.
>
>   ConstNotEqTy CNE = St->get<ConstNotEq>();
>
>   if (!CNE.isEmpty()) {
>     Out << nl << sep << "'!=' constraints:";
>
>     for (ConstNotEqTy::iterator I = CNE.begin(), EI = CNE.end(); I! 
> =EI; ++I) {
>       Out << nl << " $" << I.getKey() << " : ";
>       bool isFirst = true;
>
>       GRState::IntSetTy::iterator J = I.getData().begin(),
>                                   EJ = I.getData().end();
>
>       for ( ; J != EJ; ++J) {
>         if (isFirst) isFirst = false;
>         else Out << ", ";
>
>         Out << (*J)->getSExtValue(); // Hack: should print to  
> raw_ostream.
>       }
>     }
>   }
>   */
>   Out << nl << "Implement range printing";
> }
>

It's fine not to implement this for now.  Please use '#if 0' instead  
of a block comment so its more apparent that its disabled (that's the  
convention).  '#if 0' is also safer than nested comments.