[cfe-dev] Constructing use-def chains

Simone Pellegrini spellegrini at dps.uibk.ac.at
Sun Nov 1 11:35:16 PST 2009


Dear Ted,
first of all I apologize for the long time, but I was a little bit
overloaded with work lately so I had no time to fix the style issues in
the DefUse implementation. 

I did several improvements following your suggestions. I changed the
interface a little and mainly the implementation. Here it is a list of
changes I did:

* DefUseNode now is implemented using the PointerUnion class (I also
changed the name from DefUseNode to Node as it is already in the du
namespace and it was a little bit overwhelming).

* I have separated the DefUseHelper from the visitor semantics and moved
some of the definition from the the header to the cpp file. 

* I have eliminated all the possible problem due to resizing of vectors
and I finally use llvm::DenseMap instead of std::map.

* I also changed most of the variable names to adhere the LLVM style

* I use the AnalysisContext object instead of asking the user for a CFG
and a ParentMap

* plus several small fixes and style improvement

I think there are still improvements to do, the code is way away from
being perfect. I promise I will provide some test cases in short time. 

Anyway, thanks for helping me in improving my crap code :P more comments
follows:

On Wed, 2009-09-23 at 00:18 -0700, Ted Kremenek wrote:
> On Sep 15, 2009, at 9:24 AM, Simone Pellegrini wrote:
> 
> > Hello, I am back :),
> > so I fix most of the comments, I still need to address a couple of  
> > issues, i.e. use of llvm::DenseMap in place of std::map and virtual  
> > methods in DefUseHelper.
> 
> Hi Simone,
> 
> Sorry for reviewing this second set of changes so late.  Comments  
> inline.
> 
> > About the usage of llvm::DenseMap, the problem is that every time I  
> > change from std:::map to llvm::DenseMap I get several errors  
> > (compile time), which is weird as the 2 classes expose the same  
> > interface, anyway I need some time to understand which is the source  
> > of problems.
> 
> One thing to understand is that the two data structures do not export  
> the same interface.
> 
> std::map allocates memory (using malloc) for each key/value pair you  
> store in the map.  This allocation stays around around as long as you  
> keep the key in the map.  For example, suppose you had:
> 
>     std::map<void*,int> M;
>     int &x = M[p];
> 
> The reference 'x' is valid for the entire time that the key 'p' is in  
> the map.  If you add or remove other keys, the reference stays valid  
> because the key/value pair is actually an allocated blob of memory  
> they you are allowed to reference.
> 
> DenseMap doesn't have this property.  It assumes that the values you  
> store in the DenseMap are lightweight, e.g. pointers to other data,  
> and thus allocates memory in chunks that are used for buckets for the  
> hash table.  When an insertion into the DenseMap occurs, the hash  
> table may resize, causing the values in the old buckets to get copied  
> to new buckets, and the old buckets deallocated.  This means that any  
> reference to the old data, as in the example above, is invalid.
> 
> Where this is going to bite you is here:
> 
>    typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
> 
> If you replace this with:
> 
>    typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;
> 
> then fundamentally things are going to break.  This because the  
> DefUseVect values, which is a typedef for std::vector<DefUseNode>,  
> will get copied and deallocated when the table resizes.  All  
> references to the DefUseVect, or the objects they contain, will  
> incorrect at this point.
> 
> The solution is to use DefUseVect* instead of DefUseVect as the value  
> for DefUseChain.

Done.

> 
> >
> > About the virtual methods in DefUseHelper, you are right! :) most of  
> > the time the user will use the default one and I see the performance  
> > problem with the virtual calls. Actually, I have introduced this  
> > virtual dispatch because in the project where I am using this  
> > implementation of DefUse, I need to implement  specific semantics of  
> > MPI calls. So mostly I am overloading the semantics of method calls  
> > (CallExpr). By default when a variable is passed to a function as a  
> > pointer type (or reference), during the DefUse analysis we have to  
> > state that this is a definition for the variable. Anyway, if we know  
> > the semantics of the specific method, and we know that it will not  
> > change the variable (for example the buffer in MPI_Send) we can  
> > state that the method call will a use for the variable.
> >
> > Now I don't know if this mechanism is useful for others... and if it  
> > should be here in the first place, but it makes sense for me :)  
> > Anyway, I guess I can avoid the virtual calls by using templates. I  
> > will be working on it the next days.
> 
> I think the approach should be to focus on a clean API.  Using  
> templates should be avoided unless there is a huge performance need;  
> the cost of virtual functions is low if the amount of work done in the  
> virtual function call outweighs the dynamic dispatch.
> 

no templates... got it!

> >
> > For the rest I properly formatted the code, for the test cases I  
> > have to learn how to write test cases for LLVM :)
> 
> Take a look at the Clang tests in 'test/Analysis' and friends.   
> Basically we'll need to wire up a driver option to clang-cc, and  
> you'll want to run clang-cc on some code that outputs some diagnostics  
> relevant to def-use analysis.

to be done. sorry

> 
> More specific comments inline.
> 
> > //===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*- 
> > ===//
> > //
> > //                     The LLVM Compiler Infrastructure
> > //
> > // This file is distributed under the University of Illinois Open  
> > Source
> > // License. See LICENSE.TXT for details.
> > //
> > // 
> > = 
> > = 
> > = 
> > ----------------------------------------------------------------------= 
> > ==//
> > //
> > // This file contains the declaration of DefUse analysis headers
> > //
> > // 
> > = 
> > = 
> > = 
> > ----------------------------------------------------------------------= 
> > ==//
> >
> > #ifndef LLVM_CLANG_DEFUSE_H
> > #define LLVM_CLANG_DEFUSE_H
> >
> > #include "llvm/ADT/DenseMap.h"
> >
> > #include "clang/AST/ASTConsumer.h"
> > #include "clang/AST/ParentMap.h"
> > #include "clang/AST/StmtVisitor.h"
> >
> > namespace clang {
> >
> > // forward definitions
> > class CFG;
> > class CFGBlock;
> >
> > class DefUse;
> >
> > namespace defuse {
> > //===------------------------- DefUseNode ------------------------- 
> > ===//
> > // Keeps the information for a particular 'definition' or 'use' of a  
> > variable
> > // The structure is needed because a defuse chain can contains  
> > variable declarations
> > // as well as variable reference. As a DeclStmt can contains several  
> > declarations
> > // to address a particular variable we need to store its VarDecl or  
> > DeclRefExp
> > class DefUseNode {
> > public:
> >  enum NodeKind {
> >    VarDecl, VarRef
> >  };
> >  enum UseKind {
> >    Use, Def, UseDef
> >  };
> >
> >  DefUseNode(clang::VarDecl const* decl) :
> >    var_decl(decl), kind(VarDecl), usage(Def) {
> >  }
> >  DefUseNode(DeclRefExpr const* ref, UseKind u = Use) :
> >    var_ref(ref), kind(VarRef), usage(u) {
> >  }
> >
> >  clang::VarDecl const* getDecl() const;
> >
> >  NodeKind const& getKind() const {
> >    return kind;
> >  }
> >  UseKind const& getUse() const {
> >    return usage;
> >  }
> >
> >  clang::VarDecl const* getVarDecl() const {
> >    assert(kind==VarDecl);
> >    return var_decl;
> >  }
> >  DeclRefExpr const* getVarRef() const {
> >    assert(kind==VarRef);
> >    return var_ref;
> >  }
> >
> >  bool operator==(DefUseNode const& n) const;
> >
> > private:
> >  // a def-use node can be either a VarDecl or a DeclRefExpr
> >  union {
> >    clang::VarDecl const* var_decl;
> >    DeclRefExpr const* var_ref;
> >  };
> >  NodeKind kind;
> >  UseKind usage;
> 
> Please consider using the PointerUnion class for this.  It will reduce  
> the code you are using, and will allow you to merge the 'kind' field  
> into the pointer as well with no extra work on your part.
> 
Done.

> Should 'usage' be const?

yes, you are right! 

> 
> Also, most of LLVM does not follow the naming convention of lowercase  
> variable names with underscores.  To be consistent with the rest of  
> the codebase, please take a look at the coding conventions used in a  
> core header file, e.g. 'include/clang/AST/Expr.h', which shows that  
> variable names are typically short, and use capitalization instead of  
> underscores.  This is really a nitty point, but consistency through  
> the code base makes it more readable.
> 
> > };
> >
> > //===------------------------- Typedefs -------------------------===//
> > class DefUseBlock;
> > class VarDeclMap;
> > typedef std::vector<DefUseBlock> DefUseData;
> > typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
> > typedef std::vector<DefUseNode> DefUseVect;
> > typedef std::vector<DefUseNode const*> VarRefsVect;
> 
> Something looks potentially wrong here.  DefUseVect is a vector of  
> DefUseNode objects, not DefUseNode*.  If the vector ever needs to be  
> resized, then the DefUseNode objects will be obliterated, and any  
> references to them (which may be in VarRefVect) will suddenly become  
> invalid.  DefUseNode is a lightweight object that you can certainly  
> keep in a vector, but if you need these objects to have stable pointer  
> addresses you will need to manually allocate them and store  
> DefUseNode* instead DefUseVect instead.  If you are concerned with  
> 'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to  
> quickly allocate objects.

yes, right! Fixed it.

> 
> > class DefUseHelper;
> >
> > //===------------------------- DefUseHelper ------------------------- 
> > ===//
> >
> > class DefUseHelper : public StmtVisitor<DefUseHelper> {
> >  class DefUseHelperImpl;
> >  DefUseHelperImpl* pimpl;
> >
> >  void InitializeValues(DefUseData* data, VarRefBlockMap* bm,  
> > VarDeclMap* dm);
> >
> >  friend class clang::DefUse;
> > public:
> >  DefUseNode::UseKind current_use;
> >  DefUseHelper();
> >
> >  virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to  
> > call the
> >                              // super class implementation of the  
> > method
> >  void HandleDeclStmt(DeclStmt *DS);
> >
> >  virtual void HandleBinaryOperator(BinaryOperator* B);
> >  virtual void HandleConditionalOperator(ConditionalOperator* C);
> >  virtual void HandleCallExpr(CallExpr* C);
> >  virtual void HandleUnaryOperator(UnaryOperator* U);
> >  virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
> >
> >  void VisitDeclRefExpr(DeclRefExpr *DR) {
> >    return HandleDeclRefExpr(DR);
> >  }
> >  void VisitDeclStmt(DeclStmt *DS) {
> >    return HandleDeclStmt(DS);
> >  }
> >  void VisitBinaryOperator(BinaryOperator* B) {
> >    return HandleBinaryOperator(B);
> >  }
> >  void VisitConditionalOperator(ConditionalOperator* C) {
> >    return HandleConditionalOperator(C);
> >  }
> >  void VisitCallExpr(CallExpr* C) {
> >    return HandleCallExpr(C);
> >  }
> >  void VisitUnaryOperator(UnaryOperator* U) {
> >    return HandleUnaryOperator(U);
> >  }
> >  void VisitArraySubscriptExpr(ArraySubscriptExpr* AS) {
> >    return HandleArraySubscriptExpr(AS);
> >  }
> >
> >  void VisitCFGBlock(clang::CFGBlock const& blk, CFGBlock const&  
> > entry);
> >  void VisitStmt(Stmt* S);
> >
> >  virtual ~DefUseHelper();
> > };
> >
> > void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,  
> > ASTContext& ctx,
> >    llvm::raw_ostream& out);
> > void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR,  
> > ASTContext& ctx,
> >    llvm::raw_ostream& out);
> > void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext&  
> > ctx,
> >    llvm::raw_ostream& out);
> >
> > ASTConsumer* CreateDefUseTestConsumer(llvm::raw_ostream& out);
> >
> 
> I'm still trying to understand the implementation in its entirety, but  
> it seems to me that you should try and decouple the DefUse  
> implementation from the observer of its values.  Here it seems like  
> they are mixed together, which forces you to use subclassing of  
> DefUseHelper to extend it's functionality.  This isn't ideal, as it  
> conflates the DefUse algorithm with what desires to consume that  
> information.  This muddles the separation of concerns between these  
> two concepts, and it constrains future revisions of the DefUse  
> implementation to stay within this visitor design.  The consumer of  
> the DefUse information may wish to have a visitor design, but I'm not  
> certain if you want to expose a visitor pattern in the DefUse  
> implementation itself.

The helper is not used as observer, it's more something needed to build
the defuse chains. When the analysis data is collected I need to know if
a DeclVarRef is a definition or an use. And this is DefUse::Helper
business. Most of the time the user will not need to redefine the 
behaviour, it could happen and I want to give the opportunity for that. 

Anyway now I separated the Visitor from the HandleXXX method. It's a
little bit cleaner but the concept is still the same. 

> 
> As an alternative, consider the "observer" interface provided by  
> Clang's LiveVariables analysis ('include/clang/Analysis/Analyses/ 
> LiveVariables.h').  Here the LiveVariables implementation is  
> completely opaque, but allows clients to provide an "Observer" object  
> so they can query liveness information at specific subexpressions.   
> The dead stores checker uses this interface to find dead stores, and  
> it can do so without any knowledge of how the LiveVariables analysis  
> calculates its liveness information.  It makes the API much simpler,  
> and disentangles the implementation of LiveVariables from its clients.
> 
> More specifically, the HandleXXX methods look like they are part of  
> the "observer" interface, while the VisitXXX methods look like they  
> are part of the DefUse implementation.  Your use of the pimpl idiom  
> implies you want to keep some of this implementation hidden.  Why not  
> just separate the implementation of the DefUse algorithm completely  
> from the HandleXXX methods and keep those in the DefUseHelper.  The  
> DefUse algorithm can then check to see if a DefUseHelper object is  
> provided, and if so, call the appropriate HandleXXX methods.
> 
> > } // end namespace defuse
> >
> > //===------------------------- DefUse -------------------------===//
> >
> > class DefUse {
> >  ASTContext const& ctx;
> >  defuse::DefUseData const* analysis_data;
> >  defuse::VarDeclMap const* decl_map;
> >  defuse::VarRefBlockMap const* block_map;
> >  unsigned const num_cfg_blocks;
> 
> Again, just my comment on the naming convention for variables.
> 
> >
> >  DefUse(ASTContext const& ctx_,
> >      defuse::DefUseData const* analysis_data_,
> >      defuse::VarDeclMap const* decl_map_,
> >      defuse::VarRefBlockMap const* block_map_,
> >      unsigned num_blocks) :
> 
> Please do not use trailing underscores in variable names.
> 
> >    ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),
> >    block_map(block_map_), num_cfg_blocks(num_blocks) {
> >  }
> >
> >  bool isDef(defuse::DefUseNode const& n) const;
> >
> >  class iterator_impl {
> >  public:
> >    DefUse const* du;
> >    defuse::DefUseNode const* node;
> >
> >    struct iter {
> >      int block_id;
> >      defuse::DefUseVect::const_iterator block_it;
> >      iter(int blk_id) :
> >        block_id(blk_id) {
> >      }
> >    };
> >    iterator_impl() : du(NULL) { }
> >    iterator_impl(DefUse const* du_) : du(du_) { }
> >    iterator_impl& operator++() {
> >      return inc(false);
> >    }
> >    virtual iterator_impl& inc(bool) = 0;
> >  };
> 
> Do we need a virtual iterator?  Does iterator_impl need to be  
> subclassed?  Not only is this a potential performance problem, but it  
> could be the source of subtle bugs if you assigned an object of a  
> subclass to that of a parent class.  It seems to me that defs_iterator  
> and uses_iterator can be merged (substantially reducing the code), and  
> simply add a flag to the iterator indicating whether it is for uses or  
> defs.  If you want strong typing, you certainly can have subclassing  
> of the iterators, but I think the entire implementation can get sucked  
> into a common class.  Then the "inc" just checks the flag and skips of  
> uses when you are iterating over defs and vis versa.  You then remove  
> the vtable pointer, simplify the inheritance tree, and the  
> implementation becomes a lot more readable.  There is also more  
> chances for inlining by the compiler (i.e., if the definition of 'inc'  
> is provided in the class declaration).
> 
I also fixed this. The problem here is that the 2 iterators are a little
bit different. I cannot put everything together. For example the
uses_iterator iterates through DeclRefExpr* while the defs_iterator
return a pointer to a Node* (old DefUseNode) because an use can be both
a decl or a var ref.

> 
> > public:
> >  class uses_iterator : public std::iterator<std::input_iterator_tag,
> >    DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
> >    public iterator_impl {
> >    iter iter_ptr;
> >    bool inDefBlock;
> >
> >    uses_iterator() :
> >      iterator_impl(), iter_ptr(-1), inDefBlock(true) { }
> >    uses_iterator(DefUse const* du, defuse::DefUseNode const& n);
> >    uses_iterator& inc(bool first);
> >    friend class DefUse;
> >  public:
> >    bool operator!=(uses_iterator const& iter);
> >    DeclRefExpr const* operator*();
> >  };
> >
> >  class defs_iterator : public std::iterator<std::input_iterator_tag,
> >  defuse::DefUseNode, std::ptrdiff_t, defuse::DefUseNode const*>,
> >  public iterator_impl {
> >    struct iter_ : public iter {
> >      defuse::VarRefsVect::const_iterator reaches_it;
> >      iter_(int blk_id) :
> >        iter(blk_id) { }
> >    } iter_ptr;
> >    bool blockDef;
> >
> >    defs_iterator() :
> >      iterator_impl(), iter_ptr(-1), blockDef(false) { }
> >    defs_iterator(DefUse const* du, DeclRefExpr const& n);
> >    defs_iterator& inc(bool first);
> >    friend class DefUse;
> >  public:
> >    bool operator!=(defs_iterator const& iter);
> >    defuse::DefUseNode const* operator*();
> >  };
> >
> >  // USES //
> >  defs_iterator defs_begin(DeclRefExpr const* var) const;
> >  defs_iterator defs_end() const;
> >
> >  // DEFS //
> >  uses_iterator uses_begin(DeclRefExpr const* var) const;
> >  uses_iterator uses_begin(VarDecl const* var) const;
> >  uses_iterator uses_end() const;
> >
> >  bool isUse(DeclRefExpr const* var) const;
> >  bool isDef(DeclRefExpr const* var) const {
> >    return isDef(defuse::DefUseNode(var, defuse::DefUseNode::Def));
> >  }
> >  bool isDef(VarDecl const* var) const {
> >    return isDef(defuse::DefUseNode(var));
> >  }
> >
> >  ~DefUse();
> >
> >  static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,
> >      CFG* cfg, ParentMap* pm, defuse::DefUseHelper* helper = 0,
> >      bool verbose = false, llvm::raw_ostream& out = llvm::outs());
> > };
> >
> > } // end namespace clang
> >
> > #endif
> > //===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*- 
> > ===//
> > //
> > //                     The LLVM Compiler Infrastructure
> > //
> > // This file is distributed under the University of Illinois Open  
> > Source
> > // License. See LICENSE.TXT for details.
> > //
> > // 
> > = 
> > = 
> > = 
> > --------------------------------------------------------------------- 
> > ===//
> > //
> > // This file contains the implementation of DefUse analysis
> > //
> > // 
> > = 
> > = 
> > = 
> > --------------------------------------------------------------------- 
> > ===//
> >
> > #define __STDC_LIMIT_MACROS
> > #define __STDC_CONSTANT_MACROS
> >
> > #include "DefUse.h"
> >
> > #include "clang/Basic/SourceManager.h"
> > #include "clang/Analysis/CFG.h"
> > #include <llvm/Support/raw_ostream.h>
> >
> > // todo: Use DenseMap and llvm sets
> > #include <set>
> > #include <map>
> >
> > using namespace clang;
> > using namespace defuse;
> >
> > // utility functions
> > static unsigned Line(SourceLocation const& l, clang::SourceManager  
> > const& sm) {
> >  PresumedLoc pl = sm.getPresumedLoc(l);
> >  return pl.getLine();
> > }
> >
> > static unsigned Column(SourceLocation const& l, SourceManager const&  
> > sm) {
> >  PresumedLoc pl = sm.getPresumedLoc(l);
> >  return pl.getColumn();
> > }
> >
> > static std::string PrintClangStmt(Stmt const* stmt, ASTContext  
> > const& ctx) {
> >  std::string str;
> >  llvm::raw_string_ostream pp(str);
> >  stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
> >  return pp.str();
> > }
> >
> > //===------------------------- DefUseNode ------------------------- 
> > ===//
> > inline bool defuse::DefUseNode::operator==(DefUseNode const& n)  
> > const {
> >  if (n.usage != usage || n.kind != kind) return false;
> >  return kind == VarDecl ? var_decl == n.var_decl : var_ref ==  
> > n.var_ref;
> > }
> >
> > inline clang::VarDecl const* DefUseNode::getDecl() const {
> >  return kind == VarRef ? dyn_cast<clang::VarDecl> (var_ref->getDecl 
> > ()) :
> >    dyn_cast<clang::VarDecl> (var_decl);
> > }
> >
> > typedef std::set<DefUseNode const*> VarRefsSet;
> > typedef std::set<VarDecl const*> VarDeclsSet;
> > typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
> >
> > //===------------------------- DefUseBlock ------------------------- 
> > ===//
> >
> > class defuse::DefUseBlock : public DefUseChain {
> 
> Please use encapsulation instead of subclassing.  In this case you are  
> subclassing std::map, and that is something there rarely is ever a  
> good reason to do.  Encapsulation ensures isolation between the two  
> classes, and forces you to think in terms of interfaces instead of  
> making assumptions about the implementation of the underlying data  
> structure.  Almost all of LLVM is structured this way, and it makes  
> code far more resilient to implementation changes in the employed data  
> structures.
> 
> Instead, either have DefUseChain contain an std::map (and then define  
> the proper interfaces in DefUseChain) or have DefUseBlock contain a  
> DefUseChain, not subclass it.  The former forces you to define only  
> the interfaces you need for DefUseChain, instead of exposing the  
> entirety of std::map (which isn't the interface you're intending to  
> vend, even if it is just to your client code within the DefUse  
> implementation).

I have to think about it... the main observation is that this stuff will
be only used in this translation unit. 

> 
> > public:
> >  VarDeclsSet uses;
> >  VarRefsVect defsout;
> >  VarDeclsSet killed;
> >  VarRefsVect reaches;
> >
> >  class not_killed_set_iterator : public  
> > std::iterator<std::input_iterator_tag,
> >      DefUseNode, std::ptrdiff_t, DefUseNode const*> {
> >    DefUseBlock& block;
> >    VarRefsVect::const_iterator ptr_it;
> >
> >    not_killed_set_iterator& inc(bool first) {
> >      if (ptr_it != block.reaches.end() && !first)
> >        ++ptr_it;
> >      while (ptr_it != block.reaches.end() &&
> >          (block.killed.find((*ptr_it)->getDecl()) != block.killed.end 
> > () &&
> >          std::find(block.defsout.begin(), block.defsout.end(),  
> > *ptr_it)
> >          == block.defsout.end()))
> >        ++ptr_it;
> >      return *this;
> >    }
> >  public:
> >    not_killed_set_iterator(DefUseBlock& block_,
> >        VarRefsVect::const_iterator ptr_it_) :
> >      block(block_), ptr_it(ptr_it_) { inc(true); }
> >    not_killed_set_iterator& operator++() { return inc(false); }
> >    bool operator!=(not_killed_set_iterator const& iter) const {
> >      return !((&iter.block) == &(block) && iter.ptr_it == ptr_it);
> >    }
> >    const DefUseNode* const & operator*() const { return *ptr_it; }
> >  };
> >
> >  not_killed_set_iterator begin_not_killed_set() {
> >    return DefUseBlock::not_killed_set_iterator(*this, reaches.begin 
> > ());
> >  }
> >  not_killed_set_iterator end_not_killed_set() {
> >    return DefUseBlock::not_killed_set_iterator(*this, reaches.end());
> >  }
> > };
> >
> > //===------------------------- VarDeclMap ------------------------- 
> > ===//
> >
> > class defuse::VarDeclMap : public StmtVisitor<VarDeclMap> ,
> >    public llvm::DenseMap<VarDecl const*, DeclStmt const*> {
> > public:
> >  VarDeclMap(Stmt const* body) { VisitStmt(const_cast<Stmt*> (body)); }
> >
> >  void VisitDeclStmt(DeclStmt *DS) {
> >    for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS- 
> > >decl_end();
> >        I != E; ++I)
> >    {
> >      if (VarDecl *VD = dyn_cast<VarDecl>(*I))
> >        insert(std::make_pair(VD, DS));
> >    }
> >  }
> >  void VisitStmt(Stmt* S) {
> >    for (Stmt::child_iterator I = S->child_begin(), E = S->child_end();
> >        I != E; ++I)
> >      if (*I) Visit(*I);
> >  }
> > };
> >
> > //===------------------------- DefUseHelper ------------------------- 
> > ===//
> > struct defuse::DefUseHelper::DefUseHelperImpl {
> >  DefUseData* data;
> >  VarRefBlockMap* bm;
> >  VarDeclMap* dm;
> >  unsigned blockID;
> >
> >  DefUseHelperImpl() :
> >    data(NULL), bm(NULL), dm(NULL) { }
> > };
> 
> Per my comments above, it seems to me this pimpl logic can be sucked  
> into a common class with the main logic for the DefUse algorithm, and  
> then make that whole thing private to DefUse.cpp.  That will be a lot  
> simpler, and should migrate many of the implementation details from  
> the header file to the .cpp file.  Then clients just pass the  
> DefUseHelper object (which has the HandleXXX methods) to  
> BuildDefUseChains(), and the implementation class stays completely  
> hidden (thus separating the DefUse algorithm from the client interface  
> for querying that information).

I tried to deal with this.

> 
> >
> > defuse::DefUseHelper::DefUseHelper() :
> >  pimpl(new DefUseHelperImpl), current_use(DefUseNode::Use) { }
> >
> > defuse::DefUseHelper::~DefUseHelper() { delete pimpl; }
> >
> > void defuse::DefUseHelper::InitializeValues(DefUseData* data,
> >    VarRefBlockMap* bm, VarDeclMap* dm) {
> >  pimpl->data = data;
> >  pimpl->bm = bm;
> >  pimpl->dm = dm;
> > }
> >
> > void defuse::DefUseHelper::HandleDeclRefExpr(DeclRefExpr *DR) {
> >  VarRefBlockMap& bm = *pimpl->bm;
> >  unsigned blockID = pimpl->blockID;
> >  DefUseBlock& block_data = (*pimpl->data)[blockID];
> >  if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl())) {
> >    // map this variable to the current block
> >    bm[DR] = blockID;
> >    if (block_data.find(VD) == block_data.end()) {
> >      block_data.insert(std::make_pair(VD, DefUseVect()));
> >      block_data.uses.insert(VD); // this variable is used in the  
> > current
> >      //  block and it has no prior definition within the block
> >    }
> >    if (current_use == DefUseNode::UseDef) { // in case of compound  
> > operators
> >      DefUseVect &B = block_data[VD];
> >      B.push_back(DefUseNode(DR, DefUseNode::Use));
> >      B.push_back(DefUseNode(DR, DefUseNode::Def));
> >    } else block_data[VD].push_back(DefUseNode(DR, current_use));
> >  }
> > }
> >
> > void defuse::DefUseHelper::HandleDeclStmt(DeclStmt *DS) {
> >  for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end 
> > ();
> >      I != E; I++)
> >  {
> >    if (VarDecl* VD = dyn_cast<VarDecl> (*I)) {
> >      VarRefBlockMap& bm = *pimpl->bm;
> >      VarDeclMap& dm = *pimpl->dm;
> >      unsigned blockID = pimpl->blockID;
> >      DefUseBlock& block_data = (*pimpl->data)[blockID];
> >
> >      assert((block_data.find(VD) == block_data.end()) &&
> >          "Variable redefined in the same block");
> >      block_data[VD].push_back(DefUseNode(VD));
> >      bm[dm[VD]] = blockID;
> >    }
> >  }
> > }
> >
> > void defuse::DefUseHelper::HandleBinaryOperator(BinaryOperator* B) {
> >  DefUseNode::UseKind backup = current_use; // backup the usage
> >  if (B->isAssignmentOp()) {
> >    current_use = DefUseNode::Use;
> >    Visit(B->getRHS());
> >    current_use = DefUseNode::Def;
> >    if (B->isCompoundAssignmentOp())
> >      current_use = DefUseNode::UseDef;
> >    Visit(B->getLHS());
> >  } else {
> >    Visit(B->getLHS());
> >    current_use = DefUseNode::Use;
> >    Visit(B->getRHS());
> >  }
> >  current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::HandleConditionalOperator 
> > (ConditionalOperator* C) {}
> >
> > void defuse::DefUseHelper::HandleCallExpr(CallExpr* C) {
> >  DefUseNode::UseKind backup = current_use; // backup the usage
> >  for (CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end();
> >      I != E; ++I)
> >  {
> >    if ((*I)->getType()->isPointerType() || (*I)->getType()- 
> > >isReferenceType())
> >      current_use = DefUseNode::Def;
> >    Visit(*I);
> >    current_use = backup;
> >  }
> > }
> >
> > void defuse::DefUseHelper::HandleUnaryOperator(UnaryOperator* U) {
> >  DefUseNode::UseKind backup = current_use; // backup the usage
> >  switch (U->getOpcode()) {
> >  case UnaryOperator::PostInc:
> >  case UnaryOperator::PostDec:
> >  case UnaryOperator::PreInc:
> >  case UnaryOperator::PreDec:
> >    current_use = DefUseNode::UseDef;
> >    break;
> >  case UnaryOperator::Plus:
> >  case UnaryOperator::Minus:
> >  case UnaryOperator::Not:
> >  case UnaryOperator::LNot:
> >    current_use = DefUseNode::Use;
> >    break;
> >  case UnaryOperator::AddrOf:
> >  case UnaryOperator::Deref:
> >    // use the current_use
> >    break;
> >  default:
> >    // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode 
> > ()) <<
> >    // " not supported in def-use analysis");
> >    break;
> >  }
> >  Visit(U->getSubExpr());
> >  current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::HandleArraySubscriptExpr 
> > (ArraySubscriptExpr* AS) {
> >  DefUseNode::UseKind backup = current_use; // backup the usage
> >  Visit(AS->getBase());
> >  current_use = DefUseNode::Use;
> >  Visit(AS->getIdx());
> >  current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::VisitCFGBlock(CFGBlock const& blk,
> >    CFGBlock const& entry) {
> >  pimpl->blockID = blk.getBlockID();
> >  for (CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I !=  
> > E; ++I)
> >    Visit(*I);
> >
> >  unsigned blockID = pimpl->blockID;
> >  DefUseBlock& block_data = (*pimpl->data)[blockID];
> >
> >  // Post processing
> >  // build up the uses(b), defsout(b) and killed(b)
> >
> >  // for each variable
> >  for (DefUseBlock::iterator I = block_data.begin(), E =  
> > block_data.end(); I
> >      != E; ++I) {
> >    // in case of references to ParmVar what we do is insert the  
> > variable
> >    // definition in the first block of the CFG
> >    DefUseBlock& root_data = (*pimpl->data)[entry.getBlockID()];
> >    if (isa<ParmVarDecl>(I->first) &&
> >        std::find(root_data[I->first].begin(), root_data[I->first].end 
> > (),
> >        DefUseNode(I->first)) == root_data[I->first].end())
> >    {
> >      root_data[I->first].push_back(DefUseNode(I->first));
> >      root_data.defsout.push_back(&root_data[I->first].back());
> >    }
> >    // if this is a local variable, should't be in the defsout,killed  
> > sets
> >    // defsout is calculated by iterating backwards to find the last  
> > use of
> >    // the variable
> >    for (DefUseVect::const_reverse_iterator UI = I->second.rbegin(),  
> > UE =
> >        I->second.rend(); UI != UE; ++UI)
> >      if (UI->getUse() == DefUseNode::Def && (block_data.defsout.empty 
> > ()
> >          || block_data.defsout.back()->getDecl() != I->first)) {
> >        block_data.defsout.push_back(&(*UI));
> >        block_data.killed.insert(UI->getDecl());
> >        break;
> >      }
> >  }
> > }
> 
> Are these handle methods just part of the main DefUse algorithm?  If  
> so, consider sucking them into the Visit methods (thus removing the  
> virtual dispatch), and have them call the HandleXXX methods in a  
> separate DefUseHelper object if one was provided.  That way the  
> algorithm stays separate from those who want to observe its values.

there is no observing, the handleXXX are used to build DefUse. 

> 
> I know I keep saying this point, but a critical part of making this  
> clean is making it clear where the def-use implementation is and where  
> we would want to insert hooks to query information (including auditing  
> the algorithm).  Making those interfaces clear is key.  Ideally, the  
> header file should define just the functions we need to run the  
> algorithm and query the def-use information, nothing more.  I fully  
> admit that I just might be confused with the details of the  
> implementation.
> 
> There is a bunch of other code that I'm afraid I haven't waded through  
> yet, as there looks like there are some really meaty pieces there.   
> I've noticed some uses of std::set and performing set union's, etc.   
> Note that there may be a really high cost to using such methods, as  
> there will be implicit allocations from malloc, etc.  There may  
> possibly be alternative data structures within LLVM that you can use  
> for your purposes that perform far better, but I haven't looked at  
> your implementation close enough to make more specific comments yet.   
> I've given you enough comments, however, in this email to await any  
> feedback you want to provide on the next iteration.
> 
> Again, this is really great you're interested in contributing this  
> back, and I look forward to hearing your reply.
> 
thanks to you Ted!

> Best,
> Ted

cheers, Simone


-------------- next part --------------
A non-text attachment was scrubbed...
Name: DefUse.cpp
Type: text/x-c++src
Size: 35013 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20091101/f55d4f32/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DefUse.h
Type: text/x-chdr
Size: 8344 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20091101/f55d4f32/attachment.h>


More information about the cfe-dev mailing list