[cfe-dev] Constructing use-def chains

Ted Kremenek kremenek at apple.com
Wed Sep 23 00:18:02 PDT 2009


On Sep 15, 2009, at 9:24 AM, Simone Pellegrini wrote:

> Hello, I am back :),
> so I fix most of the comments, I still need to address a couple of  
> issues, i.e. use of llvm::DenseMap in place of std::map and virtual  
> methods in DefUseHelper.

Hi Simone,

Sorry for reviewing this second set of changes so late.  Comments  
inline.

> About the usage of llvm::DenseMap, the problem is that every time I  
> change from std:::map to llvm::DenseMap I get several errors  
> (compile time), which is weird as the 2 classes expose the same  
> interface, anyway I need some time to understand which is the source  
> of problems.

One thing to understand is that the two data structures do not export  
the same interface.

std::map allocates memory (using malloc) for each key/value pair you  
store in the map.  This allocation stays around around as long as you  
keep the key in the map.  For example, suppose you had:

    std::map<void*,int> M;
    int &x = M[p];

The reference 'x' is valid for the entire time that the key 'p' is in  
the map.  If you add or remove other keys, the reference stays valid  
because the key/value pair is actually an allocated blob of memory  
they you are allowed to reference.

DenseMap doesn't have this property.  It assumes that the values you  
store in the DenseMap are lightweight, e.g. pointers to other data,  
and thus allocates memory in chunks that are used for buckets for the  
hash table.  When an insertion into the DenseMap occurs, the hash  
table may resize, causing the values in the old buckets to get copied  
to new buckets, and the old buckets deallocated.  This means that any  
reference to the old data, as in the example above, is invalid.

Where this is going to bite you is here:

   typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

If you replace this with:

   typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;

then fundamentally things are going to break.  This because the  
DefUseVect values, which is a typedef for std::vector<DefUseNode>,  
will get copied and deallocated when the table resizes.  All  
references to the DefUseVect, or the objects they contain, will  
incorrect at this point.

The solution is to use DefUseVect* instead of DefUseVect as the value  
for DefUseChain.

>
> About the virtual methods in DefUseHelper, you are right! :) most of  
> the time the user will use the default one and I see the performance  
> problem with the virtual calls. Actually, I have introduced this  
> virtual dispatch because in the project where I am using this  
> implementation of DefUse, I need to implement  specific semantics of  
> MPI calls. So mostly I am overloading the semantics of method calls  
> (CallExpr). By default when a variable is passed to a function as a  
> pointer type (or reference), during the DefUse analysis we have to  
> state that this is a definition for the variable. Anyway, if we know  
> the semantics of the specific method, and we know that it will not  
> change the variable (for example the buffer in MPI_Send) we can  
> state that the method call will a use for the variable.
>
> Now I don't know if this mechanism is useful for others... and if it  
> should be here in the first place, but it makes sense for me :)  
> Anyway, I guess I can avoid the virtual calls by using templates. I  
> will be working on it the next days.

I think the approach should be to focus on a clean API.  Using  
templates should be avoided unless there is a huge performance need;  
the cost of virtual functions is low if the amount of work done in the  
virtual function call outweighs the dynamic dispatch.

>
> For the rest I properly formatted the code, for the test cases I  
> have to learn how to write test cases for LLVM :)

Take a look at the Clang tests in 'test/Analysis' and friends.   
Basically we'll need to wire up a driver option to clang-cc, and  
you'll want to run clang-cc on some code that outputs some diagnostics  
relevant to def-use analysis.

More specific comments inline.

> //===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*- 
> ===//
> //
> //                     The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open  
> Source
> // License. See LICENSE.TXT for details.
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> //
> // This file contains the declaration of DefUse analysis headers
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
>
> #ifndef LLVM_CLANG_DEFUSE_H
> #define LLVM_CLANG_DEFUSE_H
>
> #include "llvm/ADT/DenseMap.h"
>
> #include "clang/AST/ASTConsumer.h"
> #include "clang/AST/ParentMap.h"
> #include "clang/AST/StmtVisitor.h"
>
> namespace clang {
>
> // forward definitions
> class CFG;
> class CFGBlock;
>
> class DefUse;
>
> namespace defuse {
> //===------------------------- DefUseNode ------------------------- 
> ===//
> // Keeps the information for a particular 'definition' or 'use' of a  
> variable
> // The structure is needed because a defuse chain can contains  
> variable declarations
> // as well as variable reference. As a DeclStmt can contains several  
> declarations
> // to address a particular variable we need to store its VarDecl or  
> DeclRefExp
> class DefUseNode {
> public:
>  enum NodeKind {
>    VarDecl, VarRef
>  };
>  enum UseKind {
>    Use, Def, UseDef
>  };
>
>  DefUseNode(clang::VarDecl const* decl) :
>    var_decl(decl), kind(VarDecl), usage(Def) {
>  }
>  DefUseNode(DeclRefExpr const* ref, UseKind u = Use) :
>    var_ref(ref), kind(VarRef), usage(u) {
>  }
>
>  clang::VarDecl const* getDecl() const;
>
>  NodeKind const& getKind() const {
>    return kind;
>  }
>  UseKind const& getUse() const {
>    return usage;
>  }
>
>  clang::VarDecl const* getVarDecl() const {
>    assert(kind==VarDecl);
>    return var_decl;
>  }
>  DeclRefExpr const* getVarRef() const {
>    assert(kind==VarRef);
>    return var_ref;
>  }
>
>  bool operator==(DefUseNode const& n) const;
>
> private:
>  // a def-use node can be either a VarDecl or a DeclRefExpr
>  union {
>    clang::VarDecl const* var_decl;
>    DeclRefExpr const* var_ref;
>  };
>  NodeKind kind;
>  UseKind usage;

Please consider using the PointerUnion class for this.  It will reduce  
the code you are using, and will allow you to merge the 'kind' field  
into the pointer as well with no extra work on your part.

Should 'usage' be const?

Also, most of LLVM does not follow the naming convention of lowercase  
variable names with underscores.  To be consistent with the rest of  
the codebase, please take a look at the coding conventions used in a  
core header file, e.g. 'include/clang/AST/Expr.h', which shows that  
variable names are typically short, and use capitalization instead of  
underscores.  This is really a nitty point, but consistency through  
the code base makes it more readable.

> };
>
> //===------------------------- Typedefs -------------------------===//
> class DefUseBlock;
> class VarDeclMap;
> typedef std::vector<DefUseBlock> DefUseData;
> typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
> typedef std::vector<DefUseNode> DefUseVect;
> typedef std::vector<DefUseNode const*> VarRefsVect;

Something looks potentially wrong here.  DefUseVect is a vector of  
DefUseNode objects, not DefUseNode*.  If the vector ever needs to be  
resized, then the DefUseNode objects will be obliterated, and any  
references to them (which may be in VarRefVect) will suddenly become  
invalid.  DefUseNode is a lightweight object that you can certainly  
keep in a vector, but if you need these objects to have stable pointer  
addresses you will need to manually allocate them and store  
DefUseNode* instead DefUseVect instead.  If you are concerned with  
'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to  
quickly allocate objects.

> class DefUseHelper;
>
> //===------------------------- DefUseHelper ------------------------- 
> ===//
>
> class DefUseHelper : public StmtVisitor<DefUseHelper> {
>  class DefUseHelperImpl;
>  DefUseHelperImpl* pimpl;
>
>  void InitializeValues(DefUseData* data, VarRefBlockMap* bm,  
> VarDeclMap* dm);
>
>  friend class clang::DefUse;
> public:
>  DefUseNode::UseKind current_use;
>  DefUseHelper();
>
>  virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to  
> call the
>                              // super class implementation of the  
> method
>  void HandleDeclStmt(DeclStmt *DS);
>
>  virtual void HandleBinaryOperator(BinaryOperator* B);
>  virtual void HandleConditionalOperator(ConditionalOperator* C);
>  virtual void HandleCallExpr(CallExpr* C);
>  virtual void HandleUnaryOperator(UnaryOperator* U);
>  virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
>
>  void VisitDeclRefExpr(DeclRefExpr *DR) {
>    return HandleDeclRefExpr(DR);
>  }
>  void VisitDeclStmt(DeclStmt *DS) {
>    return HandleDeclStmt(DS);
>  }
>  void VisitBinaryOperator(BinaryOperator* B) {
>    return HandleBinaryOperator(B);
>  }
>  void VisitConditionalOperator(ConditionalOperator* C) {
>    return HandleConditionalOperator(C);
>  }
>  void VisitCallExpr(CallExpr* C) {
>    return HandleCallExpr(C);
>  }
>  void VisitUnaryOperator(UnaryOperator* U) {
>    return HandleUnaryOperator(U);
>  }
>  void VisitArraySubscriptExpr(ArraySubscriptExpr* AS) {
>    return HandleArraySubscriptExpr(AS);
>  }
>
>  void VisitCFGBlock(clang::CFGBlock const& blk, CFGBlock const&  
> entry);
>  void VisitStmt(Stmt* S);
>
>  virtual ~DefUseHelper();
> };
>
> void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,  
> ASTContext& ctx,
>    llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR,  
> ASTContext& ctx,
>    llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext&  
> ctx,
>    llvm::raw_ostream& out);
>
> ASTConsumer* CreateDefUseTestConsumer(llvm::raw_ostream& out);
>

I'm still trying to understand the implementation in its entirety, but  
it seems to me that you should try and decouple the DefUse  
implementation from the observer of its values.  Here it seems like  
they are mixed together, which forces you to use subclassing of  
DefUseHelper to extend it's functionality.  This isn't ideal, as it  
conflates the DefUse algorithm with what desires to consume that  
information.  This muddles the separation of concerns between these  
two concepts, and it constrains future revisions of the DefUse  
implementation to stay within this visitor design.  The consumer of  
the DefUse information may wish to have a visitor design, but I'm not  
certain if you want to expose a visitor pattern in the DefUse  
implementation itself.

As an alternative, consider the "observer" interface provided by  
Clang's LiveVariables analysis ('include/clang/Analysis/Analyses/ 
LiveVariables.h').  Here the LiveVariables implementation is  
completely opaque, but allows clients to provide an "Observer" object  
so they can query liveness information at specific subexpressions.   
The dead stores checker uses this interface to find dead stores, and  
it can do so without any knowledge of how the LiveVariables analysis  
calculates its liveness information.  It makes the API much simpler,  
and disentangles the implementation of LiveVariables from its clients.

More specifically, the HandleXXX methods look like they are part of  
the "observer" interface, while the VisitXXX methods look like they  
are part of the DefUse implementation.  Your use of the pimpl idiom  
implies you want to keep some of this implementation hidden.  Why not  
just separate the implementation of the DefUse algorithm completely  
from the HandleXXX methods and keep those in the DefUseHelper.  The  
DefUse algorithm can then check to see if a DefUseHelper object is  
provided, and if so, call the appropriate HandleXXX methods.

> } // end namespace defuse
>
> //===------------------------- DefUse -------------------------===//
>
> class DefUse {
>  ASTContext const& ctx;
>  defuse::DefUseData const* analysis_data;
>  defuse::VarDeclMap const* decl_map;
>  defuse::VarRefBlockMap const* block_map;
>  unsigned const num_cfg_blocks;

Again, just my comment on the naming convention for variables.

>
>  DefUse(ASTContext const& ctx_,
>      defuse::DefUseData const* analysis_data_,
>      defuse::VarDeclMap const* decl_map_,
>      defuse::VarRefBlockMap const* block_map_,
>      unsigned num_blocks) :

Please do not use trailing underscores in variable names.

>    ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),
>    block_map(block_map_), num_cfg_blocks(num_blocks) {
>  }
>
>  bool isDef(defuse::DefUseNode const& n) const;
>
>  class iterator_impl {
>  public:
>    DefUse const* du;
>    defuse::DefUseNode const* node;
>
>    struct iter {
>      int block_id;
>      defuse::DefUseVect::const_iterator block_it;
>      iter(int blk_id) :
>        block_id(blk_id) {
>      }
>    };
>    iterator_impl() : du(NULL) { }
>    iterator_impl(DefUse const* du_) : du(du_) { }
>    iterator_impl& operator++() {
>      return inc(false);
>    }
>    virtual iterator_impl& inc(bool) = 0;
>  };

Do we need a virtual iterator?  Does iterator_impl need to be  
subclassed?  Not only is this a potential performance problem, but it  
could be the source of subtle bugs if you assigned an object of a  
subclass to that of a parent class.  It seems to me that defs_iterator  
and uses_iterator can be merged (substantially reducing the code), and  
simply add a flag to the iterator indicating whether it is for uses or  
defs.  If you want strong typing, you certainly can have subclassing  
of the iterators, but I think the entire implementation can get sucked  
into a common class.  Then the "inc" just checks the flag and skips of  
uses when you are iterating over defs and vis versa.  You then remove  
the vtable pointer, simplify the inheritance tree, and the  
implementation becomes a lot more readable.  There is also more  
chances for inlining by the compiler (i.e., if the definition of 'inc'  
is provided in the class declaration).


> public:
>  class uses_iterator : public std::iterator<std::input_iterator_tag,
>    DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
>    public iterator_impl {
>    iter iter_ptr;
>    bool inDefBlock;
>
>    uses_iterator() :
>      iterator_impl(), iter_ptr(-1), inDefBlock(true) { }
>    uses_iterator(DefUse const* du, defuse::DefUseNode const& n);
>    uses_iterator& inc(bool first);
>    friend class DefUse;
>  public:
>    bool operator!=(uses_iterator const& iter);
>    DeclRefExpr const* operator*();
>  };
>
>  class defs_iterator : public std::iterator<std::input_iterator_tag,
>  defuse::DefUseNode, std::ptrdiff_t, defuse::DefUseNode const*>,
>  public iterator_impl {
>    struct iter_ : public iter {
>      defuse::VarRefsVect::const_iterator reaches_it;
>      iter_(int blk_id) :
>        iter(blk_id) { }
>    } iter_ptr;
>    bool blockDef;
>
>    defs_iterator() :
>      iterator_impl(), iter_ptr(-1), blockDef(false) { }
>    defs_iterator(DefUse const* du, DeclRefExpr const& n);
>    defs_iterator& inc(bool first);
>    friend class DefUse;
>  public:
>    bool operator!=(defs_iterator const& iter);
>    defuse::DefUseNode const* operator*();
>  };
>
>  // USES //
>  defs_iterator defs_begin(DeclRefExpr const* var) const;
>  defs_iterator defs_end() const;
>
>  // DEFS //
>  uses_iterator uses_begin(DeclRefExpr const* var) const;
>  uses_iterator uses_begin(VarDecl const* var) const;
>  uses_iterator uses_end() const;
>
>  bool isUse(DeclRefExpr const* var) const;
>  bool isDef(DeclRefExpr const* var) const {
>    return isDef(defuse::DefUseNode(var, defuse::DefUseNode::Def));
>  }
>  bool isDef(VarDecl const* var) const {
>    return isDef(defuse::DefUseNode(var));
>  }
>
>  ~DefUse();
>
>  static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,
>      CFG* cfg, ParentMap* pm, defuse::DefUseHelper* helper = 0,
>      bool verbose = false, llvm::raw_ostream& out = llvm::outs());
> };
>
> } // end namespace clang
>
> #endif
> //===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*- 
> ===//
> //
> //                     The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open  
> Source
> // License. See LICENSE.TXT for details.
> //
> // 
> = 
> = 
> = 
> --------------------------------------------------------------------- 
> ===//
> //
> // This file contains the implementation of DefUse analysis
> //
> // 
> = 
> = 
> = 
> --------------------------------------------------------------------- 
> ===//
>
> #define __STDC_LIMIT_MACROS
> #define __STDC_CONSTANT_MACROS
>
> #include "DefUse.h"
>
> #include "clang/Basic/SourceManager.h"
> #include "clang/Analysis/CFG.h"
> #include <llvm/Support/raw_ostream.h>
>
> // todo: Use DenseMap and llvm sets
> #include <set>
> #include <map>
>
> using namespace clang;
> using namespace defuse;
>
> // utility functions
> static unsigned Line(SourceLocation const& l, clang::SourceManager  
> const& sm) {
>  PresumedLoc pl = sm.getPresumedLoc(l);
>  return pl.getLine();
> }
>
> static unsigned Column(SourceLocation const& l, SourceManager const&  
> sm) {
>  PresumedLoc pl = sm.getPresumedLoc(l);
>  return pl.getColumn();
> }
>
> static std::string PrintClangStmt(Stmt const* stmt, ASTContext  
> const& ctx) {
>  std::string str;
>  llvm::raw_string_ostream pp(str);
>  stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
>  return pp.str();
> }
>
> //===------------------------- DefUseNode ------------------------- 
> ===//
> inline bool defuse::DefUseNode::operator==(DefUseNode const& n)  
> const {
>  if (n.usage != usage || n.kind != kind) return false;
>  return kind == VarDecl ? var_decl == n.var_decl : var_ref ==  
> n.var_ref;
> }
>
> inline clang::VarDecl const* DefUseNode::getDecl() const {
>  return kind == VarRef ? dyn_cast<clang::VarDecl> (var_ref->getDecl 
> ()) :
>    dyn_cast<clang::VarDecl> (var_decl);
> }
>
> typedef std::set<DefUseNode const*> VarRefsSet;
> typedef std::set<VarDecl const*> VarDeclsSet;
> typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
>
> //===------------------------- DefUseBlock ------------------------- 
> ===//
>
> class defuse::DefUseBlock : public DefUseChain {

Please use encapsulation instead of subclassing.  In this case you are  
subclassing std::map, and that is something there rarely is ever a  
good reason to do.  Encapsulation ensures isolation between the two  
classes, and forces you to think in terms of interfaces instead of  
making assumptions about the implementation of the underlying data  
structure.  Almost all of LLVM is structured this way, and it makes  
code far more resilient to implementation changes in the employed data  
structures.

Instead, either have DefUseChain contain an std::map (and then define  
the proper interfaces in DefUseChain) or have DefUseBlock contain a  
DefUseChain, not subclass it.  The former forces you to define only  
the interfaces you need for DefUseChain, instead of exposing the  
entirety of std::map (which isn't the interface you're intending to  
vend, even if it is just to your client code within the DefUse  
implementation).

> public:
>  VarDeclsSet uses;
>  VarRefsVect defsout;
>  VarDeclsSet killed;
>  VarRefsVect reaches;
>
>  class not_killed_set_iterator : public  
> std::iterator<std::input_iterator_tag,
>      DefUseNode, std::ptrdiff_t, DefUseNode const*> {
>    DefUseBlock& block;
>    VarRefsVect::const_iterator ptr_it;
>
>    not_killed_set_iterator& inc(bool first) {
>      if (ptr_it != block.reaches.end() && !first)
>        ++ptr_it;
>      while (ptr_it != block.reaches.end() &&
>          (block.killed.find((*ptr_it)->getDecl()) != block.killed.end 
> () &&
>          std::find(block.defsout.begin(), block.defsout.end(),  
> *ptr_it)
>          == block.defsout.end()))
>        ++ptr_it;
>      return *this;
>    }
>  public:
>    not_killed_set_iterator(DefUseBlock& block_,
>        VarRefsVect::const_iterator ptr_it_) :
>      block(block_), ptr_it(ptr_it_) { inc(true); }
>    not_killed_set_iterator& operator++() { return inc(false); }
>    bool operator!=(not_killed_set_iterator const& iter) const {
>      return !((&iter.block) == &(block) && iter.ptr_it == ptr_it);
>    }
>    const DefUseNode* const & operator*() const { return *ptr_it; }
>  };
>
>  not_killed_set_iterator begin_not_killed_set() {
>    return DefUseBlock::not_killed_set_iterator(*this, reaches.begin 
> ());
>  }
>  not_killed_set_iterator end_not_killed_set() {
>    return DefUseBlock::not_killed_set_iterator(*this, reaches.end());
>  }
> };
>
> //===------------------------- VarDeclMap ------------------------- 
> ===//
>
> class defuse::VarDeclMap : public StmtVisitor<VarDeclMap> ,
>    public llvm::DenseMap<VarDecl const*, DeclStmt const*> {
> public:
>  VarDeclMap(Stmt const* body) { VisitStmt(const_cast<Stmt*> (body)); }
>
>  void VisitDeclStmt(DeclStmt *DS) {
>    for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS- 
> >decl_end();
>        I != E; ++I)
>    {
>      if (VarDecl *VD = dyn_cast<VarDecl>(*I))
>        insert(std::make_pair(VD, DS));
>    }
>  }
>  void VisitStmt(Stmt* S) {
>    for (Stmt::child_iterator I = S->child_begin(), E = S->child_end();
>        I != E; ++I)
>      if (*I) Visit(*I);
>  }
> };
>
> //===------------------------- DefUseHelper ------------------------- 
> ===//
> struct defuse::DefUseHelper::DefUseHelperImpl {
>  DefUseData* data;
>  VarRefBlockMap* bm;
>  VarDeclMap* dm;
>  unsigned blockID;
>
>  DefUseHelperImpl() :
>    data(NULL), bm(NULL), dm(NULL) { }
> };

Per my comments above, it seems to me this pimpl logic can be sucked  
into a common class with the main logic for the DefUse algorithm, and  
then make that whole thing private to DefUse.cpp.  That will be a lot  
simpler, and should migrate many of the implementation details from  
the header file to the .cpp file.  Then clients just pass the  
DefUseHelper object (which has the HandleXXX methods) to  
BuildDefUseChains(), and the implementation class stays completely  
hidden (thus separating the DefUse algorithm from the client interface  
for querying that information).

>
> defuse::DefUseHelper::DefUseHelper() :
>  pimpl(new DefUseHelperImpl), current_use(DefUseNode::Use) { }
>
> defuse::DefUseHelper::~DefUseHelper() { delete pimpl; }
>
> void defuse::DefUseHelper::InitializeValues(DefUseData* data,
>    VarRefBlockMap* bm, VarDeclMap* dm) {
>  pimpl->data = data;
>  pimpl->bm = bm;
>  pimpl->dm = dm;
> }
>
> void defuse::DefUseHelper::HandleDeclRefExpr(DeclRefExpr *DR) {
>  VarRefBlockMap& bm = *pimpl->bm;
>  unsigned blockID = pimpl->blockID;
>  DefUseBlock& block_data = (*pimpl->data)[blockID];
>  if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl())) {
>    // map this variable to the current block
>    bm[DR] = blockID;
>    if (block_data.find(VD) == block_data.end()) {
>      block_data.insert(std::make_pair(VD, DefUseVect()));
>      block_data.uses.insert(VD); // this variable is used in the  
> current
>      //  block and it has no prior definition within the block
>    }
>    if (current_use == DefUseNode::UseDef) { // in case of compound  
> operators
>      DefUseVect &B = block_data[VD];
>      B.push_back(DefUseNode(DR, DefUseNode::Use));
>      B.push_back(DefUseNode(DR, DefUseNode::Def));
>    } else block_data[VD].push_back(DefUseNode(DR, current_use));
>  }
> }
>
> void defuse::DefUseHelper::HandleDeclStmt(DeclStmt *DS) {
>  for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end 
> ();
>      I != E; I++)
>  {
>    if (VarDecl* VD = dyn_cast<VarDecl> (*I)) {
>      VarRefBlockMap& bm = *pimpl->bm;
>      VarDeclMap& dm = *pimpl->dm;
>      unsigned blockID = pimpl->blockID;
>      DefUseBlock& block_data = (*pimpl->data)[blockID];
>
>      assert((block_data.find(VD) == block_data.end()) &&
>          "Variable redefined in the same block");
>      block_data[VD].push_back(DefUseNode(VD));
>      bm[dm[VD]] = blockID;
>    }
>  }
> }
>
> void defuse::DefUseHelper::HandleBinaryOperator(BinaryOperator* B) {
>  DefUseNode::UseKind backup = current_use; // backup the usage
>  if (B->isAssignmentOp()) {
>    current_use = DefUseNode::Use;
>    Visit(B->getRHS());
>    current_use = DefUseNode::Def;
>    if (B->isCompoundAssignmentOp())
>      current_use = DefUseNode::UseDef;
>    Visit(B->getLHS());
>  } else {
>    Visit(B->getLHS());
>    current_use = DefUseNode::Use;
>    Visit(B->getRHS());
>  }
>  current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::HandleConditionalOperator 
> (ConditionalOperator* C) {}
>
> void defuse::DefUseHelper::HandleCallExpr(CallExpr* C) {
>  DefUseNode::UseKind backup = current_use; // backup the usage
>  for (CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end();
>      I != E; ++I)
>  {
>    if ((*I)->getType()->isPointerType() || (*I)->getType()- 
> >isReferenceType())
>      current_use = DefUseNode::Def;
>    Visit(*I);
>    current_use = backup;
>  }
> }
>
> void defuse::DefUseHelper::HandleUnaryOperator(UnaryOperator* U) {
>  DefUseNode::UseKind backup = current_use; // backup the usage
>  switch (U->getOpcode()) {
>  case UnaryOperator::PostInc:
>  case UnaryOperator::PostDec:
>  case UnaryOperator::PreInc:
>  case UnaryOperator::PreDec:
>    current_use = DefUseNode::UseDef;
>    break;
>  case UnaryOperator::Plus:
>  case UnaryOperator::Minus:
>  case UnaryOperator::Not:
>  case UnaryOperator::LNot:
>    current_use = DefUseNode::Use;
>    break;
>  case UnaryOperator::AddrOf:
>  case UnaryOperator::Deref:
>    // use the current_use
>    break;
>  default:
>    // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode 
> ()) <<
>    // " not supported in def-use analysis");
>    break;
>  }
>  Visit(U->getSubExpr());
>  current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::HandleArraySubscriptExpr 
> (ArraySubscriptExpr* AS) {
>  DefUseNode::UseKind backup = current_use; // backup the usage
>  Visit(AS->getBase());
>  current_use = DefUseNode::Use;
>  Visit(AS->getIdx());
>  current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::VisitCFGBlock(CFGBlock const& blk,
>    CFGBlock const& entry) {
>  pimpl->blockID = blk.getBlockID();
>  for (CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I !=  
> E; ++I)
>    Visit(*I);
>
>  unsigned blockID = pimpl->blockID;
>  DefUseBlock& block_data = (*pimpl->data)[blockID];
>
>  // Post processing
>  // build up the uses(b), defsout(b) and killed(b)
>
>  // for each variable
>  for (DefUseBlock::iterator I = block_data.begin(), E =  
> block_data.end(); I
>      != E; ++I) {
>    // in case of references to ParmVar what we do is insert the  
> variable
>    // definition in the first block of the CFG
>    DefUseBlock& root_data = (*pimpl->data)[entry.getBlockID()];
>    if (isa<ParmVarDecl>(I->first) &&
>        std::find(root_data[I->first].begin(), root_data[I->first].end 
> (),
>        DefUseNode(I->first)) == root_data[I->first].end())
>    {
>      root_data[I->first].push_back(DefUseNode(I->first));
>      root_data.defsout.push_back(&root_data[I->first].back());
>    }
>    // if this is a local variable, should't be in the defsout,killed  
> sets
>    // defsout is calculated by iterating backwards to find the last  
> use of
>    // the variable
>    for (DefUseVect::const_reverse_iterator UI = I->second.rbegin(),  
> UE =
>        I->second.rend(); UI != UE; ++UI)
>      if (UI->getUse() == DefUseNode::Def && (block_data.defsout.empty 
> ()
>          || block_data.defsout.back()->getDecl() != I->first)) {
>        block_data.defsout.push_back(&(*UI));
>        block_data.killed.insert(UI->getDecl());
>        break;
>      }
>  }
> }

Are these handle methods just part of the main DefUse algorithm?  If  
so, consider sucking them into the Visit methods (thus removing the  
virtual dispatch), and have them call the HandleXXX methods in a  
separate DefUseHelper object if one was provided.  That way the  
algorithm stays separate from those who want to observe its values.

I know I keep saying this point, but a critical part of making this  
clean is making it clear where the def-use implementation is and where  
we would want to insert hooks to query information (including auditing  
the algorithm).  Making those interfaces clear is key.  Ideally, the  
header file should define just the functions we need to run the  
algorithm and query the def-use information, nothing more.  I fully  
admit that I just might be confused with the details of the  
implementation.

There is a bunch of other code that I'm afraid I haven't waded through  
yet, as there looks like there are some really meaty pieces there.   
I've noticed some uses of std::set and performing set union's, etc.   
Note that there may be a really high cost to using such methods, as  
there will be implicit allocations from malloc, etc.  There may  
possibly be alternative data structures within LLVM that you can use  
for your purposes that perform far better, but I haven't looked at  
your implementation close enough to make more specific comments yet.   
I've given you enough comments, however, in this email to await any  
feedback you want to provide on the next iteration.

Again, this is really great you're interested in contributing this  
back, and I look forward to hearing your reply.

Best,
Ted



More information about the cfe-dev mailing list