[cfe-dev] Constructing use-def chains

Ted Kremenek kremenek at apple.com
Tue Sep 8 18:24:08 PDT 2009


On Sep 4, 2009, at 8:43 AM, Simone Pellegrini wrote:

> Hi again,
> Here it is my current implementation of DefUse chain analysis. I  
> adjust the code to be compliant with some of the clang rules (I had  
> some problem in switching from std::map to llvm::DenseMap and I will  
> try to better investigate this issue in the future).
>
> So as this is not a patch, but a complete new feature I don't know  
> where it should be placed in the source tree (clang/Analysis seems  
> fine), so I will let u decide. The analysis is composed by 2 files  
> DefUse.h and DefUse.cpp which represent respectively the interface  
> and the implementation of the DefUse analysis. The usage of DefUse  
> is very similar to CFG. You build the all DefUse chains for a  
> specific piece of code (usually a function body) and then u start  
> querying the object.
>
> #include "DefUse.h"
> using namespace defuse;
>
> clang::FunctionDecl* fd = ...
> DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx);
>
> now if we have a reference to a variable inside the function body  
> (usually a DeclRefExpr) we can query the DefUse object to know if  
> the VarRef is a "definition" or an "use":
>
> DeclRefExpr* DR = ...
> DU->isDef(DR);
> DU->isUse(DR);
>
> Now if our variable is a "definition" for example we can iterate  
> among all the uses referred to the definition in the following way:
> for(DefUse::uses_iterator I = DU->uses_begin(DR), E = DU->uses_end 
> (); I != E, I++){
>   DeclRefExpr* U = *I;
>   ...
> }
>
> If the variable is a use, we can iterate among all the definitions:
> for(DefUse::defs_iterator I = DU->defs_begin(DR), E = DU->defs_end 
> (); I != E, I++){
>  ...
> }
> here the problem is that a definition can be either a DeclRefExpr in  
> situations like:
> -> a = s; // this is a definition for var. a
>
> or a VarDecl in situations like:
> -> int a; // this is a definition for variable a
>
> For these reasons I have introduced a wrapper class called  
> DefUseNode (the name can be changed) that can contains VarDecls or  
> DeclRefExprs.
>
> And that's the basic usage of DefUse.
>
> A more advanced use is also possible, I don't want to go into the  
> details of the algorithm used to compute the defuse chains, but you  
> can imagine it divided into 2 steps. In the first steps all the  
> variable references are collected and marked as definitions or uses  
> depending on the semantic of the instruction and then, in the second  
> step, this informations are elaborated in order to build the graph.
> We can discuss about it but It comes out that marking the use of a  
> variable can depends from the context, and the semantics can be  
> different. For example if we are calling a function and the variable  
> is passed as a reference or pointer we have to assume that this is a  
> definition for the variable, but if we know the semantics of the  
> function and we are sure the variable is not modified in the  
> function body we can "risk" a little and mark the variable as an  
> use. For this reason different semantics can be defined by the user  
> in order to cover specific cases.
>
> This is possible by extending the DefUseBasicHelper. The basic  
> helper already implements the default semantics (a very conservative  
> one) but it exposes some virtual methods that can be extended by the  
> user who wants to implement a different behavior. The helper should  
> be passed to the BuildDefUseChain method in this way:
>
> DefUseCustomHelper helper;
> DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx,  
> &helper);
> ...
>
> At last, I added a class to test the algorithm DefUseChainTest. I  
> produces a lot of output (badly formatted) but you can easily see  
> how ( and if :) ) the whole analysis works just by:
> defuse::DefUseChainTest Consumer(llvm::outs());
> ...
> ParseAST(PP, Consumer, *ContextOwner, false);
>
> The test class is also meant to be an example of how to use the  
> DefUse.
>
> N.B.: As I already stated in my previous emails it is important  
> (when and if the patch will be committed in the CLANG source tree)  
> the institution where I am working, the "University of Innsbruck",  
> and if possible my name (Simone Pellegrini) appear in the  
> CREDITS.TXT file as contributors for this type of analysis.
>
> have fun with DefUses, cheers Simone
>
> P.S.: Unfortunately next week I will be on a business trip so not  
> reachable. I will reply and address your comments/feedbacks as soon  
> as I am back in the office.

Hi Simone,

Sorry for the delay in replying!  The overall interface for this class  
looks pretty good.  I have a bunch of comments with regards to the  
implementation and the coding style that we should iterate on before  
committing back to the Clang tree.  My comments are inline with the  
code below.  Feel free to reply whenever you have time.

As for where this code should live, I suggest:

include/clang/Analysis/Analyses/DefUse.h
lib/Analysis/DefUse.cpp

My comments reflect a first pass over this patch.  There is a lot  
going on here, so I have focused only on a subset of the details.

>
> //===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*- 
> ===//
> //
> //                     The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open  
> Source
> // License. See LICENSE.TXT for details.
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> //
> // This file contains the declaration of DefUse analysis headers
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
>
> #ifndef DEF_USE_HPP_
> #define DEF_USE_HPP_

Please use "LLVM_CLANG_DEFUSE_H" to avoid any conflict with other  
defuse implementations.

> #include "llvm/ADT/DenseMap.h"
>
> #include "clang/AST/ASTConsumer.h"
> #include "clang/AST/ParentMap.h"
> #include "clang/AST/StmtVisitor.h"

Looks fine.

>
> #include <exception>
> #include <stdexcept>

We don't use C++ exceptions at all in LLVM, largely for performance  
reasons.

> namespace clang {
>
> // forward definitions
> class CFG;
> class CFGBlock;
>
> namespace defuse {
>
> //===------------------------- Exceptions ------------------------- 
> ===//
>
> struct NotDefExcpetion {
>    Stmt const* stmt;
>    NotDefExcpetion(Stmt const* stmt_): stmt(stmt_){}
> };
> struct NotUseExcpetion {
>    DeclRefExpr const* expr;
>    NotUseExcpetion(DeclRefExpr const* expr_): expr(expr_){}
> };

These will need to be replaced with an alternate mechanism.

>
> //===------------------------- DefUseNode ------------------------- 
> ===//
> // Keeps the information for a particular 'definition' or 'use' of a  
> variable
> // The structure is needed because a defuse chain can contains  
> variable declarations
> // as well as variable reference. As a DeclStmt can contains several  
> declarations
> // to address a particular variable we need to store its VarDecl or  
> DeclRefExp
> struct DefUseNode{
> 	enum NodeKind{ VarDecl, VarRef };
> 	enum UseKind{ Use, Def, UseDef };
>
> 	DefUseNode(clang::VarDecl const* decl): var_decl(decl), kind 
> (VarDecl), usage(Def){}
> 	DefUseNode(DeclRefExpr const* ref, UseKind u = Use): var_ref(ref),  
> kind(VarRef), usage(u){}
>
> 	clang::VarDecl const* getDecl() const;
>
> 	NodeKind const& getKind()const { return kind; }
> 	UseKind const&  getUse() const { return usage; }
>
> 	clang::VarDecl const* getVarDecl() const {
> 		assert(kind==VarDecl); return var_decl;
> 	}
> 	DeclRefExpr const* getVarRef() const {
> 		assert(kind==VarRef); return var_ref;
> 	}
>
> 	bool operator==(DefUseNode const& n) const;
>
> private:
> 	// a def-use node can be either a VarDecl or a DeclRefExpr
> 	union{
> 		clang::VarDecl	const* 	var_decl;
> 		DeclRefExpr const* 		var_ref;
> 	};
> 	NodeKind 	kind;
> 	UseKind 	usage;
> };

Since this 'struct' has private data, please make it a class instead:

http://llvm.org/docs/CodingStandards.html#ci_class_struct


>
> //===------------------------- Typedefs -------------------------===//
> class DefUseBlock;
> class VarDeclMap;
> typedef std::vector<DefUseBlock>				DefUseData;
> typedef llvm::DenseMap<Stmt const*, unsigned> 	VarRefBlockMap;
> typedef std::vector<DefUseNode>					DefUseVect;
> typedef std::vector<DefUseNode const*> 			VarRefsVect;
>
> class DefUseBasicHelper;
>
> //===------------------------- DefUse -------------------------===//
>
> class DefUse{
> 	ASTContext const&		ctx;
> 	DefUseData const* 		analysis_data;
> 	VarDeclMap const*		decl_map;
> 	VarRefBlockMap const* 	block_map;
> 	unsigned const			num_cfg_blocks;
>
> 	DefUse(ASTContext const& ctx_, DefUseData const* analysis_data_,  
> VarDeclMap const* decl_map_,
> 	        VarRefBlockMap const* block_map_, unsigned num_blocks):
> 		ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),  
> block_map(block_map_),
> 		num_cfg_blocks(num_blocks){ }
>
> 	bool isDef(DefUseNode const& n) const;
>
> 	struct iterator_impl{

To make MSVC happy, this should be a 'class'.

> 		DefUse const* 		du;
> 		DefUseNode const*	node;
>
> 		struct iter{
> 			int		 					block_id;
> 			DefUseVect::const_iterator 	block_it;
> 			iter(int blk_id): block_id(blk_id){}
> 		};
>
> 		iterator_impl(): du(NULL){ }
> 		iterator_impl(DefUse const* du_): du(du_){ }
> 		iterator_impl& operator++() { return inc(false); }
> 		iterator_impl& operator++(int) { return inc(false); }
> 		virtual iterator_impl& inc(bool) = 0;
> 	};
> public:
> 	class uses_iterator: public std::iterator<std::input_iterator_tag,  
> DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
>                         public iterator_impl{
> 		iter iter_ptr;
> 		bool inDefBlock;
>
> 		uses_iterator(): iterator_impl(), iter_ptr(-1), inDefBlock(true){}
> 		uses_iterator(DefUse const* du, DefUseNode const& n);
> 		uses_iterator& inc(bool first);
> 		friend class DefUse;
> 	public:
> 		virtual bool operator!=(uses_iterator const& iter);

Does this need to be virtual?

> 		virtual DeclRefExpr const* operator*();

Does this need to be virtual?

> 	};
>
> 	class defs_iterator: public std::iterator<std::input_iterator_tag,  
> DefUseNode, std::ptrdiff_t, DefUseNode const*>,
>                         public iterator_impl{
> 		struct iter_: public iter{
> 			VarRefsVect::const_iterator 	reaches_it;
> 			iter_(int blk_id): iter(blk_id){}
> 		} iter_ptr;
> 		bool blockDef;
>
> 		defs_iterator(): iterator_impl(), iter_ptr(-1), blockDef(false) {}
> 		defs_iterator(DefUse const* du, DeclRefExpr const& n);
> 		defs_iterator& inc(bool first);
> 		friend class DefUse;
> 	public:
> 		bool operator!=(defs_iterator const& iter);
> 		DefUseNode const* operator*();
> 	};
>
> 	// USES //
> 	defs_iterator defs_begin(DeclRefExpr const* var) const;
> 	defs_iterator defs_end() const;

Please add doxygen comments for these methods.

>
> 	// DEFS //
> 	uses_iterator uses_begin(DeclRefExpr const* var) const;
> 	uses_iterator uses_begin(VarDecl const* var) const;
> 	uses_iterator uses_end() const;

Please add doxygen comments for these methods.

>
> 	bool isUse(DeclRefExpr const* var) const;
> 	bool isDef(DeclRefExpr const* var) const { return isDef(DefUseNode 
> (var, DefUseNode::Def)); }
> 	bool isDef(VarDecl const* var) const { return isDef(DefUseNode 
> (var)); }
>
> 	~DefUse();
>
> 	static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,  
> DefUseBasicHelper* helper = 0,
> 										CFG* cfg = 0, ParentMap* pm = 0, bool verbose = false,
> 										llvm::raw_ostream& out = llvm::outs());
> };

I think having a "defuse" namespace is fine (for all the special  
typedefs, etc.), but it should only contain the implementation details  
for DefUse.  The DefUse class should be in the clang namespace.

>
> //===------------------------- DefUseBasicHelper  
> -------------------------===//

It seems like DefUseBasicHelper allows a bunch of hooks for subclasses  
via the 'HandleXXX' methods, but this level of hooks is not going to  
fair well from a performance standpoint.  I believe that one base  
class can implement most of the HandleXXX logic without making those  
methods virtual (and thus dispense with the HandleXXX methods entirely  
and just have VisitXXX methods).  I think most subclasses will need  
very few hooks, if any at all.  I think it's worth identifying the one  
or two methods that need to be virtual and then leave the rest of them  
as non-virtual methods.

Also, if this truly is the base class, how about calling it  
'DefUseHelper'?  That seems more succinct and clear from an interface  
perspective.

>
> class DefUseBasicHelper: public StmtVisitor<DefUseBasicHelper>{
> 	class DefUseBasicHelperImpl;
> 	DefUseBasicHelperImpl* 	pimpl;
>
> 	void InitializeValues(DefUseData* data, VarRefBlockMap* bm,  
> VarDeclMap* dm);
>
> 	friend class DefUse;
> public:
> 	DefUseNode::UseKind current_use;
> 	DefUseBasicHelper();
>
> 	virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to  
> call the super class implementation of the method
> 	void HandleDeclStmt(DeclStmt *DS);
>
> 	virtual void HandleBinaryOperator(BinaryOperator* B);
> 	virtual void HandleConditionalOperator(ConditionalOperator* C);
> 	virtual void HandleCallExpr(CallExpr* C);
> 	virtual void HandleUnaryOperator(UnaryOperator* U);
> 	virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
> 	// ...

Is there a need for the '// ..' line?

>
> 	void VisitDeclRefExpr(DeclRefExpr *DR){ return HandleDeclRefExpr 
> (DR); }
> 	void VisitDeclStmt(DeclStmt *DS){ return HandleDeclStmt(DS); }
> 	void VisitBinaryOperator(BinaryOperator* B){ return  
> HandleBinaryOperator(B); }
> 	void VisitConditionalOperator(ConditionalOperator* C){ return  
> HandleConditionalOperator(C); }
> 	void VisitCallExpr(CallExpr* C){ return HandleCallExpr(C); }
> 	void VisitUnaryOperator(UnaryOperator* U){ return  
> HandleUnaryOperator(U); }
> 	void VisitArraySubscriptExpr(ArraySubscriptExpr* AS){ return  
> HandleArraySubscriptExpr(AS); }
> 	// ...

Is there a need for the '// ..' line?

>
> 	void VisitCFGBlock(clang::CFGBlock const& blk, int root);

The 'root' argument is not necessary and should  be removed (see  
comment below).

> 	void VisitStmt(Stmt* S);
>
> 	virtual ~DefUseBasicHelper();
> };
>
> //===------------------------- DefUseChainTest  
> -------------------------===//
>
> class DefUseChainTest: public StmtVisitor<DefUseChainTest>, public  
> ASTConsumer {
> 	llvm::raw_ostream&		out;
> 	ASTContext* 			ctx;
> 	DefUse const*			du;
>
> public:
> 	DefUseChainTest(llvm::raw_ostream& out_): out(out_), ctx(NULL), du 
> (NULL) {	}
>
> 	void Initialize(ASTContext& context) { ctx = &context; }
> 	void HandleTopLevelDecl(DeclGroupRef declGroupRef);
> 	// void HandleTranslationUnit(clang::ASTContext& context);
>
> 	void VisitDeclRefExpr(DeclRefExpr *DR);
> 	void VisitDeclStmt(DeclStmt *DS);
> 	void VisitStmt(Stmt* S);
> };
>
> void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,  
> ASTContext& ctx, llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR,  
> ASTContext& ctx, llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext&  
> ctx, llvm::raw_ostream& out);
>
> } // end namespace defuse

The DefUseChainTest can be made entirely private in DefUse.cpp, and  
have a static function create it (and return an ASTConsumer).  There  
is no need to have it in the public API interface.

> } // end namespace clang
>
> #endif /* DEF_USE_HPP_ */

Some compilers complain about the extra tokens after #endif.  Please  
remove this comment.


> //===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*- 
> ===//
> //
> //                     The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open  
> Source
> // License. See LICENSE.TXT for details.
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
> //
> // This file contains the implementation of DefUse analysis
> //
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
>
> #define __STDC_LIMIT_MACROS
> #define __STDC_CONSTANT_MACROS

What are these two defines used for?

>
> #include "DefUse.h"
>
> #include "clang/Basic/SourceManager.h"
> #include "clang/Analysis/CFG.h"
>
> #include <llvm/Support/raw_ostream.h>
>
> #include <set>
> #include <map>

Eventually the code should probably use DenseSet and DenseMap for  
performance reasons, as std::map and std::set are very slow and make  
heavy use of malloc().  I understand that a significant amount of code  
may need to be tweaked, but the performance gain can easily be 2-3x.

>
> using namespace clang;
> using namespace defuse;
>
> // ASSERT MACRO
> #define ASSERT(COND, MSG) do{ \
> 	std::string str; \
> 	llvm::raw_string_ostream ss(str); \
> 	ss << MSG; \
> 	assert((COND) && ss.str().c_str()); }while(false)

I don't think this will do what you intend.  If I did:

ASSERT(false, 10)

I'd see:

   "false && ss.str().c_str()"

as the failed assertion condition in the terminal output.

One option to get better error reporting is to use the facilities in  
'clang/Basic/PrettyStackTrace.h', which are used in various places in  
Clang.  These can be used to print the location within the analyzed  
source that the assertion fails.  In general we don't like specialized  
error reporting machinery for just one file (as in the case of  
'ASSERT'), but like adding to the general error reporting machinery  
that all of LLVM/Clang can use.

I also don't think this ASSERTION macro adds much value and causes  
some of the client code below to be structured in some slightly  
suboptimal ways.

>
> // utility functions
> unsigned Line(SourceLocation const& l, clang::SourceManager const&  
> sm) {
> 	PresumedLoc pl = sm.getPresumedLoc(l);
> 	return pl.getLine();
> }
>
> unsigned Column(SourceLocation const& l, SourceManager const& sm) {
> 	PresumedLoc pl = sm.getPresumedLoc(l);
> 	return pl.getColumn();
> }

Instead of adding these functions, we can add 'getPresumedLoc' to  
FullSourceLoc and then use the 'getColumn()' and 'getLine()' methods  
from that class.

>
> std::string PrintClangStmt(Stmt const* stmt, ASTContext const& ctx){
> 	std::string str;
> 	llvm::raw_string_ostream pp(str);
> 	stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
> 	return pp.str();
> }

I don't think this function is needed.  If you need it, please declare  
it static to keep it private to this file.


>
> //===------------------------- DefUseNode ------------------------- 
> ===//
> inline bool defuse::DefUseNode::operator==(DefUseNode const& n) const{
> 	if(n.usage != usage || n.kind != kind)
> 		return false;

Please add a space between 'if' and the '('.  This is the coding style  
used throughout LLVM.

Also, please use 2 space indentation like the rest of LLVM and Clang.


> 	if(kind == VarDecl)
> 		return var_decl == n.var_decl;
> 	else
> 		return var_ref == n.var_ref;
> 	return false;
> }

The 'return false' is dead code.  Please remove it.

This can be more succinctly written as:

   return kind == VarDecl ? var_decl == n.var_decl : var_Ref ==  
n.var_ref;

Alternatively:

   if (kind == VarDecl)
     return var_decl == n.var_decl;

   return var_ref == n.var_ref;

> inline clang::VarDecl const* DefUseNode::getDecl() const{
> 	if(kind == VarRef)
> 		return cast<clang::VarDecl>(var_ref->getDecl());
> 	return cast<clang::VarDecl>(var_decl);
> }

Same as the previous comment.

   if (kind == VarRef)
     return cast<VarDecl>(var_ref->getDecl());

   return cast<VarDecl>(var_decl);

or:

   return kind == VarRef ? cast<VarDecl>(var_ref->getDecl()) :  
cast<VarDecl>(var_decl);

>
> typedef std::set<DefUseNode const*> 			VarRefsSet;
> typedef std::set<VarDecl const*> 				VarDeclsSet;
> typedef std::map<VarDecl const*, DefUseVect> 	DefUseChain;

Please use llvm::DenseSet and llvm::DenseMap.  You'll need to use  
pointers to DefUseVects instead (and then destroy them), but the speed  
improvement will be significant.

>
> //===------------------------- DefUseBlock ------------------------- 
> ===//
>
> struct defuse::DefUseBlock: public DefUseChain{

This should be 'class' instead of 'struct'.  You declared it earlier  
as a 'class'.  While gcc will eat this, MSVC will not.  Also, since  
this isn't POD, it should be a class anyway.


> 	VarDeclsSet 	uses;
> 	VarRefsVect 	defsout;
> 	VarDeclsSet		killed;
> 	VarRefsVect 	reaches;
>
> 	class not_killed_set_iterator: public  
> std::iterator<std::input_iterator_tag, DefUseNode, std::ptrdiff_t,  
> DefUseNode const*>{
> 		DefUseBlock& 					block;
> 		VarRefsVect::const_iterator 	ptr_it;
>
> 		not_killed_set_iterator& inc(bool first){
> 			if(ptr_it != block.reaches.end() && !first)
> 				ptr_it++;

Please use '++ptr_it' instead of 'ptr_it++'.  While it makes no  
difference in this case, general speaking it performs better and there  
is no reason to make any assumptions about the underlying data  
structure and how the iterators are implemented.

> 			while(ptr_it != block.reaches.end() &&
> 					(block.killed.find((*ptr_it)->getDecl()) != block.killed.end() &&
> 							std::find(block.defsout.begin(), block.defsout.end(),  
> *ptr_it) == block.defsout.end()))
> 				ptr_it++;
> 			return *this;
> 		}
> 	public:
> 		not_killed_set_iterator(DefUseBlock& block_,  
> VarRefsVect::const_iterator ptr_it_): block(block_), ptr_it(ptr_it_) 
> { inc(true); }
> 		not_killed_set_iterator& operator++() { return inc(false); }
> 		not_killed_set_iterator& operator++(int) { return inc(false); }

Do we even need an operator++(int)?  By disallowing it people won't  
accidentally use it.  This also isn't the standard semantics of this  
operator, so by implementing it this way clients will silently be  
doing the wrong thing instead of getting a compilation error.

> 		bool operator!=(not_killed_set_iterator const& iter) {	return ! 
> ((&iter.block) == &(block) && iter.ptr_it == ptr_it);	}

This method can (and should) be declared const.

> 		const DefUseNode* const& operator*() const { return *ptr_it; }
> 	};
>
> 	not_killed_set_iterator begin_not_killed_set(){	return  
> DefUseBlock::not_killed_set_iterator(*this, reaches.begin()); }
> 	not_killed_set_iterator end_not_killed_set(){ return  
> DefUseBlock::not_killed_set_iterator(*this, reaches.end()); }

Please add a space between () and {

> };
>
> //===------------------------- VarDeclMap ------------------------- 
> ===//
>
> struct defuse::VarDeclMap: public StmtVisitor<VarDeclMap>, public  
> llvm::DenseMap<VarDecl const*, DeclStmt const*> {

LLVM has an 80 column coding convention:

http://llvm.org/docs/CodingStandards.html#scf_codewidth

Please wrap lines that exceed this length.

>
> 	VarDeclMap(Stmt const* body){ VisitStmt(const_cast<Stmt*>(body)); }
>
> 	void VisitDeclStmt(DeclStmt *DS){
> 		for(DeclStmt::decl_iterator it = DS->decl_begin(); it != DS- 
> >decl_end(); it++){
> 			if((*it)->getKind() != Decl::Var)
> 				continue;

Use dyn_cast instead of getKind.

> 			insert(std::make_pair(cast<VarDecl>(*it), DS));
> 		}
> 	}
> 	void VisitStmt(Stmt* S){
> 		for(Stmt::child_iterator I = S->child_begin(), E = S->child_end();  
> I != E; ++I)
> 			if(*I) Visit(*I);
> 	}
> };

Please make this is a class, as the earlier declaration was also a  
class.

Please also add a space between 'for' and '('.  This is the LLVM  
coding convention.

>
> //===------------------------- DefUseBasicHelper  
> -------------------------===//
>
> struct defuse::DefUseBasicHelper::DefUseBasicHelperImpl{
> 	DefUseData*			data;
> 	VarRefBlockMap*		bm;
> 	VarDeclMap*			dm;
> 	unsigned 			blockID;
>
> 	DefUseBasicHelperImpl(): data(NULL), bm(NULL), dm(NULL){}
> };
> defuse::DefUseBasicHelper::DefUseBasicHelper(): pimpl(new  
> DefUseBasicHelperImpl), current_use(DefUseNode::Use){}
> defuse::DefUseBasicHelper::~DefUseBasicHelper(){ delete pimpl; }
>
> void defuse::DefUseBasicHelper::InitializeValues(DefUseData* data,  
> VarRefBlockMap* bm, VarDeclMap* dm){
> 	pimpl->data = data;	pimpl->bm = bm;	pimpl->dm = dm;
> }
>
> void defuse::DefUseBasicHelper::HandleDeclRefExpr(DeclRefExpr *DR){
> 	VarRefBlockMap& bm = *pimpl->bm;
> 	unsigned blockID = pimpl->blockID;
> 	DefUseBlock& block_data = (*pimpl->data)[blockID];
> 	if(DR->getDecl()->getKind() == Decl::Var || DR->getDecl()->getKind 
> () == Decl::ParmVar){

The condition is overly verbose.  We also discourage the use of  
'getKind' unless you are using it in a switch statement.

Suggestion:

   Decl *D = DR->getDecl();
   if (isa<VarDecl>(D)) {
    ...
   }

isa<> uses the value returned from VarDecl::classof() to determine if  
'D' is a VarDecl.

> 		VarDecl* VD = cast<VarDecl>(DR->getDecl());

Actually, instead of using isa, here is a better style:

if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl()) {
   ...
}

The dyn_cast combines 'isa' with 'cast', and returns a NULL pointer if  
isa would return false.  This is the encouraged style.


> 		// map this variable to the current block
> 		bm[DR] = blockID;
> 		if(block_data.find(VD) == block_data.end()){
> 			block_data.insert(std::make_pair(VD, DefUseVect()));
> 			block_data.uses.insert(VD); // this variable is used in the  
> current block and it has
> 										  // no prior definition within the block

If you end up using a DenseMap, this code can be made faster by  
avoiding the double lookup (the first for block_data.find() and the  
second for the insertion):

    DefUseBlock *&B = block_data[VD];
    if (!B) {
     B = new DefUseBlock();
     B->users.insert(VD);
   }

Using a DenseMap (which involves changing the data value to a  
pointer), the first line involves a quick hashtable lookup and creates  
a NULL pointer entry if the key could not be found.  The code that  
follows conjures up the new DefUseBlock without having to do a second  
search of the map.

> 		}
> 		if(current_use == DefUseNode::UseDef){ // in case of compound  
> operators
> 			block_data[VD].push_back(DefUseNode(DR, DefUseNode::Use));
> 			block_data[VD].push_back(DefUseNode(DR, DefUseNode::Def));

These lines cause two identical (and redundant) lookups to  
block_data.  Consider:

   DefUseBlock &B = block_data[VD];
   B.push_back(...);
   B.push_back(...);

> 		}else
> 			block_data[VD].push_back(DefUseNode(DR, current_use));
> 	}
> }

Please add spaces between 'if' and '(' and 'else' and '{'.

>
> void defuse::DefUseBasicHelper::HandleDeclStmt(DeclStmt *DS){
> 	for(DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end 
> (); I != E; I++){
> 		if((*I)->getKind() != Decl::Var)
> 			continue;
> 		VarRefBlockMap& bm = *pimpl->bm;
> 		VarDeclMap& dm = *pimpl->dm;
> 		unsigned blockID = pimpl->blockID;
> 		DefUseBlock& block_data = (*pimpl->data)[blockID];
>
> 		VarDecl* VD = cast<VarDecl>(*I);
> 		if(block_data.find(VD) != block_data.end())
> 			ASSERT(true, "Variable '" << VD->getNameAsString() << "'  
> redefined in the same block");

Why not do:

    assert(block_data.find(VD) != block_data.end() && "Variable  
redefined in the same block");

It is far more succinct, doesn't add more code in a Release build (as  
ASSERT does), and if we really care what VD is we can run this code in  
the debugger (which we would likely need to do anyway).  It also  
reduces the number of lookups in the map in a Release build from two  
to one.

> void defuse::DefUseBasicHelper::HandleBinaryOperator(BinaryOperator*  
> B){
> 	DefUseNode::UseKind backup = current_use; // backup the usage
> 	if(B->isAssignmentOp()){
> 		current_use = DefUseNode::Use;
> 		Visit(B->getRHS());
> 		current_use = DefUseNode::Def;
> 		if(B->isCompoundAssignmentOp())
> 			current_use = DefUseNode::UseDef;
> 		Visit(B->getLHS());
> 	}else{
> 		Visit(B->getLHS());
> 		current_use = DefUseNode::Use;
> 		Visit(B->getRHS());
> 	}
> 	current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseBasicHelper::HandleConditionalOperator 
> (ConditionalOperator* C){	}
>
> void defuse::DefUseBasicHelper::HandleCallExpr(CallExpr* C) {
> 	DefUseNode::UseKind backup = current_use; // backup the usage
> 	for(CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end(); I ! 
> = E; ++I){
> 		if((*I)->getType()->isPointerType() || (*I)->getType()- 
> >isReferenceType())
> 			current_use = DefUseNode::Def;
> 		Visit(*I);
> 		current_use = backup;
> 	}
> }
>
> void defuse::DefUseBasicHelper::HandleUnaryOperator(UnaryOperator* U){
> 	DefUseNode::UseKind backup = current_use; // backup the usage
> 	switch(U->getOpcode()){
> 	case UnaryOperator::PostInc:
> 	case UnaryOperator::PostDec:
> 	case UnaryOperator::PreInc:
> 	case UnaryOperator::PreDec:
> 		current_use = DefUseNode::UseDef;
> 		break;
> 	case UnaryOperator::Plus:
> 	case UnaryOperator::Minus:
> 	case UnaryOperator::Not:
> 	case UnaryOperator::LNot:
> 		current_use = DefUseNode::Use;
> 		break;
> 	case UnaryOperator::AddrOf:
> 	case UnaryOperator::Deref:
> 		// use the current_use
> 		break;
> 	default:
> 		// DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode 
> ()) << " not supported in def-use analysis");
> 		break;
> 	}
> 	Visit(U->getSubExpr());
> 	current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseBasicHelper::HandleArraySubscriptExpr 
> (ArraySubscriptExpr* AS){
> 	DefUseNode::UseKind backup = current_use; // backup the usage
> 	Visit(AS->getBase());
> 	current_use = DefUseNode::Use;
> 	Visit(AS->getIdx());
> 	current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseBasicHelper::VisitCFGBlock(CFGBlock const& blk,  
> int root){
> 	pimpl->blockID = blk.getBlockID();
> 	for(CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I !=  
> E; I++)
> 		Visit(*I);
>
> 	unsigned blockID = pimpl->blockID;
> 	DefUseBlock& block_data = (*pimpl->data)[blockID];
>
> 	// Post processing
> 	// build up the uses(b), defsout(b) and killed(b)
>
> 	// for each variable
> 	for(DefUseBlock::iterator I = block_data.begin(), E = block_data.end 
> (); I != E; ++I){
> 	    // in case of references to ParmVar what we do is insert the  
> variable definition in the first block of the CFG
> 	    DefUseBlock& root_data = (*pimpl->data)[root];
>        if(I->first->getKind() == Decl::ParmVar &&
>                std::find(root_data[I->first].begin(), root_data[I- 
> >first].end(), DefUseNode(I->first)) == root_data[I->first].end()){
>            root_data[I->first].push_back(DefUseNode(I->first));
>            root_data.defsout.push_back(&root_data[I->first].back());
>        }
> 		// if this is a local variable, should't be in the defsout,killed  
> sets
> 		// defsout is calculated by iterating backwards to find the last  
> use of the variable
> 		for(DefUseVect::const_reverse_iterator UI = I->second.rbegin(), UE  
> = I->second.rend(); UI != UE; UI++)
> 			if(UI->getUse() == DefUseNode::Def && (block_data.defsout.empty()  
> || block_data.defsout.back()->getDecl() != I->first)){
> 				block_data.defsout.push_back(&(*UI));
> 				block_data.killed.insert(UI->getDecl());
> 				break;
> 			}
> 	}
> }
>
> void defuse::DefUseBasicHelper::VisitStmt(Stmt* S){
> 	for(Stmt::child_iterator I = S->child_begin(), E = S->child_end();  
> I != E; ++I)
> 		if(*I) Visit(*I);
> }
>
> bool ContainsStmt(Stmt const* block, Stmt const* stmt){
> 	if(block == stmt)
> 		return true;
> 	for(Stmt::const_child_iterator I = block->child_begin(), E = block- 
> >child_end(); I != E; ++I)
> 		if(*I){
> 			if(ContainsStmt(*I, stmt))	return true;
> 		}

How about:

   if (*I && ContainsStmt(*I, stmt))
     return true;

> 	return false;
> }
>
> Stmt const* GetEnclosingBlock(ParentMap const& pm, Stmt const* stmt){
> 	Stmt const* E = stmt;
> 	while(E){
> 		Stmt const* cond = NULL;
> 		switch(E->getStmtClass()){
> 		case Stmt::IfStmtClass:
> 			cond = cast<IfStmt>(E)->getCond();
> 			break;
> 		case Stmt::WhileStmtClass:
> 			cond = cast<WhileStmt>(E)->getCond();
> 			break;
> 		case Stmt::DoStmtClass:
> 			cond = cast<DoStmt>(E)->getCond();
> 			break;
> 		case Stmt::ForStmtClass:
> 			cond = cast<ForStmt>(E)->getCond();
> 			break;
> 		case Stmt::SwitchStmtClass:
> 			cond = cast<SwitchStmt>(E)->getCond();
> 			break;
> 		case Stmt::CompoundStmtClass:
> 			return E;
> 		default:
> 			break;
> 		}
> 		if(cond && !ContainsStmt(cond, stmt))
> 			return E;
> 		E = pm.getParent(E);
> 	}
> 	return NULL;
> }
>
> void RemoveLocalDefs(CFG const& cfg, ParentMap const& pm,  
> VarDeclMap& decl_vars, DefUseData& data){
> 	for(CFG::const_reverse_iterator it = cfg.rbegin(), cend = cfg.rend 
> (); it != cend; it++) {
> 		// every time we find a node with more than 1 predecessor it means
> 		// we are at a junction point and the local variable defined in  
> the previuos
> 		// blocks must be removed from the defsout set
> 		if(it->pred_size() <= 1)
> 			continue;
> 		// for each predecessor
> 		for(CFGBlock::const_pred_iterator pit = it->pred_begin(), end = it- 
> >pred_end(); pit != end && (*pit)->size(); pit++){
> 			DefUseBlock& block_data = data[(*pit)->getBlockID()];
> 			for(VarRefsVect::iterator dit = block_data.defsout.begin(); dit ! 
> = block_data.defsout.end(); dit++){
> 				// we have to be sure the element we are using to see if the  
> Decl is in the same block of block statements
> 				// is also present in the AST, so if the first statement of the  
> CFG block is a VarDecl, we use the VarDeclMap
> 				// to retrieve the corresponding AST node
> 				Stmt const* testStmt = (*pit)->front();
> 				if(testStmt->getStmtClass() == Stmt::DeclStmtClass)
> 					testStmt = decl_vars[cast<VarDecl>(cast<DeclStmt>(testStmt)- 
> >getSingleDecl())];

Use 'dyn_cast' or 'isa' instead of 'getStmtClass()'


> 				// if the variable is local to one of the prec blocks and we are  
> at a junction point
> 				// it means the variable must be removed from the defsout set
> 				if(GetEnclosingBlock(pm, decl_vars[(*dit)->getDecl()]) ==  
> GetEnclosingBlock(pm, testStmt)){
> 					// it can be the case the following block is in the same scope  
> of the previous block (for example if stmt without else)
> 					Stmt const* testStmt = it->front();
> 					if(testStmt->getStmtClass() == Stmt::DeclStmtClass)
> 						testStmt = decl_vars[cast<VarDecl>(cast<DeclStmt>(testStmt)- 
> >getSingleDecl())];

Use 'dyn_cast' or 'isa' instead of 'getStmtClass()'


> 					if(GetEnclosingBlock(pm, decl_vars[(*dit)->getDecl()]) !=  
> GetEnclosingBlock(pm, testStmt)){
> 						block_data.defsout.erase(dit--);
> 					}
> 				}
> 			}
> 		}
> 	}
> }
>
> // implements reaching definitions according to Kennedy's book  
> iterative algorithm
> void ComputeReaches(CFG const& cfg, DefUseData& data){
> 	bool changed = true;
> 	while(changed){
> 		changed = false;
> 		VarRefsSet newreaches;
> 		for(CFG::const_reverse_iterator it = cfg.rbegin(); it != cfg.rend 
> (); it++) {
> 			DefUseBlock& BlockData = data[it->getBlockID()];
> 			newreaches = VarRefsSet();
> 			for(CFGBlock::const_pred_iterator pit = it->pred_begin(), end =  
> it->pred_end(); pit != end; pit++){
> 				unsigned p = (*pit)->getBlockID();
> 				// s(p) = reaches(p) - killed(p)
> 				VarRefsSet s, temp_set, temp_set_1;
> 				copy(data[p].begin_not_killed_set(), data[p].end_not_killed_set 
> (), inserter(s, s.begin()));
> 				// tmp_set(p) = defsout(p) U s(p)
> 				set_union(s.begin(), s.end(), data[p].defsout.begin(), data 
> [p].defsout.end(), inserter(temp_set, temp_set.begin()));
> 				// tmp_set_1 = newreaches U tmp_set(p)
> 				set_union(newreaches.begin(), newreaches.end(), temp_set.begin 
> (), temp_set.end(), inserter(temp_set_1, temp_set_1.begin()));
> 				newreaches.swap(temp_set_1);
> 			}
> 			if(BlockData.reaches.size() != newreaches.size()){
> 				BlockData.reaches = VarRefsVect(newreaches.begin(),  
> newreaches.end());
> 				changed = true;
> 			}
> 		}
> 	}
> }
>
> //===------------------------- Printing utilities  
> -------------------------===//
>
> struct printer: public std::iterator<std::input_iterator_tag,  
> printer, std::ptrdiff_t, printer>{
> 	printer(llvm::raw_ostream& x, ASTContext const& ctx_, const char* s  
> = "") : o(x), ctx(ctx_), delim(s) { }
> 	template<typename T>
> 	printer& operator=(const T& x);
> 	printer& operator*() { return *this; }
> 	printer& operator++() { return *this; }
> 	printer& operator++(int) { return *this; }
>
> 	mutable llvm::raw_ostream& o;
> 	ASTContext const& ctx;
> 	const char* delim;
> };
>
> template<>
> printer& printer::operator=<VarDecl const*>(const VarDecl* const& x){
> 	o << x->getNameAsString() << delim;
> 	return *this;
> }
>
> template<>
> printer& printer::operator=<DefUseNode const*>(const DefUseNode*  
> const& x){
> 	SourceLocation loc;
> 	if(x->getKind() == DefUseNode::VarRef)
> 		loc = x->getVarRef()->getLocStart();
> 	else
> 		loc = x->getVarDecl()->getTypeSpecStartLoc();
> 	o << x->getDecl()->getNameAsString() << " " << "(" << Line(loc,  
> ctx.getSourceManager()) << ","
>        << Column(loc, ctx.getSourceManager()) << " "
>        << ((x->getUse() == DefUseNode::Use)?"USE":"DEF") << ")" <<  
> delim;
> 	return *this;
> }
>
> template<>
> printer& printer::operator=<DefUseNode>(DefUseNode const& x){ return  
> this->operator=(&x); }
>
> void printDefUseData(llvm::raw_ostream& out, ASTContext const& ctx,  
> DefUseData const& data){
> 	out << "--------------- PRINTING ANALYSIS DATA  
> ----------------------\n";
> 	unsigned blockID = data.size() - 1;
> 	for(DefUseData::const_reverse_iterator it = data.rbegin(); it !=  
> data.rend(); it++, blockID--){
> 		out << "* BlockID '" << blockID << "' uses vars:\n";
> 		for(DefUseBlock::const_iterator pit = it->begin(); pit != it->end 
> (); pit++){
> 			out << "'" << pit->first->getNameAsString() << "':\n\t";
> 			std::copy( pit->second.begin(), pit->second.end(), printer( out,  
> ctx, ", " ) );
> 			out << "\n";
> 		}
> 	}
>
> 	blockID = data.size() - 1;
> 	for(DefUseData::const_reverse_iterator it = data.rbegin(); it !=  
> data.rend(); it++, blockID--) {
> 		DefUseBlock const& BlockData = *it;
> 		out << "---------------- Block: " << blockID << " ---------------- 
> \n";
> 		out << "# uses(" << blockID << "): {";
> 		copy( BlockData.uses.begin(), BlockData.uses.end(), printer( out,  
> ctx, ", " ) );
> 		out << "}\n";
> 		out << "# defsout(" << blockID << "): {";
> 		copy( BlockData.defsout.begin(), BlockData.defsout.end(), printer 
> ( out, ctx, ", " ) );
> 		out << "}\n";
> 		out << "# killed(" << blockID << "): {";
> 		copy( BlockData.killed.begin(), BlockData.killed.end(), printer 
> ( out, ctx, ", " ) );
> 		out << "}\n";
> 		out << "# reaches(" << blockID << "): {";
> 		copy( BlockData.reaches.begin(), BlockData.reaches.end(), printer 
> ( out, ctx, ", " ) );
> 		out << "}\n";
> 	}
> }
>
> //===------------------------- DefUse -------------------------===//
>
> DefUse::~DefUse(){ delete analysis_data; delete decl_map; delete  
> block_map; }
>
> bool isDef(ASTContext const& ctx, DefUseNode const& n, int* blockID,  
> DefUseVect::const_iterator* iter,
> 		DefUseData const* analysis_data, VarDeclMap const* decl_map,  
> VarRefBlockMap const*  block_map, Stmt const** stmt = NULL){
> 	// retrieve the blockID of the CFG where the statement N (which can  
> be wither a VarDecl or a DeclRefExpr) is placed
> 	Stmt const* _stmt_;
> 	if(n.getKind() == DefUseNode::VarDecl){
> 		VarDecl const* vd = n.getVarDecl();
> 		VarDeclMap::const_iterator it = decl_map->find(vd);
> 		// we use the decl_map to translate the VarDecl to the  
> corresponding DeclStmt
> 		// this is useful when DeclStmts from the CFG (which usually are  
> different from the original ones) are used
> 		ASSERT(it != decl_map->end(), "Variable declaration for '" << vd- 
> >getNameAsString() << "' not mapped to any statement in the syntax  
> tree");
> 		_stmt_ = it->second;
> 	}else
> 		_stmt_ = n.getVarRef();
> 	assert(_stmt_ != NULL);
> 	VarRefBlockMap::const_iterator it = block_map->find(_stmt_);
> 	ASSERT(it != block_map->end(), "Statement '" << PrintClangStmt 
> (_stmt_, ctx) << "' not mapped to any block of the CFG");
>
> 	// We have to find the definition inside the block
> 	DefUseBlock::const_iterator data_it = (*analysis_data)[it- 
> >second].find(n.getDecl());
> 	// if we don't find the variable among the variable used or defined  
> in the block it means something went wrong
> 	ASSERT(data_it != (*analysis_data)[it->second].end(), "Block ID: "  
> << it->second << " doesn't contains usage or definitions of variable  
> " << n.getDecl()->getNameAsString());
>
> 	// we find a Variable definition, now we have to iterate through  
> the DefUseNodes in order to find the
> 	// the node corresponding to the n variable passed by the user
> 	DefUseVect::const_iterator _iter_ = std::find(data_it->second.begin 
> (), data_it->second.end(), n);
> 	if(blockID)		*blockID = it->second;
> 	if(iter)		*iter = _iter_;
> 	if(stmt)		*stmt = _stmt_;
> 	return _iter_ != data_it->second.end();
> }
>
> bool isUse(ASTContext const& ctx, clang::DeclRefExpr const* var,  
> int* blockID, DefUseVect::const_iterator* iter,
> 		DefUseData const* analysis_data, VarRefBlockMap const*  block_map){
> 	Stmt const* stmt = var;
> 	VarRefBlockMap::const_iterator it = block_map->find(stmt);
> 	ASSERT(it != block_map->end(), "Statement " << PrintClangStmt(stmt,  
> ctx) << " not mapped to any block of the CFG!");
>
> 	DefUseBlock::const_iterator data_it = (*analysis_data)[it- 
> >second].find(cast<VarDecl>(var->getDecl()));
> 	ASSERT(data_it != (*analysis_data)[it->second].end(),
> 		"Block ID: " << it->second << " doesn't contains usage or  
> definitions of variable " << var->getDecl()->getNameAsString());
>
> 	DefUseVect::const_iterator _iter_ = std::find(data_it->second.begin 
> (), data_it->second.end(), DefUseNode(var, DefUseNode::Use));
> 	if(blockID)		*blockID = it->second;
> 	if(iter)		*iter = _iter_;
> 	return _iter_ != data_it->second.end();
> }
>
> //===------------------------- uses_iterator  
> -------------------------===//
>
> DefUse::uses_iterator::uses_iterator(DefUse const* du, DefUseNode  
> const& n):
> 	DefUse::iterator_impl(du), iter_ptr(-1), inDefBlock(true)
> {
> 	Stmt const* stmt = NULL;
> 	if(!::isDef(du->ctx, n, &iter_ptr.block_id, &iter_ptr.block_it, du- 
> >analysis_data, du->decl_map, du->block_map, &stmt))
> 		throw NotDefExcpetion(stmt); 	// there is no definition for the  
> DeclRefExpr passed by the user
> 										// this means he trying to list the uses of a declrefexpr  
> which is a use!
> 										// an excpetion is thrown.
> 	node = &(*(iter_ptr.block_it));
> 	inc(true);
> }
>
> bool DefUse::uses_iterator::operator!=(uses_iterator const& iter){
> 	return !(iter_ptr.block_id == iter.iter_ptr.block_id &&  
> iter.iter_ptr.block_it ==  iter.iter_ptr.block_it);
> }
>
> DeclRefExpr const* DefUse::uses_iterator::operator*() {
> 	assert(iter_ptr.block_it->getKind() == DefUseNode::VarRef);
> 	return iter_ptr.block_it->getVarRef();
> }
>
> DefUse::uses_iterator& DefUse::uses_iterator::inc(bool first){
> 	if(iter_ptr.block_id == -1)		return *this;
> 	DefUseBlock const& m = (*du->analysis_data)[iter_ptr.block_id];
> 	DefUseBlock::const_iterator it = m.find(node->getDecl());
> 	assert(it != m.end());
> 	if(iter_ptr.block_it != it->second.end())
> 		iter_ptr.block_it++;
>
> 	if(iter_ptr.block_it != it->second.end() && iter_ptr.block_it- 
> >getDecl() == node->getDecl()){
> 		if(iter_ptr.block_it->getUse() == DefUseNode::Use)
> 			return *this; // this is a use of the variable
> 		else if(inDefBlock){
> 			// found a new definition, stop iterating
> 			iter_ptr.block_it = DefUseVect::const_iterator();
> 			iter_ptr.block_id = -1;
> 			return *this;
> 		}
> 	}
> 	// if variable in defsout we have to check successor blocks
> 	if(inDefBlock && std::find(m.defsout.begin(), m.defsout.end(),  
> node) == m.defsout.end()){
> 		iter_ptr.block_it = DefUseVect::const_iterator();
> 		iter_ptr.block_id = -1; // it means there are no other uses of  
> this variable
> 		return *this;
> 	}
> 	// We have to change block
> 	// the following code has the precondition the blocks are numbered  
> in a decrescent order
> 	// we lock into the block_id and see if the decl is in the reaches 
> () set or not
> 	int new_block = iter_ptr.block_id;
> 	while(--new_block >= 0){
> 		DefUseBlock const& m = (*du->analysis_data)[new_block];
> 		if(std::find(m.reaches.begin(), m.reaches.end(), node) !=  
> m.reaches.end()){
> 			inDefBlock = false;
> 			DefUseBlock::const_iterator data_it = (*du->analysis_data) 
> [new_block].find(node->getDecl());
> 			if(data_it != (*du->analysis_data)[new_block].end())
> 				if(data_it->second.begin()->getUse() == DefUseNode::Use){
> 					iter_ptr.block_id = new_block;
> 					iter_ptr.block_it = data_it->second.begin();
> 					return *this;
> 				}
> 		}
> 	}
> 	iter_ptr.block_it = DefUseVect::const_iterator();
> 	iter_ptr.block_id = -1;
> 	return *this;
> }
>
> //===------------------------- defs_iterator  
> -------------------------===//
>
> DefUse::defs_iterator::defs_iterator(DefUse const* du, DeclRefExpr  
> const& n):
> 	iterator_impl(du), iter_ptr(-1), blockDef(false)
> {
> 	if(!::isUse(du->ctx, &n, &iter_ptr.block_id, &iter_ptr.block_it, du- 
> >analysis_data, du->block_map))
> 		throw NotUseExcpetion(&n); // the DeclRefExpr passed by the user  
> is never used in this block but only defined,
> 								   // the user is trying to iterate through the definitions  
> of a definition... which doesn't make sense
> 								   // an exception must be thrown
> 	iter_ptr.reaches_it = (*du->analysis_data) 
> [iter_ptr.block_id].reaches.begin();
> 	node = &(*(iter_ptr.block_it));
> 	inc(true);
> }
> bool DefUse::defs_iterator::operator!=(defs_iterator const& iter){
> 	return !(iter_ptr.block_id == iter.iter_ptr.block_id &&
> 			iter.iter_ptr.block_it ==  iter.iter_ptr.block_it &&
> 			iter.iter_ptr.reaches_it == iter_ptr.reaches_it);
> }
>
> DefUseNode const* DefUse::defs_iterator::operator*() {
> 	DefUseBlock::const_iterator it = (*du->analysis_data) 
> [iter_ptr.block_id].find(node->getDecl());
> 	assert(it != (*du->analysis_data)[iter_ptr.block_id].end());
> 	if(blockDef)
> 		return &(*iter_ptr.block_it);
> 	return *iter_ptr.reaches_it;
> }
>
> DefUse::defs_iterator::defs_iterator& DefUse::defs_iterator::inc 
> (bool first){
> 	if(iter_ptr.block_id == -1)		return *this;
> 	// look inside reaches vector
> 	DefUseBlock::const_iterator it = (*du->analysis_data) 
> [iter_ptr.block_id].find(node->getDecl());
> 	assert(it != (*du->analysis_data)[iter_ptr.block_id].end());
> 	if(first){
> 		// we have to find a def in the block, if exist
> 		for(; iter_ptr.block_it >= it->second.begin(); iter_ptr.block_it--)
> 			if(iter_ptr.block_it->getUse() == DefUseNode::Def){
> 				blockDef = true;
> 				break;
> 			}
> 		// the def is in the block, so it means we don't have to look any  
> further
> 		// next time the iterator is incremented we return an empty one
> 		if(blockDef)
> 			return *this;
>
> 	}else if(blockDef){
> 		iter_ptr.block_it = DefUseVect::const_iterator();
> 		iter_ptr.reaches_it = VarRefsVect::const_iterator();
> 		iter_ptr.block_id = -1;
> 		return *this;
> 	}
> 	// we didn't find a def in the block, so we have to look at the  
> reaches set
> 	VarRefsVect const& reaches_v = (*du->analysis_data) 
> [iter_ptr.block_id].reaches;
> 	if(!first && iter_ptr.reaches_it != reaches_v.end())
> 		iter_ptr.reaches_it++;
> 	for(; iter_ptr.reaches_it != reaches_v.end(); iter_ptr.reaches_it++)
> 		if((*iter_ptr.reaches_it)->getDecl() == node->getDecl())
> 			return *this;
>
> 	iter_ptr.block_it = DefUseVect::const_iterator();
> 	iter_ptr.reaches_it = VarRefsVect::const_iterator();
> 	iter_ptr.block_id = -1;
> 	return *this;
> }
>
> DefUse::uses_iterator defuse::DefUse::uses_begin(clang::DeclRefExpr  
> const* var) const{
> 	return DefUse::uses_iterator(this, DefUseNode(var, DefUseNode::Def));
> }
> DefUse::uses_iterator defuse::DefUse::uses_begin(clang::VarDecl  
> const* var) const{
> 	return DefUse::uses_iterator(this, DefUseNode(var));
> }
> DefUse::uses_iterator defuse::DefUse::uses_end() const{ return  
> DefUse::uses_iterator(); }
>
> DefUse::defs_iterator defuse::DefUse::defs_begin(clang::DeclRefExpr  
> const* var) const{
> 	return DefUse::defs_iterator(this, *var);
> }
> DefUse::defs_iterator defuse::DefUse::defs_end() const{ return  
> DefUse::defs_iterator(); }
>
> bool defuse::DefUse::isUse(clang::DeclRefExpr const* var) const  
> { return ::isUse(ctx, var, NULL, NULL, analysis_data, block_map); }
>
> bool defuse::DefUse::isDef(DefUseNode const& n) const  
> { return ::isDef(ctx, n, NULL, NULL, analysis_data, decl_map,  
> block_map); }
>
> //===------------------------- BuildDefUseChains  
> -------------------------===//
>
> DefUse* defuse::DefUse::BuildDefUseChains(Stmt* body, ASTContext  
> *ctx, DefUseBasicHelper* helper,
>        CFG* cfg, ParentMap* pm, bool verbose, llvm::raw_ostream& out)
> {
> 	bool cleanCFG = false, cleanParentMap = false, cleanHelper = false;
> 	if(!cfg){
> 		cfg = CFG::buildCFG(body, ctx);
> 		cleanCFG = true;
> 	}
> 	if(!pm){
> 		pm = new ParentMap(body);
> 		cleanParentMap = true;
> 	}
> 	DefUseData*	data = new DefUseData(cfg->getNumBlockIDs());
>
> 	VarDeclMap*	dm = new VarDeclMap(body); // map VarDecl to AST nodes  
> - this is needed as when the CFG is created by CLANG
> 										   // DeclStmt with several declarations are rewritten  
> into separate DeclStmts
>
> 	VarRefBlockMap* bm = new VarRefBlockMap;	  // map each VarRef to  
> the CFG block where it has been used
>
> 	if(verbose)
> 		cfg->dump(ctx->getLangOptions());
>
> 	if(!helper){
> 		helper = new DefUseBasicHelper;
> 		cleanHelper = true;
> 	}
> 	helper->InitializeValues(data, bm, dm);
>
> 	for(CFG::const_iterator it = cfg->begin(); it != cfg->end(); it++)
> 		helper->VisitCFGBlock(*it, cfg->getNumBlockIDs()-1);

You should not rely on the root being number 'getNumBlockIDs() - 1'.   
This actually is not an invariant you should depend on.  If you want  
to identify the entry block, use 'CFG::getEntry()'.

>
> 	// remove from defsout local variables
> 	RemoveLocalDefs(*cfg, *pm, *dm, *data);
> 	// Calculates the reaches set for each block of the CFG
> 	ComputeReaches(*cfg, *data);
>
> 	if(verbose)
> 		printDefUseData(out, *ctx, *data);
>
> 	unsigned block_id = cfg->getNumBlockIDs();
> 	if(cfg && cleanCFG)
> 		delete cfg;
> 	if(pm && cleanParentMap)
> 		delete pm;
> 	if(helper && cleanHelper)
> 		delete helper;
> 	return new DefUse(*ctx, data, dm, bm, block_id);
> }

Instead of BuildDefUseChains building its own CFG and ParentMap, have  
these be passed in from the client.  This allows reuse of existing  
data that has likely been constructed for other uses.  You can also  
make an alternate version of BuildDefUseChains that is passed an  
AnalysisManager and uses those to create the CFGs and ParentMap.

Also, consider using llvm::OwningPtr to auto-reclaim 'helper' and  
friends.  For example:

  llvm::OwningPtr<DefUseBasicHelper> helperReclaim;
  if (!helper) {
    helper = new DefUseBasicHelper();
    helperReclaim.reset(helper);
  }

You can then forget about the later cleanup because it will  
automatically happen when the function returns.  This idiom is really  
powerful when you have multiple return sites, as you can easily forget  
to do the proper cleanup.  It also cleans up the code, as all the  
cleanup is just boilerplate that is easily handled by the OwningPtr.


>
> //===------------------------- DefUseChainTest  
> -------------------------===//
>
> void DefUseChainTest::HandleTopLevelDecl(DeclGroupRef declRef) {
> 	for(DeclGroupRef::iterator it = declRef.begin(); it < declRef.end 
> (); it++){
> 		if((*it)->getKind() != Decl::Function)
> 			continue;

Use 'dyn_cast' or 'isa' instead of 'getKind()'


> 		// Top-level Function declaration
> 		FunctionDecl* func_decl = cast<FunctionDecl> (*it);
> 		Stmt* func_body = func_decl->getBodyIfAvailable();
> 		if(!func_body)
> 			continue;
>
> 		out << "--- Building Def-Use Chains for function '" << func_decl- 
> >getNameAsString() << ":" << func_decl->getNumParams() << "' ---\n";
> 		du = DefUse::BuildDefUseChains(func_body, ctx, 0, 0, 0, true);
> 		out << "------ Testing Def-Use iterators ------\n";
> 		VisitStmt(func_body);
> 		out << "**************************************************\n";
> 		delete du;
> 	}
> }
>
> void defuse::PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,  
> ASTContext& ctx, llvm::raw_ostream& out){
> 	if(DR->getDecl()->getKind() != Decl::Var && DR->getDecl()->getKind 
> () != Decl::ParmVar)
> 		return;

Use 'dyn_cast' or 'isa' instead of 'getKind()'


>
> 	VarDecl const* VD = cast<VarDecl>(DR->getDecl());
> 	SourceLocation loc =  DR->getLocStart();
> 	out << "Variable ref '" << VD->getNameAsCString() << "' in LOC: "
> 				<< "(" << Line(loc, ctx.getSourceManager()) << "," << Column 
> (loc, ctx.getSourceManager()) << ") -> defs_iterator:\n";
> 	try{
> 		for(DefUse::defs_iterator it = DU->defs_begin(DR); it != DU- 
> >defs_end(); it++){
> 			SourceLocation loc;
> 			if((*it)->getKind() == DefUseNode::VarRef)
> 				loc = (*it)->getVarRef()->getLocStart();
> 			else
> 				loc = (*it)->getVarDecl()->getTypeSpecStartLoc();
> 			out << "\t* Found DEF in loc: " << "(" << Line(loc,  
> ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager 
> ()) << ")\n";
> 		}
> 	}catch(NotUseExcpetion e){
> 		SourceLocation loc = e.expr->getLocStart();
> 		out << "ERROR: Variable " << PrintClangStmt(e.expr, ctx) << " in  
> location : "
> 				<< "(" << Line(loc, ctx.getSourceManager()) << "," << Column 
> (loc, ctx.getSourceManager()) << ") is not a use for variable: " <<   
> VD->getNameAsCString() << "\n";

Again, LLVM doesn't use C++ exceptions.  You'll need to come up with  
an alternate mechanism.


> 	}
> }
> void PrintVarUses(DefUse const* DU, DefUseNode const& n, ASTContext&  
> ctx, llvm::raw_ostream& out){
> 	VarDecl const* VD = n.getDecl();
> 	SourceLocation loc =  VD ->getTypeSpecStartLoc();
> 	if(n.getKind() == DefUseNode::VarRef)
> 		loc = n.getVarRef()->getLocStart();
>
> 	out << "Variable ref '" << VD->getNameAsCString() << "' in LOC: "
> 			<< "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc,  
> ctx.getSourceManager()) << ") -> uses_iterator:\n";
> 	try{
> 		DefUse::uses_iterator it = DU->uses_end();
> 		if(n.getKind() == DefUseNode::VarRef)
> 			it = DU->uses_begin(n.getVarRef());
> 		else
> 			it = DU->uses_begin(VD);
> 		for(; it != DU->uses_end(); it++){
> 			SourceLocation loc =  (*it)->getLocStart();
> 			out << "\t* Found USE in loc: " << "(" << Line(loc,  
> ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager 
> ()) << ")\n";
> 		}
> 	}catch(NotDefExcpetion e){
> 		SourceLocation loc = e.stmt->getLocStart();
> 		out << "ERROR: Variable " << PrintClangStmt(e.stmt, ctx) << " in  
> location : "
> 				<< "(" << Line(loc, ctx.getSourceManager()) << "," << Column 
> (loc, ctx.getSourceManager()) << ") is not a definition for  
> variable: " << VD->getNameAsCString() << "\n";
> 	}

Again, LLVM doesn't use C++ exceptions.  You'll need to come up with  
an alternate mechanism.

> }
>
> inline void defuse::PrintVarUses(DefUse const* DU, DeclRefExpr  
> const* DR, ASTContext& ctx, llvm::raw_ostream& out){ ::PrintVarUses 
> (DU, DefUseNode(DR), ctx, out); }
> inline void defuse::PrintVarUses(DefUse const* DU, VarDecl const*  
> VD, ASTContext& ctx, llvm::raw_ostream& out){ ::PrintVarUses(DU,  
> DefUseNode(VD), ctx, out); }
>
> void defuse::DefUseChainTest::VisitDeclRefExpr(DeclRefExpr *DR){
> 	if(DR->getDecl()->getKind() == Decl::Var || DR->getDecl()->getKind 
> () == Decl::ParmVar){

Use 'dyn_cast' or 'isa' instead of 'getKind()'

> 		PrintVarUses(du, DR, *ctx, out);
> 		PrintVarDefs(du, DR, *ctx, out);
> 	}
> }
>
> void defuse::DefUseChainTest::VisitDeclStmt(DeclStmt *DS){
> 	for(DeclStmt::decl_iterator it = DS->decl_begin(); it != DS- 
> >decl_end(); it++){
> 		if((*it)->getKind() != Decl::Var)
> 			continue;
>
> 		VarDecl* VD = cast<VarDecl>(*it);
> 		PrintVarUses(du, VD, *ctx, out);
>
> 		if(VD->getInit())
> 			Visit(VD->getInit());

How about:

   if (Stmt *S = VD->getInit())
     Visit(S);


> 	}
> }
>
> void defuse::DefUseChainTest::VisitStmt(Stmt* S){
> 	for(Stmt::child_iterator I = S->child_begin(), E = S->child_end();  
> I != E; ++I)
> 		if(*I) Visit(*I);
> }




More information about the cfe-dev mailing list