[cfe-dev] Constructing use-def chains
Simone Pellegrini
spellegrini at dps.uibk.ac.at
Sun Nov 1 11:35:16 PST 2009
Dear Ted,
first of all I apologize for the long time, but I was a little bit
overloaded with work lately so I had no time to fix the style issues in
the DefUse implementation.
I did several improvements following your suggestions. I changed the
interface a little and mainly the implementation. Here it is a list of
changes I did:
* DefUseNode now is implemented using the PointerUnion class (I also
changed the name from DefUseNode to Node as it is already in the du
namespace and it was a little bit overwhelming).
* I have separated the DefUseHelper from the visitor semantics and moved
some of the definition from the the header to the cpp file.
* I have eliminated all the possible problem due to resizing of vectors
and I finally use llvm::DenseMap instead of std::map.
* I also changed most of the variable names to adhere the LLVM style
* I use the AnalysisContext object instead of asking the user for a CFG
and a ParentMap
* plus several small fixes and style improvement
I think there are still improvements to do, the code is way away from
being perfect. I promise I will provide some test cases in short time.
Anyway, thanks for helping me in improving my crap code :P more comments
follows:
On Wed, 2009-09-23 at 00:18 -0700, Ted Kremenek wrote:
> On Sep 15, 2009, at 9:24 AM, Simone Pellegrini wrote:
>
> > Hello, I am back :),
> > so I fix most of the comments, I still need to address a couple of
> > issues, i.e. use of llvm::DenseMap in place of std::map and virtual
> > methods in DefUseHelper.
>
> Hi Simone,
>
> Sorry for reviewing this second set of changes so late. Comments
> inline.
>
> > About the usage of llvm::DenseMap, the problem is that every time I
> > change from std:::map to llvm::DenseMap I get several errors
> > (compile time), which is weird as the 2 classes expose the same
> > interface, anyway I need some time to understand which is the source
> > of problems.
>
> One thing to understand is that the two data structures do not export
> the same interface.
>
> std::map allocates memory (using malloc) for each key/value pair you
> store in the map. This allocation stays around around as long as you
> keep the key in the map. For example, suppose you had:
>
> std::map<void*,int> M;
> int &x = M[p];
>
> The reference 'x' is valid for the entire time that the key 'p' is in
> the map. If you add or remove other keys, the reference stays valid
> because the key/value pair is actually an allocated blob of memory
> they you are allowed to reference.
>
> DenseMap doesn't have this property. It assumes that the values you
> store in the DenseMap are lightweight, e.g. pointers to other data,
> and thus allocates memory in chunks that are used for buckets for the
> hash table. When an insertion into the DenseMap occurs, the hash
> table may resize, causing the values in the old buckets to get copied
> to new buckets, and the old buckets deallocated. This means that any
> reference to the old data, as in the example above, is invalid.
>
> Where this is going to bite you is here:
>
> typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
>
> If you replace this with:
>
> typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;
>
> then fundamentally things are going to break. This because the
> DefUseVect values, which is a typedef for std::vector<DefUseNode>,
> will get copied and deallocated when the table resizes. All
> references to the DefUseVect, or the objects they contain, will
> incorrect at this point.
>
> The solution is to use DefUseVect* instead of DefUseVect as the value
> for DefUseChain.
Done.
>
> >
> > About the virtual methods in DefUseHelper, you are right! :) most of
> > the time the user will use the default one and I see the performance
> > problem with the virtual calls. Actually, I have introduced this
> > virtual dispatch because in the project where I am using this
> > implementation of DefUse, I need to implement specific semantics of
> > MPI calls. So mostly I am overloading the semantics of method calls
> > (CallExpr). By default when a variable is passed to a function as a
> > pointer type (or reference), during the DefUse analysis we have to
> > state that this is a definition for the variable. Anyway, if we know
> > the semantics of the specific method, and we know that it will not
> > change the variable (for example the buffer in MPI_Send) we can
> > state that the method call will a use for the variable.
> >
> > Now I don't know if this mechanism is useful for others... and if it
> > should be here in the first place, but it makes sense for me :)
> > Anyway, I guess I can avoid the virtual calls by using templates. I
> > will be working on it the next days.
>
> I think the approach should be to focus on a clean API. Using
> templates should be avoided unless there is a huge performance need;
> the cost of virtual functions is low if the amount of work done in the
> virtual function call outweighs the dynamic dispatch.
>
no templates... got it!
> >
> > For the rest I properly formatted the code, for the test cases I
> > have to learn how to write test cases for LLVM :)
>
> Take a look at the Clang tests in 'test/Analysis' and friends.
> Basically we'll need to wire up a driver option to clang-cc, and
> you'll want to run clang-cc on some code that outputs some diagnostics
> relevant to def-use analysis.
to be done. sorry
>
> More specific comments inline.
>
> > //===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*-
> > ===//
> > //
> > // The LLVM Compiler Infrastructure
> > //
> > // This file is distributed under the University of Illinois Open
> > Source
> > // License. See LICENSE.TXT for details.
> > //
> > //
> > =
> > =
> > =
> > ----------------------------------------------------------------------=
> > ==//
> > //
> > // This file contains the declaration of DefUse analysis headers
> > //
> > //
> > =
> > =
> > =
> > ----------------------------------------------------------------------=
> > ==//
> >
> > #ifndef LLVM_CLANG_DEFUSE_H
> > #define LLVM_CLANG_DEFUSE_H
> >
> > #include "llvm/ADT/DenseMap.h"
> >
> > #include "clang/AST/ASTConsumer.h"
> > #include "clang/AST/ParentMap.h"
> > #include "clang/AST/StmtVisitor.h"
> >
> > namespace clang {
> >
> > // forward definitions
> > class CFG;
> > class CFGBlock;
> >
> > class DefUse;
> >
> > namespace defuse {
> > //===------------------------- DefUseNode -------------------------
> > ===//
> > // Keeps the information for a particular 'definition' or 'use' of a
> > variable
> > // The structure is needed because a defuse chain can contains
> > variable declarations
> > // as well as variable reference. As a DeclStmt can contains several
> > declarations
> > // to address a particular variable we need to store its VarDecl or
> > DeclRefExp
> > class DefUseNode {
> > public:
> > enum NodeKind {
> > VarDecl, VarRef
> > };
> > enum UseKind {
> > Use, Def, UseDef
> > };
> >
> > DefUseNode(clang::VarDecl const* decl) :
> > var_decl(decl), kind(VarDecl), usage(Def) {
> > }
> > DefUseNode(DeclRefExpr const* ref, UseKind u = Use) :
> > var_ref(ref), kind(VarRef), usage(u) {
> > }
> >
> > clang::VarDecl const* getDecl() const;
> >
> > NodeKind const& getKind() const {
> > return kind;
> > }
> > UseKind const& getUse() const {
> > return usage;
> > }
> >
> > clang::VarDecl const* getVarDecl() const {
> > assert(kind==VarDecl);
> > return var_decl;
> > }
> > DeclRefExpr const* getVarRef() const {
> > assert(kind==VarRef);
> > return var_ref;
> > }
> >
> > bool operator==(DefUseNode const& n) const;
> >
> > private:
> > // a def-use node can be either a VarDecl or a DeclRefExpr
> > union {
> > clang::VarDecl const* var_decl;
> > DeclRefExpr const* var_ref;
> > };
> > NodeKind kind;
> > UseKind usage;
>
> Please consider using the PointerUnion class for this. It will reduce
> the code you are using, and will allow you to merge the 'kind' field
> into the pointer as well with no extra work on your part.
>
Done.
> Should 'usage' be const?
yes, you are right!
>
> Also, most of LLVM does not follow the naming convention of lowercase
> variable names with underscores. To be consistent with the rest of
> the codebase, please take a look at the coding conventions used in a
> core header file, e.g. 'include/clang/AST/Expr.h', which shows that
> variable names are typically short, and use capitalization instead of
> underscores. This is really a nitty point, but consistency through
> the code base makes it more readable.
>
> > };
> >
> > //===------------------------- Typedefs -------------------------===//
> > class DefUseBlock;
> > class VarDeclMap;
> > typedef std::vector<DefUseBlock> DefUseData;
> > typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
> > typedef std::vector<DefUseNode> DefUseVect;
> > typedef std::vector<DefUseNode const*> VarRefsVect;
>
> Something looks potentially wrong here. DefUseVect is a vector of
> DefUseNode objects, not DefUseNode*. If the vector ever needs to be
> resized, then the DefUseNode objects will be obliterated, and any
> references to them (which may be in VarRefVect) will suddenly become
> invalid. DefUseNode is a lightweight object that you can certainly
> keep in a vector, but if you need these objects to have stable pointer
> addresses you will need to manually allocate them and store
> DefUseNode* instead DefUseVect instead. If you are concerned with
> 'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to
> quickly allocate objects.
yes, right! Fixed it.
>
> > class DefUseHelper;
> >
> > //===------------------------- DefUseHelper -------------------------
> > ===//
> >
> > class DefUseHelper : public StmtVisitor<DefUseHelper> {
> > class DefUseHelperImpl;
> > DefUseHelperImpl* pimpl;
> >
> > void InitializeValues(DefUseData* data, VarRefBlockMap* bm,
> > VarDeclMap* dm);
> >
> > friend class clang::DefUse;
> > public:
> > DefUseNode::UseKind current_use;
> > DefUseHelper();
> >
> > virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to
> > call the
> > // super class implementation of the
> > method
> > void HandleDeclStmt(DeclStmt *DS);
> >
> > virtual void HandleBinaryOperator(BinaryOperator* B);
> > virtual void HandleConditionalOperator(ConditionalOperator* C);
> > virtual void HandleCallExpr(CallExpr* C);
> > virtual void HandleUnaryOperator(UnaryOperator* U);
> > virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
> >
> > void VisitDeclRefExpr(DeclRefExpr *DR) {
> > return HandleDeclRefExpr(DR);
> > }
> > void VisitDeclStmt(DeclStmt *DS) {
> > return HandleDeclStmt(DS);
> > }
> > void VisitBinaryOperator(BinaryOperator* B) {
> > return HandleBinaryOperator(B);
> > }
> > void VisitConditionalOperator(ConditionalOperator* C) {
> > return HandleConditionalOperator(C);
> > }
> > void VisitCallExpr(CallExpr* C) {
> > return HandleCallExpr(C);
> > }
> > void VisitUnaryOperator(UnaryOperator* U) {
> > return HandleUnaryOperator(U);
> > }
> > void VisitArraySubscriptExpr(ArraySubscriptExpr* AS) {
> > return HandleArraySubscriptExpr(AS);
> > }
> >
> > void VisitCFGBlock(clang::CFGBlock const& blk, CFGBlock const&
> > entry);
> > void VisitStmt(Stmt* S);
> >
> > virtual ~DefUseHelper();
> > };
> >
> > void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,
> > ASTContext& ctx,
> > llvm::raw_ostream& out);
> > void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR,
> > ASTContext& ctx,
> > llvm::raw_ostream& out);
> > void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext&
> > ctx,
> > llvm::raw_ostream& out);
> >
> > ASTConsumer* CreateDefUseTestConsumer(llvm::raw_ostream& out);
> >
>
> I'm still trying to understand the implementation in its entirety, but
> it seems to me that you should try and decouple the DefUse
> implementation from the observer of its values. Here it seems like
> they are mixed together, which forces you to use subclassing of
> DefUseHelper to extend it's functionality. This isn't ideal, as it
> conflates the DefUse algorithm with what desires to consume that
> information. This muddles the separation of concerns between these
> two concepts, and it constrains future revisions of the DefUse
> implementation to stay within this visitor design. The consumer of
> the DefUse information may wish to have a visitor design, but I'm not
> certain if you want to expose a visitor pattern in the DefUse
> implementation itself.
The helper is not used as observer, it's more something needed to build
the defuse chains. When the analysis data is collected I need to know if
a DeclVarRef is a definition or an use. And this is DefUse::Helper
business. Most of the time the user will not need to redefine the
behaviour, it could happen and I want to give the opportunity for that.
Anyway now I separated the Visitor from the HandleXXX method. It's a
little bit cleaner but the concept is still the same.
>
> As an alternative, consider the "observer" interface provided by
> Clang's LiveVariables analysis ('include/clang/Analysis/Analyses/
> LiveVariables.h'). Here the LiveVariables implementation is
> completely opaque, but allows clients to provide an "Observer" object
> so they can query liveness information at specific subexpressions.
> The dead stores checker uses this interface to find dead stores, and
> it can do so without any knowledge of how the LiveVariables analysis
> calculates its liveness information. It makes the API much simpler,
> and disentangles the implementation of LiveVariables from its clients.
>
> More specifically, the HandleXXX methods look like they are part of
> the "observer" interface, while the VisitXXX methods look like they
> are part of the DefUse implementation. Your use of the pimpl idiom
> implies you want to keep some of this implementation hidden. Why not
> just separate the implementation of the DefUse algorithm completely
> from the HandleXXX methods and keep those in the DefUseHelper. The
> DefUse algorithm can then check to see if a DefUseHelper object is
> provided, and if so, call the appropriate HandleXXX methods.
>
> > } // end namespace defuse
> >
> > //===------------------------- DefUse -------------------------===//
> >
> > class DefUse {
> > ASTContext const& ctx;
> > defuse::DefUseData const* analysis_data;
> > defuse::VarDeclMap const* decl_map;
> > defuse::VarRefBlockMap const* block_map;
> > unsigned const num_cfg_blocks;
>
> Again, just my comment on the naming convention for variables.
>
> >
> > DefUse(ASTContext const& ctx_,
> > defuse::DefUseData const* analysis_data_,
> > defuse::VarDeclMap const* decl_map_,
> > defuse::VarRefBlockMap const* block_map_,
> > unsigned num_blocks) :
>
> Please do not use trailing underscores in variable names.
>
> > ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),
> > block_map(block_map_), num_cfg_blocks(num_blocks) {
> > }
> >
> > bool isDef(defuse::DefUseNode const& n) const;
> >
> > class iterator_impl {
> > public:
> > DefUse const* du;
> > defuse::DefUseNode const* node;
> >
> > struct iter {
> > int block_id;
> > defuse::DefUseVect::const_iterator block_it;
> > iter(int blk_id) :
> > block_id(blk_id) {
> > }
> > };
> > iterator_impl() : du(NULL) { }
> > iterator_impl(DefUse const* du_) : du(du_) { }
> > iterator_impl& operator++() {
> > return inc(false);
> > }
> > virtual iterator_impl& inc(bool) = 0;
> > };
>
> Do we need a virtual iterator? Does iterator_impl need to be
> subclassed? Not only is this a potential performance problem, but it
> could be the source of subtle bugs if you assigned an object of a
> subclass to that of a parent class. It seems to me that defs_iterator
> and uses_iterator can be merged (substantially reducing the code), and
> simply add a flag to the iterator indicating whether it is for uses or
> defs. If you want strong typing, you certainly can have subclassing
> of the iterators, but I think the entire implementation can get sucked
> into a common class. Then the "inc" just checks the flag and skips of
> uses when you are iterating over defs and vis versa. You then remove
> the vtable pointer, simplify the inheritance tree, and the
> implementation becomes a lot more readable. There is also more
> chances for inlining by the compiler (i.e., if the definition of 'inc'
> is provided in the class declaration).
>
I also fixed this. The problem here is that the 2 iterators are a little
bit different. I cannot put everything together. For example the
uses_iterator iterates through DeclRefExpr* while the defs_iterator
return a pointer to a Node* (old DefUseNode) because an use can be both
a decl or a var ref.
>
> > public:
> > class uses_iterator : public std::iterator<std::input_iterator_tag,
> > DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
> > public iterator_impl {
> > iter iter_ptr;
> > bool inDefBlock;
> >
> > uses_iterator() :
> > iterator_impl(), iter_ptr(-1), inDefBlock(true) { }
> > uses_iterator(DefUse const* du, defuse::DefUseNode const& n);
> > uses_iterator& inc(bool first);
> > friend class DefUse;
> > public:
> > bool operator!=(uses_iterator const& iter);
> > DeclRefExpr const* operator*();
> > };
> >
> > class defs_iterator : public std::iterator<std::input_iterator_tag,
> > defuse::DefUseNode, std::ptrdiff_t, defuse::DefUseNode const*>,
> > public iterator_impl {
> > struct iter_ : public iter {
> > defuse::VarRefsVect::const_iterator reaches_it;
> > iter_(int blk_id) :
> > iter(blk_id) { }
> > } iter_ptr;
> > bool blockDef;
> >
> > defs_iterator() :
> > iterator_impl(), iter_ptr(-1), blockDef(false) { }
> > defs_iterator(DefUse const* du, DeclRefExpr const& n);
> > defs_iterator& inc(bool first);
> > friend class DefUse;
> > public:
> > bool operator!=(defs_iterator const& iter);
> > defuse::DefUseNode const* operator*();
> > };
> >
> > // USES //
> > defs_iterator defs_begin(DeclRefExpr const* var) const;
> > defs_iterator defs_end() const;
> >
> > // DEFS //
> > uses_iterator uses_begin(DeclRefExpr const* var) const;
> > uses_iterator uses_begin(VarDecl const* var) const;
> > uses_iterator uses_end() const;
> >
> > bool isUse(DeclRefExpr const* var) const;
> > bool isDef(DeclRefExpr const* var) const {
> > return isDef(defuse::DefUseNode(var, defuse::DefUseNode::Def));
> > }
> > bool isDef(VarDecl const* var) const {
> > return isDef(defuse::DefUseNode(var));
> > }
> >
> > ~DefUse();
> >
> > static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,
> > CFG* cfg, ParentMap* pm, defuse::DefUseHelper* helper = 0,
> > bool verbose = false, llvm::raw_ostream& out = llvm::outs());
> > };
> >
> > } // end namespace clang
> >
> > #endif
> > //===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*-
> > ===//
> > //
> > // The LLVM Compiler Infrastructure
> > //
> > // This file is distributed under the University of Illinois Open
> > Source
> > // License. See LICENSE.TXT for details.
> > //
> > //
> > =
> > =
> > =
> > ---------------------------------------------------------------------
> > ===//
> > //
> > // This file contains the implementation of DefUse analysis
> > //
> > //
> > =
> > =
> > =
> > ---------------------------------------------------------------------
> > ===//
> >
> > #define __STDC_LIMIT_MACROS
> > #define __STDC_CONSTANT_MACROS
> >
> > #include "DefUse.h"
> >
> > #include "clang/Basic/SourceManager.h"
> > #include "clang/Analysis/CFG.h"
> > #include <llvm/Support/raw_ostream.h>
> >
> > // todo: Use DenseMap and llvm sets
> > #include <set>
> > #include <map>
> >
> > using namespace clang;
> > using namespace defuse;
> >
> > // utility functions
> > static unsigned Line(SourceLocation const& l, clang::SourceManager
> > const& sm) {
> > PresumedLoc pl = sm.getPresumedLoc(l);
> > return pl.getLine();
> > }
> >
> > static unsigned Column(SourceLocation const& l, SourceManager const&
> > sm) {
> > PresumedLoc pl = sm.getPresumedLoc(l);
> > return pl.getColumn();
> > }
> >
> > static std::string PrintClangStmt(Stmt const* stmt, ASTContext
> > const& ctx) {
> > std::string str;
> > llvm::raw_string_ostream pp(str);
> > stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
> > return pp.str();
> > }
> >
> > //===------------------------- DefUseNode -------------------------
> > ===//
> > inline bool defuse::DefUseNode::operator==(DefUseNode const& n)
> > const {
> > if (n.usage != usage || n.kind != kind) return false;
> > return kind == VarDecl ? var_decl == n.var_decl : var_ref ==
> > n.var_ref;
> > }
> >
> > inline clang::VarDecl const* DefUseNode::getDecl() const {
> > return kind == VarRef ? dyn_cast<clang::VarDecl> (var_ref->getDecl
> > ()) :
> > dyn_cast<clang::VarDecl> (var_decl);
> > }
> >
> > typedef std::set<DefUseNode const*> VarRefsSet;
> > typedef std::set<VarDecl const*> VarDeclsSet;
> > typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
> >
> > //===------------------------- DefUseBlock -------------------------
> > ===//
> >
> > class defuse::DefUseBlock : public DefUseChain {
>
> Please use encapsulation instead of subclassing. In this case you are
> subclassing std::map, and that is something there rarely is ever a
> good reason to do. Encapsulation ensures isolation between the two
> classes, and forces you to think in terms of interfaces instead of
> making assumptions about the implementation of the underlying data
> structure. Almost all of LLVM is structured this way, and it makes
> code far more resilient to implementation changes in the employed data
> structures.
>
> Instead, either have DefUseChain contain an std::map (and then define
> the proper interfaces in DefUseChain) or have DefUseBlock contain a
> DefUseChain, not subclass it. The former forces you to define only
> the interfaces you need for DefUseChain, instead of exposing the
> entirety of std::map (which isn't the interface you're intending to
> vend, even if it is just to your client code within the DefUse
> implementation).
I have to think about it... the main observation is that this stuff will
be only used in this translation unit.
>
> > public:
> > VarDeclsSet uses;
> > VarRefsVect defsout;
> > VarDeclsSet killed;
> > VarRefsVect reaches;
> >
> > class not_killed_set_iterator : public
> > std::iterator<std::input_iterator_tag,
> > DefUseNode, std::ptrdiff_t, DefUseNode const*> {
> > DefUseBlock& block;
> > VarRefsVect::const_iterator ptr_it;
> >
> > not_killed_set_iterator& inc(bool first) {
> > if (ptr_it != block.reaches.end() && !first)
> > ++ptr_it;
> > while (ptr_it != block.reaches.end() &&
> > (block.killed.find((*ptr_it)->getDecl()) != block.killed.end
> > () &&
> > std::find(block.defsout.begin(), block.defsout.end(),
> > *ptr_it)
> > == block.defsout.end()))
> > ++ptr_it;
> > return *this;
> > }
> > public:
> > not_killed_set_iterator(DefUseBlock& block_,
> > VarRefsVect::const_iterator ptr_it_) :
> > block(block_), ptr_it(ptr_it_) { inc(true); }
> > not_killed_set_iterator& operator++() { return inc(false); }
> > bool operator!=(not_killed_set_iterator const& iter) const {
> > return !((&iter.block) == &(block) && iter.ptr_it == ptr_it);
> > }
> > const DefUseNode* const & operator*() const { return *ptr_it; }
> > };
> >
> > not_killed_set_iterator begin_not_killed_set() {
> > return DefUseBlock::not_killed_set_iterator(*this, reaches.begin
> > ());
> > }
> > not_killed_set_iterator end_not_killed_set() {
> > return DefUseBlock::not_killed_set_iterator(*this, reaches.end());
> > }
> > };
> >
> > //===------------------------- VarDeclMap -------------------------
> > ===//
> >
> > class defuse::VarDeclMap : public StmtVisitor<VarDeclMap> ,
> > public llvm::DenseMap<VarDecl const*, DeclStmt const*> {
> > public:
> > VarDeclMap(Stmt const* body) { VisitStmt(const_cast<Stmt*> (body)); }
> >
> > void VisitDeclStmt(DeclStmt *DS) {
> > for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS-
> > >decl_end();
> > I != E; ++I)
> > {
> > if (VarDecl *VD = dyn_cast<VarDecl>(*I))
> > insert(std::make_pair(VD, DS));
> > }
> > }
> > void VisitStmt(Stmt* S) {
> > for (Stmt::child_iterator I = S->child_begin(), E = S->child_end();
> > I != E; ++I)
> > if (*I) Visit(*I);
> > }
> > };
> >
> > //===------------------------- DefUseHelper -------------------------
> > ===//
> > struct defuse::DefUseHelper::DefUseHelperImpl {
> > DefUseData* data;
> > VarRefBlockMap* bm;
> > VarDeclMap* dm;
> > unsigned blockID;
> >
> > DefUseHelperImpl() :
> > data(NULL), bm(NULL), dm(NULL) { }
> > };
>
> Per my comments above, it seems to me this pimpl logic can be sucked
> into a common class with the main logic for the DefUse algorithm, and
> then make that whole thing private to DefUse.cpp. That will be a lot
> simpler, and should migrate many of the implementation details from
> the header file to the .cpp file. Then clients just pass the
> DefUseHelper object (which has the HandleXXX methods) to
> BuildDefUseChains(), and the implementation class stays completely
> hidden (thus separating the DefUse algorithm from the client interface
> for querying that information).
I tried to deal with this.
>
> >
> > defuse::DefUseHelper::DefUseHelper() :
> > pimpl(new DefUseHelperImpl), current_use(DefUseNode::Use) { }
> >
> > defuse::DefUseHelper::~DefUseHelper() { delete pimpl; }
> >
> > void defuse::DefUseHelper::InitializeValues(DefUseData* data,
> > VarRefBlockMap* bm, VarDeclMap* dm) {
> > pimpl->data = data;
> > pimpl->bm = bm;
> > pimpl->dm = dm;
> > }
> >
> > void defuse::DefUseHelper::HandleDeclRefExpr(DeclRefExpr *DR) {
> > VarRefBlockMap& bm = *pimpl->bm;
> > unsigned blockID = pimpl->blockID;
> > DefUseBlock& block_data = (*pimpl->data)[blockID];
> > if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl())) {
> > // map this variable to the current block
> > bm[DR] = blockID;
> > if (block_data.find(VD) == block_data.end()) {
> > block_data.insert(std::make_pair(VD, DefUseVect()));
> > block_data.uses.insert(VD); // this variable is used in the
> > current
> > // block and it has no prior definition within the block
> > }
> > if (current_use == DefUseNode::UseDef) { // in case of compound
> > operators
> > DefUseVect &B = block_data[VD];
> > B.push_back(DefUseNode(DR, DefUseNode::Use));
> > B.push_back(DefUseNode(DR, DefUseNode::Def));
> > } else block_data[VD].push_back(DefUseNode(DR, current_use));
> > }
> > }
> >
> > void defuse::DefUseHelper::HandleDeclStmt(DeclStmt *DS) {
> > for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end
> > ();
> > I != E; I++)
> > {
> > if (VarDecl* VD = dyn_cast<VarDecl> (*I)) {
> > VarRefBlockMap& bm = *pimpl->bm;
> > VarDeclMap& dm = *pimpl->dm;
> > unsigned blockID = pimpl->blockID;
> > DefUseBlock& block_data = (*pimpl->data)[blockID];
> >
> > assert((block_data.find(VD) == block_data.end()) &&
> > "Variable redefined in the same block");
> > block_data[VD].push_back(DefUseNode(VD));
> > bm[dm[VD]] = blockID;
> > }
> > }
> > }
> >
> > void defuse::DefUseHelper::HandleBinaryOperator(BinaryOperator* B) {
> > DefUseNode::UseKind backup = current_use; // backup the usage
> > if (B->isAssignmentOp()) {
> > current_use = DefUseNode::Use;
> > Visit(B->getRHS());
> > current_use = DefUseNode::Def;
> > if (B->isCompoundAssignmentOp())
> > current_use = DefUseNode::UseDef;
> > Visit(B->getLHS());
> > } else {
> > Visit(B->getLHS());
> > current_use = DefUseNode::Use;
> > Visit(B->getRHS());
> > }
> > current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::HandleConditionalOperator
> > (ConditionalOperator* C) {}
> >
> > void defuse::DefUseHelper::HandleCallExpr(CallExpr* C) {
> > DefUseNode::UseKind backup = current_use; // backup the usage
> > for (CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end();
> > I != E; ++I)
> > {
> > if ((*I)->getType()->isPointerType() || (*I)->getType()-
> > >isReferenceType())
> > current_use = DefUseNode::Def;
> > Visit(*I);
> > current_use = backup;
> > }
> > }
> >
> > void defuse::DefUseHelper::HandleUnaryOperator(UnaryOperator* U) {
> > DefUseNode::UseKind backup = current_use; // backup the usage
> > switch (U->getOpcode()) {
> > case UnaryOperator::PostInc:
> > case UnaryOperator::PostDec:
> > case UnaryOperator::PreInc:
> > case UnaryOperator::PreDec:
> > current_use = DefUseNode::UseDef;
> > break;
> > case UnaryOperator::Plus:
> > case UnaryOperator::Minus:
> > case UnaryOperator::Not:
> > case UnaryOperator::LNot:
> > current_use = DefUseNode::Use;
> > break;
> > case UnaryOperator::AddrOf:
> > case UnaryOperator::Deref:
> > // use the current_use
> > break;
> > default:
> > // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode
> > ()) <<
> > // " not supported in def-use analysis");
> > break;
> > }
> > Visit(U->getSubExpr());
> > current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::HandleArraySubscriptExpr
> > (ArraySubscriptExpr* AS) {
> > DefUseNode::UseKind backup = current_use; // backup the usage
> > Visit(AS->getBase());
> > current_use = DefUseNode::Use;
> > Visit(AS->getIdx());
> > current_use = backup; // write back the usage to the current usage
> > }
> >
> > void defuse::DefUseHelper::VisitCFGBlock(CFGBlock const& blk,
> > CFGBlock const& entry) {
> > pimpl->blockID = blk.getBlockID();
> > for (CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I !=
> > E; ++I)
> > Visit(*I);
> >
> > unsigned blockID = pimpl->blockID;
> > DefUseBlock& block_data = (*pimpl->data)[blockID];
> >
> > // Post processing
> > // build up the uses(b), defsout(b) and killed(b)
> >
> > // for each variable
> > for (DefUseBlock::iterator I = block_data.begin(), E =
> > block_data.end(); I
> > != E; ++I) {
> > // in case of references to ParmVar what we do is insert the
> > variable
> > // definition in the first block of the CFG
> > DefUseBlock& root_data = (*pimpl->data)[entry.getBlockID()];
> > if (isa<ParmVarDecl>(I->first) &&
> > std::find(root_data[I->first].begin(), root_data[I->first].end
> > (),
> > DefUseNode(I->first)) == root_data[I->first].end())
> > {
> > root_data[I->first].push_back(DefUseNode(I->first));
> > root_data.defsout.push_back(&root_data[I->first].back());
> > }
> > // if this is a local variable, should't be in the defsout,killed
> > sets
> > // defsout is calculated by iterating backwards to find the last
> > use of
> > // the variable
> > for (DefUseVect::const_reverse_iterator UI = I->second.rbegin(),
> > UE =
> > I->second.rend(); UI != UE; ++UI)
> > if (UI->getUse() == DefUseNode::Def && (block_data.defsout.empty
> > ()
> > || block_data.defsout.back()->getDecl() != I->first)) {
> > block_data.defsout.push_back(&(*UI));
> > block_data.killed.insert(UI->getDecl());
> > break;
> > }
> > }
> > }
>
> Are these handle methods just part of the main DefUse algorithm? If
> so, consider sucking them into the Visit methods (thus removing the
> virtual dispatch), and have them call the HandleXXX methods in a
> separate DefUseHelper object if one was provided. That way the
> algorithm stays separate from those who want to observe its values.
there is no observing, the handleXXX are used to build DefUse.
>
> I know I keep saying this point, but a critical part of making this
> clean is making it clear where the def-use implementation is and where
> we would want to insert hooks to query information (including auditing
> the algorithm). Making those interfaces clear is key. Ideally, the
> header file should define just the functions we need to run the
> algorithm and query the def-use information, nothing more. I fully
> admit that I just might be confused with the details of the
> implementation.
>
> There is a bunch of other code that I'm afraid I haven't waded through
> yet, as there looks like there are some really meaty pieces there.
> I've noticed some uses of std::set and performing set union's, etc.
> Note that there may be a really high cost to using such methods, as
> there will be implicit allocations from malloc, etc. There may
> possibly be alternative data structures within LLVM that you can use
> for your purposes that perform far better, but I haven't looked at
> your implementation close enough to make more specific comments yet.
> I've given you enough comments, however, in this email to await any
> feedback you want to provide on the next iteration.
>
> Again, this is really great you're interested in contributing this
> back, and I look forward to hearing your reply.
>
thanks to you Ted!
> Best,
> Ted
cheers, Simone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DefUse.cpp
Type: text/x-c++src
Size: 35013 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20091101/f55d4f32/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DefUse.h
Type: text/x-chdr
Size: 8344 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20091101/f55d4f32/attachment.h>
More information about the cfe-dev
mailing list