[cfe-dev] CFG: scopes and destructors

Wed Aug 11 09:07:48 PDT 2010

Hi Ted,

Thank you for this very detailed response.

CFGScope serves two purposes in what I thought of:
1. It describes the scope and it's relation with enclosing scope.
2. It allows for easy iteration through declarations (DeclStmts) we need to
call destructors for.

Combining those two things is maybe not the best idea, so further on I will
concetrate on destructors only.

What you wrote about forward traversing e.g. CompondStmt to spot all
declarations we need to call destructors for was exactly what I thought of.
However adding one CFGElement for each destructor call is unnecessery I
think. What I thought of was using data stored in CFGScope to provide list
of all declarations that needs destructors to be called for them. Now I
think that it would be better to use some other structure just for
destructors that could be even hidden from the client.

So I think that we could create CFGElement that would be placed at point,
where destructors are called. It would provide, through iterators interface,
list of declarations that needs destructors to be called for them.

Pros:
1. Less memory used (in most cases I think), because each destructor will
have one entry in some list, then we will have just one CFGElement for each
point where destructors are called.
2. Should be more robust, because less work will be done.
3. Easier implementation for goto statements.

Cons:
1. Maybe a little less intuitive interface, because having one CFGElement
for each destructor call is somehow more readable.

So do you think that I can go on with my idea (for destructors only) or the
one-CFGElement-for-one-desctructor-call is more desired?

W dniu 11 sierpnia 2010 08:21 użytkownik Ted Kremenek
<kremenek at apple.com>napisał:

Hi Marcin,
>
> First, I want to thank you for working on this.  Adding support for
> destructors, and possibly scopes, to the CFG is something that is sorely
> needed for analyzing C++ code.
>
> At the risk of sounding pedantic, I think when evaluating our possible
> approaches it's really important to keep in mind both *how* the information
> in the CFG will be used to service clients and *what* information the CFG is
> meant to represent.  Second, I think we shouldn't focus on the
> implementation details (e.g., "the builder is processing AST backwards") of
> how we would construct the information before we have an idea of what the
> right API is for accessing the information in its final form.  The whole
> point of the CFG is to make it easier to write analyses, so having a good
> API that represents useful information is the primary goal.
>
> That's all well and good, but what does that mean?  First let's step back
> and examine what purpose the CFG serves.  The CFG is meant to represent
> control-flow in a function; nothing more and nothing less.  Representing the
> control-flow is important for any analyses we want to build on top of the
> CFG that want to know the ordering between statements, the change of control
> caused by jumps and loops, etc.  With this in mind, let's consider both the
> problem of scopes and destructors.
>
> From your proposal of CFGScope, if you take a look at the definition it
> actually has nothing to do with control-flow at all.  In C and its
> derivatives, scope is purely a lexical concept.  This is evident since Sema
> and the Parser reason about scopes but don't actually reason about
> control-flow.  If you renamed 'CFGScope' to 'ASTScope' then it functionally
> would be the same thing.  Thus in my mind tying it to CFGs really has no
> value, and conflates our mental model of what the CFG does.  I think it's
> reasonable to consider building ASTScopes (or whatever they are called) as
> part of the process of building a CFG (sort of a by-product), but unless the
> scope information is valuable to model in the CFG as something a client of
> the CFG would want to know about *when reasoning about control-flow* then I
> think it is an orthogonal topic.  Before I believe I argued that this
> information could be represented in the CFG, but I'm honestly not certain
> what benefit that would provide other than modeling when we cross scope
> boundaries.  That information, however, can likely be constructed by
> analysis using a side map from statements to scope, so it's not clear if
> that needs to be in the CFG.
>
> Now let's consider destructor calls.  Destructor calls are really just
> hidden function calls inserted by the compiler.  If we looked at the LLVM IR
> for a function that created/destroyed C++ objects, the calls to the
> destructors would be explicitly represented.  This is crucial for building
> analyses in the LLVM backend.  Source-level analyses really need this
> information as well, since when destructors get called is non-trivial, and
> we don't want individual analyses needing to reconstruct this information.
>  This is actually far more complicated then when we cross scope boundaries.
>
> To illustrate, consider:
>
>   MyClass A;
>   ...
>   if (...)
>     return;  // Destructor for A called
>   ...
>   MyClass B;
>   ...
>   if (...)
>     return;  // Destructor for B called, followed by destructor for A
>   ...
>   if (...) {
>     MyClass C;
>     return; // Destructor C called, followed by destructor for B, followed
> by destructor for A
>   }
>   ...
>   return;  // Destructor B called, followed by destructor for A
>
> In the comments, I have listed distinct and well-ordered function calls.
>  This example is fairly simple, but if you consider adding things like
> exceptions, C++ temporaries, etc., we have an abundance of destructor calls
> that can be called at different times.  We certainly do not want individual
> analyses built on the CFG to have to reconstruct this information from the
> raw scope information.  Also, we want to be able to model individual
> destructor calls, particularly if any of them can throw exceptions or do
> something else crazy that impacts control-flow (e.g., call a no-return
> function).  For the purpose of diagnostics from analyses (think of the
> static analyzer), we want to know (relatively speaking) where a destructor
> was called.
>
> This is why, fundamentally, I think destructors need to be explicitly
> represented in the CFG.  Dataflow analyses need to know when destructors are
> called and in what order, and they need to model this precisely and easily.
>  Scopes are obviously and important part of reasoning about constructors,
> and for the CFGBuilder to model destructors in the CFG it will need to
> reason about scopes.  That, however, is different than having the scope
> information you described as 'CFGScope' being associated with the CFG
> itself.
>
> Now let's consider implementation.  Indeed there is an interesting
> challenge in doing this in CFGBuilder because we process the AST backwards.
>  Processing it backwards, however, will actually allow us to readily
> determine the right destructor calls *if* we can readily know the current
> set of objects in a given scope that need their destructors called.  I'm
> just throwing an idea out here, but this can possibly be done by doing a
> mixture of "backwards" and "forward" processing of an AST.
>
> To illustrate, consider the case of a CompoundStmt.  Right now we iterate
> through the statements in the CompoundStmt backwards, and build CFGBlocks as
> needed.  A CompoundStmt represents a distinct scope, and the objects in that
> scope are destroyed in reverse order that they are declared.  One
> possibility is that when we process a CompoundStmt in CFGBuilder we first do
> a forward pass to build up a list of objects (Decls) that would need their
> destructors called.  We then process the CompoundStmt in reverse order.
>  When we see a point where we would exit the scope of the CompoundStmt, we
> consult our list of objects (Decls) that would need to be destroyed and add
> the needed CFGElements for the destructor calls.  Later in our processing of
> CompoundStmt, when we see one of the DeclStmts for one of the objects, we
> just remove that object from the list of objects that needed to be destroyed
> and keep processing.  Then when we see another point where we would exit the
> scope, we consult the current list of objects that need to be destroyed
> (which will only include the objects created up to that point) and add the
> appropriate CFGElements for the destructor calls.  While this approach
> requires a little extra processing, algorithmically it's not all that
> complicated and it is precise.
>
> Of course we would also need to model nested scopes that also create
> objects.  We would thus have a chain of "lists of objects" (one list for
> each nested scope), popping off lists as we process the first statement in
> that scope.  We then create the destructors for all objects not in the scope
> we are going to.
>
> To conclude, I'm not saying that my idea is the right (and only approach).
>  I just wanted to point out more of the design parameters here, and that it
> actually isn't as hard as it seems to explicitly model the destructors in
> the CFG.  I'm also mixed on what value there is modeling scopes directly in
> the CFG itself, since it seems like a complementary concept.
>
> On Aug 9, 2010, at 5:05 PM, Marcin Świderski wrote:
>
> > Hi,
> >
> > I'm trying to figure out how to add scopes and destructors to CFG. As I
> understand, what we need is:
> >
> > For scopes:
> > - CFGElement kinds for start/end of scopes that would allow for tracking
> entering and leaving scopes.
> > - If there are no variable declarations in scope, scope start/end
> elements should be omited.
> > - Scope end element should be omited if there are no variable
> declarations before it in the scope.
> >
> > For destructors:
> > - CFGElement kind for call to destructor.
> >
> > The main problem with implementing this in CFGBuilder is that the builder
> is processing AST backwards. When the builder have to decide if it should
> add a scope end element or a destructor it doesn't know of variables
> declared earlier in the scope. To solve this there have to be an additional
> step of walking e.g. a block of statements (compound statement) and creating
> a list of all statements declaring variables.
> >
> > Solution that I would like to propose is to make those lists a part of
> CFG instead of just temporary data used while building the graph. It would
> look something like this:
> >
> > class CFGScope {
> >   Stmt* stmt; // Statement that creates the scope (e.g. CompoundStmt)
> >   vector<Stmt*> decls; // Statements that declare variables in same order
> as in AST
> >   CFGScopeIterator parent; // Link to position in parent scope where this
> scope started
> > };
> >
> > class CFGScopeIterator {
> >   CFGScope* scope; // Scope pointed by iterator
> >   vector<Stmt*>::iterator pos; // Position in declarations list of the
> scope.
> > };
> >
> > Now the scope start element would point to CFGScope object. The scope end
> element would point to pair (range) of CFGScopeIterators: first would point
> to position in current scope the scope end occured, second would point to
> position in some scope where the control flow did jump to. Element for
> destructors wouldn't be needed, because destructors could be easly deduced
> from range in scope end element.
> >
> > This design would allow for simple implementation of jumps out of nested
> scopes and simple modeling of destructors. Similar design could be used to
> model destructors for temporary objects.
> >
> > What do you think about this design?
> >
> > Cheers
> > Sfider
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20100811/b69d8645/attachment.html>