[cfe-dev] CFG: scopes and destructors

Wed Aug 11 12:26:15 PDT 2010

I'm less familiar with the CFG than with its clients, but FWIW I agree
with Ted. The notion of "scope" does not have a one-to-one mapping onto any
/control flow/ structures.

Remember, the CFG is only used for analysis, not code generation. We don't
need to know about "scope" in and of itself -- it's only important for
cleanup functions. (And with the attribute((cleanup(xyz))) extension, it's
not limited to destructors...but we can worry about that later.)

"Scopes" also aren't so useful when you consider branches that don't
rejoin. Consider this function:

int PerformComputation (int x) {
  if (DEBUG)
    return 0;

  while (!Satisfactory(x)) {
    if (x == 0)
      break;
    Computation C = Compute(x);
    if (!C)
      break;
    x = C.getResult();
  }

  return x;
}

Clearly a little contrived, but the point stands -- ~C is called at two
different points in the CFG ("during" the break, and at the end of the
loop, and not on every path in its scope. So to properly represent
destructors, we're probably going to have duplicated destructor elements.

This isn't inherently a bad thing, but it does show that "scope" is not as
useful as it seems at first.

Why would we want distinct CFGElements for destructors? Because the way
our path sensitive analysis works is by going from CFGElement to
CFGElement. If we can determine where a destructor is invoked, we could
probably even just treat it as a simple CXXMethodCall, since we don't have
to worry about the deletion aspects.

If we had the DestroyThese element (or LeavingScope, or whatever) you
suggest, we could stick several object destructors into a single
CFGElement. But when we simulate the execution of these, we'd basically
unwrap the destructors into several conceptual elements anyway. And since
one destructor can affect the behavior of the next (consider a double-free
of shared memory), there's clearly a point "between" two destructors.

Finally, the fact that temporaries have destructors too suggests that it'd
be useful to allow destructors in the middle of CFGBlocks as well.

I (also) am not insisting on separate elements for separate destructors,
but it seems that even if the CFG is more complicated, clients will have an
easier time with separate elements.

This whole time I've been dancing around the issue of how to generate
destructors. I think we're going to need some variation of the
forward-sweep-then-backwards-CFG, though I haven't thought about it much,
just because a purely backwards traversal would have a hard time dealing
with the branching example I gave above. (Or so it seems to me.) A
forward-first traversal could build a list of Decls, which could be used
something like this: "any time you leave this scope you have to run the
destructors for all elements in the list; any time you see a Decl, it
should be removed from the list". A first-pass approximation of a rule, of
course.

I also wish we could re-use some of the main compiler's logic for where to
insert destructors. I kind of doubt this is possible, but it'd be nice if
we didn't have to duplicate this in the CFG-building code, possibly
inserting mistakes along the way.

Oh, and I agree about putting constructor initializers in as new
CFGElements. For the most part they behave just like Decls anyway.

On Wed, 11 Aug 2010 18:07:48 +0200, Marcin Świderski
<marcin.sfider at gmail.com> wrote:
> Hi Ted,
> 
> Thank you for this very detailed response.
> 
> CFGScope serves two purposes in what I thought of:
> 1. It describes the scope and it's relation with enclosing scope.
> 2. It allows for easy iteration through declarations (DeclStmts) we need
to
> call destructors for.
> 
> Combining those two things is maybe not the best idea, so further on I
will
> concetrate on destructors only.
> 
> What you wrote about forward traversing e.g. CompondStmt to spot all
> declarations we need to call destructors for was exactly what I thought
of.
> However adding one CFGElement for each destructor call is unnecessery I
> think. What I thought of was using data stored in CFGScope to provide
list
> of all declarations that needs destructors to be called for them. Now I
> think that it would be better to use some other structure just for
> destructors that could be even hidden from the client.
> 
> So I think that we could create CFGElement that would be placed at
point,
> where destructors are called. It would provide, through iterators
> interface,
> list of declarations that needs destructors to be called for them.
> 
> Pros:
> 1. Less memory used (in most cases I think), because each destructor
will
> have one entry in some list, then we will have just one CFGElement for
each
> point where destructors are called.
> 2. Should be more robust, because less work will be done.
> 3. Easier implementation for goto statements.
> 
> Cons:
> 1. Maybe a little less intuitive interface, because having one
CFGElement
> for each destructor call is somehow more readable.
> 
> So do you think that I can go on with my idea (for destructors only) or
the
> one-CFGElement-for-one-desctructor-call is more desired?
> 
> W dniu 11 sierpnia 2010 08:21 użytkownik Ted Kremenek
> <kremenek at apple.com>napisał:
> 
> Hi Marcin,
>>
>> First, I want to thank you for working on this.  Adding support for
>> destructors, and possibly scopes, to the CFG is something that is
sorely
>> needed for analyzing C++ code.
>>
>> At the risk of sounding pedantic, I think when evaluating our possible
>> approaches it's really important to keep in mind both *how* the
>> information
>> in the CFG will be used to service clients and *what* information the
>> CFG is
>> meant to represent.  Second, I think we shouldn't focus on the
>> implementation details (e.g., "the builder is processing AST
backwards")
>> of
>> how we would construct the information before we have an idea of what
the
>> right API is for accessing the information in its final form.  The
whole
>> point of the CFG is to make it easier to write analyses, so having a
good
>> API that represents useful information is the primary goal.
>>
>> That's all well and good, but what does that mean?  First let's step
back
>> and examine what purpose the CFG serves.  The CFG is meant to represent
>> control-flow in a function; nothing more and nothing less. 
Representing
>> the
>> control-flow is important for any analyses we want to build on top of
the
>> CFG that want to know the ordering between statements, the change of
>> control
>> caused by jumps and loops, etc.  With this in mind, let's consider both
>> the
>> problem of scopes and destructors.
>>
>> From your proposal of CFGScope, if you take a look at the definition it
>> actually has nothing to do with control-flow at all.  In C and its
>> derivatives, scope is purely a lexical concept.  This is evident since
>> Sema
>> and the Parser reason about scopes but don't actually reason about
>> control-flow.  If you renamed 'CFGScope' to 'ASTScope' then it
>> functionally
>> would be the same thing.  Thus in my mind tying it to CFGs really has
no
>> value, and conflates our mental model of what the CFG does.  I think
it's
>> reasonable to consider building ASTScopes (or whatever they are called)
>> as
>> part of the process of building a CFG (sort of a by-product), but
unless
>> the
>> scope information is valuable to model in the CFG as something a client
>> of
>> the CFG would want to know about *when reasoning about control-flow*
>> then I
>> think it is an orthogonal topic.  Before I believe I argued that this
>> information could be represented in the CFG, but I'm honestly not
certain
>> what benefit that would provide other than modeling when we cross scope
>> boundaries.  That information, however, can likely be constructed by
>> analysis using a side map from statements to scope, so it's not clear
if
>> that needs to be in the CFG.
>>
>> Now let's consider destructor calls.  Destructor calls are really just
>> hidden function calls inserted by the compiler.  If we looked at the
>> LLVM IR
>> for a function that created/destroyed C++ objects, the calls to the
>> destructors would be explicitly represented.  This is crucial for
>> building
>> analyses in the LLVM backend.  Source-level analyses really need this
>> information as well, since when destructors get called is non-trivial,
>> and
>> we don't want individual analyses needing to reconstruct this
>> information.
>>  This is actually far more complicated then when we cross scope
>>  boundaries.
>>
>> To illustrate, consider:
>>
>>   MyClass A;
>>   ...
>>   if (...)
>>     return;  // Destructor for A called
>>   ...
>>   MyClass B;
>>   ...
>>   if (...)
>>     return;  // Destructor for B called, followed by destructor for A
>>   ...
>>   if (...) {
>>     MyClass C;
>>     return; // Destructor C called, followed by destructor for B,
>>     followed
>> by destructor for A
>>   }
>>   ...
>>   return;  // Destructor B called, followed by destructor for A
>>
>> In the comments, I have listed distinct and well-ordered function
calls.
>>  This example is fairly simple, but if you consider adding things like
>> exceptions, C++ temporaries, etc., we have an abundance of destructor
>> calls
>> that can be called at different times.  We certainly do not want
>> individual
>> analyses built on the CFG to have to reconstruct this information from
>> the
>> raw scope information.  Also, we want to be able to model individual
>> destructor calls, particularly if any of them can throw exceptions or
do
>> something else crazy that impacts control-flow (e.g., call a no-return
>> function).  For the purpose of diagnostics from analyses (think of the
>> static analyzer), we want to know (relatively speaking) where a
>> destructor
>> was called.
>>
>> This is why, fundamentally, I think destructors need to be explicitly
>> represented in the CFG.  Dataflow analyses need to know when
destructors
>> are
>> called and in what order, and they need to model this precisely and
>> easily.
>>  Scopes are obviously and important part of reasoning about
constructors,
>> and for the CFGBuilder to model destructors in the CFG it will need to
>> reason about scopes.  That, however, is different than having the scope
>> information you described as 'CFGScope' being associated with the CFG
>> itself.
>>
>> Now let's consider implementation.  Indeed there is an interesting
>> challenge in doing this in CFGBuilder because we process the AST
>> backwards.
>>  Processing it backwards, however, will actually allow us to readily
>> determine the right destructor calls *if* we can readily know the
current
>> set of objects in a given scope that need their destructors called. 
I'm
>> just throwing an idea out here, but this can possibly be done by doing
a
>> mixture of "backwards" and "forward" processing of an AST.
>>
>> To illustrate, consider the case of a CompoundStmt.  Right now we
iterate
>> through the statements in the CompoundStmt backwards, and build
>> CFGBlocks as
>> needed.  A CompoundStmt represents a distinct scope, and the objects in
>> that
>> scope are destroyed in reverse order that they are declared.  One
>> possibility is that when we process a CompoundStmt in CFGBuilder we
>> first do
>> a forward pass to build up a list of objects (Decls) that would need
>> their
>> destructors called.  We then process the CompoundStmt in reverse order.
>>  When we see a point where we would exit the scope of the CompoundStmt,
>>  we
>> consult our list of objects (Decls) that would need to be destroyed and
>> add
>> the needed CFGElements for the destructor calls.  Later in our
>> processing of
>> CompoundStmt, when we see one of the DeclStmts for one of the objects,
we
>> just remove that object from the list of objects that needed to be
>> destroyed
>> and keep processing.  Then when we see another point where we would
exit
>> the
>> scope, we consult the current list of objects that need to be destroyed
>> (which will only include the objects created up to that point) and add
>> the
>> appropriate CFGElements for the destructor calls.  While this approach
>> requires a little extra processing, algorithmically it's not all that
>> complicated and it is precise.
>>
>> Of course we would also need to model nested scopes that also create
>> objects.  We would thus have a chain of "lists of objects" (one list
for
>> each nested scope), popping off lists as we process the first statement
>> in
>> that scope.  We then create the destructors for all objects not in the
>> scope
>> we are going to.
>>
>> To conclude, I'm not saying that my idea is the right (and only
>> approach).
>>  I just wanted to point out more of the design parameters here, and
that
>>  it
>> actually isn't as hard as it seems to explicitly model the destructors
in
>> the CFG.  I'm also mixed on what value there is modeling scopes
directly
>> in
>> the CFG itself, since it seems like a complementary concept.
>>
>> On Aug 9, 2010, at 5:05 PM, Marcin Świderski wrote:
>>
>> > Hi,
>> >
>> > I'm trying to figure out how to add scopes and destructors to CFG. As
I
>> understand, what we need is:
>> >
>> > For scopes:
>> > - CFGElement kinds for start/end of scopes that would allow for
>> > tracking
>> entering and leaving scopes.
>> > - If there are no variable declarations in scope, scope start/end
>> elements should be omited.
>> > - Scope end element should be omited if there are no variable
>> declarations before it in the scope.
>> >
>> > For destructors:
>> > - CFGElement kind for call to destructor.
>> >
>> > The main problem with implementing this in CFGBuilder is that the
>> > builder
>> is processing AST backwards. When the builder have to decide if it
should
>> add a scope end element or a destructor it doesn't know of variables
>> declared earlier in the scope. To solve this there have to be an
>> additional
>> step of walking e.g. a block of statements (compound statement) and
>> creating
>> a list of all statements declaring variables.
>> >
>> > Solution that I would like to propose is to make those lists a part
of
>> CFG instead of just temporary data used while building the graph. It
>> would
>> look something like this:
>> >
>> > class CFGScope {
>> >   Stmt* stmt; // Statement that creates the scope (e.g. CompoundStmt)
>> >   vector<Stmt*> decls; // Statements that declare variables in same
>> >   order
>> as in AST
>> >   CFGScopeIterator parent; // Link to position in parent scope where
>> >   this
>> scope started
>> > };
>> >
>> > class CFGScopeIterator {
>> >   CFGScope* scope; // Scope pointed by iterator
>> >   vector<Stmt*>::iterator pos; // Position in declarations list of
the
>> scope.
>> > };
>> >
>> > Now the scope start element would point to CFGScope object. The scope
>> > end
>> element would point to pair (range) of CFGScopeIterators: first would
>> point
>> to position in current scope the scope end occured, second would point
to
>> position in some scope where the control flow did jump to. Element for
>> destructors wouldn't be needed, because destructors could be easly
>> deduced
>> from range in scope end element.
>> >
>> > This design would allow for simple implementation of jumps out of
>> > nested
>> scopes and simple modeling of destructors. Similar design could be used
>> to
>> model destructors for temporary objects.
>> >
>> > What do you think about this design?
>> >
>> > Cheers
>> > Sfider
>>
>>