[cfe-dev] CFG blocks and variable scope

Sat Mar 28 13:46:40 PDT 2009

On Mar 28, 2009, at 12:19 PM, Martin Doucha wrote:

>> There is currently no scope information in the CFG (or the AST for
>> that matter).  Adding this information would be extremely useful, and
>> would probably tie in for eventual support for encoding calls to C++
>> destructors in the CFG as well.
>>
>
> Great, so what's the preferable way of doing this? My idea is to have
> a tree of scopes (corresponding to CompoundStmt), each scope
> containing a complete list of variables declared inside it (not
> including declarations in nested scopes) regardless of control flow.

Hi Martin,

I haven't given a lot of thought to this yet, but I will comment on
this point.  Scopes can be introduced in many places, especially in
C++.  While I'm not certain whether you suggested this, we wouldn't
want to redo the work Sema already does in generating scope
information; ideally this information would still be accessible (when
desired) when one has the ASTs.

In the CFG, my thought was that *potentially* destructor calls could  
be explicitly modeled.  The lifetimes of regular stack variables could  
also be modeled using the same mechanism.  Since we haven't resolved  
how we want to represent destructors in the AST or CFG, I think that  
should probably be addressed first.

> Then each CFG block would have a single parent scope (the one
> directly above it) and a list of scopes inside it with a statement
> iterator pair designating the start and end of the scope in the
> block. Now the question is, can different edges leaving the block
> leave different sets of scopes?

Within a single basic block multiple scopes may be "pushed" and
"popped".  The CFG only corresponds to control flow, and thus nested
compound statements are flattened.  Note that C++ also introduces
scopes in many places that C does not.  For example:

   int y = 0;
   if (int x = y + 1) { ... }

There are three scopes here: the scope containing the 'if' statement
and 'int y = 0', the scope containing 'int x = y + 1', and the scope
within the { ... }.  The statements 'int y = 0' and 'int x = y + 1'
occur within the same basic block.  The successors of that basic block
will have entirely different scopes.
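
To make the three scopes concrete, here is a minimal sketch that
records when objects declared in each of them are constructed and
destroyed.  The names Tracer and run_scopes are hypothetical helpers
made up for this illustration; they are not part of Clang.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: appends a line to a shared log on construction
// and on destruction, so the order of lifetime events can be inspected.
struct Tracer {
    std::string name;
    std::vector<std::string>& log;
    Tracer(std::string n, std::vector<std::string>& l)
        : name(std::move(n)), log(l) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
    // Lets a Tracer be declared in an 'if' condition.
    explicit operator bool() const { return true; }
};

// Mirrors the shape of: int y = 0; if (int x = y + 1) { ... }
std::vector<std::string> run_scopes() {
    std::vector<std::string> log;
    {
        Tracer y("y", log);           // scope 1: encloses the 'if'
        if (Tracer x{"x", log}) {     // scope 2: introduced by the condition
            Tracer z("z", log);       // scope 3: the compound statement
        }                             // z is destroyed, then x
    }                                 // y is destroyed
    return log;
}
```

Calling run_scopes() yields the entries "ctor y", "ctor x", "ctor z",
"dtor z", "dtor x", "dtor y", in that order: the object declared in the
condition outlives the body of the 'if' but not the enclosing scope.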

At a high level, I don't think there is much value in modeling the  
notion of "scope" at all within a CFG, and the complexity cost would  
be high.  Scope is a concept of the language and its syntax, and thus  
it relates much more directly to the AST than the CFG.  The CFG  
encodes control-flow between expressions.  I really think that all  
that you are interested in here is the *effects* of scope on object  
lifetime rather than scope itself.  Since an object getting destroyed  
(and here an object can be anything that is stack allocated, not just  
a C++ object) is an actual event with potential side-effects, modeling
that in the CFG makes sense.  To me, having CFGs model scoping as well
would muddle their conceptual clarity (it would make CFGs a mongrel of
two orthogonal concepts).
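
As a rough sketch of what "modeling the event" could look like, the
toy data structure below lets a basic block hold either ordinary
statements or explicit end-of-lifetime markers.  Everything here
(Element, Block, build_example_cfg, objects_ended_in) is hypothetical
and invented for illustration; it is not Clang's CFG API, and how
destructors should actually be represented is still unresolved.

```cpp
#include <string>
#include <vector>

// Hypothetical CFG element: an ordinary statement, or an explicit
// "this object's lifetime ends here" event.
struct Element {
    enum Kind { Statement, LifetimeEnd } kind;
    std::string text;  // statement text, or the name of the object
};

// Hypothetical basic block: a flat list of elements plus successor
// indices.  There is no notion of scope anywhere in it.
struct Block {
    std::vector<Element> elements;
    std::vector<int> successors;
};

// Hand-built toy CFG for: int y = 0; if (int x = y + 1) { ... }
// Assumes, for illustration, that the enclosing scope ends right
// after the 'if'.
std::vector<Block> build_example_cfg() {
    std::vector<Block> cfg(3);
    // Block 0: both declarations share one basic block.
    cfg[0].elements.push_back({Element::Statement, "int y = 0"});
    cfg[0].elements.push_back({Element::Statement, "int x = y + 1"});
    cfg[0].successors = {1, 2};  // true branch, false branch
    // Block 1: placeholder for the body of the 'if'.
    cfg[1].elements.push_back({Element::Statement, "then-body"});
    cfg[1].successors = {2};
    // Block 2: join point, where x and then y cease to exist.
    cfg[2].elements.push_back({Element::LifetimeEnd, "x"});
    cfg[2].elements.push_back({Element::LifetimeEnd, "y"});
    return cfg;
}

// An analysis only walks elements; it never asks which scope it is in.
std::vector<std::string> objects_ended_in(const Block& b) {
    std::vector<std::string> names;
    for (const Element& e : b.elements)
        if (e.kind == Element::LifetimeEnd) names.push_back(e.text);
    return names;
}
```

The point of the design is that a client sees lifetime ends as
ordinary events in the flow of a block, and the scope structure that
produced them stays with the AST.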

Don't get me wrong: there is still value in having a way to query the  
scope of a variable, but I don't think that belongs in the CFG.   
Modeling scope information (which is done in Sema but not in the ASTs)  
means having some object or handle that represents a particular scope,  
being able to query what objects are in a scope and where a scope  
begins and ends, etc.  Ultimately analyses based on CFGs probably  
don't care about that information at all but rather about the  
ramifications of scope in terms of object lifetime.  This information  
could be captured during CFG construction (which could inspect the  
scope information) but the notion of scope shouldn't be in the CFGs  
themselves.

Ted