[cfe-dev] Cleaning up the representation of Decls in the AST

Argiris Kirtzidis akyrtzi at gmail.com
Fri Aug 22 15:52:07 PDT 2008


Hi Ted,

This looks like a great design!
Just a few questions:

1) Who will own struct references/definitions, will there be DeclGroup 
owners for them too ?
For example:

struct S1;
struct S1;
struct S1 { int x; };
struct S2;
struct S2 {int x; } foo;

who owns these and what will the pretty-print output be for the above 
example ?

2)
> DeclGroup -> TypeDecl Decl+
>            -> Decl
>   
Given a RecordDecl how can I find out its owner, will TypeDecl point to 
its owning DeclGroup (or Type point to the DeclGroup owning the TypeDecl) ?


-Argiris


Ted Kremenek wrote:
> Over the last couple of weeks Daniel Dunbar and I have had a series of
> conversations about various serious issues with the current design and
> implementation of Decls in Clang.  We've come up with some proposed
> changes that we think will resolve most of the problems that we know
> about, and I thought I would sketch out these ideas here so that we
> can openly discuss these ideas and debate their value.
>
> I'm going to start by outlining some of the problems we are trying to
> solve and then discuss our solution from both a design and  
> implementation
> level.
>
> We're hoping to start implementing these changes very soon (next week).
>
> Any comments or feedback you have are appreciated!
>
>
> PROBLEMS WE'RE TRYING TO ADDRESS
>
> (1) AMBIGUOUS OWNERSHIP AND LOSS OF INFORMATION
>
> There are many cases where it is not clear what is the owning
> reference to a Decl.  This poses problems for both AST serialization
> (which is needed for PCH support), but is also needed for refactoring
> clients that wish to modify the AST.  Similarly, the ASTs don't retain
> information about the source code that would be vital for accurately
> representing key elements of the original source code (necessary for
> refactoring), or even being able to pretty-print valid ASTs to code
> that can compile.
>
> For example, consider the following code:
>
> [EXAMPLE 1]
>
> void f() {
>    typedef struct x { int y; } foo;
> }
> struct x { int y; };
> void g() {
>    typedef struct x foo;
> }
>
> clang -ast-print currently pretty-prints this as:
>
> void f() {
>    typedef struct x foo;
> }
> ...
> void g() {
>    typedef struct x foo;
> }
>
> Currently both f() and g() appear the same during printing because
> their is no reference from either the TypedefDecl or the DeclStmt to
> the RecordDecl for 'struct x'.  Only the RecordType for 'struct x'
> refers to the RecordDecl, which means in this case the RecordType
> actually *owns* the RecordDecl.  This is horrible, not only because a
> Type object should not own Decls, but that a RecordDecl can sometimes
> be owned by a DeclStmt or be a top-level declaration:
>
> [EXAMPLE 2]
>
> struct x { int y; }; // GOOD: The TranslationUnit owns the RecordDecl.
> struct z { int y; } z; // BAD: The RecordType owns the RecordDecl.
>
> The lack of clear ownership semantics, along with the loss of
> information that it implies, makes it very difficult to always
> faithfully pretty-print code that actually compiles, or compiles with
> the same semantics:
>
> [EXAMPLE 3]
>
> void f() {
>    struct { int x; } a, b, *c;
> }
>
> clang -ast-print:
>
> void f() {
>    struct <anonymous> a;
>    struct <anonymous> b;
>    struct <anonymous> *c;
> }
>
> clang -ast-dump:
>
> void f()
> (CompoundStmt 0xc070f0 <<stdin>:3:12, line:5:3>
>    (DeclStmt 0xc070d0 <line:4:5>
>      0xc06ff0 "struct <anonymous> a"
>      0xc07050 "struct <anonymous> b"
>      0xc070a0 "struct <anonymous> *c")
>
> As before, in this case the RecordType owns the RecordDecl.  Not only
> that, but -ast-print pretty-prints the variable declaration in a
> completely broken way.  While --ast-print can be enhanced to do a
> better job in this case, fundamentally the Decl for the anonymous
> struct needs to referenced from the AST nodes themselves.
>
> (2) NO DISTINCTION BETWEEN A 'DECL' AND A 'GROUP OF DECLS'
>
> The way we represent "groups of Decls" is both not uniform and poorly
> abstracted.
>
> For multi-variable declarations (the most common case of "groups of
> Decls"), we represent them as a single ScopedDecl* that points to the
> first ScopedDecl* in a chain.  The problem is that many clients assume
> that when they are handed a single Decl* they are operating on a
> single Decl, not a group of Decls.  This continues to bite us over and
> over.  Here are two examples.
>
> The first is the AST pretty-printer, which only pretty-prints the
> first Decl in a multi-variable declaration that appears at the
> top-level:
>
> $ clang -ast-print
> int a, b, c;
> ^d
> typedef char *__builtin_va_list;
> Read top-level variable decl: 'a'
>
> Clearly this can be fixed, but basically this client was written to
> assume that when it was handed a Decl* it was just reasoning about a
> single Decl.  Clients have to do downcasts and isa<> interrogations of
> Decls to determine if a Decl* is referring to a single Decl or a group
> of Decls.  In the particular case of the pretty-printer, I counted
> about four or five locations in StmtPrinter.cpp where
> DeclStmt::getDecl() is called (which returns a chain of ScopedDecls)
> and only the first Decl is processed.
>
> A second example of where this use to bite us is codegen support in
> Clang.  Until recently, when we saw "int a, b, c;" we only did the
> codegen for 'a'.  This was fixed, but obviously future clients will
> continue to make the same mistake over and over, and their failure
> modes might not be as dramatic to immediately notice they are getting
> it wrong.
>
> Clearly a disambiguation between Decls and 'groups of Decls' would be
> useful, not only for clients, but also for cleanliness within the ASTs
> themselves.  This would help resolve numerous ownership issues as
> well.  For example, if we had DeclStmt::getDeclGroup() instead of
> DeclStmt::getDecl() (or something like that) that returned a group of
> Decls then this kind of programming error wouldn't be possible because
> the type system with prohibit it.  Clients would explicitly reason
> about 'groups of Decls'.  This would prohibit many errors, but also
> would simplify code in many cases.  Specifically, in the cases where a
> single Decl* is passed to a client the client can assume they are
> dealing with a single Decl.
>
> (3) ScopedDecl CHAIN: CONFLATED IMPLEMENTATION AND LACK OF ABSTRACTION
>
> The decl chain in ScopedDecl is currently used for different purposes
> in different contexts, which muddles up the ASTs and makes it often
> very unclear what is going on.
>
> For example, an EnumDecl represents its enum constants by chaining
> them together using a ScopedDecl chain.  This same chain is used to
> represent multi-variable declarations as well as chaining together
> NamespaceDecls.  In each of these cases the ownership semantics with
> regards to the chain are slightly different, and conceptually they
> represent very different things.
>
> For C++ namespaces, we can still chain them together (as we do with
> FunctionDecls), but not used the ScopedDecl chain because it is
> semantically something very different.
>
> Further, most clients that do successfully walk the ScopedDecl chain
> walk the chain directly.  This is bad because it fuses us to a
> particular implementation instead of having clients reason about an
> abstraction.  Instead we should use an iterator interface to walk a
> group of decls.
>
> In generally, we believe that the ScopedDecl chain should just be
> removed.  It is used in logically inconsistent ways and almost every
> client that has used it has gotten things wrong at least once.
>
>
> PROPOSED SOLUTION
>
> We propose the following solution to address these issues:
>
> - EnumDecl and NamespaceDecl no longer used the getNextDeclarator()
>   chain in ScopedDecl. Both can use a chain (as they do now), or some
>   alternate representation. However, the implementation chosen will be
>   private to each class and an iterator interface should suffice to
>   allow clients to ignore the details of the implementation.
>
> - Introduce the notion of "DeclGroup" that represents a group of
>   (comma-separated) Decls.
>
> Essentially, DeclGroup would have the following grammar:
>
> DeclGroup -> TypeDecl Decl+
>            -> Decl
>
> DeclGroup would *own* the Decls it references.  The "TypeDecl"  
> represents an optional extra type declaration associated with the  
> DeclGroup.
>
> For example, to represent:
>
> struct {} x, y;
>
> The DeclGroup consists of:
>
> (1) A RecordDecl (the "TypeDecl") for the anonymous struct.
> (2) Two VarDecls for x and y.
>
> This also works for typedefs:
>
> typedef struct x {} foo, bar;
>
> becomes a DeclGroup consisting of:
>
> (1) A RecordDecl for 'struct x' (the "TypeDecl").
> (2) Two TypedefDecls for 'foo' and 'bar'.
>
> Note that this completely resolves the ownership issues with anonymous
> structs and similar constructs.  It also means that Types no longer
> own Decls (a good thing).
>
> For the common case where a DeclGroup contains a single Decl (and no
> extra "TypeDecl" is needed), DeclGroup would have an optimized
> implementation so that using DeclGroup has no additional space
> overhead compared to our current implementation. Our idea is to
> essentially implement DeclGroup as a smart pointer, and for this
> common case it just contains the pointer to the solitary Decl (no
> additional overhead).
>
> Moreover, by using a smart-pointer DeclGroup instead of a chain of  
> ScopedDecls, we actually will save the space used by the  
> NextDeclarator field in ScopedDecl (since it's NULL most of the time).
>
> - DeclStmt uses DeclGroup instead of a ScopedDecl*.
>
> - *All* top-level Decls would be replaced by DeclGroups.
>
> Conceptually this unifies how DeclStmt and the top-level represent
> groups of Decls.
>
> ParseAST would be modified to call
> ASTConsumer::HandleTopLevelDeclGroup instead of HandleTopLevelDecl. We
> can still have HandleTopLevelDecl, and have the default implementation
> of HandleTopLevelDeclGroup just call HandleTopLevelDecl for each of
> its constitute Decls.
>
> - RecordDecls would represent their fields as a group of DeclGroups.
>   This enables us to accurately represent:
>
> struct x {
>    struct y { int z; } a, b;
>    int c;
> };
>
> A field_iterator interface can provide the necessary abstraction to
> clients that just want to iterate over the FieldDecls of a struct.
>
> - The arguments of a function would be represented as a list of
>   DeclGroups.  This is because the following is legal (note the
>   anonymous struct):
>
> int foo(int x, struct { int a; } y);
>
> Similarly, an iterator interface provides clients with a clean way to
> iterate over the ParmVarDecls in a function prototype.
>
> - Similar changes everywhere else where we have groups of
>   (comma-separated) Decls that can contain both type and variable
>   declarations.
>
>
> PROS AND IMPLEMENTATION
>
> We believe these changes could be made very incrementally to the
> parse/Sema/ASTs/ASTConsumers.
>
> The main benefits we see from these changes are:
>
> - Cleaner API: Clients always know when they are reasoning about Decls
>   or groups of Decls.
>
> - Source-Fidelity: We capture much more of the source information in
>   the ASTs.  This is great for refactoring clients.
>
> - Object-Ownership: Most of the known object ownership issues in the
>   ASTs go away, making it easier for clients to manipulate
>   (e.g. refactoring) as well as unblocking full AST serialization
>   (needed for PCH).
>
> - Abstraction: Given the proper iterator interfaces, most clients will
>   not care about DeclGroups.  They can just use the interfaces to
>   iterate over Decls.  The default implementation of
>   ASTConsumer::HandleTopLevelDeclGroup would just call
>   ASTConsumer::HandleTopLevelDecl for each Decl* in a top-level
>   DeclGroup.  Clients that care about top-level DeclGroups can
>   override HandleTopLevelDeclGroup.
>
> - ASTs better reflect the grammar of C: We believe these changes would
>   make it clearer in many cases what source constructs a particular
>   AST node can represent.  This is invaluable for clients.
>
>
>
> Thoughts?
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>   



More information about the cfe-dev mailing list