[cfe-dev] Normalizing the AST across languages

Thu Oct 30 10:48:58 PDT 2008

On Oct 30, 2008, at 9:45 AM, Doug Gregor wrote:

> There are a few places in Clang where we build different AST nodes
> depending on which language we are parsing. I believe we should seek
> to eliminate those differences, so that clients only need to deal with
> one AST node per kind of entity.
>
> The most obvious example of what I'm talking about is the difference
> between RecordDecl and CXXRecordDecl. RecordDecl is used when we're
> parsing C, while CXXRecordDecl is used when we're parsing C++. Here's
> the chunk of code from Sema::ActOnTag that handles the allocation:
>
>    if (getLangOptions().CPlusPlus)
>      // FIXME: Look for a way to use RecordDecl for simple structs.
>      New = CXXRecordDecl::Create(Context, Kind, CurContext, Loc, Name);
>    else
>      New = RecordDecl::Create(Context, Kind, CurContext, Loc, Name);
>
> The intent of CXXRecordDecl is clear: since C++ requires us to keep
> additional information about classes in the AST (which isn't needed in
> C), all that extra information goes into CXXRecordDecl so that we
> don't bloat the C compilation with unused data.  This means that
> compiling a C program as C++ uses different ASTs and requires more
> memory.

Right, this is something I asked Argiris to do.  The intent was for
"C-like" struct definitions in C++ to use the lighter-weight RecordDecl
when possible (which is what the FIXME is about).

> I don't think that's a desirable outcome, for several reasons:
>
>  (1) It's harder for AST clients to juggle two different kinds of AST
> nodes that represent the same thing. (And it'll get really hard if we
> start trying to deal with ASTs for multiple translation units, but
> that's way off in the future)

I'm not sure what you mean.  Consider base classes.  The intent here  
was for CXXRecordDecl to remain the same, but for RecordDecl to get  
something like this:

   unsigned getNumBases() const {
     if (const CXXRecordDecl *CXX = dyn_cast<CXXRecordDecl>(this))
       return CXX->getNumBases();
     return 0;
   }

This means that all *clients* should be able to use RecordDecl and  
never have to poke at CXXRecordDecl unless they want to.  With this  
approach, CXXRecordDecl is just a hidden implementation detail.
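
To make the forwarding idea concrete outside of Clang, here is a minimal
standalone sketch.  It is not Clang code: the `forwarding_sketch`
namespace, the `DeclKind` tag, `addBase`, and the `asCXXRecord` helper
(a stand-in for `dyn_cast<CXXRecordDecl>`) are all made up for
illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace forwarding_sketch {

class CXXRecordDecl;

// Lightweight node: used for C structs and for "C-like" structs in C++.
class RecordDecl {
public:
  enum DeclKind { Record, CXXRecord };

  explicit RecordDecl(std::string Name, DeclKind K = Record)
      : Name(std::move(Name)), Kind(K) {}

  DeclKind getKind() const { return Kind; }
  const std::string &getName() const { return Name; }

  // Clients call this on any RecordDecl; defined below, once
  // CXXRecordDecl is a complete type.
  unsigned getNumBases() const;

private:
  std::string Name;
  DeclKind Kind;
};

// Heavier node: carries the C++-only data (here, just base classes).
class CXXRecordDecl : public RecordDecl {
public:
  explicit CXXRecordDecl(std::string Name)
      : RecordDecl(std::move(Name), CXXRecord) {}

  void addBase(const RecordDecl *Base) { Bases.push_back(Base); }
  unsigned getNumBases() const {
    return static_cast<unsigned>(Bases.size());
  }

private:
  std::vector<const RecordDecl *> Bases;
};

// Hypothetical stand-in for dyn_cast<CXXRecordDecl>: checks the kind
// tag and returns null for plain records.
inline const CXXRecordDecl *asCXXRecord(const RecordDecl *D) {
  return D->getKind() == RecordDecl::CXXRecord
             ? static_cast<const CXXRecordDecl *>(D)
             : nullptr;
}

// The forwarding accessor: plain records report zero bases, C++
// records forward to the derived class.
inline unsigned RecordDecl::getNumBases() const {
  if (const CXXRecordDecl *CXX = asCXXRecord(this))
    return CXX->getNumBases();
  return 0;
}

} // namespace forwarding_sketch
```

A client holding only a `const RecordDecl *` gets 0 for a plain record
and the real count for a C++ record, without ever naming CXXRecordDecl.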

>  (2) We're not making the best use of our memory in C++: as the
> comment above says, we really want to use the smaller RecordDecl for
> simple classes in C++, but we don't know whether we'll have a simple
> class or not until we've parsed it. So our approach to reducing memory
> usage in C actually increases memory usage in C++. [*]

This is true, and I don't know if there is an answer for things like  
"struct foo;" in C++.  However, a common case for C++ is that code  
#includes a lot of C code, and if there is a full definition for the  
body, we could make it use RecordDecl.

> So, here is my suggestion: instead of making a distinction between
> what the two languages support at the AST level, use a single AST node
> (RecordDecl) that has a pointer to an optionally-allocated "extras"
> data structure containing that extra information when we need to store
> it. For RecordDecl, the "extras" structure should definitely have
> information about C++ classes that isn't needed in C and isn't used by
> the majority of C++ classes. For example, base classes, friends,
> user-defined constructors/destructor/copy-assignment operators,  and
> user-defined conversion operators.

That would also be ok I guess, if you don't think the approach  
outlined above will work.
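
For comparison, my reading of the "extras" layout is roughly the
following.  This is only a sketch under my own assumptions, not the
attached patch: the `extras_sketch` namespace, `addBase`, and
`hasExtras` are invented here, and the real RecordDeclExtras would hold
more than base classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace extras_sketch {

class RecordDecl;

// C++-only data, allocated only for records that need it.
struct RecordDeclExtras {
  std::vector<const RecordDecl *> Bases;
};

// Single node type for both C and C++ records.
class RecordDecl {
public:
  explicit RecordDecl(std::string Name) : Name(std::move(Name)) {}

  const std::string &getName() const { return Name; }

  // Allocates the extras block on first use, so simple structs pay
  // only for one null pointer.
  void addBase(const RecordDecl *Base) {
    if (!Extras)
      Extras.reset(new RecordDeclExtras());
    Extras->Bases.push_back(Base);
  }

  unsigned getNumBases() const {
    return Extras ? static_cast<unsigned>(Extras->Bases.size()) : 0;
  }

  bool hasExtras() const { return Extras != nullptr; }

private:
  std::string Name;
  std::unique_ptr<RecordDeclExtras> Extras;
};

} // namespace extras_sketch
```

The tradeoff against the forwarding approach is one extra pointer in
every record versus a second node class; either way clients see a single
`getNumBases()` entry point.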

> I've attached a patch that implements my suggestion. Unless I hear
> screams of protest, I'll be committing it shortly. Essentially, it
> moves the functionality of CXXRecordDecl into RecordDecl (eliminating
> CXXRecord-everything from Clang), and puts the base class information
> into RecordDeclExtras. With this change, structs in C take a little
> bit more memory---since they are now a DeclContext---but simple
> structs (= no base classes) in C++ take less memory. RecordDecl isn't
> expected to get any bigger after this, but RecordDeclExtras will grow
> as we implement support for more C++ features.

Does the approach described above make any sense?  I do think the  
CXXRecordDecl vs RecordDecl approach can work and be elegant :).  Do  
you see a problem with it?

-Chris