[cfe-dev] Normalizing the AST across languages

Doug Gregor doug.gregor at gmail.com
Thu Oct 30 09:45:43 PDT 2008


There are a few places in Clang where we build different AST nodes
depending on which language we are parsing. I believe we should seek
to eliminate those differences, so that clients only need to deal with
one AST node per kind of entity.

The most obvious example of what I'm talking about is the difference
between RecordDecl and CXXRecordDecl. RecordDecl is used when we're
parsing C, while CXXRecordDecl is used when we're parsing C++. Here's
the chunk of code from Sema::ActOnTag that handles the allocation:

    if (getLangOptions().CPlusPlus)
      // FIXME: Look for a way to use RecordDecl for simple structs.
      New = CXXRecordDecl::Create(Context, Kind, CurContext, Loc, Name);
    else
      New = RecordDecl::Create(Context, Kind, CurContext, Loc, Name);

The intent of CXXRecordDecl is clear: since C++ requires us to keep
additional information about classes in the AST (which isn't needed in
C), all that extra information goes into CXXRecordDecl so that we
don't bloat the C compilation with unused data.  This means that
compiling a C program as C++ uses different ASTs and requires more
memory. I don't think that's a desirable outcome, for several reasons:

  (1) It's harder for AST clients to juggle two different kinds of AST
nodes that represent the same thing. (And it'll get really hard if we
start trying to deal with ASTs for multiple translation units, but
that's way off in the future.)

  (2) We're not making the best use of our memory in C++: as the
comment above says, we really want to use the smaller RecordDecl for
simple classes in C++, but we don't know whether we'll have a simple
class or not until we've parsed it. So our approach to reducing memory
usage in C actually increases memory usage in C++. [*]

So, here is my suggestion: instead of making a distinction between
what the two languages support at the AST level, use a single AST node
(RecordDecl) that has a pointer to an optionally-allocated "extras"
data structure containing that extra information when we need to store
it. For RecordDecl, the "extras" structure should definitely have
information about C++ classes that isn't needed in C and isn't used by
the majority of C++ classes: for example, base classes, friends,
user-defined constructors, destructors, and copy-assignment operators,
and user-defined conversion operators.
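
To make the shape of that suggestion concrete, here is a rough,
self-contained sketch of the "optionally-allocated extras" idea. It is
not the code from the patch: the member names (hasExtras,
getOrCreateExtras, addBase, getNumBases) are hypothetical, the classes
are simplified stand-ins rather than Clang's real declarations, and it
uses std::unique_ptr purely for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not Clang's actual classes.
struct RecordDecl;

// C++-only data, allocated only when a record first needs it.
struct RecordDeclExtras {
  std::vector<const RecordDecl *> Bases; // base classes
  // Friends, user-defined constructors, conversion operators, etc.
  // would be added here as support for them is implemented.
};

struct RecordDecl {
  explicit RecordDecl(std::string N) : Name(std::move(N)) {}

  // True once any C++-only information has been recorded.
  bool hasExtras() const { return Extras != nullptr; }

  // Allocate the extras block lazily, on first use.
  RecordDeclExtras &getOrCreateExtras() {
    if (!Extras)
      Extras.reset(new RecordDeclExtras());
    return *Extras;
  }

  // Recording a base class is what forces the extras to exist.
  void addBase(const RecordDecl *Base) {
    getOrCreateExtras().Bases.push_back(Base);
  }

  // Clients ask the same question of every record, C or C++.
  unsigned getNumBases() const {
    return Extras ? static_cast<unsigned>(Extras->Bases.size()) : 0;
  }

  std::string Name;

private:
  std::unique_ptr<RecordDeclExtras> Extras; // null for simple records
};
```

In this sketch a simple struct pays only for one null pointer, and a
client never has to check which language produced the node: it just
asks the RecordDecl and gets a sensible answer (zero bases) when no
extras were ever allocated.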

I've attached a patch that implements my suggestion. Unless I hear
screams of protest, I'll be committing it shortly. Essentially, it
moves the functionality of CXXRecordDecl into RecordDecl (eliminating
CXXRecord-everything from Clang), and puts the base class information
into RecordDeclExtras. With this change, structs in C take a little
bit more memory---since they are now a DeclContext---but simple
structs (= no base classes) in C++ take less memory. RecordDecl isn't
expected to get any bigger after this, but RecordDeclExtras will grow
as we implement support for more C++ features.

CXXFieldDecl is likely to get the same treatment, and I think we need
to decide whether CXXClassMemberWrapper adds more confusion than it
eliminates.

Comments?

  - Doug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: remove-cxx-record-decl.patch
Type: text/x-patch
Size: 35844 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081030/3107050e/attachment.bin>

