[cfe-dev] Normalizing the AST across languages

Thu Oct 30 12:52:15 PDT 2008

On Thu, Oct 30, 2008 at 1:48 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Oct 30, 2008, at 9:45 AM, Doug Gregor wrote:
>
>> There are a few places in Clang where we build different AST nodes
>> depending on which language we are parsing. I believe we should seek
>> to eliminate those differences, so that clients only need to deal with
>> one AST node per kind of entity.
>>
>> The most obvious example of what I'm talking about is the difference
>> between RecordDecl and CXXRecordDecl. RecordDecl is used when we're
>> parsing C, while CXXRecordDecl is used when we're parsing C++. Here's
>> the chunk of code from Sema::ActOnTag that handles the allocation:
>>
>>   if (getLangOptions().CPlusPlus)
>>     // FIXME: Look for a way to use RecordDecl for simple structs.
>>     New = CXXRecordDecl::Create(Context, Kind, CurContext, Loc, Name);
>>   else
>>     New = RecordDecl::Create(Context, Kind, CurContext, Loc, Name);
>>
>> The intent of CXXRecordDecl is clear: since C++ requires us to keep
>> additional information about classes in the AST (which isn't needed in
>> C), all that extra information goes into CXXRecordDecl so that we
>> don't bloat the C compilation with unused data.  This means that
>> compiling a C program as C++ uses different ASTs and requires more
>> memory.
>
> Right, this is something I asked Argiris to do.  The intent was for "C like"
> struct definitions in C++ to use the lighter weight RecordDecl when possible
> (which is what the fixme is about).

I don't see how it can work. The problem is that we need to allocate
the RecordDecl as soon as we see the "struct Blah". After we've done
that, we can't go back and make it into a CXXRecordDecl if we then
find out that it has friend declarations.

>> I don't think that's a desirable outcome, for several reasons:
>>
>>  (1) It's harder for AST clients to juggle two different kinds of AST
>> nodes that represent the same thing. (And it'll get really hard if we
>> start trying to deal with ASTs for multiple translation units, but
>> that's way off in the future)
>
> I'm not sure what you mean.  Consider base classes.  The intent here was for
> CXXRecordDecl to remain the same, but for RecordDecl to get something like
> this:
>
>  unsigned getNumBases() const {
>    if (const CXXRecordDecl *CXX = dyn_cast<CXXRecordDecl>(this))
>      return CXX->getNumBases();
>    return 0;
>  }
>
> This means that all *clients* should be able to use RecordDecl and never
> have to poke at CXXRecordDecl unless they want to.  With this approach,
> CXXRecordDecl is just a hidden implementation detail.

Ah, interesting. I did not realize that the intent was to make
CXXRecordDecl an implementation detail when I added  the base-class
accessors into it. It would certainly make me happier if CXXRecordDecl
was fully an implementation detail, and didn't show up at all in Sema
except when we're calling CXXRecordDecl::Create.

> Does the approach described above make any sense?  I do think the
> CXXRecordDecl vs RecordDecl approach can work and be elegant :).  Do you see
> a problem with it?

Let's see if the problem I mentioned above---that we can't go back and
turn a RecordDecl into a CXXRecordDecl if we find a friend function
inside the class definition---can be resolved. If it can, we can argue
the relative elegance of the solutions.

  - Doug