[cfe-dev] PATCH: Cleanup function redeclaration representations

Sun May 4 11:42:15 PDT 2008

On Sun, May 4, 2008 at 2:59 AM, Chris Lattner <clattner at apple.com> wrote:
> On May 1, 2008, at 6:57 PM, Doug Gregor wrote:
>
> > Chris - there's a patch in here that you probably want to take a look at.
> >
>  It is also nice to be able to capture (as Argiris says) information that is
> reasonably close to the way the user wrote it.  This would argue for
> representing:
>
>  void foo(int x, int y = 4);
>  void foo(int x = 12, int y);
>
>  ... as two functiondecls, each of which that have *one* default argument
> specified.

It might be more accurate to represent the second "foo" as having a
default argument on "y", with a way to tell that the default argument
was actually merged from another declaration. Otherwise, the second
"foo" in isolation is an ill-formed function declaration.

> > I went a slightly different way... both #1 and #2 get the merged args
> > (so each shows the complete state at that time), but we use
> > CXXDefauiltArgExpr for default arguments that were the result of
> > merging default arguments. That way, we always know exactly which
> > parameter (in which declaration) the default argument originally came
> > from.
> >
>
>  Ok, so this is effectively handle the above as:
>
>  void foo(int x, int y = 4);
>  void foo(int x = 12, int y = <defaultargexpr 4>);
>
>  Where <defaultargexpr 4> points to the '4' in the previous foo prototype?

<defaultargexpr y> would be a little more accurate, where y refers to
the "y" parameter of the first declaration of "foo".

> This way you can distinguish between arguments that are explicitly specified
> from those that were inherited from previous declarations?  How does this
> handle cases like this:
>
>  void foo()
>  void foo(int x);
>  void foo()
>
>  case?

It's keeping these foo's as exactly as described in the source, which
isn't quite accurate. The last declaration should certainly know that
we do have a prototype for "foo".

> >  1) The swapping behavior is gone; instead, we merge into the
> > redeclaration and into the original declaration.
> >
>
>  I'm concerned about this.  It violates the sanctity of the original
> declaration :)

Yep, that's the trade-off in that patch.

>  void foo(int x, int y = 4);
>  void foo(int x = 12, int y);
>
>  We want the clients to be able to see those two decls, but also one
> aggregate decl of:
>  void foo(int x = 12, int y = 4);
>
>  This saves each client from having to walk the entire list of
> redeclarations to pull together all the default arguments.

Okay.

>  Lets assume that functions are usually not redefined, and when they are
> that the redefinitions are often exactly the same (plus perhaps a body).
> Given this, maybe a spin on the 'canonical type' idea would work.

Even if the definition is the same as the original declaration, the
source locations for the parameters, function name, and so on will be
different... so we'll want a second FunctionDecl.I'm guessing that the
majority of (non-template) functions will have 2 FunctionDecl nodes
associated with them.

>  Ignoring efficiency, imagine if the first time we parsed a decl that we
> actually created two versions of the decl: one version (the parsed version)
> that represents the decl as written, and a second version (the aggregate
> version) that is updated as other redecls are parsed.  After parsing the
> decl the first time, the two decls are exactly the same:
>
>  void foo(int x, int y = 4);      void foo(int x, int y = 4);
>
>
>
>  When parsing the redeclaration, we create another 'parse decl' for the
> redeclaration:
>
>  void foo(int x, int y = 4);      void foo(int x, int y = 4);
>  void foo(int x = 12, int y);
>
>  Then we call mergedecl, and it updates the aggregate version to contain all
> the goop [this is a technical term] merged together for the two
> declarations:
>
>  void foo(int x, int y = 4);
>  void foo(int x = 12, int y);     void foo(int x = 12, int y = 4);
>
>  If we later see a definition (e.g. "void foo(int x, int y) {}"), we parse
> it, but add the actual body to the aggregate version:
>
>  void foo(int x, int y = 4);
>  void foo(int x = 12, int y);
>  void foo(int x, int y)<defn>;    void foo(int x = 12, int y = 4) {}
>
>  Instead of adding the body to the "parsed version" of the proto, we add it
> to the aggregate version, and use a flag on the 'parsed version' to say that
> it was responsible for providing the definition or not.
>
>  Again, if we ignore efficiency, I think this provides an interesting sweet
> spot: all of these [re]decls would be linked together, so clients could walk
> the list.  Also, parsing a redefinition does not cause the previous
> definition to change.  Clients who care about the aggregate semantics of a
> decl could just always look at the 'aggregate version' of the decl (similar
> to the 'canonical' version of a type), which would always have the union of
> information known about the decl.  This is also reasonably easy to
> implement: merge decls just handles the construction and updating of the
> aggregate version, and it has all the information it could ever want to
> provide really accurate diagnostics.

My gut tells me that most clients that are doing any kind of analysis
will care only about the "aggregate" version of the decl.

>  1: foo [nextredecl=1, isaggregate=true]    void foo(int x, int y);
>
>  Scope[foo] = 1   // This is the scope chain for foo
>
>
>  III. The next time MergeDecls is called, there are two possibilities: first
> if the redecl is exactly the same (no information is added) then existing
> redecl is set as the next pointer and isaggregate is set to false:
>
>  void foo(int x, int y);  // #2
>
>  1: foo [nextredecl=1, isaggregate=true]    void foo(int x, int y);
>  2: foo [nextredecl=1, isaggregate=false]   void foo(int x, int y);
>
>
>  Scope[foo] = 2

Like Argiris, I don't think we should be adding redeclarations to the
scope chain. The only time the scope chain should contain multiple
declarations of the same name is if those declarations actually refer
to different entities.

>  I think this is a fairly reasonable sweet spot, what do you guys think?

With Argiris' tweak, I almost like it. I'm concerned about the
programmability of this system, where the FunctionDecl node that is
found by name lookup is not the 'aggregate' node. As with the
canonical type system, clients will have to be very careful to always
map into the 'aggregate' node before querying any properties. With the
canonical type system, we need to do the mapping because we need to
preserve typedefs in the AST. With declarations, however, the
complexity isn't coming directly from the language... it's coming from
the representation. It will take a lot of discipline to use
FunctionDecls properly if name lookup doesn't find the 'aggregate'
FunctionDecl. (Among other things, we'll have to audit the whole Clang
code-base to see where we need to add GetAggregateDecl calls. Errors
here are likely to be subtle.).

It occurs to me that using FunctionDecl for each of the redeclarations
isn't even as efficient as we could be. For example, the
redeclarations don't need to have scope or name information, because
we know they're the same as the aggregate FunctionDecl, nor do they
need information about the body of the function. On the other hand,
they do need some additional flags, such as "am I the definition?" One
could imagine that each function only has a single FunctionDecl, which
represents the aggregate declaration. That FunctionDecl contains a
SmallVector<FunctionRedecl*, 2>, where each FunctionRedecl contains
minimal information about a redeclaration... the exact types used in
the parameter-type-list, the attributes, whether it was the
definition, etc. All of semantic analysis sees just that one
FunctionDecl, and those clients interested in dealing with
redeclarations can walk that redeclaration list; One could also
consider adding a callback that is invoked on each redeclaration.

Time to catch a flight...

  - Doug