[cfe-dev] PATCH: Cleanup function redeclaration representations

Sun May 4 01:42:22 PDT 2008

Hi Chris,

I really like the idea of keeping the decls clean and having a separate 
'aggregate' decl. I have two suggestions for consideration:

--The 'isaggregate' bool is not needed if we assume that, when we walk 
the list of redecls, the last one is the 'aggregate' one.
Here's how it would look (using the 5 cases you describe):

II.
void foo(int x, int y);  // #1

1: foo [nextredecl=null]    void foo(int x, int y);

Last one = #1 = aggregate

III.
void foo(int x, int y);  // #2

1: foo [nextredecl=null]    void foo(int x, int y);
2: foo [nextredecl=1]   void foo(int x, int y);

Last one = #1 = aggregate

IV.
void foo(int x, int y = 0);  // #3

A: foo [nextredecl=null]   void foo(int x, int y = 0);
1: foo [nextredecl=A]   void foo(int x, int y);
2: foo [nextredecl=1]  void foo(int x, int y);
3: foo [nextredecl=2]  void foo(int x, int y = 0);

Last one = #A = aggregate

V.
void foo(int x = 12, int y)  // #4

A: foo [nextredecl=null]  void foo(int x = 12, int y = 0);
1: foo [nextredecl=A]  void foo(int x, int y);
2: foo [nextredecl=1] void foo(int x, int y);
3: foo [nextredecl=2] void foo(int x, int y = 0);
4: foo [nextredecl=3] void foo(int x = 12, int y);

Last one = #A = aggregate

--Another suggestion is to not add to the scope chain the redeclarations 
(at all 5 cases, Scope[foo] = 1), thus all calls to 'foo' will refer to 
the same FunctionDecl node.
One way to go about this would be to have a double linked list of 
redeclarations, and the (V) case would be:

A: foo [prevdecl=null, nextdecl=null]  void foo(int x = 12, int y = 0);  
// does the 'aggregate' need to point to #1 as nextdecl ?
1: foo [prevdecl=A, nextdecl=2]  void foo(int x, int y);  // this is 
what all calls to 'foo' refer to
2: foo [prevdecl=1, nextdecl=3] void foo(int x, int y);
3: foo [prevdecl=2, nextdecl=4] void foo(int x, int y = 0);
4: foo [prevdecl=3, nextdecl=null] void foo(int x = 12, int y);

following prevdecl, last one = #A = aggregate

We could use two DenseMaps to store the redeclaration links, similar to 
the way Doug is using one in his patch, thus not bloating ScopeDecls.
(just to be pedantic again, on the above case, if #A doesn't link to #1 
as nextdecl, we save a link :) )

-Argiris

Chris Lattner wrote:
> On May 1, 2008, at 6:57 PM, Doug Gregor wrote:
>> Chris - there's a patch in here that you probably want to take a look 
>> at.
>
> Reading the backthread, it is a very interesting and nasty topic.  I 
> strongly agree with the meta points that Argiris is making:
>
> It is bad for clients if parsing new decls causes old decls to change 
> (e.g. due to swapping).  Looking forward, this will be a significant 
> hindrance if we want to do things like incremental reparsing.  It also 
> breaks a number of useful invariants for clients: the codegen 
> assertion in the testsuite is basically due to the argument list for a 
> functiondecl changing between the first time the client (codegen) sees 
> a decl and the second time it sees it [the first time it sees it,  the 
> function is "void()" the second time it is "void(int)"].  Decls 
> changing is surprising and weird. :)
>
> It is also nice to be able to capture (as Argiris says) information 
> that is reasonably close to the way the user wrote it.  This would 
> argue for representing:
>
> void foo(int x, int y = 4);
> void foo(int x = 12, int y);
>
> ... as two functiondecls, each of which that have *one* default 
> argument specified.
>
>> On Fri, Apr 25, 2008 at 4:42 AM, Argiris Kirtzidis 
>> <akyrtzi at gmail.com> wrote:
>>> Doug Gregor wrote:
>>> How about
>>>
>>> (1) -#2 is checked for error diagnostics by considering default args 
>>> from
>>> both #1 and #2
>>>
>>>
>>> And in the end, only #1 gets the merged args. This is the same as 
>>> only #2
>>> gets the merged args.
>>
>> I went a slightly different way... both #1 and #2 get the merged args
>> (so each shows the complete state at that time), but we use
>> CXXDefauiltArgExpr for default arguments that were the result of
>> merging default arguments. That way, we always know exactly which
>> parameter (in which declaration) the default argument originally came
>> from.
>
> Ok, so this is effectively handle the above as:
>
> void foo(int x, int y = 4);
> void foo(int x = 12, int y = <defaultargexpr 4>);
>
> Where <defaultargexpr 4> points to the '4' in the previous foo 
> prototype?  This way you can distinguish between arguments that are 
> explicitly specified from those that were inherited from previous 
> declarations?  How does this handle cases like this:
>
> void foo()
> void foo(int x);
> void foo()
>
> case?
>
>>> Actually, I'm thinking that maybe there could be a common convention 
>>> that,
>>> not only functions, but vars and namespaces should follow.
>>> An extended namespace definition could be regarded as a redeclaration.
>>
>> Yes, agreed. I've moved addRedeclaration (and its brethren) into
>> ScopedDecl, so the same approach can apply to extern variables,
>> functions, namespaces, classes, etc.
>
> Nice.  I think it is very important to pick a model and be consistent.
>
>>> I agree that it is philosophical, thus I don't really care much 
>>> about the
>>> name pointing to the first or the last decl.
>>> What bothers me is the "swapping content" bit. It seems to me that 
>>> feeding
>>> to the consumer the same pointer for, essentially, different
>>> declaration objects is a source of complexity and possible subtle bugs,
>>> without much benefit.
>>
>> I was never quite thrilled with the swapping bit, either.
>
> Unfortunately, I did a minor cleanup of the swapping code before 
> reading the whole thread.  Maybe that code can just vanish :)
>
>>> I hope I'm not coming off as too pedantic :)
>>
>> It's a C++ front end, so it pays to be pedantic.
>
> :)
>
>> Anyway, you've convinced me. The attached patch makes a few changes to
>> the way redeclarations are handled:
>>
>>  1) The swapping behavior is gone; instead, we merge into the
>> redeclaration and into the original declaration.
>
> I'm concerned about this.  It violates the sanctity of the original 
> declaration :)
>
>>  2) The addRedeclaration/getNextRedeclaration function has moved into
>> ScopedDecl, so it can be used for other kinds of declarations
>>  3) We use a DenseMap to store the redeclaration links, since so few
>> actual ScopedDecls will have redeclarations
>>  4) We keep track of where merged default arguments came from, so
>> that we can provide better diagnostics for them.
>
> I like this approach much better than the 'swapping' approach.  
> However, it still has a some funny effects.
>
> Here is a crazy idea which may be completely impractical :).  It seems 
> that we have two different desires here: 1) represent declarations as 
> the user wrote them so that clients can have a simple model where 
> decls are just streaming by.  Some of these clients also want the 
> ability to walk the list of redefinitions.  2) provide sema (and some 
> other clients) an 'aggregate' view of the decl where all the info is 
> merged together into one place.  In the previous example:
>
> void foo(int x, int y = 4);
> void foo(int x = 12, int y);
>
> We want the clients to be able to see those two decls, but also one 
> aggregate decl of:
> void foo(int x = 12, int y = 4);
>
> This saves each client from having to walk the entire list of 
> redeclarations to pull together all the default arguments.
>
>
> Lets assume that functions are usually not redefined, and when they 
> are that the redefinitions are often exactly the same (plus perhaps a 
> body).  Given this, maybe a spin on the 'canonical type' idea would work.
>
> Ignoring efficiency, imagine if the first time we parsed a decl that 
> we actually created two versions of the decl: one version (the parsed 
> version) that represents the decl as written, and a second version 
> (the aggregate version) that is updated as other redecls are parsed.  
> After parsing the decl the first time, the two decls are exactly the 
> same:
>
> void foo(int x, int y = 4);      void foo(int x, int y = 4);
>
>
>
> When parsing the redeclaration, we create another 'parse decl' for the 
> redeclaration:
>
> void foo(int x, int y = 4);      void foo(int x, int y = 4);
> void foo(int x = 12, int y);
>
> Then we call mergedecl, and it updates the aggregate version to 
> contain all the goop [this is a technical term] merged together for 
> the two declarations:
>
> void foo(int x, int y = 4);
> void foo(int x = 12, int y);     void foo(int x = 12, int y = 4);
>
> If we later see a definition (e.g. "void foo(int x, int y) {}"), we 
> parse it, but add the actual body to the aggregate version:
>
> void foo(int x, int y = 4);
> void foo(int x = 12, int y);
> void foo(int x, int y)<defn>;    void foo(int x = 12, int y = 4) {}
>
> Instead of adding the body to the "parsed version" of the proto, we 
> add it to the aggregate version, and use a flag on the 'parsed 
> version' to say that it was responsible for providing the definition 
> or not.
>
> Again, if we ignore efficiency, I think this provides an interesting 
> sweet spot: all of these [re]decls would be linked together, so 
> clients could walk the list.  Also, parsing a redefinition does not 
> cause the previous definition to change.  Clients who care about the 
> aggregate semantics of a decl could just always look at the 'aggregate 
> version' of the decl (similar to the 'canonical' version of a type), 
> which would always have the union of information known about the 
> decl.  This is also reasonably easy to implement: merge decls just 
> handles the construction and updating of the aggregate version, and it 
> has all the information it could ever want to provide really accurate 
> diagnostics.
>
> To me, the only bad thing about this approach is that it is 
> (incredibly!) memory inefficient.  Allocating twice the number of 
> decls in the common case (where there is only one decls for each 
> object) is badness.  However, this case is also the trivial case where 
> the aggregate and the parsed decl are exactly the same. :)  As such, I 
> propose that we add two fields to ScopeDecl (?): a 'next 
> redeclaration' pointer and 'is aggregate redecl' bool.
>
> It would actually be implemented as:
>
> I. Sema handles the ActOnFooDecl Action by creating a new decl object, 
> verifying that it is self consistent, and then calling mergedecls 
> unconditionally.
>
> II. Mergedecls gets the decl, sees that it is the first declaration 
> for the object.  Since the 'parsed' and 'aggregate' declaration is the 
> same, set 'nextredecl' pointer to point to itself, and set 
> 'isaggregate' to true.  This saves allocation of a second decl in the 
> common case.  For example, imagine we just parsed:
>
> void foo(int x, int y);  // #1
>
> we now get a "foo" decl, which I'll write schematically as (numbering 
> redeclarations, really the numbers would be pointers of course):
>
> 1: foo [nextredecl=1, isaggregate=true]    void foo(int x, int y);
>
> Scope[foo] = 1   // This is the scope chain for foo
>
>
> III. The next time MergeDecls is called, there are two possibilities: 
> first if the redecl is exactly the same (no information is added) then 
> existing redecl is set as the next pointer and isaggregate is set to 
> false:
>
> void foo(int x, int y);  // #2
>
> 1: foo [nextredecl=1, isaggregate=true]    void foo(int x, int y);
> 2: foo [nextredecl=1, isaggregate=false]   void foo(int x, int y);
>
>
> Scope[foo] = 2
>
> At this point, the scope object says that "#2" is the last 
> redeclaration of 'foo'. From that, we can walk the 'nextredecl' list 
> to see #1.  When we get to it, we see that the next link is the 
> aggregate version, so clients can decide whether they want to get the 
> aggregate version, or just stop if they want to see decls 
> corresponding to what the user wrote.
>
>
> IV. When mergedecls is called and there *is* new information, two 
> things can happen.  So far, there is not an explicit 'aggregate' 
> version.  As such, we now create a new aggregate version of it ("A"), 
> and update things:
>
> void foo(int x, int y = 0);  // #3
>
> A: foo [nextredecl=A, isaggregate=true]   void foo(int x, int y = 0);
> 1: foo [nextredecl=A, isaggregate=true]   void foo(int x, int y);
> 2: foo [nextredecl=1, isaggregate=false]  void foo(int x, int y);
> 3: foo [nextredecl=2, isaggregate=false]  void foo(int x, int y = 0);
>
> Scope[foo] = 3
>
>
> At this point, users can walk the redefn list and see all the 'parsed' 
> definitions 1/2/3 and other clients can also get the full aggregate 
> version (A).  We can distinguish between "recycled" aggregate versions 
> and explicit aggregate versions because 'nextredecl' points to 'this' 
> in the reuse case.
>
>
> V. The second case when adding information is that the explicit 
> aggregate version already exists.  In that case, we just update (A):
>
> void foo(int x = 12, int y)  // #4
>
> A: foo [nextredecl=A, isaggregate=true]  void foo(int x = 12, int y = 0);
> 1: foo [nextredecl=A, isaggregate=true]  void foo(int x, int y);
> 2: foo [nextredecl=1, isaggregate=false] void foo(int x, int y);
> 3: foo [nextredecl=2, isaggregate=false] void foo(int x, int y = 0);
> 4: foo [nextredecl=3, isaggregate=false] void foo(int x = 12, int y);
>
> Scope[foo] = 4
>
> etc.  Of course, the bool and pointer can be swizzeled together so the 
> bool goes in the low bit of the pointer.
>
> I think this is a fairly reasonable sweet spot, what do you guys think?
>
> -Chris
>