[cfe-dev] Decls are not synonyms for the symbols they represent

steve naroff snaroff at apple.com
Wed Sep 17 08:09:33 PDT 2008


Comments below...

On Sep 16, 2008, at 5:41 PM, Ted Kremenek wrote:

> A few weeks ago I had a conversation with Daniel about the fact that  
> the ASTs (or other clang data structures) have no notion of the  
> "entity" (for lack of a better word) that a declaration represents.
>
> Here are a couple examples of what I mean:
>
> (example 1)
>
>   extern double x;
>   extern double x;
>
> Both of these are variable declarations that reference the same
> variable.  There is no notion of the variable itself apart from its
> declarations; the two are conflated, which is especially awkward
> since we have multiple declarations in this case (i.e., there is no
> unique "entity" for the variable).
>
> (incidentally, clang crashes on this input: http://llvm.org/bugs/show_bug.cgi?id=2760)
>

Fixed.
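To make the distinction concrete, here is a minimal C++ sketch of what a separate "entity" could look like next to the declarations. Symbol, Decl and SymbolTable are hypothetical names invented for illustration (they are not clang classes), and the table ignores scopes and C's separate tag namespace. Each declaration keeps its own identity but points at one shared Symbol.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model; these are NOT clang's classes.
struct Symbol;

struct Decl {
  std::string Name;
  unsigned Line;   // where this declaration appears in the source
  Symbol *Entity;  // the one entity shared by all redeclarations
};

struct Symbol {
  std::string Name;
  std::vector<Decl *> Decls;  // every declaration of the name, in source order
};

// Hypothetical file-scope table: one Symbol per name, one Decl per declaration.
class SymbolTable {
  std::map<std::string, std::unique_ptr<Symbol>> Symbols;
  std::vector<std::unique_ptr<Decl>> AllDecls;

public:
  // Record a declaration of Name, creating the entity the first time.
  Decl *declare(const std::string &Name, unsigned Line) {
    assert(!Name.empty() && "declarations need a name");
    std::unique_ptr<Symbol> &S = Symbols[Name];
    if (!S)
      S = std::make_unique<Symbol>(Symbol{Name, {}});
    AllDecls.push_back(std::make_unique<Decl>(Decl{Name, Line, S.get()}));
    Decl *D = AllDecls.back().get();
    S->Decls.push_back(D);
    return D;
  }
};
```

With this shape, the two `extern double x;` lines in example 1 become two distinct Decl objects whose Entity pointers compare equal.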

>
> (example 2)
>
>   struct s;
>   struct s { int a; };
>   struct s;
>
> Until a few weeks ago, these struct declarations were represented by  
> a single RecordDecl with a unique RecordType.  Now they are  
> represented by three separate RecordDecls with a shared, unique  
> RecordType.
>
> With structures, the unique RecordType indeed can be treated as  
> representing the "struct" itself, which seems fine since the given  
> declarations are just type declarations.  So in this case, we *do*  
> have a unique "entity" in the ASTs to represent what the  
> declarations refer to.  There are still some issues with this  
> representation, but I will delay mentioning them until after the  
> next example.
>

I think your change was a nice improvement:-)

>
> (example 3)
>
> int f();
> int f();
> int f() { return 0; }
>
> int g();
> int g() { return 1; }
>
> For this example, we have separate FunctionDecls for each one of  
> these declarations.  In this example, all of the declarations of both
> 'f' and 'g' share the same type (note that this is different from
> the case with structs).  For the case of 'f', all of its
> FunctionDecls are chained together, and the same goes for 'g'.
> There is, however, no notion of an entity or concept in the ASTs or
> other clang data structures that represents 'f' itself.
>

I don't really understand what you mean by "f itself". In the example  
above, we have two identical function declarations and one function  
(for "f"). This accurately reflects the source code (which is our  
goal). I could imagine higher level convenience functions that might  
be useful for some clients, however I think the AST is fundamentally  
correct in this instance.

> Here is an example of why not having an explicit concept for 'f',  
> 'g', or any symbol is problematic.
>
> Consider:
>
>   extern int h(int* x) __attribute__((nonnull));
>   extern int h(int *x);
>   extern int h(int* x) __attribute__((noreturn));
>
> This code is completely valid.  In the ASTs we create three  
> FunctionDecls, the first having the attribute "nonnull" attached to  
> it (and object of type NonNullAttr) and the third having the  
> attribute "noreturn" attached to it (an object of type NoReturnAttr).
>
> Suppose I had a client (e.g., code generation, static analysis) that  
> wanted to know all the attributes attached to a given function.  How  
> would I go about doing this?  Given one of these FunctionDecls, I  
> would have to iterate over the chain of FunctionDecls and query the
> attributes of each one.  This seems a little cumbersome, and forces
> separate clients to implement their own logic for querying
> information about "symbols" in a translation unit.  It also forces
> clients to think about internal representations such as the fact
> that FunctionDecls are chained, something we may wish to change at  
> any moment in the future.
>
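A sketch of the per-client logic described above, assuming a hypothetical FnDecl that stores only the attributes written on it plus a pointer to the previous declaration (a simplification for illustration, not clang's FunctionDecl API):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a FunctionDecl: each redeclaration carries only
// the attributes written on it, plus a link to the previous declaration.
struct FnDecl {
  std::string Name;
  std::vector<std::string> Attrs;  // e.g. "nonnull", "noreturn"
  const FnDecl *Prev = nullptr;    // previous declaration of the same function
};

// Union of the attributes over the whole chain.  The walk only goes
// backwards, so the caller must start from the most recent declaration
// to see every attribute.
std::set<std::string> collectAttrs(const FnDecl *MostRecent) {
  assert(MostRecent && "need a declaration to start from");
  std::set<std::string> Result;
  for (const FnDecl *D = MostRecent; D; D = D->Prev)
    Result.insert(D->Attrs.begin(), D->Attrs.end());
  return Result;
}
```

Because this chain only points backwards, starting from the first declaration of h would miss noreturn, which is one more detail every client would have to get right on its own.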

As far as the ASTs go, I really don't see the hardship here. The fact
that the FunctionDecls are chained accurately reflects the source  
code...doesn't it? For me, the problem with the chain is memory  
efficiency (more than convenience). In C, it is fairly uncommon to  
have more than one function decl for the same name (yet every  
FunctionDecl has a chain!). Nevertheless, we already have some bloat  
in FunctionDecl...every prototype has a Body slot (ouch). Clearly room  
for improvement here.

A related issue which I consider more problematic is the lack of *any*  
"chain" for VarDecls. Consider the following code:

  int i4;
  int i4;
  extern int i4;

  const int a[1] = {1};
  extern const int a[];

  extern const int b[];
  const int b[1] = {1};


At the moment, there is no way to get to the previous declaration!  
Since I've already whined about the memory inefficiency for  
FunctionDecls, I certainly wouldn't recommend adding a chain for all
VarDecls!

I remember writing Sema::CheckForFileScopedRedefinitions() where I had
to deal with this. Fortunately, Sema's "IdResolver" came to the rescue  
(thanks Argiris:-). That said, my gut says it might be worth using an  
IdResolver-like mechanism to solve this "navigation problem" for  
*both* VarDecls and FunctionDecls. Architecturally, it would make  
sense for this new API to be part of ASTContext.
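One way the IdResolver-like idea could look, sketched under stated assumptions: VarDecl and RedeclTable below are hypothetical stand-ins (not clang's VarDecl, IdResolver, or ASTContext), and the table only models file-scope names registered in source order. The previous-declaration link lives in a side table rather than in every declaration, so only redeclared names pay for it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a VarDecl; note it has no "previous" field.
struct VarDecl {
  std::string Name;
  unsigned Line;
};

// Hypothetical side table in the spirit of an IdResolver, which could live
// in a context object.  A name declared once costs only its last-seen slot;
// only redeclarations get a PrevOf entry.
class RedeclTable {
  std::unordered_map<std::string, const VarDecl *> LastByName;
  std::unordered_map<const VarDecl *, const VarDecl *> PrevOf;

public:
  // Call once per file-scope declaration, in source order.
  void add(const VarDecl *D) {
    assert(D && "null declaration");
    auto It = LastByName.find(D->Name);
    if (It != LastByName.end())
      PrevOf[D] = It->second;
    LastByName[D->Name] = D;
  }

  // Previous declaration of the same name, or nullptr for the first one.
  const VarDecl *getPreviousDecl(const VarDecl *D) const {
    auto It = PrevOf.find(D);
    return It == PrevOf.end() ? nullptr : It->second;
  }
};
```

For the i4 example, the second and third declarations map back to the first and second respectively, while a name declared only once never gets a PrevOf entry.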

Thoughts?

snaroff

> This email isn't really a proposal of a solution; I'm just raising  
> an issue to see if anyone has any comments.  After the last few  
> weeks I've been excited by our discussions of DeclGroups and  
> TypeSpecifiers that will solve many of the remaining issues with  
> faithfully representing syntax in the ASTs.  At the same time, I  
> think we need to pay a little more attention to the semantics, and  
> providing infrastructure that would be useful for many clients.
>
> Indeed, some of our changes to improve our capturing of syntax have  
> actually weakened some of our clients' reasoning about semantics.
> For example, by splitting separate struct declarations into multiple
> RecordDecls we actually (initially) broke CodeGen because the CodeGen
> library assumed that there was a direct 1-1 mapping between a  
> RecordDecl and the concept it represented.  That particular case was  
> easily resolved by using the RecordType instead of the RecordDecl to  
> represent the 'struct', but I'd be willing to wager that there are  
> other issues that haven't surfaced yet because RecordTypes are being  
> used in this way (by all clients).
>
> Thoughts?
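The RecordType workaround mentioned above amounts to keying client-side caches by the shared type rather than by the individual declaration. A hypothetical sketch: RecordDecl, RecordType and LayoutCache here are simplified stand-ins rather than clang's classes, the field sizes are made up, and the size computation ignores alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins: every `struct s` declaration gets its own
// RecordDecl, but all of them point at one shared RecordType.
struct RecordType {
  std::vector<int> FieldSizes;  // assumed known once the definition is seen
};

struct RecordDecl {
  const RecordType *Type;
  bool IsDefinition;
};

// A client cache (e.g. of struct sizes) keyed by the type, so every
// declaration of the same struct finds the same entry.  Assumes the
// struct is complete before the first query.
class LayoutCache {
  std::map<const RecordType *, int> SizeByType;

public:
  int getSize(const RecordDecl &D) {
    assert(D.Type && "declaration without a type");
    auto It = SizeByType.find(D.Type);
    if (It != SizeByType.end())
      return It->second;
    int Size = 0;
    for (int F : D.Type->FieldSizes)
      Size += F;  // no alignment or padding, for brevity
    SizeByType[D.Type] = Size;
    return Size;
  }

  std::size_t numEntries() const { return SizeByType.size(); }
};
```

Keying the map by RecordDecl instead would have produced three entries for the one struct in example 2, which is the kind of 1-1 assumption that broke CodeGen.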


