[cfe-dev] Qualified and unqualified name lookup and IdResolver

Thu Dec 4 17:30:10 PST 2008

The attached patch changes how we perform both qualified and  
unqualified name lookup in Clang. It represents a significant  
departure from the approach currently implemented in Clang, but I  
believe it will have performance and maintainability benefits in the  
future.

First, an explanation of qualified name lookup and how it works in  
Clang today. Here's an example:

int x; // #1
namespace N {
   int x; // #2
   struct f1 {
     static int x; // #3
   };
}

namespace N {
   struct f2 {
     static int x; // #4
   }
}

int xv = N::x;

The expression N::x involves qualified name lookup, which first looks  
for "x" within the namespace N. The way this currently works is that  
there is an IdentifierInfo "x" which contains as list of all of the  
declarations of "x" in the translation unit, #4 -> #3 -> #2 -> #1. To  
find "x" in N, we search this list, filtering out those x's that  
aren't declared in the context 'N', until we find our answer (#2).

The same approach is used for unqualified lookup. Let's say that we  
add this code at the end of our program above:

namespace N2 {
   void f() {
     int y = x;
   }
}

Here, the unqualified lookup for "x" searches the same chain of  
declarations for x (#4 -> #3 -> #2 -> #1), this time accepting any  
declaration that is in the context of "f" or one of its enclosing  
contexts (N2, translation unit), and eventually finds #1.

The performance concern is that, as translation units get bigger, the  
chain of declarations for common names ("operator=" in C++ is one such  
name) can get large, and we're paying for a linear traversal of this  
list. This will be particularly bad with template metaprograms, where  
many members are named "value" or "type" by convention, and template  
metaprogram evaluation can involve the instantiation of thousands of  
class templates, each with their own "value" or "type" that ends up on  
the corresponding declaration chain.

The maintainability concern is that the construction of the chain of  
declarations is very tied to the act of parsing the translation unit.  
Other ways of getting part of a program into memory--say,  
deserializing an AST from disk---would have to reconstruct these  
chains as if we were parsing. There are also some open questions about  
how one would deal with lookup into base classes: would we need to  
perform IsDerivedFrom checks for each identifier as part of the check  
for "is this in my context?".

I propose---and this patch implements---an alternative solution that  
returns the chain of declarations back to a chain of declarations that  
occur within the lexical scope of the program. For our function N2::f,  
the chain for "x" would contain only #1, which is the only "x" within  
the lexical scope. Essentially, when names get introduced into a scope  
via a declaration, they are added to the declaration chain for their  
name; when that scope is destroyed those names are removed from the  
declaration chain. That way, the declaration chains only contain  
declarations that can actually be found by unqualified name lookup,  
and are no deeper than the nesting of lexical scopes.

For qualified name lookup, each DeclContext has the ability to perform  
a lookup into the declarations it contains. For example, to find N::x,  
we would ask the DeclContext represented by N whether it has a name  
"x" in it. The lookup is performed in an efficient data structure: for  
< 6 declarations in the DeclContext, it's an array; for more  
declarations, it's a hash table.

In this scheme, C++ unqualified name lookup falls back on qualified  
name lookup in some cases. For example, let's add this code to the end  
of the program above:

namespace N { // re-opening namespace!
   void g() {
     int y = x; // needs to find #2
   }
}

Note that "x" was not declared in the *lexical* scope of N::g.  
However, it was declared in the namespace N, so it should be found by  
unqualified name lookup. One approach to solving this problem would be  
to push all of the names in N into scope when we re-open the namespace  
N, but we don't want to do that because opening a namespace would then  
be linear time in the number of declarations in the namespace. Imagine  
pushing all of the declarations for namespace "std" just to put in a  
specialization of iterator_traits!

This patch implements a different approach. Each scope can have an  
associated "entity", which is the DeclContext that defines the scope.  
So, for example, the re-opened scope for "N" has as its entity the  
DeclContext associated with this declaration of N. When we don't see a  
name in our lexical scope, and the lexical scope has an associated  
entity, we perform qualified lookup into that entity. So, since there  
is no "x" in the lexical scope corresponding to N, we look for N::x  
and find the x living in the first definition of N. The DeclContexts  
are linked such that, even though each of the three times we open 'N'  
we have different DeclContexts (each of which contains just the  
declarations present in the source for that namespace definition), for  
name lookup purposes there's only one array/hash table containing all  
of the declarations of 'N'.

This patch is neither complete nor fully-optimized. It should not  
degrade performance in C, because the array/hash table data structures  
are not built in C. However, there are more optimization opportunities:
	- Once we've performed unqualified name lookup for a name, we should  
cache it on the chain of declarations. For example, in N::g, we do  
some work to find N::x. We should add N::x to the declaration chain--- 
but not the DeclContext!---so that subsequent lookups of x are very  
fast (since it'll be at the front of the declaration list for "x").
	- We should be building the array/hash table in the DeclContexts  
lazily, when we need to answer the question "is there an x in N?".  
This would allow us to remove the special-casing for C.
	- We should make RecordDecl derive from DeclContext, and replace the  
current storage and name lookup for fields (RecordDecl::getMember, a  
linear search) with lookup into the DeclContext. This will give us  
faster lookup for records with many members. A similar thing could be  
done for Objective-C interfaces/implementations.
	- When variables are redeclared (e.g., at translation unit scope), we  
should be replacing the current declaration with the previous one,  
rather than pushing both declarations onto the declaration name chain.  
(This is an existing problem that I did not yet fix).

Comments *greatly* appreciated!

	- Doug

-------------- next part --------------
A non-text attachment was scrubbed...
Name: qualified-name-lookup.patch
Type: application/octet-stream
Size: 44863 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081204/bff8560b/attachment.obj>