[cfe-dev] Qualified and unqualified name lookup and IdResolver

Thu Dec 4 19:58:25 PST 2008

Hi Doug,

I like the idea, for unqualified lookup, of declaration chains based on 
lexical scope plus fall back to qualified lookup.

But for qualified lookup, did you consider using the "array+hashtable 
efficient data structure" not for answering "does N contain x ?" (#1) 
but for answering "is x contained in N ?" (#2) ?
(I mean an identifier having that data structure for storing the 
DeclContexts where it appears).
Both cases may have to deal with large number of declarations; #1 may 
deal with namespaces with lots of decls, while #2 may deal with decls 
contained in lots of namespaces.
The difference is that #1 optimizes for namespaces with few declarations 
while #2 optimizes for "differently named" declarations which is more 
common.
Also, #2 will immediately answer for not-declared-before identifiers, 
without requiring any lookup.

-Argiris

Douglas Gregor wrote:
> The attached patch changes how we perform both qualified and 
> unqualified name lookup in Clang. It represents a significant 
> departure from the approach currently implemented in Clang, but I 
> believe it will have performance and maintainability benefits in the 
> future.
>
> First, an explanation of qualified name lookup and how it works in 
> Clang today. Here's an example:
>
> int x; // #1
> namespace N {
>   int x; // #2
>   struct f1 {
>     static int x; // #3
>   };
> }
>
> namespace N {
>   struct f2 {
>     static int x; // #4
>   }
> }
>
> int xv = N::x;
>
>
> The expression N::x involves qualified name lookup, which first looks 
> for "x" within the namespace N. The way this currently works is that 
> there is an IdentifierInfo "x" which contains as list of all of the 
> declarations of "x" in the translation unit, #4 -> #3 -> #2 -> #1. To 
> find "x" in N, we search this list, filtering out those x's that 
> aren't declared in the context 'N', until we find our answer (#2).
>
> The same approach is used for unqualified lookup. Let's say that we 
> add this code at the end of our program above:
>
> namespace N2 {
>   void f() {
>     int y = x;
>   }
> }
>
> Here, the unqualified lookup for "x" searches the same chain of 
> declarations for x (#4 -> #3 -> #2 -> #1), this time accepting any 
> declaration that is in the context of "f" or one of its enclosing 
> contexts (N2, translation unit), and eventually finds #1.
>
> The performance concern is that, as translation units get bigger, the 
> chain of declarations for common names ("operator=" in C++ is one such 
> name) can get large, and we're paying for a linear traversal of this 
> list. This will be particularly bad with template metaprograms, where 
> many members are named "value" or "type" by convention, and template 
> metaprogram evaluation can involve the instantiation of thousands of 
> class templates, each with their own "value" or "type" that ends up on 
> the corresponding declaration chain.
>
> The maintainability concern is that the construction of the chain of 
> declarations is very tied to the act of parsing the translation unit. 
> Other ways of getting part of a program into memory--say, 
> deserializing an AST from disk---would have to reconstruct these 
> chains as if we were parsing. There are also some open questions about 
> how one would deal with lookup into base classes: would we need to 
> perform IsDerivedFrom checks for each identifier as part of the check 
> for "is this in my context?".
>
> I propose---and this patch implements---an alternative solution that 
> returns the chain of declarations back to a chain of declarations that 
> occur within the lexical scope of the program. For our function N2::f, 
> the chain for "x" would contain only #1, which is the only "x" within 
> the lexical scope. Essentially, when names get introduced into a scope 
> via a declaration, they are added to the declaration chain for their 
> name; when that scope is destroyed those names are removed from the 
> declaration chain. That way, the declaration chains only contain 
> declarations that can actually be found by unqualified name lookup, 
> and are no deeper than the nesting of lexical scopes.
>
> For qualified name lookup, each DeclContext has the ability to perform 
> a lookup into the declarations it contains. For example, to find N::x, 
> we would ask the DeclContext represented by N whether it has a name 
> "x" in it. The lookup is performed in an efficient data structure: for 
> < 6 declarations in the DeclContext, it's an array; for more 
> declarations, it's a hash table.
>
> In this scheme, C++ unqualified name lookup falls back on qualified 
> name lookup in some cases. For example, let's add this code to the end 
> of the program above:
>
> namespace N { // re-opening namespace!
>   void g() {
>     int y = x; // needs to find #2
>   }
> }
>
> Note that "x" was not declared in the *lexical* scope of N::g. 
> However, it was declared in the namespace N, so it should be found by 
> unqualified name lookup. One approach to solving this problem would be 
> to push all of the names in N into scope when we re-open the namespace 
> N, but we don't want to do that because opening a namespace would then 
> be linear time in the number of declarations in the namespace. Imagine 
> pushing all of the declarations for namespace "std" just to put in a 
> specialization of iterator_traits!
>
> This patch implements a different approach. Each scope can have an 
> associated "entity", which is the DeclContext that defines the scope. 
> So, for example, the re-opened scope for "N" has as its entity the 
> DeclContext associated with this declaration of N. When we don't see a 
> name in our lexical scope, and the lexical scope has an associated 
> entity, we perform qualified lookup into that entity. So, since there 
> is no "x" in the lexical scope corresponding to N, we look for N::x 
> and find the x living in the first definition of N. The DeclContexts 
> are linked such that, even though each of the three times we open 'N' 
> we have different DeclContexts (each of which contains just the 
> declarations present in the source for that namespace definition), for 
> name lookup purposes there's only one array/hash table containing all 
> of the declarations of 'N'.
>
> This patch is neither complete nor fully-optimized. It should not 
> degrade performance in C, because the array/hash table data structures 
> are not built in C. However, there are more optimization opportunities:
>     - Once we've performed unqualified name lookup for a name, we 
> should cache it on the chain of declarations. For example, in N::g, we 
> do some work to find N::x. We should add N::x to the declaration 
> chain---but not the DeclContext!---so that subsequent lookups of x are 
> very fast (since it'll be at the front of the declaration list for "x").
>     - We should be building the array/hash table in the DeclContexts 
> lazily, when we need to answer the question "is there an x in N?". 
> This would allow us to remove the special-casing for C.
>     - We should make RecordDecl derive from DeclContext, and replace 
> the current storage and name lookup for fields (RecordDecl::getMember, 
> a linear search) with lookup into the DeclContext. This will give us 
> faster lookup for records with many members. A similar thing could be 
> done for Objective-C interfaces/implementations.
>     - When variables are redeclared (e.g., at translation unit scope), 
> we should be replacing the current declaration with the previous one, 
> rather than pushing both declarations onto the declaration name chain. 
> (This is an existing problem that I did not yet fix).
>
> Comments *greatly* appreciated!
>
>     - Doug
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>