[cfe-dev] Qualified and unqualified name lookup and IdResolver
Argiris Kirtzidis
akyrtzi at gmail.com
Thu Dec 4 19:58:25 PST 2008
Hi Doug,
I like the idea, for unqualified lookup, of declaration chains based on
lexical scope plus fall back to qualified lookup.
But for qualified lookup, did you consider using the "array+hashtable
efficient data structure" not for answering "does N contain x ?" (#1)
but for answering "is x contained in N ?" (#2) ?
(I mean an identifier having that data structure for storing the
DeclContexts where it appears).
Both cases may have to deal with large number of declarations; #1 may
deal with namespaces with lots of decls, while #2 may deal with decls
contained in lots of namespaces.
The difference is that #1 optimizes for namespaces with few declarations
while #2 optimizes for "differently named" declarations which is more
common.
Also, #2 will immediately answer for not-declared-before identifiers,
without requiring any lookup.
-Argiris
Douglas Gregor wrote:
> The attached patch changes how we perform both qualified and
> unqualified name lookup in Clang. It represents a significant
> departure from the approach currently implemented in Clang, but I
> believe it will have performance and maintainability benefits in the
> future.
>
> First, an explanation of qualified name lookup and how it works in
> Clang today. Here's an example:
>
> int x; // #1
> namespace N {
> int x; // #2
> struct f1 {
> static int x; // #3
> };
> }
>
> namespace N {
> struct f2 {
> static int x; // #4
> }
> }
>
> int xv = N::x;
>
>
> The expression N::x involves qualified name lookup, which first looks
> for "x" within the namespace N. The way this currently works is that
> there is an IdentifierInfo "x" which contains as list of all of the
> declarations of "x" in the translation unit, #4 -> #3 -> #2 -> #1. To
> find "x" in N, we search this list, filtering out those x's that
> aren't declared in the context 'N', until we find our answer (#2).
>
> The same approach is used for unqualified lookup. Let's say that we
> add this code at the end of our program above:
>
> namespace N2 {
> void f() {
> int y = x;
> }
> }
>
> Here, the unqualified lookup for "x" searches the same chain of
> declarations for x (#4 -> #3 -> #2 -> #1), this time accepting any
> declaration that is in the context of "f" or one of its enclosing
> contexts (N2, translation unit), and eventually finds #1.
>
> The performance concern is that, as translation units get bigger, the
> chain of declarations for common names ("operator=" in C++ is one such
> name) can get large, and we're paying for a linear traversal of this
> list. This will be particularly bad with template metaprograms, where
> many members are named "value" or "type" by convention, and template
> metaprogram evaluation can involve the instantiation of thousands of
> class templates, each with their own "value" or "type" that ends up on
> the corresponding declaration chain.
>
> The maintainability concern is that the construction of the chain of
> declarations is very tied to the act of parsing the translation unit.
> Other ways of getting part of a program into memory--say,
> deserializing an AST from disk---would have to reconstruct these
> chains as if we were parsing. There are also some open questions about
> how one would deal with lookup into base classes: would we need to
> perform IsDerivedFrom checks for each identifier as part of the check
> for "is this in my context?".
>
> I propose---and this patch implements---an alternative solution that
> returns the chain of declarations back to a chain of declarations that
> occur within the lexical scope of the program. For our function N2::f,
> the chain for "x" would contain only #1, which is the only "x" within
> the lexical scope. Essentially, when names get introduced into a scope
> via a declaration, they are added to the declaration chain for their
> name; when that scope is destroyed those names are removed from the
> declaration chain. That way, the declaration chains only contain
> declarations that can actually be found by unqualified name lookup,
> and are no deeper than the nesting of lexical scopes.
>
> For qualified name lookup, each DeclContext has the ability to perform
> a lookup into the declarations it contains. For example, to find N::x,
> we would ask the DeclContext represented by N whether it has a name
> "x" in it. The lookup is performed in an efficient data structure: for
> < 6 declarations in the DeclContext, it's an array; for more
> declarations, it's a hash table.
>
> In this scheme, C++ unqualified name lookup falls back on qualified
> name lookup in some cases. For example, let's add this code to the end
> of the program above:
>
> namespace N { // re-opening namespace!
> void g() {
> int y = x; // needs to find #2
> }
> }
>
> Note that "x" was not declared in the *lexical* scope of N::g.
> However, it was declared in the namespace N, so it should be found by
> unqualified name lookup. One approach to solving this problem would be
> to push all of the names in N into scope when we re-open the namespace
> N, but we don't want to do that because opening a namespace would then
> be linear time in the number of declarations in the namespace. Imagine
> pushing all of the declarations for namespace "std" just to put in a
> specialization of iterator_traits!
>
> This patch implements a different approach. Each scope can have an
> associated "entity", which is the DeclContext that defines the scope.
> So, for example, the re-opened scope for "N" has as its entity the
> DeclContext associated with this declaration of N. When we don't see a
> name in our lexical scope, and the lexical scope has an associated
> entity, we perform qualified lookup into that entity. So, since there
> is no "x" in the lexical scope corresponding to N, we look for N::x
> and find the x living in the first definition of N. The DeclContexts
> are linked such that, even though each of the three times we open 'N'
> we have different DeclContexts (each of which contains just the
> declarations present in the source for that namespace definition), for
> name lookup purposes there's only one array/hash table containing all
> of the declarations of 'N'.
>
> This patch is neither complete nor fully-optimized. It should not
> degrade performance in C, because the array/hash table data structures
> are not built in C. However, there are more optimization opportunities:
> - Once we've performed unqualified name lookup for a name, we
> should cache it on the chain of declarations. For example, in N::g, we
> do some work to find N::x. We should add N::x to the declaration
> chain---but not the DeclContext!---so that subsequent lookups of x are
> very fast (since it'll be at the front of the declaration list for "x").
> - We should be building the array/hash table in the DeclContexts
> lazily, when we need to answer the question "is there an x in N?".
> This would allow us to remove the special-casing for C.
> - We should make RecordDecl derive from DeclContext, and replace
> the current storage and name lookup for fields (RecordDecl::getMember,
> a linear search) with lookup into the DeclContext. This will give us
> faster lookup for records with many members. A similar thing could be
> done for Objective-C interfaces/implementations.
> - When variables are redeclared (e.g., at translation unit scope),
> we should be replacing the current declaration with the previous one,
> rather than pushing both declarations onto the declaration name chain.
> (This is an existing problem that I did not yet fix).
>
> Comments *greatly* appreciated!
>
> - Doug
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
More information about the cfe-dev
mailing list