[cfe-dev] Qualified and unqualified name lookup and IdResolver
Douglas Gregor
dgregor at apple.com
Thu Dec 4 17:30:10 PST 2008
The attached patch changes how we perform both qualified and
unqualified name lookup in Clang. It represents a significant
departure from the approach currently implemented in Clang, but I
believe it will have performance and maintainability benefits in the
future.
First, an explanation of qualified name lookup and how it works in
Clang today. Here's an example:
int x; // #1
namespace N {
int x; // #2
struct f1 {
static int x; // #3
};
}
namespace N {
struct f2 {
static int x; // #4
}
}
int xv = N::x;
The expression N::x involves qualified name lookup, which first looks
for "x" within the namespace N. The way this currently works is that
there is an IdentifierInfo "x" which contains as list of all of the
declarations of "x" in the translation unit, #4 -> #3 -> #2 -> #1. To
find "x" in N, we search this list, filtering out those x's that
aren't declared in the context 'N', until we find our answer (#2).
The same approach is used for unqualified lookup. Let's say that we
add this code at the end of our program above:
namespace N2 {
void f() {
int y = x;
}
}
Here, the unqualified lookup for "x" searches the same chain of
declarations for x (#4 -> #3 -> #2 -> #1), this time accepting any
declaration that is in the context of "f" or one of its enclosing
contexts (N2, translation unit), and eventually finds #1.
The performance concern is that, as translation units get bigger, the
chain of declarations for common names ("operator=" in C++ is one such
name) can get large, and we're paying for a linear traversal of this
list. This will be particularly bad with template metaprograms, where
many members are named "value" or "type" by convention, and template
metaprogram evaluation can involve the instantiation of thousands of
class templates, each with their own "value" or "type" that ends up on
the corresponding declaration chain.
The maintainability concern is that the construction of the chain of
declarations is very tied to the act of parsing the translation unit.
Other ways of getting part of a program into memory--say,
deserializing an AST from disk---would have to reconstruct these
chains as if we were parsing. There are also some open questions about
how one would deal with lookup into base classes: would we need to
perform IsDerivedFrom checks for each identifier as part of the check
for "is this in my context?".
I propose---and this patch implements---an alternative solution that
returns the chain of declarations back to a chain of declarations that
occur within the lexical scope of the program. For our function N2::f,
the chain for "x" would contain only #1, which is the only "x" within
the lexical scope. Essentially, when names get introduced into a scope
via a declaration, they are added to the declaration chain for their
name; when that scope is destroyed those names are removed from the
declaration chain. That way, the declaration chains only contain
declarations that can actually be found by unqualified name lookup,
and are no deeper than the nesting of lexical scopes.
For qualified name lookup, each DeclContext has the ability to perform
a lookup into the declarations it contains. For example, to find N::x,
we would ask the DeclContext represented by N whether it has a name
"x" in it. The lookup is performed in an efficient data structure: for
< 6 declarations in the DeclContext, it's an array; for more
declarations, it's a hash table.
In this scheme, C++ unqualified name lookup falls back on qualified
name lookup in some cases. For example, let's add this code to the end
of the program above:
namespace N { // re-opening namespace!
void g() {
int y = x; // needs to find #2
}
}
Note that "x" was not declared in the *lexical* scope of N::g.
However, it was declared in the namespace N, so it should be found by
unqualified name lookup. One approach to solving this problem would be
to push all of the names in N into scope when we re-open the namespace
N, but we don't want to do that because opening a namespace would then
be linear time in the number of declarations in the namespace. Imagine
pushing all of the declarations for namespace "std" just to put in a
specialization of iterator_traits!
This patch implements a different approach. Each scope can have an
associated "entity", which is the DeclContext that defines the scope.
So, for example, the re-opened scope for "N" has as its entity the
DeclContext associated with this declaration of N. When we don't see a
name in our lexical scope, and the lexical scope has an associated
entity, we perform qualified lookup into that entity. So, since there
is no "x" in the lexical scope corresponding to N, we look for N::x
and find the x living in the first definition of N. The DeclContexts
are linked such that, even though each of the three times we open 'N'
we have different DeclContexts (each of which contains just the
declarations present in the source for that namespace definition), for
name lookup purposes there's only one array/hash table containing all
of the declarations of 'N'.
This patch is neither complete nor fully-optimized. It should not
degrade performance in C, because the array/hash table data structures
are not built in C. However, there are more optimization opportunities:
- Once we've performed unqualified name lookup for a name, we should
cache it on the chain of declarations. For example, in N::g, we do
some work to find N::x. We should add N::x to the declaration chain---
but not the DeclContext!---so that subsequent lookups of x are very
fast (since it'll be at the front of the declaration list for "x").
- We should be building the array/hash table in the DeclContexts
lazily, when we need to answer the question "is there an x in N?".
This would allow us to remove the special-casing for C.
- We should make RecordDecl derive from DeclContext, and replace the
current storage and name lookup for fields (RecordDecl::getMember, a
linear search) with lookup into the DeclContext. This will give us
faster lookup for records with many members. A similar thing could be
done for Objective-C interfaces/implementations.
- When variables are redeclared (e.g., at translation unit scope), we
should be replacing the current declaration with the previous one,
rather than pushing both declarations onto the declaration name chain.
(This is an existing problem that I did not yet fix).
Comments *greatly* appreciated!
- Doug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qualified-name-lookup.patch
Type: application/octet-stream
Size: 44863 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081204/bff8560b/attachment.obj>
More information about the cfe-dev
mailing list