[cfe-dev] [Patch] AST support, and unqualified name lookup for using-directives

Sun Feb 1 22:43:52 PST 2009

Hi Piotr,

On Jan 31, 2009, at 1:46 PM, Piotr Rak wrote:

> Hi Dough,
>
> 2009/1/31 Douglas Gregor <dgregor at apple.com>:
>>
>> On Jan 30, 2009, at 7:53 PM, Piotr Rak wrote:
>>>
>>> I have played bit more with using-directives. Attached adds AST
>>> support for using-directives (no serialization still), and makes
>>> Sema::LookupName look ugly:) I may split it into two separated
>>> patches(AST and lookup), if it is needed.
>>
>> This is a big undertaking; thank you!
>>
>> I'll do a detailed review later, but one thing in particular caught  
>> me eye:
>> that sort_heap in CppLookupName is really going to hurt  
>> performance. Could
>> you give me a short description of why we need it? (Or should I  
>> just wait
>> until I do that detailed review to get the big picture?)
>>
>
> For each Scope (namespace/translation-unit), we need to consider all
> using-directives
> which have common ancestor (of nominated namespace, and context
> containing using-directive).
> I see few ways to solve it:
> - sort by common ancestor, use binary search to find range (we do
> that now, we can check other sort algorithms, qsort might be bad,
> since memory allocation often creates increasing addresses)
> - use multiset/hashset
> - use linear search (gcc does that)
>
> Which is faster for real code? I don't know, have not messured it
> honestly, I tend to belive it will be bit faster than gcc for larger
> number of using-directives, or similar..., for small number we lose
> now for sure.
> I would like to play with it, and tweak it at later point, but I don't
> think microbenchmarks are way to go.

Oh, I agree. I'm not concerned about the specific mechanism--- 
sort_heap, sort, whatever---but about the need to perform this kind of  
computation every time we perform name lookup in C++.

> Moreover, I consider caching this
> sorted list in Scope structure (probably not updating online, just
> invalidating), then it will always be a win over gcc in terms of time.

This is what I was looking for. We can cache the results of this  
operation in the Scope structure or (in some cases) DeclContext.  
Performing this operation once per LookupName call would be quite  
painful; but if the number of times we need to do this is proportional  
instead to the number of using-directives, that would be fine.

>
> There is still other issue, hurting lookup, not addressed by this
> patch too, ie. looking in same DeclContext many times.

Oh, certainly. We have FIXME's for a bunch of ways we can make name  
lookup faster for C++.

>
>>> I would also want to also ask, if there is any master plan how to  
>>> get
>>> rid of OverloadedFunctionDecl yet, or anyone actively working on it
>>> now?
>>
>> The master plan is for LookupResult to provide an easy way to  
>> iterate over
>> the set of overloaded functions that it found, which means  
>> providing a
>> LookupResult::iterator that understands how LookupResult stores  
>> overloaded
>> functions (as a pair of iterators into a DeclContext or into an
>> IdentifierResolver chain). If it would help with using-directives,  
>> I could
>> implement LookupResult::iterator relatively soon.
>
> It is not essential now, but at later point I could avoid few
> allocations! while merging multiple OverloadedFunctionDecls (now I
> reuse first found OverloadedFunctionDecl, and free other).

Okay. I'll try to do this soon, anyway; it's always good to have an  
excuse to do these kinds of architectural improvements.

>
>> Right now, OverloadedFunctionDecl is only created "on demand" by
>> LookupResult when the caller requests a Decl*. Callers should be  
>> migrated to
>> look at the kind of LookupResult they receive from Lookup*Name, so  
>> that they
>> can handle both ambiguities and overloaded functions. This change  
>> can happen
>> gradually, and at some point none of the callers will need
>> OverloadedFunctionDecl. I haven't been actively working on doing  
>> this, but
>> for the most part it also isn't hard. The trickiest part is  
>> deciding what to
>> do with DeclRefExpr's that refer to overloaded functions.
>
> Access to stored FunctionDecls avoiding allocation of
> OverloadedFunctionDecl would improve performance in same cases, but
> that would be not be big change.

Right.

Some comments on your patch:

+/// UsingDirectiveDecl - Represents C++ using-directive. For example:
+///    using namespace std;
+// NB: UsingDirectiveDecl should be Decl not NamedDecl, but we provide
+// name artificial name, for all using-directives in order to store
+// them in DeclContext effectivly.
+class UsingDirectiveDecl : public NamedDecl {

This will work, although I think we could do slightly better by  
altering DeclContext's internal structure so that it can store  
UsingDirectiveDecls. However, you don't need to worry about that for  
this patch: I can make that DeclContext change after your patch goes in.

Also, typo "name artificial name".

+  UsingDirectiveDecl(DeclContext *DC, SourceLocation L,
+                     SourceLocation NamespcLoc,
+                     SourceLocation IdentLoc,
+                     NamespaceDecl *Used)
+    : NamedDecl(Decl::UsingDirective, DC, L, getName()),
+      NamespaceLoc(NamespcLoc), IdentLoc(IdentLoc),
+      UsedNamespace(Used? Used->getOriginalNamespace() : 0) {
+    // Find enclosing context containing both using-directive and
+    // used namespace.
+    DeclContext *Ctx = Used;
+    while (Ctx && !Ctx->Encloses(getDeclContext()))
+      Ctx = Ctx->getParent();
+
+    CommonAncestor = Ctx;
+  }

I'd prefer that this computation be performed within Sema, and the  
CommonAncestor value passed in to the constructor.

+  /// getUsedNamespace - Returns namespace nominated by using- 
directive.
+  NamespaceDecl *getUsedNamespace() { return UsedNamespace; }
+
+  const NamespaceDecl *getUsedNamespace() const {
+    return const_cast<UsingDirectiveDecl*>(this)->getUsedNamespace();
+  }

Why not use the name "getNominatedNamespace", since that's the  
standard term?

+  // UsingDirectiveDecl's are not really NamedDecl's, and all have  
same name.
+  // We want to keep them unconditionally.
+  if (getKind() == Decl::UsingDirective)
+    return false; //FIXME: We could conserve space in some rare cases  
returning
+                  //true if both decls point on same namespaces.
+

Isn't this FIXME resolved by just checking whether  
cast<UsingDirectiveDecl>(this)->getUsedNamespace()- 
 >getPrimaryContext() == cast<UsingDirectiveDecl>(OldD)- 
 >getUsedNamespace()->getPrimaryContext()?

+  } else if (UsingDirectiveDecl *UD =  
dyn_cast<UsingDirectiveDecl>(D)) {
+    // print using-directive decl (e.g. "using namespace x;")
+    const char *ns;
+    if (const IdentifierInfo *II = UD->getUsedNamespace()- 
 >getIdentifier())
+      ns = II->getName();
+    else
+      ns = "<anonymous>";
+    fprintf(F, "\"%s %s;\"",UD->getDeclKindName(), ns);

Is it even possible to have a using directive that nominates an  
anonymous namespace? I guess it depends on how we implement unnamed  
namespaces.

+    bool hasBasePaths() const;
+

Do we need this? getBasePaths() could just return NULL if there are no  
base paths.

I like what you've done with LookupResult.

+      /// Name lookup results in an ambiguity because multiple  
definitions
+      /// of entity tat meet the lookup criteria were found in  
different
+      /// declaration contextes.
+      /// @code
+      /// namespace A {
+      ///   int i;
+      ///   namespace B { int i; }
+      ///   int test() {
+      ///     using namespace B;
+      ///     return i; // error 'i' is found in namespace A and A::B
+      ///    }
+      /// }
+      /// @endcode
+      AmbiguousReference

Two typos in that comment: "tat" and "contextes".

Should Sema::ActOnUsingDirective assert that the scope S it gets is a  
DeclScope?

+    // Ensure, that decls are being added to orginal namespace.
+    // Ctx = Ctx->getPrimaryContext(); XXX: needless
+    Ctx->addDecl(UDir);

Right, we don't need that Ctx = Ctx->getPrimaryContext() here, but we  
will need it later...

+  /// @brief Comapres UsingDirectiveDecl common ancestor with  
DeclContext.
+  bool operator () (UsingDirectiveDecl *U, const DeclContext *Ctx)  
const {
+    return U->getCommonAncestor() < Ctx;
+  }

Typo: "Comapres"

+/// AddNamespaceUsingDirectives - Adds all UsingDirectiveDecl's found  
in
+/// namespace NS including all found (recursivly) in nominated by them
+/// namespaces, to heap UDirs, (ordering by common ancestor).
+void AddNamespaceUsingDirectives(DeclContext *NS,
+                                 UsingDirectivesTy &UDirs,
+                                 NamespaceSet &Visited) {

Typo "recursivly", but that sentence also needs some work.

+    DeclContext *Parent = Ctx->getParent();
+
+    // FIXME: do we really need it, do we need add its parent too?
+    if (Parent && Ctx->getLexicalParent() != Parent) {
+      if (NamespaceDecl *NS = dyn_cast<NamespaceDecl>(Parent)) {
+        if (VisitedNS.count(NS)) return;
+        VisitedNS.insert(NS);
+      }
+      AddNamespaceUsingDirectives(Parent, UDirs, /*ref*/ VisitedNS);
+    }

Why are we looking into the parent context here? Using directives  
nominate namespaces that are searched but, IIUC, their enclosing  
namespaces are not searched.

+    Scope::udir_iterator
+      I = S->using_directives_begin(),
+      End = S->using_directives_end();
+
+    for (I, End; I != End; ++I) {

This could just be

	for(; I != End; ++I) {

+/// FindUsingDirsByAncestors - Returns range [First, Last)  
UsingDirectiveDecls
+/// which UsingDirectiveDecl::getCommonAncestor() == Ctx.
+/// It requires UDirs to be sorted by common ancestor.
+static std::pair<UsingDirectivesTy::iterator,  
UsingDirectivesTy::iterator>
+FindUsingDirsByAncestors(UsingDirectivesTy &UDirs, DeclContext *Ctx) {
+
+  UsingDirAncestorCompare Cmp;
+  UsingDirectivesTy::iterator First, Last;
+
+  First = std::lower_bound(UDirs.begin(), UDirs.end(), Ctx, Cmp);
+  Last = std::upper_bound(First, UDirs.end(), Ctx, Cmp);
+
+  return std::make_pair(First, Last);
+}

This whole function is just

	return std::equal_range(UDirs.begin(), UDirs.end(), Ctx,  
UsingDirAncestorCompare());

:)

+/// Merges together multiplie LookupResults dealing with duplicated  
Decl's.
+static Sema::LookupResult
+MergeLookupResults(ASTContext &Context, LookupResultsTy &Results) {

Typo: "multiplie"

+  LookupResultsTy::iterator I = Results.begin(), End = Results.end();
+  for (I; I != End; ++I) {

Could be:

	for (; I != End; ++I) {

+    case LResult::AmbiguousBaseSubobjectTypes:
+    case LResult::AmbiguousBaseSubobjects: {
+      // FIXME: Do we care about later possibly found
+      //   AmbiguousBaseSubobjectTypes, or AmbiguousBaseSubobjects?
+      //   Is it ever possible?

I don't believe this is ever possible, since using directives can't  
occur at class scope. We should probably have an assert() here.

+    case LResult::FoundOverloaded:
+      if (FoundOverloaded) {
+        // We have one spare OverloadedFunctionDecl already, so we  
store
+        // its function decls and free it.
+        OverloadedFunctionDecl *Ovl =
+          cast<OverloadedFunctionDecl>(I->getAsDecl());
+
+        OverloadedFunctionDecl::function_iterator
+          FI = Ovl->function_begin(),
+          FEnd = Ovl->function_end();
+
+        for (FI; FI != FEnd; ++FI)
+          FoundDecls.push_back(*FI);
+
+        Ovl->Destroy(Context);
+      } else {
+        // First time we found OverloadedFunctionDecl, we want to  
conserve
+        // it, and possibly add other found Decls later.
+        FoundOverloaded = cast<OverloadedFunctionDecl>(I->getAsDecl());
+      }
+      break;

In the "else" branch, don't you need to add all of the decls to  
FoundDecls?

+  // Remove duplicated Decl pointing at same Decl, this might be case
+  // for code like:
+  //
+  //    namespace A { int i; }
+  //    namespace B { using namespace A; }
+  //    namespace C { using namespace A; }
+  //
+  //    void foo() {
+  //      using namespace B;
+  //      using namespace C;
+  //      ++i; // finds A::i, from both namespace B and C at global  
scope
+  //    }
+  //
+  // FIXME: Needs quote from paper.
+  // This is not ambiguous lookup.
+  //
+  // FIXME: At this point happends too, because we are doing  
redundant lookups.

The paragraph you're looking for is 3.4.3.2p3:

	The same declaration found more than once is not an ambiguity  
(because it is still a unique declaration).

There's a typo "happends" in the second FIXME.

+  DeclsVecTy::iterator DI = FoundDecls.begin(), DEnd =  
FoundDecls.end();
+  std::sort(DI, DEnd);
+  DEnd = std::unique(DI, DEnd);

I have to wonder if it would be easier just to insert everything into  
a SmallPtrSet and then just iterate over the results, especially given  
the code below it:

+  if (FoundOverloaded) {
+    // We found overloaded functions result. We want to add any other
+    // found decls, that are not already in FoundOverloaded, and are  
functions
+    // or methods.
+    OverloadedFunctionDecl::function_iterator
+      FBegin = FoundOverloaded->function_begin(),
+      FEnd = FoundOverloaded->function_end();
+    for (DI ; DI < DEnd; ++DI) {
+      FunctionDecl *Fun = dyn_cast<FunctionDecl>(*DI);
+      if (Fun && (std::find(FBegin, FEnd, Fun) != FEnd)) { /*  
Skip.*/  }
+      else DEnd = std::remove(DI, DEnd, *DI);
+    }

This is O(n^2) in [DI, DEnd), because remove() is linear time. Sure,  
[DI, DIEnd) should be pretty small---and so should [FBegin, FEnd)  
which we search through linearly---but I think this whole section  
would be cleaner (and no less efficient) with SmallPtrSet.

+    // We might found multiple RecordDecls pointing at same definition.
+    // FIXME: What about case like:
+    // class X; class X {}; X::X(X&) {}
+    if (RecordDecl *R = dyn_cast<RecordDecl>(*FoundDecls.begin())) {
+      // Q: Is that correct way to check RecordDecls equivalence?
+      RecordDecl *RDef = R->getDefinition(Context);
+      DeclsVecTy::iterator RI = FoundDecls.begin(), REnd = DEnd;

This will work when the class has been defined, but not if it's only  
been forward-declared, since getDefinition() will return a NULL  
pointer in that case. I think it's time that we finally add a function

   TagDecl *ASTContext::getCanonicalDecl(TagDecl *Tag) const {
     QualType T = getTypeDeclType(Tag);
     return cast<TagDecl>(cast<TagType>(T)->getDecl());
   }

because this is *exactly* the case where we need it. I also suggest  
making this code work on TagDecls, rather than just RecordDecls, since  
eventually we'll need to cope with forward-declared enums (for C++0x,  
GNU C).

+    // We might find FunctionDecls in two (or more) distinct  
DeclContexts.
+    Decl *D = MaybeConstructOverloadSet(Context, DI, DEnd);
+    if ((FoundLen == 1) || isa<OverloadedFunctionDecl>(D))
+      return LResult::CreateLookupResult(Context, D);

It's very interesting that this implements 3.4.3.2p5 (a comment and  
quote from the standard here would help!), although I think there's  
one issue: MaybeConstructOverloadSet(Context, DI, DEnd) expects that,  
if the sequence [DI, DEnd) contains both object and tag names, that  
the object names precede the tag names. This won't necessarily be the  
case with [DI, DEnd), since we put decls into FoundDecls in the order  
we found them.

+// Return qualified name for Decl D, like A::B::i, for i being member  
of
+// namespace A::B.
+std::string
+getQualifiedName(Decl *D) {

This is used for diagnostics. It's certainly not something you need to  
do, but eventually I think we'll pull this into the diagnostics  
machinery, and teach it to print NamedDecls with qualified names with  
markup like "%q0".

+  // Collect UsingDirectiveDecls in all scopes, and recursivly all
+  // nominated namespaces by those using-directives.
+  // UsingDirectives are pushed to heap, in common ancestor pointer
+  // value order.
+  UsingDirectivesTy UDirs;
+  for (Scope *SC = Initial; SC; SC = SC->getParent())
+     AddScopeUsingDirectives(SC, UDirs);
+
+  // Sort heapified UsingDirectiveDecls.
+  std::sort_heap(UDirs.begin(), UDirs.end());

Could we move this computation down below the part that looks in local  
scopes for names? For example, if we have something like this:

   namespace N { int i; }
   using namespace N;

   int f(int i) {
     return i;
   }

When doing name lookup for "i", we'll never look into a namespace or  
global scope, so there's no reason to go through the effort of looking  
for using directives.

+  for (; S && isFunctionScope(S); S = S->getParent()) {
// ...
+    // NB: Icky, we need to look in function scope, but we need to  
check its
+    // parent DeclContext, in case it is out of line method definition.
+    if (S->getEntity() && isFunctionScope(S))
+      break;

This doesn't look right... function scopes have entities (the  
FunctionDecl); it's only when that entity is a C++ member function  
that we need to look into its (semantic) parent *after* we've looked  
in function scope. This look should exit out once we hit a scope S  
whose entity is a NamespaceDecl or TranslationUnitDecl; after that, we  
need to start worrying about using directives.

+      for (llvm::tie(UI, UEnd) =
+           FindUsingDirsByAncestors(UDirs, Ctx); UI != UEnd; ++UI) {
+        // FIXME: We will have to ensure, that we won't consider
+        // again using-directives during qualified name lookup!
+        // (Once using-directives support for qualified name lookup  
gets
+        // implemented).
+        if (LookupResult R = LookupQualifiedName((*UI)- 
 >getUsedNamespace(),
+            Name, NameKind, RedeclarationOnly))
+          LookupResults.push_back(R);

Regarding this FIXME, I think CppLookupName will become a bit easier  
if LookupQualifiedName understood how to search for those names within  
its DeclContext that should be found via using directives. In  
particular, LookupQualifiedName could take an optional  
"UsingDirectivesTy const *UDirs" parameter that it would use to  
determine which other namespaces to search in. I think the end result  
would be clearer (and more functional).

This is a surprisingly complicated feature (who knew?), but I  
definitely think you're close to the right implementation. Your  
approach in walking all of the using-directives in scope once, then  
sorting it and using binary search to subset the list appropriately  
for each DeclContext lookup, is very interesting: I was thinking about  
a different, far less clever implementation, but I really like yours.  
If we can fix up most of the issues I mentioned above, and perhaps  
throw a few extra test-cases at it (I can help write some more later),  
it'll be ready to go in.

Thank you!

	 -Doug