[cfe-dev] [Patch] AST support, and unqualified name lookup for using-directives

Mon Feb 2 15:53:16 PST 2009

Hi Doug,

Thanks for review, and great comments!

I have addressed most of issues in attached version.
It also partially fixes other thing I have noticed, similar failure to
one from first version of patch with namespaces, now with tags.
Fix is not complete though, but behaviour is right. One commented out
in test is incorrectly diagnosed. I'll try send fix ASAP.

There is risk we don't check LookupResult for ambiguities in more
cases too... We need will need more tests. I have checked with gcc
testsuite, and we pass all tests that have 'using namespace', and
don't include standard library headers, use using-declaration, or
require strong-using now.

2009/2/2 Douglas Gregor <dgregor at apple.com>:
> Hi Piotr,
>
> On Jan 31, 2009, at 1:46 PM, Piotr Rak wrote:
>
>> Hi Dough,
>>
>> 2009/1/31 Douglas Gregor <dgregor at apple.com>:
>>>
>>> On Jan 30, 2009, at 7:53 PM, Piotr Rak wrote:
>>>>
>>>> I have played bit more with using-directives. Attached adds AST
>>>> support for using-directives (no serialization still), and makes
>>>> Sema::LookupName look ugly:) I may split it into two separated
>>>> patches(AST and lookup), if it is needed.
>>>
>>> This is a big undertaking; thank you!
>>>
>>> I'll do a detailed review later, but one thing in particular caught me
>>> eye:
>>> that sort_heap in CppLookupName is really going to hurt performance.
>>> Could
>>> you give me a short description of why we need it? (Or should I just wait
>>> until I do that detailed review to get the big picture?)
>>>
>>
>> For each Scope (namespace/translation-unit), we need to consider all
>> using-directives
>> which have common ancestor (of nominated namespace, and context
>> containing using-directive).
>> I see few ways to solve it:
>> - sort by common ancestor, use binary search to find range (we do
>> that now, we can check other sort algorithms, qsort might be bad,
>> since memory allocation often creates increasing addresses)
>> - use multiset/hashset
>> - use linear search (gcc does that)
>>
>> Which is faster for real code? I don't know, have not messured it
>> honestly, I tend to belive it will be bit faster than gcc for larger
>> number of using-directives, or similar..., for small number we lose
>> now for sure.
>> I would like to play with it, and tweak it at later point, but I don't
>> think microbenchmarks are way to go.
>
> Oh, I agree. I'm not concerned about the specific mechanism---sort_heap,
> sort, whatever---but about the need to perform this kind of computation
> every time we perform name lookup in C++.
>
>> Moreover, I consider caching this
>> sorted list in Scope structure (probably not updating online, just
>> invalidating), then it will always be a win over gcc in terms of time.
>
> This is what I was looking for. We can cache the results of this operation
> in the Scope structure or (in some cases) DeclContext. Performing this
> operation once per LookupName call would be quite painful; but if the number
> of times we need to do this is proportional instead to the number of
> using-directives, that would be fine.
>

I would like to avoid making DeclContext heavier, without big
benfit... Scope sounds better to me.I have put some effort to avoid
that, because:

- C doesn't need it.
- There will be houndreds DeclContext's in typical code
- I will be surprised if there we were many more than 20 Scope objects
in typical
  code at one time
- Scope has short pretty life, if we compare it to DeclContext.
- Other AST clients can't benefit from that information anyway...

>>
>> There is still other issue, hurting lookup, not addressed by this
>> patch too, ie. looking in same DeclContext many times.
>
> Oh, certainly. We have FIXME's for a bunch of ways we can make name lookup
> faster for C++.
>
>>
>>>> I would also want to also ask, if there is any master plan how to get
>>>> rid of OverloadedFunctionDecl yet, or anyone actively working on it
>>>> now?
>>>
>>> The master plan is for LookupResult to provide an easy way to iterate
>>> over
>>> the set of overloaded functions that it found, which means providing a
>>> LookupResult::iterator that understands how LookupResult stores
>>> overloaded
>>> functions (as a pair of iterators into a DeclContext or into an
>>> IdentifierResolver chain). If it would help with using-directives, I
>>> could
>>> implement LookupResult::iterator relatively soon.
>>
>> It is not essential now, but at later point I could avoid few
>> allocations! while merging multiple OverloadedFunctionDecls (now I
>> reuse first found OverloadedFunctionDecl, and free other).
>
> Okay. I'll try to do this soon, anyway; it's always good to have an excuse
> to do these kinds of architectural improvements.
>

Thanks!

>>
>>> Right now, OverloadedFunctionDecl is only created "on demand" by
>>> LookupResult when the caller requests a Decl*. Callers should be migrated
>>> to
>>> look at the kind of LookupResult they receive from Lookup*Name, so that
>>> they
>>> can handle both ambiguities and overloaded functions. This change can
>>> happen
>>> gradually, and at some point none of the callers will need
>>> OverloadedFunctionDecl. I haven't been actively working on doing this,
>>> but
>>> for the most part it also isn't hard. The trickiest part is deciding what
>>> to
>>> do with DeclRefExpr's that refer to overloaded functions.
>>
>> Access to stored FunctionDecls avoiding allocation of
>> OverloadedFunctionDecl would improve performance in same cases, but
>> that would be not be big change.
>
> Right.
>
> Some comments on your patch:
>
> +/// UsingDirectiveDecl - Represents C++ using-directive. For example:
> +///    using namespace std;
> +// NB: UsingDirectiveDecl should be Decl not NamedDecl, but we provide
> +// name artificial name, for all using-directives in order to store
> +// them in DeclContext effectivly.
> +class UsingDirectiveDecl : public NamedDecl {
>
> This will work, although I think we could do slightly better by altering
> DeclContext's internal structure so that it can store UsingDirectiveDecls.
> However, you don't need to worry about that for this patch: I can make that
> DeclContext change after your patch goes in.
>
> Also, typo "name artificial name".
>
> +  UsingDirectiveDecl(DeclContext *DC, SourceLocation L,
> +                     SourceLocation NamespcLoc,
> +                     SourceLocation IdentLoc,
> +                     NamespaceDecl *Used)
> +    : NamedDecl(Decl::UsingDirective, DC, L, getName()),
> +      NamespaceLoc(NamespcLoc), IdentLoc(IdentLoc),
> +      UsedNamespace(Used? Used->getOriginalNamespace() : 0) {
> +    // Find enclosing context containing both using-directive and
> +    // used namespace.
> +    DeclContext *Ctx = Used;
> +    while (Ctx && !Ctx->Encloses(getDeclContext()))
> +      Ctx = Ctx->getParent();
> +
> +    CommonAncestor = Ctx;
> +  }
>
> I'd prefer that this computation be performed within Sema, and the
> CommonAncestor value passed in to the constructor.
>

Done.

> +  /// getUsedNamespace - Returns namespace nominated by using-directive.
> +  NamespaceDecl *getUsedNamespace() { return UsedNamespace; }
> +
> +  const NamespaceDecl *getUsedNamespace() const {
> +    return const_cast<UsingDirectiveDecl*>(this)->getUsedNamespace();
> +  }
>
> Why not use the name "getNominatedNamespace", since that's the standard
> term?
>

True, just Used was easier, but changed that too :)

> +  // UsingDirectiveDecl's are not really NamedDecl's, and all have same
> name.
> +  // We want to keep them unconditionally.
> +  if (getKind() == Decl::UsingDirective)
> +    return false; //FIXME: We could conserve space in some rare cases
> returning
> +                  //true if both decls point on same namespaces.
> +
>
> Isn't this FIXME resolved by just checking whether
> cast<UsingDirectiveDecl>(this)->getUsedNamespace()->getPrimaryContext() ==
> cast<UsingDirectiveDecl>(OldD)->getUsedNamespace()->getPrimaryContext()?

Indeed, I was up to fix that before sending that, but just forgot. Fixed now.
getUsedNamespace/getNominatedNamespace always returns original namespace, so
getPrimaryContext is not needed here.

> +  } else if (UsingDirectiveDecl *UD = dyn_cast<UsingDirectiveDecl>(D)) {
> +    // print using-directive decl (e.g. "using namespace x;")
> +    const char *ns;
> +    if (const IdentifierInfo *II = UD->getUsedNamespace()->getIdentifier())
> +      ns = II->getName();
> +    else
> +      ns = "<anonymous>";
> +    fprintf(F, "\"%s %s;\"",UD->getDeclKindName(), ns);
>
> Is it even possible to have a using directive that nominates an anonymous
> namespace? I guess it depends on how we implement unnamed namespaces.
>

That reveals, how I plan deal with anonymous namespaces for name lookup :)
I wanted add implicit using-directive in enclosing context (as
Sebastian already pointed out). This should be enough, assuming that,
we work only with single translation unit.
In fact, I was planing to implement unqualified/qualified name lookup,
gcc strong-using/inline namespace and anonymous namespaces together,
but later it turned out to be too big for one patch!

> +    bool hasBasePaths() const;
> +
>
> Do we need this? getBasePaths() could just return NULL if there are no base
> paths.
>

Indeed, removed.

> I like what you've done with LookupResult.
>
> +      /// Name lookup results in an ambiguity because multiple definitions
> +      /// of entity tat meet the lookup criteria were found in different
> +      /// declaration contextes.
> +      /// @code
> +      /// namespace A {
> +      ///   int i;
> +      ///   namespace B { int i; }
> +      ///   int test() {
> +      ///     using namespace B;
> +      ///     return i; // error 'i' is found in namespace A and A::B
> +      ///    }
> +      /// }
> +      /// @endcode
> +      AmbiguousReference
>
> Two typos in that comment: "tat" and "contextes".

Fixed!

> Should Sema::ActOnUsingDirective assert that the scope S it gets is a
> DeclScope?
>
> +    // Ensure, that decls are being added to orginal namespace.
> +    // Ctx = Ctx->getPrimaryContext(); XXX: needless
> +    Ctx->addDecl(UDir);
>
> Right, we don't need that Ctx = Ctx->getPrimaryContext() here, but we will
> need it later...

No we won't, using-directive should be added to lexical context, and
only be visible
in semantic context, just like other NamedDecls. This is how
DeclContext::addDecl works now, so we don't need anything special
here, right?

> +  /// @brief Comapres UsingDirectiveDecl common ancestor with DeclContext.
> +  bool operator () (UsingDirectiveDecl *U, const DeclContext *Ctx) const {
> +    return U->getCommonAncestor() < Ctx;
> +  }
>
> Typo: "Comapres"

Fixed.

> +/// AddNamespaceUsingDirectives - Adds all UsingDirectiveDecl's found in
> +/// namespace NS including all found (recursivly) in nominated by them
> +/// namespaces, to heap UDirs, (ordering by common ancestor).
> +void AddNamespaceUsingDirectives(DeclContext *NS,
> +                                 UsingDirectivesTy &UDirs,
> +                                 NamespaceSet &Visited) {
>
> Typo "recursivly", but that sentence also needs some work.

Fixed, OK now?

>
> +    DeclContext *Parent = Ctx->getParent();
> +
> +    // FIXME: do we really need it, do we need add its parent too?
> +    if (Parent && Ctx->getLexicalParent() != Parent) {
> +      if (NamespaceDecl *NS = dyn_cast<NamespaceDecl>(Parent)) {
> +        if (VisitedNS.count(NS)) return;
> +        VisitedNS.insert(NS);
> +      }
> +      AddNamespaceUsingDirectives(Parent, UDirs, /*ref*/ VisitedNS);
> +    }
>
> Why are we looking into the parent context here? Using directives nominate
> namespaces that are searched but, IIUC, their enclosing namespaces are not
> searched.
>

It will be eventually, but true, we don't need it here.

> +    Scope::udir_iterator
> +      I = S->using_directives_begin(),
> +      End = S->using_directives_end();
> +
> +    for (I, End; I != End; ++I) {
>
> This could just be
>
>        for(; I != End; ++I) {
>

Ok, old habit, changed.

> +/// FindUsingDirsByAncestors - Returns range [First, Last)
> UsingDirectiveDecls
> +/// which UsingDirectiveDecl::getCommonAncestor() == Ctx.
> +/// It requires UDirs to be sorted by common ancestor.
> +static std::pair<UsingDirectivesTy::iterator, UsingDirectivesTy::iterator>
> +FindUsingDirsByAncestors(UsingDirectivesTy &UDirs, DeclContext *Ctx) {
> +
> +  UsingDirAncestorCompare Cmp;
> +  UsingDirectivesTy::iterator First, Last;
> +
> +  First = std::lower_bound(UDirs.begin(), UDirs.end(), Ctx, Cmp);
> +  Last = std::upper_bound(First, UDirs.end(), Ctx, Cmp);
> +
> +  return std::make_pair(First, Last);
> +}
>
> This whole function is just
>
>        return std::equal_range(UDirs.begin(), UDirs.end(), Ctx,
> UsingDirAncestorCompare());
>
> :)
>

Thanks!

> +/// Merges together multiplie LookupResults dealing with duplicated Decl's.
> +static Sema::LookupResult
> +MergeLookupResults(ASTContext &Context, LookupResultsTy &Results) {
>
> Typo: "multiplie"
>

Fixed.

> +  LookupResultsTy::iterator I = Results.begin(), End = Results.end();
> +  for (I; I != End; ++I) {
>
> Could be:
>
>        for (; I != End; ++I) {
>

Ok.

> +    case LResult::AmbiguousBaseSubobjectTypes:
> +    case LResult::AmbiguousBaseSubobjects: {
> +      // FIXME: Do we care about later possibly found
> +      //   AmbiguousBaseSubobjectTypes, or AmbiguousBaseSubobjects?
> +      //   Is it ever possible?
>
> I don't believe this is ever possible, since using directives can't occur at
> class scope. We should probably have an assert() here.

It shouldn't be possible, however it is, probably because we look more
than once in same DeclContext.

> +    case LResult::FoundOverloaded:
> +      if (FoundOverloaded) {
> +        // We have one spare OverloadedFunctionDecl already, so we store
> +        // its function decls and free it.
> +        OverloadedFunctionDecl *Ovl =
> +          cast<OverloadedFunctionDecl>(I->getAsDecl());
> +
> +        OverloadedFunctionDecl::function_iterator
> +          FI = Ovl->function_begin(),
> +          FEnd = Ovl->function_end();
> +
> +        for (FI; FI != FEnd; ++FI)
> +          FoundDecls.push_back(*FI);
> +
> +        Ovl->Destroy(Context);
> +      } else {
> +        // First time we found OverloadedFunctionDecl, we want to conserve
> +        // it, and possibly add other found Decls later.
> +        FoundOverloaded = cast<OverloadedFunctionDecl>(I->getAsDecl());
> +      }
> +      break;
>
> In the "else" branch, don't you need to add all of the decls to FoundDecls?

No, because I reuse first OverloadedFunctionDecl I find, so it has
them already, do you see otherwise?

> +  // Remove duplicated Decl pointing at same Decl, this might be case
> +  // for code like:
> +  //
> +  //    namespace A { int i; }
> +  //    namespace B { using namespace A; }
> +  //    namespace C { using namespace A; }
> +  //
> +  //    void foo() {
> +  //      using namespace B;
> +  //      using namespace C;
> +  //      ++i; // finds A::i, from both namespace B and C at global scope
> +  //    }
> +  //
> +  // FIXME: Needs quote from paper.
> +  // This is not ambiguous lookup.
> +  //
> +  // FIXME: At this point happends too, because we are doing redundant
> lookups.
>
> The paragraph you're looking for is 3.4.3.2p3:
>
>        The same declaration found more than once is not an ambiguity
> (because it is still a unique declaration).
>
> There's a typo "happends" in the second FIXME.

Thanks! Ugh, time to make vim check my errors...

> +  DeclsVecTy::iterator DI = FoundDecls.begin(), DEnd = FoundDecls.end();
> +  std::sort(DI, DEnd);
> +  DEnd = std::unique(DI, DEnd);
>
> I have to wonder if it would be easier just to insert everything into a
> SmallPtrSet and then just iterate over the results, especially given the
> code below it:
>
> +  if (FoundOverloaded) {
> +    // We found overloaded functions result. We want to add any other
> +    // found decls, that are not already in FoundOverloaded, and are
> functions
> +    // or methods.
> +    OverloadedFunctionDecl::function_iterator
> +      FBegin = FoundOverloaded->function_begin(),
> +      FEnd = FoundOverloaded->function_end();
> +    for (DI ; DI < DEnd; ++DI) {
> +      FunctionDecl *Fun = dyn_cast<FunctionDecl>(*DI);
> +      if (Fun && (std::find(FBegin, FEnd, Fun) != FEnd)) { /* Skip.*/  }
> +      else DEnd = std::remove(DI, DEnd, *DI);
> +    }
>
> This is O(n^2) in [DI, DEnd), because remove() is linear time. Sure, [DI,
> DIEnd) should be pretty small---and so should [FBegin, FEnd) which we search
> through linearly---but I think this whole section would be cleaner (and no
> less efficient) with SmallPtrSet.

Would it fair, to fix it once OverloadedFunctionDecl dies, from
exactly small detail you mentioned below... Decls need to be added in
same order we found it.

> +    // We might found multiple RecordDecls pointing at same definition.
> +    // FIXME: What about case like:
> +    // class X; class X {}; X::X(X&) {}
> +    if (RecordDecl *R = dyn_cast<RecordDecl>(*FoundDecls.begin())) {
> +      // Q: Is that correct way to check RecordDecls equivalence?
> +      RecordDecl *RDef = R->getDefinition(Context);
> +      DeclsVecTy::iterator RI = FoundDecls.begin(), REnd = DEnd;
>
> This will work when the class has been defined, but not if it's only been
> forward-declared, since getDefinition() will return a NULL pointer in that
> case. I think it's time that we finally add a function
>
>  TagDecl *ASTContext::getCanonicalDecl(TagDecl *Tag) const {
>   QualType T = getTypeDeclType(Tag);
>   return cast<TagDecl>(cast<TagType>(T)->getDecl());
>  }
>
> because this is *exactly* the case where we need it. I also suggest making
> this code work on TagDecls, rather than just RecordDecls, since eventually
> we'll need to cope with forward-declared enums (for C++0x, GNU C).

Thanks, added getCanonicalDecl, and used it.

> +    // We might find FunctionDecls in two (or more) distinct DeclContexts.
> +    Decl *D = MaybeConstructOverloadSet(Context, DI, DEnd);
> +    if ((FoundLen == 1) || isa<OverloadedFunctionDecl>(D))
> +      return LResult::CreateLookupResult(Context, D);
>
> It's very interesting that this implements 3.4.3.2p5 (a comment and quote
> from the standard here would help!), although I think there's one issue:
> MaybeConstructOverloadSet(Context, DI, DEnd) expects that, if the sequence
> [DI, DEnd) contains both object and tag names, that the object names precede
> the tag names. This won't necessarily be the case with [DI, DEnd), since we
> put decls into FoundDecls in the order we found them.

Are you sure it is right, in last final draft I have 3.4.3.2.p5 is:

"During the lookup of a qualified namespace member name, if the lookup
finds more than one declaration of   the member, and if one
declaration introduces a class name or enumeration name and the other
declarations either introduce the same object, the same enumerator or
a set of functions, the non-type name hides the class or enumeration
name if and only if the declarations are from the same namespace;
otherwise
(the declarations are from different namespaces), the program is ill-formed."

But we do unqualified lookup here.
I based it rather on 3.4.p1:
"... Name lookup may associate more than one declaration with a name
if it finds the name to be a function name; the declarations are said
to form a set of
  overloaded functions (13.1). Overload resolution (13.3) takes place
after name lookup has succeeded."

Is that correct?

> +// Return qualified name for Decl D, like A::B::i, for i being member of
> +// namespace A::B.
> +std::string
> +getQualifiedName(Decl *D) {
>
> This is used for diagnostics. It's certainly not something you need to do,
> but eventually I think we'll pull this into the diagnostics machinery, and
> teach it to print NamedDecls with qualified names with markup like "%q0".

I left it alone for now, I will try to implement it in next patch.

> +  // Collect UsingDirectiveDecls in all scopes, and recursivly all
> +  // nominated namespaces by those using-directives.
> +  // UsingDirectives are pushed to heap, in common ancestor pointer
> +  // value order.
> +  UsingDirectivesTy UDirs;
> +  for (Scope *SC = Initial; SC; SC = SC->getParent())
> +     AddScopeUsingDirectives(SC, UDirs);
> +
> +  // Sort heapified UsingDirectiveDecls.
> +  std::sort_heap(UDirs.begin(), UDirs.end());
>
> Could we move this computation down below the part that looks in local
> scopes for names? For example, if we have something like this:
>
>  namespace N { int i; }
>  using namespace N;
>
>  int f(int i) {
>   return i;
>  }

Now 'i' is found before sort for this case, we return result in line 562:
  return std::make_pair(true, Result);

I have checked that in debugger, do you see otherwise?

> When doing name lookup for "i", we'll never look into a namespace or global
> scope, so there's no reason to go through the effort of looking for using
> directives.
>
> +  for (; S && isFunctionScope(S); S = S->getParent()) {
> // ...
> +    // NB: Icky, we need to look in function scope, but we need to check
> its
> +    // parent DeclContext, in case it is out of line method definition.
> +    if (S->getEntity() && isFunctionScope(S))
> +      break;
>
> This doesn't look right... function scopes have entities (the FunctionDecl);
> it's only when that entity is a C++ member function that we need to look
> into its (semantic) parent *after* we've looked in function scope. This look
> should exit out once we hit a scope S whose entity is a NamespaceDecl or
> TranslationUnitDecl; after that, we need to start worrying about using
> directives.

If we did not break here; Scope->getEntity() would be translation-unit
or namespace, for out of line member function definition, and we would
not perform member name lookup. It is true, we could do
LookupQualifiedName here, and stop later at translation-unit namespace
Scope. Is that what you wanted me to implement?

> +      for (llvm::tie(UI, UEnd) =
> +           FindUsingDirsByAncestors(UDirs, Ctx); UI != UEnd; ++UI) {
> +        // FIXME: We will have to ensure, that we won't consider
> +        // again using-directives during qualified name lookup!
> +        // (Once using-directives support for qualified name lookup gets
> +        // implemented).
> +        if (LookupResult R = LookupQualifiedName((*UI)->getUsedNamespace(),
> +            Name, NameKind, RedeclarationOnly))
> +          LookupResults.push_back(R);
>
> Regarding this FIXME, I think CppLookupName will become a bit easier if
> LookupQualifiedName understood how to search for those names within its
> DeclContext that should be found via using directives. In particular,
> LookupQualifiedName could take an optional "UsingDirectivesTy const *UDirs"
> parameter that it would use to determine which other namespaces to search
> in. I think the end result would be clearer (and more functional).

LookupQualifiedName will need to get logic for using-directives, which
is quite different from what we want for unqualified name lookup. That
means, it would get 2 modes for:
  - unqualified name lookup
  - qualified name lookup

Qualified name lookup is quite complicated too unfortunately, involves
tracking visited
contexts, etc...
Beside we will probably need off switch for considering
using-directives for ADL, because of 3.4.2.p3.

If it is not essential to land this patch, I would like to implement
qualified name lookup first.

> This is a surprisingly complicated feature (who knew?), but I definitely
> think you're close to the right implementation.
I did not :) But looking amount of code gcc needs...

> Your approach in walking all
> of the using-directives in scope once, then sorting it and using binary
> search to subset the list appropriately for each DeclContext lookup, is very
> interesting: I was thinking about a different, far less clever
> implementation, but I really like yours. If we can fix up most of the issues
> I mentioned above, and perhaps throw a few extra test-cases at it (I can
> help write some more later), it'll be ready to go in.
>
> Thank you!
>
>         -Doug
>

I also added FIXME for caching sorted list of using-directives, I
would like to wait
with that, to moment when I know what data is needed to get qualified
lookup fast...

Piotr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: decl.udir.v3.diff
Type: text/x-patch
Size: 61250 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20090203/825aa021/attachment.bin>