[cfe-dev] Patch for IdentifierResolver

Wed Apr 9 20:29:41 PDT 2008

On Apr 9, 2008, at 3:35 PM, Argiris Kirtzidis wrote:

> Chris Lattner wrote:
>> Steve and I were talking about this and came up with a potentially  
>> interesting refinement to your approach.  The observation we make  
>> is that shadowing of decls is actually really really rare in  
>> practice. It would be interesting to see stats, but I bet only a  
>> very small percentage of identifiers every have a scope chain with  
>> length > 1.
> Yes, this is true. For "carbon.h", for a total of 41358 identifiers,  
> only 1092 get a scope chain length > 1. That's just 2.64 %.

Nice!  Yay for intuition :)

>> To take advantage of this, you could abuse the FETokenInfo a  
>> bit :), like we do in the 'Selector' class: if the field is null,  
>> it obviously doesn't point to anything.  If non-null and the low  
>> bit is clear, you could treat it as a pointer to a single  
>> unshadowed decl node.  If non-null with the low bit set, you could  
>> treat it as a pointer to the IdDeclInfo.
>>
>> This would keep the common case (no shadowing) extraordinarily  
>> cheap, while allowing graceful fallback when there is shadowing.   
>> What do you think?

> Excellent idea! Try out the attached patch and let me know what you  
> think:
>
> -Added 'IdentifierResolver' class that handles the shadowed decls  
> identifier chains and performs name lookup.
> -Replaced get/setFETokenInfo calls from Sema with calls to  
> IdentifierResolver'methods:

I really like this.  It is also performance neutral on carbon and  
cocoa.h, congrats!

Some thoughts:

+  /// IdDeclInfoMap - Associates IdDeclInfos with Identifiers.
+  class IdDeclInfoMap {
+    std::list<IdDeclInfo*> IDIArrPtrs;

std::list<pointer> is very inefficient, maybe llvm::ilist would work?   
This puts the prev/next pointers in the IdDeclInfo object itself.   
This can be a follow-on patch of course.
http://llvm.org/docs/ProgrammersManual.html#dss_ilist

+    static const unsigned int ARRAY_SIZE = 512;
+    IdDeclInfo OrigArr[ARRAY_SIZE];

This is sort of gross: 512 is a magic number, where did you get it?   
Is this really needed? Maybe the ilist is enough?  The bad part about  
this is that it adds a couple kilobytes into the middle of the sema  
object, which seems suboptimal.

+//===----------- IdentifierResolver.h - Name lookup -------------*- C+ 
+ -*-===//

Please use:

+//===- IdentifierResolver.h - Name lookup -----------------------*- C+ 
+ -*-===//

Better yet, use something like "Lexical Scope Name Lookup".  This  
doesn't handle all name lookup, just one very specific aspect of it.

+/// IdentifierResolver - Keeps track of 'visible' decls and  
implements efficent
+/// decl lookup based on an identifier.
+class IdentifierResolver {

This should mention that it is lazily constructed the first time an  
identifier is shadowed in some scope, and is used to manage that  
shadowing.

+  struct IdDeclInfo {
+
+    typedef llvm::SmallVector<NamedDecl *, 2> ShadowedTy;
+    typedef ShadowedTy::iterator ShadowedIter;
+
+    ShadowedTy ShadowedDecls;

should these be private?  Would it be possible to pull the IdDeclInfo/ 
IdDeclInfoMap out of the header into the .cpp file?  All of the  
IdentifierResolver methods are out of line anyway, and it would make  
it much easier to understand the IdentifierResolver interface.

+++ lib/Sema/Sema.cpp	(working copy)

    // Add the built-in ObjC types.
    t = cast<TypedefType>(Context.getObjCIdType().getTypePtr());
-  t->getDecl()->getIdentifier()->setFETokenInfo(t->getDecl());
+  IdResolver.AddGlobalDecl(t->getDecl());
    TUScope->AddDecl(t->getDecl());
    t = cast<TypedefType>(Context.getObjCClassType().getTypePtr());
-  t->getDecl()->getIdentifier()->setFETokenInfo(t->getDecl());
+  IdResolver.AddGlobalDecl(t->getDecl());
    TUScope->AddDecl(t->getDecl());
    ObjCInterfaceType *it =  
cast<ObjCInterfaceType>(Context.getObjCProtoType());
    ObjCInterfaceDecl *IDecl = it->getDecl();
-  IDecl->getIdentifier()->setFETokenInfo(IDecl);
+  IdResolver.AddGlobalDecl(IDecl);

FWIW, these happen right when the front-end starts up, so use of  
AddDecl would also work fine.  I think the only thing that uses it is  
handling of implicitly declared functions.

+/// AddDecl - Link the decl to its shadowed decl chain
+void IdentifierResolver::AddDecl(NamedDecl *D, Scope *S) {
...
+  if (0 == Ptr)
+    II->setFETokenInfo(D);
+  else {
...
+  }
+}

Two things: first, please use "Ptr == 0".  I understand that this  
defends against accidental use of = instead of ==, but we have  
warnings for that and "Ptr == 0" reads much better.

Second, please use an early exit to reduce indentation in this  
function, something like this would be preferable:

+  if (Ptr == 0) {
+    II->setFETokenInfo(D);
+    return;
+  }

This is preferable when you have an early out of a function or loop  
(use continue instead of return).  And applies to the other functions  
as well.  For the same reason, please drop the else in things like:

+    if (isDeclPtr(Ptr)) {
+      NamedDecl *D = static_cast<NamedDecl*>(Ptr);
+      return (D->getIdentifierNamespace() == NS) ? D : NULL;
+    } else {

There is no need for the extra indentation of the else block.  I've  
found that high indentation makes the code more difficult to follow  
(you have to keep more context in mind when reading it.

Overall, this is a great cleanup.  I'll be glad to decouple the asts  
from sema!

-Chris