[cfe-dev] PATCH: Semantic analysis and representation of C++ base classes

Sun May 4 22:20:23 PDT 2008

On May 2, 2008, at 6:08 PM, Doug Gregor wrote:

> This patch adds the missing semantic analysis and representation for
> C++ base classes.

Nice!

>    As part of this, there's a new IdentifierNamespace enumerator
> called IDNS_Ignore_nontypes, which tells the IdentifierResolver to
> ignore non-type names. This means that IdentifierNamespace is becoming
> much more like a set of flags dictating name lookup rules rather than
> the 4 C namespaces it started at. Expect more movement in this
> direction as Clang gets more C++-specific name lookup behavior.

Ok, that makes sense to me.  As its capabilities grow, we can always  
refactor it (again!) in the future.

Some comments:

+++ include/clang/AST/Decl.h	(working copy)
@@ -15,6 +15,7 @@
  #define LLVM_CLANG_AST_DECL_H

  #include "clang/AST/DeclBase.h"
+#include "clang/Parse/AccessSpecifier.h"


This is a layering violation.  The AST library should not depend on  
the Parser specific structures.  It is up to Sema to translate parser  
versions of these data structures into something permanent that lives  
in the AST.  The logic here is that we want [de]serialization to be  
able to construct ASTs without going through the parser, and enforcing  
the layering helps ensure there is a clean way to produce the ASTs.   
Also, the obvious solution of moving the enum to the AST headers won't  
work either, as the parser is not allowed to know about ASTs.

This comes up in a variety of places, with different levels of  
weirdness.  For example, attributes and decl specs are parsed into a  
temporary structure and then sema moves them to a more permanent one.   
ObjC has a similar issue with its ObjCDeclSpec enums.  I think the  
best way to handle this is to have a switch statement that translates  
the enums from the parser version to the ast version.


+/// BaseClassDecl - Represents a base class of a C++ class.
+class BaseClassDecl : public Decl {
+protected:
+  QualType Type;
+  SourceRange Range;
+  bool Virtual;
+  AccessSpecifier Access;
+

As with the ObjC-specific decl stuff, please expand the class comment  
to be more detailed, please give an example.  Also, for each member,  
please add a short doxygen comment explaining what they are.

Also, does it make sense for this to inherit from Decl?  My  
understanding of base classes is that they qualify a class, but they  
aren't themselves decls.  This may just be my misunderstanding, but  
what benefit does this get from deriving from Decl?


+  /// BaseClasses/NumBaseClasses - This is a new[]'d array of points to
+  /// base class decls. (C++ only).
+  BaseClassDecl **BaseClasses; // Null if not defined.
+  int NumBaseClasses; // -1 if not defined.
+

Please make this be 0 if not defined.  I'm trying to stamp out use of  
-1 as a sentinal, as it makes definitions of iterators awkward.  Also,  
instead of (or in addition to) these members:

+  /// getNumBaseClasses - Return the number of base classes, or -1 if
+  /// this is a forward declaration.
+  int getNumBaseClasses() const { return NumBaseClasses; }
+  const BaseClassDecl *getBaseClass(unsigned i) const { return  
BaseClasses[i]; }
+  BaseClassDecl *getBaseClass(unsigned i) { return BaseClasses[i]; }

Please add iterator definitions.  For example, ObjCCategoryImplDecl  
defines instmeth_* and classmeth_*.


    /// isTypeName - Return non-null if the specified identifier is a  
typedef name
+  /// in the current scope. If IgnoreNonTypes is true, then name  
lookup will
+  /// completely ignore any declarations it finds that are not types,  
and
+  /// continue to search for the name of a type in outer scopes (used  
in C++).
+  virtual DeclTy *isTypeName(const IdentifierInfo &II, Scope *S,
+                             bool IgnoreNonTypes = false) = 0;

This makes sense to me, but please drop the "(used in C++)" part from  
the comment.

+++ lib/Sema/SemaDeclCXX.cpp	(working copy)
...
+  // Base class types cannot be cv-qualified (because the type is a
+  // class-name).
+  if (BaseType.getCanonicalType().getCVRQualifiers() != 0) {
+    Diag(BaseLoc, diag::err_cv_qualified_base_class,  
BaseType.getAsString(),
+         SpecifierRange);
+    BaseType = BaseType.getUnqualifiedType();
+  }
+

Does this also need to check for address-space qualifiers?  I think  
something like "if (BaseType.getCanonicalType().getAddressSpace() !=  
0)" should be sufficient.  'getUnqualifiedType' strips off addrspace  
qualifiers also.



+void Sema::ActOnBaseSpecifiers(DeclTy *classdecl,
+                               DeclTy **basedecls, unsigned  
NumBaseDecls) {
+  unsigned NextBaseDecl = 0;
+  llvm::DenseMap<const RecordType *, unsigned> DirectBases;

Using a densemap to detect duplicates is overkill here.  DenseMap is  
best when you plan to insert lots of stuff.  What we really want is  
some sort of "SmallDenseMap" which avoids the heap in the common case  
where there are only a few cases.  We don't have that, though, so  
three options:

1) Use SmallPtrSet, which is efficient when below a threshold, and  
also scales very well.  The problem is that you lose the ability to  
map to an index.
2) Use a vector and just sort the vector (by record) and scan the  
vector in order to detect duplicates
3) Just don't worry about it :), add a fixme and move on.


+  for (unsigned i = 0; i < NumBaseDecls; ++i) {

It's a very minor issue, and not a big deal, but I tend to prefer  
using 'i != NumBaseDecls' just as a general form of strength reduction  
and because it is more idiomatic when faced with lots of iterator  
comparisons.  Feel free to ignore me if you like <.

@@ -201,8 +201,11 @@ NamedDecl *IdentifierResolver::Lookup(co
    for (IdDeclInfo::ShadowedIter SI = IDI->shadowed_end();
         SI != IDI->shadowed_begin(); --SI) {
      NamedDecl *D = *(SI-1);
-    if (D->getIdentifierNamespace() & NS)
-      return D;
+    if (D->getIdentifierNamespace() & NS) {
+      if (!(NS & Decl::IDNS_Ignore_nontypes)
+          || dyn_cast_or_null<TypeDecl>(D))
+        return D;

D can't be null here if you got 'D->getIdentifierNamespace()'  
successfully.  Also, if you don't use the pointer result, please use  
isa instead of dyn_cast.  Thus the check can be "|| isa<TypeDecl>(D)".


+++ lib/Parse/ParseDeclCXX.cpp	(working copy)
+  Actions.ActOnBaseSpecifiers(ClassDecl,
+                              BaseDecls.empty()? 0 : &BaseDecls[0],
+                              BaseDecls.size());

SmallVector (unlike std::vector) does allow you to take the address of  
element 0 when the array is empty, so you can drop the check if you  
prefer.


Overall, the patch looks great.  Please update it and commit when  
you're happy with it.  We can continue any remaining discussion after  
the patch goes in.  Thanks Doug,

-Chris