[cfe-commits] r58071 - in /cfe/trunk: include/clang/Basic/DiagnosticKinds.def lib/Driver/TextDiagnosticBuffer.cpp lib/Sema/Sema.h lib/Sema/SemaExpr.cpp lib/Sema/SemaExprCXX.cpp lib/Sema/SemaInherit.cpp lib/Sema/SemaInherit.h lib/Sema/SemaOverload.cpp test/SemaCXX/derived-to-base-ambig.cpp

Fri Oct 24 00:10:04 PDT 2008

On Oct 23, 2008, at 9:54 PM, Douglas Gregor wrote:
> URL: http://llvm.org/viewvc/llvm-project?rev=58071&view=rev
> Log:
> First non-embarrassing cut at checking for ambiguous derived-to-base
> conversions.

Great,

> =
> =
> =
> =
> =
> =
> =
> =
> ======================================================================
> --- cfe/trunk/lib/Sema/SemaInherit.cpp (original)
> +++ cfe/trunk/lib/Sema/SemaInherit.cpp Thu Oct 23 23:54:22 2008
> @@ -14,19 +14,56 @@
> // 
> = 
> = 
> = 
> ----------------------------------------------------------------------= 
> ==//
>
> +#include <memory>
> +#include <set>
> +#include <string>
>
> namespace clang {

In .cpp files, please use 'using namespace clang' instead of wrapping  
the whole file in {}'s.  This is for consistency with other files and  
helps some editors with indentation.

> +/// isAmbiguous - Determines whether the set of paths provided is
> +/// ambiguous, i.e., there are two or more paths that refer to
> +/// different base class subobjects of the same type. BaseType must  
> be
> +/// an unqualified, canonical class type.
> +bool BasePaths::isAmbiguous(QualType BaseType) {

Please assert about invariants like BaseType being canonical.

> +/// CheckDerivedToBaseConversion - Check whether the Derived-to-Base
> +/// conversion (where Derived and Base are class types) is
> +/// well-formed, meaning that the conversion is unambiguous (and
> +/// FIXME: that all of the base classes are accessible). Returns true
> +/// and emits a diagnostic if the code is ill-formed, returns false
> +/// otherwise. Loc is the location where this routine should point to
> +/// if there is an error, and Range is the source range to highlight
> +/// if there is an error.
> +bool
> +Sema::CheckDerivedToBaseConversion(SourceLocation Loc, SourceRange  
> Range,
> +                                   QualType Derived, QualType Base) {

Nice.  Generally, I'd prefer loc/range would be passes later in the  
arg list to methods like this, passing Derived/Base as the first two  
arguments.  This makes it more obvious to a casual reader that Derived/ 
Base are the important points here, and Loc/Range are informative.   
The general convention is that loc info gets passed later except in  
special cases (e.g. actions in which case it is passed in lexical  
order usually).

> +  // First, determine whether the path from Derived to Base is
> +  // ambiguous. This is slightly more expensive than checking whether
> +  // the Derived to Base conversion exists, because here we need to
> +  // explore multiple paths to determine if there is an ambiguity.
> +  BasePaths Paths(/*FindAmbiguities=*/true, /*RecordPaths=*/false);
> +  bool DerivationOkay = IsDerivedFrom(Derived, Base, Paths);
> +  assert(DerivationOkay && "Can only be used with a derived-to-base  
> conversion");
> +  if (!DerivationOkay)
> +    return true;

Isn't this 'if' dead according to the assertion?

> +  if  
> (Paths 
> .isAmbiguous(Context.getCanonicalType(Base).getUnqualifiedType())) {

Just in terms of code structuring, I generally recommend that code be  
structured to encourage early outs and fall-throughs.  In this case,  
something like this would be preferable:

if (! 
Paths.isAmbiguous(Context.getCanonicalType(Base).getUnqualifiedType()))
   return false;
... error handle code ...

The general idea is that each extra level of indentation requires the  
reader to keep the context that caused the indentation in mind when  
reading the code.  Reducing indentation is good for both staying  
within 80 columns as well as aiding reader comprehension.

> +    // We know that the derived-to-base conversion is
> +    // ambiguous.

Please add "and we are thus about to emit an error." or something like  
that.  That makes it immediately obvious why it is ok to do a  
computationally expensive thing here, and indicates to readers like me  
that I shouldn't care about use of std::set in this case ;-)

> +    // Build up a textual representation of the ambiguous paths,  
> e.g.,
> +    // D -> B -> A, that will be used to illustrate the ambiguous
> +    // conversions in the diagnostic. We only print one of the paths
> +    // to each base class subobject.

This is extraordinarily cool btw!

> +  class BasePaths {
> +    /// Paths - The actual set of paths that can be taken from the
> +    /// derived class to the same base class.
> +    std::list<BasePath> Paths;
> +
> +    /// ClassSubobjects - Records the class subobjects for each class
> +    /// type that we've seen. The first element in the pair says
> +    /// whether we found a path to a virtual base for that class  
> type,
> +    /// while the element contains the number of non-virtual base
> +    /// class subobjects for that class type. The key of the map is
> +    /// the cv-unqualified canonical type of the base class  
> subobject.
> +    std::map<QualType, std::pair<bool, unsigned>, QualTypeOrdering>
> +      ClassSubobjects;

The choice of data-structures is interesting here.  std::list and  
std::map are both particularly expensive (being node-based containers)  
and end up being malloc/free limited in common cases.  OTOH, these are  
hopefully usually small.

I guess I don't have any specific suggestions about this, but use of  
these classes always raises a red flag.  If this code ends up being a  
perf issue, we can always reevaluate/tune then.

>
>
> +/// CheckPointerConversion - Check the pointer conversion from the
> +/// expression From to the type ToType. This routine checks for
> +/// ambiguous (FIXME: or inaccessible) derived-to-base pointer
> +/// conversions for which IsPointerConversion has already returned
> +/// true. It returns true and produces a diagnostic if there was an
> +/// error, or returns false otherwise.
> +bool Sema::CheckPointerConversion(Expr *From, QualType ToType) {
> +  QualType FromType = From->getType();
> +
> +  if (const PointerType *FromPtrType = FromType->getAsPointerType())
> +    if (const PointerType *ToPtrType = ToType->getAsPointerType()) {

This code is fine as it is, because it is very small.  However, if the  
body grows over time, please prefer to write the code like this (again  
to encourage early outs and low nesting):

> + const PointerType *FromPtrType = FromType->getAsPointerType();
     if (!FromPtrType) return false;
...

To reiterate, this is a great thing to do and is "beautiful code" when  
the body is small.  It just gets unwieldy (imo) if the body gets large.

> +// RUN: clang -fsyntax-only -verify %s
> +class A { };
> +class B : public A { };
> +class C : public A { };
> +class D : public B, public C { };
> +
> +void f(D* d) {
> +  A* a;
> +  a = d; // expected-error{{ambiguous conversion from derived class  
> 'class D' to base class 'class A'}} expected-error{{incompatible  
> type assigning 'class D *', expected 'class A *'}}
> +}

Why do do these produce two diagnostics?

> +
> +class Object2 { };
> +class A2 : public Object2 { };
> +class B2 : public virtual A2 { };
> +class C2 : virtual public A2 { };
> +class D2 : public B2, public C2 { };
> +class E2 : public D2, public C2, public virtual A2 { };
> +class F2 : public E2, public A2 { };

FYI, G++ produces these two warnings as well:

test/SemaCXX/derived-to-base-ambig.cpp:17: warning: direct base ‘C2’  
inaccessible in ‘E2’ due to ambiguity
test/SemaCXX/derived-to-base-ambig.cpp:18: warning: direct base ‘A2’  
inaccessible in ‘F2’ due to ambiguity

I realize there are tons of things still missing, but I just thought I  
would point it out since I happened to notice it.  Great stuff Doug!

-Chris