[cfe-commits] r58071 - in /cfe/trunk: include/clang/Basic/DiagnosticKinds.def lib/Driver/TextDiagnosticBuffer.cpp lib/Sema/Sema.h lib/Sema/SemaExpr.cpp lib/Sema/SemaExprCXX.cpp lib/Sema/SemaInherit.cpp lib/Sema/SemaInherit.h lib/Sema/SemaOverload

Fri Oct 24 09:15:12 PDT 2008

On Fri, Oct 24, 2008 at 12:10 AM, Chris Lattner <clattner at apple.com> wrote:
> On Oct 23, 2008, at 9:54 PM, Douglas Gregor wrote:
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=58071&view=rev
>> Log:
>> First non-embarrassing cut at checking for ambiguous derived-to-base
>> conversions.
>
> Great,

Thanks for the review.

>> namespace clang {
>
> In .cpp files, please use 'using namespace clang' instead of wrapping the
> whole file in {}'s.  This is for consistency with other files and helps some
> editors with indentation.

Ah, that explains things. Okay.

>> +/// isAmbiguous - Determines whether the set of paths provided is
>> +/// ambiguous, i.e., there are two or more paths that refer to
>> +/// different base class subobjects of the same type. BaseType must be
>> +/// an unqualified, canonical class type.
>> +bool BasePaths::isAmbiguous(QualType BaseType) {
>
> Please assert about invariants like BaseType being canonical.

Okay.

>> +/// CheckDerivedToBaseConversion - Check whether the Derived-to-Base
>> +/// conversion (where Derived and Base are class types) is
>> +/// well-formed, meaning that the conversion is unambiguous (and
>> +/// FIXME: that all of the base classes are accessible). Returns true
>> +/// and emits a diagnostic if the code is ill-formed, returns false
>> +/// otherwise. Loc is the location where this routine should point to
>> +/// if there is an error, and Range is the source range to highlight
>> +/// if there is an error.
>> +bool
>> +Sema::CheckDerivedToBaseConversion(SourceLocation Loc, SourceRange Range,
>> +                                   QualType Derived, QualType Base) {
>
> Nice.  Generally, I'd prefer loc/range would be passes later in the arg list
> to methods like this, passing Derived/Base as the first two arguments.  This
> makes it more obvious to a casual reader that Derived/Base are the important
> points here, and Loc/Range are informative.  The general convention is that
> loc info gets passed later except in special cases (e.g. actions in which
> case it is passed in lexical order usually).

Sure.

>> +  // First, determine whether the path from Derived to Base is
>> +  // ambiguous. This is slightly more expensive than checking whether
>> +  // the Derived to Base conversion exists, because here we need to
>> +  // explore multiple paths to determine if there is an ambiguity.
>> +  BasePaths Paths(/*FindAmbiguities=*/true, /*RecordPaths=*/false);
>> +  bool DerivationOkay = IsDerivedFrom(Derived, Base, Paths);
>> +  assert(DerivationOkay && "Can only be used with a derived-to-base
>> conversion");
>> +  if (!DerivationOkay)
>> +    return true;
>
> Isn't this 'if' dead according to the assertion?

I'm being paranoid. The assertion goes away when NDEBUG is defined, in
which case I want (1) not to get a warning about DerivationOkay being
unused, and (2) to not crash if the assertion would have fired.

>> +  if
>> (Paths.isAmbiguous(Context.getCanonicalType(Base).getUnqualifiedType())) {
>
> Just in terms of code structuring, I generally recommend that code be
> structured to encourage early outs and fall-throughs.  In this case,
> something like this would be preferable:
>
> if (!Paths.isAmbiguous(Context.getCanonicalType(Base).getUnqualifiedType()))
>  return false;
> ... error handle code ...
>
> The general idea is that each extra level of indentation requires the reader
> to keep the context that caused the indentation in mind when reading the
> code.  Reducing indentation is good for both staying within 80 columns as
> well as aiding reader comprehension.

Okay, fair enough.

>> +    // We know that the derived-to-base conversion is
>> +    // ambiguous.
>
> Please add "and we are thus about to emit an error." or something like that.
>  That makes it immediately obvious why it is ok to do a computationally
> expensive thing here, and indicates to readers like me that I shouldn't care
> about use of std::set in this case ;-)

Sure.

>> +  class BasePaths {
>> +    /// Paths - The actual set of paths that can be taken from the
>> +    /// derived class to the same base class.
>> +    std::list<BasePath> Paths;
>> +
>> +    /// ClassSubobjects - Records the class subobjects for each class
>> +    /// type that we've seen. The first element in the pair says
>> +    /// whether we found a path to a virtual base for that class type,
>> +    /// while the element contains the number of non-virtual base
>> +    /// class subobjects for that class type. The key of the map is
>> +    /// the cv-unqualified canonical type of the base class subobject.
>> +    std::map<QualType, std::pair<bool, unsigned>, QualTypeOrdering>
>> +      ClassSubobjects;
>
> The choice of data-structures is interesting here.  std::list and std::map
> are both particularly expensive (being node-based containers) and end up
> being malloc/free limited in common cases.  OTOH, these are hopefully
> usually small.
>
> I guess I don't have any specific suggestions about this, but use of these
> classes always raises a red flag.  If this code ends up being a perf issue,
> we can always reevaluate/tune then.

To try to ease your mind a little... the std::list is only used once
we know we're producing a diagnostic. The map is needed for ambiguity
checking, because we need to keep track of which subobjects we've seen
of each type. A hash map would probably be better, but it didn't look
like we could rely on llvm/ADT/hash_map.h always working.

>> +// RUN: clang -fsyntax-only -verify %s
>> +class A { };
>> +class B : public A { };
>> +class C : public A { };
>> +class D : public B, public C { };
>> +
>> +void f(D* d) {
>> +  A* a;
>> +  a = d; // expected-error{{ambiguous conversion from derived class
>> 'class D' to base class 'class A'}} expected-error{{incompatible type
>> assigning 'class D *', expected 'class A *'}}
>> +}
>
> Why do do these produce two diagnostics?

It'll go down to one diagnostic soon; I haven't tweaked
Sema::DiagnoseAssignmentResult to work well with C++-specific errors
well (yet).

>> +
>> +class Object2 { };
>> +class A2 : public Object2 { };
>> +class B2 : public virtual A2 { };
>> +class C2 : virtual public A2 { };
>> +class D2 : public B2, public C2 { };
>> +class E2 : public D2, public C2, public virtual A2 { };
>> +class F2 : public E2, public A2 { };
>
> FYI, G++ produces these two warnings as well:
>
> test/SemaCXX/derived-to-base-ambig.cpp:17: warning: direct base 'C2'
> inaccessible in 'E2' due to ambiguity
> test/SemaCXX/derived-to-base-ambig.cpp:18: warning: direct base 'A2'
> inaccessible in 'F2' due to ambiguity
>
> I realize there are tons of things still missing, but I just thought I would
> point it out since I happened to notice it.  Great stuff Doug!

Yeah, GCC is noting that there are some conversions that just can't
ever be done. Eventually, we can do this sort of thing, but it's not
required by the spec.

  - Doug