[cfe-commits] r58817 - in /cfe/trunk: include/clang/AST/ include/clang/Basic/ include/clang/Parse/ lib/AST/ lib/Basic/ lib/Parse/ lib/Sema/ test/SemaCXX/ www/

Mon Nov 17 08:17:37 PST 2008

On Sun, Nov 16, 2008 at 11:46 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Nov 6, 2008, at 2:13 PM, Douglas Gregor wrote:
>
>> Author: dgregor
>> Date: Thu Nov  6 16:13:31 2008
>> New Revision: 58817
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=58817&view=rev
>> Log:
>> Parsing, ASTs, and semantic analysis for the declaration of overloaded
>> operators in C++. Overloaded operators can be called directly via
>> their operator-function-ids, e.g., "operator+(foo, bar)", but we don't
>> yet implement the semantics of operator overloading to handle, e.g.,
>> "foo + bar".
>
> Very nice work, some thoughts (sorry for the delay!):

Thanks for the review!

>> +// C++ operator overloading
>> +DIAG(err_expected_operator, ERROR,
>> +     "expected 'operator' keyword")
>> +DIAG(err_operator_overload_needs_class_or_enum, ERROR,
>> +     "non-member overloaded operator '%0' must have at least one
>> parameter of class or enumeration type (or reference thereof)")
>
> It isn't critical, but I'd prefer this file to stay in 80 columns.  Two ways
> to do this: 1) rely on string concatenation to break the string across two
> lines:
>
>   "non-member overloaded operator '%0' must have at least one "
>   "parameter of class or enumeration type (or reference thereof)"

Okay.

> 2) shrink the diagnostic to be just as clear but more terse.
>
> If #2 is possible, it is preferable.  Having concise diagnostics is better
> than wordy ones so long as they don't lose clarity IMO.  In this case, how
> about:
>
>  "overloaded '%0' must have a parameter with class or enumeration type"
>
> The fact that it is non-member should be clear from the context printed with
> the diagnostic (yay for good location info).  I think "overloaded operator
> 'operator*'" is somewhat redundant (if you agree, please change the other
> diags as well to be consistent).  The mention of reference types is pedantic
> and correct, but I don't think it is necessary to get the point across.
>  What do you think?

Sure, but I just want to say that I was *very* proud to have snuck
"thereof" into a compiler error message.

==============================================================================
>> --- cfe/trunk/include/clang/Basic/IdentifierTable.h (original)
>> +++ cfe/trunk/include/clang/Basic/IdentifierTable.h Thu Nov  6 16:13:31
>> 2008
>> @@ -49,11 +50,12 @@
>>  // First NUM_OBJC_KEYWORDS values are for Objective-C, the remaining
>> values
>>  // are for builtins.
>>  unsigned ObjCOrBuiltinID    :10;
>> +  unsigned OperatorID         : 6; // C++ overloaded operator.
>>  bool HasMacro               : 1; // True if there is a #define for this.
>>  bool IsExtension            : 1; // True if identifier is a lang
>> extension.
>>  bool IsPoisoned             : 1; // True if identifier is poisoned.
>>  bool IsCPPOperatorKeyword   : 1; // True if ident is a C++ operator
>> keyword.
>> -  // 10 bits left in 32-bit word.
>> +  // 4 bits left in 32-bit word.
>>  void *FETokenInfo;               // Managed by the language front-end.
>>  IdentifierInfo(const IdentifierInfo&);  // NONCOPYABLE.
>>  void operator=(const IdentifierInfo&);  // NONASSIGNABLE.
>
> This comes back to whether something like "operator->" should be an
> identifier.  I think that the new and shiny DeclarationName is a better way
> of handling this, see below.
>
>> @@ -161,6 +173,11 @@
>>  // BumpPtrAllocator!
>>  typedef llvm::StringMap<IdentifierInfo, llvm::BumpPtrAllocator>
>> HashTableTy;
>>  HashTableTy HashTable;
>> +
>> +  /// OverloadedOperators - Identifiers corresponding to each of the
>> +  /// overloadable operators in C++.
>> +  IdentifierInfo *OverloadedOperators[NUM_OVERLOADED_OPERATORS];
>
> Doing this with DeclarationName eliminates the need to populate and look
> these up.  The DeclarationName would just be "I'm an overloaded operator, of
> form OO_Plus".  The getName() method would return "operator+" for it.

Yeah, I was thinking about this while writing up the documentation on
DeclarationName. DeclarationNameExtra::ExtraKind is going to end up
getting a bunch of new values, and this change will have to come
*after* we teach IdentifierResolver to work with DeclarationNames. I
think the end result will be better, though: IdentifierInfo can go
back to just dealing with identifiers and keywords, and
DeclarationName will encompass all kinds of names.

>> +++ cfe/trunk/include/clang/Parse/Parser.h Thu Nov  6 16:13:31 2008
>> @@ -728,6 +728,10 @@
>>  void ParseBaseClause(DeclTy *ClassDecl);
>>  BaseResult ParseBaseSpecifier(DeclTy *ClassDecl);
>>  AccessSpecifier getAccessSpecifierIfPresent() const;
>> +
>> +
>>  //===--------------------------------------------------------------------===//
>> +  // C++ 13.5: Overloaded operators [over.oper]
>> +  IdentifierInfo *MaybeParseOperatorFunctionId();
>
> This is pedantic and picky, but I don't like the name
> 'MaybeParseOperatorFunctionId'. :)
>
> I see that we do use MaybeParseXXX in a couple other places in the parser,
> but we also use "MayBeXXX" as a predicate.  Are you ok with
> "TryParseOperatorFunctionId"?

Sure.

>> +++ cfe/trunk/lib/AST/Decl.cpp Thu Nov  6 16:13:31 2008
>> @@ -210,6 +210,15 @@
>>  return NumRequiredArgs;
>> }
>>
>> +/// getOverloadedOperator - Which C++ overloaded operator this
>> +/// function represents, if any.
>> +OverloadedOperatorKind FunctionDecl::getOverloadedOperator() const {
>> +  if (getIdentifier())
>> +    return getIdentifier()->getOverloadedOperatorID();
>> +  else
>> +    return OO_None;
>
> I think this would be more elegant with a DeclarationName-based solution.

It will be.

>> +++ cfe/trunk/lib/Parse/ParseExprCXX.cpp Thu Nov  6 16:13:31 2008
>> @@ -291,3 +291,78 @@
>>  ConsumeToken();
>>  DS.Finish(Diags, PP.getSourceManager(), getLang());
>> }
>> +
>> +/// MaybeParseOperatorFunctionId - Attempts to parse a C++ overloaded
>> +/// operator name (C++ [over.oper]). If successful, returns the
>> +/// predefined identifier that corresponds to that overloaded
>> +/// operator. Otherwise, returns NULL and does not consume any tokens.
>> +///
>> +///       operator-function-id: [C++ 13.5]
>> +///         'operator' operator
>> +///
>> +/// operator: one of
>> +///            new   delete  new[]   delete[]
>> +///            +     -    *  /    %  ^    &   |   ~
>> +///            !     =    <  >    += -=   *=  /=  %=
>> +///            ^=    &=   |= <<   >> >>= <<=  ==  !=
>> +///            <=    >=   && ||   ++ --   ,   ->* ->
>> +///            ()    []
>> +IdentifierInfo *Parser::MaybeParseOperatorFunctionId() {
>
>> +  if (Tok.isNot(tok::kw_operator))
>> +    return 0;
>
> This this just assert that the current token is 'operator'?  That would make
> the contract very clear to callers and is how a lot of other code in the
> parser works.

Yup, Argiris fixed this.

>> +  OverloadedOperatorKind Op = OO_None;
>> +  switch (NextToken().getKind()) {
>
> ...
>>
>> +  default:
>> +    break;
>
> How about return 0 instead of break?

Okay.

>>
>> +  }
>> +
>> +  if (Op == OO_None)
>> +    return 0;
>
> That would make this dead, and eliminate the need for the if.

Sure.

>> +++ cfe/trunk/lib/Sema/SemaDeclCXX.cpp Thu Nov  6 16:13:31 2008
>> @@ -1576,3 +1576,242 @@
>> +/// CheckOverloadedOperatorDeclaration - Check whether the declaration
>> +/// of this overloaded operator is well-formed. If so, returns false;
>> +/// otherwise, emits appropriate diagnostics and returns true.
>> +bool Sema::CheckOverloadedOperatorDeclaration(FunctionDecl *FnDecl) {
>> +  assert(FnDecl && FnDecl->getOverloadedOperator() != OO_None &&
>
> You could use isOverloadedOperator() ?

Sure.

>> +  // C++ [over.oper]p6:
>> +  //   An operator function shall either be a non-static member
>> +  //   function or be a non-member function and have at least one
>> +  //   parameter whose type is a class, a reference to a class, an
>> +  //   enumeration, or a reference to an enumeration.
>> +  CXXMethodDecl *MethodDecl = dyn_cast<CXXMethodDecl>(FnDecl);
>> +  if (MethodDecl) {
>
> I see that you use MethodDecl as a predicate later in the function.  As a
> reader, it actually helps me understand the code if you put the dyncast in
> the if condition, and use "isa<CXXMethodDecl>(FnDecl)" later in the method
> for the predicate cases.  This is just fewer variables to keep track of.
>  Does that seem reasonable?

Hrm, I guess so. Is isa<> relatively cheap? (I guess this isn't a hot
path, but...).

>>
>> +    if (MethodDecl->isStatic()) {
>> +      Diag(FnDecl->getLocation(),
>> +           diag::err_operator_overload_static,
>> +           FnDecl->getName(),
>> +           SourceRange(FnDecl->getLocation()));
>> +      IsInvalid = true;
>> +
>> +      // Pretend this isn't a member function; it'll supress
>> +      // additional, unnecessary error messages.
>> +      MethodDecl = 0;
>
> Is there a reason to do any of the rest of the checking here?  Why not just
> return true?  Here and in other places you continue after emitting the first
> diagnostic.  While I do agree about trying to recover after errors, I think
> that this routine might lean too much in the direction of causing a big
> chain of nonsensical gibberish after a simple error.  Do you think it would
> be better to exit immediately after the first error (which would mean we'd
> emit at most one diagnostic per operator)?  What are the tradeoffs?

In this routine, it's okay to exit early rather than continue emitting
errors after the first error. In other places (e.g.,
CheckDestructorDeclarator), we do some normalization of the declarator
that should always happen to prevent future crashes. This
normalization isn't needed with overloaded operators.

>> +  } else {
>> +    bool ClassOrEnumParam = false;
>> +    for (FunctionDecl::param_iterator Param = FnDecl->param_begin();
>> +         Param != FnDecl->param_end(); ++Param) {
>
> This evaluates param_end() each time through the loop.

Fixed.

>>
>> +      QualType ParamType = (*Param)->getType();
>> +      if (const ReferenceType *RefType = ParamType->getAsReferenceType())
>> +        ParamType = RefType->getPointeeType();
>
> Would using getNonReferenceType() make sense?

Yes, done.

>
>> +  // C++ [over.oper]p8:
>> +  //   [...] Operator functions cannot have more or fewer parameters
>> +  //   than the number required for the corresponding operator, as
>> +  //   described in the rest of this subclause.
>> +  unsigned NumParams = FnDecl->getNumParams() + (MethodDecl? 1 : 0);
>> +  if (Op != OO_Call &&
>> +      ((NumParams == 1 && !CanBeUnaryOperator) ||
>> +       (NumParams == 2 && !CanBeBinaryOperator) ||
>> +       (NumParams < 1) || (NumParams > 2))) {
>
>> +    // We have the wrong number of parameters.
>> +    std::string NumParamsStr = (llvm::APSInt(32) =
>> NumParams).toString(10);
>
> This is scarily clever :), but please just use llvm::utostr(NumParams).  It
> is in llvm/ADT/StringExtras.h

Such overly clever code means that I couldn't find llvm/ADT/StringExtras.h :)

>> +  // Overloaded operators cannot be variadic.
>> +  if (FnDecl->getType()->getAsFunctionTypeProto()->isVariadic()) {
>
> Huh, operator() can't be variadic?  I didn't realize that, but then again, I
> guess I never had need to try :)

Argh! Good catch. operator() can be variadic, because it's an
exception to the rules in this section.

>> +  // Some operators must be non-static member functions.
>> +  if (MustBeMemberOperator && !MethodDecl) {
>> +    Diag(FnDecl->getLocation(),
>> +         diag::err_operator_overload_must_be_member,
>> +         FnDecl->getName(),
>> +         SourceRange(FnDecl->getLocation()));
>
> Why do these specify the identifier location as a source range?  I think the
> effect of this is that it will underline the first token of the name (which
> is "operator" in this case?).  Is that intended?  If not, I'd just drop it
> and rely on the caret.

Yeah, I like it better with just the caret.

> A source range would be better suited when emitting
> errors about parameters to the operator (underlying the parameter in
> question).

Unfortunately, ParmVarDecl doesn't have quite enough information to do
this. We currently point at the parameter name (or the place it would
occur, for unnamed parameters). We might be able to cram some more
source range information into DeclaratorChunk::ParamInfo and look back
at that to get the underlining...

  - Doug