[cfe-commits] r58817 - in /cfe/trunk: include/clang/AST/ include/clang/Basic/ include/clang/Parse/ lib/AST/ lib/Basic/ lib/Parse/ lib/Sema/ test/SemaCXX/ www/

Chris Lattner clattner at apple.com
Sun Nov 16 20:46:38 PST 2008


On Nov 6, 2008, at 2:13 PM, Douglas Gregor wrote:

> Author: dgregor
> Date: Thu Nov  6 16:13:31 2008
> New Revision: 58817
>
> URL: http://llvm.org/viewvc/llvm-project?rev=58817&view=rev
> Log:
> Parsing, ASTs, and semantic analysis for the declaration of overloaded
> operators in C++. Overloaded operators can be called directly via
> their operator-function-ids, e.g., "operator+(foo, bar)", but we don't
> yet implement the semantics of operator overloading to handle, e.g.,
> "foo + bar".

Very nice work, some thoughts (sorry for the delay!):

> =
> =
> =
> =
> =
> =
> =
> =
> ======================================================================
> --- cfe/trunk/include/clang/Basic/DiagnosticKinds.def (original)
> +++ cfe/trunk/include/clang/Basic/DiagnosticKinds.def Thu Nov  6  
> 16:13:31 2008
> @@ -1288,6 +1288,28 @@
> DIAG(err_ambiguous_derived_to_base_conv, ERROR,
>      "ambiguous conversion from derived class '%0' to base class  
> '%1':%2")
>
> +// C++ operator overloading
> +DIAG(err_expected_operator, ERROR,
> +     "expected 'operator' keyword")
> +DIAG(err_operator_overload_needs_class_or_enum, ERROR,
> +     "non-member overloaded operator '%0' must have at least one  
> parameter of class or enumeration type (or reference thereof)")

It isn't critical, but I'd prefer this file to stay in 80 columns.   
Two ways to do this: 1) rely on string concatenation to break the  
string across two lines:

    "non-member overloaded operator '%0' must have at least one "
    "parameter of class or enumeration type (or reference thereof)"

2) shrink the diagnostic to be just as clear but more terse.

If #2 is possible, it is preferable.  Having concise diagnostics is  
better than wordy ones so long as they don't lose clarity IMO.  In  
this case, how about:

   "overloaded '%0' must have a parameter with class or enumeration  
type"

The fact that it is non-member should be clear from the context  
printed with the diagnostic (yay for good location info).  I think  
"overloaded operator 'operator*'" is somewhat redundant (if you agree,  
please change the other diags as well to be consistent).  The mention  
of reference types is pedantic and correct, but I don't think it is  
necessary to get the point across.  What do you think?

> = 
> = 
> = 
> = 
> = 
> = 
> = 
> = 
> ======================================================================
> --- cfe/trunk/include/clang/Basic/IdentifierTable.h (original)
> +++ cfe/trunk/include/clang/Basic/IdentifierTable.h Thu Nov  6  
> 16:13:31 2008
> @@ -49,11 +50,12 @@
>   // First NUM_OBJC_KEYWORDS values are for Objective-C, the  
> remaining values
>   // are for builtins.
>   unsigned ObjCOrBuiltinID    :10;
> +  unsigned OperatorID         : 6; // C++ overloaded operator.
>   bool HasMacro               : 1; // True if there is a #define for  
> this.
>   bool IsExtension            : 1; // True if identifier is a lang  
> extension.
>   bool IsPoisoned             : 1; // True if identifier is poisoned.
>   bool IsCPPOperatorKeyword   : 1; // True if ident is a C++  
> operator keyword.
> -  // 10 bits left in 32-bit word.
> +  // 4 bits left in 32-bit word.
>   void *FETokenInfo;               // Managed by the language front- 
> end.
>   IdentifierInfo(const IdentifierInfo&);  // NONCOPYABLE.
>   void operator=(const IdentifierInfo&);  // NONASSIGNABLE.

This comes back to whether something like "operator->" should be an  
identifier.  I think that the new and shiny DeclarationName is a  
better way of handling this, see below.

> @@ -161,6 +173,11 @@
>   // BumpPtrAllocator!
>   typedef llvm::StringMap<IdentifierInfo, llvm::BumpPtrAllocator>  
> HashTableTy;
>   HashTableTy HashTable;
> +
> +  /// OverloadedOperators - Identifiers corresponding to each of the
> +  /// overloadable operators in C++.
> +  IdentifierInfo *OverloadedOperators[NUM_OVERLOADED_OPERATORS];

Doing this with DeclarationName eliminates the need to populate and  
look these up.  The DeclarationName would just be "I'm an overloaded  
operator, of form OO_Plus".  The getName() method would return  
"operator+" for it.

> +++ cfe/trunk/include/clang/Basic/OperatorKinds.def Thu Nov  6  
> 16:13:31 2008
> @@ -0,0 +1,68 @@
> +//===--- OperatorKinds.def - C++ Overloaded Operator Database ---*-  
> C++ -*-===//

In contrast to diagnostics, I don't see any reason to make this file  
stay in 80 cols, and I think that not doing so actually improves  
clarity.

> +++ cfe/trunk/include/clang/Parse/Parser.h Thu Nov  6 16:13:31 2008
> @@ -728,6 +728,10 @@
>   void ParseBaseClause(DeclTy *ClassDecl);
>   BaseResult ParseBaseSpecifier(DeclTy *ClassDecl);
>   AccessSpecifier getAccessSpecifierIfPresent() const;
> +
> +  // 
> = 
> = 
> =-------------------------------------------------------------------- 
> ===//
> +  // C++ 13.5: Overloaded operators [over.oper]
> +  IdentifierInfo *MaybeParseOperatorFunctionId();

This is pedantic and picky, but I don't like the name  
'MaybeParseOperatorFunctionId'. :)

I see that we do use MaybeParseXXX in a couple other places in the  
parser, but we also use "MayBeXXX" as a predicate.  Are you ok with  
"TryParseOperatorFunctionId"?

> +++ cfe/trunk/lib/AST/Decl.cpp Thu Nov  6 16:13:31 2008
> @@ -210,6 +210,15 @@
>   return NumRequiredArgs;
> }
>
> +/// getOverloadedOperator - Which C++ overloaded operator this
> +/// function represents, if any.
> +OverloadedOperatorKind FunctionDecl::getOverloadedOperator() const {
> +  if (getIdentifier())
> +    return getIdentifier()->getOverloadedOperatorID();
> +  else
> +    return OO_None;

I think this would be more elegant with a DeclarationName-based  
solution.

> +++ cfe/trunk/lib/Parse/ParseExprCXX.cpp Thu Nov  6 16:13:31 2008
> @@ -291,3 +291,78 @@
>   ConsumeToken();
>   DS.Finish(Diags, PP.getSourceManager(), getLang());
> }
> +
> +/// MaybeParseOperatorFunctionId - Attempts to parse a C++ overloaded
> +/// operator name (C++ [over.oper]). If successful, returns the
> +/// predefined identifier that corresponds to that overloaded
> +/// operator. Otherwise, returns NULL and does not consume any  
> tokens.
> +///
> +///       operator-function-id: [C++ 13.5]
> +///         'operator' operator
> +///
> +/// operator: one of
> +///            new   delete  new[]   delete[]
> +///            +     -    *  /    %  ^    &   |   ~
> +///            !     =    <  >    += -=   *=  /=  %=
> +///            ^=    &=   |= <<   >> >>= <<=  ==  !=
> +///            <=    >=   && ||   ++ --   ,   ->* ->
> +///            ()    []
> +IdentifierInfo *Parser::MaybeParseOperatorFunctionId() {

> +  if (Tok.isNot(tok::kw_operator))
> +    return 0;

This this just assert that the current token is 'operator'?  That  
would make the contract very clear to callers and is how a lot of  
other code in the parser works.

>
> +
> +  OverloadedOperatorKind Op = OO_None;
> +  switch (NextToken().getKind()) {
...
> +  default:
> +    break;

How about return 0 instead of break?

>
> +  }
> +
> +  if (Op == OO_None)
> +    return 0;

That would make this dead, and eliminate the need for the if.

> +++ cfe/trunk/lib/Sema/SemaDeclCXX.cpp Thu Nov  6 16:13:31 2008
> @@ -1576,3 +1576,242 @@
> +/// CheckOverloadedOperatorDeclaration - Check whether the  
> declaration
> +/// of this overloaded operator is well-formed. If so, returns false;
> +/// otherwise, emits appropriate diagnostics and returns true.
> +bool Sema::CheckOverloadedOperatorDeclaration(FunctionDecl *FnDecl) {
> +  assert(FnDecl && FnDecl->getOverloadedOperator() != OO_None &&

You could use isOverloadedOperator() ?

> +  // C++ [over.oper]p6:
> +  //   An operator function shall either be a non-static member
> +  //   function or be a non-member function and have at least one
> +  //   parameter whose type is a class, a reference to a class, an
> +  //   enumeration, or a reference to an enumeration.
> +  CXXMethodDecl *MethodDecl = dyn_cast<CXXMethodDecl>(FnDecl);
> +  if (MethodDecl) {

I see that you use MethodDecl as a predicate later in the function.   
As a reader, it actually helps me understand the code if you put the  
dyncast in the if condition, and use "isa<CXXMethodDecl>(FnDecl)"  
later in the method for the predicate cases.  This is just fewer  
variables to keep track of.  Does that seem reasonable?

>
> +    if (MethodDecl->isStatic()) {
> +      Diag(FnDecl->getLocation(),
> +           diag::err_operator_overload_static,
> +           FnDecl->getName(),
> +           SourceRange(FnDecl->getLocation()));
> +      IsInvalid = true;
> +
> +      // Pretend this isn't a member function; it'll supress
> +      // additional, unnecessary error messages.
> +      MethodDecl = 0;

Is there a reason to do any of the rest of the checking here?  Why not  
just return true?  Here and in other places you continue after  
emitting the first diagnostic.  While I do agree about trying to  
recover after errors, I think that this routine might lean too much in  
the direction of causing a big chain of nonsensical gibberish after a  
simple error.  Do you think it would be better to exit immediately  
after the first error (which would mean we'd emit at most one  
diagnostic per operator)?  What are the tradeoffs?


> +  } else {
> +    bool ClassOrEnumParam = false;
> +    for (FunctionDecl::param_iterator Param = FnDecl->param_begin();
> +         Param != FnDecl->param_end(); ++Param) {

This evaluates param_end() each time through the loop.

>
> +      QualType ParamType = (*Param)->getType();
> +      if (const ReferenceType *RefType = ParamType- 
> >getAsReferenceType())
> +        ParamType = RefType->getPointeeType();

Would using getNonReferenceType() make sense?


> +  // C++ [over.oper]p8:
> +  //   [...] Operator functions cannot have more or fewer parameters
> +  //   than the number required for the corresponding operator, as
> +  //   described in the rest of this subclause.
> +  unsigned NumParams = FnDecl->getNumParams() + (MethodDecl? 1 : 0);
> +  if (Op != OO_Call &&
> +      ((NumParams == 1 && !CanBeUnaryOperator) ||
> +       (NumParams == 2 && !CanBeBinaryOperator) ||
> +       (NumParams < 1) || (NumParams > 2))) {

> +    // We have the wrong number of parameters.
> +    std::string NumParamsStr = (llvm::APSInt(32) =  
> NumParams).toString(10);

This is scarily clever :), but please just use  
llvm::utostr(NumParams).  It is in llvm/ADT/StringExtras.h

> +  // Overloaded operators cannot be variadic.
> +  if (FnDecl->getType()->getAsFunctionTypeProto()->isVariadic()) {

Huh, operator() can't be variadic?  I didn't realize that, but then  
again, I guess I never had need to try :)

> +  // Some operators must be non-static member functions.
> +  if (MustBeMemberOperator && !MethodDecl) {
> +    Diag(FnDecl->getLocation(),
> +         diag::err_operator_overload_must_be_member,
> +         FnDecl->getName(),
> +         SourceRange(FnDecl->getLocation()));

Why do these specify the identifier location as a source range?  I  
think the effect of this is that it will underline the first token of  
the name (which is "operator" in this case?).  Is that intended?  If  
not, I'd just drop it and rely on the caret.  A source range would be  
better suited when emitting errors about parameters to the operator  
(underlying the parameter in question).

-Chris



More information about the cfe-commits mailing list