[cfe-commits] Support for C++ new and delete

Thu Nov 20 11:16:32 PST 2008

Hi Sebastian,

On Nov 19, 2008, at 6:37 PM, Sebastian Redl wrote:
> I've enhanced and updated my patch to support C++ new and delete. It  
> doesn't support everything yet, but it's a pretty good start, and  
> I'd like to get the current state checked in.

Very nice. I'd like to discuss the VLAs issue a bit (see below), but  
otherwise I think it's good to commit this as the first (big) step.  
Thanks!

In Class CXXNewExpr:

+  // Points to the allocation function used.
+  Decl *OperatorNew;
+  // Points to the deallocation function used in case of error. May  
be null.
+  Decl *OperatorDelete;

Shouldn't these be FunctionDecl pointers, rather than Decl pointers?

+  // Points to the constructor used. Cannot be null if AllocType is a  
record;
+  // it would still point at the default constructor (even an  
implicit one).
+  // Must be null for all other types.
+  Decl *Constructor;

Same question here: why not make it a CXXConstructorDecl pointer?

+  // The type to be allocated. Is either *ty or a VLA of that type.
+  QualType AllocType;

I'm a little unnerved by this use of VLAs. It would seem conceptually  
simpler to have AllocType be non-VLA type, and then have a (possibly- 
NULL) Expr * for the number of allocated elements. More comments on  
this below...

+  ~CXXNewExpr() {
+    delete[] SubExprs;
+  }

You'll need to delete each of the SubExprs, too. I suggest moving the  
destructor out-of-line, since it'll end up having a loop.

+  // Points to the operator delete overload that is used. Could be a  
member.
+  Decl *OperatorDelete;

This will always be a FunctionDecl *, right? (Or a subclass)

+  // The pointer expression to be deleted.
+  Stmt *Argument;

Could this be an Expr * rather than a Stmt *?

+  // Location of the expression.
+  SourceLocation Loc;

Is this the location of the "delete" operator? It should probably be  
called OpLoc or DeleteLoc.

+Action::ExprResult
+Sema::ActOnCXXNew(SourceLocation StartLoc, bool UseGlobal,
+                  SourceLocation PlacementLParen,
+                  ExprTy **PlacementArgs, unsigned NumPlaceArgs,
+                  SourceLocation PlacementRParen,
+                  SourceLocation TyStart, TypeTy *Ty, SourceLocation  
TyEnd,
+                  SourceLocation ConstructorLParen,
+                  ExprTy **ConstructorArgs, unsigned NumConsArgs,
+                  SourceLocation ConstructorRParen)
+{
+  QualType AllocType = QualType::getFromOpaquePtr(Ty);
+  QualType CheckType = AllocType;
+  // To leverage the existing parser as much as possible, array types  
are
+  // parsed as VLAs. Unwrap for checking.
+  if (const VariableArrayType *VLA =  
Context.getAsVariableArrayType(AllocType)){
+    CheckType = VLA->getElementType();
+  }

I think it makes sense in the parser to deal with the type in the new  
expression like a VLA, because those are the semantics we need.  
However, I think this needs to stay an implementation detail of the  
parser: it shouldn't leak into Sema::isTypeName or Sema::ActOnCXXNew  
the way it is now. Sema::ActOnCXXNew would be simpler if it just  
received Ty and a separate expression for the number of elements to be  
allocated, rather than a VLA.

+  // Validate the type, and unwrap an array if any.
+  if (CheckAllocatedType(CheckType, StartLoc, SourceRange(TyStart,  
TyEnd)))
+    return true;

If we exit early here, we'll leak the PlacementArgs and  
ConstructorArgs. It might make sense to do what ActOnCallExpr does,  
which is to create an llvm::OwningPtr of the final expression in the  
beginning, which will be responsible for deallocating all of the  
subexpressions even if we exit early.

+    // FIXME: This is incorrect for when there is an empty  
initializer and
+    // no user-defined constructor. Must zero-initialize, not default- 
construct.
+    Constructor = PerformInitializationByConstructor(
+                      CheckType, ConsArgs, NumConsArgs,
+                      TyStart, SourceRange(TyStart, ConstructorRParen),
+                      CheckType.getAsString(),
+                      NumConsArgs != 0 ? IK_Direct : IK_Default);

We'll deal with this once we've gone through and updated all of the  
initialization routines to deal with C++ semantics. As you've noted,  
we don't have all of the various cases implemented.

+bool Sema::CheckAllocatedType(QualType AllocType, SourceLocation  
StartLoc,
+                              const SourceRange &TyR)
+{
+  // C++ 5.3.4p1: "[The] type shall be a complete object type, but  
not an
+  //   abstract class type or array thereof.
+  // FIXME: We don't have abstract types yet.
+  if (!AllocType->isObjectType()) {
+    diag::kind msg;
+    if (AllocType->isFunctionType()) {
+      msg = diag::err_new_function;
+    } else if(AllocType->isIncompleteType()) {

I hadn't realized that isObjectType was false for incomplete types.  
That's correct for C99, but not for C++. It's not your problem to deal  
with (unless you want to <g>), but could you please put a

	FIXME: check incomplete type before isObjectType

or something similar into CheckAllocatedType? It'll make it easier to  
fix the problem of isObjectType having a different behavior in C vs. C+ 
+.

+  // Every dimension beyond the first shall be of constant size.
+  while (const ArrayType *Array = Context.getAsArrayType(AllocType)) {
+    if (!Array->isConstantArrayType()) {
+      // FIXME: Might be nice to get a better source range from  
somewhere.
+      Diag(StartLoc, diag::err_new_array_nonconst) << TyR;
+      return true;
+    }
+    AllocType = Array->getElementType();
+  }

I guess the only way to get better source range information would be  
to hold the type in a Declarator (or something like it). Did you  
consider such an approach? It could be used both for better source  
range information and to avoid having to build VLA types as  
intermediates.

+  case CXXNewExprClass:
+    // In theory, there might be new expressions that don't have side  
effects
+    // (e.g. a placement new with an uninitialized POD), but those  
are obscure.

It's an obscure case, but I think it's worthy of a FIXME.  Whether we  
ever get to fix it or not is the real question :)

-  case CXXThisExprClass:
-    return LV_InvalidExpression;

?

"this" is an rvalue according to C++ [expr.prim]p3.

+void CXXNewExpr::EmitImpl(Serializer& S) const {
+  S.Emit(getType());
+  S.Emit(Initializer);
+  S.Emit(NumPlacementArgs);
+  S.Emit(NumConstructorArgs);
+  S.BatchEmitOwnedPtrs(NumPlacementArgs + NumConstructorArgs,  
SubExprs);
+  // FIXME: What about the default allocation functions? We don't  
want to
+  // own them to avoid the mess DeclRefExpr is in, but who would?

For new, it seems to me that the operator new and operator delete will  
either be owned by the class (if we found them in the class) or owned  
by the translation unit  (whether they are implicit or not); the  
constructor will always be owned by the class. So I don't think the  
FIXME needs any fixing.

+void StmtPrinter::VisitCXXNewExpr(CXXNewExpr *E) {
+  // FIXME: How to recover the '::'?
+  OS << "new ";

Why not just put a bit into CXXNewExpr/CXXDeleteExpr that says whether  
the user explicitly wrote the "::"?
(We could do something similar to keep track of whether we parsed a  
new-type-id or a (type-id).)

-void Parser::ParseDeclaratorInternal(Declarator &D, bool PtrOperator) {
+void Parser::ParseDeclaratorInternal(Declarator &D,
+                                     DirectDeclParseFunction  
DirectDeclParser) {
    tok::TokenKind Kind = Tok.getKind();

    // Not a pointer, C++ reference, or block.
    if (Kind != tok::star && (Kind != tok::amp || ! 
getLang().CPlusPlus) &&
        (Kind != tok::caret || !getLang().Blocks)) {
-    if (!PtrOperator)
-      ParseDirectDeclarator(D);
+    if (DirectDeclParser)
+      (this->*DirectDeclParser)(D);
      return;
    }

This is cool.

+void Parser::ParseNewDirectDeclarator(Declarator &D)
+{
+  // Parse the array dimensions.

Please add a comment for this routine.

+bool Parser::ParseExpressionListOrTypeId(ExprListTy &PlacementArgs,  
TypeTy *&Ty)
+{
+  // The '(' was alrady consumed.

Typo "alrady".

	- Doug