[cfe-dev] Patch proposal for __builtin_offsetof construct.

Douglas Gregor dgregor at apple.com
Fri Apr 16 08:36:11 PDT 2010


On Apr 16, 2010, at 3:35 AM, Roberto Amadini wrote:

> Hi,
> 
> I'm the student who is trying to fix the problem mentioned in
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-February/007833.html .
> 
> I followed the guidelines described in the previous messages, creating a
> dedicated class OffsetOfExpr which doesn't contain any sub-expression.
> The offsetof base type is encoded in a TypeSourceInfo field, while
> the sub-components are stored in a SmallVector of
> OffsetOfExpr::OffsetOfNode (a public struct similar to Action::OffsetOfComponent).
> 
> Note that the goal of this patch is solely to avoid the unsuitable use of
> UnaryOperator for OffsetOf expressions: namely, it is not meant to provide a
> solution to other FIXME's  (e.g., type-dependent OffsetOfExpr are still not handled).
> This patch passes all clang tests (it refers to revision 100962), but since
> I'm only a beginner in clang development I am note sure about the code I wrote,
> so any comments and/or suggestions are really welcome.

Thanks for tackling this! Even if you're not handling type-dependent __builtin_offsetof expressions, having a proper AST for __builtin_offsetof improves Clang and will make it easier to add support for type-dependent __builtin_offsetof in the future.

Some comments inline.

For reference, your attachment shows up inline, which makes it hard to apply/test. Please follow the instructions

	http://llvm.org/docs/DeveloperPolicy.html#patches

to configure Thunderbird to send patches in a more Apple-Mail-friendly way.

> Index: include/clang/AST/Expr.h
> ===================================================================
> --- include/clang/AST/Expr.h	(revisione 100962)
> +++ include/clang/AST/Expr.h	(copia locale)
> @@ -991,6 +985,78 @@
>   virtual child_iterator child_end();
> };
> 
> +/// OffsetOfExpr
> +class OffsetOfExpr : public Expr {
> +

Please provide some documentation for this AST node, including what it represents and an example of code that will be represented as an OffsetOfExpr.

> +public:
> +  // __builtin_offsetof(type, identifier(.identifier|[expr])*)
> +  // FIXME: Type-dependent case ignored.
> +  struct OffsetOfNode {
> +    SourceLocation LocStart, LocEnd;
> +    bool isBrackets;  // true if [expr], false if .ident
> +    union {
> +      FieldDecl *MemberDecl;
> +      Expr *IndexExpr;
> +    } U;
> +  };

You could save a word of storage per OffsetOfNode by mangling the "isBrackets" bit into the lowest bit of the FieldDecl or Expr pointer. llvm::PointerUnion<FieldDecl *, Expr *> will do this for you.

But, see below for more comments on this representation.

> +private:
> +  typedef llvm::SmallVector<OffsetOfNode, 4> Components;
> +  // Base type;
> +  TypeSourceInfo *TSInfo;
> +  SourceLocation Loc;
> +  // Keep track of the various subcomponents (e.g. instance of OffsetOfNode).
> +  Components Comps;

We're trying to eliminate all SmallVectors (and other malloc-allocated memory) from the AST representation. Please put a count of the number of components into this structure, then allocate storage for the OffsetOfNode structures after the OffsetOfExpr node (you'll want to make the OffsetOfExpr constructor private, then add a Create memory to do allocation + construction).

> +public:
> +
> +  OffsetOfExpr(QualType type, SourceLocation l, 
> +               TypeSourceInfo *tsi, const Components &comps)
> +    : Expr(OffsetOfExprClass, type, false, false),
> +      TSInfo(tsi), Loc(l), Comps(comps) {}
> +
> +  /// \brief Build an empty offsetof.
> +  explicit OffsetOfExpr(EmptyShell Empty)
> +    : Expr(OffsetOfExprClass, Empty), TSInfo(0), Comps(0) {}
> +
> +  /// getOperatorLoc - Return the location of the operator.
> +  SourceLocation getOperatorLoc() const { return Loc; }
> +  void setOperatorLoc(SourceLocation L) { Loc = L; }
> +
> +  TypeSourceInfo *getTypeSourceInfo() const {
> +    return TSInfo;
> +  }
> +  void setTypeSourceInfo(TypeSourceInfo *tsi) {
> +    TSInfo = tsi;
> +  }
> +  
> +  const OffsetOfNode* getComponents() const {
> +    return &Comps[0];
> +  }
> +  void setComponents(Components comps) {
> +    Comps = comps;
> +  }
> +  
> +  size_t getNumComponents() const {
> +    return Comps.size();
> +  }
> +
> +  virtual SourceRange getSourceRange() const {
> +    return SourceRange(Loc, Comps.back().LocEnd);
> +  }
> +  virtual SourceLocation getExprLoc() const { return Loc; }
> +
> +  static bool classof(const Stmt *T) {
> +    return T->getStmtClass() == OffsetOfExprClass;
> +  }
> +
> +  static bool classof(const OffsetOfExpr *) { return true; }
> +
> +  // Iterators
> +  virtual child_iterator child_begin();
> +  virtual child_iterator child_end();
> +};
> +
> /// SizeOfAlignOfExpr - [C99 6.5.3.4] - This is for sizeof/alignof, both of
> /// types and expressions.
> class SizeOfAlignOfExpr : public Expr {
> Index: include/clang/AST/StmtNodes.def
> ===================================================================
> --- include/clang/AST/StmtNodes.def	(revisione 100962)
> +++ include/clang/AST/StmtNodes.def	(copia locale)
> @@ -78,6 +78,7 @@
> EXPR(CharacterLiteral      , Expr)
> EXPR(ParenExpr             , Expr)
> EXPR(UnaryOperator         , Expr)
> +EXPR(OffsetOfExpr          , Expr)
> EXPR(SizeOfAlignOfExpr     , Expr)
> EXPR(ArraySubscriptExpr    , Expr)
> EXPR(CallExpr              , Expr)
> Index: tools/CIndex/CXCursor.cpp
> ===================================================================
> --- tools/CIndex/CXCursor.cpp	(revisione 100962)
> +++ tools/CIndex/CXCursor.cpp	(copia locale)
> @@ -139,7 +139,8 @@
>   case Stmt::StringLiteralClass:         
>   case Stmt::CharacterLiteralClass:      
>   case Stmt::ParenExprClass:             
> -  case Stmt::UnaryOperatorClass:         
> +  case Stmt::UnaryOperatorClass:
> +  case Stmt::OffsetOfExprClass:         
>   case Stmt::SizeOfAlignOfExprClass:     
>   case Stmt::ArraySubscriptExprClass:    
>   case Stmt::BinaryOperatorClass:        
> Index: lib/Frontend/StmtXML.cpp
> ===================================================================
> --- lib/Frontend/StmtXML.cpp	(revisione 100962)
> +++ lib/Frontend/StmtXML.cpp	(copia locale)
> @@ -125,6 +125,7 @@
>     void VisitFloatingLiteral(FloatingLiteral *Node);
>     void VisitStringLiteral(StringLiteral *Str);
>     void VisitUnaryOperator(UnaryOperator *Node);
> +    void VisitOffsetOfExpr(OffsetOfExpr *Node);
>     void VisitSizeOfAlignOfExpr(SizeOfAlignOfExpr *Node);
>     void VisitMemberExpr(MemberExpr *Node);
>     void VisitExtVectorElementExpr(ExtVectorElementExpr *Node);
> @@ -260,7 +261,6 @@
>   case UnaryOperator::Real:    return "__real";
>   case UnaryOperator::Imag:    return "__imag";
>   case UnaryOperator::Extension: return "__extension__";
> -  case UnaryOperator::OffsetOf: return "__builtin_offsetof";
>   }
> }
> 
> @@ -308,6 +308,10 @@
>   Doc.addAttribute("op_code", getOpcodeStr(Node->getOpcode()));
> }
> 
> +void StmtXML::OffsetOfExpr(OffsetOfExpr *Node) {
> +  DumpExpr(Node);
> +}
> +
> void StmtXML::VisitSizeOfAlignOfExpr(SizeOfAlignOfExpr *Node) {
>   DumpExpr(Node);
>   Doc.addAttribute("is_sizeof", Node->isSizeOf() ? "sizeof" : "alignof");
> Index: lib/Sema/TreeTransform.h
> ===================================================================
> --- lib/Sema/TreeTransform.h	(revisione 100962)
> +++ lib/Sema/TreeTransform.h	(copia locale)
> @@ -3828,6 +3828,12 @@
> 
> template<typename Derived>
> Sema::OwningExprResult
> +TreeTransform<Derived>::TransformOffsetOfExpr(OffsetOfExpr *E) {
> +  return SemaRef.Owned(new (SemaRef.Context) OffsetOfExpr(*E));
> +}

Please add a FIXME here so that we don't forget to add the template-instantiation logic later.

> 
> Index: lib/Sema/SemaExpr.cpp
> ===================================================================
> --- lib/Sema/SemaExpr.cpp	(revisione 100962)
> +++ lib/Sema/SemaExpr.cpp	(copia locale)
> @@ -18,6 +18,7 @@
> #include "clang/AST/ASTContext.h"
> #include "clang/AST/DeclObjC.h"
> #include "clang/AST/DeclTemplate.h"
> +#include "clang/AST/Expr.h"
> #include "clang/AST/ExprCXX.h"
> #include "clang/AST/ExprObjC.h"
> #include "clang/Basic/PartialDiagnostic.h"
> @@ -6401,9 +6402,6 @@
>   Expr *Input = (Expr *)InputArg.get();
>   QualType resultType;
>   switch (Opc) {
> -  case UnaryOperator::OffsetOf:
> -    assert(false && "Invalid unary operator");
> -    break;
> 
>   case UnaryOperator::PreInc:
>   case UnaryOperator::PreDec:
> @@ -6571,8 +6569,8 @@
>                                                   SourceLocation RPLoc) {
>   // FIXME: This function leaks all expressions in the offset components on
>   // error.
> -  // FIXME: Preserve type source info.
> -  QualType ArgTy = GetTypeFromParser(argty);
> +  TypeSourceInfo *ArgTInfo;
> +  QualType ArgTy = GetTypeFromParser(argty, &ArgTInfo);
>   assert(!ArgTy.isNull() && "Missing type argument!");

When ArgTy.isNull(), we should just return with an error.

>   bool Dependent = ArgTy->isDependentType();
> @@ -6592,6 +6590,7 @@
>   Expr* Res = new (Context) ImplicitValueInitExpr(ArgTyPtr);
>   Res = new (Context) UnaryOperator(Res, UnaryOperator::Deref,
>                                     ArgTy, SourceLocation());
> +  llvm::SmallVector<OffsetOfExpr::OffsetOfNode, 4> Comps(NumComponents);
> 
>   // offsetof with non-identifier designators (e.g. "offsetof(x, a.b[c])") are a
>   // GCC extension, diagnose them.
> @@ -6608,83 +6607,63 @@
>                             diag::err_offsetof_incomplete_type))
>       return ExprError();
> 
> -    // FIXME: Dependent case loses a lot of information here. And probably
> -    // leaks like a sieve.
> +    QualType CurrentType = ArgTy;
>     for (unsigned i = 0; i != NumComponents; ++i) {
>       const OffsetOfComponent &OC = CompPtr[i];
>       if (OC.isBrackets) {
> -        // Offset of an array sub-field.  TODO: Should we allow vector elements?
> -        const ArrayType *AT = Context.getAsArrayType(Res->getType());
> -        if (!AT) {
> -          Res->Destroy(Context);
> +        const ArrayType *AT = Context.getAsArrayType(CurrentType);
> +        if(!AT)
>           return ExprError(Diag(OC.LocEnd, diag::err_offsetof_array_type)
> -            << Res->getType());
> -        }
> +            << CurrentType);
> 
>         // FIXME: C++: Verify that operator[] isn't overloaded.
> 
> -        // Promote the array so it looks more like a normal array subscript
> -        // expression.
> -        DefaultFunctionArrayLvalueConversion(Res);
> -
>         // C99 6.5.2.1p1
>         Expr *Idx = static_cast<Expr*>(OC.U.E);
> -        // FIXME: Leaks Res
>         if (!Idx->isTypeDependent() && !Idx->getType()->isIntegerType())
>           return ExprError(Diag(Idx->getLocStart(),
>                                 diag::err_typecheck_subscript_not_integer)
> -            << Idx->getSourceRange());
> -
> -        Res = new (Context) ArraySubscriptExpr(Res, Idx, AT->getElementType(),
> -                                               OC.LocEnd);
> -        continue;
> +                             << Idx->getSourceRange());
> +        CurrentType = AT->getElementType();
> +        Comps[i].U.IndexExpr = Idx;
>       }
> -
> -      const RecordType *RC = Res->getType()->getAs<RecordType>();
> -      if (!RC) {
> -        Res->Destroy(Context);
> -        return ExprError(Diag(OC.LocEnd, diag::err_offsetof_record_type)
> -          << Res->getType());
> -      }
> -
> -      // Get the decl corresponding to this.
> -      RecordDecl *RD = RC->getDecl();
> -      if (CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
> -        if (!CRD->isPOD() && !DidWarnAboutNonPOD &&
> +      else {
> +        const RecordType *RC = CurrentType->getAs<RecordType>();
> +        if (!RC) 
> +          return ExprError(Diag(OC.LocEnd, diag::err_offsetof_array_type)
> +            << CurrentType);

This should be diag::err_offsetof_record_type?

> +        RecordDecl *RD = RC->getDecl();
> +        if (CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
> +          if (!CRD->isPOD() && !DidWarnAboutNonPOD &&
>             DiagRuntimeBehavior(BuiltinLoc,
>                                 PDiag(diag::warn_offsetof_non_pod_type)
>                                   << SourceRange(CompPtr[0].LocStart, OC.LocEnd)
> -                                  << Res->getType()))
> +                                  << CurrentType))
>           DidWarnAboutNonPOD = true;
> -      }
> +        }
> +        LookupResult R(*this, OC.U.IdentInfo, OC.LocStart, LookupMemberName);
> +        LookupQualifiedName(R, RD);
> +        FieldDecl *MemberDecl = R.getAsSingle<FieldDecl>();
> +        if (!MemberDecl)
> +          return ExprError(Diag(BuiltinLoc, diag::err_no_member)
> +            << OC.U.IdentInfo << RD << SourceRange(OC.LocStart, OC.LocEnd));
> 
> -      LookupResult R(*this, OC.U.IdentInfo, OC.LocStart, LookupMemberName);
> -      LookupQualifiedName(R, RD);
> -
> -      FieldDecl *MemberDecl = R.getAsSingle<FieldDecl>();
> -      // FIXME: Leaks Res
> -      if (!MemberDecl)
> -        return ExprError(Diag(BuiltinLoc, diag::err_no_member)
> -         << OC.U.IdentInfo << RD << SourceRange(OC.LocStart, OC.LocEnd));
> -
> -      // FIXME: C++: Verify that MemberDecl isn't a static field.
> -      // FIXME: Verify that MemberDecl isn't a bitfield.
> -      if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion()) {
> -        Res = BuildAnonymousStructUnionMemberReference(
> -            OC.LocEnd, MemberDecl, Res, OC.LocEnd).takeAs<Expr>();
> -      } else {
> -        PerformObjectMemberConversion(Res, /*Qualifier=*/0,
> -                                      *R.begin(), MemberDecl);
> -        // MemberDecl->getType() doesn't get the right qualifiers, but it
> -        // doesn't matter here.
> -        Res = new (Context) MemberExpr(Res, false, MemberDecl, OC.LocEnd,
> -                MemberDecl->getType().getNonReferenceType());
> +        // FIXME: C++: Verify that MemberDecl isn't a static field.
> +        // FIXME: Verify that MemberDecl isn't a bitfield.
> +        if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion())
> +          CurrentType = MemberDecl->getType();

This isn't going to work.  When we find a member of an anonymous struct or union, we need to introduce OffsetOfNodes for the FieldDecls that represent the anonymous unions. Imagine, e.g.,

  struct A {
    struct {
      union { int i; float f; };
      bool isInt;
    };
  };

If I ask for the offset of "i" in A, we should (semantically!) get an OffsetOfNode referring to the anonymous struct, another OffsetOfNode for the anonymous union, and then, finally, the OffsetOfNode for "i". So you can have more OffsetOfNodes in the AST than there were components parsed by the parser, since we're adding implicit ones.

> +        else
> +          CurrentType = MemberDecl->getType().getNonReferenceType();
> +        Comps[i].U.MemberDecl = MemberDecl;
>       }
> +      Comps[i].LocStart = OC.LocStart;
> +      Comps[i].LocEnd = OC.LocEnd;
>     }
> +      
>   }
> 
> -  return Owned(new (Context) UnaryOperator(Res, UnaryOperator::OffsetOf,
> -                                           Context.getSizeType(), BuiltinLoc));
> +  return Owned(new (Context) OffsetOfExpr(Context.getSizeType(), BuiltinLoc, 
> +                                          ArgTInfo, Comps));
> }

Aside from the anonymous struct/union issue, this is a big improvement.

> Index: lib/AST/ExprConstant.cpp
> ===================================================================
> --- lib/AST/ExprConstant.cpp	(revisione 100962)
> +++ lib/AST/ExprConstant.cpp	(copia locale)
> @@ -16,7 +16,9 @@
> #include "clang/AST/CharUnits.h"
> #include "clang/AST/RecordLayout.h"
> #include "clang/AST/StmtVisitor.h"
> +#include "clang/AST/TypeLoc.h"
> #include "clang/AST/ASTDiagnostic.h"
> +#include "clang/AST/Expr.h"
> #include "clang/Basic/Builtins.h"
> #include "clang/Basic/TargetInfo.h"
> #include "llvm/ADT/SmallString.h"
> @@ -221,7 +223,7 @@
>   APValue VisitStmt(Stmt *S) {
>     return APValue();
>   }
> -
> +  
>   APValue VisitParenExpr(ParenExpr *E) { return Visit(E->getSubExpr()); }
>   APValue VisitDeclRefExpr(DeclRefExpr *E);
>   APValue VisitPredefinedExpr(PredefinedExpr *E) { return APValue(E); }
> @@ -821,6 +823,7 @@
> 
>   bool VisitCallExpr(CallExpr *E);
>   bool VisitBinaryOperator(const BinaryOperator *E);
> +  bool VisitOffsetOfExpr(const OffsetOfExpr *E);
>   bool VisitUnaryOperator(const UnaryOperator *E);
>   bool VisitConditionalOperator(const ConditionalOperator *E);
> 
> @@ -1365,20 +1368,56 @@
>   return Success(Info.Ctx.getTypeSizeInChars(SrcTy).getQuantity(), E);
> }
> 
> +bool IntExprEvaluator::VisitOffsetOfExpr(const OffsetOfExpr *E) {
> +  CharUnits Result;
> +  const OffsetOfExpr::OffsetOfNode *Comps = E->getComponents();
> +  unsigned n = E->getNumComponents();
> +  if (n == 0)
> +    return false;
> +  QualType CurrentType = E->getTypeSourceInfo()->getType();
> +  for (unsigned i = 0; i != n; ++i) {
> +    const OffsetOfExpr::OffsetOfNode& ON = Comps[i];
> +    if (ON.isBrackets) {
> +      Expr *Idx = static_cast<Expr*>(ON.U.IndexExpr);
> +      APSInt IdxResult;
> +      if (!EvaluateInteger(Idx, IdxResult, Info))
> +        return false;
> +      const ArrayType *AT = Info.Ctx.getAsArrayType(CurrentType);
> +      if (!AT)
> +        return false;
> +      CurrentType = AT->getElementType();
> +      CharUnits ElementSize = Info.Ctx.getTypeSizeInChars(CurrentType);
> +      Result += IdxResult.getSExtValue() * ElementSize;
> +    }
> +    else {
> +      FieldDecl *MemberDecl = ON.U.MemberDecl;
> +      const RecordType *RT = CurrentType->getAs<RecordType>();
> +      if (!RT)
> +        return false;
> +      RecordDecl *RD = RT->getDecl();
> +      const ASTRecordLayout &RL = Info.Ctx.getASTRecordLayout(RD);
> +      unsigned i = 0;
> +      for (RecordDecl::field_iterator Field = RD->field_begin(),
> +                                      FieldEnd = RD->field_end();
> +           Field != FieldEnd; (void)++Field, ++i) {
> +        if (*Field == MemberDecl)
> +          break;
> +      }
> +      if (i < RL.getFieldCount())
> +        Result += CharUnits::fromQuantity(RL.getFieldOffset(i) / 8);

Use ASTContext::getCharWidth() rather than 8.

> +      if (cast<RecordDecl>(MemberDecl->getDeclContext())->isAnonymousStructOrUnion())
> +        CurrentType = MemberDecl->getType();

If you follow my advice about expanded references into anonymous structs/unions, you won't need to do anything special here.

> Index: lib/AST/StmtProfile.cpp
> ===================================================================
> --- lib/AST/StmtProfile.cpp	(revisione 100962)
> +++ lib/AST/StmtProfile.cpp	(copia locale)
> @@ -261,6 +261,10 @@
>   ID.AddInteger(S->getOpcode());
> }
> 
> +void StmtProfiler::VisitOffsetOfExpr(OffsetOfExpr *S) {
> +  VisitExpr(S);
> +}
> +

To profile an OffsetOfExpr, we'll need to visit the type and all of the OffsetOfNodes as well. That way, we can distinguish between different OffsetOfExprs on the same base type.

> Index: lib/AST/Expr.cpp
> ===================================================================
> --- lib/AST/Expr.cpp	(revisione 100962)
> +++ lib/AST/Expr.cpp	(copia locale)
> @@ -334,7 +334,6 @@
>   case Real:    return "__real";
>   case Imag:    return "__imag";
>   case Extension: return "__extension__";
> -  case OffsetOf: return "__builtin_offsetof";
>   }
> }
> 
> @@ -1835,7 +1834,9 @@
>     case UnaryOperator::Real:
>     case UnaryOperator::Imag:
>       return CheckICE(Exp->getSubExpr(), Ctx);
> -    case UnaryOperator::OffsetOf:
> +    }
> +  }
> +  case Expr::OffsetOfExprClass: {
>       // Note that per C99, offsetof must be an ICE. And AFAIK, using
>       // Evaluate matches the proposed gcc behavior for cases like
>       // "offsetof(struct s{int x[4];}, x[!.0])".  This doesn't affect
> @@ -1843,7 +1844,6 @@
>       // array subscripts that aren't ICEs, and if the array subscripts
>       // are ICEs, the value of the offsetof must be an integer constant.
>       return CheckEvalInICE(E, Ctx);
> -    }
>   }
>   case Expr::SizeOfAlignOfExprClass: {
>     const SizeOfAlignOfExpr *Exp = cast<SizeOfAlignOfExpr>(E);
> @@ -2582,6 +2582,15 @@
> Stmt::child_iterator UnaryOperator::child_begin() { return &Val; }
> Stmt::child_iterator UnaryOperator::child_end() { return &Val+1; }
> 
> +// OffsetOfExpr
> +Stmt::child_iterator OffsetOfExpr::child_begin() {
> +  return child_iterator();
> +}
> +
> +Stmt::child_iterator OffsetOfExpr::child_end() {
> +  return child_iterator();
> +}

This is a tricky area. child_begin()/child_end() are supposed to provide iterators that cover all of the Expr*'s (actually, Stmt*'s) that are subexpressions of the expression node. With OffsetOfExpr, those Expr*'s aren't currently in contiguous memory.

In DesignatedInitExpr, which is very similar to OffsetOfExpr, we actually did a little dance where we have an array of Expr* nodes that are all of the subexpressions. Then, each node was either non-Expr* data (e.g., a FieldDecl*) or it was an index into the array of Expr* nodes. child_begin()/child_end() just pointed to the beginning/end of that Expr* array.

It's a bit clumsy to implement, but that's what we're stuck with at the moment.

Overall, this is a great start on __builtin_offsetof, thanks! I look forward to seeing a revised patch.

	- Doug



More information about the cfe-dev mailing list