[cfe-dev] AST Representation of Conversions

Douglas Gregor dgregor at apple.com
Tue Jul 28 16:00:14 PDT 2009

Hi Sebastian,

On Jul 28, 2009, at 12:03 PM, Sebastian Redl wrote:
> One thing I have started once and then aborted, but which Argiris
> recently contacted me off-list about, is the AST representation of
> conversions and casts. Currently, we have absolutely minimal
> information: CastExpr (the base of all conversions) stores only the
> operand. ImplicitCastExpr stores only whether the result is an lvalue.
> ExplicitCastExpr (the base of all explicit casts) stores the target type
> as written.
> None of these store any of the information Sema has worked very hard to
> acquire.
> What kind of cast is it? Bitcast? Truncation? Extension? C++ has a large
> variety of things that a conversion, especially a C-style cast, can do:
> convert with a constructor; convert with a conversion operator; do a
> hierarchy cast, potentially to a virtual base, which could mean adding
> an offset to the pointer or dereferencing a pointer; do a raw bitcast
> (reinterpret_cast is good at that); do an integer or floating point
> extension/truncation; and even weirder things (member pointer casts,
> explicit cast of the address of an overloaded function).
> Obviously we need to save some information about the cast in the AST.
> The question is what, and where.


> CodeGen needs to distinguish:
> - a raw bitcast (reinterpret_cast of pointers and pointer/integer pairs,
> reinterpret_cast of lvalues to references)
> - a floating point truncation (double -> float)
> - a floating point extension (float -> double)
> - an integer truncation (int -> short)
> - an integer extension (short -> int)
> - a static hierarchy cast without virtual bases (add an offset to the
> pointer)
> - a static hierarchy cast with virtual bases (fetch the pointer to the
> virtual base, and then add an offset)
> - a dynamic hierarchy cast (emit calls to support library)
> - a user-defined conversion via constructor (call that constructor)
> - a user-defined conversion via conversion operator (call that operator)
> - a static hierarchy cast of a member object pointer (adjust the value
> of that pointer)
> - a static hierarchy cast of a member function pointer (I have no idea
> how that works)
> - function and array decay
> - GCC aggregate casts in various forms
> - vector and extvector casts
> - Objective-C casts
> I think that's everything. In short, CodeGen also cares about pretty
> much everything.

Quite an exhaustive list! I can't think of any you missed, except  
perhaps "no-op" conversions that merely adjust types (e.g., by adding  
qualifiers) and require no code generation.

> I don't know what other clients would need. The Index library definitely
> wants to know about implicitly called functions (conversion operators
> and constructors). The static analyzer would probably want the same
> information as CodeGen. Other static code introspection tools probably
> want all information too.

I suspect you're right. We at least need that much information in each  
cast (regardless of whether it is implicit or explicit).

> Essentially, I think, we will have to enhance or wrap
> ImplicitConversionSequence from SemaOverload.h to also be able to
> represent conversions that are only explicitly possible. Then we put it
> into the AST library and give CastExpr one of those.
> The problem with this approach is that it is heavy.
> ImplicitConversionSequence is a heavy object (40 bytes on 32-bit without
> considering alignment, 80 bytes on 64-bit if alignment works the way I
> think it does), and every single ImplicitCastExpr (think of all the
> "usual arithmetic conversions" in C) would bear this weight, as would
> casts that don't need this information, like const_cast (noop to
> codegen), dynamic_cast (always runtime calls) and reinterpret_cast
> (always bitcast).

ImplicitConversionSequence is quite heavy, and I don't think that  
clients need that much information. It seems to me that we could get  
away with adding an enum (covering all the kinds of conversions you  
mentioned above) and a declaration (that points to a constructor or  
conversion function). The Expr class already has enough spare bits to  
store the enum, so CastExpr (and its descendants) would only have to  
grow by a single pointer. That, IMO, is an acceptable trade-off, since  
we'll be making CodeGen easier for C and possibly for C++.

> An option would be to rearrange the hierarchy, but this makes it reflect
> the implementation instead of the logical grouping. Currently the
> hierarchy makes sense to programmers:
> http://clang.llvm.org/doxygen/classclang_1_1CastExpr.html
> If we were to rearrange it to fit the needs of data storage, CastExpr
> would be the direct base of CXXConstCastExpr, CXXDynamicCastExpr,
> CXXReinterpretCastExpr and ComplexCastExpr. ComplexCastExpr would hold
> the conversion sequence and be the base of CXXStaticCastExpr,
> CXXFunctionalCastExpr, CStyleCastExpr and ImplicitCastExpr. Not pretty.

No, not pretty. Our hierarchy is really nice for describing the  
syntactic and semantic behavior of these expressions, and it would be  
a shame if we had to sacrifice that clarity to save a few bytes.

	- Doug
