[cfe-dev] for-range; or: distinct source and semantic ASTs

Tue Dec 14 14:43:11 PST 2010

On Dec 14, 2010, at 9:15 AM, Sebastian Redl wrote:

> Hi,
> 
> Once again I'm pondering the C++0x for-range loop. I'm not sure how it would best fit into Clang.
> 
> Some background: for-range looks like this:
> 
> for (type name : collection) body;
> 
> where collection can be an array, a uniform initializer list, or an arbitrary object if some functions are overloaded for it.
> The standard specifies the semantics of for-range in terms of a rewrite (let __ denote some invented unique variable):
> 
> 1) If collection is an array of size N:
> 
> auto&& __a = collection;
> for (auto __p = __a+0; __p != __a+N; ++__p) {
>  type name(*__p);
>  body;
> }
> 
> 2) If collection is an initializer list or object:
> 
> auto&& __c = collection;
> for (auto __it = begin(__c), __e = end(__c); __it != __e; ++__it) {
>  type name(*__it);
>  body;
> }
> 
> 
> Clients interested in the semantics of the code (e.g. CodeGen) would be easier to satisfy if the AST contained a representation of the rewrite. It would also be easier to implement in Sema, because Sema can just construct the rewrite and validate it using existing routines. CodeGen could also just use existing routines.

Sema will have to do the lookup and overload resolution for begin, end, !=, and ++__it, and that information will have to be stored in the AST somewhere.
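
As a rough illustration of which calls that lookup has to resolve, here is a small self-contained sketch. It is not Clang code. The mycollection type, its int* iterator, and the call counters are made up for the example, and the rewrite uses ordinary names (r, it, e) in place of the invented __c, __it, and __e:

```cpp
#include <cassert>

// Hypothetical collection type, invented for this example.
struct mycollection {
    int data[3];
    typedef int* iterator;
};

// Counters so the calls made by each loop form can be observed.
static int begin_calls = 0;
static int end_calls = 0;

// Free functions; the for-range loop finds them by argument-dependent lookup.
mycollection::iterator begin(mycollection& c) { ++begin_calls; return c.data; }
mycollection::iterator end(mycollection& c) { ++end_calls; return c.data + 3; }

// The source form: nothing here mentions begin, end, != or ++.
int sum_with_range_for(mycollection& c) {
    int total = 0;
    for (int x : c)
        total += x;
    return total;
}

// The hand-written rewrite for the non-array case.
int sum_with_rewrite(mycollection& c) {
    int total = 0;
    auto&& r = c;
    for (auto it = begin(r), e = end(r); it != e; ++it) {
        int x(*it);
        total += x;
    }
    return total;
}

// Small self-check: both forms agree and each calls begin/end exactly once.
void demo() {
    mycollection c = {{1, 2, 3}};
    assert(sum_with_range_for(c) == sum_with_rewrite(c));
    assert(begin_calls == 2 && end_calls == 2);
}
```

Both functions end up calling the same begin and end overloads, so whatever the AST stores for the loop has to record which functions were selected, even though the source form never names them.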

> Another client that is really interested in the semantics would be a search for function usage. Say I have this:
> 
> struct mycollection { ... };
> mycollection::iterator begin(mycollection&);
> mycollection::iterator end(mycollection&);
> 
> And now I want to find all references to the begin function. Then a for-range loop over mycollection is such a use, because it calls this function.

I think this is a completely separable issue.

> On the other hand, clients interested in the source representation (e.g. the pretty printer) want the original code, which is not easy to regenerate from the rewritten form. (For example, the declaration of the loop variable has gained an initializer.)
> 
> 
> Unfortunately, I don't think our AST supports having distinct "source view" and "semantic view" tree visitation strategies, does it? How would I best implement such a case?

There is one case where our AST has a distinct "source view" and "semantic view", which is in InitListExpr when dealing with designated initializers. However, I'd rather not repeat this pattern.

I suggest storing the source view in the AST. Then, for the non-array case, have expressions for the begin() initialization, end() initialization, != expression, and ++ expression, possibly using OpaqueValueExprs as stand-ins for __c, __e, and __it. We've done this kind of thing in a few other places, e.g., the expression needed to copy a non-POD block variable and the bool-conversion/result-conversion expression for the GNU x ?: y extension.
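
To make the shape of that suggestion concrete, here is a toy sketch of such a node. Everything in it is hypothetical: ToyExpr and ToyForRangeStmt are not Clang classes, expressions are plain strings, and the "$c", "$it", and "$e" markers only play the role that opaque placeholder expressions would play in a real AST.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an expression; NOT Clang's Expr class.
struct ToyExpr {
    std::string text;
};

// Hypothetical for-range node. The source view is stored directly; the
// semantic helper expressions hang off the same node and refer to
// placeholders ("$c", "$it", "$e") instead of real invented variables.
struct ToyForRangeStmt {
    // Source view: what the user wrote.
    std::string loopVarDecl;  // e.g. "int x"
    ToyExpr range;            // e.g. "coll"
    std::string body;         // e.g. "use(x);"

    // Semantic helpers, filled in once by an imagined semantic analysis.
    ToyExpr beginInit;        // initializer for the iterator
    ToyExpr endInit;          // initializer for the end iterator
    ToyExpr cond;             // loop condition
    ToyExpr inc;              // increment
};

// A pretty printer only needs the source-view fields.
std::string printSource(const ToyForRangeStmt& s) {
    return "for (" + s.loopVarDecl + " : " + s.range.text + ") " + s.body;
}

// A semantic client (code generation, reference search) walks the helpers.
std::vector<std::string> semanticExprs(const ToyForRangeStmt& s) {
    return {s.beginInit.text, s.endInit.text, s.cond.text, s.inc.text};
}

// Builds the node for: for (int x : coll) use(x);
ToyForRangeStmt makeExample() {
    ToyForRangeStmt s;
    s.loopVarDecl = "int x";
    s.range.text = "coll";
    s.body = "use(x);";
    s.beginInit.text = "begin($c)";
    s.endInit.text = "end($c)";
    s.cond.text = "$it != $e";
    s.inc.text = "++$it";
    return s;
}
```

The point of the layout is that the pretty printer never sees a rewritten loop, while clients that care about semantics still get the resolved begin, end, != and ++ expressions from the same node.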

	- Doug