[cfe-dev] Adding more info to builtin source types.

Mon Jan 11 16:10:42 PST 2010

Hello Enea,

On Jan 11, 2010, at 1:06 PM, Enea Zaffanella wrote:
> Currently, class BuiltinTypeLoc provides a single source location,  
> accessible via member getNameLoc(), pointing to the name of the type  
> specifier type (e.g., for 'unsigned int' it would return the  
> location of 'int'). It is quite common for coders to declare builtin  
> integral types using sign or width type specifiers only (e.g.,  
> 'unsigned', 'long', 'unsigned short', etc.). In these cases the  
> getNameLoc() method would return an invalid source location.

What specific problem do you need to address? Do you just want  
getNameLoc() to have a valid location (which points to the sign or  
width if there is no actual type named), or do you want fine-grained  
information about where the width and sign keywords were actually  
located? The former could be done with a simple change where we build  
BuiltinTypeLocs (without increasing memory usage), while the latter  
requires more work (as you've done in this patch).

> The attached patch augments BuiltinTypeLoc class by adding two  
> source locations, one for the sign specifier and the other for the  
> width specifier. We have added methods to get/set the three source  
> locations.
> We also modified method getNameLoc() so as to return what is  
> _likely_ to be the source location of the first type specifier token  
> in the source code, assuming that the three specifiers are provided  
> in the "canonical" order:
>  <sign> <width> <type>
>
> Please let us know if the patch is OK.

I have a few comments on this patch.

My biggest concern is that we're increasing the amount of storage  
needed for very common cases, and I'd like us to both try to minimize  
the amount of storage needed and to measure the actual cost of this  
change for some non-trivial header, to figure out how much extra  
memory its using in practice.

To minimize the amount of storage needed, we don't always want  
BuiltinTypeLoc to store 3 locations, because only the integral types  
actually need to store sign and width information, and that's only  
written in the source code occasionally. So, BuiltinTypeLoc should  
have a variable-length encoding dependent on the actual builtin type:  
types that always only have one source location (void, bool, float,  
double, nullptr, id, Class, SEL) should only store that location. For  
types that may have more than one location (e.g., the integral types),  
the BuiltinTypeLoc could encode which specifiers exist (in some prefix  
bytes) and then only have source locations for those keywords that  
actually show up in the source.

+  SourceRange getSourceRange() const {
+    SourceLocation loc = getNameLoc();
+    return SourceRange(loc, loc);
+  }

If we're going to keep the source locations for the type/width/sign,  
it'd be nice to the source range to cover the entire type. Storing the  
specifiers in order (with some tag saying which source location refers  
to which kind of keyword) would make this possible.

Of course, depending on how you answered the question at the top of my  
e-mail, all of this might be moot: if you just want  
BuiltinTypeLoc::getNameLoc() to point to some valid location for a  
integral type described only by its width or sign, that's easy to fix  
without any performance impact.

	- Doug