[cfe-dev] clang memory usage with C++ template metaprogramming

Douglas Gregor dgregor at apple.com
Tue Jun 8 16:54:51 PDT 2010


On Jun 8, 2010, at 4:03 PM, John Bytheway wrote:

> On 08/06/10 23:03, Douglas Gregor wrote:
>> 
>> On Jun 8, 2010, at 2:55 PM, John Bytheway wrote:
>> 
>>> Several years ago I embarked on a project involving some heavy-duty
>>> C++ template metaprogramming.  In the end I abandoned it because
>>> the compile times and memory usage with g++ were too big.
>>> 
>>> On seeing clang's promised reduction of such requirements, I
>>> thought I'd go back to my project and see how clang fared when
>>> compiling it. Although it does indeed run much faster than g++, it
>>> actually uses *more* memory.  I'm just posting here to ask if this
>>> is to be expected. If it might be indicative of some issue or if
>>> you'd like to know where all this memory is being used then I'd be
>>> happy to try some profiling.
> <snip>
> 
>> I've also seen this with template metaprogramming-heavy code, but
>> aside from some idle speculation (we think it has to do with
>> type-location information in Clang), we haven't looked into it
>> closely.
> 
> Fair enough.  I was curious, so I ran valgrind/massif to get an idea.
> In short:
> 
> 16.53% (259,009,024B) in 722 places, all below massif's threshold
> 14.49% (227,086,336B) clang::DeclContext::CreateStoredDeclsMap

In theory, we might be able to use a smaller data structure for DeclContexts with only a few elements in them, which would probably help reduce memory usage when we're dealing with many instantiations of small templates.
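
To make the idea concrete, here is a minimal sketch of a lookup table that stores its first few entries inline and only allocates a hash map once it grows past a threshold. Everything in it is hypothetical: `SmallLookupTable`, the `InlineLimit` of 4, and the use of `int` as a declaration ID are assumptions for illustration, not Clang's actual StoredDeclsMap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-context name lookup table.
class SmallLookupTable {
  static constexpr std::size_t InlineLimit = 4; // assumed threshold
  std::vector<std::pair<std::string, int>> Small;
  std::unique_ptr<std::unordered_map<std::string, int>> Large;

public:
  void insert(const std::string &Name, int DeclID) {
    if (Large) {
      (*Large)[Name] = DeclID;
      return;
    }
    // Linear scan is cheap while the table is tiny.
    for (auto &Entry : Small) {
      if (Entry.first == Name) {
        Entry.second = DeclID;
        return;
      }
    }
    if (Small.size() < InlineLimit) {
      Small.emplace_back(Name, DeclID);
      return;
    }
    // Too many entries: migrate everything to a real hash map.
    Large.reset(new std::unordered_map<std::string, int>(Small.begin(),
                                                         Small.end()));
    Small.clear();
    Small.shrink_to_fit();
    (*Large)[Name] = DeclID;
  }

  // Returns a pointer to the stored ID, or nullptr if the name is unknown.
  const int *lookup(const std::string &Name) const {
    if (Large) {
      auto It = Large->find(Name);
      return It == Large->end() ? nullptr : &It->second;
    }
    for (const auto &Entry : Small)
      if (Entry.first == Name)
        return &Entry.second;
    return nullptr;
  }

  bool usesHashMap() const { return Large != nullptr; }
};
```

With something along these lines, a context holding only a handful of declarations would never pay for a hash table; whether that actually wins in practice would need measuring.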

> 12.85% (201,326,592B) clang::SourceManager::createInstantiationLoc
> 06.82% (106,841,236B) clang::TokenLexer::ExpandFunctionArguments

There must be some preprocessor metaprogramming going on in this example, too? That's pretty big for the preprocessor.

> 12.83% (201,068,544B) clang::ASTContext::CreateTypeSourceInfo

Yes, this is the type-source information I mentioned. If we make template instantiation "perfect" with respect to type-source information, so that any dependent type instantiates down to something that is structurally identical to the form it had when it was written in the source, then we could avoid allocating memory for type-source information in each type instantiation. We're not too far from this goal, but it has to be *perfect* for us to use the optimization.

> 04.94% (77,463,552B) clang::CXXConstructorDecl::Create
> 02.05% (32,157,696B) clang::CXXMethodDecl::Create
> 01.82% (28,585,984B) clang::CXXDestructorDecl::Create

A number of these could be eliminated if we were to lazily create the implicitly-declared default constructor, copy constructor, copy-assignment operator, and destructor. 
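
As a rough illustration of what "lazily create" could mean, the sketch below allocates a node for an implicit special member only the first time someone asks for it. The names `LazyRecordDecl`, `ImplicitMemberDecl`, and `SpecialMember` are made up for this example and are not Clang's real classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for AST nodes; not Clang's actual declarations.
enum class SpecialMember {
  DefaultConstructor,
  CopyConstructor,
  CopyAssignment,
  Destructor
};

struct ImplicitMemberDecl {
  SpecialMember Kind;
  explicit ImplicitMemberDecl(SpecialMember K) : Kind(K) {}
};

class LazyRecordDecl {
  // One slot per special member; all start out empty.
  std::unique_ptr<ImplicitMemberDecl> Members[4];
  unsigned NumCreated = 0;

public:
  // Creates the declaration on first request and reuses it afterwards.
  const ImplicitMemberDecl &getSpecialMember(SpecialMember K) {
    auto &Slot = Members[static_cast<unsigned>(K)];
    if (!Slot) {
      Slot.reset(new ImplicitMemberDecl(K));
      ++NumCreated;
    }
    return *Slot;
  }

  unsigned numCreated() const { return NumCreated; }
};
```

A class whose implicit members are never referenced would then cost four empty pointers rather than four full declarations.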

> 08.86% (138,792,960B) clang::ASTContext::getTemplateSpecializationType

> 02.15% (33,763,328B) clang::TemplateArgumentList::TemplateArgumentList
> 01.59% (24,907,776B) clang::ASTContext::getFunctionType
> 01.27% (19,861,504B) clang::ASTContext::getLValueReferenceType
> 01.08% (16,908,288B) clang::TypedefDecl::Create

Not much we can do about these, except look for ways to make the various AST nodes smaller.
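
One generic way to shrink a node, sketched here with made-up structs rather than real Clang types, is to order the fields so padding is minimized and to pack small flags into bitfields. `LooseNode` and `PackedNode` and their fields are purely illustrative, and the exact sizes depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout with flags scattered between larger fields,
// which typically forces the compiler to insert padding.
struct LooseNode {
  bool IsDependent;
  unsigned Kind;
  bool IsImplicit;
  void *Canonical;
  bool IsUsed;
};

// Same information, with the pointer first and the small fields packed
// into a single bitfield word.
struct PackedNode {
  void *Canonical;
  unsigned Kind : 8;
  unsigned IsDependent : 1;
  unsigned IsImplicit : 1;
  unsigned IsUsed : 1;
};
```

Comparing `sizeof(LooseNode)` against `sizeof(PackedNode)` on the platform of interest shows how much the reordering saves per node.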

> So indeed type location information is a significant part, but nothing
> is overwhelming, which I guess is a good sign.
> 
> I wonder idly: How plausible would it be to allow execution in a mode
> where no source information was maintained, and thus reduce memory usage
> (at the expense of useful errors/warnings)?  Such a mode might be useful
> at times.  I'm guessing it would be prohibitively difficult.

We discussed this back when we improved type-source location information, but I am very much against having such a mode: the AST should always be the same, for all clients, or the size of the testing matrix explodes and we get far worse coverage. We should spend time optimizing the system as a unified whole rather than trying to separate out the less-efficient bits that provide needed functionality.

	- Doug


