[cfe-dev] Make macro instantiation more memory efficient

Thu Jun 30 13:28:32 PDT 2011

First, let me start with a biblical tale.

In the beginning there was nothing. Then Chris said, "Let there be a preprocessor!" and a preprocessor was formed. And Chris looked upon macro instantiation and said, "It is good!" But an evil Boost library came along and blew up the memory that the preprocessor was consuming... (OK, the tale's starting to not make sense.)

Currently, when we instantiate a macro, each token that comes out reserves a chunk in the "source address space" by adding a new entry to the SLocEntry table. This works great until you start abusing the preprocessor, like that Boost library, which ended up creating ~36M entries and blowing up both memory and the PCH file to >1 GB.

I'd like to suggest a slightly different and more memory-efficient way:
When a macro instantiation occurs, we reserve a single SLocEntry chunk whose length is the full length of the macro definition source. We set the spelling location of this chunk to point to the start of the macro definition, and any subsequent token that is lexed directly from the macro definition gets a location from this chunk at the appropriate offset.
When expanding macro arguments, we can either keep using a new chunk for each token, as we do currently, or apply the same optimization to arguments whose input comes directly from a FileID. In the latter case we again reserve a single chunk for the full extent of the arguments and assign locations to tokens relative to that.
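
To make the difference concrete, here is a rough toy model of the two schemes. Everything in it (ToySourceSpace, ToyEntry, ToyToken, expandPerToken, expandPerChunk) is a hypothetical name made up for illustration and is not the real SourceManager API; it only counts table entries and shows how a token location inside a chunk maps back to its spelling location.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these types exist in Clang.
using Offset = std::uint32_t;

struct ToyEntry {
  Offset start;          // first offset of the chunk in the "source address space"
  Offset length;         // number of offsets reserved for the chunk
  Offset spellingStart;  // where the first offset of the chunk is spelled
};

class ToySourceSpace {
public:
  // Reserve `length` offsets and record their spelling start.
  // Returns the index of the new table entry.
  std::size_t reserve(Offset spellingStart, Offset length) {
    assert(length > 0 && "a chunk must cover at least one offset");
    entries_.push_back(ToyEntry{next_, length, spellingStart});
    next_ += length;
    return entries_.size() - 1;
  }

  // Location of something that sits `offsetInChunk` offsets into the chunk.
  Offset locInChunk(std::size_t entry, Offset offsetInChunk) const {
    const ToyEntry &e = entries_.at(entry);
    assert(offsetInChunk < e.length);
    return e.start + offsetInChunk;
  }

  // Map a location inside a chunk back to its spelling location.
  Offset spellingOf(std::size_t entry, Offset loc) const {
    const ToyEntry &e = entries_.at(entry);
    assert(loc >= e.start && loc < e.start + e.length);
    return e.spellingStart + (loc - e.start);
  }

  std::size_t entryCount() const { return entries_.size(); }

private:
  std::vector<ToyEntry> entries_;
  Offset next_ = 1;  // 0 is kept as an "invalid" marker in this sketch
};

// A token of the macro body: where it starts in the definition, and its length.
struct ToyToken {
  Offset offsetInDef;
  Offset length;
};

// Current scheme: every token coming out of the macro adds its own entry.
std::vector<Offset> expandPerToken(ToySourceSpace &space, Offset defStart,
                                   const std::vector<ToyToken> &body) {
  std::vector<Offset> locs;
  for (const ToyToken &t : body) {
    std::size_t e = space.reserve(defStart + t.offsetInDef, t.length);
    locs.push_back(space.locInChunk(e, 0));
  }
  return locs;
}

// Proposed scheme: one entry covers the whole definition, and each token's
// location is the chunk start plus the token's offset in the definition.
std::vector<Offset> expandPerChunk(ToySourceSpace &space, Offset defStart,
                                   Offset defLength,
                                   const std::vector<ToyToken> &body) {
  std::size_t e = space.reserve(defStart, defLength);
  std::vector<Offset> locs;
  for (const ToyToken &t : body)
    locs.push_back(space.locInChunk(e, t.offsetInDef));
  return locs;
}
```

For a body like "a + b" (three tokens, five characters), the per-token version adds three entries per instantiation while the chunked version adds one, so the saving grows with the number of tokens in the macro body.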

I don't expect this change to be too intrusive. I think the only functions that will be affected and need to change are SourceManager::isBeforeInTranslationUnit and SourceManager::isAtEndOfMacroInstantiation, both of which assume that two macro tokens are always ordered in the "source address space" according to their offset.
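
As a rough illustration of how that ordering assumption could break, consider a hypothetical `#define M(x) a x b` invoked as `M(z)`. The sketch below uses made-up helper names and made-up offsets, and it assumes the argument's chunk is reserved after the body's chunk (the exact allocation order is not settled in this proposal). It only checks whether raw offsets still increase in the order the tokens are emitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these helpers are hypothetical and are not Clang APIs.
using Offset = std::uint32_t;

// True if raw offsets strictly increase in the order the tokens were emitted.
bool offsetsFollowEmissionOrder(const std::vector<Offset> &locs) {
  for (std::size_t i = 1; i < locs.size(); ++i)
    if (locs[i - 1] >= locs[i])
      return false;
  return true;
}

// Current scheme: the emitted tokens `a`, `z`, `b` each reserve the next
// free offset in turn, so offsets match emission order.
std::vector<Offset> perTokenLocs(Offset next) {
  return {next, next + 1, next + 2};
}

// Proposed scheme: the body "a x b" takes one 5-offset chunk at
// [next, next + 5). The argument `z` is assumed to get its own chunk
// right after it, at next + 5.
std::vector<Offset> chunkedLocs(Offset next) {
  Offset bodyStart = next;
  Offset argStart = next + 5;
  // Emission order is a, z, b; `b` sits at offset 4 of the body chunk.
  return {bodyStart + 0, argStart, bodyStart + 4};
}
```

With `next = 10`, the chunked locations come out as 10, 15, 14: `b` is emitted after `z` but has a smaller offset, so a comparison based purely on offsets would order them the wrong way round.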

Does this seem reasonable? Speak now or forever hold your peace...