[cfe-dev] Preprocessed loc/token retrieval dream (almost) come true

Mon Oct 3 12:10:43 PDT 2011

On Oct 3, 2011, at 11:57 AM, Abramo Bagnara wrote:

> On 03/10/2011 20:25, Argyrios Kyrtzidis wrote:
>> Hi Abramo,
>> 
>> Sorry to disappoint you but I think the dream remains unfulfilled ;-)
> 
> You made me sad for a few minutes... but let's try to find a solution: I
> think that getting preprocessed tokens has too many benefits to stop only
> a few steps short of accomplishing that.
> 
> Let me know if you don't see strong benefits in being able to get
> the preprocessed tokens in a range.
> 
> First the easy part:
> 
>> Apart from that, this is trying to deal with macro expansions; how are
>> you handling preprocessor directives? e.g.:
>> 
>> X
>> #if  ...
>> Y
>> #else
>> X
>> #endif
>> 
>> How do you find out what comes after 'X' if you don't preprocess ?
> 
> Preprocessor callbacks give us complete info about skipped areas, so the
> helper just has to take that into account.
> 
> The same is true for file changes:
> 
> X
> #include "..."
> Y
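
To make the skipped-area idea concrete, here is a toy sketch of such a helper. It is not Clang code: RawToken, SkippedRange and nextLiveToken are hypothetical names made up for illustration, and it assumes the skipped ranges have already been collected from the preprocessor callbacks and that the raw token list excludes the directive lines themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; none of these are Clang classes.
struct RawToken { std::size_t offset; std::string text; };
struct SkippedRange { std::size_t begin, end; };  // half-open [begin, end)

// Return the first raw token after position `index` that does not lie in any
// skipped range, or nullptr if every remaining token was skipped.
const RawToken* nextLiveToken(const std::vector<RawToken>& toks,
                              const std::vector<SkippedRange>& skipped,
                              std::size_t index) {
  assert(index < toks.size());
  for (std::size_t i = index + 1; i < toks.size(); ++i) {
    bool dead = false;
    for (const SkippedRange& r : skipped) {
      if (toks[i].offset >= r.begin && toks[i].offset < r.end) {
        dead = true;
        break;
      }
    }
    if (!dead) return &toks[i];
  }
  return nullptr;
}
```

For the X / #if / Y / #else / X case quoted above, if the #if branch is the skipped one, the token following the first X is the second X rather than Y. This covers directives only; it says nothing about macro expansions.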

How about investigating whether it is possible/viable to extend the preprocessor callbacks so that you can get at the macro-expanded locations/tokens in a reliable way?

> 
>> 
>> The code that you posted was a bit hard to follow, but correct me if I'm wrong:
>> You are recording all macro expansion points and, once you hit one, you enter the SLocEntry for the macro expansion and start lexing it. Is this correct?
> 
> Yes, and my tests show that it works very well in most cases.
> 
>> This may seem to work but it is not reliable. The main issue is that for macro argument expansion we do *not* guarantee that the range of the SLocEntry contains only the tokens that were actually lexed.
>> This is because we aggressively "merge" them to reduce the number of needed SLocEntries.
>> 
>> Here's an example:
>> 
>> #define M1 1
>> #define M2 2
>> #define M3 3
>> 
>> #define MA1(a,b,c) a c
>> #define MA2(x) x
>> 
>> MA2( MA1(M1, M2, M3) )
>> 
>> The tokens that MA2 ultimately receives are '1' and '3' but if you follow through and lex the SLocEntry that gets created for the macro arg expansion for MA2, you will notice that the length is 5 and it is actually a chunk encompassing "1 2 3".
>> 
>> So, from this chunk, only '1' and '3' and their respective locations were actually passed to the parser but you don't know that just by looking at the SLocEntry.
> 
> How can I avoid that "optimization" and thus verify the real memory
> impact with some huge and relevant testcases?

See TokenLexer::updateLocForMacroArgTokens.
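
The mismatch in the quoted example can be reproduced outside Clang with a small standalone check. This is only a sketch: STR/STR_ and the helper functions are names I made up for illustration, whitespace splitting stands in for real lexing, and the "1 2 3" string is just the chunk text described above, not something read out of a SLocEntry.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

#define M1 1
#define M2 2
#define M3 3

#define MA1(a,b,c) a c
#define MA2(x) x

// Helper macros (not part of the original example): stringify the fully
// expanded argument so that the expansion can be inspected at run time.
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)

// Split on whitespace; a stand-in for lexing that is adequate for this input.
std::vector<std::string> splitTokens(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> out;
  for (std::string tok; in >> tok;) out.push_back(tok);
  return out;
}

// The tokens that MA2 actually expands to.
std::vector<std::string> expandedTokens() {
  return splitTokens(STR(MA2( MA1(M1, M2, M3) )));
}

// The tokens obtained by re-lexing the merged chunk text from the example.
std::vector<std::string> chunkTokens() {
  return splitTokens("1 2 3");
}

// The expansion has two tokens, while the chunk re-lexes to three.
void checkExample() {
  assert(expandedTokens().size() == 2);
  assert(chunkTokens().size() == 3);
}
```

Re-lexing the chunk yields an extra '2' that was never handed to the parser, which is the unreliability described above.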

FYI, dropping the optimization and trying to guarantee that SLocEntries contain only lexed tokens has very little chance of getting into trunk; see the wins of the optimization here: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110822/045495.html

> 
> Many thanks for your help and your review.
> 
> Abramo