[cfe-dev] Macro expansion in the Rewriter?

Tue Nov 27 04:26:51 PST 2012

On 27/11/2012 10:23, David Chisnall wrote:
> On 27 Nov 2012, at 00:30, Eli Friedman wrote:
> 
>> It sounds like useful functionality.  We don't store whether an
>> identifier is an expanded macro or what it expanded to in any
>> convenient way, though, so it would be a pain to implement.
> 
> I investigated this over the weekend and came to a similar
> conclusion.  I have a student currently working on a code
> reformatting tool who wants to be able to see, from libclang, if a
> macro expansion contains open or close braces.  I'd assumed that this
> would be something easy to expose, but it seems that we don't
> actually have any way of finding the sequence of tokens generated by
> a macro expansion (this is generated by the preprocessor, but not
> stored anywhere).  Even the HTML Rewriter, which (given the output in
> the static analyser) I assumed would already have code for doing it,
> contains a half-implemented duplication of the macro expansion
> logic.
> 
> If someone's looking for a project, then factoring the macro
> expansion code out so that it could be rerun (the current code is
> destructive) would be very helpful.  It would also improve
> diagnostics a lot if you could say exactly what the macro expansion
> was, not just the chain of macros that caused it.

We investigated this possibility in the past (see
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-October/017638.html),
but we didn't find a suitable solution that avoided the veto on making
the Preprocessor slower in a non-negligible way.

Recently I've thought of an approach that should have minimal impact:

- suppose that the last two preprocessed tokens have locations Loc1 and
Loc2 respectively
- if Loc1 and Loc2 come from the same FileID (i.e. their spelling
locations are consecutive in the source), nothing happens (statistically
by far the most frequent case); otherwise a callback is invoked with
Loc1 and Loc2 as arguments
- the program using the clang library can implement this callback to
store the locations in a jump table (a DenseMap)
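A minimal C++ sketch of the recording side, under simplifying assumptions: the hypothetical `Loc` struct stands in for clang::SourceLocation (a buffer id playing the role of a FileID, plus a raw-token index), std::unordered_map replaces llvm::DenseMap, and `onDiscontinuity`, `observeTokenPair` and `recordJumps` are invented names, not Clang API. "Consecutive" is read as "Loc2 is the raw token right after Loc1 in the same buffer". Each macro expansion is assumed to get its own buffer id, so no location repeats in the stream (we understand this matches Clang allocating a fresh range of locations for every expansion).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for clang::SourceLocation: a buffer id (acting as a
// FileID) plus the index of a raw token inside that buffer.
struct Loc {
  int fileID;
  std::size_t tokenIndex;
  bool operator==(const Loc &O) const {
    return fileID == O.fileID && tokenIndex == O.tokenIndex;
  }
};

struct LocHash {
  std::size_t operator()(const Loc &L) const {
    return static_cast<std::size_t>(L.fileID) * 1000003u + L.tokenIndex;
  }
};

// Jump table: "after the token at key, the stream continues at value".
// std::unordered_map is used here in place of llvm::DenseMap.
using JumpTable = std::unordered_map<Loc, Loc, LocHash>;

// Hypothetical client callback, invoked only on a discontinuity.
void onDiscontinuity(JumpTable &Table, Loc Loc1, Loc Loc2) {
  assert(Table.count(Loc1) == 0 && "each location should occur only once");
  Table.emplace(Loc1, Loc2);
}

// The cheap check that would run for every pair of preprocessed tokens.
void observeTokenPair(JumpTable &Table, Loc Loc1, Loc Loc2) {
  bool Adjacent = Loc1.fileID == Loc2.fileID &&
                  Loc1.tokenIndex + 1 == Loc2.tokenIndex;
  if (!Adjacent)
    onDiscontinuity(Table, Loc1, Loc2);
}

// Feeds a whole preprocessed stream through the check, pair by pair.
JumpTable recordJumps(const std::vector<Loc> &Stream) {
  JumpTable Table;
  for (std::size_t I = 1; I < Stream.size(); ++I)
    observeTokenPair(Table, Stream[I - 1], Stream[I]);
  return Table;
}
```

For `int x = N ;` in buffer 0, with N expanding to `( 4 + 2 )` in buffer 1, only the two boundaries of the expansion (`=` to `(` and `)` to `;`) land in the table; every other pair takes the fast path.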

When the preprocessed token sequence is needed, ordinary relexing is
used, except that whenever we reach a location present in the jump
table we continue from the location it maps to.
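The replay step can be sketched the same way. In this hypothetical model each buffer is already split into raw tokens, so "relexing" at a location just reads the token at that index; the simplified `Loc`/`JumpTable` stand-ins are repeated so the block compiles on its own, and `Buffers` and `replay` are invented names. Keeping each token's location in the output is what lets a client tell which tokens came from an expansion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for clang::SourceLocation: buffer id + raw-token index.
struct Loc {
  int fileID;
  std::size_t tokenIndex;
  bool operator==(const Loc &O) const {
    return fileID == O.fileID && tokenIndex == O.tokenIndex;
  }
};

struct LocHash {
  std::size_t operator()(const Loc &L) const {
    return static_cast<std::size_t>(L.fileID) * 1000003u + L.tokenIndex;
  }
};

using JumpTable = std::unordered_map<Loc, Loc, LocHash>;

// Each buffer pre-split into raw tokens (a simplification of real relexing).
using Buffers = std::unordered_map<int, std::vector<std::string>>;

// Rebuilds the preprocessed stream starting at Start: emit the raw token at
// the current position, then follow the jump table if it has an entry for
// that position, otherwise step to the next raw token of the same buffer.
// Stops at the end of a buffer that has no recorded jump.
std::vector<std::pair<Loc, std::string>>
replay(const Buffers &Bufs, const JumpTable &Jumps, Loc Start) {
  std::vector<std::pair<Loc, std::string>> Out;
  Loc Pos = Start;
  while (true) {
    const std::vector<std::string> &Toks = Bufs.at(Pos.fileID);
    assert(Pos.tokenIndex < Toks.size() && "position out of range");
    Out.emplace_back(Pos, Toks[Pos.tokenIndex]);
    auto It = Jumps.find(Pos);
    if (It != Jumps.end()) {
      Pos = It->second;  // discontinuity recorded by the callback
    } else if (Pos.tokenIndex + 1 < Toks.size()) {
      ++Pos.tokenIndex;  // ordinary relexing: next raw token
    } else {
      break;             // end of buffer with no recorded jump
    }
  }
  return Out;
}
```

Given buffer 0 = `int x = N ;`, buffer 1 = `( 4 + 2 )` and the two entries (0,2) to (1,0) and (1,4) to (0,4), replay from (0,0) yields `int x = ( 4 + 2 ) ;`, and the tokens carrying buffer id 1 are exactly those produced by the expansion (so, for instance, a tool could check whether an expansion contains `{` or `}`).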

This makes it possible not only to know the exact preprocessed token
stream, but also to have every detail about the expansion of every
single token in the sequence.

I hope that this time we can reach a general consensus on adding this
important missing feature.

-- 
Abramo Bagnara

BUGSENG srl - http://bugseng.com
mailto:abramo.bagnara at bugseng.com