[cfe-dev] Get full source of multiple macro definitions using libtooling

John Bartelme via cfe-dev cfe-dev at lists.llvm.org
Mon May 9 07:13:46 PDT 2016


On May 9, 2016 9:24 AM, "Manuel Klimek" <klimek at google.com> wrote:
>
>
>
> On Mon, May 9, 2016 at 2:02 PM John Bartelme <bartelme at gmail.com> wrote:
>>
>> On Mon, May 9, 2016 at 5:35 AM, Manuel Klimek <klimek at google.com> wrote:
>>>
>>> On Fri, May 6, 2016 at 9:04 PM John Bartelme via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
>>>>
>>>> Hopefully this is an acceptable list to ask a question about
libtooling on:
>>>>
>>>> Ultimately I'm trying to pull out relevant structures from thousands of
>>>> existing C and C++ header files. I've been able to use libtooling to pull
>>>> out a structure and all of the structures/enums/typedefs etc. it relies on
>>>> from various headers. Unfortunately, when I get the source range backing
>>>> the Decls, the text still references the macros defined in those headers.
>>>> I'm currently trying to find a way to access and print the source of these
>>>> macros, but I'm not having much luck when multiple macros are used in one
>>>> declaration.
>>>>
>>>> For example:
>>>>
>>>> #define INT int
>>>> #define UNSIGNED unsigned
>>>> #define NAME name
>>>>
>>>> typedef struct {
>>>>    UNSIGNED long INT NAME;
>>>> } test;
>>>>
>>>> When I get the FieldDecl corresponding to name and get the SourceRange
I see the spelling location pointing to "#define UNSIGNED unsigned".
>>>
>>>
>>> With that you probably mean the spelling location of the start
location? A SourceRange doesn't have a spelling *location* :)
>>
>>
>> That's correct.  I do mean the start location of the range. Sorry for the
>> confusion.  I guess my first question is whether, given a range, I can get
>> to every SourceLocation in it that comes from a macro (with its associated
>> spelling location), or whether I need to go back to the Decl to get the
>> next range/SourceLocation.
>>>
>>>
>>>>
>>>> I'd like to know how to get to the other macro definition's source
locations. I know that when I change "UNSIGNED long INT NAME;" to "unsigned
long INT NAME;" the spelling location will then point to "#define INT int".
>>>
>>>
>>> Again, I'm not sure which location you're using.
>>>
>>>>
>>>> It seems as if declaration names are treated differently though as
changing to "unsigned long int NAME;" leaves me with no spelling location.
>>>>
>>>> Is there a way to get multiple spelling locations given a SourceRange?
>>>> Do I need to narrow down the source range some other way? I've tried
>>>> lexing to the next token, but that doesn't leave me with a new spelling
>>>> location. I'm also going to have to account for macros in array bounds
>>>> such as "int bob[MAX_WIDTH][MAX_HEIGHT]", but I'm hoping that will become
>>>> clear once I figure out my issues here. Thanks in advance for any help
>>>> that can be provided.  john
>>>
>>>
>>> All the info is in the SourceRange / SourceLocation; SourceLocation
actually provides all relevant instantiation points.
>>> It depends on:
>>> - which source location you're querying against; if you have the Decl,
>>> like FieldDecl, generally getLocation() will get you the name (that is,
>>> the spelling loc will point at 'name' inside "#define NAME name" and the
>>> expansion loc will point at the NAME in "UNSIGNED long INT NAME;").
>>> - whether you really want a range; for ranges, there's
>>> Lexer::makeFileCharRange and Lexer::getSourceText
>>
>>
>> Is there an easy way to iterate through all the source locations that
>> would contain macro expansions?
>
>
>>
>> I've had good luck with nested macros by tracing the immediate expansion
>> locations from the original spelling location, but no luck in trying to get
>> to another SourceLocation that has a different spelling location than the
>> first macro in the statement.  I thought perhaps I needed to walk through
>> the different QualType/Type classes associated with the field, but then I
>> wasn't sure how to peel those back to reach every macro expansion, or how
>> to map them back to their SourceLocations.  To summarize: given a generic
>> statement like the one above, "UNSIGNED long INT NAME;", I want to be able
>> to pull out the SourceLocations (three in this case) that are associated
>> with the appropriate spelling locations.  Thanks so much, john
>
>
> I think that'll be hard, mainly because I think nobody has imagined that
use case yet :)
> Perhaps you can tell us the higher level picture of what you're trying to
do? Given that, often there is a much simpler solution.

Thanks for the response.  As I alluded to in the opening paragraph, I have
thousands of legacy header files with a varied mix of linkages and
interdependencies.  I need to extract various structures and typedefs, along
with all the structures/enums etc. they depend on, from these files and
produce one small, cohesive header.  I looked around a lot to see what
technologies I could leverage to this end and ultimately decided that
libtooling would give me the best shot at it.  I was able to get everything
pulled out except for the statements that have multiple macros, as described
earlier.  I'm using Lexer::getSourceText to access the source for the given
range.  Perhaps there is a way to print out the source after it has already
been preprocessed and macros expanded?  I'm very open to switching gears if
there is an easier way to do this.
Thanks, john
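
A minimal sketch of the behaviour being asked for, written as a
self-contained toy model in plain C++ rather than against Clang's API. It
assumes object-like macros only and whitespace-separated tokens; the names
ExpandedToken, MacroDef, expandLine, expandedText and macroOrigins are
hypothetical and exist only for this illustration. The point it shows is that
every expanded token carries its own origin, so all three #define lines can
be recovered by walking the declaration token by token instead of asking only
for the start of the range.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy model (an assumption for illustration, not Clang's data structures):
// one token after expansion, plus where its text was originally spelled.
struct ExpandedToken {
    std::string text;   // token text after macro expansion
    std::string macro;  // macro name it came from, empty if none
    int defineLine;     // line of the #define, 0 if spelled in place
};

// An object-like macro: replacement text and the line of its #define.
struct MacroDef {
    std::string body;
    int line;
};

// Expand whitespace-separated tokens against the macro table.
// A trailing ';' is split off so that "NAME;" is looked up as "NAME".
std::vector<ExpandedToken> expandLine(
        const std::string &line,
        const std::map<std::string, MacroDef> &macros) {
    std::vector<ExpandedToken> out;
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        bool semi = !word.empty() && word.back() == ';';
        if (semi) word.pop_back();
        if (!word.empty()) {
            auto it = macros.find(word);
            if (it != macros.end())
                out.push_back({it->second.body, word, it->second.line});
            else
                out.push_back({word, "", 0});
        }
        if (semi) out.push_back({";", "", 0});
    }
    return out;
}

// Join the expanded tokens back into source text.
std::string expandedText(const std::vector<ExpandedToken> &toks) {
    std::string s;
    for (const auto &t : toks) {
        if (!s.empty() && t.text != ";") s += ' ';
        s += t.text;
    }
    return s;
}

// Collect the #define line of every token that came from a macro,
// in the order the tokens appear in the declaration.
std::vector<int> macroOrigins(const std::vector<ExpandedToken> &toks) {
    std::vector<int> lines;
    for (const auto &t : toks)
        if (t.defineLine != 0) lines.push_back(t.defineLine);
    return lines;
}
```

With the three #defines from the example registered at lines 1 to 3,
expandLine("UNSIGNED long INT NAME;", macros) yields five tokens,
expandedText gives the fully expanded declaration, and macroOrigins returns
one entry per macro rather than only the first.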