[cfe-dev] Determining macros used in a function or SourceRange (using clang plugin)

Eric Bayer via cfe-dev cfe-dev at lists.llvm.org
Tue Sep 27 18:59:28 PDT 2016


Hi Alex,

Thanks again for all your help.  I did eventually manage to make this 
work with a huge amount of Lexer/Token magic. Unfortunately I could 
not follow it past one level, so it looks like an unworkable solution 
for what I have in mind. I had thought the macro definitions were 
expanded from other macros. After digging through the SLocEntries, 
this is clearly not the case: they are expanded at their final uses.  
This means I'm going back to the PPCallbacks and digging in there.  I 
think I can get the whole tree without annotating every macro I meet, 
since MacroExpands apparently gets called repeatedly at the point the 
macro gets used.
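The PPCallbacks route can be sketched roughly like this (a minimal, untested sketch; the class name MacroTracker and the registration comment are made up, but MacroExpands and addPPCallbacks are the real Clang hooks):

```cpp
#include "clang/Lex/MacroInfo.h"
#include "clang/Lex/PPCallbacks.h"
#include "clang/Lex/Preprocessor.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;

// Record every macro expansion as it happens.  Since nested macro uses
// inside a definition are reported at the point of final use,
// MacroExpands fires once per expanded macro in the chain.
class MacroTracker : public PPCallbacks {
  const SourceManager &SM;

public:
  MacroTracker(const SourceManager &SM) : SM(SM) {}

  void MacroExpands(const Token &MacroNameTok, const MacroDefinition &MD,
                    SourceRange Range, const MacroArgs *Args) override {
    llvm::errs() << "expanded "
                 << MacroNameTok.getIdentifierInfo()->getName() << " at ";
    Range.getBegin().print(llvm::errs(), SM);
    llvm::errs() << "\n";
    // The definition's bounds are available here via
    // MD.getMacroInfo()->getDefinitionLoc() / getDefinitionEndLoc().
  }
};

// Registration, e.g. inside a FrontendAction, before parsing starts:
//   Preprocessor &PP = CI.getPreprocessor();
//   PP.addPPCallbacks(std::make_unique<MacroTracker>(CI.getSourceManager()));
```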

Kind Regards,
    -Eric

On 9/26/2016 10:41 PM, Eric Bayer wrote:
> Alex,
>
> First off, thanks so much for your help (and probably your patience 
> at this point).  Okay, that all works with a few tweaks.  I spent 
> most of the day trying to figure out how to get the definition.  I 
> have been looking at getSpellingLoc(), which seems to get me one end 
> of it, but I can't figure out how to find the other end.  If this 
> were just a string, I'd look until I found a line break that wasn't 
> preceded by a \.  So far I have tried constructing a lexer and using 
> ReadToEndOfLine() and LexFromRawLexer() based on some things I found 
> online.  Neither seemed to work.  My eventual goal is to get another 
> SourceRange and check it for macros as well, etc.; right now the 
> return is a StringRef just for debugging.  I.e., I want to check for 
> any macro dependency trees.  I've attached the code below of what I 
> tried.  ReadToEndOfLine() seems to never advance anything, and 
> LexFromRawLexer() never seems to come across a tok::eod.  :/ Some 
> output below the function clip.  Maybe there's an entirely easier 
> approach?
>
>    -Eric
>
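The string-based approach mentioned above (scan until a line break not preceded by a backslash) can be sketched in plain C++, independent of Clang; endOfDefineBody is a made-up helper name, and it assumes Unix newlines:

```cpp
#include <cstddef>
#include <string>

// Return the offset just past the last character of a #define body that
// starts at `start`: scan line by line, treating a backslash immediately
// before the newline as a line continuation.
size_t endOfDefineBody(const std::string &buf, size_t start) {
  size_t pos = start;
  while (pos < buf.size()) {
    size_t nl = buf.find('\n', pos);
    if (nl == std::string::npos)
      return buf.size();              // definition runs to end of buffer
    if (nl > start && buf[nl - 1] == '\\') {
      pos = nl + 1;                   // continuation: keep scanning
      continue;
    }
    return nl;                        // first unescaped newline ends it
  }
  return buf.size();
}
```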
> StringRef getTokensThroughEndOfDefine(SourceLocation BeginLoc,
>                                       SourceManager &SM) {
>   const LangOptions &LangOpts = getDefaultLangOpts();
>   SourceLocation CurLoc = BeginLoc;
>   SourceLocation NextLoc;
>   int iter = 0;
>
>   std::pair<FileID, unsigned> cur_info = SM.getDecomposedLoc(BeginLoc);
>   bool invalid = false;
>   StringRef buf = SM.getBufferData(cur_info.first, &invalid);
>
>   if (invalid) {
>     return StringRef();
>   }
>
>   // Get the point in the buffer
>   const char *point = buf.data() + cur_info.second;
>
>   // Make a lexer and point it at our buffer and offset
>   Lexer lexer(SM.getLocForStartOfFile(cur_info.first), LangOpts,
>               buf.begin(), point, buf.end());
>
>   while (1) {
>     // read through the end of line
>     SmallString<128> text;
>     lexer.ReadToEndOfLine(&text);
>
>     if (text.back() != '\\') {
>       break;
>     }
>
>     llvm::errs() << "Incomplete line, so far: " <<
>         getCodeString(SM, BeginLoc, lexer.getFileLoc(), "Token") << "\n";
>   }
>
>   return getCodeString(SM, BeginLoc, lexer.getFileLoc(), "Definition");
> #if 0
>   Token tok;
>   while (1) {
>     lexer.LexFromRawLexer(tok);
>
>     if (tok.is(tok::eof) || tok.is(tok::eod)) {
>       break;
>     }
>
>     llvm::errs() << "Token[" << tok.getName() << "]: \"" <<
>         getCodeString(SM, tok.getLocation(), tok.getEndLoc(), "Token") <<
>         "\"\n";
>   }
>
>   return getCodeString(SM, BeginLoc, tok.getEndLoc(), "Definition");
> #endif
> }
>
> Example failure on tokens:  (And ignore the fact that we're sort of 
> printing two tokens on every line: getEndLoc() really seems to be the 
> next token, and getCodeString() seems to print on token boundaries.)
>
> Macro name: ASSERT
> Macro string: ASSERT((getFirstMatchingOnly && firstMatching != nullptr) ||
>           (!getFirstMatchingOnly && (allMatchingMo != nullptr ||
>                                      allMatchingMoRef != nullptr)))
> Token[raw_identifier]: "ASSERT_IFNOT("
> Token[l_paren]: "(cond"
> Token[raw_identifier]: "cond,"
> Token[comma]: ","
> Token[raw_identifier]: "_ASSERT_PANIC("
> Token[l_paren]: "(AssertAssert"
> Token[raw_identifier]: "AssertAssert)"
> Token[r_paren]: "))"
> Token[r_paren]: ")"                                   <---- I'd expect 
> an eod token here.  Guessing, though.
> Token[hash]: "#define"
> Token[raw_identifier]: "define"
> ...
>
> On 9/26/2016 3:12 PM, Alex L wrote:
>>
>>
>> On 26 September 2016 at 14:55, Eric Bayer <ebayer at vmware.com 
>> <mailto:ebayer at vmware.com>> wrote:
>>
>>     Thanks Alex,
>>
>>     That gets me mostly there.  Pardon if that is a dumb question,
>>     but I'm not sure how I go from a SourceLocation to a Token.  I
>>     have not worked at all in the preprocessor levels before.
>>
>>
>> Something like this should work:
>>
>>     StringRef getToken(SourceLocation BeginLoc, SourceManager &SM,
>>                        LangOptions &LangOpts) {
>>       const SourceLocation EndLoc =
>>           Lexer::getLocForEndOfToken(BeginLoc, 0, SM, LangOpts);
>>       return Lexer::getSourceText(
>>           CharSourceRange::getTokenRange(BeginLoc, EndLoc), SM, LangOpts);
>>     }
>>
>
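As an aside on finding the end of a #define without re-lexing by hand: when the Preprocessor is available, MacroInfo already records the definition's bounds. A rough, untested sketch (getMacroDefinitionText is a made-up helper name, but the MacroInfo and Lexer calls are the real Clang APIs):

```cpp
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/MacroInfo.h"
#include "clang/Lex/Preprocessor.h"

using namespace clang;

// Get the full source text of a macro's definition via MacroInfo,
// instead of scanning the buffer for line continuations.
StringRef getMacroDefinitionText(Preprocessor &PP, const IdentifierInfo *II) {
  const MacroInfo *MI = PP.getMacroInfo(II);
  if (!MI)
    return StringRef();
  SourceManager &SM = PP.getSourceManager();
  const LangOptions &LangOpts = PP.getLangOpts();
  // getDefinitionEndLoc() points at the start of the last token of the
  // definition, so use a token range to include that token's full text.
  CharSourceRange Range = CharSourceRange::getTokenRange(
      MI->getDefinitionLoc(), MI->getDefinitionEndLoc());
  return Lexer::getSourceText(Range, SM, LangOpts);
}
```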


