[cfe-dev] Preprocessed loc/token retrieval dream (almost) come true

Argyrios Kyrtzidis kyrtzidis at apple.com
Mon Oct 3 11:25:57 PDT 2011


Hi Abramo,

Sorry to disappoint you, but I think the dream remains unfulfilled ;-)

On Sep 30, 2011, at 2:09 AM, Abramo Bagnara wrote:

> Ping and direct questions below.
> 
> On 24/09/2011 17:15, Abramo Bagnara wrote:
>> 
>> Clang has always lacked the ability to reconstruct the preprocessed
>> token stream from a given location (without redoing the full preprocessing).
>> 
>> Thanks to recent changes from Chandler and Argyrios I'm now able to get
>> the next parsed token location in a reliable way.
>> 
>> I attach the code I currently use, for review and to check whether there is
>> interest in having these helpers in the clang library (IMHO this service is
>> *very* useful and is currently badly approximated in HTMLRewrite.cpp).
> 
> Is there interest in having, in the clang library, methods that return
> the locations of all the tokens following a starting location, in
> preprocessing order? This would make it possible to know whether *all*
> the locations in a specific range satisfy a given property, to get the
> missing locations, to scan the exact preprocessed sequence of
> type/storage specifiers, etc.

The code that you posted was a bit hard to follow, so correct me if I'm wrong:
you are recording all macro expansion points and, once you hit one, you enter the SLocEntry for the macro expansion and start lexing it. Is this correct?

This may seem to work, but it is not reliable. The main issue is that for macro argument expansions we do *not* guarantee that the range of the SLocEntry contains only the tokens that were actually lexed.
This is because we aggressively "merge" them to reduce the number of SLocEntries needed.

Here's an example:

#define M1 1
#define M2 2
#define M3 3

#define MA1(a,b,c) a c
#define MA2(x) x

MA2( MA1(M1, M2, M3) )

The tokens that MA2 ultimately receives are '1' and '3', but if you follow through and lex the SLocEntry that gets created for the macro arg expansion for MA2, you will notice that its length is 5 and that it is actually a chunk encompassing "1 2 3".

So, from this chunk, only '1' and '3' and their respective locations were actually passed to the parser, but you cannot tell that just by looking at the SLocEntry.
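
To make the problem concrete, here is a toy model (plain C++, not clang's SourceManager) of the situation above: the merged macro-argument entry is treated as an offset range over the spelled text "1 2 3", while the tokens actually handed to the parser are tracked separately. Re-lexing the range yields a token the parser never saw. The whitespace-splitting lexer and the names `SpelledTok`, `lex_range`, and `unseen_tokens` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

struct SpelledTok {
  unsigned offset;   // offset of the token in the spelling buffer
  std::string text;
};

// Toy stand-in for raw lexing: splits [begin, begin + len) on whitespace.
// Assumes begin + len <= buf.size().
std::vector<SpelledTok> lex_range(const std::string& buf, unsigned begin,
                                  unsigned len) {
  std::vector<SpelledTok> toks;
  unsigned i = begin, end = begin + len;
  while (i < end) {
    if (std::isspace(static_cast<unsigned char>(buf[i]))) { ++i; continue; }
    unsigned start = i;
    while (i < end && !std::isspace(static_cast<unsigned char>(buf[i]))) ++i;
    toks.push_back({start, buf.substr(start, i - start)});
  }
  return toks;
}

// Tokens that lie inside the entry's range but were never passed on.
std::vector<std::string> unseen_tokens(const std::string& buf, unsigned begin,
                                       unsigned len,
                                       const std::set<unsigned>& passed) {
  std::vector<std::string> out;
  for (const SpelledTok& t : lex_range(buf, begin, len))
    if (!passed.count(t.offset))
      out.push_back(t.text);
  return out;
}
```

Nothing stored in the range itself distinguishes '2' from '1' and '3'; the `passed` set stands for information that only the preprocessor has while it runs.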


Apart from that, this approach deals with macro expansions; how are you handling preprocessor directives? E.g.:

X
#if  ...
Y
#else
X
#endif

How do you find out what comes after 'X' if you don't preprocess ?
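
A toy illustration (not clang code) of why raw lexing cannot answer that question: scanning the raw text after 'X' finds the '#' of the directive, while the token the parser actually sees next depends on the value of the #if condition, which only the preprocessor knows. Here the condition is passed in as a plain bool, each line holds either one token or one directive, and only a single, non-nested #if/#else/#endif is handled; all of these are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Next token after lines[0] as a raw lexer sees it: directives are not
// interpreted, so a line starting with '#' just yields the '#' token.
std::string raw_next(const std::vector<std::string>& lines) {
  if (lines.size() < 2) return "";
  return lines[1][0] == '#' ? "#" : lines[1];
}

// Next token after lines[0] as the parser sees it, given the value of the
// (single, non-nested) #if condition: directive lines and skipped groups
// produce no tokens.
std::string preprocessed_next(const std::vector<std::string>& lines,
                              bool cond) {
  bool active = true;
  for (std::size_t i = 1; i < lines.size(); ++i) {
    const std::string& l = lines[i];
    if (l.rfind("#if", 0) == 0) { active = cond; continue; }
    if (l.rfind("#else", 0) == 0) { active = !cond; continue; }
    if (l.rfind("#endif", 0) == 0) { active = true; continue; }
    if (active) return l;
  }
  return "";
}
```

With the example above, the raw answer is always '#', while the preprocessed answer is 'Y' or 'X' depending on a value that is only known by evaluating the directive.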

> 
>> The code also shows some likely bugs in how clang stores locations, namely:
>> 
>> - the SLocEntry for a macro arg expansion has an extra token at the end, and
>> this is not taken into account when computing isInFileID (a
>> workaround for that is in the attached code)
> 
> Is this intended, or should it be considered a bug?

No, this is the nature of SLocEntry; it is not reliable for finding out the preprocessed tokens.

> 
>> - the immediate expansion range of stringified tokens encloses only '#' and
>> not '# arg' (this is why the helper gets confused there)
> 
> Is this intended, or should it be considered a bug?

This is reasonable and a good idea.

> 
>> - the immediate expansion range of concatenated tokens encloses only '##' and
>> not 'x ## y' (this is why the helper gets confused there)
> 
> Is this intended, or should it be considered a bug?

As is this.
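
For reference, here is a sketch of the wider ranges being discussed, computed over a hand-built token list for a macro body (character offsets into the body text, not clang SourceLocations). `BodyTok`, `stringify_range`, and `paste_range` are hypothetical: the first widens the range of '#' to end at its operand ("# arg"), the second spans from the left operand of '##' to the right one ("x ## y").

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct BodyTok {
  unsigned begin, end;  // half-open character range in the macro body
  std::string text;
};
using Range = std::pair<unsigned, unsigned>;

// Range of the '#' at index i, extended to cover its operand: "# arg".
Range stringify_range(const std::vector<BodyTok>& toks, std::size_t i) {
  assert(i + 1 < toks.size() && toks[i].text == "#");
  return {toks[i].begin, toks[i + 1].end};
}

// Range of the '##' at index i, extended to cover both operands: "x ## y".
Range paste_range(const std::vector<BodyTok>& toks, std::size_t i) {
  assert(i > 0 && i + 1 < toks.size() && toks[i].text == "##");
  return {toks[i - 1].begin, toks[i + 1].end};
}
```

For the body "x ## y", the '##' token alone covers [2, 4), while the paste range covers the whole [0, 6).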

-Argyrios

> 
>> 
>> The code currently does not yet take into account file changes due to
>> #include, but I think this is a minor point and perhaps fixable.
>> 
>> To do its work, parser_loc_get_pp_next needs a reverse map to be loaded,
>> so as to know which tokens are expansion points (i.e. a SourceLocation for
>> each macro SLocEntry).
>> 
>> typedef llvm::DenseMap<unsigned, clang::SrcMgr::SLocEntry> Exp_Map;
>> 
>> Exp_Map exp_map;
>> 
>> void load_exp_map() {
>>  using namespace clang;
>>  SourceManager& sm = get_source_manager();
>>  int i, last = sm.local_sloc_entry_size();
>>  for (i = 0; i < last; ++i) {
>>    FileID fid;
>>    // This method is private.
>>    // fid = FileID::get(i);
>>    // Ugly dirty trick is needed
>>    *reinterpret_cast<int*>(&fid) = i;
>>    SrcMgr::SLocEntry entry = sm.getSLocEntry(fid);
>>    if (!entry.isExpansion())
>>      continue;
>>    SourceLocation from = entry.getExpansion().getExpansionLocStart();
>>    exp_map[from.getRawEncoding()] = entry;
>>  }
>> }
>> 
>> clang::SourceLocation parser_loc_get_pp_next(clang::SourceLocation cur) {
>>  using namespace clang;
>>  SourceManager& sm = get_source_manager();
>>  const clang::LangOptions& lo = get_lang_options();
>>  assert(exp_map.find(cur.getRawEncoding()) == exp_map.end());
>>  SourceLocation next;
>>  while (1) {
>>    std::pair<FileID, unsigned> cur_info = sm.getDecomposedLoc(cur);
>>    SourceLocation scur = sm.getSpellingLoc(cur);
>>    std::pair<FileID, unsigned> scur_info = sm.getDecomposedLoc(scur);
>>    bool invalid = false;
>>    StringRef buf = sm.getBufferData(scur_info.first, &invalid);
>>    if (invalid)
>>      return SourceLocation();
>>    const char* point = buf.data() + scur_info.second;
>>    Lexer lexer(sm.getLocForStartOfFile(scur_info.first), lo,
>>                buf.begin(), point, buf.end());
>>    Token tok;
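>>    // The first raw lex returns the token at 'point' (the current token);
>>    // the second returns the token that follows it.
>>    lexer.LexFromRawLexer(tok);
>>    lexer.LexFromRawLexer(tok);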
>>    if (tok.is(tok::eof)) {
>>      if (!cur.isMacroID())
>>        return SourceLocation();
>>    }
>>    else {
>>      SourceLocation snext = tok.getLocation();
>>      unsigned dist = sm.getFileOffset(snext) - scur_info.second;
>>      // Dirty trick to apply offset to macro loc
>>      next = SourceLocation::getFromRawEncoding(cur.getRawEncoding() + dist);
>>      // The following conditional is needed only to workaround a
>>      // likely bug in SourceManager::isInFileID when called with macro arg
>>      // expansions.
>>      if (sm.isMacroArgExpansion(cur)) {
>>        // Dirty trick to apply offset to macro loc
>>        if (sm.isInFileID(SourceLocation::getFromRawEncoding(
>>                cur.getRawEncoding() + dist + 1), cur_info.first))
>>          break;
>>      }
>>      else {
>>        if (sm.isInFileID(next, cur_info.first))
>>          break;
>>      }
>>    }
>>    cur = sm.getImmediateExpansionRange(cur).second;
>>  }
>>  while (1) {
>>    Exp_Map::iterator i = exp_map.find(next.getRawEncoding());
>>    if (i == exp_map.end())
>>      break;
>>    SrcMgr::SLocEntry entry = i->second;
>>    // This method is private.
>>    // next = SourceLocation::getMacroLoc(entry.getOffset());
>>    // Ugly dirty trick is needed
>>    next = SourceLocation::getFromRawEncoding(entry.getOffset() | (1U << 31));
>>    assert(next.isMacroID());
>>  }
>>  return next;
>> }
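
The "dirty tricks" in this code rely on the raw SourceLocation encoding it assumes: a 32-bit value whose top bit (1 << 31, as used in the getMacroLoc workaround) marks macro locations, with the remaining bits holding an offset. Below is a standalone model of that encoding and of the final loop, which keeps replacing `next` while it is the start of a recorded expansion. `make_macro_loc`, `advance_loc`, `resolve_expansion_start`, and the use of std::unordered_map instead of llvm::DenseMap are assumptions for illustration; nothing here calls clang.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Model of the raw encoding the tricks depend on: the top bit flags a
// macro location; the low 31 bits are an offset.
constexpr std::uint32_t kMacroIDBit = 1U << 31;

std::uint32_t make_macro_loc(std::uint32_t offset) {
  return offset | kMacroIDBit;
}
bool is_macro_loc(std::uint32_t raw) { return (raw & kMacroIDBit) != 0; }
std::uint32_t loc_offset(std::uint32_t raw) { return raw & ~kMacroIDBit; }

// The "apply offset to macro loc" trick: adding a byte distance to the raw
// value keeps the macro bit intact as long as the offset does not overflow.
std::uint32_t advance_loc(std::uint32_t raw, std::uint32_t dist) {
  std::uint32_t r = raw + dist;
  assert(is_macro_loc(r) == is_macro_loc(raw));
  return r;
}

// Reverse map: raw encoding of an expansion point -> offset of that
// expansion's SLocEntry (what load_exp_map records, reduced to offsets).
using ExpMap = std::unordered_map<std::uint32_t, std::uint32_t>;

// Mirrors the final loop of parser_loc_get_pp_next: while 'next' is itself
// the start of an expansion, step to the first location of that expansion.
// Expansions nested at the same point (a macro whose body begins with
// another macro) are followed all the way down.
std::uint32_t resolve_expansion_start(std::uint32_t next, const ExpMap& m) {
  for (auto it = m.find(next); it != m.end(); it = m.find(next))
    next = make_macro_loc(it->second);
  return next;
}
```

For example, with `#define A B x` and `#define B 1`, the expansion of A starts at 'A' in the file and its first token 'B' is itself an expansion point, so the loop ends at the first location of B's expansion.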
