[cfe-dev] Preprocessed loc/token retrieval dream (almost) come true

Abramo Bagnara abramo.bagnara at gmail.com
Sat Sep 24 08:15:02 PDT 2011


Clang has always lacked a way to reconstruct the preprocessed
token stream from a given location (without redoing the full preprocessing).

Thanks to recent changes from Chandler and Argyrios I'm now able to get
the next parsed token location in a reliable way.

I attach the code I currently use, both for review and to check whether
there is interest in having these helpers in the clang library (IMHO this
service is *very* useful and is currently badly approximated in
HTMLRewrite.cpp).

Using the code also reveals some likely bugs in how clang stores
locations, namely:

- the SLocEntry for a macro arg expansion has an extra token at the end,
and this is not taken into account when computing isInFileID (a
workaround for that is in the attached code)

- the immediate expansion range of stringified tokens encloses only '#'
and not '# arg' (this implies that the helper gets confused there)

- the immediate expansion range of concatenated tokens encloses only '##'
and not 'x ## y' (this implies that the helper gets confused there)

The code still does not take into account file changes due to
#include, but I think this is a minor point and perhaps fixable.

To do its work, parser_loc_get_pp_next needs a reverse map to be loaded
first, so that it knows which tokens are expansion points (i.e. a
SourceLocation for each macro SLocEntry).

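#include <cassert>
#include <utility>
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Lexer.h"
#include "llvm/ADT/DenseMap.h"

// get_source_manager() and get_lang_options() are helpers from the
// surrounding tool and are not shown here.

// Maps the raw encoding of an expansion point to its macro SLocEntry.
typedef llvm::DenseMap<unsigned, clang::SrcMgr::SLocEntry> Exp_Map;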

Exp_Map exp_map;

void load_exp_map() {
  using namespace clang;
  SourceManager& sm = get_source_manager();
  int i, last = sm.local_sloc_entry_size();
  for (i = 0; i < last; ++i) {
    FileID fid;
    // This method is private.
    // fid = FileID::get(i);
    // Ugly dirty trick is needed
    *reinterpret_cast<int*>(&fid) = i;
    SrcMgr::SLocEntry entry = sm.getSLocEntry(fid);
    if (!entry.isExpansion())
      continue;
    SourceLocation from = entry.getExpansion().getExpansionLocStart();
    exp_map[from.getRawEncoding()] = entry;
  }
}

clang::SourceLocation parser_loc_get_pp_next(clang::SourceLocation cur) {
  using namespace clang;
  SourceManager& sm = get_source_manager();
  const clang::LangOptions& lo = get_lang_options();
  assert(exp_map.find(cur.getRawEncoding()) == exp_map.end());
  SourceLocation next;
  while (1) {
    std::pair<FileID, unsigned> cur_info = sm.getDecomposedLoc(cur);
    SourceLocation scur = sm.getSpellingLoc(cur);
    std::pair<FileID, unsigned> scur_info = sm.getDecomposedLoc(scur);
    bool invalid = false;
    StringRef buf = sm.getBufferData(scur_info.first, &invalid);
    if (invalid)
      return SourceLocation();
    const char* point = buf.data() + scur_info.second;
    Lexer lexer(sm.getLocForStartOfFile(scur_info.first), lo,
                buf.begin(), point, buf.end());
    Token tok;
    lexer.LexFromRawLexer(tok);
    lexer.LexFromRawLexer(tok);
    if (tok.is(tok::eof)) {
      if (!cur.isMacroID())
        return SourceLocation();
    }
    else {
      SourceLocation snext = tok.getLocation();
      unsigned dist = sm.getFileOffset(snext) - scur_info.second;
      // Dirty trick to apply offset to macro loc
      next = SourceLocation::getFromRawEncoding(cur.getRawEncoding() + dist);
      // The following conditional is needed only to workaround a
      // likely bug in SourceManager::isInFileID when called with macro arg
      // expansions.
      if (sm.isMacroArgExpansion(cur)) {
        // Dirty trick to apply offset to macro loc
        if (sm.isInFileID(SourceLocation::getFromRawEncoding(
                              cur.getRawEncoding() + dist + 1),
                          cur_info.first))
          break;
      }
      else {
        if (sm.isInFileID(next, cur_info.first))
          break;
      }
    }
    cur = sm.getImmediateExpansionRange(cur).second;
  }
  while (1) {
    Exp_Map::iterator i = exp_map.find(next.getRawEncoding());
    if (i == exp_map.end())
      break;
    SrcMgr::SLocEntry entry = i->second;
    // This method is private.
    // next = SourceLocation::getMacroLoc(entry.getOffset());
    // Ugly dirty trick is needed
    // Use an unsigned literal: shifting a signed 1 into bit 31 overflows int.
    next = SourceLocation::getFromRawEncoding(entry.getOffset() | (1U << 31));
    assert(next.isMacroID());
  }
  return next;
}
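
For readers who want to follow the second loop without a clang build at
hand, here is a toy model of it. It is only a sketch: nothing below is a
clang type or API. ToyExpMap, to_macro_loc, is_macro_id and
descend_into_expansions are hypothetical names, raw encodings are plain
32-bit integers, and the only thing taken from the code above is the
assumption that bit 31 of the raw encoding marks a macro location.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy stand-ins only; none of these are clang types or functions.
constexpr std::uint32_t kMacroBit = 1U << 31;

// Mirrors `entry.getOffset() | (1U << 31)`: tag an offset as a macro loc.
constexpr std::uint32_t to_macro_loc(std::uint32_t offset) {
  return offset | kMacroBit;
}

// True when the raw encoding carries the macro bit.
constexpr bool is_macro_id(std::uint32_t raw) {
  return (raw & kMacroBit) != 0;
}

// Reverse map: raw encoding of an expansion point -> offset of the
// expansion entry that starts there.
using ToyExpMap = std::unordered_map<std::uint32_t, std::uint32_t>;

// While `next` is a known expansion point, step to the first location
// inside that expansion. Stops at the first location that is not an
// expansion point. Assumes the map contains no cycles.
std::uint32_t descend_into_expansions(const ToyExpMap& map,
                                      std::uint32_t next) {
  for (;;) {
    ToyExpMap::const_iterator it = map.find(next);
    if (it == map.end())
      return next;
    next = to_macro_loc(it->second);
  }
}
```

With a map holding 100 -> 5000 and to_macro_loc(5000) -> 9000 (a nested
expansion), descending from 100 passes through both entries and ends at
to_macro_loc(9000), while a location that is not in the map is returned
unchanged.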


