[cfe-dev] Preprocessed loc/token retrieval dream (almost) come true

Abramo Bagnara abramo.bagnara at gmail.com
Fri Sep 30 02:09:47 PDT 2011


Ping and direct questions below.

On 24/09/2011 17:15, Abramo Bagnara wrote:
> 
> Clang has always lacked a way to reconstruct the preprocessed
> token stream from a given location (without redoing the full preprocessing).
> 
> Thanks to recent changes from Chandler and Argyrios I'm now able to get
> the next parsed token location in a reliable way.
> 
> I attach the code I currently use, both for review and to check whether
> there is interest in having these helpers in the clang library (IMHO this
> service is *very* useful and currently badly approximated in HTMLRewrite.cpp).

Is there interest in having in the clang library methods that, given a
starting location, return the locations of all the following tokens in
preprocessing order? This would make it possible to check whether *all*
the locations in a specific range satisfy a given property, to recover
missing locations, to scan the exact preprocessed sequence of type/storage
specifiers, etc.
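
As a rough illustration of the first use case, here is a sketch with a toy
location type. Plain integers stand in for clang::SourceLocation, and
all_in_range, next and pred are hypothetical names, not clang API; next
plays the role of a helper such as parser_loc_get_pp_next.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for clang::SourceLocation: 0 plays the role of an
// invalid location.
typedef unsigned Loc;

// Hypothetical helper: walks from `first` to `last` (inclusive) in
// preprocessing order using `next`, and reports whether every visited
// location satisfies `pred`. Returns false if the walk runs off the
// end of the stream before reaching `last`.
bool all_in_range(Loc first, Loc last,
                  const std::function<Loc(Loc)>& next,
                  const std::function<bool(Loc)>& pred) {
  assert(first != 0 && last != 0);
  for (Loc cur = first; cur != 0; cur = next(cur)) {
    if (!pred(cur))
      return false;
    if (cur == last)
      return true;
  }
  return false;
}
```

With a real implementation, next would wrap the helper quoted below and
pred would be whatever per-token property is of interest.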

> Using the code also reveals some likely bugs in how clang stores locations, namely:
> 
> - the SLocEntry for a macro arg expansion has an extra token at the end,
> and this is not taken into consideration when computing isInFileID (a
> workaround for that is in the attached code)

Is this intended, or should it be considered a bug?

> - the immediate expansion range of stringified tokens encloses only '#'
> and not '# arg' (this implies that the helper gets confused there)

Is this intended, or should it be considered a bug?

> - the immediate expansion range of concatenated tokens encloses only '##'
> and not 'x ## y' (this implies that the helper gets confused there)

Is this intended, or should it be considered a bug?
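
To make the last two cases concrete, here is a minimal illustration of the
two operators involved. STR, CAT and the two wrapper functions are made-up
names for this example. In both macros one preprocessed token results from
several spelled tokens ('# arg' and 'x ## y'), which is the situation where
an expansion range covering only the operator confuses the helper.

```cpp
#include <cassert>
#include <cstring>

// '# arg' turns the macro argument into a string literal.
#define STR(arg) # arg
// 'x ## y' pastes the two arguments into a single token.
#define CAT(x, y) x ## y

// STR(hello) preprocesses to the single token "hello".
inline const char* stringified() { return STR(hello); }

// CAT(foo, bar) preprocesses to the single identifier foobar.
inline int concatenated() {
  int foobar = 7;
  return CAT(foo, bar);
}
```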

> 
> The code currently does not yet take into account file changes due to
> #include, but I think this is a minor point and perhaps fixable.
> 
> To do its work, parser_loc_get_pp_next needs a reverse map to be loaded
> so that it knows which tokens are expansion points (i.e. a SourceLocation
> for each macro SLocEntry).
> 
> typedef llvm::DenseMap<unsigned, clang::SrcMgr::SLocEntry> Exp_Map;
> 
> Exp_Map exp_map;
> 
> void load_exp_map() {
>   using namespace clang;
>   SourceManager& sm = get_source_manager();
>   int i, last = sm.local_sloc_entry_size();
>   for (i = 0; i < last; ++i) {
>     FileID fid;
>     // This method is private.
>     // fid = FileID::get(i);
>     // Ugly dirty trick is needed
>     *reinterpret_cast<int*>(&fid) = i;
>     SrcMgr::SLocEntry entry = sm.getSLocEntry(fid);
>     if (!entry.isExpansion())
>       continue;
>     SourceLocation from = entry.getExpansion().getExpansionLocStart();
>     exp_map[from.getRawEncoding()] = entry;
>   }
> }
> 
> clang::SourceLocation parser_loc_get_pp_next(clang::SourceLocation cur) {
>   using namespace clang;
>   SourceManager& sm = get_source_manager();
>   const clang::LangOptions& lo = get_lang_options();
>   assert(exp_map.find(cur.getRawEncoding()) == exp_map.end());
>   SourceLocation next;
>   while (1) {
>     std::pair<FileID, unsigned> cur_info = sm.getDecomposedLoc(cur);
>     SourceLocation scur = sm.getSpellingLoc(cur);
>     std::pair<FileID, unsigned> scur_info = sm.getDecomposedLoc(scur);
>     bool invalid = false;
>     StringRef buf = sm.getBufferData(scur_info.first, &invalid);
>     if (invalid)
>       return SourceLocation();
>     const char* point = buf.data() + scur_info.second;
>     Lexer lexer(sm.getLocForStartOfFile(scur_info.first), lo,
>                 buf.begin(), point, buf.end());
>     Token tok;
>     lexer.LexFromRawLexer(tok);
>     lexer.LexFromRawLexer(tok);
>     if (tok.is(tok::eof)) {
>       if (!cur.isMacroID())
>         return SourceLocation();
>     }
>     else {
>       SourceLocation snext = tok.getLocation();
>       unsigned dist = sm.getFileOffset(snext) - scur_info.second;
>       // Dirty trick to apply offset to macro loc
>       next = SourceLocation::getFromRawEncoding(cur.getRawEncoding() + dist);
>       // The following conditional is needed only to workaround a
>       // likely bug in SourceManager::isInFileID when called with macro arg
>       // expansions.
>       if (sm.isMacroArgExpansion(cur)) {
>         // Dirty trick to apply offset to macro loc
>         if (sm.isInFileID(SourceLocation::getFromRawEncoding(
>                 cur.getRawEncoding() + dist + 1), cur_info.first))
>           break;
>       }
>       else {
>         if (sm.isInFileID(next, cur_info.first))
>           break;
>       }
>     }
>     cur = sm.getImmediateExpansionRange(cur).second;
>   }
>   while (1) {
>     Exp_Map::iterator i = exp_map.find(next.getRawEncoding());
>     if (i == exp_map.end())
>       break;
>     SrcMgr::SLocEntry entry = i->second;
>     // This method is private.
>     // next = SourceLocation::getMacroLoc(entry.getOffset());
>     // Ugly dirty trick is needed
>     next = SourceLocation::getFromRawEncoding(entry.getOffset() | (1U << 31));
>     assert(next.isMacroID());
>   }
>   return next;
> }
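
The "dirty tricks" in the quoted code all rely on the raw encoding being an
offset with the top bit marking a macro location. A toy model of that
arithmetic follows, assuming (as the quoted code does) a 32-bit encoding
with bit 31 as the macro flag. kMacroBit, is_macro_id, make_macro_loc and
advance are hypothetical names, not clang API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption taken from the quoted code: bit 31 flags a macro location.
const std::uint32_t kMacroBit = std::uint32_t(1) << 31;

inline bool is_macro_id(std::uint32_t raw) { return (raw & kMacroBit) != 0; }

// Stand-in for the private SourceLocation::getMacroLoc mentioned above.
inline std::uint32_t make_macro_loc(std::uint32_t offset) {
  assert((offset & kMacroBit) == 0);
  return offset | kMacroBit;
}

// Adding a small distance to a raw encoding keeps the macro flag, as long
// as the sum does not overflow 32 bits.
inline std::uint32_t advance(std::uint32_t raw, std::uint32_t dist) {
  return raw + dist;
}
```

An unsigned type is used for the flag so that the shift into bit 31 is
well defined.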


