[cfe-dev] Advanced Rewriting

Tue Oct 23 08:32:13 PDT 2018

Hey everyone

The two major limitations are now resolved:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue resolved itself once I figured out a somewhat 
unintuitive detail: the SourceLocations from the original AST and the 
ones from the new Preprocessor's lexer did not compare equal, even 
though they refer to exactly the same location.

I believe this is because macro expansion SourceLocations are encoded 
as offsets into the SourceManager's table of expansion entries, and 
every new preprocessor run creates fresh entries. Since operator== only 
compares these raw encodings, two locations that point to the same 
place need not compare equal. What worked for me was a kind of "deep" 
comparison:

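#include "clang/Basic/SourceManager.h"

namespace clutil {

// Compares two SourceLocations by what they refer to rather than by
// their raw encoding.
bool DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs,
                     const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    // Either both or neither must be macro argument expansions. If both
    // are, the places where the arguments were expanded must match too.
    clang::SourceLocation lhsMacro, rhsMacro;
    bool lhsIsArg = SM.isMacroArgExpansion(lhs, &lhsMacro);
    bool rhsIsArg = SM.isMacroArgExpansion(rhs, &rhsMacro);
    if (lhsIsArg != rhsIsArg)
        return false;
    if (lhsIsArg && !DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
        return false;

    return true;
}

} // namespace clutil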
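
The following self-contained toy model shows why a raw comparison fails where the deep one succeeds. It is not Clang's API: `ToyLoc`, `ExpansionRecord`, `ToySourceManager`, `rawEqual` and `deepEqual` are hypothetical stand-ins. Every expansion record is created fresh, so the same token seen by two lexing passes gets two different raw IDs. The macro-argument step is what separates the two uses of the parameter in `#define FOO(a) a + a`. For simplicity, the toy records the parameter's position in the macro body as a plain file offset, whereas Clang would store a macro location there.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, NOT the Clang API. A location is either a plain
// file offset or an index into a table of expansion records. Each record is
// created fresh, so lexing the same code twice yields different raw indices,
// loosely mimicking how macro SourceLocations are encoded.
struct ToyLoc {
    bool isMacro;
    unsigned raw; // file offset, or index into ToySourceManager::table
};

bool rawEqual(ToyLoc a, ToyLoc b) { return a.isMacro == b.isMacro && a.raw == b.raw; }

struct ExpansionRecord {
    unsigned spelling;  // file offset where the token is written
    unsigned expansion; // file offset of the macro invocation
    bool isArg;         // true if the token came from a macro argument
    ToyLoc argStart;    // for arguments: where the parameter sits in the body
};

struct ToySourceManager {
    std::vector<ExpansionRecord> table;

    ToyLoc create(ExpansionRecord r) {
        table.push_back(r);
        return ToyLoc{true, static_cast<unsigned>(table.size() - 1)};
    }
    unsigned spelling(ToyLoc l) const { return l.isMacro ? table[l.raw].spelling : l.raw; }
    unsigned expansion(ToyLoc l) const { return l.isMacro ? table[l.raw].expansion : l.raw; }
    bool isArg(ToyLoc l, ToyLoc *start) const {
        if (!l.isMacro || !table[l.raw].isArg)
            return false;
        *start = table[l.raw].argStart;
        return true;
    }
};

// Same structure as DeepSrcLocEqual: compare what the locations describe,
// not how they are encoded.
bool deepEqual(ToyLoc a, ToyLoc b, const ToySourceManager &sm) {
    if (rawEqual(a, b))
        return true;
    if (sm.expansion(a) != sm.expansion(b) || sm.spelling(a) != sm.spelling(b))
        return false;
    ToyLoc aStart{}, bStart{};
    bool aArg = sm.isArg(a, &aStart);
    bool bArg = sm.isArg(b, &bStart);
    if (aArg != bArg)
        return false;
    return !aArg || deepEqual(aStart, bStart, sm);
}
```

Two records created for the same argument token at the same body position compare deep-equal but not raw-equal. Records for the first and second `a` of an invocation such as `FOO(x)` differ only in their body position, so they compare unequal.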

Attached is the cleaned-up rewriter I ended up with. If there is any 
interest, I'd still be happy to contribute it back to Clang's libraries, 
but I would need some feedback on how it fits into the existing 
facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:
>
> Hi Rafael,
>
> I did look into clang-refactor a while ago but unfortunately could not 
> follow up on it. If I recall correctly, it is about source-to-source 
> transformation (as you said) and aims at implementing the common 
> primitive refactorings (e.g. extract-method, extract-variable, ...).
>
> Rewriting itself should happen with the normal tooling framework.
>
> (https://clang.llvm.org/docs/RefactoringEngine.html)
>
> Maybe the implementers of the existing code can give better comments 
> on your proposal (and might have already considered a solution similar 
> to yours).
>
> +Alex Lorenz
>
> All the best, Jonas
>
>
> On 17.07.2018 at 14:46, Rafael Stahl wrote:
>>
>> Hi Jonas
>>
>> Thanks for pointing me to this. I had seen "Replacement" before, 
>> but not clang-refactor.
>>
>> However, it seems to provide only management facilities around 
>> rewrite operations, not help with the rewriting itself. Am I missing 
>> something here?
>>
>> The two core problems for me:
>>
>> - Nesting replacements: when implementing replacements with 
>> clang-refactor, I still have to provide replacements that are 
>> self-contained. I cannot make them depend on others, right?
>> - Macros: clang-refactor only seems to work with spelling locations.
>>
>> Maybe an even simpler example: replace all additions with "add(lhs, 
>> rhs)". This alone is very difficult with Clang as soon as the 
>> Stmts are nested or macros are involved.
>>
>> Best regards
>> Rafael
>>
>>
>> On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:
>>>
>>> Hi Rafael,
>>>
>>> wouldn't your use case be a task for clang-refactor?
>>>
>>> Best, Jonas
>>>
>>>
>>> On 16.07.2018 at 17:08, Rafael Stahl via cfe-dev wrote:
>>>> Hey everyone
>>>>
>>>> The rewriting API of Clang operates on the source code in textual 
>>>> form. The user can use AST nodes to figure out what to replace, but 
>>>> in the end he has to remove and insert snippets in a linear piece 
>>>> of text.
>>>>
>>>> This is very inconvenient when it is required to restructure and 
>>>> nest replacements. The involvement of macros makes a manual process 
>>>> even more difficult. See some recent threads expressing difficulty 
>>>> with the API [1][2].
>>>>
>>>> What do I mean by "nested replacements"? For example in the following:
>>>>
>>>>     int i = x + s->a;
>>>>
>>>> I would want to replace the BinaryOperator with a function call and 
>>>> the MemberExpr with some constant:
>>>>
>>>>     int i = Addition(x, 7);
>>>>
>>>> When keeping the two replacement rules independent of each other, 
>>>> achieving this with the current API is extremely difficult. More so 
>>>> when macros are involved.
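
The core idea is that an outer rule asks for the already-rewritten text of its children instead of slicing the original source. The toy sketch below demonstrates this on a hand-built tree. `Node`, `ToyRewriter` and the string-keyed rules are hypothetical and are not the `cu::` API described below. The sketch also ignores source locations and macros entirely.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node: a kind, some text (leaf spelling or member
// name) and child pointers.
struct Node {
    std::string kind; // "Add", "Member" or "Ref"
    std::string text;
    std::vector<const Node *> kids;
};

// Applies per-kind rewrite rules. A rule may call rewrite() on children, so
// inner replacements are composed into outer ones.
class ToyRewriter {
public:
    using Rule = std::function<std::string(const Node &, ToyRewriter &)>;

    void addRule(const std::string &kind, Rule rule) { rules[kind] = std::move(rule); }

    std::string rewrite(const Node &n) {
        auto it = rules.find(n.kind);
        if (it != rules.end())
            return it->second(n, *this);
        // No rule for this node: print it as written, with children rewritten.
        if (n.kind == "Add")
            return rewrite(*n.kids[0]) + " + " + rewrite(*n.kids[1]);
        if (n.kind == "Member")
            return rewrite(*n.kids[0]) + "->" + n.text;
        return n.text; // leaf such as "x"
    }

private:
    std::map<std::string, Rule> rules;
};
```

With a rule for `Member` returning `"7"` and a rule for `Add` returning `"Addition(" + lhs + ", " + rhs + ")"`, the tree for `x + s->a` becomes `Addition(x, 7)`. The Add rule never hard-codes what its operands become; this is the role `getMgr()->getReplaced()` plays in `RewriteAdds`.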
>>>>
>>>> I am proposing some kind of helper that aims to solve these issues 
>>>> by providing an interface that offers to directly replace AST nodes 
>>>> and a mechanism to nest AST node replacements - without having to 
>>>> worry about macros.
>>>>
>>>> Potential usage:
>>>>
>>>> - Develop a class that derives from StmtToRewrite to define how 
>>>> replacements should happen:
>>>>
>>>>     class RewriteAdds : public cu::StmtToRewrite
>>>>     {
>>>>     public:
>>>>         std::string makeReplaceStr() const override
>>>>         {
>>>>             // Registered only for BinaryOperators, so cast<> is safe here.
>>>>             auto binOp = clang::cast<clang::BinaryOperator>(replaceS);
>>>>             return "Addition(" +
>>>>                 getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
>>>>                 getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
>>>>         }
>>>>     };
>>>>
>>>>     class RewriteMembs : public cu::StmtToRewrite
>>>>     {
>>>>     public:
>>>>         std::string makeReplaceStr() const override
>>>>         {
>>>>             return "7";
>>>>         }
>>>>     };
>>>>
>>>> - Construct a RewriteManager:
>>>>
>>>>     cu::RewriteManager mgr(ACtx, PP);
>>>>
>>>> - Add rewriting operations to the manager:
>>>>
>>>>     // std::vector<const Stmt *> AddStmts  = /* matched from binaryOperator() with plus */
>>>>     // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
>>>>     for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
>>>>     for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);
>>>>
>>>> - Retrieve and apply the results:
>>>>
>>>>     clang::Rewriter rewriter(SM, LangOpts);
>>>>     for (const auto &r : mgr.getReplacements()) {
>>>>         rewriter.RemoveText(r.rangeToRemove);
>>>>         rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
>>>>     }
>>>>
>>>>
>>>> At the end of this mail is my low quality code that kind-of 
>>>> implements this. TLDR:
>>>>
>>>> - Build a hierarchy of stmts to replace and keep track of which 
>>>> replacements must be combined
>>>> - Move further up in the AST if these replacements are inside a macro
>>>> - Recursively lex the file and look for replacements outside-in by 
>>>> spelling locations. Expand any macros that are encountered during 
>>>> this. The re-lexing idea is based on the hint in [3].
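
The first TLDR step, grouping nested replacement ranges under their outermost enclosing range, can be sketched in isolation. This is a simplified model: plain character offsets stand in for expansion `SourceRange`s, the names `Range`, `contains` and `groupByOutermost` are hypothetical, and entries within a group are not sorted.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expansion SourceRange: inclusive character
// offsets in the main file.
struct Range {
    unsigned begin, end;
    bool operator<(const Range &o) const {
        return std::tie(begin, end) < std::tie(o.begin, o.end);
    }
};

// True if inner lies completely within outer.
bool contains(const Range &outer, const Range &inner) {
    return outer.begin <= inner.begin && inner.end <= outer.end;
}

// Maps each top-level range to the ranges nested inside it. The first entry
// of every group is the top-level range itself. A new range either joins the
// first group that encloses it, or becomes a new top-level range that absorbs
// every existing group it encloses.
std::map<Range, std::vector<Range>> groupByOutermost(const std::vector<Range> &ranges) {
    std::map<Range, std::vector<Range>> groups;
    for (const Range &r : ranges) {
        bool placed = false;
        for (auto &g : groups) {
            if (contains(g.first, r)) {
                g.second.push_back(r);
                placed = true;
                break;
            }
        }
        if (placed)
            continue;
        std::vector<Range> absorbed{r};
        for (auto it = groups.begin(); it != groups.end();) {
            if (contains(r, it->first)) {
                absorbed.insert(absorbed.end(), it->second.begin(), it->second.end());
                it = groups.erase(it);
            } else {
                ++it;
            }
        }
        groups[r] = std::move(absorbed);
    }
    return groups;
}
```

In `int i = x + s->a;`, `s->a` spans offsets 12-15 and `x + s->a` spans 8-15. Registering them in either order yields a single group headed by the outer range, which is the range that is finally handed to the Rewriter.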
>>>>
>>>> The code has the following shortcomings:
>>>>
>>>> - I do not know how to distinguish macro argument expansions within 
>>>> macros. For example, in "#define FOO(a) a + a" the two "a"s expand 
>>>> to different AST nodes that could be replaced with different rules. 
>>>> This is an important issue, because it can lead to completely 
>>>> broken code with nesting.
>>>> - Limited to Stmts; Decls should be supported too.
>>>> - Very unoptimized: it lexes the entire source file many times. 
>>>> Easy to solve, but I didn't want to raise the complexity further for 
>>>> now.
>>>> - The rewritten code could be kept cleaner by expanding macros only 
>>>> when required. For example, expansion is not required if just a macro 
>>>> argument is replaced and all of its expansions would be the same.
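
The last point can be made concrete with a small decision function. This is a sketch of one possible criterion only. It assumes the macro body is available as a list of token spellings. `countParamUses` and `canRewriteArgInPlace` are hypothetical helpers. The sketch ignores stringizing (`#a`) and token pasting (`a ## b`), both of which would need extra care.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts how often a macro parameter occurs in a
// macro body given as a list of token spellings.
int countParamUses(const std::vector<std::string> &body, const std::string &param) {
    int uses = 0;
    for (const auto &tok : body)
        if (tok == param)
            ++uses;
    return uses;
}

// Hypothetical criterion: the replacement can be written into the macro
// argument at the invocation site (keeping the macro call intact) only if
// every use of the parameter in the body receives the same replacement text.
// replacementPerUse holds one entry per use, in body order.
bool canRewriteArgInPlace(const std::vector<std::string> &body, const std::string &param,
                          const std::vector<std::string> &replacementPerUse) {
    if (static_cast<int>(replacementPerUse.size()) != countParamUses(body, param))
        return false; // inconsistent input: fall back to expanding the macro
    for (const auto &text : replacementPerUse)
        if (text != replacementPerUse.front())
            return false;
    return true;
}
```

For the `a + a` body, replacing both uses with `7` could be done inside `FOO(...)`. Different rules for the two uses would force the expansion the current code always performs.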
>>>>
>>>>
>>>> I am very interested in your general thoughts. I'm not very 
>>>> experienced with Clang, but this is how I would envision doing 
>>>> replacements. Are you interested in getting this into Clang? I 
>>>> would need help with ironing out the remaining issues.
>>>>
>>>> -Rafael
>>>>
>>>>
>>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
>>>> [2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
>>>> [3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html
>>>>
>>>>
>>>>
>>>> ----------------------------------------
>>>>
>>>> RewriteManager.h
>>>>
>>>> ----------------------------------------
>>>>
>>>> #ifndef CLANGUTIL_REWRITEMANAGER_H
>>>> #define CLANGUTIL_REWRITEMANAGER_H
>>>>
>>>> #include "ClangUtil/SourceRangeLess.h"
>>>> #include "make_unique.h"
>>>> #include "clang/AST/AST.h"
>>>> #include <map>
>>>> #include <memory>
>>>> #include <string>
>>>> #include <vector>
>>>>
>>>>
>>>> // TODO extend to decls
>>>>
>>>>
>>>> namespace cu
>>>> {
>>>> class RewriteManager;
>>>>
>>>> // Represents a statement in the original AST that should be rewritten.
>>>> // To implement recursive replacements, call getMgr()->getReplaced() on
>>>> // any AST node within the makeReplaceStr callback.
>>>> class StmtToRewrite
>>>> {
>>>>     friend class RewriteManager;
>>>>
>>>> public:
>>>>     // Instances are owned and destroyed through
>>>>     // std::unique_ptr<StmtToRewrite>, so the destructor must be virtual.
>>>>     virtual ~StmtToRewrite() = default;
>>>>
>>>>     // Returns the enclosing RewriteManager.
>>>>     RewriteManager *getMgr() const;
>>>>     // Override this to build a replacement string. Implement recursive
>>>>     // replacements with RewriteManager::getReplaced.
>>>>     virtual std::string makeReplaceStr() const = 0;
>>>>
>>>>     // The statement to replace.
>>>>     const clang::Stmt *replaceS = nullptr;
>>>>
>>>> private:
>>>>     RewriteManager *m_mgr = nullptr;
>>>> };
>>>>
>>>> struct RewriteOperation
>>>> {
>>>>     clang::SourceRange rangeToRemove;
>>>>     std::string strToInsert;
>>>> };
>>>>
>>>> // A class for managing replacements of AST nodes. It targets AST nodes
>>>> // directly instead of raw source locations, to make replacements that
>>>> // involve macros or nesting easy.
>>>> // For extended documentation see: doc/rewriting.md
>>>> class RewriteManager
>>>> {
>>>> public:
>>>>     RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);
>>>>
>>>>     clang::ASTContext &getACtx() const { return ACtx; }
>>>>
>>>>     // Registers a StmtToRewrite for use with getReplacements. Call this on
>>>>     // all statements that should be rewritten before calling any
>>>>     // rewriting functions.
>>>>     void registerStmt(std::unique_ptr<StmtToRewrite> S);
>>>>
>>>>     // Helper for constructing the custom type from a Stmt.
>>>>     template <typename T, typename... Args>
>>>>     void registerStmt(const clang::Stmt *S, Args &&... args)
>>>>     {
>>>>         auto p = std::make_unique<T>(std::forward<Args>(args)...);
>>>>         p->replaceS = S;
>>>>         registerStmt(std::move(p));
>>>>     }
>>>>
>>>>     // Get the full replacement of an AST node. Note that this function
>>>>     // removes any replaced statements from the work list, so calling it
>>>>     // twice will only replace the first time.
>>>>     RewriteOperation getReplaced(const clang::Stmt *S);
>>>>     // Get all replacements. These may be fewer than the requested ones
>>>>     // because of nesting.
>>>>     std::vector<RewriteOperation> getReplacements();
>>>>
>>>> private:
>>>>     std::string getExpandedCode(const clang::Stmt *toReplaceS);
>>>>
>>>> private:
>>>>     clang::ASTContext &ACtx;
>>>>     const clang::LangOptions &LangOpts;
>>>>     clang::SourceManager &SM;
>>>>     clang::Preprocessor &PP;
>>>>
>>>>     // Manages the pending replacements.
>>>>     class WorkList
>>>>     {
>>>>     public:
>>>>         typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>>
>>>>             RangeToRepMap;
>>>>
>>>>         WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);
>>>>
>>>>         bool isStmtPending(const clang::Stmt *S) const;
>>>>         void addStmt(std::unique_ptr<StmtToRewrite> S);
>>>>         const RangeToRepMap &getRangeToReplacementsMap() const;
>>>>         std::vector<const StmtToRewrite *> getSortedReplacements() const;
>>>>         void markDone(const StmtToRewrite *S);
>>>>         void cleanup();
>>>>
>>>>     private:
>>>>         clang::ASTContext &ACtx;
>>>>         clang::SourceManager &SM;
>>>>         std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
>>>>         std::vector<std::unique_ptr<StmtToRewrite>> m_done;
>>>>         RangeToRepMap m_rangeToReplacements;
>>>>     };
>>>>
>>>>     WorkList m_workList;
>>>> };
>>>>
>>>> } // namespace cu
>>>>
>>>> #endif
>>>>
>>>>
>>>>
>>>> ----------------------------------------
>>>>
>>>> RewriteManager.cpp
>>>>
>>>> ----------------------------------------
>>>>
>>>> #include "ClangUtil/RewriteManager.h"
>>>> #include "ClangUtil/ASTUtil.h"
>>>> #include "clang/Lex/Lexer.h"
>>>> #include "clang/Lex/Preprocessor.h"
>>>> #include "clang/Lex/PreprocessorOptions.h"
>>>> #include "clang/Lex/TokenConcatenation.h"
>>>> #include "clang/Lex/MacroArgs.h"
>>>> #include <algorithm>
>>>> #include <cassert>
>>>> #include <cstdio>
>>>> #include <stdexcept>
>>>>
>>>>
>>>> using namespace cu;
>>>>
>>>>
>>>> // Walks upwards from startS and returns the outermost Stmt (possibly
>>>> // startS itself) whose expansion range is still within the given range.
>>>> // Returns nullptr if the walk reaches a node without a Stmt parent.
>>>> static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range,
>>>>                                            const clang::Stmt *startS,
>>>>                                            clang::ASTContext &ACtx)
>>>> {
>>>>     auto &SM = ACtx.getSourceManager();
>>>>
>>>>     // Walk the tree upwards until ST no longer expands to within range.
>>>>     const clang::Stmt *ST = startS;
>>>>     while (true)
>>>>     {
>>>>         const auto &parents = ACtx.getParents(*ST);
>>>>         if (parents.empty())
>>>>         {
>>>>             break;
>>>>         }
>>>>         auto childS = ST;
>>>>         ST = parents[0].get<clang::Stmt>();
>>>>         if (!ST)
>>>>         {
>>>>             if (auto D = parents[0].get<clang::Decl>())
>>>>             {
>>>>                 const auto &parentsD = ACtx.getParents(*D);
>>>>                 if (parentsD.empty())
>>>>                 {
>>>>                     break;
>>>>                 }
>>>>                 ST = parentsD[0].get<clang::Stmt>();
>>>>                 if (!ST)
>>>>                 {
>>>>                     break;
>>>>                 }
>>>>             }
>>>>             else
>>>>             {
>>>>                 break;
>>>>             }
>>>>         }
>>>>
>>>>         auto exLocS = SM.getExpansionLoc(ST->getLocStart());
>>>>         auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
>>>>         if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
>>>>             SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
>>>>         {
>>>>             return childS;
>>>>         }
>>>>     }
>>>>
>>>>     return nullptr;
>>>> }
>>>>
>>>>
>>>> RewriteManager *StmtToRewrite::getMgr() const
>>>> {
>>>>     return m_mgr;
>>>> }
>>>>
>>>>
>>>> RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM)
>>>>     : ACtx(ACtx), SM(SM) {}
>>>> bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
>>>> {
>>>>     for (const auto &r : m_pending)
>>>>     {
>>>>         if (r->replaceS == S)
>>>>         {
>>>>             return true;
>>>>         }
>>>>     }
>>>>     return false;
>>>> }
>>>> void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
>>>> {
>>>>     // Use the expansion range for maximal replacement flexibility in macros.
>>>>     auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());
>>>>
>>>>     // TODO not quite correct.
>>>>     /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
>>>>         std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
>>>>             auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
>>>>             auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
>>>>             return IsContained(rhsRange, lhsRange, SM);
>>>>         });
>>>>     };*/
>>>>
>>>>     // Establish hierarchical relation between all ranges.
>>>>     bool found = false;
>>>>     // First, check if this range is within one we already have.
>>>>     for (auto &r : m_rangeToReplacements)
>>>>     {
>>>>         if (IsContained(replaceRange, r.first, SM))
>>>>         {
>>>>             // Insert in a sorted order.
>>>>             for (auto it = r.second.begin(); it != r.second.end(); ++it)
>>>>             {
>>>>                 // auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
>>>>                 // if (IsContained(testRange, replaceRange, SM))
>>>>                 if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
>>>>                 {
>>>>                     r.second.insert(it, S.get());
>>>>                     found = true;
>>>>                     break;
>>>>                 }
>>>>             }
>>>>             if (!found)
>>>>             {
>>>>                 r.second.push_back(S.get());
>>>>                 found = true;
>>>>             }
>>>>             break;
>>>>         }
>>>>     }
>>>>     // Not within existing range, add as new top-level range.
>>>>     if (!found)
>>>>     {
>>>>         // Check if any existing ranges are contained within the new one.
>>>>         std::vector<const StmtToRewrite *> moveThese;
>>>>         auto it = m_rangeToReplacements.begin();
>>>>         while (it != m_rangeToReplacements.end())
>>>>         {
>>>>             if (IsContained(it->first, replaceRange, SM))
>>>>             {
>>>>                 moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
>>>>                 it = m_rangeToReplacements.erase(it);
>>>>             }
>>>>             else
>>>>             {
>>>>                 ++it;
>>>>             }
>>>>         }
>>>>         auto &accesses = m_rangeToReplacements[replaceRange];
>>>>         // The order is important here. We want the first element to be
>>>>         // the one that spans the full range.
>>>>         accesses.push_back(S.get());
>>>>         // TODO sort "moveThese".
>>>>         accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
>>>>     }
>>>>
>>>>     int count = 0;
>>>>     for (const auto &r : m_rangeToReplacements)
>>>>     {
>>>>         printf("range %i\n", count++);
>>>>         for (const auto &a : r.second)
>>>>         {
>>>>             printf("replacement:\n");
>>>>             a->replaceS->dump();
>>>>         }
>>>>     }
>>>>
>>>>     m_pending.push_back(std::move(S));
>>>> }
>>>> const RewriteManager::WorkList::RangeToRepMap &
>>>> RewriteManager::WorkList::getRangeToReplacementsMap() const
>>>> {
>>>>     return m_rangeToReplacements;
>>>> }
>>>> std::vector<const StmtToRewrite *>
>>>> RewriteManager::WorkList::getSortedReplacements() const
>>>> {
>>>>     std::vector<const StmtToRewrite *> result;
>>>>     for (auto &r : m_rangeToReplacements)
>>>>     {
>>>>         result.insert(result.end(), r.second.begin(), r.second.end());
>>>>     }
>>>>     return result;
>>>> }
>>>> void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
>>>> {
>>>>     // Remove from hierarchy.
>>>>     for (auto &r : m_rangeToReplacements)
>>>>     {
>>>>         r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
>>>>     }
>>>>
>>>>     // Move from pending to done list.
>>>>     auto it = std::find_if(m_pending.begin(), m_pending.end(),
>>>>                            [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
>>>>     if (it == m_pending.end())
>>>>     {
>>>>         throw std::runtime_error("Did not find replacement to mark as done");
>>>>     }
>>>>     m_done.push_back(std::move(*it));
>>>>     m_pending.erase(it);
>>>> }
>>>> void RewriteManager::WorkList::cleanup()
>>>> {
>>>>     m_done.clear();
>>>> }
>>>>
>>>>
>>>> RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
>>>>     : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP),
>>>>       m_workList(ACtx, SM)
>>>> {
>>>> }
>>>>
>>>> void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
>>>> {
>>>>     if (!S->replaceS)
>>>>     {
>>>>         throw std::runtime_error("Must set replaceS");
>>>>     }
>>>>
>>>>     if (m_workList.isStmtPending(S->replaceS))
>>>>     {
>>>>         throw std::runtime_error("This Stmt will already be replaced");
>>>>     }
>>>>
>>>>     S->m_mgr = this;
>>>>     m_workList.addStmt(std::move(S));
>>>> }
>>>>
>>>> RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
>>>> {
>>>>     auto range = SM.getExpansionRange(S->getSourceRange());
>>>>     return { range, getExpandedCode(S) };
>>>> }
>>>>
>>>> std::vector<RewriteOperation> RewriteManager::getReplacements()
>>>> {
>>>>     std::vector<RewriteOperation> results;
>>>>
>>>>     for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
>>>>     {
>>>>         auto &range = rangeAndAccesses.first;
>>>>         auto &accesses = rangeAndAccesses.second;
>>>>
>>>>         // Cannot replace something inside a macro because it would replace
>>>>         // all expansions instead of just the selected AST node. So in a
>>>>         // first step, get an enclosing statement that is no longer inside
>>>>         // a macro.
>>>>         // TODO we could keep the original code cleaner by not expanding
>>>>         // macro args if the whole expansion does not contain the macro arg
>>>>         // more than once.
>>>>         auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);
>>>>
>>>>         results.push_back(getReplaced(macroS));
>>>>
>>>>         // TODO we could run clang-format on the replacements. This would
>>>>         // especially benefit long macro expansions.
>>>>     }
>>>>
>>>>     m_workList.cleanup();
>>>>
>>>>     return results;
>>>> }
>>>>
>>>> std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
>>>> {
>>>>     // TODO performance optimization. This is parsing way more than required.
>>>>
>>>>     using namespace clang;
>>>>
>>>>     printf("getExpandedCode:\n");
>>>>     toReplaceS->dump();
>>>>
>>>>     std::string out;
>>>>
>>>>     auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
>>>>     auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
>>>>     auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
>>>>     auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());
>>>>
>>>>     auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));
>>>>
>>>>     // The following is inspired by:
>>>>     // clang/Rewrite/HTMLRewrite.cpp:HighlightMacros
>>>>
>>>>     // Re-lex the raw token stream into a token buffer.
>>>>     std::vector<Token> TokenStream;
>>>>
>>>>     const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
>>>>     Lexer L(FID, FromFile, SM, PP.getLangOpts());
>>>>
>>>>     // Lex all the tokens in raw mode, to avoid entering #includes or
>>>>     // expanding macros.
>>>>     while (1)
>>>>     {
>>>>         Token Tok;
>>>>         L.LexFromRawLexer(Tok);
>>>>
>>>>         // If this is a # at the start of a line, discard it from the token
>>>>         // stream. We don't want the re-preprocess step to see #defines,
>>>>         // #includes or other preprocessor directives.
>>>>         if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
>>>>             continue;
>>>>
>>>>         // If this is a ## token, change its kind to unknown so that
>>>>         // re-preprocessing it will not produce an error.
>>>>         if (Tok.is(tok::hashhash))
>>>>             Tok.setKind(tok::unknown);
>>>>
>>>>         // If this raw token is an identifier, the raw lexer won't have
>>>>         // looked up the corresponding identifier info for it. Do this now
>>>>         // so that it will be macro expanded when we re-preprocess it.
>>>>         if (Tok.is(tok::raw_identifier))
>>>>             PP.LookUpIdentifierInfo(Tok);
>>>>
>>>>         TokenStream.push_back(Tok);
>>>>
>>>>         for (auto &rep : m_workList.getSortedReplacements())
>>>>         {
>>>>             auto repS = rep->replaceS;
>>>>             auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
>>>>             if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
>>>>             {
>>>>                 //
>>>>             }
>>>>         }
>>>>
>>>>         if (Tok.is(tok::eof))
>>>>             break;
>>>>     }
>>>>
>>>>     // Temporarily change the diagnostics object so that we ignore any
>>>>     // generated diagnostics from this pass.
>>>>     DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(),
>>>>                                &PP.getDiagnostics().getDiagnosticOptions(),
>>>>                                new IgnoringDiagConsumer);
>>>>
>>>>     // Copy the preprocessor and all of its state.
>>>>     auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
>>>>     LangOptions LO = PP.getLangOpts();
>>>>     Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(),
>>>>                        PP.getHeaderSearchInfo(), PP.getModuleLoader(),
>>>>                        PP.getIdentifierTable().getExternalIdentifierLookup());
>>>>     TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
>>>>     TmpPP.setExternalSource(PP.getExternalSource());
>>>>     TmpPP.setPreprocessedOutput(true);
>>>>
>>>>     std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
>>>>     for (const auto &m : PP.macros())
>>>>     {
>>>>         // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
>>>>         TmpPP.getMacroDefinition(m.first);
>>>>
>>>>         for (const auto &tmpm : TmpPP.macros())
>>>>         {
>>>>             if (tmpm.first == m.first)
>>>>             {
>>>>                 auto MD = m.second.getLatest();
>>>>                 auto MI = MD->getMacroInfo();
>>>>                 // If this is a recursive call we might be in a macro
>>>>                 // expansion and the macro might be disabled. We need to
>>>>                 // enable it for now so that all expansions work. Restore
>>>>                 // it later.
>>>>                 MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
>>>>                 if (!MI->isEnabled())
>>>>                 {
>>>>                     MD->getMacroInfo()->EnableMacro();
>>>>                 }
>>>>
>>>>                 // This should not change anything since we just copy data
>>>>                 // over.
>>>>                 auto &mutableState =
>>>>                     const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
>>>>                 mutableState.setLatest(MD);
>>>>                 break;
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     class MacroArgCollector : public clang::PPCallbacks
>>>>     {
>>>>     public:
>>>>         MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}
>>>>
>>>>         void MacroExpands(const Token &Tok, const MacroDefinition &MD,
>>>>                           SourceRange Range, const MacroArgs *Args) override
>>>>         {
>>>>             if (!Args)
>>>>             {
>>>>                 return;
>>>>             }
>>>>             printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
>>>>             for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
>>>>             {
>>>>                 auto TokUnex = Args->getUnexpArgument(i);
>>>>                 // That's just non-const for a cache, so it should be fine.
>>>>                 auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
>>>>                 printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
>>>>                 for (const auto &T : TokPreExp)
>>>>                 {
>>>>                     printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
>>>>                 }
>>>>             }
>>>>         }
>>>>
>>>>         Preprocessor &TmpPP;
>>>>     };
>>>>     TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
>>>>     // Instead: collect the macro arg info in the raw lexing step above, or
>>>>     // do another pass that uses the PP but without expansions.
>>>>
>>>>     /*printf("DUMP MACRO INFO\n");
>>>>     for (const auto &m : PP.macros())
>>>>         PP.dumpMacroInfo(m.first);
>>>>     printf("---\n");
>>>>     for (const auto &m : TmpPP.macros())
>>>>         TmpPP.dumpMacroInfo(m.first);
>>>>     printf("DUMP MACRO INFO END\n");*/
>>>>
>>>>     DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();
>>>>
>>>>     // Inform the preprocessor that we don't want comments.
>>>>     TmpPP.SetCommentRetentionState(false, false);
>>>>
>>>>     // We don't want pragmas either. Although we filtered out #pragma,
>>>>     // removing _Pragma and __pragma is much harder.
>>>>     bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
>>>>     TmpPP.setPragmasEnabled(false);
>>>>
>>>>     // Enter the tokens we just lexed. This will cause them to be macro
>>>>     // expanded but won't enter sub-files (because we removed #'s).
>>>>     TmpPP.EnterTokenStream(TokenStream, false);
>>>>
>>>>     TokenConcatenation ConcatInfo(TmpPP);
>>>>
>>>>     // Lex all the tokens.
>>>>     Token Tok;
>>>>     TmpPP.Lex(Tok);
>>>>
>>>>     std::map<SourceLocation, int> slocIdx;
>>>>
>>>>     auto checkReplacement = [&]() {
>>>>         for (auto &rep : m_workList.getSortedReplacements())
>>>>         {
>>>>             // auto rep = r.second.get();
>>>>             auto repS = rep->replaceS;
>>>>             auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
>>>>             // TODO we need to check here if the repS spans the full range
>>>>             // (or largest?)
>>>>             if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
>>>>             {
>>>>                 if (slocIdx[spellLoc] == 7)
>>>>                 {
>>>>                     // replace
>>>>                 }
>>>>                 slocIdx[spellLoc]++;
>>>>
>>>>                 // Done replacing that one, but we have to keep it alive
>>>>                 // until we're done with it.
>>>>                 m_workList.markDone(rep);
>>>>
>>>>                 printf("[[[\n");
>>>>                 auto repStr = rep->makeReplaceStr();
>>>>                 printf("REPLACED: %s ]]]\n", repStr.c_str());
>>>>                 out += repStr;
>>>>
>>>>                 // Skip ahead until after the whole replacement.
>>>>                 auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
>>>>                 while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
>>>>                 {
>>>>                     TmpPP.Lex(Tok);
>>>>                     assert(!Tok.is(tok::eof) && "End not found");
>>>>                 }
>>>>
>>>>                 // Eat one more since we stopped at the end token and we
>>>>                 // want to continue after it.
>>>>                 TmpPP.Lex(Tok);
>>>>
>>>>                 return true;
>>>>             }
>>>>         }
>>>>         return false;
>>>>     };
>>>>
>>>>     while (Tok.isNot(tok::eof))
>>>>     {
>>>>         printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());
>>>>
>>>>         auto TokLoc = Tok.getLocation();
>>>>         auto TokExp = SM.getExpansionLoc(TokLoc);
>>>>         if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
>>>>         {
>>>>             // Anything after the Stmt we want to replace is not interesting.
>>>>             break;
>>>>         }
>>>>
>>>>         // Skip ahead until we are at the expansion start of the Stmt we
>>>>         // want to replace.
>>>>         if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
>>>>         {
>>>>             if (TokLoc.isMacroID())
>>>>             {
>>>>                 // This is the first token of a macro expansion.
>>>>                 auto LLoc = SM.getExpansionRange(TokLoc);
>>>>
>>>>                 // Ignore tokens whose instantiation location was not the
>>>>                 // main file.
>>>>                 if (SM.getFileID(LLoc.first) != FID)
>>>>                 {
>>>>                     TmpPP.Lex(Tok);
>>>>                     continue;
>>>>                 }
>>>>
>>>>                 assert(SM.getFileID(LLoc.second) == FID &&
>>>>                        "Start and end of expansion must be in the same ultimate file!");
>>>>
>>>>                 bool stopOutputOnNextToken = false;
>>>>                 bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
>>>>                 bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
>>>>                 bool startedOutput = false;
>>>>
>>>>                 Token PrevPrevTok;
>>>>                 Token PrevTok = Tok;
>>>>
>>>>                 while (!Tok.is(tok::eof) &&
>>>>                        SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
>>>>                 {
>>>>                     printf("TOKEN (in macro): %s\n",
>>>>                            TmpPP.getSpelling(Tok).c_str());
>>>>
>>>>                     auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
>>>>                     if (stopOutputOnNextToken)
>>>>                     {
>>>>                         break;
>>>>                     }
>>>>                     if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
>>>>                     {
>>>>                         stopOutputOnNextToken = true;
>>>>                     }
>>>>
>>>>                     if (toReplaceStartsInMacro && !startedOutput)
>>>>                     {
>>>>                         if (TokSpell == toReplaceSpellStart)
>>>>                         {
>>>>                             startedOutput = true;
>>>>                         }
>>>>                         else
>>>>                         {
>>>>                             TmpPP.Lex(Tok);
>>>>                             continue;
>>>>                         }
>>>>                     }
>>>>
>>>>                     // If the tokens were already space separated, or if they
>>>>                     // must be to avoid them being implicitly pasted, add a
>>>>                     // space between them.
>>>>                     if (Tok.hasLeadingSpace() ||
>>>>                         ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
>>>>                         out += ' ';
>>>>
>>>>                     if (checkReplacement())
>>>>                     {
>>>>                         continue;
>>>>                     }
>>>>
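>>>>                     out += TmpPP.getSpelling(Tok);
>>>>                     // Track the last two emitted tokens so that AvoidConcat
>>>>                     // compares against the real neighbors of the next token.
>>>>                     PrevPrevTok = PrevTok;
>>>>                     PrevTok = Tok;
>>>>                     TmpPP.Lex(Tok);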
>>>>                 }
>>>>                 if (stopOutputOnNextToken)
>>>>                 {
>>>>                     break;
>>>>                 }
>>>>             }
>>>>             else
>>>>             {
>>>>                 if (checkReplacement())
>>>>                 {
>>>>                     continue;
>>>>                 }
>>>>
>>>>                 // Output original code because we are outside of a 
>>>> replacement.
>>>>                 out += TmpPP.getSpelling(Tok);
>>>>                 TmpPP.Lex(Tok);
>>>>             }
>>>>         }
>>>>         else
>>>>         {
>>>>             TmpPP.Lex(Tok);
>>>>         }
>>>>     }
>>>>
>>>>     // Restore the preprocessor's old state.
>>>>     TmpPP.setDiagnostics(*OldDiags);
>>>>     TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);
>>>>
>>>>     for (const auto &tmpm : TmpPP.macros())
>>>>     {
>>>>         auto it = MacroPreviouslyEnabled.find(tmpm.first);
>>>>         if (it != MacroPreviouslyEnabled.end())
>>>>         {
>>>>             auto MD = tmpm.second.getLatest();
>>>>             auto MI = MD->getMacroInfo();
>>>>             if (MI->isEnabled() && !it->second)
>>>>             {
>>>>                 MI->DisableMacro();
>>>>             }
>>>>             else if (!MI->isEnabled() && it->second)
>>>>             {
>>>>                 MI->EnableMacro();
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     return out;
>>>> }
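For readers unfamiliar with clang::TokenConcatenation: the spacing check in the macro loop above only inserts a space when the original token had leading whitespace or when printing two tokens back to back would lex differently (for example `+` followed by `+` turning into `++`). The following is a heavily simplified, Clang-free sketch of that idea, not Clang's implementation. `NeedsSpace`, `Tok` and `Print` are hypothetical names; unlike the real AvoidConcat, it looks only at the previous token (no PrevPrevTok) and checks a small, non-exhaustive set of punctuator pairs.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for TokenConcatenation::AvoidConcat.
// Returns true if printing `next` right after `prev` without a space could
// be re-lexed as a different token sequence.
bool NeedsSpace(const std::string &prev, const std::string &next) {
    if (prev.empty() || next.empty())
        return false;
    auto isIdentChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    char a = prev.back();
    char b = next.front();
    // Two identifier/number characters would merge into a single token.
    if (isIdentChar(a) && isIdentChar(b))
        return true;
    // A few punctuator pairs that would fuse (deliberately incomplete).
    static const std::string fusing[] = {"++", "--", "+=", "-=", "->", "<<", ">>",
                                         "&&", "||", "==", "::", "//", "/*"};
    std::string pair{a, b};
    for (const auto &f : fusing)
        if (pair == f)
            return true;
    return false;
}

// Hypothetical token: its spelling and whether it had leading whitespace.
struct Tok {
    std::string spelling;
    bool leadingSpace;
};

// Prints tokens, adding a space only where the source had one or where
// omitting it would paste two tokens together.
std::string Print(const std::vector<Tok> &toks) {
    std::string out;
    const Tok *prev = nullptr;
    for (const auto &t : toks) {
        if (prev && (t.leadingSpace || NeedsSpace(prev->spelling, t.spelling)))
            out += ' ';
        out += t.spelling;
        prev = &t;
    }
    return out;
}
```

For example, the tokens `a`, `+`, `+`, `b` (none with leading whitespace) print as `a+ +b`, which keeps the second `+` from fusing with the first.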
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
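The overall shape of the quoted loop (copy original tokens, and when the first token of a replaced range is reached, emit the replacement text and skip through the range's last token, which is what checkReplacement appears to do) can be illustrated without Clang. This is a sketch under simplifying assumptions: plain file offsets stand in for SourceLocations, there are no macro expansions, and tokens are joined by single spaces instead of using AvoidConcat. `OffTok`, `Replacement` and `Rewrite` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token: file offset (stand-in for a SourceLocation) and spelling.
struct OffTok {
    std::size_t offset;
    std::string spelling;
};

// Hypothetical replacement covering tokens with offsets in [begin, end].
struct Replacement {
    std::size_t begin, end;
    std::string text;
};

std::string Rewrite(const std::vector<OffTok> &toks,
                    const std::vector<Replacement> &reps) {
    // Index replacements by the offset of their first token.
    std::map<std::size_t, const Replacement *> byBegin;
    for (const auto &r : reps)
        byBegin[r.begin] = &r;

    std::string out;
    std::size_t i = 0;
    while (i < toks.size()) {
        if (!out.empty())
            out += ' ';  // simplification: always one space between tokens
        auto it = byBegin.find(toks[i].offset);
        if (it != byBegin.end()) {
            out += it->second->text;
            ++i;  // the first token of the range is always consumed
            // Skip the remaining tokens up to and including the end token.
            while (i < toks.size() && toks[i].offset <= it->second->end)
                ++i;
            continue;
        }
        out += toks[i].spelling;  // outside any replacement: copy as-is
        ++i;
    }
    return out;
}
```

With the tokens `int x = a + b ;` at offsets 0, 4, 6, 8, 10, 12, 13 and one replacement covering offsets 8 through 12 with the text `(a+b)`, `Rewrite` returns `int x = (a+b) ;`.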
>>>
>>
>
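The manual restore at the end of the quoted function (diagnostics, pragma setting, then each macro's enabled flag) is skipped by any early return or exception that might be added to the loop later. A scope guard is one way to make the restore unconditional. The sketch below assumes a hypothetical `FakeMacro`/`MacroTable` in place of clang::MacroInfo and Preprocessor::macros() so that it builds without Clang; with the real API, the destructor would call MI->EnableMacro()/MI->DisableMacro() as the quoted loop does, and could restore the diagnostics and pragma setting in the same place.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a macro's enable/disable state.
struct FakeMacro {
    bool enabled = true;
};

// Hypothetical stand-in for the preprocessor's macro table.
using MacroTable = std::map<std::string, FakeMacro>;

// Snapshots every macro's enabled flag on construction and puts it back on
// destruction, so the state is restored on every exit path of the scope.
class MacroStateGuard {
public:
    explicit MacroStateGuard(MacroTable &table) : table_(table) {
        for (const auto &entry : table_)
            saved_[entry.first] = entry.second.enabled;
    }

    ~MacroStateGuard() {
        for (auto &entry : table_) {
            auto it = saved_.find(entry.first);
            // Macros that did not exist at snapshot time are left alone,
            // mirroring the find() check in the quoted loop.
            if (it != saved_.end())
                entry.second.enabled = it->second;
        }
    }

    MacroStateGuard(const MacroStateGuard &) = delete;
    MacroStateGuard &operator=(const MacroStateGuard &) = delete;

private:
    MacroTable &table_;
    std::map<std::string, bool> saved_;
};
```

Declaring `MacroStateGuard guard(table);` at the point where the old state is recorded replaces the explicit loop at the end of the function.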
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RewriteManager.h
Type: text/x-chdr
Size: 3423 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181023/40acbed1/attachment.h>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RewriteManager.cpp
Type: text/x-c++src
Size: 13393 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181023/40acbed1/attachment.cpp>