[clang] 6e30710 - [analyzer] Introduce MacroExpansionContext to libAnalysis

Balazs Benics via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 22 02:07:08 PST 2021


Author: Balazs Benics
Date: 2021-02-22T11:11:57+01:00
New Revision: 6e3071007b4c9438d2ae49476de87db30d6d24e9

URL: https://github.com/llvm/llvm-project/commit/6e3071007b4c9438d2ae49476de87db30d6d24e9
DIFF: https://github.com/llvm/llvm-project/commit/6e3071007b4c9438d2ae49476de87db30d6d24e9.diff

LOG: [analyzer] Introduce MacroExpansionContext to libAnalysis

Introduce `MacroExpansionContext` to track which macros in a translation unit
expand and what they expand to. This is the first element of the patch-stack
in this direction.

The main goal is to replace the current macro expansion generator in the
`PlistsDiagnostics`, but all the other `DiagnosticsConsumer`s could benefit
from this as well.

`getExpandedText` and `getOriginalText` are the primary functions of this class.
The former returns the text that resulted from the macro expansion chain
starting at a given `SourceLocation`.
The latter returns **the text** of the original source code that was replaced
by the macro expansion chain starting at that location.

Here is an example:

  void bar();
  #define retArg(x) x
  #define retArgUnclosed retArg(bar()
  #define BB CC
  #define applyInt BB(int)
  #define CC(x) retArgUnclosed

  void unbalancedMacros() {
    applyInt  );
  //^~~~~~~~~~^ is the substituted range
  // Original text is "applyInt  )"
  // Expanded text is "bar()"
  }

  #define expandArgUnclosedCommaExpr(x) (x, bar(), 1
  #define f expandArgUnclosedCommaExpr

  void unbalancedMacros2() {
    int x =  f(f(1))  ));  // Look at the parenthesis!
  //         ^~~~~~^ is the substituted range
  // Original text is "f(f(1))"
  // Expanded text is "((1,bar(),1,bar(),1"
  }
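
The lookup behaviour of `getExpandedText` can be illustrated with a small
standalone model. This is only a sketch and not the code from this patch: it
uses plain integer offsets instead of `SourceLocation`, `std::optional` instead
of `llvm::Optional`, and the name `ExpansionLookupModel` is hypothetical. It
shows the three possible outcomes: no value when nothing was expanded at the
queried location, an empty string when the macro expanded to no tokens, and the
recorded text otherwise. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model; offsets stand in for expansion SourceLocations.
struct ExpansionLookupModel {
  // Begin offset -> end offset: one entry per recorded macro expansion.
  std::map<unsigned, unsigned> Ranges;
  // Begin offset -> expanded text: only expansions that produced tokens.
  std::map<unsigned, std::string> Expanded;

  std::optional<std::string> expandedText(unsigned Loc) const {
    // No macro expansion started at this location.
    if (Ranges.find(Loc) == Ranges.end())
      return std::nullopt;
    // A macro was expanded here, but it produced no tokens.
    auto It = Expanded.find(Loc);
    if (It == Expanded.end())
      return std::string();
    // Otherwise return the recorded token sequence.
    return It->second;
  }
};
```

Only the begin location of an expansion is a valid key in this model, which
mirrors the unit tests below where a location inside the macro name yields no
value.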

It might be worth investigating how to provide a reusable component, which
could be used, for example, by a standalone tool that expands all macros to
their definitions.

I borrowed the main idea from the `PrintPreprocessedOutput.cpp` Frontend
component, providing a `PPCallbacks` instance hooking the preprocessor events.
I'm using that for calculating the source range where tokens will be expanded
to. I'm also using the `Preprocessor`'s `OnToken` callback, via the
`Preprocessor::setTokenWatcher` to reconstruct the expanded text.
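
The range calculation can be sketched without any clang dependency. The model
below is an illustration under stated assumptions, not the patch itself:
offsets into a string replace `SourceLocation`, the class name
`ExpansionRangeModel` and its methods are hypothetical, and the sequence of
events fed to it is made up for the `applyInt  );` example. The idea it shows
is that every expansion event reported for the same begin location can only
push the recorded end further, so the stored range finally covers all source
text consumed by the whole expansion chain. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of recording expansion ranges from preprocessor events.
class ExpansionRangeModel {
  // Begin offset -> one-past-the-end offset of the substituted source text.
  std::map<unsigned, unsigned> Ranges;

public:
  // Called once per (possibly nested) macro expansion event.
  void onMacroExpands(unsigned Begin, unsigned End) {
    auto Result = Ranges.try_emplace(Begin, End);
    // An entry already exists: keep whichever end reaches further.
    if (!Result.second && Result.first->second < End)
      Result.first->second = End;
  }

  // Returns the source text covered by the expansion starting at Begin.
  std::optional<std::string> originalText(const std::string &Src,
                                          unsigned Begin) const {
    auto It = Ranges.find(Begin);
    if (It == Ranges.end())
      return std::nullopt;
    return Src.substr(Begin, It->second - Begin);
  }
};
```

For the source text `applyInt  );`, feeding the events `(0, 8)`, `(0, 11)` and
`(0, 8)` leaves the range at `[0, 11)`, so the original text is `applyInt  )`.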

Unfortunately, I concatenate the tokens' string representations without any
whitespace. The only exception is identifiers: after each one I emit an extra
space to produce valid code for token sequences like `int var`.
This could be improved later if needed.
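
The concatenation rule can be sketched as follows. This is a simplified
standalone model rather than the code in this patch: token spellings are plain
strings, and the hypothetical helper `startsLikeIdentifier` approximates "the
token has an `IdentifierInfo`" by checking the first character, which also
covers keywords such as `int`.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: treats a spelling as identifier-like when it starts
// with a letter or an underscore (keywords such as `int` count as well).
static bool startsLikeIdentifier(const std::string &Spelling) {
  if (Spelling.empty())
    return false;
  const unsigned char First = static_cast<unsigned char>(Spelling[0]);
  return std::isalpha(First) != 0 || First == '_';
}

// Concatenates token spellings, emitting one space after each identifier
// and no whitespace anywhere else.
std::string joinExpandedTokens(const std::vector<std::string> &Spellings) {
  std::string Out;
  for (const std::string &Spelling : Spellings) {
    Out += Spelling;
    if (startsLikeIdentifier(Spelling))
      Out += ' ';
  }
  return Out;
}
```

Under this rule the tokens `int`, `var`, `;` become `int var ;`, and an
expansion that ends in an identifier keeps a trailing space, which is what the
`FIXME` in the `RedefUndef` unit test refers to.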

Patch-stack:
  1) D93222 (this one) Introduces the MacroExpansionContext class and unittests

  2) D93223 Create MacroExpansionContext member in AnalysisConsumer and pass
     down to the diagnostics consumers

  3) D93224 Use the MacroExpansionContext for macro expansions in plists
     It replaces the 'old' macro expansion mechanism.

  4) D94673 API for CTU macro expansions
     You should be able to get a `MacroExpansionContext` for each imported TU.
     Right now it will just return `llvm::None` as this is not implemented yet.

  5) FIXME: Implement macro expansion tracking for imported TUs as well.

It would also relieve us from bugs like:
  - [fixed] D86135
  - [confirmed] The `__VA_ARGS__` and other macro nitty-gritty, such as how to
    stringify macro parameters, where to put or swallow commas, etc. are not
    handled correctly.
  - [confirmed] Unbalanced parentheses are not well handled, resulting in
    incorrect expansions or even crashes.
  - [confirmed][crashing] https://bugs.llvm.org/show_bug.cgi?id=48358

Reviewed By: martong, Szelethus

Differential Revision: https://reviews.llvm.org/D93222

Added: 
    clang/include/clang/Analysis/MacroExpansionContext.h
    clang/lib/Analysis/MacroExpansionContext.cpp
    clang/unittests/Analysis/MacroExpansionContextTest.cpp

Modified: 
    clang/lib/Analysis/CMakeLists.txt
    clang/unittests/Analysis/CMakeLists.txt

Removed: 
    


################################################################################
diff  --git a/clang/include/clang/Analysis/MacroExpansionContext.h b/clang/include/clang/Analysis/MacroExpansionContext.h
new file mode 100644
index 000000000000..57934bfc09d9
--- /dev/null
+++ b/clang/include/clang/Analysis/MacroExpansionContext.h
@@ -0,0 +1,127 @@
+//===- MacroExpansionContext.h - Macro expansion information ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_MACROEXPANSIONCONTEXT_H
+#define LLVM_CLANG_ANALYSIS_MACROEXPANSIONCONTEXT_H
+
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceLocation.h"
+#include "clang/Lex/Preprocessor.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+
+namespace clang {
+
+namespace detail {
+class MacroExpansionRangeRecorder;
+} // namespace detail
+
+/// MacroExpansionContext tracks the macro expansions processed by the
+/// Preprocessor. It means that it can track source locations from a single
+/// translation unit. For every macro expansion it can tell you what text will
+/// be substituted.
+///
+/// It was designed to deal with:
+///  - regular macros
+///  - macro functions
+///  - variadic macros
+///  - transitive macro expansions
+///  - macro redefinition
+///  - unbalanced parenthesis
+///
+/// \code{.c}
+///   void bar();
+///   #define retArg(x) x
+///   #define retArgUnclosed retArg(bar()
+///   #define BB CC
+///   #define applyInt BB(int)
+///   #define CC(x) retArgUnclosed
+///
+///   void unbalancedMacros() {
+///     applyInt  );
+///   //^~~~~~~~~~^ is the substituted range
+///   // Substituted text is "applyInt  )"
+///   // Expanded text is "bar()"
+///   }
+///
+///   #define expandArgUnclosedCommaExpr(x) (x, bar(), 1
+///   #define f expandArgUnclosedCommaExpr
+///
+///   void unbalancedMacros2() {
+///     int x =  f(f(1))  ));  // Look at the parenthesis!
+///   //         ^~~~~~^ is the substituted range
+///   // Substituted text is "f(f(1))"
+///   // Expanded text is "((1,bar(),1,bar(),1"
+///   }
+/// \endcode
+/// \remark Currently we don't respect the whitespaces between expanded tokens,
+///         so the output for this example might differ from the -E compiler
+///         invocation.
+/// \remark All whitespaces are consumed while constructing the expansion.
+///         After every identifier a single space is inserted to produce
+///         valid C code even if an identifier follows another identifier,
+///         such as in variable declarations.
+/// \remark MacroExpansionContext object must outlive the Preprocessor
+///         parameter.
+class MacroExpansionContext {
+public:
+  /// Creates a MacroExpansionContext.
+  /// \remark You must call registerForPreprocessor to set the required
+  ///         onTokenLexed callback and the PPCallbacks.
+  explicit MacroExpansionContext(const LangOptions &LangOpts);
+
+  /// Register the necessary callbacks to the Preprocessor to record the
+  /// expansion events and the generated tokens. Must ensure that this object
+  /// outlives the given Preprocessor.
+  void registerForPreprocessor(Preprocessor &PP);
+
+  /// \param MacroExpansionLoc Must be the expansion location of a macro.
+  /// \return The textual representation of the token sequence which was
+  ///         substituted in place of the macro after the preprocessing.
+  ///         If no macro was expanded at that location, returns llvm::None.
+  Optional<StringRef> getExpandedText(SourceLocation MacroExpansionLoc) const;
+
+  /// \param MacroExpansionLoc Must be the expansion location of a macro.
+  /// \return The text from the original source code which was substituted by
+  ///         the macro expansion chain from the given location.
+  ///         If no macro was expanded at that location, returns llvm::None.
+  Optional<StringRef> getOriginalText(SourceLocation MacroExpansionLoc) const;
+
+  LLVM_DUMP_METHOD void dumpExpansionRangesToStream(raw_ostream &OS) const;
+  LLVM_DUMP_METHOD void dumpExpandedTextsToStream(raw_ostream &OS) const;
+  LLVM_DUMP_METHOD void dumpExpansionRanges() const;
+  LLVM_DUMP_METHOD void dumpExpandedTexts() const;
+
+private:
+  friend class detail::MacroExpansionRangeRecorder;
+  using MacroExpansionText = SmallString<40>;
+  using ExpansionMap = llvm::DenseMap<SourceLocation, MacroExpansionText>;
+  using ExpansionRangeMap = llvm::DenseMap<SourceLocation, SourceLocation>;
+
+  /// Associates the textual representation of the expanded tokens at the given
+  /// macro expansion location.
+  ExpansionMap ExpandedTokens;
+
+  /// Tracks which source location was the last affected by any macro
+  /// substitution starting from a given macro expansion location.
+  ExpansionRangeMap ExpansionRanges;
+
+  Preprocessor *PP = nullptr;
+  SourceManager *SM = nullptr;
+  const LangOptions &LangOpts;
+
+  /// This callback is called by the preprocessor.
+  /// It stores the textual representation of the expanded token sequence for a
+  /// macro expansion location.
+  void onTokenLexed(const Token &Tok);
+};
+} // end namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_MACROEXPANSIONCONTEXT_H

diff  --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index ed626a6e130c..00c8d6756177 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -20,6 +20,7 @@ add_clang_library(clangAnalysis
   ExprMutationAnalyzer.cpp
   IssueHash.cpp
   LiveVariables.cpp
+  MacroExpansionContext.cpp
   ObjCNoReturn.cpp
   PathDiagnostic.cpp
   PostOrderCFGView.cpp

diff  --git a/clang/lib/Analysis/MacroExpansionContext.cpp b/clang/lib/Analysis/MacroExpansionContext.cpp
new file mode 100644
index 000000000000..bb5095a114cb
--- /dev/null
+++ b/clang/lib/Analysis/MacroExpansionContext.cpp
@@ -0,0 +1,230 @@
+//===- MacroExpansionContext.cpp - Macro expansion information --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/MacroExpansionContext.h"
+#include "llvm/Support/Debug.h"
+
+#define DEBUG_TYPE "macro-expansion-context"
+
+static void dumpTokenInto(const clang::Preprocessor &PP, clang::raw_ostream &OS,
+                          clang::Token Tok);
+
+namespace clang {
+namespace detail {
+class MacroExpansionRangeRecorder : public PPCallbacks {
+  const Preprocessor &PP;
+  SourceManager &SM;
+  MacroExpansionContext::ExpansionRangeMap &ExpansionRanges;
+
+public:
+  explicit MacroExpansionRangeRecorder(
+      const Preprocessor &PP, SourceManager &SM,
+      MacroExpansionContext::ExpansionRangeMap &ExpansionRanges)
+      : PP(PP), SM(SM), ExpansionRanges(ExpansionRanges) {}
+
+  void MacroExpands(const Token &MacroName, const MacroDefinition &MD,
+                    SourceRange Range, const MacroArgs *Args) override {
+    // Ignore annotation tokens like: _Pragma("pack(push, 1)")
+    if (MacroName.getIdentifierInfo()->getName() == "_Pragma")
+      return;
+
+    SourceLocation MacroNameBegin = SM.getExpansionLoc(MacroName.getLocation());
+    assert(MacroNameBegin == SM.getExpansionLoc(Range.getBegin()));
+
+    const SourceLocation ExpansionEnd = [Range, &SM = SM, &MacroName] {
+      // If the range is empty, use the length of the macro.
+      if (Range.getBegin() == Range.getEnd())
+        return SM.getExpansionLoc(
+            MacroName.getLocation().getLocWithOffset(MacroName.getLength()));
+
+      // Include the last character.
+      return SM.getExpansionLoc(Range.getEnd()).getLocWithOffset(1);
+    }();
+
+    LLVM_DEBUG(llvm::dbgs() << "MacroExpands event: '";
+               dumpTokenInto(PP, llvm::dbgs(), MacroName);
+               llvm::dbgs()
+               << "' with length " << MacroName.getLength() << " at ";
+               MacroNameBegin.print(llvm::dbgs(), SM);
+               llvm::dbgs() << ", expansion end at ";
+               ExpansionEnd.print(llvm::dbgs(), SM); llvm::dbgs() << '\n';);
+
+    // If the expansion range is empty, use the identifier of the macro as a
+    // range.
+    MacroExpansionContext::ExpansionRangeMap::iterator It;
+    bool Inserted;
+    std::tie(It, Inserted) =
+        ExpansionRanges.try_emplace(MacroNameBegin, ExpansionEnd);
+    if (Inserted) {
+      LLVM_DEBUG(llvm::dbgs() << "maps ";
+                 It->getFirst().print(llvm::dbgs(), SM); llvm::dbgs() << " to ";
+                 It->getSecond().print(llvm::dbgs(), SM);
+                 llvm::dbgs() << '\n';);
+    } else {
+      if (SM.isBeforeInTranslationUnit(It->getSecond(), ExpansionEnd)) {
+        It->getSecond() = ExpansionEnd;
+        LLVM_DEBUG(
+            llvm::dbgs() << "remaps "; It->getFirst().print(llvm::dbgs(), SM);
+            llvm::dbgs() << " to "; It->getSecond().print(llvm::dbgs(), SM);
+            llvm::dbgs() << '\n';);
+      }
+    }
+  }
+};
+} // namespace detail
+} // namespace clang
+
+using namespace clang;
+
+MacroExpansionContext::MacroExpansionContext(const LangOptions &LangOpts)
+    : LangOpts(LangOpts) {}
+
+void MacroExpansionContext::registerForPreprocessor(Preprocessor &NewPP) {
+  PP = &NewPP;
+  SM = &NewPP.getSourceManager();
+
+  // Make sure that the Preprocessor does not outlive the MacroExpansionContext.
+  PP->addPPCallbacks(std::make_unique<detail::MacroExpansionRangeRecorder>(
+      *PP, *SM, ExpansionRanges));
+  // Same applies here.
+  PP->setTokenWatcher([this](const Token &Tok) { onTokenLexed(Tok); });
+}
+
+Optional<StringRef>
+MacroExpansionContext::getExpandedText(SourceLocation MacroExpansionLoc) const {
+  if (MacroExpansionLoc.isMacroID())
+    return llvm::None;
+
+  // If there was no macro expansion at that location, return None.
+  if (ExpansionRanges.find_as(MacroExpansionLoc) == ExpansionRanges.end())
+    return llvm::None;
+
+  // There was macro expansion, but resulted in no tokens, return empty string.
+  const auto It = ExpandedTokens.find_as(MacroExpansionLoc);
+  if (It == ExpandedTokens.end())
+    return StringRef{""};
+
+  // Otherwise we have the actual token sequence as string.
+  return StringRef{It->getSecond()};
+}
+
+Optional<StringRef>
+MacroExpansionContext::getOriginalText(SourceLocation MacroExpansionLoc) const {
+  if (MacroExpansionLoc.isMacroID())
+    return llvm::None;
+
+  const auto It = ExpansionRanges.find_as(MacroExpansionLoc);
+  if (It == ExpansionRanges.end())
+    return llvm::None;
+
+  assert(It->getFirst() != It->getSecond() &&
+         "Every macro expansion must cover a non-empty range.");
+
+  return Lexer::getSourceText(
+      CharSourceRange::getCharRange(It->getFirst(), It->getSecond()), *SM,
+      LangOpts);
+}
+
+void MacroExpansionContext::dumpExpansionRanges() const {
+  dumpExpansionRangesToStream(llvm::dbgs());
+}
+void MacroExpansionContext::dumpExpandedTexts() const {
+  dumpExpandedTextsToStream(llvm::dbgs());
+}
+
+void MacroExpansionContext::dumpExpansionRangesToStream(raw_ostream &OS) const {
+  std::vector<std::pair<SourceLocation, SourceLocation>> LocalExpansionRanges;
+  LocalExpansionRanges.reserve(ExpansionRanges.size());
+  for (const auto &Record : ExpansionRanges)
+    LocalExpansionRanges.emplace_back(
+        std::make_pair(Record.getFirst(), Record.getSecond()));
+  llvm::sort(LocalExpansionRanges);
+
+  OS << "\n=============== ExpansionRanges ===============\n";
+  for (const auto &Record : LocalExpansionRanges) {
+    OS << "> ";
+    Record.first.print(OS, *SM);
+    OS << ", ";
+    Record.second.print(OS, *SM);
+    OS << '\n';
+  }
+}
+
+void MacroExpansionContext::dumpExpandedTextsToStream(raw_ostream &OS) const {
+  std::vector<std::pair<SourceLocation, MacroExpansionText>>
+      LocalExpandedTokens;
+  LocalExpandedTokens.reserve(ExpandedTokens.size());
+  for (const auto &Record : ExpandedTokens)
+    LocalExpandedTokens.emplace_back(
+        std::make_pair(Record.getFirst(), Record.getSecond()));
+  llvm::sort(LocalExpandedTokens);
+
+  OS << "\n=============== ExpandedTokens ===============\n";
+  for (const auto &Record : LocalExpandedTokens) {
+    OS << "> ";
+    Record.first.print(OS, *SM);
+    OS << " -> '" << Record.second << "'\n";
+  }
+}
+
+static void dumpTokenInto(const Preprocessor &PP, raw_ostream &OS, Token Tok) {
+  assert(Tok.isNot(tok::raw_identifier));
+
+  // Ignore annotation tokens like: _Pragma("pack(push, 1)")
+  if (Tok.isAnnotation())
+    return;
+
+  if (IdentifierInfo *II = Tok.getIdentifierInfo()) {
+    // FIXME: For now, we don't respect whitespaces between macro expanded
+    // tokens. We just emit a space after every identifier to produce a valid
+    // code for `int a ;` like expansions.
+    //              ^-^-- Space after the 'int' and 'a' identifiers.
+    OS << II->getName() << ' ';
+  } else if (Tok.isLiteral() && !Tok.needsCleaning() && Tok.getLiteralData()) {
+    OS << StringRef(Tok.getLiteralData(), Tok.getLength());
+  } else {
+    char Tmp[256];
+    if (Tok.getLength() < sizeof(Tmp)) {
+      const char *TokPtr = Tmp;
+      // FIXME: Might use a different overload for cleaner callsite.
+      unsigned Len = PP.getSpelling(Tok, TokPtr);
+      OS.write(TokPtr, Len);
+    } else {
+      OS << "<too long token>";
+    }
+  }
+}
+
+void MacroExpansionContext::onTokenLexed(const Token &Tok) {
+  SourceLocation SLoc = Tok.getLocation();
+  if (SLoc.isFileID())
+    return;
+
+  LLVM_DEBUG(llvm::dbgs() << "lexed macro expansion token '";
+             dumpTokenInto(*PP, llvm::dbgs(), Tok); llvm::dbgs() << "' at ";
+             SLoc.print(llvm::dbgs(), *SM); llvm::dbgs() << '\n';);
+
+  // Remove spelling location.
+  SourceLocation CurrExpansionLoc = SM->getExpansionLoc(SLoc);
+
+  MacroExpansionText TokenAsString;
+  llvm::raw_svector_ostream OS(TokenAsString);
+
+  // FIXME: Prepend newlines and space to produce the exact same output as the
+  // preprocessor would for this token.
+
+  dumpTokenInto(*PP, OS, Tok);
+
+  ExpansionMap::iterator It;
+  bool Inserted;
+  std::tie(It, Inserted) =
+      ExpandedTokens.try_emplace(CurrExpansionLoc, std::move(TokenAsString));
+  if (!Inserted)
+    It->getSecond().append(TokenAsString);
+}
+

diff  --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index 66069c854a6a..00026874417b 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -8,6 +8,7 @@ add_clang_unittest(ClangAnalysisTests
   CFGTest.cpp
   CloneDetectionTest.cpp
   ExprMutationAnalyzerTest.cpp
+  MacroExpansionContextTest.cpp
   )
 
 clang_target_link_libraries(ClangAnalysisTests
@@ -17,6 +18,13 @@ clang_target_link_libraries(ClangAnalysisTests
   clangASTMatchers
   clangBasic
   clangFrontend
+  clangLex
   clangSerialization
+  clangTesting
   clangTooling
   )
+
+target_link_libraries(ClangAnalysisTests
+  PRIVATE
+  LLVMTestingSupport
+  )

diff  --git a/clang/unittests/Analysis/MacroExpansionContextTest.cpp b/clang/unittests/Analysis/MacroExpansionContextTest.cpp
new file mode 100644
index 000000000000..2e86457d276c
--- /dev/null
+++ b/clang/unittests/Analysis/MacroExpansionContextTest.cpp
@@ -0,0 +1,424 @@
+//===- unittests/Analysis/MacroExpansionContextTest.cpp - -----------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/MacroExpansionContext.h"
+#include "clang/AST/ASTConsumer.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/Basic/Diagnostic.h"
+#include "clang/Basic/DiagnosticOptions.h"
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LangOptions.h"
+#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/TargetInfo.h"
+#include "clang/Basic/TargetOptions.h"
+#include "clang/Lex/HeaderSearch.h"
+#include "clang/Lex/HeaderSearchOptions.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/Lex/PreprocessorOptions.h"
+#include "clang/Parse/Parser.h"
+#include "llvm/ADT/SmallString.h"
+#include "gtest/gtest.h"
+
+// static bool HACK_EnableDebugInUnitTest = (::llvm::DebugFlag = true);
+
+namespace clang {
+namespace analysis {
+namespace {
+
+class MacroExpansionContextTest : public ::testing::Test {
+protected:
+  MacroExpansionContextTest()
+      : InMemoryFileSystem(new llvm::vfs::InMemoryFileSystem),
+        FileMgr(FileSystemOptions(), InMemoryFileSystem),
+        DiagID(new DiagnosticIDs()), DiagOpts(new DiagnosticOptions()),
+        Diags(DiagID, DiagOpts.get(), new IgnoringDiagConsumer()),
+        SourceMgr(Diags, FileMgr), TargetOpts(new TargetOptions()) {
+    TargetOpts->Triple = "x86_64-pc-linux-unknown";
+    Target = TargetInfo::CreateTargetInfo(Diags, TargetOpts);
+    LangOpts.CPlusPlus20 = 1; // For __VA_OPT__
+  }
+
+  IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> InMemoryFileSystem;
+  FileManager FileMgr;
+  IntrusiveRefCntPtr<DiagnosticIDs> DiagID;
+  IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts;
+  DiagnosticsEngine Diags;
+  SourceManager SourceMgr;
+  LangOptions LangOpts;
+  std::shared_ptr<TargetOptions> TargetOpts;
+  IntrusiveRefCntPtr<TargetInfo> Target;
+
+  std::unique_ptr<MacroExpansionContext>
+  getMacroExpansionContextFor(StringRef SourceText) {
+    std::unique_ptr<llvm::MemoryBuffer> Buf =
+        llvm::MemoryBuffer::getMemBuffer(SourceText);
+    SourceMgr.setMainFileID(SourceMgr.createFileID(std::move(Buf)));
+    TrivialModuleLoader ModLoader;
+    HeaderSearch HeaderInfo(std::make_shared<HeaderSearchOptions>(), SourceMgr,
+                            Diags, LangOpts, Target.get());
+    Preprocessor PP(std::make_shared<PreprocessorOptions>(), Diags, LangOpts,
+                    SourceMgr, HeaderInfo, ModLoader,
+                    /*IILookup =*/nullptr,
+                    /*OwnsHeaderSearch =*/false);
+
+    PP.Initialize(*Target);
+    auto Ctx = std::make_unique<MacroExpansionContext>(LangOpts);
+    Ctx->registerForPreprocessor(PP);
+
+    // Lex source text.
+    PP.EnterMainSourceFile();
+
+    while (true) {
+      Token Tok;
+      PP.Lex(Tok);
+      if (Tok.is(tok::eof))
+        break;
+    }
+
+    // Callbacks have been executed at this point.
+    return Ctx;
+  }
+
+  /// Returns the expansion location to main file at the given row and column.
+  SourceLocation at(unsigned row, unsigned col) const {
+    SourceLocation Loc =
+        SourceMgr.translateLineCol(SourceMgr.getMainFileID(), row, col);
+    return SourceMgr.getExpansionLoc(Loc);
+  }
+
+  static std::string dumpExpandedTexts(const MacroExpansionContext &Ctx) {
+    std::string Buf;
+    llvm::raw_string_ostream OS{Buf};
+    Ctx.dumpExpandedTextsToStream(OS);
+    return OS.str();
+  }
+
+  static std::string dumpExpansionRanges(const MacroExpansionContext &Ctx) {
+    std::string Buf;
+    llvm::raw_string_ostream OS{Buf};
+    Ctx.dumpExpansionRangesToStream(OS);
+    return OS.str();
+  }
+};
+
+TEST_F(MacroExpansionContextTest, IgnoresPragmas) {
+  // No-crash during lexing.
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  _Pragma("pack(push, 1)")
+  _Pragma("pack(pop, 1)")
+      )code");
+  // After preprocessing:
+  // #pragma pack(push, 1)
+  // #pragma pack(pop, 1)
+
+  EXPECT_EQ("\n=============== ExpandedTokens ===============\n",
+            dumpExpandedTexts(*Ctx));
+  EXPECT_EQ("\n=============== ExpansionRanges ===============\n",
+            dumpExpansionRanges(*Ctx));
+
+  EXPECT_FALSE(Ctx->getExpandedText(at(2, 1)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(2, 1)).hasValue());
+
+  EXPECT_FALSE(Ctx->getExpandedText(at(2, 3)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(2, 3)).hasValue());
+
+  EXPECT_FALSE(Ctx->getExpandedText(at(3, 3)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(3, 3)).hasValue());
+}
+
+TEST_F(MacroExpansionContextTest, NoneForNonExpansionLocations) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define EMPTY
+  A b cd EMPTY ef EMPTY gh
+EMPTY zz
+      )code");
+  // After preprocessing:
+  //  A b cd ef gh
+  //      zz
+
+  // That's the beginning of the definition of EMPTY.
+  EXPECT_FALSE(Ctx->getExpandedText(at(2, 11)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(2, 11)).hasValue());
+
+  // The space before the first expansion of EMPTY.
+  EXPECT_FALSE(Ctx->getExpandedText(at(3, 9)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(3, 9)).hasValue());
+
+  // The beginning of the first expansion of EMPTY.
+  EXPECT_TRUE(Ctx->getExpandedText(at(3, 10)).hasValue());
+  EXPECT_TRUE(Ctx->getOriginalText(at(3, 10)).hasValue());
+
+  // Pointing inside of the token EMPTY, but not at the beginning.
+  // FIXME: We only deal with begin locations.
+  EXPECT_FALSE(Ctx->getExpandedText(at(3, 11)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(3, 11)).hasValue());
+
+  // Same here.
+  EXPECT_FALSE(Ctx->getExpandedText(at(3, 12)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(3, 12)).hasValue());
+
+  // The beginning of the last expansion of EMPTY.
+  EXPECT_TRUE(Ctx->getExpandedText(at(4, 1)).hasValue());
+  EXPECT_TRUE(Ctx->getOriginalText(at(4, 1)).hasValue());
+
+  // Same as for the 3:11 case.
+  EXPECT_FALSE(Ctx->getExpandedText(at(4, 2)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(4, 2)).hasValue());
+}
+
+TEST_F(MacroExpansionContextTest, EmptyExpansions) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define EMPTY
+  A b cd EMPTY ef EMPTY gh
+EMPTY zz
+      )code");
+  // After preprocessing:
+  //  A b cd ef gh
+  //      zz
+
+  EXPECT_EQ("", Ctx->getExpandedText(at(3, 10)).getValue());
+  EXPECT_EQ("EMPTY", Ctx->getOriginalText(at(3, 10)).getValue());
+
+  EXPECT_EQ("", Ctx->getExpandedText(at(3, 19)).getValue());
+  EXPECT_EQ("EMPTY", Ctx->getOriginalText(at(3, 19)).getValue());
+
+  EXPECT_EQ("", Ctx->getExpandedText(at(4, 1)).getValue());
+  EXPECT_EQ("EMPTY", Ctx->getOriginalText(at(4, 1)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, TransitiveExpansions) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define EMPTY
+  #define WOOF EMPTY ) EMPTY   1
+  A b cd WOOF ef EMPTY gh
+      )code");
+  // After preprocessing:
+  //  A b cd ) 1 ef gh
+
+  EXPECT_EQ("WOOF", Ctx->getOriginalText(at(4, 10)).getValue());
+
+  EXPECT_EQ("", Ctx->getExpandedText(at(4, 18)).getValue());
+  EXPECT_EQ("EMPTY", Ctx->getOriginalText(at(4, 18)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, MacroFunctions) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define EMPTY
+  #define WOOF(x) x(EMPTY ) )  ) EMPTY   1
+  A b cd WOOF($$ ef) EMPTY gh
+  WOOF(WOOF)
+  WOOF(WOOF(bar barr))),,),')
+      )code");
+  // After preprocessing:
+  //  A b cd $$ ef( ) ) ) 1 gh
+  //  WOOF( ) ) ) 1
+  //  bar barr( ) ) ) 1( ) ) ) 1),,),')
+
+  EXPECT_EQ("$$ ef ()))1", Ctx->getExpandedText(at(4, 10)).getValue());
+  EXPECT_EQ("WOOF($$ ef)", Ctx->getOriginalText(at(4, 10)).getValue());
+
+  EXPECT_EQ("", Ctx->getExpandedText(at(4, 22)).getValue());
+  EXPECT_EQ("EMPTY", Ctx->getOriginalText(at(4, 22)).getValue());
+
+  EXPECT_EQ("WOOF ()))1", Ctx->getExpandedText(at(5, 3)).getValue());
+  EXPECT_EQ("WOOF(WOOF)", Ctx->getOriginalText(at(5, 3)).getValue());
+
+  EXPECT_EQ("bar barr ()))1()))1", Ctx->getExpandedText(at(6, 3)).getValue());
+  EXPECT_EQ("WOOF(WOOF(bar barr))", Ctx->getOriginalText(at(6, 3)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, VariadicMacros) {
+  // From the GCC website.
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define eprintf(format, ...) fprintf (stderr, format, __VA_ARGS__)
+  eprintf("success!\n", );
+  eprintf("success!\n");
+
+  #define eprintf2(format, ...) \
+    fprintf (stderr, format __VA_OPT__(,) __VA_ARGS__)
+  eprintf2("success!\n", );
+  eprintf2("success!\n");
+      )code");
+  // After preprocessing:
+  //  fprintf (stderr, "success!\n", );
+  //  fprintf (stderr, "success!\n", );
+  //  fprintf (stderr, "success!\n" );
+  //  fprintf (stderr, "success!\n" );
+
+  EXPECT_EQ(R"(fprintf (stderr ,"success!\n",))",
+            Ctx->getExpandedText(at(3, 3)).getValue());
+  EXPECT_EQ(R"(eprintf("success!\n", ))",
+            Ctx->getOriginalText(at(3, 3)).getValue());
+
+  EXPECT_EQ(R"(fprintf (stderr ,"success!\n",))",
+            Ctx->getExpandedText(at(4, 3)).getValue());
+  EXPECT_EQ(R"(eprintf("success!\n"))",
+            Ctx->getOriginalText(at(4, 3)).getValue());
+
+  EXPECT_EQ(R"(fprintf (stderr ,"success!\n"))",
+            Ctx->getExpandedText(at(8, 3)).getValue());
+  EXPECT_EQ(R"(eprintf2("success!\n", ))",
+            Ctx->getOriginalText(at(8, 3)).getValue());
+
+  EXPECT_EQ(R"(fprintf (stderr ,"success!\n"))",
+            Ctx->getExpandedText(at(9, 3)).getValue());
+  EXPECT_EQ(R"(eprintf2("success!\n"))",
+            Ctx->getOriginalText(at(9, 3)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, ConcatenationMacros) {
+  // From the GCC website.
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define COMMAND(NAME)  { #NAME, NAME ## _command }
+  struct command commands[] = {
+    COMMAND(quit),
+    COMMAND(help),
+  };)code");
+  // After preprocessing:
+  //  struct command commands[] = {
+  //    { "quit", quit_command },
+  //    { "help", help_command },
+  //  };
+
+  EXPECT_EQ(R"({"quit",quit_command })",
+            Ctx->getExpandedText(at(4, 5)).getValue());
+  EXPECT_EQ("COMMAND(quit)", Ctx->getOriginalText(at(4, 5)).getValue());
+
+  EXPECT_EQ(R"({"help",help_command })",
+            Ctx->getExpandedText(at(5, 5)).getValue());
+  EXPECT_EQ("COMMAND(help)", Ctx->getOriginalText(at(5, 5)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, StringizingMacros) {
+  // From the GCC website.
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define WARN_IF(EXP) \
+  do { if (EXP) \
+          fprintf (stderr, "Warning: " #EXP "\n"); } \
+  while (0)
+  WARN_IF (x == 0);
+
+  #define xstr(s) str(s)
+  #define str(s) #s
+  #define foo 4
+  str (foo)
+  xstr (foo)
+      )code");
+  // After preprocessing:
+  //  do { if (x == 0) fprintf (stderr, "Warning: " "x == 0" "\n"); } while (0);
+  //  "foo"
+  //  "4"
+
+  EXPECT_EQ(
+      R"(do {if (x ==0)fprintf (stderr ,"Warning: ""x == 0""\n");}while (0))",
+      Ctx->getExpandedText(at(6, 3)).getValue());
+  EXPECT_EQ("WARN_IF (x == 0)", Ctx->getOriginalText(at(6, 3)).getValue());
+
+  EXPECT_EQ(R"("foo")", Ctx->getExpandedText(at(11, 3)).getValue());
+  EXPECT_EQ("str (foo)", Ctx->getOriginalText(at(11, 3)).getValue());
+
+  EXPECT_EQ(R"("4")", Ctx->getExpandedText(at(12, 3)).getValue());
+  EXPECT_EQ("xstr (foo)", Ctx->getOriginalText(at(12, 3)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, StringizingVariadicMacros) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define xstr(...) str(__VA_ARGS__)
+  #define str(...) #__VA_ARGS__
+  #define RParen2x ) )
+  #define EMPTY
+  #define f(x, ...) __VA_ARGS__ ! x * x
+  #define g(...) zz EMPTY f(__VA_ARGS__ ! x) f() * y
+  #define h(x, G) G(x) G(x ## x RParen2x
+  #define q(G) h(apple, G(apple)) RParen2x
+
+  q(g)
+  q(xstr)
+  g(RParen2x)
+  f( RParen2x )s
+      )code");
+  // clang-format off
+  // After preprocessing:
+  //  zz ! apple ! x * apple ! x ! * * y(apple) zz ! apple ! x * apple ! x ! * * y(appleapple ) ) ) )
+  //  "apple"(apple) "apple"(appleapple ) ) ) )
+  //  zz ! * ) ! x) ! * * y
+  //  ! ) ) * ) )
+  // clang-format on
+
+  EXPECT_EQ("zz !apple !x *apple !x !**y (apple )zz !apple !x *apple !x !**y "
+            "(appleapple ))))",
+            Ctx->getExpandedText(at(11, 3)).getValue());
+  EXPECT_EQ("q(g)", Ctx->getOriginalText(at(11, 3)).getValue());
+
+  EXPECT_EQ(R"res("apple"(apple )"apple"(appleapple )))))res",
+            Ctx->getExpandedText(at(12, 3)).getValue());
+  EXPECT_EQ("q(xstr)", Ctx->getOriginalText(at(12, 3)).getValue());
+
+  EXPECT_EQ("zz !*)!x )!**y ", Ctx->getExpandedText(at(13, 3)).getValue());
+  EXPECT_EQ("g(RParen2x)", Ctx->getOriginalText(at(13, 3)).getValue());
+
+  EXPECT_EQ("!))*))", Ctx->getExpandedText(at(14, 3)).getValue());
+  EXPECT_EQ("f( RParen2x )", Ctx->getOriginalText(at(14, 3)).getValue());
+}
+
+TEST_F(MacroExpansionContextTest, RedefUndef) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define Hi(x) Welcome x
+  Hi(Adam)
+  #define Hi Willkommen
+  Hi Hans
+  #undef Hi
+  Hi(Hi)
+      )code");
+  // After preprocessing:
+  //  Welcome Adam
+  //  Willkommen Hans
+  //  Hi(Hi)
+
+  // FIXME: Extra space follows every identifier.
+  EXPECT_EQ("Welcome Adam ", Ctx->getExpandedText(at(3, 3)).getValue());
+  EXPECT_EQ("Hi(Adam)", Ctx->getOriginalText(at(3, 3)).getValue());
+
+  EXPECT_EQ("Willkommen ", Ctx->getExpandedText(at(5, 3)).getValue());
+  EXPECT_EQ("Hi", Ctx->getOriginalText(at(5, 3)).getValue());
+
+  // There was no macro expansion at 7:3, we should expect None.
+  EXPECT_FALSE(Ctx->getExpandedText(at(7, 3)).hasValue());
+  EXPECT_FALSE(Ctx->getOriginalText(at(7, 3)).hasValue());
+}
+
+TEST_F(MacroExpansionContextTest, UnbalacedParenthesis) {
+  const auto Ctx = getMacroExpansionContextFor(R"code(
+  #define retArg(x) x
+  #define retArgUnclosed retArg(fun()
+  #define BB CC
+  #define applyInt BB(int)
+  #define CC(x) retArgUnclosed
+
+  applyInt );
+
+  #define expandArgUnclosedCommaExpr(x) (x, fun(), 1
+  #define f expandArgUnclosedCommaExpr
+
+  int x =  f(f(1))  ));
+      )code");
+  // After preprocessing:
+  //  fun();
+  //  int x = ((1, fun(), 1, fun(), 1 ));
+
+  EXPECT_EQ("fun ()", Ctx->getExpandedText(at(8, 3)).getValue());
+  EXPECT_EQ("applyInt )", Ctx->getOriginalText(at(8, 3)).getValue());
+
+  EXPECT_EQ("((1,fun (),1,fun (),1",
+            Ctx->getExpandedText(at(13, 12)).getValue());
+  EXPECT_EQ("f(f(1))", Ctx->getOriginalText(at(13, 12)).getValue());
+}
+
+} // namespace
+} // namespace analysis
+} // namespace clang


        

