<div dir="ltr">Thanks<div>Russ</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 22 May 2019 at 16:12, Ilya Biryukov <<a href="mailto:ibiryukov@google.com">ibiryukov@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Re-landed in r361391 and the buildbots seem to be happy.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 22, 2019 at 4:35 PM Ilya Biryukov <<a href="mailto:ibiryukov@google.com" target="_blank">ibiryukov@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I assume the tests produce warnings when parsing command-line arguments and we initialize the diagnostics client <b>after</b> that.<div>I'll re-land with a fix and will watch the buildbot for new failures.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 22, 2019 at 4:30 PM Ilya Biryukov <<a href="mailto:ibiryukov@google.com" target="_blank">ibiryukov@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Is there any way to get symbolized stacktraces?</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 22, 2019 at 4:28 PM Ilya Biryukov <<a href="mailto:ibiryukov@google.com" target="_blank">ibiryukov@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I'll take a look too, thanks</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 22, 2019 at 2:52 PM Russell Gallop <<a href="mailto:russell.gallop@gmail.com" target="_blank">russell.gallop@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" 
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Ilya,<div><br></div><div>I've reverted this (and r361248) in r361377 to get some bots green:</div><div><a href="http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast" target="_blank">http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast</a> </div><div><a href="http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast" target="_blank">http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast</a> <br></div><div><br></div>These are failing with an assert when built with: -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-scei-ps4 -DLLVM_ENABLE_ASSERTIONS=ON<br><br>******************** TEST 'Clang-Unit :: Tooling/Syntax/./SyntaxTests/TokenBufferTest.SpelledByExpanded' FAILED ********************<br>Note: Google Test filter = TokenBufferTest.SpelledByExpanded<br>[==========] Running 1 test from 1 test case.<br>[----------] Global test environment set-up.<br>[----------] 1 test from TokenBufferTest<br>[ RUN      ] TokenBufferTest.SpelledByExpanded<br>SyntaxTests: /home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm.src/tools/clang/lib/Basic/Diagnostic.cpp:499: bool clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool): Assertion `getClient() && "DiagnosticClient not set!"' failed.<div><br></div><div>I'll continue to investigate, but you may have a better idea of what the problem is.</div><div><br></div><div>Thanks</div><div>Russ</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 20 May 2019 at 13:57, Ilya Biryukov via cfe-commits <<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: ibiryukov<br>

Date: Mon May 20 06:00:42 2019<br>

New Revision: 361148<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=361148&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=361148&view=rev</a><br>

Log:<br>

[Syntax] Introduce TokenBuffer, start clangToolingSyntax library<br>

<br>

Summary:<br>

TokenBuffer stores the list of tokens for a file obtained after<br>

preprocessing. This is a base building block for syntax trees,<br>

see [1] for the full proposal on syntax trees.<br>

<br>

This commit also starts a new sub-library of ClangTooling, which<br>

would be the home for the syntax trees and syntax-tree-based refactoring<br>

utilities.<br>

<br>

[1]: <a href="https://lists.llvm.org/pipermail/cfe-dev/2019-February/061414.html" rel="noreferrer" target="_blank">https://lists.llvm.org/pipermail/cfe-dev/2019-February/061414.html</a><br>

<br>

Reviewers: gribozavr, sammccall<br>

<br>

Reviewed By: sammccall<br>

<br>

Subscribers: mgrang, riccibruno, Eugene.Zelenko, mgorny, jdoerfert, cfe-commits<br>

<br>

Tags: #clang<br>

<br>

Differential Revision: <a href="https://reviews.llvm.org/D59887" rel="noreferrer" target="_blank">https://reviews.llvm.org/D59887</a><br>

<br>

Added:<br>

    cfe/trunk/include/clang/Tooling/Syntax/<br>

    cfe/trunk/include/clang/Tooling/Syntax/Tokens.h<br>

    cfe/trunk/lib/Tooling/Syntax/<br>

    cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt<br>

    cfe/trunk/lib/Tooling/Syntax/Tokens.cpp<br>

    cfe/trunk/unittests/Tooling/Syntax/<br>

    cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt<br>

    cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp<br>

Modified:<br>

    cfe/trunk/lib/Tooling/CMakeLists.txt<br>

    cfe/trunk/unittests/Tooling/CMakeLists.txt<br>

<br>

Added: cfe/trunk/include/clang/Tooling/Syntax/Tokens.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Tooling/Syntax/Tokens.h?rev=361148&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Tooling/Syntax/Tokens.h?rev=361148&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/include/clang/Tooling/Syntax/Tokens.h (added)<br>

+++ cfe/trunk/include/clang/Tooling/Syntax/Tokens.h Mon May 20 06:00:42 2019<br>

@@ -0,0 +1,302 @@<br>

+//===- Tokens.h - collect tokens from preprocessing --------------*- C++-*-===//<br>

+//<br>

+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.<br>

+// See <a href="https://llvm.org/LICENSE.txt" rel="noreferrer" target="_blank">https://llvm.org/LICENSE.txt</a> for license information.<br>

+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+// Record tokens that a preprocessor emits and define operations to map between<br>

+// the tokens written in a file and tokens produced by the preprocessor.<br>

+//<br>

+// When running the compiler, there are two token streams we are interested in:<br>

+//   - "spelled" tokens directly correspond to a substring written in some<br>

+//     source file.<br>

+//   - "expanded" tokens represent the result of preprocessing; the parser<br>

+//     consumes this token stream to produce the AST.<br>

+//<br>

+// Expanded tokens correspond directly to locations found in the AST, making it<br>

+// possible to find subranges of the token stream covered by AST nodes. Spelled<br>

+// tokens correspond directly to the source code written by the user.<br>

+//<br>

+// To allow composing these two use-cases, we also define operations that map<br>

+// between expanded and spelled tokens that produced them (macro calls,<br>

+// directives, etc).<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+#ifndef LLVM_CLANG_TOOLING_SYNTAX_TOKENS_H<br>

+#define LLVM_CLANG_TOOLING_SYNTAX_TOKENS_H<br>

+<br>

+#include "clang/Basic/FileManager.h"<br>

+#include "clang/Basic/LangOptions.h"<br>

+#include "clang/Basic/SourceLocation.h"<br>

+#include "clang/Basic/SourceManager.h"<br>

+#include "clang/Basic/TokenKinds.h"<br>

+#include "clang/Lex/Token.h"<br>

+#include "llvm/ADT/ArrayRef.h"<br>

+#include "llvm/ADT/Optional.h"<br>

+#include "llvm/ADT/StringRef.h"<br>

+#include "llvm/Support/Compiler.h"<br>

+#include "llvm/Support/raw_ostream.h"<br>

+#include <cstdint><br>

+#include <tuple><br>

+<br>

+namespace clang {<br>

+class Preprocessor;<br>

+<br>

+namespace syntax {<br>

+<br>

+/// A half-open character range inside a particular file, the start offset is<br>

+/// included and the end offset is excluded from the range.<br>

+struct FileRange {<br>

+  /// EXPECTS: File.isValid() && Begin <= End.<br>

+  FileRange(FileID File, unsigned BeginOffset, unsigned EndOffset);<br>

+  /// EXPECTS: BeginLoc.isValid() && BeginLoc.isFileID().<br>

+  FileRange(const SourceManager &SM, SourceLocation BeginLoc, unsigned Length);<br>

+  /// EXPECTS: BeginLoc.isValid() && BeginLoc.isFileID(), Begin <= End and files<br>

+  ///          are the same.<br>

+  FileRange(const SourceManager &SM, SourceLocation BeginLoc,<br>

+            SourceLocation EndLoc);<br>

+<br>

+  FileID file() const { return File; }<br>

+  /// Start offset (inclusive) in the corresponding file.<br>

+  unsigned beginOffset() const { return Begin; }<br>

+  /// End offset (exclusive) in the corresponding file.<br>

+  unsigned endOffset() const { return End; }<br>

+<br>

+  unsigned length() const { return End - Begin; }<br>

+<br>

+  /// Gets the substring that this FileRange refers to.<br>

+  llvm::StringRef text(const SourceManager &SM) const;<br>

+<br>

+  friend bool operator==(const FileRange &L, const FileRange &R) {<br>

+    return std::tie(L.File, L.Begin, L.End) == std::tie(R.File, R.Begin, R.End);<br>

+  }<br>

+  friend bool operator!=(const FileRange &L, const FileRange &R) {<br>

+    return !(L == R);<br>

+  }<br>

+<br>

+private:<br>

+  FileID File;<br>

+  unsigned Begin;<br>

+  unsigned End;<br>

+};<br>

+<br>

+/// For debugging purposes.<br>

+llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const FileRange &R);<br>

+<br>

+/// A token coming directly from a file or from a macro invocation. Has just<br>

+/// enough information to locate the token in the source code.<br>

+/// Can represent both expanded and spelled tokens.<br>

+class Token {<br>

+public:<br>

+  Token(SourceLocation Location, unsigned Length, tok::TokenKind Kind)<br>

+      : Location(Location), Length(Length), Kind(Kind) {}<br>

+  /// EXPECTS: clang::Token is not an annotation token.<br>

+  explicit Token(const clang::Token &T);<br>

+<br>

+  tok::TokenKind kind() const { return Kind; }<br>

+  /// Location of the first character of a token.<br>

+  SourceLocation location() const { return Location; }<br>

+  /// Location right after the last character of a token.<br>

+  SourceLocation endLocation() const {<br>

+    return Location.getLocWithOffset(Length);<br>

+  }<br>

+  unsigned length() const { return Length; }<br>

+<br>

+  /// Get the substring covered by the token. Note that this will include all<br>

+  /// digraphs, newline continuations, etc. E.g. tokens for 'int' and<br>

+  ///    in\<br>

+  ///    t<br>

+  /// both have the same kind tok::kw_int, but results of text() are different.<br>

+  llvm::StringRef text(const SourceManager &SM) const;<br>

+<br>

+  /// Gets a range of this token.<br>

+  /// EXPECTS: token comes from a file, not from a macro expansion.<br>

+  FileRange range(const SourceManager &SM) const;<br>

+<br>

+  /// Given two tokens inside the same file, returns a file range that starts at<br>

+  /// \p First and ends at \p Last.<br>

+  /// EXPECTS: First and Last are file tokens from the same file, Last starts<br>

+  ///          after First.<br>

+  static FileRange range(const SourceManager &SM, const syntax::Token &First,<br>

+                         const syntax::Token &Last);<br>

+<br>

+  std::string dumpForTests(const SourceManager &SM) const;<br>

+  /// For debugging purposes.<br>

+  std::string str() const;<br>

+<br>

+private:<br>

+  SourceLocation Location;<br>

+  unsigned Length;<br>

+  tok::TokenKind Kind;<br>

+};<br>

+/// For debugging purposes. Equivalent to a call to Token::str().<br>

+llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const Token &T);<br>

+<br>

+/// A list of tokens obtained by preprocessing a text buffer and operations to<br>

+/// map between the expanded and spelled tokens, i.e. TokenBuffer has<br>

+/// information about two token streams:<br>

+///    1. Expanded tokens: tokens produced by the preprocessor after all macro<br>

+///       replacements,<br>

+///    2. Spelled tokens: corresponding directly to the source code of a file<br>

+///       before any macro replacements occurred.<br>

+/// Here's an example to illustrate a difference between those two:<br>

+///     #define FOO 10<br>

+///     int a = FOO;<br>

+///<br>

+/// Spelled tokens are {'#','define','FOO','10','int','a','=','FOO',';'}.<br>

+/// Expanded tokens are {'int','a','=','10',';','eof'}.<br>

+///<br>

+/// Note that the expanded token stream has a tok::eof token at the end; the<br>

+/// spelled tokens never store an 'eof' token.<br>

+///<br>

+/// The full list of expanded tokens can be obtained with expandedTokens(). Spelled<br>

+/// tokens for each of the files can be obtained via spelledTokens(FileID).<br>

+///<br>

+/// To map between the expanded and spelled tokens use spelledForExpanded().<br>

+///<br>

+/// To build a token buffer use the TokenCollector class. You can also compute<br>

+/// the spelled tokens of a file using the tokenize() helper.<br>

+///<br>

+/// FIXME: allow mapping from spelled to expanded tokens when a use-case shows up.<br>

+class TokenBuffer {<br>

+public:<br>

+  TokenBuffer(const SourceManager &SourceMgr) : SourceMgr(&SourceMgr) {}<br>

+  /// All tokens produced by the preprocessor after all macro replacements,<br>

+  /// directives, etc. Source locations found in the clang AST will always<br>

+  /// point to one of these tokens.<br>

+  /// FIXME: figure out how to handle token splitting, e.g. '>>' can be split<br>

+  ///        into two '>' tokens by the parser. However, TokenBuffer currently<br>

+  ///        keeps it as a single '>>' token.<br>

+  llvm::ArrayRef<syntax::Token> expandedTokens() const {<br>

+    return ExpandedTokens;<br>

+  }<br>

+<br>

+  /// Find the subrange of spelled tokens that produced the corresponding \p<br>

+  /// Expanded tokens.<br>

+  ///<br>

+  /// EXPECTS: \p Expanded is a subrange of expandedTokens().<br>

+  ///<br>

+  /// Will fail if the expanded tokens do not correspond to a<br>

+  /// sequence of spelled tokens. E.g. for the following example:<br>

+  ///<br>

+  ///   #define FIRST f1 f2 f3<br>

+  ///   #define SECOND s1 s2 s3<br>

+  ///<br>

+  ///   a FIRST b SECOND c // expanded tokens are: a f1 f2 f3 b s1 s2 s3 c<br>

+  ///<br>

+  /// the results would be:<br>

+  ///   expanded   => spelled<br>

+  ///   ------------------------<br>

+  ///            a => a<br>

+  ///     s1 s2 s3 => SECOND<br>

+  ///   a f1 f2 f3 => a FIRST<br>

+  ///         a f1 => can't map<br>

+  ///        s1 s2 => can't map<br>

+  ///<br>

+  /// If \p Expanded is empty, the returned value is llvm::None.<br>

+  /// Complexity is logarithmic.<br>

+  llvm::Optional<llvm::ArrayRef<syntax::Token>><br>

+  spelledForExpanded(llvm::ArrayRef<syntax::Token> Expanded) const;<br>

+<br>

+  /// Lexed tokens of a file before preprocessing. E.g. for the following input<br>

+  ///     #define DECL(name) int name = 10<br>

+  ///     DECL(a);<br>

+  /// spelledTokens() returns {"#", "define", "DECL", "(", "name", ")", "eof"}.<br>

+  /// FIXME: we do not yet store tokens of directives, like #include, #define,<br>

+  ///        #pragma, etc.<br>

+  llvm::ArrayRef<syntax::Token> spelledTokens(FileID FID) const;<br>

+<br>

+  std::string dumpForTests() const;<br>

+<br>

+private:<br>

+  /// Describes a mapping between a continuous subrange of spelled tokens and<br>

+  /// expanded tokens. Represents macro expansions, preprocessor directives,<br>

+  /// conditionally disabled pp regions, etc.<br>

+  ///   #define FOO 1+2<br>

+  ///   #define BAR(a) a + 1<br>

+  ///   FOO    // invocation #1, tokens = {'1','+','2'}, macroTokens = {'FOO'}.<br>

+  ///   BAR(1) // invocation #2, tokens = {'a', '+', '1'},<br>

+  ///                            macroTokens = {'BAR', '(', '1', ')'}.<br>

+  struct Mapping {<br>

+    // Positions in the corresponding spelled token stream. The corresponding<br>

+    // range is never empty.<br>

+    unsigned BeginSpelled = 0;<br>

+    unsigned EndSpelled = 0;<br>

+    // Positions in the expanded token stream. The corresponding range can be<br>

+    // empty.<br>

+    unsigned BeginExpanded = 0;<br>

+    unsigned EndExpanded = 0;<br>

+<br>

+    /// For debugging purposes.<br>

+    std::string str() const;<br>

+  };<br>

+  /// Spelled tokens of the file with information about the subranges.<br>

+  struct MarkedFile {<br>

+    /// Lexed, but not preprocessed, tokens of the file. These map directly to<br>

+    /// text in the corresponding files and include tokens of all preprocessor<br>

+    /// directives.<br>

+    /// FIXME: spelled tokens don't change across FileID that map to the same<br>

+    ///        FileEntry. We could consider deduplicating them to save memory.<br>

+    std::vector<syntax::Token> SpelledTokens;<br>

+    /// A sorted list to convert between the spelled and expanded token streams.<br>

+    std::vector<Mapping> Mappings;<br>

+    /// The first expanded token produced for this FileID.<br>

+    unsigned BeginExpanded = 0;<br>

+    unsigned EndExpanded = 0;<br>

+  };<br>

+<br>

+  friend class TokenCollector;<br>

+<br>

+  /// Maps a single expanded token to its spelled counterpart or a mapping that<br>

+  /// produced it.<br>

+  std::pair<const syntax::Token *, const Mapping *><br>

+  spelledForExpandedToken(const syntax::Token *Expanded) const;<br>

+<br>

+  /// Token stream produced after preprocessing; conceptually, this captures the<br>

+  /// same stream as 'clang -E' (excluding the preprocessor directives like<br>

+  /// #file, etc.).<br>

+  std::vector<syntax::Token> ExpandedTokens;<br>

+  llvm::DenseMap<FileID, MarkedFile> Files;<br>

+  // The value is never null, pointer instead of reference to avoid disabling<br>

+  // implicit assignment operator.<br>

+  const SourceManager *SourceMgr;<br>

+};<br>

+<br>

+/// Lex the text buffer, corresponding to \p FID, in raw mode and record the<br>

+/// resulting spelled tokens. Does minimal post-processing on raw identifiers,<br>

+/// setting the appropriate token kind (instead of the raw_identifier reported<br>

+/// by lexer in raw mode). This is a very low-level function, most users should<br>

+/// prefer to use TokenCollector. Lexing in raw mode produces wildly different<br>

+/// results from what one might expect when running a C++ frontend, e.g.<br>

+/// preprocessor does not run at all.<br>

+/// The result will *not* have a 'eof' token at the end.<br>

+std::vector<syntax::Token> tokenize(FileID FID, const SourceManager &SM,<br>

+                                    const LangOptions &LO);<br>

+<br>

+/// Collects tokens for the main file while running the frontend action. An<br>

+/// instance of this object should be created on<br>

+/// FrontendAction::BeginSourceFile() and the results should be consumed after<br>

+/// FrontendAction::Execute() finishes.<br>

+class TokenCollector {<br>

+public:<br>

+  /// Adds the hooks to collect the tokens. Should be called before the<br>

+  /// preprocessing starts, i.e. as a part of BeginSourceFile() or<br>

+  /// CreateASTConsumer().<br>

+  TokenCollector(Preprocessor &P);<br>

+<br>

+  /// Finalizes token collection. Should be called after preprocessing is<br>

+  /// finished, i.e. after running Execute().<br>

+  LLVM_NODISCARD TokenBuffer consume() &&;<br>

+<br>

+private:<br>

+  class Builder;<br>

+  std::vector<syntax::Token> Expanded;<br>

+  Preprocessor &PP;<br>

+};<br>

+<br>

+} // namespace syntax<br>

+} // namespace clang<br>

+<br>

+#endif<br>

<br>

Modified: cfe/trunk/lib/Tooling/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/CMakeLists.txt?rev=361148&r1=361147&r2=361148&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/CMakeLists.txt?rev=361148&r1=361147&r2=361148&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/lib/Tooling/CMakeLists.txt (original)<br>

+++ cfe/trunk/lib/Tooling/CMakeLists.txt Mon May 20 06:00:42 2019<br>

@@ -7,6 +7,7 @@ add_subdirectory(Core)<br>

 add_subdirectory(Inclusions)<br>

 add_subdirectory(Refactoring)<br>

 add_subdirectory(ASTDiff)<br>

+add_subdirectory(Syntax)<br>

<br>

 add_clang_library(clangTooling<br>

   AllTUsExecution.cpp<br>

<br>

Added: cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt?rev=361148&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt?rev=361148&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt (added)<br>

+++ cfe/trunk/lib/Tooling/Syntax/CMakeLists.txt Mon May 20 06:00:42 2019<br>

@@ -0,0 +1,10 @@<br>

+set(LLVM_LINK_COMPONENTS Support)<br>

+<br>

+add_clang_library(clangToolingSyntax<br>

+  Tokens.cpp<br>

+<br>

+  LINK_LIBS<br>

+  clangBasic<br>

+  clangFrontend<br>

+  clangLex<br>

+  )<br>

<br>

Added: cfe/trunk/lib/Tooling/Syntax/Tokens.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/Syntax/Tokens.cpp?rev=361148&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Tooling/Syntax/Tokens.cpp?rev=361148&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/lib/Tooling/Syntax/Tokens.cpp (added)<br>

+++ cfe/trunk/lib/Tooling/Syntax/Tokens.cpp Mon May 20 06:00:42 2019<br>

@@ -0,0 +1,509 @@<br>

+//===- Tokens.cpp - collect tokens from preprocessing ---------------------===//<br>

+//<br>

+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.<br>

+// See <a href="https://llvm.org/LICENSE.txt" rel="noreferrer" target="_blank">https://llvm.org/LICENSE.txt</a> for license information.<br>

+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+#include "clang/Tooling/Syntax/Tokens.h"<br>

+<br>

+#include "clang/Basic/Diagnostic.h"<br>

+#include "clang/Basic/IdentifierTable.h"<br>

+#include "clang/Basic/LLVM.h"<br>

+#include "clang/Basic/LangOptions.h"<br>

+#include "clang/Basic/SourceLocation.h"<br>

+#include "clang/Basic/SourceManager.h"<br>

+#include "clang/Basic/TokenKinds.h"<br>

+#include "clang/Lex/Preprocessor.h"<br>

+#include "clang/Lex/Token.h"<br>

+#include "llvm/ADT/ArrayRef.h"<br>

+#include "llvm/ADT/None.h"<br>

+#include "llvm/ADT/Optional.h"<br>

+#include "llvm/ADT/STLExtras.h"<br>

+#include "llvm/Support/Debug.h"<br>

+#include "llvm/Support/ErrorHandling.h"<br>

+#include "llvm/Support/FormatVariadic.h"<br>

+#include "llvm/Support/raw_ostream.h"<br>

+#include <algorithm><br>

+#include <cassert><br>

+#include <iterator><br>

+#include <string><br>

+#include <utility><br>

+#include <vector><br>

+<br>

+using namespace clang;<br>

+using namespace clang::syntax;<br>

+<br>

+syntax::Token::Token(const clang::Token &T)<br>

+    : Token(T.getLocation(), T.getLength(), T.getKind()) {<br>

+  assert(!T.isAnnotation());<br>

+}<br>

+<br>

+llvm::StringRef syntax::Token::text(const SourceManager &SM) const {<br>

+  bool Invalid = false;<br>

+  const char *Start = SM.getCharacterData(location(), &Invalid);<br>

+  assert(!Invalid);<br>

+  return llvm::StringRef(Start, length());<br>

+}<br>

+<br>

+FileRange syntax::Token::range(const SourceManager &SM) const {<br>

+  assert(location().isFileID() && "must be a spelled token");<br>

+  FileID File;<br>

+  unsigned StartOffset;<br>

+  std::tie(File, StartOffset) = SM.getDecomposedLoc(location());<br>

+  return FileRange(File, StartOffset, StartOffset + length());<br>

+}<br>

+<br>

+FileRange syntax::Token::range(const SourceManager &SM,<br>

+                               const syntax::Token &First,<br>

+                               const syntax::Token &Last) {<br>

+  auto F = First.range(SM);<br>

+  auto L = Last.range(SM);<br>

+  assert(F.file() == L.file() && "tokens from different files");<br>

+  assert(F.endOffset() <= L.beginOffset() && "wrong order of tokens");<br>

+  return FileRange(F.file(), F.beginOffset(), L.endOffset());<br>

+}<br>

+<br>

+llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS, const Token &T) {<br>

+  return OS << T.str();<br>

+}<br>

+<br>

+FileRange::FileRange(FileID File, unsigned BeginOffset, unsigned EndOffset)<br>

+    : File(File), Begin(BeginOffset), End(EndOffset) {<br>

+  assert(File.isValid());<br>

+  assert(BeginOffset <= EndOffset);<br>

+}<br>

+<br>

+FileRange::FileRange(const SourceManager &SM, SourceLocation BeginLoc,<br>

+                     unsigned Length) {<br>

+  assert(BeginLoc.isValid());<br>

+  assert(BeginLoc.isFileID());<br>

+<br>

+  std::tie(File, Begin) = SM.getDecomposedLoc(BeginLoc);<br>

+  End = Begin + Length;<br>

+}<br>

+FileRange::FileRange(const SourceManager &SM, SourceLocation BeginLoc,<br>

+                     SourceLocation EndLoc) {<br>

+  assert(BeginLoc.isValid());<br>

+  assert(BeginLoc.isFileID());<br>

+  assert(EndLoc.isValid());<br>

+  assert(EndLoc.isFileID());<br>

+  assert(SM.getFileID(BeginLoc) == SM.getFileID(EndLoc));<br>

+  assert(SM.getFileOffset(BeginLoc) <= SM.getFileOffset(EndLoc));<br>

+<br>

+  std::tie(File, Begin) = SM.getDecomposedLoc(BeginLoc);<br>

+  End = SM.getFileOffset(EndLoc);<br>

+}<br>

+<br>

+llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS,<br>

+                                      const FileRange &R) {<br>

+  return OS << llvm::formatv("FileRange(file = {0}, offsets = {1}-{2})",<br>

+                             R.file().getHashValue(), R.beginOffset(),<br>

+                             R.endOffset());<br>

+}<br>

+<br>

+llvm::StringRef FileRange::text(const SourceManager &SM) const {<br>

+  bool Invalid = false;<br>

+  StringRef Text = SM.getBufferData(File, &Invalid);<br>

+  if (Invalid)<br>

+    return "";<br>

+  assert(Begin <= Text.size());<br>

+  assert(End <= Text.size());<br>

+  return Text.substr(Begin, length());<br>

+}<br>

+<br>

+std::pair<const syntax::Token *, const TokenBuffer::Mapping *><br>

+TokenBuffer::spelledForExpandedToken(const syntax::Token *Expanded) const {<br>

+  assert(Expanded);<br>

+  assert(ExpandedTokens.data() <= Expanded &&<br>

+         Expanded < ExpandedTokens.data() + ExpandedTokens.size());<br>

+<br>

+  auto FileIt = Files.find(<br>

+      SourceMgr->getFileID(SourceMgr->getExpansionLoc(Expanded->location())));<br>

+  assert(FileIt != Files.end() && "no file for an expanded token");<br>

+<br>

+  const MarkedFile &File = FileIt->second;<br>

+<br>

+  unsigned ExpandedIndex = Expanded - ExpandedTokens.data();<br>

+  // Find the first mapping that produced tokens after \p Expanded.<br>

+  auto It = llvm::bsearch(File.Mappings, [&](const Mapping &M) {<br>

+    return ExpandedIndex < M.BeginExpanded;<br>

+  });<br>

+  // Our token could only be produced by the previous mapping.<br>

+  if (It == File.Mappings.begin()) {<br>

+    // No previous mapping, no need to modify offsets.<br>

+    return {&File.SpelledTokens[ExpandedIndex - File.BeginExpanded], nullptr};<br>

+  }<br>

+  --It; // 'It' now points to last mapping that started before our token.<br>

+<br>

+  // Check if the token is part of the mapping.<br>

+  if (ExpandedIndex < It->EndExpanded)<br>

+    return {&File.SpelledTokens[It->BeginSpelled], /*Mapping*/ &*It};<br>

+<br>

+  // Not part of the mapping, use the index from previous mapping to compute the<br>

+  // corresponding spelled token.<br>

+  return {<br>

+      &File.SpelledTokens[It->EndSpelled + (ExpandedIndex - It->EndExpanded)],<br>

+      /*Mapping*/ nullptr};<br>

+}<br>

+<br>

+llvm::ArrayRef<syntax::Token> TokenBuffer::spelledTokens(FileID FID) const {<br>

+  auto It = Files.find(FID);<br>

+  assert(It != Files.end());<br>

+  return It->second.SpelledTokens;<br>

+}<br>

+<br>

+std::string TokenBuffer::Mapping::str() const {<br>

+  return llvm::formatv("spelled tokens: [{0},{1}), expanded tokens: [{2},{3})",<br>

+                       BeginSpelled, EndSpelled, BeginExpanded, EndExpanded);<br>

+}<br>

+<br>

+llvm::Optional<llvm::ArrayRef<syntax::Token>><br>

+TokenBuffer::spelledForExpanded(llvm::ArrayRef<syntax::Token> Expanded) const {<br>

+  // Mapping an empty range is ambiguous in case of empty mappings at either end<br>

+  // of the range, bail out in that case.<br>

+  if (Expanded.empty())<br>

+    return llvm::None;<br>

+<br>

+  // FIXME: also allow changes uniquely mapping to macro arguments.<br>

+<br>

+  const syntax::Token *BeginSpelled;<br>

+  const Mapping *BeginMapping;<br>

+  std::tie(BeginSpelled, BeginMapping) =<br>

+      spelledForExpandedToken(&Expanded.front());<br>

+<br>

+  const syntax::Token *LastSpelled;<br>

+  const Mapping *LastMapping;<br>

+  std::tie(LastSpelled, LastMapping) =<br>

+      spelledForExpandedToken(&Expanded.back());<br>

+<br>

+  FileID FID = SourceMgr->getFileID(BeginSpelled->location());<br>

+  // FIXME: Handle multi-file changes by trying to map onto a common root.<br>

+  if (FID != SourceMgr->getFileID(LastSpelled->location()))<br>

+    return llvm::None;<br>

+<br>

+  const MarkedFile &File = Files.find(FID)->second;<br>

+<br>

+  // Do not allow changes that cross macro expansion boundaries.<br>

+  unsigned BeginExpanded = Expanded.begin() - ExpandedTokens.data();<br>

+  unsigned EndExpanded = Expanded.end() - ExpandedTokens.data();<br>

+  if (BeginMapping && BeginMapping->BeginExpanded < BeginExpanded)<br>

+    return llvm::None;<br>

+  if (LastMapping && EndExpanded < LastMapping->EndExpanded)<br>

+    return llvm::None;<br>

+  // All is good, return the result.<br>

+  return llvm::makeArrayRef(<br>

+      BeginMapping ? File.SpelledTokens.data() + BeginMapping->BeginSpelled<br>

+                   : BeginSpelled,<br>

+      LastMapping ? File.SpelledTokens.data() + LastMapping->EndSpelled<br>

+                  : LastSpelled + 1);<br>

+}<br>

+<br>

+std::vector<syntax::Token> syntax::tokenize(FileID FID, const SourceManager &SM,<br>

+                                            const LangOptions &LO) {<br>

+  std::vector<syntax::Token> Tokens;<br>

+  IdentifierTable Identifiers(LO);<br>

+  auto AddToken = [&](clang::Token T) {<br>

+    // Fill the proper token kind for keywords, etc.<br>

+    if (T.getKind() == tok::raw_identifier && !T.needsCleaning() &&<br>

+        !T.hasUCN()) { // FIXME: support needsCleaning and hasUCN cases.<br>

+      clang::IdentifierInfo &II = Identifiers.get(T.getRawIdentifier());<br>

+      T.setIdentifierInfo(&II);<br>

+      T.setKind(II.getTokenID());<br>

+    }<br>

+    Tokens.push_back(syntax::Token(T));<br>

+  };<br>

+<br>

+  Lexer L(FID, SM.getBuffer(FID), SM, LO);<br>

+<br>

+  clang::Token T;<br>

+  while (!L.LexFromRawLexer(T))<br>

+    AddToken(T);<br>

+  // 'eof' is only the last token if the input is null-terminated. Never store<br>

+  // it, for consistency.<br>

+  if (T.getKind() != tok::eof)<br>

+    AddToken(T);<br>

+  return Tokens;<br>

+}<br>

+<br>

+/// Fills in the TokenBuffer by tracing the run of a preprocessor. The<br>

+/// implementation tracks the tokens, macro expansions and directives coming<br>

+/// from the preprocessor and:<br>

+/// - for each token, figures out if it is a part of an expanded token stream,<br>

+///   spelled token stream or both. Stores the tokens appropriately.<br>

+/// - records mappings from the spelled to expanded token ranges, e.g. for macro<br>

+///   expansions.<br>

+/// FIXME: also properly record:<br>

+///          - #include directives,<br>

+///          - #pragma, #line and other PP directives,<br>

+///          - skipped pp regions,<br>

+///          - ...<br>

+<br>

+TokenCollector::TokenCollector(Preprocessor &PP) : PP(PP) {<br>

+  // Collect the expanded token stream during preprocessing.<br>

+  PP.setTokenWatcher([this](const clang::Token &T) {<br>

+    if (T.isAnnotation())<br>

+      return;<br>

+    DEBUG_WITH_TYPE("collect-tokens", llvm::dbgs()<br>

+                                          << "Token: "<br>

+                                          << syntax::Token(T).dumpForTests(<br>

+                                                 this->PP.getSourceManager())<br>

+                                          << "\n"<br>

+<br>

+    );<br>

+    Expanded.push_back(syntax::Token(T));<br>

+  });<br>

+}<br>

+<br>

+/// Builds mappings and spelled tokens in the TokenBuffer based on the expanded<br>

+/// token stream.<br>

+class TokenCollector::Builder {<br>

+public:<br>

+  Builder(std::vector<syntax::Token> Expanded, const SourceManager &SM,<br>

+          const LangOptions &LangOpts)<br>

+      : Result(SM), SM(SM), LangOpts(LangOpts) {<br>

+    Result.ExpandedTokens = std::move(Expanded);<br>

+  }<br>

+<br>

+  TokenBuffer build() && {<br>

+    buildSpelledTokens();<br>

+<br>

+    // Walk over expanded tokens and spelled tokens in parallel, building the<br>

+    // mappings between those using source locations.<br>

+<br>

+    // The 'eof' token is special: it is not part of the spelled token stream,<br>

+    // so we handle it separately at the end.<br>

+    assert(!Result.ExpandedTokens.empty());<br>

+    assert(Result.ExpandedTokens.back().kind() == tok::eof);<br>

+    for (unsigned I = 0; I < Result.ExpandedTokens.size() - 1; ++I) {<br>

+      // (!) I might be updated by the following call.<br>

+      processExpandedToken(I);<br>

+    }<br>

+<br>

+    // 'eof' not handled in the loop, do it here.<br>

+    assert(SM.getMainFileID() ==<br>

+           SM.getFileID(Result.ExpandedTokens.back().location()));<br>

+    fillGapUntil(Result.Files[SM.getMainFileID()],<br>

+                 Result.ExpandedTokens.back().location(),<br>

+                 Result.ExpandedTokens.size() - 1);<br>

+    Result.Files[SM.getMainFileID()].EndExpanded = Result.ExpandedTokens.size();<br>

+<br>

+    // Some files might have unconsumed spelled tokens at the end; add an empty<br>

+    // mapping for those, as they have no expanded counterparts.<br>

+    fillGapsAtEndOfFiles();<br>

+<br>

+    return std::move(Result);<br>

+  }<br>

+<br>

+private:<br>

+  /// Processes the next token in the expanded stream, advances past the<br>

+  /// corresponding spelled tokens and records a mapping if needed.<br>

+  /// (!) \p I will be updated if this had to skip tokens, e.g. for macros.<br>

+  void processExpandedToken(unsigned &I) {<br>

+    auto L = Result.ExpandedTokens[I].location();<br>

+    if (L.isMacroID()) {<br>

+      processMacroExpansion(SM.getExpansionRange(L), I);<br>

+      return;<br>

+    }<br>

+    if (L.isFileID()) {<br>

+      auto FID = SM.getFileID(L);<br>

+      TokenBuffer::MarkedFile &File = Result.Files[FID];<br>

+<br>

+      fillGapUntil(File, L, I);<br>

+<br>

+      // Skip the token.<br>

+      assert(File.SpelledTokens[NextSpelled[FID]].location() == L &&<br>

+             "no corresponding token in the spelled stream");<br>

+      ++NextSpelled[FID];<br>

+      return;<br>

+    }<br>

+  }<br>

+<br>

+  /// Skips expanded and spelled tokens of a macro expansion that covers \p<br>

+  /// SpelledRange. Add a corresponding mapping.<br>

+  /// (!) \p I will be the index of the last token in an expansion after this<br>

+  /// function returns.<br>

+  void processMacroExpansion(CharSourceRange SpelledRange, unsigned &I) {<br>

+    auto FID = SM.getFileID(SpelledRange.getBegin());<br>

+    assert(FID == SM.getFileID(SpelledRange.getEnd()));<br>

+    TokenBuffer::MarkedFile &File = Result.Files[FID];<br>

+<br>

+    fillGapUntil(File, SpelledRange.getBegin(), I);<br>

+<br>

+    TokenBuffer::Mapping M;<br>

+    // Skip the spelled macro tokens.<br>

+    std::tie(M.BeginSpelled, M.EndSpelled) =<br>

+        consumeSpelledUntil(File, SpelledRange.getEnd().getLocWithOffset(1));<br>

+    // Skip all expanded tokens from the same macro expansion.<br>

+    M.BeginExpanded = I;<br>

+    for (; I + 1 < Result.ExpandedTokens.size(); ++I) {<br>

+      auto NextL = Result.ExpandedTokens[I + 1].location();<br>

+      if (!NextL.isMacroID() ||<br>

+          SM.getExpansionLoc(NextL) != SpelledRange.getBegin())<br>

+        break;<br>

+    }<br>

+    M.EndExpanded = I + 1;<br>

+<br>

+    // Add a resulting mapping.<br>

+    File.Mappings.push_back(M);<br>

+  }<br>

+<br>

+  /// Initializes TokenBuffer::Files and fills spelled tokens and expanded<br>

+  /// ranges for each of the files.<br>

+  void buildSpelledTokens() {<br>

+    for (unsigned I = 0; I < Result.ExpandedTokens.size(); ++I) {<br>

+      auto FID =<br>

+          SM.getFileID(SM.getExpansionLoc(Result.ExpandedTokens[I].location()));<br>

+      auto It = Result.Files.try_emplace(FID);<br>

+      TokenBuffer::MarkedFile &File = It.first->second;<br>

+<br>

+      File.EndExpanded = I + 1;<br>

+      if (!It.second)<br>

+        continue; // we have seen this file before.<br>

+<br>

+      // This is the first time we see this file.<br>

+      File.BeginExpanded = I;<br>

+      File.SpelledTokens = tokenize(FID, SM, LangOpts);<br>

+    }<br>

+  }<br>

+<br>

+  /// Consumes spelled tokens until location L is reached (token starting at L<br>

+  /// is not included). Returns the indices of the consumed range.<br>

+  std::pair</*Begin*/ unsigned, /*End*/ unsigned><br>

+  consumeSpelledUntil(TokenBuffer::MarkedFile &File, SourceLocation L) {<br>

+    assert(L.isFileID());<br>

+    FileID FID;<br>

+    unsigned Offset;<br>

+    std::tie(FID, Offset) = SM.getDecomposedLoc(L);<br>

+<br>

+    // (!) we update the index in-place.<br>

+    unsigned &SpelledI = NextSpelled[FID];<br>

+    unsigned Before = SpelledI;<br>

+    for (; SpelledI < File.SpelledTokens.size() &&<br>

+           SM.getFileOffset(File.SpelledTokens[SpelledI].location()) < Offset;<br>

+         ++SpelledI) {<br>

+    }<br>

+    return std::make_pair(Before, /*After*/ SpelledI);<br>

+  }<br>

+<br>

+  /// Consumes spelled tokens until location \p L is reached and adds a mapping<br>

+  /// covering the consumed tokens. The mapping will point to an empty expanded<br>

+  /// range at position \p ExpandedIndex.<br>

+  void fillGapUntil(TokenBuffer::MarkedFile &File, SourceLocation L,<br>

+                    unsigned ExpandedIndex) {<br>

+    unsigned BeginSpelledGap, EndSpelledGap;<br>

+    std::tie(BeginSpelledGap, EndSpelledGap) = consumeSpelledUntil(File, L);<br>

+    if (BeginSpelledGap == EndSpelledGap)<br>

+      return; // No gap.<br>

+    TokenBuffer::Mapping M;<br>

+    M.BeginSpelled = BeginSpelledGap;<br>

+    M.EndSpelled = EndSpelledGap;<br>

+    M.BeginExpanded = M.EndExpanded = ExpandedIndex;<br>

+    File.Mappings.push_back(M);<br>

+  }<br>

+<br>

+  /// Adds empty mappings for unconsumed spelled tokens at the end of each file.<br>

+  void fillGapsAtEndOfFiles() {<br>

+    for (auto &F : Result.Files) {<br>

+      unsigned Next = NextSpelled[F.first];<br>

+      if (F.second.SpelledTokens.size() == Next)<br>

+        continue; // All spelled tokens are accounted for.<br>

+<br>

+      // Record a mapping for the gap at the end of the spelled tokens.<br>

+      TokenBuffer::Mapping M;<br>

+      M.BeginSpelled = Next;<br>

+      M.EndSpelled = F.second.SpelledTokens.size();<br>

+      M.BeginExpanded = F.second.EndExpanded;<br>

+      M.EndExpanded = F.second.EndExpanded;<br>

+<br>

+      F.second.Mappings.push_back(M);<br>

+    }<br>

+  }<br>

+<br>

+  TokenBuffer Result;<br>

+  /// For each file, a position of the next spelled token we will consume.<br>

+  llvm::DenseMap<FileID, unsigned> NextSpelled;<br>

+  const SourceManager &SM;<br>

+  const LangOptions &LangOpts;<br>

+};<br>

+<br>

+TokenBuffer TokenCollector::consume() && {<br>

+  PP.setTokenWatcher(nullptr);<br>

+  return Builder(std::move(Expanded), PP.getSourceManager(), PP.getLangOpts())<br>

+      .build();<br>

+}<br>

+<br>

+std::string syntax::Token::str() const {<br>

+  return llvm::formatv("Token({0}, length = {1})", tok::getTokenName(kind()),<br>

+                       length());<br>

+}<br>

+<br>

+std::string syntax::Token::dumpForTests(const SourceManager &SM) const {<br>

+  return llvm::formatv("{0}   {1}", tok::getTokenName(kind()), text(SM));<br>

+}<br>

+<br>

+std::string TokenBuffer::dumpForTests() const {<br>

+  auto PrintToken = [this](const syntax::Token &T) -> std::string {<br>

+    if (T.kind() == tok::eof)<br>

+      return "<eof>";<br>

+    return T.text(*SourceMgr);<br>

+  };<br>

+<br>

+  auto DumpTokens = [this, &PrintToken](llvm::raw_ostream &OS,<br>

+                                        llvm::ArrayRef<syntax::Token> Tokens) {<br>

+    if (Tokens.size() == 1) {<br>

+      assert(Tokens[0].kind() == tok::eof);<br>

+      OS << "<empty>";<br>

+      return;<br>

+    }<br>

+    OS << Tokens[0].text(*SourceMgr);<br>

+    for (unsigned I = 1; I < Tokens.size(); ++I) {<br>

+      if (Tokens[I].kind() == tok::eof)<br>

+        continue;<br>

+      OS << " " << PrintToken(Tokens[I]);<br>

+    }<br>

+  };<br>

+<br>

+  std::string Dump;<br>

+  llvm::raw_string_ostream OS(Dump);<br>

+<br>

+  OS << "expanded tokens:\n"<br>

+     << "  ";<br>

+  DumpTokens(OS, ExpandedTokens);<br>

+  OS << "\n";<br>

+<br>

+  std::vector<FileID> Keys;<br>

+  for (auto F : Files)<br>

+    Keys.push_back(F.first);<br>

+  llvm::sort(Keys);<br>

+<br>

+  for (FileID ID : Keys) {<br>

+    const MarkedFile &File = Files.find(ID)->second;<br>

+    auto *Entry = SourceMgr->getFileEntryForID(ID);<br>

+    if (!Entry)<br>

+      continue; // Skip builtin files.<br>

+    OS << llvm::formatv("file '{0}'\n", Entry->getName())<br>

+       << "  spelled tokens:\n"<br>

+       << "    ";<br>

+    DumpTokens(OS, File.SpelledTokens);<br>

+    OS << "\n";<br>

+<br>

+    if (File.Mappings.empty()) {<br>

+      OS << "  no mappings.\n";<br>

+      continue;<br>

+    }<br>

+    OS << "  mappings:\n";<br>

+    for (auto &M : File.Mappings) {<br>

+      OS << llvm::formatv(<br>

+          "    ['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",<br>

+          PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,<br>

+          M.EndSpelled == File.SpelledTokens.size()<br>

+              ? "<eof>"<br>

+              : PrintToken(File.SpelledTokens[M.EndSpelled]),<br>

+          M.EndSpelled, PrintToken(ExpandedTokens[M.BeginExpanded]),<br>

+          M.BeginExpanded, PrintToken(ExpandedTokens[M.EndExpanded]),<br>

+          M.EndExpanded);<br>

+    }<br>

+  }<br>

+  return OS.str();<br>

+}<br>

<br>
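The gap-filling logic in the builder above (consumeSpelledUntil and fillGapUntil) may be easier to follow in isolation. Below is a minimal sketch that replaces SourceLocation with plain byte offsets and drops the SourceManager entirely. The `Mapping` and `FileState` structs and the `mapDirectiveThenDecl` helper are hypothetical stand-ins invented for illustration, not part of the patch. The input modelled is "#define FOO a\nint a;", where the directive tokens have no expanded counterpart and should end up in a single empty mapping.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for TokenBuffer::Mapping: half-open index ranges
// into the spelled and expanded token streams.
struct Mapping {
  unsigned BeginSpelled, EndSpelled;
  unsigned BeginExpanded, EndExpanded;
};

// Hypothetical stand-in for the per-file state the builder keeps.
struct FileState {
  std::vector<unsigned> SpelledOffsets; // start offset of each spelled token
  unsigned NextSpelled = 0;             // index of the next unconsumed token
  std::vector<Mapping> Mappings;
};

// Advances NextSpelled past every spelled token that starts before Offset
// and returns the consumed half-open index range.
std::pair<unsigned, unsigned> consumeSpelledUntil(FileState &F,
                                                  unsigned Offset) {
  unsigned Before = F.NextSpelled;
  while (F.NextSpelled < F.SpelledOffsets.size() &&
         F.SpelledOffsets[F.NextSpelled] < Offset)
    ++F.NextSpelled;
  return {Before, F.NextSpelled};
}

// Records a mapping from the consumed spelled tokens to an empty expanded
// range at ExpandedIndex. Does nothing if no tokens were consumed.
void fillGapUntil(FileState &F, unsigned Offset, unsigned ExpandedIndex) {
  std::pair<unsigned, unsigned> Gap = consumeSpelledUntil(F, Offset);
  if (Gap.first == Gap.second)
    return; // No gap.
  F.Mappings.push_back({Gap.first, Gap.second, ExpandedIndex, ExpandedIndex});
}

// Models "#define FOO a\nint a;".
// Spelled tokens:  #(0) define(1) FOO(8) a(12) int(14) a(18) ;(19)
// Expanded tokens: int(14) a(18) ;(19)
FileState mapDirectiveThenDecl() {
  FileState F;
  F.SpelledOffsets = {0, 1, 8, 12, 14, 18, 19};
  const unsigned ExpandedOffsets[] = {14, 18, 19};
  for (unsigned I = 0; I < 3; ++I) {
    fillGapUntil(F, ExpandedOffsets[I], I);
    ++F.NextSpelled; // Skip the spelled token matching expanded token I.
  }
  return F;
}
```

Under these assumptions the only mapping produced covers spelled tokens [0, 4) and points at the empty expanded range [0, 0), which has the same shape as the "['#'_0, 'int'_N) => ['int'_0, 'int'_0)" lines in the test dumps further down.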

Modified: cfe/trunk/unittests/Tooling/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/CMakeLists.txt?rev=361148&r1=361147&r2=361148&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/CMakeLists.txt?rev=361148&r1=361147&r2=361148&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/unittests/Tooling/CMakeLists.txt (original)<br>

+++ cfe/trunk/unittests/Tooling/CMakeLists.txt Mon May 20 06:00:42 2019<br>

@@ -70,3 +70,6 @@ target_link_libraries(ToolingTests<br>

   clangToolingInclusions<br>

   clangToolingRefactor<br>

   )<br>

+<br>

+<br>

+add_subdirectory(Syntax)<br>

<br>

Added: cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt?rev=361148&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt?rev=361148&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt (added)<br>

+++ cfe/trunk/unittests/Tooling/Syntax/CMakeLists.txt Mon May 20 06:00:42 2019<br>

@@ -0,0 +1,20 @@<br>

+set(LLVM_LINK_COMPONENTS<br>

+  ${LLVM_TARGETS_TO_BUILD}<br>

+  Support<br>

+  )<br>

+<br>

+add_clang_unittest(TokensTest<br>

+  TokensTest.cpp<br>

+)<br>

+<br>

+target_link_libraries(TokensTest<br>

+  PRIVATE<br>

+  clangAST<br>

+  clangBasic<br>

+  clangFrontend<br>

+  clangLex<br>

+  clangSerialization<br>

+  clangTooling<br>

+  clangToolingSyntax<br>

+  LLVMTestingSupport<br>

+  )<br>

<br>

Added: cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp?rev=361148&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp?rev=361148&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp (added)<br>

+++ cfe/trunk/unittests/Tooling/Syntax/TokensTest.cpp Mon May 20 06:00:42 2019<br>

@@ -0,0 +1,654 @@<br>

+//===- TokensTest.cpp -----------------------------------------------------===//<br>

+//<br>

+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.<br>

+// See <a href="https://llvm.org/LICENSE.txt" rel="noreferrer" target="_blank">https://llvm.org/LICENSE.txt</a> for license information.<br>

+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+#include "clang/Tooling/Syntax/Tokens.h"<br>

+#include "clang/AST/ASTConsumer.h"<br>

+#include "clang/AST/Expr.h"<br>

+#include "clang/Basic/Diagnostic.h"<br>

+#include "clang/Basic/DiagnosticIDs.h"<br>

+#include "clang/Basic/DiagnosticOptions.h"<br>

+#include "clang/Basic/FileManager.h"<br>

+#include "clang/Basic/FileSystemOptions.h"<br>

+#include "clang/Basic/LLVM.h"<br>

+#include "clang/Basic/LangOptions.h"<br>

+#include "clang/Basic/SourceLocation.h"<br>

+#include "clang/Basic/SourceManager.h"<br>

+#include "clang/Basic/TokenKinds.def"<br>

+#include "clang/Basic/TokenKinds.h"<br>

+#include "clang/Frontend/CompilerInstance.h"<br>

+#include "clang/Frontend/FrontendAction.h"<br>

+#include "clang/Frontend/Utils.h"<br>

+#include "clang/Lex/Lexer.h"<br>

+#include "clang/Lex/PreprocessorOptions.h"<br>

+#include "clang/Lex/Token.h"<br>

+#include "clang/Tooling/Tooling.h"<br>

+#include "llvm/ADT/ArrayRef.h"<br>

+#include "llvm/ADT/IntrusiveRefCntPtr.h"<br>

+#include "llvm/ADT/None.h"<br>

+#include "llvm/ADT/Optional.h"<br>

+#include "llvm/ADT/STLExtras.h"<br>

+#include "llvm/ADT/StringRef.h"<br>

+#include "llvm/Support/FormatVariadic.h"<br>

+#include "llvm/Support/MemoryBuffer.h"<br>

+#include "llvm/Support/VirtualFileSystem.h"<br>

+#include "llvm/Support/raw_os_ostream.h"<br>

+#include "llvm/Support/raw_ostream.h"<br>

+#include "llvm/Testing/Support/Annotations.h"<br>

+#include "llvm/Testing/Support/SupportHelpers.h"<br>

+#include <cassert><br>

+#include <cstdlib><br>

+#include <gmock/gmock.h><br>

+#include <gtest/gtest.h><br>

+#include <memory><br>

+#include <ostream><br>

+#include <string><br>

+<br>

+using namespace clang;<br>

+using namespace clang::syntax;<br>

+<br>

+using llvm::ValueIs;<br>

+using ::testing::AllOf;<br>

+using ::testing::Contains;<br>

+using ::testing::ElementsAre;<br>

+using ::testing::Matcher;<br>

+using ::testing::Not;<br>

+using ::testing::StartsWith;<br>

+<br>

+namespace {<br>

+// Checks the passed ArrayRef<T> has the same begin() and end() iterators as the<br>

+// argument.<br>

+MATCHER_P(SameRange, A, "") {<br>

+  return A.begin() == arg.begin() && A.end() == arg.end();<br>

+}<br>

+// Matchers for syntax::Token.<br>

+MATCHER_P(Kind, K, "") { return arg.kind() == K; }<br>

+MATCHER_P2(HasText, Text, SourceMgr, "") {<br>

+  return arg.text(*SourceMgr) == Text;<br>

+}<br>

+/// Checks the start and end location of a token are equal to SourceRng.<br>

+MATCHER_P(RangeIs, SourceRng, "") {<br>

+  return arg.location() == SourceRng.first &&<br>

+         arg.endLocation() == SourceRng.second;<br>

+}<br>

+<br>

+class TokenCollectorTest : public ::testing::Test {<br>

+public:<br>

+  /// Run the clang frontend, collect the preprocessed tokens from the frontend<br>

+  /// invocation and store them in this->Buffer.<br>

+  /// This also clears SourceManager before running the compiler.<br>

+  void recordTokens(llvm::StringRef Code) {<br>

+    class RecordTokens : public ASTFrontendAction {<br>

+    public:<br>

+      explicit RecordTokens(TokenBuffer &Result) : Result(Result) {}<br>

+<br>

+      bool BeginSourceFileAction(CompilerInstance &CI) override {<br>

+        assert(!Collector && "expected only a single call to BeginSourceFile");<br>

+        Collector.emplace(CI.getPreprocessor());<br>

+        return true;<br>

+      }<br>

+      void EndSourceFileAction() override {<br>

+        assert(Collector && "BeginSourceFileAction was never called");<br>

+        Result = std::move(*Collector).consume();<br>

+      }<br>

+<br>

+      std::unique_ptr<ASTConsumer><br>

+      CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {<br>

+        return llvm::make_unique<ASTConsumer>();<br>

+      }<br>

+<br>

+    private:<br>

+      TokenBuffer &Result;<br>

+      llvm::Optional<TokenCollector> Collector;<br>

+    };<br>

+<br>

+    constexpr const char *FileName = "./input.cpp";<br>

+    FS->addFile(FileName, time_t(), llvm::MemoryBuffer::getMemBufferCopy(""));<br>

+    // Prepare to run a compiler.<br>

+    std::vector<const char *> Args = {"tok-test", "-std=c++03", "-fsyntax-only",<br>

+                                      FileName};<br>

+    auto CI = createInvocationFromCommandLine(Args, Diags, FS);<br>

+    assert(CI);<br>

+    CI->getFrontendOpts().DisableFree = false;<br>

+    CI->getPreprocessorOpts().addRemappedFile(<br>

+        FileName, llvm::MemoryBuffer::getMemBufferCopy(Code).release());<br>

+    CompilerInstance Compiler;<br>

+    Compiler.setInvocation(std::move(CI));<br>

+    if (!Diags->getClient())<br>

+      Diags->setClient(new IgnoringDiagConsumer);<br>

+    Compiler.setDiagnostics(Diags.get());<br>

+    Compiler.setFileManager(FileMgr.get());<br>

+    Compiler.setSourceManager(SourceMgr.get());<br>

+<br>

+    this->Buffer = TokenBuffer(*SourceMgr);<br>

+    RecordTokens Recorder(this->Buffer);<br>

+    ASSERT_TRUE(Compiler.ExecuteAction(Recorder))<br>

+        << "failed to run the frontend";<br>

+  }<br>

+<br>

+  /// Record the tokens and return a test dump of the resulting buffer.<br>

+  std::string collectAndDump(llvm::StringRef Code) {<br>

+    recordTokens(Code);<br>

+    return Buffer.dumpForTests();<br>

+  }<br>

+<br>

+  // Adds a file to the test VFS.<br>

+  void addFile(llvm::StringRef Path, llvm::StringRef Contents) {<br>

+    if (!FS->addFile(Path, time_t(),<br>

+                     llvm::MemoryBuffer::getMemBufferCopy(Contents))) {<br>

+      ADD_FAILURE() << "could not add a file to VFS: " << Path;<br>

+    }<br>

+  }<br>

+<br>

+  /// Add a new file, run syntax::tokenize() on it and return the results.<br>

+  std::vector<syntax::Token> tokenize(llvm::StringRef Text) {<br>

+    // FIXME: pass proper LangOptions.<br>

+    return syntax::tokenize(<br>

+        SourceMgr->createFileID(llvm::MemoryBuffer::getMemBufferCopy(Text)),<br>

+        *SourceMgr, LangOptions());<br>

+  }<br>

+<br>

+  // Specialized versions of matchers that hide the SourceManager from clients.<br>

+  Matcher<syntax::Token> HasText(std::string Text) const {<br>

+    return ::HasText(Text, SourceMgr.get());<br>

+  }<br>

+  Matcher<syntax::Token> RangeIs(llvm::Annotations::Range R) const {<br>

+    std::pair<SourceLocation, SourceLocation> Ls;<br>

+    Ls.first = SourceMgr->getLocForStartOfFile(SourceMgr->getMainFileID())<br>

+                   .getLocWithOffset(R.Begin);<br>

+    Ls.second = SourceMgr->getLocForStartOfFile(SourceMgr->getMainFileID())<br>

+                    .getLocWithOffset(R.End);<br>

+    return ::RangeIs(Ls);<br>

+  }<br>

+<br>

+  /// Finds a subrange in O(n * m).<br>

+  template <class T, class U, class Eq><br>

+  llvm::ArrayRef<T> findSubrange(llvm::ArrayRef<U> Subrange,<br>

+                                 llvm::ArrayRef<T> Range, Eq F) {<br>

+    for (auto Begin = Range.begin(); Begin < Range.end(); ++Begin) {<br>

+      auto It = Begin;<br>

+      for (auto ItSub = Subrange.begin();<br>

+           ItSub != Subrange.end() && It != Range.end(); ++ItSub, ++It) {<br>

+        if (!F(*ItSub, *It))<br>

+          goto continue_outer;<br>

+      }<br>

+      return llvm::makeArrayRef(Begin, It);<br>

+    continue_outer:;<br>

+    }<br>

+    return llvm::makeArrayRef(Range.end(), Range.end());<br>

+  }<br>

+<br>

+  /// Finds a subrange in \p Tokens that match the tokens specified in \p Query.<br>

+  /// The match should be unique. \p Query is a whitespace-separated list of<br>

+  /// tokens to search for.<br>

+  llvm::ArrayRef<syntax::Token><br>

+  findTokenRange(llvm::StringRef Query, llvm::ArrayRef<syntax::Token> Tokens) {<br>

+    llvm::SmallVector<llvm::StringRef, 8> QueryTokens;<br>

+    Query.split(QueryTokens, ' ', /*MaxSplit=*/-1, /*KeepEmpty=*/false);<br>

+    if (QueryTokens.empty()) {<br>

+      ADD_FAILURE() << "will not look for an empty list of tokens";<br>

+      std::abort();<br>

+    }<br>

+    // An equality test for search.<br>

+    auto TextMatches = [this](llvm::StringRef Q, const syntax::Token &T) {<br>

+      return Q == T.text(*SourceMgr);<br>

+    };<br>

+    // Find a match.<br>

+    auto Found =<br>

+        findSubrange(llvm::makeArrayRef(QueryTokens), Tokens, TextMatches);<br>

+    if (Found.begin() == Tokens.end()) {<br>

+      ADD_FAILURE() << "could not find the subrange for " << Query;<br>

+      std::abort();<br>

+    }<br>

+    // Check that the match is unique.<br>

+    if (findSubrange(llvm::makeArrayRef(QueryTokens),<br>

+                     llvm::makeArrayRef(Found.end(), Tokens.end()), TextMatches)<br>

+            .begin() != Tokens.end()) {<br>

+      ADD_FAILURE() << "match is not unique for " << Query;<br>

+      std::abort();<br>

+    }<br>

+    return Found;<br>

+  }<br>

+<br>

+  // Specialized versions of findTokenRange for expanded and spelled tokens.<br>

+  llvm::ArrayRef<syntax::Token> findExpanded(llvm::StringRef Query) {<br>

+    return findTokenRange(Query, Buffer.expandedTokens());<br>

+  }<br>

+  llvm::ArrayRef<syntax::Token> findSpelled(llvm::StringRef Query,<br>

+                                            FileID File = FileID()) {<br>

+    if (!File.isValid())<br>

+      File = SourceMgr->getMainFileID();<br>

+    return findTokenRange(Query, Buffer.spelledTokens(File));<br>

+  }<br>

+<br>

+  // Data fields.<br>

+  llvm::IntrusiveRefCntPtr<DiagnosticsEngine> Diags =<br>

+      new DiagnosticsEngine(new DiagnosticIDs, new DiagnosticOptions);<br>

+  IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> FS =<br>

+      new llvm::vfs::InMemoryFileSystem;<br>

+  llvm::IntrusiveRefCntPtr<FileManager> FileMgr =<br>

+      new FileManager(FileSystemOptions(), FS);<br>

+  llvm::IntrusiveRefCntPtr<SourceManager> SourceMgr =<br>

+      new SourceManager(*Diags, *FileMgr);<br>

+  /// Contains last result of calling recordTokens().<br>

+  TokenBuffer Buffer = TokenBuffer(*SourceMgr);<br>

+};<br>

+<br>

+TEST_F(TokenCollectorTest, RawMode) {<br>

+  EXPECT_THAT(tokenize("int main() {}"),<br>

+              ElementsAre(Kind(tok::kw_int),<br>

+                          AllOf(HasText("main"), Kind(tok::identifier)),<br>

+                          Kind(tok::l_paren), Kind(tok::r_paren),<br>

+                          Kind(tok::l_brace), Kind(tok::r_brace)));<br>

+  // Comments are ignored for now.<br>

+  EXPECT_THAT(tokenize("/* foo */int a; // more comments"),<br>

+              ElementsAre(Kind(tok::kw_int),<br>

+                          AllOf(HasText("a"), Kind(tok::identifier)),<br>

+                          Kind(tok::semi)));<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, Basic) {<br>

+  std::pair</*Input*/ std::string, /*Expected*/ std::string> TestCases[] = {<br>

+      {"int main() {}",<br>

+       R"(expanded tokens:<br>

+  int main ( ) { }<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    int main ( ) { }<br>

+  no mappings.<br>

+)"},<br>

+      // All kinds of whitespace are ignored.<br>

+      {"\t\n  int\t\n  main\t\n  (\t\n  )\t\n{\t\n  }\t\n",<br>

+       R"(expanded tokens:<br>

+  int main ( ) { }<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    int main ( ) { }<br>

+  no mappings.<br>

+)"},<br>

+      // Annotation tokens are ignored.<br>

+      {R"cpp(<br>

+        #pragma GCC visibility push (public)<br>

+        #pragma GCC visibility pop<br>

+      )cpp",<br>

+       R"(expanded tokens:<br>

+  <empty><br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # pragma GCC visibility push ( public ) # pragma GCC visibility pop<br>

+  mappings:<br>

+    ['#'_0, '<eof>'_13) => ['<eof>'_0, '<eof>'_0)<br>

+)"}};<br>

+  for (auto &Test : TestCases)<br>

+    EXPECT_EQ(collectAndDump(Test.first), Test.second)<br>

+        << collectAndDump(Test.first);<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, Locations) {<br>

+  // Check locations of the tokens.<br>

+  llvm::Annotations Code(R"cpp(<br>

+    $r1[[int]] $r2[[a]] $r3[[=]] $r4[["foo bar baz"]] $r5[[;]]<br>

+  )cpp");<br>

+  recordTokens(Code.code());<br>

+  // Check expanded tokens.<br>

+  EXPECT_THAT(<br>

+      Buffer.expandedTokens(),<br>

+      ElementsAre(AllOf(Kind(tok::kw_int), RangeIs(Code.range("r1"))),<br>

+                  AllOf(Kind(tok::identifier), RangeIs(Code.range("r2"))),<br>

+                  AllOf(Kind(tok::equal), RangeIs(Code.range("r3"))),<br>

+                  AllOf(Kind(tok::string_literal), RangeIs(Code.range("r4"))),<br>

+                  AllOf(Kind(tok::semi), RangeIs(Code.range("r5"))),<br>

+                  Kind(tok::eof)));<br>

+  // Check spelled tokens.<br>

+  EXPECT_THAT(<br>

+      Buffer.spelledTokens(SourceMgr->getMainFileID()),<br>

+      ElementsAre(AllOf(Kind(tok::kw_int), RangeIs(Code.range("r1"))),<br>

+                  AllOf(Kind(tok::identifier), RangeIs(Code.range("r2"))),<br>

+                  AllOf(Kind(tok::equal), RangeIs(Code.range("r3"))),<br>

+                  AllOf(Kind(tok::string_literal), RangeIs(Code.range("r4"))),<br>

+                  AllOf(Kind(tok::semi), RangeIs(Code.range("r5")))));<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, MacroDirectives) {<br>

+  // Macro directives are not stored anywhere at the moment.<br>

+  std::string Code = R"cpp(<br>

+    #define FOO a<br>

+    #include "unresolved_file.h"<br>

+    #undef FOO<br>

+    #ifdef X<br>

+    #else<br>

+    #endif<br>

+    #ifndef Y<br>

+    #endif<br>

+    #if 1<br>

+    #elif 2<br>

+    #else<br>

+    #endif<br>

+    #pragma once<br>

+    #pragma something lalala<br>

+<br>

+    int a;<br>

+  )cpp";<br>

+  std::string Expected =<br>

+      "expanded tokens:\n"<br>

+      "  int a ;\n"<br>

+      "file './input.cpp'\n"<br>

+      "  spelled tokens:\n"<br>

+      "    # define FOO a # include \"unresolved_file.h\" # undef FOO "<br>

+      "# ifdef X # else # endif # ifndef Y # endif # if 1 # elif 2 # else "<br>

+      "# endif # pragma once # pragma something lalala int a ;\n"<br>

+      "  mappings:\n"<br>

+      "    ['#'_0, 'int'_39) => ['int'_0, 'int'_0)\n";<br>

+  EXPECT_EQ(collectAndDump(Code), Expected);<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, MacroReplacements) {<br>

+  std::pair</*Input*/ std::string, /*Expected*/ std::string> TestCases[] = {<br>

+      // A simple object-like macro.<br>

+      {R"cpp(<br>

+    #define INT int const<br>

+    INT a;<br>

+  )cpp",<br>

+       R"(expanded tokens:<br>

+  int const a ;<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # define INT int const INT a ;<br>

+  mappings:<br>

+    ['#'_0, 'INT'_5) => ['int'_0, 'int'_0)<br>

+    ['INT'_5, 'a'_6) => ['int'_0, 'a'_2)<br>

+)"},<br>

+      // A simple function-like macro.<br>

+      {R"cpp(<br>

+    #define INT(a) const int<br>

+    INT(10+10) a;<br>

+  )cpp",<br>

+       R"(expanded tokens:<br>

+  const int a ;<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # define INT ( a ) const int INT ( 10 + 10 ) a ;<br>

+  mappings:<br>

+    ['#'_0, 'INT'_8) => ['const'_0, 'const'_0)<br>

+    ['INT'_8, 'a'_14) => ['const'_0, 'a'_2)<br>

+)"},<br>

+      // Recursive macro replacements.<br>

+      {R"cpp(<br>

+    #define ID(X) X<br>

+    #define INT int const<br>

+    ID(ID(INT)) a;<br>

+  )cpp",<br>

+       R"(expanded tokens:<br>

+  int const a ;<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # define ID ( X ) X # define INT int const ID ( ID ( INT ) ) a ;<br>

+  mappings:<br>

+    ['#'_0, 'ID'_12) => ['int'_0, 'int'_0)<br>

+    ['ID'_12, 'a'_19) => ['int'_0, 'a'_2)<br>

+)"},<br>

+      // A little more complicated recursive macro replacements.<br>

+      {R"cpp(<br>

+    #define ADD(X, Y) X+Y<br>

+    #define MULT(X, Y) X*Y<br>

+<br>

+    int a = ADD(MULT(1,2), MULT(3,ADD(4,5)));<br>

+  )cpp",<br>

+       "expanded tokens:\n"<br>

+       "  int a = 1 * 2 + 3 * 4 + 5 ;\n"<br>

+       "file './input.cpp'\n"<br>

+       "  spelled tokens:\n"<br>

+       "    # define ADD ( X , Y ) X + Y # define MULT ( X , Y ) X * Y int "<br>

+       "a = ADD ( MULT ( 1 , 2 ) , MULT ( 3 , ADD ( 4 , 5 ) ) ) ;\n"<br>

+       "  mappings:\n"<br>

+       "    ['#'_0, 'int'_22) => ['int'_0, 'int'_0)\n"<br>

+       "    ['ADD'_25, ';'_46) => ['1'_3, ';'_12)\n"},<br>

+      // Empty macro replacement.<br>

+      {R"cpp(<br>

+    #define EMPTY<br>

+    #define EMPTY_FUNC(X)<br>

+    EMPTY<br>

+    EMPTY_FUNC(1+2+3)<br>

+    )cpp",<br>

+       R"(expanded tokens:<br>

+  <empty><br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # define EMPTY # define EMPTY_FUNC ( X ) EMPTY EMPTY_FUNC ( 1 + 2 + 3 )<br>

+  mappings:<br>

+    ['#'_0, '<eof>'_18) => ['<eof>'_0, '<eof>'_0)<br>

+)"},<br>

+      // File ends with a macro replacement.<br>

+      {R"cpp(<br>

+    #define FOO 10+10;<br>

+    int a = FOO<br>

+    )cpp",<br>

+       R"(expanded tokens:<br>

+  int a = 10 + 10 ;<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # define FOO 10 + 10 ; int a = FOO<br>

+  mappings:<br>

+    ['#'_0, 'int'_7) => ['int'_0, 'int'_0)<br>

+    ['FOO'_10, '<eof>'_11) => ['10'_3, '<eof>'_7)<br>

+)"}};<br>

+<br>

+  for (auto &Test : TestCases)<br>

+    EXPECT_EQ(Test.second, collectAndDump(Test.first))<br>

+        << collectAndDump(Test.first);<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, SpecialTokens) {<br>

+  // Tokens coming from concatenations.<br>

+  recordTokens(R"cpp(<br>

+    #define CONCAT(a, b) a ## b<br>

+    int a = CONCAT(1, 2);<br>

+  )cpp");<br>

+  EXPECT_THAT(std::vector<syntax::Token>(Buffer.expandedTokens()),<br>

+              Contains(HasText("12")));<br>

+  // Multi-line tokens with slashes at the end.<br>

+  recordTokens("i\\\nn\\\nt");<br>

+  EXPECT_THAT(Buffer.expandedTokens(),<br>

+              ElementsAre(AllOf(Kind(tok::kw_int), HasText("i\\\nn\\\nt")),<br>

+                          Kind(tok::eof)));<br>

+  // FIXME: test tokens with digraphs and UCN identifiers.<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, LateBoundTokens) {<br>

+  // The parser eventually breaks the first '>>' into two tokens ('>' and '>'),<br>

+  // but we choose to record them as a single token (for now).<br>

+  llvm::Annotations Code(R"cpp(<br>

+    template <class T><br>

+    struct foo { int a; };<br>

+    int bar = foo<foo<int$br[[>>]]().a;<br>

+    int baz = 10 $op[[>>]] 2;<br>

+  )cpp");<br>

+  recordTokens(Code.code());<br>

+  EXPECT_THAT(std::vector<syntax::Token>(Buffer.expandedTokens()),<br>

+              AllOf(Contains(AllOf(Kind(tok::greatergreater),<br>

+                                   RangeIs(Code.range("br")))),<br>

+                    Contains(AllOf(Kind(tok::greatergreater),<br>

+                                   RangeIs(Code.range("op"))))));<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, DelayedParsing) {<br>

+  llvm::StringLiteral Code = R"cpp(<br>

+    struct Foo {<br>

+      int method() {<br>

+        // Parser will visit method bodies and initializers multiple times, but<br>

+        // TokenBuffer should only record the first walk over the tokens;<br>

+        return 100;<br>

+      }<br>

+      int a = 10;<br>

+<br>

+      struct Subclass {<br>

+        void foo() {<br>

+          Foo().method();<br>

+        }<br>

+      };<br>

+    };<br>

+  )cpp";<br>

+  std::string ExpectedTokens =<br>

+      "expanded tokens:\n"<br>

+      "  struct Foo { int method ( ) { return 100 ; } int a = 10 ; struct "<br>

+      "Subclass { void foo ( ) { Foo ( ) . method ( ) ; } } ; } ;\n";<br>

+  EXPECT_THAT(collectAndDump(Code), StartsWith(ExpectedTokens));<br>

+}<br>

+<br>

+TEST_F(TokenCollectorTest, MultiFile) {<br>

+  addFile("./foo.h", R"cpp(<br>

+    #define ADD(X, Y) X+Y<br>

+    int a = 100;<br>

+    #include "bar.h"<br>

+  )cpp");<br>

+  addFile("./bar.h", R"cpp(<br>

+    int b = ADD(1, 2);<br>

+    #define MULT(X, Y) X*Y<br>

+  )cpp");<br>

+  llvm::StringLiteral Code = R"cpp(<br>

+    #include "foo.h"<br>

+    int c = ADD(1, MULT(2,3));<br>

+  )cpp";<br>

+<br>

+  std::string Expected = R"(expanded tokens:<br>

+  int a = 100 ; int b = 1 + 2 ; int c = 1 + 2 * 3 ;<br>

+file './input.cpp'<br>

+  spelled tokens:<br>

+    # include "foo.h" int c = ADD ( 1 , MULT ( 2 , 3 ) ) ;<br>

+  mappings:<br>

+    ['#'_0, 'int'_3) => ['int'_12, 'int'_12)<br>

+    ['ADD'_6, ';'_17) => ['1'_15, ';'_20)<br>

+file './foo.h'<br>

+  spelled tokens:<br>

+    # define ADD ( X , Y ) X + Y int a = 100 ; # include "bar.h"<br>

+  mappings:<br>

+    ['#'_0, 'int'_11) => ['int'_0, 'int'_0)<br>

+    ['#'_16, '<eof>'_19) => ['int'_5, 'int'_5)<br>

+file './bar.h'<br>

+  spelled tokens:<br>

+    int b = ADD ( 1 , 2 ) ; # define MULT ( X , Y ) X * Y<br>

+  mappings:<br>

+    ['ADD'_3, ';'_9) => ['1'_8, ';'_11)<br>

+    ['#'_10, '<eof>'_21) => ['int'_12, 'int'_12)<br>

+)";<br>

+<br>

+  EXPECT_EQ(Expected, collectAndDump(Code))<br>

+      << "input: " << Code << "\nresults: " << collectAndDump(Code);<br>

+}<br>

+<br>

+class TokenBufferTest : public TokenCollectorTest {};<br>

+<br>

+TEST_F(TokenBufferTest, SpelledByExpanded) {<br>

+  recordTokens(R"cpp(<br>

+    a1 a2 a3 b1 b2<br>

+  )cpp");<br>

+<br>

+  // Sanity check: expanded and spelled tokens are stored separately.<br>

+  EXPECT_THAT(findExpanded("a1 a2"), Not(SameRange(findSpelled("a1 a2"))));<br>

+  // Searching for subranges of expanded tokens should give the corresponding<br>

+  // spelled ones.<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3 b1 b2")),<br>

+              ValueIs(SameRange(findSpelled("a1 a2 a3 b1 b2"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3")),<br>

+              ValueIs(SameRange(findSpelled("a1 a2 a3"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("b1 b2")),<br>

+              ValueIs(SameRange(findSpelled("b1 b2"))));<br>

+<br>

+  // Test search on simple macro expansions.<br>

+  recordTokens(R"cpp(<br>

+    #define A a1 a2 a3<br>

+    #define B b1 b2<br>

+<br>

+    A split B<br>

+  )cpp");<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3 split b1 b2")),<br>

+              ValueIs(SameRange(findSpelled("A split B"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3")),<br>

+              ValueIs(SameRange(findSpelled("A split").drop_back())));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("b1 b2")),<br>

+              ValueIs(SameRange(findSpelled("split B").drop_front())));<br>

+  // Ranges not fully covering macro invocations should fail.<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a1 a2")), llvm::None);<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("b2")), llvm::None);<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a2 a3 split b1 b2")),<br>

+            llvm::None);<br>

+<br>

+  // Recursive macro invocations.<br>

+  recordTokens(R"cpp(<br>

+    #define ID(x) x<br>

+    #define B b1 b2<br>

+<br>

+    ID(ID(ID(a1) a2 a3)) split ID(B)<br>

+  )cpp");<br>

+<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3")),<br>

+              ValueIs(SameRange(findSpelled("ID ( ID ( ID ( a1 ) a2 a3 ) )"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("b1 b2")),<br>

+              ValueIs(SameRange(findSpelled("ID ( B )"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("a1 a2 a3 split b1 b2")),<br>

+              ValueIs(SameRange(findSpelled(<br>

+                  "ID ( ID ( ID ( a1 ) a2 a3 ) ) split ID ( B )"))));<br>

+  // Ranges crossing macro call boundaries.<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a1 a2 a3 split b1")),<br>

+            llvm::None);<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a2 a3 split b1")),<br>

+            llvm::None);<br>

+  // FIXME: next two examples should map to macro arguments, but currently they<br>

+  //        fail.<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a2")), llvm::None);<br>

+  EXPECT_EQ(Buffer.spelledForExpanded(findExpanded("a1 a2")), llvm::None);<br>

+<br>

+  // Empty macro expansions.<br>

+  recordTokens(R"cpp(<br>

+    #define EMPTY<br>

+    #define ID(X) X<br>

+<br>

+    EMPTY EMPTY ID(1 2 3) EMPTY EMPTY split1<br>

+    EMPTY EMPTY ID(4 5 6) split2<br>

+    ID(7 8 9) EMPTY EMPTY<br>

+  )cpp");<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("1 2 3")),<br>

+              ValueIs(SameRange(findSpelled("ID ( 1 2 3 )"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("4 5 6")),<br>

+              ValueIs(SameRange(findSpelled("ID ( 4 5 6 )"))));<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("7 8 9")),<br>

+              ValueIs(SameRange(findSpelled("ID ( 7 8 9 )"))));<br>

+<br>

+  // Empty mappings coming from various directives.<br>

+  recordTokens(R"cpp(<br>

+    #define ID(X) X<br>

+    ID(1)<br>

+    #pragma lalala<br>

+    not_mapped<br>

+  )cpp");<br>

+  EXPECT_THAT(Buffer.spelledForExpanded(findExpanded("not_mapped")),<br>

+              ValueIs(SameRange(findSpelled("not_mapped"))));<br>

+}<br>

+<br>

+TEST_F(TokenBufferTest, TokensToFileRange) {<br>

+  addFile("./foo.h", "token_from_header");<br>

+  llvm::Annotations Code(R"cpp(<br>

+    #define FOO token_from_expansion<br>

+    #include "./foo.h"<br>

+    $all[[$i[[int]] a = FOO;]]<br>

+  )cpp");<br>

+  recordTokens(Code.code());<br>

+<br>

+  auto &SM = *SourceMgr;<br>

+<br>

+  // Two simple examples.<br>

+  auto Int = findExpanded("int").front();<br>

+  auto Semi = findExpanded(";").front();<br>

+  EXPECT_EQ(Int.range(SM), FileRange(SM.getMainFileID(), Code.range("i").Begin,<br>

+                                     Code.range("i").End));<br>

+  EXPECT_EQ(syntax::Token::range(SM, Int, Semi),<br>

+            FileRange(SM.getMainFileID(), Code.range("all").Begin,<br>

+                      Code.range("all").End));<br>

+  // We don't test assertion failures because death tests are slow.<br>

+}<br>

+<br>

+} // namespace<br>

\ No newline at end of file<br>

<br>

<br>

_______________________________________________<br>

cfe-commits mailing list<br>

<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits</a><br>

</blockquote></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-5286231916229022975gmail-m_-6825326089947790004gmail-m_-6103472728592076166gmail-m_-8965722078707614953gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Regards,</div><div>Ilya Biryukov</div></div></div></div></div>

</blockquote></div>

</blockquote></div>

</blockquote></div>

</blockquote></div>