<div dir="ltr"><div>Terribly sorry for the breakage and many thanks for fixing this!</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, May 17, 2018 at 9:04 AM Clement Courbet <<a href="mailto:courbet@google.com">courbet@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">This should now be fixed in r332576.</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 16, 2018 at 11:49 PM, Galina Kistanova via cfe-commits <span dir="ltr"><<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>A few other builders are also affected:<br><br><a href="http://lab.llvm.org:8011/builders/clang-x86_64-linux-abi-test" target="_blank">http://lab.llvm.org:8011/builders/clang-x86_64-linux-abi-test</a><br><a href="http://lab.llvm.org:8011/builders/clang-lld-x86_64-2stage" target="_blank">http://lab.llvm.org:8011/builders/clang-lld-x86_64-2stage</a><br><a href="http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu" target="_blank">http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu</a><br><br><br></div>Thanks<br><br></div>Galina<br></div><div class="m_-878724247955520822gmail-HOEnZb"><div class="m_-878724247955520822gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 16, 2018 at 12:58 PM, Galina Kistanova <span dir="ltr"><<a href="mailto:gkistanova@gmail.com" target="_blank">gkistanova@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello Ilya,<br><br>This commit broke the build step for a couple of our builders:<br><br><a href="http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/8541" 
target="_blank">http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/8541</a><br><a href="http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu" target="_blank">http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu</a><br><br>. . .<br>FAILED: tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o <br>/usr/bin/c++   -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0 -DGTEST_LANG_CXX11=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/unittests/AST -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/include -Itools/clang/include -Iinclude -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/include -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googletest/include -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googlemock/include -fPIC -fvisibility-inlines-hidden -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG    -Wno-variadic-macros -fno-exceptions -fno-rtti -MD -MT tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o -MF tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o.d -o tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o -c /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp<br>/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp:62:1: error: unterminated raw string<br> R"cpp(<br> ^<br>. . 
.<br><br>Could you please take a look?<br><br>The builder was already red and did not send notifications.<br><br>Thanks<span class="m_-878724247955520822gmail-m_-1877726083705554406HOEnZb"><font color="#888888"><br><br>Galina<br><br><br></font></span></div><div class="m_-878724247955520822gmail-m_-1877726083705554406HOEnZb"><div class="m_-878724247955520822gmail-m_-1877726083705554406h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 16, 2018 at 5:30 AM, Ilya Biryukov via cfe-commits <span dir="ltr"><<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: ibiryukov<br>

Date: Wed May 16 05:30:09 2018<br>

New Revision: 332458<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=332458&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=332458&view=rev</a><br>

Log:<br>

[AST] Added a helper to extract a user-friendly text of a comment.<br>

<br>

Summary:<br>

The helper is used in clangd for documentation shown in code completion<br>

and storing the docs in the symbols. See D45999.<br>

<br>

This patch reuses the code of the Doxygen comment lexer, disabling the<br>

bits that do command and html tag parsing.<br>

The new helper works on all comments, including non-doxygen comments.<br>

However, it does not understand or transform any doxygen directives,<br>

i.e. it cannot extract the brief text, etc.<br>

<br>

Reviewers: sammccall, hokein, ioeric<br>

<br>

Reviewed By: ioeric<br>

<br>

Subscribers: mgorny, cfe-commits<br>

<br>

Differential Revision: <a href="https://reviews.llvm.org/D46000" rel="noreferrer" target="_blank">https://reviews.llvm.org/D46000</a><br>

<br>

Added:<br>

    cfe/trunk/unittests/AST/CommentTextTest.cpp<br>

Modified:<br>

    cfe/trunk/include/clang/AST/CommentLexer.h<br>

    cfe/trunk/include/clang/AST/RawCommentList.h<br>

    cfe/trunk/lib/AST/CommentLexer.cpp<br>

    cfe/trunk/lib/AST/RawCommentList.cpp<br>

    cfe/trunk/unittests/AST/CMakeLists.txt<br>

<br>

Modified: cfe/trunk/include/clang/AST/CommentLexer.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/CommentLexer.h?rev=332458&r1=332457&r2=332458&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/CommentLexer.h?rev=332458&r1=332457&r2=332458&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/include/clang/AST/CommentLexer.h (original)<br>

+++ cfe/trunk/include/clang/AST/CommentLexer.h Wed May 16 05:30:09 2018<br>

@@ -281,6 +281,11 @@ private:<br>

   /// command, including command marker.<br>

   SmallString<16> VerbatimBlockEndCommandName;<br>

<br>

+  /// If true, the commands, html tags, etc. will be parsed and reported as<br>

+  /// separate tokens inside the comment body. If false, the comment text will<br>

+  /// be parsed into text and newline tokens.<br>

+  bool ParseCommands;<br>

+<br>

   /// Given a character reference name (e.g., "lt"), return the character that<br>

   /// it stands for (e.g., "<").<br>

   StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;<br>

@@ -315,12 +320,11 @@ private:<br>

   /// Eat string matching regexp \code \s*\* \endcode.<br>

   void skipLineStartingDecorations();<br>

<br>

-  /// Lex stuff inside comments.  CommentEnd should be set correctly.<br>

+  /// Lex comment text, including commands if ParseCommands is set to true.<br>

   void lexCommentText(Token &T);<br>

<br>

-  void setupAndLexVerbatimBlock(Token &T,<br>

-                                const char *TextBegin,<br>

-                                char Marker, const CommandInfo *Info);<br>

+  void setupAndLexVerbatimBlock(Token &T, const char *TextBegin, char Marker,<br>

+                                const CommandInfo *Info);<br>

<br>

   void lexVerbatimBlockFirstLine(Token &T);<br>

<br>

@@ -343,14 +347,13 @@ private:<br>

<br>

 public:<br>

   Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,<br>

-        const CommandTraits &Traits,<br>

-        SourceLocation FileLoc,<br>

-        const char *BufferStart, const char *BufferEnd);<br>

+        const CommandTraits &Traits, SourceLocation FileLoc,<br>

+        const char *BufferStart, const char *BufferEnd,<br>

+        bool ParseCommands = true);<br>

<br>

   void lex(Token &T);<br>

<br>

-  StringRef getSpelling(const Token &Tok,<br>

-                        const SourceManager &SourceMgr,<br>

+  StringRef getSpelling(const Token &Tok, const SourceManager &SourceMgr,<br>

                         bool *Invalid = nullptr) const;<br>

 };<br>

<br>

<br>

Modified: cfe/trunk/include/clang/AST/RawCommentList.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/RawCommentList.h?rev=332458&r1=332457&r2=332458&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/RawCommentList.h?rev=332458&r1=332457&r2=332458&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/include/clang/AST/RawCommentList.h (original)<br>

+++ cfe/trunk/include/clang/AST/RawCommentList.h Wed May 16 05:30:09 2018<br>

@@ -111,6 +111,30 @@ public:<br>

     return extractBriefText(Context);<br>

   }<br>

<br>

+  /// Returns sanitized comment text, suitable for presentation in editor UIs.<br>

+  /// E.g. will transform:<br>

+  ///     // This is a long multiline comment.<br>

+  ///     //   Parts of it  might be indented.<br>

+  ///     /* The comments styles might be mixed. */<br>

+  ///  into<br>

+  ///     "This is a long multiline comment.\n"<br>

+  ///     "  Parts of it  might be indented.\n"<br>

+  ///     "The comments styles might be mixed."<br>

+  /// Also removes leading indentation and sanitizes some common cases:<br>

+  ///     /* This is a first line.<br>

+  ///      *   This is a second line. It is indented.<br>

+  ///      * This is a third line. */<br>

+  /// and<br>

+  ///     /* This is a first line.<br>

+  ///          This is a second line. It is indented.<br>

+  ///     This is a third line. */<br>

+  /// will both turn into:<br>

+  ///     "This is a first line.\n"<br>

+  ///     "  This is a second line. It is indented.\n"<br>

+  ///     "This is a third line."<br>

+  std::string getFormattedText(const SourceManager &SourceMgr,<br>

+                               DiagnosticsEngine &Diags) const;<br>

+<br>

   /// Parse the comment, assuming it is attached to decl \c D.<br>

   comments::FullComment *parse(const ASTContext &Context,<br>

                                const Preprocessor *PP, const Decl *D) const;<br>

<br>

Modified: cfe/trunk/lib/AST/CommentLexer.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/CommentLexer.cpp?rev=332458&r1=332457&r2=332458&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/CommentLexer.cpp?rev=332458&r1=332457&r2=332458&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/lib/AST/CommentLexer.cpp (original)<br>

+++ cfe/trunk/lib/AST/CommentLexer.cpp Wed May 16 05:30:09 2018<br>

@@ -294,6 +294,39 @@ void Lexer::lexCommentText(Token &T) {<br>

   assert(CommentState == LCS_InsideBCPLComment ||<br>

          CommentState == LCS_InsideCComment);<br>

<br>

+  // Handles lexing non-command text, i.e. text and newline.<br>

+  auto HandleNonCommandToken = [&]() -> void {<br>

+    assert(State == LS_Normal);<br>

+<br>

+    const char *TokenPtr = BufferPtr;<br>

+    assert(TokenPtr < CommentEnd);<br>

+    switch (*TokenPtr) {<br>

+      case '\n':<br>

+      case '\r':<br>

+          TokenPtr = skipNewline(TokenPtr, CommentEnd);<br>

+          formTokenWithChars(T, TokenPtr, tok::newline);<br>

+<br>

+          if (CommentState == LCS_InsideCComment)<br>

+            skipLineStartingDecorations();<br>

+          return;<br>

+<br>

+      default: {<br>

+          StringRef TokStartSymbols = ParseCommands ? "\n\r\\@&<" : "\n\r";<br>

+          size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr)<br>

+                           .find_first_of(TokStartSymbols);<br>

+          if (End != StringRef::npos)<br>

+            TokenPtr += End;<br>

+          else<br>

+            TokenPtr = CommentEnd;<br>

+          formTextToken(T, TokenPtr);<br>

+          return;<br>

+      }<br>

+    }<br>

+  };<br>

+<br>

+  if (!ParseCommands)<br>

+    return HandleNonCommandToken();<br>

+<br>

   switch (State) {<br>

   case LS_Normal:<br>

     break;<br>

@@ -315,136 +348,116 @@ void Lexer::lexCommentText(Token &T) {<br>

   }<br>

<br>

   assert(State == LS_Normal);<br>

-<br>

   const char *TokenPtr = BufferPtr;<br>

   assert(TokenPtr < CommentEnd);<br>

-  while (TokenPtr != CommentEnd) {<br>

-    switch(*TokenPtr) {<br>

-      case '\\':<br>

-      case '@': {<br>

-        // Commands that start with a backslash and commands that start with<br>

-        // 'at' have equivalent semantics.  But we keep information about the<br>

-        // exact syntax in AST for comments.<br>

-        tok::TokenKind CommandKind =<br>

-            (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;<br>

+  switch(*TokenPtr) {<br>

+    case '\\':<br>

+    case '@': {<br>

+      // Commands that start with a backslash and commands that start with<br>

+      // 'at' have equivalent semantics.  But we keep information about the<br>

+      // exact syntax in AST for comments.<br>

+      tok::TokenKind CommandKind =<br>

+          (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;<br>

+      TokenPtr++;<br>

+      if (TokenPtr == CommentEnd) {<br>

+        formTextToken(T, TokenPtr);<br>

+        return;<br>

+      }<br>

+      char C = *TokenPtr;<br>

+      switch (C) {<br>

+      default:<br>

+        break;<br>

+<br>

+      case '\\': case '@': case '&': case '$':<br>

+      case '#':  case '<': case '>': case '%':<br>

+      case '\"': case '.': case ':':<br>

+        // This is one of \\ \@ \& \$ etc escape sequences.<br>

         TokenPtr++;<br>

-        if (TokenPtr == CommentEnd) {<br>

-          formTextToken(T, TokenPtr);<br>

-          return;<br>

-        }<br>

-        char C = *TokenPtr;<br>

-        switch (C) {<br>

-        default:<br>

-          break;<br>

-<br>

-        case '\\': case '@': case '&': case '$':<br>

-        case '#':  case '<': case '>': case '%':<br>

-        case '\"': case '.': case ':':<br>

-          // This is one of \\ \@ \& \$ etc escape sequences.<br>

+        if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {<br>

+          // This is the \:: escape sequence.<br>

           TokenPtr++;<br>

-          if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {<br>

-            // This is the \:: escape sequence.<br>

-            TokenPtr++;<br>

-          }<br>

-          StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));<br>

-          formTokenWithChars(T, TokenPtr, tok::text);<br>

-          T.setText(UnescapedText);<br>

-          return;<br>

         }<br>

+        StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));<br>

+        formTokenWithChars(T, TokenPtr, tok::text);<br>

+        T.setText(UnescapedText);<br>

+        return;<br>

+      }<br>

<br>

-        // Don't make zero-length commands.<br>

-        if (!isCommandNameStartCharacter(*TokenPtr)) {<br>

-          formTextToken(T, TokenPtr);<br>

-          return;<br>

-        }<br>

+      // Don't make zero-length commands.<br>

+      if (!isCommandNameStartCharacter(*TokenPtr)) {<br>

+        formTextToken(T, TokenPtr);<br>

+        return;<br>

+      }<br>

<br>

-        TokenPtr = skipCommandName(TokenPtr, CommentEnd);<br>

-        unsigned Length = TokenPtr - (BufferPtr + 1);<br>

+      TokenPtr = skipCommandName(TokenPtr, CommentEnd);<br>

+      unsigned Length = TokenPtr - (BufferPtr + 1);<br>

<br>

-        // Hardcoded support for lexing LaTeX formula commands<br>

-        // \f$ \f[ \f] \f{ \f} as a single command.<br>

-        if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {<br>

-          C = *TokenPtr;<br>

-          if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {<br>

-            TokenPtr++;<br>

-            Length++;<br>

-          }<br>

+      // Hardcoded support for lexing LaTeX formula commands<br>

+      // \f$ \f[ \f] \f{ \f} as a single command.<br>

+      if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {<br>

+        C = *TokenPtr;<br>

+        if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {<br>

+          TokenPtr++;<br>

+          Length++;<br>

         }<br>

+      }<br>

<br>

-        StringRef CommandName(BufferPtr + 1, Length);<br>

+      StringRef CommandName(BufferPtr + 1, Length);<br>

<br>

-        const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);<br>

-        if (!Info) {<br>

-          if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {<br>

-            StringRef CorrectedName = Info->Name;<br>

-            SourceLocation Loc = getSourceLocation(BufferPtr);<br>

-            SourceLocation EndLoc = getSourceLocation(TokenPtr);<br>

-            SourceRange FullRange = SourceRange(Loc, EndLoc);<br>

-            SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);<br>

-            Diag(Loc, diag::warn_correct_comment_command_name)<br>

-              << FullRange << CommandName << CorrectedName<br>

-              << FixItHint::CreateReplacement(CommandRange, CorrectedName);<br>

-          } else {<br>

-            formTokenWithChars(T, TokenPtr, tok::unknown_command);<br>

-            T.setUnknownCommandName(CommandName);<br>

-            Diag(T.getLocation(), diag::warn_unknown_comment_command_name)<br>

-                << SourceRange(T.getLocation(), T.getEndLocation());<br>

-            return;<br>

-          }<br>

-        }<br>

-        if (Info->IsVerbatimBlockCommand) {<br>

-          setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);<br>

+      const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);<br>

+      if (!Info) {<br>

+        if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {<br>

+          StringRef CorrectedName = Info->Name;<br>

+          SourceLocation Loc = getSourceLocation(BufferPtr);<br>

+          SourceLocation EndLoc = getSourceLocation(TokenPtr);<br>

+          SourceRange FullRange = SourceRange(Loc, EndLoc);<br>

+          SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);<br>

+          Diag(Loc, diag::warn_correct_comment_command_name)<br>

+            << FullRange << CommandName << CorrectedName<br>

+            << FixItHint::CreateReplacement(CommandRange, CorrectedName);<br>

+        } else {<br>

+          formTokenWithChars(T, TokenPtr, tok::unknown_command);<br>

+          T.setUnknownCommandName(CommandName);<br>

+          Diag(T.getLocation(), diag::warn_unknown_comment_command_name)<br>

+              << SourceRange(T.getLocation(), T.getEndLocation());<br>

           return;<br>

         }<br>

-        if (Info->IsVerbatimLineCommand) {<br>

-          setupAndLexVerbatimLine(T, TokenPtr, Info);<br>

-          return;<br>

-        }<br>

-        formTokenWithChars(T, TokenPtr, CommandKind);<br>

-        T.setCommandID(Info->getID());<br>

-        return;<br>

       }<br>

-<br>

-      case '&':<br>

-        lexHTMLCharacterReference(T);<br>

+      if (Info->IsVerbatimBlockCommand) {<br>

+        setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);<br>

         return;<br>

-<br>

-      case '<': {<br>

-        TokenPtr++;<br>

-        if (TokenPtr == CommentEnd) {<br>

-          formTextToken(T, TokenPtr);<br>

-          return;<br>

-        }<br>

-        const char C = *TokenPtr;<br>

-        if (isHTMLIdentifierStartingCharacter(C))<br>

-          setupAndLexHTMLStartTag(T);<br>

-        else if (C == '/')<br>

-          setupAndLexHTMLEndTag(T);<br>

-        else<br>

-          formTextToken(T, TokenPtr);<br>

+      }<br>

+      if (Info->IsVerbatimLineCommand) {<br>

+        setupAndLexVerbatimLine(T, TokenPtr, Info);<br>

         return;<br>

       }<br>

+      formTokenWithChars(T, TokenPtr, CommandKind);<br>

+      T.setCommandID(Info->getID());<br>

+      return;<br>

+    }<br>

<br>

-      case '\n':<br>

-      case '\r':<br>

-        TokenPtr = skipNewline(TokenPtr, CommentEnd);<br>

-        formTokenWithChars(T, TokenPtr, tok::newline);<br>

-<br>

-        if (CommentState == LCS_InsideCComment)<br>

-          skipLineStartingDecorations();<br>

-        return;<br>

+    case '&':<br>

+      lexHTMLCharacterReference(T);<br>

+      return;<br>

<br>

-      default: {<br>

-        size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr).<br>

-                         find_first_of("\n\r\\@&<");<br>

-        if (End != StringRef::npos)<br>

-          TokenPtr += End;<br>

-        else<br>

-          TokenPtr = CommentEnd;<br>

+    case '<': {<br>

+      TokenPtr++;<br>

+      if (TokenPtr == CommentEnd) {<br>

         formTextToken(T, TokenPtr);<br>

         return;<br>

       }<br>

+      const char C = *TokenPtr;<br>

+      if (isHTMLIdentifierStartingCharacter(C))<br>

+        setupAndLexHTMLStartTag(T);<br>

+      else if (C == '/')<br>

+        setupAndLexHTMLEndTag(T);<br>

+      else<br>

+        formTextToken(T, TokenPtr);<br>

+      return;<br>

     }<br>

+<br>

+    default:<br>

+      return HandleNonCommandToken();<br>

   }<br>

 }<br>

<br>

@@ -727,14 +740,13 @@ void Lexer::lexHTMLEndTag(Token &T) {<br>

 }<br>

<br>

 Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,<br>

-             const CommandTraits &Traits,<br>

-             SourceLocation FileLoc,<br>

-             const char *BufferStart, const char *BufferEnd):<br>

-    Allocator(Allocator), Diags(Diags), Traits(Traits),<br>

-    BufferStart(BufferStart), BufferEnd(BufferEnd),<br>

-    FileLoc(FileLoc), BufferPtr(BufferStart),<br>

-    CommentState(LCS_BeforeComment), State(LS_Normal) {<br>

-}<br>

+             const CommandTraits &Traits, SourceLocation FileLoc,<br>

+             const char *BufferStart, const char *BufferEnd,<br>

+             bool ParseCommands)<br>

+    : Allocator(Allocator), Diags(Diags), Traits(Traits),<br>

+      BufferStart(BufferStart), BufferEnd(BufferEnd), FileLoc(FileLoc),<br>

+      BufferPtr(BufferStart), CommentState(LCS_BeforeComment), State(LS_Normal),<br>

+      ParseCommands(ParseCommands) {}<br>

<br>

 void Lexer::lex(Token &T) {<br>

 again:<br>

<br>
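A minimal standalone sketch of how the new ParseCommands flag changes where a text token ends in lexCommentText. The names textTokenLength and lexWithoutCommands are hypothetical and are not part of clang::comments::Lexer; the sketch only mirrors the find_first_of call in the default branch of HandleNonCommandToken, and it simplifies newline handling by emitting every '\n' or '\r' as its own one-character token, whereas the real lexer goes through skipNewline.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper mirroring the default branch of HandleNonCommandToken:
// returns the length of the text token at the start of Rest.
std::size_t textTokenLength(std::string_view Rest, bool ParseCommands) {
  // With ParseCommands == false only newlines end a text token, so '\\', '@',
  // '&' and '<' stay inside the text instead of starting commands or tags.
  std::string_view StartSymbols = ParseCommands ? "\n\r\\@&<" : "\n\r";
  std::size_t End = Rest.find_first_of(StartSymbols);
  return End == std::string_view::npos ? Rest.size() : End;
}

// Hypothetical driver for the ParseCommands == false mode: splits a comment
// body into text tokens and one-character newline tokens (simplified).
std::vector<std::string> lexWithoutCommands(std::string_view Body) {
  std::vector<std::string> Tokens;
  while (!Body.empty()) {
    char C = Body.front();
    std::size_t Len = (C == '\n' || C == '\r')
                          ? 1
                          : textTokenLength(Body, /*ParseCommands=*/false);
    assert(Len > 0 && "every iteration must consume input");
    Tokens.emplace_back(Body.substr(0, Len));
    Body.remove_prefix(Len);
  }
  return Tokens;
}
```

With commands enabled, textTokenLength(" \\brief Short.\nMore.", true) stops at the backslash (length 1); with them disabled it runs to the newline (length 14). The same behavior is what lets the KeepsDoxygenControlSeqs test expect \brief, \param and @param to appear verbatim in the output of getFormattedText.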

Modified: cfe/trunk/lib/AST/RawCommentList.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/RawCommentList.cpp?rev=332458&r1=332457&r2=332458&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/RawCommentList.cpp?rev=332458&r1=332457&r2=332458&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/lib/AST/RawCommentList.cpp (original)<br>

+++ cfe/trunk/lib/AST/RawCommentList.cpp Wed May 16 05:30:09 2018<br>

@@ -335,3 +335,94 @@ void RawCommentList::addDeserializedComm<br>

              BeforeThanCompare<RawComment>(SourceMgr));<br>

   std::swap(Comments, MergedComments);<br>

 }<br>

+<br>

+std::string RawComment::getFormattedText(const SourceManager &SourceMgr,<br>

+                                         DiagnosticsEngine &Diags) const {<br>

+  llvm::StringRef CommentText = getRawText(SourceMgr);<br>

+  if (CommentText.empty())<br>

+    return "";<br>

+<br>

+  llvm::BumpPtrAllocator Allocator;<br>

+  // We do not parse any commands, so CommentOptions are ignored by<br>

+  // comments::Lexer. Therefore, we just use default-constructed options.<br>

+  CommentOptions DefOpts;<br>

+  comments::CommandTraits EmptyTraits(Allocator, DefOpts);<br>

+  comments::Lexer L(Allocator, Diags, EmptyTraits, getSourceRange().getBegin(),<br>

+                    CommentText.begin(), CommentText.end(),<br>

+                    /*ParseCommands=*/false);<br>

+<br>

+  std::string Result;<br>

+  // A column number of the first non-whitespace token in the comment text.<br>

+  // We skip whitespace up to this column, but keep the whitespace after this<br>

+  // column. IndentColumn is calculated when lexing the first line and reused<br>

+  // for the rest of lines.<br>

+  unsigned IndentColumn = 0;<br>

+<br>

+  // Processes one line of the comment and adds it to the result.<br>

+  // Handles skipping the indent at the start of the line.<br>

+  // Returns false when eof is reached and true otherwise.<br>

+  auto LexLine = [&](bool IsFirstLine) -> bool {<br>

+    comments::Token Tok;<br>

+    // Lex the first token on the line. We handle it separately, because we need<br>

+    // to fix up its indentation.<br>

+    L.lex(Tok);<br>

+    if (Tok.is(comments::tok::eof))<br>

+      return false;<br>

+    if (Tok.is(comments::tok::newline)) {<br>

+      Result += "\n";<br>

+      return true;<br>

+    }<br>

+    llvm::StringRef TokText = L.getSpelling(Tok, SourceMgr);<br>

+    bool LocInvalid = false;<br>

+    unsigned TokColumn =<br>

+        SourceMgr.getSpellingColumnNumber(Tok.getLocation(), &LocInvalid);<br>

+    assert(!LocInvalid && "getFormattedText for invalid location");<br>

+<br>

+    // Amount of leading whitespace in TokText.<br>

+    size_t WhitespaceLen = TokText.find_first_not_of(" \t");<br>

+    if (WhitespaceLen == StringRef::npos)<br>

+      WhitespaceLen = TokText.size();<br>

+    // Remember the amount of whitespace we skipped in the first line to remove<br>

+    // indent up to that column in the following lines.<br>

+    if (IsFirstLine)<br>

+      IndentColumn = TokColumn + WhitespaceLen;<br>

+<br>

+    // Amount of leading whitespace we actually want to skip.<br>

+    // For the first line we skip all the whitespace.<br>

+    // For the rest of the lines, we skip whitespace up to IndentColumn.<br>

+    unsigned SkipLen =<br>

+        IsFirstLine<br>

+            ? WhitespaceLen<br>

+            : std::min<size_t>(<br>

+                  WhitespaceLen,<br>

+                  std::max<int>(static_cast<int>(IndentColumn) - TokColumn, 0));<br>

+    llvm::StringRef Trimmed = TokText.drop_front(SkipLen);<br>

+    Result += Trimmed;<br>

+    // Lex all tokens in the rest of the line.<br>

+    for (L.lex(Tok); Tok.isNot(comments::tok::eof); L.lex(Tok)) {<br>

+      if (Tok.is(comments::tok::newline)) {<br>

+        Result += "\n";<br>

+        return true;<br>

+      }<br>

+      Result += L.getSpelling(Tok, SourceMgr);<br>

+    }<br>

+    // We've reached the end of file token.<br>

+    return false;<br>

+  };<br>

+<br>

+  auto DropTrailingNewLines = [](std::string &Str) {<br>

+    while (!Str.empty() && Str.back() == '\n')<br>

+      Str.pop_back();<br>

+  };<br>

+<br>

+  // Process first line separately to remember indent for the following lines.<br>

+  if (!LexLine(/*IsFirstLine=*/true)) {<br>

+    DropTrailingNewLines(Result);<br>

+    return Result;<br>

+  }<br>

+  // Process the rest of the lines.<br>

+  while (LexLine(/*IsFirstLine=*/false))<br>

+    ;<br>

+  DropTrailingNewLines(Result);<br>

+  return Result;<br>

+}<br>

<br>
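The indentation handling in the LexLine lambda is easy to lose among the lexer calls. Below is a minimal standalone sketch of the same arithmetic, assuming each comment line has already been reduced to its first token and that token's 1-based column (what SourceManager::getSpellingColumnNumber supplies in the real code). CommentLine and formatLines are hypothetical names; the sketch ignores empty lines and lines with more than one token, and it checks that the string is non-empty before calling back(), since std::string::back() on an empty string is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical input: the first token of each comment line, i.e. the text
// right after the "//", "///" or " *" decoration.
struct CommentLine {
  unsigned Column;  // 1-based source column where Text starts
  std::string Text; // token spelling, possibly with leading whitespace
};

// Sketch of the indentation logic in getFormattedText's LexLine lambda.
std::string formatLines(const std::vector<CommentLine> &Lines) {
  std::string Result;
  unsigned IndentColumn = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    std::string_view Text = Lines[I].Text;
    std::size_t WhitespaceLen = Text.find_first_not_of(" \t");
    if (WhitespaceLen == std::string_view::npos)
      WhitespaceLen = Text.size();
    std::size_t SkipLen = WhitespaceLen;
    if (I == 0) {
      // First line: drop all leading whitespace and remember the column of
      // the first non-whitespace character.
      IndentColumn = Lines[I].Column + static_cast<unsigned>(WhitespaceLen);
    } else {
      // Later lines: drop whitespace only up to IndentColumn, so deeper
      // indentation is preserved relative to the first line.
      int Excess = static_cast<int>(IndentColumn) -
                   static_cast<int>(Lines[I].Column);
      SkipLen = std::min<std::size_t>(WhitespaceLen, std::max(Excess, 0));
    }
    Result += Text.substr(SkipLen);
    Result += '\n';
  }
  // Drop trailing newlines; the emptiness check keeps back() well-defined.
  while (!Result.empty() && Result.back() == '\n')
    Result.pop_back();
  return Result;
}
```

For example, for a "//" comment whose text starts in column 3, the lines "// Foo", "//   Bar" and "// Baz" become "Foo\n  Bar\nBaz": the first line sets IndentColumn to 4, so only one of the three leading spaces is stripped from the second line.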

Modified: cfe/trunk/unittests/AST/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CMakeLists.txt?rev=332458&r1=332457&r2=332458&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CMakeLists.txt?rev=332458&r1=332457&r2=332458&view=diff</a><br>

==============================================================================<br>

--- cfe/trunk/unittests/AST/CMakeLists.txt (original)<br>

+++ cfe/trunk/unittests/AST/CMakeLists.txt Wed May 16 05:30:09 2018<br>

@@ -9,6 +9,7 @@ add_clang_unittest(ASTTests<br>

   ASTVectorTest.cpp<br>

   CommentLexer.cpp<br>

   CommentParser.cpp<br>

+  CommentTextTest.cpp<br>

   DataCollectionTest.cpp<br>

   DeclPrinterTest.cpp<br>

   DeclTest.cpp<br>

<br>

Added: cfe/trunk/unittests/AST/CommentTextTest.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CommentTextTest.cpp?rev=332458&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CommentTextTest.cpp?rev=332458&view=auto</a><br>

==============================================================================<br>

--- cfe/trunk/unittests/AST/CommentTextTest.cpp (added)<br>

+++ cfe/trunk/unittests/AST/CommentTextTest.cpp Wed May 16 05:30:09 2018<br>

@@ -0,0 +1,122 @@<br>

+//===- unittest/AST/CommentTextTest.cpp - Comment text extraction test ----===//<br>

+//<br>

+//                     The LLVM Compiler Infrastructure<br>

+//<br>

+// This file is distributed under the University of Illinois Open Source<br>

+// License. See LICENSE.TXT for details.<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+//<br>

+// Tests for user-friendly output formatting of comments, i.e.<br>

+// RawComment::getFormattedText().<br>

+//<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+#include "clang/AST/RawCommentList.h"<br>

+#include "clang/Basic/CommentOptions.h"<br>

+#include "clang/Basic/Diagnostic.h"<br>

+#include "clang/Basic/DiagnosticIDs.h"<br>

+#include "clang/Basic/FileManager.h"<br>

+#include "clang/Basic/FileSystemOptions.h"<br>

+#include "clang/Basic/SourceLocation.h"<br>

+#include "clang/Basic/SourceManager.h"<br>

+#include "clang/Basic/VirtualFileSystem.h"<br>

+#include "llvm/Support/MemoryBuffer.h"<br>

+#include <gtest/gtest.h><br>

+<br>

+namespace clang {<br>

+<br>

+class CommentTextTest : public ::testing::Test {<br>

+protected:<br>

+  std::string formatComment(llvm::StringRef CommentText) {<br>

+    SourceManagerForFile FileSourceMgr("comment-test.cpp", CommentText);<br>

+    SourceManager& SourceMgr = FileSourceMgr.get();<br>

+<br>

+    auto CommentStartOffset = CommentText.find("/");<br>

+    assert(CommentStartOffset != llvm::StringRef::npos);<br>

+    FileID File = SourceMgr.getMainFileID();<br>

+<br>

+    SourceRange CommentRange(<br>

+        SourceMgr.getLocForStartOfFile(File).getLocWithOffset(<br>

+            CommentStartOffset),<br>

+        SourceMgr.getLocForEndOfFile(File));<br>

+    CommentOptions EmptyOpts;<br>

+    // FIXME: technically, the Merged flag we set here is incorrect, but that<br>

+    // shouldn't matter.<br>

+    RawComment Comment(SourceMgr, CommentRange, EmptyOpts, /*Merged=*/true);<br>

+    DiagnosticsEngine Diags(new DiagnosticIDs, new DiagnosticOptions);<br>

+    return Comment.getFormattedText(SourceMgr, Diags);<br>

+  }<br>

+};<br>

+<br>

+TEST_F(CommentTextTest, FormattedText) {<br>

+  // clang-format off<br>

+  auto ExpectedOutput =<br>

+R"(This function does this and that.<br>

+For example,<br>

+   Runnning it in that case will give you<br>

+   this result.<br>

+That's about it.)";<br>

+  // Two-slash comments.<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+// This function does this and that.<br>

+// For example,<br>

+//    Runnning it in that case will give you<br>

+//    this result.<br>

+// That's about it.)cpp"));<br>

+<br>

+  // Three-slash comments.<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+/// This function does this and that.<br>

+/// For example,<br>

+///    Runnning it in that case will give you<br>

+///    this result.<br>

+/// That's about it.)cpp"));<br>

+<br>

+  // Block comments.<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+/* This function does this and that.<br>

+ * For example,<br>

+ *    Runnning it in that case will give you<br>

+ *    this result.<br>

+ * That's about it.*/)cpp"));<br>

+<br>

+  // Doxygen-style block comments.<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+/** This function does this and that.<br>

+  * For example,<br>

+  *    Runnning it in that case will give you<br>

+  *    this result.<br>

+  * That's about it.*/)cpp"));<br>

+<br>

+  // Weird indentation.<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+       // This function does this and that.<br>

+  //      For example,<br>

+  //         Runnning it in that case will give you<br>

+        //   this result.<br>

+       // That's about it.)cpp"));<br>

+  // clang-format on<br>

+}<br>

+<br>

+TEST_F(CommentTextTest, KeepsDoxygenControlSeqs) {<br>

+  // clang-format off<br>

+  auto ExpectedOutput =<br>

+R"(\brief This is the brief part of the comment.<br>

+\param a something about a.<br>

+@param b something about b.)";<br>

+<br>

+  EXPECT_EQ(ExpectedOutput, formatComment(<br>

+R"cpp(<br>

+/// \brief This is the brief part of the comment.<br>

+/// \param a something about a.<br>

+/// @param b something about b.)cpp"));<br>

+  // clang-format on<br>

+}<br>

+<br>

+} // namespace clang<br>

<br>

<br>

_______________________________________________<br>

cfe-commits mailing list<br>

<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits</a><br>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div><br>

<br></blockquote></div><br></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Regards,</div><div>Ilya Biryukov</div></div></div></div></div>