r332458 - [AST] Added a helper to extract a user-friendly text of a comment.

Thu May 17 02:32:30 PDT 2018

Terribly sorry for the breakage and many thanks for fixing this!
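
For context, the error quoted below (`unterminated raw string` at an `R"cpp(` that sits inside an `EXPECT_EQ(...)` call) looks like the problem some older GCC releases are known to have with multi-line raw string literals passed directly as macro arguments. What r332576 actually changed is not shown in this thread, so the following is only a minimal sketch of the usual workaround, assuming that diagnosis is right: bind the raw string to a variable first and pass the variable to the macro. `CHECK_SAME` and `rawStringHoistedOutOfMacro` are hypothetical names used for illustration, with `CHECK_SAME` standing in for `EXPECT_EQ`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a test macro such as EXPECT_EQ.
#define CHECK_SAME(Expected, Actual) ((Expected) == (Actual))

// Returns true when the multi-line raw string, defined outside of any macro
// invocation, compares equal to the same text written with escapes.
bool rawStringHoistedOutOfMacro() {
  // The raw string lives in a plain declaration, not in a macro argument.
  const std::string Input = R"cpp(
// first line
// second line)cpp";
  const std::string Expected = "\n// first line\n// second line";
  // Only identifiers are passed to the macro.
  return CHECK_SAME(Expected, Input);
}
```

Calling `rawStringHoistedOutOfMacro()` from a test body keeps the literal text unchanged while keeping the macro arguments on a single line.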

On Thu, May 17, 2018 at 9:04 AM Clement Courbet <courbet at google.com> wrote:

> I should have fixed it in r332576.
>
> On Wed, May 16, 2018 at 11:49 PM, Galina Kistanova via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
>> Also, a few other builders are affected:
>>
>> http://lab.llvm.org:8011/builders/clang-x86_64-linux-abi-test
>> http://lab.llvm.org:8011/builders/clang-lld-x86_64-2stage
>> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu
>>
>>
>> Thanks
>>
>> Galina
>>
>> On Wed, May 16, 2018 at 12:58 PM, Galina Kistanova <gkistanova at gmail.com>
>> wrote:
>>
>>> Hello Ilya,
>>>
>>> This commit broke the build step for a couple of our builders:
>>>
>>> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/8541
>>> http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu
>>>
>>> . . .
>>> FAILED:
>>> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
>>> /usr/bin/c++   -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0
>>> -DGTEST_LANG_CXX11=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
>>> -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/unittests/AST
>>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST
>>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/include
>>> -Itools/clang/include -Iinclude
>>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/include
>>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googletest/include
>>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googlemock/include
>>> -fPIC -fvisibility-inlines-hidden -std=c++11 -Wall -W -Wno-unused-parameter
>>> -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic
>>> -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor
>>> -Wno-comment -ffunction-sections -fdata-sections -fno-common
>>> -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG
>>> -Wno-variadic-macros -fno-exceptions -fno-rtti -MD -MT
>>> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o -MF
>>> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o.d
>>> -o tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
>>> -c
>>> /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp
>>> /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp:62:1:
>>> error: unterminated raw string
>>>  R"cpp(
>>>  ^
>>> . . .
>>>
>>> Could you please have a look?
>>>
>>> The builder was already red and did not send notifications.
>>>
>>> Thanks
>>>
>>> Galina
>>>
>>>
>>>
>>> On Wed, May 16, 2018 at 5:30 AM, Ilya Biryukov via cfe-commits <
>>> cfe-commits at lists.llvm.org> wrote:
>>>
>>>> Author: ibiryukov
>>>> Date: Wed May 16 05:30:09 2018
>>>> New Revision: 332458
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=332458&view=rev
>>>> Log:
>>>> [AST] Added a helper to extract a user-friendly text of a comment.
>>>>
>>>> Summary:
>>>> The helper is used in clangd for documentation shown in code completion
>>>> and storing the docs in the symbols. See D45999.
>>>>
>>>> This patch reuses the code of the Doxygen comment lexer, disabling the
>>>> bits that do command and html tag parsing.
>>>> The new helper works on all comments, including non-doxygen comments.
>>>> However, it does not understand or transform any doxygen directives,
>>>> i.e. cannot extract brief text, etc.
>>>>
>>>> Reviewers: sammccall, hokein, ioeric
>>>>
>>>> Reviewed By: ioeric
>>>>
>>>> Subscribers: mgorny, cfe-commits
>>>>
>>>> Differential Revision: https://reviews.llvm.org/D46000
>>>>
>>>> Added:
>>>>     cfe/trunk/unittests/AST/CommentTextTest.cpp
>>>> Modified:
>>>>     cfe/trunk/include/clang/AST/CommentLexer.h
>>>>     cfe/trunk/include/clang/AST/RawCommentList.h
>>>>     cfe/trunk/lib/AST/CommentLexer.cpp
>>>>     cfe/trunk/lib/AST/RawCommentList.cpp
>>>>     cfe/trunk/unittests/AST/CMakeLists.txt
>>>>
>>>> Modified: cfe/trunk/include/clang/AST/CommentLexer.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/CommentLexer.h?rev=332458&r1=332457&r2=332458&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/AST/CommentLexer.h (original)
>>>> +++ cfe/trunk/include/clang/AST/CommentLexer.h Wed May 16 05:30:09 2018
>>>> @@ -281,6 +281,11 @@ private:
>>>>    /// command, including command marker.
>>>>    SmallString<16> VerbatimBlockEndCommandName;
>>>>
>>>> +  /// If true, the commands, html tags, etc will be parsed and reported as
>>>> +  /// separate tokens inside the comment body. If false, the comment text will
>>>> +  /// be parsed into text and newline tokens.
>>>> +  bool ParseCommands;
>>>> +
>>>>    /// Given a character reference name (e.g., "lt"), return the
>>>> character that
>>>>    /// it stands for (e.g., "<").
>>>>    StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;
>>>> @@ -315,12 +320,11 @@ private:
>>>>    /// Eat string matching regexp \code \s*\* \endcode.
>>>>    void skipLineStartingDecorations();
>>>>
>>>> -  /// Lex stuff inside comments.  CommentEnd should be set correctly.
>>>> +  /// Lex comment text, including commands if ParseCommands is set to true.
>>>>    void lexCommentText(Token &T);
>>>>
>>>> -  void setupAndLexVerbatimBlock(Token &T,
>>>> -                                const char *TextBegin,
>>>> -                                char Marker, const CommandInfo *Info);
>>>> +  void setupAndLexVerbatimBlock(Token &T, const char *TextBegin, char Marker,
>>>> +                                const CommandInfo *Info);
>>>>
>>>>    void lexVerbatimBlockFirstLine(Token &T);
>>>>
>>>> @@ -343,14 +347,13 @@ private:
>>>>
>>>>  public:
>>>>    Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
>>>> -        const CommandTraits &Traits,
>>>> -        SourceLocation FileLoc,
>>>> -        const char *BufferStart, const char *BufferEnd);
>>>> +        const CommandTraits &Traits, SourceLocation FileLoc,
>>>> +        const char *BufferStart, const char *BufferEnd,
>>>> +        bool ParseCommands = true);
>>>>
>>>>    void lex(Token &T);
>>>>
>>>> -  StringRef getSpelling(const Token &Tok,
>>>> -                        const SourceManager &SourceMgr,
>>>> +  StringRef getSpelling(const Token &Tok, const SourceManager &SourceMgr,
>>>>                          bool *Invalid = nullptr) const;
>>>>  };
>>>>
>>>>
>>>> Modified: cfe/trunk/include/clang/AST/RawCommentList.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/RawCommentList.h?rev=332458&r1=332457&r2=332458&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/AST/RawCommentList.h (original)
>>>> +++ cfe/trunk/include/clang/AST/RawCommentList.h Wed May 16 05:30:09
>>>> 2018
>>>> @@ -111,6 +111,30 @@ public:
>>>>      return extractBriefText(Context);
>>>>    }
>>>>
>>>> +  /// Returns sanitized comment text, suitable for presentation in
>>>> editor UIs.
>>>> +  /// E.g. will transform:
>>>> +  ///     // This is a long multiline comment.
>>>> +  ///     //   Parts of it  might be indented.
>>>> +  ///     /* The comments styles might be mixed. */
>>>> +  ///  into
>>>> +  ///     "This is a long multiline comment.\n"
>>>> +  ///     "  Parts of it  might be indented.\n"
>>>> +  ///     "The comments styles might be mixed."
>>>> +  /// Also removes leading indentation and sanitizes some common cases:
>>>> +  ///     /* This is a first line.
>>>> +  ///      *   This is a second line. It is indented.
>>>> +  ///      * This is a third line. */
>>>> +  /// and
>>>> +  ///     /* This is a first line.
>>>> +  ///          This is a second line. It is indented.
>>>> +  ///     This is a third line. */
>>>> +  /// will both turn into:
>>>> +  ///     "This is a first line.\n"
>>>> +  ///     "  This is a second line. It is indented.\n"
>>>> +  ///     "This is a third line."
>>>> +  std::string getFormattedText(const SourceManager &SourceMgr,
>>>> +                               DiagnosticsEngine &Diags) const;
>>>> +
>>>>    /// Parse the comment, assuming it is attached to decl \c D.
>>>>    comments::FullComment *parse(const ASTContext &Context,
>>>>                                 const Preprocessor *PP, const Decl *D)
>>>> const;
>>>>
>>>> Modified: cfe/trunk/lib/AST/CommentLexer.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/CommentLexer.cpp?rev=332458&r1=332457&r2=332458&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/AST/CommentLexer.cpp (original)
>>>> +++ cfe/trunk/lib/AST/CommentLexer.cpp Wed May 16 05:30:09 2018
>>>> @@ -294,6 +294,39 @@ void Lexer::lexCommentText(Token &T) {
>>>>    assert(CommentState == LCS_InsideBCPLComment ||
>>>>           CommentState == LCS_InsideCComment);
>>>>
>>>> +  // Handles lexing non-command text, i.e. text and newline.
>>>> +  auto HandleNonCommandToken = [&]() -> void {
>>>> +    assert(State == LS_Normal);
>>>> +
>>>> +    const char *TokenPtr = BufferPtr;
>>>> +    assert(TokenPtr < CommentEnd);
>>>> +    switch (*TokenPtr) {
>>>> +      case '\n':
>>>> +      case '\r':
>>>> +          TokenPtr = skipNewline(TokenPtr, CommentEnd);
>>>> +          formTokenWithChars(T, TokenPtr, tok::newline);
>>>> +
>>>> +          if (CommentState == LCS_InsideCComment)
>>>> +            skipLineStartingDecorations();
>>>> +          return;
>>>> +
>>>> +      default: {
>>>> +          StringRef TokStartSymbols = ParseCommands ? "\n\r\\@&<" : "\n\r";
>>>> +          size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr)
>>>> +                           .find_first_of(TokStartSymbols);
>>>> +          if (End != StringRef::npos)
>>>> +            TokenPtr += End;
>>>> +          else
>>>> +            TokenPtr = CommentEnd;
>>>> +          formTextToken(T, TokenPtr);
>>>> +          return;
>>>> +      }
>>>> +    }
>>>> +  };
>>>> +
>>>> +  if (!ParseCommands)
>>>> +    return HandleNonCommandToken();
>>>> +
>>>>    switch (State) {
>>>>    case LS_Normal:
>>>>      break;
>>>> @@ -315,136 +348,116 @@ void Lexer::lexCommentText(Token &T) {
>>>>    }
>>>>
>>>>    assert(State == LS_Normal);
>>>> -
>>>>    const char *TokenPtr = BufferPtr;
>>>>    assert(TokenPtr < CommentEnd);
>>>> -  while (TokenPtr != CommentEnd) {
>>>> -    switch(*TokenPtr) {
>>>> -      case '\\':
>>>> -      case '@': {
>>>> -        // Commands that start with a backslash and commands that
>>>> start with
>>>> -        // 'at' have equivalent semantics.  But we keep information
>>>> about the
>>>> -        // exact syntax in AST for comments.
>>>> -        tok::TokenKind CommandKind =
>>>> -            (*TokenPtr == '@') ? tok::at_command :
>>>> tok::backslash_command;
>>>> +  switch(*TokenPtr) {
>>>> +    case '\\':
>>>> +    case '@': {
>>>> +      // Commands that start with a backslash and commands that start with
>>>> +      // 'at' have equivalent semantics.  But we keep information about the
>>>> +      // exact syntax in AST for comments.
>>>> +      tok::TokenKind CommandKind =
>>>> +          (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
>>>> +      TokenPtr++;
>>>> +      if (TokenPtr == CommentEnd) {
>>>> +        formTextToken(T, TokenPtr);
>>>> +        return;
>>>> +      }
>>>> +      char C = *TokenPtr;
>>>> +      switch (C) {
>>>> +      default:
>>>> +        break;
>>>> +
>>>> +      case '\\': case '@': case '&': case '$':
>>>> +      case '#':  case '<': case '>': case '%':
>>>> +      case '\"': case '.': case ':':
>>>> +        // This is one of \\ \@ \& \$ etc escape sequences.
>>>>          TokenPtr++;
>>>> -        if (TokenPtr == CommentEnd) {
>>>> -          formTextToken(T, TokenPtr);
>>>> -          return;
>>>> -        }
>>>> -        char C = *TokenPtr;
>>>> -        switch (C) {
>>>> -        default:
>>>> -          break;
>>>> -
>>>> -        case '\\': case '@': case '&': case '$':
>>>> -        case '#':  case '<': case '>': case '%':
>>>> -        case '\"': case '.': case ':':
>>>> -          // This is one of \\ \@ \& \$ etc escape sequences.
>>>> +        if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>>>> +          // This is the \:: escape sequence.
>>>>            TokenPtr++;
>>>> -          if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>>>> -            // This is the \:: escape sequence.
>>>> -            TokenPtr++;
>>>> -          }
>>>> -          StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr
>>>> + 1));
>>>> -          formTokenWithChars(T, TokenPtr, tok::text);
>>>> -          T.setText(UnescapedText);
>>>> -          return;
>>>>          }
>>>> +        StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));
>>>> +        formTokenWithChars(T, TokenPtr, tok::text);
>>>> +        T.setText(UnescapedText);
>>>> +        return;
>>>> +      }
>>>>
>>>> -        // Don't make zero-length commands.
>>>> -        if (!isCommandNameStartCharacter(*TokenPtr)) {
>>>> -          formTextToken(T, TokenPtr);
>>>> -          return;
>>>> -        }
>>>> +      // Don't make zero-length commands.
>>>> +      if (!isCommandNameStartCharacter(*TokenPtr)) {
>>>> +        formTextToken(T, TokenPtr);
>>>> +        return;
>>>> +      }
>>>>
>>>> -        TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>>>> -        unsigned Length = TokenPtr - (BufferPtr + 1);
>>>> +      TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>>>> +      unsigned Length = TokenPtr - (BufferPtr + 1);
>>>>
>>>> -        // Hardcoded support for lexing LaTeX formula commands
>>>> -        // \f$ \f[ \f] \f{ \f} as a single command.
>>>> -        if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr !=
>>>> CommentEnd) {
>>>> -          C = *TokenPtr;
>>>> -          if (C == '$' || C == '[' || C == ']' || C == '{' || C ==
>>>> '}') {
>>>> -            TokenPtr++;
>>>> -            Length++;
>>>> -          }
>>>> +      // Hardcoded support for lexing LaTeX formula commands
>>>> +      // \f$ \f[ \f] \f{ \f} as a single command.
>>>> +      if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {
>>>> +        C = *TokenPtr;
>>>> +        if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {
>>>> +          TokenPtr++;
>>>> +          Length++;
>>>>          }
>>>> +      }
>>>>
>>>> -        StringRef CommandName(BufferPtr + 1, Length);
>>>> +      StringRef CommandName(BufferPtr + 1, Length);
>>>>
>>>> -        const CommandInfo *Info =
>>>> Traits.getCommandInfoOrNULL(CommandName);
>>>> -        if (!Info) {
>>>> -          if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>>>> -            StringRef CorrectedName = Info->Name;
>>>> -            SourceLocation Loc = getSourceLocation(BufferPtr);
>>>> -            SourceLocation EndLoc = getSourceLocation(TokenPtr);
>>>> -            SourceRange FullRange = SourceRange(Loc, EndLoc);
>>>> -            SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>>>> -            Diag(Loc, diag::warn_correct_comment_command_name)
>>>> -              << FullRange << CommandName << CorrectedName
>>>> -              << FixItHint::CreateReplacement(CommandRange,
>>>> CorrectedName);
>>>> -          } else {
>>>> -            formTokenWithChars(T, TokenPtr, tok::unknown_command);
>>>> -            T.setUnknownCommandName(CommandName);
>>>> -            Diag(T.getLocation(),
>>>> diag::warn_unknown_comment_command_name)
>>>> -                << SourceRange(T.getLocation(), T.getEndLocation());
>>>> -            return;
>>>> -          }
>>>> -        }
>>>> -        if (Info->IsVerbatimBlockCommand) {
>>>> -          setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>>>> +      const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);
>>>> +      if (!Info) {
>>>> +        if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>>>> +          StringRef CorrectedName = Info->Name;
>>>> +          SourceLocation Loc = getSourceLocation(BufferPtr);
>>>> +          SourceLocation EndLoc = getSourceLocation(TokenPtr);
>>>> +          SourceRange FullRange = SourceRange(Loc, EndLoc);
>>>> +          SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>>>> +          Diag(Loc, diag::warn_correct_comment_command_name)
>>>> +            << FullRange << CommandName << CorrectedName
>>>> +            << FixItHint::CreateReplacement(CommandRange, CorrectedName);
>>>> +        } else {
>>>> +          formTokenWithChars(T, TokenPtr, tok::unknown_command);
>>>> +          T.setUnknownCommandName(CommandName);
>>>> +          Diag(T.getLocation(), diag::warn_unknown_comment_command_name)
>>>> +              << SourceRange(T.getLocation(), T.getEndLocation());
>>>>            return;
>>>>          }
>>>> -        if (Info->IsVerbatimLineCommand) {
>>>> -          setupAndLexVerbatimLine(T, TokenPtr, Info);
>>>> -          return;
>>>> -        }
>>>> -        formTokenWithChars(T, TokenPtr, CommandKind);
>>>> -        T.setCommandID(Info->getID());
>>>> -        return;
>>>>        }
>>>> -
>>>> -      case '&':
>>>> -        lexHTMLCharacterReference(T);
>>>> +      if (Info->IsVerbatimBlockCommand) {
>>>> +        setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>>>>          return;
>>>> -
>>>> -      case '<': {
>>>> -        TokenPtr++;
>>>> -        if (TokenPtr == CommentEnd) {
>>>> -          formTextToken(T, TokenPtr);
>>>> -          return;
>>>> -        }
>>>> -        const char C = *TokenPtr;
>>>> -        if (isHTMLIdentifierStartingCharacter(C))
>>>> -          setupAndLexHTMLStartTag(T);
>>>> -        else if (C == '/')
>>>> -          setupAndLexHTMLEndTag(T);
>>>> -        else
>>>> -          formTextToken(T, TokenPtr);
>>>> +      }
>>>> +      if (Info->IsVerbatimLineCommand) {
>>>> +        setupAndLexVerbatimLine(T, TokenPtr, Info);
>>>>          return;
>>>>        }
>>>> +      formTokenWithChars(T, TokenPtr, CommandKind);
>>>> +      T.setCommandID(Info->getID());
>>>> +      return;
>>>> +    }
>>>>
>>>> -      case '\n':
>>>> -      case '\r':
>>>> -        TokenPtr = skipNewline(TokenPtr, CommentEnd);
>>>> -        formTokenWithChars(T, TokenPtr, tok::newline);
>>>> -
>>>> -        if (CommentState == LCS_InsideCComment)
>>>> -          skipLineStartingDecorations();
>>>> -        return;
>>>> +    case '&':
>>>> +      lexHTMLCharacterReference(T);
>>>> +      return;
>>>>
>>>> -      default: {
>>>> -        size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr).
>>>> -                         find_first_of("\n\r\\@&<");
>>>> -        if (End != StringRef::npos)
>>>> -          TokenPtr += End;
>>>> -        else
>>>> -          TokenPtr = CommentEnd;
>>>> +    case '<': {
>>>> +      TokenPtr++;
>>>> +      if (TokenPtr == CommentEnd) {
>>>>          formTextToken(T, TokenPtr);
>>>>          return;
>>>>        }
>>>> +      const char C = *TokenPtr;
>>>> +      if (isHTMLIdentifierStartingCharacter(C))
>>>> +        setupAndLexHTMLStartTag(T);
>>>> +      else if (C == '/')
>>>> +        setupAndLexHTMLEndTag(T);
>>>> +      else
>>>> +        formTextToken(T, TokenPtr);
>>>> +      return;
>>>>      }
>>>> +
>>>> +    default:
>>>> +      return HandleNonCommandToken();
>>>>    }
>>>>  }
>>>>
>>>> @@ -727,14 +740,13 @@ void Lexer::lexHTMLEndTag(Token &T) {
>>>>  }
>>>>
>>>>  Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
>>>> -             const CommandTraits &Traits,
>>>> -             SourceLocation FileLoc,
>>>> -             const char *BufferStart, const char *BufferEnd):
>>>> -    Allocator(Allocator), Diags(Diags), Traits(Traits),
>>>> -    BufferStart(BufferStart), BufferEnd(BufferEnd),
>>>> -    FileLoc(FileLoc), BufferPtr(BufferStart),
>>>> -    CommentState(LCS_BeforeComment), State(LS_Normal) {
>>>> -}
>>>> +             const CommandTraits &Traits, SourceLocation FileLoc,
>>>> +             const char *BufferStart, const char *BufferEnd,
>>>> +             bool ParseCommands)
>>>> +    : Allocator(Allocator), Diags(Diags), Traits(Traits),
>>>> +      BufferStart(BufferStart), BufferEnd(BufferEnd), FileLoc(FileLoc),
>>>> +      BufferPtr(BufferStart), CommentState(LCS_BeforeComment), State(LS_Normal),
>>>> +      ParseCommands(ParseCommands) {}
>>>>
>>>>  void Lexer::lex(Token &T) {
>>>>  again:
>>>>
>>>> Modified: cfe/trunk/lib/AST/RawCommentList.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/RawCommentList.cpp?rev=332458&r1=332457&r2=332458&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/AST/RawCommentList.cpp (original)
>>>> +++ cfe/trunk/lib/AST/RawCommentList.cpp Wed May 16 05:30:09 2018
>>>> @@ -335,3 +335,94 @@ void RawCommentList::addDeserializedComm
>>>>               BeforeThanCompare<RawComment>(SourceMgr));
>>>>    std::swap(Comments, MergedComments);
>>>>  }
>>>> +
>>>> +std::string RawComment::getFormattedText(const SourceManager &SourceMgr,
>>>> +                                         DiagnosticsEngine &Diags) const {
>>>> +  llvm::StringRef CommentText = getRawText(SourceMgr);
>>>> +  if (CommentText.empty())
>>>> +    return "";
>>>> +
>>>> +  llvm::BumpPtrAllocator Allocator;
>>>> +  // We do not parse any commands, so CommentOptions are ignored by
>>>> +  // comments::Lexer. Therefore, we just use default-constructed options.
>>>> +  CommentOptions DefOpts;
>>>> +  comments::CommandTraits EmptyTraits(Allocator, DefOpts);
>>>> +  comments::Lexer L(Allocator, Diags, EmptyTraits, getSourceRange().getBegin(),
>>>> +                    CommentText.begin(), CommentText.end(),
>>>> +                    /*ParseCommands=*/false);
>>>> +
>>>> +  std::string Result;
>>>> +  // A column number of the first non-whitespace token in the comment text.
>>>> +  // We skip whitespace up to this column, but keep the whitespace after this
>>>> +  // column. IndentColumn is calculated when lexing the first line and reused
>>>> +  // for the rest of lines.
>>>> +  unsigned IndentColumn = 0;
>>>> +
>>>> +  // Processes one line of the comment and adds it to the result.
>>>> +  // Handles skipping the indent at the start of the line.
>>>> +  // Returns false when eof is reached and true otherwise.
>>>> +  auto LexLine = [&](bool IsFirstLine) -> bool {
>>>> +    comments::Token Tok;
>>>> +    // Lex the first token on the line. We handle it separately, because we
>>>> +    // need to fix up its indentation.
>>>> +    L.lex(Tok);
>>>> +    if (Tok.is(comments::tok::eof))
>>>> +      return false;
>>>> +    if (Tok.is(comments::tok::newline)) {
>>>> +      Result += "\n";
>>>> +      return true;
>>>> +    }
>>>> +    llvm::StringRef TokText = L.getSpelling(Tok, SourceMgr);
>>>> +    bool LocInvalid = false;
>>>> +    unsigned TokColumn =
>>>> +        SourceMgr.getSpellingColumnNumber(Tok.getLocation(), &LocInvalid);
>>>> +    assert(!LocInvalid && "getFormattedText for invalid location");
>>>> +
>>>> +    // Amount of leading whitespace in TokText.
>>>> +    size_t WhitespaceLen = TokText.find_first_not_of(" \t");
>>>> +    if (WhitespaceLen == StringRef::npos)
>>>> +      WhitespaceLen = TokText.size();
>>>> +    // Remember the amount of whitespace we skipped in the first line to remove
>>>> +    // indent up to that column in the following lines.
>>>> +    if (IsFirstLine)
>>>> +      IndentColumn = TokColumn + WhitespaceLen;
>>>> +
>>>> +    // Amount of leading whitespace we actually want to skip.
>>>> +    // For the first line we skip all the whitespace.
>>>> +    // For the rest of the lines, we skip whitespace up to IndentColumn.
>>>> +    unsigned SkipLen =
>>>> +        IsFirstLine
>>>> +            ? WhitespaceLen
>>>> +            : std::min<size_t>(
>>>> +                  WhitespaceLen,
>>>> +                  std::max<int>(static_cast<int>(IndentColumn) - TokColumn, 0));
>>>> +    llvm::StringRef Trimmed = TokText.drop_front(SkipLen);
>>>> +    Result += Trimmed;
>>>> +    // Lex all tokens in the rest of the line.
>>>> +    for (L.lex(Tok); Tok.isNot(comments::tok::eof); L.lex(Tok)) {
>>>> +      if (Tok.is(comments::tok::newline)) {
>>>> +        Result += "\n";
>>>> +        return true;
>>>> +      }
>>>> +      Result += L.getSpelling(Tok, SourceMgr);
>>>> +    }
>>>> +    // We've reached the end of file token.
>>>> +    return false;
>>>> +  };
>>>> +
>>>> +  auto DropTrailingNewLines = [](std::string &Str) {
>>>> +    while (!Str.empty() && Str.back() == '\n')
>>>> +      Str.pop_back();
>>>> +  };
>>>> +
>>>> +  // Process first line separately to remember indent for the following lines.
>>>> +  if (!LexLine(/*IsFirstLine=*/true)) {
>>>> +    DropTrailingNewLines(Result);
>>>> +    return Result;
>>>> +  }
>>>> +  // Process the rest of the lines.
>>>> +  while (LexLine(/*IsFirstLine=*/false))
>>>> +    ;
>>>> +  DropTrailingNewLines(Result);
>>>> +  return Result;
>>>> +}
>>>>
>>>> Modified: cfe/trunk/unittests/AST/CMakeLists.txt
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CMakeLists.txt?rev=332458&r1=332457&r2=332458&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/unittests/AST/CMakeLists.txt (original)
>>>> +++ cfe/trunk/unittests/AST/CMakeLists.txt Wed May 16 05:30:09 2018
>>>> @@ -9,6 +9,7 @@ add_clang_unittest(ASTTests
>>>>    ASTVectorTest.cpp
>>>>    CommentLexer.cpp
>>>>    CommentParser.cpp
>>>> +  CommentTextTest.cpp
>>>>    DataCollectionTest.cpp
>>>>    DeclPrinterTest.cpp
>>>>    DeclTest.cpp
>>>>
>>>> Added: cfe/trunk/unittests/AST/CommentTextTest.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CommentTextTest.cpp?rev=332458&view=auto
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/unittests/AST/CommentTextTest.cpp (added)
>>>> +++ cfe/trunk/unittests/AST/CommentTextTest.cpp Wed May 16 05:30:09 2018
>>>> @@ -0,0 +1,122 @@
>>>> +//===- unittest/AST/CommentTextTest.cpp - Comment text extraction test ----===//
>>>> +//
>>>> +//                     The LLVM Compiler Infrastructure
>>>> +//
>>>> +// This file is distributed under the University of Illinois Open Source
>>>> +// License. See LICENSE.TXT for details.
>>>> +//
>>>> +//===----------------------------------------------------------------------===//
>>>> +//
>>>> +// Tests for user-friendly output formatting of comments, i.e.
>>>> +// RawComment::getFormattedText().
>>>> +//
>>>> +//===----------------------------------------------------------------------===//
>>>> +
>>>> +#include "clang/AST/RawCommentList.h"
>>>> +#include "clang/Basic/CommentOptions.h"
>>>> +#include "clang/Basic/Diagnostic.h"
>>>> +#include "clang/Basic/DiagnosticIDs.h"
>>>> +#include "clang/Basic/FileManager.h"
>>>> +#include "clang/Basic/FileSystemOptions.h"
>>>> +#include "clang/Basic/SourceLocation.h"
>>>> +#include "clang/Basic/SourceManager.h"
>>>> +#include "clang/Basic/VirtualFileSystem.h"
>>>> +#include "llvm/Support/MemoryBuffer.h"
>>>> +#include <gtest/gtest.h>
>>>> +
>>>> +namespace clang {
>>>> +
>>>> +class CommentTextTest : public ::testing::Test {
>>>> +protected:
>>>> +  std::string formatComment(llvm::StringRef CommentText) {
>>>> +    SourceManagerForFile FileSourceMgr("comment-test.cpp", CommentText);
>>>> +    SourceManager& SourceMgr = FileSourceMgr.get();
>>>> +
>>>> +    auto CommentStartOffset = CommentText.find("/");
>>>> +    assert(CommentStartOffset != llvm::StringRef::npos);
>>>> +    FileID File = SourceMgr.getMainFileID();
>>>> +
>>>> +    SourceRange CommentRange(
>>>> +        SourceMgr.getLocForStartOfFile(File).getLocWithOffset(
>>>> +            CommentStartOffset),
>>>> +        SourceMgr.getLocForEndOfFile(File));
>>>> +    CommentOptions EmptyOpts;
>>>> +    // FIXME: technically, the Merged flag that we set here is incorrect, but
>>>> +    // that shouldn't matter.
>>>> +    RawComment Comment(SourceMgr, CommentRange, EmptyOpts, /*Merged=*/true);
>>>> +    DiagnosticsEngine Diags(new DiagnosticIDs, new DiagnosticOptions);
>>>> +    return Comment.getFormattedText(SourceMgr, Diags);
>>>> +  }
>>>> +};
>>>> +
>>>> +TEST_F(CommentTextTest, FormattedText) {
>>>> +  // clang-format off
>>>> +  auto ExpectedOutput =
>>>> +R"(This function does this and that.
>>>> +For example,
>>>> +   Runnning it in that case will give you
>>>> +   this result.
>>>> +That's about it.)";
>>>> +  // Two-slash comments.
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +// This function does this and that.
>>>> +// For example,
>>>> +//    Runnning it in that case will give you
>>>> +//    this result.
>>>> +// That's about it.)cpp"));
>>>> +
>>>> +  // Three-slash comments.
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +/// This function does this and that.
>>>> +/// For example,
>>>> +///    Runnning it in that case will give you
>>>> +///    this result.
>>>> +/// That's about it.)cpp"));
>>>> +
>>>> +  // Block comments.
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +/* This function does this and that.
>>>> + * For example,
>>>> + *    Runnning it in that case will give you
>>>> + *    this result.
>>>> + * That's about it.*/)cpp"));
>>>> +
>>>> +  // Doxygen-style block comments.
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +/** This function does this and that.
>>>> +  * For example,
>>>> +  *    Runnning it in that case will give you
>>>> +  *    this result.
>>>> +  * That's about it.*/)cpp"));
>>>> +
>>>> +  // Weird indentation.
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +       // This function does this and that.
>>>> +  //      For example,
>>>> +  //         Runnning it in that case will give you
>>>> +        //   this result.
>>>> +       // That's about it.)cpp"));
>>>> +  // clang-format on
>>>> +}
>>>> +
>>>> +TEST_F(CommentTextTest, KeepsDoxygenControlSeqs) {
>>>> +  // clang-format off
>>>> +  auto ExpectedOutput =
>>>> +R"(\brief This is the brief part of the comment.
>>>> +\param a something about a.
>>>> +@param b something about b.)";
>>>> +
>>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>>> +R"cpp(
>>>> +/// \brief This is the brief part of the comment.
>>>> +/// \param a something about a.
>>>> +/// @param b something about b.)cpp"));
>>>> +  // clang-format on
>>>> +}
>>>> +
>>>> +} // namespace clang
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>>>
>>>
>>>
>>
>>
>
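
The effect of the `ParseCommands` flag in the quoted patch can be modelled outside of clang. The sketch below is a simplified stand-in, not the `comments::Lexer` API: `SimpleToken` and `splitCommentText` are hypothetical names, and with `ParseCommands` set it only shows where a text token would be cut, without producing real command or HTML tokens. It reuses the two stop-character sets from the patch (`"\n\r\\@&<"` versus `"\n\r"`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified token: either a newline or a run of text (hypothetical type).
struct SimpleToken {
  bool IsNewline;
  std::string Text;
};

// Splits Text into text and newline tokens. With ParseCommands == false only
// newlines end a text token; with ParseCommands == true the characters that
// can start a command, an HTML tag or a character reference end it as well.
std::vector<SimpleToken> splitCommentText(const std::string &Text,
                                          bool ParseCommands) {
  const std::string Stops = ParseCommands ? "\n\r\\@&<" : "\n\r";
  std::vector<SimpleToken> Tokens;
  std::size_t Pos = 0;
  while (Pos < Text.size()) {
    const char C = Text[Pos];
    if (C == '\n' || C == '\r') {
      // Treat "\r\n" as a single newline token.
      const bool IsCRLF =
          C == '\r' && Pos + 1 < Text.size() && Text[Pos + 1] == '\n';
      const std::size_t Len = IsCRLF ? 2 : 1;
      Tokens.push_back({true, Text.substr(Pos, Len)});
      Pos += Len;
      continue;
    }
    // Search from Pos + 1 so that a token starting at a stop character is
    // never empty (simplification: the real lexer would lex a command here).
    std::size_t End = Text.find_first_of(Stops, Pos + 1);
    if (End == std::string::npos)
      End = Text.size();
    Tokens.push_back({false, Text.substr(Pos, End - Pos)});
    Pos = End;
  }
  return Tokens;
}
```

For the input `a \b c` followed by a newline and `d`, the model yields three tokens with `ParseCommands == false` and four with `ParseCommands == true`, because the backslash then ends the first text token.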

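The indentation rule used by `getFormattedText` in the quoted patch (skip all leading whitespace on the first line, then skip whitespace only up to that column on the following lines) can also be sketched on its own. This is an illustrative model under stated assumptions, not the clang implementation: `LineToken` and `stripIndent` are hypothetical names, and each entry is assumed to hold the 1-based column and the text of the first token on a line, with the comment markers already removed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// First token of one comment line (hypothetical type): the column where the
// token starts and its text, including any leading whitespace.
struct LineToken {
  unsigned Column;
  std::string Text;
};

// Joins the lines with '\n' after removing the common indentation that is
// established by the first line.
std::string stripIndent(const std::vector<LineToken> &Lines) {
  std::string Result;
  std::size_t IndentColumn = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    const std::string &Text = Lines[I].Text;
    // Amount of leading whitespace in this token.
    std::size_t WhitespaceLen = Text.find_first_not_of(" \t");
    if (WhitespaceLen == std::string::npos)
      WhitespaceLen = Text.size();

    std::size_t SkipLen = 0;
    if (I == 0) {
      // First line: remember the column of the first non-whitespace
      // character and drop all leading whitespace.
      IndentColumn = Lines[I].Column + WhitespaceLen;
      SkipLen = WhitespaceLen;
    } else {
      // Later lines: drop whitespace only up to IndentColumn.
      const std::size_t Budget =
          IndentColumn > Lines[I].Column ? IndentColumn - Lines[I].Column : 0;
      SkipLen = std::min(WhitespaceLen, Budget);
    }
    if (I != 0)
      Result += '\n';
    Result += Text.substr(SkipLen);
  }
  return Result;
}
```

With tokens `{3, " first"}` and `{3, "   second"}` the second line keeps two of its three spaces, which matches the `//   Parts of it  might be indented.` example in the doc comment of the patch.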
-- 
Regards,
Ilya Biryukov