r332458 - [AST] Added a helper to extract a user-friendly text of a comment.

Wed May 16 14:49:16 PDT 2018

A few other builders are also affected:

http://lab.llvm.org:8011/builders/clang-x86_64-linux-abi-test
http://lab.llvm.org:8011/builders/clang-lld-x86_64-2stage
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu

Thanks

Galina

On Wed, May 16, 2018 at 12:58 PM, Galina Kistanova <gkistanova at gmail.com>
wrote:

> Hello Ilya,
>
> This commit broke the build step for a couple of our builders:
>
> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/8541
> http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu
>
> . . .
> FAILED: tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
>
> /usr/bin/c++   -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0
> -DGTEST_LANG_CXX11=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
> -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/unittests/AST
> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST
> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/include
> -Itools/clang/include -Iinclude
> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/include
> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googletest/include
> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googlemock/include
> -fPIC -fvisibility-inlines-hidden -std=c++11 -Wall -W -Wno-unused-parameter
> -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic
> -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor
> -Wno-comment -ffunction-sections -fdata-sections -fno-common
> -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG
> -Wno-variadic-macros -fno-exceptions -fno-rtti -MD -MT
> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
> -MF tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o.d
> -o tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
> -c /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp
> /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST/CommentTextTest.cpp:62:1: error: unterminated raw string
>  R"cpp(
>  ^
> . . .
>
> Could you please have a look?
>
> The builder was already red and did not send notifications.
>
> Thanks
>
> Galina
>
>
>
> On Wed, May 16, 2018 at 5:30 AM, Ilya Biryukov via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
>> Author: ibiryukov
>> Date: Wed May 16 05:30:09 2018
>> New Revision: 332458
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=332458&view=rev
>> Log:
>> [AST] Added a helper to extract a user-friendly text of a comment.
>>
>> Summary:
>> The helper is used in clangd for documentation shown in code completion
>> and storing the docs in the symbols. See D45999.
>>
>> This patch reuses the code of the Doxygen comment lexer, disabling the
>> bits that do command and html tag parsing.
>> The new helper works on all comments, including non-doxygen comments.
>> However, it does not understand or transform any doxygen directives,
>> i.e. cannot extract brief text, etc.
>>
>> Reviewers: sammccall, hokein, ioeric
>>
>> Reviewed By: ioeric
>>
>> Subscribers: mgorny, cfe-commits
>>
>> Differential Revision: https://reviews.llvm.org/D46000
>>
>> Added:
>>     cfe/trunk/unittests/AST/CommentTextTest.cpp
>> Modified:
>>     cfe/trunk/include/clang/AST/CommentLexer.h
>>     cfe/trunk/include/clang/AST/RawCommentList.h
>>     cfe/trunk/lib/AST/CommentLexer.cpp
>>     cfe/trunk/lib/AST/RawCommentList.cpp
>>     cfe/trunk/unittests/AST/CMakeLists.txt
>>
>> Modified: cfe/trunk/include/clang/AST/CommentLexer.h
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/CommentLexer.h?rev=332458&r1=332457&r2=332458&view=diff
>> ============================================================
>> ==================
>> --- cfe/trunk/include/clang/AST/CommentLexer.h (original)
>> +++ cfe/trunk/include/clang/AST/CommentLexer.h Wed May 16 05:30:09 2018
>> @@ -281,6 +281,11 @@ private:
>>    /// command, including command marker.
>>    SmallString<16> VerbatimBlockEndCommandName;
>>
>> +  /// If true, the commands, html tags, etc will be parsed and reported as
>> +  /// separate tokens inside the comment body. If false, the comment text will
>> +  /// be parsed into text and newline tokens.
>> +  bool ParseCommands;
>> +
>>    /// Given a character reference name (e.g., "lt"), return the character that
>>    /// it stands for (e.g., "<").
>>    StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;
>> @@ -315,12 +320,11 @@ private:
>>    /// Eat string matching regexp \code \s*\* \endcode.
>>    void skipLineStartingDecorations();
>>
>> -  /// Lex stuff inside comments.  CommentEnd should be set correctly.
>> +  /// Lex comment text, including commands if ParseCommands is set to true.
>>    void lexCommentText(Token &T);
>>
>> -  void setupAndLexVerbatimBlock(Token &T,
>> -                                const char *TextBegin,
>> -                                char Marker, const CommandInfo *Info);
>> +  void setupAndLexVerbatimBlock(Token &T, const char *TextBegin, char Marker,
>> +                                const CommandInfo *Info);
>>
>>    void lexVerbatimBlockFirstLine(Token &T);
>>
>> @@ -343,14 +347,13 @@ private:
>>
>>  public:
>>    Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
>> -        const CommandTraits &Traits,
>> -        SourceLocation FileLoc,
>> -        const char *BufferStart, const char *BufferEnd);
>> +        const CommandTraits &Traits, SourceLocation FileLoc,
>> +        const char *BufferStart, const char *BufferEnd,
>> +        bool ParseCommands = true);
>>
>>    void lex(Token &T);
>>
>> -  StringRef getSpelling(const Token &Tok,
>> -                        const SourceManager &SourceMgr,
>> +  StringRef getSpelling(const Token &Tok, const SourceManager &SourceMgr,
>>                          bool *Invalid = nullptr) const;
>>  };
>>
>>
>> Modified: cfe/trunk/include/clang/AST/RawCommentList.h
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/AST/RawCommentList.h?rev=332458&r1=332457&r2=332458&view=diff
>> ============================================================
>> ==================
>> --- cfe/trunk/include/clang/AST/RawCommentList.h (original)
>> +++ cfe/trunk/include/clang/AST/RawCommentList.h Wed May 16 05:30:09 2018
>> @@ -111,6 +111,30 @@ public:
>>      return extractBriefText(Context);
>>    }
>>
>> +  /// Returns sanitized comment text, suitable for presentation in editor UIs.
>> +  /// E.g. will transform:
>> +  ///     // This is a long multiline comment.
>> +  ///     //   Parts of it  might be indented.
>> +  ///     /* The comments styles might be mixed. */
>> +  ///  into
>> +  ///     "This is a long multiline comment.\n"
>> +  ///     "  Parts of it  might be indented.\n"
>> +  ///     "The comments styles might be mixed."
>> +  /// Also removes leading indentation and sanitizes some common cases:
>> +  ///     /* This is a first line.
>> +  ///      *   This is a second line. It is indented.
>> +  ///      * This is a third line. */
>> +  /// and
>> +  ///     /* This is a first line.
>> +  ///          This is a second line. It is indented.
>> +  ///     This is a third line. */
>> +  /// will both turn into:
>> +  ///     "This is a first line.\n"
>> +  ///     "  This is a second line. It is indented.\n"
>> +  ///     "This is a third line."
>> +  std::string getFormattedText(const SourceManager &SourceMgr,
>> +                               DiagnosticsEngine &Diags) const;
>> +
>>    /// Parse the comment, assuming it is attached to decl \c D.
>>    comments::FullComment *parse(const ASTContext &Context,
>>                                 const Preprocessor *PP, const Decl *D) const;
>>
>> Modified: cfe/trunk/lib/AST/CommentLexer.cpp
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/CommentLexer.cpp?rev=332458&r1=332457&r2=332458&view=diff
>> ============================================================
>> ==================
>> --- cfe/trunk/lib/AST/CommentLexer.cpp (original)
>> +++ cfe/trunk/lib/AST/CommentLexer.cpp Wed May 16 05:30:09 2018
>> @@ -294,6 +294,39 @@ void Lexer::lexCommentText(Token &T) {
>>    assert(CommentState == LCS_InsideBCPLComment ||
>>           CommentState == LCS_InsideCComment);
>>
>> +  // Handles lexing non-command text, i.e. text and newline.
>> +  auto HandleNonCommandToken = [&]() -> void {
>> +    assert(State == LS_Normal);
>> +
>> +    const char *TokenPtr = BufferPtr;
>> +    assert(TokenPtr < CommentEnd);
>> +    switch (*TokenPtr) {
>> +      case '\n':
>> +      case '\r':
>> +          TokenPtr = skipNewline(TokenPtr, CommentEnd);
>> +          formTokenWithChars(T, TokenPtr, tok::newline);
>> +
>> +          if (CommentState == LCS_InsideCComment)
>> +            skipLineStartingDecorations();
>> +          return;
>> +
>> +      default: {
>> +          StringRef TokStartSymbols = ParseCommands ? "\n\r\\@&<" : "\n\r";
>> +          size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr)
>> +                           .find_first_of(TokStartSymbols);
>> +          if (End != StringRef::npos)
>> +            TokenPtr += End;
>> +          else
>> +            TokenPtr = CommentEnd;
>> +          formTextToken(T, TokenPtr);
>> +          return;
>> +      }
>> +    }
>> +  };
>> +
>> +  if (!ParseCommands)
>> +    return HandleNonCommandToken();
>> +
>>    switch (State) {
>>    case LS_Normal:
>>      break;
>> @@ -315,136 +348,116 @@ void Lexer::lexCommentText(Token &T) {
>>    }
>>
>>    assert(State == LS_Normal);
>> -
>>    const char *TokenPtr = BufferPtr;
>>    assert(TokenPtr < CommentEnd);
>> -  while (TokenPtr != CommentEnd) {
>> -    switch(*TokenPtr) {
>> -      case '\\':
>> -      case '@': {
>> -        // Commands that start with a backslash and commands that start with
>> -        // 'at' have equivalent semantics.  But we keep information about the
>> -        // exact syntax in AST for comments.
>> -        tok::TokenKind CommandKind =
>> -            (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
>> +  switch(*TokenPtr) {
>> +    case '\\':
>> +    case '@': {
>> +      // Commands that start with a backslash and commands that start with
>> +      // 'at' have equivalent semantics.  But we keep information about the
>> +      // exact syntax in AST for comments.
>> +      tok::TokenKind CommandKind =
>> +          (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
>> +      TokenPtr++;
>> +      if (TokenPtr == CommentEnd) {
>> +        formTextToken(T, TokenPtr);
>> +        return;
>> +      }
>> +      char C = *TokenPtr;
>> +      switch (C) {
>> +      default:
>> +        break;
>> +
>> +      case '\\': case '@': case '&': case '$':
>> +      case '#':  case '<': case '>': case '%':
>> +      case '\"': case '.': case ':':
>> +        // This is one of \\ \@ \& \$ etc escape sequences.
>>          TokenPtr++;
>> -        if (TokenPtr == CommentEnd) {
>> -          formTextToken(T, TokenPtr);
>> -          return;
>> -        }
>> -        char C = *TokenPtr;
>> -        switch (C) {
>> -        default:
>> -          break;
>> -
>> -        case '\\': case '@': case '&': case '$':
>> -        case '#':  case '<': case '>': case '%':
>> -        case '\"': case '.': case ':':
>> -          // This is one of \\ \@ \& \$ etc escape sequences.
>> +        if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>> +          // This is the \:: escape sequence.
>>            TokenPtr++;
>> -          if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>> -            // This is the \:: escape sequence.
>> -            TokenPtr++;
>> -          }
>> -          StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));
>> -          formTokenWithChars(T, TokenPtr, tok::text);
>> -          T.setText(UnescapedText);
>> -          return;
>>          }
>> +        StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr + 1));
>> +        formTokenWithChars(T, TokenPtr, tok::text);
>> +        T.setText(UnescapedText);
>> +        return;
>> +      }
>>
>> -        // Don't make zero-length commands.
>> -        if (!isCommandNameStartCharacter(*TokenPtr)) {
>> -          formTextToken(T, TokenPtr);
>> -          return;
>> -        }
>> +      // Don't make zero-length commands.
>> +      if (!isCommandNameStartCharacter(*TokenPtr)) {
>> +        formTextToken(T, TokenPtr);
>> +        return;
>> +      }
>>
>> -        TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>> -        unsigned Length = TokenPtr - (BufferPtr + 1);
>> +      TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>> +      unsigned Length = TokenPtr - (BufferPtr + 1);
>>
>> -        // Hardcoded support for lexing LaTeX formula commands
>> -        // \f$ \f[ \f] \f{ \f} as a single command.
>> -        if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {
>> -          C = *TokenPtr;
>> -          if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {
>> -            TokenPtr++;
>> -            Length++;
>> -          }
>> +      // Hardcoded support for lexing LaTeX formula commands
>> +      // \f$ \f[ \f] \f{ \f} as a single command.
>> +      if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd) {
>> +        C = *TokenPtr;
>> +        if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {
>> +          TokenPtr++;
>> +          Length++;
>>          }
>> +      }
>>
>> -        StringRef CommandName(BufferPtr + 1, Length);
>> +      StringRef CommandName(BufferPtr + 1, Length);
>>
>> -        const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);
>> -        if (!Info) {
>> -          if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>> -            StringRef CorrectedName = Info->Name;
>> -            SourceLocation Loc = getSourceLocation(BufferPtr);
>> -            SourceLocation EndLoc = getSourceLocation(TokenPtr);
>> -            SourceRange FullRange = SourceRange(Loc, EndLoc);
>> -            SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>> -            Diag(Loc, diag::warn_correct_comment_command_name)
>> -              << FullRange << CommandName << CorrectedName
>> -              << FixItHint::CreateReplacement(CommandRange, CorrectedName);
>> -          } else {
>> -            formTokenWithChars(T, TokenPtr, tok::unknown_command);
>> -            T.setUnknownCommandName(CommandName);
>> -            Diag(T.getLocation(), diag::warn_unknown_comment_command_name)
>> -                << SourceRange(T.getLocation(), T.getEndLocation());
>> -            return;
>> -          }
>> -        }
>> -        if (Info->IsVerbatimBlockCommand) {
>> -          setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>> +      const CommandInfo *Info = Traits.getCommandInfoOrNULL(CommandName);
>> +      if (!Info) {
>> +        if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>> +          StringRef CorrectedName = Info->Name;
>> +          SourceLocation Loc = getSourceLocation(BufferPtr);
>> +          SourceLocation EndLoc = getSourceLocation(TokenPtr);
>> +          SourceRange FullRange = SourceRange(Loc, EndLoc);
>> +          SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>> +          Diag(Loc, diag::warn_correct_comment_command_name)
>> +            << FullRange << CommandName << CorrectedName
>> +            << FixItHint::CreateReplacement(CommandRange, CorrectedName);
>> +        } else {
>> +          formTokenWithChars(T, TokenPtr, tok::unknown_command);
>> +          T.setUnknownCommandName(CommandName);
>> +          Diag(T.getLocation(), diag::warn_unknown_comment_command_name)
>> +              << SourceRange(T.getLocation(), T.getEndLocation());
>>            return;
>>          }
>> -        if (Info->IsVerbatimLineCommand) {
>> -          setupAndLexVerbatimLine(T, TokenPtr, Info);
>> -          return;
>> -        }
>> -        formTokenWithChars(T, TokenPtr, CommandKind);
>> -        T.setCommandID(Info->getID());
>> -        return;
>>        }
>> -
>> -      case '&':
>> -        lexHTMLCharacterReference(T);
>> +      if (Info->IsVerbatimBlockCommand) {
>> +        setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>>          return;
>> -
>> -      case '<': {
>> -        TokenPtr++;
>> -        if (TokenPtr == CommentEnd) {
>> -          formTextToken(T, TokenPtr);
>> -          return;
>> -        }
>> -        const char C = *TokenPtr;
>> -        if (isHTMLIdentifierStartingCharacter(C))
>> -          setupAndLexHTMLStartTag(T);
>> -        else if (C == '/')
>> -          setupAndLexHTMLEndTag(T);
>> -        else
>> -          formTextToken(T, TokenPtr);
>> +      }
>> +      if (Info->IsVerbatimLineCommand) {
>> +        setupAndLexVerbatimLine(T, TokenPtr, Info);
>>          return;
>>        }
>> +      formTokenWithChars(T, TokenPtr, CommandKind);
>> +      T.setCommandID(Info->getID());
>> +      return;
>> +    }
>>
>> -      case '\n':
>> -      case '\r':
>> -        TokenPtr = skipNewline(TokenPtr, CommentEnd);
>> -        formTokenWithChars(T, TokenPtr, tok::newline);
>> -
>> -        if (CommentState == LCS_InsideCComment)
>> -          skipLineStartingDecorations();
>> -        return;
>> +    case '&':
>> +      lexHTMLCharacterReference(T);
>> +      return;
>>
>> -      default: {
>> -        size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr).
>> -                         find_first_of("\n\r\\@&<");
>> -        if (End != StringRef::npos)
>> -          TokenPtr += End;
>> -        else
>> -          TokenPtr = CommentEnd;
>> +    case '<': {
>> +      TokenPtr++;
>> +      if (TokenPtr == CommentEnd) {
>>          formTextToken(T, TokenPtr);
>>          return;
>>        }
>> +      const char C = *TokenPtr;
>> +      if (isHTMLIdentifierStartingCharacter(C))
>> +        setupAndLexHTMLStartTag(T);
>> +      else if (C == '/')
>> +        setupAndLexHTMLEndTag(T);
>> +      else
>> +        formTextToken(T, TokenPtr);
>> +      return;
>>      }
>> +
>> +    default:
>> +      return HandleNonCommandToken();
>>    }
>>  }
>>
>> @@ -727,14 +740,13 @@ void Lexer::lexHTMLEndTag(Token &T) {
>>  }
>>
>>  Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
>> -             const CommandTraits &Traits,
>> -             SourceLocation FileLoc,
>> -             const char *BufferStart, const char *BufferEnd):
>> -    Allocator(Allocator), Diags(Diags), Traits(Traits),
>> -    BufferStart(BufferStart), BufferEnd(BufferEnd),
>> -    FileLoc(FileLoc), BufferPtr(BufferStart),
>> -    CommentState(LCS_BeforeComment), State(LS_Normal) {
>> -}
>> +             const CommandTraits &Traits, SourceLocation FileLoc,
>> +             const char *BufferStart, const char *BufferEnd,
>> +             bool ParseCommands)
>> +    : Allocator(Allocator), Diags(Diags), Traits(Traits),
>> +      BufferStart(BufferStart), BufferEnd(BufferEnd), FileLoc(FileLoc),
>> +      BufferPtr(BufferStart), CommentState(LCS_BeforeComment), State(LS_Normal),
>> +      ParseCommands(ParseCommands) {}
>>
>>  void Lexer::lex(Token &T) {
>>  again:
>>
>> Modified: cfe/trunk/lib/AST/RawCommentList.cpp
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/RawCommentList.cpp?rev=332458&r1=332457&r2=332458&view=diff
>> ============================================================
>> ==================
>> --- cfe/trunk/lib/AST/RawCommentList.cpp (original)
>> +++ cfe/trunk/lib/AST/RawCommentList.cpp Wed May 16 05:30:09 2018
>> @@ -335,3 +335,94 @@ void RawCommentList::addDeserializedComm
>>               BeforeThanCompare<RawComment>(SourceMgr));
>>    std::swap(Comments, MergedComments);
>>  }
>> +
>> +std::string RawComment::getFormattedText(const SourceManager &SourceMgr,
>> +                                         DiagnosticsEngine &Diags) const {
>> +  llvm::StringRef CommentText = getRawText(SourceMgr);
>> +  if (CommentText.empty())
>> +    return "";
>> +
>> +  llvm::BumpPtrAllocator Allocator;
>> +  // We do not parse any commands, so CommentOptions are ignored by
>> +  // comments::Lexer. Therefore, we just use default-constructed options.
>> +  CommentOptions DefOpts;
>> +  comments::CommandTraits EmptyTraits(Allocator, DefOpts);
>> +  comments::Lexer L(Allocator, Diags, EmptyTraits, getSourceRange().getBegin(),
>> +                    CommentText.begin(), CommentText.end(),
>> +                    /*ParseCommands=*/false);
>> +
>> +  std::string Result;
>> +  // A column number of the first non-whitespace token in the comment text.
>> +  // We skip whitespace up to this column, but keep the whitespace after this
>> +  // column. IndentColumn is calculated when lexing the first line and reused
>> +  // for the rest of lines.
>> +  unsigned IndentColumn = 0;
>> +
>> +  // Processes one line of the comment and adds it to the result.
>> +  // Handles skipping the indent at the start of the line.
>> +  // Returns false when eof is reached and true otherwise.
>> +  auto LexLine = [&](bool IsFirstLine) -> bool {
>> +    comments::Token Tok;
>> +    // Lex the first token on the line. We handle it separately, because we to
>> +    // fix up its indentation.
>> +    L.lex(Tok);
>> +    if (Tok.is(comments::tok::eof))
>> +      return false;
>> +    if (Tok.is(comments::tok::newline)) {
>> +      Result += "\n";
>> +      return true;
>> +    }
>> +    llvm::StringRef TokText = L.getSpelling(Tok, SourceMgr);
>> +    bool LocInvalid = false;
>> +    unsigned TokColumn =
>> +        SourceMgr.getSpellingColumnNumber(Tok.getLocation(), &LocInvalid);
>> +    assert(!LocInvalid && "getFormattedText for invalid location");
>> +
>> +    // Amount of leading whitespace in TokText.
>> +    size_t WhitespaceLen = TokText.find_first_not_of(" \t");
>> +    if (WhitespaceLen == StringRef::npos)
>> +      WhitespaceLen = TokText.size();
>> +    // Remember the amount of whitespace we skipped in the first line to remove
>> +    // indent up to that column in the following lines.
>> +    if (IsFirstLine)
>> +      IndentColumn = TokColumn + WhitespaceLen;
>> +
>> +    // Amount of leading whitespace we actually want to skip.
>> +    // For the first line we skip all the whitespace.
>> +    // For the rest of the lines, we skip whitespace up to IndentColumn.
>> +    unsigned SkipLen =
>> +        IsFirstLine
>> +            ? WhitespaceLen
>> +            : std::min<size_t>(
>> +                  WhitespaceLen,
>> +                  std::max<int>(static_cast<int>(IndentColumn) - TokColumn, 0));
>> +    llvm::StringRef Trimmed = TokText.drop_front(SkipLen);
>> +    Result += Trimmed;
>> +    // Lex all tokens in the rest of the line.
>> +    for (L.lex(Tok); Tok.isNot(comments::tok::eof); L.lex(Tok)) {
>> +      if (Tok.is(comments::tok::newline)) {
>> +        Result += "\n";
>> +        return true;
>> +      }
>> +      Result += L.getSpelling(Tok, SourceMgr);
>> +    }
>> +    // We've reached the end of file token.
>> +    return false;
>> +  };
>> +
>> +  auto DropTrailingNewLines = [](std::string &Str) {
>> +    while (Str.back() == '\n')
>> +      Str.pop_back();
>> +  };
>> +
>> +  // Proces first line separately to remember indent for the following lines.
>> +  if (!LexLine(/*IsFirstLine=*/true)) {
>> +    DropTrailingNewLines(Result);
>> +    return Result;
>> +  }
>> +  // Process the rest of the lines.
>> +  while (LexLine(/*IsFirstLine=*/false))
>> +    ;
>> +  DropTrailingNewLines(Result);
>> +  return Result;
>> +}
>>
>> Modified: cfe/trunk/unittests/AST/CMakeLists.txt
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CMakeLists.txt?rev=332458&r1=332457&r2=332458&view=diff
>> ============================================================
>> ==================
>> --- cfe/trunk/unittests/AST/CMakeLists.txt (original)
>> +++ cfe/trunk/unittests/AST/CMakeLists.txt Wed May 16 05:30:09 2018
>> @@ -9,6 +9,7 @@ add_clang_unittest(ASTTests
>>    ASTVectorTest.cpp
>>    CommentLexer.cpp
>>    CommentParser.cpp
>> +  CommentTextTest.cpp
>>    DataCollectionTest.cpp
>>    DeclPrinterTest.cpp
>>    DeclTest.cpp
>>
>> Added: cfe/trunk/unittests/AST/CommentTextTest.cpp
>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/CommentTextTest.cpp?rev=332458&view=auto
>> ============================================================
>> ==================
>> --- cfe/trunk/unittests/AST/CommentTextTest.cpp (added)
>> +++ cfe/trunk/unittests/AST/CommentTextTest.cpp Wed May 16 05:30:09 2018
>> @@ -0,0 +1,122 @@
>> +//===- unittest/AST/CommentTextTest.cpp - Comment text extraction test ----===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// Tests for user-friendly output formatting of comments, i.e.
>> +// RawComment::getFormattedText().
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#include "clang/AST/RawCommentList.h"
>> +#include "clang/Basic/CommentOptions.h"
>> +#include "clang/Basic/Diagnostic.h"
>> +#include "clang/Basic/DiagnosticIDs.h"
>> +#include "clang/Basic/FileManager.h"
>> +#include "clang/Basic/FileSystemOptions.h"
>> +#include "clang/Basic/SourceLocation.h"
>> +#include "clang/Basic/SourceManager.h"
>> +#include "clang/Basic/VirtualFileSystem.h"
>> +#include "llvm/Support/MemoryBuffer.h"
>> +#include <gtest/gtest.h>
>> +
>> +namespace clang {
>> +
>> +class CommentTextTest : public ::testing::Test {
>> +protected:
>> +  std::string formatComment(llvm::StringRef CommentText) {
>> +    SourceManagerForFile FileSourceMgr("comment-test.cpp", CommentText);
>> +    SourceManager& SourceMgr = FileSourceMgr.get();
>> +
>> +    auto CommentStartOffset = CommentText.find("/");
>> +    assert(CommentStartOffset != llvm::StringRef::npos);
>> +    FileID File = SourceMgr.getMainFileID();
>> +
>> +    SourceRange CommentRange(
>> +        SourceMgr.getLocForStartOfFile(File).getLocWithOffset(
>> +            CommentStartOffset),
>> +        SourceMgr.getLocForEndOfFile(File));
>> +    CommentOptions EmptyOpts;
>> +    // FIXME: technically, merged that we set here is incorrect, but that
>> +    // shouldn't matter.
>> +    RawComment Comment(SourceMgr, CommentRange, EmptyOpts, /*Merged=*/true);
>> +    DiagnosticsEngine Diags(new DiagnosticIDs, new DiagnosticOptions);
>> +    return Comment.getFormattedText(SourceMgr, Diags);
>> +  }
>> +};
>> +
>> +TEST_F(CommentTextTest, FormattedText) {
>> +  // clang-format off
>> +  auto ExpectedOutput =
>> +R"(This function does this and that.
>> +For example,
>> +   Runnning it in that case will give you
>> +   this result.
>> +That's about it.)";
>> +  // Two-slash comments.
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +// This function does this and that.
>> +// For example,
>> +//    Runnning it in that case will give you
>> +//    this result.
>> +// That's about it.)cpp"));
>> +
>> +  // Three-slash comments.
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +/// This function does this and that.
>> +/// For example,
>> +///    Runnning it in that case will give you
>> +///    this result.
>> +/// That's about it.)cpp"));
>> +
>> +  // Block comments.
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +/* This function does this and that.
>> + * For example,
>> + *    Runnning it in that case will give you
>> + *    this result.
>> + * That's about it.*/)cpp"));
>> +
>> +  // Doxygen-style block comments.
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +/** This function does this and that.
>> +  * For example,
>> +  *    Runnning it in that case will give you
>> +  *    this result.
>> +  * That's about it.*/)cpp"));
>> +
>> +  // Weird indentation.
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +       // This function does this and that.
>> +  //      For example,
>> +  //         Runnning it in that case will give you
>> +        //   this result.
>> +       // That's about it.)cpp"));
>> +  // clang-format on
>> +}
>> +
>> +TEST_F(CommentTextTest, KeepsDoxygenControlSeqs) {
>> +  // clang-format off
>> +  auto ExpectedOutput =
>> +R"(\brief This is the brief part of the comment.
>> +\param a something about a.
>> +@param b something about b.)";
>> +
>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>> +R"cpp(
>> +/// \brief This is the brief part of the comment.
>> +/// \param a something about a.
>> +/// @param b something about b.)cpp"));
>> +  // clang-format on
>> +}
>> +
>> +} // namespace clang
>>
>
>
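A note for anyone hitting the same failure: the "unterminated raw string" error in the quoted log points at a multi-line R"cpp(...)cpp" literal written directly inside an EXPECT_EQ(...) invocation. Some older GCC releases are known to mishandle raw string literals that span several lines inside macro arguments, so one possible workaround is to hoist the literal into a variable before the macro sees it. This is an assumption about the cause, not necessarily the fix that was committed. A minimal sketch, where SAME_TEXT is a hypothetical stand-in for the gtest macro:

```cpp
#include <string>

// Hypothetical stand-in for a gtest-style comparison macro such as EXPECT_EQ.
#define SAME_TEXT(A, B) (std::string(A) == std::string(B))

// Returns true when the hoisted raw literal matches the escaped spelling.
bool rawStringHoistedOutOfMacro() {
  // The multi-line raw literal lives in an ordinary declaration, so the
  // macro below only ever receives plain identifiers as arguments.
  const char *Input = R"cpp(
// first
// second)cpp";
  // The same text written with ordinary escapes, for comparison.
  const char *Expected = "\n// first\n// second";
  return SAME_TEXT(Expected, Input);
}
```

With this shape the preprocessor never has to scan a raw literal while collecting macro arguments, which is the step that appears to go wrong on the affected builders.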
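The doc comment on RawComment::getFormattedText in the quoted patch describes the indentation rule: whitespace is removed up to the indent of the first line, and anything beyond that is kept. Below is a rough standalone sketch of just that rule. It assumes the comment markers have already been stripped from each line and ignores the source-column bookkeeping that the real code does through SourceManager. stripCommonIndent is a hypothetical name and not part of clang.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not clang API. Lines holds the comment body with the
// "//" or "*" markers already removed, one entry per source line.
std::string stripCommonIndent(const std::vector<std::string> &Lines) {
  std::string Result;
  std::size_t Indent = 0; // leading whitespace of the first line
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    const std::string &Line = Lines[I];
    // Amount of leading whitespace on this line.
    std::size_t Ws = Line.find_first_not_of(" \t");
    if (Ws == std::string::npos)
      Ws = Line.size();
    if (I == 0)
      Indent = Ws;
    // Skip at most the first line's indent; keep any extra indentation.
    Result += Line.substr(std::min(Ws, Indent));
    if (I + 1 != Lines.size())
      Result += '\n';
  }
  return Result;
}
```

For example, the lines " This is a first line.", "   This is a second line." and " This is a third line." keep the two extra spaces on the second line and lose the single leading space everywhere else.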