[cfe-commits] r159223 - in /cfe/trunk: include/clang-c/ include/clang/AST/ lib/AST/ test/Index/ tools/c-index-test/ tools/libclang/ unittests/ unittests/AST/

Tue Jun 26 17:18:38 PDT 2012

Hi, Dmitri. Sorry for not pre-commit reviewing.

On Jun 26, 2012, at 13:39 , Dmitri Gribenko <gribozavr at gmail.com> wrote:

> +  /// Contains text value associated with a token.
> +  const char *TextPtr1;
> +  unsigned TextLen1;
> +
> +  /// Contains text value associated with a token.
> +  const char *TextPtr2;
> +  unsigned TextLen2;

Why not StringRefs?

> +std::string BriefParser::Parse() {
> +  std::string FirstParagraph;
> +  std::string Brief;
> +  bool InFirstParagraph = true;
> +  bool InBrief = false;
> +  bool BriefDone = false;
> +
> +  while (Tok.isNot(tok::eof)) {
> +    if (Tok.is(tok::text)) {
> +      if (InFirstParagraph)
> +        FirstParagraph += Tok.getText();
> +      if (InBrief)
> +        Brief += Tok.getText();
> +      ConsumeToken();
> +      continue;
> +    }
> +
> +    if (!BriefDone && Tok.is(tok::command) && Tok.getCommandName() == "brief") {
> +      InBrief = true;
> +      ConsumeToken();
> +      continue;
> +    }
> +
> +    if (Tok.is(tok::newline)) {
> +      if (InFirstParagraph)
> +        FirstParagraph += '\n';
> +      if (InBrief)
> +        Brief += '\n';
> +      ConsumeToken();
> +
> +      if (Tok.is(tok::newline)) {
> +        ConsumeToken();
> +        // We found a paragraph end.
> +        InFirstParagraph = false;
> +        if (InBrief) {
> +          InBrief = false;
> +          BriefDone = true;
> +        }
> +      }
> +      continue;
> +    }
> +
> +    // We didn't handle this token, so just drop it.
> +    ConsumeToken();
> +  }
> +
> +  if (Brief.size() > 0)
> +    return Brief;
> +
> +  return FirstParagraph;
> +}

I think you can do this with one string. Once you have at least one \brief command, you don't need to track the first paragraph anymore. And is it really valid to have more than one \brief command, or can we bail out of this loop early if we find a complete \brief?

> +/// Return the one past end pointer for C comments.
> +/// Very dumb, does not handle escaped newlines or trigraphs.
> +const char *findCCommentEnd(const char *BufferPtr, const char *BufferEnd) {
> +  for ( ; BufferPtr != BufferEnd; ++BufferPtr) {
> +    if (*BufferPtr == '*') {
> +      assert(BufferPtr + 1 != BufferEnd);
> +      if (*(BufferPtr + 1) == '/')
> +        return BufferPtr;
> +    }
> +  }
> +  llvm_unreachable("buffer end hit before '*/' was seen");
> +}

What happens in a file with unterminated comments? What happens in a file with unterminated comments where the very last character is '*'?

(If the answer is "this function will never be called", I'm okay with that.)

> +
> +void Lexer::lexCommentText(Token &T) {
> +  assert(CommentState == LCS_InsideBCPLComment ||
> +         CommentState == LCS_InsideCComment);
> +
> +  switch (State) {
> +  case LS_Normal:
> +    break;
> +  case LS_VerbatimBlockFirstLine:
> +    lexVerbatimBlockFirstLine(T);
> +    return;
> +  case LS_VerbatimBlockBody:
> +    lexVerbatimBlockBody(T);
> +    return;
> +  case LS_HTMLOpenTag:
> +    lexHTMLOpenTag(T);
> +    return;
> +  }

This seems rather asymmetrical. Can we just factor normal mode into its own function as well?

> +  assert(State == LS_Normal);
> +
> +  const char *TokenPtr = BufferPtr;
> +  assert(TokenPtr < CommentEnd);
> +  while (TokenPtr != CommentEnd) {
> +    switch(*TokenPtr) {
> +      case '\\':
> +      case '@': {
> +        TokenPtr++;
> +        if (TokenPtr == CommentEnd) {
> +          formTokenWithChars(T, TokenPtr, tok::text);
> +          T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
> +          return;
> +        }

This looks funny. Can you just keep a "frame pointer" to where BufferPtr was at the beginning of the function, rather than relying on it being updated by formTokenWithChars and then counting backwards?

(This shows up in several more places in this function, but I'm only going to call out this one. Also, when the token body is trivially extractable from the entire token text, I'm not sure it's worth saving separately.)

> +      default: {
> +        while (true) {
> +          TokenPtr++;
> +          if (TokenPtr == CommentEnd)
> +            break;
> +          char C = *TokenPtr;
> +          if(C == '\n' || C == '\r' ||
> +             C == '\\' || C == '@' || C == '<')
> +            break;
> +        }
> +        formTokenWithChars(T, TokenPtr, tok::text);
> +        T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
> +        return;
> +      }

Should this function be the place where leading stars are removed in C-style comments?

I admit I only skimmed your test cases, but I didn't see anything else particularly noteworthy.

Jordan