[cfe-commits] r159223 - in /cfe/trunk: include/clang-c/ include/clang/AST/ lib/AST/ test/Index/ tools/c-index-test/ tools/libclang/ unittests/ unittests/AST/

Wed Jun 27 09:33:06 PDT 2012

Hi Jordan,

Thank you for review.

On Tue, Jun 26, 2012 at 5:18 PM, Jordan Rose <jordan_rose at apple.com> wrote:
> Hi, Dmitri. Sorry for not pre-commit reviewing.
>
> On Jun 26, 2012, at 13:39 , Dmitri Gribenko <gribozavr at gmail.com> wrote:
>
>> +  /// Contains text value associated with a token.
>> +  const char *TextPtr1;
>> +  unsigned TextLen1;
>> +
>> +  /// Contains text value associated with a token.
>> +  const char *TextPtr2;
>> +  unsigned TextLen2;
>
> Why not StringRefs?

Per Doug's comment:
> +  /// Contains text value associated with a token.
> +  StringRef TextValue1;
> +
> +  /// Contains text value associated with a token.
> +  StringRef TextValue2;
>
> It would be nice if Token were a POD type. Could we just store ptr/length for TextValue1 and TextValue2, so that Token remains a POD?

>> +std::string BriefParser::Parse() {
>> +  std::string FirstParagraph;
>> +  std::string Brief;
>> +  bool InFirstParagraph = true;
>> +  bool InBrief = false;
>> +  bool BriefDone = false;
>> +
>> +  while (Tok.isNot(tok::eof)) {
>> +    if (Tok.is(tok::text)) {
>> +      if (InFirstParagraph)
>> +        FirstParagraph += Tok.getText();
>> +      if (InBrief)
>> +        Brief += Tok.getText();
>> +      ConsumeToken();
>> +      continue;
>> +    }
>> +
>> +    if (!BriefDone && Tok.is(tok::command) && Tok.getCommandName() == "brief") {
>> +      InBrief = true;
>> +      ConsumeToken();
>> +      continue;
>> +    }
>> +
>> +    if (Tok.is(tok::newline)) {
>> +      if (InFirstParagraph)
>> +        FirstParagraph += '\n';
>> +      if (InBrief)
>> +        Brief += '\n';
>> +      ConsumeToken();
>> +
>> +      if (Tok.is(tok::newline)) {
>> +        ConsumeToken();
>> +        // We found a paragraph end.
>> +        InFirstParagraph = false;
>> +        if (InBrief) {
>> +          InBrief = false;
>> +          BriefDone = true;
>> +        }
>> +      }
>> +      continue;
>> +    }
>> +
>> +    // We didn't handle this token, so just drop it.
>> +    ConsumeToken();
>> +  }
>> +
>> +  if (Brief.size() > 0)
>> +    return Brief;
>> +
>> +  return FirstParagraph;
>> +}
>
> I think you can do this with one string. Once you have at least one \brief command, you don't need to track the first paragraph anymore.

Thanks, this simplifies logic, r159247.

> And is it really valid to have more than one \brief command, or can we bail out of this loop early if we find a complete \brief?

Although Doxygen docs state that multiple \brief paragraph would be
merged, I cannot reproduce this behavior (it only uses the first one).

>> +/// Return the one past end pointer for C comments.
>> +/// Very dumb, does not handle escaped newlines or trigraphs.
>> +const char *findCCommentEnd(const char *BufferPtr, const char *BufferEnd) {
>> +  for ( ; BufferPtr != BufferEnd; ++BufferPtr) {
>> +    if (*BufferPtr == '*') {
>> +      assert(BufferPtr + 1 != BufferEnd);
>> +      if (*(BufferPtr + 1) == '/')
>> +        return BufferPtr;
>> +    }
>> +  }
>> +  llvm_unreachable("buffer end hit before '*/' was seen");
>> +}
>
> What happens in a file with unterminated comments? What happens in a file with unterminated comments where the very last character is '*'?
>
> (If the answer is "this function will never be called", I'm okay with that.)

An unterminated comment should not be passed from lexer to anyone.
I've added a test for that in r159267.

>> +
>> +void Lexer::lexCommentText(Token &T) {
>> +  assert(CommentState == LCS_InsideBCPLComment ||
>> +         CommentState == LCS_InsideCComment);
>> +
>> +  switch (State) {
>> +  case LS_Normal:
>> +    break;
>> +  case LS_VerbatimBlockFirstLine:
>> +    lexVerbatimBlockFirstLine(T);
>> +    return;
>> +  case LS_VerbatimBlockBody:
>> +    lexVerbatimBlockBody(T);
>> +    return;
>> +  case LS_HTMLOpenTag:
>> +    lexHTMLOpenTag(T);
>> +    return;
>> +  }
>
> This seems rather asymmetrical. Can we just factor normal mode into its own function as well?

Normal state for lexing text is the hot path, I wanted to avoid a
function call here.  I don't really have a preference, though, because
I didn't do any performance measurements yet.

>> +  assert(State == LS_Normal);
>> +
>> +  const char *TokenPtr = BufferPtr;
>> +  assert(TokenPtr < CommentEnd);
>> +  while (TokenPtr != CommentEnd) {
>> +    switch(*TokenPtr) {
>> +      case '\\':
>> +      case '@': {
>> +        TokenPtr++;
>> +        if (TokenPtr == CommentEnd) {
>> +          formTokenWithChars(T, TokenPtr, tok::text);
>> +          T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
>> +          return;
>> +        }
>
> This looks funny. Can you just keep a "frame pointer" to where BufferPtr was at the beginning of the function, rather than relying on it being updated by formTokenWithChars and then counting backwards?

I don't have a preference here.  Done in r159269.

> Also, when the token body is trivially extractable from the entire token text, I'm not sure it's worth saving separately.)

I think that it is useful because the extracted text is the semantic
value of the token, which is used a lot in the parser and sema.  I
don't think it is a good idea to require them to strip extra
characters -- that's what the lexer is for.

>> +      default: {
>> +        while (true) {
>> +          TokenPtr++;
>> +          if (TokenPtr == CommentEnd)
>> +            break;
>> +          char C = *TokenPtr;
>> +          if(C == '\n' || C == '\r' ||
>> +             C == '\\' || C == '@' || C == '<')
>> +            break;
>> +        }
>> +        formTokenWithChars(T, TokenPtr, tok::text);
>> +        T.setText(StringRef(BufferPtr - T.getLength(), T.getLength()));
>> +        return;
>> +      }
>
> Should this function be the place where leading stars are removed in C-style comments?

We can remove them there, but I would prefer to do this while
post-processing comment text.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/